# Proposal: Hardware Abstraction and Cloud Deployment

How capOS goes from "boots in QEMU" to "boots on a real cloud VM" (GCP, AWS,
Azure). This covers the hardware abstraction infrastructure missing between the
current QEMU-only kernel and real x86_64 hardware, plus the build system
changes needed to produce deployable images.


**Depends on:** Kernel Networking Smoke Test (for PCI enumeration), Stage 5
(for timer history), Stage 7 / SMP proposal Phase C (for LAPIC timer and IPI).

**Complements:** Networking proposal (extends virtio-net toward cloud NICs),
Storage proposal (extends local block-device work toward virtio-scsi and NVMe),
SMP proposal (LAPIC timer/IPI infrastructure shared, with x2APIC tracked as a
later backend).

---

## Current State

The kernel boots via Limine UEFI, outputs to COM1 serial, has QEMU legacy PCI
enumeration for the virtio-net smoke path, and has LAPIC timer/IPI groundwork
from the SMP track. It also has an initial bounded, read-only ACPI diagnostic
parser for Limine RSDP, RSDT/XSDT table inventory, MADT summaries, and MCFG
presence/allocation summaries, plus a Q35 smoke that proves the reusable PCI
config backend can enumerate a capped PCIe ECAM function inventory from MCFG.
The x86 path exports bounded MADT I/O APIC/source-override records, maps the
I/O APIC, and programs masked legacy IRQ routes to LAPIC vectors while honoring
source overrides. PCI drivers can validate and map memory BAR subregions through
a shared kernel helper; the virtio-net modern transport uses that helper for its
common, notify, ISR, and device configuration regions. The PCI capability walk
also reports MSI/MSI-X metadata for the virtio-net function, and the QEMU net
smoke uses that metadata for a bounded kernel-owned virtio-net MSI-X
dispatch/unmask and lifecycle proof through the device MSI vector pool; the
remaining `run-net` fixture also covers queue setup, descriptor guards, ARP, and
ICMP. Device-autonomous virtio-net MSI-X delivery is covered by the dedicated
userspace-provider gates after the kernel L4 owner is retired.

The cloudboot image/harness slice landed in commit `02635421`
(`2026-05-05 06:51 UTC`):
`make capos-cloudboot-image` builds the importable raw disk tarball and
`make cloudboot-test` drives the GCE upload/import/temporary-instance/serial-log
loop with teardown. The first GCP imported-image serial-console boot proof is
run `1778230874-715a` (`2026-05-08 09:06 UTC`) against source commit
`3951e275` (`2026-05-08 08:50 UTC`), reaching the `capos kernel starting`
serial landmark on a temporary no-public-IP, no-service-account/scopes
`e2-small` instance before teardown.

capOS still lacks public L4/SSH/WebShell ingress, AWS/Azure boot proofs and
provider drivers, broader storage variants, high-throughput/multiqueue NIC
readiness, direct-remapping DMA, production cloud-image release paths, and a
cloud-ready clocksource/clockevent closeout. The GCP-first provider rollup has
live serial-console operator access, selected NIC raw-frame reachability,
selected NVMe Persistent Disk I/O, and gVNIC portability evidence.

The GCP-first usable cloud-instance provider rollup is closed by
`docs/tasks/done/2026-06-07/cloud-usable-instance-provider-nic-storage.md`.
Do not cite the cloudboot harness or the first GCP serial-console boot alone as
evidence for provider NIC/storage readiness; the closeout depends on separate
live NIC, storage, operator-access, and gVNIC evidence records. AWS/Azure,
public ingress, and production cloud-image release gates remain separate.

### Trusted Build Inputs And Reproducibility Cross-Links

Cloud deployment depends on the same trusted-build-inputs inventory that
covers local builds. The consolidated supply-chain risk view -- floating Rust
nightly, observed-not-pinned `xorriso` / `qemu-system-x86_64` / OVMF, CI
publication and comparison of build-provenance records, and pinned production
runner identity -- is tracked as R13 in `docs/design-risks-register.md`; the
detailed inventory, dependency policy, vendored-snapshot table, and the
build-provenance retention/comparison policy live in
`docs/trusted-build-inputs.md`. This proposal is recorded as a secondary
owner of R13 because cloud-image release paths and provider-driver bring-up
both depend on those reproducibility gates.

The implication for cloud bring-up is concrete: imported cloud images must
travel with the corresponding `make build-provenance` record (source commit,
toolchain identity, embedded-binary hashes, OVMF identity or explicit
absence) before any provider serial-console run is cited as production
evidence. Until the R13 gates close, cloud images remain local/CI proof
artifacts rather than third-party reproducible boot images.

### What Cloud VMs Provide

GCP (n2-standard), AWS (m6i/c7i), and Azure (Dv5) all expose:

| Resource | Cloud interface | capOS status |
|---|---|---|
| Boot firmware | UEFI (all three) | Limine UEFI works |
| Serial console | COM1 0x3F8 | Works (serial.rs) |
| Boot media | Hybrid BIOS+UEFI raw disk image, packaged per provider import rules | **Partial** (`make capos-cloudboot-image` builds a GCE-importable raw disk tarball; production release packaging and non-GCP provider packaging remain future) |
| Storage | virtio-scsi or NVMe (GCP Persistent Disk), NVMe/EBS (AWS Nitro), managed disks | **Partial** (GCP NVMe Persistent Disk brokered `READ` proof landed; GCP virtio-scsi, Local SSD, AWS/Azure storage, and broader filesystem-backed cloud storage remain future) |
| NIC | virtio-net or gVNIC (GCP), ENA (AWS), MANA (Azure) | **Partial** (GCP legacy virtio-net raw-frame `provider-nic-bound` and gVNIC raw-frame / typed-Nic proofs landed; public ingress, high-throughput/multiqueue, ENA, and MANA remain future) |
| Virtio NIC | QEMU, GCP where selectable, some bare-metal | **Partial** (QEMU smoke; reusable/cloud path planned) |
| Timer | LAPIC timer, TSC, HPET | **Partial** (LAPIC timer groundwork; cloud clocksource work missing) |
| Interrupt delivery | I/O APIC, MSI/MSI-X | **Partial** (masked MADT-backed I/O APIC routes, MSI/MSI-X capability metadata, and bounded kernel-owned virtio-net MSI-X dispatch/lifecycle proof; I/O APIC ownership and userspace interrupt authority missing) |
| Device discovery | ACPI + PCI/PCIe | **Partial** (QEMU legacy PCI smoke, bounded ACPI diagnostics/routing state, reusable legacy/ECAM PCI config access, kernel BAR/MMIO validation, MSI/MSI-X metadata discovery, and bounded virtio-net MSI-X dispatch proof; broader driver authority still missing) |
| Display | None (headless) | N/A |

### Cloud NIC And Storage Portability Notes

The Device Driver Foundation (DDF) is not complete just because QEMU `virtio-net`
works. Cloud bring-up has provider-specific NIC and storage surfaces, and the
first implementation slices must keep those differences visible while still
deferring the actual provider drivers.

| Provider path | Expected device surface | capOS dependency | Current state |
| --- | --- | --- | --- |
| QEMU / constrained GCP virtio-net | Virtio PCI transport, virtqueues, MSI-X where available | Shared virtio transport helpers, `DMAPool`, `DeviceMmio`, `Interrupt`, and queue lifecycle proofs | QEMU virtio-net proofs and the live GCE legacy virtio-net raw-frame `provider-nic-bound` proof landed. This does not claim public L4 ingress, high-throughput/multiqueue readiness, or device-autonomous MSI-X completion delivery |
| GCP gVNIC | [gVNIC](https://cloud.google.com/compute/docs/networking/using-gvnic) as the modern Compute Engine NIC, replacing virtio-net on newer machine generations and required for some features | PCI BAR/MMIO binding, MSI-X routing, per-queue ring setup, image metadata declaring `GVNIC`, and fallback choice between virtio-net and gVNIC by machine family | Grounding plus bounded live proofs landed: the [`docs/devices/gvnic.md`](../devices/gvnic.md) provenance map records the spec basis and authority mapping, the GCE harness can request `GVNIC` image/instance posture and inventory the `1ae0:0042` PCI function, the admin-queue/register proof maps BAR0 and issues one `DESCRIBE_DEVICE`, the raw-frame proof configures one GQI/QPL TX/RX queue pair, and the typed `Nic` adaptation proof exercises inline-frame `Nic.transmit` / `Nic.receive` over live gVNIC. No QEMU gVNIC model exists. This remains a separate GCE portability lane, not a blocker for the first public Web UI proof on a virtio-compatible machine type |
| AWS Nitro ENA + EBS | [ENA enhanced networking](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html) plus [Nitro NVMe storage](https://docs.aws.amazon.com/ec2/latest/instancetypes/ec2-nitro-instances.html) | ENA queue/MSI-X driver, NVMe controller/storage path, IOMMU or bounce-buffer policy, and image import with ENA/NVMe expectations | Planned; no ENA, NVMe EBS, or AWS boot proof |
| Azure Accelerated Networking | [Accelerated Networking](https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-overview) exposes SR-IOV hardware families, with MANA as the newer Azure NIC and Mellanox mlx4/mlx5 still relevant on some hosts | Synthetic-interface fallback awareness, VF binding/revocation handling, MANA/Mellanox driver binding, MSI-X routing, and reset/revoke paths that survive VF removal | Planned; no MANA, Mellanox VF, or Azure boot proof |

These rows are planning gates, not implementation evidence. Each provider NIC
has its own queue layout, feature negotiation, MSI-X/vector conventions, reset
behavior, and driver-binding rules. Azure's accelerated-networking path also
requires the OS and applications to tolerate dynamic SR-IOV VF revocation by
falling back to the synthetic network interface. Provider storage follows the
same rule: AWS Nitro uses NVMe for EBS, GCP can require NVMe on newer or
Confidential VM paths while retaining virtio-scsi on older paths, and Azure
uses SCSI on many older families while Azure Boost and newer NVMe-capable VM
families expose managed disks through NVMe. The shared foundation therefore
needs ACPI/PCIe discovery, BAR validation, interrupt ownership, `DMAPool`
accounting, IOMMU/bounce-buffer policy, and lifecycle teardown before any cloud
NIC or storage driver is treated as portable.

### What Already Works

- **UEFI boot** -- Limine ISO includes `BOOTX64.EFI`. The boot path itself
  is cloud-compatible.
- **Serial output** -- all three clouds expose COM1. `gcloud compute
  instances get-serial-port-output`, `aws ec2 get-console-output`, and Azure
  serial console all read from it.
- **x86_64 long mode** -- all three clouds run x86_64 guests (KVM-based on GCP
  and AWS Nitro, Hyper-V-based on Azure). Architecture matches.

---

## Managed Application Services

Booting capOS on a cloud VM and using managed cloud services are separate
tracks. The VM path proves hardware, disk, network, and serial behavior. Managed
services can be useful earlier for application persistence, especially game
profile/world state, as long as they sit behind narrow capOS service
capabilities.

For a GCP-backed adventure persistence bridge:

- Cloud Run hosts a small bridge endpoint. It translates capOS save/load/append
  requests into provider calls and enforces request bounds before touching cloud
  APIs.
- Cloud KMS owns the key-encrypting keys (KEKs) for each game-world instance or
  shard. The bridge or game-world service gets narrow authority to wrap or
  unwrap data-encrypting keys (DEKs) through Cloud KMS envelope encryption.
  Ordinary browser clients do not receive DEKs, game-world key capabilities, KMS
  decrypt/unwrap grants, or provider-independent plaintext authority; provider
  storage objects contain ciphertext, wrapped DEKs, and metadata only.
- Firestore Native mode stores mutable profile summaries, indexes, and
  compare-and-set version records.
- Cloud Storage stores larger immutable snapshots, evidence blobs, exports, and
  content-addressed records. Object versioning and lifecycle policy are required
  before using it for durable game data.
- Secret Manager stores bridge-side provider credentials and rotation material.
  Those secrets are never granted to ordinary capOS game clients.

This does not change the storage proposal's rule: persistence is still
application-level serialization of bounded Cap'n Proto records. The cloud bridge
is just one backing implementation for `Store`, `Namespace`, or an
app-specific `AdventureSaveStore`/`CloudGameStore` capability. Local fake-cloud
tests must enforce stale-write rejection, wrong-profile rejection, append-only
ledger behavior, and size bounds before a real GCP deployment is trusted.
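The stale-write and size-bound checks can be exercised against a local fake
before any provider call exists. The following is a minimal sketch, assuming a
single-writer, file-backed fake; `cas_put`, `MAX_RECORD_BYTES`, and the on-disk
layout are hypothetical names for illustration and are not part of the storage
proposal.

```bash
# Hypothetical fake-cloud store: one <key>.data and <key>.version file per record.
MAX_RECORD_BYTES=4096   # assumed bound, for illustration only

# cas_put DIR KEY EXPECTED_VERSION PAYLOAD
# Writes PAYLOAD only if the stored version equals EXPECTED_VERSION.
# Prints "ok", "size-bound", or "stale-write"; non-zero status on rejection.
# Not atomic across processes: this models a single-writer test fake.
cas_put() {
  local dir="$1" key="$2" expected="$3" payload="$4"
  local current=0 bytes
  if [ -f "$dir/$key.version" ]; then
    current=$(cat "$dir/$key.version")
  fi
  # Bound the payload by byte length, not character count.
  bytes=$(( $(printf '%s' "$payload" | wc -c) ))
  if [ "$bytes" -gt "$MAX_RECORD_BYTES" ]; then
    echo size-bound; return 1
  fi
  # Compare-and-set: a writer holding an old version is rejected.
  if [ "$current" -ne "$expected" ]; then
    echo stale-write; return 1
  fi
  printf '%s' "$payload" > "$dir/$key.data"
  echo $((current + 1)) > "$dir/$key.version"
  echo ok
}
```

A fixture would call `cas_put` twice with the same expected version and assert
that the second call reports `stale-write` and leaves the stored data untouched.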

A separate browser-mediated path can serve user-owned private backups. In that
model, the browser or web terminal host authenticates the user to Google, stores
encrypted save capsules in Drive `appDataFolder` or Firebase user documents, and
returns only opaque provider handles and encrypted capsule bytes through
explicit restore flows. DEK unwrap and plaintext validation happen in the local
capOS key domain or in the game-world service with KMS/IAM authority, not in
browser JavaScript.
This is appropriate for user profile backup, private expedition checkpoints,
and settings sync. It is not appropriate for authoritative public world state,
reward witness records, market receipts, or multiplayer outcomes. The user's
browser holds provider tokens; capOS game services do not. For GCP-backed game
worlds, the browser transports envelope-encrypted capsules with wrapped DEKs but
does not hold game-world key capabilities, KMS decrypt/unwrap grants, DEKs, or
plaintext authority.

Firebase user-document capsule paths must make the auth binding visible in the
path template, not just in policy metadata. Use a narrow shape such as
`users/{request.auth.uid}/saveCapsules/{capsule_id}` so Firestore rules can
bind the user wildcard to `request.auth.uid`; literal profile names such as
`users/alice/...` are not accepted by the capOS policy model. Firestore rules
remain access control for opaque encrypted capsules only. They must not be
treated as validation for decrypted adventure semantics, and path segments must
respect Firestore ID constraints: an ID cannot contain `/`, cannot be solely
`.` or `..`, cannot match `__.*__`, and cannot exceed the 1,500-byte
collection/document ID limit.
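A bridge-side check for these path rules could look like the sketch below. It
assumes the Firestore constraints are read as: no forward slash, not solely `.`
or `..`, no full match of `__.*__`, and at most 1,500 bytes. The helper names
`valid_firestore_id` and `capsule_path` are hypothetical.

```bash
# Returns 0 when $1 is acceptable as a single Firestore document ID.
valid_firestore_id() {
  local id="$1" bytes
  case "$id" in
    ''|.|..) return 1 ;;   # empty, or solely "." / ".."
    */*)     return 1 ;;   # a forward slash would split the path segment
    __*__)   return 1 ;;   # reserved pattern __.*__
  esac
  # The limit is on bytes, not characters.
  bytes=$(( $(printf '%s' "$id" | wc -c) ))
  [ "$bytes" -le 1500 ]
}

# capsule_path UID CAPSULE_ID
# Builds the narrow per-user path shape; refuses IDs that fail validation.
capsule_path() {
  local uid="$1" capsule_id="$2"
  valid_firestore_id "$uid" && valid_firestore_id "$capsule_id" || return 1
  printf 'users/%s/saveCapsules/%s\n' "$uid" "$capsule_id"
}
```

The `uid` argument is expected to come from the authenticated session, never
from a client-supplied profile name; this helper only checks ID syntax and does
not replace the Firestore rule that binds the wildcard to `request.auth.uid`.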

### GCP Cloud KMS And IAM Notes For Adventure Saves

GCP-backed adventure save capsules follow the same envelope-encryption model as
`CloudKmsKeySource` and the volume-encryption proposal: Cloud KMS holds a
key-encrypting key (KEK), the game-world service owns the capsule
data-encrypting key (DEK), and KMS `Encrypt`/`Decrypt` wraps or unwraps that
DEK rather than bulk-encrypting capsule bytes. Provision one Cloud KMS key ring
and one symmetric CryptoKey KEK per game-world instance or shard. The key ring
is an administrative grouping boundary; ordinary runtime authority should be
granted on the CryptoKey resource where possible, not at the project or key-ring
level. Do not claim key-version-scoped IAM as a design primitive for this path:
predefined Cloud KMS crypto roles have `CryptoKey` as their lowest grantable
resource.

Service accounts are split by operation:

- Writers that only create new ciphertext receive
  `roles/cloudkms.cryptoKeyEncrypter` on the configured game-world CryptoKey so
  they can wrap a freshly generated DEK.
- Restore, validation, and migration workers that must read protected capsules
  receive `roles/cloudkms.cryptoKeyDecrypter` on that CryptoKey so they can
  unwrap an existing DEK.
- The narrow game-world service account receives
  `roles/cloudkms.cryptoKeyEncrypterDecrypter` only when the same service must
  both wrap and unwrap DEKs. Avoid `roles/cloudkms.cryptoOperator`,
  project-wide grants, owner/editor roles, browser OAuth identities, and
  service-agent roles for ordinary adventure runtime access.
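The role split above can be encoded so that provisioning scripts never pick a
broader role by accident. This is a sketch only: `kms_role_for` and
`kms_binding_cmd` are hypothetical helpers, the operation names `wrap`,
`unwrap`, and `wrap+unwrap` are invented for illustration, and the function
prints the `gcloud kms keys add-iam-policy-binding` command instead of running
it.

```bash
# Map an operation class to the narrowest predefined Cloud KMS role.
kms_role_for() {
  case "$1" in
    wrap)        echo roles/cloudkms.cryptoKeyEncrypter ;;
    unwrap)      echo roles/cloudkms.cryptoKeyDecrypter ;;
    wrap+unwrap) echo roles/cloudkms.cryptoKeyEncrypterDecrypter ;;
    *) echo "unknown operation class: $1" >&2; return 1 ;;
  esac
}

# kms_binding_cmd OP KEY KEYRING LOCATION SERVICE_ACCOUNT
# Prints a CryptoKey-level binding command for review; it does not execute it.
kms_binding_cmd() {
  local op="$1" key="$2" keyring="$3" location="$4" sa="$5" role
  role=$(kms_role_for "$op") || return 1
  printf 'gcloud kms keys add-iam-policy-binding %s --keyring=%s --location=%s --member=serviceAccount:%s --role=%s\n' \
    "$key" "$keyring" "$location" "$sa" "$role"
}
```

Printing rather than executing keeps the grant reviewable, and the binding
targets the CryptoKey resource rather than the key ring or project.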

The browser-vault boundary does not change. Browser JavaScript may carry
ciphertext, wrapped DEKs, capsule metadata, and opaque Drive/Firebase provider
handles. It must not receive plaintext DEKs, capOS `SymmetricKey` or
`KeySource` capabilities, Cloud KMS decrypt/unwrap grants, service account
credentials, or provider-independent plaintext. The game-world service may use
the unwrapped DEK internally as service authority, modeled as a `SymmetricKey`
capability, but that authority does not cross into browser JavaScript.
Possession of a Drive file id or Firebase document path is only transport
authority over opaque encrypted bytes.

Rotation creates a new primary KEK version for future DEK wrapping. It does not
re-encrypt existing capsules, rewrite wrapped DEK blobs, or disable/destroy old
key versions automatically. Capsule re-encryption or rewrapping is a managed
game-world service operation: unwrap the old DEK while its KEK version remains
enabled and authorized, decrypt and validate the capsule inside the service,
then write a new capsule using a new DEK or a DEK rewrapped by the current
primary KEK version. The service verifies content hashes and ledger/profile
bindings before replacing capsule metadata. Old KEK versions should only be
disabled or scheduled for destruction after inventory proves no accepted wrapped
DEK still depends on them.
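The inventory step before disabling old KEK versions can be sketched as a
simple reference check. The inventory format here (one
`<capsule_id> <kek_version>` pair per line on stdin) and the helper name
`retirable_kek_versions` are assumptions for illustration; the real capsule
metadata layout is not defined in this proposal.

```bash
# retirable_kek_versions PRIMARY VERSION...
# Reads "<capsule_id> <kek_version>" lines on stdin and prints each candidate
# version that is not the primary and has no accepted wrapped DEK referencing it.
retirable_kek_versions() {
  local primary="$1" inventory v
  shift
  inventory=$(cat)
  for v in "$@"; do
    # The primary version wraps new DEKs and is never a retirement candidate.
    if [ "$v" = "$primary" ]; then
      continue
    fi
    # awk exits 0 when at least one capsule still depends on version v.
    if ! printf '%s\n' "$inventory" |
        awk -v v="$v" '$2 == v { found = 1 } END { exit !found }'; then
      echo "$v"
    fi
  done
}
```

With capsules on versions 1 and 3 and version 3 as primary, only version 2 is
reported; version 1 stays enabled until its capsules are rewrapped.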

Retiring a game-world first removes IAM decrypt authority from the world service
and migration workers. If the retirement is meant to make existing capsules
inaccessible, disable the relevant key versions and record the expected outage
and recovery procedure before doing it. Destruction is delayed by Cloud KMS'
scheduled destruction period and is irreversible once completed, so destroy key
versions only after audit retention, export, and break-glass recovery decisions
are recorded. Disabling or destroying a key version can make all capsules that
depend on it unreadable; this is a revocation tool, not cleanup.

---

## Phase 1: Bootable Disk Image And Serial Diagnostics

**Goal:** Produce a raw hybrid BIOS+UEFI disk image that can boot locally and
can be packaged for cloud import, alongside the existing ISO for QEMU. The
first cloud-visible proof is serial-console boot to init/diagnostics, not
network shell access.

### The Problem

Cloud VMs boot from disk images, not ISOs. Each cloud has provider-specific
format and boot-mode rules:

| Cloud | Image format | Import method |
|---|---|---|
| GCP | `disk.raw` in a gzip-compressed `.tar.gz` written in the `oldgnu` tar format; raw size in 1 GiB increments | `gcloud compute images create --source-uri=gs://...` |
| AWS | raw, VMDK, VHD/VHDX, or OVA | `aws ec2 import-image` with explicit boot-mode notes |
| Azure | VHD (fixed size) | `az image create --source` |

GCP's manual import path documents a functional MBR partition table or a
hybrid GPT+MBR bootloader configuration for imported boot disks, plus ACPI
support. AWS VM Import/Export supports both UEFI and legacy BIOS boot modes,
but UEFI imports need a fallback EFI binary at `/EFI/BOOT/BOOTX64.EFI`; Nitro
instances generally expect NVMe storage and ENA networking for useful
operation. Therefore the first capOS image target should be a **hybrid
BIOS+UEFI raw disk**: an ESP for UEFI fallback boot and a BIOS/MBR-compatible
Limine path for import paths that still validate MBR bootability.

### Disk Layout

```
Hybrid raw disk image (1 GiB-aligned for cloud packaging)
  Protective/hybrid MBR + GPT
  Partition 1: EFI System Partition (FAT32, ~32 MiB)
    /EFI/BOOT/BOOTX64.EFI     (Limine UEFI loader)
    /limine.conf               (bootloader config)
    /boot/kernel               (capOS kernel ELF)
    /boot/init                 (init process ELF)
  Partition 2: (reserved for future use -- persistent store backing)
```

### Build Tooling

New Makefile target `make image` using standard tools:

```makefile
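IMAGE := capos.img
# Image size in MiB; 1024 MiB = 1 GiB keeps GCP raw image packaging simple.
# The comment sits on its own line because a trailing comment would leave
# whitespace in the variable value.
IMAGE_SIZE := 1024

image: kernel init $(LIMINE_DIR)
	# Create raw disk image
	dd if=/dev/zero of=$(IMAGE) bs=1M count=$(IMAGE_SIZE)
	# Partition with GPT + ESP; keep room for hybrid/MBR boot metadata.
	sgdisk -n 1:2048:+32M -t 1:ef00 $(IMAGE)
	# Format ESP as FAT32 with mtools (no root or loop mount needed).
	# The ESP starts at sector 2048, which is the 1 MiB offset used below.
	mformat -i $(IMAGE)@@1M -F -T 65536 ::
	# mcopy does not create parent directories, so create them first.
	mmd -i $(IMAGE)@@1M ::/EFI ::/EFI/BOOT ::/boot
	mcopy -i $(IMAGE)@@1M $(LIMINE_DIR)/BOOTX64.EFI ::/EFI/BOOT/
	mcopy -i $(IMAGE)@@1M limine.conf ::/
	mcopy -i $(IMAGE)@@1M $(KERNEL) ::/boot/kernel
	mcopy -i $(IMAGE)@@1M $(INIT) ::/boot/init
	# BIOS boot also loads limine-bios.sys from the partition (assumes the
	# Limine release in $(LIMINE_DIR) ships it).
	mcopy -i $(IMAGE)@@1M $(LIMINE_DIR)/limine-bios.sys ::/boot/
	# Install Limine BIOS path as well as UEFI fallback files.
	$(LIMINE_DIR)/limine bios-install $(IMAGE)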
```

New QEMU target to test disk boot locally:

```makefile
# Depend on the `image` target; no rule produces $(IMAGE) by file name.
run-disk: image
	qemu-system-x86_64 -drive file=$(IMAGE),format=raw \
		-bios /usr/share/edk2/x64/OVMF.4m.fd \
		-display none $(QEMU_COMMON); \
	test $$? -eq 1
```

Cloud upload helpers (scripts, not Makefile targets):

```bash
# GCP
cp capos.img disk.raw
tar --format=oldgnu -Sczf capos.tar.gz disk.raw
gcloud storage cp capos.tar.gz gs://my-bucket/
gcloud compute images create capos --source-uri=gs://my-bucket/capos.tar.gz

# AWS
aws ec2 import-image --disk-containers \
  "Format=raw,UserBucket={S3Bucket=my-bucket,S3Key=capos.img}" \
  --boot-mode uefi
```

Serial diagnostics are part of Phase 1 rather than a later convenience. The
cloud bring-up loop should be:

1. `make run-disk` proves the hybrid image under local QEMU/OVMF.
2. A local BIOS-mode disk run proves the MBR/Limine path if provider import
   requires it.
3. A serial diagnostics prompt is reachable on COM1 in QEMU.
4. GCP/AWS imported instances reach the same prompt through provider serial
   console output.

The serial diagnostics prompt should expose bounded read-only commands for
`status`, `cpu`, `mem`, `acpi`, `pci`, `irq`, `timers`, `devices`, and `logs`,
plus `reboot`/`halt`. It is the early remote debugging path for cloud driver
bring-up before NICs or disks are reliable. It should not be required to upload
large binaries, replace kernels in place, or stream high-volume tracing through
cloud serial consoles.
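A host-side fixture can pin the command surface so that the prompt does not
grow write paths unnoticed. The sketch below is a hypothetical test helper, not
the kernel implementation; `diag_command_class` and its three class labels are
invented for illustration, while the command names come from the list above.

```bash
# Classify a serial-diagnostics command word.
# Prints "read-only", "control", or "rejected"; non-zero status when rejected.
diag_command_class() {
  case "$1" in
    status|cpu|mem|acpi|pci|irq|timers|devices|logs) echo read-only ;;
    reboot|halt) echo control ;;
    # Anything else (upload, kernel replace, trace streaming) is refused.
    *) echo rejected; return 1 ;;
  esac
}
```

A smoke script could feed each word of a captured prompt transcript through
this function and fail the run on the first `rejected` result.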

### Dependencies

- `sgdisk` (gdisk package) -- GPT partitioning
- `mtools` (mformat, mcopy) -- FAT32 manipulation without root/loop mount

### Scope

Makefile/helper script work for the image plus a narrow diagnostics-mode
surface. Kernel changes are limited to serial diagnostics and any boot path
adjustments needed for disk images; network and block drivers remain later
phases.

### Phase 0 closeout: GCE harness landed (2026-05-05 06:51 UTC)

Commit `02635421` (`2026-05-05 06:51 UTC`) records this harness closeout.

The first build-and-boot leg of Phase 1 landed as the cloud-boot harness.
`make capos-cloudboot-image` produces a 10 GiB GPT-partitioned `target/disk.raw`
with a 128 MiB FAT32 EFI System Partition holding the Limine UEFI loader,
`limine.conf`, the kernel ELF, and the manifest, plus the Limine BIOS stage 2
embedded in the GPT for legacy SeaBIOS boot. The disk is repackaged as
`target/capos-disk.tar.gz` using `tar --format=oldgnu -czf`, the exact form
GCE's manual import path expects. Disk size is enforced as an exact multiple
of 1 GiB.
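The size rule is easy to check before packaging. A minimal sketch, assuming the
byte count is passed in by the caller; `is_whole_gib` is a hypothetical helper
name and is not taken from the harness.

```bash
GIB=$((1024 * 1024 * 1024))

# is_whole_gib BYTES
# Succeeds only when BYTES is a positive exact multiple of 1 GiB.
is_whole_gib() {
  local bytes="$1"
  [ "$bytes" -gt 0 ] && [ $((bytes % GIB)) -eq 0 ]
}
```

On a GNU userland the byte count could come from `stat -c %s target/disk.raw`;
a 10 GiB image is 10737418240 bytes and passes, while an image one byte larger
fails.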

`tools/cloudboot/run-test.sh` (also wired as `make cloudboot-test`) drives the
end-to-end loop on a sandbox GCE project: an idempotent orphan sweep on a
configured project-pinned label, a staging tarball upload, image creation,
instance creation with no public IP, no service account, no API scopes, the
same project-pinned label set, and the configured sandbox subnet, then
serial-port polling for the `capos kernel starting` landmark with a hard
wall-clock budget. Serial output is captured under
`target/cloudboot-evidence/run-<id>/serial.log` before teardown, and a bash
trap on `EXIT INT TERM` always deletes the instance, image, and staged
tarball even on signal or partial failure. The harness hard-fails if the
active project name does not match the configured sandbox.
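The landmark poll with a hard wall-clock budget can be sketched as follows.
This is not the code in `tools/cloudboot/run-test.sh`; `poll_for_landmark` is a
hypothetical helper, and the fetch step is passed in as a command name so the
loop can be tested without a provider.

```bash
# poll_for_landmark FETCH_CMD LANDMARK BUDGET_SECONDS [INTERVAL_SECONDS]
# Calls FETCH_CMD repeatedly until its output contains LANDMARK or the
# wall-clock budget is spent. Returns 0 on match, 1 on timeout.
poll_for_landmark() {
  local fetch="$1" landmark="$2" budget="$3" interval="${4:-5}"
  local deadline log
  deadline=$(( $(date +%s) + budget ))
  while [ "$(date +%s)" -le "$deadline" ]; do
    # A failed fetch counts as "no output yet", not as a fatal error.
    log=$("$fetch") || log=""
    case "$log" in
      *"$landmark"*) return 0 ;;
    esac
    sleep "$interval"
  done
  return 1
}

# Example wrapper (assumed variable names): the real fetch would be something like
#   fetch_serial() { gcloud compute instances get-serial-port-output "$INSTANCE" --zone "$ZONE"; }
#   poll_for_landmark fetch_serial 'capos kernel starting' 300
```

Keeping the budget in wall-clock seconds rather than an attempt count bounds
the billable lifetime of the temporary instance even when each fetch is slow.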

Sandbox project name, subnet, staging bucket, and the IAM custom roles the
harness assumes are operational details that depend on the host environment;
they belong in `tools/cloudboot/README.md` and operator-local configuration,
not in this proposal.

This is the harness only. The recurring portability gate that records cloud
boot evidence on every reviewed cloud-relevant change remains open as
`docs/backlog/hardware-boot-storage.md` Task 6, and the userspace driver
authority gate remains open under DDF Task 5.

### First GCP serial-console boot proof (2026-05-08 09:06 UTC)

The first imported-image GCP serial-console proof reached
`capos kernel starting` as run `1778230874-715a` at `2026-05-08 09:06 UTC`,
against source commit `3951e275` from `2026-05-08 08:50 UTC`. The run used
the cloudboot harness to import the staged disk image, create a temporary
`e2-small` instance with no public IP and no service account/scopes, poll
serial output for the kernel-start landmark, save the serial log under the
run evidence directory, and tear down the temporary instance/image/staging
objects.

This proves imported-image firmware/bootloader/kernel serial reachability on
one GCP sandbox run only. It does not prove a usable cloud instance, provider
NIC or storage drivers, cloud clocking, persistence, SSH/network shell access,
AWS/Azure import, or production cloud readiness.

### Private Web UI Reachability Evidence Contract

The first self-hosted Web UI provider proof is private GCE reachability, not
operator browser exposure. The behavior task
[`cloud-gce-private-self-hosted-webui-proof`](../tasks/on-hold/cloud-gce-private-self-hosted-webui-proof.md)
extends `tools/cloudboot/run-test.sh` with `--require-web-ui-proof` only after
the local Web UI L4 proof, DHCP/IPv4 configuration, and Web UI hardening tasks
are closed. This proposal defines the evidence contract for that later behavior
slice; it does not authorize a billable GCE run, a public endpoint, broad
firewall changes, TLS certificate provisioning, service-account broadening, or a
production release.

The proof must keep the current cloudboot posture unless the behavior task is
explicitly amended: no public IP on the capOS VM, no service account, no API
scopes, no public firewall rule, and teardown through the existing orphan-sweep
and `EXIT INT TERM` trap discipline. The reachability probe must cross the live
GCE virtual network boundary. Acceptable shapes include a same-VPC probe
instance, a provider-supported internal probe path, or another reviewed private
path that sends packets through the capOS VM's GCE NIC and private endpoint.

Evidence classes stay separate:

| Evidence class | What it can prove | What it cannot prove |
|---|---|---|
| Cloudboot-only | The image imports, boots, emits serial markers, and tears down provider resources | Web UI reachability over the provider network |
| Provider-private | A private probe reaches `remote-session-web-ui` through the live GCE NIC and Phase C L4 path | Public operator access, TLS readiness, DNS readiness, or browser production posture |
| Operator-exposure | A separately authorized public or browser-mediated path reaches the Web UI under the selected ingress policy | The private proof by itself; it must depend on the private proof instead |

The private Web UI proof records, before teardown, at least:

| Field | Requirement |
|---|---|
| Run identity | Cloudboot run id plus source commit or image provenance used for the imported image |
| Machine shape | GCE machine family/type, NIC selection posture, and zone |
| Private posture | `public_ip=false` or equivalent, service-account/scopes posture, and no public firewall rule |
| Private endpoint | Internal IP or provider-private endpoint, UI port, and probe source identity |
| Probe path | Same-VPC probe, provider-supported internal probe, or other reviewed private path that crosses the GCE virtual network boundary |
| Web UI marker | A run-unique Web UI response marker, header, or body token observed by the private probe |
| Phase C L4 marker | The `remote-session-web-ui` Phase C L4 evidence marker, such as `cloudboot-evidence: remote-session-web-ui-l4 <token>`, tied to the same source commit/image |
| Private proof marker | A final structured marker, such as `cloudboot-evidence: gce-private-self-hosted-webui <token>`, emitted only after the private probe succeeds |
| Teardown | Instance, image, staged object, probe resources, and any private firewall or route resources created by the run were deleted or reported as a failed run |

#### Private Proof Runbook Checklist

The future `--require-web-ui-proof` harness gate closes provider-private Web UI
reachability only when the run records these steps in order:

1. Preflight confirms the local Web UI L4 proof, DHCP/IPv4 proof, session
   hardening, and connection-bound prerequisites are closed, and confirms that
   the run has current authorization for billable private GCE execution.
2. Image/source provenance records the cloudboot run id, source commit, imported
   image or staged object identity, and the local artifact set used for the VM.
3. Launch posture records the zone, machine type, NIC posture, no public IP,
   no service account or API scopes, and no public firewall rule.
4. Probe setup records the private endpoint, UI port, probe source identity, and
   same-VPC or provider-supported private path that crosses the GCE virtual
   network boundary.
5. The private probe fetches the Web UI over that provider-private path and
   records a run-unique response marker, header, or body token.
6. The serial or harness evidence ties the same run to the Phase C L4 marker
   for `remote-session-web-ui`, such as
   `cloudboot-evidence: remote-session-web-ui-l4 <token>`, from the same source
   commit/image.
7. The harness emits the private proof marker, such as
   `cloudboot-evidence: gce-private-self-hosted-webui <token>`, only after the
   provider-private probe and L4-marker correlation both succeed.
8. Teardown removes the VM, imported image, staged object, probe resources, and
   any private firewall or route resources created by the run, using the normal
   orphan-sweep and trap discipline.
9. Failed-run reporting preserves the run id, failure class, last observed
   private posture, teardown result, and whether any loopback, same-guest, or
   serial-only diagnostics passed without treating those diagnostics as a
   provider-private proof.

#### No-Spend Preflight (Step 1, Landed as a Local Gate)

Step 1 of the checklist is implemented and testable today without any provider
mutation: `tools/cloudboot/run-test.sh --require-web-ui-proof --preflight-only`
runs the local no-spend preflight and exits before the harness access probe,
orphan sweep, upload, image import, instance launch, firewall mutation, or any
probe resource. It validates three things: that the local prerequisite proofs
are done (`cloud-prod-remote-session-web-ui-l4-local-proof`,
`remote-session-web-ui-session-hardening`,
`remote-session-web-ui-connection-bounds`, and the legacy-datapath serving
prerequisite `cloud-gce-legacy-virtio-webui-serving-local-proof`), that an
operator supplied a firewall-IAM attestation (the documented live blocker), and
that a current per-run billable authorization is present. Each check emits one
structured `cloudboot-webui-preflight:` line naming the failure class without
printing credentials or attestation values. `make
cloudboot-gce-private-webui-preflight-check` is the fixture gate proving the
safe failure paths and that no provider CLI is invoked on any preflight path
(`tools/cloudboot/README.md` documents the inputs and failure classes). A
preflight pass is cloudboot-only evidence -- the output labels itself
`evidence-class=cloudboot-local-preflight` -- and is neither the
provider-private proof nor authorization for a billable run. The live
`--require-web-ui-proof` gate remains unimplemented and fails closed without
`--preflight-only`.
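The "one structured line per check, no secret values" shape can be illustrated
with a small sketch. The field names (`check=`, `result=`, `class=`) and the
helpers `preflight_check` and `run_preflight` are hypothetical; the real line
grammar and failure classes are the ones documented in
`tools/cloudboot/README.md`.

```bash
# preflight_check NAME VALUE
# Emits one structured line per check. VALUE is only tested for presence and
# is never echoed, so credentials and attestation values stay out of the log.
preflight_check() {
  local name="$1" value="$2"
  if [ -n "$value" ]; then
    printf 'cloudboot-webui-preflight: check=%s result=pass\n' "$name"
  else
    printf 'cloudboot-webui-preflight: check=%s result=fail class=missing-%s\n' \
      "$name" "$name"
    return 1
  fi
}

# run_preflight ATTESTATION AUTHORIZATION
# Runs every check (no early exit) and fails closed if any check failed.
run_preflight() {
  local failed=0
  preflight_check firewall-iam-attestation "$1" || failed=1
  preflight_check billable-authorization "$2" || failed=1
  return "$failed"
}
```

Running every check before returning gives the operator the full list of
missing inputs in one pass, while the non-zero status still stops the harness
before any provider call.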

#### Evidence-Grammar Fixture (Local Gate)

The closeout evidence grammar for the table above is also locally testable
without any provider mutation:
`tools/cloudboot/validate-private-webui-evidence.sh` validates a
harness-rendered evidence report for field completeness, marker ordering (the
private proof marker only after the recorded private-probe pass and the
correlated `remote-session-web-ui-l4` marker), run/source identity agreement,
private posture, and teardown result, and rejects loopback-only, serial-only,
same-guest, public-IP, public-firewall, and missing-teardown evidence with
structured failure classes. `make
cloudboot-gce-private-webui-evidence-fixture-check` is the fixture gate
(`tools/cloudboot/README.md` documents the report grammar and failure
classes). A pass is stamped
`evidence-class=cloudboot-local-private-webui-evidence-fixture` with an
explicit `provider-private-reachability=not-proven` label: it proves only that
a future successful run's evidence will be parsed, ordered, and classified
correctly, not that any provider-private probe has run.
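The marker-ordering rule can be illustrated with a single pass over the report's events. This is a sketch only: the real validator is the shell script named above, and the `Event` and `OrderingError` types here are hypothetical names invented for the example, not part of the report grammar.

```rust
/// Hypothetical event kinds extracted from an evidence report.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    /// The correlated `remote-session-web-ui-l4` marker.
    L4Marker,
    /// The recorded private-probe pass.
    PrivateProbePass,
    /// The private proof marker itself.
    PrivateProofMarker,
}

#[derive(Debug, PartialEq, Eq)]
pub enum OrderingError {
    ProofBeforeProbe,
    ProofBeforeL4Marker,
    MissingProofMarker,
}

/// Accepts a report only if the proof marker appears after both the
/// private-probe pass and the L4 marker.
pub fn check_marker_order(events: &[Event]) -> Result<(), OrderingError> {
    let mut seen_l4 = false;
    let mut seen_probe = false;
    let mut seen_proof = false;
    for ev in events {
        match ev {
            Event::L4Marker => seen_l4 = true,
            Event::PrivateProbePass => seen_probe = true,
            Event::PrivateProofMarker => {
                if !seen_probe {
                    return Err(OrderingError::ProofBeforeProbe);
                }
                if !seen_l4 {
                    return Err(OrderingError::ProofBeforeL4Marker);
                }
                seen_proof = true;
            }
        }
    }
    if seen_proof {
        Ok(())
    } else {
        Err(OrderingError::MissingProofMarker)
    }
}
```

A report that carries the proof marker but never records the probe pass is rejected in the same way as one that records them out of order.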

Loopback-only checks (`127.0.0.1`, guest-local `localhost`, or an in-guest HTTP
health request) are supplemental service-health evidence. They may help diagnose
a failed run, but they do not close `cloud-gce-private-self-hosted-webui-proof`
because they do not prove the provider NIC, VPC routing, private endpoint, or
probe-to-VM packet path. Serial-only markers are likewise insufficient for the
private Web UI proof unless the private probe also succeeds and the harness
records the required provider-private fields.

The public ingress policy below remains a later authorization boundary. Closing
the private proof does not permit a public IP, load balancer, DNS name, TLS
certificate, Identity-Aware Proxy, operator browser exposure, or widened service
account. Public browser-facing exposure must reference the private proof as an
input and then satisfy the separate public-ingress policy and on-hold approval
gate.

---

## Public Web UI Ingress Policy (First Operator-Access Proof)

The cloudboot harness intentionally launches with no public IP, no service
account, and no API scopes. Exposing the self-served capOS Web UI
(`remote-session-web-ui`, see
[remote-session-capset-client.md](../backlog/remote-session-capset-client.md)
Gate 1B) to an operator browser is therefore a separate, reviewed exposure
decision, not a follow-on of the private reachability proof. This section is the
selected policy that the first public-ingress behavior task
([`cloud-gce-public-self-hosted-webui-ingress-tls`](../tasks/on-hold/cloud-gce-public-self-hosted-webui-ingress-tls.md))
builds against, decided by
[`cloud-gce-public-webui-ingress-tls-policy-design`](../tasks/done/2026-06-03/cloud-gce-public-webui-ingress-tls-policy-design.md).

### Selected Ingress Shape: Provider-Terminated HTTPS Load Balancer

The first public proof uses a **GCP external Application Load Balancer that
terminates HTTPS at the Google front end**. capOS serves only plain HTTP/1.1 on
its UI backend port; the operator browser reaches the UI exclusively through the
load balancer's HTTPS virtual IP and hostname. TLS is terminated by Google's
front end against a managed certificate; capOS never holds the TLS private key
and never parses hostile TLS bytes in this proof.

```mermaid
graph LR
    B[Operator browser] -- HTTPS --> LB[GCP external HTTPS<br/>Application Load Balancer<br/>Google-managed cert]
    LB -- HTTP, health-check-scoped firewall --> NEG[Zonal NEG / backend service]
    NEG --> VM[capOS VM<br/>remote-session-web-ui :8080<br/>plain HTTP/1.1, no public IP]
    style LB fill:#2d5,stroke:#333
    style VM fill:#2d5,stroke:#333
```

Why this shape is the first proof rather than direct capOS TLS termination:

- **No capOS TLS termination stack exists yet.** The Phase-1 certificate
  verifier has landed, but the capability-native TLS termination model
  (`TlsServerConfig`, ACME issuance, OCSP stapling, and private-key custody) is
  not landed in
  [certificates-and-tls-proposal.md](certificates-and-tls-proposal.md), and the
  userspace L4 network stack has not yet completed full `TcpSocket` relocation.
  The ACME/Let's Encrypt successor path is decomposed, but it still depends on
  minimal `PrivateKey` / `KeyVault` / `KeySource` custody, server-side TLS, the
  RFC 8555 client, the scoped `http-01` solver, and
  `CertificateStore.watch` renewal. A direct external IP would put capOS's
  nascent userspace HTTP parser at the first byte of hostile internet traffic
  with no TLS and no reviewed key custody.
- **Least privilege and reversibility.** Provider-terminated TLS keeps the VM
  with no public IP, no inbound `0.0.0.0/0`, and no private-key custody in either
  capOS or the harness. Teardown is the deletion of a bounded set of provider
  resources, not the rotation of an exposed key.
- **Clean successor path.** When the capability-native TLS stack and an ACME
  flow ship, the direct-external-IP / capOS-terminated shape becomes available
  as a second, separately reviewed ingress. This proof does not foreclose it; it
  is the bootstrap step before it. The interim posture is recorded as
  "Bootstrap TLS for the First Public GCE Web UI" in
  [certificates-and-tls-proposal.md](certificates-and-tls-proposal.md), and the
  public GCE successor task is
  [`cloud-gce-public-webui-letsencrypt-direct-termination-proof`](../tasks/on-hold/cloud-gce-public-webui-letsencrypt-direct-termination-proof.md).
  That successor requires a controlled public DNS name plus explicit
  billable/public-ingress authorization, and any Let's Encrypt production call
  requires explicit CA authorization.

Raw public HTTP is **not** acceptable closeout evidence. If port 80 is published
at all, it exists only as an HTTP-to-HTTPS 301 redirect at the load balancer and
never reaches capOS. The closeout evidence must be the HTTPS path.

An optional hardening for the first proof is to enable **Identity-Aware Proxy
(IAP)** on the backend service so the public door is gated by Google IAM before
any request reaches the capOS backend. IAP here is not a separate ingress shape:
it rides on the same external HTTPS load balancer and gates that backend service,
so the ALB is still the only public entry point. IAP composes with, and does not
replace, the capOS `SessionManager`/`AuthorityBroker` login boundary: IAP
authenticates the human to Google; capOS still mints its own `UserSession` and
projects only browser-safe view models. The browser never receives raw capOS
caps.

### Certificate and Key Custody

| Concern | First proof | Successor (deferred) |
|---|---|---|
| TLS terminator | Google front end (load balancer) | capOS userspace TLS service |
| Certificate source | Google-managed certificate (Certificate Manager or classic managed cert), or an operator-supplied cert resource on the load balancer | ACME (`AcmeClient` + `http-01`/`tls-alpn-01` solver) from [certificates-and-tls-proposal.md](certificates-and-tls-proposal.md) |
| Private-key custody | Google-held; never in capOS or the harness | capOS `PrivateKey` cap sealed under a `KeySource` |
| Min TLS version / cipher policy | Load balancer SSL policy (TLS 1.2+ minimum; prefer the GCP `MODERN`/`RESTRICTED` profile) | capOS `CipherPolicy` (`modern`) |

The first proof must not write a private key into the disk image, the manifest,
the cloudboot evidence directory, or any harness-staged object. A managed
certificate keeps key material entirely on the provider side.

The successor must preserve the same no-export rule on the capOS side: the ACME
account key and TLS private key remain behind `PrivateKey` / `KeyVault`
authority and are not copied into cloudboot images, manifests, logs, or evidence
directories. Local ACME proofs use a local directory; public GCE/Let's Encrypt
proofs require explicit run authorization, DNS-name control, public-ingress
teardown evidence, and staging-vs-production CA labeling.

### Browser Session and Origin Policy

The self-served Web UI keeps the Gate 1B boundary: `remote-session-web-ui` is
the trusted backend that holds remote-session/CapSet state server-side, and
browser JavaScript receives only browser-safe view models. Public exposure adds
the following reviewed browser rules:

- **Single public origin.** UI assets and the same-origin JSON API are served
  from the one HTTPS origin (the load balancer hostname). No second origin, no
  wildcard CORS, no cross-origin credentialed requests. The service-side
  policy is implemented in `remote-session-web-ui` as a boot-manifest input:
  one `public_origin.<host>` marker cap (an inert Endpoint, granted after the
  service caps) fixes the accepted `https://<host>` origin at boot, validated
  fail-closed (second marker, malformed, loopback-named, or IP-literal-shaped
  host, or any unrecognized extra grant fails the boot), and consulted by the
  `Host`/`Origin`/`Referer` gates only for requests on the trusted
  forwarded-scheme HTTPS path, so a direct client can never claim the public
  origin. Browser-supplied principal/source hint headers (IAP assertions,
  authenticated-user hints) are rejected on the public-origin path before any
  backend-held capability dispatch, no CORS headers are emitted, and login
  ingress extends to the recorded GFE ranges only when a public origin is
  configured. Locally proven by `make run-cloud-prod-remote-session-web-ui-l4`
  (in-process trusted-forwarder fixture positive plus cross-origin,
  mixed-scheme, wildcard, missing-origin, hostile-Referer, principal-hint, and
  real-ingress direct-client forged negatives); the proof claims no DNS name,
  load balancer, TLS endpoint, or live public exposure.
- **Forwarded-scheme trust is firewall-bounded.** Because the backend hop is
  plain HTTP, capOS derives the external scheme from the load balancer's
  `X-Forwarded-Proto`/forwarding headers. It must trust those headers **only**
  from the Google front-end source ranges (enforced by the firewall below), and
  treat any such header from an unexpected source as absent (default to "not
  HTTPS", fail closed on secure-context assumptions). The service-side trust
  gate is implemented in `remote-session-web-ui`
  (`forwarded_scheme_peer_trusted` / `external_scheme_is_https`, pinned to
  `130.211.0.0/22` and `35.191.0.0/16`, fail-closed on unknown peer formats)
  and locally proven by `make run-cloud-prod-remote-session-web-ui-l4`: a real
  ingress client forging `X-Forwarded-Proto: https` keeps the non-`Secure`
  cookie posture, and a fixture simulating the recorded ranges is the only path
  that flips the session cookie to `Secure`. The local proof remains
  plaintext-loopback and claims no live load balancer or TLS endpoint.
- **Session cookies.** The session cookie is `Secure`, `HttpOnly`, and
  `SameSite`. The `SameSite` value is picked deterministically rather than
  mid-slice: `Strict` when no IAP front door is used, and `Lax` when IAP is
  enabled (the IAP sign-in redirect is a cross-site top-level navigation that
  would drop a `Strict` cookie on return). `Secure` is honored because the
  browser only ever sees the cookie over the load balancer's HTTPS origin.
  The switch is implemented in `remote-session-web-ui` as a boot-manifest
  policy input: an IAP-fronted deployment manifest grants the inert
  `iap_fronted_ingress` marker cap (last in the web-ui grant list) to select
  `Lax`; without it the service emits `Strict`, and `SameSite=None` is never
  emitted. The posture applies uniformly to the session, CSRF, and
  logout/expiry clear-cookie headers, stays independent of the
  forwarded-scheme-derived `Secure` attribute, and is fixed at boot so no
  request header, cookie, or body field can select the weaker branch. Because
  a `Lax` cookie attaches on cross-site top-level GET navigations, the Lax
  posture additionally rejects authenticated GET views whose Fetch Metadata
  provenance (`Sec-Fetch-Site`) is cross-site -- and cookie-bearing GETs with
  no Fetch Metadata at all, covering legacy browsers and webviews that attach
  Lax cookies without stating provenance -- before any session state is
  touched; the gate is inert under `Strict`, where the cookie never attaches
  cross-site.
  `make run-cloud-prod-remote-session-web-ui-l4` proves the default `Strict`
  posture end to end (including a real-ingress login forging IAP-shaped
  headers and body fields) and the `Lax` branch through the service's
  in-process policy fixture; the live IAP-fronted deployment is future work.
- **HSTS and redirect.** The HTTPS edge sets
  `Strict-Transport-Security` with a conservative `max-age` (no `preload`,
  no `includeSubDomains` commitment for the first proof). Any port-80 listener
  is a 301 to HTTPS only.
- **CSRF.** State-changing JSON routes require a per-session anti-CSRF token and
  an `Origin`/`Referer` check against the known public origin; cross-origin or
  origin-absent state changes are rejected.
- **Session lifetime and logout.** Sessions carry a bounded idle timeout and an
  absolute lifetime. Logout drops the server-side session and clears the cookie;
  the existing self-served stale-session / logout failure-closed boundary
  (proven in the Gate 1B implementation gate) extends unchanged to the public
  endpoint. A stale or expired cookie yields no authority.
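
The forwarded-scheme and cookie rules above can be sketched as follows. This is an illustration under stated assumptions, not the `remote-session-web-ui` implementation: the function names `peer_is_trusted_forwarder`, `derive_external_https`, and `session_cookie_attrs` are hypothetical, only the exact header value `https` is accepted, and the attribute string omits the cookie name and value.

```rust
use std::net::{IpAddr, Ipv4Addr};

/// The two recorded Google front-end ranges named in this section.
const GFE_RANGES: [(Ipv4Addr, u8); 2] = [
    (Ipv4Addr::new(130, 211, 0, 0), 22),
    (Ipv4Addr::new(35, 191, 0, 0), 16),
];

/// True when `addr` falls inside `base/prefix` (prefix must be 0..=32).
fn in_cidr(addr: Ipv4Addr, base: Ipv4Addr, prefix: u8) -> bool {
    let mask = if prefix == 0 {
        0
    } else {
        u32::MAX << (32 - u32::from(prefix))
    };
    (u32::from(addr) & mask) == (u32::from(base) & mask)
}

/// Only IPv4 peers inside the recorded ranges are trusted; every other
/// peer format fails closed.
pub fn peer_is_trusted_forwarder(peer: IpAddr) -> bool {
    match peer {
        IpAddr::V4(v4) => GFE_RANGES.iter().any(|&(base, p)| in_cidr(v4, base, p)),
        IpAddr::V6(_) => false,
    }
}

/// The external scheme counts as HTTPS only when a trusted peer says so.
/// A forwarded header from any other peer is treated as absent.
pub fn derive_external_https(peer: IpAddr, x_forwarded_proto: Option<&str>) -> bool {
    peer_is_trusted_forwarder(peer) && x_forwarded_proto == Some("https")
}

/// `Secure` follows the derived scheme; `SameSite` follows the boot-time
/// IAP marker and is never `None`.
pub fn session_cookie_attrs(https: bool, iap_fronted: bool) -> String {
    let same_site = if iap_fronted { "Lax" } else { "Strict" };
    let secure = if https { "; Secure" } else { "" };
    format!("HttpOnly; SameSite={}{}", same_site, secure)
}
```

Keeping the two inputs separate mirrors the policy above: a request header can influence `Secure` only through a trusted peer, and nothing in a request can influence `SameSite`.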

### Firewall and Source-Range Policy

The instance keeps no public IP. Ingress to the capOS UI backend port is allowed
**only** from Google's load-balancer and health-check ranges, never from
`0.0.0.0/0`:

| Allowed source | Purpose |
|---|---|
| `130.211.0.0/22`, `35.191.0.0/16` | Google Front Ends and load-balancer health checks reaching the backend port |
| `35.235.240.0/20` | Identity-Aware Proxy (only if IAP fronting or IAP-tunneled SSH/diagnostics is used) |

No other ingress rule is created. The proof does not broaden the service
account, add API scopes beyond the LB/health-check need, open SSH to the public
internet, or attach a broad firewall tag. Egress stays default-deny-friendly:
the LB-terminated path needs no capOS outbound, and the future ACME path (which
would require egress `443` to the ACME directory) is explicitly out of scope
here.
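
A plan-time check of the source-range table might look like the sketch below. The `validate_source_ranges` function and `RuleError` type are hypothetical names for illustration. Rejecting an empty list is an assumption of this sketch, made so that a rule never falls back to provider defaults.

```rust
/// Source ranges the table above permits for the backend port.
const LB_RANGES: [&str; 2] = ["130.211.0.0/22", "35.191.0.0/16"];
/// Permitted only when IAP fronting or IAP-tunneled diagnostics is used.
const IAP_RANGE: &str = "35.235.240.0/20";

#[derive(Debug, PartialEq, Eq)]
pub enum RuleError {
    EmptyRule,
    WildcardIngress,
    IapRangeWithoutIap,
    UnexpectedRange(String),
}

/// Validates the source ranges of one proposed ingress rule.
pub fn validate_source_ranges(ranges: &[&str], iap_enabled: bool) -> Result<(), RuleError> {
    if ranges.is_empty() {
        return Err(RuleError::EmptyRule);
    }
    for &range in ranges {
        if range == "0.0.0.0/0" {
            return Err(RuleError::WildcardIngress);
        }
        if range == IAP_RANGE {
            if !iap_enabled {
                return Err(RuleError::IapRangeWithoutIap);
            }
            continue;
        }
        if !LB_RANGES.iter().any(|&allowed| allowed == range) {
            return Err(RuleError::UnexpectedRange(range.to_string()));
        }
    }
    Ok(())
}
```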

### Backend Health-Check Contract (Local Proof Landed)

The backend port is reachable only from the GFE/health-check ranges above, so
the load balancer's health checker is the route's only intended public caller.
The backend health contract, proven locally by
`make run-cloud-prod-remote-session-web-ui-l4`:

- **Route**: `GET /healthz` on the Web UI backend port, served by
  `demos/remote-session-web-ui` (`HEALTH_BODY`). The exact bounded response
  body is `{"ok":true,"service":"remote-session-web-ui"}` with
  `Content-Type: application/json` and `Cache-Control: no-store`; it carries no
  cap ids, session ids, user/profile names, endpoint handles, provider resource
  ids, host paths, or secret material.
- **No authority**: the route is unauthenticated and never creates, rotates,
  refreshes, or consumes a browser session; it never emits `Set-Cookie`, and a
  presented (even forged) session cookie changes nothing. The local proof
  drives a `/healthz` probe with live session cookies against an idle-expired
  session and asserts the next authenticated call still fails closed. It is the
  only unauthenticated public-ingress liveness exception; the
  Host/Origin/CSRF/session gates on authority-bearing routes are unchanged.
  (`/api/health` remains the bundled operator app's same-origin page-load ping
  with the same no-authority posture; the provider health check never probes
  it.)
- **Host-gate exemption**: the health checker probes the backend by IP, so
  `/healthz` deliberately does not require the loopback/public-host `Host`
  allowlist that authority-bearing routes enforce.
- **Fail-closed variants**: non-`GET` methods and path variants
  (`POST /healthz`, `/healthz/extra`, `/HEALTHZ`) return 404 without reaching
  any authority-bearing handler.
- **Availability under abuse**: the slow-client phases of the L4 smoke prove a
  concurrent `/healthz` keeps completing while idle, partial-request, and
  drip-feed clients are held open, and after they are abandoned.

This is local backend readiness for the selected policy
(`evidence-class=local-qemu`), not a live GCE health check: no health-check
resource, load balancer, firewall rule, or public endpoint exists, and a
passing local contract proof authorizes none of them.
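
The route contract can be summarized in a few lines. The sketch below is not the `demos/remote-session-web-ui` handler: `respond_health` and `HealthResponse` are hypothetical names, and exact-match handling of the path (so a query string also yields 404) is an assumption of the example.

```rust
/// Exact bounded health body quoted in the contract above.
pub const HEALTH_BODY: &str = r#"{"ok":true,"service":"remote-session-web-ui"}"#;

#[derive(Debug, PartialEq, Eq)]
pub struct HealthResponse {
    pub status: u16,
    pub headers: Vec<(&'static str, &'static str)>,
    pub body: &'static str,
}

/// Answers the health route. The presented cookie is accepted as an input
/// only to show that it is ignored: no session is read, created, or cleared.
pub fn respond_health(method: &str, path: &str, _cookie: Option<&str>) -> HealthResponse {
    if method == "GET" && path == "/healthz" {
        HealthResponse {
            status: 200,
            headers: vec![
                ("Content-Type", "application/json"),
                ("Cache-Control", "no-store"),
            ],
            body: HEALTH_BODY,
        }
    } else {
        // Method and path variants never reach an authority-bearing handler.
        HealthResponse {
            status: 404,
            headers: Vec::new(),
            body: "",
        }
    }
}
```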

### Audit and Evidence Fields

The public proof records, before teardown, at least:

- selected ingress shape (`https-load-balancer`) and whether IAP was enabled;
- public endpoint (hostname and HTTPS virtual IP);
- TLS posture: terminator (`google-frontend`), certificate type
  (`google-managed` or `operator-supplied`), and the load balancer SSL-policy
  minimum TLS version;
- authentication method exercised (capOS `SessionManager` login, and Google IAM
  identity if IAP is enabled);
- firewall/forwarding scope: the named source ranges, backend port, and the
  URL-map/forwarding-rule chain created;
- HTTP-to-HTTPS redirect and HSTS header observation;
- teardown result for every resource the proof created.

### Teardown Checklist

The existing harness deletes the instance, image, and staging tarball in an
`EXIT INT TERM` trap. The public proof extends that trap to delete, in
dependency order, every ingress resource it creates:

- global forwarding rule and target HTTPS proxy;
- URL map and any HTTP-to-HTTPS redirect URL map / target HTTP proxy;
- backend service and health check;
- zonal/serverless NEG or managed instance group backing the backend;
- managed certificate / certificate-map entry / SSL policy created for the run;
- the LB-scoped and (if used) IAP-scoped firewall rules;
- the reserved external IP address, if one was allocated for the LB;
- the instance, image, and staged tarball (existing harness behavior).

Teardown must be idempotent and must run on signal or partial failure, matching
the existing orphan-sweep discipline. A run that cannot confirm deletion of an
ingress resource is a failed run, not a passed one.
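
A minimal model of the teardown discipline follows: delete in dependency order, confirm every deletion, and treat anything unconfirmed as a failure. All names here are hypothetical (`Provider`, `StubProvider`, `TeardownError`, and the shortened class list), and the `capos-test-` prefix check is borrowed from the harness's sweepable-marker convention. The real engine is a shell script over provider CLIs.

```rust
use std::collections::BTreeSet;

const MARKER: &str = "capos-test-";

/// Hypothetical class labels, dependents first, so each resource is deleted
/// before the resources it references.
const TEARDOWN_ORDER: [&str; 5] = [
    "forwarding-rule",
    "target-https-proxy",
    "url-map",
    "backend-service",
    "health-check",
];

pub trait Provider {
    /// Request deletion; deleting an absent resource is not an error.
    fn delete(&mut self, class: &str, name: &str);
    /// `Some(true)` if the resource still exists, `None` if state is unknown.
    fn exists(&self, class: &str, name: &str) -> Option<bool>;
}

#[derive(Debug, PartialEq, Eq)]
pub enum TeardownError {
    /// Journal entry (by index) with an unknown class or unmarked name.
    RejectedJournal(usize),
    Undeleted(String),
    ResourceStateUnknown(String),
}

pub fn teardown(
    p: &mut dyn Provider,
    journal: &[(&str, &str)],
) -> Result<usize, Vec<TeardownError>> {
    // Validate the whole journal before making any provider call.
    for (i, &(class, name)) in journal.iter().enumerate() {
        let known = TEARDOWN_ORDER.iter().any(|&c| c == class);
        if !known || !name.starts_with(MARKER) {
            return Err(vec![TeardownError::RejectedJournal(i)]);
        }
    }
    let mut confirmed = 0;
    let mut failures = Vec::new();
    for &class in TEARDOWN_ORDER.iter() {
        for &(_, name) in journal.iter().filter(|&&(c, _)| c == class) {
            p.delete(class, name);
            match p.exists(class, name) {
                Some(false) => confirmed += 1,
                Some(true) => failures.push(TeardownError::Undeleted(class.to_string())),
                None => failures.push(TeardownError::ResourceStateUnknown(class.to_string())),
            }
        }
    }
    if failures.is_empty() { Ok(confirmed) } else { Err(failures) }
}

/// In-memory stand-in for a provider CLI; `sticky` names survive deletion.
pub struct StubProvider {
    live: BTreeSet<String>,
    sticky: BTreeSet<String>,
    pub calls: Vec<String>,
}

impl StubProvider {
    pub fn new(live: &[&str], sticky: &[&str]) -> Self {
        Self {
            live: live.iter().map(|s| s.to_string()).collect(),
            sticky: sticky.iter().map(|s| s.to_string()).collect(),
            calls: Vec::new(),
        }
    }
}

impl Provider for StubProvider {
    fn delete(&mut self, _class: &str, name: &str) {
        self.calls.push(name.to_string());
        if !self.sticky.contains(name) {
            self.live.remove(name);
        }
    }
    fn exists(&self, _class: &str, name: &str) -> Option<bool> {
        Some(self.live.contains(name))
    }
}
```

Because the function continues past a failed deletion, one stuck resource does not prevent cleanup of the rest, yet the run still reports failure.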

### Local Plan Gate (Landed)

The resource graph above is locally reviewable before any billable work:
`tools/cloudboot/plan-public-webui-ingress.sh` renders and validates the
selected plan shape with zero provider interaction, and
`make cloudboot-public-webui-ingress-plan-check` is the fixture gate proving
each rejected hazard (raw public HTTP to capOS, instance public IP,
`0.0.0.0/0` backend ingress, missing `/healthz` health check, broad service
account/scopes, staged private-key material, non-provider certificate custody)
fails closed by structured class before any provider CLI could be invoked.
Output is stamped `evidence-class=cloudboot-local-plan` with
`operator-exposure=not-proven`; a plan pass is not public reachability, TLS
readiness, or authorization for the on-hold public proof. The command contract
and failure classes are documented in `tools/cloudboot/README.md` ("Public Web
UI ingress plan gate").

### Local Teardown Fixture Gate (Landed)

The teardown checklist above is locally proven before any billable work:
`tools/cloudboot/teardown-public-webui-ingress.sh` is the dependency-ordered,
idempotent, deletion-confirming teardown engine over a per-run
created-resources journal, and
`make cloudboot-public-webui-teardown-fixture-check` exercises it against
recording stub provider CLIs across complete, partial-create,
command-failure, delete-claims-success-but-persists, unreadable-state,
signal-trap, and orphan-sweep paths. Every checklist resource class is
modeled and the engine's class list must equal the plan gate's rendered
`teardown-order=` line (the fixture fails on drift), so a class added to the
selected plan cannot go missing from the cleanup graph. An unconfirmed
deletion is a blocking structured failure (`undeleted-<class>` /
`resource-state-unknown`), matching the failed-run policy above. All
public-ingress resource names must carry the `capos-test-` sweepable marker;
a journal naming anything else is rejected before any provider call, and the
orphan sweep enforces the marker client-side so out-of-scope resources are
never deleted. Output is stamped
`evidence-class=cloudboot-local-teardown-fixture live-teardown=not-proven`;
a fixture pass is local harness evidence only, never live provider teardown
evidence, and authorizes no public ingress. The journal grammar, sweep
contract, and failure classes are documented in `tools/cloudboot/README.md`
("Public Web UI ingress teardown fixture gate").

### Local Evidence Fixture Gate (Landed)

The "Audit and Evidence Fields" contract above is locally proven before any
billable work: `tools/cloudboot/validate-public-webui-evidence.sh` validates
a harness-rendered public-proof closeout report against the selected
evidence grammar, and
`make cloudboot-public-webui-proof-evidence-fixture-check` is the fixture
gate proving accepted and rejected reports over stub inputs with zero
provider CLI invocations. Acceptance requires the recorded ingress shape,
public HTTPS hostname/VIP, provider TLS terminator and managed or
operator-supplied certificate resource, minimum TLS policy, IAP posture,
no-key-custody statement, no-public-IP instance posture, GFE/health-check
firewall scope, health-check, HTTP-to-HTTPS redirect and HSTS observations,
capOS `SessionManager` login observation, a public HTTPS probe record, the
correlated `gce-public-self-hosted-webui-ingress-tls` proof marker, and a
per-resource teardown record pinned to the plan gate's `teardown-order=`
class list (the fixture fails on drift). Raw public HTTP, a direct
instance public IP, wildcard backend ingress, a missing health check,
missing HSTS/redirect observation, capOS or harness private-key custody,
stale/missing/incomplete teardown, a non-provider TLS terminator, and
private-proof-only evidence (a same-VPC or provider-internal probe path,
or a proof marker without a recorded HTTPS probe) each fail closed by
structured class. The `tls terminator=` label structurally separates this
provider-terminated evidence contract from the later capOS-terminated TLS
successor, so successor evidence can never pass through the first-proof
grammar. Output names field names, classes, and line numbers only; input
values are never echoed. Every pass is stamped
`evidence-class=cloudboot-local-public-webui-evidence-fixture` with
`operator-exposure=not-proven`: a fixture pass is local evidence-grammar
validation only, never public reachability or operator-access evidence,
and it does not authorize public exposure or move the live proof out of
[`cloud-gce-public-self-hosted-webui-ingress-tls`](../tasks/on-hold/cloud-gce-public-self-hosted-webui-ingress-tls.md).
The report grammar and failure classes are documented in
`tools/cloudboot/README.md` ("Public Web UI evidence-grammar fixture
gate").

### Local Provider Command Allowlist Gate (Landed)

The provider command boundary the future public proof may use is locally
proven before any billable work:
`tools/cloudboot/check-public-webui-provider-commands.sh` validates a
recorded provider-command transcript against the selected resource graph,
and `make cloudboot-public-webui-provider-command-allowlist-check` is the
fixture gate proving both directions over recording stub `gcloud`/`gsutil`
with zero live provider invocations. The allowlist permits only the
resource families the plan and teardown checklist name -- forwarding rules,
target HTTPS/HTTP proxies, URL maps, backend services, health checks,
zonal NEGs, scoped firewall rules, managed-certificate resources, SSL
policies, reserved addresses, instance/image creation, and staged
tarball upload/delete -- and requires the `capos-test-` marker on every
created resource, journal-pinned deletion (a delete must name a resource
the created-resources journal recorded), GFE/IAP-only firewall source
ranges, the `capos-test` filter on every listing, marker discipline on
create-wired references, per-surface create flags and parameters pinned to
the selected graph shape, an explicit pin of the documented sandbox project on
every command, and explicit `--global`/`--zone` scope on deletes (ambient
Cloud SDK project/region defaults are never trusted). Drift toward broader
provider authority fails closed by structured class: IAM mutation,
service-account/scopes changes, DNS
mutation, private-key upload, `0.0.0.0/0` backend ingress, unmarked
resources, deletion outside the journal (zone-pinned), project-wide or
filter-restating sweeps, ambient credential flags, project/network/region
scope overrides beyond the pinned sandbox forms, `--flags-file`
indirection, non-selected create parameters, shell/environment
inspection, and provider CLI resolution from an unexpected path. Rejected
command content is reported by class and line number only; credentials,
principals, key paths, and rejected names are never echoed. Output is
stamped `evidence-class=cloudboot-local-provider-command-allowlist` with
`provider-mutation=none`: a pass narrows what the future live proof may
execute, it is not live provider evidence and does not authorize the
on-hold public proof. The transcript grammar and failure classes are
documented in `tools/cloudboot/README.md` ("Public Web UI
provider-command allowlist gate").

---

## Phase 2: ACPI and Device Discovery

**Goal:** Parse ACPI tables to discover hardware topology, interrupt routing,
and PCI root complexes. This replaces QEMU-specific hardcoded assumptions.

### Why ACPI

On QEMU with default settings, you can hardcode PCI config access through I/O
ports `0xCF8`/`0xCFC` and assume legacy interrupt routing. On real cloud hardware:

- PCI root complex addresses come from ACPI MCFG table (PCIe ECAM)
- Interrupt routing comes from ACPI MADT (I/O APIC entries) and \_PRT
- CPU topology comes from ACPI MADT (LAPIC entries)
- Timer info comes from the ACPI HPET table and the FADT (PM timer)

Limine provides the RSDP (Root System Description Pointer) address via its
protocol. From there, the kernel can walk RSDT/XSDT to find specific tables.
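
Before walking any table, the RSDP itself needs validation. The sketch below checks the signature and both checksums over an already-mapped byte slice and returns the root table's physical address. The function name `root_table_address` and the `RsdpError` type are hypothetical; the field offsets follow the ACPI RSDP layout (20 bytes for revision 0, 36 bytes for revision 2 and later).

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum RsdpError {
    TooShort,
    BadSignature,
    BadChecksum,
    BadLength,
    BadExtendedChecksum,
}

/// ACPI checksum rule: all covered bytes sum to zero modulo 256.
fn sums_to_zero(bytes: &[u8]) -> bool {
    bytes.iter().fold(0u8, |acc, b| acc.wrapping_add(*b)) == 0
}

fn le_u32(b: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([b[off], b[off + 1], b[off + 2], b[off + 3]])
}

fn le_u64(b: &[u8], off: usize) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(&b[off..off + 8]);
    u64::from_le_bytes(a)
}

/// Validates an RSDP image and returns the physical address of the root
/// table: the XSDT for revision >= 2, otherwise the RSDT.
pub fn root_table_address(rsdp: &[u8]) -> Result<u64, RsdpError> {
    const V1_LEN: usize = 20;
    const V2_LEN: usize = 36;
    if rsdp.len() < V1_LEN {
        return Err(RsdpError::TooShort);
    }
    if &rsdp[0..8] != b"RSD PTR " {
        return Err(RsdpError::BadSignature);
    }
    if !sums_to_zero(&rsdp[..V1_LEN]) {
        return Err(RsdpError::BadChecksum);
    }
    let revision = rsdp[15];
    if revision < 2 {
        return Ok(u64::from(le_u32(rsdp, 16)));
    }
    if rsdp.len() < V2_LEN {
        return Err(RsdpError::TooShort);
    }
    let length = le_u32(rsdp, 20) as usize;
    if length < V2_LEN || length > rsdp.len() {
        return Err(RsdpError::BadLength);
    }
    if !sums_to_zero(&rsdp[..length]) {
        return Err(RsdpError::BadExtendedChecksum);
    }
    Ok(le_u64(rsdp, 24))
}
```

The returned value is a physical address; the caller still has to map it (for example through the HHDM) and validate the root table's own length and checksum.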

### Required Tables

| Table | Purpose | Priority |
|---|---|---|
| MADT | LAPIC and I/O APIC addresses, CPU enumeration | High (Phase 2) |
| MCFG | PCIe Enhanced Configuration Access Mechanism base | High (Phase 2) |
| HPET | High Precision Event Timer address | Medium (fallback timer) |
| FADT | PM timer, shutdown/reset methods | Low (future) |

### Landed Discovery Slice

The first landed slices are bounded diagnostics plus reusable config access.
The ACPI parser requests Limine's RSDP, validates RSDP/RSDT/XSDT/static-table
lengths and checksums
within fixed caps, emits serial summaries for RSDT/XSDT table count and
MADT/MCFG presence, reports MADT LAPIC/I/O APIC/interrupt-source-override
inputs, and reports MCFG ECAM allocation records when firmware provides the
table. The PCI layer now keeps the existing legacy I/O-port backend and adds an
ECAM backend selected from MCFG allocations; devices retain their discovery
backend so config reads, writes, capability walking, and BAR sizing use the
same access path. The PCI layer also exposes a shared memory-BAR subregion
validator/mapper, and the virtio-net transport uses it for modern capability
regions. It also reports MSI/MSI-X capability metadata for the virtio-net
function and uses kernel-owned config/RX/TX source records with a bounded
first-fit LAPIC device MSI vector pool plus lock-free dispatch slots for QEMU
virtio-net MSI-X table programming, virtio vector assignment, driver-owned
route unmask, claimed-route lifecycle/reassignment proof, and TX delivery
proof. The x86 setup
maps MADT I/O APICs and programs masked legacy IRQ routes from MADT source
overrides before higher-level drivers can depend on interrupt routing. The Q35
smoke asserts the ECAM inventory lines, a
`pci: config backend=ecam enumerated ...` proof line, and representative masked
I/O APIC route lines; the net smoke asserts virtio-net BAR, capability, MSI-X
metadata, source-route records, route unmask records, vector programming,
queue assignment, descriptor guards, ARP, and ICMP fixture lines before
MMIO transport mapping completes. This path does not interpret AML, provide
userspace driver authorities, or provide full unbounded bus discovery yet.
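
The ECAM backend's address arithmetic is small enough to show in full. The sketch below is illustrative rather than the landed backend: `EcamAllocation` and `config_address` are hypothetical names, and it assumes the MCFG base address corresponds to bus 0 of the segment (the bus number is not rebased to `start_bus`), which should be checked against the PCI Firmware Specification.

```rust
/// One MCFG allocation record: an ECAM window for a PCI segment and bus range.
#[derive(Debug, Clone, Copy)]
pub struct EcamAllocation {
    pub base: u64,
    pub segment: u16,
    pub start_bus: u8,
    pub end_bus: u8,
}

impl EcamAllocation {
    /// Physical address of a config-space byte, or `None` when the request
    /// falls outside the window. Each function owns 4 KiB of config space.
    pub fn config_address(&self, bus: u8, device: u8, function: u8, offset: u16) -> Option<u64> {
        if bus < self.start_bus || bus > self.end_bus {
            return None;
        }
        if device >= 32 || function >= 8 || offset >= 4096 {
            return None;
        }
        // ECAM layout: bus in bits 27:20, device in 19:15, function in 14:12.
        let rel = (u64::from(bus) << 20)
            | (u64::from(device) << 15)
            | (u64::from(function) << 12)
            | u64::from(offset);
        self.base.checked_add(rel)
    }
}
```

Returning `None` instead of an out-of-window address keeps the bounded-inventory property: a malformed MCFG record cannot steer a config read outside the window it declared.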

### Implementation

```rust
// kernel/src/acpi.rs

/// Minimal ACPI table parser.
/// Walks RSDP -> XSDT -> individual tables.
/// Does NOT implement AML interpretation -- static tables only.

pub struct AcpiInfo {
    pub lapics: Vec<LapicEntry>,
    pub io_apics: Vec<IoApicEntry>,
    pub iso_overrides: Vec<InterruptSourceOverride>,
    pub mcfg_base: Option<u64>,  // PCIe ECAM base address
    pub hpet_base: Option<u64>,
}

pub fn parse_acpi(rsdp_addr: u64, hhdm: u64) -> AcpiInfo { ... }
```
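
To make the MADT fields above concrete, here is a bounded walk over the MADT's interrupt-controller entries. It is a sketch under assumptions: `parse_madt_entries`, `MadtEntry`, and the `MAX_ENTRIES` cap are hypothetical, the caller is assumed to have validated the table's length and checksum already, and only entry types 0, 1, and 2 are decoded.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum MadtEntry {
    LocalApic { apic_id: u8, flags: u32 },
    IoApic { id: u8, address: u32, gsi_base: u32 },
    SourceOverride { source_irq: u8, gsi: u32, flags: u16 },
}

#[derive(Debug, PartialEq, Eq)]
pub enum MadtError {
    TooShort,
    BadEntryLength,
    TooManyEntries,
}

/// 36-byte SDT header + 4-byte local APIC address + 4-byte flags.
const ENTRIES_OFFSET: usize = 44;
/// Hypothetical cap so a hostile table cannot force unbounded work.
const MAX_ENTRIES: usize = 64;

fn le_u32(b: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([b[off], b[off + 1], b[off + 2], b[off + 3]])
}

pub fn parse_madt_entries(table: &[u8]) -> Result<Vec<MadtEntry>, MadtError> {
    if table.len() < ENTRIES_OFFSET {
        return Err(MadtError::TooShort);
    }
    let mut out = Vec::new();
    let mut seen = 0;
    let mut off = ENTRIES_OFFSET;
    while off < table.len() {
        if off + 2 > table.len() {
            return Err(MadtError::BadEntryLength);
        }
        let kind = table[off];
        let len = table[off + 1] as usize;
        if len < 2 || off + len > table.len() {
            return Err(MadtError::BadEntryLength);
        }
        seen += 1;
        if seen > MAX_ENTRIES {
            return Err(MadtError::TooManyEntries);
        }
        let e = &table[off..off + len];
        match (kind, len) {
            (0, 8) => out.push(MadtEntry::LocalApic { apic_id: e[3], flags: le_u32(e, 4) }),
            (1, 12) => out.push(MadtEntry::IoApic {
                id: e[2],
                address: le_u32(e, 4),
                gsi_base: le_u32(e, 8),
            }),
            (2, 10) => out.push(MadtEntry::SourceOverride {
                source_irq: e[3],
                gsi: le_u32(e, 4),
                flags: u16::from_le_bytes([e[8], e[9]]),
            }),
            // A known type with the wrong length is malformed.
            (0..=2, _) => return Err(MadtError::BadEntryLength),
            // Unknown types are skipped using their declared length.
            _ => {}
        }
        off += len;
    }
    Ok(out)
}
```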

For the fuller static-table subsystem, prefer the `acpi` crate (or an
equivalent maintained `no_std` parser) rather than expanding the diagnostic
parser into a general hand-written ACPI stack. The landed parser is a boot-time
inventory proof for RSDP/RSDT/XSDT/MADT/MCFG summaries; it can be retired or
narrowed once the crate-backed table model fits capOS mapping and table
lifetime constraints.

### Limine RSDP

```rust
use limine::request::RsdpRequest;

static RSDP: RsdpRequest = RsdpRequest::new();

// In kmain:
let rsdp_addr = RSDP.response().expect("no RSDP").address as u64;
let acpi_info = acpi::parse_acpi(rsdp_addr, hhdm_offset);
```

### Crate Dependencies

| Crate | Purpose | no_std |
|---|---|---|
| `acpi` | Planned fuller/static ACPI table parsing (MADT, MCFG, HPET, FADT, etc.) | yes |

### Scope

The landed diagnostic slice is kernel-local bounded read-only parsing for
serial inventory. Fuller handling should be mostly glue around a maintained
static-table parser plus capOS mapping, lifetime, and authority types.

---

## Phase 3: Interrupt Infrastructure

**Goal:** Set up I/O APIC for device interrupt routing and MSI/MSI-X for
modern PCI devices. This replaces the implicit legacy PIC setup.

### I/O APIC

The I/O APIC routes external device interrupts (keyboard, serial, PCI devices)
to specific LAPICs (CPUs). Its address and configuration come from the
ACPI MADT (Phase 2).

```rust
// kernel/src/arch/x86_64/ioapic.rs

pub struct IoApic {
    base: *mut u32,  // MMIO registers via HHDM
}

impl IoApic {
    /// Route an IRQ to a specific LAPIC/vector.
    pub fn route_irq(&mut self, irq: u8, lapic_id: u8, vector: u8) { ... }

    /// Mask/unmask an IRQ line.
    pub fn set_mask(&mut self, irq: u8, masked: bool) { ... }
}
```

The current x86 implementation maps MADT I/O APIC MMIO, reads each controller's
ID/version/redirection count, and programs legacy IRQ 0-15 routes to LAPIC
vectors while keeping the redirection entries masked. It respects Interrupt
Source Override entries from MADT (for example, Q35 remaps IRQ 0 to GSI 2).
Driver-owned unmask policy, dispatch, and EOI handling remain planned.
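
The routing step can be made concrete with two pure helpers: one that applies MADT source overrides to a legacy IRQ, and one that encodes a redirection entry. This is a sketch, not the landed x86 code. The function names are hypothetical, overrides are passed as `(source_irq, gsi, flags)` tuples, and only fixed delivery with physical destination mode is modeled.

```rust
/// Resolves a legacy ISA IRQ to `(gsi, active_low, level_triggered)`.
/// Override flags use the MADT encoding: polarity in bits 1:0 and trigger
/// mode in bits 3:2, where `0b11` means active-low / level. Without an
/// override, ISA defaults apply: identity mapping, active-high, edge.
pub fn resolve_legacy_irq(irq: u8, overrides: &[(u8, u32, u16)]) -> (u32, bool, bool) {
    for &(source, gsi, flags) in overrides {
        if source == irq {
            let active_low = (flags & 0b11) == 0b11;
            let level = ((flags >> 2) & 0b11) == 0b11;
            return (gsi, active_low, level);
        }
    }
    (u32::from(irq), false, false)
}

/// Encodes a 64-bit I/O APIC redirection entry (fixed delivery, physical
/// destination): vector in bits 7:0, polarity bit 13, trigger bit 15,
/// mask bit 16, destination APIC ID in bits 63:56.
pub fn redirection_entry(
    vector: u8,
    lapic_id: u8,
    active_low: bool,
    level: bool,
    masked: bool,
) -> u64 {
    let mut entry = u64::from(vector);
    if active_low {
        entry |= 1u64 << 13;
    }
    if level {
        entry |= 1u64 << 15;
    }
    if masked {
        entry |= 1u64 << 16;
    }
    entry | (u64::from(lapic_id) << 56)
}

/// Register indices (low dword, high dword) of the redirection entry for
/// an input pin, as selected through the I/O APIC index register.
pub fn redirection_registers(pin: u8) -> (u32, u32) {
    let low = 0x10 + 2 * u32::from(pin);
    (low, low + 1)
}
```

Programming a route masked, as the current implementation does, corresponds to `masked = true` here; a later unmask only needs to clear bit 16 of the low dword.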

### MSI/MSI-X

Modern PCI/PCIe devices (NVMe, cloud NICs) use Message Signaled Interrupts
instead of pin-based IRQs routed through the I/O APIC. An MSI/MSI-X interrupt
is a memory write from the device to the LAPIC message address window (the
`0xFEE00000` region on x86), so it bypasses the I/O APIC entirely.

This is critical for cloud deployment because:

- NVMe controllers require MSI or MSI-X (no legacy IRQ fallback on many
  controllers)
- Cloud NICs (ENA, gVNIC) use MSI-X exclusively
- MSI-X supports per-queue interrupts (one vector per virtqueue/submission
  queue), enabling better SMP scalability

```rust
// kernel/src/pci/msi.rs

/// Configure MSI for a PCI device.
pub fn enable_msi(device: &PciDevice, vector: u8, lapic_id: u8) { ... }

/// Configure MSI-X for a PCI device.
pub fn enable_msix(
    device: &PciDevice,
    table_bar: u8,
    entries: &[(u16, u8, u8)],  // (index, vector, lapic_id)
) { ... }
```
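
As a companion to the signatures above, the message a device is programmed with can be computed by pure functions. This is a sketch with hypothetical names (`MsiMessage`, `msi_message`, `msix_entry_words`). It models only fixed delivery, edge trigger, and physical destination mode on x86, and rejecting vectors below 32 is a policy choice of the example.

```rust
/// The address/data pair a device writes to raise an interrupt.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MsiMessage {
    pub address: u64,
    pub data: u32,
}

/// Base of the x86 MSI address window; the destination APIC ID sits in
/// bits 19:12 of the address.
const MSI_ADDRESS_BASE: u64 = 0xFEE0_0000;

/// Builds a fixed-delivery, edge-triggered message for one LAPIC.
/// Vectors below 32 are refused because they are reserved for CPU exceptions.
pub fn msi_message(lapic_id: u8, vector: u8) -> Option<MsiMessage> {
    if vector < 32 {
        return None;
    }
    Some(MsiMessage {
        address: MSI_ADDRESS_BASE | (u64::from(lapic_id) << 12),
        data: u32::from(vector),
    })
}

/// The four dwords of one MSI-X table entry, in table order: address low,
/// address high, data, and vector control (bit 0 is the per-vector mask).
pub fn msix_entry_words(msg: &MsiMessage, masked: bool) -> [u32; 4] {
    [
        msg.address as u32,
        (msg.address >> 32) as u32,
        msg.data,
        u32::from(masked),
    ]
}
```

An `enable_msix` implementation would write these dwords into the mapped table BAR for each entry, leaving entries masked until the owning driver is ready to take interrupts.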

MSI/MSI-X capability structures are found by walking the PCI capability list
(already needed for PCI enumeration in the networking proposal). The current
PCI path reports MSI/MSI-X capability metadata for virtio-net so diagnostics
can see the advertised table and pending-bit-array layout. The virtio-net QEMU
smoke now records kernel-owned config/RX/TX MSI-X sources, publishes them into
the device interrupt dispatch table, allocates LAPIC vectors from the bounded
device MSI vector pool to program their table entries and virtio vector
registers, lets the in-kernel virtio-net owner unmask only those routes, then
proves TX delivery by observing that source's dispatch counter advance after
maskable interrupts are live. The same smoke uses an unused masked MSI-X table
entry to prove claimed-route reassignment, stale old-route rejection,
old-vector unregistered delivery, reassigned-vector masked delivery,
unsupported-vector delivery, and release. Broader driver dispatch and
userspace interrupt authority remain planned.

### Integration with SMP

LAPIC initialization is shared with the SMP proposal. The active x86 path uses
xAPIC MMIO for the immediate QEMU/KVM timer and IPI foundation, with PIT/PIC
fallback. This cloud phase consumes that architectural LAPIC path for local
interrupt delivery and now adds masked ACPI MADT I/O APIC routing plus
MSI/MSI-X capability metadata discovery and a bounded virtio-net MSI-X
dispatch/lifecycle proof; userspace device interrupts remain planned.

KVM/QEMU paravirtual features such as PV EOI, PV IPI, and PV TLB flush are
host-specific accelerations. They are useful later for cloud performance, but
cloud boot correctness should use the architectural LAPIC path first. x2APIC is
a later backend for newer/high-core systems and firmware states where xAPIC is
unavailable or undesirable; it is not a blocker for the current LAPIC path.

### Scope

~300-400 lines total:
- I/O APIC driver: ~150 lines
- MSI/MSI-X setup: ~100-150 lines
- Integration/routing logic: ~50-100 lines

---

## Phase 4: PCI/PCIe Infrastructure

**Goal:** Standalone PCI bus enumeration and device management, usable by all
device drivers (virtio-net, NVMe, cloud NICs).

The networking proposal includes PCI enumeration as a substep for finding
virtio-net. This phase promotes it to a reusable kernel subsystem that all
device drivers build on.

### PCI Configuration Access

Two mechanisms, determined by ACPI:

1. **Legacy I/O ports** (0xCF8/0xCFC) -- works in QEMU, limited to 256 bytes
   of config space per function. Insufficient for PCIe extended capabilities.
2. **PCIe ECAM** (Enhanced Configuration Access Mechanism) -- memory-mapped
   config space, 4 KB per function. Base address from ACPI MCFG table. Required
   for PCIe extended capabilities (config offsets 0x100 and above); MSI/MSI-X
   capabilities and BARs remain reachable through either mechanism.
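
The address arithmetic behind the two mechanisms can be sketched as follows. Both function names are hypothetical. The ECAM helper returns an offset to add to the MCFG base address for the segment, and it assumes that base describes bus 0, as on QEMU Q35; a start bus other than 0 is left to the MCFG parser.

```rust
/// Legacy mechanism: the value written to port 0xCF8 before the data access
/// on port 0xCFC. Only the first 256 bytes of config space are reachable,
/// and the register offset is dword-aligned.
pub fn legacy_config_address(bus: u8, device: u8, function: u8, offset: u8) -> Option<u32> {
    if device >= 32 || function >= 8 {
        return None;
    }
    Some(
        0x8000_0000 // enable bit
            | ((bus as u32) << 16)
            | ((device as u32) << 11)
            | ((function as u32) << 8)
            | ((offset as u32) & 0xFC),
    )
}

/// ECAM: byte offset of a config register relative to the segment's ECAM
/// base. Each function owns 4 KB, each device 8 functions, each bus 32
/// devices.
pub fn ecam_offset(bus: u8, device: u8, function: u8, offset: u16) -> Option<u64> {
    if device >= 32 || function >= 8 || offset >= 4096 {
        return None;
    }
    Some(
        ((bus as u64) << 20)
            | ((device as u64) << 15)
            | ((function as u64) << 12)
            | offset as u64,
    )
}
```

The 12-bit register offset in the ECAM form is what makes extended capabilities at offset 0x100 and above addressable.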

Legacy I/O and Q35 ECAM config access exist today behind the same early PCI
backend abstraction. The PCI layer also validates memory BAR subregions with
checked offset/length/alignment bounds and maps selected subregions through the
kernel MMIO window for in-kernel drivers, and it records non-programming
MSI/MSI-X metadata for the current virtio-net path by walking the standard PCI
capability list. The virtio-net path now selects a usable MSI-X capability and
programs config/RX/TX table entries through the typed PCI MSI-X table helper
using the kernel-owned source records and bounded first-fit LAPIC device MSI
vectors. The QEMU net smoke lets the in-kernel virtio-net owner claim and
unmask those routes, assigns the virtio common and queue MSI-X vector
registers, and proves TX delivery by observing that source's dispatch counter
advance after the TX completion path has run and maskable interrupts are live.
It also proves claimed-route reassignment and release with an unused masked
MSI-X table entry. The next steps are using that path for full bus discovery,
userspace `DeviceMmio` authority, broader driver dispatch, and driver binding.

### Device Enumeration

```rust
// kernel/src/pci.rs

pub struct PciDevice {
    pub bus: u8,
    pub device: u8,
    pub function: u8,
    pub vendor_id: u16,
    pub device_id: u16,
    pub class: u8,
    pub subclass: u8,
    pub bars: [Option<Bar>; 6],
    pub interrupt_pin: u8,
    pub interrupt_line: u8,
}

pub enum Bar {
    Memory {
        base: u64,
        size: u64,
        prefetchable: bool,
        width: MemoryBarWidth,
    },
    Io { base: u32, size: u32 },
}

/// Scan all PCI buses and return discovered devices.
pub fn enumerate() -> Vec<PciDevice> { ... }

/// Find a device by vendor/device ID.
pub fn find_device(vendor: u16, device: u16) -> Option<PciDevice> { ... }

/// Walk the PCI capability list for a device.
pub fn capabilities(device: &PciDevice) -> Vec<PciCapability> { ... }
```
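
One way `capabilities` could work is sketched below over a 256-byte config-space snapshot rather than live hardware, which keeps the walk testable. `RawCapability`, `MsixInfo`, `walk_capabilities`, and `parse_msix` are hypothetical names; the register offsets (status at 0x06, capability pointer at 0x34) and the MSI-X capability layout follow the standard PCI definitions.

```rust
pub const CAP_ID_MSI: u8 = 0x05;
pub const CAP_ID_MSIX: u8 = 0x11;

#[derive(Debug, PartialEq, Eq)]
pub struct RawCapability {
    pub id: u8,
    pub offset: u8,
}

#[derive(Debug, PartialEq, Eq)]
pub struct MsixInfo {
    pub table_size: u16, // number of table entries
    pub table_bir: u8,   // BAR index holding the table
    pub table_offset: u32,
    pub pba_bir: u8,     // BAR index holding the pending-bit array
    pub pba_offset: u32,
}

/// Walk the standard capability list in a config-space snapshot.
pub fn walk_capabilities(cfg: &[u8; 256]) -> Vec<RawCapability> {
    let mut found = Vec::new();
    // Status register bit 4 advertises a capability list.
    if cfg[0x06] & 0x10 == 0 {
        return found;
    }
    let mut ptr = cfg[0x34] & 0xFC;
    // At most 48 dword-aligned slots fit in 0x40..0x100; the bound keeps a
    // malformed cyclic list from looping forever.
    for _ in 0..48 {
        if ptr < 0x40 {
            break;
        }
        let off = ptr as usize;
        found.push(RawCapability { id: cfg[off], offset: ptr });
        ptr = cfg[off + 1] & 0xFC;
    }
    found
}

/// Decode the MSI-X capability at `cap_offset`, if that is what lives there.
pub fn parse_msix(cfg: &[u8; 256], cap_offset: u8) -> Option<MsixInfo> {
    let off = cap_offset as usize;
    if off < 0x40 || off + 12 > cfg.len() || cfg[off] != CAP_ID_MSIX {
        return None;
    }
    let dword = |at: usize| u32::from_le_bytes([cfg[at], cfg[at + 1], cfg[at + 2], cfg[at + 3]]);
    let control = u16::from_le_bytes([cfg[off + 2], cfg[off + 3]]);
    let (table, pba) = (dword(off + 4), dword(off + 8));
    Some(MsixInfo {
        table_size: (control & 0x07FF) + 1, // field is N-1 encoded
        table_bir: (table & 0x7) as u8,
        table_offset: table & !0x7,
        pba_bir: (pba & 0x7) as u8,
        pba_offset: pba & !0x7,
    })
}
```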

### BAR Mapping

Device drivers need MMIO access to BAR regions. The kernel now maps validated
memory-BAR subregions into its bounded MMIO virtual window for in-kernel
drivers. A future `DeviceMmio` capability will carry equivalent authority to
userspace drivers as described in the networking proposal.
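
The checked offset/length/alignment validation can be sketched as a pure function that runs before any mapping happens. `validate_subregion` and `SubregionError` are hypothetical names, and the real helper may enforce additional policy (for example page granularity or BAR type checks) that is not shown here.

```rust
use core::ops::Range;

#[derive(Debug, PartialEq, Eq)]
pub enum SubregionError {
    ZeroLength,
    Misaligned,
    OutOfBounds,
}

/// Check that `[offset, offset + len)` lies inside a BAR of `bar_size` bytes
/// and that `offset` honors `align`, which must be a power of two.
pub fn validate_subregion(
    bar_size: u64,
    offset: u64,
    len: u64,
    align: u64,
) -> Result<Range<u64>, SubregionError> {
    if len == 0 {
        return Err(SubregionError::ZeroLength);
    }
    if !align.is_power_of_two() || offset & (align - 1) != 0 {
        return Err(SubregionError::Misaligned);
    }
    // checked_add rejects offset/length pairs that would wrap around.
    let end = offset.checked_add(len).ok_or(SubregionError::OutOfBounds)?;
    if end > bar_size {
        return Err(SubregionError::OutOfBounds);
    }
    Ok(offset..end)
}
```

Because device-reported offsets (such as virtio capability offsets) are untrusted input, rejecting wraparound with `checked_add` matters as much as the simple bounds check.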

### PCI Device IDs for Cloud Hardware

| Device | Vendor:Device | Cloud |
|---|---|---|
| virtio-net | 1AF4:1000 (transitional) or 1AF4:1041 (modern) | QEMU, supported first/second-generation GCP machine families |
| virtio-blk | 1AF4:1001 (transitional) or 1AF4:1042 (modern) | QEMU |
| NVMe | 8086:various, 144D:various, etc. | All clouds (EBS, PD, Managed Disk) |
| AWS ENA | 1D0F:EC20 / 1D0F:EC21 | AWS |
| GCP gVNIC | 1AE0:0042 | GCP |
| Azure MANA | 1414:00BA | Azure |

### Scope

~400-500 lines:
- Config space access (I/O + ECAM): ~100 lines
- Bus enumeration: ~150 lines
- BAR parsing and mapping: ~100 lines
- Capability list walking: ~50-100 lines

---

## Phase 5: NVMe Driver

**Goal:** Basic NVMe block device driver, sufficient to read/write sectors.
This is the storage equivalent of virtio-net for networking -- the first
real storage driver.

### Why NVMe Over virtio-blk

The storage-and-naming proposal mentions virtio-blk for Phase 3 (persistent
store). On cloud VMs, all three providers expose NVMe:

- **AWS EBS** -- NVMe interface (even for gp3/io2 volumes)
- **GCP Persistent Disk** -- NVMe or SCSI (NVMe is default for newer VMs)
- **Azure Managed Disks** -- SCSI on many older VM families such as D/Ev5 or
  Fv2 and older; NVMe on Azure Boost and newer NVMe-capable families such as
  Ebsv5 and Da/Ea/Fav6 and newer

Among the targets in this proposal, only QEMU exposes virtio-blk. An NVMe
driver unlocks persistent storage on all cloud platforms where the selected VM
shape exposes NVMe. QEMU also emulates NVMe well, so local testing can use the
same driver:
`-drive file=disk.img,if=none,id=d0 -device nvme,drive=d0,serial=capos0`.

### NVMe Architecture

NVMe is a register-level standard with well-defined queue-pair semantics:

```
Application
    |
    v
Submission Queue (SQ) -- ring buffer of 64-byte command entries
    |
    | doorbell write (MMIO)
    v
NVMe Controller (hardware)
    |
    | DMA completion
    v
Completion Queue (CQ) -- ring buffer of 16-byte completion entries
    |
    | MSI-X interrupt
    v
Driver processes completions
```

Minimum viable driver needs:
1. Admin Queue Pair (for identify, create I/O queues)
2. One I/O Queue Pair (for read/write commands)
3. MSI-X for completion notification (or polling)
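
To make the 64-byte submission entry concrete, the sketch below lays one out and fills in an NVM Read command. `SqEntry` and `read_command` are hypothetical names; the field positions (opcode and command identifier in dword 0, starting LBA in dwords 10-11, zero-based block count in dword 12) follow the NVMe command format. It assumes the caller has already chosen PRP values for the data buffer.

```rust
use core::mem::size_of;

/// One 64-byte submission queue entry.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct SqEntry {
    pub cdw0: u32, // opcode in bits 7:0, command identifier in bits 31:16
    pub nsid: u32, // namespace ID
    pub cdw2: u32,
    pub cdw3: u32,
    pub mptr: u64, // metadata pointer (unused here)
    pub prp1: u64, // first data pointer
    pub prp2: u64, // second data pointer or PRP list pointer
    pub cdw10: u32,
    pub cdw11: u32,
    pub cdw12: u32,
    pub cdw13: u32,
    pub cdw14: u32,
    pub cdw15: u32,
}

// Compile-time check that the layout matches the 64-byte entry size.
const _: () = assert!(size_of::<SqEntry>() == 64);

pub const NVM_OPCODE_WRITE: u8 = 0x01;
pub const NVM_OPCODE_READ: u8 = 0x02;

/// Build an NVM Read command for `blocks` logical blocks starting at `lba`.
pub fn read_command(
    cid: u16,
    nsid: u32,
    lba: u64,
    blocks: u16,
    prp1: u64,
    prp2: u64,
) -> Option<SqEntry> {
    if blocks == 0 {
        return None;
    }
    Some(SqEntry {
        cdw0: (NVM_OPCODE_READ as u32) | ((cid as u32) << 16),
        nsid,
        prp1,
        prp2,
        cdw10: lba as u32,
        cdw11: (lba >> 32) as u32,
        cdw12: (blocks - 1) as u32, // block count field is zero-based
        ..SqEntry::default()
    })
}
```

After copying the entry into the submission ring, the driver advances its tail index and writes it to the queue's tail doorbell register.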

### Implementation Sketch

```rust
// kernel/src/nvme.rs (or kernel/src/drivers/nvme.rs)

pub struct NvmeController {
    bar0: *mut u8,          // MMIO registers
    admin_sq: SubmissionQueue,
    admin_cq: CompletionQueue,
    io_sq: SubmissionQueue,
    io_cq: CompletionQueue,
    namespace_id: u32,
    block_size: u32,
    block_count: u64,
}

impl NvmeController {
    pub fn init(pci_device: &PciDevice) -> Result<Self, NvmeError> { ... }
    pub fn read(&self, lba: u64, count: u16, buf: &mut [u8]) -> Result<(), NvmeError> { ... }
    pub fn write(&self, lba: u64, count: u16, buf: &[u8]) -> Result<(), NvmeError> { ... }
    pub fn identify(&self) -> NvmeIdentify { ... }
}
```
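
The `CompletionQueue` type is left undefined in the sketch above. One plausible shape for its consumer side is shown here using the phase-tag convention from the NVMe completion entry format. `CompletionRing` and `CqEntry` are hypothetical names, and a `Vec` stands in for the DMA-visible ring; a real driver would read entries with volatile loads and write the new head to the completion doorbell.

```rust
/// One 16-byte completion queue entry.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default)]
pub struct CqEntry {
    pub dw0: u32,     // command-specific result
    pub dw1: u32,     // reserved
    pub sq_head: u16, // controller's view of the submission queue head
    pub sq_id: u16,
    pub cid: u16,     // command identifier being completed
    pub status: u16,  // bit 0 = phase tag, bits 15:1 = status field
}

pub struct CompletionRing {
    pub entries: Vec<CqEntry>, // stand-in for DMA memory
    head: usize,
    phase: bool, // phase value that marks a fresh entry
}

impl CompletionRing {
    /// The ring starts zeroed, so the first pass expects phase = 1.
    pub fn new(depth: usize) -> Self {
        assert!(depth >= 2, "NVMe queues need at least two entries");
        Self { entries: vec![CqEntry::default(); depth], head: 0, phase: true }
    }

    /// Consume one completion if the controller has posted it.
    /// Returns (command identifier, status field).
    pub fn poll(&mut self) -> Option<(u16, u16)> {
        let entry = self.entries[self.head];
        if (entry.status & 1 == 1) != self.phase {
            return None; // slot still holds a stale entry from the last pass
        }
        self.head += 1;
        if self.head == self.entries.len() {
            // Wrapping flips the phase the driver expects next.
            self.head = 0;
            self.phase = !self.phase;
        }
        Some((entry.cid, entry.status >> 1))
    }

    /// Index to write to the completion queue head doorbell.
    pub fn head(&self) -> usize {
        self.head
    }
}
```

The phase tag is what lets the same loop serve both polling and MSI-X driven completion: the interrupt handler simply calls `poll` until it returns `None`.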

### DMA Considerations

NVMe uses DMA for data transfer. The controller reads/writes directly from
physical memory addresses provided in commands. Requirements:

- Buffers must be physically contiguous (or use PRP lists / SGLs for scatter-gather)
- Physical addresses must be provided (not virtual)
- Cache coherence is handled by hardware on x86_64 (DMA-coherent architecture)

The existing frame allocator can provide physically contiguous pages. For
larger transfers, PRP (Physical Region Page) lists allow scatter-gather.
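
How a transfer maps onto PRP entries can be sketched for the simplest case: a physically contiguous buffer and a 4 KiB controller page size, both of which are assumptions here. `PrpPlan` and `plan_prps` are hypothetical names. In a real driver the list variant would be written into a DMA-visible page (and chained if it exceeds one page), with `prp2` pointing at that page.

```rust
pub const PAGE_SIZE: u64 = 4096;

#[derive(Debug, PartialEq, Eq)]
pub enum PrpPlan {
    /// Transfer fits in the page containing `prp1`.
    Single { prp1: u64 },
    /// Transfer touches exactly two pages; `prp2` is the second page.
    Two { prp1: u64, prp2: u64 },
    /// Transfer touches more pages; these addresses go into a PRP list.
    List { prp1: u64, list_entries: Vec<u64> },
}

/// Plan PRP entries for a physically contiguous buffer at `phys`.
pub fn plan_prps(phys: u64, len: u64) -> Option<PrpPlan> {
    if len == 0 {
        return None;
    }
    let end = phys.checked_add(len)?;
    let first_page = phys & !(PAGE_SIZE - 1);

    // Collect every page after the first that the transfer touches.
    let mut rest = Vec::new();
    let mut next = first_page.checked_add(PAGE_SIZE);
    while let Some(page) = next {
        if page >= end {
            break;
        }
        rest.push(page);
        next = page.checked_add(PAGE_SIZE);
    }

    Some(match rest.len() {
        0 => PrpPlan::Single { prp1: phys },
        1 => PrpPlan::Two { prp1: phys, prp2: rest[0] },
        _ => PrpPlan::List { prp1: phys, list_entries: rest },
    })
}
```

Only `prp1` may carry an offset into its page; every later entry is page-aligned, which is why an unaligned 4 KiB buffer already needs two entries.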

### Crate Dependencies

| Crate | Purpose | no_std |
|---|---|---|
| (none) | NVMe register-level protocol is simple enough to implement directly | N/A |

The NVMe spec is cleaner than virtio and the register interface is
straightforward. A minimal driver (admin + 1 I/O queue pair, read/write)
is ~500-700 lines without external dependencies.

### Integration with Storage Proposal

The storage proposal's Phase 3 (Persistent Store) specifies virtio-blk as
the backing device. This can be generalized to a `BlockDevice` trait:

```rust
trait BlockDevice {
    fn read(&self, lba: u64, count: u16, buf: &mut [u8]) -> Result<(), Error>;
    fn write(&self, lba: u64, count: u16, buf: &[u8]) -> Result<(), Error>;
    fn block_size(&self) -> u32;
    fn block_count(&self) -> u64;
}
```

Both NVMe and virtio-blk implement this trait. The store service doesn't
care which backing driver it uses.
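
A RAM-backed implementation is a cheap way to exercise the store service before either hardware driver exists. The sketch below repeats the trait so it stands alone, substituting a hypothetical `BlockError` for the unspecified `Error` type; `RamDisk` is likewise an illustrative name. Because the trait's `write` takes `&self`, the sketch uses `RefCell` for interior mutability, which assumes single-threaded use.

```rust
use core::cell::RefCell;
use core::ops::Range;

#[derive(Debug, PartialEq, Eq)]
pub enum BlockError {
    OutOfRange,
    BadBufferLength,
}

pub trait BlockDevice {
    fn read(&self, lba: u64, count: u16, buf: &mut [u8]) -> Result<(), BlockError>;
    fn write(&self, lba: u64, count: u16, buf: &[u8]) -> Result<(), BlockError>;
    fn block_size(&self) -> u32;
    fn block_count(&self) -> u64;
}

/// In-memory block device for tests.
pub struct RamDisk {
    block_size: u32,
    data: RefCell<Vec<u8>>,
}

impl RamDisk {
    pub fn new(block_size: u32, block_count: u64) -> Self {
        assert!(block_size > 0);
        let bytes = block_size as usize * block_count as usize;
        Self { block_size, data: RefCell::new(vec![0; bytes]) }
    }

    /// Translate (lba, count) into a byte range, checking bounds and that
    /// the caller's buffer is exactly `count` blocks long.
    fn byte_range(&self, lba: u64, count: u16, buf_len: usize) -> Result<Range<usize>, BlockError> {
        let bs = self.block_size as u64;
        let end_lba = lba.checked_add(count as u64).ok_or(BlockError::OutOfRange)?;
        if end_lba > self.block_count() {
            return Err(BlockError::OutOfRange);
        }
        if buf_len as u64 != count as u64 * bs {
            return Err(BlockError::BadBufferLength);
        }
        Ok((lba * bs) as usize..(end_lba * bs) as usize)
    }
}

impl BlockDevice for RamDisk {
    fn read(&self, lba: u64, count: u16, buf: &mut [u8]) -> Result<(), BlockError> {
        let range = self.byte_range(lba, count, buf.len())?;
        buf.copy_from_slice(&self.data.borrow()[range]);
        Ok(())
    }

    fn write(&self, lba: u64, count: u16, buf: &[u8]) -> Result<(), BlockError> {
        let range = self.byte_range(lba, count, buf.len())?;
        self.data.borrow_mut()[range].copy_from_slice(buf);
        Ok(())
    }

    fn block_size(&self) -> u32 {
        self.block_size
    }

    fn block_count(&self) -> u64 {
        self.data.borrow().len() as u64 / self.block_size as u64
    }
}
```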

### Scope

~500-700 lines for a minimal in-kernel NVMe driver (admin queue + 1 I/O
queue pair, read/write, identify). Userspace decomposition follows the same
pattern as the networking proposal (kernel driver first, then extract to
userspace process with DeviceMmio + Interrupt caps).

---

## Phase 6: Cloud NIC Strategy

**Goal:** Define the path to networking on cloud VMs, given that each cloud
uses a different proprietary NIC.

### The Landscape

| Cloud | Primary NIC | Virtio NIC available? | Open-source driver? |
|---|---|---|---|
| GCP | gVNIC (1AE0:0042) | Yes on supported first/second-generation machine families | Yes (Linux, ~3000 LoC) |
| AWS | ENA (1D0F:EC20) | No (Nitro only) | Yes (Linux, ~8000 LoC) |
| Azure | MANA (1414:00BA) | No (accelerated networking) | Yes (Linux, ~6000 LoC) |

### Recommended Strategy

**Short term: constrained virtio-net on GCP**

GCP can expose `VIRTIO_NET` on supported first/second-generation machine
families. After the shared image, ACPI/PCIe, interrupt, DMA/MMIO, and virtio
foundation exists, that gives a constrained early cloud-network proof without
writing a provider-specific NIC driver. It is not the general GCP target:
third-generation-and-later machine families, Tau T2A, Confidential VM, and
some higher-bandwidth paths require gVNIC.

```bash
gcloud compute instances create capos-test \
    --image=capos \
    --machine-type=e2-micro \
    --network-interface=nic-type=VIRTIO_NET
```

**Medium term: gVNIC driver**

gVNIC is a simpler device than ENA or MANA. The Linux driver is ~3000 lines
(vs ~8000 for ENA). It uses standard PCI BAR MMIO + MSI-X interrupts. A
minimal gVNIC driver (init, link up, send/receive) would be ~800-1200 lines.

gVNIC is worth prioritizing because:
- GCP's constrained virtio-net path can de-risk cloud networking before a
  provider-specific NIC driver exists
- Graduating from virtio-net to gVNIC on the same cloud is the required path
  for third-generation-and-later machine families, Tau T2A, Confidential VM,
  and higher-bandwidth GCP instances
- The gVNIC register interface is documented in the Linux driver source

**Long term: ENA and MANA**

ENA and MANA are more complex and less well-documented outside their Linux
drivers. These should be deferred until the driver model is mature (userspace
drivers with DeviceMmio caps, as described in the networking proposal Part 2).

At that point, the kernel only needs to provide PCI enumeration + BAR mapping +
MSI-X routing. The actual NIC driver logic runs in a userspace process, making
it feasible to port from the Linux driver source with appropriate licensing
considerations.

### Alternative: Paravirt Abstraction Layer

Instead of writing native drivers for each cloud NIC, an alternative is a
thin paravirt layer:

```
Application -> NetworkManager cap -> Net Stack (smoltcp) -> NIC cap -> [driver]
```

Where `[driver]` is one of:
- `virtio-net` (QEMU, supported first/second-generation GCP machine families)
- `gvnic` (GCP)
- `ena` (AWS)
- `mana` (Azure)

All drivers implement the same `Nic` capability interface from the networking
proposal. The network stack and applications are driver-agnostic.

This is already the architecture described in the networking proposal. The
only addition is recognizing that multiple driver implementations will exist
behind the same `Nic` interface.
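
Choosing which driver sits behind the `Nic` interface can be driven directly by the vendor/device IDs listed in Phase 4. The sketch below shows only that selection step; `NicDriver`, `nic_driver_for`, and `select_nic_driver` are hypothetical names, and the `Nic` capability interface itself is defined in the networking proposal and not reproduced here.

```rust
/// Driver implementations that can back the `Nic` capability.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum NicDriver {
    VirtioNet,
    Gvnic,
    Ena,
    Mana,
}

/// Match one PCI function against the IDs from the cloud hardware table.
pub fn nic_driver_for(vendor_id: u16, device_id: u16) -> Option<NicDriver> {
    match (vendor_id, device_id) {
        // virtio-net: transitional (0x1000) or modern (0x1041)
        (0x1AF4, 0x1000) | (0x1AF4, 0x1041) => Some(NicDriver::VirtioNet),
        (0x1AE0, 0x0042) => Some(NicDriver::Gvnic),
        (0x1D0F, 0xEC20) | (0x1D0F, 0xEC21) => Some(NicDriver::Ena),
        (0x1414, 0x00BA) => Some(NicDriver::Mana),
        _ => None,
    }
}

/// Pick a driver for the first recognized NIC among enumerated functions,
/// given as (vendor_id, device_id) pairs in enumeration order.
pub fn select_nic_driver(functions: &[(u16, u16)]) -> Option<NicDriver> {
    functions
        .iter()
        .find_map(|&(vendor, device)| nic_driver_for(vendor, device))
}
```

Keeping the match table in one place means a new provider NIC only adds a variant and an ID pair; the network stack above the `Nic` boundary does not change.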

---

## Phase Summary and Dependencies

```mermaid
graph TD
    P1[Phase 1: Disk Image + Serial Diagnostics] --> BOOT[Boots on Cloud VM]
    P2[Phase 2: ACPI Parsing] --> P3[Phase 3: Interrupt Infrastructure]
    P2 --> P4[Phase 4: PCI/PCIe]
    P3 --> P5[Phase 5: NVMe Driver]
    P4 --> P5
    P4 --> NET[Networking Smoke Test<br>virtio-net driver]
    P3 --> NET
    P4 --> P6[Phase 6: Cloud NIC Drivers]
    P3 --> P6
    NET --> P6

    S5[Stage 5: Scheduling] --> P3
    SMP_C[SMP Phase C: LAPIC timer/IPI] --> P3

    style P1 fill:#2d5,stroke:#333
    style BOOT fill:#2d5,stroke:#333
```

| Phase | Depends on | Estimated scope | Enables |
|---|---|---|---|
| 1: Disk image + diagnostics | Nothing | image tooling plus bounded diagnostics mode | Cloud serial boot |
| 2: ACPI | Nothing (kernel code) | ~200-300 lines | Phases 3, 4 |
| 3: Interrupts | Phase 2, LAPIC (SMP Phase C) | ~300-400 lines | NVMe, cloud NICs |
| 4: PCI/PCIe | Phase 2 | ~400-500 lines | All device drivers |
| 5: NVMe | Phases 3, 4 | ~500-700 lines | Cloud storage |
| 6: Cloud NICs | Phases 3, 4, networking smoke test | ~800-1200 lines each | Cloud networking |

### Minimum Path to "Boots on Cloud VM, Prints Hello"

Raw serial output and UEFI boot support already exist, so the smallest
"prints hello" experiment is mostly Phase 1 image packaging plus any boot-path
adjustments needed to reach the same COM1 output from an imported disk image.
That experiment is a precursor, not the full Phase 1 closeout.

Phase 1 closeout also includes a bounded serial diagnostics prompt so cloud
driver bring-up can inspect CPU, memory, ACPI, PCI, IRQ, timer, device, and log
state before cloud NICs or storage drivers are reliable. That diagnostics
surface is kernel/userspace behavior, not just build-system work.

### Minimum Path to "Useful on Cloud VM"

Phases 1-5 (disk image + ACPI + interrupts + PCI + NVMe) plus the existing
roadmap items (Stages 4-6 for capability syscalls, scheduling, IPC). On a
supported first/second-generation GCP machine family, networking can use the
existing virtio-net proposal without a provider-specific gVNIC/ENA/MANA driver
on that constrained target.

---

## QEMU Testing

All phases can be tested in QEMU before deploying to cloud:

| Phase | QEMU flags |
|---|---|
| Disk image | `-drive file=capos.img,format=raw -bios OVMF.4m.fd` |
| ACPI | Default QEMU provides ACPI tables (MADT, MCFG, etc.) |
| I/O APIC | Default QEMU emulates I/O APIC |
| PCI/PCIe | `-device ...` adds PCI devices; QEMU has PCIe root complex |
| NVMe | `-drive file=disk.img,if=none,id=d0 -device nvme,drive=d0,serial=capos0` |
| MSI-X | Supported by QEMU's NVMe and virtio-net-pci emulation; current net smoke asserts metadata selection, kernel-owned source-route records, route unmask, vector programming, virtio queue assignment, descriptor guards, ARP, and ICMP fixture evidence. Device-autonomous virtio-net MSI-X delivery is covered by the dedicated userspace-provider gates. |
| Multi-CPU | `-smp 4` (already works with Limine SMP) |
| x2APIC backend | future explicit QEMU CPU feature such as `-cpu qemu64,+smep,+smap,+rdrand,+x2apic` |

---

## aarch64 and ARM Cloud Instances

This proposal focuses on x86_64 because that's the current kernel target, but
ARM-based cloud instances are significant and growing:

| Cloud | ARM offering | Instance types |
|---|---|---|
| AWS | Graviton2/3/4 | m7g, c7g, r7g, etc. |
| GCP | Tau T2A (Ampere Altra) | t2a-standard-* |
| Azure | Cobalt 100 (Arm Neoverse) | Dpsv6, Dplsv6 |

ARM cloud VMs have the same general requirements (UEFI boot, ACPI tables,
PCI/PCIe, NVMe storage) but different specifics:

- **Interrupt controller**: GIC (Generic Interrupt Controller) instead of
  APIC. GICv3 is standard on cloud ARM instances.
- **Boot**: UEFI via Limine (already targets aarch64). Limine handles the
  architecture differences at boot time.
- **Timer**: ARM generic timer (CNTPCT_EL0) instead of LAPIC/PIT/TSC.
- **Serial**: PL011 UART instead of 16550 COM1. Different register interface.
- **NIC**: Same PCI devices (ENA, gVNIC, MANA) with the same register
  interfaces -- PCI/PCIe is architecture-neutral.
- **NVMe**: Same NVMe register interface -- PCIe is architecture-neutral.

The arch-neutral parts of this proposal (PCI enumeration, NVMe, disk image
format, ACPI table parsing) apply equally to aarch64. The arch-specific
parts (I/O APIC, MSI delivery address format, LAPIC) need aarch64 equivalents
(GIC, ARM MSI translation).

The existing roadmap lists "aarch64 support" as a future item. For cloud
deployment, aarch64 should be considered as soon as the x86_64 hardware
abstraction is stable, since:

1. Device drivers (NVMe, virtio-net, cloud NICs) are architecture-neutral --
   they talk to PCI config space and MMIO BARs, which are the same on both
   architectures
2. The `acpi` crate handles both x86_64 and aarch64 ACPI tables
3. Limine already targets aarch64
4. AWS Graviton instances are often cheaper than x86_64 equivalents

The main aarch64 kernel work is: exception handling (EL1/EL0 instead of
Ring 0/3), GIC driver (instead of APIC), ARM generic timer, PL011 serial,
and the MMU setup (4-level page tables exist on both but with different
register interfaces).

---

## Open Questions

1. **ACPI scope.** The landed diagnostic parser covers bounded read-only
   RSDP, RSDT/XSDT, MADT, and MCFG summaries only. The `acpi` crate can parse
   fuller static tables (MADT, MCFG, HPET, FADT). Full ACPI requires AML
   interpretation (for \_PRT interrupt routing, dynamic device enumeration).
   Do we need AML, or are static tables sufficient for cloud VMs? Cloud VM
   firmware typically provides simple, static ACPI tables -- AML interpretation
   is likely unnecessary initially.

2. **PCIe ECAM vs legacy.** Should we support both config access methods, or
   require ECAM (which all cloud VMs and modern QEMU provide)? Supporting
   both adds ~50 lines but makes bare-metal testing on older hardware possible.

3. **NVMe queue depth.** A single I/O queue pair with depth 32 is sufficient
   for initial use. Per-CPU queues (leveraging MSI-X per-queue interrupts)
   improve SMP throughput but add complexity. Defer per-CPU queues to after
   SMP is working.

4. **Driver model unification.** **Resolved:** PCI enumeration is the
   standalone PCI/PCIe Infrastructure item in the roadmap. The networking
   smoke test and NVMe driver both consume this shared subsystem. The
   networking proposal's Part 1 Step 1 has been updated to reference this
   phase.

5. **GCP vs AWS as first cloud target.** The first cloud proof should be
   imported-image serial-console boot on both providers when practical, because
   that validates image format, firmware, bootloader, and early ACPI without
   depending on cloud NICs. For the later usable-networked-instance milestone,
   a constrained first/second-generation GCP virtio-net target is the easiest
   first network proof; broader GCP coverage needs gVNIC, and AWS follows once
   the NVMe/ENA path or an explicit workaround is ready.

---

## References

### Specifications

- [NVMe Base Specification 2.1](https://nvmexpress.org/specifications/) --
  register interface, queue semantics, command set
- [PCI Express Base Specification](https://pcisig.com/specifications) --
  ECAM, MSI/MSI-X capability structures
- [ACPI Specification 6.5](https://uefi.org/specs/ACPI/6.5/) --
  MADT, MCFG, HPET table formats
- [Intel SDM Vol. 3, Ch. 10](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html)
  -- APIC architecture (LAPIC, I/O APIC)

### Crates

- [acpi](https://crates.io/crates/acpi) -- no_std ACPI table parser
- [virtio-drivers](https://crates.io/crates/virtio-drivers) -- no_std virtio
  (already in networking proposal)

### Prior Art

- [Redox PCI](https://gitlab.redox-os.org/redox-os/drivers/-/tree/master/pcid)
  -- microkernel PCI driver in Rust
- [Hermit NVMe](https://github.com/hermit-os/kernel) -- unikernel NVMe driver
- [rCore virtio](https://github.com/rcore-os/rCore) -- educational OS with
  virtio + PCI in Rust
- [Linux gVNIC driver](https://github.com/torvalds/linux/tree/master/drivers/net/ethernet/google/gve)
  -- reference for gVNIC register interface (~3000 LoC)
- [Linux ENA driver](https://github.com/amzn/amzn-drivers/tree/master/kernel/linux/ena)
  -- reference for ENA

### Cloud Documentation

- [GCP: Creating custom images](https://cloud.google.com/compute/docs/images/create-delete-deprecate-private-images)
- [GCP: Manually import boot disks](https://cloud.google.com/compute/docs/import/import-existing-image)
- [GCP: Requirements to build custom images](https://cloud.google.com/compute/docs/images/building-custom-os)
- [GCP: Persistent Disk storage interfaces](https://cloud.google.com/compute/docs/disks/persistent-disks)
- [AWS: Importing VM images](https://docs.aws.amazon.com/vm-import/latest/userguide/vmimport-image-import.html)
- [AWS: VM Import/Export requirements](https://docs.aws.amazon.com/vm-import/latest/userguide/prerequisites.html)
- [AWS: VM Import/Export limitations](https://docs.aws.amazon.com/vm-import/latest/userguide/limitations-image-importing.html)
- [AWS: EC2 UEFI boot mode requirements](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/launch-instance-boot-mode.html)
- [Azure: Creating custom images](https://learn.microsoft.com/en-us/azure/virtual-machines/linux/create-upload-generic)
- [GCP: Choosing a NIC type](https://cloud.google.com/compute/docs/networking/using-gvnic)
- [GCP: Cloud Run overview](https://docs.cloud.google.com/run/docs/overview/what-is-cloud-run)
- [GCP: Firestore Native mode](https://docs.cloud.google.com/firestore/native/docs)
- [GCP: Cloud Storage object versioning](https://docs.cloud.google.com/storage/docs/object-versioning)
- [GCP: Secret Manager](https://cloud.google.com/secret-manager)
- [GCP: Cloud KMS overview](https://cloud.google.com/kms/docs)
- [GCP: Cloud KMS IAM](https://cloud.google.com/kms/docs/iam)
- [GCP: Cloud KMS roles and permissions](https://cloud.google.com/iam/docs/roles-permissions/cloudkms)
- [GCP: Cloud KMS key rotation](https://cloud.google.com/kms/docs/key-rotation)
- [GCP: Rotate a Cloud KMS key](https://cloud.google.com/kms/docs/rotate-key)
- [GCP: Enable and disable Cloud KMS key versions](https://cloud.google.com/kms/docs/enable-disable)
- [GCP: Destroy and restore Cloud KMS key versions](https://cloud.google.com/kms/docs/destroy-restore)
- [AWS: Enhanced networking](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html)
- [AWS: Nitro instances](https://docs.aws.amazon.com/ec2/latest/instancetypes/ec2-nitro-instances.html)
- [Azure: Accelerated Networking](https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-overview)
- [Azure: Microsoft Azure Network Adapter](https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-mana-overview)
- [Azure: Manage Accelerated Networking](https://learn.microsoft.com/en-us/azure/virtual-network/manage-accelerated-networking)
- [Azure: NVMe overview](https://learn.microsoft.com/en-us/azure/virtual-machines/nvme-overview)
- [Google Drive: application data folder](https://developers.google.com/workspace/drive/api/guides/appdata)
- [Google Drive: Drive API scopes](https://developers.google.com/workspace/drive/api/guides/api-specific-auth)
- [Firebase: Firestore offline persistence](https://firebase.google.com/docs/firestore/manage-data/enable-offline)
- [Firebase: Firestore security rule conditions](https://firebase.google.com/docs/firestore/security/rules-conditions)
- [Firebase: Firestore usage and limits](https://firebase.google.com/docs/firestore/quotas)
- [Firebase: Google sign-in for web](https://firebase.google.com/docs/auth/web/google-signin)

### capOS Cross-Links

- `docs/design-risks-register.md` -- R13 (trusted build inputs are partly
  pinned) consolidates the long-horizon supply-chain risk view that gates
  cloud-image release paths; this proposal is recorded as a secondary owner.
- `docs/trusted-build-inputs.md` -- the actual inventory of pinned and
  observed-not-pinned build inputs, dependency policy, vendored upstream
  snapshots, and the build-provenance retention/comparison policy that cloud
  proofs must satisfy before they are cited as production evidence.
- `docs/tasks/done/2026-06-07/cloud-usable-instance-provider-nic-storage.md` --
  the completed GCP-first usable-instance provider rollup covering provider
  NIC/storage authority, DMA backend selection, cloud teardown, and
  serial-console operator access.
- `docs/dma-isolation-design.md` -- DMA isolation backend selection
  (kernel-owned bounce buffers vs IOMMU/remapping) that cloud provider
  drivers must commit to before claiming usable-instance status.
- `docs/backlog/hardware-boot-storage.md` -- DDF Tasks 5 (userspace driver
  authority) and 6 (recurring cloud-portability gate) referenced from Phase
  1 closeout above.
