# GCP Persistent Disk (storage)

This is a provenance map for the **GCP Persistent Disk (PD)** storage shape:
how a GCE instance presents its persistent disks to the guest, why most current
families expose them as **standard NVMe** namespaces that the shared NVMe
foundation already drives, and the small GCP delta capOS adds on top. It is not a re-spec;
the NVMe register/queue/PRP wire subset capOS actually touches is documented
once in [`nvme.md`](nvme.md) and not repeated here.

**Maturity caveat.** This page documents one bounded live-GCE NVMe Persistent
Disk proof on a `c3-standard-4` VM, plus the local QEMU/cloudboot proofs that
preceded it. The live proof is a single brokered NVMe READ through provider
authority; it is not a general reusable storage provider, filesystem
integration, virtio-scsi path, Local SSD path, direct-DMA claim, or
device-autonomous MSI-X claim. The older
`cloud-prod-storage-bound-local-proof` composes production grant surfaces over a
discovered NVMe function and emits
`cloudboot-evidence: storage-bound` on a local boot of the
`make capos-cloudboot-image` disk under QEMU. The later
`cloud-prod-nvme-brokered-userspace-provider-local-proof` child chain drives the
same local QEMU `-device nvme` surface through brokered controller bring-up,
admin `IDENTIFY`, I/O queue creation, `BlockDevice` read/write/flush, a
dedicated data-completion `Interrupt` route, and multi-PRP windows while
preserving manager-authored queue-base/PRP materialization. The live GCE
closeout is the `cloud-gcp-storage-driver` run described in §6.

## 1. Spec basis

- **Device**: GCE Persistent Disk. GCE exposes attached PD volumes as a block
  device on the guest PCI surface. The legacy first-/second-generation
  families use `virtio-scsi`; current generations (Tau T2A,
  third-generation-or-later N2/N2D/C3, Confidential VM paths) expose them as
  **NVMe namespaces** behind a standard NVMe PCI controller -- PCI class
  `0x01` (mass storage), subclass `0x08` (NVM), programming interface `0x02`
  (NVM Express) -- the same class triple QEMU emulates with `-device nvme`
  and the kernel detects with `PciDevice::is_nvme_controller`
  (`kernel/src/pci.rs`).
- **Production PCI identity**: the GCE NVMe PD controller carries Google's
  PCI vendor id (current generation `0x1ae0`, distinct from QEMU's `0x1b36`).
  capOS therefore classifies on the device *class surface* and the brokered
  no-IOMMU bounce DMA shape, **not** on a QEMU vendor-id match (see §3). The
  live `cloud-gcp-storage-driver` run confirmed the GCE NVMe PD identity as
  `vendor.1ae0` / `dev.001f` on BDF `0000:00:05.0`.
- **Authoritative spec**: the NVM Express Base Specification (NVMe 1.4 / 2.0)
  is the wire contract; Google publishes no separate PD register spec because
  the device *is* a standard NVMe controller on the NVMe-family GCE shapes.
  Google documents PD device exposure under the "Persistent Disk overview"
  and "Local SSD" pages
  (<https://cloud.google.com/compute/docs/disks>).
- **virtio-scsi alternative**: older GCE families use `virtio-scsi` for PD
  rather than NVMe. capOS has **no userspace virtio-scsi provider driver**,
  and the in-tree `make run-virtio-blk` target proves only the
  **kernel-owned** virtio-blk driver; reusing it would retain the hidden
  kernel DMA ownership that the userspace-provider acceptance forbids. The
  older-family `virtio-scsi` path is therefore recorded as out of scope here
  (`gcp_scsi_path=no-userspace-provider-driver-out-of-scope`), the same
  shape that `docs/devices/azure-disk.md` records for the
  Hyper-V/virtio-scsi older-family path.
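
The class-surface rule above can be illustrated with a minimal sketch. It is not the capOS `PciDevice` implementation: the `PciIdentity` struct and its `is_nvme_class` method are hypothetical names introduced here, and only the class triple and the GCE vendor/device ids come from this page.

```rust
/// Hypothetical, minimal view of a PCI function's identity registers.
/// This is NOT the capOS `PciDevice` type; names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PciIdentity {
    pub vendor_id: u16,
    pub device_id: u16,
    pub class: u8,
    pub subclass: u8,
    pub prog_if: u8,
}

/// PCI class triple for an NVM Express controller.
pub const NVME_CLASS: u8 = 0x01; // mass storage
pub const NVME_SUBCLASS: u8 = 0x08; // NVM
pub const NVME_PROG_IF: u8 = 0x02; // NVM Express

impl PciIdentity {
    /// True when the function advertises the NVMe class triple.
    /// Vendor and device ids are deliberately not consulted, so a
    /// Google-vendor controller and a QEMU-vendor controller both match.
    pub fn is_nvme_class(&self) -> bool {
        self.class == NVME_CLASS
            && self.subclass == NVME_SUBCLASS
            && self.prog_if == NVME_PROG_IF
    }
}

/// Identity recorded by the live run on this page (`vendor.1ae0` / `dev.001f`).
pub fn gce_pd_identity() -> PciIdentity {
    PciIdentity {
        vendor_id: 0x1ae0,
        device_id: 0x001f,
        class: NVME_CLASS,
        subclass: NVME_SUBCLASS,
        prog_if: NVME_PROG_IF,
    }
}
```

Under this sketch, `gce_pd_identity().is_nvme_class()` holds without any vendor-id comparison, which is the property the classification relies on.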

## 2. Wire format (shared with `docs/devices/nvme.md`)

GCE NVMe PD is **standard NVMe**: the controller registers, admin SQ/CQ
descriptors, IDENTIFY data, I/O SQ/CQ descriptors, PRP entries, and the
on-notify validator scan targets are exactly the ones documented in
[`nvme.md`](nvme.md) §2. No GCP-specific subset is reproduced here. The
shared NVMe storage-provider foundation
(`nvme-bind-claimed-mmio-read`,
`nvme-controller-reset-selected-write`,
`nvme-no-iommu-brokered-controller-enable`,
`nvme-admin-queue-identify`,
`nvme-admin-interrupt-delivery`,
`nvme-io-queue-and-read`) is the same wire model the local production
cloudboot chain ports into `kernel/src/device_manager/stub.rs` and the
production grant-source modules. The `cloud-gcp-storage-driver` closeout
validated that provider/storage binding against the live GCE PD controller
identity and evidence surface for one bounded NVMe READ.

## 3. capOS mapping

- **Cloud-shape classification**: `kernel/src/pci.rs` `report_cloud_nvme_shape`
  (the GCP path) classifies the bound controller against the GCE NVMe surface
  and emits the `nvme: cloud shape classification cloud_shape=gcp-persistent-disk
  ...` proof line on `make run-pci-nvme`, conjunctively with the bounce-buffer
  `dma: backend selection` line.
- **DMA backend**: GCE IOMMU availability is handled by the
  direct-remapping-if-verified-else-bounce-buffer policy from
  `cloud-dma-backend-selection` and the "Cloud DMA Backend" section of
  `docs/dma-isolation-design.md`. The 2026-05-24 GCE live probes recorded
  `n1-standard-1`, `e2-small`, `c3-standard-4`, and
  `n2d-standard-2` Confidential shapes as `IOMMU disabled → SWIOTLB →
  labeled bounce-buffer` in
  [`docs/research/cloud-dma-provider-evidence.md`](../research/cloud-dma-provider-evidence.md),
  so the cloud-shape proof line and the production storage-bind proof both
  run conjunctively with the bounce-buffer DMA backend.
- **No host-physical / IOVA export**: `iova_export=disabled-future-only`,
  `host_physical_user_visible=0`, `direct_dma=blocked`,
  `real_dma=not-attempted` -- the same brokered-bounce shape that `nvme.md`
  records in §6–§8 and that the production storage-bind proof records in
  `nvme.md` §9.
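
The direct-remapping-if-verified-else-bounce-buffer policy named above reduces to a small decision function. The sketch below is an illustration under assumptions: the `IommuProbe` and `DmaBackend` enums and `select_dma_backend` are hypothetical names, not the types used by `cloud-dma-backend-selection`.

```rust
/// Hypothetical summary of what a guest-side IOMMU probe found.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IommuProbe {
    /// No usable IOMMU is visible to the guest.
    Disabled,
    /// An IOMMU is visible, but remapping has not been verified.
    PresentUnverified,
    /// Remapping has been verified end to end.
    Verified,
}

/// Hypothetical DMA backend choices mirroring the policy name.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DmaBackend {
    DirectRemapping,
    BounceBuffer,
}

/// Direct remapping is chosen only for a verified IOMMU; every other
/// probe result falls back to the labeled bounce buffer.
pub fn select_dma_backend(probe: IommuProbe) -> DmaBackend {
    match probe {
        IommuProbe::Verified => DmaBackend::DirectRemapping,
        IommuProbe::Disabled | IommuProbe::PresentUnverified => DmaBackend::BounceBuffer,
    }
}
```

Because every GCE shape probed on this page reported the IOMMU as disabled, each would map to `DmaBackend::BounceBuffer` in this sketch.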

## 4. Production storage-bind proof (local QEMU; non-`qemu` kernel)

`cloud-prod-storage-bound-local-proof` (the prerequisite of the billable
`cloud-gcp-storage-driver` slice) lands the production-path NVMe storage-bind
proof on the non-`qemu` cloud kernel. The following are documented once in
[`nvme.md` §9](nvme.md#9-production-path-storage-bind-proof-non-qemu-cloud-kernel)
and not reproduced here: the implementation, composition, MSI-X table program,
I/O-completion handoff (kernel-side proxy), masked-no-wake, teardown /
stale-handle assertions, headline cloudboot evidence shape, the rationale for
settling the proof with a kernel-side proxy, and the asserted proof lines. The
marker is parsed by `tools/cloudboot/run-test.sh` as `STORAGE_BOUND_MARKER`
into `provider.json.storage_bind_proof`.

The local QEMU boot of `target/disk.raw` (`make capos-cloudboot-image`,
`-device nvme`) demonstrates the bind against QEMU's NVMe class triple; it does
not exercise a live GCE PD NVMe vendor id.

## 5. Local production brokered NVMe provider chain

The moved parent
[`cloud-prod-nvme-brokered-userspace-provider-local-proof`](../tasks/done/2026-06-07/cloud-prod-nvme-brokered-userspace-provider-local-proof.md)
closes the local production provider prerequisite through its child records.
The implemented path is the same brokered no-IOMMU shape as `nvme.md`: the
manager authors `AQA`/`ASQ`/`ACQ`, queue-base pages, PRP1 entries, PRP lists,
doorbells, and completion consumption from live `DMAPool` ledger records. The
provider sees capability results and returned data bytes, not host-physical
addresses, IOVAs, queue-base values, or provider-authored PRP/SGL fields.

The local evidence covers:

- brokered controller enable and admin `IDENTIFY`;
- I/O queue creation, bounded READ/WRITE, second-LBA and multiblock I/O;
- `BlockDevice.readBlocks`, `writeBlocks`, and FLUSH-backed higher-level
  consumers over the `NvmeBrokered` backend;
- dedicated data-path `Interrupt.wait` / `Interrupt.acknowledge` completion
  proof;
- multi-PRP windows larger than one PRP1 page, with PRP list entries written by
  the manager.
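
To make the "multi-PRP windows larger than one PRP1 page" item concrete, the sketch below shows how the NVMe PRP rules divide a transfer between PRP1, PRP2, and a PRP list. It assumes a 4 KiB controller memory page size and a page-aligned buffer, and it ignores PRP lists that span more than one list page. `PrpPlan` and `plan_prps` are hypothetical names, not the manager's ledger code.

```rust
/// Assumed controller memory page size (4 KiB).
pub const PAGE_SIZE: u64 = 4096;

/// How a page-aligned transfer is described to the controller.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PrpPlan {
    /// The transfer fits in the single page addressed by PRP1.
    Prp1Only,
    /// The transfer spans two pages; PRP2 addresses the second page directly.
    Prp1AndPrp2,
    /// The transfer spans more than two pages; PRP2 addresses a PRP list
    /// with one entry for every page after the first.
    Prp1AndList { list_entries: u64 },
}

/// Plans PRP usage for a page-aligned buffer of `len` bytes.
/// Returns `None` for a zero-length transfer.
pub fn plan_prps(len: u64) -> Option<PrpPlan> {
    if len == 0 {
        return None;
    }
    // Ceiling division written so it cannot overflow for large `len`.
    let pages = (len - 1) / PAGE_SIZE + 1;
    Some(match pages {
        1 => PrpPlan::Prp1Only,
        2 => PrpPlan::Prp1AndPrp2,
        n => PrpPlan::Prp1AndList { list_entries: n - 1 },
    })
}
```

In the brokered model described above, a plan like this would be computed and materialized by the manager; the provider never sees the resulting addresses.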

This remains the local QEMU/cloudboot foundation under the same brokered
authority model. The billable real-GCE Persistent Disk bind run is the bounded
NVMe evidence in §6.

## 6. Live GCE NVMe Persistent Disk proof

`cloud-gcp-storage-driver` closed with live GCE run `1780806087-bf69`, launched
by `make cloudboot-gcp-storage-nvme-io-read-test` at source commit
`28518165518c29a48633682f4a6d9b5844c43335`. The run used a `c3-standard-4`
instance in `europe-west3-a` with `storage_interface=nvme`. The harness launched
the instance with the `GVNIC` guest feature and NIC type because C3 requires
that launch posture; this storage page does not claim a gVNIC driver or NIC
datapath proof.

The evidence identified the GCE PD NVMe controller as class `01.08.02`,
`vendor.1ae0`, `device.001f`, BDF `0000:00:05.0`, with
`selected_dma_backend=bounce_buffer` and `enumeration_source=legacy-io`. The
manager drove the shared brokered NVMe chain: admin `IDENTIFY`, I/O CQ/SQ
creation, and one I/O `READ` against NSID 1, SLBA 0, NLB 1 / 512 bytes. The
serial marker recorded `live_cloud=gce-persistent-disk`,
`io_read=completed`, `io_sq_doorbell=performed`,
`io_cq_completion=polled-io-cq`, `prp_source=manager-ledger`,
`host_physical_user_visible=0`, and `iova_export=disabled-future-only`. The
read digest prefix was `eb3c904c494d494e4520200002000000`.
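
A harness-side check of the marker fields listed above could look like the following sketch. It assumes the serial marker is a single line of whitespace-separated `key=value` tokens, which this page does not specify. `parse_fields`, `EXPECTED`, and `violations` are hypothetical names and are not part of `tools/cloudboot/run-test.sh`.

```rust
use std::collections::HashMap;

/// Splits a marker line into `key=value` fields. Tokens without `=`
/// are skipped. The whitespace-separated layout is an assumption.
pub fn parse_fields(line: &str) -> HashMap<&str, &str> {
    line.split_whitespace()
        .filter_map(|token| token.split_once('='))
        .collect()
}

/// Field values this page reports for the live GCE read proof.
pub const EXPECTED: [(&str, &str); 7] = [
    ("live_cloud", "gce-persistent-disk"),
    ("io_read", "completed"),
    ("io_sq_doorbell", "performed"),
    ("io_cq_completion", "polled-io-cq"),
    ("prp_source", "manager-ledger"),
    ("host_physical_user_visible", "0"),
    ("iova_export", "disabled-future-only"),
];

/// Returns the keys that are missing or carry an unexpected value.
pub fn violations(fields: &HashMap<&str, &str>) -> Vec<&'static str> {
    let mut bad = Vec::new();
    for (key, want) in EXPECTED {
        if fields.get(key).copied() != Some(want) {
            bad.push(key);
        }
    }
    bad
}
```

An empty result from `violations` would indicate that the line carries every field with the value recorded here. A missing field is treated the same as a wrong one, so a truncated marker cannot pass.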

The capOS authority mapping is the same one recorded in `nvme.md`: `DeviceMmio`
gates BAR register and doorbell effects, `DMAPool` owns queue/data pages and
manager-authored PRP materialization, and `Interrupt` is present as the bounded
provider authority surface. The live read proof polls the I/O CQ; it does not
claim device-autonomous MSI-X delivery. The cloud harness evidence also recorded
no public IP, no service account, and `teardown_status=complete`.
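
Polling an I/O CQ, as opposed to waiting on MSI-X, comes down to watching the phase tag of the entry at the queue head. The sketch below follows the standard NVMe completion entry layout (command identifier in bits 15:0 of dword 3, phase tag in bit 16, status in bits 31:17). `Cqe` and `CqPoller` are hypothetical names, and the sketch omits the volatile reads, memory barriers, and CQ head doorbell write a real driver needs; it is not the capOS manager code.

```rust
/// One 16-byte NVMe completion queue entry, viewed as four dwords.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Cqe {
    pub dw: [u32; 4],
}

impl Cqe {
    /// Command identifier: bits 15:0 of dword 3.
    pub fn command_id(&self) -> u16 {
        (self.dw[3] & 0xffff) as u16
    }
    /// Phase tag: bit 16 of dword 3.
    pub fn phase(&self) -> bool {
        (self.dw[3] >> 16) & 1 == 1
    }
    /// Status field: bits 31:17 of dword 3 (zero means success).
    pub fn status(&self) -> u16 {
        (self.dw[3] >> 17) as u16
    }
}

/// Tracks the consumer's head index and the phase it expects next.
pub struct CqPoller {
    head: usize,
    expected_phase: bool,
}

impl CqPoller {
    /// A zeroed queue holds phase 0, so the first pass expects phase 1.
    pub fn new() -> Self {
        Self { head: 0, expected_phase: true }
    }

    /// Returns the entry at the head if its phase tag shows it is new.
    /// Advances the head and flips the expected phase on wrap-around.
    pub fn poll(&mut self, queue: &[Cqe]) -> Option<Cqe> {
        let entry = *queue.get(self.head)?;
        if entry.phase() != self.expected_phase {
            return None; // nothing new posted at the head yet
        }
        self.head += 1;
        if self.head == queue.len() {
            self.head = 0;
            self.expected_phase = !self.expected_phase;
        }
        Some(entry)
    }
}
```

A caller would invoke `poll` in a bounded loop after ringing the SQ doorbell and treat a non-zero `status()` as a failed command.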

## 7. Not in scope

- The older-family `virtio-scsi` PD path
  (`gcp_scsi_path=no-userspace-provider-driver-out-of-scope`).
- The Local SSD storage path (separate device surface, deferred).
- Multi-namespace, FUA, DSM, reusable `BlockDevice`/filesystem integration on
  live GCE, or live-provider device-autonomous completion delivery (deferred per
  `nvme.md`).
- Direct DMA, IOVA export, IOMMU/remapping programming (the
  `direct-remapping-if-verified` branch of the DMA-backend policy applies
  once a GCE shape with a verified vIOMMU is added; no current probed
  GCE shape satisfies that branch).
- AWS EBS, Azure managed disk, and GCP NIC readiness.
