GCP Persistent Disk (storage)
This is a provenance map for the GCP Persistent Disk (PD) storage shape: how a GCE instance presents its persistent disks to the guest, why most current families expose them as standard NVMe namespaces the shared NVMe foundation already drives, and the small GCP delta capOS adds on top. It is not a re-spec; the NVMe register/queue/PRP wire subset capOS actually touches is documented once in the NVMe page (docs/devices/nvme.md) and not repeated here.
Maturity caveat. This page documents one bounded live-GCE NVMe Persistent
Disk proof on a c3-standard-4 VM, plus the local QEMU/cloudboot proofs that
preceded it. The live proof is a single brokered NVMe READ through provider
authority; it is not a general reusable storage provider, filesystem
integration, virtio-scsi path, Local SSD path, direct-DMA claim, or
device-autonomous MSI-X claim. The older
cloud-prod-storage-bound-local-proof composes production grant surfaces over a
discovered NVMe function and emits
cloudboot-evidence: storage-bound on a local boot of the
make capos-cloudboot-image disk under QEMU. The later
cloud-prod-nvme-brokered-userspace-provider-local-proof child chain drives the
same local QEMU -device nvme surface through brokered controller bring-up,
admin IDENTIFY, I/O queue creation, BlockDevice read/write/flush, a
dedicated data-completion Interrupt route, and multi-PRP windows while
preserving manager-authored queue-base/PRP materialization. The live GCE
closeout is the cloud-gcp-storage-driver run described in §6.
1. Spec basis
- Device: GCE Persistent Disk. GCE exposes attached PD volumes as a block device on the guest PCI surface. The legacy first- and second-generation families use virtio-scsi; current generations (Tau T2A, third-generation-or-later N2/N2D/C3, Confidential VM paths) expose them as NVMe namespaces behind a standard NVMe PCI controller: PCI class 0x01 (mass storage), subclass 0x08 (NVM), programming interface 0x02 (NVM Express). This is the same class triple QEMU emulates with -device nvme and the kernel detects with PciDevice::is_nvme_controller (kernel/src/pci.rs).
- Production PCI identity: the GCE NVMe PD controller carries Google’s PCI vendor id (0x1ae0 on the current generation, distinct from QEMU’s 0x1b36). capOS therefore classifies on the device class surface and the brokered no-IOMMU bounce DMA shape, not on a QEMU vendor-id match (see §3). The live cloud-gcp-storage-driver run confirmed the GCE NVMe PD identity as vendor.1ae0/dev.001f at BDF 0000:00:05.0.
- Authoritative spec: the NVM Express Base Specification (NVMe 1.4 / 2.0) is the wire contract. Google publishes no separate PD register spec because, on the NVMe-family GCE shapes, the device is a standard NVMe controller. Google documents PD device exposure under the “Persistent Disk overview” and “Local SSD” pages (https://cloud.google.com/compute/docs/disks).
- virtio-scsi alternative: older GCE families use virtio-scsi for PD rather than NVMe. capOS has no userspace virtio-scsi provider driver, and the in-tree make run-virtio-blk proves the kernel-owned virtio-blk driver; reusing that path would retain the hidden kernel DMA ownership that the userspace-provider acceptance forbids. The older-family virtio-scsi path is therefore recorded as out of scope here (gcp_scsi_path=no-userspace-provider-driver-out-of-scope), the same shape docs/devices/azure-disk.md records for the Hyper-V/virtio-scsi older-family path.
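The class-triple rule can be sketched in a few lines. This is a minimal illustration, not the body of PciDevice::is_nvme_controller: the PciClass and PciIdentity types and the is_nvme_class function are hypothetical names. It only shows that matching on class/subclass/programming interface (config-space bytes 0x0B/0x0A/0x09) accepts both the GCE (0x1ae0) and QEMU (0x1b36) controllers without ever consulting the vendor id.

```rust
/// PCI class code triple as read from config-space bytes 0x0B, 0x0A, 0x09.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PciClass {
    pub class: u8,
    pub subclass: u8,
    pub prog_if: u8,
}

/// Identity fields a hypothetical probe reads from one PCI function.
#[derive(Clone, Copy, Debug)]
pub struct PciIdentity {
    pub vendor_id: u16,
    pub device_id: u16,
    pub class: PciClass,
}

/// Mass storage (0x01) / NVM (0x08) / NVM Express (0x02).
pub const NVME_CLASS: PciClass = PciClass { class: 0x01, subclass: 0x08, prog_if: 0x02 };

/// Hypothetical helper: classify on the class triple only. The vendor id is
/// deliberately ignored so GCE and QEMU NVMe controllers match identically.
pub fn is_nvme_class(id: &PciIdentity) -> bool {
    id.class == NVME_CLASS
}
```

A virtio-scsi function reports class 0x01 with subclass 0x00 (SCSI), so it falls through this check, which lines up with recording the older-family path as out of scope.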
2. Wire format (shared with docs/devices/nvme.md)
GCE NVMe PD is standard NVMe: the controller registers, admin SQ/CQ
descriptors, IDENTIFY data, I/O SQ/CQ descriptors, PRP entries, and the
on-notify validator scan targets are exactly the ones documented in
NVMe §2. No GCP-specific subset is reproduced here. The
shared NVMe storage-provider foundation
(nvme-bind-claimed-mmio-read,
nvme-controller-reset-selected-write,
nvme-no-iommu-brokered-controller-enable,
nvme-admin-queue-identify,
nvme-admin-interrupt-delivery,
nvme-io-queue-and-read) is the same wire model the local production
cloudboot chain ports into kernel/src/device_manager/stub.rs and the
production grant-source modules. The cloud-gcp-storage-driver closeout
validated that provider/storage binding against the live GCE PD controller
identity and evidence surface for one bounded NVMe READ.
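The six foundation slugs read as a bring-up chain. As a minimal sketch, assuming the listed order is also the dependency order (the Stage enum, CHAIN, and is_ordered_prefix are hypothetical names, not capOS code), a harness could check that the recorded stages form an in-order prefix with nothing skipped:

```rust
/// Hypothetical model of the shared NVMe foundation chain.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Stage {
    BindClaimedMmioRead,
    ControllerResetSelectedWrite,
    NoIommuBrokeredControllerEnable,
    AdminQueueIdentify,
    AdminInterruptDelivery,
    IoQueueAndRead,
}

/// The chain in the order the foundation lists it (assumed dependency order).
pub const CHAIN: [Stage; 6] = [
    Stage::BindClaimedMmioRead,
    Stage::ControllerResetSelectedWrite,
    Stage::NoIommuBrokeredControllerEnable,
    Stage::AdminQueueIdentify,
    Stage::AdminInterruptDelivery,
    Stage::IoQueueAndRead,
];

impl Stage {
    /// The slug each stage carries in the foundation list.
    pub fn slug(self) -> &'static str {
        match self {
            Stage::BindClaimedMmioRead => "nvme-bind-claimed-mmio-read",
            Stage::ControllerResetSelectedWrite => "nvme-controller-reset-selected-write",
            Stage::NoIommuBrokeredControllerEnable => "nvme-no-iommu-brokered-controller-enable",
            Stage::AdminQueueIdentify => "nvme-admin-queue-identify",
            Stage::AdminInterruptDelivery => "nvme-admin-interrupt-delivery",
            Stage::IoQueueAndRead => "nvme-io-queue-and-read",
        }
    }
}

/// True when `done` matches the start of CHAIN exactly: no stage skipped or reordered.
pub fn is_ordered_prefix(done: &[Stage]) -> bool {
    done.len() <= CHAIN.len() && done.iter().zip(CHAIN.iter()).all(|(a, b)| a == b)
}
```

Under this model, a record that jumps from the MMIO bind straight to controller enable fails the check because the reset stage is missing.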
3. capOS mapping
- Cloud-shape classification: report_cloud_nvme_shape in kernel/src/pci.rs (the GCP path) classifies the bound controller against the GCE NVMe surface and emits the nvme: cloud shape classification cloud_shape=gcp-persistent-disk ... proof line on make run-pci-nvme, conjunctively with the bounce-buffer dma: backend selection line.
- DMA backend: backend selection on GCE follows the direct-remapping-if-verified-else-bounce-buffer policy from cloud-dma-backend-selection and the “Cloud DMA Backend” section of docs/dma-isolation-design.md. The 2026-05-24 GCE live probes recorded the n1-standard-1, e2-small, c3-standard-4, and n2d-standard-2 Confidential shapes as IOMMU disabled → SWIOTLB → labeled bounce-buffer in Cloud DMA Provider Evidence Inventory, so the cloud-shape proof line and the production storage-bind proof both run conjunctively with the bounce-buffer DMA backend.
- No host-physical / IOVA export: iova_export=disabled-future-only, host_physical_user_visible=0, direct_dma=blocked, real_dma=not-attempted. This is the same brokered-bounce shape that nvme.md records in §6–§8 and that the production storage-bind proof records in nvme.md §9.
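The backend policy reduces to a small decision. This is a minimal sketch, assuming a three-state IOMMU probe result; IommuProbe, DmaBackend, select_dma_backend, and probed_gce_shapes are hypothetical names, not capOS types. Only a verified IOMMU selects direct remapping; every other state, including every GCE shape probed on 2026-05-24, falls back to the bounce buffer.

```rust
/// Hypothetical outcome of probing a cloud shape for IOMMU support.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum IommuProbe {
    Disabled,          // e.g. "IOMMU disabled → SWIOTLB" in the live probes
    PresentUnverified, // an IOMMU is reported but remapping is not verified
    Verified,          // remapping verified; not observed on any probed GCE shape
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DmaBackend {
    DirectRemapping,
    BounceBuffer,
}

/// direct-remapping-if-verified-else-bounce-buffer.
pub fn select_dma_backend(probe: IommuProbe) -> DmaBackend {
    match probe {
        IommuProbe::Verified => DmaBackend::DirectRemapping,
        IommuProbe::Disabled | IommuProbe::PresentUnverified => DmaBackend::BounceBuffer,
    }
}

/// The probed GCE shapes listed above, all recorded with the IOMMU disabled.
pub fn probed_gce_shapes() -> [(&'static str, IommuProbe); 4] {
    [
        ("n1-standard-1", IommuProbe::Disabled),
        ("e2-small", IommuProbe::Disabled),
        ("c3-standard-4", IommuProbe::Disabled),
        ("n2d-standard-2 Confidential", IommuProbe::Disabled),
    ]
}
```

Treating "present but unverified" the same as "disabled" keeps the fail-safe default: the direct branch is reachable only through explicit verification.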
4. Production storage-bind proof (local QEMU; non-qemu kernel)
cloud-prod-storage-bound-local-proof (the prerequisite of the billable
cloud-gcp-storage-driver slice) lands the production-path NVMe storage-bind
proof on the non-qemu cloud kernel. The implementation, composition, MSI-X
table program, I/O-completion handoff (kernel-side proxy), masked-no-wake,
teardown / stale-handle assertions, headline cloudboot evidence shape, why
the proof is settled with a kernel-side proxy, and asserted proof lines are
documented once in nvme.md §9
and not reproduced here. The cloudboot-evidence: storage-bound marker is parsed
by tools/cloudboot/run-test.sh as STORAGE_BOUND_MARKER into
provider.json.storage_bind_proof.
The local QEMU boot of target/disk.raw (make capos-cloudboot-image,
-device nvme) demonstrates the bound on QEMU’s NVMe class triple; it does not
exercise a live GCE PD NVMe vendor id.
5. Local production brokered NVMe provider chain
The moved parent
cloud-prod-nvme-brokered-userspace-provider-local-proof
closes the local production provider prerequisite through its child records.
The implemented path is the same brokered no-IOMMU shape as nvme.md: the
manager authors AQA/ASQ/ACQ, queue-base pages, PRP1 entries, PRP lists,
doorbells, and completion consumption from live DMAPool ledger records. The
provider sees capability results and returned data bytes, not host-physical
addresses, IOVAs, queue-base values, or provider-authored PRP/SGL fields.
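The register values the manager authors follow the NVMe base specification. The sketch below shows the standard AQA and CC encodings and the doorbell offset arithmetic. The constant and function names are hypothetical, and the chosen CC fields (NVM command set, 4 KiB memory pages, 64-byte SQ and 16-byte CQ entries) are typical values, not necessarily what capOS programs. The ASQ/ACQ base addresses themselves are left out because they stay inside the manager.

```rust
// NVMe controller register offsets within BAR0 (NVMe base specification).
pub const REG_CAP: usize = 0x00; // Controller Capabilities (64-bit)
pub const REG_CC: usize = 0x14; // Controller Configuration
pub const REG_CSTS: usize = 0x1C; // Controller Status (RDY is bit 0)
pub const REG_AQA: usize = 0x24; // Admin Queue Attributes
pub const REG_ASQ: usize = 0x28; // Admin SQ base address (64-bit)
pub const REG_ACQ: usize = 0x30; // Admin CQ base address (64-bit)

/// AQA holds 0-based sizes: ACQS in bits 27:16, ASQS in bits 11:0.
pub fn encode_aqa(asq_entries: u32, acq_entries: u32) -> u32 {
    assert!((2..=4096).contains(&asq_entries), "admin SQ must hold 2..=4096 entries");
    assert!((2..=4096).contains(&acq_entries), "admin CQ must hold 2..=4096 entries");
    ((acq_entries - 1) << 16) | (asq_entries - 1)
}

/// CC with EN set, CSS=0 (NVM command set), MPS=0 (4 KiB pages),
/// IOSQES=6 (64-byte SQ entries) and IOCQES=4 (16-byte CQ entries).
pub fn encode_cc_enable() -> u32 {
    let en = 1u32;
    let iosqes = 6u32 << 16;
    let iocqes = 4u32 << 20;
    en | iosqes | iocqes
}

/// Doorbell stride is 4 << CAP.DSTRD, with DSTRD in CAP bits 35:32.
fn doorbell_stride(cap: u64) -> usize {
    4usize << ((cap >> 32) & 0xF)
}

/// Tail doorbell offset for submission queue `qid`.
pub fn sq_tail_doorbell(cap: u64, qid: u16) -> usize {
    0x1000 + (2 * qid as usize) * doorbell_stride(cap)
}

/// Head doorbell offset for completion queue `qid`.
pub fn cq_head_doorbell(cap: u64, qid: u16) -> usize {
    0x1000 + (2 * qid as usize + 1) * doorbell_stride(cap)
}
```

In the standard enable sequence, AQA/ASQ/ACQ are written while CC.EN is 0, then CC is written with EN set and CSTS.RDY is polled until it reads 1.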
The local evidence covers:
- brokered controller enable and admin IDENTIFY;
- I/O queue creation, bounded READ/WRITE, second-LBA and multiblock I/O;
- BlockDevice.readBlocks, writeBlocks, and FLUSH-backed higher-level consumers over the NvmeBrokered backend;
- dedicated data-path Interrupt.wait / Interrupt.acknowledge completion proof;
- multi-PRP windows larger than one PRP1 page, with PRP list entries written by the manager.
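How many PRP entries a window needs follows from the NVMe PRP rules: PRP1 covers the first (possibly offset) page, PRP2 is the second page when exactly two pages are touched, and otherwise PRP2 points at a PRP list. A minimal sketch of that arithmetic, with hypothetical names (prp_layout, Prp2); it ignores list chaining, which only matters once the list needs more than page_size / 8 entries (512 with 4 KiB pages).

```rust
/// What PRP2 holds for a given transfer.
#[derive(Debug, PartialEq, Eq)]
pub enum Prp2 {
    Unused,                              // transfer fits in the PRP1 page
    SecondPage,                          // exactly two pages: PRP2 is page two
    ListPointer { list_entries: usize }, // PRP2 points at a list of the rest
}

/// Returns (pages touched, PRP2 role) for a transfer of `len` bytes whose
/// first byte sits `first_offset` bytes into its page.
pub fn prp_layout(first_offset: usize, len: usize, page_size: usize) -> (usize, Prp2) {
    assert!(page_size.is_power_of_two(), "memory page size must be a power of two");
    assert!(first_offset < page_size && len > 0);
    let pages = (first_offset + len + page_size - 1) / page_size;
    let prp2 = match pages {
        1 => Prp2::Unused,
        2 => Prp2::SecondPage,
        n => Prp2::ListPointer { list_entries: n - 1 }, // every page after PRP1's
    };
    (pages, prp2)
}
```

A 16 KiB window starting on a page boundary touches four pages, so PRP2 points at a three-entry list; in the brokered model the manager writes that list from DMAPool ledger pages.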
This remains the local QEMU/cloudboot foundation under the same brokered authority model. The billable real-GCE Persistent Disk bind run is the bounded NVMe evidence in §6.
6. Live GCE NVMe Persistent Disk proof
cloud-gcp-storage-driver closed with live GCE run 1780806087-bf69, launched
by make cloudboot-gcp-storage-nvme-io-read-test at source commit
28518165518c29a48633682f4a6d9b5844c43335. The run used a c3-standard-4
instance in europe-west3-a with storage_interface=nvme. The harness launched
with the GVNIC guest feature and NIC type because C3 requires that launch
posture; this storage page does not claim a gVNIC driver or NIC datapath proof.
The evidence identified the GCE PD NVMe controller as class 01.08.02,
vendor.1ae0, device.001f, BDF 0000:00:05.0, with
selected_dma_backend=bounce_buffer and enumeration_source=legacy-io. The
manager drove the shared brokered NVMe chain: admin IDENTIFY, I/O CQ/SQ
creation, and one I/O READ against NSID 1, SLBA 0, NLB 1 / 512 bytes. The
serial marker recorded live_cloud=gce-persistent-disk,
io_read=completed, io_sq_doorbell=performed,
io_cq_completion=polled-io-cq, prp_source=manager-ledger,
host_physical_user_visible=0, and iova_export=disabled-future-only. The
read digest prefix was eb3c904c494d494e4520200002000000.
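The READ issued here is an ordinary 64-byte NVMe submission queue entry. A minimal sketch of its layout per the NVMe base specification, reading "NLB 1" as one logical block: build_read_sqe is a hypothetical name, and the PRP values are placeholders, since in capOS the manager materializes PRP1 from the DMAPool ledger and the provider never sees it. Note that the NLB field is 0-based, so one block encodes as 0.

```rust
/// NVM command set opcode for Read.
pub const NVME_CMD_READ: u8 = 0x02;

/// Builds a 16-dword NVMe READ submission queue entry using PRPs.
pub fn build_read_sqe(cid: u16, nsid: u32, slba: u64, blocks: u32, prp1: u64, prp2: u64) -> [u32; 16] {
    assert!((1..=65536).contains(&blocks), "NLB is a 16-bit 0-based count");
    let mut sqe = [0u32; 16];
    // CDW0: opcode in bits 7:0, FUSE=0, PSDT=0 (PRPs), command id in bits 31:16.
    sqe[0] = u32::from(NVME_CMD_READ) | (u32::from(cid) << 16);
    sqe[1] = nsid; // NSID
    sqe[6] = prp1 as u32; // PRP1 low
    sqe[7] = (prp1 >> 32) as u32; // PRP1 high
    sqe[8] = prp2 as u32; // PRP2 low (unused for a single page)
    sqe[9] = (prp2 >> 32) as u32; // PRP2 high
    sqe[10] = slba as u32; // CDW10: starting LBA low
    sqe[11] = (slba >> 32) as u32; // CDW11: starting LBA high
    sqe[12] = blocks - 1; // CDW12: NLB (0-based), no FUA or limited retry
    sqe
}
```

With a 512-byte LBA format, the live read corresponds to build_read_sqe(cid, 1, 0, 1, prp1, 0): one block, so 512 bytes land in a single PRP1 page.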
The capOS authority mapping is the same one recorded in nvme.md: DeviceMmio
gates BAR register and doorbell effects, DMAPool owns queue/data pages and
manager-authored PRP materialization, and Interrupt is present as the bounded
provider authority surface. The live read proof polls the I/O CQ; it does not
claim device-autonomous MSI-X delivery. The cloud harness evidence also recorded
no public IP, no service account, and teardown_status=complete.
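The serial marker fields above are the acceptance surface for this run. A minimal sketch of how a checker might validate them, assuming the marker is one serial line of whitespace-separated key=value tokens (the exact line layout is not reproduced on this page); parse_fields, REQUIRED, and missing_or_wrong are hypothetical names.

```rust
use std::collections::HashMap;

/// Splits whitespace-separated key=value tokens; other tokens are ignored.
pub fn parse_fields(line: &str) -> HashMap<&str, &str> {
    line.split_whitespace()
        .filter_map(|tok| tok.split_once('='))
        .collect()
}

/// Fields and values the live read recorded, per the evidence described above.
pub const REQUIRED: &[(&str, &str)] = &[
    ("live_cloud", "gce-persistent-disk"),
    ("io_read", "completed"),
    ("io_sq_doorbell", "performed"),
    ("io_cq_completion", "polled-io-cq"),
    ("prp_source", "manager-ledger"),
    ("host_physical_user_visible", "0"),
    ("iova_export", "disabled-future-only"),
];

/// Returns the required keys that are absent or carry a different value.
pub fn missing_or_wrong(line: &str) -> Vec<&'static str> {
    let fields = parse_fields(line);
    REQUIRED
        .iter()
        .filter(|(k, v)| fields.get(k) != Some(v))
        .map(|(k, _)| *k)
        .collect()
}
```

Checking exact values rather than key presence means a marker reporting host_physical_user_visible=1 fails even though the key is present.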
7. Not in scope
- The older-family virtio-scsi PD path (gcp_scsi_path=no-userspace-provider-driver-out-of-scope).
- The Local SSD storage path (separate device surface, deferred).
- Multi-namespace, FUA, DSM, reusable BlockDevice/filesystem integration on live GCE, or live-provider device-autonomous completion delivery (deferred per nvme.md).
- Direct DMA, IOVA export, IOMMU/remapping programming (the direct-remapping-if-verified branch of the DMA-backend policy applies once a GCE shape with a verified vIOMMU is added; no currently probed GCE shape satisfies that branch).
- AWS EBS, Azure managed disk, and GCP NIC readiness.