GCP Persistent Disk (storage)

This is a provenance map for the GCP Persistent Disk (PD) storage shape: how a GCE instance presents its persistent disks to the guest, why most current families expose them as standard NVMe namespaces that the shared NVMe foundation already drives, and the small GCP-specific delta capOS adds on top. It is not a re-spec: the subset of NVMe registers, queues, and PRPs that capOS actually touches is documented once in docs/devices/nvme.md and is not repeated here.

Maturity caveat. This page documents one bounded live-GCE NVMe Persistent Disk proof on a c3-standard-4 VM, plus the local QEMU/cloudboot proofs that preceded it. The live proof is a single brokered NVMe READ through provider authority; it is not a general reusable storage provider, filesystem integration, virtio-scsi path, Local SSD path, direct-DMA claim, or device-autonomous MSI-X claim.

The older cloud-prod-storage-bound-local-proof composes production grant surfaces over a discovered NVMe function and emits cloudboot-evidence: storage-bound when the disk image built by make capos-cloudboot-image is booted locally under QEMU. The later cloud-prod-nvme-brokered-userspace-provider-local-proof child chain drives the same local QEMU -device nvme surface through brokered controller bring-up, admin IDENTIFY, I/O queue creation, BlockDevice read/write/flush, a dedicated data-completion Interrupt route, and multi-PRP windows, while preserving manager-authored queue-base/PRP materialization. The live GCE closeout is the cloud-gcp-storage-driver run described in §6.

1. Spec basis

  • Device: GCE Persistent Disk. GCE exposes attached PD volumes as block devices on the guest PCI surface. The legacy first-/second-generation families use virtio-scsi; current generations (Tau T2A, third-generation-or-later N2/N2D/C3, Confidential VM paths) expose them as NVMe namespaces behind a standard NVMe PCI controller – PCI class 0x01 (mass storage), subclass 0x08 (NVM), programming interface 0x02 (NVM Express) – the same class triple that QEMU emulates with -device nvme and that the kernel detects with PciDevice::is_nvme_controller (kernel/src/pci.rs).
  • Production PCI identity: the GCE NVMe PD controller carries Google’s PCI vendor id (current generation 0x1ae0, distinct from QEMU’s 0x1b36). capOS therefore classifies on the device class surface and the brokered no-IOMMU bounce DMA shape, not on a QEMU vendor-id match (see §3). The live cloud-gcp-storage-driver run confirmed the GCE NVMe PD identity as vendor.1ae0 / dev.001f on BDF 0000:00:05.0.
  • Authoritative spec: the NVM Express Base Specification (NVMe 1.4 / 2.0) is the wire contract; Google publishes no separate PD register spec because the device is a standard NVMe controller on the NVMe-family GCE shapes. Google documents PD device exposure under the “Persistent Disk overview” and “Local SSD” pages (https://cloud.google.com/compute/docs/disks).
  • virtio-scsi alternative: older GCE families use virtio-scsi rather than NVMe for PD. capOS has no userspace virtio-scsi provider driver, and the in-tree make run-virtio-blk target proves the kernel-owned virtio-blk driver; relying on that driver would retain the hidden kernel DMA ownership that the userspace-provider acceptance criteria forbid. The older-family virtio-scsi path is therefore recorded as out of scope here (gcp_scsi_path=no-userspace-provider-driver-out-of-scope), mirroring how docs/devices/azure-disk.md records the Hyper-V/virtio-scsi older-family path.
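The classification rule above (match on the PCI class triple, treat the vendor id as an evidence label rather than as the gate) can be sketched in Rust. This is a minimal, hypothetical illustration: PciIdentity, StorageShape, and classify are invented names, not the kernel's PciDevice::is_nvme_controller, and the vendor ids are the ones quoted on this page.

```rust
/// PCI class triple of an NVM Express controller:
/// base class 0x01 (mass storage), subclass 0x08 (NVM), prog-if 0x02 (NVMe).
const NVME_CLASS: (u8, u8, u8) = (0x01, 0x08, 0x02);

/// Vendor ids quoted on this page; used only to label the result.
const GOOGLE_VENDOR_ID: u16 = 0x1ae0;
const QEMU_VENDOR_ID: u16 = 0x1b36;

/// Hypothetical stand-in for a PCI function's identity fields.
#[derive(Clone, Copy, Debug)]
pub struct PciIdentity {
    pub vendor: u16,
    pub device: u16,
    pub class: u8,
    pub subclass: u8,
    pub prog_if: u8,
}

#[derive(Debug, PartialEq, Eq)]
pub enum StorageShape {
    /// NVMe by class triple, Google vendor id (e.g. GCE PD on C3).
    GceNvme,
    /// NVMe by class triple, QEMU vendor id (local `-device nvme`).
    QemuNvme,
    /// NVMe by class triple, vendor not recognised; still NVMe-shaped.
    OtherNvme,
    /// Not an NVMe controller (e.g. virtio-scsi); out of scope here.
    NotNvme,
}

/// Gate on the class triple first; the vendor id never admits a device.
pub fn classify(id: &PciIdentity) -> StorageShape {
    if (id.class, id.subclass, id.prog_if) != NVME_CLASS {
        return StorageShape::NotNvme;
    }
    match id.vendor {
        GOOGLE_VENDOR_ID => StorageShape::GceNvme,
        QEMU_VENDOR_ID => StorageShape::QemuNvme,
        _ => StorageShape::OtherNvme,
    }
}
```

A GCE PD controller (vendor 0x1ae0, device 0x001f, class 01.08.02) classifies as GceNvme, while a virtio-scsi function (class 0x01, subclass 0x00) falls through to NotNvme, matching the out-of-scope record for the older-family path.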

2. Wire format (shared with docs/devices/nvme.md)

GCE NVMe PD is standard NVMe: the controller registers, admin SQ/CQ descriptors, IDENTIFY data, I/O SQ/CQ descriptors, PRP entries, and the on-notify validator scan targets are exactly the ones documented in nvme.md §2. No GCP-specific subset is reproduced here. The shared NVMe storage-provider foundation (nvme-bind-claimed-mmio-read, nvme-controller-reset-selected-write, nvme-no-iommu-brokered-controller-enable, nvme-admin-queue-identify, nvme-admin-interrupt-delivery, nvme-io-queue-and-read) is the same wire model that the local production cloudboot chain ports into kernel/src/device_manager/stub.rs and the production grant-source modules. The cloud-gcp-storage-driver closeout validated that provider/storage binding against the live GCE PD controller identity and evidence surface for one bounded NVMe READ.
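For orientation, the doorbell part of that shared register model follows a fixed formula in the NVMe Base Specification: queue y's submission tail doorbell sits at BAR0 offset 0x1000 + (2y)·(4 << CAP.DSTRD), and its completion head doorbell at 0x1000 + (2y + 1)·(4 << CAP.DSTRD). The helper names below are hypothetical; in capOS these writes are performed by the manager under DeviceMmio, not computed by the provider.

```rust
/// CAP.DSTRD is bits 35:32 of the 64-bit CAP register at BAR0 offset 0x00.
pub fn cap_dstrd(cap: u64) -> u32 {
    ((cap >> 32) & 0xf) as u32
}

/// BAR0 byte offset of the Submission Queue `qid` Tail Doorbell.
pub fn sq_tail_doorbell(qid: u16, dstrd: u32) -> usize {
    0x1000 + (2 * qid as usize) * (4usize << dstrd)
}

/// BAR0 byte offset of the Completion Queue `qid` Head Doorbell.
pub fn cq_head_doorbell(qid: u16, dstrd: u32) -> usize {
    0x1000 + (2 * qid as usize + 1) * (4usize << dstrd)
}
```

With DSTRD = 0 (a 4-byte stride), the admin queue pair uses 0x1000/0x1004 and I/O queue 1 uses 0x1008/0x100c.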

3. capOS mapping

  • Cloud-shape classification: kernel/src/pci.rs report_cloud_nvme_shape (the GCP path) classifies the bound controller against the GCE NVMe surface and emits the nvme: cloud shape classification cloud_shape=gcp-persistent-disk ... proof line on make run-pci-nvme, conjunctively with the bounce-buffer dma: backend selection line.
  • DMA backend: on GCE, backend selection follows the direct-remapping-if-verified-else-bounce-buffer policy from cloud-dma-backend-selection and the "Cloud DMA Backend" section of docs/dma-isolation-design.md. The 2026-05-24 GCE live probes, recorded in the Cloud DMA Provider Evidence Inventory, found the n1-standard-1, e2-small, c3-standard-4, and n2d-standard-2 (Confidential) shapes all at IOMMU disabled → SWIOTLB, so each is labeled bounce-buffer. The cloud-shape proof line and the production storage-bind proof therefore both run conjunctively with the bounce-buffer DMA backend.
  • No host-physical / IOVA export: iova_export=disabled-future-only, host_physical_user_visible=0, direct_dma=blocked, real_dma=not-attempted. This is the same brokered-bounce shape that nvme.md records in §6–§8 and that the production storage-bind proof records in nvme.md §9.
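The selection policy and export posture above can be summarized as a small Rust sketch, assuming a hypothetical IommuProbe summary of what the boot-time probe found. It is illustrative only: the real logic lives in cloud-dma-backend-selection and docs/dma-isolation-design.md, and the sketch deliberately emits no fields for the direct-remapping branch, because no probed GCE shape takes it and this page records no evidence shape for it.

```rust
/// Hypothetical summary of a boot-time IOMMU probe.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum IommuProbe {
    /// No IOMMU exposed to the guest (the 2026-05-24 GCE probes: SWIOTLB).
    Disabled,
    /// An IOMMU is present but its remapping has not been verified.
    PresentUnverified,
    /// A vIOMMU whose remapping has been verified.
    Verified,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DmaBackend {
    DirectRemapping,
    BounceBuffer,
}

/// direct-remapping-if-verified-else-bounce-buffer.
pub fn select_backend(probe: IommuProbe) -> DmaBackend {
    match probe {
        IommuProbe::Verified => DmaBackend::DirectRemapping,
        IommuProbe::Disabled | IommuProbe::PresentUnverified => DmaBackend::BounceBuffer,
    }
}

/// Export-posture fields this page lists for the brokered bounce shape.
pub const BOUNCE_POSTURE: [&str; 4] = [
    "iova_export=disabled-future-only",
    "host_physical_user_visible=0",
    "direct_dma=blocked",
    "real_dma=not-attempted",
];

/// Posture fields for the selected backend, or None for the direct branch,
/// whose evidence shape is not recorded on this page.
pub fn bounce_proof_fields(probe: IommuProbe) -> Option<String> {
    match select_backend(probe) {
        DmaBackend::BounceBuffer => Some(BOUNCE_POSTURE.join(" ")),
        DmaBackend::DirectRemapping => None,
    }
}
```

An unverified IOMMU is treated exactly like a missing one: only a verified remapping unit moves the decision off the bounce buffer.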

4. Production storage-bind proof (local QEMU; non-qemu kernel)

cloud-prod-storage-bound-local-proof (the prerequisite of the billable cloud-gcp-storage-driver slice) lands the production-path NVMe storage-bind proof on the non-qemu cloud kernel. The implementation, composition, MSI-X table programming, I/O-completion handoff (kernel-side proxy), masked-no-wake behavior, teardown / stale-handle assertions, headline cloudboot evidence shape, the rationale for settling the proof with a kernel-side proxy, and the asserted proof lines are documented once in nvme.md §9 and are not reproduced here. tools/cloudboot/run-test.sh parses the storage-bound marker as STORAGE_BOUND_MARKER and records it in provider.json.storage_bind_proof.
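tools/cloudboot/run-test.sh is a shell harness, but a parsing step of the kind it performs (find the marker line, keep its key=value tokens) can be sketched in Rust for consistency with the other examples here. Whether the harness splits tokens exactly this way is an assumption; the example marker line is illustrative, the authoritative marker shape is in nvme.md §9, and parse_marker is a hypothetical helper, not part of the harness.

```rust
use std::collections::BTreeMap;

/// Split a serial-log line into key=value fields after a marker prefix.
/// Returns None when the line does not start with `marker`; bare tokens
/// without '=' (such as a status word) are skipped.
pub fn parse_marker<'a>(line: &'a str, marker: &str) -> Option<BTreeMap<&'a str, &'a str>> {
    let rest = line.trim().strip_prefix(marker)?;
    let fields: BTreeMap<&str, &str> = rest
        .split_whitespace()
        .filter_map(|tok| tok.split_once('='))
        .collect();
    Some(fields)
}
```

A BTreeMap keeps the fields in a stable order, which makes the resulting JSON object diff-friendly across runs.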

The local QEMU boot of target/disk.raw (make capos-cloudboot-image, -device nvme) demonstrates the bound on QEMU’s NVMe class triple; it does not exercise a live GCE PD NVMe vendor id.

5. Local production brokered NVMe provider chain

The moved parent cloud-prod-nvme-brokered-userspace-provider-local-proof closes the local production provider prerequisite through its child records. The implemented path is the same brokered no-IOMMU shape as nvme.md: the manager authors AQA/ASQ/ACQ, queue-base pages, PRP1 entries, PRP lists, doorbells, and completion consumption from live DMAPool ledger records. The provider sees capability results and returned data bytes, not host-physical addresses, IOVAs, queue-base values, or provider-authored PRP/SGL fields.

The local evidence covers:

  • brokered controller enable and admin IDENTIFY;
  • I/O queue creation, bounded READ/WRITE, second-LBA and multiblock I/O;
  • BlockDevice.readBlocks, writeBlocks, and FLUSH-backed higher-level consumers over the NvmeBrokered backend;
  • dedicated data-path Interrupt.wait / Interrupt.acknowledge completion proof;
  • multi-PRP windows larger than one PRP1 page, with PRP list entries written by the manager.
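The PRP rule the manager applies for these windows comes from the NVMe Base Specification: PRP1 carries the first data address (with any page offset), PRP2 carries the second page address when the transfer touches exactly two pages, and otherwise PRP2 points at a PRP list holding the remaining page addresses. The Rust sketch below assumes 4 KiB memory pages (CC.MPS = 0), takes page addresses as plain u64 values standing in for DMAPool ledger records, and refuses windows that would need a chained list; plan_prps and PrpPlan are hypothetical names. In capOS none of these values leave the manager.

```rust
/// Assumed memory page size (CC.MPS = 0 selects 4 KiB).
pub const PAGE: u64 = 4096;

#[derive(Debug, PartialEq, Eq)]
pub struct PrpPlan {
    pub prp1: u64,
    pub prp2: u64,
    /// Entries to write into the PRP list page; empty unless prp2 points at it.
    pub list: Vec<u64>,
}

/// Plan PRP1/PRP2 (and a single PRP list page) for `len` bytes starting at
/// `offset` within `pages[0]`. `pages` are page-aligned data pages and
/// `list_page` is a page-aligned page reserved for the PRP list.
pub fn plan_prps(pages: &[u64], offset: u64, len: u64, list_page: u64) -> Option<PrpPlan> {
    // PRP entries must be dword aligned and the offset must lie in page 0.
    if len == 0 || offset >= PAGE || offset % 4 != 0 {
        return None;
    }
    if pages.iter().any(|p| p % PAGE != 0) || list_page % PAGE != 0 {
        return None;
    }
    // Number of memory pages the transfer touches.
    let npages = ((offset + len + PAGE - 1) / PAGE) as usize;
    if npages > pages.len() {
        return None;
    }
    let prp1 = pages[0] + offset;
    match npages {
        1 => Some(PrpPlan { prp1, prp2: 0, list: Vec::new() }),
        2 => Some(PrpPlan { prp1, prp2: pages[1], list: Vec::new() }),
        n => {
            let entries = &pages[1..n];
            // Stay within one list page; list chaining is not modelled here.
            if entries.len() > (PAGE / 8) as usize - 1 {
                return None;
            }
            Some(PrpPlan { prp1, prp2: list_page, list: entries.to_vec() })
        }
    }
}
```

A window larger than two pages is exactly the case where PRP2 stops being a data pointer, which is why the list entries have to be written by whoever owns the page ledger.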

This remains the local QEMU/cloudboot foundation under the same brokered authority model. The billable real-GCE Persistent Disk bind run is the bounded NVMe evidence in §6.

6. Live GCE NVMe Persistent Disk proof

cloud-gcp-storage-driver closed with live GCE run 1780806087-bf69, launched by make cloudboot-gcp-storage-nvme-io-read-test at source commit 28518165518c29a48633682f4a6d9b5844c43335. The run used a c3-standard-4 instance in europe-west3-a with storage_interface=nvme. The harness launched the instance with the GVNIC guest OS feature and gVNIC NIC type because C3 requires that launch posture; this storage page does not claim a gVNIC driver or NIC datapath proof.

The evidence identified the GCE PD NVMe controller as class 01.08.02, vendor.1ae0, device.001f, BDF 0000:00:05.0, with selected_dma_backend=bounce_buffer and enumeration_source=legacy-io. The manager drove the shared brokered NVMe chain: admin IDENTIFY, I/O CQ/SQ creation, and one I/O READ against NSID 1, SLBA 0, NLB 1 / 512 bytes. The serial marker recorded live_cloud=gce-persistent-disk, io_read=completed, io_sq_doorbell=performed, io_cq_completion=polled-io-cq, prp_source=manager-ledger, host_physical_user_visible=0, and iova_export=disabled-future-only. The read digest prefix was eb3c904c494d494e4520200002000000.
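The single READ in this run (NSID 1, SLBA 0, one 512-byte block) corresponds to one 64-byte NVMe submission queue entry. The sketch encodes it following the NVMe Base Specification layout; note that the NLB field is zero-based, so a one-block read stores 0. read_sqe is a hypothetical helper, and in the live run the PRP fields came from the manager ledger (prp_source=manager-ledger), not from provider code.

```rust
/// NVM command set opcode for Read.
pub const NVME_OP_READ: u8 = 0x02;

/// Build the 16 dwords of an NVMe Read submission queue entry using PRPs.
/// Returns None if `blocks` cannot be expressed in the 16-bit NLB field.
pub fn read_sqe(cid: u16, nsid: u32, slba: u64, blocks: u32, prp1: u64, prp2: u64) -> Option<[u32; 16]> {
    // NLB is a zero-based 16-bit count, so 1..=65536 blocks are encodable.
    if blocks == 0 || blocks > 0x1_0000 {
        return None;
    }
    let mut sqe = [0u32; 16];
    // CDW0: opcode in bits 7:0, FUSE = 0, PSDT = 0 (PRPs), command id in 31:16.
    sqe[0] = (NVME_OP_READ as u32) | ((cid as u32) << 16);
    // CDW1: namespace id.
    sqe[1] = nsid;
    // CDW6-7: PRP entry 1; CDW8-9: PRP entry 2.
    sqe[6] = prp1 as u32;
    sqe[7] = (prp1 >> 32) as u32;
    sqe[8] = prp2 as u32;
    sqe[9] = (prp2 >> 32) as u32;
    // CDW10-11: starting LBA, low then high dword.
    sqe[10] = slba as u32;
    sqe[11] = (slba >> 32) as u32;
    // CDW12 bits 15:0: number of logical blocks, zero-based; FUA/LR left clear.
    sqe[12] = blocks - 1;
    Some(sqe)
}
```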

The capOS authority mapping is the same one recorded in nvme.md: DeviceMmio gates BAR register and doorbell effects, DMAPool owns queue/data pages and manager-authored PRP materialization, and Interrupt is present as the bounded provider authority surface. The live read proof polls the I/O CQ; it does not claim device-autonomous MSI-X delivery. The cloud harness evidence also recorded no public IP, no service account, and teardown_status=complete.
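Because the live proof polls the I/O CQ rather than waiting on MSI-X, completion detection relies on the NVMe phase tag: the host expects phase 1 on its first pass over zeroed queue memory and flips its expectation each time the head wraps. The Rust sketch models that consumer side with hypothetical Cqe and CqPoller types; it is not capOS's manager code, which consumes completions from DMAPool ledger records.

```rust
/// One 16-byte completion queue entry as four dwords.
#[derive(Clone, Copy, Debug, Default)]
pub struct Cqe {
    pub dw: [u32; 4],
}

impl Cqe {
    /// DW3 bits 15:0: command identifier.
    pub fn cid(&self) -> u16 {
        self.dw[3] as u16
    }
    /// DW3 bit 16: phase tag.
    pub fn phase(&self) -> bool {
        (self.dw[3] >> 16) & 1 == 1
    }
    /// DW3 bits 31:17: status field; zero means success.
    pub fn status(&self) -> u16 {
        (self.dw[3] >> 17) as u16
    }
}

/// Host-side consumer state for one completion queue of `size` entries.
pub struct CqPoller {
    pub head: u16,
    pub size: u16,
    pub phase: bool,
}

impl CqPoller {
    pub fn new(size: u16) -> Self {
        CqPoller { head: 0, size, phase: true }
    }

    /// Return the entry at `head` if its phase tag is the expected one,
    /// advancing `head` and flipping the expected phase on wrap. The caller
    /// would then write `head` to the CQ head doorbell.
    pub fn poll(&mut self, ring: &[Cqe]) -> Option<Cqe> {
        let cqe = ring[self.head as usize];
        if cqe.phase() != self.phase {
            return None;
        }
        self.head += 1;
        if self.head == self.size {
            self.head = 0;
            self.phase = !self.phase;
        }
        Some(cqe)
    }
}
```

After the head wraps, entries left over from the previous pass still carry the old phase, so they are not mistaken for new completions.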

7. Not in scope

  • The older-family virtio-scsi PD path (gcp_scsi_path=no-userspace-provider-driver-out-of-scope).
  • The Local SSD storage path (separate device surface, deferred).
  • Multi-namespace, FUA, DSM, reusable BlockDevice/filesystem integration on live GCE, or live-provider device-autonomous completion delivery (deferred per nvme.md).
  • Direct DMA, IOVA export, IOMMU/remapping programming (the direct-remapping-if-verified branch of the DMA-backend policy applies once a GCE shape with a verified vIOMMU is added; no current probed GCE shape satisfies that branch).
  • AWS EBS, Azure managed disk, and GCP NIC readiness.