Cloud Driver Foundation: Gap Analysis

Premise Correction

A prior framing held that “capOS has no userspace device-driver foundation.” That is wrong. The userspace virtio driver foundation exists and is proven in QEMU across a month of landed DDF work. This document establishes precisely what the foundation covers and reduces each blocked cloud-driver task to its narrow real remaining gap, so no one re-implements a foundation that already exists.

What The Foundation Already Provides (proven, in docs/tasks/done/)

  • Device-agnostic virtio DMA/notify seam + relocated queue/discovery (ddf-virtio-driver-foundation-boundary, 2026-05-25). The split-ring Virtqueue and discover_modern_transport live in kernel/src/virtio.rs mod transport, driven through the VirtqueueDma seam (preflight/register/allocate/free/record-submission/record-completion over the device_dma ledger). virtio-net is one caller of the seam, not the only possible caller – a non-net virtio device (e.g. virtio-blk) can drive the same bounded ledger semantics. Proofs: make run-net, make run-ddf-provider-consumer.
  • Userspace provider owns the selected virtio-net TX queue end-to-end (ddf-provider-virtio-net-driver-closeout, 2026-05-23). A userspace process publishes real selected-queue TX descriptors, rings the doorbell through a DeviceMmio notify-write claim, consumes the TX used-ring completion, and exposes CQ identity – all through user-mode DMAPool/DMABuffer/DeviceMmio/Interrupt authority, with no silent fallback to the in-kernel virtio-net TX helper while the provider owns TX. RX is bounded synthetic-token CQ identity (kernel RX cohabitation explicit). DMA backend is manager-owned bounce buffers.
  • Manager-granted provider/consumer authority lifecycle (ddf-userspace-driver-provider-consumer, 2026-05-11). A userspace provider consumes manifest-granted DMAPool/DeviceMmio/Interrupt authority; stale-authority rejection, revoke, and release/reset/driver-death teardown are proven.
  • GCP virtio-net function bound through the gate locally in QEMU (cloud-gcp-virtio-net-local-qemu-binding, 2026-05-26). The enumerated/bound function matches the documented GCP 1st/2nd-gen virtio-net surface (vendor 0x1af4), and the resolved DMA backend is the labeled bounce-buffer path. Proofs: make run-net, make run-ddf-provider-consumer.
  • DMA backend selection (cloud-dma-backend-selection, 2026-05-24): boot probe -> fail-closed select -> manifest override; GCE resolves to bounce-buffer.
  • Production IOMMU remapping closeout (ddf-iommu-remapping-production-closeout, 2026-05-23): the direct-remapping domain path for IOMMU shapes (make run-iommu-remapping).
  • First BlockDevice CapObject (ddf-blockdevice-boundary-virtio-blk-smoke, 2026-05-25): a bounded sector write/read-back over virtio-blk (make run-virtio-blk). Note: this BlockDevice is kernel-side, over manager-owned bounce buffers – it is not a userspace storage provider.
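
The DMA backend selection item above (boot probe -> fail-closed select -> manifest override) can be read as a small decision function. The following is a minimal sketch under stated assumptions, not the capOS implementation: every type and function name (DmaBackend, ProbeResult, SelectError, select_backend) is hypothetical, and the rule that an override cannot request remapping without a verified IOMMU is an assumption about what "fail-closed" implies here.

```rust
// Illustrative sketch only: all names below are hypothetical, not capOS APIs.

/// The two DMA backends this document names.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DmaBackend {
    BounceBuffer,
    IommuRemapping,
}

/// Assumed shape of the boot-time probe outcome.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProbeResult {
    /// A remapping IOMMU was found and verified.
    VerifiedIommu,
    /// No IOMMU is visible (the shape the text reports for GCE).
    NoIommu,
    /// The probe could not reach a conclusion.
    Inconclusive,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SelectError {
    ProbeInconclusive,
    OverrideUnsupported(DmaBackend),
}

/// Probe -> fail-closed select -> manifest override.
pub fn select_backend(
    probe: ProbeResult,
    manifest_override: Option<DmaBackend>,
) -> Result<DmaBackend, SelectError> {
    // Fail closed: an inconclusive probe yields an error, never a default.
    let selected = match probe {
        ProbeResult::VerifiedIommu => DmaBackend::IommuRemapping,
        ProbeResult::NoIommu => DmaBackend::BounceBuffer,
        ProbeResult::Inconclusive => return Err(SelectError::ProbeInconclusive),
    };

    // Assumption: the manifest may pick bounce buffers anywhere, but may
    // only pick remapping when the probe verified an IOMMU.
    match manifest_override {
        None => Ok(selected),
        Some(DmaBackend::BounceBuffer) => Ok(DmaBackend::BounceBuffer),
        Some(DmaBackend::IommuRemapping) if probe == ProbeResult::VerifiedIommu => {
            Ok(DmaBackend::IommuRemapping)
        }
        Some(requested) => Err(SelectError::OverrideUnsupported(requested)),
    }
}
```

Under these assumptions, a no-IOMMU probe with no override resolves to the bounce-buffer backend, which matches the GCE result stated above.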

Boundary Of The Foundation (where userspace ownership stops today)

  • NIC: userspace owns virtio-net TX; RX is synthetic/cohabited. No live hardware RX used-ring ownership, no direct DMA/IOMMU on the provider path, no cloud enumeration.
  • Storage: there is no userspace storage provider of any device class. The BlockDevice cap is kernel-side; NVMe is metadata-only (kernel/src/pci.rs enumerates the controller and emits a no-authority/no-driver ... controller_init=not-started line, no register/queue/IDENTIFY/IO code). The NIC userspace driver does not transfer to storage: NVMe is a different device class (admin/IO submission+completion queue pairs, doorbells, PRP/SGL), and even userspace virtio-blk/virtio-scsi has no provider driver – the foundation seam makes it possible, but no slice has built it.
  • Production grant sources stage an arbitrary function through one device-agnostic entry point (done 2026-05-30). The non-qemu {dmapool,devicemmio,interrupt}_grant_source_prod statics previously inferred their candidate function from a hardcoded selection rule narrowed by #[cfg(feature = "cloud_*")] blocks scattered through each pick_candidate body. cloud-prod-grant-source-despecialization replaced that with one stage_with_class entry point per source that takes an explicit ProdGrantClass device-class descriptor (cap::prod_grant_source_class): AnyFunction (plain BAR / first usable function), DmaCapable (virtio or NVMe), or NvmeController (NVMe only); the DeviceMmio source additionally takes the explicit mapped-window length (one page for the plain/virtio-net notify family, two pages for the NVMe CC/admin-register selected-write region). The no-arg init() wrappers select the build’s descriptor and delegate, so a non-virtio-net function is staged by passing the matching descriptor rather than by reaching virtio-net-specific code. The transitional in-kernel qemu-path grant sources still carry the per-function init_*_for_device / init_provider_* variants; those follow the virtio transport into userspace under Phase C of the networking proposal rather than through this slice.
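
The descriptor-driven staging described above can be sketched as follows. Only ProdGrantClass, its three variants, the stage_with_class name, and the one-page/two-page window lengths come from the text. The PciFunction and StagedMmio structs, the admits method, the 4 KiB page size, and the function signature are hypothetical stand-ins chosen for illustration. The NVMe check uses the standard PCI class triple (class 0x01, subclass 0x08, programming interface 0x02).

```rust
// Illustrative sketch only. Apart from `ProdGrantClass`, its variants, and the
// `stage_with_class` name, every item here is a hypothetical stand-in.

const VIRTIO_VENDOR_ID: u16 = 0x1af4;
const PAGE_SIZE: usize = 4096; // assumed page size

/// Explicit device-class descriptor passed to the staging entry point.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProdGrantClass {
    /// Plain BAR / first usable function.
    AnyFunction,
    /// virtio or NVMe.
    DmaCapable,
    /// NVMe only.
    NvmeController,
}

/// Hypothetical summary of an enumerated PCI function.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PciFunction {
    pub vendor_id: u16,
    pub class_code: u8,
    pub subclass: u8,
    pub prog_if: u8,
    pub has_usable_bar: bool,
}

impl PciFunction {
    fn is_virtio(&self) -> bool {
        self.vendor_id == VIRTIO_VENDOR_ID
    }

    /// Mass storage (0x01), non-volatile memory (0x08), NVM Express (0x02).
    fn is_nvme(&self) -> bool {
        self.class_code == 0x01 && self.subclass == 0x08 && self.prog_if == 0x02
    }
}

impl ProdGrantClass {
    /// Whether this descriptor admits the given function as a candidate.
    pub fn admits(self, f: &PciFunction) -> bool {
        match self {
            ProdGrantClass::AnyFunction => f.has_usable_bar,
            ProdGrantClass::DmaCapable => f.is_virtio() || f.is_nvme(),
            ProdGrantClass::NvmeController => f.is_nvme(),
        }
    }
}

/// Hypothetical result of staging a DeviceMmio grant candidate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StagedMmio {
    pub function: PciFunction,
    pub window_len: usize,
}

/// One device-agnostic entry point: the caller names the class and the mapped
/// window length explicitly instead of relying on cfg-gated selection rules.
pub fn stage_with_class(
    functions: &[PciFunction],
    class: ProdGrantClass,
    window_pages: usize,
) -> Option<StagedMmio> {
    functions
        .iter()
        .copied()
        .find(|f| class.admits(f))
        .map(|function| StagedMmio { function, window_len: window_pages * PAGE_SIZE })
}
```

In this sketch a caller would pass NvmeController with two pages to stage the NVMe CC/admin-register region, or AnyFunction with one page for the plain/virtio-net notify family. No virtio-net-specific code path is involved in either case.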

Per-Task Gap (the narrow real Y)

cloud-gcp-virtio-net-nic-driver -> runnable-now claim is superseded

The 2026-05-27 version of this document concluded that the GCP virtio-net live driver task was runnable as a cloud-evidence slice. That conclusion is now stale. The local production cloudboot bind markers have landed, but cloud-prod-provider-nic-bound-local-proof deliberately settled its completion boundary with a kernel-side dispatch-slot proxy because the production userspace-provider grant/waiter surface is still not available in the non-qemu cloudboot build.

The current local production chain is therefore still implementation work, not just billable evidence capture. The cloud-prod-provider-devicemmio-grant-source-local-proof, cloud-prod-provider-dmapool-grant-source-local-proof, and cloud-prod-provider-interrupt-grant-source-local-proof children are done (2026-05-28): the non-qemu cloudboot kernel can deliver DeviceMmio, DMAPool, and Interrupt grants to small userspace provider services through manifest/process-spawner delivery, each with its own local-QEMU proof and bounded caveats. The aggregate docs-status closeout cloud-prod-provider-grant-surface-local-proof is also done (2026-05-28): it records those landed children as one provider grant-surface boundary without adding new behavior. The remaining local production work is cloud-prod-provider-cap-waiter-local-proof, then cloud-prod-virtio-net-userspace-provider-local-proof (and the brokered NVMe sibling). Only after those local production userspace-provider tasks land does the live-GCE NIC task reduce to a cloud evidence/harness run.

The access and spend corrections still stand: GCE access is provisioned and the operator authorized billable runs on 2026-05-27. The blocker is local production userspace-provider authority, not cloud access.

Storage tasks -> gap is a userspace NVMe-class storage provider

cloud-gcp-storage-driver, cloud-gcp-storage-local-qemu-binding, cloud-aws-nvme-storage-driver, cloud-azure-disk-storage-driver all reduce to the same genuine missing piece: a userspace storage provider driver. virtio-net TX ownership does not carry to storage. Two real sub-gaps:

  1. No userspace storage provider driver. Either (a) a userspace virtio-blk/virtio-scsi provider over the existing virtio seam (the kernel BlockDevice is kernel-side and does not satisfy the “no hidden kernel DMA ownership” acceptance), or (b) a userspace NVMe-class driver (controller bring-up + admin/IO queue pairs + doorbells + PRP DMA) over the bounce-buffer/IOMMU backend. NVMe is the strategic target: GCP 3rd-gen+, AWS Nitro EBS, and Azure Boost are all NVMe, so one NVMe foundation unblocks all three providers’ storage legs.

  2. The no-IOMMU run-pci-nvme proof gate and the DMA-address ownership model. A real provider-driven NVMe completion + “no hidden kernel DMA ownership” + “no host-physical exposure” must all hold under the no-IOMMU bounce-buffer shape. The 2026-05-27 Model B override (provider writes queue-base/PRP addresses, kernel validates on notify) does not satisfy those constraints on the current no-IOMMU gate: device-visible equals host physical, and reviewed IOVA export discipline intentionally returns no usable device address to userspace.

    The correction is to split the lanes. Model B remains valid for a verified direct-remapping/vIOMMU gate, or a future synthetic address namespace translated by trusted code. The GCP/no-IOMMU lane must use brokered bounce: the provider owns NVMe protocol state and buffer/command capabilities, while the kernel or device manager materializes ASQ/ACQ, I/O queue-base, and PRP/SGL device-visible fields from the live DMAPool ledger. That is the only current path that preserves no-host-physical-exposure on GCP.
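
The brokered-bounce split can be illustrated with a toy model: the provider builds commands that name buffers only by opaque handle, and the broker fills in the device-visible PRP field from its ledger. This is a minimal sketch under stated assumptions, not the capOS design. All names (BufferHandle, ProviderCommand, DeviceEntry, Ledger, materialize) are hypothetical, it uses std for brevity where kernel code would not, and it models a single PRP entry only.

```rust
// Illustrative sketch only: every name here is hypothetical, and `std` is used
// for brevity where real kernel code would be `no_std`.
use std::collections::HashMap;

/// Opaque capability-like token; it carries no address information.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct BufferHandle(u32);

/// What the userspace provider builds: protocol state plus a handle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ProviderCommand {
    pub opcode: u8,
    pub command_id: u16,
    pub data: BufferHandle,
}

/// What the broker writes into the submission queue. In a real system this
/// type would exist only on the broker side, so the provider never sees `prp1`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DeviceEntry {
    pub opcode: u8,
    pub command_id: u16,
    pub prp1: u64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BrokerError {
    /// The handle is not (or is no longer) live in the ledger.
    StaleHandle,
}

/// Broker-owned map from live handles to device-visible addresses.
#[derive(Default)]
pub struct Ledger {
    next_id: u32,
    live: HashMap<BufferHandle, u64>,
}

impl Ledger {
    /// Record a bounce buffer and hand back only the opaque handle.
    pub fn allocate(&mut self, device_addr: u64) -> BufferHandle {
        let handle = BufferHandle(self.next_id);
        self.next_id += 1;
        self.live.insert(handle, device_addr);
        handle
    }

    /// Drop a buffer; returns false if the handle was already stale.
    pub fn free(&mut self, handle: BufferHandle) -> bool {
        self.live.remove(&handle).is_some()
    }

    /// Broker-side step: resolve the handle against the live ledger and fill
    /// in the device-visible field. Stale handles are rejected.
    pub fn materialize(&self, cmd: &ProviderCommand) -> Result<DeviceEntry, BrokerError> {
        let prp1 = *self.live.get(&cmd.data).ok_or(BrokerError::StaleHandle)?;
        Ok(DeviceEntry { opcode: cmd.opcode, command_id: cmd.command_id, prp1 })
    }
}
```

The point of the shape is that address resolution happens at a single broker-owned step against live ledger state, so a freed or revoked buffer cannot be submitted and no device-visible address needs to be returned to the provider.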

The ordered NVMe work therefore splits into:

  • no-IOMMU brokered lane: nvme-no-iommu-brokered-controller-enable (landed 2026-05-27 21:38 UTC, commit 11b86568) -> nvme-admin-queue-identify (landed 2026-05-27 22:34 UTC, commit cede5257) -> nvme-admin-interrupt-delivery (landed 2026-05-27 23:07 UTC, commit 18fd25c7) -> nvme-io-queue-and-read (ready brokered I/O/read);
  • direct-remapping lane: nvme-doorbell-dma-validator (landed mechanism) -> provider-written enable/admin/I/O slices on a verified IOMMU/vIOMMU gate.

Those are the real storage Y for the NVMe path; the virtio-scsi path is an alternative userspace provider of comparable size. None of this is “build a foundation” – it is “build a storage device-class provider on the existing foundation.”

AWS / Azure storage -> consume the GCP NVMe foundation + provider delta

cloud-aws-nvme-storage-driver and cloud-azure-disk-storage-driver already re-scope themselves to a small provider delta once the shared NVMe foundation lands. No new driver decomposition; their blocked-until is the GCP NVMe child chain. Their AWS/Azure NIC siblings (ENA, MANA) are vendor-custom and out of GCP-first scope.

What This Document Changes

  1. Supersedes the cloud-gcp-virtio-net-nic-driver runnable-now claim. The QEMU userspace virtio foundation remains useful grounding, but the live GCP NIC task stays blocked until the local production userspace-provider grant-source, waiter, and userspace virtio-net provider chain lands.
  2. Decomposes the storage gap GCP-first into a no-IOMMU brokered-bounce userspace NVMe lane for GCP and a separate direct-remapping Model B lane for IOMMU/vIOMMU proofs.
  3. Re-points AWS/Azure storage at the GCP NVMe child chain.

Design Grounding

  • docs/tasks/done/2026-05-25/ddf-virtio-driver-foundation-boundary.md
  • docs/tasks/done/2026-05-23/ddf-provider-virtio-net-driver-closeout.md
  • docs/tasks/done/2026-05-11/ddf-userspace-driver-provider-consumer.md
  • docs/tasks/done/2026-05-26/cloud-gcp-virtio-net-local-qemu-binding.md
  • docs/tasks/done/2026-05-25/ddf-blockdevice-boundary-virtio-blk-smoke.md
  • docs/tasks/done/2026-05-24/cloud-dma-backend-selection.md
  • docs/tasks/done/2026-05-23/ddf-iommu-remapping-production-closeout.md
  • docs/proposals/nvme-model-b-doorbell-dma-validator.md (conditional Model B validator for direct-remapping/synthetic-address lanes)
  • docs/research/dma-userspace-driver-isolation.md
  • docs/dma-isolation-design.md (Cloud DMA Backend; IOVA export discipline)
  • kernel/src/virtio.rs (transport::VirtqueueDma, transport::Virtqueue), kernel/src/cap/{dma_pool,dma_buffer,device_mmio,interrupt,block_device}.rs, kernel/src/device_dma.rs, kernel/src/device_interrupt.rs, kernel/src/pci.rs
  • docs/proposals/cloud-deployment-proposal.md, docs/backlog/hardware-boot-storage.md#cloud-device-tracks