Cloud Driver Foundation: Gap Analysis
Premise Correction
A prior framing held that “capOS has no userspace device-driver foundation.” That is wrong. The userspace virtio driver foundation exists and is proven in QEMU across a month of landed DDF work. This document establishes precisely what the foundation covers and reduces each blocked cloud-driver task to its narrow real remaining gap, so no one re-implements a foundation that already exists.
What The Foundation Already Provides (proven, in docs/tasks/done/)
- Device-agnostic virtio DMA/notify seam + relocated queue/discovery (ddf-virtio-driver-foundation-boundary, 2026-05-25). The split-ring Virtqueue and discover_modern_transport live in kernel/src/virtio.rs mod transport, driven through the VirtqueueDma seam (preflight/register/allocate/free/record-submission/record-completion over the device_dma ledger). virtio-net is one caller of the seam, not the only possible caller – a non-net virtio device (e.g. virtio-blk) can drive the same bounded ledger semantics. Proofs: make run-net, make run-ddf-provider-consumer.
- Userspace provider owns the selected virtio-net TX queue end-to-end (ddf-provider-virtio-net-driver-closeout, 2026-05-23). A userspace process publishes real selected-queue TX descriptors, rings the doorbell through a DeviceMmio notify-write claim, consumes the TX used-ring completion, and exposes CQ identity – all through user-mode DMAPool/DMABuffer/DeviceMmio/Interrupt authority, with no silent fallback to the in-kernel virtio-net TX helper while the provider owns TX. RX is bounded synthetic-token CQ identity (kernel RX cohabitation explicit). DMA backend is manager-owned bounce buffers.
- Manager-granted provider/consumer authority lifecycle (ddf-userspace-driver-provider-consumer, 2026-05-11). A userspace provider consumes manifest-granted DMAPool/DeviceMmio/Interrupt authority; stale-authority rejection, revoke, and release/reset/driver-death teardown are proven.
- GCP virtio-net function bound through the gate locally in QEMU (cloud-gcp-virtio-net-local-qemu-binding, 2026-05-26). The enumerated/bound function matches the documented GCP 1st/2nd-gen virtio-net surface (vendor 0x1af4), the resolved DMA backend is the labeled bounce-buffer path, proven by make run-net and make run-ddf-provider-consumer.
- DMA backend selection (cloud-dma-backend-selection, 2026-05-24): boot probe -> fail-closed select -> manifest override; GCE resolves to bounce-buffer.
- Production IOMMU remapping closeout (ddf-iommu-remapping-production-closeout, 2026-05-23): the direct-remapping domain path for IOMMU shapes (make run-iommu-remapping).
- First BlockDevice CapObject (ddf-blockdevice-boundary-virtio-blk-smoke, 2026-05-25): a bounded sector write/read-back over virtio-blk (make run-virtio-blk). Note: this BlockDevice is kernel-side, over manager-owned bounce buffers – it is not a userspace storage provider.
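The DMA backend selection order listed above (boot probe -> fail-closed select -> manifest override) can be illustrated with a small sketch. Everything here is hypothetical: the type names, the function name, and in particular the reading of "fail-closed" as "an inconclusive probe yields no backend at all" and of the override as "may not request a backend the probe cannot support" are assumptions for illustration, not the behavior recorded in cloud-dma-backend-selection.

```rust
/// What the boot probe observed. Hypothetical shape for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ProbeResult {
    /// A usable remapping unit was found and verified.
    IommuVerified,
    /// The probe positively determined that no remapping unit is present.
    NoIommu,
    /// The probe could not reach a conclusion.
    Inconclusive,
}

/// The two backends the document names.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DmaBackend {
    BounceBuffer,
    DirectRemapping,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SelectError {
    /// Assumed fail-closed rule: no backend when the probe is inconclusive.
    ProbeInconclusive,
    /// Assumed rule: the manifest asked for a backend the probe cannot support.
    OverrideUnsupported,
}

/// Probe -> fail-closed select -> manifest override, in that order.
pub fn select_backend(
    probe: ProbeResult,
    manifest_override: Option<DmaBackend>,
) -> Result<DmaBackend, SelectError> {
    // Fail-closed select: derive a default only from a conclusive probe.
    let default = match probe {
        ProbeResult::IommuVerified => DmaBackend::DirectRemapping,
        ProbeResult::NoIommu => DmaBackend::BounceBuffer,
        ProbeResult::Inconclusive => return Err(SelectError::ProbeInconclusive),
    };

    // Manifest override: applied last, and never allowed to claim direct
    // remapping without a verified IOMMU (an assumption of this sketch).
    match manifest_override {
        None => Ok(default),
        Some(DmaBackend::DirectRemapping) if probe != ProbeResult::IommuVerified => {
            Err(SelectError::OverrideUnsupported)
        }
        Some(backend) => Ok(backend),
    }
}
```

Under these assumptions a no-IOMMU probe with no override, the shape the document describes for GCE, resolves to DmaBackend::BounceBuffer.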
Boundary Of The Foundation (where userspace ownership stops today)
- NIC: userspace owns virtio-net TX; RX is synthetic/cohabited. No live hardware RX used-ring ownership, no direct DMA/IOMMU on the provider path, no cloud enumeration.
- Storage: there is no userspace storage provider of any device class. The BlockDevice cap is kernel-side; NVMe is metadata-only (kernel/src/pci.rs enumerates the controller and emits a no-authority/ no-driver ... controller_init=not-started line, no register/queue/IDENTIFY/IO code). The NIC userspace driver does not transfer to storage: NVMe is a different device class (admin/IO submission+completion queue pairs, doorbells, PRP/SGL), and even userspace virtio-blk/virtio-scsi has no provider driver – the foundation seam makes it possible, but no slice has built it.
- Production grant sources stage an arbitrary function through one device-agnostic entry point (done 2026-05-30). The non-qemu {dmapool,devicemmio,interrupt}_grant_source_prod statics previously inferred their candidate function from a hardcoded selection rule narrowed by #[cfg(feature = "cloud_*")] blocks scattered through each pick_candidate body. cloud-prod-grant-source-despecialization replaced that with one stage_with_class entry point per source that takes an explicit ProdGrantClass device-class descriptor (cap::prod_grant_source_class): AnyFunction (plain BAR / first usable function), DmaCapable (virtio or NVMe), or NvmeController (NVMe only); the DeviceMmio source additionally takes the explicit mapped-window length (one page for the plain/virtio-net notify family, two pages for the NVMe CC/admin-register selected-write region). The no-arg init() wrappers select the build’s descriptor and delegate, so a non-virtio-net function is staged by passing the matching descriptor rather than by reaching virtio-net-specific code. The transitional in-kernel qemu-path grant sources still carry the per-function init_*_for_device / init_provider_* variants; those follow the virtio transport into userspace under Phase C of the networking proposal rather than through this slice.
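To make the descriptor-driven staging concrete, the following is a minimal sketch, not the kernel's code. The ProdGrantClass variant names and the stage_with_class name come from the text above; the PciFunction struct, the matching rules, the signatures, the 4 KiB page size, and the page-multiple check are assumptions introduced for illustration. The virtio vendor ID 0x1af4 and the NVMe class triple (class 0x01, subclass 0x08, programming interface 0x02) are the standard PCI values.

```rust
/// Device-class descriptor; variant names follow the text, semantics are a sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ProdGrantClass {
    /// Plain BAR / first usable function.
    AnyFunction,
    /// virtio or NVMe.
    DmaCapable,
    /// NVMe only.
    NvmeController,
}

/// Hypothetical summary of one enumerated PCI function.
#[derive(Clone, Copy, Debug)]
pub struct PciFunction {
    pub vendor_id: u16,
    pub class_code: u8,
    pub subclass: u8,
    pub prog_if: u8,
    pub has_usable_bar: bool,
}

/// Assumed page size (x86_64 4 KiB pages).
pub const PAGE_SIZE: usize = 4096;
const VIRTIO_VENDOR_ID: u16 = 0x1af4;

impl PciFunction {
    fn is_virtio(&self) -> bool {
        self.vendor_id == VIRTIO_VENDOR_ID
    }

    /// Mass storage (0x01) / non-volatile memory (0x08) / NVMe interface (0x02).
    fn is_nvme(&self) -> bool {
        self.class_code == 0x01 && self.subclass == 0x08 && self.prog_if == 0x02
    }
}

impl ProdGrantClass {
    /// Does this descriptor admit the given function?
    pub fn matches(self, f: &PciFunction) -> bool {
        match self {
            ProdGrantClass::AnyFunction => f.has_usable_bar,
            ProdGrantClass::DmaCapable => f.is_virtio() || f.is_nvme(),
            ProdGrantClass::NvmeController => f.is_nvme(),
        }
    }
}

/// One device-agnostic entry point: returns the index of the first function
/// the descriptor admits, with no per-device special cases in the body.
pub fn stage_with_class(functions: &[PciFunction], class: ProdGrantClass) -> Option<usize> {
    functions.iter().position(|f| class.matches(f))
}

/// DeviceMmio variant: the caller also passes the mapped-window length
/// explicitly. Rejecting lengths that are not a whole number of pages is an
/// assumption of this sketch.
pub fn stage_mmio_with_class(
    functions: &[PciFunction],
    class: ProdGrantClass,
    window_len: usize,
) -> Option<(usize, usize)> {
    if window_len == 0 || window_len % PAGE_SIZE != 0 {
        return None;
    }
    stage_with_class(functions, class).map(|index| (index, window_len))
}
```

In this shape a caller staging the NVMe selected-write region would pass ProdGrantClass::NvmeController with 2 * PAGE_SIZE, while the virtio-net notify family would pass a one-page window; neither path touches code specific to the other device.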
Per-Task Gap (the narrow real Y)
cloud-gcp-virtio-net-nic-driver -> runnable-now claim is superseded
The 2026-05-27 version of this document concluded that the GCP virtio-net live
driver task was runnable as a cloud-evidence slice. That conclusion is now
stale. The local production cloudboot bind markers have landed, but
cloud-prod-provider-nic-bound-local-proof deliberately settled its completion
boundary with a kernel-side dispatch-slot proxy because the production
userspace-provider grant/waiter surface is still not available in the
non-qemu cloudboot build.
The current local production chain is therefore still implementation work, not
just billable evidence capture:
The cloud-prod-provider-devicemmio-grant-source-local-proof,
cloud-prod-provider-dmapool-grant-source-local-proof, and
cloud-prod-provider-interrupt-grant-source-local-proof children are done
(2026-05-28): the non-qemu cloudboot kernel can deliver DeviceMmio,
DMAPool, and Interrupt grants to small userspace provider services through
manifest/process-spawner delivery, each with its own local-QEMU proof and
bounded caveats. The aggregate docs-status closeout
cloud-prod-provider-grant-surface-local-proof is also done (2026-05-28):
it records those landed children as one provider grant-surface boundary
without adding new behavior. The remaining local production work is
cloud-prod-provider-cap-waiter-local-proof, then
cloud-prod-virtio-net-userspace-provider-local-proof (and the brokered NVMe
sibling). Only after those local production userspace-provider tasks land does
the live-GCE NIC task reduce to a cloud evidence/harness run.
The access and spend corrections still stand: GCE access is provisioned and the operator authorized billable runs on 2026-05-27. The blocker is local production userspace-provider authority, not cloud access.
Storage tasks -> gap is a userspace NVMe-class storage provider
cloud-gcp-storage-driver, cloud-gcp-storage-local-qemu-binding,
cloud-aws-nvme-storage-driver, cloud-azure-disk-storage-driver all reduce to
the same genuine missing piece: a userspace storage provider driver. virtio-net
TX ownership does not carry to storage. Two real sub-gaps:
- No userspace storage provider driver. Either (a) a userspace virtio-blk/virtio-scsi provider over the existing virtio seam (the kernel BlockDevice is kernel-side and does not satisfy the “no hidden kernel DMA ownership” acceptance), or (b) a userspace NVMe-class driver (controller bring-up + admin/IO queue pairs + doorbells + PRP DMA) over the bounce-buffer/IOMMU backend. NVMe is the strategic target: GCP 3rd-gen+, AWS Nitro EBS, and Azure Boost are all NVMe, so one NVMe foundation unblocks all three providers’ storage legs.
- The no-IOMMU run-pci-nvme proof gate and the DMA-address ownership model. A real provider-driven NVMe completion + “no hidden kernel DMA ownership” + “no host-physical exposure” must all hold under the no-IOMMU bounce-buffer shape. The 2026-05-27 Model B override (provider writes queue-base/PRP addresses, kernel validates on notify) does not satisfy those constraints on the current no-IOMMU gate: device-visible equals host physical, and reviewed IOVA export discipline intentionally returns no usable device address to userspace.

  The correction is to split the lanes. Model B remains valid for a verified direct-remapping/vIOMMU gate, or a future synthetic address namespace translated by trusted code. The GCP/no-IOMMU lane must use brokered bounce: the provider owns NVMe protocol state and buffer/command capabilities, while the kernel or device manager materializes ASQ/ACQ, I/O queue-base, and PRP/SGL device-visible fields from the live DMAPool ledger. That is the only current path that preserves no-host-physical-exposure on GCP.
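The brokered-bounce split described above can be sketched as follows. This is an illustrative model only: BufferCap, ProviderCommand, DeviceCommand, Broker, and every method name are hypothetical and do not correspond to the kernel's DMAPool or device_dma interfaces. The point it shows is that the provider names buffers by opaque capability, and only the trusted side resolves a capability to a device-visible address when it fills the PRP field.

```rust
use std::collections::HashMap;

/// Opaque handle held by the provider; it carries no address. Hypothetical.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct BufferCap(pub u32);

/// What the provider submits: protocol state plus a capability, no addresses.
#[derive(Clone, Copy, Debug)]
pub struct ProviderCommand {
    pub opcode: u8,
    pub command_id: u16,
    pub data: BufferCap,
    pub len: u32,
}

/// Device-visible entry. In this model it exists only on the trusted side.
#[derive(Debug, PartialEq, Eq)]
pub struct DeviceCommand {
    pub opcode: u8,
    pub command_id: u16,
    pub prp1: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum BrokerError {
    UnknownBuffer,
    LengthExceedsBuffer,
}

struct LedgerEntry {
    device_addr: u64,
    len: u32,
}

/// Stand-in for the kernel or device manager that owns the live ledger.
pub struct Broker {
    ledger: HashMap<BufferCap, LedgerEntry>,
    submission_queue: Vec<DeviceCommand>,
}

impl Broker {
    pub fn new() -> Self {
        Broker { ledger: HashMap::new(), submission_queue: Vec::new() }
    }

    /// Trusted side records a bounce buffer and its device-visible address.
    pub fn register_bounce(&mut self, cap: BufferCap, device_addr: u64, len: u32) {
        self.ledger.insert(cap, LedgerEntry { device_addr, len });
    }

    /// Drop a buffer from the ledger; later submissions naming it are rejected.
    pub fn revoke(&mut self, cap: BufferCap) -> bool {
        self.ledger.remove(&cap).is_some()
    }

    /// Materialize the device-visible command from the live ledger. The
    /// provider gets back only a queue slot index, never the address.
    pub fn submit(&mut self, cmd: ProviderCommand) -> Result<usize, BrokerError> {
        let entry = self.ledger.get(&cmd.data).ok_or(BrokerError::UnknownBuffer)?;
        if cmd.len > entry.len {
            return Err(BrokerError::LengthExceedsBuffer);
        }
        let prp1 = entry.device_addr;
        self.submission_queue.push(DeviceCommand {
            opcode: cmd.opcode,
            command_id: cmd.command_id,
            prp1,
        });
        Ok(self.submission_queue.len() - 1)
    }

    /// Trusted-side inspection of a queued entry (used here for checking only).
    pub fn device_view(&self, slot: usize) -> Option<&DeviceCommand> {
        self.submission_queue.get(slot)
    }
}
```

Model B would invert the submit step: the provider would write the address itself and the trusted side would only validate it, which is why that lane needs an address namespace that is safe to expose.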
The ordered NVMe work therefore splits into:
- no-IOMMU brokered lane: nvme-no-iommu-brokered-controller-enable (landed 2026-05-27 21:38 UTC, commit 11b86568) -> nvme-admin-queue-identify (landed 2026-05-27 22:34 UTC, commit cede5257) -> nvme-admin-interrupt-delivery (landed 2026-05-27 23:07 UTC, commit 18fd25c7) -> nvme-io-queue-and-read (ready brokered I/O/read);
- direct-remapping lane: nvme-doorbell-dma-validator (landed mechanism) -> provider-written enable/admin/I/O slices on a verified IOMMU/vIOMMU gate.
Those are the real storage Y for the NVMe path; the virtio-scsi path is an alternative userspace provider of comparable size. None of this is “build a foundation” – it is “build a storage device-class provider on the existing foundation.”
AWS / Azure storage -> consume the GCP NVMe foundation + provider delta
cloud-aws-nvme-storage-driver and cloud-azure-disk-storage-driver already
re-scope themselves to a small provider delta once the shared NVMe foundation
lands. No new driver decomposition; their blocked-until is the GCP NVMe child
chain. Their AWS/Azure NIC siblings (ENA, MANA) are vendor-custom and out of GCP-first scope.
What This Document Changes
- Supersedes the cloud-gcp-virtio-net-nic-driver runnable-now claim. The QEMU userspace virtio foundation remains useful grounding, but the live GCP NIC task stays blocked until the local production userspace-provider grant-source, waiter, and userspace virtio-net provider chain lands.
- Decomposes the storage gap GCP-first into a no-IOMMU brokered-bounce userspace NVMe lane for GCP and a separate direct-remapping Model B lane for IOMMU/vIOMMU proofs.
- Re-points AWS/Azure storage at the GCP NVMe child chain.
Design Grounding
- docs/tasks/done/2026-05-25/ddf-virtio-driver-foundation-boundary.md
- docs/tasks/done/2026-05-23/ddf-provider-virtio-net-driver-closeout.md
- docs/tasks/done/2026-05-11/ddf-userspace-driver-provider-consumer.md
- docs/tasks/done/2026-05-26/cloud-gcp-virtio-net-local-qemu-binding.md
- docs/tasks/done/2026-05-25/ddf-blockdevice-boundary-virtio-blk-smoke.md
- docs/tasks/done/2026-05-24/cloud-dma-backend-selection.md
- docs/tasks/done/2026-05-23/ddf-iommu-remapping-production-closeout.md
- docs/proposals/nvme-model-b-doorbell-dma-validator.md (conditional Model B validator for direct-remapping/synthetic-address lanes)
- docs/research/dma-userspace-driver-isolation.md
- docs/dma-isolation-design.md (Cloud DMA Backend; IOVA export discipline)
- kernel/src/virtio.rs (transport::VirtqueueDma, transport::Virtqueue), kernel/src/cap/{dma_pool,dma_buffer,device_mmio,interrupt,block_device}.rs, kernel/src/device_dma.rs, kernel/src/device_interrupt.rs, kernel/src/pci.rs
- docs/proposals/cloud-deployment-proposal.md, docs/backlog/hardware-boot-storage.md#cloud-device-tracks