DMA User-Space Driver Isolation

This note records the DMA-addressing and isolation consequences that capOS must account for when planning user-space storage and NIC drivers. It is intentionally about authority boundaries, not about a particular NVMe or virtio implementation.

Address Spaces And Trust Boundaries

A DMA-capable device does not use a process virtual address. It consumes a device-visible address carried in descriptors, queue-base registers, PRP/SGL entries, or an equivalent protocol field.

On a bare-metal host with an IOMMU:

user VA --CPU MMU--> host physical address
device IOVA --IOMMU--> host physical address

On a guest VM:

guest user VA --guest MMU--> guest physical address --EPT/NPT--> host physical address

With a virtual or assigned IOMMU, a guest can additionally reason about:

guest device IOVA --vIOMMU or paravirt grant layer--> guest physical address

The host still owns the real host IOMMU or equivalent hypervisor translation. A guest-programmable vIOMMU is useful because it gives the guest kernel a guest-internal DMA authority boundary; it is not direct control of the host IOMMU.
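The translation chains above can be modeled as chained page-table lookups. The following is a minimal sketch, not capOS code: the table contents, the `translate` helper, and the use of a single `host_stage` table for both CPU and device accesses are illustrative assumptions (a real host uses EPT/NPT for CPU accesses and its own IOMMU or backend translation for device DMA).

```python
PAGE = 4096

def translate(table, addr, stage):
    """Translate one address through a page-granular table; fault if unmapped."""
    page, offset = divmod(addr, PAGE)
    if page not in table:
        raise LookupError(f"{stage} fault at {addr:#x}")
    return table[page] * PAGE + offset

# Hypothetical single-page mappings, keyed and valued by page number.
guest_mmu = {0x10: 0x200}      # guest user VA page     -> guest physical page
viommu = {0x7: 0x200}          # guest device IOVA page -> guest physical page
host_stage = {0x200: 0x9000}   # guest physical page    -> host physical page (host-owned)

def cpu_access(guest_va):
    """guest user VA --guest MMU--> guest physical --host stage--> host physical"""
    gpa = translate(guest_mmu, guest_va, "guest MMU")
    return translate(host_stage, gpa, "host stage")

def device_access(guest_iova):
    """guest device IOVA --vIOMMU--> guest physical --host stage--> host physical"""
    gpa = translate(viommu, guest_iova, "vIOMMU")
    return translate(host_stage, gpa, "host stage")
```

In this model the CPU and the device reach the same host page through different addresses, and a process virtual address handed to the device faults in the vIOMMU stage instead of reaching memory. The guest controls only `guest_mmu` and `viommu`; `host_stage` stays with the host.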

Host User-Space Driver Pattern

A safe host user-space driver resembles the VFIO/IOMMUFD split:

  • The kernel owns PCI discovery, BAR assignment, PCI configuration mediation, IOMMU domain creation, DMA map/unmap, page pinning, interrupt or MSI-X routing, reset, hotplug, and revocation.
  • The user-space driver owns protocol logic: queue formats, descriptor contents, device-specific register sequencing, doorbells, polling, completion handling, and command construction.
  • The driver may receive a domain-scoped IOVA for a live buffer only when the kernel has installed and can revoke the IOMMU mapping for that device.
  • The driver must not receive unrestricted host physical addresses.
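The split in the list above can be illustrated with a small model of a kernel-owned IOMMU domain. This is a sketch under stated assumptions: `IommuDomain`, `map_buffer`, `unmap`, and `device_dma` are hypothetical names, not the VFIO, IOMMUFD, or capOS API, and page pinning is not modeled.

```python
PAGE = 4096

class IommuDomain:
    """Kernel-side model of one device's IOVA namespace (hypothetical)."""

    def __init__(self, device_id):
        self.device_id = device_id
        self._mappings = {}        # IOVA page -> host physical page, kernel-private
        self._next_iova_page = 1   # IOVA pages are never reused in this sketch

    def map_buffer(self, host_phys_page):
        """Install a mapping and return only the domain-scoped IOVA."""
        iova_page = self._next_iova_page
        self._next_iova_page += 1
        self._mappings[iova_page] = host_phys_page
        return iova_page * PAGE

    def unmap(self, iova):
        """Revoke the mapping; later DMA to this IOVA faults."""
        self._mappings.pop(iova // PAGE, None)

    def device_dma(self, iova):
        """What the IOMMU does when the device issues DMA to `iova`."""
        page, offset = divmod(iova, PAGE)
        if page not in self._mappings:
            raise PermissionError(
                f"IOMMU fault: device {self.device_id}, iova {iova:#x}")
        return self._mappings[page] * PAGE + offset
```

The driver only ever sees the value returned by `map_buffer`, which is meaningless outside this domain. Because the kernel owns `_mappings`, it can revoke a live buffer without the driver's cooperation.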

UIO-style “map a BAR and deliver interrupts” is not a complete security model for a DMA-capable PCI device. If a user-space process can program a DMA engine through MMIO, then DMA isolation requires either an IOMMU domain or a stricter broker that prevents raw device-address publication.

Guest Microkernel Pattern

Host isolation and guest isolation are different claims.

For an assigned PCI device or SR-IOV VF without a guest-visible IOMMU, the host can still protect itself by mapping the device only to the VM’s memory. That does not protect the guest kernel from an untrusted guest user-space driver: from the guest’s perspective the device can still DMA to arbitrary guest physical pages.

Virtual devices have the same guest-internal issue in a different form. If an untrusted driver can put arbitrary guest physical addresses into virtqueue descriptors, the host backend can write into guest kernel memory while still staying inside the VM boundary. The host remains protected; the guest kernel is not.

A guest microkernel that wants untrusted user-space drivers therefore needs one of these guest-visible authorization layers:

  • a vIOMMU or virtio-iommu path where the guest kernel controls guest IOVA to guest physical mappings;
  • a paravirtual grant-table model where descriptors carry grant identifiers instead of raw guest physical addresses;
  • a trusted mediation service that owns descriptor/device-address fields and lets the untrusted driver submit only typed commands, buffer capabilities, or opaque handles.

The invariant is:

Never let an untrusted guest driver provide a raw guest physical address to a
device or backend unless a guest-visible DMA authorization layer validates it.
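One way to picture this invariant is the grant-table option from the list above: the untrusted driver names a buffer by grant identifier, and only trusted code turns that into a guest physical address. The sketch below is illustrative only; `GrantTable` and its methods are hypothetical names, and each grant covers a single page for simplicity.

```python
PAGE = 4096

class GrantTable:
    """Guest-kernel-owned table mapping grant IDs to guest physical pages."""

    def __init__(self):
        self._grants = {}   # grant id -> (guest physical page, device may write)
        self._next_id = 1

    def grant(self, gpa_page, writable):
        grant_id = self._next_id
        self._next_id += 1
        self._grants[grant_id] = (gpa_page, writable)
        return grant_id

    def revoke(self, grant_id):
        self._grants.pop(grant_id, None)

    def resolve(self, grant_id, offset, length, device_writes):
        """Validate a descriptor request; return the guest physical address."""
        if grant_id not in self._grants:
            raise PermissionError(f"unknown or revoked grant {grant_id}")
        gpa_page, writable = self._grants[grant_id]
        if device_writes and not writable:
            raise PermissionError(f"grant {grant_id} is read-only for the device")
        if offset < 0 or length <= 0 or offset + length > PAGE:
            raise ValueError("descriptor range exceeds the granted page")
        return gpa_page * PAGE + offset
```

The driver can submit `(grant_id, offset, length)` triples but never a raw address, so a forged or stale identifier is rejected before any descriptor reaches the device or backend.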

BAR, MSI-X, And DMA Are Separate Authority Surfaces

BAR/MMIO controls CPU-to-device register access. DMA controls device-to-memory access. MSI/MSI-X controls device-to-interrupt-controller messages. A safe user-space driver interface needs all three mediated.

  • Mapping a BAR is not enough; a BAR write can enable bus mastering or ring a doorbell that makes descriptors visible to the device.
  • MSI-X tables often live inside a BAR. A driver must not get arbitrary write access to MSI-X message address/data entries unless the kernel or hypervisor can mediate interrupt remapping.
  • IOMMU memory remapping does not by itself protect BAR register semantics or interrupt routing.

For capOS, DeviceMmio, DMAPool/DMABuffer, and Interrupt must remain separate capabilities with a single device-manager ledger tying them to the same owner generation and teardown state.
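A minimal sketch of such a ledger follows, assuming that an owner generation is a per-device counter and that a capability is live only while its generation matches. The `Cap` and `DeviceLedger` types are hypothetical and do not reflect the actual capOS data structures.

```python
from dataclasses import dataclass

CAP_KINDS = ("DeviceMmio", "DMAPool", "Interrupt")

@dataclass(frozen=True)
class Cap:
    kind: str         # one of CAP_KINDS
    device: str
    generation: int   # owner generation at the time the cap was issued

class DeviceLedger:
    """Single record tying all of a device's capabilities to one owner generation."""

    def __init__(self):
        self._generation = {}   # device -> current owner generation

    def bind(self, device):
        """Start a new owner generation and issue three separate capabilities."""
        gen = self._generation.get(device, 0) + 1
        self._generation[device] = gen
        return tuple(Cap(kind, device, gen) for kind in CAP_KINDS)

    def teardown(self, device):
        """Invalidate MMIO, DMA, and interrupt authority in one step."""
        self._generation[device] = self._generation.get(device, 0) + 1

    def is_live(self, cap):
        return self._generation.get(cap.device) == cap.generation
```

The three capabilities stay distinct objects, so each can be handed out or withheld independently, but a single teardown invalidates all of them together and a stale capability from a previous owner never matches a later generation.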

No-IOMMU Bounce-Buffer Consequences

On a machine shape without guest-programmable remapping, the device-visible address of a real PCI device is the host physical or bus address that the controller uses for DMA. A bounce buffer can keep the data path manager-owned, but it does not by itself create an IOVA namespace that is safe to hand to an untrusted driver.

The no-IOMMU fallback can preserve no-host-physical-exposure only if userspace does not author raw device-address fields. The kernel or a trusted device manager must instead:

  • allocate and pin the device-visible bounce pages;
  • program queue-base registers and PRP/SGL or virtqueue address fields, or translate typed driver requests into those fields;
  • copy between device-visible bounce pages and non-device memory when the selected backend requires it;
  • quiesce outstanding DMA before revoke or page reuse;
  • scrub bounce pages before reuse;
  • keep hostile_hardware_isolation=not-claimed.
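The manager-side duties in this list can be sketched as a small bounce-pool broker. This is an illustrative model only: `BounceBroker` and its methods are hypothetical, pages are plain byte arrays, and the step where trusted code writes the device-visible address into a descriptor is represented by a comment.

```python
class BounceBroker:
    """Trusted manager that owns the bounce pages; drivers see opaque handles."""

    def __init__(self, pool_pages, page_size=4096):
        self.page_size = page_size
        self._pages = [bytearray(page_size) for _ in range(pool_pages)]
        self._free = list(range(pool_pages))
        self._in_flight = set()

    def submit_write(self, payload):
        """Copy driver data into a bounce page and return an opaque handle."""
        if len(payload) > self.page_size:
            raise ValueError("payload larger than one bounce page")
        if not self._free:
            raise MemoryError("bounce pool exhausted")
        handle = self._free.pop()
        self._pages[handle][:len(payload)] = payload
        self._in_flight.add(handle)
        # Trusted code would program the device-visible address of this page
        # into the descriptor here; the driver never sees that address.
        return handle

    def complete(self, handle):
        """Call only after outstanding DMA for this page has been quiesced."""
        if handle not in self._in_flight:
            raise KeyError(f"handle {handle} is not in flight")
        self._in_flight.discard(handle)
        self._pages[handle][:] = bytes(self.page_size)   # scrub before reuse
        self._free.append(handle)
```

The handle is a pool slot index, not an address. The model also makes the bounded pool visible: once every page is in flight, further submissions fail until a completion returns a scrubbed page.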

The costs are direct: extra copies, higher latency, CPU/cache pressure, bounded pool exhaustion risk, more teardown bookkeeping, and no hostile-hardware memory isolation claim. These costs are the price of not exposing host physical addresses when no guest-programmable remapping exists.

GCP And QEMU Implications

The GCE probes in Cloud DMA Provider Evidence Inventory show no guest-programmable IOMMU on the sampled GCP shapes: no usable DMAR/IVRS/IORT tables or IOMMU groups, and SWIOTLB software bounce buffering in the Linux guest. Host-side or provider-side isolation may still exist, but capOS cannot program or validate it from inside the guest.

The practical split is:

  • QEMU run-iommu-remapping remains the right local proof lane for direct-remapping behavior: domain-scoped IOVA export, per-device domains, invalidation, faults, and stale-DMA behavior.
  • GCP storage and NIC driver planning must treat the probed shapes as no-IOMMU/bounce-buffer targets until a future runtime probe observes a guest-programmable remapping unit.
  • A design that requires the user-space provider to write device-visible queue-base or PRP/SGL addresses is valid only on a verified direct-remapping/vIOMMU path, or after capOS implements a separate synthetic address namespace that the kernel translates before hardware sees it.
  • On the current GCP/no-IOMMU path, the recommended storage design is brokered: userspace owns protocol decisions and buffer capabilities, while the kernel or device manager materializes the device-visible DMA addresses.

Use three explicit modes in planning and task acceptance:

| Mode | When it applies | User-space device-address exposure |
|---|---|---|
| direct-remapping | capOS discovers, programs, and validates a guest-visible IOMMU/vIOMMU domain. | Domain-scoped IOVA only, labeled as meaningless outside that domain. |
| brokered-bounce | No usable guest IOMMU, but a manager-owned bounce path can safely support the device. | None: provider passes buffer caps, grant IDs, or typed commands; kernel writes device-visible addresses. |
| unsupported | Observations are contradictory, unsafe, or no safe brokered path exists. | None: device stays unbound or disabled. |

For GCP today, brokered-bounce is the only credible storage/NIC driver target on the probed shapes. direct-remapping remains a QEMU proof lane and a future cloud/hardware lane only after runtime evidence shows guest-programmable remapping.
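The three planning modes can be read as a small decision function over probe results. The sketch below is one possible encoding, not an existing capOS interface: the `Mode` enum, `select_mode`, and its three boolean inputs are assumptions made for illustration.

```python
from enum import Enum

class Mode(Enum):
    DIRECT_REMAPPING = "direct-remapping"
    BROKERED_BOUNCE = "brokered-bounce"
    UNSUPPORTED = "unsupported"

def select_mode(iommu_validated, bounce_path_safe, observations_consistent=True):
    """Pick a planning mode from runtime probe results.

    iommu_validated: a guest-visible IOMMU/vIOMMU domain was discovered,
        programmed, and validated.
    bounce_path_safe: a manager-owned bounce path can safely support the device.
    observations_consistent: False when probe results are contradictory or unsafe.
    """
    if not observations_consistent:
        return Mode.UNSUPPORTED
    if iommu_validated:
        return Mode.DIRECT_REMAPPING
    if bounce_path_safe:
        return Mode.BROKERED_BOUNCE
    return Mode.UNSUPPORTED
```

Under this encoding the probed GCP shapes correspond to `select_mode(False, True)`, which yields `Mode.BROKERED_BOUNCE`. Contradictory observations fall through to `unsupported` even if an IOMMU appears to be present.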