Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

DMA Assurance Model

Current DMA authority and isolation design authority lives in DMA Isolation. This proposal defines the accepted evidence model and is retained as the grounding record for DMA proof obligations.

The DMA assurance model is the evidence scaffold for moving capOS from bounded QEMU-local provider proofs toward cloud and production device-driver claims. It does not select a cloud DMA backend. It defines the claims that a backend must prove, the model objects those claims refer to, and the tools that should check each claim before a driver slice can cite it.

The immediate use is the cloud DMA backend decision: direct DMA through a reviewed remapping domain, labeled bounce buffers, or unsupported. The binding choice and any per-VM-shape safety claim remain attended decisions.

Claim Boundary

The model is about DMA authority, not whole-kernel correctness.

In scope:

  • ownership of Device, DMAPool, DMABuffer, Page, IommuDomain, Iova, descriptor, completion, and interrupt-route state;
  • lifecycle transitions from allocation through mapping, publication, completion, revocation, invalidation, scrub, and reuse;
  • stale handle, stale completion, revoke/reset race, teardown-under-DMA, no-host-physical-exposure, and cross-domain aliasing claims;
  • the evidence split between IOMMU-backed direct DMA and labeled bounce-buffer fallback.

Out of scope:

  • proving all kernel behavior;
  • proving cloud-provider hardware facts without attended evidence;
  • treating QEMU Intel VT-d evidence as general hardware evidence;
  • creating a new prover or proof kernel.

The capOS-specific layer may become a DSL later, but it must emit to mature checkers or proof assistants. A self-authenticating capOS prover would increase the trusted base and is not part of this plan.

Model Objects

The abstract model uses these terms consistently across docs, model files, and future proof harnesses:

ObjectMeaning
DevicePCI function or provider device that can issue DMA.
IommuDomainDevice-manager-owned translation context or trusted sharing group.
DMAPoolCapability-scoped allocation authority for DMA buffers.
DMABufferLive buffer handle with owner, slot, generation, and mapping state.
PagePhysical backing page owned by the device manager or held fail-closed.
IovaDevice-visible address meaningful only inside one domain.
DescriptorDevice-visible command referencing a live buffer generation.
CompletionDevice or software observation that a descriptor finished.
IrqRouteInterrupt source, route generation, waiter, mask, and ack state.

The first model files live under models/dma/. They are small by design: reviewers should be able to read the whole state machine and tell whether it matches the DMA design before any checker is involved.

Required Invariants

InvariantRequired meaning
No host-physical exposureResult caps, diagnostics, audit, and cloud evidence never expose a host physical address to a driver. IOMMU-backed paths may expose only a domain-scoped IOVA labeled with its domain.
Mapping before publicationA descriptor cannot become device-visible until the backing buffer is live, owned by the device manager, and either mapped in the selected IOMMU domain or copied through the selected bounce-buffer path.
No page reuse before teardownA DMA page cannot return to the general free pool until submissions are stopped, in-flight descriptors are drained or invalidated, mappings are removed, required invalidations complete, and the page is scrubbed.
Stale handles fail closedA stale pool, buffer, slot, page, source, route, or generation cannot create a new side effect.
Stale completions fail closedA completion whose descriptor, buffer, slot, page, owner, or generation no longer matches cannot publish CQ state, ack IRQ state, free pages, or reuse buffers.
Domain-scoped aliasing onlyThe same IOVA may be reused in different domains, but one domain cannot map the same IOVA to two pages unless an explicit trusted sharing group model permits it.
Fail-closed leaks are boundedIf teardown cannot prove that hardware can no longer reach a page, the page or pool may be held, but that hold must be accounted, bounded, and surfaced as a remediation item.
Backend evidence is explicitDirect DMA requires remapping-domain evidence. Bounce-buffer fallback must stay labeled as not hostile-hardware isolation. Unsupported devices stay disabled.

Tool Mapping

The assurance model intentionally uses several narrow tools instead of one large proof.

ToolcapOS use
TLA+ / TLCModel lifecycle ordering and races: allocate, map, publish, complete, revoke, flush, scrub, reuse, reset, and fail-closed hold. The v0 skeleton is models/dma/dma_authority.tla.
AlloyModel the relational authority graph: device, domain, IOVA, page, owner, and alias constraints. The v0 skeleton is models/dma/dma_authority.als.
KaniProve pure Rust validators and accounting helpers once they are extracted into host-checkable code: generation matching, budget arithmetic, stale rejection, and fail-closed transitions.
LoomCover concurrency-sensitive state that depends on atomics, queues, or multi-CPU ordering. The first target was the DeferredCompletionQueue / TLB-shootdown model gap now recorded in docs/tasks/done/2026-06-04/dma-assurance-model-deferred-completion-loom.md.
VerusCandidate later tool for small critical Rust cores that need unbounded functional contracts and are stable enough to justify annotation cost.
HAMR / MicrokitReference architecture for static component contracts and traceability, not a replacement runtime for capOS. Useful for comparing device-manager and driver partitioning assumptions.

Do not claim a checked model result merely because the files exist. A checked claim requires recording the exact tool, version, configuration, model bounds, and command output in the task evidence.

V0 Gate

dma-assurance-model-v0 is complete when:

  • this proposal defines the model objects, invariants, tool mapping, and claim boundaries;
  • models/dma/ contains inspectable TLA+ and Alloy skeletons for the lifecycle and authority graph;
  • the cloud DMA backend draft task depends on this model before it can be promoted beyond proposal text;
  • the verification workflow names these model files as planned design evidence, while making clear that no required checker gate exists yet;
  • docs workflow and diff hygiene pass.

Future slices should add actual checker commands only after the repo has pinned tool installation and run targets. Suggested future targets are make model-dma-tla, make model-dma-alloy, make kani-dma-authority, and a focused Loom target for DeferredCompletionQueue.

V1 Operationalization

dma-assurance-model-operationalization (2026-06-04) reconciles the v0 skeletons with the DMA authority code that landed after them and emits the checker tracks as concrete task records, so the work cannot be silently parked again. The reconciliation gap table — which invariants the skeletons already capture and which landed-since invariants are MISSING — is recorded in models/dma/README.md and grounded against named symbols in kernel/src/device_dma.rs, kernel/src/cap/dma_buffer.rs, kernel/src/device_manager/stub.rs, kernel/src/cap/virtio_net_userspace_rx_dma_proof.rs, and kernel/src/arch/x86_64/tlb.rs.

Landed-since invariants MISSING from the v0 skeletons: ownership-generation bump on recycle, map-record-before-PTE-install ordering, drive-pin/quarantine, the queue-enable epoch fence, and the deferred-EOI / completion-queue concurrency. Each is owned by an emitted checker slice (each names its make target, pinned tool + version, model bounds, and the exact invariant it checks, and each must record checked output per the anti-overclaim rule above):

Trackmake targetToolSlice
Lifecycle ordering + generation + stale-completionmake model-dma-tlaTLA+/TLC (pinned; TLC-pin owner shared with the scheduler/IRQ model tracks)dma-assurance-model-tla-checked-gate (done 2026-06-04, checked clean at 2/2/2/2, gen 0..1)
Device/domain/IOVA/page/alias authority graph + generationmake model-dma-alloyAlloy (pinned 6.2.0; Alloy-pin owner)dma-assurance-model-alloy-checked-gate (done 2026-06-04, checked for 4)
Extracted pure ownership-generation / stale-handle / no-re-expose coremake kani-dma-authorityKani (pinned 0.67.0, kani-lib style)dma-assurance-model-kani-authority-core (done 2026-06-04, 3 harnesses checked over capos_lib::dma_authority)
Deferred-EOI / completion-queue concurrencynew Loom targetLoom (test-ring-loom sibling)dma-assurance-model-deferred-completion-loom
CI wiring (make check / GitHub gate) + cite checked evidence(wiring only)dma-assurance-model-ci-wiring (done 2026-06-05)

Cloud Backend Use

The cloud backend draft must cite this model and fill an evidence matrix for each backend candidate:

CandidateRequired evidence before sign-off
Direct remapping domainCloud VM shape exposes guest-programmable remapping hardware; capOS can discover and program it; descriptor publication is ordered after mapping; teardown removes mappings and observes required invalidations before page reuse; hostile stale-DMA and stale-completion smokes cover the selected path.
Labeled bounce-buffer fallbackDirect DMA remains blocked; all device-visible addresses are manager-owned bounce pages; no host physical address is exposed; stale handle/completion/teardown evidence covers the selected fallback; documentation states that hostile bus-mastering hardware isolation is not claimed.
UnsupportedDevice remains disabled or unbound; no driver-visible DMA, MMIO doorbell, interrupt ownership, or storage/network readiness claim is made.

The matrix must distinguish provider-side isolation facts from guest-controlled isolation facts. SR-IOV, virtual NIC, GPU, accelerator, or local NVMe support is evidence that a VM exposes DMA-capable device surfaces, but it is not direct remapping evidence unless the guest also exposes an IOMMU or equivalent translation authority that capOS can program. Each VM-shape row should record the provider, region or zone, instance type, image and kernel, provider API or documentation source and date, live guest probe output, visible PCI/device drivers, visible IOMMU tables or groups, maintenance/revocation behavior, and the resulting backend classification.

The matrix is a support-policy input, not a hardcoded boot oracle. capOS should infer the safest available backend at runtime from the device inventory, remapping authority it can actually program, driver self-tests, and fail-closed probe results. Unknown or contradictory observations select Unsupported, not direct DMA. Provider evidence remains necessary for VM shapes the project wants to advertise as supported, because a guest probe cannot fully prove host-side provider isolation or maintenance behavior.

The matrix is an input to attended sign-off. It is not itself the sign-off.

Design Grounding