Plan: Device Driver Foundation

Overview

Selected milestone for capOS. Build reusable hardware/cloud bring-up plumbing before storage or network expansion: ACPI/MADT/MCFG discovery, PCI/PCIe enumeration, BAR/MMIO mapping, masked I/O APIC routing, MSI/MSI-X discovery metadata, and the DMA/MMIO/IRQ authority boundaries that future virtio block/network work consumes. Long-form decomposition lives in docs/backlog/hardware-boot-storage.md and docs/dma-isolation-design.md.
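
As a concrete illustration of the ECAM side of that discovery work: PCIe configuration space is memory-mapped with 4 KiB per function, so a register's offset inside the window follows from the bus, device, and function numbers. The sketch below assumes a window whose first decoded bus is 0; `ecam_offset` is a hypothetical name and is not taken from kernel/src/pci.rs or kernel/src/arch/x86_64/pci_config.rs.

```rust
/// Hypothetical helper: byte offset of a configuration register inside an
/// ECAM window, relative to the base address reported by the ACPI MCFG entry.
/// Assumption: the window's first decoded bus is bus 0.
pub fn ecam_offset(bus: u8, device: u8, function: u8, register: u16) -> Option<u64> {
    // PCIe allows 32 devices per bus, 8 functions per device, and 4 KiB of
    // configuration space per function. Reject anything outside those bounds
    // instead of silently aliasing another function's registers.
    if device >= 32 || function >= 8 || register >= 4096 {
        return None;
    }
    Some(
        (u64::from(bus) << 20)
            | (u64::from(device) << 15)
            | (u64::from(function) << 12)
            | u64::from(register),
    )
}
```

Returning `Option` keeps the bounds check in one place, so a full scan can iterate bus/device/function ranges without computing an out-of-window address.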

Closeout rule: do not call cloud/storage/network expansion ready until the foundation can enumerate QEMU/firmware devices deterministically, route interrupts through the intended LAPIC/I/O APIC or MSI path, and expose enough bounded serial diagnostics to debug failures before higher-level drivers are trusted. WORKPLAN.md keeps the live milestone status; this plan stays in sync with it.

Pre-existing material caveats (do not start sub-gate 5 until each is closed):

  • The current foundation is still early bring-up. QEMU ECAM full scans, the BAR helper, the MSI/MSI-X registry, the device manager, and the DMA ledger are not production userspace-driver authority.
  • Userspace DeviceMmio, Interrupt, and DMAPool handles remain blocked on production lifecycle hooks, real handle/page generation and epoch enforcement, stale IRQ/DMA completion proofs, budget/OOM residency policy, IOMMU/remapping or bounce-buffer policy integration, and S.11.2 hostile smokes.
  • The production handle epoch invariants for future DMAPool, DeviceMmio, and Interrupt handles are documented in docs/dma-isolation-design.md. The bounded pure capos-lib::device_authority validator and host tests now cover those documented identity/state/epoch checks, and the current QEMU DMAPool lifecycle/imported-live proofs exercise a device-manager adapter through that validator. Production handle objects, broader DeviceMmio/Interrupt kernel wiring, QEMU stale-handle smokes, userspace exposure, and S.11.2 hostile smokes remain open.
  • I/O APIC ownership/dispatch through the shared interrupt-source registry, production interrupt waiters, full driver-binding policy, provider NIC/storage drivers, and cloud portability smokes are still planned.

Adjacent caveat (not a DDF prerequisite): make run-measure is broken on main because the thread-lifecycle measure-mode child exits before its park-path proof line. Track the repair in measure-mode-repair.md. It is listed here only so DDF workers know the regression exists; it does not gate any DDF task. The measure-mode plan also keeps WORKPLAN.md and REVIEW_FINDINGS.md in sync when it closes.

Conflict Surface

Owned by this plan:

  • kernel/src/pci.rs, kernel/src/arch/x86_64/pci_config.rs
  • kernel/src/device_interrupt.rs, kernel/src/device_dma.rs
  • kernel/src/device_manager.rs
  • kernel/src/acpi.rs IOMMU discovery/policy sections (current IOMMU implementation lives here, not in a dedicated iommu.rs; introduce a dedicated module only as part of a reviewed restructure)
  • kernel/src/virtio.rs and any new kernel/src/virtio_*.rs
  • kernel/src/diagnostics.rs device-related commands (including IOMMU table snapshots)
  • kernel/src/cap/device_*.rs, kernel/src/cap/dma_pool.rs (when introduced)
  • schema/capos.capnp device-authority interface additions (shared serial surface – see docs/plans/README.md Concurrency Notes; this plan owns device-authority interfaces only and must coordinate with the Remote Session, Aurelian, and Paperclips plans before changing the file)
  • docs/dma-isolation-design.md, docs/proposals/cloud-deployment-proposal.md, relevant docs/backlog/hardware-boot-storage.md sections
  • Makefile device/diagnostics targets such as run-net, run-diagnostics
  • ProcessSpawner / manifest plumbing for the new userspace device authority cap surface (Task 5). When a DDF iteration adds manifest fields or loader changes, the touched paths overlap with the system-config plan’s capos-config/ and tools/mkmanifest/ scope; coordinate that overlap before changing manifest shape so DDF and the focused-proof migration do not race the same loader.

Do not touch from this plan:

  • tools/remote-session-client/ (owned by remote-session plan)
  • demos/paperclips-*, demos/adventure-* (owned by demo plans)
  • cue/defaults/, repo-root system-*.cue migration scope (owned by system-config slice 3 plan)
  • demos/thread-lifecycle/ measure-mode entry point (owned by measure-mode-repair plan)

Validation Commands

  • make fmt-check
  • cargo build --features qemu
  • cargo test-lib
  • cargo test-config
  • cargo test-ring-loom
  • make generated-code-check
  • make dependency-policy-check
  • make capos-rt-check
  • make run-smoke
  • make run-net
  • make run-diagnostics (required when the iteration changes kernel/src/diagnostics.rs device/interrupt/DMA dumps or the feature-gated early-boot prompt commands consumed by DDF)

Success Criteria

The milestone is recorded done when WORKPLAN.md lists Device Driver Foundation in the completed-milestones block with a commit hash, and:

  • Userspace DeviceMmio, Interrupt, and DMAPool authority is exposed through reviewed cap-table entries with epoch-checked handles.
  • Stale IRQ/DMA completion proofs, S.11.2 hostile smokes, bounded budget/OOM residency policy, and IOMMU/bounce-buffer integration are recorded.
  • The diagnostics console can fully describe device, interrupt, and DMA ledger state on COM1 before higher-level drivers run.

Task 1: Production lifecycle hooks for DeviceMmio/Interrupt/DMAPool authority

  • Convert the current kernel-owned device_dma proof ledger into a real DMAPool authority record with page-backed pool lifecycle cleanup, pages staying committed/resident/unswappable while device-visible, and scrubbed-before-release semantics.
  • Carry the existing budget/OOM closed-cases policy into real userspace handle creation paths, including accounting checks for pool bytes, page count, MMIO mapping bytes, interrupt holds, RX/TX ring depth, and in-flight descriptors at handle creation, transfer, and revoke.
  • Wire device-manager teardown triggers (cap release, process exit, driver crash, reset/disable, interrupt waiter, future DeviceMmio, future DMAPool) to user-visible objects, not just trigger labels.
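
The budget checks in Task 1 can be pictured as a ledger that charges at handle creation and refunds at revoke. This is a minimal host-testable sketch under stated assumptions: `DeviceUsage`, `DeviceLedger`, and `BudgetError` are hypothetical names, only four of the listed counters are modeled, and the real policy in the kernel may be shaped differently.

```rust
/// Hypothetical per-owner accounting for device-visible resources.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct DeviceUsage {
    pub pool_bytes: u64,
    pub mmio_bytes: u64,
    pub interrupt_holds: u32,
    pub in_flight_descriptors: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum BudgetError {
    PoolBytes,
    MmioBytes,
    InterruptHolds,
    InFlightDescriptors,
}

/// Ledger that charges at handle creation and refunds at revoke.
#[derive(Debug)]
pub struct DeviceLedger {
    limit: DeviceUsage,
    used: DeviceUsage,
}

/// Accept a new total only if the addition did not overflow and fits the limit.
fn within<T: PartialOrd>(sum: Option<T>, limit: T, err: BudgetError) -> Result<T, BudgetError> {
    match sum {
        Some(value) if value <= limit => Ok(value),
        _ => Err(err),
    }
}

impl DeviceLedger {
    pub fn new(limit: DeviceUsage) -> Self {
        Self { limit, used: DeviceUsage::default() }
    }

    pub fn used(&self) -> DeviceUsage {
        self.used
    }

    /// Charge a request, failing closed: overflow or an exceeded limit leaves
    /// the ledger untouched, so a rejected request holds no budget.
    pub fn charge(&mut self, req: DeviceUsage) -> Result<(), BudgetError> {
        let (used, limit) = (self.used, self.limit);
        let next = DeviceUsage {
            pool_bytes: within(used.pool_bytes.checked_add(req.pool_bytes), limit.pool_bytes, BudgetError::PoolBytes)?,
            mmio_bytes: within(used.mmio_bytes.checked_add(req.mmio_bytes), limit.mmio_bytes, BudgetError::MmioBytes)?,
            interrupt_holds: within(used.interrupt_holds.checked_add(req.interrupt_holds), limit.interrupt_holds, BudgetError::InterruptHolds)?,
            in_flight_descriptors: within(used.in_flight_descriptors.checked_add(req.in_flight_descriptors), limit.in_flight_descriptors, BudgetError::InFlightDescriptors)?,
        };
        self.used = next;
        Ok(())
    }

    /// Refund a previously charged request at revoke or teardown.
    pub fn release(&mut self, req: DeviceUsage) {
        self.used.pool_bytes = self.used.pool_bytes.saturating_sub(req.pool_bytes);
        self.used.mmio_bytes = self.used.mmio_bytes.saturating_sub(req.mmio_bytes);
        self.used.interrupt_holds = self.used.interrupt_holds.saturating_sub(req.interrupt_holds);
        self.used.in_flight_descriptors = self.used.in_flight_descriptors.saturating_sub(req.in_flight_descriptors);
    }
}
```

Computing every new total before committing any of them means a request that fails on one counter cannot leave a partial charge on another.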

Task 2: Generation-checked DMA/MMIO/IRQ handles

  • Add production generation/epoch identity to userspace DMAPool, DeviceMmio, and Interrupt handles. The current bounded DMAPool record proof shows the shape; promote it to real handle objects.
  • Prove a stale handle fails closed after revoke, reset, reassignment, or object reuse via host tests and a QEMU smoke.
  • Document the handle epoch invariants in docs/dma-isolation-design.md alongside the existing state machine notes: object identity fields, owner generation versus pool/slot/mapping/source/route generation, non-wrapping epochs or permanent retirement, fail-closed behavior, and the host/QEMU proof obligations. This completes only the documentation subtask; production handles and proofs remain open.
  • Add a bounded pure host-testable validator for the documented production handle epoch invariants in capos-lib::device_authority. This completes only the ABI-independent validator prerequisite; production userspace handles, QEMU stale-handle smokes, and S.11.2 hostile smokes remain open.
  • Wire the current kernel device-manager DMAPool lifecycle and imported-live accounting proofs through the pure validator for active handle acceptance and stale-after-revoke rejection. This completes only the bounded DMAPool proof adapter; production userspace handles, DeviceMmio/Interrupt handle wiring, QEMU stale-handle smokes, and S.11.2 hostile smokes remain open.
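
The fail-closed behavior Task 2 asks for can be sketched as a pure check of handle identity against a slot record. Everything named here (`HandleId`, `SlotRecord`, `Reject`, the field names) is a hypothetical illustration and is not the capos-lib::device_authority API; it only models owner generation versus slot generation and retirement instead of wrap-around.

```rust
/// Identity carried by a userspace handle (hypothetical field names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct HandleId {
    pub slot: u32,
    pub owner_generation: u64,
    pub slot_generation: u64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SlotState {
    Active,
    Revoked,
    /// The generation counter is exhausted; the slot is never reused.
    Retired,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Reject {
    WrongSlot,
    NotActive,
    StaleOwnerGeneration,
    StaleSlotGeneration,
}

#[derive(Debug)]
pub struct SlotRecord {
    slot: u32,
    owner_generation: u64,
    slot_generation: u64,
    state: SlotState,
}

impl SlotRecord {
    pub fn new(slot: u32, owner_generation: u64) -> Self {
        Self { slot, owner_generation, slot_generation: 0, state: SlotState::Active }
    }

    /// Handle matching the record's current identity.
    pub fn handle(&self) -> HandleId {
        HandleId {
            slot: self.slot,
            owner_generation: self.owner_generation,
            slot_generation: self.slot_generation,
        }
    }

    /// Accept a handle only if every identity field matches an active record.
    pub fn validate(&self, handle: HandleId) -> Result<(), Reject> {
        if handle.slot != self.slot {
            return Err(Reject::WrongSlot);
        }
        if self.state != SlotState::Active {
            return Err(Reject::NotActive);
        }
        if handle.owner_generation != self.owner_generation {
            return Err(Reject::StaleOwnerGeneration);
        }
        if handle.slot_generation != self.slot_generation {
            return Err(Reject::StaleSlotGeneration);
        }
        Ok(())
    }

    /// Revoke bumps the slot generation; an exhausted counter retires the
    /// slot permanently rather than wrapping back to an old value.
    pub fn revoke(&mut self) {
        match self.slot_generation.checked_add(1) {
            Some(next) => {
                self.slot_generation = next;
                self.state = SlotState::Revoked;
            }
            None => self.state = SlotState::Retired,
        }
    }

    /// Reuse a revoked slot for a new owner; retired or active slots refuse.
    pub fn reassign(&mut self, owner_generation: u64) -> Option<HandleId> {
        if self.state != SlotState::Revoked {
            return None;
        }
        self.owner_generation = owner_generation;
        self.state = SlotState::Active;
        Some(self.handle())
    }
}
```

Because the slot generation changes on every revoke, a handle from before the revoke is rejected even when the same owner generation is assigned the slot again.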

Task 3: Stale IRQ/DMA completion proof (S.11.2)

Current prerequisite evidence: run-net now includes bounded device-manager stale IRQ after-detach, after-revoke masked-route, and synthetic route-registry reset-reuse proofs plus a scratch stale DMA completion ordering proof for generation-stale completions before completion accounting. The scratch stale completion proof now also exercises the pure DMA-buffer validator for active completion and stale-slot side-effect blocking, and a paired scratch proof now blocks synthetic reset/reuse completion publication and new-owner exposure. The interrupt handoff proof now also exercises the pure Interrupt validator for an active attached source and stale-owner side-effect blocking after revoke begins. The production stale IRQ/DMA hostile-smoke gate remains open.

  • Add a bounded device-manager interrupt handoff proof that a stale vector after detach is unregistered and reports stale_irq_wake_blocked=true. This is prerequisite evidence only; pending hardware IRQ/reset delivery, userspace Interrupt waiters, and reassignment reuse remain open.
  • Route the bounded device-manager interrupt handoff proof through the pure capos-lib::device_authority Interrupt validator. The QEMU gate records active wait validation, stale-owner-generation rejection, and side-effect blocking before preserving the existing stale route and stale vector delivery checks. This remains proof-adapter evidence only.
  • Add bounded stale IRQ after-revoke and reset-reuse proof points to the interrupt handoff smoke. The QEMU gate records a still-attached claimed route as masked after revoke begins, then records synthetic same-vector route-registry reuse during reset with a bumped route generation and masked delivery against the new route. This remains prerequisite evidence only; true pending hardware MSI/reset delivery, userspace Interrupt waiters, reassignment hostile smokes, and DMA buffer reuse races remain open.
  • Route the bounded scratch stale-DMA-completion proof through the pure capos-lib::device_authority DMA-buffer validator. The QEMU gate records active completion validation, stale-slot-generation rejection, and side-effect blocking before the existing stale-dma-handle completion outcome. This remains scratch/no-real-DMA prerequisite evidence only.
  • Add a paired bounded scratch stale-DMA-completion publication proof. The QEMU gate records synthetic reset rejection as stale-owner-generation with side-effect-blocked, same-slot reuse rejection as stale-dma-handle, preserved new-owner submission state, and blocked CQ publication plus new-owner exposure. This remains scratch/no-real-DMA prerequisite evidence only.
  • Add the production QEMU/host hostile-smoke proof that stale IRQ delivery after revoke or reset cannot wake a new owner or race reuse of freed DMA buffers. A stale interrupt must drain against the old generation and be ignored, or be prevented by mask/reset before reassignment.
  • Add the production paired stale-DMA-completion proof showing old completions cannot publish stale CQ notifications or expose new-owner memory after real revoke/reset and reuse.
  • Wire both proofs into the chosen automated gate (run-net extension or a new focused run-* smoke) and document the assertion shape.
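
One way to picture the assertion shape for the stale-DMA-completion side of Task 3: each completion carries the slot generation it was submitted under, and the gate publishes it only if that generation is still current. The sketch is a host-only model with hypothetical names (`CompletionGate`, `Completion`, `Delivery`); it performs no real DMA and is not the existing scratch proof. The same compare-before-side-effect shape would apply to a stale interrupt wake.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Completion {
    pub slot: usize,
    /// Slot generation observed when the descriptor was submitted.
    pub generation: u64,
    pub bytes: u32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Delivery {
    Published,
    StaleBlocked,
    UnknownSlot,
}

#[derive(Debug)]
pub struct CompletionGate {
    generations: Vec<u64>,
    published: Vec<Completion>,
    stale_blocked: u64,
}

impl CompletionGate {
    pub fn new(slots: usize) -> Self {
        Self { generations: vec![0; slots], published: Vec::new(), stale_blocked: 0 }
    }

    /// Model revoke/reset: bump the slot generation before the slot is reused.
    /// Returns None for an unknown slot or an exhausted counter; in the latter
    /// case the caller is expected to retire the slot instead of reusing it.
    pub fn reset_slot(&mut self, slot: usize) -> Option<u64> {
        let generation = self.generations.get_mut(slot)?;
        *generation = generation.checked_add(1)?;
        Some(*generation)
    }

    /// Publish a completion only if its generation is current. A stale
    /// completion is counted and dropped before any side effect occurs.
    pub fn deliver(&mut self, completion: Completion) -> Delivery {
        match self.generations.get(completion.slot) {
            None => Delivery::UnknownSlot,
            Some(&current) if current == completion.generation => {
                self.published.push(completion);
                Delivery::Published
            }
            Some(_) => {
                self.stale_blocked += 1;
                Delivery::StaleBlocked
            }
        }
    }

    pub fn published(&self) -> &[Completion] {
        &self.published
    }

    pub fn stale_blocked(&self) -> u64 {
        self.stale_blocked
    }
}
```

A smoke built on this shape would assert both halves: the stale counter increments, and the published queue never contains an entry from the old generation.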

Task 4: IOMMU policy integration

  • Replace the current direct-DMA-blocked, bounce-buffer-only policy with reviewed IOMMU domain programming or an explicit bounce-buffer policy tied to device-manager state. Keep PCI DMA diagnostics separated between retained IOMMU metadata attachment/coverage and the active direct-DMA policy.
  • Verify the PCI_DEVICE_MANAGER -> DEVICE_INTERRUPT_ROUTES lock order survives the change. Document any new lock with the same explicit ordering note.
  • Add a bounded diagnostics dump that explains current trusted-domain and bounce-buffer state on COM1 without leaking owner identity. This completes only the serial diagnostics mirror: the devices command reports the current blocked direct-DMA policy, zero trusted domains, unprogrammed remapping tables, IOVA-only future exports, userspace device authority still not started, and kernel-owned bounce-buffer-only prototype devices.
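
The lock-order requirement in Task 4 is easier to keep when only one code path can take both locks. The sketch below uses `std::sync::Mutex` as a host-side stand-in for whatever lock type the kernel uses; `DeviceLocks`, `DeviceManagerState`, and `InterruptRoutes` are hypothetical types chosen to mirror the PCI_DEVICE_MANAGER -> DEVICE_INTERRUPT_ROUTES order, not the kernel's actual statics.

```rust
use std::sync::Mutex;

/// Stand-in for state guarded by the device-manager lock (hypothetical).
pub struct DeviceManagerState {
    pub bounce_only: bool,
}

/// Stand-in for state guarded by the interrupt-routes lock (hypothetical).
pub struct InterruptRoutes {
    pub masked: Vec<bool>,
}

pub struct DeviceLocks {
    manager: Mutex<DeviceManagerState>,
    routes: Mutex<InterruptRoutes>,
}

impl DeviceLocks {
    pub fn new(manager: DeviceManagerState, routes: InterruptRoutes) -> Self {
        Self { manager: Mutex::new(manager), routes: Mutex::new(routes) }
    }

    /// Lock order: device manager first, then interrupt routes.
    /// Keeping both fields private makes this the only way to hold the two
    /// locks together, so the order cannot be inverted by a caller.
    pub fn with_manager_then_routes<R>(
        &self,
        f: impl FnOnce(&mut DeviceManagerState, &mut InterruptRoutes) -> R,
    ) -> R {
        let mut manager = self.manager.lock().expect("device manager lock poisoned");
        let mut routes = self.routes.lock().expect("interrupt routes lock poisoned");
        // Guards drop in reverse declaration order: routes first, then manager.
        f(&mut *manager, &mut *routes)
    }
}
```

A new lock introduced by the IOMMU work could be added as a third private field with its position in the order stated in the same doc comment.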

Task 5: Userspace DeviceMmio and Interrupt authority cap surface

  • Expose userspace DeviceMmio and Interrupt authority only after Tasks 1-4 close: the generic interrupt path, the second-device proof, the device manager, and the DMAPool/S.11.2 hostile-smoke gates. This is the first userspace-driver authority boundary, not a place to introduce new lower-level routing or DMA safety.
  • Add the cap-table entries, ProcessSpawner manifest plumbing, and at least one in-tree consumer (provider NIC or block driver smoke) that exercises bounded MMIO mapping, interrupt acknowledgment, and DMA handle release without hitting the legacy in-kernel path.
  • Add cap-tagged audit records and refresh docs/proposals/security-and-verification-proposal.md with the new boundaries.

Task 6: Provider NIC/storage smoke and cloud portability gate

  • Pick the first provider-driver smoke (virtio-blk or a userspace virtio-net path) and prove it through reviewed user-mode driver authority rather than the existing in-kernel proof ledger.
  • Record a cloud portability check (GCP/AWS serial-console boot or an equivalent stand-in) so future cloud expansion does not re-derive device discovery rules.
  • Record completion in docs/roadmap.md, docs/backlog/hardware-boot-storage.md, WORKPLAN.md, REVIEW_FINDINGS.md, and docs/changelog.md (per WORKPLAN, gate history lives there). In REVIEW_FINDINGS.md, close any DDF blockers it currently advertises (DMA owner state machine, generation-checked handles, IRQ/DMA stale interrupt proof, DMA ResourceLedger/OOM, etc.). Each record carries the commit hash and a minute-precision timestamp with the explicit timezone abbreviation from date '+%Y-%m-%d %H:%M %Z', e.g. 2026-04-20 17:42 UTC (per CLAUDE.md).