# Plan: Device Driver Foundation

## Overview

Selected milestone for capOS. Build reusable hardware/cloud bring-up plumbing
before storage or network expansion: ACPI/MADT/MCFG discovery, PCI/PCIe
enumeration, BAR/MMIO mapping, masked I/O APIC routing, MSI/MSI-X discovery
metadata, and the DMA/MMIO/IRQ authority boundaries that future virtio
block/network work consumes. Long-form decomposition lives in
`docs/backlog/hardware-boot-storage.md` and `docs/dma-isolation-design.md`.

Closeout rule: do not call cloud/storage/network expansion ready until the
foundation can enumerate QEMU/firmware devices deterministically, route
interrupts through the intended LAPIC/I/O APIC or MSI path, and expose enough
bounded serial diagnostics to debug failures before higher-level drivers are
trusted. `WORKPLAN.md` keeps the live milestone status; this plan stays in
sync with it.

Pre-existing material caveats (do not start Task 5 until each is closed):

- The current foundation is still early bring-up. QEMU ECAM full scans, the
  BAR helper, the MSI/MSI-X registry, the device manager, and the DMA ledger
  are not production userspace-driver authority.
- Userspace `DeviceMmio`, `Interrupt`, and `DMAPool` handles remain blocked on
  production lifecycle hooks, real handle/page generation and epoch
  enforcement, stale IRQ/DMA completion proofs, budget/OOM residency policy,
  IOMMU/remapping or bounce-buffer policy integration, and S.11.2 hostile
  smokes.
- The production handle epoch invariants for future `DMAPool`, `DeviceMmio`,
  and `Interrupt` handles are documented in `docs/dma-isolation-design.md`.
  The bounded pure `capos-lib::device_authority` validator and host tests now
  cover those documented identity/state/epoch checks, and the current QEMU
  `DMAPool` lifecycle/imported-live proofs exercise a device-manager adapter
  through that validator. Production handle objects, broader
  `DeviceMmio`/`Interrupt` kernel wiring, QEMU stale-handle smokes, userspace
  exposure, and S.11.2 hostile smokes remain open.
- I/O APIC ownership/dispatch through the shared interrupt-source registry,
  production interrupt waiters, full driver-binding policy, provider
  NIC/storage drivers, and cloud portability smokes are still planned.

Adjacent caveat (not a DDF prerequisite): `make run-measure` is broken on
`main` because the `thread-lifecycle` measure-mode child exits before its
park-path proof line. Track the repair in `measure-mode-repair.md`. It is
listed here only so DDF workers know the regression exists; it does not
gate any DDF task. The measure-mode plan also keeps `WORKPLAN.md` and
`REVIEW_FINDINGS.md` in sync when it closes.

## Conflict Surface

Owned by this plan:

- `kernel/src/pci.rs`, `kernel/src/arch/x86_64/pci_config.rs`
- `kernel/src/device_interrupt.rs`, `kernel/src/device_dma.rs`
- `kernel/src/device_manager.rs`
- `kernel/src/acpi.rs` IOMMU discovery/policy sections (current IOMMU
  implementation lives here, not in a dedicated `iommu.rs`; introduce a
  dedicated module only as part of a reviewed restructure)
- `kernel/src/virtio.rs` and any new `kernel/src/virtio_*.rs`
- `kernel/src/diagnostics.rs` device-related commands (including IOMMU
  table snapshots)
- `kernel/src/cap/device_*.rs`, `kernel/src/cap/dma_pool.rs` (when introduced)
- `schema/capos.capnp` device-authority interface additions (shared serial
  surface -- see `docs/plans/README.md` Concurrency Notes; this plan owns
  device-authority interfaces only and must coordinate with the
  Remote Session, Aurelian, and Paperclips plans before changing the
  file)
- `docs/dma-isolation-design.md`, `docs/proposals/cloud-deployment-proposal.md`,
  relevant `docs/backlog/hardware-boot-storage.md` sections
- `Makefile` device/diagnostics targets such as `run-net`, `run-diagnostics`
- ProcessSpawner / manifest plumbing for the new userspace device authority
  cap surface (Task 5). When a DDF iteration adds manifest fields or
  loader changes, the touched paths overlap with the system-config plan's
  `capos-config/` and `tools/mkmanifest/` scope; coordinate that overlap
  before changing manifest shape so DDF and the focused-proof migration
  do not race the same loader.

Do not touch from this plan:

- `tools/remote-session-client/` (owned by remote-session plan)
- `demos/paperclips-*`, `demos/adventure-*` (owned by demo plans)
- `cue/defaults/`, repo-root `system-*.cue` migration scope (owned by
  system-config slice 3 plan)
- `demos/thread-lifecycle/` measure-mode entry point (owned by
  measure-mode-repair plan)

## Validation Commands

- `make fmt-check`
- `cargo build --features qemu`
- `cargo test-lib`
- `cargo test-config`
- `cargo test-ring-loom`
- `make generated-code-check`
- `make dependency-policy-check`
- `make capos-rt-check`
- `make run-smoke`
- `make run-net`
- `make run-diagnostics` (required when the iteration changes
  `kernel/src/diagnostics.rs` device/interrupt/DMA dumps or the
  feature-gated early-boot prompt commands consumed by DDF)

## Success Criteria

The milestone is recorded as done when `WORKPLAN.md` lists Device Driver
Foundation in the completed-milestones block with a commit hash, and:

- Userspace `DeviceMmio`, `Interrupt`, and `DMAPool` authority is exposed
  through reviewed cap-table entries with epoch-checked handles.
- Stale IRQ/DMA completion proofs, S.11.2 hostile smokes, bounded budget/OOM
  residency policy, and IOMMU/bounce-buffer integration are recorded.
- The diagnostics console can fully describe device, interrupt, and DMA
  ledger state on COM1 before higher-level drivers run.

### Task 1: Production lifecycle hooks for DeviceMmio/Interrupt/DMAPool authority

- [ ] Convert the current kernel-owned `device_dma` proof ledger into a real
      `DMAPool` authority record with page-backed pool lifecycle cleanup,
      pages staying committed/resident/unswappable while device-visible, and
      scrubbed-before-release semantics.
- [ ] Carry the existing budget/OOM closed-cases policy into real userspace
      handle creation paths. Check pool bytes, page count, MMIO mapping
      bytes, interrupt holds, RX/TX ring depth, and in-flight descriptor
      accounting at handle creation, transfer, and revoke.
- [ ] Wire device-manager teardown triggers (cap release, process exit,
      driver crash, reset/disable, interrupt waiter, future `DeviceMmio`,
      future `DMAPool`) to user-visible objects, not just trigger labels.
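
The budget item above can be sketched as an all-or-nothing ledger charge. This
is a minimal illustration, not the capOS implementation: `DeviceBudget`,
`BudgetError`, and `Ledger` are hypothetical names, and only four of the
listed counters are modeled.

```rust
/// Hypothetical per-owner device budget. Field names are illustrative and are
/// not taken from the capOS tree.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct DeviceBudget {
    pub pool_bytes: u64,
    pub mmio_bytes: u64,
    pub interrupt_holds: u32,
    pub in_flight_descriptors: u32,
}

/// Which limit a rejected request would have exceeded.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BudgetError {
    PoolBytes,
    MmioBytes,
    InterruptHolds,
    InFlightDescriptors,
}

/// Tracks usage against a fixed limit for one owner (assumed shape).
#[derive(Debug)]
pub struct Ledger {
    limit: DeviceBudget,
    used: DeviceBudget,
}

// Overflow-checked add that also enforces the limit.
fn add64(used: u64, req: u64, limit: u64, err: BudgetError) -> Result<u64, BudgetError> {
    used.checked_add(req).filter(|v| *v <= limit).ok_or(err)
}

fn add32(used: u32, req: u32, limit: u32, err: BudgetError) -> Result<u32, BudgetError> {
    used.checked_add(req).filter(|v| *v <= limit).ok_or(err)
}

impl Ledger {
    pub fn new(limit: DeviceBudget) -> Self {
        Self { limit, used: DeviceBudget::default() }
    }

    pub fn used(&self) -> DeviceBudget {
        self.used
    }

    /// Charge every counter or none: all sums are computed and checked before
    /// `used` is written, so a rejected request leaves no residue.
    pub fn charge(&mut self, req: DeviceBudget) -> Result<(), BudgetError> {
        let (u, l) = (self.used, self.limit);
        let next = DeviceBudget {
            pool_bytes: add64(u.pool_bytes, req.pool_bytes, l.pool_bytes, BudgetError::PoolBytes)?,
            mmio_bytes: add64(u.mmio_bytes, req.mmio_bytes, l.mmio_bytes, BudgetError::MmioBytes)?,
            interrupt_holds: add32(
                u.interrupt_holds,
                req.interrupt_holds,
                l.interrupt_holds,
                BudgetError::InterruptHolds,
            )?,
            in_flight_descriptors: add32(
                u.in_flight_descriptors,
                req.in_flight_descriptors,
                l.in_flight_descriptors,
                BudgetError::InFlightDescriptors,
            )?,
        };
        self.used = next;
        Ok(())
    }

    /// Return a previous charge on revoke or release. Saturating subtraction
    /// keeps a double release from underflowing the counters.
    pub fn release(&mut self, req: DeviceBudget) {
        let u = &mut self.used;
        u.pool_bytes = u.pool_bytes.saturating_sub(req.pool_bytes);
        u.mmio_bytes = u.mmio_bytes.saturating_sub(req.mmio_bytes);
        u.interrupt_holds = u.interrupt_holds.saturating_sub(req.interrupt_holds);
        u.in_flight_descriptors = u.in_flight_descriptors.saturating_sub(req.in_flight_descriptors);
    }
}
```

Computing the whole next state before committing it is one way to keep a
failed creation or transfer from leaving partial charges behind. A real ledger
would also need the page-count and ring-depth counters named above.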

### Task 2: Generation-checked DMA/MMIO/IRQ handles

- [ ] Add production generation/epoch identity to userspace `DMAPool`,
      `DeviceMmio`, and `Interrupt` handles. The current bounded `DMAPool`
      record proof shows the shape; promote it to real handle objects.
- [ ] Prove a stale handle fails closed after revoke, reset, reassignment,
      or object reuse via host tests and a QEMU smoke.
- [x] Document the handle epoch invariants in `docs/dma-isolation-design.md`
      alongside the existing state machine notes: object identity fields,
      owner generation versus pool/slot/mapping/source/route generation,
      non-wrapping epochs or permanent retirement, fail-closed behavior, and
      the host/QEMU proof obligations. This completes only the documentation
      subtask; production handles and proofs remain open.
- [x] Add a bounded pure host-testable validator for the documented production
      handle epoch invariants in `capos-lib::device_authority`. This completes
      only the ABI-independent validator prerequisite; production userspace
      handles, QEMU stale-handle smokes, and S.11.2 hostile smokes remain
      open.
- [x] Wire the current kernel device-manager `DMAPool` lifecycle and
      imported-live accounting proofs through the pure validator for active
      handle acceptance and stale-after-revoke rejection. This completes only
      the bounded `DMAPool` proof adapter; production userspace handles,
      `DeviceMmio`/`Interrupt` handle wiring, QEMU stale-handle smokes, and
      S.11.2 hostile smokes remain open.
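
To make the fail-closed epoch check concrete, here is a minimal sketch of a
generation-checked slot. It is not the `capos-lib::device_authority` API:
`HandleEpoch`, `Slot`, `SlotState`, and `Reject` are hypothetical names, and
the sketch collapses the pool/slot/mapping/source/route generations into a
single `slot_generation`.

```rust
/// Identity a handle carries (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct HandleEpoch {
    pub owner_generation: u64,
    pub slot_generation: u64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SlotState {
    Active,
    Revoking,
    Retired,
}

/// Reasons a handle is rejected. Every non-matching case is an error, so the
/// check fails closed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Reject {
    Retired,
    StaleOwnerGeneration,
    StaleSlotGeneration,
    NotActive,
}

#[derive(Debug)]
pub struct Slot {
    pub state: SlotState,
    pub owner_generation: u64,
    pub slot_generation: u64,
}

impl Slot {
    pub fn new() -> Self {
        Self { state: SlotState::Active, owner_generation: 1, slot_generation: 1 }
    }

    /// Mint a handle for the current owner and slot generation.
    pub fn handle(&self) -> HandleEpoch {
        HandleEpoch {
            owner_generation: self.owner_generation,
            slot_generation: self.slot_generation,
        }
    }

    /// Accept a handle only if both generations match and the slot is active.
    pub fn validate(&self, h: HandleEpoch) -> Result<(), Reject> {
        if self.state == SlotState::Retired {
            return Err(Reject::Retired);
        }
        if h.owner_generation != self.owner_generation {
            return Err(Reject::StaleOwnerGeneration);
        }
        if h.slot_generation != self.slot_generation {
            return Err(Reject::StaleSlotGeneration);
        }
        if self.state != SlotState::Active {
            return Err(Reject::NotActive);
        }
        Ok(())
    }

    /// Start revocation: existing handles stop validating immediately.
    pub fn begin_revoke(&mut self) {
        if self.state == SlotState::Active {
            self.state = SlotState::Revoking;
        }
    }

    /// Hand the slot to a new owner. Generations never wrap: if either
    /// counter would overflow, the slot is retired permanently instead.
    pub fn reassign(&mut self) {
        let next = (
            self.owner_generation.checked_add(1),
            self.slot_generation.checked_add(1),
        );
        match next {
            (Some(owner), Some(slot)) if self.state != SlotState::Retired => {
                self.owner_generation = owner;
                self.slot_generation = slot;
                self.state = SlotState::Active;
            }
            _ => self.state = SlotState::Retired,
        }
    }
}
```

A handle minted before `begin_revoke` is rejected as `NotActive` during
revocation and as `StaleOwnerGeneration` after `reassign`, which is the
stale-after-revoke and stale-after-reuse behavior the tasks above ask the host
tests and QEMU smoke to prove.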

### Task 3: Stale IRQ/DMA completion proof (S.11.2)

Current prerequisite evidence: `run-net` includes bounded device-manager
stale-IRQ proofs (after-detach, after-revoke masked-route, and synthetic
route-registry reset-reuse) plus a scratch stale-DMA-completion ordering proof
that checks generation-stale completions before completion accounting. The
scratch stale-completion proof also exercises the pure DMA-buffer validator
for active completion and stale-slot side-effect blocking, and a paired
scratch proof blocks synthetic reset/reuse completion publication and
new-owner exposure. The interrupt handoff proof also exercises the pure
Interrupt validator for an active attached source and for stale-owner
side-effect blocking after revoke begins. The production stale IRQ/DMA
hostile-smoke gate remains open.

- [x] Add a bounded device-manager interrupt handoff proof that a stale vector
      after detach is `unregistered` and reports `stale_irq_wake_blocked=true`.
      This is prerequisite evidence only; pending hardware IRQ/reset delivery,
      userspace `Interrupt` waiters, and reassignment reuse remain open.
- [x] Route the bounded device-manager interrupt handoff proof through the pure
      `capos-lib::device_authority` Interrupt validator. The QEMU gate records
      active wait validation, stale-owner-generation rejection, and
      side-effect blocking before preserving the existing stale route and
      stale vector delivery checks. This remains proof-adapter evidence only.
- [x] Add bounded stale IRQ after-revoke and reset-reuse proof points to the
      interrupt handoff smoke. The QEMU gate records a still-attached claimed
      route as masked after revoke begins, then records synthetic same-vector
      route-registry reuse during reset with a bumped route generation and
      masked delivery against the new route. This remains prerequisite evidence
      only; true pending hardware MSI/reset delivery, userspace `Interrupt`
      waiters, reassignment hostile smokes, and DMA buffer reuse races remain
      open.
- [x] Route the bounded scratch stale-DMA-completion proof through the pure
      `capos-lib::device_authority` DMA-buffer validator. The QEMU gate records
      active completion validation, stale-slot-generation rejection, and
      side-effect blocking before the existing `stale-dma-handle` completion
      outcome. This remains scratch/no-real-DMA prerequisite evidence only.
- [x] Add a paired bounded scratch stale-DMA-completion publication proof. The
      QEMU gate records synthetic reset rejection as `stale-owner-generation`
      with `side-effect-blocked`, same-slot reuse rejection as
      `stale-dma-handle`, preserved new-owner submission state, and blocked CQ
      publication plus new-owner exposure. This remains scratch/no-real-DMA
      prerequisite evidence only.
- [ ] Add the production QEMU/host hostile-smoke proof that stale IRQ delivery
      after revoke or reset cannot wake a new owner or race reuse of freed DMA
      buffers. A stale interrupt must drain against the old generation and be
      ignored, or be prevented by mask/reset before reassignment.
- [ ] Add the production paired stale-DMA-completion proof showing old
      completions cannot publish stale CQ notifications or expose new-owner
      memory after real revoke/reset and reuse.
- [ ] Wire both proofs into the chosen automated gate (`run-net` extension
      or a new focused `run-*` smoke) and document the assertion shape.
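
One possible assertion shape for the stale-IRQ rule above (drain against the
old generation, or mask before reassignment) is sketched below. `RouteTable`,
`VectorSlot`, and `Delivery` are hypothetical names, and the model assumes
each pending interrupt is tagged with the route generation that was current
when it was raised; the real registry may track this differently.

```rust
/// Outcome of attempting to deliver one pending interrupt (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Delivery {
    WokeOwner(u32),
    DroppedStale,
    DroppedMasked,
    Unregistered,
}

/// Per-vector route state. The generation survives detach so that a later
/// owner of the same vector never shares a generation with an earlier one.
#[derive(Debug, Default, Clone, Copy)]
struct VectorSlot {
    generation: u64,
    owner: Option<u32>,
    masked: bool,
}

pub struct RouteTable {
    slots: [VectorSlot; 4],
}

impl RouteTable {
    pub fn new() -> Self {
        Self { slots: [VectorSlot::default(); 4] }
    }

    /// Attach `owner` to `vector` and return the new route generation.
    /// Returns `None` for an unknown vector or an exhausted generation.
    pub fn attach(&mut self, vector: usize, owner: u32) -> Option<u64> {
        let slot = self.slots.get_mut(vector)?;
        slot.generation = slot.generation.checked_add(1)?;
        slot.owner = Some(owner);
        slot.masked = false;
        Some(slot.generation)
    }

    /// Revoke begins: keep the route attached but mask delivery.
    pub fn begin_revoke(&mut self, vector: usize) {
        if let Some(slot) = self.slots.get_mut(vector) {
            slot.masked = true;
        }
    }

    /// Detach the owner; the vector is unregistered until the next attach.
    pub fn detach(&mut self, vector: usize) {
        if let Some(slot) = self.slots.get_mut(vector) {
            slot.owner = None;
            slot.masked = true;
        }
    }

    /// Deliver an interrupt that was raised under `raised_generation`.
    /// Only a current-generation, unmasked route can wake its owner.
    pub fn deliver(&self, vector: usize, raised_generation: u64) -> Delivery {
        let slot = match self.slots.get(vector) {
            Some(slot) => slot,
            None => return Delivery::Unregistered,
        };
        match slot.owner {
            None => Delivery::Unregistered,
            Some(_) if raised_generation != slot.generation => Delivery::DroppedStale,
            Some(_) if slot.masked => Delivery::DroppedMasked,
            Some(owner) => Delivery::WokeOwner(owner),
        }
    }
}
```

In this model an interrupt raised for the old owner can end as
`DroppedMasked`, `Unregistered`, or `DroppedStale`, but never as `WokeOwner`
for the new owner. A production proof would additionally have to cover real
pending MSI delivery and the paired DMA buffer reuse race.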

### Task 4: IOMMU policy integration

- [ ] Replace the current direct-DMA-blocked, bounce-buffer-only policy with
      reviewed IOMMU domain programming or an explicit bounce-buffer policy
      tied to device-manager state. Keep PCI DMA diagnostics separated
      between retained IOMMU metadata attachment/coverage and the active
      direct-DMA policy.
- [ ] Verify the `PCI_DEVICE_MANAGER` -> `DEVICE_INTERRUPT_ROUTES` lock
      order survives the change. Document any new lock with the same
      explicit ordering note.
- [x] Add a bounded diagnostics dump that explains current trusted-domain
      and bounce-buffer state on COM1 without leaking owner identity.
      This completes only the serial diagnostics mirror: the `devices`
      command reports the current blocked direct-DMA policy, zero trusted
      domains, unprogrammed remapping tables, IOVA-only future exports,
      userspace device authority still not started, and kernel-owned
      bounce-buffer-only prototype devices.
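
The lock-order item above could be enforced structurally by funneling every
two-lock path through one helper. The sketch below uses `std::sync::Mutex` as
a stand-in for whatever lock type the kernel uses, and the `DeviceManager` and
`InterruptRoutes` struct contents and the `with_manager_then_routes` helper
are hypothetical; only the two lock names and their order come from this plan.

```rust
use std::sync::Mutex;

/// Placeholder state; the real structures are assumed to be much larger.
pub struct DeviceManager {
    pub claimed_devices: u32,
}

pub struct InterruptRoutes {
    pub attached_routes: u32,
}

// Lock order: PCI_DEVICE_MANAGER -> DEVICE_INTERRUPT_ROUTES.
// Any new lock added beside these should carry the same explicit note.
pub static PCI_DEVICE_MANAGER: Mutex<DeviceManager> =
    Mutex::new(DeviceManager { claimed_devices: 0 });
pub static DEVICE_INTERRUPT_ROUTES: Mutex<InterruptRoutes> =
    Mutex::new(InterruptRoutes { attached_routes: 0 });

/// Run `f` with both locks held, always taking the device manager first.
/// The guards drop in reverse declaration order, so the routes lock is
/// released before the manager lock.
pub fn with_manager_then_routes<R>(
    f: impl FnOnce(&mut DeviceManager, &mut InterruptRoutes) -> R,
) -> R {
    let mut manager = PCI_DEVICE_MANAGER.lock().unwrap();
    let mut routes = DEVICE_INTERRUPT_ROUTES.lock().unwrap();
    f(&mut manager, &mut routes)
}
```

If every path that needs both locks goes through a helper like this, an IOMMU
policy change cannot accidentally introduce the reverse order; code that needs
only one of the locks can still take it directly.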

### Task 5: Userspace DeviceMmio and Interrupt authority cap surface

- [ ] Expose userspace `DeviceMmio` and `Interrupt` authority only after
      Tasks 1-4 close: the generic interrupt path, the second-device proof,
      the device manager, and the `DMAPool`/S.11.2 hostile-smoke gates.
      This is the first userspace-driver authority boundary, not a place
      to introduce new lower-level routing or DMA safety.
- [ ] Add the cap-table entries, ProcessSpawner manifest plumbing, and at
      least one in-tree consumer (provider NIC or block driver smoke) that
      exercises bounded MMIO mapping, interrupt acknowledgment, and DMA
      handle release without hitting the legacy in-kernel path.
- [ ] Add cap-tagged audit records and refresh `docs/proposals/security-and-verification-proposal.md`
      with the new boundaries.

### Task 6: Provider NIC/storage smoke and cloud portability gate

- [ ] Pick the first provider-driver smoke (virtio-blk or a userspace
      virtio-net path) and prove it through reviewed user-mode driver
      authority rather than the existing in-kernel proof ledger.
- [ ] Record a cloud portability check (GCP/AWS serial-console boot or an
      equivalent stand-in) so future cloud expansion does not re-derive
      device discovery rules.
- [ ] Record completion, with commit hash and a minute-precision timestamp,
      in `docs/roadmap.md`, `docs/backlog/hardware-boot-storage.md`,
      `WORKPLAN.md`, `REVIEW_FINDINGS.md`, and `docs/changelog.md` (per
      WORKPLAN, gate history lives there). In `REVIEW_FINDINGS.md`, close any
      DDF blockers it currently advertises (DMA owner state machine,
      generation-checked handles, IRQ/DMA stale interrupt proof, DMA
      ResourceLedger/OOM, etc.). Format the timestamp with the explicit
      timezone abbreviation from `date '+%Y-%m-%d %H:%M %Z'`, e.g.
      `2026-04-20 17:42 UTC` (per CLAUDE.md).
