Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Proposal: Device Manager Refactor

kernel/src/device_manager.rs is the current convergence point for production device authority, transitional userspace cap surfaces, QEMU proof harnesses, audit labels, DMA/MMIO/IRQ policy, and serialization checks. That shape was useful while the Device Driver Foundation gate was evolving, but it now hides the target userspace-driver model behind a single large file.

The refactor should keep the kernel device manager as the authoritative ownership ledger for claimed devices while separating proof scaffolding and domain-specific record logic into clearer modules. It must not weaken the single ownership transaction across DMAPool, DMABuffer, DeviceMmio, and Interrupt.

Design Grounding

  • DMA Isolation Design requires one device-manager ledger of record for each claimed device before userspace NIC or block drivers receive hardware authority.
  • Service Architecture makes init the holder of DeviceManager and ProcessSpawner, with child hardware drivers receiving only scoped device caps.
  • Networking defines the target NIC split: a userspace NIC driver holds DeviceMmio, Interrupt, and DMAPool, then exports a Nic cap to a separate network stack.
  • Device Driver Foundation is the active implementation track for the hardware authority gates that make this refactor useful.

No external prior-art report is required for the initial split: this is a repo-local maintainability refactor that preserves the existing accepted authority model rather than selecting a new OS design.

Target Module Shape

Convert kernel/src/device_manager.rs into kernel/src/device_manager/:

kernel/src/device_manager/
  mod.rs          public API, re-exports, lock-order notes
  handles.rs      BDF, owner/state, and handle structs
  error.rs        production DeviceManagerError and display helpers
  registry.rs     PCI claim, transfer, revoke, release, generation table
  ledger.rs       aggregate claimed-device MMIO/DMA/IRQ ledger helpers
  mmio.rs         DeviceMmio records and map/unmap/read/write admission
  dma_pool.rs     DMAPool records, accounting, budget, teardown evidence
  dma_buffer.rs   DMABuffer records, map/free/submit/complete admission
  interrupt.rs    interrupt records and route/wait/ack/mask/unmask admission
  proofs/         transitional QEMU proof entry points and proof logs

PciDeviceRecord should remain the aggregate owner of a claimed device’s ledger. The split should move record-specific logic behind modules, not create independent managers that can diverge during teardown.

Refactor Strategy

  1. Split proof code first. Move prove_qemu_*, proof log structs, proof-only error enums, and bounded proof helper functions into a proof module. Keep wrapper functions in mod.rs so current virtio.rs call sites do not churn.

  2. Split handles and errors. Move PciBdf, owner/state enums, handle structs, DeviceMmioRegion, and DeviceManagerError into dedicated modules. This should be mostly mechanical and makes later diffs smaller.

  3. Split record domains. Move MMIO, DMA pool, DMA buffer, and interrupt attached-record logic into domain modules. PciDeviceRecord may keep aggregate arrays and lifecycle state, but the validation and admission details should live with their domain.

  4. Preserve one authoritative ledger. Every operation that creates, consumes, or releases device-visible authority must still update the claimed-device ledger as part of the same ownership transaction that changes device-manager state.

  5. Improve internal APIs after the split. Once modules are separated, introduce small transaction helpers such as with_active_record, with_devicemmio_record, with_dmapool_record, or typed admission contexts to remove repeated stale-handle, owner, and state checks.

Constraints

  • Preserve the existing lock order: PCI_DEVICE_MANAGER before DEVICE_INTERRUPT_ROUTES.
  • Preserve cap semantics, audit labels, proof labels, and QEMU smoke output during the initial split.
  • Keep userspace-driver authority blocked until the Device Driver Foundation gates still marked open are closed.
  • Avoid broad call-site churn. Compatibility wrappers are acceptable during the mechanical phase.
  • Do not move authority decisions into userspace. Userspace drivers receive scoped caps, but the kernel remains the ledger and enforcement point.
  • Keep proof code available until userspace-driver production gates have equivalent coverage.

Validation

For mechanical file movement, run:

  • make fmt-check
  • cargo build --features qemu
  • make workflow-check

When a slice moves code that emits or validates device proof labels, also run the affected QEMU gates:

  • make run-net
  • make run-devicemmio-grant
  • make run-dmapool-grant
  • make run-interrupt-grant
  • relevant make run-hardware-audit* targets when audit or proof labels move

Choose those gates from the moved authority surface, not from the file move alone:

  • proof-log or proof-label movement needs the QEMU target that asserts those exact proof lines;
  • grant-source or cap-object movement needs the matching run-*-grant target plus any parent lifecycle target it depends on;
  • audit emission, snapshot decode, or audit-label movement needs the matching run-hardware-audit* target;
  • MMIO, DMA, IRQ, or teardown transaction movement needs the focused grant target and the broader device proof target that exercises stale handle, revoke/reset, or release behavior;
  • pure type, handle, or error-module movement may stop at make fmt-check, cargo build --features qemu, and make workflow-check only when the public diff leaves proof labels, grant behavior, and authority transactions unchanged.

Success Criteria

  • device_manager is a module tree rather than one monolithic source file.
  • Production authority paths are visibly separated from QEMU proof scaffolding.
  • Public behavior and existing proof/audit labels are unchanged by the initial split.
  • The module boundaries match the target userspace-driver design: kernel code owns claim, revoke, teardown, MMIO, DMA, and IRQ authority; userspace drivers consume only scoped capabilities.