# Repository Map

This map names the main source locations for the current system. It is not an
ownership file; use it to find the code behind architecture and validation
claims.

## Root Files

- `README.md` gives the compact project overview.
- `docs/roadmap.md` records long-range stages and broad feature direction.
- `docs/tasks/state.toml` records the current selected milestone.
- `docs/tasks/README.md` defines the task-ledger schema and dispatch semantics.
- `docs/tasks/*.md`, `docs/tasks/on-hold/`, `docs/tasks/active/`,
  `docs/tasks/review/`, and `docs/tasks/done/` carry task lifecycle records.
- `docs/tasks/**` carries open review-finding remediation records;
  `REVIEW_FINDINGS.md` is a tombstone for pre-migration links.
- `REVIEW.md` defines review expectations.
- `Makefile` builds pinned tools, userspace binaries, manifests, ISO images,
  QEMU targets, formatting checks, generated-code checks, and policy checks.
- `rust-toolchain.toml` declares the Rust nightly channel, required targets,
  and `rust-src`; it does not pin an exact nightly by date or commit.
- `.cargo/config.toml` sets the default bare-metal target and useful cargo
  aliases.

## Schema and Shared ABIs

- `docs/abi-evolution-policy.md` defines compatibility classes, schema ordinal
  rules, ring-layout rules, version negotiation, and deprecation windows for
  externally visible ABI changes.
- `schema/capos.capnp` defines capability interfaces, manifest structures,
  exceptions, ProcessSpawner, ProcessHandle, and transfer-related schema.
- `capos-abi/src/lib.rs` defines small no_std ABI/policy constants shared by
  crates that should not depend on schema/config internals, including process
  quotas and credential policy limits.
- `capos-config/src/manifest.rs` defines the host and no_std manifest model.
- `capos-config/src/ring.rs` defines `CapRingHeader`, SQE/CQE structures,
  opcodes, flags, and transport error constants shared by kernel and userspace.
- `capos-config/src/capset.rs` defines the read-only bootstrap CapSet ABI.
- `capos-config/src/cue.rs` supports evaluated CUE-style manifest data.
- `capos-config/src/credential_policy.rs` re-exports credential policy limits;
  full PHC parsing is enabled by the `credential-validation` feature for
  bootstrap validators that need credential checks.
- `capos-config/tests/ring_loom.rs` models bounded ring protocol behavior with
  Loom.

Validation: `cargo test-config`, `cargo test-ring-loom`,
`make generated-code-check`.

## Shared Pure Logic

- `capos-lib/src/elf.rs` parses ELF64 images for kernel loading and host tests.
- `capos-lib/src/cap_table.rs` implements `CapId`, capability-table storage,
  stale-generation checks, grant preparation, transfer transaction helpers,
  commit, rollback, and the CapTable quota constants sourced from `capos-abi`.
- `capos-lib/src/frame_bitmap.rs` implements the host-testable physical frame
  bitmap core.
- `capos-lib/src/frame_ledger.rs` contains a bounded frame-grant helper kept
  for host-test coverage; current MemoryObject accounting charges
  `CapTable::ResourceLedger`.
- `capos-lib/src/lazy_buffer.rs` provides bounded lazy buffers used by ring
  scratch paths.
- `capos-lib/src/iso9660.rs` is the pure ISO 9660 primary-volume-descriptor
  and directory-record parser the kernel boot-ISO driver (`kernel/src/iso/`)
  delegates to; fuzz target `iso9660_volume`.
- `capos-lib/src/storage_format.rs` holds the pure `CAPOSRO1` (`rofs`),
  `CAPOSST1` (`disk_store`), and `CAPOSWF1` (`writable_fs`) mount parsers the
  kernel storage cap backers delegate to, including the shared record-layout
  constants the kernel writers reuse; fuzz targets `storage_rofs_mount`,
  `storage_disk_store_mount`, `storage_writable_fs_mount`.

Validation: `cargo test-lib`, `cargo miri-lib`, `make kani-lib`, fuzz targets
under `fuzz/fuzz_targets/`.

## Kernel

- `kernel/src/main.rs` is the boot entry point, hardware setup sequence,
  manifest parsing path, and boot-launched service creation path. `run_init`
  resolves PID 1 from the kernel-embedded `boot::INIT_ELF` when
  `initConfig.init.binary == capos_config::RESERVED_INIT_BINARY_NAME` (`"init"`)
  and otherwise from `SystemManifest.binaries`; for the embedded case it also
  injects the embedded image into the `ProcessSpawner` binary set under the
  reserved name so child spawns of `init` resolve.
- `kernel/src/boot.rs` exposes `boot::INIT_ELF: &[u8]`, the PID 1 init image
  packaged at build time. `kernel/build.rs` reads the prebuilt `init/` artifact
  (`CAPOS_INIT_ELF`, with a conventional-path fallback) and generates the
  `include_bytes!` static; `init/` stays a standalone crate (byte packaging, not
  linker merging).
- `kernel/src/spawn.rs` loads user ELF images, creates process state, maps
  bootstrap pages, and enqueues spawned processes.
- `kernel/src/process.rs` defines `Process`, `Thread`, `ThreadState`,
  per-thread kernel stacks, park waiter storage, and userspace CPU context.
- `kernel/src/sched.rs` implements the single-CPU scheduler, timer-driven
  preemption, blocking `cap_enter`, direct IPC handoff, ParkSpace wait/wake,
  and deferred cancellation wakeups.
- `kernel/src/serial.rs` implements COM1/COM2 UART setup, manifest-driven
  console-vs-terminal routing, and kernel print macros.
- `kernel/src/pci.rs` implements early PCI config-space access through legacy
  I/O ports and ACPI MCFG/PCIe ECAM, with QEMU diagnostics for the current
  virtio-net and Q35 discovery paths, plus reusable memory-BAR subregion
  validation, kernel MMIO mapping helpers for in-kernel drivers, and
  MSI/MSI-X capability metadata discovery plus typed MSI-X table programming.
- `kernel/src/device_interrupt.rs` records the current kernel-owned
  virtio-net MSI-X config/RX/TX sources, their generation ids, route state,
  in-kernel driver owner, lock-free bounded device MSI vector-pool dispatch
  slots, and claimed-route reassignment/release without exposing userspace
  interrupt authority.
- `kernel/src/device_dma.rs` holds the kernel-owned, fixed-size DMA pool
  accounting ledgers. The net-keyed `VIRTIO_NET_DMA_POOL` backs virtio-net's
  `DmaPage` path; a focused single-queue `VIRTIO_BLK_DMA_POOL` (reusing the
  shared `ActivePage`/`QueueAccount` types, same generation-checked handle and
  scrub-before-free invariants) backs the virtio-blk request buffer. Each
  device's `VirtqueueDma` seam impl delegates to its own pool's keyed API.
- `kernel/src/dma_backend.rs` (always compiled) records the boot-time IOMMU
  probe verdict and resolves the fail-closed DMA backend selection (direct
  IOMMU remapping only with a verified probe, else kernel-owned bounce
  buffers) per the "Cloud DMA Backend" contract in
  `docs/dma-isolation-design.md`, emitting the boot proof line.
- `kernel/src/device_manager/` holds bounded in-kernel PCI device ownership
  records. The full DDF surface (device records, DMA pools/buffers, MSI-X
  interrupts, NVMe brokered controller registers, IOMMU domain ledgers,
  virtio ring publication, proofs) compiles only under `cfg(feature = "qemu")`
  in `qemu_full.rs`; the MMIO-only surface used by `cap::device_mmio` exists
  in both builds, dispatching to `stub.rs` (one-slot parked-region
  `DeviceMmio` record) in the production non-`qemu` build.
- `kernel/src/nvme_storage_backend.rs` (`cfg(not(feature = "qemu"))`) is the
  fail-closed activation gate for the always-built NVMe `BlockDevice` read
  arm: modeled on `dma_backend`, it resolves a production handle only when a
  brokered controller was discovered and a live `device_mmio` grant is
  staged, otherwise the `block_device` grant fails closed with a typed error.
- `kernel/src/virtio_transport.rs` (always compiled) is the device-agnostic
  virtio modern-PCI transport host surface: capability/region discovery
  constants and bounded volatile MMIO accessors usable outside the
  `qemu`-gated legacy virtio path.
- `kernel/src/virtio.rs` (`cfg(qemu)`) holds the legacy in-kernel virtio
  transport, now a qemu-only fixture: the non-`qemu` production build compiles
  `kernel/src/virtio_stub.rs` instead, whose typed negative results keep stale
  or fixture-only kernel networking call sites failing closed. It includes
  the virtqueue drivers used by the IOMMU remapping proof. Its
  `pub(crate) mod transport` is the device-generic layer: split-ring/common-config
  constants, the `MmioRegion` accessor, the `VirtqueueDescriptorTracker`, the
  `VirtqueueDma` DMA/notify seam, the seam-driven `Virtqueue`/`DmaPage` with their
  poll/submit/complete loop and the multi-descriptor `submit_request_chain`, and
  the device-id-parameterized `discover_modern_transport`. virtio-net is one seam
  caller (`VirtioNetDma`); virtio-blk is a second (`VirtioBlkDma` + `VirtioBlkDriver`,
  `diagnose_virtio_blk_transport`, the `block_device_*` request API behind the
  `BlockDevice` cap). Net-specific provider/proof methods stay in the parent module
  as `impl Virtqueue<VirtioNetDma>`.
- `kernel/src/iommu.rs` (`cfg(qemu)`) programs the Intel VT-d legacy-mode
  remapping tables, drives the hardware-DMA translation/fault proof, and runs
  the register-based invalidation revocation cycle.
- `kernel/src/iso/` (`cfg(boot_iso_read)` / `cfg(boot_iso)` / `cfg(qemu)`) is the
  boot-time ISO reader for the Boot Binary ISO Layout track. `AtapiDevice` (gate 1)
  locates the legacy IDE ATAPI device and exposes a bounded
  `read_sectors(lba, count, buf)` over polled-PIO `READ(12)` packet commands
  with range/length validation. `IsoFs` (gate 2) is a read-only ISO 9660 driver
  layered on it: it parses the primary volume descriptor, walks directory
  records, and serves `open_file(name) -> (lba, size)` under `/boot/bins/`,
  validating every directory record and derived extent against the volume size
  before use (fail-closed `BadVolume`/`NotFound`/`NotDirectory`).
  `boot_read_proof()` reads the PVD (`CD001`) and `boot_fs_proof()` walks to
  `/boot/bins/PAYLOAD.BIN` and verifies its content, both behind `boot_iso_read`
  as the `make run-boot-iso-read` proof. The `boot_source` submodule (gate 4,
  `cfg(boot_iso)`) builds a validated `(name, lba, size)` registry from every
  declared manifest binary name (mapping each name to the ISO 9660 d-character
  form, e.g. `capos-shell` -> `/boot/bins/CAPOS_SHELL`) and reads ELF bytes on
  demand behind a device mutex; `run_init` and `ProcessSpawnerCap` consume it so
  the `boot_iso` kernel loads binaries from the ISO instead of embedded
  `NamedBlob.data`. Proofs: `make run-boot-iso` and the default `make
  run-smoke`. Under `cfg(qemu)` the always-on `AtapiDevice`/`IsoFs` surface
  (plus a qemu-gated `block_size()`/`list_boot_bins()` enumeration helper) also
  backs the read-only install-source fixture cap
  (`kernel/src/cap/installable_image.rs`).

Validation: `cargo build --features qemu`, `make run-smoke`, `make run-spawn`,
`make run-net`, `make run-iommu-remapping`.

## Kernel Architecture

- `kernel/src/arch/x86_64/gdt.rs` sets up kernel/user segments and TSS state.
- `kernel/src/arch/x86_64/idt.rs` handles exceptions and timer interrupts;
  CPL3 #PF/#GP/#UD/#DB/#BP faults terminate the whole owning process through
  `sched::exit_current_thread_terminating_process` (deferred whole-process
  termination when sibling threads are live; proof `make run-user-fault`),
  while CPL0 faults still halt the machine.
- `kernel/src/arch/x86_64/syscall.rs` implements syscall MSR setup and entry.
- `kernel/src/arch/x86_64/context.rs` defines timer context-switch state.
- `kernel/src/arch/x86_64/pic.rs` and `pit.rs` configure legacy interrupt
  hardware.
- `kernel/src/arch/x86_64/ioapic.rs` maps MADT I/O APICs and programs masked
  legacy IRQ routes from interrupt-source overrides.
- `kernel/src/arch/x86_64/lapic.rs` programs the xAPIC LAPIC timer and IPIs.
- `kernel/src/arch/x86_64/smap.rs` enables SMEP/SMAP and brackets user memory
  access.
- `kernel/src/arch/x86_64/tls.rs` handles FS-base/TLS support.
- `kernel/src/arch/x86_64/pci_config.rs` provides legacy PCI config I/O used
  by the higher-level PCI module alongside its ECAM backend.
- `kernel/src/arch/x86_64/percpu.rs`, `smp.rs`, and `tlb.rs` provide per-CPU
  data, AP startup, and TLB shootdown for the SMP scheduler.

## Kernel Memory

- `kernel/src/mem/frame.rs` wraps the shared frame bitmap with Limine memory
  map initialization and global kernel access.
- `kernel/src/mem/paging.rs` manages page tables, address spaces, permissions,
  user mappings, W^X enforcement, and address-space teardown.
- `kernel/src/mem/heap.rs` initializes the kernel heap.
- `kernel/src/mem/validate.rs` validates user buffers before kernel access.

Related docs: [DMA Isolation](dma-isolation-design.md),
[Trusted Build Inputs](trusted-build-inputs.md).

## Kernel Capabilities

- `kernel/src/cap/mod.rs` initializes kernel capabilities and builds the
  first service's kernel-sourced bootstrap capability table.
- `kernel/src/cap/table.rs` re-exports shared capability-table logic and owns
  the kernel-global table.
- `kernel/src/cap/ring.rs` validates and dispatches ring SQEs.
- `kernel/src/cap/transfer.rs` validates transfer descriptors and prepares
  transfer transactions.
- `kernel/src/cap/endpoint.rs` implements Endpoint CALL, RECV, RETURN, queued
  state, cleanup, and cancellation behavior.
- `kernel/src/cap/console.rs` implements serial Console.
- `kernel/src/cap/terminal_session.rs` implements the session-scoped
  TerminalSession line-oriented terminal with bounded `readLine`, echo modes,
  and cancellation.
- `kernel/src/cap/boot_package.rs` implements the read-only BootPackage
  manifest-size/chunked-read capability.
- `kernel/src/cap/manual.rs` implements the read-only Manual capability: it
  parses the boot-packaged `ManualCorpus` blob (embedded as the `manual-corpus`
  named binary) and answers `page`/`apropos`/`topics`/`section`/`describe`/
  `buildInfo`.
- `kernel/src/cap/log.rs` implements the Phase 1 monitoring log surface:
  `LogSink` (`write`) and `LogReader` (`read`) over a shared bounded,
  drop-oldest kernel recent-record ring. The sink drops records below the
  boot-seeded `SystemConfig.logLevel` threshold and forwards accepted records
  to serial; the reader returns records at/after a cursor with `LogFilter`
  (`minLevel`/`componentPrefix`), `nextCursor`, and `dropped`
  (`docs/proposals/system-monitoring-proposal.md`).
- `kernel/src/cap/block_device.rs` implements the `BlockDevice`
  `CapObject` (`readBlocks`/`writeBlocks`/`info`/`flush`). In the non-`qemu`
  production build the `block_device` source resolves to the
  userspace-brokered NVMe arm (`BlockDeviceBackend::NvmeBrokered`, gated by
  `kernel/src/nvme_storage_backend.rs`); the `qemu` build routes bounded
  inline-`Data` sector I/O to the kernel-owned virtio-blk driver in
  `kernel/src/virtio.rs` as a named fixture, not production storage (proof
  `make run-virtio-blk`). The cap is scoped to one
  `device_index`: the `block_device` source reaches the resolved non-target
  boot/storage disk, and `block_device_target`
  (`KernelCapSource.blockDeviceTarget @44`) reaches the manifest-selected PCI
  identity when it names a bound non-boot virtio-blk disk. A cap for one disk
  grants no authority over another. The kernel binds up to
  `device_dma::MAX_VIRTIO_BLK_DEVICES` (currently 2) virtio-blk devices, each
  with an independent driver/DMA-pool/interrupt-route instance
  (`VirtioBlkDriver<const DEV>` / `VirtioBlkDma<const DEV>` over
  `VIRTIO_BLK_DMA_POOLS[DEV]`); `kernel/src/pci.rs` enumerates each device with a
  device index (proof `make run-multi-virtio-blk`). Target grants fail closed
  when the selector is absent, mismatched, or names the resolved boot disk.
  Counts are bounded to one bounce-buffer page.
- `kernel/src/cap/readonly_fs.rs` implements the read-only filesystem service:
  `ReadOnlyFsDirectoryCap` / `ReadOnlyFsFileCap` parse a fixed `CAPOSRO1`
  on-disk layout read through the kernel-owned virtio-blk driver and serve
  `Directory.list`/`open` + `File.read`/`stat`; every mutating method fails
  closed. Granted via the `read_only_fs_root` `KernelCapSource` (returns a root
  `Directory` cap; qemu-gated, mounts at grant resolution and fails closed on a
  malformed/absent image). Host image builder `tools/mkstore-image --readonly-fs`;
  proof `make run-storage-fs`.
- `kernel/src/cap/persistent_store.rs` implements the disk-backed persistent
  `Store`: `DiskStoreCap` serves the `Store` interface (`put`/`get`/`has`/
  `delete`) over a fixed `CAPOSST1` on-disk layout read and written through a
  read+write `BlockSource` seam. `put` bump-allocates a data extent, writes the
  blob and entry record, then rewrites the superblock last as the durability
  commit point; `delete` tombstones the entry slot, and a later space-exhausting
  `put` compacts live entries through a shadow generation before recommitting the
  canonical front generation; the mount validates the superblock and every entry
  extent in-bounds and fails closed on a malformed image. The `Virtio`
  `BlockSource` (qemu kernel) routes to the kernel-owned virtio-blk driver
  byte-identically (folding in the `data_region_base_lba()` offset) and mounts
  eagerly at grant resolution; the
  `Nvme` `BlockSource` (built under `cloud_persistent_store_over_nvme_proof`)
  reads/writes through a granted NVMe `BlockDevice` window op and defers its
  mount-parse to the first `Store` call. Granted via the `persistent_store`
  `KernelCapSource` (virtio arm qemu-gated; the third NVMe-proof arm resolves the
  live `device_mmio` handle). Host image builder `tools/mkstore-image`; reboot
  proof `make run-storage-persist` (two QEMU passes on one disk image); NVMe
  put-then-get proof `make run-cloud-provider-persistent-store-over-nvme` via
  `kernel/src/cap/persistent_store_over_nvme_proof.rs`.
- `kernel/src/cap/writable_fs.rs` implements the disk-backed writable filesystem
  service: `WritableDirectoryCap` serves `list`/`open`/`mkdir`/`remove`/`rename`/
  `create` and `WritableFileCap` serves `read`/`write`/`stat`/`truncate`/`sync`/
  `close` over a fixed `CAPOSWF1` on-disk layout (a flat node-record array with
  parent pointers + a bump-allocated data region) written through a `BlockSource`
  seam. The RAM tree is the working copy; each mutation write-through-commits in
  the order data sector → node-record sector → superblock. A filesystem-wide
  fail-closed single-writer policy admits one writer at a time. The `Virtio`
  `BlockSource` (qemu/installable kernels) routes to the kernel-owned virtio-blk
  driver byte-identically (folding the `data_region_base_lba()` offset) and mounts
  the singleton eagerly; the `Nvme` `BlockSource` (built under
  `cloud_writable_fs_over_nvme_proof`) reads/writes through a granted NVMe
  `BlockDevice` window op and defers the singleton mount-parse to the first
  `Directory`/`File` call. Granted via the `writable_fs_root` `KernelCapSource`
  (virtio arm qemu-gated; the third NVMe-proof arm resolves the live `device_mmio`
  handle), which mounts the process-wide singleton volume once and hands each
  grant a distinct writer id; fails closed on a malformed image. The NVMe
  write-then-read durability proof (`make run-cloud-provider-writable-fs-over-nvme`
  via `kernel/src/cap/writable_fs_over_nvme_proof.rs`, which supersedes and drops
  the persistent-store-over-NVMe proof) exercises both `BlockDevice` arms with the
  single-writer policy intact. The combined image
  builder `tools/mkstore-image --writable` co-locates the `CAPOSST1` `Store`
  sub-volume (LBA 0) and the `CAPOSWF1` filesystem sub-volume on one disk; reboot
  proof `make run-storage-writable` (two QEMU passes: mutate then verify both the
  filesystem and the store survive). A slot becomes live on the next mount only
  once the superblock's bumped `node_count` is observed, so a poweroff in the
  record-written / superblock-pending window leaves an orphan slot the mount
  ignores. The proof-only `storage_writable_recovery` feature arms an induced
  forced poweroff in exactly that window (`recovery_crash_after_record`); bounded
  recovery proof `make run-storage-writable-recovery` (pass 1 commits then is
  `kill -9`d mid-allocation, pass 2 verifies recovery to a consistent tree with
  the interrupted allocation atomically absent). The same crash window is proven
  over the NVMe `BlockDevice` arm by `make run-cloud-provider-writable-fs-over-nvme-recovery`
  via `kernel/src/cap/writable_fs_over_nvme_recovery_proof.rs` (a recovery cap-waiter
  clone that implies and supersedes the happy-path proof module/route/init); the
  `cloud_writable_fs_over_nvme_recovery_proof` feature widens the
  `storage_writable_recovery` crash-window cfg gate, and the host-built NVMe image
  (`tools/mkstore-image --writable-nvme`, empty superblock + root-only node table)
  is booted twice with `-device nvme` (no `@20` seed). `writable_fs::mount_config_root`
  (qemu-gated) scopes a writable `Directory` to the `system/config` subtree for
  the boot-time data-region grant below.
- `kernel/src/cap/installable_image.rs` implements the read-only
  install-source fixture (Installable System track item 5b):
  `InstallableImageDirectoryCap` serves `list`/`open` and `InstallableImageFileCap`
  serves `read`/`stat`/`close` over the booted CD-ROM ISO 9660 `/boot/bins/` tree,
  reading through the `kernel/src/iso/` `boot_iso` ATAPI/ISO 9660 driver behind a
  single shared-device mutex (so PIO does not interleave across CPUs). Every
  mutating method fails closed; a past-EOF read clamps to empty and an absent name
  is rejected, reusing the driver's `validate_extent`/`read_sectors` range checks.
  Granted via the qemu-gated `installable_image_source` `KernelCapSource` (mounts
  the ATAPI volume and validates `/boot/bins/` at grant resolution, failing the
  spawn closed on an absent/malformed medium). Physically scoped to the ATAPI
  CD-ROM, so it cannot reach the writable virtio-blk target disk
  (`block_device_target`/`writable_fs_root`). Consumer demo
  `demos/installable-image-source/`; manifest `system-installable-image-source.cue`;
  proof `make run-installable-image-source`.
- `demos/installable-system-install/` implements `capos-system-install`, the
  Installable System install flow (track item 6): under the read-only
  `installable_image_source` `Directory` and the target-scoped
  `block_device_target` `BlockDevice` selected by manifest PCI identity, it
  copies the packaged bootable boot-region head (`BOOTHEAD.BIN`) to LBA 0,
  writes the backup GPT (`BOOTGPT.BIN`) at the LBA read from the primary GPT
  header, and initializes an empty data region (`DATAIMG.BIN`,
  `tools/mkstore-image --writable --empty-config`) at the fixed
  `cap::data_region_base_lba`, validating ranges and verifying the read-back. It
  reads packaged files in 32 KiB windows (under the read-path reply scratch
  bound; see `docs/tasks/done/2026-06-05/storage-file-read-reply-scratch-clamp.md`)
  and zero-skips the FAT free space.
  `tools/split-boot-region.py` splits the mkdiskimage boot image into the head +
  backup GPT so only the populated prefix is packaged. Pass-1 installer manifest
  `system-installable-install.cue`; pass-2 installed manifest (baked into the
  boot region) `system-installable-install-target.cue`; harness
  `tools/qemu-installable-install-smoke.sh`; proof `make run-installable-install`
  (pass 1 installs into a second virtio-blk disk, pass 2 boots it standalone).
- `kernel/src/cap/mod.rs` `grant_data_region` (proof-only `installable_data_region`
  feature) is the Installable System boot-time data-region mount: `run_init`
  best-effort grants init a `system/config` `Directory` (`data-config`) plus the
  persistent `Store` (`data-store`) over the auto-attached data disk, failing
  closed wholesale to the base manifest (caps unchanged, "no data region; base
  floor" diagnostic) when the disk is absent, malformed, or missing
  `system/config`. No new cap type or schema change. Proof
  `make run-installable-data-region` (seeded disk prints resolved contents; no
  disk and zeroed-superblock disk hit the base floor).
- Installable System config-overlay compose/merge (track item 3): the
  `SystemConfigOverlay` capnp object + `SystemManifest.extensionPoints`
  (`ManifestExtensionPoints`) live in `schema/capos.capnp`; the typed decode,
  content-hash check, and `compose_onto` precedence (base-pins-win /
  overlay-adds-within-declared-extension-points / no-new-authority) live in
  `capos-config/src/manifest.rs`. `init/src/main.rs` `apply_config_overlay`
  reads `system/config/overlay.bin` from the granted `data-config` `Directory`,
  composes the overlay over the base plan, and falls closed to the base floor
  with `[init] overlay rejected: <reason>`. The `tools/mkmanifest` `mkoverlay`
  bin encodes overlays (filling the canonical hash) and
  `tools/mkstore-image --writable --seed-overlay` seeds them. Proof
  `make run-installable-overlay`.
- Installable System generations + rollback + failed-boot auto-fallback (track
  item 4): userspace-only over the already-granted Store + writable
  `system/config` `Directory`, no schema or kernel change. `init/src/main.rs`
  `run_generation_rollback_checks` (gated by a base service named
  `generation-proof`) represents system-config generations as content-addressed
  `Store` objects keyed by SHA-256, tracks the known-good `active` pointer and a
  staged/attempting `candidate` pointer as monotonic-epoch marker files
  (`gen-active`/`gen-candidate`) in the writable config region, records a boot
  attempt durably before applying a candidate, auto-falls-back to the known-good
  generation when a candidate is left unconfirmed (the brick-proofing
  guarantee), promotes a confirmed candidate, rolls config back to a retained
  prior generation, and rejects a stale/replayed (lower-or-equal-epoch) pointer.
  A present-but-undecodable `gen-candidate` marker (the torn size-0 file a
  poweroff inside the CREATE|TRUNCATE rewrite window leaves, or garbage bytes)
  is discarded with a loud diagnostic and boot falls back to the known-good
  generation, while a corrupt `gen-active` marker takes a distinct loud FATAL
  refuse-to-boot path (the known-good generation is genuinely unknown).
  Manifest `system-installable-generation.cue`; proof
  `make run-installable-generation` boots a `--seed-config` disk three times
  (boot 1 exercises the mechanism and leaves an unconfirmed candidate; boot 2
  proves across-reboot auto-fallback to the known-good generation, then leaves
  a torn size-0 candidate marker; boot 3 proves torn-marker recovery).
- Installable System integrated bootable disk (track item 5, proof-only
  `installable_disk` feature, implies `installable_data_region`): one disk
  carries the boot ESP (GPT partition 1) and the co-located `CAPOSST1` Store +
  `CAPOSWF1` writable data region (GPT partition 2). `kernel/src/cap/mod.rs`
  `data_region_base_lba` returns the fixed partition-2 base LBA (264192) under
  the feature (0 otherwise), applied at the single `persistent_store`/
  `writable_fs` `read_range`/`write_range` choke points so the kernel reads the
  region at that fixed tool/kernel-contract LBA without parsing the GPT.
  `tools/mkdiskimage.sh` `--data-image`/`--data-offset-bytes` fold the
  `tools/mkstore-image --writable` image into partition 2 and derive the ESP
  size from `--esp-sectors` (integrated disk uses the same 128 MiB ESP as the raw
  disk-image targets so a debug kernel fits). Manifest `system-installable-disk.cue`; proof
  `make run-installable-disk` boots one virtio-blk disk and asserts the data
  region mounts from the boot disk and a data-region-only overlay service runs.
- `kernel/src/cap/frame_alloc.rs` implements FrameAllocator and MemoryObject.
- `kernel/src/cap/virtual_memory.rs` implements per-process anonymous memory
  operations.
- `kernel/src/cap/timer.rs` implements monotonic `now` and bounded `sleep`.
- `kernel/src/cap/wall_clock.rs` implements the read-only `WallClock.wallTime`
  cap: UTC over a fixed boot base layered on the monotonic timebase, reporting
  the fail-closed `untrusted` `ClockProvenance` (Phase 1 fixed-boot-base
  variant; `docs/proposals/time-and-clock-proposal.md`).
- `kernel/src/cap/park_space.rs` implements the process-local ParkSpace
  marker capability used by compact park (`CAP_OP_PARK`/`CAP_OP_UNPARK`) opcodes.
- `kernel/src/cap/network.rs` implements the qemu-only NetworkManager,
  TcpListener, TcpSocket, and UdpSocket fixture caps.
  The kernel no longer depends on `smoltcp`; non-`qemu` manifests reject the
  kernel `network_manager` / `tcp_listen_authority` grant sources (fail
  closed), and the production socket path is the Phase C userspace
  network-stack process. The socket-backed `SocketTerminalSession` shim is
  retired: `TcpSocket.intoTerminalSession` fails closed in every dispatch
  path.
- `kernel/src/cap/process_spawner.rs` implements ProcessSpawner and
  ProcessHandle.
- `kernel/src/cap/provider_cap_waiter_proof.rs` (non-`qemu`,
  `cloud_provider_cap_waiter_proof` Cargo feature) stages a
  fully-programmed-route bootstrap `Interrupt` grant source and the
  `InterruptCapWaiterProof` cap whose `Interrupt.wait` injects one
  `device_interrupt::handle_lapic_delivery` dispatch and whose
  `Interrupt.acknowledge` retires the deferred LAPIC EOI; the cap's
  `on_release` runs the masked-no-wake + reassign + stale-handle assertion
  chain before emitting `cloudboot-evidence: provider-cap-waiter <token>`.
  Mutually exclusive with `cap::interrupt_grant_source_prod` (default
  cloudboot path) and skips `cap::provider_nic_bind_proof` /
  `cap::storage_bind_proof` to keep the bound route live for the userspace
  cap-waiter handoff. Proof: `make run-cloud-provider-cap-waiter`.
- `kernel/src/cap/virtio_net_device_bringup_proof.rs` (non-`qemu`,
  `cloud_virtio_net_device_bringup_proof` Cargo feature; mutually
  exclusive with `cloud_provider_cap_waiter_proof` and the userspace
  selected-write handshake proof) drives the bounded
  virtio status sequence kernel-side over the picked virtio-net PCI
  function (vendor `0x1af4`, device `0x1000` / `0x1041`): resolves the
  modern virtio PCI transport regions through
  `virtio_transport::parse_modern_pci_transport_capabilities`, maps the
  common configuration window through `pci::map_bar_region`, and drives
  reset → ACKNOWLEDGE → DRIVER → feature discovery + driver-feature
  selection (`VIRTIO_F_VERSION_1` only) → FEATURES_OK → DRIVER_OK with a
  trailing reset on every exit path. Inline assertions gate the headline
  `cloudboot-evidence: virtio-net-device-bringup <token>` on the
  negotiated feature set, `COMMON_NUM_QUEUES >= 2`, DRIVER_OK
  observation, and the final reset returning `device_status` to 0.
  Marker carries `queue_setup=not-attempted`,
  `tx_descriptor=not-published`, `userspace_cap=not-issued`,
  `msix_function_enable=not-toggled`,
  `device_autonomous_raise=not-attempted`, `live_cloud=not-attempted`.
  Proof: `make run-cloud-provider-virtio-net-bringup`.
- `cloud_virtio_net_userspace_features_ok_proof` (non-`qemu`; proof
  `make run-cloud-prod-nic-driver-userspace-features-ok`) is Phase C slice 1
  of the userspace NIC relocation track. It makes
  `cap::devicemmio_grant_source_prod` stage the picked virtio-net modern
  common-config window as a selected-write `DeviceMmio` cap with
  `registerWrite=selected-write-common-config-handshake`; the userspace smoke
  drives reset -> ACKNOWLEDGE -> DRIVER -> FEATURES_OK over `DeviceMmio.write32`
  and proves queue-address writes remain fail-closed. It is mutually exclusive
  with the kernel-owned virtio-net bringup, bundle, and queue-materialization
  proof chain over the same BDF/grant path, and with the
  `cloud_nvme_readonly_bind_proof` descendant chain because both stage a
  proof-specific production `DeviceMmio` grant source.
- `kernel/src/cap/virtio_net_tx_authority_bundle_proof.rs`,
  `kernel/src/cap/virtio_net_tx_queue_materialization_proof.rs`, and
  `kernel/src/cap/virtio_net_msix_function_enable_proof.rs` are the
  decomposed userspace-TX track. Each is non-`qemu` and gated by its own
  focused-proof Cargo feature
  (`cloud_virtio_net_tx_authority_bundle_proof`,
  `cloud_virtio_net_tx_queue_materialization_proof`, and
  `cloud_virtio_net_msix_function_enable_proof` respectively; the last
  implies the second so the bundle observer + production grant-source
  pickers + userspace bundle smoke stay compiled in across the chain).
  The bundle proof observes the three production grant sources
  (`devicemmio_grant_source_prod`, `dmapool_grant_source_prod`,
  `interrupt_grant_source_prod`) issuing one cap each into the spawned
  userspace bundle smoke and asserts same-BDF; the queue-materialization
  proof drives the kernel-side modern-virtio status sequence through
  `DRIVER_OK` and materializes one manager-owned TX virtqueue from three
  zeroed brokered frames, asserting register read-backs and post-reset
  clearance; the MSI-X function-enable proof extends that sequence with
  one canonical mask-first PCI MSI-X function-level enable (set
  `FUNCTION_MASK`, then `ENABLE`, then clear both) plus best-effort
  cleanup on every exit path. Each child emits its own headline marker
  (`cloudboot-evidence: virtio-net-tx-authority-bundle <token>`,
  `cloudboot-evidence: virtio-net-tx-queue-materialization <token>`, and
  `cloudboot-evidence: virtio-net-msix-function-enable <token>`); when
  the later feature is active the earlier markers are intentionally
  suppressed because their discipline labels would be inaccurate.
  Proofs: `make run-cloud-provider-virtio-net-tx-authority-bundle`,
  `make run-cloud-provider-virtio-net-tx-queue-materialization`,
  `make run-cloud-provider-virtio-net-msix-function-enable`.
- `kernel/src/cap/virtio_net_userspace_rx_dma_proof.rs` (Phase C slice 4a-ii,
  gated `cloud_virtio_net_userspace_rx_bringup_proof`) drives the first real RX
  DMA from the **shim-owned** vring: `post_rx_descriptor` writes the RX descriptor
  + avail over the shim's retained RX vring physes at `DMABuffer.submitDescriptor`
  time, and `drive_rx_dma` (reached from the now-live
  `provider_notify_doorbell_write_for_cap`) rings the RX doorbell, submits a
  kernel-half SLIRP TX ARP stimulus over the retained TX physes, polls one real
  device->host completion, and resets the device (clearing the retained `enabled`
  flags to release the ring-buffer pins). Self-contained byte-level vring helpers
  are duplicated from `virtio_net_polled_provider` to protect `run-net`. The
  notify region is mapped kernel-side + the per-queue notify slot offsets captured
  by `cap::devicemmio_grant_source_prod` (`rx_dma_notify_state`). Proof
  `make run-cloud-prod-nic-driver-userspace-rx-bringup` (extended).
- `kernel/src/cap/null.rs` implements the measurement-only NullCap.
- `kernel/src/cap/park_bench.rs` implements the measurement-only ParkBench
  authority used by `make run-measure`.

Related docs: [Capability Model](capability-model.md),
[Authority Accounting](authority-accounting-transfer-design.md).

## Userspace

- `init/` is the standalone init process. In the spawn smoke, it uses
  ProcessSpawner, grants initial child capabilities, waits on ProcessHandles,
  and checks hostile spawn inputs.
- `capos-rt/src/entry.rs` owns the runtime entry path and bootstrap validation.
- `capos-rt/src/alloc.rs` initializes the userspace heap.
- `capos-rt/src/syscall.rs` provides raw syscall wrappers.
- `capos-rt/src/capset.rs` provides typed CapSet lookup helpers.
- `capos-rt/src/ring.rs` implements the safe single-owner ring client,
  out-of-order completion handling, transfer descriptor packing, and result-cap
  parsing.
- `capos-rt/src/client.rs` implements typed clients for Console,
  TerminalSession, BootPackage, ProcessSpawner, ProcessHandle, and Timer. The
  client-side methods are generic over `Transport`; result-cap-adopting methods
  stay on the concrete `RuntimeRingClient`.
- `capos-rt/src/transport.rs` defines the `Transport` seam (the client-side
  `CALL`/completion/`RELEASE` ring operations) and the in-system `RingTransport`
  (`RingClient` viewed through the seam). A host remote transport is a later
  slice; see `docs/backlog/capos-sdk-dual-transport.md`.
- `capos/` is the front-door SDK facade crate: for the default `ring` feature it
  re-exports the `capos-rt` runtime, typed clients, the `entry_point!` macro,
  and a `prelude`. The `remote` feature is reserved. Standalone, like
  `capos-rt`.
- `capos-rt/src/pollselect.rs` is the pure POSIX `poll`/`select` bridge:
  `SocketReadiness` -> `poll` `revents` (`POLLIN`/`POLLOUT`/`POLLHUP`/`POLLERR`/
  `POLLNVAL`) and `select` set membership, plus `unsupported_request_bits` for
  fail-closed flag handling. Shared by the `libcapos-posix` C surface and the
  `posix-socket-poll-select-smoke` proof. Proof: `make run-posix-socket-poll-select`.
- `capos-rt/src/panic.rs` provides the emergency Console panic output path.
- `capos-rt/src/bin/smoke.rs` is the runtime smoke binary used by focused
  runtime proofs rather than the default boot manifest.
- `capos-service/src/lib.rs` is the standalone `no_std` service lifecycle layer
  above `capos-rt`; slice 1 exposes `ServiceMain`, `ServiceRuntime`, and ordered
  initialize/dependency-wait/ready/run/drain/shutdown/cleanup phases.
- `shell/src/main.rs` is the native capability shell, built as the standalone
  `capos-shell` crate and packaged by `system.cue`, `system-shell.cue`, and
  the focused login manifests.

Validation: `make capos-rt-check`, `make run-smoke`, `make run-spawn`,
`make run-shell`, `make run-terminal`. The former Telnet fixture is retired
with the qemu-only kernel TCP listener.

## Standalone C and WASI Substrates

These are standalone crates (not workspace members) built by the Makefile.

- `libcapos/` builds `libcapos.a`, a `no_std` Rust staticlib exposing the
  capos-rt syscall/ring/CapSet path and typed `Console`/`Timer`/
  `EntropySource`/`VirtualMemory` wrappers plus C heap shims to C consumers.
  Public header at `libcapos/include/capos/capos.h`. No POSIX surface.
- `libcapos-posix/` builds `libcapos_posix.a`, a `no_std` Rust staticlib
  layering a POSIX adapter over `libcapos`: per-process fd table, errno cell,
  historical UDP socket wrappers over the retired qemu-only kernel `UdpSocket`
  cap, clock over `Timer`, `pipe`/`dup`
  over `Pipe`, `poll`/`select` (`poll.rs`, `<poll.h>`/`<sys/select.h>`) over the
  `capos-rt::pollselect` readiness bridge with fail-closed unsupported-flag /
  `EBADF` / `EINVAL` handling, and `fork`/`execve`/`waitpid` via the recording-shim
  ProcessSpawner Move-grant path, plus the libc surface the dash port needs:
  stdio/string/stdlib/ctype helpers, `strerror`/`qsort`/`umask`/`abort`/
  `strtoll`/`strpbrk`/`lstat`/`getgroups`/`wait3`/`vfork`, byte-order helpers
  (`inet.rs`), `getrlimit`/`setrlimit` (`resource.rs`), `setlocale`
  (`locale.rs`), `times` + `tcgetattr`, C-locale `wchar`/`wctype` multibyte
  (`wchar.rs`), the `environ` pointer, and the `sys_siglist` array. C headers
  (the namespaced source of truth) under `libcapos-posix/include/capos/posix/`
  -- including the dash-needed `sys/types.h`, `termios.h`, `sys/resource.h`,
  `sys/times.h`, `wchar.h`, `wctype.h`, `locale.h`, `inttypes.h`, and the
  decl-only `sys/ioctl.h`/`sys/mman.h`/`arpa/inet.h`/`getopt.h`/`paths.h`/
  `sys/param.h`. `libcapos-posix/sysroot/include/` is the `-nostdinc`
  bare-header sysroot (`<stdio.h>`, `<unistd.h>`, `<sys/stat.h>`, …) whose
  wrappers forward into that namespace; mirrored C ports (dash) build against
  it via the Makefile's `CAPOS_C_SYSROOT_INCLUDE` flags on the
  `capos-c-multitu-elf` rule. Focused sysroot proof `make run-c-libc-surface`.
- `capos-wasm/` is the `no_std` WASI host adapter: a `wasmi`-backed `Runtime`,
  the `wasm-host` userspace binary, the Preview 1 import resolver, and the
  manifest-supplied wasm payload reader.
- `vendor/wasmi-no_std/` and `vendor/dns-c-wahern/` are static-pinned,
  no-patches upstream snapshots consumed by `capos-wasm/` and the POSIX DNS
  smoke; do not patch them in place (refresh procedure in each
  `VENDORED_FROM.md`).
- `vendor/dash/` is the mirror-as-is dash `0.5.13.4` snapshot (`src/` stays
  byte-identical; capOS deviations live under `patches/`). Its capOS build
  pipeline lives outside the mirror under `vendor/dash/capos/`: the pinned
  `config.h` and `gen-tables.sh` (stages a patched source copy + runs the six
  host table generators). The Makefile `dash` target builds `target/dash/dash.elf`
  through `capos-c-multitu-elf` against `libcapos.a` + `libcapos_posix.a`.

Validation: `make run-c-hello`, `make run-posix-pipe-smoke`,
`make run-posix-printf`, `make run-wasm-host`, `make run-wasi-hello-rust`,
`make run-wasi-random`. The former POSIX DNS smoke is retired with the
qemu-only kernel `UdpSocket` owner.

## Demo Services

`demos/` is a nested userspace smoke-test workspace. Each demo is a release-built
service binary packaged into the boot manifest:

- `adventure-client`, `adventure-server`, `adventure-npc-shopkeeper`,
  `adventure-npc-wanderer`
- `capos-chat`, `chat-bot`, `chat-client`, `chat-server`
- `capset-bootstrap`, `console-paths`, `credential-store`
- `endpoint-queue-limit-smoke`, `endpoint-roundtrip`, `ipc-server`,
  `ipc-client`, `in-flight-call-limit-smoke`
- `frame-allocator-cleanup`, `memoryobject-shared-child`,
  `memoryobject-shared-parent`
- `paperclips`, `paperclips-content`
- `revocable-read`, `revocation-observer`
- `ring-corruption`, `ring-reserved-opcodes`, `ring-nop`, `ring-fairness`
- `service-common`, `shell-spawn-test`, `shell-typed-call`
- `terminal-session`, `terminal-stranger`
- `timer-smoke`, `timer-flood`
- `tls-smoke`, `unprivileged-stranger`, `virtual-memory`
- `user-fault-parent`, `user-fault-victim` (user fault containment proof,
  `make run-user-fault`)

Shared demo support lives in `demos/capos-demo-support/src/lib.rs` and uses
`capos-rt` for entry, allocator, syscall, CapSet, and panic support while
keeping raw ring helpers for low-level transport smokes.

Validation: `make run-spawn`.

## Manifest and Tooling

- `system.cue` is the default init-owned boot manifest source. It imports the
  shared defaults package, boot-launches standalone `init`, and lets init start
  the shell, remote-session CapSet gateway, and resident services.
- `system-spawn.cue` is the ProcessSpawner smoke manifest source.
- `system-smoke.cue` is the scripted focused shell-led login/shell smoke
  manifest source.
- `system-chat.cue`, `system-adventure.cue`, and `system-paperclips.cue` are
  focused resident-service and terminal-demo manifest sources.
- `system-memoryobject-shared.cue`, `system-revocable-read.cue`, and
  `system-measure.cue` are focused regression/measurement manifest sources.
- `system-shell.cue` is the focused anonymous-shell manifest source (no
  verifier, shell stays anonymous).
- `system-terminal.cue` is the focused TerminalSession proof manifest source.
- `system-credential.cue` is the focused CredentialStore proof manifest source.
- `system-login.cue` is the focused password-login proof manifest source.
- `system-login-setup.cue` is the focused first-boot setup proof manifest source.
- `tools/mkmanifest/` evaluates manifest input, embeds binaries, validates
  manifest shape, writes boot-manifest Cap'n Proto bytes, and provides
  `cue-to-capnp` for schema-aware CUE-authored data-message conversion. Its
  sibling `mkoverlay` bin encodes a `SystemConfigOverlay` from CUE into the
  `system/config/overlay.bin` bytes (filling the canonical content hash) for the
  installable-system config-overlay proof.
- `tools/manualc/` is the System Manual corpus compiler: it parses
  `schema/capos.capnp` for section-2 interface pages, reads the authored
  man corpus under `docs/manual/man<section>/*.man`, and emits the
  boot-packaged `ManualCorpus` blob. It fails the build if any in-tree
  capability interface lacks a section-2 page (i.e. a schema doc comment).
- `docs/manual/` holds the authored man-shaped corpus consumed by `manualc`
  (section 1 shell-command pages and section 7 concept pages); section-2
  capability pages are generated from the schema, not stored here.
- `system-manual-smoke.cue` is the focused Manual proof manifest source.
- `tools/agent-session-recaps/` contains private-session recap and raw-archive
  tooling for the agentic development experiment. The tools are tracked here;
  raw transcripts and generated recap stores stay outside the repo unless
  explicitly redacted and reviewed.
- `tools/check-generated-capnp.sh` verifies checked-in generated schema output.
- `scripts/record_worklog.py` emits per-task commit spans (from each task's
  `commits:` list, falling back to task-file history) for the development
  timeline/Gantt; `scripts/validate_backfill_tasks.py` validates backfilled
  task-file frontmatter against the chunk's real SHAs; `scripts/check-md-links.py`
  is the pre-commit broken-relative-link gate over all `.md`.
- `tools/githooks/` is the repo `core.hooksPath` (enabled with `make hooks`):
  `prepare-commit-msg` stamps provenance trailers (`Plan-Item`/`Run-Id`/
  `Agent-Kind`) onto run-driven commits, alongside the git-lfs hooks.
- `tools/qemu-net-harness.sh` runs the current QEMU net harness, with
  `tools/qemu-net-smoke.sh` asserting virtio-net transport, MSI-X metadata
  selection, kernel-owned MSI-X vector-pool allocation/programming, masked
  route-lifecycle proof, queue vector assignment, descriptor guards, ARP, and
  ICMP fixture lines.
- `fuzz/` contains fuzz targets for manifest Cap'n Proto decoding (with the
  production reader-options envelope), mkmanifest JSON conversion/validation,
  ELF parsing, Telnet IAC filtering, terminal line discipline, ring SQE wire
  validation, ISO 9660 PVD/directory-record parsing, the `CAPOSRO1`/
  `CAPOSST1`/`CAPOSWF1` storage mount parsers, and the `capos-tls` X.509
  validity walk.

Validation: `cargo test-mkmanifest`, `make generated-code-check`,
`make fuzz-build`, `make fuzz-smoke`.

## Documentation

- `docs/capability-model.md` is the current capability architecture reference.
- `docs/architecture/threading.md` and `docs/architecture/park.md` record the
  accepted contracts and first implementation for in-process thread ownership
  and private ParkSpace authority.
- `docs/*-design.md` files record targeted implemented or accepted designs.
- `docs/proposals/` contains accepted, future, exploratory, and rejected
  designs.
- `docs/research/` summarizes prior art (the `capability-systems-survey.md`
  synthesis plus per-system deep-dive reports).
- `docs/proposals/mdbook-docs-site-proposal.md` defines the documentation site
  structure and status vocabulary used by the orientation pages.
