Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Repository Map

This map names the main source locations for the current system. It is not an ownership file; use it to find the code behind architecture and validation claims.

Root Files

  • README.md gives the compact project overview.
  • docs/roadmap.md records long-range stages and broad feature direction.
  • docs/tasks/state.toml records the current selected milestone.
  • docs/tasks/README.md defines the task-ledger schema and dispatch semantics.
  • docs/tasks/*.md, docs/tasks/on-hold/, docs/tasks/active/, docs/tasks/review/, and docs/tasks/done/ carry task lifecycle records.
  • docs/tasks/** carries open review-finding remediation records; REVIEW_FINDINGS.md is a tombstone for pre-migration links.
  • REVIEW.md defines review expectations.
  • Makefile builds pinned tools, userspace binaries, manifests, ISO images, QEMU targets, formatting checks, generated-code checks, and policy checks.
  • rust-toolchain.toml declares the Rust nightly channel, required targets, and rust-src; it does not pin an exact nightly by date or commit.
  • .cargo/config.toml sets the default bare-metal target and useful cargo aliases.

Schema and Shared ABIs

  • docs/abi-evolution-policy.md defines compatibility classes, schema ordinal rules, ring-layout rules, version negotiation, and deprecation windows for externally visible ABI changes.
  • schema/capos.capnp defines capability interfaces, manifest structures, exceptions, ProcessSpawner, ProcessHandle, and transfer-related schema.
  • capos-abi/src/lib.rs defines small no_std ABI/policy constants shared by crates that should not depend on schema/config internals, including process quotas and credential policy limits.
  • capos-config/src/manifest.rs defines the host and no_std manifest model.
  • capos-config/src/ring.rs defines CapRingHeader, SQE/CQE structures, opcodes, flags, and transport error constants shared by kernel and userspace.
  • capos-config/src/capset.rs defines the read-only bootstrap CapSet ABI.
  • capos-config/src/cue.rs supports evaluated CUE-style manifest data.
  • capos-config/src/credential_policy.rs re-exports credential policy limits; full PHC parsing is enabled by the credential-validation feature for bootstrap validators that need credential checks.
  • capos-config/tests/ring_loom.rs models bounded ring protocol behavior with Loom.

Validation: cargo test-config, cargo test-ring-loom, make generated-code-check.

Shared Pure Logic

  • capos-lib/src/elf.rs parses ELF64 images for kernel loading and host tests.
  • capos-lib/src/cap_table.rs implements CapId, capability-table storage, stale-generation checks, grant preparation, transfer transaction helpers, commit, rollback, and the CapTable quota constants sourced from capos-abi.
  • capos-lib/src/frame_bitmap.rs implements the host-testable physical frame bitmap core.
  • capos-lib/src/frame_ledger.rs contains a bounded frame-grant helper kept for host-test coverage; current MemoryObject accounting charges CapTable::ResourceLedger.
  • capos-lib/src/lazy_buffer.rs provides bounded lazy buffers used by ring scratch paths.
  • capos-lib/src/iso9660.rs is the pure ISO 9660 primary-volume-descriptor and directory-record parser the kernel boot-ISO driver (kernel/src/iso/) delegates to; fuzz target iso9660_volume.
  • capos-lib/src/storage_format.rs holds the pure CAPOSRO1 (rofs), CAPOSST1 (disk_store), and CAPOSWF1 (writable_fs) mount parsers the kernel storage cap backers delegate to, including the shared record-layout constants the kernel writers reuse; fuzz targets storage_rofs_mount, storage_disk_store_mount, storage_writable_fs_mount.

Validation: cargo test-lib, cargo miri-lib, make kani-lib, fuzz targets under fuzz/fuzz_targets/.

Kernel

  • kernel/src/main.rs is the boot entry point, hardware setup sequence, manifest parsing path, and boot-launched service creation path. run_init resolves PID 1 from the kernel-embedded boot::INIT_ELF when initConfig.init.binary == capos_config::RESERVED_INIT_BINARY_NAME ("init") and otherwise from SystemManifest.binaries; for the embedded case it also injects the embedded image into the ProcessSpawner binary set under the reserved name so child spawns of init resolve.
  • kernel/src/boot.rs exposes boot::INIT_ELF: &[u8], the PID 1 init image packaged at build time. kernel/build.rs reads the prebuilt init/ artifact (CAPOS_INIT_ELF, with a conventional-path fallback) and generates the include_bytes! static; init/ stays a standalone crate (byte packaging, not linker merging).
  • kernel/src/spawn.rs loads user ELF images, creates process state, maps bootstrap pages, and enqueues spawned processes.
  • kernel/src/process.rs defines Process, Thread, ThreadState, per-thread kernel stacks, park waiter storage, and userspace CPU context.
  • kernel/src/sched.rs implements the single-CPU scheduler, timer-driven preemption, blocking cap_enter, direct IPC handoff, ParkSpace wait/wake, and deferred cancellation wakeups.
  • kernel/src/serial.rs implements COM1/COM2 UART setup, manifest-driven console-vs-terminal routing, and kernel print macros.
  • kernel/src/pci.rs implements early PCI config-space access through legacy I/O ports and ACPI MCFG/PCIe ECAM, with QEMU diagnostics for the current virtio-net and Q35 discovery paths, plus reusable memory-BAR subregion validation, kernel MMIO mapping helpers for in-kernel drivers, and MSI/MSI-X capability metadata discovery plus typed MSI-X table programming.
  • kernel/src/device_interrupt.rs records the current kernel-owned virtio-net MSI-X config/RX/TX sources, their generation ids, route state, in-kernel driver owner, lock-free bounded device MSI vector-pool dispatch slots, and claimed-route reassignment/release without exposing userspace interrupt authority.
  • kernel/src/device_dma.rs holds the kernel-owned, fixed-size DMA pool accounting ledgers. The net-keyed VIRTIO_NET_DMA_POOL backs virtio-net’s DmaPage path; a focused single-queue VIRTIO_BLK_DMA_POOL (reusing the shared ActivePage/QueueAccount types, same generation-checked handle and scrub-before-free invariants) backs the virtio-blk request buffer. Each device’s VirtqueueDma seam impl delegates to its own pool’s keyed API.
  • kernel/src/dma_backend.rs (always compiled) records the boot-time IOMMU probe verdict and resolves the fail-closed DMA backend selection (direct IOMMU remapping only with a verified probe, else kernel-owned bounce buffers) per the “Cloud DMA Backend” contract in docs/dma-isolation-design.md, emitting the boot proof line.
  • kernel/src/device_manager/ holds bounded in-kernel PCI device ownership records. The full DDF surface (device records, DMA pools/buffers, MSI-X interrupts, NVMe brokered controller registers, IOMMU domain ledgers, virtio ring publication, proofs) compiles only under cfg(feature = "qemu") in qemu_full.rs; the MMIO-only surface used by cap::device_mmio exists in both builds, dispatching to stub.rs (one-slot parked-region DeviceMmio record) in the production non-qemu build.
  • kernel/src/nvme_storage_backend.rs (cfg(not(feature = "qemu"))) is the fail-closed activation gate for the always-built NVMe BlockDevice read arm: modeled on dma_backend, it resolves a production handle only when a brokered controller was discovered and a live device_mmio grant is staged, otherwise the block_device grant fails closed with a typed error.
  • kernel/src/virtio_transport.rs (always compiled) is the device-agnostic virtio modern-PCI transport host surface: capability/region discovery constants and bounded volatile MMIO accessors usable outside the qemu-gated legacy virtio path.
  • kernel/src/virtio.rs (cfg(qemu)) holds the legacy in-kernel virtio transport, now a qemu-only fixture: the non-qemu production build compiles kernel/src/virtio_stub.rs instead, whose typed negative results keep stale or fixture-only kernel networking call sites failing closed. It includes the virtqueue drivers used by the IOMMU remapping proof. Its pub(crate) mod transport is the device-generic layer: split-ring/common-config constants, the MmioRegion accessor, the VirtqueueDescriptorTracker, the VirtqueueDma DMA/notify seam, the seam-driven Virtqueue/DmaPage with their poll/submit/complete loop and the multi-descriptor submit_request_chain, and the device-id-parameterized discover_modern_transport. virtio-net is one seam caller (VirtioNetDma); virtio-blk is a second (VirtioBlkDma + VirtioBlkDriver, diagnose_virtio_blk_transport, the block_device_* request API behind the BlockDevice cap). Net-specific provider/proof methods stay in the parent module as impl Virtqueue<VirtioNetDma>.
  • kernel/src/iommu.rs (cfg(qemu)) programs the Intel VT-d legacy-mode remapping tables, drives the hardware-DMA translation/fault proof, and runs the register-based invalidation revocation cycle.
  • kernel/src/iso/ (cfg(boot_iso_read) / cfg(boot_iso) / cfg(qemu)) is the boot-time ISO reader for the Boot Binary ISO Layout track. AtapiDevice (gate 1) locates the legacy IDE ATAPI device and exposes a bounded read_sectors(lba, count, buf) over polled-PIO READ(12) packet commands with range/length validation. IsoFs (gate 2) is a read-only ISO 9660 driver layered on it: it parses the primary volume descriptor, walks directory records, and serves open_file(name) -> (lba, size) under /boot/bins/, validating every directory record and derived extent against the volume size before use (fail-closed BadVolume/NotFound/NotDirectory). boot_read_proof() reads the PVD (CD001) and boot_fs_proof() walks to /boot/bins/PAYLOAD.BIN and verifies its content, both behind boot_iso_read as the make run-boot-iso-read proof. The boot_source submodule (gate 4, cfg(boot_iso)) builds a validated (name, lba, size) registry from every declared manifest binary name (mapping each name to the ISO 9660 d-character form, e.g. capos-shell -> /boot/bins/CAPOS_SHELL) and reads ELF bytes on demand behind a device mutex; run_init and ProcessSpawnerCap consume it so the boot_iso kernel loads binaries from the ISO instead of embedded NamedBlob.data. Proofs: make run-boot-iso and the default make run-smoke. Under cfg(qemu) the always-on AtapiDevice/IsoFs surface (plus a qemu-gated block_size()/list_boot_bins() enumeration helper) also backs the read-only install-source fixture cap (kernel/src/cap/installable_image.rs).

Validation: cargo build --features qemu, make run-smoke, make run-spawn, make run-net, make run-iommu-remapping.

Kernel Architecture

  • kernel/src/arch/x86_64/gdt.rs sets up kernel/user segments and TSS state.
  • kernel/src/arch/x86_64/idt.rs handles exceptions and timer interrupts; CPL3 #PF/#GP/#UD/#DB/#BP faults terminate the whole owning process through sched::exit_current_thread_terminating_process (deferred whole-process termination when sibling threads are live; proof make run-user-fault), while CPL0 faults still halt the machine.
  • kernel/src/arch/x86_64/syscall.rs implements syscall MSR setup and entry.
  • kernel/src/arch/x86_64/context.rs defines timer context-switch state.
  • kernel/src/arch/x86_64/pic.rs and pit.rs configure legacy interrupt hardware.
  • kernel/src/arch/x86_64/ioapic.rs maps MADT I/O APICs and programs masked legacy IRQ routes from interrupt-source overrides.
  • kernel/src/arch/x86_64/lapic.rs programs the xAPIC LAPIC timer and IPIs.
  • kernel/src/arch/x86_64/smap.rs enables SMEP/SMAP and brackets user memory access.
  • kernel/src/arch/x86_64/tls.rs handles FS-base/TLS support.
  • kernel/src/arch/x86_64/pci_config.rs provides legacy PCI config I/O used by the higher-level PCI module alongside its ECAM backend.
  • kernel/src/arch/x86_64/percpu.rs, smp.rs, and tlb.rs provide per-CPU data, AP startup, and TLB shootdown for the SMP scheduler.

Kernel Memory

  • kernel/src/mem/frame.rs wraps the shared frame bitmap with Limine memory map initialization and global kernel access.
  • kernel/src/mem/paging.rs manages page tables, address spaces, permissions, user mappings, W^X enforcement, and address-space teardown.
  • kernel/src/mem/heap.rs initializes the kernel heap.
  • kernel/src/mem/validate.rs validates user buffers before kernel access.

Related docs: DMA Isolation, Trusted Build Inputs.

Kernel Capabilities

  • kernel/src/cap/mod.rs initializes kernel capabilities and builds the first service’s kernel-sourced bootstrap capability table.
  • kernel/src/cap/table.rs re-exports shared capability-table logic and owns the kernel-global table.
  • kernel/src/cap/ring.rs validates and dispatches ring SQEs.
  • kernel/src/cap/transfer.rs validates transfer descriptors and prepares transfer transactions.
  • kernel/src/cap/endpoint.rs implements Endpoint CALL, RECV, RETURN, queued state, cleanup, and cancellation behavior.
  • kernel/src/cap/console.rs implements serial Console.
  • kernel/src/cap/terminal_session.rs implements the session-scoped TerminalSession line-oriented terminal with bounded readLine, echo modes, and cancellation.
  • kernel/src/cap/boot_package.rs implements the read-only BootPackage manifest-size/chunked-read capability.
  • kernel/src/cap/manual.rs implements the read-only Manual capability: it parses the boot-packaged ManualCorpus blob (embedded as the manual-corpus named binary) and answers page/apropos/topics/section/describe/ buildInfo.
  • kernel/src/cap/log.rs implements the Phase 1 monitoring log surface: LogSink (write) and LogReader (read) over a shared bounded, drop-oldest kernel recent-record ring. The sink drops records below the boot-seeded SystemConfig.logLevel threshold and forwards accepted records to serial; the reader returns records at/after a cursor with LogFilter (minLevel/componentPrefix), nextCursor, and dropped (docs/proposals/system-monitoring-proposal.md).
  • kernel/src/cap/block_device.rs implements the BlockDevice CapObject (readBlocks/writeBlocks/info/flush). In the non-qemu production build the block_device source resolves to the userspace-brokered NVMe arm (BlockDeviceBackend::NvmeBrokered, gated by kernel/src/nvme_storage_backend.rs); the qemu build routes bounded inline-Data sector I/O to the kernel-owned virtio-blk driver in kernel/src/virtio.rs as a named fixture, not production storage (proof make run-virtio-blk). The cap is scoped to one device_index: the block_device source reaches the resolved non-target boot/storage disk, and block_device_target (KernelCapSource.blockDeviceTarget @44) reaches the manifest-selected PCI identity when it names a bound non-boot virtio-blk disk. A cap for one disk grants no authority over another. The kernel binds up to device_dma::MAX_VIRTIO_BLK_DEVICES (currently 2) virtio-blk devices, each with an independent driver/DMA-pool/interrupt-route instance (VirtioBlkDriver<const DEV> / VirtioBlkDma<const DEV> over VIRTIO_BLK_DMA_POOLS[DEV]); kernel/src/pci.rs enumerates each device with a device index (proof make run-multi-virtio-blk). Target grants fail closed when the selector is absent, mismatched, or names the resolved boot disk. Counts are bounded to one bounce-buffer page.
  • kernel/src/cap/readonly_fs.rs implements the read-only filesystem service: ReadOnlyFsDirectoryCap / ReadOnlyFsFileCap parse a fixed CAPOSRO1 on-disk layout read through the kernel-owned virtio-blk driver and serve Directory.list/open + File.read/stat; every mutating method fails closed. Granted via the read_only_fs_root KernelCapSource (returns a root Directory cap; qemu-gated, mounts at grant resolution and fails closed on a malformed/absent image). Host image builder tools/mkstore-image --readonly-fs; proof make run-storage-fs.
  • kernel/src/cap/persistent_store.rs implements the disk-backed persistent Store: DiskStoreCap serves the Store interface (put/get/has/ delete) over a fixed CAPOSST1 on-disk layout read and written through a read+write BlockSource seam. put bump-allocates a data extent, writes the blob and entry record, then rewrites the superblock last as the durability commit point; delete tombstones the entry slot, and a later space-exhausting put compacts live entries through a shadow generation before recommitting the canonical front generation; the mount validates the superblock and every entry extent in-bounds and fails closed on a malformed image. The Virtio BlockSource (qemu kernel) routes to the kernel-owned virtio-blk driver byte-identically (folding in the data_region_base_lba() offset) and mounts eagerly at grant resolution; the Nvme BlockSource (built under cloud_persistent_store_over_nvme_proof) reads/writes through a granted NVMe BlockDevice window op and defers its mount-parse to the first Store call. Granted via the persistent_store KernelCapSource (virtio arm qemu-gated; the third NVMe-proof arm resolves the live device_mmio handle). Host image builder tools/mkstore-image; reboot proof make run-storage-persist (two QEMU passes on one disk image); NVMe put-then-get proof make run-cloud-provider-persistent-store-over-nvme via kernel/src/cap/persistent_store_over_nvme_proof.rs.
  • kernel/src/cap/writable_fs.rs implements the disk-backed writable filesystem service: WritableDirectoryCap serves list/open/mkdir/remove/rename/ create and WritableFileCap serves read/write/stat/truncate/sync/ close over a fixed CAPOSWF1 on-disk layout (a flat node-record array with parent pointers + a bump-allocated data region) written through a BlockSource seam. The RAM tree is the working copy; each mutation write-through-commits in the order data sector → node-record sector → superblock. A filesystem-wide fail-closed single-writer policy admits one writer at a time. The Virtio BlockSource (qemu/installable kernels) routes to the kernel-owned virtio-blk driver byte-identically (folding the data_region_base_lba() offset) and mounts the singleton eagerly; the Nvme BlockSource (built under cloud_writable_fs_over_nvme_proof) reads/writes through a granted NVMe BlockDevice window op and defers the singleton mount-parse to the first Directory/File call. Granted via the writable_fs_root KernelCapSource (virtio arm qemu-gated; the third NVMe-proof arm resolves the live device_mmio handle), which mounts the process-wide singleton volume once and hands each grant a distinct writer id; fails closed on a malformed image. The NVMe write-then-read durability proof (make run-cloud-provider-writable-fs-over-nvme via kernel/src/cap/writable_fs_over_nvme_proof.rs, which supersedes and drops the persistent-store-over-NVMe proof) exercises both BlockDevice arms with the single-writer policy intact. The combined image builder tools/mkstore-image --writable co-locates the CAPOSST1 Store sub-volume (LBA 0) and the CAPOSWF1 filesystem sub-volume on one disk; reboot proof make run-storage-writable (two QEMU passes: mutate then verify both the filesystem and the store survive). A slot becomes live on the next mount only once the superblock’s bumped node_count is observed, so a poweroff in the record-written / superblock-pending window leaves an orphan slot the mount ignores. The proof-only storage_writable_recovery feature arms an induced forced poweroff in exactly that window (recovery_crash_after_record); bounded recovery proof make run-storage-writable-recovery (pass 1 commits then is kill -9d mid-allocation, pass 2 verifies recovery to a consistent tree with the interrupted allocation atomically absent). The same crash window is proven over the NVMe BlockDevice arm by make run-cloud-provider-writable-fs-over-nvme-recovery via kernel/src/cap/writable_fs_over_nvme_recovery_proof.rs (a recovery cap-waiter clone that implies and supersedes the happy-path proof module/route/init); the cloud_writable_fs_over_nvme_recovery_proof feature widens the storage_writable_recovery crash-window cfg gate, and the host-built NVMe image (tools/mkstore-image --writable-nvme, empty superblock + root-only node table) is booted twice with -device nvme (no @20 seed). writable_fs::mount_config_root (qemu-gated) scopes a writable Directory to the system/config subtree for the boot-time data-region grant below.
  • kernel/src/cap/installable_image.rs implements the read-only install-source fixture (Installable System track item 5b): InstallableImageDirectoryCap serves list/open and InstallableImageFileCap serves read/stat/close over the booted CD-ROM ISO 9660 /boot/bins/ tree, reading through the kernel/src/iso/ boot_iso ATAPI/ISO 9660 driver behind a single shared-device mutex (so PIO does not interleave across CPUs). Every mutating method fails closed; a past-EOF read clamps to empty and an absent name is rejected, reusing the driver’s validate_extent/read_sectors range checks. Granted via the qemu-gated installable_image_source KernelCapSource (mounts the ATAPI volume and validates /boot/bins/ at grant resolution, failing the spawn closed on an absent/malformed medium). Physically scoped to the ATAPI CD-ROM, so it cannot reach the writable virtio-blk target disk (block_device_target/writable_fs_root). Consumer demo demos/installable-image-source/; manifest system-installable-image-source.cue; proof make run-installable-image-source.
  • demos/installable-system-install/ implements capos-system-install, the Installable System install flow (track item 6): under the read-only installable_image_source Directory and the target-scoped block_device_target BlockDevice selected by manifest PCI identity, it copies the packaged bootable boot-region head (BOOTHEAD.BIN) to LBA 0, writes the backup GPT (BOOTGPT.BIN) at the LBA read from the primary GPT header, and initializes an empty data region (DATAIMG.BIN, tools/mkstore-image --writable --empty-config) at the fixed cap::data_region_base_lba, validating ranges and verifying the read-back. It reads packaged files in 32 KiB windows (under the read-path reply scratch bound; see docs/tasks/done/2026-06-05/storage-file-read-reply-scratch-clamp.md) and zero-skips the FAT free space. tools/split-boot-region.py splits the mkdiskimage boot image into the head + backup GPT so only the populated prefix is packaged. Pass-1 installer manifest system-installable-install.cue; pass-2 installed manifest (baked into the boot region) system-installable-install-target.cue; harness tools/qemu-installable-install-smoke.sh; proof make run-installable-install (pass 1 installs into a second virtio-blk disk, pass 2 boots it standalone).
  • kernel/src/cap/mod.rs grant_data_region (proof-only installable_data_region feature) is the Installable System boot-time data-region mount: run_init best-effort grants init a system/config Directory (data-config) plus the persistent Store (data-store) over the auto-attached data disk, failing closed wholesale to the base manifest (caps unchanged, “no data region; base floor” diagnostic) when the disk is absent, malformed, or missing system/config. No new cap type or schema change. Proof make run-installable-data-region (seeded disk prints resolved contents; no disk and zeroed-superblock disk hit the base floor).
  • Installable System config-overlay compose/merge (track item 3): the SystemConfigOverlay capnp object + SystemManifest.extensionPoints (ManifestExtensionPoints) live in schema/capos.capnp; the typed decode, content-hash check, and compose_onto precedence (base-pins-win / overlay-adds-within-declared-extension-points / no-new-authority) live in capos-config/src/manifest.rs. init/src/main.rs apply_config_overlay reads system/config/overlay.bin from the granted data-config Directory, composes the overlay over the base plan, and falls closed to the base floor with [init] overlay rejected: <reason>. The tools/mkmanifest mkoverlay bin encodes overlays (filling the canonical hash) and tools/mkstore-image --writable --seed-overlay seeds them. Proof make run-installable-overlay.
  • Installable System generations + rollback + failed-boot auto-fallback (track item 4): userspace-only over the already-granted Store + writable system/config Directory, no schema or kernel change. init/src/main.rs run_generation_rollback_checks (gated by a base service named generation-proof) represents system-config generations as content-addressed Store objects keyed by SHA-256, tracks the known-good active pointer and a staged/attempting candidate pointer as monotonic-epoch marker files (gen-active/gen-candidate) in the writable config region, records a boot attempt durably before applying a candidate, auto-falls-back to the known-good generation when a candidate is left unconfirmed (the brick-proofing guarantee), promotes a confirmed candidate, rolls config back to a retained prior generation, and rejects a stale/replayed (lower-or-equal-epoch) pointer. A present-but-undecodable gen-candidate marker (the torn size-0 file a poweroff inside the CREATE|TRUNCATE rewrite window leaves, or garbage bytes) is discarded with a loud diagnostic and boot falls back to the known-good generation, while a corrupt gen-active marker takes a distinct loud FATAL refuse-to-boot path (the known-good generation is genuinely unknown). Manifest system-installable-generation.cue; proof make run-installable-generation boots a --seed-config disk three times (boot 1 exercises the mechanism and leaves an unconfirmed candidate; boot 2 proves across-reboot auto-fallback to the known-good generation, then leaves a torn size-0 candidate marker; boot 3 proves torn-marker recovery).
  • Installable System integrated bootable disk (track item 5, proof-only installable_disk feature, implies installable_data_region): one disk carries the boot ESP (GPT partition 1) and the co-located CAPOSST1 Store + CAPOSWF1 writable data region (GPT partition 2). kernel/src/cap/mod.rs data_region_base_lba returns the fixed partition-2 base LBA (264192) under the feature (0 otherwise), applied at the single persistent_store/ writable_fs read_range/write_range choke points so the kernel reads the region at that fixed tool/kernel-contract LBA without parsing the GPT. tools/mkdiskimage.sh --data-image/--data-offset-bytes fold the tools/mkstore-image --writable image into partition 2 and derive the ESP size from --esp-sectors (integrated disk uses the same 128 MiB ESP as the raw disk-image targets so a debug kernel fits). Manifest system-installable-disk.cue; proof make run-installable-disk boots one virtio-blk disk and asserts the data region mounts from the boot disk and a data-region-only overlay service runs.
  • kernel/src/cap/frame_alloc.rs implements FrameAllocator and MemoryObject.
  • kernel/src/cap/virtual_memory.rs implements per-process anonymous memory operations.
  • kernel/src/cap/timer.rs implements monotonic now and bounded sleep.
  • kernel/src/cap/wall_clock.rs implements the read-only WallClock.wallTime cap: UTC over a fixed boot base layered on the monotonic timebase, reporting the fail-closed untrusted ClockProvenance (Phase 1 fixed-boot-base variant; docs/proposals/time-and-clock-proposal.md).
  • kernel/src/cap/park_space.rs implements the process-local ParkSpace marker capability used by compact park (CAP_OP_PARK/CAP_OP_UNPARK) opcodes.
  • kernel/src/cap/network.rs implements the qemu-only NetworkManager, TcpListener, TcpSocket, and UdpSocket fixture caps. The kernel no longer depends on smoltcp; non-qemu manifests reject the kernel network_manager / tcp_listen_authority grant sources (fail closed), and the production socket path is the Phase C userspace network-stack process. The socket-backed SocketTerminalSession shim is retired: TcpSocket.intoTerminalSession fails closed in every dispatch path.
  • kernel/src/cap/process_spawner.rs implements ProcessSpawner and ProcessHandle.
  • kernel/src/cap/provider_cap_waiter_proof.rs (non-qemu, cloud_provider_cap_waiter_proof Cargo feature) stages a fully-programmed-route bootstrap Interrupt grant source and the InterruptCapWaiterProof cap whose Interrupt.wait injects one device_interrupt::handle_lapic_delivery dispatch and whose Interrupt.acknowledge retires the deferred LAPIC EOI; the cap’s on_release runs the masked-no-wake + reassign + stale-handle assertion chain before emitting cloudboot-evidence: provider-cap-waiter <token>. Mutually exclusive with cap::interrupt_grant_source_prod (default cloudboot path) and skips cap::provider_nic_bind_proof / cap::storage_bind_proof to keep the bound route live for the userspace cap-waiter handoff. Proof: make run-cloud-provider-cap-waiter.
  • kernel/src/cap/virtio_net_device_bringup_proof.rs (non-qemu, cloud_virtio_net_device_bringup_proof Cargo feature; mutually exclusive with cloud_provider_cap_waiter_proof and the userspace selected-write handshake proof) drives the bounded virtio status sequence kernel-side over the picked virtio-net PCI function (vendor 0x1af4, device 0x1000 / 0x1041): resolves the modern virtio PCI transport regions through virtio_transport::parse_modern_pci_transport_capabilities, maps the common configuration window through pci::map_bar_region, and drives reset → ACKNOWLEDGE → DRIVER → feature discovery + driver-feature selection (VIRTIO_F_VERSION_1 only) → FEATURES_OK → DRIVER_OK with a trailing reset on every exit path. Inline assertions gate the headline cloudboot-evidence: virtio-net-device-bringup <token> on the negotiated feature set, COMMON_NUM_QUEUES >= 2, DRIVER_OK observation, and the final reset returning device_status to 0. Marker carries queue_setup=not-attempted, tx_descriptor=not-published, userspace_cap=not-issued, msix_function_enable=not-toggled, device_autonomous_raise=not-attempted, live_cloud=not-attempted. Proof: make run-cloud-provider-virtio-net-bringup.
  • cloud_virtio_net_userspace_features_ok_proof (non-qemu; proof make run-cloud-prod-nic-driver-userspace-features-ok) is Phase C slice 1 of the userspace NIC relocation track. It makes cap::devicemmio_grant_source_prod stage the picked virtio-net modern common-config window as a selected-write DeviceMmio cap with registerWrite=selected-write-common-config-handshake; the userspace smoke drives reset -> ACKNOWLEDGE -> DRIVER -> FEATURES_OK over DeviceMmio.write32 and proves queue-address writes remain fail-closed. It is mutually exclusive with the kernel-owned virtio-net bringup, bundle, and queue-materialization proof chain over the same BDF/grant path, and with the cloud_nvme_readonly_bind_proof descendant chain because both stage a proof-specific production DeviceMmio grant source.
  • kernel/src/cap/virtio_net_tx_authority_bundle_proof.rs, kernel/src/cap/virtio_net_tx_queue_materialization_proof.rs, and kernel/src/cap/virtio_net_msix_function_enable_proof.rs are the decomposed userspace-TX track. Each is non-qemu and gated by its own focused-proof Cargo feature (cloud_virtio_net_tx_authority_bundle_proof, cloud_virtio_net_tx_queue_materialization_proof, and cloud_virtio_net_msix_function_enable_proof respectively; the last implies the second so the bundle observer + production grant-source pickers + userspace bundle smoke stay compiled in across the chain). The bundle proof observes the three production grant sources (devicemmio_grant_source_prod, dmapool_grant_source_prod, interrupt_grant_source_prod) issuing one cap each into the spawned userspace bundle smoke and asserts same-BDF; the queue-materialization proof drives the kernel-side modern-virtio status sequence through DRIVER_OK and materializes one manager-owned TX virtqueue from three zeroed brokered frames, asserting register read-backs and post-reset clearance; the MSI-X function-enable proof extends that sequence with one canonical mask-first PCI MSI-X function-level enable (set FUNCTION_MASK, then ENABLE, then clear both) plus best-effort cleanup on every exit path. Each child emits its own headline marker (cloudboot-evidence: virtio-net-tx-authority-bundle <token>, cloudboot-evidence: virtio-net-tx-queue-materialization <token>, and cloudboot-evidence: virtio-net-msix-function-enable <token>); when the later feature is active the earlier markers are intentionally suppressed because their discipline labels would be inaccurate. Proofs: make run-cloud-provider-virtio-net-tx-authority-bundle, make run-cloud-provider-virtio-net-tx-queue-materialization, make run-cloud-provider-virtio-net-msix-function-enable.
  • kernel/src/cap/virtio_net_userspace_rx_dma_proof.rs (Phase C slice 4a-ii, gated cloud_virtio_net_userspace_rx_bringup_proof) drives the first real RX DMA from the shim-owned vring: post_rx_descriptor writes the RX descriptor
    • avail over the shim’s retained RX vring physes at DMABuffer.submitDescriptor time, and drive_rx_dma (reached from the now-live provider_notify_doorbell_write_for_cap) rings the RX doorbell, submits a kernel-half SLIRP TX ARP stimulus over the retained TX physes, polls one real device->host completion, and resets the device (clearing the retained enabled flags to release the ring-buffer pins). Self-contained byte-level vring helpers are duplicated from virtio_net_polled_provider to protect run-net. The notify region is mapped kernel-side + the per-queue notify slot offsets captured by cap::devicemmio_grant_source_prod (rx_dma_notify_state). Proof make run-cloud-prod-nic-driver-userspace-rx-bringup (extended).
  • kernel/src/cap/null.rs implements the measurement-only NullCap.
  • kernel/src/cap/park_bench.rs implements the measurement-only ParkBench authority used by make run-measure.

Related docs: Capability Model, Authority Accounting.

Userspace

  • init/ is the standalone init process. In the spawn smoke, it uses ProcessSpawner, grants initial child capabilities, waits on ProcessHandles, and checks hostile spawn inputs.
  • capos-rt/src/entry.rs owns the runtime entry path and bootstrap validation.
  • capos-rt/src/alloc.rs initializes the userspace heap.
  • capos-rt/src/syscall.rs provides raw syscall wrappers.
  • capos-rt/src/capset.rs provides typed CapSet lookup helpers.
  • capos-rt/src/ring.rs implements the safe single-owner ring client, out-of-order completion handling, transfer descriptor packing, and result-cap parsing.
  • capos-rt/src/client.rs implements typed clients for Console, TerminalSession, BootPackage, ProcessSpawner, ProcessHandle, and Timer. The client-side methods are generic over Transport; result-cap-adopting methods stay on the concrete RuntimeRingClient.
  • capos-rt/src/transport.rs defines the Transport seam (the client-side CALL/completion/RELEASE ring operations) and the in-system RingTransport (RingClient viewed through the seam). A host remote transport is a later slice; see docs/backlog/capos-sdk-dual-transport.md.
  • capos/ is the front-door SDK facade crate: for the default ring feature it re-exports the capos-rt runtime, typed clients, the entry_point! macro, and a prelude. The remote feature is reserved. Standalone, like capos-rt.
  • capos-rt/src/pollselect.rs is the pure POSIX poll/select bridge: SocketReadiness -> poll revents (POLLIN/POLLOUT/POLLHUP/POLLERR/ POLLNVAL) and select set membership, plus unsupported_request_bits for fail-closed flag handling. Shared by the libcapos-posix C surface and the posix-socket-poll-select-smoke proof. Proof: make run-posix-socket-poll-select.
  • capos-rt/src/panic.rs provides the emergency Console panic output path.
  • capos-rt/src/bin/smoke.rs is the runtime smoke binary used by focused runtime proofs rather than the default boot manifest.
  • capos-service/src/lib.rs is the standalone no_std service lifecycle layer above capos-rt; slice 1 exposes ServiceMain, ServiceRuntime, and ordered initialize/dependency-wait/ready/run/drain/shutdown/cleanup phases.
  • shell/src/main.rs is the native capability shell, built as the standalone capos-shell crate and packaged by system.cue, system-shell.cue, and the focused login manifests.

Validation: make capos-rt-check, make run-smoke, make run-spawn, make run-shell, make run-terminal. The former Telnet fixture is retired with the qemu-only kernel TCP listener.

Standalone C and WASI Substrates

These are standalone crates (not workspace members) built by the Makefile.

  • libcapos/ builds libcapos.a, a no_std Rust staticlib exposing the capos-rt syscall/ring/CapSet path and typed Console/Timer/ EntropySource/VirtualMemory wrappers plus C heap shims to C consumers. Public header at libcapos/include/capos/capos.h. No POSIX surface.
  • libcapos-posix/ builds libcapos_posix.a, a no_std Rust staticlib layering a POSIX adapter over libcapos: per-process fd table, errno cell, historical UDP socket wrappers over the retired qemu-only kernel UdpSocket cap, clock over Timer, pipe/dup over Pipe, poll/select (poll.rs, <poll.h>/<sys/select.h>) over the capos-rt::pollselect readiness bridge with fail-closed unsupported-flag / EBADF / EINVAL handling, and fork/execve/waitpid via the recording-shim ProcessSpawner Move-grant path, plus the libc surface the dash port needs: stdio/string/stdlib/ctype helpers, strerror/qsort/umask/abort/ strtoll/strpbrk/lstat/getgroups/wait3/vfork, byte-order helpers (inet.rs), getrlimit/setrlimit (resource.rs), setlocale (locale.rs), times + tcgetattr, C-locale wchar/wctype multibyte (wchar.rs), the environ pointer, and the sys_siglist array. C headers (the namespaced source of truth) under libcapos-posix/include/capos/posix/ – including the dash-needed sys/types.h, termios.h, sys/resource.h, sys/times.h, wchar.h, wctype.h, locale.h, inttypes.h, and the decl-only sys/ioctl.h/sys/mman.h/arpa/inet.h/getopt.h/paths.h/ sys/param.h. libcapos-posix/sysroot/include/ is the -nostdinc bare-header sysroot (<stdio.h>, <unistd.h>, <sys/stat.h>, …) whose wrappers forward into that namespace; mirrored C ports (dash) build against it via the Makefile’s CAPOS_C_SYSROOT_INCLUDE flags on the capos-c-multitu-elf rule. Focused sysroot proof make run-c-libc-surface.
  • capos-wasm/ is the no_std WASI host adapter: a wasmi-backed Runtime, the wasm-host userspace binary, the Preview 1 import resolver, and the manifest-supplied wasm payload reader.
  • vendor/wasmi-no_std/ and vendor/dns-c-wahern/ are static-pinned, no-patches upstream snapshots consumed by capos-wasm/ and the POSIX DNS smoke; do not patch them in place (refresh procedure in each VENDORED_FROM.md).
  • vendor/dash/ is the mirror-as-is dash 0.5.13.4 snapshot (src/ stays byte-identical; capOS deviations live under patches/). Its capOS build pipeline lives outside the mirror under vendor/dash/capos/: the pinned config.h and gen-tables.sh (stages a patched source copy + runs the six host table generators). The Makefile dash target builds target/dash/dash.elf through capos-c-multitu-elf against libcapos.a + libcapos_posix.a.

Validation: make run-c-hello, make run-posix-pipe-smoke, make run-posix-printf, make run-wasm-host, make run-wasi-hello-rust, make run-wasi-random. The former POSIX DNS smoke is retired with the qemu-only kernel UdpSocket owner.

Demo Services

demos/ is a nested userspace smoke-test workspace. Each demo is a release-built service binary packaged into the boot manifest:

  • adventure-client, adventure-server, adventure-npc-shopkeeper, adventure-npc-wanderer
  • capos-chat, chat-bot, chat-client, chat-server
  • capset-bootstrap, console-paths, credential-store
  • endpoint-queue-limit-smoke, endpoint-roundtrip, ipc-server, ipc-client, in-flight-call-limit-smoke
  • frame-allocator-cleanup, memoryobject-shared-child, memoryobject-shared-parent
  • paperclips, paperclips-content
  • revocable-read, revocation-observer
  • ring-corruption, ring-reserved-opcodes, ring-nop, ring-fairness
  • service-common, shell-spawn-test, shell-typed-call
  • terminal-session, terminal-stranger
  • timer-smoke, timer-flood
  • tls-smoke, unprivileged-stranger, virtual-memory
  • user-fault-parent, user-fault-victim (user fault containment proof, make run-user-fault)

Shared demo support lives in demos/capos-demo-support/src/lib.rs and uses capos-rt for entry, allocator, syscall, CapSet, and panic support while keeping raw ring helpers for low-level transport smokes.

Validation: make run-spawn.

Manifest and Tooling

  • system.cue is the default init-owned boot manifest source. It imports the shared defaults package, boot-launches standalone init, and lets init start the shell, remote-session CapSet gateway, and resident services.
  • system-spawn.cue is the ProcessSpawner smoke manifest source.
  • system-smoke.cue is the scripted focused shell-led login/shell smoke manifest source.
  • system-chat.cue, system-adventure.cue, and system-paperclips.cue are focused resident-service and terminal-demo manifest sources.
  • system-memoryobject-shared.cue, system-revocable-read.cue, and system-measure.cue are focused regression/measurement manifest sources.
  • system-shell.cue is the focused anonymous-shell manifest source (no verifier, shell stays anonymous).
  • system-terminal.cue is the focused TerminalSession proof manifest source.
  • system-credential.cue is the focused CredentialStore proof manifest source.
  • system-login.cue is the focused password-login proof manifest source.
  • system-login-setup.cue is the focused first-boot setup proof manifest source.
  • tools/mkmanifest/ evaluates manifest input, embeds binaries, validates manifest shape, writes boot-manifest Cap’n Proto bytes, and provides cue-to-capnp for schema-aware CUE-authored data-message conversion. Its sibling mkoverlay bin encodes a SystemConfigOverlay from CUE into the system/config/overlay.bin bytes (filling the canonical content hash) for the installable-system config-overlay proof.
  • tools/manualc/ is the System Manual corpus compiler: it parses schema/capos.capnp for section-2 interface pages, reads the authored man corpus under docs/manual/man<section>/*.man, and emits the boot-packaged ManualCorpus blob. It fails the build if any in-tree capability interface lacks a section-2 page (i.e. a schema doc comment).
  • docs/manual/ holds the authored man-shaped corpus consumed by manualc (section 1 shell-command pages and section 7 concept pages); section-2 capability pages are generated from the schema, not stored here.
  • system-manual-smoke.cue is the focused Manual proof manifest source.
  • tools/agent-session-recaps/ contains private-session recap and raw-archive tooling for the agentic development experiment. The tools are tracked here; raw transcripts and generated recap stores stay outside the repo unless explicitly redacted and reviewed.
  • tools/check-generated-capnp.sh verifies checked-in generated schema output.
  • scripts/record_worklog.py emits per-task commit spans (from each task’s commits: list, falling back to task-file history) for the development timeline/Gantt; scripts/validate_backfill_tasks.py validates backfilled task-file frontmatter against the chunk’s real SHAs; scripts/check-md-links.py is the pre-commit broken-relative-link gate over all .md.
  • tools/githooks/ is the repo core.hooksPath (enabled with make hooks): prepare-commit-msg stamps provenance trailers (Plan-Item/Run-Id/ Agent-Kind) onto run-driven commits, alongside the git-lfs hooks.
  • tools/qemu-net-harness.sh runs the current QEMU net harness, with tools/qemu-net-smoke.sh asserting virtio-net transport, MSI-X metadata selection, kernel-owned MSI-X vector-pool allocation/programming, masked route-lifecycle proof, queue vector assignment, descriptor guards, ARP, and ICMP fixture lines.
  • fuzz/ contains fuzz targets for manifest Cap’n Proto decoding (with the production reader-options envelope), mkmanifest JSON conversion/validation, ELF parsing, Telnet IAC filtering, terminal line discipline, ring SQE wire validation, ISO 9660 PVD/directory-record parsing, the CAPOSRO1/ CAPOSST1/CAPOSWF1 storage mount parsers, and the capos-tls X.509 validity walk.

Validation: cargo test-mkmanifest, make generated-code-check, make fuzz-build, make fuzz-smoke.

Documentation

  • docs/capability-model.md is the current capability architecture reference.
  • docs/architecture/threading.md and docs/architecture/park.md record the accepted contracts and first implementation for in-process thread ownership and private ParkSpace authority.
  • docs/*-design.md files record targeted implemented or accepted designs.
  • docs/proposals/ contains accepted, future, exploratory, and rejected designs.
  • docs/research/ summarizes prior art (the capability-systems-survey.md synthesis plus per-system deep-dive reports).
  • docs/proposals/mdbook-docs-site-proposal.md defines the documentation site structure and status vocabulary used by the orientation pages.