Repository Map
This map names the main source locations for the current system. It is not an ownership file; use it to find the code behind architecture and validation claims.
Root Files
README.mdgives the compact project overview.docs/roadmap.mdrecords long-range stages and broad feature direction.docs/tasks/state.tomlrecords the current selected milestone.docs/tasks/README.mddefines the task-ledger schema and dispatch semantics.docs/tasks/*.md,docs/tasks/on-hold/,docs/tasks/active/,docs/tasks/review/, anddocs/tasks/done/carry task lifecycle records.docs/tasks/**carries open review-finding remediation records;REVIEW_FINDINGS.mdis a tombstone for pre-migration links.REVIEW.mddefines review expectations.Makefilebuilds pinned tools, userspace binaries, manifests, ISO images, QEMU targets, formatting checks, generated-code checks, and policy checks.rust-toolchain.tomldeclares the Rust nightly channel, required targets, andrust-src; it does not pin an exact nightly by date or commit..cargo/config.tomlsets the default bare-metal target and useful cargo aliases.
Schema and Shared ABIs
docs/abi-evolution-policy.mddefines compatibility classes, schema ordinal rules, ring-layout rules, version negotiation, and deprecation windows for externally visible ABI changes.schema/capos.capnpdefines capability interfaces, manifest structures, exceptions, ProcessSpawner, ProcessHandle, and transfer-related schema.capos-abi/src/lib.rsdefines small no_std ABI/policy constants shared by crates that should not depend on schema/config internals, including process quotas and credential policy limits.capos-config/src/manifest.rsdefines the host and no_std manifest model.capos-config/src/ring.rsdefinesCapRingHeader, SQE/CQE structures, opcodes, flags, and transport error constants shared by kernel and userspace.capos-config/src/capset.rsdefines the read-only bootstrap CapSet ABI.capos-config/src/cue.rssupports evaluated CUE-style manifest data.capos-config/src/credential_policy.rsre-exports credential policy limits; full PHC parsing is enabled by thecredential-validationfeature for bootstrap validators that need credential checks.capos-config/tests/ring_loom.rsmodels bounded ring protocol behavior with Loom.
Validation: cargo test-config, cargo test-ring-loom,
make generated-code-check.
Shared Pure Logic
capos-lib/src/elf.rsparses ELF64 images for kernel loading and host tests.capos-lib/src/cap_table.rsimplementsCapId, capability-table storage, stale-generation checks, grant preparation, transfer transaction helpers, commit, rollback, and the CapTable quota constants sourced fromcapos-abi.capos-lib/src/frame_bitmap.rsimplements the host-testable physical frame bitmap core.capos-lib/src/frame_ledger.rscontains a bounded frame-grant helper kept for host-test coverage; current MemoryObject accounting chargesCapTable::ResourceLedger.capos-lib/src/lazy_buffer.rsprovides bounded lazy buffers used by ring scratch paths.capos-lib/src/iso9660.rsis the pure ISO 9660 primary-volume-descriptor and directory-record parser the kernel boot-ISO driver (kernel/src/iso/) delegates to; fuzz targetiso9660_volume.capos-lib/src/storage_format.rsholds the pureCAPOSRO1(rofs),CAPOSST1(disk_store), andCAPOSWF1(writable_fs) mount parsers the kernel storage cap backers delegate to, including the shared record-layout constants the kernel writers reuse; fuzz targetsstorage_rofs_mount,storage_disk_store_mount,storage_writable_fs_mount.
Validation: cargo test-lib, cargo miri-lib, make kani-lib, fuzz targets
under fuzz/fuzz_targets/.
Kernel
kernel/src/main.rsis the boot entry point, hardware setup sequence, manifest parsing path, and boot-launched service creation path.run_initresolves PID 1 from the kernel-embeddedboot::INIT_ELFwheninitConfig.init.binary == capos_config::RESERVED_INIT_BINARY_NAME("init") and otherwise fromSystemManifest.binaries; for the embedded case it also injects the embedded image into theProcessSpawnerbinary set under the reserved name so child spawns ofinitresolve.kernel/src/boot.rsexposesboot::INIT_ELF: &[u8], the PID 1 init image packaged at build time.kernel/build.rsreads the prebuiltinit/artifact (CAPOS_INIT_ELF, with a conventional-path fallback) and generates theinclude_bytes!static;init/stays a standalone crate (byte packaging, not linker merging).kernel/src/spawn.rsloads user ELF images, creates process state, maps bootstrap pages, and enqueues spawned processes.kernel/src/process.rsdefinesProcess,Thread,ThreadState, per-thread kernel stacks, park waiter storage, and userspace CPU context.kernel/src/sched.rsimplements the single-CPU scheduler, timer-driven preemption, blockingcap_enter, direct IPC handoff, ParkSpace wait/wake, and deferred cancellation wakeups.kernel/src/serial.rsimplements COM1/COM2 UART setup, manifest-driven console-vs-terminal routing, and kernel print macros.kernel/src/pci.rsimplements early PCI config-space access through legacy I/O ports and ACPI MCFG/PCIe ECAM, with QEMU diagnostics for the current virtio-net and Q35 discovery paths, plus reusable memory-BAR subregion validation, kernel MMIO mapping helpers for in-kernel drivers, and MSI/MSI-X capability metadata discovery plus typed MSI-X table programming.kernel/src/device_interrupt.rsrecords the current kernel-owned virtio-net MSI-X config/RX/TX sources, their generation ids, route state, in-kernel driver owner, lock-free bounded device MSI vector-pool dispatch slots, and claimed-route reassignment/release without exposing userspace interrupt authority.kernel/src/device_dma.rsholds the kernel-owned, fixed-size DMA pool accounting ledgers. The net-keyedVIRTIO_NET_DMA_POOLbacks virtio-net’sDmaPagepath; a focused single-queueVIRTIO_BLK_DMA_POOL(reusing the sharedActivePage/QueueAccounttypes, same generation-checked handle and scrub-before-free invariants) backs the virtio-blk request buffer. Each device’sVirtqueueDmaseam impl delegates to its own pool’s keyed API.kernel/src/dma_backend.rs(always compiled) records the boot-time IOMMU probe verdict and resolves the fail-closed DMA backend selection (direct IOMMU remapping only with a verified probe, else kernel-owned bounce buffers) per the “Cloud DMA Backend” contract indocs/dma-isolation-design.md, emitting the boot proof line.kernel/src/device_manager/holds bounded in-kernel PCI device ownership records. The full DDF surface (device records, DMA pools/buffers, MSI-X interrupts, NVMe brokered controller registers, IOMMU domain ledgers, virtio ring publication, proofs) compiles only undercfg(feature = "qemu")inqemu_full.rs; the MMIO-only surface used bycap::device_mmioexists in both builds, dispatching tostub.rs(one-slot parked-regionDeviceMmiorecord) in the production non-qemubuild.kernel/src/nvme_storage_backend.rs(cfg(not(feature = "qemu"))) is the fail-closed activation gate for the always-built NVMeBlockDeviceread arm: modeled ondma_backend, it resolves a production handle only when a brokered controller was discovered and a livedevice_mmiogrant is staged, otherwise theblock_devicegrant fails closed with a typed error.kernel/src/virtio_transport.rs(always compiled) is the device-agnostic virtio modern-PCI transport host surface: capability/region discovery constants and bounded volatile MMIO accessors usable outside theqemu-gated legacy virtio path.kernel/src/virtio.rs(cfg(qemu)) holds the legacy in-kernel virtio transport, now a qemu-only fixture: the non-qemuproduction build compileskernel/src/virtio_stub.rsinstead, whose typed negative results keep stale or fixture-only kernel networking call sites failing closed. It includes the virtqueue drivers used by the IOMMU remapping proof. Itspub(crate) mod transportis the device-generic layer: split-ring/common-config constants, theMmioRegionaccessor, theVirtqueueDescriptorTracker, theVirtqueueDmaDMA/notify seam, the seam-drivenVirtqueue/DmaPagewith their poll/submit/complete loop and the multi-descriptorsubmit_request_chain, and the device-id-parameterizeddiscover_modern_transport. virtio-net is one seam caller (VirtioNetDma); virtio-blk is a second (VirtioBlkDma+VirtioBlkDriver,diagnose_virtio_blk_transport, theblock_device_*request API behind theBlockDevicecap). Net-specific provider/proof methods stay in the parent module asimpl Virtqueue<VirtioNetDma>.kernel/src/iommu.rs(cfg(qemu)) programs the Intel VT-d legacy-mode remapping tables, drives the hardware-DMA translation/fault proof, and runs the register-based invalidation revocation cycle.kernel/src/iso/(cfg(boot_iso_read)/cfg(boot_iso)/cfg(qemu)) is the boot-time ISO reader for the Boot Binary ISO Layout track.AtapiDevice(gate 1) locates the legacy IDE ATAPI device and exposes a boundedread_sectors(lba, count, buf)over polled-PIOREAD(12)packet commands with range/length validation.IsoFs(gate 2) is a read-only ISO 9660 driver layered on it: it parses the primary volume descriptor, walks directory records, and servesopen_file(name) -> (lba, size)under/boot/bins/, validating every directory record and derived extent against the volume size before use (fail-closedBadVolume/NotFound/NotDirectory).boot_read_proof()reads the PVD (CD001) andboot_fs_proof()walks to/boot/bins/PAYLOAD.BINand verifies its content, both behindboot_iso_readas themake run-boot-iso-readproof. Theboot_sourcesubmodule (gate 4,cfg(boot_iso)) builds a validated(name, lba, size)registry from every declared manifest binary name (mapping each name to the ISO 9660 d-character form, e.g.capos-shell->/boot/bins/CAPOS_SHELL) and reads ELF bytes on demand behind a device mutex;run_initandProcessSpawnerCapconsume it so theboot_isokernel loads binaries from the ISO instead of embeddedNamedBlob.data. Proofs:make run-boot-isoand the defaultmake run-smoke. Undercfg(qemu)the always-onAtapiDevice/IsoFssurface (plus a qemu-gatedblock_size()/list_boot_bins()enumeration helper) also backs the read-only install-source fixture cap (kernel/src/cap/installable_image.rs).
Validation: cargo build --features qemu, make run-smoke, make run-spawn,
make run-net, make run-iommu-remapping.
Kernel Architecture
kernel/src/arch/x86_64/gdt.rssets up kernel/user segments and TSS state.kernel/src/arch/x86_64/idt.rshandles exceptions and timer interrupts; CPL3 #PF/#GP/#UD/#DB/#BP faults terminate the whole owning process throughsched::exit_current_thread_terminating_process(deferred whole-process termination when sibling threads are live; proofmake run-user-fault), while CPL0 faults still halt the machine.kernel/src/arch/x86_64/syscall.rsimplements syscall MSR setup and entry.kernel/src/arch/x86_64/context.rsdefines timer context-switch state.kernel/src/arch/x86_64/pic.rsandpit.rsconfigure legacy interrupt hardware.kernel/src/arch/x86_64/ioapic.rsmaps MADT I/O APICs and programs masked legacy IRQ routes from interrupt-source overrides.kernel/src/arch/x86_64/lapic.rsprograms the xAPIC LAPIC timer and IPIs.kernel/src/arch/x86_64/smap.rsenables SMEP/SMAP and brackets user memory access.kernel/src/arch/x86_64/tls.rshandles FS-base/TLS support.kernel/src/arch/x86_64/pci_config.rsprovides legacy PCI config I/O used by the higher-level PCI module alongside its ECAM backend.kernel/src/arch/x86_64/percpu.rs,smp.rs, andtlb.rsprovide per-CPU data, AP startup, and TLB shootdown for the SMP scheduler.
Kernel Memory
kernel/src/mem/frame.rswraps the shared frame bitmap with Limine memory map initialization and global kernel access.kernel/src/mem/paging.rsmanages page tables, address spaces, permissions, user mappings, W^X enforcement, and address-space teardown.kernel/src/mem/heap.rsinitializes the kernel heap.kernel/src/mem/validate.rsvalidates user buffers before kernel access.
Related docs: DMA Isolation, Trusted Build Inputs.
Kernel Capabilities
kernel/src/cap/mod.rsinitializes kernel capabilities and builds the first service’s kernel-sourced bootstrap capability table.kernel/src/cap/table.rsre-exports shared capability-table logic and owns the kernel-global table.kernel/src/cap/ring.rsvalidates and dispatches ring SQEs.kernel/src/cap/transfer.rsvalidates transfer descriptors and prepares transfer transactions.kernel/src/cap/endpoint.rsimplements Endpoint CALL, RECV, RETURN, queued state, cleanup, and cancellation behavior.kernel/src/cap/console.rsimplements serial Console.kernel/src/cap/terminal_session.rsimplements the session-scoped TerminalSession line-oriented terminal with boundedreadLine, echo modes, and cancellation.kernel/src/cap/boot_package.rsimplements the read-only BootPackage manifest-size/chunked-read capability.kernel/src/cap/manual.rsimplements the read-only Manual capability: it parses the boot-packagedManualCorpusblob (embedded as themanual-corpusnamed binary) and answerspage/apropos/topics/section/describe/buildInfo.kernel/src/cap/log.rsimplements the Phase 1 monitoring log surface:LogSink(write) andLogReader(read) over a shared bounded, drop-oldest kernel recent-record ring. The sink drops records below the boot-seededSystemConfig.logLevelthreshold and forwards accepted records to serial; the reader returns records at/after a cursor withLogFilter(minLevel/componentPrefix),nextCursor, anddropped(docs/proposals/system-monitoring-proposal.md).kernel/src/cap/block_device.rsimplements theBlockDeviceCapObject(readBlocks/writeBlocks/info/flush). In the non-qemuproduction build theblock_devicesource resolves to the userspace-brokered NVMe arm (BlockDeviceBackend::NvmeBrokered, gated bykernel/src/nvme_storage_backend.rs); theqemubuild routes bounded inline-Datasector I/O to the kernel-owned virtio-blk driver inkernel/src/virtio.rsas a named fixture, not production storage (proofmake run-virtio-blk). The cap is scoped to onedevice_index: theblock_devicesource reaches the resolved non-target boot/storage disk, andblock_device_target(KernelCapSource.blockDeviceTarget @44) reaches the manifest-selected PCI identity when it names a bound non-boot virtio-blk disk. A cap for one disk grants no authority over another. The kernel binds up todevice_dma::MAX_VIRTIO_BLK_DEVICES(currently 2) virtio-blk devices, each with an independent driver/DMA-pool/interrupt-route instance (VirtioBlkDriver<const DEV>/VirtioBlkDma<const DEV>overVIRTIO_BLK_DMA_POOLS[DEV]);kernel/src/pci.rsenumerates each device with a device index (proofmake run-multi-virtio-blk). Target grants fail closed when the selector is absent, mismatched, or names the resolved boot disk. Counts are bounded to one bounce-buffer page.kernel/src/cap/readonly_fs.rsimplements the read-only filesystem service:ReadOnlyFsDirectoryCap/ReadOnlyFsFileCapparse a fixedCAPOSRO1on-disk layout read through the kernel-owned virtio-blk driver and serveDirectory.list/open+File.read/stat; every mutating method fails closed. Granted via theread_only_fs_rootKernelCapSource(returns a rootDirectorycap; qemu-gated, mounts at grant resolution and fails closed on a malformed/absent image). Host image buildertools/mkstore-image --readonly-fs; proofmake run-storage-fs.kernel/src/cap/persistent_store.rsimplements the disk-backed persistentStore:DiskStoreCapserves theStoreinterface (put/get/has/delete) over a fixedCAPOSST1on-disk layout read and written through a read+writeBlockSourceseam.putbump-allocates a data extent, writes the blob and entry record, then rewrites the superblock last as the durability commit point;deletetombstones the entry slot, and a later space-exhaustingputcompacts live entries through a shadow generation before recommitting the canonical front generation; the mount validates the superblock and every entry extent in-bounds and fails closed on a malformed image. TheVirtioBlockSource(qemu kernel) routes to the kernel-owned virtio-blk driver byte-identically (folding in thedata_region_base_lba()offset) and mounts eagerly at grant resolution; theNvmeBlockSource(built undercloud_persistent_store_over_nvme_proof) reads/writes through a granted NVMeBlockDevicewindow op and defers its mount-parse to the firstStorecall. Granted via thepersistent_storeKernelCapSource(virtio arm qemu-gated; the third NVMe-proof arm resolves the livedevice_mmiohandle). Host image buildertools/mkstore-image; reboot proofmake run-storage-persist(two QEMU passes on one disk image); NVMe put-then-get proofmake run-cloud-provider-persistent-store-over-nvmeviakernel/src/cap/persistent_store_over_nvme_proof.rs.kernel/src/cap/writable_fs.rsimplements the disk-backed writable filesystem service:WritableDirectoryCapserveslist/open/mkdir/remove/rename/createandWritableFileCapservesread/write/stat/truncate/sync/closeover a fixedCAPOSWF1on-disk layout (a flat node-record array with parent pointers + a bump-allocated data region) written through aBlockSourceseam. The RAM tree is the working copy; each mutation write-through-commits in the order data sector → node-record sector → superblock. A filesystem-wide fail-closed single-writer policy admits one writer at a time. TheVirtioBlockSource(qemu/installable kernels) routes to the kernel-owned virtio-blk driver byte-identically (folding thedata_region_base_lba()offset) and mounts the singleton eagerly; theNvmeBlockSource(built undercloud_writable_fs_over_nvme_proof) reads/writes through a granted NVMeBlockDevicewindow op and defers the singleton mount-parse to the firstDirectory/Filecall. Granted via thewritable_fs_rootKernelCapSource(virtio arm qemu-gated; the third NVMe-proof arm resolves the livedevice_mmiohandle), which mounts the process-wide singleton volume once and hands each grant a distinct writer id; fails closed on a malformed image. The NVMe write-then-read durability proof (make run-cloud-provider-writable-fs-over-nvmeviakernel/src/cap/writable_fs_over_nvme_proof.rs, which supersedes and drops the persistent-store-over-NVMe proof) exercises bothBlockDevicearms with the single-writer policy intact. The combined image buildertools/mkstore-image --writableco-locates theCAPOSST1Storesub-volume (LBA 0) and theCAPOSWF1filesystem sub-volume on one disk; reboot proofmake run-storage-writable(two QEMU passes: mutate then verify both the filesystem and the store survive). A slot becomes live on the next mount only once the superblock’s bumpednode_countis observed, so a poweroff in the record-written / superblock-pending window leaves an orphan slot the mount ignores. The proof-onlystorage_writable_recoveryfeature arms an induced forced poweroff in exactly that window (recovery_crash_after_record); bounded recovery proofmake run-storage-writable-recovery(pass 1 commits then iskill -9d mid-allocation, pass 2 verifies recovery to a consistent tree with the interrupted allocation atomically absent). The same crash window is proven over the NVMeBlockDevicearm bymake run-cloud-provider-writable-fs-over-nvme-recoveryviakernel/src/cap/writable_fs_over_nvme_recovery_proof.rs(a recovery cap-waiter clone that implies and supersedes the happy-path proof module/route/init); thecloud_writable_fs_over_nvme_recovery_prooffeature widens thestorage_writable_recoverycrash-window cfg gate, and the host-built NVMe image (tools/mkstore-image --writable-nvme, empty superblock + root-only node table) is booted twice with-device nvme(no@20seed).writable_fs::mount_config_root(qemu-gated) scopes a writableDirectoryto thesystem/configsubtree for the boot-time data-region grant below.kernel/src/cap/installable_image.rsimplements the read-only install-source fixture (Installable System track item 5b):InstallableImageDirectoryCapserveslist/openandInstallableImageFileCapservesread/stat/closeover the booted CD-ROM ISO 9660/boot/bins/tree, reading through thekernel/src/iso/boot_isoATAPI/ISO 9660 driver behind a single shared-device mutex (so PIO does not interleave across CPUs). Every mutating method fails closed; a past-EOF read clamps to empty and an absent name is rejected, reusing the driver’svalidate_extent/read_sectorsrange checks. Granted via the qemu-gatedinstallable_image_sourceKernelCapSource(mounts the ATAPI volume and validates/boot/bins/at grant resolution, failing the spawn closed on an absent/malformed medium). Physically scoped to the ATAPI CD-ROM, so it cannot reach the writable virtio-blk target disk (block_device_target/writable_fs_root). Consumer demodemos/installable-image-source/; manifestsystem-installable-image-source.cue; proofmake run-installable-image-source.demos/installable-system-install/implementscapos-system-install, the Installable System install flow (track item 6): under the read-onlyinstallable_image_sourceDirectoryand the target-scopedblock_device_targetBlockDeviceselected by manifest PCI identity, it copies the packaged bootable boot-region head (BOOTHEAD.BIN) to LBA 0, writes the backup GPT (BOOTGPT.BIN) at the LBA read from the primary GPT header, and initializes an empty data region (DATAIMG.BIN,tools/mkstore-image --writable --empty-config) at the fixedcap::data_region_base_lba, validating ranges and verifying the read-back. It reads packaged files in 32 KiB windows (under the read-path reply scratch bound; seedocs/tasks/done/2026-06-05/storage-file-read-reply-scratch-clamp.md) and zero-skips the FAT free space.tools/split-boot-region.pysplits the mkdiskimage boot image into the head + backup GPT so only the populated prefix is packaged. Pass-1 installer manifestsystem-installable-install.cue; pass-2 installed manifest (baked into the boot region)system-installable-install-target.cue; harnesstools/qemu-installable-install-smoke.sh; proofmake run-installable-install(pass 1 installs into a second virtio-blk disk, pass 2 boots it standalone).kernel/src/cap/mod.rsgrant_data_region(proof-onlyinstallable_data_regionfeature) is the Installable System boot-time data-region mount:run_initbest-effort grants init asystem/configDirectory(data-config) plus the persistentStore(data-store) over the auto-attached data disk, failing closed wholesale to the base manifest (caps unchanged, “no data region; base floor” diagnostic) when the disk is absent, malformed, or missingsystem/config. No new cap type or schema change. Proofmake run-installable-data-region(seeded disk prints resolved contents; no disk and zeroed-superblock disk hit the base floor).- Installable System config-overlay compose/merge (track item 3): the
SystemConfigOverlaycapnp object +SystemManifest.extensionPoints(ManifestExtensionPoints) live inschema/capos.capnp; the typed decode, content-hash check, andcompose_ontoprecedence (base-pins-win / overlay-adds-within-declared-extension-points / no-new-authority) live incapos-config/src/manifest.rs.init/src/main.rsapply_config_overlayreadssystem/config/overlay.binfrom the granteddata-configDirectory, composes the overlay over the base plan, and falls closed to the base floor with[init] overlay rejected: <reason>. Thetools/mkmanifestmkoverlaybin encodes overlays (filling the canonical hash) andtools/mkstore-image --writable --seed-overlayseeds them. Proofmake run-installable-overlay. - Installable System generations + rollback + failed-boot auto-fallback (track
item 4): userspace-only over the already-granted Store + writable
system/configDirectory, no schema or kernel change.init/src/main.rsrun_generation_rollback_checks(gated by a base service namedgeneration-proof) represents system-config generations as content-addressedStoreobjects keyed by SHA-256, tracks the known-goodactivepointer and a staged/attemptingcandidatepointer as monotonic-epoch marker files (gen-active/gen-candidate) in the writable config region, records a boot attempt durably before applying a candidate, auto-falls-back to the known-good generation when a candidate is left unconfirmed (the brick-proofing guarantee), promotes a confirmed candidate, rolls config back to a retained prior generation, and rejects a stale/replayed (lower-or-equal-epoch) pointer. A present-but-undecodablegen-candidatemarker (the torn size-0 file a poweroff inside the CREATE|TRUNCATE rewrite window leaves, or garbage bytes) is discarded with a loud diagnostic and boot falls back to the known-good generation, while a corruptgen-activemarker takes a distinct loud FATAL refuse-to-boot path (the known-good generation is genuinely unknown). Manifestsystem-installable-generation.cue; proofmake run-installable-generationboots a--seed-configdisk three times (boot 1 exercises the mechanism and leaves an unconfirmed candidate; boot 2 proves across-reboot auto-fallback to the known-good generation, then leaves a torn size-0 candidate marker; boot 3 proves torn-marker recovery). - Installable System integrated bootable disk (track item 5, proof-only
installable_diskfeature, impliesinstallable_data_region): one disk carries the boot ESP (GPT partition 1) and the co-locatedCAPOSST1Store +CAPOSWF1writable data region (GPT partition 2).kernel/src/cap/mod.rsdata_region_base_lbareturns the fixed partition-2 base LBA (264192) under the feature (0 otherwise), applied at the singlepersistent_store/writable_fsread_range/write_rangechoke points so the kernel reads the region at that fixed tool/kernel-contract LBA without parsing the GPT.tools/mkdiskimage.sh--data-image/--data-offset-bytesfold thetools/mkstore-image --writableimage into partition 2 and derive the ESP size from--esp-sectors(integrated disk uses the same 128 MiB ESP as the raw disk-image targets so a debug kernel fits). Manifestsystem-installable-disk.cue; proofmake run-installable-diskboots one virtio-blk disk and asserts the data region mounts from the boot disk and a data-region-only overlay service runs. kernel/src/cap/frame_alloc.rsimplements FrameAllocator and MemoryObject.kernel/src/cap/virtual_memory.rsimplements per-process anonymous memory operations.kernel/src/cap/timer.rsimplements monotonicnowand boundedsleep.kernel/src/cap/wall_clock.rsimplements the read-onlyWallClock.wallTimecap: UTC over a fixed boot base layered on the monotonic timebase, reporting the fail-closeduntrustedClockProvenance(Phase 1 fixed-boot-base variant;docs/proposals/time-and-clock-proposal.md).kernel/src/cap/park_space.rsimplements the process-local ParkSpace marker capability used by compact park (CAP_OP_PARK/CAP_OP_UNPARK) opcodes.kernel/src/cap/network.rsimplements the qemu-only NetworkManager, TcpListener, TcpSocket, and UdpSocket fixture caps. The kernel no longer depends onsmoltcp; non-qemumanifests reject the kernelnetwork_manager/tcp_listen_authoritygrant sources (fail closed), and the production socket path is the Phase C userspace network-stack process. The socket-backedSocketTerminalSessionshim is retired:TcpSocket.intoTerminalSessionfails closed in every dispatch path.kernel/src/cap/process_spawner.rsimplements ProcessSpawner and ProcessHandle.kernel/src/cap/provider_cap_waiter_proof.rs(non-qemu,cloud_provider_cap_waiter_proofCargo feature) stages a fully-programmed-route bootstrapInterruptgrant source and theInterruptCapWaiterProofcap whoseInterrupt.waitinjects onedevice_interrupt::handle_lapic_deliverydispatch and whoseInterrupt.acknowledgeretires the deferred LAPIC EOI; the cap’son_releaseruns the masked-no-wake + reassign + stale-handle assertion chain before emittingcloudboot-evidence: provider-cap-waiter <token>. Mutually exclusive withcap::interrupt_grant_source_prod(default cloudboot path) and skipscap::provider_nic_bind_proof/cap::storage_bind_proofto keep the bound route live for the userspace cap-waiter handoff. Proof:make run-cloud-provider-cap-waiter.kernel/src/cap/virtio_net_device_bringup_proof.rs(non-qemu,cloud_virtio_net_device_bringup_proofCargo feature; mutually exclusive withcloud_provider_cap_waiter_proofand the userspace selected-write handshake proof) drives the bounded virtio status sequence kernel-side over the picked virtio-net PCI function (vendor0x1af4, device0x1000/0x1041): resolves the modern virtio PCI transport regions throughvirtio_transport::parse_modern_pci_transport_capabilities, maps the common configuration window throughpci::map_bar_region, and drives reset → ACKNOWLEDGE → DRIVER → feature discovery + driver-feature selection (VIRTIO_F_VERSION_1only) → FEATURES_OK → DRIVER_OK with a trailing reset on every exit path. Inline assertions gate the headlinecloudboot-evidence: virtio-net-device-bringup <token>on the negotiated feature set,COMMON_NUM_QUEUES >= 2, DRIVER_OK observation, and the final reset returningdevice_statusto 0. Marker carriesqueue_setup=not-attempted,tx_descriptor=not-published,userspace_cap=not-issued,msix_function_enable=not-toggled,device_autonomous_raise=not-attempted,live_cloud=not-attempted. Proof:make run-cloud-provider-virtio-net-bringup.cloud_virtio_net_userspace_features_ok_proof(non-qemu; proofmake run-cloud-prod-nic-driver-userspace-features-ok) is Phase C slice 1 of the userspace NIC relocation track. It makescap::devicemmio_grant_source_prodstage the picked virtio-net modern common-config window as a selected-writeDeviceMmiocap withregisterWrite=selected-write-common-config-handshake; the userspace smoke drives reset -> ACKNOWLEDGE -> DRIVER -> FEATURES_OK overDeviceMmio.write32and proves queue-address writes remain fail-closed. It is mutually exclusive with the kernel-owned virtio-net bringup, bundle, and queue-materialization proof chain over the same BDF/grant path, and with thecloud_nvme_readonly_bind_proofdescendant chain because both stage a proof-specific productionDeviceMmiogrant source.kernel/src/cap/virtio_net_tx_authority_bundle_proof.rs,kernel/src/cap/virtio_net_tx_queue_materialization_proof.rs, andkernel/src/cap/virtio_net_msix_function_enable_proof.rsare the decomposed userspace-TX track. Each is non-qemuand gated by its own focused-proof Cargo feature (cloud_virtio_net_tx_authority_bundle_proof,cloud_virtio_net_tx_queue_materialization_proof, andcloud_virtio_net_msix_function_enable_proofrespectively; the last implies the second so the bundle observer + production grant-source pickers + userspace bundle smoke stay compiled in across the chain). The bundle proof observes the three production grant sources (devicemmio_grant_source_prod,dmapool_grant_source_prod,interrupt_grant_source_prod) issuing one cap each into the spawned userspace bundle smoke and asserts same-BDF; the queue-materialization proof drives the kernel-side modern-virtio status sequence throughDRIVER_OKand materializes one manager-owned TX virtqueue from three zeroed brokered frames, asserting register read-backs and post-reset clearance; the MSI-X function-enable proof extends that sequence with one canonical mask-first PCI MSI-X function-level enable (setFUNCTION_MASK, thenENABLE, then clear both) plus best-effort cleanup on every exit path. Each child emits its own headline marker (cloudboot-evidence: virtio-net-tx-authority-bundle <token>,cloudboot-evidence: virtio-net-tx-queue-materialization <token>, andcloudboot-evidence: virtio-net-msix-function-enable <token>); when the later feature is active the earlier markers are intentionally suppressed because their discipline labels would be inaccurate. Proofs:make run-cloud-provider-virtio-net-tx-authority-bundle,make run-cloud-provider-virtio-net-tx-queue-materialization,make run-cloud-provider-virtio-net-msix-function-enable.kernel/src/cap/virtio_net_userspace_rx_dma_proof.rs(Phase C slice 4a-ii, gatedcloud_virtio_net_userspace_rx_bringup_proof) drives the first real RX DMA from the shim-owned vring:post_rx_descriptorwrites the RX descriptor- avail over the shim’s retained RX vring physes at
DMABuffer.submitDescriptortime, anddrive_rx_dma(reached from the now-liveprovider_notify_doorbell_write_for_cap) rings the RX doorbell, submits a kernel-half SLIRP TX ARP stimulus over the retained TX physes, polls one real device->host completion, and resets the device (clearing the retainedenabledflags to release the ring-buffer pins). Self-contained byte-level vring helpers are duplicated fromvirtio_net_polled_providerto protectrun-net. The notify region is mapped kernel-side + the per-queue notify slot offsets captured bycap::devicemmio_grant_source_prod(rx_dma_notify_state). Proofmake run-cloud-prod-nic-driver-userspace-rx-bringup(extended).
- avail over the shim’s retained RX vring physes at
kernel/src/cap/null.rsimplements the measurement-only NullCap.kernel/src/cap/park_bench.rsimplements the measurement-only ParkBench authority used bymake run-measure.
Related docs: Capability Model, Authority Accounting.
Userspace
init/is the standalone init process. In the spawn smoke, it uses ProcessSpawner, grants initial child capabilities, waits on ProcessHandles, and checks hostile spawn inputs.capos-rt/src/entry.rsowns the runtime entry path and bootstrap validation.capos-rt/src/alloc.rsinitializes the userspace heap.capos-rt/src/syscall.rsprovides raw syscall wrappers.capos-rt/src/capset.rsprovides typed CapSet lookup helpers.capos-rt/src/ring.rsimplements the safe single-owner ring client, out-of-order completion handling, transfer descriptor packing, and result-cap parsing.capos-rt/src/client.rsimplements typed clients for Console, TerminalSession, BootPackage, ProcessSpawner, ProcessHandle, and Timer. The client-side methods are generic overTransport; result-cap-adopting methods stay on the concreteRuntimeRingClient.capos-rt/src/transport.rsdefines theTransportseam (the client-sideCALL/completion/RELEASEring operations) and the in-systemRingTransport(RingClientviewed through the seam). A host remote transport is a later slice; seedocs/backlog/capos-sdk-dual-transport.md.capos/is the front-door SDK facade crate: for the defaultringfeature it re-exports thecapos-rtruntime, typed clients, theentry_point!macro, and aprelude. Theremotefeature is reserved. Standalone, likecapos-rt.capos-rt/src/pollselect.rsis the pure POSIXpoll/selectbridge:SocketReadiness->pollrevents(POLLIN/POLLOUT/POLLHUP/POLLERR/POLLNVAL) andselectset membership, plusunsupported_request_bitsfor fail-closed flag handling. Shared by thelibcapos-posixC surface and theposix-socket-poll-select-smokeproof. Proof:make run-posix-socket-poll-select.capos-rt/src/panic.rsprovides the emergency Console panic output path.capos-rt/src/bin/smoke.rsis the runtime smoke binary used by focused runtime proofs rather than the default boot manifest.capos-service/src/lib.rsis the standaloneno_stdservice lifecycle layer abovecapos-rt; slice 1 exposesServiceMain,ServiceRuntime, and ordered initialize/dependency-wait/ready/run/drain/shutdown/cleanup phases.shell/src/main.rsis the native capability shell, built as the standalonecapos-shellcrate and packaged bysystem.cue,system-shell.cue, and the focused login manifests.
Validation: make capos-rt-check, make run-smoke, make run-spawn,
make run-shell, make run-terminal. The former Telnet fixture is retired
with the qemu-only kernel TCP listener.
Standalone C and WASI Substrates
These are standalone crates (not workspace members) built by the Makefile.
libcapos/buildslibcapos.a, ano_stdRust staticlib exposing the capos-rt syscall/ring/CapSet path and typedConsole/Timer/EntropySource/VirtualMemorywrappers plus C heap shims to C consumers. Public header atlibcapos/include/capos/capos.h. No POSIX surface.libcapos-posix/buildslibcapos_posix.a, ano_stdRust staticlib layering a POSIX adapter overlibcapos: per-process fd table, errno cell, historical UDP socket wrappers over the retired qemu-only kernelUdpSocketcap, clock overTimer,pipe/dupoverPipe,poll/select(poll.rs,<poll.h>/<sys/select.h>) over thecapos-rt::pollselectreadiness bridge with fail-closed unsupported-flag /EBADF/EINVALhandling, andfork/execve/waitpidvia the recording-shim ProcessSpawner Move-grant path, plus the libc surface the dash port needs: stdio/string/stdlib/ctype helpers,strerror/qsort/umask/abort/strtoll/strpbrk/lstat/getgroups/wait3/vfork, byte-order helpers (inet.rs),getrlimit/setrlimit(resource.rs),setlocale(locale.rs),times+tcgetattr, C-localewchar/wctypemultibyte (wchar.rs), theenvironpointer, and thesys_siglistarray. C headers (the namespaced source of truth) underlibcapos-posix/include/capos/posix/– including the dash-neededsys/types.h,termios.h,sys/resource.h,sys/times.h,wchar.h,wctype.h,locale.h,inttypes.h, and the decl-onlysys/ioctl.h/sys/mman.h/arpa/inet.h/getopt.h/paths.h/sys/param.h.libcapos-posix/sysroot/include/is the-nostdincbare-header sysroot (<stdio.h>,<unistd.h>,<sys/stat.h>, …) whose wrappers forward into that namespace; mirrored C ports (dash) build against it via the Makefile’sCAPOS_C_SYSROOT_INCLUDEflags on thecapos-c-multitu-elfrule. Focused sysroot proofmake run-c-libc-surface.capos-wasm/is theno_stdWASI host adapter: awasmi-backedRuntime, thewasm-hostuserspace binary, the Preview 1 import resolver, and the manifest-supplied wasm payload reader.vendor/wasmi-no_std/andvendor/dns-c-wahern/are static-pinned, no-patches upstream snapshots consumed bycapos-wasm/and the POSIX DNS smoke; do not patch them in place (refresh procedure in eachVENDORED_FROM.md).vendor/dash/is the mirror-as-is dash0.5.13.4snapshot (src/stays byte-identical; capOS deviations live underpatches/). Its capOS build pipeline lives outside the mirror undervendor/dash/capos/: the pinnedconfig.handgen-tables.sh(stages a patched source copy + runs the six host table generators). The Makefiledashtarget buildstarget/dash/dash.elfthroughcapos-c-multitu-elfagainstlibcapos.a+libcapos_posix.a.
Validation: make run-c-hello, make run-posix-pipe-smoke,
make run-posix-printf, make run-wasm-host, make run-wasi-hello-rust,
make run-wasi-random. The former POSIX DNS smoke is retired with the
qemu-only kernel UdpSocket owner.
Demo Services
demos/ is a nested userspace smoke-test workspace. Each demo is a release-built
service binary packaged into the boot manifest:
adventure-client,adventure-server,adventure-npc-shopkeeper,adventure-npc-wanderercapos-chat,chat-bot,chat-client,chat-servercapset-bootstrap,console-paths,credential-storeendpoint-queue-limit-smoke,endpoint-roundtrip,ipc-server,ipc-client,in-flight-call-limit-smokeframe-allocator-cleanup,memoryobject-shared-child,memoryobject-shared-parentpaperclips,paperclips-contentrevocable-read,revocation-observerring-corruption,ring-reserved-opcodes,ring-nop,ring-fairnessservice-common,shell-spawn-test,shell-typed-callterminal-session,terminal-strangertimer-smoke,timer-floodtls-smoke,unprivileged-stranger,virtual-memoryuser-fault-parent,user-fault-victim(user fault containment proof,make run-user-fault)
Shared demo support lives in demos/capos-demo-support/src/lib.rs and uses
capos-rt for entry, allocator, syscall, CapSet, and panic support while
keeping raw ring helpers for low-level transport smokes.
Validation: make run-spawn.
Manifest and Tooling
system.cueis the default init-owned boot manifest source. It imports the shared defaults package, boot-launches standaloneinit, and lets init start the shell, remote-session CapSet gateway, and resident services.system-spawn.cueis the ProcessSpawner smoke manifest source.system-smoke.cueis the scripted focused shell-led login/shell smoke manifest source.system-chat.cue,system-adventure.cue, andsystem-paperclips.cueare focused resident-service and terminal-demo manifest sources.system-memoryobject-shared.cue,system-revocable-read.cue, andsystem-measure.cueare focused regression/measurement manifest sources.system-shell.cueis the focused anonymous-shell manifest source (no verifier, shell stays anonymous).system-terminal.cueis the focused TerminalSession proof manifest source.system-credential.cueis the focused CredentialStore proof manifest source.system-login.cueis the focused password-login proof manifest source.system-login-setup.cueis the focused first-boot setup proof manifest source.tools/mkmanifest/evaluates manifest input, embeds binaries, validates manifest shape, writes boot-manifest Cap’n Proto bytes, and providescue-to-capnpfor schema-aware CUE-authored data-message conversion. Its siblingmkoverlaybin encodes aSystemConfigOverlayfrom CUE into thesystem/config/overlay.binbytes (filling the canonical content hash) for the installable-system config-overlay proof.tools/manualc/is the System Manual corpus compiler: it parsesschema/capos.capnpfor section-2 interface pages, reads the authored man corpus underdocs/manual/man<section>/*.man, and emits the boot-packagedManualCorpusblob. It fails the build if any in-tree capability interface lacks a section-2 page (i.e. a schema doc comment).docs/manual/holds the authored man-shaped corpus consumed bymanualc(section 1 shell-command pages and section 7 concept pages); section-2 capability pages are generated from the schema, not stored here.system-manual-smoke.cueis the focused Manual proof manifest source.tools/agent-session-recaps/contains private-session recap and raw-archive tooling for the agentic development experiment. The tools are tracked here; raw transcripts and generated recap stores stay outside the repo unless explicitly redacted and reviewed.tools/check-generated-capnp.shverifies checked-in generated schema output.scripts/record_worklog.pyemits per-task commit spans (from each task’scommits:list, falling back to task-file history) for the development timeline/Gantt;scripts/validate_backfill_tasks.pyvalidates backfilled task-file frontmatter against the chunk’s real SHAs;scripts/check-md-links.pyis the pre-commit broken-relative-link gate over all.md.tools/githooks/is the repocore.hooksPath(enabled withmake hooks):prepare-commit-msgstamps provenance trailers (Plan-Item/Run-Id/Agent-Kind) onto run-driven commits, alongside the git-lfs hooks.tools/qemu-net-harness.shruns the current QEMU net harness, withtools/qemu-net-smoke.shasserting virtio-net transport, MSI-X metadata selection, kernel-owned MSI-X vector-pool allocation/programming, masked route-lifecycle proof, queue vector assignment, descriptor guards, ARP, and ICMP fixture lines.fuzz/contains fuzz targets for manifest Cap’n Proto decoding (with the production reader-options envelope), mkmanifest JSON conversion/validation, ELF parsing, Telnet IAC filtering, terminal line discipline, ring SQE wire validation, ISO 9660 PVD/directory-record parsing, theCAPOSRO1/CAPOSST1/CAPOSWF1storage mount parsers, and thecapos-tlsX.509 validity walk.
Validation: cargo test-mkmanifest, make generated-code-check,
make fuzz-build, make fuzz-smoke.
Documentation
docs/capability-model.mdis the current capability architecture reference.docs/architecture/threading.mdanddocs/architecture/park.mdrecord the accepted contracts and first implementation for in-process thread ownership and private ParkSpace authority.docs/*-design.mdfiles record targeted implemented or accepted designs.docs/proposals/contains accepted, future, exploratory, and rejected designs.docs/research/summarizes prior art (thecapability-systems-survey.mdsynthesis plus per-system deep-dive reports).docs/proposals/mdbook-docs-site-proposal.mddefines the documentation site structure and status vocabulary used by the orientation pages.