Design Proposal: Installable capOS System

This is a design proposal whose bounded local/QEMU path has landed. The persistent data-region mount, config-overlay schema + init compose/merge with fail-closed fallback (make run-installable-overlay), generation/rollback machinery (make run-installable-generation), integrated installable disk (make run-installable-disk), target-disk install (make run-installable-install), first-boot provision (make run-installable-provision), and update/rollback flow (make run-installable-update) are implemented. The storage and disk-image prerequisites it builds on have also landed (see Build-On Relationship): block-device-backed read-only and writable filesystems, a persistent content-addressed Store, reboot-surviving writable persistence, and a hybrid BIOS+UEFI disk image. This proposal has been reconciled against those landed contracts and is decomposed separately (see Closeout And Decomposition).

Throughout, landed behavior is written in the present tense and planned behavior in the future/conditional tense. The installed-system proof remains a bounded local/QEMU result: it does not claim secure boot/signing, production release authority, public ingress, AWS/Azure live support, direct-remapping production hardware, userspace smoltcp/L4 readiness, or a persistent Namespace.

Problem

The baseline capOS boot path is a boot-from-image research system. The build packs a Cap’n Proto SystemManifest (compiled from system.cue) plus the userspace binaries into an ISO; Limine loads the manifest as a module; the kernel parses it, builds init’s bootstrap caps, and enters the single initConfig.init process. The boot-binary ISO layout (behind the boot_iso feature) can instead read ELFs on demand from /boot/bins/ so the manifest carries names only. Without the installable-system path, the system that boots is still exactly the image that was built: the next boot re-reads the same immutable manifest and rebuilds the same capability graph.

That baseline is correct for a research image and insufficient for an installed system. An installed capOS is one that:

  • boots from a local disk rather than a re-imaged ISO each time;
  • carries mutable system configuration – installed services, local accounts, network/runtime settings – that persists across reboot and is not baked into the image; and
  • can be updated to a new system generation and rolled back to a known-good one.

The hard question is not the disk format. It is how persistent, mutable system configuration composes with the immutable boot manifest without reintroducing ambient authority or a single mutable blob that can brick the system. That composition is the center of this proposal.

Non-Goals

  • Designing the block device, filesystem, or Store persistence mechanisms. Those are owned by Storage and Naming and the storage tracks in Hardware, Boot, and Storage. This proposal composes them and must not redesign them.
  • Defining the local-account schema. That is Local Users, Storage, and Policy; the account store is a consumer of the persistent-config region designed here.
  • Secure boot, image signing, and manifest trust. Those are tracked as storage-proposal Open Question #5 and the security/verification track; this proposal notes where a signature check would attach but does not specify the cryptography.
  • Any cloud-image or non-ATAPI boot-binary loader work; see the Cloud Device Tracks backlog.

On-Disk Layout

The installed system needs three regions with distinct mutability and authority: a read-only boot region, an immutable-per-generation system region, and a mutable data region. How those regions map onto physical disks is the first reconciliation point, because the landed building blocks already fix part of it.

Landed shape:

  • make image produces a single hybrid BIOS+UEFI raw image with one GPT ESP (FAT32) carrying Limine + the kernel + manifest.bin. That is the boot region (tools/mkdiskimage.sh, make run-disk / make run-disk-bios).
  • The persistent content-addressed Store (CAPOSST1) and writable filesystem (CAPOSWF1) are co-located in the data-region image produced by tools/mkstore-image --writable. Focused storage and early data-region smokes can still attach that image as a separate virtio-blk device.
  • The installable-system disk path has folded those regions onto one bootable disk: GPT partition 1 is the ESP and GPT partition 2 carries the co-located CAPOSST1 + CAPOSWF1 data region at a fixed base LBA read through cap::data_region_base_lba (no GPT parser in the kernel). make run-installable-disk proves boot from that integrated disk.
  • capos-system-install writes the same layout to a manifest-selected target BlockDevice and then boots the installed disk standalone (make run-installable-install). Provisioning and update operate on the installed data region after that floor exists.

Region placement decision (reconciled). The storage model is the co-located CAPOSST1 Store + CAPOSWF1 writable filesystem data region, not a persistent Namespace and not three independent mutable partitions. The separate data-region disk remains a focused proof packaging, while the installed system packages the ESP and data region onto one target disk. The kernel relies on the fixed tool/kernel data-region LBA contract for the installable path.

flowchart TD
    InstallDisk[Installed disk] --> Boot[GPT partition 1: ESP with Limine + kernel + boot manifest]
    InstallDisk --> DataRegion[GPT partition 2: fixed-LBA data region]
    DataRegion --> System[CAPOSST1 content-addressed Store: immutable generation objects]
    DataRegion --> Data[CAPOSWF1 writable filesystem: config/account state and markers]
    Boot -.init mounts data region when present.-> DataRegion
    Data -.active and known-good marker files name hashes.-> System

The system and data regions share the co-located data region: immutable generation objects live in the persistent Store (CAPOSST1), and mutable config/accounts plus the active/known-good pointers live in the writable filesystem (CAPOSWF1). The overlay read/validate/merge that composes them at boot has landed for the installable-system path (see below).
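The fixed-LBA contract can be sketched as plain arithmetic. This is a minimal illustration, not the kernel's code. The region-relative LBAs (CAPOSST1 at LBA 0, CAPOSWF1 at LBA 256) are taken from the Build-On table. The 512-byte sector size, the helper names, and the assumption that the co-located image is placed unchanged at the data-region base are all hypothetical.

```rust
/// Sector size assumed for this sketch; the proposal does not state one.
pub const SECTOR_SIZE: u64 = 512;
/// Region-relative LBA of the CAPOSST1 Store superblock (from the Build-On table).
pub const STORE_REL_LBA: u64 = 0;
/// Region-relative LBA of the CAPOSWF1 writable filesystem (from the Build-On table).
pub const WRITABLE_FS_REL_LBA: u64 = 256;

/// Absolute LBAs of the two co-located structures on an integrated disk.
#[derive(Debug, PartialEq, Eq)]
pub struct DataRegionLayout {
    pub store_lba: u64,
    pub writable_fs_lba: u64,
}

/// Resolve absolute LBAs from the fixed data-region base.
/// Returns None on arithmetic overflow or when the writable filesystem
/// would start at or past the end of the disk.
pub fn resolve_layout(base_lba: u64, disk_sectors: u64) -> Option<DataRegionLayout> {
    let store_lba = base_lba.checked_add(STORE_REL_LBA)?;
    let writable_fs_lba = base_lba.checked_add(WRITABLE_FS_REL_LBA)?;
    if writable_fs_lba >= disk_sectors {
        return None;
    }
    Some(DataRegionLayout { store_lba, writable_fs_lba })
}

/// Convert an LBA to a byte offset, refusing to overflow.
pub fn byte_offset(lba: u64) -> Option<u64> {
    lba.checked_mul(SECTOR_SIZE)
}
```

With a hypothetical base LBA of 2048, `resolve_layout(2048, 1_000_000)` places the Store at LBA 2048 and the writable filesystem at LBA 2304. Because the base is a fixed constant shared by the tool and the kernel, no GPT parsing is needed to find either structure.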

Boot region (read-only at runtime)

The boot region is the single GPT ESP carrying Limine, the kernel, and the immutable boot manifest – the same SystemManifest shape that exists today. It is what make image produces now (one hybrid BIOS+UEFI image, one ESP), and it is treated as read-only at runtime. The landed update proof stages and commits generation objects in the data region; production boot-region rewrite, rollback, signing, and release policy remain future work.

The boot manifest stays the root of trust for topology: it pins the kernel, the init binary, and the minimum kernel-sourced caps init needs to bootstrap. In the installable-system path, which system/config generation to activate is named by writable-filesystem marker files and persistent Store hashes (see Generations And Rollback); the SystemManifest still carries no generation field. The boot manifest does not grow to hold installed-service config or accounts.

System region (immutable per generation)

The landed persistent content-addressed Store (CAPOSST1, put/get/has/delete keyed by SHA-256, durable across reboot) is the durable substrate for immutable generation objects. The installable-system proofs exercise config and account generation objects (SystemConfigOverlay plus related records) and the marker-selected hashes that choose them. Package-manager-style system payload generations – service binaries, default software configuration, and release payload roots written into CAPOSST1 by a production updater – remain future work.

Once those software-payload generation roots exist, the target system region, not a POSIX /usr, is the system of record for what software is installed. capOS has capabilities, not paths: a service is “installed” when the generation root object binds its name to the content hash of its manifest fragment and binary. Because the landed Namespace cap is RAM-only and does not survive reboot, persistent name-to-hash bindings live inside generation capnp objects in the Store and in writable-filesystem marker files, not in a persistent Namespace cap (none exists). A Namespace may still be populated in RAM at boot from those persistent bindings, and a StoreFS adapter (storage proposal “Bridging the Two Models”) can expose a generation as a directory tree for POSIX/WASI consumers, but the durable installable record is the Store objects plus writable-filesystem markers.

Data region (mutable)

The data region is the landed writable filesystem (CAPOSWF1: full Directory mutation set plus File.write/truncate/sync/close under a fail-closed single-writer policy, co-located with the persistent Store). It holds everything that legitimately changes at runtime and must survive reboot:

  • Persistent system configuration – the central subject below: capnp overlay objects in the persistent Store plus writable-filesystem marker files for the active/known-good pointers.
  • Local account store – the provision proof writes an operator account record as a persistent Store object and config overlay input; broader durable account policy remains owned by local-users-management.md Gate 3.
  • Per-account home/config/cache subtrees and service state.

The data region is mutated under capability authority only. There is no global filesystem root and no ambient path-based access: a service receives a writable-filesystem Directory cap (or a Store cap) scoped to its own subtree, exactly as Storage and Naming describes for attenuated grants.
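The attenuation idea can be illustrated with a small model. This is a sketch under stated assumptions, not the capOS Directory interface: `DirCap`, `CapError`, and every method name here are hypothetical, and the real cap is an object reference rather than a stored path prefix. The point it shows is that a holder can only narrow what it was given, by descending into a child or by dropping write authority.

```rust
/// Hypothetical model of an attenuated Directory capability.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DirCap {
    /// Components from the filesystem root down to this cap's subtree.
    prefix: Vec<String>,
    writable: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CapError {
    InvalidName,
    ReadOnly,
}

/// A name must be one plain component, so no cap can reach outside its subtree.
fn check_name(name: &str) -> Result<(), CapError> {
    if name.is_empty() || name == "." || name == ".." || name.contains('/') {
        return Err(CapError::InvalidName);
    }
    Ok(())
}

impl DirCap {
    /// The unattenuated root; in this sketch only init would hold it.
    pub fn writable_root() -> Self {
        DirCap { prefix: Vec::new(), writable: true }
    }

    /// Narrow to a child directory. The result is never wider than `self`.
    pub fn sub(&self, name: &str) -> Result<DirCap, CapError> {
        check_name(name)?;
        let mut prefix = self.prefix.clone();
        prefix.push(name.to_string());
        Ok(DirCap { prefix, writable: self.writable })
    }

    /// Drop write authority. There is deliberately no inverse operation.
    pub fn read_only(&self) -> DirCap {
        DirCap { prefix: self.prefix.clone(), writable: false }
    }

    /// Resolve the location a write to `name` would touch, or refuse.
    pub fn authorize_write(&self, name: &str) -> Result<String, CapError> {
        check_name(name)?;
        if !self.writable {
            return Err(CapError::ReadOnly);
        }
        let mut parts = self.prefix.clone();
        parts.push(name.to_string());
        Ok(parts.join("/"))
    }
}
```

In this model, a service handed `root.sub("system")?.sub("config")?.read_only()` can neither write nor name anything outside system/config, because no method accepts a multi-component or parent-relative name.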

Why not “a filesystem is the system of record”

A traditional install makes / the source of truth and layers config files, package databases, and /etc over it. That is ambient authority through paths, which capOS rejects by design (storage proposal, “The Problem with Filesystems”). Here the capability object graph is authoritative; the durable installable-system record is the persistent Store objects plus writable-filesystem marker files, and a filesystem view is an adapter over that authority rather than ambient authority itself. The on-disk bytes may be a filesystem for tooling convenience, but the system model is capability-native.

Beyond-Boot-Manifest Configuration (Central Decision)

This is the core of the proposal. Today the system is fully described by the static boot manifest. An installed system needs persistent, mutable configuration that the boot manifest cannot carry, while keeping the boot manifest’s fail-closed guarantees.

The model is a two-layer composition resolved by init at boot:

  1. Base layer – the immutable boot manifest. Pins the kernel, the init binary (the init mandate from Run Targets, Init Mandate, and Default-Run Integration Gate B still applies: initConfig.init.binary must be init), the kernel-sourced bootstrap caps, and the floor of services and policy the system always runs. This layer is authoritative and cannot be overridden by persistent state – it is the recovery anchor.

  2. Overlay layer – the persistent config generation. A capnp-encoded configuration object naming additional installed services, local network/runtime settings, and account bindings. The object is content-stored in the persistent Store (CAPOSST1); a well-known writable-filesystem path (proposed system/config/, a CAPOSWF1 directory) holds the small marker files that name the current and known-good generation by content hash. The landed Namespace is RAM-only, so this is filesystem-path + content-hash grounding, not a persistent Namespace root. This is what capos-system-install, capos-system-provision, and capos-system-update write in the landed local proofs.

Precedence and merge model

init reads the base manifest from BootPackage (as it does today). The overlay step landed in 2026-05 (installable-config-overlay-schema-and-merge): when the data region mounts, init reads system/config/overlay.bin, decodes the SystemConfigOverlay capnp object, and – only if it validates against the base’s declared SystemManifest.extensionPoints – composes its additional services over the base plan (SystemConfigOverlay::compose_onto, proof make run-installable-overlay). Generation selection also landed (installable-system-generation-rollback): writable-filesystem marker files select the active/known-good object hashes and provide failed-boot fallback. The merge rules are deliberately conservative:

  • Base pins win. Anything the base manifest declares (kernel caps, the init binary, floor services, policy floors) is not overridable by the overlay. The overlay may only add services and fill in settings the base marks as overlay-supplied. This prevents a tampered or buggy overlay from dropping a recovery service or widening authority.
  • Overlay adds, within declared extension points. The base manifest declares named extension points (e.g. “additional services”, “network config”, “account store location”). The overlay may bind only those. An overlay key that does not match a declared extension point is rejected, not silently applied – closed by default, mirroring the existing “omitted cap sources fail closed” invariant (manifest-startup.md).
  • No new authority classes. The overlay can request services be started with caps the base manifest already authorizes init to delegate. It cannot mint kernel-source authority that the base did not grant init. The interface is the permission: an overlay names which already-authorized caps a service gets, never a new kernel cap source.

This is layering, not free-form override: the base manifest is a contract and the overlay fills declared holes in it.
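The three merge rules can be sketched as a single all-or-nothing function. This is an illustrative model only: the landed logic lives in SystemConfigOverlay::compose_onto and operates on capnp objects, whereas `BasePlan`, `Overlay`, `ComposeError`, and the string-keyed maps below are hypothetical simplifications.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical flattened view of the base manifest.
pub struct BasePlan {
    /// Settings and services pinned by the base; never overridable.
    pub pinned: BTreeMap<String, String>,
    /// Named holes the base allows an overlay to fill.
    pub extension_points: BTreeSet<String>,
    /// Caps the base already authorizes init to delegate.
    pub delegable_caps: BTreeSet<String>,
}

/// Hypothetical flattened view of a persistent overlay.
pub struct Overlay {
    pub bindings: BTreeMap<String, String>,
    pub requested_caps: BTreeSet<String>,
}

fn to_set(items: &[&str]) -> BTreeSet<String> {
    items.iter().map(|s| s.to_string()).collect()
}

fn to_map(items: &[(&str, &str)]) -> BTreeMap<String, String> {
    items.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

impl BasePlan {
    pub fn new(pinned: &[(&str, &str)], extension_points: &[&str], caps: &[&str]) -> Self {
        BasePlan {
            pinned: to_map(pinned),
            extension_points: to_set(extension_points),
            delegable_caps: to_set(caps),
        }
    }
}

impl Overlay {
    pub fn new(bindings: &[(&str, &str)], requested_caps: &[&str]) -> Self {
        Overlay { bindings: to_map(bindings), requested_caps: to_set(requested_caps) }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum ComposeError {
    OverridesBasePin(String),
    UndeclaredExtensionPoint(String),
    UnauthorizedCap(String),
}

/// Compose the overlay onto the base, or reject the overlay whole.
pub fn compose(base: &BasePlan, overlay: &Overlay) -> Result<BTreeMap<String, String>, ComposeError> {
    // Check every key before applying anything, so nothing is applied partially.
    for key in overlay.bindings.keys() {
        if base.pinned.contains_key(key) {
            return Err(ComposeError::OverridesBasePin(key.clone()));
        }
        if !base.extension_points.contains(key) {
            return Err(ComposeError::UndeclaredExtensionPoint(key.clone()));
        }
    }
    // The overlay may only name caps the base already lets init delegate.
    if let Some(cap) = overlay.requested_caps.difference(&base.delegable_caps).next() {
        return Err(ComposeError::UnauthorizedCap(cap.clone()));
    }
    let mut plan = base.pinned.clone();
    for (key, value) in &overlay.bindings {
        plan.insert(key.clone(), value.clone());
    }
    Ok(plan)
}
```

Because validation runs to completion before the plan is built, a single bad key or cap request leaves the caller with an error and the untouched base, which is the input the boot-with-base fallback needs.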

Where persistent config physically and logically lives

  • Physically: the data region – the landed persistent Store (CAPOSST1) for the immutable per-generation capnp objects, and the landed writable filesystem (CAPOSWF1) for the small active/known-good marker files.
  • Logically: a CAPOSWF1 directory tree, e.g. system/config/, holds one marker file naming the current generation object by content hash (plus the retained known-good marker); the generation objects themselves are content-stored in the persistent Store. Account records live under a sibling system/accounts/ tree the same way (consumed by local-users-management.md). There is no persistent Namespace cap; the RAM Namespace is repopulated from these bindings at boot if needed.
  • Authority: only a narrow system-config authority (held by init and the dedicated install/provision/update proof services) may write the system/config writable-filesystem subtree and the system-config Store objects. Ordinary services receive read-only scoped views or nothing. This is the writable-filesystem Directory / Store attenuation model, not a new mechanism.

Detecting and recovering from a bad persistent layer

The overlay is the most dangerous new surface: a corrupt or hostile overlay must never prevent boot. The design is fail-safe by construction:

  • Validation before merge (landed). init validates the overlay against the base manifest’s declared extension points and the schema before applying any of it: a schema-invalid, version-mismatched, content-hash-mismatched, stale-epoch, or extension-point-violating overlay is rejected whole (SystemConfigOverlay::from_capnp_bytes + compose_onto). Validation reuses the existing capos-config discipline rather than a parallel checker.
  • Monotonic generation + integrity. No landed Store, Namespace, or SystemManifest schema carries a system-generation/epoch field (other caps such as AccountRecord and the DDF revocation generations do, but not the installable-system path). The overlay instead carries a monotonic epoch and a SHA-256 contentHash inside its own SystemConfigOverlay capnp object (both landed in track item 3): the epoch is checked against the base’s minOverlayEpoch floor and the content hash is a self-consistency check. The writable-filesystem marker files that record which hash is active/known-good landed in the generation/rollback path. This mirrors the stale-write and monotonic-version rules already required for the account store (local-users-management.md Gate 3) and the managed-cloud store (storage proposal “Managed Cloud Backing”) without extending Store or Namespace. A stale overlay (epoch below the floor) is rejected.
  • Boot-with-base fallback. If the data region does not mount, or the active overlay fails validation, init boots from the base manifest alone and surfaces the failure (serial diagnostics / audit). The system always reaches at least the floor configuration, which by construction includes a recovery path.
  • Known-good generation pointer. The active overlay pointer is advanced only after a generation is proven to boot (see Generations And Rollback); a failed new generation leaves active on the prior known-good one.
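The validate-then-fall-back behavior described in these bullets can be sketched as follows. This is a model under stated assumptions: the struct and enum names, the `SUPPORTED_SCHEMA_VERSION` value, and passing a pre-computed hash into `validate` are hypothetical, and the landed checks live in SystemConfigOverlay::from_capnp_bytes and compose_onto. The caller is assumed to have computed the SHA-256 of the overlay payload already, which keeps the sketch dependency-free.

```rust
/// Hypothetical subset of the fields an overlay carries about itself.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OverlayHeader {
    pub schema_version: u32,
    pub epoch: u64,
    /// The hash the overlay claims for its own payload.
    pub content_hash: [u8; 32],
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Rejection {
    VersionMismatch,
    HashMismatch,
    StaleEpoch,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FallbackReason {
    DataRegionUnavailable,
    OverlayRejected(Rejection),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BootPlan {
    BaseWithOverlay { epoch: u64 },
    BaseOnly { reason: FallbackReason },
}

/// Schema version this sketch accepts (an assumed value).
pub const SUPPORTED_SCHEMA_VERSION: u32 = 1;

/// Reject the overlay whole on the first failed check.
pub fn validate(
    header: &OverlayHeader,
    computed_hash: &[u8; 32],
    min_overlay_epoch: u64,
) -> Result<(), Rejection> {
    if header.schema_version != SUPPORTED_SCHEMA_VERSION {
        return Err(Rejection::VersionMismatch);
    }
    if header.content_hash != *computed_hash {
        return Err(Rejection::HashMismatch);
    }
    // An epoch equal to the floor is accepted; only epochs below it are stale.
    if header.epoch < min_overlay_epoch {
        return Err(Rejection::StaleEpoch);
    }
    Ok(())
}

/// Decide what to boot. `overlay` is None when the data region did not mount.
pub fn plan_boot(
    overlay: Option<(&OverlayHeader, &[u8; 32])>,
    min_overlay_epoch: u64,
) -> BootPlan {
    match overlay {
        None => BootPlan::BaseOnly { reason: FallbackReason::DataRegionUnavailable },
        Some((header, computed_hash)) => match validate(header, computed_hash, min_overlay_epoch) {
            Ok(()) => BootPlan::BaseWithOverlay { epoch: header.epoch },
            Err(rejection) => BootPlan::BaseOnly {
                reason: FallbackReason::OverlayRejected(rejection),
            },
        },
    }
}
```

Every path through `plan_boot` returns a bootable plan, and the fallback reason is carried in the result so that it can be surfaced rather than silently dropped.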

Install / Provision / Update / Rollback Flow

The local/QEMU install, provision, update, and rollback flows have landed. They prove the authority and durability shape over capOS capabilities; they do not claim a production release/update service, secure boot/signing, public ingress, or live multi-provider deployment readiness.

Install

The capos-system-install userspace service takes the packaged image source from the booted CD-ROM /boot/bins/ tree and writes the installable layout onto a manifest-selected target disk. It holds only the read-only installable_image_source Directory and the target-scoped block_device_target BlockDevice; it cannot reach the boot disk through that target cap.

The service writes the boot-region head (BOOTHEAD.BIN: protective MBR, primary GPT, FAT ESP with Limine + release kernel + base manifest), writes the backup GPT (BOOTGPT.BIN) at the LBA named by the primary GPT header, and initializes the empty data region (DATAIMG.BIN: empty CAPOSST1 Store + CAPOSWF1 filesystem with system/config) at the fixed cap::data_region_base_lba. It validates every sector range and verifies the read-back before treating the install as complete. The empty data region is the install floor; the first non-empty config generation is written by provisioning.

make run-installable-install proves the flow in two passes: pass 1 installs into the target virtio-blk disk, and pass 2 boots that disk standalone with no CD-ROM and reaches the base service with the data region mounted.
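The validate-range, write, then verify-read-back discipline used by the install service can be sketched against an in-memory disk. This is not capos-system-install's code: `MemDisk` is a hypothetical stand-in for the target-scoped BlockDevice cap, its method names only loosely mirror readBlocks/writeBlocks, and the 512-byte sector size is an assumption.

```rust
/// Sector size assumed for this sketch.
pub const SECTOR_SIZE: usize = 512;

#[derive(Debug, PartialEq, Eq)]
pub enum InstallError {
    Unaligned,
    OutOfRange,
    ReadBackMismatch,
}

/// In-memory stand-in for a target-scoped BlockDevice cap.
pub struct MemDisk {
    sectors: Vec<[u8; SECTOR_SIZE]>,
}

impl MemDisk {
    pub fn new(sector_count: usize) -> Self {
        MemDisk { sectors: vec![[0u8; SECTOR_SIZE]; sector_count] }
    }

    pub fn sector_count(&self) -> u64 {
        self.sectors.len() as u64
    }

    /// Caller must have validated alignment and range (see `write_region`).
    fn write_blocks(&mut self, lba: u64, data: &[u8]) {
        for (i, chunk) in data.chunks(SECTOR_SIZE).enumerate() {
            self.sectors[lba as usize + i].copy_from_slice(chunk);
        }
    }

    /// Panics if the range is outside the disk; callers validate first.
    pub fn read_blocks(&self, lba: u64, count: u64) -> Vec<u8> {
        let start = lba as usize;
        let end = start + count as usize;
        self.sectors[start..end]
            .iter()
            .flat_map(|sector| sector.iter().copied())
            .collect()
    }
}

/// Write one image (for example a boot head or data-region image) at `lba`.
/// The write only counts as done once the read-back matches.
pub fn write_region(disk: &mut MemDisk, lba: u64, image: &[u8]) -> Result<(), InstallError> {
    if image.is_empty() || image.len() % SECTOR_SIZE != 0 {
        return Err(InstallError::Unaligned);
    }
    let count = (image.len() / SECTOR_SIZE) as u64;
    let end = lba.checked_add(count).ok_or(InstallError::OutOfRange)?;
    if end > disk.sector_count() {
        return Err(InstallError::OutOfRange);
    }
    disk.write_blocks(lba, image);
    if disk.read_blocks(lba, count).as_slice() != image {
        return Err(InstallError::ReadBackMismatch);
    }
    Ok(())
}
```

The range check happens before any sector is touched, so a mis-sized image fails without leaving a partially written target.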

Provision

First-boot provisioning writes the initial persistent config: the operator’s first local account record, network/runtime settings, and any additional services to start. capos-system-provision runs as PID 1 over an installed system’s persistent data region with only Console, writable_fs_root, and persistent_store caps. On the empty install floor, it writes the first non-empty SystemConfigOverlay generation, commits the generation object to the Store, writes system/config/overlay.bin, and advances the gen-active marker. Until provisioning runs, the system boots on the base-manifest floor.

make run-installable-provision boots the same empty-config disk twice: pass 1 provisions the account/settings/additional service, and pass 2 re-reads the active generation and account record from the data region to prove they survived reboot.

Update

The landed update flow applies a new generation over the same persistent Store + writable system/config region used by provisioning. The local proof does not rewrite a production boot region or ship a signed release updater; it proves staged generation commit, failed-candidate fallback, and base-overlay revalidation.

  1. Write the new generation into the content-addressed Store as a new root hash; the old generation’s objects remain (content-addressing dedups shared objects).
  2. Stage a new active-candidate pointer; do not advance active yet.
  3. Reboot into the candidate. If it reaches a health checkpoint, commit by advancing active. If not, the boot-with-known-good fallback keeps the prior generation (see below).
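Step 1 relies on content addressing: writing a new generation adds only the objects that changed and leaves the old root intact. The sketch below models that with an in-memory map. It is not the CAPOSST1 implementation: `Store`, `put_generation`, and the root encoding are hypothetical, and the standard library's `DefaultHasher` (a 64-bit hash) is swapped in for SHA-256 purely to avoid external dependencies.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for a SHA-256 content hash; 64 bits only to stay dependency-free.
pub type ObjectId = u64;

fn object_id(bytes: &[u8]) -> ObjectId {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// In-memory model of a content-addressed Store.
#[derive(Default)]
pub struct Store {
    objects: HashMap<ObjectId, Vec<u8>>,
}

impl Store {
    /// Store bytes under their content id; identical bytes are stored once.
    pub fn put(&mut self, bytes: &[u8]) -> ObjectId {
        let id = object_id(bytes);
        self.objects.entry(id).or_insert_with(|| bytes.to_vec());
        id
    }

    pub fn has(&self, id: ObjectId) -> bool {
        self.objects.contains_key(&id)
    }

    pub fn object_count(&self) -> usize {
        self.objects.len()
    }

    /// Store each object, then store a root object listing their ids.
    /// The returned id names the whole generation.
    pub fn put_generation(&mut self, objects: &[&[u8]]) -> ObjectId {
        let mut root = Vec::new();
        for object in objects {
            let id = self.put(object);
            root.extend_from_slice(&id.to_le_bytes());
        }
        self.put(&root)
    }
}
```

Writing a second generation that shares one of two objects with the first adds two entries (the changed object and the new root) rather than three, and the first generation's root remains present, which is why rollback can be a pointer move.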

Persistent config (the overlay and accounts in the data region) is carried across updates: the data region’s config/account generations persist across candidate staging, commit, and fallback. Where a new base no longer admits an overlay’s declared authority, the overlay is re-validated against that base and falls back to the base floor with a surfaced error rather than applying partially.

make run-installable-update boots the same empty-config disk three times: boot 1 provisions known-good generation 1, rejects an overlay against a revoked-cap base, and stages a healthy generation 2; boot 2 commits generation 2 across reboot and stages a failing generation 3; boot 3 auto-falls back from generation 3 to known-good generation 2 while preserving the data region.

Generations and rollback

The active system/config generation is named by writable-filesystem marker files (CAPOSWF1) carrying a content hash and monotonic pointer epoch – not by a SystemManifest field, since the manifest schema carries no system-generation field. The generation objects themselves are immutable content-addressed roots in the persistent Store. Rollback is:

  • System rollback: point the active system-generation hash back to the prior known-good generation. Because generations are immutable content-addressed roots, the prior generation’s bytes are still present; rollback is a pointer move plus reboot, not a re-extraction.
  • Config rollback: point the active overlay binding back to the prior overlay generation, retained for a bounded number of generations.
  • Failed-boot auto-fallback: a generation is promoted to known-good only after it reaches a defined health checkpoint. A boot that does not reach the checkpoint (kernel panic, init failure, overlay validation failure) is detected on the next boot via a “boot attempt count vs confirmed” marker, and the init/generation logic reverts to the last confirmed generation. This is the standard A/B-generation pattern, expressed over content-addressed Store roots rather than two fixed partitions.
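The failed-boot auto-fallback can be sketched as a small state machine over the marker state. This is a model, not the landed marker format: `Markers`, its field names, the plain `u64` generation ids (standing in for content hashes), and the `MAX_ATTEMPTS` limit of one are all assumptions made for illustration.

```rust
/// Hypothetical in-memory view of the generation marker files.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Markers {
    /// Last generation confirmed healthy (stand-in for a content hash).
    pub known_good: u64,
    /// Staged generation that has not been confirmed yet.
    pub candidate: Option<u64>,
    /// Boot attempts recorded against the current candidate.
    pub attempts: u32,
}

/// How many unconfirmed boots a candidate gets (an assumed policy value).
pub const MAX_ATTEMPTS: u32 = 1;

impl Markers {
    /// Stage a candidate without touching the known-good pointer.
    pub fn stage(&mut self, generation: u64) {
        self.candidate = Some(generation);
        self.attempts = 0;
    }

    /// Called early in boot, before the generation is applied.
    /// Records the attempt first, so a crash mid-boot is still counted.
    pub fn select_for_boot(&mut self) -> u64 {
        match self.candidate {
            Some(candidate) if self.attempts < MAX_ATTEMPTS => {
                self.attempts += 1;
                candidate
            }
            Some(_) => {
                // The candidate was attempted but never confirmed: revert.
                self.candidate = None;
                self.attempts = 0;
                self.known_good
            }
            None => self.known_good,
        }
    }

    /// Called once the health checkpoint is reached.
    pub fn confirm(&mut self) {
        if let Some(candidate) = self.candidate.take() {
            self.known_good = candidate;
            self.attempts = 0;
        }
    }
}
```

Staging generation 2, booting it, and confirming promotes it to known-good. Staging generation 3 and booting it without confirming makes the next `select_for_boot` return generation 2 and clear the candidate.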

make run-installable-generation proves this machinery before the full update flow: it stages a candidate, records a boot attempt before applying it, rejects a stale pointer, proves config rollback to a retained generation, and auto-falls back to the known-good generation across a fresh reboot when a candidate is left unconfirmed.

Build-On Relationship To Landed And Planned Work

This proposal is an integration design over existing tracks. It must not redesign them. Current state of each piece it builds on:

| Building block | Owning track | Status today |
| --- | --- | --- |
| Persistent content-addressed Store | Storage and Naming | landed: CAPOSST1 superblock at LBA 0, put/get/has/delete keyed by SHA-256, durable across reboot (persistentStore grant source; reboot proof make run-storage-persist). RAM-backed Store CapObject + userspace RAM Store service also landed |
| Namespace model | Storage and Naming | landed but RAM-only: resolve/bind/list/sub, not persistent (namespace grant source). No persistent Namespace cap exists |
| BlockDevice boundary | Hardware, Boot, and Storage “Reusable Block-Device Path” / “Local Disk Storage” | landed: readBlocks/writeBlocks/info/flush over a real cfg(qemu) virtio-blk device (blockDevice grant source; proof make run-virtio-blk) |
| Read-only filesystem over BlockDevice | “Local Disk Storage Milestone” | landed: CAPOSRO1 superblock, Directory.list/open/sub + File.read/stat, mutating methods fail closed (readOnlyFsRoot; proof make run-storage-fs) |
| Writable persistence across reboot | “Writable Local Storage Milestone” | landed: CAPOSWF1 writable filesystem at LBA 256, full Directory mutation set + File.write/truncate/sync/close, fail-closed single-writer (writableFsRoot; reboot proof make run-storage-writable). Co-located with CAPOSST1 via tools/mkstore-image --writable |
| Bootable disk image (make image, make run-disk) | “Bootable Disk Image” | landed: single hybrid BIOS+UEFI raw image with one GPT ESP carrying Limine + kernel + manifest.bin; make image/run-disk/run-disk-bios; GCP/AWS provider packaging. The boot-binary ISO layout’s on-demand reads also landed behind boot_iso |
| Boot manifest / SystemManifest / init mandate | Manifest and Service Startup, Run Targets, Init Mandate, and Default-Run Integration | landed: static manifest, init-owned service graph, name-only boot-ISO path. The installable path additionally reads and validates a persistent overlay only when the data region is mounted and the base manifest declares matching extension points (make run-installable-overlay) |
| Local account store (a consumer) | Local Users, Storage, and Policy Gate 3 | partially landed for installable proof: capos-system-provision writes and re-reads one operator account record through persistent Store/writable-filesystem state; full durable account policy remains future |

The storage and disk-image prerequisites have landed, and the bounded installable-system composition has landed on top of them: overlay read/validate/merge, generation marker files, install, provision, and update/rollback flows all have local QEMU evidence. The decomposition task (installable-system-decomposition) required ddf-blockdevice-boundary-virtio-blk-smoke, storage-readonly-fs-over-blockdevice, storage-persistent-store-reboot-proof, storage-writable-persistence-reboot-proof, and disk-image-provider-packaging to be done before emitting implementation tasks; they are. Because some prerequisites landed with contracts that differ from this proposal’s original projections (single hybrid ESP rather than three boot/system/data partitions, RAM-only Namespace rather than a persistent one, no system-generation field on the Store/Namespace/SystemManifest path), this proposal has been reconciled to the landed shapes above so the track does not encode a stale contract. Production hardening remains separate: secure boot/signing, authorized release publication, public ingress, broader cloud-provider coverage, direct-remapping production hardware, and full durable local-account policy are not implied by the local installable-system evidence.

Milestone Framing

installable-system is its own milestone: “an installed, persistent capOS that boots from disk and keeps mutable system configuration across reboots.” It is a user-visible product outcome distinct from the storage and bootable-disk image milestones it builds on, even though it depends on them – a user can have block devices, a filesystem, and a bootable disk image without having an installed, self-configuring, updatable system.

This framing is recorded in Roadmap. The milestone was selected after Device Driver Foundation closed. It is now closed for the bounded local/QEMU installable-system contract, on the strength of the structural docs reconcile and the landed install/provision/update/rollback evidence. The successor selected milestone is the GCE self-hosted Web UI path; public ingress and TLS remain approval-gated follow-ups under that track.

Design Grounding

Closeout And Decomposition

This proposal is reachable from docs/SUMMARY.md, and the installable-system milestone framing is recorded in docs/roadmap.md.

Turning this design into actionable backlog + implementation tasks was a separate task, installable-system-decomposition, which decomposed the track against the landed BlockDevice/filesystem/Store/writable-persistence/disk-image contracts in Installable System. The behavior track then landed the data-region mount, overlay compose, generation/rollback machinery, integrated disk packaging, target-disk install, first-boot provision, and update/rollback flows. This proposal has now been structurally reconciled to those landed shapes: integrated installed disk packaging over an ESP plus fixed-LBA data region, writable-filesystem + content-addressed Store grounding for persistent naming and generation markers, RAM-only Namespace, and no system-generation field on the Store/Namespace/SystemManifest path. The proposal text and backlog track therefore describe the same bounded local/QEMU contract.