Design Proposal: Installable capOS System
This is a design proposal with its bounded local/QEMU path landed. The
persistent data-region mount, config-overlay schema + init compose/merge with
fail-closed fallback (make run-installable-overlay), generation/rollback
machinery (make run-installable-generation), integrated installable disk
(make run-installable-disk), target-disk install
(make run-installable-install), first-boot provision
(make run-installable-provision), and update/rollback flow
(make run-installable-update) are implemented. The storage and disk-image
prerequisites it builds on have also landed (see
Build-On Relationship):
block-device-backed read-only and writable filesystems, a persistent
content-addressed Store, reboot-surviving writable persistence, and a hybrid
BIOS+UEFI disk image. This proposal has been reconciled against those landed
contracts and is decomposed separately (see
Closeout And Decomposition).
Throughout, landed behavior is written in the present tense and planned
behavior in the future/conditional tense. The installed-system proof remains a
bounded local/QEMU result: it does not claim secure boot/signing, production
release authority, public ingress, AWS/Azure live support, direct-remapping
production hardware, userspace smoltcp/L4 readiness, or a persistent
Namespace.
Problem
The baseline capOS boot path is a boot-from-image research system. The build
packs a Cap’n Proto SystemManifest (compiled from system.cue) plus the
userspace binaries into an ISO; Limine loads the manifest as a module; the
kernel parses it, builds init’s bootstrap caps, and enters the single
initConfig.init process. The boot-binary ISO layout (behind the boot_iso
feature) can instead read ELFs on demand from /boot/bins/ so the manifest
carries names only. Without the installable-system path, the system that boots
is still exactly the image that was built: the next boot re-reads the same
immutable manifest and rebuilds the same capability graph.
That baseline is correct for a research image and insufficient for an installed system. An installed capOS is one that:
- boots from a local disk rather than a re-imaged ISO each time;
- carries mutable system configuration – installed services, local accounts, network/runtime settings – that persists across reboot and is not baked into the image; and
- can be updated to a new system generation and rolled back to a known-good one.
The hard question is not the disk format. It is how persistent, mutable system configuration composes with the immutable boot manifest without reintroducing ambient authority or a single mutable blob that can brick the system. That composition is the center of this proposal.
Non-Goals
- Designing the block device, filesystem, or Store persistence mechanisms. Those are owned by Storage and Naming and the storage tracks in Hardware, Boot, and Storage. This proposal composes them and must not redesign them.
- Defining the local-account schema. That is Local Users, Storage, and Policy; the account store is a consumer of the persistent-config region designed here.
- Secure boot, image signing, and manifest trust. Those are tracked as storage-proposal Open Question #5 and the security/verification track; this proposal notes where a signature check would attach but does not specify the cryptography.
- Any cloud-image or non-ATAPI boot-binary loader work; see the Cloud Device Tracks backlog.
On-Disk Layout
The installed system needs three regions with distinct mutability and authority: a read-only boot region, an immutable-per-generation system region, and a mutable data region. How those regions map onto physical disks is the first reconciliation point, because the landed building blocks already fix part of it.
Landed shape:
- make image produces a single hybrid BIOS+UEFI raw image with one GPT ESP (FAT32) carrying Limine + the kernel + manifest.bin. That is the boot region (tools/mkdiskimage.sh, make run-disk/make run-disk-bios).
- The persistent content-addressed Store (CAPOSST1) and writable filesystem (CAPOSWF1) are co-located in the data-region image produced by tools/mkstore-image --writable. Focused storage and early data-region smokes can still attach that image as a separate virtio-blk device.
- The installable-system disk path has folded those regions onto one bootable disk: GPT partition 1 is the ESP and GPT partition 2 carries the co-located CAPOSST1+CAPOSWF1 data region at a fixed base LBA read through cap::data_region_base_lba (no GPT parser in the kernel). make run-installable-disk proves boot from that integrated disk.
- capos-system-install writes the same layout to a manifest-selected target BlockDevice and then boots the installed disk standalone (make run-installable-install). Provisioning and update operate on the installed data region after that floor exists.
Region placement decision (reconciled). The storage model is the
co-located CAPOSST1 Store + CAPOSWF1 writable filesystem data region,
not a persistent Namespace and not three independent mutable partitions. The
separate data-region disk remains a focused proof packaging, while the installed
system packages the ESP and data region onto one target disk. The kernel relies
on the fixed tool/kernel data-region LBA contract for the installable path.
flowchart TD
InstallDisk[Installed disk] --> Boot[GPT partition 1: ESP with Limine + kernel + boot manifest]
InstallDisk --> DataRegion[GPT partition 2: fixed-LBA data region]
DataRegion --> System[CAPOSST1 content-addressed Store: immutable generation objects]
DataRegion --> Data[CAPOSWF1 writable filesystem: config/account state and markers]
Boot -.init mounts data region when present.-> DataRegion
Data -.active and known-good marker files name hashes.-> System
The system and data regions share the co-located data region: immutable
generation objects live in the persistent Store (CAPOSST1), and mutable
config/accounts plus the active/known-good pointers live in the writable
filesystem (CAPOSWF1). The overlay read/validate/merge that composes them at
boot has landed for the installable-system path (see below).
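The fixed-LBA contract can be illustrated with a small sketch. This is not the kernel's code: the sector size, the function names, and the base LBA passed in are assumptions for illustration (the real base comes from cap::data_region_base_lba), and the offsets 0 and 256 are taken from the landed-status table under the assumption that they are relative to the start of the data region.

```rust
/// Hypothetical sector size; a real caller would take it from the BlockDevice.
const SECTOR_SIZE: u64 = 512;
/// Assumed offsets inside the data region (see the landed-status table).
const STORE_SUPERBLOCK_OFFSET_LBA: u64 = 0; // CAPOSST1 superblock
const WRITABLE_FS_OFFSET_LBA: u64 = 256; // CAPOSWF1 filesystem

#[derive(Debug, PartialEq, Eq)]
pub struct DataRegionLayout {
    pub store_superblock_lba: u64,
    pub writable_fs_lba: u64,
}

/// Resolve absolute LBAs from the fixed base. Returns None (fail closed)
/// when the region overflows, does not fit on the disk, or is too small
/// to contain the writable filesystem offset.
pub fn resolve_layout(
    base_lba: u64,
    region_sectors: u64,
    disk_sectors: u64,
) -> Option<DataRegionLayout> {
    let end = base_lba.checked_add(region_sectors)?;
    if end > disk_sectors || region_sectors <= WRITABLE_FS_OFFSET_LBA {
        return None;
    }
    Some(DataRegionLayout {
        store_superblock_lba: base_lba + STORE_SUPERBLOCK_OFFSET_LBA,
        writable_fs_lba: base_lba + WRITABLE_FS_OFFSET_LBA,
    })
}

/// Byte offset of an LBA, or None on overflow.
pub fn byte_offset(lba: u64) -> Option<u64> {
    lba.checked_mul(SECTOR_SIZE)
}
```

Because both sides derive the offsets from one shared base, the build tool and the kernel agree on the layout without parsing the GPT at boot.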
Boot region (read-only at runtime)
The single GPT ESP carrying Limine, the kernel, and the immutable boot
manifest – the same SystemManifest shape that exists today. This region is
what make image produces now (one hybrid BIOS+UEFI image, one ESP). At runtime
it is treated as read-only. The landed update proof stages and commits
generation objects in the data region; production boot-region rewrite,
rollback, signing, and release policy remain future work.
The boot manifest stays the root of trust for topology: it pins the kernel,
the init binary, and the minimum kernel-sourced caps init needs to bootstrap. In
the installable-system path, which system/config generation to activate is
named by writable-filesystem marker files and persistent Store hashes (see
Generations And Rollback); the SystemManifest
still carries no generation field. The boot manifest does not grow to hold
installed-service config or accounts.
System region (immutable per generation)
The landed persistent content-addressed Store (CAPOSST1, put/get/has/
delete keyed by SHA-256, durable across reboot) is the durable substrate for
immutable generation objects. The installable-system proofs exercise config and
account generation objects (SystemConfigOverlay plus related records) and the
marker-selected hashes that choose them. Package-manager-style system payload
generations – service binaries, default software configuration, and release
payload roots written into CAPOSST1 by a production updater – remain future
work.
The target system region is the system of record for what software is
installed, not a POSIX /usr, once those software-payload generation roots
exist. capOS has capabilities, not paths: a service is “installed” when the
generation root object binds its name to the content hash of its manifest
fragment and binary. Because the landed Namespace cap is RAM-only and does
not survive reboot, persistent name-to-hash bindings live inside generation
capnp objects in the Store and in writable-filesystem marker files, not in a
persistent Namespace cap (none exists). A Namespace may still be
populated in RAM at boot from those persistent bindings, and a StoreFS adapter
(storage proposal “Bridging the Two Models”) can expose a generation as a
directory tree for POSIX/WASI consumers, but the durable installable record is
the Store objects plus writable-filesystem markers.
Data region (mutable)
The landed writable filesystem (CAPOSWF1, full Directory mutation set +
File.write/truncate/sync/close under a fail-closed single-writer policy,
co-located with the persistent Store). It holds everything that legitimately
changes at runtime and must survive reboot:
- Persistent system configuration – the central subject below: capnp overlay objects in the persistent Store plus writable-filesystem marker files for the active/known-good pointers.
- Local account store – the provision proof writes an operator account record as a persistent Store object and config overlay input; broader durable account policy remains owned by local-users-management.md Gate 3.
- Per-account home/config/cache subtrees and service state.
The data region is mutated under capability authority only. There is no global
filesystem root and no ambient path-based access: a service receives a
writable-filesystem Directory cap (or a Store cap) scoped to its own subtree,
exactly as Storage and Naming
describes for attenuated grants.
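As a rough model of what "scoped to its own subtree" means, the sketch below narrows a directory capability and drops write authority. It is an illustration only: DirCap, CapError, and the method names are hypothetical and are not the capOS Directory interface.

```rust
/// Hypothetical model of an attenuated directory capability.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DirCap {
    prefix: Vec<String>, // subtree this cap is confined to
    writable: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CapError {
    InvalidName,
    ReadOnly,
}

/// A single path component: no separators and no parent/self references.
fn valid_component(name: &str) -> bool {
    !name.is_empty() && name != "." && name != ".." && !name.contains('/')
}

impl DirCap {
    pub fn root_writable() -> Self {
        DirCap { prefix: Vec::new(), writable: true }
    }

    /// Narrow to a child subtree; the result cannot name anything outside it.
    pub fn sub(&self, name: &str) -> Result<DirCap, CapError> {
        if !valid_component(name) {
            return Err(CapError::InvalidName);
        }
        let mut prefix = self.prefix.clone();
        prefix.push(name.to_string());
        Ok(DirCap { prefix, writable: self.writable })
    }

    /// Drop write authority; there is deliberately no inverse operation.
    pub fn read_only(&self) -> DirCap {
        DirCap { prefix: self.prefix.clone(), writable: false }
    }

    /// Path a write to `name` would touch, if this cap permits the write.
    pub fn write_target(&self, name: &str) -> Result<String, CapError> {
        if !self.writable {
            return Err(CapError::ReadOnly);
        }
        if !valid_component(name) {
            return Err(CapError::InvalidName);
        }
        let mut parts = self.prefix.clone();
        parts.push(name.to_string());
        Ok(parts.join("/"))
    }
}
```

A holder of the cap for system/config can write overlay.bin there, but it has no operation that reaches a sibling such as system/accounts, and a read-only copy cannot be upgraded back to a writable one.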
Why not “a filesystem is the system of record”
A traditional install makes / the source of truth and layers config files,
package databases, and /etc over it. That is ambient authority through paths,
which capOS rejects by design (storage proposal, “The Problem with
Filesystems”). Here the capability object graph is authoritative; the
durable installable-system record is the persistent Store objects plus
writable-filesystem marker files, and a filesystem view is an adapter over that
authority rather than ambient authority itself. The on-disk bytes may be a
filesystem for tooling convenience, but the system model is capability-native.
Beyond-Boot-Manifest Configuration (Central Decision)
This is the core of the proposal. Today the system is fully described by the static boot manifest. An installed system needs persistent, mutable configuration that the boot manifest cannot carry, while keeping the boot manifest’s fail-closed guarantees.
The model is a two-layer composition resolved by init at boot:
- Base layer – the immutable boot manifest. Pins the kernel, the init binary (the init mandate from Run Targets, Init Mandate, and Default-Run Integration Gate B still applies: initConfig.init.binary must be init), the kernel-sourced bootstrap caps, and the floor of services and policy the system always runs. This layer is authoritative and cannot be overridden by persistent state – it is the recovery anchor.
- Overlay layer – the persistent config generation. A capnp-encoded configuration object naming additional installed services, local network/runtime settings, and account bindings. The object is content-stored in the persistent Store (CAPOSST1); a well-known writable-filesystem path (proposed system/config/, a CAPOSWF1 directory) holds the small marker files that name the current and known-good generation by content hash. The landed Namespace is RAM-only, so this is filesystem-path + content-hash grounding, not a persistent Namespace root. This is what capos-system-install, capos-system-provision, and capos-system-update write in the landed local proofs.
Precedence and merge model
init reads the base manifest from BootPackage (as it does today). The overlay
step landed in 2026-05 (installable-config-overlay-schema-and-merge): when
the data region mounts, init reads system/config/overlay.bin, decodes the
SystemConfigOverlay capnp object, and – only if it validates against the
base’s declared SystemManifest.extensionPoints – composes its additional
services over the base plan (SystemConfigOverlay::compose_onto, proof
make run-installable-overlay). Generation selection also landed
(installable-system-generation-rollback): writable-filesystem marker files
select the active/known-good object hashes and provide failed-boot fallback.
The merge rules are deliberately conservative:
- Base pins win. Anything the base manifest declares (kernel caps, the init binary, floor services, policy floors) is not overridable by the overlay. The overlay may only add services and fill in settings the base marks as overlay-supplied. This prevents a tampered or buggy overlay from dropping a recovery service or widening authority.
- Overlay adds, within declared extension points. The base manifest declares named extension points (e.g. “additional services”, “network config”, “account store location”). The overlay may bind only those. An overlay key that does not match a declared extension point is rejected, not silently applied – closed by default, mirroring the existing “omitted cap sources fail closed” invariant (manifest-startup.md).
- No new authority classes. The overlay can request services be started with caps the base manifest already authorizes init to delegate. It cannot mint kernel-source authority that the base did not grant init. The interface is the permission: an overlay names which already-authorized caps a service gets, never a new kernel cap source.
This is layering, not free-form override: the base manifest is a contract and the overlay fills declared holes in it.
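The three merge rules above can be sketched as one all-or-nothing function. This is a simplified model, not SystemConfigOverlay::compose_onto: every type, field, and error name here is hypothetical, and the overlay is reduced to services grouped by extension-point name.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ServiceSpec {
    pub binary: String,
    pub caps: BTreeSet<String>,
}

/// Hypothetical view of the base manifest relevant to merging.
pub struct BaseManifest {
    pub services: BTreeMap<String, ServiceSpec>, // floor services (pinned)
    pub extension_points: BTreeSet<String>,      // declared holes
    pub delegable_caps: BTreeSet<String>,        // caps init may hand out
}

/// Hypothetical overlay: extension point -> service name -> spec.
pub struct Overlay {
    pub bindings: BTreeMap<String, BTreeMap<String, ServiceSpec>>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ComposeError {
    UnknownExtensionPoint(String),
    OverridesBasePin(String),
    DuplicateService(String),
    UnauthorizedCap { service: String, cap: String },
}

/// Compose the overlay onto the base. Any violation rejects the overlay
/// whole; the partially built plan is dropped and never returned.
pub fn compose(
    base: &BaseManifest,
    overlay: &Overlay,
) -> Result<BTreeMap<String, ServiceSpec>, ComposeError> {
    let mut plan = base.services.clone();
    for (point, services) in &overlay.bindings {
        // Closed by default: undeclared extension points are an error.
        if !base.extension_points.contains(point) {
            return Err(ComposeError::UnknownExtensionPoint(point.clone()));
        }
        for (name, spec) in services {
            // Base pins win: the overlay may add, never replace.
            if base.services.contains_key(name) {
                return Err(ComposeError::OverridesBasePin(name.clone()));
            }
            if plan.contains_key(name) {
                return Err(ComposeError::DuplicateService(name.clone()));
            }
            // No new authority classes: only caps init may already delegate.
            if let Some(cap) = spec.caps.iter().find(|cap| !base.delegable_caps.contains(*cap)) {
                return Err(ComposeError::UnauthorizedCap {
                    service: name.clone(),
                    cap: cap.clone(),
                });
            }
            plan.insert(name.clone(), spec.clone());
        }
    }
    Ok(plan)
}
```

Returning an error instead of skipping the offending entry matters here: a partially applied overlay would be a configuration nobody wrote, whereas a rejected one leaves the well-defined base floor.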
Where persistent config physically and logically lives
- Physically: the data region – the landed persistent Store (CAPOSST1) for the immutable per-generation capnp objects, and the landed writable filesystem (CAPOSWF1) for the small active/known-good marker files.
- Logically: a CAPOSWF1 directory tree, e.g. system/config/, holds one marker file naming the current generation object by content hash (plus the retained known-good marker); the generation objects themselves are content-stored in the persistent Store. Account records live under a sibling system/accounts/ tree the same way (consumed by local-users-management.md). There is no persistent Namespace cap; the RAM Namespace is repopulated from these bindings at boot if needed.
- Authority: only a narrow system-config authority (held by init and the dedicated install/provision/update proof services) may write the system/config writable-filesystem subtree and the system-config Store objects. Ordinary services receive read-only scoped views or nothing. This is the writable-filesystem Directory/Store attenuation model, not a new mechanism.
Detecting and recovering from a bad persistent layer
The overlay is the most dangerous new surface: a corrupt or hostile overlay must never prevent boot. The design is fail-safe by construction:
- Validation before merge (landed). init validates the overlay against the base manifest’s declared extension points and the schema before applying any of it: a schema-invalid, version-mismatched, content-hash-mismatched, stale-epoch, or extension-point-violating overlay is rejected whole (SystemConfigOverlay::from_capnp_bytes + compose_onto). Validation reuses the existing capos-config discipline rather than a parallel checker.
- Monotonic generation + integrity. No landed Store, Namespace, or SystemManifest schema carries a system-generation/epoch field (other caps such as AccountRecord and the DDF revocation generations do, but not the installable-system path). The overlay instead carries a monotonic epoch and a SHA-256 contentHash inside its own SystemConfigOverlay capnp object (both landed in track item 3): the epoch is checked against the base’s minOverlayEpoch floor and the content hash is a self-consistency check. The writable-filesystem marker files that record which hash is active/known-good landed in the generation/rollback path. This mirrors the stale-write and monotonic-version rules already required for the account store (local-users-management.md Gate 3) and the managed-cloud store (storage proposal “Managed Cloud Backing”) without extending Store or Namespace. A stale overlay (epoch below the floor) is rejected.
- Boot-with-base fallback. If the data region does not mount, or the active overlay fails validation, init boots from the base manifest alone and surfaces the failure (serial diagnostics / audit). The system always reaches at least the floor configuration, which by construction includes a recovery path.
- Known-good generation pointer. The active overlay pointer is advanced only after a generation is proven to boot (see Generations And Rollback); a failed new generation leaves active on the prior known-good one.
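The validate-then-fall-back shape described in this list can be sketched as follows. The record layout, SUPPORTED_SCHEMA_VERSION, and all type names are assumptions for illustration, not the SystemConfigOverlay schema. The hash function is injected as a parameter so the sketch stays standard-library only; the landed path uses SHA-256.

```rust
/// Hypothetical schema version accepted by this init build.
pub const SUPPORTED_SCHEMA_VERSION: u32 = 1;

/// Simplified stand-in for a decoded overlay object.
pub struct OverlayRecord {
    pub schema_version: u32,
    pub epoch: u64,
    pub content_hash: [u8; 32],
    pub payload: Vec<u8>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Rejection {
    DataRegionUnavailable,
    VersionMismatch,
    StaleEpoch,
    ContentHashMismatch,
}

#[derive(Debug, PartialEq, Eq)]
pub enum BootPlan {
    BaseWithOverlay { epoch: u64 },
    /// Base floor only; the reason is what init would surface in diagnostics.
    BaseOnly(Rejection),
}

/// Check the overlay before any of it is applied.
pub fn validate(
    overlay: &OverlayRecord,
    min_overlay_epoch: u64,
    hash: impl Fn(&[u8]) -> [u8; 32],
) -> Result<(), Rejection> {
    if overlay.schema_version != SUPPORTED_SCHEMA_VERSION {
        return Err(Rejection::VersionMismatch);
    }
    if overlay.epoch < min_overlay_epoch {
        return Err(Rejection::StaleEpoch);
    }
    if hash(&overlay.payload) != overlay.content_hash {
        return Err(Rejection::ContentHashMismatch);
    }
    Ok(())
}

/// `overlay` is None when the data region did not mount. Every failure
/// path yields the base floor, never a partially applied overlay.
pub fn choose_boot_plan(
    overlay: Option<&OverlayRecord>,
    min_overlay_epoch: u64,
    hash: impl Fn(&[u8]) -> [u8; 32],
) -> BootPlan {
    match overlay {
        None => BootPlan::BaseOnly(Rejection::DataRegionUnavailable),
        Some(o) => match validate(o, min_overlay_epoch, hash) {
            Ok(()) => BootPlan::BaseWithOverlay { epoch: o.epoch },
            Err(reason) => BootPlan::BaseOnly(reason),
        },
    }
}
```

The function has no return value that means "do not boot": the worst outcome is BaseOnly with a reason attached, which is the fail-safe property this section requires.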
Install / Provision / Update / Rollback Flow
The local/QEMU install, provision, update, and rollback flows have landed. They prove the authority and durability shape over capOS capabilities; they do not claim a production release/update service, secure boot/signing, public ingress, or live multi-provider deployment readiness.
Install
The capos-system-install userspace service takes the packaged image source
from the booted CD-ROM /boot/bins/ tree and writes the installable layout onto
a manifest-selected target disk. It holds only the read-only
installable_image_source Directory and the target-scoped
block_device_target BlockDevice; it cannot reach the boot disk through that
target cap.
The service writes the boot-region head (BOOTHEAD.BIN: protective MBR,
primary GPT, FAT ESP with Limine + release kernel + base manifest), writes the
backup GPT (BOOTGPT.BIN) at the LBA named by the primary GPT header, and
initializes the empty data region (DATAIMG.BIN: empty CAPOSST1 Store +
CAPOSWF1 filesystem with system/config) at the fixed
cap::data_region_base_lba. It validates every sector range and verifies the
read-back before treating the install as complete. The empty data region is the
install floor; the first non-empty config generation is provisioning.
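The range validation and read-back verification can be sketched against an in-memory disk. MemDisk and InstallError are hypothetical stand-ins for the target-scoped BlockDevice cap and the installer's error type; the method names only loosely echo readBlocks/writeBlocks.

```rust
/// In-memory stand-in for a target BlockDevice (illustration only).
pub struct MemDisk {
    sector_size: usize,
    data: Vec<u8>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum InstallError {
    Unaligned,
    OutOfRange,
    ReadBackMismatch,
}

impl MemDisk {
    pub fn new(sectors: usize, sector_size: usize) -> Self {
        assert!(sector_size > 0, "sector size must be non-zero");
        MemDisk { sector_size, data: vec![0; sectors * sector_size] }
    }

    fn sectors(&self) -> u64 {
        (self.data.len() / self.sector_size) as u64
    }

    /// Validate a sector range and turn it into a byte range.
    fn range(&self, lba: u64, len: usize) -> Result<std::ops::Range<usize>, InstallError> {
        if len % self.sector_size != 0 {
            return Err(InstallError::Unaligned);
        }
        let count = (len / self.sector_size) as u64;
        let end = lba.checked_add(count).ok_or(InstallError::OutOfRange)?;
        if end > self.sectors() {
            return Err(InstallError::OutOfRange);
        }
        let start = lba as usize * self.sector_size;
        Ok(start..start + len)
    }

    pub fn write_blocks(&mut self, lba: u64, bytes: &[u8]) -> Result<(), InstallError> {
        let r = self.range(lba, bytes.len())?;
        self.data[r].copy_from_slice(bytes);
        Ok(())
    }

    pub fn read_blocks(&self, lba: u64, len: usize) -> Result<&[u8], InstallError> {
        let r = self.range(lba, len)?;
        Ok(&self.data[r])
    }
}

/// Write one image segment, then confirm the device returns the same bytes.
pub fn write_verified(disk: &mut MemDisk, lba: u64, image: &[u8]) -> Result<(), InstallError> {
    disk.write_blocks(lba, image)?;
    if disk.read_blocks(lba, image.len())? != image {
        return Err(InstallError::ReadBackMismatch);
    }
    Ok(())
}
```

An installer built this way would call write_verified once per segment (boot-region head, backup GPT, empty data region) and report success only if every call returned Ok.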
make run-installable-install proves the flow in two passes: pass 1 installs
into the target virtio-blk disk, and pass 2 boots that disk standalone with no
CD-ROM and reaches the base service with the data region mounted.
Provision
First-boot provisioning writes the initial persistent config: the
operator’s first local account record, network/runtime settings, and any
additional services to start. capos-system-provision runs as PID 1 over an
installed system’s persistent data region with only Console,
writable_fs_root, and persistent_store caps. On the empty install floor, it
writes the first non-empty SystemConfigOverlay generation, commits the
generation object to the Store, writes system/config/overlay.bin, and
advances the gen-active marker. Until provisioning runs, the system boots on
the base-manifest floor.
make run-installable-provision boots the same empty-config disk twice: pass 1
provisions the account/settings/additional service, and pass 2 re-reads the
active generation and account record from the data region to prove they survived
reboot.
Update
The landed update flow applies a new generation over the same persistent
Store + writable system/config region used by provisioning. The local proof
does not rewrite a production boot region or ship a signed release updater; it
proves staged generation commit, failed-candidate fallback, and base-overlay
revalidation.
- Write the new generation into the content-addressed Store as a new root hash; the old generation’s objects remain (content-addressing dedups shared objects).
- Stage a new active-candidate pointer; do not advance active yet.
- Reboot into the candidate. If it reaches a health checkpoint, commit by advancing active. If not, the boot-with-known-good fallback keeps the prior generation (see below).
Persistent config (the overlay and accounts in the data region) is carried across updates: the data region’s config/account generations persist across candidate staging, commit, and fallback. Where a new base no longer admits an overlay’s declared authority, the overlay is re-validated against that base and falls back to the base floor with a surfaced error rather than applying partially.
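The stage-then-commit steps can be modelled as a small pointer state machine. The types below are hypothetical and keep the markers in memory; in the landed path they are writable-filesystem marker files, and the hash names a Store object.

```rust
pub type Hash = [u8; 32];

/// A marker naming a generation by content hash plus a monotonic epoch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Pointer {
    pub hash: Hash,
    pub epoch: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum UpdateError {
    StalePointer,
    NoCandidate,
}

/// Hypothetical in-memory view of the active and candidate markers.
#[derive(Debug, PartialEq, Eq)]
pub struct UpdateMarkers {
    pub active: Pointer,
    pub candidate: Option<Pointer>,
}

impl UpdateMarkers {
    /// Stage a candidate without touching `active`. A pointer whose epoch
    /// does not advance past the active one is rejected as stale.
    pub fn stage(&mut self, next: Pointer) -> Result<(), UpdateError> {
        if next.epoch <= self.active.epoch {
            return Err(UpdateError::StalePointer);
        }
        self.candidate = Some(next);
        Ok(())
    }

    /// Called after booting the candidate. `healthy` means the health
    /// checkpoint was reached. Returns the pointer that is active afterwards.
    pub fn resolve_candidate(&mut self, healthy: bool) -> Result<Pointer, UpdateError> {
        let candidate = self.candidate.take().ok_or(UpdateError::NoCandidate)?;
        if healthy {
            self.active = candidate; // commit: advance active
        }
        // Otherwise the candidate is dropped and active is unchanged.
        Ok(self.active)
    }
}
```

Staging generation 2 and resolving it as healthy, then staging generation 3 and resolving it as unhealthy, leaves active on generation 2, which is the sequence make run-installable-update exercises across its three boots.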
make run-installable-update boots the same empty-config disk three times:
boot 1 provisions known-good generation 1, rejects an overlay against a
revoked-cap base, and stages a healthy generation 2; boot 2 commits generation
2 across reboot and stages a failing generation 3; boot 3 auto-falls back from
generation 3 to known-good generation 2 while preserving the data region.
Generations and rollback
The active system/config generation is named by writable-filesystem marker
files (CAPOSWF1) carrying a content hash and monotonic pointer epoch – not by
a SystemManifest field, since the manifest schema carries no system-generation
field. The generation objects themselves are immutable content-addressed roots
in the persistent Store. Rollback is:
- System rollback: point the active system-generation hash back to the prior known-good generation. Because generations are immutable content-addressed roots, the prior generation’s bytes are still present; rollback is a pointer move plus reboot, not a re-extraction.
- Config rollback: point the active overlay binding back to the prior overlay generation, retained for a bounded number of generations.
- Failed-boot auto-fallback: a generation is promoted to known-good only after it reaches a defined health checkpoint. A boot that does not reach the checkpoint (kernel panic, init failure, overlay validation failure) is detected on the next boot via a “boot attempt count vs confirmed” marker, and the init/generation logic reverts to the last confirmed generation. This is the standard A/B-generation pattern, expressed over content-addressed Store roots rather than two fixed partitions.
make run-installable-generation proves this machinery before the full update
flow: it stages a candidate, records a boot attempt before applying it, rejects
a stale pointer, proves config rollback to a retained generation, and
auto-falls back to the known-good generation across a fresh reboot when a
candidate is left unconfirmed.
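The attempt-versus-confirmed detection can be sketched as below. This is a simplified model under stated assumptions: generations are plain integers standing in for Store hashes, a single boolean stands in for the attempt marker, and all names are hypothetical.

```rust
/// Hypothetical in-memory view of the boot markers.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BootMarkers {
    pub known_good: u64,        // stand-in for the known-good Store hash
    pub candidate: Option<u64>, // staged but not yet confirmed
    pub attempt_recorded: bool, // written before the candidate is applied
}

#[derive(Debug, PartialEq, Eq)]
pub enum BootChoice {
    KnownGood(u64),
    TryCandidate(u64),
    FellBack { failed: u64, to: u64 },
}

/// Decide which generation this boot should apply.
pub fn select_generation(m: &mut BootMarkers) -> BootChoice {
    let candidate = m.candidate;
    match candidate {
        None => BootChoice::KnownGood(m.known_good),
        Some(c) if m.attempt_recorded => {
            // An attempt was recorded but never confirmed, so the previous
            // boot did not reach the health checkpoint: drop the candidate.
            m.candidate = None;
            m.attempt_recorded = false;
            BootChoice::FellBack { failed: c, to: m.known_good }
        }
        Some(c) => {
            // Record the attempt before applying the candidate, so a crash
            // after this point is visible on the next boot.
            m.attempt_recorded = true;
            BootChoice::TryCandidate(c)
        }
    }
}

/// Called once the health checkpoint is reached: promote the candidate.
pub fn confirm_healthy(m: &mut BootMarkers) {
    if let Some(c) = m.candidate.take() {
        m.known_good = c;
        m.attempt_recorded = false;
    }
}
```

The ordering is the important part: because the attempt marker is set before the candidate runs, a kernel panic or init failure needs no cleanup code of its own for the next boot to notice it.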
Build-On Relationship To Landed And Planned Work
This proposal is an integration design over existing tracks. It must not redesign them. Current state of each piece it builds on:
| Building block | Owning track | Status today |
|---|---|---|
| Persistent content-addressed Store | Storage and Naming | landed: CAPOSST1 superblock at LBA 0, put/get/has/delete keyed by SHA-256, durable across reboot (persistentStore grant source; reboot proof make run-storage-persist). RAM-backed Store CapObject + userspace RAM Store service also landed |
| Namespace model | Storage and Naming | landed but RAM-only: resolve/bind/list/sub, not persistent (namespace grant source). No persistent Namespace cap exists |
| BlockDevice boundary | Hardware, Boot, and Storage “Reusable Block-Device Path” / “Local Disk Storage” | landed: readBlocks/writeBlocks/info/flush over a real cfg(qemu) virtio-blk device (blockDevice grant source; proof make run-virtio-blk) |
| Read-only filesystem over BlockDevice | “Local Disk Storage Milestone” | landed: CAPOSRO1 superblock, Directory.list/open/sub + File.read/stat, mutating methods fail closed (readOnlyFsRoot; proof make run-storage-fs) |
| Writable persistence across reboot | “Writable Local Storage Milestone” | landed: CAPOSWF1 writable filesystem at LBA 256, full Directory mutation set + File.write/truncate/sync/close, fail-closed single-writer (writableFsRoot; reboot proof make run-storage-writable). Co-located with CAPOSST1 via tools/mkstore-image --writable |
| Bootable disk image (make image, make run-disk) | “Bootable Disk Image” | landed: single hybrid BIOS+UEFI raw image with one GPT ESP carrying Limine + kernel + manifest.bin; make image/run-disk/run-disk-bios; GCP/AWS provider packaging. The boot-binary ISO layout’s on-demand reads also landed behind boot_iso |
| Boot manifest / SystemManifest / init mandate | Manifest and Service Startup, Run Targets, Init Mandate, and Default-Run Integration | landed: static manifest, init-owned service graph, name-only boot-ISO path. The installable path additionally reads and validates a persistent overlay only when the data region is mounted and the base manifest declares matching extension points (make run-installable-overlay) |
| Local account store (a consumer) | Local Users, Storage, and Policy Gate 3 | partially landed for installable proof: capos-system-provision writes and re-reads one operator account record through persistent Store/writable-filesystem state; full durable account policy remains future |
The storage and disk-image prerequisites have landed, and the bounded
installable-system composition has landed on top of them: overlay
read/validate/merge, generation marker files, install, provision, and
update/rollback flows all have local QEMU evidence. The decomposition task
(installable-system-decomposition)
required ddf-blockdevice-boundary-virtio-blk-smoke,
storage-readonly-fs-over-blockdevice, storage-persistent-store-reboot-proof,
storage-writable-persistence-reboot-proof, and disk-image-provider-packaging
to be done before emitting implementation tasks; they are. Because some
prerequisites landed with contracts that differ from this proposal’s original
projections (single hybrid ESP rather than three boot/system/data partitions,
RAM-only Namespace rather than a persistent one, no system-generation field on
the Store/Namespace/SystemManifest path), this proposal has been reconciled
to the landed shapes above so the track does not encode a stale contract.
Production hardening remains separate: secure boot/signing, authorized release
publication, public ingress, broader cloud-provider coverage, direct-remapping
production hardware, and full durable local-account policy are not implied by
the local installable-system evidence.
Milestone Framing
installable-system is its own milestone: “an installed, persistent capOS
that boots from disk and keeps mutable system configuration across reboots.” It
is a distinct, user-visible product outcome from the storage and bootable-disk
image milestones it builds on, even though it depends on them – a user can have
block devices, a filesystem, and a bootable disk image without having an
installed, self-configuring, updatable system.
This framing is recorded in Roadmap. The milestone became the selected milestone after Device Driver Foundation closed and is now closed for the bounded local/QEMU installable-system contract by the structural docs reconcile and the landed install/provision/update/rollback evidence. The successor selected milestone is the GCE self-hosted Web UI path; public ingress and TLS remain approval-gated follow-ups under that track.
Design Grounding
- Hardware, Boot, and Storage – Local Disk Storage, Writable Local Storage, and Bootable Disk Image tracks (the storage/boot prerequisites this design composes).
- Local Users, Storage, and Policy – manifest-seeded vs disk-backed accounts (Gate 3); a concrete consumer of the persistent-config region.
- Run Targets, Init Mandate, and Default-Run Integration – the init mandate and boot-manifest policy any installed-system boot path must respect.
- Storage and Naming – the Store/Namespace/Directory/File model, content-addressing, attenuation, and the managed-cloud/stale-write rules the persistence layer reuses.
- Manifest and Service Startup and the system.cue -> SystemManifest boot path – the immutable base the persistent overlay composes with.
- Cloud Deployment – the cloud disk-image/import path that an installed-system image must remain compatible with.
Closeout And Decomposition
This proposal is reachable from docs/SUMMARY.md, and the installable-system
milestone framing is recorded in docs/roadmap.md.
Turning this design into actionable backlog + implementation tasks is a
separate task,
installable-system-decomposition,
which decomposed the track against the landed
BlockDevice/filesystem/Store/writable-persistence/disk-image contracts in
Installable System. The
behavior track then landed the data-region mount, overlay compose,
generation/rollback machinery, integrated disk packaging, target-disk install,
first-boot provision, and update/rollback flows. This proposal has now been
structurally reconciled to those landed shapes: integrated installed disk
packaging over an ESP plus fixed-LBA data region, writable-filesystem +
content-addressed Store grounding for persistent naming and generation
markers, RAM-only Namespace, and no system-generation field on the
Store/Namespace/SystemManifest path. The proposal text and backlog track
therefore describe the same bounded local/QEMU contract.