Proposal: Capability-Based Service Architecture

How capOS processes receive authority, compose into services, and expose layered capabilities — without a service manager daemon.

Problem

Traditional OSes grant processes ambient authority (file system, network, IPC namespaces) and then restrict it via sandboxing (seccomp, namespaces, AppArmor). Service managers like systemd handle dependencies, lifecycle, and resource limits through a central daemon with a massive configuration surface.

capOS inverts this: processes start with zero authority and receive only the capabilities they need. The capability graph implicitly encodes service dependencies, resource limits, and access control. No central daemon required.

Process Startup Model

A process receives its entire authority as a set of named capabilities at spawn time. There is no ambient authority to fall back on — if a capability wasn’t granted, the operation is impossible.

The child process sees its granted capabilities by name. It cannot discover or request capabilities it wasn’t given.
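A minimal sketch of this discovery model (the `CapSet` name comes from later in this proposal; the field and method shapes here are assumptions, not the real ABI): the child's entire view of its authority is a name → (cap_id, interface_id) table, and a name that was never granted simply does not resolve.

```rust
use std::collections::HashMap;

// Illustrative model of spawn-time capability discovery: a name ->
// (cap_id, interface_id) table is the child's whole world.
pub struct CapSet {
    pub entries: HashMap<String, (u32, u64)>,
}

impl CapSet {
    // Lookup is the only discovery mechanism. There is no
    // "request more authority" call to fall back on.
    pub fn get(&self, name: &str) -> Option<(u32, u64)> {
        self.entries.get(name).copied()
    }
}

fn main() {
    let mut entries = HashMap::new();
    entries.insert("timer".to_string(), (3u32, 0x9aa2u64));
    let caps = CapSet { entries };

    assert_eq!(caps.get("timer"), Some((3, 0x9aa2)));
    assert!(caps.get("spawner").is_none()); // never granted, never visible
}
```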

Capability Layering

Each process consumes lower-level capabilities and exports higher-level ones. Authority narrows at every layer:

Kernel
  │
  ├─ Nic cap (raw frame send/receive for one device)
  ├─ Timer cap (monotonic clock)
  ├─ DeviceMmio cap (one device's BAR regions)
  └─ Interrupt cap (one IRQ line)
       │
       v
NIC Driver Process
  │
  └─ Nic cap ──> Network Stack Process
                   │
                   ├─ TcpSocket cap (one connection)
                   ├─ UdpSocket cap (one socket)
                   └─ NetworkManager cap (create sockets)
                        │
                        v
                   HTTP Service Process
                     │
                     ├─ Fetch cap (any URL)
                     │    │
                     │    v
                     │  Trusted Process (holds Fetch, mints scoped caps)
                     │
                     └─ HttpEndpoint cap (one origin)
                          │
                          v
                     Application Process

The application at the bottom holds an HttpEndpoint cap scoped to a single origin. It cannot make raw TCP connections, send arbitrary packets, or touch any device. The capability is the security policy.

HTTP Capabilities

Two levels of HTTP capability: Fetch (general) and HttpEndpoint (scoped). HttpEndpoint is implemented by a process that holds a Fetch cap and restricts it.

Fetch

Unrestricted HTTP access — equivalent to the browser Fetch API. The holder can make requests to any URL. This is the base capability that HTTP service processes use internally.

interface Fetch {
    # General-purpose HTTP request to any URL.
    request @0 (url :Text, method :Text, headers :List(Header), body :Data)
        -> (status :UInt16, headers :List(Header), body :Data);
}

struct Header {
    name @0 :Text;
    value @1 :Text;
}

Fetch is powerful — granting it is roughly equivalent to granting arbitrary outbound network access. It should only be held by service processes that need to make requests on behalf of others, not by application code directly.

HttpEndpoint

A restricted view of Fetch, scoped to a single origin. The holder can only make requests within the bounds encoded in the capability.

interface HttpEndpoint {
    # Request scoped to this endpoint's origin.
    # Path is relative (e.g., "/v1/users").
    request @0 (method :Text, path :Text, headers :List(Header), body :Data)
        -> (status :UInt16, headers :List(Header), body :Data);
}

Note: same request() signature as Fetch, but path instead of url. The origin is implicit — bound into the capability at mint time.

Attenuation

A process holding Fetch mints HttpEndpoint caps by narrowing authority. The core restriction is always origin — Fetch can reach any URL, HttpEndpoint is locked to one host. Additional constraints (path prefixes, method restrictions, rate limits) are possible but are userspace policy details, not OS-level concerns.

This is the standard object-capability attenuation pattern: same interface, less authority. The application code is identical whether it holds a broad or narrow HttpEndpoint.
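The pattern above can be sketched with mock types (`Fetchish`, `MockFetch`, `EndpointProxy`, and the error string are all illustrative stand-ins, not the capOS API): the proxy holds the broad capability and re-exports the same request shape with the origin bound in at construction time.

```rust
// Stand-in for the broad Fetch authority: any URL is reachable.
pub trait Fetchish {
    fn request(&self, method: &str, url: &str) -> Result<u16, String>;
}

pub struct MockFetch;

impl Fetchish for MockFetch {
    fn request(&self, _method: &str, _url: &str) -> Result<u16, String> {
        Ok(200) // unrestricted: accepts any URL
    }
}

// Attenuated facet: the origin is fixed at mint time; callers supply only
// a relative path, so they can never name a foreign host.
pub struct EndpointProxy<F: Fetchish> {
    origin: String,
    inner: F,
}

impl<F: Fetchish> EndpointProxy<F> {
    pub fn new(origin: &str, inner: F) -> Self {
        Self { origin: origin.to_string(), inner }
    }

    pub fn request(&self, method: &str, path: &str) -> Result<u16, String> {
        if !path.starts_with('/') {
            return Err("path must be relative".to_string());
        }
        // The proxy, not the caller, assembles the full URL.
        self.inner.request(method, &format!("{}{}", self.origin, path))
    }
}

fn main() {
    let api = EndpointProxy::new("https://api.example.com", MockFetch);
    assert_eq!(api.request("GET", "/v1/users"), Ok(200));
    assert!(api.request("GET", "https://evil.example").is_err());
}
```

Note that the caller's code is the same whether `inner` is the real broad capability or another attenuating proxy; layers compose without the application knowing.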

Boot and Initialization Sequence

The kernel doesn’t know about services. It boots, creates a handful of kernel-provided caps, and spawns exactly one process: init. Everything else is init’s responsibility.

Current State vs Target State

The implementation has crossed the default init-owned startup milestone. Default system.cue sets config.initExecutesManifest; the kernel validates the full manifest, rejects manifests without that flag, boots only the first init service, grants init BootPackage and ProcessSpawner, and lets init resolve the remaining ServiceEntry graph through ProcessSpawner. The old kernel-side create_all_service_caps resolver remains as cleanup debt until the manifest schema shrinks.

The target model removes the kernel-side service graph entirely. The manifest stops being a kernel authority graph and becomes a boot package delivered to init:

  • List of embedded binaries (init needs them before any storage service exists; they can’t be fetched from a filesystem that hasn’t started).
  • Init’s config blob (CUE-encoded tree; what to spawn, with what attenuations, with what restart policy).
  • Kernel boot parameters (memory limits, feature flags) consumed by the kernel itself, not forwarded to init.
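A hypothetical in-memory shape for the first two items (field names are illustrative; the real layout is capnp-defined and read-only). Kernel boot parameters are deliberately absent: the kernel consumes them itself and does not forward them.

```rust
// Illustrative shape of the boot package handed to init.
pub struct BootPackage {
    pub binaries: Vec<(String, Vec<u8>)>, // name -> ELF bytes, embedded pre-storage
    pub init_config: Vec<u8>,             // CUE-encoded config blob for init
}

impl BootPackage {
    // Init resolves spawn requests against the embedded binaries by name;
    // nothing outside this list exists before storage services start.
    pub fn binary(&self, name: &str) -> Option<&[u8]> {
        self.binaries
            .iter()
            .find(|(n, _)| n == name)
            .map(|(_, bytes)| bytes.as_slice())
    }
}

fn main() {
    let pkg = BootPackage {
        binaries: vec![("net-stack".to_string(), vec![0x7f, b'E', b'L', b'F'])],
        init_config: b"services: {}".to_vec(),
    };
    assert!(pkg.binary("net-stack").is_some());
    assert!(pkg.binary("not-embedded").is_none());
}
```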

The kernel spawns exactly one userspace process (init) with a fixed cap bundle:

  • Console — kernel serial wrapper (may be replaced later by a userspace log service, with init retaining a direct console cap for emergency use).
  • ProcessSpawner — only init and its delegated supervisors hold this.
  • FrameAllocator — physical frame authority for init’s own allocations.
  • VirtualMemory — per-process address-space authority for init.
  • DeviceManager — enumerate/claim devices; init delegates device-specific slices to drivers.
  • Timer — monotonic clock.
  • BootPackage — read-only cap exposing the embedded binaries and the config blob.

Everything else — drivers, net-stack, filesystems, supervisors, apps — init spawns at runtime via ProcessSpawner with appropriate attenuation. No manifest ServiceEntry, no cross-service CapRef, no manifest exports.

Pre-Init Boundary After Stage 6

Rule of thumb: no userspace service runs before init. The kernel’s job is primitive cap synthesis and a single-process handoff; init’s job is the whole service graph. Concretely, after Stage 6:

  • Stays in kernel pre-init: memory map ingest, frame allocator, heap, paging, GDT/IDT/TSS, serial for kernel diagnostics, scheduler, ring dispatch, kernel-cap CapObject impls, ELF loading for init, boot package measurement (if attested boot is added).
  • Stays in manifest: binaries list + init config blob + kernel boot params. Schema-wise, ServiceEntry and CapSource::Service disappear; SystemManifest shrinks to binaries + initConfig + kernelParams.
  • Moves to init: service topology, cross-service cap wiring, attenuation, restart policies, dynamic spawn, cap export/import, supervision trees. Anything a service manager would do.
  • Moves to init or later services: logging policy, config store, secrets, filesystem mounts, network configuration, device binding.

Edge cases that might look like they want a pre-init service but don’t:

  • Early crash / panic handling. Kernel-side panic handler, no service needed.
  • Recovery shell. Kernel fallback: if init fails to reach a healthy state within a timeout (e.g. exits immediately, or never issues a liveness SQE), kernel optionally spawns a “recovery” binary from the boot package with the same cap bundle. Still just one userspace process at a time pre-supervisor-loop.
  • Attested/measured boot. Kernel hashes binaries in the boot package before handing BootPackage to init. The measurement agent, if any, runs as a normal service spawned by init with a cap to the sealed measurements.
  • Early-boot console. Kernel owns serial and exposes Console to init. A userspace log service can layer on top later; it is not pre-init.

Legacy Manifest Fields After Stage 6

ServiceEntry.caps, CapSource::Service, and ServiceEntry.exports are transitional. ProcessSpawner and the generic init-side spawn loop are now in place for system-spawn.cue; the remaining cleanup is to remove these fields from the kernel bootstrap contract:

  1. Delete ServiceEntry and CapSource::Service from schema/capos.capnp.
  2. Collapse SystemManifest.services into initConfig: CueValue.
  3. Remove create_all_service_caps, the two-pass resolver, and the manifest authority-graph validator (validate_manifest_graph).
  4. Kernel spawns one process from initConfig.initBinary with the fixed cap bundle described above plus BootPackage.

The re-export restriction added in capos-config::validate_manifest_graph (service A exports cap sourced from B.ep) becomes moot at that point because there are no manifest exports at all. It stays as defensive validation while the transitional schema exists.

Init Binary Embedding

Init is part of the kernel’s bootstrap contract, not a configuration choice: the cap bundle handed to init is a kernel ABI, the _start(ring, pid, …) entry shape is a kernel ABI, and a version-mismatched init is a footgun with no payoff in a single-init research OS. So the init ELF ships inside the kernel binary via include_bytes!, not as a separate manifest entry or Limine module.

Shape:

  • init/ stays a standalone crate with its own linker script and code model (user-space base 0x200000, static relocation model, 4 KiB alignment). Not a workspace member; different build flags than the kernel.
  • kernel/build.rs drives init/’s build (or depends on the prebuilt artifact at a known path) and emits an include_bytes!("…") into a kernel::boot::INIT_ELF: &[u8] static.
  • Kernel bootstrap parses INIT_ELF through the same capos_lib::elf path used for service binaries, creates the init address space via AddressSpace::new_user(), loads segments, populates the cap bundle (including BootPackage), and jumps. No Limine module lookup for init.
  • SystemManifest.binaries stops containing an “init” entry. Its binaries list is services-only. BootPackage exposes only what init hands out to children.
  • Measured-boot attestation (if added) covers the kernel ELF, which transitively covers init’s bytes. Service binaries are hashed separately by the kernel before handing BootPackage to init.
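The build.rs contract can be sketched as a pure function (the paths and the `INIT_ELF_PATH` variable name are assumptions): after driving init/'s build, the script's essential output is two cargo directives, so the kernel source can embed init via `include_bytes!(env!("INIT_ELF_PATH"))`.

```rust
// Sketch of the cargo directives a kernel/build.rs would emit to stdout
// after building (or locating) the init artifact.
pub fn init_embed_directives(artifact: &str, src_dir: &str) -> Vec<String> {
    vec![
        // Lets kernel code write: include_bytes!(env!("INIT_ELF_PATH"))
        format!("cargo:rustc-env=INIT_ELF_PATH={artifact}"),
        // Rebuild the kernel whenever init's sources change.
        format!("cargo:rerun-if-changed={src_dir}"),
    ]
}

fn main() {
    for directive in init_embed_directives("../init/target/release/init", "../init/src") {
        println!("{directive}");
    }
}
```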

What this does not change:

  • Init still runs in Ring 3 with its own page tables; embedding is byte packaging, not privilege merging.
  • Init is still ELF-parsed at boot — the same loader and W^X enforcement apply. The only thing different is where the bytes came from.
  • Service binaries (everything spawned after init) stay in the boot package as distinct blobs, exposed to init via BootPackage. They are not linked into the kernel; their lifecycle is independent of the kernel’s.

What option was rejected: fully linking init into the kernel crate (shared compilation unit, shared text). That collapses the kernel/user build boundary, couples linker scripts and code models, and puts init’s panics/UB inside the kernel’s compilation context. The process-isolation boundary survives that arrangement — but the build-time separation that makes the boundary trustworthy does not. include_bytes! preserves the separation; static linking destroys it.

Kernel boot
  │
  ├─ Create kernel caps: Console, Timer, DeviceManager, ProcessSpawner
  │
  └─ Spawn init with all kernel caps
       │
       init process (PID 1)
         │
         ├─ Phase 1: Core services (sequential — each depends on previous)
         │    ├─ DeviceManager.enumerate() → list of devices
         │    ├─ Spawn NIC driver with device-specific caps
         │    ├─ Wait for NIC driver to export Nic cap
         │    ├─ Spawn net-stack with Nic + Timer caps
         │    └─ Wait for net-stack to export NetworkManager cap
         │
         ├─ Phase 2: Higher-level services (can be parallel)
         │    ├─ Spawn http-service with TcpSocket cap from net-stack
         │    ├─ Spawn dns-resolver with UdpSocket cap
         │    └─ ...
         │
         └─ Phase 3: Applications
              ├─ Spawn app-a with HttpEndpoint("api.example.com")
              ├─ Spawn app-b with Fetch cap (trusted)
              └─ ...

The Init Process in Detail

Init is a regular userspace process with privileged caps. It is the only process that holds ProcessSpawner (the right to create new processes) and DeviceManager (the right to enumerate and claim devices). It can delegate subsets of these to child supervisors.

// init/src/main.rs — this IS the system configuration

fn main(caps: CapSet) {
    let spawner = caps.get::<ProcessSpawner>("spawner");
    let devices = caps.get::<DeviceManager>("devices");
    let timer = caps.get::<Timer>("timer");
    let console = caps.get::<Console>("console");

    // === Phase 1: Hardware drivers ===

    // Find the NIC
    let nic_device = devices.find("virtio-net")
        .expect("no network device found");

    // Spawn NIC driver — gets ONLY its device's MMIO + IRQ
    let nic_driver = spawner.spawn(SpawnRequest {
        binary: "/sbin/virtio-net",
        caps: caps![
            "device_mmio" => nic_device.mmio(),
            "interrupt"   => nic_device.interrupt(),
            "log"         => console.clone(),
        ],
        restart: RestartPolicy::Always,
    });

    // The driver exports a Nic cap once initialized
    let nic: Cap<Nic> = nic_driver.exported("nic").wait();

    // === Phase 2: Network stack ===

    let net_stack = spawner.spawn(SpawnRequest {
        binary: "/sbin/net-stack",
        caps: caps![
            "nic"   => nic,
            "timer" => timer.clone(),
            "log"   => console.clone(),
        ],
        restart: RestartPolicy::Always,
    });

    let net_mgr: Cap<NetworkManager> = net_stack.exported("net").wait();

    // === Phase 3: HTTP service ===

    let tcp = net_mgr.create_tcp_pool();

    let http_service = spawner.spawn(SpawnRequest {
        binary: "/sbin/http-service",
        caps: caps![
            "tcp" => tcp,
            "log" => console.clone(),
        ],
        restart: RestartPolicy::Always,
    });

    let fetch: Cap<Fetch> = http_service.exported("fetch").wait();

    // === Phase 4: Applications ===

    // Trusted telemetry agent — gets full Fetch
    spawner.spawn(SpawnRequest {
        binary: "/sbin/telemetry",
        caps: caps![
            "fetch" => fetch.clone(),
            "log"   => console.clone(),
        ],
        restart: RestartPolicy::OnFailure,
    });

    // Sandboxed app — gets scoped HttpEndpoint
    let api_cap = fetch.attenuate(EndpointPolicy {
        origin: "https://api.example.com",
        paths: Some("/v1/users/*"),
        methods: Some(&["GET", "POST"]),
    });

    spawner.spawn(SpawnRequest {
        binary: "/app/my-service",
        caps: caps![
            "api" => api_cap,
            "log" => console.clone(),
        ],
        restart: RestartPolicy::OnFailure,
    });

    // Init stays alive as the root supervisor
    supervisor_loop(&spawner);
}

Key Mechanisms

Cap export. A spawned process can export capabilities back to its parent via the ProcessHandle (see Spawn Mechanism section). This is how the NIC driver makes its Nic cap available to the network stack — init spawns the driver, waits for it to export "nic", then passes that cap to the next process.

Restart policy. Encoded in SpawnRequest, enforced by the supervisor loop in the spawning process. When a child exits unexpectedly:

  1. Old caps held by the child are automatically revoked (kernel invalidates the process’s cap table on exit)
  2. Supervisor re-spawns with the same SpawnRequest
  3. New instance gets fresh caps — same authority, new identity
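The policy check behind step 2 can be written as a pure function. `Always` and `OnFailure` are the variants this proposal uses; `Never` is an assumed third variant matching the supervisor loop's fall-through arm.

```rust
#[derive(Clone, Copy)]
pub enum RestartPolicy {
    Always,
    OnFailure,
    Never,
}

// Decide whether the supervisor should re-spawn an exited child.
pub fn should_restart(policy: RestartPolicy, exit_code: i64) -> bool {
    match policy {
        RestartPolicy::Always => true,              // restart regardless of exit status
        RestartPolicy::OnFailure => exit_code != 0, // restart only on abnormal exit
        RestartPolicy::Never => false,
    }
}

fn main() {
    assert!(should_restart(RestartPolicy::Always, 0));
    assert!(should_restart(RestartPolicy::OnFailure, 1));
    assert!(!should_restart(RestartPolicy::OnFailure, 0));
}
```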

Dependency ordering. Sequential in code: wait() on exported caps blocks until the dependency is ready. No declarative dependency graph needed — Rust’s control flow is the dependency graph.

Service Taxonomy

Concrete categories of userspace services capOS expects to run. All spawned by init (or a supervisor init delegates to) after Stage 6. None are pre-init.

Hardware Drivers

One process per managed device. Each holds exactly the caps for its own hardware: a DeviceMmio slice, the corresponding Interrupt cap, and optionally a DmaRegion cap carved out of the frame allocator. Exports a typed device cap (Nic, BlockDevice, Framebuffer, Gpu, …). Examples: virtio-net, virtio-blk, NVMe, AHCI, framebuffer/GPU.

Platform Services

  • Logger / journal — accepts Log cap writes, forwards to console and/or durable storage. Init and kernel bootstrap use a direct Console cap until the logger is up; afterwards new services get Log caps only.
  • Filesystem — one per mounted volume. Consumes a BlockDevice cap, exports Directory / File caps. FAT, ext4, overlay, tmpfs.
  • Store — capability-native content-addressed storage backing persistent capability state (storage-and-naming-proposal.md).
  • Network stack — userspace TCP/IP (networking-proposal.md). Consumes Nic + Timer, exports NetworkManager, TcpSocket, UdpSocket, TcpListener.
  • DNS resolver — consumes a UdpSocket, exports Resolver.
  • Config / secrets store — reads the initial config from BootPackage, exposes runtime Config and Secret caps with per-key attenuation.
  • Cloud metadata agent — detects IMDS / ConfigDrive / SMBIOS on cloud boot and delivers a ManifestDelta (cloud-metadata-proposal.md).
  • Upgrade manager — orchestrates CapRetarget for live service replacement (live-upgrade-proposal.md).
  • Capability proxy — makes local caps reachable over the network with capnp-rpc transport (Plan 9’s exportfs equivalent).
  • Measurement / attestation agent — consumes sealed kernel hashes from BootPackage, exposes Quote caps for remote attestation.

Supervisors

Per-subsystem restart managers that hold a narrowed ProcessSpawner plus the caps of the subtree they own. If any child crashes, the supervisor tears down and re-spawns the set. Example: net-supervisor owns NIC driver + net-stack + DHCP client.

Application Services

User-facing or user-spawned processes: HTTP servers, API gateways, worker pools, shells, interactive tools. Hold only the narrow caps the supervisor grants (HttpEndpoint for one origin, Directory for one mount, etc.). Human users, service accounts, guests, and anonymous callers are represented by session/profile services that grant scoped cap bundles; they are not kernel subjects or ambient process credentials. See user-identity-and-policy-proposal.md.

What Does Not Become a Service

  • Console / serial — stays in the kernel as a CapObject wrapper. Small enough, needed for kernel diagnostics, no benefit from userspace isolation. A userspace log service can layer on top.
  • Frame allocator, virtual memory, scheduler, ring dispatch — kernel primitives, exposed as caps but not as services.
  • Interrupt delivery, DMA mapping — kernel mechanisms, exposed to drivers as caps.
  • Boot measurement — if added, happens in the kernel before BootPackage exists; the userspace measurement agent only reports the resulting measurements.

Supervision

Supervision Tree

Init doesn’t have to supervise everything directly. It can delegate:

init (root supervisor)
  ├─ net-supervisor (holds: spawner subset, device caps)
  │    ├─ virtio-net driver
  │    ├─ net-stack
  │    └─ http-service
  └─ app-supervisor (holds: spawner subset, service caps)
       ├─ my-service
       └─ another-app

Each supervisor is a process that holds a ProcessSpawner cap (possibly restricted to specific binaries) and the caps it needs to grant to children. If net-supervisor crashes, init restarts it, and it re-spawns the entire networking subtree.

Supervisor Loop

fn supervisor_loop(children: &[SpawnRequest], spawner: &ProcessSpawner) {
    let mut handles: Vec<ProcessHandle> = children.iter()
        .map(|req| spawner.spawn(req.clone()))
        .collect();

    loop {
        // Wait for any child to exit
        let (index, exit_code) = wait_any(&handles);
        let req = &children[index];

        match req.restart {
            RestartPolicy::Always => {
                handles[index] = spawner.spawn(req.clone());
            }
            RestartPolicy::OnFailure if exit_code != 0 => {
                handles[index] = spawner.spawn(req.clone());
            }
            _ => {
                // Process exited normally, don't restart
            }
        }
    }
}

Socket Activation

systemd pre-creates a socket and passes the fd to the service on first connection. In capOS, the supervisor does the same with caps:

Eager (default): supervisor spawns the child immediately with a TcpListener cap. Child calls accept() and blocks.

Lazy: supervisor holds the TcpListener cap itself. On first incoming connection (or on first accept() from a proxy cap), it spawns the child and transfers the cap. The child code is identical in both cases.

// Lazy activation — supervisor holds the listener until needed
let listener = net_mgr.create_tcp_listener();
listener.bind([0,0,0,0], 8080);

// This blocks until a connection arrives
let _conn = listener.accept();

// Now spawn the actual service, giving it the listener
spawner.spawn(SpawnRequest {
    binary: "/app/web-server",
    caps: caps!["listener" => listener, "log" => console.clone()],
    restart: RestartPolicy::Always,
});

Configuration

See docs/proposals/storage-and-naming-proposal.md for the full storage, naming, and configuration model.

Summary: the system topology is currently defined in a capnp-encoded system manifest baked into the boot image. tools/mkmanifest compiles the human-authored system.cue or system-spawn.cue source into the binary manifest. Default boot lets init validate and execute that manifest through ProcessSpawner; the remaining cleanup is to remove the unused generic kernel resolver and move runtime configuration into a capability-based store service once that service exists.

Comparison with Traditional Approaches

| Concern | systemd/Linux | capOS |
|---|---|---|
| Service dependencies | Wants=, After=, Requires= | Implicit in cap graph |
| Sandboxing | seccomp, namespaces, AppArmor | Default: zero ambient authority |
| Socket activation | ListenStream=, fd passing protocol | Pass TcpListener cap |
| Restart policy | Restart=on-failure | Supervisor process loop |
| Logging | journald, StandardOutput=journal | Log cap in granted set |
| Resource limits | cgroups, MemoryMax=, CPUQuota= | Bounded allocator caps |
| Network access control | firewall rules (iptables/nftables) | Scoped HttpEndpoint / TcpSocket caps |
| Config format | INI-like unit files (~1500 directives) | Rust code or minimal manifest |
| Trusted computing base | systemd PID 1 (~1.4M lines) | Init process (hundreds of lines) |

Spawn Mechanism

Spawning is a capability-gated operation. The kernel provides a ProcessSpawner capability — only the holder can create new processes.

Implemented Kernel Slice

The kernel now provides:

  1. ProcessSpawner capability — a CapObject impl in kernel/src/cap/process_spawner.rs. Methods:

    • spawn(name, binaryName, grants) -> handleIndex — resolve a boot-package binary, load ELF, create address space (builds on existing elf.rs loader and AddressSpace::new_user() in mem/paging.rs), populate the initial cap table, schedule the process, and return the ProcessHandle through the ring result-cap list
    • the returned ProcessHandle cap lets the parent wait for child exit in the first slice; exported caps and kill semantics are later lifecycle work
  2. Initial cap passing — at spawn time, the kernel copies permitted parent cap references into the child’s cap table or mints authorized child-local kernel caps:

    • Raw grants preserve the source badge.
    • Endpoint-client grants can mint a requested badge from an endpoint owner or parent endpoint result source, without adding server authority.
    • Child-local Endpoint, FrameAllocator, and VirtualMemory grants are created for the child’s process; child-local endpoint grants return parent-side client facets as result caps instead of sharing the endpoint owner object.

    The parent’s references are unaffected.

  3. Cap export — future lifecycle work will let a child register a cap by name in its ProcessHandle, making it available to the parent (or anyone holding the handle). This is the mechanism behind nic_driver.exported("nic").wait() once exported-cap lookup is added.

Schema

interface ProcessSpawner {
    spawn @0 (name :Text, binaryName :Text, grants :List(CapGrant)) -> (handleIndex :UInt16);
}

struct CapGrant {
    name @0 :Text;
    capId @1 :UInt32;
    interfaceId @2 :UInt64;
    mode @3 :CapGrantMode;
    badge @4 :UInt64;
    source @5 :CapGrantSource;
}

struct CapGrantSource {
    union {
        capability @0 :Void;
        kernel @1 :KernelCapSource;
    }
}

enum CapGrantMode {
    raw @0;
    clientEndpoint @1;
}

interface ProcessHandle {
    wait @0 () -> (exitCode :Int64);
}

Note on capability passing: Capabilities are referenced by cap table slot IDs (UInt32), not by Cap’n Proto’s native capability table mechanism. spawn() returns the ProcessHandle through the ring result-cap list; handleIndex identifies that transferred cap in the completion. The first slice passes a boot-package binaryName instead of raw ELF bytes so the request stays within the bounded ring parameter buffer. kill, post-spawn grants, and exported-cap lookup remain future lifecycle work until their teardown and authority semantics are implemented. capOS uses manual capnp dispatch (CapObject trait with raw message bytes, not capnp-rpc), so cap references are plain integers and typed result caps use the ring transfer-result metadata. See userspace-binaries-proposal.md Part 7 for the surrounding userspace bootstrap schema context.
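A toy model of that dispatch style (deliberately far simpler than the real CapObject trait; the method ordinal and error code here are assumptions): a capability object receives a method ordinal plus raw request bytes and returns raw response bytes, with no capnp-rpc vat and no native capability table.

```rust
// Manual-dispatch sketch: one trait, integer method ordinals, raw bytes.
pub trait CapObject {
    fn invoke(&self, method: u16, params: &[u8]) -> Result<Vec<u8>, u16>;
}

pub struct Echo;

impl CapObject for Echo {
    fn invoke(&self, method: u16, params: &[u8]) -> Result<Vec<u8>, u16> {
        match method {
            0 => Ok(params.to_vec()), // method @0: return the payload unchanged
            _ => Err(1),              // unknown ordinal -> numeric error code
        }
    }
}

fn main() {
    let obj = Echo;
    assert_eq!(obj.invoke(0, b"ping").unwrap(), b"ping".to_vec());
    assert!(obj.invoke(9, b"").is_err());
}
```

Because dispatch is manual, "passing a capability" reduces to writing an integer into a message and having the kernel translate it between cap tables; no serialization framework has to understand capabilities at all.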

Relationship to Existing Code

The current kernel has these pieces in place:

  • ELF loading (kernel/src/elf.rs) — parses PT_LOAD segments, validates alignment, and feeds the reusable spawn primitive behind ProcessSpawner.
  • Address space creation (kernel/src/mem/paging.rs) — AddressSpace::new_user() creates isolated page tables with the kernel mapped in the upper half.
  • Cap table (kernel/src/cap/table.rs) — CapTable with insert(), get(), remove(), transfer preflight, provisional insert, commit, and rollback helpers. Each Process owns one local table.
  • Process struct and scheduler (kernel/src/process.rs, kernel/src/sched.rs) — a process table plus round-robin run queue are in place for both legacy manifest-spawned services and init-spawned children.

Generic capability transfer/release and the reusable ProcessSpawner lifecycle path are complete enough for default init-owned service startup. Remaining lifecycle gaps are kill, post-spawn grants, runtime exported-cap lookup, restart supervision, and retiring the now-unused generic kernel service-cap resolver.

Prerequisites

| Prerequisite | Status | Why |
|---|---|---|
| ELF loading + address spaces | Done (Stage 2-3) | elf.rs, AddressSpace::new_user() |
| Capability ring + cap_enter | Done (Stage 4/6 foundation) | Ring-based cap invocation with blocking waits |
| Scheduling + preemption (core) | Done (Stage 5) | Round-robin, PIT 100 Hz, context switch |
| Cross-process Endpoint IPC | Done (Stage 6 foundation) | CALL/RECV/RETURN routing through Endpoint objects |
| Generic cap transfer/release | Done (Stage 6, 2026-04-22) | Copy/move transfer, result-cap insertion, CAP_OP_RELEASE; epoch revocation still future |
| ProcessSpawner + ProcessHandle | Done (Stage 6, 2026-04-22) | Init-driven spawn with grants, wait completion, hostile-input coverage; kill/post-spawn grants still future |
| Authority graph + quota design (S.9) | Done (2026-04-21) | Defines transfer/spawn invariants, per-process quotas, and rollback rules; see docs/authority-accounting-transfer-design.md |

This proposal describes the target architecture. Individual pieces (like Fetch/HttpEndpoint) are additive — they’re userspace processes that compose existing caps into higher-level ones. No kernel changes needed beyond Stages 4-6.

First Step After Transfer and ProcessSpawner — done 2026-04-23

The minimal demonstration of this architecture landed together with capability transfer and ProcessSpawner:

  1. ProcessSpawner cap in kernel/src/cap/process_spawner.rs wraps ELF loading and address-space creation behind a typed capability.
  2. Init spawns children — default make run boots a manifest with config.initExecutesManifest set; the kernel boots only init, then init spawns the default demo graph from manifest entries through ProcessSpawner, grants child-local endpoint owners and client facets, then releases parent endpoint facets before waiting on each ProcessHandle.
  3. Cross-process cap invocation — spawned client invokes the server’s Endpoint cap, server replies, both print to console.

This exercises: spawn cap, initial cap passing, manifest-declared export recording, cross-process cap invocation, hostile-input rejection, and per-process resource exhaustion paths. Deleting the unused legacy kernel resolver is post-milestone cleanup tracked in WORKPLAN.md.

Open Questions

  1. Cap revocation. If a service is restarted, its old caps should be invalidated. Planned approach (from research): epoch-based revocation (EROS-inspired, O(1) invalidate) plus generation counters on CapId (Zircon-inspired, stale reference detection). See ROADMAP.md Stages 5-6.

  2. Cap discovery. How does a process learn what caps it was given? Resolved: name→(cap_id, interface_id) mapping passed at spawn via a well-known page (CapSet). See userspace-binaries-proposal.md Part 2. cap_id is the authority-bearing table handle. interface_id is the transported capnp TYPE_ID used by typed clients to check that the handle speaks the expected interface.

  3. Lazy spawning. Should the init process start everything eagerly, or should caps be backed by lazy proxies that spawn the backing service on first invocation?

  4. Cap persistence. If the system reboots, should the cap graph be reconstructable from saved state? Or is it always rebuilt from init code?

  5. Delegation depth. Can an application further delegate its HttpEndpoint cap to a subprocess? If so, the HTTP gateway needs to support fan-out. If not, how is this restriction enforced?