Proposal: Capability-Based Binaries, Language Support, and Compatibility Adapters

How userspace binaries receive, use, and compose capabilities, from the native Rust runtime through future language runtimes and compatibility adapters.

Current State

The init binary (init/src/main.rs) and smoke services are no_std Rust binaries over capos-rt. The runtime owns _start, fixed heap initialization, CapSet parsing, exit/cap_enter syscall wrappers, typed clients, result-cap adoption, queued release flushing, and panic output. Init reads the BootPackage manifest, validates the metadata-only service graph, spawns child services through ProcessSpawner, waits on ProcessHandles, and exits. The former raw bootstrap syscall and demo-support runtime shims are historical; demo support now keeps only low-level transport helpers for intentionally malformed SQE/CQE smokes.

Userspace now has a checked-in targets/x86_64-unknown-capos.json custom target that exposes target_os = "capos" while preserving the current static ELF, soft-float, no_std baseline. The kernel remains on the repository default x86_64-unknown-none target. init, demos, shell, and the capos-rt smoke binary build through custom-target Cargo aliases, and checked-in CUE manifests embed userspace from target/x86_64-unknown-capos/release paths. The remaining future work is hardening this target contract into a broader toolchain and packaging interface rather than treating it as a probe.

The kernel-side roadmap provides the capability ring (SQ/CQ shared memory plus cap_enter, implemented), scheduling, and IPC. This proposal covers the userspace half: what binaries look like, how they are built, and how existing software can be adapted to a system with no ambient authority.

Part 1: Native Userspace Runtime (capos-rt)

The Historical Problem

Before capos-rt, every userspace binary had to:

  • Define _start and a panic handler
  • Set up an allocator
  • Construct raw syscall wrappers
  • Manually serialize/deserialize capnp messages
  • Know the syscall ABI (register layout, method IDs)

That was acceptable for one proof-of-concept binary. It does not scale to dozens of services, and the current tree has moved those mechanics into capos-rt.

Solution: A Userspace Runtime Crate

capos-rt is a no_std + alloc Rust crate that every native capOS binary depends on. It provides:

1. Entry point and allocator setup.

use capos_rt::{Console, ConsoleClient, Runtime};

fn service_main(mut runtime: Runtime) -> i64 {
    let console = match runtime.capset().get_typed::<Console>(b"console") {
        Ok(cap) => cap,
        Err(_) => return 1,
    };
    let mut ring = match runtime.ring_client() {
        Ok(ring) => ring,
        Err(_) => return 2,
    };
    let mut client = ConsoleClient::new(console);
    match client.write_line_wait(&mut ring, "Hello from capOS", u64::MAX) {
        Ok(()) => 0,
        Err(_) => 3,
    }
}

capos_rt::entry_point!(service_main);

2. Syscall layer. Raw syscall asm wrapped in safe Rust functions. The entire syscall surface is 2 calls – new operations are SQE opcodes, not new syscalls:

  • sys_exit(code) – terminate the current thread; the process exits when this was its last live thread (syscall 1)
  • sys_cap_enter(min_complete, timeout_ns) – flush pending SQEs, then wait until N completions are available or the timeout expires (syscall 2)

The accepted in-process threading contract preserves this two-syscall surface: thread exit is available through both the raw terminal syscall and the typed ThreadControl.exitThread capability call.

Capability invocations go through the per-process SQ/CQ ring. capos-rt provides helpers for writing SQEs and reading CQEs:

/// Submit a CALL SQE to the capability ring and wait for the CQE.
pub fn cap_call(
    ring: &mut CapRing,
    cap_id: u32,
    method_id: u16,
    params: &[u8],
    result_buf: &mut [u8],
) -> Result<usize, CapError> {
    ring.push_call_sqe(cap_id, method_id, params);
    sys_cap_enter(1, u64::MAX);
    ring.pop_cqe(result_buf)
}

3. Cap’n Proto integration. The current runtime uses handwritten typed clients over schema-defined method ids and message shapes. Shared generated schema bindings live through capos-config; broad generated client bindings for capos-rt remain future work. The runtime owns transport lifetime and completion matching, while each typed client owns its interface-specific message encoding.
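As a host-side illustration of the completion matching mentioned above, the sketch below models a single-owner ring client that tags each submission with a token and pairs completions by that token, regardless of arrival order. `ModelRing`, `Sqe`, `Cqe`, `user_data`, and `kernel_complete_all` are hypothetical names for this sketch; the real capos-rt ring layout and API are not shown in this section.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical submission entry: target cap, method, and a caller token.
#[derive(Debug, Clone, PartialEq)]
pub struct Sqe {
    pub user_data: u64,
    pub cap_id: u32,
    pub method_id: u16,
}

/// Hypothetical completion entry carrying the token of the request it answers.
#[derive(Debug, Clone, PartialEq)]
pub struct Cqe {
    pub user_data: u64,
    pub result: i32,
}

#[derive(Default)]
pub struct ModelRing {
    next_token: u64,
    sq: VecDeque<Sqe>,
    cq: VecDeque<Cqe>,
    parked: HashMap<u64, Cqe>, // completions seen while waiting for another token
}

impl ModelRing {
    /// Queue a CALL and return the token that identifies its completion.
    pub fn submit(&mut self, cap_id: u32, method_id: u16) -> u64 {
        let token = self.next_token;
        self.next_token += 1;
        self.sq.push_back(Sqe { user_data: token, cap_id, method_id });
        token
    }

    /// Stand-in for the kernel: completes pending SQEs in reverse order so the
    /// example shows that matching does not depend on completion order.
    pub fn kernel_complete_all(&mut self) {
        while let Some(sqe) = self.sq.pop_back() {
            self.cq.push_back(Cqe { user_data: sqe.user_data, result: sqe.method_id as i32 });
        }
    }

    /// Return the completion for `token`, parking unrelated completions.
    pub fn wait_for(&mut self, token: u64) -> Option<Cqe> {
        if let Some(cqe) = self.parked.remove(&token) {
            return Some(cqe);
        }
        while let Some(cqe) = self.cq.pop_front() {
            if cqe.user_data == token {
                return Some(cqe);
            }
            self.parked.insert(cqe.user_data, cqe);
        }
        None
    }
}
```

In this model a typed client would own message encoding and call `submit` and `wait_for`, while the ring object alone decides which completion belongs to which caller.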

4. CapSet – the initial capability environment.

At spawn time, the kernel writes the process’s initial capabilities into the read-only CapSet page and passes its address to _start. capos-rt parses this into a typed lookup surface over name, local CapId, and interface id.

struct CapEntry {
    cap_id: u32,        // authority-bearing slot in the process CapTable
    interface_id: u64,  // Cap'n Proto interface TYPE_ID for type checking
}

impl CapSet {
    /// Get a typed capability by manifest name.
    pub fn get_typed<T: CapabilityType>(
        &self,
        name: &[u8],
    ) -> Result<Capability<T>, CapSetError> { ... }

    /// Iterate manifest-order entries for diagnostics and shell inspection.
    pub fn iter(&self) -> impl Iterator<Item = CapSetEntryRef> { ... }
}

interface_id is not a handle. It is metadata carrying the Cap’n Proto TYPE_ID for the interface expected by the typed client. The handle is cap_id. A typed client constructor must check that entry.interface_id == T::TYPE_ID, then store the local CapId. Normal CALL SQEs do not need to repeat the interface ID because each capability table entry exposes one public interface. The ring SQE keeps fixed-size reserved padding for ABI stability, not a required interface field for the system transport.

This matters for the system transport because several capabilities can expose the same interface while representing different authority: a serial console, a log-buffer console, and a console proxy all have the Console TYPE_ID, but different CapId values.
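The interface check described above can be sketched on the host as follows. This is a minimal model, not the capos-rt implementation: the `Entry` layout, the `sample_capset` helper, the error variants, and the TYPE_ID values (0x1111, 0x2222) are placeholders chosen for illustration.

```rust
use std::marker::PhantomData;

/// Marker trait tying a Rust type to a Cap'n Proto interface TYPE_ID.
pub trait CapabilityType {
    const TYPE_ID: u64;
}

pub struct Console;
pub struct Timer;
// Placeholder IDs; real values come from the schema.
impl CapabilityType for Console { const TYPE_ID: u64 = 0x1111; }
impl CapabilityType for Timer { const TYPE_ID: u64 = 0x2222; }

#[derive(Debug, PartialEq)]
pub enum CapSetError { NotFound, InterfaceMismatch }

/// Typed handle: the authority is `cap_id`; `T` only records the checked interface.
pub struct Capability<T> {
    cap_id: u32,
    _interface: PhantomData<T>,
}

impl<T> Capability<T> {
    pub fn cap_id(&self) -> u32 { self.cap_id }
}

pub struct Entry {
    pub name: &'static [u8],
    pub cap_id: u32,
    pub interface_id: u64,
}

pub struct CapSet { pub entries: Vec<Entry> }

impl CapSet {
    /// Look up by manifest name, then verify the interface before handing out a handle.
    pub fn get_typed<T: CapabilityType>(&self, name: &[u8]) -> Result<Capability<T>, CapSetError> {
        let entry = self.entries.iter().find(|e| e.name == name).ok_or(CapSetError::NotFound)?;
        if entry.interface_id != T::TYPE_ID {
            return Err(CapSetError::InterfaceMismatch);
        }
        Ok(Capability { cap_id: entry.cap_id, _interface: PhantomData })
    }
}

/// Two Console-typed entries with different authority, plus one Timer.
pub fn sample_capset() -> CapSet {
    CapSet {
        entries: vec![
            Entry { name: b"console", cap_id: 3, interface_id: Console::TYPE_ID },
            Entry { name: b"log", cap_id: 7, interface_id: Console::TYPE_ID },
            Entry { name: b"timer", cap_id: 9, interface_id: Timer::TYPE_ID },
        ],
    }
}
```

Here `console` and `log` share one TYPE_ID but resolve to different `cap_id` values, which is the distinction between interface metadata and the handle.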

Crate Structure

capos-rt/
  Cargo.toml          # no_std + alloc, depends on capnp
  build.rs            # userspace linker arguments
  src/
    lib.rs            # type markers, owned handles, entry_point! macro
    entry.rs          # _start, Runtime, bootstrap validation
    syscall.rs        # raw asm syscall wrappers
    capset.rs         # CapSet lookup and iteration helpers
    client.rs         # handwritten typed clients
    ring.rs           # single-owner ring client and completion matching
    alloc.rs          # userspace heap allocator setup

capos-rt is NOT a workspace member (same as init/ – needs different target/linker handling from the kernel). It’s a path dependency for userspace crates.

Init On The Current Runtime

init/src/main.rs is already a capos-rt user. Its init_main(Runtime) entry is registered with capos_rt::entry_point!, obtains typed bootstrap caps from the runtime CapSet, reads the BootPackage manifest, validates the service graph, resolves spawn grants, launches children through ProcessSpawnerClient, waits on ProcessHandleClient, and reports failures through the Console client.

Part 2: Capability-Based Binary Model

Binary Format

ELF64, same as now. The kernel’s ELF loader (kernel/src/elf.rs) already handles PT_LOAD segments. No changes to the binary format itself.

What changed from the early prototype to the current runtime baseline is the ABI contract between kernel and binary:

| Aspect | Historical prototype | Current capos-rt baseline |
|---|---|---|
| Entry point | crate-local _start(), no args | runtime-owned _start(ring_addr, pid, capset_addr) |
| Syscall ABI | ad-hoc (rax=0 write, rax=1 exit) | SQ/CQ ring + sys_cap_enter + sys_exit |
| Capability access | none | read-only CapSet page validated by capos-rt |
| Serialization | none | Cap’n Proto messages encoded by typed clients |
| Allocator | none or crate-local | runtime-owned fixed heap |

Initial Capability Passing

The kernel communicates bootstrap state through _start arguments and fixed userspace mappings. The implemented shape is:

  • ring_addr: the process capability ring, expected to equal RING_VADDR.
  • pid: the process identifier for diagnostics/runtime bookkeeping.
  • capset_addr: read-only bootstrap CapSet page populated from the manifest and spawn grants.
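A runtime might validate these three arguments before building its state, roughly as sketched below. The `RING_VADDR` value, the page-alignment check, and the error names are assumptions made for this example; the section only states that `ring_addr` is expected to equal `RING_VADDR` and that the CapSet lives in a read-only page.

```rust
/// Placeholder value; the real RING_VADDR is defined by the kernel ABI.
pub const RING_VADDR: u64 = 0x4000_0000;
/// Assumed page size used for the alignment check.
pub const PAGE_SIZE: u64 = 4096;

#[derive(Debug, PartialEq)]
pub enum BootstrapError {
    RingAddrMismatch,
    CapSetNull,
    CapSetUnaligned,
}

#[derive(Debug, PartialEq)]
pub struct BootstrapArgs {
    pub ring_addr: u64,
    pub pid: u64,
    pub capset_addr: u64,
}

/// Reject bootstrap state that does not match the expected fixed mappings.
pub fn validate_bootstrap(
    ring_addr: u64,
    pid: u64,
    capset_addr: u64,
) -> Result<BootstrapArgs, BootstrapError> {
    if ring_addr != RING_VADDR {
        return Err(BootstrapError::RingAddrMismatch);
    }
    if capset_addr == 0 {
        return Err(BootstrapError::CapSetNull);
    }
    // Assumption: the CapSet occupies a whole page, so its address is page-aligned.
    if capset_addr % PAGE_SIZE != 0 {
        return Err(BootstrapError::CapSetUnaligned);
    }
    Ok(BootstrapArgs { ring_addr, pid, capset_addr })
}
```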

Earlier options considered:

Option A: Well-known page. Kernel maps a read-only page at a fixed virtual address (e.g., 0x1000) containing a capnp-serialized InitialCaps message:

struct InitialCaps {
    entries @0 :List(InitialCapEntry);
}

struct InitialCapEntry {
    name @0 :Text;
    id @1 :UInt32;
    interfaceId @2 :UInt64;
}

Option B: Register convention. Pass pointer and length in rdi/rsi at entry. Simpler, but the data still needs to live somewhere in user memory.

Option C: Stack. Push the cap descriptor onto the user stack before iretq. Similar to how Linux passes auxv to _start.

Option A is cleanest – the page is always there, no calling-convention dependency, and it naturally extends to passing additional boot info later.

Service Binary Lifecycle

1. Kernel loads ELF, creates address space, populates cap table
2. Kernel maps the capability ring and the read-only CapSet page
3. Kernel enters userspace at _start(ring_addr, pid, capset_addr)

4. capos-rt _start:
   a. Initialize heap allocator
   b. Validate bootstrap arguments and parse the CapSet page
   c. Call the registered entry function with the Runtime

5. User entry function:
   a. Extract needed caps from the Runtime's CapSet
   b. Do work (invoke caps, serve requests)
   c. Optionally export caps to parent once ProcessHandle export lookup exists

6. On return from the entry function (or sys_exit):
   a. Kernel destroys process
   b. All caps in process's cap table are dropped
   c. Parent's ProcessHandle receives exit notification

Part 3: Language Support Roadmap

The current manual status page for this subject is Programming Languages. This proposal owns the longer roadmap and should not be read as implemented support for every language listed below.

Implemented Baseline: Rust (no_std + alloc)

Rust is the only implemented booted language path. Native services use #![no_std], alloc, capos-rt, static ELF binaries, and the targets/x86_64-unknown-capos.json userspace target. This fits the current kernel because it does not require a libc, dynamic linker, process environment, global filesystem, or ambient socket namespace.

Rust remains the default implementation language for core capOS services until the runtime, schema, and packaging contracts are stable. That is a project priority, not a rule that every future service must be written in Rust.

Future: Rust std

Rust std support is not implemented. It requires an operating-system backend for filesystem, networking, threads, time, standard I/O, process, environment, and synchronization APIs. On capOS those APIs must get authority from granted capabilities such as Directory, File, TcpSocket, Timer, ThreadSpawner, ThreadControl, ParkSpace, StdIO, and ProcessSpawner.

The project has not selected whether Rust std should be implemented directly over native capOS capabilities, through a POSIX compatibility adapter, or in a hybrid form. Until that decision is made, native no_std + alloc Rust over capos-rt remains the supported Rust path.

C via libcapos

The C substrate is in tree at Phase 0. The libcapos/ crate compiles to libcapos.a, a thin Rust staticlib that exposes the capos-rt syscall, ring CALL, CapSet lookup, and global allocator under an extern "C" ABI. C binaries link statically against the archive, share the userspace ELF layout used by Rust demos, and run inside the existing capos-rt _start chain. make run-c-hello boots a C main() that calls Console.writeLine, Timer.now, EntropySource.fill, and VirtualMemory wrappers through libcapos and exits cleanly. make run-c-pipe boots a second native C smoke that creates a kernel pipe through the typed ProcessSpawner.createPipe wrapper, writes and reads a marker through typed Pipe wrappers, closes the writer, observes EOF, and exits cleanly.

The current substrate is intentionally narrow: capability primitives, hand-written typed wrappers (capos_console_write_line, capos_timer_now, capos_entropy_fill, the capos_virtual_memory_{map,unmap,protect} trio, capos_process_spawner_create_pipe, and capos_pipe_{read,write,close}), raw syscalls, and the heap shim. The Pipe wrapper is a typed bridge over the existing transferred-result-cap path; it does not turn capos_cap_call() into a general transfer ABI, and capos_cap_call() still refuses transfer-bearing completions with CAPOS_E_TRANSFER_NOT_SUPPORTED. Anything POSIX-shaped (errno, fd table, open/read/write, signals, fork/exec, sockets) belongs in the separate libcapos-posix layer above libcapos.

Generated typed wrappers for the remaining capabilities (NetworkManager, Endpoint, etc.), a stable C ABI for cap-transfer, and per-thread runtime routing are future work. Until that routing or a POSIX pthread layer lands, libcapos v0 is fail-closed for C-created capOS threads: capos_cap_call rejects bootstrap ThreadSpawner capabilities with CAPOS_E_THREADING_UNSUPPORTED, and concurrent or re-entrant runtime borrows return CAPOS_E_RUNTIME_BUSY.
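The fail-closed borrow rule can be illustrated with a small host-side model. This is a sketch under assumptions: `RuntimeSlot` and the `CaposError::RuntimeBusy` variant are hypothetical stand-ins for the installed runtime and the CAPOS_E_RUNTIME_BUSY code, and the real libcapos mechanism may differ.

```rust
use std::cell::RefCell;

/// Stand-in for the C-visible error code CAPOS_E_RUNTIME_BUSY.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum CaposError {
    RuntimeBusy,
}

/// Holds the single installed runtime and hands out exclusive access.
pub struct RuntimeSlot<R> {
    inner: RefCell<R>,
}

impl<R> RuntimeSlot<R> {
    pub fn new(runtime: R) -> Self {
        RuntimeSlot { inner: RefCell::new(runtime) }
    }

    /// Run `f` with exclusive access. A re-entrant call fails with
    /// `RuntimeBusy` instead of blocking or aliasing the runtime.
    pub fn with<T>(&self, f: impl FnOnce(&mut R) -> T) -> Result<T, CaposError> {
        let mut guard = self
            .inner
            .try_borrow_mut()
            .map_err(|_| CaposError::RuntimeBusy)?;
        Ok(f(&mut *guard))
    }
}
```

The point of the design is that a nested borrow surfaces as an error the caller can handle, rather than as a panic or a deadlock.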

The target libcapos shape is a static library providing:

#include <capos.h>

// Ring-based capability invocation (synchronous wrapper around SQ/CQ ring)
int cap_call(cap_ring_t *ring, uint32_t cap_id, uint16_t method_id,
             const void *params, size_t params_len,
             void *result, size_t result_len);

// Typed wrappers (generated from .capnp schema)
int console_write(cap_t console, const void *data, size_t len);
int console_write_line(cap_t console, const char *text);

// CapSet access
cap_t capset_get(const char *name);
uint64_t capset_interface_id(const char *name);

// Syscalls (the entire syscall surface -- 2 calls total)
_Noreturn void sys_exit(int code);                   // terminate current thread
uint32_t sys_cap_enter(uint32_t min_complete,        // flush SQEs + wait
                       uint64_t timeout_ns);

Implementation: libcapos is Rust compiled to a static .a with a C ABI (#[no_mangle] extern "C"). The capnp message construction happens in Rust behind the C API. This avoids requiring a C capnp implementation.

C binaries would link against libcapos.a and use the same static userspace ELF model as Rust binaries. Startup, allocator setup, CapSet access, and ring submission should be owned by libcapos, not repeated in every C program.
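The pattern of Rust code behind a C ABI can be sketched as below. The function name `model_console_write_line` and the status codes `CAPOS_OK` and `CAPOS_E_INVALID_ARG` are hypothetical, and the body only validates and counts bytes instead of encoding a Cap’n Proto message. A real export would also carry the no_mangle attribute so that C code can link against the symbol by name.

```rust
use std::slice;

/// Hypothetical status codes for this sketch.
pub const CAPOS_OK: i32 = 0;
pub const CAPOS_E_INVALID_ARG: i32 = -1;

/// C-callable wrapper: validates raw pointers, then does the work in safe Rust.
///
/// # Safety
/// `data` must point to `len` readable bytes and `written` must point to a
/// writable `usize`, as a C caller would guarantee.
pub unsafe extern "C" fn model_console_write_line(
    data: *const u8,
    len: usize,
    written: *mut usize,
) -> i32 {
    if data.is_null() || written.is_null() {
        return CAPOS_E_INVALID_ARG;
    }
    // SAFETY: non-null checked above; length validity is the caller's contract.
    let bytes = unsafe { slice::from_raw_parts(data, len) };
    match std::str::from_utf8(bytes) {
        Ok(text) => {
            // A real wrapper would build the request message and submit it here.
            unsafe { *written = text.len() };
            CAPOS_OK
        }
        Err(_) => CAPOS_E_INVALID_ARG,
    }
}
```

Keeping message construction on the Rust side of this boundary is what lets C programs avoid a C capnp implementation.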

Future: C++

C++ support waits on the C substrate and explicit ABI decisions: exceptions, RTTI, TLS, allocator behavior, unwind policy, static initialization, and the scope of any standard-library subset. A freestanding arena/container subset is plausible earlier than hosted C++.

The previously inspected pg83/std library remains a later experiment, not a shortcut to full C++ support. Its low-level arena/container pieces are relevant; its hosted/POSIX assumptions still require the same capOS adapter work as other C++ libraries.

Future: Go (GOOS=capos)

Go is the next high-priority runtime after regular Rust. It needs in-process threading, futex-like wait/wake, TLS/runtime metadata support, GC integration, and a network poller mapped to capOS capabilities. See Go Runtime for the dedicated plan.

Go has higher priority than C++ because it unlocks CUE and a large practical tooling/runtime ecosystem. Go via WASI may be useful for CPU-bound CUE evaluation before native Go exists, but it is not a substitute for native Go network services or full runtime behavior.

Future: Python

Python is not implemented on booted capOS. It has three plausible paths:

  1. Native CPython through a POSIX compatibility adapter. This depends on the C/libc substrate plus file, stdio, timer, networking, and process adapters. It is the likely path for trusted system scripts and Python tools that need capOS storage or networking.
  2. MicroPython through the native C substrate. This is a smaller early scripting option with less runtime surface than CPython.
  3. WASI or Emscripten-hosted Python. This is useful for sandboxed or compute-oriented Python. It still runs a Python interpreter; WebAssembly is the sandbox and host ABI, not a way to avoid Python runtime work.

As of this review, upstream CPython support helps only the WebAssembly path: PEP 11 lists wasm32-unknown-wasip1 as Tier 2 and wasm32-unknown-emscripten as Tier 3, and PEP 776 records Emscripten support for Python 3.14. Those facts do not provide native capOS bindings for files, sockets, threads, process launch, or capabilities.

Future: Lua

Lua is a future capability-scoped scripting runner. The dedicated Lua Scripting proposal defines capos-lua as an ordinary userspace process with exact grants, curated standard libraries, unforgeable capability userdata, and no raw CapIds exposed to scripts. Upstream PUC Lua is a C implementation, so the native path waits on the C/libcapos substrate unless the project uses a pure-Rust Lua-like VM as a bootstrap proof.

Future: JavaScript / TypeScript

JavaScript support means running an engine as an ordinary capOS process. A small QuickJS-style native runner is the likely first experiment after C support. V8 or SpiderMonkey are much larger C++ runtime ports. TypeScript is normally compiled before execution and should not imply a kernel or base-system TypeScript compiler.

Partially landed: WASI and WebAssembly

The WASI host adapter Phase W.4 closed 2026-05-07 20:09 UTC (docs/proposals/wasi-host-adapter-proposal.md). Languages that compile to WASI Preview 1 can now run on capOS through the wasm-host process (capos-wasm/, vendored wasmi 1.0.9), with imports backed by granted capOS capabilities. The current Preview 1 surface covers stdout/stderr writes, manifest-granted argv, bounded manifest-granted environment entries through initConfig.init.wasiEnv, monotonic clock time/resolution, no-op sched_yield, stdio fd metadata, stdio seek refusal as ERRNO_SPIPE, clean shutdown, and random_get when the manifest grants EntropySource.

The regression smokes are make run-wasi-hello-rust (Rust wasm32-wasip1 payload), make run-wasi-hello-c (C wasm32-wasi payload), make run-wasi-cli-args, make run-wasi-env, make run-wasi-random, make run-wasi-random-ungranted, and make run-wasi-stdio-fd.

Filesystem (W.5), sockets (W.6), and Preview 2 / Component Model (W.7+) remain future phases; make run-wasi-preview1-refusals keeps proving that representative blocked storage/socket imports return ERRNO_NOSYS = 52 without authority.
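The refusal behaviour can be pictured as a small dispatch from import to errno. This is a sketch only: the `Import` enum and `errno_for` function are hypothetical, and treating `path_open` and `sock_send` as the blocked imports is an assumption, since the text does not name which imports the refusal smoke exercises. The numeric values follow the WASI Preview 1 errno numbering (nosys = 52, spipe = 70).

```rust
/// WASI Preview 1 errno values used by this sketch.
pub const ERRNO_SUCCESS: u16 = 0;
pub const ERRNO_NOSYS: u16 = 52;
pub const ERRNO_SPIPE: u16 = 70;

/// A few Preview 1 imports, reduced to the cases discussed in the text.
#[derive(Debug, Clone, Copy)]
pub enum Import {
    /// sched_yield: implemented as a no-op.
    SchedYield,
    /// fd_seek on fd 0, 1, or 2: stdio is not seekable.
    StdioSeek,
    /// path_open: storage import, no filesystem phase yet (assumed example).
    PathOpen,
    /// sock_send: socket import, no socket phase yet (assumed example).
    SockSend,
}

/// Map an import to the errno the host would return in this model.
pub fn errno_for(import: Import) -> u16 {
    match import {
        Import::SchedYield => ERRNO_SUCCESS,
        Import::StdioSeek => ERRNO_SPIPE,
        Import::PathOpen | Import::SockSend => ERRNO_NOSYS,
    }
}
```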

Important distinction: WASI works differently for compiled vs. interpreted languages:

  • Compiled languages (Rust, C) compile directly to .wasm — no interpreter in the loop. WASI is a clean, efficient execution path.
  • Interpreted languages (Python, JS, Lua) still need their interpreter (CPython, QuickJS, etc.) — it’s just compiled to .wasm instead of native code. The stack becomes: script → interpreter.wasm → WASI runtime → kernel. You pay for a wasm sandbox layer on top of the interpreter you’d need anyway.

For interpreted languages, WASI sandboxing is valuable when running untrusted plugins or user-submitted scripts. For trusted system scripts, native CPython, QuickJS, or Lua over a POSIX or capability-native adapter may be simpler and faster once the native C substrate exists.

Future: Managed Runtimes

Languages with large managed runtimes such as Java and .NET need their runtime ported or a WASI-style host path. This is a large effort and low priority.

Part 4: POSIX Compatibility Adapter

Status note: the full design lives in POSIX Adapter proposal and the implementation decomposition in POSIX Adapter, which are the canonical source for phase status. Phases P1.1 (libcapos C-substrate v0 + C hello smoke, closed 2026-05-05 13:28 UTC), P1.2 Phase A (UDP cap surface + capos-rt UdpSocketClient, closed 2026-05-05 18:02 UTC), P1.2 Phase B (kernel UDP path, libcapos-posix crate, dns.c vendoring, demo + manifest, closed 2026-05-05 21:21 UTC), and P1.3 (Pipe cap + recording-shim fork-for-exec + posix_spawn successor, closed 2026-05-07 09:55 UTC) have landed. The remaining open phase is the dash port successor (Task 4). The Namespace + File cap surface from Storage and Naming proposal has landed far enough for the v0 smoke; current POSIX-adapter work is now dash vendoring/patching, the multi-translation-unit C build, and the run-posix-shell-smoke harness. The signal/time stub slice is closed by make run-posix-signal-time. The sketch below remains for context; the dedicated proposal and plan are the source of truth for FdTable shape, supported-function matrix, and open questions.

Why POSIX at All?

capOS is not POSIX and doesn’t want to be. But:

  1. Existing software. Most useful software assumes POSIX. A DNS resolver, an HTTP server, a database – all speak open()/read()/write()/socket(). Without an adapter, every piece of software must be rewritten.

  2. Developer familiarity. Programmers know POSIX. A compatibility adapter lowers the barrier to writing capOS software, even if native caps are better.

  3. Gradual migration. Port software first with POSIX-shaped APIs, then incrementally convert to native capabilities for tighter sandboxing.

The goal is not full POSIX compliance. It is a pragmatic adapter that maps selected POSIX concepts to capabilities so existing software can run with bounded modification while preserving capability-based authority.

Architecture: libcapos-posix

Application (C/Rust, uses POSIX APIs)
  │
  │  open(), read(), write(), socket(), ...
  │
  v
libcapos-posix (POSIX-to-capability adapter)
  │
  │  Maps fds to caps, paths to granted directory/namespace lookups
  │
  v
libcapos (native capability invocation)
  │
  │  SQ/CQ ring + cap_enter syscall
  │
  v
Kernel (capability dispatch)

libcapos-posix is a static library that provides POSIX-like function signatures over granted capabilities. It is not an authority source and should not be described as “Linux compatibility.” A process without file/directory authority cannot open files; a process without socket authority cannot create sockets; a process without launcher or spawner authority cannot create children.

Current v0 surface (shipped as libcapos-posix.a alongside libcapos.a; see libcapos-posix/ and the canonical POSIX Adapter proposal):

  • Static-array fd table with a 32-fd cap (P1.2 Phase A decision §5).
  • Single-thread __errno_location() TLS cell (P1.2 Phase A decision §4).
  • socket(AF_INET, SOCK_DGRAM, 0) / sendto / recvfrom / close over the kernel UdpSocket capability (P1.2 Phase B).
  • pipe / read / write / dup / dup2 / close over the kernel Pipe capability via ProcessSpawner.createPipe (P1.3).
  • fork / execve / waitpid / _exit / posix_inherit_stdio via the recording-shim ProcessSpawner.spawn Move-grant path (P1.3 §6 decision: Variant A). fork() returns 0 unconditionally and opens a TLS recording window; dup2() / close() between fork and execve record into the window; execve() drains the recording into stdio_<N> spawn grants and returns the synthetic child pid (a deliberate v0 deviation from POSIX).
  • Direct posix_spawn / posix_spawn_file_actions_init / _destroy / _adddup2 / _addclose over the same Move-grant action-replay code path; argv / envp are accepted but ignored until a LaunchParameters surface lands.
  • open / read / write / close / lseek over the bootstrap root Directory and minted File caps; opendir / readdir / closedir over minted Directory caps.
  • Console/Terminal stdio adoption, focused printf / string / ctype helpers, manifest-backed getenv / setenv / putenv / unsetenv, and single-identity getpid / getuid / getgid stubs.
  • clock_gettime(CLOCK_MONOTONIC, ...) / gettimeofday(&tv, NULL) / time / nanosleep / sleep over the kernel Timer capability.
  • signal / sigaction store handlers without delivery; kill and raise fail closed until typed process-control authority exists.

C headers ship under libcapos-posix/include/capos/posix/ (errno.h, dirent.h, fcntl.h, signal.h, spawn.h, stdio.h, stdlib.h, string.h, sys/socket.h, sys/wait.h, time.h, unistd.h, and focused subsets such as ctype.h). libcapos-posix reuses libcapos’s installed Runtime through the renamed extern crate libcapos_::runtime::with(...) to avoid colliding with libcapos’s C-side capos_* exports.

Not yet implemented for the dash-port successor: file metadata/remove calls such as stat / fstat / access / unlink, TCP socket wrappers, select / poll / epoll, real asynchronous signal delivery, job control, chdir / cwd-relative path resolution, and broad FILE * stream semantics. These remain on the dash port successor track (Task 4 of docs/proposals/posix-adapter-proposal.md) or later typed-authority work.

File Descriptor Table

POSIX programs think in file descriptors. capOS has capabilities. The implemented v0 translation is a fixed 32-slot per-process fd table inside libcapos-posix. Slots may be backed by Console, UDP socket, Pipe, File, Directory, TerminalSession, or a moved-out sentinel used by the recording-shim execve() path.

Fd 0/1/2 are initialized only from explicit authority:

  • stdio_<N> Pipe grants seeded by a parent spawn action take precedence.
  • A bootstrap TerminalSession cap may adopt empty stdio slots when the program calls posix_inherit_stdio().
  • A bootstrap Console cap fills empty fd 1 and fd 2 for simple smokes.
  • Fd 0 stays closed unless the process received pipe or terminal input authority.
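The table shape and the stdio seeding rules can be modeled as follows. This is a host-side sketch with hypothetical names (`FdTable`, `Slot`, `seed`, `alloc`). It assumes the four rules are applied in the order listed, models `posix_inherit_stdio()` as a boolean flag, and assumes lowest-free-slot allocation, which is the POSIX convention but is not stated for libcapos-posix here.

```rust
/// Fixed table size from the v0 design.
pub const MAX_FDS: usize = 32;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Slot {
    Empty,
    Console,
    Pipe,
    Terminal,
    File,
}

pub struct FdTable {
    slots: [Slot; MAX_FDS],
}

impl FdTable {
    /// Seed fds 0..=2 from explicit authority only.
    /// `stdio_pipes[n]` means a stdio_<n> Pipe grant was received.
    pub fn seed(stdio_pipes: [bool; 3], terminal_adopted: bool, console: bool) -> Self {
        let mut slots = [Slot::Empty; MAX_FDS];
        for fd in 0..3 {
            if stdio_pipes[fd] {
                slots[fd] = Slot::Pipe; // parent-seeded grants take precedence
            } else if terminal_adopted {
                slots[fd] = Slot::Terminal; // adopts slots that are still empty
            } else if console && fd != 0 {
                slots[fd] = Slot::Console; // output only; fd 0 stays closed
            }
        }
        FdTable { slots }
    }

    pub fn get(&self, fd: usize) -> Slot {
        self.slots.get(fd).copied().unwrap_or(Slot::Empty)
    }

    /// Install `slot` in the lowest free descriptor, or None when all 32 are in use.
    pub fn alloc(&mut self, slot: Slot) -> Option<usize> {
        let fd = self.slots.iter().position(|s| *s == Slot::Empty)?;
        self.slots[fd] = slot;
        Some(fd)
    }
}
```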

Path Resolution

POSIX open("/etc/config.toml", O_RDONLY) becomes:

  1. libcapos-posix looks up the bootstrap-granted root Directory cap named root.
  2. It rejects relative paths, .., and non-UTF-8 or oversized path segments.
  3. It walks intermediate components with Directory.sub().
  4. It opens the leaf with Directory.open() or Directory.sub().
  5. It installs a File or Directory fd slot with per-fd position / iteration state.

The future Namespace + Store resolver remains documented in the POSIX adapter proposal, but the shipped v0 dash-port proof uses the RAM-backed root Directory capability because that is the implemented kernel authority.
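The validation in step 2 and the split between intermediate components and the leaf can be sketched as below. The function name, the error variants, and the 255-byte segment limit are assumptions for illustration, as is the choice to skip empty and `.` segments; the actual limits and edge-case handling in libcapos-posix are not given in this section.

```rust
/// Assumed per-segment limit; the real bound is not stated here.
pub const MAX_SEGMENT_LEN: usize = 255;

#[derive(Debug, PartialEq)]
pub enum PathError {
    NotUtf8,
    Relative,
    ParentTraversal,
    SegmentTooLong,
}

/// Validate an absolute path and split it into components to walk from the root.
/// All but the last component would be resolved with Directory.sub(); the last
/// one is the leaf to open.
pub fn split_absolute(path: &[u8]) -> Result<Vec<&str>, PathError> {
    let text = std::str::from_utf8(path).map_err(|_| PathError::NotUtf8)?;
    let rest = text.strip_prefix('/').ok_or(PathError::Relative)?;
    let mut segments = Vec::new();
    for segment in rest.split('/') {
        match segment {
            // Assumption: repeated slashes and "." are skipped rather than rejected.
            "" | "." => continue,
            ".." => return Err(PathError::ParentTraversal),
            s if s.len() > MAX_SEGMENT_LEN => return Err(PathError::SegmentTooLong),
            s => segments.push(s),
        }
    }
    Ok(segments)
}
```

Rejecting `..` outright, instead of normalizing it, keeps resolution a pure downward walk from the granted root, so a path can never name something outside that authority.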

Supported POSIX Functions

Grouped by what capability backs them:

Console cap -> stdio:

| POSIX | capOS translation |
|---|---|
| write(1, buf, len) | console.write(buf[..len]) |
| write(2, buf, len) | console.write(buf[..len]) (or log cap) |
| read(0, buf, len) | Pipe or TerminalSession-backed stdin when granted |

Directory + File caps -> file I/O:

| POSIX | capOS translation |
|---|---|
| open(path, flags) | root Directory walk -> Directory.open() -> fd |
| read(fd, buf, len) | File.read(offset, len) using per-fd position |
| write(fd, buf, len) | File.write(offset, bytes) using per-fd position |
| close(fd) | drop/release the backing cap slot |
| lseek(fd, off, whence) | update per-fd file position |
| opendir/readdir/closedir | Directory.list() plus per-fd iteration |
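The per-fd position bookkeeping behind read() and lseek() can be modeled as below. In this sketch a `Vec<u8>` stands in for the File capability, and `FileFd` and `Whence` are hypothetical names; the real adapter would issue File.read(offset, len) calls instead of slicing local memory.

```rust
pub enum Whence {
    Set,
    Cur,
    End,
}

/// One fd slot: a backing object plus the position POSIX expects the fd to carry.
pub struct FileFd {
    data: Vec<u8>, // stand-in for the File capability's contents
    pos: u64,
}

impl FileFd {
    pub fn new(data: Vec<u8>) -> Self {
        FileFd { data, pos: 0 }
    }

    /// read(fd, buf, len): read at the current position, then advance it.
    pub fn read(&mut self, buf: &mut [u8]) -> usize {
        let start = (self.pos as usize).min(self.data.len());
        let n = buf.len().min(self.data.len() - start);
        buf[..n].copy_from_slice(&self.data[start..start + n]);
        self.pos += n as u64;
        n
    }

    /// lseek(fd, off, whence): only the local position changes; no capability call.
    pub fn lseek(&mut self, off: i64, whence: Whence) -> Option<u64> {
        let base = match whence {
            Whence::Set => 0,
            Whence::Cur => self.pos as i64,
            Whence::End => self.data.len() as i64,
        };
        let target = base.checked_add(off)?;
        if target < 0 {
            return None; // POSIX reports EINVAL for a negative resulting offset
        }
        self.pos = target as u64;
        Some(self.pos)
    }
}
```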

Pipe + ProcessSpawner caps -> subprocess I/O:

| POSIX | capOS translation |
|---|---|
| pipe(fds) | ProcessSpawner.createPipe() -> two Pipe-backed fds |
| fork() + execve() | recording shim -> ProcessSpawner.spawn() |
| posix_spawn() | direct action replay -> ProcessSpawner.spawn() |
| waitpid(pid, &status, 0) | ProcessHandle.wait() |

UdpSocket caps -> networking:

| POSIX | capOS translation |
|---|---|
| socket(AF_INET, SOCK_DGRAM, 0) | NetworkManager.createUdpSocket() -> fd |
| sendto / recvfrom | UdpSocket.sendTo() / UdpSocket.recvFrom() |
| close(fd) | release the owned UdpSocket cap |

Timer + local stubs:

| POSIX | capOS translation |
|---|---|
| clock_gettime / gettimeofday / time | Timer.now() |
| nanosleep / sleep | Timer.sleep() |
| signal / sigaction | store handler locally, never deliver |
| kill / raise | validate signal number, then fail closed |
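The time wrappers mostly reduce to unit conversion. The sketch below assumes Timer.now() and Timer.sleep() work in nanoseconds as a `u64`, which this section does not state; `Timespec` and both helper functions are hypothetical names.

```rust
const NANOS_PER_SEC: u64 = 1_000_000_000;

#[derive(Debug, PartialEq, Clone, Copy)]
pub struct Timespec {
    pub tv_sec: i64,
    pub tv_nsec: i64,
}

/// clock_gettime direction: split a nanosecond reading into seconds and remainder.
pub fn timespec_from_nanos(ns: u64) -> Timespec {
    Timespec {
        tv_sec: (ns / NANOS_PER_SEC) as i64,
        tv_nsec: (ns % NANOS_PER_SEC) as i64,
    }
}

/// nanosleep direction: validate the request and flatten it to nanoseconds.
/// Returns None for values POSIX rejects with EINVAL, or on overflow.
pub fn nanos_from_timespec(ts: Timespec) -> Option<u64> {
    if ts.tv_sec < 0 || ts.tv_nsec < 0 || ts.tv_nsec as u64 >= NANOS_PER_SEC {
        return None;
    }
    (ts.tv_sec as u64)
        .checked_mul(NANOS_PER_SEC)?
        .checked_add(ts.tv_nsec as u64)
}
```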

Not supported or still partial:

| POSIX | Why not |
|---|---|
| bare fork() state cloning | No address space cloning; only fork-for-exec is recorded |
| in-place exec() replacement | Spawn creates a fresh process |
| real signal delivery / job control | Needs typed process-control and terminal authority |
| chmod/chown | No permission bits. Authority is structural |
| mmap(MAP_SHARED) | No shared memory yet (future: SharedMemory cap) |
| ioctl | No device files. Use typed capability methods |
| ptrace | No debugging interface yet |
| select/poll/epoll | Requires async cap invocation (Stage 5+). Initial version is blocking only |

Process Creation Compatibility

capOS process creation is spawn-style, not fork/exec-style. A new process is a fresh ELF instance selected by ProcessSpawner, with an explicit initial CapSet assembled from granted capabilities. The parent address space is not cloned, and an existing process image is not replaced in place.

posix_spawn() is the compatibility primitive for subprocess creation. libcapos-posix (P1.3, closed 2026-05-07 09:55 UTC) maps it to ProcessSpawner.spawn(), translates posix_spawn_file_actions into fd-table setup and Move-grant stdio_<N> capability grants on the spawn ABI. argv / envp are accepted but ignored until a LaunchParameters surface lands. make run-posix-spawn-smoke is the end-to-end proof.

Full fork() is intentionally not a native kernel primitive. Supporting it would require copy-on-write address-space cloning, parent/child register return semantics, fd-table duplication, a per-capability inheritance policy, safe handling for outstanding SQEs/CQEs, and defined behavior for endpoint calls, timers, waits, and process handles that are in flight at the fork point. Threaded POSIX processes add another constraint: only the calling thread is cloned, while locks and async-signal-safe state must remain coherent in the child.

P1.3 also shipped a narrow recording-shim fork() for the common fork-for-exec pattern that does not require general address-space cloning. fork() returns 0 unconditionally and opens a TLS recording window; dup2() / close() between fork and execve record into the window without mutating the parent fd table; execve() drains the recording into Move-grant stdio_<N> spawn grants and returns the synthetic child pid as its own return value. The pseudo-child branch is still the parent process, so a failed execve() MUST NOT call _exit() – it must surface the error to the parent’s normal error path. The user pattern is pid_t child = fork(); if (child == 0) { dup2(); close(); child = execve(...); } /* parent flow */. Earlier iterations used x86_64 setjmp/longjmp to fake fork-return-twice; that was replaced because longjmp back into fork()’s already-returned stack frame was undefined behaviour. make run-posix-pipe-smoke is the end-to-end proof.
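The recording window can be modeled on the host as a list of deferred actions that execve() replays into grants. This is a sketch with hypothetical names (`Shim`, `Action`) and assumed replay semantics: each dup2 maps a child fd to the parent fd it ultimately came from, close removes a mapping, and only fds 0 to 2 become `stdio_<N>` grants. The real libcapos-posix rules for edge cases are not described in this section.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Action {
    Dup2 { from: i32, to: i32 },
    Close(i32),
}

#[derive(Default)]
pub struct Shim {
    window: Option<Vec<Action>>, // Some(..) between fork() and execve()
    last_pid: i32,
}

impl Shim {
    /// Opens the recording window; always returns 0 (the pseudo-child branch).
    pub fn fork(&mut self) -> i32 {
        self.window = Some(Vec::new());
        0
    }

    /// Records instead of touching the parent fd table; false if no window is open.
    pub fn dup2(&mut self, from: i32, to: i32) -> bool {
        self.record(Action::Dup2 { from, to })
    }

    pub fn close(&mut self, fd: i32) -> bool {
        self.record(Action::Close(fd))
    }

    fn record(&mut self, action: Action) -> bool {
        match self.window.as_mut() {
            Some(actions) => {
                actions.push(action);
                true
            }
            None => false,
        }
    }

    /// Drains the recording into (grant name, parent fd) pairs and returns a
    /// synthetic child pid. Returns None when no fork() window is open.
    pub fn execve(&mut self) -> Option<(i32, Vec<(String, i32)>)> {
        let actions = self.window.take()?;
        let mut child: BTreeMap<i32, i32> = BTreeMap::new();
        for action in actions {
            match action {
                Action::Dup2 { from, to } => {
                    let source = child.get(&from).copied().unwrap_or(from);
                    child.insert(to, source);
                }
                Action::Close(fd) => {
                    child.remove(&fd);
                }
            }
        }
        let grants = child
            .into_iter()
            .filter(|(to, _)| (0..=2).contains(to))
            .map(|(to, source)| (format!("stdio_{to}"), source))
            .collect();
        self.last_pid += 1;
        Some((self.last_pid, grants))
    }
}
```

For the common `dup2(pipe_write_end, 1); close(pipe_write_end); execve(...)` sequence, the model yields a single `stdio_1` grant that points at the parent's pipe fd.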

make run-posix-dns-smoke exercises socket(AF_INET, SOCK_DGRAM, 0) / sendto / recvfrom against the kernel UdpSocket capability through a hand-rolled DNS A query in demos/posix-dns-resolver/. The current smoke does not compile the vendored dns.c whole because the v0 libcapos-posix POSIX surface is narrower than dns.c expects (poll.h, netinet/in.h, arpa/inet.h, netdb.h, sys/select.h, sys/un.h); widening that surface is follow-on work on the dash port track.

Security Model

The POSIX compatibility adapter does not weaken capability security. Every POSIX call translates to a capability invocation on caps the process was actually granted:

  • open("/etc/passwd") fails if the process lacks a bootstrap root Directory cap or that directory tree does not contain etc/passwd – not because of permission bits, but because no granted authority resolves the path.
  • socket(AF_INET, SOCK_DGRAM, 0) fails if the process was not granted a NetworkManager cap; TCP stream wrappers remain future work.
  • fork() only opens the recording window for the supported fork-for-exec pattern; bare address-space cloning remains unsupported.

A POSIX binary on capOS is more constrained than on Linux, not less. The compatibility adapter provides familiar function signatures, not familiar authority.

Building POSIX-Compatible Binaries

my-app/
  Cargo.toml        # depends on libcapos-posix (which depends on libcapos)
  src/main.rs       # uses libc-style APIs

Or for C:

#include <capos/posix/fcntl.h>       // open, O_RDONLY
#include <capos/posix/sys/socket.h>  // socket, sendto, recvfrom
#include <capos/posix/unistd.h>      // read, write, close

int main(void) {
    // Works -- stdout is mapped to Console cap
    write(1, "hello\n", 6);

    // Works -- if the process was granted a root Directory cap;
    // otherwise open() returns -1 and the block is skipped
    int fd = open("/config.toml", O_RDONLY);
    if (fd >= 0) {
        char buf[4096];
        ssize_t n = read(fd, buf, sizeof(buf));
        (void)n;  // a real program would check n and use buf
        close(fd);
    }

    // Works -- if NetworkManager cap was granted; TCP is not in v0
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock >= 0)
        close(sock);
    return 0;
}

The linker pulls in libcapos-posix.a -> libcapos.a -> startup code. Same ELF output, same kernel loader.

musl as a Base (Optional, Later)

For broader C compatibility (printf, string functions, math), libcapos-posix can be layered under musl libc. musl has a clean syscall interface – all system calls go through a single __syscall() function. Replacing that function with capability-based dispatch gives you full libc on top of capOS capabilities:

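// musl's syscall entry point -- we replace this.
// Sketch: arguments arrive as register-sized longs (up to six);
// the capos_* helpers are illustrative names.
long __syscall(long n, long a, long b, long c, long d, long e, long f) {
    (void)d; (void)e; (void)f;  // unused by the cases shown here
    switch (n) {
        case SYS_write:  return capos_write((int)a, (const void *)b, (size_t)c);
        case SYS_open:   return capos_open((const char *)a, (int)b, (mode_t)c);
        case SYS_socket: return capos_socket((int)a, (int)b, (int)c);
        // ...
        default: return -ENOSYS;
    }
}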

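This is close to what Fuchsia does with fdio underneath its musl-derived libc, and comparable in spirit to Redox OS’s relibc, which implements the C library on top of Redox’s own system interface. The approach is proven, and it gives you printf, fopen, getaddrinfo, and most of the C standard library.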

Priority: after native capos-rt and libcapos are stable. musl integration is a significant engineering effort and should only be done when there’s actual software to port.

Part 5: WASI Host Adapter

Note: the full design lives in WASI Host Adapter proposal and the implementation decomposition in WASI Host Adapter. The sketch below remains for context; the dedicated proposal is the source of truth for runtime selection (wasmi for v0; wasmtime / WAMR as W.7+ migration), capability-mapping surface, per-instance CapSet plumbing, phase decomposition, and open questions.

Why WASI Fits capOS Better Than POSIX

WASI (WebAssembly System Interface) was designed from the start as a capability-based system interface. Its concepts map almost directly to capOS:

| WASI concept | capOS equivalent |
| --- | --- |
| fd (pre-opened directory) | Namespace cap |
| fd (socket) | TcpSocket/UdpSocket cap |
| fd_write on stdout | Console.write() |
| Pre-opened dirs at startup | CapSet at spawn |
| No ambient filesystem access | No ambient authority |
| path_open scoped to pre-opened dir | namespace.resolve() scoped to granted prefix |

WASI programs already assume they get no ambient authority. A WASI binary compiled for capOS still needs a host adapter, but the security model is closer to capOS than POSIX because preopened handles are explicit.

Architecture: Wasm Runtime as a capOS Service

WASI binary (.wasm)
  │
  │  WASI syscalls (fd_read, fd_write, path_open, ...)
  │
  v
wasm-runtime process (Wasmtime/wasm-micro-runtime, native capOS binary)
  │
  │  Translates WASI calls to capability invocations
  │  Each wasm instance gets its own CapSet
  │
  v
libcapos (native capability invocation)
  │
  v
Kernel

The wasm runtime is itself a native capOS process. It receives caps from its parent and partitions them among the wasm modules it hosts. This gives you:

  • Language independence. Any language with a useful WASI target can be evaluated through the same host adapter.
  • Extra sandboxing. Wasm memory isolation combines with capOS capability scoping.
  • Less porting effort for software that already targets WASI, assuming its required imports are implemented by the host adapter.
  • Density. Multiple wasm modules in one process, each with different caps.

WASI vs Native Performance

Wasm adds overhead: bounds-checked memory, indirect calls, and host-call marshalling. For foundational system services, native Rust remains the default choice until there is a concrete reason to choose otherwise. For application code and portable tools, the sandboxing and reuse may be worth the overhead.

WASI Implementation Phases

The current shipped state is owned by WASI Host Adapter and WASI Host Adapter proposal; the phase status summary below is a pointer, not the source of truth.

Phase W.0 (planning, closed): runtime decision recorded as wasmi for v0; WAMR / wasmtime are W.7+ migration candidates. The earlier “wasm-micro-runtime as a C binary via libcapos” sketch is superseded by wasmi-as-a-Rust-crate inside the standalone capos-wasm/ package. Cross-cutting Open Questions §1 (per-instance vs per-process) and §3 (poll_oneoff semantics over the capOS ring) resolved 2026-05-13 16:46 UTC: one wasm instance per capos-wasm process, and poll_oneoff stays ERRNO_NOSYS in v0 with subscription kinds extended one at a time through W.5/W.6 against a single blocking cap_enter.

Phase W.1 (host scaffold, closed 2026-05-05 19:12 UTC): capos-wasm/ standalone userspace crate over vendored wasmi 1.0.9 (vendor/wasmi-no_std/wasmi-1.0.9/); make capos-wasm-build.

Phase W.2 (Preview 1 stdout-only, closed 2026-05-07 10:53 UTC): wasm-host userspace binary, empty-instantiation smoke (make run-wasm-host), Preview 1 stdout-only import resolver (args_get / environ_get empty, clock_time_get(MONOTONIC), proc_exit, fd_write(1, …) / fd_write(2, …); everything else including random_get returns ERRNO_NOSYS), manifest-payload load path through an optional BootPackage cap, Rust hello, wasi (make run-wasi-hello-rust), and C hello, wasi (make run-wasi-hello-c).
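The stdout-only resolver shape can be sketched as a host-side Rust model. This is not the capos-wasm code: HostState, is_implemented, stub_errno, and the choice of ERRNO_BADF for fds other than 1 and 2 are assumptions, and the implemented-import list mirrors only the names mentioned in the paragraph above.

```rust
const ERRNO_SUCCESS: u16 = 0;
const ERRNO_BADF: u16 = 8;
const ERRNO_NOSYS: u16 = 52;

/// Preview 1 imports the text names as implemented in the W.2 slice.
const IMPLEMENTED: [&str; 5] = [
    "args_get",
    "environ_get",
    "clock_time_get",
    "proc_exit",
    "fd_write",
];

fn is_implemented(import: &str) -> bool {
    IMPLEMENTED.iter().any(|name| *name == import)
}

/// Errno a stubbed import reports; None means a real handler exists.
fn stub_errno(import: &str) -> Option<u16> {
    if is_implemented(import) {
        None
    } else {
        Some(ERRNO_NOSYS)
    }
}

/// Hypothetical host state: bytes that would be forwarded to the Console cap.
#[derive(Default)]
struct HostState {
    console: Vec<u8>,
}

/// fd_write for the stdout-only slice; returns (errno, bytes written).
fn fd_write(host: &mut HostState, fd: u32, iovs: &[&[u8]]) -> (u16, usize) {
    if fd != 1 && fd != 2 {
        return (ERRNO_BADF, 0); // assumption: other fds are simply not open
    }
    let mut written = 0;
    for iov in iovs {
        host.console.extend_from_slice(iov);
        written += iov.len();
    }
    (ERRNO_SUCCESS, written)
}
```

Failing closed by default means a newly referenced import needs no special handling: anything not promoted into the implemented set keeps returning ERRNO_NOSYS.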

Phase W.3 (per-instance argv grant, closed 2026-05-07 18:25 UTC): bounded initConfig.init.wasiArgs text grant on top of the existing manifest CapSet, validated against WASI_ARGS_MAX_COUNT = 32, WASI_ARGS_MAX_ARG_BYTES = 4096, and WASI_ARGS_MAX_TOTAL_BYTES = 8192. The wasm-host installs the bundle on HostState before instantiation, and Preview 1 args_get / args_sizes_get reflect it. make run-wasi-cli-args is the end-to-end proof. A 2026-05-13 follow-up adds the same bounded-text shape for initConfig.init.wasiEnv (WASI_ENV_MAX_COUNT = 32, WASI_ENV_MAX_ENTRY_BYTES = 4096, WASI_ENV_MAX_TOTAL_BYTES = 8192) with make run-wasi-env and make wasi-env-negative-check.
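The three argv limits compose as a single validation pass, sketched below in Rust. The constants come from the text; ArgsError, validate_wasi_args, and the decision to count raw argument bytes without NUL terminators are assumptions made for illustration.

```rust
const WASI_ARGS_MAX_COUNT: usize = 32;
const WASI_ARGS_MAX_ARG_BYTES: usize = 4096;
const WASI_ARGS_MAX_TOTAL_BYTES: usize = 8192;

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
enum ArgsError {
    TooMany,
    ArgTooLong { index: usize },
    TotalTooLong,
}

/// Check an argv bundle against the three bounds; returns the total byte count.
/// Assumption: totals count raw argument bytes, without NUL terminators.
fn validate_wasi_args(args: &[&str]) -> Result<usize, ArgsError> {
    if args.len() > WASI_ARGS_MAX_COUNT {
        return Err(ArgsError::TooMany);
    }
    let mut total = 0usize;
    for (index, arg) in args.iter().enumerate() {
        if arg.len() > WASI_ARGS_MAX_ARG_BYTES {
            return Err(ArgsError::ArgTooLong { index });
        }
        total += arg.len();
        if total > WASI_ARGS_MAX_TOTAL_BYTES {
            return Err(ArgsError::TotalTooLong);
        }
    }
    Ok(total)
}
```

Note that the total bound is tighter than count times per-argument size: two maximum-length arguments already exhaust it, so all three checks are needed.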

Phase W.4 (random_get production + clocks production-ready, closed 2026-05-07 20:09 UTC): Preview 1 random_get routed through the kernel EntropySource cap when the manifest grants it, chunked at the cap’s MAX_ENTROPY_FILL_BYTES = 64 ceiling and capped per Preview 1 invocation at RANDOM_GET_MAX_BYTES = 65_536 bytes; ungranted variant refuses with ERRNO_NOSYS = 52. make run-wasi-random and make run-wasi-random-ungranted are the granted/ungranted proofs. clock_time_get(CLOCKID_REALTIME) keeps returning ERRNO_NOSYS until a typed WallClock cap exists. A 2026-05-13 compatibility-import slice promotes authority-free Preview 1 imports (clock_res_get(MONOTONIC), sched_yield, stdio fd_fdstat_get metadata, stdio fd_seek returning ERRNO_SPIPE) through make run-wasi-stdio-fd. make run-wasi-preview1-refusals keeps representative blocked storage and socket imports failed closed with ERRNO_NOSYS = 52.
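The chunking and refusal rules for random_get can be sketched as follows. The two size constants and ERRNO_NOSYS = 52 come from the text; the EntropySource trait shape, the CountingSource test double, and returning ERRNO_INVAL for oversized requests are assumptions, since the text does not say which errno the per-invocation cap produces.

```rust
const MAX_ENTROPY_FILL_BYTES: usize = 64;
const RANDOM_GET_MAX_BYTES: usize = 65_536;
const ERRNO_SUCCESS: u16 = 0;
const ERRNO_INVAL: u16 = 28;
const ERRNO_NOSYS: u16 = 52;

/// Hypothetical stand-in for the kernel EntropySource cap.
trait EntropySource {
    /// Fills `buf`; callers must pass at most MAX_ENTROPY_FILL_BYTES.
    fn fill(&mut self, buf: &mut [u8]);
}

/// Preview 1 random_get: refuse when ungranted, otherwise fill in cap-sized chunks.
fn random_get(source: Option<&mut dyn EntropySource>, buf: &mut [u8]) -> u16 {
    let source = match source {
        Some(source) => source,
        None => return ERRNO_NOSYS, // no EntropySource grant in the manifest
    };
    if buf.len() > RANDOM_GET_MAX_BYTES {
        return ERRNO_INVAL; // assumption: the errno for oversized requests is not stated
    }
    for chunk in buf.chunks_mut(MAX_ENTROPY_FILL_BYTES) {
        source.fill(chunk);
    }
    ERRNO_SUCCESS
}

/// Test double, not a real entropy source: records how it was called.
struct CountingSource {
    calls: usize,
    largest_chunk: usize,
}

impl EntropySource for CountingSource {
    fn fill(&mut self, buf: &mut [u8]) {
        self.calls += 1;
        self.largest_chunk = self.largest_chunk.max(buf.len());
        buf.fill(0xA5);
    }
}
```

A 200-byte request in this model turns into four fills of 64, 64, 64, and 8 bytes, so the cap's per-call ceiling is never exceeded.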

Phase W.5 (filesystem against Namespace / File / Store, blocked): waits on the storage cap surface from Storage and Naming proposal. Until then, make run-wasi-preview1-refusals is the refusal evidence.

Phase W.6 (sockets against TcpSocket / UdpSocket, blocked): waits on a userspace network stack process (or an interim Fetch / HttpEndpoint shim) from Networking proposal. Same refusal evidence as W.5 in the interim.

Phase W.7 (Preview 2 / Component Model + wasmtime migration, blocked): waits on the std-userspace decision (same blocker as the capnp-rpc remote-session rewrite). When it lands, WIT resources map to typed OwnedCapability<T> slots in the host adapter and the schema gains the Component Model resource bridging variants.

Phase W.8 (TinyGo / Go-on-WASI CUE evaluator, blocked): waits on the same std-userspace decision; native GOOS=capos remains the path for full Go runtime semantics.

Part 6: Putting It All Together – Porting Strategy

Spectrum of Integration

Most native                                              Most compatible
     |                                                          |
     v                                                          v
Native Rust    C with libcapos    POSIX adapter         WASI binary
(capos-rt)     (typed caps)       (libcapos-posix)      (wasm runtime)

- Best perf     - Good perf        - Familiar API        - Any language
- Full cap      - Full cap         - Auto sandboxing     - Auto sandboxing
  control         control            via cap scoping       via wasm + caps
- Most work     - Moderate work    - Less rewrite        - Less rewrite
  to write        to write           for existing C        for WASI targets

Example: Porting a DNS Resolver

Native Rust: Rewrite using capos-rt. Receives UdpSocket cap, serves DNS lookups as a DnsResolver capability. Other processes get a DnsResolver cap instead of calling getaddrinfo(). Clean, typed, minimal authority.

C with POSIX adapter: Take an existing DNS resolver (e.g., musl’s getaddrinfo implementation or a standalone resolver). Compile against libcapos-posix. Give it a UdpSocket cap and a Namespace cap for /etc/resolv.conf. It calls socket(), sendto(), recvfrom() – all translated to cap invocations. Works with minimal changes, but can’t export a typed DnsResolver cap (it speaks POSIX, not caps).

WASI: Compile a Rust DNS resolver to WASI. Run it in the wasm runtime. Same capability scoping, but through the wasm sandbox.

  1. Foundational services: native Rust by default. Drivers, network stack, store, and init are the foundation and should use capabilities natively unless a concrete reviewed reason justifies another runtime.

  2. First applications: native Rust. While the ecosystem is young, applications should use capos-rt directly. This validates the cap model.

  3. C compatibility: when porting specific software. Do not build the POSIX adapter speculatively. Build it when there is a specific C program to port (e.g., a DNS resolver, an HTTP server, a database). Let real porting needs drive which POSIX functions to implement.

  4. WASI: as the general-purpose application runtime. Once the native runtime is stable, the wasm runtime becomes the “run anything” answer. Lower priority than native Rust, but higher priority than full POSIX/musl compat, because WASI’s capability model is a natural fit.

Part 7: Schema Extensions

Schema types introduced for the userspace runtime:

# Extend schema/capos.capnp

struct InitialCaps {
    entries @0 :List(InitialCapEntry);
}

struct InitialCapEntry {
    name @0 :Text;
    id @1 :UInt32;
    interfaceId @2 :UInt64;
}

interface ProcessSpawner {
    spawn @0 (name :Text, binaryName :Text, grants :List(CapGrant)) -> (handleIndex :UInt16);
}

struct CapGrant {
    name @0 :Text;
    capId @1 :UInt32;
    interfaceId @2 :UInt64;
}

interface ProcessHandle {
    wait @0 () -> (exitCode :Int64);
}

These definitions now live in schema/capos.capnp as the single source of truth. spawn() returns the ProcessHandle through the ring result-cap list; handleIndex identifies that transferred cap in the completion. The first slice passes a boot-package binaryName instead of raw ELF bytes so spawn requests stay inside the bounded ring parameter buffer; manifest-byte exposure and bulk-buffer spawning remain later work. kill, post-spawn grants, and exported-cap lookup are deferred until their lifecycle semantics are implemented.
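How handleIndex relates to the result-cap list can be shown with a small Rust model. Everything here is hypothetical scaffolding rather than the capos-rt API: ModelSpawner, SpawnCompletion, and adopt_process_handle only illustrate that the spawn result carries an index, and that the ProcessHandle itself arrives as a transferred cap that the caller looks up by that index.

```rust
use std::collections::BTreeMap;

/// Mirrors the CapGrant fields from the schema.
#[derive(Debug, Clone, PartialEq)]
struct CapGrant {
    name: String,
    cap_id: u32,
    interface_id: u64,
}

/// What the caller sees: the handleIndex result plus the transferred caps.
#[derive(Debug)]
struct SpawnCompletion {
    handle_index: u16,
    result_caps: Vec<u32>,
}

#[derive(Debug, PartialEq)]
enum SpawnError {
    UnknownBinary,
    BadHandleIndex,
}

/// Hypothetical spawner: binaries are looked up by name in the boot package,
/// so the request carries a short name instead of ELF bytes.
struct ModelSpawner {
    boot_package: BTreeMap<String, Vec<u8>>,
    next_cap: u32,
}

impl ModelSpawner {
    fn spawn(
        &mut self,
        _name: &str,
        binary_name: &str,
        _grants: &[CapGrant],
    ) -> Result<SpawnCompletion, SpawnError> {
        if !self.boot_package.contains_key(binary_name) {
            return Err(SpawnError::UnknownBinary);
        }
        let handle_cap = self.next_cap;
        self.next_cap += 1;
        // The ProcessHandle travels in the result-cap list; the result only names its slot.
        Ok(SpawnCompletion { handle_index: 0, result_caps: vec![handle_cap] })
    }
}

/// Resolve handleIndex against the transferred caps, refusing out-of-range indices.
fn adopt_process_handle(completion: &SpawnCompletion) -> Result<u32, SpawnError> {
    completion
        .result_caps
        .get(usize::from(completion.handle_index))
        .copied()
        .ok_or(SpawnError::BadHandleIndex)
}
```

Bounds-checking the index before adoption keeps a malformed completion from being treated as a valid handle.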

Implementation Status And Future Phases

Implemented Baseline: capos-rt

  • capos-rt/ exists as a standalone no_std + alloc runtime crate.
  • capos-rt owns _start, heap initialization, panic output, raw syscall wrappers, bootstrap validation, CapSet parsing, the entry-point macro, the single-owner ring client, typed clients, result-cap adoption, and owned handle release.
  • init/, shell/, demos/, and the runtime smoke binary build for targets/x86_64-unknown-capos.json.
  • QEMU proofs cover typed Console calls, exception decoding, spawn/wait, runtime VirtualMemory, Timer, ThreadControl, ThreadSpawner, ThreadHandle, terminal sessions, and release behavior.

Deliverable: completed. See Userspace Runtime and Programming Languages for current validation.

Future Phase: broader generated/native clients

  • Add generated clients after the schema surface stabilizes.
  • Preserve the existing split where capos-rt owns transport lifetime and interface-specific wrappers own message encoding.
  • Establish the out-of-tree service-binary packaging pattern once the internal userspace target contract is stable.

Deliverable: ordinary native capOS services can depend on generated typed clients without copying runtime transport logic.

libcapos for C – Phase 0 closed

  • extern "C" API exposing capos_cap_call, capos_capset_get, capos_sys_exit, capos_sys_cap_enter, capos_console_write_line, capos_timer_now, capos_entropy_fill, capos_virtual_memory_*, capos_process_spawner_create_pipe, capos_pipe_read, capos_pipe_write, capos_pipe_close, and malloc/free/calloc/realloc heap shims over the capos-rt global allocator.
  • Public header at libcapos/include/capos/capos.h.
  • Build system: make libcapos produces libcapos/target/x86_64-unknown-capos/release/libcapos.a; make c-hello and make c-pipe link native C smokes with clang + lld using the shared demos/linker.ld.
  • C “hello world” smoke at demos/c-hello/main.c calls Console.writeLine through capos_console_write_line, exercises Timer, EntropySource, and VirtualMemory typed wrappers, verifies capos_cap_call rejects a bootstrap ThreadSpawner cap locally, and exits cleanly. make run-c-hello boots system-c-hello.cue and the smoke greps for the [c-hello] hello from c-hello, entropy, VM, and ThreadSpawner rejection markers plus the kernel process N exited with code 0 line.
  • Native C pipe smoke at demos/c-pipe/main.c uses capos_process_spawner_create_pipe, writes and reads native-c-pipe-marker through typed Pipe wrappers, closes the write end, observes EOF, and exits cleanly. make run-c-pipe boots system-c-pipe.cue and checks the create, read, EOF, and clean-exit markers.

Deliverable: complete – C binary boots, calls Console.writeLine, and exits cleanly through capos_sys_exit.

Deferred to later libcapos phases: generated typed wrappers per interface, transferred result-cap propagation across the C ABI, per-thread routing of the runtime ring, and a libcapos-posix layer.

Future Phase: POSIX compatibility adapter

  • Implement FdTable and path resolution
  • Start with file I/O (open/read/write/close over Namespace + Store)
  • Add socket wrappers when networking is userspace
  • Optionally integrate musl for full libc

Deliverable: an existing C program (e.g., a simple HTTP server) runs on capOS with minimal source changes.

WASI runtime (partially landed)

The WASI host adapter is its own track owned by docs/proposals/wasi-host-adapter-proposal.md and its companion implementation decomposition. Phase decomposition:

  • W.1 (host scaffold; landed 2026-05-05 19:12 UTC): capos-wasm/ standalone crate over vendored wasmi 1.0.9 (vendor/wasmi-no_std/wasmi-1.0.9/), make capos-wasm-build.
  • W.2 (Preview 1 stdout-only; closed 2026-05-07 10:53 UTC): wasm-host userspace binary, make run-wasm-host empty-instantiation smoke, Preview 1 stdout-only import resolver, manifest-payload load path, Rust hello, wasi smoke (make run-wasi-hello-rust), and C hello, wasi smoke (make run-wasi-hello-c). Capabilities backing the host imports today: Console + Timer + BootPackage. v0 chose wasmi-as-Rust-crate over wasm-micro-runtime-as-C-binary; wasmtime / WAMR remain W.7+ migration candidates.
  • W.3 (per-instance CapSet plumbing + LaunchParameters) closed 2026-05-07 18:25 UTC.
  • W.4 (random_get against the in-tree EntropySource cap, plus clocks production-ready) closed 2026-05-07 20:09 UTC.
  • 2026-05-13 compatibility/refusal smokes: make run-wasi-stdio-fd proves promoted authority-free imports no longer return ERRNO_NOSYS; make run-wasi-preview1-refusals keeps storage and socket imports failed closed without authority.
  • W.5 (filesystem against Namespace/File/Store), W.6 (sockets against TcpSocket/UdpSocket), and W.7+ (Preview 2 / Component Model) remain future phases.

Deliverable status: hello.wasm runs on capOS today (both Rust and C payloads), argv and entropy grants are implemented, and authority-free stdio fd compatibility imports are covered by a direct smoke. Filesystem/socket phases are queued behind their authority surfaces.

Open Questions

  1. Allocator strategy. Should the userspace heap be a fixed-size region (simple, but limits memory), or should it grow by invoking a FrameAllocator cap (flexible, but every allocation might syscall)? Likely answer: fixed initial region + grow-on-demand via cap.

  2. Async I/O. The SQ/CQ ring is inherently asynchronous (submit SQEs, poll CQEs), but the initial capos-rt wrappers provide blocking convenience (submit one CALL SQE + cap_enter(1, MAX)). Real services need batched async patterns. Options:

    • Submit multiple SQEs, poll CQEs in an event loop (io_uring style)
    • Runtime green threads or tasks multiplexed through one ring dispatcher; the 7.1 threading contract keeps at most one blocked cap_enter waiter per process ring until a sharded or per-thread ring ABI exists
    • Userspace executor (like tokio) driving the ring
  3. Cap passing in the POSIX adapter. POSIX has SCM_RIGHTS for passing fds over Unix sockets. Should the POSIX adapter support something similar for passing caps? Or is this native-only?

  4. Dynamic linking. Currently all binaries are statically linked. Should capOS support shared libraries? Probably not initially – static linking is simpler and the binaries are small. Revisit if binary size becomes a concern.

  5. WASI component model integration. WASI preview 2 components have typed imports/exports that could map to capnp interfaces. Should the wasm runtime auto-generate capnp-to-WIT adapters from schemas? This would let wasm components participate natively in the capability graph.

  6. Build system. How are userspace binaries packed into the boot image? Currently the Makefile builds init/ separately. With multiple service binaries, a more scalable approach is needed, such as a build manifest that lists all binaries and a Makefile target that builds and packs them all.
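For the Async I/O question (item 2), the first option, submitting several SQEs and draining CQEs in a loop, can be sketched with a host-side Rust model. This is not the capOS ring ABI: ModelRing, the Sqe/Cqe fields, the io_uring-style user_data correlation tag, and the stand-in kernel that completes requests newest-first are all assumptions chosen to show why completions must be matched by tag rather than by order.

```rust
use std::collections::VecDeque;

/// Hypothetical submission entry; user_data is an io_uring-style correlation tag.
#[derive(Debug, Clone, Copy)]
struct Sqe {
    user_data: u64,
    arg: i64,
}

/// Hypothetical completion entry echoing the tag of the request it answers.
#[derive(Debug, Clone, Copy)]
struct Cqe {
    user_data: u64,
    result: i64,
}

#[derive(Default)]
struct ModelRing {
    sq: VecDeque<Sqe>,
    cq: VecDeque<Cqe>,
}

impl ModelRing {
    fn submit(&mut self, sqe: Sqe) {
        self.sq.push_back(sqe);
    }

    /// Stand-in for cap_enter: completes every queued SQE, newest first,
    /// and returns how many completions were posted.
    fn cap_enter(&mut self) -> usize {
        let mut completed = 0;
        while let Some(sqe) = self.sq.pop_back() {
            self.cq.push_back(Cqe { user_data: sqe.user_data, result: sqe.arg * 2 });
            completed += 1;
        }
        completed
    }
}

/// Submit a whole batch, enter once per loop turn, and match CQEs by user_data.
fn call_batch(ring: &mut ModelRing, args: &[i64]) -> Vec<i64> {
    for (slot, arg) in args.iter().enumerate() {
        ring.submit(Sqe { user_data: slot as u64, arg: *arg });
    }
    let mut results = vec![0; args.len()];
    let mut pending = args.len();
    while pending > 0 {
        ring.cap_enter();
        while let Some(cqe) = ring.cq.pop_front() {
            results[cqe.user_data as usize] = cqe.result;
            pending -= 1;
        }
    }
    results
}
```

A single dispatcher loop of this shape is compatible with the one-blocked-waiter constraint mentioned above, because only the loop itself ever calls cap_enter.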

Relationship to Other Proposals

  • Service architecture proposal – defines what services exist and how they compose. This proposal defines how those service binaries are built, what runtime they use, and how non-Rust software fits in.
  • Storage and naming proposal – the POSIX open()/read()/write() translation targets the Store and Namespace caps defined there.
  • Networking proposal – the POSIX socket translation targets the TcpSocket/UdpSocket caps from the network stack.