Proposal: Capability-Based Binaries, Language Support, and Compatibility Adapters
How userspace binaries receive, use, and compose capabilities, from the native Rust runtime through future language runtimes and compatibility adapters.
Current State
The init binary (init/src/main.rs) and smoke services are no_std Rust
binaries over capos-rt. The runtime owns _start, fixed heap initialization,
CapSet parsing, exit/cap_enter syscall wrappers, typed clients, result-cap
adoption, queued release flushing, and panic output. Init reads the BootPackage
manifest, validates the metadata-only service graph, spawns child services
through ProcessSpawner, waits on ProcessHandles, and exits. The former raw
bootstrap syscall and demo-support runtime shims are historical; demo support
now keeps only low-level transport helpers for intentionally malformed SQE/CQE
smokes.
Userspace now has a checked-in targets/x86_64-unknown-capos.json custom
target that exposes target_os = "capos" while preserving the current static
ELF, soft-float, no_std baseline. The kernel remains on the repository default
x86_64-unknown-none target. init, demos, shell, and the capos-rt
smoke binary build through custom-target Cargo aliases, and checked-in CUE
manifests embed userspace from target/x86_64-unknown-capos/release paths.
The remaining future work is hardening this target contract into a broader
toolchain and packaging interface rather than treating it as a probe.
The kernel-side roadmap provides the capability ring (SQ/CQ shared memory plus
cap_enter, implemented), scheduling, and IPC. This proposal covers the
userspace half: what binaries look like, how they are built, and how existing
software can be adapted to a system with no ambient authority.
Part 1: Native Userspace Runtime (capos-rt)
The Historical Problem
Before capos-rt, every userspace binary had to:
- Define _start and a panic handler
- Set up an allocator
- Construct raw syscall wrappers
- Manually serialize/deserialize capnp messages
- Know the syscall ABI (register layout, method IDs)
That was acceptable for one proof-of-concept binary. It does not scale to
dozens of services, and the current tree has moved those mechanics into
capos-rt.
Solution: A Userspace Runtime Crate
capos-rt is a no_std + alloc Rust crate that every native capOS binary
depends on. It provides:
1. Entry point and allocator setup.
use capos_rt::{Console, ConsoleClient, Runtime};

fn service_main(mut runtime: Runtime) -> i64 {
    let console = match runtime.capset().get_typed::<Console>(b"console") {
        Ok(cap) => cap,
        Err(_) => return 1,
    };
    let mut ring = match runtime.ring_client() {
        Ok(ring) => ring,
        Err(_) => return 2,
    };
    let mut client = ConsoleClient::new(console);
    match client.write_line_wait(&mut ring, "Hello from capOS", u64::MAX) {
        Ok(()) => 0,
        Err(_) => 3,
    }
}

capos_rt::entry_point!(service_main);
2. Syscall layer. Raw syscall asm wrapped in safe Rust functions.
The entire syscall surface is 2 calls – new operations are SQE opcodes, not
new syscalls:
- sys_exit(code) – terminate the current thread; the process exits when this was its last live thread (syscall 1)
- sys_cap_enter(min_complete, timeout_ns) – flush pending SQEs, then wait until N completions are available or the timeout expires (syscall 2)
The accepted in-process threading contract preserves this two-syscall surface:
thread exit is available through both the raw terminal syscall and the typed
ThreadControl.exitThread capability call.
Capability invocations go through the per-process SQ/CQ ring. capos-rt
provides helpers for writing SQEs and reading CQEs:
/// Submit a CALL SQE to the capability ring and wait for the CQE.
pub fn cap_call(
    ring: &mut CapRing,
    cap_id: u32,
    method_id: u16,
    params: &[u8],
    result_buf: &mut [u8],
) -> Result<usize, CapError> {
    // Queue the CALL, then block until one completion is available.
    ring.push_call_sqe(cap_id, method_id, params);
    sys_cap_enter(1, u64::MAX);
    ring.pop_cqe(result_buf)
}
3. Cap’n Proto integration. The current runtime uses handwritten typed
clients over schema-defined method ids and message shapes. Shared generated
schema bindings live through capos-config; broad generated client bindings
for capos-rt remain future work. The runtime owns transport lifetime and
completion matching, while each typed client owns its interface-specific
message encoding.
4. CapSet – the initial capability environment.
At spawn time, the kernel writes the process’s initial capabilities into the
read-only CapSet page and passes its address to _start. capos-rt parses
this into a typed lookup surface over name, local CapId, and interface id.
struct CapEntry {
    cap_id: u32,       // authority-bearing slot in the process CapTable
    interface_id: u64, // Cap'n Proto interface TYPE_ID for type checking
}

impl CapSet {
    /// Get a typed capability by manifest name.
    pub fn get_typed<T: CapabilityType>(
        &self,
        name: &[u8],
    ) -> Result<Capability<T>, CapSetError> { ... }

    /// Iterate manifest-order entries for diagnostics and shell inspection.
    pub fn iter(&self) -> impl Iterator<Item = CapSetEntryRef> { ... }
}
interface_id is not a handle. It is metadata carrying the Cap’n Proto
TYPE_ID for the interface expected by the typed client. The handle is
cap_id. A typed client constructor must check that
entry.interface_id == T::TYPE_ID, then store the local CapId. Normal CALL
SQEs do not need to repeat the interface ID because each capability table entry
exposes one public interface. The ring SQE keeps fixed-size reserved padding
for ABI stability, not a required interface field for the system transport.
This matters for the system transport because several capabilities can expose
the same interface while representing different authority: a serial console, a
log-buffer console, and a console proxy all have the Console TYPE_ID, but
different CapId values.
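The interface check described above can be sketched on a host toolchain as follows. This is a minimal model, not the capos-rt source: the TYPE_ID values are made up, and from_entry is a hypothetical constructor name. It shows that two entries with the same interface id still yield distinct handles, and that a mismatched interface is refused before any CapId is stored.

```rust
use std::marker::PhantomData;

/// Stand-in for the CapabilityType trait named in the text.
pub trait CapabilityType {
    const TYPE_ID: u64;
}

/// Marker for the Console interface; the id value is invented for the sketch.
pub struct Console;
impl CapabilityType for Console {
    const TYPE_ID: u64 = 0xC0_50_1E;
}

/// Marker for a second interface, used only to show a mismatch.
pub struct Timer;
impl CapabilityType for Timer {
    const TYPE_ID: u64 = 0x71_3E_12;
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct CapEntry {
    pub cap_id: u32,       // the handle: a slot in the process CapTable
    pub interface_id: u64, // metadata only: expected interface TYPE_ID
}

#[derive(Debug, PartialEq)]
pub enum CapSetError {
    InterfaceMismatch { expected: u64, found: u64 },
}

/// Typed handle that keeps only the local CapId once the check passes.
pub struct Capability<T: CapabilityType> {
    cap_id: u32,
    _marker: PhantomData<T>,
}

impl<T: CapabilityType> Capability<T> {
    /// Hypothetical constructor: compare interface ids, then store the CapId.
    pub fn from_entry(entry: CapEntry) -> Result<Self, CapSetError> {
        if entry.interface_id != T::TYPE_ID {
            return Err(CapSetError::InterfaceMismatch {
                expected: T::TYPE_ID,
                found: entry.interface_id,
            });
        }
        Ok(Self { cap_id: entry.cap_id, _marker: PhantomData })
    }

    pub fn cap_id(&self) -> u32 {
        self.cap_id
    }
}
```

A serial console at CapId 3 and a log-buffer console at CapId 7 both pass the Console check but remain separate authorities, which is why the CapId, not the interface id, goes into the CALL SQE.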
Crate Structure
capos-rt/
  Cargo.toml     # no_std + alloc, depends on capnp
  build.rs       # userspace linker arguments
  src/
    lib.rs       # type markers, owned handles, entry_point! macro
    entry.rs     # _start, Runtime, bootstrap validation
    syscall.rs   # raw asm syscall wrappers
    capset.rs    # CapSet lookup and iteration helpers
    client.rs    # handwritten typed clients
    ring.rs      # single-owner ring client and completion matching
    alloc.rs     # userspace heap allocator setup
capos-rt is NOT a workspace member (same as init/ – needs different
target/linker handling from the kernel). It’s a path dependency for userspace
crates.
Init On The Current Runtime
init/src/main.rs is already a capos-rt user. Its init_main(Runtime) entry
is registered with capos_rt::entry_point!, obtains typed bootstrap caps from
the runtime CapSet, reads the BootPackage manifest, validates the service graph,
resolves spawn grants, launches children through ProcessSpawnerClient, waits
on ProcessHandleClient, and reports failures through the Console client.
Part 2: Capability-Based Binary Model
Binary Format
ELF64, same as now. The kernel’s ELF loader (kernel/src/elf.rs) already
handles PT_LOAD segments. No changes to the binary format itself.
What changed from the early prototype to the current runtime baseline is the ABI contract between kernel and binary:
| Aspect | Historical prototype | Current capos-rt baseline |
|---|---|---|
| Entry point | crate-local _start(), no args | runtime-owned _start(ring_addr, pid, capset_addr) |
| Syscall ABI | ad-hoc (rax=0 write, rax=1 exit) | SQ/CQ ring + sys_cap_enter + sys_exit |
| Capability access | none | read-only CapSet page validated by capos-rt |
| Serialization | none | Cap’n Proto messages encoded by typed clients |
| Allocator | none or crate-local | runtime-owned fixed heap |
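The SQ/CQ row in the table above can be illustrated with a host-side model. This is a sketch under stated assumptions, not the kernel ABI: the real ring is shared memory with fixed-size entries, while this model uses queues, a closure in place of the kernel, and an assumed user_data token for completion matching. Because the model cannot block, cap_enter reports too few completions as an error instead of waiting.

```rust
use std::collections::VecDeque;

/// Submission entry; `user_data` is an assumed correlation token.
#[derive(Debug, Clone, PartialEq)]
pub struct Sqe {
    pub user_data: u64,
    pub cap_id: u32,
    pub method_id: u16,
    pub params: Vec<u8>,
}

/// Completion entry carrying the token of the SQE it answers.
#[derive(Debug, Clone, PartialEq)]
pub struct Cqe {
    pub user_data: u64,
    pub status: i32,
    pub payload: Vec<u8>,
}

/// Host-side model of one process ring; no shared memory is involved.
pub struct ModelRing {
    sq: VecDeque<Sqe>,
    cq: VecDeque<Cqe>,
    next_user_data: u64,
}

impl ModelRing {
    pub fn new() -> Self {
        Self { sq: VecDeque::new(), cq: VecDeque::new(), next_user_data: 1 }
    }

    /// Queue a CALL and return the token used to match its completion.
    pub fn push_call_sqe(&mut self, cap_id: u32, method_id: u16, params: &[u8]) -> u64 {
        let user_data = self.next_user_data;
        self.next_user_data += 1;
        self.sq.push_back(Sqe { user_data, cap_id, method_id, params: params.to_vec() });
        user_data
    }

    /// Stand-in for sys_cap_enter: flush every pending SQE through
    /// `dispatch` (the "kernel"), then report how many completions exist.
    pub fn cap_enter<F>(&mut self, min_complete: usize, mut dispatch: F) -> Result<usize, usize>
    where
        F: FnMut(&Sqe) -> (i32, Vec<u8>),
    {
        while let Some(sqe) = self.sq.pop_front() {
            let (status, payload) = dispatch(&sqe);
            self.cq.push_back(Cqe { user_data: sqe.user_data, status, payload });
        }
        let ready = self.cq.len();
        if ready >= min_complete { Ok(ready) } else { Err(ready) }
    }

    /// Remove and return the completion matching `user_data`, if present.
    pub fn pop_cqe_for(&mut self, user_data: u64) -> Option<Cqe> {
        let index = self.cq.iter().position(|cqe| cqe.user_data == user_data)?;
        self.cq.remove(index)
    }
}
```

Several CALLs can be queued before a single cap_enter, and completions can be consumed in any order, which is the property that keeps the syscall surface at two calls.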
Initial Capability Passing
The kernel communicates bootstrap state through _start arguments and fixed
userspace mappings. The implemented shape is:
- ring_addr: the process capability ring, expected to equal RING_VADDR.
- pid: the process identifier for diagnostics/runtime bookkeeping.
- capset_addr: read-only bootstrap CapSet page populated from the manifest and spawn grants.
Earlier options considered:
Option A: Well-known page. Kernel maps a read-only page at a fixed virtual
address (e.g., 0x1000) containing a capnp-serialized InitialCaps message:
struct InitialCaps {
  entries @0 :List(InitialCapEntry);
}

struct InitialCapEntry {
  name @0 :Text;
  id @1 :UInt32;
  interfaceId @2 :UInt64;
}
Option B: Register convention. Pass pointer and length in rdi/rsi at
entry. Simpler, but the data still needs to live somewhere in user memory.
Option C: Stack. Push the cap descriptor onto the user stack before iretq.
Similar to how Linux passes auxv to _start.
Option A is cleanest – the page is always there, no calling-convention dependency, and it naturally extends to passing additional boot info later.
Service Binary Lifecycle
1. Kernel loads ELF, creates address space, populates cap table
2. Kernel maps InitialCaps page at well-known address
3. Kernel enters userspace at _start
4. capos-rt _start:
a. Initialize heap allocator
b. Parse InitialCaps page into CapSet
c. Call user's main(CapSet)
5. User main:
a. Extract needed caps from CapSet
b. Do work (invoke caps, serve requests)
c. Optionally export caps to parent once ProcessHandle export lookup exists
6. On return from main (or sys_exit):
a. Kernel destroys process
b. All caps in process's cap table are dropped
c. Parent's ProcessHandle receives exit notification
Part 3: Language Support Roadmap
The current manual status page for this subject is Programming Languages. This proposal owns the longer roadmap and should not be read as implemented support for every language listed below.
Implemented Baseline: Rust (no_std + alloc)
Rust is the only implemented booted language path. Native services use
#![no_std], alloc, capos-rt, static ELF binaries, and the
targets/x86_64-unknown-capos.json userspace target. This fits the current
kernel because it does not require a libc, dynamic linker, process environment,
global filesystem, or ambient socket namespace.
Rust remains the default implementation language for core capOS services until the runtime, schema, and packaging contracts are stable. That is a project priority, not a rule that every future service must be written in Rust.
Future: Rust std
Rust std support is not implemented. It requires an operating-system backend
for filesystem, networking, threads, time, standard I/O, process, environment,
and synchronization APIs. On capOS those APIs must get authority from granted
capabilities such as Directory, File, TcpSocket, Timer,
ThreadSpawner, ThreadControl, ParkSpace, StdIO, and ProcessSpawner.
The project has not selected whether Rust std should be implemented directly
over native capOS capabilities, through a POSIX compatibility adapter, or in a
hybrid form. Until that decision is made, native no_std + alloc Rust over
capos-rt remains the supported Rust path.
C via libcapos
The C substrate is in tree at Phase 0. The libcapos/ crate compiles to
libcapos.a, a thin Rust staticlib that exposes the capos-rt syscall, ring
CALL, CapSet lookup, and global allocator under an extern "C" ABI. C
binaries link statically against the archive, share the userspace ELF layout
used by Rust demos, and run inside the existing capos-rt _start chain.
make run-c-hello boots a C main() that calls Console.writeLine,
Timer.now, EntropySource.fill, and VirtualMemory wrappers through
libcapos and exits cleanly. make run-c-pipe boots a second native C smoke
that creates a kernel pipe through the typed ProcessSpawner.createPipe
wrapper, writes and reads a marker through typed Pipe wrappers, closes the
writer, observes EOF, and exits cleanly.
The current substrate is intentionally narrow: capability primitives,
hand-written typed wrappers (capos_console_write_line, capos_timer_now,
capos_entropy_fill, the capos_virtual_memory_{map,unmap,protect} trio,
capos_process_spawner_create_pipe, and capos_pipe_{read,write,close}),
raw syscalls, and the heap shim. The Pipe wrapper is a typed bridge over the
existing transferred-result-cap path; it does not make capos_cap_call() a
general transfer ABI, which still refuses transfer-bearing completions with
CAPOS_E_TRANSFER_NOT_SUPPORTED. Anything POSIX-shaped (errno, fd table,
open/read/write, signals, fork/exec, sockets) belongs in the separate
libcapos-posix layer above libcapos.
Generated typed wrappers for the remaining capabilities (NetworkManager,
Endpoint, etc.), a stable C ABI for cap-transfer (today the v0 surface
refuses transfer-bearing completions with CAPOS_E_TRANSFER_NOT_SUPPORTED),
and per-thread runtime routing are also future work. Until that routing or a
POSIX pthread layer lands, libcapos v0 is fail-closed for C-created capOS
threads: capos_cap_call rejects bootstrap ThreadSpawner capabilities with
CAPOS_E_THREADING_UNSUPPORTED, and concurrent or re-entrant runtime borrows
return CAPOS_E_RUNTIME_BUSY.
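The fail-closed borrow behavior can be sketched with a RefCell guard. This is a host-side illustration only: with_runtime is a hypothetical helper name, the numeric error value is invented, and a thread-local cell models just the re-entrant case, not cross-thread contention.

```rust
use std::cell::RefCell;

/// Error code name from the text; the numeric value is an assumption.
pub const CAPOS_E_RUNTIME_BUSY: i32 = -16;

/// Stand-in for the installed runtime state.
pub struct Runtime {
    pub calls: u32,
}

thread_local! {
    static RUNTIME: RefCell<Runtime> = RefCell::new(Runtime { calls: 0 });
}

/// Run `f` with exclusive access to the runtime. If the runtime is
/// already borrowed (a re-entrant call), fail closed instead of panicking.
pub fn with_runtime<R>(f: impl FnOnce(&mut Runtime) -> R) -> Result<R, i32> {
    RUNTIME.with(|cell| match cell.try_borrow_mut() {
        Ok(mut rt) => Ok(f(&mut rt)),
        Err(_) => Err(CAPOS_E_RUNTIME_BUSY),
    })
}
```

A nested call made while the outer borrow is live gets the busy error, and the runtime is usable again once the outer call returns.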
The target libcapos shape is a static library providing:
#include <capos.h>
// Ring-based capability invocation (synchronous wrapper around SQ/CQ ring)
int cap_call(cap_ring_t *ring, uint32_t cap_id, uint16_t method_id,
const void *params, size_t params_len,
void *result, size_t result_len);
// Typed wrappers (generated from .capnp schema)
int console_write(cap_t console, const void *data, size_t len);
int console_write_line(cap_t console, const char *text);
// CapSet access
cap_t capset_get(const char *name);
uint64_t capset_interface_id(const char *name);
// Syscalls (the entire syscall surface -- 2 calls total)
_Noreturn void sys_exit(int code); // terminate current thread
uint32_t sys_cap_enter(uint32_t min_complete, // flush SQEs + wait
uint64_t timeout_ns);
Implementation: libcapos is Rust compiled to a static .a with a C ABI
(#[no_mangle] extern "C"). The capnp message construction happens in Rust
behind the C API. This avoids requiring a C capnp implementation.
C binaries would link against libcapos.a and use the same static userspace
ELF model as Rust binaries. Startup, allocator setup, CapSet access, and ring
submission should be owned by libcapos, not repeated in every C program.
Future: C++
C++ support waits on the C substrate and explicit ABI decisions: exceptions, RTTI, TLS, allocator behavior, unwind policy, static initialization, and the scope of any standard-library subset. A freestanding arena/container subset is plausible earlier than hosted C++.
The previously inspected pg83/std library remains a later experiment, not a
shortcut to full C++ support. Its low-level arena/container pieces are relevant;
its hosted/POSIX assumptions still require the same capOS adapter work as other
C++ libraries.
Future: Go (GOOS=capos)
Go is the next high-priority runtime after regular Rust. It needs in-process threading, futex-like wait/wake, TLS/runtime metadata support, GC integration, and a network poller mapped to capOS capabilities. See Go Runtime for the dedicated plan.
Go has higher priority than C++ because it unlocks CUE and a large practical tooling/runtime ecosystem. Go via WASI may be useful for CPU-bound CUE evaluation before native Go exists, but it is not a substitute for native Go network services or full runtime behavior.
Future: Python
Python is not implemented on booted capOS. It has three plausible paths:
- Native CPython through a POSIX compatibility adapter. This depends on the C/libc substrate plus file, stdio, timer, networking, and process adapters. It is the likely path for trusted system scripts and Python tools that need capOS storage or networking.
- MicroPython through the native C substrate. This is a smaller early scripting option with less runtime surface than CPython.
- WASI or Emscripten-hosted Python. This is useful for sandboxed or compute-oriented Python. It still runs a Python interpreter; WebAssembly is the sandbox and host ABI, not a way to avoid Python runtime work.
As of this review, upstream CPython support helps only the WebAssembly path:
PEP 11 lists
wasm32-unknown-wasip1 as Tier 2 and wasm32-unknown-emscripten as Tier 3,
and PEP 776 records Emscripten support
for Python 3.14. Those facts do not provide native capOS bindings for files,
sockets, threads, process launch, or capabilities.
Future: Lua
Lua is a future capability-scoped scripting runner. The dedicated
Lua Scripting proposal defines capos-lua as an
ordinary userspace process with exact grants, curated standard libraries,
unforgeable capability userdata, and no raw CapIds exposed to scripts. Upstream
PUC Lua is a C implementation, so the native path waits on the C/libcapos
substrate unless the project uses a pure-Rust Lua-like VM as a bootstrap proof.
Future: JavaScript / TypeScript
JavaScript support means running an engine as an ordinary capOS process. A small QuickJS-style native runner is the likely first experiment after C support. V8 or SpiderMonkey are much larger C++ runtime ports. TypeScript is normally compiled before execution and should not imply a kernel or base-system TypeScript compiler.
Partially landed: WASI and WebAssembly
The WASI host adapter Phase W.4 closed 2026-05-07 20:09 UTC
(docs/proposals/wasi-host-adapter-proposal.md). Languages that compile to WASI
Preview 1 can now run on capOS through the wasm-host process
(capos-wasm/, vendored wasmi 1.0.9), with imports backed by
granted capOS capabilities. The current Preview 1 surface covers
stdout/stderr writes, manifest-granted argv, bounded manifest-granted
environment entries through initConfig.init.wasiEnv,
monotonic clock time/resolution, no-op sched_yield, stdio fd
metadata, stdio seek refusal as ERRNO_SPIPE, clean shutdown, and
random_get when the manifest grants EntropySource. The regression
smokes are make run-wasi-hello-rust (Rust wasm32-wasip1 payload),
make run-wasi-hello-c (C wasm32-wasi payload),
make run-wasi-cli-args, make run-wasi-env, make run-wasi-random,
make run-wasi-random-ungranted, and make run-wasi-stdio-fd.
Filesystem (W.5), sockets (W.6), and Preview 2 / Component Model
(W.7+) remain future phases; make run-wasi-preview1-refusals
keeps proving representative blocked storage/socket imports return
ERRNO_NOSYS = 52 without authority.
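The refusal behavior can be sketched as a small import dispatcher. This is an illustrative model, not the wasm-host code: the Import enum and dispatch function are hypothetical names, only a few imports are modeled, and returning ERRNO_BADF for a non-stdio fd is an assumption. The errno numbers are the WASI Preview 1 values.

```rust
// WASI Preview 1 errno values.
pub const ERRNO_SUCCESS: u16 = 0;
pub const ERRNO_BADF: u16 = 8;
pub const ERRNO_NOSYS: u16 = 52;
pub const ERRNO_SPIPE: u16 = 70;

/// A few representative Preview 1 imports (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Import {
    SchedYield,
    FdSeek { fd: u32 },
    PathOpen,
    SockSend,
}

/// Map an import to the errno the host would return in this sketch.
pub fn dispatch(import: Import) -> u16 {
    match import {
        // sched_yield is a no-op that reports success.
        Import::SchedYield => ERRNO_SUCCESS,
        // Seeking on stdio is refused as a pipe-like stream.
        Import::FdSeek { fd } if fd <= 2 => ERRNO_SPIPE,
        // Assumption: any other fd is unknown to the host.
        Import::FdSeek { .. } => ERRNO_BADF,
        // Storage and socket imports are blocked until later phases.
        Import::PathOpen | Import::SockSend => ERRNO_NOSYS,
    }
}
```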
Important distinction: WASI works differently for compiled vs. interpreted languages:
- Compiled languages (Rust, C) compile directly to .wasm, with no interpreter in the loop. WASI is a clean, efficient execution path.
- Interpreted languages (Python, JS, Lua) still need their interpreter (CPython, QuickJS, etc.); it’s just compiled to .wasm instead of native code. The stack becomes: script → interpreter.wasm → WASI runtime → kernel. You pay for a wasm sandbox layer on top of the interpreter you’d need anyway.
For interpreted languages, WASI sandboxing is valuable when running untrusted plugins or user-submitted scripts. For trusted system scripts, native CPython, QuickJS, or Lua over a POSIX or capability-native adapter may be simpler and faster once the native C substrate exists.
Future: Managed Runtimes
Languages with large managed runtimes such as Java and .NET need their runtime ported or a WASI-style host path. This is a large effort and low priority.
Part 4: POSIX Compatibility Adapter
Status note: the full design lives in POSIX Adapter proposal and the implementation decomposition in POSIX Adapter, which are the canonical source for phase status. Phases P1.1 (libcapos C-substrate v0 + C hello smoke, closed 2026-05-05 13:28 UTC), P1.2 Phase A (UDP cap surface + capos-rt UdpSocketClient, closed 2026-05-05 18:02 UTC), P1.2 Phase B (kernel UDP path, libcapos-posix crate, dns.c vendoring, demo + manifest, closed 2026-05-05 21:21 UTC), and P1.3 (Pipe cap + recording-shim fork-for-exec + posix_spawn successor, closed 2026-05-07 09:55 UTC) have landed. The remaining open phase is the dash port successor (Task 4). The Namespace + File cap surface from Storage and Naming proposal has landed far enough for the v0 smoke; current POSIX-adapter work is now dash vendoring/patching, the multi-translation-unit C build, and the run-posix-shell-smoke harness. The signal/time stub slice is closed by make run-posix-signal-time. The sketch below remains for context; the dedicated proposal and plan are the source of truth for FdTable shape, supported-function matrix, and open questions.
Why POSIX at All?
capOS is not POSIX and doesn’t want to be. But:
- Existing software. Most useful software assumes POSIX. A DNS resolver, an HTTP server, a database – all speak open()/read()/write()/socket(). Without an adapter, every piece of software must be rewritten.
- Developer familiarity. Programmers know POSIX. A compatibility adapter lowers the barrier to writing capOS software, even if native caps are better.
- Gradual migration. Port software first with POSIX-shaped APIs, then incrementally convert to native capabilities for tighter sandboxing.
The goal is not full POSIX compliance. It is a pragmatic adapter that maps selected POSIX concepts to capabilities so existing software can run with bounded modification while preserving capability-based authority.
Architecture: libcapos-posix
Application (C/Rust, uses POSIX APIs)
│
│ open(), read(), write(), socket(), ...
│
v
libcapos-posix (POSIX-to-capability adapter)
│
│ Maps fds to caps, paths to granted directory/namespace lookups
│
v
libcapos (native capability invocation)
│
│ SQ/CQ ring + cap_enter syscall
│
v
Kernel (capability dispatch)
libcapos-posix is a static library that provides POSIX-like function
signatures over granted capabilities. It is not an authority source and should
not be described as “Linux compatibility.” A process without file/directory
authority cannot open files; a process without socket authority cannot create
sockets; a process without launcher or spawner authority cannot create
children.
Current v0 surface (shipped as libcapos-posix.a alongside
libcapos.a; see libcapos-posix/ and the canonical
POSIX Adapter proposal):
- Static-array fd table with a 32-fd cap (P1.2 Phase A decision §5).
- Single-thread __errno_location() TLS cell (P1.2 Phase A decision §4).
- socket(AF_INET, SOCK_DGRAM, 0) / sendto / recvfrom / close over the kernel UdpSocket capability (P1.2 Phase B).
- pipe / read / write / dup / dup2 / close over the kernel Pipe capability via ProcessSpawner.createPipe (P1.3).
- fork / execve / waitpid / _exit / posix_inherit_stdio via the recording-shim ProcessSpawner.spawn Move-grant path (P1.3 §6 decision: Variant A). fork() returns 0 unconditionally and opens a TLS recording window; dup2() / close() between fork and execve record into the window; execve() drains the recording into stdio_<N> spawn grants and returns the synthetic child pid (a deliberate v0 deviation from POSIX).
- Direct posix_spawn / posix_spawn_file_actions_init / _destroy / _adddup2 / _addclose over the same Move-grant action-replay code path; argv / envp are accepted but ignored until a LaunchParameters surface lands.
- open / read / write / close / lseek over the bootstrap root Directory and minted File caps; opendir / readdir / closedir over minted Directory caps.
- Console/Terminal stdio adoption, focused printf / string / ctype helpers, manifest-backed getenv / setenv / putenv / unsetenv, and single-identity getpid / getuid / getgid stubs.
- clock_gettime(CLOCK_MONOTONIC, ...) / gettimeofday(&tv, NULL) / time / nanosleep / sleep over the kernel Timer capability.
- signal / sigaction store handlers without delivery; kill and raise fail closed until typed process-control authority exists.
C headers ship under libcapos-posix/include/capos/posix/ (errno.h,
dirent.h, fcntl.h, signal.h, spawn.h, stdio.h, stdlib.h,
string.h, sys/socket.h, sys/wait.h, time.h, unistd.h, and focused
subsets such as ctype.h). libcapos-posix reuses libcapos’s installed
Runtime through the renamed extern crate libcapos_::runtime::with(...) to
avoid colliding with libcapos’s C-side capos_* exports.
Not yet implemented for the dash-port successor: file metadata/remove
calls such as stat / fstat / access / unlink, TCP socket wrappers,
select / poll / epoll, real asynchronous signal delivery, job control,
chdir / cwd-relative path resolution, and broad FILE * stream semantics.
These remain on the dash port successor track (Task 4 of
docs/proposals/posix-adapter-proposal.md) or later typed-authority work.
File Descriptor Table
POSIX programs think in file descriptors. capOS has capabilities. The
implemented v0 translation is a fixed 32-slot per-process fd table inside
libcapos-posix. Slots may be backed by Console, UDP socket, Pipe, File,
Directory, TerminalSession, or a moved-out sentinel used by the recording-shim
execve() path.
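A fixed-slot table of this kind can be sketched as below. This is a minimal model, not the libcapos-posix implementation: the FdSlot payloads, the lowest-free allocation rule, and treating a moved-out slot as EBADF on lookup are assumptions. The errno values are the conventional Linux numbers.

```rust
pub const MAX_FDS: usize = 32;
pub const EBADF: i32 = 9;
pub const EMFILE: i32 = 24;

/// What backs an fd; each live variant carries the local CapId.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FdSlot {
    Empty,
    Console { cap_id: u32 },
    UdpSocket { cap_id: u32 },
    Pipe { cap_id: u32 },
    File { cap_id: u32 },
    Directory { cap_id: u32 },
    TerminalSession { cap_id: u32 },
    /// Sentinel: the cap was moved into a child by the execve() shim.
    MovedOut,
}

pub struct FdTable {
    slots: [FdSlot; MAX_FDS],
}

impl FdTable {
    pub fn new() -> Self {
        Self { slots: [FdSlot::Empty; MAX_FDS] }
    }

    /// Install into the lowest empty slot, as POSIX fd allocation does.
    pub fn install(&mut self, slot: FdSlot) -> Result<i32, i32> {
        let index = self.slots.iter().position(|s| *s == FdSlot::Empty).ok_or(EMFILE)?;
        self.slots[index] = slot;
        Ok(index as i32)
    }

    /// Look up a live slot; empty, moved-out, and out-of-range fds fail.
    pub fn get(&self, fd: i32) -> Result<FdSlot, i32> {
        if fd < 0 {
            return Err(EBADF);
        }
        match self.slots.get(fd as usize) {
            Some(FdSlot::Empty) | Some(FdSlot::MovedOut) | None => Err(EBADF),
            Some(slot) => Ok(*slot),
        }
    }

    /// Free a live slot; a real adapter would also release the cap here.
    pub fn close(&mut self, fd: i32) -> Result<(), i32> {
        self.get(fd)?;
        self.slots[fd as usize] = FdSlot::Empty;
        Ok(())
    }
}
```

A static array keeps the table allocation-free and makes the 32-fd cap a hard, observable limit (EMFILE) rather than a tunable.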
Fd 0/1/2 are initialized only from explicit authority:
- stdio_<N> Pipe grants seeded by a parent spawn action take precedence.
- A bootstrap TerminalSession cap may adopt empty stdio slots when the program calls posix_inherit_stdio().
- A bootstrap Console cap fills empty fd 1 and fd 2 for simple smokes.
- Fd 0 stays closed unless the process received pipe or terminal input authority.
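The seeding rules above can be sketched as a pure function. This is a sketch under assumptions: it reads the list as a strict precedence order (pipe grants, then terminal adoption, then console fill), and BootstrapCaps and seed_stdio are hypothetical names. The shipped adapter may apply terminal adoption at a different point in time, since it depends on when the program calls posix_inherit_stdio().

```rust
/// What ends up behind fd 0, 1, or 2 (payload is the local CapId).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Stdio {
    Closed,
    Pipe(u32),
    Terminal(u32),
    Console(u32),
}

/// Hypothetical summary of the authority a process received at spawn.
pub struct BootstrapCaps {
    /// stdio_<N> Pipe grants for N = 0, 1, 2.
    pub stdio_pipes: [Option<u32>; 3],
    pub terminal: Option<u32>,
    pub console: Option<u32>,
    /// Whether the program called posix_inherit_stdio().
    pub inherit_stdio_called: bool,
}

pub fn seed_stdio(caps: &BootstrapCaps) -> [Stdio; 3] {
    let mut fds = [Stdio::Closed; 3];

    // 1. Pipe grants from the parent's spawn actions win.
    for (n, grant) in caps.stdio_pipes.iter().enumerate() {
        if let Some(cap) = grant {
            fds[n] = Stdio::Pipe(*cap);
        }
    }

    // 2. A terminal adopts only slots that are still empty, and only
    //    when the program opted in.
    if caps.inherit_stdio_called {
        if let Some(cap) = caps.terminal {
            for slot in fds.iter_mut().filter(|s| **s == Stdio::Closed) {
                *slot = Stdio::Terminal(cap);
            }
        }
    }

    // 3. A console fills empty fd 1 and fd 2; it never provides input.
    if let Some(cap) = caps.console {
        for n in 1..3 {
            if fds[n] == Stdio::Closed {
                fds[n] = Stdio::Console(cap);
            }
        }
    }

    fds
}
```

In the console-only case fd 0 stays closed, matching the last rule: output authority alone never implies input authority.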
Path Resolution
POSIX open("/etc/config.toml", O_RDONLY) becomes:
1. libcapos-posix looks up the bootstrap-granted root Directory cap named root.
2. It rejects relative paths, .., and non-UTF-8 or oversized path segments.
3. It walks intermediate components with Directory.sub().
4. It opens the leaf with Directory.open() or Directory.sub().
5. It installs a File or Directory fd slot with per-fd position / iteration state.
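The validation and splitting in steps 2 to 4 can be sketched as below. This is an illustration, not the adapter's code: split_path and PathError are hypothetical names, the 255-byte segment limit is an assumed value, and skipping empty segments (repeated slashes) and rejecting a bare "/" are choices made for the sketch. The returned intermediate components would each be walked with Directory.sub(), and the leaf opened last.

```rust
/// Assumed per-segment limit; the real bound is not stated in the text.
pub const MAX_SEGMENT_LEN: usize = 255;

#[derive(Debug, PartialEq)]
pub enum PathError {
    NotUtf8,
    Relative,
    ParentTraversal,
    SegmentTooLong,
    Empty,
}

/// Split an absolute path into (intermediate components, leaf).
pub fn split_path(raw: &[u8]) -> Result<(Vec<&str>, &str), PathError> {
    // Reject non-UTF-8 input before looking at structure.
    let text = std::str::from_utf8(raw).map_err(|_| PathError::NotUtf8)?;

    // Only absolute paths resolve against the granted root Directory.
    let rest = text.strip_prefix('/').ok_or(PathError::Relative)?;

    let mut parts = Vec::new();
    for segment in rest.split('/').filter(|s| !s.is_empty()) {
        // ".." could escape the granted subtree, so it is refused outright.
        if segment == ".." {
            return Err(PathError::ParentTraversal);
        }
        if segment.len() > MAX_SEGMENT_LEN {
            return Err(PathError::SegmentTooLong);
        }
        parts.push(segment);
    }

    // The last component is the leaf; everything before it is walked.
    let leaf = parts.pop().ok_or(PathError::Empty)?;
    Ok((parts, leaf))
}
```

Refusing ".." lexically, instead of resolving it, keeps the walk monotonic: every step moves further into the granted tree and never back toward its parent.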
The future Namespace + Store resolver remains documented in the POSIX adapter
proposal, but the shipped v0 dash-port proof uses the RAM-backed root
Directory capability because that is the implemented kernel authority.
Supported POSIX Functions
Grouped by what capability backs them:
Console cap -> stdio:
| POSIX | capOS translation |
|---|---|
| write(1, buf, len) | console.write(buf[..len]) |
| write(2, buf, len) | console.write(buf[..len]) (or log cap) |
| read(0, buf, len) | Pipe or TerminalSession-backed stdin when granted |
Directory + File caps -> file I/O:
| POSIX | capOS translation |
|---|---|
| open(path, flags) | root Directory walk -> Directory.open() -> fd |
| read(fd, buf, len) | File.read(offset, len) using per-fd position |
| write(fd, buf, len) | File.write(offset, bytes) using per-fd position |
| close(fd) | drop/release the backing cap slot |
| lseek(fd, off, whence) | update per-fd file position |
| opendir/readdir/closedir | Directory.list() plus per-fd iteration |
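The per-fd position rows can be illustrated with a small model. This is a sketch, not the adapter: FileFd is a hypothetical type whose in-memory byte vector stands in for the File capability, so the slice copy marks where a File.read(offset, len) call would go. SEEK_* and errno values follow the usual Linux numbering.

```rust
pub const SEEK_SET: i32 = 0;
pub const SEEK_CUR: i32 = 1;
pub const SEEK_END: i32 = 2;
pub const EINVAL: i32 = 22;
pub const EOVERFLOW: i32 = 75;

/// One open file: backing bytes (stand-in for a File cap) plus position.
pub struct FileFd {
    pub data: Vec<u8>,
    pub pos: u64,
}

impl FileFd {
    /// read(): fetch at the current position, then advance it.
    pub fn read(&mut self, buf: &mut [u8]) -> usize {
        let start = (self.pos as usize).min(self.data.len());
        let n = buf.len().min(self.data.len() - start);
        // A real adapter would issue File.read(offset = pos, len = n) here.
        buf[..n].copy_from_slice(&self.data[start..start + n]);
        self.pos += n as u64;
        n
    }

    /// lseek(): purely local bookkeeping, no capability call needed.
    pub fn lseek(&mut self, off: i64, whence: i32) -> Result<u64, i32> {
        let base: i64 = match whence {
            SEEK_SET => 0,
            SEEK_CUR => self.pos as i64,
            SEEK_END => self.data.len() as i64,
            _ => return Err(EINVAL),
        };
        let target = base.checked_add(off).ok_or(EOVERFLOW)?;
        if target < 0 {
            return Err(EINVAL);
        }
        self.pos = target as u64;
        Ok(self.pos)
    }
}
```

Because the File methods take an explicit offset, the position lives entirely in the fd slot, and two fds over the same File cap can seek independently.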
Pipe + ProcessSpawner caps -> subprocess I/O:
| POSIX | capOS translation |
|---|---|
| pipe(fds) | ProcessSpawner.createPipe() -> two Pipe-backed fds |
| fork() + execve() | recording shim -> ProcessSpawner.spawn() |
| posix_spawn() | direct action replay -> ProcessSpawner.spawn() |
| waitpid(pid, &status, 0) | ProcessHandle.wait() |
UdpSocket caps -> networking:
| POSIX | capOS translation |
|---|---|
| socket(AF_INET, SOCK_DGRAM, 0) | NetworkManager.createUdpSocket() -> fd |
| sendto / recvfrom | UdpSocket.sendTo() / UdpSocket.recvFrom() |
| close(fd) | release the owned UdpSocket cap |
Timer + local stubs:
| POSIX | capOS translation |
|---|---|
| clock_gettime / gettimeofday / time | Timer.now() |
| nanosleep / sleep | Timer.sleep() |
| signal / sigaction | store handler locally, never deliver |
| kill / raise | validate signal number, then fail closed |
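The time rows reduce to unit conversion around a single timer reading. The sketch below assumes Timer.now() yields nanoseconds as a u64 and that Timer.sleep() takes a nanosecond duration; neither unit is confirmed by the text, and the function names are hypothetical.

```rust
const NANOS_PER_SEC: u64 = 1_000_000_000;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Timespec {
    pub tv_sec: i64,
    pub tv_nsec: i64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Timeval {
    pub tv_sec: i64,
    pub tv_usec: i64,
}

/// clock_gettime(): split a nanosecond reading into seconds + nanoseconds.
pub fn timespec_from_nanos(now_ns: u64) -> Timespec {
    Timespec {
        tv_sec: (now_ns / NANOS_PER_SEC) as i64,
        tv_nsec: (now_ns % NANOS_PER_SEC) as i64,
    }
}

/// gettimeofday(): same reading, truncated to microseconds.
pub fn timeval_from_nanos(now_ns: u64) -> Timeval {
    Timeval {
        tv_sec: (now_ns / NANOS_PER_SEC) as i64,
        tv_usec: ((now_ns % NANOS_PER_SEC) / 1_000) as i64,
    }
}

/// nanosleep(): validate the request and flatten it to nanoseconds.
/// None corresponds to the EINVAL cases in POSIX.
pub fn nanos_from_timespec(ts: Timespec) -> Option<u64> {
    if ts.tv_sec < 0 || ts.tv_nsec < 0 || ts.tv_nsec >= NANOS_PER_SEC as i64 {
        return None;
    }
    (ts.tv_sec as u64)
        .checked_mul(NANOS_PER_SEC)?
        .checked_add(ts.tv_nsec as u64)
}
```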
Not supported or still partial:
| POSIX | Why not |
|---|---|
| bare fork() state cloning | No address space cloning; only fork-for-exec is recorded |
| in-place exec() replacement | Spawn creates a fresh process |
| real signal delivery / job control | Needs typed process-control and terminal authority |
| chmod/chown | No permission bits. Authority is structural |
| mmap(MAP_SHARED) | No shared memory yet (future: SharedMemory cap) |
| ioctl | No device files. Use typed capability methods |
| ptrace | No debugging interface yet |
| select/poll/epoll | Requires async cap invocation (Stage 5+). Initial version is blocking only |
Process Creation Compatibility
capOS process creation is spawn-style, not fork/exec-style. A new process is a
fresh ELF instance selected by ProcessSpawner, with an explicit initial
CapSet assembled from granted capabilities. The parent address space is not
cloned, and an existing process image is not replaced in place.
posix_spawn() is the compatibility primitive for subprocess creation.
libcapos-posix (P1.3, closed 2026-05-07 09:55 UTC) maps it to
ProcessSpawner.spawn(), translates posix_spawn_file_actions into
fd-table setup and Move-grant stdio_<N> capability grants on the
spawn ABI. argv / envp are accepted but ignored until a
LaunchParameters surface lands. make run-posix-spawn-smoke is the
end-to-end proof.
Full fork() is intentionally not a native kernel primitive. Supporting it
would require copy-on-write address-space cloning, parent/child register return
semantics, fd-table duplication, a per-capability inheritance policy, safe
handling for outstanding SQEs/CQEs, and defined behavior for endpoint calls,
timers, waits, and process handles that are in flight at the fork point.
Threaded POSIX processes add another constraint: only the calling thread is
cloned, while locks and async-signal-safe state must remain coherent in the
child.
P1.3 also shipped a narrow recording-shim fork() for the common
fork-for-exec pattern that does not require general address-space
cloning. fork() returns 0 unconditionally and opens a TLS recording
window; dup2() / close() between fork and execve record into the
window without mutating the parent fd table; execve() drains the
recording into Move-grant stdio_<N> spawn grants and returns the
synthetic child pid as its own return value. The pseudo-child branch
is still the parent process, so a failed execve() MUST NOT call
_exit() – it must surface the error to the parent’s normal error
path. The user pattern is pid_t child = fork(); if (child == 0) { dup2(); close(); child = execve(...); } /* parent flow */. Earlier
iterations used x86_64 setjmp/longjmp to fake fork-return-twice;
that was replaced because longjmp back into fork()’s already-
returned stack frame was undefined behaviour. make run-posix-pipe-smoke
is the end-to-end proof.
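The recording window can be modeled as a small state machine. This is a sketch of the idea, not the libcapos-posix code: Shim, SpawnRequest, and SpawnGrant are hypothetical names, the window is a plain field instead of a TLS cell, the synthetic pid counter is invented, and failing execve() with ENOSYS outside a window is an assumption.

```rust
use std::collections::BTreeMap;

pub const ENOSYS: i32 = 38;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    Dup2 { from: i32, to: i32 },
    Close(i32),
}

/// One Move grant: parent fd `source_fd` becomes the child's stdio_<N>.
#[derive(Debug, Clone, PartialEq)]
pub struct SpawnGrant {
    pub name: String,
    pub source_fd: i32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct SpawnRequest {
    pub path: String,
    pub grants: Vec<SpawnGrant>,
}

pub struct Shim {
    window: Option<Vec<Action>>,
    next_pid: i32,
}

impl Shim {
    pub fn new() -> Self {
        Self { window: None, next_pid: 100 }
    }

    /// fork(): open the recording window and report "child" (0).
    pub fn fork(&mut self) -> i32 {
        self.window = Some(Vec::new());
        0
    }

    /// Returns true when the action was recorded instead of applied.
    fn record(&mut self, action: Action) -> bool {
        match self.window.as_mut() {
            Some(actions) => {
                actions.push(action);
                true
            }
            None => false,
        }
    }

    pub fn dup2(&mut self, from: i32, to: i32) -> bool {
        self.record(Action::Dup2 { from, to })
    }

    pub fn close(&mut self, fd: i32) -> bool {
        self.record(Action::Close(fd))
    }

    /// execve(): drain the window into grants and return a synthetic pid.
    pub fn execve(&mut self, path: &str) -> Result<(i32, SpawnRequest), i32> {
        let actions = self.window.take().ok_or(ENOSYS)?;
        // Replay against an empty child fd view: child fd -> parent fd.
        let mut child_fds: BTreeMap<i32, i32> = BTreeMap::new();
        for action in actions {
            match action {
                Action::Dup2 { from, to } => {
                    child_fds.insert(to, from);
                }
                Action::Close(fd) => {
                    child_fds.remove(&fd);
                }
            }
        }
        let grants = child_fds
            .iter()
            .map(|(to, from)| SpawnGrant { name: format!("stdio_{}", to), source_fd: *from })
            .collect();
        let pid = self.next_pid;
        self.next_pid += 1;
        Ok((pid, SpawnRequest { path: path.to_string(), grants }))
    }
}
```

In the usual dup2(w, 1); close(w) sequence, the close of the source fd is recorded but does not cancel the stdout mapping, so the child still receives stdio_1 backed by the pipe writer.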
make run-posix-dns-smoke exercises socket(AF_INET, SOCK_DGRAM, 0) /
sendto / recvfrom against the kernel UdpSocket capability through
a hand-rolled DNS A query in demos/posix-dns-resolver/. The current
smoke does not compile the vendored dns.c whole because the v0
libcapos-posix POSIX surface is narrower than dns.c expects
(poll.h, netinet/in.h, arpa/inet.h, netdb.h, sys/select.h,
sys/un.h); widening that surface is follow-on work on the dash port
track.
Security Model
The POSIX compatibility adapter does not weaken capability security. Every POSIX call translates to a capability invocation on caps the process was actually granted:
- open("/etc/passwd") fails if the process lacks a bootstrap root Directory cap or that directory tree does not contain etc/passwd – not because of permission bits, but because no granted authority resolves the path.
- socket(AF_INET, SOCK_DGRAM, 0) fails if the process was not granted a NetworkManager cap; TCP stream wrappers remain future work.
- fork() only opens the recording window for the supported fork-for-exec pattern; bare address-space cloning remains unsupported.
A POSIX binary on capOS is more constrained than on Linux, not less. The compatibility adapter provides familiar function signatures, not familiar authority.
Building POSIX-Compatible Binaries
my-app/
Cargo.toml # depends on capos-posix (which depends on capos-rt)
src/main.rs # uses libc-style APIs
Or for C:
#include <capos/posix/fcntl.h>      // open, O_RDONLY
#include <capos/posix/sys/socket.h> // socket, sendto, recvfrom
#include <capos/posix/unistd.h>     // read, write, close

int main(void) {
    // Works -- stdout is mapped to Console cap
    write(1, "hello\n", 6);

    // Works -- if the process was granted a root Directory cap
    int fd = open("/config.toml", O_RDONLY);
    if (fd >= 0) {
        char buf[4096];
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            write(1, buf, (size_t)n); // echo what was read
        }
        close(fd);
    }

    // Works -- if NetworkManager cap was granted; TCP is not in v0
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock >= 0) {
        close(sock);
    }
    return 0;
}
The linker pulls in libcapos-posix.a -> libcapos.a -> startup code.
Same ELF output, same kernel loader.
musl as a Base (Optional, Later)
For broader C compatibility (printf, string functions, math), libcapos-posix
can be layered under musl libc. musl has a clean
syscall interface – all system calls go through a single __syscall() function.
Replacing that function with capability-based dispatch gives you full libc on
top of capOS capabilities:
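// musl's syscall entry point -- we replace this.
// Sketch: arguments are shown as six register-width longs and cast per call;
// the capos_* helpers are placeholders for the libcapos-posix entry points.
long __syscall(long n, long a, long b, long c, long d, long e, long f) {
    (void)d; (void)e; (void)f; // unused by the three calls shown
    switch (n) {
    case SYS_write:  return capos_write((int)a, (const void *)b, (size_t)c);
    case SYS_open:   return capos_open((const char *)a, (int)b, (mode_t)c);
    case SYS_socket: return capos_socket((int)a, (int)b, (int)c);
    // ...
    default:         return -ENOSYS;
    }
}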
This is the same approach Fuchsia uses with fdio + musl, and Redox OS uses
with relibc. It works and it gives you printf, fopen, getaddrinfo, and
most of the C standard library.
Priority: after native capos-rt and libcapos are stable. musl integration is a significant engineering effort and should only be done when there’s actual software to port.
Part 5: WASI Host Adapter
Note: the full design lives in WASI Host Adapter proposal and the implementation decomposition in WASI Host Adapter. The sketch below remains for context; the dedicated proposal is the source of truth for runtime selection (wasmi for v0; wasmtime / WAMR as W.7+ migration), capability-mapping surface, per-instance CapSet plumbing, phase decomposition, and open questions.
Why WASI Fits capOS Better Than POSIX
WASI (WebAssembly System Interface) was designed from the start as a capability-based system interface. Its concepts map almost directly to capOS:
| WASI concept | capOS equivalent |
|---|---|
| fd (pre-opened directory) | Namespace cap |
| fd (socket) | TcpSocket/UdpSocket cap |
| fd_write on stdout | Console.write() |
| Pre-opened dirs at startup | CapSet at spawn |
| No ambient filesystem access | No ambient authority |
| path_open scoped to pre-opened dir | namespace.resolve() scoped to granted prefix |
WASI programs already assume they get no ambient authority. A WASI binary compiled for capOS still needs a host adapter, but the security model is closer to capOS than POSIX because preopened handles are explicit.
Architecture: Wasm Runtime as a capOS Service
WASI binary (.wasm)
│
│ WASI syscalls (fd_read, fd_write, path_open, ...)
│
v
wasm-runtime process (Wasmtime/wasm-micro-runtime, native capOS binary)
│
│ Translates WASI calls to capability invocations
│ Each wasm instance gets its own CapSet
│
v
libcapos (native capability invocation)
│
v
Kernel
The wasm runtime is itself a native capOS process. It receives caps from its parent and partitions them among the wasm modules it hosts. This gives you:
- Language independence. Any language with a useful WASI target can be evaluated through the same host adapter.
- Extra sandboxing. Wasm memory isolation combines with capOS capability scoping.
- Less porting effort for software that already targets WASI, assuming its required imports are implemented by the host adapter.
- Density. Multiple wasm modules in one process, each with different caps.
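The partitioning step can be sketched as a subset operation over the host's own grants. This is a minimal Rust illustration, assuming a name-to-id table; `CapSet` here is a hypothetical stand-in and not the capos-rt type, and `CapId` as a bare `u32` replaces the runtime's typed handles. The same shape applies whether instances share a host process or each get their own.

```rust
use std::collections::BTreeMap;

/// Hypothetical cap identifier; the real runtime uses typed handles.
pub type CapId = u32;

/// Hypothetical name -> cap table standing in for a CapSet.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct CapSet {
    caps: BTreeMap<String, CapId>,
}

impl CapSet {
    /// Builder-style helper used to describe what the parent granted.
    pub fn with(mut self, name: &str, id: CapId) -> Self {
        self.caps.insert(name.to_string(), id);
        self
    }

    pub fn get(&self, name: &str) -> Option<CapId> {
        self.caps.get(name).copied()
    }

    /// Build a per-instance CapSet holding only the named caps.
    /// Fails if the host itself was never granted one of them, so an
    /// instance can never receive more authority than its host holds.
    pub fn subset(&self, wanted: &[&str]) -> Result<CapSet, String> {
        let mut out = CapSet::default();
        for name in wanted {
            let id = self
                .get(name)
                .ok_or_else(|| format!("host lacks cap: {name}"))?;
            out.caps.insert(name.to_string(), id);
        }
        Ok(out)
    }
}
```

A host granted `console`, `timer`, and `entropy` could hand one instance `subset(&["console"])` and another `subset(&["console", "entropy"])`; a request for a cap the host lacks is an error rather than a silent omission.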
WASI vs Native Performance
Wasm adds overhead: bounds-checked memory, indirect calls, and host-call marshalling. For foundational system services, native Rust remains the default choice until there is a concrete reason to choose otherwise. For application code and portable tools, the sandboxing and reuse may be worth the overhead.
WASI Implementation Phases
The current shipped state is owned by WASI Host Adapter and WASI Host Adapter proposal; the phase status summary below is a pointer, not the source of truth.
Phase W.0 (planning, closed): runtime decision recorded as wasmi
for v0; WAMR / wasmtime are W.7+ migration candidates. The earlier
“wasm-micro-runtime as a C binary via libcapos” sketch is superseded
by wasmi-as-a-Rust-crate inside the standalone capos-wasm/ package.
Cross-cutting Open Questions §1 (per-instance vs per-process) and §3
(poll_oneoff semantics over the capOS ring) resolved
2026-05-13 16:46 UTC: one wasm instance per capos-wasm process,
and poll_oneoff stays ERRNO_NOSYS in v0 with subscription kinds
extended one at a time through W.5/W.6 against a single blocking
cap_enter.
Phase W.1 (host scaffold, closed 2026-05-05 19:12 UTC):
capos-wasm/ standalone userspace crate over vendored wasmi 1.0.9
(vendor/wasmi-no_std/wasmi-1.0.9/); make capos-wasm-build.
Phase W.2 (Preview 1 stdout-only, closed 2026-05-07 10:53 UTC):
wasm-host userspace binary, empty-instantiation smoke
(make run-wasm-host), Preview 1 stdout-only import resolver
(args_get / environ_get empty, clock_time_get(MONOTONIC),
proc_exit, fd_write(1, …) / fd_write(2, …); everything else
including random_get returns ERRNO_NOSYS), manifest-payload load
path through an optional BootPackage cap, Rust hello, wasi
(make run-wasi-hello-rust), and C hello, wasi
(make run-wasi-hello-c).
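As a rough illustration of the stdout-only resolver shape in W.2, the sketch below routes `fd_write` on fds 1 and 2 to a Console-like sink. The `Console` trait and `RecordingConsole` are hypothetical stand-ins, not the capos-wasm types, and returning `ERRNO_BADF` for other fds is an assumption made for the example; the errno numbers are the WASI Preview 1 values.

```rust
/// WASI Preview 1 errno values used in this sketch.
pub const ERRNO_SUCCESS: u16 = 0;
pub const ERRNO_BADF: u16 = 8;

/// Hypothetical stand-in for the Console cap client.
pub trait Console {
    fn write(&mut self, bytes: &[u8]);
}

/// Test double that records everything written to it.
#[derive(Default)]
pub struct RecordingConsole(pub Vec<u8>);

impl Console for RecordingConsole {
    fn write(&mut self, bytes: &[u8]) {
        self.0.extend_from_slice(bytes);
    }
}

/// Stdout-only fd_write: fds 1 and 2 go to the Console cap.
/// Returns (errno, bytes written). Assumption: any other fd is ERRNO_BADF.
pub fn fd_write(console: &mut dyn Console, fd: u32, iovs: &[&[u8]]) -> (u16, usize) {
    if fd != 1 && fd != 2 {
        return (ERRNO_BADF, 0);
    }
    let mut written = 0;
    for iov in iovs {
        // Each iovec is forwarded as one Console write.
        console.write(iov);
        written += iov.len();
    }
    (ERRNO_SUCCESS, written)
}
```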
Phase W.3 (per-instance argv grant, closed
2026-05-07 18:25 UTC): bounded initConfig.init.wasiArgs text
grant on top of the existing manifest CapSet, validated against
WASI_ARGS_MAX_COUNT = 32, WASI_ARGS_MAX_ARG_BYTES = 4096, and
WASI_ARGS_MAX_TOTAL_BYTES = 8192. The wasm-host installs the bundle
on HostState before instantiation, and Preview 1 args_get /
args_sizes_get reflect it. make run-wasi-cli-args is the
end-to-end proof. A 2026-05-13 follow-up adds the same bounded-text
shape for initConfig.init.wasiEnv (WASI_ENV_MAX_COUNT = 32,
WASI_ENV_MAX_ENTRY_BYTES = 4096, WASI_ENV_MAX_TOTAL_BYTES = 8192)
with make run-wasi-env and make wasi-env-negative-check.
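The bounded-text validation for the argv grant might look like the following Rust sketch. The three limits are the ones quoted above; `ArgsError`, `validate_wasi_args`, and the byte-accounting rule (raw UTF-8 bytes, excluding any NUL terminators) are assumptions for illustration and may not match what the wasm-host actually counts.

```rust
pub const WASI_ARGS_MAX_COUNT: usize = 32;
pub const WASI_ARGS_MAX_ARG_BYTES: usize = 4096;
pub const WASI_ARGS_MAX_TOTAL_BYTES: usize = 8192;

/// Hypothetical error type for a rejected argv grant.
#[derive(Debug, PartialEq)]
pub enum ArgsError {
    TooMany,
    /// Index of the first argument over the per-argument limit.
    ArgTooLong(usize),
    TotalTooLong,
}

/// Validate a bounded argv grant before installing it on the host state.
/// Assumption: sizes count raw UTF-8 bytes without NUL terminators.
/// Returns the total byte count on success.
pub fn validate_wasi_args(args: &[&str]) -> Result<usize, ArgsError> {
    if args.len() > WASI_ARGS_MAX_COUNT {
        return Err(ArgsError::TooMany);
    }
    let mut total = 0usize;
    for (i, arg) in args.iter().enumerate() {
        if arg.len() > WASI_ARGS_MAX_ARG_BYTES {
            return Err(ArgsError::ArgTooLong(i));
        }
        total += arg.len();
        if total > WASI_ARGS_MAX_TOTAL_BYTES {
            return Err(ArgsError::TotalTooLong);
        }
    }
    Ok(total)
}
```

The same shape would apply to the `wasiEnv` grant with its own three constants.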
Phase W.4 (random_get production + clocks production-ready,
closed 2026-05-07 20:09 UTC): Preview 1 random_get routed
through the kernel EntropySource cap when the manifest grants it,
chunked at the cap’s MAX_ENTROPY_FILL_BYTES = 64 ceiling and capped
per Preview 1 invocation at RANDOM_GET_MAX_BYTES = 65_536 bytes;
ungranted variant refuses with ERRNO_NOSYS = 52.
make run-wasi-random and make run-wasi-random-ungranted are the
granted/ungranted proofs. clock_time_get(CLOCKID_REALTIME) keeps
returning ERRNO_NOSYS until a typed WallClock cap exists.
A 2026-05-13 compatibility-import slice promotes authority-free
Preview 1 imports (clock_res_get(MONOTONIC), sched_yield, stdio
fd_fdstat_get metadata, stdio fd_seek returning ERRNO_SPIPE)
through make run-wasi-stdio-fd. make run-wasi-preview1-refusals
keeps representative blocked storage and socket imports failed closed
with ERRNO_NOSYS = 52.
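The chunking and refusal behavior of `random_get` in W.4 can be sketched as below. The constants are those quoted above; the `EntropySource` trait and `CountingSource` double are hypothetical stand-ins for the cap client, and returning `ERRNO_INVAL` for a request above the per-invocation cap is an assumption, since the text says only that requests are capped.

```rust
pub const MAX_ENTROPY_FILL_BYTES: usize = 64;
pub const RANDOM_GET_MAX_BYTES: usize = 65_536;
pub const ERRNO_SUCCESS: u16 = 0;
pub const ERRNO_INVAL: u16 = 28;
pub const ERRNO_NOSYS: u16 = 52;

/// Hypothetical stand-in for the EntropySource cap client.
pub trait EntropySource {
    /// Fill `out`, which is at most MAX_ENTROPY_FILL_BYTES long.
    fn fill(&mut self, out: &mut [u8]);
}

/// Deterministic test double that counts cap invocations.
pub struct CountingSource {
    pub calls: usize,
}

impl EntropySource for CountingSource {
    fn fill(&mut self, out: &mut [u8]) {
        assert!(out.len() <= MAX_ENTROPY_FILL_BYTES);
        self.calls += 1;
        out.fill(0xA5);
    }
}

/// Preview 1 random_get over an optional entropy grant.
pub fn random_get(source: Option<&mut dyn EntropySource>, buf: &mut [u8]) -> u16 {
    // Ungranted: refuse instead of inventing entropy.
    let Some(source) = source else { return ERRNO_NOSYS };
    if buf.len() > RANDOM_GET_MAX_BYTES {
        return ERRNO_INVAL; // assumption: over-cap requests are rejected
    }
    // One cap call per chunk of at most 64 bytes.
    for chunk in buf.chunks_mut(MAX_ENTROPY_FILL_BYTES) {
        source.fill(chunk);
    }
    ERRNO_SUCCESS
}
```

Under these assumptions a 150-byte request costs three cap invocations (64 + 64 + 22 bytes).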
Phase W.5 (filesystem against Namespace / File / Store,
blocked): waits on the storage cap surface from
Storage and Naming proposal. Until
then, make run-wasi-preview1-refusals is the refusal evidence.
Phase W.6 (sockets against TcpSocket / UdpSocket, blocked):
waits on a userspace network stack process (or an interim
Fetch / HttpEndpoint shim) from
Networking proposal. Same refusal evidence
as W.5 in the interim.
Phase W.7 (Preview 2 / Component Model + wasmtime migration,
blocked): waits on the std-userspace decision (same blocker as
the capnp-rpc remote-session rewrite). When it lands, WIT resources
map to typed OwnedCapability<T> slots in the host adapter and the
schema gains the Component Model resource bridging variants.
Phase W.8 (TinyGo / Go-on-WASI CUE evaluator, blocked): waits on
the same std-userspace decision; native GOOS=capos remains the path
for full Go runtime semantics.
Part 6: Putting It All Together – Porting Strategy
Spectrum of Integration
Most native                                              Most compatible
    |                                                          |
    v                                                          v
Native Rust      C with libcapos   POSIX adapter      WASI binary
(capos-rt)       (typed caps)      (libcapos-posix)   (wasm runtime)
- Best perf      - Good perf       - Familiar API     - Any language
- Full cap       - Full cap        - Auto sandboxing  - Auto sandboxing
  control          control           via cap scoping    via wasm + caps
- Most work      - Moderate work   - Less rewrite     - Less rewrite
  to write         to write          for existing C     for WASI targets
Example: Porting a DNS Resolver
Native Rust: Rewrite using capos-rt. Receives UdpSocket cap, serves
DNS lookups as a DnsResolver capability. Other processes get a
DnsResolver cap instead of calling getaddrinfo(). Clean, typed, minimal
authority.
C with POSIX adapter: Take an existing DNS resolver (e.g., musl’s
getaddrinfo implementation or a standalone resolver). Compile against
libcapos-posix. Give it a UdpSocket cap and a Namespace cap for
/etc/resolv.conf. It calls socket(), sendto(), recvfrom() – all
translated to cap invocations. Works with minimal changes, but can’t export
a typed DnsResolver cap (it speaks POSIX, not caps).
WASI: Compile a Rust DNS resolver to WASI. Run it in the wasm runtime. Same capability scoping, but through the wasm sandbox.
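Whichever route is taken, the resolver ends up sending the same bytes through its UdpSocket cap. As a small illustration of the native Rust option, the sketch below encodes a DNS A query in the RFC 1035 wire format. The function name `encode_a_query` is hypothetical, and the socket call that would transmit the message is deliberately left out because its signature is not specified here.

```rust
/// Encode a DNS query for an A record (RFC 1035 wire format).
/// Returns None if any label is empty or longer than 63 bytes.
pub fn encode_a_query(id: u16, name: &str) -> Option<Vec<u8>> {
    let mut msg = Vec::with_capacity(12 + name.len() + 6);

    // Header: ID, flags (only RD set), QDCOUNT = 1, ANCOUNT/NSCOUNT/ARCOUNT = 0.
    msg.extend_from_slice(&id.to_be_bytes());
    msg.extend_from_slice(&[0x01, 0x00]);
    msg.extend_from_slice(&[0, 1, 0, 0, 0, 0, 0, 0]);

    // QNAME: length-prefixed labels, terminated by the zero-length root label.
    for label in name.trim_end_matches('.').split('.') {
        if label.is_empty() || label.len() > 63 {
            return None;
        }
        msg.push(label.len() as u8);
        msg.extend_from_slice(label.as_bytes());
    }
    msg.push(0);

    // QTYPE = A (1), QCLASS = IN (1).
    msg.extend_from_slice(&[0, 1, 0, 1]);
    Some(msg)
}
```

A native resolver would wrap this behind its DnsResolver interface, so clients never see the packet format or hold the socket cap themselves.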
Recommended Approach for capOS
- Foundational services: native Rust by default. Drivers, network stack, store, and init are the foundation and should use capabilities natively unless a concrete reviewed reason justifies another runtime.
- First applications: native Rust. While the ecosystem is young, applications should use capos-rt directly. This validates the cap model.
- C compatibility: when porting specific software. Do not build the POSIX adapter speculatively. Build it when there is a specific C program to port (e.g., a DNS resolver, an HTTP server, a database). Let real porting needs drive which POSIX functions to implement.
- WASI: as the general-purpose application runtime. Once the native runtime is stable, the wasm runtime becomes the “run anything” answer. Lower priority than native Rust, but higher priority than full POSIX/musl compat, because WASI’s capability model is a natural fit.
Part 7: Schema Extensions
New schema types needed for the userspace runtime:
# Extend schema/capos.capnp
struct InitialCaps {
entries @0 :List(InitialCapEntry);
}
struct InitialCapEntry {
name @0 :Text;
id @1 :UInt32;
interfaceId @2 :UInt64;
}
interface ProcessSpawner {
spawn @0 (name :Text, binaryName :Text, grants :List(CapGrant)) -> (handleIndex :UInt16);
}
struct CapGrant {
name @0 :Text;
capId @1 :UInt32;
interfaceId @2 :UInt64;
}
interface ProcessHandle {
wait @0 () -> (exitCode :Int64);
}
These definitions now live in schema/capos.capnp as the single source of
truth. spawn() returns the ProcessHandle through the ring result-cap list;
handleIndex identifies that transferred cap in the completion. The first
slice passes a boot-package binaryName instead of raw ELF bytes so spawn
requests stay inside the bounded ring parameter buffer; manifest-byte exposure
and bulk-buffer spawning remain later work. kill, post-spawn grants, and
exported-cap lookup are deferred until their lifecycle semantics are
implemented.
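The role of `handleIndex` can be illustrated with a small Rust sketch: the reply carries an index, and the caller looks that index up in the list of caps transferred with the completion. `SpawnCompletion`, `ProcessHandle`, and `adopt_process_handle` are hypothetical names for this example and do not reflect the capos-rt result-cap adoption API.

```rust
/// Hypothetical cap slot id in the caller's cap table.
pub type CapId = u32;

/// Hypothetical view of a spawn completion: the decoded reply field plus
/// the caps that were transferred alongside it.
pub struct SpawnCompletion {
    pub handle_index: u16,
    pub result_caps: Vec<CapId>,
}

/// Hypothetical owned wrapper for an adopted ProcessHandle cap.
#[derive(Debug, PartialEq)]
pub struct ProcessHandle(pub CapId);

/// Adopt the ProcessHandle named by handleIndex.
/// An out-of-range index yields None instead of guessing a slot.
pub fn adopt_process_handle(c: &SpawnCompletion) -> Option<ProcessHandle> {
    c.result_caps
        .get(usize::from(c.handle_index))
        .copied()
        .map(ProcessHandle)
}
```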
Implementation Status And Future Phases
Implemented Baseline: capos-rt
- capos-rt/ exists as a standalone no_std + alloc runtime crate.
- capos-rt owns _start, heap initialization, panic output, raw syscall wrappers, bootstrap validation, CapSet parsing, the entry-point macro, the single-owner ring client, typed clients, result-cap adoption, and owned handle release.
- init/, shell/, demos/, and the runtime smoke binary build for targets/x86_64-unknown-capos.json.
- QEMU proofs cover typed Console calls, exception decoding, spawn/wait, runtime VirtualMemory, Timer, ThreadControl, ThreadSpawner, ThreadHandle, terminal sessions, and release behavior.
Deliverable: completed. See Userspace Runtime and Programming Languages for current validation.
Future Phase: broader generated/native clients
- Add generated clients after the schema surface stabilizes.
- Preserve the existing split where capos-rt owns transport lifetime and interface-specific wrappers own message encoding.
- Establish the out-of-tree service-binary packaging pattern once the internal userspace target contract is stable.
Deliverable: ordinary native capOS services can depend on generated typed clients without copying runtime transport logic.
libcapos for C – Phase 0 closed
- extern "C" API exposing capos_cap_call, capos_capset_get, capos_sys_exit, capos_sys_cap_enter, capos_console_write_line, capos_timer_now, capos_entropy_fill, capos_virtual_memory_*, capos_process_spawner_create_pipe, capos_pipe_read, capos_pipe_write, capos_pipe_close, and malloc/free/calloc/realloc heap shims over the capos-rt global allocator.
- Public header at libcapos/include/capos/capos.h.
- Build system: make libcapos produces libcapos/target/x86_64-unknown-capos/release/libcapos.a; make c-hello and make c-pipe link native C smokes with clang + lld using the shared demos/linker.ld.
- C “hello world” smoke at demos/c-hello/main.c calls Console.writeLine through capos_console_write_line, exercises Timer, EntropySource, and VirtualMemory typed wrappers, verifies capos_cap_call rejects a bootstrap ThreadSpawner cap locally, and exits cleanly. make run-c-hello boots system-c-hello.cue and the smoke greps for the [c-hello] hello from c-hello, entropy, VM, and ThreadSpawner rejection markers plus the kernel process N exited with code 0 line.
- Native C pipe smoke at demos/c-pipe/main.c uses capos_process_spawner_create_pipe, writes and reads native-c-pipe-marker through typed Pipe wrappers, closes the write end, observes EOF, and exits cleanly. make run-c-pipe boots system-c-pipe.cue and checks the create, read, EOF, and clean-exit markers.
Deliverable: complete – C binary boots, calls Console.writeLine, and
exits cleanly through capos_sys_exit.
Deferred to later libcapos phases: generated typed wrappers per
interface, transferred result-cap propagation across the C ABI,
per-thread routing of the runtime ring, and a libcapos-posix layer.
Future Phase: POSIX compatibility adapter
- Implement FdTable and path resolution
- Start with file I/O (open/read/write/close over Namespace + Store)
- Add socket wrappers when networking is userspace
- Optionally integrate musl for full libc
Deliverable: an existing C program (e.g., a simple HTTP server) runs on capOS with minimal source changes.
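A first cut of the FdTable could look like the Rust sketch below. It is a minimal illustration, not the planned design: `FdEntry`, the `EBADF` value, and mapping all three stdio fds to the Console cap are assumptions. The one behavior taken from POSIX itself is that a new descriptor gets the lowest free number.

```rust
/// Hypothetical backing object for a file descriptor.
#[derive(Debug, Clone, PartialEq)]
pub enum FdEntry {
    Console,
    File { cap_id: u32, offset: u64 },
}

/// Assumed errno for an invalid descriptor (illustrative value).
pub const EBADF: i32 = 9;

/// Per-process table mapping small integers to capability-backed entries.
#[derive(Default)]
pub struct FdTable {
    slots: Vec<Option<FdEntry>>,
}

impl FdTable {
    /// Assumption: fds 0, 1, and 2 all start out backed by the Console cap.
    pub fn with_stdio() -> Self {
        FdTable { slots: vec![Some(FdEntry::Console); 3] }
    }

    /// Install an entry at the lowest free fd number, as POSIX requires.
    pub fn insert(&mut self, entry: FdEntry) -> i32 {
        if let Some(i) = self.slots.iter().position(|s| s.is_none()) {
            self.slots[i] = Some(entry);
            return i as i32;
        }
        self.slots.push(Some(entry));
        (self.slots.len() - 1) as i32
    }

    pub fn get(&self, fd: i32) -> Result<&FdEntry, i32> {
        if fd < 0 {
            return Err(EBADF);
        }
        self.slots.get(fd as usize).and_then(|s| s.as_ref()).ok_or(EBADF)
    }

    /// Free the slot; the real adapter would also release the backing cap.
    pub fn close(&mut self, fd: i32) -> Result<(), i32> {
        if fd < 0 {
            return Err(EBADF);
        }
        match self.slots.get_mut(fd as usize).and_then(|s| s.take()) {
            Some(_) => Ok(()),
            None => Err(EBADF),
        }
    }
}
```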
WASI runtime (partially landed)
The WASI host adapter is its own track owned by
docs/proposals/wasi-host-adapter-proposal.md and the WASI Host Adapter
implementation decomposition. Phase decomposition:
- W.1 (host scaffold; landed 2026-05-05 19:12 UTC): capos-wasm/ standalone crate over vendored wasmi 1.0.9 (vendor/wasmi-no_std/wasmi-1.0.9/), make capos-wasm-build.
- W.2 (Preview 1 stdout-only; closed 2026-05-07 10:53 UTC): wasm-host userspace binary, make run-wasm-host empty-instantiation smoke, Preview 1 stdout-only import resolver, manifest-payload load path, Rust hello, wasi smoke (make run-wasi-hello-rust), and C hello, wasi smoke (make run-wasi-hello-c). Capabilities backing the host imports today: Console + Timer + BootPackage. v0 chose wasmi-as-Rust-crate over wasm-micro-runtime-as-C-binary; wasmtime / WAMR remain W.7+ migration candidates.
- W.3 (per-instance CapSet plumbing + LaunchParameters) closed 2026-05-07 18:25 UTC.
- W.4 (random_get against the in-tree EntropySource cap, plus clocks production-ready) closed 2026-05-07 20:09 UTC.
- 2026-05-13 compatibility/refusal smokes: make run-wasi-stdio-fd proves promoted authority-free imports no longer return ERRNO_NOSYS; make run-wasi-preview1-refusals keeps storage and socket imports failed closed without authority.
- W.5 (filesystem against Namespace/File/Store), W.6 (sockets against TcpSocket/UdpSocket), and W.7+ (Preview 2 / Component Model) remain future phases.
Deliverable status: hello.wasm runs on capOS today (both Rust
and C payloads), argv and entropy grants are implemented, and
authority-free stdio fd compatibility imports are covered by a direct
smoke. Filesystem/socket phases are queued behind their authority
surfaces.
Open Questions
- Allocator strategy. Should the userspace heap be a fixed-size region (simple, but limits memory), or should it grow by invoking a FrameAllocator cap (flexible, but every allocation might syscall)? Likely answer: fixed initial region + grow-on-demand via cap.
- Async I/O. The SQ/CQ ring is inherently asynchronous (submit SQEs, poll CQEs), but the initial capos-rt wrappers provide blocking convenience (submit one CALL SQE + cap_enter(1, MAX)). Real services need batched async patterns. Options:
  - Submit multiple SQEs, poll CQEs in an event loop (io_uring style)
  - Runtime green threads or tasks multiplexed through one ring dispatcher; the 7.1 threading contract keeps at most one blocked cap_enter waiter per process ring until a sharded or per-thread ring ABI exists
  - Userspace executor (like tokio) driving the ring
- Cap passing in the POSIX adapter. POSIX has SCM_RIGHTS for passing fds over Unix sockets. Should the POSIX adapter support something similar for passing caps? Or is this native-only?
- Dynamic linking. Currently all binaries are statically linked. Should capOS support shared libraries? Probably not initially – static linking is simpler and the binaries are small. Revisit if binary size becomes a concern.
- WASI component model integration. WASI preview 2 components have typed imports/exports that could map to capnp interfaces. Should the wasm runtime auto-generate capnp-to-WIT adapters from schemas? This would let wasm components participate natively in the capability graph.
- Build system. How are userspace binaries packed into the boot image? Currently the Makefile builds init/ separately. With multiple service binaries, a more scalable approach is needed (a build manifest that lists all binaries, and a Makefile target that builds and packs them all).
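For the Async I/O question, the first option (submit a batch, enter once, route completions) can be sketched in Rust against an in-memory stand-in for the ring. Everything here is hypothetical: `Sqe`, `Cqe`, and `FakeRing` are simplified and do not match the real entry layouts, and `enter` only imitates `cap_enter` by completing queued entries in reverse order to show that completions need not arrive in submission order.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical, simplified submission entry.
#[derive(Debug, Clone, Copy)]
pub struct Sqe {
    pub user_data: u64,
    pub cap_id: u32,
    pub method: u16,
}

/// Hypothetical, simplified completion entry.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Cqe {
    pub user_data: u64,
    pub result: i32,
}

/// In-memory stand-in for the shared SQ/CQ pair plus the kernel side.
#[derive(Default)]
pub struct FakeRing {
    sq: VecDeque<Sqe>,
    cq: VecDeque<Cqe>,
}

impl FakeRing {
    pub fn submit(&mut self, sqe: Sqe) {
        self.sq.push_back(sqe);
    }

    /// Stand-in for cap_enter: completes every queued SQE, newest first.
    /// The fake "result" is just the method id echoed back.
    pub fn enter(&mut self) {
        while let Some(sqe) = self.sq.pop_back() {
            self.cq.push_back(Cqe { user_data: sqe.user_data, result: i32::from(sqe.method) });
        }
    }

    pub fn pop_cqe(&mut self) -> Option<Cqe> {
        self.cq.pop_front()
    }
}

/// Submit a batch of (cap_id, method) calls, enter once, then match each
/// completion back to its request through user_data.
pub fn call_batch(ring: &mut FakeRing, calls: &[(u32, u16)]) -> Vec<i32> {
    let mut pending: HashMap<u64, usize> = HashMap::new();
    for (i, &(cap_id, method)) in calls.iter().enumerate() {
        let user_data = i as u64 + 1;
        pending.insert(user_data, i);
        ring.submit(Sqe { user_data, cap_id, method });
    }
    ring.enter();
    let mut results = vec![0; calls.len()];
    while let Some(cqe) = ring.pop_cqe() {
        if let Some(slot) = pending.remove(&cqe.user_data) {
            results[slot] = cqe.result;
        }
    }
    results
}
```

The other two options would layer on the same correlation step: a dispatcher or executor owns the single blocked waiter and wakes whichever task registered the matching `user_data`.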
Relationship to Other Proposals
- Service architecture proposal – defines what services exist and how they compose. This proposal defines how those service binaries are built, what runtime they use, and how non-Rust software fits in.
- Storage and naming proposal – the POSIX open()/read()/write() translation targets the Store and Namespace caps defined there.
- Networking proposal – the POSIX socket translation targets the TcpSocket/UdpSocket caps from the network stack.