Userspace Runtime

The userspace runtime owns the repeated mechanics that every service needs: bootstrap validation, heap initialization, typed capability lookup, ring submission, completion matching, application exception decoding, and handle lifetime.

  • Go VirtualMemory Contract defines the caller-buffer reserve, commit, and decommit methods allocator paths need.
  • Programming Languages summarizes current native Rust support and planned language-runtime tracks.
  • Memory Management documents the implemented kernel VirtualMemory and MemoryObject behavior.
  • Go Runtime is the owning language runtime proposal; LLVM Target records the Go runtime OS hooks that drive this work.

Current Behavior

Runtime-owned _start receives (ring_addr, pid, capset_addr), initializes a fixed heap, validates the ring address, reads the read-only CapSet page, installs an emergency Console panic path when available, calls capos_rt_main(runtime), and exits with the returned code.
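The bootstrap checks above can be pictured with a small host-side sketch. It is not the capos-rt code: the `RING_VADDR` value, the CapSet magic and version constants, the `CapSetHeader` layout, and the `BootstrapError` type are all assumptions made up for illustration.

```rust
/// Hypothetical fixed ring address; the real RING_VADDR value is not stated here.
const RING_VADDR: u64 = 0x4000_0000;
/// Hypothetical CapSet header constants, for illustration only.
const CAPSET_MAGIC: u32 = 0x4341_5053;
const CAPSET_VERSION: u16 = 1;

/// Assumed error type; the real runtime may report failures differently.
#[derive(Debug, PartialEq, Eq)]
enum BootstrapError {
    BadRingAddress(u64),
    BadCapSetMagic(u32),
    UnsupportedCapSetVersion(u16),
}

/// Assumed header layout: magic, version, entry count.
#[derive(Clone, Copy)]
struct CapSetHeader {
    magic: u32,
    version: u16,
    entry_count: u16,
}

/// Checks the ring address first, then the CapSet header, and only then
/// exposes the entry count that a lookup would iterate over.
fn validate_bootstrap(ring_addr: u64, header: CapSetHeader) -> Result<u16, BootstrapError> {
    if ring_addr != RING_VADDR {
        return Err(BootstrapError::BadRingAddress(ring_addr));
    }
    if header.magic != CAPSET_MAGIC {
        return Err(BootstrapError::BadCapSetMagic(header.magic));
    }
    if header.version != CAPSET_VERSION {
        return Err(BootstrapError::UnsupportedCapSetVersion(header.version));
    }
    Ok(header.entry_count)
}
```

Returning a `Result` keeps the sketch testable on a host; what the real `_start` does on a failed check is not described here.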

The Runtime lends out at most one RuntimeRingClient at a time. The client wraps the raw ring page, keeps request buffers alive until completions are matched, handles out-of-order completions, packs copy-transfer descriptors, and parses result-cap records. Owned runtime handles queue CAP_OP_RELEASE when the last local reference is dropped; the release queue flushes when a ring client is borrowed or dropped, or when code calls Runtime::flush_releases() explicitly. Promise placeholders are currently bookkeeping only: they record the coordinates a future pipelined SQE would use, mapping AnswerId.raw() to pipeline_dep and a result-cap record index to pipeline_field.
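One way to picture buffer lifetime and out-of-order completion matching is a pending-call table keyed by user_data. This is a minimal sketch, not the capos-rt ring client: `Cqe`, `PendingCall`, and `PendingTable` are hypothetical names, and the real SQE/CQE layouts are not shown in this document.

```rust
use std::collections::HashMap;

/// Simplified completion entry; the real CQE has more fields.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Cqe {
    user_data: u64,
    result: i32,
}

/// A submitted request whose buffers must stay alive until its CQE arrives.
struct PendingCall {
    params: Vec<u8>,
    result_buf: Vec<u8>,
}

/// Hypothetical pending-call table keyed by user_data.
#[derive(Default)]
struct PendingTable {
    next_user_data: u64,
    pending: HashMap<u64, PendingCall>,
}

impl PendingTable {
    /// Records a submitted call and takes ownership of its buffers, so they
    /// cannot be freed while the kernel may still read or write them.
    fn submit(&mut self, params: Vec<u8>, result_len: usize) -> u64 {
        let user_data = self.next_user_data;
        self.next_user_data += 1;
        let call = PendingCall { params, result_buf: vec![0; result_len] };
        self.pending.insert(user_data, call);
        user_data
    }

    /// Matches a completion in whatever order it arrives. The buffers are
    /// handed back only once the matching CQE has been seen; a CQE with an
    /// unknown user_data yields None.
    fn complete(&mut self, cqe: Cqe) -> Option<(i32, PendingCall)> {
        self.pending.remove(&cqe.user_data).map(|call| (cqe.result, call))
    }

    /// Number of calls still waiting for a completion.
    fn in_flight(&self) -> usize {
        self.pending.len()
    }
}
```

Because the table owns the buffers, a caller that submits two calls and sees the second completion first can still retrieve each result independently.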

Design

The runtime separates non-owning bootstrap references from owned local handles. CapSet entries produce typed Capability<T> values only when the interface ID matches the requested type, and the same manifest-order CapSet entries remain available for diagnostic and shell surfaces that need to list or inspect what a process was actually granted. Result-cap adoption performs the same interface check before producing OwnedCapability<T>.
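The interface-ID check on lookup might look roughly like the following. The `Interface` trait, the ID values, the entry fields, and `LookupError` are assumptions for illustration; only the `Capability<T>` name comes from the text above.

```rust
use std::marker::PhantomData;

/// Marker trait tying a Rust type to an interface ID (the IDs below are made up).
trait Interface {
    const INTERFACE_ID: u64;
}

struct Console;
impl Interface for Console {
    const INTERFACE_ID: u64 = 0xC0;
}

struct Timer;
impl Interface for Timer {
    const INTERFACE_ID: u64 = 0x71;
}

/// Untyped CapSet entry, kept in manifest order (field names are assumed).
#[derive(Debug, Clone, PartialEq, Eq)]
struct CapSetEntry {
    name: &'static str,
    cap_id: u32,
    interface_id: u64,
}

/// Non-owning typed reference produced by a successful lookup.
struct Capability<T> {
    cap_id: u32,
    _iface: PhantomData<T>,
}

#[derive(Debug, PartialEq, Eq)]
enum LookupError {
    NotFound,
    InterfaceMismatch { expected: u64, found: u64 },
}

struct CapSet {
    entries: Vec<CapSetEntry>,
}

impl CapSet {
    /// Produces a typed capability only when the entry's interface ID
    /// matches the requested marker type.
    fn get<T: Interface>(&self, name: &str) -> Result<Capability<T>, LookupError> {
        let entry = self
            .entries
            .iter()
            .find(|e| e.name == name)
            .ok_or(LookupError::NotFound)?;
        if entry.interface_id != T::INTERFACE_ID {
            return Err(LookupError::InterfaceMismatch {
                expected: T::INTERFACE_ID,
                found: entry.interface_id,
            });
        }
        Ok(Capability { cap_id: entry.cap_id, _iface: PhantomData })
    }

    /// Untyped manifest-order view for diagnostic and shell surfaces.
    fn iter(&self) -> impl Iterator<Item = &CapSetEntry> {
        self.entries.iter()
    }
}
```

The typed path and the untyped iteration read the same entries, which is what lets a shell list a grant that no typed lookup would accept.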

Typed clients are thin wrappers over the ring client. They encode Cap’n Proto params, submit CALL SQEs, wait for a matching CQE, decode transport errors, and decode kernel-produced CapException payloads into client errors. Endpoint servers can use submit_endpoint_return_exception() to return a serialized CapException to the original caller over the same endpoint RETURN path.

The handwritten TimerClient exposes monotonic now reads and sleep calls over the same completion-matching path.

The handwritten VirtualMemoryClient exposes map, reserve, commit, decommit, unmap, and protect calls for runtime heap/arena allocation over anonymous user pages. It has both the ordinary allocation-backed async methods and synchronous caller-buffer methods for allocator growth paths that cannot allocate while asking the kernel for more memory. This matches the reserve/commit/decommit surface specified in Go VirtualMemory Contract.

The handwritten ThreadControlClient exposes current-process FS-base reads and updates for runtimes that need to swap a language-managed TLS base after process startup.

The 7.1.0 threading contract keeps one process ring and the runtime’s single-owner ring-client invariant for the first in-process threading implementation. Future multi-threaded runtimes must serialize blocking ring entry through capos-rt until a runtime reactor or Ring v2 lands. The reactor bridge uses one runtime-owned CQ drainer plus ParkSpace-backed wait records; the full-SMP kernel target is per-thread rings, where cap_enter waits on the current thread’s CQ. After 7.2, the existing ThreadControlClient methods apply to the current thread’s FS base rather than to a process-wide saved FS base. ThreadControl.exitThread and the raw exit(code) syscall both terminate the current thread; the process exits when its last live thread exits.

The 7.2.3 park slice adds a process-local ParkSpace marker type and compact CAP_OP_PARK / CAP_OP_UNPARK operations. capos-rt should expose those operations as runtime synchronization primitives in a later slice; the current thread-lifecycle proof uses raw SQEs so the runtime does not prematurely claim the park user_data namespace. Blocking park wait is not an ordinary RuntimeRingClient call: the wait SQE must be thread-owned for the current thread, and the runtime must reserve park user_data values, write the wait SQE under its ring-submission lock, release that lock before cap_enter, and demultiplex park CQEs into runtime-owned wait slots so a sibling thread can still submit the wake. The temporary single-thread park fallback remains only as the pre-thread runtime checkpoint proof.
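The lock ordering and demultiplexing described above can be sketched on a host with standard-library locks. Everything here is an assumption for illustration: the `ParkDemux` type, the choice of the top bit to mark park user_data values, and the use of a `Vec` as a stand-in for the submission queue. The sketch only models the bookkeeping; it does not park a thread.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Assumption: park user_data values occupy a reserved range marked by the top bit.
const PARK_USER_DATA_BIT: u64 = 1 << 63;

/// Hypothetical runtime-owned demultiplexer for park completions.
#[derive(Default)]
struct ParkDemux {
    next_id: AtomicU64,
    /// Stand-in for the ring-submission lock and the SQ it protects.
    submission: Mutex<Vec<u64>>,
    /// Wait slots: None while waiting, Some(result) once the CQE arrived.
    wait_slots: Mutex<HashMap<u64, Option<i32>>>,
}

impl ParkDemux {
    /// Reserves a park user_data value and writes the wait SQE while holding
    /// the submission lock. The guard is dropped before returning, so the
    /// caller would enter the kernel without blocking sibling submitters.
    fn begin_wait(&self) -> u64 {
        let user_data = PARK_USER_DATA_BIT | self.next_id.fetch_add(1, Ordering::Relaxed);
        self.wait_slots.lock().unwrap().insert(user_data, None);
        {
            let mut sq = self.submission.lock().unwrap();
            sq.push(user_data);
        } // submission lock released here, before any blocking wait
        user_data
    }

    /// Routes one CQE: park completions fill their wait slot and return None;
    /// anything else is handed back for ordinary completion matching.
    fn route_cqe(&self, user_data: u64, result: i32) -> Option<(u64, i32)> {
        if (user_data & PARK_USER_DATA_BIT) == 0 {
            return Some((user_data, result));
        }
        let mut slots = self.wait_slots.lock().unwrap();
        if let Some(slot) = slots.get_mut(&user_data) {
            *slot = Some(result);
        }
        None
    }

    /// Consumes the slot if its wake has been delivered.
    fn take_if_woken(&self, user_data: u64) -> Option<i32> {
        let mut slots = self.wait_slots.lock().unwrap();
        let woken = slots.get(&user_data).copied().flatten();
        if woken.is_some() {
            slots.remove(&user_data);
        }
        woken
    }

    /// True when a sibling thread could take the submission lock right now.
    fn can_submit(&self) -> bool {
        self.submission.try_lock().is_ok()
    }
}
```

The point of the shape is that a waiter holds no submission state while it is blocked, so the thread that will issue the wake can still reach the ring.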

Future generated clients should preserve this split: transport lifetime and completion matching belong in the runtime, while interface-specific encoding belongs in generated or handwritten client wrappers.
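The split can be illustrated with a trait for the transport side and a thin typed wrapper on top. This is a sketch under stated assumptions: `RingTransport`, `ClientError`, the method ordinal, and the 8-byte little-endian reply are placeholders, and the real clients encode Cap’n Proto messages rather than raw integers.

```rust
/// Errors a typed client might surface (variants are illustrative).
#[derive(Debug, PartialEq, Eq)]
enum ClientError {
    Transport(i32),
    Malformed,
}

/// Runtime side: submit bytes, wait for the matching completion, and
/// return the reply bytes or a transport error code.
trait RingTransport {
    fn call(&mut self, cap_id: u32, method: u16, params: &[u8]) -> Result<Vec<u8>, i32>;
}

/// Wrapper side: knows only how to encode params and decode replies.
struct TimerClient {
    cap_id: u32,
}

impl TimerClient {
    /// Hypothetical method ordinal for the monotonic now read.
    const METHOD_NOW: u16 = 0;

    fn now<T: RingTransport>(&self, ring: &mut T) -> Result<u64, ClientError> {
        let reply = ring
            .call(self.cap_id, Self::METHOD_NOW, &[])
            .map_err(ClientError::Transport)?;
        // Placeholder decoding: an 8-byte little-endian timestamp.
        if reply.len() != 8 {
            return Err(ClientError::Malformed);
        }
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(&reply);
        Ok(u64::from_le_bytes(bytes))
    }
}

/// Test double standing in for the real ring client.
struct FakeRing {
    reply: Result<Vec<u8>, i32>,
}

impl RingTransport for FakeRing {
    fn call(&mut self, _cap_id: u32, _method: u16, _params: &[u8]) -> Result<Vec<u8>, i32> {
        self.reply.clone()
    }
}
```

Keeping the wrapper generic over the transport means a generated client could replace the handwritten one without touching completion matching.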

Invariants

  • ring_addr must equal RING_VADDR; runtime bootstrap rejects any other address.
  • The CapSet header magic/version must validate before lookup.
  • CapSet handles are non-owning unless explicitly adopted.
  • Only one runtime ring client may be live at a time for a process.
  • Until Ring v2, multithreaded generic client waits must flow through a runtime reactor/demux path rather than letting multiple threads consume the process CQ directly.
  • Park wait must not hold the live runtime ring client while the kernel parks the current thread.
  • Request params and result buffers must outlive their matching CQE.
  • A result cap can be consumed only once and only with the expected interface ID.
  • Promise placeholders must map to sideband result-cap record indexes, not schema field paths.
  • Dropping the final owned handle queues exactly one local CAP_OP_RELEASE; Runtime::flush_releases() forces queued releases and reports rejected kernel release results.
  • Release flushing treats stale or already-removed caps as non-fatal cleanup.
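
The two release invariants can be modeled with reference counting: the release is queued by the shared inner value's `Drop`, so it fires once no matter how many clones existed. This is a single-threaded sketch, not the capos-rt implementation; `ReleaseQueue`, `OwnedHandle`, `ReleaseResult`, and the free function `flush_releases` are hypothetical stand-ins.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Shared queue of cap ids awaiting CAP_OP_RELEASE (simplified).
type ReleaseQueue = Rc<RefCell<Vec<u32>>>;

/// State shared by every clone of one owned handle.
struct HandleInner {
    cap_id: u32,
    queue: ReleaseQueue,
}

impl Drop for HandleInner {
    /// Runs only when the last clone goes away, so each cap is queued once.
    fn drop(&mut self) {
        self.queue.borrow_mut().push(self.cap_id);
    }
}

/// Cloneable owned handle; clones share one HandleInner.
#[derive(Clone)]
struct OwnedHandle(Rc<HandleInner>);

impl OwnedHandle {
    fn adopt(cap_id: u32, queue: &ReleaseQueue) -> Self {
        OwnedHandle(Rc::new(HandleInner { cap_id, queue: Rc::clone(queue) }))
    }
}

/// Outcome the kernel might report for one release (names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ReleaseResult {
    Released,
    Stale,
    Rejected,
}

/// Drains the queue and returns only the caps whose release was rejected.
/// Stale or already-removed caps count as completed cleanup.
fn flush_releases(
    queue: &ReleaseQueue,
    mut kernel: impl FnMut(u32) -> ReleaseResult,
) -> Vec<u32> {
    let drained: Vec<u32> = queue.borrow_mut().drain(..).collect();
    drained
        .into_iter()
        .filter(|&cap| kernel(cap) == ReleaseResult::Rejected)
        .collect()
}
```

Passing the kernel call as a closure keeps the sketch host-testable; the real flush submits through the ring.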

Code Map

  • capos-rt/src/entry.rs - _start, Runtime, bootstrap validation, single-owner ring token, release queue flushing.
  • capos-rt/src/alloc.rs - fixed userspace heap initialization.
  • capos-rt/src/capset.rs - typed CapSet lookup and manifest-order iteration wrappers.
  • capos-rt/src/ring.rs - ring client, pending calls, completion matching, copy-transfer packing, result-cap parsing.
  • capos-rt/src/client.rs - Console, TerminalSession, BootPackage, ProcessSpawner, ProcessHandle, VirtualMemory, Timer, ThreadControl, ThreadSpawner, and ThreadHandle clients, and exception decoding.
  • capos-rt/src/lib.rs - typed capability marker types and owned handle reference counting.
  • capos-rt/src/panic.rs - emergency Console output path.
  • capos-rt/src/syscall.rs - raw syscall instructions and public syscall wrappers, including the hostile smoke probe for the removed ambient write syscall.
  • targets/x86_64-unknown-capos.json - userspace target specification.
  • tools/check-userspace-runtime-surface.sh - source check that keeps runtime primitives owned by capos-rt.
  • init/src/main.rs, capos-rt/src/bin/smoke.rs, and shell/src/main.rs - current runtime users.

Validation

  • make capos-rt-check builds the runtime smoke binary against targets/x86_64-unknown-capos.json, matching the booted userspace target.
  • make init-capos-build, make demos-capos-build, make shell-capos-build, and make capos-rt-capos-build expose focused custom-target build wrappers for the current userspace crates and runtime smoke binary.
  • tools/check-userspace-runtime-surface.sh verifies init, demos, and shell do not define _start, panic handlers, global allocators, raw syscall instructions, or entry-point macros outside capos-rt.
  • make run-smoke validates runtime entry, typed Console calls, exception decoding, owned handle release, result-cap parsing through IPC, and clean process exit.
  • make run-spawn validates ProcessSpawnerClient, ProcessHandleClient, VirtualMemoryClient, TimerClient, ThreadControlClient, ThreadSpawnerClient, ThreadHandleClient, result-cap adoption, and release behavior under init spawning. The single-thread-runtime child proves the first runtime-shaped checkpoint over caller-buffer VirtualMemory calls and Timer; the thread-lifecycle child proves in-process create, self-join rejection, join, detach, last-thread exitThread, and private ParkSpace wait/wake correctness.
  • make run-shell validates CapSet iteration, capability inspection, typed application-error decoding, guest session metadata, exact-grant spawning, ProcessHandle waits, and stale-handle release behavior in the focused shell-launch proof manifest.
  • make run-terminal validates TerminalSessionClient writes, bounded line reads, hidden-echo input handling, and structured cancellation in the focused terminal proof manifest.
  • cd capos-rt && cargo test --lib --target x86_64-unknown-linux-gnu covers host-testable runtime invariants when run explicitly.

Open Work

  • Add generated client bindings after the schema surface stabilizes.
  • Implement promise/answer transport semantics beyond current placeholders.
  • Add typed ParkSpace clients with runtime-owned user_data demultiplexing.
  • Define release behavior for queued handles when a process exits before the release queue flushes.