# Userspace Runtime

The userspace runtime owns the repeated mechanics that every service needs:
bootstrap validation, heap initialization, typed capability lookup, ring
submission, completion matching, application exception decoding, and handle
lifetime.


## Related

- [Go VirtualMemory Contract](../backlog/go-virtual-memory-contract.md)
  defines the caller-buffer reserve, commit, and decommit methods allocator
  paths need.
- [Programming Languages](../programming-languages.md) summarizes current
  native Rust support and planned language-runtime tracks.
- [Memory Management](memory.md) documents the implemented kernel
  `VirtualMemory` and `MemoryObject` behavior.
- [Go Runtime](../proposals/go-runtime-proposal.md) is the owning language
  runtime proposal; [LLVM Target](../research/llvm-target.md) records the Go
  runtime OS hooks that drive this work.

## Current Behavior

Runtime-owned `_start` receives `(ring_addr, pid, capset_addr)`, initializes a
fixed heap, validates the ring address, reads the read-only CapSet page, installs
an emergency Console panic path when available, calls `capos_rt_main(runtime)`,
and exits with the returned code.
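The validation half of that startup sequence can be sketched on the host as follows. This is a minimal illustration, not the code in `capos-rt/src/entry.rs`: the constant values, the `CapSetHeader` layout, the `BootstrapError` type, and the `validate_bootstrap` function are all assumptions made up for the example. Only the rule itself (reject any ring address other than `RING_VADDR`, and validate the CapSet header magic and version before lookup) comes from this document.

```rust
// Hypothetical values for illustration only; the real constants are defined
// in the capOS sources and are not stated in this document.
pub const RING_VADDR: u64 = 0x0000_7000_0000_0000;
pub const CAPSET_MAGIC: u32 = 0x4350_5354;
pub const CAPSET_VERSION: u16 = 1;

/// Reasons bootstrap validation can fail in this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum BootstrapError {
    BadRingAddress(u64),
    BadCapSetMagic(u32),
    BadCapSetVersion(u16),
}

/// Assumed header layout; the real CapSet page format may differ.
#[derive(Clone, Copy, Debug)]
pub struct CapSetHeader {
    pub magic: u32,
    pub version: u16,
    pub entry_count: u16,
}

/// Checks the ring address first, then the CapSet header, and returns the
/// number of entries that are safe to look up afterwards.
pub fn validate_bootstrap(
    ring_addr: u64,
    header: &CapSetHeader,
) -> Result<u16, BootstrapError> {
    if ring_addr != RING_VADDR {
        return Err(BootstrapError::BadRingAddress(ring_addr));
    }
    if header.magic != CAPSET_MAGIC {
        return Err(BootstrapError::BadCapSetMagic(header.magic));
    }
    if header.version != CAPSET_VERSION {
        return Err(BootstrapError::BadCapSetVersion(header.version));
    }
    Ok(header.entry_count)
}
```

Returning the entry count only on success keeps later lookup code from touching the CapSet page before the header has been checked.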

The `Runtime` lends out at most one `RuntimeRingClient` at a time. The client
wraps the raw ring page, keeps request buffers alive until completions are
matched, handles out-of-order completions, packs copy-transfer descriptors, and
parses result-cap records. Owned runtime handles queue `CAP_OP_RELEASE` when the
last local reference is dropped; the release queue flushes when a ring client is
borrowed or dropped, or when code calls `Runtime::flush_releases()` explicitly.
Promise placeholders are currently bookkeeping only. Their planned SQE
encoding maps `AnswerId.raw()` to `pipeline_dep` and a result-cap record index
to `pipeline_field`.
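The lending and flush rules above can be modeled with a small host-side sketch. It assumes a single-threaded runtime and stands in a `Vec<u32>` for the submission queue. The type names echo the ones used in this document, but every field, the `queue_release`, `ring_client`, and `submitted` methods, and the `usize` return type of `flush_releases` are hypothetical simplifications. In particular, the sketch does not model rejected kernel release results.

```rust
use std::cell::{Cell, RefCell};

/// Sketch of a runtime that lends out at most one ring client at a time.
pub struct Runtime {
    client_live: Cell<bool>,
    release_queue: RefCell<Vec<u32>>, // cap ids waiting for CAP_OP_RELEASE
    submitted: RefCell<Vec<u32>>,     // stand-in for release SQEs on the ring
}

/// Borrowed ring client; dropping it returns the single-owner token.
pub struct RuntimeRingClient<'rt> {
    rt: &'rt Runtime,
}

impl Runtime {
    pub fn new() -> Self {
        Self {
            client_live: Cell::new(false),
            release_queue: RefCell::new(Vec::new()),
            submitted: RefCell::new(Vec::new()),
        }
    }

    /// Called when the last local reference to an owned handle is dropped.
    pub fn queue_release(&self, cap_id: u32) {
        self.release_queue.borrow_mut().push(cap_id);
    }

    /// Lends the ring client, or returns `None` if one is already live.
    /// Borrowing flushes any queued releases first.
    pub fn ring_client(&self) -> Option<RuntimeRingClient<'_>> {
        if self.client_live.replace(true) {
            return None;
        }
        self.flush_releases();
        Some(RuntimeRingClient { rt: self })
    }

    /// Moves every queued release onto the (simulated) ring and returns how
    /// many were flushed.
    pub fn flush_releases(&self) -> usize {
        let mut queue = self.release_queue.borrow_mut();
        let flushed = queue.len();
        self.submitted.borrow_mut().extend(queue.drain(..));
        flushed
    }

    /// Snapshot of the releases submitted so far.
    pub fn submitted(&self) -> Vec<u32> {
        self.submitted.borrow().clone()
    }
}

impl Drop for RuntimeRingClient<'_> {
    fn drop(&mut self) {
        // Dropping the client flushes again and frees the token.
        self.rt.flush_releases();
        self.rt.client_live.set(false);
    }
}
```

Tying the token to the lifetime of the borrowed client means the "one live client" rule is restored automatically on every exit path, including early returns.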

## Design

The runtime separates non-owning bootstrap references from owned local handles.
CapSet entries produce typed `Capability<T>` values only when the interface ID
matches the requested type, and the same manifest-order CapSet entries remain
available for diagnostic and shell surfaces that need to list or inspect what a
process was actually granted. Result-cap adoption performs the same interface
check before producing `OwnedCapability<T>`.
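A minimal sketch of the interface-ID check, assuming a marker trait with an associated constant. The `Interface` trait, the `CapSetEntry` fields, the `LookupError` type, the `lookup` function, and the interface ID values are all hypothetical. Only the rule that a typed `Capability<T>` is produced solely when the entry's interface ID matches the requested type is taken from the text above.

```rust
use std::marker::PhantomData;

/// Hypothetical marker trait: each typed interface carries its schema ID.
pub trait Interface {
    const INTERFACE_ID: u64;
}

#[derive(Debug)]
pub struct Console;
impl Interface for Console {
    const INTERFACE_ID: u64 = 0x1001; // made-up ID for the example
}

#[derive(Debug)]
pub struct Timer;
impl Interface for Timer {
    const INTERFACE_ID: u64 = 0x1002; // made-up ID for the example
}

/// Assumed shape of one manifest-order CapSet entry.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct CapSetEntry {
    pub name: &'static str,
    pub cap_id: u32,
    pub interface_id: u64,
}

/// Non-owning typed reference to a bootstrap capability.
#[derive(Debug)]
pub struct Capability<T: Interface> {
    cap_id: u32,
    _interface: PhantomData<T>,
}

impl<T: Interface> Capability<T> {
    pub fn cap_id(&self) -> u32 {
        self.cap_id
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum LookupError {
    NotFound,
    InterfaceMismatch { expected: u64, found: u64 },
}

/// Returns a typed capability only when the entry's interface ID matches `T`.
pub fn lookup<T: Interface>(
    entries: &[CapSetEntry],
    name: &str,
) -> Result<Capability<T>, LookupError> {
    let entry = entries
        .iter()
        .find(|entry| entry.name == name)
        .ok_or(LookupError::NotFound)?;
    if entry.interface_id != T::INTERFACE_ID {
        return Err(LookupError::InterfaceMismatch {
            expected: T::INTERFACE_ID,
            found: entry.interface_id,
        });
    }
    Ok(Capability { cap_id: entry.cap_id, _interface: PhantomData })
}
```

Because `lookup` only borrows the entry slice, the same entries stay available for untyped listing and inspection after a typed lookup.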

Typed clients are thin wrappers over the ring client. They encode Cap'n Proto
params, submit CALL SQEs, wait for a matching CQE, decode transport errors, and
decode kernel-produced `CapException` payloads into client errors. Endpoint
servers can use `submit_endpoint_return_exception()` to return a serialized
`CapException` to the original caller over the same endpoint RETURN path.

The handwritten `TimerClient` exposes monotonic `now` reads and sleep calls
over the same completion-matching path.

The handwritten `VirtualMemoryClient` exposes map, reserve, commit, decommit,
unmap, and protect calls for runtime heap/arena allocation over anonymous user
pages. It has both the ordinary allocation-backed async methods and synchronous
caller-buffer methods for allocator growth paths that cannot allocate while
asking the kernel for more memory. This matches the reserve/commit/decommit
surface specified in
[Go VirtualMemory Contract](../backlog/go-virtual-memory-contract.md).

The handwritten `ThreadControlClient` exposes current-process FS-base reads and
updates for runtimes that need to swap a language-managed TLS base after process
startup.
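The completion-matching path these clients share can be illustrated with a small table keyed by `user_data`. This is a sketch under assumptions, not the structure in `capos-rt/src/ring.rs`: the `Cqe` fields, the `PendingCalls` type, and its methods are hypothetical. It shows two properties described in this document: request buffers stay alive until their completion is matched, and completions may arrive in any order.

```rust
use std::collections::HashMap;

/// Assumed minimal completion record.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Cqe {
    pub user_data: u64,
    pub result: i32,
}

/// Tracks in-flight calls and parks completions until the caller asks.
pub struct PendingCalls {
    next_user_data: u64,
    in_flight: HashMap<u64, Vec<u8>>, // request buffers kept alive per call
    completed: HashMap<u64, Cqe>,     // completions that arrived early
}

impl PendingCalls {
    pub fn new() -> Self {
        Self {
            next_user_data: 1,
            in_flight: HashMap::new(),
            completed: HashMap::new(),
        }
    }

    /// Registers a call and takes ownership of its encoded params.
    pub fn submit(&mut self, params: Vec<u8>) -> u64 {
        let user_data = self.next_user_data;
        self.next_user_data += 1;
        self.in_flight.insert(user_data, params);
        user_data
    }

    /// Records a completion; returns `false` for an unknown `user_data`.
    pub fn on_cqe(&mut self, cqe: Cqe) -> bool {
        if !self.in_flight.contains_key(&cqe.user_data) {
            return false;
        }
        self.completed.insert(cqe.user_data, cqe);
        true
    }

    /// Hands back the completion and the request buffer once matched.
    pub fn take(&mut self, user_data: u64) -> Option<(Cqe, Vec<u8>)> {
        let cqe = self.completed.remove(&user_data)?;
        let params = self.in_flight.remove(&user_data)?;
        Some((cqe, params))
    }

    /// Number of calls whose buffers are still held.
    pub fn in_flight_len(&self) -> usize {
        self.in_flight.len()
    }
}
```

A typed client built on such a table would only add interface-specific encoding and decoding around `submit` and `take`.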

The 7.1.0 threading contract keeps one process ring and the runtime's
single-owner ring-client invariant for the first in-process threading
implementation. Future multi-threaded runtimes must serialize blocking ring
entry through `capos-rt` until a runtime reactor or Ring v2 lands. The reactor
bridge uses one runtime-owned CQ drainer plus ParkSpace-backed wait records;
the full-SMP kernel target is per-thread rings, where `cap_enter` waits on the
current thread's CQ. After 7.2, the existing `ThreadControlClient` methods apply
to the current thread's FS base rather than to a process-wide saved FS base.
`ThreadControl.exitThread` and the raw `exit(code)` syscall both terminate the
current thread; the process exits when its last live thread exits.
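The last-thread exit rule can be pictured as a live-thread counter. The sketch below is only an illustration of that rule; the `LiveThreads` type and its methods are hypothetical and do not describe how the kernel actually tracks threads.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts live threads in one process; starts at 1 for the initial thread.
pub struct LiveThreads(AtomicUsize);

impl LiveThreads {
    pub fn new() -> Self {
        Self(AtomicUsize::new(1))
    }

    /// Called when a new in-process thread becomes live.
    pub fn thread_started(&self) {
        self.0.fetch_add(1, Ordering::AcqRel);
    }

    /// Called when the current thread terminates. Returns `true` only for
    /// the last live thread, which is the point where the process exits.
    pub fn thread_exited(&self) -> bool {
        self.0.fetch_sub(1, Ordering::AcqRel) == 1
    }
}
```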

The 7.2.3 park slice adds a process-local ParkSpace marker type and compact
`CAP_OP_PARK` / `CAP_OP_UNPARK` operations. `capos-rt` should expose
those operations as runtime synchronization primitives in a later slice; the
current `thread-lifecycle` proof uses raw SQEs so the runtime does not
prematurely claim the park `user_data` namespace. A blocking park wait is not
an ordinary `RuntimeRingClient` call: the wait SQE must be owned by the
current thread, and the runtime must reserve park `user_data` values,
write the wait SQE under its ring-submission lock, release that lock before
`cap_enter`, and demultiplex park CQEs into runtime-owned wait slots so that a
sibling thread can still submit the wake. The temporary single-thread park
fallback remains only as the pre-thread runtime checkpoint proof.
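The ordering constraints on a blocking park wait can be sketched with host threads. Everything here is an assumption made for illustration: `ParkRuntime`, its methods, the `Sqe` enum, and the `PARK_USER_DATA_BASE` value are hypothetical, a `Mutex<Vec<Sqe>>` stands in for the ring plus its submission lock, a `Condvar` stands in for blocking in `cap_enter`, and the wake completes synchronously instead of going through the kernel.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Condvar, Mutex};

/// Hypothetical reserved namespace for park `user_data` values.
pub const PARK_USER_DATA_BASE: u64 = 1 << 63;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Sqe {
    ParkWait { user_data: u64 },
    Unpark { wakes: u64 },
}

pub struct ParkRuntime {
    ring: Mutex<Vec<Sqe>>,                 // ring-submission lock + SQ log
    wait_slots: Mutex<HashMap<u64, bool>>, // user_data -> wake delivered
    slot_changed: Condvar,
    next_park_id: AtomicU64,
}

impl ParkRuntime {
    pub fn new() -> Self {
        Self {
            ring: Mutex::new(Vec::new()),
            wait_slots: Mutex::new(HashMap::new()),
            slot_changed: Condvar::new(),
            next_park_id: AtomicU64::new(0),
        }
    }

    /// Reserves a park `user_data`, registers its wait slot, and writes the
    /// wait SQE under the submission lock. The lock is released on return.
    pub fn submit_park_wait(&self) -> u64 {
        let id = self.next_park_id.fetch_add(1, Ordering::Relaxed);
        let user_data = PARK_USER_DATA_BASE | id;
        self.wait_slots.lock().unwrap().insert(user_data, false);
        let mut ring = self.ring.lock().unwrap();
        ring.push(Sqe::ParkWait { user_data });
        user_data
    }

    /// Stand-in for blocking in `cap_enter`: waits on the slot only, never
    /// on the ring lock, so siblings can keep submitting.
    pub fn wait_parked(&self, user_data: u64) {
        let mut slots = self.wait_slots.lock().unwrap();
        while slots.get(&user_data) == Some(&false) {
            slots = self.slot_changed.wait(slots).unwrap();
        }
        slots.remove(&user_data);
    }

    /// Sibling thread submits the wake, then the simulated park CQE is
    /// demultiplexed into the matching wait slot.
    pub fn unpark(&self, wakes: u64) {
        self.ring.lock().unwrap().push(Sqe::Unpark { wakes });
        self.demux_park_cqe(wakes);
    }

    fn demux_park_cqe(&self, user_data: u64) {
        let mut slots = self.wait_slots.lock().unwrap();
        if let Some(done) = slots.get_mut(&user_data) {
            *done = true;
        }
        drop(slots);
        self.slot_changed.notify_all();
    }

    /// Snapshot of the SQEs written so far.
    pub fn submitted(&self) -> Vec<Sqe> {
        self.ring.lock().unwrap().to_vec()
    }
}
```

Because the waiter blocks on its slot rather than on the ring lock, `unpark` can always acquire the lock. A wake that arrives before `wait_parked` is not lost, since the slot already records it.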

Future generated clients should preserve this split: transport lifetime and
completion matching belong in the runtime, while interface-specific encoding
belongs in generated or handwritten client wrappers.

## Invariants

- `ring_addr` must equal `RING_VADDR`; runtime bootstrap rejects any other
  address.
- The CapSet header magic/version must validate before lookup.
- CapSet handles are non-owning unless explicitly adopted.
- Only one runtime ring client may be live at a time for a process.
- Until Ring v2, multithreaded generic client waits must flow through a runtime
  reactor/demux path rather than letting multiple threads consume the process
  CQ directly.
- Park wait must not hold the live runtime ring client while the kernel parks
  the current thread.
- Request params and result buffers must outlive their matching CQE.
- A result cap can be consumed only once and only with the expected interface
  ID.
- Promise placeholders must map to sideband result-cap record indexes, not
  schema field paths.
- Dropping the final owned handle queues exactly one local `CAP_OP_RELEASE`;
  `Runtime::flush_releases()` forces queued releases and reports rejected
  kernel release results.
- Release flushing treats stale or already-removed caps as non-fatal cleanup.
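The "exactly one release per final drop" invariant falls out naturally from reference counting. The sketch below models it with `Rc` on the host; `OwnedHandle`, `HandleInner`, `ReleaseQueue`, and `adopt` are hypothetical names, and the real reference counting in `capos-rt/src/lib.rs` may be structured differently.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Stand-in for the runtime's queue of pending CAP_OP_RELEASE operations.
pub type ReleaseQueue = Rc<RefCell<Vec<u32>>>;

/// Shared state behind every clone of one owned handle.
struct HandleInner {
    cap_id: u32,
    releases: ReleaseQueue,
}

impl Drop for HandleInner {
    fn drop(&mut self) {
        // Runs once, when the last `OwnedHandle` clone goes away.
        self.releases.borrow_mut().push(self.cap_id);
    }
}

/// Owned local handle; clones share one reference count.
#[derive(Clone)]
pub struct OwnedHandle {
    inner: Rc<HandleInner>,
}

impl OwnedHandle {
    /// Takes ownership of `cap_id` and remembers where to queue its release.
    pub fn adopt(cap_id: u32, releases: &ReleaseQueue) -> Self {
        Self {
            inner: Rc::new(HandleInner {
                cap_id,
                releases: Rc::clone(releases),
            }),
        }
    }

    pub fn cap_id(&self) -> u32 {
        self.inner.cap_id
    }
}
```

Placing the queueing logic in the inner type's `Drop` means no clone has to know whether it is the last one.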

## Code Map

- `capos-rt/src/entry.rs` - `_start`, `Runtime`, bootstrap validation,
  single-owner ring token, release queue flushing.
- `capos-rt/src/alloc.rs` - fixed userspace heap initialization.
- `capos-rt/src/capset.rs` - typed CapSet lookup and manifest-order iteration
  wrappers.
- `capos-rt/src/ring.rs` - ring client, pending calls, completion matching,
  copy-transfer packing, result-cap parsing.
- `capos-rt/src/client.rs` - Console, TerminalSession, BootPackage,
  ProcessSpawner, ProcessHandle, VirtualMemory, Timer, ThreadControl,
  ThreadSpawner, and ThreadHandle clients, and exception decoding.
- `capos-rt/src/lib.rs` - typed capability marker types and owned handle
  reference counting.
- `capos-rt/src/panic.rs` - emergency Console output path.
- `capos-rt/src/syscall.rs` - raw syscall instructions and public syscall
  wrappers, including the hostile smoke probe for the removed ambient write
  syscall.
- `targets/x86_64-unknown-capos.json` - userspace target specification.
- `tools/check-userspace-runtime-surface.sh` - source check that keeps runtime
  primitives owned by `capos-rt`.
- `init/src/main.rs`, `capos-rt/src/bin/smoke.rs`, and
  `shell/src/main.rs` - current runtime users.

## Validation

- `make capos-rt-check` builds the runtime smoke binary against
  `targets/x86_64-unknown-capos.json`, matching the booted userspace target.
- `make init-capos-build`, `make demos-capos-build`, `make shell-capos-build`,
  and `make capos-rt-capos-build` expose focused custom-target build wrappers
  for the current userspace crates and runtime smoke binary.
- `tools/check-userspace-runtime-surface.sh` verifies `init`, `demos`, and
  `shell` do not define `_start`, panic handlers, global allocators, raw
  syscall instructions, or entry-point macros outside `capos-rt`.
- `make run-smoke` validates runtime entry, typed Console calls, exception decoding,
  owned handle release, result-cap parsing through IPC, and clean process exit.
- `make run-spawn` validates `ProcessSpawnerClient`, `ProcessHandleClient`,
  `VirtualMemoryClient`, `TimerClient`, `ThreadControlClient`,
  `ThreadSpawnerClient`, `ThreadHandleClient`, result-cap adoption, and release
  behavior under init spawning. The `single-thread-runtime` child proves the
  first runtime-shaped checkpoint over caller-buffer VirtualMemory calls and
  Timer; the `thread-lifecycle` child proves in-process create, self-join
  rejection, join, detach, last-thread `exitThread`, and private ParkSpace
  wait/wake correctness.
- `make run-shell` validates CapSet iteration, capability inspection, typed
  application-error decoding, guest session metadata, exact-grant spawning,
  ProcessHandle waits, and stale-handle release behavior in the focused
  shell-launch proof manifest.
- `make run-terminal` validates `TerminalSessionClient` writes, bounded line
  reads, hidden-echo input handling, and structured cancellation in the
  focused terminal proof manifest.
- `cd capos-rt && cargo test --lib --target x86_64-unknown-linux-gnu` covers
  host-testable runtime invariants when run explicitly.

## Open Work

- Add generated client bindings after the schema surface stabilizes.
- Implement promise/answer transport semantics beyond current placeholders.
- Add typed ParkSpace clients with runtime-owned `user_data` demultiplexing.
- Define release behavior for queued handles when a process exits before the
  release queue flushes.
