Proposal: POSIX Compatibility Adapter

How capOS should host POSIX-shaped C software without recreating the ambient authority that makes POSIX hard to confine, and which two ports validate the adapter for the first time.

Problem

capOS is not POSIX and is not trying to become POSIX. But useful software – DNS resolvers, line-editing libraries, shells, archivers, compilers, network clients – assumes a POSIX surface. Rewriting each of these in capability-native Rust would forfeit decades of debugging, security review, and performance work for no isolation gain: a POSIX program whose only authority is a typed capability set is already as confined as an equivalent native one.

The risk is repeating what POSIX historically got wrong. A translation layer that synthesises ambient authority (a global /, an inherited credential table, a kernel-managed file descriptor map) rebuilds exactly the property capOS is trying to leave behind. A useful adapter must do the opposite: every POSIX call must either be backed by a typed capability the calling process already holds, or fail closed with a documented errno.

Two upstream programs are the natural first validators of that adapter:

  • A POSIX shell exercises the broadest surface (process, pipe, file, env, signal stubs, stdio).
  • A DNS resolver exercises the smallest network surface (UDP socket, one-shot poll-equivalent, time, log).

Both are already small, mature, and BSD/MIT-licensed. Picking the smallest representative of each category makes the adapter’s first job a real port, not a synthetic test.

Scope

In scope:

  • A two-layer C substrate: libcapos (thin Rust staticlib, capability ring + CapSet + raw syscalls + heap, C ABI) and libcapos-posix (POSIX shape on top: fd table, errno, path resolution, posix_spawn shim, signal stubs, pthread mapping).
  • A first POSIX shell port that builds against libcapos-posix with no hidden ambient authority.
  • A first DNS resolver port that builds against libcapos-posix with no hidden ambient authority.
  • Phase decomposition (P1.1, P1.2, P1.3) that defers the adapter’s biggest dependencies (Namespace + File caps for the shell file path; UDP cap for the resolver) into clearly-named gating phases.
  • Validation through QEMU smokes that prove granted and ungranted paths.

Out of scope for the first implementation:

  • Binary compatibility with Linux ELFs. Both ports are sources-on-disk recompiled against libcapos-posix.
  • Full POSIX compliance. The adapter ships exactly the surface that dash and dns.c exercise, plus any additions that come at no extra cost.
  • Real fork() (parent state inheritance, COW, sibling address-space surgery before exec). Only fork() followed promptly by execve() is supported, via a posix_spawn-shaped shim.
  • Real signal delivery. signal()/sigaction() accept the call, store the handler, never invoke it. kill(2) requires a future ProcessHandle cap.
  • Job control, process groups, sessions, controlling terminals.
  • musl, glibc, or any other host libc. The substrate is Rust-authored and exposes a C ABI; it is not a libc port.
  • Hosted C++. ABI decisions for C++ remain tracked in docs/proposals/userspace-binaries-proposal.md.

Current Manual Pages

  • Programming Languages summarizes POSIX adapter status relative to Rust, C/C++, Python, Go, Lua, and WASI tracks.
  • Userspace Binaries Part 4 sketches the POSIX adapter at a higher level. This proposal supersedes that sketch with the full design surface; the userspace-binaries proposal continues to own the broader native-binary, language, and adapter roadmap.
  • Userspace Runtime documents the implemented capos-rt surface that libcapos mirrors for C consumers.
  • Networking defines NetworkManager, TcpListener, and TcpSocket and explicitly defers UdpSocket until DNS / userspace-network work needs it. The DNS resolver port in this proposal defines the UDP cap surface; the TCP cap surface is reused unchanged.
  • Storage and Naming defines the Namespace, Directory, File, and Store cap shape; these gate the shell port’s filesystem surface (Phase 2/3 of that proposal).
  • Service Architecture frames the future Resolver cap as the long-term consumer of the resolver process built in this track.
  • Shell covers the native capos-shell. The POSIX shell port (dash) exists for porting validation, not as a replacement for the native shell.
  • WASI Host Adapter is the parallel untrusted-portable execution path; both proposals share fd-table and per-import authority insight, but target different substrates.

Research Grounding

Relevant research and external references:

  • POSIX shell candidates surveyed: dash (Debian Almquist Shell, ~13 kSLOC, BSD; the canonical small POSIX-strict shell); busybox ash; OpenBSD ksh (oksh); toybox toysh. Source repositories cited inline in the candidate comparison table.
  • DNS resolver candidates surveyed: dns.c by William Ahern (single-file MIT, ~10 kSLOC, no dependencies); c-ares; GNU adns; udns; SPCDNS; musl’s embedded res_query; trust-dns-resolver. Source repositories cited inline in the candidate comparison table.
  • libcapos prior art: this proposal builds on the libcapos shape sketched in Userspace Binaries “Future: C via libcapos” / “Future Phase: libcapos for C”. The C substrate is designed as a Rust staticlib with a C ABI rather than musl, redox relibc, or a hand-rolled libc. Fuchsia’s fdio + musl pattern and Redox’s relibc pattern are the comparable points; capOS deliberately picks neither.
  • POSIX surface translation: Cygwin’s fork() emulation is the closest prior art for fork semantics on a substrate that has no native fork. Cygwin emulates full fork() on Windows by copying the parent’s state into a new process. The capOS shim is narrower: it supports only the useful fork-then-exec case, which it recognises from the call pattern.

In-tree research grounding:

  • Genode – per-session typed service interfaces and resource accounting are the closest precedent for routing every POSIX wrapper through a typed cap rather than through an ambient kernel syscall table. POSIX adapter wrappers should follow the same pattern at the library boundary instead of the kernel boundary.
  • OS Error Handling – cross-OS comparison of error-model surfaces. Informs the bidirectional mapping between CapError / CapException and POSIX errno (Open Question §4) and the decision to keep one shared mapping table at the C boundary rather than per-wrapper bespoke mappings.
  • LLVM Target – target triple, calling convention, and bare-metal toolchain options for capOS C consumers. Informs Open Question §11 on the linker / toolchain choice (clang --target=x86_64-unknown-none-elf -nostdlib -static).

This proposal also lifts the capability-mapping shape and the “every translation has authority backing” property from the WASI host adapter proposal, and the libcapos staticlib shape from the userspace-binaries proposal Part 2. It deliberately does not adopt the musl + __syscall hook pattern noted in the userspace-binaries proposal “musl as a Base (Optional, Later)” section, because the layered Rust staticlib shape is preferred over a libc port for the v0 surface.

External:

  • dash – Debian Almquist Shell, ~13 kSLOC, Debian’s /bin/sh since Squeeze (2011).
  • busybox ash – alternative Almquist port, embedded.
  • oksh – portable OpenBSD ksh, public domain, larger surface.
  • toybox toysh – 0BSD, currently incomplete.
  • c-ares – modern async DNS resolver, MIT, larger.
  • dns.c – single-file non-blocking DNS, MIT, no deps.
  • GNU adns – async DNS resolver, GPL-2.0+.
  • musl resolver – embedded in musl libc; not available without linking musl.
  • udns – small async stub-only resolver, LGPL-2.1.

Design Principles

  1. POSIX is not a kernel feature. The kernel sees ordinary userspace processes with a CapSet and a capability ring. libcapos and libcapos-posix are static libraries linked into those processes.
  2. Two layers, one C ABI per layer. libcapos is the C-ABI mirror of capos-rt: capability ring, CapSet, raw syscalls, heap. It has no errno, no fd table, no open/read/write. libcapos-posix builds the POSIX shape on top. Programs that do not need POSIX semantics may link only libcapos.
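  3. Authority is per-process, granted at spawn. Every fd a POSIX program sees is backed by a capability that its parent granted at spawn time, or that was obtained through such a grant (for example, a socket created via a granted NetworkManager). libcapos-posix projects that capability onto an fd. There is no ambient /, no inherited credential table, no global signal source.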
  4. Schema-first, not POSIX-first, at the boundary. Each POSIX wrapper is backed by a typed capability call with a documented errno mapping. POSIX-shaped integer fds and POSIX-shaped errno are an ABI requirement of the C substrate, not a capability-model concession.
  5. Fail closed. Any unimplemented POSIX call returns -1 (or NULL) and sets errno to ENOSYS. Any cap lookup that fails sets the documented errno. Programs cannot probe absent caps for ambient behaviour.
  6. No fork without exec. Only fork() followed by execve() is supported. The shim turns the pair into posix_spawn(). Bare fork() used to clone state in-process fails on the next non-trivial syscall.
  7. No real signals. Handlers are accepted and stored, never delivered. kill(2) requires a future ProcessHandle cap and even then is limited to SIGKILL. Programs that depend on SIGCHLD job control are out of scope.
  8. The C substrate is Rust. libcapos and libcapos-posix are Rust crates with crate-type = ["staticlib"], and all exported symbols are #[no_mangle] extern "C". It is neither musl nor a hand-rolled libc.
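Under principle 8, every C-visible symbol is an ordinary Rust function that uses the C calling convention. The heap wrappers have one detail to handle. C's free() receives no size, but Rust's allocator needs the original Layout to deallocate. The minimal sketch below rests on two assumptions that are not the crate's actual layout: std::alloc serves as a hosted stand-in for the capos-rt allocator, and a 16-byte size header sits in front of each block.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::ffi::c_void;

// Header in front of every block. 16 bytes keeps the pointer handed to C
// aligned like max_align_t on x86_64 (an assumption of this sketch).
const HDR: usize = 16;
const ALIGN: usize = 16;

/// C-ABI malloc over a Rust allocator (std::alloc stands in for the
/// capos-rt heap here). Returns NULL on overflow or exhaustion.
#[no_mangle]
pub extern "C" fn capos_malloc(size: usize) -> *mut c_void {
    let total = match size.checked_add(HDR) {
        Some(t) => t,
        None => return std::ptr::null_mut(),
    };
    let layout = match Layout::from_size_align(total, ALIGN) {
        Ok(l) => l,
        Err(_) => return std::ptr::null_mut(),
    };
    unsafe {
        let base = alloc(layout);
        if base.is_null() {
            return std::ptr::null_mut();
        }
        // Remember the requested size so capos_free can rebuild the Layout.
        (base as *mut usize).write(size);
        base.add(HDR) as *mut c_void
    }
}

/// C-ABI free: recovers the header and deallocates. free(NULL) is a no-op.
#[no_mangle]
pub extern "C" fn capos_free(ptr: *mut c_void) {
    if ptr.is_null() {
        return;
    }
    unsafe {
        let base = (ptr as *mut u8).sub(HDR);
        let size = (base as *const usize).read();
        let layout = Layout::from_size_align_unchecked(size + HDR, ALIGN);
        dealloc(base, layout);
    }
}
```

The header costs 16 bytes per allocation. In exchange, capos_free needs no side table, which suits a fixed heap with no other bookkeeping.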

Architecture

flowchart TD
    Shell["POSIX shell binary<br/>(e.g. dash)"]
    Resolver["DNS resolver binary<br/>(e.g. dns.c)"]
    Posix["libcapos-posix<br/>(POSIX adapter, Rust staticlib, C ABI)"]
    PosixDetail["fd table per process<br/>path resolver over Namespace + Store<br/>errno mapping (TLS cell)<br/>posix_spawn over ProcessSpawner<br/>signal stubs<br/>pthread over ThreadSpawner"]
    Posix --> PosixDetail
    Capos["libcapos<br/>(thin Rust staticlib, C ABI)"]
    CaposDetail["cap_call / capset_get / capset_iter<br/>sys_exit / sys_cap_enter<br/>heap (malloc/free over capos-rt allocator)<br/>typed wrappers for Console / Terminal / etc."]
    Capos --> CaposDetail
    Rt["capos-rt<br/>(no_std + alloc Rust)"]
    Ring["capability ring"]
    Kernel["kernel CapObject dispatch"]
    Services["userspace services"]

    Shell -->|"open/read/write/exec/..."| Posix
    Resolver -->|"socket/sendto/recvfrom"| Posix
    Posix -->|"extern C"| Capos
    Capos -->|"Rust FFI re-export"| Rt
    Rt --> Ring
    Ring --> Kernel
    Ring --> Services

libcapos is the C-ABI projection of capos-rt. libcapos-posix is the POSIX projection on top. Every POSIX call ultimately resolves to either a capability invocation through the ring or a synthetic answer (errno, ENOSYS) computed without authority.

libcapos: C-Facing Substrate

Headers expected to ship under include/capos/:

// capos.h -- capability primitives only
#include <stddef.h>   // size_t
#include <stdint.h>   // uint16_t, uint32_t, uint64_t
typedef struct cap_ring cap_ring_t;
typedef uint32_t        cap_id_t;
typedef uint64_t        iface_id_t;

cap_ring_t *capos_ring(void);                     // process ring handle
int  cap_call(cap_ring_t *ring,
              cap_id_t cap, uint16_t method,
              const void *params, size_t plen,
              void *result, size_t rlen,
              size_t *out_len);
int  capset_get(const char *name,
                cap_id_t *out_cap, iface_id_t *out_iface);
size_t capset_iter(void (*cb)(const char*, cap_id_t, iface_id_t,
                              void*), void *ud);
_Noreturn void sys_exit(int code);
uint32_t       sys_cap_enter(uint32_t min_complete, uint64_t timeout_ns);

// Heap (backed by capos-rt fixed heap; grow-on-demand later if needed)
void *capos_malloc(size_t);
void  capos_free(void*);
void *capos_calloc(size_t, size_t);
void *capos_realloc(void*, size_t);

There is no errno here, no open/read/write. Those live one layer up. libcapos is the C-ABI mirror of capos-rt: startup, ring, CapSet, raw syscalls, heap.

Build artifact: target/.../libcapos.a plus headers. Naming for the C library is intentionally just libcapos, mirroring how the Rust runtime crate is capos-rt. The C library name libcapos is distinct from any Rust service framework that may carry a similar name; this proposal owns the C-substrate name and treats Rust-framework naming as out of scope.

libcapos-posix: POSIX Surface

Headers under include/capos/posix/: unistd.h, fcntl.h, errno.h, sys/socket.h, netdb.h, sys/stat.h, dirent.h, string.h, stdlib.h (subset), sys/types.h, pthread.h (subset), signal.h (stub).

Implementation language: Rust, same crate-type pattern as libcapos, but linked separately so a binary that does not need POSIX can omit it.

Errno bridge: a per-thread errno cell stored in a TLS slot owned by libcapos-posix. Every wrapper that maps a Rust CapError to a POSIX errno value writes to it. See “errno convention” below.
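Below is a minimal sketch of the errno cell. It rests on two assumptions: hosted thread_local! takes the place of the libcapos-posix TLS slot, and the crate exports a hypothetical accessor, capos_errno_location(). With that accessor, the C header would define errno as (*capos_errno_location()), which is the same shape glibc and musl use with __errno_location().

```rust
use std::cell::Cell;

thread_local! {
    // One errno value per thread, starting at 0 as in C.
    static ERRNO: Cell<i32> = Cell::new(0);
}

/// Hypothetical accessor exported to C; the header would wrap it as
/// `#define errno (*capos_errno_location())`.
#[no_mangle]
pub extern "C" fn capos_errno_location() -> *mut i32 {
    // Valid for as long as the calling thread is alive.
    ERRNO.with(|e| e.as_ptr())
}

/// Called by Rust-side wrappers when a capability call fails.
pub fn set_errno(value: i32) {
    ERRNO.with(|e| e.set(value));
}

/// Read the calling thread's errno.
pub fn errno() -> i32 {
    ERRNO.with(|e| e.get())
}
```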

File descriptor table

Per-process userspace state inside libcapos-posix. Not a kernel object – neither libcapos nor the kernel knows anything about fds.

// libcapos-posix/src/fd.rs (sketch)
use alloc::collections::BTreeMap;
use core::sync::atomic::AtomicI32;

struct FdEntry {
    backing: FdBacking,       // Console / Stream / Listener / File / Dir
    flags:   i32,             // O_NONBLOCK, FD_CLOEXEC, ...
    cursor:  u64,             // for seekable backings
}

enum FdBacking {
    Stdin,                    // Console / TerminalSession (read side)
    Stdout,                   // Console (write side)
    Stderr,                   // Console (write side)
    File   { file: Cap<File>, dirty: bool },
    Dir    { dir:  Cap<Directory>, iter: usize },
    Tcp    { sock: Cap<TcpSocket> },
    Udp    { sock: Cap<UdpSocket> },
    Listener { l: Cap<TcpListener> },
}

// `Mutex` is a no_std lock (for example spin::Mutex); the exact type is an
// assumption. BTreeMap::new() is const, so no lazy initialisation is needed.
static FD_TABLE: Mutex<BTreeMap<i32, FdEntry>> = Mutex::new(BTreeMap::new());
static NEXT_FD:  AtomicI32 = AtomicI32::new(3);

dup/dup2/close operate on this table. dup increments a refcount on the underlying cap. close releases the cap when the last fd referring to it is closed. Cap drop runs through capos-rt owned-handle release. The fd table is a strict per-process userspace structure; it is not shared with the kernel and is never serialised on the wire.
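The dup/close rules can be sketched with one shared handle per open backing. dup clones the handle, close drops one clone, and the capability would be released when the last clone is dropped. The sketch makes three assumptions: Backing is a hypothetical stand-in for FdBacking, Arc's strong count plays the role of the refcount, and errno values are Linux's. It also differs from the monotonic NEXT_FD counter in the sketch above: it hands out the lowest free descriptor. POSIX requires this of open() and dup(), and classic redirection idioms such as close(0) followed by open() depend on it.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

const EBADF: i32 = 9; // Linux value, for illustration only

/// Hypothetical stand-in for FdBacking. Dropping the last Arc is where
/// the real table would release the capability.
#[derive(Debug)]
pub struct Backing {
    pub name: &'static str,
}

pub struct FdTable {
    fds: BTreeMap<i32, Arc<Backing>>,
}

impl FdTable {
    pub fn new() -> Self {
        FdTable { fds: BTreeMap::new() }
    }

    /// Lowest descriptor not currently in use.
    fn lowest_free(&self) -> i32 {
        let mut fd = 0;
        while self.fds.contains_key(&fd) {
            fd += 1;
        }
        fd
    }

    pub fn install(&mut self, b: Backing) -> i32 {
        let fd = self.lowest_free();
        self.fds.insert(fd, Arc::new(b));
        fd
    }

    pub fn dup(&mut self, old: i32) -> Result<i32, i32> {
        let b = self.fds.get(&old).ok_or(EBADF)?.clone();
        let fd = self.lowest_free();
        self.fds.insert(fd, b);
        Ok(fd)
    }

    pub fn dup2(&mut self, old: i32, new: i32) -> Result<i32, i32> {
        let b = self.fds.get(&old).ok_or(EBADF)?.clone();
        // Overwriting `new` drops its previous handle: an implicit close.
        self.fds.insert(new, b);
        Ok(new)
    }

    pub fn close(&mut self, fd: i32) -> Result<(), i32> {
        self.fds.remove(&fd).map(|_| ()).ok_or(EBADF)
    }

    /// Number of fds currently sharing the backing behind `fd`.
    pub fn refs(&self, fd: i32) -> usize {
        self.fds.get(&fd).map_or(0, |b| Arc::strong_count(b))
    }
}
```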

Standard fds wired at _start:

  • fd 0: stdin cap from CapSet (TerminalSession, Console, or future StdinReader-shaped cap, whichever is granted).
  • fd 1: stdout Console cap.
  • fd 2: stderr Console cap (or distinct Log cap if granted).

Process model: fork-for-exec only

capOS process creation is ProcessSpawner.spawn(name, binaryName, grants) (kernel/src/cap/process_spawner.rs). There is no fork(), no exec()-in-place.

Decision matrix (working answers; the policy choice is Open Question §6 and is not settled until that question is confirmed):

| Option | What it provides | Cost | Working answer |
| --- | --- | --- | --- |
| Emulate fork() as posix_spawn with inherited cap-set, recording inter-call dup2/close as posix_spawn file actions | Existing fork+exec and fork+dup2+exec pipeline patterns work with one patch site | Daemonisation and arbitrary COW state inheritance between fork and exec still break | Recommended primary for the shell, with documented “fork-for-exec only” semantics. Whether the shim records inter-call file actions or requires the port to call posix_spawn with explicit file actions is Open Question §6. |
| Return ENOSYS for any fork() | Honest | Every POSIX program that uses fork must be patched | Recommended safety net when fork-for-exec is misused |
| Process-shadow: a “POSIX process” wraps a capOS process | General | Large kernel + runtime change; doubles process accounting | Recommended reject for v0; revisit only if a real POSIX program needs it |

Working answer: fork-for-exec, with hard-fail as the safety net (subject to Open Question §6 confirmation before P1.3 begins). Two libcapos-posix shim variants are on the table; §6 selects between them:

  • Variant A – recording shim. libcapos-posix exposes fork() and execve() as a coupled shim that:
    1. fork() records “next exec is the real spawn” in TLS, returns 0 in the “child” pseudo-context (still in parent address space).
    2. dup2() / close() calls between fork() and execve() are recorded as posix_spawn file actions on the pending spawn rather than mutating the parent’s fd table.
    3. execve(path, argv, envp) consumes the recorded intent, calls ProcessSpawner.spawn() with attenuated grants and the recorded file actions, returns the “child” PID to the parent path.
    4. If the pseudo-child makes any call outside the recorded-action allowlist (e.g. setsid) after fork() and before execve(), that call returns -1 / ENOSYS.
  • Variant B – patched-port shim. libcapos-posix exposes only posix_spawn() with explicit file actions, plus stub fork() / execve() that return -1 / ENOSYS. Each port (dash and successors) is patched to translate its fork+dup2+exec sequence into a single posix_spawn() call with the equivalent file actions.
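Below is a sketch of the Variant A state machine. It uses a hypothetical Spawner trait in place of the ProcessSpawner client, simplified file actions, and Linux errno values. It illustrates one property: between fork() and execve(), the parent's fd table is never touched. dup2() and close() are recorded, and any other call fails closed.

```rust
const ENOSYS: i32 = 38; // Linux value, for illustration only

#[derive(Debug, Clone, PartialEq)]
pub enum FileAction {
    Dup2 { from: i32, to: i32 },
    Close { fd: i32 },
}

/// Hypothetical stand-in for a ProcessSpawner client.
pub trait Spawner {
    fn spawn(&mut self, path: &str, argv: &[&str], actions: &[FileAction]) -> Result<i32, i32>;
}

/// Trivial Spawner that records calls, for illustration and testing.
#[derive(Default)]
pub struct RecordingSpawner {
    pub calls: Vec<(String, Vec<FileAction>)>,
}

impl Spawner for RecordingSpawner {
    fn spawn(&mut self, path: &str, _argv: &[&str], actions: &[FileAction]) -> Result<i32, i32> {
        self.calls.push((path.to_string(), actions.to_vec()));
        Ok(42) // fake child PID
    }
}

#[derive(Default)]
pub struct ForkShim {
    // Some(actions) while a fork() is waiting for its execve().
    pending: Option<Vec<FileAction>>,
}

impl ForkShim {
    /// fork(): remember that the next execve() is the real spawn and
    /// return 0, so the caller continues on its "child" path.
    pub fn fork(&mut self) -> i32 {
        self.pending = Some(Vec::new());
        0
    }

    /// dup2() during a pending spawn is recorded, not applied.
    /// None means no spawn is pending and the normal dup2 path applies.
    pub fn dup2(&mut self, from: i32, to: i32) -> Option<i32> {
        self.pending.as_mut().map(|acts| {
            acts.push(FileAction::Dup2 { from, to });
            to
        })
    }

    /// close() during a pending spawn is recorded, not applied.
    pub fn close(&mut self, fd: i32) -> Option<i32> {
        self.pending.as_mut().map(|acts| {
            acts.push(FileAction::Close { fd });
            0
        })
    }

    /// Any call outside the allowlist fails closed in the pseudo-child.
    pub fn other_call(&self) -> Result<(), i32> {
        if self.pending.is_some() { Err(ENOSYS) } else { Ok(()) }
    }

    /// execve(): consume the recorded intent and spawn. The returned pid
    /// goes back to the parent path.
    pub fn execve<S: Spawner>(&mut self, sp: &mut S, path: &str, argv: &[&str]) -> Result<i32, i32> {
        // Without a pending fork this would be exec-in-place, which capOS lacks.
        let actions = self.pending.take().ok_or(ENOSYS)?;
        sp.spawn(path, argv, &actions)
    }
}
```

Variant B removes this state entirely. The patched port builds the same FileAction list itself and passes it to posix_spawn() in one call.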

posix_spawn() is the preferred primitive in either variant and gets a direct mapping to ProcessSpawner.spawn(). The choice between Variant A and Variant B is Open Question §6.

Signals

Stubbed. capOS has no signal mechanism today, and ambient asynchronous interrupts are at odds with the cap model.

  • signal() / sigaction() accept the call, store the handler in a per-process table, never invoke it. Return success.
  • kill(pid, sig) returns -1 / EPERM unless the caller has a ProcessHandle cap for the target – and even then the only signal honoured is SIGKILL, which maps to a future ProcessHandle.kill() (not implemented yet, returns ENOSYS today).
  • pause() / sigsuspend() / sigwait() block forever (or with timeout) via sys_cap_enter(0, timeout); they never wake from a signal.
  • SIGPIPE is never delivered. Writes on a closed connection return -1 / EPIPE.
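Below is a minimal sketch of these stubs. It assumes a per-process handler table keyed by signal number, handlers represented as plain addresses, a boolean in place of the ProcessHandle cap lookup, and Linux x86_64 signal and errno numbers. The text does not specify what kill() returns for a non-SIGKILL signal when the caller does hold a ProcessHandle, so this sketch picks EINVAL.

```rust
use std::collections::HashMap;

// Linux x86_64 values, for illustration only.
const EPERM: i32 = 1;
const EINVAL: i32 = 22;
const ENOSYS: i32 = 38;
const SIGKILL: i32 = 9;
const SIGSTOP: i32 = 19;

/// Per-process table: signal number -> handler address (0 = SIG_DFL).
pub struct SignalTable {
    handlers: HashMap<i32, usize>,
}

impl SignalTable {
    pub fn new() -> Self {
        SignalTable { handlers: HashMap::new() }
    }

    /// sigaction()/signal(): store the handler, return the previous one.
    /// As in POSIX, SIGKILL and SIGSTOP cannot be caught.
    pub fn sigaction(&mut self, sig: i32, handler: usize) -> Result<usize, i32> {
        if sig <= 0 || sig == SIGKILL || sig == SIGSTOP {
            return Err(EINVAL);
        }
        Ok(self.handlers.insert(sig, handler).unwrap_or(0))
    }

    /// Inspection only: nothing in this module ever calls a handler.
    pub fn handler(&self, sig: i32) -> usize {
        self.handlers.get(&sig).cloned().unwrap_or(0)
    }
}

/// kill(): `has_process_handle` stands in for a ProcessHandle cap lookup.
pub fn kill(has_process_handle: bool, sig: i32) -> Result<(), i32> {
    if !has_process_handle {
        return Err(EPERM);
    }
    if sig != SIGKILL {
        return Err(EINVAL); // assumption: only SIGKILL is honoured
    }
    Err(ENOSYS) // ProcessHandle.kill() is not implemented yet
}
```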

This is acceptable for a shell + DNS resolver. Anything that depends on real signals (job control with Ctrl-Z, Ctrl-C across pipelines, real SIGCHLD) is out of scope for the first port. Job control in the shell must be reimplemented over typed control caps, not signals.

errno convention

Per-thread errno cell in TLS owned by libcapos-posix. Mapping table (libcapos-posix/src/errno_map.rs):

| capOS CapError / CapException | POSIX errno |
| --- | --- |
| CapError::NotFound | ENOENT |
| CapError::PermissionDenied | EACCES |
| CapError::Disconnected | ECONNRESET |
| CapError::Timeout | ETIMEDOUT |
| CapError::ResourceExhausted | ENOMEM / EMFILE (context dependent) |
| CapError::InvalidArgument | EINVAL |
| CapError::WouldBlock | EAGAIN |
| (fall-through) | EIO |

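Wrappers follow the glibc / musl convention. On error, a wrapper sets errno and returns -1 (int-returning calls) or NULL (pointer-returning calls). On success, it returns the result and leaves errno untouched. No wrapper resets errno to 0, because POSIX forbids library functions from doing so.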
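The mapping table and the wrapper convention fit into one shared mapping function plus a generic wrapper. In this sketch, CapError is a hypothetical mirror of the variants named in the table, and errno values are Linux's, for illustration. The caller passes in the context for ResourceExhausted, so the mapping table stays in one place instead of being re-derived in each wrapper. The wrapper leaves errno untouched on success, which is what glibc and musl do.

```rust
use std::cell::Cell;

// Linux errno values, for illustration only.
pub const ENOENT: i32 = 2;
pub const EIO: i32 = 5;
pub const EAGAIN: i32 = 11;
pub const ENOMEM: i32 = 12;
pub const EACCES: i32 = 13;
pub const EINVAL: i32 = 22;
pub const EMFILE: i32 = 24;
pub const ECONNRESET: i32 = 104;
pub const ETIMEDOUT: i32 = 110;

/// Hypothetical mirror of the CapError variants in the table.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CapError {
    NotFound,
    PermissionDenied,
    Disconnected,
    Timeout,
    ResourceExhausted,
    InvalidArgument,
    WouldBlock,
    Other,
}

/// What a ResourceExhausted failure ran out of, supplied by the wrapper.
#[derive(Clone, Copy)]
pub enum Exhausted {
    Memory,
    Descriptors,
}

/// The single shared CapError -> errno table.
pub fn to_errno(e: CapError, ctx: Exhausted) -> i32 {
    match e {
        CapError::NotFound => ENOENT,
        CapError::PermissionDenied => EACCES,
        CapError::Disconnected => ECONNRESET,
        CapError::Timeout => ETIMEDOUT,
        CapError::ResourceExhausted => match ctx {
            Exhausted::Memory => ENOMEM,
            Exhausted::Descriptors => EMFILE,
        },
        CapError::InvalidArgument => EINVAL,
        CapError::WouldBlock => EAGAIN,
        CapError::Other => EIO, // fall-through
    }
}

thread_local! {
    static ERRNO: Cell<i32> = Cell::new(0);
}

pub fn errno() -> i32 {
    ERRNO.with(|e| e.get())
}

/// int-returning wrapper: the result on success, or -1 with errno set.
pub fn wrap_int<F: FnOnce() -> Result<i32, CapError>>(ctx: Exhausted, call: F) -> i32 {
    match call() {
        Ok(v) => v,
        Err(e) => {
            ERRNO.with(|c| c.set(to_errno(e, ctx)));
            -1
        }
    }
}
```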

Threading

pthreads -> capOS in-process threading. Substrate already exists in the kernel: ThreadSpawner, ThreadControl, ThreadHandle, per-thread FS-base, ParkSpace.

Mapping:

  • pthread_create -> ThreadSpawner.spawn + start-routine trampoline.
  • pthread_exit -> ThreadControl.exitThread.
  • pthread_join -> ThreadHandle.join (block via cap_enter).
  • pthread_self -> TLS slot or ThreadControl.currentId.
  • pthread_mutex_* -> ParkSpace-backed mutex (futex-style park / unpark).
  • pthread_cond_* -> ParkSpace + bounded waiter queue.
  • pthread_key_* -> fixed-size TLS slot table per thread.

This is in scope but not on the critical path for the shell or DNS resolver. Both can run single-threaded for v0, so the pthread shim is deferred to a v1 successor.

First Port: POSIX Shell

Candidate survey

| Shell | License | Size | Deps | POSIX coverage | Verdict |
| --- | --- | --- | --- | --- | --- |
| dash (upstream) | BSD | ~13 kSLOC, ~134 KB | tiny libc subset; no readline; no termcap | Strict POSIX, no extensions | Recommended primary |
| busybox ash (upstream) | GPL-2.0 | ~8 kSLOC of shell/ash.c + busybox infra | Designed for embedded, modular | POSIX + selectable extensions | Heavier framework cost; useful later when capOS wants a coreutils set |
| toybox toysh (upstream) | 0BSD | currently incomplete | Designed for self-contained ELF | POSIX + Bash compat target, not finished | Skip – explicitly described upstream as still under development |
| oksh (upstream) | Public domain | ~308 KB binary, 0 deps | Optional ncurses for clear-screen only | Korn-shell superset of POSIX | Bigger surface than v0 needs to validate libcapos-posix |
| Custom Rust shell | n/a | n/a | n/a | n/a | Reject – defeats the purpose of porting C. Native shell already exists at shell/ (capos-shell). |

Recommended primary: dash.

Reasons:

  1. Smallest established POSIX-strict shell. ~13 kSLOC is small enough for the porting team to read the entire codebase.
  2. No readline / termcap dependency. The shell talks to whatever fd 0 gives it. This is exactly what libcapos-posix provides through TerminalSession or Console.
  3. Strict POSIX means the port does not accidentally validate Bash extensions that libcapos-posix does not implement.
  4. Already proven as a porting target on Linux from Scratch, OpenWrt, and Alpine. Patterns for replacing the libc layer (__syscall, stubbed sigaction) are well documented.
  5. Debian has used it as /bin/sh since Squeeze (2011), so the large body of scripts written for a strict POSIX /bin/sh on Debian already runs under dash.

Open Question §1 below records that the candidate is a recommendation, not a final decision.

Required POSIX surface (v0)

What a dash instance actually exercises before printing a prompt and running ls | grep foo:

| Group | Calls (minimum set) | Backed by |
| --- | --- | --- |
| Process startup | _start shim, argv/envp parsing, exit | libcapos _start, sys_exit |
| Stdio | read(0,...), write(1,...), write(2,...) | Console / TerminalSession cap |
| Allocation | malloc/free/calloc/realloc | libcapos heap |
| String/format | printf/fprintf/memcpy/strlen/strcmp/strchr/strncpy/… | libcapos-posix string/printf subset |
| File I/O | open/close/read/write/lseek/stat/fstat/access/unlink | Namespace + File caps |
| Directory | opendir/readdir/closedir | Directory cap |
| Pipes | pipe(), dup2(), close() on fds | NEW Pipe capability (P1.3) |
| Process | fork+execve (fork-for-exec only), posix_spawn, wait/waitpid | ProcessSpawner + ProcessHandle.wait |
| Env | getenv/setenv/putenv | Per-process env vector in libcapos-posix; populated from a future LaunchParameters cap when one lands |
| Signals | signal/kill/sigaction (stubs) | Per-process handler table; handlers never delivered |
| Time | time/gettimeofday/nanosleep | Timer cap |
| Misc | getpid/getuid/getgid | Synthetic per-process; uid/gid hardcoded for v0 |

Critical gap: pipe(). The shell pipeline ls | grep foo requires fd 1 of ls to feed fd 0 of grep. capOS has no pipe capability today. This is the first-port-blocking item; see Phase P1.3.
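The pipe semantics a shell pipeline needs can be sketched independently of how the capability is shaped. This section does not define the future Pipe cap's interface, so the sketch uses an in-process bounded byte buffer as a hypothetical stand-in, with Linux errno values. It shows the two behaviours a pipeline depends on. A reader sees end-of-file (0) once the writer is gone and the buffer is drained. A writer gets EPIPE once the reader is gone, which matches the rule that SIGPIPE is never delivered.

```rust
use std::collections::VecDeque;

// Linux values, for illustration only.
const EAGAIN: i32 = 11;
const EPIPE: i32 = 32;

/// Hypothetical stand-in for the shared state behind a Pipe cap.
pub struct Pipe {
    buf: VecDeque<u8>,
    cap: usize,
    reader_open: bool,
    writer_open: bool,
}

impl Pipe {
    pub fn new(cap: usize) -> Self {
        Pipe { buf: VecDeque::new(), cap, reader_open: true, writer_open: true }
    }

    /// Non-blocking write: copies what fits, EAGAIN when full,
    /// EPIPE once the read end has been closed.
    pub fn write(&mut self, data: &[u8]) -> Result<usize, i32> {
        if !self.reader_open {
            return Err(EPIPE);
        }
        let room = self.cap - self.buf.len();
        if room == 0 && !data.is_empty() {
            return Err(EAGAIN);
        }
        let n = room.min(data.len());
        self.buf.extend(data[..n].iter().cloned());
        Ok(n)
    }

    /// Non-blocking read: Ok(0) means EOF (writer closed, buffer drained).
    pub fn read(&mut self, out: &mut [u8]) -> Result<usize, i32> {
        if self.buf.is_empty() {
            return if self.writer_open { Err(EAGAIN) } else { Ok(0) };
        }
        let n = out.len().min(self.buf.len());
        for slot in out[..n].iter_mut() {
            *slot = self.buf.pop_front().unwrap();
        }
        Ok(n)
    }

    pub fn close_reader(&mut self) {
        self.reader_open = false;
    }

    pub fn close_writer(&mut self) {
        self.writer_open = false;
    }
}
```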

What dash will not get in v0

  • Job control (Ctrl-Z, bg, fg, & background): requires real SIGCHLD/SIGTSTP. Skip; documented as out of scope.
  • Process groups, sessions, controlling terminals: same reason.
  • trap for signals other than EXIT: handlers stored, never fired.
  • read -t (timeout): doable via Timer cap; defer to v1.
  • ulimit: returns 0 / ENOSYS. Quotas are kernel-side capability ledgers, not POSIX rlimits.

Validation smoke

make run-posix-shell-smoke:

  1. Boot a manifest that grants dash a TerminalSession (stdio), a read-only Namespace cap rooted at a tiny in-rodata pseudo-fs, a ProcessSpawner narrowed to one allowed binary (ls-shim), and a Timer cap.
  2. Pipe a heredoc into stdin: ls; echo done.
  3. Assert kernel log shows done and clean exit.

Stretch goal smoke: cat foo | grep bar end-to-end (depends on the pipe primitive landing).

First Port: DNS Resolver

Candidate survey

| Library | License | Source size | Deps | Async style | Verdict |
| --- | --- | --- | --- | --- | --- |
| musl res_query (upstream) | MIT | ~2 kSLOC for resolver core | Embedded in musl | Synchronous (parallel queries internally) | Available only if the build links musl; capOS does not. Skip. |
| c-ares (upstream) | MIT, C89 | ~30+ kSLOC, multi-file, configure-driven | POSIX sockets, optional threads | Native async (callbacks + select/poll/event loop) | Largest surface, most mature, most invasive port |
| dns.c (wahern) (upstream) | MIT | single-file C, ~10 kSLOC, no deps | None – integrates with any event loop through three hooks (pollfd / events / timeout) | Non-blocking, no required callback shape | Recommended primary |
| GNU adns (upstream) | GPL-2.0+ | Multi-file, ~10-15 kSLOC | POSIX, no event-loop integration | Async, opaque state | License is GPL-2.0+, not BSD/MIT. Skip unless capOS accepts a GPL component in the demo path. |
| udns (upstream) | LGPL-2.1 | small | POSIX | Async stub-only | LGPL plus older project; skip unless dns.c blows up |
| SPCDNS | LGPL | small | encode/decode only, no socket | n/a | Skip – provides no resolver loop |
| trust-dns-resolver in Rust | Apache-2 / MIT | large | Tokio | async | Reject – defeats the purpose of porting C. Native Rust resolver is a separate path. |

Recommended primary: dns.c by William Ahern.

Reasons:

  1. Single-file, zero deps. Drops into the build with a minimal cc rule. The build avoids configure scripts, pkg-config, optional feature matrices, and multi-file build orchestration.
  2. No fixed I/O model. dns.c is designed around three common methods (pollfd, events, timeout). The host adapter plugs capability-backed socket I/O without rewriting the resolver core, replacing socket()/sendto()/recvfrom()/poll() with libcapos-posix wrappers that return fd-shaped results backed by UdpSocket / TcpSocket caps.
  3. MIT license is capOS-compatible.
  4. ~10 kSLOC means port review can read it end-to-end.
  5. C89, no threading assumption, no global state surprises (resolver handle is opaque per-instance) – fits a single-process v0 design.

Open Question §2 below records that the candidate is a recommendation, not a final decision.

Required POSIX surface (v0)

The DNS resolver port exercises a very narrow POSIX subset:

| Group | Calls | Backed by |
| --- | --- | --- |
| Stdio (logs only) | write(2,...) | Console cap |
| Allocation | malloc/free/calloc/realloc | libcapos heap |
| Time | clock_gettime/gettimeofday | Timer cap |
| Sockets (UDP) | socket(AF_INET, SOCK_DGRAM, 0), sendto, recvfrom, bind, close, setsockopt (subset) | NetworkManager + UdpSocket cap |
| Polling | poll(fds, nfds, timeout_ms) | Synthesised: each fd carries its underlying cap; libcapos-posix uses sys_cap_enter(min_complete=1, timeout_ns) with one CQE per ready fd. No new kernel surface needed for v0 if dns.c uses one fd per query. |
| Resolv config | One in-rodata bounded text blob inlined into libcapos-posix (single nameserver entry; v0 ships before any storage cap exists) | No open / Namespace cap required for v0 |
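The synthesised poll() can be sketched as the step that runs after the wait. The sketch assumes the completions returned by one sys_cap_enter(1, timeout_ns) call have already been collected as a list of (fd, readiness) pairs. That CQE-to-fd bookkeeping is hypothetical. The sketch fills revents the way POSIX poll() does and returns the number of ready entries, which is 0 on timeout. It omits POLLERR and POLLHUP, which a complete poll() reports even when the caller did not request them.

```rust
// Linux poll bit values, for illustration only.
pub const POLLIN: i16 = 0x001;
pub const POLLOUT: i16 = 0x004;

/// Same layout as C's struct pollfd.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub struct PollFd {
    pub fd: i32,
    pub events: i16,
    pub revents: i16,
}

/// One completion drained from the ring: which fd it belongs to and
/// which readiness bits it represents (hypothetical bookkeeping).
pub struct Completion {
    pub fd: i32,
    pub ready: i16,
}

/// Fill revents from the completions; return the count of ready fds.
pub fn fill_revents(fds: &mut [PollFd], completions: &[Completion]) -> i32 {
    let mut ready = 0;
    for p in fds.iter_mut() {
        p.revents = 0;
        for c in completions.iter().filter(|c| c.fd == p.fd) {
            // Report only the events the caller asked for.
            p.revents |= c.ready & p.events;
        }
        if p.revents != 0 {
            ready += 1;
        }
    }
    ready
}
```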

No pipes, no fork, no exec, no signals, no /etc/resolv.conf-by-path, no Namespace or File caps required. The DNS resolver is strictly easier than the shell.
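The in-rodata resolver configuration can be a const string parsed once at startup. The blob's contents are an assumption for illustration; QEMU's slirp DNS address is used as the single nameserver. The parser accepts only the one nameserver line the v0 surface needs, skips blank and comment lines, and rejects anything else rather than guessing.

```rust
use std::net::Ipv4Addr;

/// Hypothetical rodata blob compiled into libcapos-posix.
pub const RESOLV_CONF: &str = "# v0: single nameserver\nnameserver 10.0.2.3\n";

/// Return the single IPv4 nameserver, or None if the blob contains
/// anything beyond one well-formed `nameserver` line.
pub fn parse_nameserver(blob: &str) -> Option<Ipv4Addr> {
    let mut found = None;
    for line in blob.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let mut parts = line.split_whitespace();
        match (parts.next(), parts.next(), parts.next()) {
            (Some("nameserver"), Some(addr), None) if found.is_none() => {
                found = Some(addr.parse::<Ipv4Addr>().ok()?);
            }
            // Unknown directive, extra tokens, or a second nameserver.
            _ => return None,
        }
    }
    found
}
```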

The v0 surface intentionally omits both TCP fallback for truncated responses and any path-based config file. A later TCP fallback would use socket(SOCK_STREAM), connect, send, and recv through the existing NetworkManager + TcpSocket cap. It lands only once the v0 UDP-only smoke is green; see “What dns.c will not get in v0” below.
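When the TCP fallback lands, it needs two pieces of wire format, both fixed by RFC 1035. First, a UDP reply whose header has the TC (truncated) bit set is retried over TCP. Second, every DNS message on a TCP connection is prefixed with a two-byte big-endian length. The minimal sketch below covers both checks; it is independent of how dns.c implements them internally.

```rust
/// True if a UDP reply has the TC (truncated) flag set. TC is bit 0x02
/// of the third header byte (RFC 1035 section 4.1.1).
pub fn is_truncated(reply: &[u8]) -> bool {
    reply.len() >= 12 && (reply[2] & 0x02) != 0
}

/// Frame a message for TCP with a 2-byte big-endian length prefix
/// (RFC 1035 section 4.2.2). None if it cannot fit in 65535 bytes.
pub fn frame_for_tcp(msg: &[u8]) -> Option<Vec<u8>> {
    if msg.len() > u16::MAX as usize {
        return None;
    }
    let mut out = Vec::with_capacity(msg.len() + 2);
    out.extend_from_slice(&(msg.len() as u16).to_be_bytes());
    out.extend_from_slice(msg);
    Some(out)
}

/// Split one complete framed message off the front of a TCP receive
/// buffer, returning (message, rest); None if more bytes are needed.
pub fn take_tcp_message(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    if buf.len() < 2 {
        return None;
    }
    let len = u16::from_be_bytes([buf[0], buf[1]]) as usize;
    if buf.len() < 2 + len {
        return None;
    }
    Some((&buf[2..2 + len], &buf[2 + len..]))
}
```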

Critical gaps:

  • UdpSocket capability. The networking proposal Phase B implements TCP + listener only; UDP “is deferred until the userspace network stack or DNS work needs it; it is not part of the Telnet Shell Demo contract” (networking-proposal.md). The resolver port creates the UDP path; it does not consume an existing one.
  • The future Resolver cap concept (in service-architecture-proposal.md “DNS resolver – consumes a UdpSocket, exports Resolver”) is a target once the UDP path exists. The first port produces the exported shape.

What dns.c will not get in v0

  • DNSSEC validation: dns.c supports it, depending on /etc/resolv.conf trust anchor config. Defer.
  • TCP fallback for truncated responses: implement on a second iteration once the TCP capability path is reusable.
  • mDNS: out of scope.
  • Recursive mode (acting as a recursive resolver): out of scope; v0 ships stub-only.

Validation smoke

make run-posix-dns-smoke:

  1. Boot a manifest that grants the resolver process a NetworkManager (or future narrowed UdpSocket-only authority), a Console cap, and a Timer cap. The single-nameserver resolv config is the in-rodata bounded text blob compiled into libcapos-posix; no Namespace or File cap is needed for v0.
  2. The resolver opens a UDP socket, sends a query for a known A record to QEMU’s user-mode 10.0.2.3 (slirp’s built-in DNS) or to an in-host test resolver.
  3. Resolver prints the resolved IPv4 address.
  4. Assert kernel log line matches.
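The query the smoke sends is small enough to build by hand, which helps when reading a packet capture from the QEMU run. The sketch follows RFC 1035 for a single A-record question: a 12-byte header with RD set, QNAME as length-prefixed labels, QTYPE=1 (A), and QCLASS=1 (IN). In the real port dns.c builds this packet itself. The query ID is a caller-supplied value here, whereas a resolver should randomise it.

```rust
/// Encode a standard recursive query for one A record (RFC 1035 4.1).
/// Returns None for an empty label, a label over 63 bytes, or an
/// encoded name longer than 255 bytes.
pub fn build_a_query(id: u16, name: &str) -> Option<Vec<u8>> {
    let mut q = Vec::with_capacity(12 + name.len() + 6);
    q.extend_from_slice(&id.to_be_bytes()); // ID
    q.extend_from_slice(&[0x01, 0x00]); // flags: RD=1, all others 0
    q.extend_from_slice(&[0, 1, 0, 0, 0, 0, 0, 0]); // QDCOUNT=1, AN/NS/AR=0
    for label in name.trim_end_matches('.').split('.') {
        if label.is_empty() || label.len() > 63 {
            return None;
        }
        q.push(label.len() as u8);
        q.extend_from_slice(label.as_bytes());
    }
    q.push(0); // root label terminates QNAME
    if q.len() - 12 > 255 {
        return None;
    }
    q.extend_from_slice(&[0, 1, 0, 1]); // QTYPE=A (1), QCLASS=IN (1)
    Some(q)
}
```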

Trade-offs and Ordering

Smallest-deps comparison

| Port | C surface needed | New capOS infrastructure required | Difficulty |
| --- | --- | --- | --- |
| DNS resolver (dns.c) | malloc, time, socket subset, write(2), in-rodata resolv config (no open), poll-equivalent | UDP socket cap + NetworkManager exposure of UDP; otherwise reuses Phase B TCP path infra | Smaller – strictly additive (UDP is missing today but the kernel-side smoltcp stack supports it) |
| POSIX shell (dash) | malloc, full stdio, file I/O, directory iteration, pipe(), fork-for-exec, exec, wait, env, time, signals (stub) | Pipe primitive (new), Namespace+File cap surface, ProcessSpawner sidecar work to honour fd-action grants, env-vector handoff | Larger – touches storage / IPC / process surfaces |

Which blocks which

  • Both ports can run in parallel at the libcapos / libcapos-posix layer level: each pulls a disjoint subset of POSIX surfaces.
  • DNS resolver blocks on a new capOS surface (UDP cap exposure) but does not block on pipe(), fork(), or exec().
  • Shell blocks on (in order of probable cost): pipe primitive, ProcessSpawner fd-action support for stdin / stdout redirection, Namespace+File cap availability, env vector / LaunchParameters.
  • The library substrate (libcapos staticlib + libcapos-posix scaffold) blocks both. Once the substrate exists, the two ports proceed in parallel.

Resulting order:
  1. libcapos staticlib v0 (Phase P1.1). The thin Rust .a with cap_call, capset_get, sys_exit, sys_cap_enter, heap. Plus a “C hello world” smoke that calls console_write_line() (mirrors the userspace-binaries proposal “Future Phase: libcapos for C”). This phase is the prerequisite for both P1.2 and P1.3.
  2. libcapos-posix scaffold – fd table, errno cell, stdio wrappers for fd 0/1/2, stub signals, _start glue that registers argv / envp from LaunchParameters (or empty arrays if that surface has not landed), basic malloc/free re-export.
  3. dns.c port (Phase P1.2). Library-layer work in P1.2 can overlap with library-layer work in P1.3, but both phases add interfaces to schema/capos.capnp and must serialise on the shared schema serial surface per docs/plans/README.md Concurrency Notes; the schema half of either phase cannot run concurrently with the schema half of the other.
  4. dash port (P1.3 lays the pipe + fork-for-exec primitives; the actual dash vendoring is a successor task that also depends on Namespace+File caps). The same schema serial-surface constraint applies to P1.3.

Critical path

The DNS resolver is the smaller first slice because the shell depends on fork, pipe, and file support. Shell-first ordering is viable, but it requires the pipe cap design and implementation plus Namespace + File caps (Phase 2 of storage-and-naming-proposal.md) ahead of the dash port, and both prerequisites are sizeable. The DNS resolver therefore remains the faster proof of “POSIX adapter actually adapts something that was not written for capOS.”

What this slice does not promise

  • Not a path to running glibc-built binaries unchanged. Both ports are sources-on-disk recompiled against libcapos-posix. Binary compatibility with Linux ELFs is not in scope.
  • Not job control, not signals, not full POSIX session/pgrp model.
  • Not a libc – the POSIX surface ships just enough for dash and dns.c. printf family lands in libcapos-posix only because both ports need it; this is not a <stdio.h> for general use.
  • Not a reason to skip the native Rust paths – capos-shell (Rust shell/ crate) remains the default capOS shell. dash exists for porting validation, not for use as the system shell.
  • Not a foundation for hosted C++. C++ requires explicit ABI decisions tracked separately in docs/proposals/userspace-binaries-proposal.md.

Phase Decomposition

Phases are dispatch-ready. P1.1 must land before P1.2 or P1.3 begin. P1.2 and P1.3 can overlap at the library and kernel-cap layer, but both add interfaces to schema/capos.capnp and must serialise on the shared schema serial surface per docs/plans/README.md Concurrency Notes; the schema halves cannot run concurrently.

Phase P1.1 – libcapos C-substrate v0 + C hello-world smoke

  • New crate libcapos/ with crate-type = ["staticlib"] and the C primitive surface (cap_call, capset_get, capset_iter, sys_exit, sys_cap_enter, heap).
  • New header tree under include/capos/.
  • New c-build Make helper that invokes clang --target=x86_64-unknown-none-elf -nostdlib -static and links libcapos.a, using capos-rt’s _start as the entry point, which calls a C main() shim.
  • New demo demos/c-hello/: single .c file calling console_write_line().
  • New manifest system-c-hello.cue.
  • No POSIX surface, no errno, no pthreads. Heap re-exports the capos-rt fixed allocator.
  • Validation: make run-c-hello boots; the C binary prints hello from C and exits cleanly with code 0.

This phase is the strict prerequisite for the rest of the track.
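As an illustration of the C primitive surface, here is a minimal sketch of how the Rust staticlib could export a `capset_get` lookup to C. The signature, the `CaposCap` struct, the status codes, and the static table standing in for the process CapSet are all hypothetical; the real export would also need an unmangled symbol name (for example via `#[no_mangle]`) so C code can link against it. Returning a status code instead of setting errno keeps P1.1 free of POSIX state, as the phase requires.

```rust
use std::slice;

/// Hypothetical C-visible capability handle: an index into the kernel-owned
/// CapTable plus the interface id the cap was granted under.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct CaposCap {
    pub index: u32,
    pub interface_id: u64,
}

/// Host stand-in for the process CapSet (hypothetical). In libcapos the
/// entries would come from capos-rt's startup data, not a static table.
static CAPSET: &[(&str, CaposCap)] = &[
    ("console", CaposCap { index: 0, interface_id: 0x1111 }),
    ("network", CaposCap { index: 1, interface_id: 0x2222 }),
];

/// P1.1 has no errno, so failures are reported as negative status codes.
pub const CAPOS_OK: i32 = 0;
pub const CAPOS_ERR_INVALID: i32 = -1;
pub const CAPOS_ERR_NOT_FOUND: i32 = -2;

/// Look up a granted capability by name and copy its handle into `*out`.
/// A name that is not in the CapSet fails closed; nothing is ever minted.
///
/// # Safety
/// `name` must point to `name_len` readable bytes and `out` must be valid
/// for writes. Null pointers are rejected with CAPOS_ERR_INVALID.
pub unsafe extern "C" fn capset_get(name: *const u8, name_len: usize, out: *mut CaposCap) -> i32 {
    if name.is_null() || out.is_null() {
        return CAPOS_ERR_INVALID;
    }
    let wanted = unsafe { slice::from_raw_parts(name, name_len) };
    match CAPSET.iter().find(|(n, _)| n.as_bytes() == wanted) {
        Some((_, cap)) => {
            unsafe { *out = *cap };
            CAPOS_OK
        }
        None => CAPOS_ERR_NOT_FOUND,
    }
}
```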

Phase P1.2 – UDP cap surface + dns.c stub resolver smoke

  • Schema additions to schema/capos.capnp: new UdpSocket interface + NetworkManager.createUdpSocket method (small additive change).
  • Kernel: extend kernel/src/cap/network.rs with the UDP path mirroring the existing TCP path, and add UDP RX demux on the existing scheduler-polled smoltcp runtime in kernel/src/virtio.rs.
  • Userspace: new typed UdpSocketClient in capos-rt/src/client.rs.
  • New crate libcapos-posix/ with the minimal socket/sendto/recvfrom/poll surface for one UDP fd at a time.
  • Vendored dns.c under vendor/dns-c-wahern/ (single .c plus header).
  • New demo demos/posix-dns-resolver/.
  • New manifest system-posix-dns.cue; new Makefile target run-posix-dns-smoke.
  • Validation: end-to-end “boot capOS, launch resolver, print resolved <name> -> <addr>”. A single-fd resolver with a single in-flight query is sufficient for v0.
  • Schema serial-surface coordination: queues on the shared schema/capos.capnp serial surface per docs/plans/README.md Concurrency Notes. Must not run concurrently with another schema-touching plan.

Depends on Phase P1.1.
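The v0 socket layer only has to project one UdpSocket cap onto one fd. A minimal host-side sketch follows. The `UdpSocketCap` trait mirrors the sendTo / recvFrom shape from open question 8 but is a hypothetical stand-in for `UdpSocketClient`; the `create` closure stands in for `NetworkManager.createUdpSocket`; a `None` from `recv_from` models a recvFrom that completed with a timeout. The errno numbers are Linux x86-64 values assumed for illustration, and mapping a missing NetworkManager to EACCES and a timeout to EAGAIN are likewise assumptions, not settled choices.

```rust
use std::collections::VecDeque;

// Linux x86-64 errno numbers and SOCK_DGRAM value, assumed for illustration.
pub const EBADF: i32 = 9;
pub const EAGAIN: i32 = 11;
pub const EACCES: i32 = 13;
pub const EMFILE: i32 = 24;
pub const EPROTONOSUPPORT: i32 = 93;
pub const SOCK_DGRAM: i32 = 2;

/// Hypothetical stand-in for the proposed UdpSocket interface.
pub trait UdpSocketCap {
    fn send_to(&mut self, addr: [u8; 4], port: u16, data: &[u8]) -> usize;
    /// None models a recvFrom that completed with a timeout.
    fn recv_from(&mut self, max_len: usize) -> Option<([u8; 4], u16, Vec<u8>)>;
}

/// v0 POSIX socket layer: at most one UDP fd, backed by exactly one cap.
pub struct PosixUdp<C: UdpSocketCap> {
    network_granted: bool,  // does the CapSet hold a NetworkManager cap?
    slot: Option<(i32, C)>, // the single open socket: (fd, cap)
}

impl<C: UdpSocketCap> PosixUdp<C> {
    pub fn new(network_granted: bool) -> Self {
        PosixUdp { network_granted, slot: None }
    }

    pub fn socket(&mut self, sock_type: i32, create: impl FnOnce() -> C) -> Result<i32, i32> {
        if sock_type != SOCK_DGRAM {
            return Err(EPROTONOSUPPORT); // v0 projects UDP only
        }
        if !self.network_granted {
            return Err(EACCES); // fail closed: no cap, no socket
        }
        if self.slot.is_some() {
            return Err(EMFILE); // one UDP fd at a time
        }
        let fd = 3; // first fd after stdin/stdout/stderr
        self.slot = Some((fd, create()));
        Ok(fd)
    }

    fn cap_for(&mut self, fd: i32) -> Result<&mut C, i32> {
        match &mut self.slot {
            Some((open, cap)) if *open == fd => Ok(cap),
            _ => Err(EBADF),
        }
    }

    pub fn sendto(&mut self, fd: i32, data: &[u8], addr: [u8; 4], port: u16) -> Result<usize, i32> {
        Ok(self.cap_for(fd)?.send_to(addr, port, data))
    }

    pub fn recvfrom(&mut self, fd: i32, max_len: usize) -> Result<([u8; 4], u16, Vec<u8>), i32> {
        self.cap_for(fd)?.recv_from(max_len).ok_or(EAGAIN)
    }

    pub fn close(&mut self, fd: i32) -> Result<(), i32> {
        self.cap_for(fd)?;
        self.slot = None; // dropping the cap client would release the cap
        Ok(())
    }
}

/// Host fake that echoes each datagram back, for exercising the layer.
#[derive(Default)]
pub struct EchoUdp {
    queue: VecDeque<([u8; 4], u16, Vec<u8>)>,
}

impl UdpSocketCap for EchoUdp {
    fn send_to(&mut self, addr: [u8; 4], port: u16, data: &[u8]) -> usize {
        self.queue.push_back((addr, port, data.to_vec()));
        data.len()
    }
    fn recv_from(&mut self, max_len: usize) -> Option<([u8; 4], u16, Vec<u8>)> {
        let (addr, port, mut data) = self.queue.pop_front()?;
        data.truncate(max_len);
        Some((addr, port, data))
    }
}
```

Keeping the single-fd limit explicit (EMFILE on a second socket) matches the "one UDP fd at a time" scope and is enough for a single in-flight query.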

Phase P1.3 – Pipe capability + fork-for-exec scaffolding

  • Schema additions to schema/capos.capnp: new Pipe interface (small additive change, distinct from UDP and LaunchParameters surfaces). EOF semantics on close.
  • Kernel: new kernel/src/cap/pipe.rs – bounded SPSC byte ring backed by a kernel-allocated MemoryObject page.
  • Kernel: extend kernel/src/cap/process_spawner.rs so spawn grants can mint Pipe halves and bind them to the child’s standard fds.
  • Userspace: new PipeClient in capos-rt/src/client.rs.
  • libcapos-posix extensions for pipe/dup2/close.
  • libcapos-posix extensions for fork/execve/waitpid (TLS “next exec is the real spawn” state machine, ProcessSpawner integration).
  • New demo demos/posix-pipe-shim/: a minimal C program that pipe()s, posix_spawn()s a child whose stdout is the write end, parent reads from the read end and prints. Plus a second smoke that exercises the §6-selected fork-for-exec path (either inter-call recording of dup2/close as posix_spawn file actions, or a patched-port variant), proving the path dash pipelines actually take.
  • New manifest system-posix-pipe.cue; new Makefile target run-posix-pipe-smoke.
  • Validation: end-to-end pipe smoke covers both the posix_spawn-direct path and the §6-selected fork-for-exec path, proving the primitive shell pipelines need before vendoring dash.
  • Schema serial-surface coordination: queues on the shared schema/capos.capnp serial surface per docs/plans/README.md Concurrency Notes. Must not run concurrently with P1.2 if both want the schema serial surface.

Depends on Phase P1.1.
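Open question 6 leaves P1.3 to choose between recording dup2 / close calls as posix_spawn file actions and patching the port. The sketch below models only the bookkeeping of the recording option, using hypothetical names (`ForkShim`, `SpawnPlan`, `FileAction`) and string labels in place of caps. The key property it shows is that calls inside the fork()-to-execve() window are recorded for the child and never touch the parent's own fd table. It deliberately omits the hard parts: a real shim would keep this state in TLS, hand the plan to ProcessSpawner, and resume the parent's branch of fork() with the child's pid.

```rust
use std::collections::BTreeMap;

pub const EBADF: i32 = 9; // Linux x86-64 numbers, assumed
pub const ENOSYS: i32 = 38;

/// One fd change recorded between fork() and execve().
#[derive(Clone, Debug, PartialEq)]
pub enum FileAction {
    Dup2 { from: i32, to: i32 },
    Close { fd: i32 },
}

/// What execve() would hand to ProcessSpawner instead of replacing the image.
#[derive(Debug, PartialEq)]
pub struct SpawnPlan {
    pub path: String,
    pub argv: Vec<String>,
    pub actions: Vec<FileAction>,
}

pub struct ForkShim {
    fds: BTreeMap<i32, String>,       // parent's fd table; labels stand in for caps
    pending: Option<Vec<FileAction>>, // Some(..) means "next exec is the real spawn"
}

impl ForkShim {
    pub fn with_fds(initial: &[(i32, &str)]) -> Self {
        let fds: BTreeMap<i32, String> =
            initial.iter().map(|&(fd, label)| (fd, label.to_string())).collect();
        ForkShim { fds, pending: None }
    }

    pub fn fd(&self, fd: i32) -> Option<&str> {
        self.fds.get(&fd).map(|s| s.as_str())
    }

    /// fork(): open the recording window and report "in the child" (0).
    pub fn fork(&mut self) -> Result<i32, i32> {
        if self.pending.is_some() {
            return Err(ENOSYS); // nested fork inside the window is unsupported
        }
        self.pending = Some(Vec::new());
        Ok(0)
    }

    /// dup2(): recorded inside the window, applied to the parent table outside it.
    pub fn dup2(&mut self, from: i32, to: i32) -> Result<i32, i32> {
        if let Some(actions) = self.pending.as_mut() {
            actions.push(FileAction::Dup2 { from, to });
            return Ok(to);
        }
        let backing = self.fds.get(&from).cloned().ok_or(EBADF)?;
        self.fds.insert(to, backing);
        Ok(to)
    }

    /// close(): recorded inside the window, applied to the parent table outside it.
    pub fn close(&mut self, fd: i32) -> Result<(), i32> {
        if let Some(actions) = self.pending.as_mut() {
            actions.push(FileAction::Close { fd });
            return Ok(());
        }
        self.fds.remove(&fd).map(|_| ()).ok_or(EBADF)
    }

    /// execve(): close the window and emit the spawn plan. Without a
    /// preceding fork() there is no image to replace, so it is ENOSYS.
    pub fn execve(&mut self, path: &str, argv: &[&str]) -> Result<SpawnPlan, i32> {
        let actions = self.pending.take().ok_or(ENOSYS)?;
        Ok(SpawnPlan {
            path: path.to_string(),
            argv: argv.iter().map(|a| a.to_string()).collect(),
            actions,
        })
    }

    /// Any other non-trivial call inside the window abandons it with ENOSYS.
    pub fn guard_other_call(&mut self) -> Result<(), i32> {
        match self.pending.take() {
            Some(_) => Err(ENOSYS),
            None => Ok(()),
        }
    }
}
```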

The dash vendoring + full file I/O surface is a successor task that also depends on Namespace + File cap surface (storage Phase 2), which is not yet started.

Recommended dispatch ordering: P1.1 -> (P1.2 alternating with P1.3 on the schema serial surface) -> shell-port follow-on once Namespace + File caps land.

Trust Boundaries

| Boundary | Native capOS service | POSIX-shaped C binary on capOS |
|---|---|---|
| Authority source | Process CapSet | Process CapSet projected through libcapos-posix fd table |
| Memory isolation | Page tables | Page tables (no wasm-style sandbox; libc has no extra runtime check) |
| Code integrity | W^X + NX | W^X + NX |
| Cap forgery | Kernel-owned CapTable | Same; the fd table is per-process userspace state, not authority |
| Resource limits | Kernel quotas | Kernel quotas; ulimit is ENOSYS |
| Side channels | Hardware-level (Spectre etc.) | Same hardware level |

A POSIX binary on capOS is more constrained than on Linux, not less. The adapter provides familiar function signatures, not familiar authority.
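To make the "fd table is not authority" row concrete, here is a minimal sketch of a per-process fd table, assuming hypothetical `CapHandle` / `Iface` types and Linux errno numbers. The table only stores handles the process already obtained from its CapSet and has no way to create one, so a bug in it can misroute calls among caps the process holds but cannot reach a cap it does not hold. The write check follows POSIX's rule that write() on an fd not open for writing fails with EBADF.

```rust
use std::collections::BTreeMap;

pub const EBADF: i32 = 9; // Linux x86-64 numbers, assumed
pub const EMFILE: i32 = 24;

/// Interface a granted cap was looked up under (hypothetical subset).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Iface {
    Console,
    PipeReader,
    PipeWriter,
}

/// A cap the process already holds: an index into its kernel-owned CapTable.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct CapHandle {
    pub index: u32,
    pub iface: Iface,
}

fn writable(iface: Iface) -> bool {
    match iface {
        Iface::Console | Iface::PipeWriter => true,
        Iface::PipeReader => false,
    }
}

/// Userspace-only fd table: small integers naming caps, nothing more.
pub struct FdTable {
    entries: BTreeMap<i32, CapHandle>,
    max_fds: i32,
}

impl FdTable {
    pub fn new(max_fds: i32) -> Self {
        FdTable { entries: BTreeMap::new(), max_fds }
    }

    /// Bind an already-held cap to the lowest free fd, as POSIX requires.
    pub fn install(&mut self, cap: CapHandle) -> Result<i32, i32> {
        let fd = (0..self.max_fds)
            .find(|fd| !self.entries.contains_key(fd))
            .ok_or(EMFILE)?;
        self.entries.insert(fd, cap);
        Ok(fd)
    }

    /// Resolve an fd for write(): ungranted or read-only fds fail with EBADF.
    pub fn for_write(&self, fd: i32) -> Result<CapHandle, i32> {
        match self.entries.get(&fd) {
            Some(cap) if writable(cap.iface) => Ok(*cap),
            _ => Err(EBADF),
        }
    }

    /// Unbind an fd and return its handle so the caller can release the cap.
    pub fn close(&mut self, fd: i32) -> Result<CapHandle, i32> {
        self.entries.remove(&fd).ok_or(EBADF)
    }
}
```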

Validation

The first ports are not complete until they have QEMU evidence:

  • A POSIX binary prints through a granted Console / TerminalSession.
  • The same binary cannot write() to an fd it was not granted, cannot open() a path outside its preopened namespaces, and receives ENOSYS from any unimplemented POSIX function.
  • A missing or wrong-interface cap lookup returns the documented errno (not a host-side panic, not silent success).
  • An owned result cap is released deterministically when the binary exits.
  • Each demo binary exits cleanly and does not wedge the kernel.

Host tests should cover errno mapping and the per-process fd table once those pieces are pure enough to test outside QEMU. Do not claim “POSIX adapter works” from host tests alone; the useful behavior is authority-shaped POSIX execution in capOS.
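The errno half of that host testing could look like the following sketch of open question 4's recommendation: a typed internal error with one mapping to the C `int`, plus a thread-local errno. The variant names, the `to_c_ret` helper, and the Linux x86-64 errno numbers are assumptions. Deriving `from_errno` from the same variant list that `errno()` covers keeps the two directions of the mapping from drifting apart.

```rust
use std::cell::Cell;

/// Internal error type: callers can only produce values listed here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PosixError {
    BadFd,
    Access,
    TooManyFiles,
    NoSys,
}

/// Every variant, so the inverse mapping is derived from the same table.
pub const ALL_ERRORS: [PosixError; 4] = [
    PosixError::BadFd,
    PosixError::Access,
    PosixError::TooManyFiles,
    PosixError::NoSys,
];

impl PosixError {
    /// The single mapping to C errno values (Linux x86-64 numbering, assumed).
    pub fn errno(self) -> i32 {
        match self {
            PosixError::BadFd => 9,         // EBADF
            PosixError::Access => 13,       // EACCES
            PosixError::TooManyFiles => 24, // EMFILE
            PosixError::NoSys => 38,        // ENOSYS
        }
    }

    /// Inverse mapping; unknown values have no internal representation.
    pub fn from_errno(value: i32) -> Option<PosixError> {
        ALL_ERRORS.iter().copied().find(|e| e.errno() == value)
    }
}

thread_local! {
    /// Per-thread errno, as the C ABI requires.
    static ERRNO: Cell<i32> = Cell::new(0);
}

pub fn errno() -> i32 {
    ERRNO.with(|e| e.get())
}

/// C return convention: the value on success (errno untouched),
/// -1 with errno set on failure.
pub fn to_c_ret(result: Result<isize, PosixError>) -> isize {
    match result {
        Ok(value) => value,
        Err(err) => {
            ERRNO.with(|e| e.set(err.errno()));
            -1
        }
    }
}
```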

Open Questions

The following design decisions are documented as open questions because the planning phase recommends an answer but has not yet committed to one.

  1. POSIX shell candidate. Recommended: dash 0.5.13.x, vendored at a pinned tag under vendor/dash/. Alternatives: busybox ash (heavier framework cost), oksh (ksh-superset, larger surface), toysh (incomplete), custom Rust shell (defeats the purpose of porting C). Working answer: dash. Confirm or pick another before P1.3 successor work begins.
  2. DNS resolver candidate. Recommended: dns.c (wahern) as a single-file MIT C library with no required I/O model. Alternatives: c-ares (~3x larger, configure-driven, more invasive port), GNU adns (GPL-2.0+ – license question), musl res_query (requires linking musl, rejected), pure-Rust trust-dns (defeats the C-port purpose). Working answer: dns.c. Confirm or pick another before P1.2 begins.
  3. libcapos versioning and naming. The C library is just libcapos (mirroring the Rust capos-rt); that name is settled. Open question: should the POSIX layer be libcapos-posix (current recommendation) or a different name that avoids any collision with a Rust-side framework name? This stays open pending confirmation that no Rust framework will reuse the libcapos-posix identifier. Working answer: keep libcapos-posix for the POSIX layer.
  4. POSIX errno representation. The C ABI requires an int errno per thread. The Rust internals can either use a typed enum mapped to int at the boundary, or use raw i32 throughout. Recommended: typed Rust error type with one bidirectional mapping at the C boundary, so internal callers cannot accidentally invent unmapped values. Working answer: typed Rust error internally, int at the C ABI. Confirm before P1.2 begins.
  5. File descriptor table location. Recommended: per-process userspace state inside libcapos-posix, with the kernel knowing nothing about fds. Alternative: a kernel-side fd table (closer to Linux). The userspace location preserves the property that capOS authority is the capability table; a kernel fd table would duplicate authority. Working answer: per-process userspace state. Confirm before P1.2 begins.
  6. Fork policy. Confirm “fork-for-exec only” semantics. Real fork() is rejected. The shim turns fork() + execve() into posix_spawn(). If fork() is not followed by execve(), the next non-trivial syscall fails with -1 / ENOSYS. The shell-pipeline pattern fork() -> dup2()/close() to wire stdin/stdout to a pipe end -> execve() is the most common shape that the strict fork-for-exec policy breaks; dash uses exactly this pattern for cmd1 | cmd2. To keep that pattern working, the shim must either (a) record dup2 / close calls between fork() and execve() as posix_spawn file actions and apply them to the spawn, or (b) require the port to be patched to call posix_spawn with explicit file actions. P1.3 must pick one before vendoring dash. Confirm before P1.3 begins.
  7. fd 0 backing for the shell. The natural mapping is the TerminalSession cap (line reads and a cooked-mode line discipline already exist in the kernel and migrate to userspace at networking Phase C). For the DNS resolver, fd 0 is unused and stays unmapped. Confirm TerminalSession is the canonical fd-0 backing.
  8. UDP cap surface scope. Minimum: NetworkManager.createUdpSocket(localPort?) -> socketIndex, UdpSocket.sendTo(addr, port, data) -> bytesSent, UdpSocket.recvFrom(maxLen) -> (addr, port, data), UdpSocket.close(). Same blocking model as TCP accept / recv (CQE on completion or timeout). Confirm shape, especially whether recvFrom should be readiness-based instead of blocking-with-timeout.
  9. Pipe cap design. Recommended: kernel-allocated bounded SPSC ring (page-sized) with EOF on close, exposed as two cap halves (PipeReader, PipeWriter) minted by ProcessSpawner. Alternative: shared MemoryObject + userspace ring (less kernel work, but harder to make EOF safe across process exits). Confirm before P1.3 begins.
  10. argv / envp source. This proposal assumes a future LaunchParameters cap delivers argv / envp through a typed cap. Until that cap lands, libcapos-posix can carry argv / envp via a fixed well-known cap or rodata blob. Confirm gate-on-LaunchParameters versus ship-stub.
  11. Linker / toolchain for C consumers. Recommended: clang --target=x86_64-unknown-none-elf -nostdlib -static, link against libcapos.a (and optionally libcapos-posix.a), reuse the existing capos-rt linker script. Confirm clang vs gcc and whether the track ships a shared cc-glue Cargo crate or a Make rule invoking cc directly.
  12. Vendoring policy. In-tree vendor/dash/, vendor/dns-c-wahern/ versus out-of-tree submodule versus separate repo. Working answer: in-tree vendoring with pinned tags, mirroring the planned vendor/piccolo-no_std/ shape from the Lua track.
  13. Audit / measure-mode interaction. The libcapos-posix wrappers must not break measure mode (the measure feature). Most wrappers only call libcapos, which only calls capos-rt, which is already measure-mode-clean, so this should be free; confirm whether the track adds a make run-measure smoke for one libcapos-posix binary as a regression gate.
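Question 9's recommended pipe design can be prototyped on the host before it moves into kernel/src/cap/pipe.rs. The sketch below is a single-threaded model of the ring semantics only, assuming a hypothetical `PipeRing` type and Linux errno numbers: partial writes when the ring is nearly full, EOF (a zero-length read) once the writer half is closed and the ring is drained, and EPIPE once the reader half is gone. Where it returns EAGAIN the kernel version would presumably park the caller instead; cross-process synchronisation, the MemoryObject page backing, and SIGPIPE (signals are out of scope) are not modelled. A page-sized ring would use `PipeRing::new(4096)`.

```rust
pub const EAGAIN: i32 = 11; // Linux x86-64 numbers, assumed
pub const EPIPE: i32 = 32;

/// Bounded byte ring shared by one PipeWriter half and one PipeReader half.
pub struct PipeRing {
    buf: Box<[u8]>,
    head: usize, // index of the next byte to read
    len: usize,  // bytes currently buffered
    writer_open: bool,
    reader_open: bool,
}

impl PipeRing {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "pipe capacity must be non-zero");
        PipeRing {
            buf: vec![0u8; capacity].into_boxed_slice(),
            head: 0,
            len: 0,
            writer_open: true,
            reader_open: true,
        }
    }

    /// Copy in as many bytes as fit. EPIPE once the reader is gone,
    /// EAGAIN when the ring is full.
    pub fn write(&mut self, data: &[u8]) -> Result<usize, i32> {
        if !self.reader_open {
            return Err(EPIPE);
        }
        let cap = self.buf.len();
        let n = data.len().min(cap - self.len);
        if n == 0 && !data.is_empty() {
            return Err(EAGAIN);
        }
        for (i, &byte) in data[..n].iter().enumerate() {
            self.buf[(self.head + self.len + i) % cap] = byte;
        }
        self.len += n;
        Ok(n)
    }

    /// Copy out up to `out.len()` bytes. Ok(0) from an empty ring means EOF:
    /// the writer is closed and nothing is left to drain.
    pub fn read(&mut self, out: &mut [u8]) -> Result<usize, i32> {
        if self.len == 0 {
            return if self.writer_open { Err(EAGAIN) } else { Ok(0) };
        }
        let cap = self.buf.len();
        let n = out.len().min(self.len);
        for (i, slot) in out[..n].iter_mut().enumerate() {
            *slot = self.buf[(self.head + i) % cap];
        }
        self.head = (self.head + n) % cap;
        self.len -= n;
        Ok(n)
    }

    pub fn close_writer(&mut self) {
        self.writer_open = false;
    }

    pub fn close_reader(&mut self) {
        self.reader_open = false;
    }
}
```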

Relationship to Other Proposals

  • Userspace Binaries owns the broader native-binary, language, and POSIX-adapter roadmap. This proposal supersedes Part 4 of that proposal with the full POSIX adapter design.
  • Programming Languages is the reader-facing summary of language support. Its C and POSIX rows will cross-link this proposal once the libcapos C-substrate v0 task lands the corresponding row updates; until then, this proposal stands as the long-form design source.
  • Networking defines NetworkManager, TcpListener, and TcpSocket and defers UDP. The DNS resolver port in Phase P1.2 adds the UdpSocket cap surface; the TCP cap surface is reused unchanged.
  • Storage and Naming defines the Directory / File / Store / Namespace surfaces that the shell port consumes. Phase 2/3 of that proposal gates the dash file I/O surface.
  • Service Architecture defines the future Resolver cap that the resolver port eventually exports.
  • Shell covers the native capos-shell. The POSIX shell port is for porting validation and does not replace capos-shell.
  • WASI Host Adapter is the parallel untrusted-portable execution path. POSIX adapter targets trusted source-recompiled C; WASI adapter targets sandboxed wasm modules. Both share the per-process fd-table and per-import authority pattern.
  • Lua Scripting is the capability-scoped trusted-script path; PUC Lua’s native build assumes a C substrate, so it eventually consumes libcapos.