Proposal: POSIX Compatibility Adapter
How capOS should host POSIX-shaped C software without recreating the ambient authority that makes POSIX hard to confine, and which two ports validate the adapter for the first time.
Problem
capOS is not POSIX and is not trying to become POSIX. But useful software – DNS resolvers, line-editing libraries, shells, archivers, compilers, network clients – assumes a POSIX surface. Rewriting each of these in capability-native Rust would forfeit decades of debugging, security review, and performance work for no isolation gain: a POSIX program whose only authority is a typed capability set is already as confined as an equivalent native one.
The risk pattern is the one POSIX historically gets wrong: a translation layer
that synthesises ambient authority (a global /, an inherited credential
table, a kernel-managed file descriptor map) rebuilds the property capOS is
trying to leave behind. A useful adapter must do the opposite – every POSIX
call must be backed by a typed capability the calling process already holds,
or it must fail closed with a documented errno.
Two upstream programs are the natural first validators of that adapter:
- A POSIX shell exercises the broadest surface (process, pipe, file, env, signal stubs, stdio).
- A DNS resolver exercises the smallest network surface (UDP socket, one-shot poll-equivalent, time, log).
Both are already small, mature, and BSD/MIT-licensed. Picking the smallest representative of each category makes the adapter’s first job a real port, not a synthetic test.
Scope
In scope:
- A two-layer C substrate: libcapos (thin Rust staticlib, capability ring + CapSet + raw syscalls + heap, C ABI) and libcapos-posix (POSIX shape on top: fd table, errno, path resolution, posix_spawn shim, signal stubs, pthread mapping).
- A first POSIX shell port that builds against libcapos-posix with no hidden ambient authority.
- A first DNS resolver port that builds against libcapos-posix with no hidden ambient authority.
- Phase decomposition (P1.1, P1.2, P1.3) that defers the adapter’s biggest dependencies (Namespace + File caps for the shell file path; UDP cap for the resolver) into clearly-named gating phases.
- Validation through QEMU smokes that prove granted and ungranted paths.
Out of scope for the first implementation:
- Binary compatibility with Linux ELFs. Both ports are sources-on-disk recompiled against libcapos-posix.
- Full POSIX compliance. The adapter ships exactly the surface dash and dns.c exercise, plus any free additions that fall out.
- Real fork() (parent state inheritance, COW, sibling address-space surgery before exec). Only fork() followed promptly by execve() is supported, via a posix_spawn-shaped shim.
- Real signal delivery. signal()/sigaction() accept the call, store the handler, never invoke it. kill(2) requires a future ProcessHandle cap.
- Job control, process groups, sessions, controlling terminals.
- musl, glibc, or any other host libc. The substrate is Rust-authored and exposes a C ABI; it is not a libc port.
- Hosted C++. ABI decisions for C++ remain tracked in docs/proposals/userspace-binaries-proposal.md.
Current Manual Pages
- Programming Languages summarizes POSIX adapter status relative to Rust, C/C++, Python, Go, Lua, and WASI tracks.
- Userspace Binaries Part 4 sketches the POSIX adapter at a higher level. This proposal supersedes that sketch with the full design surface; the userspace-binaries proposal continues to own the broader native-binary, language, and adapter roadmap.
- Userspace Runtime documents the implemented capos-rt surface that libcapos mirrors for C consumers.
- Networking defines NetworkManager, TcpListener, and TcpSocket and explicitly defers UdpSocket until DNS / userspace-network work needs it. The DNS resolver port in this proposal defines the UDP cap surface; the TCP cap surface is reused unchanged.
- Storage and Naming defines the Namespace, Directory, File, and Store cap shape; these gate the shell port’s filesystem surface (Phase 2/3 of that proposal).
- Service Architecture frames the future Resolver cap as the long-term consumer of the resolver process built in this track.
- Shell covers the native capos-shell. The POSIX shell port (dash) is for porting validation, not as a replacement for the native shell.
- WASI Host Adapter is the parallel untrusted-portable execution path; both proposals share fd-table and per-import authority insight, but target different substrates.
Research Grounding
Relevant research and external references:
- POSIX shell candidates surveyed: dash (Debian Almquist Shell, ~13 kSLOC, BSD; the canonical small POSIX-strict shell); busybox ash; OpenBSD ksh (oksh); toybox toysh. Source repositories cited inline in the candidate comparison table.
- DNS resolver candidates surveyed: dns.c by William Ahern (single-file MIT, ~10 kSLOC, no dependencies); c-ares; GNU adns; udns; SPCDNS; musl’s embedded res_query; trust-dns-resolver. Source repositories cited inline in the candidate comparison table.
- libcapos prior art: this proposal builds on the libcapos shape sketched in Userspace Binaries “Future: C via libcapos” / “Future Phase: libcapos for C”. The C substrate is designed as a Rust staticlib with a C ABI rather than musl, redox relibc, or a hand-rolled libc. Fuchsia’s fdio + musl pattern and Redox’s relibc pattern are the comparable points; capOS deliberately picks neither.
- POSIX surface translation: Cygwin’s fork() emulation is the closest prior art for fork-for-exec semantics on top of a non-fork substrate; the capOS shim inverts the default (capOS cannot fork; the shim emulates the useful case) but uses the same call-pattern recognition.
In-tree research grounding:
- Genode – per-session typed service interfaces and resource accounting are the closest precedent for routing every POSIX wrapper through a typed cap rather than through an ambient kernel syscall table. POSIX adapter wrappers should follow the same pattern at the library boundary instead of the kernel boundary.
- OS Error Handling – cross-OS comparison of error-model surfaces. Informs the bidirectional mapping between CapError/CapException and POSIX errno (Open Question §4) and the decision to keep one shared mapping table at the C boundary rather than per-wrapper bespoke mappings.
- LLVM Target – target triple, calling convention, and bare-metal toolchain options for capOS C consumers. Informs Open Question §11 on the linker / toolchain choice (clang --target=x86_64-unknown-none-elf -nostdlib -static).
This proposal also lifts the capability-mapping shape and the “every
translation has authority backing” property from the WASI host adapter
proposal, and the libcapos staticlib shape from the userspace-binaries
proposal Part 2. It deliberately does not adopt the musl + __syscall
hook pattern noted in the userspace-binaries proposal “musl as a Base
(Optional, Later)” section, because the layered Rust staticlib shape is
preferred over a libc port for the v0 surface.
External:
- dash – Debian Almquist Shell, ~13 kSLOC, Debian’s /bin/sh since Squeeze (2011).
- busybox ash – alternative Almquist port, embedded.
- oksh – portable OpenBSD ksh, public domain, larger surface.
- toybox toysh – 0BSD, currently incomplete.
- c-ares – modern async DNS resolver, MIT, larger.
- dns.c – single-file non-blocking DNS, MIT, no deps.
- GNU adns – async DNS resolver, GPL-2.0+.
- musl resolver – embedded in musl libc; not available without linking musl.
- udns – small async stub-only resolver, LGPL-2.1.
Design Principles
- POSIX is not a kernel feature. The kernel sees ordinary userspace processes with a CapSet and a capability ring. libcapos and libcapos-posix are static libraries linked into those processes.
- Two layers, one C ABI per layer. libcapos is the C-ABI mirror of capos-rt: capability ring, CapSet, raw syscalls, heap. It has no errno, no fd table, no open/read/write. libcapos-posix builds the POSIX shape on top. Programs that do not need POSIX semantics may link only libcapos.
- Authority is per-process, granted at spawn. Every fd a POSIX program sees was granted to its parent process at spawn time and projected onto an fd by libcapos-posix. There is no ambient /, no inherited credential table, no global signal source.
- Schema-first, not POSIX-first, at the boundary. Each POSIX wrapper is backed by a typed capability call with a documented errno mapping. POSIX-shaped integer fds and POSIX-shaped errno are an ABI requirement of the C substrate, not a capability-model concession.
- Fail closed. Any unimplemented POSIX call fails with errno set to ENOSYS. Any cap lookup that fails returns the documented errno. Programs cannot probe absent caps for ambient behaviour.
- No fork without exec. Only fork() followed by execve() is supported. The shim turns the pair into posix_spawn(). Bare fork() used to clone state in-process fails on the next non-trivial syscall.
- No real signals. Handlers are accepted and stored, never delivered. kill(2) requires a future ProcessHandle cap and even then is limited to SIGKILL. Programs that depend on SIGCHLD job control are out of scope.
- The C substrate is Rust. libcapos and libcapos-posix are Rust crates with crate-type = ["staticlib"], all symbols #[no_mangle] extern "C". This is not musl, not a hand-rolled libc.
Architecture
flowchart TD
Shell["POSIX shell binary<br/>(e.g. dash)"]
Resolver["DNS resolver binary<br/>(e.g. dns.c)"]
Posix["libcapos-posix<br/>(POSIX adapter, Rust staticlib, C ABI)"]
PosixDetail["fd table per process<br/>path resolver over Namespace + Store<br/>errno mapping (TLS cell)<br/>posix_spawn over ProcessSpawner<br/>signal stubs<br/>pthread over ThreadSpawner"]
Posix --> PosixDetail
Capos["libcapos<br/>(thin Rust staticlib, C ABI)"]
CaposDetail["cap_call / capset_get / capset_iter<br/>sys_exit / sys_cap_enter<br/>heap (malloc/free over capos-rt allocator)<br/>typed wrappers for Console / Terminal / etc."]
Capos --> CaposDetail
Rt["capos-rt<br/>(no_std + alloc Rust)"]
Ring["capability ring"]
Kernel["kernel CapObject dispatch"]
Services["userspace services"]
Shell -->|"open/read/write/exec/..."| Posix
Resolver -->|"socket/sendto/recvfrom"| Posix
Posix -->|"extern C"| Capos
Capos -->|"Rust FFI re-export"| Rt
Rt --> Ring
Ring --> Kernel
Ring --> Services
libcapos is the C-ABI projection of capos-rt. libcapos-posix is the
POSIX projection on top. Every POSIX call ultimately resolves to either a
capability invocation through the ring or a synthetic answer (errno,
ENOSYS) computed without authority.
libcapos: C-Facing Substrate
Headers expected to ship under include/capos/:
// capos.h -- capability primitives only
#include <stddef.h>  // size_t
#include <stdint.h>  // uint16_t, uint32_t, uint64_t

typedef struct cap_ring cap_ring_t;
typedef uint32_t cap_id_t;
typedef uint64_t iface_id_t;

cap_ring_t *capos_ring(void); // process ring handle

int cap_call(cap_ring_t *ring,
             cap_id_t cap, uint16_t method,
             const void *params, size_t plen,
             void *result, size_t rlen,
             size_t *out_len);

int capset_get(const char *name,
               cap_id_t *out_cap, iface_id_t *out_iface);
size_t capset_iter(void (*cb)(const char*, cap_id_t, iface_id_t,
                              void*), void *ud);

_Noreturn void sys_exit(int code);
uint32_t sys_cap_enter(uint32_t min_complete, uint64_t timeout_ns);

// Heap (backed by capos-rt fixed heap; grow-on-demand later if needed)
void *capos_malloc(size_t);
void capos_free(void*);
void *capos_calloc(size_t, size_t);
void *capos_realloc(void*, size_t);
There is no errno here, no open/read/write. Those live one
layer up. libcapos is the C-ABI mirror of capos-rt: startup, ring,
CapSet, raw syscalls, heap.
Build artifact: target/.../libcapos.a plus headers. Naming for the C
library is intentionally just libcapos, mirroring how the Rust
runtime crate is capos-rt. The C library name libcapos is
distinct from any Rust service framework that may carry a similar name;
this proposal owns the C-substrate name and treats Rust-framework
naming as out of scope.
libcapos-posix: POSIX Surface
Headers under include/capos/posix/: unistd.h, fcntl.h, errno.h,
sys/socket.h, netdb.h, sys/stat.h, dirent.h, string.h, stdlib.h
(subset), sys/types.h, pthread.h (subset), signal.h (stub).
Implementation language: Rust, same crate-type pattern as libcapos,
but linked separately so a binary that does not need POSIX can omit it.
Errno bridge: per-thread errno cell stored in TLS slot owned by
libcapos-posix; populated by every wrapper that maps a Rust CapError to
a POSIX errno value. See “errno Convention” below.
File descriptor table
Per-process userspace state inside libcapos-posix. Not a kernel object –
neither libcapos nor the kernel know anything about fds.
// libcapos-posix/src/fd.rs (sketch)
struct FdEntry {
    backing: FdBacking, // Console / Stream / Listener / File / Dir
    flags: i32,         // O_NONBLOCK, FD_CLOEXEC, ...
    cursor: u64,        // for seekable backings
}

enum FdBacking {
    Stdin,  // Console / TerminalSession (read side)
    Stdout, // Console (write side)
    Stderr, // Console (write side)
    File { file: Cap<File>, dirty: bool },
    Dir { dir: Cap<Directory>, iter: usize },
    Tcp { sock: Cap<TcpSocket> },
    Udp { sock: Cap<UdpSocket> },
    Listener { l: Cap<TcpListener> },
}

static FD_TABLE: Mutex<BTreeMap<i32, FdEntry>> = ...;
static NEXT_FD: AtomicI32 = AtomicI32::new(3);
dup/dup2/close operate on this table. dup increments a refcount on
the underlying cap; close releases when the last fd holding the cap drops.
Cap drop runs through capos-rt owned-handle release. The fd table is a
strict per-process userspace structure; it is not shared with the kernel
and is never serialised on the wire.
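The dup/close behaviour described above can be modelled in a few lines. This is a minimal single-threaded sketch under stated assumptions, not the libcapos-posix implementation: CapHandle is a hypothetical stand-in for an owned capability, Rc plays the role of the refcount (the real table sits behind a Mutex), EBADF uses Linux numbering, and fds are allocated lowest-free as POSIX requires rather than from the NEXT_FD counter in the sketch above.

```rust
use std::collections::BTreeMap;
use std::rc::Rc;

pub const EBADF: i32 = 9; // Linux numbering, assumed for illustration

/// Hypothetical stand-in for an owned capability handle. Dropping the
/// last Rc models the capos-rt owned-handle release.
#[derive(Debug)]
pub struct CapHandle(pub u32);

#[derive(Default)]
pub struct FdTable {
    entries: BTreeMap<i32, Rc<CapHandle>>,
}

impl FdTable {
    /// Install a cap at the lowest free fd number.
    pub fn install(&mut self, cap: Rc<CapHandle>) -> i32 {
        let fd = (0i32..)
            .find(|fd| !self.entries.contains_key(fd))
            .unwrap();
        self.entries.insert(fd, cap);
        fd
    }

    /// dup(): a second fd sharing the same cap (refcount + 1).
    pub fn dup(&mut self, old: i32) -> Result<i32, i32> {
        let cap = self.entries.get(&old).cloned().ok_or(EBADF)?;
        Ok(self.install(cap))
    }

    /// dup2(): like dup, but onto a chosen fd, replacing whatever was there.
    pub fn dup2(&mut self, old: i32, new: i32) -> Result<i32, i32> {
        if new < 0 {
            return Err(EBADF);
        }
        let cap = self.entries.get(&old).cloned().ok_or(EBADF)?;
        if old != new {
            self.entries.insert(new, cap); // drops the previous entry, if any
        }
        Ok(new)
    }

    /// close(): drop this fd's reference; the cap is released with the last one.
    pub fn close(&mut self, fd: i32) -> Result<(), i32> {
        self.entries.remove(&fd).map(drop).ok_or(EBADF)
    }
}
```

Because the table owns only reference-counted handles, closing one of several duplicated fds leaves the underlying cap alive, which is the property the shell relies on when it redirects and restores stdio.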
Standard fds wired at _start:
- fd 0: stdin cap from CapSet (TerminalSession, Console, or future StdinReader-shaped cap, whichever is granted).
- fd 1: stdout Console cap.
- fd 2: stderr Console cap (or distinct Log cap if granted).
Process model: fork-for-exec only
capOS process creation is ProcessSpawner.spawn(name, binaryName, grants)
(kernel/src/cap/process_spawner.rs). There is no fork(), no
exec()-in-place.
Decision matrix (working answers; the policy choice is Open Question §6 and is not settled until that question is confirmed):
| Option | What it provides | Cost | Working answer |
|---|---|---|---|
| Emulate fork() as posix_spawn with inherited cap-set, recording inter-call dup2/close as posix_spawn file actions | Existing fork+exec and fork+dup2+exec pipeline patterns work with one patch site | Daemonisation and arbitrary COW state inheritance between fork and exec still break | Recommended primary for the shell, with documented “fork-for-exec only” semantics. Whether the shim records inter-call file actions or requires the port to call posix_spawn with explicit file actions is Open Question §6. |
| Return ENOSYS for any fork() | Honest | Every POSIX program that uses fork must be patched | Recommended safety net when fork-for-exec is misused |
| Process-shadow: a “POSIX process” wraps a capOS process | General | Large kernel + runtime change; doubles process accounting | Recommended reject for v0; revisit only if a real POSIX program needs it |
Working answer: fork-for-exec, with hard-fail as the safety net (subject to
Open Question §6 confirmation before P1.3 begins). Two libcapos-posix
shim variants are on the table; §6 selects between them:
- Variant A – recording shim. libcapos-posix exposes fork() and execve() as a coupled shim:
  - fork() records “next exec is the real spawn” in TLS, returns 0 in the “child” pseudo-context (still in parent address space).
  - dup2()/close() calls between fork() and execve() are recorded as posix_spawn file actions on the pending spawn rather than mutating the parent’s fd table.
  - execve(path, argv, envp) consumes the recorded intent, calls ProcessSpawner.spawn() with attenuated grants and the recorded file actions, returns the “child” PID to the parent path.
  - Any fork() not followed by execve() before a syscall outside the recorded-action allowlist (e.g. setsid) returns -1 / ENOSYS on that downstream call.
- Variant B – patched-port shim. libcapos-posix exposes only posix_spawn() with explicit file actions, plus stub fork()/execve() that return -1 / ENOSYS. Each port (dash and successors) is patched to translate its fork+dup2+exec sequence into a single posix_spawn() call with the equivalent file actions.
posix_spawn() is the preferred primitive in either variant and gets a
direct mapping to ProcessSpawner.spawn(). The choice between Variant
A and Variant B is Open Question §6.
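The Variant A recording behaviour can be sketched as a small state machine. This is an illustrative model only: ForkShim, FileAction, and SpawnRequest are hypothetical names, the resulting SpawnRequest stands in for the arguments that would be handed to ProcessSpawner.spawn(), ENOSYS uses Linux numbering, and returning ENOSYS for an execve() with no pending fork() is an assumption drawn from capOS having no exec-in-place.

```rust
pub const ENOSYS: i32 = 38; // Linux numbering, assumed for illustration

#[derive(Debug, Clone, PartialEq)]
pub enum FileAction {
    Dup2 { from: i32, to: i32 },
    Close(i32),
}

/// Hypothetical stand-in for what would be passed to ProcessSpawner.spawn().
#[derive(Debug, PartialEq)]
pub struct SpawnRequest {
    pub path: String,
    pub argv: Vec<String>,
    pub actions: Vec<FileAction>,
}

/// Per-thread shim state: `Some` between fork() and execve().
#[derive(Default)]
pub struct ForkShim {
    pending: Option<Vec<FileAction>>,
}

impl ForkShim {
    /// fork(): start recording and return 0, as in the "child" pseudo-context.
    pub fn fork(&mut self) -> i32 {
        self.pending = Some(Vec::new());
        0
    }

    /// dup2()/close() while a fork is pending are recorded, not applied.
    /// Returns false when nothing is pending, so the caller should apply
    /// the action to the real fd table instead.
    pub fn record(&mut self, action: FileAction) -> bool {
        match self.pending.as_mut() {
            Some(actions) => {
                actions.push(action);
                true
            }
            None => false,
        }
    }

    /// execve(): consume the recorded intent into a single spawn request.
    pub fn execve(&mut self, path: &str, argv: &[&str]) -> Result<SpawnRequest, i32> {
        let actions = self.pending.take().ok_or(ENOSYS)?;
        Ok(SpawnRequest {
            path: path.to_string(),
            argv: argv.iter().map(|s| s.to_string()).collect(),
            actions,
        })
    }

    /// Any call outside the recorded-action allowlist fails closed while
    /// a fork is pending.
    pub fn check_other_call(&self) -> Result<(), i32> {
        if self.pending.is_some() {
            Err(ENOSYS)
        } else {
            Ok(())
        }
    }
}
```

Variant B corresponds to dropping fork() and record() from this model and having the port build the action list itself before calling posix_spawn().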
Signals
Stubbed. capOS has no signal mechanism today and the cap model disagrees with ambient asynchronous interrupts.
- signal()/sigaction() accept the call, store the handler in a per-process table, never invoke it. Return success.
- kill(pid, sig) returns -1 / EPERM unless the caller has a ProcessHandle cap for the target – and even then the only signal honoured is SIGKILL, which maps to a future ProcessHandle.kill() (not implemented yet, returns ENOSYS today).
- pause()/sigsuspend()/sigwait() block forever (or with timeout) via sys_cap_enter(0, timeout); they never wake from a signal.
- SIGPIPE is never delivered. Writes on a closed connection return -1 / EPIPE.
This is acceptable for a shell + DNS resolver. Anything that depends on
real signals (job control with Ctrl-Z, Ctrl-C across pipelines, real
SIGCHLD) is out of scope for the first port. Job control in the shell
must be reimplemented over typed control caps, not signals.
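The stub semantics for signal() and kill() are simple enough to state as code. A minimal sketch, with hypothetical names (SignalStubs, the has_process_handle flag) and Linux errno/signal numbering assumed; the errno for a non-SIGKILL signal sent with a ProcessHandle is not specified above and EPERM is assumed here.

```rust
use std::collections::BTreeMap;

pub type Handler = fn(i32);

// Linux numbering, assumed for illustration.
pub const SIGKILL: i32 = 9;
pub const EPERM: i32 = 1;
pub const ENOSYS: i32 = 38;

#[derive(Default)]
pub struct SignalStubs {
    handlers: BTreeMap<i32, Handler>,
}

impl SignalStubs {
    /// signal(): store the handler and return the previous one, if any.
    /// Nothing in this type ever calls a stored handler.
    pub fn signal(&mut self, sig: i32, handler: Handler) -> Option<Handler> {
        self.handlers.insert(sig, handler)
    }

    /// kill(): EPERM without a ProcessHandle for the target. With one,
    /// only SIGKILL is honoured, and it reports ENOSYS until
    /// ProcessHandle.kill() exists.
    pub fn kill(&self, has_process_handle: bool, sig: i32) -> Result<(), i32> {
        if !has_process_handle || sig != SIGKILL {
            return Err(EPERM); // assumption: other signals are also EPERM
        }
        Err(ENOSYS)
    }
}
```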
errno convention
Per-thread errno cell in TLS owned by libcapos-posix. Mapping table
(libcapos-posix/src/errno_map.rs):
| capOS CapError / CapException | POSIX errno |
|---|---|
| CapError::NotFound | ENOENT |
| CapError::PermissionDenied | EACCES |
| CapError::Disconnected | ECONNRESET |
| CapError::Timeout | ETIMEDOUT |
| CapError::ResourceExhausted | ENOMEM / EMFILE (context dependent) |
| CapError::InvalidArgument | EINVAL |
| CapError::WouldBlock | EAGAIN |
| (fall-through) | EIO |
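Wrappers always: clear errno, call, and on error set errno and return -1 (int) or NULL (pointer). The -1 / NULL plus errno return convention matches glibc / musl; the clear-on-entry step is capOS-specific, since POSIX functions never reset errno to 0.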
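The mapping table and the wrapper convention fit in one small module. This is a sketch under stated assumptions rather than the contents of errno_map.rs: the CapError enum is re-declared locally with a hypothetical Other variant for the fall-through row, errno values use Linux numbering, and the “context dependent” ENOMEM / EMFILE choice is modelled with a hypothetical allocating_fd flag.

```rust
use std::cell::Cell;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CapError {
    NotFound,
    PermissionDenied,
    Disconnected,
    Timeout,
    ResourceExhausted,
    InvalidArgument,
    WouldBlock,
    Other, // hypothetical: anything not listed in the table
}

// Linux errno numbering, assumed for illustration.
pub const ENOENT: i32 = 2;
pub const EIO: i32 = 5;
pub const EAGAIN: i32 = 11;
pub const ENOMEM: i32 = 12;
pub const EACCES: i32 = 13;
pub const EINVAL: i32 = 22;
pub const EMFILE: i32 = 24;
pub const ECONNRESET: i32 = 104;
pub const ETIMEDOUT: i32 = 110;

thread_local! {
    /// Per-thread errno cell.
    static ERRNO: Cell<i32> = Cell::new(0);
}

pub fn errno() -> i32 {
    ERRNO.with(|e| e.get())
}

/// One shared mapping; `allocating_fd` selects EMFILE over ENOMEM.
pub fn errno_for(err: CapError, allocating_fd: bool) -> i32 {
    match err {
        CapError::NotFound => ENOENT,
        CapError::PermissionDenied => EACCES,
        CapError::Disconnected => ECONNRESET,
        CapError::Timeout => ETIMEDOUT,
        CapError::ResourceExhausted if allocating_fd => EMFILE,
        CapError::ResourceExhausted => ENOMEM,
        CapError::InvalidArgument => EINVAL,
        CapError::WouldBlock => EAGAIN,
        CapError::Other => EIO,
    }
}

/// Wrapper convention: clear errno, call, on error set errno and return -1.
pub fn wrap(call: impl FnOnce() -> Result<i32, CapError>, allocating_fd: bool) -> i32 {
    ERRNO.with(|e| e.set(0));
    match call() {
        Ok(value) => value,
        Err(err) => {
            ERRNO.with(|e| e.set(errno_for(err, allocating_fd)));
            -1
        }
    }
}
```

Keeping the match in one function is what makes the “one shared mapping table” decision enforceable: a wrapper cannot invent its own errno without bypassing wrap().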
Threading
pthreads -> capOS in-process threading. Substrate already exists in the
kernel: ThreadSpawner, ThreadControl, ThreadHandle, per-thread
FS-base, ParkSpace.
Mapping:
- pthread_create -> ThreadSpawner.spawn + start-routine trampoline.
- pthread_exit -> ThreadControl.exitThread.
- pthread_join -> ThreadHandle.join (block via cap_enter).
- pthread_self -> TLS slot or ThreadControl.currentId.
- pthread_mutex_* -> ParkSpace-backed mutex (futex-style park / unpark).
- pthread_cond_* -> ParkSpace + bounded waiter queue.
- pthread_key_* -> fixed-size TLS slot table per thread.
This is in scope but not on the critical path for the shell or DNS resolver – both can run single-threaded for v0. The pthread shim is deferred to a v1 successor.
First Port: POSIX Shell
Candidate survey
| Shell | License | Size | Deps | POSIX coverage | Verdict |
|---|---|---|---|---|---|
| dash (upstream) | BSD | ~13 kSLOC, ~134 KB | tiny libc subset; no readline; no termcap | Strict POSIX, no extensions | Recommended primary |
| busybox ash (upstream) | GPL-2.0 | ~8 kSLOC of shell/ash.c + busybox infra | Designed for embedded, modular | POSIX + selectable extensions | Heavier framework cost; useful later when capOS wants a coreutils set |
| toybox toysh (upstream) | 0BSD | currently incomplete | Designed for self-contained ELF | POSIX + Bash compat target, not finished | Skip – explicitly described upstream as still under development |
| oksh (upstream) | Public domain | ~308 KB binary, 0 deps | Optional ncurses for clear-screen only | Korn-shell superset of POSIX | Bigger surface than v0 needs to validate libcapos-posix |
| Custom Rust shell | n/a | n/a | n/a | n/a | Reject – defeats the purpose of porting C. Native shell already exists at shell/ (capos-shell). |
Recommended primary: dash.
Reasons:
- Smallest established POSIX-strict shell. ~13 kSLOC is small enough for the porting team to read the entire codebase.
- No readline / termcap dependency. The shell talks to whatever fd 0 gives it. This is exactly what libcapos-posix provides through TerminalSession or Console.
- Strict POSIX means the port does not accidentally validate Bash extensions that libcapos-posix does not implement.
- Already proven as a porting target on Linux from Scratch, OpenWrt, and Alpine. Patterns for replacing the libc layer (__syscall, stubbed sigaction) are well documented.
- Debian has used it as /bin/sh since Squeeze (2011), so any “POSIX shell only” script base in the wild is dash-compatible.
Open Question §1 below records that the candidate is a recommendation, not a final decision.
Required POSIX surface (v0)
What a dash instance actually exercises before printing a prompt and
running ls | grep foo:
| Group | Calls (minimum set) | Backed by |
|---|---|---|
| Process startup | _start shim, argv/envp parsing, exit | libcapos _start, sys_exit |
| Stdio | read(0,...), write(1,...), write(2,...) | Console / TerminalSession cap |
| Allocation | malloc/free/calloc/realloc | libcapos heap |
| String/format | printf/fprintf/memcpy/strlen/strcmp/strchr/strncpy/… | libcapos-posix string/printf subset |
| File I/O | open/close/read/write/lseek/stat/fstat/access/unlink | Namespace + File caps |
| Directory | opendir/readdir/closedir | Directory cap |
| Pipes | pipe(), dup2(), close() on fds | NEW Pipe capability (P1.3) |
| Process | fork+execve (fork-for-exec only), posix_spawn, wait/waitpid | ProcessSpawner + ProcessHandle.wait |
| Env | getenv/setenv/putenv | Per-process env vector in libcapos-posix; populated from a future LaunchParameters cap when one lands |
| Signals | signal/kill/sigaction (stubs) | Per-process handler table, never delivered |
| Time | time/gettimeofday/nanosleep | Timer cap |
| Misc | getpid/getuid/getgid | Synthetic per-process; uid/gid hardcoded for v0 |
Critical gap: pipe(). The shell pipeline ls | grep foo requires fd 1
of ls to feed fd 0 of grep. capOS has no pipe capability today. This is
the first-port-blocking item; see Phase P1.3.
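Phase P1.3 describes the missing primitive as a bounded byte ring with EOF semantics on close. The semantics the shell depends on can be modelled as follows. This is a single-address-space sketch with hypothetical names, not the kernel object: a VecDeque stands in for the MemoryObject-backed ring, both ends are non-blocking, EAGAIN for a full or empty ring is an assumption, and errno values use Linux numbering.

```rust
use std::collections::VecDeque;

// Linux numbering, assumed for illustration.
pub const EAGAIN: i32 = 11;
pub const EPIPE: i32 = 32;

pub struct Pipe {
    buf: VecDeque<u8>,
    capacity: usize,
    writer_open: bool,
    reader_open: bool,
}

impl Pipe {
    pub fn new(capacity: usize) -> Self {
        Pipe {
            buf: VecDeque::with_capacity(capacity),
            capacity,
            writer_open: true,
            reader_open: true,
        }
    }

    /// Write as many bytes as fit. EPIPE once the reader is gone,
    /// EAGAIN when the ring is full.
    pub fn write(&mut self, data: &[u8]) -> Result<usize, i32> {
        if !self.reader_open {
            return Err(EPIPE);
        }
        let room = self.capacity - self.buf.len();
        if room == 0 && !data.is_empty() {
            return Err(EAGAIN);
        }
        let n = room.min(data.len());
        self.buf.extend(&data[..n]);
        Ok(n)
    }

    /// Read up to out.len() bytes. On an empty ring, Ok(0) means EOF
    /// (writer closed and ring drained); otherwise EAGAIN.
    pub fn read(&mut self, out: &mut [u8]) -> Result<usize, i32> {
        if self.buf.is_empty() {
            return if self.writer_open { Err(EAGAIN) } else { Ok(0) };
        }
        let n = out.len().min(self.buf.len());
        for (slot, byte) in out.iter_mut().zip(self.buf.drain(..n)) {
            *slot = byte;
        }
        Ok(n)
    }

    pub fn close_writer(&mut self) {
        self.writer_open = false;
    }

    pub fn close_reader(&mut self) {
        self.reader_open = false;
    }
}
```

In the ls | grep foo case, grep sees Ok(0) only after ls has exited and its write end has been dropped, which is why EOF-on-close has to be part of the Pipe interface rather than a libcapos-posix convention.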
What dash will not get in v0
- Job control (Ctrl-Z, bg, fg, & background): requires real SIGCHLD/SIGTSTP. Skip; documented as out of scope.
- Process groups, sessions, controlling terminals: same reason.
- trap for signals other than EXIT: handlers stored, never fired.
- read -t (timeout): doable via Timer cap; defer to v1.
- ulimit: returns 0 / ENOSYS. Quotas are kernel-side capability ledgers, not POSIX rlimits.
Validation smoke
make run-posix-shell-smoke:
- Boot a manifest that grants dash a TerminalSession (stdio), a read-only Namespace cap rooted at a tiny in-rodata pseudo-fs, a ProcessSpawner narrowed to one allowed binary (ls-shim), and a Timer cap.
- Pipe a heredoc into stdin: ls; echo done.
- Assert kernel log shows done and clean exit.
Stretch goal smoke: cat foo | grep bar end-to-end (depends on the pipe
primitive landing).
First Port: DNS Resolver
Candidate survey
| Library | License | Source size | Deps | Async style | Verdict |
|---|---|---|---|---|---|
| musl res_query (upstream) | MIT | ~2 kSLOC for resolver core | Embedded in musl | Synchronous (parallel queries internally) | Available only if the build links musl; capOS does not. Skip. |
| c-ares (upstream) | MIT, C89 | ~30+ kSLOC, multi-file, configure-driven | POSIX sockets, optional threads | Native async (callbacks + select/poll/event loop) | Largest surface, most mature, most invasive port |
| dns.c (wahern) (upstream) | MIT | single-file C, ~10 kSLOC, no deps | None – caller provides socket I/O via three pluggable patterns (pollfd / events / timeout) | Non-blocking, no required callback shape | Recommended primary |
| GNU adns (upstream) | GPL-2.0+ | Multi-file, ~10-15 kSLOC | POSIX, no event-loop integration | Async, opaque state | License is GPL-2.0+, not BSD/MIT. Skip unless capOS accepts a GPL component in the demo path. |
| udns (upstream) | LGPL-2.1 | small | POSIX | Async stub-only | LGPL plus older project; skip unless dns.c blows up |
| SPCDNS | LGPL | small | encode/decode only, no socket | n/a | Skip – provides no resolver loop |
| trust-dns-resolver in Rust | Apache-2 / MIT | large | Tokio | async | Reject – defeats the purpose of porting C. Native Rust resolver is a separate path. |
Recommended primary: dns.c by William Ahern.
Reasons:
- Single-file, zero deps. Drops into the build with a minimal cc rule. The build avoids configure scripts, pkg-config, optional feature matrices, and multi-file build orchestration.
- No fixed I/O model. dns.c is designed around three common methods (pollfd, events, timeout). The host adapter plugs capability-backed socket I/O without rewriting the resolver core, replacing socket()/sendto()/recvfrom()/poll() with libcapos-posix wrappers that return fd-shaped results backed by UdpSocket/TcpSocket caps.
- MIT license is capOS-compatible.
- ~10 kSLOC means port review can read it end-to-end.
- C89, no threading assumption, no global state surprises (resolver handle is opaque per-instance) – fits a single-process v0 design.
Open Question §2 below records that the candidate is a recommendation, not a final decision.
Required POSIX surface (v0)
The DNS resolver port exercises a very narrow POSIX subset:
| Group | Calls | Backed by |
|---|---|---|
| Stdio (logs only) | write(2,...) | Console cap |
| Allocation | malloc/free/calloc/realloc | libcapos heap |
| Time | clock_gettime/gettimeofday | Timer cap |
| Sockets (UDP) | socket(AF_INET, SOCK_DGRAM, 0), sendto, recvfrom, bind, close, setsockopt (subset) | NetworkManager + UdpSocket cap |
| Polling | poll(fds, nfds, timeout_ms) | Synthesised: each fd carries its underlying cap; libcapos-posix uses cap_enter(min_complete=1, timeout_ns) with one CQE per ready fd. No new kernel surface needed for v0 if dns.c uses one fd per query. |
| Resolv config | One in-rodata bounded text blob inlined into libcapos-posix (single nameserver entry; v0 ships before any storage cap exists) | No open / Namespace cap required for v0 |
No pipes, no fork, no exec, no signals, no /etc/resolv.conf-by-path,
no Namespace or File caps required. The DNS resolver is strictly easier
than the shell.
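The Polling row above describes poll() as a thin translation onto a single ring wait. A minimal sketch of that translation, with assumptions called out: the wait closure stands in for cap_enter(min_complete, timeout_ns) and is assumed to return the fds whose caps completed, u64::MAX is a hypothetical “block indefinitely” sentinel, and POLLIN uses the Linux value.

```rust
pub const POLLIN: i16 = 0x001; // Linux value, assumed for illustration

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PollFd {
    pub fd: i32,
    pub events: i16,
    pub revents: i16,
}

/// Convert poll()'s millisecond timeout to nanoseconds. A negative
/// timeout means "wait indefinitely", modelled here as u64::MAX.
pub fn timeout_ns(timeout_ms: i32) -> u64 {
    if timeout_ms < 0 {
        u64::MAX
    } else {
        timeout_ms as u64 * 1_000_000
    }
}

/// poll() over cap-backed fds. `wait` is a stand-in for
/// cap_enter(min_complete = 1, timeout_ns). Returns the number of fds
/// with non-zero revents, as poll(2) does.
pub fn poll(
    fds: &mut [PollFd],
    timeout_ms: i32,
    wait: impl FnOnce(u32, u64) -> Vec<i32>,
) -> i32 {
    let ready = wait(1, timeout_ns(timeout_ms));
    let mut count = 0;
    for p in fds.iter_mut() {
        p.revents = if ready.contains(&p.fd) {
            p.events & POLLIN
        } else {
            0
        };
        if p.revents != 0 {
            count += 1;
        }
    }
    count
}
```

With one fd per query, as the table assumes for dns.c, the ready list has at most one entry and the loop degenerates to a single comparison.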
The v0 surface intentionally omits TCP fallback for truncated responses
and intentionally omits any path-based config file. TCP fallback, when
added, would use socket(SOCK_STREAM), connect, send, recv
through the existing NetworkManager + TcpSocket cap, but only on a
later iteration once the v0 UDP-only smoke is green; see “What dns.c
will not get in v0” below.
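The in-rodata resolv config from the table above needs only a few lines of parsing. A minimal sketch, assuming a resolv.conf-style “nameserver <addr>” line; RESOLV_BLOB, MAX_BLOB_LEN, and the 256-byte bound are hypothetical, and 10.0.2.3 is the slirp DNS address used by the validation smoke.

```rust
use std::net::Ipv4Addr;

/// Hypothetical blob compiled into the library's read-only data.
pub const RESOLV_BLOB: &str = "# v0: single nameserver\nnameserver 10.0.2.3\n";

/// Hypothetical bound on the blob size.
pub const MAX_BLOB_LEN: usize = 256;

/// Return the first valid nameserver entry, or None if the blob is
/// oversized or contains no parseable entry.
pub fn nameserver(blob: &str) -> Option<Ipv4Addr> {
    if blob.len() > MAX_BLOB_LEN {
        return None;
    }
    blob.lines()
        .map(str::trim)
        .filter(|line| !line.starts_with('#'))
        .find_map(|line| {
            let mut parts = line.split_whitespace();
            match (parts.next(), parts.next()) {
                (Some("nameserver"), Some(addr)) => addr.parse().ok(),
                _ => None,
            }
        })
}
```

Because the blob is a constant, the resolver needs no open() and therefore no Namespace or File cap, which is what keeps the v0 grant set down to network, console, and timer.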
Critical gaps:
- UdpSocket capability. The networking proposal Phase B implements TCP + listener only; UDP “is deferred until the userspace network stack or DNS work needs it; it is not part of the Telnet Shell Demo contract” (networking-proposal.md). The resolver port creates the UDP path; it does not consume an existing one.
- The future Resolver cap concept (in service-architecture-proposal.md “DNS resolver – consumes a UdpSocket, exports Resolver”) is a target once the UDP path exists. The first port produces the exported shape.
What dns.c will not get in v0
- DNSSEC validation: dns.c supports it, depending on /etc/resolv.conf trust anchor config. Defer.
- TCP fallback for truncated responses: implement on a second iteration once the TCP capability path is reusable.
- mDNS: out of scope.
- Recursive mode (acting as a recursive resolver): out of scope; v0 ships stub-only.
Validation smoke
make run-posix-dns-smoke:
- Boot a manifest that grants the resolver process a NetworkManager (or future narrowed UdpSocket-only authority), a Console cap, and a Timer cap. The single-nameserver resolv config is the in-rodata bounded text blob compiled into libcapos-posix; no Namespace or File cap is needed for v0.
- The resolver opens a UDP socket, sends a query for a known A record to QEMU’s user-mode 10.0.2.3 (slirp’s built-in DNS) or to an in-host test resolver.
- Resolver prints the resolved IPv4 address.
- Assert kernel log line matches.
Trade-offs and Ordering
Smallest-deps comparison
| Port | C surface needed | New capOS infrastructure required | Difficulty |
|---|---|---|---|
| DNS resolver (dns.c) | malloc, time, socket subset, write(2), in-rodata resolv config (no open), poll-equivalent | UDP socket cap + NetworkManager exposure of UDP; otherwise reuses Phase B TCP path infra | Smaller – strictly additive (UDP is missing today but the kernel-side smoltcp stack supports it) |
| POSIX shell (dash) | malloc, full stdio, file I/O, directory iteration, pipe(), fork-for-exec, exec, wait, env, time, signals (stub) | Pipe primitive (new), Namespace+File cap surface, ProcessSpawner sidecar work to honour fd-action grants, env-vector handoff | Larger – touches storage / IPC / process surfaces |
Which blocks which
- Both ports can run in parallel at the libcapos / libcapos-posix layer level: beyond the shared allocation, time, and stdio wrappers, each pulls a largely disjoint subset of POSIX surfaces.
- DNS resolver blocks on a new capOS surface (UDP cap exposure) but does not block on pipe(), fork(), or exec().
- Shell blocks on (in order of probable cost): pipe primitive, ProcessSpawner fd-action support for stdin / stdout redirection, Namespace+File cap availability, env vector / LaunchParameters.
- The library substrate (libcapos staticlib + libcapos-posix scaffold) blocks both. Once the substrate exists, the two ports proceed in parallel.
Recommended sequence
- libcapos staticlib v0 (Phase P1.1). The thin Rust .a with cap_call, capset_get, sys_exit, sys_cap_enter, heap. Plus a “C hello world” smoke that calls console_write_line() (mirrors the userspace-binaries proposal “Future Phase: libcapos for C”). This phase is the prerequisite for both P1.2 and P1.3.
- libcapos-posix scaffold – fd table, errno cell, stdio wrappers for fd 0/1/2, stub signals, _start glue that registers argv/envp from LaunchParameters (or empty arrays if that surface has not landed), basic malloc/free re-export.
- dns.c port (Phase P1.2). Library-layer work in P1.2 can overlap with library-layer work in P1.3, but both phases add interfaces to schema/capos.capnp and must serialise on the shared schema serial surface per docs/plans/README.md Concurrency Notes; the schema half of either phase cannot run concurrently with the schema half of the other.
- dash port (P1.3 lays the pipe + fork-for-exec primitives; the actual dash vendoring is a successor task that also depends on Namespace+File caps). The same schema serial-surface constraint applies to P1.3.
Critical path
The DNS resolver is the smaller-deps first slice only because of the
shell’s fork / pipe / file dependencies. The shell-first ordering is
viable, but it requires the pipe cap design + implementation plus
Namespace + File caps (Phase 2 of storage-and-naming-proposal.md)
ahead of the dash port. Both prerequisites are sizeable. The DNS
resolver remains the faster proof of “POSIX adapter actually adapts
something that was not written for capOS.”
What this slice does not promise
- Not a path to running glibc-built binaries unchanged. Both ports are sources-on-disk recompiled against libcapos-posix. Binary compatibility with Linux ELFs is not in scope.
- Not job control, not signals, not full POSIX session/pgrp model.
- Not a libc – the POSIX surface ships just enough for dash and dns.c. The printf family lands in libcapos-posix only because both ports need it; this is not a <stdio.h> for general use.
- Not a reason to skip the native Rust paths – capos-shell (Rust shell/ crate) remains the default capOS shell. dash is for porting validation, not as the system shell.
- Not a foundation for hosted C++. C++ requires explicit ABI decisions tracked separately in docs/proposals/userspace-binaries-proposal.md.
Phase Decomposition
Phases are dispatch-ready. P1.1 must land before P1.2 or P1.3 begin. P1.2
and P1.3 can overlap at the library and kernel-cap layer, but both add
interfaces to schema/capos.capnp and must serialise on the shared
schema serial surface per docs/plans/README.md Concurrency Notes; the
schema halves cannot run concurrently.
Phase P1.1 – libcapos C-substrate v0 + C hello-world smoke
- New crate libcapos/ with crate-type = ["staticlib"] and the C primitive surface (cap_call, capset_get, capset_iter, sys_exit, sys_cap_enter, heap).
- New header tree under include/capos/.
- New c-build Make helper that invokes clang --target=x86_64-unknown-none-elf -nostdlib -static, links libcapos.a, with capos-rt’s _start as the entry point that calls a C main() shim.
- New demo demos/c-hello/: single .c file calling console_write_line().
- New manifest system-c-hello.cue.
- No POSIX surface, no errno, no pthreads. Heap re-exports the capos-rt fixed allocator.
- Validation: make run-c-hello boots; the C binary prints hello from C and exits cleanly with code 0.
This phase is the strict prerequisite for the rest of the track.
Phase P1.2 – UDP cap surface + dns.c stub resolver smoke
- Schema additions to schema/capos.capnp: new UdpSocket interface + NetworkManager.createUdpSocket method (small additive change).
- Kernel: extend kernel/src/cap/network.rs with the UDP path mirroring the existing TCP path, and add UDP RX demux on the existing scheduler-polled smoltcp runtime in kernel/src/virtio.rs.
- Userspace: new typed UdpSocketClient in capos-rt/src/client.rs.
- New crate libcapos-posix/ with the minimal socket/sendto/recvfrom/poll surface for one UDP fd at a time.
- Vendored dns.c under vendor/dns-c-wahern/ (single .c plus header).
- New demo demos/posix-dns-resolver/.
- New manifest system-posix-dns.cue; new Makefile target run-posix-dns-smoke.
- Validation: end-to-end “boot capOS, launch resolver, print resolved <name> -> <addr>”. Single-fd resolver, single in-flight query is sufficient for v0.
- Schema serial-surface coordination: queues on the shared schema/capos.capnp serial surface per docs/plans/README.md Concurrency Notes. Must not run concurrently with another schema-touching plan.
Depends on Phase P1.1.
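The "one UDP fd, one in-flight query" surface above can be pictured from the caller's side. The following is a minimal sketch written against the standard POSIX calls (socket, sendto, poll, recvfrom), not against libcapos-posix itself; the helper name udp_query_once is hypothetical, and how dns.c actually drives its I/O is not shown here.

```c
#include <netinet/in.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send one datagram on `fd` and wait up to
 * `timeout_ms` for a single reply on the same fd.
 * Returns the reply length, 0 on timeout, or -1 on error (errno is set
 * by whichever call failed). */
ssize_t udp_query_once(int fd, const struct sockaddr_in *server,
                       const void *req, size_t req_len,
                       void *resp, size_t resp_cap, int timeout_ms)
{
    if (sendto(fd, req, req_len, 0,
               (const struct sockaddr *)server, sizeof *server) < 0)
        return -1;

    /* One fd, one readiness wait: the whole "poll" surface v0 needs. */
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0; /* timed out with no reply */

    return recvfrom(fd, resp, resp_cap, 0, NULL, NULL);
}
```

On capOS each of these four calls would resolve through the fd table to a UdpSocket cap the process already holds; on a host the same function runs unchanged, which is what makes it usable as a porting check.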
Phase P1.3 – Pipe capability + fork-for-exec scaffolding
- Schema additions to schema/capos.capnp: new Pipe interface (small additive change, distinct from UDP and LaunchParameters surfaces). EOF semantics on close.
- Kernel: new kernel/src/cap/pipe.rs – bounded SPSC byte ring backed by a kernel-allocated MemoryObject page.
- Kernel: extend kernel/src/cap/process_spawner.rs so spawn grants can mint Pipe halves and bind them to the child’s standard fds.
- Userspace: new PipeClient in capos-rt/src/client.rs.
- libcapos-posix extensions for pipe/dup2/close.
- libcapos-posix extensions for fork/execve/waitpid (TLS “next exec is the real spawn” state machine, ProcessSpawner integration).
- New demo demos/posix-pipe-shim/: a minimal C program that pipe()s, posix_spawn()s a child whose stdout is the write end, parent reads from the read end and prints. Plus a second smoke that exercises the §6-selected fork-for-exec path (either inter-call recording of dup2/close as posix_spawn file actions, or a patched-port variant), proving the path dash pipelines actually take.
- New manifest system-posix-pipe.cue; new Makefile target run-posix-pipe-smoke.
- Validation: end-to-end pipe smoke covers both the posix_spawn-direct path and the §6-selected fork-for-exec path, proving the primitive shell pipelines need before vendoring dash.
- Schema serial-surface coordination: queues on the shared schema/capos.capnp serial surface per docs/plans/README.md Concurrency Notes. Must not run concurrently with P1.2 if both want the schema serial surface.
Depends on Phase P1.1.
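The bounded byte ring with EOF on close can be modelled in a few lines. This is a single-threaded sketch of the semantics only: every identifier is hypothetical, and a real kernel ring shared between two processes would additionally need atomics and memory-ordering guarantees that are omitted here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PIPE_RING_CAP    4096u  /* one page of payload (assumed size) */
#define PIPE_WOULD_BLOCK (-1L)  /* caller should wait and retry */
#define PIPE_BROKEN      (-2L)  /* reader is gone; maps to EPIPE */

struct pipe_ring {
    uint8_t buf[PIPE_RING_CAP];
    size_t  head;            /* total bytes consumed by the reader */
    size_t  tail;            /* total bytes produced by the writer */
    bool    writer_closed;
    bool    reader_closed;
};

/* Copy in as much as fits; a short write is normal when nearly full. */
long pipe_ring_write(struct pipe_ring *r, const uint8_t *src, size_t len)
{
    if (r->reader_closed)
        return PIPE_BROKEN;
    size_t space = PIPE_RING_CAP - (r->tail - r->head);
    if (len > 0 && space == 0)
        return PIPE_WOULD_BLOCK;
    size_t n = len < space ? len : space;
    for (size_t i = 0; i < n; i++)
        r->buf[(r->tail + i) % PIPE_RING_CAP] = src[i];
    r->tail += n;
    return (long)n;
}

/* Returns 0 (EOF) only when the ring is empty AND the writer closed. */
long pipe_ring_read(struct pipe_ring *r, uint8_t *dst, size_t cap)
{
    size_t avail = r->tail - r->head;
    if (avail == 0)
        return r->writer_closed ? 0 : PIPE_WOULD_BLOCK;
    size_t n = cap < avail ? cap : avail;
    for (size_t i = 0; i < n; i++)
        dst[i] = r->buf[(r->head + i) % PIPE_RING_CAP];
    r->head += n;
    return (long)n;
}

void pipe_ring_close_writer(struct pipe_ring *r) { r->writer_closed = true; }
void pipe_ring_close_reader(struct pipe_ring *r) { r->reader_closed = true; }
```

The property worth testing in the smoke is the ordering: bytes written before the writer closes are still delivered, and only then does the reader see EOF.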
The dash vendoring + full file I/O surface is a successor task that also depends on Namespace + File cap surface (storage Phase 2), which is not yet started.
Recommended dispatch ordering: P1.1 -> (P1.2 alternating with P1.3 on the schema serial surface) -> shell-port follow-on once Namespace + File caps land.
Trust Boundaries
| Boundary | Native capOS service | POSIX-shaped C binary on capOS |
|---|---|---|
| Authority source | Process CapSet | Process CapSet projected through libcapos-posix fd table |
| Memory isolation | Page tables | Page tables (no wasm-style sandbox; libc has no extra runtime check) |
| Code integrity | W^X + NX | W^X + NX |
| Cap forgery | Kernel-owned CapTable | Same; the fd table is per-process userspace state, not authority |
| Resource limits | Kernel quotas | Kernel quotas; ulimit is ENOSYS |
| Side channels | Hardware-level (Spectre etc.) | Same hardware level |
A POSIX binary on capOS is more constrained than on Linux, not less. The adapter provides familiar function signatures, not familiar authority.
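To make the "fd table is userspace state, not authority" row concrete, here is a minimal sketch of such a table. All names (fd_install, fd_lookup, fd_release, the cap_iface enum) are hypothetical, and the specific errno choices are assumptions rather than the documented mapping.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical interface tags; the real set would come from the schema. */
enum cap_iface { IFACE_NONE = 0, IFACE_CONSOLE, IFACE_UDP_SOCKET };

struct fd_slot {
    enum cap_iface iface;     /* IFACE_NONE marks a free slot */
    uint32_t       cap_index; /* index into the kernel-owned CapTable */
};

#define FD_MAX 16
static struct fd_slot fd_table[FD_MAX];

/* Bind a cap the process already holds to the lowest free fd. */
int fd_install(enum cap_iface iface, uint32_t cap_index)
{
    if (iface == IFACE_NONE) { errno = EINVAL; return -1; }
    for (int fd = 0; fd < FD_MAX; fd++) {
        if (fd_table[fd].iface == IFACE_NONE) {
            fd_table[fd].iface = iface;
            fd_table[fd].cap_index = cap_index;
            return fd;
        }
    }
    errno = EMFILE;
    return -1;
}

/* Resolve an fd to a cap index, failing closed on any mismatch. */
int fd_lookup(int fd, enum cap_iface want, uint32_t *cap_index_out)
{
    if (fd < 0 || fd >= FD_MAX || fd_table[fd].iface == IFACE_NONE) {
        errno = EBADF;
        return -1;
    }
    if (fd_table[fd].iface != want) {
        errno = (want == IFACE_UDP_SOCKET) ? ENOTSOCK : EBADF;
        return -1;
    }
    *cap_index_out = fd_table[fd].cap_index;
    return 0;
}

/* Forget the binding; a real shim would also release the cap itself. */
int fd_release(int fd)
{
    if (fd < 0 || fd >= FD_MAX || fd_table[fd].iface == IFACE_NONE) {
        errno = EBADF;
        return -1;
    }
    fd_table[fd].iface = IFACE_NONE;
    fd_table[fd].cap_index = 0;
    return 0;
}
```

Nothing in this table can create authority: a forged slot only yields a cap index that the kernel still checks against the process's own CapTable.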
Validation
The first ports are not complete until they have QEMU evidence:
- A POSIX binary prints through a granted Console / TerminalSession.
- The same binary cannot use write to a fd it was not granted, cannot open() a path outside its preopened namespaces, and cannot call an unimplemented POSIX function without receiving ENOSYS.
- A missing or wrong-interface cap lookup returns the documented errno (not a host-side panic, not silent success).
- An owned result cap is released deterministically when the binary exits.
- Each demo binary exits cleanly and does not wedge the kernel.
Host tests should cover errno mapping and the per-process fd table once those pieces are pure enough to test outside QEMU. Do not claim “POSIX adapter works” from host tests alone; the useful behavior is authority-shaped POSIX execution in capOS.
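As an illustration of what a host-testable errno mapping could look like, the sketch below keeps a closed set of internal error kinds and converts them to errno values in exactly one function. The proposal's working answer puts the typed side in Rust; this C rendering is only for illustration, and both the error kinds and the errno values chosen for them are assumptions.

```c
#include <errno.h>

/* Hypothetical internal error kinds; the real set is not yet defined. */
enum capos_err {
    CAPOS_OK = 0,
    CAPOS_ERR_NO_CAP,        /* no cap backs the requested operation */
    CAPOS_ERR_DENIED,        /* cap present but lacks the needed right */
    CAPOS_ERR_TIMEOUT,       /* completion did not arrive in time */
    CAPOS_ERR_UNIMPLEMENTED  /* POSIX function the shim does not provide */
};

/* The single place where internal errors become errno values.
 * Anything unrecognised fails closed as ENOSYS. */
int capos_err_to_errno(enum capos_err e)
{
    switch (e) {
    case CAPOS_OK:                return 0;
    case CAPOS_ERR_NO_CAP:        return EBADF;
    case CAPOS_ERR_DENIED:        return EACCES;
    case CAPOS_ERR_TIMEOUT:       return ETIMEDOUT;
    case CAPOS_ERR_UNIMPLEMENTED: return ENOSYS;
    default:                      return ENOSYS;
    }
}

/* POSIX return convention: 0 on success, -1 with errno set on failure.
 * errno is left untouched on success, as POSIX callers expect. */
int posix_result(enum capos_err e)
{
    if (e == CAPOS_OK)
        return 0;
    errno = capos_err_to_errno(e);
    return -1;
}
```

Because the mapping is a pure function, a host test can enumerate every kind and assert its errno without booting QEMU.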
Open Questions
The following design decisions are documented as open questions because the planning phase recommends an answer but has not yet committed to one.
- POSIX shell candidate. Recommended: dash 0.5.13.x, vendored at a pinned tag under vendor/dash/. Alternatives: busybox ash (heavier framework cost), oksh (ksh-superset, larger surface), toysh (incomplete), custom Rust shell (defeats the purpose of porting C). Working answer: dash. Confirm or pick another before P1.3 successor work begins.
- DNS resolver candidate. Recommended: dns.c (wahern) as a single-file MIT C library with no required I/O model. Alternatives: c-ares (~3x larger, configure-driven, more invasive port), GNU adns (GPL-2.0+ – license question), musl res_query (requires linking musl, rejected), pure-Rust trust-dns (defeats the C-port purpose). Working answer: dns.c. Confirm or pick another before P1.2 begins.
- libcapos versioning and naming. The C library is just libcapos (mirrors the Rust capos-rt). Open question: should the POSIX layer be libcapos-posix (current recommendation), or a different name that avoids any Rust-side framework name collision? The C-side naming is settled; the POSIX-layer name remains an open question pending confirmation that no Rust framework will reuse the libcapos-posix identifier. Working answer: keep libcapos-posix for the POSIX layer.
- POSIX errno representation. The C ABI requires an int errno per thread. The Rust internals can either use a typed enum mapped to int at the boundary, or use raw i32 throughout. Recommended: typed Rust error type with one bidirectional mapping at the C boundary, so internal callers cannot accidentally invent unmapped values. Working answer: typed Rust error internally, int at the C ABI. Confirm before P1.2 begins.
- File descriptor table location. Recommended: per-process userspace state inside libcapos-posix, with the kernel knowing nothing about fds. Alternative: a kernel-side fd table (closer to Linux). The userspace location preserves the property that capOS authority is the capability table; a kernel fd table would duplicate authority. Working answer: per-process userspace state. Confirm before P1.2 begins.
- Fork policy. Confirm “fork-for-exec only” semantics. Real fork() is rejected. The shim turns fork() + execve() into posix_spawn(). Any fork() not followed by execve() returns -1 / ENOSYS on the next non-trivial syscall. The shell-pipeline pattern fork() -> dup2()/close() to wire stdin/stdout to a pipe end -> execve() is the most common shape that the strict fork-for-exec policy breaks; dash uses exactly this pattern for cmd1 | cmd2. To keep that pattern working, the shim must either (a) record dup2/close calls between fork() and execve() as posix_spawn file actions and apply them to the spawn, or (b) require the port to be patched to call posix_spawn with explicit file actions. P1.3 must pick one before vendoring dash. Confirm before P1.3 begins.
- fd 0 backing for the shell. The natural mapping is the TerminalSession cap (read line + cooked-mode line discipline already exists in kernel and migrates to userspace at networking Phase C). For the DNS resolver fd 0 is unused and stays unmapped. Confirm TerminalSession is the canonical fd-0 backing.
- UDP cap surface scope. Minimum: NetworkManager.createUdpSocket(localPort?) -> socketIndex, UdpSocket.sendTo(addr, port, data) -> bytesSent, UdpSocket.recvFrom(maxLen) -> (addr, port, data), UdpSocket.close(). Same blocking model as TCP accept/recv (CQE on completion or timeout). Confirm shape, especially whether recvFrom should be readiness-based instead of blocking-with-timeout.
- Pipe cap design. Recommended: kernel-allocated bounded SPSC ring (page-sized) with EOF on close, exposed as two cap halves (PipeReader, PipeWriter) minted by ProcessSpawner. Alternative: shared MemoryObject + userspace ring (less kernel work, but harder to make EOF safe across process exits). Confirm before P1.3 begins.
- argv / envp source. This proposal assumes a future LaunchParameters cap delivers argv / envp through a typed cap. Until that cap lands, libcapos-posix can carry argv / envp via a fixed well-known cap or rodata blob. Confirm gate-on-LaunchParameters versus ship-stub.
- Linker / toolchain for C consumers. Recommended: clang --target=x86_64-unknown-none-elf -nostdlib -static, link against libcapos.a (and optionally libcapos-posix.a), reuse the existing capos-rt linker script. Confirm clang vs gcc and whether the track ships a shared cc-glue Cargo crate or a Make rule invoking cc directly.
- Vendoring policy. In-tree vendor/dash/, vendor/dns-c-wahern/ versus out-of-tree submodule versus separate repo. Working answer: in-tree vendoring with pinned tags, mirroring the planned vendor/piccolo-no_std/ shape from the Lua track.
- Audit / measure-mode interaction. The libcapos-posix wrappers must not break measure mode (the measure feature). Most wrappers only call libcapos, which only calls capos-rt, which is already measure-mode-clean, so this should be free; confirm whether the track adds a make run-measure smoke for one libcapos-posix binary as a regression gate.
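Option (a) of the fork policy question, recording dup2/close between fork() and execve(), could take roughly the following shape. This is a sketch of the bookkeeping only: every identifier is hypothetical, the state would live in TLS in a real shim, and how the recorded actions reach ProcessSpawner is left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum fa_kind { FA_DUP2, FA_CLOSE };

struct file_action {
    enum fa_kind kind;
    int fd;     /* source fd for FA_DUP2, the fd to close for FA_CLOSE */
    int newfd;  /* target fd for FA_DUP2; unused for FA_CLOSE */
};

#define FA_MAX 8
struct fork_shim {
    bool pending;                   /* true between fork and execve */
    struct file_action log[FA_MAX]; /* actions to replay at spawn time */
    size_t count;
};

/* "fork": no process is created; the caller continues on the child path. */
int shim_fork(struct fork_shim *s)
{
    if (s->pending) { errno = ENOSYS; return -1; } /* nested fork */
    s->pending = true;
    s->count = 0;
    return 0;
}

static int shim_record(struct fork_shim *s, struct file_action a)
{
    /* This sketch models only the window; outside it a real shim would
     * apply the call to the fd table directly. */
    if (!s->pending) { errno = EINVAL; return -1; }
    if (s->count == FA_MAX) { errno = ENOMEM; return -1; }
    s->log[s->count++] = a;
    return 0;
}

int shim_dup2(struct fork_shim *s, int oldfd, int newfd)
{
    struct file_action a = { FA_DUP2, oldfd, newfd };
    return shim_record(s, a) == 0 ? newfd : -1;
}

int shim_close(struct fork_shim *s, int fd)
{
    struct file_action a = { FA_CLOSE, fd, -1 };
    return shim_record(s, a);
}

/* Any other wrapper checks this first: inside the window it fails closed. */
int shim_guard(const struct fork_shim *s)
{
    if (s->pending) { errno = ENOSYS; return -1; }
    return 0;
}

/* "execve": close the window and hand the recorded actions to the spawn. */
int shim_take_actions(struct fork_shim *s,
                      const struct file_action **acts, size_t *n)
{
    if (!s->pending) { errno = ENOSYS; return -1; }
    s->pending = false;
    *acts = s->log;
    *n = s->count;
    return 0;
}
```

For a cmd1 | cmd2 pipeline the recorded log would typically hold one dup2 onto fd 1 followed by a close of the original pipe end, which is the same information option (b) would pass to posix_spawn as explicit file actions.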
Relationship to Other Proposals
- Userspace Binaries owns the broader native-binary, language, and POSIX-adapter roadmap. This proposal supersedes Part 4 of that proposal with the full POSIX adapter design.
- Programming Languages is the reader-facing summary of language support. Its C and POSIX rows will cross-link this proposal once the libcapos C-substrate v0 task lands the corresponding row updates; until then, this proposal stands as the long-form design source.
- Networking defines NetworkManager, TcpListener, and TcpSocket and defers UDP. The DNS resolver port in Phase P1.2 adds the UdpSocket cap surface; the TCP cap surface is reused unchanged.
- Storage and Naming defines the Directory/File/Store/Namespace surfaces that the shell port consumes. Phase 2/3 of that proposal gates the dash file I/O surface.
- Service Architecture defines the future Resolver cap that the resolver port eventually exports.
- Shell covers the native capos-shell. The POSIX shell port is for porting validation and does not replace capos-shell.
- WASI Host Adapter is the parallel untrusted-portable execution path. POSIX adapter targets trusted source-recompiled C; WASI adapter targets sandboxed wasm modules. Both share the per-process fd-table and per-import authority pattern.
- Lua Scripting is the capability-scoped trusted-script path; PUC Lua’s native build assumes a C substrate, so it eventually consumes libcapos.