# Proposal: POSIX Compatibility Adapter

How capOS should host POSIX-shaped C software without recreating the ambient
authority that makes POSIX hard to confine, and which two ports validate the
adapter for the first time.


## Problem

capOS is not POSIX and is not trying to become POSIX. But useful software --
DNS resolvers, line-editing libraries, shells, archivers, compilers, network
clients -- assumes a POSIX surface. Rewriting each of these in capability-
native Rust would forfeit decades of debugging, security review, and
performance work for no isolation gain: a POSIX program whose only authority
is a typed capability set is already as confined as an equivalent native one.

The risk pattern is the one POSIX historically gets wrong: a translation layer
that synthesises ambient authority (a global `/`, an inherited credential
table, a kernel-managed file descriptor map) rebuilds the property capOS is
trying to leave behind. A useful adapter must do the opposite -- every POSIX
call must be backed by a typed capability the calling process already holds,
or it must fail closed with a documented errno.

Two upstream programs are the natural first validators of that adapter:

- A **POSIX shell** exercises the broadest surface (process, pipe, file, env,
  signal stubs, stdio).
- A **DNS resolver** exercises the smallest network surface (UDP socket,
  one-shot poll-equivalent, time, log).

Both are already small, mature, and BSD/MIT-licensed. Picking the smallest
representative of each category makes the adapter's first job a real port,
not a synthetic test.

## Scope

In scope:

- A two-layer C substrate: `libcapos` (thin Rust staticlib, capability ring +
  CapSet + raw syscalls + heap, C ABI) and `libcapos-posix` (POSIX shape on
  top: fd table, errno, path resolution, posix_spawn shim, signal stubs,
  pthread mapping).
- A first POSIX shell port that builds against `libcapos-posix` with no
  hidden ambient authority.
- A first DNS resolver port that builds against `libcapos-posix` with no
  hidden ambient authority.
- Phase decomposition (P1.1, P1.2, P1.3) that defers the adapter's biggest
  dependencies (Namespace + File caps for the shell file path; UDP cap for
  the resolver) into clearly-named gating phases.
- Validation through QEMU smokes that prove granted and ungranted paths.

Out of scope for the first implementation:

- Binary compatibility with Linux ELFs. Both ports are sources-on-disk
  recompiled against `libcapos-posix`.
- Full POSIX compliance. The adapter ships exactly the surface dash and dns.c
  exercise, plus any free additions that fall out.
- Real `fork()` (parent state inheritance, COW, sibling address-space surgery
  before exec). Only `fork()` followed promptly by `execve()` is supported,
  via a `posix_spawn`-shaped shim.
- Real signal delivery. `signal()`/`sigaction()` accept the call, store the
  handler, never invoke it. `kill(2)` requires a future `ProcessHandle` cap.
- Job control, process groups, sessions, controlling terminals.
- musl, glibc, or any other host libc. The substrate is Rust-authored and
  exposes a C ABI; it is not a libc port.
- Hosted C++. ABI decisions for C++ remain tracked in
  `docs/proposals/userspace-binaries-proposal.md`.

## Current Manual Pages

- [Programming Languages](../programming-languages.md) summarizes POSIX
  adapter status relative to Rust, C/C++, Python, Go, Lua, and WASI tracks.
- [Userspace Binaries](userspace-binaries-proposal.md) Part 4 sketches the
  POSIX adapter at a higher level. This proposal supersedes that sketch with
  the full design surface; the userspace-binaries proposal continues to own
  the broader native-binary, language, and adapter roadmap.
- [Userspace Runtime](../architecture/userspace-runtime.md) documents the
  implemented `capos-rt` surface that `libcapos` mirrors for C consumers.
- [Networking](networking-proposal.md) defines `NetworkManager`,
  `TcpListener`, and `TcpSocket` and explicitly defers `UdpSocket` until
  DNS / userspace-network work needs it. The DNS resolver port in this
  proposal defines the UDP cap surface; the TCP cap surface is reused
  unchanged.
- [Storage and Naming](storage-and-naming-proposal.md) defines the
  `Namespace`, `Directory`, `File`, and `Store` cap shape; these gate the
  shell port's filesystem surface (Phase 2/3 of that proposal).
- [Service Architecture](service-architecture-proposal.md) frames the future
  `Resolver` cap as the long-term consumer of the resolver process built in
  this track.
- [Shell](shell-proposal.md) covers the native `capos-shell`. The POSIX shell
  port (dash) exists for porting validation, not as a replacement for the
  native shell.
- [WASI Host Adapter](wasi-host-adapter-proposal.md) is the parallel
  untrusted-portable execution path; both proposals share fd-table and
  per-import authority insight, but target different substrates.

## Research Grounding

Relevant research and external references:

- POSIX shell candidates surveyed: dash (Debian Almquist Shell, ~13 kSLOC,
  BSD; the canonical small POSIX-strict shell); busybox `ash`; OpenBSD ksh
  (oksh); toybox `toysh`. Source repositories cited inline in the candidate
  comparison table.
- DNS resolver candidates surveyed: `dns.c` by William Ahern (single-file
  MIT, ~10 kSLOC, no dependencies); c-ares; GNU adns; udns; SPCDNS; musl's
  embedded `res_query`; trust-dns-resolver. Source repositories cited inline
  in the candidate comparison table.
- libcapos prior art: this proposal builds on the `libcapos` shape sketched
  in [Userspace Binaries](userspace-binaries-proposal.md) "Future: C via
  `libcapos`" / "Future Phase: libcapos for C". The C substrate is designed
  as a Rust staticlib with a C ABI rather than musl, redox relibc, or a
  hand-rolled libc. Fuchsia's fdio + musl pattern and Redox's relibc
  pattern are the comparable points; capOS deliberately picks neither.
- POSIX surface translation: Cygwin's `fork()` emulation is the closest
  prior art for fork-for-exec semantics on top of a non-fork substrate; the
  capOS shim inverts the default (capOS *cannot* fork; the shim emulates
  the useful case) but uses the same call-pattern recognition.

In-tree research grounding:

- [Genode](../research/genode.md) -- per-session typed service interfaces
  and resource accounting are the closest precedent for routing every
  POSIX wrapper through a typed cap rather than through an ambient kernel
  syscall table. POSIX adapter wrappers should follow the same pattern at
  the library boundary instead of the kernel boundary.
- [OS Error Handling](../research/os-error-handling.md) -- cross-OS
  comparison of error-model surfaces. Informs the bidirectional mapping
  between `CapError` / `CapException` and POSIX errno (Open Question §4)
  and the decision to keep one shared mapping table at the C boundary
  rather than per-wrapper bespoke mappings.
- [LLVM Target](../research/llvm-target.md) -- target triple, calling
  convention, and bare-metal toolchain options for capOS C consumers.
  Informs Open Question §11 on the linker / toolchain choice (`clang
  --target=x86_64-unknown-none-elf -nostdlib -static`).

This proposal also lifts the capability-mapping shape and the "every
translation has authority backing" property from the WASI host adapter
proposal, and the `libcapos` staticlib shape from the userspace-binaries
proposal Part 2. It deliberately does not adopt the musl + `__syscall`
hook pattern noted in the userspace-binaries proposal "musl as a Base
(Optional, Later)" section, because the layered Rust staticlib shape is
preferred over a libc port for the v0 surface.

External:

- [dash][dash-debian] -- Debian Almquist Shell, ~13 kSLOC, Debian's
  `/bin/sh` since Squeeze (2011).
- [busybox `ash`][busybox-ash] -- alternative Almquist port, embedded.
- [oksh][oksh] -- portable OpenBSD ksh, public domain, larger surface.
- [toybox toysh][toybox] -- 0BSD, currently incomplete.
- [c-ares][c-ares] -- modern async DNS resolver, MIT, larger.
- [dns.c][wahern-dns] -- single-file non-blocking DNS, MIT, no deps.
- [GNU adns][gnu-adns] -- async DNS resolver, GPL-2.0+.
- [musl resolver][musl-resolver] -- embedded in musl libc; not available
  without linking musl.
- [udns][udns] -- small async stub-only resolver, LGPL-2.1.

## Design Principles

1. **POSIX is not a kernel feature.** The kernel sees ordinary userspace
   processes with a CapSet and a capability ring. `libcapos` and
   `libcapos-posix` are static libraries linked into those processes.
2. **Two layers, one C ABI per layer.** `libcapos` is the C-ABI mirror of
   `capos-rt`: capability ring, CapSet, raw syscalls, heap. It has no errno,
   no fd table, no `open`/`read`/`write`. `libcapos-posix` builds the POSIX
   shape on top. Programs that do not need POSIX semantics may link only
   `libcapos`.
3. **Authority is per-process, granted at spawn.** Every fd a POSIX program
   sees is backed by a capability granted to that process at spawn time and
   projected onto an fd by `libcapos-posix`. There is no ambient `/`, no
   inherited credential table, no global signal source.
4. **Schema-first, not POSIX-first, at the boundary.** Each POSIX wrapper is
   backed by a typed capability call with a documented errno mapping.
   POSIX-shaped integer fds and POSIX-shaped errno are an ABI requirement
   of the C substrate, not a capability-model concession.
5. **Fail closed.** Any unimplemented POSIX call returns -1 and sets errno
   to `ENOSYS`. Any cap lookup that fails returns the documented errno.
   Programs cannot probe absent caps for ambient behaviour.
6. **No fork without exec.** Only `fork()` followed by `execve()` is
   supported. The shim turns the pair into `posix_spawn()`. Bare `fork()`
   used to clone state in-process fails on the next non-trivial syscall.
7. **No real signals.** Handlers are accepted and stored, never delivered.
   `kill(2)` requires a future `ProcessHandle` cap and even then is limited
   to `SIGKILL`. Programs that depend on `SIGCHLD` job control are out of
   scope.
8. **The C substrate is Rust.** `libcapos` and `libcapos-posix` are Rust
   crates with `crate-type = ["staticlib"]`, all symbols `#[no_mangle]
   extern "C"`. This is **not** musl, **not** a hand-rolled libc.
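Principles 3 to 5 can be sketched together as a host-runnable model. This is an illustration only, not the adapter's implementation: `CapSet`, `require`, `posix_time`, `posix_setsid`, and `to_c` are hypothetical names, the cap names are made up, and the errno numbers use the Linux numbering because the proposal does not fix numeric values.

```rust
use std::collections::HashMap;

// Linux errno numbering, used only for illustration.
const ENOENT: i32 = 2;
const ENOSYS: i32 = 38;

/// Hypothetical stand-in for the per-process CapSet: name -> cap id.
struct CapSet {
    caps: HashMap<String, u32>,
}

/// Outcome of a wrapper before it is flattened to the C convention.
type PosixResult = Result<u32, i32>;

impl CapSet {
    fn new() -> Self {
        CapSet { caps: HashMap::new() }
    }

    /// Models a grant made by the parent at spawn time.
    fn grant(&mut self, name: &str, cap: u32) {
        self.caps.insert(name.to_string(), cap);
    }

    /// Resolve a wrapper's backing cap, failing closed when it is absent.
    fn require(&self, name: &str) -> PosixResult {
        self.caps.get(name).copied().ok_or(ENOENT)
    }
}

/// A wrapper with a typed backing: succeeds only if the cap was granted.
fn posix_time(caps: &CapSet) -> PosixResult {
    caps.require("timer")
}

/// A wrapper with no backing at all: always ENOSYS, never ambient behaviour.
fn posix_setsid(_caps: &CapSet) -> PosixResult {
    Err(ENOSYS)
}

/// Flatten to the C convention: -1 plus errno on failure.
fn to_c(result: PosixResult, errno: &mut i32) -> i64 {
    match result {
        Ok(value) => i64::from(value),
        Err(e) => {
            *errno = e;
            -1
        }
    }
}
```

The point of the shape is that a wrapper has exactly two exits: a call on a cap the process holds, or an errno computed without touching any authority.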

## Architecture

```mermaid
flowchart TD
    Shell["POSIX shell binary<br/>(e.g. dash)"]
    Resolver["DNS resolver binary<br/>(e.g. dns.c)"]
    Posix["libcapos-posix<br/>(POSIX adapter, Rust staticlib, C ABI)"]
    PosixDetail["fd table per process<br/>path resolver over Namespace + Store<br/>errno mapping (TLS cell)<br/>posix_spawn over ProcessSpawner<br/>signal stubs<br/>pthread over ThreadSpawner"]
    Posix --> PosixDetail
    Capos["libcapos<br/>(thin Rust staticlib, C ABI)"]
    CaposDetail["cap_call / capset_get / capset_iter<br/>sys_exit / sys_cap_enter<br/>heap (malloc/free over capos-rt allocator)<br/>typed wrappers for Console / Terminal / etc."]
    Capos --> CaposDetail
    Rt["capos-rt<br/>(no_std + alloc Rust)"]
    Ring["capability ring"]
    Kernel["kernel CapObject dispatch"]
    Services["userspace services"]

    Shell -->|"open/read/write/exec/..."| Posix
    Resolver -->|"socket/sendto/recvfrom"| Posix
    Posix -->|"extern C"| Capos
    Capos -->|"Rust FFI re-export"| Rt
    Rt --> Ring
    Ring --> Kernel
    Ring --> Services
```

`libcapos` is the C-ABI projection of `capos-rt`. `libcapos-posix` is the
POSIX projection on top. Every POSIX call ultimately resolves to either a
capability invocation through the ring or a synthetic answer (errno,
ENOSYS) computed without authority.

## libcapos: C-Facing Substrate

Headers expected to ship under `include/capos/`:

```c
// capos.h -- capability primitives only
#include <stddef.h>   // size_t
#include <stdint.h>   // uint16_t, uint32_t, uint64_t

typedef struct cap_ring cap_ring_t;
typedef uint32_t        cap_id_t;
typedef uint64_t        iface_id_t;

cap_ring_t *capos_ring(void);                     // process ring handle
int  cap_call(cap_ring_t *ring,
              cap_id_t cap, uint16_t method,
              const void *params, size_t plen,
              void *result, size_t rlen,
              size_t *out_len);
int  capset_get(const char *name,
                cap_id_t *out_cap, iface_id_t *out_iface);
size_t capset_iter(void (*cb)(const char*, cap_id_t, iface_id_t,
                              void*), void *ud);
_Noreturn void sys_exit(int code);
uint32_t       sys_cap_enter(uint32_t min_complete, uint64_t timeout_ns);

// Heap (backed by capos-rt fixed heap; grow-on-demand later if needed)
void *capos_malloc(size_t);
void  capos_free(void*);
void *capos_calloc(size_t, size_t);
void *capos_realloc(void*, size_t);
```

There is **no** `errno` here, **no** `open`/`read`/`write`. Those live one
layer up. `libcapos` is the C-ABI mirror of `capos-rt`: startup, ring,
CapSet, raw syscalls, heap.
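As a rough illustration of what the Rust side of one of these declarations might look like, the sketch below models `capset_get` over a fixed table. Everything here is an assumption made for the example: the `capset_get_sketch` name, the table contents, and the "0 on success, -1 on failure" return convention (the header above does not state one). The no-mangle attribute a real export would carry is left off so the sketch does not export a symbol when built on a host.

```rust
use std::ffi::{c_char, CStr};

type CapId = u32;
type IfaceId = u64;

/// Hypothetical CapSet contents; the real table would come from capos-rt.
const CAPSET: &[(&str, CapId, IfaceId)] = &[
    ("console", 1, 0x10),
    ("timer", 2, 0x20),
];

/// C-ABI shaped lookup: writes the cap id and interface id through the
/// out pointers. Returns 0 on success, -1 if the name is absent, not
/// valid UTF-8, or any pointer is null (assumed convention).
pub extern "C" fn capset_get_sketch(
    name: *const c_char,
    out_cap: *mut CapId,
    out_iface: *mut IfaceId,
) -> i32 {
    if name.is_null() || out_cap.is_null() || out_iface.is_null() {
        return -1;
    }
    // SAFETY: the caller promises `name` points at a NUL-terminated string.
    let name = unsafe { CStr::from_ptr(name) };
    let Ok(name) = name.to_str() else { return -1 };
    for &(entry, cap, iface) in CAPSET {
        if entry == name {
            // SAFETY: both out pointers were checked for null above.
            unsafe {
                *out_cap = cap;
                *out_iface = iface;
            }
            return 0;
        }
    }
    -1
}
```

Note that nothing in this layer touches errno; a failure is only a return code until `libcapos-posix` translates it.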

Build artifact: `target/.../libcapos.a` plus headers. The C library is
intentionally named just **`libcapos`**, mirroring how the Rust runtime
crate is `capos-rt`. The name is distinct from any Rust service framework
that may carry a similar name; this proposal owns the C-substrate name and
treats Rust-framework naming as out of scope.

## libcapos-posix: POSIX Surface

Headers under `include/capos/posix/`: `unistd.h`, `fcntl.h`, `errno.h`,
`sys/socket.h`, `netdb.h`, `sys/stat.h`, `dirent.h`, `string.h`, `stdlib.h`
(subset), `sys/types.h`, `pthread.h` (subset), `signal.h` (stub).

Implementation language: **Rust**, same crate-type pattern as `libcapos`,
but linked separately so a binary that does not need POSIX can omit it.

Errno bridge: a per-thread `errno` cell stored in a TLS slot owned by
`libcapos-posix` and populated by every wrapper that maps a Rust `CapError`
to a POSIX errno value. See "errno convention" below.

### File descriptor table

Per-process userspace state inside `libcapos-posix`. Not a kernel object --
neither `libcapos` nor the kernel knows anything about fds.

```rust
// libcapos-posix/src/fd.rs (sketch)
struct FdEntry {
    backing: FdBacking,       // stdio / File / Dir / Tcp / Udp / Listener
    flags:   i32,             // O_NONBLOCK, FD_CLOEXEC, ...
    cursor:  u64,             // for seekable backings
}

enum FdBacking {
    Stdin,                    // Console / TerminalSession (read side)
    Stdout,                   // Console (write side)
    Stderr,                   // Console (write side)
    File   { file: Cap<File>, dirty: bool },
    Dir    { dir:  Cap<Directory>, iter: usize },
    Tcp    { sock: Cap<TcpSocket> },
    Udp    { sock: Cap<UdpSocket> },
    Listener { l: Cap<TcpListener> },
}

static FD_TABLE: Mutex<BTreeMap<i32, FdEntry>> = ...;
static NEXT_FD:  AtomicI32 = AtomicI32::new(3);
```

`dup`/`dup2`/`close` operate on this table. `dup` increments a refcount on
the underlying cap; `close` releases when the last fd holding the cap drops.
Cap drop runs through `capos-rt` owned-handle release. The fd table is a
strict per-process userspace structure; it is not shared with the kernel
and is never serialised on the wire.
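The `dup` / `close` refcount rule described above can be modelled in a few lines. This is a host-side sketch under stated assumptions: `CapHandle` and `FdTable` are hypothetical, `Rc` stands in for whatever reference counting the adapter actually uses, and `EBADF` uses the Linux number for illustration. Fd allocation follows the monotonic `NEXT_FD` counter from the sketch above.

```rust
use std::collections::BTreeMap;
use std::rc::Rc;

/// Hypothetical owned cap handle. Dropping the last `Rc` stands in for
/// the capos-rt owned-handle release.
#[derive(Debug)]
struct CapHandle(u32);

const EBADF: i32 = 9; // Linux numbering, illustration only

struct FdTable {
    entries: BTreeMap<i32, Rc<CapHandle>>,
    next_fd: i32,
}

impl FdTable {
    fn new() -> Self {
        // fds 0..=2 are reserved for stdio, so allocation starts at 3.
        FdTable { entries: BTreeMap::new(), next_fd: 3 }
    }

    /// Project a granted cap onto a fresh fd.
    fn install(&mut self, cap: CapHandle) -> i32 {
        let fd = self.next_fd;
        self.next_fd += 1;
        self.entries.insert(fd, Rc::new(cap));
        fd
    }

    /// dup: a new fd sharing the same cap; the refcount goes up by one.
    fn dup(&mut self, fd: i32) -> Result<i32, i32> {
        let cap = self.entries.get(&fd).cloned().ok_or(EBADF)?;
        let new_fd = self.next_fd;
        self.next_fd += 1;
        self.entries.insert(new_fd, cap);
        Ok(new_fd)
    }

    /// close: drop this fd's reference. Returns true when this was the
    /// last fd holding the cap, i.e. the cap itself is released.
    fn close(&mut self, fd: i32) -> Result<bool, i32> {
        let cap = self.entries.remove(&fd).ok_or(EBADF)?;
        Ok(Rc::strong_count(&cap) == 1)
    }
}
```

Closing one of two duplicated fds leaves the cap alive; only the final `close` reports a release.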

Standard fds wired at `_start`:

- fd 0: `stdin` cap from CapSet (TerminalSession, Console, or future
  StdinReader-shaped cap, whichever is granted).
- fd 1: `stdout` Console cap.
- fd 2: `stderr` Console cap (or distinct Log cap if granted).

### Process model: fork-for-exec only

capOS process creation is `ProcessSpawner.spawn(name, binaryName, grants)`
(`kernel/src/cap/process_spawner.rs`). There is no `fork()`, no
`exec()`-in-place.

Decision matrix (working answers; the policy choice is Open Question §6
and is not settled until that question is confirmed):

| Option | What it provides | Cost | Working answer |
|---|---|---|---|
| Emulate `fork()` as `posix_spawn` with inherited cap-set, recording inter-call `dup2`/`close` as posix_spawn file actions | Existing fork+exec and fork+dup2+exec pipeline patterns work with one patch site | Daemonisation and arbitrary COW state inheritance between fork and exec still break | Recommended primary for the shell, with documented "fork-for-exec only" semantics. Whether the shim records inter-call file actions or requires the port to call `posix_spawn` with explicit file actions is Open Question §6. |
| Return ENOSYS for any `fork()` | Honest | Every POSIX program that uses fork must be patched | Recommended **safety net** when fork-for-exec is misused |
| Process-shadow: a "POSIX process" wraps a capOS process | General | Large kernel + runtime change; doubles process accounting | Recommended **reject** for v0; revisit only if a real POSIX program needs it |

Working answer: fork-for-exec, with hard-fail as the safety net (subject to
Open Question §6 confirmation before P1.3 begins). Two `libcapos-posix`
shim variants are on the table; §6 selects between them:

- **Variant A -- recording shim.** `libcapos-posix` exposes `fork()` and
  `execve()` as a coupled shim that:
  1. `fork()` records "next exec is the real spawn" in TLS, returns 0 in
     the "child" pseudo-context (still in parent address space).
  2. `dup2()` / `close()` calls between `fork()` and `execve()` are
     recorded as `posix_spawn` file actions on the pending spawn rather
     than mutating the parent's fd table.
  3. `execve(path, argv, envp)` consumes the recorded intent, calls
     `ProcessSpawner.spawn()` with attenuated grants and the recorded
     file actions, returns the "child" PID to the parent path.
  4. Any `fork()` not followed by `execve()` before a syscall outside
     the recorded-action allowlist (e.g. `setsid`) returns -1 / ENOSYS
     on that downstream call.
- **Variant B -- patched-port shim.** `libcapos-posix` exposes only
  `posix_spawn()` with explicit file actions, plus stub `fork()` /
  `execve()` that return -1 / ENOSYS. Each port (dash and successors)
  is patched to translate its fork+dup2+exec sequence into a single
  `posix_spawn()` call with the equivalent file actions.

`posix_spawn()` is the preferred primitive in either variant and gets a
direct mapping to `ProcessSpawner.spawn()`. The choice between Variant
A and Variant B is Open Question §6.
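The four steps of Variant A amount to a small state machine. The sketch below models it on a host, with loud assumptions: `ForkShim`, `FileAction`, and `SpawnRequest` are hypothetical types, the actual `ProcessSpawner.spawn()` call is replaced by returning the request, and an `execve()` with no pending `fork()` is assumed to fail with ENOSYS because capOS has no exec-in-place.

```rust
const ENOSYS: i32 = 38; // Linux numbering, illustration only

/// File actions recorded between fork() and execve().
#[derive(Debug, PartialEq)]
enum FileAction {
    Dup2 { from: i32, to: i32 },
    Close(i32),
}

/// Hypothetical request that would be handed to ProcessSpawner.spawn().
#[derive(Debug, PartialEq)]
struct SpawnRequest {
    path: String,
    actions: Vec<FileAction>,
}

#[derive(Default)]
struct ForkShim {
    /// Some(..) while a fork() is waiting for its execve().
    pending: Option<Vec<FileAction>>,
}

impl ForkShim {
    /// Step 1: record intent and return 0, the "child" pseudo-context.
    fn fork(&mut self) -> i32 {
        self.pending = Some(Vec::new());
        0
    }

    /// Step 2: inside the window, record instead of mutating the fd table.
    /// Returns false when no fork is pending, meaning the caller should
    /// apply the operation to the real fd table.
    fn dup2(&mut self, from: i32, to: i32) -> bool {
        match self.pending.as_mut() {
            Some(actions) => {
                actions.push(FileAction::Dup2 { from, to });
                true
            }
            None => false,
        }
    }

    fn close(&mut self, fd: i32) -> bool {
        match self.pending.as_mut() {
            Some(actions) => {
                actions.push(FileAction::Close(fd));
                true
            }
            None => false,
        }
    }

    /// Step 3: consume the recorded intent and build the spawn request.
    fn execve(&mut self, path: &str) -> Result<SpawnRequest, i32> {
        let actions = self.pending.take().ok_or(ENOSYS)?;
        Ok(SpawnRequest { path: path.to_string(), actions })
    }

    /// Step 4: any call outside the allowlist fails while a fork is pending.
    fn other_syscall(&self) -> Result<(), i32> {
        if self.pending.is_some() { Err(ENOSYS) } else { Ok(()) }
    }
}
```

Under Variant B the same `SpawnRequest` would be built directly by the patched port, and the `pending` state would not exist.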

### Signals

Stubbed. capOS has no signal mechanism today and the cap model disagrees
with ambient asynchronous interrupts.

- `signal()` / `sigaction()` accept the call, store the handler in a
  per-process table, never invoke it. Return success.
- `kill(pid, sig)` returns -1 / EPERM unless the caller has a
  `ProcessHandle` cap for the target -- and even then the only signal
  honoured is `SIGKILL`, which maps to a future `ProcessHandle.kill()`
  (not implemented yet, returns ENOSYS today).
- `pause()` / `sigsuspend()` / `sigwait()` block forever (or with timeout)
  via `sys_cap_enter(0, timeout)`; they never wake from a signal.
- `SIGPIPE` is never delivered. Writes on a closed connection return -1 /
  EPIPE.

This is acceptable for a shell + DNS resolver. Anything that depends on
real signals (job control with Ctrl-Z, Ctrl-C across pipelines, real
`SIGCHLD`) is out of scope for the first port. Job control in the shell
must be reimplemented over typed control caps, not signals.
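A minimal model of the stub behaviour, for illustration only. `SignalTable` and `noop_handler` are hypothetical, errno and signal numbers use the Linux values, and one case the text leaves open is filled with an assumption: a caller that holds a `ProcessHandle` but sends something other than `SIGKILL` is assumed to get EPERM.

```rust
use std::collections::BTreeMap;

type Handler = extern "C" fn(i32);

// Linux numbering, illustration only.
const SIGKILL: i32 = 9;
const EPERM: i32 = 1;
const ENOSYS: i32 = 38;

/// Per-process handler table. Handlers are stored and never invoked.
#[derive(Default)]
struct SignalTable {
    handlers: BTreeMap<i32, Handler>,
}

impl SignalTable {
    /// signal(): accept the call, store the handler, report the previous
    /// one. There is deliberately no code path that calls a handler.
    fn signal(&mut self, sig: i32, handler: Handler) -> Option<Handler> {
        self.handlers.insert(sig, handler)
    }

    /// kill(): EPERM without a ProcessHandle cap for the target. With
    /// one, only SIGKILL is honoured, and that maps to a
    /// ProcessHandle.kill() that does not exist yet, hence ENOSYS.
    fn kill(&self, has_process_handle: bool, sig: i32) -> Result<(), i32> {
        if !has_process_handle || sig != SIGKILL {
            return Err(EPERM); // non-SIGKILL case is an assumption
        }
        Err(ENOSYS)
    }
}

/// Placeholder handler used to exercise the table.
extern "C" fn noop_handler(_sig: i32) {}
```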

### errno convention

Per-thread `errno` cell in TLS owned by `libcapos-posix`. Mapping table
(`libcapos-posix/src/errno_map.rs`):

| capOS `CapError` / `CapException` | POSIX errno |
|---|---|
| `CapError::NotFound`              | `ENOENT` |
| `CapError::PermissionDenied`      | `EACCES` |
| `CapError::Disconnected`          | `ECONNRESET` |
| `CapError::Timeout`               | `ETIMEDOUT` |
| `CapError::ResourceExhausted`     | `ENOMEM` / `EMFILE` (context dependent) |
| `CapError::InvalidArgument`       | `EINVAL` |
| `CapError::WouldBlock`            | `EAGAIN` |
| (fall-through)                    | `EIO` |

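Wrappers always: clear errno, make the call, and on error set errno and
return -1 (int) or NULL (pointer). The error-return convention matches
glibc / musl; clearing errno on entry does not (POSIX library functions
never set errno to 0), so ported code must not rely on it.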
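The mapping table and the wrapper convention can be sketched as one shared function plus a thread-local cell. This is a host-side model, not the contents of `errno_map.rs`: the `Context` enum used to pick between `ENOMEM` and `EMFILE`, the `Other` variant standing in for the fall-through row, and the `wrap` helper are assumptions, and the numeric values follow Linux.

```rust
use std::cell::Cell;

// Linux errno numbering, illustration only.
const ENOENT: i32 = 2;
const EIO: i32 = 5;
const EAGAIN: i32 = 11;
const ENOMEM: i32 = 12;
const EACCES: i32 = 13;
const EINVAL: i32 = 22;
const EMFILE: i32 = 24;
const ECONNRESET: i32 = 104;
const ETIMEDOUT: i32 = 110;

#[derive(Debug, Clone, Copy, PartialEq)]
enum CapError {
    NotFound,
    PermissionDenied,
    Disconnected,
    Timeout,
    ResourceExhausted,
    InvalidArgument,
    WouldBlock,
    Other, // stands in for the fall-through row
}

/// Hypothetical hint for the context-dependent ResourceExhausted row.
#[derive(Debug, Clone, Copy)]
enum Context {
    Memory,
    FdTable,
}

thread_local! {
    /// Per-thread errno cell, as owned by libcapos-posix.
    static ERRNO: Cell<i32> = Cell::new(0);
}

fn errno() -> i32 {
    ERRNO.with(|e| e.get())
}

/// The single shared mapping table.
fn to_errno(err: CapError, ctx: Context) -> i32 {
    match (err, ctx) {
        (CapError::NotFound, _) => ENOENT,
        (CapError::PermissionDenied, _) => EACCES,
        (CapError::Disconnected, _) => ECONNRESET,
        (CapError::Timeout, _) => ETIMEDOUT,
        (CapError::ResourceExhausted, Context::Memory) => ENOMEM,
        (CapError::ResourceExhausted, Context::FdTable) => EMFILE,
        (CapError::InvalidArgument, _) => EINVAL,
        (CapError::WouldBlock, _) => EAGAIN,
        (CapError::Other, _) => EIO,
    }
}

/// Wrapper convention: clear errno, then on error set errno and return -1.
fn wrap(result: Result<i32, CapError>, ctx: Context) -> i32 {
    ERRNO.with(|e| e.set(0));
    match result {
        Ok(value) => value,
        Err(err) => {
            ERRNO.with(|e| e.set(to_errno(err, ctx)));
            -1
        }
    }
}
```

Keeping the table in one function is what makes the mapping auditable: a wrapper cannot invent its own errno without going through `to_errno`.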

### Threading

pthreads -> capOS in-process threading. Substrate already exists in the
kernel: `ThreadSpawner`, `ThreadControl`, `ThreadHandle`, per-thread
FS-base, `ParkSpace`.

Mapping:

- `pthread_create` -> `ThreadSpawner.spawn` + start-routine trampoline.
- `pthread_exit`   -> `ThreadControl.exitThread`.
- `pthread_join`   -> `ThreadHandle.join` (block via `sys_cap_enter`).
- `pthread_self`   -> TLS slot or `ThreadControl.currentId`.
- `pthread_mutex_*` -> ParkSpace-backed mutex (futex-style park / unpark).
- `pthread_cond_*`  -> ParkSpace + bounded waiter queue.
- `pthread_key_*`   -> fixed-size TLS slot table per thread.

This is in scope but **not on the critical path** for the shell or DNS
resolver -- both can run single-threaded for v0. The pthread shim is
deferred to a v1 successor.

## First Port: POSIX Shell

### Candidate survey

| Shell | License | Size | Deps | POSIX coverage | Verdict |
|---|---|---|---|---|---|
| **dash** ([upstream][dash-debian]) | BSD | ~13 kSLOC, ~134 KB | tiny libc subset; no readline; no termcap | Strict POSIX, no extensions | **Recommended primary** |
| **busybox ash** ([upstream][busybox-ash]) | GPL-2.0 | ~8 kSLOC of `shell/ash.c` + busybox infra | Designed for embedded, modular | POSIX + selectable extensions | Heavier framework cost; useful later when capOS wants a coreutils set |
| **toybox toysh** ([upstream][toybox]) | 0BSD | currently incomplete | Designed for self-contained ELF | POSIX + Bash compat target, **not finished** | Skip -- explicitly described upstream as still under development |
| **oksh** ([upstream][oksh]) | Public domain | ~308 KB binary, 0 deps | Optional ncurses for clear-screen only | Korn-shell superset of POSIX | Bigger surface than v0 needs to validate `libcapos-posix` |
| **Custom Rust shell** | n/a | n/a | n/a | n/a | **Reject -- defeats the purpose of porting C.** Native shell already exists at `shell/` (`capos-shell`). |

Recommended primary: **dash**.

Reasons:

1. Smallest established POSIX-strict shell. ~13 kSLOC is small enough for
   the porting team to read the entire codebase.
2. No readline / termcap dependency. The shell talks to whatever fd 0
   gives it. This is exactly what `libcapos-posix` provides through
   `TerminalSession` or `Console`.
3. Strict POSIX means the port does not accidentally validate Bash
   extensions that `libcapos-posix` does not implement.
4. Already proven as a porting target on Linux from Scratch, OpenWrt, and
   Alpine. Patterns for replacing the libc layer (`__syscall`, stubbed
   `sigaction`) are well documented.
5. Debian has used it as `/bin/sh` since Squeeze (2011), so the large
   body of "POSIX shell only" scripts in the wild is already exercised
   against dash.

Open Question §1 below records that the candidate is a recommendation,
not a final decision.

### Required POSIX surface (v0)

What a `dash` instance actually exercises before printing a prompt and
running `ls | grep foo`:

| Group | Calls (minimum set) | Backed by |
|---|---|---|
| Process startup | `_start` shim, `argv`/`envp` parsing, `exit` | `libcapos` `_start`, `sys_exit` |
| Stdio | `read(0,...)`, `write(1,...)`, `write(2,...)` | Console / TerminalSession cap |
| Allocation | `malloc`/`free`/`calloc`/`realloc` | `libcapos` heap |
| String/format | `printf`/`fprintf`/`memcpy`/`strlen`/`strcmp`/`strchr`/`strncpy`/... | `libcapos-posix` string/printf subset |
| File I/O | `open`/`close`/`read`/`write`/`lseek`/`stat`/`fstat`/`access`/`unlink` | Namespace + File caps |
| Directory | `opendir`/`readdir`/`closedir` | Directory cap |
| Pipes | `pipe()`, `dup2()`, `close()` on fds | NEW `Pipe` capability (P1.3) |
| Process | `fork`+`execve` (fork-for-exec only), `posix_spawn`, `wait`/`waitpid` | ProcessSpawner + `ProcessHandle.wait` |
| Env | `getenv`/`setenv`/`putenv` | Per-process env vector in `libcapos-posix`; populated from a future `LaunchParameters` cap when one lands |
| Signals | `signal`/`kill`/`sigaction` (stubs) | Per-process handler table; handlers stored, never delivered |
| Time | `time`/`gettimeofday`/`nanosleep` | Timer cap |
| Misc | `getpid`/`getuid`/`getgid` | Synthetic per-process; uid/gid hardcoded for v0 |

**Critical gap:** `pipe()`. The shell pipeline `ls | grep foo` requires fd 1
of `ls` to feed fd 0 of `grep`. capOS has no pipe capability today. This is
the first-port-blocking item; see Phase P1.3.

### What dash will not get in v0

- Job control (Ctrl-Z, `bg`, `fg`, `&` background): requires real
  `SIGCHLD`/`SIGTSTP`. Skip; documented as out of scope.
- Process groups, sessions, controlling terminals: same reason.
- `trap` for signals other than `EXIT`: handlers stored, never fired.
- `read -t` (timeout): doable via Timer cap; defer to v1.
- `ulimit`: returns -1 / ENOSYS. Quotas are kernel-side capability ledgers,
  not POSIX rlimits.

### Validation smoke

`make run-posix-shell-smoke`:

1. Boot a manifest that grants `dash` a TerminalSession (stdio), a
   read-only Namespace cap rooted at a tiny in-rodata pseudo-fs, a
   ProcessSpawner narrowed to one allowed binary (`ls-shim`), and a
   Timer cap.
2. Pipe a heredoc into stdin: `ls; echo done`.
3. Assert kernel log shows `done` and clean exit.

Stretch goal smoke: `cat foo | grep bar` end-to-end (depends on the pipe
primitive landing).

## First Port: DNS Resolver

### Candidate survey

| Library | License | Source size | Deps | Async style | Verdict |
|---|---|---|---|---|---|
| **musl `res_query`** ([upstream][musl-resolver]) | MIT | ~2 kSLOC for resolver core | Embedded in musl | Synchronous (parallel queries internally) | Available *only if* the build links musl; capOS does not. **Skip.** |
| **c-ares** ([upstream][c-ares]) | MIT, C89 | ~30+ kSLOC, multi-file, configure-driven | POSIX sockets, optional threads | Native async (callbacks + select/poll/event loop) | Largest surface, most mature, most invasive port |
| **dns.c (wahern)** ([upstream][wahern-dns]) | MIT | **single-file C, ~10 kSLOC, no deps** | None -- caller provides socket I/O via three pluggable patterns (pollfd / events / timeout) | Non-blocking, no required callback shape | **Recommended primary** |
| **GNU adns** ([upstream][gnu-adns]) | GPL-2.0+ | Multi-file, ~10-15 kSLOC | POSIX, no event-loop integration | Async, opaque state | License is GPL-2.0+, not BSD/MIT. Skip unless capOS accepts a GPL component in the demo path. |
| **udns** ([upstream][udns]) | LGPL-2.1 | small | POSIX | Async stub-only | LGPL plus older project; skip unless dns.c blows up |
| **SPCDNS** | LGPL | small | encode/decode only, no socket | n/a | Skip -- provides no resolver loop |
| **trust-dns-resolver in Rust** | Apache-2 / MIT | large | Tokio | async | **Reject -- defeats the purpose of porting C.** Native Rust resolver is a separate path. |

Recommended primary: **dns.c** by William Ahern.

Reasons:

1. **Single-file, zero deps.** Drops into the build with a minimal `cc`
   rule. The build avoids configure scripts, pkg-config, optional
   feature matrices, and multi-file build orchestration.
2. **No fixed I/O model.** dns.c is designed around three common methods
   (pollfd, events, timeout). The host adapter plugs capability-backed
   socket I/O without rewriting the resolver core, replacing
   `socket()`/`sendto()`/`recvfrom()`/`poll()` with `libcapos-posix`
   wrappers that return fd-shaped results backed by `UdpSocket` /
   `TcpSocket` caps.
3. MIT license is capOS-compatible.
4. ~10 kSLOC means port review can read it end-to-end.
5. C89, no threading assumption, no global state surprises (resolver
   handle is opaque per-instance) -- fits a single-process v0 design.

Open Question §2 below records that the candidate is a recommendation,
not a final decision.

### Required POSIX surface (v0)

The DNS resolver port exercises a *very* narrow POSIX subset:

| Group | Calls | Backed by |
|---|---|---|
| Stdio (logs only) | `write(2,...)` | Console cap |
| Allocation | `malloc`/`free`/`calloc`/`realloc` | `libcapos` heap |
| Time | `clock_gettime`/`gettimeofday` | Timer cap |
| Sockets (UDP) | `socket(AF_INET, SOCK_DGRAM, 0)`, `sendto`, `recvfrom`, `bind`, `close`, `setsockopt` (subset) | NetworkManager + UdpSocket cap |
| Polling | `poll(fds, nfds, timeout_ms)` | Synthesised: each fd carries its underlying cap; `libcapos-posix` uses `sys_cap_enter(min_complete=1, timeout_ns)` with one CQE per ready fd. No new kernel surface needed for v0 if dns.c uses one fd per query. |
| Resolv config | One in-rodata bounded text blob inlined into `libcapos-posix` (single nameserver entry; v0 ships before any storage cap exists) | No `open` / Namespace cap required for v0 |

No pipes, no fork, no exec, no signals, no `/etc/resolv.conf`-by-path,
no Namespace or File caps required. The DNS resolver is strictly easier
than the shell.
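The synthesised `poll()` row can be illustrated with the part that runs after the ring returns: turning completions into `revents`. This is a sketch under assumptions, not the adapter's code. `Completion` is a hypothetical view of a CQE already matched to its fd, the `POLLIN` / `POLLOUT` values follow Linux, and how capOS encodes an infinite timeout is not stated in the proposal, so the conversion returns `None` for it.

```rust
// Linux values, illustration only.
const POLLIN: i16 = 0x001;
const POLLOUT: i16 = 0x004;

#[derive(Debug, Clone, Copy, PartialEq)]
struct PollFd {
    fd: i32,
    events: i16,
    revents: i16,
}

/// Hypothetical view of one completion, already matched to its fd.
struct Completion {
    fd: i32,
    ready: i16,
}

/// Convert poll()'s millisecond timeout to nanoseconds for the ring wait.
/// A negative timeout means "block indefinitely" in POSIX; None models that.
fn poll_timeout_ns(timeout_ms: i32) -> Option<u64> {
    if timeout_ms < 0 {
        None
    } else {
        Some(timeout_ms as u64 * 1_000_000)
    }
}

/// Fill `revents` from completions and return the number of ready fds.
/// A return of 0 models the timeout case: the wait ended with no CQE.
fn synth_poll(fds: &mut [PollFd], completions: &[Completion]) -> i32 {
    let mut ready_count = 0;
    for pfd in fds.iter_mut() {
        pfd.revents = 0;
        for c in completions {
            if c.fd == pfd.fd {
                // Report only the events the caller asked about.
                pfd.revents |= c.ready & pfd.events;
            }
        }
        if pfd.revents != 0 {
            ready_count += 1;
        }
    }
    ready_count
}
```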

The v0 surface intentionally omits TCP fallback for truncated responses
and intentionally omits any path-based config file. TCP fallback, when
added, would use `socket(SOCK_STREAM)`, `connect`, `send`, `recv`
through the existing `NetworkManager` + `TcpSocket` cap, but only on a
later iteration once the v0 UDP-only smoke is green; see "What dns.c
will not get in v0" below.

**Critical gaps:**

- `UdpSocket` capability. The networking proposal Phase B implements TCP +
  listener only; UDP "is deferred until the userspace network stack or DNS
  work needs it; it is not part of the Telnet Shell Demo contract"
  (`networking-proposal.md`). The resolver port creates the UDP path; it
  does not consume an existing one.
- The future `Resolver` cap concept (in `service-architecture-proposal.md`
  "DNS resolver -- consumes a `UdpSocket`, exports `Resolver`") is a target
  once the UDP path exists. The first port produces the exported shape.

### What dns.c will not get in v0

- DNSSEC validation: dns.c supports it, depending on `/etc/resolv.conf`
  trust anchor config. Defer.
- TCP fallback for truncated responses: implement on a second iteration
  once the TCP capability path is reusable.
- `mDNS`: out of scope.
- Recursive mode (acting as a recursive resolver): out of scope; v0
  ships stub-only.

### Validation smoke

`make run-posix-dns-smoke`:

1. Boot a manifest that grants the resolver process a `NetworkManager`
   (or future narrowed `UdpSocket`-only authority), a Console cap, and
   a Timer cap. The single-nameserver resolv config is the in-rodata
   bounded text blob compiled into `libcapos-posix`; no Namespace or
   File cap is needed for v0.
2. The resolver opens a UDP socket, sends a query for a known A record
   to QEMU's user-mode 10.0.2.3 (slirp's built-in DNS) or to an in-host
   test resolver.
3. Resolver prints the resolved IPv4 address.
4. Assert kernel log line matches.
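For the in-rodata config from step 1, a bounded parser for a single `nameserver` entry is enough. The sketch below is an assumption-laden illustration: the blob text, the 256-byte bound, and the `parse_nameserver` name are all hypothetical, and it uses `std::net` so it runs on a host. The address is the slirp DNS address named in step 2.

```rust
use std::net::Ipv4Addr;

/// Hypothetical in-rodata blob with a single nameserver entry.
const RESOLV_BLOB: &str = "# capOS v0 resolver config\nnameserver 10.0.2.3\n";

/// Assumed upper bound on the blob size; the proposal only says "bounded".
const MAX_BLOB_LEN: usize = 256;

/// Return the first `nameserver` address in the blob, or None if the blob
/// is oversized, has no nameserver line, or the address does not parse.
fn parse_nameserver(blob: &str) -> Option<Ipv4Addr> {
    if blob.len() > MAX_BLOB_LEN {
        return None;
    }
    for line in blob.lines() {
        let line = line.trim();
        // Skip blank lines and comments.
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let mut parts = line.split_whitespace();
        if parts.next() == Some("nameserver") {
            return parts.next()?.parse().ok();
        }
    }
    None
}
```

Because the blob is compiled in, a parse failure is a build-time mistake rather than a runtime condition; returning `None` keeps the sketch fail-closed either way.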

## Trade-offs and Ordering

### Smallest-deps comparison

| Port | C surface needed | New capOS infrastructure required | Difficulty |
|---|---|---|---|
| **DNS resolver (dns.c)** | malloc, time, socket subset, write(2), poll-equivalent | UDP socket cap + NetworkManager exposure of UDP; otherwise reuses Phase B TCP path infra | **Smaller** -- strictly additive (UDP is missing today but the kernel-side smoltcp stack supports it) |
| **POSIX shell (dash)** | malloc, full stdio, file I/O, directory iteration, **pipe()**, fork-for-exec, exec, wait, env, time, signals (stub) | Pipe primitive (new), Namespace+File cap surface, ProcessSpawner sidecar work to honour fd-action grants, env-vector handoff | **Larger** -- touches storage / IPC / process surfaces |

### Which blocks which

- Both ports can run in parallel at the `libcapos` / `libcapos-posix`
  layer level: beyond the shared allocation, stdio, and time wrappers,
  each pulls a disjoint subset of POSIX surfaces.
- DNS resolver blocks on a new capOS surface (UDP cap exposure) but does
  not block on `pipe()`, `fork()`, or `exec()`.
- Shell blocks on (in order of probable cost): pipe primitive,
  ProcessSpawner fd-action support for stdin / stdout redirection,
  Namespace+File cap availability, env vector / `LaunchParameters`.
- The library substrate (`libcapos` staticlib + `libcapos-posix` scaffold)
  blocks both. Once the substrate exists, the two ports proceed in
  parallel.

### Recommended sequence

1. **libcapos staticlib v0** (Phase P1.1). The thin Rust `.a` with
   `cap_call`, `capset_get`, `sys_exit`, `sys_cap_enter`, heap. Plus a "C
   hello world" smoke that calls `console_write_line()` (mirrors the
   userspace-binaries proposal "Future Phase: libcapos for C"). This phase
   is the prerequisite for both P1.2 and P1.3.
2. **libcapos-posix scaffold** -- fd table, errno cell, stdio wrappers for
   fd 0/1/2, stub signals, `_start` glue that registers `argv` / `envp`
   from `LaunchParameters` (or empty arrays if that surface has not
   landed), basic `malloc`/`free` re-export.
3. **dns.c port** (Phase P1.2). Library-layer work in P1.2 can overlap
   with library-layer work in P1.3, but both phases add interfaces to
   `schema/capos.capnp` and must serialise on the shared schema serial
   surface per `docs/plans/README.md` Concurrency Notes; the schema half
   of either phase cannot run concurrently with the schema half of the
   other.
4. **dash port** (P1.3 lays the pipe + fork-for-exec primitives; the
   actual dash vendoring is a successor task that also depends on
   Namespace+File caps). The same schema serial-surface constraint
   applies to P1.3.

### Critical path

The DNS resolver is the smaller-deps first slice **only because** of the
shell's fork / pipe / file dependencies. The shell-first ordering is
viable, but it requires the pipe cap design + implementation plus
Namespace + File caps (Phase 2 of `storage-and-naming-proposal.md`)
ahead of the dash port. Both prerequisites are sizeable. The DNS
resolver remains the faster proof of "POSIX adapter actually adapts
something that was not written for capOS."

### What this slice does not promise

- Not a path to running glibc-built binaries unchanged. Both ports are
  sources-on-disk recompiled against `libcapos-posix`. Binary
  compatibility with Linux ELFs is not in scope.
- Not job control, not signals, not a full POSIX session/pgrp model.
- Not a libc -- the POSIX surface ships *just enough* for dash and dns.c.
  `printf` family lands in `libcapos-posix` only because both ports need
  it; this is not a `<stdio.h>` for general use.
- Not a reason to skip the native Rust paths -- `capos-shell` (Rust
  `shell/` crate) remains the default capOS shell. dash is for porting
  validation; it is not the system shell.
- Not a foundation for hosted C++. C++ requires explicit ABI decisions
  tracked separately in `docs/proposals/userspace-binaries-proposal.md`.

## Phase Decomposition

Phases are dispatch-ready. P1.1 must land before P1.2 or P1.3 begin. P1.2
and P1.3 can overlap at the library and kernel-cap layer, but both add
interfaces to `schema/capos.capnp` and must serialise on the shared
schema serial surface per `docs/plans/README.md` Concurrency Notes; the
schema halves cannot run concurrently.

### Phase P1.1 -- libcapos C-substrate v0 + C hello-world smoke

- New crate `libcapos/` with `crate-type = ["staticlib"]` and the C
  primitive surface (`cap_call`, `capset_get`, `capset_iter`, `sys_exit`,
  `sys_cap_enter`, heap).
- New header tree under `include/capos/`.
- New `c-build` Make helper that invokes `clang
  --target=x86_64-unknown-none-elf -nostdlib -static` and links
  `libcapos.a`. `capos-rt`'s `_start` remains the entry point and calls
  a C `main()` shim.
- New demo `demos/c-hello/`: single `.c` file calling
  `console_write_line()`.
- New manifest `system-c-hello.cue`.
- No POSIX surface, no errno, no pthreads. Heap re-exports the `capos-rt`
  fixed allocator.
- Validation: `make run-c-hello` boots; the C binary prints
  `hello from C` and exits cleanly with code 0.

This phase is the strict prerequisite for the rest of the track.

### Phase P1.2 -- UDP cap surface + dns.c stub resolver smoke

- Schema additions to `schema/capos.capnp`: new `UdpSocket` interface +
  `NetworkManager.createUdpSocket` method (small additive change).
- Kernel: extend `kernel/src/cap/network.rs` with the UDP path mirroring
  the existing TCP path, and add UDP RX demux on the existing
  scheduler-polled smoltcp runtime in `kernel/src/virtio.rs`.
- Userspace: new typed `UdpSocketClient` in `capos-rt/src/client.rs`.
- New crate `libcapos-posix/` with the minimal
  `socket`/`sendto`/`recvfrom`/`poll` surface for one UDP fd at a time.
- Vendored dns.c under `vendor/dns-c-wahern/` (single `.c` plus header).
- New demo `demos/posix-dns-resolver/`.
- New manifest `system-posix-dns.cue`; new Makefile target
  `run-posix-dns-smoke`.
- Validation: end-to-end "boot capOS, launch resolver, print
  `resolved <name> -> <addr>`". A single-fd resolver with a single
  in-flight query is sufficient for v0.
- Schema serial-surface coordination: queues on the shared
  `schema/capos.capnp` serial surface per `docs/plans/README.md`
  Concurrency Notes. Must not run concurrently with another
  schema-touching plan.

Depends on Phase P1.1.
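
The "one UDP fd at a time" constraint can be modelled on the host before any kernel work lands. The following is a minimal sketch, not the planned implementation: the `UdpSocketCap` trait, the `EchoUdp` test double, the `UdpShim` type, the fixed fd number 3, and the choice of `EMFILE` / `EAGAIN` (with Linux x86_64 numeric values) are all assumptions made for illustration. The method names follow the `sendTo` / `recvFrom` shape listed under Open Questions.

```rust
use std::collections::VecDeque;

type Ipv4 = [u8; 4];

// Assumed errno values (Linux x86_64 numbering); capOS may choose others.
const EBADF: i32 = 9;
const EAGAIN: i32 = 11;
const EMFILE: i32 = 24;

/// Hypothetical Rust-side view of the proposed `UdpSocket` cap.
trait UdpSocketCap {
    fn send_to(&mut self, addr: Ipv4, port: u16, data: &[u8]) -> usize;
    /// `None` models a timeout completion: nothing arrived in time.
    fn recv_from(&mut self, max_len: usize) -> Option<(Ipv4, u16, Vec<u8>)>;
}

/// Test double: every datagram sent is queued as if the peer echoed it.
#[derive(Default)]
struct EchoUdp {
    inbox: VecDeque<(Ipv4, u16, Vec<u8>)>,
}

impl UdpSocketCap for EchoUdp {
    fn send_to(&mut self, addr: Ipv4, port: u16, data: &[u8]) -> usize {
        self.inbox.push_back((addr, port, data.to_vec()));
        data.len()
    }
    fn recv_from(&mut self, max_len: usize) -> Option<(Ipv4, u16, Vec<u8>)> {
        let (addr, port, mut data) = self.inbox.pop_front()?;
        data.truncate(max_len); // datagram semantics: excess bytes are dropped
        Some((addr, port, data))
    }
}

const UDP_FD: i32 = 3; // assumed: first fd after stdin / stdout / stderr

/// v0 shim state: at most one UDP socket per process.
struct UdpShim<S: UdpSocketCap> {
    slot: Option<S>,
}

impl<S: UdpSocketCap> UdpShim<S> {
    /// `socket()`: binds an already-held cap to the single UDP fd.
    fn socket(&mut self, cap: S) -> Result<i32, i32> {
        if self.slot.is_some() {
            return Err(EMFILE);
        }
        self.slot = Some(cap);
        Ok(UDP_FD)
    }
    fn sendto(&mut self, fd: i32, addr: Ipv4, port: u16, data: &[u8]) -> Result<usize, i32> {
        let sock = self.lookup(fd)?;
        Ok(sock.send_to(addr, port, data))
    }
    fn recvfrom(&mut self, fd: i32, max_len: usize) -> Result<(Ipv4, u16, Vec<u8>), i32> {
        self.lookup(fd)?.recv_from(max_len).ok_or(EAGAIN)
    }
    fn close(&mut self, fd: i32) -> Result<(), i32> {
        self.lookup(fd)?;
        self.slot = None;
        Ok(())
    }
    fn lookup(&mut self, fd: i32) -> Result<&mut S, i32> {
        if fd != UDP_FD {
            return Err(EBADF);
        }
        self.slot.as_mut().ok_or(EBADF)
    }
}
```

The shim never creates a socket on its own: `socket()` only binds a cap the caller already holds, so a process without the UDP cap has nothing to pass in.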

### Phase P1.3 -- Pipe capability + fork-for-exec scaffolding

- Schema additions to `schema/capos.capnp`: new `Pipe` interface
  (small additive change, distinct from UDP and `LaunchParameters`
  surfaces). EOF semantics on close.
- Kernel: new `kernel/src/cap/pipe.rs` -- bounded SPSC byte ring backed
  by a kernel-allocated MemoryObject page.
- Kernel: extend `kernel/src/cap/process_spawner.rs` so spawn grants can
  mint `Pipe` halves and bind them to the child's standard fds.
- Userspace: new `PipeClient` in `capos-rt/src/client.rs`.
- `libcapos-posix` extensions for `pipe`/`dup2`/`close`.
- `libcapos-posix` extensions for `fork`/`execve`/`waitpid` (TLS "next
  exec is the real spawn" state machine, ProcessSpawner integration).
- New demo `demos/posix-pipe-shim/`: a minimal C program that calls
  `pipe()`, `posix_spawn()`s a child whose stdout is the write end, then
  reads from the read end in the parent and prints the result. Plus a
  second smoke that exercises the fork-for-exec path selected under Open
  Question 6 (either inter-call recording of `dup2`/`close` as
  `posix_spawn` file actions, or a patched-port variant), proving the
  path dash pipelines actually take.
- New manifest `system-posix-pipe.cue`; new Makefile target
  `run-posix-pipe-smoke`.
- Validation: end-to-end pipe smoke covers both the `posix_spawn`-direct
  path and the fork-for-exec path selected under Open Question 6,
  proving the primitive shell pipelines need before vendoring dash.
- Schema serial-surface coordination: queues on the shared
  `schema/capos.capnp` serial surface per `docs/plans/README.md`
  Concurrency Notes. Must not run concurrently with P1.2 if both want
  the schema serial surface.

Depends on Phase P1.1.
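
The pipe's observable behaviour (bounded buffer, short writes when full, EOF only after the writer closes and the buffer drains) can be pinned down with a single-threaded host model. This is a sketch under assumptions: `PipeRing` and `ReadResult` are hypothetical names, and the model ignores the cross-process synchronisation and MemoryObject backing that the kernel implementation would need.

```rust
/// Host-side model of a bounded byte pipe with EOF-on-close.
/// Hypothetical names; the kernel ring would add cross-process
/// synchronisation that this single-threaded model leaves out.
struct PipeRing {
    buf: Vec<u8>,
    head: usize, // index of the next byte to read
    len: usize,  // number of bytes currently stored
    writer_closed: bool,
}

#[derive(Debug, PartialEq)]
enum ReadResult {
    Data(usize), // bytes copied into the caller's buffer
    WouldBlock,  // empty, but the writer is still open
    Eof,         // empty and the writer has closed
}

impl PipeRing {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "pipe capacity must be non-zero");
        PipeRing { buf: vec![0; capacity], head: 0, len: 0, writer_closed: false }
    }

    /// Copies as many bytes as fit and returns the count.
    /// A return of 0 for non-empty input means the ring is full.
    fn write(&mut self, data: &[u8]) -> usize {
        assert!(!self.writer_closed, "write after close");
        let cap = self.buf.len();
        let n = data.len().min(cap - self.len);
        for (i, &byte) in data[..n].iter().enumerate() {
            let idx = (self.head + self.len + i) % cap;
            self.buf[idx] = byte;
        }
        self.len += n;
        n
    }

    /// Drains up to `out.len()` bytes in FIFO order.
    fn read(&mut self, out: &mut [u8]) -> ReadResult {
        if self.len == 0 {
            return if self.writer_closed {
                ReadResult::Eof
            } else {
                ReadResult::WouldBlock
            };
        }
        let cap = self.buf.len();
        let n = out.len().min(self.len);
        for slot in out[..n].iter_mut() {
            *slot = self.buf[self.head];
            self.head = (self.head + 1) % cap;
        }
        self.len -= n;
        ReadResult::Data(n)
    }

    /// Closing the write half; buffered bytes remain readable.
    fn close_writer(&mut self) {
        self.writer_closed = true;
    }
}
```

The property worth testing is the ordering: bytes written before `close_writer` are still delivered, and `Eof` appears only once the ring is empty. That is the behaviour a shell pipeline relies on when the producer exits first.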

The dash vendoring + full file I/O surface is a successor task that
*also* depends on Namespace + File cap surface (storage Phase 2), which
is not yet started.

Recommended dispatch ordering: P1.1 -> (P1.2 alternating with P1.3 on
the schema serial surface) -> shell-port follow-on once Namespace + File
caps land.

## Trust Boundaries

| Boundary | Native capOS service | POSIX-shaped C binary on capOS |
|---|---|---|
| Authority source | Process CapSet | Process CapSet projected through `libcapos-posix` fd table |
| Memory isolation | Page tables | Page tables (no wasm-style sandbox; libc has no extra runtime check) |
| Code integrity | W^X + NX | W^X + NX |
| Cap forgery | Kernel-owned `CapTable` | Same; the fd table is per-process userspace state, not authority |
| Resource limits | Kernel quotas | Kernel quotas; `getrlimit` / `setrlimit` (the calls behind `ulimit`) return ENOSYS |
| Side channels | Hardware-level (Spectre etc.) | Same hardware level |

A POSIX binary on capOS is more constrained than on Linux, not less. The
adapter provides familiar function signatures, not familiar authority.
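
The "fd table is state, not authority" row can be illustrated with a small model. In this sketch an fd is only an index into caps the process already holds, so an fd that was never installed resolves to nothing. `Cap`, `FdTable`, the three variants, and the Linux-numbered `EBADF` are hypothetical stand-ins; the real table would hold typed capability clients and issue a cap call where the sketch returns a byte count.

```rust
use std::collections::BTreeMap;

const EBADF: i32 = 9; // assumed Linux x86_64 numbering

/// Hypothetical stand-in for a typed capability the process holds.
/// The `u32` models a slot in the kernel-owned `CapTable`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Cap {
    Console(u32),
    PipeReader(u32),
    PipeWriter(u32),
}

/// Per-process, userspace-only fd table. It grants nothing by itself:
/// every entry refers to a cap that was already in the CapSet.
struct FdTable {
    slots: BTreeMap<i32, Cap>,
}

impl FdTable {
    fn new() -> Self {
        FdTable { slots: BTreeMap::new() }
    }

    /// Bind a held cap to the lowest free fd, as POSIX `open`/`dup` do.
    fn install(&mut self, cap: Cap) -> i32 {
        let mut fd = 0;
        while self.slots.contains_key(&fd) {
            fd += 1;
        }
        self.slots.insert(fd, cap);
        fd
    }

    fn lookup(&self, fd: i32) -> Result<Cap, i32> {
        self.slots.get(&fd).copied().ok_or(EBADF)
    }

    /// `write(2)` shape: fails closed when the fd has no backing cap
    /// or the cap is not writable.
    fn write(&self, fd: i32, data: &[u8]) -> Result<usize, i32> {
        match self.lookup(fd)? {
            // A real shim would issue a cap call here.
            Cap::Console(_) | Cap::PipeWriter(_) => Ok(data.len()),
            Cap::PipeReader(_) => Err(EBADF),
        }
    }

    fn close(&mut self, fd: i32) -> Result<(), i32> {
        self.slots.remove(&fd).map(|_| ()).ok_or(EBADF)
    }
}
```

Because `install` is the only way to populate the table and it takes a cap by value, there is no code path that turns an integer into authority.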

## Validation

The first ports are not complete until they have QEMU evidence:

- A POSIX binary prints through a granted Console / TerminalSession.
- The same binary cannot `write()` to an fd it was not granted, cannot
  `open()` a path outside its preopened namespaces, and cannot call an
  unimplemented POSIX function without receiving `ENOSYS`.
- A missing or wrong-interface cap lookup returns the documented errno
  (not a host-side panic, not silent success).
- An owned result cap is released deterministically when the binary
  exits.
- Each demo binary exits cleanly and does not wedge the kernel.

Host tests should cover errno mapping and the per-process fd table once
those pieces are pure enough to test outside QEMU. Do not claim "POSIX
adapter works" from host tests alone; the useful behavior is
authority-shaped POSIX execution in capOS.
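
As an example of what a host test for the errno mapping could look like, here is a minimal sketch of a typed error with a single bidirectional mapping at the C boundary. `PosixError`, its variants, and `to_c_return` are hypothetical names, and the numeric values follow Linux x86_64 numbering as an assumption; the proposal does not yet fix either.

```rust
/// Typed error used inside the adapter. Hypothetical variant set.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PosixError {
    NoEntry,
    BadFd,
    Again,
    Invalid,
    NoSys,
}

impl PosixError {
    /// Every variant, so the reverse mapping cannot drift from the
    /// forward mapping.
    const ALL: [PosixError; 5] = [
        PosixError::NoEntry,
        PosixError::BadFd,
        PosixError::Again,
        PosixError::Invalid,
        PosixError::NoSys,
    ];

    /// The one place where typed errors become C `int` values.
    /// Values assume Linux x86_64 numbering.
    const fn to_errno(self) -> i32 {
        match self {
            PosixError::NoEntry => 2,  // ENOENT
            PosixError::BadFd => 9,    // EBADF
            PosixError::Again => 11,   // EAGAIN
            PosixError::Invalid => 22, // EINVAL
            PosixError::NoSys => 38,   // ENOSYS
        }
    }

    /// Reverse mapping, derived from `to_errno`; unknown values map
    /// to `None` instead of being passed through.
    fn from_errno(raw: i32) -> Option<Self> {
        Self::ALL.iter().copied().find(|e| e.to_errno() == raw)
    }
}

/// C-boundary helper: collapse a typed result into the POSIX
/// "return -1 and set errno" convention. `errno_cell` stands in for
/// the per-thread errno slot.
fn to_c_return(result: Result<usize, PosixError>, errno_cell: &mut i32) -> isize {
    match result {
        Ok(n) => n as isize,
        Err(e) => {
            *errno_cell = e.to_errno();
            -1
        }
    }
}
```

Deriving `from_errno` from `to_errno` means a round-trip test over `ALL` is enough to catch an unmapped or duplicated value. On success `to_c_return` leaves the errno cell untouched, matching the usual POSIX convention.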

## Open Questions

The following design decisions are documented as open questions because
the planning phase recommends an answer but has not yet committed to one.

1. **POSIX shell candidate.** Recommended: **dash 0.5.13.x**, vendored at
   a pinned tag under `vendor/dash/`. Alternatives: busybox `ash`
   (heavier framework cost), oksh (ksh-superset, larger surface), toysh
   (incomplete), custom Rust shell (defeats the purpose of porting C).
   **Working answer:** dash. Confirm or pick another before P1.3
   successor work begins.
2. **DNS resolver candidate.** Recommended: **dns.c (wahern)** as a
   single-file MIT C library with no required I/O model. Alternatives:
   c-ares (~3x larger, configure-driven, more invasive port), GNU adns
   (GPL-2.0+ -- license question), musl `res_query` (requires linking
   musl, rejected), pure-Rust trust-dns (defeats the C-port purpose).
   **Working answer:** dns.c. Confirm or pick another before P1.2
   begins.
3. **libcapos versioning and naming.** The C library is **`libcapos`**
   (mirroring the Rust `capos-rt`); that name is settled. Open question:
   should the POSIX layer be **`libcapos-posix`** (current
   recommendation), or a different name that avoids a collision with any
   Rust-side framework name? **Working answer:** keep `libcapos-posix`,
   pending confirmation that no Rust framework will reuse the
   identifier.
4. **POSIX errno representation.** The C ABI requires an `int` errno per
   thread. The Rust internals can either use a typed `enum` mapped to
   `int` at the boundary, or use raw `i32` throughout. Recommended:
   typed Rust error type with one bidirectional mapping at the C
   boundary, so internal callers cannot accidentally invent unmapped
   values. **Working answer:** typed Rust error internally, `int` at
   the C ABI. Confirm before P1.2 begins.
5. **File descriptor table location.** Recommended: per-process userspace
   state inside `libcapos-posix`, with the kernel knowing nothing about
   fds. Alternative: a kernel-side fd table (closer to Linux). The
   userspace location preserves the property that capOS authority is the
   capability table; a kernel fd table would duplicate authority.
   **Working answer:** per-process userspace state. Confirm before P1.2
   begins.
6. **Fork policy.** Confirm "fork-for-exec only" semantics. Real `fork()`
   is rejected. The shim turns `fork()` + `execve()` into
   `posix_spawn()`. Any `fork()` not followed by `execve()` returns -1 /
   ENOSYS on the next non-trivial syscall. The shell-pipeline pattern
   `fork()` -> `dup2()`/`close()` to wire stdin/stdout to a pipe end ->
   `execve()` is the most common shape that the strict fork-for-exec
   policy breaks; dash uses exactly this pattern for `cmd1 | cmd2`. To
   keep that pattern working, the shim must either (a) record `dup2` /
   `close` calls between `fork()` and `execve()` as `posix_spawn` file
   actions and apply them to the spawn, or (b) require the port to be
   patched to call `posix_spawn` with explicit file actions. P1.3 must
   pick one before vendoring dash. Confirm before P1.3 begins.
7. **fd 0 backing for the shell.** The natural mapping is the
   `TerminalSession` cap (read line + cooked-mode line discipline
   already exists in kernel and migrates to userspace at networking
   Phase C). For the DNS resolver fd 0 is unused and stays unmapped.
   Confirm `TerminalSession` is the canonical fd-0 backing.
8. **UDP cap surface scope.** Minimum:
   `NetworkManager.createUdpSocket(localPort?) -> socketIndex`,
   `UdpSocket.sendTo(addr, port, data) -> bytesSent`,
   `UdpSocket.recvFrom(maxLen) -> (addr, port, data)`,
   `UdpSocket.close()`. Same blocking model as TCP `accept` / `recv`
   (CQE on completion or timeout). Confirm shape, especially whether
   `recvFrom` should be readiness-based instead of blocking-with-timeout.
9. **Pipe cap design.** Recommended: kernel-allocated bounded SPSC ring
   (page-sized) with EOF on close, exposed as two cap halves
   (`PipeReader`, `PipeWriter`) minted by ProcessSpawner. Alternative:
   shared MemoryObject + userspace ring (less kernel work, but harder
   to make EOF safe across process exits). Confirm before P1.3 begins.
10. **argv / envp source.** This proposal assumes a future
    `LaunchParameters` cap delivers argv / envp through a typed cap.
    Until that cap lands, `libcapos-posix` can carry argv / envp via a
    fixed well-known cap or rodata blob. Confirm gate-on-`LaunchParameters`
    versus ship-stub.
11. **Linker / toolchain for C consumers.** Recommended: `clang
    --target=x86_64-unknown-none-elf -nostdlib -static`, link against
    `libcapos.a` (and optionally `libcapos-posix.a`), reuse the existing
    `capos-rt` linker script. Confirm clang vs gcc and whether the
    track ships a shared `cc-glue` Cargo crate or a Make rule invoking
    `cc` directly.
12. **Vendoring policy.** In-tree `vendor/dash/`,
    `vendor/dns-c-wahern/` versus out-of-tree submodule versus separate
    repo. **Working answer:** in-tree vendoring with pinned tags,
    mirroring the planned `vendor/piccolo-no_std/` shape from the Lua
    track.
13. **Audit / measure-mode interaction.** The `libcapos-posix` wrappers
    must not break measure mode (the `measure` feature). Most wrappers
    only call `libcapos`, which only calls `capos-rt`, which is already
    measure-mode-clean, so this should be free; confirm whether the
    track adds a `make run-measure` smoke for one `libcapos-posix`
    binary as a regression gate.
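
To make option (a) of the fork-policy question concrete, the sketch below models the recording state machine: `fork()` starts a recording, `dup2` / `close` append file actions, and `execve()` turns the recording into a single spawn request. All names (`ForkShim`, `FileAction`, `SpawnRequest`, `other_syscall`) are hypothetical, the `ENOSYS` value assumes Linux numbering, and rejecting `execve()` without a preceding `fork()` is a simplification of this sketch. How control returns to the parent's branch after the spawn is out of scope here.

```rust
const ENOSYS: i32 = 38; // assumed Linux x86_64 numbering

#[derive(Debug, Clone, PartialEq)]
enum FileAction {
    Dup2 { from: i32, to: i32 },
    Close(i32),
}

/// What the shim would hand to the spawner once `execve()` arrives.
#[derive(Debug, PartialEq)]
struct SpawnRequest {
    path: String,
    actions: Vec<FileAction>,
}

enum ForkState {
    Idle,
    Recording(Vec<FileAction>),
}

struct ForkShim {
    state: ForkState,
}

impl ForkShim {
    fn new() -> Self {
        ForkShim { state: ForkState::Idle }
    }

    /// `fork()`: no process is created; the shim starts recording and
    /// returns 0 so the caller runs its "child" branch.
    fn fork(&mut self) -> Result<i32, i32> {
        if matches!(self.state, ForkState::Recording(_)) {
            return Err(ENOSYS); // nested fork is not modelled
        }
        self.state = ForkState::Recording(Vec::new());
        Ok(0)
    }

    /// While recording, `dup2` is deferred into the spawn's file actions.
    /// Outside a recording, a real shim would update the live fd table.
    fn dup2(&mut self, from: i32, to: i32) -> Result<i32, i32> {
        if let ForkState::Recording(actions) = &mut self.state {
            actions.push(FileAction::Dup2 { from, to });
        }
        Ok(to)
    }

    fn close(&mut self, fd: i32) -> Result<(), i32> {
        if let ForkState::Recording(actions) = &mut self.state {
            actions.push(FileAction::Close(fd));
        }
        Ok(())
    }

    /// `execve()`: the recorded actions become one spawn request.
    fn execve(&mut self, path: &str) -> Result<SpawnRequest, i32> {
        match std::mem::replace(&mut self.state, ForkState::Idle) {
            ForkState::Recording(actions) => Ok(SpawnRequest {
                path: path.to_string(),
                actions,
            }),
            ForkState::Idle => Err(ENOSYS), // exec without fork: not modelled
        }
    }

    /// Any other non-trivial call between `fork()` and `execve()`
    /// aborts the recording and fails with ENOSYS.
    fn other_syscall(&mut self) -> Result<(), i32> {
        if matches!(self.state, ForkState::Recording(_)) {
            self.state = ForkState::Idle;
            return Err(ENOSYS);
        }
        Ok(())
    }
}
```

For the `cmd1 | cmd2` pattern, the writer side would record `Dup2 { from: <pipe write end>, to: 1 }` followed by `Close(<pipe write end>)`, which corresponds to the `posix_spawn` file actions that option (b) would require the port to spell out by hand.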

## Relationship to Other Proposals

- **[Userspace Binaries](userspace-binaries-proposal.md)** owns the
  broader native-binary, language, and POSIX-adapter roadmap. This
  proposal supersedes Part 4 of that proposal with the full POSIX adapter
  design.
- **[Programming Languages](../programming-languages.md)** is the
  reader-facing summary of language support. Its C and POSIX rows will
  cross-link this proposal once the libcapos C-substrate v0 task lands
  the corresponding row updates; until then, this proposal stands as
  the long-form design source.
- **[Networking](networking-proposal.md)** defines `NetworkManager`,
  `TcpListener`, and `TcpSocket` and defers UDP. The DNS resolver port
  in Phase P1.2 adds the `UdpSocket` cap surface; the TCP cap surface
  is reused unchanged.
- **[Storage and Naming](storage-and-naming-proposal.md)** defines the
  `Directory` / `File` / `Store` / `Namespace` surfaces that the shell
  port consumes. Phase 2/3 of that proposal gates the dash file I/O
  surface.
- **[Service Architecture](service-architecture-proposal.md)** defines
  the future `Resolver` cap that the resolver port eventually exports.
- **[Shell](shell-proposal.md)** covers the native `capos-shell`. The
  POSIX shell port is for porting validation and does not replace
  `capos-shell`.
- **[WASI Host Adapter](wasi-host-adapter-proposal.md)** is the
  parallel untrusted-portable execution path. The POSIX adapter targets
  trusted source-recompiled C; the WASI adapter targets sandboxed wasm
  modules. Both share the per-process fd-table and per-import authority
  pattern.
- **[Lua Scripting](lua-scripting-proposal.md)** is the
  capability-scoped trusted-script path; PUC Lua's native build assumes
  a C substrate, so it eventually consumes `libcapos`.

[dash-debian]: https://packages.debian.org/sid/dash
[busybox-ash]: https://github.com/brgl/busybox/blob/master/shell/ash.c
[oksh]: https://github.com/ibara/oksh
[toybox]: https://landley.net/toybox/about.html
[c-ares]: https://c-ares.org/
[wahern-dns]: https://github.com/wahern/dns
[gnu-adns]: https://www.gnu.org/software/adns/
[musl-resolver]: https://git.musl-libc.org/cgit/musl/commit/?id=51d4669fb97782f6a66606da852b5afd49a08001
[udns]: https://www.corpit.ru/mjt/udns.html
