Proposal: POSIX fork/execve Full-fd-table Inheritance

Make the capOS POSIX fork+execve recording shim inherit the parent’s full live fd table by default, honoring close-on-exec, so unmodified POSIX software (the dash port is the headline case) gets working stdin/stdout/stderr and an inherited cwd in its children without the application explicitly dup2-ing every descriptor. This reverses the v0 explicit-grant-only default, which is the inverse of real POSIX semantics, while keeping the capability model’s no-ambient-authority guarantee.

Why this is needed

capOS has no real fork (no address-space copy, no shared open-file descriptions). fork+execve is emulated by a recording shim (libcapos-posix/src/process.rs): fork() opens a recording window and returns 0; dup2/close between fork and execve are recorded as deferred fd actions; execve() replays them against a virtual child fd-view and forwards the resulting fds as CapGrants through ProcessSpawner.spawn. The child reconstructs its fd table from the named stdio_<N> grants (libcapos-posix/src/fd.rs inherit_stdio_grants).
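The recording window described above can be pictured with a small model. This is a minimal sketch, not the code in libcapos-posix/src/process.rs: the `Recorder` type, its method names, and the `bool`/`Option` return conventions are all assumptions made for illustration. It only shows the shape of the contract, in which fd edits between fork() and execve() are deferred and replayed later.

```rust
/// Deferred fd action recorded between fork() and execve().
/// (Illustrative type; the real shim's representation may differ.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FdAction {
    Dup2 { src: i32, dst: i32 },
    Close { fd: i32 },
}

/// Hypothetical recording window: `Some` while a fork() is pending.
#[derive(Default)]
pub struct Recorder {
    window: Option<Vec<FdAction>>,
}

impl Recorder {
    /// Opens the recording window and returns 0, as the child branch would see.
    pub fn fork(&mut self) -> i32 {
        self.window = Some(Vec::new());
        0
    }

    /// Records a dup2 if a window is open; returns false when there is no
    /// window, in which case the caller would apply the dup2 immediately.
    pub fn dup2(&mut self, src: i32, dst: i32) -> bool {
        match self.window.as_mut() {
            Some(actions) => {
                actions.push(FdAction::Dup2 { src, dst });
                true
            }
            None => false,
        }
    }

    /// Records a close under the same rule as `dup2`.
    pub fn close(&mut self, fd: i32) -> bool {
        match self.window.as_mut() {
            Some(actions) => {
                actions.push(FdAction::Close { fd });
                true
            }
            None => false,
        }
    }

    /// Closes the window and hands back the recorded actions for replay.
    /// Returns None when execve() is called without a prior fork().
    pub fn execve(&mut self) -> Option<Vec<FdAction>> {
        self.window.take()
    }
}
```

In this model an execve() with no open window yields `None`, which is one way to picture the "exec without prior fork" limitation discussed under the per-app patch stance.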

The v0 contract is explicit-grant-only: in spawn_path_with_actions, only fd slots a recorded dup2/close touched become grants; untouched live slots are deliberately not inherited (the touched array gate). This is the inverse of POSIX, where a child inherits the parent’s entire fd table across fork+execve – every descriptor not marked O_CLOEXEC/FD_CLOEXEC – sharing the underlying open file descriptions.

The consequence is decisive for arbitrary POSIX software. Vanilla dash compiled with JOBS=0 does not dup2 stdio for a foreground external command – only the FORK_BG path in vendor/dash/src/jobs.c (forkchild) manipulates fds. So dash -> ls-shim replays an empty action list and hands the child an empty CapSet: no stdout to print to, no Directory to list. This is not a dash bug; it is correct POSIX behavior (the child is expected to inherit dash’s stdio). The v0 shim’s inverted default breaks every POSIX program that relies on inheritance, which is essentially all of them.

The project directive is explicit: do not solve this with per-app dash patches (posix-p1-4-dash-shell-smoke). A fd-inheritance fix that must be re-applied to every POSIX program is not POSIX compatibility. The correct fix is to make the recording shim inherit the full fd table by default, like real POSIX, reconciled with the capability model.

Current state vs target

| Aspect | Realized (done 2026-05-27) | Notes |
|---|---|---|
| Inheritance default | full-table: every open slot forwards unless FD_CLOEXEC or a non-forwardable backing | spawn_path_with_actions walks every open parent slot; recorded dup2/close are edits on the baseline |
| FD_CLOEXEC | enforced: an implicitly-inherited CLOEXEC slot is dropped at execve forward time; open(O_CLOEXEC) sets the byte | an explicit recorded dup2 keeps its child slot (POSIX dup2 clears close-on-exec) |
| Terminal stdout | non-destructive: the recording shim forwards TerminalSession via SpawnGrantMode::Raw (process.rs Terminal arm) over the Copy/SameSession bootstrap cap | parent keeps its terminal across the spawn (proof make run-posix-fd-inherit-default); kernel mint proven by make run-posix-terminal-forward |
| Writable File/Directory | NonTransferable -> kernel rejects grant -> whole-spawn ENOEXEC | documented divergence (single-writer policy). v0 POSIX open mints only Copy/SameSession RAM/read-only caps, so none enters the fd table; a future writable-fs open path needs a pre-spawn transferability check to skip it non-fatally (follow-up) |
| Directory fd (open("/")) | EISDIR; forwardable dir fd via dirfd(opendir()) (inherits by default under full-table) | open(dir, O_RDONLY) -> FdBacking::Directory landed (§5, posix-open-directory-fd); non-O_RDONLY stays EISDIR |

Target design

1. Full-fd-table inheritance default

execve() should forward the parent’s entire live fd table to the child, not only touched slots. The recording shim already builds a virtual child_view seeded from every open parent slot (spawn_path_with_actions); the change is to remove the touched-only gate so the forward list is built from every child_view[slot] == Some(parent_slot) entry, then apply the recorded dup2/close actions as edits on top of that baseline. The replay order is:

  1. Seed child_view[k] = Some(k) for every open parent slot k (already done).
  2. Apply recorded Dup2(src, dst) / Close(fd) actions in order (already done).
  3. New: skip any slot whose parent fd carries FD_CLOEXEC (see §2).
  4. Build a forward for every remaining child_view[child_slot] == Some(parent) entry – not only touched ones.

This makes the dash -> child case work: dash’s open stdio fds (0/1/2) flow to the child by default, exactly as POSIX requires, with no dup2 from dash.
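The four-step replay order can be sketched as a pure function over a simplified fd table. This is an illustration under stated assumptions, not the body of spawn_path_with_actions: `ParentSlot`, `build_forwards`, and the `explicit_dup` bookkeeping are hypothetical names, and the rule that an explicit dup2 target survives a CLOEXEC source follows the "POSIX dup2 clears close-on-exec" note in the state table.

```rust
/// Deferred fd action recorded between fork() and execve().
#[derive(Clone, Copy, Debug)]
pub enum FdAction {
    Dup2 { src: usize, dst: usize },
    Close { fd: usize },
}

/// Hypothetical view of one parent fd slot.
#[derive(Clone, Copy, Debug, Default)]
pub struct ParentSlot {
    pub open: bool,
    pub cloexec: bool,
}

/// Returns the (child_slot, parent_slot) pairs that would be forwarded.
pub fn build_forwards(parent: &[ParentSlot], actions: &[FdAction]) -> Vec<(usize, usize)> {
    let n = parent.len();

    // Step 1: seed child_view[k] = Some(k) for every open parent slot k.
    let mut child_view: Vec<Option<usize>> =
        (0..n).map(|k| parent[k].open.then_some(k)).collect();
    // Child slots produced by an explicit dup2 (their close-on-exec is cleared).
    let mut explicit_dup = vec![false; n];

    // Step 2: apply the recorded actions in order as edits on the baseline.
    for action in actions {
        match *action {
            FdAction::Dup2 { src, dst } if src < n && dst < n && src != dst => {
                // A dup2 from a slot that is closed in the child view is ignored here.
                if let Some(p) = child_view[src] {
                    child_view[dst] = Some(p);
                    explicit_dup[dst] = true;
                }
            }
            FdAction::Close { fd } if fd < n => {
                child_view[fd] = None;
                explicit_dup[fd] = false;
            }
            _ => {} // out-of-range or no-op actions are skipped in this sketch
        }
    }

    // Steps 3 and 4: drop implicitly inherited CLOEXEC slots, forward the rest.
    let mut forwards = Vec::new();
    for (child_slot, entry) in child_view.iter().enumerate() {
        if let Some(parent_slot) = *entry {
            if parent[parent_slot].cloexec && !explicit_dup[child_slot] {
                continue;
            }
            forwards.push((child_slot, parent_slot));
        }
    }
    forwards
}
```

With three open stdio slots and an empty action list, the function returns all three slots, which is the dash foreground-command case. Under the v0 touched-only gate the same input produced an empty list.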

A subtlety the v0 forward list already half-handles: the one-parent-slot-per-forward rule. Under full inheritance multiple child slots can legitimately map to the same parent fd (e.g. dash’s fd 0 and a child’s inherited fd 0 are the same open description). For non-destructive (Copy/Raw) backings this is fine – the parent keeps its cap and each child slot gets an independent Copy. For destructive (Move) backings (Pipe), the existing unique-owner / one-forward rule must hold: a single Move’d Pipe end cannot legitimately appear under two child slot names. The forward builder must therefore Copy-share where the backing permits and reject only the genuine Move-aliasing case, rather than the v0 blanket “one parent slot per forward for every backing type” rule. This is the main behavioral subtlety to get right and test.
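The aliasing rule can be expressed as a check over the forward list. The sketch below is an assumption-laden illustration: `Transfer`, `AliasError`, and `check_aliases` are hypothetical names, and the per-slot transfer mode is passed in as a plain slice rather than derived from a real FdBacking. It accepts any number of aliases of a Copy backing and rejects only a Move backing that appears under more than one child slot.

```rust
use std::collections::HashMap;

/// Simplified transfer mode of the cap behind a parent slot.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Transfer {
    Copy,
    Move,
}

#[derive(Debug, PartialEq, Eq)]
pub enum AliasError {
    /// A Move-only parent slot was named by more than one child slot.
    MoveAliased { parent_slot: usize },
}

/// `forwards` holds (child_slot, parent_slot) pairs; `modes[p]` is the
/// transfer mode of parent slot `p`.
pub fn check_aliases(forwards: &[(usize, usize)], modes: &[Transfer]) -> Result<(), AliasError> {
    let mut uses: HashMap<usize, usize> = HashMap::new();
    for &(_child_slot, parent_slot) in forwards {
        let count = uses.entry(parent_slot).or_insert(0);
        *count += 1;
        // Copy backings may be shared freely; a Move backing has one owner.
        if *count > 1 && modes.get(parent_slot) == Some(&Transfer::Move) {
            return Err(AliasError::MoveAliased { parent_slot });
        }
    }
    Ok(())
}
```

What the caller does with the error is a separate policy choice. The task record for the behavior slice describes a shared Pipe as skipped non-fatally when it is implicitly inherited, which this sketch leaves to the caller.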

2. close-on-exec enforcement

FD_CLOEXEC is currently stored per-fd (fd.rs FD_FLAGS) but never acted on, because the v0 explicit-grant model has no full-table walk to enforce it against. Under full inheritance there is now a walk: at execve forward-build time, a parent slot whose FD_FLAGS byte has FD_CLOEXEC set is not forwarded (equivalent to the recorded-Close path for that child slot). This needs a small read API on the fd module (e.g. fd::is_cloexec(slot)); the FD_FLAGS array already exists. O_CLOEXEC passed to open() must set the same byte at open time so the two surfaces agree. This is the POSIX-correctness half: inherit-all without CLOEXEC enforcement would leak descriptors a correct program expects closed (e.g. a listening socket dash opened for itself).
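A per-fd flag byte with a read API might look like the following. This is a sketch only: the text names fd::is_cloexec(slot) as an example, and everything else here (`FdFlags`, `on_open`, `set_flags`, `MAX_FDS`, and the Linux-style `O_CLOEXEC` value) is assumed for illustration and may not match fd.rs.

```rust
/// Illustrative table size; the real fd table bound is not stated here.
pub const MAX_FDS: usize = 64;
/// Per-fd flag bit, as in POSIX fcntl(F_GETFD/F_SETFD).
pub const FD_CLOEXEC: u8 = 1;
/// open() flag; the Linux value is used here purely as an example.
pub const O_CLOEXEC: u32 = 0o2000000;

/// Hypothetical stand-in for the FD_FLAGS array.
pub struct FdFlags {
    bytes: [u8; MAX_FDS],
}

impl FdFlags {
    pub const fn new() -> Self {
        Self { bytes: [0; MAX_FDS] }
    }

    /// Called when open() installs `slot`: O_CLOEXEC sets the same byte
    /// that fcntl(F_SETFD) would, so the two surfaces agree.
    pub fn on_open(&mut self, slot: usize, oflags: u32) {
        if let Some(byte) = self.bytes.get_mut(slot) {
            *byte = if (oflags & O_CLOEXEC) != 0 { FD_CLOEXEC } else { 0 };
        }
    }

    /// fcntl(F_SETFD)-style update; only the FD_CLOEXEC bit is kept.
    pub fn set_flags(&mut self, slot: usize, flags: u8) {
        if let Some(byte) = self.bytes.get_mut(slot) {
            *byte = flags & FD_CLOEXEC;
        }
    }

    /// Read API used by the execve forward builder; out-of-range slots
    /// report false.
    pub fn is_cloexec(&self, slot: usize) -> bool {
        self.bytes.get(slot).map_or(false, |b| (*b & FD_CLOEXEC) != 0)
    }
}
```

The forward builder would consult `is_cloexec` for each implicitly inherited parent slot and drop the slot when it returns true.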

3. The TerminalSession-stdout problem (core decision)

Real POSIX dup-inherits the controlling terminal to all children non-destructively: a shell keeps its tty while every child writes to the same tty. The kernel precursor for this has landed: the bootstrap TerminalSession cap is minted Copy/SameSession (boot_cap_hold, kernel/src/cap/mod.rs) and forwards non-destructively via SpawnGrantMode::Raw, proven by make run-posix-terminal-forward (a parent forwards its terminal to a child and both write distinct lines; the parent’s post-spawn write proves it kept its cap). The POSIX-side gap is closed as well: the recording shim used to forward a Terminal fd via destructive Move (process.rs Terminal arm) and now forwards Raw under posix-recording-shim-full-fd-inherit (proof make run-posix-fd-inherit-default). Before that switch, forwarding fd 1 when it was a TerminalSession stripped the parent under the shim path.

Decision (kernel mint landed): mint TerminalSession Copy/SameSession, matching Console, so it forwards via SpawnGrantMode::Raw non-destructively. This is safe because TerminalSessionCap (kernel/src/cap/terminal_session.rs) is a stateless unit struct – it carries no per-session ownership state; write/writeLine dispatch onto the shared kernel terminal, and readLine resolves caller context at call time (call_with_context). The Move/ServiceRegrantOnly choice was a policy default, not a state-ownership requirement. Minting it Copy/SameSession lets the parent keep its terminal cap while each child receives an independent Copy to the same shared terminal – which is exactly the POSIX all-children-share-the-tty semantics, realized through the capability model rather than against it.
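The difference between a destructive and a non-destructive forward of a stateless cap can be shown with a toy cap table. The sketch is illustrative only: `GrantMode`, `TerminalCap`, and `CapTable` are hypothetical stand-ins, not the kernel's SpawnGrantMode, TerminalSessionCap, or cap table types.

```rust
/// Simplified grant modes: Raw copies the cap, Move transfers ownership.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum GrantMode {
    Raw,
    Move,
}

/// Stateless unit struct: copying it duplicates no per-session state.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TerminalCap;

/// Toy parent cap table with one optional terminal cap per slot.
pub struct CapTable {
    slots: Vec<Option<TerminalCap>>,
}

impl CapTable {
    pub fn with_terminal_at(slot: usize) -> Self {
        let mut slots = vec![None; slot + 1];
        slots[slot] = Some(TerminalCap);
        Self { slots }
    }

    /// Produces the cap handed to the child. Raw leaves the parent's slot
    /// intact; Move empties it.
    pub fn forward(&mut self, slot: usize, mode: GrantMode) -> Option<TerminalCap> {
        match mode {
            GrantMode::Raw => self.slots.get(slot).copied().flatten(),
            GrantMode::Move => self.slots.get_mut(slot).and_then(|s| s.take()),
        }
    }

    pub fn holds(&self, slot: usize) -> bool {
        matches!(self.slots.get(slot), Some(Some(_)))
    }
}
```

After a Raw forward of slot 1 the parent still holds its cap and can keep writing. After a Move forward it does not, which is the behavior the mint-mode change avoids for the shell's terminal.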

Security/scope: Copy/SameSession keeps the cap from escaping the session (the same scope Console already uses); a child gains no authority the parent did not already hold (a write/read view of the same terminal it was already attached to). requires_live_caller_session stays true, so the child’s readLine still resolves against the child’s own live session context. This must be confirmed in the kernel slice’s security review, including that a forwarded terminal cap cannot outlive the session improperly and that line-discipline interleaving of two writers (parent + child) is acceptable for the research surface (it is: the shared kernel terminal already serializes writes; cooked-mode interleaving at sub-line granularity is a known, documented research-surface limitation, not a capability leak).

Alternative considered and rejected: a separate narrower TerminalWrite write-only cap (interface-is-the-permission). This is cleaner long-term but introduces a new interface, a new bootstrap source, a new FdBacking variant, and child-side adoption – disproportionate for v0 when the existing TerminalSession write surface is already the right shape and can be shared by a mint-mode change alone. Recorded as future work if a write-only child terminal view is later wanted.

4. Writable File/Directory single-writer tension

Real POSIX shares writable fds across fork (parent and child write to the same open description). capOS’s disk-backed writable filesystem enforces a fail-closed single-writer policy: writable File/Directory caps are minted NonTransferable (writable_fs::transfer_result_cap), so the kernel rejects the spawn grant and execve surfaces ENOEXEC.

Decision: keep writable File/Directory NonTransferable; document the divergence. Under full inheritance this means a child does not inherit a parent’s writable disk fd – execve must treat a NonTransferable backing as a non-fatal skip (drop that one fd from the child, like CLOEXEC) rather than a fatal ENOEXEC for the whole spawn. The v0 path made it fatal because the fd was explicitly dup2’d (the app asked for it); under full inheritance the fd is inherited implicitly, so failing the entire spawn because one incidental writable fd cannot transfer would break unrelated programs. The honest divergence: capOS shares the read path of the filesystem across fork (read-only caps are Copy/SameSession) but not the write path, because the single-writer policy is a deliberate capOS guarantee that has no POSIX analog. RAM scratch Directory/File (the kernel:directory/kernel:file sources) are Copy/SameSession and do inherit, matching the common shell-scratch case.
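The decision stated in this section, a non-fatal skip for an implicitly inherited NonTransferable fd and a fatal error for one the application explicitly asked for, can be sketched as a filter over candidate grants. All names below (`Transferability`, `Origin`, `Candidate`, `filter_grants`, `SpawnError`) are hypothetical. Treating the explicit case as ENOEXEC is an assumption carried over from the v0 behavior described above, and the sketch models the design intent rather than the landed code.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Transferability {
    Transferable,
    NonTransferable,
}

/// How the child slot came to exist in the forward list.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Origin {
    /// Inherited by default under full-table inheritance.
    Implicit,
    /// Named by a recorded dup2 (the application asked for it).
    Explicit,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SpawnError {
    Enoexec,
}

#[derive(Clone, Copy, Debug)]
pub struct Candidate {
    pub child_slot: usize,
    pub transfer: Transferability,
    pub origin: Origin,
}

/// Returns the child slots that would be granted, or ENOEXEC for the whole
/// spawn when an explicitly requested fd cannot transfer.
pub fn filter_grants(candidates: &[Candidate]) -> Result<Vec<usize>, SpawnError> {
    let mut granted = Vec::new();
    for c in candidates {
        match (c.transfer, c.origin) {
            // Incidental writable fd: drop it from the child, like CLOEXEC.
            (Transferability::NonTransferable, Origin::Implicit) => continue,
            // The application asked for it: fail the spawn as v0 did.
            (Transferability::NonTransferable, Origin::Explicit) => {
                return Err(SpawnError::Enoexec)
            }
            _ => granted.push(c.child_slot),
        }
    }
    Ok(granted)
}
```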

A future revocation-aware writable share (refcounted or session-scoped) is possible but out of scope; recorded as a follow-up. v0’s stance is: writable disk fds are not inheritable, skipped non-fatally, documented.

5. cwd Directory representation and inheritance

A shell’s children should be able to list/open the cwd without the app doing anything special. A forwardable directory fd is obtainable both via dirfd(opendir()) and, since posix-open-directory-fd, via open(dir, O_RDONLY) (libcapos-posix/src/file.rs). Two parts:

  • cwd as an inheritable Directory fd. Under full inheritance, if the shell holds an open FdBacking::Directory fd for its cwd, it forwards to the child by default (read-only RAM/readonly_fs dirs are Copy/SameSession). The child’s libc cwd resolution can then target the inherited dir fd. This is the primary mechanism and needs no new surface beyond full inheritance.
  • open(dir, O_RDONLY) -> Directory fd (landed, posix-open-directory-fd). open on a directory now installs a FdBacking::Directory fd instead of failing: read returns EISDIR, write returns EBADF, lseek returns EISDIR, and fdopendir consumes it. A non-O_RDONLY directory open stays EISDIR; a missing path keeps its original error (ENOENT). This covers the N</dir redirection path (dash redir uses sh_open -> open) without the bespoke dirfd(opendir()) dance. Proof make run-posix-open-dir-fd. It was decoupled from the headline path, which never depended on it.
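The error contract for directory opens and for operations on a directory fd can be written down as two small mapping functions. This is a sketch of the behavior listed above, not the code in libcapos-posix/src/file.rs: `PathKind`, `Backing`, `Op`, `open_backing`, and `directory_op` are hypothetical names, and reading "non-O_RDONLY" as "access mode other than O_RDONLY" is an assumption.

```rust
// Conventional POSIX access-mode encoding.
pub const O_RDONLY: u32 = 0;
pub const O_WRONLY: u32 = 1;
pub const O_RDWR: u32 = 2;
pub const O_ACCMODE: u32 = 3;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Errno {
    Enoent,
    Eisdir,
    Ebadf,
}

/// What the path resolved to (illustrative).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PathKind {
    Missing,
    Directory,
    RegularFile,
}

/// Simplified stand-in for the fd backing installed by open().
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Backing {
    Directory,
    File,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Op {
    Read,
    Write,
    Lseek,
}

/// open(): a read-only directory open yields a Directory fd, any other
/// access mode on a directory stays EISDIR, a missing path stays ENOENT.
pub fn open_backing(kind: PathKind, oflags: u32) -> Result<Backing, Errno> {
    match kind {
        PathKind::Missing => Err(Errno::Enoent),
        PathKind::RegularFile => Ok(Backing::File),
        PathKind::Directory if (oflags & O_ACCMODE) == O_RDONLY => Ok(Backing::Directory),
        PathKind::Directory => Err(Errno::Eisdir),
    }
}

/// Byte-level operations on a Directory fd all fail with the listed errno.
pub fn directory_op(op: Op) -> Errno {
    match op {
        Op::Read | Op::Lseek => Errno::Eisdir,
        Op::Write => Errno::Ebadf,
    }
}
```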

6. Backward compatibility and re-verification

Changing the default from explicit-grant-only to full-inherit interacts with the just-landed explicit-grant contract and existing smokes. What must be re-verified when the behavior slice lands:

  • make run-posix-pipe-smoke – relies on explicit pipe-end Move grants. Under full inheritance the parent’s other open fds (e.g. its terminal stdio) would now also forward. The pipe child must still see EOF when the parent closes the write end, and the parent must not lose its own terminal (fixed by §3). The recorded close(write_end) still drops that child slot. Re-verify.
  • make run-posix-spawn-smoke – posix_spawn with explicit file actions. The file-actions path must still honor explicit dup2/close; full inheritance is the baseline the actions edit on top of. Re-verify.
  • make run-posix-execve-inherit-smoke – the bespoke parent that explicitly dup2s a Directory/Console. Under full inheritance the explicit dup2s become redundant (the fds would inherit anyway) but must remain correct. Re-verify.
  • make run-posix-stdio-smoke / run-posix-stdio-terminal-smoke – stdio backing selection. Re-verify.

The capability-purity argument is unchanged: full-inherit is not ambient authority. The child inherits exactly the capabilities in the parent’s fd table (the same caps under the same slots), nothing more. There is no global namespace, no inherited credential, no kernel-side fd knowledge – the kernel still only sees an explicit List(CapGrant) from ProcessSpawner.spawn. The shim now computes that list from the full table instead of the touched subset; the kernel’s transfer-mode enforcement (process_spawner.rs) still gates every grant. A child can receive only caps the parent already holds and that are transferable; NonTransferable writable caps are skipped, not smuggled.

Implementation path (decomposed)

The work splits into a kernel cap-mode slice and a libcapos-posix behavior slice, with one optional narrow slice, all gating the dash shell smoke. See the ready task records:

  • posix-terminal-session-forwardable (behavior, kernel, done 2026-05-27) – mint TerminalSession Copy/SameSession so it forwards non-destructively via SpawnGrantMode::Raw. Precursor for the terminal-stdout half of §3. Proven by make run-posix-terminal-forward.
  • posix-recording-shim-full-fd-inherit (behavior, libcapos-posix, done 2026-05-27) – full-table inheritance default (§1), FD_CLOEXEC enforcement (§2), non-fatal skip of non-forwardable backings (Udp / already-moved / shared Pipe) when implicitly inherited (§4), and Copy-share of multi-aliased non-destructive backings (§1 subtlety). The recording-shim Terminal arm now forwards Raw (non-destructive). Proven by make run-posix-fd-inherit-default. A NonTransferable writable backing stays a documented whole-spawn ENOEXEC boundary; the v0 POSIX open surface mints no such cap, so the §4 non-fatal skip is realized for the backings that can actually arise.
  • posix-open-directory-fd (behavior, libcapos-posix, done) – open(dir, O_RDONLY) -> FdBacking::Directory (§5); non-O_RDONLY stays EISDIR, missing path keeps ENOENT. Proof make run-posix-open-dir-fd. Was off the headline critical path.

posix-p1-4-dash-shell-smoke (docs/tasks/) depends on the first two; once they land it can run with no per-app dash patch (only the generic, already-landed Variant A fork-exec patch set and the slash-bearing /ls-shim invocation to skip dash’s PATH stat, which is a documented dash-config choice, not a capOS workaround).

Per-app patch stance

The directive forbids per-app dash patches that would have to be repeated for every POSIX program. This design needs none: full inheritance is a generic capOS-side fix in the shim. The only acceptable vendored-dash touch is a generic POSIX-correctness item (the existing Variant A fork-exec coupling under vendor/dash/patches/, owned by posix-p1-4-dash-vendor), not a per-app inheritance workaround. The EV_EXIT in-place-exec residual (posix-p1-4-dash-shell-smoke) is the one remaining dash-specific item; it is a recording-shim “exec without prior fork” limitation, handled in the shell-smoke slice (disable the optimization or a bounded generic patch), not by this proposal.

Design grounding

  • libcapos-posix/src/process.rs (spawn_path_with_actions, fork, execve, the recording-shim contract), libcapos-posix/src/fd.rs (FdBacking, FD_FLAGS/FD_CLOEXEC, inherit_stdio_grants), libcapos-posix/src/terminal.rs, libcapos-posix/src/directory.rs, libcapos-posix/src/file.rs.
  • kernel/src/cap/mod.rs boot_cap_hold (Console and TerminalSession both Copy/SameSession since 2026-05-27), kernel/src/cap/terminal_session.rs (TerminalSessionCap stateless unit struct), kernel/src/cap/process_spawner.rs (validate_spawn_transfer_scope, transfer-mode enforcement).
  • schema/capos.capnp ProcessSpawner.spawn(... grants :List(CapGrant)), CapGrant, CapGrantMode.
  • docs/proposals/posix-adapter-proposal.md (recording-shim Variant A, fd-backing-cap inheritance), docs/capability-model.md (interface-is-the-permission, transfer modes/scopes).
  • docs/tasks/done/2026-05-27/posix-execve-capability-inheritance.md and docs/tasks/done/2026-05-26/spawn-grant-forwardable-readonly-directory.md (the landed explicit-grant inheritance this proposal generalizes), posix-p1-4-dash-shell-smoke (the premise conflict this resolves).