Proposal: POSIX fork/execve Full-fd-table Inheritance
Make the capOS POSIX fork+execve recording shim inherit the parent’s full
live fd table by default, honoring close-on-exec, so unmodified POSIX software
(the dash port is the headline case) gets working stdin/stdout/stderr and an
inherited cwd in its children without the application explicitly dup2-ing every
descriptor. This reverses the v0 explicit-grant-only default, which is the
inverse of real POSIX semantics, while keeping the capability model’s
no-ambient-authority guarantee.
Why this is needed
capOS has no real fork (no address-space copy, no shared open-file
descriptions). fork+execve is emulated by a recording shim
(libcapos-posix/src/process.rs): fork() opens a recording window and returns
0; dup2/close between fork and execve are recorded as deferred fd
actions; execve() replays them against a virtual child fd-view and forwards the
resulting fds as CapGrants through ProcessSpawner.spawn. The child
reconstructs its fd table from the named stdio_<N> grants
(libcapos-posix/src/fd.rs inherit_stdio_grants).
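As a rough illustration of the child-side reconstruction, the sketch below maps grant names of the form stdio_<N> to fd slot numbers. The helper names parse_stdio_grant and stdio_slots, and the policy of silently ignoring any other grant name, are assumptions made for illustration; the actual inherit_stdio_grants logic is not reproduced in this proposal.

```rust
/// Hypothetical helper: map a grant name such as "stdio_1" to fd slot 1.
/// Returns None for names that do not follow the stdio_<N> pattern.
fn parse_stdio_grant(name: &str) -> Option<usize> {
    let digits = name.strip_prefix("stdio_")?;
    // Reject empty suffixes and anything that is not plain ASCII digits
    // (str::parse would otherwise accept a leading '+').
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    digits.parse().ok()
}

/// Hypothetical helper: collect the fd slots named by a child's grant list,
/// sorted and de-duplicated. Non-stdio grant names are ignored.
fn stdio_slots<'a>(names: impl IntoIterator<Item = &'a str>) -> Vec<usize> {
    let mut slots: Vec<usize> = names
        .into_iter()
        .filter_map(parse_stdio_grant)
        .collect();
    slots.sort_unstable();
    slots.dedup();
    slots
}
```

Under these assumptions a grant list naming stdio_2, cwd, and stdio_0 yields slots 0 and 2, and the unrelated cwd grant is left for other code to handle.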
The v0 contract is explicit-grant-only: in spawn_path_with_actions, only
fd slots a recorded dup2/close touched become grants; untouched live slots
are deliberately not inherited (the touched array gate). This is the inverse of
POSIX, where a child inherits the parent’s entire fd table across
fork+execve – every descriptor not marked O_CLOEXEC/FD_CLOEXEC – sharing
the underlying open file descriptions.
The consequence is decisive for arbitrary POSIX software. Vanilla dash compiled
with JOBS=0 does not dup2 stdio for a foreground external command – only the
FORK_BG path in vendor/dash/src/jobs.c (forkchild) manipulates fds. So
dash -> ls-shim replays an empty action list and hands the child an empty
CapSet: no stdout to print to, no Directory to list. This is not a dash bug;
it is correct POSIX behavior (the child is expected to inherit dash’s stdio). The
v0 shim’s inverted default breaks every POSIX program that relies on inheritance,
which is essentially all of them.
The project directive is explicit: do not solve this with per-app dash patches (posix-p1-4-dash-shell-smoke). An fd-inheritance fix that must be re-applied to every POSIX program is not POSIX compatibility. The correct fix is to make the recording shim inherit the full fd table by default, like real POSIX, reconciled with the capability model.
Current state vs target
| Aspect | Realized (done 2026-05-27) | Notes |
|---|---|---|
| Inheritance default | full-table: every open slot forwards unless FD_CLOEXEC or a non-forwardable backing | spawn_path_with_actions walks every open parent slot; recorded dup2/close are edits on the baseline |
| FD_CLOEXEC | enforced: an implicitly-inherited CLOEXEC slot is dropped at execve forward time; open(O_CLOEXEC) sets the byte | an explicit recorded dup2 keeps its child slot (POSIX dup2 clears close-on-exec) |
| Terminal stdout | non-destructive: the recording shim forwards TerminalSession via SpawnGrantMode::Raw (process.rs Terminal arm) over the Copy/SameSession bootstrap cap | parent keeps its terminal across the spawn (proof make run-posix-fd-inherit-default); kernel mint proven by make run-posix-terminal-forward |
| Writable File/Directory | NonTransferable -> kernel rejects grant -> whole-spawn ENOEXEC | documented divergence (single-writer policy). v0 POSIX open mints only Copy/SameSession RAM/read-only caps, so none enters the fd table; a future writable-fs open path needs a pre-spawn transferability check to skip it non-fatally (follow-up) |
| Directory fd (open("/")) | EISDIR; forwardable dir fd via dirfd(opendir()) (inherits by default under full-table) | open(dir, O_RDONLY) -> FdBacking::Directory landed (§5, posix-open-directory-fd); non-O_RDONLY stays EISDIR |
Target design
1. Full-fd-table inheritance default
execve() should forward the parent’s entire live fd table to the child,
not only touched slots. The recording shim already builds a virtual child_view
seeded from every open parent slot (spawn_path_with_actions); the change is to
remove the touched-only gate so the forward list is built from every
child_view[slot] == Some(parent_slot) entry, then apply the recorded
dup2/close actions as edits on top of that baseline. The replay order is:
- Seed child_view[k] = Some(k) for every open parent slot k (already done).
- Apply recorded Dup2(src, dst)/Close(fd) actions in order (already done).
- New: skip any slot whose parent fd carries FD_CLOEXEC (see §2).
- Build a forward for every remaining child_view[child_slot] == Some(parent) entry – not only touched ones.
This makes the dash -> child case work: dash’s open stdio fds (0/1/2) flow to the
child by default, exactly as POSIX requires, with no dup2 from dash.
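The replay order can be sketched as a small pure function. This is a minimal model under stated assumptions, not the code in spawn_path_with_actions: the types FdAction and ParentSlot, the function build_forwards, and the explicit marker (which lets a recorded dup2 target survive FD_CLOEXEC, as the table in this document notes) are all illustrative names.

```rust
/// Deferred fd actions recorded between fork() and execve() (illustrative).
#[derive(Clone, Copy, Debug)]
enum FdAction {
    Dup2 { src: usize, dst: usize },
    Close(usize),
}

/// Illustrative model of an open parent fd: only close-on-exec matters here.
#[derive(Clone, Copy, Debug)]
struct ParentSlot {
    cloexec: bool,
}

/// Build the (child_slot, parent_slot) forward list under full-table
/// inheritance. `parent[k]` is Some(..) when parent fd k is open.
fn build_forwards(parent: &[Option<ParentSlot>], actions: &[FdAction]) -> Vec<(usize, usize)> {
    let n = parent.len();
    // Step 1: seed the child view from every open parent slot.
    let mut child_view: Vec<Option<usize>> = (0..n).map(|k| parent[k].map(|_| k)).collect();
    // Child slots installed by an explicit dup2 (assumed to survive CLOEXEC).
    let mut explicit = vec![false; n];

    // Step 2: apply recorded actions in order, as edits on the baseline.
    for action in actions {
        match *action {
            FdAction::Dup2 { src, dst } => {
                // dup2(fd, fd) is a no-op; out-of-range or closed sources are
                // ignored here (a real shim would report EBADF).
                if src != dst && src < n && dst < n {
                    if let Some(p) = child_view[src] {
                        child_view[dst] = Some(p);
                        explicit[dst] = true;
                    }
                }
            }
            FdAction::Close(fd) => {
                if fd < n {
                    child_view[fd] = None;
                    explicit[fd] = false;
                }
            }
        }
    }

    // Steps 3 and 4: forward every remaining slot, dropping implicitly
    // inherited slots whose parent fd carries FD_CLOEXEC.
    let mut forwards = Vec::new();
    for (child_slot, entry) in child_view.iter().enumerate() {
        if let Some(p) = *entry {
            let cloexec = parent[p].map_or(false, |s| s.cloexec);
            if cloexec && !explicit[child_slot] {
                continue;
            }
            forwards.push((child_slot, p));
        }
    }
    forwards
}
```

With three open non-CLOEXEC parent slots and an empty action list (the dash foreground-command case), the function returns one forward per stdio slot, where a touched-only gate would have produced an empty list.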
A subtlety the v0 forward list already half-handles: the one-parent-slot-per-
forward rule. Under full inheritance multiple child slots can legitimately map to
the same parent fd (e.g. dash’s fd 0 and a child’s inherited fd 0 are the same
open description). For non-destructive (Copy/Raw) backings this is fine – the
parent keeps its cap and each child slot gets an independent Copy. For
destructive (Move) backings (Pipe), the existing unique-owner / one-forward
rule must hold: a single Move’d Pipe end cannot legitimately appear under two
child slot names. The forward builder must therefore Copy-share where the backing
permits and reject only the genuine Move-aliasing case, rather than the v0 blanket
“one parent slot per forward for every backing type” rule. This is the main
behavioral subtlety to get right and test.
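One way to express the relaxed aliasing rule is a check over the forward list that counts how often each parent fd is used and fails only when a Move-only backing appears twice. The Transfer and ForwardError types and the check_aliasing function are hypothetical names for this sketch; how the real shim classifies backings is not shown here.

```rust
use std::collections::HashMap;

/// Illustrative transfer behaviour of an fd backing (names are assumptions).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Transfer {
    /// Non-destructive: parent keeps its cap, each child slot gets a copy.
    Shareable,
    /// Destructive: the cap leaves the parent and may fill one child slot only.
    MoveOnly,
}

#[derive(Debug, PartialEq, Eq)]
enum ForwardError {
    /// A Move-only parent fd was mapped to more than one child slot.
    MoveAliased { parent: usize },
}

/// Check a (child_slot, parent_slot) forward list against the aliasing rule.
/// `transfer_of` reports how each parent fd's backing may be forwarded.
fn check_aliasing(
    forwards: &[(usize, usize)],
    transfer_of: impl Fn(usize) -> Transfer,
) -> Result<(), ForwardError> {
    let mut uses: HashMap<usize, usize> = HashMap::new();
    for &(_child, parent) in forwards {
        let count = uses.entry(parent).or_insert(0);
        *count += 1;
        // Sharing is fine for Copy/Raw-style backings; only a second use of
        // a Move-only backing is rejected.
        if *count > 1 && transfer_of(parent) == Transfer::MoveOnly {
            return Err(ForwardError::MoveAliased { parent });
        }
    }
    Ok(())
}
```

In this model three child slots may all point at one shareable terminal fd, while two child slots pointing at the same Pipe end produce MoveAliased. Whether the real shim rejects or silently drops the second alias is a policy choice this sketch does not settle.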
2. close-on-exec enforcement
FD_CLOEXEC is currently stored per-fd (fd.rs FD_FLAGS) but never acted on,
because the v0 explicit-grant model has no full-table walk to enforce it against.
Under full inheritance there is now a walk: at execve forward-build time, a
parent slot whose FD_FLAGS byte has FD_CLOEXEC set is not forwarded
(equivalent to the recorded-Close path for that child slot). This needs a small
read API on the fd module (e.g. fd::is_cloexec(slot)); the FD_FLAGS array
already exists. O_CLOEXEC passed to open() must set the same byte at open
time so the two surfaces agree. This is the POSIX-correctness half: inherit-all
without CLOEXEC enforcement would leak descriptors a correct program expects
closed (e.g. a listening socket dash opened for itself).
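A minimal sketch of the two surfaces agreeing on one flag byte follows. It assumes a fixed-size table wrapped in a struct rather than the real FD_FLAGS storage; the FdFlags type, note_open, MAX_FDS, and the O_CLOEXEC bit value (the Linux value, used only as a placeholder) are illustrative, and is_cloexec is written in the spirit of the fd::is_cloexec(slot) API suggested in this section.

```rust
/// Descriptor flag bit; 1 on common POSIX platforms, assumed here.
const FD_CLOEXEC: u8 = 1;
/// Placeholder open() flag bit (the Linux value); capOS may use another.
const O_CLOEXEC: u32 = 0o2000000;
/// Hypothetical fd-table size for the sketch.
const MAX_FDS: usize = 16;

/// Stand-in for the per-fd FD_FLAGS byte array.
struct FdFlags {
    bytes: [u8; MAX_FDS],
}

impl FdFlags {
    fn new() -> Self {
        FdFlags { bytes: [0; MAX_FDS] }
    }

    /// Called at open() time: O_CLOEXEC sets the same byte that the
    /// execve forward-build walk later reads.
    fn note_open(&mut self, slot: usize, oflags: u32) {
        if let Some(byte) = self.bytes.get_mut(slot) {
            *byte = if (oflags & O_CLOEXEC) != 0 { FD_CLOEXEC } else { 0 };
        }
    }

    /// Read API for the forward builder; out-of-range slots report false.
    fn is_cloexec(&self, slot: usize) -> bool {
        self.bytes.get(slot).map_or(false, |b| (*b & FD_CLOEXEC) != 0)
    }
}
```

The forward builder would then consult is_cloexec(parent_slot) for each implicitly inherited slot and drop the slot when it returns true.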
3. The TerminalSession-stdout problem (core decision)
Real POSIX dup-inherits the controlling terminal to all children
non-destructively: a shell keeps its tty while every child writes to the same
tty. The kernel precursor for this is now landed: the bootstrap TerminalSession
cap is minted Copy/SameSession (boot_cap_hold, kernel/src/cap/mod.rs) and
forwards non-destructively via SpawnGrantMode::Raw, proven by
make run-posix-terminal-forward (a parent forwards its terminal to a child and
both write distinct lines; the parent’s post-spawn write proves it kept its cap).
The remaining gap was on the POSIX side: the recording shim forwarded a
Terminal fd via destructive Move (process.rs Terminal arm), which stripped the
parent’s terminal whenever fd 1 was a TerminalSession. The arm switched to Raw
under posix-recording-shim-full-fd-inherit (done 2026-05-27), so the shim path is now non-destructive as well.
Decision (kernel mint landed): mint TerminalSession Copy/SameSession,
matching Console, so it forwards via SpawnGrantMode::Raw non-destructively.
This is safe because
TerminalSessionCap (kernel/src/cap/terminal_session.rs) is a stateless unit
struct – it carries no per-session ownership state; write/writeLine
dispatch onto the shared kernel terminal, and readLine resolves caller context
at call time (call_with_context). The Move/ServiceRegrantOnly choice was a
policy default, not a state-ownership requirement. Minting it Copy/SameSession
lets the parent keep its terminal cap while each child receives an independent
Copy to the same shared terminal – which is exactly the POSIX
all-children-share-the-tty semantics, realized through the capability model
rather than against it.
Security/scope: Copy/SameSession keeps the cap from escaping the session (the
same scope Console already uses); a child gains no authority the parent did not
already hold (a write/read view of the same terminal it was already attached to).
requires_live_caller_session stays true, so the child’s readLine still
resolves against the child’s own live session context. This must be confirmed in
the kernel slice’s security review, including that a forwarded terminal cap
cannot outlive the session improperly and that line-discipline interleaving of
two writers (parent + child) is acceptable for the research surface (it is: the
shared kernel terminal already serializes writes; cooked-mode interleaving at
sub-line granularity is a known, documented research-surface limitation, not a
capability leak).
Alternative considered and rejected: a separate narrower TerminalWrite
write-only cap (interface-is-the-permission). This is cleaner long-term but
introduces a new interface, a new bootstrap source, a new FdBacking variant,
and child-side adoption – disproportionate for v0 when the existing
TerminalSession write surface is already the right shape and can be shared by a
mint-mode change alone. Recorded as future work if a write-only child terminal
view is later wanted.
4. Writable File/Directory single-writer tension
Real POSIX shares writable fds across fork (parent and child write to the same
open description). capOS’s disk-backed writable filesystem enforces a
fail-closed single-writer policy: writable File/Directory caps are minted
NonTransferable (writable_fs::transfer_result_cap), so the kernel rejects the
spawn grant and execve surfaces ENOEXEC.
Decision: keep writable File/Directory NonTransferable; document the
divergence. Under full inheritance this means a child does not inherit a
parent’s writable disk fd – execve must treat a NonTransferable backing as a
non-fatal skip (drop that one fd from the child, like CLOEXEC) rather than a
fatal ENOEXEC for the whole spawn. The v0 path made it fatal because the fd was
explicitly dup2’d (the app asked for it); under full inheritance the fd is
inherited implicitly, so failing the entire spawn because one incidental writable
fd cannot transfer would break unrelated programs. The honest divergence: capOS
shares the read path of the filesystem across fork (read-only caps are
Copy/SameSession) but not the write path, because the single-writer policy
is a deliberate capOS guarantee that has no POSIX analog. RAM scratch
Directory/File (the kernel:directory/kernel:file sources) are
Copy/SameSession and do inherit, matching the common shell-scratch case.
A future revocation-aware writable share (refcounted or session-scoped) is possible but out of scope; recorded as a follow-up. v0’s stance is: writable disk fds are not inheritable, skipped non-fatally, documented.
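The skip-versus-fail distinction in this section's decision can be modelled as a filter over forward candidates. This is a sketch of the decision as written here, not of landed code; the TransferMode, Candidate, and SpawnError types, the filter_transferable function, and the choice to keep an explicitly requested NonTransferable fd fatal are assumptions for illustration.

```rust
/// Illustrative transfer modes of a backing cap.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TransferMode {
    Copy,
    Move,
    NonTransferable,
}

/// One candidate forward: the child slot, the backing's mode, and whether
/// the application asked for it with a recorded dup2.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Candidate {
    child_slot: usize,
    mode: TransferMode,
    explicit: bool,
}

#[derive(Debug, PartialEq, Eq)]
enum SpawnError {
    /// Whole-spawn failure, surfaced to the caller as ENOEXEC.
    Enoexec,
}

/// Return the child slots that will actually be forwarded.
fn filter_transferable(cands: &[Candidate]) -> Result<Vec<usize>, SpawnError> {
    let mut kept = Vec::new();
    for c in cands {
        match (c.mode, c.explicit) {
            // Assumption: the app explicitly asked for an fd that cannot
            // transfer, so the spawn fails as in the v0 path.
            (TransferMode::NonTransferable, true) => return Err(SpawnError::Enoexec),
            // Implicitly inherited: drop just this fd, like CLOEXEC.
            (TransferMode::NonTransferable, false) => continue,
            _ => kept.push(c.child_slot),
        }
    }
    Ok(kept)
}
```

A parent holding stdio plus one incidental writable disk fd would, under this model, spawn a child with stdio only instead of failing the whole execve.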
5. cwd Directory representation and inheritance
A shell’s children should be able to list/open the cwd without the app doing
anything special. A forwardable directory fd is obtainable both via
dirfd(opendir()) and, since posix-open-directory-fd, via
open(dir, O_RDONLY) (libcapos-posix/src/file.rs). Two parts:
- cwd as an inheritable Directory fd. Under full inheritance, if the shell holds an open FdBacking::Directory fd for its cwd, it forwards to the child by default (read-only RAM/readonly_fs dirs are Copy/SameSession). The child’s libc cwd resolution can then target the inherited dir fd. This is the primary mechanism and needs no new surface beyond full inheritance.
- open(dir, O_RDONLY) -> Directory fd (landed, posix-open-directory-fd). open on a directory now installs a FdBacking::Directory fd instead of failing: read returns EISDIR, write returns EBADF, lseek returns EISDIR, and fdopendir consumes it. A non-O_RDONLY directory open stays EISDIR; a missing path keeps its original error (ENOENT). This covers the N</dir redirection path (dash redir uses sh_open -> open) without the bespoke dirfd(opendir()) dance. Proof make run-posix-open-dir-fd. It was decoupled from the headline path, which never depended on it.
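The error behaviour described for directory opens and directory fds can be summarised as two small decision tables. The enums and the functions open_dir and dir_fd_op below are hypothetical names for this sketch; errno values are modelled symbolically, and only the path kinds the text mentions are covered.

```rust
/// Symbolic errno values mentioned in this section.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Errno {
    Enoent,
    Ebadf,
    Eisdir,
}

/// Access mode requested by open().
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Access {
    ReadOnly,
    WriteOnly,
    ReadWrite,
}

/// What the path resolved to (only the cases discussed here).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PathKind {
    Missing,
    Directory,
}

/// Result of a successful directory open in this model.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Backing {
    Directory,
}

/// open() on a path that is either missing or a directory.
fn open_dir(kind: PathKind, access: Access) -> Result<Backing, Errno> {
    match (kind, access) {
        // A missing path keeps its original error.
        (PathKind::Missing, _) => Err(Errno::Enoent),
        // Read-only directory open installs a Directory-backed fd.
        (PathKind::Directory, Access::ReadOnly) => Ok(Backing::Directory),
        // Any writable directory open stays EISDIR.
        (PathKind::Directory, _) => Err(Errno::Eisdir),
    }
}

/// Byte-level operations attempted on a Directory-backed fd.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Op {
    Read,
    Write,
    Lseek,
}

/// Error returned for each byte-level operation on a directory fd.
fn dir_fd_op(op: Op) -> Errno {
    match op {
        Op::Read | Op::Lseek => Errno::Eisdir,
        Op::Write => Errno::Ebadf,
    }
}
```

fdopendir is left out of the table because it consumes the fd rather than returning an error.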
6. Backward compatibility and re-verification
Changing the default from explicit-grant-only to full-inherit interacts with the just-landed explicit-grant contract and existing smokes. What must be re-verified when the behavior slice lands:
- make run-posix-pipe-smoke – relies on explicit pipe-end Move grants. Under full inheritance the parent’s other open fds (e.g. its terminal stdio) would now also forward. The pipe child must still see EOF when the parent closes the write end, and the parent must not lose its own terminal (fixed by §3). The recorded close(write_end) still drops that child slot. Re-verify.
- make run-posix-spawn-smoke – posix_spawn with explicit file actions. The file-actions path must still honor explicit dup2/close; full inheritance is the baseline the actions edit on top of. Re-verify.
- make run-posix-execve-inherit-smoke – the bespoke parent that explicitly dup2s a Directory/Console. Under full inheritance the explicit dup2s become redundant (the fds would inherit anyway) but must remain correct. Re-verify.
- make run-posix-stdio-smoke / run-posix-stdio-terminal-smoke – stdio backing selection. Re-verify.
The capability-purity argument is unchanged: full-inherit is not ambient
authority. The child inherits exactly the capabilities in the parent’s fd
table (the same caps under the same slots), nothing more. There is no global
namespace, no inherited credential, no kernel-side fd knowledge – the kernel
still only sees an explicit List(CapGrant) from ProcessSpawner.spawn. The
shim now computes that list from the full table instead of the touched subset;
the kernel’s transfer-mode enforcement (process_spawner.rs) still gates every
grant. A child can receive only caps the parent already holds and that are
transferable; NonTransferable writable caps are skipped, not smuggled.
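The confinement claim can be phrased as a checkable property: every grant names a cap the parent holds in its fd table, and that cap is transferable. The sketch below states that property over a toy model; the Mode enum, the map-based table, and grants_are_confined are illustrative names, and the kernel's own enforcement in process_spawner.rs is not modelled.

```rust
use std::collections::BTreeMap;

/// Illustrative transfer modes of a backing cap.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Mode {
    Copy,
    Move,
    NonTransferable,
}

/// `parent_table` maps each open parent fd to its backing's transfer mode.
/// `grants` is the (child_slot, parent_fd) list handed to the spawner.
/// Returns true when every grant refers to a cap the parent holds and that
/// cap is transferable.
fn grants_are_confined(parent_table: &BTreeMap<usize, Mode>, grants: &[(usize, usize)]) -> bool {
    grants.iter().all(|&(_child, parent_fd)| {
        matches!(
            parent_table.get(&parent_fd),
            Some(Mode::Copy) | Some(Mode::Move)
        )
    })
}
```

A property of this shape could back a unit test on the shim's forward builder: for any parent table and any recorded action list, the computed grant list should satisfy it.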
Implementation path (decomposed)
The work splits into a kernel cap-mode slice and a libcapos-posix behavior slice, with one optional narrow slice, all gating the dash shell smoke. See the ready task records:
- posix-terminal-session-forwardable (behavior, kernel, done 2026-05-27) – mint TerminalSession Copy/SameSession so it forwards non-destructively via SpawnGrantMode::Raw. Precursor for the terminal-stdout half of §3. Proven by make run-posix-terminal-forward.
- posix-recording-shim-full-fd-inherit (behavior, libcapos-posix, done 2026-05-27) – full-table inheritance default (§1), FD_CLOEXEC enforcement (§2), non-fatal skip of non-forwardable backings (Udp / already-moved / shared Pipe) when implicitly inherited (§4), and Copy-share of multi-aliased non-destructive backings (§1 subtlety). The recording-shim Terminal arm now forwards Raw (non-destructive). Proven by make run-posix-fd-inherit-default. A NonTransferable writable backing stays a documented whole-spawn ENOEXEC boundary; the v0 POSIX open surface mints no such cap, so the §4 non-fatal skip is realized for the backings that can actually arise.
- posix-open-directory-fd (behavior, libcapos-posix, done) – open(dir, O_RDONLY) -> FdBacking::Directory (§5); non-O_RDONLY stays EISDIR, missing path keeps ENOENT. Proof make run-posix-open-dir-fd. Was off the headline critical path.
posix-p1-4-dash-shell-smoke (docs/tasks/) depends on the first two;
once they land it can run with no per-app dash patch (only the generic, already-
landed Variant A fork-exec patch set and the slash-bearing /ls-shim invocation
to skip dash’s PATH stat, which is a documented dash-config choice, not a
capOS workaround).
Per-app patch stance
The directive forbids per-app dash patches that would have to be repeated for
every POSIX program. This design needs none: full inheritance is a generic
capOS-side fix in the shim. The only acceptable vendored-dash touch is a generic
POSIX-correctness item (the existing Variant A fork-exec coupling under
vendor/dash/patches/, owned by posix-p1-4-dash-vendor), not a per-app
inheritance workaround. The EV_EXIT in-place-exec residual
(posix-p1-4-dash-shell-smoke)
is the one remaining dash-specific item; it is a recording-shim “exec without
prior fork” limitation, handled in the shell-smoke slice (disable the
optimization or a bounded generic patch), not by this proposal.
Design grounding
- libcapos-posix/src/process.rs (spawn_path_with_actions, fork, execve, the recording-shim contract), libcapos-posix/src/fd.rs (FdBacking, FD_FLAGS/FD_CLOEXEC, inherit_stdio_grants), libcapos-posix/src/terminal.rs, libcapos-posix/src/directory.rs, libcapos-posix/src/file.rs.
- kernel/src/cap/mod.rs boot_cap_hold (Console and TerminalSession both Copy/SameSession since 2026-05-27), kernel/src/cap/terminal_session.rs (TerminalSessionCap stateless unit struct), kernel/src/cap/process_spawner.rs (validate_spawn_transfer_scope, transfer-mode enforcement).
- schema/capos.capnp ProcessSpawner.spawn(... grants :List(CapGrant)), CapGrant, CapGrantMode.
- docs/proposals/posix-adapter-proposal.md (recording-shim Variant A, fd-backing-cap inheritance), docs/capability-model.md (interface-is-the-permission, transfer modes/scopes).
- docs/tasks/done/2026-05-27/posix-execve-capability-inheritance.md and docs/tasks/done/2026-05-26/spawn-grant-forwardable-readonly-directory.md (the landed explicit-grant inheritance this proposal generalizes), posix-p1-4-dash-shell-smoke (the premise conflict this resolves).