Proposal: Native Shell and POSIX Shell
How interactive operation should work on capOS without reintroducing ambient authority through a Unix-like command line.
Problem
capOS deliberately avoids global paths, inherited file descriptors, ambient network access, and process-wide privilege bits. A conventional shell assumes all of those. If capOS copied a Unix shell model directly, the shell would either be mostly useless or become an ambiently privileged escape hatch around the capability model.
The system needs two related, but distinct, shell layers:
- Native shell: schema-aware capability REPL and scripting language.
- POSIX shell: compatibility personality for existing programs and scripts.
Both must be ordinary userspace processes. Neither should receive special kernel privilege. The kernel and trusted capability-serving processes remain the enforcement boundary.
Model-driven interaction on top of the native shell is a separate concern and is defined in Language Models and Agent Runtime. The model runs as its own service with no session authority; the native shell (in “agent mode”) is the runner: it holds the session caps, exposes them to the model as typed tool descriptors with per-tool permission modes, executes tool calls on behalf of the model, streams results back, and keeps the user in the loop.
The first boot-to-shell milestone is text-only: local console login/setup and, later in the same family, a browser-hosted terminal gateway. Graphical shells, desktop UI, compositors, and GUI app launchers are a later tier. See Boot to Shell.
Design Principles
- A shell starts with only the capabilities it was granted.
- A shell command compiles to typed capability calls, not stringly syscalls.
- Child processes receive explicit grants. There is no implicit inheritance of the shell’s full authority.
- Elevation is a capability request mediated by a trusted broker, not a flag inside the shell.
- Shell startup is a workload launch from a UserSession, service principal, or recovery profile. Session metadata informs policy and audit; it is not authority.
- Default interactive cap sets are broker-issued session bundles, not hard-coded shell privileges.
- POSIX behavior is an adapter over scoped Directory, File, socket factory, and process capabilities. It is not the native authority model.
User identity and policy sit above this shell model. A shell session may be
associated with a human, service, guest, anonymous, or pseudonymous principal,
but the session’s capabilities remain the authority. RBAC, ABAC, and mandatory
policy decide which scoped caps a broker may grant; they do not create a
kernel-side uid, role bit, or label check on ordinary capability calls. See
User Identity and Policy.
Federated sessions (OIDC-authenticated principals, service accounts using
OAuth2 workload identity) are one input shape for this model. OAuth scopes
and OIDC claims from a session’s issuer feed AuthorityBroker as ABAC
attributes. They never authorize capability calls directly, and raw bearer
tokens never appear in shell state. The token-typed capabilities,
OAuthClient, OidcIdentityProvider, and the broker-side token handling
are defined in
OIDC and OAuth2.
Layering
flowchart TD
Input[Login, guest, anonymous, or service request] --> SessionMgr[SessionManager]
SessionMgr --> Session[UserSession metadata cap]
Session --> Broker[AuthorityBroker / PolicyEngine]
Broker --> Bundle[Scoped session cap bundle]
Bundle --> Native[Native shell]
Bundle --> Posix[POSIX shell]
Posix --> Compat[POSIX compatibility runtime]
Native --> Ring[capos-rt capability transport]
Compat --> Ring
Ring --> Kernel[Kernel cap ring]
Ring --> Services[Userspace services]
Native --> Approval[Approval client cap]
Approval --> Broker
Broker --> Services
Broker --> Audit[AuditLog]
The native shell is the primitive interactive surface. The POSIX shell is a
compatibility consumer of capOS capabilities, not the model other shells are
built on. A language-model service, when present, is invoked through a
LanguageModel cap from the native shell running in “agent mode”; the
shell is the tool runner, not the model. That flow is defined in
Language Models and Agent Runtime and is not expanded
in this diagram.
A shell may display a principal name, profile, role set, label, or POSIX UID,
but those values are descriptive unless a trusted broker uses them to return a
specific capability. Losing a home, logs, launcher, or approval cap
cannot be repaired by presenting the same session ID back to the kernel.
Native Shell
The native shell is a typed capability graph operator. Its job is to inspect, invoke, pass, attenuate, release, and trace capabilities.
Current implementation status as of 2026-05-16 21:36 UTC: capos-shell is
the standalone no_std crate at shell/ and ships the anonymous-first
interactive flow. Focused shell/login manifests still launch it directly as
initConfig.init; the default make run manifest now runs it as an
init-started service under standalone init, together with the chat /
adventure binaries and the remote-session CapSet gateway. On boot the shell
mints an anonymous UserSession via SessionManager.anonymous() and
receives an empty-allowlist anonymous bundle from AuthorityBroker.
login and setup commands use
CredentialStore/SessionManager/AuthorityBroker to verify or create the
password, mint an operator session, request the operator shell bundle, and
swap session/launcher in place. Login prompts for a username as well as a
password through a username-aware SessionManager.login() request that
carries method, selector, proof, and source metadata. A guest command
mints a guest session via SessionManager.guest() and swaps to a
broker-issued guest bundle (guest sessions require an explicit manifest seed;
no broad authority is granted to guest profiles). Shell exit calls
UserSession.logout() to clean up the session context. The default make run manifest includes the native shell, chat/adventure binaries, terminal,
console, stdio, chat, adventure, creds, sessions, audit,
broker, and system_info caps; its MOTD shows the concrete spawn / run
commands for the adventure demo. The current command set is help, caps,
binaries, motd, inspect <name>, session, login, setup, guest,
spawn, blocking run, wait, and exit, with a launcher-backed
binaries command that lists binaries available to the current session
(anonymous and guest launcher policies return an empty list).
The session-scoped TerminalSession substrate now exists behind
make run-terminal, and the bounded SSH terminal-host proof can launch
capos-shell over a socket-backed TerminalSession with a public-key
UserSession through RestrictedShellLauncher. The generic
call @cap.method(...) REPL, schema reflection, richer daily shell profiles,
and the full OpenSSH gateway remain future work.
Example init or development session with explicit spawn authority:
capos:init> caps
log Console
spawn ProcessSpawner
boot BootPackage
vm VirtualMemory
capos:init> call @log.writeLine({ text: "hello" })
ok
capos:init> spawn "tls-smoke" with {
log: @log
} -> $child
started pid 12
capos:init> wait $child
exit 0
Values
Native shell values should include:
- @name: a named capability in the current shell context.
- $name: a local value, result, promise, or process handle.
- structured values: text, bytes, integers, booleans, lists, and structs.
- result-cap values returned through the capOS transfer-result path.
- trace values representing CQE and call-history slices.
The shell should preserve interface metadata with every capability value. A method call is valid only if the target cap exposes the method’s schema.
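The validity rule above can be sketched as a check the shell runs before it sends anything. This is a minimal illustration, not the capos-shell implementation: `InterfaceInfo`, `CapValue`, `check_call`, the interface ID `0x1`, and the rule that every declared parameter must be supplied are all assumptions made for the example. Only the `Console` cap and its `writeLine({ text })` call come from the session shown earlier.

```rust
use std::collections::BTreeMap;

/// Hypothetical interface metadata carried alongside a capability value.
#[derive(Debug, Clone)]
pub struct InterfaceInfo {
    pub interface_id: u64,
    /// Method name -> parameter names (types omitted for brevity).
    pub methods: BTreeMap<String, Vec<String>>,
}

/// A shell-side capability value: a name plus the schema it was granted with.
#[derive(Debug, Clone)]
pub struct CapValue {
    pub name: String,
    pub interface: InterfaceInfo,
}

#[derive(Debug, PartialEq)]
pub enum CallError {
    UnknownMethod(String),
    UnknownField(String),
    MissingField(String),
}

/// Validate `call @cap.method({ field: ... })` against the cap's own schema.
pub fn check_call(cap: &CapValue, method: &str, fields: &[&str]) -> Result<(), CallError> {
    let params = cap
        .interface
        .methods
        .get(method)
        .ok_or_else(|| CallError::UnknownMethod(method.to_string()))?;
    // Every supplied field must be declared by the method.
    for f in fields {
        if !params.iter().any(|p| p.as_str() == *f) {
            return Err(CallError::UnknownField(f.to_string()));
        }
    }
    // Assumption for this sketch: every declared parameter is required.
    for p in params {
        if !fields.iter().any(|f| *f == p.as_str()) {
            return Err(CallError::MissingField(p.clone()));
        }
    }
    Ok(())
}

/// Example value: a `Console` cap exposing `writeLine(text)`.
/// The interface ID is a placeholder, not a real capOS ID.
pub fn console_cap() -> CapValue {
    let mut methods = BTreeMap::new();
    methods.insert("writeLine".to_string(), vec!["text".to_string()]);
    CapValue {
        name: "log".to_string(),
        interface: InterfaceInfo { interface_id: 0x1, methods },
    }
}
```

With this shape, `check_call(&console_cap(), "readLine", &[])` is rejected locally because the cap's schema has no such method, regardless of what other `Console`-like caps might expose.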
Commands
Initial commands should be small and explicit:
caps
binaries
inspect @log
methods @spawn
call @log.writeLine({ text: "boot complete" })
spawn "ipc-server" with { log: @log, ep: @serverEp } -> $server
wait $server
run "ipc-client" with { log: @log, ep: client @serverEp }
release @temporary
trace $server
bind scratch = @store.sub("scratch")
derive readonly = @home.sub("config").readOnly()
inspect should show the interface ID, label, transferability, revocation
state when available, and callable methods. It should not imply that two caps
with the same interface ID are the same authority.
The current prototype intentionally does not yet provide the generic
call @cap.method(...) REPL. Until the schema registry and structured value
parser exist, the native shell exposes only narrow typed commands and should
make that gap visible through planning docs rather than accepting raw method
IDs and opaque byte blobs.
Syntax
The syntax should be structured rather than shell-token based. A CUE-like or Cap’n-Proto-literal-like shape fits capOS better than POSIX word splitting:
spawn "net-stack" with {
log: @log
nic: @virtioNic
timer: @timer
}
The shell can still provide abbreviations, but the executable representation
should be an ActionPlan object with typed fields.
Composition
Native composition should pass typed caps or structured values, not inherited byte streams by default:
pipe @camera.frames()
|> spawn "resize" with { input: $, width: 640, height: 480 }
|> spawn "jpeg-encode" with { input: $, quality: 85 }
|> call @photos.write({ name: "frame.jpg", data: $ })
If a byte stream is desired, it should be explicit through a ByteStream,
File, or POSIX adapter capability. This keeps the “pipe” operator from
silently turning every interface into untyped bytes.
Namespaces
There is no global root. A native shell may have a current Directory or
Namespace capability, but that is just a default argument:
capos:user> ls @config
services
network
capos:user> cd @config.sub("services")
capos:@config/services> ls
logger
net-stack
The shell cannot traverse above a scoped directory or namespace unless it holds another capability that names that authority.
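One way to make upward traversal impossible rather than merely forbidden is to have `sub` reject any component that could name a parent. The sketch below is a model of that idea only: `ScopedDir`, `SubError`, and the choice to reject `.`, `..`, and absolute paths outright are assumptions for illustration, not the capOS `Directory` interface.

```rust
/// Hypothetical model of a scoped `Directory` cap. It knows only its own
/// subtree; the stored components exist for display, e.g. in the prompt.
#[derive(Debug, Clone, PartialEq)]
pub struct ScopedDir {
    display: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum SubError {
    Empty,
    EscapesScope(String),
}

impl ScopedDir {
    pub fn new(label: &str) -> Self {
        ScopedDir { display: vec![label.to_string()] }
    }

    /// Derive a narrower cap. Empty components (which covers absolute
    /// paths), `.` and `..` are rejected instead of being normalised,
    /// so a derived cap can never name its parent.
    pub fn sub(&self, rel: &str) -> Result<ScopedDir, SubError> {
        if rel.is_empty() {
            return Err(SubError::Empty);
        }
        let mut display = self.display.clone();
        for part in rel.split('/') {
            match part {
                "" | "." | ".." => return Err(SubError::EscapesScope(rel.to_string())),
                name => display.push(name.to_string()),
            }
        }
        Ok(ScopedDir { display })
    }

    /// Render the prompt in the style used by the session above.
    pub fn prompt(&self) -> String {
        format!("capos:@{}>", self.display.join("/"))
    }
}
```

Because `ScopedDir` has no method that returns a wider value, the only way to reach a sibling or parent is to hold a different cap, which matches the rule stated above.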
Session Context
A session-aware shell may hold a self or session cap for UserSession.info()
and audit context. That cap is metadata. It can identify the principal, auth
strength, expiry, quota profile, and audit identity, but it cannot widen the
shell’s CapSet or authorize kernel operations by itself.
The launcher or supervisor starts the shell with a CapSet returned by
AuthorityBroker(session, profile). For interactive work, that bundle should
usually include scoped terminal, home, logs, launcher, status, and approval
caps. For service accounts, guest sessions, anonymous workloads, and recovery
mode, the broker returns different bundles under explicit policy profiles.
Shell-launched children inherit only the caps named in the spawn plan. A child
may receive a UserSession or session badge for audit, per-client quotas, or
service-side selection, but object access still comes from the scoped object
caps passed to that child.
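The "only the caps named in the spawn plan" rule can be modelled as building the child's CapSet from an empty map. This is a sketch under assumptions: `Cap`, `SpawnError`, and `child_capset` are hypothetical names, and a real spawn would transfer kernel cap handles rather than clone strings.

```rust
use std::collections::BTreeMap;

/// Opaque stand-in for a capability handle held by the shell.
#[derive(Debug, Clone, PartialEq)]
pub struct Cap {
    pub interface: String,
}

#[derive(Debug, PartialEq)]
pub enum SpawnError {
    /// The plan named a cap the shell does not hold.
    NotHeld(String),
}

/// Build the child's CapSet from `(child_name, shell_name)` pairs.
/// The child map starts empty, so nothing outside the plan is copied.
pub fn child_capset(
    shell_caps: &BTreeMap<String, Cap>,
    plan: &[(&str, &str)],
) -> Result<BTreeMap<String, Cap>, SpawnError> {
    let mut child = BTreeMap::new();
    for (child_name, shell_name) in plan {
        let cap = shell_caps
            .get(*shell_name)
            .ok_or_else(|| SpawnError::NotHeld((*shell_name).to_string()))?;
        child.insert((*child_name).to_string(), cap.clone());
    }
    Ok(child)
}
```

For the earlier `spawn "tls-smoke" with { log: @log }` example, the plan is `[("log", "log")]`, and the resulting child map contains `log` but not `spawn`, `boot`, or `vm`.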
Interactive Command Surfaces
Application-specific interactions must stay out of the native shell command
set. A chat client, adventure client, or other interactive application should
run as an ordinary shell-spawned application or resident service session, not
as a builtin such as chat or play adventure.
The near-term target is a prototype bridge, not the final app protocol:
capos-shell launches clients with spawn or run, grants them explicit
endpoint clients such as stdio: client @stdio, and services StdIO while
waiting. That proves exact grants, process handles, child completion, and the
terminal bridge without giving a child the shell’s move-only TerminalSession.
Legacy badge N syntax is retired from normal client @... grants; delegated
client endpoints preserve their service identity by default, and service object
capabilities replace badged chat/adventure identity. Explicit selector fixtures
remain only in low-level and hostile-path tests.
That StdIO bridge is intentionally limited. It is acceptable for focused
QEMU smokes and textual compatibility, but it is the wrong long-term semantic
boundary for capOS-native applications. If an adventure client receives a line
from StdIO and parses go north, take key, or say hello internally,
capOS has only moved string command parsing out of the shell and into the app.
That is still weaker than typed capability invocation.
Native interactive applications should expose a command surface:
path=["go"], args={direction:"north"}
path=["take"], args={item:"brass-key"}
path=["say"], args={text:"hello there"}
path=["chat","join"], args={channel:"#lobby"}
The user may still type familiar command <args> forms. The shell or terminal
host parses them through generic command metadata, including nested
subcommands, argument kinds, completions, and redaction rules. The app receives
a structured invocation and converts it to typed service calls. The shell does
not hardcode application verbs, and the application does not parse unstructured
terminal text for normal operations.
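The split between typed text and structured invocation can be sketched with a small metadata-driven parser. Everything here is illustrative: `CommandSpec`, `Invocation`, `parse`, and the simplification that each command has exactly one argument receiving the rest of the line are assumptions, and the real `CommandSession` metadata is richer (argument kinds, completions, redaction).

```rust
use std::collections::BTreeMap;

/// Hypothetical command metadata an application advertises.
pub struct CommandSpec {
    pub path: Vec<&'static str>,
    /// Name of the single argument that receives the rest of the line.
    pub arg: &'static str,
}

#[derive(Debug, PartialEq)]
pub struct Invocation {
    pub path: Vec<String>,
    pub args: BTreeMap<String, String>,
}

/// Match the longest advertised command path, then bind the remainder.
/// The parser contains no application verbs; they all come from `specs`.
pub fn parse(specs: &[CommandSpec], line: &str) -> Option<Invocation> {
    let words: Vec<&str> = line.split_whitespace().collect();
    let spec = specs
        .iter()
        .filter(|s| words.len() >= s.path.len() && words[..s.path.len()] == s.path[..])
        .max_by_key(|s| s.path.len())?;
    let rest = words[spec.path.len()..].join(" ");
    if rest.is_empty() {
        return None; // the argument is required in this sketch
    }
    let mut args = BTreeMap::new();
    args.insert(spec.arg.to_string(), rest);
    Some(Invocation {
        path: spec.path.iter().map(|p| p.to_string()).collect(),
        args,
    })
}

/// Metadata matching the example invocations listed above.
pub fn demo_specs() -> Vec<CommandSpec> {
    vec![
        CommandSpec { path: vec!["go"], arg: "direction" },
        CommandSpec { path: vec!["take"], arg: "item" },
        CommandSpec { path: vec!["say"], arg: "text" },
        CommandSpec { path: vec!["chat", "join"], arg: "channel" },
    ]
}
```

The application receives the `Invocation` value, never the raw line, so adding a verb means publishing new metadata rather than changing the shell.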
StdIO remains an explicit text I/O capability for transcript output, simple
programs, POSIX compatibility, and test harnesses. It should not be the primary
command interface for native chat/adventure-style applications. The focused
design is in
Interactive Command Surfaces.
Remote Session CapSet Clients
Not every remote interaction should become a shell session. A regular host
application – CLI, native GUI, Tauri backend, webapp gateway, or service
client – should be able to authenticate to capOS, receive a broker-issued
remote view of its session CapSet, and call the capabilities it was granted
over Cap’n Proto RPC. That path is a programmatic peer of the native shell:
both consume a session bundle from AuthorityBroker, but only the shell adds
command parsing, terminal state, and child-process workflow.
The remote client must not receive the kernel’s local CapSet page, local
cap-table indexes, endpoint selectors, result-cap indexes, or global session
identifiers. It receives typed RPC object references backed by a capOS
per-session worker. Chat, Paperclips, Adventure, command sessions, and future
service APIs should therefore be callable by generated clients without routing
through capos-shell. The owning design is
Remote Session CapSet Clients.
That proposal also covers bidirectional UI composition for web/Tauri/GUI
sessions: services can propose task-specific panes or command surfaces through
explicit UI caps, but cannot take arbitrary control of the host UI.
Terminal Host Separation
The shell should not be the terminal host forever. The component that owns a UART, web socket, GUI pane, line editing, history, paste handling, resize state, and render policy can be a separate terminal host process. The shell then runs against a terminal entity and can be reused unchanged from local console, GUI, web, and scripted hosts.
TerminalSession remains the foreground text-session authority, but it is an
interface between terminal host and shell, not proof that the shell implements
the terminal. Shell-spawned applications should normally receive command
sessions or explicit StdIO adapters, not the shell’s move-only
TerminalSession.
Remote text transports follow the same rule. The Telnet Shell Demo in
Networking is a demo-only plaintext
terminal host: it accepts a host-loopback QEMU-forwarded TCP connection and
gives the shell a socket-backed TerminalSession. The kernel-side socket
terminal silently consumes IAC option negotiation in its line discipline, so
no userspace pre-handoff recv is required. The demo host must not turn the shell login path into a
raw ByteStream, raw TcpSocket, or StdIO substitute, because password
entry, echo policy, cancellation, and shell launch authority are defined at the
TerminalSession boundary. The QEMU harness for that demo binds the host
forward to 127.0.0.1:2323 only and runs caps to prove the child shell did
not receive raw NetworkManager, ProcessSpawner, TCP, or unknown capability
interfaces. The gateway itself remains a trusted demo bootstrap service until
scoped listener and manifest-declared shell-launch grants exist; production
remote CLI shell access waits for the SSH gateway layer. The SSH path is
specified separately in SSH Shell Gateway: it
keeps the same TerminalSession and broker-issued shell-bundle boundary, while
adding SSH host authentication, encrypted transport, public-key user
authentication, channel policy, and remote-session audit. Its initial schema
stubs name the terminal construction and authority surfaces as
SshTerminalFactory, TcpListenAuthority, and RestrictedShellLauncher; they
now have focused QEMU proofs for scoped listen authority, public-key session
minting, restricted shell launch, and a bounded plain-TCP terminal-host handoff.
A focused development-only host-key proof grants an explicitly labeled
non-production SshHostKey cap in QEMU that performs bounded fixture
exchange-hash signing. The full runnable OpenSSH gateway still waits on
encrypted transport, SSH packet/channel handling, persistent production
key-management-backed signing, and the final run-ssh-shell host harness.
Agent Mode
Model-driven interaction is defined in
Language Models and Agent Runtime. This proposal does
not describe a separate “agent shell” process. The native shell, running
in “agent mode”, is the tool runner: it holds the session cap bundle,
exposes caps to a LanguageModel service as typed ToolDescriptor
values with per-tool permission modes (auto / consent / stepUp /
forbidden), executes the model’s tool calls against its own caps,
streams results back into the conversation, and keeps the user in the
loop through consent prompts, streaming, and interrupt. There is no
separate PlannerAgent or ActionPlan pipeline.
Long-lived OpenClaw-like hosted agents, swarms, background tasks, external channel ingress, agent-maintained memory/wiki stores, and MCP/A2A-style interoperability are intentionally separate from the shell surface; see capOS-Hosted Agent Swarms. The shell can launch, inspect, approve, or cancel hosted tasks, but it should not own the hosted-agent control plane.
Approval and Authentication
Elevation belongs in a trusted broker service that the shell can consult but cannot impersonate.
Conceptual interfaces:
interface ApprovalClient {
request @0 (
reason :Text,
plan :ActionPlan,
requestedCaps :List(CapRequest),
durationMs :UInt64
) -> (grant :ApprovalGrant);
}
enum ApprovalState {
pending @0;
approved @1;
denied @2;
expired @3;
escalated @4;
}
interface ApprovalGrant {
state @0 () -> (state :ApprovalState, reason :Text);
claim @1 () -> (caps :List(GrantedCap));
cancel @2 () -> ();
}
interface AuthorityBroker {
request @0 (
session :UserSession,
plan :ActionPlan,
requestedCaps :List(CapRequest),
durationMs :UInt64
) -> (grant :ApprovalGrant);
}
ActionPlan is the structured description of the work the request will
perform. Free-form text it carries is for the approval UI; the broker
decides authority from the typed step list, never from the summary string.
struct ActionPlan {
# Brief, redactable, human-readable summary. Used by the approval UI;
# not used as an authority input by the broker.
summary @0 :Text;
# Structured action steps. The broker decides whether each step is
# representable for the bound session/profile; an unrepresentable step
# fails the whole request.
steps @1 :List(ActionStep);
# True if any step modifies durable state, terminates a service,
# releases storage, sends external traffic, or is otherwise hard to
# reverse. Brokers may require step-up authentication and longer
# review windows when this is set.
destructive @2 :Bool;
# Stable identifier the requester sets so it can correlate the resulting
# grant or queue entry. Brokers must not interpret this as authority.
requestId @3 :Data;
}
struct ActionStep {
union {
spawn :group {
# Manifest entry name or trusted launcher alias. The broker
# resolves the alias to a binary identity before grant.
target @0 :Text;
# Cap names the spawned process needs from the launcher's
# advertised set. Each name maps to a concrete `CapRequest`
# in the enclosing `ActionPlan.requestedCaps`.
capNames @1 :List(Text);
}
serviceControl :group {
service @2 :Text;
verb @3 :ServiceVerb;
}
storageOpen :group {
namespace @4 :Text;
path @5 :Text;
mode @6 :StorageMode;
}
# Free-form structured payload describing a step the broker
# recognises by name. Lets new step kinds land without re-issuing
# the schema; brokers refuse unknown `kind` values.
custom :group {
kind @7 :Text;
payload @8 :Data;
}
}
}
enum ServiceVerb {
start @0;
stop @1;
restart @2;
reload @3;
}
enum StorageMode {
read @0;
readWrite @1;
append @2;
}
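The all-or-nothing rule for plan steps can be sketched as a validation pass over the typed step list. This is a simplified model, not broker code: `Profile` and its three allow-lists are hypothetical policy inputs, `storageOpen` is omitted for brevity, and the `summary` text is deliberately absent from the function signature to reflect that it is never an authority input.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ServiceVerb { Start, Stop, Restart, Reload }

/// Reduced mirror of the `ActionStep` union (no `storageOpen`).
#[derive(Debug, Clone, PartialEq)]
pub enum ActionStep {
    Spawn { target: String, cap_names: Vec<String> },
    ServiceControl { service: String, verb: ServiceVerb },
    Custom { kind: String },
}

/// Hypothetical per-session policy the broker has already resolved.
pub struct Profile {
    pub launchable: Vec<String>,
    pub controllable: Vec<String>,
    pub custom_kinds: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum PlanError {
    /// Index of the first step the profile cannot represent.
    Unrepresentable { step: usize },
}

/// One unrepresentable step fails the whole request. Unknown custom
/// kinds are refused because they are not in `custom_kinds`.
pub fn validate_plan(profile: &Profile, steps: &[ActionStep]) -> Result<(), PlanError> {
    for (i, step) in steps.iter().enumerate() {
        let ok = match step {
            ActionStep::Spawn { target, .. } => profile.launchable.contains(target),
            ActionStep::ServiceControl { service, .. } => profile.controllable.contains(service),
            ActionStep::Custom { kind } => profile.custom_kinds.contains(kind),
        };
        if !ok {
            return Err(PlanError::Unrepresentable { step: i });
        }
    }
    Ok(())
}
```

Returning the failing index gives the approval UI something precise to show without granting any part of the plan.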
CapRequest describes a single capability the plan needs. The broker
matches each request against the principal’s role bundle and ABAC
context; the response either narrows the request and mints the cap, or
denies. There is no widening path.
struct CapRequest {
# Capability interface name advertised by the broker
# (`ServiceSupervisor`, `Directory`, `TcpProvider`, ...). The broker
# refuses unknown interfaces.
interface @0 :Text;
# Identifier of the target object inside that interface. For
# `ServiceSupervisor` this is the service name; for `Directory` it
# is the namespace path; for `TcpProvider` it is an address-policy
# selector. The broker validates the target against policy.
target @1 :Text;
# Per-cap maximum duration. The grant returns the lesser of this and
# the plan-level `durationMs` after policy narrowing. Zero means
# "use plan-level default".
maxDurationMs @2 :UInt64;
# Optional attenuation hints (subdirectory, method allow-list,
# address filter). The broker may further narrow these but must
# never widen them.
attenuation @3 :Data;
}
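The narrowing rules in `CapRequest` reduce to two small computations. The sketch below is an assumption-laden illustration: `granted_duration_ms` adds a hypothetical `policy_ceiling_ms` input to stand for "after policy narrowing", and `narrow_methods` treats the attenuation hint as a plain method allow-list, which is only one of the hint kinds the schema comment mentions.

```rust
/// Duration actually granted for one cap.
/// `cap_max_ms == 0` means "use the plan-level default".
pub fn granted_duration_ms(plan_ms: u64, cap_max_ms: u64, policy_ceiling_ms: u64) -> u64 {
    let requested = if cap_max_ms == 0 {
        plan_ms
    } else {
        plan_ms.min(cap_max_ms)
    };
    // Policy can shorten the lease but never lengthen it.
    requested.min(policy_ceiling_ms)
}

/// Intersect a requested method allow-list with what policy permits.
/// The result is always a subset of `requested`, so it cannot widen.
pub fn narrow_methods(requested: &[&str], policy: &[&str]) -> Vec<String> {
    requested
        .iter()
        .filter(|m| policy.iter().any(|p| p == *m))
        .map(|m| m.to_string())
        .collect()
}
```

Both functions are monotone in the narrowing direction: no input can make the output exceed the request, which is the property "there is no widening path" asks for.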
GrantedCap is the same transport-level result-cap concept used by
ProcessSpawner – a typed reference to an attenuated, leased
capability the broker has minted. It is not a separate authority
encoding; reading the granted cap is the only way to use the granted
authority.
The native shell holds only a session-bound ApprovalClient. It does not
submit arbitrary PrincipalInfo, role, UID, label values, or authentication
proofs as authority. The ApprovalClient forwards the bound UserSession
and typed request to AuthorityBroker. The broker or a consent service
wrapping it holds powerful caps, drives any trusted consent or step-up
authentication path, and mints attenuated temporary caps after policy and
authentication checks.
The conceptual API intentionally has no authProof argument on the
shell-visible path. If a proof is needed, it is collected by
SessionManager, the broker, or a trusted approval UI and reflected back
to the shell only as pending, approved, denied, expired, or
escalated.
Approval Inbox
Synchronous approval is not always available. Step-up authentication, a dual-control destructive action, or a deferred review (for example a service-restart change-window) all need a durable queue: the request must be listable later, persistent across reconnects, and triageable in batch.
The broker exposes that queue through an ApprovalInbox cap minted
into the session bundle of whoever may approve. The inbox is not a
shell cap; the native shell uses ApprovalClient to submit requests,
and a separate principal (a security operator, the same operator under
step-up, or a multi-party reviewer set) holds the inbox cap that
decides them. Remote workspaces (the CapSet UI) treat
ApprovalInbox as the canonical pending-actions surface, which lets a
browser session show “you have pending approvals” without granting the
browser any of the requested authority.
interface ApprovalInbox {
# List entries currently awaiting decision. Bounded; the broker
# enforces a per-inbox visible-window cap and may return fewer than
# `limit` rows. `truncated` distinguishes "broker capped this page"
# from "no further rows".
list @0 (
cursor :Data,
limit :UInt32
) -> (
entries :List(ApprovalEntry),
nextCursor :Data,
truncated :Bool
);
# Look up a specific entry by id. Useful when a UI deep-links to
# an entry past the listed window.
entry @1 (entryId :Data) -> (entry :ApprovalEntry);
# Approve, deny, or escalate a single entry. `approve` returns the
# `ApprovalGrant` minted by the broker; `deny` and `escalate`
# transition the entry without minting caps. The decider's reason
# text is bounded and recorded in audit.
decide @2 (
entryId :Data,
decision :ApprovalDecision,
reason :Text
) -> (grant :ApprovalGrant);
# Bulk-decide entries that share shape (same requester principal,
# same plan summary fingerprint, same destructive flag). The broker
# rejects mixed shapes with an explicit diagnostic instead of
# silently approving heterogeneous requests.
batchDecide @3 (
entryIds :List(Data),
decision :ApprovalDecision,
reason :Text
) -> (grants :List(ApprovalGrant));
# Subscribe to inbox change events. The listener cap is held by
# the broker; logging out of the inbox session revokes the
# subscription.
watch @4 (listener :ApprovalListener) -> ();
}
enum ApprovalDecision {
approve @0;
deny @1;
escalate @2;
}
struct ApprovalEntry {
# Broker-minted opaque id, stable across reconnects.
entryId @0 :Data;
# Opaque audit-only principal id of the requester.
requesterId @1 :Data;
# Display name; not authoritative.
requesterName @2 :Text;
plan @3 :ActionPlan;
requestedCaps @4 :List(CapRequest);
durationMs @5 :UInt64;
state @6 :ApprovalState;
# Last decider reason or denial detail; bounded.
reason @7 :Text;
createdAtMs @8 :UInt64;
expiresAtMs @9 :UInt64;
escalation @10 :EscalationInfo;
}
struct EscalationInfo {
# Number of additional reviewers the broker has notified. Zero when
# the entry has not been escalated.
reviewerCount @0 :UInt32;
# Role names of the additional reviewers; never principal ids.
reviewerHints @1 :List(Text);
}
interface ApprovalListener {
appended @0 (entry :ApprovalEntry) -> ();
decided @1 (entryId :Data, state :ApprovalState) -> ();
expired @2 (entryId :Data) -> ();
}
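The shape check that `batchDecide` performs can be sketched as a comparison against the first entry. This is a hypothetical model: `EntryShape`, `BatchError`, and representing the plan summary fingerprint as a `u64` are assumptions for the example, and rejecting an empty batch is a choice made here rather than something the schema states.

```rust
/// The three properties the schema comment says a batch must share.
#[derive(Debug, Clone, PartialEq)]
pub struct EntryShape {
    pub requester_id: Vec<u8>,
    pub summary_fingerprint: u64,
    pub destructive: bool,
}

#[derive(Debug, PartialEq)]
pub enum BatchError {
    Empty,
    /// Index of the first entry whose shape differs from entry 0.
    MixedShape { index: usize },
}

/// Accept a batch only when every entry has the same shape.
pub fn check_batch(entries: &[EntryShape]) -> Result<(), BatchError> {
    let first = entries.first().ok_or(BatchError::Empty)?;
    for (i, e) in entries.iter().enumerate().skip(1) {
        if e != first {
            return Err(BatchError::MixedShape { index: i });
        }
    }
    Ok(())
}
```

Reporting the index of the first mismatch is one way to provide the explicit diagnostic the interface comment calls for.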
The ApprovalClient itself does not change shape: a request that the
broker cannot decide synchronously still returns an ApprovalGrant
immediately, with state == pending and a stable handle. The broker
adds an entry to the corresponding inbox; the requester polls or
watches its grant; the inbox holder drives the decision. When the
inbox holder calls decide(approve), the existing grant transitions
to approved and claim returns the minted caps – the requester
does not learn an entry id, and the inbox does not learn the
requester’s ApprovalGrant cap. The two surfaces meet only at the
broker.
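The grant lifecycle described above behaves like a small state machine in which `claim` yields caps only in the approved state. The sketch is a model under assumptions: `Grant`, `GrantError`, and representing minted caps as strings are illustrative, and the rule that an escalated request stays open for a later decision while approved, denied, and expired are final is an assumption the text does not spell out.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ApprovalState { Pending, Approved, Denied, Expired, Escalated }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ApprovalDecision { Approve, Deny, Escalate }

#[derive(Debug, PartialEq)]
pub enum GrantError {
    NotApproved(ApprovalState),
    AlreadyFinal(ApprovalState),
}

/// Requester-visible grant handle; the broker drives its transitions.
pub struct Grant {
    state: ApprovalState,
    minted: Vec<String>,
}

impl Grant {
    pub fn pending() -> Self {
        Grant { state: ApprovalState::Pending, minted: Vec::new() }
    }

    pub fn state(&self) -> ApprovalState {
        self.state
    }

    // Assumption: only pending and escalated requests can still change.
    fn is_open(&self) -> bool {
        self.state == ApprovalState::Pending || self.state == ApprovalState::Escalated
    }

    /// Broker-side transition, driven by the inbox holder's decision.
    pub fn decide(&mut self, decision: ApprovalDecision, caps: Vec<String>) -> Result<(), GrantError> {
        if !self.is_open() {
            return Err(GrantError::AlreadyFinal(self.state));
        }
        match decision {
            ApprovalDecision::Approve => {
                self.state = ApprovalState::Approved;
                self.minted = caps;
            }
            ApprovalDecision::Deny => self.state = ApprovalState::Denied,
            ApprovalDecision::Escalate => self.state = ApprovalState::Escalated,
        }
        Ok(())
    }

    /// Broker timer fired at `expiresAtMs`.
    pub fn expire(&mut self) {
        if self.is_open() {
            self.state = ApprovalState::Expired;
        }
    }

    /// Requester side: caps are available only once approved.
    pub fn claim(&self) -> Result<Vec<String>, GrantError> {
        if self.state == ApprovalState::Approved {
            Ok(self.minted.clone())
        } else {
            Err(GrantError::NotApproved(self.state))
        }
    }
}
```

Note that nothing in `Grant` refers to an entry id: the requester polls `state` and calls `claim`, while the inbox side is a separate surface that reaches this object only through the broker.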
Inbox entries are durable across reconnects because entryId is
broker-minted and the inbox cap is session-bound rather than
transport-bound. Closing a transport does not delete entries;
re-presenting the same session-scoped inbox cap rebinds the listener
without losing pending state. Entries expire on the broker timer at
expiresAtMs and produce an expired listener event; expired
entries remain visible to entry() for a bounded audit window
defined by broker policy, after which they move to the audit log
only.
Elevation Flow
User request (typed directly, or produced by agent-mode tool-use as an
ActionPlan before invoking the broker):
restart the network stack
Requested action presented to the broker:
- stop service "net-stack"
- spawn "net-stack"
- grant: nic, timer, log
- wait for health check
Missing authority:
- ServiceSupervisor(net-stack)
Requested duration:
- 60 seconds
Broker decision:
- Which UserSession and profile is this request bound to?
- Is that principal/profile allowed to restart net-stack?
- Is the requested binary allowed?
- Are the requested grants narrower than policy permits?
- Do mandatory confidentiality and integrity constraints allow the grant?
- Is there fresh user presence?
- Does this require step-up authentication?
If approved, the broker returns a narrow leased capability:
supervisor: ServiceSupervisor(service="net-stack", expires=60s)
It should not return broad ProcessSpawner, BootPackage, or
DeviceManager authority when a scoped supervisor cap can do the job.
Authentication
Authentication proof should be consumed by the SessionManager or broker
boundary, not exposed as a secret to the shell. Suitable mechanisms include:
- password or PIN for medium-risk local actions.
- hardware key or WebAuthn-style challenge for administrative actions.
- TPM-backed local presence for device or boot-policy operations.
- OIDC step-up: broker requests a fresh ID token from the session’s IdP with prompt=login, max_age, or stronger acr_values before returning a leased cap. The IdP and SessionManager drive the user interaction; the shell sees only pending → approved / denied.
- multi-party approval for destructive policy, storage, or recovery actions.
The shell should never receive raw tokens (including OAuth access or refresh
tokens), private keys, recovery codes, or full environment dumps. When the
broker must delegate outbound authority to a session — for example, “read
from this company’s HR API” — it returns a wrapper capability that holds
the AccessToken internally; the shell invokes the wrapper without seeing
the bearer string.
Shell Hardening
The shell must treat files, logs, web pages, service output, model output, and CQE payloads as untrusted data. They are not instructions.
Required behavior:
- show an executable typed plan before authority-changing actions.
- keep elevated caps leased, narrow, and short-lived.
- release temporary caps after the plan finishes or fails.
- audit every approval request, grant, cap transfer, and release.
- require exact targets for destructive actions.
- refuse broad phrases such as “give it everything” unless a trusted policy explicitly allows a named emergency mode.
- keep any model-derived context separate from secrets and authentication proofs; see the LLM/agent-runtime proposal for the model-service side.
The enforcement rule is simple: users and models may propose, explain, and request. Capabilities decide what can happen.
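The "release temporary caps after the plan finishes or fails" requirement maps naturally onto a scope guard. The sketch below shows the pattern only: `LeasedCap`, `AuditLog`, and `run_plan` are hypothetical, the audit log is an in-memory vector, and a real release would be a capability call rather than a log line.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// In-memory stand-in for an audit sink, so the behaviour can be observed.
pub type AuditLog = Rc<RefCell<Vec<String>>>;

/// A temporary elevated cap that releases itself when it leaves scope.
pub struct LeasedCap {
    name: String,
    audit: AuditLog,
}

impl LeasedCap {
    pub fn new(name: &str, audit: AuditLog) -> Self {
        audit.borrow_mut().push(format!("grant {}", name));
        LeasedCap { name: name.to_string(), audit }
    }
}

impl Drop for LeasedCap {
    fn drop(&mut self) {
        // Runs on every exit path, including early error returns.
        self.audit.borrow_mut().push(format!("release {}", self.name));
    }
}

/// Run a plan that holds a lease; `fail` simulates a failed health check.
pub fn run_plan(audit: AuditLog, fail: bool) -> Result<(), String> {
    let _lease = LeasedCap::new("supervisor", audit.clone());
    if fail {
        return Err("health check failed".to_string());
    }
    Ok(())
}
```

Tying release to scope exit means a plan that returns early with an error cannot leave the elevated cap in the shell's context, and the audit trail records the release in both cases.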
POSIX Shell
The POSIX shell is a compatibility layer for existing software and scripts. It should be useful, but it should not define native capOS administration.
The C-ABI substrate for porting POSIX programs (including a POSIX shell) is
specified separately in
POSIX Adapter. libcapos exposes the
capability ring, CapSet, raw syscalls, and heap to C; libcapos-posix layers
the POSIX shape (fd table, errno, pipe / read / write / dup / dup2,
fork / execve / waitpid / _exit, posix_spawn and the file-action
shims, clock_gettime, UDP socket calls, console-backed stdio) on top. Phases
P1.1, P1.2, and P1.3 of that proposal are landed; the C-substrate, pipe cap,
recording-shim fork-for-exec, direct posix_spawn path, and Console-backed
stdio are proven by QEMU smokes (make run-c-hello, make run-posix-dns-smoke,
make run-posix-pipe-smoke, make run-posix-stdio-smoke). The POSIX shell port
itself depends on Namespace and File caps, which are tracked in that
proposal as gating work after the current phases close.
Mapping
POSIX concepts map onto granted capabilities:
| POSIX concept | capOS backing |
|---|---|
| / | synthetic root built from granted Directory or FileServer caps |
| cwd | current scoped Directory cap |
| fd | local handle to File, ByteStream, pipe, terminal, or socket cap |
| pipe | ByteStream pair or userspace pipe service |
| PATH | search inside the synthetic root or a command registry cap |
| exec | ProcessSpawner or restricted launcher cap |
| sockets | socket factory caps such as TcpProvider or HttpEndpoint |
| uid, gid, user, group | synthetic POSIX profile derived from session metadata |
| $HOME | path alias backed by a granted home directory or namespace cap |
| /etc/passwd, /etc/group | profile service view, scoped to the compatibility environment |
| env vars | data only; never authority by themselves |
If a POSIX process has no network cap, connect() fails. If it has no
directory mounted at /etc, opening /etc/resolv.conf fails. If it has no
device cap, /dev is empty or synthetic.
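The failure behaviour above follows directly from resolving every POSIX operation through the granted caps. This is a minimal model, not libcapos-posix: `PosixEnv`, `PosixError`, the restriction to top-level mount names, and the cap names used in the usage note are assumptions, and the mapping from these errors to specific errno values is left open.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
pub enum PosixError {
    NotAbsolute,
    NoSuchMount(String),
    NoNetworkCap,
}

/// Hypothetical compatibility environment for one POSIX process.
pub struct PosixEnv {
    /// Top-level name in the synthetic root -> backing Directory cap name.
    pub mounts: BTreeMap<String, String>,
    /// Name of the granted socket factory cap, if any.
    pub socket_factory: Option<String>,
}

impl PosixEnv {
    /// Resolve an absolute path to `(backing cap, path relative to it)`.
    /// Assumption: mounts exist only at the top level of the synthetic root.
    pub fn open(&self, path: &str) -> Result<(String, String), PosixError> {
        let trimmed = path.strip_prefix('/').ok_or(PosixError::NotAbsolute)?;
        let (top, rest) = match trimmed.find('/') {
            Some(i) => (&trimmed[..i], &trimmed[i + 1..]),
            None => (trimmed, ""),
        };
        let cap = self
            .mounts
            .get(top)
            .ok_or_else(|| PosixError::NoSuchMount(top.to_string()))?;
        Ok((cap.clone(), rest.to_string()))
    }

    /// `connect()` succeeds only by going through a granted factory cap.
    pub fn connect(&self, _addr: &str) -> Result<String, PosixError> {
        self.socket_factory.clone().ok_or(PosixError::NoNetworkCap)
    }
}
```

With an empty `mounts` map and no socket factory, both `open("/etc/resolv.conf")` and `connect(...)` fail; granting an `etc` mount makes the first succeed without affecting the second.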
A POSIX shell is launched with both a CapSet and compatibility profile metadata. The profile controls what legacy APIs report. The CapSet controls what the process can actually do.
Compatibility Limits
Exact Unix semantics should not be promised early.
- Prefer posix_spawn over full fork for the first implementation.
- fork with arbitrary shared process state can be emulated later if needed.
- setuid cannot grant caps. At most it asks a compatibility broker to replace the POSIX profile or launch a new process with a different broker-issued cap bundle.
- Mode bits and ownership metadata do not create authority.
- chmod can modify filesystem metadata exposed by a filesystem service, but it cannot grant caps outside that service’s policy.
- /proc is a debugging service view, not kernel ambient introspection.
- Device files exist only when a capability-backed adapter deliberately exposes them.
This is enough for many build tools and CLI programs without making POSIX the security model.
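The setuid limit can be sketched the same way. The Python below is an assumption-laden illustration: Process, CompatBroker, and replace_profile are hypothetical names, and the allow-list policy is only a placeholder for whatever the compatibility broker actually decides.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Process:
    uid: int          # synthetic POSIX identity
    caps: frozenset   # names of granted caps


class CompatBroker:
    """Hypothetical broker that decides which profile changes are allowed."""

    def __init__(self, allowed_uids):
        self.allowed_uids = set(allowed_uids)

    def replace_profile(self, proc: Process, new_uid: int) -> Process:
        if new_uid not in self.allowed_uids:
            raise PermissionError(f"setuid({new_uid}) denied by broker policy")
        # Only the reported identity changes; the cap set is carried over as-is.
        return replace(proc, uid=new_uid)


def setuid(broker: CompatBroker, proc: Process, new_uid: int) -> Process:
    """Compatibility shim: a profile request, never a grant."""
    return broker.replace_profile(proc, new_uid)
```

Even a successful setuid(broker, proc, 0) leaves proc.caps unchanged in this sketch; gaining authority would require the broker to launch a new process with a different bundle.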
POSIX Session Caps
A normal POSIX shell session might receive:
terminal TerminalSession
session UserSession metadata
profile POSIX profile view
root Directory or FileServer synthetic root
launcher restricted ProcessSpawner/command launcher
pipeFactory ByteStream factory
clock Timer
Optional caps:
tcp scoped socket provider
home writable user Directory
tmp temporary Directory
proc read-only process inspection tree
Administrative caps still require broker-mediated approval.
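A broker-side view of this bundle might look like the following Python sketch. The cap names come from the lists above; treating the first list as required, the function name build_posix_bundle, and the dictionary representation are assumptions made for illustration.

```python
# Names taken from the session cap lists above.
REQUIRED = ("terminal", "session", "profile", "root",
            "launcher", "pipeFactory", "clock")
OPTIONAL = ("tcp", "home", "tmp", "proc")


def build_posix_bundle(available: dict, granted_optional: set) -> dict:
    """Assemble a session bundle from caps the broker holds.

    Anything outside REQUIRED and OPTIONAL (for example administrative
    caps) is never copied into the base bundle.
    """
    missing = [name for name in REQUIRED if name not in available]
    if missing:
        raise LookupError(f"cannot build session, missing caps: {missing}")
    unknown = set(granted_optional) - set(OPTIONAL)
    if unknown:
        raise ValueError(f"not an optional session cap: {sorted(unknown)}")
    bundle = {name: available[name] for name in REQUIRED}
    for name in OPTIONAL:
        if name in granted_optional and name in available:
            bundle[name] = available[name]
    return bundle
```

The point of the allow-list shape is that a cap the broker happens to hold does not reach the shell unless policy names it explicitly.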
Recovery Shell
A recovery shell is a separate policy profile, not the normal interactive shell with hidden extra privileges. It may receive a larger cap set, but only after strong local authentication and with full audit logging. Guest and anonymous profiles must not fall into recovery authority by omission.
Possible recovery bundle:
console
boot package read
system status read
service supervisor for critical services
read-only storage inspection
scoped repair caps
approval client
Destructive recovery operations should still go through exact-target approval. The recovery shell should be local-only unless a separate remote recovery policy explicitly grants network access.
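The selection rule described above can be sketched as a default-deny function. This Python is illustrative only: ShellRequest, issue_bundle, MINIMAL_BUNDLE, and the string profile names are hypothetical, and the recovery bundle simply repeats the list above.

```python
from dataclasses import dataclass

RECOVERY_BUNDLE = (
    "console", "boot package read", "system status read",
    "service supervisor for critical services",
    "read-only storage inspection", "scoped repair caps", "approval client",
)
MINIMAL_BUNDLE = ("terminal",)  # hypothetical guest/anonymous bundle


@dataclass(frozen=True)
class ShellRequest:
    profile: str
    strong_local_auth: bool = False
    remote: bool = False


def issue_bundle(req, audit_log, remote_recovery_allowed=False):
    """Return the cap bundle for a request; unknown profiles get nothing."""
    if req.profile == "recovery":
        if not req.strong_local_auth:
            raise PermissionError("recovery requires strong local authentication")
        if req.remote and not remote_recovery_allowed:
            raise PermissionError("recovery shell is local-only by default")
        audit_log.append(("recovery-bundle-issued", req))
        return RECOVERY_BUNDLE
    if req.profile in ("guest", "anonymous"):
        return MINIMAL_BUNDLE
    # No fallthrough: an unrecognized profile is an error, not recovery.
    raise ValueError(f"unknown profile: {req.profile!r}")
```

Because the recovery branch is reached only by an explicit profile name plus authentication, a guest or anonymous request cannot acquire recovery authority by omission.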
Required Interfaces
This proposal implies several service interfaces beyond the current smoke-test surface:
- UserSession/SessionManager: principal/session metadata, audit context, and guest or anonymous profile creation (user identity proposal).
- TerminalSession: session-scoped interactive terminal I/O. The first boundary is line-oriented write, writeLine, and bounded readLine with per-call echo control and submitted/cancelled/closed outcomes; resize and paste framing can layer on later.
- StdIO: explicit text I/O capability serviced by the shell, a test harness, a web gateway, or another UI adapter. It has named stdout, stderr, and status streams plus line, block, and hidden read modes; it does not imply inherited POSIX file descriptors and should not be the semantic command interface for native interactive applications.
- CommandSession: generic interactive command surface for native applications. It describes command paths, nested subcommands, argument shapes, completions, prompts, redaction metadata, render events, and typed invocation results.
- TerminalHost / terminal entity: process and session object owning raw terminal transport, line discipline, presentation state, history, resize, and GUI/web framing while granting a foreground session to the shell.
- SchemaRegistry: maps interface IDs to method names and parameter schemas.
- CommandRegistry: optional registry of native command capabilities.
- SystemStatus: read-only process and service status.
- LogReader: scoped log access.
- ServiceSupervisor: restart/status authority for one service or subtree.
- AuthorityBroker/ApprovalClient: session-bound base bundles, plan-specific leased grants, and policy/authentication mediation.
- CredentialStore, ConsoleLogin, and WebShellGateway: boot-to-shell authentication services for password-verifier setup, passkey registration, federated OIDC login, and text terminal launch (boot-to-shell proposal).
- OAuthClient, OidcIdentityProvider, TokenVerifier, WorkloadIdentityFederation: OAuth2/OIDC primitives for federated login, outbound service authentication, and inbound resource-server token validation (OIDC and OAuth2 proposal).
- SshGateway, SshHostKey, AuthorizedKeyStore, SshTerminalFactory, TcpListenAuthority, and RestrictedShellLauncher: production remote CLI terminal ingress, SSH host-key proof, public-key login mapping, scoped TCP listen authority, shell-only launch authority, and SSH-backed TerminalSession launch. The current development host-key proof exposes non-production public metadata and performs bounded fixture signing in QEMU; production host keys still require persistent key management (SSH shell proposal).
- AuditLog: append-only record of plans, approvals, grants, and releases.
- POSIXProfile / compatibility broker: synthetic UID/GID, names, $HOME, cwd, and profile replacement without treating POSIX metadata as authority.
- ByteStream / pipe factory: explicit byte-stream composition for POSIX and selected native pipelines.
These should be ordinary capabilities. A shell only sees the subset it has been granted.
Implementation Plan
- Native serial shell
  - Built on capos-rt.
  - Lists initial CapSet entries.
  - Invokes typed methods on the capabilities it was actually granted, including TerminalSession for ordinary interactive sessions.
  - When launched with a restricted launcher or other scoped spawn authority, spawns and waits on exact-grant children without assuming broad BootPackage or ProcessSpawner access.
  - Provides caps, inspect, call, spawn, run, wait, release, and trace.
  - Runs interactive applications as ordinary spawned commands or resident command sessions. StdIO requests may be serviced for text-stream programs, but native app commands should flow through structured command surfaces.
- Session-aware shell profile
  - Use the SessionManager -> UserSession metadata and AuthorityBroker(session, profile) -> cap bundle split.
  - Add self/session introspection without making identity metadata authoritative.
  - Start with guest, local-presence, and service-account profiles before durable account storage exists.
- Structured native scripting
  - Add typed variables, result-cap binding, and plan serialization.
  - Add schema registry support for method names and argument validation.
  - Add a generic command-surface parser so command <args> and nested subcommands compile to typed invocations without app-specific shell matches.
  - Add explicit byte-stream adapters for commands that need text streams.
- Approval broker
  - Define ActionPlan, ActionStep, CapRequest, ApprovalClient, ApprovalInbox, ApprovalEntry, and leased grant records.
  - Add local authentication and audit logging.
  - Make administrative native-shell operations request scoped caps through the broker instead of running from a permanently privileged shell.
  - Wire ApprovalInbox into the operator session bundle so deferred, stepped-up, and multi-party approvals have a durable triage surface instead of relying on synchronous return-from-request.
- Boot-to-shell integration
  - Add local console login/setup in front of the native shell.
  - Require a configured password verifier when one exists.
  - Enter setup mode when no console password verifier exists.
  - Treat guest as an explicit local profile and anonymous as a separate remote/programmatic profile, not as missing-password fallbacks.
  - Support passkey-only web terminal setup through local/bootstrap authority, not unauthenticated remote first use.
  - The local console login/setup half of this step is landed; the full boot-to-shell flow (durable multi-verifier accounts, passkey paths, federated OIDC login, web text shell gateway, production SSH shell gateway) is tracked in Boot to Shell.
- Agent mode (out of scope here)
  - Defined in Language Models and Agent Runtime: no separate “agent shell” process. The native shell, running in “agent mode”, is the tool runner: it gains a LanguageModel client cap plus a per-tool permission table (auto/consent/stepUp/forbidden), exposes its own session caps as typed ToolDescriptor values to the model service, executes the model’s tool calls against those caps, streams results back into the conversation, and keeps the user in the loop through consent prompts and interrupts. There is no PlannerAgent or static ActionPlan pipeline.
- POSIX shell
  - Implement after Directory/File, ByteStream, and restricted process launch exist.
  - Start with posix_spawn, fd table emulation, cwd, scoped root, pipes, and terminal I/O, plus synthetic POSIX profile metadata.
  - Add broader compatibility only as real workloads demand it.
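The console entry rules in the boot-to-shell step can be sketched as a small decision function. This Python is a hedged illustration: console_login, the verifier callable, and the returned mode strings are hypothetical, and checking an explicit guest request before the verifier is an assumption rather than something the plan specifies.

```python
from typing import Callable, Optional


def console_login(
    verifier: Optional[Callable[[str], bool]],
    password: Optional[str] = None,
    guest_requested: bool = False,
) -> str:
    """Pick the console entry mode; guest is never a failure fallback."""
    if guest_requested:
        # Guest is an explicit local profile chosen by the operator.
        return "guest"
    if verifier is None:
        # No console password verifier exists yet: enter setup mode.
        return "setup"
    if password is not None and verifier(password):
        return "user"
    # A wrong or missing password is denied, not downgraded to guest.
    return "denied"
```

The only way to reach "guest" in this sketch is to ask for it, which mirrors the rule that guest and anonymous are profiles rather than missing-password fallbacks.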
Non-Goals
- No global root namespace.
- No shell-owned root/admin bit.
- No model-visible secrets.
- No default inheritance of all shell caps into children.
- No authorization from PrincipalInfo, UID/GID, role, or label values alone.
- No promise that POSIX scripts observe exact Unix behavior without a compatibility profile that grants the needed caps.
Open Questions
- Should the native shell syntax be CUE-derived, Cap’n-Proto-literal-derived, or a smaller custom grammar?
- How should schema reflection be packaged before a full runtime SchemaRegistry exists?
- How should later TerminalSession extensions such as resize and paste framing fit without exposing raw transport authority to ordinary shells?
- How should the broker fingerprint plans for ApprovalInbox.batchDecide shape-equivalence? A direct hash of ActionPlan.steps is enough for identical plans submitted by the same requester profile, but near-identical plans differing only in requestId or summary text must still batch; near-identical plans differing in step targets or attenuation must not. The broker design needs an explicit fingerprinting rule before batchDecide can be enabled.
- How should audit logs be stored before persistent storage exists?
- How should interactive terminal UX scale beyond the planned
“one typed capability per command” native-shell surface? The current
prototype only exposes narrow typed commands; the questions below apply
to the proposed surface, not just what already runs. Several concrete
pain points are open:
  - Cap management is manual. A shell user holds a CapSet and must inspect, name, attenuate, pass, and release caps explicitly per command. That is the right model for trust, but it is hostile for everyday work compared with a Unix prompt where $PWD, $PATH, open fds, and ambient credentials disappear from the user’s mind. The question is what affordances (named bindings, scoped session “workspaces”, broker-issued bundles bound to a task, auto-release on plan completion, undo/redo on cap moves, a visible “current authority” indicator) the shell should provide so the typical user is not hand-curating a cap graph for every line. None of this should re-introduce ambient authority; the goal is ergonomics over an already typed graph, not hiding it.
  - No agreed convention for passing parameters to programs. The manifest currently launches binaries with a named CapSet and no positional args, no argv, no environment block, and no structured parameter struct (see system.cue and SystemManifest in schema/capos.capnp); init’s ProcessSpawner-driven children inherit only the caps named in the spawn plan. Shell spawn ... with { ... } syntax is similarly cap-only. That is consistent, but it leaves “what does this program need to know besides its caps?” unanswered: where do free-form values (a chat channel name, an adventure save slot, a resize width) live? Options range from a typed LaunchParameters capnp struct passed through the spawn plan, to a convention that every program declares a parameter schema discovered via SchemaRegistry, to letting parameters always travel as fields on the first method call against a CommandSession/service cap rather than at launch time. The proposal should pick a single shape and describe how the manifest, shell spawn/run, native applications, and POSIX argv adapters all map onto it.
  - No replacement for Unix pipes. The native composition example uses |> but defers byte-stream semantics to ByteStream/StdIO, which is a strictly weaker pipe and not a data-processing model. Real workloads on Unix lean on text streams precisely because they are cheap and structured-enough; capOS can do better with typed records. The open question is whether to standardize a higher-level data-processing primitive, for example YTsaurus-style map/reduce operators where each stage declares input and output schemas (RecordStream<T>?), the runtime negotiates a wire format (capnp records, framed JSON, columnar, raw bytes) at the boundary, and the shell’s |> becomes a pipeline planner rather than a byte pump. That would give native shell pipelines first-class typed composition without making every interface look like ByteStream. The question is whether this belongs in shell scope, in a separate data-processing proposal, or as a RecordStream capability in the schema registry that the shell merely consumes.
  - No story for ordinary shell programming constructs. The proposed surface is one typed call per line plus |>; the prototype is even narrower. Real interactive and scripted use needs conditionals (branch on a cap call result, on CapException kind, on a value field), loops (iterate a List, fold a RecordStream, retry-with-backoff against a Timer), local variables and assignment beyond the implicit $ from |>, user-defined functions/procedures that take typed parameters and capability arguments, early-return / break, and structured error handling that distinguishes transport-level CapException from application-level result variants. Each of these has capability-graph consequences that POSIX shells never had to face: does a function body close over the caller’s CapSet by reference or by an explicit captured set, are caps bound inside a loop iteration auto-released at the end of that iteration, does a try/recover block release leased broker grants on the failure path, can a function be saved and re-invoked across sessions (i.e. does it become a persistent ActionPlan template), and how does the shell present a partial failure mid-pipeline without leaving orphan caps. The proposal should decide whether the native shell language defines these constructs itself, borrows them from a host language (CUE, a small embedded Rust-like DSL, an existing scripting runtime exposed as a capability), or stays deliberately non-Turing-complete and forces non-trivial control flow into spawned programs that expose typed CommandSession interfaces back to the shell.
  - No environment-variable concept, and no clear replacement. Unix $VAR/export does three jobs at once: ambient configuration inherited by every child, a per-process key-value scratchpad, and a side channel for caller-supplied tweaks (PATH, LANG, TZ, HTTP_PROXY, XDG_*). capOS deliberately has none of this: the manifest passes only a CapSet, and the shell does not synthesize a process-wide string-keyed table. There is also no obvious immediate need: configuration that should be authoritative belongs in a Config capability, locale/timezone are policy state on a session or service cap, and per-invocation tweaks fit the still-undecided parameter-passing convention above. The open question is whether capOS ever needs an explicit environment-like primitive (e.g. a KeyValueScope capability bound to a session, an inheritable structured “ambient context” attached to a spawn plan, or a typed ConfigOverlay channel) for the cases where Unix would have used an environment variable, or whether each historical use case should instead be replaced by a dedicated capability (Locale, Clock, ProxyPolicy, XdgPaths, LogLevel) and the absence of an environment table treated as a feature rather than a gap. POSIX compatibility still has to expose getenv/environ, but that is a separate per-process synthetic view inside the POSIX profile, not a native-shell concept.
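For the plan-fingerprinting question above, one candidate rule can be sketched in Python. This is only an illustration of the stated constraints, not a proposed answer: the dictionary layout and the field names requesterProfile, target, method, and attenuation are hypothetical, and only requestId and summary are taken from the question itself.

```python
import hashlib
import json


def plan_fingerprint(plan: dict) -> str:
    """Hash only the fields that define a plan's shape.

    requestId and summary text are deliberately left out, so plans that
    differ only there collide; step targets and attenuation are included,
    so plans that differ there do not.
    """
    shape = {
        "requesterProfile": plan["requesterProfile"],
        "steps": [
            {
                "target": step["target"],
                "method": step["method"],
                "attenuation": step.get("attenuation"),
            }
            for step in plan["steps"]
        ],
    }
    # Canonical JSON keeps the hash independent of dict key order.
    canonical = json.dumps(shape, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A rule of this kind makes the batching boundary explicit: whatever is copied into shape decides which plans may be approved together, so the field list itself would need review as part of the broker design.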