Proposal: Native Shell and POSIX Shell

How interactive operation should work on capOS without reintroducing ambient authority through a Unix-like command line.

Problem

capOS deliberately avoids global paths, inherited file descriptors, ambient network access, and process-wide privilege bits. A conventional shell assumes all of those. If capOS copied a Unix shell model directly, the shell would either be mostly useless or become an ambiently privileged escape hatch around the capability model.

The system needs two related, but distinct, shell layers:

Native shell: schema-aware capability REPL and scripting language.
POSIX shell: compatibility personality for existing programs and scripts.

Both must be ordinary userspace processes. Neither should receive special kernel privilege. The kernel and trusted capability-serving processes remain the enforcement boundary.

Model-driven interaction on top of the native shell is a separate concern and is defined in Language Models and Agent Runtime. The model runs as its own service with no session authority; the native shell (in “agent mode”) is the runner: it holds the session caps, exposes them to the model as typed tool descriptors with per-tool permission modes, executes tool calls on behalf of the model, streams results back, and keeps the user in the loop.

The first boot-to-shell milestone is text-only: local console login/setup and, later in the same family, a browser-hosted terminal gateway. Graphical shells, desktop UI, compositors, and GUI app launchers are a later tier. See Boot to Shell.

Design Principles

A shell starts with only the capabilities it was granted.
A shell command compiles to typed capability calls, not stringly syscalls.
Child processes receive explicit grants. There is no implicit inheritance of the shell’s full authority.
Elevation is a capability request mediated by a trusted broker, not a flag inside the shell.
Shell startup is a workload launch from a UserSession, service principal, or recovery profile. Session metadata informs policy and audit; it is not authority.
Default interactive cap sets are broker-issued session bundles, not hard-coded shell privileges.
POSIX behavior is an adapter over scoped Directory, File, socket factory, and process capabilities. It is not the native authority model.

User identity and policy sit above this shell model. A shell session may be associated with a human, service, guest, anonymous, or pseudonymous principal, but the session’s capabilities remain the authority. RBAC, ABAC, and mandatory policy decide which scoped caps a broker may grant; they do not create a kernel-side uid, role bit, or label check on ordinary capability calls. See User Identity and Policy.

Federated sessions (OIDC-authenticated principals, and service accounts using OAuth2 workload identity) are one input shape for this model. OAuth scopes and OIDC claims from a session’s issuer feed AuthorityBroker as ABAC attributes. They never authorize capability calls directly, and raw bearer tokens never appear in shell state. The token-typed capabilities OAuthClient and OidcIdentityProvider, along with the broker-side token handling, are defined in OIDC and OAuth2.

Layering

flowchart TD
    Input[Login, guest, anonymous, or service request] --> SessionMgr[SessionManager]
    SessionMgr --> Session[UserSession metadata cap]
    Session --> Broker[AuthorityBroker / PolicyEngine]
    Broker --> Bundle[Scoped session cap bundle]

    Bundle --> Native[Native shell]
    Bundle --> Posix[POSIX shell]

    Posix --> Compat[POSIX compatibility runtime]

    Native --> Ring[capos-rt capability transport]
    Compat --> Ring
    Ring --> Kernel[Kernel cap ring]
    Ring --> Services[Userspace services]

    Native --> Approval[Approval client cap]
    Approval --> Broker
    Broker --> Services
    Broker --> Audit[AuditLog]

The native shell is the primitive interactive surface. The POSIX shell is a compatibility consumer of capOS capabilities, not the model other shells are built on. A language-model service, when present, is invoked through a LanguageModel cap from the native shell running in “agent mode”; the shell is the tool runner, not the model. That flow is defined in Language Models and Agent Runtime and is not expanded in this diagram.

A shell may display a principal name, profile, role set, label, or POSIX UID, but those values are descriptive unless a trusted broker uses them to return a specific capability. Losing a home, logs, launcher, or approval cap cannot be repaired by presenting the same session ID back to the kernel.

Native Shell

The native shell is a typed capability graph operator. Its job is to inspect, invoke, pass, attenuate, release, and trace capabilities.

Current implementation status as of 2026-05-16 21:36 UTC: capos-shell is the standalone no_std crate at shell/ and ships the anonymous-first interactive flow. Focused shell/login manifests still launch it directly as initConfig.init; the default make run manifest now runs it as an init-started service under standalone init, together with the chat / adventure binaries and the remote-session CapSet gateway.

On boot the shell mints an anonymous UserSession via SessionManager.anonymous() and receives an empty-allowlist anonymous bundle from AuthorityBroker. The login and setup commands use CredentialStore, SessionManager, and AuthorityBroker to verify or create the password, mint an operator session, request the operator shell bundle, and swap the session and launcher caps in place. Login prompts for a username as well as a password through a username-aware SessionManager.login() request that carries method, selector, proof, and source metadata. The guest command mints a guest session via SessionManager.guest() and swaps to a broker-issued guest bundle (guest sessions require an explicit manifest seed; no broad authority is granted to guest profiles). Shell exit calls UserSession.logout() to clean up the session context.

The default make run manifest includes the native shell and the chat/adventure binaries, plus terminal, console, stdio, chat, adventure, creds, sessions, audit, broker, and system_info caps; its MOTD shows the concrete spawn / run commands for the adventure demo. The current command set is help, caps, binaries, motd, inspect <name>, session, login, setup, guest, spawn, blocking run, wait, and exit. The binaries command is launcher-backed: it lists the binaries available to the current session, and the anonymous and guest launcher policies return an empty list.

The session-scoped TerminalSession substrate now exists behind make run-terminal, and the bounded SSH terminal-host proof can launch capos-shell over a socket-backed TerminalSession with a public-key UserSession through RestrictedShellLauncher. The generic call @cap.method(...) REPL, schema reflection, richer daily shell profiles, and the full OpenSSH gateway remain future work.

Example init or development session with explicit spawn authority (target syntax; the generic call form is not implemented yet):

capos:init> caps
log        Console
spawn      ProcessSpawner
boot       BootPackage
vm         VirtualMemory

capos:init> call @log.writeLine({ text: "hello" })
ok

capos:init> spawn "tls-smoke" with {
  log: @log
} -> $child
started pid 12

capos:init> wait $child
exit 0

Values

Native shell values should include:

@name: a named capability in the current shell context.
$name: a local value, result, promise, or process handle.
structured values: text, bytes, integers, booleans, lists, and structs.
result-cap values returned through the capOS transfer-result path.
trace values representing CQE and call-history slices.

The shell should preserve interface metadata with every capability value. A method call is valid only if the target cap exposes the method’s schema.
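A minimal Rust sketch of that rule, assuming the shell keeps each cap's schema as a set of method names next to a local cap-table index. CapValue, InterfaceSchema, check_call, and the interface ID value are hypothetical. The same_authority helper also illustrates the point made below for inspect: equal interface IDs do not imply equal authority.

```rust
use std::collections::BTreeSet;

/// Hypothetical interface metadata the shell keeps with every cap value.
#[derive(Clone, Debug, PartialEq)]
pub struct InterfaceSchema {
    pub interface_id: u64,
    pub name: String,
    pub methods: BTreeSet<String>,
}

/// A named capability (`@name`) in the shell context. `object` stands in for
/// the local cap-table entry, which is what distinguishes one authority from another.
#[derive(Clone, Debug)]
pub struct CapValue {
    pub object: u32,
    pub schema: InterfaceSchema,
}

#[derive(Debug, PartialEq)]
pub enum CallError {
    UnknownMethod { interface: String, method: String },
}

impl CapValue {
    /// Refuse a call before it reaches the transport when the target's
    /// schema does not declare the method.
    pub fn check_call(&self, method: &str) -> Result<(), CallError> {
        if self.schema.methods.contains(method) {
            Ok(())
        } else {
            Err(CallError::UnknownMethod {
                interface: self.schema.name.clone(),
                method: method.to_string(),
            })
        }
    }

    /// Sharing an interface ID says nothing about sharing authority.
    pub fn same_authority(&self, other: &CapValue) -> bool {
        self.object == other.object
    }
}

/// Placeholder schema; the ID is not a real capOS interface ID.
pub fn console_schema() -> InterfaceSchema {
    InterfaceSchema {
        interface_id: 0xC0FFEE,
        name: "Console".to_string(),
        methods: BTreeSet::from(["writeLine".to_string()]),
    }
}
```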

Commands

Initial commands should be small and explicit:

caps
binaries
inspect @log
methods @spawn
call @log.writeLine({ text: "boot complete" })
spawn "ipc-server" with { log: @log, ep: @serverEp } -> $server
wait $server
run "ipc-client" with { log: @log, ep: client @serverEp }
release @temporary
trace $server
bind scratch = @store.sub("scratch")
derive readonly = @home.sub("config").readOnly()

inspect should show the interface ID, label, transferability, revocation state when available, and callable methods. It should not imply that two caps with the same interface ID are the same authority.

The current prototype intentionally does not yet provide the generic call @cap.method(...) REPL. Until the schema registry and structured value parser exist, the native shell exposes only narrow typed commands, and that gap should stay visible in the planning docs rather than being papered over by accepting raw method IDs and opaque byte blobs.

Syntax

The syntax should be structured rather than shell-token based. A CUE-like or Cap’n-Proto-literal-like shape fits capOS better than POSIX word splitting:

spawn "net-stack" with {
  log: @log
  nic: @virtioNic
  timer: @timer
}

The shell can still provide abbreviations, but the executable representation should be an ActionPlan object with typed fields.
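To make the "typed fields, not tokens" point concrete, here is a deliberately tiny Rust parser for only the spawn "..." with { ... } form above. SpawnStep, Grant, and parse_spawn are hypothetical stand-ins rather than the ActionPlan schema types; a real front end would cover the whole command grammar. The key property is that malformed input is an error, never a fallback to word splitting.

```rust
/// One entry of the grant block: child-visible name and the shell cap backing it.
#[derive(Debug, PartialEq)]
pub struct Grant {
    pub child_name: String,
    /// Shell-context capability name, without the leading `@`.
    pub source: String,
}

#[derive(Debug, PartialEq)]
pub struct SpawnStep {
    pub target: String,
    pub grants: Vec<Grant>,
}

/// Parse `spawn "<target>" with { name: @cap ... }` into a typed step.
pub fn parse_spawn(src: &str) -> Result<SpawnStep, String> {
    let rest = src.trim().strip_prefix("spawn").ok_or("expected `spawn`")?.trim_start();
    let rest = rest.strip_prefix('"').ok_or("expected quoted target")?;
    let end = rest.find('"').ok_or("unterminated target")?;
    let target = rest[..end].to_string();
    let rest = rest[end + 1..].trim_start();
    let rest = rest.strip_prefix("with").ok_or("expected `with`")?.trim_start();
    let body = rest
        .strip_prefix('{')
        .and_then(|r| r.trim_end().strip_suffix('}'))
        .ok_or("expected `{ ... }` grant block")?;

    let mut grants = Vec::new();
    // Entries are `name: @cap`, separated by newlines or commas.
    for entry in body.split(|c: char| c == '\n' || c == ',') {
        let entry = entry.trim();
        if entry.is_empty() {
            continue;
        }
        let (name, value) = entry.split_once(':').ok_or("expected `name: @cap`")?;
        // Only capability references are accepted as grant values.
        let source = value.trim().strip_prefix('@').ok_or("grant value must be a @cap")?;
        grants.push(Grant { child_name: name.trim().to_string(), source: source.to_string() });
    }
    Ok(SpawnStep { target, grants })
}
```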

Composition

Native composition should pass typed caps or structured values, not inherited byte streams by default:

pipe @camera.frames()
  |> spawn "resize" with { input: $, width: 640, height: 480 }
  |> spawn "jpeg-encode" with { input: $, quality: 85 }
  |> call @photos.write({ name: "frame.jpg", data: $ })

If a byte stream is desired, it should be explicit through a ByteStream, File, or POSIX adapter capability. This keeps the “pipe” operator from silently turning every interface into untyped bytes.
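A small Rust sketch of how a typed pipe could be checked, assuming each stage declares one input and one output type. Ty, Stage, and check_pipe are hypothetical; the stage names mirror the camera example above. Because there is no implicit coercion, a stage that wants bytes fails the check unless an explicit adapter stage produces a ByteStream first.

```rust
/// Hypothetical value types a pipeline stage can consume or produce.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Ty {
    Unit,
    FrameStream,
    JpegBytes,
    ByteStream,
}

#[derive(Debug)]
pub struct Stage {
    pub name: &'static str,
    pub input: Ty,
    pub output: Ty,
}

/// Check that each stage accepts exactly what the previous stage produced
/// and return the final output type. The first stage is the source, so its
/// input is not checked.
pub fn check_pipe(stages: &[Stage]) -> Result<Ty, String> {
    let mut iter = stages.iter();
    let first = iter.next().ok_or("empty pipeline")?;
    let mut current = first.output;
    for stage in iter {
        if stage.input != current {
            return Err(format!("{} expects {:?} but receives {:?}", stage.name, stage.input, current));
        }
        current = stage.output;
    }
    Ok(current)
}
```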

Namespaces

There is no global root. A native shell may have a current Directory or Namespace capability, but that is just a default argument:

capos:user> ls @config
services
network

capos:user> cd @config.sub("services")
capos:@config/services> ls
logger
net-stack

The shell cannot traverse above a scoped directory or namespace unless it holds another capability that names that authority.
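A Rust sketch of the scoping rule from the holder's side, assuming a hypothetical ScopedDir value whose path is kept only for display (the real boundary is the directory service behind the cap). There is no parent() method, sub() refuses components that could escape or alias, and a read-only view stays read-only for everything derived from it.

```rust
/// Hypothetical scoped directory capability as seen by the shell.
#[derive(Clone, Debug, PartialEq)]
pub struct ScopedDir {
    path: Vec<String>,
    read_only: bool,
}

impl ScopedDir {
    /// A directory cap as granted in the session bundle, e.g. `@config`.
    pub fn granted(label: &str) -> Self {
        ScopedDir { path: vec![label.to_string()], read_only: false }
    }

    /// Derive a child scope. `..`, `.`, empty names, and embedded `/` are refused,
    /// so a holder can only move deeper.
    pub fn sub(&self, name: &str) -> Result<ScopedDir, String> {
        if name.is_empty() || name == "." || name == ".." || name.contains('/') {
            return Err(format!("invalid component {:?}", name));
        }
        let mut path = self.path.clone();
        path.push(name.to_string());
        Ok(ScopedDir { path, read_only: self.read_only })
    }

    /// Attenuate to a read-only view; there is no inverse operation.
    pub fn read_only(&self) -> ScopedDir {
        ScopedDir { path: self.path.clone(), read_only: true }
    }

    pub fn is_read_only(&self) -> bool {
        self.read_only
    }

    /// Prompt-style label such as `@config/services`.
    pub fn display(&self) -> String {
        format!("@{}", self.path.join("/"))
    }
}
```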

Session Context

A session-aware shell may hold a self or session cap for UserSession.info() and audit context. That cap is metadata. It can identify the principal, auth strength, expiry, quota profile, and audit identity, but it cannot widen the shell’s CapSet or authorize kernel operations by itself.

The launcher or supervisor starts the shell with a CapSet returned by AuthorityBroker(session, profile). For interactive work, that bundle should usually include scoped terminal, home, logs, launcher, status, and approval caps. For service accounts, guest sessions, anonymous workloads, and recovery mode, the broker returns different bundles under explicit policy profiles.

Shell-launched children inherit only the caps named in the spawn plan. A child may receive a UserSession or session badge for audit, per-client quotas, or service-side selection, but object access still comes from the scoped object caps passed to that child.
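The inheritance rule can be sketched in Rust as building the child's CapSet from the spawn plan's grant list and nothing else. Cap and child_capset are hypothetical names, and a map stands in for the real CapSet. Caps the shell holds but the plan does not name are simply absent from the child, and a grant naming a cap the shell lacks is an error rather than a silent omission.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a capability held by the shell.
#[derive(Clone, Debug, PartialEq)]
pub struct Cap {
    pub interface: &'static str,
    pub object: u32,
}

/// Build a child CapSet from `(child-visible name, shell cap name)` pairs.
pub fn child_capset(
    shell: &BTreeMap<String, Cap>,
    grants: &[(&str, &str)],
) -> Result<BTreeMap<String, Cap>, String> {
    let mut child = BTreeMap::new();
    for (child_name, source) in grants {
        let cap = shell
            .get(*source)
            .ok_or_else(|| format!("shell holds no @{}", source))?;
        if child.insert(child_name.to_string(), cap.clone()).is_some() {
            return Err(format!("duplicate grant name {}", child_name));
        }
    }
    Ok(child)
}
```

A UserSession passed under an audit name is just another explicit entry in the list; it does not pull any object caps along with it.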

Interactive Command Surfaces

Application-specific interactions must stay out of the native shell command set. A chat client, adventure client, or other interactive application should run as an ordinary shell-spawned application or resident service session, not as a builtin such as chat or play adventure.

The near-term target is a prototype bridge, not the final app protocol: capos-shell launches clients with spawn or run, grants them explicit endpoint clients such as stdio: client @stdio, and services StdIO while waiting. That proves exact grants, process handles, child completion, and the terminal bridge without giving a child the shell’s move-only TerminalSession. Legacy badge N syntax is retired from normal client @... grants; delegated client endpoints preserve their service identity by default, and service object capabilities replace badged chat/adventure identity. Explicit selector fixtures remain only in low-level and hostile-path tests.

That StdIO bridge is intentionally limited. It is acceptable for focused QEMU smokes and textual compatibility, but it is the wrong long-term semantic boundary for capOS-native applications. If an adventure client receives a line from StdIO and parses "go north", "take key", or "say hello" internally, capOS has only moved string command parsing out of the shell and into the app. That is still weaker than typed capability invocation.

Native interactive applications should expose a command surface:

path=["go"], args={direction:"north"}
path=["take"], args={item:"brass-key"}
path=["say"], args={text:"hello there"}
path=["chat","join"], args={channel:"#lobby"}

The user may still type familiar command <args> forms. The shell or terminal host parses them through generic command metadata, including nested subcommands, argument kinds, completions, and redaction rules. The app receives a structured invocation and converts it to typed service calls. The shell does not hardcode application verbs, and the application does not parse unstructured terminal text for normal operations.
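A Rust sketch of metadata-driven parsing, assuming a hypothetical CommandSpec that an app publishes as a command path plus named positional arguments (completions and redaction rules are omitted). The shell matches the longest declared path, binds the remaining words to argument names, and lets the last argument take the rest of the line, so "say hello there" becomes path=["say"], args={text:"hello there"} without the shell knowing any adventure or chat verbs.

```rust
/// Hypothetical command metadata published by an application.
pub struct CommandSpec {
    pub path: &'static [&'static str],
    pub args: &'static [&'static str],
}

#[derive(Debug, PartialEq)]
pub struct Invocation {
    pub path: Vec<String>,
    pub args: Vec<(String, String)>,
}

pub fn parse_command(specs: &[CommandSpec], line: &str) -> Result<Invocation, String> {
    let words: Vec<&str> = line.split_whitespace().collect();
    // Longest declared path wins, so `chat join` beats a bare `chat` if both exist.
    let spec = specs
        .iter()
        .filter(|s| s.path.len() <= words.len() && s.path.iter().zip(&words).all(|(a, b)| a == b))
        .max_by_key(|s| s.path.len())
        .ok_or_else(|| format!("unknown command: {}", line))?;
    let rest = &words[spec.path.len()..];
    if spec.args.is_empty() && !rest.is_empty() {
        return Err(format!("{} takes no arguments", spec.path.join(" ")));
    }
    if rest.len() < spec.args.len() {
        return Err(format!("{} expects {} argument(s)", spec.path.join(" "), spec.args.len()));
    }
    let mut args = Vec::new();
    for (i, name) in spec.args.iter().enumerate() {
        // The final argument absorbs the remaining words (free text).
        let value = if i + 1 == spec.args.len() { rest[i..].join(" ") } else { rest[i].to_string() };
        args.push((name.to_string(), value));
    }
    Ok(Invocation { path: spec.path.iter().map(|s| s.to_string()).collect(), args })
}
```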

StdIO remains an explicit text I/O capability for transcript output, simple programs, POSIX compatibility, and test harnesses. It should not be the primary command interface for native chat/adventure-style applications. The focused design is in Interactive Command Surfaces.

Remote Session CapSet Clients

Not every remote interaction should become a shell session. A regular host application – CLI, native GUI, Tauri backend, webapp gateway, or service client – should be able to authenticate to capOS, receive a broker-issued remote view of its session CapSet, and call the capabilities it was granted over Cap’n Proto RPC. That path is a programmatic peer of the native shell: both consume a session bundle from AuthorityBroker, but only the shell adds command parsing, terminal state, and child-process workflow.

The remote client must not receive the kernel’s local CapSet page, local cap-table indexes, endpoint selectors, result-cap indexes, or global session identifiers. It receives typed RPC object references backed by a capOS per-session worker. Chat, Paperclips, Adventure, command sessions, and future service APIs should therefore be callable by generated clients without routing through capos-shell. The owning design is Remote Session CapSet Clients. That proposal also covers bidirectional UI composition for web/Tauri/GUI sessions: services can propose task-specific panes or command surfaces through explicit UI caps, but cannot take arbitrary control of the host UI.

Terminal Host Separation

The shell should not be the terminal host forever. The component that owns a UART, web socket, GUI pane, line editing, history, paste handling, resize state, and render policy can be a separate terminal host process. The shell then runs against a terminal entity and can be reused unchanged from local console, GUI, web, and scripted hosts.

TerminalSession remains the foreground text-session authority, but it is an interface between terminal host and shell, not proof that the shell implements the terminal. Shell-spawned applications should normally receive command sessions or explicit StdIO adapters, not the shell’s move-only TerminalSession.

Remote text transports follow the same rule. The Telnet Shell Demo in Networking is a demo-only plaintext terminal host: it accepts a host-loopback QEMU-forwarded TCP connection and gives the shell a socket-backed TerminalSession. The kernel-side socket terminal silently consumes IAC option negotiation in its line discipline, so no userspace pre-handoff recv is required. The demo must not turn the shell login path into a raw ByteStream, raw TcpSocket, or StdIO substitute, because password entry, echo policy, cancellation, and shell launch authority are defined at the TerminalSession boundary.

The QEMU harness for that demo binds the host forward to 127.0.0.1:2323 only and runs caps to prove the child shell did not receive raw NetworkManager, ProcessSpawner, TCP, or unknown capability interfaces. The gateway itself remains a trusted demo bootstrap service until scoped listener and manifest-declared shell-launch grants exist; production remote CLI shell access waits for the SSH gateway layer.

The SSH path is specified separately in SSH Shell Gateway: it keeps the same TerminalSession and broker-issued shell-bundle boundary, while adding SSH host authentication, encrypted transport, public-key user authentication, channel policy, and remote-session audit. Its initial schema stubs name the terminal construction and authority surfaces as SshTerminalFactory, TcpListenAuthority, and RestrictedShellLauncher; they now have focused QEMU proofs for scoped listen authority, public-key session minting, restricted shell launch, and a bounded plain-TCP terminal-host handoff. A focused development-only host-key proof grants an explicitly labeled non-production SshHostKey cap in QEMU that performs bounded fixture exchange-hash signing. The full runnable OpenSSH gateway still waits on encrypted transport, SSH packet/channel handling, persistent production key-management-backed signing, and the final run-ssh-shell host harness.
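For readers unfamiliar with what "silently consumes IAC option negotiation" involves, here is a userspace Rust sketch of such a filter. It is not the capOS kernel line discipline; IacFilter is a hypothetical name. The byte values are the Telnet command codes from RFC 854: IAC (255) introduces a command, WILL/WONT/DO/DONT (251 to 254) take one option byte, SB (250) opens a subnegotiation closed by IAC SE (240), and IAC IAC encodes a literal 0xFF. The filter keeps state across reads and drops negotiation without replying.

```rust
const IAC: u8 = 255;
const SB: u8 = 250;
const SE: u8 = 240;
const WILL: u8 = 251;
const DONT: u8 = 254;

#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    Data,
    Iac,
    Opt,
    Sub,
    SubIac,
}

/// Strips Telnet command sequences and passes only data bytes through.
pub struct IacFilter {
    state: State,
}

impl IacFilter {
    pub fn new() -> Self {
        IacFilter { state: State::Data }
    }

    pub fn feed(&mut self, input: &[u8], out: &mut Vec<u8>) {
        for &b in input {
            self.state = match (self.state, b) {
                (State::Data, IAC) => State::Iac,
                (State::Data, _) => { out.push(b); State::Data }
                (State::Iac, IAC) => { out.push(IAC); State::Data } // escaped 0xFF
                (State::Iac, WILL..=DONT) => State::Opt,            // option byte follows
                (State::Iac, SB) => State::Sub,
                (State::Iac, _) => State::Data,                     // two-byte command, dropped
                (State::Opt, _) => State::Data,                     // option byte, dropped
                (State::Sub, IAC) => State::SubIac,
                (State::Sub, _) => State::Sub,
                (State::SubIac, SE) => State::Data,                 // end of subnegotiation
                (State::SubIac, _) => State::Sub,
            };
        }
    }
}
```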

Agent Mode

Model-driven interaction is defined in Language Models and Agent Runtime. This proposal does not describe a separate “agent shell” process. The native shell, running in “agent mode”, is the tool runner: it holds the session cap bundle, exposes caps to a LanguageModel service as typed ToolDescriptor values with per-tool permission modes (auto / consent / stepUp / forbidden), executes the model’s tool calls against its own caps, streams results back into the conversation, and keeps the user in the loop through consent prompts, streaming, and interrupt. There is no separate PlannerAgent or ActionPlan pipeline.

Long-lived OpenClaw-like hosted agents, swarms, background tasks, external channel ingress, agent-maintained memory/wiki stores, and MCP/A2A-style interoperability are intentionally separate from the shell surface; see capOS-Hosted Agent Swarms. The shell can launch, inspect, approve, or cancel hosted tasks, but it should not own the hosted-agent control plane.

Approval and Authentication

Elevation belongs in a trusted broker service that the shell can consult but cannot impersonate.

Conceptual interfaces:

interface ApprovalClient {
  request @0 (
    reason :Text,
    plan :ActionPlan,
    requestedCaps :List(CapRequest),
    durationMs :UInt64
  ) -> (grant :ApprovalGrant);
}

enum ApprovalState {
  pending @0;
  approved @1;
  denied @2;
  expired @3;
  escalated @4;
}

interface ApprovalGrant {
  state @0 () -> (state :ApprovalState, reason :Text);
  claim @1 () -> (caps :List(GrantedCap));
  cancel @2 () -> ();
}

interface AuthorityBroker {
  request @0 (
    session :UserSession,
    plan :ActionPlan,
    requestedCaps :List(CapRequest),
    durationMs :UInt64
  ) -> (grant :ApprovalGrant);
}

ActionPlan is the structured description of the work the request will perform. Free-form text it carries is for the approval UI; the broker decides authority from the typed step list, never from the summary string.

struct ActionPlan {
  # Brief, redactable, human-readable summary. Used by the approval UI;
  # not used as an authority input by the broker.
  summary @0 :Text;

  # Structured action steps. The broker decides whether each step is
  # representable for the bound session/profile; an unrepresentable step
  # fails the whole request.
  steps @1 :List(ActionStep);

  # True if any step modifies durable state, terminates a service,
  # releases storage, sends external traffic, or is otherwise hard to
  # reverse. Brokers may require step-up authentication and longer
  # review windows when this is set.
  destructive @2 :Bool;

  # Stable identifier the requester sets so it can correlate the resulting
  # grant or queue entry. Brokers must not interpret this as authority.
  requestId @3 :Data;
}

struct ActionStep {
  union {
    spawn :group {
      # Manifest entry name or trusted launcher alias. The broker
      # resolves the alias to a binary identity before grant.
      target @0 :Text;
      # Cap names the spawned process needs from the launcher's
      # advertised set. Each name maps to a concrete `CapRequest`
      # in the `requestedCaps` list submitted with the enclosing plan.
      capNames @1 :List(Text);
    }
    serviceControl :group {
      service @2 :Text;
      verb    @3 :ServiceVerb;
    }
    storageOpen :group {
      namespace @4 :Text;
      path      @5 :Text;
      mode      @6 :StorageMode;
    }
    # Free-form structured payload describing a step the broker
    # recognises by name. Lets new step kinds land without re-issuing
    # the schema; brokers refuse unknown `kind` values.
    custom :group {
      kind    @7 :Text;
      payload @8 :Data;
    }
  }
}

enum ServiceVerb {
  start   @0;
  stop    @1;
  restart @2;
  reload  @3;
}

enum StorageMode {
  read       @0;
  readWrite  @1;
  append     @2;
}
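The all-or-nothing rule for steps can be sketched on the broker side in Rust. Policy, validate_steps, and the binary identity string are hypothetical, storageOpen and the capNames mapping are omitted for brevity, and the enums only mirror the schema shape. The first step that the bound session/profile cannot represent rejects the whole plan, and an unknown custom kind is refused rather than passed through.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ServiceVerb { Start, Stop, Restart, Reload }

#[derive(Debug)]
pub enum ActionStep {
    Spawn { target: String },
    ServiceControl { service: String, verb: ServiceVerb },
    Custom { kind: String, payload: Vec<u8> },
}

/// Hypothetical policy view for one session/profile.
pub struct Policy {
    /// Trusted launcher alias -> resolved binary identity.
    pub launcher_aliases: BTreeMap<String, String>,
    pub controllable_services: BTreeSet<String>,
    pub known_custom_kinds: BTreeSet<String>,
}

/// Returns the resolved binary identities for spawn steps, or the first refusal.
pub fn validate_steps(policy: &Policy, steps: &[ActionStep]) -> Result<Vec<String>, String> {
    let mut binaries = Vec::new();
    for (i, step) in steps.iter().enumerate() {
        match step {
            ActionStep::Spawn { target } => {
                let id = policy
                    .launcher_aliases
                    .get(target)
                    .ok_or_else(|| format!("step {}: unknown launch target {:?}", i, target))?;
                binaries.push(id.clone());
            }
            ActionStep::ServiceControl { service, verb } => {
                if !policy.controllable_services.contains(service) {
                    return Err(format!("step {}: {:?} on {:?} not permitted", i, verb, service));
                }
            }
            ActionStep::Custom { kind, .. } => {
                if !policy.known_custom_kinds.contains(kind) {
                    return Err(format!("step {}: unknown custom kind {:?}", i, kind));
                }
            }
        }
    }
    Ok(binaries)
}
```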

CapRequest describes a single capability the plan needs. The broker matches each request against the principal’s role bundle and ABAC context; the response either narrows the request and mints the cap, or denies. There is no widening path.

struct CapRequest {
  # Capability interface name advertised by the broker
  # (`ServiceSupervisor`, `Directory`, `TcpProvider`, ...). The broker
  # refuses unknown interfaces.
  interface @0 :Text;

  # Identifier of the target object inside that interface. For
  # `ServiceSupervisor` this is the service name; for `Directory` it
  # is the namespace path; for `TcpProvider` it is an address-policy
  # selector. The broker validates the target against policy.
  target @1 :Text;

  # Per-cap maximum duration. The grant returns the lesser of this and
  # the plan-level `durationMs` after policy narrowing. Zero means
  # "use plan-level default".
  maxDurationMs @2 :UInt64;

  # Optional attenuation hints (subdirectory, method allow-list,
  # address filter). The broker may further narrow these but must
  # never widen them.
  attenuation @3 :Data;
}
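The narrowing rules in the CapRequest comments can be written down directly. This Rust sketch assumes a broker-side policy ceiling and a decoded method allow-list as the attenuation hint; both are hypothetical, since attenuation is opaque Data in the schema. Each input can only shrink the result, which is the "no widening path" property.

```rust
use std::collections::BTreeSet;

/// Effective lease: per-cap maximum (0 = use the plan default), capped by the
/// plan-level duration, then by the policy ceiling.
pub fn effective_duration_ms(cap_max_ms: u64, plan_ms: u64, policy_ceiling_ms: u64) -> u64 {
    let requested = if cap_max_ms == 0 { plan_ms } else { cap_max_ms.min(plan_ms) };
    requested.min(policy_ceiling_ms)
}

/// Method allow-list narrowing: the granted set is the intersection of what
/// was requested and what policy allows, so it never exceeds either.
pub fn narrow_methods<'a>(
    requested: &BTreeSet<&'a str>,
    allowed: &BTreeSet<&'a str>,
) -> BTreeSet<&'a str> {
    requested.intersection(allowed).copied().collect()
}
```

For the 60-second restart request in the elevation flow, effective_duration_ms(0, 60_000, ceiling) yields 60 000 ms whenever the ceiling is at least that long.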

GrantedCap is the same transport-level result-cap concept used by ProcessSpawner: a typed reference to an attenuated, leased capability the broker has minted. It is not a separate authority encoding; invoking the granted cap is the only way to exercise the granted authority.

The native shell holds only a session-bound ApprovalClient. It does not submit arbitrary PrincipalInfo, role, UID, label values, or authentication proofs as authority. The ApprovalClient forwards the bound UserSession and typed request to AuthorityBroker. The broker or a consent service wrapping it holds powerful caps, drives any trusted consent or step-up authentication path, and mints attenuated temporary caps after policy and authentication checks.

The conceptual API intentionally has no authProof argument on the shell-visible path. If a proof is needed, it is collected by SessionManager, the broker, or a trusted approval UI and reflected back to the shell only as pending, approved, denied, expired, or escalated.
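A Rust sketch of the shell-visible ApprovalGrant lifecycle. The transition table is an assumption of this sketch, not taken from the schema: pending and escalated are the only non-final states, and approved, denied, and expired are terminal. It also assumes claim hands the minted caps out once. The shell only ever observes states; it never supplies a proof.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ApprovalState { Pending, Approved, Denied, Expired, Escalated }

/// Assumed transition table for this sketch.
pub fn can_transition(from: ApprovalState, to: ApprovalState) -> bool {
    use ApprovalState::*;
    matches!(
        (from, to),
        (Pending, Approved | Denied | Expired | Escalated) | (Escalated, Approved | Denied | Expired)
    )
}

/// Hypothetical grant object: caps are attached only on approval.
pub struct ApprovalGrant {
    state: ApprovalState,
    minted: Option<Vec<String>>,
}

impl ApprovalGrant {
    pub fn pending() -> Self {
        ApprovalGrant { state: ApprovalState::Pending, minted: None }
    }

    pub fn state(&self) -> ApprovalState {
        self.state
    }

    /// Broker-side transition; `caps` is ignored unless the new state is Approved.
    pub fn transition(&mut self, to: ApprovalState, caps: Vec<String>) -> Result<(), String> {
        if !can_transition(self.state, to) {
            return Err(format!("{:?} -> {:?} not allowed", self.state, to));
        }
        self.state = to;
        if to == ApprovalState::Approved {
            self.minted = Some(caps);
        }
        Ok(())
    }

    /// Shell-side: caps come out only from an approved grant, and only once.
    pub fn claim(&mut self) -> Result<Vec<String>, String> {
        if self.state != ApprovalState::Approved {
            return Err(format!("grant is {:?}", self.state));
        }
        self.minted.take().ok_or_else(|| "already claimed".to_string())
    }
}
```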

Approval Inbox

Synchronous approval is not always available. Step-up authentication, a dual-control destructive action, or a deferred review (for example, a service restart held for a change window) all need a durable queue: the request must be listable later, persist across reconnects, and be triageable in batch.

The broker exposes that queue through an ApprovalInbox cap minted into the session bundle of whoever may approve. The inbox is not a shell cap; the native shell uses ApprovalClient to submit requests, and a separate principal (a security operator, the same operator under step-up, or a multi-party reviewer set) holds the inbox cap that decides them. Remote workspaces (the CapSet UI) treat ApprovalInbox as the canonical pending-actions surface, which lets a browser session show “you have pending approvals” without granting the browser any of the requested authority.

interface ApprovalInbox {
  # List entries currently awaiting decision. Bounded; the broker
  # enforces a per-inbox visible-window cap and may return fewer than
  # `limit` rows. `truncated` distinguishes "broker capped this page"
  # from "no further rows".
  list @0 (
    cursor :Data,
    limit  :UInt32
  ) -> (
    entries    :List(ApprovalEntry),
    nextCursor :Data,
    truncated  :Bool
  );

  # Look up a specific entry by id. Useful when a UI deep-links to
  # an entry past the listed window.
  entry @1 (entryId :Data) -> (entry :ApprovalEntry);

  # Approve, deny, or escalate a single entry. `approve` returns the
  # `ApprovalGrant` minted by the broker; `deny` and `escalate`
  # transition the entry without minting caps. The decider's reason
  # text is bounded and recorded in audit.
  decide @2 (
    entryId  :Data,
    decision :ApprovalDecision,
    reason   :Text
  ) -> (grant :ApprovalGrant);

  # Bulk-decide entries that share shape (same requester principal,
  # same plan summary fingerprint, same destructive flag). The broker
  # rejects mixed shapes with an explicit diagnostic instead of
  # silently approving heterogeneous requests.
  batchDecide @3 (
    entryIds :List(Data),
    decision :ApprovalDecision,
    reason   :Text
  ) -> (grants :List(ApprovalGrant));

  # Subscribe to inbox change events. The listener cap is held by
  # the broker; logging out of the inbox session revokes the
  # subscription.
  watch @4 (listener :ApprovalListener) -> ();
}

enum ApprovalDecision {
  approve  @0;
  deny     @1;
  escalate @2;
}

struct ApprovalEntry {
  # Broker-minted opaque id, stable across reconnects.
  entryId       @0 :Data;
  # Opaque audit-only principal id of the requester.
  requesterId   @1 :Data;
  # Display name; not authoritative.
  requesterName @2 :Text;
  plan          @3 :ActionPlan;
  requestedCaps @4 :List(CapRequest);
  durationMs    @5 :UInt64;
  state         @6 :ApprovalState;
  # Last decider reason or denial detail; bounded.
  reason        @7 :Text;
  createdAtMs   @8 :UInt64;
  expiresAtMs   @9 :UInt64;
  escalation    @10 :EscalationInfo;
}

struct EscalationInfo {
  # Number of additional reviewers the broker has notified. Zero when
  # the entry has not been escalated.
  reviewerCount @0 :UInt32;
  # Role names of the additional reviewers; never principal ids.
  reviewerHints @1 :List(Text);
}

interface ApprovalListener {
  appended  @0 (entry :ApprovalEntry) -> ();
  decided   @1 (entryId :Data, state :ApprovalState) -> ();
  expired   @2 (entryId :Data) -> ();
}

The ApprovalClient itself does not change shape: a request that the broker cannot decide synchronously still returns an ApprovalGrant immediately, with state == pending and a stable handle. The broker adds an entry to the corresponding inbox; the requester polls or watches its grant; the inbox holder drives the decision. When the inbox holder calls decide(approve), the existing grant transitions to approved and claim returns the minted caps – the requester does not learn an entry id, and the inbox does not learn the requester’s ApprovalGrant cap. The two surfaces meet only at the broker.

Inbox entries are durable across reconnects because entryId is broker-minted and the inbox cap is session-bound rather than transport-bound. Closing a transport does not delete entries; re-presenting the same session-scoped inbox cap rebinds the listener without losing pending state. Entries expire on the broker timer at expiresAtMs and produce an expired listener event; expired entries remain visible to entry() for a bounded audit window defined by broker policy, after which they move to the audit log only.
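The batchDecide shape rule can be sketched in Rust as a pre-check the broker runs before deciding anything. Entry, check_batch, and the u64 ids and fingerprint are hypothetical simplifications (the schema uses Data ids, and how the plan-summary fingerprint is computed is not specified here). A mixed batch is rejected with a diagnostic naming the first mismatching entry instead of being partially approved.

```rust
/// Hypothetical subset of ApprovalEntry that defines batch "shape".
#[derive(Debug)]
pub struct Entry {
    pub entry_id: u64,
    pub requester_id: u64,
    pub summary_fingerprint: u64,
    pub destructive: bool,
}

/// Accept the batch only if every entry matches the first on requester,
/// plan-summary fingerprint, and destructive flag. Returns the ids to decide.
pub fn check_batch(entries: &[Entry]) -> Result<Vec<u64>, String> {
    let first = entries.first().ok_or("empty batch")?;
    for e in &entries[1..] {
        if e.requester_id != first.requester_id {
            return Err(format!("entry {} has a different requester", e.entry_id));
        }
        if e.summary_fingerprint != first.summary_fingerprint {
            return Err(format!("entry {} has a different plan summary", e.entry_id));
        }
        if e.destructive != first.destructive {
            return Err(format!("entry {} differs in destructive flag", e.entry_id));
        }
    }
    Ok(entries.iter().map(|e| e.entry_id).collect())
}
```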

Elevation Flow

User request (typed directly, or produced by agent-mode tool-use as an ActionPlan before invoking the broker):

restart the network stack

Requested action presented to the broker:

- stop service "net-stack"
- spawn "net-stack"
- grant: nic, timer, log
- wait for health check

Missing authority:
- ServiceSupervisor(net-stack)

Requested duration:
- 60 seconds

Broker decision:

Which UserSession and profile is this request bound to?
Is that principal/profile allowed to restart net-stack?
Is the requested binary allowed?
Are the requested grants narrower than policy permits?
Do mandatory confidentiality and integrity constraints allow the grant?
Is there fresh user presence?
Does this require step-up authentication?

If approved, the broker returns a narrow leased capability:

supervisor: ServiceSupervisor(service="net-stack", expires=60s)

It should not return broad ProcessSpawner, BootPackage, or DeviceManager authority when a scoped supervisor cap can do the job.
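The broker checklist above can be sketched as a single Rust decision function. Facts, Decision, and decide_restart are hypothetical, and collapsing each policy question into a boolean is a simplification. The sketch shows three properties: any failed policy check denies; missing presence or step-up yields a pending authentication path rather than a grant; and the only grant shape is a scoped, leased ServiceSupervisor, never ProcessSpawner or DeviceManager.

```rust
/// Hypothetical answers the broker has already established for one request.
#[derive(Clone, Copy)]
pub struct Facts {
    pub profile_allows_service: bool,
    pub binary_allowed: bool,
    pub grants_within_policy: bool,
    pub mandatory_policy_ok: bool,
    pub fresh_presence: bool,
    pub step_up_required: bool,
    pub step_up_satisfied: bool,
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    Deny(&'static str),
    NeedsAuth(&'static str),
    Grant { interface: &'static str, service: String, lease_ms: u64 },
}

pub fn decide_restart(service: &str, f: &Facts, requested_ms: u64, ceiling_ms: u64) -> Decision {
    if !f.profile_allows_service { return Decision::Deny("profile may not control this service"); }
    if !f.binary_allowed { return Decision::Deny("binary not allowed"); }
    if !f.grants_within_policy { return Decision::Deny("requested grants exceed policy"); }
    if !f.mandatory_policy_ok { return Decision::Deny("mandatory policy forbids grant"); }
    if !f.fresh_presence { return Decision::NeedsAuth("fresh user presence"); }
    if f.step_up_required && !f.step_up_satisfied { return Decision::NeedsAuth("step-up authentication"); }
    // Narrowest cap that does the job, leased for no longer than policy allows.
    Decision::Grant {
        interface: "ServiceSupervisor",
        service: service.to_string(),
        lease_ms: requested_ms.min(ceiling_ms),
    }
}
```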

Authentication

Authentication proof should be consumed by the SessionManager or broker boundary, not exposed as a secret to the shell. Suitable mechanisms include:

password or PIN for medium-risk local actions.
hardware key or WebAuthn-style challenge for administrative actions.
TPM-backed local presence for device or boot-policy operations.
OIDC step-up: broker requests a fresh ID token from the session’s IdP with prompt=login, max_age, or stronger acr_values before returning a leased cap. The IdP and SessionManager drive the user interaction; the shell sees only pending → approved/denied.
multi-party approval for destructive policy, storage, or recovery actions.

The shell should never receive raw tokens (including OAuth access or refresh tokens), private keys, recovery codes, or full environment dumps. When the broker must delegate outbound authority to a session (for example, “read from this company’s HR API”), it returns a wrapper capability that holds the AccessToken internally; the shell invokes the wrapper without ever seeing the bearer string.
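A Rust sketch of the wrapper idea. In capOS the wrapper would be a separate service reached through a capability; here Rust field privacy only stands in for that process boundary. HrApiClient, Transport, and the host name are hypothetical. The caller supplies a path, the wrapper attaches the bearer token itself, and Debug output redacts the token so it cannot leak into traces.

```rust
use std::fmt;

/// Stand-in for the wrapper's own outbound network capability.
pub type Transport = fn(&str) -> String;

pub struct HrApiClient {
    host: String,
    access_token: String, // never exposed through any method
    transport: Transport,
}

impl HrApiClient {
    /// Constructed broker-side; the shell only ever holds the finished wrapper.
    pub fn new(host: &str, access_token: &str, transport: Transport) -> Self {
        HrApiClient { host: host.to_string(), access_token: access_token.to_string(), transport }
    }

    /// Caller supplies only a path and receives only the response body.
    pub fn get(&self, path: &str) -> String {
        let request = format!(
            "GET {} HTTP/1.1\r\nHost: {}\r\nAuthorization: Bearer {}\r\n\r\n",
            path, self.host, self.access_token
        );
        (self.transport)(&request)
    }
}

impl fmt::Debug for HrApiClient {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("HrApiClient")
            .field("host", &self.host)
            .field("access_token", &"<redacted>")
            .finish()
    }
}
```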

Shell Hardening

The shell must treat files, logs, web pages, service output, model output, and CQE payloads as untrusted data. They are not instructions.

Required behavior:

show an executable typed plan before authority-changing actions.
keep elevated caps leased, narrow, and short-lived.
release temporary caps after the plan finishes or fails.
audit every approval request, grant, cap transfer, and release.
require exact targets for destructive actions.
refuse broad phrases such as “give it everything” unless a trusted policy explicitly allows a named emergency mode.
keep any model-derived context separate from secrets and authentication proofs; see the LLM/agent-runtime proposal for the model-service side.

The enforcement rule is simple: users and models may propose, explain, and request. Capabilities decide what can happen.
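One way to make "release temporary caps after the plan finishes or fails" hard to forget is a drop guard. This Rust sketch assumes a hypothetical Lease type and uses a shared vector as a stand-in for AuditLog. The release runs on normal completion and on an early error return; it also runs on panic, but only when unwinding is enabled rather than panic=abort.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Stand-in for the AuditLog cap.
pub type Audit = Rc<RefCell<Vec<String>>>;

/// A leased elevated cap; dropping the guard releases it and audits the release.
pub struct Lease {
    name: String,
    audit: Audit,
}

impl Lease {
    pub fn acquire(name: &str, audit: &Audit) -> Lease {
        audit.borrow_mut().push(format!("grant {}", name));
        Lease { name: name.to_string(), audit: Rc::clone(audit) }
    }

    pub fn name(&self) -> &str {
        &self.name
    }
}

impl Drop for Lease {
    fn drop(&mut self) {
        self.audit.borrow_mut().push(format!("release {}", self.name));
    }
}

/// Hypothetical plan step: the early `return Err` still drops the lease.
pub fn restart_with_lease(audit: &Audit, fail: bool) -> Result<(), String> {
    let lease = Lease::acquire("supervisor:net-stack", audit);
    audit.borrow_mut().push(format!("call {}.restart", lease.name()));
    if fail {
        return Err("health check failed".to_string());
    }
    Ok(())
}
```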

POSIX Shell

The POSIX shell is a compatibility layer for existing software and scripts. It should be useful, but it should not define native capOS administration.

The C-ABI substrate for porting POSIX programs (including a POSIX shell) is specified separately in POSIX Adapter. libcapos exposes the capability ring, CapSet, raw syscalls, and heap to C; libcapos-posix layers the POSIX shape (fd table, errno, pipe / read / write / dup / dup2, fork / execve / waitpid / _exit, posix_spawn and the file-action shims, clock_gettime, UDP socket calls, console-backed stdio) on top. Phases P1.1, P1.2, and P1.3 of that proposal have landed; the C-substrate, pipe cap, recording-shim fork-for-exec, direct posix_spawn path, and Console-backed stdio are proven by QEMU smokes (make run-c-hello, make run-posix-dns-smoke, make run-posix-pipe-smoke, make run-posix-stdio-smoke). The POSIX shell port itself depends on Namespace and File caps, which are tracked in that proposal as gating work after the current phases close.

Mapping

POSIX concepts map onto granted capabilities:

POSIX concept	capOS backing
`/`	synthetic root built from granted `Directory` or `FileServer` caps
cwd	current scoped `Directory` cap
fd	local handle to `File`, `ByteStream`, pipe, terminal, or socket cap
pipe	`ByteStream` pair or userspace pipe service
`PATH`	search inside the synthetic root or a command registry cap
`exec`	`ProcessSpawner` or restricted launcher cap
sockets	socket factory caps such as `TcpProvider` or `HttpEndpoint`
`uid`, `gid`, user, group	synthetic POSIX profile derived from session metadata
`$HOME`	path alias backed by a granted `home` directory or namespace cap
`/etc/passwd`, `/etc/group`	profile service view, scoped to the compatibility environment
env vars	data only; never authority by themselves
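The fd row can be illustrated with a minimal Rust sketch of fd-table emulation. In this sketch an fd is only an index into a per-process table of capability handles, so the number itself carries no authority. `CapHandle`, `FdTable`, and `Errno` are hypothetical stand-ins, not the libcapos-posix types. Real handles would come from the process CapSet.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a capability handle held by the process.
#[derive(Clone, Debug, PartialEq)]
pub enum CapHandle {
    File(u32),
    ByteStream(u32),
    Terminal(u32),
}

#[derive(Debug, PartialEq)]
pub enum Errno {
    Ebadf,
    Emfile,
}

/// Per-process fd table: small integers mapped to capability handles.
pub struct FdTable {
    slots: BTreeMap<i32, CapHandle>,
    max_fds: i32,
}

impl FdTable {
    pub fn new(max_fds: i32) -> Self {
        FdTable { slots: BTreeMap::new(), max_fds }
    }

    /// Install a cap at the lowest free fd, as POSIX open() does.
    pub fn install(&mut self, cap: CapHandle) -> Result<i32, Errno> {
        let fd = (0..self.max_fds)
            .find(|fd| !self.slots.contains_key(fd))
            .ok_or(Errno::Emfile)?;
        self.slots.insert(fd, cap);
        Ok(fd)
    }

    pub fn get(&self, fd: i32) -> Result<&CapHandle, Errno> {
        self.slots.get(&fd).ok_or(Errno::Ebadf)
    }

    /// dup2: make `new_fd` refer to the same cap as `old_fd`.
    pub fn dup2(&mut self, old_fd: i32, new_fd: i32) -> Result<i32, Errno> {
        let cap = self.get(old_fd)?.clone();
        if new_fd < 0 || new_fd >= self.max_fds {
            return Err(Errno::Ebadf);
        }
        // Any handle previously stored at new_fd is dropped (closed) here.
        self.slots.insert(new_fd, cap);
        Ok(new_fd)
    }

    pub fn close(&mut self, fd: i32) -> Result<(), Errno> {
        self.slots.remove(&fd).map(|_| ()).ok_or(Errno::Ebadf)
    }
}
```

Redirecting stdin with `dup2(pipe_fd, 0)` then only swaps which already-granted cap sits in slot 0. No new authority appears.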

If a POSIX process has no network cap, connect() fails. If it has no directory mounted at /etc, opening /etc/resolv.conf fails. If it has no device cap, /dev is empty or synthetic.
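Path resolution against the synthetic root can be sketched in Rust as well. A mount table maps path prefixes to granted Directory caps. A path that falls under no mount fails with ENOENT because nothing ambient sits behind it. `SyntheticRoot`, `DirCap`, and the mount-table shape are assumptions for illustration. Relative paths (which would resolve through the cwd Directory cap) are rejected here rather than handled, and so is any `..` component.

```rust
/// Hypothetical handle for a granted Directory cap.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct DirCap(pub u32);

#[derive(Debug, PartialEq)]
pub enum Errno {
    Enoent,
    Einval,
}

pub struct SyntheticRoot {
    // (mount point, cap) pairs, e.g. ("/home/alice", DirCap(1)).
    mounts: Vec<(String, DirCap)>,
}

impl SyntheticRoot {
    pub fn new() -> Self {
        SyntheticRoot { mounts: Vec::new() }
    }

    pub fn mount(&mut self, at: &str, cap: DirCap) {
        self.mounts.push((at.trim_end_matches('/').to_string(), cap));
    }

    /// Resolve an absolute path to (directory cap, path relative to that cap),
    /// choosing the longest mount prefix that ends on a component boundary.
    pub fn resolve<'a>(&self, path: &'a str) -> Result<(DirCap, &'a str), Errno> {
        // Relative paths would go through the cwd cap; not modeled here.
        if !path.starts_with('/') || path.split('/').any(|c| c == "..") {
            return Err(Errno::Einval);
        }
        self.mounts
            .iter()
            .filter_map(|(at, cap)| {
                let rest = path.strip_prefix(at.as_str())?;
                // "/tmpfoo" must not match a mount at "/tmp".
                if rest.is_empty() || rest.starts_with('/') {
                    Some((at.len(), *cap, rest.trim_start_matches('/')))
                } else {
                    None
                }
            })
            .max_by_key(|(len, _, _)| *len)
            .map(|(_, cap, rest)| (cap, rest))
            .ok_or(Errno::Enoent)
    }
}
```

With only `/home/alice` and `/tmp` mounted, opening `/etc/resolv.conf` fails at resolution time. No server is ever consulted.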

A POSIX shell is launched with both a CapSet and compatibility profile metadata. The profile controls what legacy APIs report. The CapSet controls what the process can actually do.

Compatibility Limits

Exact Unix semantics should not be promised early.

Prefer posix_spawn over full fork for the first implementation.
fork with arbitrary shared process state can be emulated later if needed.
setuid cannot grant caps. At most it asks a compatibility broker to replace the POSIX profile or launch a new process with a different broker-issued cap bundle.
Mode bits and ownership metadata do not create authority.
chmod can modify filesystem metadata exposed by a filesystem service, but it cannot grant caps outside that service’s policy.
/proc is a debugging service view, not kernel ambient introspection.
Device files exist only when a capability-backed adapter deliberately exposes them.
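A small Rust sketch can show the profile/CapSet split and why setuid cannot grant caps. `PosixProfile`, `CompatBroker`, `BrokerDecision`, and `PosixProcess` are hypothetical names. The sketch models only the profile-replacement path. It does not model the alternative of launching a new process with a different broker-issued bundle.

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct PosixProfile {
    pub uid: u32,
    pub gid: u32,
    pub user: String,
}

/// Names of caps the process actually holds (illustrative only).
#[derive(Clone, Debug, PartialEq)]
pub struct CapSet(pub Vec<&'static str>);

pub enum BrokerDecision {
    ReplaceProfile(PosixProfile),
    Deny,
}

/// Hypothetical compatibility-broker interface.
pub trait CompatBroker {
    fn request_setuid(&self, current: &PosixProfile, uid: u32) -> BrokerDecision;
}

#[derive(Debug, PartialEq)]
pub enum Errno {
    Eperm,
}

pub struct PosixProcess {
    pub profile: PosixProfile, // what legacy APIs report
    pub caps: CapSet,          // what the process can actually do
}

impl PosixProcess {
    pub fn setuid(&mut self, uid: u32, broker: &dyn CompatBroker) -> Result<(), Errno> {
        match broker.request_setuid(&self.profile, uid) {
            // Only the reported identity changes; self.caps is never touched.
            BrokerDecision::ReplaceProfile(p) => {
                self.profile = p;
                Ok(())
            }
            BrokerDecision::Deny => Err(Errno::Eperm),
        }
    }
}
```

Even an approved setuid leaves `caps` unchanged. That is the point: getuid() may now report a different number, but every open() and connect() still goes through the same granted caps.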

This subset should be enough for many build tools and CLI programs, and it does not make POSIX the security model.

POSIX Session Caps

A normal POSIX shell session might receive:

terminal      TerminalSession
session       UserSession metadata
profile       POSIX profile view
root          Directory or FileServer synthetic root
launcher      restricted ProcessSpawner/command launcher
pipeFactory   ByteStream factory
clock         Timer

Optional caps:

tcp           scoped socket provider
home          writable user Directory
tmp           temporary Directory
proc          read-only process inspection tree

Administrative caps still require broker-mediated approval.
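A launcher could check a POSIX session bundle against these two lists before starting the shell. The sketch below makes two assumptions, both hypothetical. First, caps are identified by the names shown above. Second, any name outside the required and optional sets is rejected, so administrative caps cannot ride along in a normal session. The real launcher policy is not specified in this proposal.

```rust
use std::collections::BTreeSet;

const REQUIRED: [&str; 7] = [
    "terminal", "session", "profile", "root", "launcher", "pipeFactory", "clock",
];
const OPTIONAL: [&str; 4] = ["tcp", "home", "tmp", "proc"];

#[derive(Debug, PartialEq)]
pub enum BundleError {
    Missing(&'static str),
    NotInProfile(String),
}

/// Accept a bundle only if it has every required cap and nothing outside
/// the normal POSIX session profile.
pub fn check_posix_bundle(names: &[&str]) -> Result<(), BundleError> {
    let given: BTreeSet<&str> = names.iter().copied().collect();
    for req in REQUIRED {
        if !given.contains(req) {
            return Err(BundleError::Missing(req));
        }
    }
    for name in &given {
        if !REQUIRED.contains(name) && !OPTIONAL.contains(name) {
            // e.g. a ServiceSupervisor cap must come through the broker instead.
            return Err(BundleError::NotInProfile(name.to_string()));
        }
    }
    Ok(())
}
```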

Recovery Shell

A recovery shell is a separate policy profile, not the normal interactive shell with hidden extra privileges. It may receive a larger cap set, but only after strong local authentication and with full audit logging. Guest and anonymous profiles must not fall into recovery authority by omission.

Possible recovery bundle:

console
boot package read
system status read
service supervisor for critical services
read-only storage inspection
scoped repair caps
approval client

Destructive recovery operations should still go through exact-target approval. The recovery shell should be local-only unless a separate remote recovery policy explicitly grants network access.
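Exact-target approval means one approval authorizes one operation on one named target, with no prefix or wildcard matching. The Rust sketch below uses hypothetical `Target`, `Approval`, and `authorize_destructive` names. A real broker would also bind the approval to the session, lease it, and record it in the audit log.

```rust
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Target {
    pub service: String,
    pub object: String,
}

#[derive(Debug, PartialEq)]
pub struct Approval {
    pub operation: &'static str,
    pub target: Target,
}

#[derive(Debug, PartialEq)]
pub enum RecoveryError {
    NotApproved,
}

/// Allow a destructive operation only if an approval names exactly this
/// operation and exactly this target.
pub fn authorize_destructive(
    op: &'static str,
    target: &Target,
    approval: Option<&Approval>,
) -> Result<(), RecoveryError> {
    match approval {
        Some(a) if a.operation == op && &a.target == target => Ok(()),
        _ => Err(RecoveryError::NotApproved),
    }
}
```

An approval to wipe `volume/data0` therefore says nothing about `volume/data1`, or about formatting `volume/data0`.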

Required Interfaces

This proposal implies several service interfaces beyond the current smoke-test surface:

UserSession / SessionManager: principal/session metadata, audit context, and guest or anonymous profile creation (user identity proposal).
TerminalSession: session-scoped interactive terminal I/O. The first boundary is line-oriented write, writeLine, and bounded readLine with per-call echo control and submitted/cancelled/closed outcomes; resize and paste framing can layer on later.
StdIO: explicit text I/O capability serviced by the shell, a test harness, a web gateway, or another UI adapter. It has named stdout, stderr, and status streams plus line, block, and hidden read modes; it does not imply inherited POSIX file descriptors and should not be the semantic command interface for native interactive applications.
CommandSession: generic interactive command surface for native applications. It describes command paths, nested subcommands, argument shapes, completions, prompts, redaction metadata, render events, and typed invocation results.
TerminalHost / terminal entity: process and session object owning raw terminal transport, line discipline, presentation state, history, resize, and GUI/web framing while granting a foreground session to the shell.
SchemaRegistry: maps interface IDs to method names and parameter schemas.
CommandRegistry: optional registry of native command capabilities.
SystemStatus: read-only process and service status.
LogReader: scoped log access.
ServiceSupervisor: restart/status authority for one service or subtree.
AuthorityBroker / ApprovalClient: session-bound base bundles, plan-specific leased grants, and policy/authentication mediation.
CredentialStore, ConsoleLogin, and WebShellGateway: boot-to-shell authentication services for password-verifier setup, passkey registration, federated OIDC login, and text terminal launch (boot-to-shell proposal).
OAuthClient, OidcIdentityProvider, TokenVerifier, WorkloadIdentityFederation: OAuth2/OIDC primitives for federated login, outbound service authentication, and inbound resource-server token validation (OIDC and OAuth2 proposal).
SshGateway, SshHostKey, AuthorizedKeyStore, SshTerminalFactory, TcpListenAuthority, and RestrictedShellLauncher: production remote CLI terminal ingress, SSH host-key proof, public-key login mapping, scoped TCP listen authority, shell-only launch authority, and SSH-backed TerminalSession launch. The current development host-key proof exposes non-production public metadata and performs bounded fixture signing in QEMU; production host keys still require persistent key management (SSH shell proposal).
AuditLog: append-only record of plans, approvals, grants, and releases.
POSIXProfile / compatibility broker: synthetic UID/GID, names, $HOME, cwd, and profile replacement without treating POSIX metadata as authority.
ByteStream / pipe factory: explicit byte-stream composition for POSIX and selected native pipelines.

These should be ordinary capabilities. A shell only sees the subset it has been granted.
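The first TerminalSession boundary can be sketched as a Rust trait with a three-way readLine outcome. The sketch also includes a caller that prompts for a secret with echo disabled. `TerminalSession`, `ReadLine`, `prompt_secret`, and `ScriptedTerminal` are illustrative names, not the Cap'n Proto schema. The scripted terminal exists only to exercise the caller.

```rust
/// Outcome of one bounded readLine call (submitted / cancelled / closed).
#[derive(Debug, PartialEq)]
pub enum ReadLine {
    Submitted(String),
    Cancelled,
    Closed,
}

/// Hypothetical Rust view of the first TerminalSession boundary.
pub trait TerminalSession {
    fn write(&mut self, text: &str);
    fn write_line(&mut self, text: &str);
    /// Read one line of at most `max_len` bytes; `echo` is chosen per call.
    fn read_line(&mut self, max_len: usize, echo: bool) -> ReadLine;
}

/// Prompt for a secret with echo off; only a submitted line yields a value.
pub fn prompt_secret(term: &mut dyn TerminalSession, prompt: &str) -> Option<String> {
    term.write(prompt);
    match term.read_line(256, false) {
        ReadLine::Submitted(line) => Some(line),
        ReadLine::Cancelled => {
            term.write_line("cancelled");
            None
        }
        // The terminal went away; there is nobody left to tell.
        ReadLine::Closed => None,
    }
}

/// In-memory terminal that replays scripted outcomes and records what it saw.
pub struct ScriptedTerminal {
    pub script: Vec<ReadLine>,
    pub output: String,
    pub echo_flags: Vec<bool>,
}

impl TerminalSession for ScriptedTerminal {
    fn write(&mut self, text: &str) {
        self.output.push_str(text);
    }
    fn write_line(&mut self, text: &str) {
        self.output.push_str(text);
        self.output.push('\n');
    }
    fn read_line(&mut self, _max_len: usize, echo: bool) -> ReadLine {
        // A real terminal service would enforce `max_len`; the script is trusted here.
        self.echo_flags.push(echo);
        if self.script.is_empty() {
            ReadLine::Closed
        } else {
            self.script.remove(0)
        }
    }
}
```

Keeping echo as a per-call argument, rather than terminal-wide mode state, means a password prompt cannot leave echo disabled for the next caller.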

Implementation Plan

Native serial shell
- Built on capos-rt.
- Lists initial CapSet entries.
- Invokes typed methods on the capabilities it was actually granted, including TerminalSession for ordinary interactive sessions.
- When launched with a restricted launcher or other scoped spawn authority, spawns and waits on exact-grant children without assuming broad BootPackage or ProcessSpawner access.
- Provides the caps, inspect, call, spawn, run, wait, release, and trace commands.
- Runs interactive applications as ordinary spawned commands or resident command sessions. StdIO requests may be serviced for text-stream programs, but native app commands should flow through structured command surfaces.
Session-aware shell profile
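- Keep the split between SessionManager -> UserSession metadata and AuthorityBroker(session, profile) -> cap bundle: the session manager produces metadata, and only the broker turns a session plus a profile into caps.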
- Add self/session introspection without making identity metadata authoritative.
- Start with guest, local-presence, and service-account profiles before durable account storage exists.
Structured native scripting
- Add typed variables, result-cap binding, and plan serialization.
- Add schema registry support for method names and argument validation.
- Add a generic command-surface parser so command <args> and nested subcommands compile to typed invocations without app-specific shell matches.
- Add explicit byte-stream adapters for commands that need text streams.
Approval broker
- Define ActionPlan, ActionStep, CapRequest, ApprovalClient, ApprovalInbox, ApprovalEntry, and leased grant records.
- Add local authentication and audit logging.
- Make administrative native-shell operations request scoped caps through the broker instead of running from a permanently privileged shell.
- Wire ApprovalInbox into the operator session bundle so deferred, stepped-up, and multi-party approvals have a durable triage surface instead of relying on synchronous return-from-request.
Boot-to-shell integration
- Add local console login/setup in front of the native shell.
- Require a configured password verifier when one exists.
- Enter setup mode when no console password verifier exists.
- Treat guest as an explicit local profile and anonymous as a separate remote/programmatic profile, not as missing-password fallbacks.
- Support passkey-only web terminal setup through local/bootstrap authority, not unauthenticated remote first use.
- The local console login/setup half of this step has landed; the full boot-to-shell flow (durable multi-verifier accounts, passkey paths, federated OIDC login, web text shell gateway, production SSH shell gateway) is tracked in Boot to Shell.
Agent mode (out of scope here)
- Defined in Language Models and Agent Runtime: no separate “agent shell” process. The native shell, running in “agent mode”, is the tool runner: it gains a LanguageModel client cap plus a per-tool permission table (auto / consent / stepUp / forbidden), exposes its own session caps as typed ToolDescriptor values to the model service, executes the model’s tool calls against those caps, streams results back into the conversation, and keeps the user in the loop through consent prompts and interrupts. There is no PlannerAgent or static ActionPlan pipeline.
POSIX shell
- Implement after Directory/File, ByteStream, and restricted process launch exist.
- Start with posix_spawn, fd table emulation, cwd, scoped root, pipes, and terminal I/O, plus synthetic POSIX profile metadata.
- Add broader compatibility only as real workloads demand it.
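Exact-grant spawning, as planned for the native serial shell, can be sketched as building the child's CapSet only from an explicit grant list. The `(child name, shell name)` grant pairs, `Cap`, and `child_capset` are hypothetical. They approximate what a `spawn ... with { ... }` clause might compile to, not the capos-rt API.

```rust
use std::collections::BTreeMap;

/// Hypothetical cap handle; a real shell would hold capos-rt capability refs.
#[derive(Clone, Debug, PartialEq)]
pub struct Cap {
    pub interface: &'static str,
    pub id: u32,
}

#[derive(Debug, PartialEq)]
pub enum SpawnError {
    NotHeld(String),
}

/// Build a child's CapSet from explicit grants: (name in child, name in shell).
/// Caps not named in `grants` are never copied into the child.
pub fn child_capset(
    shell: &BTreeMap<String, Cap>,
    grants: &[(&str, &str)],
) -> Result<BTreeMap<String, Cap>, SpawnError> {
    let mut child = BTreeMap::new();
    for (child_name, shell_name) in grants {
        // The shell cannot grant what it does not hold.
        let cap = shell
            .get(*shell_name)
            .ok_or_else(|| SpawnError::NotHeld(shell_name.to_string()))?;
        child.insert(child_name.to_string(), cap.clone());
    }
    Ok(child)
}
```

A child spawned with `terminal` and `home` therefore has no launcher cap, even though the shell holds one.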

Non-Goals

No global root namespace.
No shell-owned root/admin bit.
No model-visible secrets.
No default inheritance of all shell caps into children.
No authorization from PrincipalInfo, UID/GID, role, or label values alone.
No promise that POSIX scripts observe exact Unix behavior without a compatibility profile that grants the needed caps.

Open Questions

Should the native shell syntax be CUE-derived, Cap’n-Proto-literal-derived, or a smaller custom grammar?
How should schema reflection be packaged before a full runtime SchemaRegistry exists?
How should later TerminalSession extensions such as resize and paste framing fit without exposing raw transport authority to ordinary shells?
How should the broker fingerprint plans for ApprovalInbox.batchDecide shape-equivalence? A direct hash of ActionPlan.steps is enough for identical plans submitted by the same requester profile, but near-identical plans differing only in requestId or summary text must still batch; near-identical plans differing in step targets or attenuation must not. The broker design needs an explicit fingerprinting rule before batchDecide can be enabled.
How should audit logs be stored before persistent storage exists?
How should interactive terminal UX scale beyond the planned “one typed capability per command” native-shell surface? The current prototype only exposes narrow typed commands; the questions below apply to the proposed surface, not just what already runs. Several concrete pain points are open:
- Cap management is manual. A shell user holds a CapSet and must inspect, name, attenuate, pass, and release caps explicitly per command. That is the right model for trust, but it is hostile for everyday work compared with a Unix prompt where $PWD, $PATH, open fds, and ambient credentials disappear from the user’s mind. The question is what affordances (named bindings, scoped session “workspaces”, broker-issued bundles bound to a task, auto-release on plan completion, undo/redo on cap moves, a visible “current authority” indicator) the shell should provide so the typical user is not hand-curating a cap graph for every line. None of this should re-introduce ambient authority; the goal is ergonomics over an already typed graph, not hiding it.
- No agreed convention for passing parameters to programs. The manifest currently launches binaries with a named CapSet and no positional args, no argv, no environment block, and no structured parameter struct (see system.cue and SystemManifest in schema/capos.capnp); init’s ProcessSpawner-driven children inherit only the caps named in the spawn plan. Shell spawn ... with { ... } syntax is similarly cap-only. That is consistent, but it leaves “what does this program need to know besides its caps?” unanswered: where do free-form values (a chat channel name, an adventure save slot, a resize width) live? Options range from a typed LaunchParameters capnp struct passed through the spawn plan, to a convention that every program declares a parameter schema discovered via SchemaRegistry, to letting parameters always travel as fields on the first method call against a CommandSession/service cap rather than at launch time. The proposal should pick a single shape and describe how the manifest, shell spawn/run, native applications, and POSIX argv adapters all map onto it.
- No replacement for Unix pipes. The native composition example uses |> but defers byte-stream semantics to ByteStream/StdIO, which is a strictly weaker pipe and not a data-processing model. Real workloads on Unix lean on text streams precisely because they are cheap and structured enough; capOS can do better with typed records. The open question is whether to standardize a higher-level data-processing primitive. One example is YTsaurus-style map/reduce operators where each stage declares input and output schemas (RecordStream<T>?), the runtime negotiates a wire format (capnp records, framed JSON, columnar, raw bytes) at the boundary, and the shell’s |> becomes a pipeline planner rather than a byte pump. That would give native shell pipelines first-class typed composition without making every interface look like ByteStream. The question is whether this belongs in shell scope, in a separate data-processing proposal, or as a RecordStream capability in the schema registry that the shell merely consumes.
- No story for ordinary shell programming constructs. The proposed surface is one typed call per line plus |>; the prototype is even narrower. Real interactive and scripted use needs conditionals (branch on a cap call result, on CapException kind, on a value field), loops (iterate a List, fold a RecordStream, retry-with-backoff against a Timer), local variables and assignment beyond the implicit $ from |>, user-defined functions/procedures that take typed parameters and capability arguments, early-return / break, and structured error handling that distinguishes transport-level CapException from application-level result variants. Each of these has capability-graph consequences that POSIX shells never had to face: does a function body close over the caller’s CapSet by reference or by an explicit captured set, are caps bound inside a loop iteration auto-released at the end of that iteration, does a try/recover block release leased broker grants on the failure path, can a function be saved and re-invoked across sessions (i.e. does it become a persistent ActionPlan template), and how does the shell present a partial failure mid-pipeline without leaving orphan caps. The proposal should decide whether the native shell language defines these constructs itself, borrows them from a host language (CUE, a small embedded Rust-like DSL, an existing scripting runtime exposed as a capability), or stays deliberately non-Turing-complete and forces non-trivial control flow into spawned programs that expose typed CommandSession interfaces back to the shell.
- No environment-variable concept, and no clear replacement. Unix $VAR / export does three jobs at once: ambient configuration inherited by every child, a per-process key-value scratchpad, and a side channel for caller-supplied tweaks (PATH, LANG, TZ, HTTP_PROXY, XDG_*). capOS deliberately has none of this — the manifest passes only a CapSet, and the shell does not synthesize a process-wide string-keyed table. There is also no obvious immediate need: configuration that should be authoritative belongs in a Config capability, locale/timezone are policy state on a session or service cap, and per-invocation tweaks fit the still-undecided parameter-passing convention above. The open question is whether capOS ever needs an explicit environment-like primitive (e.g. a KeyValueScope capability bound to a session, an inheritable structured “ambient context” attached to a spawn plan, or a typed ConfigOverlay channel) for the cases where Unix would have used an environment variable, or whether each historical use case should instead be replaced by a dedicated capability (Locale, Clock, ProxyPolicy, XdgPaths, LogLevel) and the absence of an environment table treated as a feature rather than a gap. POSIX compatibility still has to expose getenv/environ, but that is a separate per-process synthetic view inside the POSIX profile, not a native-shell concept.
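For the batchDecide fingerprinting question, one candidate rule hashes the requester profile and each step's method, target, and attenuation, and ignores requestId and summary. This is a sketch under assumed `ActionPlan` and `ActionStep` shapes (the `method` field in particular is an assumption), not a decided design. `DefaultHasher` is used only for brevity. Its output is 64-bit, not collision-resistant, and not guaranteed stable across Rust releases, so a broker that persists fingerprints or compares them across processes would need a canonical serialization and a cryptographic hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical step shape; the real ActionStep schema is not defined here.
#[derive(Hash)]
pub struct ActionStep {
    pub method: String,
    pub target: String,
    pub attenuation: Vec<String>,
}

/// Hypothetical plan shape.
pub struct ActionPlan {
    pub request_id: u64,  // excluded from the fingerprint
    pub summary: String,  // excluded from the fingerprint
    pub requester_profile: String,
    pub steps: Vec<ActionStep>,
}

/// Shape fingerprint: covers the requester profile and every step's method,
/// target, and attenuation, in order; ignores requestId and summary text.
pub fn shape_fingerprint(plan: &ActionPlan) -> u64 {
    let mut h = DefaultHasher::new();
    plan.requester_profile.hash(&mut h);
    plan.steps.hash(&mut h);
    h.finish()
}
```

Under this rule, two plans that differ only in requestId or summary get the same fingerprint and can batch. Changing a step target or adding an attenuation yields a different fingerprint.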


capOS Documentation