# Proposal: Native Shell and POSIX Shell

How interactive operation should work on capOS without reintroducing ambient
authority through a Unix-like command line.


## Problem

capOS deliberately avoids global paths, inherited file descriptors, ambient
network access, and process-wide privilege bits. A conventional shell assumes
all of those. If capOS copied a Unix shell model directly, the shell would
either be mostly useless or become an ambiently privileged escape hatch around
the capability model.

The system needs two related, but distinct, shell layers:

- **Native shell**: schema-aware capability REPL and scripting language.
- **POSIX shell**: compatibility personality for existing programs and scripts.

Both must be ordinary userspace processes. Neither should receive special
kernel privilege. The kernel and trusted capability-serving processes remain
the enforcement boundary.

Model-driven interaction on top of the native shell is a separate concern
and is defined in [llm-and-agent-proposal.md](llm-and-agent-proposal.md).
The model runs as its own service with no session authority; the native
shell (in "agent mode") is the runner: it holds the session caps, exposes
them to the model as typed tool descriptors with per-tool permission
modes, executes tool calls on behalf of the model, streams results back,
and keeps the user in the loop.

The first boot-to-shell milestone is text-only: local console login/setup and,
later in the same family, a browser-hosted terminal gateway. Graphical shells,
desktop UI, compositors, and GUI app launchers are a later tier. See
[boot-to-shell-proposal.md](boot-to-shell-proposal.md).

## Design Principles

- A shell starts with only the capabilities it was granted.
- A shell command compiles to typed capability calls, not stringly syscalls.
- Child processes receive explicit grants. There is no implicit inheritance of
  the shell's full authority.
- Elevation is a capability request mediated by a trusted broker, not a flag
  inside the shell.
- Shell startup is a workload launch from a `UserSession`, service principal,
  or recovery profile. Session metadata informs policy and audit; it is not
  authority.
- Default interactive cap sets are broker-issued session bundles, not
  hard-coded shell privileges.
- POSIX behavior is an adapter over scoped `Directory`, `File`, socket factory,
  and process capabilities. It is not the native authority model.

User identity and policy sit above this shell model. A shell session may be
associated with a human, service, guest, anonymous, or pseudonymous principal,
but the session's capabilities remain the authority. RBAC, ABAC, and mandatory
policy decide which scoped caps a broker may grant; they do not create a
kernel-side `uid`, role bit, or label check on ordinary capability calls. See
[user-identity-and-policy-proposal.md](user-identity-and-policy-proposal.md).

Federated sessions (OIDC-authenticated principals, service accounts using
OAuth2 workload identity) are one input shape for this model. OAuth scopes
and OIDC claims from a session's issuer feed `AuthorityBroker` as ABAC
attributes. They never authorize capability calls directly, and raw bearer
tokens never appear in shell state. The token-typed capabilities,
`OAuthClient`, `OidcIdentityProvider`, and the broker-side token handling
are defined in
[oidc-and-oauth2-proposal.md](oidc-and-oauth2-proposal.md).

## Layering

```mermaid
flowchart TD
    Input[Login, guest, anonymous, or service request] --> SessionMgr[SessionManager]
    SessionMgr --> Session[UserSession metadata cap]
    Session --> Broker[AuthorityBroker / PolicyEngine]
    Broker --> Bundle[Scoped session cap bundle]

    Bundle --> Native[Native shell]
    Bundle --> Posix[POSIX shell]

    Posix --> Compat[POSIX compatibility runtime]

    Native --> Ring[capos-rt capability transport]
    Compat --> Ring
    Ring --> Kernel[Kernel cap ring]
    Ring --> Services[Userspace services]

    Native --> Approval[Approval client cap]
    Approval --> Broker
    Broker --> Services
    Broker --> Audit[AuditLog]
```

The native shell is the primitive interactive surface. The POSIX shell is a
compatibility consumer of capOS capabilities, not the model other shells are
built on. A language-model service, when present, is invoked through a
`LanguageModel` cap from the native shell running in "agent mode"; the
shell is the tool runner, not the model. That flow is defined in
[llm-and-agent-proposal.md](llm-and-agent-proposal.md) and is not expanded
in this diagram.

A shell may display a principal name, profile, role set, label, or POSIX UID,
but those values are descriptive unless a trusted broker uses them to return a
specific capability. Losing a `home`, `logs`, `launcher`, or `approval` cap
cannot be repaired by presenting the same session ID back to the kernel.

## Native Shell

The native shell is a typed capability graph operator. Its job is to inspect,
invoke, pass, attenuate, release, and trace capabilities.

Current implementation status as of 2026-05-16 21:36 UTC: `capos-shell` is
the standalone `no_std` crate at `shell/` and ships the anonymous-first
interactive flow. Focused shell/login manifests still launch it directly as
`initConfig.init`; the default `make run` manifest now runs it as an
init-started service under standalone `init`, together with the chat /
adventure binaries and the remote-session CapSet gateway.

On boot the shell mints an anonymous `UserSession` via
`SessionManager.anonymous()` and receives an empty-allowlist `anonymous`
bundle from `AuthorityBroker`. `login` and `setup` commands use
`CredentialStore`/`SessionManager`/`AuthorityBroker` to verify or create the
password, mint an operator session, request the `operator` shell bundle, and
swap session/launcher in place. Login prompts for a username as well as a
password through a username-aware `SessionManager.login()` request that
carries method, selector, proof, and source metadata. A `guest` command
mints a guest session via `SessionManager.guest()` and swaps to a
broker-issued guest bundle (guest sessions require an explicit manifest seed;
no broad authority is granted to guest profiles). Shell exit calls
`UserSession.logout()` to clean up the session context.

The default `make run` manifest includes the native shell, chat/adventure
binaries, `terminal`, `console`, `stdio`, `chat`, `adventure`, `creds`,
`sessions`, `audit`, `broker`, and `system_info` caps; its MOTD shows the
concrete `spawn` / `run` commands for the adventure demo. The current command
set is `help`, `caps`, `binaries`, `motd`, `inspect <name>`, `session`,
`login`, `setup`, `guest`, `spawn`, blocking `run`, `wait`, and `exit`, with a
launcher-backed `binaries` command that lists binaries available to the
current session (anonymous and guest launcher policies return an empty list).

The session-scoped `TerminalSession` substrate now exists behind
`make run-terminal`, and the bounded SSH terminal-host proof can launch
`capos-shell` over a socket-backed `TerminalSession` with a public-key
`UserSession` through `RestrictedShellLauncher`. The generic
`call @cap.method(...)` REPL, schema reflection, richer daily shell profiles,
and the full OpenSSH gateway remain future work.

Example init or development session with explicit spawn authority:

```text
capos:init> caps
log        Console
spawn      ProcessSpawner
boot       BootPackage
vm         VirtualMemory

capos:init> call @log.writeLine({ text: "hello" })
ok

capos:init> spawn "tls-smoke" with {
  log: @log
} -> $child
started pid 12

capos:init> wait $child
exit 0
```

### Values

Native shell values should include:

- `@name`: a named capability in the current shell context.
- `$name`: a local value, result, promise, or process handle.
- structured values: text, bytes, integers, booleans, lists, and structs.
- result-cap values returned through the capOS transfer-result path.
- trace values representing CQE and call-history slices.

The shell should preserve interface metadata with every capability value. A
method call is valid only if the target cap exposes the method's schema.
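
The schema rule above can be sketched in Rust as a lookup that fails before anything reaches the capability ring. This is a minimal hosted-Rust sketch, not the `capos-shell` implementation: `ShellContext`, `CapValue`, `InterfaceInfo`, and `CallError` are hypothetical names, and real interface metadata would come from the schema registry rather than a hand-written method list.

```rust
use std::collections::BTreeMap;

/// Hypothetical interface metadata kept next to every capability value.
#[derive(Debug, Clone)]
pub struct InterfaceInfo {
    pub interface_id: u64,
    pub methods: Vec<&'static str>,
}

/// A named capability in the shell context (`@name`).
#[derive(Debug, Clone)]
pub struct CapValue {
    /// Opaque local handle; two caps with the same interface ID still
    /// carry different handles and different authority.
    pub cap_index: u32,
    pub info: InterfaceInfo,
}

#[derive(Debug, PartialEq)]
pub enum CallError {
    UnknownCap(String),
    NoSuchMethod { cap: String, method: String },
}

pub struct ShellContext {
    caps: BTreeMap<String, CapValue>,
}

impl ShellContext {
    pub fn new() -> Self {
        Self { caps: BTreeMap::new() }
    }

    pub fn bind(&mut self, name: &str, cap: CapValue) {
        self.caps.insert(name.to_string(), cap);
    }

    /// Resolve `@cap.method` and reject the call if the cap's schema
    /// does not expose that method.
    pub fn check_call(&self, cap: &str, method: &str) -> Result<&CapValue, CallError> {
        let value = self
            .caps
            .get(cap)
            .ok_or_else(|| CallError::UnknownCap(cap.to_string()))?;
        if value.info.methods.iter().any(|m| *m == method) {
            Ok(value)
        } else {
            Err(CallError::NoSuchMethod {
                cap: cap.to_string(),
                method: method.to_string(),
            })
        }
    }
}
```

Under these assumptions, `call @log.spawn(...)` against a `Console` cap would be rejected by the shell itself, and the error names both the cap and the missing method.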

### Commands

Initial commands should be small and explicit:

```text
caps
binaries
inspect @log
methods @spawn
call @log.writeLine({ text: "boot complete" })
spawn "ipc-server" with { log: @log, ep: @serverEp } -> $server
wait $server
run "ipc-client" with { log: @log, ep: client @serverEp }
release @temporary
trace $server
bind scratch = @store.sub("scratch")
derive readonly = @home.sub("config").readOnly()
```

`inspect` should show the interface ID, label, transferability, revocation
state when available, and callable methods. It should not imply that two caps
with the same interface ID are the same authority.

The current prototype intentionally does not yet provide the generic
`call @cap.method(...)` REPL. Until the schema registry and structured value
parser exist, `capos-shell` exposes only narrow typed commands and should make
that gap visible through planning docs rather than accepting raw method IDs and
opaque byte blobs.

### Syntax

The syntax should be structured rather than shell-token based. A CUE-like or
Cap'n-Proto-literal-like shape fits capOS better than POSIX word splitting:

```text
spawn "net-stack" with {
  log: @log
  nic: @virtioNic
  timer: @timer
}
```

The shell can still provide abbreviations, but the executable representation
should be an `ActionPlan` object with typed fields.

### Composition

Native composition should pass typed caps or structured values, not inherited
byte streams by default:

```text
pipe @camera.frames()
  |> spawn "resize" with { input: $, width: 640, height: 480 }
  |> spawn "jpeg-encode" with { input: $, quality: 85 }
  |> call @photos.write({ name: "frame.jpg", data: $ })
```

If a byte stream is desired, it should be explicit through a `ByteStream`,
`File`, or POSIX adapter capability. This keeps the "pipe" operator from
silently turning every interface into untyped bytes.

### Namespaces

There is no global root. A native shell may have a current `Directory` or
`Namespace` capability, but that is just a default argument:

```text
capos:user> ls @config
services
network

capos:user> cd @config.sub("services")
capos:@config/services> ls
logger
net-stack
```

The shell cannot traverse above a scoped directory or namespace unless it holds
another capability that names that authority.
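
The traversal rule can be modelled with a small Rust sketch. It assumes a hypothetical `ScopedDir` value that only remembers the path below its grant; in capOS the narrowing would be enforced by the directory service that owns the object, so this only illustrates the rule the shell can rely on.

```rust
/// Hypothetical model of a scoped directory cap: it records only the
/// components below the grant and holds no handle to any parent.
#[derive(Debug, Clone, PartialEq)]
pub struct ScopedDir {
    components: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum SubError {
    Empty,
    Absolute,
    ParentTraversal,
}

impl ScopedDir {
    /// The directory exactly as granted.
    pub fn root_of_grant() -> Self {
        Self { components: Vec::new() }
    }

    /// Derive a narrower directory. `..` and absolute paths are refused
    /// instead of being resolved, so the result is never wider than `self`.
    pub fn sub(&self, rel: &str) -> Result<ScopedDir, SubError> {
        if rel.is_empty() {
            return Err(SubError::Empty);
        }
        if rel.starts_with('/') {
            return Err(SubError::Absolute);
        }
        let mut components = self.components.clone();
        for part in rel.split('/') {
            match part {
                "" | "." => continue,
                ".." => return Err(SubError::ParentTraversal),
                name => components.push(name.to_string()),
            }
        }
        Ok(ScopedDir { components })
    }

    /// Prompt-style rendering such as `@config/services`.
    pub fn display(&self, cap_name: &str) -> String {
        let mut out = format!("@{}", cap_name);
        for c in &self.components {
            out.push('/');
            out.push_str(c);
        }
        out
    }
}
```

Refusing `..` outright, rather than clamping it at the grant root, keeps a script from silently operating on a different directory than the one it named.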

### Session Context

A session-aware shell may hold a `self` or `session` cap for `UserSession.info()`
and audit context. That cap is metadata. It can identify the principal, auth
strength, expiry, quota profile, and audit identity, but it cannot widen the
shell's CapSet or authorize kernel operations by itself.

The launcher or supervisor starts the shell with a CapSet returned by
`AuthorityBroker(session, profile)`. For interactive work, that bundle should
usually include scoped terminal, home, logs, launcher, status, and approval
caps. For service accounts, guest sessions, anonymous workloads, and recovery
mode, the broker returns different bundles under explicit policy profiles.

Shell-launched children inherit only the caps named in the spawn plan. A child
may receive a `UserSession` or session badge for audit, per-client quotas, or
service-side selection, but object access still comes from the scoped object
caps passed to that child.
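
A minimal Rust sketch of that spawn rule follows. `Cap`, `SpawnError`, and `child_capset` are hypothetical names, and cloning a `Cap` stands in for whatever transfer mode (copy or move) the real grant uses.

```rust
use std::collections::BTreeMap;

/// Stand-in for a capability held by the shell.
#[derive(Debug, Clone, PartialEq)]
pub struct Cap {
    pub interface: &'static str,
    pub id: u32,
}

#[derive(Debug, PartialEq)]
pub enum SpawnError {
    MissingCap(String),
}

/// Build the child's CapSet from the spawn plan only. Each grant is a
/// `(name in child, name in shell)` pair. Anything the plan does not
/// name is absent from the child, because the map starts empty.
pub fn child_capset(
    shell_caps: &BTreeMap<String, Cap>,
    grants: &[(&str, &str)],
) -> Result<BTreeMap<String, Cap>, SpawnError> {
    let mut child = BTreeMap::new();
    for (child_name, shell_name) in grants {
        let cap = shell_caps
            .get(*shell_name)
            .ok_or_else(|| SpawnError::MissingCap(shell_name.to_string()))?;
        child.insert(child_name.to_string(), cap.clone());
    }
    Ok(child)
}
```

With a shell holding `log` and `spawn`, the plan `with { log: @log }` would produce a child CapSet containing `log` alone; a grant that names a cap the shell does not hold fails the whole spawn.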

### Interactive Command Surfaces

Application-specific interactions must stay out of the native shell command
set. A chat client, adventure client, or other interactive application should
run as an ordinary shell-spawned application or resident service session, not
as a builtin such as `chat` or `play adventure`.

The near-term target is a prototype bridge, not the final app protocol:
`capos-shell` launches clients with `spawn` or `run`, grants them explicit
endpoint clients such as `stdio: client @stdio`, and services `StdIO` while
waiting. That proves exact grants, process handles, child completion, and the
terminal bridge without giving a child the shell's move-only `TerminalSession`.
Legacy `badge N` syntax is retired from normal `client @...` grants; delegated
client endpoints preserve their service identity by default, and service object
capabilities replace badged chat/adventure identity. Explicit selector fixtures
remain only in low-level and hostile-path tests.

That `StdIO` bridge is intentionally limited. It is acceptable for focused
QEMU smokes and textual compatibility, but it is the wrong long-term semantic
boundary for capOS-native applications. If an adventure client receives a line
from `StdIO` and parses `go north`, `take key`, or `say hello` internally,
capOS has only moved string command parsing out of the shell and into the app.
That is still weaker than typed capability invocation.

Native interactive applications should expose a command surface:

```text
path=["go"], args={direction:"north"}
path=["take"], args={item:"brass-key"}
path=["say"], args={text:"hello there"}
path=["chat","join"], args={channel:"#lobby"}
```

The user may still type familiar `command <args>` forms. The shell or terminal
host parses them through generic command metadata, including nested
subcommands, argument kinds, completions, and redaction rules. The app receives
a structured invocation and converts it to typed service calls. The shell does
not hardcode application verbs, and the application does not parse unstructured
terminal text for normal operations.
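
One way to picture the metadata-driven parse is the Rust sketch below. `CommandSpec`, `Invocation`, and `parse` are hypothetical, argument kinds are reduced to positional strings, and the last argument is assumed to absorb the rest of the line; the real metadata format belongs to the interactive command surface proposal.

```rust
use std::collections::BTreeMap;

/// Hypothetical command metadata advertised by an application.
pub struct CommandSpec {
    pub path: &'static [&'static str],
    /// Positional argument names; the last one takes the rest of the line.
    pub args: &'static [&'static str],
}

#[derive(Debug, PartialEq)]
pub struct Invocation {
    pub path: Vec<String>,
    pub args: BTreeMap<String, String>,
}

/// Turn a typed line into a structured invocation using only metadata.
/// The parser contains no application verbs of its own.
pub fn parse(line: &str, specs: &[CommandSpec]) -> Option<Invocation> {
    let words: Vec<&str> = line.split_whitespace().collect();
    // Prefer the longest matching path so `chat join` beats a bare `chat`.
    let spec = specs
        .iter()
        .filter(|s| {
            words.len() >= s.path.len()
                && s.path.iter().zip(words.iter()).all(|(a, b)| a == b)
        })
        .max_by_key(|s| s.path.len())?;
    let rest = &words[spec.path.len()..];
    if rest.len() < spec.args.len() || (spec.args.is_empty() && !rest.is_empty()) {
        return None;
    }
    let mut args = BTreeMap::new();
    for (i, name) in spec.args.iter().enumerate() {
        let value = if i + 1 == spec.args.len() {
            rest[i..].join(" ")
        } else {
            rest[i].to_string()
        };
        args.insert(name.to_string(), value);
    }
    Some(Invocation {
        path: spec.path.iter().map(|p| p.to_string()).collect(),
        args,
    })
}
```

Given specs for `go <direction>`, `say <text>`, and `chat join <channel>`, the line `say hello there` becomes `path=["say"], args={text:"hello there"}`, and an unknown verb yields no invocation at all.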

`StdIO` remains an explicit text I/O capability for transcript output, simple
programs, POSIX compatibility, and test harnesses. It should not be the primary
command interface for native chat/adventure-style applications. The focused
design is in
[interactive-command-surface-proposal.md](interactive-command-surface-proposal.md).

### Remote Session CapSet Clients

Not every remote interaction should become a shell session. A regular host
application -- CLI, native GUI, Tauri backend, webapp gateway, or service
client -- should be able to authenticate to capOS, receive a broker-issued
remote view of its session CapSet, and call the capabilities it was granted
over Cap'n Proto RPC. That path is a programmatic peer of the native shell:
both consume a session bundle from `AuthorityBroker`, but only the shell adds
command parsing, terminal state, and child-process workflow.

The remote client must not receive the kernel's local CapSet page, local
cap-table indexes, endpoint selectors, result-cap indexes, or global session
identifiers. It receives typed RPC object references backed by a capOS
per-session worker. Chat, Paperclips, Adventure, command sessions, and future
service APIs should therefore be callable by generated clients without routing
through `capos-shell`. The owning design is
[remote-session-capset-client-proposal.md](remote-session-capset-client-proposal.md).
That proposal also covers bidirectional UI composition for web/Tauri/GUI
sessions: services can propose task-specific panes or command surfaces through
explicit UI caps, but cannot take arbitrary control of the host UI.

### Terminal Host Separation

The shell should not be the terminal host forever. The component that owns a
UART, web socket, GUI pane, line editing, history, paste handling, resize
state, and render policy can be a separate terminal host process. The shell
then runs against a terminal entity and can be reused unchanged from local
console, GUI, web, and scripted hosts.

`TerminalSession` remains the foreground text-session authority, but it is an
interface between terminal host and shell, not proof that the shell implements
the terminal. Shell-spawned applications should normally receive command
sessions or explicit `StdIO` adapters, not the shell's move-only
`TerminalSession`.

Remote text transports follow the same rule. The Telnet Shell Demo in
[networking-proposal.md](networking-proposal.md) is a demo-only plaintext
terminal host: it accepts a host-loopback QEMU-forwarded TCP connection and
gives the shell a socket-backed `TerminalSession`. The kernel-side socket
terminal silently consumes IAC option negotiation in its line discipline, so
no userspace pre-handoff recv is required. The demo must not turn the shell
login path into a raw `ByteStream`, raw `TcpSocket`, or `StdIO` substitute,
because password entry, echo policy, cancellation, and shell launch authority
are defined at the `TerminalSession` boundary. The QEMU harness for that demo
binds the host forward to `127.0.0.1:2323` only and runs `caps` to prove the
child shell did not receive raw `NetworkManager`, `ProcessSpawner`, TCP, or
unknown capability interfaces. The gateway itself remains a trusted demo
bootstrap service until scoped listener and manifest-declared shell-launch
grants exist; production remote CLI shell access waits for the SSH gateway
layer.

The SSH path is specified separately in
[ssh-shell-proposal.md](ssh-shell-proposal.md): it keeps the same
`TerminalSession` and broker-issued shell-bundle boundary, while adding SSH
host authentication, encrypted transport, public-key user authentication,
channel policy, and remote-session audit. Its initial schema stubs name the
terminal construction and authority surfaces as `SshTerminalFactory`,
`TcpListenAuthority`, and `RestrictedShellLauncher`; they now have focused
QEMU proofs for scoped listen authority, public-key session minting,
restricted shell launch, and a bounded plain-TCP terminal-host handoff. A
focused development-only host-key proof grants an explicitly labeled
non-production `SshHostKey` cap in QEMU that performs bounded fixture
exchange-hash signing. The full runnable OpenSSH gateway still waits on
encrypted transport, SSH packet/channel handling, persistent production
key-management-backed signing, and the final `run-ssh-shell` host harness.

## Agent Mode

Model-driven interaction is defined in
[llm-and-agent-proposal.md](llm-and-agent-proposal.md). This proposal does
not describe a separate "agent shell" process. The native shell, running
in "agent mode", is the tool runner: it holds the session cap bundle,
exposes caps to a `LanguageModel` service as typed `ToolDescriptor`
values with per-tool permission modes (`auto` / `consent` / `stepUp` /
`forbidden`), executes the model's tool calls against its own caps,
streams results back into the conversation, and keeps the user in the
loop through consent prompts, streaming, and interrupt. There is no
separate `PlannerAgent` or `ActionPlan` pipeline.

Long-lived OpenClaw-like hosted agents, swarms, background tasks, external
channel ingress, agent-maintained memory/wiki stores, and MCP/A2A-style
interoperability are intentionally separate from the shell surface; see
[capOS-Hosted Agent Swarms](hosted-agent-swarm-proposal.md). The shell can
launch, inspect, approve, or cancel hosted tasks, but it should not own the
hosted-agent control plane.

## Approval and Authentication

Elevation belongs in a trusted broker service that the shell can consult
but cannot impersonate.

Conceptual interfaces:

```capnp
interface ApprovalClient {
  request @0 (
    reason :Text,
    plan :ActionPlan,
    requestedCaps :List(CapRequest),
    durationMs :UInt64
  ) -> (grant :ApprovalGrant);
}

enum ApprovalState {
  pending @0;
  approved @1;
  denied @2;
  expired @3;
  escalated @4;
}

interface ApprovalGrant {
  state @0 () -> (state :ApprovalState, reason :Text);
  claim @1 () -> (caps :List(GrantedCap));
  cancel @2 () -> ();
}

interface AuthorityBroker {
  request @0 (
    session :UserSession,
    plan :ActionPlan,
    requestedCaps :List(CapRequest),
    durationMs :UInt64
  ) -> (grant :ApprovalGrant);
}
```

`ActionPlan` is the structured description of the work the request will
perform. Free-form text it carries is for the approval UI; the broker
decides authority from the typed step list, never from the summary string.

```capnp
struct ActionPlan {
  # Brief, redactable, human-readable summary. Used by the approval UI;
  # not used as an authority input by the broker.
  summary @0 :Text;

  # Structured action steps. The broker decides whether each step is
  # representable for the bound session/profile; an unrepresentable step
  # fails the whole request.
  steps @1 :List(ActionStep);

  # True if any step modifies durable state, terminates a service,
  # releases storage, sends external traffic, or is otherwise hard to
  # reverse. Brokers may require step-up authentication and longer
  # review windows when this is set.
  destructive @2 :Bool;

  # Stable identifier the requester sets so it can correlate the resulting
  # grant or queue entry. Brokers must not interpret this as authority.
  requestId @3 :Data;
}

struct ActionStep {
  union {
    spawn :group {
      # Manifest entry name or trusted launcher alias. The broker
      # resolves the alias to a binary identity before grant.
      target @0 :Text;
      # Cap names the spawned process needs from the launcher's
      # advertised set. Each name maps to a concrete `CapRequest`
      # in the `requestedCaps` list submitted with the enclosing plan.
      capNames @1 :List(Text);
    }
    serviceControl :group {
      service @2 :Text;
      verb    @3 :ServiceVerb;
    }
    storageOpen :group {
      namespace @4 :Text;
      path      @5 :Text;
      mode      @6 :StorageMode;
    }
    # Free-form structured payload describing a step the broker
    # recognises by name. Lets new step kinds land without re-issuing
    # the schema; brokers refuse unknown `kind` values.
    custom :group {
      kind    @7 :Text;
      payload @8 :Data;
    }
  }
}

enum ServiceVerb {
  start   @0;
  stop    @1;
  restart @2;
  reload  @3;
}

enum StorageMode {
  read       @0;
  readWrite  @1;
  append     @2;
}
```
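
The "typed steps decide, the summary does not" rule can be sketched in Rust as follows. The types are trimmed mirrors of the schema above, and `Profile` with its three allow-lists is a hypothetical stand-in for whatever policy representation the broker actually uses.

```rust
/// Simplified mirror of the `ActionStep` union; payloads are trimmed.
#[derive(Debug, Clone)]
pub enum ActionStep {
    Spawn { target: String },
    ServiceControl { service: String, verb: ServiceVerb },
    Custom { kind: String },
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ServiceVerb {
    Start,
    Stop,
    Restart,
    Reload,
}

pub struct ActionPlan {
    /// Shown to the approver; never consulted by `check_plan`.
    pub summary: String,
    pub steps: Vec<ActionStep>,
}

/// Hypothetical per-profile policy table.
pub struct Profile {
    pub spawn_targets: Vec<&'static str>,
    pub controllable_services: Vec<&'static str>,
    pub custom_kinds: Vec<&'static str>,
}

/// Returns the index of the first unrepresentable step. One bad step
/// fails the whole request; unknown custom kinds are refused.
pub fn check_plan(plan: &ActionPlan, profile: &Profile) -> Result<(), usize> {
    for (i, step) in plan.steps.iter().enumerate() {
        let ok = match step {
            ActionStep::Spawn { target } => {
                profile.spawn_targets.iter().any(|t| *t == target.as_str())
            }
            ActionStep::ServiceControl { service, .. } => profile
                .controllable_services
                .iter()
                .any(|s| *s == service.as_str()),
            ActionStep::Custom { kind } => {
                profile.custom_kinds.iter().any(|k| *k == kind.as_str())
            }
        };
        if !ok {
            return Err(i);
        }
    }
    Ok(())
}
```

Because `check_plan` never reads `summary`, a plan labelled "harmless log rotation" that contains a `stop net-stack` step is judged on the step, not the label.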

`CapRequest` describes a single capability the plan needs. The broker
matches each request against the principal's role bundle and ABAC
context; the response either narrows the request and mints the cap, or
denies. There is no widening path.

```capnp
struct CapRequest {
  # Capability interface name advertised by the broker
  # (`ServiceSupervisor`, `Directory`, `TcpProvider`, ...). The broker
  # refuses unknown interfaces.
  interface @0 :Text;

  # Identifier of the target object inside that interface. For
  # `ServiceSupervisor` this is the service name; for `Directory` it
  # is the namespace path; for `TcpProvider` it is an address-policy
  # selector. The broker validates the target against policy.
  target @1 :Text;

  # Per-cap maximum duration. The grant returns the lesser of this and
  # the plan-level `durationMs` after policy narrowing. Zero means
  # "use plan-level default".
  maxDurationMs @2 :UInt64;

  # Optional attenuation hints (subdirectory, method allow-list,
  # address filter). The broker may further narrow these but must
  # never widen them.
  attenuation @3 :Data;
}
```
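
The duration rule in `maxDurationMs` reduces to a few comparisons. The Rust sketch below assumes a hypothetical `PolicyRule` table keyed by interface and target with a per-rule ceiling; only the "lesser of per-cap and plan-level, zero means plan default, never widen" behavior comes from the schema comments.

```rust
/// Trimmed mirror of `CapRequest`; attenuation hints are omitted.
pub struct CapRequest {
    pub interface: String,
    pub target: String,
    /// Zero means "use the plan-level default".
    pub max_duration_ms: u64,
}

/// Hypothetical policy row: what the broker is willing to mint and for
/// how long at most.
pub struct PolicyRule {
    pub interface: &'static str,
    pub target: &'static str,
    pub ceiling_ms: u64,
}

/// Returns the granted lease length in milliseconds, or `None` when the
/// request is denied. The result is never longer than any of its inputs.
pub fn narrow(req: &CapRequest, plan_duration_ms: u64, rules: &[PolicyRule]) -> Option<u64> {
    // Unknown interfaces or targets are refused, not guessed at.
    let rule = rules
        .iter()
        .find(|r| r.interface == req.interface.as_str() && r.target == req.target.as_str())?;
    let requested = if req.max_duration_ms == 0 {
        plan_duration_ms
    } else {
        req.max_duration_ms.min(plan_duration_ms)
    };
    Some(requested.min(rule.ceiling_ms))
}
```

Every step takes a minimum, so there is no input a requester can supply that lengthens the lease beyond the policy ceiling.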

`GrantedCap` is the same transport-level result-cap concept used by
`ProcessSpawner` -- a typed reference to an attenuated, leased
capability the broker has minted. It is not a separate authority
encoding; reading the granted cap is the only way to use the granted
authority.

The native shell holds only a session-bound `ApprovalClient`. It does not
submit arbitrary `PrincipalInfo`, role, UID, label values, or authentication
proofs as authority. The `ApprovalClient` forwards the bound `UserSession`
and typed request to `AuthorityBroker`. The broker or a consent service
wrapping it holds powerful caps, drives any trusted consent or step-up
authentication path, and mints attenuated temporary caps after policy and
authentication checks.

The conceptual API intentionally has no `authProof` argument on the
shell-visible path. If a proof is needed, it is collected by
`SessionManager`, the broker, or a trusted approval UI and reflected back
to the shell only as `pending`, `approved`, `denied`, `expired`, or
`escalated`.

### Approval Inbox

Synchronous approval is not always available. Step-up authentication, a
dual-control destructive action, or a deferred review (for example a
service-restart change-window) all need a durable queue: the request
must be listable later, persistent across reconnects, and triageable
in batch.

The broker exposes that queue through an `ApprovalInbox` cap minted
into the session bundle of whoever may approve. The inbox is not a
shell cap; the native shell uses `ApprovalClient` to *submit* requests,
and a separate principal (a security operator, the same operator under
step-up, or a multi-party reviewer set) holds the inbox cap that
*decides* them. Remote workspaces (the CapSet UI) treat
`ApprovalInbox` as the canonical pending-actions surface, which lets a
browser session show "you have pending approvals" without granting the
browser any of the requested authority.

```capnp
interface ApprovalInbox {
  # List entries currently awaiting decision. Bounded; the broker
  # enforces a per-inbox visible-window cap and may return fewer than
  # `limit` rows. `truncated` distinguishes "broker capped this page"
  # from "no further rows".
  list @0 (
    cursor :Data,
    limit  :UInt32
  ) -> (
    entries    :List(ApprovalEntry),
    nextCursor :Data,
    truncated  :Bool
  );

  # Look up a specific entry by id. Useful when a UI deep-links to
  # an entry past the listed window.
  entry @1 (entryId :Data) -> (entry :ApprovalEntry);

  # Approve, deny, or escalate a single entry. `approve` returns the
  # `ApprovalGrant` minted by the broker; `deny` and `escalate`
  # transition the entry without minting caps. The decider's reason
  # text is bounded and recorded in audit.
  decide @2 (
    entryId  :Data,
    decision :ApprovalDecision,
    reason   :Text
  ) -> (grant :ApprovalGrant);

  # Bulk-decide entries that share shape (same requester principal,
  # same plan summary fingerprint, same destructive flag). The broker
  # rejects mixed shapes with an explicit diagnostic instead of
  # silently approving heterogeneous requests.
  batchDecide @3 (
    entryIds :List(Data),
    decision :ApprovalDecision,
    reason   :Text
  ) -> (grants :List(ApprovalGrant));

  # Subscribe to inbox change events. The listener cap is held by
  # the broker; logging out of the inbox session revokes the
  # subscription.
  watch @4 (listener :ApprovalListener) -> ();
}

enum ApprovalDecision {
  approve  @0;
  deny     @1;
  escalate @2;
}

struct ApprovalEntry {
  # Broker-minted opaque id, stable across reconnects.
  entryId       @0 :Data;
  # Opaque audit-only principal id of the requester.
  requesterId   @1 :Data;
  # Display name; not authoritative.
  requesterName @2 :Text;
  plan          @3 :ActionPlan;
  requestedCaps @4 :List(CapRequest);
  durationMs    @5 :UInt64;
  state         @6 :ApprovalState;
  # Last decider reason or denial detail; bounded.
  reason        @7 :Text;
  createdAtMs   @8 :UInt64;
  expiresAtMs   @9 :UInt64;
  escalation    @10 :EscalationInfo;
}

struct EscalationInfo {
  # Number of additional reviewers the broker has notified. Zero when
  # the entry has not been escalated.
  reviewerCount @0 :UInt32;
  # Role names of the additional reviewers; never principal ids.
  reviewerHints @1 :List(Text);
}

interface ApprovalListener {
  appended  @0 (entry :ApprovalEntry) -> ();
  decided   @1 (entryId :Data, state :ApprovalState) -> ();
  expired   @2 (entryId :Data) -> ();
}
```
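
The shape check that `batchDecide` performs before touching any entry might look like the Rust sketch below. `EntryShape` and `BatchError` are hypothetical, the summary fingerprint is treated as an opaque `u64` because the schema does not say how it is computed, and rejecting an empty batch is an assumption of this sketch.

```rust
/// The three properties that must match across a batch.
#[derive(Debug, Clone, PartialEq)]
pub struct EntryShape {
    pub requester_id: Vec<u8>,
    /// Opaque here; the fingerprint algorithm is not specified.
    pub summary_fingerprint: u64,
    pub destructive: bool,
}

#[derive(Debug, PartialEq)]
pub enum BatchError {
    Empty,
    /// Index of the first entry whose shape differs from entry 0.
    MixedShapes { first_mismatch: usize },
}

/// Validate the whole batch up front so that a mixed batch is rejected
/// with a diagnostic and no entry is decided.
pub fn check_batch(shapes: &[EntryShape]) -> Result<(), BatchError> {
    let first = shapes.first().ok_or(BatchError::Empty)?;
    match shapes.iter().position(|s| s != first) {
        Some(i) => Err(BatchError::MixedShapes { first_mismatch: i }),
        None => Ok(()),
    }
}
```

Checking before deciding matters: a partially applied batch would leave the approver unsure which heterogeneous requests had already been granted.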

The `ApprovalClient` itself does not change shape: a request that the
broker cannot decide synchronously still returns an `ApprovalGrant`
immediately, with `state == pending` and a stable handle. The broker
adds an entry to the corresponding inbox; the requester polls or
watches its grant; the inbox holder drives the decision. When the
inbox holder calls `decide(approve)`, the existing grant transitions
to `approved` and `claim` returns the minted caps -- the requester
does not learn an entry id, and the inbox does not learn the
requester's `ApprovalGrant` cap. The two surfaces meet only at the
broker.

Inbox entries are durable across reconnects because `entryId` is
broker-minted and the inbox cap is session-bound rather than
transport-bound. Closing a transport does not delete entries;
re-presenting the same session-scoped inbox cap rebinds the listener
without losing pending state. Entries expire on the broker timer at
`expiresAtMs` and produce an `expired` listener event; expired
entries remain visible to `entry()` for a bounded audit window
defined by broker policy, after which they move to the audit log
only.
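
The entry lifecycle described in this section can be summarised as a small state machine. The Rust sketch below is illustrative only: it assumes that `escalated` entries stay open for a later decision and can still expire, which the text implies but does not state, and `Entry` and `AlreadyFinal` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ApprovalState {
    Pending,
    Approved,
    Denied,
    Expired,
    Escalated,
}

#[derive(Debug, Clone, Copy)]
pub enum ApprovalDecision {
    Approve,
    Deny,
    Escalate,
}

/// Returned when a decision arrives for an entry that is already final.
#[derive(Debug, PartialEq)]
pub struct AlreadyFinal(pub ApprovalState);

pub struct Entry {
    pub state: ApprovalState,
    pub expires_at_ms: u64,
}

impl Entry {
    pub fn new(expires_at_ms: u64) -> Self {
        Self { state: ApprovalState::Pending, expires_at_ms }
    }

    fn is_open(&self) -> bool {
        matches!(self.state, ApprovalState::Pending | ApprovalState::Escalated)
    }

    /// Broker timer. Returns true exactly once, when the `expired`
    /// listener event should fire.
    pub fn tick(&mut self, now_ms: u64) -> bool {
        if self.is_open() && now_ms >= self.expires_at_ms {
            self.state = ApprovalState::Expired;
            true
        } else {
            false
        }
    }

    /// Apply an inbox decision; expiry is checked first so a late
    /// approval cannot revive an expired entry.
    pub fn decide(
        &mut self,
        decision: ApprovalDecision,
        now_ms: u64,
    ) -> Result<ApprovalState, AlreadyFinal> {
        self.tick(now_ms);
        if !self.is_open() {
            return Err(AlreadyFinal(self.state));
        }
        self.state = match decision {
            ApprovalDecision::Approve => ApprovalState::Approved,
            ApprovalDecision::Deny => ApprovalState::Denied,
            ApprovalDecision::Escalate => ApprovalState::Escalated,
        };
        Ok(self.state)
    }
}
```

The requester's `ApprovalGrant.state()` would observe the same transitions without ever holding the entry itself.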

### Elevation Flow

User request (typed directly, or produced by agent-mode tool-use as an
`ActionPlan` before invoking the broker):

```text
restart the network stack
```

Requested action presented to the broker:

```text
- stop service "net-stack"
- spawn "net-stack"
- grant: nic, timer, log
- wait for health check

Missing authority:
- ServiceSupervisor(net-stack)

Requested duration:
- 60 seconds
```

Broker decision:

- Which `UserSession` and profile is this request bound to?
- Is that principal/profile allowed to restart `net-stack`?
- Is the requested binary allowed?
- Are the requested grants narrower than policy permits?
- Do mandatory confidentiality and integrity constraints allow the grant?
- Is there fresh user presence?
- Does this require step-up authentication?

If approved, the broker returns a narrow leased capability:

```text
supervisor: ServiceSupervisor(service="net-stack", expires=60s)
```

It should not return broad `ProcessSpawner`, `BootPackage`, or
`DeviceManager` authority when a scoped supervisor cap can do the job.

### Authentication

Authentication proof should be consumed by the `SessionManager` or broker
boundary, not exposed as a secret to the shell. Suitable mechanisms include:

- password or PIN for medium-risk local actions.
- hardware key or WebAuthn-style challenge for administrative actions.
- TPM-backed local presence for device or boot-policy operations.
- OIDC step-up: broker requests a fresh ID token from the session's IdP
  with `prompt=login`, `max_age`, or stronger `acr_values` before
  returning a leased cap. The IdP and `SessionManager` drive the user
  interaction; the shell sees only `pending` → `approved`/`denied`.
- multi-party approval for destructive policy, storage, or recovery actions.

The shell should never receive raw tokens (including OAuth access or refresh
tokens), private keys, recovery codes, or full environment dumps. When the
broker must delegate outbound authority to a session -- for example, "read
from this company's HR API" -- it returns a wrapper capability that holds
the `AccessToken` internally; the shell invokes the wrapper without seeing
the bearer string.
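
A Rust sketch of such a wrapper is shown below. In capOS the wrapper would live in a separate broker-side process; here module privacy stands in for that process boundary. `HrApiCap`, `mint`, and the `transport` closure are hypothetical names used only for illustration.

```rust
mod broker {
    use std::fmt;

    /// Wrapper capability: the bearer string is a private field and no
    /// method returns it.
    pub struct HrApiCap {
        bearer: String,
        /// Stand-in for the broker-side HTTP client; it receives the
        /// request path and the bearer and returns the response body.
        transport: Box<dyn Fn(&str, &str) -> String>,
    }

    impl HrApiCap {
        /// Called on the broker side only.
        pub fn mint(bearer: String, transport: Box<dyn Fn(&str, &str) -> String>) -> Self {
            Self { bearer, transport }
        }

        /// The only operation the shell sees: a read that returns the
        /// response body and nothing about the credential.
        pub fn read(&self, path: &str) -> String {
            (self.transport)(path, &self.bearer)
        }
    }

    /// Manual `Debug` so that inspection or logging of the cap cannot
    /// leak the token.
    impl fmt::Debug for HrApiCap {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            f.debug_struct("HrApiCap")
                .field("bearer", &"<redacted>")
                .finish()
        }
    }
}
```

Code outside `broker` can call `read` and format the value for display, but it has no expression that evaluates to the bearer string.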

### Shell Hardening

The shell must treat files, logs, web pages, service output, model
output, and CQE payloads as untrusted data. They are not instructions.

Required behavior:

- show an executable typed plan before authority-changing actions.
- keep elevated caps leased, narrow, and short-lived.
- release temporary caps after the plan finishes or fails.
- audit every approval request, grant, cap transfer, and release.
- require exact targets for destructive actions.
- refuse broad phrases such as "give it everything" unless a trusted policy
  explicitly allows a named emergency mode.
- keep any model-derived context separate from secrets and authentication
  proofs; see the LLM/agent-runtime proposal for the model-service side.

The enforcement rule is simple: users and models may propose, explain,
and request. Capabilities decide what can happen.
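
The "release temporary caps after the plan finishes or fails" requirement maps naturally onto a drop guard. The Rust sketch below is a hosted illustration under stated assumptions: `LeasedCap`, `AuditLog`, and `run_plan` are hypothetical, and the audit log is an in-memory vector rather than an `AuditLog` capability.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// In-memory stand-in for the audit trail.
pub type AuditLog = Rc<RefCell<Vec<String>>>;

/// A temporary elevated cap. Dropping it records the release, so every
/// exit path from the plan releases the authority.
pub struct LeasedCap {
    name: String,
    audit: AuditLog,
}

impl LeasedCap {
    pub fn claim(name: &str, audit: &AuditLog) -> Self {
        audit.borrow_mut().push(format!("grant {}", name));
        Self { name: name.to_string(), audit: Rc::clone(audit) }
    }
}

impl Drop for LeasedCap {
    fn drop(&mut self) {
        self.audit.borrow_mut().push(format!("release {}", self.name));
    }
}

/// Run a plan that needs a temporary supervisor cap. The early return on
/// failure still drops `_supervisor`, so the release is audited either way.
pub fn run_plan(audit: &AuditLog, fail: bool) -> Result<(), String> {
    let _supervisor = LeasedCap::claim("supervisor", audit);
    if fail {
        return Err("health check failed".to_string());
    }
    Ok(())
}
```

Tying release to scope exit means a plan author cannot forget the failure path; the lease expiry on the broker side remains the backstop if the shell itself dies.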

## POSIX Shell

The POSIX shell is a compatibility layer for existing software and scripts. It
should be useful, but it should not define native capOS administration.

The C-ABI substrate for porting POSIX programs (including a POSIX shell) is
specified separately in
[posix-adapter-proposal.md](posix-adapter-proposal.md). `libcapos` exposes the
capability ring, CapSet, raw syscalls, and heap to C; `libcapos-posix` layers
the POSIX shape (fd table, errno, `pipe` / `read` / `write` / `dup` / `dup2`,
`fork` / `execve` / `waitpid` / `_exit`, `posix_spawn` and the file-action
shims, `clock_gettime`, UDP socket calls, console-backed stdio) on top. Phases
P1.1, P1.2, and P1.3 of that proposal have landed; the C-substrate, pipe cap,
recording-shim fork-for-exec, direct `posix_spawn` path, and Console-backed
stdio are proven by QEMU smokes (`make run-c-hello`, `make run-posix-dns-smoke`,
`make run-posix-pipe-smoke`, `make run-posix-stdio-smoke`). The POSIX shell port
itself depends on `Namespace` and `File` caps, which are tracked in that
proposal as gating work after the current phases close.

### Mapping

POSIX concepts map onto granted capabilities:

| POSIX concept | capOS backing |
|---|---|
| `/` | synthetic root built from granted `Directory` or `FileServer` caps |
| cwd | current scoped `Directory` cap |
| fd | local handle to `File`, `ByteStream`, pipe, terminal, or socket cap |
| pipe | `ByteStream` pair or userspace pipe service |
| `PATH` | search inside the synthetic root or a command registry cap |
| `exec` | `ProcessSpawner` or restricted launcher cap |
| sockets | socket factory caps such as `TcpProvider` or `HttpEndpoint` |
| `uid`, `gid`, user, group | synthetic POSIX profile derived from session metadata |
| `$HOME` | path alias backed by a granted `home` directory or namespace cap |
| `/etc/passwd`, `/etc/group` | profile service view, scoped to the compatibility environment |
| env vars | data only; never authority by themselves |

If a POSIX process has no network cap, `connect()` fails. If it has no
directory mounted at `/etc`, opening `/etc/resolv.conf` fails. If it has no
device cap, `/dev` is empty or synthetic.

A POSIX shell is launched with both a CapSet and compatibility profile
metadata. The profile controls what legacy APIs report. The CapSet controls
what the process can actually do.
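
One way to picture the synthetic root is a longest-prefix lookup over the mount points that were actually granted. The sketch below is an assumption-laden illustration: `SyntheticRoot`, `ResolveError`, and the cap names are hypothetical, caps are represented by plain strings, and `..` normalization and symlinks are left out.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
pub enum ResolveError {
    NotAbsolute,
    /// No granted directory covers this path.
    NoSuchMount,
}

/// Hypothetical synthetic root: mount point -> name of a granted Directory cap.
pub struct SyntheticRoot {
    mounts: BTreeMap<String, String>,
}

impl SyntheticRoot {
    pub fn new() -> Self {
        SyntheticRoot {
            mounts: BTreeMap::new(),
        }
    }

    pub fn mount(&mut self, point: &str, cap: &str) {
        self.mounts.insert(point.to_string(), cap.to_string());
    }

    /// Returns (cap name, path relative to that cap), picking the longest
    /// mount point that matches on whole path components.
    pub fn resolve(&self, path: &str) -> Result<(String, String), ResolveError> {
        if !path.starts_with('/') {
            return Err(ResolveError::NotAbsolute);
        }
        let mut best: Option<(&String, &String)> = None;
        for (point, cap) in &self.mounts {
            let matches = path == point.as_str()
                || point.as_str() == "/"
                || path
                    .strip_prefix(point.as_str())
                    .map_or(false, |rest| rest.starts_with('/'));
            if matches && best.map_or(true, |(b, _)| point.len() > b.len()) {
                best = Some((point, cap));
            }
        }
        let (point, cap) = best.ok_or(ResolveError::NoSuchMount)?;
        let rest = path[point.len()..].trim_start_matches('/');
        Ok((cap.clone(), rest.to_string()))
    }
}
```

With only `/home` and `/tmp` mounted, resolving `/etc/resolv.conf` yields `NoSuchMount`, which matches the behavior described above: the failure comes from the absence of a grant, not from a permission check on a global tree.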

### Compatibility Limits

Exact Unix semantics should not be promised early.

- Prefer `posix_spawn` over full `fork` for the first implementation.
- `fork` with arbitrary shared process state can be emulated later if needed.
- `setuid` cannot grant caps. At most it asks a compatibility broker to replace
  the POSIX profile or launch a new process with a different broker-issued cap
  bundle.
- Mode bits and ownership metadata do not create authority.
- `chmod` can modify filesystem metadata exposed by a filesystem service, but
  it cannot grant caps outside that service's policy.
- `/proc` is a debugging service view, not kernel ambient introspection.
- Device files exist only when a capability-backed adapter deliberately exposes
  them.

This is enough for many build tools and CLI programs without making POSIX the
security model.
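
The split between what legacy APIs report and what a process may do can be sketched as two independent fields. Everything here is hypothetical (`PosixProcess`, `CompatError`, the `tcp` cap name, the boolean standing in for a broker decision); the sketch only shows that a successful `setuid` changes the reported identity and leaves the cap set untouched.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
pub enum CompatError {
    /// The call needs a capability the process was never granted.
    MissingCap(&'static str),
    /// The compatibility broker declined the profile change.
    BrokerDenied,
}

/// Hypothetical compatibility-layer view of one POSIX process.
pub struct PosixProcess {
    /// Profile metadata: what `getuid` reports.
    reported_uid: u32,
    /// CapSet: what the process can actually do.
    caps: BTreeSet<&'static str>,
}

impl PosixProcess {
    pub fn new(reported_uid: u32, caps: &[&'static str]) -> Self {
        PosixProcess {
            reported_uid,
            caps: caps.iter().copied().collect(),
        }
    }

    pub fn getuid(&self) -> u32 {
        self.reported_uid
    }

    /// Replaces the reported identity only. `broker_approves` stands in
    /// for a round trip to the compatibility broker.
    pub fn setuid(&mut self, uid: u32, broker_approves: bool) -> Result<(), CompatError> {
        if !broker_approves {
            return Err(CompatError::BrokerDenied);
        }
        self.reported_uid = uid;
        Ok(())
    }

    /// Succeeds only if a socket provider cap was granted.
    pub fn connect(&self, _addr: &str) -> Result<(), CompatError> {
        if self.caps.contains("tcp") {
            Ok(())
        } else {
            Err(CompatError::MissingCap("tcp"))
        }
    }
}
```

A process that reports UID 0 after `setuid` still fails `connect` if it was never granted the socket cap, so "becoming root" inside the compatibility profile carries no authority by itself.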

### POSIX Session Caps

A normal POSIX shell session might receive:

```text
terminal      TerminalSession
session       UserSession metadata
profile       POSIX profile view
root          Directory or FileServer synthetic root
launcher      restricted ProcessSpawner/command launcher
pipeFactory   ByteStream factory
clock         Timer
```

Optional caps:

```text
tcp           scoped socket provider
home          writable user Directory
tmp           temporary Directory
proc          read-only process inspection tree
```

Administrative caps still require broker-mediated approval.

## Recovery Shell

A recovery shell is a separate policy profile, not the normal interactive
shell with hidden extra privileges. It may receive a larger cap set, but only
after strong local authentication and with full audit logging. Guest and
anonymous profiles must not fall into recovery authority by omission.

Possible recovery bundle:

```text
console
boot package read
system status read
service supervisor for critical services
read-only storage inspection
scoped repair caps
approval client
```

Destructive recovery operations should still go through exact-target approval.
The recovery shell should be local-only unless a separate remote recovery
policy explicitly grants network access.
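
"Must not fall into recovery authority by omission" suggests a selection function with no catch-all arm. The following is a sketch under assumptions: the `Profile` and `AuthState` types, the `Operator` variant, and the non-recovery bundles are invented for illustration; only the recovery bundle entries are taken from the list above.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Profile {
    Guest,
    Anonymous,
    Operator,
    Recovery,
}

#[derive(Debug, Clone, Copy)]
pub struct AuthState {
    pub strong_local_auth: bool,
    pub audit_logging: bool,
}

/// Picks the base cap bundle for a profile. The match has no wildcard
/// arm, so adding a new profile forces an explicit decision here.
pub fn base_bundle(
    profile: Profile,
    auth: AuthState,
) -> Result<Vec<&'static str>, &'static str> {
    match profile {
        // Hypothetical minimal bundles for the non-recovery profiles.
        Profile::Guest => Ok(vec!["terminal", "session"]),
        Profile::Anonymous => Ok(vec!["session"]),
        Profile::Operator => Ok(vec!["terminal", "session", "approval client"]),
        Profile::Recovery if auth.strong_local_auth && auth.audit_logging => Ok(vec![
            "console",
            "boot package read",
            "system status read",
            "service supervisor for critical services",
            "read-only storage inspection",
            "scoped repair caps",
            "approval client",
        ]),
        Profile::Recovery => {
            Err("recovery requires strong local authentication and audit logging")
        }
    }
}
```

A failed recovery request returns an error rather than silently degrading to some other bundle, and a guest request never reaches the recovery arm regardless of the authentication state passed in.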

## Required Interfaces

This proposal implies several service interfaces beyond the current smoke-test
surface:

- `UserSession` / `SessionManager`: principal/session metadata, audit context,
  and guest or anonymous profile creation
  ([user identity proposal](user-identity-and-policy-proposal.md)).
- `TerminalSession`: session-scoped interactive terminal I/O. The first
  boundary is line-oriented `write`, `writeLine`, and bounded `readLine`
  with per-call echo control and `submitted`/`cancelled`/`closed` outcomes;
  resize and paste framing can layer on later.
- `StdIO`: explicit text I/O capability serviced by the shell, a test harness,
  a web gateway, or another UI adapter. It has named `stdout`, `stderr`, and
  `status` streams plus `line`, `block`, and `hidden` read modes; it does not
  imply inherited POSIX file descriptors and should not be the semantic command
  interface for native interactive applications.
- `CommandSession`: generic interactive command surface for native
  applications. It describes command paths, nested subcommands, argument
  shapes, completions, prompts, redaction metadata, render events, and typed
  invocation results.
- `TerminalHost` / terminal entity: process and session object owning raw
  terminal transport, line discipline, presentation state, history, resize,
  and GUI/web framing while granting a foreground session to the shell.
- `SchemaRegistry`: maps interface IDs to method names and parameter schemas.
- `CommandRegistry`: optional registry of native command capabilities.
- `SystemStatus`: read-only process and service status.
- `LogReader`: scoped log access.
- `ServiceSupervisor`: restart/status authority for one service or subtree.
- `AuthorityBroker` / `ApprovalClient`: session-bound base bundles,
  plan-specific leased grants, and policy/authentication mediation.
- `CredentialStore`, `ConsoleLogin`, and `WebShellGateway`: boot-to-shell
  authentication services for password-verifier setup, passkey registration,
  federated OIDC login, and text terminal launch
  ([boot-to-shell proposal](boot-to-shell-proposal.md)).
- `OAuthClient`, `OidcIdentityProvider`, `TokenVerifier`,
  `WorkloadIdentityFederation`: OAuth2/OIDC primitives for federated
  login, outbound service authentication, and inbound resource-server
  token validation
  ([OIDC and OAuth2 proposal](oidc-and-oauth2-proposal.md)).
- `SshGateway`, `SshHostKey`, `AuthorizedKeyStore`, `SshTerminalFactory`,
  `TcpListenAuthority`, and `RestrictedShellLauncher`: production remote CLI
  terminal ingress, SSH host-key proof, public-key login mapping, scoped TCP
  listen authority, shell-only launch authority, and SSH-backed
  `TerminalSession` launch. The current development host-key proof exposes
  non-production public metadata and performs bounded fixture signing in QEMU;
  production host keys still require persistent key management
  ([SSH shell proposal](ssh-shell-proposal.md)).
- `AuditLog`: append-only record of plans, approvals, grants, and releases.
- `POSIXProfile` / compatibility broker: synthetic UID/GID, names, `$HOME`,
  cwd, and profile replacement without treating POSIX metadata as authority.
- `ByteStream` / pipe factory: explicit byte-stream composition for POSIX and
  selected native pipelines.

These should be ordinary capabilities. A shell only sees the subset it has
been granted.
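
As an illustration of the line-oriented `TerminalSession` boundary, the sketch below models bounded `readLine` with per-call echo control and the three outcomes. The Rust trait, the `ScriptedTerminal` test double, and the choice to truncate over-long input are assumptions made for this example; the real interface would be a schema-defined capability, and its over-length behavior is not specified here.

```rust
use std::collections::VecDeque;

/// The three outcomes named for bounded `readLine`.
#[derive(Debug, PartialEq)]
pub enum ReadOutcome {
    Submitted(String),
    Cancelled,
    Closed,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Echo {
    On,
    Off,
}

/// Hypothetical Rust rendering of the line-oriented boundary.
pub trait TerminalSession {
    fn write_line(&mut self, text: &str);
    fn read_line(&mut self, max_len: usize, echo: Echo) -> ReadOutcome;
}

/// Scripted input for the in-memory terminal below.
pub enum ScriptedInput {
    Line(String),
    Cancel,
}

/// In-memory test double: `screen` records everything made visible.
pub struct ScriptedTerminal {
    pub pending: VecDeque<ScriptedInput>,
    pub screen: Vec<String>,
}

impl TerminalSession for ScriptedTerminal {
    fn write_line(&mut self, text: &str) {
        self.screen.push(text.to_string());
    }

    fn read_line(&mut self, max_len: usize, echo: Echo) -> ReadOutcome {
        match self.pending.pop_front() {
            // No more input: the session has ended.
            None => ReadOutcome::Closed,
            Some(ScriptedInput::Cancel) => ReadOutcome::Cancelled,
            Some(ScriptedInput::Line(line)) => {
                // Assumption: over-long input is truncated to the bound.
                let bounded: String = line.chars().take(max_len).collect();
                if echo == Echo::On {
                    self.screen.push(bounded.clone());
                }
                ReadOutcome::Submitted(bounded)
            }
        }
    }
}

/// Prompts for a secret with echo disabled for this one call.
pub fn ask_secret(term: &mut dyn TerminalSession, prompt: &str) -> Option<String> {
    term.write_line(prompt);
    match term.read_line(128, Echo::Off) {
        ReadOutcome::Submitted(secret) => Some(secret),
        ReadOutcome::Cancelled | ReadOutcome::Closed => None,
    }
}
```

Because echo is a per-call argument, a login flow can hide one field without switching the whole terminal into a raw mode, and the caller has to handle `Cancelled` and `Closed` explicitly instead of inferring them from an empty string.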

## Implementation Plan

1. **Native serial shell**
   - Built on `capos-rt`.
   - Lists initial CapSet entries.
   - Invokes typed methods on the capabilities it was actually granted,
     including `TerminalSession` for ordinary interactive sessions.
   - When launched with a restricted launcher or other scoped spawn authority,
     spawns and waits on exact-grant children without assuming broad
     `BootPackage` or `ProcessSpawner` access.
   - Provides `caps`, `inspect`, `call`, `spawn`, `run`, `wait`, `release`, and
     `trace`.
   - Runs interactive applications as ordinary spawned commands or resident
     command sessions. `StdIO` requests may be serviced for text-stream
     programs, but native app commands should flow through structured command
     surfaces.

2. **Session-aware shell profile**
   - Use the `SessionManager -> UserSession metadata` and
     `AuthorityBroker(session, profile) -> cap bundle` split.
   - Add `self/session` introspection without making identity metadata
     authoritative.
   - Start with guest, local-presence, and service-account profiles before
     durable account storage exists.

3. **Structured native scripting**
   - Add typed variables, result-cap binding, and plan serialization.
   - Add schema registry support for method names and argument validation.
   - Add a generic command-surface parser so `command <args>` and nested
     subcommands compile to typed invocations without app-specific shell
     matches.
   - Add explicit byte-stream adapters for commands that need text streams.

4. **Approval broker**
   - Define `ActionPlan`, `ActionStep`, `CapRequest`, `ApprovalClient`,
     `ApprovalInbox`, `ApprovalEntry`, and leased grant records.
   - Add local authentication and audit logging.
   - Make administrative native-shell operations request scoped caps through
     the broker instead of running from a permanently privileged shell.
   - Wire `ApprovalInbox` into the operator session bundle so deferred,
     stepped-up, and multi-party approvals have a durable triage surface
     instead of relying on synchronous return-from-`request`.

5. **Boot-to-shell integration**
   - Add local console login/setup in front of the native shell.
   - Require a configured password verifier when one exists.
   - Enter setup mode when no console password verifier exists.
   - Treat guest as an explicit local profile and anonymous as a separate
     remote/programmatic profile, not as missing-password fallbacks.
   - Support passkey-only web terminal setup through local/bootstrap authority,
     not unauthenticated remote first use.
   - The local console login/setup half of this step has landed; the full
     boot-to-shell flow (durable multi-verifier accounts, passkey paths,
     federated OIDC login, web text shell gateway, production SSH shell
     gateway) is tracked in
     [boot-to-shell-proposal.md](boot-to-shell-proposal.md).

6. **Agent mode (out of scope here)**
   - Defined in [llm-and-agent-proposal.md](llm-and-agent-proposal.md):
     no separate "agent shell" process. The native shell, running in
     "agent mode", is the tool runner: it gains a `LanguageModel` client
     cap plus a per-tool permission table (`auto` / `consent` / `stepUp` /
     `forbidden`), exposes its own session caps as typed `ToolDescriptor`
     values to the model service, executes the model's tool calls against
     those caps, streams results back into the conversation, and keeps the
     user in the loop through consent prompts and interrupts. There is no
     `PlannerAgent` or static `ActionPlan` pipeline.

7. **POSIX shell**
   - Implement after `Directory`/`File`, `ByteStream`, and restricted process
     launch exist.
   - Start with `posix_spawn`, fd table emulation, cwd, scoped root, pipes, and
     terminal I/O, plus synthetic POSIX profile metadata.
   - Add broader compatibility only as real workloads demand it.
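
For the fd table emulation in step 7, a minimal sketch might look like the following. It assumes a hypothetical `Handle` enum standing in for cap references and covers only lowest-free allocation, `dup2`, and `close`; flags such as close-on-exec and any reference counting of the underlying caps are omitted.

```rust
use std::collections::BTreeMap;

/// Hypothetical local handle kinds; real entries would hold cap references.
#[derive(Debug, Clone, PartialEq)]
pub enum Handle {
    Terminal,
    File(String),
    PipeRead(u32),
    PipeWrite(u32),
}

/// Per-process fd table kept entirely in userspace.
pub struct FdTable {
    slots: BTreeMap<i32, Handle>,
}

impl FdTable {
    pub fn new() -> Self {
        FdTable {
            slots: BTreeMap::new(),
        }
    }

    /// Installs a handle at the lowest unused descriptor number,
    /// as POSIX `open` and `pipe` do.
    pub fn open(&mut self, handle: Handle) -> i32 {
        let mut fd = 0;
        while self.slots.contains_key(&fd) {
            fd += 1;
        }
        self.slots.insert(fd, handle);
        fd
    }

    /// Makes `new` refer to the same handle as `old`, replacing whatever
    /// `new` held before. Returns None if `old` is not open or `new` is
    /// negative.
    pub fn dup2(&mut self, old: i32, new: i32) -> Option<i32> {
        if new < 0 {
            return None;
        }
        let handle = self.slots.get(&old)?.clone();
        self.slots.insert(new, handle);
        Some(new)
    }

    /// Returns true if the descriptor was open.
    pub fn close(&mut self, fd: i32) -> bool {
        self.slots.remove(&fd).is_some()
    }

    pub fn get(&self, fd: i32) -> Option<&Handle> {
        self.slots.get(&fd)
    }
}
```

Redirecting stdout to a file is then `dup2(log_fd, 1)` on this table; nothing is inherited from the launcher unless the spawn plan put a handle into the child's table.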

## Non-Goals

- No global root namespace.
- No shell-owned root/admin bit.
- No model-visible secrets.
- No default inheritance of all shell caps into children.
- No authorization from `PrincipalInfo`, UID/GID, role, or label values alone.
- No promise that POSIX scripts observe exact Unix behavior without a
  compatibility profile that grants the needed caps.

## Open Questions

- Should the native shell syntax be CUE-derived, Cap'n-Proto-literal-derived,
  or a smaller custom grammar?
- How should schema reflection be packaged before a full runtime
  `SchemaRegistry` exists?
- How should later `TerminalSession` extensions such as resize and paste
  framing fit without exposing raw transport authority to ordinary shells?
- How should the broker fingerprint plans for `ApprovalInbox.batchDecide`
  shape-equivalence? A direct hash of `ActionPlan.steps` is enough for
  identical plans submitted by the same requester profile, but
  near-identical plans differing only in `requestId` or summary text
  must still batch; near-identical plans differing in step targets or
  attenuation must not. The broker design needs an explicit
  fingerprinting rule before `batchDecide` can be enabled.
- How should audit logs be stored before persistent storage exists?
- How should interactive terminal UX scale beyond the planned
  "one typed capability per command" native-shell surface? The current
  prototype only exposes narrow typed commands; the questions below apply
  to the proposed surface, not just what already runs. Several concrete
  pain points are open:
  - **Cap management is manual.** A shell user holds a CapSet and must
    `inspect`, name, attenuate, pass, and `release` caps explicitly per
    command. That is the right model for trust, but it is hostile for
    everyday work compared with a Unix prompt where `$PWD`, `$PATH`, open
    fds, and ambient credentials disappear from the user's mind. The
    question is what affordances (named bindings, scoped session
    "workspaces", broker-issued bundles bound to a task, auto-release on
    plan completion, undo/redo on cap moves, a visible "current authority"
    indicator) the shell should provide so the typical user is not
    hand-curating a cap graph for every line. None of this should
    re-introduce ambient authority; the goal is ergonomics over an already
    typed graph, not hiding it.
  - **No agreed convention for passing parameters to programs.** The
    manifest currently launches binaries with a named CapSet and no
    positional `args`, no `argv`, no environment block, and no structured
    parameter struct (see `system.cue` and `SystemManifest` in
    `schema/capos.capnp`); init's `ProcessSpawner`-driven children inherit
    only the caps named in the spawn plan. Shell `spawn ... with { ... }`
    syntax is similarly cap-only. That is consistent, but it leaves
    "what does this program need to know besides its caps?" unanswered:
    where do free-form values (a chat channel name, an adventure save
    slot, a resize width) live? Options range from a typed
    `LaunchParameters` capnp struct passed through the spawn plan, to a
    convention that every program declares a parameter schema discovered
    via `SchemaRegistry`, to letting parameters always travel as fields on
    the first method call against a `CommandSession`/service cap rather
    than at launch time. The proposal should pick a single shape and
    describe how the manifest, shell `spawn`/`run`, native applications,
    and POSIX `argv` adapters all map onto it.
  - **No replacement for Unix pipes.** The native composition example uses
    `|>` but defers byte-stream semantics to `ByteStream`/`StdIO`, which
    is a strictly weaker pipe and not a data-processing model. Real
    workloads on Unix lean on text streams precisely because they are
    cheap and structured enough; capOS can do better with typed records.
    The open question is whether to standardize a higher-level
    data-processing primitive — for example, YTsaurus-style map/reduce
    operators where each stage declares input and output schemas
    (`RecordStream<T>`?), the runtime negotiates a wire format
    (capnp records, framed JSON, columnar, raw bytes) at the boundary,
    and the shell's `|>` becomes a pipeline planner rather than a byte
    pump. That would give native shell pipelines first-class typed
    composition without making every interface look like `ByteStream`.
    The question is whether this belongs in shell scope, in a separate
    data-processing proposal, or as a `RecordStream` capability in the
    schema registry that the shell merely consumes.
  - **No story for ordinary shell programming constructs.** The proposed
    surface is one typed call per line plus `|>`; the prototype is even
    narrower. Real interactive and scripted use needs conditionals
    (branch on a cap call result, on `CapException` kind, on a value
    field), loops (iterate a `List`, fold a `RecordStream`,
    retry-with-backoff against a `Timer`), local
    variables and assignment beyond the implicit `$` from `|>`,
    user-defined functions/procedures that take typed parameters and
    capability arguments, early-return / break, and structured error
    handling that distinguishes transport-level `CapException` from
    application-level result variants. Each of these has capability-graph
    consequences that POSIX shells never had to face: does a function
    body close over the caller's CapSet by reference or by an explicit
    captured set, are caps bound inside a loop iteration auto-released
    at the end of that iteration, does a `try`/`recover` block release
    leased broker grants on the failure path, can a function be saved
    and re-invoked across sessions (i.e. does it become a persistent
    `ActionPlan` template), and how does the shell present a partial
    failure mid-pipeline without leaving orphan caps. The proposal
    should decide whether the native shell language defines these
    constructs itself, borrows them from a host language (CUE, a small
    embedded Rust-like DSL, an existing scripting runtime exposed as a
    capability), or stays deliberately non-Turing-complete and forces
    non-trivial control flow into spawned programs that expose typed
    `CommandSession` interfaces back to the shell.
  - **No environment-variable concept, and no clear replacement.** Unix
    `$VAR` / `export` does three jobs at once: ambient configuration
    inherited by every child, a per-process key-value scratchpad, and a
    side channel for caller-supplied tweaks (`PATH`, `LANG`, `TZ`,
    `HTTP_PROXY`, `XDG_*`). capOS deliberately has none of this — the
    manifest passes only a CapSet, and the shell does not synthesize a
    process-wide string-keyed table. There is also no obvious immediate
    need: configuration that should be authoritative belongs in a
    `Config` capability, locale/timezone are policy state on a session
    or service cap, and per-invocation tweaks fit the still-undecided
    parameter-passing convention above. The open question is whether
    capOS ever needs an explicit environment-like primitive (e.g. a
    `KeyValueScope` capability bound to a session, an inheritable
    structured "ambient context" attached to a spawn plan, or a typed
    `ConfigOverlay` channel) for the cases where Unix would have used an
    environment variable, or whether each historical use case should
    instead be replaced by a dedicated capability (`Locale`, `Clock`,
    `ProxyPolicy`, `XdgPaths`, `LogLevel`) and the absence of an
    environment table treated as a feature rather than a gap. POSIX
    compatibility still has to expose `getenv`/`environ`, but that is a
    separate per-process synthetic view inside the POSIX profile, not a
    native-shell concept.
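
To make the `batchDecide` fingerprinting question above concrete, here is one possible rule, offered as a sketch and not as a decision: hash the requester profile plus each step's target, method, and attenuation, and leave `requestId` and summary text out. The struct fields are assumptions about what `ActionPlan` and `ActionStep` might contain, and `DefaultHasher` is used only to keep the example std-only; it is not stable across Rust releases, so a real broker would need a fixed, collision-resistant digest.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical step shape: every field here affects authority.
#[derive(Hash)]
pub struct ActionStep {
    pub target: String,
    pub method: String,
    pub attenuation: String,
}

/// Hypothetical plan shape. `request_id` and `summary` are presentation
/// or bookkeeping fields and are excluded from the fingerprint.
pub struct ActionPlan {
    pub request_id: u64,
    pub summary: String,
    pub requester_profile: String,
    pub steps: Vec<ActionStep>,
}

/// Fingerprint over the authority-relevant fields only.
pub fn shape_fingerprint(plan: &ActionPlan) -> u64 {
    let mut hasher = DefaultHasher::new();
    plan.requester_profile.hash(&mut hasher);
    // Hashing the Vec covers step count, order, and every step field.
    plan.steps.hash(&mut hasher);
    hasher.finish()
}

/// Convenience constructor for a single-step restart plan.
pub fn restart_plan(request_id: u64, summary: &str, target: &str) -> ActionPlan {
    ActionPlan {
        request_id,
        summary: summary.to_string(),
        requester_profile: "operator".to_string(),
        steps: vec![ActionStep {
            target: target.to_string(),
            method: "restart".to_string(),
            attenuation: "once".to_string(),
        }],
    }
}
```

Under this rule, two restart plans for the same service with different request IDs and summaries share a fingerprint and may batch, while a plan aimed at a different target does not. Whether step order should matter, and whether attenuation needs canonicalization before hashing, remain open.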