# Proposal: SSH Shell Gateway

Production remote shell access for capOS using SSH as a terminal transport
while preserving the native shell's capability boundaries.

## Status Split

Implemented:

- SSH-shaped authority prerequisites and fixture authentication proof:
  development-only sign-only host key, manifest-seeded authorized-key lookup,
  public-key session minting over fixture authentication bytes, unsupported
  feature policy/audit classification, restricted shell launcher, and a bounded
  host-local plain-TCP terminal-host proof.

Not implemented:

- encrypted SSH packet transport;
- OpenSSH-compatible key exchange and channel handling;
- full SSH userauth transcript validation;
- channel binding;
- `TerminalSessionFromByteStream` terminal-factory wiring;
- OpenSSH harness.

Do not infer OpenSSH-compatible remote login from the current "partially
implemented" status.

Remote and non-loopback deployment is blocked. The current proof uses
development/fixture key material and host-local plaintext wiring for bounded
authority checks; it is not a production SSH service. Before exposure beyond
loopback, the implementation must have encrypted SSH transport, production
host-key storage, durable authorized-key/account storage, full userauth
transcript validation, channel binding, audit records for auth and shell
launch, and a reviewed pre-auth/post-auth isolation story.

## Problem

The Telnet Shell Demo proves that a remote TCP connection can become a
`TerminalSession` without granting the shell raw network authority. That is
the right capability boundary, but Telnet is intentionally not a production
remote access path. It has no encryption, no host authentication, no replay
protection, no key-based user authentication, and no deployable security
story beyond "host loopback in QEMU."

capOS needs a production-oriented CLI remote shell that works with normal SSH
clients while avoiding the Unix mistake of treating an SSH login as a raw
remote root shell, ambient user id, inherited file descriptor set, or global
filesystem entry point.

The SSH path should be a terminal host and session authenticator. It should
not become a general-purpose privilege broker, TCP proxy, process supervisor,
or substitute for the native shell's capability model.

## Relationship To Telnet

SSH reuses the Telnet Shell Demo's core contract:

- A gateway accepts TCP connections.
- The gateway owns transport framing and terminal-host behavior.
- The spawned `capos-shell` receives a cap named `terminal` implementing
  `TerminalSession`.
- The shell receives the normal broker-issued shell bundle for the
  authenticated session.
- The shell does not receive raw `TcpSocket`, `NetworkManager`, listener,
  broad process-spawn, private-key, authorized-key-store, or host-key
  authority.

The transport changes. Telnet handles plaintext option negotiation over a
host-loopback QEMU forwarding rule. SSH handles version exchange, key
exchange, host-key proof, encrypted packet framing, user authentication,
session channels, PTY requests, window changes, shell requests, and clean
channel teardown.

The security boundary does not change. The shell still sees only a terminal
session and a scoped capability bundle.

The first SSH implementation milestone is still host-local development. It
should not silently inherit the Telnet demo's trusted gateway compromise.
Before implementation, the SSH path must either close the gateway authority gap
with scoped listener and shell-only launcher grants, or explicitly preserve
that gap in `REVIEW_FINDINGS.md` as a host-local-only compromise while still
proving that the spawned shell has no raw network, spawn, key, or SSH transport
authority.

Pre-auth and post-auth shell flows must not share broad process/address-space
authority for production exposure. Either split the authentication gateway and
post-auth shell launcher into separate processes with narrow handoff caps, or
produce a reviewable proof that the shared process cannot use pre-auth network,
key, listener, or parser state as post-auth shell authority.

## Scope

Initial SSH support is deliberately narrow:

- SSH-2 only, following the RFC 4251-4254 family at the protocol level.
- One interactive `session` channel per connection for the first proof.
- `pty-req`, `window-change`, `shell`, EOF, close, and disconnect handling.
- Public-key user authentication first.
- Fresh random material for key exchange, rekey, padding, session identifiers,
  and authentication challenges comes from `EntropySource` or a narrowed SSH
  transport-crypto service that owns `EntropySource`; it is never ambient
  process state.
- Password authentication only if it is wired to the existing
  `CredentialStore` failure/backoff path and policy explicitly enables it.
- No port forwarding, agent forwarding, X11 forwarding, SFTP, SCP, subsystem
  requests, exec requests, direct TCP forwarding, or arbitrary environment
  import in the first milestone.

Those excluded SSH features are not harmless defaults. In capOS they require
their own capabilities, policy, accounting, and audit records before exposure.

## Components

```mermaid
flowchart TD
    Client[SSH client] -->|TCP 22| Gateway[SshGateway]
    Gateway --> HostKey[SshHostKey cap]
    Gateway --> Keys[AuthorizedKeyStore]
    Gateway --> Sessions[SessionManager]
    Gateway --> Broker[AuthorityBroker]
    Gateway --> Launcher[RestrictedShellLauncher]
    Gateway --> Listen[TcpListenAuthority]
    Gateway --> Audit[AuditLog]

    Keys --> Sessions
    Sessions --> Broker
    Broker --> Bundle[Scoped shell bundle]
    Gateway --> Terminal[SSH-backed TerminalSession]
    Launcher --> Shell[capos-shell]
    Terminal --> Shell
    Bundle --> Shell
```

`SshGateway` is the only component exposed to the network. It is an ordinary
userspace service once the socket capability path can support it. During an
early implementation it may wrap the same in-kernel TCP capabilities used by
Telnet; a later decomposed-network stack should not change the shell contract.
The schema-level gateway contract is intentionally small: status and shutdown
methods identify the service surface without granting child shell authority.

`SshHostKey` is a sign-only private-key capability. It should be backed by
the `PrivateKey`/`KeyVault` model from
[cryptography-and-key-management-proposal.md](cryptography-and-key-management-proposal.md):
the gateway can sign the SSH exchange hash but cannot export private key
material, enumerate unrelated keys, or administer the vault.

`AuthorizedKeyStore` maps an SSH public key to a principal and authentication
policy. It stores public key material and policy metadata, not shell
authority. OpenSSH-format public keys are bytes imported into a verifier path,
matching the crypto proposal's `PublicKeyFormat.opensshWire` escape hatch for
public material. The initial schema returns an `SshAuthorizedKeyDecision` with
principal/profile metadata and an audit reason; actual shell authority still
comes from `SessionManager` and `AuthorityBroker`.

`TerminalSession` is backed by the SSH channel. The gateway translates channel
data, EOF, close, PTY mode, and window-size events into the terminal host
contract. The schema names this construction surface `SshTerminalFactory`;
it returns a result-cap index for the SSH-backed `TerminalSession`. Password
prompts, hidden echo, cancellation, and teardown stay at that boundary.

`TcpListenAuthority` is the scoped listener grant shape for this milestone. It
can mint only the configured `TcpListener` rather than exposing raw
`NetworkManager.createTcpListener` for arbitrary ports.
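The sketch below shows that scoped shape. It assumes a hypothetical `TcpListenAuthority` built with one configured address and port. `ListenSpec`, `ListenError`, and `mint` are illustrative names, not the capOS schema. A real cap would return a `TcpListener` capability rather than the spec value.

```rust
use std::net::Ipv4Addr;

/// Hypothetical description of the one listener the gateway may create.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ListenSpec {
    pub addr: Ipv4Addr,
    pub port: u16,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ListenError {
    /// The request names a different address or port than the grant.
    NotConfigured,
}

/// Hypothetical scoped grant. The configured listener is fixed when the
/// manifest builds the cap, so callers cannot pick arbitrary ports.
pub struct TcpListenAuthority {
    spec: ListenSpec,
}

impl TcpListenAuthority {
    pub fn new(spec: ListenSpec) -> Self {
        Self { spec }
    }

    /// Stand-in for minting a `TcpListener`: succeeds only for the configured spec.
    pub fn mint(&self, requested: ListenSpec) -> Result<ListenSpec, ListenError> {
        if requested == self.spec {
            Ok(self.spec)
        } else {
            Err(ListenError::NotConfigured)
        }
    }
}
```

Compared with `NetworkManager.createTcpListener`, the address and port are no longer caller inputs. They are properties of the grant itself.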

`RestrictedShellLauncher` is narrower than the transitional
`RestrictedLauncher`: it launches only the native shell against a supplied
terminal/session context instead of accepting an arbitrary binary name. The
current kernel source is manifest-declared as `restricted_shell_launcher`; it
adds the child `terminal`, `session`, and `stdio` grants itself and accepts
only named capability-sourced pass-through grants for the reviewed shell
startup bundle (`creds`, `sessions`, `audit`, `broker`, and optional
`system_info`). Before spawn it verifies the supplied `UserSession` profile
matches the requested profile, and the focused proof shows the spawned native
shell running under that supplied session.
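The launcher's pre-spawn checks might look roughly like the sketch below. `plan_shell_grants` and `LaunchError` are hypothetical names, and profiles are modeled as plain strings. The grant names come from the paragraph above. The function rejects a profile mismatch, caller-supplied names the launcher owns, unreviewed names, and duplicates. It then returns the child's grant names.

```rust
use std::collections::BTreeSet;

/// Grants the launcher adds itself; callers may not supply these names.
const LAUNCHER_OWNED: [&str; 3] = ["terminal", "session", "stdio"];
/// Reviewed pass-through names for the shell startup bundle.
const REVIEWED_PASSTHROUGH: [&str; 5] = ["creds", "sessions", "audit", "broker", "system_info"];

#[derive(Debug, PartialEq, Eq)]
pub enum LaunchError {
    ProfileMismatch,
    LauncherOwnedName(String),
    UnreviewedName(String),
    DuplicateName(String),
}

/// Hypothetical pre-spawn check. Returns the child's grant names: the
/// launcher-owned grants followed by the accepted pass-through names (sorted).
pub fn plan_shell_grants(
    requested_profile: &str,
    session_profile: &str,
    passthrough: &[&str],
) -> Result<Vec<String>, LaunchError> {
    // The supplied UserSession must already carry the requested profile.
    if requested_profile != session_profile {
        return Err(LaunchError::ProfileMismatch);
    }
    let mut accepted = BTreeSet::new();
    for &name in passthrough {
        if LAUNCHER_OWNED.contains(&name) {
            return Err(LaunchError::LauncherOwnedName(name.to_string()));
        }
        if !REVIEWED_PASSTHROUGH.contains(&name) {
            return Err(LaunchError::UnreviewedName(name.to_string()));
        }
        if !accepted.insert(name) {
            return Err(LaunchError::DuplicateName(name.to_string()));
        }
    }
    let mut grants: Vec<String> = LAUNCHER_OWNED.iter().map(|s| s.to_string()).collect();
    grants.extend(accepted.into_iter().map(str::to_string));
    Ok(grants)
}
```

Because the launcher never takes a binary name, a compromised caller can at most change which reviewed caps the shell receives. It cannot change what gets spawned.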

## Authority Model

The gateway receives only the capabilities required for its job:

- TCP listen authority for the configured SSH port, preferably as a
  manifest-declared `TcpListener` handoff or scoped listener factory rather
  than raw `NetworkManager`.
- Sign-only `SshHostKey` authority for configured host-key algorithms.
- Narrow `EntropySource` authority, or an `SshTransportCrypto` cap that owns
  entropy and exposes only SSH key-exchange, rekey, cipher/MAC, and random
  padding operations.
- Read or verify authority over `AuthorizedKeyStore`.
- `SessionManager` authority to mint a session after successful SSH
  authentication.
- `AuthorityBroker` authority to request the normal remote shell profile.
- Restricted shell launch authority scoped to `capos-shell`.
- Pass-through grants required by the current shell startup path, such as
  `creds`, `sessions`, `audit`, and `broker`, where policy permits them.
- `AuditLog` append authority for connection, authentication, launch, and
  teardown records.

In the production-shaped authority model, it does not receive:

- Broad `ProcessSpawner` authority.
- Raw `NetworkManager`, outbound `connectTcp`, or an arbitrary listener
  factory.
- Key export or `KeyVault` administrative authority.
- Storage namespace authority except the narrow public-key records required
  by `AuthorizedKeyStore`.
- SSH agent, port-forward, or subsystem authority unless later proposals add
  explicit caps for those surfaces.

A host-local development checkpoint may temporarily preserve raw
`NetworkManager`, arbitrary listener factory, or broad `ProcessSpawner`
authority in the gateway only if `REVIEW_FINDINGS.md` records the compromise
and the harness proves it does not cross the shell boundary. The spawned shell
must never receive raw `NetworkManager`, `TcpListener`, `TcpSocket`,
`ProcessSpawner`, SSH transport, host-key, authorized-key-store, key-vault, or
general-purpose entropy authority.

Identity metadata is not authority. A login name, SSH username, key
fingerprint, source IP, principal id, or profile label only becomes useful
after a trusted service returns a capability bundle.
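A harness could check the "never crosses the shell boundary" rule as a kind-based invariant over the child's grant table. This holds no matter what the gateway itself holds during a development checkpoint. The sketch assumes the harness can read each grant's name and interface kind. `CapKind` and `check_shell_grants` are hypothetical.

```rust
/// Hypothetical capability kinds a harness might read from a child's grant table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CapKind {
    TerminalSession,
    UserSession,
    CredentialStore,
    SessionManager,
    AuditLog,
    AuthorityBroker,
    SystemInfo,
    NetworkManager,
    TcpListener,
    TcpSocket,
    ProcessSpawner,
    SshTransport,
    SshHostKey,
    AuthorizedKeyStore,
    KeyVault,
    EntropySource,
}

impl CapKind {
    /// Kinds the spawned shell must never hold, even in a development checkpoint.
    pub fn forbidden_for_shell(self) -> bool {
        matches!(
            self,
            CapKind::NetworkManager
                | CapKind::TcpListener
                | CapKind::TcpSocket
                | CapKind::ProcessSpawner
                | CapKind::SshTransport
                | CapKind::SshHostKey
                | CapKind::AuthorizedKeyStore
                | CapKind::KeyVault
                | CapKind::EntropySource
        )
    }
}

/// Returns the first forbidden (name, kind) pair found in the shell's grants.
pub fn check_shell_grants<'a>(grants: &[(&'a str, CapKind)]) -> Result<(), (&'a str, CapKind)> {
    match grants.iter().find(|(_, kind)| kind.forbidden_for_shell()) {
        Some(&(name, kind)) => Err((name, kind)),
        None => Ok(()),
    }
}
```

Checking by kind rather than by name catches a forbidden cap that has been renamed to look like a reviewed grant.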

## Authentication

### Host authentication

The host key should be a narrow wrapper around a `PrivateKey` cap, constrained
to SSH host-key signing. Host keys are generated or imported through
`KeyVault`, opened through an explicit `SealPolicy`, and rotated through a
versioned host identity record. The gateway can sign the key exchange hash but
cannot export private material.

SSH transport keys are separate from the host key. Key exchange must use
fresh entropy and the algorithm policy selected for the deployment. The
baseline standards are RFC 4251-4254. Later SSH RFCs add extension
negotiation (RFC 8308), Ed25519/Ed448 public-key algorithms (RFC 8709), and
updated key-exchange method recommendations (RFC 9142), alongside other
updates recorded by the RFC Editor for the 4251-4254 family. The first
implementation should pin a small reviewed algorithm set rather than accepting
every algorithm a library exposes.

For development, a manifest-seeded host key may be acceptable only when the
manifest field, docs, and harness mark it as non-production. The current
development path uses `kernelParams.sshDevelopmentHostKey` with the required
label `capos-development-only-ssh-host-key` and the kernel source
`ssh_development_host_key`; the resulting cap exposes only public metadata and
signs bounded `ssh-ed25519` exchange hashes with the manifest seed for QEMU
proof. `make run-ssh-host-key` verifies the signature against the configured
public key, proves wrong-algorithm denial, and checks that the development seed
and raw signature are not printed to proof logs. For deployment, host keys need
persistent storage, rotation policy, key-management-backed signing, and audit.
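The two admission gates around the development host key could be sketched as follows. The label and algorithm strings come from the paragraph above. The 64-byte hash bound is an assumption, sized for a SHA-512 exchange hash. Function and error names are hypothetical. The Ed25519 signing itself happens behind the cap and is not shown.

```rust
/// Required manifest label for the development host key.
const DEV_HOST_KEY_LABEL: &str = "capos-development-only-ssh-host-key";
/// Only algorithm the development cap signs for.
const SUPPORTED_ALGORITHM: &str = "ssh-ed25519";
/// Assumed bound: 64 bytes covers a SHA-512 exchange hash.
const MAX_EXCHANGE_HASH_LEN: usize = 64;

#[derive(Debug, PartialEq, Eq)]
pub enum HostKeyError {
    NotDevelopmentLabel,
    WrongAlgorithm,
    EmptyHash,
    HashTooLong,
}

/// Manifest-time gate: refuse to build the cap unless the seed is marked development-only.
pub fn check_dev_host_key_label(label: &str) -> Result<(), HostKeyError> {
    if label == DEV_HOST_KEY_LABEL {
        Ok(())
    } else {
        Err(HostKeyError::NotDevelopmentLabel)
    }
}

/// Sign-time gate run before the Ed25519 signer (not shown) sees the hash.
pub fn admit_sign_request(algorithm: &str, exchange_hash: &[u8]) -> Result<(), HostKeyError> {
    if algorithm != SUPPORTED_ALGORITHM {
        return Err(HostKeyError::WrongAlgorithm);
    }
    if exchange_hash.is_empty() {
        return Err(HostKeyError::EmptyHash);
    }
    if exchange_hash.len() > MAX_EXCHANGE_HASH_LEN {
        return Err(HostKeyError::HashTooLong);
    }
    Ok(())
}
```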

### User public keys

Public-key login maps an accepted SSH public key to a principal record and
authentication strength. The key record should include:

```text
AuthorizedSshKey {
  keyId
  principalId
  publicKey
  algorithm
  fingerprint
  allowedProfiles
  sourcePolicy
  createdAtMs
  disabledAtMs
  comment
}
```

The current manifest-seeded prerequisites implement public key record loading,
generic authorization decisions, and a bounded session-mint bridge. The
`AuthorizedKeyStore` accepts `ssh-ed25519` records with 32-byte public keys and
SHA-256 fingerprints, rejects duplicate ids and fingerprints, maps principals
to existing seed accounts, and denies disabled records. `SessionManager`
accepts bounded fixture authentication bytes/signatures for configured keys and
mints `UserSession` metadata with `publicKey` authentication strength; the
focused `make run-ssh-public-key-auth` proof also shows `AuthorityBroker`
denying a mismatched shell profile.
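The load-time rules above can be modeled with a small in-memory store. This is a sketch, not the capOS implementation. Field names follow the `AuthorizedSshKey` record, but the Rust types are assumptions. The SHA-256 fingerprint is assumed to be precomputed by the loader. A record counts as disabled whenever `disabled_at_ms` is set.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone)]
pub struct AuthorizedSshKey {
    pub key_id: String,
    pub principal_id: String,
    pub algorithm: String,
    pub public_key: Vec<u8>,
    pub fingerprint: [u8; 32], // assumed precomputed SHA-256 fingerprint
    pub disabled_at_ms: Option<u64>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum LoadError {
    UnsupportedAlgorithm(String),
    BadKeyLength(String),
    UnknownPrincipal(String),
    DuplicateKeyId(String),
    DuplicateFingerprint(String),
}

#[derive(Debug, PartialEq, Eq)]
pub enum LookupError {
    Unknown,
    Disabled,
}

pub struct AuthorizedKeyStore {
    by_fingerprint: BTreeMap<[u8; 32], AuthorizedSshKey>,
}

impl AuthorizedKeyStore {
    /// Rejects the whole manifest on the first bad record (fail closed).
    pub fn load(records: Vec<AuthorizedSshKey>, seed_principals: &BTreeSet<String>) -> Result<Self, LoadError> {
        let mut ids = BTreeSet::new();
        let mut by_fingerprint = BTreeMap::new();
        for rec in records {
            if rec.algorithm != "ssh-ed25519" {
                return Err(LoadError::UnsupportedAlgorithm(rec.key_id));
            }
            if rec.public_key.len() != 32 {
                return Err(LoadError::BadKeyLength(rec.key_id));
            }
            if !seed_principals.contains(&rec.principal_id) {
                return Err(LoadError::UnknownPrincipal(rec.key_id));
            }
            if !ids.insert(rec.key_id.clone()) {
                return Err(LoadError::DuplicateKeyId(rec.key_id));
            }
            if by_fingerprint.contains_key(&rec.fingerprint) {
                return Err(LoadError::DuplicateFingerprint(rec.key_id));
            }
            by_fingerprint.insert(rec.fingerprint, rec);
        }
        Ok(Self { by_fingerprint })
    }

    /// Metadata lookup only; the caller still needs SessionManager and AuthorityBroker.
    pub fn lookup(&self, fingerprint: &[u8; 32]) -> Result<&AuthorizedSshKey, LookupError> {
        let rec = self.by_fingerprint.get(fingerprint).ok_or(LookupError::Unknown)?;
        if rec.disabled_at_ms.is_some() {
            return Err(LookupError::Disabled);
        }
        Ok(rec)
    }
}
```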

`SessionManager.sshPublicKey` consults the bootstrap `RamAccountStore` after
signature verification using `lookup_by_principal`. Non-`Active` account
statuses (Disabled, Locked, RecoveryOnly) and missing principals fail closed
before a session is minted, so a runtime account-store mutation cannot be
ignored by the SSH path even though authorized-key records carry their own
`disabledAtMs` flag. The bootstrap fallback (no account store wired) keeps the
seed-account validation contract: manifest validation guarantees every
authorized-key principal binds to an active seed account.
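The account gate described above reduces to a total mapping from lookup result to an allow or deny decision. In the sketch below, `AccountLookup` and `account_gate` are hypothetical names. `None` models the bootstrap fallback with no account store wired. The denial strings are the `auth=` codes listed in this section.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AccountStatus {
    Active,
    Disabled,
    Locked,
    RecoveryOnly,
}

/// Hypothetical result of `lookup_by_principal` against the account store.
pub enum AccountLookup {
    Found(AccountStatus),
    Missing,
    Failed,
}

/// Runs after signature verification and before any session is minted.
pub fn account_gate(store_result: Option<AccountLookup>) -> Result<(), &'static str> {
    match store_result {
        // Bootstrap fallback: manifest validation already bound every
        // authorized-key principal to an active seed account.
        None => Ok(()),
        Some(AccountLookup::Found(AccountStatus::Active)) => Ok(()),
        Some(AccountLookup::Found(AccountStatus::Disabled)) => Err("ssh-account-disabled"),
        Some(AccountLookup::Found(AccountStatus::Locked)) => Err("ssh-account-locked"),
        Some(AccountLookup::Found(AccountStatus::RecoveryOnly)) => Err("ssh-account-recovery-only"),
        Some(AccountLookup::Missing) => Err("ssh-account-missing"),
        Some(AccountLookup::Failed) => Err("ssh-account-lookup-failed"),
    }
}
```

The match has no wildcard arm, so adding a new account status forces a compile-time decision rather than defaulting to allow.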

Each denial path emits a stable `auth=` audit code (no schema variant change).
These codes form the SSH gateway's operator-visible audit contract:

- success: `ssh-public-key`;
- key and signature denials: `ssh-key-unknown`, `ssh-key-disabled`,
  `ssh-key-profile-not-allowed`, `ssh-bad-signature`;
- account denials: `ssh-account-missing`, `ssh-account-disabled`,
  `ssh-account-locked`, `ssh-account-recovery-only`,
  `ssh-account-lookup-failed`;
- profile and input denials: `ssh-profile-kind-invalid`,
  `ssh-profile-not-interactive`, `ssh-auth-bytes-invalid`.

Failed records keep `principal` and `profile` blank by policy. The `auth=` code
is the only discriminator, so failed-auth lines cannot be used as a side
channel to probe for valid principal IDs.
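The blank-on-failure rule can be enforced in one formatting function rather than at every call site. In this sketch, the enum and the `auth=... principal=... profile=...` line layout are hypothetical. Only the code strings come from the list above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SshAuthOutcome {
    Success,
    KeyUnknown,
    KeyDisabled,
    KeyProfileNotAllowed,
    BadSignature,
    AccountMissing,
    AccountDisabled,
    AccountLocked,
    AccountRecoveryOnly,
    AccountLookupFailed,
    ProfileKindInvalid,
    ProfileNotInteractive,
    AuthBytesInvalid,
}

impl SshAuthOutcome {
    /// Stable operator-visible code for each outcome.
    pub fn code(self) -> &'static str {
        use SshAuthOutcome::*;
        match self {
            Success => "ssh-public-key",
            KeyUnknown => "ssh-key-unknown",
            KeyDisabled => "ssh-key-disabled",
            KeyProfileNotAllowed => "ssh-key-profile-not-allowed",
            BadSignature => "ssh-bad-signature",
            AccountMissing => "ssh-account-missing",
            AccountDisabled => "ssh-account-disabled",
            AccountLocked => "ssh-account-locked",
            AccountRecoveryOnly => "ssh-account-recovery-only",
            AccountLookupFailed => "ssh-account-lookup-failed",
            ProfileKindInvalid => "ssh-profile-kind-invalid",
            ProfileNotInteractive => "ssh-profile-not-interactive",
            AuthBytesInvalid => "ssh-auth-bytes-invalid",
        }
    }
}

/// Hypothetical audit line layout; failures never echo principal or profile.
pub fn audit_line(outcome: SshAuthOutcome, principal: &str, profile: &str) -> String {
    let (principal, profile) = match outcome {
        SshAuthOutcome::Success => (principal, profile),
        _ => ("", ""),
    };
    format!("auth={} principal={} profile={}", outcome.code(), principal, profile)
}
```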

This is still not a complete SSH public-key authentication exchange: no SSH
transport transcript, channel binding, or terminal factory is wired
end-to-end. A bounded plain-TCP terminal-host proof now reuses the configured
key fixture to mint a public-key session and launch `capos-shell` through
`RestrictedShellLauncher`, but that proof is not an encrypted SSH transport or
OpenSSH userauth exchange. End-to-end QEMU proof of the
`ssh-account-disabled`/`ssh-account-locked` paths requires an
`AccountStoreManagerCap` kernel cap source so a demo can mutate account state
at runtime; that is tracked in the local-users management backlog and is not
required by the bounded host-local SSH gateway proofs.

Cloud metadata may seed initial authorized keys through the cloud-bootstrap
path, but those keys are input to `AuthorizedKeyStore`, not ambient login
authority. A metadata-provided key still needs an account/profile mapping and
should be auditable as cloud-seeded material.

### Passwords and step-up

Password authentication over SSH is optional and should be disabled unless
`CredentialStore` can enforce the same generic failure text, bounded backoff,
rate limits, and audit behavior as the local shell. Keyboard-interactive can
later drive step-up prompts, but it should not be the first implementation
unless a concrete policy needs it.

## SSH Channel Policy

The first gateway accepts only `session` channels that request an interactive
shell. It rejects:

- `exec` requests.
- `subsystem` requests such as SFTP.
- agent forwarding.
- TCP forwarding and reverse forwarding.
- X11 forwarding.
- environment variables except a small reviewed allow-list, if any.
- more than one active shell channel per connection.

Each rejected request should produce an SSH protocol failure plus an audit
record with a reason code. The audit record should not include command lines,
environment dumps, key material, or terminal content.

The current bounded policy surface is `capos-config::ssh_policy`. It allows
public-key auth, one session channel, PTY, window-change, and a first shell
request. It denies password auth (disabled), exec, subsystem/SFTP, direct TCP/IP,
TCP/IP forwarding and cancellation, agent forwarding, X11 forwarding,
environment import, second session-channel opens, and second shell channels.
Password auth has no policy allow path in this proof; it stays denied until a
real `CredentialStore` verifier, backoff, and audit path is wired into the
gateway. Denials return only a protocol failure class and a stable audit
reason code; request payloads such as command text and environment values are
not part of the decision data.
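The decision shape can be illustrated with a small per-connection state machine. This sketch is not the `capos-config::ssh_policy` API. `SshRequest`, `FailureClass`, `ConnectionPolicy`, and the reason strings are hypothetical. The design point it shows is that `decide` takes only the request kind, so command text or environment values cannot influence the outcome or leak into the reason.

```rust
/// Hypothetical request kinds; payloads (command text, env values) are not inputs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SshRequest {
    PasswordAuth,
    PublicKeyAuth,
    SessionOpen,
    PtyReq,
    WindowChange,
    Shell,
    Exec,
    Subsystem,
    DirectTcpip,
    TcpipForward,
    CancelTcpipForward,
    AgentForward,
    X11Forward,
    Env,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureClass {
    AuthFailure,
    ChannelOpenFailure,
    RequestFailure,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny { failure: FailureClass, reason: &'static str },
}

fn deny(failure: FailureClass, reason: &'static str) -> Decision {
    Decision::Deny { failure, reason }
}

/// Per-connection state for the "one session channel, one shell" rule.
#[derive(Default)]
pub struct ConnectionPolicy {
    session_open: bool,
    shell_started: bool,
}

impl ConnectionPolicy {
    pub fn decide(&mut self, req: SshRequest) -> Decision {
        use FailureClass::*;
        use SshRequest::*;
        match req {
            PublicKeyAuth | PtyReq | WindowChange => Decision::Allow,
            SessionOpen if !self.session_open => {
                self.session_open = true;
                Decision::Allow
            }
            Shell if !self.shell_started => {
                self.shell_started = true;
                Decision::Allow
            }
            PasswordAuth => deny(AuthFailure, "password-disabled"),
            SessionOpen => deny(ChannelOpenFailure, "second-session-channel"),
            Shell => deny(RequestFailure, "second-shell"),
            Exec => deny(RequestFailure, "exec-denied"),
            Subsystem => deny(RequestFailure, "subsystem-denied"),
            DirectTcpip => deny(ChannelOpenFailure, "direct-tcpip-denied"),
            TcpipForward | CancelTcpipForward => deny(RequestFailure, "tcpip-forward-denied"),
            AgentForward => deny(RequestFailure, "agent-forward-denied"),
            X11Forward => deny(RequestFailure, "x11-forward-denied"),
            Env => deny(RequestFailure, "env-denied"),
        }
    }
}
```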

## Implementation Slices

The final OpenSSH proof should not land as one opaque SSH server commit. Keep
the implementation reviewable by landing these slices in order:

1. **Version exchange.** A bootable `ssh-gateway` service accepts one
   host-local OpenSSH TCP connection, exchanges RFC 4253 identification
   strings, records only sanitized client software/version metadata, and
   disconnects before key exchange without launching a shell. The compatibility
   harness uses `/usr/bin/ssh`; malformed and overlong client identification
   strings are covered by a separate low-level hostile TCP/banner fixture.
2. **KEXINIT and algorithm selection.** Parse KEXINIT, select exactly one
   reviewed development algorithm set, and disconnect on unsupported
   algorithms. Algorithm names are transport policy inputs, not authority.
3. **Development key exchange.** Complete the host-local encrypted transport by
   deriving traffic keys from the negotiated KEX shared secret, exchange hash,
   and session id per RFC 4253. Entropy supplies ephemeral KEX material,
   padding, and challenges, not direct session-key bytes. Call
   `SshHostKey.signExchangeHash` and prove no private host-key or raw entropy
   material reaches logs or child shell grants.
4. **Public-key userauth.** Bind the OpenSSH public-key userauth transcript to
   `SessionManager.sshPublicKey`, accept the configured key, deny unknown keys
   generically, and keep password auth disabled until a real verifier/backoff
   path is wired.
5. **Channel policy.** Route session open, PTY, window-change, shell, exec,
   subsystem, forwarding, agent, X11, environment, and second-channel requests
   through `capos-config::ssh_policy`, producing protocol-visible failures and
   sanitized audit reason codes for denied features.
6. **SSH-backed terminal launch.** Replace the plain-TCP terminal-host proof
   with an SSH channel-backed `TerminalSession`, launch `capos-shell` through
   `RestrictedShellLauncher`, run `session`, `caps`, and `exit` via OpenSSH,
   and prove cleanup for both client disconnect and shell exit.
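For slice 1, RFC 4253 section 4.2 defines the identification line as `SSH-protoversion-softwareversion SP comments CR LF`. The whole line is limited to 255 characters including CR LF, and the software version excludes spaces and minus signs. The sketch below parses a client line under those rules. It keeps only the software version as sanitized metadata and drops the free-form comments. Function and error names are hypothetical. It accepts only protocol version `2.0`.

```rust
/// RFC 4253 4.2: maximum identification line length, including CR LF.
const MAX_IDENT_LEN: usize = 255;

#[derive(Debug, PartialEq, Eq)]
pub enum IdentError {
    TooLong,
    MissingCrLf,
    NonPrintable,
    BadPrefix,
    UnsupportedProtocol,
    BadSoftwareVersion,
}

/// Sanitized metadata kept from the client banner; comments are discarded.
#[derive(Debug, PartialEq, Eq)]
pub struct ClientIdent<'a> {
    pub software_version: &'a str,
}

pub fn parse_client_ident(line: &[u8]) -> Result<ClientIdent<'_>, IdentError> {
    if line.len() > MAX_IDENT_LEN {
        return Err(IdentError::TooLong);
    }
    if !line.ends_with(b"\r\n") {
        return Err(IdentError::MissingCrLf);
    }
    let body = &line[..line.len() - 2];
    // Only printable US-ASCII (space through '~') is accepted.
    if !body.iter().all(|b| (b' '..=b'~').contains(b)) {
        return Err(IdentError::NonPrintable);
    }
    // Printable ASCII is valid UTF-8, so this conversion cannot fail here.
    let text = std::str::from_utf8(body).map_err(|_| IdentError::NonPrintable)?;
    let rest = text.strip_prefix("SSH-").ok_or(IdentError::BadPrefix)?;
    let rest = rest.strip_prefix("2.0-").ok_or(IdentError::UnsupportedProtocol)?;
    // The software version ends at the first space; anything after is a comment.
    let software = rest.split(' ').next().unwrap_or("");
    if software.is_empty() || software.contains('-') {
        return Err(IdentError::BadSoftwareVersion);
    }
    Ok(ClientIdent { software_version: software })
}
```

Malformed, non-printable, and overlong banners are the inputs the separate hostile TCP/banner fixture would exercise.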

## Resource And Teardown Rules

SSH exposes several resource boundaries before the shell even starts:
handshake CPU, pending connections, packet buffers, channels, PTY state,
terminal buffers, authentication attempts, and live shell processes.

The gateway must have fixed per-connection bounds and fail closed when they
are exceeded. Disconnect, TCP close, SSH channel close, failed
authentication, session expiration, shell exit, and gateway teardown must all
release the same resources:

- accepted socket,
- SSH connection state,
- terminal session object,
- spawned shell handle,
- broker-issued grants,
- authentication challenge state,
- audit correlation record.

Shell exit should close the SSH channel. Client disconnect should close the
terminal and let the shell observe the normal `TerminalSession` close path.
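One way to make every teardown trigger release the same set is a single idempotent release function, also called from `Drop`. The sketch below uses plain integers as hypothetical stand-ins for the capability handles. The release order is an assumption: the terminal closes first so a live shell observes the normal close path, and the audit correlation goes last so the final record can still reference it.

```rust
/// Hypothetical per-connection resource set; `None` means released or never acquired.
#[derive(Default)]
pub struct ConnectionResources {
    pub socket: Option<u32>,
    pub ssh_state: Option<u32>,
    pub terminal: Option<u32>,
    pub shell: Option<u32>,
    pub broker_grants: Vec<u32>,
    pub auth_challenge: Option<u32>,
    pub audit_correlation: Option<u32>,
}

impl ConnectionResources {
    /// Called for every trigger (disconnect, TCP close, channel close, failed
    /// auth, expiry, shell exit, gateway teardown). Returns what it released;
    /// a second call releases nothing.
    pub fn teardown(&mut self) -> Vec<&'static str> {
        let mut released = Vec::new();
        if self.terminal.take().is_some() {
            released.push("terminal");
        }
        if self.shell.take().is_some() {
            released.push("shell");
        }
        if !self.broker_grants.is_empty() {
            self.broker_grants.clear();
            released.push("broker-grants");
        }
        if self.auth_challenge.take().is_some() {
            released.push("auth-challenge");
        }
        if self.ssh_state.take().is_some() {
            released.push("ssh-state");
        }
        if self.socket.take().is_some() {
            released.push("socket");
        }
        if self.audit_correlation.take().is_some() {
            released.push("audit-correlation");
        }
        released
    }
}

impl Drop for ConnectionResources {
    /// Gateway teardown or a panic unwinding the connection still releases everything.
    fn drop(&mut self) {
        let _ = self.teardown();
    }
}
```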

## Exit Criteria

The first SSH milestone is complete when:

- `SshGateway`, host-key, authorized-key, and SSH-backed terminal contracts
  are documented in schema/design form.
- The development host-key path is available only through an explicitly
  non-production manifest field and a narrow `SshHostKey` cap; production
  signing remains blocked on key management and persistent storage.
- A manifest can start an SSH gateway with only scoped TCP listen, host-key,
  authorized-key, session, broker, audit, and restricted shell-launch grants,
  or the remaining host-local demo compromise is explicitly preserved in
  `REVIEW_FINDINGS.md`.
- The gateway accepts a normal OpenSSH client on a host-local QEMU forwarded
  port, authenticates one public key, spawns `capos-shell` with a
  `TerminalSession`, runs one command, and disconnects cleanly.
- The harness proves denied password login when disabled, denied port
  forwarding, denied subsystem requests, rejected unknown keys, and cleanup
  after client disconnect.
- The harness proves unavailable entropy or disabled KEX algorithms fail
  closed before authentication or shell launch.
- Documentation states which parts are development-only and which are
  acceptable for production deployment.

## Dependencies

- Telnet Shell Demo for socket-backed `TerminalSession` proof.
- `TerminalSessionFromByteStream` as a shared prerequisite for SSH channel and
  TLS/mTLS-backed remote terminals. SSH channel data is not a connected
  `TcpSocket`; it must enter the same terminal factory used by
  Telnet-over-TLS so line discipline, echo policy, IAC handling where relevant,
  close semantics, and hidden password behavior do not fork by transport.
- Cryptography and key-management primitives for sign-only host keys.
- `EntropySource` or a narrowed SSH transport-crypto service for key exchange,
  rekey, packet padding, and challenge freshness.
- User identity, account, and session policy records for
  `AuthorizedKeyStore` principal/profile mapping.
- System-monitoring audit records for remote authentication, denied SSH
  features, launch decisions, and teardown.
- Resource accounting for connection, channel, and shell-process limits.
- Persistent storage before production host keys and authorized keys can
  survive reboot safely.

Remote-shell ingress should land in this order:

1. `TerminalSessionFromByteStream` and shared terminal line/echo/hidden-input
   discipline.
2. A transport-neutral byte-stream terminal factory used by both SSH channel
   data and TLS/mTLS cleartext byte streams.
3. Either Telnet-over-TLS or SSH may land first, but neither should fork
   terminal semantics.
4. Production deployment profile chooses SSH for familiar operator CLI access
   and TLS/mTLS for PKI-integrated service/operator environments.

No more SSH terminal transport work should land until the shared prerequisite
exists and has proof coverage for byte-identical hidden password behavior,
line/IAC factoring, and repeated close/reconnect behavior.

## Grounding

This proposal relies on these in-tree design documents and research notes:

- [Networking](networking-proposal.md) for the Telnet Shell Demo and TCP
  capability path.
- [Shell](shell-proposal.md) for the `TerminalSession` boundary.
- [Boot to Shell](boot-to-shell-proposal.md) for `CredentialStore`,
  `SessionManager`, `AuthorityBroker`, and `EntropySource`.
- [Cryptography and Key Management](cryptography-and-key-management-proposal.md)
  for `PrivateKey`, `PublicKeyFormat.opensshWire`, `KeyVault`, and
  `SealPolicy`.
- [User Identity and Policy](user-identity-and-policy-proposal.md) for
  principal/account/session/profile semantics.
- [Resource Accounting and Quotas](resource-accounting-proposal.md) for
  listener, socket, channel, packet-buffer, and shell-process bounds.
- [System Monitoring](system-monitoring-proposal.md) for audit record shape
  and retention boundaries.
- [Storage and Naming](storage-and-naming-proposal.md) for the capability-native
  storage model needed before production host keys and authorized keys become
  durable.
- [Trust Boundaries](../security/trust-boundaries.md) for remote-shell ingress
  review criteria.
- [Local Users Management Backlog](../backlog/local-users-management.md) for
  account, role, and RAM-store sequencing that feeds authorized-key principal
  mapping.
- [Genode Research](../research/genode.md) for the session-factory precedent:
  clients request narrowed sessions from authority-bearing components instead
  of receiving broad factories directly.
- [Pingora Research](../research/pingora.md) for the listener/service/runtime
  split that informs keeping TCP listener setup separate from application
  shell authority.

External standards grounding starts from RFC 4251, RFC 4252, RFC 4253, and
RFC 4254. Later SSH algorithm and extension updates, including RFC 8308, RFC
8709, and RFC 9142, must be checked when choosing the implementation's
accepted algorithm set.

## Non-Goals

- Replacing the native shell with a POSIX shell.
- Treating SSH username or Unix UID as authority.
- Ambient home directories, inherited file descriptors, or global paths.
- SSH agent forwarding as a shortcut to key authority.
- SFTP/SCP as a storage API before scoped file/storage capabilities exist.
- Port forwarding before explicit network-proxy capabilities and policy exist.