Proposal: Remote Session CapSet Clients
Let a regular host application connect to a capOS instance, authenticate through the same session machinery as shells and gateways, receive a broker-issued remote view of its CapSet, and invoke the granted capabilities over standard Cap’n Proto RPC. The first proof can be a Linux Rust CLI because it is easy to script, but the design is for host applications generally: native GUI apps, Tauri apps with Rust backends, server-side webapp gateways, desktop tools, and agent runners can all consume the same remote session CapSet model.
The important correction is that this is not a special “remote chat client” and not another shell transport. Chat, Paperclips, Adventure, system-info, command surfaces, and future service APIs should be ordinary capabilities in a remote session bundle. A shell is one possible client of that bundle; it is not the universal protocol.
Current State
The tree has two interop proofs:
- demos/capnp-chat-interop runs inside capOS, accepts one scoped TCP connection, decodes a schema-framed Chat.send parameter message, calls the resident chat endpoint, returns a schema-framed result, and exits.
  - The host harness uses a Linux Python script plus the pinned capnp tool to encode/decode request and result messages.
- demos/remote-session-capset-gateway runs inside capOS, listens through a manifest-scoped TcpListenAuthority on guest port 2327, authenticates a remote session through SessionManager, returns a broker-shaped remote CapSet view, calls session/system-info DTO operations, and proves wrong-interface, unknown-cap, and stale-session denials. It derives login source metadata from the accepted socket and a gateway-generated connection event id.
  - tools/remote-session-client is a regular Linux Rust client crate. Its library is UI-neutral so the same client logic can back a CLI harness, native GUI, Tauri backend, or trusted web gateway.
Those proofs are useful because they show external Cap’n Proto data can cross the QEMU TCP boundary and reach capOS-hosted services through narrowed listener caps. The remote-session proof is the first target-shaped slice, but it is not the final RPC API. It still lacks:
- standard capnp-rpc message transport;
- live typed RPC proxy objects rather than DTO-mediated gateway operations;
- endpoint-backed service proxy calls running in an authenticated per-session worker context;
- complete object lifetime and exception behavior;
- kernel/user-session logout, disconnect, and revocation propagation beyond the gateway-local stale flag;
- TLS/mTLS and expanded auth adapters beyond password and anonymous;
- resource accounting for remote references, in-flight calls, and result sizes.
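The resource-accounting gap in the last item can be made concrete with a small sketch. The following Rust models per-session counters for remote references, in-flight calls, and result sizes. It uses only the standard library; `SessionLimits`, `SessionAccounting`, `QuotaError`, and every limit value are hypothetical names chosen for illustration, not types from the capOS tree.

```rust
/// Hypothetical per-session limits; names and values are illustrative only.
#[derive(Debug, Clone, Copy)]
pub struct SessionLimits {
    pub max_remote_refs: usize,
    pub max_in_flight_calls: usize,
    pub max_result_bytes: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum QuotaError {
    TooManyRemoteRefs,
    TooManyInFlightCalls,
    ResultTooLarge,
}

/// Counters a per-session worker could keep for one remote client.
#[derive(Debug)]
pub struct SessionAccounting {
    limits: SessionLimits,
    remote_refs: usize,
    in_flight_calls: usize,
}

impl SessionAccounting {
    pub fn new(limits: SessionLimits) -> Self {
        Self { limits, remote_refs: 0, in_flight_calls: 0 }
    }

    /// Charge one exported proxy reference, or refuse at the limit.
    pub fn export_ref(&mut self) -> Result<(), QuotaError> {
        if self.remote_refs >= self.limits.max_remote_refs {
            return Err(QuotaError::TooManyRemoteRefs);
        }
        self.remote_refs += 1;
        Ok(())
    }

    /// Refund one reference when the client drops it.
    pub fn release_ref(&mut self) {
        self.remote_refs = self.remote_refs.saturating_sub(1);
    }

    /// Reserve a call slot, or refuse when too many calls are open.
    pub fn begin_call(&mut self) -> Result<(), QuotaError> {
        if self.in_flight_calls >= self.limits.max_in_flight_calls {
            return Err(QuotaError::TooManyInFlightCalls);
        }
        self.in_flight_calls += 1;
        Ok(())
    }

    /// Finish a call. The slot is always freed; oversized results are refused.
    pub fn finish_call(&mut self, result_bytes: usize) -> Result<(), QuotaError> {
        self.in_flight_calls = self.in_flight_calls.saturating_sub(1);
        if result_bytes > self.limits.max_result_bytes {
            return Err(QuotaError::ResultTooLarge);
        }
        Ok(())
    }
}
```

In this sketch a refused request leaves the counters unchanged, so a client that hits a limit can recover by dropping references or waiting for calls to finish.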
Goals
- Support a normal host client built and run outside capOS. A Linux Rust CLI is the smallest harness; native GUI and Tauri/webapp-backed clients should not need a different capOS protocol.
- Authenticate through capOS session/admission services, not through an application-specific service secret.
- Support multiple admission methods: local password where policy enables it, public-key signatures, OIDC/OAuth browser or device flows, passkey/WebAuthn through the web gateway path, mTLS client identity, guest/anonymous profiles where explicitly enabled, and future service/workload credentials.
- Return a live remote CapSet view whose entries are typed RPC client objects, not serialized local cap-table slots.
- Let the client call any granted remote-proxyable capability by name and expected interface ID.
- Support bidirectional session UI composition: a host UI can call capOS capabilities, and capOS-side services or agents can propose bounded changes to the host session’s panes, command palette, visualizations, density, theme, and workflow-specific controls through explicit UI capabilities.
- Keep local-only authority local: cap IDs, endpoint generations, receiver selectors, session-global identifiers, and kernel result-cap indexes never become portable remote authority.
- Preserve session-bound invocation context. Remote calls run under the gateway/worker session created for that remote client.
- Make logout, disconnect, transport breakage, session expiry, policy revocation, and object drop observable and fail closed.
Non-Goals
- General network transparency across arbitrary capOS hosts.
- OCapN compatibility or third-party handoffs.
- Browser JavaScript receiving capOS capability objects directly. A webapp may be a front end, but a trusted server, gateway, or Tauri Rust backend holds the remote CapSet.
- Letting capOS services execute arbitrary host UI code, inject unreviewed JavaScript/CSS, spoof trusted browser/desktop chrome, or persist UI changes outside the granted session UI scope.
- Replacing SSH, WebShellGateway, native shell, or interactive command surfaces.
- Exposing raw ProcessSpawner, raw network factories, broad storage roots, or key material as a default remote bundle.
- Treating password authentication as the only or preferred remote path.
- Serializing the kernel CapSet page or local cap table to the client.
Architecture
flowchart TD
    Client[Host app: CLI, GUI, Tauri, or web gateway] -->|TCP/TLS + capnp-rpc| Gateway[RemoteSessionGateway]
    Gateway --> Auth[Auth adapters]
    Auth --> Sessions[SessionManager]
    Gateway --> Broker[AuthorityBroker]
    Broker --> Worker[Per-session RPC worker]
    Worker --> RemoteCapSet[RemoteCapSet]
    RemoteCapSet --> Proxies[Remote capability proxies]
    Proxies --> LocalCaps[capOS capabilities]
    Worker --> Audit[AuditLog]
The remote listener is a trusted gateway. It accepts the transport, performs
or delegates authentication, obtains a UserSession, asks the broker for a
remote-client bundle, and hosts a per-session RPC vat. The vat exports a
RemoteSession object and remote proxy objects for capabilities in the
broker-issued bundle.
For the first implementation the per-session worker may be an ordinary capOS service process. That shape matches the session-bound invariant: one workload process has one immutable session context. A single long-lived gateway may handle pre-auth connection state, but post-auth capability invocation should run inside a worker whose session context is the authenticated remote session, or through an equivalently reviewable dispatch path that cannot mix unrelated user sessions as ambient authority.
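The admission path described above (authenticate, obtain a session, ask the broker for a bundle, then bind one worker to that session) can be sketched in plain Rust. This is a toy model under stated assumptions: the struct names mirror the diagram, but the fields, `grant`, `remote_bundle`, and `admit` are hypothetical, and capabilities are represented only by their names.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a capOS UserSession.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UserSession {
    pub principal: String,
    pub profile: String,
}

/// Toy broker: maps a profile name to the cap names in its remote bundle.
#[derive(Default)]
pub struct AuthorityBroker {
    bundles: BTreeMap<String, Vec<String>>,
}

impl AuthorityBroker {
    /// Hypothetical policy setup helper for this sketch.
    pub fn grant(&mut self, profile: &str, caps: &[&str]) {
        let caps = caps.iter().map(|c| c.to_string()).collect();
        self.bundles.insert(profile.to_string(), caps);
    }

    /// Unknown profiles get an empty bundle, never a broader default.
    pub fn remote_bundle(&self, session: &UserSession) -> Vec<String> {
        self.bundles.get(&session.profile).cloned().unwrap_or_default()
    }
}

/// One worker holds exactly one session for its whole lifetime.
pub struct SessionWorker {
    session: UserSession,
    caps: Vec<String>,
}

impl SessionWorker {
    pub fn session(&self) -> &UserSession {
        &self.session
    }

    pub fn cap_names(&self) -> &[String] {
        &self.caps
    }
}

pub struct RemoteSessionGateway {
    broker: AuthorityBroker,
}

impl RemoteSessionGateway {
    pub fn new(broker: AuthorityBroker) -> Self {
        Self { broker }
    }

    /// `authenticated` is `None` when the auth adapter rejected the proof.
    /// No session means no worker and no bundle.
    pub fn admit(&self, authenticated: Option<UserSession>) -> Option<SessionWorker> {
        let session = authenticated?;
        let caps = self.broker.remote_bundle(&session);
        Some(SessionWorker { session, caps })
    }
}
```

Because `SessionWorker` exposes no way to replace its session, calls from two unrelated users can never pass through the same worker in this model, which is the property the one-session-per-worker shape is meant to make reviewable.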
Bootstrap Interfaces
These schema sketches are proposal-level. Ordinals must be assigned from the checked-in schema when implemented.
enum RemoteAuthKind {
  password @0;
  publicKey @1;
  oidcDeviceCode @2;
  oidcAuthorizationCodePkce @3;
  passkey @4;
  mtlsClientCert @5;
  guest @6;
  anonymous @7;
  serviceCredential @8;
}

struct RemoteAuthMethod {
  kind @0 :RemoteAuthKind;
  label @1 :Text;
  profileHints @2 :List(Text);
  interactive @3 :Bool;
}

struct RemoteAuthStart {
  kind @0 :RemoteAuthKind;
  selector @1 :LoginSelector;
  requestedProfile @2 :Text;
  clientNonce @3 :Data;
  source @4 :LoginSourceMetadata;
}

struct RemoteAuthStep {
  prompt @0 :Text;
  redaction @1 :Bool;
  url @2 :Text;
  userCode @3 :Text;
  challenge @4 :Data;
  expiresAtMs @5 :UInt64;
}

interface RemoteSessionGateway {
  authMethods @0 () -> (methods :List(RemoteAuthMethod));
  start @1 (request :RemoteAuthStart) -> (flow :RemoteAuthFlow);
  guest @2 (requestedProfile :Text, source :LoginSourceMetadata)
      -> (session :RemoteSession);
  anonymous @3 (requestedProfile :Text, source :LoginSourceMetadata)
      -> (session :RemoteSession);
}

interface RemoteAuthFlow {
  next @0 (response :Data) -> (step :RemoteAuthStep, done :Bool,
                               session :RemoteSession);
  cancel @1 () -> ();
}

struct RemoteCapEntry {
  name @0 :Text;
  interfaceId @1 :UInt64;
  transferPolicy @2 :Text;
  leaseExpiresAtMs @3 :UInt64;
}

interface RemoteSession {
  info @0 () -> (info :SessionInfo);
  capSet @1 () -> (caps :RemoteCapSet);
  renew @2 (proof :Data, requestedDurationMs :UInt64)
      -> (session :RemoteSession);
  logout @3 () -> ();
}

interface RemoteCapSet {
  list @0 () -> (entries :List(RemoteCapEntry));
  get @1 (name :Text, expectedInterfaceId :UInt64) -> (cap :AnyPointer);
}
The AnyPointer result is proposal shorthand for an ordinary Cap’n Proto
capability pointer whose expected interface ID was already checked by the
gateway. Generated client helpers should immediately cast it to the requested
typed client. The remote client does not receive a numeric local capId,
endpoint selector, result-cap index, or session identifier it can replay
somewhere else.
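A client-side helper for the "cast immediately" rule might look like the following sketch. It does not use the real capnp or capnp-rpc crates: `TypedClient`, `UntypedCap`, `FakeRemoteCapSet`, `get_typed`, and the interface ID constants are all hypothetical stand-ins, and the fake CapSet runs in-process instead of over RPC. The point is only that the expected interface ID comes from the requested Rust type, so callers cannot forget to pass it or pass the wrong one.

```rust
use std::collections::HashMap;

/// Stand-in for an untyped capability pointer returned by `get`.
/// It carries no numeric capId or session identifier.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UntypedCap {
    interface_id: u64,
}

/// Hypothetical trait that generated typed clients would implement.
pub trait TypedClient: Sized {
    const INTERFACE_ID: u64;
    fn from_untyped(cap: UntypedCap) -> Self;
}

#[derive(Debug, PartialEq, Eq)]
pub enum GetError {
    UnknownCap,
    WrongInterface,
}

/// In-process fake of the gateway side, used only to drive the helper.
pub struct FakeRemoteCapSet {
    pub entries: HashMap<String, u64>,
}

impl FakeRemoteCapSet {
    pub fn get(&self, name: &str, expected: u64) -> Result<UntypedCap, GetError> {
        match self.entries.get(name) {
            None => Err(GetError::UnknownCap),
            Some(&id) if id != expected => Err(GetError::WrongInterface),
            Some(&id) => Ok(UntypedCap { interface_id: id }),
        }
    }
}

/// Fetch a cap by name and cast it to `T` in one step.
pub fn get_typed<T: TypedClient>(caps: &FakeRemoteCapSet, name: &str) -> Result<T, GetError> {
    caps.get(name, T::INTERFACE_ID).map(T::from_untyped)
}

/// Two illustrative typed clients with made-up interface IDs.
pub struct SystemInfoClient(UntypedCap);
pub struct ChatClient(UntypedCap);

impl TypedClient for SystemInfoClient {
    const INTERFACE_ID: u64 = 0x1111;
    fn from_untyped(cap: UntypedCap) -> Self {
        SystemInfoClient(cap)
    }
}

impl TypedClient for ChatClient {
    const INTERFACE_ID: u64 = 0x2222;
    fn from_untyped(cap: UntypedCap) -> Self {
        ChatClient(cap)
    }
}

impl SystemInfoClient {
    pub fn interface_id(&self) -> u64 {
        self.0.interface_id
    }
}

impl ChatClient {
    pub fn interface_id(&self) -> u64 {
        self.0.interface_id
    }
}
```

A call such as `get_typed::<ChatClient>(&caps, "system_info")` fails with `WrongInterface` in this model instead of handing back an object of the wrong type.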
Bidirectional UI Composition
A conventional GUI program opens a window and owns the controls inside it. A remote capOS session does not need to be that limited. The host app can expose a session-scoped UI host capability to capOS, and capOS-side services or agents can use that capability to propose a better interface for the current task:
- Paperclips can ask for counters, project controls, and status charts instead of printing lines.
- Chat can ask for a channel list, unread badges, and a message pane.
- Adventure can ask for a map pane, inventory slots, command buttons, and room transcript.
- A diagnostics agent can open log, metric, and trace panes side by side, highlight the relevant capability calls, and change density for a debugging session.
- A teaching or accessibility agent can request larger type, simplified controls, or a guided task layout for a particular session.
The authority is explicit and separate from service authority. Holding Chat
does not let a service rewrite the user’s UI. Holding RemoteUiHost or a
narrow UiSurface facet lets the service propose bounded UI changes for the
current remote session. The host app remains the compositor and policy
enforcer.
Conceptual shape:
enum UiPatchKind {
  openSurface @0;
  closeSurface @1;
  updateModel @2;
  setLayoutHint @3;
  setThemeHint @4;
  addCommand @5;
  removeCommand @6;
}

struct UiSurfaceSpec {
  surfaceId @0 :Data;
  title @1 :Text;
  kind @2 :Text;
  safetyClass @3 :Text;
  modelSchema @4 :UInt64;
}

struct UiPatch {
  kind @0 :UiPatchKind;
  surfaceId @1 :Data;
  payload @2 :Data;
  expiresAtMs @3 :UInt64;
}

struct UiEvent {
  surfaceId @0 :Data;
  command @1 :Text;
  payload @2 :Data;
  userInitiated @3 :Bool;
}

interface RemoteUiHost {
  open @0 (spec :UiSurfaceSpec) -> (surface :RemoteUiSurface);
  theme @1 (scope :Text, hints :Data) -> ();
}

interface RemoteUiSurface {
  apply @0 (patch :UiPatch) -> ();
  poll @1 (maxEvents :UInt16) -> (events :List(UiEvent));
  close @2 () -> ();
}
The payloads above should become typed structs before implementation. They are
shown as Data only to keep the sketch short. The important boundary is that
UI updates are declarative patches and typed view models, not arbitrary host
code. The host validates the requested surface kind, model schema, command
set, theme tokens, data size, update rate, and safety class before rendering
anything.
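A host-side validation pass over an incoming patch could be shaped roughly like this. The Rust below mirrors the `UiPatch` sketch with plain structs; `HostPolicy`, `SurfaceState`, `PatchDenied`, the check order, and the rule that a patch is expired once `expires_at_ms <= now_ms` are assumptions made for illustration. Model-schema and safety-class checks are omitted to keep it short.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UiPatchKind {
    OpenSurface,
    CloseSurface,
    UpdateModel,
    SetLayoutHint,
    SetThemeHint,
    AddCommand,
    RemoveCommand,
}

pub struct UiPatch {
    pub kind: UiPatchKind,
    pub surface_id: Vec<u8>,
    pub payload: Vec<u8>,
    pub expires_at_ms: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PatchDenied {
    UnknownSurface,
    KindNotAllowed,
    PayloadTooLarge,
    Expired,
    RateLimited,
}

/// Hypothetical host policy for one granted surface.
pub struct HostPolicy {
    pub allowed_kinds: Vec<UiPatchKind>,
    pub max_payload_bytes: usize,
    pub max_patches_per_window: u32,
}

/// Host-owned state for one open surface. The host resets
/// `patches_in_window` itself when a new rate window starts.
pub struct SurfaceState {
    pub surface_id: Vec<u8>,
    pub patches_in_window: u32,
}

/// Validate before rendering. Any failed check rejects the whole patch.
pub fn validate_patch(
    policy: &HostPolicy,
    surface: &mut SurfaceState,
    patch: &UiPatch,
    now_ms: u64,
) -> Result<(), PatchDenied> {
    if patch.surface_id != surface.surface_id {
        return Err(PatchDenied::UnknownSurface);
    }
    if !policy.allowed_kinds.contains(&patch.kind) {
        return Err(PatchDenied::KindNotAllowed);
    }
    if patch.payload.len() > policy.max_payload_bytes {
        return Err(PatchDenied::PayloadTooLarge);
    }
    if patch.expires_at_ms <= now_ms {
        return Err(PatchDenied::Expired);
    }
    if surface.patches_in_window >= policy.max_patches_per_window {
        return Err(PatchDenied::RateLimited);
    }
    // Only an accepted patch is charged against the rate window.
    surface.patches_in_window += 1;
    Ok(())
}
```

The host stays the compositor in this shape: the service only ever submits data, and nothing is drawn until `validate_patch` returns `Ok`.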
This is still a remote CapSet client model:
host UI holds RemoteSession + RemoteCapSet
host UI grants a narrow RemoteUiHost/RemoteUiSurface cap to a trusted worker
capOS service or agent sends declarative UI patches through that cap
host UI renders and sends typed user events back
service effects still require ordinary service caps
The direction is therefore bidirectional but not symmetric. The host app can call capOS service caps. capOS can shape the session UI only through UI caps the host granted. Neither side gains ambient authority over the other.
Safety rules:
- Host chrome, login prompts, origin indicators, permission prompts, and emergency reset controls are reserved. capOS-rendered surfaces cannot spoof them.
- UI patches are session-scoped. Persistent layout/theme changes require an explicit profile/settings cap or user confirmation.
- Theme and look/feel changes use bounded tokens or validated design-system variables, not raw CSS injection.
- UI command descriptors are data; executing a command still calls a typed capability under the current session policy.
- The user can close, reset, or pin surfaces against agent rearrangement.
- UI updates are quota-bound and auditable when they materially affect workflow, consent, disclosure, or action execution.
- Browser front ends keep raw capOS caps server-side or in a Tauri/native Rust backend. Browser JavaScript receives rendered state and sends user events; it does not hold RemoteCapSet entries.
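Two of the safety rules above, reserved host surfaces and bounded theme tokens, lend themselves to a short allowlist sketch. Everything named here is hypothetical: the token names, their allowed values, the reserved surface kind strings, and the `ThemeDenied` type are illustrative choices, not a capOS design-system definition.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ThemeDenied {
    UnknownToken(String),
    ValueNotAllowed(String),
}

// Hypothetical design-system tokens and their bounded value sets.
const DENSITY: &[&str] = &["compact", "comfortable", "spacious"];
const ACCENT: &[&str] = &["blue", "green", "amber"];
const TYPE_SCALE: &[&str] = &["100", "125", "150"];

// Hypothetical surface kinds reserved for the host itself.
const RESERVED_SURFACE_KINDS: &[&str] = &[
    "host-chrome",
    "login-prompt",
    "origin-indicator",
    "permission-prompt",
    "emergency-reset",
];

/// Look up the allowed values for a token, if the token exists at all.
fn allowed_values(token: &str) -> Option<&'static [&'static str]> {
    match token {
        "density" => Some(DENSITY),
        "accent" => Some(ACCENT),
        "type-scale" => Some(TYPE_SCALE),
        _ => None,
    }
}

/// Accept only known tokens with known values. Anything else, including
/// raw CSS smuggled in as a token or value, is refused.
pub fn validate_theme_hints(hints: &[(&str, &str)]) -> Result<(), ThemeDenied> {
    for &(token, value) in hints {
        let allowed = allowed_values(token)
            .ok_or_else(|| ThemeDenied::UnknownToken(token.to_string()))?;
        if !allowed.contains(&value) {
            return Err(ThemeDenied::ValueNotAllowed(value.to_string()));
        }
    }
    Ok(())
}

/// capOS-requested surfaces may not use a reserved kind.
pub fn surface_kind_allowed(kind: &str) -> bool {
    !RESERVED_SURFACE_KINDS.contains(&kind)
}
```

Validating against a closed value set, instead of sanitizing free-form strings, means the host never has to reason about what an unexpected value might do once it reaches the renderer.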
This is the broader version of the WebShell idea. A web shell can be more than a terminal emulator: it can be a session workspace whose composition is negotiated by the capabilities present in the session. The terminal remains one surface in that workspace, not the only surface.
Authentication And Admission
Authentication adapters all produce the same output: a UserSession plus
profile inputs for the broker. They differ only in how the proof is obtained
and verified.
- Password: maps to the existing SessionManager.login(method, selector, proof, source) path when remote password login is enabled by policy. It must use the existing credential failure/backoff/audit rules and must not be the only supported remote method.
- Public key: maps to SessionManager.sshPublicKey or a generalized signature-auth method. SSH userauth and raw remote RPC public-key auth can share account/key records, but the transcript bytes must be domain-separated by protocol and channel binding.
- OIDC/OAuth: device-code flow fits headless or CLI clients; authorization code + PKCE fits browser-assisted clients. The OAuth/OIDC service verifies ID tokens and maps external subjects through the user-identity admission model before SessionManager mints a session.
- Passkey/WebAuthn: belongs behind the web-authenticator path. A remote native client may open a browser or use a platform authenticator, but raw authenticator secrets never become capOS app data.
- mTLS client certificate: TLS client-auth can identify a principal or pseudonymous subject through certificate policy. Certificate identity is an admission input; the resulting CapSet still comes from the broker.
- Guest and anonymous: explicit policy profiles. They are not fallbacks for missing credentials and should receive short leases and narrow bundles.
- Service/workload credentials: future non-human clients can authenticate with OAuth client credentials, token exchange, mTLS, or signed workload assertions. They receive service-profile bundles, not human shell bundles.
Every method must record source metadata and protocol/channel binding appropriate to its transport. A successful proof selects a principal and session; it does not directly grant service authority.
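The domain-separation requirement in the public-key bullet can be illustrated with a length-prefixed transcript builder. This is a minimal sketch, not a capOS wire format and not the SSH userauth signature layout: the field order, the 32-bit big-endian length prefix, and both protocol labels are assumptions made for the example.

```rust
/// Append one field as a 32-bit big-endian length followed by the bytes.
/// Length prefixes keep adjacent fields from running into each other.
fn push_field(out: &mut Vec<u8>, field: &[u8]) {
    out.extend_from_slice(&(field.len() as u32).to_be_bytes());
    out.extend_from_slice(field);
}

/// Build the bytes a client key would sign for one login attempt.
/// `protocol_label` is a hypothetical per-protocol constant; starting the
/// transcript with it keeps a signature made for one protocol from being
/// valid in another, even when both share the same key record.
pub fn auth_transcript(
    protocol_label: &str,
    channel_binding: &[u8],
    client_nonce: &[u8],
    server_challenge: &[u8],
) -> Vec<u8> {
    let mut out = Vec::new();
    push_field(&mut out, protocol_label.as_bytes());
    push_field(&mut out, channel_binding);
    push_field(&mut out, client_nonce);
    push_field(&mut out, server_challenge);
    out
}
```

With the same nonce and challenge, `auth_transcript("capos-remote-rpc-v1", ..)` and `auth_transcript("capos-ssh-userauth-v1", ..)` produce different bytes, so a signature captured on one path does not verify on the other.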
Remote CapSet Semantics
A local process starts with a read-only CapSet page plus local cap-table
entries. A remote client instead receives a live RemoteCapSet object:
- list returns names, interface IDs, display metadata, and lease summaries.
- get returns a typed RPC capability pointer only if the name exists and the expected interface ID matches.
- The returned object is a proxy owned by the remote-session worker.
- Dropping the remote object releases the worker’s hold edge when no other remote references remain.
- Logout, expiry, revocation, disconnect, or worker shutdown breaks all session-bound proxy objects.
This is still an actual session bundle. It is not a copy of the kernel’s local CapSet ABI. The remote representation exists because a Linux process has no capOS ring page, no capOS CapSet mapping, and no local cap table.
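The hold-edge and teardown rules in the list above can be modelled with a small worker-side table. This sketch tracks only reference counts; real proxies, leases, and RPC release messages are out of scope. `WorkerCapSet`, `Denied`, `drop_ref`, and `end_session` are hypothetical names.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum Denied {
    UnknownCap,
    WrongInterface,
    SessionEnded,
}

struct Entry {
    interface_id: u64,
    remote_refs: usize,
}

/// Worker-side view of one session's remote CapSet.
pub struct WorkerCapSet {
    entries: HashMap<String, Entry>,
    ended: bool,
}

impl WorkerCapSet {
    pub fn new(granted: &[(&str, u64)]) -> Self {
        let entries = granted
            .iter()
            .map(|(name, id)| (name.to_string(), Entry { interface_id: *id, remote_refs: 0 }))
            .collect();
        Self { entries, ended: false }
    }

    /// Hand out one more remote reference. Returns the live count.
    pub fn get(&mut self, name: &str, expected: u64) -> Result<usize, Denied> {
        if self.ended {
            return Err(Denied::SessionEnded);
        }
        let entry = self.entries.get_mut(name).ok_or(Denied::UnknownCap)?;
        if entry.interface_id != expected {
            return Err(Denied::WrongInterface);
        }
        entry.remote_refs += 1;
        Ok(entry.remote_refs)
    }

    /// The client dropped one reference. Returns true when it was the
    /// last one, which is when the worker releases its hold edge.
    pub fn drop_ref(&mut self, name: &str) -> bool {
        match self.entries.get_mut(name) {
            Some(entry) if entry.remote_refs > 0 => {
                entry.remote_refs -= 1;
                entry.remote_refs == 0
            }
            _ => false,
        }
    }

    /// Logout, expiry, revocation, disconnect, or shutdown: break every
    /// proxy at once and refuse all later lookups.
    pub fn end_session(&mut self) {
        self.ended = true;
        for entry in self.entries.values_mut() {
            entry.remote_refs = 0;
        }
    }

    pub fn live_refs(&self, name: &str) -> usize {
        self.entries.get(name).map_or(0, |e| e.remote_refs)
    }
}
```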
Invocation Context
Remote capability calls should look like ordinary calls to the target service:
remote client call
-> capnp-rpc message
-> per-session worker proxy
-> local capOS capability call
-> target service sees the worker's live session context
The remote client cannot choose service-visible subject identity. Request fields are ordinary data. If a service needs subject details, it uses the existing subject-disclosure policy: explicit request plus a matching service-scoped disclosure grant. By default it receives only the opaque service-scoped caller-session reference used by the session-bound invocation model.
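The rule that request fields are ordinary data, while caller identity comes only from the worker, can be shown in a few lines. In this sketch `CallerSessionRef`, `RemoteRequest`, `LocalCall`, and `WorkerProxy` are hypothetical types; the opaque reference is modelled as a string fixed when the worker is created.

```rust
/// Opaque, service-scoped reference to the caller's session.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CallerSessionRef(String);

/// What arrives from the remote client: a method name and payload bytes.
/// Nothing in here is treated as identity.
pub struct RemoteRequest {
    pub method: String,
    pub body: Vec<u8>,
}

/// What the target service sees.
#[derive(Debug, PartialEq, Eq)]
pub struct LocalCall {
    pub caller: CallerSessionRef,
    pub method: String,
    pub body: Vec<u8>,
}

/// Proxy owned by one per-session worker for one target service.
pub struct WorkerProxy {
    session_ref: CallerSessionRef,
}

impl WorkerProxy {
    /// `opaque_ref` is assigned on the capOS side when the worker starts.
    pub fn new(opaque_ref: &str) -> Self {
        Self { session_ref: CallerSessionRef(opaque_ref.to_string()) }
    }

    /// Forward a call. The body passes through untouched as data; the
    /// caller field is always the worker's own session reference.
    pub fn forward(&self, request: RemoteRequest) -> LocalCall {
        LocalCall {
            caller: self.session_ref.clone(),
            method: request.method,
            body: request.body,
        }
    }
}
```

A client that sends `user=root` in the body changes only the body: the service still sees the worker's reference in `caller`, and decides for itself what, if anything, the payload text means.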
Error And Lifetime Model
The remote path keeps the existing error split:
- Cap’n Proto RPC transport errors and broken connections become RPC exceptions or disconnected promises.
- Proxy/worker infrastructure failures become CapException-like capability exceptions.
- Domain outcomes remain schema result fields or unions.
- A missing cap name, interface mismatch, denied profile, stale session, or revoked lease is an observable denial, not a silent fallback to a broader service.
Open promises must fail when the remote session logs out or the connection is closed. The worker must release local caps on every close path.
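One way to keep the error split visible in client code is to give each class its own type, as in the sketch below. The enum and variant names, the example `SendResult` domain outcome, and the `PendingCalls` table are hypothetical; the real mapping onto Cap'n Proto RPC exceptions is still to be designed.

```rust
/// Observable denials. Each one is reported, never papered over.
#[derive(Debug, PartialEq, Eq)]
pub enum Denial {
    UnknownCap,
    WrongInterface,
    ProfileDenied,
    StaleSession,
    LeaseRevoked,
}

/// Failures that are not domain outcomes.
#[derive(Debug, PartialEq, Eq)]
pub enum RemoteCallError {
    /// RPC transport error or broken connection.
    Transport(String),
    /// Proxy/worker infrastructure failure (CapException-like).
    Capability(String),
    /// Policy or lookup denial.
    Denied(Denial),
}

/// Example domain outcome: it travels in the schema result, so a full
/// channel is an `Ok` value, not an error.
#[derive(Debug, PartialEq, Eq)]
pub enum SendResult {
    Delivered,
    ChannelFull,
}

/// Calls that have been sent but not yet answered.
#[derive(Debug, Default)]
pub struct PendingCalls {
    next_id: u64,
    open: Vec<u64>,
}

impl PendingCalls {
    pub fn begin(&mut self) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.open.push(id);
        id
    }

    /// On logout or disconnect every open call fails; none is left hanging.
    pub fn fail_all(&mut self, reason: &str) -> Vec<(u64, RemoteCallError)> {
        self.open
            .drain(..)
            .map(|id| (id, RemoteCallError::Transport(reason.to_string())))
            .collect()
    }

    pub fn open_count(&self) -> usize {
        self.open.len()
    }
}
```

With this split a caller would handle `Result<SendResult, RemoteCallError>`: `Ok(SendResult::ChannelFull)` is something the application reacts to, while any `Err` means the call itself did not complete normally.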
Relationship To Shells And Gateways
Remote session CapSet clients are a peer of shell transports:
- Native shell: a local capOS process that uses its local CapSet and ring. It can later expose a schema-aware REPL over the same capabilities a remote client sees, but the remote client does not need to spawn a shell.
- SSH shell: a production CLI terminal transport. It authenticates and launches capos-shell with a TerminalSession. It should not become the only way for external programs to call typed services.
- WebShellGateway: browser terminal, webapp, and agent UI transport. Browser JavaScript must not receive raw capOS caps; the gateway can use the remote session CapSet model server-side and expose terminal frames, view models, command descriptors, or bounded tool requests to the browser. This is close to the same mental model as a “web shell”, except the shell is not the required protocol. The web UI can present service-specific controls over the same session CapSet, and capOS-side services can adjust the session workspace through UI composition caps.
- Tauri or desktop GUI: the Rust/native backend may hold the remote RemoteSession and typed capability clients, while the UI layer receives rendered state, command descriptors, and user-intent events. The UI layer should not receive replayable capOS authority as data. The backend may grant narrow UI-surface caps back to capOS services so they can propose adaptive layouts without gaining arbitrary desktop control.
- Agent shell: the agent runner holds session caps server-side and presents tool descriptors to the model. A hosted agent can use the same remote session bundle shape as long as actual capOS invocations remain in the trusted worker.
- Interactive command surfaces: command metadata can be one of the granted capabilities. A remote client can render command specs directly instead of scripting text through a shell.
Authority Rules
- The gateway receives scoped listener/TLS/auth/session/broker/audit authority, not raw broad network or spawn authority.
- Post-auth workers receive only the broker-issued remote-client bundle plus proxy lifecycle authority.
- Default remote bundles should be narrower than operator shell bundles.
- Raw ProcessSpawner, unrestricted NetworkManager, key-vault, credential store, broad account store, broad storage root, and host debug caps require explicit elevated policy.
- Remote proxyable caps must declare transfer/lifetime policy. Local-only caps may appear in a local shell CapSet without being exportable through RemoteCapSet.
- Capability names are lookup conveniences. Interface ID and broker policy define whether a returned object is usable for the requested type.
- Replayable handles are forbidden. Session IDs, grant IDs, endpoint metadata, object epochs, and proxy table positions are not bearer tokens.
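The export rules above amount to a filter the broker applies when it builds a remote bundle. The sketch below is one possible reading: `TransferPolicy`, `CapDecl`, the `elevated` flag, and `remote_bundle` are hypothetical, and real policy would be richer than a single boolean.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TransferPolicy {
    /// May appear in a local shell CapSet but is never exported.
    LocalOnly,
    /// May be proxied to a remote client.
    RemoteProxyable,
}

/// Declaration a capability would carry for the broker's benefit.
pub struct CapDecl {
    pub name: &'static str,
    pub interface_id: u64,
    pub transfer: TransferPolicy,
    /// True for caps such as raw ProcessSpawner that need elevated policy.
    pub elevated: bool,
}

/// Select the caps that may go into a remote bundle. A cap must be
/// declared remote-proxyable, and elevated caps additionally need the
/// session's policy to allow them.
pub fn remote_bundle(local: &[CapDecl], elevated_policy: bool) -> Vec<&'static str> {
    local
        .iter()
        .filter(|cap| cap.transfer == TransferPolicy::RemoteProxyable)
        .filter(|cap| !cap.elevated || elevated_policy)
        .map(|cap| cap.name)
        .collect()
}
```

Because the default is exclusion, a capability that forgets to declare a transfer policy would have to be modelled as `LocalOnly` for this filter to stay fail-closed.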
Design Grounding
- Session-Bound Invocation Context defines the one-session-per-process invariant and privacy-preserving endpoint caller-session metadata.
- User Identity and Policy defines principals, sessions, profiles, admission sources, renewal, and brokered CapSet minting.
- Boot to Shell defines the existing CredentialStore/SessionManager/AuthorityBroker path and non-password login directions.
- SSH Shell Gateway, Certificates and TLS, and OIDC and OAuth2 define public-key, TLS/mTLS, and federated admission inputs.
- libcapos-service defines the service lifecycle shape needed for listener loops, per-session context, shutdown, drain, and metrics.
- Interactive Command Surfaces defines typed command sessions that can be rendered by remote clients.
- Browser Capability and Agent Web Sessions defines browser-side authority boundaries and gateway mediation for web UI sessions.
- Language Models and Agent Runtime defines agent runners, tool proxies, and browser-agent UI orchestration boundaries.
- Cloudflare, Cap’n Proto, Workers RPC, and Cap’n Web grounds production object-capability RPC, live object bindings, and remote resource-exhaustion discipline.
- Spritely, OCapN, and CapTP grounds distributed object-capability lifetime, promise, reference, and handoff questions while staying non-binding for capOS wire compatibility.
Implementation Shape
The first implementation is deliberately small:
- Keep the existing capnp-chat-interop service and harness as the transport starting point, but rename the target outcome in planning docs to remote session CapSet interop. Done.
- Add generated Linux Rust bindings for the relevant schema subset. Done.
- Add a host client library that connects through QEMU user TCP. Done with a schema-framed DTO transport; replacing it with standard capnp-rpc framing and live proxy objects remains the next transport step.
- Add a capOS gateway that supports one policy-enabled auth method plus explicit guest/anonymous behavior. Done for password and anonymous, with disabled public-key, OIDC, and passkey/WebAuthn method entries advertised.
- Return remote session summary, CapSet list, and typed get metadata. Done as DTOs.
- Call at least two capabilities from the bundle. Done for session and system_info; endpoint-backed services such as chat remain blocked on the per-session worker proxy.
- Prove a missing cap, wrong interface ID, wrong profile, stale session, and logout path fail closed. Done for the focused proof; full disconnect, release, and revocation propagation remains future work.
- Replace the DTO transport with standard capnp-rpc, typed remote proxy objects, exception mapping, release/drop handling, and resource bounds.
- Add a separate UI-composition proof only after the basic session proof: grant a narrow test RemoteUiSurface, accept one declarative patch, send one typed user event back, and prove the service cannot spoof trusted chrome or persist layout state without the relevant cap.
Later slices can add more auth adapters, TLS, renewal, browser-assisted auth, service credentials, UI composition surfaces, promise pipelining, and distributed GC.