Proposal: Interactive Command Surfaces
Typed command surfaces for native interactive applications without moving
application parsing into StdIO text streams.
Current Target Versus Future Design
The immediate target is deliberately narrower than this proposal:
- capos-shell exposes generic process control commands, including spawn for asynchronous launch and run for launch-and-wait.
- Chat and adventure clients are ordinary spawned commands, not shell builtins.
- Interactive child I/O uses an explicit StdIO endpoint client with stdin/stdout/stderr-shaped semantics while the shell keeps ownership of its TerminalSession.
- Focused QEMU smokes prove the resident-service plus shell-spawned-client path before the native command protocol hardens.
The future native design is the CommandSession/CommandSurface protocol
below. It should replace semantic command parsing inside chat/adventure
clients once the prototype has proved the process, grant, wait, and terminal
bridging mechanics.
Problem
The current chat/adventure worktree moved application commands out of
capos-shell builtins and into ordinary shell-spawned clients. That fixes one
bad boundary, but it leaves another one: the clients read lines from StdIO
and parse command text such as go north, take key, /join #lobby, and
say hello themselves.
That is still too stringly typed for capOS. The kernel and services already expose
typed capabilities. Native interactive applications should not receive their
primary operation as an unstructured terminal line and then rebuild an ad hoc
parser. StdIO is useful for textual programs, logs, compatibility layers,
and simple smoke harnesses. It is not the right semantic boundary for a native
application command language.
The other design pressure is terminal reuse. The same native shell should work from a local UART, GUI pane, web terminal, or test harness. That argues for a terminal host process that owns terminal transport and rendering separately from the shell process that owns command routing and capability context.
Goals
- Keep application-specific verbs out of capos-shell.
- Keep application command semantics out of unstructured StdIO text parsing.
- Let a user type familiar command forms such as go north or chat join #lobby while the executable representation is a typed invocation.
- Support nested subcommands without hardcoding app grammar into the shell.
- Let terminal hosts provide line editing, completion, history, resize, and GUI/web rendering from the same command metadata.
- Preserve typed service authority: parsing a command never grants access, and every effect still requires the right capability.
Non-Goals
- POSIX shell compatibility.
- A global command namespace.
- Making terminal text a security boundary.
- Removing StdIO; it remains the byte/text stream adapter for programs whose interface really is textual.
Layering
flowchart TD
    Uart[UART TerminalHost] --> Terminal[Terminal entity]
    Web[Web TerminalHost] --> Terminal
    Gui[GUI TerminalHost] --> Terminal
    Terminal --> Shell[Native shell session]
    Shell --> Cmd[Interactive CommandSession]
    Cmd --> Adventure[Adventure service cap]
    Cmd --> Chat[Chat service cap]
    Shell --> Launcher[Restricted launcher]
    Shell --> Broker[AuthorityBroker]
The terminal host owns raw input/output, line discipline, presentation state,
history, paste handling, resize events, and later GUI/web affordances. The
terminal entity is the session object the host exposes to a foreground shell or
application view. TerminalSession remains the capability boundary for a
foreground text session, but it does not have to be implemented inside the
shell.
The native shell owns command namespace, current capability context, spawn/wait state, and policy-mediated bundle changes. It can run from any terminal host because it talks to the terminal entity, not to a particular UART.
An interactive application owns a CommandSession. It exposes a command
surface and receives structured invocations. The application may be a thin
adapter over service capabilities, as the adventure client should be, or a
resident service may expose the command session directly.
Command Pattern
command <args> is acceptable as user-facing syntax, but it must not become
the application ABI. It is a parseable notation for a declared command surface.
The shell or terminal host parses text into a CommandInvocation; the
application receives typed fields.
Conceptual schema:
struct CommandSurface {
  revision @0 :UInt64;
  prompt @1 :Text;
  commands @2 :List(CommandSpec);
}

struct CommandSpec {
  path @0 :List(Text);
  summary @1 :Text;
  args @2 :List(CommandArg);
  flags @3 :List(CommandFlag);
  redaction @4 :List(RedactionClass);
}

struct CommandArg {
  name @0 :Text;
  kind @1 :CommandValueKind;
  required @2 :Bool;
  variadic @3 :Bool;
  restOfLine @4 :Bool;
  completions @5 :CompletionSource;
}

struct CommandInvocation {
  surfaceRevision @0 :UInt64;
  path @1 :List(Text);
  args @2 :List(CommandValue);
  flags @3 :List(CommandFlagValue);
}

interface CommandSession {
  describe @0 () -> (surface :CommandSurface);
  invoke @1 (command :CommandInvocation) -> (result :CommandResult);
  poll @2 (maxEvents :UInt16) -> (events :List(CommandEvent));
  close @3 () -> ();
}
The parser is generic:
- Match the longest declared command path.
- Parse arguments according to the declared shapes.
- Treat ambiguous prefixes as errors with alternatives.
- Treat restOfLine as one text argument; do not split it again in the app.
- Attach redaction metadata before audit or transcript recording.
- Re-read CommandSurface when a command returns a new revision.
The application can still reject a typed invocation if the command is no longer valid. That is ordinary semantic validation, not text parsing.
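The first and third parser rules above can be sketched as follows. This is a minimal illustration, not the capos-shell implementation: `CommandSpec` is trimmed to its path, and `spec`, `match_command`, and `MatchError` are hypothetical names. It also assumes that "ambiguous prefixes" refers to abbreviated path segments (for example `ta` for `take`), which the proposal does not spell out.

```rust
/// Hypothetical, trimmed-down stand-in for the conceptual CommandSpec:
/// only the declared path matters for matching.
#[derive(Debug, Clone, PartialEq)]
pub struct CommandSpec {
    pub path: Vec<String>,
}

/// Convenience constructor used by the examples.
pub fn spec(path: &[&str]) -> CommandSpec {
    CommandSpec { path: path.iter().map(|s| s.to_string()).collect() }
}

#[derive(Debug, PartialEq)]
pub enum MatchError {
    /// No declared command fits the typed tokens.
    Unknown,
    /// An abbreviation fits several commands; carries the alternatives.
    Ambiguous(Vec<String>),
}

/// True when every path segment equals (exact) or starts with (abbreviated)
/// the token in the same position.
fn segments_match(path: &[String], tokens: &[&str], exact: bool) -> bool {
    !path.is_empty()
        && path.len() <= tokens.len()
        && path.iter().zip(tokens).all(|(seg, tok)| {
            if exact { seg.as_str() == *tok } else { seg.starts_with(*tok) }
        })
}

/// Returns (index of the matched spec, number of tokens consumed by its path).
pub fn match_command(
    specs: &[CommandSpec],
    tokens: &[&str],
) -> Result<(usize, usize), MatchError> {
    // Exact matches win; among them the longest declared path is chosen.
    let exact = (0..specs.len())
        .filter(|&i| segments_match(&specs[i].path, tokens, true))
        .max_by_key(|&i| specs[i].path.len());
    if let Some(i) = exact {
        return Ok((i, specs[i].path.len()));
    }
    // Otherwise accept abbreviated segments, but only if the result is unique.
    let abbreviated: Vec<usize> = (0..specs.len())
        .filter(|&i| segments_match(&specs[i].path, tokens, false))
        .collect();
    let longest = match abbreviated.iter().map(|&i| specs[i].path.len()).max() {
        Some(n) => n,
        None => return Err(MatchError::Unknown),
    };
    let finalists: Vec<usize> = abbreviated
        .into_iter()
        .filter(|&i| specs[i].path.len() == longest)
        .collect();
    if finalists.len() == 1 {
        Ok((finalists[0], longest))
    } else {
        let alternatives = finalists.iter().map(|&i| specs[i].path.join(" ")).collect();
        Err(MatchError::Ambiguous(alternatives))
    }
}
```

With `chat`, `chat join`, `chat who`, `take`, and `talk` declared, `chat join #lobby` resolves to `chat join` with two tokens consumed, while `ta key` is rejected with `take` and `talk` as alternatives. The tokens left after the path are what the argument parser sees.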
Subcommand Nesting
Nested subcommands work if the command path is represented as a token list rather than a single string. Examples:
go north
take brass-key
say hello there
chat join #lobby
chat who
inventory equip lantern
admin npc spawn wanderer room=atrium
Those become:
path=["go"], args={direction:"north"}
path=["take"], args={item:"brass-key"}
path=["say"], args={text:"hello there"}
path=["chat","join"], args={channel:"#lobby"}
path=["chat","who"], args={}
path=["inventory","equip"], args={item:"lantern"}
path=["admin","npc","spawn"], args={kind:"wanderer", room:"atrium"}
The shell does not need adventure-specific code for any of these. It needs a
generic command tree, longest-prefix matching, value parsers, and completion
hooks. The same mechanism can describe shell commands such as spawn, wait,
login, and caps, even if the implementations remain inside the shell for
now.
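Once the path is matched, the remaining text is bound to the declared arguments. The sketch below shows one way a generic value parser could handle positional values, the `room=atrium` named form from the example above, and `restOfLine`. It is an assumption-laden illustration: `CommandArg` is reduced to three fields, `bind_args` and `ArgError` are hypothetical names, values stay as text instead of typed `CommandValue`s, and the name/value pairs mirror the notation of the examples rather than the positional list in the conceptual schema.

```rust
/// Hypothetical, trimmed-down stand-in for the conceptual CommandArg.
#[derive(Debug, Clone)]
pub struct CommandArg {
    pub name: &'static str,
    pub required: bool,
    pub rest_of_line: bool,
}

#[derive(Debug, PartialEq)]
pub enum ArgError {
    Missing(&'static str),
    Unexpected(String),
}

fn is_bound(bound: &[(String, String)], name: &str) -> bool {
    bound.iter().any(|(n, _)| n == name)
}

/// Binds the text that follows the command path to the declared arguments.
pub fn bind_args(
    decls: &[CommandArg],
    rest: &str,
) -> Result<Vec<(String, String)>, ArgError> {
    let mut bound: Vec<(String, String)> = Vec::new();
    let mut remaining = rest.trim();
    let mut next = 0; // index of the next positional declaration to fill
    while !remaining.is_empty() {
        // Skip declarations that were already bound by name.
        while next < decls.len() && is_bound(&bound, decls[next].name) {
            next += 1;
        }
        // A restOfLine argument swallows everything left, unsplit.
        if let Some(decl) = decls.get(next) {
            if decl.rest_of_line {
                bound.push((decl.name.to_string(), remaining.to_string()));
                break;
            }
        }
        // Otherwise take one whitespace-delimited token.
        let (token, tail) = match remaining.split_once(char::is_whitespace) {
            Some((token, tail)) => (token, tail.trim_start()),
            None => (remaining, ""),
        };
        remaining = tail;
        // Named form: key=value where key is a declared argument.
        if let Some((key, value)) = token.split_once('=') {
            if let Some(decl) = decls.iter().find(|d| d.name == key) {
                bound.push((decl.name.to_string(), value.to_string()));
                continue;
            }
        }
        // Positional form: fill the next unbound declaration.
        match decls.get(next) {
            Some(decl) => {
                bound.push((decl.name.to_string(), token.to_string()));
                next += 1;
            }
            None => return Err(ArgError::Unexpected(token.to_string())),
        }
    }
    for decl in decls {
        if decl.required && !is_bound(&bound, decl.name) {
            return Err(ArgError::Missing(decl.name));
        }
    }
    Ok(bound)
}
```

Under these assumptions `say hello there` binds `text` to the whole remainder, and `admin npc spawn wanderer room=atrium` binds `kind` positionally and `room` by name. Nothing in the function is specific to adventure or chat.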
Subcommand nesting is also a better fit for GUI/web sessions than raw StdIO.
A terminal host can render chat join as a command palette entry, offer room
completions for go, or show buttons for zero-argument commands such as
look, all from the same metadata.
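As a rough illustration of how a GUI or web host could derive widgets from the same metadata, the sketch below maps each declared command to a button or a form. The `Affordance` type, the `affordances` function, and the reduced `CommandSpec` fields are hypothetical and exist only for this example.

```rust
/// Hypothetical, trimmed-down stand-in for the conceptual CommandSpec.
#[derive(Debug, Clone)]
pub struct CommandSpec {
    pub path: Vec<&'static str>,
    pub arg_names: Vec<&'static str>,
}

/// What a GUI or web terminal host might render for one command.
#[derive(Debug, PartialEq)]
pub enum Affordance {
    /// Zero-argument commands can be invoked with a single click.
    Button { label: String },
    /// Commands with arguments need one input field per argument.
    Form { label: String, fields: Vec<String> },
}

/// Derives presentation hints purely from declared command metadata.
pub fn affordances(specs: &[CommandSpec]) -> Vec<Affordance> {
    specs
        .iter()
        .map(|spec| {
            let label = spec.path.join(" ");
            if spec.arg_names.is_empty() {
                Affordance::Button { label }
            } else {
                let fields = spec.arg_names.iter().map(|n| n.to_string()).collect();
                Affordance::Form { label, fields }
            }
        })
        .collect()
}
```

A UART host would ignore these hints and keep the plain line syntax; both views are driven by the same declared surface.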
Adventure Shape
The adventure command session should own only the caps it needs:
adventure   Adventure or Endpoint client cap
chat        Chat or Endpoint client cap
session     optional UserSession metadata cap
It should expose a dynamic surface derived from current player state:
- look
- go <direction> with room-specific direction completions
- take <item> with visible item completions
- drop <item> with inventory completions
- inventory
- say <text...> with restOfLine=true
- chat join <channel>
- chat who
- quit
The shell or terminal host parses those forms. The adventure command session
turns the resulting invocation into typed Adventure and Chat calls. The
adventure service still validates the session-bound caller identity, room,
exits, items, and chat channel authority. Dynamic completions are convenience,
not authority.
This is the balance capOS wants: generic shell integration, app-owned command metadata, typed service calls, and no application-specific shell builtins.
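The split between dynamic completions and service-side validation can be sketched like this. The types are hypothetical: `PlayerView` stands in for whatever state the command session reads, `dynamic_args` for the part of `describe` that fills completions, and `service_go` for the check the adventure service performs on its own authoritative state.

```rust
/// Hypothetical snapshot of player-visible state.
#[derive(Debug, Clone, PartialEq)]
pub struct PlayerView {
    pub exits: Vec<String>,
    pub visible_items: Vec<String>,
    pub inventory: Vec<String>,
}

/// Completion list for one argument of one command.
#[derive(Debug, PartialEq)]
pub struct DynamicArg {
    pub command: &'static str,
    pub arg: &'static str,
    pub completions: Vec<String>,
}

/// Command-session side: derive completions from the current view.
pub fn dynamic_args(view: &PlayerView) -> Vec<DynamicArg> {
    vec![
        DynamicArg { command: "go", arg: "direction", completions: view.exits.clone() },
        DynamicArg { command: "take", arg: "item", completions: view.visible_items.clone() },
        DynamicArg { command: "drop", arg: "item", completions: view.inventory.clone() },
    ]
}

#[derive(Debug, PartialEq)]
pub enum Rejected {
    NoSuchExit(String),
}

/// Service side: validate against authoritative state, never against
/// whatever the terminal host offered as a completion.
pub fn service_go(authoritative: &PlayerView, direction: &str) -> Result<(), Rejected> {
    if authoritative.exits.iter().any(|e| e == direction) {
        Ok(())
    } else {
        Err(Rejected::NoSuchExit(direction.to_string()))
    }
}
```

If the host still shows a stale completion such as `east` after the room changed, the invocation reaches the service and is rejected there, which is the behaviour the paragraph above asks for.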
Role of StdIO
StdIO remains useful, but it should be demoted to a transport and
compatibility interface:
- output streams for simple textual programs,
- test harnesses that script input and check transcript output,
- POSIX personality descriptor emulation,
- applications whose real protocol is text.
For capOS-native interactive applications, StdIO.read() should not be the
primary command interface. A command session can still emit render events that
the shell forwards to a terminal host, and a compatibility adapter can expose
the same session as text when necessary.
Terminal Host Separation
The shell should not permanently own the terminal implementation. A separate terminal host process gives the system one shell that can be reused across different front ends:
- local UART host for QEMU and early hardware,
- web host for browser terminal sessions,
- GUI host for a desktop pane or command palette,
- test host for smoke scripts.
Each host owns a terminal entity and grants a foreground TerminalSession or
equivalent view to the shell. The shell runs command sessions and returns
render/update events. The host decides how to display them.
This also avoids a future false choice between “shell owns the terminal” and “child process receives the terminal.” The terminal entity can support a foreground lease, shell-mediated command sessions, and later split panes or GUI widgets without making every child process a terminal driver.
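A small sketch of the host boundary, under the assumption that the shell hands the host a stream of render events: the shell-side function below only knows a `TerminalHost` trait, and each host decides how to present the events. `RenderEvent`, `TerminalHost`, `TranscriptHost`, `ByteHost`, and `forward_events` are hypothetical names, not existing capOS interfaces.

```rust
/// Hypothetical render events returned by the shell.
#[derive(Debug, Clone, PartialEq)]
pub enum RenderEvent {
    Line(String),
    Prompt(String),
}

/// The only thing the shell needs to know about a front end.
pub trait TerminalHost {
    fn display(&mut self, event: &RenderEvent);
}

/// Test host: records a transcript that a smoke script can assert on.
#[derive(Default)]
pub struct TranscriptHost {
    pub transcript: Vec<String>,
}

impl TerminalHost for TranscriptHost {
    fn display(&mut self, event: &RenderEvent) {
        match event {
            RenderEvent::Line(text) => self.transcript.push(text.clone()),
            RenderEvent::Prompt(prompt) => self.transcript.push(format!("{} ", prompt)),
        }
    }
}

/// UART-style host: renders the same events as bytes with CRLF line ends.
#[derive(Default)]
pub struct ByteHost {
    pub out: Vec<u8>,
}

impl TerminalHost for ByteHost {
    fn display(&mut self, event: &RenderEvent) {
        match event {
            RenderEvent::Line(text) => {
                self.out.extend_from_slice(text.as_bytes());
                self.out.extend_from_slice(b"\r\n");
            }
            RenderEvent::Prompt(prompt) => {
                self.out.extend_from_slice(prompt.as_bytes());
                self.out.push(b' ');
            }
        }
    }
}

/// Shell side: forwards events without knowing which host is attached.
pub fn forward_events(host: &mut dyn TerminalHost, events: &[RenderEvent]) {
    for event in events {
        host.display(event);
    }
}
```

Swapping `ByteHost` for `TranscriptHost` changes nothing in `forward_events`, which is the property that lets one shell serve several front ends.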
Migration Plan
- Land the current shell-spawned StdIO clients as an explicit prototype: no app-specific shell builtins, no terminal-cap delegation to children, and run available for blocking command execution.
- Add focused QEMU smokes for chat and adventure against that prototype so the resident service, exact grants, wait path, and terminal bridge have a stable regression target.
- Add a userspace CommandSession DTO/protocol in the shared demo/runtime layer, carried over ordinary Endpoint until a manifest-visible interface is worth committing.
- Teach capos-shell a generic command-surface parser and command-provider registry. Do not add chat, play adventure, go, take, or similar application verbs as hardcoded shell matches.
- Move adventure command parsing out of demos/adventure-client/ and into command descriptors plus typed Adventure/Chat invocations.
- Split terminal hosting from the shell when the local UART path needs to support a second front end or when the web terminal work starts. Until then, keep the current terminal implementation constrained to the TerminalSession boundary so the split is mechanical.