Proposal: libcapos-service
Define a userspace service framework above capos-rt for long-running capOS
services. The library should provide common lifecycle, endpoint, readiness,
shutdown, context, metrics, and budgeting mechanics without adding a generic
kernel Service capability or a kernel-level phase machine.
The immediate target is terminal/networking lifecycle: byte-stream terminal hosting, Telnet/TLS/SSH gateway plumbing, listener accept loops, shell launch, proxying, cleanup, and observable shutdown. HTTP/fetch services come later.
Problem
Current services duplicate the same shape:
- discover bootstrap caps;
- wait for dependencies;
- mark readiness through log output or implicit behavior;
- run accept or endpoint receive loops;
- spawn children or proxy byte streams;
- release result caps and temporary state;
- log or count failures;
- shut down after EOF, error, process exit, or supervisor request.
Duplicating that lifecycle is tolerable for proofs, but it is a poor foundation for production gateway, storage, agent, monitoring, and network services. Repeated hand-rolled loops are also where capability leaks, stuck children, incorrect close ordering, and hidden unbounded work appear.
Layering Decision
The stack remains:
schema/capos.capnp
  stable authority-bearing interfaces
capos-rt
  raw CapSet/ring/typed-handle transport, result caps, release, exceptions
libcapos-service
  userspace lifecycle and endpoint/service helpers
domain libraries
  terminal host, HTTP/fetch, storage, supervisor, agent tools
init/supervisors
  compose services by passing capabilities
libcapos-service is not a new authority source. It wraps and narrows
capabilities the process already holds. The kernel still sees ordinary typed
capability calls and ordinary process lifecycle.
Core Surface
Initial framework pieces:
- Service lifecycle: initialize, dependency wait, ready, run, drain, shutdown, and final cleanup.
- Endpoint serve loops: generated or handwritten helpers for RECV, decode, dispatch, RETURN, exception return, cancellation, and release.
- Readiness handles: typed local handles or service-exported readiness caps, not global service names.
- Shutdown and drain: cancellable waits, child/process-handle cleanup, listener stop, in-flight request drain, bounded force-close.
- Background tasks: timers, periodic health checks, metrics export, and discovery loops with explicit cancellation.
- Request/session context: owned context object per request or session containing caller-session metadata, derived policy, resource reservations, transfer state, timing, and audit correlation.
- Metrics hooks: bounded counters and summaries; no unbounded per-user, per-cap-id, or per-method labels by default.
- Resource budgeting: reservation/donation hooks that call into the relevant ledger owner; the framework records what was reserved and releases it on every exit path.
- Error boundary: preserve the error-handling split from error-handling-proposal.md: CQE status for transport/kernel dispatch failure, CapException for capability infrastructure failure, and schema result unions for normal domain outcomes.
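The three-layer split in the error-boundary item can stay visible in helper return types instead of being flattened into one error enum. The following is a minimal sketch under stated assumptions: `TransportError`, the fields of `CapException`, `OpenResult`, `CallResult`, and `describe` are hypothetical names for illustration and are not taken from capos-rt or the schema.

```rust
/// Layer 1: transport/kernel dispatch failure, reported as a CQE status.
/// Hypothetical shape; the real capos-rt type may look different.
#[derive(Debug, PartialEq)]
pub struct TransportError {
    pub cqe_status: i32,
}

/// Layer 2: capability infrastructure failure.
/// The `reason` field is an assumption made for this sketch.
#[derive(Debug, PartialEq)]
pub struct CapException {
    pub reason: String,
}

/// Layer 3: an example schema result union for normal domain outcomes.
#[derive(Debug, PartialEq)]
pub enum OpenResult {
    Opened { session_id: u64 },
    Denied,
}

/// Nested results keep the layers distinct: the outer error is transport,
/// the inner error is a capability exception, and the payload is the union.
pub type CallResult<T> = Result<Result<T, CapException>, TransportError>;

/// Classifies a call result without collapsing the three layers.
pub fn describe(result: &CallResult<OpenResult>) -> &'static str {
    match result {
        Err(_) => "transport failure",
        Ok(Err(_)) => "capability exception",
        Ok(Ok(OpenResult::Denied)) => "domain outcome: denied",
        Ok(Ok(OpenResult::Opened { .. })) => "domain outcome: opened",
    }
}
```

With this shape a service helper can retry, log, or count each layer separately, and a denied request never looks like an infrastructure fault.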
First Target: Terminal And Networking
The first useful slice should be:
- TerminalSessionFromByteStream / byte-stream terminal host.
- Lifecycle wrapper around accept, session minting, proxying, and cleanup.
- Request/session context and metrics hooks.
- Network service container for listener-backed services.
- HTTP/fetch lifecycle only after terminal/networking proves the cleanup and authority model.
This ordering deliberately exercises the hard lifecycle edges before adding HTTP convenience: authenticated session creation, shell spawn, bidirectional byte proxying, EOF/close/error ordering, repeated connect/disconnect, and release of terminal/session/process result caps.
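One way to make the release of terminal/session/process result caps deterministic on every exit path is a per-session cleanup stack. The sketch below is illustrative only: `CleanupStack`, `run_session`, and the string labels are hypothetical, and releasing in reverse order of acquisition is an assumption of this sketch, not an ordering the proposal specifies.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Holds release actions for one session and runs them on drop,
/// last registered first (assumed ordering for this sketch).
pub struct CleanupStack {
    actions: Vec<Box<dyn FnOnce()>>,
}

impl CleanupStack {
    pub fn new() -> Self {
        Self { actions: Vec::new() }
    }

    /// Registers a release action right after the resource is acquired.
    pub fn defer(&mut self, action: impl FnOnce() + 'static) {
        self.actions.push(Box::new(action));
    }
}

impl Drop for CleanupStack {
    fn drop(&mut self) {
        // Pop from the back so the most recent acquisition is released first.
        while let Some(action) = self.actions.pop() {
            action();
        }
    }
}

/// Stand-in for one gateway session. The log records which releases ran;
/// in a real service each action would release a result cap instead.
pub fn run_session(
    log: Rc<RefCell<Vec<&'static str>>>,
    fail_in_proxy: bool,
) -> Result<(), &'static str> {
    let mut cleanup = CleanupStack::new();

    let l = log.clone();
    cleanup.defer(move || l.borrow_mut().push("release terminal"));
    let l = log.clone();
    cleanup.defer(move || l.borrow_mut().push("release session"));
    let l = log.clone();
    cleanup.defer(move || l.borrow_mut().push("release process"));

    if fail_in_proxy {
        // Early return: the stack is dropped here and still runs every action.
        return Err("proxy error");
    }
    Ok(())
}
```

The EOF, error, and normal paths all leave through the same `Drop` implementation, so a repeated connect/disconnect proof can assert the same release sequence for each of them.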
Authority Rules
- The framework must not accept ambient service names, raw global handles, or stringly typed service discovery.
- Hooks receive only the caps explicitly passed to that service or request.
- Request contexts are lifecycle-owned and must be dropped deterministically.
- Background tasks are budgeted and cancellable during shutdown.
- Retry policy is domain-specific and requires idempotency or operation ids.
- Pool keys for reusable resources include every authority and identity field that changes policy: target, protocol, TLS identity, cap/object epoch, caller/session reference, namespace, tenant, and transformation policy.
- Readiness means the service can actually accept authorized work; config parse success is not enough.
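The pool-key rule above can be illustrated with a key type that carries every listed field, so a resource pooled for one tenant or session can never be handed to another. This is a sketch under assumptions: `PoolKey`, `Pool`, `sample_key`, and the concrete field types are hypothetical and chosen only for illustration.

```rust
use std::collections::HashMap;

/// Every field that changes policy is part of the key (hypothetical types).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct PoolKey {
    pub target: String,
    pub protocol: String,
    pub tls_identity: Option<String>,
    pub cap_epoch: u64,
    pub session_ref: u64,
    pub namespace: String,
    pub tenant: String,
    pub transform_policy: String,
}

/// Idle reusable resources, grouped strictly by full key equality.
pub struct Pool<R> {
    idle: HashMap<PoolKey, Vec<R>>,
}

impl<R> Pool<R> {
    pub fn new() -> Self {
        Self { idle: HashMap::new() }
    }

    /// Returns a resource to the pool under its full key.
    pub fn put(&mut self, key: PoolKey, resource: R) {
        self.idle.entry(key).or_default().push(resource);
    }

    /// Takes an idle resource only if every key field matches.
    pub fn take(&mut self, key: &PoolKey) -> Option<R> {
        self.idle.get_mut(key)?.pop()
    }
}

/// Builds a key that differs only in the tenant field (example values).
pub fn sample_key(tenant: &str) -> PoolKey {
    PoolKey {
        target: String::from("10.0.0.1:23"),
        protocol: String::from("telnet"),
        tls_identity: None,
        cap_epoch: 1,
        session_ref: 42,
        namespace: String::from("default"),
        tenant: String::from(tenant),
        transform_policy: String::from("none"),
    }
}
```

A lookup with `sample_key("tenant-b")` misses a resource stored under `sample_key("tenant-a")` even though target and protocol are identical, which is the property the rule asks for.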
Non-Goals
- No generic kernel Service capability.
- No kernel callback registry or phase machine.
- No plugin ABI that passes phase_id and bytes through a single generic cap.
- No global service discovery namespace.
- No HTTP-first framework that delays terminal/networking lifecycle cleanup.
- No replacement for capos-rt transport primitives.
Implementation Sequence
- Draft shared ServiceMain/ServiceRuntime shape for one process.
- Factor byte-stream terminal host lifecycle around TerminalSessionFromByteStream.
- Convert a focused terminal or gateway proof to use the lifecycle wrapper.
- Add request/session context and bounded metrics hooks.
- Add readiness and shutdown/drain helpers.
- Add endpoint serve-loop helpers that preserve typed schema authority.
- Add resource reservation/donation hooks.
- Consider HTTP/fetch domain library only after terminal/networking proofs pass.
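As a starting point for the first step, the ServiceMain/ServiceRuntime shape might look like the sketch below. Only the two type names and the phase list come from this proposal; the method names, the `Phase` enum, the `trace` field, and `DemoService` are hypothetical. The sketch also assumes that drain and cleanup run even when startup fails, which is a design choice still to be decided.

```rust
/// Userspace-only lifecycle phases, mirroring the Core Surface list.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Phase { Init, WaitDeps, Ready, Run, Drain, Shutdown, Cleanup }

/// Hooks a service implements; the runtime owns the phase ordering.
pub trait ServiceMain {
    type Error;
    fn init(&mut self) -> Result<(), Self::Error>;
    fn wait_dependencies(&mut self) -> Result<(), Self::Error>;
    fn run(&mut self) -> Result<(), Self::Error>;
    fn drain(&mut self);
    fn cleanup(&mut self);
}

/// Drives one service in one process and records the phases it entered.
pub struct ServiceRuntime {
    pub trace: Vec<Phase>,
}

impl ServiceRuntime {
    pub fn new() -> Self {
        Self { trace: Vec::new() }
    }

    pub fn drive<S: ServiceMain>(&mut self, svc: &mut S) -> Result<(), S::Error> {
        let result = self.startup_and_run(svc);
        // Assumption: drain and cleanup run on every exit path.
        self.trace.push(Phase::Drain);
        svc.drain();
        self.trace.push(Phase::Shutdown);
        self.trace.push(Phase::Cleanup);
        svc.cleanup();
        result
    }

    fn startup_and_run<S: ServiceMain>(&mut self, svc: &mut S) -> Result<(), S::Error> {
        self.trace.push(Phase::Init);
        svc.init()?;
        self.trace.push(Phase::WaitDeps);
        svc.wait_dependencies()?;
        // Ready is entered only after dependencies are actually usable.
        self.trace.push(Phase::Ready);
        self.trace.push(Phase::Run);
        svc.run()
    }
}

/// Tiny example service whose dependency wait can be made to fail.
pub struct DemoService {
    pub deps_ok: bool,
    pub cleaned: bool,
}

impl ServiceMain for DemoService {
    type Error = &'static str;
    fn init(&mut self) -> Result<(), Self::Error> { Ok(()) }
    fn wait_dependencies(&mut self) -> Result<(), Self::Error> {
        if self.deps_ok { Ok(()) } else { Err("dependency unavailable") }
    }
    fn run(&mut self) -> Result<(), Self::Error> { Ok(()) }
    fn drain(&mut self) {}
    fn cleanup(&mut self) { self.cleaned = true; }
}
```

Because the phase ordering lives in ordinary library code, the kernel sees nothing beyond the capability calls the hooks make, which is consistent with the non-goal of a kernel phase machine.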
Verification
Initial proof gates:
make docs
make run-terminal
make run-telnet or qemu-telnet-harness
focused close/reconnect proof
hidden password behavior remains byte-identical
child shell receives no raw network/spawn/listener authority
gateway cleanup releases terminal/session/process handles on EOF/error/shutdown
Later endpoint-helper gates should add targeted tests for exception return, result-cap release, cancellation, and resource rollback.
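A resource-rollback gate could be shaped roughly like the sketch below, which checks that a reservation is returned on success, on an early error return, and when the budget is exhausted. `FakeLedger`, `Reservation`, and `handle_request` are hypothetical test doubles invented for this illustration; the real ledger owner and its interface are defined elsewhere.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// In-memory stand-in for a ledger owner (test double, not a real API).
#[derive(Clone)]
pub struct FakeLedger {
    available: Rc<Cell<u64>>,
}

impl FakeLedger {
    pub fn new(units: u64) -> Self {
        Self { available: Rc::new(Cell::new(units)) }
    }

    pub fn available(&self) -> u64 {
        self.available.get()
    }

    /// Reserves units, or returns None when the budget is exhausted.
    pub fn reserve(&self, units: u64) -> Option<Reservation> {
        let left = self.available.get().checked_sub(units)?;
        self.available.set(left);
        Some(Reservation { ledger: self.clone(), units })
    }
}

/// Records what was reserved and gives it back when dropped.
pub struct Reservation {
    ledger: FakeLedger,
    units: u64,
}

impl Drop for Reservation {
    fn drop(&mut self) {
        let available = &self.ledger.available;
        available.set(available.get() + self.units);
    }
}

/// Stand-in request handler: reserves 64 units, then may fail early.
/// Returns the units still available while the reservation was held.
pub fn handle_request(ledger: &FakeLedger, fail: bool) -> Result<u64, &'static str> {
    let _buffer = ledger.reserve(64).ok_or("over budget")?;
    let during = ledger.available();
    if fail {
        // The reservation is dropped on this path too.
        return Err("decode failed");
    }
    Ok(during)
}
```

A gate built this way asserts that the ledger balance before and after each request is identical regardless of which exit path the handler took.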
Related
- Pingora research records the framework precedent and rejects importing Pingora’s HTTP proxy model into the kernel.
- Telnet over TLS Shell and SSH Shell Gateway define the terminal factory and remote-ingress boundaries.
- Error Handling defines the three error layers that generated clients and service helpers must preserve.
- Resource Accounting and Quotas defines the ledger vocabulary for budgeting/donation hooks.