# Proposal: libcapos-service

Define a userspace service framework above `capos-rt` for long-running capOS
services. The library should provide common lifecycle, endpoint, readiness,
shutdown, context, metrics, and budgeting mechanics without adding a generic
kernel `Service` capability or a kernel-level phase machine.

The immediate target is the terminal/networking lifecycle: byte-stream terminal
hosting, Telnet/TLS/SSH gateway plumbing, listener accept loops, shell launch,
proxying, cleanup, and observable shutdown. HTTP/fetch services come later.

## Problem

Current services each reimplement the same shape:

- discover bootstrap caps;
- wait for dependencies;
- mark readiness through log output or implicit behavior;
- run accept or endpoint receive loops;
- spawn children or proxy byte streams;
- release result caps and temporary state;
- log or count failures;
- shut down after EOF, error, process exit, or supervisor request.

Duplicating that lifecycle is tolerable for proofs, but it is a poor foundation
for production gateway, storage, agent, monitoring, and network services.
Repeated hand-rolled loops are also where capability leaks, stuck children,
incorrect close ordering, and hidden unbounded work appear.

## Layering Decision

The stack remains:

```text
schema/capos.capnp
  stable authority-bearing interfaces

capos-rt
  raw runtime and transport:
  bootstrap, CapSet, ring client, typed handles, completion matching,
  release flushing, exception decoding

libcapos-service
  generic userspace service container:
  lifecycle, endpoint loops, readiness, shutdown, background tasks,
  metrics, context, resource hooks

domain libraries
  HTTP/fetch, terminal host, storage, supervisor, agent tools

init/supervisors
  compose services by passing capabilities, not global names
```

`libcapos-service` is not a new authority source. It wraps and narrows
capabilities the process already holds. The kernel still sees ordinary typed
capability calls and ordinary process lifecycle.
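The minimal Rust sketch below makes "wraps and narrows" concrete under stated assumptions. `ByteStream`, `ReadOnlyStream`, and `MemStream` are hypothetical local types, not `capos-rt` APIs. The wrapper forwards only `read` and keeps the wrapped handle private, so code holding the wrapper can do strictly less than the process already could. Nothing new is minted.

```rust
/// Hypothetical stand-in for a typed capability handle the process already holds.
pub trait ByteStream {
    fn read(&mut self, buf: &mut [u8]) -> usize;
    fn write(&mut self, data: &[u8]) -> usize;
}

/// Narrowed view: forwards reads and exposes no path to `write`.
/// It adds no authority; it only hides part of what it wraps.
pub struct ReadOnlyStream<S: ByteStream> {
    inner: S,
}

impl<S: ByteStream> ReadOnlyStream<S> {
    pub fn new(inner: S) -> Self {
        ReadOnlyStream { inner }
    }

    pub fn read(&mut self, buf: &mut [u8]) -> usize {
        self.inner.read(buf)
    }
}

/// In-memory stream used only to exercise the sketch.
pub struct MemStream {
    pub data: Vec<u8>,
    pub pos: usize,
}

impl ByteStream for MemStream {
    fn read(&mut self, buf: &mut [u8]) -> usize {
        // Copy as many unread bytes as fit into `buf`.
        let n = buf.len().min(self.data.len() - self.pos);
        buf[..n].copy_from_slice(&self.data[self.pos..self.pos + n]);
        self.pos += n;
        n
    }

    fn write(&mut self, data: &[u8]) -> usize {
        self.data.extend_from_slice(data);
        data.len()
    }
}
```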

## Core Surface

Initial framework pieces:

- **Service lifecycle:** initialize, dependency wait, ready, run, drain,
  shutdown, and final cleanup.
- **Endpoint serve loops:** generated or handwritten helpers for `RECV`,
  decode, dispatch, `RETURN`, exception return, cancellation, and release.
- **Readiness handles:** typed local handles or service-exported readiness caps,
  not global service names.
- **Shutdown and drain:** cancellable waits, child/process-handle cleanup,
  listener stop, in-flight request drain, bounded force-close.
- **Background tasks:** timers, periodic health checks, metrics export, and
  discovery loops with explicit cancellation.
- **Request/session context:** owned context object per request or session
  containing caller-session metadata, derived policy, resource reservations,
  transfer state, timing, and audit correlation.
- **Metrics hooks:** bounded counters and summaries; no unbounded per-user,
  per-cap-id, or per-method labels by default.
- **Resource budgeting:** reservation/donation hooks that call into the relevant
  ledger owner; the framework records what was reserved and releases it on every
  exit path.
- **Error boundary:** preserve the error-handling split from
  `error-handling-proposal.md`: CQE status for transport/kernel dispatch
  failure, `CapException` for capability infrastructure failure, and schema
  result unions for normal domain outcomes.
- **Graceful handoff hooks:** transfer or drain listeners, endpoint loops,
  child handles, background tasks, and in-flight request state during upgrade or
  supervisor-directed replacement. Handoff must be explicit; silent cloning of
  authority or abandoning in-flight work is a bug.
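The Rust sketch below shows one way to keep the metrics hooks bounded by construction. It assumes labels are a closed enum rather than free-form strings. Every counter lives in a fixed-size array indexed by that enum, so a caller cannot create a per-user, per-cap-id, or per-method series at runtime. `Outcome` and `ServiceMetrics` are hypothetical names. The outcome categories loosely mirror the error-boundary split above.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Closed label set fixed at compile time (hypothetical categories).
#[derive(Clone, Copy, Debug)]
pub enum Outcome {
    Ok = 0,
    DomainError = 1,    // schema result-union error
    CapException = 2,   // capability infrastructure failure
    TransportError = 3, // CQE status failure
    Cancelled = 4,
}

const OUTCOME_COUNT: usize = 5;

/// Counters indexed by `Outcome`. Memory use is fixed regardless of how
/// many callers, sessions, or capabilities the service sees.
pub struct ServiceMetrics {
    outcomes: [AtomicU64; OUTCOME_COUNT],
}

impl ServiceMetrics {
    pub fn new() -> Self {
        ServiceMetrics {
            outcomes: std::array::from_fn(|_| AtomicU64::new(0)),
        }
    }

    /// Count one finished request under its outcome label.
    pub fn record(&self, outcome: Outcome) {
        self.outcomes[outcome as usize].fetch_add(1, Ordering::Relaxed);
    }

    pub fn get(&self, outcome: Outcome) -> u64 {
        self.outcomes[outcome as usize].load(Ordering::Relaxed)
    }
}
```

With this design, adding a label is a reviewed code change, not something a request can trigger.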

## First Target: Terminal And Networking

The first useful slice should be:

1. `TerminalSessionFromByteStream` / byte-stream terminal host.
2. Lifecycle wrapper around accept, session minting, proxying, and cleanup.
3. Request/session context and metrics hooks.
4. Network service container for listener-backed services.
5. HTTP/fetch lifecycle only after terminal/networking proves the cleanup and
   authority model.

This ordering deliberately exercises the hard lifecycle edges before adding
HTTP convenience: authenticated session creation, shell spawn, bidirectional
byte proxying, EOF/close/error ordering, repeated connect/disconnect, and
release of terminal/session/process result caps.
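Releasing result caps on every exit path maps naturally onto Rust's `Drop`. The sketch below is hypothetical: `Handle`, `SessionGuard`, `run_session`, and the `ReleaseLog` stand in for real terminal, session, and process caps and their release calls. The guard records handles as they are acquired and releases them when it goes out of scope, whether the session ended on EOF, an error, or shutdown. Releasing in reverse acquisition order is an assumption of this sketch. The terminal host still has to settle the real close-ordering constraints.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical handle kinds a gateway session acquires.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Handle {
    Terminal,
    Session,
    Process,
}

/// Shared record of releases, standing in for real cap-release calls.
pub type ReleaseLog = Rc<RefCell<Vec<Handle>>>;

/// Owns every handle acquired for one session and releases them on drop.
pub struct SessionGuard {
    held: Vec<Handle>,
    log: ReleaseLog,
}

impl SessionGuard {
    pub fn new(log: ReleaseLog) -> Self {
        SessionGuard { held: Vec::new(), log }
    }

    pub fn acquire(&mut self, h: Handle) {
        self.held.push(h);
    }
}

impl Drop for SessionGuard {
    fn drop(&mut self) {
        // Last acquired, first released.
        while let Some(h) = self.held.pop() {
            self.log.borrow_mut().push(h);
        }
    }
}

/// Runs one session; `fail_at` simulates an acquisition failure during setup.
pub fn run_session(log: ReleaseLog, fail_at: Option<Handle>) -> Result<(), String> {
    let mut guard = SessionGuard::new(log);
    for h in [Handle::Terminal, Handle::Session, Handle::Process] {
        if fail_at == Some(h) {
            // Early return: `guard` drops here and releases what it already holds.
            return Err(format!("failed to acquire {:?}", h));
        }
        guard.acquire(h);
    }
    // ... proxy bytes until EOF, error, or shutdown ...
    Ok(())
}
```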

## Authority Rules

- The framework must not accept ambient service names, raw global handles, or
  stringly typed service discovery.
- Hooks receive narrow capabilities, not ambient process authority.
- Request/session context is lifecycle-owned and cannot outlive the
  request/session that created it.
- Background tasks are budgeted, cancellable, and observable during shutdown.
- Retry policy must encode side-effect safety through idempotency, operation
  ids, or a domain-specific no-retry rule.
- Pool keys for reusable resources include every authority and identity field
  that changes policy: target, protocol, TLS identity, cap/object epoch,
  caller/session reference, namespace, tenant, and transformation policy.
- Cache keys must include tenant, session, and authority dimensions where those
  dimensions affect disclosure or correctness.
- Protocol parsers must fully drain or close a stream before it is reused.
- Readiness means the service can actually accept authorized work; config parse
  success is not enough.
- Shutdown must either drain, cancel, or explicitly transfer all in-flight work.
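Below is a hypothetical Rust shape for the pool-key rule. The key is a plain struct that derives `Eq` and `Hash` over every field the rule lists, so two connections that differ in any authority or identity dimension never share a pool slot. `PoolKey`, `Pool`, the field names, and the field types (`u64` epoch and session reference, `Option<String>` TLS identity, `u32` connection ids) are illustrative assumptions, not a schema.

```rust
use std::collections::HashMap;

/// Every field that changes policy participates in equality and hashing.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct PoolKey {
    pub target: String,
    pub protocol: String,
    pub tls_identity: Option<String>,
    pub object_epoch: u64,   // cap/object epoch
    pub caller_session: u64, // caller/session reference
    pub namespace: String,
    pub tenant: String,
    pub transform_policy: String,
}

/// Minimal pool: idle connection ids grouped by key.
#[derive(Default)]
pub struct Pool {
    idle: HashMap<PoolKey, Vec<u32>>,
}

impl Pool {
    pub fn put(&mut self, key: PoolKey, conn: u32) {
        self.idle.entry(key).or_default().push(conn);
    }

    /// Reuse succeeds only on an exact match of every dimension.
    pub fn take(&mut self, key: &PoolKey) -> Option<u32> {
        self.idle.get_mut(key).and_then(|v| v.pop())
    }
}
```

Cache keys can follow the same pattern, limited to the tenant, session, and authority fields that actually affect disclosure or correctness.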

## Non-Goals

- No generic kernel `Service` capability.
- No kernel callback registry or phase machine.
- No plugin ABI that passes `phase_id` and bytes through a single generic cap.
- No global service discovery namespace.
- No HTTP-first framework that delays terminal/networking lifecycle cleanup.
- No replacement for `capos-rt` transport primitives.

## Implementation Sequence

1. Draft a shared `ServiceMain`/`ServiceRuntime` shape for one process.
2. Factor byte-stream terminal host lifecycle around
   `TerminalSessionFromByteStream`.
3. Convert a focused terminal or gateway proof to use the lifecycle wrapper.
4. Add request/session context and bounded metrics hooks.
5. Add readiness and shutdown/drain helpers.
6. Add endpoint serve-loop helpers that preserve typed schema authority.
7. Add resource reservation/donation hooks.
8. Consider an HTTP/fetch domain library only after terminal/networking proofs pass.
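Step 1 could start from a Rust sketch like the one below. It is an assumption about shape, not a settled API. `ServiceMain` is a hypothetical trait with one hook per lifecycle phase from the Core Surface list, with dependency wait folded into `init` for brevity. `run_service` is a hypothetical driver that always calls `cleanup`, and also calls `drain` once the service has become ready. A service implementation therefore cannot skip final cleanup by returning an error early. Panics and unwinding are not considered here.

```rust
/// Hypothetical per-process lifecycle hooks.
pub trait ServiceMain {
    type Error;
    fn init(&mut self) -> Result<(), Self::Error>; // includes dependency wait
    fn ready(&mut self);
    fn run(&mut self) -> Result<(), Self::Error>;
    fn drain(&mut self);
    fn cleanup(&mut self);
}

/// Drives the phases in order; cleanup runs on every return path.
pub fn run_service<S: ServiceMain>(svc: &mut S) -> Result<(), S::Error> {
    if let Err(e) = svc.init() {
        // Never became ready: nothing to drain, but still clean up.
        svc.cleanup();
        return Err(e);
    }
    svc.ready();
    let result = svc.run();
    svc.drain();
    svc.cleanup();
    result
}

/// Test service that records each phase it enters.
pub struct Recorder {
    pub phases: Vec<&'static str>,
    pub fail_init: bool,
    pub fail_run: bool,
}

impl ServiceMain for Recorder {
    type Error = &'static str;

    fn init(&mut self) -> Result<(), Self::Error> {
        self.phases.push("init");
        if self.fail_init { Err("deps unavailable") } else { Ok(()) }
    }
    fn ready(&mut self) {
        self.phases.push("ready");
    }
    fn run(&mut self) -> Result<(), Self::Error> {
        self.phases.push("run");
        if self.fail_run { Err("listener closed") } else { Ok(()) }
    }
    fn drain(&mut self) {
        self.phases.push("drain");
    }
    fn cleanup(&mut self) {
        self.phases.push("cleanup");
    }
}
```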

## Verification

Initial proof gates:

```text
make docs
make run-terminal
make run-telnet or qemu-telnet-harness
focused close/reconnect proof
hidden password behavior remains byte-identical
child shell receives no raw network/spawn/listener authority
gateway cleanup releases terminal/session/process handles on EOF/error/shutdown
```

Later endpoint-helper gates should add targeted tests for exception return,
result-cap release, cancellation, and resource rollback.
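The hypothetical Rust reservation guard below illustrates the resource-rollback gate. It shows the property such a test would check: a reservation taken from a ledger is returned automatically unless the request explicitly commits it. `Ledger`, `Reservation`, and `handle` are illustrative names. The real ledger vocabulary belongs to the Resource Accounting and Quotas proposal.

```rust
use std::cell::Cell;

/// Hypothetical ledger: a fixed budget of units.
pub struct Ledger {
    available: Cell<u64>,
}

impl Ledger {
    pub fn new(units: u64) -> Self {
        Ledger { available: Cell::new(units) }
    }

    pub fn available(&self) -> u64 {
        self.available.get()
    }

    /// Reserve `units`, or fail without side effects if the budget is short.
    pub fn reserve(&self, units: u64) -> Option<Reservation<'_>> {
        let have = self.available.get();
        if units > have {
            return None;
        }
        self.available.set(have - units);
        Some(Reservation { ledger: self, units, committed: false })
    }
}

/// Returned to the ledger on drop unless `commit` was called.
pub struct Reservation<'a> {
    ledger: &'a Ledger,
    units: u64,
    committed: bool,
}

impl Reservation<'_> {
    /// Keep the units consumed (for example, after the work succeeded).
    pub fn commit(mut self) {
        self.committed = true;
    }
}

impl Drop for Reservation<'_> {
    fn drop(&mut self) {
        if !self.committed {
            let a = &self.ledger.available;
            a.set(a.get() + self.units);
        }
    }
}

/// Simulated request: reserves, then either fails (rollback) or commits.
pub fn handle(ledger: &Ledger, units: u64, fail: bool) -> Result<(), &'static str> {
    let r = ledger.reserve(units).ok_or("over budget")?;
    if fail {
        return Err("request failed"); // `r` drops here: units are returned
    }
    r.commit();
    Ok(())
}
```

The single-threaded `Cell` keeps the sketch short. A service handling concurrent requests would need an atomic or locked ledger.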

## Related

- [Pingora research](../research/pingora.md) records the framework precedent and
  rejects importing Pingora's HTTP proxy model into the kernel.
- [Telnet over TLS Shell](telnet-tls-shell-proposal.md) and
  [SSH Shell Gateway](ssh-shell-proposal.md) define the terminal factory and
  remote-ingress boundaries.
- [Error Handling](error-handling-proposal.md) defines the three error layers
  that generated clients and service helpers must preserve.
- [Resource Accounting and Quotas](resource-accounting-proposal.md) defines the
  ledger vocabulary for budgeting/donation hooks.
