# Capability Model

How capabilities work in capOS.

## What is a Capability

A capability in capOS is a reference to a kernel object that carries:
- An **interface** (what methods can be called), defined by a Cap'n Proto schema
- A **permission** (the object it references, enforced by the kernel)
- A **wire format** (Cap'n Proto serialized messages for all invocations)

A process can only access a resource if it holds a capability to it. There is
no ambient authority -- no global namespace, no "open by path" syscall, no
implicit resource access.

## Identity Terms and Authority

capOS documentation uses identity terms as policy metadata, not as kernel
authorization primitives. **User** is a human-facing prose term only. A **principal** is
the stable identity metadata used by authentication, policy, audit, and
ownership records. An **account** is planned durable local record state for a
principal, including credential references, roles, attributes, storage-root
references, and default profile names. A **session** is the live context that
receives a concrete CapSet. **Policy profiles** and **resource profiles**
select bundle fragments, approval eligibility, and quotas that a trusted
broker may use when minting capabilities.

None of those terms is kernel authority: the kernel dispatches through
generation-tagged `CapId` entries, not users, roles, accounts, groups, UIDs,
or profile names. Account-store behavior, durable profile records, and broader
quota policy remain future work tracked in the
[local users backlog](backlog/local-users-management.md).

## Session-Bound Invocation Context

Services should not infer authority from caller-supplied identity fields. A
request parameter such as `user`, `principal`, `client`, or `role` is data.
The active model is one immutable session context per process plus explicit
capabilities granted by a broker or supervisor.

The general pattern is:

- authentication or admission creates a live `SessionContext`;
- process spawn installs exactly one immutable session context in the child;
- `AuthorityBroker` grants service roots/facets appropriate to that session;
- endpoint calls carry privacy-preserving caller-session metadata by default;
- subject details such as global principal id, display name, profile class, or
  external claims are disclosed only through explicit client disclosure and a
  matching broker/service disclosure scope. The current endpoint CALL path
  implements this as a disclosure request mask intersected with cap-held
  disclosure scope.
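The mask intersection in the last bullet can be sketched as follows. This is a minimal illustration, assuming disclosure fields are modelled as bit flags; the constant names and `effective_disclosure` are hypothetical and do not come from the capOS sources.

```rust
/// Hypothetical disclosure bits; the real capOS bit layout is not shown here.
pub const DISCLOSE_PRINCIPAL_ID: u32 = 1 << 0;
pub const DISCLOSE_DISPLAY_NAME: u32 = 1 << 1;
pub const DISCLOSE_PROFILE_CLASS: u32 = 1 << 2;
pub const DISCLOSE_EXTERNAL_CLAIMS: u32 = 1 << 3;

/// Fields disclosed on a CALL: only those the client asked to disclose
/// AND that the held capability's disclosure scope permits.
pub fn effective_disclosure(client_request_mask: u32, cap_held_scope: u32) -> u32 {
    client_request_mask & cap_held_scope
}
```

Under this model a client that requests nothing discloses nothing, and a client cannot widen disclosure beyond what the broker granted on the cap.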

The kernel role is narrower. It verifies that a process holds a live cap-table
entry, that the process session is live, and that transfer/spawn obey session
scope. It may deliver an opaque service-scoped caller-session reference and
freshness result to endpoint servers, but it must not disclose broader subject
details by default. It does not decide that a process is Alice, an operator, a
moderator, or an NPC. Those are policy facts maintained by session, broker,
account, and application services.

Opaque receiver selectors may still exist in the IPC implementation and in
historical service-object routing tests. A receiver selector is not identity
metadata, not shell syntax, not a user field, not a disclosure channel, and not
a role bit. New shared-service identity should use the caller session context
and broker-granted service facets, not caller-selected numeric labels.

The chat demo now follows this rule for membership: the server receives the
endpoint caller metadata and keys member records by an opaque live
caller-session reference, while chat handles remain request data and visible
member labels are assigned by the service.

The shared chat/adventure endpoint helper now exposes caller-session metadata
through `EndpointCaller` instead of a badge field; the old badge-named
user-data type remains only as a source-compatible alias. Terminal output and
shell-serviced stdio bridges are also gated by live caller-session metadata.

## Schema as Contract

Capability interfaces are defined in `.capnp` schema files under `schema/`.
The schema is the canonical interface definition. Currently defined:

```capnp
interface Console {
    write @0 (data :Data) -> ();
    writeLine @1 (text :Text) -> ();
}

interface TerminalSession {
    write @0 (data :Data) -> ();
    writeLine @1 (text :Text) -> ();
    readLine @2 (request :LineRequest) -> (status :LineStatus, line :Data);
}

interface FrameAllocator {
    allocFrame @0 () -> (handleIndex :UInt16);
    allocContiguous @1 (count :UInt32) -> (handleIndex :UInt16);
}

interface MemoryObject {
    info @0 () -> (pageCount :UInt32, sizeBytes :UInt64);
    map @1 (hint :UInt64, offset :UInt64, size :UInt64, prot :UInt32) -> (addr :UInt64);
    unmap @2 (addr :UInt64, size :UInt64) -> ();
    protect @3 (addr :UInt64, size :UInt64, prot :UInt32) -> ();
}

interface VirtualMemory {
    map @0 (hint :UInt64, size :UInt64, prot :UInt32) -> (addr :UInt64);
    unmap @1 (addr :UInt64, size :UInt64) -> ();
    protect @2 (addr :UInt64, size :UInt64, prot :UInt32) -> ();
}

interface Endpoint {}

interface ProcessSpawner {
    spawn @0 (name :Text, binaryName :Text, grants :List(CapGrant)) -> (handleIndex :UInt16);
}

interface ProcessHandle {
    wait @0 () -> (exitCode :Int64);
}

interface BootPackage {
    manifestSize @0 () -> (size :UInt64);
    readManifest @1 (offset :UInt64, maxBytes :UInt32) -> (data :Data);
}

# Management-only introspection. Ordinary handle release uses the system
# transport opcode CAP_OP_RELEASE, not a method here.
interface CapabilityManager {
    list @0 () -> (capabilities :List(CapabilityInfo));
    revoke @1 (capId :UInt32) -> ();
    # grant is planned for a later Stage 6 management slice
}
```

Each interface has a unique 64-bit `TYPE_ID` generated by the Cap'n Proto
compiler. `TYPE_ID` is the schema constant. `interface_id` is the runtime
metadata used by CapSet/bootstrap descriptions and endpoint delivery headers.
Method dispatch uses the interface assigned to the capability entry plus
`method_id`; `method_id` selects a method inside that schema.

Interface identity is not capability identity. A `CapId` is the
authority-bearing handle in a process table, analogous to a file descriptor.
Multiple capabilities can expose the same interface:

- `cap_id=3` -> serial-backed `Console`
- `cap_id=4` -> log-buffer-backed `Console`
- `cap_id=5` -> `Console` proxy served by another process

All three use the same `Console` `TYPE_ID`, but they are different objects
with different authority. The manifest/CapSet should record the expected schema
`TYPE_ID` as interface metadata for typed handle construction. Normal CALL SQEs
do not need to repeat it because the kernel or serving transport can derive it
from the target capability entry. `CapSqe` keeps reserved tail padding for ABI
stability.

The kernel exposes the initial CapSet to each process as a read-only
4 KiB page mapped at `capos_config::capset::CAPSET_VADDR` and passes its
address in RDX to `_start`. The page starts with a
`CapSetHeader { magic, version, count }` and is followed by
`CapSetEntry { cap_id, name_len, interface_id, name: [u8; 32] }` records
in manifest declaration order. Userspace looks up caps by the manifest
name rather than by numeric index (`capos_config::capset::find`), so
grants can be reordered in `system.cue` without breaking clients. The
mapping is installed without `WRITABLE` so userspace cannot mutate its
own bootstrap authority map.
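Name-based lookup over CapSet entries might look like the sketch below. The field widths, the `new` constructor, and `find_by_name` are assumptions for illustration; the text above names the fields but not their types, and this is not the `capos_config::capset::find` implementation.

```rust
/// Illustrative entry; field widths are assumed, not taken from capOS.
#[derive(Clone, Copy)]
pub struct CapSetEntry {
    pub cap_id: u32,
    pub name_len: u8,
    pub interface_id: u64,
    pub name: [u8; 32],
}

impl CapSetEntry {
    /// Hypothetical helper: build an entry, rejecting names over 32 bytes.
    pub fn new(cap_id: u32, interface_id: u64, name: &str) -> Option<Self> {
        let bytes = name.as_bytes();
        if bytes.len() > 32 {
            return None;
        }
        let mut buf = [0u8; 32];
        buf[..bytes.len()].copy_from_slice(bytes);
        Some(Self { cap_id, name_len: bytes.len() as u8, interface_id, name: buf })
    }

    /// The valid prefix of the fixed-size name buffer.
    pub fn name(&self) -> &[u8] {
        let len = (self.name_len as usize).min(self.name.len());
        &self.name[..len]
    }
}

/// Look up an entry by manifest name, independent of declaration order.
pub fn find_by_name<'a>(entries: &'a [CapSetEntry], name: &str) -> Option<&'a CapSetEntry> {
    entries.iter().find(|e| e.name() == name.as_bytes())
}
```

Because the lookup key is the name, reordering the entries changes nothing for a client that asks for `"console"`.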

Security invariant: a `CapTable` entry exposes one public interface. If the
same backing state must be available through multiple interfaces, mint multiple
capability entries, each wrapping the same state with a narrower interface.
Do not grant one handle that accepts unrelated `interface_id` values; that
makes hidden authority easy to miss during review.

## Invocation Path

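Capabilities are invoked via a shared-memory **capability ring**
(io_uring-inspired). Each process has a submission queue (SQ) and completion
queue (CQ) mapped into its address space. Two paths drain the SQ: `cap_enter`
for ordinary capability methods, and timer polling, which runs only explicitly
interrupt-safe CALL targets: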

```
Caller builds capnp params message
    → serialize to bytes (write_message_to_words)
    → write CALL SQE to SQ ring (pure userspace memory write)
    → advance SQ tail
    → caller invokes cap_enter for ordinary capability methods
      (timer polling only runs explicitly interrupt-safe CALL targets)
    → kernel reads SQE, validates user buffers
    → CapTable.call(cap_id, method_id, bytes)
    → kernel writes CQE to CQ ring
    ... caller reads CQE after cap_enter, or spin-polls only for
        interrupt-safe/non-CALL ring work ...
    → caller reads CQE result
```

`CapObject::call` does not receive a caller-supplied interface ID. The cap
table derives the invoked interface from the target entry before invoking the
object. The SQE carries only the capability handle and method ID because each
capability entry owns one public interface:

```rust
pub trait CapObject: Send + Sync {
    fn interface_id(&self) -> u64;
    fn label(&self) -> &str;
    fn call(
        &self,
        method_id: u16,
        params: &[u8],
        result: &mut [u8],
        reply_scratch: &mut dyn ReplyScratch,
    ) -> capnp::Result<CapInvokeResult>;
}
```

All communication goes through serialized capnp messages, even when caller and
callee are in the same address space. This ensures the wire format is always
exercised and makes the transition to cross-address-space IPC seamless.

The result buffer is supplied by the caller (the user-validated SQE result
region). Implementations serialize directly into it and return the number of
bytes written, so the kernel's dispatch path does not allocate an intermediate
`Vec<u8>` per invocation.

## Capability Table

Each process has its own capability table (`CapTable`), created at process
startup. The kernel also maintains a global table (`KERNEL_CAPS`) for
kernel-internal use. Each table maps a `CapId` (u32) to a boxed `CapObject`.

CapId encoding: `[generation:8 | index:24]`. The generation counter increments
when a slot is freed, so stale CapIds (from a previous occupant of the slot)
are rejected with `CapError::StaleGeneration` rather than accidentally
referring to a different capability.

Generation wrap must not resurrect old authority. The implemented table retires
a slot permanently when its 8-bit generation would wrap from `255` back to `0`;
that slot is not returned to the free list. Heavy churn can therefore exhaust a
table even when many retired slots are empty, but the failure mode is
`CapError::TableFull`, not stale-cap revalidation. Future widening of `CapId`
generation bits is an ABI change and belongs in the schema/ring ABI evolution
track.
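The encoding and the no-wrap rule can be sketched as below. This assumes the generation occupies the high 8 bits of the `u32`, which is one reading of `[generation:8 | index:24]`; the function names are hypothetical.

```rust
pub const INDEX_BITS: u32 = 24;
pub const INDEX_MASK: u32 = (1 << INDEX_BITS) - 1;

/// Pack a generation and slot index; `None` if the index needs more than 24 bits.
pub fn encode_cap_id(generation: u8, index: u32) -> Option<u32> {
    if index > INDEX_MASK {
        return None;
    }
    Some(((generation as u32) << INDEX_BITS) | index)
}

/// Split a CapId back into `(generation, index)`.
pub fn decode_cap_id(id: u32) -> (u8, u32) {
    ((id >> INDEX_BITS) as u8, id & INDEX_MASK)
}

/// Generation for the slot's next occupant, or `None` when the slot must be
/// retired because the 8-bit counter would wrap from 255 back to 0.
pub fn next_generation(current: u8) -> Option<u8> {
    current.checked_add(1)
}
```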

Operations:
- `insert(obj)` -- register a new capability, returns its CapId
- `get(id)` -- look up a capability by ID (validates generation)
- `remove(id)` -- revoke a capability, bumps slot generation
- `call(id, method_id, params)` -- dispatch a method call against the
  interface assigned to the capability entry
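A toy table shows how these operations interact with generations. It is a sketch only: `Table<T>` stores plain values rather than boxed `CapObject`s, omits `call`, and assumes a capacity of at most 2^24 slots. Only `StaleGeneration` and `TableFull` are error names from the text above; `InvalidIndex` is invented here.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum CapError {
    InvalidIndex, // hypothetical variant for out-of-range indexes
    StaleGeneration,
    TableFull,
}

struct Slot<T> {
    generation: u8,
    value: Option<T>,
}

pub struct Table<T> {
    slots: Vec<Slot<T>>,
    free: Vec<usize>,
    capacity: usize, // assumed <= 1 << 24 so indexes fit the CapId
}

impl<T> Table<T> {
    pub fn new(capacity: usize) -> Self {
        Self { slots: Vec::new(), free: Vec::new(), capacity }
    }

    /// Reuse a freed slot if one exists, otherwise grow up to `capacity`.
    pub fn insert(&mut self, value: T) -> Result<u32, CapError> {
        let index = match self.free.pop() {
            Some(i) => i,
            None if self.slots.len() < self.capacity => {
                self.slots.push(Slot { generation: 0, value: None });
                self.slots.len() - 1
            }
            None => return Err(CapError::TableFull),
        };
        let slot = &mut self.slots[index];
        slot.value = Some(value);
        Ok(((slot.generation as u32) << 24) | index as u32)
    }

    /// Look up by CapId; a generation mismatch or empty slot is stale.
    pub fn get(&self, id: u32) -> Result<&T, CapError> {
        let index = (id & 0x00FF_FFFF) as usize;
        let slot = self.slots.get(index).ok_or(CapError::InvalidIndex)?;
        match &slot.value {
            Some(v) if slot.generation == (id >> 24) as u8 => Ok(v),
            _ => Err(CapError::StaleGeneration),
        }
    }

    /// Remove the entry and bump the generation; a slot at 255 is retired
    /// (never returned to the free list) instead of wrapping to 0.
    pub fn remove(&mut self, id: u32) -> Result<T, CapError> {
        let index = (id & 0x00FF_FFFF) as usize;
        let slot = self.slots.get_mut(index).ok_or(CapError::InvalidIndex)?;
        if slot.generation != (id >> 24) as u8 {
            return Err(CapError::StaleGeneration);
        }
        let value = slot.value.take().ok_or(CapError::StaleGeneration)?;
        if let Some(next) = slot.generation.checked_add(1) {
            slot.generation = next;
            self.free.push(index);
        }
        Ok(value)
    }
}
```

With a one-slot table, 256 insert/remove cycles retire the slot, and the next `insert` reports `TableFull` rather than handing out a CapId that an old holder could still present.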

Every current boot manifest gives only `initConfig.init` a kernel-built
capability table. The default `system.cue` manifest boots the standalone
`init` binary, which reads BootPackage, validates `initConfig.services`, and
spawns `capos-shell`, the remote-session CapSet gateway, and resident demo
services through ProcessSpawner. The Telnet gateway fixture is retired with
the kernel socket owner. Focused shell-led manifests such as
`system-smoke.cue` and `system-shell.cue` still boot `capos-shell` directly as
`initConfig.init` for narrow login/shell proofs. Focused init-executor
manifests such as `system-spawn.cue` also boot the standalone `init` binary
with Console, BootPackage, and ProcessSpawner for isolated ProcessSpawner
coverage.

Child capabilities are assembled from explicit spawn grants in declaration
order:

- raw grants preserve the source capability metadata;
- legacy endpoint-client grants attenuate an endpoint owner or
  `ProcessSpawner` endpoint result source to a client facet while preserving
  delegated receiver metadata;
- child-local Endpoint, FrameAllocator, and VirtualMemory grants are minted
  for the child's process.

Endpoint kernel grants return parent-side client facets as result caps; init
uses those facets for later service imports and releases them before waiting
on children. Kernel bootstrap now builds only `initConfig.init` kernel-sourced
caps; `CapSource::Service` resolution stays in init's BootPackage executor
path.

`CapRef.source` is structured CUE inside `initConfig.services`, not an
authority string:

```cue
{
    name:                "client"
    expectedInterfaceId: 0xacf0c15a7b2e0041
    source: service: {
        service: "endpoint-server"
        export:  "client"
    }
}
```

The source selector chooses the object or authority to grant. The
`expectedInterfaceId` value is a schema compatibility check against the
constructed object, not the authority selector itself. This distinction matters
because different objects can implement the same interface.

## Transport-Level Capability Lifetime

Cap'n Proto applications do not usually model capability lifetime as an
application method on every interface. The RPC transport owns capability
reference bookkeeping.

The standard Cap'n Proto RPC protocol is stateful per connection. Each side
keeps four tables: questions, answers, imports, and exports. Import/export IDs
are connection-local, not global object names. When an exported capability is
sent over the connection, the export reference count is incremented. When the
importing side drops its last local reference, the transport sends `Release`
to decrement the remote export count. Implementations may batch these releases.
If the connection is lost, in-flight questions fail, imports become broken, and
exports/answers are implicitly released. Persistent capabilities, when
implemented, are a separate `SturdyRef` mechanism and should not be treated as
owned pointers.

References:

- [Cap'n Proto RPC Protocol: Handling disconnects](https://capnproto.org/rpc.html#handling-disconnects)
- [Cap'n Proto `rpc.capnp`: four tables and `Release`](https://github.com/capnproto/capnproto/blob/master/c++/src/capnp/rpc.capnp)
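The export-side bookkeeping described above can be modelled in a few lines. This is a simplified sketch of one of the four tables, not Cap'n Proto library code; `ExportTable` and its methods are hypothetical names, and the `count` argument mirrors the batching that `Release` allows.

```rust
use std::collections::HashMap;

/// Connection-local export table: export ID -> outstanding reference count.
#[derive(Default)]
pub struct ExportTable {
    refs: HashMap<u32, u32>,
}

impl ExportTable {
    /// Record that export `id` was sent to the peer one more time.
    pub fn sent(&mut self, id: u32) {
        *self.refs.entry(id).or_insert(0) += 1;
    }

    /// Apply a `Release` for `count` references.
    /// Returns true when the export's last reference is gone.
    pub fn release(&mut self, id: u32, count: u32) -> bool {
        let remaining = match self.refs.get(&id) {
            Some(&n) => n.saturating_sub(count),
            None => return false, // unknown export: nothing to release
        };
        if remaining == 0 {
            self.refs.remove(&id);
            true
        } else {
            self.refs.insert(id, remaining);
            false
        }
    }

    /// Connection loss implicitly releases every export.
    /// Returns how many exports were dropped.
    pub fn disconnect(&mut self) -> usize {
        let dropped = self.refs.len();
        self.refs.clear();
        dropped
    }

    pub fn is_exported(&self, id: u32) -> bool {
        self.refs.contains_key(&id)
    }
}
```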

This distinction matters for capOS:

- `close()` is application protocol. A `File.close()` method can flush dirty
  state, commit metadata, or tell a server that a session should end.
- `Release` / cap drop is transport protocol. It removes one reference from the
  caller's local capability namespace and eventually lets the serving side
  reclaim the object if no references remain.
- Process exit is bulk transport cleanup. Dropping the process must release all
  caps in its table, cancel pending calls, and wake peers waiting on those
  calls.

capOS therefore needs a system transport layer in the userspace runtime
(`capos-rt` / later language runtimes), not just raw SQE helpers. That transport
should own typed client handles, local reference counts, promise-pipelined
answers, and broken-cap state. When the last local handle is dropped, it should
queue a transport-level release operation that is flushed through the kernel
ring at an explicit runtime boundary.

Ordinary handle release is a transport concern, not an application method.
The target design: the generated client drops the last local handle
(RAII / GC / finalizer), the runtime transport queues `CAP_OP_RELEASE`, an
explicit runtime flush or later ring-client boundary submits it, and the kernel
removes the caller's `CapTable` slot with mutable access to that table.
Encoding ordinary local release as a
regular method call on `CapabilityManager` was rejected because it would
mutate the same table used to dispatch the call; `CapabilityManager` is
therefore management-only (`list()` plus child-scoped `revoke(capId)`,
later `grant()`), not the default release path. `CAP_OP_FINISH` remains
reserved in the same transport opcode
namespace for application-level "end of work" signals that the transport must
deliver reliably, so the kernel can tell them apart from a truly malformed
opcode.

Current status: the kernel dispatches `CAP_OP_RELEASE` as a local cap-table
slot removal and fails closed for stale or non-owned cap IDs. `capos-rt`
bootstrap handles remain explicitly non-owning, while adopted owned handles
queue `CAP_OP_RELEASE` on final drop and expose `Runtime::flush_releases()` for
callers that need to force the queued releases. Result-cap adoption validates
the kernel-supplied interface ID before producing an owned typed handle.
`CAP_OP_FINISH` remains reserved and returns `CAP_ERR_UNSUPPORTED_OPCODE`.
Process exit remains the fallback cleanup path for unreleased local slots.

Queued release is not immediate revocation. A dropped runtime handle no longer
provides local typed access in that runtime, but the kernel cap-table slot is
removed only after the release SQE is flushed and processed, or during process
exit cleanup. Security-sensitive flows that need to invalidate authority for
other holders or peers must use explicit revoke/epoch semantics such as
`CapabilityManager.revoke`, session expiry, object epochs, or service-specific
close/revoke methods; they must not rely on destructor timing.
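The gap between drop and kernel-side removal is easy to see in a small model. The sketch below is not the `capos-rt` API: `ReleaseQueue`, `OwnedHandle`, and `adopt` are hypothetical names, and `flush` returns the queued IDs where a real runtime would submit release SQEs.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for the runtime's pending-release queue.
#[derive(Clone, Default)]
pub struct ReleaseQueue(Rc<RefCell<Vec<u32>>>);

impl ReleaseQueue {
    /// Drain queued cap IDs; a real runtime would submit one
    /// release operation per ID through the ring at this point.
    pub fn flush(&self) -> Vec<u32> {
        std::mem::take(&mut *self.0.borrow_mut())
    }

    /// Number of releases queued but not yet flushed.
    pub fn pending(&self) -> usize {
        self.0.borrow().len()
    }
}

/// Owned handle: dropping it only queues a release.
pub struct OwnedHandle {
    cap_id: u32,
    queue: ReleaseQueue,
}

impl OwnedHandle {
    pub fn adopt(cap_id: u32, queue: &ReleaseQueue) -> Self {
        Self { cap_id, queue: queue.clone() }
    }
}

impl Drop for OwnedHandle {
    fn drop(&mut self) {
        // No kernel interaction here: the slot stays live until flush.
        self.queue.0.borrow_mut().push(self.cap_id);
    }
}
```

Between `drop` and `flush` the model's queue is non-empty, which corresponds to the window in which the kernel slot still exists. That window is why destructor timing cannot serve as revocation.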

Session expiry is also not a substitute for every revocation shape. The target
session lifecycle model has separate layers:

- a mutable session liveness cell for `live`, `logged_out`, `revoked`,
  `expired`, and `recovery_only` state behind the immutable process
  `SessionContext`;
- broker grant leases for bundle fragments and elevated or temporary caps;
- object/facet epochs for invalidating a live target generation.

Renewal acts on the first two layers. It may extend session liveness or mint
fresh grant leases, but it must not make old ordinary grants fresh merely
because the session renewed. Object/facet revocation remains an independent
target-side operation.

Service authors should make this distinction explicit in protocol design:

- Use ordinary handle drop or runtime `flush_releases()` only to stop this
  process from using one local cap slot.
- Use a service `close` method when the service must observe application-level
  shutdown, flush durable state, or publish an orderly end-of-session result.
- Use `CapabilityManager.revoke`, session expiry, object epochs, or a
  service-specific revoke method when existing peers or delegated holders must
  lose authority before the service proceeds.
- Treat destructor/finalizer timing as advisory cleanup. It is not a security
  boundary, and it is not proof that another process has stopped using a cap.

## Stale-Handle and Revoke Patterns

Not all kernel cap families use the same model for handling stale or revoked
capabilities. The correct pattern depends on the semantics of the object, not on
a blanket epoch test. Using the wrong model produces incorrect tests or
incorrect behavior expectations.

### Category A — Exception-based stale guard

The cap exposes an `ensure_*_live` guard or an equivalent consumed-state check
that returns a stable typed exception (not a silent success) on a stale or
consumed cap.

- `UserSession` (`kernel/src/cap/user_session.rs`): `info()`/`auditContext()`
  fail closed with a stable exception message after `logout()`; second `logout`
  is idempotent. Proved by `run-ssh-public-key-session`.
- `SchedulingContext`, `CpuIsolationLease`: expose an explicit `revoke` method
  returning `staleGeneration`. Subsequent `info`, `bind_caller_thread`,
  `activation_preflight`, `create`, and `drain_notifications` calls fail closed
  on the stale cap. Proved by `run-scheduling-context`
  (`demos/scheduling-context-smoke/src/main.rs:285-313, 1129-1141`) and
  `run-scheduler-cpu-isolation-lease`
  (`demos/cpu-isolation-lease-smoke/src/main.rs:201-237`).
- `ThreadHandle` (`kernel/src/cap/thread_handle.rs`): `join`
  (`sched.rs:1038-1057`) returns `AlreadyJoined` on the second call (hard fail,
  not silent success) and returns `TargetNotLive`
  (`sched.rs:1371,1377,1385`) if the thread record is absent post-cleanup.
  `exitCode` (`sched.rs:1418-1420`) is a non-consuming idempotent read. The
  `join_or_register` consumed-state check is the stale guard; the `joined` flag
  is the epoch. Proved by `run-thread-lifecycle`
  (`demos/thread-lifecycle/src/main.rs:293-298`).

Per-cap epoch tests are applicable only to Category A caps.

### Category B — Idempotent-stale-target

The cap returns silent success (or a latched result) on a stale target. No
`ensure_*_live` guard is present by design.

- `ProcessHandle` (`kernel/src/cap/process_spawner.rs`): `terminate` on an
  already-exited process returns `Complete(0)`; `wait` re-reads the latched exit
  code. Writing fail-closed tests for Category B caps would test the opposite of
  intended behavior.

### Category C — Soft-EOF / zero-write

The cap uses v0 `ExceptionType` policy: closing one side causes the other to
drain and receive EOF; writes return zero bytes rather than an error.

- `Pipe` (`kernel/src/cap/pipe.rs`): `close` causes read to drain + EOF, write
  returns zero bytes (schema lines 2429-2433). No epoch test needed.

### Category D — No revoke verb (kernel singletons)

These caps expose no `revoke` or `close` method in the schema. The backing
object lives for the process lifetime.

`CredentialStore`, `AuthorizedKeyStore`, `SshHostKey`, `EntropySource`,
`SystemInfo`, `AuditLog`, `HardwareAuditLog`, `SessionManager`,
`AuthorityBroker`, `RestrictedLauncher`, `BootPackage`. Nothing to test for
stale-handle behavior.

### Category E — DDF caps with release/scrub semantics

These caps use internal handle epoch validation. The full stale-handle behavior
for each requires targeted per-cap investigation when a behavior gap is
identified.

`DmaBuffer`, `DeviceMmio`, `Interrupt`.

### Open residuals

- **UserSession expiry path (Category A)**: the `expiresAtMs`/`anonymousMs`-
  driven expiry path is not yet covered by a focused smoke.
  `run-ssh-public-key-session` covers the explicit `logout()` close-side path.
  `run-session-context` is currently flaky on TCG-only hosts, so it needs a
  stability fix before that smoke can be strengthened.

## Access Control: Interfaces, Not Rights Bitmasks

capOS deliberately does **not** use a rights bitmask (READ/WRITE/EXECUTE) on
capability entries, despite this being standard in Zircon and seL4. The reason
is that Cap'n Proto typed interfaces already serve as the access control
mechanism, and a parallel rights system creates an impedance mismatch.

**Why rights bitmasks exist in other systems:** Zircon and seL4 use rights
because their syscall interfaces are untyped -- a handle is an opaque reference
to a kernel object, and the kernel needs something to decide which fixed
syscalls are allowed. capOS has typed interfaces where the `.capnp` schema
defines exactly what methods exist.

**capOS's approach: the interface IS the permission.** To restrict what a
caller can do, grant a narrower capability:

- `Fetch` (full HTTP) → `HttpEndpoint` (scoped to one origin)
- `Store` (read-write) → `Store` wrapper that rejects write methods
- `Namespace` (full) → `Namespace` scoped to a prefix

The "restricted" capability is a different `CapObject` implementation that
wraps the original. The kernel doesn't know or care -- it dispatches to
whatever `CapObject` is in the slot. Attenuation is userspace/schema logic,
not a kernel mechanism.
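A read-only wrapper illustrates the pattern. The `Store` trait below is a simplified, hypothetical stand-in rather than the capOS `Store` schema or the `CapObject` trait; `MemStore` and `ReadOnly` are likewise invented for the sketch.

```rust
use std::collections::HashMap;

/// Hypothetical simplified store interface.
pub trait Store {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    fn put(&mut self, key: &str, value: &[u8]) -> Result<(), &'static str>;
}

/// Full-authority in-memory implementation.
#[derive(Default)]
pub struct MemStore(HashMap<String, Vec<u8>>);

impl Store for MemStore {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.0.get(key).cloned()
    }
    fn put(&mut self, key: &str, value: &[u8]) -> Result<(), &'static str> {
        self.0.insert(key.to_string(), value.to_vec());
        Ok(())
    }
}

/// Attenuating wrapper: same interface, but write methods fail closed.
pub struct ReadOnly<S: Store>(pub S);

impl<S: Store> Store for ReadOnly<S> {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.0.get(key)
    }
    fn put(&mut self, _key: &str, _value: &[u8]) -> Result<(), &'static str> {
        Err("read-only")
    }
}
```

The dispatcher needs no rights check: a holder of the `ReadOnly` object simply has no path to the inner `put`.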

**Session transfer scope:** capability holds now carry reference-level transfer
scope. `same_session` caps cannot move into another process session through
raw IPC, endpoint return, or spawn grants. `cross_session_shareable` caps may
cross and then invoke under the receiver process session. `service_regrant_only`
caps require a trusted fixed-session broker/launcher path. These meta-rights
are about the reference, not the referenced object, and do not overlap with
interface-level method access control.
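The three scopes reduce to a small decision function. This sketch assumes sessions are compared by a numeric ID and that `service_regrant_only` moves only through a trusted broker or launcher path; the type and function names are hypothetical, and the kernel's actual checks may consider more state.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TransferScope {
    SameSession,
    CrossSessionShareable,
    ServiceRegrantOnly,
}

/// Decide whether a cap reference may move from sender to receiver.
/// `via_trusted_broker` models the fixed-session broker/launcher path.
pub fn transfer_allowed(
    scope: TransferScope,
    sender_session: u64,
    receiver_session: u64,
    via_trusted_broker: bool,
) -> bool {
    match scope {
        // Never leaves the session it was granted in.
        TransferScope::SameSession => sender_session == receiver_session,
        // May cross; later calls run under the receiver's session.
        TransferScope::CrossSessionShareable => true,
        // Assumption: only a trusted regrant path may move it.
        TransferScope::ServiceRegrantOnly => via_trusted_broker,
    }
}
```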

**Non-writable filesystem caps are forwardable to a same-session child;
writable caps are not.** `Directory`/`File` caps are minted `Copy`/`same_session`
at the read-only and RAM mint sites, so a holder can forward an opened directory
or file to a `ProcessSpawner.spawn` child within the same session -- the kernel
handoff that backs POSIX fd inheritance across fork/execve. The security
argument is the same for all of them: the child gains no authority the parent
does not already hold, `same_session` keeps the cap from escaping the session,
and the spawn-grant epoch wrapper keeps a forwarded child cap from outliving a
revoked parent. Two flavours exist:

- **Read-only views** -- the read-only filesystem (`readonly_fs`) and the
  packaged-image source (`installable_image`), plus their `read_only_fs_root`/
  `installable_image_source` bootstrap roots. Their interfaces fail closed on
  every mutation, so forwarding shares a pure read view. Here *the interface is
  the permission* makes the share unambiguously benign.
- **The holder's own RAM scratch namespace** -- the `directory::transfer_result_cap`
  results and the `kernel:directory`/`kernel:file` bootstrap sources (via
  `boot_cap_hold`). This `Directory`/`File` interface includes mutation methods,
  so the forwarded cap is shared *read/write* with the child, not a read view.
  It is still safe to forward because it is the parent's own scratch tree shared
  within one session, not a privilege the parent lacked.

The disk-backed *writable* filesystem (`writable_fs`) is a distinct `CapObject`
type minted `NonTransferable`: a writable cap carries the filesystem-wide
single-writer claim, so forwarding it would let two processes hold that claim.
The `ProcessSpawner` Raw/Move grant modes reject a `NonTransferable` source, so
the single-writer policy is preserved by the mint-time mode rather than a
separate check. Proven by `make run-spawn-grant-directory`.

**`TerminalSession` is forwardable to a same-session child, parent-retained.**
The bootstrap `TerminalSession` cap is minted `Copy`/`same_session` (matching
`Console`) in `boot_cap_hold`, so a holder can forward its terminal-backed
stdout/stderr to a `ProcessSpawner.spawn` child without losing its own terminal.
`TerminalSessionCap` is a stateless unit struct: `write`/`writeLine` dispatch
onto the shared kernel terminal and `readLine` resolves the caller's session
context per call (`requires_live_caller_session` stays true), so there is no
per-session ownership state to strip on a forward. The child gains no terminal
authority the parent did not already hold, and `same_session` keeps the cap from
escaping the session. This is the non-destructive capability-model realization of
POSIX "all children share the controlling tty"; the prior
`Move`/`service_regrant_only` mint was a policy default, not a state-ownership
requirement, and a destructive `Move` would have stripped a shell of its terminal
on its first child spawn under full fd inheritance. Two writers reaching the same
terminal serialize at the shared kernel UART; sub-line interleaving between a
parent and a child writing concurrently is an accepted research-surface
limitation, not an authority leak. Proven by `make run-posix-terminal-forward`.

See the [research survey](research/capability-systems-survey.md) (§1 Capability
Table Design) for the cross-system analysis that led to the decision to use
interfaces rather than rights bitmasks.

## Planned Enhancements (from research)

Tracked in [`docs/roadmap.md`](roadmap.md) Stages 5-6:

- **Legacy badge / receiver selector** -- the current storage field is a `u64`
  per capability hold edge, delivered to endpoint servers on invocation.
  Existing code still calls it a badge because it began as seL4-style client
  identity metadata. The active model keeps that field out of service identity:
  new service capabilities should use one immutable process session, broker-granted
  service roots/facets, privacy-preserving endpoint caller-session metadata,
  and explicit subject disclosure plus a matching disclosure scope when a
  service needs more than an opaque service-scoped session reference.
- **Epoch** (from EROS) -- per-object revocation epoch. Incrementing the
  epoch invalidates all outstanding references. O(1) revoke, O(1) check.
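The O(1) revoke and O(1) check of the epoch idea can be sketched with a shared counter. This is a generic illustration of the technique, not the capOS object-epoch implementation; `ObjectEpoch`, `EpochRef`, and `mint` are hypothetical names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Per-object revocation epoch.
pub struct ObjectEpoch(AtomicU64);

/// A reference that remembers the epoch it was minted under.
pub struct EpochRef {
    epoch: Arc<ObjectEpoch>,
    minted_at: u64,
}

impl ObjectEpoch {
    pub fn new() -> Arc<Self> {
        Arc::new(Self(AtomicU64::new(0)))
    }

    /// O(1) revoke: one increment invalidates every outstanding reference.
    pub fn revoke_all(&self) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Mint a reference stamped with the object's current epoch.
pub fn mint(epoch: &Arc<ObjectEpoch>) -> EpochRef {
    EpochRef {
        epoch: Arc::clone(epoch),
        minted_at: epoch.0.load(Ordering::SeqCst),
    }
}

impl EpochRef {
    /// O(1) check: live only while the object's epoch is unchanged.
    pub fn is_live(&self) -> bool {
        self.epoch.0.load(Ordering::SeqCst) == self.minted_at
    }
}
```

No list of outstanding references is needed: revocation cost is independent of how many references were handed out.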

## Current Limitations

- **Process-ring blocking remains process-level; private ParkSpace waits are
  per-thread.**
  `cap_enter(min_complete, timeout_ns)` processes pending SQEs and can block
  one admitted thread per process until enough CQEs exist or a finite timeout
  expires. That ring wait is still process-owned and does not make the
  capability ring itself a per-thread completion queue. Separately, the
  implemented private ParkSpace path provides process-local per-thread
  wait/wake on userspace words through compact `CAP_OP_PARK`/`CAP_OP_UNPARK`
  operations. SharedParkSpace park-words and runtime safe park clients remain
  future work.
- **No persistence.** Capabilities exist only at runtime.
- **Capability transfer is implemented for Endpoint CALL/RECV/RETURN.**
  Transfer descriptors on the capability ring let callers and receivers copy or
  move transferable local caps through IPC messages. Delivery also enforces the
  cap hold's session transfer scope; an unsupported cross-session transfer
  fails with `CAP_ERR_TRANSFER_NOT_SUPPORTED` and is reported to the caller
  instead of being requeued to the endpoint. See
  [storage-and-naming-proposal.md](proposals/storage-and-naming-proposal.md)
  "IPC and Capability Transfer" for the full design.
- **Transfer ABI (3.6.0 draft).** Sideband transfer descriptors are defined in
  `capos-config/src/ring.rs` as `CapTransferDescriptor`:
  - `cap_id` is the sender-side local capability-table handle.
  - `transfer_mode` is either `CAP_TRANSFER_MODE_COPY` or
    `CAP_TRANSFER_MODE_MOVE`.
  - `xfer_cap_count` in `CapSqe` is the descriptor count.
  - For CALL/RETURN, descriptors are packed at `addr + len` after the payload bytes
    and must be aligned to `CAP_TRANSFER_DESCRIPTOR_ALIGNMENT`.
  - Result-cap insertion semantics are defined by `CapCqe`:
    `result` reports normal payload bytes, while `cap_count` reports how many
    `CapTransferResult { cap_id, interface_id }` records were appended
    immediately after those payload bytes in `result_addr` when
    `CAP_CQE_TRANSFER_RESULT_CAPS` is set. User space must bound-check
    `result + cap_count * CAP_TRANSFER_RESULT_SIZE` against its requested
    `result_len`.
  - Future promise pipelining must target that sideband result-cap namespace:
    `pipeline_dep` names a process-local promised answer, and `pipeline_field`
    is a zero-based `CapTransferResult` record index in that answer's completion.
    It is not a Cap'n Proto schema field number; the kernel must not traverse
    opaque result payload bytes to find a capability.
  - Transfer-bearing SQEs are fail-closed:
    - unsupported transfer scope or object class:
      `CAP_ERR_TRANSFER_NOT_SUPPORTED`,
    - malformed descriptor metadata (invalid mode, reserved bits, non-zero
      `_reserved0`, misalignment, overflow): `CAP_ERR_INVALID_TRANSFER_DESCRIPTOR`,
    - all other reserved-field misuse remains `CAP_ERR_INVALID_REQUEST`.
- **Revocation propagates through object epochs.** `CapabilityManager.revoke`
  invalidates child-local grant copies for the revoked object, and the ring
  maps revoked ordinary and endpoint use to typed `Disconnected` exceptions
  where a result buffer exists. Broader supervision/restart policy remains
  future work.
- **MemoryObject is the mapped bulk-data substrate.** `FrameAllocator` returns
  owned `MemoryObject` result caps instead of raw physical addresses. The
  object exposes metadata plus caller-local map/unmap/protect operations for
  page-aligned ranges. File I/O, networking, GPU data planes, and zero-copy IPC
  still need service-level SharedBuffer operations built on this substrate.
  See [storage-and-naming-proposal.md](proposals/storage-and-naming-proposal.md)
  "Shared Memory for Bulk Data" for the broader interface design.
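The result-cap bound check described in the Transfer ABI item can be written with overflow-safe arithmetic. This is a sketch: the record size of 16 bytes is an assumption made for illustration (the real value is whatever `CAP_TRANSFER_RESULT_SIZE` is defined as), and `result_cap_range` is a hypothetical helper.

```rust
use core::ops::Range;

/// Assumed record size for illustration only.
pub const CAP_TRANSFER_RESULT_SIZE: usize = 16;

/// Byte range of the result-cap records inside the result buffer, or `None`
/// if the completion claims more than the requested `result_len` can hold.
///
/// `payload_len` is the CQE `result` value (normal payload bytes);
/// records are appended immediately after the payload.
pub fn result_cap_range(
    payload_len: usize,
    cap_count: usize,
    result_len: usize,
) -> Option<Range<usize>> {
    let record_bytes = cap_count.checked_mul(CAP_TRANSFER_RESULT_SIZE)?;
    let end = payload_len.checked_add(record_bytes)?;
    if end > result_len {
        return None;
    }
    Some(payload_len..end)
}
```

Using `checked_mul` and `checked_add` keeps a hostile or corrupted `cap_count` from wrapping the sum into an in-bounds value.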

## Future Directions

- **Broader capability-bearing services.** Endpoint CALL/RECV/RETURN already
  carry copy/move sideband transfer descriptors and install result caps in the
  receiver's local table. Remaining work is to use that transport in higher
  service layers: capability-bearing naming and persistence services,
  Directory/File and Namespace-style object models, promise pipelining over
  result-cap indexes, and policy for durable references. See
  [storage-and-naming-proposal.md](proposals/storage-and-naming-proposal.md).
- **Persistence.** Persistent object references should be restored through a
  capability-bearing naming or persistence service that can authorize the
  request and mint a fresh live object. Do not serialize local cap-table
  handles, endpoint generations, receiver selectors, or server cookies as
  durable authority.
- **Network transparency.** Remote capability transport should use
  connection-local export/import tables and explicit disconnect semantics. A
  remote Console capability can expose the same typed interface as a local one,
  but the portable authority is the live object reference, not a global URL or
  serialized local routing selector.
