# Capability Model
How capabilities work in capOS.
Status: Partially implemented. Generation-tagged cap tables, typed schema interface IDs, manifest/CapSet grants, badges, transport-level release, and Endpoint copy/move transfer are implemented. Revocation propagation, persistence, and bulk-data capabilities remain future work.
## What is a Capability
A capability in capOS is a reference to a kernel object that carries:
- An interface (what methods can be called), defined by a Cap’n Proto schema
- A permission (the object it references, enforced by the kernel)
- A wire format (Cap’n Proto serialized messages for all invocations)
A process can only access a resource if it holds a capability to it. There is no ambient authority – no global namespace, no “open by path” syscall, no implicit resource access.
## Schema as Contract
Capability interfaces are defined in .capnp schema files under schema/.
The schema is the canonical interface definition. Currently defined:
```capnp
interface Console {
  write @0 (data :Data) -> ();
  writeLine @1 (text :Text) -> ();
}

interface FrameAllocator {
  allocFrame @0 () -> (physAddr :UInt64);
  freeFrame @1 (physAddr :UInt64) -> ();
  allocContiguous @2 (count :UInt32) -> (physAddr :UInt64);
}

interface VirtualMemory {
  map @0 (hint :UInt64, size :UInt64, prot :UInt32) -> (addr :UInt64);
  unmap @1 (addr :UInt64, size :UInt64) -> ();
  protect @2 (addr :UInt64, size :UInt64, prot :UInt32) -> ();
}

interface Endpoint {}

interface ProcessSpawner {
  spawn @0 (name :Text, binaryName :Text, grants :List(CapGrant)) -> (handleIndex :UInt16);
}

interface ProcessHandle {
  wait @0 () -> (exitCode :Int64);
}

interface BootPackage {
  manifestSize @0 () -> (size :UInt64);
  readManifest @1 (offset :UInt64, maxBytes :UInt32) -> (data :Data);
}

# Management-only introspection. Ordinary handle release uses the system
# transport opcode CAP_OP_RELEASE, not a method here.
interface CapabilityManager {
  list @0 () -> (capabilities :List(CapabilityInfo));
  # grant is planned for Stage 6 (IPC and Capability Transfer)
}
```
Each interface has a unique 64-bit TYPE_ID generated by the Cap’n Proto
compiler. TYPE_ID is the schema constant. interface_id is the runtime
metadata used by CapSet/bootstrap descriptions and endpoint delivery headers.
Method dispatch uses the interface assigned to the capability entry plus
method_id; method_id selects a method inside that schema.
Interface identity is not capability identity. A CapId is the authority-bearing
handle in a process table, analogous to an fd. Multiple capabilities can expose
the same interface:
- cap_id=3 → serial-backed Console
- cap_id=4 → log-buffer-backed Console
- cap_id=5 → Console proxy served by another process
All three use the same Console TYPE_ID, but they are different objects
with different authority. The manifest/CapSet should record the expected schema
TYPE_ID as interface metadata for typed handle construction. Normal CALL SQEs
do not need to repeat it because the kernel or serving transport can derive it
from the target capability entry. CapSqe keeps reserved tail padding for ABI
stability.
The kernel exposes the initial CapSet to each process as a read-only
4 KiB page mapped at capos_config::capset::CAPSET_VADDR and passes its
address in RDX to _start. The page starts with a
CapSetHeader { magic, version, count } and is followed by
CapSetEntry { cap_id, name_len, interface_id, name: [u8; 32] } records
in manifest declaration order. Userspace looks up caps by the manifest
name rather than by numeric index (capos_config::capset::find), so
grants can be reordered in system.cue without breaking clients. The
mapping is installed without WRITABLE so userspace cannot mutate its
own bootstrap authority map.
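The page layout above can be sketched as plain structs plus the name-based lookup. This is a sketch under stated assumptions: the exact field widths are not specified in the text, and the real definitions live in capos_config::capset.

```rust
// Sketch of the bootstrap CapSet page described above. Exact field
// widths are assumptions beyond what the text states; only the
// header/entry contents and the name-based lookup mirror the doc.
#[repr(C)]
pub struct CapSetHeader {
    pub magic: u32,
    pub version: u32,
    pub count: u32,
}

#[repr(C)]
pub struct CapSetEntry {
    pub cap_id: u32,
    pub name_len: u8,
    pub interface_id: u64,
    pub name: [u8; 32],
}

/// Name-based lookup in the spirit of capos_config::capset::find:
/// clients match on the manifest name, never on numeric index, so
/// grants can be reordered in system.cue without breaking them.
pub fn find<'a>(entries: &'a [CapSetEntry], name: &str) -> Option<&'a CapSetEntry> {
    entries.iter().find(|e| {
        let len = (e.name_len as usize).min(e.name.len());
        &e.name[..len] == name.as_bytes()
    })
}
```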
Security invariant: a CapTable entry exposes one public interface. If the
same backing state must be available through multiple interfaces, mint multiple
capability entries, each wrapping the same state with a narrower interface.
Do not grant one handle that accepts unrelated interface_id values; that
makes hidden authority easy to miss during review.
## Invocation Path
Capabilities are invoked via a shared-memory capability ring (io_uring-inspired). Each process has a submission queue (SQ) and completion queue (CQ) mapped into its address space. Two invocation paths exist:
```text
Caller builds capnp params message
  → serialize to bytes (write_message_to_words)
  → write CALL SQE to SQ ring (pure userspace memory write)
  → advance SQ tail
  → caller invokes cap_enter for ordinary capability methods
    (timer polling only runs explicitly interrupt-safe CALL targets)
  → kernel reads SQE, validates user buffers
  → CapTable.call(cap_id, method_id, bytes)
  → kernel writes CQE to CQ ring
  ... caller reads CQE after cap_enter, or spin-polls only for
      interrupt-safe/non-CALL ring work ...
  → caller reads CQE result
```
CapObject::call does not receive a caller-supplied interface ID. The cap
table derives the invoked interface from the target entry before invoking the
object. The SQE carries only the capability handle and method ID because each
capability entry owns one public interface:
```rust
pub trait CapObject: Send + Sync {
    fn interface_id(&self) -> u64;
    fn label(&self) -> &str;
    fn call(
        &self,
        method_id: u16,
        params: &[u8],
        result: &mut [u8],
        reply_scratch: &mut dyn ReplyScratch,
    ) -> capnp::Result<CapInvokeResult>;
}
```
All communication goes through serialized capnp messages, even when caller and callee are in the same address space. This ensures the wire format is always exercised and makes the transition to cross-address-space IPC seamless.
The result buffer is supplied by the caller (the user-validated SQE result
region). Implementations serialize directly into it and return the number of
bytes written, so the kernel’s dispatch path does not allocate an intermediate
Vec<u8> per invocation.
## Capability Table
Each process has its own capability table (CapTable), created at process
startup. The kernel also maintains a global table (KERNEL_CAPS) for
kernel-internal use. Each table maps a CapId (u32) to a boxed CapObject.
CapId encoding: [generation:8 | index:24]. The generation counter increments
when a slot is freed, so stale CapIds (from a previous occupant of the slot)
are rejected with CapError::StaleGeneration rather than accidentally
referring to a different capability.
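The encoding and the stale-generation check can be sketched directly from the `[generation:8 | index:24]` layout above; the helper names here are illustrative, not the real kernel API.

```rust
// Sketch of the [generation:8 | index:24] CapId encoding described
// above. Constant and function names are illustrative.
const INDEX_BITS: u32 = 24;
const INDEX_MASK: u32 = (1 << INDEX_BITS) - 1;

fn encode(generation: u8, index: u32) -> u32 {
    debug_assert!(index <= INDEX_MASK);
    ((generation as u32) << INDEX_BITS) | index
}

fn decode(cap_id: u32) -> (u8, u32) {
    ((cap_id >> INDEX_BITS) as u8, cap_id & INDEX_MASK)
}

/// The generation check a table lookup performs: a CapId minted
/// before the slot was freed and reused no longer matches, so stale
/// handles are rejected instead of aliasing the new occupant.
fn lookup_ok(slot_generation: u8, cap_id: u32) -> bool {
    decode(cap_id).0 == slot_generation
}
```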
Operations:
- insert(obj) – register a new capability, returns its CapId
- get(id) – look up a capability by ID (validates generation)
- remove(id) – revoke a capability, bumps slot generation
- call(id, method_id, params) – dispatch a method call against the
  interface assigned to the capability entry
Each service receives capabilities from cap::create_all_service_caps(),
which runs a two-pass resolution over the whole manifest: pass 1 materializes
each service’s kernel-sourced caps as Arc<dyn CapObject> and records its
declared exports; pass 2 assembles each service’s CapTable in declaration
order, cloning the exported Arc when another service’s CapRef resolves
via CapSource::Service. Declaration order is preserved because numeric
CapIds are assigned by insertion order and smoke tests depend on specific
indices. CapRef.source is a structured capnp union, not an authority string:
```capnp
struct CapRef {
  name @0 :Text;
  expectedInterfaceId @1 :UInt64;
  union {
    unset @2 :Void;  # invalid; keeps omitted sources fail-closed
    kernel @3 :KernelCapSource;
    service @4 :ServiceCapSource;
  }
}

enum KernelCapSource {
  console @0;
  endpoint @1;
  frameAllocator @2;
  virtualMemory @3;
}

struct ServiceCapSource {
  service @0 :Text;
  export @1 :Text;
}
```
The source selector chooses the object or authority to grant. The
expectedInterfaceId value is a schema compatibility check against the
constructed object, not the authority selector itself. This distinction matters
because different objects can implement the same interface.
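The two-pass resolution can be sketched as follows. This is a simplified sketch: the real logic lives in cap::create_all_service_caps, and the manifest shape, `Source`/`Grant` types, and the `Arc<String>` stand-in object are assumptions for illustration.

```rust
// Simplified two-pass grant resolution. Pass 1 materializes
// kernel-sourced caps; pass 2 builds each table in declaration order,
// cloning the exporting service's Arc for service-sourced refs.
use std::collections::HashMap;
use std::sync::Arc;

type Obj = Arc<String>; // stand-in for Arc<dyn CapObject>

enum Source {
    Kernel(&'static str), // e.g. "console", "endpoint"
    Service { service: &'static str, export: &'static str },
}

struct Grant {
    name: &'static str,
    source: Source,
}

fn resolve(
    manifest: &[(&'static str, Vec<Grant>)],
) -> HashMap<&'static str, Vec<(&'static str, Obj)>> {
    // Pass 1: materialize kernel-sourced caps, recorded under
    // (service, name) so other services' CapRefs can find them.
    let mut exports: HashMap<(&str, &str), Obj> = HashMap::new();
    for (svc, grants) in manifest {
        for g in grants {
            if let Source::Kernel(kind) = &g.source {
                exports.insert((*svc, g.name), Arc::new(kind.to_string()));
            }
        }
    }
    // Pass 2: assemble each table in declaration order (numeric CapIds
    // follow insertion order), cloning the exported Arc when a grant
    // resolves via another service's export.
    let mut tables = HashMap::new();
    for (svc, grants) in manifest {
        let table: Vec<(&'static str, Obj)> = grants
            .iter()
            .map(|g| {
                let obj = match &g.source {
                    Source::Kernel(_) => exports[&(*svc, g.name)].clone(),
                    Source::Service { service, export } => exports[&(*service, *export)].clone(),
                };
                (g.name, obj)
            })
            .collect();
        tables.insert(*svc, table);
    }
    tables
}
```

Note how a service-sourced grant ends up as a clone of the exporting service's `Arc`: both tables reference the same backing object, which is exactly why `expectedInterfaceId` is a compatibility check rather than the authority selector.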
## Transport-Level Capability Lifetime
Cap’n Proto applications do not usually model capability lifetime as an application method on every interface. The RPC transport owns capability reference bookkeeping.
The standard Cap’n Proto RPC protocol is stateful per connection. Each side
keeps four tables: questions, answers, imports, and exports. Import/export IDs
are connection-local, not global object names. When an exported capability is
sent over the connection, the export reference count is incremented. When the
importing side drops its last local reference, the transport sends Release
to decrement the remote export count. Implementations may batch these releases.
If the connection is lost, in-flight questions fail, imports become broken, and
exports/answers are implicitly released. Persistent capabilities, when
implemented, are a separate SturdyRef mechanism and should not be treated as
owned pointers.
This distinction matters for capOS:
- close() is application protocol. A File.close() method can flush dirty
  state, commit metadata, or tell a server that a session should end.
- Release / cap drop is transport protocol. It removes one reference from
  the caller's local capability namespace and eventually lets the serving
  side reclaim the object if no references remain.
- Process exit is bulk transport cleanup. Dropping the process must release
  all caps in its table, cancel pending calls, and wake peers waiting on
  those calls.
capOS therefore needs a system transport layer in the userspace runtime
(capos-rt / later language runtimes), not just raw SQE helpers. That transport
should own typed client handles, local reference counts, promise-pipelined
answers, and broken-cap state. When the last local handle is dropped, it should
queue a transport-level release operation that is flushed through the kernel
ring at an explicit runtime boundary.
Ordinary handle release is a transport concern, not an application method.
The target design: the generated client drops the last local handle
(RAII / GC / finalizer), the runtime transport queues CAP_OP_RELEASE, an
explicit runtime flush or later ring-client boundary submits it, and the kernel
removes the caller’s CapTable slot with mutable access to that table.
Encoding release as a
regular method call on CapabilityManager was rejected because it would
mutate the same table used to dispatch the call; CapabilityManager is
therefore management-only (list(), later grant()), not the default
release path. CAP_OP_FINISH remains reserved in the same transport opcode
namespace for application-level “end of work” signals that the transport must
deliver reliably, so the kernel can tell them apart from a truly malformed
opcode.
Current status: the kernel dispatches CAP_OP_RELEASE as a local cap-table
slot removal and fails closed for stale or non-owned cap IDs. capos-rt
bootstrap handles remain explicitly non-owning, while adopted owned handles
queue CAP_OP_RELEASE on final drop and expose Runtime::flush_releases() for
callers that need to force the queued releases. Result-cap adoption validates
the kernel-supplied interface ID before producing an owned typed handle.
CAP_OP_FINISH remains reserved and returns CAP_ERR_UNSUPPORTED_OPCODE.
Process exit remains the fallback cleanup path for unreleased local slots.
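The drop-then-flush shape of this path can be sketched as below. This is a hedged sketch, not the capos-rt implementation: the `Runtime` and `OwnedHandle` types are illustrative, and the stand-in `flush_releases` only drains the queue where the real one submits CAP_OP_RELEASE SQEs through the ring.

```rust
// RAII release sketch: the final drop of an owned handle queues a
// release; an explicit runtime flush later submits the queue.
use std::sync::{Arc, Mutex};

struct Runtime {
    pending_releases: Mutex<Vec<u32>>, // cap ids awaiting CAP_OP_RELEASE
}

impl Runtime {
    fn queue_release(&self, cap_id: u32) {
        self.pending_releases.lock().unwrap().push(cap_id);
    }

    /// Stand-in for Runtime::flush_releases(): the real one submits
    /// CAP_OP_RELEASE SQEs through the kernel ring; here we just drain.
    fn flush_releases(&self) -> Vec<u32> {
        std::mem::take(&mut *self.pending_releases.lock().unwrap())
    }
}

/// An owned typed handle. Bootstrap handles are explicitly non-owning
/// and would simply not implement Drop this way.
struct OwnedHandle {
    cap_id: u32,
    rt: Arc<Runtime>,
}

impl Drop for OwnedHandle {
    fn drop(&mut self) {
        // Last local reference gone: release is a transport concern,
        // so queue it rather than calling an application method.
        self.rt.queue_release(self.cap_id);
    }
}
```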
## Access Control: Interfaces, Not Rights Bitmasks
capOS deliberately does not use a rights bitmask (READ/WRITE/EXECUTE) on capability entries, despite this being standard in Zircon and seL4. The reason is that Cap’n Proto typed interfaces already serve as the access control mechanism, and a parallel rights system creates an impedance mismatch.
Why rights bitmasks exist in other systems: Zircon and seL4 use rights
because their syscall interfaces are untyped – a handle is an opaque reference
to a kernel object, and the kernel needs something to decide which fixed
syscalls are allowed. capOS has typed interfaces where the .capnp schema
defines exactly what methods exist.
capOS’s approach: the interface IS the permission. To restrict what a caller can do, grant a narrower capability:
- Fetch (full HTTP) → HttpEndpoint (scoped to one origin)
- Store (read-write) → Store wrapper that rejects write methods
- Namespace (full) → Namespace scoped to a prefix
The “restricted” capability is a different CapObject implementation that
wraps the original. The kernel doesn’t know or care – it dispatches to
whatever CapObject is in the slot. Attenuation is userspace/schema logic,
not a kernel mechanism.
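A wrapper of this kind can be sketched in a few lines. The simplified trait signature and the method-id assignments (0 = read, 1 = write) are assumptions for illustration, not the capOS ABI:

```rust
// Attenuation by wrapping: the "restricted" capability is a different
// CapObject that narrows the original. The dispatcher neither knows
// nor cares that this object is an attenuator.
trait CapObject {
    fn call(&self, method_id: u16, params: &[u8]) -> Result<Vec<u8>, &'static str>;
}

/// Read-only wrapper: forwards everything except the write method.
struct ReadOnly<T: CapObject>(T);

impl<T: CapObject> CapObject for ReadOnly<T> {
    fn call(&self, method_id: u16, params: &[u8]) -> Result<Vec<u8>, &'static str> {
        match method_id {
            1 => Err("write rejected by read-only attenuation"), // hypothetical write id
            _ => self.0.call(method_id, params),
        }
    }
}
```

Because the wrapper occupies its own table slot, granting it to a client hands over strictly less authority than the original object, with no kernel involvement.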
When transfer control is needed (Stage 6): meta-rights for the capability itself (can it be transferred? duplicated?) may be added as a small bitmask. These are about the reference, not the referenced object, and don’t overlap with interface-level method access control.
See research.md for the cross-system analysis that led to this decision (§1 Capability Table Design).
## Planned Enhancements (from research)
Tracked in ROADMAP.md
Stages 5-6:
- Badge (from seL4) – u64 value per capability entry, delivered to the
  server on invocation. Implemented for manifest cap refs, IPC transfer, and
  ProcessSpawner endpoint-client minting so servers can distinguish callers
  without separate capability objects per client.
- Epoch (from EROS) – per-object revocation epoch. Incrementing the epoch
  invalidates all outstanding references. O(1) revoke, O(1) check.
## Current Limitations
- Blocking wait exists, but waits are still process-level.
  cap_enter(min_complete, timeout_ns) processes pending SQEs and can block
  the current process until enough CQEs exist or a finite timeout expires.
  It is not yet a general futex/thread wait primitive; in-process threading
  and futex-shaped measurements are tracked separately.
- No persistence. Capabilities exist only at runtime.
- Capability transfer is implemented for Endpoint CALL/RECV/RETURN. Transfer descriptors on the capability ring let callers and receivers copy or move transferable local caps through IPC messages. See storage-and-naming-proposal.md “IPC and Capability Transfer” for the full design.
- Transfer ABI (3.6.0 draft). Sideband transfer descriptors are defined in
  capos-config/src/ring.rs as CapTransferDescriptor:
  - cap_id is the sender-side local capability-table handle.
  - transfer_mode is either CAP_TRANSFER_MODE_COPY or CAP_TRANSFER_MODE_MOVE.
  - xfer_cap_count in CapSqe is the descriptor count.
  - For CALL/RETURN, descriptors are packed at addr + len after the payload
    bytes and must be aligned to CAP_TRANSFER_DESCRIPTOR_ALIGNMENT.
  - Result-cap insertion semantics are defined by CapCqe: result reports
    normal payload bytes, while cap_count reports how many
    CapTransferResult { cap_id, interface_id } records were appended
    immediately after those payload bytes in result_addr when
    CAP_CQE_TRANSFER_RESULT_CAPS is set. User space must bound-check
    result + cap_count * CAP_TRANSFER_RESULT_SIZE against its requested
    result_len.
  - Transfer-bearing SQEs are fail-closed:
    - unsupported-by-kernel transfer path: CAP_ERR_TRANSFER_NOT_SUPPORTED
      (until sideband transfer is enabled),
    - malformed descriptor metadata (invalid mode, reserved bits, non-zero
      _reserved0, misalignment, overflow): CAP_ERR_INVALID_TRANSFER_DESCRIPTOR,
    - all other reserved-field misuse remains CAP_ERR_INVALID_REQUEST.
- No revocation propagation. Removing a table entry doesn’t invalidate copies or derived capabilities. Epoch-based revocation is planned.
- No bulk data path. All data goes through capnp message copy. SharedBuffer / MemoryObject capability needed for file I/O, networking, GPU data plane. See storage-and-naming-proposal.md “Shared Memory for Bulk Data” for the interface design.
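The bound check the transfer ABI requires of user space can be sketched with checked arithmetic so that overflow also fails closed. The record size constant here is an assumption (cap_id plus interface_id), not taken from capos-config:

```rust
// Userspace bound check for appended CapTransferResult records.
// CAP_TRANSFER_RESULT_SIZE is an assumed size for illustration.
const CAP_TRANSFER_RESULT_SIZE: u64 = 16;

/// Returns true when `cap_count` records appended after `result`
/// payload bytes still fit within the requested `result_len`.
fn transfer_results_fit(result: u64, cap_count: u64, result_len: u64) -> bool {
    cap_count
        .checked_mul(CAP_TRANSFER_RESULT_SIZE)
        .and_then(|records| result.checked_add(records))
        .map_or(false, |end| end <= result_len)
}
```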
## Future Directions
- Capability transfer. Cross-process capability calls already go through
the kernel via Endpoint objects with RECV/RETURN SQE opcodes on the
existing per-process capability ring (no new syscalls). The remaining
transfer work will carry capability references with sideband descriptors and
install result caps in the receiver’s local table. See
storage-and-naming-proposal.md for how this enables Directory.open()
returning File caps, Namespace.sub() returning scoped Namespace caps, etc.
- Persistence. Serialize capability state to storage using capnp format.
  Restore capabilities across reboots.
- Network transparency. Forward capability calls to remote machines using the same capnp wire format. A remote Console capability looks identical to a local one.