Proposal: Error Handling for Capability Invocations
How capOS communicates errors from capability calls back to userspace processes.
Current design authority now lives in Error Handling. This proposal is retained as the archival decision record and original rationale for the implemented two-level model.
This proposal defines a two-level error model: transport errors (the invocation mechanism itself failed) and application errors (the capability processed the request and returned a structured error). The design aligns with Cap’n Proto’s own exception model and the patterns used by seL4, Zircon, and other capability systems.
Status note: The shared-memory capability ring + cap_enter has replaced cap_call as the invocation surface, and the two-level error model described below is implemented for the current ring, runtime, and endpoint IPC surface. Transport errors arrive as negative CapCqe.result codes (see “Current CQE Error Namespace”); application errors arrive as a serialized CapException with CAP_ERR_APPLICATION_EXCEPTION. The CapException schema and ExceptionType taxonomy live in schema/capos.capnp (enum ExceptionType and struct CapException near the bottom of the schema), the kernel side serializes them through kernel/src/cap/ring.rs (including the INVALID_ARGUMENT_SENTINEL channel for the capOS-only invalidArgument variant), and capos-rt/src/client.rs decodes them into ClientError::Application(ApplicationException).
Related documents:
- docs/architecture/error-handling.md is the current design authority for the implemented error layers.
- docs/architecture/capability-ring.md owns the current ring transport contract that carries the CQE status values.
- docs/proposals/service-architecture-proposal.md captures the cross-process spawn and revoked-endpoint surface that exercises Disconnected and the endpoint RETURN exception flag end-to-end.
- docs/design-risks-register.md records the open contracts that flow into this proposal: R6 (deferred CAP_OP_RELEASE) and R15 (application-exception serialization depends on result-buffer capacity).
- docs/capability-model.md describes the broader capability model the error layers sit inside; this proposal owns only the error model.

The “Problem Statement”, “Syscall Return Convention”, “Kernel Implementation”, “Userspace API”, and “Migration Path” sections below describe the original cap_call-era design that motivated the model. They are kept as historical context; the “Current CQE Error Namespace”, “CapException Schema”, and “Application-Level Errors in Interface Schemas” sections describe current behavior.
Current CQE Error Namespace
The capability ring uses signed 32-bit CapCqe.result values. Non-negative
values are opcode-specific success results; negative values are kernel transport
errors defined in capos-config/src/ring.rs:
| Code | Name | Meaning |
|---|---|---|
| -1 | CAP_ERR_INVALID_REQUEST | Malformed request metadata or an opcode value not reserved in the ABI. |
| -2 | CAP_ERR_INVALID_PARAMS_BUFFER | SQE parameter buffer is unmapped, out of range, or not readable. |
| -3 | CAP_ERR_INVALID_RESULT_BUFFER | SQE result buffer is unmapped, out of range, or not writable. |
| -4 | CAP_ERR_INVOKE_FAILED | Capability lookup or invocation failed before a successful result was produced. |
| -5 | CAP_ERR_UNSUPPORTED_OPCODE | Opcode is reserved in the ABI but not yet dispatched. Currently returned for CAP_OP_FINISH; CAP_OP_RELEASE has kernel dispatch and reports stale/non-owned caps as request/invoke failures. |
| -6 | CAP_ERR_TRANSFER_NOT_SUPPORTED | Transfer mode or sideband descriptor layout is recognized as unsupported by this kernel. |
| -7 | CAP_ERR_INVALID_TRANSFER_DESCRIPTOR | xfer_cap_count descriptor layout malformed or contains reserved bits. |
| -8 | CAP_ERR_TRANSFER_ABORTED | Transaction-in-progress transfer failed and must not produce partial capability state. |
| -9 | CAP_ERR_APPLICATION_EXCEPTION | A structured CapException was serialized into the caller-provided result buffer. |
| -10 | CAP_ERR_APPLICATION_EXCEPTION_TRUNCATED | An application exception occurred, but no detail fit in the available result buffer. |
This is deliberately a small transport namespace. Interface-specific failures should be encoded in the result payload once the target capability successfully handles the request.
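The sketch below shows how a userspace client might classify a CapCqe.result value using the codes from the table. The enum and function names (CqeOutcome, TransportCode, classify_cqe) are hypothetical and do not come from capos-config or capos-rt. Only the numeric values are taken from the table.

```rust
/// Hypothetical client-side view of a signed CapCqe.result value.
#[derive(Debug, PartialEq, Eq)]
pub enum CqeOutcome {
    /// Non-negative values are opcode-specific success results.
    Success(u32),
    /// -9: a CapException was serialized into the result buffer.
    ApplicationException,
    /// -10: an application exception occurred but no detail fit.
    ApplicationExceptionTruncated,
    /// Every other negative value is a kernel transport error.
    Transport(TransportCode),
}

#[derive(Debug, PartialEq, Eq)]
pub enum TransportCode {
    InvalidRequest,            // -1
    InvalidParamsBuffer,       // -2
    InvalidResultBuffer,       // -3
    InvokeFailed,              // -4
    UnsupportedOpcode,         // -5
    TransferNotSupported,      // -6
    InvalidTransferDescriptor, // -7
    TransferAborted,           // -8
    /// A negative code this client build does not recognize.
    Unknown(i32),
}

pub fn classify_cqe(result: i32) -> CqeOutcome {
    match result {
        r if r >= 0 => CqeOutcome::Success(r as u32),
        -9 => CqeOutcome::ApplicationException,
        -10 => CqeOutcome::ApplicationExceptionTruncated,
        -1 => CqeOutcome::Transport(TransportCode::InvalidRequest),
        -2 => CqeOutcome::Transport(TransportCode::InvalidParamsBuffer),
        -3 => CqeOutcome::Transport(TransportCode::InvalidResultBuffer),
        -4 => CqeOutcome::Transport(TransportCode::InvokeFailed),
        -5 => CqeOutcome::Transport(TransportCode::UnsupportedOpcode),
        -6 => CqeOutcome::Transport(TransportCode::TransferNotSupported),
        -7 => CqeOutcome::Transport(TransportCode::InvalidTransferDescriptor),
        -8 => CqeOutcome::Transport(TransportCode::TransferAborted),
        other => CqeOutcome::Transport(TransportCode::Unknown(other)),
    }
}
```

The two application codes are negative but belong to the application layer, so "negative means transport error" is not a safe shortcut on the client side.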
Revoked capabilities use the same application-exception path when the caller
provided a result buffer. Ordinary capability CALLs and endpoint CALL/RECV on a
revoked cap serialize a Disconnected CapException and complete with
CAP_ERR_APPLICATION_EXCEPTION. Runtime clients decode that CQE into
ClientError::Application(ApplicationException { type: Disconnected, ... }).
Endpoint RETURN is asymmetric because the result belongs to the original caller,
not the returning receiver. A receiver can set
CAP_SQE_RETURN_APPLICATION_EXCEPTION on CAP_OP_RETURN to return a
serialized CapException to the original caller; the receiver’s own RETURN CQE
still reports only whether the RETURN transport succeeded. If a receiver tries
to RETURN through a revoked endpoint while an in-flight caller still has a
result buffer, the kernel first preflights completion-queue space for both
caller and receiver, then removes the in-flight call, serializes a
Disconnected exception into the caller’s buffer, and posts the caller
completion with CAP_ERR_APPLICATION_EXCEPTION. The receiver always gets
CAP_ERR_APPLICATION_EXCEPTION_TRUNCATED because revoked RETURN has no
receiver-owned result payload. If the caller did not provide a result buffer,
the caller also receives the truncated code. Lookup or CQ-space failures that
cannot be tied to a result buffer remain transport failures.
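The revoked-endpoint RETURN rules reduce to a small decision. The caller's code depends on whether it supplied a result buffer that can hold the Disconnected exception, and the receiver always gets the truncated code. This is a minimal sketch: the struct and function names are hypothetical. It assumes the completion-queue preflight for both caller and receiver has already succeeded and that the in-flight call has been removed.

```rust
const CAP_ERR_APPLICATION_EXCEPTION: i32 = -9;
const CAP_ERR_APPLICATION_EXCEPTION_TRUNCATED: i32 = -10;

/// CQE results posted when a receiver RETURNs through a revoked endpoint.
#[derive(Debug, PartialEq, Eq)]
pub struct RevokedReturnCqes {
    pub caller: i32,
    pub receiver: i32,
}

/// Hypothetical helper; assumes CQ space was preflighted for both sides.
pub fn revoked_return_cqes(caller_buffer_fits_exception: bool) -> RevokedReturnCqes {
    let caller = if caller_buffer_fits_exception {
        // A Disconnected CapException was serialized into the caller's buffer.
        CAP_ERR_APPLICATION_EXCEPTION
    } else {
        // No buffer (or too small): the detail is dropped.
        CAP_ERR_APPLICATION_EXCEPTION_TRUNCATED
    };
    // The receiver owns no result payload for a revoked RETURN.
    RevokedReturnCqes {
        caller,
        receiver: CAP_ERR_APPLICATION_EXCEPTION_TRUNCATED,
    }
}
```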
Revoking an endpoint cap through a child CapabilityManager also cancels
endpoint wait state on that object: owner endpoint revoke cancels all queued
calls, pending receives, and in-flight calls, while non-owner endpoint facet
revoke cancels entries tied to the managed child pid. Those cancellation
completions use the existing endpoint-cancel transport result because they
describe already-pending SQEs, not a fresh invocation with a result buffer.
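The cancellation scope can be sketched as a filter applied to each of the three wait queues: queued calls, pending receives, and in-flight calls. The WaitEntry and RevokedCap types are hypothetical. A real kernel tracks more state per entry and posts the endpoint-cancel completion for each cancelled entry.

```rust
/// Hypothetical pending-SQE record in one endpoint wait queue.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WaitEntry {
    pub pid: u32,    // process that submitted the pending SQE
    pub sqe_id: u64, // illustrative identifier for that SQE
}

/// Which kind of endpoint cap a CapabilityManager revoked.
pub enum RevokedCap {
    /// The owner endpoint cap: every waiter is cancelled.
    Owner,
    /// A non-owner facet: only entries tied to the managed child are cancelled.
    Facet { child_pid: u32 },
}

/// Removes the cancelled entries from `queue` and returns them so the caller
/// can post an endpoint-cancel completion for each one.
pub fn cancel_on_revoke(queue: &mut Vec<WaitEntry>, revoked: &RevokedCap) -> Vec<WaitEntry> {
    let (cancelled, kept): (Vec<WaitEntry>, Vec<WaitEntry>) =
        std::mem::take(queue).into_iter().partition(|entry| match revoked {
            RevokedCap::Owner => true,
            RevokedCap::Facet { child_pid } => entry.pid == *child_pid,
        });
    *queue = kept;
    cancelled
}
```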
Current Implementation Inventory
Implemented typed exception paths:
- Ordinary CAP_OP_CALL capability implementations that return capnp::Error are serialized as CapException payloads when the SQE has a writable result buffer. capnp::ErrorKind::{Failed, Overloaded, Disconnected, Unimplemented} map to the matching ExceptionType; all other Cap’n Proto decode/validation kinds map to Failed.
- Ordinary revoked-cap calls serialize Disconnected when a result buffer is present.
- Endpoint CALL and RECV on a revoked endpoint serialize Disconnected when a result buffer is present.
- Live endpoint CALL target errors that arise after a valid endpoint cap is identified serialize as CapException when the caller supplies a result buffer. Endpoint queue-capacity, parameter-slot, call-id, and in-flight capacity failures are reported as Overloaded.
- Endpoint RETURN through a revoked endpoint reports Disconnected to the original caller when that caller has a result buffer, and reports the receiver-side no-payload/truncated application-exception code.
- Endpoint RETURN with CAP_SQE_RETURN_APPLICATION_EXCEPTION copies the receiver-provided serialized CapException to the original caller and posts CAP_ERR_APPLICATION_EXCEPTION; if no payload fits, the original caller gets CAP_ERR_APPLICATION_EXCEPTION_TRUNCATED.
- capos-rt decodes CAP_ERR_APPLICATION_EXCEPTION into ClientError::Application(ApplicationException) and treats Disconnected as breaking the local capability handle. Truncated application exceptions decode as Failed with an empty diagnostic message. Endpoint servers can use capos-rt’s submit_endpoint_return_exception() helper to produce that RETURN shape.
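The runtime decode behavior in the last item can be sketched on the client side. This is an illustration only: the LocalHandle type and the field name kind are hypothetical, and the real capos-rt types may be shaped differently.

```rust
/// Local mirror of the ExceptionType schema enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExceptionType {
    Failed,
    Overloaded,
    Disconnected,
    Unimplemented,
    InvalidArgument,
}

/// Hypothetical decoded exception; field names are illustrative.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ApplicationException {
    pub kind: ExceptionType,
    pub message: String,
}

/// Hypothetical per-handle state kept by a runtime client.
#[derive(Debug, Default)]
pub struct LocalHandle {
    pub broken: bool,
}

impl LocalHandle {
    /// `decoded` is Some(..) for CAP_ERR_APPLICATION_EXCEPTION with a decodable
    /// payload and None for CAP_ERR_APPLICATION_EXCEPTION_TRUNCATED.
    pub fn on_application_exception(
        &mut self,
        decoded: Option<ApplicationException>,
    ) -> ApplicationException {
        // A truncated exception surfaces as Failed with an empty diagnostic.
        let exc = decoded.unwrap_or(ApplicationException {
            kind: ExceptionType::Failed,
            message: String::new(),
        });
        // Disconnected means the target is gone, so the handle is unusable.
        if exc.kind == ExceptionType::Disconnected {
            self.broken = true;
        }
        exc
    }
}
```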
Intentional generic transport paths:
- Capability lookup failures before a target object is identified still return CAP_ERR_INVOKE_FAILED; these remain transport errors.
- Malformed SQE metadata, bad params/result buffers, unsupported opcodes, and malformed transfer descriptors remain transport errors.
- Endpoint delivery/receive/return rollback failures that arise while restoring queues, committing sideband transfers, posting to completion queues, or writing endpoint payloads still use CAP_ERR_INVOKE_FAILED, CAP_ERR_TRANSFER_ABORTED, or CAP_ERR_INVALID_RESULT_BUFFER. Result-buffer validation and endpoint payload copy failures are transport errors because no safe payload destination exists.
- Existing QEMU coverage proves Disconnected for revocation and one ordinary local Unimplemented runtime path. The endpoint-roundtrip QEMU demo proves local live-endpoint Overloaded serialization for endpoint queue saturation. Cross-process Disconnected is covered for revoked endpoint use, and make run-spawn now proves cross-process endpoint RETURN propagation for Failed, Overloaded, and Unimplemented application exceptions. The same focused spawn proof runs ring-reserved-opcodes, which checks that the RETURN exception flag is rejected outside its valid shape and that an endpoint caller with no result buffer receives CAP_ERR_APPLICATION_EXCEPTION_TRUNCATED.
Target Contract
For this milestone, a kernel path should produce a typed CapException when
all of the following are true:
- A capability invocation target was identified, or an endpoint operation is acting on an already accepted call/receive relationship.
- The failure is attributable to invocation semantics rather than malformed ring transport metadata.
- The affected caller supplied a result buffer that can hold a serialized exception.
If the same invocation-level failure occurs with no result buffer or an
insufficient result buffer, the CQE result is
CAP_ERR_APPLICATION_EXCEPTION_TRUNCATED. If no target capability or accepted
IPC relationship exists, the failure stays in the transport namespace. Result
buffer validation failure also stays transport-level because no safe payload
destination exists.
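The target contract can be read as a function from a failure's circumstances to a CQE result. This is a sketch under stated assumptions. The ResultBuffer type and failure_cqe are hypothetical. The invocation_level flag stands for "a target or accepted IPC relationship exists and the failure concerns invocation semantics". The order of checks is illustrative, not the kernel's actual order.

```rust
const CAP_ERR_INVALID_RESULT_BUFFER: i32 = -3;
const CAP_ERR_APPLICATION_EXCEPTION: i32 = -9;
const CAP_ERR_APPLICATION_EXCEPTION_TRUNCATED: i32 = -10;

/// Hypothetical description of the caller's result buffer after validation.
pub enum ResultBuffer {
    Absent,
    /// Unmapped, out of range, or not writable.
    Invalid,
    Valid { capacity: usize },
}

/// Picks the CQE result for a failed invocation.
/// `transport_code` is the negative code used if the failure stays transport-level.
pub fn failure_cqe(
    invocation_level: bool,
    transport_code: i32,
    buffer: &ResultBuffer,
    exception_len: usize,
) -> i32 {
    if !invocation_level {
        // No target or accepted relationship, or malformed ring metadata.
        return transport_code;
    }
    match buffer {
        // No safe payload destination: stays transport-level.
        ResultBuffer::Invalid => CAP_ERR_INVALID_RESULT_BUFFER,
        ResultBuffer::Absent => CAP_ERR_APPLICATION_EXCEPTION_TRUNCATED,
        ResultBuffer::Valid { capacity } if *capacity >= exception_len => {
            CAP_ERR_APPLICATION_EXCEPTION
        }
        ResultBuffer::Valid { .. } => CAP_ERR_APPLICATION_EXCEPTION_TRUNCATED,
    }
}
```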
The exception serialization path respects two per-process resource-profile
limits wired from the manifest ResourceProfile fields (both defaulting to
65 536 bytes, the kernel ceiling):
- ringScratchLimitBytes – bounds the ring input and output scratch buffers. Any CALL with params_len exceeding the effective input limit is rejected with CAP_ERR_INVALID_REQUEST at the transport layer before capability dispatch.
- replyScratchLimitBytes – bounds the reply scratch used by serialize_application_exception_to_user and serialize_disconnected_exception_to_process. The effective reply limit is min(replyScratchLimitBytes, ringScratchLimitBytes); if the serialized exception exceeds this limit, the caller receives CAP_ERR_APPLICATION_EXCEPTION_TRUNCATED instead. Prior to this wiring, reply scratch was unconstrained at the global 64 KiB ceiling regardless of the process’s ringScratchLimitBytes, which caused spurious TRUNCATED results for tightly constrained processes.

Both limits are enforced as of commit 4fc0466d (replyScratchLimitBytes) and commit 1bcfbad4 (ringScratchLimitBytes).
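The limit arithmetic can be sketched as follows. The ScratchLimits type and its methods are hypothetical. Two details are assumptions rather than statements about the kernel: the effective input limit is taken to be ringScratchLimitBytes capped at the 65 536-byte ceiling, and the caller's result-buffer capacity is checked alongside the reply limit.

```rust
/// Kernel ceiling and manifest default for both limits (65 536 bytes).
pub const KERNEL_SCRATCH_CEILING: usize = 65_536;
const CAP_ERR_INVALID_REQUEST: i32 = -1;
const CAP_ERR_APPLICATION_EXCEPTION: i32 = -9;
const CAP_ERR_APPLICATION_EXCEPTION_TRUNCATED: i32 = -10;

/// Hypothetical Rust view of the two ResourceProfile fields.
pub struct ScratchLimits {
    pub ring_scratch_limit_bytes: usize,
    pub reply_scratch_limit_bytes: usize,
}

impl ScratchLimits {
    /// Assumed: the input limit is the ring limit, capped at the kernel ceiling.
    pub fn effective_input_limit(&self) -> usize {
        self.ring_scratch_limit_bytes.min(KERNEL_SCRATCH_CEILING)
    }

    /// min(replyScratchLimitBytes, ringScratchLimitBytes), also capped at the ceiling.
    pub fn effective_reply_limit(&self) -> usize {
        self.reply_scratch_limit_bytes
            .min(self.ring_scratch_limit_bytes)
            .min(KERNEL_SCRATCH_CEILING)
    }

    /// Transport-level admission check, before capability dispatch.
    pub fn admit_call(&self, params_len: usize) -> Result<(), i32> {
        if params_len > self.effective_input_limit() {
            Err(CAP_ERR_INVALID_REQUEST)
        } else {
            Ok(())
        }
    }

    /// CQE result for a serialized exception of `len` bytes.
    pub fn exception_cqe(&self, len: usize, result_capacity: usize) -> i32 {
        if len > self.effective_reply_limit() || len > result_capacity {
            CAP_ERR_APPLICATION_EXCEPTION_TRUNCATED
        } else {
            CAP_ERR_APPLICATION_EXCEPTION
        }
    }
}
```

With ringScratchLimitBytes = 1024 and replyScratchLimitBytes = 4096, the effective reply limit is 1024. The smaller ring limit wins even though the reply limit is larger.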
The exception types keep their Cap’n Proto client-response meaning;
InvalidArgument is the capOS-only addition introduced with Scheduler
Phase D Task 1 (commit cb8c58b1, 2026-05-07). The canonical worked example
is SchedulingPolicyCap.setWeight in schema/capos.capnp, whose schema
comment states the cap rejects out-of-range or zero values with a
CapException of type invalidArgument and does NOT silently clamp:
- Failed: deterministic invocation failure, deserialization error, or a target-side invariant failure. New caps that validate parameters at the cap boundary should return InvalidArgument instead of Failed for caller bugs; Failed is for “the cap tried and could not”.
- Overloaded: temporary resource exhaustion after a valid target invocation has begun.
- Disconnected: target object, endpoint facet, or peer relationship is gone.
- Unimplemented: target object is live but does not implement the requested method.
- InvalidArgument: the cap accepted the call (target lives, message parsed) but a parameter value violates the documented contract. Distinct from Failed because the caller is expected to correct its input and retry, not back off or treat the cap as broken. Signalled inside the kernel today through INVALID_ARGUMENT_SENTINEL in kernel/src/cap/ring.rs and serialized as the invalidArgument ExceptionType; userspace decode in capos-rt::client::ApplicationException returns ExceptionType::InvalidArgument.
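The setWeight contract (reject, never clamp) can be illustrated with a small validator. This is a sketch, not the SchedulingPolicyCap implementation. The MIN_WEIGHT/MAX_WEIGHT values and the CapError type are hypothetical placeholders.

```rust
// Hypothetical bounds: the real MIN_WEIGHT/MAX_WEIGHT live with the scheduler.
const MIN_WEIGHT: u32 = 1;
const MAX_WEIGHT: u32 = 10_000;

/// Hypothetical cap-side error carrying the intended ExceptionType.
#[derive(Debug, PartialEq, Eq)]
pub enum CapError {
    /// Caller bug: fix the input and retry.
    InvalidArgument(String),
    /// "The cap tried and could not" (unused in this sketch).
    Failed(String),
}

/// Validates at the cap boundary and never clamps: an out-of-range weight
/// leaves `current` untouched.
pub fn set_weight(current: &mut u32, weight: u32) -> Result<(), CapError> {
    // Zero is rejected explicitly, independent of where MIN_WEIGHT sits.
    if weight == 0 || !(MIN_WEIGHT..=MAX_WEIGHT).contains(&weight) {
        return Err(CapError::InvalidArgument(format!(
            "weight {} outside [{}, {}]",
            weight, MIN_WEIGHT, MAX_WEIGHT
        )));
    }
    *current = weight;
    Ok(())
}
```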
Exception messages are diagnostic only. They must not include kernel pointers, secret payload bytes, or other process-private data.
Schema Style Guide
Use the three error layers consistently:
| Layer | Use for | Do not use for |
|---|---|---|
| CQE status | Ring, transport, kernel dispatch, malformed SQE, missing target, invalid buffer, unsupported ABI/version, and other failures where no safe capability-level payload exists. | Normal service/domain outcomes. |
| CapException | Capability-level infrastructure failure after a target or accepted endpoint relationship exists: decode failure, unknown method, target gone, temporary overload after dispatch, or target invariant failure. | Expected application/domain rejection. |
| Schema result union | Ordinary application or domain outcome: not found, permission denied by service policy, invalid business object, quota denied as a declared operation result, or accepted conditional failure. | Ring/transport failure or generic catch-all exceptions. |
Generated clients and future capos-service helpers should preserve this
split: CQE status is transport failure, decoded CapException is capability
infrastructure failure, and method result unions are the normal application
error surface.
Use CQE status for ring transport errors, invalid SQE layout, invalid cap slot, kernel dispatch failure, buffer access failure, unsupported ring ABI/SQE version, malformed transfer descriptors, and other transport-level failures where no safe typed payload boundary exists.
Use CapException for capability infrastructure failure: unknown method,
revoked capability, stale endpoint/session, permission or authority failure,
resource exhaustion at a capability boundary, service unavailable, and
unimplemented method.
Use schema result unions for normal domain/application outcomes:
notFound, permissionDenied as a domain decision, invalidInput with domain
meaning, alreadyExists, conflict, validation failure, and accepted/rejected
business results.
Anti-rules:
- Do not encode ordinary application outcomes as CapException.
- Do not expose internal traces, filesystem paths, kernel pointers, or service-local details in cross-service exceptions by default.
- Do not use generic Text errors where a stable union variant is possible.
- Do not overload CapException::failed for every domain-level failure.
Preferred schema shape for ordinary domain outcomes:
```capnp
struct OpenResult {
  union {
    file @0 :File;
    notFound @1 :Void;
    permissionDenied @2 :Void;
    invalidPath @3 :Void;
    unsupported @4 :Void;
  }
}
```
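On the client side, this split means domain outcomes arrive as ordinary values to match on, while transport and capability-infrastructure failures travel on a separate error path. The Rust mirror below is hypothetical, not generated capnp code. File is modelled as a result-cap handle index and the action strings are illustrative.

```rust
/// Hypothetical Rust mirror of the OpenResult union.
#[derive(Debug)]
pub enum OpenResult {
    File(u16),
    NotFound,
    PermissionDenied,
    InvalidPath,
    Unsupported,
}

/// Transport and capability-infrastructure failures stay outside the union.
#[derive(Debug)]
pub enum CallError {
    /// Negative CQE status.
    Transport(i32),
    /// Decoded CapException diagnostic message.
    Exception(String),
}

pub fn open_action(outcome: &Result<OpenResult, CallError>) -> &'static str {
    match outcome {
        // Domain outcomes: expected, handled by ordinary caller logic.
        Ok(OpenResult::File(_)) => "use file",
        Ok(OpenResult::NotFound) => "report missing file",
        Ok(OpenResult::PermissionDenied) => "report policy denial",
        Ok(OpenResult::InvalidPath) => "fix path",
        Ok(OpenResult::Unsupported) => "try another backend",
        // Infrastructure failures: not part of the method's domain contract.
        Err(CallError::Exception(_)) => "capability failure",
        Err(CallError::Transport(_)) => "transport failure",
    }
}
```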
Transfer-related transport mapping (3.6.0 ABI slice)
- CAP_ERR_TRANSFER_NOT_SUPPORTED is used for transfer-bearing SQEs that the kernel currently dispatches but does not yet process (xfer_cap_count != 0 on kernels where sideband transfer is off).
- CAP_ERR_INVALID_TRANSFER_DESCRIPTOR is used for transfer SQEs that dispatch with valid structure but carry malformed transfer metadata:
  - descriptor transfer_mode is not exactly CAP_TRANSFER_MODE_COPY or CAP_TRANSFER_MODE_MOVE;
  - any descriptor reserved bits are set;
  - any descriptor _reserved0 field is non-zero;
  - descriptor region placement (addr + len) is misaligned;
  - descriptor range overflows or cannot be safely bounded.
- CAP_ERR_TRANSFER_ABORTED is reserved for a transaction that fails after partial transfer side effects have been prepared; those side effects must be rolled back and must not be observed (all-or-nothing rollback boundary).
- CAP_ERR_INVALID_REQUEST remains for non-transfer transport malformation (opcodes unsupported today, unsupported SQE fields not part of the transfer path, and malformed result/payload buffer pairs).
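The descriptor checks above can be sketched as a validator that stops at the first violation. Every failure here would map to CAP_ERR_INVALID_TRANSFER_DESCRIPTOR (-7). The descriptor layout, the mode encodings, the reserved-bit mask, the alignment, and the limit parameter are all hypothetical. The real definitions live in capos-config/src/ring.rs.

```rust
// Hypothetical encodings; the real constants live in capos-config/src/ring.rs.
pub const CAP_TRANSFER_MODE_COPY: u32 = 1;
pub const CAP_TRANSFER_MODE_MOVE: u32 = 2;
const RESERVED_FLAG_BITS: u32 = 0xFFFF_FF00; // hypothetical reserved-bit mask
const REGION_ALIGN: u64 = 8; // hypothetical alignment

/// Hypothetical sideband transfer descriptor layout.
#[derive(Clone, Copy)]
pub struct TransferDescriptor {
    pub transfer_mode: u32,
    pub flags: u32,
    pub _reserved0: u32,
    pub addr: u64,
    pub len: u64,
}

/// Every variant maps to CAP_ERR_INVALID_TRANSFER_DESCRIPTOR (-7).
#[derive(Debug, PartialEq, Eq)]
pub enum DescriptorError {
    BadMode,
    ReservedBits,
    Reserved0,
    Misaligned,
    OutOfBounds,
}

/// `limit` is the end of the region the descriptor must stay inside.
pub fn validate_descriptor(d: &TransferDescriptor, limit: u64) -> Result<(), DescriptorError> {
    if d.transfer_mode != CAP_TRANSFER_MODE_COPY && d.transfer_mode != CAP_TRANSFER_MODE_MOVE {
        return Err(DescriptorError::BadMode);
    }
    if d.flags & RESERVED_FLAG_BITS != 0 {
        return Err(DescriptorError::ReservedBits);
    }
    if d._reserved0 != 0 {
        return Err(DescriptorError::Reserved0);
    }
    if d.addr % REGION_ALIGN != 0 || d.len % REGION_ALIGN != 0 {
        return Err(DescriptorError::Misaligned);
    }
    // checked_add catches wraparound; the limit check bounds the range.
    match d.addr.checked_add(d.len) {
        Some(end) if end <= limit => Ok(()),
        _ => Err(DescriptorError::OutOfBounds),
    }
}
```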
Historical: Pre-Ring cap_call Design
The sections from “Problem Statement” through “Migration Path” describe the
original cap_call synchronous syscall that preceded the capability ring and
are preserved for design context. Two sections within that range, “CapException
Schema” and “Application-Level Errors in Interface Schemas”, still describe
current behavior; the “Current CQE Error Namespace” section above covers the
current transport codes.
Problem Statement
Currently, cap_call returns u64::MAX on any error and prints the details
to the kernel serial console. The userspace process receives no information
about what went wrong – it cannot distinguish “invalid capability ID” from
“method not implemented” from “out of memory inside the service.”
Every other capability system separates transport-level errors (bad handle, message validation failure) from application-level errors (the service processed the request and returned a meaningful error). capOS needs both.
Background: How Other Systems Do This
Cap’n Proto RPC Protocol
The Cap’n Proto RPC specification defines an Exception type in rpc.capnp:
```capnp
struct Exception {
  reason @0 :Text;
  # (excerpt: ordinals @1 and @2 are omitted)
  type @3 :Type;
  enum Type {
    failed @0;         # deterministic failure, retrying won't help
    overloaded @1;     # temporary resource exhaustion, retry with backoff
    disconnected @2;   # connection to a required capability was lost
    unimplemented @3;  # method not supported by this server
  }
  trace @4 :Text;
}
```
These four types describe client response strategy, not error semantics.
The capnp Rust crate maps them to capnp::ErrorKind::{Failed, Overloaded, Disconnected, Unimplemented}.
Cap’n Proto’s official philosophy (from KJ library and Kenton Varda’s writings): exceptions are for infrastructure failures, not application semantics. Application-level errors should be modeled as unions in method return types.
Cloudflare Workers RPC and Spritely/OCapN CapTP reinforce the network-boundary
rule: remote promise breakage and error values are diagnostic material, not
authority inputs, and debug details such as traces or internal paths can leak
sensitive information. Future Workers RPC, Cap’n Web, CapTP, or OCapN-style
adapters must deliberately map remote errors into CapException or schema
result unions and strip or seal debug detail at the boundary. See
“Cloudflare, Cap’n Proto, Workers RPC, and Cap’n Web” and
“Spritely, OCapN, and CapTP”.
Capability OS Error Models
| System | Transport errors | Application errors |
|---|---|---|
| seL4 | seL4_Error enum (11 values) from syscall return | In-band via IPC message payload (user-defined) |
| Zircon | zx_status_t (signed i32, ~30 values) from syscall | FIDL per-method error type (union in return) |
| EROS/Coyotos | Kernel-generated invocation exceptions | OPR0.ex flag + exception code in reply payload |
| Plan 9 (9P) | Connection loss (no in-band transport error) | Rerror message with UTF-8 error string |
| Genode | Ipc_error exception | Declared C++ exceptions via GENODE_RPC_THROW |
Common pattern: a small kernel error code set for transport failures, combined with service-specific typed errors for application failures.
POSIX errno: Why Not
POSIX errno is a global flat namespace of ~100 integers that conflates
transport errors (EBADF) with application errors (ENOENT). In a
capability system:
- EACCES/EPERM don’t apply – if you have the capability, you have permission; if you don’t, you can’t even name the resource.
- A global error namespace conflicts with typed interfaces where errors should be scoped to the interface.
- No room for structured information (which argument was invalid, how much memory was needed).
- Not composable across trust boundaries – a callee’s errno has no meaning in the caller’s address space without explicit serialization.
Design
Principle: Two Levels, One Wire Format
Level 1 – Transport errors are returned in the syscall return value.
These indicate that the capability invocation mechanism itself failed before
the target CapObject was reached. No result buffer is written.
Level 2 – Application errors are returned as capnp-serialized messages in the result buffer. The capability was found and dispatched; the implementation returned a structured error. The syscall return value distinguishes this from a successful result.
Both levels use Cap’n Proto serialization for the error payload (level 2 always, level 1 when there’s a result buffer available). This keeps one parsing path in userspace.
Syscall Return Convention
The cap_call syscall (number=2) currently returns:
- 0..N – success, N bytes written to result buffer
- u64::MAX – error (undifferentiated)
New convention:
| Return value | Meaning |
|---|---|
| 0..=(u64::MAX - 256) | Success. Value = number of bytes written to result buffer. |
| u64::MAX | Transport error: invalid capability ID or stale generation. |
| u64::MAX - 1 | Transport error: invalid user buffer (bad pointer, unmapped, not writable). |
| u64::MAX - 2 | Transport error: params too large (exceeds MAX_CAP_CALL_PARAMS). |
| u64::MAX - 3 | Application error: the capability returned an error. A CapException message has been written to the result buffer. The message length is encoded in the low 32 bits of the value at result_ptr (the capnp message itself). |
| u64::MAX - 4 | Application error, but the result buffer was too small or NULL. The error detail is lost; the caller should retry with a larger buffer or treat it as an opaque failure. |
The transport error codes are a small closed set (like seL4’s 11 values). New transport errors can be added, but the set should remain small and stable.
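The return convention can be written down as constants plus a single range check. The ECAP_* names match the ones used in the Kernel Implementation and Userspace API sketches. The numeric assignments follow the table, and MAX_SUCCESS and is_error are hypothetical helpers.

```rust
// Values follow the return convention table.
pub const ECAP_NOT_FOUND: u64 = u64::MAX;
pub const ECAP_BAD_BUFFER: u64 = u64::MAX - 1;
pub const ECAP_PARAMS_TOO_LARGE: u64 = u64::MAX - 2;
pub const ECAP_APPLICATION_ERROR: u64 = u64::MAX - 3;
pub const ECAP_APPLICATION_ERROR_NO_BUFFER: u64 = u64::MAX - 4;

/// Largest success value; the top 256 values are reserved for error codes.
pub const MAX_SUCCESS: u64 = u64::MAX - 256;

/// True for every current and future reserved error code.
pub fn is_error(ret: u64) -> bool {
    ret > MAX_SUCCESS
}
```

is_error(ret) is the same test as ret >= u64::MAX - 255. That is the catch-all check the Migration Path relies on, and it keeps working as new codes are added inside the reserved range.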
CapException Schema
Added to schema/capos.capnp:
```capnp
enum ExceptionType {
  failed @0;
  overloaded @1;
  disconnected @2;
  unimplemented @3;
  invalidArgument @4;
}

struct CapException {
  type @0 :ExceptionType;
  message @1 :Text;
}
```
This mirrors Cap’n Proto RPC’s Exception struct, plus a capOS-only
invalidArgument variant added with the Scheduler Phase D Task 1
schema slice (commit cb8c58b1, 2026-05-07). Capnp’s upstream Exception.Type
remains a closed four-value set; capOS extends CapException because a
capability boundary that validates arguments needs a typed signal
distinct from failed. The five types describe client response
strategy:
- failed – deterministic failure on the callee side, retrying won’t help. Covers invariant violations, deserialization errors, and any capnp::ErrorKind variant not in the other categories. As of the Phase D Task 1 slice, callee-side argument rejection no longer maps here – new caps that validate inputs at the cap boundary should return invalidArgument instead.
- overloaded – temporary resource exhaustion (out of frames, table full). Client may retry with backoff.
- invalidArgument – the request was syntactically a well-formed capnp message but a parameter value violated the cap’s documented contract (e.g. SchedulingPolicyCap.setWeight rejecting weight = 0 or values outside [MIN_WEIGHT, MAX_WEIGHT]). The kernel does not silently clamp; the caller is expected to fix its input and retry, not back off. Today this is signalled by kernel cap modules through a small sentinel-prefix channel in kernel/src/cap/ring.rs (INVALID_ARGUMENT_SENTINEL) because capnp 0.25 has no ErrorKind::InvalidArgument and the enum is #[non_exhaustive]. The dispatcher strips the sentinel before serializing the CapException so the wire form is identical to the four upstream-aligned variants.
- disconnected – the capability’s backing resource is gone (device removed, process exited). Client should re-acquire the capability.
- unimplemented – unknown method ID for this interface. Client should not retry.
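The sentinel channel described for invalidArgument can be sketched as a message prefix that a cap module adds and the dispatcher strips. This sketch assumes the sentinel travels as a prefix on a Failed-kind error's message. The sentinel string and both function names are hypothetical, not the actual contents of kernel/src/cap/ring.rs.

```rust
// Hypothetical sentinel value; the real INVALID_ARGUMENT_SENTINEL differs.
const INVALID_ARGUMENT_SENTINEL: &str = "\u{1}invalid-argument:";

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExceptionType {
    Failed,
    Overloaded,
    Disconnected,
    Unimplemented,
    InvalidArgument,
}

/// Cap-module side: tag an argument rejection inside a generic error message.
pub fn invalid_argument_message(detail: &str) -> String {
    format!("{}{}", INVALID_ARGUMENT_SENTINEL, detail)
}

/// Dispatcher side: recover the exception type and strip the sentinel so the
/// serialized message never contains it.
pub fn classify_failed(message: &str) -> (ExceptionType, &str) {
    match message.strip_prefix(INVALID_ARGUMENT_SENTINEL) {
        Some(rest) => (ExceptionType::InvalidArgument, rest),
        None => (ExceptionType::Failed, message),
    }
}
```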
The message field is a human-readable string for diagnostics/logging.
It must not contain security-sensitive information (internal pointers, kernel
addresses) since it crosses the kernel-user boundary.
Application-Level Errors in Interface Schemas
Following Cap’n Proto’s philosophy, expected error conditions that a caller should handle programmatically belong in the method return type, not in the exception mechanism.
Example – FrameAllocator can legitimately run out of memory:
```capnp
struct AllocResult {
  union {
    ok @0 :UInt16;        # result-cap handle index for a MemoryObject
    outOfMemory @1 :Void;
  }
}

interface FrameAllocator {
  allocFrame @0 () -> (result :AllocResult);
  allocContiguous @1 (count :UInt32) -> (result :AllocResult);
}
```
The caller can pattern-match on the result union without parsing an exception. This is the Zircon/FIDL model: transport errors at the syscall layer, application errors as typed return values.
When to use each:
| Situation | Mechanism |
|---|---|
| Bad cap ID, stale generation, bad buffer | Transport error (syscall return code) |
| Deserialization failure, unknown method | CapException with failed/unimplemented |
| Temporary resource exhaustion in dispatch | CapException with overloaded |
| Expected domain-specific error | Union in method return type |
| Bug in capability implementation | CapException with failed |
Kernel Implementation
CapObject trait change
The ring SQE does not carry a caller-supplied interface ID. The trait shape below keeps interface selection out of capability implementations because each capability entry owns one public interface:
```rust
// ReplyScratch and CapInvokeResult are kernel types defined elsewhere.
pub trait CapObject: Send + Sync {
    fn interface_id(&self) -> u64;
    fn label(&self) -> &str;
    fn call(
        &self,
        method_id: u16,
        params: &[u8],
        result: &mut [u8],
        reply_scratch: &mut dyn ReplyScratch,
    ) -> capnp::Result<CapInvokeResult>;
}
```
Implementations serialize directly into the caller’s result buffer and return
a completion containing the number of bytes written, or Pending for async
endpoint calls. Dispatch uses the interface assigned to the target capability
entry; normal CALL SQEs do not need to repeat that interface ID. capnp::Error
carries ErrorKind with the four RPC exception types. The kernel’s dispatch
handler converts Err(capnp::Error) into a serialized CapException message
and writes it to the result buffer.
Syscall handler changes
In cap_call(), the error path changes from:
```rust
Err(e) => {
    kprintln!("cap_call: ... error: {}", e);
    u64::MAX
}
```
to:
```rust
// Error arms of the match in cap_call(); ECAP_* values follow the return
// convention table.
Err(CapError::NotFound) => ECAP_NOT_FOUND,
Err(CapError::StaleGeneration) => ECAP_NOT_FOUND,
Err(CapError::InvokeError(e)) => {
    // Serialize CapException to result buffer
    let exception_bytes = serialize_cap_exception(&e);
    if result_ptr != 0 && result_capacity >= exception_bytes.len() {
        copy_to_user(result_ptr, &exception_bytes);
        ECAP_APPLICATION_ERROR
    } else {
        // NULL or undersized buffer: the exception detail is dropped.
        ECAP_APPLICATION_ERROR_NO_BUFFER
    }
}
```
The serialize_cap_exception function maps capnp::ErrorKind to
ExceptionType:
| capnp::ErrorKind | ExceptionType |
|---|---|
| Failed | failed |
| Overloaded | overloaded |
| Disconnected | disconnected |
| Unimplemented | unimplemented |
| All other variants (deserialization, validation) | failed |
This matches how capnp-rpc maps exceptions to the wire format.
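The mapping table can be written as a match. So that the sketch compiles without the capnp crate, it uses a local stand-in for capnp::ErrorKind. The OtherDecodeError variant is a hypothetical placeholder for the crate's many deserialization and validation kinds.

```rust
/// Local stand-in for capnp::ErrorKind; the real enum has many more variants.
#[derive(Debug, Clone, Copy)]
pub enum ErrorKind {
    Failed,
    Overloaded,
    Disconnected,
    Unimplemented,
    OtherDecodeError, // hypothetical placeholder for decode/validation kinds
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExceptionType {
    Failed,
    Overloaded,
    Disconnected,
    Unimplemented,
}

pub fn exception_type_for(kind: ErrorKind) -> ExceptionType {
    match kind {
        ErrorKind::Failed => ExceptionType::Failed,
        ErrorKind::Overloaded => ExceptionType::Overloaded,
        ErrorKind::Disconnected => ExceptionType::Disconnected,
        ErrorKind::Unimplemented => ExceptionType::Unimplemented,
        // Decode/validation kinds collapse to Failed. With the real
        // #[non_exhaustive] capnp::ErrorKind this wildcard arm is mandatory.
        _ => ExceptionType::Failed,
    }
}
```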
Userspace API
The init crate (and future userspace libraries) wraps cap_call in a
helper that interprets the return value:
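```rust
// sys_cap_call, deserialize_cap_exception, ExceptionType, and the ECAP_*
// constants are defined elsewhere.
pub enum CapCallResult {
    Ok(Vec<u8>),
    Exception(ExceptionType, String),
    TransportError(TransportError),
}

pub enum TransportError {
    InvalidCapability,
    InvalidBuffer,
    ParamsTooLarge,
    /// A reserved error code this library version does not recognize.
    Unknown(u64),
}

pub fn cap_call(
    cap_id: u32,
    method_id: u16,
    params: &[u8],
    result_buf: &mut [u8],
) -> CapCallResult {
    let ret = sys_cap_call(cap_id, method_id, params, result_buf);
    match ret {
        ECAP_NOT_FOUND => CapCallResult::TransportError(TransportError::InvalidCapability),
        ECAP_BAD_BUFFER => CapCallResult::TransportError(TransportError::InvalidBuffer),
        ECAP_PARAMS_TOO_LARGE => CapCallResult::TransportError(TransportError::ParamsTooLarge),
        ECAP_APPLICATION_ERROR => {
            // A serialized CapException is in result_buf.
            let (typ, msg) = deserialize_cap_exception(result_buf);
            CapCallResult::Exception(typ, msg)
        }
        ECAP_APPLICATION_ERROR_NO_BUFFER => {
            // The detail was lost; report an opaque failure.
            CapCallResult::Exception(ExceptionType::Failed, String::new())
        }
        // Success: `n` bytes were written. The guard keeps an unrecognized
        // reserved error code from being used as a slice length.
        n if n <= result_buf.len() as u64 => CapCallResult::Ok(result_buf[..n as usize].to_vec()),
        other => CapCallResult::TransportError(TransportError::Unknown(other)),
    }
}
```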
Future: Batched Calls
When capOS adds batched capability invocations (async rings, pipelining), each request in the batch gets its own result status. The same two-level model applies per-request:
- Transport error for the batch envelope (invalid ring descriptor, bad capability table) fails the whole batch.
- Per-request transport errors (individual bad cap_id) fail that request.
- Application errors are per-request, written to each request’s result slot.
This matches how NFS compound operations and JSON-RPC batch requests work: a transport error on the batch vs per-operation results.
What This Does NOT Cover
- Error logging/tracing infrastructure. How errors get collected, aggregated, or displayed is a separate concern, owned by docs/proposals/system-monitoring-proposal.md. The kernel currently prints to serial; a future ErrorLog / audit-log capability captures structured error streams there.
- Retry policy. The ExceptionType hints at retry strategy (overloaded -> retry, failed -> don’t, invalidArgument -> fix input and retry), but the retry logic itself belongs in userspace libraries, not the kernel.
- Error propagation across capability chains. When capability A calls capability B which calls capability C, and C fails – how does the error propagate back through A? The single-hop transport-vs-application split is defined here; the cross-process spawn and endpoint-return surface that exercises it end-to-end is owned by docs/proposals/service-architecture-proposal.md together with the CAP_SQE_RETURN_APPLICATION_EXCEPTION shape in capos-config/src/ring.rs.
- Result-buffer sizing. Truncation of serialized CapException payloads when callers under-size their result buffer is tracked as R15 in docs/design-risks-register.md. The per-process ringScratchLimitBytes and replyScratchLimitBytes resource-profile fields now bound the reply scratch used at both serialization call sites, eliminating spurious TRUNCATED results for constrained processes. Each cap contract should still document its expected result-buffer capacity rather than relying on truncation behavior.
- Deferred release vs revocation. Owned-handle Drop in capos-rt enqueues CAP_OP_RELEASE rather than running synchronously; resource-pressure or revocation-sensitive flows that depend on a Disconnected surface must follow R6 in docs/design-risks-register.md and prefer CapabilityManager.revoke or epoch revocation rather than relying on Drop ordering.
- Transactional semantics. Whether a failed operation has side effects (partial writes, allocated-but-not-returned frames) is per-capability semantics, not a kernel-level concern. The transfer-rollback boundary carried by CAP_ERR_TRANSFER_ABORTED is the only transport-level all-or-nothing guarantee.
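A userspace library could turn the ExceptionType retry hints into a small policy function. This is a sketch under assumed parameters. The RetryAction type, the 10 ms backoff base, and the five-attempt cap are hypothetical, and real callers may prefer jittered or deadline-based backoff.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExceptionType {
    Failed,
    Overloaded,
    Disconnected,
    Unimplemented,
    InvalidArgument,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RetryAction {
    RetryWithBackoff { delay_ms: u64 },
    FixInputThenRetry,
    ReacquireCapability,
    GiveUp,
}

const BASE_DELAY_MS: u64 = 10; // hypothetical
const MAX_ATTEMPTS: u32 = 5; // hypothetical

/// `attempt` counts prior retries of the same call, starting at 0.
pub fn retry_action(kind: ExceptionType, attempt: u32) -> RetryAction {
    match kind {
        // Temporary exhaustion: exponential backoff (10, 20, 40, ... ms).
        ExceptionType::Overloaded if attempt < MAX_ATTEMPTS => RetryAction::RetryWithBackoff {
            delay_ms: BASE_DELAY_MS << attempt,
        },
        ExceptionType::Overloaded => RetryAction::GiveUp,
        // Caller bug: retrying the same input cannot succeed.
        ExceptionType::InvalidArgument => RetryAction::FixInputThenRetry,
        // Backing resource is gone: obtain a fresh capability.
        ExceptionType::Disconnected => RetryAction::ReacquireCapability,
        ExceptionType::Failed | ExceptionType::Unimplemented => RetryAction::GiveUp,
    }
}
```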
Migration Path
Phase 1: Transport error codes (minimal, no schema changes)
Change cap_call to return distinct error codes instead of u64::MAX for
all failures. Update the init crate to interpret them. No new schema types
needed – application errors still use u64::MAX - 3 but without a structured
payload (treated as opaque failure).
This is backward-compatible: existing userspace code that checks == u64::MAX
sees different values for different errors, but any >= u64::MAX - 255 check
catches all errors.
Phase 2: CapException serialization
Add ExceptionType and CapException to the schema. Implement
serialize_cap_exception in the kernel. Update init to deserialize and
display errors. Now userspace gets the exception type and message string.
Phase 3: Per-interface application errors
As interfaces mature, add typed error unions to method return types for
expected error conditions. FrameAllocator::allocFrame returns
AllocResult instead of bare UInt64. The exception mechanism remains for
unexpected failures.
Design Rationale
Why mirror capnp RPC’s Exception type instead of inventing our own?
Cap’n Proto already defines a well-thought-out exception taxonomy. The four
types (failed, overloaded, disconnected, unimplemented) map directly to
capnp::ErrorKind in Rust. Using the same vocabulary means capOS capabilities
can eventually participate in capnp RPC networks without translation. It also
means dispatch code can name the ErrorKind variants that matter; because the
enum is #[non_exhaustive], a wildcard arm still has to route every other kind
to failed.
Why not put error codes in the syscall return value only (like seL4)?
seL4’s 11 error codes work because seL4 kernel objects are simple and
fixed-function. capOS capabilities are arbitrary typed interfaces – a file
system, a network stack, a GPU driver. The error vocabulary is open-ended.
Encoding all possible errors as syscall return values would either require an
ever-growing enum (fragile) or lose information (back to errno’s problems).
The capnp-serialized CapException in the result buffer gives unbounded
expressiveness without changing the syscall ABI.
Why not use capnp exceptions for everything (skip the transport error codes)?
Because transport errors happen before the capability is reached. There’s
no CapObject to serialize an exception. The kernel would have to synthesize
a capnp message on behalf of a non-existent capability, which is wasteful and
semantically wrong. A small integer return code is cheaper and more honest
about what happened.
Why not define a generic Result(Ok) wrapper in the schema?
Cap’n Proto generics only bind to pointer types (Text, Data, structs, lists,
interfaces), not to primitives (UInt32, Bool). A Result(UInt64) for
allocFrame wouldn’t work. Per-method result structs with unions are more
flexible and don’t hit this limitation. The cost is a bit more schema
boilerplate, which is acceptable given that capOS has a small number of
interfaces.
Why string-based messages (like Plan 9) instead of structured error fields?
String messages are adequate for diagnostics and logging. Structured error
data belongs in the typed return unions (Phase 3), where the schema enforces
what fields exist. Putting structured data in CapException would duplicate
the schema’s job and encourage using exceptions for flow control, which
Cap’n Proto explicitly warns against.