OS Error Handling in Capability Systems: Research Notes
Research on error handling patterns in capability-based and microkernel operating systems. Used as input for the capOS error handling proposal.
1. seL4
Error Codes
seL4 defines 11 kernel error codes in errors.h (seL4_NoError plus ten failure codes):
    typedef enum {
        seL4_NoError = 0,
        seL4_InvalidArgument = 1,
        seL4_InvalidCapability = 2,
        seL4_IllegalOperation = 3,
        seL4_RangeError = 4,
        seL4_AlignmentError = 5,
        seL4_FailedLookup = 6,
        seL4_TruncatedMessage = 7,
        seL4_DeleteFirst = 8,
        seL4_RevokeFirst = 9,
        seL4_NotEnoughMemory = 10,
    } seL4_Error;
Error Return Mechanism
- Capability invocations (kernel object operations) return seL4_Error directly.
- IPC messages use seL4_MessageInfo_t with label, length, extraCaps, capsUnwrapped. The label is copied unmodified – the kernel doesn’t interpret it.
- MR0 (Message Register 0) carries return codes for kernel object invocations via seL4_Call.
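As a rough illustration of the first point, a caller can treat the returned seL4_Error as an ordinary status value and map it to a name for logging. The sketch below mirrors the enum locally so that it builds without libsel4; sel4_error_name is a hypothetical helper introduced here, not part of the seL4 API.

```c
#include <stddef.h>

/* Local mirror of the seL4_Error enum listed above, so this sketch
 * compiles without libsel4. */
typedef enum {
    seL4_NoError = 0,
    seL4_InvalidArgument = 1,
    seL4_InvalidCapability = 2,
    seL4_IllegalOperation = 3,
    seL4_RangeError = 4,
    seL4_AlignmentError = 5,
    seL4_FailedLookup = 6,
    seL4_TruncatedMessage = 7,
    seL4_DeleteFirst = 8,
    seL4_RevokeFirst = 9,
    seL4_NotEnoughMemory = 10,
} seL4_Error;

/* Hypothetical helper (not in libsel4): printable name for an error
 * code, or "unknown" for values outside the enum. */
static const char *sel4_error_name(seL4_Error err)
{
    static const char *const names[] = {
        "seL4_NoError",         "seL4_InvalidArgument",
        "seL4_InvalidCapability", "seL4_IllegalOperation",
        "seL4_RangeError",      "seL4_AlignmentError",
        "seL4_FailedLookup",    "seL4_TruncatedMessage",
        "seL4_DeleteFirst",     "seL4_RevokeFirst",
        "seL4_NotEnoughMemory",
    };
    size_t count = sizeof names / sizeof names[0];

    if ((size_t)err >= count) {
        return "unknown";
    }
    return names[err];
}
```

A wrapper around a capability invocation would compare the result against seL4_NoError and pass anything else to a helper like this before deciding how to recover.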
Error Propagation
Fault handler mechanism: each TCB has a fault endpoint capability. On fault (capability fault, VM fault, etc.):
- Kernel blocks the faulting thread.
- Kernel sends an IPC to the fault endpoint with fault-type-specific fields.
- Fault handler (separate process) receives, fixes, and replies.
- Kernel resumes the faulting thread.
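The four steps above can be modelled as a small state machine. This is a simulation under stated assumptions, not seL4 code: every type and function name below (thread_t, fault_msg_t, kernel_deliver_fault, demo_vm_handler) is hypothetical, and the "IPC" is reduced to a direct function call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model types; none of these come from the seL4 API. */
typedef enum { THREAD_RUNNING, THREAD_BLOCKED_ON_FAULT } thread_state_t;
typedef enum { FAULT_CAP, FAULT_VM } fault_type_t;

typedef struct {
    fault_type_t type;
    uint64_t addr;          /* fault-type-specific field */
} fault_msg_t;

typedef struct {
    thread_state_t state;
    bool page_mapped;       /* stands in for "the handler fixed the problem" */
} thread_t;

/* A handler returns true if it replied, i.e. the thread may resume. */
typedef bool (*fault_handler_fn)(thread_t *thread, const fault_msg_t *msg);

/* Example handler: repairs VM faults, declines everything else. */
static bool demo_vm_handler(thread_t *thread, const fault_msg_t *msg)
{
    if (msg->type != FAULT_VM) {
        return false;
    }
    thread->page_mapped = true;
    return true;
}

/* Models the kernel side of the protocol. */
static void kernel_deliver_fault(thread_t *thread, fault_msg_t msg,
                                 fault_handler_fn handler)
{
    thread->state = THREAD_BLOCKED_ON_FAULT;   /* step 1: block the thread */
    bool replied = handler(thread, &msg);      /* steps 2-3: send, handler fixes */
    if (replied) {
        thread->state = THREAD_RUNNING;        /* step 4: resume on reply */
    }
}
```

In this model a fault the handler cannot fix leaves the thread blocked, which matches the idea that resumption happens only when the handler replies.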
Design Choices
- seL4_NBSend on invalid capability: silently fails (prevents covert channels).
- seL4_Send/seL4_Call on invalid capability: returns seL4_FailedLookup.
- No application-level error convention – user servers choose their own protocol.
- Partial capability transfer: if some caps in a multi-cap transfer fail, already-transferred caps succeed; extraCaps reflects the successful count.
Sources
- seL4 errors.h: https://github.com/seL4/seL4/blob/master/libsel4/include/sel4/errors.h
- seL4 IPC tutorial: https://docs.sel4.systems/Tutorials/ipc.html
- seL4 fault handlers: https://docs.sel4.systems/Tutorials/fault-handlers.html
- seL4 API reference: https://docs.sel4.systems/projects/sel4/api-doc.html
2. Fuchsia / Zircon
zx_status_t
Signed 32-bit integer. Negative = error, ZX_OK (0) = success.
Categories:
| Category | Examples |
|---|---|
| General | ZX_ERR_INTERNAL, ZX_ERR_NOT_SUPPORTED, ZX_ERR_NO_RESOURCES, ZX_ERR_NO_MEMORY |
| Parameter | ZX_ERR_INVALID_ARGS, ZX_ERR_WRONG_TYPE, ZX_ERR_BAD_HANDLE, ZX_ERR_BUFFER_TOO_SMALL |
| State | ZX_ERR_BAD_STATE, ZX_ERR_NOT_FOUND, ZX_ERR_TIMED_OUT, ZX_ERR_ALREADY_EXISTS, ZX_ERR_PEER_CLOSED |
| Permission | ZX_ERR_ACCESS_DENIED |
| I/O | ZX_ERR_IO, ZX_ERR_IO_REFUSED, ZX_ERR_IO_DATA_INTEGRITY, ZX_ERR_IO_DATA_LOSS |
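The sign convention (negative means error, ZX_OK is zero) lends itself to early-return propagation. The sketch below is self-contained and does not include the Zircon headers: the typedef and the three constants are redeclared locally, with numeric values as recalled from zircon/errors.h (check them against the real header), and run_steps plus the step_* functions are hypothetical.

```c
#include <stdint.h>

/* Local stand-ins so the sketch builds without the Zircon headers.
 * Numeric values are assumptions to be checked against zircon/errors.h. */
typedef int32_t zx_status_t;
#define ZX_OK               0
#define ZX_ERR_INVALID_ARGS (-10)
#define ZX_ERR_PEER_CLOSED  (-24)

typedef zx_status_t (*step_fn)(void);

/* Hypothetical stand-ins for real operations. */
static zx_status_t step_ok(void)           { return ZX_OK; }
static zx_status_t step_peer_closed(void)  { return ZX_ERR_PEER_CLOSED; }
static zx_status_t step_invalid_args(void) { return ZX_ERR_INVALID_ARGS; }

/* Runs steps in order and returns the first non-OK status unchanged,
 * so the caller sees the original error rather than a generic failure. */
static zx_status_t run_steps(const step_fn *steps, int count)
{
    for (int i = 0; i < count; i++) {
        zx_status_t status = steps[i]();
        if (status != ZX_OK) {
            return status;
        }
    }
    return ZX_OK;
}
```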
FIDL Error Handling (Three Layers)
Layer 1: Transport errors. Channel broke. Currently all transport-level
FIDL errors close the channel. Client observes ZX_ERR_PEER_CLOSED.
Layer 2: Epitaphs (RFC-0053). Server sends a special final message
before closing a channel, explaining why. Wire format: ordinal 0xFFFFFFFF,
error status in the reserved uint32 of the FIDL message header. After
sending, server closes the channel.
Layer 3: Application errors (RFC-0060). Methods declare error types:
Method() -> (string result) error int32;
Serialized as:
    union MethodReturn {
        MethodResult result;
        int32 err;
    };
Error types constrained to int32, uint32, or an enum thereof. Deliberately
no standard error enum – each service defines its own error domain.
Rationale: standard error enums “try to capture more detail than we think is
appropriate.”
C++ binding: zx::result<T> (specialization of fit::result<zx_status_t, T>).
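For readers more at home in C, the result-or-error union can be approximated with a tagged union. This is only an illustration of the shape, not FIDL's wire format or any generated binding: method_return_t, its tag values, and the constructor helpers are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tag values; chosen for the example only. */
typedef enum {
    METHOD_RETURN_RESULT = 1,
    METHOD_RETURN_ERR = 2
} method_return_tag_t;

/* Exactly one arm is valid, selected by the tag. */
typedef struct {
    method_return_tag_t tag;
    union {
        const char *result;   /* success payload (the string result) */
        int32_t err;          /* service-defined error code */
    } u;
} method_return_t;

static method_return_t method_ok(const char *result)
{
    method_return_t r;
    r.tag = METHOD_RETURN_RESULT;
    r.u.result = result;
    return r;
}

static method_return_t method_err(int32_t err)
{
    method_return_t r;
    r.tag = METHOD_RETURN_ERR;
    r.u.err = err;
    return r;
}

static bool method_is_ok(const method_return_t *r)
{
    return r->tag == METHOD_RETURN_RESULT;
}
```

Because the error arm is a plain int32 with no shared enum behind it, its meaning depends entirely on which method returned it, which is the per-service error domain the RFC argues for.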
Sources
- Zircon errors: https://fuchsia.dev/fuchsia-src/concepts/kernel/errors
- RFC-0060 error handling: https://fuchsia.dev/fuchsia-src/contribute/governance/rfcs/0060_error_handling
- RFC-0053 epitaphs: https://fuchsia.dev/fuchsia-src/contribute/governance/rfcs/0053_epitaphs
3. EROS / KeyKOS / Coyotos
KeyKOS Invocation Message Format
KC (Key, Order_code)
STRUCTFROM(arg_structure)
KEYSFROM(arg_key_slots)
STRUCTTO(reply_structure)
KEYSTO(reply_key_slots)
RCTO(return_code_variable)
- Order code: small integer selecting the operation (method selector).
- Return code: integer returned by the invoked object via RCTO.
- Data string: bulk data parameter (up to ~4KB).
- Keys: up to 4 capability parameters in each direction.
Invocation Primitives
- CALL: send + block for reply. Kernel synthesizes a resume key (capability to resume caller) as 4th key parameter to callee.
- RETURN: reply using a resume key + go back to waiting.
- FORK: send and continue (fire-and-forget).
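A toy model of CALL may help make the resume-key convention concrete. It is a sketch under assumptions, not KeyKOS code: cap_key_t, invocation_t, kernel_call and demo_callee are hypothetical names, keys are reduced to integers, and treating return code 0 as success is an assumption of this example.

```c
#include <stdint.h>

enum { KEYS_PER_MESSAGE = 4 };   /* up to 4 keys in each direction */

typedef uint32_t cap_key_t;      /* hypothetical: a key reduced to an id */

typedef struct {
    uint32_t order_code;                  /* method selector */
    cap_key_t keys[KEYS_PER_MESSAGE];     /* capability parameters */
} invocation_t;

/* The callee returns a return code and reports which resume key it saw. */
typedef uint32_t (*callee_fn)(const invocation_t *msg, cap_key_t *resume_seen);

/* Models CALL: the kernel overwrites the 4th key slot with a freshly
 * synthesized resume key before the callee sees the message. The
 * message is passed by value, so the caller's copy is untouched. */
static uint32_t kernel_call(callee_fn callee, invocation_t msg,
                            cap_key_t resume_key, cap_key_t *resume_seen)
{
    msg.keys[KEYS_PER_MESSAGE - 1] = resume_key;
    return callee(&msg, resume_seen);
}

/* Example callee: supports order code 1 only (0 = success is assumed). */
static uint32_t demo_callee(const invocation_t *msg, cap_key_t *resume_seen)
{
    *resume_seen = msg->keys[KEYS_PER_MESSAGE - 1];
    return msg->order_code == 1 ? 0u : 1u;
}
```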
Keeper Error Handling
Every domain has a domain keeper slot. On hardware trap (illegal instruction, divide-by-zero, protection fault):
- Kernel invokes the keeper as if the domain had issued a CALL.
- Keeper receives fault information in the message.
- Keeper can fix and resume (via resume key) or terminate.
- A non-zero return code from a key invocation triggers the keeper mechanism.
Coyotos (EROS Successor) – Formalized Error Model
Cleanly separates invocation-level vs application-level exceptions:
Invocation-level (before the target processes the message):
MalformedSyscall, InvalidAddress, AccessViolation,
DataAccessTypeError, CapAccessTypeError, MalformedSpace,
MisalignedReference
Application-level: signaled via OPR0.ex flag bit in the reply control
word. If set, remaining parameter words contain a 64-bit exception code
plus optional info.
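Decoding such a reply amounts to testing one flag and then reading the exception code from the parameter words. The sketch below assumes a layout for illustration only: the bit position of the ex flag, the number of parameter words, and the names coy_reply_t and reply_exception are hypothetical and are not taken from the Coyotos specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed for the example: the ex flag is bit 0 of the control word. */
#define OPR0_EX_FLAG (UINT64_C(1) << 0)

enum { REPLY_PARAM_WORDS = 3 };  /* hypothetical count */

typedef struct {
    uint64_t opr0;                        /* reply control word */
    uint64_t params[REPLY_PARAM_WORDS];   /* results, or exception code + info */
} coy_reply_t;

/* Returns true and stores the 64-bit exception code if the reply
 * signals an application-level exception; otherwise returns false
 * and leaves *code_out untouched. */
static bool reply_exception(const coy_reply_t *reply, uint64_t *code_out)
{
    if ((reply->opr0 & OPR0_EX_FLAG) == 0) {
        return false;
    }
    *code_out = reply->params[0];
    return true;
}
```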
Sources
- KeyKOS architecture: https://dl.acm.org/doi/pdf/10.1145/858336.858337
- Coyotos spec: https://hydra-www.ietfng.org/capbib/cache/shapiro:coyotosspec.html
- EROS (SOSP 1999): https://sites.cs.ucsb.edu/~chris/teaching/cs290/doc/eros-sosp99.pdf
4. Plan 9 / 9P
9P2000 Rerror Format
size[4] Rerror tag[2] ename[s]
- ename[s]: variable-length UTF-8 string describing the error.
- No Terror message – only servers send errors.
- String-based, not numeric. Conventional strings (“permission denied”, “file not found”) but no fixed taxonomy.
9P2000.u Extension (Unix compatibility)
size[4] Rerror tag[2] ename[s] errno[4]
Adds a 4-byte Unix errno as a hint. Clients should prefer the string.
ERRUNDEF is the sentinel value sent when no Unix errno applies.
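Both layouts are simple enough to encode by hand. The sketch below writes an Rerror in either form, relying on the standard 9P conventions that integers are little-endian, that size[4] counts itself, that a string s is a 2-byte length followed by the bytes, and that Rerror is message type 107. The function name p9_encode_rerror and its with_errno switch are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { P9_RERROR = 107 };   /* 9P2000 message type for Rerror */

static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Encodes size[4] Rerror tag[2] ename[s], plus errno[4] when with_errno
 * is non-zero (the 9P2000.u form). Returns the number of bytes written,
 * or 0 if the name is too long or the buffer is too small. */
static size_t p9_encode_rerror(uint8_t *buf, size_t cap, uint16_t tag,
                               const char *ename, int with_errno,
                               uint32_t unix_errno)
{
    size_t name_len = strlen(ename);
    if (name_len > UINT16_MAX) {
        return 0;
    }
    /* size + type + tag + string length prefix + string [+ errno] */
    size_t total = 4 + 1 + 2 + 2 + name_len + (with_errno ? 4 : 0);
    if (total > cap) {
        return 0;
    }
    put_le32(buf, (uint32_t)total);
    buf[4] = P9_RERROR;
    put_le16(buf + 5, tag);
    put_le16(buf + 7, (uint16_t)name_len);
    memcpy(buf + 9, ename, name_len);   /* no NUL terminator on the wire */
    if (with_errno) {
        put_le32(buf + 9 + name_len, unix_errno);
    }
    return total;
}
```

The fixed part of the message is nine bytes, so the cost of string errors is the string itself, and the 9P2000.u form adds four more bytes at the end.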
Design Rationale
Avoids “errno fragmentation” where different Unix variants assign different numbers to the same condition. The string is authoritative; the number is an optimization for Unix-compatibility clients.
Sources
- 9P2000 RFC: https://ericvh.github.io/9p-rfc/rfc9p2000.html
- 9P2000.u RFC: https://ericvh.github.io/9p-rfc/rfc9p2000.u.html
5. Genode
RPC Exception Propagation
GENODE_RPC_THROW(func_type, ret_type, func_name,
GENODE_TYPE_LIST(Exception1, Exception2, ...),
arg_type...)
Only the exception type crosses the boundary – exception objects (fields,
messages) are not transferred. Server encodes a numeric Rpc_exception_code,
client reconstructs a default-constructed exception of the matching type.
Undeclared exceptions: undefined behavior (server crash or hung RPC).
Infrastructure-Level Errors
- RPC_INVALID_OPCODE: dispatched operation code doesn’t match.
- Rpc_exception_code: integral type, computed as RPC_EXCEPTION_BASE - index_in_exception_list.
- Ipc_error: kernel IPC failure (server unreachable).
- Server death: capabilities become invalid, subsequent invocations produce Ipc_error.
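The exception-code arithmetic can be written out directly. This sketch is plain C rather than Genode's C++ and does not use the Genode headers; the numeric values chosen for the three constants are assumptions made for the example, and the two helper functions are hypothetical.

```c
/* Assumed values for illustration; check Genode's rpc.h for the real ones. */
enum {
    RPC_SUCCESS        = 0,
    RPC_INVALID_OPCODE = -1,
    RPC_EXCEPTION_BASE = -1000
};

/* Server side: code for the exception at position index in the
 * declared exception list, per RPC_EXCEPTION_BASE - index. */
static int exception_code_for_index(int index)
{
    return RPC_EXCEPTION_BASE - index;
}

/* Client side: recover the list position from a received code.
 * Returns -1 if the code does not name one of the declared_count
 * declared exceptions (success, invalid opcode, or out of range). */
static int index_for_exception_code(int code, int declared_count)
{
    int index = RPC_EXCEPTION_BASE - code;

    if (index < 0 || index >= declared_count) {
        return -1;
    }
    return index;
}
```

Only the index survives the round trip, which is why the client can rebuild nothing more than a default-constructed exception of the matching type.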
Sources
- Genode RPC: https://genode.org/documentation/genode-foundations/20.05/functional_specification/Remote_procedure_calls.html
- Genode IPC: https://genode.org/documentation/genode-foundations/23.05/architecture/Inter-component_communication.html
6. Cross-System Comparison: Transport vs Application Errors
Every capability/microkernel IPC system surveyed above separates two failure modes:
- Transport errors – the invocation mechanism failed before the target processed the request (bad handle, insufficient rights, target dead, malformed message, timeout).
- Application errors – the service processed the request and returned a meaningful error (not found, resource exhausted, invalid operation).
| System | Transport errors | Application errors |
|---|---|---|
| seL4 | seL4_Error (11 values) from syscall | IPC message payload (user-defined) |
| Zircon | zx_status_t (~30 values) from syscall | FIDL per-method error type |
| EROS/Coyotos | Invocation exceptions (kernel) | OPR0.ex flag + code in reply |
| Plan 9 | Connection loss | Rerror with string |
| Genode | Ipc_error + RPC_INVALID_OPCODE | C++ exceptions via GENODE_RPC_THROW |
| Cap’n Proto RPC | disconnected/unimplemented | failed/overloaded or schema types |
Common pattern: a small, kernel-defined error code set for transport failures, plus service-specific errors (usually typed per interface) for application failures.
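One way to carry this two-level split through client code is to keep the two kinds of failure in separate fields and check the transport level first. The sketch below is an illustration only: transport_status_t, call_outcome_t, outcome_kind_t and classify are hypothetical names, and using 0 for "no application error" is an assumption of the example.

```c
#include <stdint.h>

/* Small, fixed set owned by the kernel or transport (hypothetical values). */
typedef enum {
    CALL_DELIVERED,
    CALL_BAD_CAPABILITY,
    CALL_PEER_GONE
} transport_status_t;

/* app_error is only meaningful when transport == CALL_DELIVERED;
 * its values belong to the invoked interface, not to a global space. */
typedef struct {
    transport_status_t transport;
    int32_t app_error;
} call_outcome_t;

typedef enum {
    OUTCOME_SUCCESS,
    OUTCOME_TRANSPORT_ERROR,
    OUTCOME_APPLICATION_ERROR
} outcome_kind_t;

/* Transport failures take precedence: if the request never reached
 * the service, the application field carries no information. */
static outcome_kind_t classify(call_outcome_t outcome)
{
    if (outcome.transport != CALL_DELIVERED) {
        return OUTCOME_TRANSPORT_ERROR;
    }
    if (outcome.app_error != 0) {
        return OUTCOME_APPLICATION_ERROR;
    }
    return OUTCOME_SUCCESS;
}
```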
7. POSIX errno: Strengths and Weaknesses for Capability Systems
Strengths
- Simple (single integer, zero overhead on success).
- Universal (every Unix developer knows it).
- Low overhead (no allocation on error path).
Weaknesses for Capability Systems
- Ambient authority assumption: EACCES/EPERM assume ACL-style access control. In capability systems, having the capability IS the permission.
- Global flat namespace: all errors share one integer space. Capability systems have typed interfaces; errors should be scoped per-interface.
- No structured information: just an integer, no “which argument” or “how much memory needed.”
- Thread-local state: clobbered by intermediate calls, breaks down with async IPC or promise pipelining.
- No transport/application distinction: EBADF (transport) and ENOENT (application) in the same space.
- Not composable across trust boundaries: callee’s errno meaningless in caller’s address space without explicit serialization.
None of the capability systems surveyed above uses a POSIX-style global errno namespace.
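The clobbering weakness is easy to reproduce in ordinary C. In the sketch below, failing_open is a hypothetical stand-in that sets errno to ENOENT the way a failed open would, and intermediate_call stands for any logging or cleanup code that happens to run before the caller inspects errno; it relies only on the standard guarantee that strtol sets errno to ERANGE on overflow.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for an operation that fails with ENOENT. */
static int failing_open(void)
{
    errno = ENOENT;
    return -1;
}

/* Stand-in for unrelated code on the error path. The out-of-range
 * input makes strtol set errno to ERANGE. */
static void intermediate_call(void)
{
    (void)strtol("999999999999999999999999999999", NULL, 10);
}

/* Reads errno late: the original cause has been overwritten. */
static int errno_read_late(void)
{
    if (failing_open() < 0) {
        intermediate_call();
        return errno;
    }
    return 0;
}

/* Saves errno immediately: the original cause survives. */
static int errno_saved_first(void)
{
    if (failing_open() < 0) {
        int saved = errno;
        intermediate_call();
        return saved;
    }
    return 0;
}
```

The save-first discipline works within one thread, but it is exactly the kind of convention that an explicit per-call result value makes unnecessary.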