# Cap'n Proto Error Handling: Research Notes

Research on how Cap'n Proto handles errors at the protocol, schema, and Rust
crate levels. Used as input for the capOS error handling proposal.

---

## 1. Protocol-Level Exception Model (rpc.capnp)

The Cap'n Proto RPC protocol defines an `Exception` struct used in three
positions: `Message.abort`, `Return.exception`, and `Resolve.exception`.

```capnp
struct Exception {
  reason @0 :Text;
  type @3 :Type;
  enum Type {
    failed @0;        # deterministic bug/invalid input; retrying won't help
    overloaded @1;    # temporary lack of resources; retry with backoff
    disconnected @2;  # connection to necessary capability was lost
    unimplemented @3; # server doesn't implement the method
  }
  obsoleteIsCallersFault @1 :Bool;
  obsoleteDurability @2 :UInt16;
  trace @4 :Text;     # stack trace from the remote server
}
```

The four exception types describe **client response strategy**, not error
semantics:

| Type | Client response |
|------|----------------|
| `failed` | Log and propagate. Don't retry. |
| `overloaded` | Retry with exponential backoff. |
| `disconnected` | Re-establish connection, retry. |
| `unimplemented` | Fall back to alternative methods. |
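
The table can be read as a dispatch function on the client side. A minimal sketch, assuming local stand-in types: `ExceptionType` mirrors the wire enum, while `ClientAction`, `client_action`, and the backoff constants (100 ms base, capped at 6400 ms) are hypothetical and not part of any Cap'n Proto library.

```rust
use std::time::Duration;

/// Local stand-in for the four wire-level exception types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExceptionType {
    Failed,
    Overloaded,
    Disconnected,
    Unimplemented,
}

/// What the caller does next (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ClientAction {
    Propagate,
    RetryAfter(Duration),
    ReconnectAndRetry,
    FallBack,
}

/// Picks a response for an exception type.
/// `attempt` counts retries made so far, starting at 0.
pub fn client_action(ty: ExceptionType, attempt: u32) -> ClientAction {
    match ty {
        ExceptionType::Failed => ClientAction::Propagate,
        ExceptionType::Overloaded => {
            // Exponential backoff: 100 ms, 200 ms, 400 ms, ... capped at 6400 ms.
            let shift = attempt.min(6);
            ClientAction::RetryAfter(Duration::from_millis(100u64 << shift))
        }
        ExceptionType::Disconnected => ClientAction::ReconnectAndRetry,
        ExceptionType::Unimplemented => ClientAction::FallBack,
    }
}
```

For example, `client_action(ExceptionType::Overloaded, 3)` yields `RetryAfter(800ms)`, and any attempt count of 6 or more stays at the 6400 ms cap.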

## 2. Rust capnp Crate (v0.25.x)

### Core error types

```rust
pub type Result<T> = ::core::result::Result<T, Error>;

#[derive(Debug, Clone)]
pub struct Error {
    pub kind: ErrorKind,
    pub extra: String,  // human-readable description (requires `alloc`)
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[non_exhaustive]
pub enum ErrorKind {
    // Four RPC-mapped kinds (match Exception.Type)
    Failed,
    Overloaded,
    Disconnected,
    Unimplemented,

    // Wire format validation errors (~40 more variants)
    BufferNotLargeEnough,
    EmptyBuffer,
    MessageContainsOutOfBoundsPointer,
    MessageIsTooDeeplyNested,
    ReadLimitExceeded,
    TextContainsNonUtf8Data(core::str::Utf8Error),
    // ... etc
}
```

Constructor functions: `Error::failed(s)`, `Error::overloaded(s)`,
`Error::disconnected(s)`, `Error::unimplemented(s)`.

`NotInSchema(u16)` is the error returned when a reader encounters an enum value
or union discriminant that its compiled schema does not know; the `u16` is the
raw value.

### std::io::Error mapping

When the `std` feature is enabled, `From<std::io::Error>` maps
`std::io::ErrorKind` values as follows:
- `TimedOut` -> `Overloaded`
- `BrokenPipe`/`ConnectionRefused`/`ConnectionReset`/`ConnectionAborted`/`NotConnected` -> `Disconnected`
- `UnexpectedEof` -> `PrematureEndOfFile`
- Everything else -> `Failed`
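
The mapping above, restated as a standalone function. This is a sketch rather than the crate's source: `Kind` is a local stand-in for the relevant subset of `capnp::ErrorKind`, and `kind_from_io` is a hypothetical name.

```rust
use std::io;

/// Local stand-in for the subset of `capnp::ErrorKind` named in the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Failed,
    Overloaded,
    Disconnected,
    PrematureEndOfFile,
}

/// Classifies an I/O error the way the list above describes.
pub fn kind_from_io(err: &io::Error) -> Kind {
    match err.kind() {
        io::ErrorKind::TimedOut => Kind::Overloaded,
        io::ErrorKind::BrokenPipe
        | io::ErrorKind::ConnectionRefused
        | io::ErrorKind::ConnectionReset
        | io::ErrorKind::ConnectionAborted
        | io::ErrorKind::NotConnected => Kind::Disconnected,
        io::ErrorKind::UnexpectedEof => Kind::PrematureEndOfFile,
        // Everything else, including PermissionDenied and NotFound.
        _ => Kind::Failed,
    }
}
```

One consequence worth noting: errors such as `PermissionDenied` fall into the catch-all arm, so a caller inspecting only the kind cannot distinguish them from other `Failed` errors.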

## 3. capnp-rpc Rust Crate Error Mapping

Bidirectional conversion between wire `Exception` and `capnp::Error`:

**Sending (Error -> Exception):**
```rust
fn from_error(error: &Error, mut builder: exception::Builder) {
    let typ = match error.kind {
        ErrorKind::Failed => exception::Type::Failed,
        ErrorKind::Overloaded => exception::Type::Overloaded,
        ErrorKind::Disconnected => exception::Type::Disconnected,
        ErrorKind::Unimplemented => exception::Type::Unimplemented,
        _ => exception::Type::Failed,  // all validation errors -> Failed
    };
    builder.set_type(typ);
    builder.set_reason(&error.extra);
}
```

**Receiving (Exception -> Error):**
Maps `exception::Type` back to `ErrorKind`, preserving the reason string.
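
Taken together, the two directions are lossy for anything outside the four RPC kinds. A sketch of the round trip, assuming local stand-in types (`ExceptionType`, `ErrorKind`, `Error`) and hypothetical function names `to_wire` and `from_wire`; it does not reproduce the crate's exact reason-string formatting.

```rust
/// Local stand-in for the wire-level `exception::Type`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExceptionType {
    Failed,
    Overloaded,
    Disconnected,
    Unimplemented,
}

/// Stand-in for `capnp::ErrorKind`, with one validation variant for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    Failed,
    Overloaded,
    Disconnected,
    Unimplemented,
    MessageIsTooDeeplyNested,
}

/// Stand-in for `capnp::Error`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Error {
    pub kind: ErrorKind,
    pub extra: String,
}

/// Sending direction: validation kinds collapse to `Failed`.
pub fn to_wire(error: &Error) -> (ExceptionType, String) {
    let ty = match error.kind {
        ErrorKind::Failed => ExceptionType::Failed,
        ErrorKind::Overloaded => ExceptionType::Overloaded,
        ErrorKind::Disconnected => ExceptionType::Disconnected,
        ErrorKind::Unimplemented => ExceptionType::Unimplemented,
        _ => ExceptionType::Failed,
    };
    (ty, error.extra.clone())
}

/// Receiving direction: one-to-one on the four wire types.
pub fn from_wire(ty: ExceptionType, reason: &str) -> Error {
    let kind = match ty {
        ExceptionType::Failed => ErrorKind::Failed,
        ExceptionType::Overloaded => ErrorKind::Overloaded,
        ExceptionType::Disconnected => ErrorKind::Disconnected,
        ExceptionType::Unimplemented => ErrorKind::Unimplemented,
    };
    Error { kind, extra: reason.to_string() }
}
```

Sending an error with kind `MessageIsTooDeeplyNested` through `to_wire` and back through `from_wire` yields kind `Failed` with the same reason text, so the receiver can only recover the original cause from the string.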

Server traits return `Promise<(), capnp::Error>`. Client gets
`Promise<Response<Results>, capnp::Error>`.

## 4. Cap'n Proto Error Handling Philosophy

From the KJ library documentation and Kenton Varda:

> "KJ exceptions are meant to express unrecoverable problems or logistical
> problems orthogonal to the API semantics; they are NOT intended to be used
> as part of your API semantics."

> "In the Cap'n Proto world, 'checked exceptions' (where an interface
> explicitly defines the exceptions it throws) do NOT make sense."

**Exceptions**: infrastructure failures (network down, bug, overload).
**Application errors**: should be modeled in the schema return types.
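
On the Rust side this split shows up as two layers: the outer `Result` carries the exception, and the success value carries the application outcome. A minimal sketch with hypothetical names (`InfraError` stands in for `capnp::Error`; `LookupOutcome` and `describe` are invented for illustration):

```rust
/// Infrastructure failure: stand-in for `capnp::Error`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct InfraError {
    pub reason: String,
}

/// Application outcome, as it would be modeled in a schema return type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LookupOutcome {
    Found(String),
    NotFound,
}

/// Handles both layers: `Err` is an exception, `Ok` is a normal outcome,
/// including the "negative" outcome `NotFound`.
pub fn describe(call: Result<LookupOutcome, InfraError>) -> String {
    match call {
        Ok(LookupOutcome::Found(value)) => format!("found {value}"),
        Ok(LookupOutcome::NotFound) => "not found".to_string(),
        Err(e) => format!("call failed: {}", e.reason),
    }
}
```

The point of the shape is that `NotFound` is handled in the same arm family as success, while only infrastructure problems reach the `Err` arm.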

## 5. Schema Design Patterns for Application Errors

### Generic Result pattern

```capnp
struct Error {
    code @0 :UInt16;
    message @1 :Text;
}

struct Result(Ok) {
    union {
        ok @0 :Ok;
        err @1 :Error;
    }
}

interface MyService {
    doThing @0 (input :Text) -> (result :Result(Text));
}
```

**Constraint**: generic type parameters bind only to pointer types (Text,
Data, structs, lists, interfaces), not primitives (UInt32, Bool). So
`Result(UInt64)` is rejected by the schema compiler; a primitive payload has to
be wrapped in a struct first.

### Per-method result unions

```capnp
interface FileSystem {
    open @0 (path :Text) -> (result :OpenResult);
}

struct OpenResult {
    union {
        file @0 :File;
        notFound @1 :Void;
        permissionDenied @2 :Void;
        error @3 :Text;
    }
}
```

Cap'n Proto has no free-standing unions; a union always lives inside a struct.
As a result, new fields can be added to the enclosing struct later without
breaking compatibility.
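
A related compatibility point: if a variant is later added to the union itself, a reader compiled against the older schema sees an unknown discriminant, which the Rust crate reports as `NotInSchema`. The sketch below models that with local stand-ins; `OpenWhich`, `which`, and `summarize` are hypothetical and only imitate the shape of generated code, and the `File` capability payload is omitted.

```rust
/// Stand-in for `capnp::NotInSchema`, which carries the unknown discriminant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NotInSchema(pub u16);

/// Variants of `OpenResult` as known to this (older) reader.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum OpenWhich {
    File,
    NotFound,
    PermissionDenied,
    Error(String),
}

/// Hypothetical decoder from a raw discriminant.
pub fn which(discriminant: u16, text: &str) -> Result<OpenWhich, NotInSchema> {
    match discriminant {
        0 => Ok(OpenWhich::File),
        1 => Ok(OpenWhich::NotFound),
        2 => Ok(OpenWhich::PermissionDenied),
        3 => Ok(OpenWhich::Error(text.to_string())),
        // A variant added by a newer schema lands here.
        other => Err(NotInSchema(other)),
    }
}

/// Caller-side handling that stays total when the schema grows.
pub fn summarize(result: Result<OpenWhich, NotInSchema>) -> String {
    match result {
        Ok(OpenWhich::File) => "opened".to_string(),
        Ok(OpenWhich::NotFound) => "not found".to_string(),
        Ok(OpenWhich::PermissionDenied) => "permission denied".to_string(),
        Ok(OpenWhich::Error(msg)) => format!("error: {msg}"),
        Err(NotInSchema(n)) => format!("unknown outcome {n}"),
    }
}
```

Callers that treat the `NotInSchema` arm as a generic failure keep working when a server starts returning a variant they were not built for.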

## 6. How Other Cap'n Proto Systems Handle Errors

### Sandstorm

Uses the exception mechanism for infrastructure errors. Capabilities report
errors through disconnection. The `grain.capnp` schema does not define
explicit error types. `util.capnp` documents errors as "It will throw an
exception if any error occurs."

### Cloudflare Workers (workerd)

Uses Cap'n Proto for internal RPC. JavaScript `Error.message` and
`Error.name` are preserved across RPC; stack traces and custom properties
are stripped. Does not model errors in capnp schema -- relies on exception
propagation.

### OCapN (Open Capability Network)

Adopted the same four-kind exception model for cross-system compatibility.
Diagnostic information is non-normative. Security concern: exception objects
may leak sensitive information (stack traces, paths) at CapTP boundaries.

Kenton Varda expressed reservations about `unimplemented` (ambiguity about
whether the direct method or callees failed) and `disconnected` (requires
catching at specific stack frames for meaningful retry).

## 7. Relevance to capOS

capOS uses the `capnp` crate but not `capnp-rpc`. Manual dispatch goes through
`CapObject::call()` with caller-provided params/result buffers. Current error
handling:

- `capnp::Error::failed()` for semantic errors
- `capnp::Error::unimplemented()` for unknown methods
- `?` for deserialization errors (naturally produce `capnp::Error`)
- Transport errors become negative CQE result codes (`CAP_ERR_INVALID_REQUEST`,
  `CAP_ERR_INVALID_PARAMS_BUFFER`, `CAP_ERR_INVALID_RESULT_BUFFER`,
  `CAP_ERR_INVOKE_FAILED`, `CAP_ERR_UNSUPPORTED_OPCODE`,
  `CAP_ERR_TRANSFER_NOT_SUPPORTED`, `CAP_ERR_TRANSFER_ABORTED`, etc.).
- Kernel-produced `CapException` values are serialized into result buffers for
  capability-level failures (`CAP_ERR_APPLICATION_EXCEPTION`) and decoded by
  `capos-rt`. If the result buffer is too small to hold the serialized
  `CapException`, the CQE result is `CAP_ERR_APPLICATION_EXCEPTION_TRUNCATED`
  instead. The per-process `ringScratchLimitBytes` manifest field bounds the
  kernel-side scratch allocation and makes this truncated path reachable for
  tightly constrained process profiles.

capOS extends the standard four-kind `ExceptionType` with a fifth variant,
`invalidArgument`, for capability-level argument validation failures. This
fifth kind has no capnp-rpc equivalent; it maps to `Failed` when converting
back to `capnp::ErrorKind` for logging.
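
The five-to-four conversion can be sketched as below. Both enums are local stand-ins (the real capOS `ExceptionType` is schema-generated, and `capnp::ErrorKind` has many more variants), and `to_error_kind` is a hypothetical name.

```rust
/// Stand-in for the capOS five-kind `ExceptionType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExceptionType {
    Failed,
    Overloaded,
    Disconnected,
    Unimplemented,
    InvalidArgument,
}

/// Stand-in for the four RPC-mapped variants of `capnp::ErrorKind`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    Failed,
    Overloaded,
    Disconnected,
    Unimplemented,
}

/// Converts for logging; `InvalidArgument` has no counterpart and becomes `Failed`.
pub fn to_error_kind(ty: ExceptionType) -> ErrorKind {
    match ty {
        ExceptionType::Failed | ExceptionType::InvalidArgument => ErrorKind::Failed,
        ExceptionType::Overloaded => ErrorKind::Overloaded,
        ExceptionType::Disconnected => ErrorKind::Disconnected,
        ExceptionType::Unimplemented => ErrorKind::Unimplemented,
    }
}
```

Because the match is exhaustive with no wildcard arm, adding a sixth kind to the stand-in enum would fail to compile until its mapping is decided.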

The normative schema-author rule now lives in
[Error Handling](../proposals/error-handling-proposal.md): CQE status is for
ring/transport/kernel dispatch failure, `CapException` is for
capability-level infrastructure failure, and schema result unions are for normal
application/domain outcomes.

The `capnp::Error` type carries the information needed for `CapException`:
`kind` maps to `ExceptionType`, and `extra` maps to `message`.

---

## Sources

- Cap'n Proto RPC Protocol: https://capnproto.org/rpc.html
- Cap'n Proto C++ RPC: https://capnproto.org/cxxrpc.html
- Cap'n Proto Schema Language: https://capnproto.org/language.html
- Cap'n Proto FAQ: https://capnproto.org/faq.html
- KJ exception.h: https://github.com/capnproto/capnproto/blob/master/c%2B%2B/src/kj/exception.h
- rpc.capnp schema: https://github.com/capnproto/capnproto/blob/master/c%2B%2B/src/capnp/rpc.capnp
- OCapN error handling discussion: https://github.com/ocapn/ocapn/issues/10
- Cap'n Proto usage patterns: https://github.com/capnproto/capnproto/discussions/1849
- capnp-rpc Rust crate: https://crates.io/crates/capnp-rpc
- Cloudflare Workers RPC errors: https://developers.cloudflare.com/workers/runtime-apis/rpc/error-handling/
- Sandstorm util.capnp: https://docs.rs/crate/sandstorm/0.0.5/source/schema/util.capnp
