# Proposal: Schema Registry Capability

Cap'n Proto schemas are self-describing. When the compiler processes
`schema/capos.capnp`, it emits a `CodeGeneratorRequest` containing every
interface id, every method name and ordinal, every parameter and result struct
layout, every enum, and every doc comment. That machine-readable reflection
data exists today; it is simply not served at runtime. This proposal defines a
`SchemaRegistry` capability that serves it.

**Status:** Proposal. No implementation. The prerequisite work -- schema
doc-comment authoring across `schema/capos.capnp` and preservation of those
comments in the generated-bindings pipeline -- is tracked separately and is
also a prerequisite for the [System Manual](system-manual-proposal.md) Phase 3.
This proposal records the design and its authority model so it can be built
once those prerequisites land.

## Problem

Every capability interface in capOS has a precise machine-readable definition:
method names, ordinals, parameter struct field names and types, result struct
layouts, enums. Today that information lives only in the host-side compiler
output, the checked-in generated Rust bindings, and in the heads of developers
who have read the schema. A running capOS instance cannot answer:

- "What methods does this interface expose?"
- "What ordinal does `listMethods` map to?"
- "What fields does the parameter struct for `resolveMethod` contain?"

This gap affects three categories of caller:

1. **Interactive shell.** A user typing `call @cap.method(args)` in the capOS
   shell wants the shell to resolve the human method name to an ordinal, check
   argument types against the parameter struct schema, encode the capnp message,
   dispatch the call, and decode the result -- all without requiring the user to
   have memorized ordinals or wire layouts.
2. **Dynamic and agent-driven callers.** A process or agent that receives a
   capability without compile-time bindings cannot easily discover what methods
   are available. Today it must carry out-of-band schema knowledge or guess. A
   machine-readable registry eliminates that gap.
3. **Cross-language and network tooling.** A host-side tool connecting to a
   running capOS instance via the remote-session gateway needs schema metadata
   to encode and decode capnp messages without shipping language-specific
   generated bindings for every possible capability type.

## What Cap'n Proto's Self-Description Provides

The capnp compiler's `CodeGeneratorRequest` contains:

- For **interfaces**: the 64-bit interface id, the interface name, and for each
  method: its ordinal (call slot number), method name, the type id of the
  parameter struct, and the type id of the result struct.
- For **structs**: the struct's 64-bit type id, name, and for each field: field
  name, ordinal slot, and the type of the value (a primitive, a struct type id,
  a list, a capability type id, etc.).
- For **enums**: the enum type id, name, and for each enumerant: name and
  numeric value.
- **Doc comments**: the raw doc-comment text attached to interfaces, methods,
  structs, and fields. These are preserved in the `CodeGeneratorRequest` when
  the compiler receives them; the current generated-bindings pipeline strips
  them. Preserving them is a tracked prerequisite.

The build bakes this data into a boot-packaged blob at `make` time, exactly as
it bakes the System Manual corpus, and the registry serves that blob. Both are
read-only deliveries of build-time information; neither reflects the live state
of a running object.

## Relationship to the System Manual

The `SchemaRegistry` and the `Manual` capability share one substrate: the same
`CodeGeneratorRequest` blob baked at build time. They are two delivery modes of
that shared reflection data:

- **Manual** (see [system-manual-proposal.md](system-manual-proposal.md)):
  human prose delivery. It renders the schema into `man(2)`-style interface
  pages with `SYNOPSIS` sections generated from method signatures and
  `DESCRIPTION` sections from doc comments. It serves text, structured for
  human reading.
- **SchemaRegistry**: machine-readable metadata delivery. It serves structured
  `SchemaNode` values carrying interface ids, ordinals, type ids, and field
  layouts. It serves data, structured for programmatic consumption.

The two can share a service implementation that reads the same blob; the
interface shape differs because the consumers differ.

## Interface

```capnp
struct MethodInfo {
    ordinal      @0 :UInt16;    # call slot number
    name         @1 :Text;      # as written in the .capnp source
    paramTypeId  @2 :UInt64;    # type id of the parameter struct
    resultTypeId @3 :UInt64;    # type id of the result struct
    docComment   @4 :Text;      # empty until doc-comment prerequisite lands
}

struct FieldInfo {
    slot         @0 :UInt16;
    name         @1 :Text;
    typeKind     @2 :TypeKind;
    structTypeId @3 :UInt64;    # set when typeKind == struct
    docComment   @4 :Text;
}

enum TypeKind {
    # No trailing underscores: capnp rejects underscores in declaration names.
    void @0; bool @1; int8 @2; int16 @3; int32 @4; int64 @5;
    uint8 @6; uint16 @7; uint32 @8; uint64 @9; float32 @10; float64 @11;
    text @12; data @13; list @14; enum @15; struct @16; interface @17;
    anyPointer @18;
}

struct SchemaNode {
    typeId      @0 :UInt64;
    displayName @1 :Text;
    union {
        interface @2 :InterfaceSchema;
        struct    @3 :StructSchema;
        enum      @4 :EnumSchema;
        other     @5 :Void;
    }
}
}

struct InterfaceSchema {
    methods     @0 :List(MethodInfo);
    docComment  @1 :Text;
}

struct StructSchema {
    fields      @0 :List(FieldInfo);
    docComment  @1 :Text;
}

struct EnumSchema {
    enumerants  @0 :List(EnumerantInfo);
    docComment  @1 :Text;
}

struct EnumerantInfo {
    value       @0 :UInt16;
    name        @1 :Text;
}

struct SearchResult {
    typeId      @0 :UInt64;
    displayName @1 :Text;
    kind        @2 :Text;        # "interface", "struct", or "enum"
    snippet     @3 :Text;        # first line of doc comment, if present
}

interface SchemaRegistry {
    resolveMethod @0 (interfaceId :UInt64, name :Text)
        -> (ordinal :UInt16, paramTypeId :UInt64, resultTypeId :UInt64);
    # Resolve a method name on an interface to its ordinal and struct type ids.

    lookupType    @1 (typeId :UInt64) -> (node :SchemaNode);
    # Fetch the full schema node for a given type id.

    listMethods   @2 (interfaceId :UInt64) -> (methods :List(MethodInfo));
    # List all methods on an interface (for discovery without a known name).

    search        @3 (query :Text) -> (candidates :List(SearchResult));
    # Keyword search across names and doc comments in the baked blob.

    buildInfo     @4 () -> (commit :Text, builtAt :Text);
    # The build/commit this schema blob was produced from.
}
```

The interface is additive-only; future methods append at higher ordinals,
matching the convention already established by `SystemInfo` and `Manual`.
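
As a rough illustration of the lookup semantics behind `resolveMethod` and
`listMethods`, the sketch below models the baked blob as an in-memory map from
interface id to method list. The `RegistryIndex` type, its method names, and
the choice of exact, case-sensitive name matching are assumptions made for
illustration; this proposal does not fix a blob format or a matching rule.

```rust
use std::collections::HashMap;

/// Mirrors the `MethodInfo` struct from the schema above (doc comment omitted).
#[derive(Debug, Clone, PartialEq)]
pub struct MethodInfo {
    pub ordinal: u16,
    pub name: String,
    pub param_type_id: u64,
    pub result_type_id: u64,
}

/// Hypothetical in-memory view of the baked blob: interface id -> methods.
#[derive(Default)]
pub struct RegistryIndex {
    interfaces: HashMap<u64, Vec<MethodInfo>>,
}

impl RegistryIndex {
    /// Register an interface; methods are kept sorted by ordinal.
    pub fn insert_interface(&mut self, interface_id: u64, mut methods: Vec<MethodInfo>) {
        methods.sort_by_key(|m| m.ordinal);
        self.interfaces.insert(interface_id, methods);
    }

    /// `listMethods`: every method on the interface, or `None` if the id is unknown.
    pub fn list_methods(&self, interface_id: u64) -> Option<&[MethodInfo]> {
        self.interfaces.get(&interface_id).map(|v| v.as_slice())
    }

    /// `resolveMethod`: exact, case-sensitive match on the schema method name
    /// (an assumption of this sketch).
    pub fn resolve_method(&self, interface_id: u64, name: &str) -> Option<&MethodInfo> {
        self.list_methods(interface_id)?
            .iter()
            .find(|m| m.name == name)
    }
}
```

An unknown interface id and an unknown method name both surface as `None`
here; a real service would likely distinguish the two in its error reporting.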

## Authority Model

This is the most important section of this proposal. The registry embodies
capOS Design Principle 4 -- "the interface IS the permission" -- from a
different angle:

**Discovery does not grant call authority.**

Holding a `SchemaRegistry` capability lets the caller learn the *shape* of an
interface: its method names, ordinals, parameter field names and types, and
doc comments. It does not grant the caller permission to invoke those methods.
To call `Console.writeLine`, the caller still needs a live `Console` capability
in its CapSet. The registry answers "what can this type do in general?" -- the
live capability answers "what can I do right now, and am I permitted to do it?"

This split is not a weakening of the capability model; it is the correct
expression of it. Consider the analogy: knowing that a bank offers a "transfer"
operation does not give you a bank account. Learning that `ProcessSpawner`
exposes a `spawn` method does not give you a `ProcessSpawner`.

Practical consequences:

- `SchemaRegistry` is **read-only** and holds no authority beyond serving
  schema metadata from the build-time blob. It is safe to grant to any process
  or agent that needs dynamic method discovery.
- The **kernel remains the validation trust boundary.** When a call arrives via
  the ring, the kernel dispatches it to the named `CapObject`. The registry is
  a client-side convenience for encoding the request correctly; the kernel
  validates the incoming message on dispatch and rejects malformed calls
  regardless of whether the caller used the registry.
- A caller that uses the registry to build a call message is still subject to
  the normal ring dispatch path. The registry cannot bypass or relax kernel
  validation.
- **Fail-safe.** If the registry is not granted, dynamic clients fall back to
  compile-time bindings or refuse to operate. The registry enhances
  ergonomics; it is not on the critical authority path.
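
The split between discovery and call authority can be modeled in a few lines.
This is a client-side sketch only: `CapSet`, `Registry`, `prepare_call`, and
the slot numbers are hypothetical stand-ins rather than the capOS types, and
the real enforcement point remains the kernel's ring dispatch.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a process CapSet: interface id -> live cap slot.
pub struct CapSet {
    pub live: HashMap<u64, u32>,
}

/// Hypothetical registry view: (interface id, method name) -> ordinal.
pub struct Registry {
    pub ordinals: HashMap<(u64, String), u16>,
}

#[derive(Debug, PartialEq)]
pub enum CallError {
    /// The registry does not know this method name.
    UnknownMethod,
    /// The caller knows the method's shape but holds no live capability.
    NoCapability,
}

impl Registry {
    /// Discovery: learn the ordinal. This consults no CapSet at all.
    pub fn resolve(&self, interface_id: u64, name: &str) -> Option<u16> {
        self.ordinals.get(&(interface_id, name.to_string())).copied()
    }
}

/// Build a (cap slot, ordinal) call descriptor. Knowing the ordinal is not
/// enough: the CapSet lookup is a separate, mandatory step.
pub fn prepare_call(
    reg: &Registry,
    caps: &CapSet,
    interface_id: u64,
    method: &str,
) -> Result<(u32, u16), CallError> {
    let ordinal = reg
        .resolve(interface_id, method)
        .ok_or(CallError::UnknownMethod)?;
    let slot = caps
        .live
        .get(&interface_id)
        .copied()
        .ok_or(CallError::NoCapability)?;
    Ok((slot, ordinal))
}
```

A caller with a populated registry and an empty `CapSet` gets
`Err(CallError::NoCapability)` for every method it can name, which is the
property the bullets above describe.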

## Data Source: Build-Time, Not Live-Object Reflection

The registry does not introspect a running object. It serves the static schema
baked from the `CodeGeneratorRequest` at build time. This has two consequences:

- **No live-object coupling.** The registry knows nothing about which
  capabilities are currently allocated, which processes hold them, or what
  runtime state a live capability has. It knows only what the schema says all
  instances of a given interface can do.
- **Blob freshness is tied to the build.** As with the System Manual blob, the
  schema blob records the commit and build timestamp it was produced from, and
  `buildInfo @4` returns them so a caller can tell which schema version is
  loaded. A running instance with a stale blob reflects that build's schema,
  not any later update.

## Where It Lives

`SchemaRegistry` is a userspace service backed by a boot-packaged schema blob,
consistent with the broader capOS policy of putting metadata and policy
enforcement in userspace while the kernel handles dispatch and isolation. The
implementation mirrors the System Manual service:

- At `make` time, a host tool reads the compiler's `CodeGeneratorRequest`
  output and produces a compact, read-only binary blob.
- The blob is packaged in the boot image alongside the manifest, delivered like
  `BootPackageCap` entries.
- A userspace `schema-registry` service reads the blob from the `BootPackage`
  and implements the `SchemaRegistry` interface. The resulting capability is
  granted to processes that need it via manifest cap grants or the
  `AuthorityBroker` bundle.

The blob shared by `Manual` and `SchemaRegistry` is a build artifact. Whether
they share a single service binary or run as two services consuming the same
blob is an implementation decision; the capability interface is the boundary.

## Primary Use Cases

### Shell `call @cap.method(args)` dispatch

The shell receives a human-typed method invocation. To dispatch it:

1. Inspect the live capability to get its interface id (the `interface_id`
   surface is present today via `capos-lib/src/cap_table.rs`).
2. Call `SchemaRegistry.resolveMethod(interfaceId, methodName)` to get the
   ordinal, parameter type id, and result type id.
3. Use `lookupType(paramTypeId)` to get the parameter struct schema and validate
   or interactively prompt the user for each field.
4. Encode the capnp message with the resolved ordinal and parameter encoding.
5. Submit the call via the ring. On completion, use `lookupType(resultTypeId)`
   to decode the result message for display.

This eliminates the requirement for the shell to carry compile-time knowledge
of every capability interface ordinal and struct layout.
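
Steps 2 and 3 of the list above might look roughly like the following. The
`SchemaRegistry` trait, `plan_call`, `PlannedCall`, and the error type are
hypothetical names for this sketch; it handles only the `method(args)` part of
the shell syntax, splits arguments naively on commas, and leaves type checking,
capnp encoding, and ring submission out of scope.

```rust
/// Result of `resolveMethod`, mirroring the interface above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Resolved {
    pub ordinal: u16,
    pub param_type_id: u64,
    pub result_type_id: u64,
}

/// Hypothetical client-side view of the registry calls the shell needs.
pub trait SchemaRegistry {
    fn resolve_method(&self, interface_id: u64, name: &str) -> Option<Resolved>;
    /// Field names of a struct type in slot order (stands in for `lookupType`).
    fn struct_fields(&self, type_id: u64) -> Option<Vec<String>>;
}

#[derive(Debug, PartialEq)]
pub enum ShellError {
    BadSyntax,
    UnknownMethod(String),
    UnknownType(u64),
    ArgCount { expected: usize, got: usize },
}

/// A call ready for encoding: ordinal plus (field name, raw argument) pairs.
#[derive(Debug, PartialEq)]
pub struct PlannedCall {
    pub ordinal: u16,
    pub args: Vec<(String, String)>,
    pub result_type_id: u64,
}

/// Parse `method(a, b)` into a method name and raw argument strings.
fn parse_invocation(input: &str) -> Result<(&str, Vec<&str>), ShellError> {
    let input = input.trim();
    let open = input.find('(').ok_or(ShellError::BadSyntax)?;
    let body = input[open + 1..]
        .strip_suffix(')')
        .ok_or(ShellError::BadSyntax)?;
    let args: Vec<&str> = body
        .split(',')
        .map(str::trim)
        .filter(|a| !a.is_empty())
        .collect();
    Ok((input[..open].trim(), args))
}

/// Resolve the method, fetch the parameter fields, and pair them with arguments.
pub fn plan_call<R: SchemaRegistry>(
    reg: &R,
    interface_id: u64,
    input: &str,
) -> Result<PlannedCall, ShellError> {
    let (name, raw_args) = parse_invocation(input)?;
    let m = reg
        .resolve_method(interface_id, name)
        .ok_or_else(|| ShellError::UnknownMethod(name.to_string()))?;
    let fields = reg
        .struct_fields(m.param_type_id)
        .ok_or(ShellError::UnknownType(m.param_type_id))?;
    if fields.len() != raw_args.len() {
        return Err(ShellError::ArgCount { expected: fields.len(), got: raw_args.len() });
    }
    let args: Vec<(String, String)> = fields
        .into_iter()
        .zip(raw_args.into_iter().map(String::from))
        .collect();
    Ok(PlannedCall { ordinal: m.ordinal, args, result_type_id: m.result_type_id })
}
```

Keeping the registry behind a trait lets the shell be tested against a fake
registry without a running `schema-registry` service.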

### Dynamic / Late-Bound Clients

A process or agent that receives a capability without compile-time bindings
can call `listMethods(interfaceId)` to enumerate what the interface supports,
then use `resolveMethod` for each call it intends to make. This enables
generic capability explorers, cross-version bridges, and agent-driven
automation that adapts to the interface rather than hardcoding ordinals.

### Cross-Language and Network Tooling

A host-side tool connecting to a running capOS instance via the remote-session
gateway fetches schema metadata via `lookupType` / `listMethods` calls relayed
through the remote session, and uses the result to encode and decode capnp
messages in any language that has a Cap'n Proto implementation. This decouples
the tooling from language-specific generated bindings.

### Schema-Driven Test Harnesses

A test harness can use the registry to enumerate all methods on an interface
and generate exerciser calls with synthetic arguments, validating that the
live capability handles all known methods without panicking -- a form of
schema-conformance fuzzing driven by the registry itself.
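
One way such a harness could derive arguments is to pick a value per field from
its declared `TypeKind`. The sketch below is a deterministic simplification,
not a fuzzer: `Synthetic`, `synthesize`, and the `round` parameter are
hypothetical, only a subset of kinds is covered, and nested structs are left
unfilled because they would need a further `lookupType` call.

```rust
/// Subset of the `TypeKind` enum from the interface section.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TypeKind {
    Bool,
    UInt16,
    UInt64,
    Text,
    Data,
    Struct,
}

/// Mirrors `FieldInfo` (type id and doc comment omitted).
pub struct FieldInfo {
    pub slot: u16,
    pub name: String,
    pub type_kind: TypeKind,
}

/// A synthetic argument value chosen purely from the field's declared kind.
#[derive(Debug, Clone, PartialEq)]
pub enum Synthetic {
    Bool(bool),
    UInt(u64),
    Text(String),
    Data(Vec<u8>),
    /// Nested structs are not expanded in this sketch.
    Unfilled,
}

/// Pick a boundary-style value per kind; `round` varies the choice per pass.
pub fn synthesize(field: &FieldInfo, round: u32) -> Synthetic {
    let odd = round % 2 == 1;
    let len = (round % 4) as usize;
    match field.type_kind {
        TypeKind::Bool => Synthetic::Bool(odd),
        TypeKind::UInt16 => Synthetic::UInt(if odd { u16::MAX as u64 } else { 0 }),
        TypeKind::UInt64 => Synthetic::UInt(if odd { u64::MAX } else { 0 }),
        TypeKind::Text => Synthetic::Text("x".repeat(len)),
        TypeKind::Data => Synthetic::Data(vec![0xAA; len]),
        TypeKind::Struct => Synthetic::Unfilled,
    }
}

/// One synthetic argument set for a parameter struct, ordered by slot.
pub fn synthesize_params(fields: &[FieldInfo], round: u32) -> Vec<(String, Synthetic)> {
    let mut ordered: Vec<&FieldInfo> = fields.iter().collect();
    ordered.sort_by_key(|f| f.slot);
    ordered
        .into_iter()
        .map(|f| (f.name.clone(), synthesize(f, round)))
        .collect()
}
```

The harness still needs a live capability for the interface under test; the
registry supplies only the shapes of the calls.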

## Sequencing and Prerequisites

Two prerequisites are shared with the System Manual Phase 3:

1. **Doc-comment authoring in `schema/capos.capnp`.** The schema currently
   carries minimal doc comments. The `docComment` fields in the registry's
   schema nodes will be empty until this authoring work lands. The registry
   is still useful without doc comments -- method names, ordinals, and struct
   layouts are fully present -- but the schema-as-documentation story depends
   on this work.
2. **Doc-comment preservation in the generated-bindings pipeline.** The
   `tools/capnp-build` script currently strips doc comment text from the
   emitted Rust bindings. The registry's blob builder must read the raw
   `CodeGeneratorRequest` before that stripping occurs, so this prerequisite
   is about pipeline ordering, not a new tool.

The registry interface and blob format can be designed, and the boot-packaging
infrastructure written, before those prerequisites land; the `docComment` fields
start empty and are populated once both prerequisites land.

## Relationship to Existing Proposals

- **System Manual** ([system-manual-proposal.md](system-manual-proposal.md)):
  the human-readable twin. Both share the `CodeGeneratorRequest` blob source;
  neither is a prerequisite of the other. They can be built in either order
  or together.
- **SystemInfo proposal** ([system-info-proposal.md](system-info-proposal.md)):
  SystemInfo provides scalar system facts; `SchemaRegistry` provides interface
  metadata. No overlap.
- **Interactive command surfaces** ([interactive-command-surface-proposal.md](interactive-command-surface-proposal.md)):
  a future typed `CommandSession` may use the registry to validate command
  arguments before dispatch.
- **Remote-session UI** ([remote-session-capset-client-proposal.md](remote-session-capset-client-proposal.md)):
  host-side tooling that relays capability calls through the remote session
  is a primary consumer of the registry's cross-language tooling use case.

## Open Questions

- **Blob sharing or dual instantiation?** The System Manual and Schema Registry
  share a blob source. Whether they are implemented as one service that exposes
  two capability interfaces or as two separate services that each read the blob
  at startup is an implementation choice. One service exposing both interfaces
  is the likely outcome; this should be decided when the first implementation
  starts.
- **Schema node format evolution.** As `schema/capos.capnp` evolves, the blob
  format must evolve with it. Whether the blob is a verbatim `CodeGeneratorRequest`
  wire encoding, a normalized subset, or a purpose-built indexed structure is a
  build-tool design question.
- **Search index.** The `search` method needs a keyword index built into the
  blob at `make` time rather than a linear scan. The index strategy (inverted
  index over name tokens and doc comment words) should be decided when the
  blob builder is implemented.
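
To make the inverted-index option concrete, here is a minimal in-memory sketch.
The `SearchIndex` type, the camelCase-splitting tokenizer, and the AND
semantics for multi-word queries are all assumptions of this example; the
on-blob layout and ranking are left open, as stated above.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Split a name or doc comment into lowercase tokens. camelCase names are split
/// at lowercase-to-uppercase boundaries, so `resolveMethod` yields
/// `resolve` and `method`.
pub fn tokenize(text: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut prev_lower = false;
    for ch in text.chars() {
        let boundary = !ch.is_alphanumeric() || (ch.is_uppercase() && prev_lower);
        if boundary && !current.is_empty() {
            tokens.push(std::mem::take(&mut current));
        }
        if ch.is_alphanumeric() {
            current.extend(ch.to_lowercase());
        }
        prev_lower = ch.is_lowercase();
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

/// Hypothetical index shape: token -> sorted set of type ids mentioning it.
#[derive(Default)]
pub struct SearchIndex {
    postings: BTreeMap<String, BTreeSet<u64>>,
}

impl SearchIndex {
    /// Index one schema node's display name and doc comment (a build-time step).
    pub fn add(&mut self, type_id: u64, display_name: &str, doc: &str) {
        for token in tokenize(display_name).into_iter().chain(tokenize(doc)) {
            self.postings.entry(token).or_default().insert(type_id);
        }
    }

    /// Type ids whose name or doc comment contains every query token.
    pub fn search(&self, query: &str) -> Vec<u64> {
        let mut result: Option<BTreeSet<u64>> = None;
        for token in tokenize(query) {
            let hits = self.postings.get(&token).cloned().unwrap_or_default();
            result = Some(match result {
                None => hits,
                Some(acc) => acc.intersection(&hits).copied().collect(),
            });
        }
        result.unwrap_or_default().into_iter().collect()
    }
}
```

Ordered maps keep the postings deterministic across builds, which matters if
the index is serialized into a blob whose contents should be reproducible.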

## Design Grounding

- Cap'n Proto reflection model and `CodeGeneratorRequest` wire format: `capnp`
  crate documentation and the capnp language reference.
- Interface id and `interface_id()` surface: `capos-lib/src/cap_table.rs`.
- Boot-packaged blob delivery pattern: `kernel/src/cap/boot_package.rs`.
- Shared substrate with the System Manual: [system-manual-proposal.md](system-manual-proposal.md),
  particularly the "schemaReflection source" section.
- Authority model grounding: `docs/capability-model.md` and Design Principle 4
  in `CLAUDE.md`.