# Proposal: Debug and Trace Authority

How capOS should expose process-attach, capability-table inspection,
ring-trace capture, and sampler/profiler authority to debuggers and
maintenance tools without granting kernel privilege, ambient inspection
rights, or a covert channel for authority transfer.


## Problem

A capability OS whose security claim is "you can only access what you
were explicitly granted" breaks silently if a debugger can attach to
any process without authority. Unix `ptrace` is the canonical example:
any process with sufficient Unix privilege can stop, inspect, and
modify another process's address space and register state, bypassing
all higher-level access controls. capOS must not import that model.

At the same time, debugging real failures requires more than serial
output. The existing `debug_tap` facility (`kernel/src/debug_tap.rs`)
emits bounded SQE/CQE records to the emergency serial path, and only in
QEMU-only builds. It has no userspace-facing capability, no
consent protocol, no audit trail, and no scoping to a specific target.
The `measure` feature adds benchmark-only TSC counters, also
build-gated and operator-facing only. There is currently no
capability-shaped debug/trace/profile surface at all.

This is a capability-model gap. Until it is filled, the only
debugging tools are serial output and offline log inspection. Those are
useful for early kernel work, but insufficient once real service
decomposition and cross-process interactions exist.


## User Stories

- An operator maintenance session needs to inspect which capabilities a
  stuck service holds, without being able to invoke any of them.
- A developer investigating a failing smoke test wants a bounded record
  of the SQEs and CQEs the target process issued around the failure,
  decoded against the current schema.
- A profiler tool needs sampled PC/stack snapshots of a running service
  at a configured frequency without stopping the service or holding a
  live breakpoint.
- An agent-shell maintenance workflow needs to attach to a service
  granted to it by the authority broker, with that attach action
  recorded in the audit log.
- A supervisor needs to assert that a debugged process cannot escalate
  its authority into other processes by virtue of being debugged.


## Design Principles

1. **Attach is authority.** Connecting a debug session to a process
   requires an explicit `DebugSession` capability. No ambient ptrace
   analog. The kernel does not hand out debug access on the basis of
   Unix UID or any implicit privilege.
2. **Consent is required.** A `DebugSession` for a live process is
   obtained either by explicit owner consent (the process or its
   supervisor grants one), or through a broker-mediated maintenance
   session policy decision. Neither path is self-minted.
3. **Attach is audited.** Every `DebugSession` creation and every
   inspection operation through it is an auditable event. The target
   process and the audit log both observe it.
4. **Snapshots are read-only.** Cap-table and VM inspection through a
   debug session produce read-only snapshots. No capability in the
   snapshot is transferable to or activatable by the inspector. A
   debug session must not become a covert authority-transfer channel.
   The GDB-RSP prior art is a reminder that a *full* debugger is
   read/write authority over its target; in this design the read-only
   snapshot/trace surface (Phases 1-3) and any future read/write control
   (breakpoints, register writes, Phase 4) are distinct authorities.
   Write authority is a separately leased, stronger cap and never rides
   implicitly on the read-only `DebugSession`.
5. **Secrets and payload bytes are redacted by default.** Cap-table
   snapshots expose names, interface IDs, and slot indices — not raw
   capability payloads, bearer tokens, or memory-mapped buffer
   contents. Payload capture requires a separately leased and stronger
   cap.
6. **A debugged process cannot escalate.** A process being debugged
   must not thereby gain the ability to inspect or affect other
   processes. The debug session is scoped to one target; no
   cross-process read or call is admitted through it.
7. **Symbol resolution is bounded.** Resolving a PC address to a
   symbol name requires access to a symbol table file or binary, not
   filesystem authority. Symbol resolution is a separate, explicitly
   scoped cap — not bundled into the basic debug session.
8. **Build gates are graduated.** The `debug_tap` kernel facility stays
   behind `cfg(feature = "debug_tap")` for its current always-emit
   emergency-serial behavior. The userspace-facing `DebugSession` and
   `RingTrace` caps are not build-gated but are absent from production
   bootstrap CapSets; a broker may mint them only under an explicitly
   authorized maintenance session policy.


## Authority and Consent

A `DebugSession` is created through one of two paths:

**Owner consent.** The target process's supervisor or owner holds a
`ProcessHandle` and can call a `createDebugSession` method on it to
mint a `DebugSession` for the target. This is the normal developer
workflow: the supervisor that spawned a service grants a debug session
to a maintenance tool.

**Broker-mediated maintenance session.** The authority broker holds a
restricted ability to mint `DebugSession` caps for processes within a
maintenance session scope — for example, for an operator who has
authenticated and whose session policy permits debugging named
services. The broker records the grant as an audit event. Normal
shells and user sessions do not receive this authority.

Neither path is self-minted. A process cannot mint a `DebugSession`
for itself or for peers from ambient state. The kernel does not expose
a `DebugAll` cap at bootstrap.

Attaching a `DebugSession` produces an audit record covering: the
initiator session, the target pid and service name, the authority
source (owner consent or broker grant), and the timestamp. The target
process receives a notification at attach time if it has an active
ring — not as a blocking gate, but as an observable event.
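
The attach record described above can be sketched in Rust as follows. This
is a minimal illustration, not the kernel's implementation: `AuditLog`,
`AttachAuditRecord`, `AuthoritySource`, and `attach` are hypothetical names
chosen for this sketch, and the in-memory `Vec` stands in for whatever the
real audit sink turns out to be. The point it shows is ordering: the record
is appended before the session value exists.

```rust
/// Which of the two admitted paths produced the session.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuthoritySource {
    OwnerConsent,
    BrokerGrant,
}

/// The fields the text above lists for an attach audit record.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AttachAuditRecord {
    pub initiator_session: u64,
    pub target_pid: u32,
    pub service_name: String,
    pub source: AuthoritySource,
    pub tick: u64,
}

/// Hypothetical append-only stand-in for the audit log.
#[derive(Debug, Default)]
pub struct AuditLog {
    records: Vec<AttachAuditRecord>,
}

impl AuditLog {
    pub fn append(&mut self, record: AttachAuditRecord) {
        self.records.push(record);
    }

    pub fn records(&self) -> &[AttachAuditRecord] {
        &self.records
    }
}

/// A session scoped to exactly one target pid.
#[derive(Debug)]
pub struct DebugSession {
    target_pid: u32,
}

impl DebugSession {
    pub fn target_pid(&self) -> u32 {
        self.target_pid
    }
}

/// Mint a session. The audit record is appended before the session is
/// constructed, so a caller never holds a session the log has not seen.
pub fn attach(log: &mut AuditLog, record: AttachAuditRecord) -> DebugSession {
    let target_pid = record.target_pid;
    log.append(record);
    DebugSession { target_pid }
}
```

Both minting paths would funnel through the same function and differ only in
the `source` field, which keeps the audit shape identical for owner consent
and broker grants.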


## Proposed Interfaces

These are conceptual interfaces. They should not be added to
`schema/capos.capnp` until a Phase 1 implementation slice needs them.

```capnp
# Read-only snapshot of one capability slot in the target's cap table.
# Does not transfer or activate any authority.
struct CapSlotSnapshot {
  slotIndex   @0 :UInt32;
  interfaceId @1 :UInt64;   # capnp type ID; 0 if untyped or unknown
  methodCount @2 :UInt16;
  label       @3 :Text;     # kernel-assigned or schema-derived name
  state       @4 :Text;     # e.g. "live", "released", "pending-return"
}

# Read-only snapshot of the target's capability table.
# None of these slots are transferable to or callable by the inspector.
struct CapTableSnapshot {
  targetPid    @0 :UInt32;
  tick         @1 :UInt64;
  slots        @2 :List(CapSlotSnapshot);
  slotTotal    @3 :UInt32;
  slotUsed     @4 :UInt32;
  snapshotDrop @5 :UInt32;  # slots omitted due to budget/redaction
}

# A scoped debug session attached to one process.
interface DebugSession {
  # Read-only snapshot of the target's current capability table.
  capTableSnapshot @0 () -> (snapshot :CapTableSnapshot);

  # Arm a bounded ring-trace capture on the target.
  # Returns a RingTrace cap scoped to this session and target.
  armRingTrace @1 (maxRecords :UInt32, maxBytes :UInt32)
      -> (trace :RingTrace);

  # Arm a bounded sampler on the target.
  # Returns a Sampler cap that yields PC/stack samples at the configured
  # interval without stopping the target.
  armSampler @2 (intervalNs :UInt32, maxSamples :UInt32)
      -> (sampler :Sampler);

  # Detach. Further calls on this session are rejected.
  detach @3 () -> ();
}

# Bounded ring-trace cap, scoped to one DebugSession target.
interface RingTrace {
  # Drain buffered SQE/CQE records for the attached target.
  drain @0 (maxRecords :UInt32)
      -> (records :List(TraceRecord), complete :Bool, dropped :UInt64);

  # Disarm and release the capture buffer.
  release @1 () -> ();
}

# Sampler cap for sampled PC/stack snapshots.
interface Sampler {
  # Read the next available sample batch.
  read @0 (maxSamples :UInt32)
      -> (samples :List(SamplerRecord), dropped :UInt64);

  # Stop sampling and release the reservation.
  stop @1 () -> ();
}

struct SamplerRecord {
  tick         @0 :UInt64;
  pid          @1 :UInt32;
  pc           @2 :UInt64;
  # Shallow inline frames; bounded to avoid variable-length allocation
  # on the capture hot path.
  frames       @3 :List(UInt64);
  framesDrop   @4 :UInt8;  # frames omitted due to depth cap
}
```

`TraceRecord` is the same shape defined in
`docs/proposals/system-monitoring-proposal.md`: tick, pid, opcode,
cap_id, method_id, interface_id, result, flags, and an optional
payload blob gated by a separately leased stronger cap.


## Symbol and Source Boundary

Resolving a sampled PC address or a ring-trace cap_id to a human-readable
symbol requires access to symbol tables and debug info, not filesystem
authority. The design uses an explicit, scoped symbol-resolver cap:

- A `SymbolTable` cap holds a read-only ELF DWARF/symbol section for
  one binary, loaded from a trusted source (boot package or signed
  artifact store).
- The inspector passes a `SymbolTable` cap and a list of addresses; the
  resolver returns bounded name strings.
- No arbitrary filesystem path traversal is admitted through this path.
- `SymbolTable` is separately minted from `DebugSession`; holding a
  debug session does not imply symbol resolution authority, and holding
  a symbol table does not imply attach authority.

Symbol resolution is Phase 3+ work. Phase 1 produces raw addresses;
offline host-side tools (e.g., `addr2line` on the kernel ELF) handle
symbol lookup during the research phase.
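
As an illustration of the "addresses in, bounded name strings out" contract,
the following Rust sketch resolves addresses against a sorted symbol list.
It is an assumption-laden model: the `Symbol` layout, the `max_name_len`
parameter, and the `resolve_all` name are invented here, and a real
`SymbolTable` cap would be backed by ELF symbol or DWARF data rather than a
hand-built `Vec`.

```rust
/// One symbol range; a hypothetical flattened view of an ELF symbol entry.
#[derive(Debug, Clone)]
pub struct Symbol {
    pub start: u64,
    pub size: u64,
    pub name: String,
}

/// Read-only symbol table for one binary, kept sorted by start address.
pub struct SymbolTable {
    symbols: Vec<Symbol>,
}

impl SymbolTable {
    pub fn new(mut symbols: Vec<Symbol>) -> Self {
        symbols.sort_by_key(|s| s.start);
        Self { symbols }
    }

    /// Resolve one address to a name truncated to `max_name_len` characters.
    /// Returns `None` when no symbol range covers the address.
    pub fn resolve(&self, addr: u64, max_name_len: usize) -> Option<String> {
        // Number of symbols whose start is <= addr; the candidate is the last one.
        let idx = self.symbols.partition_point(|s| s.start <= addr);
        let sym = self.symbols.get(idx.checked_sub(1)?)?;
        if addr - sym.start >= sym.size {
            return None;
        }
        Some(sym.name.chars().take(max_name_len).collect())
    }

    /// Resolve a batch of addresses; the output has one entry per input.
    pub fn resolve_all(&self, addrs: &[u64], max_name_len: usize) -> Vec<Option<String>> {
        addrs.iter().map(|&a| self.resolve(a, max_name_len)).collect()
    }
}
```

The resolver takes no path argument at all, which is one way to make the
"no filesystem traversal" rule structural rather than a runtime check.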


## Phasing

### Phase 1 — DebugSession Attach and Cap-Table Snapshot (model-critical)

- Define `DebugSession`, `CapSlotSnapshot`, and `CapTableSnapshot` in
  `schema/capos.capnp`.
- Implement `ProcessHandle.createDebugSession` in the kernel, guarded by
  the existing `ProcessHandle` authority boundary. capOS uses
  process-level debug authority here because most current services are
  single-threaded; the seL4 per-TCB-cap prior art argues for deriving
  per-thread sessions from `ThreadControl`, the intended finer-grained
  follow-up once multi-threaded targets need it.
- `capTableSnapshot` returns a bounded, redacted read-only snapshot of
  the target's current cap table. No cap in the snapshot is
  transferable or callable.
- Audit record emitted to `AuditLog` at attach and at each snapshot
  call.
- No payload capture, no ring trace, no sampler in this phase.
- Proof: a smoke test where a supervisor attaches a debug session to a
  child, calls `capTableSnapshot`, and verifies the snapshot fields
  against what the child was granted at spawn time. The audit log must
  contain the attach record.

### Phase 2 — Ring Trace via DebugSession

- Add `armRingTrace` and `RingTrace` to the schema and kernel.
- Build on the existing `debug_tap` ring-capture record format
  (`RingCaptureRecord` in `capos_config::ring`), but route capture
  through the `DebugSession` authority rather than the always-emit
  emergency-serial path.
- The `RingTrace` cap is scoped to the attached target; it cannot
  observe other processes.
- Payload capture (`includePayloadBytes`) requires a separately
  presented stronger cap (not yet defined in Phase 2).
- Disarming the `RingTrace` releases the capture buffer and emits an
  audit record.
- Proof: extend the failing-call smoke from the Ring-as-Black-Box
  milestone (commit `da5f5e9`) to route capture through a `DebugSession`
  instead of the emergency serial path, and verify the drained records
  match the expected SQE/CQE sequence.
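
The drain contract (`records`, `complete`, `dropped`) can be modeled with a
small bounded buffer. This Rust sketch is illustrative only: the
`RingTraceBuffer` type, the reduced `TraceRecord` field set, the
drop-newest overflow policy, and the choice to report `dropped` as "since
the previous drain" are all assumptions, since the proposal does not fix
them.

```rust
use std::collections::VecDeque;

/// Reduced stand-in for the shared TraceRecord shape.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TraceRecord {
    pub tick: u64,
    pub pid: u32,
    pub opcode: u8,
    pub result: i32,
}

/// Bounded capture buffer scoped to one target pid.
pub struct RingTraceBuffer {
    target_pid: u32,
    max_records: usize,
    records: VecDeque<TraceRecord>,
    dropped: u64,
}

impl RingTraceBuffer {
    pub fn new(target_pid: u32, max_records: usize) -> Self {
        Self {
            target_pid,
            max_records,
            records: VecDeque::new(),
            dropped: 0,
        }
    }

    /// Capture side. Records for any other pid are ignored outright;
    /// records that arrive while the buffer is full are counted as dropped.
    pub fn capture(&mut self, record: TraceRecord) {
        if record.pid != self.target_pid {
            return;
        }
        if self.records.len() >= self.max_records {
            self.dropped += 1;
            return;
        }
        self.records.push_back(record);
    }

    /// Drain up to `max` records in capture order.
    /// Returns (records, complete, dropped-since-last-drain).
    pub fn drain(&mut self, max: usize) -> (Vec<TraceRecord>, bool, u64) {
        let n = max.min(self.records.len());
        let out: Vec<TraceRecord> = self.records.drain(..n).collect();
        let complete = self.records.is_empty();
        let dropped = std::mem::take(&mut self.dropped);
        (out, complete, dropped)
    }
}
```

A smoke test built on this shape would compare the drained sequence against
the expected SQE/CQE order and treat a non-zero `dropped` as a signal that
the capture budget was too small for the scenario.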

### Phase 3 — Sampler Authority

- Add `armSampler` and `Sampler` to the schema and kernel.
- The sampler fires at a configured interval, captures the PC and a
  bounded set of inline call frames, and buffers records for drain.
- The target process is not stopped; sampler overhead is bounded by
  sample interval and buffer depth.
- Relates to the System Performance Benchmarks proposal: a benchmark
  runner may arm a sampler before a workload and drain it after to
  produce a flamegraph, subject to the same audit and consent rules.
- Symbol resolution is offline in this phase (host-side `addr2line`).
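
One way to keep the capture path free of variable-length allocation is a
fixed-size frame array with an explicit count. The Rust sketch below assumes
a depth cap of 8 (`MAX_INLINE_FRAMES` is a made-up constant) and a
`make_sample` helper that receives an already-walked frame list; neither is
specified by this proposal.

```rust
/// Hypothetical depth cap for inline frames.
pub const MAX_INLINE_FRAMES: usize = 8;

/// In-kernel shape of one sample; fixed size, so no allocation on capture.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SamplerRecord {
    pub tick: u64,
    pub pid: u32,
    pub pc: u64,
    pub inline_frames: [u64; MAX_INLINE_FRAMES],
    pub frame_count: u8,
    /// Frames omitted due to the depth cap, saturating at 255.
    pub frames_drop: u8,
}

impl SamplerRecord {
    /// The frames that were actually captured.
    pub fn frames(&self) -> &[u64] {
        &self.inline_frames[..self.frame_count as usize]
    }
}

/// Build a sample from a walked call stack, keeping only the innermost frames.
pub fn make_sample(tick: u64, pid: u32, pc: u64, walked: &[u64]) -> SamplerRecord {
    let kept = walked.len().min(MAX_INLINE_FRAMES);
    let omitted = walked.len() - kept;
    let mut inline_frames = [0u64; MAX_INLINE_FRAMES];
    inline_frames[..kept].copy_from_slice(&walked[..kept]);
    SamplerRecord {
        tick,
        pid,
        pc,
        inline_frames,
        frame_count: kept as u8,
        frames_drop: omitted.min(u8::MAX as usize) as u8,
    }
}
```

The saturating `frames_drop` mirrors the `UInt8` width of `framesDrop` in
the conceptual schema: a very deep stack reports 255 rather than wrapping.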

### Phase 4 — Breakpoint, Single-Step, and Payload Capture (deferred)

Breakpoint and single-step authority has a much larger kernel surface
than read-only snapshot and sampling. Payload capture risks exposing
secrets. Both are deferred until the Phase 1–3 model is stable and
the audit/consent infrastructure is proven.

When payload capture is added, it must:
- require a separately leased `PayloadCapture` cap distinct from the
  base `RingTrace` cap;
- be a separately audited grant;
- carry a per-call byte budget enforced by the kernel.
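
A per-call byte budget could look roughly like the Rust sketch below. It is
speculative, since this phase is deferred: the `PayloadCapture` struct, its
`per_call_budget` field, and the `record_payload` helper are hypothetical,
and the sketch only shows the two behaviors the list above implies, namely
full redaction without the cap and truncation with it.

```rust
/// Hypothetical stand-in for the separately leased payload-capture cap.
pub struct PayloadCapture {
    per_call_budget: usize,
}

impl PayloadCapture {
    pub fn new(per_call_budget: usize) -> Self {
        Self { per_call_budget }
    }
}

/// Decide which payload bytes reach a trace record.
/// Returns (bytes exposed, bytes withheld).
/// Without the cap, every byte is withheld; with it, at most
/// `per_call_budget` bytes are exposed.
pub fn record_payload<'a>(
    cap: Option<&PayloadCapture>,
    payload: &'a [u8],
) -> (&'a [u8], usize) {
    let budget = match cap {
        Some(c) => c.per_call_budget,
        None => 0,
    };
    let kept = payload.len().min(budget);
    (&payload[..kept], payload.len() - kept)
}
```

Reporting the withheld byte count alongside the exposed slice lets an
inspector see that truncation happened without learning the withheld
content.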


## Hazard Preflight

**paging/MMIO**: cap-table snapshots and ring-trace records read kernel
state under existing locks. No new user-mapping or MMIO surface is
introduced in Phase 1–3.

**ABI**: `DebugSession` and `RingTrace` are new schema interfaces, and
`CapTableSnapshot` is a new schema struct. Generated bindings must be
refreshed via `make generated-code-check` before merging any Phase 1 branch.

**authority transfer via snapshot**: the critical invariant is that no
`CapSlotSnapshot` entry can be used by the inspector to call or transfer
a capability. The kernel must enforce that the snapshot data path does
not return live cap references — only metadata fields (interface ID,
label, state). This must be verified in the Phase 1 implementation review.
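
The invariant can be made concrete with a Rust model of snapshot
construction. Everything here is a sketch under assumptions:
`KernelCapEntry` and its `object_ref` field stand in for whatever the real
cap table stores, `max_slots` is a hypothetical budget parameter, and the
snapshot structs simply mirror the conceptual schema. The property to
notice is that the output types have no field that could carry a live
reference.

```rust
#[derive(Debug, Clone, Copy)]
pub enum SlotState {
    Live,
    Released,
    PendingReturn,
}

impl SlotState {
    fn as_str(self) -> &'static str {
        match self {
            SlotState::Live => "live",
            SlotState::Released => "released",
            SlotState::PendingReturn => "pending-return",
        }
    }
}

/// Stand-in for a kernel cap-table entry. `object_ref` models the live
/// reference that must never leave the kernel.
pub struct KernelCapEntry {
    pub interface_id: u64,
    pub method_count: u16,
    pub label: String,
    pub state: SlotState,
    pub object_ref: usize,
}

/// Metadata only: there is deliberately no reference field here.
#[derive(Debug)]
pub struct CapSlotSnapshot {
    pub slot_index: u32,
    pub interface_id: u64,
    pub method_count: u16,
    pub label: String,
    pub state: &'static str,
}

#[derive(Debug)]
pub struct CapTableSnapshot {
    pub target_pid: u32,
    pub tick: u64,
    pub slots: Vec<CapSlotSnapshot>,
    pub slot_total: u32,
    pub slot_used: u32,
    pub snapshot_drop: u32,
}

/// Build the redacted view. Occupied slots beyond `max_slots` are counted
/// in `snapshot_drop` instead of being returned.
pub fn snapshot(
    target_pid: u32,
    tick: u64,
    table: &[Option<KernelCapEntry>],
    max_slots: usize,
) -> CapTableSnapshot {
    let mut slots = Vec::new();
    let mut slot_used = 0u32;
    let mut snapshot_drop = 0u32;
    for (index, entry) in table.iter().enumerate() {
        if let Some(e) = entry {
            slot_used += 1;
            if slots.len() >= max_slots {
                snapshot_drop += 1;
                continue;
            }
            slots.push(CapSlotSnapshot {
                slot_index: index as u32,
                interface_id: e.interface_id,
                method_count: e.method_count,
                label: e.label.clone(),
                state: e.state.as_str(),
            });
        }
    }
    CapTableSnapshot {
        target_pid,
        tick,
        slots,
        slot_total: table.len() as u32,
        slot_used,
        snapshot_drop,
    }
}
```

An implementation review can then check the type definitions rather than
every call site: if `CapSlotSnapshot` cannot hold a reference, no code path
that returns one can leak it.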

**audit bypass**: an inspector must not be able to suppress or delay
audit records for its own actions. Audit writes must occur synchronously
within the debug session dispatch path, not deferred.

**covert timing channel**: a sampler that returns precise timestamps
could be used to extract timing side-channel information about a target
service. The sampler tick field is clamped to PIT-resolution granularity
in Phase 3 to reduce precision; finer clock access for profiling remains
deferred.
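
Clamping amounts to rounding each timestamp down to a coarse boundary. The
Rust sketch below assumes nanosecond ticks and uses a 1 ms granularity
purely as an example value; the actual PIT-derived granularity is not
stated in this proposal, and `clamp_tick` is a hypothetical helper name.

```rust
/// Round a timestamp down to a multiple of `granularity_ns`.
/// A granularity of 0 is treated as 1, which leaves the tick unchanged.
pub fn clamp_tick(tick_ns: u64, granularity_ns: u64) -> u64 {
    let g = granularity_ns.max(1);
    tick_ns - (tick_ns % g)
}
```

With a 1 ms granularity, two samples taken 300 µs apart inside the same
millisecond report the same tick, so the inspector cannot measure
sub-millisecond differences in the target's timing from the tick field
alone.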


## Security Boundaries

- A `DebugSession` holder can read snapshots of one target. It cannot
  call, transfer, or activate any capability belonging to the target.
- A `RingTrace` holder can read ring metadata for one target. Payload
  bytes require a separate stronger cap.
- A `Sampler` holder receives PC and bounded stack frames for one
  target. No memory-mapped content, no register state beyond PC.
- None of these caps admit cross-process inspection. A `DebugSession`
  for process A cannot observe process B.
- A debugged process remains subject to normal scheduler and capability
  enforcement. Being debugged does not grant the target any additional
  capability slots or authority.
- Redaction applies at snapshot construction time, not at read time.
  The kernel constructs the redacted view; the inspector never sees
  the raw kernel state.


## Non-Goals

- No ambient `ptrace`-style process attach without authority.
- No kernel debugger (GDB stub, JTAG) exposed as a userspace capability
  surface — those are operator boot-time tools, not capability-model
  components.
- No replay semantics. Ring trace is inspection, not record/replay.
  Replay requires payload retention, timer modeling, and capability
  checkpoints; that is out of scope.
- No cross-process or system-wide trace aggregation in this proposal.
  Aggregate trace is a monitoring concern covered by
  `docs/proposals/system-monitoring-proposal.md`.
- No memory read/write through a debug session. Address-space
  inspection is a separate and stronger authority not proposed here.
- No `DebugSession` self-grant. A process cannot debug itself through
  this interface.
- No crash/exception observation here. A read-only `ExceptionObserver`
  cap (the Zircon `task_create_exception_channel` analog) for receiving
  crash notifications without debug-write authority is a separate, weaker
  authority owned by
  [crash-recovery-supervision-proposal.md](crash-recovery-supervision-proposal.md),
  not bundled into `DebugSession`.


## Relevant Research and Prior Art

### In-Tree Notes

- [`docs/research/debug-trace-authority.md`](../research/debug-trace-authority.md)
  is the dedicated prior-art survey for this proposal: GDB remote serial
  protocol, Linux `ptrace`/Yama, `perf`/`CAP_PERFMON`, Fuchsia handle-scoped
  `debug_agent`/`zxdb`, seL4 TCB-cap hardware debug, and Genode CPU-session GDB
  monitor, grounding the `DebugSession`/`Sampler`/exception-observer authority
  split against real sources.
- `docs/research/zircon.md` documents Fuchsia's handle model: handles
  are process-local references with a rights bitmask, there is no
  ambient authority, and a process can only interact with kernel
  objects through handles it holds. capOS draws the directly applicable
  lesson here — a `DebugSession` is a held capability, not an ambient
  privilege, and inspection of a target's cap table is itself a
  distinct grantable authority rather than a side effect of holding a
  generic "debug" right. The note covers handle rights and transfer but
  not Fuchsia's `debug_agent`/`zxdb` debugging service specifically; that
  service is now surveyed in the dedicated research note above (and summarized
  below).
- `docs/research/sel4.md` records that seL4 has no in-kernel debug traps
  or thread-introspection mechanism in the verified configuration;
  debugging is pushed to userspace and the *design constraints* (typed
  authority, no ambient inspection) matter more than any debugger
  feature. capOS follows the same posture: keep the kernel surface to
  read-only snapshot and bounded capture, and route policy (who may
  attach, to what) to userspace consent and the broker.
- `docs/research/genode.md` documents Genode's session-and-label model,
  where every cross-component request carries a label and is mediated by
  a parent component. The applicable lesson is that attach authority
  should flow through the same parent/supervisor relationship that
  already governs spawning — a supervisor that holds a child's
  `ProcessHandle` is the natural minter of a `DebugSession` for that
  child, mirroring Genode's parent-mediated session routing rather than
  a global debugger service.
- `docs/research/completion-ring-threading.md` grounds the io_uring-style
  SQ/CQ ring transport that the Phase 2 ring trace observes. The trace
  records the same SQE/CQE structures already captured by the kernel
  `debug_tap` facility (`RingCaptureRecord` in `capos_config::ring`);
  this proposal adds the authority and consent layer that the existing
  build-gated emergency-serial capture lacks.

### External Precedent

- **GDB remote serial protocol (`gdbserver`).** GDB separates the
  debugger front-end from a target-side stub that exposes register,
  memory, and breakpoint operations over a serial/TCP channel. The
  lesson for capOS is that the inspection surface can be a narrow,
  well-defined protocol object rather than ambient access — but full
  register/memory read-write is exactly the strong authority capOS
  defers to Phase 4 and keeps out of the read-only `DebugSession`.
- **Linux `ptrace(2)`.** `ptrace` is the canonical ambient-authority
  footgun: attach authority derives from Unix UID and the
  Yama `ptrace_scope` sysctl rather than from a held, transferable
  capability, and a successful attach grants register and full
  address-space read/write at once. This conflates "may observe" with
  "may control" and bypasses higher-level access controls. capOS
  rejects this directly — `DebugSession` attach is owner-consented or
  broker-granted, audited, and read-only; observation and control are
  separate authorities.
- **Linux `perf` and eBPF tracing.** Sampled profiling and tracing on
  Linux sit behind privilege boundaries (`perf_event_paranoid`,
  `CAP_PERFMON`/`CAP_BPF`) precisely because PC/stack sampling and
  kernel-wide tracing leak timing and topology information across
  trust boundaries. capOS treats the same risk as a capability and an
  audit event: the `Sampler` cap is scoped to one consented target, its
  timestamp resolution is clamped, and arming it is recorded.
- **Fuchsia `debug_agent` / `zxdb`.** Fuchsia's debugger is a userspace
  service (`debug_agent`) that the `zxdb` front-end drives; it operates
  on process and thread *handles* rather than ambient privilege,
  consistent with Zircon's object-capability model. This is the closest
  external precedent for capOS's intended shape — debugging as a
  handle/capability-mediated service, not a kernel-ambient right. The
  `debug_agent` design is surveyed in
  `docs/research/debug-trace-authority.md`; that survey should be revisited
  before the Phase 4 breakpoint/single-step surface is designed.
- **Object-capability systems generally.** Capability systems avoid an
  ambient `ptrace` analog because there is no global principal that
  implicitly dominates other processes; the authority to inspect must be
  granted like any other capability. This is the structural reason
  capOS can offer debugging without reintroducing ambient authority,
  and why the consent and audit requirements in this proposal are
  load-bearing rather than optional hardening.


## Relevant Proposals

- **System Monitoring** (`system-monitoring-proposal.md`): owns
  aggregate ring traces (`TraceCapture`), log/metric/audit signal
  taxonomy, and the `RingTap` move-semantics note for payload-capturing
  taps. This proposal owns the per-process *debug attach* authority
  and consent model that monitoring's trace surfaces do not cover.
  The `TraceRecord` schema is shared; the authority and consent model is
  separate.
- **Security and Verification** (`security-and-verification-proposal.md`):
  the trust-boundary inventory (Track S.7) must be updated to include
  `DebugSession`, `RingTrace`, `Sampler`, and `CapTableSnapshot` as
  new boundaries before downstream services rely on them.
- **System Performance Benchmarks**
  (`system-performance-benchmarks-proposal.md`): benchmark runners may
  arm a `Sampler` before a workload run; this proposal defines the
  authority and consent model for that use.
- **Task State and Agent Telemetry**
  (`task-state-and-agent-telemetry-proposal.md`): agent maintenance
  sessions may use `DebugSession` to inspect service state; telemetry
  records that fact.
