Proposal: Debug and Trace Authority

How capOS should expose process-attach, capability-table inspection, ring-trace capture, and sampler/profiler authority to debuggers and maintenance tools without granting kernel privilege, ambient inspection rights, or a covert channel for authority transfer.

Problem

A capability OS whose security claim is “you can only access what you were explicitly granted” breaks silently if a debugger can attach to any process without authority. Unix ptrace is the canonical example: any process with sufficient Unix privilege can stop, inspect, and modify another process’s address space and register state, bypassing all higher-level access controls. capOS must not import that model.

At the same time, debugging real failures requires more than serial output. The existing debug_tap facility (kernel/src/debug_tap.rs) emits bounded SQE/CQE records to the emergency serial path, but only in QEMU-only builds, and it has no userspace-facing capability, no consent protocol, no audit trail, and no scoping to a specific target. The measure feature adds benchmark-only TSC counters, which are likewise build-gated and operator-facing only. There is currently no capability-shaped debug/trace/profile surface at all.

This is a capability-model gap. Until it is filled, the only debugging tool is serial output and offline log inspection — useful for early kernel work, but insufficient once real service decomposition and cross-process interactions exist.

User Stories

  • An operator maintenance session needs to inspect which capabilities a stuck service holds, without being able to invoke any of them.
  • A developer investigating a failing smoke test wants a bounded record of the SQEs and CQEs the target process issued around the failure, decoded against the current schema.
  • A profiler tool needs sampled PC/stack snapshots of a running service at a configured frequency without stopping the service or holding a live breakpoint.
  • An agent-shell maintenance workflow needs to attach to a service granted to it by the authority broker, with that attach action recorded in the audit log.
  • A supervisor needs to assert that a debugged process cannot escalate its authority into other processes by virtue of being debugged.

Design Principles

  1. Attach is authority. Connecting a debug session to a process requires an explicit DebugSession capability. No ambient ptrace analog. The kernel does not hand out debug access on the basis of Unix UID or any implicit privilege.
  2. Consent is required. A DebugSession for a live process is obtained either by explicit owner consent (the process or its supervisor grants one), or through a broker-mediated maintenance session policy decision. Neither path is self-minted.
  3. Attach is audited. Every DebugSession creation and every inspection operation through it is an auditable event. The target process and the audit log both observe it.
  4. Snapshots are read-only. Cap-table and VM inspection through a debug session produce read-only snapshots. No capability in the snapshot is transferable to or activatable by the inspector. A debug session must not become a covert authority-transfer channel. The GDB-RSP prior art is a reminder that a full debugger is read/write authority over its target; in this design the read-only snapshot/trace surface (Phases 1-3) and any future read/write control (breakpoints, register writes, Phase 4) are distinct authorities. Write authority is a separately leased, stronger cap and never rides implicitly on the read-only DebugSession.
  5. Secrets and payload bytes are redacted by default. Cap-table snapshots expose names, interface IDs, and slot indices — not raw capability payloads, bearer tokens, or memory-mapped buffer contents. Payload capture requires a separately leased and stronger cap.
  6. A debugged process cannot escalate. A process being debugged must not thereby gain the ability to inspect or affect other processes. The debug session is scoped to one target; no cross-process read or call is admitted through it.
  7. Symbol resolution is bounded. Resolving a PC address to a symbol name requires read access to the symbol table of one specific binary, not general filesystem authority. Symbol resolution is a separate, explicitly scoped cap; it is not bundled into the basic debug session.
  8. Build gates are graduated. The debug_tap kernel facility stays behind cfg(feature = "debug_tap") for its current always-emit emergency-serial behavior. The userspace-facing DebugSession and RingTrace caps are not build-gated but are absent from production bootstrap CapSets; a broker may mint them only under an explicitly authorized maintenance session policy.

A DebugSession is created through one of two paths:

Owner consent. The target process’s supervisor or owner holds a ProcessHandle and can call a createDebugSession method on it to mint a DebugSession for the target. This is the normal developer workflow: the supervisor that spawned a service grants a debug session to a maintenance tool.

Broker-mediated maintenance session. The authority broker holds a restricted ability to mint DebugSession caps for processes within a maintenance session scope — for example, for an operator who has authenticated and whose session policy permits debugging named services. The broker records the grant as an audit event. Normal shells and user sessions do not receive this authority.

Neither path is self-minted. A process cannot mint a DebugSession for itself or for peers from ambient state. The kernel does not expose a DebugAll cap at bootstrap.

Attaching a DebugSession produces an audit record covering: the initiator session, the target pid and service name, the authority source (owner consent or broker grant), and the timestamp. The target process receives a notification at attach time if it has an active ring — not as a blocking gate, but as an observable event.
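
The two minting paths and the attach audit record can be modeled in a few lines of Rust. This is a minimal userspace model under stated assumptions, not kernel code: AuditLog, MaintenancePolicy, owner_grant, broker_grant, and the field types are all hypothetical names chosen for illustration. The only properties it tries to show are that a session is always scoped to one target, that the broker path is gated by a session policy, and that the audit write happens before a session is returned.

```rust
/// Which of the two paths authorized the attach.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AuthoritySource {
    OwnerConsent,
    BrokerGrant,
}

/// The attach audit fields listed in the text above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AuditRecord {
    pub initiator_session: u64,
    pub target_pid: u32,
    pub service_name: String,
    pub source: AuthoritySource,
    pub tick: u64,
}

/// Hypothetical append-only audit sink.
#[derive(Default)]
pub struct AuditLog {
    records: Vec<AuditRecord>,
}

impl AuditLog {
    pub fn records(&self) -> &[AuditRecord] {
        &self.records
    }
}

/// Hypothetical stand-in for a held ProcessHandle capability.
pub struct ProcessHandle {
    pub pid: u32,
    pub service_name: String,
}

/// A session scoped to exactly one target.
#[derive(Debug, PartialEq, Eq)]
pub struct DebugSession {
    pub target_pid: u32,
}

/// Shared mint step: the audit record is written before the session exists.
fn mint(
    target: &ProcessHandle,
    initiator: u64,
    source: AuthoritySource,
    tick: u64,
    log: &mut AuditLog,
) -> DebugSession {
    log.records.push(AuditRecord {
        initiator_session: initiator,
        target_pid: target.pid,
        service_name: target.service_name.clone(),
        source,
        tick,
    });
    DebugSession { target_pid: target.pid }
}

/// Owner-consent path: holding the ProcessHandle is the authority.
pub fn owner_grant(
    target: &ProcessHandle,
    initiator: u64,
    tick: u64,
    log: &mut AuditLog,
) -> DebugSession {
    mint(target, initiator, AuthoritySource::OwnerConsent, tick, log)
}

/// Hypothetical maintenance-session policy held by the broker.
pub struct MaintenancePolicy {
    pub session: u64,
    pub debuggable_services: Vec<String>,
}

/// Broker path: mint only for services the session policy names.
pub fn broker_grant(
    policy: &MaintenancePolicy,
    target: &ProcessHandle,
    tick: u64,
    log: &mut AuditLog,
) -> Option<DebugSession> {
    if !policy.debuggable_services.contains(&target.service_name) {
        return None;
    }
    Some(mint(target, policy.session, AuthoritySource::BrokerGrant, tick, log))
}
```

A refused broker request writes nothing in this model; whether denied attach attempts should also be audited is a policy question the sketch leaves open.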

Proposed Interfaces

These are conceptual interfaces. They should not be added to schema/capos.capnp until a Phase 1 implementation slice needs them.

# Read-only snapshot of one capability slot in the target's cap table.
# Does not transfer or activate any authority.
struct CapSlotSnapshot {
  slotIndex   @0 :UInt32;
  interfaceId @1 :UInt64;   # capnp type ID; 0 if untyped or unknown
  methodCount @2 :UInt16;
  label       @3 :Text;     # kernel-assigned or schema-derived name
  state       @4 :Text;     # e.g. "live", "released", "pending-return"
}

# Read-only snapshot of the target's capability table.
# None of these slots are transferable to or callable by the inspector.
struct CapTableSnapshot {
  targetPid    @0 :UInt32;
  tick         @1 :UInt64;
  slots        @2 :List(CapSlotSnapshot);
  slotTotal    @3 :UInt32;
  slotUsed     @4 :UInt32;
  snapshotDrop @5 :UInt32;  # slots omitted due to budget/redaction
}

# A scoped debug session attached to one process.
interface DebugSession {
  # Read-only snapshot of the target's current capability table.
  capTableSnapshot @0 () -> (snapshot :CapTableSnapshot);

  # Arm a bounded ring-trace capture on the target.
  # Returns a RingTrace cap scoped to this session and target.
  armRingTrace @1 (maxRecords :UInt32, maxBytes :UInt32)
      -> (trace :RingTrace);

  # Arm a bounded sampler on the target.
  # Returns a Sampler cap that yields PC/stack samples at the requested
  # interval without stopping the target.
  armSampler @2 (intervalNs :UInt32, maxSamples :UInt32)
      -> (sampler :Sampler);

  # Detach. Further calls on this session are rejected.
  detach @3 () -> ();
}

# Bounded ring-trace cap, scoped to one DebugSession target.
interface RingTrace {
  # Drain buffered SQE/CQE records for the attached target.
  drain @0 (maxRecords :UInt32)
      -> (records :List(TraceRecord), complete :Bool, dropped :UInt64);

  # Disarm and release the capture buffer.
  release @1 () -> ();
}

# Sampler cap for sampled PC/stack snapshots.
interface Sampler {
  # Read the next available sample batch.
  read @0 (maxSamples :UInt32)
      -> (samples :List(SamplerRecord), dropped :UInt64);

  # Stop sampling and release the reservation.
  stop @1 () -> ();
}

struct SamplerRecord {
  tick         @0 :UInt64;
  pid          @1 :UInt32;
  pc           @2 :UInt64;
  # Shallow inline frames; bounded to avoid variable-length allocation
  # on the capture hot path.
  frames       @3 :List(UInt64);
  framesDrop   @4 :UInt8;  # frames omitted due to depth cap
}

TraceRecord has the same shape as defined in docs/proposals/system-monitoring-proposal.md: tick, pid, opcode, cap_id, method_id, interface_id, result, flags, and an optional payload blob gated by a separately leased stronger cap.
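
The drain contract on RingTrace (records, complete, dropped) leaves several details open. The Rust sketch below picks one plausible reading so the semantics can be discussed concretely. Everything here is an assumption: the RingTraceBuffer type, the field widths on TraceRecord, the drop-newest overflow policy, the meaning of complete as "buffer is empty after this call", and dropped as "records lost since the previous drain".

```rust
use std::collections::VecDeque;

/// Metadata-only trace record. Field names follow the list above;
/// the integer widths are guesses for illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TraceRecord {
    pub tick: u64,
    pub pid: u32,
    pub opcode: u8,
    pub cap_id: u32,
    pub method_id: u16,
    pub interface_id: u64,
    pub result: i32,
    pub flags: u32,
}

/// Result of one drain call.
pub struct DrainResult {
    pub records: Vec<TraceRecord>,
    pub complete: bool,
    pub dropped: u64,
}

/// Hypothetical bounded capture buffer scoped to one target pid.
pub struct RingTraceBuffer {
    target_pid: u32,
    max_records: usize,
    buf: VecDeque<TraceRecord>,
    dropped: u64,
}

impl RingTraceBuffer {
    pub fn new(target_pid: u32, max_records: usize) -> Self {
        RingTraceBuffer {
            target_pid,
            max_records,
            buf: VecDeque::new(),
            dropped: 0,
        }
    }

    /// Capture side: records for other pids are ignored entirely, and a
    /// full buffer counts a drop instead of growing (drop-newest policy).
    pub fn capture(&mut self, rec: TraceRecord) {
        if rec.pid != self.target_pid {
            return;
        }
        if self.buf.len() >= self.max_records {
            self.dropped += 1;
            return;
        }
        self.buf.push_back(rec);
    }

    /// Drain up to `max_records` oldest records. The drop counter is
    /// reported once and then reset.
    pub fn drain(&mut self, max_records: usize) -> DrainResult {
        let n = max_records.min(self.buf.len());
        let records: Vec<TraceRecord> = self.buf.drain(..n).collect();
        let dropped = std::mem::take(&mut self.dropped);
        DrainResult {
            records,
            complete: self.buf.is_empty(),
            dropped,
        }
    }
}
```

A drop-oldest policy would be an equally defensible choice for post-mortem use, since it keeps the records closest to the failure; the point of the sketch is only that the buffer is bounded and that loss is reported rather than silent.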

Symbol and Source Boundary

Resolving a sampled PC address or a ring-trace cap_id to a human-readable symbol requires access to symbol tables and debug info, not filesystem authority. The design uses an explicit, scoped symbol-resolver cap:

  • A SymbolTable cap holds a read-only ELF DWARF/symbol section for one binary, loaded from a trusted source (boot package or signed artifact store).
  • The inspector passes a SymbolTable cap and a list of addresses; the resolver returns bounded name strings.
  • No arbitrary filesystem path traversal is admitted through this path.
  • SymbolTable is separately minted from DebugSession; holding a debug session does not imply symbol resolution authority, and holding a symbol table does not imply attach authority.

Cap-mediated symbol resolution is deferred until after Phase 3. Until then the capture paths return raw addresses, and offline host-side tools (e.g., addr2line on the kernel ELF) handle symbol lookup during the research phase.
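
As a rough illustration of the resolver boundary described above, the following Rust sketch resolves addresses against an in-memory table for a single binary and returns bounded name strings. It is a model under assumptions, not a proposed API: the Symbol layout, the resolve_batch signature, and the max_addrs and max_name_len limits are hypothetical. The table never touches a filesystem path, which is the property the boundary is meant to guarantee.

```rust
/// One symbol covering the half-open range [start, start + size).
pub struct Symbol {
    pub start: u64,
    pub size: u64,
    pub name: String,
}

/// Hypothetical read-only symbol table for one binary, sorted by start.
pub struct SymbolTable {
    symbols: Vec<Symbol>,
}

impl SymbolTable {
    pub fn new(mut symbols: Vec<Symbol>) -> Self {
        symbols.sort_by_key(|s| s.start);
        SymbolTable { symbols }
    }

    /// Binary-search for the symbol whose range contains `addr`.
    pub fn resolve(&self, addr: u64) -> Option<&str> {
        // Index of the first symbol that starts strictly after `addr`.
        let idx = self.symbols.partition_point(|s| s.start <= addr);
        let sym = self.symbols.get(idx.checked_sub(1)?)?;
        if addr - sym.start < sym.size {
            Some(sym.name.as_str())
        } else {
            None
        }
    }

    /// Resolve at most `max_addrs` addresses; each returned name is
    /// truncated to at most `max_name_len` bytes.
    pub fn resolve_batch(
        &self,
        addrs: &[u64],
        max_addrs: usize,
        max_name_len: usize,
    ) -> Vec<Option<String>> {
        addrs
            .iter()
            .take(max_addrs)
            .map(|&a| self.resolve(a).map(|n| truncate(n, max_name_len)))
            .collect()
    }
}

/// Truncate on a UTF-8 character boundary so the result stays valid text.
fn truncate(name: &str, max_len: usize) -> String {
    let mut end = name.len().min(max_len);
    while !name.is_char_boundary(end) {
        end -= 1;
    }
    name[..end].to_string()
}
```

Addresses that fall in gaps between symbols resolve to None rather than to the nearest preceding symbol, which avoids attributing samples to the wrong function.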

Phasing

Phase 1 — DebugSession Attach and Cap-Table Snapshot (model-critical)

  • Define DebugSession, CapSlotSnapshot, and CapTableSnapshot in schema/capos.capnp.
  • Implement ProcessHandle.createDebugSession in the kernel, guarded by the existing ProcessHandle authority boundary. capOS uses process-level debug authority here because most current services are single-threaded. The seL4 per-TCB-cap prior art argues for deriving per-thread sessions from ThreadControl; that is the intended finer-grained follow-up once multi-threaded targets need it.
  • capTableSnapshot returns a bounded, redacted read-only snapshot of the target’s current cap table. No cap in the snapshot is transferable or callable.
  • Audit record emitted to AuditLog at attach and at each snapshot call.
  • No payload capture, no ring trace, no sampler in this phase.
  • Proof: a smoke test where a supervisor attaches a debug session to a child, calls capTableSnapshot, and verifies the snapshot fields against what the child was granted at spawn time. The audit log must contain the attach record.
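
The snapshot construction that Phase 1 calls for might look roughly like the Rust model below. It is a sketch under assumptions, not the kernel implementation: KernelCapSlot, its payload field, the max_slots budget parameter, and the reading of slotUsed as "occupied slots" are hypothetical. The property being illustrated is that the snapshot copies metadata only, so nothing in the result can be invoked or transferred, and that slots omitted for budget reasons are counted in snapshot_drop.

```rust
/// Hypothetical kernel-side slot. `payload` stands in for the live
/// reference and any secret bytes, which must never reach the snapshot.
pub struct KernelCapSlot {
    pub interface_id: u64,
    pub method_count: u16,
    pub label: String,
    pub live: bool,
    pub payload: Vec<u8>,
}

/// Metadata-only view of one slot (mirrors CapSlotSnapshot above).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CapSlotSnapshot {
    pub slot_index: u32,
    pub interface_id: u64,
    pub method_count: u16,
    pub label: String,
    pub state: String,
}

/// Mirrors CapTableSnapshot above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CapTableSnapshot {
    pub target_pid: u32,
    pub tick: u64,
    pub slots: Vec<CapSlotSnapshot>,
    pub slot_total: u32,
    pub slot_used: u32,
    pub snapshot_drop: u32,
}

/// Build a redacted snapshot with at most `max_slots` entries.
/// The payload field is never read, so redaction happens at
/// construction time rather than at read time.
pub fn snapshot_cap_table(
    target_pid: u32,
    tick: u64,
    table: &[Option<KernelCapSlot>],
    max_slots: usize,
) -> CapTableSnapshot {
    let mut slots = Vec::new();
    let mut used = 0u32;
    let mut dropped = 0u32;
    for (index, entry) in table.iter().enumerate() {
        if let Some(slot) = entry {
            used += 1;
            if slots.len() >= max_slots {
                // Over budget: count the omission instead of growing.
                dropped += 1;
                continue;
            }
            let state = if slot.live { "live" } else { "released" };
            slots.push(CapSlotSnapshot {
                slot_index: index as u32,
                interface_id: slot.interface_id,
                method_count: slot.method_count,
                label: slot.label.clone(),
                state: state.to_string(),
            });
        }
    }
    CapTableSnapshot {
        target_pid,
        tick,
        slots,
        slot_total: table.len() as u32,
        slot_used: used,
        snapshot_drop: dropped,
    }
}
```

A smoke test in the spirit of the proof bullet would compare the returned labels and interface IDs against the set granted to the child at spawn time.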

Phase 2 — Ring Trace via DebugSession

  • Add armRingTrace and RingTrace to the schema and kernel.
  • Build on the existing debug_tap ring-capture record format (RingCaptureRecord in capos_config::ring), but route capture through the DebugSession authority rather than the always-emit emergency-serial path.
  • The RingTrace cap is scoped to the attached target; it cannot observe other processes.
  • Payload capture (includePayloadBytes) requires a separately presented stronger cap (not yet defined in Phase 2).
  • Disarming the RingTrace releases the capture buffer and emits an audit record.
  • Proof: extend the failing-call smoke from the Ring-as-Black-Box milestone (commit da5f5e9) to route capture through a DebugSession instead of the emergency serial path, and verify the drained records match the expected SQE/CQE sequence.

Phase 3 — Sampler Authority

  • Add armSampler and Sampler to the schema and kernel.
  • The sampler fires at a configured interval, captures the PC and a bounded number of inline call frames, and buffers records for drain.
  • The target process is not stopped; sampler overhead is bounded by sample interval and buffer depth.
  • Relates to the System Performance Benchmarks proposal: a benchmark runner may arm a sampler before a workload and drain it after to produce a flamegraph, subject to the same audit and consent rules.
  • Symbol resolution is offline in this phase (host-side addr2line).
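
The bounded-frame behavior of a sampler record can be illustrated with a small Rust model. The depth cap of 8 and the make_sample helper are assumptions for illustration; the proposal does not fix a depth. The sketch shows the frames list being cut at the cap and the number of omitted frames being reported in frames_drop, saturating at the u8 maximum.

```rust
/// Assumed depth cap; the proposal does not fix a number.
pub const MAX_INLINE_FRAMES: usize = 8;

/// Mirrors SamplerRecord above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SamplerRecord {
    pub tick: u64,
    pub pid: u32,
    pub pc: u64,
    pub frames: Vec<u64>,
    pub frames_drop: u8,
}

/// Build one sample from a walked stack, innermost frame first.
/// Frames beyond the cap are counted, not stored.
pub fn make_sample(tick: u64, pid: u32, pc: u64, return_addrs: &[u64]) -> SamplerRecord {
    let kept = return_addrs.len().min(MAX_INLINE_FRAMES);
    let omitted = return_addrs.len() - kept;
    SamplerRecord {
        tick,
        pid,
        pc,
        frames: return_addrs[..kept].to_vec(),
        // Saturate so very deep stacks still fit the u8 field.
        frames_drop: omitted.min(u8::MAX as usize) as u8,
    }
}
```

Keeping the innermost frames and dropping the outermost is one choice; a flamegraph consumer can still attribute self time correctly with truncated stacks as long as the omission is visible.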

Phase 4 — Breakpoint, Single-Step, and Payload Capture (deferred)

Breakpoint and single-step authority has a much larger kernel surface than read-only snapshot and sampling. Payload capture risks exposing secrets. Both are deferred until the Phase 1–3 model is stable and the audit/consent infrastructure is proven.

When payload capture is added, it must:

  • require a separately leased PayloadCapture cap distinct from the base RingTrace cap;
  • be a separately audited grant;
  • carry a per-call byte budget enforced by the kernel.
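
One way to read the per-call byte budget is sketched below in Rust. This is speculative, since payload capture is deferred and no interface is defined for it yet: the PayloadCapture struct, its max_bytes_per_call field, and the attach_payloads helper are hypothetical. In this reading, a caller without the stronger cap gets every payload redacted, and a caller with it gets whole payloads only while they fit in the remaining budget for that call.

```rust
/// Hypothetical separately leased cap carrying the per-call byte budget.
pub struct PayloadCapture {
    pub max_bytes_per_call: usize,
}

/// Decide which payloads to return for one drain call.
/// Returns the per-record payloads (None = withheld) and the number of
/// payloads withheld because the budget ran out.
pub fn attach_payloads(
    cap: Option<&PayloadCapture>,
    payloads: &[&[u8]],
) -> (Vec<Option<Vec<u8>>>, usize) {
    let mut remaining = match cap {
        Some(c) => c.max_bytes_per_call,
        // Without the stronger cap every payload is redacted.
        None => return (vec![None; payloads.len()], 0),
    };
    let mut over_budget = 0;
    let out = payloads
        .iter()
        .map(|p| {
            if p.len() <= remaining {
                remaining -= p.len();
                Some(p.to_vec())
            } else {
                // Never return a partial payload; withhold it whole.
                over_budget += 1;
                None
            }
        })
        .collect();
    (out, over_budget)
}
```

Withholding a payload whole rather than truncating it keeps a decoder from misparsing a partial message; whether later, smaller payloads should still be admitted after one is withheld is a design choice the sketch makes arbitrarily.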

Hazard Preflight

paging/MMIO: cap-table snapshots and ring-trace records read kernel state under existing locks. No new user-mapping or MMIO surface is introduced in Phase 1–3.

ABI: DebugSession and RingTrace are new schema interfaces, and CapTableSnapshot is a new schema struct. Generated bindings must be refreshed via make generated-code-check before merging any Phase 1 branch.

authority transfer via snapshot: the critical invariant is that no CapSlotSnapshot entry can be used by the inspector to call or transfer a capability. The kernel must enforce that the snapshot data path does not return live cap references — only metadata fields (interface ID, label, state). This must be verified in the Phase 1 implementation review.

audit bypass: an inspector must not be able to suppress or delay audit records for its own actions. Audit writes must occur synchronously within the debug session dispatch path, not deferred.

timing side channel: a sampler that returns precise timestamps could be used to extract timing side-channel information about a target service. To reduce precision, the sampler tick field is clamped to PIT-resolution granularity in Phase 3; finer clock access for profiling remains deferred.
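
Clamping a timestamp to a coarser granularity amounts to rounding it down to a multiple of that granularity. The Rust sketch below shows the arithmetic only. The granularity parameter is hypothetical: the proposal does not state how many raw ticks correspond to one PIT period, so the caller would have to supply that ratio.

```rust
/// Round `raw_tick` down to a multiple of `granularity`.
/// A granularity of 0 or 1 leaves the tick unchanged.
pub fn clamp_tick(raw_tick: u64, granularity: u64) -> u64 {
    if granularity <= 1 {
        return raw_tick;
    }
    raw_tick - (raw_tick % granularity)
}
```

Two samples taken inside the same granularity window report the same tick, so an inspector cannot order or time events more finely than the window allows.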

Security Boundaries

  • A DebugSession holder can read snapshots of one target. It cannot call, transfer, or activate any capability belonging to the target.
  • A RingTrace holder can read ring metadata for one target. Payload bytes require a separate stronger cap.
  • A Sampler holder receives PC and bounded stack frames for one target. No memory-mapped content, no register state beyond PC.
  • None of these caps admit cross-process inspection. A DebugSession for process A cannot observe process B.
  • A debugged process remains subject to normal scheduler and capability enforcement. Being debugged does not grant the target any additional capability slots or authority.
  • Redaction applies at snapshot construction time, not at read time. The kernel constructs the redacted view; the inspector never sees the raw kernel state.

Non-Goals

  • No ambient ptrace-style process attach without authority.
  • No kernel debugger (GDB stub, JTAG) exposed as a userspace capability surface — those are operator boot-time tools, not capability-model components.
  • No replay semantics. Ring trace is inspection, not record/replay. Replay requires payload retention, timer modeling, and capability checkpoints; that is out of scope.
  • No cross-process or system-wide trace aggregation in this proposal. Aggregate trace is a monitoring concern covered by docs/proposals/system-monitoring-proposal.md.
  • No memory read/write through a debug session. Address-space inspection is a separate and stronger authority not proposed here.
  • No DebugSession self-grant. A process cannot debug itself through this interface.
  • No crash/exception observation here. A read-only ExceptionObserver cap (the Zircon task_create_exception_channel analog) for receiving crash notifications without debug-write authority is a separate, weaker authority owned by Crash Recovery and Supervision, not bundled into DebugSession.

Relevant Research and Prior Art

In-Tree Notes

  • Debug, Trace, and Profiling Authority is the dedicated prior-art survey for this proposal. It covers the GDB remote serial protocol, Linux ptrace/Yama, perf/CAP_PERFMON, Fuchsia handle-scoped debug_agent/zxdb, seL4 TCB-cap hardware debug, and the Genode CPU-session GDB monitor, and it grounds the DebugSession/Sampler/exception-observer authority split against real sources.
  • docs/research/zircon.md documents Fuchsia’s handle model: handles are process-local references with a rights bitmask, there is no ambient authority, and a process can only interact with kernel objects through handles it holds. capOS draws the directly applicable lesson here — a DebugSession is a held capability, not an ambient privilege, and inspection of a target’s cap table is itself a distinct grantable authority rather than a side effect of holding a generic “debug” right. The note covers handle rights and transfer but not Fuchsia’s debug_agent/zxdb debugging service specifically; that service is now surveyed in the dedicated research note above (and summarized below).
  • docs/research/sel4.md records that seL4 has no in-kernel debug traps or thread-introspection mechanism in the verified configuration; debugging is pushed to userspace and the design constraints (typed authority, no ambient inspection) matter more than any debugger feature. capOS follows the same posture: keep the kernel surface to read-only snapshot and bounded capture, and route policy (who may attach, to what) to userspace consent and the broker.
  • docs/research/genode.md documents Genode’s session-and-label model, where every cross-component request carries a label and is mediated by a parent component. The applicable lesson is that attach authority should flow through the same parent/supervisor relationship that already governs spawning — a supervisor that holds a child’s ProcessHandle is the natural minter of a DebugSession for that child, mirroring Genode’s parent-mediated session routing rather than a global debugger service.
  • docs/research/completion-ring-threading.md grounds the io_uring-style SQ/CQ ring transport that the Phase 2 ring trace observes. The trace records the same SQE/CQE structures already captured by the kernel debug_tap facility (RingCaptureRecord in capos_config::ring); this proposal adds the authority and consent layer that the existing build-gated emergency-serial capture lacks.

External Precedent

  • GDB remote serial protocol (gdbserver). GDB separates the debugger front-end from a target-side stub that exposes register, memory, and breakpoint operations over a serial/TCP channel. The lesson for capOS is that the inspection surface can be a narrow, well-defined protocol object rather than ambient access — but full register/memory read-write is exactly the strong authority capOS defers to Phase 4 and keeps out of the read-only DebugSession.
  • Linux ptrace(2). ptrace is the canonical ambient-authority footgun: attach authority derives from Unix UID and the Yama ptrace_scope sysctl rather than from a held, transferable capability, and a successful attach grants register and full address-space read/write at once. This conflates “may observe” with “may control” and bypasses higher-level access controls. capOS rejects this directly — DebugSession attach is owner-consented or broker-granted, audited, and read-only; observation and control are separate authorities.
  • Linux perf and eBPF tracing. Sampled profiling and tracing on Linux sit behind privilege boundaries (perf_event_paranoid, CAP_PERFMON/CAP_BPF) precisely because PC/stack sampling and kernel-wide tracing leak timing and topology information across trust boundaries. capOS treats the same risk as a capability and an audit event: the Sampler cap is scoped to one consented target, its timestamp resolution is clamped, and arming it is recorded.
  • Fuchsia debug_agent / zxdb. Fuchsia’s debugger is a userspace service (debug_agent) that the zxdb front-end drives; it operates on process and thread handles rather than ambient privilege, consistent with Zircon’s object-capability model. This is the closest external precedent for capOS’s intended shape: debugging as a handle/capability-mediated service, not a kernel-ambient right. The debug_agent/zxdb survey in the dedicated research note listed under In-Tree Notes is the starting point; any remaining gaps in that note should be closed, per the docs/backlog/research-design-gaps.md convention, before the Phase 4 breakpoint/single-step surface is designed.
  • Object-capability systems generally. Capability systems avoid an ambient ptrace analog because there is no global principal that implicitly dominates other processes; the authority to inspect must be granted like any other capability. This is the structural reason capOS can offer debugging without reintroducing ambient authority, and why the consent and audit requirements in this proposal are load-bearing rather than optional hardening.

Relevant Proposals

  • System Monitoring (system-monitoring-proposal.md): owns aggregate ring traces (TraceCapture), log/metric/audit signal taxonomy, and the RingTap move-semantics note for payload-capturing taps. This proposal owns the per-process debug attach authority and consent model that monitoring’s trace surfaces do not cover. TraceRecord schema is shared; authority and consent model is separate.
  • Security and Verification (security-and-verification-proposal.md): the trust-boundary inventory (Track S.7) must be updated to include DebugSession, RingTrace, Sampler, and CapTableSnapshot as new boundaries before downstream services rely on them.
  • System Performance Benchmarks (system-performance-benchmarks-proposal.md): benchmark runners may arm a Sampler before a workload run; this proposal defines the authority and consent model for that use.
  • Task State and Agent Telemetry (task-state-and-agent-telemetry-proposal.md): agent maintenance sessions may use DebugSession to inspect service state; telemetry records that fact.