Proposal: Debug and Trace Authority
How capOS should expose process-attach, capability-table inspection, ring-trace capture, and sampler/profiler authority to debuggers and maintenance tools without granting kernel privilege, ambient inspection rights, or a covert channel for authority transfer.
Problem
A capability OS whose security claim is “you can only access what you
were explicitly granted” breaks silently if a debugger can attach to
any process without authority. Unix ptrace is the canonical example:
any process with sufficient Unix privilege can stop, inspect, and
modify another process’s address space and register state, bypassing
all higher-level access controls. capOS must not import that model.
At the same time, debugging real failures requires more than serial
output. The existing debug_tap facility (kernel/src/debug_tap.rs)
emits bounded SQE/CQE records to the emergency serial path in
QEMU-only builds, but it has no userspace-facing capability, no
consent protocol, no audit trail, and no scoping to a specific target.
The measure feature adds benchmark-only TSC counters, also
build-gated and operator-facing only. There is currently no
capability-shaped debug/trace/profile surface at all.
This is a capability-model gap. Until it is filled, the only debugging tool is serial output and offline log inspection — useful for early kernel work, but insufficient once real service decomposition and cross-process interactions exist.
User Stories
- An operator maintenance session needs to inspect which capabilities a stuck service holds, without being able to invoke any of them.
- A developer investigating a failing smoke test wants a bounded record of the SQEs and CQEs the target process issued around the failure, decoded against the current schema.
- A profiler tool needs sampled PC/stack snapshots of a running service at a configured frequency without stopping the service or holding a live breakpoint.
- An agent-shell maintenance workflow needs to attach to a service granted to it by the authority broker, with that attach action recorded in the audit log.
- A supervisor needs to assert that a debugged process cannot escalate its authority into other processes by virtue of being debugged.
Design Principles
- Attach is authority. Connecting a debug session to a process requires an explicit DebugSession capability. There is no ambient ptrace analog: the kernel does not hand out debug access on the basis of Unix UID or any implicit privilege.
- Consent is required. A DebugSession for a live process is obtained either by explicit owner consent (the process or its supervisor grants one) or through a broker-mediated maintenance-session policy decision. Neither path is self-minted.
- Attach is audited. Every DebugSession creation and every inspection operation through it is an auditable event. Both the target process and the audit log observe it.
- Snapshots are read-only. Cap-table and VM inspection through a debug session produce read-only snapshots. No capability in the snapshot is transferable to or activatable by the inspector; a debug session must not become a covert authority-transfer channel. The GDB-RSP prior art is a reminder that a full debugger holds read/write authority over its target. In this design, the read-only snapshot/trace surface (Phases 1–3) and any future read/write control (breakpoints and register writes, Phase 4) are distinct authorities. Write authority is a separately leased, stronger cap and never rides implicitly on the read-only DebugSession.
- Secrets and payload bytes are redacted by default. Cap-table snapshots expose names, interface IDs, and slot indices, not raw capability payloads, bearer tokens, or memory-mapped buffer contents. Payload capture requires a separately leased, stronger cap.
- A debugged process cannot escalate. A process being debugged must not thereby gain the ability to inspect or affect other processes. The debug session is scoped to one target; no cross-process read or call is admitted through it.
- Symbol resolution is bounded. Resolving a PC address to a symbol name requires access to a symbol table file or binary, not filesystem authority. Symbol resolution is a separate, explicitly scoped cap — not bundled into the basic debug session.
- Build gates are graduated. The debug_tap kernel facility stays behind cfg(feature = "debug_tap") for its current always-emit emergency-serial behavior. The userspace-facing DebugSession and RingTrace caps are not build-gated, but they are absent from production bootstrap CapSets; a broker may mint them only under an explicitly authorized maintenance-session policy.
Authority and Consent
A DebugSession is created through one of two paths:
Owner consent. The target process’s supervisor or owner holds a
ProcessHandle and can call a createDebugSession method on it to
mint a DebugSession for the target. This is the normal developer
workflow: the supervisor that spawned a service grants a debug session
to a maintenance tool.
Broker-mediated maintenance session. The authority broker holds a
restricted ability to mint DebugSession caps for processes within a
maintenance session scope — for example, for an operator who has
authenticated and whose session policy permits debugging named
services. The broker records the grant as an audit event. Normal
shells and user sessions do not receive this authority.
Neither path is self-minted. A process cannot mint a DebugSession
for itself or for peers from ambient state. The kernel does not expose
a DebugAll cap at bootstrap.
Attaching a DebugSession produces an audit record covering: the
initiator session, the target pid and service name, the authority
source (owner consent or broker grant), and the timestamp. The target
process receives a notification at attach time if it has an active
ring — not as a blocking gate, but as an observable event.
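The two minting paths and the attach audit record can be sketched in Rust as below. This is a minimal illustration, not kernel code: ProcessHandle, BrokerPolicy, AttachAuditRecord, and mint_debug_session are hypothetical names, the broker policy is reduced to an allow-list of service names, and the audit log is a plain Vec. It shows self-minting being rejected, a broker grant being checked against the maintenance-session scope, and the audit record being written before the session is returned.

```rust
/// Which of the two consent paths authorized the attach.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuthoritySource {
    OwnerConsent,
    BrokerGrant,
}

/// The fields the proposal lists for the attach audit record.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AttachAuditRecord {
    pub initiator_session: u64,
    pub target_pid: u32,
    pub target_service: String,
    pub source: AuthoritySource,
    pub tick: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum AttachError {
    SelfMint,
    NotInMaintenanceScope,
}

/// Hypothetical handle to a process, held by its supervisor or the broker.
pub struct ProcessHandle {
    pub pid: u32,
    pub service: String,
}

/// Hypothetical maintenance-session policy: services the operator may debug.
pub struct BrokerPolicy {
    pub debuggable_services: Vec<String>,
}

#[derive(Debug)]
pub struct DebugSession {
    pub target_pid: u32,
    pub source: AuthoritySource,
}

pub fn mint_debug_session(
    caller_pid: u32,
    target: &ProcessHandle,
    source: AuthoritySource,
    policy: Option<&BrokerPolicy>,
    initiator_session: u64,
    tick: u64,
    audit: &mut Vec<AttachAuditRecord>,
) -> Result<DebugSession, AttachError> {
    // Neither path may mint a session whose target is the caller itself.
    if caller_pid == target.pid {
        return Err(AttachError::SelfMint);
    }
    // A broker grant must fall inside the maintenance-session scope;
    // for owner consent, holding the ProcessHandle is the proof.
    if source == AuthoritySource::BrokerGrant {
        let in_scope = policy.map_or(false, |p| {
            p.debuggable_services.iter().any(|s| *s == target.service)
        });
        if !in_scope {
            return Err(AttachError::NotInMaintenanceScope);
        }
    }
    // Record the attach before the session is handed out.
    audit.push(AttachAuditRecord {
        initiator_session,
        target_pid: target.pid,
        target_service: target.service.clone(),
        source,
        tick,
    });
    Ok(DebugSession { target_pid: target.pid, source })
}
```

The attach notification to the target's ring is left out; it would be posted at the same point as the audit push.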
Proposed Interfaces
These are conceptual interfaces. They should not be added to
schema/capos.capnp until a Phase 1 implementation slice needs them.
```capnp
# Read-only snapshot of one capability slot in the target's cap table.
# Does not transfer or activate any authority.
struct CapSlotSnapshot {
  slotIndex @0 :UInt32;
  interfaceId @1 :UInt64;    # capnp type ID; 0 if untyped or unknown
  methodCount @2 :UInt16;
  label @3 :Text;            # kernel-assigned or schema-derived name
  state @4 :Text;            # e.g. "live", "released", "pending-return"
}

# Read-only snapshot of the target's capability table.
# None of these slots are transferable to or callable by the inspector.
struct CapTableSnapshot {
  targetPid @0 :UInt32;
  tick @1 :UInt64;
  slots @2 :List(CapSlotSnapshot);
  slotTotal @3 :UInt32;
  slotUsed @4 :UInt32;
  snapshotDrop @5 :UInt32;   # slots omitted due to budget/redaction
}

# A scoped debug session attached to one process.
interface DebugSession {
  # Read-only snapshot of the target's current capability table.
  capTableSnapshot @0 () -> (snapshot :CapTableSnapshot);

  # Arm a bounded ring-trace capture on the target.
  # Returns a RingTrace cap scoped to this session and target.
  armRingTrace @1 (maxRecords :UInt32, maxBytes :UInt32)
      -> (trace :RingTrace);

  # Arm a bounded sampler on the target.
  # Returns a Sampler cap that yields PC/stack samples at the requested
  # interval without stopping the target.
  armSampler @2 (intervalNs :UInt32, maxSamples :UInt32)
      -> (sampler :Sampler);

  # Detach. Further calls on this session are rejected.
  detach @3 () -> ();
}

# Bounded ring-trace cap, scoped to one DebugSession target.
interface RingTrace {
  # Drain buffered SQE/CQE records for the attached target.
  drain @0 (maxRecords :UInt32)
      -> (records :List(TraceRecord), complete :Bool, dropped :UInt64);

  # Disarm and release the capture buffer.
  release @1 () -> ();
}

# Sampler cap for sampled PC/stack snapshots.
interface Sampler {
  # Read the next available sample batch.
  read @0 (maxSamples :UInt32)
      -> (samples :List(SamplerRecord), dropped :UInt64);

  # Stop sampling and release the reservation.
  stop @1 () -> ();
}

struct SamplerRecord {
  tick @0 :UInt64;
  pid @1 :UInt32;
  pc @2 :UInt64;
  # Shallow inline frames; bounded to avoid variable-length allocation
  # on the capture hot path.
  frames @3 :List(UInt64);
  framesDrop @4 :UInt8;      # frames omitted due to depth cap
}
```
TraceRecord is the same shape defined in
docs/proposals/system-monitoring-proposal.md: tick, pid, opcode,
cap_id, method_id, interface_id, result, flags, and an optional
payload blob gated by a separately leased stronger cap.
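A bounded, target-scoped RingTrace buffer can be sketched as follows. The field widths in TraceRecord are assumptions (the shared definition lives in the system-monitoring proposal), only the maxRecords bound is modeled (maxBytes is not), and the buffer keeps the oldest records while counting newer ones as dropped. The sketch also assumes that complete means the buffer was empty after the drain and that dropped reports drops since the previous drain; the proposal does not pin down either meaning.

```rust
use std::collections::VecDeque;

/// Illustrative field widths; the shared schema defines the real shape.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TraceRecord {
    pub tick: u64,
    pub pid: u32,
    pub opcode: u8,
    pub cap_id: u32,
    pub method_id: u16,
    pub interface_id: u64,
    pub result: i32,
    pub flags: u16,
}

/// Capture buffer behind one RingTrace cap, bound to a single target pid.
pub struct RingTrace {
    target_pid: u32,
    max_records: usize,
    buf: VecDeque<TraceRecord>,
    dropped: u64,
}

impl RingTrace {
    /// Arm a capture buffer holding at most `max_records` entries.
    pub fn arm(target_pid: u32, max_records: u32) -> Self {
        let max_records = max_records as usize;
        RingTrace {
            target_pid,
            max_records,
            buf: VecDeque::with_capacity(max_records),
            dropped: 0,
        }
    }

    /// Capture hook for every SQE/CQE. Records from other pids are
    /// ignored, so the cap cannot observe any process but its target.
    pub fn capture(&mut self, rec: TraceRecord) {
        if rec.pid != self.target_pid {
            return;
        }
        if self.buf.len() >= self.max_records {
            // Keep the oldest records and count the overflow.
            self.dropped += 1;
            return;
        }
        self.buf.push_back(rec);
    }

    /// Mirrors `drain(maxRecords) -> (records, complete, dropped)`.
    pub fn drain(&mut self, max_records: u32) -> (Vec<TraceRecord>, bool, u64) {
        let n = (max_records as usize).min(self.buf.len());
        let records: Vec<TraceRecord> = self.buf.drain(..n).collect();
        let complete = self.buf.is_empty();
        let dropped = std::mem::take(&mut self.dropped);
        (records, complete, dropped)
    }
}
```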
Symbol and Source Boundary
Resolving a sampled PC address or a ring-trace cap_id to a human-readable symbol requires access to symbol tables and debug info, not filesystem authority. The design uses an explicit, scoped symbol-resolver cap:
- A SymbolTable cap holds the read-only DWARF and symbol sections of one ELF binary, loaded from a trusted source (boot package or signed artifact store).
- The inspector passes a SymbolTable cap and a list of addresses; the resolver returns bounded name strings.
- No arbitrary filesystem path traversal is admitted through this interface.
- SymbolTable is minted separately from DebugSession: holding a debug session does not imply symbol-resolution authority, and holding a symbol table does not imply attach authority.
Symbol resolution is Phase 3+ work. Phase 1 produces raw addresses;
offline host-side tools (e.g., addr2line on the kernel ELF) handle
symbol lookup during the research phase.
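A minimal sketch of bounded address resolution against one binary's symbol table follows. It assumes the SymbolTable contents have already been parsed into (start, size, name) entries; ELF/DWARF parsing is out of scope, and Symbol, resolve, max_addrs, and max_name_len are hypothetical names. Both the number of addresses per call and the length of each returned name are capped, and nothing in the type refers to a filesystem path.

```rust
/// One entry from a binary's symbol table, already parsed out of the ELF.
pub struct Symbol {
    pub start: u64,
    pub size: u64,
    pub name: String,
}

/// Read-only table for exactly one binary; it holds no path or file handle.
pub struct SymbolTable {
    symbols: Vec<Symbol>, // sorted by start address
}

impl SymbolTable {
    pub fn new(mut symbols: Vec<Symbol>) -> Self {
        symbols.sort_by_key(|s| s.start);
        SymbolTable { symbols }
    }

    /// Resolve at most `max_addrs` addresses, cutting each name to
    /// `max_name_len` characters. Unknown addresses map to None.
    pub fn resolve(&self, addrs: &[u64], max_addrs: usize, max_name_len: usize) -> Vec<Option<String>> {
        addrs
            .iter()
            .take(max_addrs)
            .map(|&addr| self.lookup(addr, max_name_len))
            .collect()
    }

    fn lookup(&self, addr: u64, max_name_len: usize) -> Option<String> {
        // Index of the first symbol starting after `addr`; the candidate
        // is the symbol just before it.
        let idx = self.symbols.partition_point(|s| s.start <= addr);
        let sym = self.symbols.get(idx.checked_sub(1)?)?;
        if addr - sym.start < sym.size {
            Some(sym.name.chars().take(max_name_len).collect())
        } else {
            None
        }
    }
}
```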
Phasing
Phase 1 — DebugSession Attach and Cap-Table Snapshot (model-critical)
- Define DebugSession, CapSlotSnapshot, and CapTableSnapshot in schema/capos.capnp.
- Implement ProcessHandle.createDebugSession in the kernel, guarded by the existing ProcessHandle authority boundary. capOS uses process-level debug authority here because most current services are single-threaded; the seL4 per-TCB-cap prior art argues for deriving per-thread sessions from ThreadControl, which is the intended finer-grained follow-up once multi-threaded targets need it.
- capTableSnapshot returns a bounded, redacted, read-only snapshot of the target’s current cap table. No cap in the snapshot is transferable or callable.
- An audit record is emitted to AuditLog at attach and at each snapshot call.
- No payload capture, no ring trace, and no sampler in this phase.
- Proof: a smoke test in which a supervisor attaches a debug session to a child, calls capTableSnapshot, and verifies the snapshot fields against what the child was granted at spawn time. The audit log must contain the attach record.
Phase 2 — Ring Trace via DebugSession
- Add armRingTrace and RingTrace to the schema and kernel.
- Build on the existing debug_tap ring-capture record format (RingCaptureRecord in capos_config::ring), but route capture through the DebugSession authority rather than the always-emit emergency-serial path.
- The RingTrace cap is scoped to the attached target; it cannot observe other processes.
- Payload capture (includePayloadBytes) requires a separately presented stronger cap (not yet defined in Phase 2).
- Disarming the RingTrace releases the capture buffer and emits an audit record.
- Proof: extend the failing-call smoke test from the Ring-as-Black-Box milestone (commit da5f5e9) to route capture through a DebugSession instead of the emergency serial path, and verify that the drained records match the expected SQE/CQE sequence.
Phase 3 — Sampler Authority
- Add armSampler and Sampler to the schema and kernel.
- The sampler fires at a configured interval, captures the PC and a bounded list of inline call frames, and buffers records for drain.
- The target process is not stopped; sampler overhead is bounded by sample interval and buffer depth.
- Relates to the System Performance Benchmarks proposal: a benchmark runner may arm a sampler before a workload and drain it after to produce a flamegraph, subject to the same audit and consent rules.
- Symbol resolution is offline in this phase (host-side
addr2line).
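The Phase 3 capture path can be sketched as a timer hook that respects the configured interval, truncates the unwound stack to a fixed depth, and counts what it omits. MAX_FRAMES, the on_timer hook, frame_count, and the drop-when-full policy are assumptions for illustration. Frames are stored in a fixed-size array and the sample buffer is reserved up front, in line with the SamplerRecord comment about avoiding variable-length allocation on the capture hot path.

```rust
use std::collections::VecDeque;

/// Hypothetical depth cap for inline frames.
pub const MAX_FRAMES: usize = 8;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SamplerRecord {
    pub tick: u64,
    pub pid: u32,
    pub pc: u64,
    pub frames: [u64; MAX_FRAMES], // fixed size: no per-sample allocation
    pub frame_count: u8,
    pub frames_drop: u8, // frames omitted due to the depth cap
}

pub struct Sampler {
    target_pid: u32,
    interval_ns: u64,
    next_due_ns: u64,
    max_samples: usize,
    buf: VecDeque<SamplerRecord>,
    dropped: u64,
}

impl Sampler {
    pub fn arm(target_pid: u32, interval_ns: u32, max_samples: u32, now_ns: u64) -> Self {
        let max_samples = max_samples as usize;
        Sampler {
            target_pid,
            interval_ns: interval_ns as u64,
            next_due_ns: now_ns + interval_ns as u64,
            max_samples,
            // Reserve the whole buffer so capture does not reallocate.
            buf: VecDeque::with_capacity(max_samples),
            dropped: 0,
        }
    }

    /// Timer hook: `stack` is the unwound return-address list of the
    /// interrupted target. The target is not stopped.
    pub fn on_timer(&mut self, now_ns: u64, tick: u64, pc: u64, stack: &[u64]) {
        if now_ns < self.next_due_ns {
            return;
        }
        self.next_due_ns = now_ns + self.interval_ns;
        if self.buf.len() >= self.max_samples {
            self.dropped += 1;
            return;
        }
        let kept = stack.len().min(MAX_FRAMES);
        let mut frames = [0u64; MAX_FRAMES];
        frames[..kept].copy_from_slice(&stack[..kept]);
        self.buf.push_back(SamplerRecord {
            tick,
            pid: self.target_pid,
            pc,
            frames,
            frame_count: kept as u8,
            frames_drop: u8::try_from(stack.len() - kept).unwrap_or(u8::MAX),
        });
    }

    /// Mirrors `read(maxSamples) -> (samples, dropped)`.
    pub fn read(&mut self, max_samples: u32) -> (Vec<SamplerRecord>, u64) {
        let n = (max_samples as usize).min(self.buf.len());
        let samples: Vec<SamplerRecord> = self.buf.drain(..n).collect();
        (samples, std::mem::take(&mut self.dropped))
    }
}
```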
Phase 4 — Breakpoint, Single-Step, and Payload Capture (deferred)
Breakpoint and single-step authority has a much larger kernel surface than read-only snapshot and sampling. Payload capture risks exposing secrets. Both are deferred until the Phase 1–3 model is stable and the audit/consent infrastructure is proven.
When payload capture is added, it must:
- require a separately leased PayloadCapture cap distinct from the base RingTrace cap;
- be a separately audited grant;
- carry a per-call byte budget enforced by the kernel.
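The per-call byte budget could be enforced at capture time as in this sketch, assuming "per-call" means per traced ring call. PayloadCapture, CapturedPayload, and capture_payload are hypothetical names, and the audited grant itself is not modeled. Without a presented PayloadCapture cap no bytes are copied at all; with one, the copy is cut at the budget and the truncation is reported.

```rust
/// Hypothetical stronger cap; holding a RingTrace alone never yields one.
pub struct PayloadCapture {
    per_call_budget: usize,
}

impl PayloadCapture {
    pub fn new(per_call_budget: usize) -> Self {
        PayloadCapture { per_call_budget }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub struct CapturedPayload {
    pub bytes: Vec<u8>,
    pub truncated: usize, // bytes withheld because of the budget
}

/// Payload bytes for one traced call. Without a PayloadCapture cap
/// nothing is copied; with one, the copy stops at the budget.
pub fn capture_payload(payload: &[u8], cap: Option<&PayloadCapture>) -> Option<CapturedPayload> {
    let cap = cap?;
    let kept = payload.len().min(cap.per_call_budget);
    Some(CapturedPayload {
        bytes: payload[..kept].to_vec(),
        truncated: payload.len() - kept,
    })
}
```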
Hazard Preflight
paging/MMIO: cap-table snapshots and ring-trace records read kernel state under existing locks. No new user-mapping or MMIO surface is introduced in Phases 1–3.
ABI: DebugSession, CapTableSnapshot, and RingTrace are new
schema interfaces. Generated bindings must be refreshed via
make generated-code-check before merging any Phase 1 branch.
authority transfer via snapshot: the critical invariant is that no
CapSlotSnapshot entry can be used by the inspector to call or transfer
a capability. The kernel must enforce that the snapshot data path does
not return live cap references — only metadata fields (interface ID,
label, state). This must be verified in the Phase 1 implementation review.
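The snapshot invariant can be enforced structurally: if CapSlotSnapshot has no field capable of holding a kernel object reference, no code path can leak one through it. In the sketch below, KernelObject, CapEntry, and the sensitive flag are hypothetical stand-ins for the kernel's cap-table entry and redaction policy. The builder copies only metadata, omits sensitive slots, applies a slot budget, and counts both kinds of omission in snapshot_drop, as the schema comment describes.

```rust
use std::sync::Arc;

/// Stand-in for the live kernel object behind a cap slot. Never exported.
pub trait KernelObject {
    fn interface_id(&self) -> u64;
    fn method_count(&self) -> u16;
}

pub enum SlotState {
    Live,
    Released,
    PendingReturn,
}

/// Hypothetical kernel-side cap-table entry.
pub struct CapEntry {
    pub object: Option<Arc<dyn KernelObject>>,
    pub label: String,
    pub state: SlotState,
    pub sensitive: bool, // redaction policy hides this slot
}

/// Metadata only: no field can carry an object reference.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CapSlotSnapshot {
    pub slot_index: u32,
    pub interface_id: u64,
    pub method_count: u16,
    pub label: String,
    pub state: &'static str,
}

#[derive(Debug, PartialEq, Eq)]
pub struct CapTableSnapshot {
    pub target_pid: u32,
    pub tick: u64,
    pub slots: Vec<CapSlotSnapshot>,
    pub slot_total: u32,
    pub slot_used: u32,
    pub snapshot_drop: u32,
}

pub fn cap_table_snapshot(
    target_pid: u32,
    tick: u64,
    table: &[Option<CapEntry>],
    max_slots: usize,
) -> CapTableSnapshot {
    let mut slots = Vec::new();
    let (mut used, mut dropped) = (0u32, 0u32);
    for (index, entry) in table.iter().enumerate() {
        let entry = match entry {
            Some(e) => e,
            None => continue,
        };
        used += 1;
        // Redaction and budget are applied while the view is built.
        if entry.sensitive || slots.len() >= max_slots {
            dropped += 1;
            continue;
        }
        let (interface_id, method_count) = entry
            .object
            .as_ref()
            .map_or((0, 0), |o| (o.interface_id(), o.method_count()));
        slots.push(CapSlotSnapshot {
            slot_index: index as u32,
            interface_id,
            method_count,
            label: entry.label.clone(),
            state: match entry.state {
                SlotState::Live => "live",
                SlotState::Released => "released",
                SlotState::PendingReturn => "pending-return",
            },
        });
    }
    CapTableSnapshot {
        target_pid,
        tick,
        slots,
        slot_total: table.len() as u32,
        slot_used: used,
        snapshot_drop: dropped,
    }
}
```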
audit bypass: an inspector must not be able to suppress or delay audit records for its own actions. Audit writes must occur synchronously within the debug session dispatch path, not deferred.
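One way to keep audit writes synchronous is to append the record inside the dispatch path, before the operation runs, and to fail the call if the append fails. In this sketch AuditLog, SessionState, DebugOp, and dispatch are hypothetical names, and a fixed-capacity Vec stands in for the audit sink. The inspector cannot reach the operation without a committed audit entry.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DebugOp {
    CapTableSnapshot,
    ArmRingTrace,
    ArmSampler,
    Detach,
}

#[derive(Debug, PartialEq, Eq)]
pub enum DispatchError {
    AuditUnavailable,
    Detached,
}

/// Fixed-capacity stand-in for the AuditLog sink.
pub struct AuditLog {
    entries: Vec<(u64, u32, DebugOp)>, // (session id, target pid, operation)
    capacity: usize,
}

impl AuditLog {
    pub fn with_capacity(capacity: usize) -> Self {
        AuditLog { entries: Vec::with_capacity(capacity), capacity }
    }

    pub fn append(&mut self, session: u64, target: u32, op: DebugOp) -> Result<(), DispatchError> {
        if self.entries.len() >= self.capacity {
            return Err(DispatchError::AuditUnavailable);
        }
        self.entries.push((session, target, op));
        Ok(())
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}

pub struct SessionState {
    pub id: u64,
    pub target_pid: u32,
    pub detached: bool,
}

/// Every DebugSession method goes through here. The audit append is
/// inline and precedes `run`, so a failed append fails the call.
pub fn dispatch<T>(
    session: &mut SessionState,
    audit: &mut AuditLog,
    op: DebugOp,
    run: impl FnOnce() -> T,
) -> Result<T, DispatchError> {
    if session.detached {
        return Err(DispatchError::Detached);
    }
    audit.append(session.id, session.target_pid, op)?;
    if op == DebugOp::Detach {
        session.detached = true;
    }
    Ok(run())
}
```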
covert timing channel: a sampler that returns precise timestamps could be used to extract timing side-channel information about a target service. The sampler tick field is clamped to PIT-resolution granularity in Phase 3 to reduce precision; finer clock access for profiling remains deferred.
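Clamping can be as simple as rounding nanosecond timestamps down to the start of the timer period. The sketch assumes timestamps are in nanoseconds and that the granularity is one period of a hypothetical PIT interrupt rate passed in as pit_hz; the rate capOS actually programs is not stated here.

```rust
/// Nanoseconds per timer period for a given (hypothetical) PIT rate.
pub fn pit_period_ns(pit_hz: u64) -> u64 {
    // Guard against a zero rate and against rates above 1 GHz.
    (1_000_000_000 / pit_hz.max(1)).max(1)
}

/// Round a raw timestamp down to the start of its timer period, so two
/// samples taken inside the same period are indistinguishable.
pub fn clamp_tick_ns(raw_ns: u64, pit_hz: u64) -> u64 {
    let period = pit_period_ns(pit_hz);
    raw_ns - raw_ns % period
}
```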
Security Boundaries
- A DebugSession holder can read snapshots of one target. It cannot call, transfer, or activate any capability belonging to the target.
- A RingTrace holder can read ring metadata for one target. Payload bytes require a separate, stronger cap.
- A Sampler holder receives the PC and bounded stack frames for one target: no memory-mapped content and no register state beyond the PC.
- None of these caps admits cross-process inspection. A DebugSession for process A cannot observe process B.
DebugSessionfor process A cannot observe process B. - A debugged process remains subject to normal scheduler and capability enforcement. Being debugged does not grant the target any additional capability slots or authority.
- Redaction applies at snapshot construction time, not at read time. The kernel constructs the redacted view; the inspector never sees the raw kernel state.
Non-Goals
- No ambient ptrace-style process attach without authority.
- No kernel debugger (GDB stub, JTAG) exposed as a userspace capability surface; those are operator boot-time tools, not capability-model components.
- No replay semantics. Ring trace is inspection, not record/replay. Replay requires payload retention, timer modeling, and capability checkpoints; that is out of scope.
- No cross-process or system-wide trace aggregation in this proposal. Aggregate tracing is a monitoring concern covered by docs/proposals/system-monitoring-proposal.md.
- No memory read/write through a debug session. Address-space inspection is a separate and stronger authority not proposed here.
- No DebugSession self-grant. A process cannot debug itself through this interface.
- No crash/exception observation here. A read-only ExceptionObserver cap (the Zircon zx_task_create_exception_channel analog) for receiving crash notifications without debug-write authority is a separate, weaker authority owned by Crash Recovery and Supervision, not bundled into DebugSession.
Relevant Research and Prior Art
In-Tree Notes
- Debug, Trace, and Profiling Authority is the dedicated prior-art survey for this proposal: GDB remote serial protocol, Linux ptrace/Yama, perf/CAP_PERFMON, Fuchsia handle-scoped debug_agent/zxdb, seL4 TCB-cap hardware debug, and Genode CPU-session GDB monitor, grounding the DebugSession/Sampler/exception-observer authority split against real sources.
- docs/research/zircon.md documents Fuchsia’s handle model: handles are process-local references with a rights bitmask, there is no ambient authority, and a process can only interact with kernel objects through handles it holds. capOS draws the directly applicable lesson here: a DebugSession is a held capability, not an ambient privilege, and inspection of a target’s cap table is itself a distinct grantable authority rather than a side effect of holding a generic “debug” right. The note covers handle rights and transfer but not Fuchsia’s debug_agent/zxdb debugging service specifically; that service is now surveyed in the dedicated research note above (and summarized below).
- docs/research/sel4.md records that seL4 has no in-kernel debug traps or thread-introspection mechanism in the verified configuration; debugging is pushed to userspace, and the design constraints (typed authority, no ambient inspection) matter more than any debugger feature. capOS follows the same posture: keep the kernel surface to read-only snapshots and bounded capture, and route policy (who may attach, and to what) to userspace consent and the broker.
- docs/research/genode.md documents Genode’s session-and-label model, where every cross-component request carries a label and is mediated by a parent component. The applicable lesson is that attach authority should flow through the same parent/supervisor relationship that already governs spawning: a supervisor that holds a child’s ProcessHandle is the natural minter of a DebugSession for that child, mirroring Genode’s parent-mediated session routing rather than a global debugger service.
- docs/research/completion-ring-threading.md grounds the io_uring-style SQ/CQ ring transport that the Phase 2 ring trace observes. The trace records the same SQE/CQE structures already captured by the kernel debug_tap facility (RingCaptureRecord in capos_config::ring); this proposal adds the authority and consent layer that the existing build-gated emergency-serial capture lacks.
External Precedent
- GDB remote serial protocol (gdbserver). GDB separates the debugger front end from a target-side stub that exposes register, memory, and breakpoint operations over a serial/TCP channel. The lesson for capOS is that the inspection surface can be a narrow, well-defined protocol object rather than ambient access; but full register/memory read-write is exactly the strong authority that capOS defers to Phase 4 and keeps out of the read-only DebugSession.
- Linux ptrace(2). ptrace is the canonical ambient-authority footgun: attach authority derives from Unix UID and the Yama ptrace_scope sysctl rather than from a held, transferable capability, and a successful attach grants register and full address-space read/write at once. This conflates “may observe” with “may control” and bypasses higher-level access controls. capOS rejects this directly: DebugSession attach is owner-consented or broker-granted, audited, and read-only, and observation and control are separate authorities.
- Linux perf and eBPF tracing. Sampled profiling and tracing on Linux sit behind privilege boundaries (perf_event_paranoid, CAP_PERFMON/CAP_BPF) precisely because PC/stack sampling and kernel-wide tracing leak timing and topology information across trust boundaries. capOS treats the same risk as a capability and an audit event: the Sampler cap is scoped to one consented target, its timestamp resolution is clamped, and arming it is recorded.
- Fuchsia debug_agent/zxdb. Fuchsia’s debugger is a userspace service (debug_agent) that the zxdb front end drives; it operates on process and thread handles rather than ambient privilege, consistent with Zircon’s object-capability model. This is the closest external precedent for capOS’s intended shape: debugging as a handle/capability-mediated service, not a kernel-ambient right. A dedicated in-tree note on the debug_agent design is research-needed per the docs/backlog/research-design-gaps.md convention before the Phase 4 breakpoint/single-step surface is designed.
- Object-capability systems generally. Capability systems avoid an ambient ptrace analog because there is no global principal that implicitly dominates other processes; the authority to inspect must be granted like any other capability. This is the structural reason capOS can offer debugging without reintroducing ambient authority, and why the consent and audit requirements in this proposal are load-bearing rather than optional hardening.
Relevant Proposals
- System Monitoring (system-monitoring-proposal.md): owns aggregate ring traces (TraceCapture), the log/metric/audit signal taxonomy, and the RingTap move-semantics note for payload-capturing taps. This proposal owns the per-process debug attach authority and consent model that monitoring’s trace surfaces do not cover. The TraceRecord schema is shared; the authority and consent model is separate.
- Security and Verification (security-and-verification-proposal.md): the trust-boundary inventory (Track S.7) must be updated to include DebugSession, RingTrace, Sampler, and CapTableSnapshot as new boundaries before downstream services rely on them.
- System Performance Benchmarks (system-performance-benchmarks-proposal.md): benchmark runners may arm a Sampler before a workload run; this proposal defines the authority and consent model for that use.
- Task State and Agent Telemetry (task-state-and-agent-telemetry-proposal.md): agent maintenance sessions may use DebugSession to inspect service state; telemetry records that fact.