Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Authority Graph and Resource Accounting for Transfer

This document defines the authority graph and resource-accounting contract originally tracked as Security Verification Track S.9 in docs/proposals/security-and-verification-proposal.md. It covers:

  • capability transfer (xfer_cap_count, copy/move, rollback)
  • ProcessSpawner prerequisites (spawn quotas and result-cap insertion)

Security Verification Track S.9 is complete when this design contract is concrete enough to guide implementation. The invariants and acceptance criteria below are implementation gates for capability transfer, ProcessSpawner, Security Verification Track S.8, and Security Verification Track S.12 follow-up work, not requirements for declaring the Security Verification Track S.9 design artifact complete. Current capability-semantics follow-up items live in docs/backlog/stage-6-capability-semantics.md.

Current Implementation and Target Contract

The current implementation defines ResourceLedger fields in capos-lib/src/cap_table.rs for capability slots, outstanding calls, scratch bytes, frame-grant pages, and virtual-reservation pages. Cap-slot and frame/virtual page reservations are wired into current reservation paths. Outstanding-call and scratch-byte counters are present ledger fields but are not yet fully wired into reservation/preflight paths. Endpoint queue quota, diagnostic log-rate accounting, and CPU token-bucket accounting below are target contract fields for future implementation work, not current ResourceLedger members.

1. Authority Graph Model

Authority is modeled as a directed multigraph:

  • Nodes:
    • Process(Pid)
    • Object(ObjectId) (kernel object identity, independent of per-process CapId)
  • Edges:
    • Hold(Pid -> ObjectId) with metadata:
      • cap_id (table-local handle)
      • interface_id
      • badge
      • transfer_mode (copy, move, non_transferable)
      • origin (kernel, spawn_grant, ipc_transfer, result_cap)

Security invariant A1: all authority is represented by Hold edges; no operation can create object authority outside this graph.

Security invariant A2: each process mutates only its own CapTable edges except through explicit transfer/spawn transactions validated by the kernel.

Security invariant A3: for every live Hold edge there is exactly one cap_id slot in one process table referencing the object generation.

2. Per-Process Resource Ledger and Quotas

Each process owns a kernel-maintained ResourceLedger. For wired reservation paths, enforcement is fail-closed at reservation time (before side effects). The target contract completes enforcement for present-but-unwired fields and extends the ledger with endpoint queue, diagnostic log, and CPU budget counters.

ResourceLedger {
  // Current ledger fields.
  cap_slots_used / cap_slots_max
  outstanding_calls_used / outstanding_calls_max
  scratch_bytes_used / scratch_bytes_max
  frame_grant_pages_used / frame_grant_pages_max
  virtual_reservation_pages_used / virtual_reservation_pages_max

  // Target/future fields.
  endpoint_queue_used / endpoint_queue_max
  log_bytes_window_used / log_bytes_per_window (token bucket)
  cpu_time_us_window_used / cpu_budget_us_per_window (token bucket)
}

Initial quota profile for Stage 6/5.2 bring-up (tunable by kernel config):

  • cap_slots_max: 256
  • outstanding_calls_max: 64
  • scratch_bytes_max: 256 KiB
  • frame_grant_pages_max: 4096 pages (16 MiB at 4 KiB pages)
  • virtual_reservation_pages_max: kernel-configured virtual reservation budget
  • Future target fields: endpoint_queue_max 128 messages, log_bytes_per_window 64 KiB/sec with 256 KiB burst, and cpu_budget_us_per_window 10,000 us per 100,000 us window.

Security invariant Q1: no counter may exceed its max.

Security invariant Q2: every resource reservation has a matched release on all success, error, timeout, process-exit, and rollback paths.

Security invariant Q3: quota checks for transfer/spawn happen before mutating sender or receiver capability state.

3. Diagnostic Rate Limiting and Aggregation

Repeated invalid ring/cap submissions are aggregated per process and error key.

  • Key: (pid, error_code, opcode, cap_id_bucket)
  • Buckets:
    • cap_id_bucket = exact cap id for stale/invalid cap failures
    • cap_id_bucket = 0 for structural ring errors
  • Per-key token bucket: allow first N=4 emissions/sec, then suppress.
  • Suppressed counts are flushed once per second as one summary line:
    • pid=X invalid submissions suppressed=Y last_err=...

Security invariant D1: invalid submission floods cannot consume unbounded serial bandwidth or scheduler time in log formatting.

Security invariant D2: suppression never hides first-observation diagnostics for a new (pid,error,opcode,cap bucket) key.

4. Transfer and Rollback Semantics

Transfers (xfer_cap_count > 0) use a kernel transfer transaction (TransferTxn) scoped to a single SQE dispatch. The current ring ABI does not provide kernel-owned SQE sequence numbers or a durable transaction table, so userspace replay of a copy-transfer SQE is repeatable: each replay is treated as a new copy grant. Move-transfer replay fails closed after the source slot is removed or reserved by the first successful dispatch.

Future exactly-once replay suppression requires transaction identity scoped to (sender_pid, call_id, sqe_seq) and a monotonic transfer epoch. Until that exists, exactly-once claims apply only within one dispatch attempt, not across malicious rewrites of shared SQ ring indexes.

Sensitive interfaces must choose their transfer mode deliberately:

Transfer modeSemanticsSuitable forRequired negative tests
copyRepeatable grant; sender keeps authority and replaying the same copy-transfer SQE can mint another receiver hold.Stateless or explicitly shareable caps where duplicate receivers are acceptable and audited.Replay mints only allowed duplicate holds; quota exhaustion fails closed; copy across forbidden session/transfer scope is rejected.
moveSingle authority handoff; sender loses the source hold after successful destination insertion. Replay fails closed after source reservation/removal.Linear resources, accepted sockets, terminal sessions, one-shot result caps, and authority that should have one active owner.Replay after success fails; rollback restores sender on partial failure; receiver cannot observe authority before commit.
non_transferableNo IPC/spawn transfer.Process-local control caps, raw spawn/network/device authority, private keys, and caps whose authority depends on caller-local state.IPC/spawn transfer attempts fail closed and leave sender/receiver tables unchanged.

Copy-transfer replay is therefore acceptable only for caps whose interface contract says repeated receivers are safe. Sensitive caps must be move-only or non-transferable until the interface has an explicit replay threat model and hostile tests.

Phases:

  1. Prepare:
    • validate SQE transport fields and xfer_cap_count
    • validate sender ownership/generation/transferability for each exported cap
    • reserve receiver quota (cap_slots, outstanding_calls, scratch if needed)
    • pin sender entries in txn state (no sender table mutation yet)
  2. Commit:
    • insert destination edges exactly once
    • for copy: increment object refcount/export ref
    • for move: remove sender slot only after destination insertion succeeds
    • publish completion/result
  3. Finalize:
    • release transient reservations
    • mark txn terminal (committed or aborted)

On any error before Commit, rollback is full:

  • receiver inserts are not visible
  • sender slots/refcounts unchanged
  • reservations released
  • CQE returns transfer failure (CAP_ERR_TRANSFER_ABORTED / subtype)

On error during Commit, kernel executes compensating rollback to preserve exactly-once visibility: either all inserts are visible with matching sender state transition, or none are visible.

Security invariant T1: each transfer descriptor is applied at most once within a single SQE dispatch attempt.

Security invariant T2: move transfer is atomic from observer perspective; no state exists where both sender and receiver lose authority due to partial apply.

Security invariant T3: copy-transfer SQE replay is explicitly repeatable until kernel-owned transaction identity exists. Move-transfer replay fails closed after source removal or source reservation.

Security invariant T4: CAP_OP_RELEASE removes one local hold edge only from the caller table and decrements remote export refs exactly once.

5. Integration with 3.6 Capability Transfer

3.6 implementation must consume this design directly:

  • CALL and RETURN validate all currently-reserved transfer fields fail-closed when unsupported.
  • xfer_cap_count path is wired through TransferTxn (no ad hoc direct inserts).
  • Badge propagation is explicit in transfer descriptors and copied into destination edge metadata.
  • CAP_OP_RELEASE uses the same authority ledger and refcount bookkeeping.

3.6 acceptance criteria:

  1. Copy transfer produces one new receiver edge and retains sender edge.
  2. Move transfer produces one new receiver edge and deletes sender edge atomically.
  3. Any transfer failure leaves sender and receiver CapTables unchanged.
  4. Copy replay is an explicit repeatable-grant policy until a kernel-owned transaction identity is added; move replay fails closed after source removal or reservation.
  5. CAP_OP_RELEASE on stale/non-owned cap fails closed without mutating other process tables.

6. Integration with 5.2 ProcessSpawner Prerequisites

5.2 must use the same accounting and transfer machinery:

  • spawn() preflights child quotas (cap_slots, outstanding_calls, scratch, frame_grant_pages, endpoint queue baseline) before mapping child memory or scheduling.
  • Parent-provided CapGrant entries are inserted via the same transfer transaction semantics (copy for initial grants in 5.2.2).
  • Returned ProcessHandle is inserted through the standard result-cap insertion path and accounted as a normal cap slot.
  • Child setup rollback must unwind:
    • address space mappings
    • ring page
    • CapSet page
    • kernel stack
    • allocated frames
    • provisional capability edges/reservations

5.2 acceptance criteria:

  1. Spawn failure at any step leaves no child-visible process and no leaked ledger usage.
  2. Successful spawn accounts all child bootstrap resources within quotas.
  3. Parent and child cap-table accounting remains balanced under repeated spawn/exit cycles.
  4. ProcessHandle.wait and exit cleanup release outstanding-call/scratch/frame usage deterministically.

7. Implementation Notes for Verification Tracks

This design unblocks:

  • Security Verification Track S.8 hostile-input tests for quota and invalid-transfer failures.
  • Security Verification Track S.12 Kani bounds refresh for ledger and transfer invariants.
  • Target 12 in docs/proposals/security-and-verification-proposal.md with explicit allocator hooks and fail-closed exhaustion behavior.