# Authority Graph and Resource Accounting for Transfer

This document defines the authority graph and resource-accounting contract
originally tracked as Security Verification Track S.9 in
`docs/proposals/security-and-verification-proposal.md`. It covers:

- capability transfer (`xfer_cap_count`, copy/move, rollback)
- ProcessSpawner prerequisites (spawn quotas and result-cap insertion)


Security Verification Track S.9 is complete when this design contract is
concrete enough to guide implementation. The invariants and acceptance
criteria below are implementation gates for capability transfer,
ProcessSpawner, Security Verification Track S.8, and Security Verification
Track S.12 follow-up work, not requirements for declaring the Security
Verification Track S.9 design artifact complete. Current capability-semantics
follow-up items live in `docs/backlog/stage-6-capability-semantics.md`.

## Current Implementation and Target Contract

The current implementation defines `ResourceLedger` fields in
`capos-lib/src/cap_table.rs` for capability slots, outstanding calls, scratch
bytes, frame-grant pages, and virtual-reservation pages. Cap-slot and
frame/virtual page reservations are wired into current reservation paths.
Outstanding-call and scratch-byte counters are present ledger fields but are
not yet fully wired into reservation/preflight paths. Endpoint queue quota,
diagnostic log-rate accounting, and CPU token-bucket accounting below are
target contract fields for future implementation work, not current
`ResourceLedger` members.

## 1. Authority Graph Model

Authority is modeled as a directed multigraph:

- Nodes:
  - `Process(Pid)`
  - `Object(ObjectId)` (kernel object identity, independent of per-process `CapId`)
- Edges:
  - `Hold(Pid -> ObjectId)` with metadata:
    - `cap_id` (table-local handle)
    - `interface_id`
    - `badge`
    - `transfer_mode` (`copy`, `move`, `non_transferable`)
    - `origin` (`kernel`, `spawn_grant`, `ipc_transfer`, `result_cap`)

Security invariant A1: all authority is represented by `Hold` edges; no
operation can create object authority outside this graph.

Security invariant A2: each process mutates only its own `CapTable` edges except
through explicit transfer/spawn transactions validated by the kernel.

Security invariant A3: for every live `Hold` edge there is exactly one
`cap_id` slot in one process table referencing the object generation.

## 2. Per-Process Resource Ledger and Quotas

Each process owns a kernel-maintained `ResourceLedger`. For wired reservation
paths, enforcement is fail-closed at reservation time (before side effects).
The target contract completes enforcement for present-but-unwired fields and
extends the ledger with endpoint queue, diagnostic log, and CPU budget
counters.

```text
ResourceLedger {
  // Current ledger fields.
  cap_slots_used / cap_slots_max
  outstanding_calls_used / outstanding_calls_max
  scratch_bytes_used / scratch_bytes_max
  frame_grant_pages_used / frame_grant_pages_max
  virtual_reservation_pages_used / virtual_reservation_pages_max

  // Target/future fields.
  endpoint_queue_used / endpoint_queue_max
  log_bytes_window_used / log_bytes_per_window (token bucket)
  cpu_time_us_window_used / cpu_budget_us_per_window (token bucket)
}
```

Initial quota profile for Stage 6/5.2 bring-up (tunable by kernel config):

- `cap_slots_max`: 256
- `outstanding_calls_max`: 64
- `scratch_bytes_max`: 256 KiB
- `frame_grant_pages_max`: 4096 pages (16 MiB at 4 KiB pages)
- `virtual_reservation_pages_max`: kernel-configured virtual reservation
  budget
- Future target fields: `endpoint_queue_max` 128 messages,
  `log_bytes_per_window` 64 KiB/sec with 256 KiB burst, and
  `cpu_budget_us_per_window` 10,000 us per 100,000 us window.

Security invariant Q1: no counter may exceed its max.

Security invariant Q2: every resource reservation has a matched release on all
success, error, timeout, process-exit, and rollback paths.

Security invariant Q3: quota checks for transfer/spawn happen before mutating
sender or receiver capability state.

## 3. Diagnostic Rate Limiting and Aggregation

Repeated invalid ring/cap submissions are aggregated per process and error key.

- Key: `(pid, error_code, opcode, cap_id_bucket)`
- Buckets:
  - `cap_id_bucket = exact cap id` for stale/invalid cap failures
  - `cap_id_bucket = 0` for structural ring errors
- Per-key token bucket: allow first `N=4` emissions/sec, then suppress.
- Suppressed counts are flushed once per second as one summary line:
  - `pid=X invalid submissions suppressed=Y last_err=...`

Security invariant D1: invalid submission floods cannot consume unbounded serial
bandwidth or scheduler time in log formatting.

Security invariant D2: suppression never hides first-observation diagnostics for
a new `(pid,error,opcode,cap bucket)` key.

## 4. Transfer and Rollback Semantics

Transfers (`xfer_cap_count > 0`) use a kernel transfer transaction
(`TransferTxn`) scoped to a single SQE dispatch. The current ring ABI does not
provide kernel-owned SQE sequence numbers or a durable transaction table, so
userspace replay of a copy-transfer SQE is repeatable: each replay is treated
as a new copy grant. Move-transfer replay fails closed after the source slot is
removed or reserved by the first successful dispatch.

Future exactly-once replay suppression requires transaction identity scoped to
`(sender_pid, call_id, sqe_seq)` and a monotonic transfer epoch. Until that
exists, exactly-once claims apply only within one dispatch attempt, not across
malicious rewrites of shared SQ ring indexes.

Sensitive interfaces must choose their transfer mode deliberately:

| Transfer mode | Semantics | Suitable for | Required negative tests |
| --- | --- | --- | --- |
| `copy` | Repeatable grant; sender keeps authority and replaying the same copy-transfer SQE can mint another receiver hold. | Stateless or explicitly shareable caps where duplicate receivers are acceptable and audited. | Replay mints only allowed duplicate holds; quota exhaustion fails closed; copy across forbidden session/transfer scope is rejected. |
| `move` | Single authority handoff; sender loses the source hold after successful destination insertion. Replay fails closed after source reservation/removal. | Linear resources, accepted sockets, terminal sessions, one-shot result caps, and authority that should have one active owner. | Replay after success fails; rollback restores sender on partial failure; receiver cannot observe authority before commit. |
| `non_transferable` | No IPC/spawn transfer. | Process-local control caps, raw spawn/network/device authority, private keys, and caps whose authority depends on caller-local state. | IPC/spawn transfer attempts fail closed and leave sender/receiver tables unchanged. |

Copy-transfer replay is therefore acceptable only for caps whose interface
contract says repeated receivers are safe. Sensitive caps must be move-only or
non-transferable until the interface has an explicit replay threat model and
hostile tests.

Phases:

1. `Prepare`:
   - validate SQE transport fields and `xfer_cap_count`
   - validate sender ownership/generation/transferability for each exported cap
   - reserve receiver quota (`cap_slots`, `outstanding_calls`, scratch if needed)
   - pin sender entries in txn state (no sender table mutation yet)
2. `Commit`:
   - insert destination edges exactly once
   - for `copy`: increment object refcount/export ref
   - for `move`: remove sender slot only after destination insertion succeeds
   - publish completion/result
3. `Finalize`:
   - release transient reservations
   - mark txn terminal (`committed` or `aborted`)

On any error before `Commit`, rollback is full:

- receiver inserts are not visible
- sender slots/refcounts unchanged
- reservations released
- CQE returns transfer failure (`CAP_ERR_TRANSFER_ABORTED` / subtype)

On error during `Commit`, kernel executes compensating rollback to preserve
exactly-once visibility: either all inserts are visible with matching sender
state transition, or none are visible.

Security invariant T1: each transfer descriptor is applied at most once within a
single SQE dispatch attempt.

Security invariant T2: move transfer is atomic from observer perspective; no
state exists where both sender and receiver lose authority due to partial apply.

Security invariant T3: copy-transfer SQE replay is explicitly repeatable until
kernel-owned transaction identity exists. Move-transfer replay fails closed
after source removal or source reservation.

Security invariant T4: `CAP_OP_RELEASE` removes one local hold edge only from
the caller table and decrements remote export refs exactly once.

## 5. Integration with 3.6 Capability Transfer

3.6 implementation must consume this design directly:

- `CALL` and `RETURN` validate all currently-reserved transfer fields fail-closed
  when unsupported.
- `xfer_cap_count` path is wired through `TransferTxn` (no ad hoc direct inserts).
- Badge propagation is explicit in transfer descriptors and copied into
  destination edge metadata.
- `CAP_OP_RELEASE` uses the same authority ledger and refcount bookkeeping.

3.6 acceptance criteria:

1. Copy transfer produces one new receiver edge and retains sender edge.
2. Move transfer produces one new receiver edge and deletes sender edge atomically.
3. Any transfer failure leaves sender and receiver `CapTable`s unchanged.
4. Copy replay is an explicit repeatable-grant policy until a kernel-owned
   transaction identity is added; move replay fails closed after source removal
   or reservation.
5. `CAP_OP_RELEASE` on stale/non-owned cap fails closed without mutating other
   process tables.

## 6. Integration with 5.2 ProcessSpawner Prerequisites

5.2 must use the same accounting and transfer machinery:

- `spawn()` preflights child quotas (`cap_slots`, `outstanding_calls`,
  `scratch`, `frame_grant_pages`, endpoint queue baseline) before mapping child
  memory or scheduling.
- Parent-provided `CapGrant` entries are inserted via the same transfer
  transaction semantics (copy for initial grants in 5.2.2).
- Returned `ProcessHandle` is inserted through the standard result-cap insertion
  path and accounted as a normal cap slot.
- Child setup rollback must unwind:
  - address space mappings
  - ring page
  - CapSet page
  - kernel stack
  - allocated frames
  - provisional capability edges/reservations

5.2 acceptance criteria:

1. Spawn failure at any step leaves no child-visible process and no leaked
   ledger usage.
2. Successful spawn accounts all child bootstrap resources within quotas.
3. Parent and child cap-table accounting remains balanced under repeated
   spawn/exit cycles.
4. `ProcessHandle.wait` and exit cleanup release outstanding-call/scratch/frame
   usage deterministically.

## 7. Implementation Notes for Verification Tracks

This design unblocks:

- Security Verification Track S.8 hostile-input tests for quota and
  invalid-transfer failures.
- Security Verification Track S.12 Kani bounds refresh for ledger and transfer
  invariants.
- Target 12 in `docs/proposals/security-and-verification-proposal.md` with explicit
  allocator hooks and fail-closed exhaustion behavior.
