Proposal: Durable Hardware Audit Log Persistence

How the HardwareAuditLog capability moves from a bounded volatile in-kernel ring to durable, tamper-evident audit storage without claiming authority it does not have.

Problem

HardwareAuditLog today is a read-only observer over the four hardware authority caps (DeviceMmio, Interrupt, DMAPool, DMABuffer). The kernel emits one cap-audit: line per lifecycle event and appends a copy into a fixed-size volatile ring (capacity 64, drop-oldest). A manifest-granted snapshot method exposes a bounded window (cursor, truncation labels, drop counter) to a single subscriber. The snapshot result already advertises its own limits: persistence_status=volatile-only, signature_status=unsigned, subscriber_admission_status=production-admission-policy-not-implemented.

That is honest groundwork, but it has three gaps that block any claim of durable audit evidence:

  1. No durability. The ring lives in kernel RAM. A reboot, crash, or QEMU exit loses every record. Audit evidence that vanishes on restart cannot support post-incident review.
  2. No tamper-evidence. Records are unsigned and unchained. A future consumer reading persisted bytes cannot tell whether history was edited, reordered, or truncated.
  3. No production admission policy. Exactly one manifest-granted reader gets a volatile snapshot. There is no model for multiple subscribers, scoped read windows, or revocation.

This proposal selects a concrete design for all three so the blocked task docs/tasks/blocked/ddf-audit-cap-durable-persistence.md can proceed to implementation. It is a design decision, not a kernel change.

Scope and Non-Claims

This proposal is deliberately narrow. It is observer-evidence design only.

  • Audit persistence records authority events. It does not grant, gate, or imply authority. The authority checks stay in the device-manager and cap-object paths exactly where they are now.
  • Durable audit is not IOMMU isolation. It does not bound DMA, validate MMIO ranges, or constrain interrupt routes. It records that those events happened.
  • Durable audit is not provider-driver readiness. A persisted audit trail does not make a userspace driver production-ready; it makes the driver’s hardware-cap lifecycle reviewable.
  • Tamper-evidence is detection, not prevention. A signed, hash-chained log proves history was not edited if verification passes; it cannot stop a privileged writer from refusing to append. Availability of the audit path is a separate concern.
  • The durable path must not depend on volatile QEMU-only state, the qemu cargo feature proof rings, or local run telemetry. Those remain harness scaffolding.

Design Grounding

  • docs/tasks/blocked/ddf-audit-cap-durable-persistence.md: acceptance criteria and hazard preflight this proposal answers.
  • docs/proposals/cryptography-and-key-management-proposal.md: SymmetricKey (mac/verify), PrivateKey (sign), KeySource, and KeyVault primitives consumed for tamper-evidence and key lifecycle.
  • docs/proposals/storage-and-naming-proposal.md: capability-native Store, append-only File/ledger semantics, content hashing, previous-record hash chaining, and stale-write rules consumed for the durable ring.
  • docs/proposals/system-monitoring-proposal.md: audit as a distinct append-only record type with its own readers and retention, X.740 audit field model, and “observation is authority” principle.
  • docs/dma-isolation-design.md and docs/plans/device-driver-foundation.md: the device-driver foundation context the hardware authority caps live in.
  • kernel/src/cap/hardware_audit.rs: the current volatile-ring behavior this design preserves and extends.

Design

1. Durable Audit-Record Ring

The durable audit path is a two-tier structure: the existing bounded in-kernel volatile ring stays as a fast-path staging buffer, and a userspace audit log service owns durable persistence behind the capability-native Store interface.

flowchart LR
    DM[Device manager and<br/>hardware cap objects] -->|emit_cap_audit| KR[Kernel volatile ring<br/>capacity 64, drop-oldest]
    KR -->|snapshot cursor poll| ALS[Audit log service<br/>userspace]
    ALS -->|append-only records| ST[(Store / append-only<br/>ledger segment)]
    ALS -->|sealed segment digest| KV[KeyVault / KeySource]
    ALS -->|scoped read window| SUB[Admitted subscribers]

Why a userspace service, not kernel-side disk I/O. Durable storage means a block device, a filesystem-like layout, segment rotation, and signing. None of that belongs in the kernel: the kernel’s job is dispatch and isolation. The kernel keeps doing exactly what it does today — bounded, alloc-free, lock-light ring emission — and a userspace audit log service drains it through the existing snapshot cursor. This also keeps the durable path off any QEMU-only state: the service persists through Store, which is backed by a real BlockDevice (or a cloud bridge adapter) per the storage proposal.

Drain protocol. The audit log service polls HardwareAuditLog.snapshot with a monotonic start_sequence cursor. Each poll returns the window since the last durably-committed sequence. The service:

  1. Reads the snapshot window and the dropped_records counter.
  2. Appends each record to the current segment (see rotation below).
  3. Advances its cursor to next_sequence only after the segment write is durably committed (Store sync).

If the kernel ring drops records between polls (dropped_records advanced by more than the records the service consumed), the service writes a gap marker record into the durable log: { kind: gap, lost_count, observed_at }. A gap is itself audit evidence — it is recorded, not hidden. The drop-oldest behavior of the kernel ring is therefore preserved and made visible in the durable log rather than silently lost.
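
The drain loop and gap marker can be sketched as below. This is a minimal model under stated assumptions, not the kernel or service code: KernelRing, AuditLogService, and DurableEntry are hypothetical names, the durable segment is a Vec standing in for a Store-backed segment, the gap is detected by comparing the first available sequence with the cursor (a simplification of the dropped_records comparison described above), and observed_at is omitted because the sketch has no clock.

```rust
use std::collections::VecDeque;

/// Hypothetical model of the kernel's bounded, drop-oldest volatile ring.
struct KernelRing {
    capacity: usize, // assumed to be greater than zero
    records: VecDeque<(u64, String)>, // (sequence, cap-audit line)
    next_sequence: u64,
    dropped_records: u64,
}

impl KernelRing {
    fn new(capacity: usize) -> Self {
        Self { capacity, records: VecDeque::new(), next_sequence: 0, dropped_records: 0 }
    }

    /// Never blocks: when the ring is full, the oldest record is dropped and counted.
    fn emit(&mut self, line: &str) {
        if self.records.len() == self.capacity {
            self.records.pop_front();
            self.dropped_records += 1;
        }
        self.records.push_back((self.next_sequence, line.to_string()));
        self.next_sequence += 1;
    }

    /// Records at or after `start_sequence` that are still in the ring.
    fn snapshot(&self, start_sequence: u64) -> Vec<(u64, String)> {
        self.records.iter().filter(|(seq, _)| *seq >= start_sequence).cloned().collect()
    }
}

/// Entries of the durable log: ordinary records plus explicit gap markers.
#[derive(Debug, PartialEq)]
enum DurableEntry {
    Record { sequence: u64, line: String },
    Gap { lost_count: u64 },
}

/// Hypothetical userspace drain side.
struct AuditLogService {
    cursor: u64, // first sequence not yet durably committed
    segment: Vec<DurableEntry>, // stand-in for the Store-backed segment
}

impl AuditLogService {
    fn new() -> Self {
        Self { cursor: 0, segment: Vec::new() }
    }

    fn drain(&mut self, ring: &KernelRing) {
        let window = ring.snapshot(self.cursor);
        let first_available = match window.first() {
            Some((sequence, _)) => *sequence,
            None => return, // nothing new since the last commit
        };
        // Records between the cursor and the first available one were dropped.
        if first_available > self.cursor {
            self.segment.push(DurableEntry::Gap { lost_count: first_available - self.cursor });
        }
        for (sequence, line) in window {
            self.segment.push(DurableEntry::Record { sequence, line });
        }
        // A real service would sync the Store here and advance only on success.
        self.cursor = ring.next_sequence;
    }
}
```

With a ring of capacity 4, draining after three records and again after seven more leaves a Gap entry with lost_count 3 in front of the four surviving records.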

Retention and rotation. The durable log is a sequence of fixed-size segments (proposed 1 MiB each; an implementation tuning parameter, not an ABI). When a segment fills:

  1. The service computes the segment digest (see tamper-evidence below).
  2. It seals the segment (digest + chain link recorded).
  3. It opens the next segment, whose first record carries the previous segment’s digest as prev_segment_digest.
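
The seal-and-open steps above can be sketched as follows. This is an illustrative model only: SegmentWriter, SealedSegment, and stand_in_digest are hypothetical names, the digest is FNV-1a (a non-cryptographic stand-in chosen so the sketch needs only the standard library), and the previous digest is held on the writer instead of being written into the next segment's first record.

```rust
/// FNV-1a, used ONLY as a stand-in; a real service would use a cryptographic hash.
fn stand_in_digest(bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(0xcbf2_9ce4_8422_2325u64, |h, b| (h ^ u64::from(*b)).wrapping_mul(0x0000_0100_0000_01b3))
}

#[derive(Debug, PartialEq)]
struct SealedSegment {
    index: u64,
    record_count: usize,
    digest: u64,
    prev_segment_digest: u64,
}

/// Hypothetical writer that seals the active segment when it would overflow.
struct SegmentWriter {
    max_bytes: usize, // tuning parameter (the proposal suggests 1 MiB)
    index: u64,
    prev_segment_digest: u64, // 0 for the very first segment
    active: Vec<Vec<u8>>,
    active_bytes: usize,
    sealed: Vec<SealedSegment>,
}

impl SegmentWriter {
    fn new(max_bytes: usize) -> Self {
        Self { max_bytes, index: 0, prev_segment_digest: 0, active: Vec::new(), active_bytes: 0, sealed: Vec::new() }
    }

    fn append(&mut self, record: &[u8]) {
        // Seal first if this record would push the active segment past its size.
        if !self.active.is_empty() && self.active_bytes + record.len() > self.max_bytes {
            self.seal();
        }
        self.active.push(record.to_vec());
        self.active_bytes += record.len();
    }

    fn seal(&mut self) {
        // The digest covers the previous segment's digest, then every record in order.
        let mut bytes = self.prev_segment_digest.to_le_bytes().to_vec();
        for record in &self.active {
            bytes.extend_from_slice(record);
        }
        let digest = stand_in_digest(&bytes);
        self.sealed.push(SealedSegment {
            index: self.index,
            record_count: self.active.len(),
            digest,
            prev_segment_digest: self.prev_segment_digest,
        });
        // Open the next segment, linked to the one just sealed.
        self.prev_segment_digest = digest;
        self.index += 1;
        self.active.clear();
        self.active_bytes = 0;
    }
}
```

Because each digest is seeded with the previous segment's digest, reordering or replacing a sealed segment changes every digest after it.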

Retention is both count-bounded and age-bounded: a sealed segment is kept only while it is among the N newest sealed segments (proposed default 64) and is newer than T (proposed default 30 days), so whichever bound retains fewer segments applies. The bound is a manifest-configurable policy on the audit log service, not a kernel constant.

Overflow policy. Two distinct overflow points, two distinct policies:

  • Kernel ring → service drain lag. Drop-oldest, as today, with a recorded gap marker. Rationale: the kernel ring must never block a hardware cap lifecycle path on a slow or absent consumer. Audit emission is best-effort by construction; the gap marker makes the loss auditable.
  • Durable segment retention limit. Drop-oldest sealed segment, with a retention-eviction record appended to the active segment naming the evicted segment’s digest and sequence range. Rationale: an operator querying “what did we lose to retention” gets a definite answer, and the hash chain stays intact across the eviction (the eviction record links forward; the evicted segment’s digest is permanently recorded before deletion).

Backpressure is explicitly rejected for both points. Backpressuring a hardware authority cap on audit-storage latency would let a stalled disk wedge device lifecycle — an availability and correctness hazard far worse than a recorded gap. Audit is evidence over authority, never a gate on it.
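
A sketch of the durable-retention side of this policy follows. It assumes the sealed segments are held oldest-first and reads the count and age bounds as jointly applied (a segment is evicted as soon as either bound is exceeded); RetentionPolicy, RetentionEviction, and apply_retention are hypothetical names, and timestamps are plain seconds to keep the example clock-free.

```rust
#[derive(Debug, Clone, PartialEq)]
struct SealedSegment {
    index: u64,
    first_sequence: u64,
    last_sequence: u64,
    digest: [u8; 32],
    sealed_at_secs: u64,
}

/// Record appended to the active segment before an evicted segment is deleted.
#[derive(Debug, PartialEq)]
struct RetentionEviction {
    evicted_index: u64,
    evicted_digest: [u8; 32],
    first_sequence: u64,
    last_sequence: u64,
}

struct RetentionPolicy {
    max_segments: usize, // proposed default: 64
    max_age_secs: u64,   // proposed default: 30 days
}

/// Removes segments that exceed either bound, oldest first, and returns one
/// eviction record per removed segment for the caller to append.
fn apply_retention(
    sealed: &mut Vec<SealedSegment>,
    policy: &RetentionPolicy,
    now_secs: u64,
) -> Vec<RetentionEviction> {
    let mut evictions = Vec::new();
    while let Some(oldest) = sealed.first() {
        let over_count = sealed.len() > policy.max_segments;
        let too_old = now_secs.saturating_sub(oldest.sealed_at_secs) > policy.max_age_secs;
        if !(over_count || too_old) {
            break;
        }
        let segment = sealed.remove(0);
        evictions.push(RetentionEviction {
            evicted_index: segment.index,
            evicted_digest: segment.digest,
            first_sequence: segment.first_sequence,
            last_sequence: segment.last_sequence,
        });
    }
    evictions
}
```

Nothing here blocks the append path: eviction only produces records, which is what lets an operator later ask what was lost to retention and get a definite answer.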

Crash-recovery semantics. On audit log service restart:

  1. The service scans sealed segments oldest-to-newest, verifying each segment digest and the prev_segment_digest chain link.
  2. It finds the last segment. If the last segment is unsealed, it replays its records, recomputing the running digest; a torn final record (incomplete write) is truncated at the last valid record boundary and a recovery_truncation marker is appended.
  3. It re-derives the drain cursor from the highest durably-committed sequence and resumes polling the kernel ring from there.

Records lost in the window between the last durable commit and the crash are not recoverable — the kernel ring is volatile and a crash loses it. This is an explicit, accepted limitation: see Assumptions. The recovery markers make the boundary of trustworthy history explicit to any consumer.
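
Step 2 of the recovery scan, truncating a torn final record, can be illustrated with a small sketch. The on-disk framing is not specified by this proposal, so the example assumes a hypothetical layout of a 4-byte little-endian length followed by the payload; frame, recover_unsealed, and Recovery are illustrative names.

```rust
/// Outcome of replaying an unsealed segment after a restart.
struct Recovery {
    records: Vec<Vec<u8>>, // complete records, in order
    valid_len: usize,      // byte offset of the last valid record boundary
    truncated: bool,       // true when a torn tail was found and cut off
}

/// Assumed framing: 4-byte little-endian length, then the payload.
fn frame(payload: &[u8]) -> Vec<u8> {
    let mut out = (payload.len() as u32).to_le_bytes().to_vec();
    out.extend_from_slice(payload);
    out
}

/// Walks the segment and stops at the first incomplete header or payload.
fn recover_unsealed(bytes: &[u8]) -> Recovery {
    let mut records = Vec::new();
    let mut pos = 0usize;
    while pos < bytes.len() {
        if bytes.len() - pos < 4 {
            break; // torn length header
        }
        let len = u32::from_le_bytes([bytes[pos], bytes[pos + 1], bytes[pos + 2], bytes[pos + 3]]) as usize;
        let start = pos + 4;
        if bytes.len() - start < len {
            break; // torn payload
        }
        records.push(bytes[start..start + len].to_vec());
        pos = start + len;
    }
    Recovery { records, valid_len: pos, truncated: pos < bytes.len() }
}
```

When truncated is true, the service would cut the segment at valid_len and append a recovery_truncation marker before resuming. A real format would likely also carry a per-record checksum so that a corrupt but complete-length tail is caught; that is left out here.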

2. Tamper-Evidence and Signing

Tamper-evidence is a hash chain plus segment signing, consuming the cryptography/key-management proposal’s primitives. No new crypto is invented here.

Per-record chaining. Each durable audit record carries prev_record_hash — a hash over the previous record’s canonical bytes. This is exactly the append-only-ledger pattern the storage proposal already prescribes (“append new records with previous-record hashes rather than rewriting history”). Editing or reordering any record breaks every subsequent prev_record_hash, so a verifier walking the chain detects the first divergence.
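
A minimal sketch of the chain walk, assuming a hypothetical record layout and canonical encoding. stand_in_hash is FNV-1a, used only so the example runs on the standard library alone; it is not collision-resistant and a real log would use the cryptographic hash selected by the storage and key-management proposals.

```rust
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Non-cryptographic stand-in hash (FNV-1a), for illustration only.
fn stand_in_hash(bytes: &[u8]) -> u64 {
    bytes.iter().fold(FNV_OFFSET, |h, b| (h ^ u64::from(*b)).wrapping_mul(FNV_PRIME))
}

#[derive(Debug, Clone, PartialEq)]
struct ChainedRecord {
    sequence: u64,
    payload: String,
    prev_record_hash: u64, // 0 for the first record in this sketch
}

impl ChainedRecord {
    /// Hypothetical canonical encoding: fixed-width fields, then the payload.
    fn canonical_bytes(&self) -> Vec<u8> {
        let mut out = Vec::new();
        out.extend_from_slice(&self.sequence.to_le_bytes());
        out.extend_from_slice(&self.prev_record_hash.to_le_bytes());
        out.extend_from_slice(self.payload.as_bytes());
        out
    }
}

/// Appends a record that commits to the canonical bytes of its predecessor.
fn append(log: &mut Vec<ChainedRecord>, payload: &str) {
    let prev_record_hash = log.last().map_or(0, |r| stand_in_hash(&r.canonical_bytes()));
    let sequence = log.len() as u64;
    log.push(ChainedRecord { sequence, payload: payload.to_string(), prev_record_hash });
}

/// Index of the first record whose link does not match, or None if the chain holds.
fn first_divergence(log: &[ChainedRecord]) -> Option<usize> {
    let mut expected = 0u64;
    for (index, record) in log.iter().enumerate() {
        if record.prev_record_hash != expected {
            return Some(index);
        }
        expected = stand_in_hash(&record.canonical_bytes());
    }
    None
}
```

Editing record 1 is reported at index 2, the first record whose link no longer matches. Note that the final record has no successor committing to it, so in this model an edit to the last record is caught only once a later record or a segment digest covers it.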

Per-segment signing. When a segment is sealed, the audit log service computes the segment digest (a hash over the sealed record range, anchored on the running chain hash) and produces a signature over { segment_index, sequence_range, record_count, segment_digest, prev_segment_digest }. Two signing modes, selected by manifest policy:

  • MAC mode (default). A SymmetricKey with KeyPurpose.integrity produces an HMAC tag over the segment header via SymmetricKey.mac. Cheaper, no asymmetric key handling, sufficient when the verifier is trusted to hold the same key. Verification is SymmetricKey.verify.
  • Asymmetric mode. A sign-only PrivateKey produces a signature via PrivateKey.sign. Used when audit evidence must be verifiable by a consumer that should not be able to forge records (e.g. an external reviewer holding only the public key). Verification uses the corresponding PublicKey.verify.

The audit log service receives a signing-capable key cap (a SymmetricKey restricted to mac, or a PrivateKey restricted to sign) at manifest grant time. It never holds raw key material — the key is a capability object per the key-management design.

What signs what. The chain hash protects record order and content within and across segments. The segment signature protects the segment header, binding the digest, sequence range, and previous-segment digest under a key. Together: a verifier with the verification key can confirm that the sealed segments form an unbroken, unedited chain back to the first segment, and that each seal was produced by the holder of the signing key.
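
The division of labor between chain link and seal can be sketched as a verifier over sealed segment headers. Everything named here is an assumption for illustration: SealKey stands in for the signing-capable key cap (SymmetricKey.mac/verify or PrivateKey.sign with PublicKey.verify), the header encoding is hypothetical, and ToyKey is a keyed FNV-1a checksum that exists only to make the example runnable. ToyKey is NOT a MAC and provides no security.

```rust
#[derive(Debug, Clone, PartialEq)]
struct SegmentHeader {
    segment_index: u64,
    first_sequence: u64,
    last_sequence: u64,
    record_count: u64,
    segment_digest: [u8; 32],
    prev_segment_digest: [u8; 32], // all zeros for the first segment in this sketch
}

impl SegmentHeader {
    /// Hypothetical canonical encoding of the fields the seal binds.
    fn canonical_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(4 * 8 + 2 * 32);
        out.extend_from_slice(&self.segment_index.to_le_bytes());
        out.extend_from_slice(&self.first_sequence.to_le_bytes());
        out.extend_from_slice(&self.last_sequence.to_le_bytes());
        out.extend_from_slice(&self.record_count.to_le_bytes());
        out.extend_from_slice(&self.segment_digest);
        out.extend_from_slice(&self.prev_segment_digest);
        out
    }
}

/// Stand-in for the key capability granted to the audit log service.
trait SealKey {
    fn seal(&self, header: &[u8]) -> Vec<u8>;
    fn verify(&self, header: &[u8], tag: &[u8]) -> bool;
}

struct SealedHeader {
    header: SegmentHeader,
    tag: Vec<u8>,
}

/// Walks sealed segments oldest to newest; Err(i) names the first bad segment.
fn verify_chain<K: SealKey>(segments: &[SealedHeader], key: &K) -> Result<(), usize> {
    let mut expected_prev = [0u8; 32];
    for (index, sealed) in segments.iter().enumerate() {
        let bytes = sealed.header.canonical_bytes();
        let link_ok = sealed.header.prev_segment_digest == expected_prev;
        if !link_ok || !key.verify(&bytes, &sealed.tag) {
            return Err(index);
        }
        expected_prev = sealed.header.segment_digest;
    }
    Ok(())
}

/// Test double only: keyed FNV-1a. NOT a MAC, NOT secure.
struct ToyKey(u64);

impl SealKey for ToyKey {
    fn seal(&self, header: &[u8]) -> Vec<u8> {
        let mut input = self.0.to_le_bytes().to_vec();
        input.extend_from_slice(header);
        let h = input
            .iter()
            .fold(0xcbf2_9ce4_8422_2325u64, |h, b| (h ^ u64::from(*b)).wrapping_mul(0x0000_0100_0000_01b3));
        h.to_le_bytes().to_vec()
    }

    fn verify(&self, header: &[u8], tag: &[u8]) -> bool {
        self.seal(header).as_slice() == tag
    }
}
```

The link check catches a missing or reordered segment even when every tag is individually valid, and the tag check catches an edited header even when the links still line up.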

Key lifecycle.

  • Provenance. The signing key is produced by a KeySource and stored sealed in a KeyVault (per the key-management proposal). The manifest grants the audit log service a use capability for the key, not the vault.
  • Rotation. Keys rotate on a policy interval (proposed default 90 days) or on demand. Rotation is segment-aligned: a segment is always signed by exactly one key. The first segment after rotation records a key_rotation marker carrying the new key’s identifier (KeySource.info identifier — a label, not a secret) and the previous key’s identifier. A verifier follows the identifier sequence to know which key verifies which segment range.
  • Revocation. If a signing key is suspected compromised, it is revoked in the KeyVault. Revocation does not invalidate already-sealed segments — those remain verifiable against the (now-revoked) key, and the revocation itself is recorded as a key_revocation marker. What revocation prevents is future seals with that key. A consumer treats segments signed by a revoked key as “authentic at seal time, key later revoked” — still evidence, with a documented caveat.
  • What is NOT protected. Tamper-evidence cannot protect records the kernel ring dropped before the service drained them, cannot protect the crash-window records, and cannot prevent an attacker who holds the live signing key from forging new well-formed history going forward. It detects edits to already-sealed history. These limits are stated in Assumptions.
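
How a verifier might follow key_rotation markers is sketched below. The marker shape is hypothetical: it assumes markers are kept in segment order, that the initial key is recorded as a marker with no previous identifier, and that classification runs only after the seal itself has verified. KeyRotation, SealTrust, and the function names are illustrative.

```rust
use std::collections::HashSet;

/// Hypothetical shape of a key_rotation marker. Identifiers are labels, not secrets.
struct KeyRotation {
    first_segment: u64, // first segment sealed with `new_key_id`
    new_key_id: String,
    prev_key_id: Option<String>, // None for the initial key
}

/// How a consumer labels a segment whose seal has already verified.
#[derive(Debug, PartialEq)]
enum SealTrust {
    Authentic,
    AuthenticKeyLaterRevoked,
}

/// Key identifier that should verify `segment_index`: the latest rotation at or before it.
fn key_for_segment(rotations: &[KeyRotation], segment_index: u64) -> Option<&str> {
    rotations
        .iter()
        .rev()
        .find(|r| r.first_segment <= segment_index)
        .map(|r| r.new_key_id.as_str())
}

/// Checks that markers are strictly ordered and each names its predecessor's key.
fn rotations_consistent(rotations: &[KeyRotation]) -> bool {
    rotations.windows(2).all(|pair| {
        pair[0].first_segment < pair[1].first_segment
            && pair[1].prev_key_id.as_deref() == Some(pair[0].new_key_id.as_str())
    })
}

/// Revocation does not invalidate a verified seal; it attaches a caveat to it.
fn classify(key_id: &str, revoked: &HashSet<String>) -> SealTrust {
    if revoked.contains(key_id) {
        SealTrust::AuthenticKeyLaterRevoked
    } else {
        SealTrust::Authentic
    }
}
```

Because rotation is segment-aligned, the lookup needs only the segment index; no segment ever maps to two keys.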

3. Production Subscriber Admission Policy

Today exactly one manifest-granted reader gets a volatile snapshot. The production model keeps “observation is authority” but adds structure.

Reader caps are typed and scoped. The audit log service exposes readers as distinct capability objects, not a single shared snapshot method:

  • HardwareAuditReader — a read-only cap over a scoped window: a subscriber may be granted the full history, a single hardware-cap-tag slice (e.g. DMAPool events only), or a bounded recent window. Narrowing is structural — a narrower reader is a wrapper cap exposing less, per the capOS capability-model principle, not a rights bitmask.
  • The cap exposes snapshot (cursor-based, preserving the existing field model) and verify (returns segment-chain verification status so a subscriber can confirm tamper-evidence without holding the signing key, when the deployment uses asymmetric mode and grants the public verification key).
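
Structural narrowing can be pictured as one reader wrapping another. The sketch below is an in-process analogy, not the capability ABI: AuditReader, FullHistoryReader, and TagSliceReader are hypothetical names, and the record fields are reduced to what the example needs.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum CapTag {
    DeviceMmio,
    Interrupt,
    DmaPool,
    DmaBuffer,
}

#[derive(Debug, Clone, PartialEq)]
struct AuditRecord {
    sequence: u64,
    tag: CapTag,
    event: String,
}

/// Read-only surface shared by every reader, wide or narrow.
trait AuditReader {
    fn snapshot(&self, start_sequence: u64, max_records: usize) -> Vec<AuditRecord>;
}

/// Reader over the full history.
struct FullHistoryReader {
    log: Vec<AuditRecord>,
}

impl AuditReader for FullHistoryReader {
    fn snapshot(&self, start_sequence: u64, max_records: usize) -> Vec<AuditRecord> {
        self.log
            .iter()
            .filter(|r| r.sequence >= start_sequence)
            .take(max_records)
            .cloned()
            .collect()
    }
}

/// Narrower reader: wraps another reader and exposes a single tag's slice.
/// The holder has no path to the inner reader, so there is no bit to flip.
struct TagSliceReader<R: AuditReader> {
    inner: R,
    tag: CapTag,
}

impl<R: AuditReader> AuditReader for TagSliceReader<R> {
    fn snapshot(&self, start_sequence: u64, max_records: usize) -> Vec<AuditRecord> {
        self.inner
            .snapshot(start_sequence, usize::MAX)
            .into_iter()
            .filter(|r| r.tag == self.tag)
            .take(max_records)
            .collect()
    }
}
```

A bounded recent-window reader would follow the same pattern, wrapping an inner reader and clamping start_sequence instead of filtering by tag.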

Admission is manifest-declared, with a runtime broker path. Two tiers:

  • Manifest-declared subscribers. The boot manifest declares which services receive which scoped reader caps, exactly like every other capability grant. This is the baseline and covers the monitoring/audit service itself.
  • Runtime-admitted subscribers. A later phase may route audit-reader requests through the userspace authority broker (docs/proposals/userspace-authority-broker-proposal.md), so an operator session can be granted a scoped, time-bounded reader without a reboot. This is explicitly future work, gated on the broker; it is named here so the admission model has a forward path, not so it ships in the first slice.

Revocation. Reader caps are ordinary caps and are revoked the ordinary way (cap-table teardown). Revoking a reader does not touch the durable log.

4. Preservation of Existing Volatile-Snapshot Behavior

The kernel-side volatile ring and its snapshot ABI are preserved unchanged as the staging tier:

  • The bounded ring (capacity 64), head/len/next_sequence/dropped_records bookkeeping, and drop-oldest admission stay exactly as in kernel/src/cap/hardware_audit.rs.
  • The snapshot cursor (start_sequence), truncation labels (no-records-requested, request-limited, snapshot-limit-limited, available-records-exhausted), and the dropped_records counter stay as the drain protocol between kernel and audit log service.
  • The QEMU-only proof rings and prove_qemu_snapshot_truncation_contract remain harness scaffolding and are not on the durable path.
  • The snapshot result’s self-describing status fields stay, and their values advance as the durable path lands: persistence_status moves from volatile-only to a value naming the durable tier, signature_status from unsigned to the active signing mode, and subscriber_admission_status from production-admission-policy-not-implemented to the active policy. Changing those field values is an ABI-adjacent change and must land with schema, generated bindings, runtime decode, demos, and smoke assertions in one branch, per the task hazard preflight.

No focused hardware-audit smoke is invalidated by this design: the kernel-side behavior they assert is unchanged. New durable-path behavior gets new smokes (see Evidence Expectations in the task file).

5. Assumptions

The durable evidence is trustworthy only under stated assumptions. A consumer must know these before trusting the log.

  • Crash window is lossy. Records in the kernel volatile ring that were not yet durably committed by the audit log service are lost on a crash or power loss. The durable log’s recovery markers bound trustworthy history; they do not recover the lost window. Audit is best-effort at the volatile staging tier by design — it must never block hardware cap lifecycle.
  • Rollback below the audit log is out of scope. This design assumes the Store/BlockDevice beneath the audit log service does not silently roll back committed segments. If the underlying storage can roll back (e.g. a snapshot-restore of the whole volume), the restored log is a valid prefix of the chain and still verifies on its own; a verifier detects the rollback only by comparing against something recorded outside the volume, such as a previously observed segment digest or sequence number. The design does not prevent rollback. Volume-level rollback protection is the volume-encryption/storage proposals’ concern.
  • Rotation is segment-aligned and monotonic. A segment is signed by exactly one key. Key identifiers in key_rotation markers are assumed monotonic and unique so a verifier can deterministically map segment ranges to keys.
  • Key lifecycle is delegated. Key generation, sealing, rotation scheduling, and revocation are the KeySource/KeyVault services’ responsibility. This proposal assumes those primitives behave as the key-management proposal specifies; it does not re-implement them.
  • Signing key compromise forges the future, not the past. An attacker holding the live signing key can produce well-formed new records. The hash chain plus revocation marker make the compromise boundary detectable once revocation is recorded, but records sealed during the compromise window are only as trustworthy as the key was. Asymmetric mode narrows this: a verifier holding only the public key cannot itself forge, but a compromised private key still can until revoked.
  • The audit log service is trusted to append. Tamper-evidence detects edits to sealed history. It does not prevent the audit log service from refusing to append, stalling, or being killed. Availability of the audit path — restart policy, health checks — is the service-architecture and monitoring proposals’ concern, not this one.

Relationship to Other Proposals

  • Cryptography and Key Management — this proposal consumes SymmetricKey.mac/verify, PrivateKey.sign, KeySource, and KeyVault. It adds no cryptographic primitive.
  • Storage and Naming — the durable ring is an append-only ledger on the capability-native Store, using the previous-record-hash chaining the storage proposal already prescribes.
  • System Monitoring — the audit log service is the hardware-cap-specific producer feeding the broader audit-record model in the monitoring proposal; scoped HardwareAuditReader caps follow the monitoring proposal’s “observation is authority” and per-record-type retention principles.
  • Device Driver Foundation — this design records hardware authority cap lifecycle events. It does not change where authority is checked, and does not claim provider-driver readiness or IOMMU isolation.

Open Questions

  • Segment size, retention counts, and rotation interval are proposed defaults, not ABI. They want a tuning pass once a real BlockDevice backend exists.
  • Whether the verify method on HardwareAuditReader should return a full chain proof or a bounded status summary depends on the first real consumer’s needs and is deferred to implementation.
  • Cloud-bridge-backed Store for the durable log inherits the storage proposal’s stale-write and size-bound rules; whether audit segments should also be content-addressed objects in that backend is left to the storage track.