Proposal: Durable Hardware Audit Log Persistence

How the HardwareAuditLog capability moves from a bounded volatile in-kernel ring to durable, tamper-evident audit storage without claiming authority it does not have.

Problem

HardwareAuditLog is the read-only observer over the four hardware authority caps (DeviceMmio, Interrupt, DMAPool, DMABuffer). The kernel still emits one cap-audit: line per lifecycle event and appends a copy into a fixed-size volatile ring (capacity 64, drop-oldest). The userspace hardware-audit-service now drains that ring into a Store-backed, hash-chained segment ring recoverable through Store.list inventory, and serves scoped HardwareAuditReader snapshots with self-describing persistence, retention, subscriber-admission, keyed-seal, key-lifecycle, physical-persistence, and runtime-admission metadata. The regular DDF audit service smoke uses the RAM-backed StoreCap and keeps the IOMMU abort-held DMAPool/DMABuffer evidence strict. The physical persistence proof manifest grants persistent_store to the service and reuses one disk image across two QEMU boots; pass 2 must recover and verify pass-1 audit segment blobs before draining current-boot records. The smoke also stores and reads a separate content-addressed marker as an independent Store-disk sanity check.

The current keyed mode uses a RAM-local RamSymmetricKey minted through the development-only local DevelopmentSoftwareKeySource and seals each segment header with HMAC-SHA256. The audit service never exports raw key material. Snapshot metadata reports the signing key identifier, generation, single-local-key rotation status, and RAM-local revocation caveat so a verifier can distinguish this local proof from external KeyVault custody.

The remaining gaps before a full production durability and audit-verifier claim are:

External verifier key custody. The shipped keyed seal is local HMAC evidence from a development-only deterministic key source. It is not yet a production KeyVault/KeySource-managed key with durable rotation and revocation enforcement.
Production media and rollback policy. The QEMU persistent_store reboot proof demonstrates Store-backed survival across boot using the CAPOSST1 disk format. Volume rollback resistance and cloud/hardware media assumptions remain the storage track’s responsibility.
Runtime subscribers are refused until a broker path exists. Manifest-scoped reader grants work. HardwareAuditReader runtime admission now fails closed with an explicit no-authority-broker status instead of silently implying support.

The local proof was implemented by docs/tasks/done/2026-06-07/hardware-audit-physical-persistence-signing-local-proof.md.

This proposal selects the target design for those production extensions and records the boundaries of the Store-backed service that has landed.

Scope and Non-Claims

This proposal is deliberately narrow. It is observer-evidence design only.

Audit persistence records authority events. It does not grant, gate, or imply authority. The authority checks stay in the device-manager and cap-object paths exactly where they are now.
Durable audit is not IOMMU isolation. It does not bound DMA, validate MMIO ranges, or constrain interrupt routes. It records that those events happened.
Durable audit is not provider-driver readiness. A persisted audit trail does not make a userspace driver production-ready; it makes the driver’s hardware-cap lifecycle reviewable.
Tamper-evidence is detection, not prevention. A signed, hash-chained log proves history was not edited if verification passes; it cannot stop a privileged writer from refusing to append. Availability of the audit path is a separate concern.
The durable path must not depend on volatile QEMU-only state, the qemu cargo feature proof rings, or local run telemetry. Those remain harness scaffolding.

Design Grounding

docs/tasks/done/2026-05-22/ddf-audit-cap-durable-persistence.md — acceptance criteria and hazard preflight this proposal answers.
docs/proposals/cryptography-and-key-management-proposal.md — SymmetricKey (mac/verify), PrivateKey (sign), KeySource, and KeyVault primitives consumed for tamper-evidence and key lifecycle.
docs/proposals/storage-and-naming-proposal.md — capability-native Store, append-only File/ledger semantics, content hashing, previous-record hash chaining, and stale-write rules consumed for the durable ring.
docs/proposals/system-monitoring-proposal.md — audit as a distinct append-only record type with its own readers and retention, X.740 audit field model, and “observation is authority” principle.
docs/dma-isolation-design.md and docs/backlog/hardware-boot-storage.md — the device-driver foundation context the hardware authority caps live in.
kernel/src/cap/hardware_audit.rs — the current volatile-ring behavior this design preserves and extends.

Design

1. Durable Audit-Record Ring

The durable audit path is a two-tier structure: the existing bounded in-kernel volatile ring stays as a fast-path staging buffer, and a userspace audit log service owns durable persistence behind the capability-native Store interface.

flowchart LR
    DM[Device manager and<br/>hardware cap objects] -->|emit_cap_audit| KR[Kernel volatile ring<br/>capacity 64, drop-oldest]
    KR -->|drain cursor poll| ALS[Audit log service<br/>userspace]
    ALS -->|append-only records| ST[(Store / append-only<br/>ledger segment)]
    ALS -->|sealed segment digest| KV[KeyVault / KeySource]
    ALS -->|scoped read window| SUB[Admitted subscribers]

Why a userspace service, not kernel-side disk I/O. Durable storage means a block device, a filesystem-like layout, segment rotation, and signing. None of that belongs in the kernel: the kernel’s job is dispatch and isolation. The kernel keeps doing exactly what it does today (bounded, alloc-free, lock-light ring emission), and a userspace audit log service drains it through HardwareAuditLog.drain with a per-cap cursor. This also keeps the durable path off QEMU-only telemetry: the service persists through the Store interface. The current bootstrap StoreCap is RAM-backed and therefore demonstrates only the contract, not retention; a real BlockDevice or cloud bridge adapter per the storage proposal is required before this path claims post-reboot retention.

Drain protocol. The audit log service polls HardwareAuditLog.drain with a monotonic expected_sequence cursor. Each successful drain returns the window since the last durably-committed sequence. The service:

Reads the drained window and the dropped_records counter.
Appends each record to the current segment (see rotation below).
Advances its cursor to next_sequence only after the segment write is durably committed (Store sync).

If the kernel ring drops records between polls (dropped_records advanced by more than the records the service consumed), the service writes a gap marker record into the durable log: { kind: gap, lost_count, observed_at }. A gap is itself audit evidence — it is recorded, not hidden. The drop-oldest behavior of the kernel ring is therefore preserved and made visible in the durable log rather than silently lost.
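The drain steps and the gap rule can be sketched as a small model. This is a minimal Python illustration, not the capOS service code: DrainWindow, DurableLog, ServiceState, and drain_once are hypothetical names, and the sketch assumes dropped_records is a cumulative counter that covers only records the service never saw. If the counter also covers records that were already drained, the lost count would have to be derived differently.

```python
import time
from dataclasses import dataclass, field


@dataclass
class DrainWindow:
    """Hypothetical shape of one HardwareAuditLog.drain result."""
    records: list          # records returned since the expected_sequence cursor
    next_sequence: int     # cursor value to use on the next poll
    dropped_records: int   # cumulative count of records the ring dropped


@dataclass
class DurableLog:
    """Stand-in for the Store-backed segment; sync() models a Store sync."""
    staged: list = field(default_factory=list)
    committed: list = field(default_factory=list)

    def append(self, record):
        self.staged.append(record)

    def sync(self):
        self.committed.extend(self.staged)
        self.staged.clear()


@dataclass
class ServiceState:
    cursor: int = 0        # last durably committed next_sequence
    seen_dropped: int = 0  # dropped_records value seen at the last commit


def drain_once(state, window, log, now=time.time):
    # A gap is recorded, not hidden: the marker goes in before the records
    # that survived the overflow.
    lost = window.dropped_records - state.seen_dropped
    if lost > 0:
        log.append({"kind": "gap", "lost_count": lost, "observed_at": now()})
    for record in window.records:
        log.append(record)
    log.sync()
    # The cursor advances only after the durable commit has completed.
    state.cursor = window.next_sequence
    state.seen_dropped = window.dropped_records
```

Because the cursor moves only after sync(), a crash between append and sync leaves the cursor at the last committed sequence, so the next poll asks for the same window again.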

Retention and rotation. The durable log is a sequence of fixed-size segments (proposed 1 MiB each; an implementation tuning parameter, not an ABI). When a segment fills:

The service computes the segment digest (see tamper-evidence below).
It seals the segment (digest + chain link recorded).
It opens the next segment, whose first record carries the previous segment’s digest as prev_segment_digest.

Retention is count-bounded and age-bounded: keep at most N sealed segments (proposed default 64) and only segments newer than T (proposed default 30 days), so whichever bound retains fewer segments applies. The bound is a manifest-configurable policy on the audit log service, not a kernel constant.
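One way to read the combined bound is as an intersection: a segment is kept only if it is young enough and also among the newest N. The sketch below illustrates that reading. SealedSegment, its sealed_at field, and select_retained are hypothetical names, and the defaults mirror the proposed values rather than any shipped configuration.

```python
from dataclasses import dataclass

DAY_SECONDS = 24 * 60 * 60


@dataclass(frozen=True)
class SealedSegment:
    index: int        # position in the segment sequence
    sealed_at: float  # seal time in seconds (hypothetical field)


def select_retained(segments, now, max_count=64, max_age_days=30):
    """Return (retained, evicted), both ordered oldest to newest."""
    ordered = sorted(segments, key=lambda s: s.index)
    cutoff = now - max_age_days * DAY_SECONDS
    # Age bound first, then the count bound over what is still young enough.
    young = [s for s in ordered if s.sealed_at >= cutoff]
    retained = young[-max_count:] if max_count > 0 else []
    kept = {s.index for s in retained}
    evicted = [s for s in ordered if s.index not in kept]
    return retained, evicted
```

With 40 segments sealed one per day and a 30-day age bound, a count bound of 8 keeps the newest 8, while the default count bound of 64 is never reached and the age bound decides.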

Overflow policy. Two distinct overflow points, two distinct policies:

Kernel ring → service drain lag. Drop-oldest, as today, with a recorded gap marker. Rationale: the kernel ring must never block a hardware cap lifecycle path on a slow or absent consumer. Audit emission is best-effort by construction; the gap marker makes the loss auditable.
Durable segment retention limit. Drop-oldest sealed segment, with a retention-eviction record appended to the active segment naming the evicted segment’s digest and sequence range. Rationale: an operator querying “what did we lose to retention” gets a definite answer, and the hash chain stays intact across the eviction (the eviction record links forward; the evicted segment’s digest is permanently recorded before deletion).

Backpressure is explicitly rejected for both points. Backpressuring a hardware authority cap on audit-storage latency would let a stalled disk wedge device lifecycle — an availability and correctness hazard far worse than a recorded gap. Audit is evidence over authority, never a gate on it.
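The retention-eviction policy can be illustrated with a short model. This is a sketch under assumptions: the record field names (evicted_digest, first_sequence, last_sequence) and the evict_oldest helper are hypothetical, and the lists stand in for Store-backed segments.

```python
def evict_oldest(sealed_segments, active_records, max_sealed):
    """Drop the oldest sealed segments beyond max_sealed, leaving a record of each."""
    while len(sealed_segments) > max_sealed:
        victim = sealed_segments.pop(0)
        # This model only updates in-memory lists. A real service would
        # commit the eviction record before deleting the segment blob, so
        # the evicted digest is never lost.
        active_records.append({
            "kind": "retention_eviction",
            "evicted_digest": victim["digest"],
            "first_sequence": victim["first_sequence"],
            "last_sequence": victim["last_sequence"],
        })
    return sealed_segments
```

An operator asking what retention removed can then answer from the active segment alone, by listing its retention_eviction records.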

Crash-recovery semantics. On audit log service restart:

The service scans sealed segments oldest-to-newest, verifying each segment digest and the prev_segment_digest chain link.
It finds the last segment. If the last segment is unsealed, it replays its records, recomputing the running digest; a torn final record (incomplete write) is truncated at the last valid record boundary and a recovery_truncation marker is appended.
It re-derives the drain cursor from the highest durably-committed sequence and resumes polling the kernel ring from there.

Records lost in the window between the last durable commit and the crash are not recoverable — the kernel ring is volatile and a crash loses it. This is an explicit, accepted limitation: see Assumptions. The recovery markers make the boundary of trustworthy history explicit to any consumer.
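The torn-record step of recovery can be sketched as follows. The on-disk framing here (a 4-byte big-endian length, the payload, then a SHA-256 of the payload) is purely an assumption for illustration; the proposal does not specify a record layout. The function names and the marker encoding are likewise hypothetical, and the running-digest recomputation is left out.

```python
import hashlib
import struct

HEADER = struct.Struct(">I")  # hypothetical framing: 4-byte big-endian length
DIGEST_SIZE = hashlib.sha256().digest_size


def frame(payload: bytes) -> bytes:
    """Encode one record as length + payload + SHA-256(payload)."""
    return HEADER.pack(len(payload)) + payload + hashlib.sha256(payload).digest()


def recover_unsealed(segment: bytes):
    """Return (valid_payloads, valid_length, truncated) for an unsealed segment."""
    payloads, offset = [], 0
    while offset < len(segment):
        if offset + HEADER.size > len(segment):
            break  # torn inside the length prefix
        (length,) = HEADER.unpack_from(segment, offset)
        start = offset + HEADER.size
        end = start + length + DIGEST_SIZE
        if end > len(segment):
            break  # torn inside the payload or its checksum
        payload = segment[start:start + length]
        if hashlib.sha256(payload).digest() != segment[start + length:end]:
            break  # complete length but corrupted content
        payloads.append(payload)
        offset = end
    return payloads, offset, offset < len(segment)


def repair(segment: bytes) -> bytes:
    """Truncate at the last valid record boundary and append a marker."""
    _, valid_length, truncated = recover_unsealed(segment)
    if not truncated:
        return segment
    return segment[:valid_length] + frame(b'{"kind":"recovery_truncation"}')
```

The marker makes the boundary explicit: everything before it was verified on replay, and anything that was being written at crash time is known to be missing.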

2. Tamper-Evidence and Segment Seals

Tamper-evidence is a hash chain plus segment signing, consuming the cryptography/key-management proposal’s primitives. No new crypto is invented here.

Per-record chaining. Each durable audit record carries prev_record_hash — a hash over the previous record’s canonical bytes. This is exactly the append-only-ledger pattern the storage proposal already prescribes (“append new records with previous-record hashes rather than rewriting history”). Editing or reordering any record breaks every subsequent prev_record_hash, so a verifier walking the chain detects the first divergence.
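A minimal model of per-record chaining and the verifier walk is sketched below. It assumes SHA-256 over a sorted-key JSON encoding as the canonical bytes and an all-zero anchor for the first record; the proposal names neither, so both are illustrative choices, as are the function names.

```python
import hashlib
import json

GENESIS = "00" * 32  # hypothetical chain anchor for the first record


def canonical_bytes(record: dict) -> bytes:
    """Stable encoding so the same record always hashes the same way."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def append_record(chain: list, body: dict) -> None:
    """Append body with prev_record_hash set to the hash of the previous record."""
    prev = hashlib.sha256(canonical_bytes(chain[-1])).hexdigest() if chain else GENESIS
    chain.append({**body, "prev_record_hash": prev})


def first_divergence(chain: list):
    """Return the index of the first record whose link is broken, or None."""
    expected = GENESIS
    for index, record in enumerate(chain):
        if record["prev_record_hash"] != expected:
            return index
        expected = hashlib.sha256(canonical_bytes(record)).hexdigest()
    return None
```

Editing record 1 is reported at index 2, because the link stored in the following record no longer matches. An edit to the final record has no successor to expose it, which is one reason the chain is paired with a digest over the whole sealed range.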

Per-segment signing. The shipped service records per-segment digests and a running chain head so retained-window tampering is detectable. The local keyed proof seals each segment header with HMAC-SHA256 using a RAM-local symmetric key cap minted by the development-only local key source. When a segment is sealed, the audit log service computes the segment digest (a hash over the sealed record range, anchored on the running chain hash) and produces a keyed seal over { segment_index, sequence_range, record_count, segment_digest, prev_segment_digest }. Production deployment should select one of these key custody modes by manifest policy:

MAC mode (default). A SymmetricKey with KeyPurpose.integrity produces an HMAC tag over the segment header via SymmetricKey.mac. Cheaper, no asymmetric key handling, sufficient when the verifier is trusted to hold the same key. Verification is SymmetricKey.verify.
Asymmetric mode. A sign-only PrivateKey produces a signature via PrivateKey.sign. Used when audit evidence must be verifiable by a consumer that should not be able to forge records (e.g. an external reviewer holding only the public key). Verification uses the corresponding PublicKey.verify.
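MAC mode can be illustrated with the Python standard library. This sketch passes raw key bytes to keep the example self-contained, which is exactly what the capOS design avoids: there the key is a capability and the service only invokes mac and verify on it. The JSON header encoding and the function names are assumptions, not the shipped segment format.

```python
import hashlib
import hmac
import json


def header_bytes(header: dict) -> bytes:
    """Canonical encoding of the sealed header fields (hypothetical format)."""
    return json.dumps(header, sort_keys=True, separators=(",", ":")).encode()


def seal_segment(key: bytes, header: dict) -> str:
    """Model of a mac call: HMAC-SHA256 tag over the segment header."""
    return hmac.new(key, header_bytes(header), hashlib.sha256).hexdigest()


def verify_seal(key: bytes, header: dict, tag: str) -> bool:
    """Model of a verify call, using a constant-time comparison."""
    return hmac.compare_digest(seal_segment(key, header), tag)
```

A usage example with placeholder values:

```python
key = b"demo-key-not-for-production"
header = {
    "segment_index": 3,
    "sequence_range": [192, 255],
    "record_count": 64,
    "segment_digest": "ab" * 32,
    "prev_segment_digest": "cd" * 32,
}
tag = seal_segment(key, header)
assert verify_seal(key, header, tag)
assert not verify_seal(key, {**header, "record_count": 63}, tag)
```

Anyone who can verify in this mode can also produce a valid tag, which is the trade-off that asymmetric mode removes.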

The audit log service receives a signing-capable key cap (a SymmetricKey restricted to mac, or a PrivateKey restricted to sign) at manifest grant time. It never holds raw key material — the key is a capability object per the key-management design. The current local proof follows the same no-raw-key custody rule with a RamSymmetricKey minted by the development-only software key source. That source deterministically remints the same non-extractable local HMAC key from stable source metadata and an audit label for the reboot proof, but it is still not production custody: there is no external root, rollback resistance, rotation, or persistent revocation state.

What signs what. The chain hash protects record order and content within and across segments. The segment signature protects the segment header, binding the digest, sequence range, and previous-segment digest under a key. Together: a verifier with the verification key can confirm that the sealed segments form an unbroken, unedited chain back to the first segment, and that each seal was produced by the holder of the signing key.

Key lifecycle.

Current local proof. signing_key_id = "local-audit-hmac-v1" and signing_key_generation = 1 identify the development-key-source RAM-local HMAC key generation. key_rotation_status = "single-local-key-no-rotation" and key_revocation_status = "ram-local-key-revocation-not-persistent" are explicit caveats, not production lifecycle controls.
Provenance. The signing key is produced by a KeySource and stored sealed in a KeyVault (per the key-management proposal). The manifest grants the audit log service a use capability for the key, not the vault.
Rotation. Keys rotate on a policy interval (proposed default 90 days) or on demand. Rotation is segment-aligned: a segment is always signed by exactly one key. The first segment after rotation records a key_rotation marker carrying the new key’s identifier (KeySource.info identifier — a label, not a secret) and the previous key’s identifier. A verifier follows the identifier sequence to know which key verifies which segment range.
Revocation. If a signing key is suspected compromised, it is revoked in the KeyVault. Revocation does not invalidate already-sealed segments — those remain verifiable against the (now-revoked) key, and the revocation itself is recorded as a key_revocation marker. What revocation prevents is future seals with that key. A consumer treats segments signed by a revoked key as “authentic at seal time, key later revoked” — still evidence, with a documented caveat.
What is NOT protected. Tamper-evidence cannot protect records the kernel ring dropped before the service drained them, cannot protect the crash-window records, and cannot prevent an attacker who holds the live signing key from forging new well-formed history going forward. It detects edits to already-sealed history. These limits are stated in Assumptions.
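The segment-aligned rotation rule lets a verifier derive which key covers which segment by walking the markers in order. The sketch below assumes each segment is a dict with an index and an optional key_rotation marker carrying previous_key_id and new_key_id. Those field names, and the keys_by_segment helper, are hypothetical.

```python
def keys_by_segment(segments, initial_key_id):
    """Map each segment index to the key identifier expected to verify it."""
    current = initial_key_id
    mapping = {}
    for segment in segments:
        marker = segment.get("key_rotation")
        if marker is not None:
            # The marker must continue from the key in use so far; anything
            # else means the identifier sequence is broken.
            if marker["previous_key_id"] != current:
                raise ValueError("rotation marker does not follow the current key")
            current = marker["new_key_id"]
        mapping[segment["index"]] = current
    return mapping
```

Because a segment is signed by exactly one key, the mapping is a plain function of the segment index, and a marker that does not chain from the current identifier is rejected rather than guessed around.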

3. Production Subscriber Admission Policy

Today exactly one manifest-granted reader gets a volatile snapshot. The production model keeps “observation is authority” but adds structure.

Reader caps are typed and scoped. The audit log service exposes readers as distinct capability objects, not a single shared snapshot method:

HardwareAuditReader — a read-only cap over a scoped window: a subscriber may be granted the full history, a single hardware-cap-tag slice (e.g. DMAPool events only), or a bounded recent window. Narrowing is structural — a narrower reader is a wrapper cap exposing less, per the capOS capability-model principle, not a rights bitmask.
The cap exposes snapshot (cursor-based, preserving the existing field model) and verify (returns segment-chain verification status so a subscriber can confirm tamper-evidence without holding the signing key, when the deployment uses asymmetric mode and grants the public verification key).
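Structural narrowing can be pictured as one reader object wrapping another and exposing less. The Python model below only illustrates the shape: FullReader, TagScopedReader, and the cap_tag record field are hypothetical, and Python attributes are not an isolation boundary the way capability objects are.

```python
class FullReader:
    """Hypothetical full-history reader over an in-memory record list."""

    def __init__(self, records):
        self._records = records

    def snapshot(self, start_sequence=0, max_records=None):
        window = [r for r in self._records if r["sequence"] >= start_sequence]
        return window if max_records is None else window[:max_records]


class TagScopedReader:
    """Wrapper exposing one hardware-cap tag; it has no rights mask to widen."""

    def __init__(self, inner, cap_tag):
        self._inner = inner
        self._cap_tag = cap_tag

    def snapshot(self, start_sequence=0, max_records=None):
        # Filter first, then apply the caller's limit to the scoped view.
        window = [
            r for r in self._inner.snapshot(start_sequence)
            if r["cap_tag"] == self._cap_tag
        ]
        return window if max_records is None else window[:max_records]
```

The holder of a TagScopedReader has no method that returns other tags, so there is nothing to escalate by flipping a bit.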

Admission is manifest-declared, with a runtime broker path. Two tiers:

Manifest-declared subscribers. The boot manifest declares which services receive which scoped reader caps, exactly like every other capability grant. This is the baseline and covers the monitoring/audit service itself.
Runtime-admitted subscribers. A later phase may route audit-reader requests through the userspace authority broker (docs/proposals/userspace-authority-broker-proposal.md), so an operator session can be granted a scoped, time-bounded reader without a reboot. This is explicitly future work, gated on the broker. The shipped reader endpoint exposes a runtime-admission method that refuses with InvalidArgument and reports runtime_admission_policy = "runtime-reader-admission-refused-no-authority-broker", so callers get a fail-closed status instead of an implied grant.

Revocation. Reader caps are ordinary caps and are revoked the ordinary way (cap-table teardown). Revoking a reader does not touch the durable log.

4. Preservation of Existing Volatile-Snapshot Behavior

The kernel-side volatile ring and its snapshot ABI are preserved unchanged as the staging tier:

The bounded ring (capacity 64), head/len/next_sequence/dropped_records bookkeeping, and drop-oldest admission stay exactly as in kernel/src/cap/hardware_audit.rs.
The snapshot cursor (start_sequence), truncation labels (no-records-requested, request-limited, snapshot-limit-limited, available-records-exhausted), and the dropped_records counter stay available to direct HardwareAuditLog.snapshot observers.
The durable service path uses HardwareAuditLog.drain(expected_sequence, max_records) as its per-cap cursor protocol. A cursor mismatch still fails closed; a cursor-verified overflow reanchors at the retained window and reports the advanced dropped_records counter so the service can record a visible gap.
The QEMU-only proof rings and prove_qemu_snapshot_truncation_contract remain harness scaffolding and are not on the durable path.
The HardwareAuditReader.snapshot result’s self-describing status fields stay, and their values advance as the durable path lands. The Store-backed service reports persistence_status = "store-backed-segment-ring", signature_status = "hash-chain-plus-local-hmac-segment-seals", keyed_seal_count greater than or equal to the retained sealed segment count, signing_key_id = "local-audit-hmac-v1", key_rotation_status = "single-local-key-no-rotation", key_revocation_status = "ram-local-key-revocation-not-persistent", physical_persistence_status = "store-cap-backing-manifest-selected", subscriber_admission_status = "manifest-admission-active-runtime-broker-refused", and runtime_admission_policy = "runtime-reader-admission-refused-no-authority-broker". Changing those field values is an ABI-adjacent change and must land with schema, generated bindings, runtime decode, demos, and smoke assertions in one branch, per the task hazard preflight.

No focused hardware-audit smoke is invalidated by this design: the kernel-side behavior they assert is unchanged. New durable-path behavior gets new smokes (see Evidence Expectations in the task file).

5. Assumptions

The durable evidence is trustworthy only under stated assumptions. A consumer must know these before trusting the log.

Crash window is lossy. Records in the kernel volatile ring that were not yet durably committed by the audit log service are lost on a crash or power loss. The durable log’s recovery markers bound trustworthy history; they do not recover the lost window. Audit is best-effort at the volatile staging tier by design — it must never block hardware cap lifecycle.
Rollback below the audit log is out of scope. This design assumes the Store/BlockDevice beneath the audit log service does not silently roll back committed segments. If the underlying storage can roll back (e.g. a snapshot-restore of the whole volume), the hash chain detects the resulting gap on next verification only when the verifier can compare against a chain head or sequence recorded after the rollback point; a volume restored to an earlier self-consistent state still verifies on its own. The design does not prevent rollback. Volume-level rollback protection is the volume-encryption/storage proposals’ concern.
Rotation is segment-aligned and monotonic. A production segment is signed by exactly one key. Key identifiers in key_rotation markers are assumed monotonic and unique so a verifier can deterministically map segment ranges to keys.
Key lifecycle is delegated. Key generation, sealing, rotation scheduling, and revocation are the KeySource/KeyVault services’ responsibility. This proposal assumes those primitives behave as the key-management proposal specifies; it does not re-implement them. The landed local HMAC proof uses a development-only deterministic source and states its lack of production rotation/revocation in reader-visible metadata.
Signing key compromise forges the future, not the past. An attacker holding the live signing key can produce well-formed new records. The hash chain plus revocation marker make the compromise boundary detectable once revocation is recorded, but records sealed during the compromise window are only as trustworthy as the key was. Asymmetric mode narrows this: a verifier holding only the public key cannot itself forge, but a compromised private key still can until revoked.
The audit log service is trusted to append. Tamper-evidence detects edits to sealed history. It does not prevent the audit log service from refusing to append, stalling, or being killed. Availability of the audit path — restart policy, health checks — is the service-architecture and monitoring proposals’ concern, not this one.

Relationship to Other Proposals

Cryptography and Key Management — this proposal consumes SymmetricKey.mac/verify, PrivateKey.sign, KeySource, and KeyVault. It adds no cryptographic primitive.
Storage and Naming — the durable ring is an append-only ledger on the capability-native Store, using the previous-record-hash chaining the storage proposal already prescribes.
System Monitoring — the audit log service is the hardware-cap-specific producer feeding the broader audit-record model in the monitoring proposal; scoped HardwareAuditReader caps follow the monitoring proposal’s “observation is authority” and per-record-type retention principles.
Device Driver Foundation — this design records hardware authority cap lifecycle events. It does not change where authority is checked, and does not claim provider-driver readiness or IOMMU isolation.

Open Questions

Segment size, retention counts, and rotation interval are proposed defaults, not ABI. The focused smoke currently retains eight sealed segments so boot-time abort-held DMA records remain inside the proof window; production defaults still need a tuning pass once a real BlockDevice backend exists.
Whether the verify method on HardwareAuditReader should return a full chain proof or a bounded status summary depends on the first real consumer’s needs and is deferred to implementation.
Cloud-bridge-backed Store for the durable log inherits the storage proposal’s stale-write and size-bound rules; whether audit segments should also be content-addressed objects in that backend is left to the storage track.
