Proposal: Durable Hardware Audit Log Persistence
How the HardwareAuditLog capability moves from a bounded volatile in-kernel
ring to durable, tamper-evident audit storage without claiming authority it
does not have.
Problem
HardwareAuditLog today is a read-only observer over the four hardware
authority caps (DeviceMmio, Interrupt, DMAPool, DMABuffer). The kernel
emits one cap-audit: line per lifecycle event and appends a copy into a
fixed-size volatile ring (capacity 64, drop-oldest). A manifest-granted snapshot
method exposes a bounded window (cursor, truncation labels, drop counter) to a
single subscriber. The snapshot result already advertises its own limits:
persistence_status=volatile-only, signature_status=unsigned,
subscriber_admission_status=production-admission-policy-not-implemented.
That is honest groundwork, but it has three gaps that block any claim of durable audit evidence:
- No durability. The ring lives in kernel RAM. A reboot, crash, or QEMU exit loses every record. Audit evidence that vanishes on restart cannot support post-incident review.
- No tamper-evidence. Records are unsigned and unchained. A future consumer reading persisted bytes cannot tell whether history was edited, reordered, or truncated.
- No production admission policy. Exactly one manifest-granted reader gets a volatile snapshot. There is no model for multiple subscribers, scoped read windows, or revocation.
This proposal selects a concrete design for all three so the blocked task
docs/tasks/blocked/ddf-audit-cap-durable-persistence.md can proceed to
implementation. It is a design decision, not a kernel change.
Scope and Non-Claims
This proposal is deliberately narrow. It is observer-evidence design only.
- Audit persistence records authority events. It does not grant, gate, or imply authority. The authority checks stay in the device-manager and cap-object paths exactly where they are now.
- Durable audit is not IOMMU isolation. It does not bound DMA, validate MMIO ranges, or constrain interrupt routes. It records that those events happened.
- Durable audit is not provider-driver readiness. A persisted audit trail does not make a userspace driver production-ready; it makes the driver’s hardware-cap lifecycle reviewable.
- Tamper-evidence is detection, not prevention. A signed, hash-chained log proves history was not edited if verification passes; it cannot stop a privileged writer from refusing to append. Availability of the audit path is a separate concern.
- The durable path must not depend on volatile QEMU-only state, the
qemu cargo feature proof rings, or local run telemetry. Those remain harness scaffolding.
Design Grounding
- docs/tasks/blocked/ddf-audit-cap-durable-persistence.md: acceptance criteria and hazard preflight this proposal answers.
- docs/proposals/cryptography-and-key-management-proposal.md: SymmetricKey (mac/verify), PrivateKey (sign), KeySource, and KeyVault primitives consumed for tamper-evidence and key lifecycle.
- docs/proposals/storage-and-naming-proposal.md: capability-native Store, append-only File/ledger semantics, content hashing, previous-record hash chaining, and stale-write rules consumed for the durable ring.
- docs/proposals/system-monitoring-proposal.md: audit as a distinct append-only record type with its own readers and retention, X.740 audit field model, and “observation is authority” principle.
- docs/dma-isolation-design.md and docs/plans/device-driver-foundation.md: the device-driver foundation context the hardware authority caps live in.
- kernel/src/cap/hardware_audit.rs: the current volatile-ring behavior this design preserves and extends.
Design
1. Durable Audit-Record Ring
The durable audit path is a two-tier structure: the existing bounded
in-kernel volatile ring stays as a fast-path staging buffer, and a userspace
audit log service owns durable persistence behind the capability-native
Store interface.
flowchart LR
DM[Device manager and<br/>hardware cap objects] -->|emit_cap_audit| KR[Kernel volatile ring<br/>capacity 64, drop-oldest]
KR -->|snapshot cursor poll| ALS[Audit log service<br/>userspace]
ALS -->|append-only records| ST[(Store / append-only<br/>ledger segment)]
ALS -->|sealed segment digest| KV[KeyVault / KeySource]
ALS -->|scoped read window| SUB[Admitted subscribers]
Why a userspace service, not kernel-side disk I/O. Durable storage means a
block device, a filesystem-like layout, segment rotation, and signing. None of
that belongs in the kernel: the kernel’s job is dispatch and isolation. The
kernel keeps doing exactly what it does today — bounded, alloc-free,
lock-light ring emission — and a userspace audit log service drains it through
the existing snapshot cursor. This also keeps the durable path off any
QEMU-only state: the service persists through Store, which is backed by a
real BlockDevice (or a cloud bridge adapter) per the storage proposal.
Drain protocol. The audit log service polls HardwareAuditLog.snapshot
with a monotonic start_sequence cursor. Each poll returns the window since
the last durably-committed sequence. The service:
- Reads the snapshot window and the dropped_records counter.
- Appends each record to the current segment (see rotation below).
- Advances its cursor to next_sequence only after the segment write is durably committed (Store sync).
If the kernel ring drops records between polls (dropped_records advanced by
more than the records the service consumed), the service writes a gap
marker record into the durable log: { kind: gap, lost_count, observed_at }.
A gap is itself audit evidence — it is recorded, not hidden. The drop-oldest
behavior of the kernel ring is therefore preserved and made visible in the
durable log rather than silently lost.
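The drain step can be sketched as follows. This is a minimal illustration, not the service's implementation: `Snapshot`, `DurableLog`, and `Drainer` are hypothetical names, `sync()` stands in for a `Store` sync, and the sketch assumes kernel sequences start at 0 and advance by one per record, so that under drop-oldest the dropped records are exactly sequences below `dropped_records`.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Hypothetical shape of one HardwareAuditLog.snapshot result."""
    records: list          # (sequence, payload) pairs, oldest first
    next_sequence: int     # cursor to pass as start_sequence on the next poll
    dropped_records: int   # cumulative count of records the kernel ring dropped


@dataclass
class DurableLog:
    """Stand-in for the Store-backed segment; sync() models a durable commit."""
    pending: list = field(default_factory=list)
    committed: list = field(default_factory=list)

    def append(self, record):
        self.pending.append(record)

    def sync(self):
        self.committed.extend(self.pending)
        self.pending.clear()


class Drainer:
    def __init__(self, log):
        self.log = log
        self.cursor = 0  # next sequence the service has not yet committed

    def drain_once(self, snap, observed_at):
        # Assumption: dropped sequences are [0, dropped_records), so anything
        # dropped at or beyond our cursor was lost before we could drain it.
        lost = max(0, snap.dropped_records - self.cursor)
        if lost:
            self.log.append(
                {"kind": "gap", "lost_count": lost, "observed_at": observed_at}
            )
        for sequence, payload in snap.records:
            self.log.append(
                {"kind": "record", "sequence": sequence, "payload": payload}
            )
        self.log.sync()                   # durable commit first ...
        self.cursor = snap.next_sequence  # ... then advance the cursor
        return lost
```

Advancing the cursor only after `sync()` means a crash between append and commit causes a re-read of the same window on restart rather than a silent skip.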
Retention and rotation. The durable log is a sequence of fixed-size segments (proposed 1 MiB each; an implementation tuning parameter, not an ABI). When a segment fills:
- The service computes the segment digest (see tamper-evidence below).
- It seals the segment (digest + chain link recorded).
- It opens the next segment, whose first record carries the previous segment’s digest as prev_segment_digest.
Retention is count-bounded and age-bounded: keep at most N sealed
segments (proposed default 64) and only segments newer than T (proposed
default 30 days); whichever bound retains fewer segments applies. The bound
is a manifest-configurable policy on the audit log service, not a kernel constant.
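A rough sketch of segment rotation, under stated assumptions: `SegmentWriter` is a hypothetical name, SHA-256 stands in for whatever digest the storage proposal selects, and seeding each segment's running hash with the previous digest models "the first record carries prev_segment_digest". The all-zero digest for the first segment is also an assumption.

```python
import hashlib

SEGMENT_LIMIT = 1 << 20  # proposed 1 MiB; a tuning parameter, not an ABI


class SegmentWriter:
    def __init__(self, limit=SEGMENT_LIMIT):
        self.limit = limit
        self.sealed = []  # seal entries, oldest first
        self._open(prev_digest=b"\x00" * 32)  # assumed genesis value

    def _open(self, prev_digest):
        self.prev_digest = prev_digest
        # The previous segment's digest is the first input to the new hash,
        # so every segment digest depends on all earlier segments.
        self.hasher = hashlib.sha256(prev_digest)
        self.size = 0

    def append(self, record: bytes):
        # Rotate before a record would push a non-empty segment past the limit.
        if self.size and self.size + len(record) > self.limit:
            self.seal()
        # Length prefix keeps record boundaries unambiguous in the digest.
        self.hasher.update(len(record).to_bytes(4, "big") + record)
        self.size += len(record)

    def seal(self):
        digest = self.hasher.digest()
        self.sealed.append(
            {
                "index": len(self.sealed),
                "digest": digest,
                "prev_segment_digest": self.prev_digest,
            }
        )
        self._open(prev_digest=digest)
        return digest
```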
Overflow policy. Two distinct overflow points, two distinct policies:
- Kernel ring → service drain lag. Drop-oldest, as today, with a recorded gap marker. Rationale: the kernel ring must never block a hardware cap lifecycle path on a slow or absent consumer. Audit emission is best-effort by construction; the gap marker makes the loss auditable.
- Durable segment retention limit. Drop-oldest sealed segment, with a retention-eviction record appended to the active segment naming the evicted segment’s digest and sequence range. Rationale: an operator querying “what did we lose to retention” gets a definite answer, and the hash chain stays intact across the eviction (the eviction record links forward; the evicted segment’s digest is permanently recorded before deletion).
Backpressure is explicitly rejected for both points. Backpressuring a hardware authority cap on audit-storage latency would let a stalled disk wedge device lifecycle — an availability and correctness hazard far worse than a recorded gap. Audit is evidence over authority, never a gate on it.
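The retention-eviction policy might look like the sketch below. The function name, the `sealed_at` timestamp, and the record field names are assumptions made for illustration; only the count bound, the age bound, and the idea of recording the evicted digest and sequence range come from the text above.

```python
def apply_retention(sealed, now, max_segments=64, max_age=30 * 24 * 3600):
    """Split sealed segments (oldest first) into (kept, eviction_records).

    Each segment is a dict with "digest", "first_sequence", "last_sequence",
    and "sealed_at" (seconds). Only a contiguous oldest prefix is evicted, so
    the surviving segments still form one unbroken chain.
    """
    cut = 0
    # Age bound: drop oldest segments while they are older than max_age.
    while cut < len(sealed) and now - sealed[cut]["sealed_at"] > max_age:
        cut += 1
    # Count bound: keep at most max_segments of what remains.
    cut = max(cut, len(sealed) - max_segments)
    evicted, kept = sealed[:cut], sealed[cut:]
    # One eviction record per dropped segment, to be appended to the active
    # segment before the segment's bytes are deleted.
    eviction_records = [
        {
            "kind": "retention_eviction",
            "evicted_digest": seg["digest"],
            "first_sequence": seg["first_sequence"],
            "last_sequence": seg["last_sequence"],
        }
        for seg in evicted
    ]
    return kept, eviction_records
```

Writing the eviction records before deletion is what lets a later reader answer "what did we lose to retention" from the log itself.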
Crash-recovery semantics. On audit log service restart:
- The service scans sealed segments oldest-to-newest, verifying each segment digest and the prev_segment_digest chain link.
- It finds the last segment. If the last segment is unsealed, it replays its records, recomputing the running digest; a torn final record (incomplete write) is truncated at the last valid record boundary and a recovery_truncation marker is appended.
- It re-derives the drain cursor from the highest durably-committed sequence and resumes polling the kernel ring from there.
Records lost in the window between the last durable commit and the crash are not recoverable — the kernel ring is volatile and a crash loses it. This is an explicit, accepted limitation: see Assumptions. The recovery markers make the boundary of trustworthy history explicit to any consumer.
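Torn-record truncation depends on the on-disk record framing, which this proposal does not fix. As an illustration only, the sketch below assumes each record is framed as a 4-byte big-endian length, the payload, and a CRC-32 of the payload; `frame` and `recover` are hypothetical helpers.

```python
import struct
import zlib


def frame(payload: bytes) -> bytes:
    """Assumed framing: length (4 bytes) + payload + CRC-32 of payload."""
    return (
        struct.pack(">I", len(payload))
        + payload
        + struct.pack(">I", zlib.crc32(payload))
    )


def recover(segment: bytes):
    """Replay an unsealed segment up to the last valid record boundary.

    Returns (records, valid_length, truncated). The caller would cut the
    segment to valid_length and, if truncated is True, append a
    recovery_truncation marker.
    """
    records, offset = [], 0
    while offset + 4 <= len(segment):
        (length,) = struct.unpack_from(">I", segment, offset)
        end = offset + 4 + length + 4
        if end > len(segment):
            break  # incomplete final write
        payload = segment[offset + 4 : offset + 4 + length]
        (crc,) = struct.unpack_from(">I", segment, offset + 4 + length)
        if crc != zlib.crc32(payload):
            break  # bytes present but not a valid record
        records.append(payload)
        offset = end
    return records, offset, offset != len(segment)
```

A CRC only detects accidental tearing; deliberate edits are the job of the hash chain and segment signature described in the next section.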
2. Tamper-Evidence and Signing
Tamper-evidence is a hash chain plus segment signing, consuming the cryptography/key-management proposal’s primitives. No new crypto is invented here.
Per-record chaining. Each durable audit record carries
prev_record_hash — a hash over the previous record’s canonical bytes. This is
exactly the append-only-ledger pattern the storage proposal already
prescribes (“append new records with previous-record hashes rather than
rewriting history”). Editing or reordering any record breaks every subsequent
prev_record_hash, so a verifier walking the chain detects the first
divergence.
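A minimal sketch of per-record chaining and first-divergence detection. SHA-256, the all-zero genesis value, and the canonical encoding (previous hash followed by payload) are assumptions for illustration; the real canonical bytes are whatever the storage proposal defines.

```python
import hashlib

GENESIS = b"\x00" * 32  # assumed prev_record_hash of the very first record


def canonical(record) -> bytes:
    # Assumed canonical form: the record's own link followed by its payload,
    # so each hash transitively covers all earlier records.
    return record["prev_record_hash"] + record["payload"]


def append(chain, payload: bytes):
    prev = hashlib.sha256(canonical(chain[-1])).digest() if chain else GENESIS
    chain.append({"prev_record_hash": prev, "payload": payload})


def first_divergence(chain):
    """Index of the first record whose link does not match, or None."""
    expected = GENESIS
    for index, record in enumerate(chain):
        if record["prev_record_hash"] != expected:
            return index
        expected = hashlib.sha256(canonical(record)).digest()
    return None
```

Editing record `i` is reported at index `i + 1`, because that is the first link that no longer matches. An edit to the final record has no successor to expose it, which is one reason the sealed segment digest is needed in addition to the per-record links.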
Per-segment signing. When a segment is sealed, the audit log service
computes the segment digest (a hash over the sealed record range, anchored on
the running chain hash) and produces a signature over
{ segment_index, sequence_range, record_count, segment_digest, prev_segment_digest }. Two signing modes, selected by manifest policy:
- MAC mode (default). A SymmetricKey with KeyPurpose.integrity produces an HMAC tag over the segment header via SymmetricKey.mac. Cheaper, no asymmetric key handling, sufficient when the verifier is trusted to hold the same key. Verification is SymmetricKey.verify.
- Asymmetric mode. A sign-only PrivateKey produces a signature via PrivateKey.sign. Used when audit evidence must be verifiable by a consumer that should not be able to forge records (e.g. an external reviewer holding only the public key). Verification uses the corresponding PublicKey.verify.
The audit log service receives a signing-capable key cap (a SymmetricKey
restricted to mac, or a PrivateKey restricted to sign) at manifest grant
time. It never holds raw key material — the key is a capability object per the
key-management design.
What signs what. The chain hash protects record order and content within and across segments. The segment signature protects the segment header, binding the digest, sequence range, and previous-segment digest under a key. Together: a verifier with the verification key can confirm that the sealed segments form an unbroken, unedited chain back to the first segment, and that each seal was produced by the holder of the signing key.
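To make MAC mode concrete, here is a sketch of tagging and verifying a segment header with HMAC-SHA-256. It deliberately departs from the design in one respect: the raw `key` bytes stand in for the SymmetricKey capability, which in the real service would perform `mac` and `verify` without exposing key material. The JSON header encoding and hex-string digests are also assumptions.

```python
import hashlib
import hmac
import json

HEADER_FIELDS = (
    "segment_index",
    "sequence_range",
    "record_count",
    "segment_digest",
    "prev_segment_digest",
)


def header_bytes(header: dict) -> bytes:
    # Assumed canonical encoding: fixed field set, sorted keys, no whitespace.
    selected = {name: header[name] for name in HEADER_FIELDS}
    return json.dumps(selected, sort_keys=True, separators=(",", ":")).encode()


def seal_tag(key: bytes, header: dict) -> bytes:
    # Stand-in for SymmetricKey.mac over the segment header.
    return hmac.new(key, header_bytes(header), hashlib.sha256).digest()


def verify_tag(key: bytes, header: dict, tag: bytes) -> bool:
    # Stand-in for SymmetricKey.verify; constant-time comparison.
    return hmac.compare_digest(seal_tag(key, header), tag)
```

Because the tag covers `prev_segment_digest`, a verifier that checks every tag in order also checks that the segments were sealed in that order.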
Key lifecycle.
- Provenance. The signing key is produced by a KeySource and stored sealed in a KeyVault (per the key-management proposal). The manifest grants the audit log service a use capability for the key, not the vault.
- Rotation. Keys rotate on a policy interval (proposed default 90 days) or on demand. Rotation is segment-aligned: a segment is always signed by exactly one key. The first segment after rotation records a key_rotation marker carrying the new key’s identifier (the KeySource.info identifier, which is a label, not a secret) and the previous key’s identifier. A verifier follows the identifier sequence to know which key verifies which segment range.
- Revocation. If a signing key is suspected compromised, it is revoked in the KeyVault. Revocation does not invalidate already-sealed segments: those remain verifiable against the (now-revoked) key, and the revocation itself is recorded as a key_revocation marker. What revocation prevents is future seals with that key. A consumer treats segments signed by a revoked key as “authentic at seal time, key later revoked”, which is still evidence, with a documented caveat.
- What is NOT protected. Tamper-evidence cannot protect records the kernel ring dropped before the service drained them, cannot protect the crash-window records, and cannot prevent an attacker who holds the live signing key from forging new well-formed history going forward. It detects edits to already-sealed history. These limits are stated in Assumptions.
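How a verifier might map segments to keys from the rotation and revocation markers is sketched below. `KeyTimeline`, its method names, and the caveat string are hypothetical; the sketch assumes segment indices start at 0 and that each rotation marker names the immediately preceding key.

```python
import bisect


class KeyTimeline:
    """Maps segment indices to signing-key identifiers (labels, not secrets)."""

    def __init__(self, initial_key_id):
        self.starts = [0]               # first segment index signed by each key
        self.key_ids = [initial_key_id]
        self.revoked = set()

    def rotation(self, first_segment, previous_key_id, new_key_id):
        # A key_rotation marker must extend the sequence: it names the
        # current key as "previous" and starts at a later segment.
        if previous_key_id != self.key_ids[-1] or first_segment <= self.starts[-1]:
            raise ValueError("rotation marker does not extend the key sequence")
        self.starts.append(first_segment)
        self.key_ids.append(new_key_id)

    def revocation(self, key_id):
        # A key_revocation marker flags the key; sealed segments stay verifiable.
        self.revoked.add(key_id)

    def key_for(self, segment_index):
        if segment_index < 0:
            raise ValueError("segment index must be non-negative")
        slot = bisect.bisect_right(self.starts, segment_index) - 1
        key_id = self.key_ids[slot]
        caveat = "key-later-revoked" if key_id in self.revoked else None
        return key_id, caveat
```

Because rotation is segment-aligned, the lookup never has to split a segment between two keys.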
3. Production Subscriber Admission Policy
Today exactly one manifest-granted reader gets a volatile snapshot. The production model keeps “observation is authority” but adds structure.
Reader caps are typed and scoped. The audit log service exposes readers as distinct capability objects, not a single shared snapshot method:
- HardwareAuditReader: a read-only cap over a scoped window. A subscriber may be granted the full history, a single hardware-cap-tag slice (e.g. DMAPool events only), or a bounded recent window. Narrowing is structural: a narrower reader is a wrapper cap exposing less, per the capOS capability-model principle, not a rights bitmask.
- The cap exposes snapshot (cursor-based, preserving the existing field model) and verify (returns segment-chain verification status so a subscriber can confirm tamper-evidence without holding the signing key, when the deployment uses asymmetric mode and grants the public verification key).
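Structural narrowing can be illustrated with a wrapper object that exposes less than the reader it wraps. This is only an analogy: `FullReader`, `TagScopedReader`, and the `cap_tag` field are hypothetical, and Python attribute privacy is a convention, whereas in a capability system the subscriber would hold a reference to the wrapper cap alone.

```python
class FullReader:
    """Stand-in for a reader over the full durable history."""

    def __init__(self, records):
        self._records = records  # dicts with "sequence" and "cap_tag"

    def snapshot(self, start_sequence, limit):
        window = [r for r in self._records if r["sequence"] >= start_sequence]
        return window[:limit]


class TagScopedReader:
    """Wrapper exposing one hardware-cap tag; it has no method to widen scope."""

    def __init__(self, inner, cap_tag):
        self._inner = inner
        self._cap_tag = cap_tag

    def snapshot(self, start_sequence, limit):
        out, cursor = [], start_sequence
        # Page through the inner reader until enough matching records are
        # found or the history is exhausted.
        while len(out) < limit:
            page = self._inner.snapshot(cursor, limit)
            if not page:
                break
            out.extend(r for r in page if r["cap_tag"] == self._cap_tag)
            cursor = page[-1]["sequence"] + 1
        return out[:limit]
```

The scope lives in which object the subscriber holds rather than in a flag on a shared object, which is the distinction the text draws against a rights bitmask.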
Admission is manifest-declared, with a runtime broker path. Two tiers:
- Manifest-declared subscribers. The boot manifest declares which services receive which scoped reader caps, exactly like every other capability grant. This is the baseline and covers the monitoring/audit service itself.
- Runtime-admitted subscribers. A later phase may route audit-reader requests through the userspace authority broker (docs/proposals/userspace-authority-broker-proposal.md), so an operator session can be granted a scoped, time-bounded reader without a reboot. This is explicitly future work, gated on the broker; it is named here so the admission model has a forward path, not so it ships in the first slice.
Revocation. Reader caps are ordinary caps and are revoked the ordinary way (cap-table teardown). Revoking a reader does not touch the durable log.
4. Preservation of Existing Volatile-Snapshot Behavior
The kernel-side volatile ring and its snapshot ABI are preserved unchanged as the staging tier:
- The bounded ring (capacity 64), head/len/next_sequence/dropped_records bookkeeping, and drop-oldest admission stay exactly as in kernel/src/cap/hardware_audit.rs.
- The snapshot cursor (start_sequence), truncation labels (no-records-requested, request-limited, snapshot-limit-limited, available-records-exhausted), and the dropped_records counter stay as the drain protocol between kernel and audit log service.
- The QEMU-only proof rings and prove_qemu_snapshot_truncation_contract remain harness scaffolding and are not on the durable path.
- The snapshot result’s self-describing status fields stay, and their values advance as the durable path lands: persistence_status moves from volatile-only to a value naming the durable tier, signature_status from unsigned to the active signing mode, and subscriber_admission_status from production-admission-policy-not-implemented to the active policy. Changing those field values is an ABI-adjacent change and must land with schema, generated bindings, runtime decode, demos, and smoke assertions in one branch, per the task hazard preflight.
No focused hardware-audit smoke is invalidated by this design: the kernel-side behavior they assert is unchanged. New durable-path behavior gets new smokes (see Evidence Expectations in the task file).
5. Assumptions
The durable evidence is trustworthy only under stated assumptions. A consumer must know these before trusting the log.
- Crash window is lossy. Records in the kernel volatile ring that were not yet durably committed by the audit log service are lost on a crash or power loss. The durable log’s recovery markers bound trustworthy history; they do not recover the lost window. Audit is best-effort at the volatile staging tier by design — it must never block hardware cap lifecycle.
- Rollback below the audit log is out of scope. This design assumes the Store/BlockDevice beneath the audit log service does not silently roll back committed segments. If the underlying storage can roll back (e.g. a snapshot-restore of the whole volume), the hash chain detects the resulting gap on next verification, but the design does not prevent it. Volume-level rollback protection is the volume-encryption/storage proposals’ concern.
- Rotation is segment-aligned and monotonic. A segment is signed by exactly one key. Key identifiers in key_rotation markers are assumed monotonic and unique so a verifier can deterministically map segment ranges to keys.
- Key lifecycle is delegated. Key generation, sealing, rotation scheduling, and revocation are the KeySource/KeyVault services’ responsibility. This proposal assumes those primitives behave as the key-management proposal specifies; it does not re-implement them.
- Signing key compromise forges the future, not the past. An attacker holding the live signing key can produce well-formed new records. The hash chain plus revocation marker make the compromise boundary detectable once revocation is recorded, but records sealed during the compromise window are only as trustworthy as the key was. Asymmetric mode narrows this: a verifier holding only the public key cannot itself forge, but a compromised private key still can until revoked.
- The audit log service is trusted to append. Tamper-evidence detects edits to sealed history. It does not prevent the audit log service from refusing to append, stalling, or being killed. Availability of the audit path — restart policy, health checks — is the service-architecture and monitoring proposals’ concern, not this one.
Relationship to Other Proposals
- Cryptography and Key Management: this proposal consumes SymmetricKey.mac/verify, PrivateKey.sign, KeySource, and KeyVault. It adds no cryptographic primitive.
- Storage and Naming: the durable ring is an append-only ledger on the capability-native Store, using the previous-record-hash chaining the storage proposal already prescribes.
- System Monitoring: the audit log service is the hardware-cap-specific producer feeding the broader audit-record model in the monitoring proposal; scoped HardwareAuditReader caps follow the monitoring proposal’s “observation is authority” and per-record-type retention principles.
- Device Driver Foundation: this design records hardware authority cap lifecycle events. It does not change where authority is checked, and does not claim provider-driver readiness or IOMMU isolation.
Open Questions
- Segment size, retention counts, and rotation interval are proposed defaults, not ABI. They want a tuning pass once a real BlockDevice backend exists.
- Whether the verify method on HardwareAuditReader should return a full chain proof or a bounded status summary depends on the first real consumer’s needs and is deferred to implementation.
- Cloud-bridge-backed Store for the durable log inherits the storage proposal’s stale-write and size-bound rules; whether audit segments should also be content-addressed objects in that backend is left to the storage track.