# Security Review and Formal Verification Proposal

How to reason about the correctness and security of the capOS kernel and its
trust boundaries in a way that fits a research OS -- pragmatic tooling now,
targeted verification where it pays off, no aspirational seL4-style full-
kernel proofs. The `docs/research/sel4.md` survey already concluded that
Isabelle/HOL-over-C verification does not transfer to Rust and that the
*design constraints* matter more than the proof artefact. This proposal
codifies that conclusion into a concrete tooling and process plan.


This proposal uses CWE for concrete vulnerability classes, CAPEC for attacker
patterns, Rust language rules / unsafe-code guidance for low-level coding
rules, Common Criteria protection-profile concepts for OS security functions,
ITU-T X.800/X.805 security-services taxonomy as a completeness checklist,
and capability-kernel practice (seL4/EROS-style invariants) for authority,
IPC, object lifetime, and scheduler properties. Web-application checklists are
not the baseline for OS design review.

Grounding sources:

- [MITRE CWE](https://cwe.mitre.org/data/definitions/20.html) for root-cause
  weakness labels: CWE-20 explicitly covers raw data, metadata, sizes, indexes,
  offsets, syntax, type, consistency, and domain rules; CWE also marks broad
  classes such as
  [CWE-20](https://cwe.mitre.org/data/definitions/20.html) and
  [CWE-400](https://cwe.mitre.org/data/definitions/400.html) as discouraged
  for final vulnerability mapping when a more precise child fits.
- [MITRE CAPEC](https://capec.mitre.org/) for attacker behavior, especially
  input manipulation
  ([CAPEC-153](https://capec.mitre.org/data/definitions/153.html)), command
  injection ([CAPEC-248](https://capec.mitre.org/data/definitions/248.html)),
  race exploitation
  ([CAPEC-26](https://capec.mitre.org/data/definitions/26.html) /
  [CAPEC-29](https://capec.mitre.org/data/definitions/29.html)), and
  flooding/resource pressure
  ([CAPEC-125](https://capec.mitre.org/data/definitions/125.html)).
- [Rust Reference](https://doc.rust-lang.org/stable/reference/unsafe-keyword.html)
  and
  [Rust 2024 Edition Guide](https://doc.rust-lang.org/edition-guide/rust-2024/unsafe-op-in-unsafe-fn.html)
  for unsafe-block and `unsafe_op_in_unsafe_fn` obligations.
- [seL4 MCS](https://docs.sel4.systems/Tutorials/mcs.html) and the existing
  capOS research notes for capability-authorized access to kernel objects and
  CPU time.
- [Common Criteria General Purpose Operating System Protection Profile](https://commoncriteria.github.io/operatingsystem/ext-ref-test/operatingsystem.html)
  for OS access-control, security-function, trusted-channel/path, and
  user-data protection concepts. capOS is not trying to certify against it;
  the PP is a vocabulary check for what an OS security review should not omit.
- ITU-T Rec. X.800 (03/91) *Security architecture for OSI* and X.805
  (10/03) *Security architecture for systems providing end-to-end
  communications* for the layered security-services taxonomy:
  authentication, access control, non-repudiation, data confidentiality,
  data integrity, availability, privacy × infrastructure/services/
  applications planes × end-user/control/management planes. Used as a
  completeness matrix: if a proposal claims to cover security but
  leaves one cell unaddressed (e.g. "we have confidentiality but no
  non-repudiation story for the management plane"), review should flag
  the gap. Also ITU-T X.810-X.816 for the individual framework
  breakdowns — authentication (X.811), access control (X.812),
  non-repudiation (X.813), confidentiality (X.814), integrity (X.815),
  audit and alarms (X.816).

## 1. Philosophy and Scope

capOS is explicitly a **research OS** whose design principle is "schema-first
typed capabilities, minimal kernel, reuse the Rust ecosystem." Three
consequences shape this proposal:

1. **The schema is part of the TCB.** A bug in the `.capnp` schema, or in the
   way generated code is patched for `no_std`, is exactly as dangerous as a
   bug in the kernel. The schema, the `capnpc` build pipeline, and the
   generated code all need review attention -- not only hand-written kernel
   code.
2. **The kernel should stay small.** "Everything else is a capability" means
   the TCB is naturally bounded. Verification effort scales with TCB size, so
   resisting kernel bloat is itself a security property.
3. **The interface is the permission.** Access control lives in capnp method
   definitions and in userspace cap wrappers (a narrow cap is a different
   `CapObject`), not in kernel rights bitmasks. Review must confirm that the
   kernel never short-circuits this: no ambient authority, no method that
   bypasses `CapObject::call`, no syscall that exposes an object without a
   capability handle.

Non-goals:

- Full functional-correctness proof of the kernel à la seL4. Infeasible in
  Rust today, and the payoff is low for a research system whose surface
  area is still changing.
- Proving information-flow / confidentiality properties end-to-end.
- Certifying a specific configuration for external deployment.

## 2. Trust Boundaries and Threat Model

Enumerating the boundaries forces every future review to ask "which
boundary does this change touch?" and picks out the code paths that matter.

### TCB Statement

Current demo/proof TCB is broader than the target production TCB. Security
claims must name which one they rely on.

Current demo/proof TCB:

- kernel, including scheduler, memory management, capability dispatch,
  endpoint IPC, in-kernel networking, smoltcp runtime, line discipline, Telnet
  IAC filtering, PCI/virtio-net smoke code, and kernel-owned DMA buffers;
- `capos-config`, schema/codegen output, manifest validator, and checked-in
  generated bindings;
- `capos-rt` runtime transport, userspace entry/panic/allocator glue, and typed
  handle release behavior;
- standalone `init`, `AuthorityBroker`, `SessionManager`, `CredentialStore`,
  shell launcher, restricted launcher, and demo services used by the active
  manifest;
- focused QEMU manifests, host harnesses, and build tools used to construct
  and validate each proof image;
- QEMU virtio devices and host-local loopback forwarding for networking proofs.

Target production TCB:

- kernel primitives that enforce address-space isolation, capability tables,
  generation/epoch checks, ring transport validation, scheduler/thread safety,
  interrupt/timer correctness, and explicit DMA/IOMMU policy;
- schema definitions, generated-code owner, shared ABI constants, and the
  build/signature path for production boot images;
- minimal init/supervisor authority needed to assemble the service graph,
  grant narrowed caps, restart services, and expose scoped status/audit;
- credential, session, broker, key-vault, audit, and remote-ingress services
  that directly decide authentication, authorization, disclosure, and key use;
- production device managers, network stack, and storage services only to the
  extent they hold the corresponding device, network, or persistence authority.

Target non-TCB components should include ordinary applications, untrusted
service binaries, domain libraries without privileged caps, shell children, and
network peers. The target is not reached while default networking runs in the
kernel TCB, the focused Telnet terminal-hosting fixture still relies on kernel
TCP terminal handoff, SSH uses fixture/dev key material, or remote shells share
pre-auth and post-auth process authority.

### Current boundaries

| Boundary | Who trusts whom | Code that enforces it |
|---|---|---|
| Ring 0 ↔ Ring 3 | kernel trusts nothing from user | `kernel/src/mem/paging.rs`, `kernel/src/mem/validate.rs`, `arch/x86_64/syscall.rs`; exercised by `init/` and `demos/*` |
| Kernel ↔ user pointer | kernel validates address + PTE perms under the process VM lock | `AddressSpace::validate_user_buffer`, `copy_from_user`, `copy_to_user`, and legacy `validate_user_buffer` for current-CR3 diagnostics |
| Manifest ↔ kernel | kernel parses capnp manifest at boot | `capos-config::manifest`, called from `kmain` |
| Build inputs ↔ TCB | kernel trusts schema/codegen/build artifacts | `schema/capos.capnp`, `build.rs`, `Cargo.lock`, `Makefile` |
| Host tools ↔ filesystem/process | tools must not let manifest/config input escape intended host boundaries | `tools/mkmanifest`, generators, CI scripts |
| ELF bytes ↔ kernel | kernel parses user ELF to map segments | `capos-lib::elf` |
| User ring ↔ kernel dispatch | kernel trusts no SQ state | `kernel/src/cap/ring.rs` |
| `CapObject::call` wire format | kernel trusts no params bytes | generated capnp decoders + impls |
| Process ↔ process IPC | kernel routes calls between mutually isolated address spaces and trusts neither side's buffers | `kernel/src/cap/endpoint.rs`, `kernel/src/cap/ring.rs`, `kernel/src/sched.rs` |
| Device DMA ↔ physical memory | kernel and device-manager trust no userspace driver-supplied device address, stale DMA handle, or stale interrupt route | `kernel/src/dma_backend.rs`, `kernel/src/device_dma.rs`, `kernel/src/device_manager/`, and the DDF cap objects select a DMA backend at boot, expose manager-owned bounce-buffer handles when no trusted remapping domain exists, hide host physical addresses/IOVAs from userspace providers, and bind `DeviceMmio`/`DMAPool`/`Interrupt` lifecycle to generation-checked ownership ledgers. The QEMU Intel path has bounded per-device remapping evidence; current no-IOMMU cloud/GCE paths are brokered bounce-buffer authority and still do not claim hostile bus-master isolation. |
| WASI host adapter sandbox | userspace wasm-host runs untrusted Preview 1 payloads inside the vendored wasmi interpreter; capOS trusts no wasm import beyond the explicit grant set on `HostState` | `capos-wasm/src/wasi/preview1.rs` translates wasm calls into typed `Console`/`Timer`/`BootPackage`/`EntropySource` invocations; per-instance argv text grants and `random_get` against the kernel `EntropySource` cap honor manifest-declared scope. Ungranted Preview 1 calls return `ERRNO_NOSYS` rather than fabricating authority. The boundary surface today covers W.1-W.4 (substrate, stdout-only stubs, argv grant, `random_get` production wiring); `wasi_args` and entropy fills are bounded by `WASI_ARGS_MAX_*` and `RANDOM_GET_MAX_BYTES`. Filesystem, environment beyond argv, full clocks, and remaining Preview 1 surface remain un-implemented refusals. |
| POSIX adapter v0 substrate | libcapos-posix exposes a narrow fork-for-exec / pipe / socket / clock surface to C code; capOS trusts that the recording-shim window stays scoped to the synthetic child branch and that explicit grants pass through `ProcessSpawner.spawn` | `libcapos-posix` per-process static fd table, single-thread errno cell, kernel `UdpSocket`/`Timer`/`Pipe` clients, and the recording-shim Move-grant `stdio_<N>` path. The pseudo-child branch never calls `_exit()` on `execve()` failure; surface remains research/v0, not a full POSIX TCB. |
| Persistent config overlay ↔ init | init trusts no bytes from the on-disk `system/config/overlay.bin`; it validates the overlay version, SHA-256 content hash, the base manifest's declared extension points (allowed service caps, max additional services, `minOverlayEpoch`, settings allowances), and base-pin non-collision before composing, and rejects the whole overlay (booting the base manifest floor) on any violation | `capos-config::manifest` (`SystemConfigOverlay::from_capnp_bytes` + `compose_onto`), `init/src/main.rs` `apply_config_overlay`; proof `make run-installable-overlay` |
| Hardware cap teardown audit | kernel must record every acquire, release, rollback-detach, Drop-detach-failure, explicit driver-crash trigger, reset/disable trigger, interrupt-waiter trigger, and explicit bounded proof-buffer free for the DDF caps so post-mortem review can correlate device-manager state with cap lifecycle | `kernel/src/cap/hardware_audit.rs` emit helper invoked from `device_mmio.rs`, `interrupt.rs`, `dma_pool.rs`, `dma_buffer.rs`, plus the `devicemmio_grant_source.rs`, `dmapool_grant_source.rs`, and `interrupt_grant_source.rs` userspace-grant rollback paths. The bounded `DMAPool` grant source emits `DmaPool` acquire for its manifest grant; `DMAPool.allocateBuffer` can mint one manager-attached proof `DMABuffer` result cap with its own acquire/free-buffer/release-after-free audit, while duplicate proof-buffer allocation and real DMA allocation remain blocked. Parent-first `DMAPool` release records a pending parent detach and completes after typed `DMABuffer.freeBuffer` frees the proof page, after cap release frees the proof page, or after successful `DMABuffer` driver-crash/reset-disable cleanup frees that page, preserving the one final `DmaPool` release audit. The real driver-crash teardown trigger entry points on `DeviceMmioCap`, `InterruptCap`, `DmaPoolCap`, and `DmaBufferCap` (`device_manager::trigger_driver_crash_for_*` plus each cap's `on_driver_crash`) emit `event=driver-crash` exactly once per successful detach; stale rerun stays silent. The reset/disable trigger entry points on all four cap types (`trigger_reset_disable_for_*` plus each cap's `on_reset_disable`) mirror that single-emit policy with `event=reset-disable`. The first cap-specific interrupt-waiter trigger on `InterruptCap` (`trigger_interrupt_waiter_for_interrupt` plus `InterruptCap::on_interrupt_waiter`) mirrors the same policy with exactly one `event=interrupt-waiter` audit record for the first successful detach. `DMABuffer.freeBuffer` emits exactly one `event=free-buffer` record on the successful explicit proof-buffer free, invalidates later `DMABuffer.info`, and leaves the later cap release as a no-op detach. The DMA pool reset path keeps the zero-live/quiesced/scrubbed evidence precondition, and the DMA buffer reset path reuses the bounded `FreeBuffer` page cleanup path before evidence-gated parent-pool cleanup. All explicit-trigger impls use the load-bearing exhaustive `match outcome.detach_label() { "ok" => emit event, "noop" => silent, label => kprintln + DropDetachFailed }` shape, so any future non-`"ok"`/non-`"noop"` outcome label still surfaces as a `DropDetachFailed` audit rather than being silently dropped. Each event is a `cap-audit:` key=value line on COM1 carrying the cap tag (interface id), event class, BDF, owner, and the relevant generation fields, and `emit_cap_audit` also appends it to a bounded volatile ring. `HardwareAuditLog.snapshot` exposes the latest retained records to userspace with drop-oldest retention while reporting `volatile-only` persistence, `unsigned` signatures, manifest-granted read-only snapshot access, production subscriber admission policy not implemented, and the volatile snapshot truncation contract; a QEMU-only local-ring proof asserts all four truncation labels without mutating the live ring. Durable storage, signing, and production subscriber admission remain future work. The legacy `hardware-cap-release:` line is retained alongside the audit line. |

### Attacker model

- **Untrusted service binaries.** Today's services are checked into the
  repo, but the manifest pipeline is meant to load arbitrary binaries
  eventually. Assume every byte of a service's SQEs, params buffers, result
  buffer pointers, and return addresses is attacker-controlled.
- **Untrusted manifest.** Once manifests are produced outside the repo (e.g.
  generated from CUE fragments, passed in as a Limine module), the manifest
  parser must reject every malformed input without panicking.
- **Resource exhaustion.** Once multiple mutually-untrusting services run,
  a service can attack by filling rings, endpoint queues, capability tables,
  frame pools, scratch arenas, logs, or CPU time. Boundedness and accounting
  are security properties, not performance polish.
- **Build input drift.** The schema/codegen path is already part of the TCB.
  External build inputs such as the bootloader checkout, Rust dependencies,
  capnp code generation, and generated-code patching must be reproducible
  enough that review can tell what changed.
- **Host tooling input.** Build tools and generators run with developer/CI
  filesystem access. Treat manifest/config-derived paths and command arguments
  as untrusted until bounded to the intended directory and execution context.
- **Residual state and disclosure.** Kernel logs, returned buffers, recycled
  frames, endpoint scratch space, and generated artifacts must not expose
  kernel pointers, stale bytes from another process, secrets, or build-system
  paths that increase attacker leverage.
- **Hostile interrupts / preemption.** The scheduler preempts at arbitrary
  points. Any kernel invariant that is only transiently true must be held
  under the right lock or with interrupts disabled.
- **Out of scope (for now):** physical attacks, speculative-execution side
  channels, malicious hardware, IOMMU bypass from DMA devices. These become
  in-scope once the driver stack lands; revisit the threat model then.

### Threat Actor Matrix

| Actor | Current scope | Current treatment | Production gate |
| --- | --- | --- | --- |
| Local physical attacker | Out of scope. | The prototype does not claim protection against physical memory access, bus probing, evil-maid boot replacement, cold boot, firmware compromise, or direct console access. | Secure/measured boot, sealed storage keys, physical console policy, and hardware-rooted attestation before production claims. |
| Malicious DMA device | Out of scope for hostile hardware; in scope only as confused userspace around cooperative QEMU virtio. | The virtio-net smoke assumes QEMU-provided cooperative virtio hardware and kernel-owned bounce buffers. Without an IOMMU, a bus-mastering device can DMA arbitrary RAM. | IOMMU-backed DMA domains or a documented hardware policy that forbids untrusted bus-mastering devices before userspace drivers or production hardware claims. |
| Malicious boot manifest | Partially in scope. | Manifest decoding/validation must fail closed and not panic. A manifest accepted by the kernel/init is still trusted to define the initial service graph and bootstrap grants. | Signed/authorized manifest policy, boot-package integrity, and review-visible payload hashes before accepting manifests from outside the repo or operator-controlled build path. |
| Compromised init/supervisor | Partially out of scope for current proofs. | Current demo TCB includes init and manifest-declared trusted services. If init is compromised, it can misgrant authority within the bootstrap service graph. | Minimize init, split supervisors, require narrow grant construction, audit graph changes, and make restart/update authority explicit. |
| Compromised service with narrow caps | In scope. | Address-space isolation, cap-table lookup, generation checks, ring validation, transfer checks, and resource ledgers should constrain it to granted authority. | Complete hostile smokes for transfer modes, resource exhaustion, panic surfaces, and revoke/epoch behavior per service class. |
| Hostile network peer | In scope only for loopback demo robustness, not production remote access. | Telnet is plaintext loopback-only. SSH gateway work is fixture/prototype status without complete encrypted transport, durable key/account storage, full OpenSSH userauth/channel handling, or complete audit gates. | Non-loopback remote shells stay blocked until SSH transport/auth/key/audit/storage gates pass and pre-auth/post-auth authority is isolated or otherwise proven constrained. |
| Hostile local web client of the remote-session-ui bridge | In scope. | Today's bridge shares one upstream capOS session across all loopback HTTP clients with no per-browser session, missing-`Origin` short-circuit, and non-constant-time secret comparison. | Per-browser `BrowserSession` cookie, CSRF/`Host`/Content-Type guards, strict CSP with the matching inline-script/style refactor, constant-time comparators, rate limiting, and the carry-over Tauri capability-allowlist minimization, all per `remote-session-ui-security-proposal.md`. |
| Malicious build dependency or tool | Partially in scope. | Lockfiles, generated-code checks, pinned Cap'n Proto/Limine/docs tools, and dependency-policy checks make drift review-visible, but Rust nightly, QEMU/xorriso/OVMF, and final image hashes are not fully pinned. | Date/hash-pinned toolchains, recorded host tool versions, image/payload hashes, and reproducible production build path. |

### ITU-T X.800 security-services completeness matrix

X.800 enumerates five security services; X.805 extends the list with
availability and privacy. Each review of a proposal or kernel change
should be able to say which service it touches, or that it touches
none. The point is not to implement every cell — capOS explicitly
defers some (end-to-end non-repudiation, for example) — it is to make
gaps explicit.

| X.800/X.805 service | capOS surface that provides it |
|---|---|
| Authentication (peer entity, data origin) | `user-identity-and-policy-proposal.md` X.1254 LoA tiers; passkey + password credentials in `boot-to-shell-proposal.md`; certificate-based peer auth in `certificates-and-tls-proposal.md` (mTLS); future attestation in `cryptography-and-key-management-proposal.md` `AttestationKeySource`. |
| Access control | Structural: the capability model itself. The interface is the permission; wrapper caps attenuate; `CapTable` cannot be bypassed. Policy layer: `AuthorityBroker` (X.812 ADF) over `CapObject::call` (X.812 AEF). |
| Data confidentiality | Transport: `certificates-and-tls-proposal.md` `TlsSocket`. At rest: `volume-encryption-proposal.md`. In memory: address-space isolation + SMAP + SMEP. |
| Data integrity | Transport: TLS AEAD. At rest: authenticated block encryption (`SymmetricAlgorithm.aes256GcmSiv` etc.). Manifest/boot: signed manifests (`storage-and-naming-proposal.md` Open Q #5). In-transit schema: Cap'n Proto wire format + bounds-checked decoders. |
| Non-repudiation (origin, delivery) | Partial. Signed audit records (`system-monitoring-proposal.md` + `cryptography-and-key-management-proposal.md` audit key purpose). End-to-end non-repudiation for user actions is deferred until signed sessions exist. |
| Availability (X.805) | Resource ledgers, bounded rings, `CAP_OP_RELEASE`, supervisor restart policy, rate limiters on monitoring ingestion. DoS resistance is a review dimension, not a separate subsystem. |
| Privacy (X.805) | Principal pseudonymity (`user-identity-and-policy-proposal.md` pseudonymous profile), audit-record redaction, monitoring "payload capture is exceptional" default. |

The matrix is a checklist, not a claim of completeness: individual
proposals remain authoritative about what they do and don't provide.

## 3. Tiered Approach

Four tiers, cheapest first. Each tier is independently useful, and later
tiers assume earlier ones are in place.

### Tier 1 -- Hygiene and CI (cheap, high value)

These are the controls that make every other tier work. The only checked-in
GitHub Actions workflow is `.github/workflows/ci.yml`; it runs formatting,
host tests, `cargo build --features qemu`, `make capos-rt-check`,
`make generated-code-check`, `make dependency-policy-check`, and
`make workflow-check`. The QEMU smoke job installs its own boot tools and
runs `make` plus `make run`, but remains non-blocking, so it is not yet a
required boot assertion. No separate clippy, miri, fuzz, or Kani workflow
files exist yet -- those are scheduled per the track table below.

- **Continuous integration** via GitHub Actions (or equivalent). Current
  baseline: `make fmt-check`, `cargo test-config`, `cargo test-ring-loom`,
  `cargo test-lib`, `cargo test-mkmanifest`, `cargo build --features qemu`,
  `make capos-rt-check`, `make generated-code-check`,
  `make dependency-policy-check`, and `make workflow-check`. Remaining CI
  work: treat QEMU boot as a required CI gate once runtime flakiness is
  acceptable, then add the security policy jobs below.
- **`cargo clippy --all-targets -- -D warnings`** across workspace members,
  with a curated set of `clippy::pedantic` / `clippy::nursery` lints that
  pay off for kernel code (`clippy::undocumented_unsafe_blocks`,
  `clippy::missing_safety_doc`, `clippy::cast_possible_truncation`, etc.).
  Do NOT enable all of pedantic blindly -- review each lint and either enable
  it or add a rationale comment.
- **`cargo-deny`** for license and advisory gating; **`cargo-audit`** for
  the RustSec advisory DB against `Cargo.lock`. Dependencies include
  `capnp`, `spin`, `x86_64`, `limine`, `linked_list_allocator` -- all
  externally maintained.
- **`cargo-geiger`** report of unsafe surface area per crate, checked in as
  a snapshot and diffed in CI so growth is visible in PRs.
- **Deny `unsafe_op_in_unsafe_fn`** (already required by edition 2024; make
  sure it stays on) and `missing_docs` on public kernel items where it is
  not already the case.
- **Dependency review discipline**: every new dep needs a one-line
  rationale in the commit message and a check that it is `no_std`-capable,
  maintained, and does not pull in a surprise async runtime or heavy
  transitive graph.
- **No-std dependency rubric**: kernel/no_std additions require an explicit
  compatibility check that `core`/`alloc` paths do not regress to `std`
  through default feature drift, and class ownership is recorded against
  `docs/trusted-build-inputs.md`.
- **Boot/build input pinning**: pin external bootloader/tool downloads to an
  auditable revision or checksum. Branch names are not enough for TCB inputs.
  CI should fail when generated capnp bindings or no-std patching change
  outside an intentional schema/codegen update.
- **Untrusted-path panic audit**: `panic!`, `assert!`, `.unwrap()`, and
  `.expect()` are acceptable during bring-up, but every path reachable from
  manifest bytes, ELF bytes, SQEs, params buffers, result buffers, and future
  IPC messages needs either a fail-closed error or a documented halt policy.
- **Hardware protection smoke tests**: boot under QEMU with SMEP/SMAP-capable
  CPU flags and assert CR4.SMEP/CR4.SMAP once paging is initialized. Every
  explicit user-memory dereference must be wrapped in a short STAC/CLAC
  window once SMAP is enabled.

### Tier 2 -- Targeted dynamic analysis

Aimed at the host-testable pure-logic crates (`capos-lib`, `capos-config`)
where the Rust toolchain just works. No kernel changes required.

- **Miri** on the `cargo test-lib` and `cargo test-config` suites. Catches
  UB in pure-logic code: invalid pointer arithmetic, uninitialized reads,
  bad provenance, unsound `unsafe`. The FrameBitmap and CapTable tests in
  particular push against slot indexing, generation counters, and raw
  `&mut [u8]` handling -- exactly what miri is good at.
- **`proptest`** (or `quickcheck`) on:
  - `capos-lib::elf::parse` -- random bytes / random perturbations of a
    valid header must never panic and must refuse anything that isn't a
    correctly formed user-half ELF64.
  - `capos-lib::frame_bitmap` -- interleaved sequences of `alloc`,
    `alloc_contiguous`, `free`, `mark_used` preserve the invariant
    `free_count == popcount(bitmap == 0)` and never double-free.
  - `capos-lib::cap_table` -- insert/remove/lookup sequences preserve
    "every returned id resolves to its insertion-time object, and stale
    ids are rejected."
  - `capos-config::manifest` encode/decode round trip on arbitrary
    manifests.
  - Schema round-trip tests in `capos-config/tests/`: today
    `remote_capnp_rpc_dto_roundtrip.rs` pins the remote capnp-rpc DTO
    wire shape, and `remote_paperclips_dto_roundtrip.rs` (10 tests)
    pins the Remote Session Paperclips DTO wire shape ahead of the
    future gateway/worker/browser bridge that will marshal traffic
    through them. New shared-DTO families should land alongside
    similar round-trip coverage so schema drift is review-visible.
- **`cargo fuzz`** harnesses (libFuzzer). The current `fuzz/fuzz_targets/`
  set is seven targets: `elf_parse.rs`, `manifest_capnp.rs`,
  `mkmanifest_json.rs`, `sqe_validation.rs` (ring SQE wire validator via
  `capos_config::ring::sqe_wire_validation_error`), `telnet_filter.rs`,
  `telnet_filter_roundtrip.rs`, and `line_discipline.rs`. The Telnet
  round-trip oracle exists alongside the structural Telnet filter target
  because the round-trip variant found a real EXOPL parsing bug
  (`docs/changelog.md`). These run outside CI (they never terminate) but
  have seed corpora under `fuzz/corpus/` and can be exercised in fixed
  budgets via `make fuzz-build` and `make fuzz-smoke`.
- **Sanitizers on host tests**: `make sanitizer-host-tests` runs
  AddressSanitizer over the `capos-lib` and `capos-config` host suites
  under the repo-pinned nightly (zero findings to date). ASan is indeed
  cheap -- it needs no `-Zbuild-std`. ThreadSanitizer
  (`make sanitizer-host-tests-tsan`) is wired but currently blocked by an
  upstream cargo `-Zbuild-std` + build-script limitation when the
  sanitizer target equals the host triple; see Track S.17 for the
  recorded reproduction.

### Tier 3 -- Concurrency model checking

The capability ring is a lock-free single-producer / single-consumer
protocol using volatile reads, release/acquire fences, and a shared head/
tail pair. It is the most likely source of subtle memory-ordering bugs and
is also the most isolated -- a perfect fit for model checking.

- **Loom** on a host-buildable wrapper of the ring protocol. Extract the
  producer/consumer state machine from `capos-config::ring` into a form
  where atomics can be swapped for `loom::sync::atomic`, and write Loom
  tests that enumerate all interleavings of producer/consumer for small
  ring sizes (2--4 slots). Properties to check:
  - No CQE is lost.
  - No CQE is double-delivered.
  - The `sq_head/sq_tail` and `cq_head/cq_tail` pointers never observe a
    state that implies `tail - head > SQ_ENTRIES`.
  - The userspace ring "corrupted producer state" fail-closed policy from prior
    review-finding task records holds under adversarial interleavings.
- **Shuttle** as a lighter alternative for regression-style tests once the
  specific bugs are known; cheaper per run, randomised rather than
  exhaustive. Good for long-running overnight jobs.

Loom coverage here is disproportionately valuable: it substitutes for the
SMP-hardness work the project has explicitly deferred, and it exercises
exactly the ordering that TOCTOU-style bugs hide in.

### Tier 4 -- Bounded verification of specific invariants

Not a full-kernel proof. Targeted, property-specific, one-module-at-a-time.

- **Kani** (bounded model checking for Rust, via CBMC). Good fit for
  small, heap-free, arithmetic-heavy functions. Candidate modules:
  - `capos-lib::cap_table` -- prove that for all `insert; remove; insert'`
    sequences under a `u8` generation counter, a stale CapId never
    resolves. Bound: table size ≤ 4, generation window ≤ 256.
  - `capos-lib::frame_bitmap` -- prove that for all bitmap sizes up to N
    bytes, `alloc_frame` followed by `free_frame` of the same frame
    restores the original bitmap and `free_count`.
  - `capos-lib::elf::parse` bounds checks: prove that every index into the
    program header table is `< len`, given the validated `phentsize` and
    `phnum`.
- **Verus** (SMT-based Rust verifier, active development at MSR) for
  invariants that Kani can't handle ergonomically, particularly those
  involving loops and ghost state. Worth tracking but don't commit to it
  yet -- the proof-engineering cost is real, and the tool is still young.
  Revisit once IPC lands and the kernel has stable public APIs.
- **Creusot** / **Prusti** are alternatives in the same space. Do not
  invest in more than one SMT-based verifier; pick whichever has the best
  story for `no_std + alloc` code when Tier 4 starts.

Deliberately out of scope: Isabelle/HOL, Coq proofs, Frama-C. They would
require re-encoding Rust in a foreign semantic framework with no
established Rust front-end mature enough for kernel code.

## 4. Security Review Process

`REVIEW.md` is the rules document and `docs/tasks/**` is the open remediation
and review-finding ledger. `REVIEW.md` contains the common security checklist
that applies across kernel, userspace, host tooling, generators, and CI. The
per-boundary prompts below are an expansion of that common checklist for
OS-specific code paths.

### CWE/CAPEC tagging policy

Security findings should carry CWE metadata when the mapping is specific enough
to help a reviewer or future audit. Do not force a CWE into every title.

- Prefer Base/Variant CWE IDs when the root cause is known: CWE-770 for
  unbounded allocation, CWE-88 for argument injection, CWE-367 for a concrete
  validation-to-use race, CWE-416 for a real use-after-free.
- Use Class IDs as temporary or umbrella labels: CWE-20 for "input was not
  validated enough" before the missing property is known; CWE-400 for general
  resource exhaustion only when the enabling mistake is not more precise.
- Use capability-kernel invariants instead of weak CWE mappings for design
  properties such as "no ambient authority", "cap transfer happens exactly
  once", "revocation cannot leave stale authority", and "scheduling context
  donation cannot fabricate CPU authority". Cite CWE-862/CWE-863 only when
  the issue is actually a missing or incorrect authorization check.
- Use CAPEC for the attacker pattern when useful: input manipulation, command
  injection, race exploitation, flooding, or path/file manipulation. CAPEC is
  not a substitute for the CWE root-cause tag.

Current checklist coverage:

| Area | Primary tags | Review intent |
|---|---|---|
| Structured input validation | CWE-20, CWE-1284--CWE-1289 when precise | Validate syntax, type, range, length, indexes, offsets, and cross-field consistency before privileged use |
| Filesystem paths | CWE-22, CWE-23, CWE-59 | Keep host-tool paths inside intended roots across absolute paths, traversal, symlinks, and file-type confusion |
| Commands/processes | CWE-78, CWE-88 | Avoid shell interpolation; constrain binaries and arguments |
| Numeric/buffer bounds | CWE-190, CWE-125, CWE-787 | Check arithmetic before pointer, slice, copy, ELF segment, and page-table use |
| Resource exhaustion | CWE-770 preferred; CWE-400 broad | Bound queues, allocations, retries, spin loops, frames, scratch arenas, cap slots, and CPU budget |
| Exceptional paths | CWE-703, CWE-754, CWE-755; CWE-248 only for uncaught exceptions | Fail closed on malformed or adversarial input; avoid trust-boundary panic/abort |
| Authorization/cap authority | CWE-862, CWE-863 plus capOS invariants | Verify capability ownership, generation, object identity, address-space ownership, and transfer policy |
| Concurrency/TOCTOU | CWE-362, CWE-367, CWE-667 | Preserve lock ordering, interrupt masking, page-table stability, and validation-to-use assumptions |
| Lifetime/reuse | CWE-416, CWE-664, CWE-672 | Prevent stale caps, stale kernel stacks, stale frames, and expired IPC state from being used |
| Disclosure/residual data | CWE-200, CWE-226 | Prevent logs, result buffers, frames, scratch arenas, and generated artifacts from leaking stale or sensitive data |
| Supply chain / generated TCB | capOS TCB invariant; use CWE only for concrete bug | Pin or review-visible drift for bootloader, dependencies, schema/codegen, generated code, and patching |

### Per-boundary review checklist

- **Syscall surface change** (`arch/x86_64/syscall.rs`):
  - Every register-passed argument is treated as attacker-controlled.
  - No user pointer is dereferenced without an `AddressSpace`-locked
    copy/read helper or an explicitly documented equivalent stability
    guarantee.
  - Numeric conversions, copy lengths, and pointer arithmetic are checked
    before constructing slices or entering any direct user-access scope.
  - Kernel stack pointer and TSS.RSP0 invariants are preserved.
  - The syscall count stays bounded; a new syscall has an SQE-opcode
    alternative considered and explicitly rejected with rationale.
- **Ring dispatch change** (`kernel/src/cap/ring.rs`):
  - SQ bounds check and per-dispatch SQE limit still enforced.
  - Corrupted SQ state fails closed (never re-processes the same bad
    state on the next tick).
  - No allocation in the interrupt-driven path beyond what the owning task
    record or panic-surface inventory explicitly accepts.
  - Result buffers and endpoint scratch buffers cannot leak stale bytes
    beyond the returned completion length.
- **User buffer validation change** (`kernel/src/mem/paging.rs`,
  `kernel/src/mem/validate.rs`):
  - Address range check precedes PTE walk.
  - PTE flags checked: present, user, and write (if the buffer is
    written).
  - For process-owned buffers, validation and copy/read hold the process
    `AddressSpace` mutex. Any current-CR3 validator caller must document its
    own page-table stability guarantee.
- **ELF loader change** (`capos-lib::elf`):
  - Every field bounded before use (phentsize, phnum, p_offset, p_filesz,
    p_memsz, p_vaddr).
  - Segments confined to the user half.
  - Overlap check preserved.
  - Integer arithmetic uses checked add/subtract before deriving mapped
    addresses, file slices, or zero-fill ranges.
- **Manifest change** (`capos-config::manifest`):
  - Every optional field is either present or the service is rejected.
  - Name / binary / cap source strings are length-bounded.
  - Unknown / unsupported numbers in CUE input fail-closed with a path-
    specific error.
  - Capability grants are checked as an authority graph before any rejected
    graph can start a service.
- **Schema change** (`schema/capos.capnp`):
  - Backward-compatible with existing wire format, or migration documented.
  - Every new method has an explicit capability-granting story (who mints
    the cap that lets this method be called?).
  - Generated code `no_std` patching still applies.
- **Host tool or generator change** (`tools/*`, `build.rs`, CI scripts):
  - Manifest/config-derived paths cannot escape intended directories through
    absolute paths, traversal, symlinks, or file-type confusion.
  - External command execution uses explicit binaries and argument APIs, not
    shell interpolation of untrusted strings.
  - Generated outputs are review-visible and fail closed on malformed inputs.
  - Generated files and diagnostics do not disclose secrets, absolute paths,
    or stale build outputs beyond what the developer intentionally requested.
- **Unsafe block added or expanded**: Tier 1 clippy lints plus REVIEW.md
  §"Unsafe Usage" checklist already cover this; the review should cite
  the specific invariant being maintained in the commit message.

### Threat-model refresh

On every stage completion (Stage 6 IPC, Stage 7 SMP, first driver landing,
first time a manifest comes from outside the repo), re-run §2 of this
document and update it. The list of trust boundaries grows over time; the
proposal decays if it doesn't grow with the code.

### Periodic full audit

Once per stage, schedule a focused audit pass:

1. Re-verify every boundary's code is still enforced at its documented
   entry point (no new bypass path).
2. Re-run all Tier 2/3 jobs with the latest toolchain (catches
   tool-upgrade regressions).
3. Walk through open review-finding task records and confirm each is still
   correctly classified (still open, fixed, explicitly accepted, blocked, or
   on-hold).
4. Record the audit date and outcome in the relevant task records or a focused
   closeout task, matching the repository timestamp convention.

## 5. Concrete Verification Targets

Ordered by value and feasibility. Each one is a specific, bounded piece of
work a contributor can pick up without needing to redesign the kernel.

| # | Target | Tier | Property | Blocker |
|---|---|---|---|---|
| 1 | `capos-lib::cap_table` | 4 (Kani) | Stale CapId never resolves after slot reuse within the generation window | None |
| 2 | `capos-lib::frame_bitmap` | 4 (Kani) | `alloc`/`free` preserve `free_count` invariant; no double-alloc | None |
| 3 | `capos-lib::elf::parse` | 2 (proptest + fuzz) | No panic on arbitrary input; only well-formed user-half ELF64 accepted | None |
| 4 | `capos-config::manifest` | 2 (proptest + fuzz) | Decode/encode round-trip; malformed input rejected without panic | None |
| 5 | Ring SPSC protocol | 3 (Loom) | No lost/doubled CQEs; fail-closed on corruption under all interleavings | Extract protocol into Loom-testable wrapper |
| 6 | `AddressSpace` user-buffer helpers | 4 (Kani) | Every accepted buffer lies entirely in user half with correct PTE flags, and validation/use happens under the address-space lock | Formalise PTE and locking model |
| 7 | Ring dispatch path | 3 (Loom + proptest) | SQE poll is bounded per tick; no allocation on the dispatch path | Initial alloc-free synchronous path landed; async transfer/release paths still need coverage |
| 8 | IPC routing | 3 | Capabilities transferred exactly once; no duplication under direct-switch | Capability transfer |
| 9 | Direct-switch IPC handoff | 2 + 3 | Scheduler invariants preserved when a blocked receiver bypasses normal run-queue order | Loom-testable scheduler/ring model |
| 10 | SMEP/SMAP + user access windows | 1 + QEMU integration | Kernel cannot execute user pages; direct user-memory touches either use audited access windows or the AddressSpace/HHDM copy path | Wire existing x86_64 helper into init path |
| 11 | Manifest authority graph | 2 (property tests) | Every granted cap source resolves, every export is unique, and no service starts after a rejected graph | Manifest executor path |
| 12 | Resource accounting | 2 + 3 | Rings, endpoints, cap tables, scratch arenas, frames, and CPU budget fail closed under exhaustion | Security Verification Track S.9 design complete; implementation hooks pending |
| 13 | Build/codegen TCB | 1 | Bootloader/deps/codegen inputs are pinned and generated output changes are review-visible | CI bootstrap |
| 14 | Device DMA boundary (future) | 1 + design review | No driver or device can DMA outside explicitly granted buffers | PCI/device work; IOMMU or bounce-buffer decision |

Targets 1--4 are feasible today and should be the first batch of work.
Target 10 is the security gate before treating Stage 6 services as
untrusted. Targets 11--12 should be designed before capability transfer
lands, otherwise the first IPC implementation will bake in ambient resource
authority. Target 14 gates user-mode or semi-trusted drivers.

Current status as of 2026-05-16:

- Targets 1--2 are part of the completed **Verified Core** visible milestone:
  commit `d43b691` at `2026-04-23 22:09 UTC` made `make kani-lib` the
  bounded local/GitHub proof gate for cap-table and frame-bitmap invariants,
  and commit `c5968ee` at `2026-04-23 22:12 UTC` recorded the
  high-memory `make kani-lib-full` Cloud Build gate.
- Target 3 has arbitrary-input proptest coverage and a cargo-fuzz target for
  ELF bytes. The current Kani harness still only proves the short-input
  early-reject path because fully symbolic ELF parsing reaches allocator and
  sort internals before there is a sharper proof obligation.
- Target 4 has cargo-fuzz coverage for manifest decoding/roundtrip and
  mkmanifest exported-JSON conversion.
- Target 5 has a feature-gated Loom model for the shared ring protocol.
- Target 13 has an initial CI baseline plus generated-code drift checking,
  dependency audit/deny gates, and required QEMU boot still open. Remaining
  supply-chain provenance work is tracked by
  `docs/tasks/trusted-build-inputs-pr-blocking-provenance.md`; panic-surface
  hardening remains tracked by its owning task records across IPC/scheduler
  guarded unwraps, rollback restoration, stale queues, blocking waits,
  process/thread exit, endpoint cancellation, TLB shootdown send failures, and
  scheduler hot-path expects. Scheduler hot-path panic
  surface fully closed (2026-05-17, REVIEW_FINDINGS commit `1b295cb3`):
  all `.expect()` / `.unwrap()` in `block_current_on_cap_enter`,
  `next_start_context`, `schedule`, `exit_current`,
  `exit_current_thread`, `capos_block_current_syscall`, and
  `retain_endpoint_queue` hardened per the established let-else + log +
  drop-lock + `hcf()` / `return None` / `break` pattern (per-function
  closures at `7f86796f` / `777e0b3a` / `0af439d4` / `7d93aea4` /
  `b04d6d65` / `2bea189c`).
- Out-of-band scheduler/runtime hazards tracked in review-finding task records
  but not yet expressed as Concrete Verification Targets above: current
  post-AP kernel upper-half page-table mutation through the MMIO/firmware
  helper path is closed by kernel-wide TLB shootdown plus preseed/fail-closed
  PML4-slot handling
  (`../tasks/done/2026-06-07/kernel-upper-half-pml4-propagation-hardening.md`);
  future helper windows or allocator-growth paths that need a new kernel-half
  PML4 slot still require boot preseed or synchronized live-root propagation.
  ParkSpace
  unmap/reuse cleanup still owes shared park-word cleanup and
  address-space generation cleanup; resource quota fields for scratch
  bytes, outstanding calls, endpoint queues, and in-flight calls need
  real wiring or removal. Each is owned by its respective subsystem
  proposal; the consolidated routing index lives in
  `docs/design-risks-register.md`.

## 6. Security Verification Track Registry

The `S.x` labels are registry identifiers for this proposal's
security-verification track. They are not product stages and should be expanded
as "Security Verification Track S.x" when cited outside this proposal.

| Track | Name | Status | Primary document or evidence |
| --- | --- | --- | --- |
| S.1 | CI bootstrap | Landed 2026-04-21 | `.github/workflows/ci.yml` |
| S.2 | Miri + proptest on capos-lib | Landed 2026-04-21 | `cargo test-lib`, `cargo miri-lib` |
| S.3 | Manifest + mkmanifest fuzzing | Landed 2026-04-21 | `fuzz/` manifest and mkmanifest targets |
| S.4 | Ring Loom harness | Landed 2026-04-21 | `capos-config/tests/ring_loom.rs` |
| S.5 | Kani on capos-lib | Initial landed 2026-04-21, expanded bounded gate landed 2026-04-23 | `make kani-lib` |
| S.6 | Security review docs stay aligned | Ongoing | `REVIEW.md`, `CLAUDE.md` |
| S.7 | Stage-6-aware refresh | Planned/ongoing | Trust-boundary inventory after Stage 6 changes |
| S.8 | Untrusted-service hardening gate | Planned | SMEP/SMAP, user access windows, hostile-userspace tests |
| S.9 | Authority graph and resource accounting | Landed 2026-04-21 | `docs/authority-accounting-transfer-design.md` |
| S.10 | Supply-chain and generated-code TCB | Partially landed | `docs/trusted-build-inputs.md` |
| S.11 | Device/DMA isolation gate | Design accepted; brokered-bounce DDF production authority gates landed for the current local/GCE path, while direct-remapping and hostile-hardware claims remain future | `docs/dma-isolation-design.md` |
| S.12 | Kani harness bounds refresh | Planned | Future transfer/accounting/user-buffer proof obligations |
| S.13 | ELF parser arbitrary-input coverage | Landed | `capos-lib::elf::parse`, `fuzz/fuzz_targets/elf_parse.rs` |
| S.14 | Telnet IAC filter fuzz coverage | Landed 2026-04-27 16:33 EEST | `capos-lib::telnet`, `fuzz/fuzz_targets/telnet_filter.rs` |
| S.15 | Telnet differential round-trip + line-discipline extraction | Landed 2026-04-27 17:18 EEST | `capos-lib::line_discipline`, Telnet round-trip fuzz target |
| S.16 | Ring SQE wire-validation extraction + fuzz target | Landed 2026-04-27 19:42 EEST | `capos_config::ring::sqe_wire_validation_error`, `fuzz/fuzz_targets/sqe_validation.rs` |
| S.17 | Sanitizers on host tests | ASan landed (zero findings); TSan blocked upstream | `make sanitizer-host-tests` / `make sanitizer-host-tests-tsan` |

### Track Details

This slots into docs/tasks/README.md as a cross-cutting track rather than a phase
-- items are independent of Stage 6 IPC and can proceed in parallel.

Subtracks are scoped identifiers under their parent track:

| Subtrack | Parent | Name | Primary document or evidence |
| --- | --- | --- | --- |
| S.10.0 | S.10 | Trusted build input inventory | `docs/trusted-build-inputs.md` |
| S.10.2 | S.10 | Generated-code drift check | `make generated-code-check` |
| S.10.3 | S.10 | Dependency policy and no_std review gate | `make dependency-policy-check`, `deny.toml` |
| S.11.1 | S.11 | DMA capability invariants | `docs/dma-isolation-design.md` |
| S.11.2 | S.11 | Userspace-driver ownership-transition gate | `docs/dma-isolation-design.md` |

Security Verification Track S.11.2 defines checklist rows S.11.2.0 through
S.11.2.9 in `docs/dma-isolation-design.md`; those row labels are local
acceptance criteria for the userspace-driver transition, not independent
registry tracks.

- **Track S.1 -- CI bootstrap** -- landed 2026-04-21
  - `.github/workflows/ci.yml`: fmt-check, test-config, test-ring-loom,
    test-lib, test-mkmanifest, `cargo build --features qemu`,
    `make capos-rt-check`, generated-code drift checking, and dependency
    policy checking.
  - QEMU smoke installs `build-essential`, `capnproto`, `qemu-system-x86`,
    `xorriso`, and `cue` v0.16.0 before running `make` and `make run`; it
    remains optional/non-blocking until boot runtime is stable enough to make
    it a required gate.
  - Clippy-with-deny and cargo-geiger remain future hardening jobs.
- **Track S.2 -- Miri + proptest on capos-lib** -- landed 2026-04-21
  - Add `proptest` dev-dependency to `capos-lib`.
  - Host properties for `capos-lib::cap_table` and
    `capos-lib::frame_bitmap`; ELF arbitrary-input coverage is tracked
    separately under landed Security Verification Track S.13.
  - `cargo test-lib` runs the native host suite; `cargo miri-lib` runs the
    same crate under Miri.
- **Track S.3 -- Manifest + mkmanifest fuzzing** -- landed 2026-04-21
  - `fuzz/` crate with harnesses for `manifest::decode` and
    `tools/mkmanifest` CUE → capnp pipeline. Seed corpus checked in.
- **Track S.4 -- Ring Loom harness** -- landed 2026-04-21
  - Extract the SPSC protocol from `capos-config::ring` into a test-only
    wrapper where atomics are swappable.
  - Loom tests covering corruption, overflow, and ordering.
  - Doubles as regression coverage for Phase 1.5 in docs/tasks/README.md.
- **Track S.5 -- Kani on capos-lib** -- initial harnesses landed 2026-04-21,
  expanded bounded gate landed 2026-04-23
  - CapTable generation/index/stale-reference invariants.
  - FrameBitmap fail-closed free-error behavior plus a concrete bounded
    contiguous-allocation proof.
  - Transfer/resource-accounting fail-closed invariants for cap-slot
    preflight, frame-grant reservation, invalid transfer-origin rejection,
    move-reservation rollback after revocation, source visibility/accounting
    after the real `prepare_copy_transfer` path, and provisional destination
    cap-slot/frame-grant ledger restoration.
  - Propagation of real prepared transfer metadata into a provisional
    destination slot is reserved for `make kani-lib-full`; Google Cloud Build
    run `95b49620-06a5-49f4-85e6-782adb82d11c` passed this high-memory gate on
    2026-04-23.
  - ELF parser short-input early-reject panic-freedom exists as a targeted Kani
    harness but is not part of the mandatory bounded gate.
  - The current bounds are intentionally conservative so `make kani-lib`
    remains a practical local/GitHub CI gate; broader symbolic ELF and
    contiguous-allocation proofs should wait for more specific invariants or
    high-memory runners.
- **Track S.6 -- Security review docs stay aligned**
  - Keep REVIEW.md's common security checklist aligned with §4's boundary
    prompts as new boundaries land.
  - Add a "threat model refresh" step to the stage-completion workflow in
    CLAUDE.md.
- **Track S.7 -- Stage-6-aware refresh**
  - Re-run §2 trust-boundary inventory after capability transfer/release
    semantics land.
  - Plan Loom coverage for cross-process routing and direct-switch IPC.
  - Carry the inventory through the active scheduler-evolution phases
    (Phase D WFQ, Phase E `SchedulingContext`, Phase F one-SQ-consumer and
    nohz telemetry) and the WASI host-adapter surface (Phase W.4 entropy
    production wiring + per-instance argv text grant) so each new boundary
    is reflected in §2 before it can be relied on. The WASI host adapter
    is a userspace trust boundary -- wasmi sandbox around untrusted
    Preview 1 payloads with per-instance `EntropySource` / argv grants --
    that the Tier 2/3 plan should explicitly cover as new harness targets
    emerge (see `docs/proposals/wasi-host-adapter-proposal.md`).
  - The Phase 1 monitoring log surface (`LogSink`/`LogReader`,
    `kernel/src/cap/log.rs`) is a new kernel boundary: a `LogSink` accepts
    bounded userspace-supplied records (decoded, length-truncated, severity-
    filtered against `SystemConfig.logLevel`) into a bounded drop-oldest ring,
    and a scoped `LogReader` serves cursor/filtered snapshots. It confers no
    transfer/grant authority beyond the scoped sink/reader and adds no ambient
    log namespace. Carry it in §2 before downstream services rely on it;
    per-process log token-bucket backpressure remains future work
    (`docs/proposals/system-monitoring-proposal.md`).
- **Track S.8 -- Untrusted-service hardening gate**
  - Wire SMEP/SMAP enablement into x86_64 init after paging is live.
  - Replace raw user-slice construction in syscall/ring paths with checked
    copy/access helpers that bracket the actual access with STAC/CLAC.
  - Add QEMU hostile-userspace tests for bad pointers, kernel-half pointers,
    invalid caps, corrupted rings, and services without Console authority.
  - Audit untrusted-input paths for panics before Stage 6 endpoints run
    mutually-untrusting processes.
- **Track S.9 -- Authority graph and resource accounting** -- landed 2026-04-21
  - Concrete design is captured in
    `docs/authority-accounting-transfer-design.md`.
  - Defines authority graph invariants, per-process quota ledger
    (`cap slots`, `endpoint queue`, `outstanding calls`, `scratch`,
    `frame grants`, `log volume`, `CPU budget`), diagnostic aggregation, and
    exactly-once transfer/rollback semantics.
  - Establishes acceptance criteria that gate capability transfer and
    ProcessSpawner implementation. Current follow-up items live in
    `docs/backlog/stage-6-capability-semantics.md`.
- **Track S.10 -- Supply-chain and generated-code TCB**
  - Pin Limine and other external build inputs by revision/checksum rather
    than branch name.
  - Make capnp generated-code changes review-visible in CI, including the
    no-std patching step.
  - Consider `cargo-vet` only after `cargo-deny`/`cargo-audit` are in place;
    vetting too early is process theater.
  - Security Verification Track S.10.3 adds a concrete dependency policy:
    no_std additions are accepted only with class attribution, `cargo deny` +
    `cargo audit`, and explicit lockfile intent.
  - Security Verification Track S.10.3 enforcement is
    `make dependency-policy-check`, backed by `deny.toml` and pinned CI
    installs of `cargo-deny 0.19.4` and `cargo-audit 0.22.1`.
- **Track S.11 -- Device/DMA isolation gate**
  - The DMA isolation story is now runtime-selected and fail-closed:
    guest-programmable remapping only when capOS can discover, program, and
    validate it; otherwise labeled brokered bounce buffers or unsupported.
  - `DMAPool`, `DeviceMmio`, and `Interrupt` invariants are represented by done
    task evidence for bounded physical/device-visible ranges, explicit
    interrupt ownership, reset/release teardown, generation checks, and no raw
    host-physical grants to untrusted drivers.
  - The current GCP/no-IOMMU userspace-provider path is brokered bounce-buffer
    authority. It supports the proved virtio-net and NVMe provider chains
    without claiming direct DMA, IOVA export, hostile bus-master isolation, or
    device-autonomous MSI-X delivery.
  - The DDF production-authority closeout closes the retained review finding
    for the current brokered-bounce provider path. Security Verification Track
    S.11.2 remains the canonical matrix for future direct-remapping/vIOMMU,
    hostile-hardware isolation, and broader device-owner claims.
- **Track S.12 -- Kani harness bounds refresh**
  - Revisit Kani bounds and harness shape once capability transfer,
    resource-accounting, or `AddressSpace` user-buffer helpers expose concrete
    proof obligations.
  - Prefer actionably narrow properties over arbitrary symbolic parser
    exploration that spends verifier time in allocator or sort internals.
- **Track S.13 -- ELF parser arbitrary-input coverage** -- landed
  - `capos-lib::elf::parse` has proptest coverage for arbitrary bytes and
    valid-header perturbations.
  - `fuzz/fuzz_targets/elf_parse.rs` exercises ELF bytes through cargo-fuzz.
- **Track S.14 -- Telnet IAC filter fuzz coverage** -- landed 2026-04-27 16:33 EEST
  - Extract the kernel's `TelnetFilter` byte-stream parser into
    `capos-lib::telnet` so it is host-fuzzable and survives the Phase C
    move of Telnet framing into userspace per
    `docs/proposals/networking-proposal.md`.
  - Add `fuzz/fuzz_targets/telnet_filter.rs` with structural assertions
    (Normal must pass non-IAC bytes through unchanged; AfterIac is the
    only state allowed to emit a 0xFF; emitted byte count never exceeds
    input length).
  - Wired into `make fuzz-build` and `make fuzz-smoke`.
- **Track S.15 -- Telnet differential round-trip + line-discipline extraction** -- landed 2026-04-27 17:18 EEST
  - Add `fuzz/fuzz_targets/telnet_filter_roundtrip.rs`: synthesize
    arbitrary RFC 854 event streams from fuzzer bytes, encode to wire,
    run through `TelnetFilter`, assert output equals the concatenation
    of `Data(_)` payloads. Found a real EXOPL handling bug -- the
    option byte right after `IAC SB` was being mis-parsed as the start
    of an `IAC IAC` escape when its value was 0xFF, leaving the filter
    stuck in subnegotiation and silently dropping all subsequent data.
    Fixed via a new `AfterSb` state that consumes the option byte
    unconditionally; pinned by a regression test in `capos-lib::telnet`.
  - Extract the cooked-mode line discipline from `kernel::cap::network`
    into `capos_lib::line_discipline::LineDiscipline`, returning
    `LineStep { outcome, echo }` so all socket I/O stays at the caller.
    Add `fuzz/fuzz_targets/line_discipline.rs` with structural
    invariants (line_len <= max_bytes; ±1 line_len delta per Pending
    step; Cancelled clears; Echo::Byte/Backspace iff buffer grew/shrank
    by exactly one).
  - Future follow-up: differential against an external Telnet library
    (libtelnet C or Rust port) to catch RFC conformance bugs the
    structural targets cannot express.
- **Track S.16 -- Ring SQE wire-validation extraction + fuzz target** -- landed 2026-04-27 19:42 EEST
  - Closes the original three-parser fuzz plan (`elf::parse`,
    `manifest::decode`, ring SQE decoder). Lifts the per-opcode
    `*_sqe_has_unsupported_fields` predicates from
    `kernel/src/cap/ring.rs` into `capos_config::ring`, exposes a
    unified `sqe_wire_validation_error(&CapSqe) -> Result<(), i32>`
    entry point, and reroutes the kernel through the shared functions
    so the kernel-host pair has one source of truth for ABI rules.
  - Add `fuzz/fuzz_targets/sqe_validation.rs`: cast arbitrary 64 bytes
    to `CapSqe`, run `sqe_wire_validation_error` and the matching
    per-opcode predicate, assert determinism, opcode-classification
    consistency (`CAP_OP_FINISH` -> `CAP_ERR_UNSUPPORTED_OPCODE`,
    unknown opcodes -> `CAP_ERR_INVALID_REQUEST`), and that the
    unified validator never disagrees with the predicate it dispatches
    to. Wired into `make fuzz-build` / `fuzz-smoke`.
  - Add 12 host unit tests in `capos_config::ring` covering the
    classification rules each opcode imposes (THREAD_OWNED + call_id
    pairing on CALL/PARK, RETURN's APPLICATION_EXCEPTION flag,
    CANCEL's required pipeline_dep target, NOP's reserved-fields-zero
    rule, PARK_BENCH's required addr).
  - The structural fuzz target pins arbitrary-byte behavior. The follow-up
    well-formed SQE generator oracle landed on 2026-06-06: the test/fuzz-only
    `sqe-validation-oracle` feature exposes
    `capos_config::ring::sqe_oracle`, which generates validator-accepted SQEs
    for each accepted opcode and one-field rejecting mutations, and
    `fuzz/fuzz_targets/sqe_validation.rs` runs that oracle on each input.
    This is a shared wire-validator oracle only; it does not claim cap-table
    lookup, userspace pointer mapping, transfer-descriptor loading, or full
    kernel ring semantic coverage. A future differential against an independent
    reference predicate remains a possible stronger disagreement oracle.
- **Track S.17 -- Sanitizers on host tests** -- ASan landed; TSan
  blocked upstream
  - `make sanitizer-host-tests` runs `RUSTFLAGS=-Zsanitizer=address`
    over the `capos-lib` and `capos-config` host suites (crate set /
    features mirror the `test-lib` / `test-config` aliases) on the
    repo-pinned nightly + host target. It is a focused gate, not part of
    `make check`, mirroring `dependency-policy-check` /
    `sdk-publish-dry-run`. Outcome so far: zero findings; both suites
    pass clean, including the named `unsafe` suspects (FrameBitmap slot
    indexing, CapTable generation counters, `lazy_buffer` raw
    `&mut [u8]`). The §Tier 2 "cheap to add" claim holds for ASan, which
    needs no `-Zbuild-std`.
  - `make sanitizer-host-tests-tsan` is wired but currently blocked by an
    upstream cargo limitation, not a capOS defect. TSan changes the crate
    ABI, so rustc refuses to link sanitized code against the
    uninstrumented precompiled std; instrumenting std needs
    `-Zbuild-std`, which fails with duplicate `core` lang items for
    build-script-bearing dependencies (typenum / libc / cfg-if / subtle)
    when the sanitizer target equals the host triple. The exact
    reproduction (four attempted workarounds) is recorded in
    `docs/backlog/security-verification.md` Track S.17. Concurrency
    invariants are meanwhile covered by the dedicated Loom model
    (`cargo test-ring-loom`).
  - Done means: the ASan gate exists, runs under nightly, and any
    findings either land as fixes or get a documented disposition; the
    TSan target starts passing once the upstream `-Zbuild-std` +
    build-script issue is fixed.

Security Verification Tracks S.1 through S.5 have initial coverage. Track S.6
is ongoing doc hygiene and should move with review-process changes. Track S.8
must land before Stage 6 runs mutually-untrusting services. Track S.9 design is
complete and now gates concrete implementation work in 3.6/5.2. Track S.11
gates device-driver work. Track S.12 should not expand bounds for their own
sake; it is a refresh point when new kernel invariants make better proof
targets available. Track S.13 closes the remaining target-3 gap from the table
above.

## 7. What This Proposal Does Not Promise

- No claim that capOS will be "secure" at the end. It will be **harder to
  write a silently wrong change** to the code paths the tooling covers,
  and it will be **easier to find the ones that are still wrong**.
- No proof obligation on every PR. Kani and Loom are expensive to run on
  every push; CI runs them on a reduced schedule (e.g. nightly, or on PRs
  that touch the covered crates).
- Userspace and host-tool bugs are in scope, but their impact is classified by
  boundary. A userspace bug should not compromise kernel isolation; a host-tool
  bug can still compromise the build TCB or developer/CI filesystem.
- No claim that confidentiality is handled beyond architectural isolation.
  Timing channels, cache side channels, device side channels, and covert
  channels through shared services remain explicit research topics, not
  current implementation goals.

## 8. Relation to Other Docs

- `docs/research/sel4.md` §1 and §6.1 already make the case that full
  verification is not the right goal. This proposal is the operational
  answer.
- `REVIEW.md` is the reviewer's rulebook. This proposal explains the security
  and verification rationale behind its common checklist and per-boundary
  prompts.
- `docs/tasks/**` is the open-issue ledger. This proposal feeds it -- every bug
  found by Tier 2/3/4 tooling gets a task record unless fixed in the same
  change.
- `docs/roadmap.md` owns the stages; this proposal does not add stages, only
  a cross-cutting track that runs alongside them.
- Task records under `docs/tasks/` own concrete ordering; Security Verification
  Tracks S.1--S.17 above are mirrored there when they are actionable slices.
- `docs/design-risks-register.md` is the consolidated index of long-horizon
  design risks and open architectural questions; consult it when this
  proposal's open gaps reference a hazard whose primary owner lives in a
  subsystem proposal, backlog, or design file rather than here.