Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Security Review and Formal Verification Proposal

How to reason about the correctness and security of the capOS kernel and its trust boundaries in a way that fits a research OS – pragmatic tooling now, targeted verification where it pays off, no aspirational seL4-style full- kernel proofs. The docs/research/sel4.md survey already concluded that Isabelle/HOL-over-C verification does not transfer to Rust and that the design constraints matter more than the proof artefact. This proposal codifies that conclusion into a concrete tooling and process plan.

This proposal uses CWE for concrete vulnerability classes, CAPEC for attacker patterns, Rust language rules / unsafe-code guidance for low-level coding rules, Common Criteria protection-profile concepts for OS security functions, ITU-T X.800/X.805 security-services taxonomy as a completeness checklist, and capability-kernel practice (seL4/EROS-style invariants) for authority, IPC, object lifetime, and scheduler properties. Web-application checklists are not the baseline for OS design review.

Grounding sources:

  • MITRE CWE for root-cause weakness labels: CWE-20 explicitly covers raw data, metadata, sizes, indexes, offsets, syntax, type, consistency, and domain rules; CWE also marks broad classes such as CWE-20 and CWE-400 as discouraged for final vulnerability mapping when a more precise child fits.
  • MITRE CAPEC for attacker behavior, especially input manipulation (CAPEC-153), command injection (CAPEC-248), race exploitation (CAPEC-26 / CAPEC-29), and flooding/resource pressure (CAPEC-125).
  • Rust Reference and Rust 2024 Edition Guide for unsafe-block and unsafe_op_in_unsafe_fn obligations.
  • seL4 MCS and the existing capOS research notes for capability-authorized access to kernel objects and CPU time.
  • Common Criteria General Purpose Operating System Protection Profile for OS access-control, security-function, trusted-channel/path, and user-data protection concepts. capOS is not trying to certify against it; the PP is a vocabulary check for what an OS security review should not omit.
  • ITU-T Rec. X.800 (03/91) Security architecture for OSI and X.805 (10/03) Security architecture for systems providing end-to-end communications for the layered security-services taxonomy: authentication, access control, non-repudiation, data confidentiality, data integrity, availability, privacy × infrastructure/services/ applications planes × end-user/control/management planes. Used as a completeness matrix: if a proposal claims to cover security but leaves one cell unaddressed (e.g. “we have confidentiality but no non-repudiation story for the management plane”), review should flag the gap. Also ITU-T X.810-X.816 for the individual framework breakdowns — authentication (X.811), access control (X.812), non-repudiation (X.813), confidentiality (X.814), integrity (X.815), audit and alarms (X.816).

1. Philosophy and Scope

capOS is explicitly a research OS whose design principle is “schema-first typed capabilities, minimal kernel, reuse the Rust ecosystem.” Three consequences shape this proposal:

  1. The schema is part of the TCB. A bug in the .capnp schema, or in the way generated code is patched for no_std, is exactly as dangerous as a bug in the kernel. The schema, the capnpc build pipeline, and the generated code all need review attention – not only hand-written kernel code.
  2. The kernel should stay small. “Everything else is a capability” means the TCB is naturally bounded. Verification effort scales with TCB size, so resisting kernel bloat is itself a security property.
  3. The interface is the permission. Access control lives in capnp method definitions and in userspace cap wrappers (a narrow cap is a different CapObject), not in kernel rights bitmasks. Review must confirm that the kernel never short-circuits this: no ambient authority, no method that bypasses CapObject::call, no syscall that exposes an object without a capability handle.

Non-goals:

  • Full functional-correctness proof of the kernel à la seL4. Infeasible in Rust today, and the payoff is low for a research system whose surface area is still changing.
  • Proving information-flow / confidentiality properties end-to-end.
  • Certifying a specific configuration for external deployment.

2. Trust Boundaries and Threat Model

Enumerating the boundaries forces every future review to ask “which boundary does this change touch?” and picks out the code paths that matter.

TCB Statement

Current demo/proof TCB is broader than the target production TCB. Security claims must name which one they rely on.

Current demo/proof TCB:

  • kernel, including scheduler, memory management, capability dispatch, endpoint IPC, in-kernel networking, smoltcp runtime, line discipline, Telnet IAC filtering, PCI/virtio-net smoke code, and kernel-owned DMA buffers;
  • capos-config, schema/codegen output, manifest validator, and checked-in generated bindings;
  • capos-rt runtime transport, userspace entry/panic/allocator glue, and typed handle release behavior;
  • standalone init, AuthorityBroker, SessionManager, CredentialStore, shell launcher, restricted launcher, and demo services used by the active manifest;
  • focused QEMU manifests, host harnesses, and build tools used to construct and validate each proof image;
  • QEMU virtio devices and host-local loopback forwarding for networking proofs.

Target production TCB:

  • kernel primitives that enforce address-space isolation, capability tables, generation/epoch checks, ring transport validation, scheduler/thread safety, interrupt/timer correctness, and explicit DMA/IOMMU policy;
  • schema definitions, generated-code owner, shared ABI constants, and the build/signature path for production boot images;
  • minimal init/supervisor authority needed to assemble the service graph, grant narrowed caps, restart services, and expose scoped status/audit;
  • credential, session, broker, key-vault, audit, and remote-ingress services that directly decide authentication, authorization, disclosure, and key use;
  • production device managers, network stack, and storage services only to the extent they hold the corresponding device, network, or persistence authority.

Target non-TCB components should include ordinary applications, untrusted service binaries, domain libraries without privileged caps, shell children, and network peers. The target is not reached while default networking runs in the kernel TCB, the focused Telnet terminal-hosting fixture still relies on kernel TCP terminal handoff, SSH uses fixture/dev key material, or remote shells share pre-auth and post-auth process authority.

Current boundaries

BoundaryWho trusts whomCode that enforces it
Ring 0 ↔ Ring 3kernel trusts nothing from userkernel/src/mem/paging.rs, kernel/src/mem/validate.rs, arch/x86_64/syscall.rs; exercised by init/ and demos/*
Kernel ↔ user pointerkernel validates address + PTE perms under the process VM lockAddressSpace::validate_user_buffer, copy_from_user, copy_to_user, and legacy validate_user_buffer for current-CR3 diagnostics
Manifest ↔ kernelkernel parses capnp manifest at bootcapos-config::manifest, called from kmain
Build inputs ↔ TCBkernel trusts schema/codegen/build artifactsschema/capos.capnp, build.rs, Cargo.lock, Makefile
Host tools ↔ filesystem/processtools must not let manifest/config input escape intended host boundariestools/mkmanifest, generators, CI scripts
ELF bytes ↔ kernelkernel parses user ELF to map segmentscapos-lib::elf
User ring ↔ kernel dispatchkernel trusts no SQ statekernel/src/cap/ring.rs
CapObject::call wire formatkernel trusts no params bytesgenerated capnp decoders + impls
Process ↔ process IPCkernel routes calls between mutually isolated address spaces and trusts neither side’s bufferskernel/src/cap/endpoint.rs, kernel/src/cap/ring.rs, kernel/src/sched.rs
Device DMA ↔ physical memorykernel and device-manager trust no userspace driver-supplied device address, stale DMA handle, or stale interrupt routekernel/src/dma_backend.rs, kernel/src/device_dma.rs, kernel/src/device_manager/, and the DDF cap objects select a DMA backend at boot, expose manager-owned bounce-buffer handles when no trusted remapping domain exists, hide host physical addresses/IOVAs from userspace providers, and bind DeviceMmio/DMAPool/Interrupt lifecycle to generation-checked ownership ledgers. The QEMU Intel path has bounded per-device remapping evidence; current no-IOMMU cloud/GCE paths are brokered bounce-buffer authority and still do not claim hostile bus-master isolation.
WASI host adapter sandboxuserspace wasm-host runs untrusted Preview 1 payloads inside the vendored wasmi interpreter; capOS trusts no wasm import beyond the explicit grant set on HostStatecapos-wasm/src/wasi/preview1.rs translates wasm calls into typed Console/Timer/BootPackage/EntropySource invocations; per-instance argv text grants and random_get against the kernel EntropySource cap honor manifest-declared scope. Ungranted Preview 1 calls return ERRNO_NOSYS rather than fabricating authority. The boundary surface today covers W.1-W.4 (substrate, stdout-only stubs, argv grant, random_get production wiring); wasi_args and entropy fills are bounded by WASI_ARGS_MAX_* and RANDOM_GET_MAX_BYTES. Filesystem, environment beyond argv, full clocks, and remaining Preview 1 surface remain un-implemented refusals.
POSIX adapter v0 substratelibcapos-posix exposes a narrow fork-for-exec / pipe / socket / clock surface to C code; capOS trusts that the recording-shim window stays scoped to the synthetic child branch and that explicit grants pass through ProcessSpawner.spawnlibcapos-posix per-process static fd table, single-thread errno cell, kernel UdpSocket/Timer/Pipe clients, and the recording-shim Move-grant stdio_<N> path. The pseudo-child branch never calls _exit() on execve() failure; surface remains research/v0, not a full POSIX TCB.
Persistent config overlay ↔ initinit trusts no bytes from the on-disk system/config/overlay.bin; it validates the overlay version, SHA-256 content hash, the base manifest’s declared extension points (allowed service caps, max additional services, minOverlayEpoch, settings allowances), and base-pin non-collision before composing, and rejects the whole overlay (booting the base manifest floor) on any violationcapos-config::manifest (SystemConfigOverlay::from_capnp_bytes + compose_onto), init/src/main.rs apply_config_overlay; proof make run-installable-overlay
Hardware cap teardown auditkernel must record every acquire, release, rollback-detach, Drop-detach-failure, explicit driver-crash trigger, reset/disable trigger, interrupt-waiter trigger, and explicit bounded proof-buffer free for the DDF caps so post-mortem review can correlate device-manager state with cap lifecyclekernel/src/cap/hardware_audit.rs emit helper invoked from device_mmio.rs, interrupt.rs, dma_pool.rs, dma_buffer.rs, plus the devicemmio_grant_source.rs, dmapool_grant_source.rs, and interrupt_grant_source.rs userspace-grant rollback paths. The bounded DMAPool grant source emits DmaPool acquire for its manifest grant; DMAPool.allocateBuffer can mint one manager-attached proof DMABuffer result cap with its own acquire/free-buffer/release-after-free audit, while duplicate proof-buffer allocation and real DMA allocation remain blocked. Parent-first DMAPool release records a pending parent detach and completes after typed DMABuffer.freeBuffer frees the proof page, after cap release frees the proof page, or after successful DMABuffer driver-crash/reset-disable cleanup frees that page, preserving the one final DmaPool release audit. The real driver-crash teardown trigger entry points on DeviceMmioCap, InterruptCap, DmaPoolCap, and DmaBufferCap (device_manager::trigger_driver_crash_for_* plus each cap’s on_driver_crash) emit event=driver-crash exactly once per successful detach; stale rerun stays silent. The reset/disable trigger entry points on all four cap types (trigger_reset_disable_for_* plus each cap’s on_reset_disable) mirror that single-emit policy with event=reset-disable. The first cap-specific interrupt-waiter trigger on InterruptCap (trigger_interrupt_waiter_for_interrupt plus InterruptCap::on_interrupt_waiter) mirrors the same policy with exactly one event=interrupt-waiter audit record for the first successful detach. DMABuffer.freeBuffer emits exactly one event=free-buffer record on the successful explicit proof-buffer free, invalidates later DMABuffer.info, and leaves the later cap release as a no-op detach. The DMA pool reset path keeps the zero-live/quiesced/scrubbed evidence precondition, and the DMA buffer reset path reuses the bounded FreeBuffer page cleanup path before evidence-gated parent-pool cleanup. All explicit-trigger impls use the load-bearing exhaustive match outcome.detach_label() { "ok" => emit event, "noop" => silent, label => kprintln + DropDetachFailed } shape, so any future non-"ok"/non-"noop" outcome label still surfaces as a DropDetachFailed audit rather than being silently dropped. Each event is a cap-audit: key=value line on COM1 carrying the cap tag (interface id), event class, BDF, owner, and the relevant generation fields, and emit_cap_audit also appends it to a bounded volatile ring. HardwareAuditLog.snapshot exposes the latest retained records to userspace with drop-oldest retention while reporting volatile-only persistence, unsigned signatures, manifest-granted read-only snapshot access, production subscriber admission policy not implemented, and the volatile snapshot truncation contract; a QEMU-only local-ring proof asserts all four truncation labels without mutating the live ring. Durable storage, signing, and production subscriber admission remain future work. The legacy hardware-cap-release: line is retained alongside the audit line.

Attacker model

  • Untrusted service binaries. Today’s services are checked into the repo, but the manifest pipeline is meant to load arbitrary binaries eventually. Assume every byte of a service’s SQEs, params buffers, result buffer pointers, and return addresses is attacker-controlled.
  • Untrusted manifest. Once manifests are produced outside the repo (e.g. generated from CUE fragments, passed in as a Limine module), the manifest parser must reject every malformed input without panicking.
  • Resource exhaustion. Once multiple mutually-untrusting services run, a service can attack by filling rings, endpoint queues, capability tables, frame pools, scratch arenas, logs, or CPU time. Boundedness and accounting are security properties, not performance polish.
  • Build input drift. The schema/codegen path is already part of the TCB. External build inputs such as the bootloader checkout, Rust dependencies, capnp code generation, and generated-code patching must be reproducible enough that review can tell what changed.
  • Host tooling input. Build tools and generators run with developer/CI filesystem access. Treat manifest/config-derived paths and command arguments as untrusted until bounded to the intended directory and execution context.
  • Residual state and disclosure. Kernel logs, returned buffers, recycled frames, endpoint scratch space, and generated artifacts must not expose kernel pointers, stale bytes from another process, secrets, or build-system paths that increase attacker leverage.
  • Hostile interrupts / preemption. The scheduler preempts at arbitrary points. Any kernel invariant that is only transiently true must be held under the right lock or with interrupts disabled.
  • Out of scope (for now): physical attacks, speculative-execution side channels, malicious hardware, IOMMU bypass from DMA devices. These become in-scope once the driver stack lands; revisit the threat model then.

Threat Actor Matrix

ActorCurrent scopeCurrent treatmentProduction gate
Local physical attackerOut of scope.The prototype does not claim protection against physical memory access, bus probing, evil-maid boot replacement, cold boot, firmware compromise, or direct console access.Secure/measured boot, sealed storage keys, physical console policy, and hardware-rooted attestation before production claims.
Malicious DMA deviceOut of scope for hostile hardware; in scope only as confused userspace around cooperative QEMU virtio.The virtio-net smoke assumes QEMU-provided cooperative virtio hardware and kernel-owned bounce buffers. Without an IOMMU, a bus-mastering device can DMA arbitrary RAM.IOMMU-backed DMA domains or a documented hardware policy that forbids untrusted bus-mastering devices before userspace drivers or production hardware claims.
Malicious boot manifestPartially in scope.Manifest decoding/validation must fail closed and not panic. A manifest accepted by the kernel/init is still trusted to define the initial service graph and bootstrap grants.Signed/authorized manifest policy, boot-package integrity, and review-visible payload hashes before accepting manifests from outside the repo or operator-controlled build path.
Compromised init/supervisorPartially out of scope for current proofs.Current demo TCB includes init and manifest-declared trusted services. If init is compromised, it can misgrant authority within the bootstrap service graph.Minimize init, split supervisors, require narrow grant construction, audit graph changes, and make restart/update authority explicit.
Compromised service with narrow capsIn scope.Address-space isolation, cap-table lookup, generation checks, ring validation, transfer checks, and resource ledgers should constrain it to granted authority.Complete hostile smokes for transfer modes, resource exhaustion, panic surfaces, and revoke/epoch behavior per service class.
Hostile network peerIn scope only for loopback demo robustness, not production remote access.Telnet is plaintext loopback-only. SSH gateway work is fixture/prototype status without complete encrypted transport, durable key/account storage, full OpenSSH userauth/channel handling, or complete audit gates.Non-loopback remote shells stay blocked until SSH transport/auth/key/audit/storage gates pass and pre-auth/post-auth authority is isolated or otherwise proven constrained.
Hostile local web client of the remote-session-ui bridgeIn scope.Today’s bridge shares one upstream capOS session across all loopback HTTP clients with no per-browser session, missing-Origin short-circuit, and non-constant-time secret comparison.Per-browser BrowserSession cookie, CSRF/Host/Content-Type guards, strict CSP with the matching inline-script/style refactor, constant-time comparators, rate limiting, and the carry-over Tauri capability-allowlist minimization, all per remote-session-ui-security-proposal.md.
Malicious build dependency or toolPartially in scope.Lockfiles, generated-code checks, pinned Cap’n Proto/Limine/docs tools, and dependency-policy checks make drift review-visible, but Rust nightly, QEMU/xorriso/OVMF, and final image hashes are not fully pinned.Date/hash-pinned toolchains, recorded host tool versions, image/payload hashes, and reproducible production build path.

ITU-T X.800 security-services completeness matrix

X.800 enumerates five security services; X.805 extends the list with availability and privacy. Each review of a proposal or kernel change should be able to say which service it touches, or that it touches none. The point is not to implement every cell — capOS explicitly defers some (end-to-end non-repudiation, for example) — it is to make gaps explicit.

X.800/X.805 servicecapOS surface that provides it
Authentication (peer entity, data origin)user-identity-and-policy-proposal.md X.1254 LoA tiers; passkey + password credentials in boot-to-shell-proposal.md; certificate-based peer auth in certificates-and-tls-proposal.md (mTLS); future attestation in cryptography-and-key-management-proposal.md AttestationKeySource.
Access controlStructural: the capability model itself. The interface is the permission; wrapper caps attenuate; CapTable cannot be bypassed. Policy layer: AuthorityBroker (X.812 ADF) over CapObject::call (X.812 AEF).
Data confidentialityTransport: certificates-and-tls-proposal.md TlsSocket. At rest: volume-encryption-proposal.md. In memory: address-space isolation + SMAP + SMEP.
Data integrityTransport: TLS AEAD. At rest: authenticated block encryption (SymmetricAlgorithm.aes256GcmSiv etc.). Manifest/boot: signed manifests (storage-and-naming-proposal.md Open Q #5). In-transit schema: Cap’n Proto wire format + bounds-checked decoders.
Non-repudiation (origin, delivery)Partial. Signed audit records (system-monitoring-proposal.md + cryptography-and-key-management-proposal.md audit key purpose). End-to-end non-repudiation for user actions is deferred until signed sessions exist.
Availability (X.805)Resource ledgers, bounded rings, CAP_OP_RELEASE, supervisor restart policy, rate limiters on monitoring ingestion. DoS resistance is a review dimension, not a separate subsystem.
Privacy (X.805)Principal pseudonymity (user-identity-and-policy-proposal.md pseudonymous profile), audit-record redaction, monitoring “payload capture is exceptional” default.

The matrix is a checklist, not a claim of completeness: individual proposals remain authoritative about what they do and don’t provide.

3. Tiered Approach

Four tiers, cheapest first. Each tier is independently useful, and later tiers assume earlier ones are in place.

Tier 1 – Hygiene and CI (cheap, high value)

These are the controls that make every other tier work. The only checked-in GitHub Actions workflow is .github/workflows/ci.yml; it runs formatting, host tests, cargo build --features qemu, make capos-rt-check, make generated-code-check, make dependency-policy-check, and make workflow-check. The QEMU smoke job installs its own boot tools and runs make plus make run, but remains non-blocking, so it is not yet a required boot assertion. No separate clippy, miri, fuzz, or Kani workflow files exist yet – those are scheduled per the track table below.

  • Continuous integration via GitHub Actions (or equivalent). Current baseline: make fmt-check, cargo test-config, cargo test-ring-loom, cargo test-lib, cargo test-mkmanifest, cargo build --features qemu, make capos-rt-check, make generated-code-check, make dependency-policy-check, and make workflow-check. Remaining CI work: treat QEMU boot as a required CI gate once runtime flakiness is acceptable, then add the security policy jobs below.
  • cargo clippy --all-targets -- -D warnings across workspace members, with a curated set of clippy::pedantic / clippy::nursery lints that pay off for kernel code (clippy::undocumented_unsafe_blocks, clippy::missing_safety_doc, clippy::cast_possible_truncation, etc.). Do NOT enable all of pedantic blindly – review each lint and either enable it or add a rationale comment.
  • cargo-deny for license and advisory gating; cargo-audit for the RustSec advisory DB against Cargo.lock. Dependencies include capnp, spin, x86_64, limine, linked_list_allocator – all externally maintained.
  • cargo-geiger report of unsafe surface area per crate, checked in as a snapshot and diffed in CI so growth is visible in PRs.
  • Deny unsafe_op_in_unsafe_fn (already required by edition 2024; make sure it stays on) and missing_docs on public kernel items where it is not already the case.
  • Dependency review discipline: every new dep needs a one-line rationale in the commit message and a check that it is no_std-capable, maintained, and does not pull in a surprise async runtime or heavy transitive graph.
  • No-std dependency rubric: kernel/no_std additions require an explicit compatibility check that core/alloc paths do not regress to std through default feature drift, and class ownership is recorded against docs/trusted-build-inputs.md.
  • Boot/build input pinning: pin external bootloader/tool downloads to an auditable revision or checksum. Branch names are not enough for TCB inputs. CI should fail when generated capnp bindings or no-std patching change outside an intentional schema/codegen update.
  • Untrusted-path panic audit: panic!, assert!, .unwrap(), and .expect() are acceptable during bring-up, but every path reachable from manifest bytes, ELF bytes, SQEs, params buffers, result buffers, and future IPC messages needs either a fail-closed error or a documented halt policy.
  • Hardware protection smoke tests: boot under QEMU with SMEP/SMAP-capable CPU flags and assert CR4.SMEP/CR4.SMAP once paging is initialized. Every explicit user-memory dereference must be wrapped in a short STAC/CLAC window once SMAP is enabled.

Tier 2 – Targeted dynamic analysis

Aimed at the host-testable pure-logic crates (capos-lib, capos-config) where the Rust toolchain just works. No kernel changes required.

  • Miri on the cargo test-lib and cargo test-config suites. Catches UB in pure-logic code: invalid pointer arithmetic, uninitialized reads, bad provenance, unsound unsafe. The FrameBitmap and CapTable tests in particular push against slot indexing, generation counters, and raw &mut [u8] handling – exactly what miri is good at.
  • proptest (or quickcheck) on:
    • capos-lib::elf::parse – random bytes / random perturbations of a valid header must never panic and must refuse anything that isn’t a correctly formed user-half ELF64.
    • capos-lib::frame_bitmap – interleaved sequences of alloc, alloc_contiguous, free, mark_used preserve the invariant free_count == popcount(bitmap == 0) and never double-free.
    • capos-lib::cap_table – insert/remove/lookup sequences preserve “every returned id resolves to its insertion-time object, and stale ids are rejected.”
    • capos-config::manifest encode/decode round trip on arbitrary manifests.
    • Schema round-trip tests in capos-config/tests/: today remote_capnp_rpc_dto_roundtrip.rs pins the remote capnp-rpc DTO wire shape, and remote_paperclips_dto_roundtrip.rs (10 tests) pins the Remote Session Paperclips DTO wire shape ahead of the future gateway/worker/browser bridge that will marshal traffic through them. New shared-DTO families should land alongside similar round-trip coverage so schema drift is review-visible.
  • cargo fuzz harnesses (libFuzzer). The current fuzz/fuzz_targets/ set is seven targets: elf_parse.rs, manifest_capnp.rs, mkmanifest_json.rs, sqe_validation.rs (ring SQE wire validator via capos_config::ring::sqe_wire_validation_error), telnet_filter.rs, telnet_filter_roundtrip.rs, and line_discipline.rs. The Telnet round-trip oracle exists alongside the structural Telnet filter target because the round-trip variant found a real EXOPL parsing bug (docs/changelog.md). These run outside CI (they never terminate) but have seed corpora under fuzz/corpus/ and can be exercised in fixed budgets via make fuzz-build and make fuzz-smoke.
  • Sanitizers on host tests: make sanitizer-host-tests runs AddressSanitizer over the capos-lib and capos-config host suites under the repo-pinned nightly (zero findings to date). ASan is indeed cheap – it needs no -Zbuild-std. ThreadSanitizer (make sanitizer-host-tests-tsan) is wired but currently blocked by an upstream cargo -Zbuild-std + build-script limitation when the sanitizer target equals the host triple; see Track S.17 for the recorded reproduction.

Tier 3 – Concurrency model checking

The capability ring is a lock-free single-producer / single-consumer protocol using volatile reads, release/acquire fences, and a shared head/ tail pair. It is the most likely source of subtle memory-ordering bugs and is also the most isolated – a perfect fit for model checking.

  • Loom on a host-buildable wrapper of the ring protocol. Extract the producer/consumer state machine from capos-config::ring into a form where atomics can be swapped for loom::sync::atomic, and write Loom tests that enumerate all interleavings of producer/consumer for small ring sizes (2–4 slots). Properties to check:
    • No CQE is lost.
    • No CQE is double-delivered.
    • The sq_head/sq_tail and cq_head/cq_tail pointers never observe a state that implies tail - head > SQ_ENTRIES.
    • The userspace ring “corrupted producer state” fail-closed policy from prior review-finding task records holds under adversarial interleavings.
  • Shuttle as a lighter alternative for regression-style tests once the specific bugs are known; cheaper per run, randomised rather than exhaustive. Good for long-running overnight jobs.

Loom coverage here is disproportionately valuable: it substitutes for the SMP-hardness work the project has explicitly deferred, and it exercises exactly the ordering that TOCTOU-style bugs hide in.

Tier 4 – Bounded verification of specific invariants

Not a full-kernel proof. Targeted, property-specific, one-module-at-a-time.

  • Kani (bounded model checking for Rust, via CBMC). Good fit for small, heap-free, arithmetic-heavy functions. Candidate modules:
    • capos-lib::cap_table – prove that for all insert; remove; insert' sequences under a u8 generation counter, a stale CapId never resolves. Bound: table size ≤ 4, generation window ≤ 256.
    • capos-lib::frame_bitmap – prove that for all bitmap sizes up to N bytes, alloc_frame followed by free_frame of the same frame restores the original bitmap and free_count.
    • capos-lib::elf::parse bounds checks: prove that every index into the program header table is < len, given the validated phentsize and phnum.
  • Verus (SMT-based Rust verifier, active development at MSR) for invariants that Kani can’t handle ergonomically, particularly those involving loops and ghost state. Worth tracking but don’t commit to it yet – the proof-engineering cost is real, and the tool is still young. Revisit once IPC lands and the kernel has stable public APIs.
  • Creusot / Prusti are alternatives in the same space. Do not invest in more than one SMT-based verifier; pick whichever has the best story for no_std + alloc code when Tier 4 starts.

Deliberately out of scope: Isabelle/HOL, Coq proofs, Frama-C. They would require re-encoding Rust in a foreign semantic framework with no established Rust front-end mature enough for kernel code.

4. Security Review Process

REVIEW.md is the rules document and docs/tasks/** is the open remediation and review-finding ledger. REVIEW.md contains the common security checklist that applies across kernel, userspace, host tooling, generators, and CI. The per-boundary prompts below are an expansion of that common checklist for OS-specific code paths.

CWE/CAPEC tagging policy

Security findings should carry CWE metadata when the mapping is specific enough to help a reviewer or future audit. Do not force a CWE into every title.

  • Prefer Base/Variant CWE IDs when the root cause is known: CWE-770 for unbounded allocation, CWE-88 for argument injection, CWE-367 for a concrete validation-to-use race, CWE-416 for a real use-after-free.
  • Use Class IDs as temporary or umbrella labels: CWE-20 for “input was not validated enough” before the missing property is known; CWE-400 for general resource exhaustion only when the enabling mistake is not more precise.
  • Use capability-kernel invariants instead of weak CWE mappings for design properties such as “no ambient authority”, “cap transfer happens exactly once”, “revocation cannot leave stale authority”, and “scheduling context donation cannot fabricate CPU authority”. Cite CWE-862/CWE-863 only when the issue is actually a missing or incorrect authorization check.
  • Use CAPEC for the attacker pattern when useful: input manipulation, command injection, race exploitation, flooding, or path/file manipulation. CAPEC is not a substitute for the CWE root-cause tag.

Current checklist coverage:

AreaPrimary tagsReview intent
Structured input validationCWE-20, CWE-1284–CWE-1289 when preciseValidate syntax, type, range, length, indexes, offsets, and cross-field consistency before privileged use
Filesystem pathsCWE-22, CWE-23, CWE-59Keep host-tool paths inside intended roots across absolute paths, traversal, symlinks, and file-type confusion
Commands/processesCWE-78, CWE-88Avoid shell interpolation; constrain binaries and arguments
Numeric/buffer boundsCWE-190, CWE-125, CWE-787Check arithmetic before pointer, slice, copy, ELF segment, and page-table use
Resource exhaustionCWE-770 preferred; CWE-400 broadBound queues, allocations, retries, spin loops, frames, scratch arenas, cap slots, and CPU budget
Exceptional pathsCWE-703, CWE-754, CWE-755; CWE-248 only for uncaught exceptionsFail closed on malformed or adversarial input; avoid trust-boundary panic/abort
Authorization/cap authorityCWE-862, CWE-863 plus capOS invariantsVerify capability ownership, generation, object identity, address-space ownership, and transfer policy
Concurrency/TOCTOUCWE-362, CWE-367, CWE-667Preserve lock ordering, interrupt masking, page-table stability, and validation-to-use assumptions
Lifetime/reuseCWE-416, CWE-664, CWE-672Prevent stale caps, stale kernel stacks, stale frames, and expired IPC state from being used
Disclosure/residual dataCWE-200, CWE-226Prevent logs, result buffers, frames, scratch arenas, and generated artifacts from leaking stale or sensitive data
Supply chain / generated TCBcapOS TCB invariant; use CWE only for concrete bugPin or review-visible drift for bootloader, dependencies, schema/codegen, generated code, and patching

Per-boundary review checklist

  • Syscall surface change (arch/x86_64/syscall.rs):
    • Every register-passed argument is treated as attacker-controlled.
    • No user pointer is dereferenced without an AddressSpace-locked copy/read helper or an explicitly documented equivalent stability guarantee.
    • Numeric conversions, copy lengths, and pointer arithmetic are checked before constructing slices or entering any direct user-access scope.
    • Kernel stack pointer and TSS.RSP0 invariants are preserved.
    • The syscall count stays bounded; a new syscall has an SQE-opcode alternative considered and explicitly rejected with rationale.
  • Ring dispatch change (kernel/src/cap/ring.rs):
    • SQ bounds check and per-dispatch SQE limit still enforced.
    • Corrupted SQ state fails closed (never re-processes the same bad state on the next tick).
    • No allocation in the interrupt-driven path beyond what the owning task record or panic-surface inventory explicitly accepts.
    • Result buffers and endpoint scratch buffers cannot leak stale bytes beyond the returned completion length.
  • User buffer validation change (kernel/src/mem/paging.rs, kernel/src/mem/validate.rs):
    • Address range check precedes PTE walk.
    • PTE flags checked: present, user, and write (if the buffer is written).
    • For process-owned buffers, validation and copy/read hold the process AddressSpace mutex. Any current-CR3 validator caller must document its own page-table stability guarantee.
  • ELF loader change (capos-lib::elf):
    • Every field bounded before use (phentsize, phnum, p_offset, p_filesz, p_memsz, p_vaddr).
    • Segments confined to the user half.
    • Overlap check preserved.
    • Integer arithmetic uses checked add/subtract before deriving mapped addresses, file slices, or zero-fill ranges.
  • Manifest change (capos-config::manifest):
    • Every optional field is either present or the service is rejected.
    • Name / binary / cap source strings are length-bounded.
    • Unknown / unsupported numbers in CUE input fail-closed with a path- specific error.
    • Capability grants are checked as an authority graph before any rejected graph can start a service.
  • Schema change (schema/capos.capnp):
    • Backward-compatible with existing wire format, or migration documented.
    • Every new method has an explicit capability-granting story (who mints the cap that lets this method be called?).
    • Generated code no_std patching still applies.
  • Host tool or generator change (tools/*, build.rs, CI scripts):
    • Manifest/config-derived paths cannot escape intended directories through absolute paths, traversal, symlinks, or file-type confusion.
    • External command execution uses explicit binaries and argument APIs, not shell interpolation of untrusted strings.
    • Generated outputs are review-visible and fail closed on malformed inputs.
    • Generated files and diagnostics do not disclose secrets, absolute paths, or stale build outputs beyond what the developer intentionally requested.
  • Unsafe block added or expanded: Tier 1 clippy lints plus REVIEW.md §“Unsafe Usage” checklist already cover this; the review should cite the specific invariant being maintained in the commit message.

Threat-model refresh

On every stage completion (Stage 6 IPC, Stage 7 SMP, first driver landing, first time a manifest comes from outside the repo), re-run §2 of this document and update it. The list of trust boundaries grows over time; the proposal decays if it doesn’t grow with the code.

Periodic full audit

Once per stage, schedule a focused audit pass:

  1. Re-verify every boundary’s code is still enforced at its documented entry point (no new bypass path).
  2. Re-run all Tier 2/3 jobs with the latest toolchain (catches tool-upgrade regressions).
  3. Walk through open review-finding task records and confirm each is still correctly classified (still open, fixed, explicitly accepted, blocked, or on-hold).
  4. Record the audit date and outcome in the relevant task records or a focused closeout task, matching the repository timestamp convention.

5. Concrete Verification Targets

Ordered by value and feasibility. Each one is a specific, bounded piece of work a contributor can pick up without needing to redesign the kernel.

#TargetTierPropertyBlocker
1capos-lib::cap_table4 (Kani)Stale CapId never resolves after slot reuse within the generation windowNone
2capos-lib::frame_bitmap4 (Kani)alloc/free preserve free_count invariant; no double-allocNone
3capos-lib::elf::parse2 (proptest + fuzz)No panic on arbitrary input; only well-formed user-half ELF64 acceptedNone
4capos-config::manifest2 (proptest + fuzz)Decode/encode round-trip; malformed input rejected without panicNone
5Ring SPSC protocol3 (Loom)No lost/doubled CQEs; fail-closed on corruption under all interleavingsExtract protocol into Loom-testable wrapper
6AddressSpace user-buffer helpers4 (Kani)Every accepted buffer lies entirely in user half with correct PTE flags, and validation/use happens under the address-space lockFormalise PTE and locking model
7Ring dispatch path3 (Loom + proptest)SQE poll is bounded per tick; no allocation on the dispatch pathInitial alloc-free synchronous path landed; async transfer/release paths still need coverage
8IPC routing3Capabilities transferred exactly once; no duplication under direct-switchCapability transfer
9Direct-switch IPC handoff2 + 3Scheduler invariants preserved when a blocked receiver bypasses normal run-queue orderLoom-testable scheduler/ring model
10SMEP/SMAP + user access windows1 + QEMU integrationKernel cannot execute user pages; direct user-memory touches either use audited access windows or the AddressSpace/HHDM copy pathWire existing x86_64 helper into init path
11Manifest authority graph2 (property tests)Every granted cap source resolves, every export is unique, and no service starts after a rejected graphManifest executor path
12Resource accounting2 + 3Rings, endpoints, cap tables, scratch arenas, frames, and CPU budget fail closed under exhaustionSecurity Verification Track S.9 design complete; implementation hooks pending
13Build/codegen TCB1Bootloader/deps/codegen inputs are pinned and generated output changes are review-visibleCI bootstrap
14Device DMA boundary (future)1 + design reviewNo driver or device can DMA outside explicitly granted buffersPCI/device work; IOMMU or bounce-buffer decision

Targets 1–4 are feasible today and should be the first batch of work. Target 10 is the security gate before treating Stage 6 services as untrusted. Targets 11–12 should be designed before capability transfer lands, otherwise the first IPC implementation will bake in ambient resource authority. Target 14 gates user-mode or semi-trusted drivers.

Current status as of 2026-05-16:

  • Targets 1–2 are part of the completed Verified Core visible milestone: commit d43b691 at 2026-04-23 22:09 UTC made make kani-lib the bounded local/GitHub proof gate for cap-table and frame-bitmap invariants, and commit c5968ee at 2026-04-23 22:12 UTC recorded the high-memory make kani-lib-full Cloud Build gate.
  • Target 3 has arbitrary-input proptest coverage and a cargo-fuzz target for ELF bytes. The current Kani harness still only proves the short-input early-reject path because fully symbolic ELF parsing reaches allocator and sort internals before there is a sharper proof obligation.
  • Target 4 has cargo-fuzz coverage for manifest decoding/roundtrip and mkmanifest exported-JSON conversion.
  • Target 5 has a feature-gated Loom model for the shared ring protocol.
  • Target 13 has an initial CI baseline plus generated-code drift checking, dependency audit/deny gates, and required QEMU boot still open. Remaining supply-chain provenance work is tracked by docs/tasks/trusted-build-inputs-pr-blocking-provenance.md; panic-surface hardening remains tracked by its owning task records across IPC/scheduler guarded unwraps, rollback restoration, stale queues, blocking waits, process/thread exit, endpoint cancellation, TLB shootdown send failures, and scheduler hot-path expects. Scheduler hot-path panic surface fully closed (2026-05-17, REVIEW_FINDINGS commit 1b295cb3): all .expect() / .unwrap() in block_current_on_cap_enter, next_start_context, schedule, exit_current, exit_current_thread, capos_block_current_syscall, and retain_endpoint_queue hardened per the established let-else + log + drop-lock + hcf() / return None / break pattern (per-function closures at 7f86796f / 777e0b3a / 0af439d4 / 7d93aea4 / b04d6d65 / 2bea189c).
  • Out-of-band scheduler/runtime hazards tracked in review-finding task records but not yet expressed as Concrete Verification Targets above: current post-AP kernel upper-half page-table mutation through the MMIO/firmware helper path is closed by kernel-wide TLB shootdown plus preseed/fail-closed PML4-slot handling (../tasks/done/2026-06-07/kernel-upper-half-pml4-propagation-hardening.md); future helper windows or allocator-growth paths that need a new kernel-half PML4 slot still require boot preseed or synchronized live-root propagation. ParkSpace unmap/reuse cleanup still owes shared park-word cleanup and address-space generation cleanup; resource quota fields for scratch bytes, outstanding calls, endpoint queues, and in-flight calls need real wiring or removal. Each is owned by its respective subsystem proposal; the consolidated routing index lives in docs/design-risks-register.md.

6. Security Verification Track Registry

The S.x labels are registry identifiers for this proposal’s security-verification track. They are not product stages and should be expanded as “Security Verification Track S.x” when cited outside this proposal.

TrackNameStatusPrimary document or evidence
S.1CI bootstrapLanded 2026-04-21.github/workflows/ci.yml
S.2Miri + proptest on capos-libLanded 2026-04-21cargo test-lib, cargo miri-lib
S.3Manifest + mkmanifest fuzzingLanded 2026-04-21fuzz/ manifest and mkmanifest targets
S.4Ring Loom harnessLanded 2026-04-21capos-config/tests/ring_loom.rs
S.5Kani on capos-libInitial landed 2026-04-21, expanded bounded gate landed 2026-04-23make kani-lib
S.6Security review docs stay alignedOngoingREVIEW.md, CLAUDE.md
S.7Stage-6-aware refreshPlanned/ongoingTrust-boundary inventory after Stage 6 changes
S.8Untrusted-service hardening gatePlannedSMEP/SMAP, user access windows, hostile-userspace tests
S.9Authority graph and resource accountingLanded 2026-04-21docs/authority-accounting-transfer-design.md
S.10Supply-chain and generated-code TCBPartially landeddocs/trusted-build-inputs.md
S.11Device/DMA isolation gateDesign accepted; brokered-bounce DDF production authority gates landed for the current local/GCE path, while direct-remapping and hostile-hardware claims remain futuredocs/dma-isolation-design.md
S.12Kani harness bounds refreshPlannedFuture transfer/accounting/user-buffer proof obligations
S.13ELF parser arbitrary-input coverageLandedcapos-lib::elf::parse, fuzz/fuzz_targets/elf_parse.rs
S.14Telnet IAC filter fuzz coverageLanded 2026-04-27 16:33 EESTcapos-lib::telnet, fuzz/fuzz_targets/telnet_filter.rs
S.15Telnet differential round-trip + line-discipline extractionLanded 2026-04-27 17:18 EESTcapos-lib::line_discipline, Telnet round-trip fuzz target
S.16Ring SQE wire-validation extraction + fuzz targetLanded 2026-04-27 19:42 EESTcapos_config::ring::sqe_wire_validation_error, fuzz/fuzz_targets/sqe_validation.rs
S.17Sanitizers on host testsASan landed (zero findings); TSan blocked upstreammake sanitizer-host-tests / make sanitizer-host-tests-tsan

Track Details

This slots into docs/tasks/README.md as a cross-cutting track rather than a phase – items are independent of Stage 6 IPC and can proceed in parallel.

Subtracks are scoped identifiers under their parent track:

SubtrackParentNamePrimary document or evidence
S.10.0S.10Trusted build input inventorydocs/trusted-build-inputs.md
S.10.2S.10Generated-code drift checkmake generated-code-check
S.10.3S.10Dependency policy and no_std review gatemake dependency-policy-check, deny.toml
S.11.1S.11DMA capability invariantsdocs/dma-isolation-design.md
S.11.2S.11Userspace-driver ownership-transition gatedocs/dma-isolation-design.md

Security Verification Track S.11.2 defines checklist rows S.11.2.0 through S.11.2.9 in docs/dma-isolation-design.md; those row labels are local acceptance criteria for the userspace-driver transition, not independent registry tracks.

  • Track S.1 – CI bootstrap – landed 2026-04-21
    • .github/workflows/ci.yml: fmt-check, test-config, test-ring-loom, test-lib, test-mkmanifest, cargo build --features qemu, make capos-rt-check, generated-code drift checking, and dependency policy checking.
    • QEMU smoke installs build-essential, capnproto, qemu-system-x86, xorriso, and cue v0.16.0 before running make and make run; it remains optional/non-blocking until boot runtime is stable enough to make it a required gate.
    • Clippy-with-deny and cargo-geiger remain future hardening jobs.
  • Track S.2 – Miri + proptest on capos-lib – landed 2026-04-21
    • Add proptest dev-dependency to capos-lib.
    • Host properties for capos-lib::cap_table and capos-lib::frame_bitmap; ELF arbitrary-input coverage is tracked separately under landed Security Verification Track S.13.
    • cargo test-lib runs the native host suite; cargo miri-lib runs the same crate under Miri.
  • Track S.3 – Manifest + mkmanifest fuzzing – landed 2026-04-21
    • fuzz/ crate with harnesses for manifest::decode and tools/mkmanifest CUE → capnp pipeline. Seed corpus checked in.
  • Track S.4 – Ring Loom harness – landed 2026-04-21
    • Extract the SPSC protocol from capos-config::ring into a test-only wrapper where atomics are swappable.
    • Loom tests covering corruption, overflow, and ordering.
    • Doubles as regression coverage for Phase 1.5 in docs/tasks/README.md.
  • Track S.5 – Kani on capos-lib – initial harnesses landed 2026-04-21, expanded bounded gate landed 2026-04-23
    • CapTable generation/index/stale-reference invariants.
    • FrameBitmap fail-closed free-error behavior plus a concrete bounded contiguous-allocation proof.
    • Transfer/resource-accounting fail-closed invariants for cap-slot preflight, frame-grant reservation, invalid transfer-origin rejection, move-reservation rollback after revocation, source visibility/accounting after the real prepare_copy_transfer path, and provisional destination cap-slot/frame-grant ledger restoration.
    • Propagation of real prepared transfer metadata into a provisional destination slot is reserved for make kani-lib-full; Google Cloud Build run 95b49620-06a5-49f4-85e6-782adb82d11c passed this high-memory gate on 2026-04-23.
    • ELF parser short-input early-reject panic-freedom exists as a targeted Kani harness but is not part of the mandatory bounded gate.
    • The current bounds are intentionally conservative so make kani-lib remains a practical local/GitHub CI gate; broader symbolic ELF and contiguous-allocation proofs should wait for more specific invariants or high-memory runners.
  • Track S.6 – Security review docs stay aligned
    • Keep REVIEW.md’s common security checklist aligned with §4’s boundary prompts as new boundaries land.
    • Add a “threat model refresh” step to the stage-completion workflow in CLAUDE.md.
  • Track S.7 – Stage-6-aware refresh
    • Re-run §2 trust-boundary inventory after capability transfer/release semantics land.
    • Plan Loom coverage for cross-process routing and direct-switch IPC.
    • Carry the inventory through the active scheduler-evolution phases (Phase D WFQ, Phase E SchedulingContext, Phase F one-SQ-consumer and nohz telemetry) and the WASI host-adapter surface (Phase W.4 entropy production wiring + per-instance argv text grant) so each new boundary is reflected in §2 before it can be relied on. The WASI host adapter is a userspace trust boundary – wasmi sandbox around untrusted Preview 1 payloads with per-instance EntropySource / argv grants – that the Tier 2/3 plan should explicitly cover as new harness targets emerge (see docs/proposals/wasi-host-adapter-proposal.md).
    • The Phase 1 monitoring log surface (LogSink/LogReader, kernel/src/cap/log.rs) is a new kernel boundary: a LogSink accepts bounded userspace-supplied records (decoded, length-truncated, severity- filtered against SystemConfig.logLevel) into a bounded drop-oldest ring, and a scoped LogReader serves cursor/filtered snapshots. It confers no transfer/grant authority beyond the scoped sink/reader and adds no ambient log namespace. Carry it in §2 before downstream services rely on it; per-process log token-bucket backpressure remains future work (docs/proposals/system-monitoring-proposal.md).
  • Track S.8 – Untrusted-service hardening gate
    • Wire SMEP/SMAP enablement into x86_64 init after paging is live.
    • Replace raw user-slice construction in syscall/ring paths with checked copy/access helpers that bracket the actual access with STAC/CLAC.
    • Add QEMU hostile-userspace tests for bad pointers, kernel-half pointers, invalid caps, corrupted rings, and services without Console authority.
    • Audit untrusted-input paths for panics before Stage 6 endpoints run mutually-untrusting processes.
  • Track S.9 – Authority graph and resource accounting – landed 2026-04-21
    • Concrete design is captured in docs/authority-accounting-transfer-design.md.
    • Defines authority graph invariants, per-process quota ledger (cap slots, endpoint queue, outstanding calls, scratch, frame grants, log volume, CPU budget), diagnostic aggregation, and exactly-once transfer/rollback semantics.
    • Establishes acceptance criteria that gate capability transfer and ProcessSpawner implementation. Current follow-up items live in docs/backlog/stage-6-capability-semantics.md.
  • Track S.10 – Supply-chain and generated-code TCB
    • Pin Limine and other external build inputs by revision/checksum rather than branch name.
    • Make capnp generated-code changes review-visible in CI, including the no-std patching step.
    • Consider cargo-vet only after cargo-deny/cargo-audit are in place; vetting too early is process theater.
    • Security Verification Track S.10.3 adds a concrete dependency policy: no_std additions are accepted only with class attribution, cargo deny + cargo audit, and explicit lockfile intent.
    • Security Verification Track S.10.3 enforcement is make dependency-policy-check, backed by deny.toml and pinned CI installs of cargo-deny 0.19.4 and cargo-audit 0.22.1.
  • Track S.11 – Device/DMA isolation gate
    • The DMA isolation story is now runtime-selected and fail-closed: guest-programmable remapping only when capOS can discover, program, and validate it; otherwise labeled brokered bounce buffers or unsupported.
    • DMAPool, DeviceMmio, and Interrupt invariants are represented by done task evidence for bounded physical/device-visible ranges, explicit interrupt ownership, reset/release teardown, generation checks, and no raw host-physical grants to untrusted drivers.
    • The current GCP/no-IOMMU userspace-provider path is brokered bounce-buffer authority. It supports the proved virtio-net and NVMe provider chains without claiming direct DMA, IOVA export, hostile bus-master isolation, or device-autonomous MSI-X delivery.
    • The DDF production-authority closeout closes the retained review finding for the current brokered-bounce provider path. Security Verification Track S.11.2 remains the canonical matrix for future direct-remapping/vIOMMU, hostile-hardware isolation, and broader device-owner claims.
  • Track S.12 – Kani harness bounds refresh
    • Revisit Kani bounds and harness shape once capability transfer, resource-accounting, or AddressSpace user-buffer helpers expose concrete proof obligations.
    • Prefer actionably narrow properties over arbitrary symbolic parser exploration that spends verifier time in allocator or sort internals.
  • Track S.13 – ELF parser arbitrary-input coverage – landed
    • capos-lib::elf::parse has proptest coverage for arbitrary bytes and valid-header perturbations.
    • fuzz/fuzz_targets/elf_parse.rs exercises ELF bytes through cargo-fuzz.
  • Track S.14 – Telnet IAC filter fuzz coverage – landed 2026-04-27 16:33 EEST
    • Extract the kernel’s TelnetFilter byte-stream parser into capos-lib::telnet so it is host-fuzzable and survives the Phase C move of Telnet framing into userspace per docs/proposals/networking-proposal.md.
    • Add fuzz/fuzz_targets/telnet_filter.rs with structural assertions (Normal must pass non-IAC bytes through unchanged; AfterIac is the only state allowed to emit a 0xFF; emitted byte count never exceeds input length).
    • Wired into make fuzz-build and make fuzz-smoke.
  • Track S.15 – Telnet differential round-trip + line-discipline extraction – landed 2026-04-27 17:18 EEST
    • Add fuzz/fuzz_targets/telnet_filter_roundtrip.rs: synthesize arbitrary RFC 854 event streams from fuzzer bytes, encode to wire, run through TelnetFilter, assert output equals the concatenation of Data(_) payloads. Found a real EXOPL handling bug – the option byte right after IAC SB was being mis-parsed as the start of an IAC IAC escape when its value was 0xFF, leaving the filter stuck in subnegotiation and silently dropping all subsequent data. Fixed via a new AfterSb state that consumes the option byte unconditionally; pinned by a regression test in capos-lib::telnet.
    • Extract the cooked-mode line discipline from kernel::cap::network into capos_lib::line_discipline::LineDiscipline, returning LineStep { outcome, echo } so all socket I/O stays at the caller. Add fuzz/fuzz_targets/line_discipline.rs with structural invariants (line_len <= max_bytes; ±1 line_len delta per Pending step; Cancelled clears; Echo::Byte/Backspace iff buffer grew/shrank by exactly one).
    • Future follow-up: differential against an external Telnet library (libtelnet C or Rust port) to catch RFC conformance bugs the structural targets cannot express.
  • Track S.16 – Ring SQE wire-validation extraction + fuzz target – landed 2026-04-27 19:42 EEST
    • Closes the original three-parser fuzz plan (elf::parse, manifest::decode, ring SQE decoder). Lifts the per-opcode *_sqe_has_unsupported_fields predicates from kernel/src/cap/ring.rs into capos_config::ring, exposes a unified sqe_wire_validation_error(&CapSqe) -> Result<(), i32> entry point, and reroutes the kernel through the shared functions so the kernel-host pair has one source of truth for ABI rules.
    • Add fuzz/fuzz_targets/sqe_validation.rs: cast arbitrary 64 bytes to CapSqe, run sqe_wire_validation_error and the matching per-opcode predicate, assert determinism, opcode-classification consistency (CAP_OP_FINISH -> CAP_ERR_UNSUPPORTED_OPCODE, unknown opcodes -> CAP_ERR_INVALID_REQUEST), and that the unified validator never disagrees with the predicate it dispatches to. Wired into make fuzz-build / fuzz-smoke.
    • Add 12 host unit tests in capos_config::ring covering the classification rules each opcode imposes (THREAD_OWNED + call_id pairing on CALL/PARK, RETURN’s APPLICATION_EXCEPTION flag, CANCEL’s required pipeline_dep target, NOP’s reserved-fields-zero rule, PARK_BENCH’s required addr).
    • The structural fuzz target pins arbitrary-byte behavior. The follow-up well-formed SQE generator oracle landed on 2026-06-06: the test/fuzz-only sqe-validation-oracle feature exposes capos_config::ring::sqe_oracle, which generates validator-accepted SQEs for each accepted opcode and one-field rejecting mutations, and fuzz/fuzz_targets/sqe_validation.rs runs that oracle on each input. This is a shared wire-validator oracle only; it does not claim cap-table lookup, userspace pointer mapping, transfer-descriptor loading, or full kernel ring semantic coverage. A future differential against an independent reference predicate remains a possible stronger disagreement oracle.
  • Track S.17 – Sanitizers on host tests – ASan landed; TSan blocked upstream
    • make sanitizer-host-tests runs RUSTFLAGS=-Zsanitizer=address over the capos-lib and capos-config host suites (crate set / features mirror the test-lib / test-config aliases) on the repo-pinned nightly + host target. It is a focused gate, not part of make check, mirroring dependency-policy-check / sdk-publish-dry-run. Outcome so far: zero findings; both suites pass clean, including the named unsafe suspects (FrameBitmap slot indexing, CapTable generation counters, lazy_buffer raw &mut [u8]). The §Tier 2 “cheap to add” claim holds for ASan, which needs no -Zbuild-std.
    • make sanitizer-host-tests-tsan is wired but currently blocked by an upstream cargo limitation, not a capOS defect. TSan changes the crate ABI, so rustc refuses to link sanitized code against the uninstrumented precompiled std; instrumenting std needs -Zbuild-std, which fails with duplicate core lang items for build-script-bearing dependencies (typenum / libc / cfg-if / subtle) when the sanitizer target equals the host triple. The exact reproduction (four attempted workarounds) is recorded in docs/backlog/security-verification.md Track S.17. Concurrency invariants are meanwhile covered by the dedicated Loom model (cargo test-ring-loom).
    • Done means: the ASan gate exists, runs under nightly, and any findings either land as fixes or get a documented disposition; the TSan target starts passing once the upstream -Zbuild-std + build-script issue is fixed.

Security Verification Tracks S.1 through S.5 have initial coverage. Track S.6 is ongoing doc hygiene and should move with review-process changes. Track S.8 must land before Stage 6 runs mutually-untrusting services. Track S.9 design is complete and now gates concrete implementation work in 3.6/5.2. Track S.11 gates device-driver work. Track S.12 should not expand bounds for their own sake; it is a refresh point when new kernel invariants make better proof targets available. Track S.13 closes the remaining target-3 gap from the table above.

7. What This Proposal Does Not Promise

  • No claim that capOS will be “secure” at the end. It will be harder to write a silently wrong change to the code paths the tooling covers, and it will be easier to find the ones that are still wrong.
  • No proof obligation on every PR. Kani and Loom are expensive to run on every push; CI runs them on a reduced schedule (e.g. nightly, or on PRs that touch the covered crates).
  • Userspace and host-tool bugs are in scope, but their impact is classified by boundary. A userspace bug should not compromise kernel isolation; a host-tool bug can still compromise the build TCB or developer/CI filesystem.
  • No claim that confidentiality is handled beyond architectural isolation. Timing channels, cache side channels, device side channels, and covert channels through shared services remain explicit research topics, not current implementation goals.

8. Relation to Other Docs

  • docs/research/sel4.md §1 and §6.1 already make the case that full verification is not the right goal. This proposal is the operational answer.
  • REVIEW.md is the reviewer’s rulebook. This proposal explains the security and verification rationale behind its common checklist and per-boundary prompts.
  • docs/tasks/** is the open-issue ledger. This proposal feeds it – every bug found by Tier 2/3/4 tooling gets a task record unless fixed in the same change.
  • docs/roadmap.md owns the stages; this proposal does not add stages, only a cross-cutting track that runs alongside them.
  • Task records under docs/tasks/ own concrete ordering; Security Verification Tracks S.1–S.17 above are mirrored there when they are actionable slices.
  • docs/design-risks-register.md is the consolidated index of long-horizon design risks and open architectural questions; consult it when this proposal’s open gaps reference a hazard whose primary owner lives in a subsystem proposal, backlog, or design file rather than here.