Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Security And Verification Backlog

Detailed decompositions for security and verification work. docs/tasks/README.md links here but should not inline these subtasks.

Stage-6 Trust-Boundary Refresh

  • Refresh trust-boundary docs after Stage 6 IPC/capability-transfer work.

Untrusted-Service Hardening Pass

Cover unmapped pointers, kernel-half pointers, invalid capability IDs, corrupted rings, SQ/CQ overflow behavior, and a service without Console authority. Audit manifest, ELF, SQE, params, and result-buffer paths so untrusted input fails closed instead of reaching kernel panic paths.

Completed context:

  • Panic-surface inventory: audited panic!, assert!, unwrap, and expect reachable from manifest, ELF, SQE, params, result-buffer, IPC, and spawn inputs.
  • Ring/user-pointer hostile demos: added unmapped params/result-pointer, kernel-half params-path, invalid-capability-ID, corrupted RETURN call_id, corrupted SQ/CQ head, undersized-params, undersized-result, and SQ/CQ overflow coverage.
  • No-authority smoke: empty-CapSet service verifies expected cap lookups fail and invalid-cap CALLs return controlled CQEs; after removal of syscall 0, it proves a no-authority process cannot write and can only exit/cap_enter.

Remaining decomposition:

  • Quota and exhaustion smokes (make run-untrusted-exhaustion, two QEMU passes; covered 2026-05-25 06:42 EEST):
    • Cap-table and endpoint-queue exhaustion fail closed without corrupting existing calls. Endpoint-queue is proven by the small-scratch core pass (per-owner queue ceiling -> Overloaded, then a held console call still completes). Cap-table is proven by the small-scratch core pass and the default-profile *-captable companion pass: single-frame MemoryObject allocations first return bounded FrameAllocator success replies, then continue until the per-process cap-slot ledger fails closed (Overloaded: failed to reserve MemoryObject cap slot); a held console call still completes after the boundary.
    • Scratch/result-buffer pressure returns controlled errors and later valid calls still complete (core pass: ring-scratch oversize CALL rejected with CAP_ERR_INVALID_REQUEST, reply-scratch clamp returns a serialized exception, then a valid console write completes).
    • Repeated invalid submissions stay bounded: each structurally invalid SQE returns a controlled per-SQE error CQE and the ring stays usable (a recovery NOP completes). Note: the per-key token-bucket log aggregation in docs/authority-accounting-transfer-design.md §3 (D1/D2 suppressed- count summary line) is still a design target, not implemented; the smoke asserts bounded per-SQE rejection, not the summary line.
    • Frame-grant-page exhaustion: not cleanly reachable from a smoke. For single-page allocations the cap-slot ceiling (PROCESS_CAP_SLOT_LIMIT, 256) is reached far before the frame-grant ledger (PROCESS_FRAME_GRANT_PAGE_LIMIT, 4096 pages), and reaching 4096 grant pages needs large contiguous allocations whose failure mode is physical fragmentation, not the grant ledger. The cap-table pass exercises the same fail-closed preflight-reserve path. Remaining gap.
  • Fail-closed cleanup: the FrameAllocator success-path result serialization now honors the caller’s effective reply-scratch capacity, so small-scratch processes can receive bounded MemoryObject result caps before cap-slot exhaustion fails closed. Closed by security-reply-scratch-success-path-limit-local-proof.

Kani Harness Bounds Refresh

  • Revisit Kani harness bounds and proof shape once capability transfer, resource accounting, or user-buffer validation has more concrete proof obligations. Keep current bounds practical for make kani-lib; expand only when the added verifier cost buys a specific kernel invariant.

DMA Assurance Model Operationalization

dma-assurance-model-v0 (2026-05-24) landed the accepted proposal (docs/proposals/dma-assurance-model-proposal.md) and inspectable-only TLA+/Alloy skeletons (models/dma/), but stopped there: no run target, no CI gate, no reconciliation with DMA code landed since. Kickoff task: dma-assurance-model-operationalization (decomposition — reconciled the v0 model with landed code and emitted the per-tool slices below).

  • Reconcile models/dma/ with landed invariants (ownership-generation on recycle, map-record-before-PTE-install ordering, drive-pin, epoch fence, scrub-before-free): gap table in models/dma/README.md grounded against the landed symbols, done 2026-06-04.
  • make model-dma-tla — bounded TLC run of dma_authority.tla (pinned TLC 2.19 / tla2tools 1.7.4 + pinned Temurin JRE 17.0.19), lifecycle ordering plus generation-keyed stale completion, record-before-PTE-install split, drive-pin/quarantine, and queue-enable epoch-fence interleavings, checked clean at 2 devices / 2 domains / 2 pages / 2 iovas, generations 0..1, done 2026-06-04: dma-assurance-model-tla-checked-gate.
  • make model-dma-alloy — Alloy analysis of dma_authority.als (pinned Alloy Analyzer 6.2.0), device/domain/IOVA/page/alias authority graph plus the ownership-generation stale-handle gate, checked at scope for 4, done 2026-06-04: dma-assurance-model-alloy-checked-gate.
  • make kani-dma-authority — bounded Kani over an extracted pure DMA-authority core (capos_lib::dma_authority: ownership-generation bump on recycle, stale-handle rejection without mutation, no-re-expose before completion), make kani-lib style, done 2026-06-04. Faithful extraction of the device_dma.rs authority arithmetic; routing the kernel call site through the core is a tracked follow-up (kernel is no_std/no_main, not host-built): dma-assurance-model-kani-authority-core.
  • make model-dma-deferred-completion-loom — focused Loom (pinned 0.7.2) over the DeferredCompletionQueue reservation budget and the multi-CPU TLB shootdown generation re-read (deferred-EOI / completion concurrency the ring Loom does not cover), done 2026-06-04: dma-assurance-model-deferred-completion-loom.
  • CI wiring into the GitHub gate and local aggregate now that each target has a checked result. make dma-assurance-model-check runs Alloy/TLA+/Loom/ Kani locally when cargo-kani is installed; GitHub CI runs model-dma-alloy, model-dma-tla, and model-dma-deferred-completion-loom in dma-assurance-models, and kani-dma-authority in kani-proofs. Done 2026-06-05: dma-assurance-model-ci-wiring.

Scheduler & IRQ Assurance Models

The scheduler is the densest unmodeled concurrency surface in the kernel (per-CPU atomics read lock-free from ISR context while another path holds the scheduler mutex via try_lock, plus IPI cross-CPU activation) and has zero formal coverage today (smoke + measured suppressed-tick counters only). The IRQ MSI-X waiter race was fixed by reproduction, not a model. Mirrors the DMA operationalization pattern; tasks reuse the TLC/Alloy/Kani pins that track lands.

  • S1 scheduler-nohz-activation-model (done 2026-06-04 09:00 UTC) – TLA+/TLC for the nohz activation/rollback lifecycle + a focused Loom for the lock-free NOHZ_ACTIVE_CPUS bit vs locked nohz_activation[slot] record race. make model-scheduler-nohz-tla checks no timer-less CPU (NoTimerlessStall + EventuallyReArmed), bit/record agreement (EventuallyConsistent), and that a staled remote activation is dropped not applied to a newer lease (NoStaleActivation + StaledRecordEventuallyCleared); make model-scheduler-nohz-loom checks the lock-free-bit ↔ locked-record reconciliation keeps the timer armed. Checked results + mutation/non-vacuity evidence in models/scheduler/README.md.
  • S2 scheduler-lapic-oneshot-timer-model (done 2026-06-04) – Kani over the extracted pure count/clamp arithmetic (capos_lib::clockevent) + a TLA+ mode-transition lemma pinning the halt-first reprogram ordering. make kani-lapic-oneshot proves the clamp window is well-formed, the armed count is in [1, u32::MAX] with no u128 overflow, and the count round-trips to the request within one LAPIC count (3/3 SUCCESSFUL). make model-scheduler-lapic-oneshot-tla checks that after the halt-first reprogram the next fire is the one-shot at the armed count, never the periodic reload (OneshotModeBoundedCount + HaltedDisarmed + the periodicFiredInOneshotMode sentinel), and that every fire path (user- and kernel-mode consumption) restores a timer source (NoTimerlessStall + EventuallyReArmed / FiredEventuallyRestored liveness). Checked results + mutation/non-vacuity evidence in models/scheduler/README.md.
  • S3/S4 scheduler-cpu-isolation-lease-authority-model (done 2026-06-04 07:04 UTC) – Alloy for the lease/grant relational invariants + TLA+ for the two-lock teardown and the documented non-atomic createLease-vs-revokeGrant SMP window. make model-scheduler-lease-alloy checks the unforgeable grant->lease binding, no live lease through a revoked grant outside the explicitly modeled bounded window, capacity never undercounting a live lease, and the stale-handle generation gate; make model-scheduler-lease-tla checks generation advances exactly once per termination, no capacity double-free, the single chokepoint always runs unregister + SQPOLL-stop + nohz-rollback before recycle, no stranded generation (liveness), and that the renew deadline branch never resurrects. Checked results + non-vacuity evidence in models/scheduler/README.md.
  • IRQ irq-msix-waiter-determinism-model (done 2026-06-04 06:10 UTC) – TLA+/TLC for the waiter <-> delivery <-> deferred-EOI ordering the RX MSI-X waiter fix established. make model-irq-waiter-tla checks no spurious/early injection (NoCompletedEarly), exactly-once delivery/EOI/completion accounting, EOI drain before route re-arm (EpochDrainSound), and the NoLostWake liveness property; checked result + mutation evidence in models/irq/README.md.

Preserved Completed Security Context

These are completed and should not be re-read by default. They remain here so future work can find their design context without bloating docs/tasks/README.md.

  • Authority graph and resource accounting design for transfer model: docs/authority-accounting-transfer-design.md.
  • Supply-chain and generated-code TCB hardening: pinned Limine and external build downloads, generated-code drift checks, dependency policy, pinned Cap’n Proto compiler, shared tools/capnp-build, and deterministic generated-binding comparisons.
  • DMA isolation model before PCI/virtio/user-driver work: docs/dma-isolation-design.md defines short-term QEMU bounce-buffer decision, DMAPool, DeviceMmio, Interrupt invariants, and the userspace-driver transition gate.
  • ELF parser arbitrary-input coverage: proptest coverage plus a bounded cargo-fuzz target.
  • Telnet IAC filter fuzz coverage: TelnetFilter extracted to capos-lib::telnet, fuzz/fuzz_targets/telnet_filter.rs exercises the state machine with structural assertions (Normal/AfterIac emission rules, monotonic emit count). Will travel with the parser when networking moves to userspace per docs/proposals/networking-proposal.md.
  • Telnet IAC filter differential round-trip fuzzing (fuzz/fuzz_targets/telnet_filter_roundtrip.rs): synthesize arbitrary RFC 854 event streams (Data, WILL/WONT/DO/DONT, SB blocks with payload), encode to wire bytes, and assert that filter output equals the concatenated Data payloads. Found a real EXOPL handling bug in the original filter – the option byte right after IAC SB was being interpreted as the start of an IAC IAC escape, leaving the filter stuck in the subnegotiation state and silently dropping all subsequent data bytes. Fixed via a new AfterSb state that consumes the option byte unconditionally before entering payload parsing.
  • Line discipline extraction and fuzz coverage: pure LineDiscipline lives in capos-lib::line_discipline, returning LineStep { outcome, echo } descriptions. The kernel transport drives it and translates Echo::Byte/Echo::Backspace/Cancelled/ Submitted/Reprompt into the existing send_*_track_cr calls. Backed by fuzz/fuzz_targets/line_discipline.rs with structural invariants (line_len <= max_bytes, ±1 line_len delta per Pending step, Cancelled clears, echo only when buffer grows/shrinks).
  • Future: differential fuzzing against an external Telnet library (libtelnet or a Rust port) to catch RFC conformance bugs the structural and round-trip targets cannot express. Tracked as a follow-up to Track S.14.
  • Ring SQE wire validation extraction and fuzz coverage: lifted the per-opcode *_sqe_has_unsupported_fields predicates from the kernel into capos_config::ring, exposed a unified sqe_wire_validation_error entry point, and reroute the kernel through it. Added fuzz/fuzz_targets/sqe_validation.rs plus 12 host unit tests covering the classification boundaries each opcode imposes. Closes the originally planned three-parser fuzz set (elf::parse, manifest::decode, ring SQE decoder).
  • Well-formed SQE generator oracle (docs/tasks/done/2026-06-06/security-sqe-well-formed-generator-fuzz-local-proof.md): the test/fuzz-only sqe-validation-oracle feature exposes capos_config::ring::sqe_oracle, which generates validator-accepted SQEs for every opcode accepted by the current build, plus one-field rejecting mutations for reserved fields, unsupported flags, constrained cap/pointer/size fields, opcode-specific constraints, and session-disclosure reserved bits. The existing sqe_validation fuzz target keeps arbitrary-byte coverage and runs the positive/negative oracle on each input. This closes the shared wire-validator oracle gap only; it does not claim cap-table lookup, userspace pointer mapping, transfer-descriptor loading, or full kernel ring semantic coverage.
  • Track S.17 – sanitizers on host tests – partially landed. make sanitizer-host-tests runs AddressSanitizer over the capos-lib and capos-config host suites (crate set / features mirror the test-lib / test-config aliases). Outcome: zero findings – both suites pass clean under ASan, including the named unsafe suspects (FrameBitmap slot indexing, CapTable generation counters, lazy_buffer raw &mut [u8]). The “cheap to add” claim holds for ASan only: it needs no -Zbuild-std because its libc interceptors cover the uninstrumented precompiled std. - ThreadSanitizer (make sanitizer-host-tests-tsan) is blocked upstream, not by a capOS defect. TSan changes the crate ABI, so rustc refuses to link sanitized code against the uninstrumented precompiled std (mixing -Zsanitizer will cause an ABI mismatch). Instrumenting std needs -Zbuild-std, which then fails with duplicate lang item in crate core: sized for build-script-bearing dependencies (typenum / libc / cfg-if / subtle) when the sanitizer target equals the host triple – reproduced four ways (plain -Zbuild-std, renamed --target JSON spec, target-applies-to-host=false, and -Zhost-config). The TSan target is kept wired so it starts passing once the upstream -Zbuild-std + build-script issue is fixed. capOS concurrency invariants are meanwhile covered by the dedicated Loom model (cargo test-ring-loom); these host unit tests spawn no threads, so TSan’s marginal value here is low.