# Design Risks and Open Questions Register

Consolidated index of known design risks and open architectural questions for
capOS. Every entry routes to the file that owns the long-form design or the
remediation backlog for that risk; this register itself is a pointer document,
not a place to put new design.

Use this document to answer "is this risk already tracked, and where?" without
re-deriving the state from the proposal tree on each review.

Last refresh: 2026-04-29 11:52 UTC.

## How To Use

- Each design-risk row records the **current observable state** (what the code
  and docs say today), the **owning tracker** (the proposal/backlog/design file
  to update when the state changes), and the **remaining gap** (what is still
  open).
- Each open-question row records a **current answer** if one exists in the
  tree, plus a **pointer** to the canonical tracker. Questions that are
  genuinely unanswered are marked **Open**; those should not be closed by
  guessing here -- update the relevant proposal, then update this register.
- When a risk is closed by code or by an explicit design decision, move the
  short closure summary into `docs/changelog.md` and remove the row.
- New review findings still go into `REVIEW_FINDINGS.md`; this register is
  about long-horizon design risks, not concrete unresolved review issues.

## Design Risks

### R1 -- Process-wide ring vs multi-threaded userspace and full SMP

- **State.** The capability ring is one per process. `capos-rt` enforces a
  single-owner `RuntimeRingClient`. Even with in-process threading, at most
  one process-ring waiter is allowed. The first SMP Phase C AP scheduler-owner
  proof deliberately keeps process-wide ring execution on a single CPU at a
  time behind a scheduler-owner latch.
- **Owner.** `docs/proposals/ring-v2-smp-proposal.md`,
  `docs/research/completion-ring-threading.md`,
  `docs/backlog/smp-phase-c.md`, `docs/architecture/threading.md`.
- **Gap.** Per-thread capability rings, per-thread completion routing, and the
  Multi-Process / In-Process Threading Scalability milestones in
  `docs/roadmap.md` remain future work. Userspace threading scales only as far
  as the single ring waiter allows.

### R2 -- "Interface IS the permission" pushes safety into wrapper TCB

- **State.** capOS deliberately has no parallel rights bitmask: attenuation is
  done by handing out a narrower `CapObject` wrapper, not a flag-reduced copy
  of the same cap. Wrapper correctness is therefore part of the trust base.
- **Owner.** `docs/capability-model.md`,
  `docs/proposals/session-bound-invocation-context-proposal.md`,
  `docs/security/trust-boundaries.md`,
  `docs/backlog/stage-6-capability-semantics.md`.
- **Gap.** The selected Session-Bound Invocation Context migration has the
  one-session-per-process proof, privacy-preserving endpoint caller-session
  metadata, explicit subject-disclosure coverage, chat session-keyed state,
  and terminal/stdio bridge liveness guards. Remaining cleanup is focused on
  peer-owned Adventure/shared-service migration, retiring remaining normal
  user-facing receiver-selector syntax, and final full-gate verification before
  treating the Tier-1 paper prerequisite as closed.
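
As an illustration of the attenuation style described under **State**, the sketch below narrows authority by wrapping an object in a type that exposes fewer methods, rather than by clearing bits in a rights mask. All names here (`Store`, `ReadStore`, `MemStore`, `ReadOnly`) are hypothetical and are not capOS interfaces; the real `CapObject` wrappers are defined by the owning documents above.

```rust
use std::collections::HashMap;

/// Hypothetical full-authority interface: read and write.
pub trait Store {
    fn get(&self, key: &str) -> Option<String>;
    fn put(&mut self, key: &str, value: &str);
}

/// Hypothetical narrower interface. The absence of `put` is the attenuation.
pub trait ReadStore {
    fn get(&self, key: &str) -> Option<String>;
}

/// Hypothetical in-memory backing object.
#[derive(Default)]
pub struct MemStore {
    entries: HashMap<String, String>,
}

impl Store for MemStore {
    fn get(&self, key: &str) -> Option<String> {
        self.entries.get(key).cloned()
    }

    fn put(&mut self, key: &str, value: &str) {
        self.entries.insert(key.to_string(), value.to_string());
    }
}

/// Wrapper handed to less-trusted holders. It owns the full object but
/// implements only the narrow interface.
pub struct ReadOnly<S: Store> {
    inner: S,
}

impl<S: Store> ReadOnly<S> {
    pub fn new(inner: S) -> Self {
        Self { inner }
    }
}

impl<S: Store> ReadStore for ReadOnly<S> {
    fn get(&self, key: &str) -> Option<String> {
        // Only the read path is forwarded. A bug here (for example,
        // forwarding a write) would silently widen authority, which is why
        // wrapper correctness counts as part of the trust base.
        self.inner.get(key)
    }
}
```

A holder of `ReadOnly<MemStore>` has no way to name `put`, so the interface it sees is the whole of its permission.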

### R3 -- Legacy endpoint metadata as transitional service identity

- **State.** Legacy endpoint receiver metadata is contained as internal
  transport/debug state for normal paths. Chat uses session-keyed membership,
  terminal/stdio bridges enforce live caller-session guards, and delegated
  relabeling containment plus the historical service-object routing/lifecycle
  proof have landed. Adventure/shared-service cleanup remains the visible
  tail of the selected migration.
- **Owner.** `docs/proposals/session-bound-invocation-context-proposal.md`,
  `docs/backlog/stage-6-capability-semantics.md`.
- **Gap.** Finish Adventure/shared-service migration and final legacy cleanup.
  Receiver metadata must remain internal transport state or hostile-test
  fixture, not subject identity or disclosure.

### R4 -- Resource accounting is fragmented

- **State.** Per-process memory, cap-table, and thread quotas exist;
  `ResourceProfile`, session quotas, scheduling-context donation, and
  cross-service donation/fairness are still proposal-shaped.
- **Owner.** `docs/proposals/resource-accounting-proposal.md`,
  `docs/proposals/oom-and-swap-proposal.md`,
  `docs/proposals/user-identity-and-policy-proposal.md`,
  `docs/proposals/system-monitoring-proposal.md`.
- **Gap.** CPU accounting/scheduling contexts, log volume accounting,
  per-service fairness, donation semantics, and unified resource bundles for
  guest/anonymous/external/service principals are not implemented.

### R5 -- Copy-transfer SQE replay is repeatable by design

- **State.** `docs/authority-accounting-transfer-design.md` documents that
  userspace replay of a copy-transfer SQE is repeatable per dispatch attempt,
  with move-transfer replay failing closed once the source slot is
  removed/reserved. Exactly-once replay suppression is explicitly future work
  (security invariant T3).
- **Owner.** `docs/authority-accounting-transfer-design.md`,
  `docs/proposals/security-and-verification-proposal.md`.
- **Gap.** The `(sender_pid, call_id, sqe_seq)` plus monotonic transfer-epoch
  identity needed for exactly-once replay across dispatch attempts is not
  implemented. Each transferable interface must continue to acknowledge this
  in its threat model.
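
To make the missing identity concrete, here is a minimal sketch of how a `(sender_pid, call_id, sqe_seq)` tuple plus a transfer epoch could be used to suppress replays. It is an assumption-laden illustration, not the design in the owning documents: the field widths, the `ReplayFilter` type, the `Admit` outcomes, and the choice to clear the seen-set on each epoch advance are all hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical identity of one transfer attempt, following the tuple named
/// in the gap above. Field widths are assumptions.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct TransferId {
    pub sender_pid: u32,
    pub call_id: u64,
    pub sqe_seq: u64,
    pub epoch: u64,
}

/// Hypothetical admission outcomes.
#[derive(Debug, PartialEq, Eq)]
pub enum Admit {
    /// First time this identity is seen in the current epoch.
    Fresh,
    /// Same identity already admitted in the current epoch.
    Replay,
    /// Identity carries an epoch other than the current one.
    WrongEpoch,
}

/// Hypothetical exactly-once filter.
pub struct ReplayFilter {
    current_epoch: u64,
    seen: HashSet<TransferId>,
}

impl ReplayFilter {
    pub fn new() -> Self {
        Self { current_epoch: 0, seen: HashSet::new() }
    }

    /// Advance the monotonic epoch. Older identities are rejected by the
    /// epoch check from now on, so the seen-set can be emptied to stay bounded.
    pub fn advance_epoch(&mut self) {
        self.current_epoch += 1;
        self.seen.clear();
    }

    /// Admit a transfer at most once per identity per epoch.
    pub fn admit(&mut self, id: TransferId) -> Admit {
        if id.epoch != self.current_epoch {
            return Admit::WrongEpoch;
        }
        if self.seen.insert(id) {
            Admit::Fresh
        } else {
            Admit::Replay
        }
    }
}
```

Under these assumptions a second submission of the same SQE is reported as `Replay` instead of being dispatched again, which is the behaviour the current copy-transfer semantics do not provide.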

### R6 -- `CAP_OP_RELEASE` is deferred / queued, not synchronous

- **State.** Owned-handle drop in `capos-rt` queues one local `CAP_OP_RELEASE`
  on the ring; process exit performs fallback cleanup. Release does not run
  before the next ring flush (`cap_enter` or process exit).
- **Owner.** `docs/authority-accounting-transfer-design.md`,
  `docs/proposals/error-handling-proposal.md`,
  `docs/capability-model.md`.
- **Gap.** Resource-pressure or revocation-sensitive flows must not assume a
  Drop call has already taken effect at the kernel layer. Time-critical
  revocation should use `CapabilityManager.revoke` or epoch revocation rather
  than relying on Drop.
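
The timing hazard can be shown with a small model. This is a sketch only: `MockRing` and `OwnedHandle` are hypothetical stand-ins, not `capos-rt` types, and `flush` merely models the point at which queued releases take effect.

```rust
use std::cell::RefCell;
use std::collections::{BTreeSet, VecDeque};
use std::rc::Rc;

/// Hypothetical stand-in for the process ring plus the kernel-side cap table.
#[derive(Default)]
pub struct MockRing {
    pending_release: VecDeque<u32>,
    live_slots: BTreeSet<u32>,
}

impl MockRing {
    /// True while the kernel side still considers the slot occupied.
    pub fn is_live(&self, slot: u32) -> bool {
        self.live_slots.contains(&slot)
    }

    /// Models the ring flush: only here do queued releases take effect.
    /// Returns the number of releases processed.
    pub fn flush(&mut self) -> usize {
        let mut processed = 0;
        while let Some(slot) = self.pending_release.pop_front() {
            self.live_slots.remove(&slot);
            processed += 1;
        }
        processed
    }
}

/// Hypothetical owned handle whose drop queues a release.
pub struct OwnedHandle {
    slot: u32,
    ring: Rc<RefCell<MockRing>>,
}

impl OwnedHandle {
    /// Marks the slot live and returns a handle that owns it.
    pub fn grant(ring: &Rc<RefCell<MockRing>>, slot: u32) -> Self {
        ring.borrow_mut().live_slots.insert(slot);
        Self { slot, ring: Rc::clone(ring) }
    }
}

impl Drop for OwnedHandle {
    fn drop(&mut self) {
        // Queue only. Nothing is released on the kernel side yet.
        self.ring.borrow_mut().pending_release.push_back(self.slot);
    }
}
```

In this model, dropping a handle leaves `is_live` true until `flush` runs, which is the window that revocation-sensitive code must not ignore.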

### R7 -- Shared memory / zero-copy / shared park are incomplete

- **State.** `MemoryObject` substrate exists; `SharedBuffer` provenance,
  file/network/DMA zero-copy paths, and shared park/`SharedParkSpace` are blocked on
  mapping provenance / object pinning work.
- **Owner.** `docs/proposals/storage-and-naming-proposal.md`,
  `docs/proposals/networking-proposal.md`,
  `docs/architecture/park.md`,
  `docs/backlog/runtime-network-shell.md`.
- **Gap.** Workloads that need true zero-copy IPC, storage, or network
  pipelines pay a copy/serialization cost until provenance/pinning lands.
  ParkSpace private waiters on reusable unmapped addresses remain restricted
  by documentation until VM-unmap stale-key cleanup and tests land.

### R8 -- Networking lives inside the kernel TCB

- **State.** virtio-net, ARP/ICMP, the smoltcp runtime, and the TCP
  `CapObject`s currently run in the kernel address space behind capability
  objects. The Telnet and SSH terminal-host proofs are built on this path.
- **Owner.** `docs/proposals/networking-proposal.md`,
  `docs/dma-isolation-design.md`,
  `docs/backlog/runtime-network-shell.md`.
- **Gap.** Userspace NIC driver and userspace TCP stack are Phase C / future
  work; until then the in-kernel network stack enlarges the kernel TCB, which
  runs against the long-term service-decomposition direction.

### R9 -- DMA threat model assumes cooperative virtio

- **State.** `docs/dma-isolation-design.md` records that the first virtio-net
  smoke uses kernel-owned bounce buffers, does not expose userspace DMA
  buffers or physical addresses, and explicitly assumes a non-hostile QEMU
  virtio device. Without an IOMMU a malicious bus master can DMA arbitrary
  RAM.
- **Owner.** `docs/dma-isolation-design.md`,
  `docs/proposals/networking-proposal.md`,
  `docs/proposals/cloud-deployment-proposal.md`.
- **Gap.** `DMAPool` / `DeviceMmio` / `Interrupt` userspace-driver gating, and
  IOMMU-backed isolation for production hardware, are not implemented.

### R10 -- Boot package model embeds all binaries

- **State.** `tools/mkmanifest` embeds every declared binary as a `NamedBlob`
  inside `manifest.bin`. The kernel loads only `init`; everything else is
  fetched by `init` from the in-memory `BootPackage`.
- **Owner.** `docs/backlog/hardware-boot-storage.md`,
  `docs/proposals/storage-and-naming-proposal.md`,
  `docs/trusted-build-inputs.md`.
- **Gap.** Boot binary ISO layout (separate ELF payloads), package/storage
  update model, and persistent storage-backed delivery have not yet landed as
  code; the current scheme is an explicit prototype compromise.

### R11 -- Pre-auth and post-auth share a shell process

- **State.** The shell-led boot flow folds `console-login` into `capos-shell`
  and uses an anonymous-first session that escalates via `login`/`setup`. The
  pre-auth and post-auth code paths run in one userspace process and address
  space.
- **Owner.** `docs/proposals/boot-to-shell-proposal.md`,
  `docs/proposals/shell-proposal.md`,
  `docs/security/trust-boundaries.md`,
  `docs/proposals/user-identity-and-policy-proposal.md`.
- **Gap.** Separation depends on shell/auth implementation quality, not on a
  process boundary. The future direction (separate login service with minimal
  authority, restricted launchers, WebShell/SshGateway) is proposal-shaped.
  Remote and non-loopback shells must remain blocked until pre-auth and
  post-auth authority are process-isolated or a shared-process proof is
  accepted.

### R16 -- Remote shell ingress is demo/prototype only

- **State.** Telnet is a plaintext loopback-only QEMU demo. SSH has
  SSH-shaped prerequisites, fixture authentication proofs, dev key material,
  policy classification, and restricted-shell launcher coverage, but no
  production encrypted SSH transport, durable key/account storage, full
  OpenSSH-compatible userauth/channel handling, channel binding, or complete
  audit/storage gates.
- **Owner.** `docs/proposals/ssh-shell-proposal.md`,
  `docs/backlog/runtime-network-shell.md`, `WORKPLAN.md`,
  `docs/build-run-test.md`.
- **Gap.** Production/non-loopback shell exposure is blocked on SSH transport,
  key, account, audit, storage, session-bound delegation, and pre-auth/post-auth
  isolation gates.

### R12 -- Verification coverage is partial, not full proof

- **State.** A bounded Kani gate (`make kani-lib`/`make kani-lib-full`), a Loom
  ring model, Miri lib tests, proptest, fuzz harnesses, a panic-surface
  inventory, and a CI dependency policy exist. Coverage is neither
  whole-system nor seL4-style functional refinement.
- **Owner.** `docs/proposals/security-and-verification-proposal.md`,
  `docs/security/verification-workflow.md`,
  `docs/panic-surface-inventory.md`,
  `docs/backlog/security-verification.md`.
- **Gap.** Public/external claims must distinguish "bounded model checked"
  from "fully verified". Promote new properties into Kani/Loom only when the
  invariant is concrete and bounded.

### R13 -- Trusted build inputs are partly pinned

- **State.** Limine (commit + artifact SHA-256), `capnp` 1.2.0 source tarball,
  CUE 0.16.0, mdBook/mdbook-mermaid, Typst 0.14.2, Cargo lockfiles, and the
  Kani toolchain bundle are pinned. The Rust nightly toolchain is a floating
  channel without a date or hash pin; `xorriso`, `qemu-system-x86_64`, and
  OVMF firmware are observed-not-pinned.
- **Owner.** `docs/trusted-build-inputs.md`.
- **Gap.** Reproducible-build pinning for the Rust nightly (date/hash), host
  ISO/firmware tools, and final ISO/payload checksums is unfinished; the
  document already lists these as open S.10.x gaps.

### R14 -- User identity / policy is proposal-shaped

- **State.** Anonymous/operator sessions, password setup/login, broker-issued
  shell bundles, and redacted audit records exist. Durable accounts, ABAC/MAC
  context, OIDC/passkeys, disk-backed account stores, and resource bundles
  are proposal-shaped.
- **Owner.** `docs/proposals/user-identity-and-policy-proposal.md`,
  `docs/backlog/local-users-management.md`,
  `docs/proposals/oidc-and-oauth2-proposal.md`,
  `docs/proposals/certificates-and-tls-proposal.md`,
  `docs/proposals/cryptography-and-key-management-proposal.md`.
- **Gap.** Until durable identity / persistence / passkey paths land, capOS
  is not a complete multi-user OS. Demo claims must be scoped to the proven
  anonymous + operator + manifest-seeded local accounts model.

### R15 -- App exception serialization depends on result-buffer capacity

- **State.** Application-level exceptions are serialized into the caller's
  result buffer; if the target cannot be identified, invocation fails earlier
  with transport errors. Truncation/transport failures are documented.
- **Owner.** `docs/proposals/error-handling-proposal.md`,
  `docs/capability-model.md`.
- **Gap.** Service UX/debuggability can degrade for malformed or
  small-buffer clients. No remediation is required in code today, but each
  service contract should document its expected result-buffer capacity.
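
The small-buffer degradation can be illustrated with a deliberately simplified model. This sketch assumes a plain UTF-8 message and a byte-slice result buffer; the real serialization format is defined by the owning documents, and `write_exception` and `Written` are hypothetical names.

```rust
/// Hypothetical outcome of writing an exception message into a fixed-size
/// result buffer. The payload is the number of bytes written.
#[derive(Debug, PartialEq, Eq)]
pub enum Written {
    Complete(usize),
    Truncated(usize),
}

/// Copies as much of `message` as fits into `result_buf`, cutting only on a
/// UTF-8 character boundary so the written prefix stays valid text.
pub fn write_exception(message: &str, result_buf: &mut [u8]) -> Written {
    if message.len() <= result_buf.len() {
        result_buf[..message.len()].copy_from_slice(message.as_bytes());
        return Written::Complete(message.len());
    }

    // The buffer is shorter than the message: back up to a char boundary.
    // Index 0 is always a boundary, so this loop terminates.
    let mut end = result_buf.len();
    while !message.is_char_boundary(end) {
        end -= 1;
    }
    result_buf[..end].copy_from_slice(&message.as_bytes()[..end]);
    Written::Truncated(end)
}
```

A client that supplies a 3-byte buffer for a longer message receives only a short prefix and a truncation signal, which is the debuggability cost a service contract can avoid by stating its expected result-buffer capacity.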

## Open Design Questions

The following questions came up in external review. Each row gives the
**current best answer** observed in the tree, the **canonical tracker** to
update, and an explicit **status**.

### Q1 -- Cap'n Proto ABI compatibility policy

- **Current answer.** Schema interface IDs / method IDs / struct layout are
  reviewed for stability via `make generated-code-check` and
  `tools/check-generated-capnp.sh`. There is no published ABI-compatibility
  guarantee window yet.
- **Tracker.** `docs/proposals/error-handling-proposal.md`,
  `docs/trusted-build-inputs.md`, `schema/capos.capnp`.
- **Status.** Open. A formal ABI-evolution policy (interface-id stability,
  reserved-field semantics, deprecation window) needs to be authored before
  external consumers depend on the schema.

### Q2 -- Ring v2 backward compatibility

- **Current answer.** `docs/proposals/ring-v2-smp-proposal.md` treats per-thread
  ring ownership as the full-SMP target and frames it as an evolution that
  may need ABI changes; `WORKPLAN.md` calls runtime ring reactor work the
  compatibility bridge.
- **Tracker.** `docs/proposals/ring-v2-smp-proposal.md`,
  `docs/backlog/smp-phase-c.md`.
- **Status.** Open. Whether Ring v2 is backward-compatible with the
  process-wide ring or an explicit ABI break has not been decided.

### Q3 -- Which capabilities are copy-transferable vs move-only vs non-transferable

- **Current answer.** `docs/authority-accounting-transfer-design.md` defines
  copy/move/none transfer modes and the accounting/rollback rules.
  Per-interface transfer mode is encoded on the schema-defined `CapObject`.
- **Tracker.** `docs/authority-accounting-transfer-design.md`,
  `schema/capos.capnp`.
- **Status.** Partial. The mode is enforced per object, but the user-visible
  matrix (which named caps are copy/move/none) is not consolidated in one
  document.

### Q4 -- Copy-transfer replay: feature or compromise

- **Current answer.** Repeatable copy-transfer replay is documented as the
  current accepted semantics. Exactly-once replay suppression is future work.
  See R5.
- **Tracker.** `docs/authority-accounting-transfer-design.md`.
- **Status.** Decided as "current semantics, future tightening optional".

### Q5 -- When legacy endpoint identity is replaced and what migrates

- **Current answer.** `docs/backlog/session-bound-invocation-context.md`
  decomposes the selected migration: one immutable session context per process,
  privacy-preserving endpoint caller-session metadata, chat/adventure/stdio
  session-keyed migration, and legacy endpoint-identity cleanup. The old
  service-object identity plan is superseded.
- **Tracker.** `docs/proposals/session-bound-invocation-context-proposal.md`,
  `docs/backlog/session-bound-invocation-context.md`,
  `docs/backlog/stage-6-capability-semantics.md`.
- **Status.** Selected milestone. See R3.

### Q6 -- Minimum production TCB target

- **Current answer.** `docs/proposals/security-and-verification-proposal.md`
  now enumerates the current demo/proof TCB and the target production TCB.
  Current proofs still trust kernel networking, init/supervisors, broker/session
  services, harnesses, and QEMU virtio. The target production TCB removes
  ordinary apps and shell children but still includes minimal init/supervisor,
  credential/session/broker/key/audit services, production device managers,
  and ABI/schema/build-signature inputs.
- **Tracker.** `docs/security/trust-boundaries.md`,
  `docs/proposals/userspace-authority-broker-proposal.md`,
  `docs/proposals/boot-to-shell-proposal.md`.
- **Status.** Partially answered. The TCB statement exists; reducing the actual
  implementation to that target and proving the non-loopback shell gates remains
  open.

### Q7 -- Revocation strategy

- **Current answer.** Generation/epoch revocation exists for endpoint-backed
  caps; `CapabilityManager.revoke` cleans up endpoint-backed service objects
  by object behavior. Revocation trees, leases, and supervisor-owned-cap
  patterns are proposal-shaped.
- **Tracker.** `docs/proposals/service-architecture-proposal.md`,
  `docs/proposals/session-bound-invocation-context-proposal.md`,
  `docs/capability-model.md`.
- **Status.** Open. The chosen revocation primitive set (epochs vs trees vs
  leases vs explicit-revoke methods per object) needs an explicit decision.
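
For readers comparing the candidate primitives, the sketch below shows the general generation/epoch technique in isolation: a capability records the generation at which it was minted, and bumping the object's generation invalidates every outstanding capability at once. It is not the capOS implementation and does not prejudge the open decision; `Endpoint`, `EndpointCap`, and `InvokeError` are hypothetical names.

```rust
/// Hypothetical endpoint-backed object carrying a generation counter.
pub struct Endpoint {
    generation: u64,
}

/// Hypothetical capability that remembers the generation it was minted at.
#[derive(Clone, Copy, Debug)]
pub struct EndpointCap {
    minted_at: u64,
}

/// Hypothetical invocation error.
#[derive(Debug, PartialEq, Eq)]
pub enum InvokeError {
    Revoked,
}

impl Endpoint {
    pub fn new() -> Self {
        Self { generation: 0 }
    }

    /// Mint a capability bound to the current generation.
    pub fn mint(&self) -> EndpointCap {
        EndpointCap { minted_at: self.generation }
    }

    /// Revoke every capability minted so far in constant time, without
    /// tracking who holds them.
    pub fn revoke_all(&mut self) {
        self.generation += 1;
    }

    /// Invocation succeeds only for capabilities from the current generation.
    pub fn invoke(&self, cap: EndpointCap) -> Result<(), InvokeError> {
        if cap.minted_at == self.generation {
            Ok(())
        } else {
            Err(InvokeError::Revoked)
        }
    }
}
```

The trade-off visible even in this toy is that epoch revocation is all-or-nothing per object; selective revocation is what trees and leases would add.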

### Q8 -- Boundary between kernel and service-level resource accounting

- **Current answer.** Memory frame grants and cap-table slots are kernel
  accounting; storage/network buffer accounting is proposed at the service
  layer. The boundary is not yet implementation-driven.
- **Tracker.** `docs/proposals/resource-accounting-proposal.md`,
  `docs/proposals/storage-and-naming-proposal.md`,
  `docs/proposals/networking-proposal.md`.
- **Status.** Open.

### Q9 -- CPU accounting and scheduling contexts

- **Current answer.** Round-robin is the current scheduler. Donation,
  priority inheritance, fixed budgets, and explicit scheduling capabilities
  are proposed but not implemented.
- **Tracker.** `docs/proposals/smp-proposal.md`,
  `docs/proposals/resource-accounting-proposal.md`,
  `docs/architecture/scheduling.md`.
- **Status.** Open. Pick a CPU accounting model before the Multi-Process SMP
  Concurrency or In-Process Threading Scalability milestones land.

### Q10 -- IOMMU requirement for userspace networking

- **Current answer.** `docs/dma-isolation-design.md` keeps the kernel-owned
  bounce-buffer model for QEMU virtio and explicitly defers an IOMMU
  requirement to the userspace driver gate.
- **Tracker.** `docs/dma-isolation-design.md`,
  `docs/proposals/networking-proposal.md`,
  `docs/proposals/cloud-deployment-proposal.md`.
- **Status.** Open. The production answer (IOMMU-required vs continued bounce
  buffers vs both) depends on platform support and will be decided at the
  userspace driver gate.

### Q11 -- Capability persistence model

- **Current answer.** All capabilities are runtime-only today; sealed/stored
  caps and namespace-mediated reconstitution are storage-proposal scope.
- **Tracker.** `docs/proposals/storage-and-naming-proposal.md`,
  `docs/proposals/volume-encryption-proposal.md`,
  `docs/paper/plan.md` (paper-scoped persistence Tier-1 prerequisite).
- **Status.** Open.

### Q12 -- Least-privilege shell command invocation

- **Current answer.** `capos-shell` runs commands using broker-issued
  bundles; the broker, not the shell, is the policy decision point.
  `RestrictedShellLauncher` keeps remote shell launches off raw spawn
  authority.
- **Tracker.** `docs/proposals/shell-proposal.md`,
  `docs/proposals/userspace-authority-broker-proposal.md`,
  `docs/proposals/boot-to-shell-proposal.md`.
- **Status.** Direction agreed; complete migration to broker-only authority
  for every shell-driven invocation remains open.

### Q13 -- Formal properties to prove

- **Current answer.** Existing bounded proofs cover cap-table non-forgery,
  frame-bitmap invariants, transfer rollback, and ring producer-consumer
  invariants. seL4-style full functional refinement is explicitly out of
  scope.
- **Tracker.** `docs/proposals/security-and-verification-proposal.md`,
  `docs/security/verification-workflow.md`,
  `docs/proposals/formal-mac-mic-proposal.md`.
- **Status.** Partially answered. A definitive list of "what we will keep
  proving" vs "what we will keep testing" should be added when the next
  Kani/Loom obligation set is concrete.

### Q14 -- Threat model coverage

- **Current answer.** `docs/proposals/security-and-verification-proposal.md`
  now contains a threat actor matrix for local physical attackers, malicious DMA
  devices, malicious boot manifests, compromised init/supervisors, compromised
  narrow services, hostile network peers, and malicious build dependencies.
- **Tracker.** `docs/security/trust-boundaries.md`,
  `docs/proposals/security-and-verification-proposal.md`,
  `docs/dma-isolation-design.md`,
  `docs/trusted-build-inputs.md`.
- **Status.** Answered at design level. Remaining work is implementation/proof
  for the gates listed in `REVIEW_FINDINGS.md`.

### Q15 -- Language runtimes integration model

- **Current answer.** `capos-rt` is the canonical no_std Rust runtime;
  Go, Lua, WASI, and POSIX adapters are proposal-shaped and currently
  framed as separate runtimes consuming generated capnp clients, not as a
  shared C/WASI ABI layer.
- **Tracker.** `docs/proposals/userspace-binaries-proposal.md`,
  `docs/proposals/go-runtime-proposal.md`,
  `docs/proposals/lua-scripting-proposal.md`.
- **Status.** Open. A common ABI layer vs per-runtime generated clients has
  not been decided; the current default is per-runtime clients.
