Research And Design Gaps Backlog

This file tracks important OS design, development, and user-story areas that are absent, thinly covered, or only indirectly owned by existing capOS proposals. It is a triage register, not an execution queue. Listing a gap here does not change the selected milestone in WORKPLAN.md and does not mean the project should immediately create a full proposal.

Promote an entry out of this file only when a visible milestone, paper evidence gap, review finding, or explicit user direction makes the area actionable. Promotion targets:

  • docs/research/ for prior-art survey or external precedent.
  • docs/proposals/ for a concrete reviewed design direction.
  • A focused docs/backlog/ file when the design is accepted well enough to decompose into implementation tasks.
  • docs/design-risks-register.md when the gap is an active architectural risk with an owner.

Status Vocabulary

  • Uncovered: no owned design exists yet.
  • Thin: mentioned indirectly, but no coherent owner or decision record.
  • Backlog-only: task decomposition exists without a full proposal.
  • Research-needed: design should not start before prior-art review.
  • Ready-for-proposal: enough constraints exist to draft a proposal.
  • Deferred: intentionally future work, not a near-term blocker.
  • Rejected: considered and explicitly not pursued.
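
If entries in this file are ever linted or indexed by tooling, the vocabulary above could be encoded as a small enumeration. The following is a minimal sketch, assuming a hypothetical `GapStatus` enum and `parse_status` helper; neither exists in capOS today, and the "Status: <word>." line shape is taken from the entries below.

```python
from enum import Enum


class GapStatus(Enum):
    """Status vocabulary for backlog entries (hypothetical names)."""
    UNCOVERED = "Uncovered"
    THIN = "Thin"
    BACKLOG_ONLY = "Backlog-only"
    RESEARCH_NEEDED = "Research-needed"
    READY_FOR_PROPOSAL = "Ready-for-proposal"
    DEFERRED = "Deferred"
    REJECTED = "Rejected"


def parse_status(line: str) -> GapStatus:
    """Parse a line such as 'Status: Thin.' into a GapStatus.

    Raises ValueError when the line is not a status line or uses a
    word outside the vocabulary.
    """
    prefix = "Status:"
    text = line.strip()
    if not text.startswith(prefix):
        raise ValueError(f"not a status line: {line!r}")
    # Drop the prefix, surrounding spaces, and the trailing period.
    word = text[len(prefix):].strip().rstrip(".")
    return GapStatus(word)
```

A checker built this way would reject any status word that is not in the list above, which keeps the vocabulary closed.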

Promotion Checklist

Before creating a proposal from an entry here:

  • Identify the visible user or operator outcome the work would enable.
  • List existing capOS docs that already partially cover the area.
  • List the docs/research/ files actually read, or explain why no research file applies.
  • Decide the first capability boundary or trust boundary that must be designed.
  • Define one QEMU, host-test, documentation, or review gate that would prove the proposal made progress.
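
The checklist above can be read as a record with five required answers. The sketch below is illustrative only: `PromotionRequest`, its field names, and the lowercase gate spellings are assumptions, not an existing capOS tool or schema.

```python
from dataclasses import dataclass, field
from typing import List

# Gate kinds named by the checklist; the lowercase spellings are an assumption.
GATE_KINDS = ("qemu", "host-test", "documentation", "review")


@dataclass
class PromotionRequest:
    """One backlog entry proposed for promotion (hypothetical record)."""
    outcome: str = ""                 # visible user or operator outcome
    # Assumption: an area with no coverage lists an explicit "none" entry.
    existing_docs: List[str] = field(default_factory=list)
    research_read: List[str] = field(default_factory=list)
    no_research_reason: str = ""      # required when research_read is empty
    first_boundary: str = ""          # first capability or trust boundary
    gate_kind: str = ""               # one of GATE_KINDS
    gate_description: str = ""        # what the gate would demonstrate

    def missing_items(self) -> List[str]:
        """Return the checklist items that are still unanswered."""
        missing = []
        if not self.outcome.strip():
            missing.append("visible user or operator outcome")
        if not self.existing_docs:
            missing.append("existing capOS docs that partially cover the area")
        if not self.research_read and not self.no_research_reason.strip():
            missing.append("research files read, or why none applies")
        if not self.first_boundary.strip():
            missing.append("first capability or trust boundary")
        if self.gate_kind not in GATE_KINDS or not self.gate_description.strip():
            missing.append("one gate that would prove progress")
        return missing
```

An entry would be promoted only when `missing_items()` returns an empty list; a reviewer can use the returned strings directly as feedback.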

Display, GUI, And Input

Status: Uncovered.

User story: a user boots capOS on a desktop, laptop, or remote graphical session and uses multiple graphical apps with keyboard, pointer, clipboard, accessibility, and app isolation.

Current coverage: browser, browser/WASM, agent, GPU, and shell proposals point toward future visual sessions, but there is no native display server, compositor, input routing, window authority, clipboard, screenshot, or accessibility model.

Missing decisions:

  • Display ownership and framebuffer/GPU authority.
  • Compositor trust boundary and per-window capability model.
  • Keyboard, pointer, touch, IME, and focus authority.
  • Clipboard and drag/drop data-transfer policy.
  • Screen capture and remote desktop authority.
  • Accessibility-service authority and privacy boundaries.

Research needed:

  • Genode GUI/session routing and report-ROM style composition.
  • Wayland compositor security model and clipboard limitations.
  • Fuchsia Scenic/input pipeline if the native GUI track becomes near-term.
  • seL4/CapROS precedents for trusted path or secure attention, if applicable.

Promote when: native graphical sessions, browser UI, desktop app isolation, or rich web/agent interaction becomes a selected milestone.

Driver Framework And Hotplug

Status: Thin.

User story: an operator plugs in a device, capOS identifies it, starts or restarts the correct isolated driver, and exposes only the intended typed capabilities.

Current coverage: docs/dma-isolation-design.md, docs/backlog/hardware-boot-storage.md, networking, storage, cloud, and GPU proposals cover pieces of device work. There is no general driver framework for discovery, binding, isolation, recovery, firmware, or hotplug.

Missing decisions:

  • Device discovery authority and driver matching policy.
  • Driver process lifecycle, crash restart, and stale handle behavior.
  • Firmware loading and firmware provenance.
  • Hotplug attach/detach semantics.
  • Interrupt, MMIO, DMA, and power authority handoff.
  • User-space driver SDK boundaries and test harnesses.

Research needed:

  • Genode driver components and session routing.
  • Zircon/Fuchsia driver framework concepts.
  • Linux VFIO/uio and userspace-driver isolation tradeoffs.
  • seL4 device-driver partitioning examples.

Promote when: userspace NIC, block-device, USB, GPU, or real hardware bring-up requires reusable driver lifecycle rules.

Power, Suspend, Resume, And Thermal Policy

Status: Uncovered.

User story: a laptop or VM can sleep, wake, preserve sessions, and report power or thermal limits without leaking stale authority or corrupting timers.

Current coverage: tickless scheduling covers timer cleanup and idle mechanics, but not power management as an OS product area.

Missing decisions:

  • Suspend/resume authority and system-wide quiesce protocol.
  • Wake-source capabilities and audit.
  • Battery, charger, lid, and thermal sensor surfaces.
  • CPU frequency, C-state, and thermal-throttling policy.
  • Timer and network behavior across sleep.
  • Session and service liveness after resume.

Research needed:

  • ACPI power-state model and Linux suspend blockers/wakeup sources.
  • Fuchsia power framework if relevant.
  • Genode power-management patterns for component systems.

Promote when: laptop hardware, cloud hibernation, low-power idle, or interactive remote-shell reliability needs sleep/resume semantics.

Time, Clock, And Trusted Timestamp Services

Status: Thin.

User story: services can distinguish monotonic time, wall-clock time, and trusted audit time, and cannot silently forge system time.

Current coverage: scheduler and tickless proposals mention clocks, timers, deadlines, and clocksource/clockevent split. There is no user-facing time authority model.

Missing decisions:

  • Monotonic, boot, realtime, and coarse clock capability surfaces.
  • Who can set wall-clock time and how changes are audited.
  • NTP/PTP/cloud-metadata time synchronization authority.
  • Timezone and locale data ownership.
  • Leap-second and clock-step behavior.
  • Timestamp trust level carried into audit records.
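
To make the last item concrete, one possible shape for an audit timestamp carries a monotonic reading for ordering, a wall-clock reading for display, and an explicit trust label. This is a sketch under stated assumptions: `TimeTrust`, `AuditTimestamp`, and the three trust levels are hypothetical, and the host Python clock calls merely stand in for whatever clock capabilities capOS eventually defines.

```python
import time
from dataclasses import dataclass
from enum import Enum


class TimeTrust(Enum):
    """Hypothetical trust levels for a wall-clock reading."""
    UNSYNCED = "unsynced"   # wall clock not disciplined since boot
    SYNCED = "synced"       # disciplined by some time source
    ATTESTED = "attested"   # backed by a source the auditor trusts


@dataclass(frozen=True)
class AuditTimestamp:
    monotonic_ns: int   # ordering within one boot; does not step backward
    wall_ns: int        # human-facing time; may step or be wrong
    trust: TimeTrust    # how far the wall_ns value should be believed


def stamp(trust: TimeTrust) -> AuditTimestamp:
    """Take both readings together (host clocks used as stand-ins)."""
    return AuditTimestamp(time.monotonic_ns(), time.time_ns(), trust)


def happened_before(a: AuditTimestamp, b: AuditTimestamp) -> bool:
    """Order two records from the same boot by monotonic time only."""
    return a.monotonic_ns < b.monotonic_ns
```

Ordering on the monotonic field means a wall-clock step, whether from synchronization or from a privileged caller, cannot reorder audit records within a boot. Ordering across boots is a separate decision that this sketch does not address.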

Research needed:

  • Linux clock ids, adjtimex/NTP discipline, and time namespaces.
  • Fuchsia clock objects and UTC maintenance.
  • Cloud metadata time and attestation interactions.

Promote when: audit log completion, TLS certificate validation, distributed services, or durable storage needs trusted timestamp semantics.

Software Installation, Packages, And Rollback

Status: Thin.

User story: a user or operator installs an app, inspects requested authority, updates it, rolls it back, and removes its state without ambient filesystem assumptions.

Current coverage: repository composition, storage/naming, userspace binaries, live upgrade, cloud deployment, and public-release proposals cover adjacent pieces. There is no package/app distribution model.

Missing decisions:

  • Package manifest schema and authority-request review.
  • Signed repositories, update channels, and revocation.
  • Dependency resolution and build provenance.
  • App install/remove lifecycle and state ownership.
  • Rollback, staged rollout, and compatibility policy.
  • Vulnerability advisory and emergency update workflow.

Research needed:

  • Nix/Guix, OSTree, Flatpak portals, Android package permissions, and Fuchsia package/update system.
  • Supply-chain signing systems such as TUF/in-toto/Sigstore if this becomes release-critical.

Promote when: capOS needs installable demos, sibling repositories, public release, or cloud image update flow.

Crash Recovery, Supervision, And Diagnostics

Status: Thin.

User story: a service crashes; init or an authorized supervisor restarts it or enters a known degraded mode without leaking authority, hiding the cause, or looping forever.

Current coverage: service architecture already sketches SpawnRequest restart policy, supervisor-owned respawn, and always/on-failure restart modes; libcapos-service covers service lifecycle pieces; live-upgrade planning ties fault containment to supervisor respawn; and system monitoring covers logs/metrics/crash records at a high level. Crash-loop budgets, core/minidump capture, degraded-mode semantics, watchdog policy, and cleanup of stale capabilities and in-flight calls are still not owned as one recovery design.

Missing decisions:

  • Restart policy authority and failure budget.
  • Crash-loop backoff and operator override.
  • Core dump or minidump capture with capability redaction.
  • Watchdog and health-check capabilities.
  • Degraded boot and emergency shell semantics.
  • Stale capabilities and in-flight calls after service death.
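
None of these decisions has been made. Purely to illustrate what a failure budget combined with crash-loop backoff could look like, here is a minimal sketch. `RestartBudget`, its parameters, and the choice of exponential backoff are assumptions for discussion, not the policy in the service-architecture proposal.

```python
from collections import deque
from typing import Deque, Optional


class RestartBudget:
    """Sliding-window failure budget with capped exponential backoff."""

    def __init__(self, max_restarts: int, window_s: float,
                 base_delay_s: float, max_delay_s: float) -> None:
        self.max_restarts = max_restarts
        self.window_s = window_s
        self.base_delay_s = base_delay_s
        self.max_delay_s = max_delay_s
        self._crashes: Deque[float] = deque()  # crash times inside the window

    def on_crash(self, now_s: float) -> Optional[float]:
        """Record a crash at now_s.

        Returns the delay in seconds before the next restart, or None
        when the budget is exhausted and the supervisor should stop
        restarting (for example, by entering a degraded mode).
        """
        # Forget crashes that have aged out of the window.
        while self._crashes and now_s - self._crashes[0] > self.window_s:
            self._crashes.popleft()
        self._crashes.append(now_s)
        if len(self._crashes) > self.max_restarts:
            return None
        # Double the delay for each crash still inside the window.
        delay = self.base_delay_s * 2 ** (len(self._crashes) - 1)
        return min(delay, self.max_delay_s)


budget = RestartBudget(max_restarts=3, window_s=60.0,
                       base_delay_s=1.0, max_delay_s=10.0)
delays = [budget.on_crash(t) for t in (0.0, 5.0, 10.0, 15.0)]
# delays == [1.0, 2.0, 4.0, None]: the fourth crash exhausts the budget.
```

Passing the clock reading in as an argument keeps the policy testable without real waiting. Who may reset the budget (the operator-override question above) is deliberately left out.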

Research needed:

  • Erlang/OTP supervision trees, systemd restart policy, Kubernetes probes, and Fuchsia component lifecycle.
  • Capability-system precedent for crash propagation and service replacement.

Promote when: shared services, remote shell, storage, or agent workloads need production-grade recovery behavior.

Backup, Restore, Snapshots, And Migration

Status: Thin.

User story: an operator loses a disk or VM and restores users, services, keys, and app state while avoiding stale authority and accidental data disclosure.

Current coverage: storage/naming, cloud deployment, and the hardware/boot/storage backlog already cover narrower pieces: user-owned encrypted save transport, fake Drive/Firebase restore rejection tests, rollback/stale handling, and cloud-backed snapshot material. System-wide disaster recovery for users, services, keys, machine identity, and authority state is still not owned as one design.

Missing decisions:

  • Snapshot capability boundary and consistency protocol.
  • Encrypted export/import and restore identity.
  • Key recovery and disaster recovery drills.
  • Partial restore and per-service state ownership.
  • Backup retention, deletion, and privacy policy.
  • Migration between machines or cloud instances.

Research needed:

  • ZFS/Btrfs snapshot semantics, Borg/Restic encrypted backup models, and cloud snapshot/key-management practices.
  • Capability-specific concerns from EROS/CapROS persistence if applicable.

Promote when: writable storage, durable local accounts, volume encryption, or cloud deployment becomes near-term.

Human-Facing Administration And Explainability

Status: Thin.

User story: an operator can answer who has access to a service, why, since when, what will happen if access is revoked, and why a request was denied.

Current coverage: shell, system info, local users, system monitoring, configuration, and security proposals cover pieces. There is no unified administrator UX or policy explainability track.

Missing decisions:

  • Account and role management commands or UI.
  • Grant inspection, diff, revoke, and dry-run behavior.
  • Denial explanation format across kernel, broker, and services.
  • Audit search and incident timeline views.
  • Diagnostics bundle generation and redaction.
  • Safe repair workflow for broken configuration or policy.

Research needed:

  • Kubernetes RBAC can-i/audit practices.
  • Cloud IAM policy simulators and access-analyzer tools.
  • Genode configuration/reporting UX for component graphs.

Promote when: local users, ABAC/MAC, remote shell, or operator configuration needs day-2 administration rather than proof-only commands.

Developer Debugging, Profiling, And Tooling

Status: Thin.

User story: a developer writes a capOS service, runs it locally, debugs a failed capability call, profiles it, and ships it with reproducible evidence.

Current coverage: harness engineering, benchmarks, generated-code checks, run-targets, and the paper evidence track cover pieces. There is no full debugger/profiler/developer-tooling proposal.

Missing decisions:

  • Debug authority and process attach policy.
  • Symbols, stack traces, crash dumps, and source maps.
  • Ring/syscall/capability-call tracing.
  • Service schema explorer and request replay tooling.
  • Guest profiling, flamegraph, and benchmark attribution workflow.
  • App/service templates and local developer SDK.

Research needed:

  • GDB remote protocol, Linux perf/eBPF-style tracing boundaries, Fuchsia diagnostics, and seL4 debug authority practices.

Promote when: non-trivial third-party services, public release, or performance claims need repeatable developer workflows.

Compatibility And App Porting Strategy

Status: Thin.

User story: a developer ports a small existing CLI or server to capOS and knows which Unix assumptions work, fail, or require explicit capability adapters.

Current coverage: userspace binaries, Go, Lua, POSIX adapters, WASI, C/C++, and language-runtime proposals mention porting targets. There is no concrete compatibility profile matrix.

Missing decisions:

  • Minimal libc/POSIX surface and unsupported-call policy.
  • Filesystem, environment, argv, signal, pipe, socket, and process semantics.
  • Dynamic linking and shared-library policy.
  • WASI adapter authority model.
  • Build recipes and package corpus selection.
  • Porting report template and acceptance tests.

Research needed:

  • WASI preview models, CloudABI history, Redox, Hermit, Fuchsia POSIX layer, and Genode libc/VFS integration.

Promote when: a language runtime, POSIX adapter, or real application corpus becomes a selected milestone.

Accessibility And Internationalization

Status: Uncovered.

User story: non-English users and assistive-technology users can operate capOS shells, graphical sessions, and web/agent surfaces without privileged workarounds.

Current coverage: none beyond general shell/browser surface discussions.

Missing decisions:

  • Unicode, locale, collation, and timezone data ownership.
  • Input methods and keyboard layout authority.
  • Screen reader or accessibility tree service boundary.
  • High-contrast, font scaling, and reduced-motion policy.
  • Translation and message-catalog strategy.
  • Accessible denial/audit messages and setup flow.

Research needed:

  • Web accessibility platform architecture, Wayland accessibility status, Fuchsia accessibility manager, and terminal accessibility conventions.

Promote when: graphical sessions, public demos, web shell, or production interactive setup becomes user-facing beyond developer/operator proof flows.

Fleet Operations And Remote Management

Status: Thin.

User story: an operator manages many capOS nodes and can prove which version, policy, keys, services, and update state each node is running.

Current coverage: cloud deployment, cloud metadata, system monitoring, configuration, hosted agents, and public release cover adjacent concerns. There is no fleet-management design.

Missing decisions:

  • Node enrollment and identity bootstrap.
  • Remote attestation and inventory reporting.
  • Configuration rollout and drift detection.
  • Remote logs, metrics, and audit aggregation.
  • Staged update and rollback policy.
  • Break-glass access and emergency revocation.

Research needed:

  • Kubernetes node/bootstrap models, cloud instance identity, SPIFFE/SPIRE, TPM/measured boot attestation, and OSQuery-style inventory.

Promote when: cloud deployment, hosted agent swarms, public release, or remote administration becomes more than a single-node proof.

Privacy And Data Governance

Status: Thin.

User story: a user can see and revoke what data a service can access, and deleted data does not unintentionally persist in logs, backups, or derived indexes.

Current coverage: capability authority, session privacy, audit redaction, identity policy, storage, monitoring, and browser/agent proposals cover parts of the problem. There is no explicit data-governance design.

Missing decisions:

  • Data classification and purpose-bound access metadata.
  • Retention, deletion, and legal-hold semantics.
  • Derived data, indexes, caches, and backup deletion behavior.
  • User consent and service data export.
  • Audit redaction versus forensic retention.
  • Cross-service data-sharing policy and review UX.

Research needed:

  • Object-capability privacy patterns, GDPR-style data lifecycle controls, browser permission UX, and cloud DLP/data catalog practices.

Promote when: persistent user data, browser/agent activity, hosted services, or public release introduces real privacy expectations.