Research And Design Gaps Backlog

This file tracks important OS design, development, and user-story areas that are absent, thinly covered, or only indirectly owned by existing capOS proposals. It is a triage register, not an execution queue. Listing a gap here does not change the selected milestone (the loopyard project setting selected_milestone) and does not mean the project should immediately create a full proposal.

Promote an entry out of this file only when a visible milestone, paper evidence gap, review finding, or explicit user direction makes the area actionable. Promotion targets:

docs/research/ for prior-art survey or external precedent.
docs/proposals/ for a concrete reviewed design direction.
A focused docs/backlog/ file when the design is accepted enough to decompose implementation.
docs/design-risks-register.md when the gap is an active architectural risk with an owner.

Status Vocabulary

Uncovered: no owned design exists yet.
Thin: mentioned indirectly, but no coherent owner or decision record.
Backlog-only: task decomposition exists without a full proposal.
Research-needed: design should not start before prior-art review.
Ready-for-proposal: enough constraints exist to draft a proposal.
Deferred: intentionally future work, not a near-term blocker.
Rejected: considered and explicitly not pursued.
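
The vocabulary above can be treated as a closed set when checking entries mechanically. The following is a minimal sketch under that assumption; `GapStatus` and `parse_status_line` are hypothetical names, not existing capOS tooling.

```python
from enum import Enum
from typing import Optional


class GapStatus(Enum):
    """Status terms copied from the vocabulary above (class name is hypothetical)."""
    UNCOVERED = "Uncovered"
    THIN = "Thin"
    BACKLOG_ONLY = "Backlog-only"
    RESEARCH_NEEDED = "Research-needed"
    READY_FOR_PROPOSAL = "Ready-for-proposal"
    DEFERRED = "Deferred"
    REJECTED = "Rejected"


def parse_status_line(line: str) -> Optional[GapStatus]:
    """Return the vocabulary term on a 'Status: X.' line, or None if there is none."""
    prefix = "Status:"
    if not line.startswith(prefix):
        return None
    # Drop the prefix, surrounding whitespace, and the trailing period.
    value = line[len(prefix):].strip().rstrip(".")
    for status in GapStatus:
        if status.value == value:
            return status
    return None
```

A status line that carries a longer note instead of a bare vocabulary term returns None in this sketch, so a caller would need to handle such lines separately.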

Promotion Checklist

Before creating a proposal from an entry here:

Identify the visible user or operator outcome the work would enable.
List existing capOS docs that already partially cover the area.
List the docs/research/ files actually read, or explain why no research file applies.
Decide the first capability boundary or trust boundary that must be designed.
Define one QEMU, host-test, documentation, or review gate that would prove the proposal made progress.
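
The checklist above could be captured as a small record so that unanswered items are easy to spot. This is only a sketch: the field names and `missing_items` are hypothetical, and treating an empty list of existing docs as unanswered is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PromotionChecklist:
    """One field per checklist item above; all names are hypothetical."""
    outcome: str = ""                  # visible user or operator outcome
    existing_docs: List[str] = field(default_factory=list)
    research_read: List[str] = field(default_factory=list)
    no_research_reason: str = ""       # used when no research file applies
    first_boundary: str = ""           # first capability or trust boundary
    progress_gate: str = ""            # QEMU, host-test, documentation, or review gate


def missing_items(c: PromotionChecklist) -> List[str]:
    """Return the names of checklist items that are still unanswered."""
    missing = []
    if not c.outcome.strip():
        missing.append("outcome")
    if not c.existing_docs:            # assumption: an empty list means unanswered
        missing.append("existing_docs")
    # Either research files were read, or a reason explains why none applies.
    if not c.research_read and not c.no_research_reason.strip():
        missing.append("research")
    if not c.first_boundary.strip():
        missing.append("first_boundary")
    if not c.progress_gate.strip():
        missing.append("progress_gate")
    return missing
```

An entry would be ready to promote under this sketch only when `missing_items` returns an empty list.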

Display, GUI, And Input

Status: Uncovered.

User story: a user boots capOS on a desktop, laptop, or remote graphical session and uses multiple graphical apps with keyboard, pointer, clipboard, accessibility, and app isolation.

Current coverage: browser, browser/WASM, agent, GPU, and shell proposals point toward future visual sessions, but there is no native display server, compositor, input routing, window authority, clipboard, screenshot, or accessibility model.

Missing decisions:

Display ownership and framebuffer/GPU authority.
Compositor trust boundary and per-window capability model.
Keyboard, pointer, touch, IME, and focus authority.
Clipboard and drag/drop data-transfer policy.
Screen capture and remote desktop authority.
Accessibility-service authority and privacy boundaries.

Research needed:

Genode GUI/session routing and report-ROM style composition.
Wayland compositor security model and clipboard limitations.
Fuchsia Scenic/input pipeline if the native GUI track becomes near-term.
seL4/CapROS precedents for trusted path or secure attention, if applicable.

Promote when: native graphical sessions, browser UI, desktop app isolation, or rich web/agent interaction becomes a selected milestone.

Driver Framework And Hotplug

Status: Thin.

User story: an operator plugs in a device, capOS identifies it, starts or restarts the correct isolated driver, and exposes only the intended typed capabilities.

Current coverage: docs/dma-isolation-design.md, docs/backlog/hardware-boot-storage.md, networking, storage, cloud, and GPU proposals cover pieces of device work. There is no general driver framework for discovery, binding, isolation, recovery, firmware, or hotplug.

Missing decisions:

Device discovery authority and driver matching policy.
Driver process lifecycle, crash restart, and stale handle behavior.
Firmware loading and firmware provenance.
Hotplug attach/detach semantics.
Interrupt, MMIO, DMA, and power authority handoff.
User-space driver SDK boundaries and test harnesses.

Research needed:

Genode driver components and session routing.
Zircon/Fuchsia driver framework concepts.
Linux VFIO/uio and userspace-driver isolation tradeoffs.
seL4 device-driver partitioning examples.

Promote when: userspace NIC, block-device, USB, GPU, or real hardware bring-up requires reusable driver lifecycle rules.

Power, Suspend, Resume, And Thermal Policy

Status: Uncovered.

User story: a laptop or VM can sleep, wake, preserve sessions, and report power or thermal limits without leaking stale authority or corrupting timers.

Current coverage: tickless scheduling covers timer cleanup and idle mechanics, but not power management as an OS product area.

Missing decisions:

Suspend/resume authority and system-wide quiesce protocol.
Wake-source capabilities and audit.
Battery, charger, lid, and thermal sensor surfaces.
CPU frequency, C-state, and thermal-throttling policy.
Timer and network behavior across sleep.
Session and service liveness after resume.

Research needed:

ACPI power-state model and Linux suspend blockers/wakeup sources.
Fuchsia power framework if relevant.
Genode power-management patterns for component systems.

Promote when: laptop hardware, cloud hibernation, low-power idle, or interactive remote-shell reliability needs sleep/resume semantics.

Time, Clock, And Trusted Timestamp Services

Status: Promoted to proposal (2026-05-22). See Time and Clock Authority and the prior-art note Time and Clock Authority research. Residual research (servo/loop-filter, holdover/error-bound, suspend recovery) is noted in that proposal. Original gap status was Thin.

User story: services can distinguish monotonic time, wall-clock time, and trusted audit time, and cannot silently forge system time.

Current coverage: scheduler and tickless proposals mention clocks, timers, deadlines, and clocksource/clockevent split. There is no user-facing time authority model.

Missing decisions:

Monotonic, boot, realtime, and coarse clock capability surfaces.
Who can set wall-clock time and how changes are audited.
NTP/PTP/cloud-metadata time synchronization authority.
Timezone and locale data ownership.
Leap-second and clock-step behavior.
Timestamp trust level carried into audit records.

Research needed:

Linux clock ids, adjtimex/NTP discipline, and time namespaces.
Fuchsia clock objects and UTC maintenance.
Cloud metadata time and attestation interactions.

Promote when: audit log completion, TLS certificate validation, distributed services, or durable storage needs trusted timestamp semantics.
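
The distinction between monotonic and wall-clock time in the user story can be illustrated on an ordinary host with the Python standard library. This is a host-side sketch only, not a capOS interface; the function names are hypothetical, and trusted audit time has no standard-library counterpart, so it is left out.

```python
import time


def elapsed_seconds(work) -> float:
    """Measure a duration with the monotonic clock, which never goes backwards."""
    start = time.monotonic()
    work()
    return time.monotonic() - start


def wall_clock_stamp() -> float:
    """Seconds since the Unix epoch; this value can jump if system time is changed."""
    return time.time()
```

Durations and deadlines belong on the monotonic clock because a wall-clock step would otherwise distort them, while human-readable timestamps need the wall clock and inherit its adjustability.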

Software Installation, Packages, And Rollback

Status: Thin.

User story: a user or operator installs an app, inspects requested authority, updates it, rolls it back, and removes its state without ambient filesystem assumptions.

Current coverage: repository composition, storage/naming, userspace binaries, live upgrade, cloud deployment, and public-release proposals cover adjacent pieces. There is no package/app distribution model.

Missing decisions:

Package manifest schema and authority-request review.
Signed repositories, update channels, and revocation.
Dependency resolution and build provenance.
App install/remove lifecycle and state ownership.
Rollback, staged rollout, and compatibility policy.
Vulnerability advisory and emergency update workflow.

Research needed:

Nix/Guix, OSTree, Flatpak portals, Android package permissions, and Fuchsia package/update system.
Supply-chain signing systems such as TUF/in-toto/Sigstore if this becomes release-critical.

Promote when: capOS needs installable demos, sibling repositories, public release, or cloud image update flow.

Crash Recovery, Supervision, And Diagnostics

Status: Promoted to proposal (2026-05-22). See Crash Recovery and Supervision and the prior-art note Crash Recovery and Supervision research. Residual research (Fuchsia component-manager escrow semantics) is noted in that proposal. Original gap status was Thin.

User story: a service crashes; init or an authorized supervisor restarts it or enters a known degraded mode without leaking authority, hiding the cause, or looping forever.

Current coverage: service architecture already sketches SpawnRequest restart policy, supervisor-owned respawn, and always/on-failure restart modes; capos-service covers service lifecycle pieces; live-upgrade planning ties fault containment to supervisor respawn; and system monitoring covers logs/metrics/crash records at a high level. Crash-loop budgets, core/minidump capture, degraded-mode semantics, watchdog policy, and stale/in-flight cleanup are still not owned as one recovery design.

Missing decisions:

Restart policy authority and failure budget.
Crash-loop backoff and operator override.
Core dump or minidump capture with capability redaction.
Watchdog and health-check capabilities.
Degraded boot and emergency shell semantics.
Stale capabilities and in-flight calls after service death.

Research needed:

Erlang/OTP supervision trees, systemd restart policy, Kubernetes probes, and Fuchsia component lifecycle.
Capability-system precedent for crash propagation and service replacement.

Promote when: shared services, remote shell, storage, or agent workloads need production-grade recovery behavior.
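
One possible shape for the failure budget and crash-loop backoff listed under the missing decisions is a sliding-window counter with capped exponential delay. The sketch below is illustrative only, not a capOS decision; `RestartBudget`, its fields, and the default values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RestartBudget:
    """Hypothetical budget: at most max_restarts crashes within window_s seconds."""
    max_restarts: int = 3
    window_s: float = 60.0
    base_delay_s: float = 1.0
    max_delay_s: float = 30.0
    _crashes: List[float] = field(default_factory=list)

    def on_crash(self, now_s: float) -> Optional[float]:
        """Record a crash at now_s; return the restart delay, or None if the budget is spent."""
        # Keep only crashes that still fall inside the sliding window.
        self._crashes = [t for t in self._crashes if now_s - t < self.window_s]
        self._crashes.append(now_s)
        if len(self._crashes) > self.max_restarts:
            return None  # stop restarting; the caller decides how to escalate
        # Exponential backoff: base, 2x base, 4x base, ... capped at max_delay_s.
        delay = self.base_delay_s * 2 ** (len(self._crashes) - 1)
        return min(delay, self.max_delay_s)
```

The caller is assumed to pass `now_s` from a monotonic clock so that wall-clock changes cannot reset or exhaust the budget. What happens after `None` (degraded mode, operator override) is exactly the kind of policy this entry leaves open.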

Backup, Restore, Snapshots, And Migration

Status: Thin.

User story: an operator loses a disk or VM and restores users, services, keys, and app state while avoiding stale authority and accidental data disclosure.

Current coverage: storage/naming, cloud deployment, and the hardware/boot/storage backlog already cover narrower pieces: user-owned encrypted save transport, fake Drive/Firebase restore rejection tests, rollback/stale handling, and cloud-backed snapshot material. System-wide disaster recovery for users, services, keys, machine identity, and authority state is still not owned as one design.

Missing decisions:

Snapshot capability boundary and consistency protocol.
Encrypted export/import and restore identity.
Key recovery and disaster recovery drills.
Partial restore and per-service state ownership.
Backup retention, deletion, and privacy policy.
Migration between machines or cloud instances.

Research needed:

ZFS/Btrfs snapshot semantics, Borg/Restic encrypted backup models, and cloud snapshot/key-management practices.
Capability-specific concerns from EROS/CapROS persistence if applicable.

Promote when: writable storage, durable local accounts, volume encryption, or cloud deployment becomes near-term.

Human-Facing Administration And Explainability

Status: Thin.

User story: an operator can answer who has access to a service, why, since when, what will happen if access is revoked, and why a request was denied.

Current coverage: shell, system info, local users, system monitoring, configuration, and security proposals cover pieces. There is no unified administrator UX or policy explainability track.

Missing decisions:

Account and role management commands or UI.
Grant inspection, diff, revoke, and dry-run behavior.
Denial explanation format across kernel, broker, and services.
Audit search and incident timeline views.
Diagnostics bundle generation and redaction.
Safe repair workflow for broken configuration or policy.

Research needed:

Kubernetes RBAC can-i/audit practices.
Cloud IAM policy simulators and access-analyzer tools.
Genode configuration/reporting UX for component graphs.

Promote when: local users, ABAC/MAC, remote shell, or operator configuration needs day-2 administration rather than proof-only commands.

Developer Debugging, Profiling, And Tooling

Status: Partially promoted to proposal (2026-05-22). The debug/trace/profile authority slice is now Debug and Trace Authority with the prior-art note Debug, Trace, and Profiling Authority research. The broader developer-tooling surface (service templates, local SDK, schema explorer, request-replay) remains Thin and is not yet owned by a proposal.

User story: a developer writes a capOS service, runs it locally, debugs a failed capability call, profiles it, and ships it with reproducible evidence.

Current coverage: harness engineering, benchmarks, generated-code checks, run-targets, and the paper evidence track cover pieces. There is no full debugger/profiler/developer-tooling proposal.

Missing decisions:

Debug authority and process attach policy.
Symbols, stack traces, crash dumps, and source maps.
Ring/syscall/capability-call tracing.
Service schema explorer and request replay tooling.
Guest profiling, flamegraph, and benchmark attribution workflow.
App/service templates and local developer SDK.

Research needed:

GDB remote protocol, Linux perf/eBPF-style tracing boundaries, Fuchsia diagnostics, and seL4 debug authority practices.

Promote when: non-trivial third-party services, public release, or performance claims need repeatable developer workflows.

Compatibility And App Porting Strategy

Status: Thin.

User story: a developer ports a small existing CLI or server to capOS and knows which Unix assumptions work, fail, or require explicit capability adapters.

Current coverage: userspace binaries, Go, Lua, POSIX adapters, WASI, C/C++, and language-runtime proposals mention porting targets. There is no concrete compatibility profile matrix.

Missing decisions:

Minimal libc/POSIX surface and unsupported-call policy.
Filesystem, environment, argv, signal, pipe, socket, and process semantics.
Dynamic linking and shared-library policy.
WASI adapter authority model.
Build recipes and package corpus selection.
Porting report template and acceptance tests.

Research needed:

WASI preview models, CloudABI history, Redox, Hermit, Fuchsia POSIX layer, and Genode libc/VFS integration.

Promote when: a language runtime, POSIX adapter, or real application corpus becomes a selected milestone.

Accessibility And Internationalization

Status: Uncovered.

User story: non-English users and assistive-technology users can operate capOS shells, graphical sessions, and web/agent surfaces without privileged workarounds.

Current coverage: none beyond general shell/browser surface discussions.

Missing decisions:

Unicode, locale, collation, and timezone data ownership.
Input methods and keyboard layout authority.
Screen reader or accessibility tree service boundary.
High-contrast, font scaling, and reduced-motion policy.
Translation and message-catalog strategy.
Accessible denial/audit messages and setup flow.

Research needed:

Web accessibility platform architecture, Wayland accessibility status, Fuchsia accessibility manager, and terminal accessibility conventions.

Promote when: graphical sessions, public demos, web shell, or production interactive setup becomes user-facing beyond developer/operator proof flows.

Fleet Operations And Remote Management

Status: Thin.

User story: an operator manages many capOS nodes and can prove which version, policy, keys, services, and update state each node is running.

Current coverage: cloud deployment, cloud metadata, system monitoring, configuration, hosted agents, and public release cover adjacent concerns. There is no fleet-management design.

Missing decisions:

Node enrollment and identity bootstrap.
Remote attestation and inventory reporting.
Configuration rollout and drift detection.
Remote logs, metrics, and audit aggregation.
Staged update and rollback policy.
Break-glass access and emergency revocation.

Research needed:

Kubernetes node/bootstrap models, cloud instance identity, SPIFFE/SPIRE, TPM/measured boot attestation, and OSQuery-style inventory.

Promote when: cloud deployment, hosted agent swarms, public release, or remote administration becomes more than a single-node proof.

Privacy And Data Governance

Status: Thin.

User story: a user can see and revoke what data a service can access, and deleted data does not unintentionally persist in logs, backups, or derived indexes.

Current coverage: capability authority, session privacy, audit redaction, identity policy, storage, monitoring, and browser/agent proposals cover parts of the problem. There is no explicit data-governance design.

Missing decisions:

Data classification and purpose-bound access metadata.
Retention, deletion, and legal-hold semantics.
Derived data, indexes, caches, and backup deletion behavior.
User consent and service data export.
Audit redaction versus forensic retention.
Cross-service data-sharing policy and review UX.

Research needed:

Object-capability privacy patterns, GDPR-style data lifecycle controls, browser permission UX, and cloud DLP/data catalog practices.

Promote when: persistent user data, browser/agent activity, hosted services, or public release introduces real privacy expectations.
