Research And Design Gaps Backlog
This file tracks important OS design, development, and user-story areas that
are absent, thinly covered, or only indirectly owned by existing capOS
proposals. It is a triage register, not an execution queue. Listing a gap here
does not change the selected milestone in WORKPLAN.md and does not mean the
project should immediately create a full proposal.
Promote an entry out of this file only when a visible milestone, paper evidence gap, review finding, or explicit user direction makes the area actionable. Promotion targets:
- docs/research/ for a prior-art survey or external precedent.
- docs/proposals/ for a concrete, reviewed design direction.
- A focused docs/backlog/ file when the design is accepted enough to decompose into implementation tasks.
- docs/design-risks-register.md when the gap is an active architectural risk with an owner.
Status Vocabulary
- Uncovered: no owned design exists yet.
- Thin: mentioned indirectly, but no coherent owner or decision record.
- Backlog-only: task decomposition exists without a full proposal.
- Research-needed: design should not start before prior-art review.
- Ready-for-proposal: enough constraints exist to draft a proposal.
- Deferred: intentionally future work, not a near-term blocker.
- Rejected: considered and explicitly not pursued.
Promotion Checklist
Before creating a proposal from an entry here:
- Identify the visible user or operator outcome the work would enable.
- List existing capOS docs that already partially cover the area.
- List the docs/research/ files actually read, or explain why no research file applies.
- Decide the first capability boundary or trust boundary that must be designed.
- Define one QEMU, host-test, documentation, or review gate that would prove the proposal made progress.
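The checklist can also be kept machine-checkable. A minimal sketch, assuming a hypothetical PromotionChecklist record per entry (not an existing capOS tool); None marks an unanswered item, and Some(vec![]) for existing docs means "checked, nothing applies":

```rust
/// Hypothetical record of the promotion checklist for one backlog entry.
#[derive(Debug, Default)]
pub struct PromotionChecklist {
    pub user_outcome: Option<String>,
    pub existing_docs: Option<Vec<String>>,
    pub research_read: Vec<String>,
    pub no_research_reason: Option<String>,
    pub first_boundary: Option<String>,
    pub progress_gate: Option<String>,
}

impl PromotionChecklist {
    /// List the checklist items that still block drafting a proposal.
    pub fn missing(&self) -> Vec<&'static str> {
        let mut missing = Vec::new();
        if self.user_outcome.is_none() {
            missing.push("visible user or operator outcome");
        }
        if self.existing_docs.is_none() {
            missing.push("existing capOS docs with partial coverage");
        }
        // Either some research was read, or there is a stated reason none applies.
        if self.research_read.is_empty() && self.no_research_reason.is_none() {
            missing.push("research files read, or why none apply");
        }
        if self.first_boundary.is_none() {
            missing.push("first capability or trust boundary");
        }
        if self.progress_gate.is_none() {
            missing.push("QEMU, host-test, documentation, or review gate");
        }
        missing
    }
}
```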
Display, GUI, And Input
Status: Uncovered.
User story: a user boots capOS on a desktop, laptop, or remote graphical session and uses multiple graphical apps with keyboard, pointer, clipboard, accessibility, and app isolation.
Current coverage: browser, browser/WASM, agent, GPU, and shell proposals point toward future visual sessions, but there is no native display server, compositor, input routing, window authority, clipboard, screenshot, or accessibility model.
Missing decisions:
- Display ownership and framebuffer/GPU authority.
- Compositor trust boundary and per-window capability model.
- Keyboard, pointer, touch, IME, and focus authority.
- Clipboard and drag/drop data-transfer policy.
- Screen capture and remote desktop authority.
- Accessibility-service authority and privacy boundaries.
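To make "per-window capability model" and "focus authority" concrete, here is a minimal sketch of one possible policy, assuming hypothetical Compositor, ClientId, and WindowId types: focus changes only on user input the compositor observed itself, and key events reach only the focused window's owner. It is not a proposed design and ignores IME, pointer grabs, and secure attention.

```rust
use std::collections::HashMap;

pub type ClientId = u32;
pub type WindowId = u32;

/// Hypothetical compositor state: which client owns each window and which
/// window currently has keyboard focus.
#[derive(Default)]
pub struct Compositor {
    owners: HashMap<WindowId, ClientId>,
    focused: Option<WindowId>,
}

impl Compositor {
    pub fn create_window(&mut self, client: ClientId, window: WindowId) {
        self.owners.insert(window, client);
    }

    /// Focus moves only on a user action the compositor itself observed
    /// (e.g. a pointer click), never on a client's own request.
    pub fn user_clicked(&mut self, window: WindowId) {
        if self.owners.contains_key(&window) {
            self.focused = Some(window);
        }
    }

    /// Key events go only to the owner of the focused window, so an
    /// unfocused client cannot read keystrokes meant for another app.
    pub fn route_key(&self) -> Option<ClientId> {
        self.focused.and_then(|w| self.owners.get(&w).copied())
    }

    /// Destroying the focused window drops focus instead of handing it on.
    pub fn destroy_window(&mut self, window: WindowId) {
        self.owners.remove(&window);
        if self.focused == Some(window) {
            self.focused = None;
        }
    }
}
```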
Research needed:
- Genode GUI/session routing and report-ROM style composition.
- Wayland compositor security model and clipboard limitations.
- Fuchsia Scenic/input pipeline if the native GUI track becomes near-term.
- seL4/CapROS precedents for trusted path or secure attention, if applicable.
Promote when: native graphical sessions, browser UI, desktop app isolation, or rich web/agent interaction becomes a selected milestone.
Driver Framework And Hotplug
Status: Thin.
User story: an operator plugs in a device, capOS identifies it, starts or restarts the correct isolated driver, and exposes only the intended typed capabilities.
Current coverage: docs/dma-isolation-design.md,
docs/backlog/hardware-boot-storage.md, networking, storage, cloud, and GPU
proposals cover pieces of device work. There is no general driver framework
for discovery, binding, isolation, recovery, firmware, or hotplug.
Missing decisions:
- Device discovery authority and driver matching policy.
- Driver process lifecycle, crash restart, and stale handle behavior.
- Firmware loading and firmware provenance.
- Hotplug attach/detach semantics.
- Interrupt, MMIO, DMA, and power authority handoff.
- User-space driver SDK boundaries and test harnesses.
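Driver matching policy is one of the smaller open decisions. A minimal sketch, assuming PCI-style vendor/device/class IDs and hypothetical DriverManifest and MatchRule types: an exact ID match beats a class match, and ties resolve to the earlier manifest so binding is deterministic.

```rust
/// Hypothetical identity reported by a bus enumerator (PCI-style IDs).
#[derive(Debug, Clone, Copy)]
pub struct DeviceId {
    pub vendor: u16,
    pub device: u16,
    pub class: u8,
}

/// What a driver manifest claims it can bind to.
#[derive(Debug, Clone, Copy)]
pub enum MatchRule {
    Exact { vendor: u16, device: u16 },
    Class(u8),
}

pub struct DriverManifest {
    pub name: &'static str,
    pub rules: Vec<MatchRule>,
}

/// Score one rule: exact IDs beat a class match; None means no match.
fn score(rule: &MatchRule, dev: &DeviceId) -> Option<u8> {
    match *rule {
        MatchRule::Exact { vendor, device } if vendor == dev.vendor && device == dev.device => Some(2),
        MatchRule::Class(c) if c == dev.class => Some(1),
        _ => None,
    }
}

/// Pick the most specific driver; on a tie the earlier manifest wins.
pub fn select_driver<'a>(drivers: &'a [DriverManifest], dev: &DeviceId) -> Option<&'a DriverManifest> {
    let mut best: Option<(u8, &'a DriverManifest)> = None;
    for d in drivers {
        if let Some(s) = d.rules.iter().filter_map(|r| score(r, dev)).max() {
            if best.map_or(true, |(b, _)| s > b) {
                best = Some((s, d));
            }
        }
    }
    best.map(|(_, d)| d)
}
```

The numeric IDs used to exercise this would be arbitrary example values; the point is only that a more specific driver claim displaces a generic one.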
Research needed:
- Genode driver components and session routing.
- Zircon/Fuchsia driver framework concepts.
- Linux VFIO/uio and userspace-driver isolation tradeoffs.
- seL4 device-driver partitioning examples.
Promote when: userspace NIC, block-device, USB, GPU, or real hardware bring-up requires reusable driver lifecycle rules.
Power, Suspend, Resume, And Thermal Policy
Status: Uncovered.
User story: a laptop or VM can sleep, wake, preserve sessions, and report power or thermal limits without leaking stale authority or corrupting timers.
Current coverage: tickless scheduling covers timer cleanup and idle mechanics, but not power management as an OS product area.
Missing decisions:
- Suspend/resume authority and system-wide quiesce protocol.
- Wake-source capabilities and audit.
- Battery, charger, lid, and thermal sensor surfaces.
- CPU frequency, C-state, and thermal-throttling policy.
- Timer and network behavior across sleep.
- Session and service liveness after resume.
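A system-wide quiesce protocol could take roughly this shape. A minimal sketch, assuming a hypothetical Quiesce trait that every suspend-aware service implements: services are asked in order, any veto aborts suspend, and already-quiesced services are resumed in reverse order. Wake sources, timeouts for unresponsive services, and driver ordering are left out.

```rust
/// Hypothetical reply from a service asked to prepare for suspend.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuiesceReply {
    Ready,
    Veto,
}

pub trait Quiesce {
    fn name(&self) -> &str;
    fn quiesce(&mut self) -> QuiesceReply;
    fn resume(&mut self);
}

/// Quiesce services in order. On a veto, resume the ones already
/// quiesced in reverse order and report which service blocked suspend.
pub fn try_suspend<T: Quiesce>(services: &mut [T]) -> Result<(), String> {
    for i in 0..services.len() {
        if services[i].quiesce() == QuiesceReply::Veto {
            let blocker = services[i].name().to_string();
            for s in services[..i].iter_mut().rev() {
                s.resume();
            }
            return Err(blocker);
        }
    }
    Ok(())
}

/// Toy service used to exercise the protocol.
pub struct ToyService {
    pub name: &'static str,
    pub veto: bool,
    pub quiesced: bool,
}

impl Quiesce for ToyService {
    fn name(&self) -> &str {
        self.name
    }
    fn quiesce(&mut self) -> QuiesceReply {
        if self.veto {
            QuiesceReply::Veto
        } else {
            self.quiesced = true;
            QuiesceReply::Ready
        }
    }
    fn resume(&mut self) {
        self.quiesced = false;
    }
}
```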
Research needed:
- ACPI power-state model, Android suspend blockers, and Linux wakeup sources.
- Fuchsia power framework if relevant.
- Genode power-management patterns for component systems.
Promote when: laptop hardware, cloud hibernation, low-power idle, or interactive remote-shell reliability needs sleep/resume semantics.
Time, Clock, And Trusted Timestamp Services
Status: Thin.
User story: services can distinguish monotonic time, wall-clock time, and trusted audit time, and cannot silently forge system time.
Current coverage: scheduler and tickless proposals mention clocks, timers, deadlines, and clocksource/clockevent split. There is no user-facing time authority model.
Missing decisions:
- Monotonic, boot, realtime, and coarse clock capability surfaces.
- Who can set wall-clock time and how changes are audited.
- NTP/PTP/cloud-metadata time synchronization authority.
- Timezone and locale data ownership.
- Leap-second and clock-step behavior.
- Timestamp trust level carried into audit records.
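One way to keep monotonic time, wall-clock time, and trust level separate is sketched below, assuming hypothetical WallClock, SetTimeCap, and TimeTrust types. Monotonic time comes from std::time::Instant and is never adjusted; the wall clock is an offset that only a SetTimeCap holder can step, and each step is appended to an audit log. The private field on SetTimeCap is what prevents code outside the clock module from forging one.

```rust
mod clock {
    use std::time::{Duration, Instant};

    /// How far an audit reader should trust the wall-clock part of a stamp.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum TimeTrust {
        Unsynchronized,
        NetworkSynced,
        Attested,
    }

    #[derive(Debug, Clone, Copy)]
    pub struct AuditTimestamp {
        pub since_boot: Duration, // monotonic, orders events within one boot
        pub wall: Duration,       // since the Unix epoch, only as good as `trust`
        pub trust: TimeTrust,
    }

    /// Authority to step the wall clock; the private field makes it unforgeable here.
    pub struct SetTimeCap(());

    pub struct WallClock {
        boot: Instant,
        wall_at_boot: Duration,
        trust: TimeTrust,
        audit: Vec<String>,
    }

    impl WallClock {
        /// Returns the clock and its single SetTimeCap, which boot code would
        /// hand only to the time-sync service.
        pub fn new(initial_wall: Duration) -> (WallClock, SetTimeCap) {
            let clock = WallClock {
                boot: Instant::now(),
                wall_at_boot: initial_wall,
                trust: TimeTrust::Unsynchronized,
                audit: Vec::new(),
            };
            (clock, SetTimeCap(()))
        }

        pub fn stamp(&self) -> AuditTimestamp {
            let since_boot = self.boot.elapsed();
            AuditTimestamp { since_boot, wall: self.wall_at_boot + since_boot, trust: self.trust }
        }

        /// Step the wall clock. Monotonic time is untouched and the step is audited.
        pub fn set(&mut self, _cap: &SetTimeCap, new_wall: Duration, trust: TimeTrust) {
            let before = self.stamp();
            self.wall_at_boot = new_wall.saturating_sub(before.since_boot);
            self.trust = trust;
            self.audit.push(format!(
                "wall clock stepped {:?} -> {:?} at +{:?} ({:?})",
                before.wall, new_wall, before.since_boot, trust
            ));
        }

        pub fn audit_log(&self) -> &[String] {
            &self.audit
        }
    }
}

pub use clock::{AuditTimestamp, SetTimeCap, TimeTrust, WallClock};
```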
Research needed:
- Linux clock ids, adjtimex/NTP discipline, and time namespaces.
- Fuchsia clock objects and UTC maintenance.
- Cloud metadata time and attestation interactions.
Promote when: audit log completion, TLS certificate validation, distributed services, or durable storage needs trusted timestamp semantics.
Software Installation, Packages, And Rollback
Status: Thin.
User story: a user or operator installs an app, inspects requested authority, updates it, rolls it back, and removes its state without ambient filesystem assumptions.
Current coverage: repository composition, storage/naming, userspace binaries, live upgrade, cloud deployment, and public-release proposals cover adjacent pieces. There is no package/app distribution model.
Missing decisions:
- Package manifest schema and authority-request review.
- Signed repositories, update channels, and revocation.
- Dependency resolution and build provenance.
- App install/remove lifecycle and state ownership.
- Rollback, staged rollout, and compatibility policy.
- Vulnerability advisory and emergency update workflow.
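Authority-request review on update reduces to a set difference over requested capabilities. A minimal sketch, assuming a hypothetical Manifest that lists requested interfaces by name (any names used with it are illustrative, not real capOS interfaces): newly requested authority triggers review, while a pure reduction does not.

```rust
use std::collections::BTreeSet;

/// Hypothetical package manifest: the capabilities an app asks for.
pub struct Manifest {
    pub name: String,
    pub version: u32,
    pub requested: BTreeSet<String>,
}

/// What an update would change in the app's authority (sorted by name).
#[derive(Debug, PartialEq, Eq)]
pub struct AuthorityDiff {
    pub added: Vec<String>,
    pub removed: Vec<String>,
}

pub fn authority_diff(installed: &Manifest, update: &Manifest) -> AuthorityDiff {
    AuthorityDiff {
        added: update.requested.difference(&installed.requested).cloned().collect(),
        removed: installed.requested.difference(&update.requested).cloned().collect(),
    }
}

/// An update that asks for anything new needs fresh review; dropping
/// authority (or changing nothing) can be applied without re-prompting.
pub fn needs_review(installed: &Manifest, update: &Manifest) -> bool {
    !authority_diff(installed, update).added.is_empty()
}
```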
Research needed:
- Nix/Guix, OSTree, Flatpak portals, Android package permissions, and Fuchsia package/update system.
- Supply-chain signing systems such as TUF/in-toto/Sigstore if this becomes release-critical.
Promote when: capOS needs installable demos, sibling repositories, public release, or cloud image update flow.
Crash Recovery, Supervision, And Diagnostics
Status: Thin.
User story: a service crashes; init or an authorized supervisor restarts it or enters a known degraded mode without leaking authority, hiding the cause, or looping forever.
Current coverage: service architecture already sketches SpawnRequest restart
policy, supervisor-owned respawn, and always/on-failure restart modes;
libcapos-service covers service lifecycle pieces; live-upgrade planning ties
fault containment to supervisor respawn; and system monitoring covers
logs/metrics/crash records at a high level. Crash-loop budgets, core/minidump
capture, degraded-mode semantics, watchdog policy, and stale/in-flight cleanup
are still not owned as one recovery design.
Missing decisions:
- Restart policy authority and failure budget.
- Crash-loop backoff and operator override.
- Core dump or minidump capture with capability redaction.
- Watchdog and health-check capabilities.
- Degraded boot and emergency shell semantics.
- Stale capabilities and in-flight calls after service death.
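A restart policy with a failure budget could be a sliding-window limit plus exponential backoff, similar in purpose to Erlang/OTP supervisor intensity/period limits and systemd's StartLimitBurst=. A minimal sketch, assuming a hypothetical RestartBudget owned by the supervisor, with crash times passed in explicitly so the policy is deterministic:

```rust
use std::time::Duration;

/// What the supervisor should do after a child exits abnormally.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    RestartAfter(Duration),
    Degraded,
}

/// At most `max_restarts` within `window`, with doubling backoff capped at `max_backoff`.
pub struct RestartBudget {
    max_restarts: usize,
    window: Duration,
    base_backoff: Duration,
    max_backoff: Duration,
    recent: Vec<Duration>, // crash times, measured from supervisor start
}

impl RestartBudget {
    pub fn new(max_restarts: usize, window: Duration, base_backoff: Duration, max_backoff: Duration) -> Self {
        Self { max_restarts, window, base_backoff, max_backoff, recent: Vec::new() }
    }

    pub fn on_crash(&mut self, now: Duration) -> Decision {
        // Forget crashes that fell out of the sliding window.
        let window = self.window;
        self.recent.retain(|&t| now.saturating_sub(t) < window);
        if self.recent.len() >= self.max_restarts {
            return Decision::Degraded;
        }
        let attempt = self.recent.len() as u32;
        self.recent.push(now);
        let backoff = self.base_backoff.saturating_mul(1u32 << attempt.min(16));
        Decision::RestartAfter(backoff.min(self.max_backoff))
    }
}
```

Here Degraded expires once old crashes leave the window; whether it should instead stay sticky until an operator override is one of the decisions this entry leaves open.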
Research needed:
- Erlang/OTP supervision trees, systemd restart policy, Kubernetes probes, and Fuchsia component lifecycle.
- Capability-system precedent for crash propagation and service replacement.
Promote when: shared services, remote shell, storage, or agent workloads need production-grade recovery behavior.
Backup, Restore, Snapshots, And Migration
Status: Thin.
User story: an operator loses a disk or VM and restores users, services, keys, and app state while avoiding stale authority and accidental data disclosure.
Current coverage: storage/naming, cloud deployment, and the hardware/boot/storage backlog already cover narrower pieces: user-owned encrypted save transport, fake Drive/Firebase restore rejection tests, rollback/stale handling, and cloud-backed snapshot material. System-wide disaster recovery for users, services, keys, machine identity, and authority state is still not owned as one design.
Missing decisions:
- Snapshot capability boundary and consistency protocol.
- Encrypted export/import and restore identity.
- Key recovery and disaster recovery drills.
- Partial restore and per-service state ownership.
- Backup retention, deletion, and privacy policy.
- Migration between machines or cloud instances.
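Rollback and identity checks at restore time can be stated as a small predicate. A minimal sketch, assuming hypothetical SnapshotMeta fields, and assuming the last committed generation survives the disk loss somewhere protected (for example a monotonic counter outside the failed disk), which is itself one of the open decisions:

```rust
/// Hypothetical metadata carried by an encrypted snapshot.
#[derive(Debug, Clone)]
pub struct SnapshotMeta {
    pub machine_id: String,
    pub generation: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RestoreError {
    WrongMachine,
    Rollback { snapshot: u64, last_seen: u64 },
}

/// Refuse snapshots from another machine identity unless the operator is
/// explicitly migrating, and refuse generations older than the newest one
/// already committed on this machine (anti-rollback).
pub fn check_restore(
    meta: &SnapshotMeta,
    machine_id: &str,
    last_seen_generation: u64,
    migrating: bool,
) -> Result<(), RestoreError> {
    if meta.machine_id != machine_id && !migrating {
        return Err(RestoreError::WrongMachine);
    }
    if meta.generation < last_seen_generation {
        return Err(RestoreError::Rollback { snapshot: meta.generation, last_seen: last_seen_generation });
    }
    Ok(())
}
```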
Research needed:
- ZFS/Btrfs snapshot semantics, Borg/Restic encrypted backup models, and cloud snapshot/key-management practices.
- Capability-specific concerns from EROS/CapROS persistence if applicable.
Promote when: writable storage, durable local accounts, volume encryption, or cloud deployment becomes near-term.
Human-Facing Administration And Explainability
Status: Thin.
User story: an operator can answer who has access to a service, why, since when, what will happen if access is revoked, and why a request was denied.
Current coverage: shell, system info, local users, system monitoring, configuration, and security proposals cover pieces. There is no unified administrator UX or policy explainability track.
Missing decisions:
- Account and role management commands or UI.
- Grant inspection, diff, revoke, and dry-run behavior.
- Denial explanation format across kernel, broker, and services.
- Audit search and incident timeline views.
- Diagnostics bundle generation and redaction.
- Safe repair workflow for broken configuration or policy.
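A shared denial format could be as simple as one record type rendered identically by every layer, which also makes audit search uniform. A minimal sketch, assuming a hypothetical Denial struct and Layer enum; the field names and message wording are illustrative:

```rust
use std::fmt;

/// Which layer refused the request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Layer {
    Kernel,
    Broker,
    Service,
}

/// Hypothetical common denial record shared by kernel, broker, and services.
#[derive(Debug, Clone)]
pub struct Denial {
    pub layer: Layer,
    pub subject: String,
    pub operation: String,
    pub missing: String,
    pub policy_ref: Option<String>,
}

impl fmt::Display for Denial {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?} denied {} for {}: missing {}", self.layer, self.operation, self.subject, self.missing)?;
        // Point at the policy rule when one was involved, so "why" is answerable.
        if let Some(p) = &self.policy_ref {
            write!(f, " (policy {})", p)?;
        }
        Ok(())
    }
}
```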
Research needed:
- Kubernetes RBAC can-i/audit practices.
- Cloud IAM policy simulators and access-analyzer tools.
- Genode configuration/reporting UX for component graphs.
Promote when: local users, ABAC/MAC, remote shell, or operator configuration needs day-2 administration rather than proof-only commands.
Developer Debugging, Profiling, And Tooling
Status: Thin.
User story: a developer writes a capOS service, runs it locally, debugs a failed capability call, profiles it, and ships it with reproducible evidence.
Current coverage: harness engineering, benchmarks, generated-code checks, run-targets, and the paper evidence track cover pieces. There is no full debugger/profiler/developer-tooling proposal.
Missing decisions:
- Debug authority and process attach policy.
- Symbols, stack traces, crash dumps, and source maps.
- Ring/syscall/capability-call tracing.
- Service schema explorer and request replay tooling.
- Guest profiling, flamegraph, and benchmark attribution workflow.
- App/service templates and local developer SDK.
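Capability-call tracing usually starts with a bounded buffer so tracing cannot exhaust memory in the traced system. A minimal sketch, assuming a hypothetical CallTrace record and TraceRing type, that keeps the newest entries, counts drops, and filters failed calls:

```rust
use std::collections::VecDeque;

/// Hypothetical record of one capability invocation.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CallTrace {
    pub cap_id: u64,
    pub method: String,
    pub result: Result<(), String>,
}

/// Fixed-capacity ring: never holds more than `cap` entries; oldest go first.
pub struct TraceRing {
    cap: usize,
    entries: VecDeque<CallTrace>,
    dropped: u64,
}

impl TraceRing {
    pub fn new(cap: usize) -> Self {
        Self { cap, entries: VecDeque::with_capacity(cap), dropped: 0 }
    }

    pub fn record(&mut self, t: CallTrace) {
        if self.cap == 0 {
            self.dropped += 1;
            return;
        }
        if self.entries.len() == self.cap {
            self.entries.pop_front();
            self.dropped += 1;
        }
        self.entries.push_back(t);
    }

    /// Failed calls still in the ring, oldest first.
    pub fn failures(&self) -> impl Iterator<Item = &CallTrace> {
        self.entries.iter().filter(|t| t.result.is_err())
    }

    /// How many entries were overwritten; a non-zero count means the trace is incomplete.
    pub fn dropped(&self) -> u64 {
        self.dropped
    }
}
```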
Research needed:
- GDB remote protocol, Linux perf/eBPF-style tracing boundaries, Fuchsia diagnostics, and seL4 debug authority practices.
Promote when: non-trivial third-party services, public release, or performance claims need repeatable developer workflows.
Compatibility And App Porting Strategy
Status: Thin.
User story: a developer ports a small existing CLI or server to capOS and knows which Unix assumptions work, fail, or require explicit capability adapters.
Current coverage: userspace binaries, Go, Lua, POSIX adapters, WASI, C/C++, and language-runtime proposals mention porting targets. There is no concrete compatibility profile matrix.
Missing decisions:
- Minimal libc/POSIX surface and unsupported-call policy.
- Filesystem, environment, argv, signal, pipe, socket, and process semantics.
- Dynamic linking and shared-library policy.
- WASI adapter authority model.
- Build recipes and package corpus selection.
- Porting report template and acceptance tests.
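A compatibility profile matrix could be backed by a per-call table that a porting report queries. A minimal sketch, assuming a hypothetical Support enum; the entries are placeholders rather than a proposed profile, and failing unsupported calls with ENOSYS is an assumed policy:

```rust
/// Hypothetical compatibility profile entry for one POSIX-style call.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Support {
    Native,
    /// Works only when the launcher passed the named capability.
    NeedsCap(&'static str),
    /// Assumed policy: fails with ENOSYS so ported code takes its fallback path.
    Unsupported,
}

/// Placeholder table; a real profile would be generated from the adapter.
pub fn profile(call: &str) -> Support {
    match call {
        "read" | "write" | "close" => Support::Native,
        "open" => Support::NeedsCap("directory"),
        "socket" => Support::NeedsCap("network"),
        _ => Support::Unsupported,
    }
}

/// Porting report: the calls a program uses that will not work as-is.
pub fn porting_report<'a>(calls: &[&'a str]) -> Vec<(&'a str, Support)> {
    calls
        .iter()
        .map(|&c| (c, profile(c)))
        .filter(|(_, s)| *s != Support::Native)
        .collect()
}
```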
Research needed:
- WASI preview models, CloudABI history, Redox, Hermit, Fuchsia POSIX layer, and Genode libc/VFS integration.
Promote when: a language runtime, POSIX adapter, or real application corpus becomes a selected milestone.
Accessibility And Internationalization
Status: Uncovered.
User story: non-English users and assistive-technology users can operate capOS shells, graphical sessions, and web/agent surfaces without privileged workarounds.
Current coverage: none beyond general shell/browser surface discussions.
Missing decisions:
- Unicode, locale, collation, and timezone data ownership.
- Input methods and keyboard layout authority.
- Screen reader or accessibility tree service boundary.
- High-contrast, font scaling, and reduced-motion policy.
- Translation and message-catalog strategy.
- Accessible denial/audit messages and setup flow.
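A message-catalog strategy needs a fallback rule for missing translations. A minimal sketch, assuming a hypothetical Catalog keyed by locale tag: lookup tries the full tag, then the base language, then the default locale, and finally returns the message id itself so a denial message is never silently empty.

```rust
use std::collections::HashMap;

/// Hypothetical catalog: locale tag -> message id -> translated text.
pub struct Catalog {
    default_locale: String,
    messages: HashMap<String, HashMap<String, String>>,
}

impl Catalog {
    pub fn new(default_locale: &str) -> Self {
        Self { default_locale: default_locale.to_string(), messages: HashMap::new() }
    }

    pub fn add(&mut self, locale: &str, id: &str, text: &str) {
        self.messages
            .entry(locale.to_string())
            .or_default()
            .insert(id.to_string(), text.to_string());
    }

    /// Fallback chain: "pt-BR" -> "pt" -> default locale -> raw id.
    pub fn lookup<'a>(&'a self, locale: &str, id: &'a str) -> &'a str {
        let base = locale.split('-').next().unwrap_or(locale);
        for loc in [locale, base, self.default_locale.as_str()] {
            if let Some(text) = self.messages.get(loc).and_then(|m| m.get(id)) {
                return text;
            }
        }
        id
    }
}
```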
Research needed:
- Web accessibility platform architecture, Wayland accessibility status, Fuchsia accessibility manager, and terminal accessibility conventions.
Promote when: graphical sessions, public demos, web shell, or production interactive setup becomes user-facing beyond developer/operator proof flows.
Fleet Operations And Remote Management
Status: Thin.
User story: an operator manages many capOS nodes and can prove which version, policy, keys, services, and update state each node is running.
Current coverage: cloud deployment, cloud metadata, system monitoring, configuration, hosted agents, and public release cover adjacent concerns. There is no fleet-management design.
Missing decisions:
- Node enrollment and identity bootstrap.
- Remote attestation and inventory reporting.
- Configuration rollout and drift detection.
- Remote logs, metrics, and audit aggregation.
- Staged update and rollback policy.
- Break-glass access and emergency revocation.
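Drift detection compares intended and reported node state key by key. A minimal sketch, assuming both sides are flattened into string maps (keys such as a version or policy hash are illustrative); in practice the reported side would need to be attested before it could be trusted:

```rust
use std::collections::BTreeMap;

/// Report every key whose reported value differs from the desired one,
/// every desired key the node did not report, and every unexpected key.
pub fn drift(desired: &BTreeMap<String, String>, reported: &BTreeMap<String, String>) -> Vec<String> {
    let mut out = Vec::new();
    for (key, want) in desired {
        match reported.get(key) {
            None => out.push(format!("{key}: missing (want {want})")),
            Some(have) if have != want => out.push(format!("{key}: {have} != {want}")),
            Some(_) => {}
        }
    }
    for key in reported.keys() {
        if !desired.contains_key(key) {
            out.push(format!("{key}: unexpected"));
        }
    }
    out
}
```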
Research needed:
- Kubernetes node/bootstrap models, cloud instance identity, SPIFFE/SPIRE, TPM/measured boot attestation, and osquery-style inventory.
Promote when: cloud deployment, hosted agent swarms, public release, or remote administration becomes more than a single-node proof.
Privacy And Data Governance
Status: Thin.
User story: a user can see and revoke what data a service can access, and deleted data does not unintentionally persist in logs, backups, or derived indexes.
Current coverage: capability authority, session privacy, audit redaction, identity policy, storage, monitoring, and browser/agent proposals cover parts of the problem. There is no explicit data-governance design.
Missing decisions:
- Data classification and purpose-bound access metadata.
- Retention, deletion, and legal-hold semantics.
- Derived data, indexes, caches, and backup deletion behavior.
- User consent and service data export.
- Audit redaction versus forensic retention.
- Cross-service data-sharing policy and review UX.
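Deletion that reaches derived data needs a lineage record from each source item to its indexes, caches, and copies. A minimal sketch, assuming a hypothetical Lineage graph over numeric item ids, that computes everything which must be deleted along with a source item (legal holds and backup media are not modeled):

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical lineage graph: item id -> ids derived from it.
#[derive(Default)]
pub struct Lineage {
    derived: HashMap<u64, Vec<u64>>,
}

impl Lineage {
    pub fn record(&mut self, source: u64, derived: u64) {
        self.derived.entry(source).or_default().push(derived);
    }

    /// Every id reachable from `root`, including `root` itself.
    /// The `seen` set also makes this safe if the graph contains a cycle.
    pub fn deletion_set(&self, root: u64) -> HashSet<u64> {
        let mut seen = HashSet::new();
        let mut stack = vec![root];
        while let Some(id) = stack.pop() {
            if seen.insert(id) {
                if let Some(children) = self.derived.get(&id) {
                    stack.extend(children.iter().copied());
                }
            }
        }
        seen
    }
}
```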
Research needed:
- Object-capability privacy patterns, GDPR-style data lifecycle controls, browser permission UX, and cloud DLP/data catalog practices.
Promote when: persistent user data, browser/agent activity, hosted services, or public release introduces real privacy expectations.