# Research And Design Gaps Backlog

This file tracks important OS design, development, and user-story areas that
are absent, thinly covered, or only indirectly owned by existing capOS
proposals. It is a triage register, not an execution queue. Listing a gap here
does not change the selected milestone in `WORKPLAN.md` and does not mean the
project should immediately create a full proposal.

Promote an entry out of this file only when a visible milestone, paper evidence
gap, review finding, or explicit user direction makes the area actionable.
Promotion targets:

- `docs/research/` for prior-art survey or external precedent.
- `docs/proposals/` for a concrete reviewed design direction.
- A focused `docs/backlog/` file when the design is accepted enough to
  decompose into implementation tasks.
- `docs/design-risks-register.md` when the gap is an active architectural risk
  with an owner.

## Status Vocabulary

- **Uncovered**: no owned design exists yet.
- **Thin**: mentioned indirectly, but no coherent owner or decision record.
- **Backlog-only**: task decomposition exists without a full proposal.
- **Research-needed**: design should not start before prior-art review.
- **Ready-for-proposal**: enough constraints exist to draft a proposal.
- **Deferred**: intentionally future work, not a near-term blocker.
- **Rejected**: considered and explicitly not pursued.

## Promotion Checklist

Before creating a proposal from an entry here:

- Identify the visible user or operator outcome the work would enable.
- List existing capOS docs that already partially cover the area.
- List the `docs/research/` files actually read, or explain why no research
  file applies.
- Decide the first capability boundary or trust boundary that must be designed.
- Define one QEMU, host-test, documentation, or review gate that would prove
  the proposal made progress.
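
The checklist above can be treated as a small completeness check before a proposal is drafted. The following is a minimal sketch, not existing capOS tooling: `PromotionCandidate`, `missing_items`, and every field name are hypothetical and only mirror the five checklist items.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PromotionCandidate:
    """Hypothetical record of checklist answers for one backlog entry."""
    title: str
    outcome: str = ""                          # visible user or operator outcome
    existing_docs: Optional[List[str]] = None  # None = not surveyed; [] = none found
    research_read: List[str] = field(default_factory=list)
    no_research_reason: str = ""               # why no docs/research/ file applies
    first_boundary: str = ""                   # first capability or trust boundary
    gate: str = ""                             # QEMU, host-test, documentation, or review gate


def missing_items(c: PromotionCandidate) -> List[str]:
    """Return the checklist items that are still unanswered."""
    missing = []
    if not c.outcome.strip():
        missing.append("outcome")
    if c.existing_docs is None:
        missing.append("existing_docs")
    # Either research files were read, or the absence is explained.
    if not c.research_read and not c.no_research_reason.strip():
        missing.append("research")
    if not c.first_boundary.strip():
        missing.append("first_boundary")
    if not c.gate.strip():
        missing.append("gate")
    return missing
```

An entry with an empty `missing_items` result has answered every checklist item; it still needs one of the promotion triggers listed at the top of this file.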

## Display, GUI, And Input

Status: Uncovered.

User story: a user boots capOS on a desktop, laptop, or remote graphical
session and runs multiple isolated graphical apps with working keyboard,
pointer, clipboard, and accessibility support.

Current coverage: browser, browser/WASM, agent, GPU, and shell proposals point
toward future visual sessions, but there is no native display server,
compositor, input routing, window authority, clipboard, screenshot, or
accessibility model.

Missing decisions:

- Display ownership and framebuffer/GPU authority.
- Compositor trust boundary and per-window capability model.
- Keyboard, pointer, touch, IME, and focus authority.
- Clipboard and drag/drop data-transfer policy.
- Screen capture and remote desktop authority.
- Accessibility-service authority and privacy boundaries.

Research needed:

- Genode GUI/session routing and report-ROM style composition.
- Wayland compositor security model and clipboard limitations.
- Fuchsia Scenic/input pipeline if the native GUI track becomes near-term.
- seL4/CapROS precedents for trusted path or secure attention, if applicable.

Promote when: native graphical sessions, browser UI, desktop app isolation, or
rich web/agent interaction becomes a selected milestone.

## Driver Framework And Hotplug

Status: Thin.

User story: an operator plugs in a device, capOS identifies it, starts or
restarts the correct isolated driver, and exposes only the intended typed
capabilities.

Current coverage: `docs/dma-isolation-design.md`,
`docs/backlog/hardware-boot-storage.md`, networking, storage, cloud, and GPU
proposals cover pieces of device work. There is no general driver framework
for discovery, binding, isolation, recovery, firmware, or hotplug.

Missing decisions:

- Device discovery authority and driver matching policy.
- Driver process lifecycle, crash restart, and stale handle behavior.
- Firmware loading and firmware provenance.
- Hotplug attach/detach semantics.
- Interrupt, MMIO, DMA, and power authority handoff.
- User-space driver SDK boundaries and test harnesses.

Research needed:

- Genode driver components and session routing.
- Zircon/Fuchsia driver framework concepts.
- Linux VFIO/UIO and userspace-driver isolation tradeoffs.
- seL4 device-driver partitioning examples.

Promote when: userspace NIC, block-device, USB, GPU, or real hardware bring-up
requires reusable driver lifecycle rules.

## Power, Suspend, Resume, And Thermal Policy

Status: Uncovered.

User story: a laptop or VM can sleep, wake, preserve sessions, and report power
or thermal limits without leaking stale authority or corrupting timers.

Current coverage: tickless scheduling covers timer cleanup and idle mechanics,
but not power management as an OS product area.

Missing decisions:

- Suspend/resume authority and system-wide quiesce protocol.
- Wake-source capabilities and audit.
- Battery, charger, lid, and thermal sensor surfaces.
- CPU frequency, C-state, and thermal-throttling policy.
- Timer and network behavior across sleep.
- Session and service liveness after resume.

Research needed:

- ACPI power-state model and Linux suspend blockers/wakeup sources.
- Fuchsia power framework if relevant.
- Genode power-management patterns for component systems.

Promote when: laptop hardware, cloud hibernation, low-power idle, or
interactive remote-shell reliability needs sleep/resume semantics.

## Time, Clock, And Trusted Timestamp Services

Status: Thin.

User story: services can distinguish monotonic time, wall-clock time, and
trusted audit time, and cannot silently forge system time.

Current coverage: scheduler and tickless proposals mention clocks, timers,
deadlines, and the clocksource/clockevent split. There is no user-facing time
authority model.

Missing decisions:

- Monotonic, boot, realtime, and coarse clock capability surfaces.
- Who can set wall-clock time and how changes are audited.
- NTP/PTP/cloud-metadata time synchronization authority.
- Timezone and locale data ownership.
- Leap-second and clock-step behavior.
- Timestamp trust level carried into audit records.

Research needed:

- Linux clock IDs, adjtimex/NTP discipline, and time namespaces.
- Fuchsia clock objects and UTC maintenance.
- Cloud metadata time and attestation interactions.

Promote when: audit log completion, TLS certificate validation, distributed
services, or durable storage needs trusted timestamp semantics.
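
To make the monotonic versus wall-clock distinction in the user story concrete, here is a minimal host-side sketch, assuming a hypothetical `AuditStamp` record that carries both readings plus a trust label. The `Trust` levels and all names are illustrative placeholders, not a decided capOS surface.

```python
import time
from dataclasses import dataclass
from enum import Enum


class Trust(Enum):
    """Hypothetical trust levels for the wall-clock reading."""
    UNSYNCED = "unsynced"   # wall clock never disciplined
    SYNCED = "synced"       # disciplined by some time source


@dataclass(frozen=True)
class AuditStamp:
    monotonic_ns: int   # never steps backwards on one boot
    wall_ns: int        # may be stepped by whoever holds set-time authority
    trust: Trust


def stamp(trust: Trust, wall_offset_ns: int = 0) -> AuditStamp:
    """Take both readings; wall_offset_ns simulates a clock step."""
    return AuditStamp(time.monotonic_ns(), time.time_ns() + wall_offset_ns, trust)


def elapsed_ns(earlier: AuditStamp, later: AuditStamp) -> int:
    """Interval between two stamps, computed from monotonic time only."""
    return later.monotonic_ns - earlier.monotonic_ns


if __name__ == "__main__":
    a = stamp(Trust.UNSYNCED)
    # Simulate the wall clock being stepped back one hour.
    b = stamp(Trust.UNSYNCED, wall_offset_ns=-3600 * 10**9)
    print(elapsed_ns(a, b) >= 0, b.wall_ns < a.wall_ns)
```

In this sketch the wall-clock step makes `b.wall_ns` earlier than `a.wall_ns`, while the monotonic interval stays non-negative, which is why ordering and durations in audit records would likely need a reading that no time-setting authority can step.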

## Software Installation, Packages, And Rollback

Status: Thin.

User story: a user or operator installs an app, inspects requested authority,
updates it, rolls it back, and removes its state without ambient filesystem
assumptions.

Current coverage: repository composition, storage/naming, userspace binaries,
live upgrade, cloud deployment, and public-release proposals cover adjacent
pieces. There is no package/app distribution model.

Missing decisions:

- Package manifest schema and authority-request review.
- Signed repositories, update channels, and revocation.
- Dependency resolution and build provenance.
- App install/remove lifecycle and state ownership.
- Rollback, staged rollout, and compatibility policy.
- Vulnerability advisory and emergency update workflow.

Research needed:

- Nix/Guix, OSTree, Flatpak portals, Android package permissions, and Fuchsia
  package/update system.
- Supply-chain signing systems such as TUF/in-toto/Sigstore if this becomes
  release-critical.

Promote when: capOS needs installable demos, sibling repositories, public
release, or cloud image update flow.
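
One way to picture authority-request review on update is a diff between the authority requested by two package versions. This is a minimal sketch under stated assumptions: the manifest is reduced to a set of strings, and the capability names (`net.outbound`, `storage.app-data`, `clock.wall`) are invented for illustration; no manifest schema has been chosen.

```python
from typing import Dict, List, Set


def authority_diff(old: Set[str], new: Set[str]) -> Dict[str, List[str]]:
    """Classify requested authority between two manifest versions."""
    return {
        "added": sorted(new - old),      # needs fresh review before update
        "removed": sorted(old - new),    # can be revoked after update
        "unchanged": sorted(old & new),
    }


if __name__ == "__main__":
    v1 = {"net.outbound", "storage.app-data"}
    v2 = {"storage.app-data", "clock.wall"}
    for kind, caps in authority_diff(v1, v2).items():
        print(kind, caps)
```

Only the `added` set would need a new grant decision in this model; how rollback treats authority that was granted for a later version is one of the open decisions above.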

## Crash Recovery, Supervision, And Diagnostics

Status: Thin.

User story: a service crashes; init or an authorized supervisor restarts it or
enters a known degraded mode without leaking authority, hiding the cause, or
looping forever.

Current coverage: service architecture already sketches `SpawnRequest` restart
policy, supervisor-owned respawn, and always/on-failure restart modes;
`libcapos-service` covers service lifecycle pieces; live-upgrade planning ties
fault containment to supervisor respawn; and system monitoring covers
logs/metrics/crash records at a high level. Crash-loop budgets, core/minidump
capture, degraded-mode semantics, watchdog policy, and stale/in-flight cleanup
are still not owned as one recovery design.

Missing decisions:

- Restart policy authority and failure budget.
- Crash-loop backoff and operator override.
- Core dump or minidump capture with capability redaction.
- Watchdog and health-check capabilities.
- Degraded boot and emergency shell semantics.
- Stale capabilities and in-flight calls after service death.

Research needed:

- Erlang/OTP supervision trees, systemd restart policy, Kubernetes probes, and
  Fuchsia component lifecycle.
- Capability-system precedent for crash propagation and service replacement.

Promote when: shared services, remote shell, storage, or agent workloads need
production-grade recovery behavior.
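
The failure-budget and crash-loop backoff decisions above could be prototyped on the host before any supervisor design is fixed. The sketch below assumes a sliding-window crash budget with exponential backoff; `RestartBudget`, its parameters, and the use of `None` to signal degraded mode are hypothetical and are not taken from the `SpawnRequest` design.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RestartBudget:
    max_restarts: int = 5       # crashes tolerated inside one window
    window_s: float = 60.0      # sliding window length in seconds
    base_delay_s: float = 0.5   # delay before the first restart
    max_delay_s: float = 30.0   # cap on the backoff delay
    _crashes: List[float] = field(default_factory=list)

    def on_crash(self, now: float) -> Optional[float]:
        """Return the delay before the next restart, or None when the
        budget is exhausted and the supervisor should stop restarting."""
        # Forget crashes that fell out of the sliding window.
        self._crashes = [t for t in self._crashes if now - t < self.window_s]
        self._crashes.append(now)
        if len(self._crashes) > self.max_restarts:
            return None
        # Double the delay for each crash still inside the window.
        delay = self.base_delay_s * 2 ** (len(self._crashes) - 1)
        return min(delay, self.max_delay_s)
```

Passing `now` explicitly keeps the policy testable without real timers. What `None` should trigger (degraded mode, operator override, or an emergency shell) is exactly the part this backlog entry leaves open.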

## Backup, Restore, Snapshots, And Migration

Status: Thin.

User story: an operator loses a disk or VM and restores users, services, keys,
and app state while avoiding stale authority and accidental data disclosure.

Current coverage: storage/naming, cloud deployment, and the
hardware/boot/storage backlog already cover narrower pieces: user-owned
encrypted save
transport, fake Drive/Firebase restore rejection tests, rollback/stale handling,
and cloud-backed snapshot material. System-wide disaster recovery for users,
services, keys, machine identity, and authority state is still not owned as one
design.

Missing decisions:

- Snapshot capability boundary and consistency protocol.
- Encrypted export/import and restore identity.
- Key recovery and disaster recovery drills.
- Partial restore and per-service state ownership.
- Backup retention, deletion, and privacy policy.
- Migration between machines or cloud instances.

Research needed:

- ZFS/Btrfs snapshot semantics, Borg/Restic encrypted backup models, and
  cloud snapshot/key-management practices.
- Capability-specific concerns from EROS/CapROS persistence if applicable.

Promote when: writable storage, durable local accounts, volume encryption, or
cloud deployment becomes near-term.

## Human-Facing Administration And Explainability

Status: Thin.

User story: an operator can answer who has access to a service, why, since
when, what will happen if access is revoked, and why a request was denied.

Current coverage: shell, system info, local users, system monitoring,
configuration, and security proposals cover pieces. There is no unified
administrator UX or policy explainability track.

Missing decisions:

- Account and role management commands or UI.
- Grant inspection, diff, revoke, and dry-run behavior.
- Denial explanation format across kernel, broker, and services.
- Audit search and incident timeline views.
- Diagnostics bundle generation and redaction.
- Safe repair workflow for broken configuration or policy.

Research needed:

- Kubernetes RBAC `kubectl auth can-i` and audit practices.
- Cloud IAM policy simulators and access-analyzer tools.
- Genode configuration/reporting UX for component graphs.

Promote when: local users, ABAC/MAC, remote shell, or operator configuration
needs day-2 administration rather than proof-only commands.

## Developer Debugging, Profiling, And Tooling

Status: Thin.

User story: a developer writes a capOS service, runs it locally, debugs a
failed capability call, profiles it, and ships it with reproducible evidence.

Current coverage: harness engineering, benchmarks, generated-code checks,
run-targets, and the paper evidence track cover pieces. There is no full
debugger/profiler/developer-tooling proposal.

Missing decisions:

- Debug authority and process attach policy.
- Symbols, stack traces, crash dumps, and source maps.
- Ring/syscall/capability-call tracing.
- Service schema explorer and request replay tooling.
- Guest profiling, flamegraph, and benchmark attribution workflow.
- App/service templates and local developer SDK.

Research needed:

- GDB remote protocol, Linux `perf`/eBPF-style tracing boundaries, Fuchsia
  diagnostics, and seL4 debug authority practices.

Promote when: non-trivial third-party services, public release, or performance
claims need repeatable developer workflows.

## Compatibility And App Porting Strategy

Status: Thin.

User story: a developer ports a small existing CLI or server to capOS and knows
which Unix assumptions work, fail, or require explicit capability adapters.

Current coverage: userspace binaries, Go, Lua, POSIX adapters, WASI, C/C++,
and language-runtime proposals mention porting targets. There is no concrete
compatibility profile matrix.

Missing decisions:

- Minimal libc/POSIX surface and unsupported-call policy.
- Filesystem, environment, argv, signal, pipe, socket, and process semantics.
- Dynamic linking and shared-library policy.
- WASI adapter authority model.
- Build recipes and package corpus selection.
- Porting report template and acceptance tests.

Research needed:

- WASI preview models, CloudABI history, Redox, Hermit, Fuchsia POSIX layer,
  and Genode libc/VFS integration.

Promote when: a language runtime, POSIX adapter, or real application corpus
becomes a selected milestone.

## Accessibility And Internationalization

Status: Uncovered.

User story: non-English users and assistive-technology users can operate
capOS shells, graphical sessions, and web/agent surfaces without privileged
workarounds.

Current coverage: none beyond general shell/browser surface discussions.

Missing decisions:

- Unicode, locale, collation, and timezone data ownership.
- Input methods and keyboard layout authority.
- Screen reader or accessibility tree service boundary.
- High-contrast, font scaling, and reduced-motion policy.
- Translation and message-catalog strategy.
- Accessible denial/audit messages and setup flow.

Research needed:

- Web accessibility platform architecture, Wayland accessibility status,
  Fuchsia accessibility manager, and terminal accessibility conventions.

Promote when: graphical sessions, public demos, web shell, or production
interactive setup becomes user-facing beyond developer/operator proof flows.

## Fleet Operations And Remote Management

Status: Thin.

User story: an operator manages many capOS nodes and can prove which version,
policy, keys, services, and update state each node is running.

Current coverage: cloud deployment, cloud metadata, system monitoring,
configuration, hosted agents, and public release cover adjacent concerns.
There is no fleet-management design.

Missing decisions:

- Node enrollment and identity bootstrap.
- Remote attestation and inventory reporting.
- Configuration rollout and drift detection.
- Remote logs, metrics, and audit aggregation.
- Staged update and rollback policy.
- Break-glass access and emergency revocation.

Research needed:

- Kubernetes node/bootstrap models, cloud instance identity, SPIFFE/SPIRE,
  TPM/measured boot attestation, and osquery-style inventory.

Promote when: cloud deployment, hosted agent swarms, public release, or remote
administration becomes more than a single-node proof.

## Privacy And Data Governance

Status: Thin.

User story: a user can see and revoke what data a service can access, and
deleted data does not unintentionally persist in logs, backups, or derived
indexes.

Current coverage: capability authority, session privacy, audit redaction,
identity policy, storage, monitoring, and browser/agent proposals cover parts
of the problem. There is no explicit data-governance design.

Missing decisions:

- Data classification and purpose-bound access metadata.
- Retention, deletion, and legal-hold semantics.
- Derived data, indexes, caches, and backup deletion behavior.
- User consent and service data export.
- Audit redaction versus forensic retention.
- Cross-service data-sharing policy and review UX.

Research needed:

- Object-capability privacy patterns, GDPR-style data lifecycle controls,
  browser permission UX, and cloud DLP/data catalog practices.

Promote when: persistent user data, browser/agent activity, hosted services,
or public release introduces real privacy expectations.