Proposal: capOS Repository Harness Engineering
This proposal applies OpenAI-style harness engineering to the capOS repository itself. The goal is not to add agent features to the operating system. The goal is to make this repository a better, safer work environment for long-running agents and human reviewers.
The related capOS-Hosted Agent Swarms proposal describes capOS as a future host for OpenClaw-like agent services. This proposal describes the repository infrastructure needed so agents can work on capOS without repeatedly rediscovering project state, extending superseded designs, choosing the wrong QEMU proof, or silently drifting documentation.
Why This Proposal Exists
The capOS repo is already heavily agent-shaped:
- AGENTS.md and CLAUDE.md define workflow rules.
- WORKPLAN.md selects the current milestone and immediate gates.
- REVIEW_FINDINGS.md records open remediation work.
- docs/proposals/, docs/backlog/, and docs/research/ hold design context.
- docs/topics.md, docs/SUMMARY.md, and proposal indexes make docs navigable.
- Make targets and QEMU harnesses prove behavior.
- CUE manifests define focused system configurations.
That is enough for a careful agent to work, but it is not yet a complete harness. Too much project state still requires fragile human-style inference: which document is authoritative, which proposal is stale, which run target proves which behavior, which open finding blocks a task, and which design pivot explains why old text should not be extended.
OpenAI’s harness engineering lesson is direct: what an agent cannot inspect in its working context effectively does not exist. capOS should therefore compile its project state into repo-local, versioned, mechanically checked artifacts.
Scope
In scope:
- agent-facing repository map;
- task-selection and milestone state;
- proposal/research/status consistency checks;
- run-target and QEMU proof inventory;
- machine-readable design relationships;
- agent-maintained but reviewed knowledge compilation;
- deterministic evals for future coding agents;
- active-work and shared-resource visibility;
- review and security handoff artifacts.
Out of scope:
- capOS-hosted agent runtime implementation;
- model provider selection;
- browser, MCP, or A2A runtime integration;
- replacing human review;
- changing the current mandatory worktree workflow.
Design Principles
- Repository-local context wins. Important design and workflow state should live in tracked files, not in chat history or operator memory.
- Indexes are harness inputs. docs/topics.md, docs/SUMMARY.md, proposal indexes, backlog pointers, and run-target tables are not cosmetic; they are how agents find the right context.
- Status must be checkable. Proposal status, supersession, implementation status, selected milestone, and review findings should fail checks when they drift.
- Proofs need names and ownership. A QEMU harness target should say what it proves, which manifest it uses, which proposal/backlog owns it, and what transcript shape is expected.
- Compiled knowledge is non-authoritative until reviewed. Agent-generated wiki pages can help navigation, but proposals, architecture docs, schemas, code, and review findings remain authoritative.
- Prefer generation over duplicate hand-maintained state. When possible, sidecars and indexes should be generated from front matter, Makefile metadata, manifests, or explicit source files.
- Expose replacement paths. If a proposal is superseded, an agent should see the replacement before acting on stale text.
- Make unsafe shortcuts hard. The harness should steer agents away from main-worktree edits, stale branches, missing review, unverified QEMU claims, and undocumented design pivots.
- Agents must know when they are not alone. Shared resources such as git branches, worktrees, docs indexes, task lists, generated files, and review queues need visible ownership, lease, and version state before agents mutate them.
Proposed Artifacts
docs/agent-harness.md
A concise entry point for future agents. It should answer:
- where current project state lives;
- how to choose a task;
- how to create a compliant worktree;
- how to find relevant proposals, backlog, research, and review findings;
- how to choose checks;
- how to handle docs/status updates;
- how to hand off verification and review.
This file should link to authoritative docs rather than duplicate them. It is a map, not a new policy source.
docs/run-targets.md
A generated or hand-maintained inventory of run/check targets:
| Target | Manifest | Purpose | Expected proof | Owner |
|---|---|---|---|---|
| make run-session-context | system-session-context.cue | one immutable session context proof | hostile second-session attempts fail closed | session-bound invocation context |
| make run-chat | system-chat.cue | resident chat service proof | session-scoped chat transcript | chat/shared-service proposal |
The table should cover make run-*, make qemu-*, docs checks,
generated-code checks, and security checks. Agents should not infer target
meaning from target names alone.
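A validation pass over this inventory could look roughly like the sketch below. It assumes the inventory keeps the pipe-table shape shown above and that Make targets appear as ordinary rule lines at the start of a line. The function names (parse_inventory, makefile_targets, missing_targets) are hypothetical and do not refer to existing capOS tooling.

```python
import re


def parse_inventory(markdown: str) -> list[dict[str, str]]:
    """Parse a Markdown pipe table into row dicts keyed by header name."""
    rows = [ln.strip() for ln in markdown.splitlines() if ln.strip().startswith("|")]
    if len(rows) < 2:
        return []
    header = [cell.strip() for cell in rows[0].strip("|").split("|")]
    parsed = []
    for line in rows[2:]:  # rows[1] is the |---|---| separator
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        parsed.append(dict(zip(header, cells)))
    return parsed


def makefile_targets(makefile: str) -> set[str]:
    """Collect explicit rule names; skips recipe lines and := assignments."""
    targets = set()
    for line in makefile.splitlines():
        match = re.match(r"^([A-Za-z0-9_.-]+)\s*:(?!=)", line)
        if match:
            targets.add(match.group(1))
    return targets


def missing_targets(inventory: list[dict[str, str]], targets: set[str]) -> list[str]:
    """Return inventory targets that have no matching Makefile rule."""
    missing = []
    for row in inventory:
        name = row["Target"].removeprefix("make ").strip()
        if name not in targets:
            missing.append(name)
    return missing
```

A docs check could call missing_targets with the contents of docs/run-targets.md and the Makefile and report any non-empty result. Pattern rules and included Makefiles are out of scope for this sketch.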
Active Work Registry
Add a small generated or reviewed active-work registry for concurrent agents. It should be derived from git worktrees where possible and supplemented by task metadata:
| Task | Branch | Worktree | Claimed resources | Mode | Expires | Status |
|---|---|---|---|---|---|---|
| example-session-model | feat/session-model-proof | /home/ei-grad/capos-worktrees/session-model-proof | src/capos/service.rs, docs/proposals/session-context.md | exclusive source, shared docs | 2026-05-01 | checking |
The registry is not a replacement for git or human review. It is a harness surface for “another agent is already touching this shared resource.” The row above is synthetic sample data, not live project state.
Minimum fields:
- task or issue id;
- owner identity or runner id;
- branch and worktree path;
- claimed paths, subsystems, generated outputs, todo items, or review queues;
- exclusive/shared mode;
- observed base revision;
- lease expiry and renewal time;
- status: planning, editing, checking, review, merge, blocked, abandoned.
For the current repo workflow, this would make the existing worktree policy
queryable. For a future capOS-hosted swarm, the same shape becomes a
SharedResource/ResourceLease service: git repos, shared todo items, wiki
pages, generated docs, and merge queues all get visible claims and versioned
writes.
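The claim shape and a conflict query can be sketched as follows. This is an illustration under stated assumptions, not a proposed schema: the Claim class, the contested function, and the choice to treat blocked and abandoned entries as released are all hypothetical, and observed base revision and renewal time are omitted for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumption: these statuses no longer hold their claimed resources.
RELEASED = frozenset({"blocked", "abandoned"})


@dataclass(frozen=True)
class Claim:
    task: str
    branch: str
    exclusive: frozenset = frozenset()  # paths or resources claimed exclusively
    shared: frozenset = frozenset()     # paths or resources claimed in shared mode
    expires: datetime = datetime.max.replace(tzinfo=timezone.utc)
    status: str = "planning"


def is_live(claim: Claim, now: datetime) -> bool:
    """A claim counts only while its lease is unexpired and not released."""
    return claim.status not in RELEASED and claim.expires > now


def contested(candidate: Claim, registry: list[Claim], now: datetime) -> dict[str, list[str]]:
    """Map each live conflicting task to the resources it contests.

    Two claims conflict on a resource when at least one side holds it
    exclusively; shared/shared overlap is allowed.
    """
    result = {}
    for other in registry:
        if other.task == candidate.task or not is_live(other, now):
            continue
        overlap = (candidate.exclusive & (other.exclusive | other.shared)) | (
            candidate.shared & other.exclusive
        )
        if overlap:
            result[other.task] = sorted(overlap)
    return result
```

An agent would run contested before editing and either pick a non-overlapping task or wait until the competing lease expires.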
Proposal Relationship Metadata
Add or standardize front matter fields:
status: "Future design. No implementation."
last_reviewed: "2026-04-28 00:00 UTC"
supersedes:
- old-proposal.md
superseded_by: new-proposal.md
implemented_by:
- commit-or-target
owned_backlog: docs/backlog/example.md
proof_targets:
- make run-example
The exact schema can be narrower at first. The important requirement is that replacement and proof relationships become queryable.
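To show what "queryable" could mean here, the sketch below follows superseded_by links to the current replacement. It assumes the front matter has already been parsed into plain dictionaries keyed by file name (the parsing step is not shown), and resolve_replacement is a hypothetical helper name.

```python
def resolve_replacement(start: str, metadata: dict[str, dict]) -> str:
    """Follow superseded_by links until reaching a proposal with no replacement.

    Raises KeyError for a link to a file without metadata and ValueError
    for a supersession cycle, so both kinds of drift fail loudly.
    """
    seen = [start]
    current = start
    while True:
        entry = metadata.get(current)
        if entry is None:
            raise KeyError(f"{current} is referenced but has no metadata")
        replacement = entry.get("superseded_by")
        if not replacement:
            return current
        if replacement in seen:
            raise ValueError("supersession cycle: " + " -> ".join(seen + [replacement]))
        seen.append(replacement)
        current = replacement


# Illustrative data only, mirroring the placeholder names in the front matter above.
example = {
    "old-proposal.md": {"superseded_by": "new-proposal.md"},
    "new-proposal.md": {"proof_targets": ["make run-example"]},
}
current = resolve_replacement("old-proposal.md", example)
```

With the example data, current names the replacement proposal, and its proof_targets can then be read from the same dictionary.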
Design Pivot Records
Add short ADR-style files under docs/decisions/ for high-impact pivots:
- endpoint badges as service identity rejected;
- service-object capabilities superseded by session-bound invocation context;
- SSH work paused behind session-bound invocation context;
- hosted agents split from shell agent mode.
Each record should state context, decision, consequences, superseded docs, and current replacement docs.
docs/agent-wiki/
A generated or agent-maintained compiled knowledge tree:
- index.md: current topic map;
- capability-model.md: current “interface is permission” model;
- session-model.md: selected session-bound invocation context summary;
- shell-and-remote-access.md: shell, Telnet, SSH, WebShellGateway status;
- qemu-proofs.md: proof target summaries;
- open-findings.md: current review findings summarized with links.
This tree must be clearly labeled as compiled navigation, not authority. It can be hidden from public docs until reviewed.
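A label lint for these pages might be as small as the sketch below. The exact label wording, the five-line window, and the lint_wiki_page name are assumptions made for illustration; the real wording would be chosen when the wiki is seeded.

```python
# Assumed label wording; compared case-insensitively.
LABEL = "compiled navigation, not authority"


def lint_wiki_page(text: str, head_lines: int = 5) -> list[str]:
    """Return problems found in one compiled wiki page."""
    problems = []
    head = "\n".join(text.splitlines()[:head_lines]).lower()
    if LABEL not in head:
        problems.append("missing label")
    # A page with no Markdown link cannot point back to an authoritative source.
    if "](" not in text:
        problems.append("missing citation link")
    return problems
```

An empty list means the page carries the label near the top and links to at least one source document.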
Agent Evals
Add deterministic repository-workflow evals:
- identify selected milestone from WORKPLAN.md;
- find the relevant backlog and proposal;
- reject editing the main worktree;
- detect another active worker claiming the same exclusive path or generated output;
- choose a non-overlapping task or wait when a shared resource is already leased;
- identify required checks for a doc-only proposal change;
- detect a superseded proposal and follow replacement;
- update proposal index and summary when adding a proposal;
- avoid claiming full tests passed when only docs built;
- surface open review findings before unrelated feature work.
These evals can start as scripted fixtures. They do not need live model calls.
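One such scripted fixture can be sketched as a deterministic grader over an agent's handoff report. Everything here is hypothetical: the Handoff shape, the grade function, and the assumed policy that docs-only changes need only make docs while other changes also need make check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handoff:
    """What an agent reports at the end of a task (hypothetical shape)."""
    changed_paths: tuple[str, ...]
    checks_run: tuple[str, ...]
    claimed_passing: tuple[str, ...]


def required_checks(changed_paths: tuple[str, ...]) -> set[str]:
    """Assumed policy: docs-only changes need the docs build; others also need the full check."""
    needed = {"make docs"}
    if any(not p.startswith("docs/") and not p.endswith(".md") for p in changed_paths):
        needed.add("make check")
    return needed


def grade(handoff: Handoff) -> list[str]:
    """Return eval failures; an empty list means the handoff passes."""
    failures = []
    for check in sorted(required_checks(handoff.changed_paths) - set(handoff.checks_run)):
        failures.append(f"missing required check: {check}")
    for claim in handoff.claimed_passing:
        if claim not in handoff.checks_run:
            failures.append(f"unverified claim: {claim} passed")
    return failures
```

A fixture directory could store one Handoff per scenario with its expected failure list, which keeps the eval reproducible without a model in the loop.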
Mechanical Checks
Extend existing documentation tooling to check:
- every proposal in docs/proposals/ is present in docs/proposals/index.md or an explicit archive section;
- every proposal linked in docs/SUMMARY.md exists;
- every proposal with topics appears in docs/topics.md after generation;
- superseded_by points to an existing file;
- superseded proposals display a replacement link near the top;
- WORKPLAN.md selected milestone has a matching backlog pointer;
- run-target inventory entries point to existing Make targets and manifests;
- research-backed proposals link at least one docs/research/*.md note;
- external source snapshots in research notes include a review date;
- QEMU proof claims name a target;
- active-work registry entries point to existing branches/worktrees when local;
- no two active registry entries claim the same exclusive resource unless one is marked blocked, abandoned, or waiting for merge.
These checks should start warning-only if needed, then become required once the metadata is in place.
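The warning-only rollout can be illustrated with the first check in the list. This sketch assumes the index uses ordinary Markdown inline links and takes file names and index text as arguments so it stays independent of the filesystem; unindexed_proposals and exit_code are hypothetical names.

```python
import re
from pathlib import PurePosixPath

# Captures the target of a Markdown inline link, without any #fragment.
LINK_TARGET = re.compile(r"\]\(([^)\s#]+)")


def unindexed_proposals(proposal_names: list[str], index_text: str) -> list[str]:
    """Return proposal files that the index never links to."""
    linked = {PurePosixPath(target).name for target in LINK_TARGET.findall(index_text)}
    return sorted(n for n in proposal_names if n != "index.md" and n not in linked)


def exit_code(findings: list[str], required: bool) -> int:
    """Report findings; fail the build only once the check is marked required."""
    level = "error" if required else "warning"
    for name in findings:
        print(f"{level}: {name} is not listed in the proposal index")
    return 1 if (required and findings) else 0
```

Flipping required from False to True is then the only change needed when a check graduates from advisory to mandatory.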
Workflow Impact
For agents:
- start at docs/agent-harness.md;
- read selected milestone state through stable headings or generated sidecar;
- inspect active-work/resource claims before choosing or mutating shared files;
- follow proposal relationship metadata to avoid stale design;
- choose checks from run-target inventory;
- update docs/status through mechanically checked indexes;
- hand off with proof target names and transcript artifacts.
For humans:
- less repeated explanation of repo rules;
- easier review of whether an agent chose the right context;
- clearer detection of stale docs;
- explicit locations for “why did we change direction?” records.
Implementation Phases
Phase 1 - Map and Inventory
- Add docs/agent-harness.md.
- Add initial docs/run-targets.md by hand for major run targets.
- Link both from docs/SUMMARY.md, docs/topics.md, and README.md.
- Add a short section in WORKPLAN.md pointing future agents to the harness map.
Phase 2 - Metadata and Checks
- Standardize front matter for proposals and research notes.
- Extend mdBook metadata tooling to validate proposal index, topic membership, summary links, status fields, and supersession links.
- Add run-target inventory validation against Makefile and manifest paths.
Phase 3 - Decision Records
- Add docs/decisions/ and initial pivot records for the session-bound invocation context change and hosted-agent split.
- Link decisions from affected proposals and backlog files.
Phase 4 - Compiled Agent Wiki
- Create a reviewed docs/agent-wiki/ seed for the current selected milestone.
- Add lint for stale links, missing citations, and “compiled, not authority” labels.
- Decide whether generated wiki pages are published in mdBook or kept as repo-internal harness files.
Phase 5 - Agent Workflow Evals
- Add fixtures and scripts for repository-workflow evals.
- Run them in a docs/check target.
- Use failures to improve docs/agent-harness.md, metadata, and run-target inventory.
Open Questions
- Should proposal relationship metadata live only in front matter, or should there be a generated JSON sidecar for fast agent/tool consumption?
- Should docs/agent-wiki/ be generated on demand or checked in after review?
- How much QEMU transcript output should be retained as proof artifacts without bloating the repository?
- Should run-target metadata live in Makefile comments, a CUE file, or docs/run-targets.md front matter blocks?
- How strict should the first status linter be, given existing historical docs?
- Should agent evals be part of make docs, a separate make agent-harness-check, or a broader make check?
Relationship to Existing Documents
- Hosted agent harnesses research records the external harness research and the initial checklist.
- capOS-Hosted Agent Swarms uses this repo harness as precedent for future capOS-hosted agents.
- mdBook Documentation Site owns public docs structure and status vocabulary; this proposal adds agent-legibility and mechanical checks on top.
- CLAUDE.md, AGENTS.md, WORKPLAN.md, and REVIEW_FINDINGS.md remain authoritative workflow inputs. docs/agent-harness.md should route to them, not replace them.