Proposal: capOS Repository Harness Engineering

This proposal applies OpenAI-style harness engineering to the capOS repository itself. The goal is not to add agent features to the operating system. The goal is to make this repository a better, safer work environment for long-running agents and human reviewers.

The related capOS-Hosted Agent Swarms proposal describes capOS as a future host for OpenClaw-like agent services. This proposal describes the repository infrastructure needed so agents can work on capOS without repeatedly rediscovering project state, extending superseded designs, choosing the wrong QEMU proof, or silently drifting documentation.

Why This Proposal Exists

The capOS repo is already heavily agent-shaped:

  • AGENTS.md and CLAUDE.md define workflow rules.
  • WORKPLAN.md selects the current milestone and immediate gates.
  • REVIEW_FINDINGS.md records open remediation work.
  • docs/proposals/, docs/backlog/, and docs/research/ hold design context.
  • docs/topics.md, docs/SUMMARY.md, and proposal indexes make docs navigable.
  • Make targets and QEMU harnesses prove behavior.
  • CUE manifests define focused system configurations.

That is enough for a careful agent to work, but it is not yet a complete harness. Too much project state still requires fragile human-style inference: which document is authoritative, which proposal is stale, which run target proves which behavior, which open finding blocks a task, and which design pivot explains why old text should not be extended.

OpenAI’s harness engineering lesson is direct: what an agent cannot inspect in its working context effectively does not exist. capOS should therefore compile its project state into repo-local, versioned, mechanically checked artifacts.

Scope

In scope:

  • agent-facing repository map;
  • task-selection and milestone state;
  • proposal/research/status consistency checks;
  • run-target and QEMU proof inventory;
  • machine-readable design relationships;
  • agent-maintained but reviewed knowledge compilation;
  • deterministic evals for future coding agents;
  • active-work and shared-resource visibility;
  • review and security handoff artifacts.

Out of scope:

  • capOS-hosted agent runtime implementation;
  • model provider selection;
  • browser, MCP, or A2A runtime integration;
  • replacing human review;
  • changing the current mandatory worktree workflow.

Design Principles

  1. Repository-local context wins. Important design and workflow state should live in tracked files, not in chat history or operator memory.

  2. Indexes are harness inputs. docs/topics.md, docs/SUMMARY.md, proposal indexes, backlog pointers, and run-target tables are not cosmetic; they are how agents find the right context.

  3. Status must be checkable. Proposal status, supersession, implementation status, selected milestone, and review findings should fail checks when they drift.

  4. Proofs need names and ownership. A QEMU harness target should say what it proves, which manifest it uses, which proposal/backlog owns it, and what transcript shape is expected.

  5. Compiled knowledge is non-authoritative until reviewed. Agent-generated wiki pages can help navigation, but proposals, architecture docs, schemas, code, and review findings remain authoritative.

  6. Prefer generation over duplicate hand-maintained state. When possible, sidecars and indexes should be generated from front matter, Makefile metadata, manifests, or explicit source files.

  7. Expose replacement paths. If a proposal is superseded, an agent should see the replacement before acting on stale text.

  8. Make unsafe shortcuts hard. The harness should steer agents away from main-worktree edits, stale branches, missing review, unverified QEMU claims, and undocumented design pivots.

  9. Agents must know when they are not alone. Shared resources such as git branches, worktrees, docs indexes, task lists, generated files, and review queues need visible ownership, lease, and version state before agents mutate them.

Proposed Artifacts

docs/agent-harness.md

A concise entry point for future agents. It should answer:

  • where current project state lives;
  • how to choose a task;
  • how to create a compliant worktree;
  • how to find relevant proposals, backlog, research, and review findings;
  • how to choose checks;
  • how to handle docs/status updates;
  • how to hand off verification and review.

This file should link to authoritative docs rather than duplicate them. It is a map, not a new policy source.

docs/run-targets.md

Generated or maintained inventory of run/check targets:

| Target | Manifest | Purpose | Expected proof | Owner |
| --- | --- | --- | --- | --- |
| make run-session-context | system-session-context.cue | one immutable session context proof | hostile second-session attempts fail closed | session-bound invocation context |
| make run-chat | system-chat.cue | resident chat service proof | session-scoped chat transcript | chat/shared-service proposal |

The table should cover make run-*, make qemu-*, docs checks, generated-code checks, and security checks. Agents should not infer target meaning from target names alone.
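
As a rough illustration of how such an inventory could be validated, the sketch below checks inventory rows against the targets found in a Makefile and a set of known manifest names. The names `RunTarget`, `makefile_targets`, and `check_inventory` are hypothetical, and the regular expression is a simplification that only recognizes plain `name:` rules at the start of a line; it is not the project's actual tooling.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RunTarget:
    """One inventory row (hypothetical shape mirroring the table columns)."""
    target: str          # as written in the table, e.g. "make run-chat"
    manifest: str
    purpose: str
    expected_proof: str
    owner: str


# Simplified rule matcher: a name at column 0 followed by ":" but not ":=".
# Special targets such as .PHONY and indented recipe lines are skipped.
_TARGET_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9_.-]*)\s*:(?!=)", re.MULTILINE)


def makefile_targets(makefile_text: str) -> set:
    """Return the rule names that appear in the Makefile text."""
    return set(_TARGET_RE.findall(makefile_text))


def check_inventory(rows, makefile_text: str, known_manifests) -> list:
    """Return human-readable problems; an empty list means the inventory is consistent."""
    targets = makefile_targets(makefile_text)
    problems = []
    for row in rows:
        name = row.target.removeprefix("make ").strip()
        if name not in targets:
            problems.append(f"{row.target}: no such Make target")
        if row.manifest not in known_manifests:
            problems.append(f"{row.target}: manifest {row.manifest} not found")
        for field in ("purpose", "expected_proof", "owner"):
            if not getattr(row, field).strip():
                problems.append(f"{row.target}: empty {field}")
    return problems
```

Returning a list of problems rather than raising keeps the check usable in both a warning-only and a required mode.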

Active Work Registry

Add a small generated or reviewed active-work registry for concurrent agents. It should be derived from git worktrees where possible and supplemented by task metadata:

| Task | Branch | Worktree | Claimed resources | Mode | Expires | Status |
| --- | --- | --- | --- | --- | --- | --- |
| example-session-model | feat/session-model-proof | /home/ei-grad/capos-worktrees/session-model-proof | src/capos/service.rs, docs/proposals/session-context.md | exclusive source, shared docs | 2026-05-01 | checking |

The registry is not a replacement for git or human review. It is a harness surface for “another agent is already touching this shared resource.” The row above is synthetic sample data, not live project state.
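
One possible way to derive the branch and worktree columns is to parse the output of `git worktree list --porcelain`, which prints one blank-line-separated record per worktree. The sketch below parses that text into records; the `Worktree` type, the `parse_worktrees` helper, and the sample paths and hashes are illustrative assumptions, not live project state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Worktree:
    path: str
    head: Optional[str]     # commit hash, None for a bare repository entry
    branch: Optional[str]   # short branch name, None when detached or bare


def parse_worktrees(porcelain: str) -> list:
    """Parse `git worktree list --porcelain` text into Worktree records."""
    worktrees = []
    for block in porcelain.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            # Each line is "label value"; boolean labels such as "detached" have no value.
            key, _, value = line.partition(" ")
            fields[key] = value
        if "worktree" not in fields:
            continue
        branch = fields.get("branch")
        if branch is not None:
            branch = branch.removeprefix("refs/heads/")
        worktrees.append(Worktree(fields["worktree"], fields.get("HEAD"), branch))
    return worktrees


# Synthetic sample input (hypothetical paths and hashes).
SAMPLE = """\
worktree /srv/capos
HEAD 1111111111111111111111111111111111111111
branch refs/heads/main

worktree /srv/capos-worktrees/session-model-proof
HEAD 2222222222222222222222222222222222222222
branch refs/heads/feat/session-model-proof

worktree /srv/capos-worktrees/scratch
HEAD 3333333333333333333333333333333333333333
detached
"""

for wt in parse_worktrees(SAMPLE):
    print(wt.path, wt.branch or "(detached)")
```

In a real tool the text would come from running the git command in the repository; task metadata such as claimed resources and lease expiry would still have to be supplied separately.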

Minimum fields:

  • task or issue id;
  • owner identity or runner id;
  • branch and worktree path;
  • claimed paths, subsystems, generated outputs, todo items, or review queues;
  • exclusive/shared mode;
  • observed base revision;
  • lease expiry and renewal time;
  • status: planning, editing, checking, review, merge, blocked, abandoned.

For the current repo workflow, this would make the existing worktree policy queryable. For a future capOS-hosted swarm, the same shape becomes a SharedResource/ResourceLease service: git repos, shared todo items, wiki pages, generated docs, and merge queues all get visible claims and versioned writes.
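
To make the claim shape concrete, here is a minimal sketch of a registry entry carrying the minimum fields above, together with a conflict check. It assumes a per-resource mode of either "exclusive" or "shared", and it treats expired leases and the blocked and abandoned statuses as non-conflicting; the `Claim` and `find_conflicts` names and those policy choices are assumptions for illustration, not a settled design.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Statuses assumed not to hold a live claim (a policy choice for this sketch).
INACTIVE_STATUSES = {"blocked", "abandoned"}


@dataclass
class Claim:
    task: str
    owner: str
    branch: str
    worktree: str
    resources: dict          # resource path or name -> "exclusive" or "shared"
    base_revision: str
    lease_expires: datetime
    status: str              # planning, editing, checking, review, merge, blocked, abandoned


def is_live(claim: Claim, now: datetime) -> bool:
    """A claim counts only while its lease is unexpired and its status is active."""
    return claim.status not in INACTIVE_STATUSES and claim.lease_expires > now


def find_conflicts(claims, now: datetime) -> list:
    """Return (task_a, task_b, resource) for every contested resource.

    Two live claims conflict on a resource when both name it and at least
    one of them holds it exclusively.
    """
    live = [c for c in claims if is_live(c, now)]
    conflicts = []
    for i, a in enumerate(live):
        for b in live[i + 1:]:
            for res in sorted(a.resources.keys() & b.resources.keys()):
                if "exclusive" in (a.resources[res], b.resources[res]):
                    conflicts.append((a.task, b.task, res))
    return conflicts
```

Two agents that both hold a shared claim on a docs file would pass this check, while a second claim on an exclusively held source file would be reported before either agent edits it.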

Proposal Relationship Metadata

Add or standardize front matter fields:

status: "Future design. No implementation."
last_reviewed: "2026-04-28 00:00 UTC"
supersedes:
  - old-proposal.md
superseded_by: new-proposal.md
implemented_by:
  - commit-or-target
owned_backlog: docs/backlog/example.md
proof_targets:
  - make run-example

The exact schema can be narrower at first. The important requirement is that replacement and proof relationships become queryable.
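
As one illustration of what "queryable" could mean, the sketch below works on front matter that has already been parsed into dictionaries keyed by proposal filename. It follows `superseded_by` links to the current replacement and reports dangling links. The function names are hypothetical, and the back-link check between `supersedes` and `superseded_by` is an extra assumption that the proposal does not require.

```python
def resolve_replacement(name: str, meta: dict) -> str:
    """Follow superseded_by links from `name` to the proposal that is still current."""
    seen = [name]
    while (nxt := meta[name].get("superseded_by")) is not None:
        if nxt not in meta:
            raise ValueError(f"{name}: superseded_by points to missing {nxt}")
        if nxt in seen:
            raise ValueError("supersession cycle: " + " -> ".join(seen + [nxt]))
        seen.append(nxt)
        name = nxt
    return name


def check_supersession(meta: dict) -> list:
    """Return problems with supersession metadata; an empty list means it is consistent."""
    problems = []
    for name, front_matter in meta.items():
        target = front_matter.get("superseded_by")
        if target is not None and target not in meta:
            problems.append(f"{name}: superseded_by points to missing {target}")
        for old in front_matter.get("supersedes", []):
            if old not in meta:
                problems.append(f"{name}: supersedes missing {old}")
            elif meta[old].get("superseded_by") != name:
                # Assumed extra rule: the old proposal should link forward.
                problems.append(f"{old}: no superseded_by back-link to {name}")
    return problems


# Hypothetical parsed front matter, keyed by filename.
proposals = {
    "old-proposal.md": {"superseded_by": "new-proposal.md"},
    "new-proposal.md": {"supersedes": ["old-proposal.md"]},
}
print(resolve_replacement("old-proposal.md", proposals))
print(check_supersession(proposals))
```

An agent-facing tool could call the resolver before opening any proposal, so that stale text is never the first thing an agent reads.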

Design Pivot Records

Add short ADR-style files under docs/decisions/ for high-impact pivots:

  • endpoint badges as service identity rejected;
  • service-object capabilities superseded by session-bound invocation context;
  • SSH work paused behind session-bound invocation context;
  • hosted agents split from shell agent mode.

Each record should state context, decision, consequences, superseded docs, and current replacement docs.

docs/agent-wiki/

A generated or agent-maintained compiled knowledge tree:

  • index.md: current topic map;
  • capability-model.md: current “interface is permission” model;
  • session-model.md: selected session-bound invocation context summary;
  • shell-and-remote-access.md: shell, Telnet, SSH, WebShellGateway status;
  • qemu-proofs.md: proof target summaries;
  • open-findings.md: current review findings summarized with links.

This tree must be clearly labeled as compiled navigation, not authority. It can be hidden from public docs until reviewed.

Agent Evals

Add deterministic repository-workflow evals:

  • identify selected milestone from WORKPLAN.md;
  • find the relevant backlog and proposal;
  • reject editing the main worktree;
  • detect another active worker claiming the same exclusive path or generated output;
  • choose a non-overlapping task or wait when a shared resource is already leased;
  • identify required checks for a doc-only proposal change;
  • detect a superseded proposal and follow replacement;
  • update proposal index and summary when adding a proposal;
  • avoid claiming full tests passed when only docs built;
  • surface open review findings before unrelated feature work.

These evals can start as scripted fixtures. They do not need live model calls.
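
A scripted fixture can be as small as a named scenario, an expected decision, and a grader that compares recorded agent decisions against it. The sketch below shows that shape; the case names, the decision vocabulary ("refuse", "docs-checks", "follow-replacement", "wait"), and the `grade` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    name: str
    scenario: str    # short description of the fixture state shown to the agent
    expected: str    # the decision a compliant agent should record


# Hypothetical fixtures drawn from the eval list above.
CASES = [
    EvalCase("main-worktree-edit", "task requests an edit inside the main worktree", "refuse"),
    EvalCase("doc-only-change", "only a file under docs/proposals/ changed", "docs-checks"),
    EvalCase("superseded-proposal", "target proposal has superseded_by set", "follow-replacement"),
    EvalCase("leased-resource", "another worker holds an exclusive lease on the path", "wait"),
]


def grade(cases, recorded: dict) -> list:
    """Return the names of failed cases.

    `recorded` maps case name to the decision the agent logged. A missing
    answer counts as a failure, so skipped cases cannot pass silently.
    """
    return [case.name for case in cases if recorded.get(case.name) != case.expected]


# A recorded run with one wrong answer and one missing answer.
run = {
    "main-worktree-edit": "refuse",
    "doc-only-change": "full-checks",
    "superseded-proposal": "follow-replacement",
}
print(grade(CASES, run))
```

Because grading only compares recorded strings, the same fixtures can score a scripted stand-in today and a real agent transcript later without any change to the grader.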

Mechanical Checks

Extend existing documentation tooling to check:

  • every proposal in docs/proposals/ is present in docs/proposals/index.md or an explicit archive section;
  • every proposal linked in docs/SUMMARY.md exists;
  • every proposal with topics appears in docs/topics.md after generation;
  • superseded_by points to an existing file;
  • superseded proposals display a replacement link near the top;
  • WORKPLAN.md selected milestone has a matching backlog pointer;
  • run-target inventory entries point to existing Make targets and manifests;
  • research-backed proposals link at least one docs/research/*.md note;
  • external source snapshots in research notes include a review date;
  • QEMU proof claims name a target;
  • active-work registry entries point to existing branches/worktrees when local;
  • no two active registry entries claim the same exclusive resource unless one is marked blocked, abandoned, or waiting for merge.

These checks should start warning-only if needed, then become required once the metadata is in place.
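
The warning-first rollout could be modeled as a per-check `required` flag that decides whether a finding fails the run. The sketch below pairs that runner with one of the checks listed above (every proposal appears in the index). The names `Check`, `run_checks`, and `unindexed_proposals` are hypothetical, and the substring match against the index text is a deliberate simplification.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Check:
    name: str
    run: Callable[[], list]   # returns a list of problem strings
    required: bool = False    # warning-only until the metadata is in place


def run_checks(checks) -> tuple:
    """Run all checks; return (exit_code, report_lines).

    Findings from warning-only checks are reported but leave the exit code at 0.
    """
    exit_code = 0
    lines = []
    for check in checks:
        for problem in check.run():
            level = "error" if check.required else "warning"
            lines.append(f"{level}: {check.name}: {problem}")
            if check.required:
                exit_code = 1
    return exit_code, lines


def unindexed_proposals(proposal_files, index_text: str) -> list:
    """Report proposal files that the index text never mentions (crude substring match)."""
    return [
        f"{name} is not linked from the proposal index"
        for name in sorted(proposal_files)
        if name != "index.md" and name not in index_text
    ]


# Hypothetical inputs: file names under docs/proposals/ and the index body.
files = {"index.md", "session-context.md", "agent-harness.md"}
index = "- [Session context](session-context.md)\n"

checks = [Check("proposal-index", lambda: unindexed_proposals(files, index))]
code, report = run_checks(checks)
print(code, report)
```

Promoting a check then becomes a one-line change from `required=False` to `required=True` once existing documents pass it.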

Workflow Impact

For agents:

  • start at docs/agent-harness.md;
  • read selected milestone state through stable headings or generated sidecar;
  • inspect active-work/resource claims before choosing or mutating shared files;
  • follow proposal relationship metadata to avoid stale design;
  • choose checks from run-target inventory;
  • update docs/status through mechanically checked indexes;
  • hand off with proof target names and transcript artifacts.

For humans:

  • less repeated explanation of repo rules;
  • easier review of whether an agent chose the right context;
  • clearer detection of stale docs;
  • explicit locations for “why did we change direction?” records.

Implementation Phases

Phase 1 - Map and Inventory

  • Add docs/agent-harness.md.
  • Add initial docs/run-targets.md by hand for major run targets.
  • Link both from docs/SUMMARY.md, docs/topics.md, and README.md.
  • Add a short section in WORKPLAN.md pointing future agents to the harness map.

Phase 2 - Metadata and Checks

  • Standardize front matter for proposals and research notes.
  • Extend mdBook metadata tooling to validate proposal index, topic membership, summary links, status fields, and supersession links.
  • Add run-target inventory validation against Makefile and manifest paths.

Phase 3 - Decision Records

  • Add docs/decisions/ and initial pivot records for the session-bound invocation context change and hosted-agent split.
  • Link decisions from affected proposals and backlog files.

Phase 4 - Compiled Agent Wiki

  • Create a reviewed docs/agent-wiki/ seed for the current selected milestone.
  • Add lint for stale links, missing citations, and “compiled, not authority” labels.
  • Decide whether generated wiki pages are published in mdBook or kept as repo-internal harness files.

Phase 5 - Agent Workflow Evals

  • Add fixtures and scripts for repository-workflow evals.
  • Run them in a docs/check target.
  • Use failures to improve docs/agent-harness.md, metadata, and run-target inventory.

Open Questions

  • Should proposal relationship metadata live only in front matter, or should there be a generated JSON sidecar for fast agent/tool consumption?
  • Should docs/agent-wiki/ be generated on demand or checked in after review?
  • How much QEMU transcript output should be retained as proof artifacts without bloating the repository?
  • Should run-target metadata live in Makefile comments, a CUE file, or docs/run-targets.md front matter blocks?
  • How strict should the first status linter be, given existing historical docs?
  • Should agent evals be part of make docs, a separate make agent-harness-check, or a broader make check?

Relationship to Existing Documents

  • Hosted agent harnesses research records the external harness research and the initial checklist.
  • capOS-Hosted Agent Swarms uses this repo harness as precedent for future capOS-hosted agents.
  • mdBook Documentation Site owns public docs structure and status vocabulary; this proposal adds agent-legibility and mechanical checks on top.
  • CLAUDE.md, AGENTS.md, WORKPLAN.md, and REVIEW_FINDINGS.md remain authoritative workflow inputs. docs/agent-harness.md should route to them, not replace them.