# Proposal: capOS Repository Harness Engineering

This proposal applies OpenAI-style harness engineering to the capOS repository
itself. The goal is not to add agent features to the operating system. The goal
is to make this repository a better, safer work environment for long-running
agents and human reviewers.

The related [capOS-Hosted Agent Swarms](hosted-agent-swarm-proposal.md)
proposal describes capOS as a future host for OpenClaw-like agent services.
This proposal describes the repository infrastructure needed so agents can work
on capOS without repeatedly rediscovering project state, extending superseded
designs, choosing the wrong QEMU proof, or silently drifting documentation.

## Why This Proposal Exists

The capOS repo is already heavily agent-shaped:

- `AGENTS.md` and `CLAUDE.md` define workflow rules.
- `WORKPLAN.md` selects the current milestone and immediate gates.
- `REVIEW_FINDINGS.md` records open remediation work.
- `docs/proposals/`, `docs/backlog/`, and `docs/research/` hold design
  context.
- `docs/topics.md`, `docs/SUMMARY.md`, and proposal indexes make docs
  navigable.
- Make targets and QEMU harnesses prove behavior.
- CUE manifests define focused system configurations.

That is enough for a careful agent to work, but it is not yet a complete
harness. Too much project state still requires fragile human-style inference:
which document is authoritative, which proposal is stale, which run target
proves which behavior, which open finding blocks a task, and which design pivot
explains why old text should not be extended.

OpenAI's harness engineering lesson is direct: what an agent cannot inspect in
its working context effectively does not exist. capOS should therefore compile
its project state into repo-local, versioned, mechanically checked artifacts.

## Scope

In scope:

- agent-facing repository map;
- task-selection and milestone state;
- proposal/research/status consistency checks;
- run-target and QEMU proof inventory;
- machine-readable design relationships;
- agent-maintained but reviewed knowledge compilation;
- deterministic evals for future coding agents;
- active-work and shared-resource visibility;
- review and security handoff artifacts.

Out of scope:

- capOS-hosted agent runtime implementation;
- model provider selection;
- browser, MCP, or A2A runtime integration;
- replacing human review;
- changing the current mandatory worktree workflow.

## Design Principles

1. **Repository-local context wins.** Important design and workflow state should
   live in tracked files, not in chat history or operator memory.

2. **Indexes are harness inputs.** `docs/topics.md`, `docs/SUMMARY.md`,
   proposal indexes, backlog pointers, and run-target tables are not cosmetic;
   they are how agents find the right context.

3. **Status must be checkable.** Proposal status, supersession, implementation
   status, selected milestone, and review findings should fail checks when they
   drift.

4. **Proofs need names and ownership.** A QEMU harness target should say what
   it proves, which manifest it uses, which proposal/backlog owns it, and what
   transcript shape is expected.

5. **Compiled knowledge is non-authoritative until reviewed.** Agent-generated
   wiki pages can help navigation, but proposals, architecture docs, schemas,
   code, and review findings remain authoritative.

6. **Prefer generation over duplicate hand-maintained state.** When possible,
   sidecars and indexes should be generated from front matter, Makefile
   metadata, manifests, or explicit source files.

7. **Expose replacement paths.** If a proposal is superseded, an agent should
   see the replacement before acting on stale text.

8. **Make unsafe shortcuts hard.** The harness should steer agents away from
   main-worktree edits, stale branches, missing review, unverified QEMU claims,
   and undocumented design pivots.

9. **Agents must know when they are not alone.** Shared resources such as git
   branches, worktrees, docs indexes, task lists, generated files, and review
   queues need visible ownership, lease, and version state before agents mutate
   them.

## Proposed Artifacts

### `docs/agent-harness.md`

A concise entry point for future agents. It should answer:

- where current project state lives;
- how to choose a task;
- how to create a compliant worktree;
- how to find relevant proposals, backlog, research, and review findings;
- how to choose checks;
- how to handle docs/status updates;
- how to hand off verification and review.

This file should link to authoritative docs rather than duplicate them. It is a
map, not a new policy source.

### `docs/run-targets.md`

A generated or hand-maintained inventory of run and check targets:

| Target | Manifest | Purpose | Expected proof | Owner |
| --- | --- | --- | --- | --- |
| `make run-session-context` | `system-session-context.cue` | one immutable session context proof | hostile second-session attempts fail closed | session-bound invocation context |
| `make run-chat` | `system-chat.cue` | resident chat service proof | session-scoped chat transcript | chat/shared-service proposal |

The table should cover `make run-*`, `make qemu-*`, docs checks,
generated-code checks, and security checks. Agents should not infer target
meaning from target names alone.
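
A minimal sketch of how such an inventory could be validated, assuming the table keeps the column names shown above and that target names can be recovered from explicit rule lines in the Makefile. The function names and the simplified Makefile parsing are illustrative assumptions, not existing capOS tooling; pattern rules and included makefiles are deliberately ignored.

```python
import re


def makefile_targets(makefile_text: str) -> set[str]:
    """Collect explicit rule names from Makefile text (simplified parser)."""
    targets: set[str] = set()
    for line in makefile_text.splitlines():
        # Rule lines start in column 0; ":=" assignments and special
        # targets such as ".PHONY" are skipped on purpose.
        match = re.match(r"^([A-Za-z0-9_][A-Za-z0-9_.\- ]*):(?!=)", line)
        if match:
            targets.update(match.group(1).split())
    return targets


def inventory_rows(table_text: str) -> list[dict[str, str]]:
    """Parse a pipe-delimited Markdown table into one dict per data row."""
    lines = [ln.strip() for ln in table_text.splitlines()
             if ln.strip().startswith("|")]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[2:]:  # skip the header and the "---" separator row
        cells = [cell.strip().strip("`") for cell in line.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def check_inventory(table_text: str, makefile_text: str,
                    manifest_names: set[str]) -> list[str]:
    """Report inventory rows whose target or manifest does not exist."""
    targets = makefile_targets(makefile_text)
    problems = []
    for row in inventory_rows(table_text):
        name = row["Target"].removeprefix("make ").strip()
        if name not in targets:
            problems.append(f"unknown Make target: {name}")
        if row["Manifest"] not in manifest_names:
            problems.append(f"missing manifest: {row['Manifest']}")
    return problems
```

An empty result means every row points at a real target and manifest. A caller would pass the contents of `docs/run-targets.md`, the Makefile text, and the set of manifest file names found on disk.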

### Active Work Registry

Add a small generated or reviewed active-work registry for concurrent agents.
It should be derived from git worktrees where possible and supplemented by
task metadata:

| Task | Branch | Worktree | Claimed resources | Mode | Expires | Status |
| --- | --- | --- | --- | --- | --- | --- |
| example-session-model | `feat/session-model-proof` | `/home/ei-grad/capos-worktrees/session-model-proof` | `src/capos/service.rs`, `docs/proposals/session-context.md` | exclusive source, shared docs | 2026-05-01 | checking |

The registry is not a replacement for git or human review. It is a harness
surface for "another agent is already touching this shared resource." The row
above is synthetic sample data, not live project state.

Minimum fields:

- task or issue id;
- owner identity or runner id;
- branch and worktree path;
- claimed paths, subsystems, generated outputs, todo items, or review queues;
- exclusive/shared mode;
- observed base revision;
- lease expiry and renewal time;
- status: planning, editing, checking, review, merge, blocked, abandoned.

For the current repo workflow, this would make the existing worktree policy
queryable. For a future capOS-hosted swarm, the same shape becomes a
`SharedResource`/`ResourceLease` service: git repos, shared todo items, wiki
pages, generated docs, and merge queues all get visible claims and versioned
writes.
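
The conflict rule implied by these fields can be sketched as follows. This is an illustration under stated assumptions: the `Claim` type and `exclusive_conflicts` function are hypothetical, resources are compared by exact string match rather than by path prefix, an expired lease is treated as inactive, and the `merge` status is read as "waiting for merge".

```python
from dataclasses import dataclass, field
from datetime import date

# Assumption: entries in these states no longer block other workers.
INACTIVE_STATUSES = {"blocked", "abandoned", "merge"}


@dataclass
class Claim:
    task: str
    status: str
    expires: date
    exclusive: set[str] = field(default_factory=set)


def exclusive_conflicts(claims: list[Claim],
                        today: date) -> list[tuple[str, str, str]]:
    """Return (task_a, task_b, resource) for overlapping live exclusive claims."""
    live = [c for c in claims
            if c.status not in INACTIVE_STATUSES and c.expires >= today]
    found = []
    for i, first in enumerate(live):
        for second in live[i + 1:]:
            # Exact-match comparison only; prefix or glob claims would
            # need a richer overlap test.
            for resource in sorted(first.exclusive & second.exclusive):
                found.append((first.task, second.task, resource))
    return found
```

An agent would run this against the registry before claiming a path, and either pick a non-overlapping task or wait when the result is non-empty.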

### Proposal Relationship Metadata

Add or standardize front matter fields:

```yaml
status: "Future design. No implementation."
last_reviewed: "2026-04-28 00:00 UTC"
supersedes:
  - old-proposal.md
superseded_by: new-proposal.md
implemented_by:
  - commit-or-target
owned_backlog: docs/backlog/example.md
proof_targets:
  - make run-example
```

The exact schema can be narrower at first. The important requirement is that
replacement and proof relationships become queryable.
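
As one example of what "queryable" could mean, the sketch below follows `superseded_by` links to the current replacement. It assumes front matter has already been parsed into a dictionary keyed by file name; the function name and the error behavior are illustrative choices, not part of any existing tooling.

```python
def resolve_replacement(start: str, front_matter: dict[str, dict]) -> str:
    """Follow superseded_by links from `start` to the current proposal."""
    seen = [start]
    current = start
    while True:
        replacement = front_matter.get(current, {}).get("superseded_by")
        if not replacement:
            return current  # no replacement recorded: this one is current
        if replacement not in front_matter:
            raise LookupError(
                f"{current}: superseded_by points to missing file {replacement}"
            )
        if replacement in seen:
            chain = " -> ".join(seen + [replacement])
            raise ValueError(f"supersession cycle: {chain}")
        seen.append(replacement)
        current = replacement
```

Raising on a missing file or a cycle, rather than returning a best guess, keeps an agent from acting on a broken chain.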

### Design Pivot Records

Add short ADR-style files under `docs/decisions/` for high-impact pivots:

- endpoint badges as service identity rejected;
- service-object capabilities superseded by session-bound invocation context;
- SSH work paused behind session-bound invocation context;
- hosted agents split from shell agent mode.

Each record should state context, decision, consequences, superseded docs, and
current replacement docs.

### `docs/agent-wiki/`

A generated or agent-maintained compiled knowledge tree:

- `index.md`: current topic map;
- `capability-model.md`: current "interface is permission" model;
- `session-model.md`: selected session-bound invocation context summary;
- `shell-and-remote-access.md`: shell, Telnet, SSH, WebShellGateway status;
- `qemu-proofs.md`: proof target summaries;
- `open-findings.md`: current review findings summarized with links.

This tree must be clearly labeled as compiled navigation, not authority. It can
be hidden from public docs until reviewed.

### Agent Evals

Add deterministic repository-workflow evals:

- identify selected milestone from `WORKPLAN.md`;
- find the relevant backlog and proposal;
- reject editing the main worktree;
- detect another active worker claiming the same exclusive path or generated
  output;
- choose a non-overlapping task or wait when a shared resource is already
  leased;
- identify required checks for a doc-only proposal change;
- detect a superseded proposal and follow replacement;
- update proposal index and summary when adding a proposal;
- avoid claiming full tests passed when only docs built;
- surface open review findings before unrelated feature work.

These evals can start as scripted fixtures. They do not need live model calls.
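
A scripted fixture of this kind might look like the sketch below, which grades a recorded handoff against two of the rules above. The `Handoff` shape, the phrase matching, and the assumption that `make docs` is the only docs-only check are all hypothetical and would need to match whatever the real harness records.

```python
from dataclasses import dataclass

# Assumption: checks that only prove the documentation builds.
DOCS_ONLY_CHECKS = {"make docs"}


@dataclass(frozen=True)
class Handoff:
    worktree_is_main: bool
    checks_run: tuple[str, ...]
    summary: str


def grade(handoff: Handoff) -> list[str]:
    """Return the eval failures for one recorded handoff (empty means pass)."""
    failures = []
    if handoff.worktree_is_main:
        failures.append("edited the main worktree")
    # True when nothing beyond docs checks ran, including no checks at all.
    only_docs = set(handoff.checks_run) <= DOCS_ONLY_CHECKS
    if only_docs and "all tests pass" in handoff.summary.lower():
        failures.append("claimed full tests passed when only docs built")
    return failures
```

Because the grader works on recorded data, the same fixtures can be replayed in CI without any model call.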

## Mechanical Checks

Extend existing documentation tooling to check:

- every proposal in `docs/proposals/` is present in `docs/proposals/index.md`
  or an explicit archive section;
- every proposal linked in `docs/SUMMARY.md` exists;
- every proposal with topics appears in `docs/topics.md` after generation;
- `superseded_by` points to an existing file;
- superseded proposals display a replacement link near the top;
- `WORKPLAN.md` selected milestone has a matching backlog pointer;
- run-target inventory entries point to existing Make targets and manifests;
- research-backed proposals link at least one `docs/research/*.md` note;
- external source snapshots in research notes include a review date;
- QEMU proof claims name a target;
- active-work registry entries point to existing branches/worktrees when local;
- no two active registry entries claim the same exclusive resource unless one
  is marked blocked, abandoned, or waiting for merge.

These checks should start warning-only if needed, then become required once the
metadata is in place.
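
The warning-only rollout can be sketched with the first check in the list. The example assumes the index links proposals with ordinary Markdown links; the function names and the `required` switch are illustrative, not an existing interface.

```python
import re
from pathlib import PurePosixPath


def unindexed_proposals(proposal_files: list[str], index_text: str) -> list[str]:
    """Return proposal file names that the index never links to."""
    # Capture the target of Markdown links ending in .md, ignoring anchors.
    link_targets = re.findall(r"\]\(([^)#]+\.md)(?:#[^)]*)?\)", index_text)
    linked_names = {PurePosixPath(target).name for target in link_targets}
    return sorted(name for name in proposal_files
                  if name != "index.md" and name not in linked_names)


def report(findings: list[str], required: bool) -> int:
    """Print findings and return a process exit code."""
    level = "error" if required else "warning"
    for name in findings:
        print(f"{level}: {name} is not listed in docs/proposals/index.md")
    # Warning-only mode never fails the build; required mode does.
    return 1 if findings and required else 0
```

Flipping `required` from `False` to `True` is then the only change needed once the existing documents have been brought into line.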

## Workflow Impact

For agents:

- start at `docs/agent-harness.md`;
- read selected milestone state through stable headings or a generated sidecar;
- inspect active-work/resource claims before choosing or mutating shared files;
- follow proposal relationship metadata to avoid stale design;
- choose checks from run-target inventory;
- update docs/status through mechanically checked indexes;
- hand off with proof target names and transcript artifacts.

For humans:

- less repeated explanation of repo rules;
- easier review of whether an agent chose the right context;
- clearer detection of stale docs;
- explicit locations for "why did we change direction?" records.

## Implementation Phases

### Phase 1 - Map and Inventory

- Add `docs/agent-harness.md`.
- Add an initial `docs/run-targets.md` by hand for major run targets.
- Link both from `docs/SUMMARY.md`, `docs/topics.md`, and `README.md`.
- Add a short section in `WORKPLAN.md` pointing future agents to the harness
  map.

### Phase 2 - Metadata and Checks

- Standardize front matter for proposals and research notes.
- Extend mdBook metadata tooling to validate proposal index, topic membership,
  summary links, status fields, and supersession links.
- Add run-target inventory validation against Makefile and manifest paths.

### Phase 3 - Decision Records

- Add `docs/decisions/` and initial pivot records for the session-bound
  invocation context change and hosted-agent split.
- Link decisions from affected proposals and backlog files.

### Phase 4 - Compiled Agent Wiki

- Create a reviewed `docs/agent-wiki/` seed for the current selected milestone.
- Add lint for stale links, missing citations, and "compiled, not authority"
  labels.
- Decide whether generated wiki pages are published in mdBook or kept as
  repo-internal harness files.

### Phase 5 - Agent Workflow Evals

- Add fixtures and scripts for repository-workflow evals.
- Run them from a docs or check Make target.
- Use failures to improve `docs/agent-harness.md`, metadata, and run-target
  inventory.

## Open Questions

- Should proposal relationship metadata live only in front matter, or should
  there be a generated JSON sidecar for fast agent/tool consumption?
- Should `docs/agent-wiki/` be generated on demand or checked in after review?
- How much QEMU transcript output should be retained as proof artifacts without
  bloating the repository?
- Should run-target metadata live in Makefile comments, a CUE file, or
  `docs/run-targets.md` front matter blocks?
- How strict should the first status linter be, given existing historical docs?
- Should agent evals be part of `make docs`, a separate
  `make agent-harness-check`, or a broader `make check`?

## Relationship to Existing Documents

- [Hosted agent harnesses research](../research/hosted-agent-harnesses.md)
  records the external harness research and the initial checklist.
- [capOS-Hosted Agent Swarms](hosted-agent-swarm-proposal.md) uses this repo
  harness as precedent for future capOS-hosted agents.
- [mdBook Documentation Site](mdbook-docs-site-proposal.md) owns public docs
  structure and status vocabulary; this proposal adds agent-legibility and
  mechanical checks on top.
- `CLAUDE.md`, `AGENTS.md`, `WORKPLAN.md`, and `REVIEW_FINDINGS.md` remain
  authoritative workflow inputs. `docs/agent-harness.md` should route to them,
  not replace them.
