# Benchmarks

capOS benchmark results are correctness-gated evidence, not standalone
performance claims. A benchmark run is useful only when its output verifier
passes, the authority profile matches the workload being measured, raw logs are
kept under `target/`, and the comparison environment is stated.

The general benchmark model is in
[System Performance Benchmarks](proposals/system-performance-benchmarks-proposal.md).
The latest accepted benchmark evidence is the completed Multi-Process SMP
Concurrency proof.

## Multi-Process SMP Concurrency

The completed milestone required `make run-smp-process-scale` to boot a
focused capOS manifest under QEMU/KVM, run independent CPU-bound worker
processes, and record at least `1.6x` median speedup from the `-smp 1`
one-worker case to the `-smp 2` two-worker case over repeated runs.
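
The gate arithmetic can be sketched as follows. This is a minimal illustration, not the harness code: it assumes the speedup is the ratio of the two medians (which is consistent with the result tables on this page), and the function names and sample values are hypothetical.

```python
from statistics import median

SPEEDUP_GATE = 1.6  # threshold stated in the milestone text


def median_speedup(smp1_runs, smp2_runs):
    """Ratio of the smp1 median to the smp2 median (assumed definition)."""
    return median(smp1_runs) / median(smp2_runs)


def passes_gate(smp1_runs, smp2_runs, verifier_ok, gate=SPEEDUP_GATE):
    """A run counts only if its output verifier passed and the gate is met."""
    if not verifier_ok:
        return False
    return median_speedup(smp1_runs, smp2_runs) >= gate


# Illustrative elapsed values only; these are not recorded capOS measurements.
smp1 = [1000, 1010, 990, 1005, 1002]
smp2 = [560, 555, 570, 558, 562]
print(f"{median_speedup(smp1, smp2):.3f}x", passes_gate(smp1, smp2, verifier_ok=True))
```

Checking the verifier before the timing mirrors the rule that benchmark results are correctness-gated: a fast run with wrong output is not evidence.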

Current capOS evidence passes the milestone speedup and smoke gates. Both
comparison rows report five runs of the primes `2..3_000_000` workload with
balanced contiguous splits. For both rows, QEMU ran under nested KVM and was
pinned to host CPUs `0,1,2,3`. The capOS row is a controlled `capos-bench`
comparison run retained for follow-up analysis, not a 4-core scaling claim; the
Linux row is retained as the matching guest baseline.

<!-- capos-benchmark-results:multi-process-smp start -->
| System | smp1 median | smp2 median | smp4 median | 1-to-2 speedup | 1-to-4 speedup |
|---|---:|---:|---:|---:|---:|
| capOS | 1,639 scaled cycles | 875 scaled cycles | 1,111 scaled cycles | 1.873x | 1.475x |
| Linux | 1,275,187,210 ns | 659,218,025 ns | 337,877,986 ns | 1.934x | 3.774x |
<!-- capos-benchmark-results:multi-process-smp end -->

The accepted local milestone closeout remains recorded in
`target/smp-process-scale/cycle-balanced-default/`; the comparison table above
keeps only the controlled `capos-bench` reruns. The elapsed unit is the
worker-side user-mode cycle counter shifted right by 20 bits, so the proof is
not quantized by the 100 Hz kernel tick. The `capos-bench` reruns were recorded
on GCE `n2-highcpu-8` in `europe-west3-b` at commit `0d89a91b`
(`2026-04-30 11:09 UTC`), using nested QEMU/KVM on an Ubuntu
`6.17.0-1012-gcp` host, QEMU `8.2.2`, Rust nightly `1.97.0-nightly`
(`c935696dd 2026-04-29`), and host CPU topology where logical CPUs `0,1,2,3`
are distinct physical cores with SMT siblings `4,5,6,7`. Raw capOS artifacts
are under
`target/smp-process-scale/capos-bench-pinned-20260430T1113Z/`; raw Linux
baseline artifacts are under
`target/linux-smp-process-scale/capos-bench-pinned-20260430T1118Z/`.

The capOS and Linux medians intentionally use different elapsed units. capOS
reports a worker-side user-mode cycle counter, scaled by shifting right 20
bits, because the current kernel tick is too coarse for this benchmark. The
Linux comparison reports guest `clock_gettime` elapsed nanoseconds. Do not
compare the absolute capOS cycle values with Linux nanoseconds; compare
speedup ratios within each system row.
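
As a worked example of why only the ratios are comparable, the sketch below scales a raw cycle count by the 20-bit shift and recomputes the speedups from the medians in the comparison table above. The helper names are hypothetical; the medians are copied from the table.

```python
SCALE_SHIFT = 20  # right shift applied to the worker-side cycle counter


def scale_cycles(raw_cycles: int) -> int:
    """Convert a raw cycle count to scaled cycles (units of 2**20 cycles)."""
    return raw_cycles >> SCALE_SHIFT


def speedup(baseline_median: float, case_median: float) -> float:
    """Unitless ratio; valid only within one system's own elapsed unit."""
    return baseline_median / case_median


# Medians from the Multi-Process SMP comparison table.
rows = {
    "capOS": {"smp1": 1_639, "smp2": 875, "smp4": 1_111},  # scaled cycles
    "Linux": {"smp1": 1_275_187_210, "smp2": 659_218_025, "smp4": 337_877_986},  # ns
}

for system, m in rows.items():
    s2 = speedup(m["smp1"], m["smp2"])
    s4 = speedup(m["smp1"], m["smp4"])
    print(f"{system}: 1-to-2 {s2:.3f}x, 1-to-4 {s4:.3f}x")
```

One scaled cycle covers `2**20` (1,048,576) raw cycles, so the unit cancels in each row's ratio, while dividing a capOS value by a Linux value would mix cycles with nanoseconds.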

The `capos-bench` 4-vCPU case is deliberately reported but not accepted as a
capOS scaling claim: `smp4` was faster than the 1-vCPU baseline but slower than
the 2-vCPU case. Linux continued scaling through 4 vCPUs under the same pinning
and workload. The current capOS milestone gate remains the repeated 1-to-2
speedup proof; explaining and improving the 4-vCPU behavior is follow-on SMP
scheduler/runtime work.

The GCE host exposes eight logical CPUs, but only four physical cores. An
8-vCPU row is therefore an SMT diagnostic, not 8-core evidence. The capOS and
Linux process-scale harnesses define a separately labeled opt-in `smp8-smt`
case with eight contiguous worker ranges over the same `2..3_000_000`
workload. It prints `workers=8` and `cpus=8` when run under `-smp 8`, but it is
informational on 4-core/8-thread hosts and is not part of the accepted 1-to-2
speedup threshold.

Informational `smp8-smt` evidence from `capos-bench` at commit `7c15dd47`
(`2026-04-30 11:45 UTC`) used nested QEMU/KVM pinned to logical CPUs
`0,1,2,3,4,5,6,7`. Both rows report five runs. The capOS row is an
informational SMT diagnostic with no 8-core claim; the Linux row is the
matching guest diagnostic baseline.

<!-- capos-benchmark-results:smp8-smt-medians start -->
| System | smp1 median | smp2 median | smp4 median | smp8-smt median |
|---|---:|---:|---:|---:|
| capOS | 1,500 scaled cycles | 787 scaled cycles | 1,052 scaled cycles | 1,595 scaled cycles |
| Linux | 1,274,507,854 ns | 647,611,418 ns | 337,479,795 ns | 198,903,231 ns |
<!-- capos-benchmark-results:smp8-smt-medians end -->

<!-- capos-benchmark-results:smp8-smt-speedups start -->
| System | 1-to-2 speedup | 1-to-4 speedup | 1-to-8 speedup |
|---|---:|---:|---:|
| capOS | 1.906x | 1.426x | 0.940x |
| Linux | 1.968x | 3.777x | 6.408x |
<!-- capos-benchmark-results:smp8-smt-speedups end -->

Raw capOS SMT artifacts are under
`target/smp-process-scale/capos-bench-smt8-20260430T1148Z/`; raw Linux SMT
artifacts are under
`target/linux-smp-process-scale/capos-bench-smt8-20260430T1151Z/`.

The prior Linux comparison for the earlier static-split workload recorded
`1.632x` 1-to-2 speedup and `2.62x` 1-to-4 speedup, but that row is historical
only because the workload partitioning changed.

Both SMP harnesses can pin QEMU to a host CPU set with `taskset`:
`CAPOS_SMP_SCALE_QEMU_TASKSET_CPUS=<cpus>` for capOS and
`LINUX_SMP_SCALE_QEMU_TASKSET_CPUS=<cpus>` for Linux. The summary logs record
the configured CPU set. Pinning QEMU is not the same as isolating CPUs from
other host work; publishable isolated-CPU runs must also document the host
isolation mechanism, such as boot-time `isolcpus`/`nohz_full`/`rcu_nocbs` or
a cpuset/systemd CPU-affinity policy. Pinning QEMU to fewer host logical CPUs
than the guest `-smp` count is functional-only oversubscription, not speedup
evidence; 4-vCPU claims require at least four suitable host CPUs, preferably
four physical cores.
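
The oversubscription rule can be checked mechanically before a run is treated as speedup evidence. The sketch below is a hypothetical pre-flight helper, not part of either harness. It assumes the taskset variable holds a comma-separated CPU list that may contain ranges such as `0-3`; it does not handle stride syntax, and it counts logical CPUs only, so the physical-core and SMT-sibling check still has to be done against the host topology.

```python
def parse_cpu_list(spec: str) -> set[int]:
    """Parse a CPU list such as "0,1" or "0-3,6" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def is_speedup_eligible(taskset_spec: str, guest_smp: int) -> bool:
    """False when the guest has more vCPUs than pinned host logical CPUs."""
    return len(parse_cpu_list(taskset_spec)) >= guest_smp


print(is_speedup_eligible("0,1,2,3", guest_smp=4))  # enough host CPUs
print(is_speedup_eligible("0,1", guest_smp=4))      # functional-only run
```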

The passing capOS run closes the focused speedup gate. The milestone closeout
also reran ordinary `run-smoke` and `run-spawn` coverage under `-smp 2`, with
logs in `target/smp2-smokes/`, covering the default manifest, ring, thread
lifecycle, park cleanup, generic child waits, and process exit.

The `capos-bench` rows are still nested-QEMU guest evidence, not proof that
capOS boots directly on cloud VM hardware. After capOS reaches a first real
cloud-VM boot, rerun the benchmark profiles that the booted hardware profile
supports. Cloud reruns must record provider, instance type, cloud image
identity, firmware/device model, CPU topology, and tool/provenance bundle
separately from local QEMU/KVM evidence.

## Next CPU-Scaling Workload

Prime counting was sufficient to prove that independent worker processes can
run on multiple CPUs, but it is not the best workload for future scaling
claims: trial-division cost grows with the candidate value, so equal-sized
candidate ranges do not represent equal work. The next CPU-scaling profile
should use fixed-size independent chunks, preferably a parallel hash/checksum
workload:

- each worker receives the same number of bytes or blocks;
- the timed region performs no syscalls and writes only a private result slot;
- the verifier combines per-chunk hashes into a deterministic root hash;
- the Linux comparison uses the same corpus, chunk size, and root-hash rule;
- 4-vCPU results are published only on hosts with at least four suitable host
  CPUs, preferably four physical cores or a documented cloud topology.
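
A minimal sketch of this workload shape follows. It is an assumption-laden illustration rather than a proposed harness: SHA-256, the 4096-byte chunk size, the root-hash rule (hashing the concatenated per-chunk digests in chunk order), and all function names are hypothetical choices. Workers run sequentially here; a real profile would run them as separate processes and time only the hashing loop.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes


def worker(corpus: bytes, first_chunk: int, n_chunks: int) -> list[bytes]:
    """Hash a contiguous run of fixed-size chunks into a private result list."""
    digests = []
    for idx in range(first_chunk, first_chunk + n_chunks):
        start = idx * CHUNK_SIZE
        digests.append(hashlib.sha256(corpus[start:start + CHUNK_SIZE]).digest())
    return digests


def root_hash(corpus: bytes, workers: int) -> str:
    """Combine per-chunk digests, in chunk order, into one root hash."""
    total_chunks, remainder = divmod(len(corpus), CHUNK_SIZE)
    if remainder or total_chunks % workers:
        raise ValueError("corpus must split into equal whole chunks per worker")
    per_worker = total_chunks // workers
    root = hashlib.sha256()
    for w in range(workers):
        # Each worker gets the same number of chunks, so the same byte count.
        for digest in worker(corpus, w * per_worker, per_worker):
            root.update(digest)
    return root.hexdigest()


corpus = bytes(i % 251 for i in range(CHUNK_SIZE * 8))  # deterministic test corpus
print(root_hash(corpus, 1) == root_hash(corpus, 4))
```

Because the root depends only on the corpus and chunk size, the same verifier value applies to every worker count and to the Linux comparison.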

This shape is more realistic than an artificial arithmetic loop because it
resembles real integrity workloads: content-addressed storage, package
verification, boot-package integrity, and artifact validation.

## Commands

Run the capOS proof:

```bash
make run-smp-process-scale
```

Run the capOS proof with QEMU pinned to one logical CPU from each of two
physical cores after confirming the host CPU topology:

```bash
CAPOS_SMP_SCALE_QEMU_TASKSET_CPUS=0,1 make run-smp-process-scale
```

Run the capOS proof with the optional 8-logical-CPU SMT diagnostic on a host
that reports at least eight logical CPUs, pinning QEMU to the same eight
logical CPUs when collecting controlled evidence:

```bash
CAPOS_SMP_SCALE_INCLUDE_SMT=1 \
  CAPOS_SMP_SCALE_QEMU_TASKSET_CPUS=0,1,2,3,4,5,6,7 \
  make run-smp-process-scale
```

Run the Linux comparison with a readable kernel image:

```bash
LINUX_SMP_SCALE_KERNEL=target/linux-smp-process-scale/kernel/vmlinuz \
  tools/linux-smp-process-scale-baseline.sh
```

Run the Linux comparison with QEMU pinned to the same host CPUs:

```bash
LINUX_SMP_SCALE_KERNEL=target/linux-smp-process-scale/kernel/vmlinuz \
  LINUX_SMP_SCALE_QEMU_TASKSET_CPUS=0,1 \
  tools/linux-smp-process-scale-baseline.sh
```

Run the matching Linux `smp8-smt` diagnostic:

```bash
LINUX_SMP_SCALE_KERNEL=target/linux-smp-process-scale/kernel/vmlinuz \
  LINUX_SMP_SCALE_INCLUDE_SMT=1 \
  LINUX_SMP_SCALE_QEMU_TASKSET_CPUS=0,1,2,3,4,5,6,7 \
  tools/linux-smp-process-scale-baseline.sh
```

On hosts where `/boot/vmlinuz` is not readable by the current user, copy a
kernel image into ignored `target/` storage first through the host's normal
administrative path, then pass it as `LINUX_SMP_SCALE_KERNEL`. The script does
not invoke `sudo` itself.

Both harnesses keep per-run QEMU logs and CSV summaries under `target/`.
Benchmark artifacts are intentionally ignored build outputs; publishable
results must be summarized in source documentation with the relevant commit,
workload, run count, QEMU/KVM envelope, and pass/fail status.
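
The required summary fields can be captured as a small record so that an incomplete summary is caught before publication. This is a hypothetical sketch: the `ResultSummary` type and `missing_fields` helper do not exist in the repository, and the example values are taken from the `capos-bench` evidence described on this page.

```python
from dataclasses import dataclass, fields


@dataclass
class ResultSummary:
    commit: str
    workload: str
    run_count: int
    envelope: str  # QEMU/KVM envelope: versions, nesting, pinning
    passed: bool


def missing_fields(summary: ResultSummary) -> list[str]:
    """Return the names of fields that are empty or not meaningful."""
    missing = []
    for f in fields(summary):
        value = getattr(summary, f.name)
        if value is None or value == "":
            missing.append(f.name)
        elif f.name == "run_count" and value <= 0:
            missing.append(f.name)
    return missing


summary = ResultSummary(
    commit="0d89a91b",
    workload="primes 2..3_000_000, balanced contiguous splits",
    run_count=5,
    envelope="nested QEMU/KVM, QEMU 8.2.2, pinned to host CPUs 0,1,2,3",
    passed=True,
)
print(missing_fields(summary))
```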