Benchmarks
capOS benchmark results are correctness-gated evidence, not standalone
performance claims. A benchmark run is useful only when its output verifier
passes, the authority profile matches the workload being measured, raw logs are
kept under target/, and the comparison environment is stated.
The general benchmark model is in System Performance Benchmarks. The latest accepted benchmark evidence is the completed Multi-Process SMP Concurrency proof.
Multi-Process SMP Concurrency
The completed milestone required make run-smp-process-scale to boot a
focused capOS manifest under QEMU/KVM, run independent CPU-bound worker
processes, and record at least 1.6x median speedup from the -smp 1
one-worker case to the -smp 2 two-worker case over repeated runs.
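The gate described above reduces to a ratio of medians. The following is a minimal sketch of that check, not the harness's actual verifier: the function names and the sample values are hypothetical, and only the 1.6x threshold comes from the milestone text.

```python
from statistics import median

GATE = 1.6  # minimum 1-to-2 median speedup named by the milestone


def median_speedup(baseline_runs, scaled_runs):
    """Median elapsed value of the -smp 1 runs divided by that of the -smp 2 runs."""
    return median(baseline_runs) / median(scaled_runs)


def passes_gate(baseline_runs, scaled_runs, gate=GATE):
    """True when the median speedup meets or exceeds the gate."""
    return median_speedup(baseline_runs, scaled_runs) >= gate


# Hypothetical elapsed values for five runs each; these are NOT measured data.
smp1_runs = [100, 102, 98, 101, 99]
smp2_runs = [60, 61, 59, 62, 60]

print(f"{median_speedup(smp1_runs, smp2_runs):.3f}x", passes_gate(smp1_runs, smp2_runs))
```

Using the median rather than the mean keeps a single slow run, for example one disturbed by host activity, from deciding the gate.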
Current capOS evidence passes the milestone speedup and smoke gates:
| System | Workload | Runs | smp1 median | smp2 median | smp4 median | 1-to-2 speedup | 1-to-4 speedup | Status |
|---|---|---|---|---|---|---|---|---|
| capOS local QEMU/KVM | primes 2..3_000_000, balanced contiguous splits | 5 | 1,693 scaled cycles | 1,053 scaled cycles | 2,314 scaled cycles | 1.608x | 0.732x | Passes 1-to-2 gate only |
| capOS capos-bench nested QEMU/KVM | same primes and balanced contiguous splits, QEMU pinned to host CPUs 0,1,2,3 | 5 | 1,639 scaled cycles | 875 scaled cycles | 1,111 scaled cycles | 1.873x | 1.475x | Comparison run retained; no 4-core claim |
| Linux guest capos-bench nested QEMU/KVM | same primes and balanced contiguous splits, QEMU pinned to host CPUs 0,1,2,3 | 5 | 1,275,187,210 ns | 659,218,025 ns | 337,877,986 ns | 1.934x | 3.774x | Comparison run retained |
The local capOS run is recorded in
target/smp-process-scale/cycle-balanced-default/. The elapsed unit is the
worker-side user-mode cycle counter shifted right by 20 bits, so the proof is
not quantized by the 100 Hz kernel tick. The capos-bench reruns were recorded
on GCE n2-highcpu-8 in europe-west3-b at commit 0d89a91b
(2026-04-30 11:09 UTC), using nested QEMU/KVM on an Ubuntu
6.17.0-1012-gcp host, QEMU 8.2.2, Rust nightly 1.97.0-nightly
(c935696dd 2026-04-29), and host CPU topology where logical CPUs 0,1,2,3
are distinct physical cores with SMT siblings 4,5,6,7. Raw capOS artifacts
are under
target/smp-process-scale/capos-bench-pinned-20260430T1113Z/; raw Linux
baseline artifacts are under
target/linux-smp-process-scale/capos-bench-pinned-20260430T1118Z/.
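The scaled-cycle unit mentioned above can be illustrated with a short sketch. It assumes the shift is applied to the difference of two counter readings; the source only states that the counter is shifted right by 20 bits, so the function names and that ordering are assumptions.

```python
SCALE_SHIFT = 20  # from the text: the cycle counter is shifted right by 20 bits


def scaled_cycles(start_cycles, end_cycles):
    """Elapsed cycles in units of 2**20 raw cycles (assumed: shift applied to the delta)."""
    return (end_cycles - start_cycles) >> SCALE_SHIFT


def raw_cycle_range(scaled):
    """Inclusive range of raw cycle counts that map to one scaled value."""
    return scaled << SCALE_SHIFT, ((scaled + 1) << SCALE_SHIFT) - 1


low, high = raw_cycle_range(1693)  # 1,693 is the local smp1 median from the table
print(low, high)
```

One scaled unit covers 1,048,576 raw cycles, so a reported value identifies the elapsed cycle count to within about one million cycles.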
The capOS and Linux medians intentionally use different elapsed units. capOS
reports a worker-side user-mode cycle counter, scaled by shifting right 20
bits, because the current kernel tick is too coarse for this benchmark. The
Linux comparison reports guest clock_gettime elapsed nanoseconds. Do not
compare the absolute capOS cycle values with Linux nanoseconds; compare
speedup ratios within each system row.
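Because each ratio divides two medians in the same unit, the unit cancels within a row. The sketch below recomputes the speedup columns from the published medians; the dictionary layout and function name are illustrative, and the numbers are copied from the table.

```python
# Medians copied from the results table: (smp1, smp2, smp4).
# capOS rows are in scaled cycles, the Linux row is in nanoseconds.
ROWS = {
    "capOS local QEMU/KVM": (1693, 1053, 2314),
    "capOS capos-bench nested QEMU/KVM": (1639, 875, 1111),
    "Linux guest capos-bench nested QEMU/KVM": (1_275_187_210, 659_218_025, 337_877_986),
}


def speedups(smp1, smp2, smp4):
    """Return (1-to-2, 1-to-4) speedups; the elapsed unit cancels in each ratio."""
    return smp1 / smp2, smp1 / smp4


for name, medians in ROWS.items():
    one_to_two, one_to_four = speedups(*medians)
    print(f"{name}: 1-to-2 {one_to_two:.3f}x, 1-to-4 {one_to_four:.3f}x")
```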
The capos-bench 4-vCPU case is deliberately reported but not accepted as a
capOS scaling claim: smp4 was faster than the 1-vCPU baseline but slower than
the 2-vCPU case. Linux continued scaling through 4 vCPUs under the same pinning
and workload. The current capOS milestone gate remains the repeated 1-to-2
speedup proof; explaining and improving the 4-vCPU behavior is follow-on SMP
scheduler/runtime work.
The GCE host exposes eight logical CPUs, but only four physical cores. An
8-vCPU row would therefore be an SMT diagnostic, not 8-core evidence, and the
current capOS and Linux process-scale harnesses do not yet define an eight-way
split for this workload. Add that as a separately labeled smp8-smt
diagnostic before publishing 8-vCPU numbers.
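One way to confirm how many physical cores sit behind the logical CPUs is to group them by SMT sibling list. The sketch below assumes a Linux host that exposes `thread_siblings_list` under sysfs, and it assumes the documented topology pairs CPU 0 with 4, 1 with 5, and so on; the helper names are hypothetical and not part of the capOS tooling.

```python
def parse_cpu_list(text):
    """Parse a Linux CPU list such as "0,4" or "0-3,8" into sorted integers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(sibling_lists):
    """Pick the lowest-numbered logical CPU from each group of SMT siblings.

    sibling_lists maps a logical CPU id to the text of its thread_siblings_list.
    """
    cores = {tuple(parse_cpu_list(text)) for text in sibling_lists.values()}
    return sorted(group[0] for group in cores)


def read_sibling_lists(cpu_ids):
    """Read sibling lists from Linux sysfs for the given logical CPU ids."""
    result = {}
    for cpu in cpu_ids:
        path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
        with open(path) as handle:
            result[cpu] = handle.read()
    return result


# Assumed pairing for the documented host: 0 with 4, 1 with 5, 2 with 6, 3 with 7.
documented = {cpu: f"{cpu % 4},{cpu % 4 + 4}" for cpu in range(8)}
print(one_cpu_per_core(documented))
```

On a real host, `one_cpu_per_core(read_sibling_lists(range(8)))` would give a candidate pinning set with one logical CPU per physical core. Any run that uses more vCPUs than that set has entries is relying on SMT siblings.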
The prior Linux comparison for the earlier static-split workload recorded
1.632x 1-to-2 speedup and 2.62x 1-to-4 speedup, but that row is historical
only because the workload partitioning changed.
Both SMP harnesses can pin QEMU to a host CPU set with taskset:
CAPOS_SMP_SCALE_QEMU_TASKSET_CPUS=<cpus> for capOS and
LINUX_SMP_SCALE_QEMU_TASKSET_CPUS=<cpus> for Linux. The summary logs record
the configured CPU set. Pinning QEMU is not the same as isolating CPUs from
other host work; publishable isolated-CPU runs must also document the host
isolation mechanism, such as boot-time isolcpus/nohz_full/rcu_nocbs or
a cpuset/systemd CPU-affinity policy. Pinning QEMU to fewer host logical CPUs
than the guest -smp count is functional-only oversubscription, not speedup
evidence; 4-vCPU claims require at least four suitable host CPUs, preferably
four physical cores.
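The oversubscription rule above can be expressed as a small pre-publication check. This is a sketch under assumptions: it treats the variable's value as a plain taskset-style list of ids and ranges (stride syntax is not handled), and the function names and labels are hypothetical.

```python
def count_cpus(cpu_list):
    """Count distinct logical CPUs in a list such as "0,1" or "0-3"."""
    cpus = set()
    for part in cpu_list.split(","):
        first, _, last = part.strip().partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return len(cpus)


def evidence_class(cpu_list, guest_smp):
    """Classify a pinned run by comparing host CPU count with the guest -smp count."""
    if count_cpus(cpu_list) < guest_smp:
        return "functional-only"  # oversubscribed: never speedup evidence
    return "speedup-candidate"    # still needs topology and isolation documented


print(evidence_class("0,1", 2))
print(evidence_class("0,1", 4))
```

A "speedup-candidate" result is a necessary condition only. The run still has to document host isolation and, for 4-vCPU claims, whether the pinned CPUs are separate physical cores.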
The passing capOS run closes the focused speedup gate. The milestone closeout
also reran ordinary run-smoke and run-spawn coverage under -smp 2, with
logs in target/smp2-smokes/, covering the default manifest, ring, thread
lifecycle, park cleanup, generic child waits, and process exit.
The capos-bench rows are still nested-QEMU guest evidence, not proof that
capOS boots directly on cloud VM hardware. After capOS reaches a first real
cloud-VM boot, rerun the benchmark profiles
that support the booted hardware profile. Cloud reruns must record provider,
instance type, cloud image identity, firmware/device model, CPU topology, and
tool/provenance bundle separately from local QEMU/KVM evidence.
Next CPU-Scaling Workload
Prime counting was sufficient to prove that independent worker processes can run on multiple CPUs, but it is not the best workload for future scaling claims: trial-division cost grows with the candidate value, so equal-width candidate ranges are not equal work. The next CPU-scaling profile should use fixed-size independent chunks, preferably a parallel hash/checksum workload:
- each worker receives the same number of bytes or blocks;
- the timed region performs no syscalls and writes only a private result slot;
- the verifier combines per-chunk hashes into a deterministic root hash;
- the Linux comparison uses the same corpus, chunk size, and root-hash rule;
- 4-vCPU results are published only on hosts with at least four suitable host CPUs, preferably four physical cores or a documented cloud topology.
This shape is more realistic than an artificial arithmetic loop because it matches content-addressed storage, package verification, boot-package integrity, and artifact validation.
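The properties listed above can be sketched as follows. This is an illustration, not the planned capOS workload: SHA-256, the root-hash rule (hashing the per-chunk digests in chunk order), and all names are assumptions, and the workers run sequentially here where the real profile would use separate processes.

```python
import hashlib


def split_fixed(corpus, chunk_size):
    """Split the corpus into equal-size chunks; reject a ragged tail."""
    if len(corpus) % chunk_size:
        raise ValueError("corpus length must be a multiple of chunk_size")
    return [corpus[i:i + chunk_size] for i in range(0, len(corpus), chunk_size)]


def assign(chunks, workers):
    """Give every worker the same number of contiguous chunks."""
    if len(chunks) % workers:
        raise ValueError("chunk count must divide evenly among workers")
    per_worker = len(chunks) // workers
    return [chunks[w * per_worker:(w + 1) * per_worker] for w in range(workers)]


def worker(chunks):
    """Stand-in for the timed region: pure computation into a private result slot."""
    return [hashlib.sha256(chunk).digest() for chunk in chunks]


def root_hash(slots):
    """Assumed root rule: SHA-256 over all chunk digests in chunk order."""
    root = hashlib.sha256()
    for slot in slots:
        for digest in slot:
            root.update(digest)
    return root.hexdigest()


def run(corpus, chunk_size, workers):
    slots = [worker(share) for share in assign(split_fixed(corpus, chunk_size), workers)]
    return root_hash(slots)


corpus = bytes(range(256)) * 16  # 4096 bytes of hypothetical test data
assert run(corpus, 512, 1) == run(corpus, 512, 2) == run(corpus, 512, 4)
```

Because the chunk size is fixed independently of the worker count, the root hash is the same for every `-smp` configuration, which lets one verifier value gate all rows of a comparison.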
Commands
Run the capOS proof:
make run-smp-process-scale
After confirming the host CPU topology, run the capOS proof with QEMU pinned to one logical CPU on each of two physical cores:
CAPOS_SMP_SCALE_QEMU_TASKSET_CPUS=0,1 make run-smp-process-scale
Run the Linux comparison with a readable kernel image:
LINUX_SMP_SCALE_KERNEL=target/linux-smp-process-scale/kernel/vmlinuz \
tools/linux-smp-process-scale-baseline.sh
Run the Linux comparison with QEMU pinned to the same host CPUs:
LINUX_SMP_SCALE_KERNEL=target/linux-smp-process-scale/kernel/vmlinuz \
LINUX_SMP_SCALE_QEMU_TASKSET_CPUS=0,1 \
tools/linux-smp-process-scale-baseline.sh
On hosts where /boot/vmlinuz is not readable by the current user, copy a
kernel image into ignored target/ storage first through the host’s normal
administrative path, then pass it as LINUX_SMP_SCALE_KERNEL. The script does
not invoke sudo itself.
Both harnesses keep per-run QEMU logs and CSV summaries under target/.
Benchmark artifacts are intentionally ignored build outputs; publishable
results must be summarized in source documentation with the relevant commit,
workload, run count, QEMU/KVM envelope, and pass/fail status.
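The fields required for a publishable summary could be captured in a small record before they are written into documentation. The sketch below is hypothetical: the class, field names, and row layout are illustrative, and the example values are taken from the capos-bench capOS row described earlier.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BenchmarkSummary:
    commit: str
    workload: str
    runs: int
    envelope: str  # QEMU/KVM envelope: host, QEMU version, pinning
    passed: bool   # pass/fail status of the gate being reported

    def table_row(self):
        """Render the summary as one Markdown table row."""
        status = "Pass" if self.passed else "Fail"
        return f"| {self.commit} | {self.workload} | {self.runs} | {self.envelope} | {status} |"


def missing_fields(summary):
    """Names of fields left empty, so an incomplete summary is caught before publishing."""
    return [f.name for f in fields(summary) if getattr(summary, f.name) in ("", None)]


summary = BenchmarkSummary(
    commit="0d89a91b",
    workload="primes 2..3_000_000, balanced contiguous splits",
    runs=5,
    envelope="nested QEMU/KVM, QEMU 8.2.2, pinned to host CPUs 0,1,2,3",
    passed=True,  # 1-to-2 gate only: 1.873x is above the 1.6x threshold
)
print(summary.table_row())
```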