Benchmarks
capOS benchmark results are correctness-gated evidence, not standalone
performance claims. A benchmark run is useful only when its output verifier
passes, the authority profile matches the workload being measured, raw logs are
kept under target/, and the comparison environment is stated.
The general benchmark model is in System Performance Benchmarks. The latest accepted benchmark evidence is the completed Multi-Process SMP Concurrency proof.
Multi-Process SMP Concurrency
The completed milestone required make run-smp-process-scale to boot a
focused capOS manifest under QEMU/KVM, run independent CPU-bound worker
processes, and record at least 1.6x median speedup from the -smp 1
one-worker case to the -smp 2 two-worker case over repeated runs.
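The gate above can be sketched as a ratio of medians checked against the 1.6x threshold. This is a minimal illustration, not the harness code: the function names and the sample values are hypothetical, and computing the speedup as median(smp1) / median(smp2) is an assumption (it is consistent with how the medians and speedups relate in the tables on this page).

```python
from statistics import median

SPEEDUP_GATE = 1.6  # threshold stated for the milestone


def median_speedup(baseline_runs, scaled_runs):
    """Return median(baseline) / median(scaled) for elapsed-time samples."""
    return median(baseline_runs) / median(scaled_runs)


def passes_gate(baseline_runs, scaled_runs, gate=SPEEDUP_GATE):
    """True when the median speedup meets or exceeds the gate."""
    return median_speedup(baseline_runs, scaled_runs) >= gate


# Made-up elapsed samples in arbitrary units, five runs each.
smp1 = [1000, 1010, 990, 1005, 995]
smp2 = [540, 550, 545, 560, 535]
print(round(median_speedup(smp1, smp2), 3), passes_gate(smp1, smp2))
```

Using medians rather than means keeps a single slow run from moving the result.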
Current capOS evidence passes the milestone speedup and smoke gates. Both
comparison rows report five runs of the primes 2..3_000_000 workload with
balanced contiguous splits. The nested QEMU/KVM rows pin QEMU to host CPUs
0,1,2,3. The capOS row is a controlled capos-bench comparison run retained
for follow-up analysis, not a 4-core scaling claim; the Linux row is retained
as the matching guest baseline.
| System | smp1 median | smp2 median | smp4 median | 1-to-2 speedup | 1-to-4 speedup |
|---|---|---|---|---|---|
| capOS | 1,639 scaled cycles | 875 scaled cycles | 1,111 scaled cycles | 1.873x | 1.475x |
| Linux | 1,275,187,210 ns | 659,218,025 ns | 337,877,986 ns | 1.934x | 3.774x |
The accepted local milestone closeout remains recorded in
target/smp-process-scale/cycle-balanced-default/; the comparison table above
keeps only the controlled capos-bench reruns. The elapsed unit is the
worker-side user-mode cycle counter shifted right by 20 bits, so the proof is
not quantized by the 100 Hz kernel tick. The capos-bench reruns were recorded
on GCE n2-highcpu-8 in europe-west3-b at commit 0d89a91b
(2026-04-30 11:09 UTC). They used nested QEMU/KVM on an Ubuntu host
(kernel 6.17.0-1012-gcp), QEMU 8.2.2, and Rust nightly 1.97.0-nightly
(c935696dd 2026-04-29). On that host, logical CPUs 0,1,2,3
are distinct physical cores with SMT siblings 4,5,6,7. Raw capOS artifacts
are under
target/smp-process-scale/capos-bench-pinned-20260430T1113Z/; raw Linux
baseline artifacts are under
target/linux-smp-process-scale/capos-bench-pinned-20260430T1118Z/.
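To illustrate why the scaled cycle unit is finer than the kernel tick, the sketch below applies the 20-bit right shift and compares one scaled unit against a 100 Hz tick. The counter frequency is a hypothetical value chosen for illustration only; the source does not state it, and the function names are not taken from capOS.

```python
SHIFT = 20  # right shift described in the text

ASSUMED_COUNTER_HZ = 2.0e9  # hypothetical counter frequency, not from the source
TICK_SECONDS = 1 / 100      # 100 Hz kernel tick


def scale_cycles(raw_cycles: int) -> int:
    """Convert a raw cycle delta into the scaled unit used in the tables."""
    return raw_cycles >> SHIFT


def unit_seconds(counter_hz: float) -> float:
    """Duration of one scaled unit for a given counter frequency."""
    return (1 << SHIFT) / counter_hz


# A raw delta that lands exactly on 1,639 scaled cycles.
print(scale_cycles(1639 << SHIFT))

# Under the assumed frequency, one scaled unit is much shorter than one tick.
print(unit_seconds(ASSUMED_COUNTER_HZ) < TICK_SECONDS)
```

The shift discards the low 20 bits, so each scaled unit covers 2^20 raw cycles.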
The capOS and Linux medians intentionally use different elapsed units. capOS
reports a worker-side user-mode cycle counter, scaled by shifting right 20
bits, because the current kernel tick is too coarse for this benchmark. The
Linux comparison reports guest clock_gettime elapsed nanoseconds. Do not
compare the absolute capOS cycle values with Linux nanoseconds; compare
speedup ratios within each system row.
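As a worked check of that rule, the sketch below recomputes the speedup ratios from the pinned comparison medians, keeping each system in its own unit. The dictionary layout and function name are illustrative and are not part of either harness.

```python
# Medians from the pinned comparison table; units differ per system on purpose.
MEDIANS = {
    "capOS": {"unit": "scaled cycles", "smp1": 1639, "smp2": 875, "smp4": 1111},
    "Linux": {
        "unit": "ns",
        "smp1": 1_275_187_210,
        "smp2": 659_218_025,
        "smp4": 337_877_986,
    },
}


def speedups(row):
    """Return smp1-relative speedups, computed within a single system row."""
    base = row["smp1"]
    return {case: round(base / row[case], 3) for case in ("smp2", "smp4")}


for system, row in MEDIANS.items():
    print(system, row["unit"], speedups(row))
```

Because each ratio divides two values in the same unit, the unit cancels and the ratios are comparable across rows even though the absolute values are not.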
The capos-bench 4-vCPU case is deliberately reported but not accepted as a
capOS scaling claim: smp4 was faster than the 1-vCPU baseline but slower than
the 2-vCPU case. Linux continued scaling through 4 vCPUs under the same pinning
and workload. The current capOS milestone gate remains the repeated 1-to-2
speedup proof; explaining and improving the 4-vCPU behavior is follow-on SMP
scheduler/runtime work.
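The 4-vCPU observation can be expressed as a simple monotonicity check over the medians: elapsed time should not increase as vCPUs are added. This is a hypothetical helper for reading the table, not a gate implemented by the harness.

```python
def first_regression(medians_by_vcpus):
    """Return the vCPU count where median elapsed time first got worse.

    Returns None when elapsed time never increases as vCPUs are added.
    """
    counts = sorted(medians_by_vcpus)
    for prev, cur in zip(counts, counts[1:]):
        if medians_by_vcpus[cur] > medians_by_vcpus[prev]:
            return cur
    return None


# Medians from the pinned comparison table, keyed by vCPU count.
capos = {1: 1639, 2: 875, 4: 1111}
linux = {1: 1_275_187_210, 2: 659_218_025, 4: 337_877_986}

print(first_regression(capos))  # 4
print(first_regression(linux))  # None
```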
The GCE host exposes eight logical CPUs, but only four physical cores. An
8-vCPU row is therefore an SMT diagnostic, not 8-core evidence. The capOS and
Linux process-scale harnesses define a separately labeled opt-in smp8-smt
case with eight contiguous worker ranges over the same 2..3_000_000
workload. It prints workers=8 and cpus=8 when run under -smp 8, but it is
informational on 4-core/8-thread hosts and is not part of the accepted 1-to-2
speedup threshold.
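One way to confirm whether a host offers physical cores or only SMT threads is to group logical CPUs by their sibling lists. The sketch below assumes a Linux host that exposes `thread_siblings_list` under sysfs; the helper names are hypothetical, and the example pairing (0 with 4, 1 with 5, and so on) is an assumption about how the siblings on the benchmark host line up.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a Linux CPU list such as '0,4' or '0-3,8' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def physical_cores(sibling_lists):
    """Each distinct sibling set corresponds to one physical core."""
    return {frozenset(parse_cpu_list(entry)) for entry in sibling_lists}


def read_host_sibling_lists(root="/sys/devices/system/cpu"):
    """Read thread_siblings_list for every cpuN directory (Linux only)."""
    paths = Path(root).glob("cpu[0-9]*/topology/thread_siblings_list")
    return [path.read_text() for path in paths]


# Assumed pairing for a 4-core/8-thread host like the one described.
example = ["0,4", "1,5", "2,6", "3,7", "0,4", "1,5", "2,6", "3,7"]
cores = physical_cores(example)
print(len(cores), "physical cores for", len(example), "logical CPUs")
```

When the core count is lower than the guest `-smp` value, the run is an SMT diagnostic in the sense described above.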
Informational smp8-smt evidence from capos-bench at commit 7c15dd47
(2026-04-30 11:45 UTC) used nested QEMU/KVM pinned to logical CPUs
0,1,2,3,4,5,6,7. Both rows report five runs. The capOS row is an
informational SMT diagnostic with no 8-core claim; the Linux row is the
matching guest diagnostic baseline.
| System | smp1 median | smp2 median | smp4 median | smp8-smt median |
|---|---|---|---|---|
| capOS | 1,500 scaled cycles | 787 scaled cycles | 1,052 scaled cycles | 1,595 scaled cycles |
| Linux | 1,274,507,854 ns | 647,611,418 ns | 337,479,795 ns | 198,903,231 ns |

| System | 1-to-2 speedup | 1-to-4 speedup | 1-to-8 speedup |
|---|---|---|---|
| capOS | 1.906x | 1.426x | 0.940x |
| Linux | 1.968x | 3.777x | 6.408x |
Raw capOS SMT artifacts are under
target/smp-process-scale/capos-bench-smt8-20260430T1148Z/; raw Linux SMT
artifacts are under
target/linux-smp-process-scale/capos-bench-smt8-20260430T1151Z/.
The prior Linux comparison for the earlier static-split workload recorded a
1.632x 1-to-2 speedup and a 2.62x 1-to-4 speedup. That row is now historical
only, because the workload partitioning has since changed.
Both SMP harnesses can pin QEMU to a host CPU set with taskset:
CAPOS_SMP_SCALE_QEMU_TASKSET_CPUS=<cpus> for capOS and
LINUX_SMP_SCALE_QEMU_TASKSET_CPUS=<cpus> for Linux. The summary logs record
the configured CPU set. Pinning QEMU is not the same as isolating CPUs from
other host work; publishable isolated-CPU runs must also document the host
isolation mechanism, such as boot-time isolcpus/nohz_full/rcu_nocbs or
a cpuset/systemd CPU-affinity policy. Pinning QEMU to fewer host logical CPUs
than the guest -smp count is functional-only oversubscription, not speedup
evidence; 4-vCPU claims require at least four suitable host CPUs, preferably
four physical cores.
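The oversubscription rule can be sketched as a comparison between the pinned host CPU set and the guest `-smp` count. The function names and the two labels are hypothetical; neither harness is known to perform this classification. The sketch also does not check physical-core topology or host isolation, which still have to be documented separately.

```python
def parse_cpu_set(value):
    """Expand a taskset-style CPU list such as '0,1' or '0-3,6' into a set."""
    cpus = set()
    for part in value.split(","):
        first, _, last = part.strip().partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def classify_pinning(taskset_cpus, guest_smp):
    """Label a run by whether the pinned host CPUs cover the guest vCPUs."""
    host_cpus = parse_cpu_set(taskset_cpus)
    if len(host_cpus) < guest_smp:
        return "functional-only"  # oversubscribed: not speedup evidence
    return "speedup-eligible"     # still needs topology and isolation notes


print(classify_pinning("0,1", 2))
print(classify_pinning("0,1", 4))
print(classify_pinning("0-3", 4))
```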
The passing capOS run closes the focused speedup gate. The milestone closeout
also reran ordinary run-smoke and run-spawn coverage under -smp 2, with
logs in target/smp2-smokes/, covering the default manifest, ring, thread
lifecycle, park cleanup, generic child waits, and process exit.
The capos-bench rows are still nested-QEMU guest evidence, not proof that
capOS boots directly on cloud VM hardware. After capOS reaches a first real
cloud-VM boot, rerun the benchmark profiles that apply to the booted hardware
profile. Cloud reruns must record provider,
instance type, cloud image identity, firmware/device model, CPU topology, and
tool/provenance bundle separately from local QEMU/KVM evidence.
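The required cloud rerun fields could be captured in a small record type so that an incomplete entry is easy to detect. This is a sketch only: the class, field names, and placeholder values are hypothetical and do not describe an existing capOS schema.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CloudRunRecord:
    """Fields the text requires for a cloud rerun (names are illustrative)."""
    provider: str
    instance_type: str
    image_identity: str
    firmware_device_model: str
    cpu_topology: str
    provenance_bundle: str


def missing_fields(record):
    """Return the names of fields that are empty or whitespace-only."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Placeholder values only; none of these describe a real run.
draft = CloudRunRecord(
    provider="example-provider",
    instance_type="example-instance",
    image_identity="",
    firmware_device_model="example-firmware",
    cpu_topology="4 cores / 8 threads",
    provenance_bundle="",
)
print(missing_fields(draft))
```

Keeping these records apart from local QEMU/KVM evidence matches the separation required above.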
Next CPU-Scaling Workload
Prime counting was sufficient to prove independent worker processes can run on multiple CPUs, but it is not the best workload for future scaling claims because trial-division cost grows with the candidate value. The next CPU-scaling profile should use fixed-size independent chunks, preferably a parallel hash/checksum workload:
- each worker receives the same number of bytes or blocks;
- the timed region performs no syscalls and writes only a private result slot;
- the verifier combines per-chunk hashes into a deterministic root hash;
- the Linux comparison uses the same corpus, chunk size, and root-hash rule;
- 4-vCPU results are published only on hosts with at least four suitable host CPUs, preferably four physical cores or a documented cloud topology.
This shape is more realistic than an artificial arithmetic loop because it matches content-addressed storage, package verification, boot-package integrity, and artifact validation.
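The proposed verifier rule can be sketched as follows: split a corpus into equal-size chunks, hash each chunk independently, and combine the per-chunk digests in chunk order into a root hash. SHA-256, the concatenation rule, the thread pool, and all names here are assumptions made for illustration. The eventual profile may choose differently, and this Python sketch does not model the no-syscall timed region.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def split_chunks(corpus: bytes, chunk_size: int):
    """Split the corpus into equal-size chunks; reject a ragged tail."""
    if len(corpus) % chunk_size != 0:
        raise ValueError("corpus length must be a multiple of chunk_size")
    return [corpus[i:i + chunk_size] for i in range(0, len(corpus), chunk_size)]


def chunk_digest(chunk: bytes) -> bytes:
    """Independent per-chunk work: no shared state, one private result."""
    return hashlib.sha256(chunk).digest()


def root_hash(corpus: bytes, chunk_size: int, workers: int) -> str:
    """Hash chunks in parallel, then combine digests in chunk order."""
    chunks = split_chunks(corpus, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = list(pool.map(chunk_digest, chunks))  # map preserves order
    combined = hashlib.sha256()
    for digest in digests:
        combined.update(digest)
    return combined.hexdigest()


# Deterministic 1 MiB corpus split into sixteen 64 KiB chunks.
corpus = bytes(range(256)) * 4096
print(root_hash(corpus, 64 * 1024, workers=1) == root_hash(corpus, 64 * 1024, workers=4))
```

Because the digests are combined in chunk order rather than completion order, the root hash does not depend on the worker count, which is what lets the same rule verify both the capOS and Linux runs.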
Commands
Run the capOS proof:
make run-smp-process-scale
After confirming the host CPU topology, run the capOS proof with QEMU pinned to one logical CPU on each of two physical cores:
CAPOS_SMP_SCALE_QEMU_TASKSET_CPUS=0,1 make run-smp-process-scale
Run the capOS proof with the optional 8-logical-CPU SMT diagnostic. The host must report at least eight logical CPUs; when collecting controlled evidence, pin QEMU to those same eight logical CPUs:
CAPOS_SMP_SCALE_INCLUDE_SMT=1 \
CAPOS_SMP_SCALE_QEMU_TASKSET_CPUS=0,1,2,3,4,5,6,7 \
make run-smp-process-scale
Run the Linux comparison with a readable kernel image:
LINUX_SMP_SCALE_KERNEL=target/linux-smp-process-scale/kernel/vmlinuz \
tools/linux-smp-process-scale-baseline.sh
Run the Linux comparison with QEMU pinned to the same host CPUs:
LINUX_SMP_SCALE_KERNEL=target/linux-smp-process-scale/kernel/vmlinuz \
LINUX_SMP_SCALE_QEMU_TASKSET_CPUS=0,1 \
tools/linux-smp-process-scale-baseline.sh
Run the matching Linux smp8-smt diagnostic:
LINUX_SMP_SCALE_KERNEL=target/linux-smp-process-scale/kernel/vmlinuz \
LINUX_SMP_SCALE_INCLUDE_SMT=1 \
LINUX_SMP_SCALE_QEMU_TASKSET_CPUS=0,1,2,3,4,5,6,7 \
tools/linux-smp-process-scale-baseline.sh
On hosts where /boot/vmlinuz is not readable by the current user, copy a
kernel image into ignored target/ storage first through the host’s normal
administrative path, then pass it as LINUX_SMP_SCALE_KERNEL. The script does
not invoke sudo itself.
Both harnesses keep per-run QEMU logs and CSV summaries under target/.
Benchmark artifacts are intentionally ignored build outputs; publishable
results must be summarized in source documentation with the relevant commit,
workload, run count, QEMU/KVM envelope, and pass/fail status.