Benchmarks

capOS benchmark results are correctness-gated evidence, not standalone performance claims. A benchmark run is useful only when its output verifier passes, the authority profile matches the workload being measured, raw logs are kept under target/, and the comparison environment is stated.

The general benchmark model is in System Performance Benchmarks. The latest accepted benchmark evidence is the completed Multi-Process SMP Concurrency proof.

Multi-Process SMP Concurrency

The completed milestone required make run-smp-process-scale to boot a focused capOS manifest under QEMU/KVM, run independent CPU-bound worker processes, and record at least 1.6x median speedup from the -smp 1 one-worker case to the -smp 2 two-worker case over repeated runs.

Current capOS evidence passes the milestone speedup and smoke gates:

| System | Workload | Runs | smp1 median | smp2 median | smp4 median | 1-to-2 speedup | 1-to-4 speedup | Status |
|---|---|---|---|---|---|---|---|---|
| capOS local QEMU/KVM | primes 2..3_000_000, balanced contiguous splits | 5 | 1,693 scaled cycles | 1,053 scaled cycles | 2,314 scaled cycles | 1.608x | 0.732x | Passes 1-to-2 gate only |
| capOS capos-bench nested QEMU/KVM | same primes and balanced contiguous splits, QEMU pinned to host CPUs 0,1,2,3 | 5 | 1,639 scaled cycles | 875 scaled cycles | 1,111 scaled cycles | 1.873x | 1.475x | Comparison run retained; no 4-core claim |
| Linux guest capos-bench nested QEMU/KVM | same primes and balanced contiguous splits, QEMU pinned to host CPUs 0,1,2,3 | 5 | 1,275,187,210 ns | 659,218,025 ns | 337,877,986 ns | 1.934x | 3.774x | Comparison run retained |
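As a cross-check on the table, the speedup columns can be recomputed from the medians. The sketch below assumes each speedup is the ratio of the smp1 median to the smpN median, which is what the published figures correspond to; the dictionary keys and function name are illustrative and are not taken from the harness.

```python
# Medians copied from the table above (scaled cycles for capOS, ns for Linux).
# Units cancel in the ratio, so each row is only compared with itself.
GATE_1_TO_2 = 1.6  # milestone gate stated in this section

ROWS = {
    "capOS local": (1_693, 1_053, 2_314),
    "capOS capos-bench": (1_639, 875, 1_111),
    "Linux capos-bench": (1_275_187_210, 659_218_025, 337_877_986),
}

def speedup(baseline_median, scaled_median):
    """Ratio of the 1-vCPU median to the N-vCPU median (assumed definition)."""
    return baseline_median / scaled_median

for name, (smp1, smp2, smp4) in ROWS.items():
    s12 = speedup(smp1, smp2)
    s14 = speedup(smp1, smp4)
    verdict = "pass" if s12 >= GATE_1_TO_2 else "fail"
    print(f"{name}: 1-to-2 {s12:.3f}x, 1-to-4 {s14:.3f}x, 1-to-2 gate {verdict}")
```

A 1-to-4 ratio below 1.0, as in the local capOS row, means the 4-vCPU case was slower than the 1-vCPU baseline.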

The local capOS run is recorded in target/smp-process-scale/cycle-balanced-default/. The elapsed unit is the worker-side user-mode cycle counter shifted right by 20 bits, so the proof is not quantized by the 100 Hz kernel tick.

The capos-bench reruns were recorded on GCE n2-highcpu-8 in europe-west3-b at commit 0d89a91b (2026-04-30 11:09 UTC), using nested QEMU/KVM on an Ubuntu 6.17.0-1012-gcp host, QEMU 8.2.2, Rust nightly 1.97.0-nightly (c935696dd 2026-04-29), and host CPU topology where logical CPUs 0,1,2,3 are distinct physical cores with SMT siblings 4,5,6,7. Raw capOS artifacts are under target/smp-process-scale/capos-bench-pinned-20260430T1113Z/; raw Linux baseline artifacts are under target/linux-smp-process-scale/capos-bench-pinned-20260430T1118Z/.

The capOS and Linux medians intentionally use different elapsed units. capOS reports a worker-side user-mode cycle counter, scaled by shifting right 20 bits, because the current kernel tick is too coarse for this benchmark. The Linux comparison reports guest clock_gettime elapsed nanoseconds. Do not compare the absolute capOS cycle values with Linux nanoseconds; compare speedup ratios within each system row.
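The scaled-cycle unit can be illustrated with a small sketch. The 20-bit shift and the 100 Hz tick come from the text; the counter frequency below is a purely hypothetical value chosen to show the order of magnitude, not a measured property of any host used here, and the function name is illustrative.

```python
SCALE_SHIFT = 20   # from the text: counter is shifted right by 20 bits
TICK_HZ = 100      # from the text: current kernel tick rate

def scaled_cycles(start_cycles, end_cycles):
    """Elapsed cycles expressed in units of 2**20 raw cycles."""
    return (end_cycles - start_cycles) >> SCALE_SHIFT

# Hypothetical counter frequency, for illustration only.
ASSUMED_COUNTER_HZ = 2_000_000_000

unit_seconds = (1 << SCALE_SHIFT) / ASSUMED_COUNTER_HZ  # one scaled cycle
tick_seconds = 1 / TICK_HZ                              # one kernel tick

print(f"one scaled cycle ~ {unit_seconds * 1e3:.3f} ms at the assumed frequency")
print(f"one kernel tick  = {tick_seconds * 1e3:.3f} ms")
```

Under that assumed frequency one scaled cycle is well under a millisecond, against 10 ms for one kernel tick. The real conversion depends on the actual counter frequency, which is one more reason to compare ratios rather than absolute values.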

The capos-bench 4-vCPU case is deliberately reported but not accepted as a capOS scaling claim: smp4 was faster than the 1-vCPU baseline but slower than the 2-vCPU case. Linux continued scaling through 4 vCPUs under the same pinning and workload. The current capOS milestone gate remains the repeated 1-to-2 speedup proof; explaining and improving the 4-vCPU behavior is follow-on SMP scheduler/runtime work.

The GCE host exposes eight logical CPUs, but only four physical cores. An 8-vCPU row would therefore be an SMT diagnostic, not 8-core evidence, and the current capOS and Linux process-scale harnesses do not yet define an eight-way split for this workload. Add that split as a separately labeled smp8-smt diagnostic before publishing 8-vCPU numbers.
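To make the logical-versus-physical distinction concrete, the sketch below groups logical CPUs into physical cores from their SMT sibling sets. The topology literal mirrors the capos-bench host described above (0,1,2,3 with siblings 4,5,6,7). On a Linux host this information could be read from each CPU's thread_siblings_list under /sys/devices/system/cpu/, but this example takes already-parsed input, and the function names are illustrative.

```python
def physical_cores(siblings):
    """Group logical CPUs into cores.

    `siblings` maps each logical CPU to the logical CPUs that share its core.
    Returns a sorted list of sorted CPU lists, one per physical core.
    """
    cores = {frozenset(group) | {cpu} for cpu, group in siblings.items()}
    return sorted(sorted(core) for core in cores)

def one_cpu_per_core(siblings):
    """Pick the lowest-numbered logical CPU from each physical core."""
    return [core[0] for core in physical_cores(siblings)]

# Topology as described for the capos-bench host in this section.
TOPOLOGY = {
    0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7],
    4: [0, 4], 5: [1, 5], 6: [2, 6], 7: [3, 7],
}

print(len(TOPOLOGY), "logical CPUs,", len(physical_cores(TOPOLOGY)), "physical cores")
print("pin candidates:", one_cpu_per_core(TOPOLOGY))
```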

The prior Linux comparison for the earlier static-split workload recorded 1.632x 1-to-2 speedup and 2.62x 1-to-4 speedup, but that row is historical only because the workload partitioning changed.

Both SMP harnesses can pin QEMU to a host CPU set with taskset: CAPOS_SMP_SCALE_QEMU_TASKSET_CPUS=<cpus> for capOS and LINUX_SMP_SCALE_QEMU_TASKSET_CPUS=<cpus> for Linux. The summary logs record the configured CPU set. Pinning QEMU is not the same as isolating CPUs from other host work; publishable isolated-CPU runs must also document the host isolation mechanism, such as boot-time isolcpus/nohz_full/rcu_nocbs or a cpuset/systemd CPU-affinity policy. Pinning QEMU to fewer host logical CPUs than the guest -smp count is functional-only oversubscription, not speedup evidence; 4-vCPU claims require at least four suitable host CPUs, preferably four physical cores.
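The oversubscription rule above can be expressed as a small check. This is a sketch rather than harness code: the function names and the two labels are hypothetical, and the parser only handles comma-separated CPUs and simple ranges such as 0-3.

```python
def parse_cpu_list(text):
    """Parse a CPU set such as "0,1" or "0-3" into a sorted list of CPU ids."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)

def classify_pinning(taskset_cpus, guest_smp):
    """Label a run by whether the pinned host CPU set can back the guest vCPUs."""
    pinned = parse_cpu_list(taskset_cpus)
    if len(pinned) < guest_smp:
        return "functional-only"   # oversubscribed: not speedup evidence
    return "speedup-eligible"      # still needs suitable, documented host CPUs

print(classify_pinning("0,1", 2))      # two host CPUs for -smp 2
print(classify_pinning("0,1", 4))      # two host CPUs for -smp 4
print(classify_pinning("0,1,2,3", 4))  # four host CPUs for -smp 4
```

Passing this check is necessary but not sufficient: it says nothing about whether the pinned CPUs are distinct physical cores or isolated from other host work.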

The passing capOS run closes the focused speedup gate. The milestone closeout also reran ordinary run-smoke and run-spawn coverage under -smp 2, with logs in target/smp2-smokes/, covering the default manifest, ring, thread lifecycle, park cleanup, generic child waits, and process exit.

The capos-bench rows are still nested-QEMU guest evidence, not proof that capOS boots directly on cloud VM hardware. After capOS reaches a first real cloud-VM boot, rerun the benchmark profiles that support the booted hardware profile. Cloud reruns must record provider, instance type, cloud image identity, firmware/device model, CPU topology, and tool/provenance bundle separately from local QEMU/KVM evidence.

Next CPU-Scaling Workload

Prime counting was sufficient to prove independent worker processes can run on multiple CPUs, but it is not the best workload for future scaling claims because trial-division cost grows with the candidate value. The next CPU-scaling profile should use fixed-size independent chunks, preferably a parallel hash/checksum workload:

  • each worker receives the same number of bytes or blocks;
  • the timed region performs no syscalls and writes only a private result slot;
  • the verifier combines per-chunk hashes into a deterministic root hash;
  • the Linux comparison uses the same corpus, chunk size, and root-hash rule;
  • 4-vCPU results are published only on hosts with at least four suitable host CPUs, preferably four physical cores or a documented cloud topology.

This shape is more realistic than an artificial arithmetic loop because it matches content-addressed storage, package verification, boot-package integrity, and artifact validation.
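The fixed-chunk shape described above could be sketched as follows. This is an illustration under stated assumptions, not the planned capOS workload: SHA-256, the 64 KiB chunk size, the synthetic corpus, and the root-hash rule (hash of the per-chunk digests concatenated in chunk order) are all hypothetical choices, and threads stand in for the independent worker processes.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 64 * 1024  # hypothetical fixed chunk size in bytes

def make_corpus(chunk_count):
    """Deterministic synthetic corpus: chunk i is filled with byte value i % 256."""
    return b"".join(bytes([i % 256]) * CHUNK_SIZE for i in range(chunk_count))

def chunk_digest(chunk):
    """Per-chunk work: no shared state, one private result."""
    return hashlib.sha256(chunk).digest()

def root_hash(corpus, workers):
    """Hash equal-size chunks in parallel, then combine digests in chunk order."""
    if len(corpus) % CHUNK_SIZE != 0:
        raise ValueError("corpus must be a whole number of chunks")
    chunks = [corpus[i:i + CHUNK_SIZE] for i in range(0, len(corpus), CHUNK_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = list(pool.map(chunk_digest, chunks))  # map preserves input order
    return hashlib.sha256(b"".join(digests)).hexdigest()

corpus = make_corpus(8)
# The verifier property: the root hash must not depend on the worker count.
print(root_hash(corpus, 1) == root_hash(corpus, 4))
```

Because every chunk has the same size, each worker does the same amount of work, and the same corpus and root-hash rule can be applied unchanged on the Linux side.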

Commands

Run the capOS proof:

make run-smp-process-scale

Run the capOS proof with QEMU pinned to one logical CPU from each of two physical cores after confirming the host CPU topology:

CAPOS_SMP_SCALE_QEMU_TASKSET_CPUS=0,1 make run-smp-process-scale

Run the Linux comparison with a readable kernel image:

LINUX_SMP_SCALE_KERNEL=target/linux-smp-process-scale/kernel/vmlinuz \
  tools/linux-smp-process-scale-baseline.sh

Run the Linux comparison with QEMU pinned to the same host CPUs:

LINUX_SMP_SCALE_KERNEL=target/linux-smp-process-scale/kernel/vmlinuz \
  LINUX_SMP_SCALE_QEMU_TASKSET_CPUS=0,1 \
  tools/linux-smp-process-scale-baseline.sh

On hosts where /boot/vmlinuz is not readable by the current user, copy a kernel image into ignored target/ storage first through the host’s normal administrative path, then pass it as LINUX_SMP_SCALE_KERNEL. The script does not invoke sudo itself.

Both harnesses keep per-run QEMU logs and CSV summaries under target/. Benchmark artifacts are intentionally ignored build outputs; publishable results must be summarized in source documentation with the relevant commit, workload, run count, QEMU/KVM envelope, and pass/fail status.
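One way to keep such summaries complete is a small completeness check before a result is written into the documentation. The field names below are hypothetical and do not reflect the harness CSV format; the example values are taken from the capos-bench capOS row earlier in this section.

```python
# Hypothetical field names covering the items a published summary must state.
REQUIRED_FIELDS = ("commit", "workload", "run_count", "qemu_kvm_envelope", "status")

def missing_fields(summary):
    """Return the required fields that are absent or empty in a summary record."""
    return [field for field in REQUIRED_FIELDS if not summary.get(field)]

summary = {
    "commit": "0d89a91b",
    "workload": "primes 2..3_000_000, balanced contiguous splits",
    "run_count": 5,
    "qemu_kvm_envelope": "nested QEMU/KVM, QEMU 8.2.2, pinned to host CPUs 0,1,2,3",
    "status": "Comparison run retained; no 4-core claim",
}

print(missing_fields(summary))                 # complete record
print(missing_fields({"commit": "0d89a91b"}))  # incomplete record
```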