Benchmarks

capOS benchmark results are correctness-gated evidence, not standalone performance claims. A benchmark run is useful only when its output verifier passes, the authority profile matches the workload being measured, raw logs are kept under target/, and the comparison environment is stated.

The general benchmark model is in System Performance Benchmarks. The latest accepted benchmark evidence is the completed Multi-Process SMP Concurrency proof.

Multi-Process SMP Concurrency

The completed milestone required make run-smp-process-scale to boot a focused capOS manifest under QEMU/KVM, run independent CPU-bound worker processes, and record at least 1.6x median speedup from the -smp 1 one-worker case to the -smp 2 two-worker case over repeated runs.
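The gate arithmetic can be sketched in shell. This is a minimal illustration, not the harness's implementation. It assumes each case yields a list of per-run elapsed values in a single unit, takes the median of each list, and treats the run as passing only when median(smp1) / median(smp2) is at least 1.6. The function names median and speedup_gate are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the 1.6x median-speedup gate (assumed arithmetic, not the harness code).
# Inputs are per-run elapsed values in one unit; lower is faster.

# median VALUES...: print the median (average of the middle pair for even counts).
median() {
  printf '%s\n' "$@" | sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR % 2) { print v[(NR + 1) / 2] }
      else { printf "%.3f\n", (v[NR / 2] + v[NR / 2 + 1]) / 2 }
    }'
}

# speedup_gate SMP1_RUNS... -- SMP2_RUNS...
# Prints median(smp1) / median(smp2) to three decimals; succeeds only if >= 1.6.
speedup_gate() {
  local base=() scaled=() side=base v m1 m2
  for v in "$@"; do
    if [ "$v" = "--" ]; then side=scaled; continue; fi
    if [ "$side" = base ]; then base+=("$v"); else scaled+=("$v"); fi
  done
  m1=$(median "${base[@]}")
  m2=$(median "${scaled[@]}")
  awk -v a="$m1" -v b="$m2" 'BEGIN {
    s = a / b
    printf "%.3f\n", s
    exit !(s >= 1.6)   # exit status 0 means the gate passed
  }'
}
```

Fed the published capOS medians as single-run inputs, speedup_gate 1639 -- 875 prints 1.873 and succeeds.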

Current capOS evidence passes the milestone speedup and smoke gates. Both comparison rows report five runs of the primes 2..3_000_000 workload with balanced contiguous splits. The nested QEMU/KVM rows pin QEMU to host CPUs 0,1,2,3. The capOS row is a controlled capos-bench comparison run retained for follow-up analysis, not a 4-core scaling claim; the Linux row is retained as the matching guest baseline.

| System | smp1 median | smp2 median | smp4 median | 1-to-2 speedup | 1-to-4 speedup |
|---|---|---|---|---|---|
| capOS | 1,639 scaled cycles | 875 scaled cycles | 1,111 scaled cycles | 1.873x | 1.475x |
| Linux | 1,275,187,210 ns | 659,218,025 ns | 337,877,986 ns | 1.934x | 3.774x |

The accepted local milestone closeout remains recorded in target/smp-process-scale/cycle-balanced-default/; the comparison table above keeps only the controlled capos-bench reruns. The elapsed unit is the worker-side user-mode cycle counter shifted right by 20 bits, so the proof is not quantized by the 100 Hz kernel tick.

The capos-bench reruns were recorded in this environment:

  • machine: GCE n2-highcpu-8 in europe-west3-b, using nested QEMU/KVM on an Ubuntu host running kernel 6.17.0-1012-gcp;
  • source: commit 0d89a91b (2026-04-30 11:09 UTC);
  • tools: QEMU 8.2.2 and Rust nightly 1.97.0-nightly (c935696dd 2026-04-29);
  • host CPU topology: logical CPUs 0,1,2,3 are distinct physical cores, with SMT siblings 4,5,6,7.

Raw capOS artifacts are under target/smp-process-scale/capos-bench-pinned-20260430T1113Z/; raw Linux baseline artifacts are under target/linux-smp-process-scale/capos-bench-pinned-20260430T1118Z/.

The capOS and Linux medians intentionally use different elapsed units. capOS reports a worker-side user-mode cycle counter, scaled by shifting right 20 bits, because the current kernel tick is too coarse for this benchmark. The Linux comparison reports guest clock_gettime elapsed nanoseconds. Do not compare the absolute capOS cycle values with Linux nanoseconds; compare speedup ratios within each system row.
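The scaled unit is plain integer arithmetic. The sketch below assumes two raw readings of the worker's cycle counter (start and end) and applies the 20-bit right shift described above; how capOS reads the counter and reports it is not modeled, and scaled_cycles is a hypothetical name.

```shell
#!/usr/bin/env bash
# Illustrative conversion to the capOS "scaled cycles" unit: the elapsed
# cycle-counter delta shifted right by 20 bits, i.e. whole units of 2^20 cycles.
# Counter source and reporting path are assumptions, not modeled here.
scaled_cycles() {
  local start=$1 end=$2
  echo $(( (end - start) >> 20 ))   # bash arithmetic is 64-bit signed
}
```

For example, a delta of 1,718,616,064 cycles (1,639 x 1,048,576) scales to 1,639. Assuming the counter advances at a multi-GHz rate, one scaled unit is well under a millisecond, far finer than the 10 ms period of a 100 Hz tick; only ratios between runs in the same row are meaningful.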

The capos-bench 4-vCPU case is deliberately reported but not accepted as a capOS scaling claim: smp4 was faster than the 1-vCPU baseline (1.475x) but slower than the 2-vCPU case (1,111 versus 875 scaled cycles). Linux continued scaling through 4 vCPUs under the same pinning and workload (3.774x). The current capOS milestone gate remains the repeated 1-to-2 speedup proof; explaining and improving the 4-vCPU behavior is follow-on SMP scheduler/runtime work.

The GCE host exposes eight logical CPUs, but only four physical cores. An 8-vCPU row is therefore an SMT diagnostic, not 8-core evidence. The capOS and Linux process-scale harnesses define a separately labeled opt-in smp8-smt case with eight contiguous worker ranges over the same 2..3_000_000 workload. It prints workers=8 and cpus=8 when run under -smp 8, but it is informational on 4-core/8-thread hosts and is not part of the accepted 1-to-2 speedup threshold.
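The harness's actual balancing rule for contiguous ranges is not described on this page. As one hypothetical way to make contiguous ranges carry similar work, the sketch below uses a crude cost model: each candidate n costs roughly sqrt(n) trial divisions, so cumulative cost up to x grows like x^1.5, and each range boundary is placed at an equal share of that cost. The model overstates the cost of composites, and balanced_split is an illustrative name.

```shell
#!/usr/bin/env bash
# Hypothetical cost-balanced contiguous split of candidates LO..HI across N workers.
# Assumed cost model: candidate n costs about sqrt(n), so cumulative cost up to x
# is about (2/3) * x^1.5; the 2/3 factor cancels when splitting into equal shares.
balanced_split() {
  awk -v lo="$1" -v hi="$2" -v n="$3" 'BEGIN {
    a = lo ^ 1.5
    b = (hi + 1) ^ 1.5                # cost integrated over [lo, hi + 1)
    start = lo
    for (k = 1; k <= n; k++) {
      # Boundary x_k solves x_k^1.5 = a + (b - a) * k / n; the last worker ends at hi.
      last = (k == n) ? hi : int((a + (b - a) * k / n) ^ (2 / 3)) - 1
      printf "worker %d: %d..%d\n", k - 1, start, last
      start = last + 1
    }
  }'
}
```

Under this model, balanced_split 2 3000000 2 puts the first boundary near 1.89 million rather than at the 1.5 million midpoint an equal-count split would use, because later candidates are more expensive.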

Informational smp8-smt evidence from capos-bench at commit 7c15dd47 (2026-04-30 11:45 UTC) used nested QEMU/KVM pinned to logical CPUs 0,1,2,3,4,5,6,7. Both rows report five runs. The capOS row is an informational SMT diagnostic with no 8-core claim; the Linux row is the matching guest diagnostic baseline.

| System | smp1 median | smp2 median | smp4 median | smp8-smt median |
|---|---|---|---|---|
| capOS | 1,500 scaled cycles | 787 scaled cycles | 1,052 scaled cycles | 1,595 scaled cycles |
| Linux | 1,274,507,854 ns | 647,611,418 ns | 337,479,795 ns | 198,903,231 ns |

| System | 1-to-2 speedup | 1-to-4 speedup | 1-to-8 speedup |
|---|---|---|---|
| capOS | 1.906x | 1.426x | 0.940x |
| Linux | 1.968x | 3.777x | 6.408x |

Raw capOS SMT artifacts are under target/smp-process-scale/capos-bench-smt8-20260430T1148Z/; raw Linux SMT artifacts are under target/linux-smp-process-scale/capos-bench-smt8-20260430T1151Z/.

The prior Linux comparison for the earlier static-split workload recorded 1.632x 1-to-2 speedup and 2.62x 1-to-4 speedup, but that row is historical only because the workload partitioning changed.

Both SMP harnesses can pin QEMU to a host CPU set with taskset: CAPOS_SMP_SCALE_QEMU_TASKSET_CPUS=<cpus> for capOS and LINUX_SMP_SCALE_QEMU_TASKSET_CPUS=<cpus> for Linux. The summary logs record the configured CPU set. Pinning QEMU is not the same as isolating CPUs from other host work; publishable isolated-CPU runs must also document the host isolation mechanism, such as boot-time isolcpus/nohz_full/rcu_nocbs or a cpuset/systemd CPU-affinity policy. Pinning QEMU to fewer host logical CPUs than the guest -smp count is functional-only oversubscription, not speedup evidence; 4-vCPU claims require at least four suitable host CPUs, preferably four physical cores.
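The oversubscription rule can be checked before a run. The sketch below is hypothetical (expand_cpus and check_pinning are not part of either harness): it expands a taskset-style CPU list such as 0,1 or 0-3 with seq and reports a run as functional-only when the list names fewer host CPUs than the guest -smp count. It counts logical CPUs only; whether they are distinct physical cores, and whether they are isolated from other host work, still has to be checked separately.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for taskset CPU lists such as "0,1", "0-3", or "0-3,8".

# expand_cpus LIST: print one CPU number per line.
expand_cpus() {
  local part
  local IFS=,
  for part in $1; do
    if [[ $part == *-* ]]; then
      seq "${part%-*}" "${part#*-}"
    else
      echo "$part"
    fi
  done
}

# check_pinning LIST SMP: fail when LIST has fewer distinct host CPUs than SMP.
check_pinning() {
  local cpus=$1 smp=$2 count
  count=$(( $(expand_cpus "$cpus" | sort -un | wc -l) ))
  if [ "$count" -lt "$smp" ]; then
    echo "functional-only: $count host CPUs for -smp $smp" >&2
    return 1
  fi
  echo "ok: $count host CPUs for -smp $smp"
}
```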

The passing capOS run closes the focused speedup gate. The milestone closeout also reran ordinary run-smoke and run-spawn coverage under -smp 2, with logs in target/smp2-smokes/, covering the default manifest, ring, thread lifecycle, park cleanup, generic child waits, and process exit.

The capos-bench rows are still nested-QEMU guest evidence, not proof that capOS boots directly on cloud VM hardware. Once capOS first boots on a real cloud VM, rerun the benchmark profiles that the booted hardware profile supports. Cloud reruns must record the provider, instance type, cloud image identity, firmware/device model, CPU topology, and tool/provenance bundle, kept separate from local QEMU/KVM evidence.
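A provenance record like the one described above could be captured as simple key=value lines. This is a hypothetical sketch: the BENCH_* variable names and the output layout are illustrative, the cloud fields are supplied by the operator, and only the tool, kernel, and CPU-count fields are probed on the host (as "unknown" when a tool such as git, qemu-system-x86_64, or rustc is missing).

```shell
#!/usr/bin/env bash
# Hypothetical provenance record for a benchmark rerun, written as key=value lines.
# BENCH_* names are illustrative; operators fill in the cloud and topology fields.

# probe CMD...: first line of CMD's output, or "unknown" if it fails or prints nothing.
probe() {
  local v
  v=$("$@" 2>/dev/null | head -n 1)
  echo "${v:-unknown}"
}

# write_provenance FILE: write the record to FILE.
write_provenance() {
  {
    echo "provider=${BENCH_PROVIDER:-unknown}"
    echo "instance_type=${BENCH_INSTANCE_TYPE:-unknown}"
    echo "cloud_image=${BENCH_CLOUD_IMAGE:-unknown}"
    echo "firmware_device_model=${BENCH_FIRMWARE:-unknown}"
    echo "cpu_topology=${BENCH_CPU_TOPOLOGY:-unknown}"
    echo "commit=$(probe git rev-parse --short HEAD)"
    echo "host_kernel=$(uname -r)"
    echo "qemu=$(probe qemu-system-x86_64 --version)"
    echo "rustc=$(probe rustc --version)"
    echo "logical_cpus=$(getconf _NPROCESSORS_ONLN)"
    echo "recorded_utc=$(date -u +%Y-%m-%dT%H:%MZ)"
  } > "$1"
}
```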

Next CPU-Scaling Workload

Prime counting was sufficient to prove that independent worker processes can run on multiple CPUs, but it is not the best workload for future scaling claims: the per-candidate trial-division cost grows with the candidate value, so equal-width ranges do not carry equal work. The next CPU-scaling profile should use fixed-size independent chunks, preferably a parallel hash/checksum workload:

  • each worker receives the same number of bytes or blocks;
  • the timed region performs no syscalls and writes only a private result slot;
  • the verifier combines per-chunk hashes into a deterministic root hash;
  • the Linux comparison uses the same corpus, chunk size, and root-hash rule;
  • 4-vCPU results are published only on hosts with at least four suitable host CPUs, preferably four physical cores or a documented cloud topology.

This shape is more realistic than an artificial arithmetic loop because it matches content-addressed storage, package verification, boot-package integrity, and artifact validation.
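The verifier side of such a profile could look like the host-side shell sketch below. The root-hash rule (SHA-256 over the newline-separated chunk digests, in chunk order), the 1 MiB default chunk size, and the name root_hash are assumptions for illustration, not a settled capOS format. Hashing chunks concurrently with xargs -P and then sorting by chunk name shows why the root stays deterministic regardless of completion order.

```shell
#!/usr/bin/env bash
# Hypothetical verifier-side root hash for a chunked checksum workload.
# Assumed rule: split CORPUS into fixed-size chunks, SHA-256 each chunk
# independently (up to JOBS at a time), then SHA-256 the chunk digests in chunk order.
# Requires GNU coreutils split/sha256sum; CORPUS must be non-empty.
root_hash() {
  local corpus=$1 chunk_bytes=${2:-1048576} jobs=${3:-1} dir
  dir=$(mktemp -d)
  # Fixed-width numeric suffixes make filename order equal chunk order.
  split -b "$chunk_bytes" -d -a 6 "$corpus" "$dir/chunk."
  printf '%s\n' "$dir"/chunk.* \
    | xargs -P "$jobs" -n 1 sha256sum \
    | LC_ALL=C sort -k 2 \
    | cut -d ' ' -f 1 \
    | sha256sum \
    | cut -d ' ' -f 1
  rm -rf "$dir"
}
```

A Linux comparison would then only need the same corpus, chunk size, and combination rule to produce a byte-identical root.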

Commands

Run the capOS proof:

make run-smp-process-scale

After confirming the host CPU topology, run the capOS proof with QEMU pinned to one logical CPU from each of two physical cores:

CAPOS_SMP_SCALE_QEMU_TASKSET_CPUS=0,1 make run-smp-process-scale
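The topology confirmation mentioned above can be sketched with the Linux sysfs topology files. This is a hypothetical helper (distinct_cores and the SYSFS_CPU override are illustrative; bash 4+ is assumed for the associative array): it accepts a comma-separated CPU list and fails if any two listed CPUs share a physical package and core ID, which is what SMT siblings look like.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: confirm that a comma-separated list of host logical
# CPUs (for example "0,1") lands on distinct physical cores. Reads Linux sysfs
# topology files; SYSFS_CPU can point at another tree for testing.
distinct_cores() {
  local root=${SYSFS_CPU:-/sys/devices/system/cpu} cpu key topo
  local -A seen=()
  local IFS=,
  for cpu in $1; do
    topo=$root/cpu$cpu/topology
    key="$(cat "$topo/physical_package_id"):$(cat "$topo/core_id")"
    if [ -n "${seen[$key]:-}" ]; then
      echo "cpu$cpu shares package:core $key with cpu${seen[$key]}" >&2
      return 1
    fi
    seen[$key]=$cpu
    echo "cpu$cpu -> package:core $key"
  done
}
```

On a host like the capos-bench machine described above, distinct_cores 0,1,2,3 should succeed, while any list that pairs a CPU with its SMT sibling fails.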

On a host that reports at least eight logical CPUs, run the capOS proof with the optional smp8-smt diagnostic. When collecting controlled evidence, pin QEMU to the same eight logical CPUs used for the Linux diagnostic:

CAPOS_SMP_SCALE_INCLUDE_SMT=1 \
  CAPOS_SMP_SCALE_QEMU_TASKSET_CPUS=0,1,2,3,4,5,6,7 \
  make run-smp-process-scale

Run the Linux comparison with a readable kernel image:

LINUX_SMP_SCALE_KERNEL=target/linux-smp-process-scale/kernel/vmlinuz \
  tools/linux-smp-process-scale-baseline.sh

Run the Linux comparison with QEMU pinned to the same host CPUs:

LINUX_SMP_SCALE_KERNEL=target/linux-smp-process-scale/kernel/vmlinuz \
  LINUX_SMP_SCALE_QEMU_TASKSET_CPUS=0,1 \
  tools/linux-smp-process-scale-baseline.sh

Run the matching Linux smp8-smt diagnostic:

LINUX_SMP_SCALE_KERNEL=target/linux-smp-process-scale/kernel/vmlinuz \
  LINUX_SMP_SCALE_INCLUDE_SMT=1 \
  LINUX_SMP_SCALE_QEMU_TASKSET_CPUS=0,1,2,3,4,5,6,7 \
  tools/linux-smp-process-scale-baseline.sh

On hosts where /boot/vmlinuz is not readable by the current user, first copy a kernel image into the ignored target/ directory through the host’s normal administrative path, then pass that copy as LINUX_SMP_SCALE_KERNEL. The script does not invoke sudo itself.
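One possible operator-side staging step is sketched below; it runs separately from the harness, which never calls sudo. The default source path assumes the Debian/Ubuntu /boot/vmlinuz-<release> naming, the privileged install fallback is only one example of an administrative path, and stage_kernel is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical one-time staging of a host kernel image into ignored target/ storage.
# Default source assumes Debian/Ubuntu naming; prints the staged path on success.
stage_kernel() {
  local src=${1:-/boot/vmlinuz-$(uname -r)}
  local dest=target/linux-smp-process-scale/kernel/vmlinuz
  mkdir -p "${dest%/*}"
  if [ -r "$src" ]; then
    cp "$src" "$dest" || return 1
  else
    # Example privileged path: leaves a root-owned but world-readable copy.
    sudo install -m 0644 "$src" "$dest" || return 1
  fi
  echo "$dest"
}
```

Usage, from the repository root: LINUX_SMP_SCALE_KERNEL=$(stage_kernel) tools/linux-smp-process-scale-baseline.sh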

Both harnesses keep per-run QEMU logs and CSV summaries under target/. Benchmark artifacts are intentionally ignored build outputs; publishable results must be summarized in source documentation with the relevant commit, workload, run count, QEMU/KVM envelope, and pass/fail status.