In-Process Threading Contract

This page records the implemented contract for kernel-managed threads inside one process. The park authority contract is frozen separately in Park Authority. These pages are the handoff from the initial single-thread runtime checkpoint to same-process SMP work.

The current slice has per-thread completion rings for spawned child threads, per-CPU WFQ run queues with bounded stealing, a caller-thread-bound SchedulingPolicyCap, and a SchedulingContext cap that records identity, bind/revoke, dispatcher budget charging/replenishment, bounded endpoint donation/return, and fixed depletion/deadline notification cells.

Same-process sibling scheduling has formal accepted 1-to-2 evidence on capos-bench 2026-05-02 21:38 UTC against main commit 374f8556 (capOS work 1.883x / total 1.787x, both clearing the configured 1.6x gates; matching Linux pthread baseline 1.988x/1.987x on the same physical-core pin set). The 2026-05-02 1-to-4 row was the diagnostic that justified Phase D’s fair-share enqueue policy: capOS sat at 1.566x/1.538x while Linux scaled to 3.963x/3.858x. Phase D now runs per-CPU WFQ queues with bounded stealing and manually accepted the 2026-05-10 1-to-4 diagnostic row (3.088x/2.700x) while the harness-enforced gate remains 1-to-2 work/total speedup; see docs/benchmarks.md for the full evidence table including historical pre-collapse rows.

Phase F has landed the one-SQ-consumer prerequisite, nohz telemetry, housekeeping/deferred-work placement, the clockevent/deadline substrate, and bounded SQPOLL ring mode including the non-periodic SQPOLL producer-wake progress path. The first automatic nohz activation increment is closed via docs/tasks/done/2026/scheduler-phase-f-auto-nohz-activation.md, and SQPOLL-driven auto-nohz activation is also closed via docs/tasks/done/2026/scheduler-phase-f-auto-nohz-sqpoll.md. Generic full-nohz for ordinary budgeted compute leases and timeout-based auto-revoke are landed; policy-service AutoNoHz issuance remains future work.

Scope

The threading milestone changes the scheduler’s unit of execution from process to thread while keeping the process as the authority, address-space, and resource-accounting boundary. Same-process sibling scheduling on multiple CPUs is functional for per-thread-ring processes.

The accepted 1-to-2 performance claim is the formal capos-bench 5-run pair recorded on 2026-05-02 21:38 UTC against main commit 374f8556: capOS work 1.883x and total 1.787x clear the configured 1.6x gates; the matching Linux pthread baseline on the same physical-core pin set (0,1,2,3) records 1.988x/1.987x, validating the workload shape. The 2026-05-02 1-to-4 row was the diagnostic that justified Phase D: capOS sat at 1.566x/1.538x while Linux scaled to 3.963x/3.858x. Phase D now runs per-CPU WFQ queues with bounded stealing and its 2026-05-10 1-to-4 row (3.088x/2.700x) was manually accepted from recorded diagnostics; the harness-enforced gate remains 1-to-2 work/total speedup. Historical pre-collapse rows and the post-collapse 3-run diagnostic remain in docs/benchmarks.md for reference.

Phase E adds the SchedulingContext cap (identity, caller-thread bind, revoke, budget charging/replenishment, bounded synchronous endpoint donation/return, and fixed depletion/deadline notification cells with drain observer results), and Phase F has landed the bounded SQPOLL ring mode plus the clockevent/deadline substrate. Policy-service AutoNoHz issuance, realtime admission, and privileged userspace scheduler-policy services remain later work.

This contract covers:

  • process-owned versus thread-owned state;
  • the initial thread creation ABI;
  • per-thread FS-base/TLS rules;
  • thread exit and join semantics;
  • the per-thread ring blocking and completion-routing contract;
  • the caller-thread-bound SchedulingPolicyCap and SchedulingContext surfaces that mutate per-thread WFQ weight/latency-class and per-thread scheduling-context binding;
  • the handoff to the 7.1.1 park authority design.

Ownership Split

The process remains the security boundary. All threads in one process share the same address space and capability table, so a thread has the same authority as its sibling threads.

| Process-owned state | Thread-owned state |
| --- | --- |
| Process id and process generation | Thread id and thread generation |
| User address space and CR3 | Saved CPU context and user register state |
| Capability table and resource ledger | Kernel stack and syscall stack top |
| Initial compatibility ring and ring arena ownership | Per-thread ring endpoint, scratch, and FS base |
| Read-only CapSet page | Scheduling/blocking state |
| ProcessHandle exit state | ThreadHandle join/exit state |
| Endpoint owner state and process-wide cleanup hooks | WFQ weight, latency class, virtual runtime, and virtual_finish_ns enqueue tag |
| Process-wide resource ledgers (thread records, kernel stacks, cap-table slots) | SchedulingContext binding (identity/generation, remaining budget, replenish/deadline timestamps, donation/return slot, notification recorder) |

The implementation migrated incrementally. The 7.2.0 slice made each process contain a single initial Thread, with saved context, kernel stack, FS base, and blocking state stored on that thread. Later slices changed scheduler-owned queues, current execution, direct IPC handoff, and wake records to generation-checked ThreadRef values, added creation and lifecycle caps, and then assigned per-thread rings to spawned children.

Scheduler Contract

Scheduler stores runnable execution contexts as thread references, not process ids. A thread reference is (pid, process_generation, tid, thread_generation). The process generation keeps handles from naming a reused process; the thread generation keeps handles from naming a reused thread slot inside a live process.

This identity applies to Scheduler.current, run queues, direct IPC targets, Timer sleep waiters, process/terminal waiters, endpoint caller/receiver wake records, and deferred cancellation state.

Runnable ownership is split across per-CPU run queues (SCHEDULER_CPUS = 4). Each queue is ordered ascending by virtual_finish_ns, which is recomputed per enqueue from virtual_runtime_ns, the thread’s WFQ weight (clamped to [MIN_WEIGHT, MAX_WEIGHT] in capos-abi::scheduler), and a per-class slice scaled by LatencyClass (Interactive divides the slice, Batch multiplies it, Normal/IpcServer pass it through). Default placement targets the current CPU; a bounded steal path balances when a CPU’s local queue is empty, recomputes the WFQ tag at the destination, and records placement-spread / steal migrations under the measure feature. Each per-CPU queue is reserved at thread-create time to the live runnable-capable thread count so timer-tick, unblock, direct-IPC fallback, and steal-requeue paths never allocate.
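The enqueue rule above can be sketched in a few lines. This is a minimal illustration, not the kernel's code: the constant values, `BASE_SLICE_NS`, `CLASS_FACTOR`, the exact tag formula, and the `RunQueue` type are all assumptions made for the example. Only the names `MIN_WEIGHT`, `MAX_WEIGHT`, `REFERENCE_WEIGHT`, `LatencyClass`, and `virtual_finish_ns` come from this page.

```rust
// Hypothetical values; the real constants live in capos-abi::scheduler.
pub const MIN_WEIGHT: u16 = 1;
pub const MAX_WEIGHT: u16 = 1024;
pub const REFERENCE_WEIGHT: u64 = 100;
pub const BASE_SLICE_NS: u64 = 4_000_000; // assumed base slice
pub const CLASS_FACTOR: u64 = 2; // assumed divide/multiply factor

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LatencyClass {
    Normal,
    Interactive,
    IpcServer,
    Batch,
}

/// Per-class slice: Interactive divides, Batch multiplies, the rest pass through.
pub fn class_slice_ns(class: LatencyClass) -> u64 {
    match class {
        LatencyClass::Interactive => BASE_SLICE_NS / CLASS_FACTOR,
        LatencyClass::Batch => BASE_SLICE_NS * CLASS_FACTOR,
        LatencyClass::Normal | LatencyClass::IpcServer => BASE_SLICE_NS,
    }
}

/// Assumed tag formula: virtual runtime plus the weight-scaled class slice.
pub fn virtual_finish_ns(virtual_runtime_ns: u64, weight: u16, class: LatencyClass) -> u64 {
    let w = u64::from(weight.clamp(MIN_WEIGHT, MAX_WEIGHT));
    virtual_runtime_ns.saturating_add(class_slice_ns(class) * REFERENCE_WEIGHT / w)
}

/// One per-CPU queue kept in ascending tag order; entries are (tag, tid).
pub struct RunQueue {
    entries: Vec<(u64, u32)>,
}

impl RunQueue {
    /// Reserve capacity up front so later enqueues do not allocate.
    pub fn with_reserved(capacity: usize) -> Self {
        Self { entries: Vec::with_capacity(capacity) }
    }

    pub fn enqueue(&mut self, tag: u64, tid: u32) {
        // Insert after equal tags so ties stay in arrival order.
        let pos = self.entries.partition_point(|&(t, _)| t <= tag);
        self.entries.insert(pos, (tag, tid));
    }

    pub fn pop_next(&mut self) -> Option<u32> {
        if self.entries.is_empty() {
            None
        } else {
            Some(self.entries.remove(0).1)
        }
    }
}
```

Under these assumed numbers, a thread with twice the reference weight gets a tag half as far in the future and therefore sorts ahead of an equal-runtime sibling.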

The run queue, current, direct IPC target, and blocked waiter scans are thread-oriented. Address-space switches happen only when the next runnable thread belongs to a different process. TSS.RSP0, the syscall kernel stack, and FS base are updated on every thread switch because those are thread-local machine resources. Per-thread runtime_ns advances 1:1 with elapsed CPU time; virtual_runtime_ns advances by elapsed_ns * REFERENCE_WEIGHT / weight so weight changes the cumulative WFQ share rather than just an enqueue tie-breaker.
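As a worked instance of the accounting rule, the sketch below charges the same elapsed time to two threads with different weights. The `ThreadAccounting` type and the `REFERENCE_WEIGHT` value of 100 are assumptions for illustration; the formula is the one stated above.

```rust
pub const REFERENCE_WEIGHT: u64 = 100; // hypothetical value

#[derive(Debug, Default)]
pub struct ThreadAccounting {
    pub runtime_ns: u64,
    pub virtual_runtime_ns: u64,
}

impl ThreadAccounting {
    /// Charge one run interval: real runtime advances 1:1, virtual runtime
    /// advances by elapsed_ns * REFERENCE_WEIGHT / weight.
    pub fn charge(&mut self, elapsed_ns: u64, weight: u16) {
        let w = u128::from(weight.max(1)); // guard against division by zero
        self.runtime_ns = self.runtime_ns.saturating_add(elapsed_ns);
        // Use a 128-bit intermediate so the multiplication cannot overflow.
        let scaled = u128::from(elapsed_ns) * u128::from(REFERENCE_WEIGHT) / w;
        let scaled = u64::try_from(scaled).unwrap_or(u64::MAX);
        self.virtual_runtime_ns = self.virtual_runtime_ns.saturating_add(scaled);
    }
}
```

With these numbers, 1 ms of CPU time costs a weight-100 thread 1 ms of virtual runtime and a weight-200 thread only 0.5 ms, so the heavier thread accumulates share more slowly and is picked more often over time.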

SchedulingContext bindings layer dispatcher budget on top of WFQ. A thread may carry at most one SchedulingContextThreadBinding. While bound, the dispatcher charges elapsed time against the binding’s remaining_budget_ns, replenishes from period_ns at the next replenish boundary, records deadline_or_timeout and budget_depleted notifications in the per-context fixed cells, and routes synchronous endpoint donation/return for passive receiver threads (donated_holder in the notification snapshot tracks whether the holder is the donor or the receiver). Stale-generation or revoked caps fail closed before mutating scheduler state. Realtime-island admission, CPU placement enforcement, and overrun-fault policy remain deferred.
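The charge/replenish cycle can be pictured with a small model. This is a sketch under assumptions: `BudgetBinding`, `budget_ns`, `next_replenish_ns`, `budget_depleted_events`, and the split into `charge` and `replenish_if_due` are hypothetical; only `remaining_budget_ns` and `period_ns` are names from this page. Donation/return and deadline notifications are left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChargeOutcome {
    Running,
    Depleted,
}

#[derive(Debug)]
pub struct BudgetBinding {
    pub budget_ns: u64, // hypothetical: full budget granted per period
    pub period_ns: u64,
    pub remaining_budget_ns: u64,
    pub next_replenish_ns: u64, // hypothetical: next replenish boundary
    pub budget_depleted_events: u64, // stand-in for the fixed notification cell
}

impl BudgetBinding {
    pub fn new(budget_ns: u64, period_ns: u64, now_ns: u64) -> Self {
        Self {
            budget_ns,
            period_ns,
            remaining_budget_ns: budget_ns,
            next_replenish_ns: now_ns.saturating_add(period_ns),
            budget_depleted_events: 0,
        }
    }

    /// Charge elapsed CPU time; record one depletion event per depletion.
    pub fn charge(&mut self, elapsed_ns: u64) -> ChargeOutcome {
        if self.remaining_budget_ns == 0 {
            return ChargeOutcome::Depleted; // already depleted, do not recount
        }
        self.remaining_budget_ns = self.remaining_budget_ns.saturating_sub(elapsed_ns);
        if self.remaining_budget_ns == 0 {
            self.budget_depleted_events += 1;
            ChargeOutcome::Depleted
        } else {
            ChargeOutcome::Running
        }
    }

    /// Refill the budget once the replenish boundary has passed.
    pub fn replenish_if_due(&mut self, now_ns: u64) -> bool {
        if self.period_ns == 0 || now_ns < self.next_replenish_ns {
            return false;
        }
        // Skip over any boundaries that were missed entirely.
        let missed = (now_ns - self.next_replenish_ns) / self.period_ns + 1;
        self.next_replenish_ns = self
            .next_replenish_ns
            .saturating_add(missed.saturating_mul(self.period_ns));
        self.remaining_budget_ns = self.budget_ns;
        true
    }
}
```

The counter is updated in place with no allocation, which mirrors the fixed-cell notification design described above.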

The idle path is a per-CPU CPL0 (kernel-mode) idle thread; the former special user-mode idle process has been removed. Each CPU’s idle thread is a kernel-owned execution context — it runs on the kernel PML4 with a dedicated idle kernel stack and cannot block, exit, or hold ordinary caps. A lightweight synthetic idle Process record is retained per CPU only so the idle ThreadRef resolves through scheduler bookkeeping; it maps no user code, stack, or cap ring. See the “Idle paths” section of docs/architecture/scheduling.md.

Phase F has landed the one-SQ-consumer prerequisite, nohz telemetry, housekeeping/deferred-work placement, the clockevent/deadline substrate, and a bounded SQPOLL ring-mode worker (MAX_SQPOLL_WORKERS = 16, request_sqpoll_start_for_thread / finalize_pending_sqpoll_start_for_thread with stale-owner rollback). Tick suppression now exists behind explicit CpuIsolationLease admission, including ordinary budgeted compute leases that target a live SchedulingContext; policy-service AutoNoHz issuance and generic SQPOLL nohz for arbitrary rings remain future work.

Thread Creation ABI

Thread creation is exposed through a process-local ThreadSpawner capability. It creates threads only in the caller’s current process. It does not grant authority to another process and is non-transferable across IPC in the initial implementation.

The initial control-plane shape is:

interface ThreadSpawner {
    create @0 (
        entry :UInt64,
        stackTop :UInt64,
        arg :UInt64,
        fsBase :UInt64,
        flags :UInt64
    ) -> (handleIndex :UInt16);
}

interface ThreadHandle {
    join @0 () -> (exitCode :Int64);
    exitCode @1 () -> (exited :Bool, exitCode :Int64);
}

interface ThreadControl {
    getFsBase @0 () -> (fsBase :UInt64);
    setFsBase @1 (fsBase :UInt64) -> ();
    exitThread @2 (code :Int64) -> ();
}

Any 7.2 schema adjustment must update this page in the same branch before implementation review. The stable semantics are that creation is in-process, the returned handle is an observed result cap, ThreadHandle observes one thread rather than the whole process, and current-thread exit is available through both ThreadControl.exitThread and the raw exit(code) syscall.

The new thread starts in Ring 3 at entry with:

  • RDI = arg;
  • RSI = tid;
  • RDX = pid;
  • RCX = the new thread's own ring address;
  • R8 = CAPSET_VADDR, or zero if the process has no CapSet.

The runtime supplies the user stack and TLS block. The kernel validates that entry, stackTop, and fsBase are user-canonical, that stackTop is 16-byte aligned at entry, and that reserved flags bits are zero. Page presence and stack-growth policy remain process address-space questions; before a page-fault subsystem exists, an invalid thread stack can fault the process.
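A minimal sketch of those argument checks follows. It assumes x86_64 4-level paging, where user-canonical means the lower half below `0x0000_8000_0000_0000`, and it assumes no flag bits are defined yet. `CreateError`, `KNOWN_FLAGS`, and `validate_create_args` are hypothetical names; the page only says that these failures surface as Failed.

```rust
/// Assumed upper bound of the user-canonical half (x86_64, 4-level paging).
pub const USER_CANONICAL_END: u64 = 0x0000_8000_0000_0000;
/// Hypothetical: no flag bits are defined, so every bit is reserved.
pub const KNOWN_FLAGS: u64 = 0;

#[derive(Debug, PartialEq, Eq)]
pub enum CreateError {
    BadEntry,
    BadStack,
    BadFsBase,
    ReservedFlags,
}

pub fn is_user_canonical(addr: u64) -> bool {
    addr < USER_CANONICAL_END
}

/// Check only the ABI fields; page presence is not checked here.
pub fn validate_create_args(
    entry: u64,
    stack_top: u64,
    fs_base: u64,
    flags: u64,
) -> Result<(), CreateError> {
    if !is_user_canonical(entry) {
        return Err(CreateError::BadEntry);
    }
    // The stack top must be user-canonical and 16-byte aligned at entry.
    if !is_user_canonical(stack_top) || stack_top % 16 != 0 {
        return Err(CreateError::BadStack);
    }
    if !is_user_canonical(fs_base) {
        return Err(CreateError::BadFsBase);
    }
    if flags & !KNOWN_FLAGS != 0 {
        return Err(CreateError::ReservedFlags);
    }
    Ok(())
}
```

A canonical but unmapped stack passes these checks, which matches the note above that an invalid thread stack can still fault the process later.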

Resource Accounting

Thread creation allocates kernel memory and is quota-backed by process-owned ledger state, not per-capability helper counters. The 7.2.0 checkpoint charges the initial thread during process creation; ThreadSpawner.create extends the same ledgers to additional threads. The ledger of record is:

  • PROCESS_THREAD_LIMIT, the maximum live or retained thread records in one process, initially 16;
  • PROCESS_THREAD_KERNEL_STACK_PAGES, initially matching the current per-thread kernel stack allocation size of 32 pages;
  • thread_records_used / thread_records_max;
  • thread_kernel_stack_pages_used / thread_kernel_stack_pages_max.

The initial process thread charges one thread record and one kernel-stack allocation during process creation. ThreadSpawner.create reserves a thread record and kernel-stack page budget before allocating the stack or publishing a ThreadHandle; every later failure rolls both reservations back before returning. Cap-slot reservation for the result handle remains charged to the existing process cap-table ledger.

Creation failures are controlled application exceptions. Thread count, kernel-stack budget, handle cap-slot exhaustion, and kernel stack allocation failure return Overloaded with a specific message and no partially runnable thread. Invalid entry, stack, FS base, or flags return Failed.
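The reserve-then-rollback discipline can be sketched as below. The ledger field names and the two constants come from this page; `ThreadLedger`, `CreateFailure`, the closure-based `create_thread`, the error strings, and the assumed page maximum of `PROCESS_THREAD_LIMIT * PROCESS_THREAD_KERNEL_STACK_PAGES` are illustrative assumptions.

```rust
pub const PROCESS_THREAD_LIMIT: u32 = 16;
pub const PROCESS_THREAD_KERNEL_STACK_PAGES: u32 = 32;

#[derive(Debug, PartialEq, Eq)]
pub enum CreateFailure {
    Overloaded(&'static str),
}

#[derive(Debug)]
pub struct ThreadLedger {
    pub thread_records_used: u32,
    pub thread_records_max: u32,
    pub thread_kernel_stack_pages_used: u32,
    pub thread_kernel_stack_pages_max: u32,
}

impl ThreadLedger {
    /// Process creation charges the initial thread up front.
    pub fn for_new_process() -> Self {
        Self {
            thread_records_used: 1,
            thread_records_max: PROCESS_THREAD_LIMIT,
            thread_kernel_stack_pages_used: PROCESS_THREAD_KERNEL_STACK_PAGES,
            // Assumed maximum: one full stack per permitted thread.
            thread_kernel_stack_pages_max: PROCESS_THREAD_LIMIT
                * PROCESS_THREAD_KERNEL_STACK_PAGES,
        }
    }

    /// Reserve both charges before anything is allocated or published.
    pub fn reserve_thread(&mut self) -> Result<(), CreateFailure> {
        if self.thread_records_used >= self.thread_records_max {
            return Err(CreateFailure::Overloaded("thread record limit reached"));
        }
        let pages = self.thread_kernel_stack_pages_used + PROCESS_THREAD_KERNEL_STACK_PAGES;
        if pages > self.thread_kernel_stack_pages_max {
            return Err(CreateFailure::Overloaded("kernel stack page budget exhausted"));
        }
        self.thread_records_used += 1;
        self.thread_kernel_stack_pages_used = pages;
        Ok(())
    }

    pub fn rollback_thread(&mut self) {
        self.thread_records_used = self.thread_records_used.saturating_sub(1);
        self.thread_kernel_stack_pages_used = self
            .thread_kernel_stack_pages_used
            .saturating_sub(PROCESS_THREAD_KERNEL_STACK_PAGES);
    }

    /// Run the later creation steps; undo both reservations if they fail.
    pub fn create_thread(
        &mut self,
        later_steps: impl FnOnce() -> Result<(), CreateFailure>,
    ) -> Result<(), CreateFailure> {
        self.reserve_thread()?;
        if let Err(e) = later_steps() {
            self.rollback_thread();
            return Err(e);
        }
        Ok(())
    }
}
```

Because the reservation happens first, a failed stack allocation leaves the ledger exactly as it was and no partially runnable thread exists.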

Thread exit releases the kernel stack only after the scheduler is running on a different kernel stack. The thread record remains charged while a live ThreadHandle, pending join waiter, or unjoined exit status can still observe it. Once the handle is released without a pending join, or once a one-shot join has consumed the status and no wait record pins it, the retained record charge is released. Process exit releases all thread records and stack charges once.

The off-stack property is enforced by an OffStackToken witness on every stack frame release path: the deferred per-thread drain calls Process::release_thread_kernel_stack, whole-process teardown calls Process::release_all_thread_kernel_stacks, and pre-publication rollback calls Process::rollback_created_thread. The token constructor is private to the scheduler module. Implicit Thread::Drop is deliberately not a release path; if a Thread value reaches its destructor with a nonzero stack, it fails closed by leaving the frames allocated instead of freeing a stack without an off-stack witness.
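The witness pattern relies on Rust module privacy: a type with a private field can only be constructed inside its defining module. The sketch below shows the idea with stand-in types. `Thread`, `release_kernel_stack`, and `drain_exited_thread` are hypothetical simplifications and not the kernel's actual `Process::release_thread_kernel_stack` path.

```rust
pub mod scheduler {
    /// Witness that the caller is no longer running on the stack being freed.
    /// The private field means only this module can construct one.
    pub struct OffStackToken {
        _private: (),
    }

    /// Simplified thread record that tracks its kernel stack page count.
    pub struct Thread {
        kernel_stack_pages: u32,
    }

    impl Thread {
        pub fn new(kernel_stack_pages: u32) -> Self {
            Self { kernel_stack_pages }
        }

        pub fn kernel_stack_pages(&self) -> u32 {
            self.kernel_stack_pages
        }

        /// Releasing the stack requires the witness; returns the pages freed.
        pub fn release_kernel_stack(&mut self, _token: &OffStackToken) -> u32 {
            std::mem::take(&mut self.kernel_stack_pages)
        }
    }

    /// Stand-in for the deferred drain that runs after the switch to another
    /// kernel stack. Only code in this module can mint the token.
    pub fn drain_exited_thread(thread: &mut Thread) -> u32 {
        let token = OffStackToken { _private: () };
        thread.release_kernel_stack(&token)
    }
}
```

Code outside `scheduler` can name `OffStackToken` but cannot build one, so any attempt to free a stack from an arbitrary call site fails to compile instead of failing at run time.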

FS Base And TLS

FS base is thread-owned. The existing ThreadControl.getFsBase and ThreadControl.setFsBase operations keep their names, but after threading they refer to the current thread, not the whole process. setFsBase continues to reject non-user-canonical values and writes the CPU FS-base MSR immediately when called by the running thread. Both methods route through context-aware dispatch (CapCallContext::caller_thread) so the operation always targets the caller, never a different thread; calling ThreadControl from a non-live caller returns ProcessFsBaseError::CallerNotLive.

The initial process thread uses the PT_TLS block installed by ELF loading. Additional threads receive an FS base from ThreadSpawner.create; the runtime is responsible for allocating and initializing each thread’s TLS/TCB data. There is no process-global FS base. Current-thread FS-base operations are useful for the single-thread runtime checkpoint, but they must not be treated as the final threading ABI for language runtimes. True multi-threaded Go or C/POSIX-like runtime support requires each ThreadRef to own a distinct TLS block and FS base.

Context switching must save the outgoing thread’s FS base and restore the next thread’s FS base even when both threads belong to the same process and no CR3 switch is needed.
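The switch rule can be modeled with plain data to show which machine state changes on a sibling switch. `ThreadState`, `CpuState`, and `switch_thread` are hypothetical stand-ins; the real path writes CR3, TSS.RSP0, and the FS-base MSR, and would compare the process generation as well as the pid.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ThreadState {
    pub pid: u32,
    pub tid: u32,
    pub cr3: u64,
    pub fs_base: u64,
    pub kernel_stack_top: u64,
}

/// Simulated per-CPU machine state.
#[derive(Debug, Default)]
pub struct CpuState {
    pub cr3: u64,
    pub fs_base: u64,
    pub rsp0: u64,
    pub cr3_writes: u32, // counts address-space switches for the example
}

pub fn switch_thread(cpu: &mut CpuState, outgoing: &mut ThreadState, next: &ThreadState) {
    // Always save the outgoing FS base; the thread may have changed it.
    outgoing.fs_base = cpu.fs_base;
    // Switch address space only when the owning process differs.
    // (Simplification: a real check would include the process generation.)
    if next.pid != outgoing.pid {
        cpu.cr3 = next.cr3;
        cpu.cr3_writes += 1;
    }
    // Thread-local machine resources are updated on every switch.
    cpu.rsp0 = next.kernel_stack_top;
    cpu.fs_base = next.fs_base;
}
```

Switching between two siblings leaves `cr3_writes` unchanged while still swapping the FS base and kernel stack top.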

Thread Identity In Waiters And Dispatch

The concrete identity type for in-process scheduling is:

ThreadRef {
    pid,
    process_generation,
    tid,
    thread_generation,
}

Process identity still governs authority and accounting, but wakeup and blocking state must name a thread. 7.2 changes context-aware capability dispatch so CapCallContext carries both the caller process id for authority checks and the caller ThreadRef for wake/cancel decisions. Existing pid-only records that can resume execution or write a caller CQE must be widened before multiple threads can run in one process.

The migration target is:

  • TimerSleepWaiter stores the sleeping ThreadRef and validates the generation before waking it;
  • endpoint CALL, RECV, RETURN target, deferred-cancel, current-caller, and direct IPC handoff records store the blocked or target ThreadRef;
  • terminal line input and any other ProcessWaiter consumer store the waiting ThreadRef and validate the generation before writing a CQE;
  • ProcessHandle.wait records the waiting ThreadRef while the handle still names the child process;
  • ThreadHandle.join records the waiting ThreadRef and the target ThreadRef;
  • cap_enter blocks the current ThreadRef on that thread’s ring endpoint;
  • process-exit cleanup cancels every waiter whose pid and process_generation match the exiting process, regardless of thread id.

A generation mismatch on wake or completion is a stale waiter and must be drained without writing to userspace. This mirrors current process-generation behavior and prevents one thread slot reuse from receiving another thread’s Timer, endpoint, join, or ring completion.
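The stale-waiter rule and the process-exit sweep can be sketched together. The field types on `ThreadRef`, the flat slice of live threads, and the `wake_waiter` / `cancel_for_process_exit` helpers are assumptions for illustration; this page does not give the kernel's field widths or data structures.

```rust
/// Field names follow this page; the integer widths are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ThreadRef {
    pub pid: u32,
    pub process_generation: u64,
    pub tid: u32,
    pub thread_generation: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum WakeOutcome {
    Woken,
    StaleDrained,
}

/// Wake only if the recorded reference matches a live thread exactly.
/// A mismatch in either generation is drained without touching userspace.
pub fn wake_waiter(live: &[ThreadRef], waiter: ThreadRef) -> WakeOutcome {
    if live.contains(&waiter) {
        WakeOutcome::Woken
    } else {
        WakeOutcome::StaleDrained
    }
}

/// Process-exit cleanup: drop every waiter of the exiting process,
/// regardless of thread id. Returns how many were cancelled.
pub fn cancel_for_process_exit(
    waiters: &mut Vec<ThreadRef>,
    pid: u32,
    process_generation: u64,
) -> usize {
    let before = waiters.len();
    waiters.retain(|w| !(w.pid == pid && w.process_generation == process_generation));
    before - waiters.len()
}
```

A waiter recorded against an older `thread_generation` of the same slot compares unequal and is drained, which is what keeps a reused thread slot from receiving another thread's completion.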

Exit And Join

The current exit(code) syscall terminates the current thread. This preserves single-thread process exit because the process exits when its last non-idle thread exits, and it avoids tearing down a shared address space while sibling threads are still current on other CPUs.

Thread exit does not add a new syscall. The initial implementation added ThreadControl.exitThread(code) as a terminal capability-ring operation on the current thread, with the same current-thread termination semantics as the raw syscall. A successful invocation does not post a CQE back to the exiting thread, because cap_enter will not return to that execution context. It records the exit code, wakes or completes any valid join waiter, and removes only the current thread from scheduling. If the last non-idle thread in a process exits through exit(code) or exitThread, the process exits with that thread’s code and completes the parent-facing ProcessHandle.

Whole-process termination remains a ProcessHandle operation. It releases the shared capability table, cancels process-owned endpoint state, removes timer/park/ring waiters for every thread in the process, and completes the parent-facing ProcessHandle after the process is no longer current on any CPU.

ThreadHandle.join is process-local and one-shot. If the target thread already exited and its status is retained, join returns its code immediately and marks the status joined. If it is still live, join blocks the caller’s thread until the target exits. Self-join returns Failed. A second waiter, join after a successful join, or join after detach returns Failed; it must not park an ambiguous waiter. ThreadHandle.exitCode is nonblocking and may observe the retained status while the handle is live, but it does not consume the one-shot join right.
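The one-shot join rules can be read as a small state machine. The sketch below is an assumption-laden model, not the kernel's implementation: `JoinState`, `Status`, `JoinResult`, `on_exit`, and `detach` are hypothetical names, thread ids stand in for `ThreadRef`, and what `exitCode` reports after a consumed join is not specified on this page (the sketch returns `None`).

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum JoinResult {
    Exited(i64),
    Blocked,
    Failed,
}

#[derive(Debug, PartialEq, Eq)]
enum Status {
    Live { waiter: Option<u32> },
    Exited(i64),
    Joined,
    Detached,
}

pub struct JoinState {
    target_tid: u32,
    status: Status,
}

impl JoinState {
    pub fn new(target_tid: u32) -> Self {
        Self { target_tid, status: Status::Live { waiter: None } }
    }

    pub fn join(&mut self, caller_tid: u32) -> JoinResult {
        if caller_tid == self.target_tid {
            return JoinResult::Failed; // self-join
        }
        match self.status {
            Status::Exited(code) => {
                self.status = Status::Joined; // consume the one-shot right
                JoinResult::Exited(code)
            }
            Status::Live { waiter: None } => {
                self.status = Status::Live { waiter: Some(caller_tid) };
                JoinResult::Blocked
            }
            // Second waiter, join after join, or join after detach.
            _ => JoinResult::Failed,
        }
    }

    /// Nonblocking observer; does not consume the join right.
    pub fn exit_code(&self) -> Option<i64> {
        match self.status {
            Status::Exited(code) => Some(code),
            _ => None,
        }
    }

    /// Target exits: returns the waiter to wake, if one is parked.
    pub fn on_exit(&mut self, code: i64) -> Option<u32> {
        match std::mem::replace(&mut self.status, Status::Joined) {
            Status::Live { waiter: Some(w) } => Some(w),
            Status::Live { waiter: None } => {
                self.status = Status::Exited(code);
                None
            }
            other => {
                self.status = other;
                None
            }
        }
    }

    /// Last handle released: no status is retained unless a waiter pins it.
    pub fn detach(&mut self) {
        match self.status {
            Status::Live { waiter: None } | Status::Exited(_) => self.status = Status::Detached,
            _ => {}
        }
    }
}
```

Returning `Failed` for every ambiguous case keeps at most one waiter recorded per target, which is the property the paragraph above requires.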

Releasing the last ThreadHandle before the target exits detaches the target: the thread continues to run, but no exit status is retained after it exits unless a join waiter already pins the state. Releasing the handle after exit but before join drops the retained status and releases the thread-record charge. A pending join waiter pins the handle state until completion or process exit, so cap release cannot create a use-after-free. The exiting thread’s kernel stack must not be freed while it is still executing on that stack; final process teardown performs an explicit token-gated stack release after another kernel stack is active, before the deferred Process value is dropped.

Fatal user faults remain process-fatal in the first implementation. Per-thread fault isolation can be designed later, after the basic scheduler and futex paths are stable.

Capability Ring And Blocking

The first Ring v2 implementation keeps the initial thread’s compatibility ring at RING_VADDR and gives each spawned child thread a kernel-chosen ring mapping inside the reserved process ring arena. Runtime-selected ring address ranges remain a later VirtualMemory reservation extension.

ThreadSpawner.create allocates a ring record and user mapping for the new thread, stores that mapping on the child ThreadRef, and passes the ring address in the child start registers. cap_enter blocks the current thread against that thread’s own CQ, so same-process sibling threads may block in cap_enter independently. Timer, endpoint, join, park, and cancellation paths must route completions by generation-checked ThreadRef to the target thread’s ring endpoint.

The runtime’s single-owner ring-client invariant remains local to each ring client. Well-formed userspace serializes submission and completion matching per thread ring through capos-rt; it must not have two consumers racing on the same SQ/CQ. The scheduler still refuses to run the exact same ThreadRef on two CPUs at once, but it no longer treats every multithreaded pid as tied to one scheduler CPU.

This is sufficient for functional same-process sibling scheduling. The formal accepted 1-to-2 make run-thread-scale capOS evidence is the capos-bench 2026-05-02 21:38 UTC pair (work 1.883x, total 1.787x, both clearing the configured 1.6x gates). The guest result row’s accepted field remains diagnostic; the host summary enforces the work-window and total-time gates, and refuses speedup enforcement unless CAPOS_THREAD_SCALE_QEMU_TASKSET_CPUS records the QEMU CPU pin set. Linux validates the repaired benchmark shape through four workers on physical cores (3.963x/3.858x). The 2026-05-02 capOS 4-worker row was diagnostic (1.566x/1.538x) and justified Phase D’s per-CPU WFQ queues plus bounded stealing. The 2026-05-10 Phase D rerun recorded 1-to-4 work/total diagnostics of 3.088x/2.700x, manually accepted for closeout. Remaining risks are the shared scheduler lock, temporary CPU pinning, CQ/join/exit/block/schedule overhead, broader workload classes, and higher-thread-count evidence.

Scheduling Policy And Context Authority

SchedulingPolicyCap is the caller-thread-bound surface for WFQ knobs. Every method routes through CapCallContext::caller_thread; there is no per-cap-object ThreadHandle, no badge-encoded thread id, and no cross-thread mutation in this slice. Cross-thread authority is deferred to the privileged scheduler-policy service plan. The schema shape is:

interface SchedulingPolicyCap {
    setWeight @0 (weight :UInt16) -> ();
    setLatencyClass @1 (class :LatencyClass) -> ();
    snapshot @2 () -> (
        weight :UInt16,
        class :LatencyClass,
        runtimeNs :UInt64,
        virtualRuntimeNs :UInt64,
    );
}

setWeight validates against [MIN_WEIGHT, MAX_WEIGHT] at the cap boundary and updates the caller thread’s WFQ weight; the new weight applies to the next enqueue’s virtual_finish_ns tag and to subsequent virtual_runtime_ns accounting. setLatencyClass swaps the per-thread LatencyClass (Normal, Interactive, IpcServer, Batch) used to scale the dispatcher slice. snapshot is a read-only observer over the core WFQ state and does not expose the measure-only counters.

SchedulingContext is the schema-typed cap for dispatcher budget authority:

interface SchedulingContext {
    info @0 () -> (info :SchedulingContextInfo);
    create @1 (spec :SchedulingContextSpec) -> (
        contextIndex :UInt16,
        identity :SchedulingContextIdentity,
        result :SchedulingContextOperationResult,
        dispatchEffect :SchedulingContextDispatchEffect,
    );
    bindCallerThread @2 () -> (
        identity :SchedulingContextIdentity,
        binding :SchedulingContextBinding,
        result :SchedulingContextOperationResult,
        dispatchEffect :SchedulingContextDispatchEffect,
    );
    revoke @3 () -> (
        identity :SchedulingContextIdentity,
        previousGeneration :UInt64,
        result :SchedulingContextOperationResult,
        dispatchEffect :SchedulingContextDispatchEffect,
    );
    drainNotifications @4 () -> (
        notifications :SchedulingContextNotificationSnapshot,
    );
}

create returns a same-interface child context as transferred result cap 0 and becomes chargeable only after bindCallerThread. revoke bumps the generation and clears any matching thread binding; later calls through the stale cap generation report staleGeneration or fail closed before mutating scheduler state. drainNotifications reads the fixed per-context budget-depleted and deadline-or-timeout slots; the scheduler updates these in place from hard paths without allocation, including the holder identity and a donatedHolder bit for endpoint donation/return. The bootstrap manifest grants SchedulingPolicyCap and SchedulingContext only to focused-proof manifests; the default boot manifest does not grant them.
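The revoke behavior amounts to a generation check in front of every mutation. The sketch below models it with hypothetical types: `ContextSlot`, `ContextCap`, and `OpResult` are illustrative, a thread id stands in for the binding, and the real result types are the `SchedulingContextOperationResult` values from the schema.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum OpResult {
    Applied,
    StaleGeneration,
}

/// Kernel-side context record (hypothetical shape).
#[derive(Debug)]
pub struct ContextSlot {
    pub generation: u64,
    pub bound_tid: Option<u32>,
}

/// What a cap holder carries: the generation it was issued against.
#[derive(Debug, Clone, Copy)]
pub struct ContextCap {
    pub generation: u64,
}

impl ContextSlot {
    pub fn new() -> (Self, ContextCap) {
        (Self { generation: 1, bound_tid: None }, ContextCap { generation: 1 })
    }

    pub fn bind_caller_thread(&mut self, cap: ContextCap, caller_tid: u32) -> OpResult {
        // Fail closed before mutating any scheduler state.
        if cap.generation != self.generation {
            return OpResult::StaleGeneration;
        }
        self.bound_tid = Some(caller_tid);
        OpResult::Applied
    }

    /// Returns the result and the generation observed before the call.
    pub fn revoke(&mut self, cap: ContextCap) -> (OpResult, u64) {
        if cap.generation != self.generation {
            return (OpResult::StaleGeneration, self.generation);
        }
        let previous = self.generation;
        self.generation += 1; // every outstanding cap is now stale
        self.bound_tid = None; // clear the matching thread binding
        (OpResult::Applied, previous)
    }
}
```

After a revoke, the old cap value can no longer bind or revoke, because its recorded generation no longer matches the slot.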

Userspace API Surface

The capos-rt runtime exposes the threading caps as typed clients on top of the per-thread ring:

  • ThreadControlClient – get_fs_base/set_fs_base/exit_thread, including *_wait blocking variants over RuntimeRingClient.
  • ThreadSpawnerClient::create – submits the entry/stackTop/arg/fsBase/flags ABI and returns an OwnedCapability<ThreadHandle> delivered as transferred result cap 0 in the CQE.
  • ThreadHandleClient – join, exit_code (nonblocking observer), and their finish_* helpers; finish_join decodes the one-shot exit code.
  • SchedulingPolicyClient – set_weight, set_latency_class, and snapshot, all caller-thread-bound.
  • SchedulingContextClient – info, create, bind_caller_thread, revoke, and drain_notifications.

A typical spawn/join pseudocode against these clients is:

let handle = thread_spawner.create_wait(
    &mut ring,
    entry_addr,
    user_stack_top,
    arg,
    fs_base,
    /* flags */ 0,
    timeout_ns,
)?;
// ... runtime work on the parent thread ...
// `thread_handle` stands for a ThreadHandleClient over `handle`;
// its construction is elided in this pseudocode.
let exit_code = thread_handle
    .join_wait(&mut ring, timeout_ns)?;

The userspace runtime is responsible for the user stack, TLS/TCB, and any free-list bookkeeping for retired handles; the kernel only validates the ABI fields and charges the per-process ledgers.

Park Handoff

Park authority is defined in Park Authority. The scheduler changes above must leave room for a thread block reason that is not tied to the process ring CQ. The frozen handoff is:

  • park wait blocks the current thread, not the whole process;
  • park wake makes selected generation-checked ThreadRef values runnable;
  • timeouts use the same monotonic time base as Timer;
  • private park keys are based on address-space identity plus user virtual address;
  • shared-memory park keys are MemoryObject-derived identity plus offset;
  • the first implementation starts with compact CAP_OP_PARK and CAP_OP_UNPARK operations rather than generic Cap’n Proto methods;
  • park wait SQEs are thread-owned so ring dispatch cannot park a sibling thread under the waiter’s user_data;
  • blocking park wait is a syscall-context operation that releases runtime ring-client ownership before the thread parks, while capos-rt demultiplexes reserved park CQEs back to the waiting thread.
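The two key forms in the list above can be modeled as an enum used to index wait queues. This is an illustrative sketch only: `ParkKey`, `ParkTable`, the integer identity fields, and the use of a thread id in place of a generation-checked `ThreadRef` are assumptions, and timeouts are omitted.

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ParkKey {
    /// Private key: address-space identity plus user virtual address.
    Private { address_space_id: u64, vaddr: u64 },
    /// Shared key: MemoryObject-derived identity plus offset.
    Shared { memory_object_id: u64, offset: u64 },
}

#[derive(Default)]
pub struct ParkTable {
    // Thread ids stand in for generation-checked ThreadRef values.
    waiters: HashMap<ParkKey, VecDeque<u32>>,
}

impl ParkTable {
    /// Park one thread on a key; only that thread blocks.
    pub fn park(&mut self, key: ParkKey, tid: u32) {
        self.waiters.entry(key).or_default().push_back(tid);
    }

    /// Wake up to `max_wake` waiters on the key, in arrival order.
    pub fn unpark(&mut self, key: ParkKey, max_wake: usize) -> Vec<u32> {
        let mut woken = Vec::new();
        let mut now_empty = false;
        if let Some(queue) = self.waiters.get_mut(&key) {
            while woken.len() < max_wake {
                match queue.pop_front() {
                    Some(tid) => woken.push(tid),
                    None => break,
                }
            }
            now_empty = queue.is_empty();
        }
        if now_empty {
            self.waiters.remove(&key); // drop empty queues
        }
        woken
    }
}
```

Because the address-space identity is part of a private key, the same virtual address in two processes never maps to the same wait queue.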

Pre-thread 4.5.4 measurement chose the compact capability-authorized shape for failed wait and empty wake. 4.5.5 measured the real blocked/resume path through thread-lifecycle under make run-measure, so the compact ParkSpace opcodes remain the runtime ABI target for this slice.

Security Invariants

  • A thread never owns a separate capability table in the initial model.
  • A thread cannot escape the authority of its containing process.
  • A ThreadHandle names only a thread in the same process and is non-transferable in the first implementation.
  • Thread creation is charged to one process-owned thread/kernel-stack ledger of record before the thread can become runnable.
  • Process exit releases shared authority once, after all live threads are removed from scheduling.
  • Per-process resource quotas are shared by all threads.
  • ThreadControl changes only the current thread’s FS base.
  • ThreadControl.exitThread terminates only the current thread and is a capability-ring operation, not a syscall.
  • Every waiter or direct handoff that can resume execution stores a generation-checked ThreadRef.
  • Process-owned user-buffer validation/copy/read paths hold the process AddressSpace lock; future shared-memory thread primitives still need mapping provenance or object pins when they derive keys from shared backing.

Implementation Order

  1. Add internal Thread state, make each process own one initial thread, move saved context / kernel stack / FS base / block state onto that thread, and charge the initial thread against private process ledgers. Done 2026-04-24 23:09 UTC.
  2. Change scheduler queues, blocking, exit cleanup, and direct IPC targets from pid-oriented state to thread references while preserving one thread per process. Done 2026-04-24 23:33 UTC.
  3. Add ThreadSpawner, ThreadHandle, and ThreadControl.exitThread with a QEMU smoke for create, join, detach, self-join rejection, second join rejection, and last-thread process exit. Done 2026-04-25.
  4. Implement the ParkSpace private wait/wake path from Park Authority after the scheduler can block and wake individual threads, then run 4.5.5 blocked/resume measurements before declaring the park ABI stable. Done 2026-04-25.

Validation

The thread-lifecycle proof creates multiple threads in one process, proves they share the address space and CapSet, proves each has an independent FS base, rejects invalid join cases, joins one thread from another, and lets the last thread exit the process. The existing make run-spawn path keeps covering runtime-fs-base and single-thread-runtime so regressions in the pre-thread runtime contract stay visible. make run-measure additionally records the private ParkSpace blocked/resume timings and proves process exit with a parked park waiter. Phase D fairness/Interactive/weight-change smokes (make run-thread-fairness, make run-thread-fairness-interactive, make run-thread-fairness-weight-change) exercise the SchedulingPolicyCap caller-thread-bound surface; the thread-scale proof carries the recorded WFQ scaling evidence. The recorded 1-to-2 work/total speedup gate is the host-enforced Phase D acceptance criterion; the 1-to-4 row remains a manually accepted diagnostic. Safe runtime park wrappers and a focused SchedulingContext budget/donation/notification smoke remain future capos-rt and harness work.