Completion Rings And Threaded Runtimes

This note grounds the capOS ring/threading roadmap in existing completion I/O and futex designs. The question is not whether a shared CQ can be made to work with many waiting threads; it can. The question is which ownership model keeps the kernel ABI stable once capOS runs multiple process threads on multiple CPUs.

Sources Checked

  • Linux io_uring_enter(2) documents the aggregate wait shape: with IORING_ENTER_GETEVENTS, the syscall waits until min_complete completion events are available (see the sketch after this list).
  • Linux io_uring_setup(2) documents SQPOLL, CQ sizing, and single-issuer-oriented task-run modes.
  • Linux io_uring_register(2) documents registered wait regions.
  • Jens Axboe’s io_uring paper explains the core ring design as a pair of shared rings with single producer/single consumer ownership on each side and user_data copied from request to completion for matching.
  • Linux futex(2) and futex(7) document futexes as a kernel-assisted blocking path for synchronization objects whose uncontended state lives in user memory.
  • Microsoft I/O completion ports document the port model: threads wait on a completion port and dequeue completion packets, rather than each thread waiting directly on one specific operation’s storage slot.
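To make the first point concrete, here is a minimal sketch of the documented io_uring_enter(2) wait shape. Ring setup and SQ handling are omitted and the helper name is ours; the syscall, flag, and argument order are as documented:

```c
#include <linux/io_uring.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Block until at least min_complete completions are available on the CQ.
 * Which requests completed is only discoverable afterwards, by reading
 * CQEs and matching their user_data: the wait itself is aggregate. */
static int ring_wait(int ring_fd, unsigned to_submit, unsigned min_complete)
{
    return (int)syscall(__NR_io_uring_enter, ring_fd, to_submit,
                        min_complete, IORING_ENTER_GETEVENTS,
                        NULL /* sigset */, 0 /* sigset size */);
}
```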

Consequences For capOS

The current process-wide capOS ring matches the early io_uring shape: one SQ, one CQ, and user_data for completion matching. That shape is efficient when userspace serializes submission and completion consumption through one runtime owner. It becomes the wrong primitive for full SMP if multiple kernel-scheduled threads in the same process concurrently enter the kernel, because the ring turns into a multi-producer/multi-consumer coordination problem.
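A rough illustration of that coordination problem, with hypothetical names rather than capOS internals: once several threads consume one CQ, each slot claim must become a CAS loop instead of a plain head store, and a speculative entry read can race with producer reuse of the slot.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct cqe { uint64_t user_data; int64_t result; };

struct cq {
    _Atomic uint32_t head;     /* next slot to consume */
    _Atomic uint32_t tail;     /* next slot the producer will fill */
    uint32_t mask;             /* entries - 1; entry count is a power of two */
    struct cqe *entries;
};

/* Single consumer: load tail (acquire), read the entry, store head (release).
 * Multiple consumers: every claim becomes the CAS loop below. */
static bool cq_claim(struct cq *q, struct cqe *out)
{
    uint32_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    for (;;) {
        uint32_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
        if (head == tail)
            return false;                       /* ring empty */
        /* Speculative read: if another thread wins the CAS below, the
         * producer may reuse this slot while we are still reading it.
         * The stale value is discarded on CAS failure, but this is exactly
         * the hazard a single-consumer ring never has. */
        struct cqe e = q->entries[head & q->mask];
        if (atomic_compare_exchange_weak_explicit(
                &q->head, &head, head + 1,
                memory_order_acq_rel, memory_order_relaxed)) {
            *out = e;
            return true;
        }
        /* CAS failed: head was refreshed; retry against the new position. */
    }
}
```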

Waiting for a raw CQ slot is not a good abstraction. CQ slots are circular buffer storage and are reused. Stable wait identities are request user_data, kernel answer ids, completion packets, or a completion queue/lane chosen at submission time.
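For contrast, a small liburing example of the stable identity working as intended: the user_data set at submission comes back on the CQE, while the CQ slot itself is released for reuse by io_uring_cqe_seen().

```c
#include <liburing.h>
#include <stdio.h>

int main(void)
{
    struct io_uring ring;
    if (io_uring_queue_init(8, &ring, 0) < 0)
        return 1;

    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_nop(sqe);
    io_uring_sqe_set_data64(sqe, 0x42);   /* identity travels with the op */
    io_uring_submit(&ring);

    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);
    printf("completed user_data=%llu res=%d\n",
           (unsigned long long)cqe->user_data, cqe->res);
    io_uring_cqe_seen(&ring, cqe);        /* the slot may now be reused */

    io_uring_queue_exit(&ring);
    return 0;
}
```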

The clean full-SMP target is per-thread completion ownership. Each thread gets its own capability ring endpoint: a complete SQ/CQ pair, even if multiple endpoints are packed into one larger mapping. The existing cap_enter(min_complete, timeout_ns) semantics can then remain aggregate: min_complete counts completions available on the current thread’s CQ. Runtime code still matches individual operations by user_data, but two sibling threads no longer race to consume the same process CQ.
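A sketch of what that endpoint could look like. Everything here except the cap_enter(min_complete, timeout_ns) signature is an assumed layout for illustration, not the actual capOS ABI:

```c
#include <stdint.h>

/* Assumed entry shapes (illustrative only). */
struct cap_sqe { uint64_t user_data; uint32_t opcode; uint32_t flags; uint64_t args[4]; };
struct cap_cqe { uint64_t user_data; int64_t result; };

/* One endpoint per kernel-scheduled thread: a complete SQ/CQ pair. Several
 * endpoints may be packed into one larger mapping, but each pair keeps
 * single-producer/single-consumer ownership with its owning thread. */
struct cap_ring_endpoint {
    uint32_t sq_head, sq_tail, sq_mask;   /* only the owner writes sq_tail */
    struct cap_sqe *sq;
    uint32_t cq_head, cq_tail, cq_mask;   /* only the owner advances cq_head */
    struct cap_cqe *cq;
};

/* Hypothetical: each thread reaches its own endpoint through TLS. */
extern _Thread_local struct cap_ring_endpoint *cap_my_ring;

/* Existing aggregate semantics, unchanged: block until at least min_complete
 * completions are available on the *calling thread's* CQ or the timeout
 * expires. Individual operations are still matched by user_data. */
long cap_enter(uint32_t min_complete, int64_t timeout_ns);
```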

The Windows IOCP model is a useful counterpoint: a shared completion port works when the abstraction is explicitly a packet queue consumed by a worker pool. That is a runtime/service scheduling model, not the same thing as multiple threads blocking on one raw process CQ while each expects a private answer.
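The consumption shape the IOCP documentation describes, for comparison (Windows-only, error handling trimmed): every worker blocks on the port and dequeues whole packets, so no thread ever waits on one specific operation's storage.

```c
#include <windows.h>

static DWORD WINAPI worker(LPVOID arg)
{
    HANDLE port = (HANDLE)arg;
    for (;;) {
        DWORD nbytes;
        ULONG_PTR key;            /* per-handle context set at association */
        OVERLAPPED *ov;           /* per-operation context */
        if (!GetQueuedCompletionStatus(port, &nbytes, &key, &ov, INFINITE)) {
            if (ov == NULL)
                break;            /* port error or port closed */
            /* ov != NULL: the associated I/O failed; still a whole packet. */
        }
        /* dispatch(key, ov, nbytes); any worker may receive any packet */
    }
    return 0;
}
```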

The roadmap that follows:

  1. Keep the current process ring as the bootstrap and compatibility surface.
  2. Add runtime reactor/demux support as an interim path for multithreaded runtimes that still use one process ring (sketched after this list).
  3. Make the full SMP ABI a per-thread ring model:
    • each Thread owns one ring endpoint with a complete SQ/CQ pair;
    • cap_enter operates on the current thread’s ring;
    • SQPOLL, when enabled, is the sole kernel SQ consumer for that ring;
    • result-cap transfers still mutate the process cap table;
    • endpoint, timer, process-wait, thread-join, and futex completions post to the waiting ThreadRef’s ring.
  4. Consider shared completion ports only as a userspace runtime/service abstraction above per-thread rings, not as the kernel’s first full-SMP ring ABI.
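A sketch of the interim reactor/demux from item 2, under assumed names: one runtime thread stays the sole consumer of the shared process CQ and routes each completion to its submitter through a futex word, matching the futex(2) model where the uncontended state lives in user memory. reactor_dispatch() stands in for whatever drains the process CQ.

```c
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

struct waiter {
    _Atomic uint32_t done;   /* futex word: 0 = pending, 1 = completed */
    int64_t result;
};

static void futex_wake_one(_Atomic uint32_t *word)
{
    syscall(SYS_futex, word, FUTEX_WAKE, 1, NULL, NULL, 0);
}

/* Called only by the reactor thread, the sole CQ consumer. */
static void reactor_dispatch(uint64_t user_data, int64_t result)
{
    /* By convention in this sketch, user_data carries the waiter's address. */
    struct waiter *w = (struct waiter *)(uintptr_t)user_data;
    w->result = result;
    atomic_store_explicit(&w->done, 1, memory_order_release);
    futex_wake_one(&w->done);
}

/* Called by the submitting thread after queueing its request. */
static int64_t wait_for_result(struct waiter *w)
{
    while (atomic_load_explicit(&w->done, memory_order_acquire) == 0)
        syscall(SYS_futex, &w->done, FUTEX_WAIT, 0, NULL, NULL, 0);
    return w->result;
}
```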

References