Completion Rings And Threaded Runtimes
This note grounds the capOS ring/threading roadmap in existing completion I/O and futex designs. The question is not whether a shared CQ can be made to work with many waiting threads; it can. The question is which ownership model keeps the kernel ABI stable once capOS runs multiple process threads on multiple CPUs.
Sources Checked
- Linux io_uring_enter(2) documents the aggregate wait shape: with IORING_ENTER_GETEVENTS, the syscall waits until min_complete completion events are available.
- Linux io_uring_setup(2) documents SQPOLL, CQ sizing, and single-issuer-oriented task-run modes.
- Linux io_uring_register(2) documents registered wait regions.
- Jens Axboe’s io_uring paper explains the core ring design as a pair of shared rings with single producer/single consumer ownership on each side and user_data copied from request to completion for matching.
- Linux futex(2) and futex(7) document futexes as a kernel-assisted blocking path for synchronization objects whose uncontended state lives in user memory.
- Microsoft I/O completion ports document the port model: threads wait on a completion port and dequeue completion packets, rather than each thread waiting directly on one specific operation’s storage slot.
Consequences For capOS
The current process-wide capOS ring matches the early io_uring shape: one SQ,
one CQ, and user_data for completion matching. That shape is efficient when
userspace serializes submission and completion consumption through one runtime
owner. It becomes the wrong primitive for full SMP if multiple kernel-scheduled
threads in the same process concurrently enter the kernel, because the ring
turns into a multi-producer/multi-consumer coordination problem.
A raw CQ slot is not a stable thing to wait on: slots are circular-buffer
storage and are reused once the consumer advances past them. Stable wait
identities are request user_data, kernel answer ids, completion packets, or a
completion queue/lane chosen at submission time.
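The slot-reuse problem can be illustrated with a toy model. The sketch below is a hypothetical two-entry circular CQ in Python, not the capOS ring layout; the names Cqe, CompletionQueue, post, and pop are assumptions made for illustration. It shows that the slot index that held one completion later holds an unrelated one, while user_data still identifies each request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cqe:
    user_data: int  # copied from the request; the stable identity
    result: int


class CompletionQueue:
    """Toy fixed-size circular CQ (hypothetical model, not the capOS layout)."""

    def __init__(self, entries):
        assert entries > 0 and entries & (entries - 1) == 0, "power of two"
        self.slots = [None] * entries
        self.mask = entries - 1
        self.head = 0  # consumer index, advanced by pop()
        self.tail = 0  # producer index, advanced by post()

    def post(self, cqe):
        """Producer side: store a completion and return the slot it landed in."""
        if self.tail - self.head == len(self.slots):
            raise OverflowError("CQ full")
        slot = self.tail & self.mask
        self.slots[slot] = cqe
        self.tail += 1
        return slot

    def pop(self):
        """Consumer side: take the oldest completion and free its slot."""
        if self.head == self.tail:
            raise LookupError("CQ empty")
        cqe = self.slots[self.head & self.mask]
        self.head += 1
        return cqe


cq = CompletionQueue(entries=2)
first_slot = cq.post(Cqe(user_data=0xA1, result=0))
cq.post(Cqe(user_data=0xB2, result=0))
drained = cq.pop()  # the consumer reaps 0xA1, which frees its slot
reused_slot = cq.post(Cqe(user_data=0xC3, result=0))
```

A thread that had parked itself on first_slot would now be looking at the completion for request 0xC3, which is why the wait identity has to travel with the request rather than with the storage.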
The clean full-SMP target is per-thread completion ownership. Each thread gets
its own capability ring endpoint: a complete SQ/CQ pair, even if multiple
endpoints are packed into one larger mapping. The existing
cap_enter(min_complete, timeout_ns) semantics can then remain aggregate:
min_complete counts completions available on the current thread’s CQ. Runtime
code still matches individual operations by user_data, but two sibling
threads no longer race to consume the same process CQ.
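A minimal sketch of that ownership model follows, written as a Python simulation rather than kernel code. RingEndpoint, post_completion, and reap are hypothetical names, and the return value of cap_enter (the number of completions available) is an assumption; the source only fixes the cap_enter(min_complete, timeout_ns) signature. Each thread waits on and drains only its own CQ, so both threads can reuse the same user_data values without colliding.

```python
import threading
from collections import deque


class RingEndpoint:
    """Hypothetical per-thread endpoint: one CQ owned by exactly one thread."""

    def __init__(self):
        self._cq = deque()
        self._cond = threading.Condition()

    def post_completion(self, user_data, result):
        # Simulated kernel side: the only producer for this CQ.
        with self._cond:
            self._cq.append((user_data, result))
            self._cond.notify_all()

    def cap_enter(self, min_complete, timeout_ns):
        # Aggregate wait: block until this endpoint's CQ holds at least
        # min_complete entries or the timeout expires. Returning the count
        # is an assumption of this sketch.
        with self._cond:
            self._cond.wait_for(
                lambda: len(self._cq) >= min_complete,
                timeout=timeout_ns / 1e9,
            )
            return len(self._cq)

    def reap(self):
        # Drain every available completion as (user_data, result) pairs.
        with self._cond:
            items = list(self._cq)
            self._cq.clear()
            return items


endpoints = {"t0": RingEndpoint(), "t1": RingEndpoint()}
results = {}


def run(name):
    ep = endpoints[name]
    ep.cap_enter(min_complete=2, timeout_ns=2_000_000_000)
    # Match individual operations by user_data within this thread's ring.
    results[name] = dict(ep.reap())


threads = [threading.Thread(target=run, args=(name,)) for name in endpoints]
for t in threads:
    t.start()

# Completions are posted to the ring of the thread that submitted them.
endpoints["t0"].post_completion(1, 10)
endpoints["t1"].post_completion(1, 20)
endpoints["t0"].post_completion(2, 11)
endpoints["t1"].post_completion(2, 21)

for t in threads:
    t.join()
```

Because min_complete counts only the calling thread's CQ, neither thread can be woken by, or consume, its sibling's completions.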
The Windows IOCP model is a useful counterpoint: a shared completion port works when the abstraction is explicitly a packet queue consumed by a worker pool. That is a runtime/service scheduling model, not the same thing as multiple threads blocking on one raw process CQ while each expects a private answer.
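The packet-queue shape can be sketched in a few lines. This is a Python simulation using queue.Queue as a stand-in for a completion port, not the Windows API; the packet layout (completion key, result) and the None shutdown sentinel are assumptions of the sketch. The point is that any worker may handle any packet, so no worker expects a private answer.

```python
import queue
import threading

port = queue.Queue()  # stands in for a completion port
handled = []          # (completion key, result) pairs, in arrival order
handled_lock = threading.Lock()


def pool_worker():
    # Every worker dequeues from the same port; packets are self-describing.
    while True:
        packet = port.get()
        if packet is None:  # shutdown sentinel used only by this demo
            return
        key, result = packet
        with handled_lock:
            handled.append((key, result))


workers = [threading.Thread(target=pool_worker) for _ in range(3)]
for w in workers:
    w.start()

for key in range(6):
    port.put((key, key * 10))
for _ in workers:
    port.put(None)  # one sentinel per worker, queued after all packets

for w in workers:
    w.join()
```

Which worker handles which packet is left to the scheduler, which is acceptable here only because the work is not tied to a particular waiter.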
Recommended Direction
- Keep the current process ring as the bootstrap and compatibility surface.
- Add runtime reactor/demux support as an interim path for multithreaded runtimes that still use one process ring.
- Make the full-SMP ABI a per-thread ring model:
  - each Thread owns one ring endpoint with a complete SQ/CQ pair;
  - cap_enter operates on the current thread’s ring;
  - SQPOLL, when enabled, is the sole kernel SQ consumer for that ring;
  - result-cap transfers still mutate the process cap table;
  - endpoint, timer, process-wait, thread-join, and futex completions post to the waiting ThreadRef’s ring.
- Consider shared completion ports only as a userspace runtime/service abstraction above per-thread rings, not as the kernel’s first full-SMP ring ABI.
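The interim reactor/demux path from the list above can be sketched as a single owner thread that is the only consumer of the shared process CQ and routes each completion to its waiter by user_data. This is a Python simulation under stated assumptions: Reactor, submit, and run are hypothetical names, queue.Queue stands in for the process CQ, and the None sentinel exists only to stop the demo.

```python
import queue
import threading
from concurrent.futures import Future


class Reactor:
    """Hypothetical userspace demux: sole consumer of a shared process CQ."""

    def __init__(self, shared_cq):
        self._cq = shared_cq
        self._pending = {}  # user_data -> Future for the waiting thread
        self._lock = threading.Lock()
        self._next_user_data = 1

    def submit(self):
        # Allocate a user_data tag and a private future for the caller.
        with self._lock:
            user_data = self._next_user_data
            self._next_user_data += 1
            fut = Future()
            self._pending[user_data] = fut
        return user_data, fut

    def run(self):
        # Only this loop touches the CQ, so consumption stays single-consumer.
        while True:
            item = self._cq.get()
            if item is None:  # shutdown sentinel used only by this demo
                return
            user_data, result = item
            with self._lock:
                fut = self._pending.pop(user_data)
            fut.set_result(result)


shared_cq = queue.Queue()
reactor = Reactor(shared_cq)
owner = threading.Thread(target=reactor.run)
owner.start()

ud_a, fut_a = reactor.submit()
ud_b, fut_b = reactor.submit()

# Completions may arrive in any order; user_data restores the pairing.
shared_cq.put((ud_b, "answer-b"))
shared_cq.put((ud_a, "answer-a"))

got_a = fut_a.result(timeout=2)
got_b = fut_b.result(timeout=2)

shared_cq.put(None)
owner.join()
```

Waiters block on their own future instead of on the ring, which keeps the process ring single-consumer at the cost of an extra wakeup hop through the reactor thread.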
References
- Linux io_uring_enter(2): https://man.archlinux.org/man/io_uring_enter.2.en
- Linux io_uring_setup(2): https://man7.org/linux/man-pages/man2/io_uring_setup.2.html
- Linux io_uring_register(2) registered wait regions: https://www.man7.org/linux/man-pages/man2/io_uring_register.2.html
- Jens Axboe, “Efficient IO with io_uring”: https://www.kernel.dk/io_uring.pdf
- Linux futex(2): https://man7.org/linux/man-pages/man2/futex.2.html
- Linux futex(7): https://man7.org/linux/man-pages/man7/futex.7.html
- Microsoft I/O completion ports: https://learn.microsoft.com/en-us/windows/win32/fileio/i-o-completion-ports