Research: Multimedia Pipeline Latency

Survey of PipeWire and JACK design lessons for a capOS multimedia graph whose explicit goal is the lowest stack latency that capOS can guarantee stable.

Goal

The capOS multimedia pipeline should optimize for the lowest end-to-end latency that capOS can guarantee stable under the selected workload, device, and routing graph. “Guaranteed stable” means the graph is admitted only when the kernel/services can reserve enough CPU, memory, device, and wakeup budget for every realtime cycle, and the graph fails closed when those guarantees can no longer be met. A graph that reports a smaller nominal buffer but produces xruns, underruns, clock drift, or large tail latency is worse than a graph with a slightly larger fixed quantum and a schedulability guarantee.

The target is not one universal latency number. The target is a measurable operating point with an explicit contract:

  • fixed sample rate and quantum for the realtime island;
  • bounded callback/process time per node;
  • bounded graph traversal time per cycle;
  • admitted worst-case execution budget for every node and bridge;
  • reserved memory and pre-registered buffers for the whole graph;
  • no allocation, blocking IPC, paging, logging, or credential checks on the realtime data path;
  • visible latency contribution per node, link, bridge, device, and provider;
  • admission rejection when the graph cannot fit the selected quantum;
  • fail-closed handling through bypass, silence, stream stop, or quantum renegotiation rather than unbounded queue growth;
  • policy that can choose “lowest stable” for pro audio and “efficient stable” for ordinary desktop/media playback.

This guarantee applies to local capOS-controlled realtime islands. It does not extend to browser scheduling, networks, or remote model/provider inference. Those parts can be measured, bounded by policy where possible, and isolated from the local graph, but not honestly guaranteed by capOS.
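
As a sketch of what this contract could look like in code, the per-island operating point, per-node budget, and overrun policy might be expressed roughly as below. All type and field names are illustrative assumptions, not an existing capOS API; the point is that quantum, rate, budgets, and overrun handling are data the admission step can check rather than implicit driver behavior.

```rust
// Hypothetical sketch of the per-island operating-point contract described
// above. Field and type names are illustrative, not an existing capOS API.
#[derive(Debug, Clone, Copy)]
struct OperatingPoint {
    sample_rate_hz: u32,       // fixed rate for the realtime island
    quantum_frames: u32,       // fixed quantum per cycle
}

#[derive(Debug, Clone, Copy)]
struct NodeBudget {
    wcet_ns: u64,              // admitted worst-case execution time per cycle
    wakeup_latency_ns: u64,    // scheduling/wakeup slack reserved for the node
    added_latency_frames: u32, // latency this node or bridge contributes
}

#[derive(Debug, Clone, Copy)]
enum OverrunPolicy {
    Bypass,                    // skip the offending node
    Silence,                   // emit silence for the cycle
    StopStream,                // fail closed: stop the stream
    RenegotiateQuantum(u32),   // retry admission at a larger quantum
}

impl OperatingPoint {
    fn cycle_budget_ns(&self) -> u64 {
        // Time available per cycle at the fixed rate and quantum.
        self.quantum_frames as u64 * 1_000_000_000 / self.sample_rate_hz as u64
    }
}
```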

PipeWire Lessons

PipeWire separates graph configuration and IPC from realtime data processing. Its graph scheduling documentation describes a main thread for IPC and graph configuration and data processing threads that run with realtime priority. Node resources, buffers, I/O areas, and metadata are prepared in shared memory before realtime processing begins.

PipeWire also treats graph quantum and rate as first-class timing controls. Synchronous links can process in the same cycle, while asynchronous links add one cycle of latency. Its latency model propagates min/max latency through ports and adds latency when links or nodes introduce buffering.

Consequences for capOS:

  • Media graph control and media graph processing should be separate execution domains.
  • Buffers and metadata must be preallocated before the realtime cycle starts.
  • A link that crosses an isolation, clock, process, network, or wakeup boundary must declare its additional latency instead of hiding it.
  • Latency should be graph metadata, not an after-the-fact measurement only.
  • Quantum and rate are policy inputs, not incidental driver details.
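
A minimal sketch of the latency-propagation idea follows, using hypothetical types rather than PipeWire's actual API: each link adds the downstream node's declared latency range, and an asynchronous link adds one full quantum.

```rust
// Minimal sketch of min/max latency propagation along a chain of links,
// in the spirit of PipeWire's port latency model. Types are illustrative.
#[derive(Debug, Clone, Copy, Default)]
struct LatencyRange {
    min_frames: u32,
    max_frames: u32,
}

#[derive(Debug, Clone, Copy)]
struct Link {
    node_latency: LatencyRange, // latency the downstream node itself adds
    asynchronous: bool,         // async links add one full cycle
}

fn propagate(upstream: LatencyRange, link: &Link, quantum: u32) -> LatencyRange {
    let extra = if link.asynchronous { quantum } else { 0 };
    LatencyRange {
        min_frames: upstream.min_frames + link.node_latency.min_frames + extra,
        max_frames: upstream.max_frames + link.node_latency.max_frames + extra,
    }
}

fn main() {
    let quantum = 64;
    let chain = [
        Link { node_latency: LatencyRange { min_frames: 0, max_frames: 0 }, asynchronous: false },
        Link { node_latency: LatencyRange { min_frames: 32, max_frames: 32 }, asynchronous: true },
    ];
    let total = chain
        .iter()
        .fold(LatencyRange::default(), |acc, l| propagate(acc, l, quantum));
    println!("end-to-end latency: {}..{} frames", total.min_frames, total.max_frames);
}
```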

JACK Lessons

JACK was designed for professional low-latency audio. Its API centers on a process callback invoked by the JACK server at the right time each cycle, graph-order callbacks, xrun notification, and port latency ranges. JACK’s latency API asks clients to report min/max latency so applications can compensate for routing latency and detect routes whose latency has become anomalous.

Consequences for capOS:

  • A capOS native audio graph needs a cycle callback model for realtime nodes, even if the public API is capability-oriented rather than JACK-compatible C.
  • The realtime callback contract must be restrictive: no blocking endpoint calls, no dynamic allocation, no filesystem/name lookups, and no waiting for policy decisions.
  • Xruns and deadline misses are not debug trivia. They are first-class graph events that policy can use to increase quantum, disable expensive nodes, or move work to a different scheduling context.
  • Per-port latency ranges are more useful than a single optimistic value.
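
A hedged sketch of what such a restrictive callback contract could look like in capOS terms is shown below; the trait and type names are assumptions, not an existing capOS or JACK-compatible API.

```rust
// Sketch of a restrictive realtime process-callback contract.
// Trait and type names are hypothetical; capOS's real API may differ.
#[derive(Debug, Clone, Copy)]
enum ProcessStatus {
    Ok,
    DeadlineMiss, // reported to the non-realtime control plane as an xrun event
}

struct CycleContext {
    quantum_frames: usize,
    cycle_deadline_ns: u64, // absolute monotonic deadline for this cycle
}

/// Contract: implementations must not allocate, block on IPC, take locks that
/// a non-realtime thread can hold, touch the filesystem, or log on this path.
/// All buffers are preallocated and mapped before the graph is admitted.
trait RealtimeNode {
    fn process(
        &mut self,
        ctx: &CycleContext,
        input: &[f32],        // preallocated capture/ring slice for this cycle
        output: &mut [f32],
    ) -> ProcessStatus;
}

struct Gain {
    factor: f32,
}

impl RealtimeNode for Gain {
    fn process(&mut self, _ctx: &CycleContext, input: &[f32], output: &mut [f32]) -> ProcessStatus {
        for (o, i) in output.iter_mut().zip(input) {
            *o = *i * self.factor; // pure data processing, no side effects
        }
        ProcessStatus::Ok
    }
}
```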

Guarantee Model

capOS should use a guarantee ladder rather than a single vague “low latency” mode:

| Level | Meaning | Allowed uses |
| --- | --- | --- |
| Best effort | No reserved budget; telemetry only | ordinary media, background capture |
| Bounded soft realtime | Deadlines and drops, but no formal admission proof | web shell voice, remote model paths |
| Guaranteed realtime island | Fixed quantum, admitted CPU/memory/device budgets, fail-closed overruns | native audio, local voice, pro-audio paths |
| Hard device deadline | Driver/device deadline is reserved; violation is treated as a system fault for that island | future dedicated hardware paths |

The first serious multimedia milestone should target guaranteed realtime islands for local audio. Web shell and remote model voice should remain bounded soft realtime because the browser/provider/network portions are outside local control.
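
One possible encoding of the ladder as a policy type is sketched below; the variant names mirror the table above and the string keys are placeholders, not real capOS identifiers.

```rust
// Possible encoding of the guarantee ladder as a policy type; the variant
// names mirror the table above and are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum GuaranteeLevel {
    BestEffort,               // telemetry only, no reserved budget
    BoundedSoftRealtime,      // deadlines and drops, no admission proof
    GuaranteedRealtimeIsland, // fixed quantum, admitted budgets, fail closed
    HardDeviceDeadline,       // reserved device deadline, violation is a fault
}

fn default_level_for(path: &str) -> GuaranteeLevel {
    // Mirrors the milestone guidance above; the string keys are placeholders.
    match path {
        "local-audio" | "pro-audio" => GuaranteeLevel::GuaranteedRealtimeIsland,
        "web-shell-voice" | "remote-model-voice" => GuaranteeLevel::BoundedSoftRealtime,
        _ => GuaranteeLevel::BestEffort,
    }
}
```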

Admission should require:

  • every node declares worst-case execution time or a conservative budget;
  • every bridge declares buffering and wakeup latency;
  • every buffer pool is allocated and pinned/registered before start;
  • every realtime thread has a scheduling context with period, budget, and priority;
  • graph topology is frozen for the active cycle plan;
  • overrun policy is configured before start.

If admission fails, the graph does not start at that quantum. If a running graph misses its guarantee, the system records a violation and applies the configured fail-closed policy instead of preserving continuity by accumulating hidden latency.
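
A simplified admission check under these requirements might look like the following. The 20% safety margin and all names are assumptions for illustration; a real admission step would also account for bridges, device periods, and per-thread scheduling contexts.

```rust
// Sketch of a quantum admission check: the graph is admitted only if the
// summed worst-case budgets fit one cycle. Names are assumptions, not an API.
struct NodeAdmission {
    wcet_ns: u64,           // declared or conservatively estimated WCET
    wakeup_latency_ns: u64, // scheduling/wakeup slack for the node's thread
}

enum AdmissionResult {
    Admitted { slack_ns: u64 },
    Rejected { needed_ns: u64, available_ns: u64 },
}

fn admit(nodes: &[NodeAdmission], quantum_frames: u64, rate_hz: u64) -> AdmissionResult {
    let cycle_ns = quantum_frames * 1_000_000_000 / rate_hz;
    // Reserve a margin (here 20%) for jitter the model does not capture.
    let available_ns = cycle_ns - cycle_ns / 5;
    let needed_ns: u64 = nodes.iter().map(|n| n.wcet_ns + n.wakeup_latency_ns).sum();
    if needed_ns <= available_ns {
        AdmissionResult::Admitted { slack_ns: available_ns - needed_ns }
    } else {
        AdmissionResult::Rejected { needed_ns, available_ns }
    }
}
```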

Stack Latency Model

For capOS, “stack latency” should be modeled as a composed budget:

```mermaid
flowchart LR
    DeviceIn[ADC / capture device] --> DriverIn[driver period]
    DriverIn --> CaptureRing[capture ring]
    CaptureRing --> Graph[media graph quantum cycles]
    Graph --> Bridge[process / isolation / network bridges]
    Bridge --> Codec[codec / resampler / model adapter]
    Codec --> PlaybackRing[playback ring]
    PlaybackRing --> DriverOut[driver period]
    DriverOut --> DeviceOut[DAC / playback device]
```

Each edge should carry:

  • latency min/max in frames or nanoseconds;
  • clock domain;
  • quantum/rate;
  • buffering depth;
  • deadline;
  • drift estimate;
  • xrun/drop counters.

The useful metric is not just nominal round-trip latency. For guaranteed islands it is the admitted latency bound plus violation count. For softer paths it is nominal latency, p95/p99 process-cycle latency, worst observed cycle over a window, xrun rate, and drift between capture and playback clocks.
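
An illustrative per-edge descriptor and worst-case composition is sketched below, assuming hypothetical names; the admitted bound for an island is just the sum of the per-edge maxima converted into a common time base.

```rust
// Illustrative edge descriptor carrying a subset of the attributes listed
// above, plus a fold that composes the end-to-end worst-case bound.
#[derive(Debug, Clone, Copy)]
struct EdgeLatency {
    min_frames: u32,
    max_frames: u32,
    rate_hz: u32,          // clock domain rate for this edge
    buffering_frames: u32, // queue depth contributed by this edge
    xruns: u64,            // violation counter, exported as telemetry
}

fn worst_case_ns(edges: &[EdgeLatency]) -> u64 {
    edges
        .iter()
        .map(|e| {
            let frames = (e.max_frames + e.buffering_frames) as u64;
            frames * 1_000_000_000 / e.rate_hz as u64
        })
        .sum()
}
```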

capOS Media Graph Shape

The multimedia graph should be a userspace service family:

```mermaid
flowchart LR
    Control[control plane endpoint] --> GraphManager[MediaGraphManager]
    GraphManager --> Policy[latency / route / permission policy]
    GraphManager --> Nodes[node services]
    Nodes --> Rings[MemoryObject media rings]
    Rings --> Driver[audio/video driver services]
    Rings --> Apps[application nodes]
    Rings --> Provider[realtime model provider nodes]
```

The control plane may use normal capability endpoints. The data plane should use shared MemoryObject rings plus futex/notification wakeups. Cap’n Proto messages remain appropriate for graph setup, route changes, permission checks, and telemetry, but not for per-frame audio payload copying.
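
A sketch of the data-plane side of that split follows, assuming a preallocated shared mapping and leaving the futex/notification wakeup out; only the non-blocking ring arithmetic is shown, and a real implementation would split producer and consumer handles across processes.

```rust
// Sketch of the data-plane contract: a single-producer single-consumer ring
// over a preallocated shared mapping, with indices advanced atomically so the
// realtime side never allocates or blocks. The MemoryObject mapping and the
// wakeup mechanism are assumed; only the ring arithmetic is shown.
use std::sync::atomic::{AtomicUsize, Ordering};

struct MediaRing<'a> {
    frames: &'a mut [f32], // preallocated, pinned shared buffer
    head: AtomicUsize,     // total frames written (producer-owned)
    tail: AtomicUsize,     // total frames read (consumer-owned)
}

impl<'a> MediaRing<'a> {
    fn try_write(&mut self, chunk: &[f32]) -> bool {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        let free = self.frames.len() - (head - tail);
        if chunk.len() > free {
            return false; // never block: the caller drops or reports an xrun
        }
        let cap = self.frames.len();
        for (i, &s) in chunk.iter().enumerate() {
            self.frames[(head + i) % cap] = s;
        }
        self.head.store(head + chunk.len(), Ordering::Release);
        true // a real version would signal the consumer via futex/notification here
    }
}
```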

Node classes:

  • driver node: owns device-facing caps such as DeviceMmio, DMAPool, and Interrupt;
  • graph driver node: provides the cycle clock for a realtime island;
  • transform node: resampler, mixer, echo canceller, VAD, format converter;
  • app node: user application capture/playback endpoint;
  • bridge node: crosses process, clock, network, provider, or web boundary;
  • realtime model node: provider/local model adapter that consumes and emits media plus control events.

Guaranteed Realtime Islands

capOS should not try to make the whole desktop one realtime graph. It should support small realtime islands with explicit rate/quantum policy:

  • pro-audio island: low quantum, strict admission, few nodes, no remote model hop in the realtime loop;
  • voice-agent island: low enough latency for conversation, with VAD/barge-in priority and bounded buffering;
  • ordinary media island: efficient quantum and power policy;
  • screen/video island: frame-deadline oriented rather than audio-period oriented.

Bridges between islands are allowed, but each bridge declares the latency it adds. A bridge from a guaranteed island to a non-guaranteed island must not backpressure the guaranteed island. It may drop, resample, replace with silence, or move to a larger negotiated quantum, but it must not create an unbounded queue. This is the PipeWire/JACK lesson in capOS terms: do not hide async links.
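
A minimal sketch of the non-backpressure rule at such a bridge is shown below; the types are illustrative, and the bounded queue is owned by the non-realtime side of the bridge so the guaranteed island never waits on it.

```rust
// Sketch of the non-backpressure rule for a bridge leaving a guaranteed
// island: the block is accepted immediately or the oldest one is dropped and
// counted; the queue never grows without bound. Names are illustrative.
use std::collections::VecDeque;

struct IslandBridge {
    queue: VecDeque<Vec<f32>>, // bounded queue on the non-realtime side
    capacity: usize,
    drops: u64, // exported as telemetry and visible to policy
}

impl IslandBridge {
    fn push_from_island(&mut self, block: Vec<f32>) {
        if self.queue.len() == self.capacity {
            // Fail closed toward the slow side: drop the oldest block rather
            // than letting the queue (and latency) grow without bound.
            self.queue.pop_front();
            self.drops += 1;
        }
        self.queue.push_back(block);
    }
}
```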

Scheduling Implications

Per-SQE deadlines are useful for handling stale work, but they are not enough for guaranteed multimedia latency. The graph also needs scheduling contexts, which capOS does not yet have:

  • period: graph quantum duration;
  • budget: maximum CPU time per period for a node or node group;
  • priority: realtime island priority relative to other interactive work;
  • affinity: optional CPU isolation for device and graph threads;
  • overrun policy: drop, silence, bypass node, increase quantum, or stop graph.

Until scheduling contexts exist, capOS can only prototype bounded soft realtime. The design should still attach monotonic deadlines to media buffers and SQEs so late work is discarded deterministically instead of accumulating hidden latency, but documentation should not claim a local guarantee before admission and budget reservation exist.
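
A sketch of that deadline discipline, usable before scheduling contexts exist, is shown below; the names are illustrative, and “SQE” here just means a queued media work item.

```rust
// Sketch of per-buffer monotonic deadlines: late work is discarded
// deterministically instead of queued. Names are illustrative.
use std::time::{Duration, Instant};

struct MediaSqe {
    deadline: Instant,  // absolute monotonic deadline for this cycle's data
    frames: Box<[f32]>, // preallocated payload
}

enum Disposition {
    Process,
    Discard,
}

fn classify(sqe: &MediaSqe, now: Instant) -> Disposition {
    if now >= sqe.deadline {
        Disposition::Discard // stale: dropping it keeps latency bounded
    } else {
        Disposition::Process
    }
}

fn deadline_for_cycle(cycle_start: Instant, quantum_frames: u64, rate_hz: u64) -> Instant {
    cycle_start + Duration::from_nanos(quantum_frames * 1_000_000_000 / rate_hz)
}
```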

Web Shell And Remote Models

Web shell voice and remote realtime models cannot provide guaranteed local stack latency across the full path. Browser scheduling, WebRTC/WebSocket transport, provider inference, and network jitter all sit outside capOS control.

The capOS goal still applies: guarantee the part of the stack capOS controls when it is inside an admitted realtime island, then expose the rest as measured latency and jitter:

  • browser capture/playback buffer estimates;
  • gateway queue depth;
  • provider adapter send/receive jitter;
  • model first-audio latency;
  • tool-call pause duration;
  • barge-in cancellation latency;
  • playback underrun/drop counters.

This argues for a local media graph even when the model session is provider native. The local graph is where capOS can enforce bounded buffers, drops, deadlines, and audit.
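
A possible telemetry record for the non-guaranteed portion of such a path, mirroring the quantities listed above, is sketched here; the field names and units are assumptions.

```rust
// Possible telemetry record for the non-guaranteed portion of a voice path.
// Field names and units are assumptions, not an existing capOS schema.
#[derive(Debug, Default, Clone, Copy)]
struct RemoteVoiceTelemetry {
    browser_buffer_est_ms: f32,  // capture/playback buffer estimate
    gateway_queue_depth: u32,
    provider_send_jitter_ms: f32,
    provider_recv_jitter_ms: f32,
    first_audio_latency_ms: f32, // model time-to-first-audio
    tool_call_pause_ms: f32,
    barge_in_cancel_ms: f32,
    playback_underruns: u64,
    playback_drops: u64,
}
```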

Design Rules

  1. Prefer fixed quantum inside a realtime island.
  2. Reject graph activation or graph changes that cannot be admitted at the selected quantum unless policy explicitly relaxes the guarantee.
  3. Treat every async boundary as one or more declared latency cycles.
  4. Keep realtime callbacks pure data processing.
  5. Move permission checks, tool execution, logging, graph mutation, and model policy to non-realtime threads.
  6. Preallocate buffers and register memory before starting the graph.
  7. Use latency ranges and measured telemetry, not a single optimistic latency.
  8. Provide fail-closed policy that stops, bypasses, silences, or renegotiates quantum when a guarantee is violated, rather than letting queues grow.
  9. Preserve capability isolation even when it costs a cycle; make the cost explicit and measurable.
  10. Keep pro-audio/local paths independent from remote-provider voice paths.

Open Questions

  • What is the first capOS-visible latency target: voice shell, local playback, or pro-audio loopback?
  • Should graph-driver threads live in a privileged media service, or can an application own a realtime island under broker policy?
  • How should admission control estimate whether a new node can fit a quantum before activating it?
  • Should bridge latency be specified by policy, measured dynamically, or both?
  • Which telemetry window should determine when a bounded-soft-realtime path should switch to a larger quantum?
  • How should future CPU donation interact with graph scheduling contexts?

References