Network Usability And Post-smoltcp Backlog

This page decomposes the work that makes capOS networking usable after the Phase C userspace L4 stack exists. It deliberately sits beside the lower-layer Phase C track in Phase C Userspace NIC Driver Relocation and the cloud/Web UI chain in Hardware, Boot, and Storage.

The first public GCE Web UI path remains IPv4-first. Its network blockers are Phase C userspace L4, DHCP/IPv4 configuration, ARP/default-route reachability, private GCE proof, and the reviewed public HTTPS ingress posture. DNS, ping, IPv6, packet tracing, and advanced transport policy improve usability and diagnostics, but they do not block the first public self-hosted Web UI unless a later ingress policy explicitly chooses them as health or routing requirements.

Current State Boundaries

  • Production non-qemu L4 has a local Phase C 7c-ii(b) serve-from-userspace proof: cloud-prod-userspace-network-stack-smoltcp-local-proof boots the non-qemu cloudboot manifest, grants an application client only a userspace-served TcpListenAuthority, and completes one hostfwd TCP request/response through served TcpListener/TcpSocket caps. The qemu-only kernel smoltcp / virtio-net path still exists for local fixtures and transitional TCP/UDP caps; the legacy kernel socket owner is cleanup-only after the served-socket proof.
  • The current Nic cap is raw-frame oriented. It copies frames as inline Data through manager-owned buffers and exposes no host-physical or device-usable address to userspace.
  • The landed Nic.receive @1 is single-frame per call: it posts one RX buffer, drains one frame (or resets the device on an empty poll), and frees the buffer – it keeps no pool of RX buffers armed between calls and has no non-resetting “no frame yet” path. Multi-frame asynchronous TCP needs a sustained, keep-armed receive, designed as the receivePoll @4 bounce-RX-pool primitive in Phase C Userspace NIC Driver Relocation and landed by cloud-prod-nic-driver-userspace-sustained-receive-pool-local-proof. That slice is the prerequisite for Phase C 7c-iii (TcpListener/TcpSocket).
  • The first local DHCP IPv4 configuration proof is done: cloud-prod-network-stack-dhcp-ipv4-config-local-proof follows the served userspace smoltcp/socket proof, acquires a DHCPv4 lease over the Nic cap, installs IPv4 address/default-route state, resolves gateway and same-subnet ARP neighbors, and feeds userspace-served NetworkManager.getConfig. Renewal/rebind/expiry lifecycle, DNS option publication, and operator-visible lease status remain follow-up work.
  • A POSIX DNS smoke exists: demos/posix-dns-resolver/. It manually builds one DNS A query and sends it through the kernel UdpSocket cap to QEMU slirp DNS at 10.0.2.3. It is not a system resolver service, not a typed DnsResolver cap, and not a getaddrinfo / /etc/resolv.conf bridge.
  • IPv6 is already decomposed as a separate lane in Hardware, Boot, and Storage. Do not duplicate that lane here; link to it when diagnostics or resolver work needs dual-stack behavior.
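The gap between the landed single-frame Nic.receive @1 and a sustained, keep-armed receive can be shown with a small Rust model. Every name here is hypothetical: RxPool, device_deliver, receive_poll, and the simulated device queue are illustrations, not the receivePoll @4 schema. In the model, a fixed set of RX buffers stays posted between calls. An empty poll returns None instead of resetting the device, and each consumed frame re-arms its buffer.

```rust
use std::collections::VecDeque;

/// Hypothetical keep-armed bounce-RX pool (illustrative, not the capOS Nic API).
pub struct RxPool {
    capacity: usize,
    armed: usize,               // buffers currently posted to the simulated device
    filled: VecDeque<Vec<u8>>,  // frames the device has written into posted buffers
    dropped: u64,               // frames that arrived while no buffer was armed
}

impl RxPool {
    /// All buffers start armed; they stay armed across calls.
    pub fn new(capacity: usize) -> Self {
        RxPool { capacity, armed: capacity, filled: VecDeque::new(), dropped: 0 }
    }

    /// Simulated device side: a frame lands in one armed buffer, or is dropped.
    pub fn device_deliver(&mut self, frame: &[u8]) {
        if self.armed == 0 {
            self.dropped += 1;
            return;
        }
        self.armed -= 1;
        self.filled.push_back(frame.to_vec());
    }

    /// Consumer side: take at most one frame and re-arm its buffer.
    /// An empty poll is an ordinary "no frame yet" result, not a device reset.
    pub fn receive_poll(&mut self) -> Option<Vec<u8>> {
        let frame = self.filled.pop_front()?;
        self.armed += 1; // copy-out done; the bounce buffer goes back to the device
        debug_assert!(self.armed <= self.capacity);
        Some(frame)
    }

    pub fn armed(&self) -> usize {
        self.armed
    }

    pub fn dropped(&self) -> u64 {
        self.dropped
    }
}
```

Because "no frame yet" is a normal result, a TCP stack can poll repeatedly without losing frames that arrive between calls. Frames that arrive while every buffer is full are counted, not silently lost.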

User-facing Stories

Usable networking means operators and ordinary services can answer concrete questions without reading QEMU logs or proof tokens. Each story below maps to the task record that owns it and is classified against the first public GCE Web UI critical path stated above. Critical path items block the first public self-hosted Web UI proof. Diagnostics and Completeness items improve usability but do not block it unless a later ingress policy explicitly promotes one (see the IPv4-first scoping at the top of this page and the DHCP Plan section below).

Two of these stories are satisfied by configuration proofs that already live in the Current State Boundaries and DHCP Plan sections rather than by a usability tool: an operator gets a non-fixture address, default route, and userspace-served config status from cloud-prod-network-stack-dhcp-ipv4-config-local-proof (the first local DHCPv4/IPv4 config proof, critical path), and the basic socket substrate a server binds against comes from the Phase C socket-cap and TcpListener/TcpSocket proofs, with production manifest wiring owned by cloud-prod-userspace-network-stack-smoltcp-local-proof (critical path). The usability tasks below layer status, resolution, diagnostics, and server semantics on top of those.

Operator stories

What an operator needs to observe and diagnose the running network without holding raw NIC, DMA, or NetworkManager authority:

| Operator story | Owning task record | Web UI critical path |
|---|---|---|
| What interfaces exist, is link up, what MAC/address/prefix/default route/DNS config is active, and did it come from DHCP, static manifest, or a test fixture? | network-operator-status-tool-local-proof (done) | Diagnostics (non-blocking) |
| Which sockets/listeners are active, which authority granted them, what peer/port is bound, and are calls blocked on accept/recv/send/backpressure? | network-operator-status-tool-local-proof (done) over network-transport-status-cap-local-proof (done) | Diagnostics (non-blocking) |
| Does the stack publish DHCP-derived IPv4 address, default route, and gateway-neighbor state instead of a static fixture? | cloud-prod-network-stack-dhcp-ipv4-config-local-proof (done) | Critical path |
| Can a service bind a listener after boot without depending on the static QEMU 10.0.2.15 assumption? | cloud-prod-remote-session-web-ui-l4-local-proof over the done DHCP config proof and Phase C socket caps | Critical path |
| Is a DHCP lease active, and what is its renewal/rebind/expiry state and operator-visible status? | network-dhcpv4-lease-lifecycle-local-proof (done) | Completeness (non-blocking) |
| Can an operator run bounded ping / route / DNS-lookup / socket-status checks? | network-ping-diagnostics-tool-local-proof (done), network-operator-status-tool-local-proof (done) | Diagnostics (non-blocking) |
| Can an operator run bounded IPv6 ping6? | network-ping6-diagnostics-tool-local-proof (done; over the IPv6 lane) | Diagnostics (non-blocking) |
| Can a debugging authority capture bounded per-interface packets/summaries without arbitrary NIC, DMA, or raw network-manager authority? | network-packet-trace-authority-local-proof | Diagnostics (non-blocking) |

Application stories

What an ordinary service or POSIX program needs to use the network through narrowly-scoped capabilities instead of raw socket/manager authority:

| Application story | Owning task record | Web UI critical path |
|---|---|---|
| Can a process resolve a hostname through a typed resolver capability instead of holding raw UDP socket authority? | network-system-dnsresolver-cap-local-proof (done) | Completeness (non-blocking) |
| Can POSIX software call getaddrinfo and read resolver config through the adapter without owning a broader NetworkManager? | posix-getaddrinfo-system-resolver-bridge-local-proof (done) | Completeness (non-blocking) |
| Can a long-lived server rely on readiness, cancellation, and backpressure instead of assuming every socket call eventually completes? | network-socket-readiness-poll-cancel-backpressure-local-proof (done) | Completeness (non-blocking) |
| Can POSIX software wait for socket readiness through poll/select over the settled readiness model? | posix-socket-poll-select-bridge-local-proof (done) | Completeness (non-blocking) |
| Can a server set keepalive and connect/accept/recv timeouts? | network-transport-keepalive-timeout-policy-local-proof (done) | Completeness (non-blocking) |
| Can a server read connection state, backpressure depth, active keepalive/timeout, congestion controller, and interface MTU/MSS? | network-transport-status-cap-local-proof (done) | Completeness (non-blocking) |

DNS resolution is listed as Completeness rather than Critical path because the selected public ingress can route to a backend by configured address/load-balancer target; it becomes a deployment-policy dependency only under the conditions in System Resolver Plan below.

DHCP Plan

DHCP belongs in the userspace network-stack process or a narrowly-authorized userspace configuration service, not in the kernel. The kernel should stage only the minimal capabilities needed to start the network stack and deliver socket/result caps. Lease parsing, renewal timers, rebind behavior, expiry, DNS/search-domain extraction, and status reporting are policy/state-machine work and should not be added to the qemu-only kernel smoltcp path.

The ordering is:

  1. Phase C slice 7a proves smoltcp can run in a userspace process over the Nic cap.
  2. Phase C 7b, 7c-i, 7c-ii(a), and 7c-iii prove the socket-cap and TcpListener/TcpSocket substrate; 7c-ii(b) locally proves the production manifest through the selected serve-from-userspace path.
  3. cloud-prod-network-stack-dhcp-ipv4-config-local-proof is done. It implements the first local DHCPv4 lease/configuration proof: lease acquisition, IPv4 address, prefix/netmask, default gateway, and ARP neighbor proof.
  4. network-dhcpv4-lease-lifecycle-local-proof is done. It extends that first proof into the full DHCPv4 lease lifecycle. A deterministic in-process fixture DHCP/ARP responder drives the real userspace smoltcp DHCPv4 client under a harness-controlled synthetic clock through initial lease acquisition, T1 unicast renewal, T2 broadcast rebind, and lease expiry; the served NetworkManager.getConfig status surface reports a fail-closed zero state on expiry (never stale lease data) and resolves static-config precedence over a live DHCP lease; DNS server and search-domain options are extracted from the wire and held as resolver inputs without being exposed through getConfig. Proof: make run-network-dhcpv4-lease-lifecycle. The real-network initial acquisition over the Nic cap stays proven by make run-cloud-prod-network-stack-dhcp-ipv4-config.
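The lease-lifecycle states exercised by that proof can be written as a pure function of a synthetic clock. This is a minimal Rust model under stated assumptions, not the capOS or smoltcp implementation:

- The type and function names are hypothetical.
- Timers default to the RFC 2131 values: T1 at 50% and T2 at 87.5% of the lease.
- reported_config mirrors the two rules stated above: static config wins over a live lease, and an expired lease reports a fail-closed zero state.

```rust
/// Lease phases driven by the RFC 2131 T1/T2/lease timers (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LeasePhase {
    Bound,
    Renewing,  // T1 reached: unicast renewal to the leasing server
    Rebinding, // T2 reached: broadcast rebind to any server
    Expired,   // lease time reached: the address must no longer be used
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Ipv4Config {
    pub addr: [u8; 4],
    pub prefix_len: u8,
    pub gateway: [u8; 4],
}

impl Ipv4Config {
    /// Fail-closed zero state reported after expiry (never stale lease data).
    pub const ZERO: Ipv4Config = Ipv4Config { addr: [0; 4], prefix_len: 0, gateway: [0; 4] };
}

/// Timers are offsets in seconds from `acquired_at` on a synthetic clock.
pub struct Lease {
    pub config: Ipv4Config,
    pub acquired_at: u64,
    pub t1: u64,
    pub t2: u64,
    pub lease: u64,
}

impl Lease {
    /// Uses the RFC 2131 default timers: T1 = 0.5 and T2 = 0.875 of the lease.
    pub fn with_default_timers(config: Ipv4Config, acquired_at: u64, lease: u64) -> Lease {
        Lease { config, acquired_at, t1: lease / 2, t2: lease * 7 / 8, lease }
    }

    pub fn phase(&self, now: u64) -> LeasePhase {
        let elapsed = now.saturating_sub(self.acquired_at);
        if elapsed >= self.lease {
            LeasePhase::Expired
        } else if elapsed >= self.t2 {
            LeasePhase::Rebinding
        } else if elapsed >= self.t1 {
            LeasePhase::Renewing
        } else {
            LeasePhase::Bound
        }
    }
}

/// What a getConfig-like status surface would report at `now`.
pub fn reported_config(static_cfg: Option<Ipv4Config>, lease: Option<&Lease>, now: u64) -> Ipv4Config {
    if let Some(cfg) = static_cfg {
        return cfg; // static manifest config takes precedence over any DHCP lease
    }
    match lease {
        Some(l) if l.phase(now) != LeasePhase::Expired => l.config,
        _ => Ipv4Config::ZERO,
    }
}
```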

System Resolver Plan

capOS should expose DNS through a typed resolver capability, not by making every consumer hold NetworkManager or raw UDP authority. The first resolver should be a stub resolver service, not a recursive resolver:

  • Inputs: DHCP-provided nameserver/search-domain options from the IPv4 config path and optional static manifest resolver config.
  • Authority: one narrowly-scoped UDP socket or resolver-upstream authority plus Timer; no broad NetworkManager unless the slice explicitly justifies it.
  • Output: a typed DnsResolver cap with bounded query names, record types, timeouts, response-size limits, negative/error mapping, and observable configuration provenance.
  • POSIX bridge: getaddrinfo and a bounded /etc/resolv.conf projection call into DnsResolver; POSIX callers should not parse raw DHCP state or own upstream sockets.
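The "bounded query names" and "negative/error mapping" requirements can be made concrete with a short Rust sketch. It assumes:

- The RFC 1035 limits: 63-octet labels and 255-octet encoded names.
- A conservative letters/digits/hyphen hostname charset.

encode_qname, map_rcode, and both error enums are hypothetical names, not the DnsResolver interface.

```rust
/// Reasons a query name is rejected before any packet is built (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum NameError {
    Empty,
    EmptyLabel,
    LabelTooLong,
    NameTooLong,
    BadChar,
}

/// Encodes a hostname into DNS wire format (length-prefixed labels, root terminator).
pub fn encode_qname(name: &str) -> Result<Vec<u8>, NameError> {
    let name = name.strip_suffix('.').unwrap_or(name); // accept a fully-qualified trailing dot
    if name.is_empty() {
        return Err(NameError::Empty);
    }
    let mut wire = Vec::with_capacity(name.len() + 2);
    for label in name.split('.') {
        if label.is_empty() {
            return Err(NameError::EmptyLabel);
        }
        if label.len() > 63 {
            return Err(NameError::LabelTooLong);
        }
        // Conservative hostname charset; a real resolver policy might be wider.
        if !label.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-') {
            return Err(NameError::BadChar);
        }
        wire.push(label.len() as u8);
        wire.extend_from_slice(label.as_bytes());
    }
    wire.push(0); // root label terminates the name
    if wire.len() > 255 {
        return Err(NameError::NameTooLong);
    }
    Ok(wire)
}

/// Typed negative/error results a resolver cap could return (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum ResolveError {
    FormatError,
    ServerFailure,
    NotFound,
    Refused,
    Timeout, // produced by the server's own deadline, never by map_rcode
}

/// Maps the 4-bit RCODE from a DNS response header (RFC 1035 section 4.1.1).
pub fn map_rcode(rcode: u8) -> Result<(), ResolveError> {
    match rcode & 0x0f {
        0 => Ok(()),                           // NOERROR
        1 => Err(ResolveError::FormatError),   // FORMERR
        3 => Err(ResolveError::NotFound),      // NXDOMAIN ("name error")
        5 => Err(ResolveError::Refused),       // REFUSED
        _ => Err(ResolveError::ServerFailure), // SERVFAIL, NOTIMP, and anything else
    }
}
```

A silent upstream never reaches map_rcode. In that case the server would report ResolveError::Timeout from its own deadline.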

The typed resolver capability landed as network-system-dnsresolver-cap-local-proof. The POSIX bridge landed as posix-getaddrinfo-system-resolver-bridge-local-proof: libcapos-posix now implements getaddrinfo / freeaddrinfo / gai_strerror over a granted dns_resolver endpoint (resolver status -> typed addrinfo/EAI_*; no ambient UDP fallback), plus a read-only /etc/resolv.conf projection derived from the resolver status (writes fail-closed EACCES, absent without the cap). Proof: make run-posix-getaddrinfo. AAAA / sockaddr_in6, AI_* flags, and an /etc/services table remain follow-ups (getaddrinfo fails closed on each: EAI_FAMILY / EAI_BADFLAGS / EAI_SERVICE).
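The fail-closed behavior described above fits in a single Rust function. The EAI_* names follow POSIX, but their numeric values are platform-specific and are not modeled. The following are assumptions for illustration, not the libcapos-posix code:

- ResolverStatus and bridge_getaddrinfo.
- The mapping NotFound to EAI_NONAME.
- The mapping Timeout to EAI_AGAIN.
- The mapping Unavailable to EAI_FAIL.

```rust
/// POSIX EAI_* error names; numeric values are platform-specific and not modeled.
#[derive(Debug, PartialEq, Eq)]
pub enum Eai {
    Family,
    BadFlags,
    Service,
    NoName,
    Again,
    Fail,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Family {
    Unspec,
    Inet,
    Inet6,
}

/// Hypothetical shape of a DnsResolver result as seen by the bridge.
pub enum ResolverStatus {
    Resolved(Vec<[u8; 4]>),
    NotFound,
    Timeout,
    Unavailable,
}

/// Returns (IPv4 address, port) pairs, or the EAI_* error the bridge would report.
pub fn bridge_getaddrinfo(
    family: Family,
    ai_flags: i32,
    service: Option<&str>,
    status: ResolverStatus,
) -> Result<Vec<([u8; 4], u16)>, Eai> {
    // Follow-up features fail closed instead of being silently approximated.
    if family == Family::Inet6 {
        return Err(Eai::Family); // AAAA / sockaddr_in6 not bridged yet
    }
    if ai_flags != 0 {
        return Err(Eai::BadFlags); // AI_* flags not supported yet
    }
    let port = match service {
        None => 0,
        // Numeric ports only: there is no /etc/services table to resolve "https".
        Some(s) => s.parse::<u16>().map_err(|_| Eai::Service)?,
    };
    match status {
        ResolverStatus::Resolved(addrs) if !addrs.is_empty() => {
            Ok(addrs.into_iter().map(|a| (a, port)).collect())
        }
        ResolverStatus::Resolved(_) | ResolverStatus::NotFound => Err(Eai::NoName),
        ResolverStatus::Timeout => Err(Eai::Again),
        // No resolver cap or no upstream config: fail; never fall back to ambient UDP.
        ResolverStatus::Unavailable => Err(Eai::Fail),
    }
}
```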

DNS does not normally block the first GCE Web UI proof because the selected public ingress path can route to a backend by configured address/load-balancer target. DNS becomes a deployment-policy dependency when capOS itself must resolve outbound names, when the public proof asserts a DNS hostname end to end, or when IPv6 ingress adds AAAA/certificate policy.

Beyond smoltcp

The near-term plan is not to replace smoltcp or hand-roll TCP algorithms. Phase C should first move smoltcp out of the kernel, preserve the existing socket contract, and make its behavior observable. The distinction this lane keeps is between relocation (Phase C slices 7a-7c: run the selected smoltcp build in userspace and preserve the socket contract) and transport policy/status (the capOS control plane around that stack, decomposed below). Relocation does not require any new transport mechanic; the policy/status work starts only after the stack is observable.

What the selected smoltcp build actually exposes

smoltcp is pinned at version 0.13.0 (Cargo.lock). capOS does not build the crate’s default feature set; it enables narrow per-proof subsets:

  • The qemu-only kernel fixture (kernel/Cargo.toml) enables alloc, medium-ethernet, proto-ipv4, socket-tcp, and socket-udp.
  • The early Phase C userspace 7a/7b demos demos/cloud-prod-network-stack-process-smoltcp-skeleton-smoke and demos/cloud-prod-network-stack-smoltcp-socket-caps-smoke enable alloc, medium-ethernet, proto-ipv4, and socket-udp only. Those early demos are UDP-only and should not be read as the current full Phase C L4 status.
  • The later Phase C TCP proofs demos/cloud-prod-network-stack-smoltcp-tcp-listener-roundtrip-smoke and demos/cloud-prod-network-stack-smoltcp-tcp-socket-cap-ipc-smoke enable alloc, medium-ethernet, proto-ipv4, and socket-tcp. The completed cloud-prod-userspace-network-stack-smoltcp-local-proof builds on that substrate and proves a local served TcpListenAuthority / TcpListener / TcpSocket request/response through the userspace network stack.
  • The selected IPv4 Web UI path now has a local DHCP/IPv4 configuration proof over smoltcp’s socket-dhcpv4 path in the Phase C userspace stack. The landed proof stops at config/status, route, and ARP neighbor evidence; the local bounded ICMPv4 Echo Reply proof is also done for diagnostics. socket-dns, the operator IPv4 ping tool, the local remote-session-web-ui L4 proof, private GCE reachability, and public ingress/TLS remain separate gates.

None of the IPv4 TCP builds cited above enables socket-tcp-reno or socket-tcp-cubic. Those features are what compile smoltcp’s Reno and CUBIC controllers into the CongestionControl enum; without them the only available variant is CongestionControl::None, which is also smoltcp’s default. capOS therefore runs with no congestion control today as a consequence of its build configuration, not as a reviewed policy choice. Selecting Reno (or CUBIC, which uses f64) is a build-feature flip plus a set_congestion_control call, not a custom algorithm.
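The claim that the build configuration, not a policy decision, fixes the controller can be checked mechanically. The hypothetical Rust helper below encodes the rule stated above and applies it to the feature sets listed for the kernel fixture and the userspace TCP proofs. socket-tcp-reno and socket-tcp-cubic are smoltcp feature names; the helper and constant names are illustrative.

```rust
/// Which CongestionControl variants a build can select, per the rule above
/// (illustrative helper, not smoltcp code).
pub fn available_controllers(features: &[&str]) -> Vec<&'static str> {
    let mut v = vec!["None"]; // always compiled in, and the default
    if features.contains(&"socket-tcp-reno") {
        v.push("Reno");
    }
    if features.contains(&"socket-tcp-cubic") {
        v.push("Cubic");
    }
    v
}

/// Feature set of the qemu-only kernel fixture (kernel/Cargo.toml).
pub const KERNEL_FIXTURE: &[&str] =
    &["alloc", "medium-ethernet", "proto-ipv4", "socket-tcp", "socket-udp"];

/// Feature set of the later Phase C userspace TCP proofs.
pub const USERSPACE_TCP_PROOFS: &[&str] =
    &["alloc", "medium-ethernet", "proto-ipv4", "socket-tcp"];
```

Adding socket-tcp-reno is the only change that makes a Reno variant exist. Even then, the stack still needs an explicit set_congestion_control call on each tcp::Socket to select it.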

For read-only status, smoltcp’s TCP socket already exposes the introspection capOS would surface: connection state (state() over the TCP state machine), local_endpoint()/remote_endpoint(), liveness predicates (is_open/is_active/is_listening, may_send/may_recv, can_send/can_recv), buffer sizes (send_capacity/recv_capacity), and the current backpressure depth (send_queue/recv_queue bytes). Keepalive and idle timeout are policy setters with matching getters (keep_alive/set_keep_alive, timeout/set_timeout). There is no per-socket getter for negotiated MSS, RTT, or retransmission counts in 0.13.0; MTU is an Interface/phy::Device property, so MTU/MSS status must be sourced from the interface and device capabilities, not from the TCP socket.

Status capOS must surface

Read-only transport status the socket/listener caps should expose, each backed by an existing smoltcp getter (or interface property) so it records selected behavior rather than asserting new mechanics:

| Status | smoltcp / interface source |
|---|---|
| Connection state | tcp::Socket::state() |
| Local / remote endpoint | local_endpoint() / remote_endpoint() |
| Send/receive backpressure depth | send_queue() / recv_queue() vs send_capacity() / recv_capacity() |
| Readiness / liveness | may_send/may_recv, can_send/can_recv, is_active/is_listening |
| Active keepalive / idle timeout | keep_alive() / timeout() |
| Active congestion controller | congestion_control() (today always None) |
| Interface MTU and configured-MTU source | Interface/phy::Device capabilities, manifest config |
| Listener backlog pressure | accepted-socket count vs configured backlog |
| Close / error / reset reason | socket close transition plus the cap/network.rs error mapping |
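A read-only snapshot over the getters in the table can be sketched in Rust. TcpIntrospect is a hypothetical trait. It flattens smoltcp's richer return types (tcp::State, Option<Duration>) into plain values so the example is self-contained. SampleSocket is a test double whose can_send and can_recv only approximate smoltcp's predicates. Because snapshot takes &S, a status read cannot mutate socket state.

```rust
/// Hypothetical trait over the smoltcp getters named in the status table.
pub trait TcpIntrospect {
    fn state_name(&self) -> &'static str;
    fn send_queue(&self) -> usize;
    fn send_capacity(&self) -> usize;
    fn recv_queue(&self) -> usize;
    fn recv_capacity(&self) -> usize;
    fn can_send(&self) -> bool;
    fn can_recv(&self) -> bool;
    fn keep_alive_ms(&self) -> Option<u64>;
    fn timeout_ms(&self) -> Option<u64>;
    fn congestion_control_name(&self) -> &'static str;
}

#[derive(Debug, PartialEq, Eq)]
pub struct TransportStatus {
    pub state: &'static str,
    pub send_fill_pct: u8,
    pub recv_fill_pct: u8,
    pub writable: bool,
    pub readable: bool,
    pub keep_alive_ms: Option<u64>,
    pub timeout_ms: Option<u64>,
    pub congestion_control: &'static str,
}

/// Queue depth as a percentage of capacity, clamped to 0..=100.
fn fill_pct(queue: usize, capacity: usize) -> u8 {
    if capacity == 0 {
        return 100;
    }
    ((queue.min(capacity) * 100) / capacity) as u8
}

/// Builds a status record from a shared reference only.
pub fn snapshot<S: TcpIntrospect>(s: &S) -> TransportStatus {
    TransportStatus {
        state: s.state_name(),
        send_fill_pct: fill_pct(s.send_queue(), s.send_capacity()),
        recv_fill_pct: fill_pct(s.recv_queue(), s.recv_capacity()),
        writable: s.can_send(),
        readable: s.can_recv(),
        keep_alive_ms: s.keep_alive_ms(),
        timeout_ms: s.timeout_ms(),
        congestion_control: s.congestion_control_name(),
    }
}

/// Test double standing in for a live smoltcp socket.
pub struct SampleSocket {
    pub send_q: usize,
    pub send_cap: usize,
    pub recv_q: usize,
    pub recv_cap: usize,
}

impl TcpIntrospect for SampleSocket {
    fn state_name(&self) -> &'static str { "ESTABLISHED" }
    fn send_queue(&self) -> usize { self.send_q }
    fn send_capacity(&self) -> usize { self.send_cap }
    fn recv_queue(&self) -> usize { self.recv_q }
    fn recv_capacity(&self) -> usize { self.recv_cap }
    fn can_send(&self) -> bool { self.send_q < self.send_cap } // approx.: TX buffer not full
    fn can_recv(&self) -> bool { self.recv_q > 0 } // approx.: RX buffer not empty
    fn keep_alive_ms(&self) -> Option<u64> { None }
    fn timeout_ms(&self) -> Option<u64> { None }
    fn congestion_control_name(&self) -> &'static str { "None" } // no reno/cubic feature
}
```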

v0 classification

  • v0 policy inputs (operator/service-settable): per-socket keepalive interval and connect/recv/idle timeout (smoltcp set_keep_alive / set_timeout plus connect/accept/recv deadlines), and listener backlog bound. These map to existing smoltcp setters and to call-level deadlines.
  • v0 read-only status: the status table above — exposed through the socket, listener, and NetworkManager-side status surface without letting callers mutate stack internals.
  • Deferred until workload evidence: congestion-control algorithm selection, path-MTU discovery, TCP-mechanic tuning (window scaling, Nagle/quickack policy), and any stack replacement. The default is to observe and surface the selected stack’s behavior first.
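The v0 split between settable deadlines and terminal call outcomes can be sketched in Rust. All names are hypothetical, and MIN_KEEPALIVE_MS is an assumed floor rather than a capOS constant. The sketch shows two things: an invalid policy input fails closed, and a refused connection is its own outcome rather than a variant of timeout.

```rust
/// Events the socket layer may observe for a parked call (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Completed,
    Refused, // e.g. RST in response to SYN
    Reset,
}

/// Terminal outcomes, plus `Parked` for a call still inside its deadline.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Completed,
    Refused,
    Reset,
    TimedOut,
    Parked,
}

/// Assumed floor so a caller cannot turn keepalive into a probe flood.
pub const MIN_KEEPALIVE_MS: u64 = 1_000;

/// Per-socket v0 policy inputs; `accept_ms: None` is assumed to mean "wait indefinitely".
#[derive(Clone, Copy)]
pub struct TimeoutPolicy {
    pub connect_ms: u64,
    pub accept_ms: Option<u64>,
    pub recv_ms: u64,
    pub keep_alive_ms: Option<u64>,
}

impl TimeoutPolicy {
    /// Rejects bad inputs instead of silently clamping them.
    pub fn validate(&self) -> Result<(), &'static str> {
        if self.connect_ms == 0 || self.recv_ms == 0 || self.accept_ms == Some(0) {
            return Err("deadline must be non-zero");
        }
        match self.keep_alive_ms {
            Some(k) if k < MIN_KEEPALIVE_MS => Err("keepalive interval below floor"),
            _ => Ok(()),
        }
    }
}

/// An observed event wins; otherwise the call times out once `deadline_ms`
/// has elapsed since `started_ms`, and stays parked before that.
pub fn classify(started_ms: u64, now_ms: u64, deadline_ms: u64, event: Option<Event>) -> Outcome {
    match event {
        Some(Event::Completed) => Outcome::Completed,
        Some(Event::Refused) => Outcome::Refused,
        Some(Event::Reset) => Outcome::Reset,
        None if now_ms.saturating_sub(started_ms) >= deadline_ms => Outcome::TimedOut,
        None => Outcome::Parked,
    }
}
```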

Decomposed follow-ups

  • Cancellation, readiness, close, and backpressure semantics are settled by network-socket-readiness-poll-cancel-backpressure-local-proof (done); the POSIX poll/select bridge over that model is settled by posix-socket-poll-select-bridge-local-proof (done). The settled readiness states map to POSIX event bits in the shared capos-rt::pollselect core (POLLIN/POLLOUT/POLLHUP/POLLERR/POLLNVAL, no stale readable/writable after close/release); the libcapos-posix C poll()/select() surface and <poll.h>/<sys/select.h> headers delegate to it and fail closed on unsupported event bits / bad nfds / closed fds. The proof is an in-process smoltcp fixture (harness=in-process-smoltcp-fixture, posix_surface=demo-local-model) plus the c-libc-surface C-surface checks; make run-posix-socket-poll-select. Blocking readiness (a Pollable cap) is the follow-up lane, since the v0 UdpSocket/Pipe caps expose no non-blocking readiness method.
  • Read-only transport status (the table above, including congestion-control reporting and interface MTU/MSS reporting) is settled by network-transport-status-cap-local-proof (done). The local proof is an in-process smoltcp fixture (harness=in-process-smoltcp-fixture, status_surface=demo-local-model); the production cap/schema wiring of the status surface is the follow-up lane.
  • Keepalive and connect/accept/recv timeout policy inputs are owned by network-transport-keepalive-timeout-policy-local-proof (done). The local proof is an in-process smoltcp fixture (harness=in-process-smoltcp-fixture, policy_surface=demo-local-model); the production cap/schema wiring of these inputs is the follow-up lane. That lane should model connection-refused as its own terminal call outcome (the v0 demo’s DeadlineWaiter only distinguishes timeout from a still-parked call, proving refused-vs-timeout distinctness at the socket layer rather than in the waiter abstraction).
  • Congestion-control evaluation is a deliberately deferred lane, not a runnable task. Its first precondition, the read-only transport-status proof, has landed. The lane still may not open until a workload produces evidence (loss/throughput/latency under a real capOS network server) that the default CongestionControl::None is inadequate. Its entry criteria are: a reproducible workload, a recorded baseline under None, and a decision to flip the socket-tcp-reno / socket-tcp-cubic build feature (configuration, still not a custom algorithm) before any hand-rolled TCP mechanic is even considered. Replacing smoltcp’s TCP mechanics remains speculative until that evidence exists.
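The readiness-to-poll mapping settled above can be sketched in Rust. This is not the capos-rt::pollselect code:

- The numeric POLL* values are the Linux ones.
- SockState and revents are hypothetical names.
- Tolerating POLLERR/POLLHUP in the requested events while rejecting other bits (POLLPRI, for example) is an assumed policy.

```rust
// Linux numeric values, used only for illustration; capos-rt may define its own.
pub const POLLIN: i16 = 0x001;
pub const POLLPRI: i16 = 0x002;
pub const POLLOUT: i16 = 0x004;
pub const POLLERR: i16 = 0x008;
pub const POLLHUP: i16 = 0x010;
pub const POLLNVAL: i16 = 0x020;

/// Hypothetical snapshot of one fd's settled readiness.
#[derive(Clone, Copy, Default)]
pub struct SockState {
    pub released: bool, // fd closed/released by the caller
    pub readable: bool, // buffered data or a pending accept
    pub writable: bool, // send buffer has room
    pub closed: bool,   // connection fully closed; buffered data may remain
    pub reset: bool,    // connection reset by peer
}

#[derive(Debug, PartialEq, Eq)]
pub struct UnsupportedEvents(pub i16);

pub fn revents(s: SockState, events: i16) -> Result<i16, UnsupportedEvents> {
    // POLLERR/POLLHUP are output-only but tolerated in `events`; other bits fail closed.
    const ACCEPTED: i16 = POLLIN | POLLOUT | POLLERR | POLLHUP;
    let unsupported = events & !ACCEPTED;
    if unsupported != 0 {
        return Err(UnsupportedEvents(unsupported));
    }
    if s.released {
        return Ok(POLLNVAL); // stale fd: never readable or writable
    }
    if s.reset {
        return Ok(POLLERR | POLLHUP); // no stale readable/writable bits after reset
    }
    let mut r = 0;
    if s.closed {
        r |= POLLHUP; // reported whether or not it was requested
    }
    if s.readable && events & POLLIN != 0 {
        r |= POLLIN;
    }
    if s.writable && !s.closed && events & POLLOUT != 0 {
        r |= POLLOUT;
    }
    Ok(r)
}
```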

Task Lanes

Docs/status lanes (both done 2026-06-03):

Blocked behavior/read-side lanes:

  • network-operator-status-tool-local-proof is done: it adds the operator-visible ip addr / route / DNS / link / socket-state equivalent over the Phase C userspace stack. A network-stack server acquires a real IPv4 DHCP lease, snapshots link/MAC/address/prefix/route/gateway-neighbor/DNS/search-domain/lease-state/socket state, and serves it over a read-only network_status endpoint to a separately spawned status tool. The tool holds no NetworkManager cap, prints a bounded status table reflecting the live stack state (distinguishing available DNS from the unavailable search domain), and observes the fail-closed rejection of a forged socket-creation call. Proof: make run-network-status-tool. Promoting the demo-local status surface to a first-class NetworkStatus schema interface is deferred (it would cross the schema/generated-bindings conflict domain).
  • network-dhcpv4-lease-lifecycle-local-proof is done: it extends the first DHCP config proof into a real lease lifecycle (renewal, rebind, expiry/fail-closed, static precedence, DNS option publication) via make run-network-dhcpv4-lease-lifecycle.
  • network-system-dnsresolver-cap-local-proof is done: it adds a typed DnsResolver capability with a strict cross-process authority split. A resolver server owns the upstream-DNS authority (it runs the query over a real smoltcp UDP socket against a configured upstream, with the upstream isolated in-process as a deterministic DNS responder under a synthetic clock), sources resolver config from a static-manifest entry plus a modelled DHCP option-6 entry with observable provenance, and serves a read-only DnsResolver endpoint. A separately spawned resolver tool holds only the endpoint – no NetworkManager, Nic, or UDP socket authority – so it resolves bounded A/AAAA hostnames through the cap and cannot resolve names by ambient network authority. The proof exercises a resolved A record, a resolved AAAA record (A/AAAA-capable API shape), NXDOMAIN -> not-found, a silent upstream -> typed timeout, fail-closed unavailable with no upstream config, a status surface reporting config source/active upstreams/last error (no packet payloads or raw DHCP leases), and the fail-closed rejection of a forged raw-upstream call on the read-only endpoint. No schema, kernel, or capos-rt change: like the operator status surface, the resolver endpoint is an interface-agnostic protocol local to the demo, and promoting it to a first-class DnsResolver schema interface is deferred to avoid the schema/generated-bindings conflict domain. Proof: make run-network-system-dnsresolver.
  • posix-getaddrinfo-system-resolver-bridge-local-proof is done: it bridges POSIX getaddrinfo / resolver configuration to DnsResolver. Proof: make run-posix-getaddrinfo.
  • network-ping-diagnostics-tool-local-proof is done: it adds the bounded local IPv4 ping diagnostics tool over the done ICMPv4 Echo Reply lane, proving same-subnet and gateway-routed echo success, malformed-reply drop, timeout/unreachable classification, and retry/payload bounds. Proof: make run-network-ping-tool.
  • network-ping6-diagnostics-tool-local-proof is done: it adds the bounded local IPv6 ping diagnostics tool over the existing IPv6/ICMPv6 lane without changing the IPv4-first Web UI critical path.
  • network-socket-readiness-poll-cancel-backpressure-local-proof is done: it settles usable server semantics for readiness (accept/read/write/closed/reset/config-unavailable), parked-call cancellation, close/stale-waiter rejection, and send/receive backpressure. A single proof process drives two real userspace smoltcp interfaces wired by an in-process frame shuttle under a synthetic clock and asserts each case straight from real smoltcp getters (state, may_*/can_*, *_queue vs *_capacity). Proof: make run-network-socket-readiness. The POSIX poll/select bridge landed in posix-socket-poll-select-bridge-local-proof (done), which exposes the surface only once implemented and proven.
  • network-transport-status-cap-local-proof is done: it surfaces read-only transport status (connection state, endpoints, send/recv backpressure depth, active keepalive/timeout, active congestion controller, interface link/IP MTU with MSS marked derived/not-exposed, listener backlog pressure, and the close/reset reason mapped onto the cap/network.rs NetworkError vocabulary) over the userspace stack, and proves the status read is strictly read-only (fingerprint unchanged, zero frames emitted). A single proof process drives two real userspace smoltcp interfaces wired by an in-process frame shuttle under a synthetic clock (harness=in-process-smoltcp-fixture, status_surface=demo-local-model). Proof: make run-network-transport-status; the production cap/schema wiring of the status surface is the follow-up lane.
  • network-transport-keepalive-timeout-policy-local-proof (done) adds keepalive and connect/accept/recv timeout policy inputs over the userspace stack.
  • network-packet-trace-authority-local-proof adds bounded per-interface packet/debug trace authority for diagnostics. The local proof (make run-network-packet-trace) feeds every transmit/receive frame of one real userspace-smoltcp DHCP bring-up path through a bounded PacketTrace: a fixed capture capacity (so the drop counter is exercised), a fixed per-packet header-only byte cap (payload_policy=header-only-no-body – at most the leading L2/L3/L4 header bytes of any frame are recorded, packet bodies are never captured), a direction filter, an expiry deadline, and capture/drop/admission counters with grant provenance (which authority enabled the trace and why). The captured trace is served over a read-only endpoint to a reader granted only a console plus that endpoint – no Nic, DMAPool, DeviceMmio, Interrupt, or NetworkManager – so the diagnostic authority is strictly observe-only: forged transmit/reconfigure/open-socket calls are rejected fail-closed, and a sibling probe holding no trace cap cannot observe any packet. Payload-visibility policy: the trace exposes only bounded headers for protocol diagnosis (DHCP/ARP/IPv4-UDP classification), never application payload, and it transfers no device or socket authority – this is why the authority is diagnostic-only and is grounded in Debug, Trace, and Profiling Authority (the read-only sampler authority class, not the read/write DebugSession class). Promotion to a first-class PacketTrace schema interface is deferred to avoid the schema/generated-bindings conflict domain, matching the sibling status/DNS diagnostic proofs.
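The bounded capture described above can be sketched in Rust, with these assumptions:

- The names PacketTrace, Grant, Record, and offer are hypothetical.
- The split of counters into offered/admitted/dropped is hypothetical.
- header_cap is a fixed leading-byte cap, as in the proof, not a parser that locates L2/L3/L4 header boundaries.

The reader side gets only &PacketTrace. Its methods expose records and counters and nothing that could transmit or reconfigure.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Direction {
    Tx,
    Rx,
}

/// Provenance carried with the trace: which authority enabled it and why.
pub struct Grant {
    pub granted_by: &'static str,
    pub reason: &'static str,
}

/// One captured frame: leading bytes only, plus the original wire length.
pub struct Record {
    pub dir: Direction,
    pub header: Vec<u8>,
    pub wire_len: usize,
}

pub struct PacketTrace {
    capacity: usize,           // max records kept; later admitted frames count as drops
    header_cap: usize,         // max leading bytes kept per frame (bodies never copied)
    filter: Option<Direction>, // None = both directions
    expires_at: u64,           // synthetic-clock deadline; nothing admitted at or after it
    records: Vec<Record>,
    pub grant: Grant,
    pub offered: u64,
    pub admitted: u64,
    pub dropped: u64,
}

impl PacketTrace {
    pub fn new(capacity: usize, header_cap: usize, filter: Option<Direction>, expires_at: u64, grant: Grant) -> Self {
        PacketTrace {
            capacity,
            header_cap,
            filter,
            expires_at,
            records: Vec::new(),
            grant,
            offered: 0,
            admitted: 0,
            dropped: 0,
        }
    }

    /// Called by the stack for every TX/RX frame; the trace only copies bytes out.
    pub fn offer(&mut self, now: u64, dir: Direction, frame: &[u8]) {
        self.offered += 1;
        if now >= self.expires_at || self.filter.map_or(false, |f| f != dir) {
            return; // expired or filtered out: not admitted
        }
        self.admitted += 1;
        if self.records.len() >= self.capacity {
            self.dropped += 1;
            return;
        }
        let n = frame.len().min(self.header_cap);
        self.records.push(Record { dir, header: frame[..n].to_vec(), wire_len: frame.len() });
    }

    /// Read-only view for the reader.
    pub fn records(&self) -> &[Record] {
        &self.records
    }

    pub fn captured(&self) -> u64 {
        self.records.len() as u64
    }
}
```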