Scope Decision: Real-GCE “Reachable Network Stack” – Raw-Frame TX/RX vs L4 Sockets

Decision

Option A. For the second cloud milestone (“usable cloud instance”, docs/backlog/hardware-boot-storage.md), the network data-path reachability bar (“a reachable network data path” / “reachable network stack”) means raw-frame (ethernet) TX/RX reachability over the live GCE NIC. The reachability proof is the production polled userspace virtio-net provider exchanging frames over the real function. Slices 1-4 of the GCE polling-path track plus the slice-6 billable boot close that data-path reachability bar.

L4 sockets (TCP/UDP reachable from a userspace application) are a separate future track – networking-proposal Phase C – and are explicitly not a real-GCE-boot data-path blocker. This decision does not start that track; it records that the track exists, is sequenced after the milestone, and is gated by its own Phase C prerequisites rather than by the cloud usable-instance data-path bar.

Scope boundary: data-path reachability vs L4 terminal access

The milestone bullet (docs/backlog/hardware-boot-storage.md, “Second cloud milestone: usable cloud instance”) states two network requirements, not one: add network drivers and “prove SSH/WebShell or other network terminal access over the cloud NIC.” SSH and WebShell are inherently L4 (TCP) – a raw frame cannot carry an SSH session. Option A therefore disambiguates only the first requirement (the network data path / “reachable network stack”, which is also what the billable gate checks). It does not claim that raw frames satisfy the SSH/WebShell terminal-access requirement. L4 network terminal access (SSH/WebShell) is deferred to Phase C and is tracked there; the operator access path demonstrated today is the serial-console shell (cloudboot access-path serial-console-shell marker), not a network terminal. Option A is thus a deliberate re-scoping of the milestone’s network-reachability gate down to the raw-frame data path, with L4 terminal access sequenced after the milestone – not a claim that the milestone delivers SSH/WebShell.

Rationale

The decisive principle for the data-path bar: the milestone’s automatically gated network proof is whatever the billable harness actually checks. The billable gate is make cloudboot-test (tools/cloudboot/run-test.sh). Reading that harness directly settles the ambiguity in the “reachable network stack” phrasing in one observation – it never checks an L4 socket round-trip. (The milestone’s separate SSH/WebShell terminal-access requirement is not harness-gated today and is handled under “Scope boundary” above: deferred to Phase C.)

What the cloudboot harness actually gates on

run-test.sh has exactly two success gates over kernel network behavior, and both are below the L4 layer:

  • Boot landmark. run-test.sh:BOOT_LANDMARK is the literal string “capos kernel starting”; main’s step 5 polls the serial port until that landmark appears (run-test.sh:main, the grep -q "${BOOT_LANDMARK}" poll loop). No TCP, no UDP, no handshake.
  • Provider-NIC proof (optional, raw-frame). Under --require-provider-nic-proof (run-test.sh:REQUIRE_NIC_PROOF), the run fails unless the serial output contains the run-test.sh:NIC_PROOF_MARKER line (cloudboot-evidence: provider-nic-bound <token>). The gate is pure marker presence (serial_marker_tokens "${NIC_PROOF_MARKER}" non-empty); it parses no socket state and performs no connect/send/recv against the instance.
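The two gates above can be sketched as a pure function over captured serial output. This is an illustrative Rust rendering only: the real gate is the shell harness tools/cloudboot/run-test.sh, and the function names, the token-extraction rule (first whitespace-delimited word after the marker), and the assumption that the NIC-proof gate is evaluated only after the boot landmark is seen are hypothetical. Only the two literal strings come from the text.

```rust
// Illustrative sketch, not the harness: names and parsing rules are assumptions.

/// Literal strings quoted in the gate description.
const BOOT_LANDMARK: &str = "capos kernel starting";
const NIC_PROOF_MARKER: &str = "cloudboot-evidence: provider-nic-bound";

/// Collect the token that follows `marker` on every serial line carrying it.
/// Assumption: the token is the first whitespace-delimited word after the marker.
fn serial_marker_tokens<'a>(serial: &'a str, marker: &str) -> Vec<&'a str> {
    serial
        .lines()
        .filter_map(|line| {
            let (_, rest) = line.split_once(marker)?;
            rest.split_whitespace().next()
        })
        .collect()
}

/// Pure marker-presence gate: no socket is opened and no packet is sent.
fn gate_passes(serial: &str, require_nic_proof: bool) -> bool {
    // Gate 1: the boot landmark must appear somewhere in the serial log.
    if !serial.contains(BOOT_LANDMARK) {
        return false;
    }
    // Gate 2 (optional): at least one marker line with a non-empty token.
    !require_nic_proof || !serial_marker_tokens(serial, NIC_PROOF_MARKER).is_empty()
}
```

Nothing in this shape takes an address, a port, or a socket handle, which is the point: the gate's only input is serial text.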

The provider-nic-bound marker is, by its own documented contract (tools/cloudboot/README.md, “Serial evidence-marker contract”), a raw-frame bind proof: the non-qemu kernel composes the DeviceMmio + DMAPool/DMABuffer + MSI-X Interrupt grant proofs over one virtio function, programs the MSI-X table entry, and tears down with stale-handle assertions. It explicitly does NOT write any virtio common-config register, does NOT activate the device, and emits a summary line recording device_autonomous_raise=not-attempted. There is no IP address, no socket, and no L4 protocol anywhere in the marker contract. The harness’s structured provider.json schema (tools/cloudboot/README.md, “provider.json schema”) likewise has no TCP/UDP/socket/L4 field – the network-facing fields are provider_nic_proof, enumerated_device_classes, enumerated_device_inventory, dma_pool_grant, interrupt_route_allocated, interrupt_route_delivered, and storage_bind_proof, all device/frame-level.

Choosing Option B would mean adopting a milestone acceptance bar (an L4 socket round-trip) that the billable gate does not enforce, and blocking the milestone on a large Phase C chain that the milestone’s own proof substrate never exercises. That is not an honest reading of the gate.

What the production polled path can and cannot reach today

Can reach (raw frame): kernel/src/cap/virtio_net_polled_provider.rs is the always-built (non-qemu) production provider. It exercises raw-frame DMABuffer movement over the live virtio function: the provider submits the brokered RX buffer and observes its completion by polling the used ring (InterruptCapVirtioNetPolledProvider::invoke_wait reads the latched PublishedRx used.idx/used[0] captured in attempt_rx_submit). The wait/ack path takes zero interrupts: no device_interrupt::wait_kernel_injected_dispatch and no inject_real_lapic_int_for_proof. The TX leg is a kernel-half SLIRP stimulus (a manager-owned broadcast-ARP frame authored on queue 1 to elicit the inbound reply; see the attempt_rx_submit “Stimulus” step), not a provider-submitted frame. One real device->host RX DMA of used_len=76 (an ethernet frame, ethertype 0x0806 ARP) has been observed this way. This is the ethernet-frame level: frames traverse the live function in both directions, with the provider owning the RX path.
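As a rough illustration of what "observing completion by polling the used ring" means, the sketch below spins on a simulated used-ring index and then classifies the received frame by ethertype. It is not the provider's code: the UsedRing struct, both function names, and the spin budget are hypothetical, the ring lives in ordinary memory rather than a DMA buffer, and the real invoke_wait reads a value latched during attempt_rx_submit rather than spinning like this.

```rust
use std::sync::atomic::{AtomicU16, AtomicU32, Ordering};

/// Simplified stand-in for a virtio used ring with a single slot.
/// Assumption: real code would read these fields from device-visible DMA memory.
struct UsedRing {
    idx: AtomicU16,  // advanced by the device once per completed buffer
    len0: AtomicU32, // used[0].len: number of bytes the device wrote
}

/// Poll until used.idx moves past `last_seen` or the spin budget runs out.
/// No interrupt is awaited at any point; returns the completed length if seen.
fn poll_rx_completion(ring: &UsedRing, last_seen: u16, max_spins: u32) -> Option<u32> {
    for _ in 0..max_spins {
        // Acquire pairs with the producer's Release store of idx.
        if ring.idx.load(Ordering::Acquire) != last_seen {
            return Some(ring.len0.load(Ordering::Relaxed));
        }
        std::hint::spin_loop();
    }
    None
}

/// True if the ethernet frame's ethertype field (bytes 12..14) is ARP (0x0806).
fn is_arp(frame: &[u8]) -> bool {
    frame.len() >= 14 && u16::from_be_bytes([frame[12], frame[13]]) == 0x0806
}
```

The bounded spin count is a design choice for the sketch: a polled path needs some budget or timeout so that a device that never completes the buffer produces a failure rather than a hang.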

Cannot reach (L4): there is no TCP/UDP socket layer in the production data path. The entire L4 surface is cfg(feature = "qemu")-gated and replaced in the cloud kernel by kernel/src/virtio_stub.rs, whose socket entry points all fail closed:

  • virtio_stub.rs:create_tcp_listener -> NetworkError::DeviceUnavailable
  • virtio_stub.rs:connect_tcp_ipv4 -> NetworkError::DeviceUnavailable
  • virtio_stub.rs:create_udp_socket -> NetworkError::DeviceUnavailable
  • virtio_stub.rs:send_tcp / recv_tcp -> NetworkError::InvalidSocket
  • virtio_stub.rs:accept_tcp -> NetworkError::InvalidListener
  • virtio_stub.rs:network_config -> all-zero addr/netmask/gateway
  • virtio_stub.rs:poll_scheduler -> no-op
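A minimal sketch of the fail-closed shape listed above follows. Only the function names, the error variants, and the all-zero network configuration come from the list; the parameter lists, the SocketId handle type, the NetworkConfig field types, and the return types are assumptions made for illustration and may not match kernel/src/virtio_stub.rs.

```rust
// Hypothetical signatures; only names and error variants are taken from the text.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NetworkError {
    DeviceUnavailable,
    InvalidSocket,
    InvalidListener,
}

/// Assumed layout: IPv4 octets for each field.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
struct NetworkConfig {
    addr: [u8; 4],
    netmask: [u8; 4],
    gateway: [u8; 4],
}

type SocketId = u32; // placeholder handle type

// Creation paths report that no device backs the L4 surface.
fn create_tcp_listener(_port: u16) -> Result<SocketId, NetworkError> {
    Err(NetworkError::DeviceUnavailable)
}
fn connect_tcp_ipv4(_addr: [u8; 4], _port: u16) -> Result<SocketId, NetworkError> {
    Err(NetworkError::DeviceUnavailable)
}
fn create_udp_socket(_port: u16) -> Result<SocketId, NetworkError> {
    Err(NetworkError::DeviceUnavailable)
}

// Since no socket can be created, every handle is invalid by construction.
fn send_tcp(_sock: SocketId, _data: &[u8]) -> Result<usize, NetworkError> {
    Err(NetworkError::InvalidSocket)
}
fn recv_tcp(_sock: SocketId, _buf: &mut [u8]) -> Result<usize, NetworkError> {
    Err(NetworkError::InvalidSocket)
}
fn accept_tcp(_listener: SocketId) -> Result<SocketId, NetworkError> {
    Err(NetworkError::InvalidListener)
}

/// All-zero addr/netmask/gateway.
fn network_config() -> NetworkConfig {
    NetworkConfig::default()
}

/// No-op: there is no stack to drive.
fn poll_scheduler() {}
```

Under these assumptions, a caller that ignores the creation error and tries to use a handle anyway still gets an error on every data-path call, so there is no input that yields a connection.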

The cap/network.rs TCP/UDP socket CapObject family (TcpListener/TcpSocket/UdpSocket, deferred accept/recv waiters, the socket-terminal handoff) is wired to crate::virtio::poll_scheduler, which in production resolves to the stub. In the cloud kernel, a userspace caller holding a socket cap therefore gets DeviceUnavailable/InvalidSocket, not a connection. The in-kernel smoltcp stack, TCP listeners, accepted-socket state, the cooked-mode line discipline, and the Telnet IAC filter live only in the cfg(feature = "qemu") kernel/src/virtio.rs build.

Why Option B is genuinely a separate, larger track

Option B is networking-proposal Part 3, Userspace Decomposition (Phase C). It relocates smoltcp and the cap/network.rs socket caps out of the cfg(feature = "qemu") kernel/src/virtio.rs into two userspace processes: a NIC-driver process (holding DeviceMmio/Interrupt/DMAPool) and a network-stack process (holding the Nic cap + Timer), with applications holding socket caps. Its declared exit criterion is “the kernel contains no smoltcp dependency and no virtio-net code on the hot path.” Its prerequisite table (networking-proposal “Phase C prerequisites”) requires production grantable DMAPool/DeviceMmio/Interrupt lifecycles, real provider-driver interrupt wait/ack/mask/unmask consumption, durable audit consumption, an IOMMU domain or explicit production bounce-buffer policy, and full driver ownership handoff. The proposal itself states that current DDF evidence is “narrower than these Phase C prerequisites.” This is a multi-slice chain, not a finishing touch on the milestone.

Sequencing it after the milestone is also consistent with the GCE polling-path decision already recorded in the backlog (2026-06-01): the production data path is polled, device-autonomous MSI-X is a parallel efficiency follow-up, and the milestone is deliberately decoupled from interrupt delivery. Raw-frame reachability is the layer that decision already commits to; L4 sits above it.

Consequence

  • Slices 1-4 (the real polled provider, its default-manifest graduation, the real provider-nic-bound source, and the polled-provider stale-authority teardown) plus slice 6 (the billable make cloudboot-test --require-provider-nic-proof boot) close the usable-cloud-instance milestone’s network data-path reachability bar – the requirement the billable gate actually checks.
  • The milestone’s separate SSH/WebShell / network terminal access requirement is not closed by these slices; it is L4 and is deferred to Phase C as future work. The access path demonstrated on the current cloud kernel is the serial-console shell, not a network terminal.
  • L4 sockets remain future work under networking-proposal Phase C, gated by the Phase C prerequisites, not by the data-path bar. No child task chain is created by this decision; Phase C is tracked where it already lives (the networking proposal and the DDF Task 5/6 prerequisites in docs/backlog/hardware-boot-storage.md).

2026-06-08 Follow-Up: Phase C Web UI Chain

The later Phase C serve-from-userspace proof does not reopen the 2026-06-02 raw-frame-vs-L4 decision above. That decision remains the historical scope record for the closed usable-cloud-instance raw-frame data-path bar. The selected milestone has since moved to GCE Self-Hosted Web UI, whose proof chain owns L4 and Web UI reachability through separate task records.

The relevant Phase C design home is Phase C Userspace NIC Driver Relocation. Its local 7c proof is now landed in cloud-prod-userspace-network-stack-smoltcp-local-proof: the non-qemu cloudboot manifest starts the userspace smoltcp network-stack process, serves a scoped TcpListenAuthority, and completes one local host-forwarded TCP request/response through served TcpListener/TcpSocket caps. That is local cloudboot L4 evidence, not private GCE reachability and not public operator ingress.

The current Web UI ladder is task-owned; its steps are tracked in those separate task records.

This follow-up changes documentation scope only. It does not change any remaining task status, selected milestone, cloud resource posture, public ingress authority, TLS custody, or production release authority.

Inputs weighed

  • tools/cloudboot/run-test.sh (BOOT_LANDMARK, NIC_PROOF_MARKER, REQUIRE_NIC_PROOF, main, PROVIDER_JSON_REQUIRED_KEYS) and tools/cloudboot/README.md (“Serial evidence-marker contract”, “provider.json schema”, “Gate semantics”) – the billable gate, and the single most decisive input.
  • kernel/src/virtio_stub.rs – the production L4 surface (all socket entry points fail closed).
  • kernel/src/cap/network.rs – the L4 socket CapObject contract, wired to the stubbed poll_scheduler in production.
  • kernel/src/cap/virtio_net_polled_provider.rs – the always-built raw-frame polled provider (real device->host RX DMA used_len=76, zero interrupts).
  • docs/proposals/networking-proposal.md, Part 3 (Phase C architecture, prerequisites, exit criteria) – the scope of Option B.
  • docs/backlog/hardware-boot-storage.md, “Cloud Device Tracks – Real GCE Polling Path (decoupled from MSI-X)” – the track this decision is slice 5 of.