GCE gVNIC (Google Virtual Ethernet)

This is a provenance map for gVNIC, the Google Virtual NIC presented to Compute Engine guests. It cites the public specification basis, summarizes only the wire-format subset a capOS driver would implement, and maps the device onto capOS’s userspace-driver hardware-authority gate. It is not a re-spec: where the behavior is defined in the upstream driver or the public docs, it links rather than transcribing register tables.

Maturity caveat. This page remains primarily a grounding map. capOS has landed live-GCE proofs that request the GVNIC image/instance posture, record the gVNIC PCI function (1ae0:0042) with BAR and MSI-X metadata, map BAR0 through DeviceMmio, use manager-owned DMA pages for the admin queue and descriptor buffer, and bring up one GQI/QPL TX/RX queue pair far enough to send one DHCP DISCOVER raw Ethernet frame and receive one inbound IPv4 frame before teardown. capOS also has a bounded, hardware-only typed Nic adaptation proof over that same queue path: its marker records Nic.transmit, Nic.receive, Nic.macAddress, and Nic.linkStatus semantics with inline frame transfer and no host-physical/IOVA export. capOS still has no reusable gVNIC provider service and no host conformance suite. QEMU has no gVNIC device model, so unlike the virtio-net path there is no local make run-* smoke that can exercise the device. Section 3 (capOS mapping) distinguishes the landed inventory, admin-queue, and raw-frame proofs and the typed Nic-adaptation proof from future productionization work. The bounded implementation lane that consumes this map is decomposed in Hardware, Boot, and Storage.

gVNIC is a separate GCE portability lane, not a blocker for the first public Web UI proof. GCE exposes a selectable VIRTIO_NET NIC type on supported first/second-generation machine families, and capOS already drives modern virtio-net (see virtio-net). A first public Web UI proof scoped to a virtio-compatible GCE machine type needs no gVNIC support. gVNIC matters because Google documents it as the Compute Engine NIC alternative to virtio, with third-generation-and-later machine series supporting only gVNIC for virtual network interfaces; it is the portability lane for those shapes, not a precondition for the virtio-net Web UI proof.

1. Spec basis

  • Device: Google Virtual NIC (gVNIC), the modern Compute Engine virtual network interface. Exposed to the guest as a PCI function with vendor 0x1ae0 (Google) and device 0x0042. The same vendor/device pair is recorded for the GCP NIC path in Cloud Deployment (“PCI Device IDs for Cloud Hardware”). The upstream Linux driver names the device family GVE (Google Virtual Ethernet).
  • Authoritative spec: gVNIC has no freely published register specification. The basis of record is the combination of:
  • Reference driver: the upstream GVE Linux driver (drivers/net/ethernet/google/gve/) is the behavior cross-check for the admin-queue handshake, queue creation, and the two descriptor formats.

2. Wire format (subset a capOS driver would implement)

The subset below is the slow-path bring-up plus one traffic-queue format a minimal capOS gVNIC driver would need. Exact register offsets, opcode numbers, and descriptor bit layouts are defined in the GVE headers cited above and are not transcribed here — this is a map, not a re-spec. Endianness is not uniform on this device: admin-queue messages and GQI descriptors are big-endian, while DQO descriptors are little-endian (per the GVE driver docs), so a capOS decoder/encoder must select endianness per structure.
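The per-structure byte-order rule can be sketched as below. The `Structure` and `WireOrder` enums and the two helper functions are hypothetical illustrations, not GVE or capOS identifiers; only the split between big-endian (admin queue, GQI) and little-endian (DQO) comes from the text above.

```rust
/// Byte order used by one on-wire structure.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum WireOrder {
    Big,
    Little,
}

/// Hypothetical structure kinds a capOS encoder/decoder might handle.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Structure {
    AdminQueueCommand,
    GqiDescriptor,
    DqoDescriptor,
}

impl Structure {
    /// Admin-queue messages and GQI descriptors are big-endian;
    /// DQO descriptors are little-endian.
    pub fn order(self) -> WireOrder {
        match self {
            Structure::AdminQueueCommand | Structure::GqiDescriptor => WireOrder::Big,
            Structure::DqoDescriptor => WireOrder::Little,
        }
    }
}

/// Encode a 32-bit field in the byte order of the given structure.
pub fn encode_u32(kind: Structure, value: u32) -> [u8; 4] {
    match kind.order() {
        WireOrder::Big => value.to_be_bytes(),
        WireOrder::Little => value.to_le_bytes(),
    }
}

/// Decode a 32-bit field read back from the given structure.
pub fn decode_u32(kind: Structure, bytes: [u8; 4]) -> u32 {
    match kind.order() {
        WireOrder::Big => u32::from_be_bytes(bytes),
        WireOrder::Little => u32::from_le_bytes(bytes),
    }
}
```

Keying the byte order to the structure kind, rather than to a single device-wide flag, keeps a DQO-negotiated device from accidentally encoding its admin-queue commands little-endian.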

  • Registers / BARs: three 32-bit memory BARs.
    • BAR0 — device configuration and status registers (the gve_register.h block): GVE_DEVICE_STATUS / driver-status handshake, max TX/RX queue counts, the admin-queue PFN and doorbell, the admin-queue event counter, and the reset trigger.
    • BAR1 — the MSI-X vector table.
    • BAR2 — the IRQ doorbells plus the per-queue RX and TX doorbells.
  • Admin queue (AQ): a single page-sized command array. The driver writes a command into a free slot, advances its submission counter, rings the admin-queue doorbell in BAR0, and polls the admin-queue event counter until the device marks the command executed and writes back its status. The gve_adminq.h opcode space covers device description and resource lifecycle (describe device, configure/deconfigure device resources, register/unregister page list, create/destroy TX queue, create/destroy RX queue, and feature/option negotiation). The landed capOS proofs register the AQ page, issue DESCRIBE_DEVICE, parse the returned descriptor and GQI/QPL option, configure device resources with two notification blocks, register TX/RX queue page lists, create one TX and one RX queue, then destroy/unregister/deconfigure and release the admin queue before emitting evidence.
  • Interrupt classes: MSI-X only, in two roles.
    • A management interrupt that tells the driver to re-examine GVE_DEVICE_STATUS (link / device-state changes).
    • Notification-block interrupts, one block servicing a set of traffic queues; a block firing tells the driver to poll the associated queues. The notification blocks are the per-queue completion-signal path.
  • Queue formats (GQI vs DQO): gVNIC defines two mutually incompatible descriptor formats; a device instance negotiates one.
    • GQI (“Google Queue Interface”): fixed-size, power-of-two descriptor rings; the classic format. Big-endian descriptors.
    • DQO (“Descriptor Queue, Out-of-order”): split descriptor and completion queues with per-completion generation bits for ownership tracking and 16-bit tags identifying which posted buffer a completion refers to, allowing out-of-order completion. Little-endian descriptors. DQO is the format the newer machine families use.
  • Addressing modes (QPL vs RDA): independent of the descriptor format, each queue uses one of two buffer-addressing modes.
    • QPL (“queue page list”): the driver pre-registers a fixed set of guest pages with the device through the admin queue, and descriptors reference offsets into that registered page list rather than arbitrary guest physical addresses. The device only ever DMAs into pages the driver explicitly registered.
    • RDA (“raw DMA addressing”): descriptors carry guest DMA addresses directly, so the device can DMA to dynamically allocated guest memory.
  • Descriptor / ring ownership: the driver owns descriptor production and doorbell rings; the device owns completions. In GQI the device advances a completion/used position the driver reads; in DQO the device writes completion entries carrying a generation bit whose value alternates on each pass around the ring, so the driver detects a newly written completion by comparing that bit against its current expected generation, without a separate tail register.
  • Reset / link-up sequence: bring-up drives the BAR0 device-status / driver-status handshake, sets up the admin queue (legacy revision: program the AQ PFN; newer revisions: program AQ length/base and set driver-status RUN), issues the admin commands above to describe the device and create queues, and arms the notification-block interrupts. Teardown follows the upstream driver: legacy revision writes 0x0 to the AQ PFN and waits for it to read back zero; newer revisions write driver-status RESET and wait for DEVICE_IS_RESET.
  • Known unsupported / out-of-scope features: offloads (checksum, TSO/LRO, RSS hashing), jumbo frames, multi-queue scaling beyond a single TX/RX pair, and the RDA addressing mode are out of scope for an initial bring-up. The first capOS lane targets QPL addressing with one TX and one RX queue (see §3).
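The admin-queue submit-and-poll loop described above can be sketched as follows. `Command`, `AqDevice`, `AdminQueue`, `FakeDevice`, and the status values are hypothetical; the real command layout, opcodes, and status codes come from gve_adminq.h. The `page` argument to `ring_doorbell` stands in for the device's DMA view of the registered AQ page, which lets a host-side fake play the device.

```rust
/// One admin-queue slot: the opcode the driver writes and the status
/// word the device writes back once the command has executed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Command {
    pub opcode: u32,
    pub status: u32,
}

/// Hypothetical status values (the real codes live in gve_adminq.h).
pub const STATUS_UNSET: u32 = 0;
pub const STATUS_PASSED: u32 = 1;

#[derive(Debug, PartialEq, Eq)]
pub enum AqError {
    /// The event counter never caught up within the poll budget.
    Timeout,
    /// The device executed the command but reported this status.
    Failed(u32),
}

/// Hypothetical device interface: a BAR0 doorbell write and an
/// event-counter read. `page` models the device's DMA access to the AQ page.
pub trait AqDevice {
    fn ring_doorbell(&mut self, prod_cnt: u32, page: &mut [Command]);
    fn event_counter(&mut self) -> u32;
}

pub struct AdminQueue {
    page: Vec<Command>,
    prod_cnt: u32,
}

impl AdminQueue {
    pub fn new(slots: usize) -> Self {
        assert!(slots > 0, "admin queue needs at least one slot");
        let empty = Command { opcode: 0, status: STATUS_UNSET };
        AdminQueue { page: vec![empty; slots], prod_cnt: 0 }
    }

    /// Write `opcode` into the next free slot, advance the submission
    /// counter, ring the doorbell, then poll the event counter at most
    /// `budget` times before giving up.
    pub fn execute<D: AqDevice>(&mut self, dev: &mut D, opcode: u32, budget: u32) -> Result<(), AqError> {
        let slot = self.prod_cnt as usize % self.page.len();
        self.page[slot] = Command { opcode, status: STATUS_UNSET };
        self.prod_cnt = self.prod_cnt.wrapping_add(1);
        dev.ring_doorbell(self.prod_cnt, &mut self.page);
        for _ in 0..budget {
            if dev.event_counter() == self.prod_cnt {
                return match self.page[slot].status {
                    STATUS_PASSED => Ok(()),
                    other => Err(AqError::Failed(other)),
                };
            }
        }
        Err(AqError::Timeout)
    }
}

/// Host-side fake: executes every submitted command on the doorbell,
/// failing opcodes listed in `reject`; `stalled` models a hung device.
pub struct FakeDevice {
    pub events: u32,
    pub reject: Vec<u32>,
    pub stalled: bool,
}

impl AqDevice for FakeDevice {
    fn ring_doorbell(&mut self, prod_cnt: u32, page: &mut [Command]) {
        if self.stalled {
            return;
        }
        while self.events != prod_cnt {
            let idx = self.events as usize % page.len();
            let cmd = &mut page[idx];
            cmd.status = if self.reject.contains(&cmd.opcode) { 0xff } else { STATUS_PASSED };
            self.events = self.events.wrapping_add(1);
        }
    }

    fn event_counter(&mut self) -> u32 {
        self.events
    }
}
```

A `Timeout` here means the device may still own the AQ page, so a caller following the fail-closed rule in section 3 would keep that page live rather than free it.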
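The QPL rule that descriptors name offsets into a registered page list, never raw guest addresses, can be sketched as a bounds-checked translation. `QueuePageList` and `Segment` are hypothetical, and the sketch assumes the device treats the registered pages as one contiguous offset space of `page_count * page_size` bytes; that assumption should be checked against the GVE headers for each queue type.

```rust
/// One contiguous piece of a buffer inside a single registered page.
#[derive(Debug, PartialEq, Eq)]
pub struct Segment {
    pub page_index: usize,
    pub page_offset: usize,
    pub len: usize,
}

/// Hypothetical driver-side view of a registered queue page list. It holds
/// no addresses: descriptors may only name byte offsets into the list.
pub struct QueuePageList {
    page_count: usize,
    page_size: usize,
}

impl QueuePageList {
    pub fn new(page_count: usize, page_size: usize) -> Self {
        assert!(page_count > 0 && page_size > 0);
        QueuePageList { page_count, page_size }
    }

    /// Split a descriptor's (offset, len) into per-page segments, or return
    /// None if the buffer is empty or any byte falls outside the registered pages.
    pub fn segments(&self, offset: usize, len: usize) -> Option<Vec<Segment>> {
        let total = self.page_count.checked_mul(self.page_size)?;
        let end = offset.checked_add(len)?;
        if len == 0 || end > total {
            return None;
        }
        let mut out = Vec::new();
        let mut pos = offset;
        while pos < end {
            let page_offset = pos % self.page_size;
            let chunk = (self.page_size - page_offset).min(end - pos);
            out.push(Segment { page_index: pos / self.page_size, page_offset, len: chunk });
            pos += chunk;
        }
        Some(out)
    }
}
```

Because the list carries only a page count and size, a malformed or hostile offset can at worst be rejected; it can never be turned into a DMA target outside the pages the driver registered.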
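The DQO generation-bit handshake can be sketched as a driver-side poll loop. `Completion` and `CompletionRing` are hypothetical, and the polarity is an assumption to verify against the GVE DQO code: the ring starts zeroed, the driver's expected generation starts at 0, the device writes generation 1 on its first pass and alternates on each later pass, and an entry is new when its bit differs from the driver's current generation. The `tag` is what permits out-of-order completion: it names the posted buffer rather than the ring position.

```rust
/// One DQO-style completion entry (fields are illustrative, not the GVE layout).
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Completion {
    pub generation: bool,
    pub tag: u16,
}

pub struct CompletionRing {
    entries: Vec<Completion>,
    head: usize,
    /// An entry whose generation equals this value has not been
    /// rewritten by the device on the current pass.
    cur_gen: bool,
}

impl CompletionRing {
    /// A zeroed ring: every entry starts with generation 0 (false).
    pub fn new(len: usize) -> Self {
        assert!(len > 0);
        CompletionRing { entries: vec![Completion::default(); len], head: 0, cur_gen: false }
    }

    /// Simulation only: the device writes a completion during pass `lap`
    /// (lap 0 writes generation 1, lap 1 writes 0, and so on).
    pub fn device_write(&mut self, index: usize, lap: u64, tag: u16) {
        self.entries[index] = Completion { generation: lap % 2 == 0, tag };
    }

    /// Driver side: return the tag of the next new completion, if any.
    pub fn poll(&mut self) -> Option<u16> {
        let entry = self.entries[self.head];
        if entry.generation == self.cur_gen {
            return None; // still holds data from the previous pass
        }
        self.head += 1;
        if self.head == self.entries.len() {
            // Wrapping around: the bit written on the pass just finished
            // now marks entries the device has not rewritten yet.
            self.head = 0;
            self.cur_gen = !self.cur_gen;
        }
        Some(entry.tag)
    }
}
```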
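The revision-dependent admin-queue release in the reset bullet above can be sketched as a bounded wait. The `Bar0` trait, its method names, and `FakeBar0` are hypothetical stand-ins for the BAR0 register accesses named in the text; the poll budget turns a hung device into a `Timeout` rather than an unbounded spin.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AqRevision {
    /// Admin queue configured by programming the AQ PFN.
    Legacy,
    /// Admin queue configured by AQ length/base plus driver-status RUN.
    Modern,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReleaseError {
    Timeout,
}

/// Hypothetical BAR0 accessors covering both release paths.
pub trait Bar0 {
    fn write_aq_pfn(&mut self, pfn: u32);
    fn read_aq_pfn(&mut self) -> u32;
    fn write_driver_status_reset(&mut self);
    fn device_is_reset(&mut self) -> bool;
}

/// Legacy: write 0 to the AQ PFN and wait for it to read back zero.
/// Modern: write driver-status RESET and wait for DEVICE_IS_RESET.
pub fn release_admin_queue<B: Bar0>(bar0: &mut B, rev: AqRevision, budget: u32) -> Result<(), ReleaseError> {
    match rev {
        AqRevision::Legacy => {
            bar0.write_aq_pfn(0);
            for _ in 0..budget {
                if bar0.read_aq_pfn() == 0 {
                    return Ok(());
                }
            }
        }
        AqRevision::Modern => {
            bar0.write_driver_status_reset();
            for _ in 0..budget {
                if bar0.device_is_reset() {
                    return Ok(());
                }
            }
        }
    }
    Err(ReleaseError::Timeout)
}

/// Host-side fake: acknowledges a release request only after `lag`
/// further polls; `hung` means it never acknowledges.
pub struct FakeBar0 {
    pfn: u32,
    reset: bool,
    requested: bool,
    lag: u32,
    hung: bool,
}

impl FakeBar0 {
    pub fn new(pfn: u32, lag: u32, hung: bool) -> Self {
        FakeBar0 { pfn, reset: false, requested: false, lag, hung }
    }

    /// Advance the fake device by one poll.
    fn step(&mut self) {
        if !self.requested || self.hung {
            return;
        }
        if self.lag > 0 {
            self.lag -= 1;
        } else {
            self.pfn = 0;
            self.reset = true;
        }
    }
}

impl Bar0 for FakeBar0 {
    fn write_aq_pfn(&mut self, pfn: u32) {
        // A zero write is the legacy release request; the register keeps
        // reading back the old PFN until the device acknowledges.
        if pfn == 0 { self.requested = true; } else { self.pfn = pfn; }
    }
    fn read_aq_pfn(&mut self) -> u32 { self.step(); self.pfn }
    fn write_driver_status_reset(&mut self) { self.requested = true; }
    fn device_is_reset(&mut self) -> bool { self.step(); self.reset }
}
```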

3. capOS mapping

gVNIC is a vendor-custom cloud NIC. capOS now exercises inventory, admin-queue/register, bounded raw-frame GQI/QPL TX/RX, and a bounded typed Nic-adaptation proof in private GCE runs. Productionization remains future work: there is no reusable gVNIC provider service, local device model, DQO/RDA support, or host conformance suite yet.

  • Authority gate: the gVNIC PCI function is inventoried over the production PCI enumeration source. The admin-queue proof binds BAR0 and a manager-owned DMA pool for one DESCRIBE_DEVICE command (kernel/src/cap/gvnic_adminq_register_proof.rs). The raw-frame proof (kernel/src/cap/gvnic_raw_frame_proof.rs) then uses the same device-manager authority model to configure one GQI/QPL TX/RX queue pair, transmit one DHCP DISCOVER, poll a bounded RX descriptor completion, and tear the queues down. The cloud_gce_gvnic_nic_cap_adaptation_proof build reuses that module’s report_nic_cap_adaptation path to prove the existing Nic ABI semantics over the same GQI/QPL data path: the marker records inline-frame Nic.transmit / Nic.receive, Nic.macAddress, and Nic.linkStatus evidence without exposing queue addresses or emitting the broader provider bind claim. Both proofs use kernel/src/pci.rs find_driver_bind_device for resolved-source driver enumeration and kernel/src/device_manager/stub.rs devicemmio_kernel_window_for_proof for the live BAR0 DeviceMmio window. They do not issue a reusable userspace gVNIC provider service and do not claim provider-nic-bound.
  • DeviceMmio: the landed proof stages BAR0 as a device-manager DeviceMmio record, bounds all big-endian register accesses to the staged window, rings the admin-queue doorbell, and detaches the record with a stale-handle assertion. The raw-frame proof also maps a bounded 64 KiB BAR2 kernel-only doorbell window and validates returned TX/RX doorbell indexes before ringing them. BAR1 MSI-X remains unprogrammed in this polling proof.
  • Interrupt: the management interrupt and each notification-block vector would each bind one Interrupt cap over an MSI-X table entry, with the same mask-first / deferred-LAPIC-EOI lifecycle the landed production interrupt path uses (kernel/src/device_interrupt.rs, exercised by the virtio-net userspace IRQ-ownership slice). gVNIC uses MSI-X exclusively — there is no legacy-IRQ fallback. The admin-queue proof does not program MSI-X.
  • DMAPool / DMABuffer: the admin-queue pages come from the manager-owned bounce-buffer pool through stage_bounce_buffer_dmapool_record and issue_manager_attached_dmabuffer_handle_with_request. The raw-frame proof keeps larger queue resources and QPL pages manager/proof-owned, publishes device-visible addresses only internally to the hardware, and never grants userspace a DMABuffer cap or raw host-physical/IOVA value. It asserts DmaBufferCap::info_for_handle reports host_physical_user_visible=0, device_iova=0, and iova_export=disabled-future-only. Teardown destroys queues, unregisters both QPLs, deconfigures device resources, releases/resets the admin queue, scrubs/frees traffic frames, requires scrub/ledger removal/frame-free labels for manager buffers, and checks stale pool/buffer/MMIO handles. Future reusable gVNIC provider integration must use the same selected DMA backend model documented in DMA Isolation.
  • Fail-closed / validation rules: the admin-queue proof emits cloudboot-evidence: gvnic-adminq-register <token>, and the raw-frame proof emits cloudboot-evidence: gvnic-raw-frame-tx-rx <token>, only after the bounded command/traffic sequence passes, the release/reset handshake completes, the PCI command register is restored, and stale DeviceMmio/DMAPool/DMABuffer handles all fail closed. The typed adaptation proof emits cloudboot-evidence: gvnic-nic-cap-adaptation <token> only after the same teardown and stale-handle checks plus Nic-semantic TX/RX evidence. If queue or admin-queue release times out, the proof intentionally leaves still-owned DMA pages live and emits no success marker rather than freeing memory the device may still own.
  • QEMU-emulable vs hardware-only: no part of gVNIC is QEMU-emulable, because QEMU has no gVNIC/GVE device model. Every bind step is therefore hardware-only and requires a private, explicitly billable GCE instance launched with the GVNIC guest-OS feature and nic-type=GVNIC. The lane is gated into four landed proofs: inventory (cloud-gce-gvnic-image-launch-inventory-proof), admin-queue/register (cloud-gce-gvnic-adminq-register-proof), bounded raw-frame TX/RX (cloud-gce-gvnic-raw-frame-tx-rx-proof), and typed Nic adaptation (cloud-gce-gvnic-nic-cap-adaptation-proof). Each is decomposed in Hardware, Boot, and Storage and needs its own private GCE run to produce hardware evidence.
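The bounded-window and stale-handle rules for DeviceMmio can be sketched on the host with a byte vector standing in for BAR0. `MmioRegistry`, `Handle`, and `MmioError` are hypothetical and do not mirror the capOS device-manager API; a real window would use volatile accesses (for example `core::ptr::read_volatile`) on the mapped BAR rather than a `Vec`.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum MmioError {
    OutOfBounds,
    Misaligned,
    StaleHandle,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Handle(u64);

/// Hypothetical registry of staged windows; bytes simulate device registers.
pub struct MmioRegistry {
    windows: HashMap<u64, Vec<u8>>,
    next: u64,
}

impl MmioRegistry {
    pub fn new() -> Self {
        MmioRegistry { windows: HashMap::new(), next: 1 }
    }

    /// Stage a window of `len` bytes and return a handle to it.
    pub fn stage(&mut self, len: usize) -> Handle {
        let id = self.next;
        self.next += 1;
        self.windows.insert(id, vec![0; len]);
        Handle(id)
    }

    /// Detach a window; later accesses through the handle fail closed.
    pub fn detach(&mut self, h: Handle) -> bool {
        self.windows.remove(&h.0).is_some()
    }

    /// Resolve a 4-byte, 4-aligned register inside the staged window.
    fn register(&mut self, h: Handle, offset: usize) -> Result<&mut [u8], MmioError> {
        let window = self.windows.get_mut(&h.0).ok_or(MmioError::StaleHandle)?;
        if offset % 4 != 0 {
            return Err(MmioError::Misaligned);
        }
        let end = offset.checked_add(4).ok_or(MmioError::OutOfBounds)?;
        window.get_mut(offset..end).ok_or(MmioError::OutOfBounds)
    }

    /// Big-endian 32-bit register write, bounded to the window.
    pub fn write_be32(&mut self, h: Handle, offset: usize, value: u32) -> Result<(), MmioError> {
        self.register(h, offset)?.copy_from_slice(&value.to_be_bytes());
        Ok(())
    }

    /// Big-endian 32-bit register read, bounded to the window.
    pub fn read_be32(&mut self, h: Handle, offset: usize) -> Result<u32, MmioError> {
        let b = self.register(h, offset)?;
        Ok(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
    }
}
```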
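The fail-closed marker rule can be sketched as a checklist that yields an evidence line only when every check passed, and that refuses to free DMA pages unless the release/reset handshake completed. `ProofChecks` and `PageDisposition` are hypothetical names; only the `cloudboot-evidence: <name> <token>` marker shape comes from the text above.

```rust
/// Hypothetical per-run checklist for one hardware proof.
#[derive(Clone, Copy, Debug, Default)]
pub struct ProofChecks {
    pub sequence_passed: bool,
    pub release_reset_completed: bool,
    pub pci_command_restored: bool,
    pub stale_mmio_rejected: bool,
    pub stale_pool_rejected: bool,
    pub stale_buffer_rejected: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PageDisposition {
    /// The device released its queues, so pages may be scrubbed and freed.
    Free,
    /// Release timed out; the device may still own the pages.
    KeepLive,
}

impl ProofChecks {
    pub fn all_passed(&self) -> bool {
        self.sequence_passed
            && self.release_reset_completed
            && self.pci_command_restored
            && self.stale_mmio_rejected
            && self.stale_pool_rejected
            && self.stale_buffer_rejected
    }

    /// Success marker, or None if any check failed (no partial credit).
    pub fn marker(&self, proof: &str, token: &str) -> Option<String> {
        if self.all_passed() {
            Some(format!("cloudboot-evidence: {proof} {token}"))
        } else {
            None
        }
    }

    /// Freeing depends only on whether the device acknowledged release.
    pub fn page_disposition(&self) -> PageDisposition {
        if self.release_reset_completed {
            PageDisposition::Free
        } else {
            PageDisposition::KeepLive
        }
    }
}
```

In this sketch, freeing is gated on the release handshake alone: a failed stale-handle check only withholds the marker, while an unacknowledged release means the device might still DMA into those pages.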