# GCE gVNIC (Google Virtual Ethernet)

This is a provenance map for **gVNIC**, the Google Virtual NIC presented to
Compute Engine guests. It cites the public specification basis, summarizes only
the wire-format subset a capOS driver would implement, and maps the device onto
capOS's userspace-driver hardware-authority gate. It is not a re-spec: where the
behavior is defined in the upstream driver or the public docs, it links rather
than transcribing register tables.

**Maturity caveat.** This page remains primarily a **grounding map**. capOS has
landed live-GCE proofs that request the `GVNIC` image/instance posture, record
the gVNIC PCI function (`1ae0:0042`) with BAR and MSI-X metadata, map BAR0
through `DeviceMmio`, use manager-owned DMA pages for the admin queue and
descriptor buffer, and bring up one GQI/QPL TX/RX queue pair far enough to send
one DHCP DISCOVER raw Ethernet frame and receive one inbound IPv4 frame before
teardown. capOS also has a bounded hardware-only typed `Nic` adaptation proof
over that same queue path: the proof marker records `Nic.transmit`,
`Nic.receive`, `Nic.macAddress`, and `Nic.linkStatus` semantics with inline
frame transfer and no host-physical/IOVA export. capOS still has **no reusable
gVNIC provider service and no host conformance suite**. There is no gVNIC device
model in QEMU, so unlike the virtio-net path there is no local `make run-*`
smoke that can execute the device. Section 3 (capOS mapping) separates the
landed inventory, admin-queue, raw-frame, and typed `Nic`-adaptation proofs
from future productionization work. The bounded
implementation lane that consumes this map is decomposed in
[`docs/backlog/hardware-boot-storage.md`](../backlog/hardware-boot-storage.md).

**gVNIC is a separate GCE portability lane, not a blocker for the first public
Web UI proof.** GCE exposes a selectable `VIRTIO_NET` NIC type on supported
first/second-generation machine families, and capOS already drives modern
virtio-net (see [`virtio-net.md`](virtio-net.md)). A first public Web UI proof
scoped to a virtio-compatible GCE machine type needs no gVNIC support. gVNIC
matters because Google documents it as the Compute Engine NIC alternative to
virtio, with third-generation-and-later machine series supporting **only** gVNIC
for virtual network interfaces; it is the portability lane for those shapes, not
a precondition for the virtio-net Web UI proof.

## 1. Spec basis

- **Device**: Google Virtual NIC (gVNIC), the modern Compute Engine virtual
  network interface. Exposed to the guest as a PCI function with vendor
  `0x1ae0` (Google) and device `0x0042`. The same vendor/device pair is recorded
  for the GCP NIC path in
  [`docs/proposals/cloud-deployment-proposal.md`](../proposals/cloud-deployment-proposal.md)
  ("PCI Device IDs for Cloud Hardware"). The upstream Linux driver names the
  device family **GVE** (Google Virtual Ethernet).
- **Authoritative spec**: gVNIC has no freely published register specification.
  The basis of record is the combination of:
  - Google Cloud's **"Using Google Virtual NIC"** Compute Engine documentation,
    which defines the supported machine families, the `GVNIC` guest-OS image
    feature, the `nic-type=GVNIC` instance network-interface selection, and the
    virtio-net-versus-gVNIC machine-family matrix
    (<https://cloud.google.com/compute/docs/networking/using-gvnic>).
  - The **Google Compute Virtual Ethernet (GVE) Linux driver**, whose headers
    are the documented wire contract: the device register block
    (`gve_register.h`), the admin-queue command space (`gve_adminq.h`), and the
    GQI / DQO descriptor formats (`gve_desc.h`, `gve_desc_dqo.h`). Source:
    <https://github.com/torvalds/linux/tree/master/drivers/net/ethernet/google/gve>.
  - The **Linux GVE device-driver documentation**, which is the closest thing to
    a published interface description: BAR layout, admin queue, interrupt
    classes, the GQI/DQO queue formats, QPL/RDA addressing, and the reset
    handshake (<https://docs.kernel.org/networking/device_drivers/ethernet/google/gve.html>).
- **Reference driver**: the upstream GVE Linux driver
  (`drivers/net/ethernet/google/gve/`) is the behavior cross-check for the
  admin-queue handshake, queue creation, and the two descriptor formats.

## 2. Wire format (subset a capOS driver would implement)

The subset below is the slow-path bring-up plus one traffic-queue format a
minimal capOS gVNIC driver would need. Exact register offsets, opcode numbers,
and descriptor bit layouts are defined in the GVE headers cited above and are
**not transcribed here** — this is a map, not a re-spec. Endianness is
**not uniform** on this device: admin-queue messages and **GQI** descriptors are
big-endian, while **DQO** descriptors are little-endian (per the GVE driver
docs), so a capOS decoder/encoder must select endianness per structure.
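
Because byte order belongs to the structure rather than to the device, a decoder can carry it as an explicit parameter. The Rust sketch below is a minimal illustration, assuming a hypothetical `WireOrder` tag that the caller picks per structure (big-endian for admin-queue messages and GQI descriptors, little-endian for DQO descriptors). It is not taken from the GVE headers or from capOS, and real field offsets must still come from `gve_adminq.h` and `gve_desc*.h`.

```rust
/// Byte order of one on-wire gVNIC structure (hypothetical tag for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum WireOrder {
    /// Admin-queue messages and GQI descriptors.
    Big,
    /// DQO descriptors.
    Little,
}

/// Decode a u32 field at byte `offset`, returning None if it would run past `buf`.
pub fn read_u32(buf: &[u8], offset: usize, order: WireOrder) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(buf.get(offset..end)?);
    Some(match order {
        WireOrder::Big => u32::from_be_bytes(bytes),
        WireOrder::Little => u32::from_le_bytes(bytes),
    })
}

/// Encode a u32 field at byte `offset`, returning None if it would run past `buf`.
pub fn write_u32(buf: &mut [u8], offset: usize, value: u32, order: WireOrder) -> Option<()> {
    let end = offset.checked_add(4)?;
    let bytes = match order {
        WireOrder::Big => value.to_be_bytes(),
        WireOrder::Little => value.to_le_bytes(),
    };
    buf.get_mut(offset..end)?.copy_from_slice(&bytes);
    Some(())
}
```

The same four bytes `00 00 01 02` decode to `0x102` as a big-endian GQI field but to `0x02010000` as a little-endian DQO field, which is exactly the bug a single global endianness switch would introduce.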

- **Registers / BARs**: three 32-bit memory BARs.
  - **BAR0** — device configuration and status registers (the `gve_register.h`
    block): `GVE_DEVICE_STATUS` / driver-status handshake, max TX/RX queue
    counts, the admin-queue PFN and doorbell, the admin-queue event counter, and
    the reset trigger.
  - **BAR1** — the MSI-X vector table.
  - **BAR2** — the IRQ doorbells plus the per-queue RX and TX doorbells.
- **Admin queue (AQ)**: a single page-sized command array. The driver writes a
  command into a free slot, advances its submission counter, rings the
  admin-queue doorbell in BAR0, and polls the admin-queue event counter until
  the device marks the command executed and writes back its status. The
  `gve_adminq.h` opcode space covers device description and resource lifecycle
  (describe device, configure/deconfigure device resources, register/unregister
  page list, create/destroy TX queue, create/destroy RX queue, and feature/option
  negotiation). The landed capOS proofs register the AQ page, issue
  `DESCRIBE_DEVICE`, parse the returned descriptor and GQI/QPL option, configure
  device resources with two notification blocks, register TX/RX queue page
  lists, create one TX and one RX queue, then destroy/unregister/deconfigure and
  release the admin queue before emitting evidence.
- **Interrupt classes**: MSI-X only, in two roles.
  - A **management interrupt** that tells the driver to re-examine
    `GVE_DEVICE_STATUS` (link / device-state changes).
  - **Notification-block interrupts**, one block servicing a set of traffic
    queues; a block firing tells the driver to poll the associated queues. The
    notification blocks are the per-queue completion-signal path.
- **Queue formats (GQI vs DQO)**: gVNIC defines two **mutually incompatible**
  descriptor formats; a device instance negotiates one.
  - **GQI** ("Google Queue Interface"): fixed-size, power-of-two descriptor
    rings; the classic format. Big-endian descriptors.
  - **DQO** ("Descriptor Queue, Out-of-order"): split descriptor and completion
    queues with per-completion **generation bits** for ownership tracking and
    16-bit tags identifying which posted buffer a completion refers to, allowing
    out-of-order completion. Little-endian descriptors. DQO is the format the
    newer machine families use.
- **Addressing modes (QPL vs RDA)**: independent of the descriptor format, each
  queue uses one of two buffer-addressing modes.
  - **QPL** ("queue page list"): the driver pre-registers a fixed set of guest
    pages with the device through the admin queue, and descriptors reference
    offsets into that registered page list rather than arbitrary guest physical
    addresses. The device only ever DMAs into pages the driver explicitly
    registered.
  - **RDA** ("raw DMA addressing"): descriptors carry guest DMA addresses
    directly, so the device can DMA to dynamically allocated guest memory.
- **Descriptor / ring ownership**: the driver owns descriptor production and
  doorbell rings; the device owns completions. In GQI the device advances a
  completion/used position the driver reads; in DQO the device writes completion
  entries carrying a generation bit, and the driver compares that bit against
  the value it expects for the current pass over the ring (the expectation flips
  each time the ring wraps), so it detects new completions without a separate
  tail register.
- **Reset / link-up sequence**: bring-up drives the BAR0 device-status /
  driver-status handshake, sets up the admin queue (legacy revision: program the
  AQ PFN; newer revisions: program AQ length/base and set driver-status RUN),
  issues the admin commands above to describe the device and create queues, and
  arms the notification-block interrupts. Teardown follows the upstream driver:
  legacy revision writes `0x0` to the AQ PFN and waits for it to read back zero;
  newer revisions write driver-status RESET and wait for `DEVICE_IS_RESET`.
- **Known unsupported / out-of-scope features**: offloads (checksum, TSO/LRO,
  RSS hashing), jumbo frames, multi-queue scaling beyond a single TX/RX pair,
  and the RDA addressing mode are out of scope for an initial bring-up. The
  first capOS lane targets QPL addressing with one TX and one RX queue (see §3).
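
The admin-queue handshake reduces to: publish a command, ring the doorbell with the new submission count, then poll the event counter until it catches up, with a hard bound so a wedged device fails closed. The sketch below assumes a hypothetical `AqRegisters` trait standing in for the two BAR0 registers (real offsets are in `gve_register.h`, and real accesses are big-endian MMIO through `DeviceMmio`). Reading the command's status word after completion is omitted, and `FakeDevice` is a test double, not a device model.

```rust
/// Hypothetical stand-in for the two BAR0 admin-queue registers.
pub trait AqRegisters {
    /// Write the driver's new submission count to the admin-queue doorbell.
    fn ring_doorbell(&mut self, submitted: u32);
    /// Read how many commands the device has finished executing.
    fn event_counter(&mut self) -> u32;
}

#[derive(Debug, PartialEq, Eq)]
pub enum AqError {
    /// The device did not catch up within the polling budget.
    Timeout { submitted: u32, completed: u32 },
}

pub struct AdminQueue {
    /// Commands handed to the device so far; wraps like the hardware counter.
    pub submitted: u32,
}

impl AdminQueue {
    /// Publish one already-written command slot, then poll until the device's
    /// event counter matches. `max_polls` bounds the wait so a wedged device
    /// yields an error instead of hanging the caller.
    pub fn submit_and_wait<R: AqRegisters>(&mut self, regs: &mut R, max_polls: u32) -> Result<(), AqError> {
        self.submitted = self.submitted.wrapping_add(1);
        regs.ring_doorbell(self.submitted);
        let mut completed = 0;
        for _ in 0..max_polls {
            completed = regs.event_counter();
            if completed == self.submitted {
                return Ok(());
            }
            std::hint::spin_loop();
        }
        Err(AqError::Timeout { submitted: self.submitted, completed })
    }
}

/// Test double: completes the pending command after `latency` extra polls,
/// or never when `latency` is None.
pub struct FakeDevice {
    pub doorbell: u32,
    pub completed: u32,
    pub latency: Option<u32>,
    pub countdown: u32,
}

impl AqRegisters for FakeDevice {
    fn ring_doorbell(&mut self, submitted: u32) {
        self.doorbell = submitted;
        self.countdown = self.latency.unwrap_or(0);
    }

    fn event_counter(&mut self) -> u32 {
        if self.latency.is_some() && self.completed != self.doorbell {
            if self.countdown == 0 {
                self.completed = self.doorbell;
            } else {
                self.countdown -= 1;
            }
        }
        self.completed
    }
}
```

On timeout the error carries both counters, which is the information a caller needs to decide that the command (and any pages it referenced) must be treated as still device-owned.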
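
The DQO generation-bit convention can be sketched as a driver-side cursor over the completion ring. This assumes a zero-initialized ring on which the device writes generation `1` during its first pass and inverts the bit on each later pass, so `0` means "not yet written" until the first wrap; `Completion` and `CompletionCursor` are hypothetical names with a simplified layout, and a real driver also needs a read barrier between checking the generation bit and reading the rest of the entry.

```rust
/// One decoded DQO completion entry (hypothetical, simplified layout).
#[derive(Clone, Copy, Debug, Default)]
pub struct Completion {
    /// Ownership/generation bit written by the device.
    pub generation: bool,
    /// Tag identifying which posted buffer this completion refers to.
    pub tag: u16,
}

/// Driver-side cursor over a device-written completion ring.
pub struct CompletionCursor {
    head: usize,
    /// Generation value meaning "not yet written on this pass". The ring starts
    /// zeroed, so `false` is stale on the first pass; it flips on every wrap.
    stale_gen: bool,
}

impl CompletionCursor {
    pub fn new() -> Self {
        Self { head: 0, stale_gen: false }
    }

    /// Consume every new completion and return the tags in ring order, which
    /// need not match the order the buffers were posted in.
    pub fn poll(&mut self, ring: &[Completion]) -> Vec<u16> {
        let mut tags = Vec::new();
        if ring.is_empty() {
            return tags;
        }
        loop {
            let entry = ring[self.head];
            if entry.generation == self.stale_gen {
                break; // the device has not written this slot yet
            }
            // A real driver needs a read barrier here before trusting the
            // rest of the entry; plain slice reads in this sketch need none.
            tags.push(entry.tag);
            self.head += 1;
            if self.head == ring.len() {
                self.head = 0;
                self.stale_gen = !self.stale_gen;
            }
        }
        tags
    }
}
```

No tail register is read: the cursor stops at the first slot whose generation still matches the stale value, and the flip on wrap lets the same slots be reused on the next pass.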
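
Under QPL addressing a descriptor names a byte offset into the registered page list, so the driver-side safety check is a bounds check against the registered size rather than a physical-address validation. The sketch below assumes 4 KiB pages and a hypothetical `pages_touched` helper that reports which registered pages a buffer spans; it does not reproduce the exact GQI descriptor encoding.

```rust
use std::ops::Range;

/// Assumption for this sketch: 4 KiB registered pages.
pub const PAGE_SIZE: u64 = 4096;

#[derive(Debug, PartialEq, Eq)]
pub enum QplError {
    /// Zero-length buffers are rejected rather than silently accepted.
    Empty,
    /// The buffer would reach past the pages registered for this queue.
    OutOfRange,
}

/// Given a queue page list of `registered_pages` pages, check that `len` bytes
/// at QPL byte `offset` stay inside it and return the page indexes the device
/// would touch. Nothing outside the registered list is addressable.
pub fn pages_touched(registered_pages: usize, offset: u64, len: u64) -> Result<Range<usize>, QplError> {
    if len == 0 {
        return Err(QplError::Empty);
    }
    let end = offset.checked_add(len).ok_or(QplError::OutOfRange)?;
    let limit = (registered_pages as u64)
        .checked_mul(PAGE_SIZE)
        .ok_or(QplError::OutOfRange)?;
    if end > limit {
        return Err(QplError::OutOfRange);
    }
    let first = (offset / PAGE_SIZE) as usize;
    let last = ((end - 1) / PAGE_SIZE) as usize;
    Ok(first..last + 1)
}
```

For example, a 1514-byte frame at offset 0 of a four-page list touches only page 0, while any buffer ending past byte 16384 is refused before a descriptor is ever written.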

## 3. capOS mapping

gVNIC is a vendor-custom cloud NIC. capOS now exercises inventory,
admin-queue/register, bounded raw-frame GQI/QPL TX/RX, and a bounded typed
`Nic`-adaptation proof in private GCE runs. Productionization remains future
work: there is no reusable gVNIC provider service, local device model, DQO/RDA
support, or host conformance suite yet.

- **Authority gate**: the gVNIC PCI function is inventoried over the production
  PCI enumeration source. The admin-queue proof binds BAR0 and a manager-owned
  DMA pool for one `DESCRIBE_DEVICE` command
  (`kernel/src/cap/gvnic_adminq_register_proof.rs`). The raw-frame proof
  (`kernel/src/cap/gvnic_raw_frame_proof.rs`) then uses the same device-manager
  authority model to configure one GQI/QPL TX/RX queue pair, transmit one DHCP
  DISCOVER, poll a bounded RX descriptor completion, and tear the queues down.
  The `cloud_gce_gvnic_nic_cap_adaptation_proof` build reuses that module's
  `report_nic_cap_adaptation` path to prove the existing `Nic` ABI semantics
  over the same GQI/QPL data path: the marker records inline-frame
  `Nic.transmit` / `Nic.receive`, `Nic.macAddress`, and `Nic.linkStatus`
  evidence without exposing queue addresses or emitting the broader provider
  bind claim.
  The admin-queue and raw-frame proof modules both use `kernel/src/pci.rs`
  `find_driver_bind_device` for
  resolved-source driver enumeration and `kernel/src/device_manager/stub.rs`
  `devicemmio_kernel_window_for_proof` for the live BAR0 `DeviceMmio` window.
  They do not issue a reusable userspace gVNIC provider service and do not claim
  `provider-nic-bound`.
- **`DeviceMmio`**: the landed proof stages BAR0 as a device-manager
  `DeviceMmio` record, bounds all big-endian register accesses to the staged
  window, rings the admin-queue doorbell, and detaches the record with a
  stale-handle assertion. The raw-frame proof also maps a bounded 64 KiB BAR2
  kernel-only doorbell window and validates returned TX/RX doorbell indexes
  before ringing them. BAR1 MSI-X remains unprogrammed in this polling proof.
- **`Interrupt`**: the management interrupt and each notification-block vector
  would each bind one `Interrupt` cap over an MSI-X table entry, with the
  same mask-first / deferred-LAPIC-EOI lifecycle the landed production interrupt
  path uses (`kernel/src/device_interrupt.rs`, exercised by the virtio-net
  userspace IRQ-ownership slice). gVNIC uses MSI-X exclusively; there is no
  legacy-IRQ fallback. None of the landed proofs programs MSI-X; they poll the
  admin-queue event counter and queue completions instead.
- **`DMAPool` / `DMABuffer`**: the admin-queue pages come from the
  manager-owned bounce-buffer pool through
  `stage_bounce_buffer_dmapool_record` and
  `issue_manager_attached_dmabuffer_handle_with_request`. The raw-frame proof
  keeps larger queue resources and QPL pages manager/proof-owned, publishes
  device-visible addresses only internally to the hardware, and never grants
  userspace a `DMABuffer` cap or raw host-physical/IOVA value. It asserts
  `DmaBufferCap::info_for_handle` reports `host_physical_user_visible=0`,
  `device_iova=0`, and `iova_export=disabled-future-only`. Teardown destroys
  queues, unregisters both QPLs, deconfigures device resources, releases/resets
  the admin queue, scrubs/frees traffic frames, requires scrub/ledger
  removal/frame-free labels for manager buffers, and checks stale
  pool/buffer/MMIO handles. Future reusable gVNIC provider integration must use
  the same selected DMA backend model documented in
  [`docs/dma-isolation-design.md`](../dma-isolation-design.md).
- **Fail-closed / validation rules**: the admin-queue and raw-frame proofs emit
  `cloudboot-evidence: gvnic-adminq-register <token>` and
  `cloudboot-evidence: gvnic-raw-frame-tx-rx <token>` respectively, only after the bounded
  command/traffic sequence passes, the release/reset handshake completes, the
  PCI command register is restored, and stale `DeviceMmio`/`DMAPool`/`DMABuffer`
  handles all fail closed. The typed adaptation proof emits
  `cloudboot-evidence: gvnic-nic-cap-adaptation <token>` only after the same
  teardown and stale-handle checks plus `Nic`-semantic TX/RX evidence. If queue
  or admin-queue release times out, the proof intentionally leaves still-owned
  DMA pages live and emits no success marker rather than freeing memory the
  device may still own.
- **QEMU-emulable vs hardware-only**: **none of gVNIC is QEMU-emulable**; QEMU
  has no gVNIC/GVE device model. Every bind step is therefore hardware-only and
  requires a private, explicitly billable GCE instance launched with the `GVNIC`
  guest-OS feature and `nic-type=GVNIC`. The lane is gated into four landed
  proofs: the inventory proof (`cloud-gce-gvnic-image-launch-inventory-proof`),
  the admin-queue/register proof (`cloud-gce-gvnic-adminq-register-proof`), the
  bounded raw-frame TX/RX proof (`cloud-gce-gvnic-raw-frame-tx-rx-proof`), and
  the typed `Nic` adaptation proof (`cloud-gce-gvnic-nic-cap-adaptation-proof`).
  Each is decomposed in
  [`docs/backlog/hardware-boot-storage.md`](../backlog/hardware-boot-storage.md).
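
The BAR2 doorbell-index check described under `DeviceMmio` can be sketched as a pure bounds computation. This assumes, from the upstream driver's indexing of its doorbell BAR as an array of 32-bit registers, that doorbell `i` sits at byte offset `4 * i`, and it uses the 64 KiB window size stated above. `doorbell_offset` is a hypothetical helper, not the capOS function.

```rust
/// Size of the kernel-only BAR2 doorbell window described above.
pub const DOORBELL_WINDOW_BYTES: usize = 64 * 1024;
/// Assumption: doorbell `i` is a 32-bit register at byte offset `4 * i`.
pub const DOORBELL_STRIDE: usize = 4;

/// Validate a device-returned doorbell index before any MMIO write, returning
/// the byte offset inside the mapped window or None if it falls outside.
pub fn doorbell_offset(index: u32) -> Option<usize> {
    let offset = (index as usize).checked_mul(DOORBELL_STRIDE)?;
    let end = offset.checked_add(DOORBELL_STRIDE)?;
    if end <= DOORBELL_WINDOW_BYTES {
        Some(offset)
    } else {
        None
    }
}
```

Treating the index as untrusted device output means a malformed queue-creation response produces a refused ring rather than a write outside the mapped window.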
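
The fail-closed rule amounts to an ordering constraint: a release timeout decides the fate of the DMA pages before any success marker is considered. A minimal sketch, with a hypothetical `TeardownReport` summary and `finish` function (not the capOS proof code), using the raw-frame marker string quoted above:

```rust
/// Hypothetical summary of the teardown steps described above.
pub struct TeardownReport {
    pub traffic_ok: bool,
    /// Destroy/unregister/deconfigure finished before its deadline.
    pub queues_released: bool,
    /// Admin-queue release/reset handshake finished before its deadline.
    pub adminq_released: bool,
    pub pci_command_restored: bool,
    pub stale_handles_rejected: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum DmaDisposition {
    /// The device has released the pages, so they may be scrubbed and freed.
    Freed,
    /// Release timed out: the device may still own the pages, so keep them.
    KeptLive,
}

/// Decide what happens to DMA pages and whether an evidence marker is emitted.
pub fn finish(report: &TeardownReport, token: &str) -> (DmaDisposition, Option<String>) {
    if !(report.queues_released && report.adminq_released) {
        // Never free memory the device may still DMA into; emit no marker.
        return (DmaDisposition::KeptLive, None);
    }
    let passed = report.traffic_ok && report.pci_command_restored && report.stale_handles_rejected;
    let marker = if passed {
        Some(format!("cloudboot-evidence: gvnic-raw-frame-tx-rx {}", token))
    } else {
        None
    };
    (DmaDisposition::Freed, marker)
}
```

Keeping the pages live on timeout trades a bounded leak for the guarantee that no freed frame can be overwritten by late device DMA.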
