# Azure MANA (Microsoft Azure Network Adapter)

This is a provenance map for the MANA / GDMA wire logic in
`capos-lib/src/mana.rs`: it cites the spec basis, summarizes only the
wire-format subset the code actually implements, and points into the
implementation by symbol name. It is not a re-spec.

**Maturity caveat.** This page documents *protocol encode/decode logic with a
host-side conformance suite*, not a bound driver. There is no MANA device in
QEMU, so this logic is a deliberate QEMU-exception gated by `cargo test-lib`
plus a warning-free `cargo build --features qemu`, not a `make run-*` smoke.
End-to-end MANA bind / send / receive / teardown on real Azure hardware --
including SR-IOV VF revocation with fallback-to-synthetic and DMA/MMIO/IRQ
teardown -- is future work (tracked as `cloud-azure-mana-nic-live-proof`),
blocked until Azure access is provisioned. The `## 3. capOS mapping` section
below therefore describes the *planned* binding, not landed authority.

## 1. Spec basis

- **Device**: Microsoft Azure Network Adapter (MANA), the modern Azure NIC for
  Dv5/Ev5 and later VM families. Exposed to the guest as a PCI SR-IOV Virtual
  Function. PCI vendor `0x1414` (Microsoft); device `0x00ba` (VF, the
  guest-bound function) / `0x00b9` (PF). IDs at `capos-lib/src/mana.rs`
  (`MANA_PCI_VENDOR_ID`, `MANA_VF_DEVICE_ID`, `MANA_PF_DEVICE_ID`). The device
  is fronted by GDMA (Generic DMA), Microsoft's queue/DMA abstraction; MANA is
  the network client riding on GDMA queues.
- **Authoritative spec**: MANA has no freely published register specification.
  The basis of record is the upstream open-source MANA Linux driver, whose "HW
  DATA" structures are the documented wire contract:
  - `include/net/mana/gdma.h` -- GDMA registers, doorbells, message headers,
    WQE/CQE/EQE, request-type space, device/queue enums.
  - `include/net/mana/mana.h` -- MANA TX/RX OOB descriptors, completion OOBs,
    `mana_cqe_type`, `mana_command_code`.
  - `include/net/mana/hw_channel.h`, `include/net/mana/shm_channel.h` -- the
    HWC management channel and the shared-memory bootstrap aperture.
  - Reference snapshot: `torvalds/linux` master at commit
    `d60ec36cab338dfe2ae40d73e9c8d6c4af70d2b8` (the `gdma.h` structures are
    stable across recent kernels).
- **Reference driver**: the same MANA Linux driver
  (`drivers/net/ethernet/microsoft/mana/`) is the behavior cross-check;
  `mana_gd_init_req_hdr` defines the standard request-header construction
  mirrored by `GdmaReqHdr::standard`.
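
The standard request-header construction can be sketched as follows. This is a minimal illustration, not the `capos-lib` code: the `MsgHdr` and `ReqHdr` types, their field layout, and the two constant values are assumptions recalled from `gdma.h` and should be checked against the pinned snapshot. The point it shows is that the request and response headers share the header type, message code, and version, and differ only in size.

```rust
// Hypothetical mirror types for illustration; not the capos-lib definitions.
// Constant values are as recalled from gdma.h; verify against the pinned commit.
pub const STANDARD_HEADER_TYPE: u32 = 0;
pub const MESSAGE_V1: u16 = 1;

/// Assumed `gdma_msg_hdr` layout: 16 bytes, naturally aligned, little-endian.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MsgHdr {
    pub hdr_type: u32,
    pub msg_type: u32,
    pub msg_version: u16,
    pub hwc_msg_id: u16,
    pub msg_size: u32,
}

impl MsgHdr {
    /// Serialize field by field so host endianness never leaks onto the wire.
    pub fn encode(&self) -> [u8; 16] {
        let mut out = [0u8; 16];
        out[0..4].copy_from_slice(&self.hdr_type.to_le_bytes());
        out[4..8].copy_from_slice(&self.msg_type.to_le_bytes());
        out[8..10].copy_from_slice(&self.msg_version.to_le_bytes());
        out[10..12].copy_from_slice(&self.hwc_msg_id.to_le_bytes());
        out[12..16].copy_from_slice(&self.msg_size.to_le_bytes());
        out
    }
}

/// Assumed `gdma_req_hdr` layout: request header, response header,
/// device-id word, activity id (40 bytes total).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ReqHdr {
    pub req: MsgHdr,
    pub resp: MsgHdr,
    pub dev_id: u32,
    pub activity_id: u32,
}

impl ReqHdr {
    /// Both headers carry the same type, code, and version; only sizes differ.
    /// `dev_id` and `activity_id` are left zero here for the caller to fill in.
    pub fn standard(code: u32, req_size: u32, resp_size: u32) -> Self {
        let hdr = |msg_size: u32| MsgHdr {
            hdr_type: STANDARD_HEADER_TYPE,
            msg_type: code,
            msg_version: MESSAGE_V1,
            hwc_msg_id: 0,
            msg_size,
        };
        ReqHdr { req: hdr(req_size), resp: hdr(resp_size), dev_id: 0, activity_id: 0 }
    }

    pub fn encode(&self) -> [u8; 40] {
        let mut out = [0u8; 40];
        out[0..16].copy_from_slice(&self.req.encode());
        out[16..32].copy_from_slice(&self.resp.encode());
        out[32..36].copy_from_slice(&self.dev_id.to_le_bytes());
        out[36..40].copy_from_slice(&self.activity_id.to_le_bytes());
        out
    }
}
```

A caller would build the header with a request code and the byte sizes of the request and response messages, then overwrite `dev_id` before sending. The code value passed in is whatever the request-type space defines; the sketch does not validate it.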

## 2. Wire format (implemented subset)

All multi-byte words are little-endian; GDMA "HW DATA" structures are naturally
aligned (not packed). Every decoder validates buffer length, rejects unknown
enum members, and enforces must-be-zero (MBZ) reserved fields; every encoder
range-checks its bitfields. Symbols below are in `capos-lib/src/mana.rs`.
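
The three decoder rules (length check, fail-closed enum, MBZ reserved bits) can be sketched on a single 32-bit word. The example below uses the EQE info word as its subject. The bit positions (`type` in bits 0-7, `client_id` in bits 16-17, `owner_bits` in bits 29-31, everything else reserved) and the two event-type values are assumptions recalled from `gdma.h`, and `EqeType`, `EqeInfo`, `DecodeError`, and `decode_eqe_info` are hypothetical names, not the `capos-lib` symbols.

```rust
// Illustrative sketch only; names and bit positions are assumptions.
#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    Truncated,
    UnknownType(u8),
    ReservedNotZero,
}

/// Deliberately small subset of event types; values recalled from gdma.h.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EqeType {
    Completion = 3,
    TestEvent = 64,
}

impl EqeType {
    /// Fail closed: any value not listed is an error, never a default.
    fn from_raw(raw: u8) -> Result<Self, DecodeError> {
        match raw {
            3 => Ok(EqeType::Completion),
            64 => Ok(EqeType::TestEvent),
            other => Err(DecodeError::UnknownType(other)),
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub struct EqeInfo {
    pub ty: EqeType,
    pub client_id: u8,
    pub owner_bits: u8,
}

// Assumed reserved ranges: bits 8..=15 and bits 18..=28.
const RESERVED_MASK: u32 = (0xff << 8) | (0x7ff << 18);

pub fn decode_eqe_info(buf: &[u8]) -> Result<EqeInfo, DecodeError> {
    // Rule 1: validate the buffer length before touching any byte.
    if buf.len() < 4 {
        return Err(DecodeError::Truncated);
    }
    let word = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]);
    // Rule 2: reserved bits must be zero.
    if word & RESERVED_MASK != 0 {
        return Err(DecodeError::ReservedNotZero);
    }
    // Rule 3: unknown enum members are rejected.
    let ty = EqeType::from_raw((word & 0xff) as u8)?;
    Ok(EqeInfo {
        ty,
        client_id: ((word >> 16) & 0x3) as u8,
        owner_bits: (word >> 29) as u8,
    })
}
```

Returning a distinct error per rule keeps conformance tests precise: a test can assert which rule rejected a given input rather than only that decoding failed.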

- **Registers / BAR**: single register BAR (BAR0). VF doorbell-page and
  shared-memory aperture offsets (`GDMA_REG_*`) and PF offsets
  (`GDMA_PF_REG_*`), the SR-IOV config base, and the fixed CQE/EQE/WQE-BU and
  max SQE/RQE sizes are in the `regs` module (`REG_DB_PAGE_OFFSET`,
  `REG_SHM_OFFSET`, `PF_REG_*`, `SRIOV_REG_CFG_BASE_OFF`, `CQE_SIZE`,
  `EQE_SIZE`, `MAX_SQE_SIZE`, `MAX_RQE_SIZE`, `WQE_BU_SIZE`).
- **Doorbells**: the four-variant `union gdma_doorbell_entry` is modeled by the
  `DoorbellEntry` enum (`Cq`/`Rq`/`Sq`/`Eq`). It encodes the queue id (24 bits
  for CQ/RQ/SQ, 16 bits for EQ), the tail pointer (31 bits for CQ/EQ, 32 bits
  for RQ/SQ), the RQ `wqe_cnt`, and the CQ/EQ `arm` bit, with kind-specific
  reserved MBZ enforcement on decode.
- **Admin (HWC) messages**: `GdmaMsgHdr` (`gdma_msg_hdr`), `GdmaDevId`
  (`gdma_dev_id`), `GdmaReqHdr` (`gdma_req_hdr`, with `standard` mirroring
  `mana_gd_init_req_hdr`), and `GdmaRespHdr` (`gdma_resp_hdr`, reserved-word
  MBZ). The request-type space is `GdmaRequestType` (`gdma_request_type`,
  fail-closed). GDMA admin status is decoded into the open `GdmaStatus` type
  (success, `MoreEntries`, `CmdUnsupported`, or a preserved `Other` value),
  because GDMA status is a firmware error space rather than a closed enum.
- **Work queue**: `GdmaSge` (`gdma_sge`, 16-byte SGE with 64-bit address) and
  `GdmaWqeHeader` (`gdma_wqe`, the 8-byte WQE header: `num_sge`,
  `inline_oob_size_div4`, `client_oob_in_sgl`, `client_data_unit`, with
  reserved MBZ). The MANA TX OOB descriptors that precede the SGL are
  `ManaTxShortOob` (`mana_tx_short_oob`, checksum-offload + completion-CQ +
  vSQ-frame selection) and `ManaTxLongOob` (`mana_tx_long_oob`, encapsulation /
  VLAN / inner-offset fields).
- **Completion / event**: `GdmaCqeInfo` (`gdma_cqe.cqe_info`: `wq_num`,
  `is_sq`, 3-bit `owner_bits`) and `GdmaEqeInfo` (`union gdma_eqe_info`: event
  `type` via `GdmaEqeType`, `client_id`, `owner_bits`). MANA completion OOBs:
  `ManaCqeHeader` (`mana_cqe_header`, `cqe_type` via the fail-closed
  `ManaCqeType` enum), `ManaRxcompOob` (`mana_rxcomp_oob`, RX flags +
  `MANA_RXCOMP_OOB_NUM_PPI` per-packet `ManaRxcompPerpktInfo` + RX WQE offset),
  and `ManaTxCompOob` (`mana_tx_comp_oob`, TX data/SGL/WQE offsets +
  reserved-padding MBZ).
- **Capability / feature negotiation**: the verify-version surface
  (`GdmaRequestType::VerifyVfDriverVersion`, `GDMA_PROTOCOL_V1`, `GdmaOsType`)
  and the MANA control command space `ManaCommandCode` (`mana_command_code`,
  fail-closed) including `QueryDevConfig` / `QueryVportConfig` /
  `ConfigVportTx`/`Rx` / `CreateWqObj`.
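
As an illustration of the encoder range checks and kind-specific MBZ decoding described for doorbells above, here is a minimal sketch of packing the four doorbell variants into a 64-bit value. The bit positions (queue id from bit 0, RQ `wqe_cnt` in bits 24-31, tail pointer from bit 32, `arm` in bit 63) are assumptions recalled from `union gdma_doorbell_entry`. `Doorbell`, `DoorbellError`, and `decode_cq` are hypothetical names, not the `DoorbellEntry` API.

```rust
// Illustrative sketch only; bit positions are assumptions to verify in gdma.h.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Doorbell {
    Cq { id: u32, tail_ptr: u32, arm: bool },
    Rq { id: u32, wqe_cnt: u8, tail_ptr: u32 },
    Sq { id: u32, tail_ptr: u32 },
    Eq { id: u16, tail_ptr: u32, arm: bool },
}

#[derive(Debug, PartialEq, Eq)]
pub enum DoorbellError {
    IdOutOfRange,
    TailOutOfRange,
    ReservedNotZero,
}

const ID24_MAX: u32 = (1 << 24) - 1;
const TAIL31_MAX: u32 = (1 << 31) - 1;

impl Doorbell {
    /// Range-check every bitfield before packing; never truncate silently.
    pub fn encode(self) -> Result<u64, DoorbellError> {
        let check = |id: u32, tail: u32, tail_max: u32| {
            if id > ID24_MAX {
                Err(DoorbellError::IdOutOfRange)
            } else if tail > tail_max {
                Err(DoorbellError::TailOutOfRange)
            } else {
                Ok(())
            }
        };
        match self {
            Doorbell::Cq { id, tail_ptr, arm } => {
                check(id, tail_ptr, TAIL31_MAX)?;
                Ok((id as u64) | ((tail_ptr as u64) << 32) | ((arm as u64) << 63))
            }
            Doorbell::Rq { id, wqe_cnt, tail_ptr } => {
                check(id, tail_ptr, u32::MAX)?;
                Ok((id as u64) | ((wqe_cnt as u64) << 24) | ((tail_ptr as u64) << 32))
            }
            Doorbell::Sq { id, tail_ptr } => {
                check(id, tail_ptr, u32::MAX)?;
                Ok((id as u64) | ((tail_ptr as u64) << 32))
            }
            Doorbell::Eq { id, tail_ptr, arm } => {
                // A u16 id cannot exceed the 16-bit field, so only the tail is at risk.
                check(id as u32, tail_ptr, TAIL31_MAX)?;
                Ok((id as u64) | ((tail_ptr as u64) << 32) | ((arm as u64) << 63))
            }
        }
    }
}

/// Decode a raw value as a CQ doorbell; bits 24..=31 are reserved for this kind.
pub fn decode_cq(raw: u64) -> Result<Doorbell, DoorbellError> {
    if (raw >> 24) & 0xff != 0 {
        return Err(DoorbellError::ReservedNotZero);
    }
    Ok(Doorbell::Cq {
        id: (raw & 0x00ff_ffff) as u32,
        tail_ptr: ((raw >> 32) & 0x7fff_ffff) as u32,
        arm: (raw >> 63) != 0,
    })
}
```

The raw value does not carry its own kind, so a decoder has to be told which variant to expect; the sketch shows only the CQ case. The same bits that are reserved for a CQ hold `wqe_cnt` for an RQ, which is why MBZ enforcement has to be kind-specific.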

## 3. capOS mapping (planned -- not yet implemented)

MANA is a vendor-custom cloud NIC behind SR-IOV. The intended binding, when the
live-proof work is unblocked, follows the same userspace-driver authority gate
the other DDF device classes use; none of the grants below are exercised by the
host conformance logic.

- **Authority gate**: the MANA VF would be enumerated over PCI, claimed through
  the reviewed userspace-driver hardware-authority gate, and tracked in the
  device-manager ownership ledger, exactly as the cloud NIC/storage drivers are
  planned to bind. The current implementation grants nothing.
- **`DeviceMmio`**: BAR0 (the GDMA register block, doorbell page, and SHM
  aperture) would be mapped device-uncacheable / NX, with doorbell writes
  scoped to the owning driver's BAR window. The 64-bit `DoorbellEntry` values
  are the writes that path would emit.
- **`Interrupt`**: GDMA EQs deliver completions via MSI-X; the live driver
  would bind one `Interrupt` per EQ vector and arm it through the EQ doorbell
  `arm` bit. The `owner_bits` phase mechanism (`GdmaCqeInfo`/`GdmaEqeInfo`) is
  how the driver detects new entries without a tail register.
- **`DMAPool`**: GDMA queues and TX/RX buffers would be allocated from a
  labeled DMA pool through the selected DMA backend
  (`cloud-dma-backend-selection`: direct IOMMU vs labeled bounce buffer), with
  quiesce/scrub-before-reuse and host-physical-address / IOVA non-exposure. The
  `GdmaSge` address fields are IOVAs from that pool; the current implementation
  does not allocate or program any DMA.
- **Fail-closed / validation rules**: the encode/decode logic is the
  fail-closed boundary capOS implements today -- unknown
  request/queue/event/completion types and command codes are rejected, reserved
  fields are MBZ-enforced, and bitfields are range-checked. Stale-generation
  rejection, BAR bounds, doorbell scoping, and release/reset/VF-revocation
  teardown are the live driver's responsibility and are future work.
- **QEMU-emulable vs hardware-only**: **none of MANA is QEMU-emulable** -- QEMU
  has no MANA device model. The wire logic here is provable only by the host
  conformance suite (`cargo test-lib`); SR-IOV VF revocation/hot-remove
  semantics in particular cannot be reproduced even by a hypothetical QEMU MANA
  device model and remain a live-hardware concern.
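
The `owner_bits` phase check mentioned under `Interrupt` above can be sketched as a pure function, which is the kind of logic a host-side test can exercise without hardware. This sketch assumes the convention used by the Linux reference driver's CQ polling: the consumer keeps a free-running `head`, the expected stamp for the current pass is the pass count masked to 3 bits, and a stamp from the previous pass means the slot has not been written yet. `Poll` and `classify` are hypothetical names, and starting `head` at the ring size (so zeroed memory reads as empty) is likewise an assumption.

```rust
// Illustrative sketch only; not part of the landed conformance logic.
pub const OWNER_MASK: u32 = 0x7; // owner_bits is a 3-bit field

#[derive(Debug, PartialEq, Eq)]
pub enum Poll {
    /// Slot still carries the previous pass's stamp: nothing new.
    Empty,
    /// Slot carries the current pass's stamp: a new entry is ready.
    Ready,
    /// Any other stamp: the producer lapped the consumer.
    Overflow,
}

/// `head` counts entries consumed since queue creation (assumed to start at
/// `num_entries`); `num_entries` is the ring size and must be non-zero.
pub fn classify(head: u32, num_entries: u32, owner_bits: u32) -> Poll {
    let pass = head / num_entries;
    let old = pass.wrapping_sub(1) & OWNER_MASK;
    let new = pass & OWNER_MASK;
    if owner_bits == old {
        Poll::Empty
    } else if owner_bits == new {
        Poll::Ready
    } else {
        Poll::Overflow
    }
}
```

Because the stamp changes on every pass around the ring, the consumer can tell fresh entries from stale ones by reading the entry alone, which is why no tail register is needed.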
