# Cloud DMA Provider Evidence Inventory

This note is the research substrate for the cloud DMA backend decision. It
records official AWS, Azure, and Google Compute Engine device-surface facts,
defines the evidence-matrix schema that the backend policy fills, specifies the
live guest-probe checklist a later credentialed cloud-run task captures, and
fixes the classification rules that separate a DMA-capable surface from
guest-programmable remapping authority.

It makes no backend selection and no per-VM-shape safety claim. It does not
launch a cloud VM, require provider credentials, or assert that any instance
shape is safe for direct DMA. Selecting a backend and asserting bounce-buffer
safety or IOMMU coverage for a specific shape require attended sign-off and are
out of scope here; that work is `cloud-dma-backend-selection`. The model this
note feeds is `docs/proposals/dma-assurance-model-proposal.md`; the local
QEMU/IOMMU grounding it builds on is `docs/research/iommu-remapping.md`.

## How These Facts Were Collected

Provider facts are from official provider documentation and API/CLI references
only, retrieved on the dates recorded below. A "fact" here is a statement the
provider document makes directly. Where a property is read from an API field
rather than stated in prose, it is marked as an **inference from API field**.
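Apart from the GCE Live Probe Results section below, which records guest
observations from instances booted on 2026-05-24, no statement in this note
comes from running a cloud instance. The live-probe checklist exists precisely
because a guest cannot prove provider-side isolation from documentation alone.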

## Provider Official Facts

### AWS EC2

Source: `ec2:DescribeInstanceTypes` API reference
([InstanceTypeInfo](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_InstanceTypeInfo.html),
[NetworkInfo](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_NetworkInfo.html),
[EbsInfo](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_EbsInfo.html)),
retrieved 2026-05-24. The matching CLI is
`aws ec2 describe-instance-types --instance-types <type>`.

- **Network surface.** `networkInfo.enaSupport` reports Elastic Network Adapter
  (ENA) support with values `unsupported | supported | required`.
  `networkInfo.efaSupported` (boolean) and `networkInfo.efaInfo` report Elastic
  Fabric Adapter presence. `networkInfo.enaSrdSupported` (boolean) reports ENA
  Express (Scalable Reliable Datagram). `networkInfo.encryptionInTransitSupported`
  (boolean) reports automatic in-transit encryption between instances.
- **EBS/NVMe surface.** `ebsInfo.nvmeSupport` reports NVMe support for EBS with
  values `unsupported | supported | required`. `ebsInfo.ebsOptimizedSupport`
  reports EBS-optimized behavior (`unsupported | supported | default`).
- **Instance store.** `instanceStorageSupported` (boolean) and
  `instanceStorageInfo` report local instance-store volumes;
  `instanceStorageInfo.nvmeSupport` reports whether they attach over NVMe.
- **Accelerators.** `gpuInfo`, `fpgaInfo`, `inferenceAcceleratorInfo`,
  `neuronInfo`, and `mediaAcceleratorInfo` describe GPU/FPGA/inference/Neuron/
  media accelerator surfaces when present.
- **Hypervisor.** `hypervisor` reports `nitro | xen`. Instances built on the
  Nitro System report `nitro`, and Nitro is the platform that exposes ENA and
  NVMe-attached EBS to the guest.

Inference from API field: an instance type with `enaSupport=required` and
`ebsInfo.nvmeSupport=required` exposes a DMA-capable NIC and NVMe block surface.
This identifies a DMA-capable surface; it is not evidence of guest-programmable
remapping authority.
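The inference above can be applied mechanically to CLI output. The sketch below assumes the PascalCase JSON keys that `aws ec2 describe-instance-types --output json` emits (for example `NetworkInfo.EnaSupport` and `EbsInfo.NvmeSupport`). The function name `dma_surfaces` and the surface labels are hypothetical, and the example entry is hand-written rather than captured output.

```python
# Sketch only: flags DMA-capable surfaces in one entry of the "InstanceTypes"
# list printed by `aws ec2 describe-instance-types --output json`.
# Assumes the CLI's PascalCase JSON keys. A non-empty result identifies
# surfaces; it is never evidence of guest-programmable remapping authority.

ACCELERATOR_KEYS = ("GpuInfo", "FpgaInfo", "InferenceAcceleratorInfo",
                    "NeuronInfo", "MediaAcceleratorInfo")


def dma_surfaces(instance_type: dict) -> list:
    """Return sorted labels for the DMA-capable surfaces an entry advertises."""
    surfaces = set()
    net = instance_type.get("NetworkInfo", {})
    ebs = instance_type.get("EbsInfo", {})
    # Both "supported" and "required" mean an ENA device can be exposed.
    if net.get("EnaSupport") in ("supported", "required"):
        surfaces.add("ena-nic")
    if net.get("EfaSupported"):
        surfaces.add("efa-nic")
    if ebs.get("NvmeSupport") in ("supported", "required"):
        surfaces.add("nvme-ebs")
    if instance_type.get("InstanceStorageSupported"):
        surfaces.add("instance-store")
    for key in ACCELERATOR_KEYS:
        if instance_type.get(key):  # present and non-empty
            surfaces.add("accelerator:" + key)
    return sorted(surfaces)


# Hand-written entry for illustration, not captured CLI output.
example = {
    "InstanceType": "hypothetical.large",
    "Hypervisor": "nitro",
    "NetworkInfo": {"EnaSupport": "required", "EfaSupported": False},
    "EbsInfo": {"NvmeSupport": "required"},
    "InstanceStorageSupported": False,
}
print(dma_surfaces(example))  # ['ena-nic', 'nvme-ebs']
```

The output feeds only the "Visible PCI/storage/network devices" expectation for a row. The remapping columns still come from a live probe.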

### Azure Virtual Machines

Source: [Azure Accelerated Networking overview](https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-overview)
(page `ms.date` 2026-02-05, last updated 2026-05-05) and
[az vm list-skus](https://learn.microsoft.com/en-us/cli/azure/vm#az-vm-list-skus),
retrieved 2026-05-24.

- **Network surface.** Accelerated Networking enables single-root I/O
  virtualization (SR-IOV) on supported VM sizes, providing a host-bypass data
  path. The underlying SR-IOV hardware is one of NVIDIA/Mellanox ConnectX-3,
  ConnectX-4 Lx, ConnectX-5, or the Microsoft Azure Network Adapter (MANA).
- **Capability query.** A VM size's Accelerated Networking capability is read
  from `az vm list-skus` as the `AcceleratedNetworkingEnabled` capability value.
  Most general-purpose and compute-optimized sizes with two or more vCPUs
  support it (four or more on hyperthreaded sizes); NC and NV sizes appear in
  output but do not support it.
- **VF dynamic binding and revocation.** The document states the SR-IOV virtual
  function (VF) is dynamically revoked and restored across host maintenance and
  live migration. Guest images must bind to the synthetic `hv_netvsc` device,
  not the VF, to keep connectivity. They must also mark the `mana`, `mlx4_core`,
  and `mlx5_core` SR-IOV devices as unmanaged so the synthetic/VF bond stays
  transparent.
- **Driver delivery.** Azure does not update the Mellanox or MANA in-guest
  drivers; the guest kernel/distribution provides them.

Inference from API field: `AcceleratedNetworkingEnabled=True` identifies a
DMA-capable SR-IOV NIC surface whose VF can appear and disappear at runtime. The
documented VF revoke/restore behavior is a driver-lifecycle constraint, not
remapping evidence.
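The capability lookup can be scripted against `az vm list-skus --location <region> --resource-type virtualMachines --output json`. This sketch assumes that each SKU in that JSON carries a `capabilities` list of `{"name", "value"}` pairs with string values such as `"True"`. The function name is hypothetical.

```python
# Sketch only: list VM sizes that advertise Accelerated Networking in the
# JSON printed by `az vm list-skus ... --output json`.
# Assumes each SKU has "name", "resourceType", and a "capabilities" list of
# {"name": ..., "value": ...} entries with string values.


def accelerated_networking_sizes(skus: list) -> set:
    """Return size names whose AcceleratedNetworkingEnabled capability is "True"."""
    sizes = set()
    for sku in skus:
        if sku.get("resourceType") != "virtualMachines":
            continue
        caps = {c.get("name"): c.get("value") for c in sku.get("capabilities") or []}
        # A missing or non-"True" value is treated as not advertising the feature.
        if caps.get("AcceleratedNetworkingEnabled") == "True":
            sizes.add(sku["name"])
    return sizes
```

Membership in this set identifies only an SR-IOV VF surface that can be revoked and restored at runtime. It says nothing about remapping.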

### Google Compute Engine

Source: [Use Google Virtual NIC (gVNIC)](https://cloud.google.com/compute/docs/networking/using-gvnic)
and [About Local SSD disks](https://cloud.google.com/compute/docs/disks/local-ssd),
retrieved 2026-05-24.

- **Network surface.** Third-generation and later machine series (excluding bare
  metal) support only gVNIC for the virtual network interface (no VirtIO-Net).
  First- and second-generation machines must use gVNIC on Arm CPU platforms,
  when configured as Confidential VM, or when they need network speeds between
  50 and 100 Gbps; otherwise they still support VirtIO-Net. Custom images declare
  gVNIC support through the `GVNIC` guest OS feature
  (`--guest-os-features=GVNIC`, or `guestOsFeatures:[{type:"GVNIC"}]`).
- **Local SSD surface.** Local SSD is attached over either the NVMe or SCSI
  interface; the NVMe interface is required for peak performance, and some
  machine series support only one of the two interfaces. The interface is chosen
  by the disk `interface` field (`NVME` or `SCSI`).
- **Storage transport.** Persistent Disk attaches over SCSI (virtio-scsi) on
  older machine families and over NVMe on newer ones. The exact transport is a
  per-machine-family property to be captured per shape rather than assumed.

Inference from API field: a third-generation-or-later GCE machine type exposes a
gVNIC NIC surface and may expose NVMe Local SSD/Persistent Disk. This identifies
DMA-capable NIC/storage surfaces; it is not remapping evidence.
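The gVNIC rules above reduce to a small decision function. In this sketch, `GceShape` and `allowed_nics` are hypothetical names. The caller supplies the machine-series generation by hand (for example 1 for N1, 2 for E2/N2/N2D, 3 for C3). Bare metal returns an empty set because the documented rule excludes it, so it must be captured per shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GceShape:
    generation: int            # machine-series generation, supplied by the caller
    bare_metal: bool = False
    arm: bool = False
    confidential: bool = False
    target_gbps: int = 0       # network bandwidth the workload needs


def allowed_nics(shape: GceShape) -> frozenset:
    """Virtual NIC types the gVNIC documentation permits for a shape."""
    if shape.bare_metal:
        return frozenset()     # excluded from the documented rule: probe per shape
    if shape.generation >= 3:
        return frozenset({"gVNIC"})
    # The doc cites 50-100 Gbps; this sketch treats >= 50 Gbps as gVNIC-only.
    if shape.arm or shape.confidential or shape.target_gbps >= 50:
        return frozenset({"gVNIC"})
    return frozenset({"gVNIC", "VirtIO-Net"})
```

Either result is still only a DMA-capable surface. The probe results below show that gVNIC and VirtIO-Net shapes behaved identically with respect to remapping.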

## Evidence-Matrix Schema

The backend policy fills one row per observed (provider, shape, image) tuple.
Provider-fact columns come from documentation/API; observation columns come from
the live-probe checklist; the last two columns are derived classifications, not
provider claims.

| Column | Meaning |
| --- | --- |
| Provider | `aws` / `azure` / `gcp`. |
| Region/zone | The region or zone the observation was taken in. |
| Instance type | Provider instance type / VM size / machine type. |
| Image/kernel | Boot image identifier and guest kernel version. |
| Source command or URL | The exact API/CLI command or official doc URL. |
| Retrieval date | Date the source was read or the probe was captured. |
| Visible PCI/storage/network devices | Devices the guest enumerates (`lspci`, block/net inventory). |
| Visible IOMMU tables/groups | ACPI DMAR/IVRS/IORT presence and `/sys/kernel/iommu_groups`. |
| Provider-side isolation notes | Documented host-side isolation (support-policy assumption, not proof). |
| Guest-programmable remapping observations | Whether the guest can discover, program, and validate a remapping authority. |
| Runtime backend inferred by capOS | The backend capOS would select from observations (see classification rules). |
| Support-policy status | Coarse advertised-target roll-up: `Direct-remapping` / `Labeled-bounce-buffer` / `Unsupported`, pending attended sign-off. |
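One row of the schema can be modeled as a record whose derived columns default to the fail-closed value. This is a hypothetical Python shape for tooling that fills the matrix, not a capOS type. The field names mirror the columns above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class SupportStatus(Enum):
    DIRECT_REMAPPING = "Direct-remapping"
    LABELED_BOUNCE_BUFFER = "Labeled-bounce-buffer"
    UNSUPPORTED = "Unsupported"


@dataclass
class EvidenceRow:
    provider: str                          # "aws" | "azure" | "gcp"
    region_zone: str
    instance_type: str
    image_kernel: str
    source: str                            # exact API/CLI command or doc URL
    retrieval_date: str                    # date read or probe captured
    visible_devices: list = field(default_factory=list)
    iommu_tables_groups: list = field(default_factory=list)
    provider_isolation_notes: str = ""     # support-policy assumption, not proof
    remapping_observations: str = ""
    runtime_backend: Optional[str] = None  # derived by capOS, never a provider claim
    support_status: SupportStatus = SupportStatus.UNSUPPORTED  # fail-closed default

    def __post_init__(self):
        if self.provider not in ("aws", "azure", "gcp"):
            raise ValueError(f"unknown provider: {self.provider!r}")
```

Defaulting `support_status` to `Unsupported` means a row that was never classified cannot advertise a target by accident.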

### Seed Rows (docs/API-derived, no safety claim)

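These rows are seeded from documentation and API fields only, and no row asserts
that any shape is safe for direct DMA. The AWS and Azure observation and backend
columns read "not yet probed"/"not yet selected" because no instance was probed;
a later credentialed cloud-run task fills them. The GCP rows were updated from
the 2026-05-24 live probe recorded below, and their backend entry is a
provisional classification pending attended sign-off.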

| Provider | Example shape | Documented NIC surface | Documented storage surface | Remapping observation | Backend |
| --- | --- | --- | --- | --- | --- |
| aws | Nitro instance, `enaSupport=required`, `nvmeSupport=required` | ENA (SR-IOV) | NVMe EBS + optional instance-store NVMe | not yet probed | not yet selected |
| azure | Size with `AcceleratedNetworkingEnabled=True` | SR-IOV VF (MANA/ConnectX) bonded to synthetic `hv_netvsc` | Managed disk (transport per shape) | not yet probed | not yet selected |
| gcp | 3rd-gen+ machine type (e.g. C3) | gVNIC only | NVMe Local SSD / PD per family | probed 2026-05-24: IOMMU disabled, SWIOTLB (see GCE Live Probe Results) | labeled bounce-buffer |
| gcp | 1st/2nd-gen, x86, non-Confidential, under 50 Gbps | VirtIO-Net or gVNIC | virtio-scsi PD / Local SSD (NVMe or SCSI) | probed 2026-05-24: IOMMU disabled, SWIOTLB (see GCE Live Probe Results) | labeled bounce-buffer |

### GCE Live Probe Results (2026-05-24)

These rows replace the GCE "not yet probed" placeholders with live guest
observations. Four representative shapes were booted on Google Compute Engine
(stock Debian 12, kernel `6.1.0-47-cloud-amd64`) in a dedicated test project,
each running a `/sys`- and `/proc`-only probe delivered through instance
metadata and read back over the serial console. Every instance booted with no
external IP and no service account, and was deleted immediately after its probe
output was captured.

| Machine type | Class | NIC driver | Storage | Guest IOMMU / DMAR | DMA path |
| --- | --- | --- | --- | --- | --- |
| `n1-standard-1` | 1st-gen | `virtio_net` | virtio-scsi (`sda`) | `intel_iommu=off`, `DMAR: IOMMU disabled`, no DMAR table, empty `iommu_groups` | SWIOTLB software bounce buffering |
| `e2-small` | 2nd-gen | `virtio_net` | virtio-scsi (`sda`) | same: IOMMU disabled, no DMAR, no groups | SWIOTLB |
| `c3-standard-4` | 3rd-gen Intel | `gvnic` | `nvme` Local SSD (Google vendor `0x1ae0`) | same | SWIOTLB |
| `n2d-standard-2` Confidential | AMD SEV | `gvnic` | `nvme` | same; additionally `Memory Encryption Features active: AMD SEV` | SWIOTLB **forced** (512 MB) |

Verbatim kernel evidence common to all four shapes:

- the boot command line carries `intel_iommu=off`;
- `DMAR: IOMMU disabled`;
- `PCI-DMA: Using software bounce buffering for IO (SWIOTLB)`;
- `/sys/kernel/iommu_groups` is empty, and no `DMAR`, `IVRS`, or `IORT` table is
  present under `/sys/firmware/acpi/tables/`.

The Confidential (SEV) shape additionally logs `software IO TLB: Memory
encryption is active and system is using DMA bounce buffers`, confirming that
bounce buffering is enforced by memory encryption, not merely by configuration.
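The verbatim lines above are stable enough to classify a captured kernel log. This sketch matches exactly those strings. It also assumes that `Adding to iommu group` is the Linux per-device message printed when a device joins an IOMMU group, and it uses that message only to detect contradictions. The function name and result labels are hypothetical. A log that merely shows groups returns `"unknown"`, because the log alone cannot prove guest-programmable authority.

```python
def classify_dma_path(kernel_log: str) -> str:
    """Classify the guest DMA path from captured kernel log text.

    Returns "contradictory", "swiotlb-forced", "swiotlb", or "unknown".
    """
    disabled = "DMAR: IOMMU disabled" in kernel_log
    swiotlb = "Using software bounce buffering for IO (SWIOTLB)" in kernel_log
    forced = "system is using DMA bounce buffers" in kernel_log
    grouped = "Adding to iommu group" in kernel_log  # assumed Linux wording
    if grouped and disabled:
        return "contradictory"    # groups present while the IOMMU is reported off
    if forced:
        return "swiotlb-forced"   # memory encryption makes bouncing mandatory
    if disabled and swiotlb:
        return "swiotlb"
    return "unknown"              # fail closed: needs sysfs/ACPI evidence
```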

**Classification.** No probed GCE shape -- neither the older virtio surface nor
the modern gVNIC/NVMe surface -- exposes a guest-programmable IOMMU that capOS
could discover, program, and validate. By the
[classification rules](#classification-rules) this rules out the direct-remapping
backend and selects the **labeled bounce-buffer fallback** for the cloud path on
these shapes. On the Confidential VM the bounce-buffer path is a hardware
invariant: the device cannot reach encrypted guest memory directly. This is a
fail-closed observation, not a hostile-hardware isolation claim; the binding
backend selection and any "supported shape" advertisement remain attended
sign-off work in `cloud-dma-backend-selection`.

**Design implication for GCP storage/NIC drivers.** A provider-side or
hypervisor-side IOMMU may still protect Google infrastructure, but that is not
guest-programmable remapping authority for capOS. On the probed GCE shapes a
capOS userspace storage or NIC provider must therefore be planned as a
no-IOMMU, brokered-bounce design: userspace receives buffer capabilities,
grant IDs, or typed commands, while the kernel or device manager materializes
the device-visible queue-base, descriptor, PRP/SGL, or virtqueue address fields.
The direct-remapping lane remains valid for QEMU `run-iommu-remapping` and for
future cloud/hardware shapes that expose a guest-programmable remapping unit;
it is not a GCP premise today. The generic design consequences are recorded in
[`dma-userspace-driver-isolation.md`](dma-userspace-driver-isolation.md).
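The brokered-bounce split can be illustrated with a toy model. In this model, userspace holds only opaque grant IDs, and the manager alone turns a grant into a device-visible address when it fills a descriptor. Everything here is hypothetical: `BounceBroker`, `grant`, `materialize`, and `revoke` are illustrative names, not capOS APIs, and the addresses are fake integers standing in for manager-owned bounce memory. In capOS, `materialize` would run on the kernel or device-manager side.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    device_addr: int   # filled in by the manager; never handed to userspace
    length: int


class BounceBroker:
    """Hypothetical manager-owned bounce pool addressed by opaque grant IDs."""

    def __init__(self, pool_base: int, slot_size: int, slots: int):
        self._free = [pool_base + i * slot_size for i in range(slots)]
        self._slot_size = slot_size
        self._grants = {}              # grant id -> (device_addr, bytearray)
        self._ids = itertools.count(1)

    def grant(self) -> int:
        """Reserve a bounce slot; userspace receives only the returned ID."""
        if not self._free:
            raise MemoryError("bounce pool exhausted")
        addr = self._free.pop()
        gid = next(self._ids)
        self._grants[gid] = (addr, bytearray(self._slot_size))
        return gid

    def write(self, gid: int, data: bytes) -> None:
        """Copy caller data into the bounce slot (KeyError for stale grants)."""
        _, buf = self._grants[gid]
        if len(data) > len(buf):
            raise ValueError("payload exceeds bounce slot")
        buf[: len(data)] = data

    def materialize(self, gid: int, length: int) -> Descriptor:
        """Manager-side only: build the device-visible descriptor."""
        addr, buf = self._grants[gid]
        if length > len(buf):
            raise ValueError("length exceeds bounce slot")
        return Descriptor(device_addr=addr, length=length)

    def revoke(self, gid: int) -> None:
        """Drop the grant; later use of the same ID raises KeyError."""
        addr, _ = self._grants.pop(gid)
        self._free.append(addr)
```

Because IDs come from a monotonically increasing counter, a revoked ID is never reissued, so a stale handle cannot alias a newer grant.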

## Runtime Probe Protocol

A later credentialed cloud-run task captures the following from the guest, with
the region/zone, image, kernel, and retrieval date recorded for each command.
Capture the verbatim command output as evidence; do not summarize it.

- `lspci -nnk -D` -- PCI topology with full domain:bus:device.function, vendor/
  device IDs, and bound kernel driver per function (NIC, storage controller,
  accelerator identity).
- `ls /sys/kernel/iommu_groups` (and per-group `devices/`) -- whether the guest
  sees IOMMU groups at all, and how devices are grouped.
- ACPI table presence: DMAR (Intel VT-d), IVRS (AMD-Vi), IORT (Arm SMMU)
  under `/sys/firmware/acpi/tables/`. Absence is itself evidence.
- Kernel log IOMMU/SWIOTLB lines (`dmesg | grep -iE 'iommu|dmar|ivrs|iort|swiotlb'`)
  -- whether the kernel enabled an IOMMU, fell back to software bounce
  (SWIOTLB), or found no remapping unit.
- Network driver identity: `ethtool -i <iface>` and the bound driver
  (`ena`, `mana`/`mlx5_core`, `gve`, `virtio_net`).
- Block transport identity: `lsblk -o NAME,TRAN,MODEL` and controller driver
  (`nvme`, `virtio_blk`, `virtio_scsi`).
- NVMe inventory: `nvme list` and `nvme id-ctrl <dev>` for controller identity
  where NVMe is present.

A probe result is only usable evidence if capOS could perform the equivalent
discovery from its own ACPI/PCI enumeration; the Linux commands above stand in
for that discovery during the research phase.
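The `/sys` portion of the checklist (IOMMU groups and ACPI remapping-table presence) can be captured without extra tools. This sketch checks table presence with a `stat` rather than reading table contents, so it should not need root. The `root` parameter is a hypothetical convenience for re-reading a copied directory tree. The function is a stand-in for discovery that capOS must later perform from its own ACPI/PCI enumeration.

```python
from pathlib import Path

# ACPI tables that describe a remapping unit: Intel VT-d, AMD-Vi, Arm SMMU.
REMAP_TABLES = ("DMAR", "IVRS", "IORT")


def probe_remapping(root: str = "/") -> dict:
    """Report IOMMU groups and ACPI remapping tables visible under `root`."""
    base = Path(root)
    groups = {}
    groups_dir = base / "sys/kernel/iommu_groups"
    if groups_dir.is_dir():
        for group in groups_dir.iterdir():
            devices = group / "devices"
            members = sorted(d.name for d in devices.iterdir()) if devices.is_dir() else []
            groups[group.name] = members
    tables_dir = base / "sys/firmware/acpi/tables"
    present = [name for name in REMAP_TABLES if (tables_dir / name).exists()]
    # Empty groups and no tables is itself evidence: no guest-visible unit.
    return {"iommu_groups": groups, "acpi_remap_tables": present}


if __name__ == "__main__":
    print(probe_remapping())
```

On the probed GCE shapes this would be expected to report empty groups and no tables, matching the evidence recorded earlier.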

## Classification Rules

These rules are deliberately fail-closed and feed the
`runtime backend inferred by capOS` and `support-policy status` columns.

- SR-IOV, a virtual NIC (ENA, gVNIC, MANA, virtio-net), a GPU, an accelerator,
  or local NVMe identifies a **DMA-capable or DMA-adjacent surface**. This is the
  presence of a device that does or could bus-master; it is not a safety claim.
- A **direct-remapping** classification requires guest-programmable remapping
  authority that capOS can discover, program, and validate -- a usable Intel
  VT-d, AMD-Vi, or Arm SMMU unit the guest controls, with translation, fault,
  and invalidation behavior matching `docs/research/iommu-remapping.md`. A DMA-capable
  surface alone never implies this.
- Provider-side isolation facts (host-enforced VPC isolation, Nitro/host data-
  path bypass, hypervisor-side IOMMU) are **support-policy assumptions**, not
  proof that capOS can safely use direct DMA from inside the guest.
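- A consistent observation that the guest has no remapping unit at all (no DMAR,
  IVRS, or IORT table, no IOMMU groups, and a kernel that reports software bounce
  buffering) rules out direct remapping and selects the **labeled bounce-buffer
  fallback**, subject to that fallback's own invariants.
- Ambiguous, contradictory, or unvalidated observations select **`Unsupported`**,
  matching the assurance model's rule that unknown or contradictory observations
  never fall through to an optimistic default.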

These map onto the three backend candidates in the assurance model
(`docs/proposals/dma-assurance-model-proposal.md`): a direct remapping domain, a
labeled bounce-buffer fallback (`direct_dma=blocked`, all device-visible memory
manager-owned, no host physical address exposed, hostile-hardware isolation not
claimed), or `Unsupported`.
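The rules above can be expressed as one fail-closed function. In this sketch, `Observations` and `classify` are hypothetical names, and `remapping_validated` stands for capOS having programmed and validated translation, fault, and invalidation behavior. The fallback's own invariants (`direct_dma=blocked`, manager-owned memory, no host physical address exposure) are assumed to be enforced elsewhere. Documented provider isolation is recorded but deliberately never consulted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Backend(Enum):
    DIRECT_REMAPPING = "Direct-remapping"
    LABELED_BOUNCE_BUFFER = "Labeled-bounce-buffer"
    UNSUPPORTED = "Unsupported"


@dataclass(frozen=True)
class Observations:
    acpi_remap_tables: frozenset           # subset of {"DMAR", "IVRS", "IORT"}
    iommu_groups: int                      # number of groups the guest sees
    kernel_iommu_enabled: Optional[bool]   # None: kernel log was inconclusive
    swiotlb_active: Optional[bool]         # None: kernel log was inconclusive
    remapping_validated: bool = False      # capOS programmed and validated the unit
    provider_isolation_documented: bool = False  # support-policy assumption only


def classify(obs: Observations) -> Backend:
    """Fail-closed mapping from guest observations to a backend candidate."""
    # provider_isolation_documented is never read: host-side isolation is
    # not proof of guest-programmable remapping.
    if obs.kernel_iommu_enabled is None or obs.swiotlb_active is None:
        return Backend.UNSUPPORTED
    unit_visible = bool(obs.acpi_remap_tables) or obs.iommu_groups > 0
    if unit_visible and obs.kernel_iommu_enabled and obs.remapping_validated:
        return Backend.DIRECT_REMAPPING
    if not unit_visible and not obs.kernel_iommu_enabled and obs.swiotlb_active:
        return Backend.LABELED_BOUNCE_BUFFER
    # Unit visible but unvalidated, or mixed signals: ambiguous.
    return Backend.UNSUPPORTED
```

Fed the GCE observations (no tables, no groups, IOMMU disabled, SWIOTLB active), this returns the labeled bounce-buffer fallback, which matches the classification recorded in the probe results.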

## Relationship to Backend Selection

`cloud-dma-backend-selection` consumes this inventory: it maps each backend
candidate to the assurance-model invariants, fills the evidence matrix per cloud
VM shape, and drafts the downstream-contract scaffolding (which device-manager
policy fields a driver declares -- `direct_dma`, `trusted_domain`,
`bounce_buffer` -- and which stale-handle/stale-completion/teardown/
no-host-physical-exposure gates each candidate must satisfy). That task already
declares this inventory as a dependency. The binding backend selection and any
per-shape safety assertion remain attended-sign-off work and are not made here.

## Relevant Research and Grounding

- `docs/research/iommu-remapping.md` -- primary-source Intel VT-d/AMD-Vi/QEMU
  remapping grounding the direct-DMA classification depends on.
- `docs/proposals/dma-assurance-model-proposal.md` -- the model objects,
  invariants, and backend-candidate matrix this evidence feeds.
- `docs/dma-isolation-design.md` -- the manager-owned DMA isolation contract and
  bounce-buffer fallback the labeled-fallback candidate must satisfy.
- `docs/proposals/cloud-deployment-proposal.md` -- the cloud deployment context
  for the usable-instance milestone.
- `docs/tasks/cloud-dma-backend-selection.md` -- the backend decision
  that consumes this inventory.
