# Azure managed disk (NVMe storage)

This is a provenance map for the **Azure managed-disk** storage shape: how an
Azure VM presents its managed (and local) disks to the guest, why the modern
surface is the *same* standard NVMe device the shared NVMe storage-provider
foundation already drives, why the older-family SCSI path is **not** a usable
alternative here, and the small Azure delta capOS adds on top of the shared
foundation. It is not a re-spec; the NVMe register/queue/PRP wire subset capOS
actually touches is documented once in [`nvme.md`](nvme.md) and not repeated here.

**Maturity caveat.** This page documents a **local QEMU cloud-shape
classification**, not a bound driver running on real Azure hardware. The NVMe
bind/identify/read lifecycle is proven locally by `make run-pci-nvme` against
QEMU's `-device nvme`; the Azure delta consists of the Azure-context
classification proof line and the Azure DMA-backend policy note layered on that
shared NVMe foundation. End-to-end Azure managed-disk enumeration, live
namespace I/O, and cloud evidence capture are future work (tracked as
`cloud-azure-storage-live-proof`), to be done when Azure access is provisioned.
The Azure MANA NIC is a distinct driver-binding claim
(see [`azure-mana.md`](azure-mana.md)) and is out of scope here.

## 1. Spec basis

- **Device**: Azure managed-disk storage controller. Azure presents storage in
  two shapes depending on VM generation:
  - **Azure Boost and newer NVMe-capable families** expose managed disks (and
    local SSD) as **NVMe namespaces** behind a standard NVMe PCI controller --
    PCI class `0x01` (mass storage), subclass `0x08` (NVM), programming
    interface `0x02` (NVM Express). This is the same class triple QEMU emulates
    with `-device nvme` and the kernel detects with
    `PciDevice::is_nvme_controller` (`kernel/src/pci.rs`). This is the path this
    page documents.
  - **Older VM families** present managed disks over a **Hyper-V SCSI**
    controller (a virtio-scsi-shaped interface). capOS has no userspace
    virtio-scsi provider driver, and `make run-virtio-blk` proves only the
    *kernel-owned* virtio-blk driver, which retains the hidden kernel DMA
    ownership that the userspace-provider acceptance criteria forbid. The SCSI
    path is therefore **out of scope** for this driver (recorded on the
    classification line as
    `azure_scsi_path=no-userspace-provider-driver-out-of-scope`); supporting it
    would require a separate userspace virtio-scsi provider-driver foundation,
    not a reuse of the `run-virtio-blk` gate.
- **Production PCI identity**: the Azure Boost NVMe controller carries
  Microsoft's PCI vendor id `0x1414`, distinct from QEMU's `0x1b36`. capOS
  therefore classifies on the device *class surface* and the brokered no-IOMMU
  bounce DMA shape, **not** on a vendor-id match (see §3); live vendor-id
  confirmation and real namespace geometry belong to the deferred
  `cloud-azure-storage-live-proof`.
- **Authoritative spec**: the NVM Express Base Specification (NVMe 1.4 / 2.0) is
  the wire contract; Azure publishes no separate managed-disk register spec
  because the modern device *is* a standard NVMe controller. Azure documents the
  Boost NVMe interface and namespace exposure in the "Azure Boost" and
  "Enable NVMe" VM documentation
  (<https://learn.microsoft.com/azure/virtual-machines/enable-nvme-interface>);
  the in-guest reference driver is the upstream Linux `drivers/nvme/host/`.
- **Wire-format subset capOS implements**: identical to the standard NVMe subset
  documented in [`nvme.md`](nvme.md) §1-§2 (controller registers `CAP`/`CC`/
  `AQA`/`ASQ`/`ACQ`/`CSTS`, the admin and one I/O submission/completion queue
  pair, per-queue doorbells, and PRP1/PRP2 data pointers). Azure Boost adds no
  fields beyond that subset, so this page does not re-list them.
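
The class-surface match described in this section can be sketched in Rust as
follows. This is a minimal illustration, not the kernel's
`PciDevice::is_nvme_controller`: the `PciIdentity` struct, its field names, and
the `is_nvme_class` helper are hypothetical. Only the class triple and the two
vendor ids come from this page; the AHCI comparison values are general PCI
class-code knowledge added for contrast.

```rust
/// Hypothetical stand-in for the identity fields read from PCI config space.
#[derive(Debug, Clone, Copy)]
struct PciIdentity {
    vendor_id: u16,
    class: u8,
    subclass: u8,
    prog_if: u8,
}

// Class triple for a standard NVMe controller, as listed above.
const CLASS_MASS_STORAGE: u8 = 0x01;
const SUBCLASS_NVM: u8 = 0x08;
const PROG_IF_NVME: u8 = 0x02;

// Vendor ids quoted on this page; recorded for reporting, never matched on.
const VENDOR_MICROSOFT: u16 = 0x1414;
const VENDOR_QEMU: u16 = 0x1b36;

impl PciIdentity {
    /// True for any standard NVMe controller, regardless of vendor id.
    fn is_nvme_class(&self) -> bool {
        self.class == CLASS_MASS_STORAGE
            && self.subclass == SUBCLASS_NVM
            && self.prog_if == PROG_IF_NVME
    }
}

fn main() {
    let qemu = PciIdentity { vendor_id: VENDOR_QEMU, class: 0x01, subclass: 0x08, prog_if: 0x02 };
    let azure = PciIdentity { vendor_id: VENDOR_MICROSOFT, class: 0x01, subclass: 0x08, prog_if: 0x02 };
    // A SATA AHCI controller shares the mass-storage class but not the NVMe triple.
    let ahci = PciIdentity { vendor_id: 0x8086, class: 0x01, subclass: 0x06, prog_if: 0x01 };

    for dev in [qemu, azure, ahci] {
        println!("vendor={:#06x} nvme_class={}", dev.vendor_id, dev.is_nvme_class());
    }
}
```

Because the predicate ignores `vendor_id`, the QEMU-emulated controller and a
controller carrying Microsoft's vendor id classify identically, which is why the
local gate can stand in for the device surface without claiming a vendor match.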

## 2. Wire format (relevant subset)

See [`nvme.md`](nvme.md) §2 and §6-§8. There is no Azure-specific wire format to
document: the brokered controller enable (manager-authored `AQA`/`ASQ`/`ACQ`),
the admin `IDENTIFY`, the one I/O queue pair, and the bounded `READ` all use the
standard NVMe encoding the shared foundation already implements and proves.

## 3. capOS mapping

The Azure delta is a **cloud-shape classification** plus a **DMA-backend policy**
consumption detail layered onto the shared NVMe storage-provider foundation; it
adds no new driver code.

- **Cloud-shape classification proof**: after the first enumerated NVMe
  controller is bound (`bind_qemu_nvme_controller`), the enumeration path emits a
  `nvme: cloud shape classification cloud_shape=azure-managed-disk ...` proof
  line (`kernel/src/pci.rs` `report_cloud_nvme_shape_azure`, alongside the AWS
  `report_cloud_nvme_shape`) classifying the same bound controller against the
  documented Azure managed-disk device surface. It prints the enumerated
  `pci_vendor`/`pci_device_id` and `class`/`subclass`/`prog_if`, records the
  production `azure_nvme_vendor=0x1414` identity as documentation (not as a
  claimed match), records the out-of-scope SCSI path
  (`azure_scsi_path=no-userspace-provider-driver-out-of-scope`), and carries
  explicit scope flags (`local_qemu_precursor=true`,
  `real_azure_enumeration=not-claimed`, `mana=separate-nic-driver-out-of-scope`).
  `make run-pci-nvme` asserts this line (`tools/qemu-pci-nvme-smoke.sh`
  `assert_nvme_cloud_shape_azure`) in the same boot as the bounce-buffer
  `dma: backend selection` line asserted by `assert_nvme_cloud_shape`, tying the
  bound device surface to the DMA backend resolved that boot.
- **Azure IOMMU-availability DMA-backend policy**: Azure does not guarantee a
  guest-visible VT-d/IOMMU (unlike QEMU, which can emulate one), so the DMA
  backend the live Azure path consumes is selected by
  `cloud-dma-backend-selection` (`kernel/src/dma_backend.rs`
  `select_and_report`): direct-remapping where a probe positively verifies a
  usable and safe IOMMU, else the labeled bounce-buffer fallback. The
  classification line labels the expected backend
  (`azure_labeled_dma_backend=bounce-buffer`,
  `dma_backend_policy=direct-remapping-if-verified-else-bounce-buffer`); the
  *resolved* backend is proven separately by the `dma: backend selection` line,
  which on the no-IOMMU `make run-pci-nvme` gate is `bounce-buffer`.
- **Brokered DMA / no host-physical exposure**: the binding lifecycle reuses the
  brokered no-IOMMU lane documented in [`nvme.md`](nvme.md) §6-§8 -- the manager
  authors every address-bearing register and PRP from the live DMA ledger, and
  `host_physical_user_visible=false` holds throughout. On a verified remapping
  lane the provider-written Model B path would apply instead; on the no-IOMMU
  gate the brokered bounce shape is the only consistent path (see
  `docs/dma-isolation-design.md`, "Provider-Written Addresses And No-IOMMU
  Brokered Bounce").
- **`DeviceMmio` / `Interrupt` / `DMAPool`**: unchanged from the shared
  foundation -- the reset-only `CC` selected-write claim, the brokered admin and
  I/O doorbells, the interrupt-driven admin completion wake, and the
  `DMAPool`-allocated queue/data pages described in [`nvme.md`](nvme.md) §4-§8.
- **QEMU-emulable vs hardware-only**: the classification and the full
  bind/identify/read lifecycle are end-to-end QEMU-emulable (`make run-pci-nvme`).
  Live managed-disk enumeration over a real Azure Boost controller -- vendor-id
  `0x1414` confirmation, real namespace geometry, and live block I/O -- is
  hardware-only and is the deferred `cloud-azure-storage-live-proof`.
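
The `direct-remapping-if-verified-else-bounce-buffer` policy and the
labeled-versus-resolved comparison described above can be sketched as follows.
This is an illustration under stated assumptions, not the code in
`kernel/src/dma_backend.rs`: `IommuProbe`, `select_backend`, and `field` are
hypothetical names, and the sample line is assembled only from fields this page
quotes, with their order assumed.

```rust
/// The two DMA backends named by the policy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DmaBackend {
    DirectRemapping,
    BounceBuffer,
}

impl DmaBackend {
    /// Label spellings follow the proof-line values quoted on this page.
    fn label(self) -> &'static str {
        match self {
            DmaBackend::DirectRemapping => "direct-remapping",
            DmaBackend::BounceBuffer => "bounce-buffer",
        }
    }
}

/// Hypothetical probe summary; the real probe's fields are not described here.
#[derive(Debug, Clone, Copy)]
struct IommuProbe {
    usable: bool,
    safe: bool,
}

/// Direct remapping only when an IOMMU is positively verified as usable and
/// safe; every other case, including no IOMMU at all, falls back to bounce.
fn select_backend(probe: Option<IommuProbe>) -> DmaBackend {
    match probe {
        Some(p) if p.usable && p.safe => DmaBackend::DirectRemapping,
        _ => DmaBackend::BounceBuffer,
    }
}

/// Look up the value of `key=value` in a whitespace-separated proof line.
fn field<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    line.split_whitespace()
        .filter_map(|tok| tok.split_once('='))
        .find(|(k, _)| *k == key)
        .map(|(_, v)| v)
}

fn main() {
    // No guest-visible IOMMU, as on the `make run-pci-nvme` gate.
    let resolved = select_backend(None);

    // Illustrative line; field order and the omitted fields are assumptions.
    let line = "nvme: cloud shape classification cloud_shape=azure-managed-disk \
                azure_labeled_dma_backend=bounce-buffer \
                real_azure_enumeration=not-claimed";

    let labeled = field(line, "azure_labeled_dma_backend");
    println!("resolved={} labeled={:?}", resolved.label(), labeled);
    println!("labeled matches resolved: {}", labeled == Some(resolved.label()));
}
```

Keeping the fallback as the default match arm means an absent, unusable, or
unverified IOMMU can never select direct remapping by omission.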

## Related

- [NVMe](nvme.md) -- the shared NVMe controller wire subset and brokered
  no-IOMMU storage-provider foundation this shape binds onto.
- [AWS Nitro EBS (NVMe storage)](aws-nvme.md) -- the sibling cloud NVMe
  storage shape; same shared foundation, different cloud provenance. AWS is
  NVMe-only with no SCSI alternative, whereas Azure's older families use SCSI.
- [virtio-net](virtio-net.md) -- the worked cloud-shape classification example
  (GCP virtio-net) the storage classifications mirror.
- [Azure MANA](azure-mana.md) -- the distinct Azure NIC driver-binding claim,
  out of scope for this storage surface.
- `docs/dma-isolation-design.md` -- the DMA-backend selection model and the
  no-IOMMU brokered bounce policy.
- `docs/backlog/hardware-boot-storage.md` -- the cloud device tracks, including
  the deferred live-Azure storage proof.
