Azure managed disk (NVMe storage)
This is a provenance map for the Azure managed-disk storage shape: how an Azure VM presents its managed (and local) disks to the guest, why the modern surface is the same standard NVMe device the shared NVMe storage-provider foundation already drives, why the older-family SCSI path is not a usable alternative here, and the small Azure delta capOS adds on top of the shared foundation. It is not a re-spec; the NVMe register/queue/PRP wire subset capOS actually touches is documented once in NVMe and not repeated here.
Maturity caveat. This page documents a local QEMU cloud-shape
classification, not a bound driver running on real Azure hardware. The NVMe
bind/identify/read lifecycle is proven locally on make run-pci-nvme against
QEMU’s -device nvme; the Azure delta is the Azure-context classification
proof line and the Azure DMA-backend policy note on top of that shared NVMe
foundation. End-to-end Azure managed-disk enumeration, live namespace I/O, and
cloud evidence capture are future work (tracked as
cloud-azure-storage-live-proof), to be done when Azure access is provisioned.
The Azure MANA NIC is a distinct driver-binding claim
(see Azure MANA) and is out of scope here.
1. Spec basis
- Device: Azure managed-disk storage controller. Azure presents storage in
two shapes depending on VM generation:
  - Azure Boost and newer NVMe-capable families expose managed disks (and
    local SSD) as NVMe namespaces behind a standard NVMe PCI controller:
    PCI class 0x01 (mass storage), subclass 0x08 (NVM), programming
    interface 0x02 (NVM Express). This is the same class triple QEMU
    emulates with -device nvme and the kernel detects with
    PciDevice::is_nvme_controller (kernel/src/pci.rs). This is the path
    this page documents.
  - Older VM families present managed disks over a Hyper-V SCSI
    controller (a virtio-scsi-shaped interface). capOS has no userspace
    virtio-scsi provider driver, and make run-virtio-blk proves the
    kernel-owned virtio-blk driver; a kernel-owned driver leaves in place
    the hidden kernel DMA ownership that the userspace-provider acceptance
    forbids. The SCSI path is therefore out of scope for this driver
    (recorded on the classification line as
    azure_scsi_path=no-userspace-provider-driver-out-of-scope); supporting
    it would require a separate userspace virtio-scsi provider-driver
    foundation, not a re-use of the run-virtio-blk gate.
- Production PCI identity: the Azure Boost NVMe controller carries
Microsoft’s PCI vendor id 0x1414, distinct from QEMU’s 0x1b36. capOS therefore classifies on the device class surface and the brokered no-IOMMU bounce DMA shape, not on a vendor-id match (see §3); live vendor-id confirmation and real namespace geometry belong to the deferred cloud-azure-storage-live-proof.
- Authoritative spec: the NVM Express Base Specification (NVMe 1.4 / 2.0) is
the wire contract; Azure publishes no separate managed-disk register spec
because the modern device is a standard NVMe controller. Azure documents the
Boost NVMe interface and namespace exposure in the “Azure Boost” and
“Enable NVMe” VM documentation
(https://learn.microsoft.com/azure/virtual-machines/enable-nvme-interface);
the in-guest reference driver is the upstream Linux
drivers/nvme/host/.
- Wire-format subset capOS implements: identical to the standard NVMe subset
documented in NVMe §1-§2 (controller registers CAP/CC/AQA/ASQ/ACQ/CSTS, the admin queue pair and one I/O submission/completion queue pair, per-queue doorbells, and PRP1/PRP2 data pointers). Azure Boost adds no fields beyond that subset, so this page does not re-list them.
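The class-triple match described in this section can be sketched roughly as follows. This is an illustration only, not the code in kernel/src/pci.rs: the `PciIdentity` struct, its field names, and the `is_nvme_class_triple` method are hypothetical stand-ins for whatever `PciDevice::is_nvme_controller` actually does. The sketch shows why a QEMU controller and an Azure Boost controller land in the same bucket: the vendor id is never consulted.

```rust
/// Hypothetical, simplified view of the PCI identity fields read from
/// config space. The real `PciDevice` in kernel/src/pci.rs may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PciIdentity {
    pub vendor_id: u16,
    pub device_id: u16,
    pub class: u8,
    pub subclass: u8,
    pub prog_if: u8,
}

/// The class triple quoted on this page for an NVM Express controller.
pub const PCI_CLASS_MASS_STORAGE: u8 = 0x01;
pub const PCI_SUBCLASS_NVM: u8 = 0x08;
pub const PCI_PROG_IF_NVME: u8 = 0x02;

/// Vendor ids quoted on this page. They are kept for reporting only and
/// take no part in the match below.
pub const VENDOR_QEMU: u16 = 0x1b36;
pub const VENDOR_MICROSOFT: u16 = 0x1414;

impl PciIdentity {
    /// True when class/subclass/prog_if identify an NVMe controller,
    /// regardless of which vendor built it.
    pub fn is_nvme_class_triple(&self) -> bool {
        self.class == PCI_CLASS_MASS_STORAGE
            && self.subclass == PCI_SUBCLASS_NVM
            && self.prog_if == PCI_PROG_IF_NVME
    }
}
```

Under these assumptions, a device with vendor 0x1b36 and one with vendor 0x1414 both return true as long as they carry the 0x01/0x08/0x02 triple, which is the property the local QEMU precursor relies on.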
2. Wire format (relevant subset)
See NVMe §2 and §6-§8. There is no Azure-specific wire format to
document: the brokered controller enable (manager-authored AQA/ASQ/ACQ),
the admin IDENTIFY, the one I/O queue pair, and the bounded READ all use the
standard NVMe encoding the shared foundation already implements and proves.
3. capOS mapping
The Azure delta is a cloud-shape classification plus a DMA-backend policy consumption detail layered onto the shared NVMe storage-provider foundation; it adds no new driver code.
- Cloud-shape classification proof: after the first enumerated NVMe
controller is bound (
bind_qemu_nvme_controller), the enumeration path emits anvme: cloud shape classification cloud_shape=azure-managed-disk ...proof line (kernel/src/pci.rsreport_cloud_nvme_shape_azure, alongside the AWSreport_cloud_nvme_shape) classifying the same bound controller against the documented Azure managed-disk device surface. It prints the enumeratedpci_vendor/pci_device_idandclass/subclass/prog_if, records the productionazure_nvme_vendor=0x1414identity as documentation (not as a claimed match), records the out-of-scope SCSI path (azure_scsi_path=no-userspace-provider-driver-out-of-scope), and carries explicit scope flags (local_qemu_precursor=true,real_azure_enumeration=not-claimed,mana=separate-nic-driver-out-of-scope).make run-pci-nvmeasserts this line (tools/qemu-pci-nvme-smoke.shassert_nvme_cloud_shape_azure) in the same boot as the bounce-bufferdma: backend selectionline asserted byassert_nvme_cloud_shape, tying the bound device surface to the DMA backend resolved that boot. - Azure IOMMU-availability DMA-backend policy: Azure does not guarantee a
guest-visible VT-d/IOMMU the way QEMU’s emulated IOMMU does, so the DMA backend
the live Azure path consumes is selected by cloud-dma-backend-selection (kernel/src/dma_backend.rs select_and_report): direct-remapping where a usable and safe IOMMU is positively probe-verified, else the labeled bounce-buffer fallback. The classification line labels the expected backend (azure_labeled_dma_backend=bounce-buffer, dma_backend_policy=direct-remapping-if-verified-else-bounce-buffer); the resolved backend is proven separately by the dma: backend selection line, which on the no-IOMMU make run-pci-nvme gate is bounce-buffer.
- Brokered DMA / no host-physical exposure: the binding lifecycle reuses the
brokered no-IOMMU lane documented in NVMe §6-§8 – the manager
authors every address-bearing register and PRP from the live DMA ledger, and
host_physical_user_visible=false holds throughout. On a verified remapping lane the provider-written Model B path would apply instead; on the no-IOMMU gate the brokered bounce shape is the only consistent path (see docs/dma-isolation-design.md, “Provider-Written Addresses And No-IOMMU Brokered Bounce”).
- DeviceMmio/Interrupt/DMAPool: unchanged from the shared foundation – the reset-only CC selected-write claim, the brokered admin and I/O doorbells, the interrupt-driven admin completion wake, and the DMAPool-allocated queue/data pages described in NVMe §4-§8.
- QEMU-emulable vs hardware-only: the classification and the full
bind/identify/read lifecycle are end-to-end QEMU-emulable (make run-pci-nvme). Live managed-disk enumeration over a real Azure Boost controller – vendor-id 0x1414 confirmation, real namespace geometry, and live block I/O – is hardware-only and is the deferred cloud-azure-storage-live-proof.
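The direct-remapping-if-verified-else-bounce-buffer policy, and the way a gate might read labeled fields back out of a proof line, can be sketched as below. Everything here is an assumption made for illustration: `IommuProbe`, `DmaBackend`, `select_backend`, `field`, and `SAMPLE_LINE` are hypothetical names, and the real select_and_report in kernel/src/dma_backend.rs and the real smoke-script assertions are not shown on this page. The sample line is assembled only from fields this page names; the actual proof line carries further fields and its ordering is not claimed here.

```rust
/// Hypothetical outcome of an IOMMU probe.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum IommuProbe {
    Absent,
    PresentUnverified,
    VerifiedUsableAndSafe,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DmaBackend {
    DirectRemapping,
    BounceBuffer,
}

impl DmaBackend {
    /// Label strings as they appear in the policy name quoted on this page.
    pub fn label(self) -> &'static str {
        match self {
            DmaBackend::DirectRemapping => "direct-remapping",
            DmaBackend::BounceBuffer => "bounce-buffer",
        }
    }
}

/// Direct remapping only on a positively verified IOMMU; every other
/// probe outcome falls back to the labeled bounce buffer.
pub fn select_backend(probe: IommuProbe) -> DmaBackend {
    match probe {
        IommuProbe::VerifiedUsableAndSafe => DmaBackend::DirectRemapping,
        IommuProbe::Absent | IommuProbe::PresentUnverified => DmaBackend::BounceBuffer,
    }
}

/// Illustrative line built from fields named on this page (not a captured log).
pub const SAMPLE_LINE: &str = "nvme: cloud shape classification \
cloud_shape=azure-managed-disk azure_nvme_vendor=0x1414 \
azure_scsi_path=no-userspace-provider-driver-out-of-scope \
azure_labeled_dma_backend=bounce-buffer \
dma_backend_policy=direct-remapping-if-verified-else-bounce-buffer \
local_qemu_precursor=true real_azure_enumeration=not-claimed \
mana=separate-nic-driver-out-of-scope";

/// Returns the value of the first whitespace-separated `key=value` token.
pub fn field<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    line.split_whitespace()
        .filter_map(|tok| tok.split_once('='))
        .find(|(k, _)| *k == key)
        .map(|(_, v)| v)
}
```

In this sketch an unverified IOMMU is treated exactly like an absent one, which mirrors the "positively probe-verified" wording above: the bounce buffer is the default and direct remapping must be earned. A gate in this style would compare `field(line, "azure_labeled_dma_backend")` with the label of the backend resolved in the same boot.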
Related
- NVMe – the shared NVMe controller wire subset and brokered no-IOMMU storage-provider foundation this shape binds onto.
- AWS Nitro EBS (NVMe storage) – the sibling cloud NVMe storage shape; same shared foundation, different cloud provenance. AWS is NVMe-only with no SCSI alternative, whereas Azure’s older families use SCSI.
- virtio-net – the worked cloud-shape classification example (GCP virtio-net) the storage classifications mirror.
- Azure MANA – the distinct Azure NIC driver-binding claim, out of scope for this storage surface.
- docs/dma-isolation-design.md – the DMA-backend selection model and the no-IOMMU brokered bounce policy.
- docs/backlog/hardware-boot-storage.md – the cloud device tracks, including the deferred live-Azure storage proof.