Azure managed disk (NVMe storage)

This is a provenance map for the Azure managed-disk storage shape: how an Azure VM presents its managed (and local) disks to the guest, why the modern surface is the same standard NVMe device the shared NVMe storage-provider foundation already drives, why the older-family SCSI path is not a usable alternative here, and the small Azure delta capOS adds on top of the shared foundation. It is not a re-spec; the NVMe register/queue/PRP wire subset capOS actually touches is documented once in NVMe and not repeated here.

Maturity caveat. This page documents a local QEMU cloud-shape classification, not a bound driver running on real Azure hardware. The NVMe bind/identify/read lifecycle is proven locally on make run-pci-nvme against QEMU’s -device nvme; the Azure delta is the Azure-context classification proof line and the Azure DMA-backend policy note on top of that shared NVMe foundation. End-to-end Azure managed-disk enumeration, live namespace I/O, and cloud evidence capture are future work (tracked as cloud-azure-storage-live-proof), to be done when Azure access is provisioned. The Azure MANA NIC is a distinct driver-binding claim (see Azure MANA) and is out of scope here.

1. Spec basis

  • Device: Azure managed-disk storage controller. Azure presents storage in two shapes depending on VM generation:
    • Azure Boost and newer NVMe-capable families expose managed disks (and local SSD) as NVMe namespaces behind a standard NVMe PCI controller – PCI class 0x01 (mass storage), subclass 0x08 (NVM), programming interface 0x02 (NVM Express). This is the same class triple QEMU emulates with -device nvme and the kernel detects with PciDevice::is_nvme_controller (kernel/src/pci.rs). This is the path this page documents.
    • Older VM families present managed disks over a Hyper-V synthetic SCSI controller (storvsc, carried over VMBus rather than exposed as a PCI NVMe function). capOS has no userspace provider driver for that interface, and make run-virtio-blk does not fill the gap: it proves the kernel-owned virtio-blk driver, and a kernel-owned driver retains exactly the hidden kernel DMA ownership that the userspace-provider acceptance criteria forbid. The SCSI path is therefore out of scope for this driver (recorded on the classification line as azure_scsi_path=no-userspace-provider-driver-out-of-scope); supporting it would need a separate userspace SCSI provider-driver foundation, not a reuse of the run-virtio-blk gate.
  • Production PCI identity: the Azure Boost NVMe controller carries Microsoft’s PCI vendor id 0x1414, distinct from QEMU’s 0x1b36. capOS therefore classifies on the device class surface and the brokered no-IOMMU bounce DMA shape, not on a vendor-id match (see §3); live vendor-id confirmation and real namespace geometry belong to the deferred cloud-azure-storage-live-proof.
  • Authoritative spec: the NVM Express Base Specification (NVMe 1.4 / 2.0) is the wire contract; Azure publishes no separate managed-disk register spec because the modern device is a standard NVMe controller. Azure documents the Boost NVMe interface and namespace exposure in the “Azure Boost” and “Enable NVMe” VM documentation (https://learn.microsoft.com/azure/virtual-machines/enable-nvme-interface); the in-guest reference driver is the upstream Linux drivers/nvme/host/.
  • Wire-format subset capOS implements: identical to the standard NVMe subset documented in NVMe §1-§2 (controller registers CAP/CC/AQA/ASQ/ACQ/CSTS, the admin and one I/O submission/completion queue pair, per-queue doorbells, and PRP1/PRP2 data pointers). Azure Boost adds no fields beyond that subset, so this page does not re-list them.
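Because classification keys on the class triple rather than the vendor id, the QEMU controller (0x1b36) and a production Azure Boost controller (0x1414) take the same path. The following is a minimal Rust sketch of that rule. PciIdentity, vendor_matches_azure, and the constant names are hypothetical illustrations, not the kernel/src/pci.rs types; only the class values and vendor ids come from the text above.

```rust
/// Hypothetical subset of a PCI function's config-space identity
/// (illustrative field names, not the capOS `PciDevice` layout).
#[derive(Clone, Copy, Debug)]
pub struct PciIdentity {
    pub vendor_id: u16,
    pub device_id: u16,
    pub class: u8,
    pub subclass: u8,
    pub prog_if: u8,
}

/// Class triple of an NVM Express controller:
/// mass storage (0x01), NVM subclass (0x08), NVMe programming interface (0x02).
const NVME_CLASS_TRIPLE: (u8, u8, u8) = (0x01, 0x08, 0x02);

/// Microsoft's PCI vendor id; recorded as documentation, never required.
pub const AZURE_NVME_VENDOR: u16 = 0x1414;
/// Vendor id reported by QEMU's emulated `-device nvme`.
pub const QEMU_VENDOR: u16 = 0x1b36;

/// Vendor-agnostic check in the spirit of `PciDevice::is_nvme_controller`:
/// only the class triple decides, so QEMU and Azure Boost classify alike.
pub fn is_nvme_controller(dev: &PciIdentity) -> bool {
    (dev.class, dev.subclass, dev.prog_if) == NVME_CLASS_TRIPLE
}

/// Reports whether the enumerated vendor id equals the production Azure id.
/// A local QEMU run returns false here, which is why no Azure match is claimed.
pub fn vendor_matches_azure(dev: &PciIdentity) -> bool {
    dev.vendor_id == AZURE_NVME_VENDOR
}
```

Keeping the vendor comparison as a separate, report-only function mirrors the split above: the class triple gates binding, while the vendor id is only evidence to record.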

2. Wire format (relevant subset)

See NVMe §2 and §6-§8. There is no Azure-specific wire format to document: the brokered controller enable (manager-authored AQA/ASQ/ACQ), the admin IDENTIFY, the one I/O queue pair, and the bounded READ all use the standard NVMe encoding the shared foundation already implements and proves.
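For orientation, the standard encoding of the two commands the lifecycle issues can be sketched as 64-byte submission queue entries. This is an illustrative Rust sketch of the NVMe Base Specification layout (CDW0 carries the opcode and command identifier, CDW1 the NSID, CDW6-9 PRP1/PRP2, CDW10-12 the command-specific fields), assuming a 4 KiB memory page size (CC.MPS = 0). The function and constant names are hypothetical, not the capOS implementation. The PRP values are parameters because, per §3, the manager authors them from its DMA ledger rather than the provider.

```rust
/// Every NVMe submission queue entry (SQE) is 64 bytes, little-endian.
pub const SQE_SIZE: usize = 64;
/// Admin command set opcode: IDENTIFY.
pub const ADMIN_IDENTIFY: u8 = 0x06;
/// NVM command set opcode: READ.
pub const NVM_READ: u8 = 0x02;
/// IDENTIFY CNS values (CDW10 bits 7:0).
pub const CNS_NAMESPACE: u8 = 0x00;
pub const CNS_CONTROLLER: u8 = 0x01;

fn put_u32(sqe: &mut [u8; SQE_SIZE], dword: usize, value: u32) {
    sqe[dword * 4..dword * 4 + 4].copy_from_slice(&value.to_le_bytes());
}

fn put_u64(sqe: &mut [u8; SQE_SIZE], dword: usize, value: u64) {
    sqe[dword * 4..dword * 4 + 8].copy_from_slice(&value.to_le_bytes());
}

/// Common header: CDW0 = opcode (bits 7:0), FUSE = 0, PSDT = 0 (PRPs),
/// command id (bits 31:16); CDW1 = NSID; CDW6-7 = PRP1; CDW8-9 = PRP2.
fn header(opcode: u8, cid: u16, nsid: u32, prp1: u64, prp2: u64) -> [u8; SQE_SIZE] {
    let mut sqe = [0u8; SQE_SIZE];
    put_u32(&mut sqe, 0, u32::from(opcode) | (u32::from(cid) << 16));
    put_u32(&mut sqe, 1, nsid);
    put_u64(&mut sqe, 6, prp1);
    put_u64(&mut sqe, 8, prp2);
    sqe
}

/// IDENTIFY: the 4 KiB result fits in one page-aligned buffer at PRP1,
/// so PRP2 stays zero under the 4 KiB page-size assumption.
pub fn identify_sqe(cid: u16, nsid: u32, cns: u8, prp1: u64) -> [u8; SQE_SIZE] {
    let mut sqe = header(ADMIN_IDENTIFY, cid, nsid, prp1, 0);
    put_u32(&mut sqe, 10, u32::from(cns));
    sqe
}

/// READ: starting LBA in CDW10-11, zero-based block count (NLB) in CDW12 bits 15:0.
pub fn read_sqe(cid: u16, nsid: u32, slba: u64, blocks: u16, prp1: u64, prp2: u64) -> [u8; SQE_SIZE] {
    assert!(blocks >= 1, "NLB is zero-based, so at least one block is required");
    let mut sqe = header(NVM_READ, cid, nsid, prp1, prp2);
    put_u64(&mut sqe, 10, slba);
    put_u32(&mut sqe, 12, u32::from(blocks - 1));
    sqe
}
```

Nothing here is Azure-specific, which is the point of this section: the same bytes go to a QEMU controller and to an Azure Boost controller.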

3. capOS mapping

The Azure delta is a cloud-shape classification plus a note on which DMA backend the Azure path consumes, both layered onto the shared NVMe storage-provider foundation; it adds no new driver code.

  • Cloud-shape classification proof: after the first enumerated NVMe controller is bound (bind_qemu_nvme_controller), the enumeration path emits an nvme: cloud shape classification cloud_shape=azure-managed-disk ... proof line (kernel/src/pci.rs report_cloud_nvme_shape_azure, alongside the AWS report_cloud_nvme_shape) that classifies the same bound controller against the documented Azure managed-disk device surface. The line prints the enumerated pci_vendor/pci_device_id and class/subclass/prog_if, records the production azure_nvme_vendor=0x1414 identity as documentation (not as a claimed match), records the out-of-scope SCSI path (azure_scsi_path=no-userspace-provider-driver-out-of-scope), and carries explicit scope flags (local_qemu_precursor=true, real_azure_enumeration=not-claimed, mana=separate-nic-driver-out-of-scope). make run-pci-nvme asserts this line (tools/qemu-pci-nvme-smoke.sh assert_nvme_cloud_shape_azure) in the same boot as the bounce-buffer dma: backend selection line asserted by assert_nvme_cloud_shape, which ties the bound device surface to the DMA backend resolved in that boot.
  • Azure IOMMU-availability DMA-backend policy: Azure does not guarantee a guest-visible VT-d/IOMMU, unlike a QEMU guest started with an emulated IOMMU, so the DMA backend the live Azure path consumes is selected by cloud-dma-backend-selection (kernel/src/dma_backend.rs select_and_report): direct-remapping where a usable and safe IOMMU is positively probe-verified, otherwise the labeled bounce-buffer fallback. The classification line labels the expected backend (azure_labeled_dma_backend=bounce-buffer, dma_backend_policy=direct-remapping-if-verified-else-bounce-buffer); the resolved backend is proven separately by the dma: backend selection line, which is bounce-buffer on the no-IOMMU make run-pci-nvme gate.
  • Brokered DMA / no host-physical exposure: the binding lifecycle reuses the brokered no-IOMMU lane documented in NVMe §6-§8 – the manager authors every address-bearing register and PRP from the live DMA ledger, and host_physical_user_visible=false holds throughout. On a verified remapping lane the provider-written Model B path would apply instead; on the no-IOMMU gate the brokered bounce shape is the only consistent path (see docs/dma-isolation-design.md, “Provider-Written Addresses And No-IOMMU Brokered Bounce”).
  • DeviceMmio / Interrupt / DMAPool: unchanged from the shared foundation – the reset-only CC selected-write claim, the brokered admin and I/O doorbells, the interrupt-driven admin completion wake, and the DMAPool-allocated queue/data pages described in NVMe §4-§8.
  • QEMU-emulable vs hardware-only: the classification and the full bind/identify/read lifecycle are end-to-end QEMU-emulable (make run-pci-nvme). Live managed-disk enumeration over a real Azure Boost controller – vendor-id 0x1414 confirmation, real namespace geometry, and live block I/O – is hardware-only and is the deferred cloud-azure-storage-live-proof.
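The backend policy and the same-boot check on the no-IOMMU gate can be sketched together. This is a hypothetical Rust illustration: DmaBackend, IommuProbe, select_backend, and check_no_iommu_gate are invented names, not the kernel/src/dma_backend.rs API or the shell assertions in tools/qemu-pci-nvme-smoke.sh. It checks only the key=value pairs named in this section and makes no claim about the full proof-line format.

```rust
use std::collections::HashMap;

/// The two backends the policy names (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DmaBackend {
    DirectRemapping,
    BounceBuffer,
}

impl DmaBackend {
    /// Label spelling used in the proof-line fields.
    pub fn label(self) -> &'static str {
        match self {
            DmaBackend::DirectRemapping => "direct-remapping",
            DmaBackend::BounceBuffer => "bounce-buffer",
        }
    }
}

/// Hypothetical IOMMU probe outcome; every flag must be positively established.
#[derive(Clone, Copy, Debug, Default)]
pub struct IommuProbe {
    pub present: bool,
    pub usable: bool,
    pub safe: bool,
    pub verified: bool,
}

/// direct-remapping-if-verified-else-bounce-buffer: remap only when the IOMMU
/// is present, usable, safe, and probe-verified; anything else falls back.
pub fn select_backend(probe: IommuProbe) -> DmaBackend {
    if probe.present && probe.usable && probe.safe && probe.verified {
        DmaBackend::DirectRemapping
    } else {
        DmaBackend::BounceBuffer
    }
}

/// Collect `key=value` tokens from a log line; other tokens are ignored.
fn fields(line: &str) -> HashMap<&str, &str> {
    line.split_whitespace()
        .filter_map(|tok| tok.split_once('='))
        .collect()
}

/// Check the Azure classification line against the backend resolved in the
/// same boot on a gate with no IOMMU, where the resolved backend must be bounce-buffer.
pub fn check_no_iommu_gate(line: &str, resolved: DmaBackend) -> Result<(), String> {
    let f = fields(line);
    let required = [
        ("cloud_shape", "azure-managed-disk"),
        ("azure_nvme_vendor", "0x1414"),
        ("azure_scsi_path", "no-userspace-provider-driver-out-of-scope"),
        ("local_qemu_precursor", "true"),
        ("real_azure_enumeration", "not-claimed"),
        ("mana", "separate-nic-driver-out-of-scope"),
        ("azure_labeled_dma_backend", "bounce-buffer"),
        ("dma_backend_policy", "direct-remapping-if-verified-else-bounce-buffer"),
    ];
    for &(key, want) in required.iter() {
        match f.get(key) {
            Some(got) if *got == want => {}
            other => return Err(format!("{}: expected {}, got {:?}", key, want, other)),
        }
    }
    if resolved != DmaBackend::BounceBuffer {
        return Err(format!("no-IOMMU gate resolved {}", resolved.label()));
    }
    Ok(())
}
```

Keying the check on named fields rather than on an exact string lets the proof line gain fields without breaking the check, while a missing or altered scope flag still fails.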
4. Related pages

  • NVMe – the shared NVMe controller wire subset and brokered no-IOMMU storage-provider foundation this shape binds onto.
  • AWS Nitro EBS (NVMe storage) – the sibling cloud NVMe storage shape; same shared foundation, different cloud provenance. Nitro EBS is NVMe-only with no SCSI alternative, whereas older Azure families use SCSI.
  • virtio-net – the worked cloud-shape classification example (GCP virtio-net) the storage classifications mirror.
  • Azure MANA – the distinct Azure NIC driver-binding claim, out of scope for this storage surface.
  • docs/dma-isolation-design.md – the DMA-backend selection model and the no-IOMMU brokered bounce policy.
  • docs/backlog/hardware-boot-storage.md – the cloud device tracks, including the deferred live-Azure storage proof.