Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Go VirtualMemory Contract

Design slice for the review finding “Go-style VirtualMemory reserve/commit/decommit semantics are missing.” This file does not change the selected milestone; it records the contract that the Go/runtime allocator implementation must satisfy.

Implementation status as of 2026-04-26 18:51 EEST: the kernel, schema, generated bindings, capos-config, capos-rt, host tests, and QEMU proof coverage implement this contract. The closure summary and verification gates are recorded in REVIEW_FINDINGS.md.

Design Context

Grounding

Project docs and code read for this slice:

  • REVIEW_FINDINGS.md
  • WORKPLAN.md
  • docs/roadmap.md
  • docs/proposals/go-runtime-proposal.md
  • docs/architecture/memory.md
  • docs/architecture/userspace-runtime.md
  • docs/proposals/oom-and-swap-proposal.md
  • schema/capos.capnp
  • capos-config/src/lib.rs
  • capos-rt/src/client.rs
  • kernel/src/cap/virtual_memory.rs
  • kernel/src/mem/paging.rs

Research files were selected after listing the actual docs/research/ contents. The relevant files are:

  • docs/research/llvm-target.md
  • docs/research/zircon.md

docs/research/llvm-target.md records that the Go runtime path depends on mapping sysAlloc, sysReserve, sysMap, and sysUnused/madvise-like behavior onto VirtualMemory. docs/research/zircon.md is relevant prior art because Zircon separates virtual address regions from memory objects and names commit/decommit as range operations on VMOs; capOS should keep the same separation of virtual reservation authority from physical backing.

Current Gap

The current schema exposes only:

interface VirtualMemory {
    map @0 (hint :UInt64, size :UInt64, prot :UInt32) -> (addr :UInt64);
    unmap @1 (addr :UInt64, size :UInt64) -> ();
    protect @2 (addr :UInt64, size :UInt64, prot :UInt32) -> ();
}

The current implementation allocates zeroed physical frames during map, records ownership per committed anonymous page, charges the caller’s frame_grant_pages ledger immediately, rejects non-readable protection, and frees frames during unmap. This is a useful baseline, but it is not the contract Go expects:

  • sysReserve needs address-space reservation without physical frames.
  • sysMap/sysUsed need explicit physical commit inside a prior reservation.
  • sysUnused needs decommit that releases frames while preserving the virtual reservation.
  • sysFree needs unmap-style reservation release so returned arenas do not leak virtual quota or address-space ranges.
  • Stack and arena guard pages need PROT_NONE semantics that reliably fault or fail validation without implying the reservation is gone.
  • Virtual reservation pressure and physical commit pressure need separate quotas and separate auditability.

Contract

The future schema should preserve existing method ids and add explicit reservation operations:

interface VirtualMemory {
    map @0 (hint :UInt64, size :UInt64, prot :UInt32) -> (addr :UInt64);
    unmap @1 (addr :UInt64, size :UInt64) -> ();
    protect @2 (addr :UInt64, size :UInt64, prot :UInt32) -> ();

    reserve @3 (hint :UInt64, size :UInt64) -> (addr :UInt64);
    commit @4 (addr :UInt64, size :UInt64, prot :UInt32) -> ();
    decommit @5 (addr :UInt64, size :UInt64) -> ();
}

Protection constants become:

#![allow(unused)]
fn main() {
pub const VM_PROT_NONE: u32 = 0x0;
pub const VM_PROT_READ: u32 = 0x1;
pub const VM_PROT_WRITE: u32 = 0x2;
pub const VM_PROT_EXEC: u32 = 0x4;
}

VM_PROT_NONE is the only valid zero-bit protection value. Unknown bits are rejected. Any non-NONE protection must include VM_PROT_READ; write-only or execute-only user mappings are rejected rather than silently upgraded, because x86_64 cannot represent a present user page that lacks read access. Writable and executable mappings remain rejected. VM_PROT_NONE must be represented by ledger state plus a non-present user PTE, not by relying on hardware “no read” permission.

map remains as a compatibility/convenience operation equivalent to reserve(hint, size) followed by commit(addr, size, prot), with atomic rollback if the commit or result serialization fails. Existing runtime clients can keep using it until the Go allocator switches to the explicit reserve path.

Semantics

All sizes are rounded up to 4 KiB pages after rejecting zero-size ranges and overflow. All non-zero addresses must be page-aligned, and the entire rounded range [addr, addr + size) must fit at or below USER_ADDR_LIMIT without overflow. Ranges overlapping the capability ring or CapSet page remain invalid for reserve, map, commit, decommit, unmap, and protect.

reserve(hint, size):

  • Reserves a contiguous virtual range in the caller’s address space.
  • Allocates no physical frames and installs no user-accessible PTEs.
  • Charges only the virtual reservation ledger.
  • With hint == 0, chooses a free range in the user address space.
  • With hint != 0, acts as fixed no-replace placement: overlap with any live reservation, committed page, object mapping, ring page, or CapSet page fails.
  • Returns the base address of the reservation.

commit(addr, size, prot):

  • Requires the whole range to lie inside existing anonymous reservations owned by the same address-space-bound VirtualMemory cap.
  • Requires every page in the range to be currently uncommitted.
  • Allocates zeroed physical frames, charges the physical commit ledger, and records the committed state per page.
  • Installs present user PTEs for readable/writable/executable protections.
  • For VM_PROT_NONE, allocates and charges frames but leaves user PTEs non-present while retaining the frames in the reservation ledger.
  • This is for committed inaccessible memory whose contents must survive a later protection restore. Pure stack or arena guard pages should stay reserved but uncommitted so they consume virtual quota without consuming physical commit budget.
  • Is all-or-nothing: allocation, page-table updates, ledger charge, and TLB completion reservation must either all become visible or all roll back.

decommit(addr, size):

  • Requires the whole range to lie inside existing anonymous reservations owned by the same cap.
  • Allows committed and already-uncommitted pages in the range.
  • Removes any present PTEs, releases frames for committed pages, drops physical commit charges, and preserves the virtual reservation.
  • Leaves every page in the range in the uncommitted reserved state.
  • Must perform the same local flush and remote shootdown discipline as unmap and protect before a frame can return to the allocator.

protect(addr, size, prot):

  • Requires the whole range to be committed anonymous pages owned by the same cap. It does not commit uncommitted reserved pages.
  • May set VM_PROT_NONE; the kernel keeps the committed frames charged and associated with the pages, removes present user PTEs, and denies user access until a later protect restores readable permissions or decommit releases the frames.
  • Preserves existing zeroed/data contents when moving between VM_PROT_NONE and accessible protections.
  • Keeps W^X enforcement and rejects unknown bits.

unmap(addr, size):

  • Releases the reservation for the whole range.
  • Frees committed frames and physical commit charges for committed pages.
  • Releases virtual reservation charges for every page.
  • Fails if the range is not wholly covered by anonymous reservations owned by the same cap.

Page faults and validation:

  • Access to an unreserved page is an ordinary unmapped access.
  • Access to a reserved uncommitted page is a reservation fault. The initial Go contract should fail closed; demand commit on fault is a later policy choice, not implicit behavior in this slice.
  • Access to a committed VM_PROT_NONE page is a protection fault and must not release the reservation or physical frame.
  • A pure guard page is a reserved uncommitted page, not a committed VM_PROT_NONE page, unless the runtime deliberately needs hidden retained contents.
  • Kernel user-buffer validation and copy helpers must treat reserved uncommitted pages and committed VM_PROT_NONE pages as inaccessible.

Ledgers

The implementation needs two ledgers of record:

  • Virtual reservation pages: charged by reserve, released by unmap, and unchanged by commit, decommit, or protect. Compatibility map charges this ledger because it creates an implicit reservation.
  • Physical commit pages: charged by commit or map, released by decommit or unmap, and unchanged by protect.

The current ResourceLedger::frame_grant_pages can continue to represent physical commit pressure if the implementation gives anonymous committed pages, held MemoryObject caps, and borrowed object mappings one shared physical-page budget. Virtual reservations need a separate process-owned quota; do not hide virtual reservation pages in the physical frame ledger.

Address-space ownership tracking must become reservation-based instead of a flat list of committed anonymous pages. The reservation ledger must be sparse: Go-scale reservations can be terabytes, so reserve must not allocate one metadata entry per reserved page. A minimal host-testable model should track non-overlapping reservation intervals and sparse committed state inside those intervals, such as committed subranges or a committed-page map keyed only by pages that currently hold frames.

  • Reserved
  • Committed { frame, prot }

MemoryObject borrowed mappings stay outside anonymous reservations for this slice. Any future design that allows object mappings inside sub-reservations must explicitly define ownership and teardown interaction.

Implementation Gates

  1. Add VM_PROT_NONE, reserve, commit, and decommit to schema, generated bindings, capos-config, and capos-rt clients while preserving current method ids.
  2. Replace committed-page-only anonymous ownership tracking with a sparse reservation ledger that can represent large uncommitted intervals plus committed accessible and committed VM_PROT_NONE pages without allocating per-page metadata for every reserved page.
  3. Add a virtual reservation quota separate from the physical frame-grant ledger and make quota errors distinguish virtual exhaustion from physical commit exhaustion.
  4. Rework VirtualMemoryCap map/unmap/protect around the reservation ledger, including rollback paths, TLB shootdown completion, and process-exit cleanup.
  5. Keep ring and CapSet virtual pages reserved outside caller control.
  6. Update capos-rt so allocator paths can use caller-owned scratch buffers for reserve, commit, decommit, protect, and unmap without allocating during heap growth.
  7. Add host tests for overlap rejection, fixed no-replace hints, partial decommit, recommit zero-fill, VM_PROT_NONE protect/restore, quota accounting, rollback, and process teardown.
  8. Add QEMU proof coverage before closing the review finding: reserve-without-frame-commit, commit and write, protect to VM_PROT_NONE, restore and preserve contents, decommit and recommit zero-fill, unmap reservation release, virtual-quota exhaustion, and physical-commit quota release after decommit.

Non-Goals

  • Demand paging on first access.
  • Swap or overcommit policy.
  • File-backed mappings.
  • Copy-on-write snapshots.
  • Hierarchical VMAR/sub-address-space capabilities.
  • Sharing anonymous reservations across processes.

Those can build on the reservation ledger later, but the Go allocator contract must not depend on them.