Go VirtualMemory Contract
Design slice for the review finding “Go-style VirtualMemory
reserve/commit/decommit semantics are missing.” This file does not change the
selected milestone; it records the contract that the Go/runtime allocator
implementation must satisfy.
Implementation status as of 2026-04-26 18:51 EEST: the kernel, schema,
generated bindings, capos-config, capos-rt, host tests, and QEMU proof
coverage implement this contract. The closure summary and verification gates
are recorded in REVIEW_FINDINGS.md.
Design Context
- Current manual pages:
Memory Management for the implemented
VirtualMemory/MemoryObjectbaseline and Userspace Runtime for the runtime client surface that allocator code uses. - Owning proposal: Go Runtime, especially the memory management syscall-equivalent section.
- Related policy proposals: OOM Handling and Swap for memory pressure outcomes and Resource Accounting and Quotas for separate virtual-reservation and physical-commit ledgers.
- Validation reference: Verification Workflow for the evidence expected before closing the review finding.
- Research grounding: LLVM Target for Go runtime OS hooks and Zircon for VMO/VMAR precedent.
Grounding
Project docs and code read for this slice:
REVIEW_FINDINGS.mdWORKPLAN.mddocs/roadmap.mddocs/proposals/go-runtime-proposal.mddocs/architecture/memory.mddocs/architecture/userspace-runtime.mddocs/proposals/oom-and-swap-proposal.mdschema/capos.capnpcapos-config/src/lib.rscapos-rt/src/client.rskernel/src/cap/virtual_memory.rskernel/src/mem/paging.rs
Research files were selected after listing the actual docs/research/
contents. The relevant files are:
docs/research/llvm-target.mddocs/research/zircon.md
docs/research/llvm-target.md records that the Go runtime path depends on
mapping sysAlloc, sysReserve, sysMap, and sysUnused/madvise-like
behavior onto VirtualMemory. docs/research/zircon.md is relevant prior art
because Zircon separates virtual address regions from memory objects and names
commit/decommit as range operations on VMOs; capOS should keep the same
separation of virtual reservation authority from physical backing.
Current Gap
The current schema exposes only:
interface VirtualMemory {
map @0 (hint :UInt64, size :UInt64, prot :UInt32) -> (addr :UInt64);
unmap @1 (addr :UInt64, size :UInt64) -> ();
protect @2 (addr :UInt64, size :UInt64, prot :UInt32) -> ();
}
The current implementation allocates zeroed physical frames during map,
records ownership per committed anonymous page, charges the caller’s
frame_grant_pages ledger immediately, rejects non-readable protection, and
frees frames during unmap. This is a useful baseline, but it is not the
contract Go expects:
sysReserveneeds address-space reservation without physical frames.sysMap/sysUsedneed explicit physical commit inside a prior reservation.sysUnusedneeds decommit that releases frames while preserving the virtual reservation.sysFreeneeds unmap-style reservation release so returned arenas do not leak virtual quota or address-space ranges.- Stack and arena guard pages need
PROT_NONEsemantics that reliably fault or fail validation without implying the reservation is gone. - Virtual reservation pressure and physical commit pressure need separate quotas and separate auditability.
Contract
The future schema should preserve existing method ids and add explicit reservation operations:
interface VirtualMemory {
map @0 (hint :UInt64, size :UInt64, prot :UInt32) -> (addr :UInt64);
unmap @1 (addr :UInt64, size :UInt64) -> ();
protect @2 (addr :UInt64, size :UInt64, prot :UInt32) -> ();
reserve @3 (hint :UInt64, size :UInt64) -> (addr :UInt64);
commit @4 (addr :UInt64, size :UInt64, prot :UInt32) -> ();
decommit @5 (addr :UInt64, size :UInt64) -> ();
}
Protection constants become:
#![allow(unused)]
fn main() {
pub const VM_PROT_NONE: u32 = 0x0;
pub const VM_PROT_READ: u32 = 0x1;
pub const VM_PROT_WRITE: u32 = 0x2;
pub const VM_PROT_EXEC: u32 = 0x4;
}
VM_PROT_NONE is the only valid zero-bit protection value. Unknown bits are
rejected. Any non-NONE protection must include VM_PROT_READ; write-only or
execute-only user mappings are rejected rather than silently upgraded, because
x86_64 cannot represent a present user page that lacks read access. Writable
and executable mappings remain rejected. VM_PROT_NONE must be represented by
ledger state plus a non-present user PTE, not by relying on hardware “no read”
permission.
map remains as a compatibility/convenience operation equivalent to
reserve(hint, size) followed by commit(addr, size, prot), with atomic
rollback if the commit or result serialization fails. Existing runtime clients
can keep using it until the Go allocator switches to the explicit reserve path.
Semantics
All sizes are rounded up to 4 KiB pages after rejecting zero-size ranges and
overflow. All non-zero addresses must be page-aligned, and the entire rounded
range [addr, addr + size) must fit at or below USER_ADDR_LIMIT without
overflow. Ranges overlapping the capability ring or CapSet page remain invalid
for reserve, map, commit, decommit, unmap, and protect.
reserve(hint, size):
- Reserves a contiguous virtual range in the caller’s address space.
- Allocates no physical frames and installs no user-accessible PTEs.
- Charges only the virtual reservation ledger.
- With
hint == 0, chooses a free range in the user address space. - With
hint != 0, acts as fixed no-replace placement: overlap with any live reservation, committed page, object mapping, ring page, or CapSet page fails. - Returns the base address of the reservation.
commit(addr, size, prot):
- Requires the whole range to lie inside existing anonymous reservations owned
by the same address-space-bound
VirtualMemorycap. - Requires every page in the range to be currently uncommitted.
- Allocates zeroed physical frames, charges the physical commit ledger, and records the committed state per page.
- Installs present user PTEs for readable/writable/executable protections.
- For
VM_PROT_NONE, allocates and charges frames but leaves user PTEs non-present while retaining the frames in the reservation ledger. - This is for committed inaccessible memory whose contents must survive a later protection restore. Pure stack or arena guard pages should stay reserved but uncommitted so they consume virtual quota without consuming physical commit budget.
- Is all-or-nothing: allocation, page-table updates, ledger charge, and TLB completion reservation must either all become visible or all roll back.
decommit(addr, size):
- Requires the whole range to lie inside existing anonymous reservations owned by the same cap.
- Allows committed and already-uncommitted pages in the range.
- Removes any present PTEs, releases frames for committed pages, drops physical commit charges, and preserves the virtual reservation.
- Leaves every page in the range in the uncommitted reserved state.
- Must perform the same local flush and remote shootdown discipline as unmap and protect before a frame can return to the allocator.
protect(addr, size, prot):
- Requires the whole range to be committed anonymous pages owned by the same cap. It does not commit uncommitted reserved pages.
- May set
VM_PROT_NONE; the kernel keeps the committed frames charged and associated with the pages, removes present user PTEs, and denies user access until a laterprotectrestores readable permissions ordecommitreleases the frames. - Preserves existing zeroed/data contents when moving between
VM_PROT_NONEand accessible protections. - Keeps W^X enforcement and rejects unknown bits.
unmap(addr, size):
- Releases the reservation for the whole range.
- Frees committed frames and physical commit charges for committed pages.
- Releases virtual reservation charges for every page.
- Fails if the range is not wholly covered by anonymous reservations owned by the same cap.
Page faults and validation:
- Access to an unreserved page is an ordinary unmapped access.
- Access to a reserved uncommitted page is a reservation fault. The initial Go contract should fail closed; demand commit on fault is a later policy choice, not implicit behavior in this slice.
- Access to a committed
VM_PROT_NONEpage is a protection fault and must not release the reservation or physical frame. - A pure guard page is a reserved uncommitted page, not a committed
VM_PROT_NONEpage, unless the runtime deliberately needs hidden retained contents. - Kernel user-buffer validation and copy helpers must treat reserved
uncommitted pages and committed
VM_PROT_NONEpages as inaccessible.
Ledgers
The implementation needs two ledgers of record:
- Virtual reservation pages: charged by
reserve, released byunmap, and unchanged bycommit,decommit, orprotect. Compatibilitymapcharges this ledger because it creates an implicit reservation. - Physical commit pages: charged by
commitormap, released bydecommitorunmap, and unchanged byprotect.
The current ResourceLedger::frame_grant_pages can continue to represent
physical commit pressure if the implementation gives anonymous committed pages,
held MemoryObject caps, and borrowed object mappings one shared physical-page
budget. Virtual reservations need a separate process-owned quota; do not hide
virtual reservation pages in the physical frame ledger.
Address-space ownership tracking must become reservation-based instead of a
flat list of committed anonymous pages. The reservation ledger must be sparse:
Go-scale reservations can be terabytes, so reserve must not allocate one
metadata entry per reserved page. A minimal host-testable model should track
non-overlapping reservation intervals and sparse committed state inside those
intervals, such as committed subranges or a committed-page map keyed only by
pages that currently hold frames.
ReservedCommitted { frame, prot }
MemoryObject borrowed mappings stay outside anonymous reservations for this
slice. Any future design that allows object mappings inside sub-reservations
must explicitly define ownership and teardown interaction.
Implementation Gates
- Add
VM_PROT_NONE,reserve,commit, anddecommitto schema, generated bindings,capos-config, andcapos-rtclients while preserving current method ids. - Replace committed-page-only anonymous ownership tracking with a sparse
reservation ledger that can represent large uncommitted intervals plus
committed accessible and committed
VM_PROT_NONEpages without allocating per-page metadata for every reserved page. - Add a virtual reservation quota separate from the physical frame-grant ledger and make quota errors distinguish virtual exhaustion from physical commit exhaustion.
- Rework
VirtualMemoryCapmap/unmap/protect around the reservation ledger, including rollback paths, TLB shootdown completion, and process-exit cleanup. - Keep ring and CapSet virtual pages reserved outside caller control.
- Update
capos-rtso allocator paths can use caller-owned scratch buffers for reserve, commit, decommit, protect, and unmap without allocating during heap growth. - Add host tests for overlap rejection, fixed no-replace hints, partial
decommit, recommit zero-fill,
VM_PROT_NONEprotect/restore, quota accounting, rollback, and process teardown. - Add QEMU proof coverage before closing the review finding:
reserve-without-frame-commit, commit and write, protect to
VM_PROT_NONE, restore and preserve contents, decommit and recommit zero-fill, unmap reservation release, virtual-quota exhaustion, and physical-commit quota release after decommit.
Non-Goals
- Demand paging on first access.
- Swap or overcommit policy.
- File-backed mappings.
- Copy-on-write snapshots.
- Hierarchical VMAR/sub-address-space capabilities.
- Sharing anonymous reservations across processes.
Those can build on the reservation ledger later, but the Go allocator contract must not depend on them.