Memory Management
Memory management gives the kernel controlled ownership of physical frames, separates user processes, enforces page permissions, and exposes memory authority only through explicit capabilities.
Related
- Go VirtualMemory Contract
records the reserve/commit/decommit contract that extended the
VirtualMemoryimplementation for Go-style arena allocation. - Userspace Runtime describes the typed
VirtualMemoryClientsurface used by allocator and runtime code. - OOM Handling and Swap and Resource Accounting and Quotas define future memory-pressure and quota policy.
- Memory Authority Model defines the future cross-cutting contract for memory authority classes, residency, mapping consistency, pins, DMA boundaries, swap eligibility, and proof obligations.
Current Behavior
The frame allocator builds a bitmap from the Limine memory map, marks all non-usable frames as used, reserves frame zero, and reserves its own bitmap frames. The heap is initialized separately for kernel allocation.
Paging initialization builds a new kernel PML4, remaps kernel sections with section-specific permissions, copies upper-half mappings with NX applied and user access stripped, switches CR3, then enables page-global support. SMEP/SMAP are enabled after those mappings are active.
Each user AddressSpace owns its lower-half page tables and clones the
kernel’s upper-half mappings. Dropping an address space walks the user half and
frees mapped frames, committed anonymous frames retained behind VM_PROT_NONE,
and page-table frames. VirtualMemory lets a process reserve anonymous
address ranges, commit and decommit physical backing, unmap reservations, and
protect committed pages. Anonymous reservations charge the process virtual
reservation ledger. Committed anonymous pages charge
ResourceLedger::frame_grant_pages.
FrameAllocator allocation methods return a MemoryObject result capability,
not a physical address. The normal result payload carries the result-cap index,
and the CQE transfer-result record carries the local cap id plus
MemoryObject interface id. MemoryObject.info exposes page count and size;
MemoryObject.map maps page-aligned object ranges into the caller address
space, MemoryObject.unmap removes those borrowed mappings, and
MemoryObject.protect updates their page-table flags. Held MemoryObject caps
charge the holder’s frame_grant_pages ledger, and final CAP_OP_RELEASE or
process exit frees the owned frames once no borrowed address-space mapping still
holds the backing alive.
Design
The kernel keeps physical allocation host-testable by placing bitmap logic in
capos-lib and wrapping it with kernel HHDM access in kernel/src/mem/frame.rs.
Page-table manipulation stays in the kernel because it is architecture-specific.
ELF loading and VirtualMemory both use page-table flags to preserve W^X:
non-executable data gets NX, writable mappings are explicit, and userspace
pages must be USER_ACCESSIBLE. The CapSet and ring bootstrap pages occupy
reserved virtual pages; VirtualMemory rejects ranges that overlap either one.
User-buffer validation for process-owned buffers uses the process
AddressSpace mutex. The kernel checks that user pointers stay below the user
address limit, verifies page-table permissions for the requested read/write
access, and copies through the HHDM mapping while holding the same address-space
lock. This keeps validation and use tied to one stable page-table view. The
legacy current-CR3 validator remains only for callers that already provide an
equivalent page-table stability guarantee.
Committed VirtualMemory pages and held MemoryObject caps use the same
per-process frame-grant ledger, with quota checks before frame allocation or
mapping side effects. Anonymous reservation consumes a separate virtual page
quota, so guard ranges and Go-style sysReserve arenas do not spend physical
commit budget. Held MemoryObject caps charge for the backing they keep
reachable, and each live borrowed MemoryObject mapping reserves frame-grant
pages until it is unmapped. This prevents a process from mapping an object,
releasing the cap to drop the cap-slot charge, and keeping the backing pinned
without quota. The address space records borrowed pages separately from sparse
anonymous reservations so teardown and unmap can distinguish anonymous pages
from object-backed pages. Future file/network/DMA resources should reuse that
authority ledger instead of adding one-off counters per cap.
Invariants
- Frame addresses are 4 KiB aligned.
- The frame bitmap’s own frames are never returned as free frames.
- Upper-half kernel mappings are not user-accessible.
- Kernel text is RX, rodata is read-only NX, and data/bss are RW NX.
- User address spaces own only lower-half page-table frames.
- Process frame-grant usage covers committed anonymous VM pages, held
MemoryObjectcaps, and live borrowedMemoryObjectmappings. - Process virtual-reservation usage covers reserved anonymous VM pages whether or not they are committed.
- Committed
VM_PROT_NONEpages retain their frames and data while exposing no present user PTE; reserved uncommitted pages consume no frame-grant quota. - Object-backed user mappings are tracked as borrowed pages and hold the
MemoryObjectbacking alive until unmapped or address-space teardown. MemoryObjectunmap/protect only succeeds for borrowed pages backed by the same object.VirtualMemorycaps are bound to one address space and are not valid cross-process service exports.- CapSet is read-only/no-execute; ring is writable/no-execute.
VirtualMemorycannot reserve, map, commit, decommit, unmap, or protect the ring or CapSet pages.VirtualMemorycommit/decommit/protect/unmap only succeeds for ranges covered by anonymous reservations owned by the cap’s address space.- Capability-ring CALL/RECV/RETURN buffers, transfer descriptors, process and
thread wait completions, and private ParkSpace word reads must validate and
copy/read while holding the target process
AddressSpacelock.
Code Map
capos-lib/src/frame_bitmap.rs- host-testable physical frame bitmap core.capos-lib/src/cap_table.rs- capability holds and per-processResourceLedgerframe-grant accounting.capos-lib/src/frame_ledger.rs- bounded frame-grant helper retained for host tests.kernel/src/mem/frame.rs- Limine memory-map integration and global frame allocator wrapper.kernel/src/mem/heap.rs- kernel heap setup.kernel/src/mem/paging.rs- kernel remap,AddressSpace, page mapping, VM-cap page tracking, user copy helpers.kernel/src/mem/validate.rs- user-address bounds and legacy current-CR3 validation helper.kernel/src/cap/frame_alloc.rs- FrameAllocator capability and cleanup.demos/memoryobject-shared-parent/anddemos/memoryobject-shared-child/- QEMU shared MemoryObject smoke.tools/qemu-memoryobject-shared-smoke.sh- transcript checks for the shared MemoryObject smoke.kernel/src/cap/virtual_memory.rs- VirtualMemory capability.kernel/src/spawn.rs- ELF, stack, and TLS user mappings.kernel/src/arch/x86_64/smap.rs- SMEP/SMAP setup and legacy direct user access guard.
Validation
cargo test-libcovers frame bitmap, frame ledger, ELF parser, and cap-table pure logic.cargo miri-libruns host-testablecapos-libtests under Miri when installed.make kani-libproves the bounded mandatory frame-bitmap, stale-handle, cap-slot/frame-grant accounting, and transfer preflight fail-closed invariants when Kani is installed.make run-smokevalidates ELF mapping, process teardown, TLS, and clean shell-led halt.make run-spawnvalidates MemoryObject-backed FrameAllocator cleanup, VirtualMemory reserve/commit/decommit/VM_PROT_NONE/quota/release smoke, and runtime spawn checks.make run-memoryobject-sharedvalidates a parent allocating and mapping a MemoryObject, transferring it to a child, observing a child write through the same backing pages, unmapping both sides, and halting cleanly.make run-ipc-zerocopyvalidates the multi-message shared point-to-point buffer pattern at the substrate level: a producer transfers oneMemoryObjectto the consumer and then exchanges four record payloads through the shared mapping while endpoint CALLs carry only sequence numbers and checksums. This is a substrate proof, not the production data-plane shape: typedSharedBufferwith explicit producer/consumer ring metadata, notification primitives, and consuming service APIs (File.readBuf,BlockDevice.readBlocks, NIC RX/TX rings) are tracked under Open Work.make run-spawnvalidates ELF load failure rollback and frame exhaustion handling throughProcessSpawner.
Open Work
- Extend frame-grant accounting only if future DMA pinning or service-owned shared-buffer pools need authority beyond held MemoryObject caps and live borrowed mappings.
- Define page-pinning or mapping-identity rules for future shared WaitSet, DMA, and service-owned shared-buffer paths that must keep physical backing stable beyond a single locked copy/read. The owning planning track is Memory Authority Model.
- Add file, block, network, and DMA service APIs that use MemoryObject-backed SharedBuffer caps for zero-copy data paths.
- Add DMA isolation and device memory capability boundaries before userspace drivers.
- Add huge-page handling only with explicit ownership and teardown rules.