Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Memory Management

Memory management gives the kernel controlled ownership of physical frames, separates user processes, enforces page permissions, and exposes memory authority only through explicit capabilities.

Current Behavior

The frame allocator builds a bitmap from the Limine memory map, marks all non-usable frames as used, reserves frame zero, and reserves its own bitmap frames. The heap is initialized separately for kernel allocation.

Paging initialization builds a new kernel PML4, remaps kernel sections with section-specific permissions, copies upper-half mappings with NX applied and user access stripped, switches CR3, then enables page-global support. SMEP/SMAP are enabled after those mappings are active.

Each user AddressSpace owns its lower-half page tables and clones the kernel’s upper-half mappings. Dropping an address space walks the user half and frees mapped frames, committed anonymous frames retained behind VM_PROT_NONE, and page-table frames. VirtualMemory lets a process reserve anonymous address ranges, commit and decommit physical backing, unmap reservations, and protect committed pages. Anonymous reservations charge the process virtual reservation ledger. Committed anonymous pages charge ResourceLedger::frame_grant_pages.

FrameAllocator allocation methods return a MemoryObject result capability, not a physical address. The normal result payload carries the result-cap index, and the CQE transfer-result record carries the local cap id plus MemoryObject interface id. MemoryObject.info exposes page count and size; MemoryObject.map maps page-aligned object ranges into the caller address space, MemoryObject.unmap removes those borrowed mappings, and MemoryObject.protect updates their page-table flags. Held MemoryObject caps charge the holder’s frame_grant_pages ledger, and final CAP_OP_RELEASE or process exit frees the owned frames once no borrowed address-space mapping still holds the backing alive.

Design

The kernel keeps physical allocation host-testable by placing bitmap logic in capos-lib and wrapping it with kernel HHDM access in kernel/src/mem/frame.rs. Page-table manipulation stays in the kernel because it is architecture-specific.

ELF loading and VirtualMemory both use page-table flags to preserve W^X: non-executable data gets NX, writable mappings are explicit, and userspace pages must be USER_ACCESSIBLE. The CapSet and ring bootstrap pages occupy reserved virtual pages; VirtualMemory rejects ranges that overlap either one.

User-buffer validation for process-owned buffers uses the process AddressSpace mutex. The kernel checks that user pointers stay below the user address limit, verifies page-table permissions for the requested read/write access, and copies through the HHDM mapping while holding the same address-space lock. This keeps validation and use tied to one stable page-table view. The legacy current-CR3 validator remains only for callers that already provide an equivalent page-table stability guarantee.

Committed VirtualMemory pages and held MemoryObject caps use the same per-process frame-grant ledger, with quota checks before frame allocation or mapping side effects. Anonymous reservation consumes a separate virtual page quota, so guard ranges and Go-style sysReserve arenas do not spend physical commit budget. Held MemoryObject caps charge for the backing they keep reachable, and each live borrowed MemoryObject mapping reserves frame-grant pages until it is unmapped. This prevents a process from mapping an object, releasing the cap to drop the cap-slot charge, and keeping the backing pinned without quota. The address space records borrowed pages separately from sparse anonymous reservations so teardown and unmap can distinguish anonymous pages from object-backed pages. Future file/network/DMA resources should reuse that authority ledger instead of adding one-off counters per cap.

Invariants

  • Frame addresses are 4 KiB aligned.
  • The frame bitmap’s own frames are never returned as free frames.
  • Upper-half kernel mappings are not user-accessible.
  • Kernel text is RX, rodata is read-only NX, and data/bss are RW NX.
  • User address spaces own only lower-half page-table frames.
  • Process frame-grant usage covers committed anonymous VM pages, held MemoryObject caps, and live borrowed MemoryObject mappings.
  • Process virtual-reservation usage covers reserved anonymous VM pages whether or not they are committed.
  • Committed VM_PROT_NONE pages retain their frames and data while exposing no present user PTE; reserved uncommitted pages consume no frame-grant quota.
  • Object-backed user mappings are tracked as borrowed pages and hold the MemoryObject backing alive until unmapped or address-space teardown.
  • MemoryObject unmap/protect only succeeds for borrowed pages backed by the same object.
  • VirtualMemory caps are bound to one address space and are not valid cross-process service exports.
  • CapSet is read-only/no-execute; ring is writable/no-execute.
  • VirtualMemory cannot reserve, map, commit, decommit, unmap, or protect the ring or CapSet pages.
  • VirtualMemory commit/decommit/protect/unmap only succeeds for ranges covered by anonymous reservations owned by the cap’s address space.
  • Capability-ring CALL/RECV/RETURN buffers, transfer descriptors, process and thread wait completions, and private ParkSpace word reads must validate and copy/read while holding the target process AddressSpace lock.

Code Map

  • capos-lib/src/frame_bitmap.rs - host-testable physical frame bitmap core.
  • capos-lib/src/cap_table.rs - capability holds and per-process ResourceLedger frame-grant accounting.
  • capos-lib/src/frame_ledger.rs - bounded frame-grant helper retained for host tests.
  • kernel/src/mem/frame.rs - Limine memory-map integration and global frame allocator wrapper.
  • kernel/src/mem/heap.rs - kernel heap setup.
  • kernel/src/mem/paging.rs - kernel remap, AddressSpace, page mapping, VM-cap page tracking, user copy helpers.
  • kernel/src/mem/validate.rs - user-address bounds and legacy current-CR3 validation helper.
  • kernel/src/cap/frame_alloc.rs - FrameAllocator capability and cleanup.
  • demos/memoryobject-shared-parent/ and demos/memoryobject-shared-child/ - QEMU shared MemoryObject smoke.
  • tools/qemu-memoryobject-shared-smoke.sh - transcript checks for the shared MemoryObject smoke.
  • kernel/src/cap/virtual_memory.rs - VirtualMemory capability.
  • kernel/src/spawn.rs - ELF, stack, and TLS user mappings.
  • kernel/src/arch/x86_64/smap.rs - SMEP/SMAP setup and legacy direct user access guard.

Validation

  • cargo test-lib covers frame bitmap, frame ledger, ELF parser, and cap-table pure logic.
  • cargo miri-lib runs host-testable capos-lib tests under Miri when installed.
  • make kani-lib proves the bounded mandatory frame-bitmap, stale-handle, cap-slot/frame-grant accounting, and transfer preflight fail-closed invariants when Kani is installed.
  • make run-smoke validates ELF mapping, process teardown, TLS, and clean shell-led halt.
  • make run-spawn validates MemoryObject-backed FrameAllocator cleanup, VirtualMemory reserve/commit/decommit/VM_PROT_NONE/quota/release smoke, and runtime spawn checks.
  • make run-memoryobject-shared validates a parent allocating and mapping a MemoryObject, transferring it to a child, observing a child write through the same backing pages, unmapping both sides, and halting cleanly.
  • make run-ipc-zerocopy validates the multi-message shared point-to-point buffer pattern at the substrate level: a producer transfers one MemoryObject to the consumer and then exchanges four record payloads through the shared mapping while endpoint CALLs carry only sequence numbers and checksums. This is a substrate proof, not the production data-plane shape: typed SharedBuffer with explicit producer/consumer ring metadata, notification primitives, and consuming service APIs (File.readBuf, BlockDevice.readBlocks, NIC RX/TX rings) are tracked under Open Work.
  • make run-spawn validates ELF load failure rollback and frame exhaustion handling through ProcessSpawner.

Open Work

  • Extend frame-grant accounting only if future DMA pinning or service-owned shared-buffer pools need authority beyond held MemoryObject caps and live borrowed mappings.
  • Define page-pinning or mapping-identity rules for future shared WaitSet, DMA, and service-owned shared-buffer paths that must keep physical backing stable beyond a single locked copy/read. The owning planning track is Memory Authority Model.
  • Add file, block, network, and DMA service APIs that use MemoryObject-backed SharedBuffer caps for zero-copy data paths.
  • Add DMA isolation and device memory capability boundaries before userspace drivers.
  • Add huge-page handling only with explicit ownership and teardown rules.