# LLVM Target Customization for capOS

Deep research report on creating custom LLVM/Rust/Go targets for a
capability-based OS.

Status 2026-04-30 00:41 UTC: capOS keeps the kernel on
`x86_64-unknown-none`, while userspace builds through the checked-in
`x86_64-unknown-capos` target plus the runtime linker-script path. Since this
report was first written, PT_TLS parsing, userspace TLS block setup, FS-base
save/restore, the `VirtualMemory` capability, a `#[thread_local]` QEMU smoke,
Timer `now`/`sleep`, current-execution-context `ThreadControl` FS-base updates,
the single-thread runtime checkpoint, process-local thread lifecycle, and
private `ParkSpace` wait/wake have landed. Anonymous `VirtualMemory`
unmap/decommit and explicit `MemoryObject.unmap` now drain private park waiters
before address reuse. Runtime park clients, Go `futexsleep`/`futexwakeup` glue,
per-thread TLS ownership for full multi-thread runtime use, shared park words,
address-space generation cleanup, and a Go port remain future work.

## Table of Contents

1. [Custom OS Target Triple](#1-custom-os-target-triple)
2. [Calling Conventions](#2-calling-conventions)
3. [Relocations](#3-relocations)
4. [TLS (Thread-Local Storage) Models](#4-tls-thread-local-storage-models)
5. [Rust Target Specification](#5-rust-target-specification)
6. [Go Runtime Requirements](#6-go-runtime-requirements)
7. [Relevance to capOS](#7-relevance-to-capos)

---

## 1. Custom OS Target Triple

### Target Triple Format

LLVM target triples follow the format `<arch>-<vendor>-<os>` or
`<arch>-<vendor>-<os>-<env>`:

- **arch**: `x86_64`, `aarch64`, `riscv64` (Rust target names spell this
  `riscv64gc`), etc.
- **vendor**: `unknown`, `apple`, `pc`, etc. (often `unknown` for custom OSes)
- **os**: `linux`, `none`, `redox`, `hermit`, `fuchsia`, etc.
- **env** (optional): `gnu`, `musl`, `eabi`, etc.
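
The field layout above can be illustrated with a small parser. This is a minimal sketch, not how LLVM or rustc parse triples: `Triple` and `parse_triple` are hypothetical names, and the sketch only accepts the three- and four-field forms listed here, while real triple parsing also normalizes shorter and irregular spellings.

```rust
/// Components of a target triple such as `x86_64-unknown-capos`.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq, Eq)]
pub struct Triple<'a> {
    pub arch: &'a str,
    pub vendor: &'a str,
    pub os: &'a str,
    pub env: Option<&'a str>,
}

/// Splits `<arch>-<vendor>-<os>[-<env>]` into its parts.
/// Returns `None` unless the input has exactly three or four fields.
pub fn parse_triple(s: &str) -> Option<Triple<'_>> {
    let parts: Vec<&str> = s.split('-').collect();
    match parts.as_slice() {
        [arch, vendor, os] => Some(Triple {
            arch: *arch,
            vendor: *vendor,
            os: *os,
            env: None,
        }),
        [arch, vendor, os, env] => Some(Triple {
            arch: *arch,
            vendor: *vendor,
            os: *os,
            env: Some(*env),
        }),
        _ => None,
    }
}
```

For example, `parse_triple("x86_64-unknown-capos")` yields `os == "capos"` with no `env`, while `x86_64-unknown-none-elf` parses with `env == Some("elf")`.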

For capOS, the userspace target triple is `x86_64-unknown-capos`, provided by
a checked-in custom target JSON. The kernel should keep using a freestanding
target (`x86_64-unknown-none`) unless a kernel-specific target file becomes
useful for build hygiene.

### What LLVM Needs

LLVM's target description consists of:

1. **Target machine**: Architecture (instruction set, register file, calling
   conventions). x86_64 already exists in LLVM.
2. **Object format**: ELF, COFF, Mach-O. capOS uses ELF.
3. **Relocation model**: static, PIC, PIE, dynamic-no-pic.
4. **Code model**: small, kernel, medium, large.
5. **OS-specific ABI details**: Stack alignment, calling convention defaults,
   TLS model, exception handling mechanism.

LLVM does NOT need kernel-level knowledge of your OS. It needs to know how
to generate correct object code for the target environment. The OS name in
the triple primarily affects:

- Default calling convention selection
- Default relocation model
- TLS model selection
- Object file format and flags
- C library assumptions (relevant for C compilation, less for Rust no_std)

### Creating a New OS in LLVM (Upstream Path)

To add `capos` as a recognized OS in LLVM itself:

1. Add the OS to `llvm/include/llvm/TargetParser/Triple.h` (the `OSType` enum)
2. Add string parsing in `llvm/lib/TargetParser/Triple.cpp`
3. Define ABI defaults in the relevant target (`llvm/lib/Target/X86/`)
4. Update Clang's driver for the new OS
   (`clang/lib/Driver/ToolChains/`, `clang/lib/Basic/Targets/`)

This is significant upstream work and not necessary initially. The pragmatic
path is using Rust's custom target JSON mechanism (see Section 5).

### What Other OSes Do

| OS | LLVM status | Approach |
|---|---|---|
| **Redox** | Upstream in Rust; no dedicated LLVM OS enum in current LLVM | Full triple `x86_64-unknown-redox`, Tier 2 in Rust |
| **Hermit** | Upstream in LLVM and Rust | `x86_64-unknown-hermit`, Tier 3, unikernel |
| **Fuchsia** | Upstream in LLVM and Rust | `x86_64-unknown-fuchsia`, Tier 2 |
| **Theseus** | Custom target JSON | Uses `x86_64-unknown-theseus` JSON spec, not upstream |
| **Blog OS (phil-opp)** | Custom target JSON | Uses JSON target spec, targets `x86_64-unknown-none` base |
| **seL4/Robigalia** | Custom target JSON | Modified from `x86_64-unknown-none` |

**Recommendation for capOS**: keep the kernel on `x86_64-unknown-none` and keep
userspace on the checked-in custom target JSON, which provides
`cfg(target_os = "capos")` without any LLVM changes. Do not upstream a `capos`
OS triple until the userspace ABI is stable.

Treat the userspace target as build hygiene and runtime scaffolding for now. It
does not promise a stable language ABI, Rust `std`, Go, C runtime, or upstream
target contract beyond the current static `no_std` userspace model.

---

## 2. Calling Conventions

### LLVM Calling Conventions

LLVM supports numerous calling conventions. The ones relevant to capOS:

| CC | LLVM ID | Description | Relevance |
|---|---|---|---|
| **C** | 0 | Default C calling convention (System V AMD64 ABI on x86_64) | Primary for interop |
| **Fast** | 8 | Optimized for internal use, passes in registers | Rust internal use |
| **Cold** | 9 | Rarely-called functions, callee-save heavy | Error paths |
| **GHC** | 10 | Glasgow Haskell Compiler, everything in registers | Not relevant |
| **HiPE** | 11 | Erlang HiPE, similar to GHC | Not relevant |
| **WebKit JS** | 12 | JavaScript JIT | Not relevant |
| **AnyReg** | 13 | Dynamic register allocation | JIT compilers |
| **PreserveMost** | 14 | Caller saves almost nothing | Interrupt handlers |
| **PreserveAll** | 15 | Caller saves nothing | Context switches |
| **Swift** | 16 | Swift self/error registers | Not relevant |
| **CXX_FAST_TLS** | 17 | C++ TLS access optimization | TLS wrappers |
| **X86_StdCall** | 64 | Windows stdcall | Not relevant |
| **X86_FastCall** | 65 | Windows fastcall | Not relevant |
| **X86_RegCall** | 92 | Register-based calling | Performance-critical code |
| **X86_INTR** | 83 | x86 interrupt handler | IDT handlers |
| **Win64** | 79 | Windows x64 calling convention | Not relevant |

### System V AMD64 ABI (The Default for capOS)

On x86_64, the System V AMD64 ABI (CC 0, "C") is the standard:

- **Integer args**: RDI, RSI, RDX, RCX, R8, R9
- **Float args**: XMM0-XMM7
- **Return**: RAX (integer), XMM0 (float)
- **Caller-saved**: RAX, RCX, RDX, RSI, RDI, R8-R11, XMM0-XMM15
- **Callee-saved**: RBX, RBP, R12-R15
- **Stack alignment**: 16-byte at call site
- **Red zone**: 128 bytes below RSP (unavailable in kernel mode)

capOS already uses this convention -- the syscall handler in
`kernel/src/arch/x86_64/syscall.rs` maps syscall registers to System V
registers before calling `syscall_handler`.

### Customizing for a New OS Target

For a custom OS, calling convention customization is usually minimal:

1. **Kernel code**: Disable the red zone (capOS already does this via
   `x86_64-unknown-none` which sets `"disable-redzone": true`). The red
   zone is unsafe in interrupt/syscall contexts.

2. **Userspace code**: Standard System V ABI is fine. The red zone is safe
   in userspace.

3. **Syscall convention**: This is an OS design choice, not an LLVM CC.
   capOS uses: RAX=syscall number, RDI-R9=args (matching System V for
   easy dispatch). Linux uses a slightly different register mapping
   (R10 instead of RCX for arg4, because the SYSCALL instruction overwrites
   RCX with the return address and R11 with RFLAGS).

4. **Interrupt handlers**: Use `X86_INTR` (CC 83), which Rust exposes as the
   unstable `extern "x86-interrupt"` ABI, or manual save/restore. capOS
   currently uses manual asm stubs.

### Cross-Language Interop Implications

| Languages | Convention | Notes |
|---|---|---|
| Rust <-> Rust | Rust ABI (unstable) | Valid only within one compiler version and build; not a stable interface between separately built artifacts |
| Rust <-> C | `extern "C"` (System V) | Stable, well-defined. Used for `libcapos` API |
| Rust <-> Go | Complex (see Section 6) | Go has its own internal ABI (ABIInternal) |
| C <-> Go | `extern "C"` via cgo | Go's cgo bridge, heavy overhead |
| Any <-> Kernel | Syscall convention | Register-based, OS-defined, not a CC |

**Key point**: The System V AMD64 ABI is the lingua franca. All languages
can produce `extern "C"` functions. capOS should standardize on System V
for all cross-language boundaries and capability invocations.
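
As an illustration of an `extern "C"` boundary, here is a minimal sketch. `CapResult` and `capos_demo_invoke` are hypothetical names invented for this example and are not part of `libcapos`. A function exported to other languages would additionally need an unmangled symbol name.

```rust
/// Hypothetical result record. `#[repr(C)]` pins the field order and padding
/// so that C and other languages see the same layout.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct CapResult {
    pub status: i32,
    pub value: u64,
}

/// Hypothetical boundary function. `extern "C"` selects the platform C ABI,
/// which is System V AMD64 on x86_64 ELF targets: `cap_id` arrives in RDI
/// and `arg` in RSI.
pub extern "C" fn capos_demo_invoke(cap_id: u32, arg: u64) -> CapResult {
    if cap_id == 0 {
        // Treat slot 0 as invalid in this sketch.
        CapResult { status: -1, value: 0 }
    } else {
        CapResult { status: 0, value: arg.wrapping_add(1) }
    }
}
```

Rust code can still call such a function directly, so the same symbol serves both in-language callers and foreign callers.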

Go's internal ABI (ABIInternal, using R14 as the `g` register) is different
from System V. Go functions called from outside Go must go through a
trampoline. This is handled by the Go runtime, not something capOS needs
to solve at the LLVM level.

---

## 3. Relocations

### LLVM Relocation Models

| Model | Flag | Description |
|---|---|---|
| **static** | `-relocation-model=static` | All addresses resolved at link time. No GOT/PLT. |
| **pic** | `-relocation-model=pic` | Position-independent code. Uses GOT for globals, PLT for calls. |
| **dynamic-no-pic** | `-relocation-model=dynamic-no-pic` | Like static but with dynamic linking support (macOS legacy). |
| **ropi** | `-relocation-model=ropi` | Read-only position-independent (ARM embedded). |
| **rwpi** | `-relocation-model=rwpi` | Read-write position-independent (ARM embedded). |
| **ropi-rwpi** | `-relocation-model=ropi-rwpi` | Both ROPI and RWPI (ARM embedded). |

### Code Models (x86_64)

| Model | Flag | Address Range | Use Case |
|---|---|---|---|
| **small** | `-code-model=small` | 0 to 2GB | Userspace default |
| **kernel** | `-code-model=kernel` | Top 2GB (negative 32-bit) | Higher-half kernel |
| **medium** | `-code-model=medium` | Code in low 2GB, data anywhere | Large data sets |
| **large** | `-code-model=large` | No assumptions | Maximum flexibility, worst performance |

### What capOS Currently Uses

From `.cargo/config.toml`:
```toml
[target.x86_64-unknown-none]
rustflags = ["-C", "link-arg=-Tkernel/linker-x86_64.ld", "-C", "code-model=kernel", "-C", "relocation-model=static"]
```

- **Kernel**: `code-model=kernel` + `relocation-model=static`. Correct for
  a higher-half kernel at `0xffffffff80000000`. All kernel symbols are in the
  top 2GB of virtual address space, so 32-bit sign-extended addressing works.

- **Init/demos/capos-rt/shell/libcapos/libcapos-posix/capos-wasm userspace**:
  All standalone userspace crates build against
  `targets/x86_64-unknown-capos.json` (checked in at that path) via the
  `build-*-capos` Cargo aliases in `.cargo/config.toml`. The target sets
  `code-model = "small"`, `relocation-model = "static"`, `os = "capos"`,
  `has-thread-local = true`, and `tls-model = "local-exec"`. The pinned
  nightly toolchain is `nightly-2026-04-20`; verify the effective LLVM version
  with `rustc --version --verbose` against that toolchain date.
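
The address ranges behind the two code models can be checked with a few lines of arithmetic. This is a sketch of the address-range view only (real code-model limits also cover symbol-plus-offset reach); the function names are hypothetical, and the two constants are the kernel base and userspace load address quoted in this report.

```rust
/// Higher-half kernel base used by the kernel linker script.
pub const KERNEL_BASE: u64 = 0xffff_ffff_8000_0000;
/// Fixed load address of current static userspace binaries.
pub const USER_LOAD_ADDR: u64 = 0x20_0000;

/// True if `addr` survives truncation to 32 bits followed by sign extension,
/// i.e. it can be encoded as a sign-extended 32-bit immediate.
fn sign_extends_from_32(addr: u64) -> bool {
    (addr as i32) as i64 as u64 == addr
}

/// Kernel code model: addresses in the top 2GB (negative 32-bit values).
pub fn fits_kernel_model(addr: u64) -> bool {
    sign_extends_from_32(addr) && (addr as i64) < 0
}

/// Small code model: addresses in the low 2GB.
pub fn fits_small_model(addr: u64) -> bool {
    sign_extends_from_32(addr) && (addr as i64) >= 0
}
```

`KERNEL_BASE` passes the kernel check and `USER_LOAD_ADDR` passes the small check, while a canonical higher-half address such as `0xffff_8000_0000_0000` fits neither model.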

### Kernel vs. Userspace Requirements

**Kernel:**
- Static relocations, kernel code model.
- No PIC overhead needed -- the kernel is loaded at a known address.
- The linker script places everything in the higher half.
- This is the correct and standard approach (Linux kernel does the same).

**Userspace (current -- static binaries):**
- Static relocations with the small code model, both set explicitly by the
  checked-in `x86_64-unknown-capos` target.
- Simple, no runtime relocator needed.
- Binary is loaded at a fixed address (`0x200000`).
- Works well for single-binary-per-address-space.

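**Userspace (future -- if shared libraries or ASLR desired):**
- PIE (Position-Independent Executable) is PIC code linked as an executable;
  a statically linked PIE is a static-PIE.
- Requires a dynamic loader, a kernel-side relocator, or self-relocating
  startup code.
- Enables ASLR (Address Space Layout Randomization) for security.
- Adds GOT indirection overhead (typically < 5% performance impact); on x86_64
  most references stay RIP-relative, which keeps the cost small.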

### Position-Independent Code in a Capability Context

PIC/PIE is relevant to capOS for several reasons:

1. **ASLR**: PIE enables loading binaries at random addresses, making
   ROP attacks harder. Even in a capability system, defense-in-depth matters.

2. **Shared libraries**: If capOS ever supports shared objects (e.g., a
   shared `libcapos.so`), PIC is required for the shared library.

3. **WASI/Wasm**: Not relevant -- Wasm has its own memory model.

4. **Multiple instances**: With static linking, two instances of the same
   binary can share read-only pages (text, rodata) if loaded at the same
   address. PIC/PIE allows sharing even at different addresses (copy-on-write
   for the GOT).

**Recommendation for capOS**: Keep static relocation for now. Consider PIE
for userspace when implementing ASLR (after threading and IPC are stable).
The kernel should remain static forever.

---

## 4. TLS (Thread-Local Storage) Models

### LLVM TLS Models

LLVM supports four TLS models, in order from most dynamic to most
constrained:

| Model | Description | Runtime Requirement | Performance |
|---|---|---|---|
| **general-dynamic** | Any module, any time | Full `__tls_get_addr` via dynamic linker | Slowest (function call per access) |
| **local-dynamic** | Same module, any time | `__tls_get_addr` for module base, then offset | Slow (one call per module per thread) |
| **initial-exec** | Only modules loaded at startup | GOT slot populated by dynamic linker | Fast (one memory load) |
| **local-exec** | Main executable only | Direct FS/GS offset, known at link time | Fastest (single instruction) |

### How TLS Works on x86_64

On x86_64, TLS is accessed via the FS segment register:

1. The OS sets the FS base address for each thread. The kernel writes the
   `IA32_FS_BASE` MSR (`MSR_FS_BASE`); on Linux, userspace requests this via
   `arch_prctl(ARCH_SET_FS)`.
2. TLS variables are accessed as offsets from FS base:
   - `local-exec`: `mov %fs:OFFSET, %rax` (offset known at link time)
   - `initial-exec`: `mov x@gottpoff(%rip), %rax; mov %fs:(%rax), %rcx`
     (offset loaded from a GOT slot, then applied relative to FS)
   - `general-dynamic`: `lea x@tlsgd(%rip), %rdi; call __tls_get_addr@PLT`
     (returns a pointer to the variable)

### Which Model for capOS?

**Kernel:**
- The kernel does not use compiler TLS. Current TLS support is for loaded
  userspace ELF images only.
- For SMP: per-CPU data via GS segment register (the standard approach).
  Set `MSR_GS_BASE` on each CPU to point to a `PerCpu` struct.
  `swapgs` on kernel entry switches between user and kernel GS base.
- Kernel TLS model: Not applicable (per-CPU data is accessed via GS, not
  the compiler's TLS mechanism).

**Userspace (static binaries, no dynamic linker):**
- **local-exec** is the only correct choice. There's no dynamic linker to
  resolve TLS relocations, so general-dynamic and initial-exec won't work.
- Implemented for the current single-threaded process model: the ELF parser
  records `PT_TLS`, the loader maps a Variant II TLS block plus TCB self
  pointer, and the scheduler saves/restores FS base on context switch.
- Implemented for the current execution context: `ThreadControl.setFsBase`
  gives a runtime a capability-authorized equivalent to
  `arch_prctl(ARCH_SET_FS)`.
- `ThreadControl.setFsBase` affects only the current thread or execution
  context. There is no process-global FS-base mutation.
- Still missing for future threading and full Go: per-thread TLS state and
  independently settable FS bases for each user thread. Thread creation must
  allocate or receive a distinct TLS block and FS base per `ThreadRef`;
  treating TLS as process-global would break Rust `#[thread_local]`, Go `g`
  state, and any C runtime that assumes per-thread TLS.
- Current-thread FS-base operations are useful for the single-thread runtime
  checkpoint, but they are not the final threading ABI. True multi-threaded Go
  or C/POSIX-like runtime support also requires per-thread FS-base ownership
  and context switches that save/restore FS base as thread state.

**Userspace (with dynamic linker, future):**
- **initial-exec** for the main executable and preloaded libraries.
- **general-dynamic** for `dlopen()`-loaded libraries.
- Requires implementing `__tls_get_addr` in the dynamic linker.

### TLS Initialization Sequence

For a statically-linked userspace binary with local-exec TLS:

```
1. Kernel creates thread
2. Kernel allocates TLS block plus TCB (size and alignment from the PT_TLS program header)
3. Kernel copies .tdata (initialized TLS) into the start of the TLS block
4. Kernel zeros .tbss (uninitialized TLS) in the rest of the TLS block
5. Kernel writes the TCB self pointer just past the TLS block (Variant II)
6. Kernel sets FS base = TCB address (writes MSR_FS_BASE)
7. Thread starts executing; %fs:OFFSET (negative offsets) accesses TLS directly
```

The ELF file contains two TLS sections, both covered by the single `PT_TLS`
segment:
- `.tdata` (initialized thread-local data, stored in the file)
- `.tbss` (zero-initialized thread-local data, like `.bss` but per-thread)

The PT_TLS program header tells the loader:
- Virtual address and file offset of `.tdata`
- `p_memsz` = total TLS size (including `.tbss`)
- `p_filesz` = size of `.tdata` only
- `p_align` = required alignment
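
The following sketch shows how these header fields could drive a Variant II layout for one thread. It is not the capOS loader: the type and function names are hypothetical, the TCB is assumed to hold only the 8-byte self pointer, `p_vaddr` is assumed to be aligned to `p_align`, and the mapping address `base` is assumed to be aligned to the larger of `p_align` and 8.

```rust
/// Fields of an ELF64 `PT_TLS` program header that a loader needs.
#[derive(Clone, Copy, Debug)]
pub struct TlsSegment {
    pub filesz: usize, // p_filesz: size of .tdata
    pub memsz: usize,  // p_memsz: .tdata plus .tbss
    pub align: usize,  // p_align: power of two (0 or 1 means unconstrained)
}

/// Assumption for this sketch: the TCB holds only the self pointer.
pub const TCB_SIZE: usize = 8;

fn round_up(value: usize, align: usize) -> usize {
    (value + align - 1) & !(align - 1)
}

/// Distance from the TLS block start to the thread pointer. The static
/// linker bakes the same value into local-exec offsets.
pub fn tls_block_size(seg: &TlsSegment) -> usize {
    round_up(seg.memsz, seg.align.max(1))
}

/// Builds the bytes of one thread's TLS area and returns them together with
/// the FS base value for a mapping that starts at virtual address `base`.
pub fn build_tls_area(seg: &TlsSegment, tdata: &[u8], base: u64) -> (Vec<u8>, u64) {
    assert_eq!(tdata.len(), seg.filesz);
    let block = tls_block_size(seg);
    // Keep the thread pointer aligned for the 8-byte self pointer as well.
    let tp_offset = round_up(block, seg.align.max(TCB_SIZE));
    let block_start = tp_offset - block;
    // A zeroed buffer covers .tbss and any padding.
    let mut area = vec![0u8; tp_offset + TCB_SIZE];
    area[block_start..block_start + seg.filesz].copy_from_slice(tdata);
    // Variant II: the TCB sits just past the TLS block and its first word
    // points to itself, so `mov %fs:0, %rax` yields the thread pointer.
    let fs_base = base + tp_offset as u64;
    area[tp_offset..tp_offset + TCB_SIZE].copy_from_slice(&fs_base.to_le_bytes());
    (area, fs_base)
}

/// FS-relative offset of a variable `offset_in_segment` bytes into PT_TLS.
pub fn local_exec_offset(seg: &TlsSegment, offset_in_segment: usize) -> i64 {
    offset_in_segment as i64 - tls_block_size(seg) as i64
}
```

With `p_filesz = 4`, `p_memsz = 12`, and `p_align = 8`, the block is 16 bytes, the FS base is `base + 16`, and the first variable is reached at `%fs:-16`.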

### FS/GS Base Register Usage Plan

| Register | Used By | Purpose |
|---|---|---|
| **FS** | Userspace threads | Thread-local storage (set per-thread by kernel) |
| **GS** | Kernel (via swapgs) | Per-CPU data (set per-CPU during boot) |

This is the standard Linux convention and what Go expects (Go uses
`arch_prctl(ARCH_SET_FS)` to set the FS base for each OS thread).

### What capOS Has and Still Needs

1. **Implemented**: parse `PT_TLS` in `capos-lib/src/elf.rs`.
2. **Implemented**: allocate/map a TLS block during process image load in
   `kernel/src/spawn.rs`.
3. **Implemented**: copy `.tdata`, zero `.tbss`, and write the TCB self
   pointer for the current Variant II static TLS layout.
4. **Implemented**: save/restore FS base through `kernel/src/sched.rs` and
   `kernel/src/arch/x86_64/tls.rs`.
5. **Implemented for the current process execution context**:
   `ThreadControl.getFsBase` and `ThreadControl.setFsBase`.
6. **Still needed**: per-thread FS-base state for future multi-threaded
   userspace.
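
Per-thread FS-base state (item 6) amounts to treating the register as part of each thread's saved context. The model below is a host-side sketch with hypothetical types; it does not reflect the structures in `kernel/src/sched.rs`, and a real kernel would read and write the MSR instead of a struct field.

```rust
/// Hypothetical per-thread saved state.
#[derive(Debug, Default)]
pub struct ThreadState {
    pub fs_base: u64,
}

/// Stand-in for the CPU's FS base register (IA32_FS_BASE).
#[derive(Debug, Default)]
pub struct Cpu {
    pub fs_base: u64,
}

impl Cpu {
    /// Models a `setFsBase`-style call: only the running thread is affected,
    /// because the value is captured into that thread at the next switch.
    pub fn set_fs_base(&mut self, value: u64) {
        self.fs_base = value;
    }
}

/// Saves the outgoing thread's FS base and installs the incoming one.
pub fn context_switch(cpu: &mut Cpu, prev: &mut ThreadState, next: &ThreadState) {
    prev.fs_base = cpu.fs_base;
    cpu.fs_base = next.fs_base;
}
```

Because the value is saved per thread, one thread changing its FS base never leaks into another thread's view after a switch.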

---

## 5. Rust Target Specification

### How Custom Targets Work

Rust supports custom targets via JSON specification files. The workflow:

1. Create a `<target-name>.json` file
2. Pass it to rustc: `--target path/to/x86_64-unknown-capos.json`
3. Use with cargo via `-Zbuild-std` to build core/alloc/std from source

Target lookup order:
1. A `--target` value ending in `.json` is treated as a file path
2. Otherwise, built-in target names
3. Otherwise, `<target-name>.json` in the `RUST_TARGET_PATH` directories

The Rust target JSON schema is explicitly unstable. Generate examples from the
pinned compiler with `rustc -Z unstable-options --print target-spec-json` and
validate against that same compiler's `target-spec-json-schema` before checking
in a target file.

### Viewing Existing Specs

```bash
# Print the JSON spec for a built-in target:
rustc +nightly -Z unstable-options --target=x86_64-unknown-none --print target-spec-json

# Print the JSON schema for all available fields:
rustc +nightly -Z unstable-options --print target-spec-json-schema
```

### Example: x86_64-unknown-capos Kernel Target

A hypothetical kernel-specific target, based on the current
`x86_64-unknown-none` target with capOS-specific adjustments. The kernel does
not use it today, and `os` stays `"none"`. This is a sketch; regenerate from
the pinned rustc schema before using it.

```json
{
    "llvm-target": "x86_64-unknown-none-elf",
    "metadata": {
        "description": "capOS kernel (x86_64)",
        "tier": 3,
        "host_tools": false,
        "std": false
    },
    "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
    "arch": "x86_64",
    "cpu": "x86-64",
    "target-endian": "little",
    "target-pointer-width": 64,
    "target-c-int-width": 32,
    "os": "none",
    "env": "",
    "vendor": "unknown",
    "linker-flavor": "gnu-lld",
    "linker": "rust-lld",
    "pre-link-args": {
        "gnu-lld": ["-Tkernel/linker-x86_64.ld"]
    },
    "features": "-mmx,-sse,-sse2,-sse3,-ssse3,-sse4.1,-sse4.2,-avx,-avx2,+soft-float",
    "disable-redzone": true,
    "panic-strategy": "abort",
    "code-model": "kernel",
    "relocation-model": "static",
    "rustc-abi": "softfloat",
    "executables": true,
    "exe-suffix": "",
    "has-thread-local": false,
    "position-independent-executables": false,
    "static-position-independent-executables": false,
    "plt-by-default": false,
    "max-atomic-width": 64,
    "stack-probes": { "kind": "inline" }
}
```

### Example: x86_64-unknown-capos Userspace Target

```json
{
    "llvm-target": "x86_64-unknown-none-elf",
    "metadata": {
        "description": "capOS userspace (x86_64)",
        "tier": 3,
        "host_tools": false,
        "std": false
    },
    "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
    "arch": "x86_64",
    "cpu": "x86-64",
    "target-endian": "little",
    "target-pointer-width": 64,
    "target-c-int-width": 32,
    "os": "capos",
    "env": "",
    "vendor": "unknown",
    "linker-flavor": "gnu-lld",
    "linker": "rust-lld",
    "pre-link-args": {
        "gnu-lld": ["-Tinit/linker.ld"]
    },
    "features": "-mmx,-sse,-sse2,-sse3,-ssse3,-sse4.1,-sse4.2,-avx,-avx2,+soft-float",
    "disable-redzone": false,
    "panic-strategy": "abort",
    "code-model": "small",
    "relocation-model": "static",
    "rustc-abi": "softfloat",
    "executables": true,
    "exe-suffix": "",
    "has-thread-local": true,
    "position-independent-executables": false,
    "static-position-independent-executables": false,
    "max-atomic-width": 64,
    "plt-by-default": false,
    "stack-probes": { "kind": "inline" },
    "tls-model": "local-exec"
}
```

### Key JSON Fields

| Field | Purpose | Typical Values |
|---|---|---|
| `llvm-target` | LLVM triple for code generation | `x86_64-unknown-none-elf` (reuse existing backend) |
| `os` | OS name (affects `cfg(target_os = "...")`) | `"none"`, `"capos"`, `"linux"` |
| `arch` | Architecture name | `"x86_64"`, `"aarch64"` |
| `data-layout` | LLVM data layout string | Copy from same-arch target |
| `linker-flavor` | Which linker to use | `"gnu-lld"`, `"gcc"`, `"msvc"` |
| `linker` | Linker binary | `"rust-lld"`, `"ld.lld"` |
| `features` | CPU features to enable/disable | Disable SIMD/FPU until context switching saves that state |
| `disable-redzone` | Disable System V red zone | `true` for kernel, `false` for userspace |
| `code-model` | LLVM code model | `"kernel"`, `"small"` |
| `relocation-model` | LLVM relocation model | `"static"`, `"pic"` |
| `panic-strategy` | How to handle panics | `"abort"`, `"unwind"` |
| `has-thread-local` | Enable `#[thread_local]` | `true` for userspace now that PT_TLS/FS base works |
| `tls-model` | Default TLS model | `"local-exec"` for static binaries |
| `max-atomic-width` | Largest atomic type (bits) | `64` for x86_64 |
| `pre-link-args` | Arguments passed to linker before user args | Linker script path |
| `position-independent-executables` | Generate PIE by default | `false` for now |
| `exe-suffix` | Executable file extension | `""` for ELF |
| `stack-probes` | Stack overflow detection mechanism | `{"kind": "inline"}` in the current freestanding x86_64 spec |

The SIMD/FPU-disabled userspace target is a temporary runtime constraint, not a
long-term property of `x86_64-unknown-capos`. It is acceptable only while the
kernel lacks full FPU/SIMD context switching and language runtimes are confined
to the current static `no_std` subset. Before Go, C, or full Rust `std` support,
validate the target against each runtime's amd64 codegen assumptions; mainstream
amd64 runtimes may assume SSE2/FPU state even when application code does not
explicitly use vector types.

Do not let the custom userspace target accidentally ossify a weaker ABI solely
because early kernel context switching does not yet save full FPU/SIMD state.
The final language-runtime target must be selected after the kernel's amd64
context-switch state and the runtime's codegen assumptions are both reviewed.

### no_std vs std Support Path

**Current state**: capOS uses `no_std` + `alloc`. This works with any
target, including `x86_64-unknown-none`.

**Path to std support** (what Redox, Hermit, and Fuchsia did):

1. **Phase 1: Custom target with `os: "capos"`** (landed). Use
   `-Zbuild-std=core,alloc` to build core and alloc. No std.

2. **Phase 2: Add capOS to Rust's `std` library**. This requires:
   - Adding a capOS platform module under `library/std/src/sys/` (recent std
     trees keep the per-OS layer in `sys/pal/`) with OS-specific
     implementations of: filesystem, networking, threads, time, stdio,
     process spawning, etc.
   - Each of these maps to capOS capabilities
   - Use `cfg(target_os = "capos")` throughout std
   - Build with `-Zbuild-std=std`

3. **Phase 3: Upstream the target** (optional). Submit the target spec and
   std implementations to the Rust project. Requires sustained maintenance.

**What Redox did**: Redox implemented a full POSIX-like userspace (`relibc`)
and added std support by implementing the `sys` module in terms of relibc
syscalls. This made Redox a Tier 2 target with pre-built std artifacts.

**What Hermit did**: Hermit is a unikernel, so std is implemented directly
in terms of Hermit's kernel-level APIs. Tier 3, community maintained.

**What Fuchsia did**: Fuchsia implemented std using Fuchsia's native
`zircon` syscalls (handles, channels, VMOs -- similar in spirit to
capabilities). Tier 2.

**Recommendation for capOS**: Stay on `no_std` + `alloc` with the custom
target JSON. std support is a large effort that should wait until the
syscall surface is stable and threading works. When the time comes, Fuchsia's
approach (std over native capability syscalls) is the best model, since
Fuchsia's handle-based API is conceptually close to capOS's capabilities.

### Other OS Projects Reference

| OS | Target | Tier | std | Approach |
|---|---|---|---|---|
| **Redox** | `x86_64-unknown-redox` | 2 | Yes | relibc (custom libc) over Redox syscalls |
| **Hermit** | `x86_64-unknown-hermit` | 3 | Yes | std directly over kernel API |
| **Fuchsia** | `x86_64-unknown-fuchsia` | 2 | Yes | std over zircon handles (capability-like) |
| **Theseus** | `x86_64-unknown-theseus` | N/A | No | Custom JSON, no_std, research OS |
| **Blog OS** | Custom JSON | N/A | No | Based on x86_64-unknown-none |
| **MOROS** | Custom JSON | N/A | No | Simple hobby OS |

---

## 6. Go Runtime Requirements

### Go's Runtime Architecture

Go's runtime is essentially a userspace operating system. It manages
goroutine scheduling, garbage collection, memory allocation, and I/O
multiplexing. The runtime interfaces with the actual OS through a narrow
set of functions that each GOOS must implement.

### Minimum OS Interface for a Go Port

Based on analysis of `runtime/os_linux.go`, `runtime/os_plan9.go`, and
`runtime/os_js.go`, here is the minimum interface:

#### Tier 1: Absolute Minimum (single-threaded, like GOOS=js)

These functions are needed for "Hello, World!":

```go
func osinit()                                    // OS initialization
func write1(fd uintptr, p unsafe.Pointer, n int32) int32  // stdout/stderr output
func exit(code int32)                            // process termination
func usleep(usec uint32)                         // sleep (can be no-op initially)
func readRandom(r []byte) int                    // random data (for maps, etc.)
func goenvs()                                    // environment variables
func mpreinit(mp *m)                             // pre-init new M on parent thread
func minit()                                     // init new M on its own thread
func unminit()                                   // undo minit
func mdestroy(mp *m)                             // destroy M resources
```

Plus memory management (in `runtime/mem_*.go`):
```go
func sysAllocOS(n uintptr) unsafe.Pointer        // allocate memory (mmap)
func sysFreeOS(v unsafe.Pointer, n uintptr)       // free memory (munmap)
func sysReserveOS(v unsafe.Pointer, n uintptr) unsafe.Pointer  // reserve VA range
func sysMapOS(v unsafe.Pointer, n uintptr)        // commit reserved pages
func sysUsedOS(v unsafe.Pointer, n uintptr)       // mark as used
func sysUnusedOS(v unsafe.Pointer, n uintptr)     // mark as unused (madvise)
func sysFaultOS(v unsafe.Pointer, n uintptr)      // remove access
func sysHugePageOS(v unsafe.Pointer, n uintptr)   // hint: use huge pages
```
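
These hooks move address ranges between a small set of states. The sketch below models that state machine in Rust (the language capOS itself uses), following the state model described in the comments of Go's `runtime/mem.go`. The transitions should be re-checked against the Go version being ported, and `MemState`, `MemOp`, and `transition` are hypothetical names. Go calls the first state "None"; it is renamed here to avoid clashing with `Option::None`.

```rust
/// States of a memory region as seen by the Go runtime.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MemState {
    Unreserved, // not owned by the runtime
    Reserved,   // owned, but any access faults
    Prepared,   // reserved and cheap to make Ready; contents undefined
    Ready,      // safe to access
}

/// The hooks that change region state (`sysHugePageOS` is only a hint).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MemOp {
    Alloc,
    Free,
    Reserve,
    Map,
    Used,
    Unused,
    Fault,
}

/// Returns the next state, or `None` if the hook is not valid in `state`.
pub fn transition(state: MemState, op: MemOp) -> Option<MemState> {
    match (state, op) {
        (MemState::Unreserved, MemOp::Alloc) => Some(MemState::Ready),
        (MemState::Unreserved, MemOp::Reserve) => Some(MemState::Reserved),
        (MemState::Reserved, MemOp::Map) => Some(MemState::Prepared),
        (MemState::Prepared, MemOp::Used) => Some(MemState::Ready),
        (MemState::Ready, MemOp::Unused) => Some(MemState::Prepared),
        (MemState::Ready, MemOp::Fault) => Some(MemState::Reserved),
        // Freeing returns the region to the OS from any state.
        (_, MemOp::Free) => Some(MemState::Unreserved),
        _ => None,
    }
}
```

A capability-backed port would need an operation behind each valid edge, which is one way to check whether the existing `VirtualMemory` surface is sufficient.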

#### Tier 2: Multi-threaded (real goroutines)

```go
func newosproc(mp *m)                            // create OS thread (clone)
func exitThread(wait *atomic.Uint32)             // exit current thread
func futexsleep(addr *uint32, val uint32, ns int64)  // futex wait
func futexwakeup(addr *uint32, cnt uint32)        // futex wake
func settls()                                     // set FS base for TLS
func nanotime1() int64                            // monotonic nanosecond clock
func walltime() (sec int64, nsec int32)           // wall clock time
func osyield()                                    // sched_yield
```
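
The contract behind `futexsleep`/`futexwakeup` is "sleep only if the word still holds the expected value; spurious wakeups are allowed". The sketch below models that contract on a host using Rust's standard library. It is not the capOS `ParkSpace` interface: `ParkWord`, `WaitOutcome`, and the method names are hypothetical.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::{Condvar, Mutex};
use std::time::Duration;

#[derive(Debug, PartialEq, Eq)]
pub enum WaitOutcome {
    ValueMismatch, // word changed before sleeping; caller should re-check
    Woken,         // woken by `wake` (or spuriously, which callers tolerate)
    TimedOut,
}

/// A 32-bit word that threads can sleep on.
pub struct ParkWord {
    word: AtomicU32,
    lock: Mutex<()>,
    waiters: Condvar,
}

impl ParkWord {
    pub fn new(value: u32) -> Self {
        ParkWord {
            word: AtomicU32::new(value),
            lock: Mutex::new(()),
            waiters: Condvar::new(),
        }
    }

    /// Updates the word under the lock so a sleeper cannot miss the change.
    pub fn store(&self, value: u32) {
        let _guard = self.lock.lock().unwrap();
        self.word.store(value, Ordering::SeqCst);
    }

    /// futexsleep-like: atomically "if word == expected, sleep".
    pub fn sleep_if_equal(&self, expected: u32, timeout: Duration) -> WaitOutcome {
        let guard = self.lock.lock().unwrap();
        if self.word.load(Ordering::SeqCst) != expected {
            return WaitOutcome::ValueMismatch;
        }
        let (_guard, result) = self.waiters.wait_timeout(guard, timeout).unwrap();
        if result.timed_out() {
            WaitOutcome::TimedOut
        } else {
            WaitOutcome::Woken
        }
    }

    /// futexwakeup-like: wakes one waiter when `count == 1`, otherwise all.
    pub fn wake(&self, count: u32) {
        let _guard = self.lock.lock().unwrap();
        if count == 1 {
            self.waiters.notify_one();
        } else {
            self.waiters.notify_all();
        }
    }
}
```

The compare-then-sleep step must be atomic with respect to `store` and `wake`; that is the property a kernel-side park primitive has to provide in place of the mutex used here.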

#### Tier 3: Full Runtime (signals, profiling, network poller)

```go
func sigaction(sig uint32, new *sigactiont, old *sigactiont)
func signalM(mp *m, sig int)                      // send signal to thread
func setitimer(mode int32, new *itimerval, old *itimerval)
func netpollopen(fd uintptr, pd *pollDesc) uintptr
func netpoll(delta int64) (gList, int32)
func netpollBreak()
```

### Linux Syscalls Used by Go Runtime (Core List)

From `runtime/sys_linux_amd64.s`. The table covers the core wrappers and omits
some entries, such as `rt_sigreturn`:

| Syscall | # | Go Wrapper | capOS Equivalent |
|---|---|---|---|
| `read` | 0 | `runtime.read` | Store cap |
| `write` | 1 | `runtime.write1` | Console cap |
| `close` | 3 | `runtime.closefd` | Cap drop |
| `mmap` | 9 | `runtime.sysMmap` | VirtualMemory cap |
| `munmap` | 11 | `runtime.sysMunmap` | VirtualMemory.unmap |
| `brk` | 12 | `runtime.sbrk0` | VirtualMemory cap |
| `rt_sigaction` | 13 | `runtime.rt_sigaction` | Signal cap (future) |
| `rt_sigprocmask` | 14 | `runtime.rtsigprocmask` | Signal cap (future) |
| `sched_yield` | 24 | `runtime.osyield` | sys_yield |
| `mincore` | 27 | `runtime.mincore` | VirtualMemory.query |
| `madvise` | 28 | `runtime.madvise` | Future VirtualMemory decommit/query semantics, or unmap/remap policy |
| `nanosleep` | 35 | `runtime.usleep` | Timer cap |
| `setitimer` | 38 | `runtime.setitimer` | Timer cap |
| `getpid` | 39 | `runtime.getpid` | Process info |
| `clone` | 56 | `runtime.clone` | Thread cap |
| `exit` | 60 | `runtime.exitThread` | sys_exit |
| `sigaltstack` | 131 | `runtime.sigaltstack` | Not needed initially |
| `arch_prctl` | 158 | `runtime.settls` | ThreadControl.setFsBase |
| `gettid` | 186 | `runtime.gettid` | Thread info |
| `futex` | 202 | `runtime.futex` | ParkSpace compact `CAP_OP_PARK` / `CAP_OP_UNPARK` |
| `sched_getaffinity` | 204 | `runtime.sched_getaffinity` | CPU info |
| `timer_create` | 222 | `runtime.timer_create` | Timer cap |
| `timer_settime` | 223 | `runtime.timer_settime` | Timer cap |
| `timer_delete` | 226 | `runtime.timer_delete` | Timer cap |
| `clock_gettime` | 228 | `runtime.nanotime1` | Timer cap |
| `exit_group` | 231 | `runtime.exit` | sys_exit |
| `tgkill` | 234 | `runtime.tgkill` | Thread signal (future) |
| `openat` | 257 | `runtime.open` | Namespace cap |
| `pipe2` | 293 | `runtime.pipe2` | IPC cap |

### Go's TLS Model

Go uses `arch_prctl(ARCH_SET_FS, addr)` to set the FS segment base for
each OS thread. The convention:

- FS base points 8 bytes past the start of the thread's `m.tls` array
- Goroutine pointer `g` is stored at `-8(FS)` (ELF TLS convention), which is
  `m.tls[0]`
- In Go's ABIInternal, R14 is cached as the `g` register for performance
- On signal entry or thread start, `g` is loaded from TLS into R14

Go does NOT use the compiler's TLS mechanisms (no `__thread` or
`thread_local!`). It manages TLS entirely in its own runtime via the FS
register.

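For capOS, this means the kernel needs:
1. An `arch_prctl(ARCH_SET_FS)` equivalent capability method (available as
   `ThreadControl.setFsBase` for the current execution context)
2. FS base save/restore on context switch (implemented)
3. An independently settable FS base for each thread (still needed for
   multi-threaded userspace)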

### Adding GOOS=capos to Go

Files that need to be created/modified in a Go fork:

```
src/runtime/
    os_capos.go           // osinit, newosproc, futexsleep, etc.
    os_capos_amd64.go     // arch-specific OS functions
    sys_capos_amd64.s     // syscall wrappers in assembly
    mem_capos.go          // sysAlloc/sysFree/etc. over VirtualMemory cap
    signal_capos.go       // signal stubs (no real signals initially)
    stubs_capos.go        // misc stubs
    netpoll_capos.go      // network poller (stub initially)
    defs_capos.go         // OS-level constants
    vdso_capos.go         // VDSO stubs (no VDSO)

src/syscall/
    syscall_capos.go      // Go's syscall package
    zsyscall_capos_amd64.go

src/internal/platform/
    (modifications to supported.go, zosarch.go)

src/cmd/dist/
    (modifications to add capOS to known OS list)
```

Estimated: ~2000-3000 lines for Phase 1 (single-threaded).

### Feasibility Assessment

| Feature | Difficulty | Blocked On |
|---|---|---|
| Hello World (write + exit) | Easy | Console capability plus `exit` syscall |
| Memory allocator (mmap) | Medium | VirtualMemory capability exists; Go glue and any missing query/decommit semantics remain |
| Single-threaded goroutines (M=1) | Medium | VirtualMemory and Timer capabilities exist; Go runtime glue remains |
| Multi-threaded (real threads) | Hard | capos-rt thread/park clients, Go `newosproc` and `futexsleep`/`futexwakeup` glue, per-ThreadRef TLS ownership, GC/runtime coordination |
| Network poller | Hard | Async cap invocation, networking stack |
| Signal-based preemption | Hard | Signal delivery mechanism |
| Full stdlib | Very Hard | POSIX layer or native cap wrappers |

---

## 7. Relevance to capOS

### Practical Scope of Work

#### Phase 1: Custom Target JSON (done)

**What**: A `targets/x86_64-unknown-capos.json` target spec is checked into
the repo. All userspace crates (init, demos, shell, capos-rt, libcapos,
libcapos-posix, capos-wasm) build against it via Cargo aliases in
`.cargo/config.toml`. The kernel stays on `x86_64-unknown-none`.

**Why**: Enables `cfg(target_os = "capos")`, sets `code-model = "small"` and
`tls-model = "local-exec"` explicitly, and removes the dependency on
per-crate rustflag overrides.

**Recurring maintenance**: Rust target JSON fields are not stable; validate
the checked-in file against `rustc -Z unstable-options --print
target-spec-json-schema` when upgrading the pinned nightly.
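
As a minimal illustration of what `cfg(target_os = "capos")` enables, the
sketch below selects a value at compile time. The function name is
hypothetical, and the `allow` attribute is there on the assumption that a stock
toolchain may not list `capos` among its known `target_os` values.

```rust
/// Name of the platform this crate was compiled for.
/// `unexpected_cfgs` is allowed because the toolchain may not know `capos`.
#[allow(unexpected_cfgs)]
pub fn platform_name() -> &'static str {
    if cfg!(target_os = "capos") {
        "capos"
    } else {
        "host"
    }
}
```

The same predicate works as an item attribute (`#[cfg(target_os = "capos")]`),
which is the form a crate would use to swap in a capOS-specific module.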

#### Phase 2: TLS Support (mostly landed, required for Go)

**What**: Parse PT_TLS from ELF, allocate per-thread TLS blocks, set FS base
on context switch, add `arch_prctl`-equivalent syscall.

**Why**: Required for Go runtime (Go's `settls()` sets FS base), for Rust
`#[thread_local]` in userspace, and for C's `__thread`.

**Current state**: PT_TLS parsing, static TLS mapping, FS-base context-switch
state, runtime-controlled current FS-base updates, and Rust `#[thread_local]`
smokes are implemented. Process-local thread lifecycle also exists. Remaining
work is allocating and owning distinct TLS blocks and FS-base state per
`ThreadRef` for Go's multi-thread runtime path.

**Blockers**: per-ThreadRef TLS ownership rules and Go `newosproc` integration
for the multi-threaded case.
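
To make the per-thread TLS block concrete, here is a sketch of the x86-64
"variant II" layout, where the TLS image sits directly below the thread pointer
and the word at `%fs:0` holds the thread pointer itself. The type and function
names are hypothetical, the TCB is reduced to that single self-pointer, and the
mapping address is assumed to be aligned to the segment alignment.

```rust
/// Size of the minimal thread control block: the self-pointer at `%fs:0`.
pub const TCB_SIZE: usize = 8;

#[derive(Debug, PartialEq, Eq)]
pub struct TlsLayout {
    /// Bytes reserved below the thread pointer (`p_memsz` rounded up to alignment).
    pub image_size: usize,
    /// Offset of the thread pointer from the start of the allocation.
    pub tp_offset: usize,
    /// Total bytes to allocate: image plus TCB.
    pub total_size: usize,
}

/// Computes a variant II layout from PT_TLS `p_memsz` and `p_align`.
pub fn tls_layout(memsz: usize, align: usize) -> Option<TlsLayout> {
    // `p_align` of 0 or 1 means "no constraint"; keep the TCB word-aligned anyway.
    let align = align.max(TCB_SIZE);
    if !align.is_power_of_two() {
        return None;
    }
    let image_size = memsz.checked_add(align - 1)? & !(align - 1);
    let total_size = image_size.checked_add(TCB_SIZE)?;
    Some(TlsLayout { image_size, tp_offset: image_size, total_size })
}

/// Fills `block` for one thread and returns the FS base to install.
/// `init_image` is the `p_filesz` bytes of initialized data; `base_addr` is the
/// user virtual address where `block` will be mapped.
pub fn init_tls_block(
    block: &mut [u8],
    base_addr: u64,
    init_image: &[u8],
    layout: &TlsLayout,
) -> Option<u64> {
    if block.len() < layout.total_size || init_image.len() > layout.image_size {
        return None;
    }
    // Initialized data is copied; the rest up to the thread pointer is zeroed.
    block[..init_image.len()].copy_from_slice(init_image);
    block[init_image.len()..layout.tp_offset].fill(0);
    // The word at `%fs:0` holds the thread pointer itself.
    let tp = base_addr.checked_add(layout.tp_offset as u64)?;
    block[layout.tp_offset..layout.total_size].copy_from_slice(&tp.to_le_bytes());
    Some(tp)
}
```

Per-`ThreadRef` ownership then amounts to calling this once per thread with a
distinct allocation and recording the returned thread pointer as that thread's
FS base.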

#### Phase 3: VirtualMemory Capability (implemented baseline, required for Go)

**What**: Implement the VirtualMemory capability interface. The current schema
has map, unmap, and protect; Go may need decommit/query semantics later.

**Why**: Go's memory allocator (`sysAlloc`, `sysReserve`, `sysMap`, etc.)
needs mmap-like functionality. This is the single biggest kernel-side
requirement for Go.

**Current state**: `VirtualMemoryCap` implements map/unmap/protect over the
existing page-table code with ownership tracking and quota checks. Go-specific
work still has to map runtime `sysAlloc`/`sysReserve`/`sysMap` expectations
onto that interface.

**Blockers**: None for the baseline capability. A usable Go port still needs
runtime glue for VirtualMemory and Timer, capos-rt park clients, Go futex glue,
Go thread integration, and address-space generation cleanup for reusable
private park words outside the landed explicit unmap/decommit paths.
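
One way to reason about the allocator glue is as a state machine over Go's
memory-region states. The sketch below encodes the transitions as I read them
from the Go runtime's memory-management comments (they should be checked
against the Go version being ported) and pairs each with a hypothetical
capability operation. The `VmOp` names are illustrative, not the
`VirtualMemory` schema, and `Decommit` marks the spot where the section's
"missing query/decommit semantics" would matter.

```rust
/// Region states used by Go's allocator.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemState { None, Reserved, Prepared, Ready }

/// Go runtime entry points that move a region between states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GoMemCall { SysAlloc, SysReserve, SysMap, SysUsed, SysUnused, SysFault, SysFree }

/// Hypothetical capability operation a port might issue for each call.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VmOp {
    MapReadWrite,
    MapNoAccess,
    ProtectReadWrite,
    ProtectNone,
    /// Release backing pages but keep the range (assumed, may not exist yet).
    Decommit,
    Unmap,
    Nothing,
}

/// Returns the next state and the operation to issue, or `None` if the call
/// is not valid in the current state.
pub fn transition(state: MemState, call: GoMemCall) -> Option<(MemState, VmOp)> {
    use GoMemCall as C;
    use MemState as S;
    match (state, call) {
        (S::None, C::SysAlloc) => Some((S::Ready, VmOp::MapReadWrite)),
        (S::None, C::SysReserve) => Some((S::Reserved, VmOp::MapNoAccess)),
        (S::Reserved, C::SysMap) => Some((S::Prepared, VmOp::ProtectReadWrite)),
        (S::Prepared, C::SysUsed) => Some((S::Ready, VmOp::Nothing)),
        (S::Ready, C::SysUnused) => Some((S::Prepared, VmOp::Decommit)),
        (S::Ready, C::SysFault) => Some((S::Reserved, VmOp::ProtectNone)),
        (S::Reserved | S::Prepared | S::Ready, C::SysFree) => Some((S::None, VmOp::Unmap)),
        _ => None,
    }
}
```

A table like this is mainly useful as a checklist: every row whose operation
the capability cannot express yet is a concrete gap in the allocator contract.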

#### Phase 4: ParkSpace Go Futex Glue (Low-medium effort, required for Go threading)

**What**: Map Go's `futex(WAIT)` and `futex(WAKE)` runtime hooks onto the
implemented `ParkSpace` compact wait/wake operations.

**Why**: Go's runtime locks and notes (`lock_futex.go`) are built on futexes,
and the goroutine scheduler parks idle threads through them.

**Effort**: The compact park ABI already exists as `CAP_OP_PARK` and
`CAP_OP_UNPARK`; Go futex glue should target that `ParkSpace` contract instead
of inventing a parallel wait namespace.

**Private futex authority and keying rules**: use
[ParkSpace](../architecture/park.md) as the normative design. Private futex
keys are generation-bearing address-space keys:

```rust
ParkKey::Private {
    address_space_id,
    address_space_generation,
    uaddr,
}
```

- `WAIT` validates that the address is mapped readable in the caller's current
  address space and that the expected value still matches under the same
  page-table stability rules used for process-buffer validation.
- The value check and waiter insertion are one atomic kernel operation with
  respect to `WAKE`, unmap, process exit, and address-space teardown.
- `WAKE` for a private futex can only wake waiters with the same
  `address_space_id` and `address_space_generation`; a raw virtual address is
  never a cross-process sync key.
- Unmap, revoke, or address-space teardown drains or fails waiters for the old
  key before the virtual address can be reused as unrelated state.
- A future shared-futex design must use `ParkKey::Shared` with
  `memory_object_id`, `memory_object_generation`, and aligned object offset, not
  raw user virtual address.

The authority boundary stays the caller's `ParkSpace` capability for private
parks and a future `SharedParkSpace` for MemoryObject-derived shared parks. Do
not introduce a global futex namespace or a generation-less duplicate key shape.
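
The keying rules above can be exercised with a small host-side model. It is a
sketch, not the `ParkSpace` implementation: `ParkTable`, `WaiterId`, and the
method names are hypothetical, the caller passes in the value it observed at
`uaddr` instead of the model reading user memory, and `&mut self` stands in for
whatever kernel lock makes check-and-enqueue atomic.

```rust
use std::collections::{HashMap, VecDeque};

/// Private key shape from the text; the future `Shared` variant is omitted.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ParkKey {
    Private { address_space_id: u64, address_space_generation: u64, uaddr: u64 },
}

#[derive(Debug, PartialEq, Eq)]
pub enum WaitError {
    /// The word no longer holds the expected value; the caller must not sleep.
    ValueMismatch,
}

/// Hypothetical waiter identifier (for example, a thread id).
pub type WaiterId = u64;

#[derive(Default)]
pub struct ParkTable {
    queues: HashMap<ParkKey, VecDeque<WaiterId>>,
}

impl ParkTable {
    /// Value check and enqueue happen under one `&mut self` borrow, modelling
    /// the single atomic step required with respect to wake and unmap.
    pub fn wait(&mut self, key: ParkKey, observed: u32, expected: u32, waiter: WaiterId) -> Result<(), WaitError> {
        if observed != expected {
            return Err(WaitError::ValueMismatch);
        }
        self.queues.entry(key).or_default().push_back(waiter);
        Ok(())
    }

    /// Wakes up to `max` waiters parked on exactly this key, oldest first.
    pub fn wake(&mut self, key: ParkKey, max: usize) -> Vec<WaiterId> {
        let mut woken = Vec::new();
        let mut now_empty = false;
        if let Some(queue) = self.queues.get_mut(&key) {
            let n = max.min(queue.len());
            woken.extend(queue.drain(..n));
            now_empty = queue.is_empty();
        }
        if now_empty {
            self.queues.remove(&key);
        }
        woken
    }

    /// Removes every waiter keyed inside `[start, start + len)` of the given
    /// address space and generation, as unmap or teardown would.
    pub fn drain_range(&mut self, asid: u64, generation: u64, start: u64, len: u64) -> Vec<WaiterId> {
        let end = start.saturating_add(len);
        let mut drained = Vec::new();
        self.queues.retain(|key, queue| {
            let ParkKey::Private { address_space_id, address_space_generation, uaddr } = *key;
            let hit = address_space_id == asid
                && address_space_generation == generation
                && uaddr >= start
                && uaddr < end;
            if hit {
                drained.extend(queue.drain(..));
            }
            !hit
        });
        drained
    }
}
```

Because the generation is part of the hash key, a wake issued against a stale
generation finds no queue at all, which is the property that keeps a reused
virtual address from waking unrelated waiters.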

**Blockers**: capos-rt park clients, Go `futexsleep`/`futexwakeup` glue, and
full multi-thread runtime integration.

#### Phase 5: Go Thread Runtime Integration (High effort, required for Go GOMAXPROCS>1)

**What**: Connect Go's `newosproc`, TLS ownership, futex glue, and GC
coordination to the implemented process-local thread lifecycle and private
`ParkSpace` wait/wake substrate.

**Why**: Go's `newosproc()` creates OS threads via `clone()`. Without real
threads, Go is limited to `GOMAXPROCS=1`.

**Effort**: Still high, but the kernel-side thread substrate no longer has to
be built from scratch. The remaining work is capos-rt clients, Go runtime glue,
per-ThreadRef TLS ownership, and validation under Go's scheduler.

**Blockers**: capos-rt thread and park clients, `newosproc` glue,
`futexsleep`/`futexwakeup` glue, per-ThreadRef TLS ownership rules, GC
coordination across kernel threads, address-space generation cleanup for
reusable private park-word memory outside explicit unmap/decommit paths, and
shared park words for future cross-process futexes. Per-CPU data and SMP
block multi-core scaling later; they do not block the first single-CPU Go
thread integration.

### Biggest Blockers for Go

In priority order after the 2026-04-24 TLS, VirtualMemory, Timer,
ThreadControl, single-thread runtime-checkpoint, process-local thread
lifecycle, and private ParkSpace work:

1. **Go park/futex glue** -- Go's M:N scheduler depends on futex-shaped
   sleeping/waking. The kernel has private ParkSpace wait/wake; the Go port
   still needs capos-rt clients and `futexsleep`/`futexwakeup` integration.

2. **Go thread integration** -- Required for `GOMAXPROCS > 1`. The kernel has
   process-local thread lifecycle; the Go port still needs `newosproc`,
   per-ThreadRef TLS ownership, and GC coordination across those threads.

3. **Go runtime port glue** -- the capOS capability side now has a
   single-thread checkpoint for VirtualMemory and Timer, but a real Go fork
   still needs to map `sysAlloc`/`write1`/`exit`/random/env/time to capOS
   runtime and capabilities.
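
The shape of that glue can be sketched as a narrow host interface with a mock
behind it for host-side testing. Everything here is hypothetical: the
`RuntimeHost` trait, its method names, and the error convention are
illustrative only and do not describe capos-rt or the Go runtime's actual
signatures. Memory, random, and environment hooks are left out for brevity.

```rust
/// Hypothetical host interface that low-level runtime functions would call.
pub trait RuntimeHost {
    /// Backs a `write1`-style hook: returns bytes written, or a negative value on failure.
    fn write(&mut self, fd: usize, bytes: &[u8]) -> i32;
    /// Backs a `nanotime`-style hook: monotonic nanoseconds.
    fn monotonic_nanos(&mut self) -> i64;
    /// Backs an `exit`-style hook.
    fn exit(&mut self, code: i32);
}

/// In-memory stand-in for Console and Timer capabilities.
#[derive(Debug, Default)]
pub struct MockHost {
    pub console: Vec<u8>,
    pub clock_ns: i64,
    pub exit_code: Option<i32>,
}

impl RuntimeHost for MockHost {
    fn write(&mut self, fd: usize, bytes: &[u8]) -> i32 {
        // Only stdout (1) and stderr (2) exist in this mock.
        if fd != 1 && fd != 2 {
            return -1;
        }
        self.console.extend_from_slice(bytes);
        bytes.len() as i32
    }

    fn monotonic_nanos(&mut self) -> i64 {
        // Advance a fake clock so successive reads are strictly increasing.
        self.clock_ns += 1_000;
        self.clock_ns
    }

    fn exit(&mut self, code: i32) {
        // A real host would not return; the mock records the code instead.
        self.exit_code = Some(code);
    }
}

/// The "Hello World (write + exit)" milestone expressed against the trait.
pub fn hello_world<H: RuntimeHost>(host: &mut H) {
    host.write(1, b"hello from capos\n");
    host.exit(0);
}
```

Keeping the runtime-facing surface this small makes it possible to test the
port's logic on a development host before any capability plumbing exists.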

### Biggest Blockers for C

C is much simpler than Go:

1. **Linker and toolchain setup** -- Need a cross-compilation toolchain
   targeting capOS (Clang with the custom target, or GCC cross-compiler).
2. **`libcapos.a` with C headers** -- Rust library with `extern "C"` API.
3. **musl integration (optional)** -- For full libc, replace musl's
   `__syscall()` with capability invocations.

### Recommended Implementation Order

```
1. Custom userspace target JSON          [done: targets/x86_64-unknown-capos.json]
     |
2. VirtualMemory capability              [done: baseline map/unmap/protect]
     |
3. TLS support (PT_TLS, FS base)         [done: static ELF + ThreadControl]
     |
4. ParkSpace compact wait/wake           [done: private path; clients open]
     |
5. Timer capability (monotonic clock)    [done: monotonic now/sleep]
     |
6. Go Phase 1: minimal GOOS=capos        [checkpoint done; Go fork remains]
     |
7. Kernel threading for Go runtime       [partial thread lifecycle; Go integration open]
     |
8. Go Phase 2: multi-threaded            [GOMAXPROCS>1, concurrent GC]
     |
9. C toolchain + libcapos                [parallel with Go work]
     |
10. Go Phase 3: network poller           [depends on networking stack]
```

Steps 1-5 are toolchain and kernel prerequisites. Step 6 is the Go fork. Steps
7-10 are incremental improvements: step 8 builds on step 7, while step 9 is
independent of the Go steps and can proceed in parallel.

### Key Architectural Decisions for capOS

1. **Keep `x86_64-unknown-none` for kernel, `x86_64-unknown-capos` for
   userspace.** The kernel does not benefit from a custom OS target (it's
   freestanding). Userspace benefits from `cfg(target_os = "capos")`.

2. **Use local-exec TLS model for static binaries.** Without a dynamic linker
   there are no runtime-loaded modules, so general-dynamic and initial-exec add
   nothing. local-exec resolves each TLS access to a link-time constant offset
   from FS base.

3. **Implement FS base save/restore early.** Both Go and Rust `#[thread_local]`
   need it. It's a small addition to context switch code.

4. **VirtualMemory cap stays on the Go critical path.** The baseline exists;
   the Go port still needs exact runtime allocator semantics and any missing
   query/decommit behavior.

5. **Futex is the synchronization primitive.** Both Go and any future
   pthreads implementation need futex-shaped wait/wake. The capOS authority
   surface is `ParkSpace`, using compact `CAP_OP_PARK` / `CAP_OP_UNPARK`
   transport rather than generic Cap'n Proto method dispatch on the hot path.

6. **Signals can be deferred.** Go can start with cooperative-only
   preemption (no `SIGURG`). Signal delivery is complex and can come much
   later.

## Used By

- [Go Runtime](../proposals/go-runtime-proposal.md) for the native
  `GOOS=capos` runtime plan.
- [Go VirtualMemory Contract](../backlog/go-virtual-memory-contract.md) for
  the `sysReserve`/`sysMap`/`sysUnused` allocator contract.
- [Userspace Runtime](../architecture/userspace-runtime.md) for the
  `capos-rt` hooks a language runtime calls.
