Proposal: Lua Scripting

How capOS should add Lua as a small capability-aware scripting environment without turning scripts into ambiently privileged shell fragments.

Problem

capOS needs a lightweight scripting path for operator workflows, demos, service glue, and eventually interactive shell automation. The native shell already exposes typed capabilities and explicit child grants, but a shell REPL is not a full programming language. Lua is attractive because it is small, embeddable, and designed to let a host provide the domain API.

The risk is predictable: “system scripting” often becomes an escape hatch around the operating system model. A script runner that receives broad ProcessSpawner, BootPackage, filesystem, network, or terminal authority and then exposes io, os, package.loadlib, or raw handle integers would recreate the ambient authority capOS is trying to avoid.

The target is not “make Lua root.” The target is:

  • Lua as ordinary userspace code.
  • Capabilities as the only authority.
  • Host-provided Lua libraries that map to typed capOS interfaces.
  • Exact grants for script processes, with no default filesystem, network, process, terminal, or debug authority.

Scope

In scope:

  • A capos-lua userspace runner for trusted operator and service scripts.
  • A small Lua host API over capos-rt typed clients.
  • A policy for standard Lua libraries on capOS.
  • Script packaging and shell launch shape.
  • Validation through QEMU scripts that prove granted and ungranted paths.

Out of scope for the first implementation:

  • LuaJIT.
  • Dynamic native Lua C modules.
  • A POSIX-compatible Lua environment.
  • Treating in-process Lua sandboxing as the isolation boundary for hostile scripts.
  • Kernel awareness of Lua.

Research Grounding

The actual docs/research/ contents were checked before selecting grounding files. Relevant local research:

  • Capability research index: keep typed Cap’n Proto interfaces as the permission boundary and avoid parallel rights flags.
  • Genode: route service access structurally; sessions are typed and resource-accounted.
  • Plan 9 and Inferno: per-process namespaces are useful precedent, but capOS should not turn scripts into path-global clients.
  • EROS, CapROS, and Coyotos: confinement depends on constructing the subject with only the capabilities it may use.
  • seL4: keep the privileged kernel surface small and let userspace policy build higher-level systems.

External Lua references:

  • The official Lua 5.5 manual describes Lua as an embeddable C library with a host program that registers C functions callable from Lua.
  • The official Lua version history says Lua 5.5.0 was released on 2025-12-22, while Lua 5.4.8 is the current 5.4 bug-fix release from 2025-06-04. It also says different x.y versions have different APIs and virtual machines, and precompiled chunks are not portable between versions.
  • The official Lua 5.5 readme says Lua is distributed as pure ISO C and normally builds into lua, luac, and liblua.a. That makes Lua a plausible native port once capOS has the C userspace and libcapos substrate; by itself it does not make Lua runnable on today’s no_std, Rust-only userspace.

Rust implementation candidates checked:

  • mlua is a mature Rust binding layer for PUC Lua, LuaJIT, and Luau. It is not a pure-Rust VM. Its vendored path still builds C/C++ Lua-family sources through mlua-sys, cc, and lua-src/luajit-src, and the public crate uses std, libc, parking_lot, panic catching, and host linker/module assumptions. It is a useful API reference, but it does not avoid the native C/libcapos port.
  • piccolo is the only inspected pure-Rust implementation that looks like a credible capOS bootstrap candidate. It has a stackless VM, fuel-based stepping, memory tracking through gc-arena, safe userdata downcasting, and most core language behavior. The current crate is still std-based, depends on anyhow, thiserror, rand, ahash, and a git-pinned gc-arena, and its built-in I/O path writes to host stdout. Porting it to capOS would require a no_std + alloc fork plus host-library replacement, but that is likely less work than bringing up C Lua before libcapos.
  • silt-lua, hematita, and luar were also inspected. They are pure Rust in varying degrees, but their own READMEs/code show early, incomplete, or CLI-oriented implementations. They are not good foundations for capOS runtime work today.

Design Principles

  1. Lua is not a kernel feature. The kernel sees a normal process with a CapSet and a capability ring.

  2. The runner’s CapSet is the authority. Script text, module names, global variables, and Lua tables are data. They cannot create authority.

  3. In-process sandboxing is defense in depth, not confinement. A trusted service may embed Lua for local configuration or small trusted extensions. Untrusted user scripts must run in a separate process with a narrow CapSet, quotas, and no access to the host service’s private caps.

  4. The standard libraries are curated. Base, coroutine, table, string, math, and utf8 are reasonable starting points. io, os, package, debug, dynamic loading, and process execution are absent by default or replaced by capOS-specific libraries backed by explicit caps.

  5. No raw CapIds in Lua. A Lua capability value is host-owned userdata with a hidden metatable. Scripts can call methods exposed by the wrapper, but they cannot forge a handle by guessing an integer.

  6. Lua version is part of the runtime contract. Precompiled chunks, language behavior, and C API details are series-specific. capOS should pin the runner to a declared Lua series and expose that in manifests and smoke output.

  7. C module loading waits. Dynamic native modules need loader, linker, symbol, and authority policy. The first runner should statically link the selected Lua implementation and capOS host libraries.
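
Principle 5 can be illustrated without any host code. The sketch below is a pure-Lua stand-in (Lua 5.4 assumed), not the runner's implementation: a real runner would create host-owned userdata from the native side, and `new_cap_factory`, `wrap`, and the `interface` method are hypothetical names. It only shows the property that matters: the raw id lives where script code cannot reach it, and the metatable is locked.

```lua
-- Pure-Lua stand-in for host-owned capability userdata (Lua 5.4 assumed).
-- All names are hypothetical; a real runner would create userdata on the
-- host side instead of using tables.
local function new_cap_factory()
  -- Private side table: proxy -> record. Weak keys allow collection.
  local private = setmetatable({}, { __mode = "k" })

  local methods = {}
  function methods.interface(self)
    local rec = private[self]
    if rec == nil then error("not a capability", 2) end
    return rec.interface
  end

  local mt = {
    __index = methods,
    __metatable = false, -- getmetatable() returns false; setmetatable() errors
    __newindex = function() error("capability values are read-only", 2) end,
    __tostring = function(self) return "cap<" .. private[self].interface .. ">" end,
  }

  -- Only the host-side caller of wrap() ever sees raw_id.
  return function(raw_id, interface)
    local proxy = setmetatable({}, mt)
    private[proxy] = { id = raw_id, interface = interface }
    return proxy
  end
end

local wrap = new_cap_factory()
local timer = wrap(7, "Timer")

print(tostring(timer))                    -- cap<Timer>
print(rawget(timer, "id"))                -- nil: the id is not stored on the proxy
print(pcall(timer.interface, { id = 7 })) -- false plus an error: forged table rejected
```

The debug library could still inspect upvalues and metatables, which is one reason it stays off by default. `rawset` bypasses `__newindex`, but the proxy holds no authority-bearing fields, so writing to it gains nothing.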

Architecture

flowchart TD
    Shell[capos-shell] --> Launcher[RestrictedLauncher]
    Launcher --> Runner[capos-lua process]
    Runner --> Lua[PUC Lua VM]
    Runner --> Rt[capos-rt / libcapos host API]
    Rt --> Ring[capability ring]
    Ring --> Kernel[kernel CapObject dispatch]
    Ring --> Services[userspace services]

    ScriptPkg[ScriptPackage or Namespace cap] --> Runner
    Terminal[TerminalSession cap] --> Runner
    OtherCaps[Exact service caps] --> Runner

capos-lua is just another binary launched by the shell or init-owned service graph. The parent chooses the script source and the exact caps. The runner creates one Lua state, installs selected libraries, wraps granted caps as userdata, loads the script with a controlled environment, executes it in protected mode, flushes queued releases, and exits with a normal process status.
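
The ordering in that sequence matters more than any single step, so here is a minimal sketch of it in plain Lua 5.4. It is an illustration only: the real runner would drive these steps from native code, and `run_script`, the exit-status constants, and the `release_queue` list of closures are all hypothetical.

```lua
-- Sketch of the runner sequence (Lua 5.4 assumed). Hypothetical names:
-- run_script, release_queue, and the EXIT_* status values.
local EXIT_OK, EXIT_LOAD_ERROR, EXIT_SCRIPT_ERROR = 0, 1, 2

local function run_script(source, env, release_queue)
  local status, message

  -- 1. Load as text only, bound to the controlled environment.
  local chunk, load_err = load(source, "=main.lua", "t", env)
  if not chunk then
    status, message = EXIT_LOAD_ERROR, load_err
  else
    -- 2. Execute in protected mode so script errors become a status.
    local ok, err = pcall(chunk)
    if ok then
      status = EXIT_OK
    else
      status, message = EXIT_SCRIPT_ERROR, tostring(err)
    end
  end

  -- 3. Flush queued releases on every path, success or failure.
  for i = #release_queue, 1, -1 do
    release_queue[i]()
    release_queue[i] = nil
  end

  -- 4. The caller turns this into the process exit status.
  return status, message
end

-- Usage: one queued release, one well-formed script.
local flushed = 0
local queue = { function() flushed = flushed + 1 end }
local status = run_script("answer = 6 * 7", {}, queue)
print(status, flushed, #queue) -- 0, 1 and 0, tab-separated
```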

The initial implementation should be a standalone runner, not Lua embedded in capos-shell. Keeping the runner as a child process prevents script bugs, Lua VM bugs, and accidental infinite loops from corrupting the interactive shell state. It also gives QEMU smokes a clear process boundary to inspect.

Version Choice

Use PUC Lua, not LuaJIT, for the first runner.

As of 2026-04-25, Lua 5.5.0 is the current upstream series and has features that fit capOS scripting: explicit global declarations, more compact arrays, and static (fixed) binaries. It is the right default target for new capOS-native scripts.

Keep a narrow compatibility option open for Lua 5.4.8 if imported scripts or libraries require it. Do not mix bytecode or native modules between Lua series. A script package should declare:

language = "lua"
series = "5.5"
entry = "main.lua"

Source scripts are preferable to precompiled chunks for reviewability. If precompiled chunks are allowed later, they must be tied to the exact runtime series and treated as trusted build inputs.
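
A runner could enforce both rules before loading anything. The following is a hedged sketch, not a defined capOS interface: `check_package` is a hypothetical helper, the manifest is assumed to arrive as a Lua table with the three fields shown above, and the series comparison assumes the runtime reports itself through `_VERSION` in the usual "Lua x.y" form. The signature check relies on PUC Lua precompiled chunks starting with the bytes ESC, "L", "u", "a".

```lua
-- Hypothetical manifest check; field names follow the declaration above.
local BINARY_CHUNK_SIGNATURE = "\27Lua" -- first bytes of a PUC Lua binary chunk

local function check_package(manifest, main_bytes, runtime_version)
  runtime_version = runtime_version or _VERSION -- for example "Lua 5.4"

  if manifest.language ~= "lua" then
    return false, "unsupported language"
  end
  if runtime_version ~= "Lua " .. tostring(manifest.series) then
    return false, "series mismatch: package wants " .. tostring(manifest.series)
      .. ", runtime is " .. runtime_version
  end
  -- Source only: refuse anything that looks like a precompiled chunk.
  if main_bytes:sub(1, #BINARY_CHUNK_SIGNATURE) == BINARY_CHUNK_SIGNATURE then
    return false, "precompiled chunks are not accepted"
  end
  return true
end

local manifest = { language = "lua", series = "5.5", entry = "main.lua" }
print(check_package(manifest, "print('hi')", "Lua 5.5")) -- true
print(check_package(manifest, "print('hi')", "Lua 5.4")) -- false plus a reason
```

Loading with text mode only would reject a binary chunk anyway; checking the signature first just produces a clearer error.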

There is one practical sequencing exception: a piccolo-based capos-lua-smoke may be the fastest way to prove the capOS host API before C userspace support exists. That should be treated as an implementation bootstrap, not as a promise of exact PUC Lua compatibility. If capOS takes that route, the smoke should declare the runtime as piccolo rather than lua-5.5.

Host API

The first host API should be explicit and boring:

local capos = require("capos")

local terminal = capos.require_cap("terminal", "TerminalSession")
terminal:write_line("hello from Lua")

local now = capos.require_cap("timer", "Timer"):now()
terminal:write_line("now_ns=" .. tostring(now))

capos.require_cap(name, interface) looks up a bootstrap cap by manifest name and checks the expected interface metadata before returning userdata. It fails closed if the cap is absent or has the wrong interface.
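
The fail-closed behavior can be sketched with a mock. Everything here is an assumption for illustration: `make_capos` is hypothetical, and the `bootstrap` table stands in for whatever the runner learns from its CapSet at startup. The real lookup would live in the host library, not in script-visible Lua.

```lua
-- Mock of a fail-closed cap lookup. `bootstrap` maps manifest names to
-- { interface = <string>, object = <wrapper> }; both are hypothetical shapes.
local function make_capos(bootstrap)
  local capos = {}

  function capos.require_cap(name, interface)
    if type(name) ~= "string" or type(interface) ~= "string" then
      error("require_cap: name and interface must be strings", 2)
    end
    local entry = bootstrap[name]
    if entry == nil then
      error(("require_cap: no cap named %q"):format(name), 2)
    end
    if entry.interface ~= interface then
      error(("require_cap: cap %q does not implement %s"):format(name, interface), 2)
    end
    return entry.object -- only reached when both checks pass
  end

  return capos
end

-- Usage: one granted cap, then a missing name and a wrong interface.
local capos = make_capos({
  terminal = { interface = "TerminalSession", object = { tag = "terminal-wrapper" } },
})
print(capos.require_cap("terminal", "TerminalSession").tag) -- terminal-wrapper
print(pcall(capos.require_cap, "network", "NetworkManager")) -- false plus a message
print(pcall(capos.require_cap, "terminal", "Timer"))         -- false plus a message
```

There is no code path that returns `nil` or a partially usable value, so a script cannot continue with a cap it was not granted.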

Generated or handwritten bindings should expose method names, not method numbers. The binding owns Cap’n Proto serialization through capos-rt or libcapos; scripts should not construct raw SQEs, raw method IDs, transfer descriptors, or cap_enter calls.
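
One way to picture a name-based binding is a proxy that keeps method numbers private. This is a sketch under stated assumptions: `make_binding` is hypothetical, the `{ name, methods }` descriptor shape is invented for the example, the method number `0` is made up, and `invoke` stands in for the host call path that owns serialization.

```lua
-- Hypothetical binding builder. `invoke(method_id, params)` stands in for
-- the host call path; scripts never see it or the numeric ids.
local function make_binding(interface, invoke)
  local ids = {}
  for name, id in pairs(interface.methods) do ids[name] = id end -- private copy

  return setmetatable({}, {
    __metatable = false,
    __index = function(_, name)
      local id = ids[name]
      if id == nil then
        error(("%s has no method %q"):format(interface.name, tostring(name)), 2)
      end
      -- The returned function is what `obj:method(params)` ends up calling.
      return function(_self, params)
        return invoke(id, params or {})
      end
    end,
  })
end

-- Usage with a fake transport that records which ids were called.
local calls = {}
local timer = make_binding(
  { name = "Timer", methods = { now = 0 } },
  function(id, _params)
    calls[#calls + 1] = id
    return 12345
  end
)
print(timer:now()) -- 12345
```

An unknown method name raises a script error before anything reaches the ring, which keeps typos from turning into calls with unintended method numbers.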

Transferred result caps become owned Lua userdata. Release is deterministic when possible:

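do
  -- `launcher` is assumed to be a launcher cap the script obtained earlier.
  -- `h` is a to-be-closed variable: its owned handle is released when the
  -- block exits, whether normally or through an error.
  local h <close> = launcher:spawn({
    name = "child",
    binary = "timer-smoke",
    grants = { terminal = terminal },  -- exact grants for the child
  })
  local code = h:wait()  -- child's exit status
end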

Finalizers may queue cleanup, but they are not the primary lifetime contract. The runner must flush owned-handle releases at script return and process exit.
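
The split between the primary path and the backstop can be shown with a mock handle. This sketch needs Lua 5.4 or later for to-be-closed variables; `Handle`, `new_handle`, and the `released` list are hypothetical stand-ins for an owned result cap and the runner's release queue.

```lua
-- Requires Lua 5.4+ (to-be-closed variables). Handle is a hypothetical
-- stand-in for an owned result cap; `released` records release order.
local released = {}

local Handle = {}
Handle.__index = Handle

function Handle.release(self)
  if self.open then -- idempotent: explicit close and GC may both run
    self.open = false
    released[#released + 1] = self.name
  end
end

Handle.__close = function(self, _err) self:release() end -- primary path
Handle.__gc = Handle.release                              -- backstop only

local function new_handle(name)
  return setmetatable({ name = name, open = true }, Handle)
end

do
  local h <close> = new_handle("child")
  -- ... use h ...
end
-- "child" has been released here, as soon as the block exited.

local ok = pcall(function()
  local h <close> = new_handle("failing")
  error("boom")
end)
-- "failing" was released too, even though the block raised.

print(table.concat(released, ","), ok) -- child,failing and false
```

Because `release` is idempotent, a later garbage-collection pass over an already closed handle does nothing, so the finalizer can stay as a safety net without double releases.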

Standard Library Policy

Initial allowed libraries:

| Library | Policy |
| --- | --- |
| base | Load selected safe functions. load is allowed only with text mode and a supplied environment. |
| coroutine | Allowed for cooperative script structure. It does not map to OS threads. |
| table, string, math, utf8 | Allowed. |
| debug | Denied by default. It pierces ordinary Lua abstraction and should require an explicit developer-profile cap. |
| io | Denied by default. Replace with capos wrappers over TerminalSession, future File, ByteStream, or Namespace caps. |
| os | Denied by default. Replace time, exit, and process operations with cap-backed methods. |
| package | Restricted. require searches a script package or namespace cap, not host paths or environment variables. |
| dynamic C modules | Denied until native module loading has a reviewed authority model. |

Lua _ENV is useful for presenting a small global namespace, but it is not a security boundary by itself. The security boundary is the process plus its CapSet.
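
As a rough illustration of that policy, a curated environment can be built as an ordinary table and passed to `load`. This is a sketch assuming Lua 5.4; `make_env` and `run_in_env` are hypothetical names, and the exact list of base functions is illustrative rather than a reviewed allowlist.

```lua
-- Sketch of a curated global environment (Lua 5.4 assumed). make_env and
-- run_in_env are hypothetical; the function list is illustrative only.
local function copy(t)
  local out = {}
  for k, v in pairs(t) do out[k] = v end
  return out
end

local function make_env()
  local env = {
    assert = assert, error = error, ipairs = ipairs, next = next,
    pairs = pairs, pcall = pcall, select = select, tonumber = tonumber,
    tostring = tostring, type = type, xpcall = xpcall,
    -- Copies, so a script cannot replace entries in the host's tables.
    coroutine = copy(coroutine), table = copy(table), string = copy(string),
    math = copy(math), utf8 = copy(utf8),
  }
  -- load: text mode only, and never the host's real globals.
  env.load = function(chunk, name, _mode, chunk_env)
    return load(chunk, name, "t", chunk_env or env)
  end
  env._G = env
  return env
end

local function run_in_env(source, env)
  local chunk, err = load(source, "=script", "t", env)
  if not chunk then return false, err end
  return pcall(chunk)
end

print(run_in_env("return io == nil and os == nil and debug == nil", make_env()))
-- true  true
```

io, os, debug, package, and require are simply absent from the table, so a script that names them sees `nil`. This shapes what a script can name; it does not replace the process boundary.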

Script Sources

The current ProcessSpawner.spawn shape names a binary and grants caps; it does not yet pass arbitrary argument vectors or script blobs. That creates an implementation dependency for useful Lua scripting.

Near-term options, in order:

  1. Smoke-only compiled script: capos-lua-smoke statically embeds one script string in .rodata and proves the host API. This is not the general product, but it verifies the Lua VM, allocator, CapSet lookup, and terminal output without new startup ABI.

  2. Runner config cap: init or the shell grants a read-only ScriptPackage or ConfigBlob cap to capos-lua. The runner asks that cap for main.lua and module bytes. This keeps script data out of the kernel and fits the existing capability model.

  3. Storage-backed scripts: after Store/Namespace exists, scripts live under a granted namespace. require searches only that namespace and only through a read-only script-package view unless the script also receives a writable namespace cap.
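
For options 2 and 3, the restricted `require` could look roughly like the sketch below. It assumes Lua 5.4. `make_require` is a hypothetical name, and the `files` table is a stand-in for bytes that would really be fetched through a ScriptPackage or Namespace cap. No host paths, `package.path`, or environment variables are consulted.

```lua
-- Sketch of a package-scoped require (Lua 5.4 assumed). `files` is a
-- hypothetical stand-in for bytes served by a script package cap.
local function make_require(files, env)
  local loaded, loading = {}, {}

  -- Dotted names only: no slashes, no "..", no leading or trailing dot.
  local function valid(name)
    return type(name) == "string"
      and name:find("^[%w_][%w_%.]*$") ~= nil
      and not name:find("..", 1, true)
      and name:sub(-1) ~= "."
  end

  local function require(name)
    if not valid(name) then error("invalid module name", 2) end
    if loaded[name] ~= nil then return loaded[name] end
    if loading[name] then error("circular require: " .. name, 2) end

    local path = name:gsub("%.", "/") .. ".lua"
    local source = files[path]
    if source == nil then error("module not in package: " .. name, 2) end

    local chunk, err = load(source, "=" .. path, "t", env)
    if not chunk then error(err, 2) end

    loading[name] = true
    local ok, result = pcall(chunk, name)
    loading[name] = nil
    if not ok then error(result, 2) end

    if result == nil then result = true end
    loaded[name] = result
    return result
  end

  return require
end

-- Usage: modules see the same restricted require through their environment.
local env = {}
local files = {
  ["util/fmt.lua"] = "return { tag = function(s) return '[' .. s .. ']' end }",
}
env.require = make_require(files, env)
print(env.require("util.fmt").tag("x")) -- [x]
```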

Do not add a Lua-specific boot manifest field or kernel cap. Script packaging belongs to init, shell, storage, or a userspace package service.

Shell Integration

The shell should treat Lua as a launched workload:

run "capos-lua" with {
  terminal: @terminal
  timer:    @timer
  scripts:  @home.sub("scripts/admin")
}

Later, the shell can add sugar such as:

lua scripts/admin/inspect.lua with { terminal: @terminal, timer: @timer }

That sugar must compile to the same explicit spawn plan. There is no implicit inheritance of the shell’s full current CapSet.

Agent mode can also use Lua, but Lua should be a tool target rather than the model itself. The agent runner may advertise “run this approved Lua script” as a consent-gated tool. The model still does not receive session caps.

Adventure Game Use

The adventure game is a good later demonstration target because it needs both strict authority and authorable behavior. The kernel and service capabilities still enforce authority; Lua should only express deterministic scenario logic over the caps granted to the script runner.

Suitable Lua-owned behavior:

  • mission beat selection,
  • deterministic NPC dialogue state machines,
  • quest-board text,
  • hint selection,
  • debrief variants,
  • scripted reactions that call typed game APIs through granted object caps.

Unsuitable Lua-owned behavior:

  • deciding whether a player has authority,
  • mutating relic custody without a typed service call,
  • applying combat damage outside the game service,
  • minting or transferring caps,
  • holding broad spawn, debug, filesystem, or network authority by default.

The useful proof is language independence: a Rust adventure service and a Lua scenario script should both demonstrate proper capability use, including bounded failures when a script lacks a required cap.
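
To make "deterministic NPC dialogue state machine" concrete, here is a small data-driven sketch. The dialogue content, state names, and the `step` function are all invented for illustration, and no capOS API is involved: the script only selects text, while anything that changes custody or authority would still go through a typed service call.

```lua
-- Hypothetical dialogue data and stepper; no capOS APIs are involved.
local dialogue = {
  start = {
    text = "The archivist looks up.",
    choices = { ask = "relic", leave = "done" },
  },
  relic = {
    text = "The relic is held in custody. I cannot release it.",
    choices = { leave = "done" },
  },
  done = { text = "Farewell.", choices = {} },
}

-- Pure function: the same state and choice always give the same result.
local function step(state, choice)
  local node = dialogue[state]
  if node == nil then return nil, "unknown state" end
  local next_state = node.choices[choice]
  if next_state == nil then return state, "choice not available" end
  return next_state
end

local state = "start"
state = step(state, "ask")
print(state, dialogue[state].text)
```

Because `step` has no side effects, a scenario can be replayed from a list of choices, which makes it straightforward to cover in host tests.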

Blocking, Async, and Coroutines

The first runner can use synchronous typed client calls over the existing single-owner ring client. A blocking Lua method blocks the runner process, which is acceptable for the first operator-script use case.

Coroutines provide script-local cooperative structure, not OS scheduling. A future runtime reactor can resume Lua coroutines when capability completions arrive, but that should wait until the capOS runtime has a general demux path for threaded and async clients. Do not design Lua-specific CQ demultiplexing.
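
A minimal sketch of what "script-local cooperative structure" means in practice: a round-robin loop over coroutines inside one process, with no completion queue involved. `run_round_robin` and `worker` are hypothetical names, and the example deliberately contains no capability calls.

```lua
-- Script-local cooperative scheduling; nothing here touches the ring.
local function run_round_robin(tasks)
  local queue = {}
  for i, fn in ipairs(tasks) do queue[i] = coroutine.create(fn) end

  while #queue > 0 do
    local next_queue = {}
    for _, co in ipairs(queue) do
      local ok, err = coroutine.resume(co)
      if not ok then error(err, 0) end
      if coroutine.status(co) ~= "dead" then
        next_queue[#next_queue + 1] = co
      end
    end
    queue = next_queue
  end
end

local log = {}
local function worker(name, steps)
  return function()
    for i = 1, steps do
      log[#log + 1] = name .. i
      coroutine.yield() -- hand control back to the loop
    end
  end
end

run_round_robin({ worker("a", 2), worker("b", 3) })
print(table.concat(log, " ")) -- a1 b1 a2 b2 b3
```

If any worker made a synchronous capability call, every other coroutine would wait with it, because they all share the one runner process.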

Security Model

Threat boundaries:

  • Script source is untrusted input until parsed and loaded in protected mode.
  • Script packages are trusted build or storage inputs only when their source, digest, author, and runtime series are review-visible.
  • The Lua VM is not trusted to confine hostile code inside a privileged host process.
  • Capability wrappers must validate method parameters, buffer sizes, transfer counts, and result-cap interface IDs before translating Lua values into ring calls.
  • Terminal and audit output must not print secrets. Lua error rendering should use bounded messages and avoid dumping arbitrary cap userdata internals.
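
The last point, bounded error rendering, can be sketched with `xpcall` and a message handler. This assumes Lua 5.4. `MAX_ERROR_BYTES`, `render_error`, and `run_protected` are hypothetical, and the 120-byte limit is an arbitrary value chosen for the example. In a real runner the equivalent logic would more likely sit on the host side.

```lua
-- Sketch of bounded error rendering (Lua 5.4 assumed). MAX_ERROR_BYTES and
-- both function names are hypothetical.
local MAX_ERROR_BYTES = 120

local function render_error(err)
  local text
  if type(err) == "string" then
    text = err
  else
    -- Do not tostring() arbitrary values: a __tostring metamethod or a
    -- userdata address could expose internals.
    text = "<non-string error of type " .. type(err) .. ">"
  end
  text = text:gsub("%c", " ") -- keep terminal output on one line
  if #text > MAX_ERROR_BYTES then
    text = text:sub(1, MAX_ERROR_BYTES - 3) .. "..."
  end
  return text
end

-- Runs fn in protected mode; on failure returns false and a bounded message.
local function run_protected(fn, ...)
  return xpcall(fn, render_error, ...)
end

print(run_protected(function() error({ secret = "k" }) end))
-- false   <non-string error of type table>
```

Truncation here counts bytes, so it can split a multi-byte UTF-8 sequence; a production version would want to cut on a character boundary.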

Default deny list for untrusted scripts:

  • no debug,
  • no dynamic module loading,
  • no raw os/io,
  • no broad ProcessSpawner,
  • no broad network manager,
  • no boot package,
  • no mutable namespace unless that is the explicit script purpose,
  • no host environment variables.

Quotas matter. The first useful quota is process memory. CPU budgets, timer budgets, and capability-call quotas should follow the normal capOS scheduling and resource-accounting path rather than special Lua hooks.

Implementation Phases

Phase 0: Contract and Host Surface

  • Add this proposal and update the userspace-binaries language note.
  • Decide the initial Lua series in a checked-in design note or manifest field.
  • Define the minimal capos Lua host library: require_cap, interfaces, print routing, error rendering, and owned-cap release.
  • Decide whether the first proof waits for PUC Lua via C/libcapos or forks piccolo into a no_std + alloc temporary Rust VM path. If piccolo is used, keep the compatibility contract explicit and do not label the runner as Lua 5.5.

Phase 1: Native Runner Smoke

  • Build a static capos-lua-smoke userspace binary.
  • Load selected Lua libraries only.
  • Expose TerminalSession.writeLine and Timer.now.
  • Run one embedded script in QEMU and assert output plus absence of denied io/os/debug APIs.
  • Verify wrong-interface and missing-cap failures are typed script errors, not panics.

Phase 2: Script Package Input

  • Add a userspace-owned script source cap or startup-config path.
  • Let shell/init launch capos-lua with a selected package and exact grants.
  • Implement restricted require over the package.
  • Add QEMU proof for a granted TerminalSession call and a denied ungranted cap lookup.

Phase 3: Generated Capability Bindings

  • Generate Lua binding metadata from schema/capos.capnp or from the same interface registry used by the native shell.
  • Expose method names and structured params/results.
  • Add transfer-result cap adoption and deterministic release tests.
  • Keep raw Cap’n Proto builders out of script code unless a separate developer diagnostic cap grants that power.

Phase 4: Shell and Service Use

  • Add shell sugar for script execution after the exact spawn plan exists.
  • Permit trusted services to embed Lua only when they can prove the embedded state holds no extra authority beyond what the script should use.
  • Add audit records for script launch, script package digest, grants, exit status, and authority-touching cap calls when audit caps are available.

Validation

The first implementation is not complete until it has QEMU evidence:

  • A Lua script prints through a granted TerminalSession.
  • The same script cannot use io, os.execute, debug, or an ungranted cap.
  • A missing or wrong-interface cap lookup returns a bounded Lua error.
  • An owned result cap is released deterministically.
  • The runner exits cleanly and does not wedge the shell.

Host tests should cover Lua value conversion and binding generation once those pieces are pure enough to test outside QEMU. Do not claim “Lua scripting works” from host tests alone; the useful behavior is authority-shaped process execution in capOS.

Open Questions

  • Whether the initial implementation should wait for libcapos C support or use a temporary Rust Lua VM to prove the host API earlier.
  • The exact startup-config mechanism for selecting main.lua before storage and general process arguments exist.
  • Whether Lua 5.5 should be the only supported series or whether a 5.4 runner is worth carrying for ecosystem compatibility.
  • How much schema reflection the Lua binding should expose before the native shell’s generic call surface lands.
  • Which audit fields belong in AuditLog once script launch becomes an operator workflow rather than a smoke.