# Proposal: Lua Scripting

How capOS should add Lua as a small capability-aware scripting environment
without turning scripts into ambiently privileged shell fragments.


## Problem

capOS needs a lightweight scripting path for operator workflows, demos,
service glue, and eventually interactive shell automation. The native shell
already exposes typed capabilities and explicit child grants, but a shell REPL
is not a full programming language. Lua is attractive because it is small,
embeddable, and designed to let a host provide the domain API.

The risk is predictable: "system scripting" often becomes an escape hatch
around the operating system model. A script runner that receives broad
`ProcessSpawner`, `BootPackage`, filesystem, network, or terminal authority
and then exposes `io`, `os`, `package.loadlib`, or raw handle integers would
recreate the ambient authority capOS is trying to avoid.

The target is not "make Lua root." The target is:

- Lua as ordinary userspace code.
- Capabilities as the only authority.
- Host-provided Lua libraries that map to typed capOS interfaces.
- Exact grants for script processes, with no default filesystem, network,
  process, terminal, or debug authority.

## Scope

In scope:

- A `capos-lua` userspace runner for trusted operator and service scripts.
- A small Lua host API over `capos-rt` typed clients.
- A policy for standard Lua libraries on capOS.
- Script packaging and shell launch shape.
- Validation through QEMU scripts that prove granted and ungranted paths.

Out of scope for the first implementation:

- LuaJIT.
- Dynamic native Lua C modules.
- A POSIX-compatible Lua environment.
- Treating in-process Lua sandboxing as the isolation boundary for hostile
  scripts.
- Kernel awareness of Lua.

## Research Grounding

The actual `docs/research/` contents were checked before selecting grounding
files. Relevant local research:

- [Capability research index](../research.md): keep typed Cap'n Proto
  interfaces as the permission boundary and avoid parallel rights flags.
- [Genode](../research/genode.md): route service access structurally; sessions
  are typed and resource-accounted.
- [Plan 9 and Inferno](../research/plan9-inferno.md): per-process namespaces
  are useful precedent, but capOS should not turn scripts into path-global
  clients.
- [EROS, CapROS, and Coyotos](../research/eros-capros-coyotos.md): confinement
  depends on constructing the subject with only the capabilities it may use.
- [seL4](../research/sel4.md): keep the privileged kernel surface small and
  let userspace policy build higher-level systems.

External Lua references:

- The official [Lua 5.5 manual](https://www.lua.org/manual/5.5/manual.html)
  describes Lua as an embeddable C library with a host program that registers
  C functions callable from Lua.
- The official [Lua version history](https://www.lua.org/versions.html) says
  Lua 5.5.0 was released on 2025-12-22, while Lua 5.4.8 is the current 5.4
  bug-fix release from 2025-06-04. It also says different `x.y` versions have
  different APIs and virtual machines, and precompiled chunks are not portable
  between versions.
- The official [Lua 5.5 readme](https://www.lua.org/manual/5.5/readme.html)
  says Lua is distributed as pure ISO C and normally builds into `lua`,
  `luac`, and `liblua.a`. That makes Lua a plausible native port once capOS has
  the C userspace and `libcapos` substrate; it does not make Lua runnable on
  today's no-std Rust-only userspace by itself.

Rust implementation candidates checked:

- [mlua](https://github.com/mlua-rs/mlua) is a mature Rust binding layer for
  PUC Lua, LuaJIT, and Luau. It is not a pure-Rust VM. Its `vendored` path
  still builds C/C++ Lua-family sources through `mlua-sys`, `cc`, and
  `lua-src`/`luajit-src`, and the public crate uses `std`, `libc`,
  `parking_lot`, panic catching, and host linker/module assumptions. It is a
  useful API reference, but it does not avoid the native C/`libcapos` port.
- [piccolo](https://github.com/kyren/piccolo) is the only inspected pure-Rust
  implementation that looks like a credible capOS bootstrap candidate. It has
  a stackless VM, fuel-based stepping, memory tracking through `gc-arena`,
  safe userdata downcasting, and most core language behavior. The current crate
  is still `std`-based, depends on `anyhow`, `thiserror`, `rand`, `ahash`, and
  a git-pinned `gc-arena`, and its built-in I/O path writes to host stdout.
  Porting it to capOS would require a `no_std + alloc` fork plus host-library
  replacement, but that is likely less work than bringing up C Lua before
  `libcapos`.
- [silt-lua](https://github.com/auxnon/silt-lua), [hematita](https://github.com/danii/hematita),
  and [luar](https://github.com/sunray-ley/luar) were also inspected. They are
  pure Rust in varying degrees, but their own READMEs/code show early,
  incomplete, or CLI-oriented implementations. They are not good foundations
  for capOS runtime work today.

## Design Principles

1. **Lua is not a kernel feature.** The kernel sees a normal process with a
   CapSet and a capability ring.

2. **The runner's CapSet is the authority.** Script text, module names,
   global variables, and Lua tables are data. They cannot create authority.

3. **In-process sandboxing is defense in depth, not confinement.** A trusted
   service may embed Lua for local configuration or small trusted extensions.
   Untrusted user scripts must run in a separate process with a narrow CapSet,
   quotas, and no access to the host service's private caps.

4. **The standard libraries are curated.** Base, coroutine, table, string,
   math, and utf8 are reasonable starting points. `io`, `os`, `package`,
   `debug`, dynamic loading, and process execution are absent by default or
   replaced by capOS-specific libraries backed by explicit caps.

5. **No raw CapIds in Lua.** A Lua capability value is host-owned userdata with
   a hidden metatable. Scripts can call methods exposed by the wrapper, but
   they cannot forge a handle by guessing an integer.

6. **Lua version is part of the runtime contract.** Precompiled chunks,
   language behavior, and C API details are series-specific. capOS should pin
   the runner to a declared Lua series and expose that in manifests and smoke
   output.

7. **C module loading waits.** Dynamic native modules need loader, linker,
   symbol, and authority policy. The first runner should statically link the
   selected Lua implementation and capOS host libraries.
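
Principle 5 can be modelled in plain Lua. The sketch below is an illustration under stated assumptions, not the runner's binding code: stock Lua cannot create userdata, so a locked proxy table stands in for it, and `wrap_terminal` plus the returned string are hypothetical. It runs on Lua 5.4.

```lua
-- Pure-Lua model of an opaque capability wrapper. The real runner would hand
-- out host-created userdata; a locked proxy table stands in for it here.

-- Private side table: wrapper -> raw handle. Weak keys let collected wrappers
-- drop their entry. Script code never receives a reference to this table.
local handles = setmetatable({}, { __mode = "k" })

local methods = {}

function methods.write_line(self, text)
  local id = handles[self]
  if id == nil then
    error("not a live TerminalSession", 2)
  end
  -- A real binding would submit a typed call for `id` here.
  return ("cap#%d <- %s"):format(id, text)
end

local wrapper_mt = {
  __index = methods,
  __metatable = "locked", -- getmetatable() returns this; setmetatable() fails
  __newindex = function()
    error("capability wrappers are read-only", 2)
  end,
}

-- Hypothetical host-side constructor; not reachable from script code.
local function wrap_terminal(raw_id)
  local cap = setmetatable({}, wrapper_mt)
  handles[cap] = raw_id
  return cap
end
```

A script holding the wrapper can call `cap:write_line(...)`, but a table it builds itself, such as `{ id = 7 }`, is not a key in `handles`, so passing it as `self` fails instead of reaching the handle.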

## Architecture

```mermaid
flowchart TD
    Shell[capos-shell] --> Launcher[RestrictedLauncher]
    Launcher --> Runner[capos-lua process]
    Runner --> Lua[PUC Lua VM]
    Runner --> Rt[capos-rt / libcapos host API]
    Rt --> Ring[capability ring]
    Ring --> Kernel[kernel CapObject dispatch]
    Ring --> Services[userspace services]

    ScriptPkg[ScriptPackage or Namespace cap] --> Runner
    Terminal[TerminalSession cap] --> Runner
    OtherCaps[Exact service caps] --> Runner
```

`capos-lua` is just another binary launched by the shell or init-owned
service graph. The parent chooses the script source and the exact caps. The
runner creates one Lua state, installs selected libraries, wraps granted caps
as userdata, loads the script with a controlled environment, executes it in
protected mode, flushes queued releases, and exits with a normal process
status.

The initial implementation should be a standalone runner, not Lua embedded in
`capos-shell`. Keeping the runner as a child process prevents script bugs,
Lua VM bugs, and accidental infinite loops from corrupting the interactive
shell state. It also gives QEMU smokes a clear process boundary to inspect.

## Version Choice

Use PUC Lua, not LuaJIT, for the first runner.

As of 2026-04-25, Lua 5.5.0 is the current upstream series and has features
that fit capOS scripting: explicit global declarations, compact arrays, and
static fixed binaries. It is the right default target for new capOS-native
scripts.

Keep a narrow compatibility option open for Lua 5.4.8 if imported scripts or
libraries require it. Do not mix bytecode or native modules between Lua
series. A script package should declare:

```text
language = "lua"
series = "5.5"
entry = "main.lua"
```

Source scripts are preferable to precompiled chunks for reviewability. If
precompiled chunks are allowed later, they must be tied to the exact runtime
series and treated as trusted build inputs.
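
A runner could check that declaration before loading anything. The following is a minimal sketch, assuming the declaration has already been parsed into a Lua table; `RUNNER` and `check_package` are hypothetical names, not part of any existing capOS interface.

```lua
-- Assumed identity of this runner build; a piccolo-based smoke would declare
-- a different runtime rather than claim a PUC Lua series.
local RUNNER = { language = "lua", series = "5.5" }

-- Returns the entry file name, or nil plus a reason when the package does not
-- match this runner.
local function check_package(decl)
  if decl.language ~= RUNNER.language then
    return nil, "unsupported language: " .. tostring(decl.language)
  end
  if decl.series ~= RUNNER.series then
    return nil, string.format("series mismatch: package wants %s, runner is %s",
      tostring(decl.series), RUNNER.series)
  end
  if type(decl.entry) ~= "string" or decl.entry == "" then
    return nil, "missing entry"
  end
  return decl.entry
end
```

Rejecting a mismatched series up front keeps a 5.4 package from being half-executed by a 5.5 runner.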

There is one practical sequencing exception: a `piccolo`-based
`capos-lua-smoke` may be the fastest way to prove the capOS host API before C
userspace support exists. That should be treated as an implementation
bootstrap, not as a promise of exact PUC Lua compatibility. If capOS takes that
route, the smoke should declare the runtime as `piccolo` rather than `lua-5.5`.

## Host API

The first host API should be explicit and boring:

```lua
local capos = require("capos")

local terminal = capos.require_cap("terminal", "TerminalSession")
terminal:write_line("hello from Lua")

local now = capos.require_cap("timer", "Timer"):now()
terminal:write_line("now_ns=" .. tostring(now))
```

`capos.require_cap(name, interface)` looks up a bootstrap cap by manifest name
and checks the expected interface metadata before returning userdata. It fails
closed if the cap is absent or has the wrong interface.
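
The fail-closed lookup can be sketched in plain Lua. This is a model only: the `bootstrap` table stands in for the process CapSet, its entries are made up, and the real function would return wrapped userdata rather than the table entry.

```lua
-- Stand-in for the bootstrap CapSet: manifest name -> interface metadata.
local bootstrap = {
  terminal = { interface = "TerminalSession" },
  timer    = { interface = "Timer" },
}

local function require_cap(name, interface)
  local entry = bootstrap[name]
  if entry == nil then
    -- Absent cap: raise instead of returning nil, so scripts cannot
    -- silently continue without the authority they expected.
    error(("missing cap '%s'"):format(tostring(name)), 2)
  end
  if entry.interface ~= interface then
    error(("cap '%s' is %s, expected %s"):format(
      tostring(name), entry.interface, tostring(interface)), 2)
  end
  return entry -- the real runner would return host-owned userdata here
end
```

A script that wants to degrade gracefully can wrap the lookup in `pcall`; the default path is still an error rather than a usable placeholder.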

Generated or handwritten bindings should expose method names, not method
numbers. The binding owns Cap'n Proto serialization through `capos-rt` or
`libcapos`; scripts should not construct raw SQEs, raw method IDs, transfer
descriptors, or `cap_enter` calls.
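
One way to picture name-based bindings is a small factory that turns interface metadata into methods. Everything below is an assumption for illustration: the metadata shape, the ordinal `0`, and the `submit` callback, which stands in for the serialization and ring call that `capos-rt` or `libcapos` would own.

```lua
-- Hypothetical interface metadata; real ordinals would come from the schema.
local TimerMeta = { name = "Timer", methods = { now = 0 } }

-- Build a script-facing object whose methods are named, with the ordinal
-- captured in a closure the script cannot inspect.
local function make_binding(iface, submit)
  local methods = {}
  for method_name, ordinal in pairs(iface.methods) do
    methods[method_name] = function(_, params)
      return submit(iface.name, ordinal, params)
    end
  end
  return setmetatable({}, { __index = methods, __metatable = "locked" })
end

-- Recording transport used only to show what the binding submits.
local log = {}
local function fake_submit(iface_name, ordinal, _params)
  log[#log + 1] = ("%s#%d"):format(iface_name, ordinal)
  return 42
end

local timer = make_binding(TimerMeta, fake_submit)
local now = timer:now() -- the script sees a name, never the ordinal
```

Method names that are not in the metadata resolve to `nil`, so a typo fails at the call site instead of dispatching to an unintended ordinal.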

Transferred result caps become owned Lua userdata. Release is deterministic
when possible:

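```lua
-- `launcher` is assumed to be a launcher cap obtained earlier, for example
-- through capos.require_cap; `terminal` is the cap from the previous example.
do
  -- <close> releases the child handle when this block exits, even on error.
  local h <close> = launcher:spawn({
    name = "child",
    binary = "timer-smoke",
    grants = { terminal = terminal },
  })
  local code = h:wait()
  terminal:write_line("child exit=" .. tostring(code))
end
```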

Finalizers may queue cleanup, but they are not the primary lifetime contract.
The runner must flush owned-handle releases at script return and process exit.
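
The two release paths can be modelled without capOS. The sketch below assumes Lua 5.4 or later for `<close>`; `adopt` and `flush_owned` are hypothetical names for the host-side adoption of a transferred cap and the runner's end-of-script sweep.

```lua
local released = {} -- names of handles, in release order
local live = {}     -- set of handles the script still owns

-- Idempotent release: closing twice, or closing then flushing, is harmless.
local function release(handle)
  if live[handle] then
    live[handle] = nil
    released[#released + 1] = handle.name
  end
end

-- __close runs when a <close> variable leaves scope, including on error.
local Owned = { __close = release }

-- Hypothetical: wrap a transferred result cap as an owned handle.
local function adopt(name)
  local handle = setmetatable({ name = name }, Owned)
  live[handle] = true
  return handle
end

-- Runner-side sweep at script return; releases whatever the script kept.
local function flush_owned()
  for handle in pairs(live) do
    release(handle)
  end
end

do
  local child <close> = adopt("child")
  -- ... use child ...
end                            -- "child" is released here

local leaked = adopt("leaked") -- never closed by the script
flush_owned()                  -- "leaked" is released by the runner
```

A `__gc` finalizer could be added as a further backstop, but its timing is up to the collector, which is why it is not the primary contract.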

## Standard Library Policy

Initial allowed libraries:

| Library | Policy |
| --- | --- |
| `base` | Load selected safe functions. `load` is allowed only with text mode and a supplied environment. |
| `coroutine` | Allowed for cooperative script structure. It does not map to OS threads. |
| `table`, `string`, `math`, `utf8` | Allowed. |
| `debug` | Denied by default. It pierces ordinary Lua abstraction and should require an explicit developer-profile cap. |
| `io` | Denied by default. Replace with `capos` wrappers over `TerminalSession`, future `File`, `ByteStream`, or `Namespace` caps. |
| `os` | Denied by default. Replace time, exit, and process operations with cap-backed methods. |
| `package` | Restricted. `require` searches a script package or namespace cap, not host paths or environment variables. |
| dynamic C modules | Denied until native module loading has a reviewed authority model. |

Lua `_ENV` is useful for presenting a small global namespace, but it is not a
security boundary by itself. The security boundary is the process plus its
CapSet.
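
The table above maps onto a curated environment passed to `load`. The following sketch uses only standard Lua 5.4 functions; the choice of base functions and the `sink` table, which stands in for a `TerminalSession`, are assumptions. It shows namespace shaping, not confinement.

```lua
local function copy(t)
  local c = {}
  for k, v in pairs(t) do c[k] = v end
  return c
end

-- Build the script-visible globals. Libraries are shallow-copied so script
-- writes such as `string.rep = nil` do not alter the host's tables.
local function make_env(sink)
  return {
    assert = assert, error = error, ipairs = ipairs, pairs = pairs,
    pcall = pcall, select = select, tostring = tostring,
    tonumber = tonumber, type = type,
    coroutine = copy(coroutine), table = copy(table),
    string = copy(string), math = copy(math), utf8 = copy(utf8),
    -- print is routed to the sink instead of host stdout.
    print = function(...)
      local parts = {}
      for i = 1, select("#", ...) do
        parts[i] = tostring((select(i, ...)))
      end
      sink[#sink + 1] = table.concat(parts, "\t")
    end,
  }
end

local function run_script(source, env)
  -- Mode "t" makes load reject precompiled (binary) chunks.
  local chunk, err = load(source, "=script", "t", env)
  if not chunk then
    return false, err
  end
  return pcall(chunk)
end
```

Inside such a chunk, `io`, `os`, `debug`, and `package` are simply `nil` globals. String methods still resolve through the shared string metatable, one reason the environment alone is not treated as the boundary.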

## Script Sources

The current `ProcessSpawner.spawn` shape names a binary and grants caps; it
does not yet pass arbitrary argument vectors or script blobs. General Lua
scripting therefore depends on some other way to hand the runner its script.

Near-term options, in order:

1. **Smoke-only compiled script:** `capos-lua-smoke` statically embeds one
   script string in `.rodata` and proves the host API. This is not the general
   product, but it verifies the Lua VM, allocator, CapSet lookup, and terminal
   output without new startup ABI.

2. **Runner config cap:** init or the shell grants a read-only `ScriptPackage`
   or `ConfigBlob` cap to `capos-lua`. The runner asks that cap for `main.lua`
   and module bytes. This keeps script data out of the kernel and fits the
   existing capability model.

3. **Storage-backed scripts:** after Store/Namespace exists, scripts live
   under a granted namespace. `require` searches only that namespace and only
   through a read-only script-package view unless the script also receives a
   writable namespace cap.

Do not add a Lua-specific boot manifest field or kernel cap. Script packaging
belongs to init, shell, storage, or a userspace package service.
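
Options 2 and 3 both imply a `require` that searches only the granted package. A minimal sketch, assuming a plain table of module name to source text as a stand-in for a `ScriptPackage` or namespace cap (`package_files` and `make_require` are hypothetical):

```lua
-- Stand-in for a read-only script package: module name -> source text.
local package_files = {
  main  = "local greet = require('greet'); return greet('capOS')",
  greet = "return function(who) return 'hello ' .. who end",
}

-- Build a require that resolves names only against `files`. There is no
-- path search, no environment variable lookup, and no native loader.
local function make_require(files, env)
  local loaded = {}
  return function(name)
    if loaded[name] ~= nil then
      return loaded[name]
    end
    local source = files[name]
    if type(source) ~= "string" then
      error("module not in script package: " .. tostring(name), 2)
    end
    -- Text mode only, and modules share the caller's curated environment.
    local chunk, err = load(source, "=" .. name, "t", env)
    if not chunk then
      error(err, 2)
    end
    local result = chunk(name)
    if result == nil then
      result = true
    end
    loaded[name] = result
    return result
  end
end

local env = { tostring = tostring }
env.require = make_require(package_files, env)
```

Names such as `os` or `../etc/passwd` are just keys that do not exist in the package, so they fail like any other missing module. Cyclic requires are not handled in this sketch.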

## Shell Integration

The shell should treat Lua as a launched workload:

```text
run "capos-lua" with {
  terminal: @terminal
  timer:    @timer
  scripts:  @home.sub("scripts/admin")
}
```

Later, the shell can add sugar such as:

```text
lua scripts/admin/inspect.lua with { terminal: @terminal, timer: @timer }
```

That sugar must compile to the same explicit spawn plan. There is no implicit
inheritance of the shell's full current CapSet.

Agent mode can also use Lua, but Lua should be a tool target rather than the
model itself. The agent runner may advertise "run this approved Lua script" as
a consent-gated tool. The model still does not receive session caps.

## Adventure Game Use

The adventure game is a good later demonstration target because it needs both
strict authority and authorable behavior. The kernel and service capabilities
still enforce authority; Lua should only express deterministic scenario logic
over the caps granted to the script runner.

Suitable Lua-owned behavior:

- mission beat selection,
- deterministic NPC dialogue state machines,
- quest-board text,
- hint selection,
- debrief variants,
- scripted reactions that call typed game APIs through granted object caps.

Unsuitable Lua-owned behavior:

- deciding whether a player has authority,
- mutating relic custody without a typed service call,
- applying combat damage outside the game service,
- minting or transferring caps,
- holding broad spawn, debug, filesystem, or network authority by default.

The useful proof is language independence: a Rust adventure service and a Lua
scenario script should both demonstrate proper capability use, including
bounded failures when a script lacks a required cap.
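
As an illustration of Lua-owned scenario logic, a deterministic dialogue state machine can be a plain table plus a transition function. The states, lines, and choices below are invented for the example; nothing here touches custody, damage, or caps.

```lua
-- Dialogue graph: state -> spoken line and allowed transitions.
local dialogue = {
  greet = {
    line = "Halt. State your business.",
    choices = { trade = "offer", leave = "done" },
  },
  offer = {
    line = "Show me the relic.",
    choices = { leave = "done" },
  },
  done = {
    line = "Move along.",
    choices = {},
  },
}

-- Pure transition function: same inputs always give the same next state.
-- Unknown states or choices leave the state unchanged.
local function step(state, choice)
  local node = dialogue[state]
  local next_state = node and node.choices[choice]
  return next_state or state
end
```

Any effect on game state would still go through a typed call on a granted object cap; the table only decides which line to say next.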

## Blocking, Async, and Coroutines

The first runner can use synchronous typed client calls over the existing
single-owner ring client. A blocking Lua method blocks the runner process,
which is acceptable for the first operator-script use case.

Coroutines provide script-local cooperative structure, not OS scheduling. A
future runtime reactor can resume Lua coroutines when capability completions
arrive, but that should wait until the capOS runtime has a general demux path
for threaded and async clients. Do not design Lua-specific CQ demultiplexing.
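
The script-side shape of that future model can be shown with coroutines alone. This sketch is not a completion-queue design: `complete` is a stand-in that answers every request immediately, and `call` and `run_all` are hypothetical names.

```lua
-- Script-facing call: looks synchronous, but yields the request to the host.
local function call(method, arg)
  return coroutine.yield({ method = method, arg = arg })
end

-- Host-side loop: resume each task with the reply to its last request.
local function run_all(task_fns, complete)
  local queue = {}
  for i, fn in ipairs(task_fns) do
    queue[i] = { co = coroutine.create(fn), reply = nil }
  end
  while #queue > 0 do
    local task = table.remove(queue, 1)
    local ok, request = coroutine.resume(task.co, task.reply)
    if not ok then
      error(request, 0)
    end
    if coroutine.status(task.co) ~= "dead" then
      task.reply = complete(request) -- stand-in for a ring completion
      queue[#queue + 1] = task
    end
  end
end

local trace = {}
local function worker(label)
  return function()
    local t = call("Timer.now")
    trace[#trace + 1] = label .. ":" .. t
  end
end

local clock = 0
local function fake_complete(_request)
  clock = clock + 1
  return clock
end

run_all({ worker("a"), worker("b") }, fake_complete)
```

Both workers issue their request before either sees a reply, which is the interleaving a reactor would provide; scheduling stays inside one OS-level process.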

## Security Model

Threat boundaries:

- Script source is untrusted input and must be parsed and loaded in protected
  mode, so malformed source surfaces as a bounded script error rather than a
  runner crash.
- Script packages are trusted build or storage inputs only when their source,
  digest, author, and runtime series are review-visible.
- The Lua VM is not trusted to confine hostile code inside a privileged host
  process.
- Capability wrappers must validate method parameters, buffer sizes, transfer
  counts, and result-cap interface IDs before translating Lua values into ring
  calls.
- Terminal and audit output must not print secrets. Lua error rendering should
  use bounded messages and avoid dumping arbitrary cap userdata internals.
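
The wrapper-validation and bounded-error points can be sketched as two small helpers. The limits and function names are assumptions chosen for the example, not values from capOS.

```lua
local MAX_LINE_BYTES = 256  -- assumed per-call payload limit
local MAX_ERROR_BYTES = 120 -- assumed rendered-error limit

-- Validate a write_line argument before it would be serialized.
local function check_line(text)
  if type(text) ~= "string" then
    error("write_line expects a string, got " .. type(text), 2)
  end
  if #text > MAX_LINE_BYTES then
    error("write_line argument exceeds " .. MAX_LINE_BYTES .. " bytes", 2)
  end
  return text
end

-- Render an error value for terminal or audit output. Non-string values are
-- described by type only, so wrapper internals and addresses are not printed.
local function render_error(err)
  local msg
  if type(err) == "string" then
    msg = err
  else
    msg = "(" .. type(err) .. " error value)"
  end
  if #msg > MAX_ERROR_BYTES then
    -- Byte-based cut; a real renderer might trim to a UTF-8 boundary.
    msg = msg:sub(1, MAX_ERROR_BYTES) .. "..."
  end
  return msg
end
```

The runner would typically call `render_error` on the second result of `pcall` around the script entry point.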

Default deny list for untrusted scripts:

- no `debug`,
- no dynamic module loading,
- no raw `os`/`io`,
- no broad `ProcessSpawner`,
- no broad network manager,
- no boot package,
- no mutable namespace unless that is the explicit script purpose,
- no host environment variables.

Quotas matter. The first useful quota is process memory. CPU budgets, timer
budgets, and capability-call quotas should follow the normal capOS scheduling
and resource-accounting path rather than special Lua hooks.

## Implementation Phases

### Phase 0: Contract and Host Surface

- Add this proposal and update the userspace-binaries language note.
- Decide the initial Lua series in a checked-in design note or manifest field.
- Define the minimal `capos` Lua host library:
  `require_cap`, `interfaces`, `print` routing, error rendering, and owned-cap
  release.
- Decide whether the first proof waits for PUC Lua via C/`libcapos` or forks
  `piccolo` into a `no_std + alloc` temporary Rust VM path. If `piccolo` is
  used, keep the compatibility contract explicit and do not label the runner
  as Lua 5.5.

### Phase 1: Native Runner Smoke

- Build a static `capos-lua-smoke` userspace binary.
- Load selected Lua libraries only.
- Expose `TerminalSession.writeLine` and `Timer.now`, surfaced to scripts as
  `terminal:write_line` and `timer:now`.
- Run one embedded script in QEMU and assert output plus absence of denied
  `io`/`os`/`debug` APIs.
- Verify wrong-interface and missing-cap failures are typed script errors, not
  panics.

### Phase 2: Script Package Input

- Add a userspace-owned script source cap or startup-config path.
- Let shell/init launch `capos-lua` with a selected package and exact grants.
- Implement restricted `require` over the package.
- Add QEMU proof for a granted `TerminalSession` call and a denied ungranted
  cap lookup.

### Phase 3: Generated Capability Bindings

- Generate Lua binding metadata from `schema/capos.capnp` or from the same
  interface registry used by the native shell.
- Expose method names and structured params/results.
- Add transfer-result cap adoption and deterministic release tests.
- Keep raw Cap'n Proto builders out of script code unless a separate developer
  diagnostic cap grants that power.

### Phase 4: Shell and Service Use

- Add shell sugar for script execution after the exact spawn plan exists.
- Permit trusted services to embed Lua only when they can prove the embedded
  state holds no extra authority beyond what the script should use.
- Add audit records for script launch, script package digest, grants, exit
  status, and authority-touching cap calls when audit caps are available.

## Validation

The first implementation is not complete until it has QEMU evidence:

- A Lua script prints through a granted `TerminalSession`.
- The same script cannot use `io`, `os.execute`, `debug`, or an ungranted cap.
- A missing or wrong-interface cap lookup returns a bounded Lua error.
- An owned result cap is released deterministically.
- The runner exits cleanly and does not wedge the shell.

Host tests should cover Lua value conversion and binding generation once those
pieces are pure enough to test outside QEMU. Do not claim "Lua scripting
works" from host tests alone; the useful behavior is authority-shaped process
execution in capOS.

## Open Questions

- Whether the initial implementation should wait for `libcapos` C support or
  use a temporary Rust Lua VM to prove the host API earlier.
- The exact startup-config mechanism for selecting `main.lua` before storage
  and general process arguments exist.
- Whether Lua 5.5 should be the only supported series or whether a 5.4 runner
  is worth carrying for ecosystem compatibility.
- How much schema reflection the Lua binding should expose before the native
  shell's generic call surface lands.
- Which audit fields belong in `AuditLog` once script launch becomes an
  operator workflow rather than a smoke.
