Proposal: Scientific Standard Package And Agent Lab Capabilities
capOS should eventually ship a curated scientific standard package: a capability-scoped service graph that gives agents and users high-level access to computer algebra, numerical computing, solvers, formal proof systems, notebooks, reproducible package environments, and experiment records.
This is not a request to turn the kernel into a scientific runtime. The kernel still provides capability tables, address spaces, scheduling, IPC, memory, device, and storage primitives. The scientific package lives in userspace, above package, workspace, job-graph, model, and broker services.
Design Grounding
Local grounding:
- Scientific Agent-Lab Software Stack
- Linux Sandboxes And Virtualization For Workloads
- NO_HZ, SQPOLL, and Realtime Scheduling
- Language Models and Agent Runtime
- capOS-Hosted Agent Swarms
- Userspace Binaries
- Stateful Task and Job Graphs
- Storage and Naming
- HPC Parallel Processing Patterns
- GPU Capability
- System Performance Benchmarks
External grounding is summarized in the research notes and covers PARI/GP, SageMath, GAP, Singular, OSCAR, SymPy, SciPy, R, Octave, JupyterLab, Z3, cvc5, HiGHS, SCIP, OR-Tools, JuMP, CVXPY, Lean/mathlib, Rocq, Isabelle, Agda, Spack, Guix-HPC, Nix, Apptainer, Linux namespaces/cgroups/seccomp/Landlock, User-Mode Linux, gVisor, QEMU/KVM, Firecracker, Kata Containers, and Linux CPU isolation/housekeeping.
Goals
- Give users, agent runners, and batch services high-level scientific capabilities without granting an unrestricted shell.
- Make exact computation, numerical computation, optimization, SMT solving, and proof checking ordinary capOS services with explicit authority.
- Preserve reproducibility: package closure, input data, seed, backend, version, timeout, quota, output, and audit metadata travel with every result.
- Reuse mature upstream tools wherever possible.
- Support both interactive research and unattended agent jobs.
- Keep tool authority separate from model inference. Models propose; trusted capOS runners execute through broker policy.
Non-Goals
- Do not invent a replacement for SageMath, PARI, GAP, Singular, OSCAR, SciPy, Jupyter, Lean, Rocq, Isabelle, or established solvers.
- Do not add POSIX, Docker, Conda, Nix, Guix, or Spack as ambient system authority.
- Do not make notebook execution equivalent to shell access.
- Do not treat SMT or CAS answers as formal proof unless a proof checker validates an artifact.
- Do not make this package part of the active in-process threading milestone.
Package Profiles
The standard package should be split into explicit profiles so capOS can ship or grant only what a session needs.
| Profile | Contents | Primary use |
|---|---|---|
| scientific-base | PARI/GP or PARI C service, SymPy, Z3, cvc5, HiGHS, Lean checker, artifact store | Low-risk exact math, solver, and proof assistance |
| scientific-research | SageMath, GAP, Singular, OSCAR/Julia, R, Octave, SciPy, JuMP, CVXPY, SCIP, OR-Tools | Full interactive research workflows |
| scientific-notebook | Jupyter-compatible notebook/session service and language kernels | Literate experiments with replayable artifacts |
| scientific-lab | Experiment registry, workspaces, job graphs, retrieval, review gates, GPU/model integration | Long-running research labs with users, agents, and review workflows |
| scientific-commercial | Optional proprietary/commercial connectors such as Wolfram Engine or commercial solvers | Explicitly licensed site-local extensions |
Profiles grant service roots, not every concrete backend cap. A user, agent
runner, or batch service normally receives a ScientificSession facade that
advertises only the tools and methods permitted for the current session.
Capability Surface
Catalog And Environment
- ScientificCatalog: lists installed profiles, backend identities, supported interfaces, licenses, package closures, and known reproducibility caveats.
- PackageCatalog: resolves named package environments to content-addressed closures.
- PackageClosure: immutable description of packages, build inputs, toolchain versions, hashes, license metadata, vulnerability metadata, and supported CPU/GPU features.
- Environment: starts a bounded interpreter, solver, proof, notebook, or job process with exactly the selected closure and granted caps.
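The catalog-to-closure relationship above can be sketched as a content-addressed record. The `PackageClosure` and `PackageCatalog` names come from this proposal; the exact field set, the canonical-JSON encoding, and the SHA-256 id scheme are illustrative assumptions, not a committed wire format:

```python
import hashlib
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class PackageClosure:
    """Immutable closure description; the id is derived from its contents."""
    packages: tuple      # e.g. (("pari", "2.15.4"), ("sympy", "1.12"))
    toolchain: str       # toolchain/version identifier
    licenses: tuple      # license metadata per package
    cpu_features: tuple  # supported CPU/GPU features

    def closure_id(self) -> str:
        # Content-addressed id: hash of a canonical JSON encoding, so the
        # same contents always resolve to the same id.
        blob = json.dumps(
            {"packages": self.packages, "toolchain": self.toolchain,
             "licenses": self.licenses, "cpu_features": self.cpu_features},
            sort_keys=True).encode()
        return "closure:" + hashlib.sha256(blob).hexdigest()[:16]

class PackageCatalog:
    """Resolves named package environments to content-addressed closures."""
    def __init__(self):
        self._by_name = {}

    def register(self, name: str, closure: PackageClosure) -> str:
        self._by_name[name] = closure
        return closure.closure_id()

    def resolve(self, name: str) -> PackageClosure:
        return self._by_name[name]
```

The design point is that an `Environment` start request would cite a closure id, never a mutable package name, so every run record pins exactly what it executed.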
Workspaces And Artifacts
- ResearchWorkspace: branchable namespace for source, notebooks, data, generated files, proofs, and run records.
- ArtifactStore: immutable objects for solver inputs, proof logs, notebooks, datasets, plots, tables, binaries, and transcripts.
- ProvenanceLog: append-only record of who or which agent produced an artifact, with model/tool/package/session metadata.
- ExperimentRegistry: immutable run specifications plus mutable review status, labels, and publication decisions.
CAS And Mathematical Services
- ComputerAlgebra: general symbolic manipulation facade for factorization, simplification, integration, exact linear algebra, polynomial operations, and expression normalization.
- NumberTheory: PARI-backed exact number theory, elliptic curves, modular forms, algebraic number fields, L-functions, and related computations.
- DiscreteAlgebra: GAP-backed group, representation, finite algebra, and combinatorics workflows.
- PolynomialAlgebra: Singular-backed ideals, modules, Groebner bases, quotient rings, and algebraic geometry computations.
- JuliaAlgebraKernel: OSCAR/Nemo/Hecke/AbstractAlgebra workflows for cases where a general Julia session is the correct backend.
Each method returns structured values when practical and always records the backend, package closure, input, output, elapsed time, and resource envelope.
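The per-call record described above can be made concrete as a small result envelope. This is a minimal Python sketch under stated assumptions: `CasResult` and `run_cas_call` are hypothetical names, and the field set is only the minimum the proposal requires (backend, closure, input, output, elapsed time, resource envelope):

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class CasResult:
    """Record attached to every CAS call, per the reproducibility goals."""
    backend: str       # e.g. "pari/gp 2.15"
    closure_id: str    # content-addressed package closure id
    input_expr: str
    output_value: str
    elapsed_s: float
    cpu_ms_limit: int  # resource envelope actually granted
    mem_mb_limit: int

def run_cas_call(backend, closure_id, expr, evaluate, cpu_ms=1000, mem_mb=256):
    """Wrap a backend call so the result always carries its provenance."""
    start = time.monotonic()
    value = evaluate(expr)  # the actual backend invocation goes here
    return CasResult(backend, closure_id, expr, str(value),
                     time.monotonic() - start, cpu_ms, mem_mb)
```

Because the envelope is immutable and travels with the value, a later reviewer can re-run the same input under the same closure id without consulting out-of-band state.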
Solvers
- SmtSolver: typed SMT-LIB import/export, assertions, check-sat, model, unsat core, timeout, random seed, proof/certificate metadata, backend selection among Z3/cvc5 or future solvers.
- OptimizationSolver: LP, MIP, QP, conic, CP-SAT, routing, scheduling, and nonlinear solve jobs with declared model format, backend, objective, constraints, time limit, memory limit, gap/tolerance policy, and solution status.
- ModelingSession: JuMP/CVXPY/OR-Tools-style language session for models that need high-level construction rather than direct serialized input.
Solver calls must distinguish optimal, feasible, infeasible,
unbounded, unknown, timeout, resource_exceeded, and backend_error.
User-facing tools should not collapse these into a single textual answer.
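The status distinction above can be encoded directly in the result type so callers cannot collapse outcomes by accident. A minimal Python sketch; `SolveStatus` and `describe` are hypothetical names covering exactly the statuses this proposal lists:

```python
from enum import Enum

class SolveStatus(Enum):
    OPTIMAL = "optimal"
    FEASIBLE = "feasible"
    INFEASIBLE = "infeasible"
    UNBOUNDED = "unbounded"
    UNKNOWN = "unknown"
    TIMEOUT = "timeout"
    RESOURCE_EXCEEDED = "resource_exceeded"
    BACKEND_ERROR = "backend_error"

# Statuses where reporting an incumbent objective value is meaningful.
HAS_SOLUTION = {SolveStatus.OPTIMAL, SolveStatus.FEASIBLE}

def describe(status: SolveStatus, objective=None) -> str:
    """Render a status without collapsing distinct outcomes into one answer:
    an objective is only reported when a solution actually exists."""
    if status in HAS_SOLUTION and objective is not None:
        return f"{status.value} (objective={objective})"
    return status.value
```

Note that `feasible` and `optimal` are deliberately distinct: a MIP stopped at its time limit may hold an incumbent that is feasible but unproven, and a user-facing tool must say so.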
Formal Proof
- ProofCatalog: installed proof assistants, libraries, theorem indexes, and package closures.
- ProofSession: checkout, edit, build, query goals, run tactics, run tests, and produce checked proof artifacts.
- ProofChecker: batch verification of a named theorem or project under a pinned closure.
- LemmaSearch: retrieval over local proof libraries, declarations, docs, and prior accepted project artifacts.
The first implementation target should be Lean plus mathlib because it is the most useful default for current agent-assisted mathematics. Rocq, Isabelle, and Agda should remain first-class future backends with separate kernels and project layouts.
Notebook And Interactive Kernel Sessions
- NotebookDocument: immutable or branchable notebook object with cells, outputs, attachments, environment id, and execution provenance.
- NotebookSession: starts kernels, executes cells, captures outputs, renders rich media, and gates side effects.
- KernelSession: Python, Sage, Julia, R, Octave, Lean, GAP, PARI, or other REPL-like process with explicit workspace and package environment caps.
Notebook execution is authority-bearing. Opening a notebook for reading should not execute it. Running a notebook should prompt or use session policy for network access, package installation, writes outside the workspace, long jobs, credential access, GPU use, and publication.
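The gating rule above can be sketched as a pure policy check run before each cell executes. All names here (`SessionPolicy`, `check_cell_effects`, the effect labels) are hypothetical; the effect list is exactly the one enumerated in the paragraph above:

```python
from dataclasses import dataclass, field

# Side effects that must be gated per this proposal's notebook policy.
GATED_EFFECTS = {"network", "package_install", "write_outside_workspace",
                 "long_job", "credential_access", "gpu", "publish"}

@dataclass
class SessionPolicy:
    allowed: set = field(default_factory=set)  # effects pre-granted this session

def check_cell_effects(policy: SessionPolicy, requested: set):
    """Split a cell's requested effects into (granted, needs_prompt).
    Opening a notebook for reading never calls this; only execution does."""
    unknown = requested - GATED_EFFECTS
    if unknown:
        raise ValueError(f"unrecognized effects: {sorted(unknown)}")
    granted = requested & policy.allowed
    return granted, requested - granted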
Agent Lab Architecture
An LLM agent research lab on capOS should be a service graph:
flowchart LR
User[User] --> Runner[Agent Runner]
Runner --> Broker[AuthorityBroker]
Runner --> Model[LanguageModel]
Runner --> Sci[ScientificSession]
Sci --> CAS[CAS Services]
Sci --> Solvers[Solver Services]
Sci --> Proof[Proof Services]
Sci --> Notebook[NotebookSession]
Sci --> Jobs[JobGraph]
Sci --> Workspace[ResearchWorkspace]
Workspace --> Artifacts[ArtifactStore]
Jobs --> Compute[CPU/GPU/Storage/Network Caps]
Runner --> Audit[ProvenanceLog]
The runner owns the user session and applies tool policy. The model service
does not hold scientific tool caps directly. Tool calls from the model become
typed proposals to the runner, and the runner invokes ScientificSession
methods only when broker policy allows it.
Authority And Safety Rules
- A tool cap grants only the named interface.
NumberTheorydoes not imply file, shell, network, package install, or proof-publication authority. - Package installation and environment resolution are separate authorities from executing an already-pinned environment.
- External network fetch is separate from local computation. Literature search, package download, model-provider calls, and dataset upload are different caps.
- Every long-running calculation must have a job id, quota, cancellation path, and durable status.
- GPU use requires a GPU/session cap and should record driver/runtime/kernel metadata.
- Proof acceptance must be checker-backed. Agent confidence, CAS evidence, or SMT success is advisory unless the proof kernel accepts the artifact.
- Published results must cite the artifact ids and package closure ids that produced them.
- Commercial or proprietary engines must be opt-in, labeled, and grantable only through site policy.
- Linux workload placement must distinguish ordinary resource-limited work
from capOS-native auto-nohz-eligible work. Linux
nohz_fullinside a guest may be useful compatibility or benchmark state, but capOS CPU isolation, auto full-nohz activation, housekeeping placement, IRQ routing, and exclusive CPU use are outer scheduler-authority decisions, not options an agent tool descriptor can set by itself.
Linux Workload And Virtualization Strategy
The first implementation is likely to consume the generic Linux workload sandbox substrate for large scientific stacks. Scientific jobs should be selected by trust and compatibility class:
| Backend | Use | Boundary claim |
|---|---|---|
| namespace/cgroup/seccomp/Landlock sandbox | trusted batch tools and fast command wrappers | shares host Linux kernel; useful policy layer, not strong multi-tenant isolation |
| bubblewrap/nsjail | early command-wrapper executor for gp, solvers, proof checkers, and scripts | structured process sandbox over Linux primitives |
| User-Mode Linux | developer/debug fallback when KVM is unavailable | Linux-as-host-process compatibility; not the main strong-isolation path |
| gVisor | container-compatible higher-risk workloads | per-sandbox application kernel reduces direct host-kernel exposure |
| QEMU/KVM Linux guest | broad compatibility, full distro roots, package builds, untrusted notebooks | hardware-backed guest kernel boundary |
| Firecracker or Kata-style microVM | repeated stateless solver/proof/notebook jobs with narrow device models | hardware-backed microVM boundary with smaller operational surface |
| dedicated host or single-tenant node | high-risk tenants, sensitive data, GPU/device passthrough, side-channel-sensitive jobs, long-lived browser/GUI workloads | reduces shared-host and VM-escape blast radius beyond ordinary VM tenancy |
The generic LinuxWorkloadSandbox service should record backend,
image/rootfs/package hashes,
sandbox policy or VM device model, kernel version, CPU affinity, cgroup quota,
deployment location, external-host placement metadata, capOS
NoHzEligibility/NoHzActivation state for capOS-scheduled proxies or VMMs,
guest tickless/nohz state, network policy, artifact inputs, artifact outputs,
and exit reason. A result from a namespace sandbox and a result from a KVM
guest may be functionally equivalent, but their security, scheduler, and
reproducibility claims are different.
For the native capOS auto full-nohz scheduler track, scientific jobs should use the generic workload placement classes:
- ordinary placement: cgroup v2 resource limits and optional affinity for normal solver, proof, CAS, package, and notebook jobs.
- auto-nohz-eligible placement: explicit capOS eligibility plus CPU-time authority for low-jitter benchmark, realtime, GPU-feed, SQPOLL-like, or latency-bound workload loops. The outer capOS scheduler must know the workload’s vCPU/helper/poller threads and must also account for housekeeping CPUs, IRQ placement, timers, and deferred kernel work. Guest Linux tickless state and external Linux-host isolation state are recorded separately and do not by themselves activate capOS nohz.
Existing Solutions To Adapt
| Area | Adapt first | Reasonable capOS adaptation |
|---|---|---|
| Number theory | PARI/GP | Wrap gp early; use PARI C library for stable service calls later. |
| Broad math | SageMath | Host as a Python/Sage kernel with pinned closure and notebook integration. |
| Discrete algebra | GAP | Wrap CLI and package loading; later expose common group-theory methods. |
| Polynomial algebra | Singular | Wrap command/batch mode; later expose polynomial/ideal operations. |
| Algebra research | OSCAR | Host Julia/OSCAR kernel; avoid flattening its object model prematurely. |
| Symbolic Python | SymPy | Embed in Python service for lightweight symbolic calls and code generation. |
| Scientific Python | NumPy/SciPy | Provide Python kernel and batch-job environments with BLAS/LAPACK metadata. |
| Statistics | R | Provide Rscript and R kernel sessions with package closure capture. |
| MATLAB-like workflows | GNU Octave | Provide batch and interactive kernel sessions. |
| SMT | Z3, cvc5 | Provide SmtSolver with backend identity, model, core, and timeout fields. |
| Optimization engines | HiGHS, SCIP, OR-Tools | Provide direct solve jobs and higher-level modeling sessions. |
| Modeling layers | JuMP, CVXPY | Host Julia/Python modeling kernels and export normalized model artifacts. |
| Formal proof | Lean/mathlib first; Rocq, Isabelle, Agda later | Provide proof sessions, build logs, theorem search, and checked artifacts. |
| Notebooks | JupyterLab model | Reuse .ipynb concepts and kernels but replace ambient authority with caps. |
| Package closure | Nix, Guix, Spack | Ingest closures and recipes; expose capOS package catalogs and Store objects. |
| HPC containers | Apptainer | Use as a Linux-sidecar compatibility bridge, not as native authority. |
Staged Implementation
Stage 0: Interface-Only Design
Define schemas for ScientificSession, ArtifactStore, PackageClosure,
SmtSolver, OptimizationSolver, ProofSession, and NotebookSession.
No backend porting is required. The goal is to make the authority and result
model reviewable.
Stage 1: Linux Sidecar Prototype
Run tools on a controlled Linux host or hardware-backed Linux guest and expose them to capOS through a capability proxy. Namespace/cgroup/seccomp/Landlock wrappers are acceptable for trusted batch tools, but untrusted notebooks, model-generated code, package builds, and multi-tenant jobs should use a QEMU/KVM guest first and Firecracker/Kata-style microVMs later. High-risk tenants, sensitive data, GPU/device passthrough, and side-channel-sensitive jobs may require single-tenant hosts instead of shared VM hosts. User-Mode Linux may remain a developer/debug fallback when KVM is unavailable, but it is not the default strong-isolation backend. This proves the API, audit, and reproducibility model before native userspace can run Python, Julia, R, and large C++ stacks.
Initial tools:
- PARI/GP;
- SymPy;
- Z3 and cvc5;
- HiGHS;
- Lean plus mathlib project build;
- immutable artifact store and provenance records.
Stage 2: Native Wrapper Services
When capOS userspace has the necessary binary/runtime support, add command
wrapper services for gp, lean/lake, z3, cvc5, highs, Rscript,
octave, gap, and Singular. Each wrapper runs with an explicit workspace,
environment, timeout, and resource ledger.
Stage 3: Notebook And Language Kernels
Add Jupyter-compatible document storage and kernel-launch policy. Python,
Sage, Julia, R, Octave, Lean, and GAP kernels can then run as KernelSession
services with capOS-owned artifact capture.
Stage 4: Package-Closure Store
Import or build Nix/Guix/Spack-style closures into capOS Store and
Namespace capabilities. Package resolution stays outside the kernel. The
important kernel-visible property is that executable environments are immutable
objects with explicit resource and authority grants.
Stage 5: Lab Workflow
Combine scientific sessions with hosted-agent workspaces, experiment registry, review gates, browser/literature tools, GPU/model services, and stateful job graphs. This is the point where capOS becomes a credible LLM agent research lab rather than a collection of math commands.
Open Questions
- Should the first sidecar protocol be Cap’n Proto RPC directly, MCP through a gateway, or both?
- Which package-closure source should capOS ingest first: Nix for breadth, Guix for scientific reproducibility, or Spack for HPC variants?
- Which hardware-backed Linux guest backend should be first after QEMU/KVM: Firecracker for narrow batch workers, Kata-style VM containers for OCI integration, or both?
- Which workload classes are eligible for capOS native auto full-nohz
placement, and how should that map to future
CpuIsolationLease,NoHzEligibility,NoHzActivation, andSchedulingContextauthority? - How much of
.ipynbshould be preserved versus represented as a capOS-native notebook object with import/export? - Which proof artifacts can be reduced to small trusted checker inputs, and which require full project build logs for confidence?
- How should floating-point nondeterminism and randomized solver behavior be summarized so agents do not overclaim exactness?
- Where should license policy live: package catalog, broker policy, or both?
Near-Term Recommendation
Do not start by porting SageMath or Jupyter. Start with a small
scientific-base sidecar proof:
NumberTheory.evalbacked by PARI/GP;SmtSolver.checkbacked by Z3 and cvc5;OptimizationSolver.solvebacked by HiGHS for LP/QP/MIP smoke cases;ProofChecker.buildbacked by Lean/mathlib for a pinned project;- immutable artifact/provenance records for every call.
That profile gives users, agent runners, and batch services exact arithmetic, constraint checking, optimization, and formal proof validation while keeping authority narrow enough for review. SageMath, OSCAR, Jupyter, R, Octave, and full package closure support should follow after the base interfaces and audit model are credible.