Proposal: Scientific Standard Package And Agent Lab Capabilities
capOS should eventually ship a curated scientific standard package: a capability-scoped service graph that gives agents and users high-level access to computer algebra, numerical computing, solvers, formal proof systems, notebooks, reproducible package environments, and experiment records.
This is not a request to turn the kernel into a scientific runtime. The kernel still provides capability tables, address spaces, scheduling, IPC, memory, device, and storage primitives. The scientific package lives in userspace, above package, workspace, job-graph, model, and broker services.
Design Grounding
Local grounding:
- Scientific Agent-Lab Software Stack
- Linux Sandboxes And Virtualization For Workloads
- NO_HZ, SQPOLL, and Realtime Scheduling
- Language Models and Agent Runtime
- capOS-Hosted Agent Swarms
- Userspace Binaries
- Stateful Task and Job Graphs
- Storage and Naming
- HPC Parallel Processing Patterns
- GPU Capability
- System Performance Benchmarks
External grounding is summarized in the research notes and covers PARI/GP, SageMath, GAP, Singular, OSCAR, SymPy, SciPy, R, Octave, JupyterLab, Z3, cvc5, HiGHS, SCIP, OR-Tools, JuMP, CVXPY, Lean/mathlib, Rocq, Isabelle, Agda, Spack, Guix-HPC, Nix, Apptainer, Linux namespaces/cgroups/seccomp/Landlock, User-Mode Linux, gVisor, QEMU/KVM, Firecracker, Kata Containers, and Linux CPU isolation/housekeeping.
Goals
- Give users, agent runners, and batch services high-level scientific capabilities without granting an unrestricted shell.
- Make exact computation, numerical computation, optimization, SMT solving, and proof checking ordinary capOS services with explicit authority.
- Preserve reproducibility: package closure, input data, seed, backend, version, timeout, quota, output, and audit metadata travel with every result.
- Reuse mature upstream tools wherever possible.
- Support both interactive research and unattended agent jobs.
- Keep tool authority separate from model inference. Models propose; trusted capOS runners execute through broker policy.
Non-Goals
- Do not invent a replacement for SageMath, PARI, GAP, Singular, OSCAR, SciPy, Jupyter, Lean, Rocq, Isabelle, or established solvers.
- Do not add POSIX, Docker, Conda, Nix, Guix, or Spack as ambient system authority.
- Do not make notebook execution equivalent to shell access.
- Do not treat SMT or CAS answers as formal proof unless a proof checker validates an artifact.
- Do not make this package part of the active in-process threading milestone.
Package Profiles
The standard package should be split into explicit profiles so capOS can ship or grant only what a session needs.
| Profile | Contents | Primary use |
|---|---|---|
| scientific-base | PARI/GP or PARI C service, SymPy, Z3, cvc5, HiGHS, Lean checker, artifact store | Low-risk exact math, solver, and proof assistance |
| scientific-research | SageMath, GAP, Singular, OSCAR/Julia, R, Octave, SciPy, JuMP, CVXPY, SCIP, OR-Tools | Full interactive research workflows |
| scientific-notebook | Jupyter-compatible notebook/session service and language kernels | Literate experiments with replayable artifacts |
| scientific-lab | Experiment registry, workspaces, job graphs, retrieval, review gates, GPU/model integration | Long-running research labs with users, agents, and review workflows |
| scientific-commercial | Optional proprietary/commercial connectors such as Wolfram Engine or commercial solvers | Explicitly licensed site-local extensions |
Profiles grant service roots, not every concrete backend cap. A user, agent
runner, or batch service normally receives a ScientificSession facade that
advertises only the tools and methods permitted for the current session.
Capability Surface
Catalog And Environment
- ScientificCatalog: lists installed profiles, backend identities, supported interfaces, licenses, package closures, and known reproducibility caveats.
- PackageCatalog: resolves named package environments to content-addressed closures.
- PackageClosure: immutable description of packages, build inputs, toolchain versions, hashes, license metadata, vulnerability metadata, and supported CPU/GPU features.
- Environment: starts a bounded interpreter, solver, proof, notebook, or job process with exactly the selected closure and granted caps.
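The catalog-to-closure relationship above can be sketched as a content-addressed record. The `PackageClosure` and `PackageCatalog` names come from this proposal; the exact field set, the canonical-JSON encoding, and the SHA-256 id scheme are illustrative assumptions, not a committed wire format:

```python
import hashlib
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class PackageClosure:
    """Immutable closure description; the id is derived from its contents."""
    packages: tuple      # e.g. (("pari", "2.15.4"), ("sympy", "1.12"))
    toolchain: str       # toolchain/version identifier
    licenses: tuple      # license metadata per package
    cpu_features: tuple  # supported CPU/GPU features

    def closure_id(self) -> str:
        # Content-addressed id: hash of a canonical JSON encoding, so the
        # same contents always resolve to the same id.
        blob = json.dumps(
            {"packages": self.packages, "toolchain": self.toolchain,
             "licenses": self.licenses, "cpu_features": self.cpu_features},
            sort_keys=True).encode()
        return "closure:" + hashlib.sha256(blob).hexdigest()[:16]

class PackageCatalog:
    """Resolves named package environments to content-addressed closures."""
    def __init__(self):
        self._by_name = {}

    def register(self, name: str, closure: PackageClosure) -> str:
        self._by_name[name] = closure
        return closure.closure_id()

    def resolve(self, name: str) -> PackageClosure:
        return self._by_name[name]
```

The design point is that an `Environment` start request would cite a closure id, never a mutable package name, so every run record pins exactly what it executed.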
Workspaces And Artifacts
- ResearchWorkspace: branchable namespace for source, notebooks, data, generated files, proofs, and run records.
- ArtifactStore: immutable objects for solver inputs, proof logs, notebooks, datasets, plots, tables, binaries, and transcripts.
- ProvenanceLog: append-only record of who or which agent produced an artifact, with model/tool/package/session metadata.
- ExperimentRegistry: immutable run specifications plus mutable review status, labels, and publication decisions.
CAS And Mathematical Services
- ComputerAlgebra: general symbolic manipulation facade for factorization, simplification, integration, exact linear algebra, polynomial operations, and expression normalization.
- NumberTheory: PARI-backed exact number theory, elliptic curves, modular forms, algebraic number fields, L-functions, and related computations.
- DiscreteAlgebra: GAP-backed group, representation, finite algebra, and combinatorics workflows.
- PolynomialAlgebra: Singular-backed ideals, modules, Groebner bases, quotient rings, and algebraic geometry computations.
- JuliaAlgebraKernel: OSCAR/Nemo/Hecke/AbstractAlgebra workflows for cases where a general Julia session is the correct backend.
Each method returns structured values when practical and always records the backend, package closure, input, output, elapsed time, and resource envelope.
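The per-call record described above can be made concrete as a small result envelope. This is a minimal Python sketch under stated assumptions: `CasResult` and `run_cas_call` are hypothetical names, and the field set is only the minimum the proposal requires (backend, closure, input, output, elapsed time, resource envelope):

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class CasResult:
    """Record attached to every CAS call, per the reproducibility goals."""
    backend: str       # e.g. "pari/gp 2.15"
    closure_id: str    # content-addressed package closure id
    input_expr: str
    output_value: str
    elapsed_s: float
    cpu_ms_limit: int  # resource envelope actually granted
    mem_mb_limit: int

def run_cas_call(backend, closure_id, expr, evaluate, cpu_ms=1000, mem_mb=256):
    """Wrap a backend call so the result always carries its provenance."""
    start = time.monotonic()
    value = evaluate(expr)  # the actual backend invocation goes here
    return CasResult(backend, closure_id, expr, str(value),
                     time.monotonic() - start, cpu_ms, mem_mb)
```

Because the envelope is immutable and travels with the value, a later reviewer can re-run the same input under the same closure id without consulting out-of-band state.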
Solvers
- SmtSolver: typed SMT-LIB import/export, assertions, check-sat, model, unsat core, timeout, random seed, proof/certificate metadata, backend selection among Z3/cvc5 or future solvers.
- OptimizationSolver: LP, MIP, QP, conic, CP-SAT, routing, scheduling, and nonlinear solve jobs with declared model format, backend, objective, constraints, time limit, memory limit, gap/tolerance policy, and solution status.
- ModelingSession: JuMP/CVXPY/OR-Tools-style language session for models that need high-level construction rather than direct serialized input.
Solver calls must distinguish optimal, feasible, infeasible,
unbounded, unknown, timeout, resource_exceeded, and backend_error.
User-facing tools should not collapse these into a single textual answer.
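The status distinction above can be encoded directly in the result type so callers cannot collapse outcomes by accident. A minimal Python sketch; `SolveStatus` and `describe` are hypothetical names covering exactly the statuses this proposal lists:

```python
from enum import Enum

class SolveStatus(Enum):
    OPTIMAL = "optimal"
    FEASIBLE = "feasible"
    INFEASIBLE = "infeasible"
    UNBOUNDED = "unbounded"
    UNKNOWN = "unknown"
    TIMEOUT = "timeout"
    RESOURCE_EXCEEDED = "resource_exceeded"
    BACKEND_ERROR = "backend_error"

# Statuses where reporting an incumbent objective value is meaningful.
HAS_SOLUTION = {SolveStatus.OPTIMAL, SolveStatus.FEASIBLE}

def describe(status: SolveStatus, objective=None) -> str:
    """Render a status without collapsing distinct outcomes into one answer:
    an objective is only reported when a solution actually exists."""
    if status in HAS_SOLUTION and objective is not None:
        return f"{status.value} (objective={objective})"
    return status.value
```

Note that `feasible` and `optimal` are deliberately distinct: a MIP stopped at its time limit may hold an incumbent that is feasible but unproven, and a user-facing tool must say so.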
Formal Proof
- ProofCatalog: installed proof assistants, libraries, theorem indexes, and package closures.
- ProofSession: checkout, edit, build, query goals, run tactics, run tests, and produce checked proof artifacts.
- ProofChecker: batch verification of a named theorem or project under a pinned closure.
- LemmaSearch: retrieval over local proof libraries, declarations, docs, and prior accepted project artifacts.
The first implementation target should be Lean plus mathlib because it is the most useful default for current agent-assisted mathematics. Rocq, Isabelle, and Agda should remain first-class future backends with separate kernels and project layouts.
Notebook And Interactive Kernel Sessions
- NotebookDocument: immutable or branchable notebook object with cells, outputs, attachments, environment id, and execution provenance.
- NotebookSession: starts kernels, executes cells, captures outputs, renders rich media, and gates side effects.
- KernelSession: Python, Sage, Julia, R, Octave, Lean, GAP, PARI, or other REPL-like process with explicit workspace and package environment caps.
Notebook execution is authority-bearing. Opening a notebook for reading should not execute it. Running a notebook should prompt or use session policy for network access, package installation, writes outside the workspace, long jobs, credential access, GPU use, and publication.
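The gating rule above can be sketched as a pure policy check run before each cell executes. All names here (`SessionPolicy`, `check_cell_effects`, the effect labels) are hypothetical; the effect list is exactly the one enumerated in the paragraph above:

```python
from dataclasses import dataclass, field

# Side effects that must be gated per this proposal's notebook policy.
GATED_EFFECTS = {"network", "package_install", "write_outside_workspace",
                 "long_job", "credential_access", "gpu", "publish"}

@dataclass
class SessionPolicy:
    allowed: set = field(default_factory=set)  # effects pre-granted this session

def check_cell_effects(policy: SessionPolicy, requested: set):
    """Split a cell's requested effects into (granted, needs_prompt).
    Opening a notebook for reading never calls this; only execution does."""
    unknown = requested - GATED_EFFECTS
    if unknown:
        raise ValueError(f"unrecognized effects: {sorted(unknown)}")
    granted = requested & policy.allowed
    return granted, requested - granted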
Agent Lab Architecture
An LLM agent research lab on capOS should be a service graph:
flowchart LR
User[User] --> Runner[Agent Runner]
Runner --> Broker[AuthorityBroker]
Runner --> Model[LanguageModel]
Runner --> Sci[ScientificSession]
Sci --> CAS[CAS Services]
Sci --> Solvers[Solver Services]
Sci --> Proof[Proof Services]
Sci --> Notebook[NotebookSession]
Sci --> Jobs[JobGraph]
Sci --> Workspace[ResearchWorkspace]
Workspace --> Artifacts[ArtifactStore]
Jobs --> Compute[CPU/GPU/Storage/Network Caps]
Runner --> Audit[ProvenanceLog]
The runner owns the user session and applies tool policy. The model service
does not hold scientific tool caps directly. Tool calls from the model become
typed proposals to the runner, and the runner invokes ScientificSession
methods only when broker policy allows it.
Authority And Safety Rules
- A tool cap grants only the named interface.
NumberTheorydoes not imply file, shell, network, package install, or proof-publication authority. - Package installation and environment resolution are separate authorities from executing an already-pinned environment.
- External network fetch is separate from local computation. Literature search, package download, model-provider calls, and dataset upload are different caps.
- Every long-running calculation must have a job id, quota, cancellation path, and durable status.
- GPU use requires a GPU/session cap and should record driver/runtime/kernel metadata.
- Proof acceptance must be checker-backed. Agent confidence, CAS evidence, or SMT success is advisory unless the proof kernel accepts the artifact.
- Published results must cite the artifact ids and package closure ids that produced them.
- Commercial or proprietary engines must be opt-in, labeled, and grantable only through site policy.
- Linux workload placement must distinguish ordinary resource-limited work
from capOS-native auto-nohz-eligible work. Linux
nohz_fullinside a guest may be useful compatibility or benchmark state, but capOS CPU isolation, auto full-nohz activation, housekeeping placement, IRQ routing, and exclusive CPU use are outer scheduler-authority decisions, not options an agent tool descriptor can set by itself.
Linux Workload And Virtualization Strategy
The first implementation is likely to consume the generic Linux workload sandbox substrate for large scientific stacks. Scientific jobs should be selected by trust and compatibility class:
| Backend | Use | Boundary claim |
|---|---|---|
| namespace/cgroup/seccomp/Landlock sandbox | trusted batch tools and fast command wrappers | shares host Linux kernel; useful policy layer, not strong multi-tenant isolation |
| bubblewrap/nsjail | early command-wrapper executor for gp, solvers, proof checkers, and scripts | structured process sandbox over Linux primitives |
| User-Mode Linux | developer/debug fallback when KVM is unavailable | Linux-as-host-process compatibility; not the main strong-isolation path |
| gVisor | container-compatible higher-risk workloads | per-sandbox application kernel reduces direct host-kernel exposure |
| QEMU/KVM Linux guest | broad compatibility, full distro roots, package builds, untrusted notebooks | hardware-backed guest kernel boundary |
| Firecracker or Kata-style microVM | repeated stateless solver/proof/notebook jobs with narrow device models | hardware-backed microVM boundary with smaller operational surface |
| dedicated host or single-tenant node | high-risk tenants, sensitive data, GPU/device passthrough, side-channel-sensitive jobs, long-lived browser/GUI workloads | reduces shared-host and VM-escape blast radius beyond ordinary VM tenancy |
The generic LinuxWorkloadSandbox service should record backend,
image/rootfs/package hashes,
sandbox policy or VM device model, kernel version, CPU affinity, cgroup quota,
deployment location, external-host placement metadata, capOS
NoHzEligibility/NoHzActivation state for capOS-scheduled proxies or VMMs,
guest tickless/nohz state, network policy, artifact inputs, artifact outputs,
and exit reason. A result from a namespace sandbox and a result from a KVM
guest may be functionally equivalent, but their security, scheduler, and
reproducibility claims are different.
For the native capOS auto full-nohz scheduler track, scientific jobs should use the generic workload placement classes:
- ordinary placement: cgroup v2 resource limits and optional affinity for normal solver, proof, CAS, package, and notebook jobs.
- auto-nohz-eligible placement: explicit capOS eligibility plus CPU-time authority for low-jitter benchmark, realtime, GPU-feed, SQPOLL-like, or latency-bound workload loops. The outer capOS scheduler must know the workload’s vCPU/helper/poller threads and must also account for housekeeping CPUs, IRQ placement, timers, and deferred kernel work. Guest Linux tickless state and external Linux-host isolation state are recorded separately and do not by themselves activate capOS nohz.
Existing Solutions To Adapt
| Area | Adapt first | Reasonable capOS adaptation |
|---|---|---|
| Number theory | PARI/GP | Wrap gp early; use PARI C library for stable service calls later. |
| Broad math | SageMath | Host as a Python/Sage kernel with pinned closure and notebook integration. |
| Discrete algebra | GAP | Wrap CLI and package loading; later expose common group-theory methods. |
| Polynomial algebra | Singular | Wrap command/batch mode; later expose polynomial/ideal operations. |
| Algebra research | OSCAR | Host Julia/OSCAR kernel; avoid flattening its object model prematurely. |
| Symbolic Python | SymPy | Embed in Python service for lightweight symbolic calls and code generation. |
| Scientific Python | NumPy/SciPy | Provide Python kernel and batch-job environments with BLAS/LAPACK metadata. |
| Statistics | R | Provide Rscript and R kernel sessions with package closure capture. |
| MATLAB-like workflows | GNU Octave | Provide batch and interactive kernel sessions. |
| SMT | Z3, cvc5 | Provide SmtSolver with backend identity, model, core, and timeout fields. |
| Optimization engines | HiGHS, SCIP, OR-Tools | Provide direct solve jobs and higher-level modeling sessions. |
| Modeling layers | JuMP, CVXPY | Host Julia/Python modeling kernels and export normalized model artifacts. |
| Formal proof | Lean/mathlib first; Rocq, Isabelle, Agda later | Provide proof sessions, build logs, theorem search, and checked artifacts. |
| Notebooks | JupyterLab model | Reuse .ipynb concepts and kernels but replace ambient authority with caps. |
| Package closure | Nix, Guix, Spack | Ingest closures and recipes; expose capOS package catalogs and Store objects. |
| HPC containers | Apptainer | Use as a Linux-sidecar compatibility bridge, not as native authority. |
Staged Implementation
Stage 0: Interface-Only Design
Define schemas for ScientificSession, ArtifactStore, PackageClosure,
SmtSolver, OptimizationSolver, ProofSession, and NotebookSession.
No backend porting is required. The goal is to make the authority and result
model reviewable.
Stage 1: Linux Sidecar Prototype
Run tools on a controlled Linux host or hardware-backed Linux guest and expose them to capOS through a capability proxy. Namespace/cgroup/seccomp/Landlock wrappers are acceptable for trusted batch tools, but untrusted notebooks, model-generated code, package builds, and multi-tenant jobs should use a QEMU/KVM guest first and Firecracker/Kata-style microVMs later. High-risk tenants, sensitive data, GPU/device passthrough, and side-channel-sensitive jobs may require single-tenant hosts instead of shared VM hosts. User-Mode Linux may remain a developer/debug fallback when KVM is unavailable, but it is not the default strong-isolation backend. This proves the API, audit, and reproducibility model before native userspace can run Python, Julia, R, and large C++ stacks.
Initial tools:
- PARI/GP;
- SymPy;
- Z3 and cvc5;
- HiGHS;
- Lean plus mathlib project build;
- immutable artifact store and provenance records.
Stage 2: Native Wrapper Services
When capOS userspace has the necessary binary/runtime support, add command
wrapper services for gp, lean/lake, z3, cvc5, highs, Rscript,
octave, gap, and Singular. Each wrapper runs with an explicit workspace,
environment, timeout, and resource ledger.
Stage 3: Notebook And Language Kernels
Add Jupyter-compatible document storage and kernel-launch policy. Python,
Sage, Julia, R, Octave, Lean, and GAP kernels can then run as KernelSession
services with capOS-owned artifact capture.
Stage 4: Package-Closure Store
Import or build Nix/Guix/Spack-style closures into capOS Store and
Namespace capabilities. Package resolution stays outside the kernel. The
important kernel-visible property is that executable environments are immutable
objects with explicit resource and authority grants.
Stage 5: Lab Workflow
Combine scientific sessions with hosted-agent workspaces, experiment registry, review gates, browser/literature tools, GPU/model services, and stateful job graphs. This is the point where capOS becomes a credible LLM agent research lab rather than a collection of math commands.
Open Questions
- Should the first sidecar protocol be Cap’n Proto RPC directly, MCP through a gateway, or both?
- Which package-closure source should capOS ingest first: Nix for breadth, Guix for scientific reproducibility, or Spack for HPC variants?
- Which hardware-backed Linux guest backend should be first after QEMU/KVM: Firecracker for narrow batch workers, Kata-style VM containers for OCI integration, or both?
- Which workload classes are eligible for capOS native auto full-nohz
placement, and how should that map to future
CpuIsolationLease,NoHzEligibility,NoHzActivation, andSchedulingContextauthority? - How much of
.ipynbshould be preserved versus represented as a capOS-native notebook object with import/export? - Which proof artifacts can be reduced to small trusted checker inputs, and which require full project build logs for confidence?
- How should floating-point nondeterminism and randomized solver behavior be summarized so agents do not overclaim exactness?
- Where should license policy live: package catalog, broker policy, or both?
Near-Term Recommendation
Do not start by porting SageMath or Jupyter. Start with a small
scientific-base sidecar proof:
NumberTheory.evalbacked by PARI/GP;SmtSolver.checkbacked by Z3 and cvc5;OptimizationSolver.solvebacked by HiGHS for LP/QP/MIP smoke cases;ProofChecker.buildbacked by Lean/mathlib for a pinned project;- immutable artifact/provenance records for every call.
That profile gives users, agent runners, and batch services exact arithmetic, constraint checking, optimization, and formal proof validation while keeping authority narrow enough for review. SageMath, OSCAR, Jupyter, R, Octave, and full package closure support should follow after the base interfaces and audit model are credible.