# Research: Scientific Agent-Lab Software Stack

This note surveys existing scientific software that capOS should treat as
adaptable service backends for a future agent-facing research lab. The central
lesson is that capOS should not invent a new computer algebra system, solver,
proof assistant, notebook system, or package manager. It should give agents
typed, audited, resource-bounded capabilities over mature tools and preserve
the exact environment, inputs, outputs, and proof artifacts needed for review.

## Design Consequences For capOS

- Provide a `scientific-standard` package as a service graph, not as ambient
  binaries on a global filesystem.
- Start by adapting existing command-line and library tools behind narrow
  typed facades. Native rewrites are unjustified until a backend needs a
  smaller trusted core or a direct capability ABI.
- Treat heavyweight systems such as SageMath, OSCAR, JupyterLab, Lean/mathlib,
  and Spack as environment subjects: they need package-store, workspace,
  process, network, cache, and quota policy, not just a binary launch API.
- Expose solver and proof tools as deterministic request/response services
  whenever possible. A model should ask `SmtSolver.check`, `ProofSession.build`,
  or `OptimizationSolver.solve`, not run arbitrary shell text.
- Keep formal proof assistants separate from automatic solvers. SMT results
  are useful evidence, but durable mathematical claims need proof artifacts
  checked by Lean, Rocq, Isabelle, Agda, or another trusted kernel.
- Make provenance a first-class output. Every notebook cell, solver run,
  proof build, CAS session, package environment, model prompt, and data input
  should produce replayable metadata and an audit record.
- Prefer open-source backends in the default package. Proprietary engines such
  as Wolfram Engine can be optional connector services with explicit license,
  network, and production-use metadata.

## Source Baseline

External sources used for this survey:

- PARI/GP: <https://pari.math.u-bordeaux.fr/>
- SageMath: <https://www.sagemath.org/>
- GAP: <https://www.gap-system.org/>
- Singular: <https://www.singular.uni-kl.de/>
- OSCAR: <https://www.oscar-system.org/about/>
- SymPy: <https://www.sympy.org/>
- GNU Octave: <https://octave.org/about>
- R Project: <https://www.r-project.org/>
- SciPy: <https://scipy.org/>
- JupyterLab: <https://jupyterlab.readthedocs.io/>
- Z3: <https://github.com/Z3Prover/z3>
- cvc5: <https://github.com/cvc5/cvc5>
- HiGHS: <https://highs.dev/>
- SCIP: <https://scipopt.org/>
- OR-Tools: <https://developers.google.com/optimization>
- JuMP: <https://jump.dev/>
- CVXPY: <https://www.cvxpy.org/>
- Lean 4: <https://lean4.dev/>
- Lean mathlib: <https://github.com/leanprover-community/mathlib4>
- Rocq Prover: <https://rocq-prover.org/>
- Isabelle: <https://isabelle.cs.tum.edu/>
- Agda: <https://agda.github.io/agda/>
- Spack: <https://spack.io/> and <https://computing.llnl.gov/projects/spack-hpc-package-manager>
- Guix-HPC: <https://hpc.guix.info/>
- Nix: <https://nixos.org/>
- Apptainer: <https://github.com/apptainer/apptainer>

Local grounding:

- [Language Models and Agent Runtime](../proposals/llm-and-agent-proposal.md)
- [capOS-Hosted Agent Swarms](../proposals/hosted-agent-swarm-proposal.md)
- [Userspace Binaries](../proposals/userspace-binaries-proposal.md)
- [Stateful Task and Job Graphs](../proposals/stateful-task-job-graphs-proposal.md)
- [Storage and Naming](../proposals/storage-and-naming-proposal.md)
- [HPC Parallel Processing Patterns](../proposals/hpc-parallel-patterns-proposal.md)
- [GPU Capability](../proposals/gpu-capability-proposal.md)
- [System Performance Benchmarks](../proposals/system-performance-benchmarks-proposal.md)
- [Hosted Agent Harnesses](hosted-agent-harnesses.md)
- [HPC Parallel Patterns](hpc-parallel-patterns.md)
- [Small LLM Survey](small-llm-survey.md)
- [Linux Sandboxes And Virtualization For Workloads](linux-sandboxes-virtualization.md)
- [NO_HZ, SQPOLL, and Realtime Scheduling](nohz-sqpoll-realtime.md)

## Tool Families

### Computer Algebra And Exact Mathematics

PARI/GP is the obvious default for number-theory work. The upstream project is
a cross-platform open-source computer algebra system designed for fast number
theory computations, and it exposes both the `gp` shell and the PARI C library.
For capOS, the C library is the better long-term service backend; the shell is
useful for early compatibility and transcript capture.

SageMath is the best broad open-source mathematical umbrella. It is GPL
software built on NumPy, SciPy, matplotlib, SymPy, Maxima, GAP, FLINT, R, and
other packages, with the explicit mission of being a free alternative to
Magma, Maple, Mathematica, and Matlab. Sage is too large to make into a small
TCB component, but it is ideal as an agent lab kernel when the package store
and Python compatibility layer exist.

GAP is the standard open system for computational discrete algebra, especially
computational group theory. Singular is the specialized polynomial,
commutative-algebra, algebraic-geometry, and singularity-theory system. OSCAR
is the newer Julia-based research system that unifies GAP, Singular, Polymake,
ANTIC/Hecke/Nemo/AbstractAlgebra-style capabilities across algebra, geometry,
number theory, and polyhedral geometry. For capOS this suggests two levels:
small typed services for common requests, and full language kernels for
research workflows that need the native ecosystem.

SymPy is lightweight and embeddable because it is Python-based and has few
dependencies. It is a good first symbolic backend for agents that need exact
manipulation, code generation, and checkable expressions without launching
Sage. SymPy should not replace PARI, GAP, Singular, or OSCAR for their
specialist domains.

### Numerical, Statistical, And Notebook Workflows

SciPy provides fundamental algorithms for scientific computing in Python:
optimization, integration, interpolation, eigenvalue problems, algebraic and
differential equations, statistics, sparse matrices, k-dimensional trees, and
more. It sits on NumPy and compiled Fortran/C/C++ kernels, so capOS support
depends on Python, native extension loading, BLAS/LAPACK packaging, and
controlled native-code execution.

R remains the standard open statistical environment. GNU Octave remains the
open MATLAB-like numerical environment for linear and nonlinear numerical
work. Julia is strategically important because OSCAR, JuMP, SciML, and many
modern research packages depend on it. The first capOS lab should host these
as isolated language kernels rather than trying to normalize them into one
universal API.

JupyterLab is the standard interactive computing front end because notebooks
combine code, prose, equations, visualizations, outputs, and controls. capOS
should adapt the notebook model but not grant notebook kernels ambient shell
or filesystem authority. A future `NotebookSession` should start kernels with
explicit workspace, package environment, data, network, and compute caps, and
record every execution result as a reproducible artifact.

### Satisfiability, Optimization, And Operations Research

Z3 and cvc5 are the primary open SMT backends to expose through a capOS
`SmtSolver` capability. Both support stand-alone and library use. SMT-LIB
should be a supported import/export format, but the service API should expose
typed assumptions, objectives, timeouts, model requests, unsat cores, and
proof/certificate availability explicitly.

For mathematical optimization, capOS should separate modeling layers from
solver engines:

- JuMP and CVXPY are high-level modeling interfaces that let researchers state
  optimization problems in Julia or Python.
- HiGHS is a strong open backend for large sparse LP, MIP, and QP models.
- SCIP is a broad optimization suite around constraint integer programming,
  with current Apache-licensed releases.
- OR-Tools is a practical operations-research toolkit, especially for
  constraint programming, routing, scheduling, and combinatorial optimization.

The capOS API should accept common model formats and provide bounded solve
jobs with time/memory limits, deterministic seeds when supported, solution
certificates when available, and reproducibility metadata. It should not hide
which backend solved the problem.

### Formal Proof Systems

Lean 4 is both a general-purpose functional language and an interactive theorem
prover, with mathlib as its main community mathematics library. It is the best
default for agent-assisted formal mathematics because current LLM tooling and
library momentum are strongest there.

Rocq, formerly Coq, remains an industrial-strength dependently typed prover
with a long verification history and program extraction story. Isabelle is a
generic proof assistant, with Isabelle/HOL and mature automation important for
systems proofs. Agda is valuable for constructive type theory and dependently
typed programming. A capOS lab should support all of them as separate proof
kernel families instead of pretending they are interchangeable.

Agent integration should be conservative:

- Agents may propose proof edits, search lemmas, call tactics, and run builds.
- The proof checker decides whether a theorem is accepted.
- Accepted proof artifacts must include toolchain version, library revision,
  package closure, command line, and full build log.
- CAS or SMT evidence can guide a proof but is not the proof unless the proof
  assistant checks an imported certificate or independently reconstructs it.

### Reproducible Environments And Package Stores

The scientific stack is too large and language-diverse for a hand-written
capOS package format to be the first step. Existing systems offer useful
pieces:

- Nix provides isolated, declarative package builds and large package
  coverage.
- Guix-HPC focuses on reproducible scientific deployment, per-user
  environments, and bit-for-bit repeatability from a specific Guix commit.
- Spack is the HPC-oriented answer for many compiler, MPI, CPU-target, and
  library variant combinations.
- Apptainer is common in HPC because it packages software into portable
  images while integrating with GPUs, high-speed networks, and shared
  filesystems.

capOS should not import any of these as the kernel package manager. Instead,
it should adapt their recipe and closure ideas into capability-native
`PackageCatalog`, `PackageClosure`, `Environment`, and `BuildService`
interfaces. Early implementations can execute Nix/Guix/Spack/Apptainer on a
Linux host or sidecar; later capOS can consume signed closures as Store objects.

## What An LLM Research Lab Needs

A credible LLM agent research lab on capOS needs more than model inference:

- **Workspace service.** Branchable project workspaces with exact input,
  output, patch, and artifact history.
- **Package environments.** Content-addressed software closures for Python,
  Julia, R, C/C++/Fortran, Lean, Rocq, Isabelle, GAP, PARI, Sage, and solver
  stacks.
- **Notebook service.** Jupyter-compatible documents and kernels, but kernels
  receive explicit caps instead of ambient filesystem, process, or network
  access.
- **Experiment registry.** Runs have immutable parameters, model versions,
  prompts, tool descriptors, seeds, package closures, datasets, results, and
  reviewer decisions.
- **Solver/proof services.** CAS, SMT, optimization, and formal proof systems
  are high-level tool capabilities with structured inputs and bounded
  resources.
- **Literature and retrieval.** Paper, code, dataset, citation, and note stores
  are ordinary namespaces; retrieval does not imply authority to fetch or
  publish.
- **Job graph orchestration.** Long calculations, training/evaluation jobs,
  proof builds, benchmark sweeps, and multi-agent tasks need resumable job
  graphs with cancellation and status.
- **Compute authority.** CPU, memory, storage, network, GPU, realtime, and
  external-provider quotas must be explicit and visible in the audit log.
- **Human review surfaces.** Agents can generate results, but publication,
  credential use, external API calls, irreversible filesystem changes, and
  proof-of-result claims need review gates.

## Adaptation Strategy

The near-term path is a staged compatibility bridge:

1. **Hosted Linux sidecars.** Run existing stacks on Linux while capOS exposes
   them as remote capability services. Use namespace/cgroup/seccomp/Landlock
   sandboxes for trusted batch tools; use hardware-backed Linux guests
   (QEMU/KVM first, Firecracker/Kata-style microVMs later) for untrusted
   notebooks, model-generated code, package builds, and multi-tenant agent
   jobs. Treat User-Mode Linux as a developer/debug fallback, not the primary
   strong-isolation boundary. This proves interfaces and audit before native
   package support exists.
2. **Command-wrapper services.** Wrap tools such as `gp`, `gap`, `Singular`,
   `lean`, `lake`, `rocq`, `isabelle`, `octave`, `Rscript`, and solver CLIs
   with explicit input/output directories, timeout, memory, and network policy.
3. **Library-backed services.** Replace wrappers with direct C/C++/Rust/Julia
   FFI or process-local RPC for small stable APIs such as PARI, Z3, cvc5, and
   HiGHS.
4. **Notebook and language kernels.** Add Python, Julia, R, Sage, and Lean
   kernels with capOS-authored kernel launchers and artifact capture.
5. **Package-closure ingestion.** Import Nix/Guix/Spack closures as signed
   Store objects, then build a capOS-native catalog around content hashes,
   licenses, vulnerability metadata, CPU/GPU compatibility, and provenance.
6. **Native capOS services.** Only after the interface stabilizes, port the
   most useful small engines or linkable libraries into native userspace.

## Risks And Open Questions

- **Supply-chain size.** Sage, Julia, Python scientific stacks, and proof
  libraries bring huge dependency closures. capOS must record and constrain
  them rather than pretend they are small trusted components.
- **Nondeterminism.** Floating-point math, randomized solvers, parallel BLAS,
  GPU kernels, and package resolution can make replay differ. Results need
  deterministic seeds and variance metadata, not only final answers.
- **License boundaries.** GPL, LGPL, Apache, MIT, BSD, proprietary, academic,
  and optional commercial solvers need explicit metadata before packaging.
- **Proof trust.** A CAS result, SMT model, or solver objective value can be
  false because of bugs, numeric tolerance, or bad modeling. Formal proof
  claims must be checked by the named proof kernel or labeled as empirical.
- **Agent overreach.** The default scientific package must not grant arbitrary
  shell, network, credential, package-install, or publishing authority to a
  model. Agents receive tools through runner policy, not direct backend caps.
- **Notebook security.** Notebooks are executable documents. Opening one is
  not consent to run it with the reader's caps.
- **Linux sidecar boundary drift.** Namespaces, seccomp, Landlock, gVisor,
  User-Mode Linux, KVM guests, and microVMs are different security and
  compatibility claims. capOS must record the backend, host kernel, policy,
  image hashes, guest tickless/nohz state, and capOS outer
  `NoHzEligibility`/`NoHzActivation` state rather than labeling all of them
  "sandboxed Linux".

## Recommendation

Define a future `scientific-standard` package as a curated service graph with
three profiles:

- **Base.** PARI/GP, SymPy, Z3, cvc5, HiGHS, Lean, and artifact/provenance
  services through tight command or library wrappers.
- **Research.** SageMath, GAP, Singular, OSCAR/Julia, R, Octave, JuMP, CVXPY,
  SCIP, OR-Tools, Jupyter-compatible notebooks, and package-closure support.
- **Lab.** Hosted-agent workspaces, experiment registry, browser/web research
  tools, GPU-backed model/scientific kernels, distributed job graphs, and
  publication/review workflows.

The Base profile is the first useful target for agents: exact number theory,
symbolic manipulation, SMT checking, linear/integer optimization, and Lean
proof checking are enough to make an agent substantially more reliable without
granting it a general-purpose scientific workstation.
