# Proposal: Scientific Standard Package And Agent Lab Capabilities

capOS should eventually ship a curated scientific standard package: a
capability-scoped service graph that gives agents and users high-level access
to computer algebra, numerical computing, solvers, formal proof systems,
notebooks, reproducible package environments, and experiment records.

This is not a request to turn the kernel into a scientific runtime. The kernel
still provides capability tables, address spaces, scheduling, IPC, memory,
device, and storage primitives. The scientific package lives in userspace,
above package, workspace, job-graph, model, and broker services.

## Design Grounding

Local grounding:

- [Scientific Agent-Lab Software Stack](../research/scientific-agent-lab-stack.md)
- [Linux Sandboxes And Virtualization For Workloads](../research/linux-sandboxes-virtualization.md)
- [NO_HZ, SQPOLL, and Realtime Scheduling](../research/nohz-sqpoll-realtime.md)
- [Language Models and Agent Runtime](llm-and-agent-proposal.md)
- [capOS-Hosted Agent Swarms](hosted-agent-swarm-proposal.md)
- [Userspace Binaries](userspace-binaries-proposal.md)
- [Stateful Task and Job Graphs](stateful-task-job-graphs-proposal.md)
- [Storage and Naming](storage-and-naming-proposal.md)
- [HPC Parallel Processing Patterns](hpc-parallel-patterns-proposal.md)
- [GPU Capability](gpu-capability-proposal.md)
- [System Performance Benchmarks](system-performance-benchmarks-proposal.md)

External grounding is summarized in the research notes and covers PARI/GP,
SageMath, GAP, Singular, OSCAR, SymPy, SciPy, R, Octave, JupyterLab, Z3, cvc5,
HiGHS, SCIP, OR-Tools, JuMP, CVXPY, Lean/mathlib, Rocq, Isabelle, Agda, Spack,
Guix-HPC, Nix, Apptainer, Linux namespaces/cgroups/seccomp/Landlock,
User-Mode Linux, gVisor, QEMU/KVM, Firecracker, Kata Containers, and Linux
CPU isolation/housekeeping.

## Goals

- Give users, agent runners, and batch services high-level scientific
  capabilities without granting an unrestricted shell.
- Make exact computation, numerical computation, optimization, SMT solving,
  and proof checking ordinary capOS services with explicit authority.
- Preserve reproducibility: package closure, input data, seed, backend,
  version, timeout, quota, output, and audit metadata travel with every result.
- Reuse mature upstream tools wherever possible.
- Support both interactive research and unattended agent jobs.
- Keep tool authority separate from model inference. Models propose; trusted
  capOS runners execute through broker policy.

## Non-Goals

- Do not invent a replacement for SageMath, PARI, GAP, Singular, OSCAR,
  SciPy, Jupyter, Lean, Rocq, Isabelle, or established solvers.
- Do not add POSIX, Docker, Conda, Nix, Guix, or Spack as ambient system
  authority.
- Do not make notebook execution equivalent to shell access.
- Do not treat SMT or CAS answers as formal proof unless a proof checker
  validates an artifact.
- Do not make this package part of the active in-process threading milestone.

## Package Profiles

The standard package should be split into explicit profiles so capOS can ship
or grant only what a session needs.

| Profile | Contents | Primary use |
| --- | --- | --- |
| `scientific-base` | PARI/GP or PARI C service, SymPy, Z3, cvc5, HiGHS, Lean checker, artifact store | Low-risk exact math, solver, and proof assistance |
| `scientific-research` | SageMath, GAP, Singular, OSCAR/Julia, R, Octave, SciPy, JuMP, CVXPY, SCIP, OR-Tools | Full interactive research workflows |
| `scientific-notebook` | Jupyter-compatible notebook/session service and language kernels | Literate experiments with replayable artifacts |
| `scientific-lab` | Experiment registry, workspaces, job graphs, retrieval, review gates, GPU/model integration | Long-running research labs with users, agents, and review workflows |
| `scientific-commercial` | Optional proprietary/commercial connectors such as Wolfram Engine or commercial solvers | Explicitly licensed site-local extensions |

Profiles grant service roots, not every concrete backend cap. A user, agent
runner, or batch service normally receives a `ScientificSession` facade that
advertises only the tools and methods permitted for the current session.
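The facade contract implied above can be sketched in Python; the profile and tool names come from this proposal, while `PROFILE_TOOLS`, the `denied` set, and the method shapes are illustrative assumptions rather than a fixed API:

```python
from dataclasses import dataclass, field

# Illustrative profile -> tool mapping; profile and tool names come from this
# proposal, the mapping granularity is an assumption.
PROFILE_TOOLS = {
    "scientific-base": {"NumberTheory", "ComputerAlgebra", "SmtSolver",
                        "OptimizationSolver", "ProofChecker", "ArtifactStore"},
    "scientific-research": {"NumberTheory", "ComputerAlgebra", "DiscreteAlgebra",
                            "PolynomialAlgebra", "JuliaAlgebraKernel",
                            "SmtSolver", "OptimizationSolver", "ModelingSession"},
}

@dataclass
class ScientificSession:
    """Facade that advertises only the tools granted to this session."""
    profile: str
    denied: set = field(default_factory=set)  # per-session policy subtractions

    def advertised_tools(self):
        return sorted(PROFILE_TOOLS[self.profile] - self.denied)

    def check(self, tool: str) -> bool:
        return tool in PROFILE_TOOLS[self.profile] and tool not in self.denied
```

The key property is that a session never advertises more than its profile minus session policy, so a model-driven runner cannot discover tools it was not granted.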

## Capability Surface

### Catalog And Environment

- `ScientificCatalog`: lists installed profiles, backend identities,
  supported interfaces, licenses, package closures, and known reproducibility
  caveats.
- `PackageCatalog`: resolves named package environments to content-addressed
  closures.
- `PackageClosure`: immutable description of packages, build inputs,
  toolchain versions, hashes, license metadata, vulnerability metadata, and
  supported CPU/GPU features.
- `Environment`: starts a bounded interpreter, solver, proof, notebook, or job
  process with exactly the selected closure and granted caps.
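One way to make `PackageClosure` resolution content-addressed is to hash a canonical serialization of the resolved package set; `closure_id` and its field layout are hypothetical, but the determinism property is the point: two sessions that resolve the same closure must get byte-identical ids.

```python
import hashlib
import json

def closure_id(packages: dict, toolchain: str) -> str:
    """Derive a content-addressed id for a package closure.

    `packages` maps package name -> [version, source hash]. Canonical JSON
    (sorted keys, fixed separators) makes the id independent of resolution
    order, so reproducibility claims can cite one stable identifier.
    """
    canonical = json.dumps(
        {"packages": packages, "toolchain": toolchain},
        sort_keys=True, separators=(",", ":"),
    )
    return "closure-" + hashlib.sha256(canonical.encode()).hexdigest()[:16]
```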

### Workspaces And Artifacts

- `ResearchWorkspace`: branchable namespace for source, notebooks, data,
  generated files, proofs, and run records.
- `ArtifactStore`: immutable objects for solver inputs, proof logs, notebooks,
  datasets, plots, tables, binaries, and transcripts.
- `ProvenanceLog`: append-only record of who or which agent produced an
  artifact, with model/tool/package/session metadata.
- `ExperimentRegistry`: immutable run specifications plus mutable review
  status, labels, and publication decisions.
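An append-only `ProvenanceLog` can be made tamper-evident by hash-chaining entries. This is a minimal stdlib sketch; the entry fields are an assumed subset of the metadata listed above:

```python
import hashlib
import json

class ProvenanceLog:
    """Append-only provenance record; each entry hashes its predecessor,
    so retroactive edits are detectable. Field names are illustrative."""

    def __init__(self):
        self.entries = []

    def append(self, producer: str, artifact_id: str, metadata: dict) -> dict:
        prev = self.entries[-1]["entry_hash"] if self.entries else "genesis"
        body = {"producer": producer, "artifact": artifact_id,
                "metadata": metadata, "prev": prev}
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        entry = dict(body, entry_hash=digest)
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = "genesis"
        for e in self.entries:
            body = {k: e[k] for k in ("producer", "artifact", "metadata", "prev")}
            if e["prev"] != prev or e["entry_hash"] != hashlib.sha256(
                    json.dumps(body, sort_keys=True).encode()).hexdigest():
                return False
            prev = e["entry_hash"]
        return True
```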

### CAS And Mathematical Services

- `ComputerAlgebra`: general symbolic manipulation facade for factorization,
  simplification, integration, exact linear algebra, polynomial operations,
  and expression normalization.
- `NumberTheory`: PARI-backed exact number theory, elliptic curves, modular
  forms, algebraic number fields, L-functions, and related computations.
- `DiscreteAlgebra`: GAP-backed group, representation, finite algebra, and
  combinatorics workflows.
- `PolynomialAlgebra`: Singular-backed ideals, modules, Groebner bases,
  quotient rings, and algebraic geometry computations.
- `JuliaAlgebraKernel`: OSCAR/Nemo/Hecke/AbstractAlgebra workflows for cases
  where a general Julia session is the correct backend.

Each method returns structured values when practical and always records the
backend, package closure, input, output, elapsed time, and resource envelope.
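A minimal sketch of that result envelope, assuming hypothetical field names; the real envelope would also carry seed, quota, and audit metadata per the Goals section:

```python
import time
from dataclasses import dataclass

@dataclass
class ResultEnvelope:
    """Metadata every CAS/solver method result carries. Illustrative subset."""
    backend: str
    closure_id: str
    input: str
    output: object
    elapsed_s: float

def record_call(backend: str, closure_id: str, expr: str, compute):
    """Run `compute(expr)` and wrap the answer in the required envelope."""
    start = time.monotonic()
    value = compute(expr)
    return ResultEnvelope(backend, closure_id, expr, value,
                          round(time.monotonic() - start, 6))
```

For example, a `NumberTheory` factorization call would return the value together with its backend identity (2^32 + 1 = 641 * 6700417 is the classic Fermat-number factorization used here purely as sample data).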

### Solvers

- `SmtSolver`: typed SMT-LIB import/export, assertions, check-sat, model,
  unsat core, timeout, random seed, proof/certificate metadata, backend
  selection among Z3/cvc5 or future solvers.
- `OptimizationSolver`: LP, MIP, QP, conic, CP-SAT, routing, scheduling, and
  nonlinear solve jobs with declared model format, backend, objective,
  constraints, time limit, memory limit, gap/tolerance policy, and solution
  status.
- `ModelingSession`: JuMP/CVXPY/OR-Tools-style language session for models
  that need high-level construction rather than direct serialized input.

Solver calls must distinguish `optimal`, `feasible`, `infeasible`,
`unbounded`, `unknown`, `timeout`, `resource_exceeded`, and `backend_error`.
User-facing tools should not collapse these into a single textual answer.
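The eight outcome classes can be a closed enum so user-facing layers cannot silently collapse them; `HAS_SOLUTION` is an illustrative helper, not part of the proposed schema:

```python
from enum import Enum

class SolveStatus(Enum):
    """The eight outcome classes a solver call must distinguish."""
    OPTIMAL = "optimal"
    FEASIBLE = "feasible"
    INFEASIBLE = "infeasible"
    UNBOUNDED = "unbounded"
    UNKNOWN = "unknown"
    TIMEOUT = "timeout"
    RESOURCE_EXCEEDED = "resource_exceeded"
    BACKEND_ERROR = "backend_error"

# Only these statuses carry a usable incumbent solution; user-facing tools
# must surface the status itself, never just an objective value.
HAS_SOLUTION = {SolveStatus.OPTIMAL, SolveStatus.FEASIBLE}
```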

### Formal Proof

- `ProofCatalog`: installed proof assistants, libraries, theorem indexes, and
  package closures.
- `ProofSession`: checkout, edit, build, query goals, run tactics, run tests,
  and produce checked proof artifacts.
- `ProofChecker`: batch verification of a named theorem or project under a
  pinned closure.
- `LemmaSearch`: retrieval over local proof libraries, declarations, docs, and
  prior accepted project artifacts.

The first implementation target should be Lean plus mathlib because it is the
most useful default for current agent-assisted mathematics. Rocq, Isabelle,
and Agda should remain first-class future backends with separate kernels and
project layouts.
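Under the checker-backed acceptance rule (see Authority And Safety Rules), a proof artifact record might look like the following sketch; the field names and the `lean4/lake` checker label are assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProofArtifact:
    """Checked proof artifact. Acceptance is derived only from the checker."""
    theorem: str
    closure_id: str   # pinned package closure the build ran under
    checker: str      # e.g. "lean4/lake"; the only acceptance authority
    checker_exit: int
    advisory: tuple   # CAS/SMT/agent evidence: recorded but never decisive

    @property
    def accepted(self) -> bool:
        # Agent confidence or SMT success in `advisory` cannot flip this.
        return self.checker_exit == 0
```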

### Notebook And Interactive Kernel Sessions

- `NotebookDocument`: immutable or branchable notebook object with cells,
  outputs, attachments, environment id, and execution provenance.
- `NotebookSession`: starts kernels, executes cells, captures outputs, renders
  rich media, and gates side effects.
- `KernelSession`: Python, Sage, Julia, R, Octave, Lean, GAP, PARI, or other
  REPL-like process with explicit workspace and package environment caps.

Notebook execution is authority-bearing. Opening a notebook for reading must
not execute it. Running a notebook should either prompt the user or consult
session policy before granting network access, package installation, writes
outside the workspace, long jobs, credential access, GPU use, or publication.
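A session-policy gate for cell execution might look like the following sketch; the side-effect names mirror the list above, and `gate_cell` is a hypothetical helper:

```python
# Side effects a running notebook may request; per this proposal, each must
# be granted by session policy or an explicit prompt. Names are illustrative.
GATED_EFFECTS = {"network", "package_install", "external_write",
                 "long_job", "credentials", "gpu", "publish"}

def gate_cell(requested: set, policy_grants: set):
    """Return (allowed, needs_prompt): any requested effect outside the
    session policy requires a prompt; undeclared effects are rejected."""
    unknown = requested - GATED_EFFECTS
    if unknown:
        raise ValueError(f"undeclared side effects: {sorted(unknown)}")
    needs_prompt = requested - policy_grants
    return (not needs_prompt, sorted(needs_prompt))
```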

## Agent Lab Architecture

An LLM agent research lab on capOS should be a service graph:

```mermaid
flowchart LR
    User[User] --> Runner[Agent Runner]
    Runner --> Broker[AuthorityBroker]
    Runner --> Model[LanguageModel]
    Runner --> Sci[ScientificSession]
    Sci --> CAS[CAS Services]
    Sci --> Solvers[Solver Services]
    Sci --> Proof[Proof Services]
    Sci --> Notebook[NotebookSession]
    Sci --> Jobs[JobGraph]
    Sci --> Workspace[ResearchWorkspace]
    Workspace --> Artifacts[ArtifactStore]
    Jobs --> Compute[CPU/GPU/Storage/Network Caps]
    Runner --> Audit[ProvenanceLog]
```

The runner owns the user session and applies tool policy. The model service
does not hold scientific tool caps directly. Tool calls from the model become
typed proposals to the runner, and the runner invokes `ScientificSession`
methods only when broker policy allows it.
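The proposal path can be sketched as a pure mediation function; `ToolProposal`, `run_proposal`, and the broker-callback shape are illustrative assumptions, not the proposed wire format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolProposal:
    """A typed tool call proposed by the model; the model never holds the cap."""
    tool: str
    method: str
    args: dict

def run_proposal(proposal, broker_allows, session_tools):
    """Runner-side mediation: invoke only if the session holds the tool cap
    and broker policy approves this specific call. Returns a structured
    denial rather than raising, so the model sees a typed result."""
    if proposal.tool not in session_tools:
        return {"status": "denied", "reason": "tool not in session"}
    if not broker_allows(proposal):
        return {"status": "denied", "reason": "broker policy"}
    return {"status": "invoked", "call": f"{proposal.tool}.{proposal.method}"}
```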

## Authority And Safety Rules

- A tool cap grants only the named interface. `NumberTheory` does not imply
  file, shell, network, package install, or proof-publication authority.
- Package installation and environment resolution are separate authorities
  from executing an already-pinned environment.
- External network fetch is separate from local computation. Literature search,
  package download, model-provider calls, and dataset upload are different
  caps.
- Every long-running calculation must have a job id, quota, cancellation path,
  and durable status.
- GPU use requires a GPU/session cap and should record driver/runtime/kernel
  metadata.
- Proof acceptance must be checker-backed. Agent confidence, CAS evidence, or
  SMT success is advisory unless the proof kernel accepts the artifact.
- Published results must cite the artifact ids and package closure ids that
  produced them.
- Commercial or proprietary engines must be opt-in, labeled, and grantable
  only through site policy.
- Linux workload placement must distinguish ordinary resource-limited work
  from capOS-native auto-nohz-eligible work. Linux `nohz_full` inside a guest
  may be useful compatibility or benchmark state, but capOS CPU isolation,
  auto full-nohz activation, housekeeping placement, IRQ routing, and
  exclusive CPU use are outer scheduler-authority decisions, not options an
  agent tool descriptor can set by itself.

## Linux Workload And Virtualization Strategy

The first implementation is likely to consume the generic Linux workload
sandbox substrate for large scientific stacks. Scientific jobs should be
selected by trust and compatibility class:

| Backend | Use | Boundary claim |
| --- | --- | --- |
| namespace/cgroup/seccomp/Landlock sandbox | trusted batch tools and fast command wrappers | shares host Linux kernel; useful policy layer, not strong multi-tenant isolation |
| bubblewrap/nsjail | early command-wrapper executor for `gp`, solvers, proof checkers, and scripts | structured process sandbox over Linux primitives |
| User-Mode Linux | developer/debug fallback when KVM is unavailable | Linux-as-host-process compatibility; not the main strong-isolation path |
| gVisor | container-compatible higher-risk workloads | per-sandbox application kernel reduces direct host-kernel exposure |
| QEMU/KVM Linux guest | broad compatibility, full distro roots, package builds, untrusted notebooks | hardware-backed guest kernel boundary |
| Firecracker or Kata-style microVM | repeated stateless solver/proof/notebook jobs with narrow device models | hardware-backed microVM boundary with smaller operational surface |
| dedicated host or single-tenant node | high-risk tenants, sensitive data, GPU/device passthrough, side-channel-sensitive jobs, long-lived browser/GUI workloads | reduces shared-host and VM-escape blast radius beyond ordinary VM tenancy |

The generic `LinuxWorkloadSandbox` service should record: backend;
image/rootfs/package hashes; sandbox policy or VM device model; kernel
version; CPU affinity; cgroup quota; deployment location; external-host
placement metadata; capOS `NoHzEligibility`/`NoHzActivation` state for
capOS-scheduled proxies or VMMs; guest tickless/nohz state; network policy;
artifact inputs and outputs; and exit reason. A result from a namespace
sandbox and a result from a KVM guest may be functionally equivalent, but
their security, scheduler, and reproducibility claims are different.
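A minimal run-record shape, assuming a subset of the fields above; the backend labels and the `BOUNDARY` mapping paraphrase the table in this section:

```python
from dataclasses import dataclass

# Backend -> boundary claim, condensed from the table in this section.
BOUNDARY = {
    "namespace-sandbox": "shared host kernel",
    "gvisor": "per-sandbox application kernel",
    "kvm-guest": "hardware-backed guest kernel",
    "microvm": "hardware-backed microVM",
}

@dataclass(frozen=True)
class SandboxRunRecord:
    """Subset of the fields `LinuxWorkloadSandbox` should persist per run;
    two runs may be functionally equivalent while these fields differ."""
    backend: str
    rootfs_hash: str
    kernel_version: str
    cpu_affinity: tuple
    network_policy: str
    inputs: tuple
    outputs: tuple
    exit_reason: str

    @property
    def boundary_claim(self) -> str:
        return BOUNDARY.get(self.backend, "unspecified")
```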

For the native capOS auto full-nohz scheduler track, scientific jobs should
use the generic workload placement classes:

- **ordinary placement**: cgroup v2 resource limits and optional affinity for
  normal solver, proof, CAS, package, and notebook jobs.
- **auto-nohz-eligible placement**: explicit capOS eligibility plus CPU-time
  authority for low-jitter benchmark, realtime, GPU-feed, SQPOLL-like, or
  latency-bound workload loops. The outer capOS scheduler must know the
  workload's vCPU/helper/poller threads and must also account for housekeeping
  CPUs, IRQ placement, timers, and deferred kernel work. Guest Linux tickless
  state and external Linux-host isolation state are recorded separately and do
  not by themselves activate capOS nohz.

## Existing Solutions To Adapt

| Area | Adapt first | Reasonable capOS adaptation |
| --- | --- | --- |
| Number theory | PARI/GP | Wrap `gp` early; use PARI C library for stable service calls later. |
| Broad math | SageMath | Host as a Python/Sage kernel with pinned closure and notebook integration. |
| Discrete algebra | GAP | Wrap CLI and package loading; later expose common group-theory methods. |
| Polynomial algebra | Singular | Wrap command/batch mode; later expose polynomial/ideal operations. |
| Algebra research | OSCAR | Host Julia/OSCAR kernel; avoid flattening its object model prematurely. |
| Symbolic Python | SymPy | Embed in Python service for lightweight symbolic calls and code generation. |
| Scientific Python | NumPy/SciPy | Provide Python kernel and batch-job environments with BLAS/LAPACK metadata. |
| Statistics | R | Provide Rscript and R kernel sessions with package closure capture. |
| MATLAB-like workflows | GNU Octave | Provide batch and interactive kernel sessions. |
| SMT | Z3, cvc5 | Provide `SmtSolver` with backend identity, model, core, and timeout fields. |
| Optimization engines | HiGHS, SCIP, OR-Tools | Provide direct solve jobs and higher-level modeling sessions. |
| Modeling layers | JuMP, CVXPY | Host Julia/Python modeling kernels and export normalized model artifacts. |
| Formal proof | Lean/mathlib first; Rocq, Isabelle, Agda later | Provide proof sessions, build logs, theorem search, and checked artifacts. |
| Notebooks | JupyterLab model | Reuse `.ipynb` concepts and kernels but replace ambient authority with caps. |
| Package closure | Nix, Guix, Spack | Ingest closures and recipes; expose capOS package catalogs and Store objects. |
| HPC containers | Apptainer | Use as a Linux-sidecar compatibility bridge, not as native authority. |

## Staged Implementation

### Stage 0: Interface-Only Design

Define schemas for `ScientificSession`, `ArtifactStore`, `PackageClosure`,
`SmtSolver`, `OptimizationSolver`, `ProofSession`, and `NotebookSession`.
No backend porting is required. The goal is to make the authority and result
model reviewable.

### Stage 1: Linux Sidecar Prototype

Run tools on a controlled Linux host or hardware-backed Linux guest and expose
them to capOS through a capability proxy. Namespace/cgroup/seccomp/Landlock
wrappers are acceptable for trusted batch tools, but untrusted notebooks,
model-generated code, package builds, and multi-tenant jobs should use a
QEMU/KVM guest first and Firecracker/Kata-style microVMs later. High-risk
tenants, sensitive data, GPU/device passthrough, and side-channel-sensitive
jobs may require single-tenant hosts instead of shared VM hosts. User-Mode
Linux may remain a developer/debug fallback when KVM is unavailable, but it is
not the default strong-isolation backend. This stage validates the API,
audit, and reproducibility model before capOS-native userspace can run
Python, Julia, R, and large C++ stacks.

Initial tools:

- PARI/GP;
- SymPy;
- Z3 and cvc5;
- HiGHS;
- Lean plus mathlib project build;
- immutable artifact store and provenance records.

### Stage 2: Native Wrapper Services

When capOS userspace has the necessary binary/runtime support, add command
wrapper services for `gp`, `lean`/`lake`, `z3`, `cvc5`, `highs`, `Rscript`,
`octave`, `gap`, and `Singular`. Each wrapper runs with an explicit workspace,
environment, timeout, and resource ledger.
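A Stage 2 wrapper can be a pure function that builds an invocation spec which the runner then executes under its resource ledger; `WrapperSpec` is hypothetical, and the stdin-plus-`quit()` batch pattern should be verified against the local `gp` version:

```python
from dataclasses import dataclass

@dataclass
class WrapperSpec:
    """What the runner needs to execute one wrapped tool call."""
    argv: list
    cwd: str        # explicit workspace root; no ambient working directory
    timeout_s: int
    memory_mb: int  # enforced by the runner's resource ledger, not the tool
    stdin: str

def gp_eval_spec(workspace: str, expr: str, timeout_s: int = 30,
                 memory_mb: int = 512) -> WrapperSpec:
    """Build the invocation for a PARI/GP wrapper call. `gp -q` (quiet) is a
    real flag; feeding the expression plus an explicit quit() on stdin is one
    conventional batch pattern -- an assumption to confirm per gp version."""
    return WrapperSpec(argv=["gp", "-q"], cwd=workspace,
                       timeout_s=timeout_s, memory_mb=memory_mb,
                       stdin=expr + "\nquit()\n")
```

Keeping spec construction separate from execution lets the runner log and audit the exact invocation before anything runs.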

### Stage 3: Notebook And Language Kernels

Add Jupyter-compatible document storage and kernel-launch policy. Python,
Sage, Julia, R, Octave, Lean, and GAP kernels can then run as `KernelSession`
services with capOS-owned artifact capture.

### Stage 4: Package-Closure Store

Import or build Nix/Guix/Spack-style closures into capOS `Store` and
`Namespace` capabilities. Package resolution stays outside the kernel. The
important kernel-visible property is that executable environments are immutable
objects with explicit resource and authority grants.

### Stage 5: Lab Workflow

Combine scientific sessions with hosted-agent workspaces, experiment registry,
review gates, browser/literature tools, GPU/model services, and stateful job
graphs. This is the point where capOS becomes a credible LLM agent research lab
rather than a collection of math commands.

## Open Questions

- Should the first sidecar protocol be Cap'n Proto RPC directly, MCP through a
  gateway, or both?
- Which package-closure source should capOS ingest first: Nix for breadth,
  Guix for scientific reproducibility, or Spack for HPC variants?
- Which hardware-backed Linux guest backend should be first after QEMU/KVM:
  Firecracker for narrow batch workers, Kata-style VM containers for OCI
  integration, or both?
- Which workload classes are eligible for capOS native auto full-nohz
  placement, and how should that map to future `CpuIsolationLease`,
  `NoHzEligibility`, `NoHzActivation`, and `SchedulingContext` authority?
- How much of `.ipynb` should be preserved versus represented as a
  capOS-native notebook object with import/export?
- Which proof artifacts can be reduced to small trusted checker inputs, and
  which require full project build logs for confidence?
- How should floating-point nondeterminism and randomized solver behavior be
  summarized so agents do not overclaim exactness?
- Where should license policy live: package catalog, broker policy, or both?

## Near-Term Recommendation

Do not start by porting SageMath or Jupyter. Start with a small
`scientific-base` sidecar proof:

- `NumberTheory.eval` backed by PARI/GP;
- `SmtSolver.check` backed by Z3 and cvc5;
- `OptimizationSolver.solve` backed by HiGHS for LP/QP/MIP smoke cases;
- `ProofChecker.build` backed by Lean/mathlib for a pinned project;
- immutable artifact/provenance records for every call.
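For `SmtSolver.check`, one portable approach is to generate a single SMT-LIB 2 script and replay the same artifact against either backend, since both Z3 and cvc5 accept SMT-LIB 2 input; `smtlib_script` is an illustrative helper, not the proposed schema:

```python
def smtlib_script(decls: dict, assertions: list,
                  logic: str = "QF_LIA", get_model: bool = True) -> str:
    """Emit an SMT-LIB 2 script. `decls` maps constant name -> sort;
    `assertions` are s-expression strings. Sorting declarations keeps the
    artifact byte-stable for content addressing and cross-backend replay."""
    lines = [f"(set-logic {logic})"]
    lines += [f"(declare-const {name} {sort})"
              for name, sort in sorted(decls.items())]
    lines += [f"(assert {a})" for a in assertions]
    lines.append("(check-sat)")
    if get_model:
        lines.append("(get-model)")
    return "\n".join(lines) + "\n"
```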

That profile gives users, agent runners, and batch services exact arithmetic,
constraint checking, optimization, and formal proof validation while keeping
authority narrow enough for review. SageMath, OSCAR, Jupyter, R, Octave, and
full package closure support should follow after the base interfaces and audit
model are credible.
