Proposal: Scientific Standard Package And Agent Lab Capabilities

capOS should eventually ship a curated scientific standard package: a capability-scoped service graph that gives agents and users high-level access to computer algebra, numerical computing, solvers, formal proof systems, notebooks, reproducible package environments, and experiment records.

This is not a request to turn the kernel into a scientific runtime. The kernel still provides capability tables, address spaces, scheduling, IPC, memory, device, and storage primitives. The scientific package lives in userspace, above the package, workspace, job-graph, model, and broker services.

Design Grounding

Local grounding: the existing capOS package, workspace, job-graph, model, and broker service designs that this proposal builds on.

External grounding is summarized in the research notes and covers PARI/GP, SageMath, GAP, Singular, OSCAR, SymPy, SciPy, R, Octave, JupyterLab, Z3, cvc5, HiGHS, SCIP, OR-Tools, JuMP, CVXPY, Lean/mathlib, Rocq, Isabelle, Agda, Spack, Guix-HPC, Nix, Apptainer, Linux namespaces/cgroups/seccomp/Landlock, User-Mode Linux, gVisor, QEMU/KVM, Firecracker, Kata Containers, and Linux CPU isolation/housekeeping.

Goals

  • Give users, agent runners, and batch services high-level scientific capabilities without granting an unrestricted shell.
  • Make exact computation, numerical computation, optimization, SMT solving, and proof checking ordinary capOS services with explicit authority.
  • Preserve reproducibility: package closure, input data, seed, backend, version, timeout, quota, output, and audit metadata travel with every result.
  • Reuse mature upstream tools wherever possible.
  • Support both interactive research and unattended agent jobs.
  • Keep tool authority separate from model inference. Models propose; trusted capOS runners execute through broker policy.

Non-Goals

  • Do not invent a replacement for SageMath, PARI, GAP, Singular, OSCAR, SciPy, Jupyter, Lean, Rocq, Isabelle, or established solvers.
  • Do not add POSIX, Docker, Conda, Nix, Guix, or Spack as ambient system authority.
  • Do not make notebook execution equivalent to shell access.
  • Do not treat SMT or CAS answers as formal proof unless a proof checker validates an artifact.
  • Do not make this package part of the active in-process threading milestone.

Package Profiles

The standard package should be split into explicit profiles so capOS can ship or grant only what a session needs.

| Profile | Contents | Primary use |
|---|---|---|
| scientific-base | PARI/GP or PARI C service, SymPy, Z3, cvc5, HiGHS, Lean checker, artifact store | Low-risk exact math, solver, and proof assistance |
| scientific-research | SageMath, GAP, Singular, OSCAR/Julia, R, Octave, SciPy, JuMP, CVXPY, SCIP, OR-Tools | Full interactive research workflows |
| scientific-notebook | Jupyter-compatible notebook/session service and language kernels | Literate experiments with replayable artifacts |
| scientific-lab | Experiment registry, workspaces, job graphs, retrieval, review gates, GPU/model integration | Long-running research labs with users, agents, and review workflows |
| scientific-commercial | Optional proprietary/commercial connectors such as Wolfram Engine or commercial solvers | Explicitly licensed site-local extensions |

Profiles grant service roots, not every concrete backend cap. A user, agent runner, or batch service normally receives a ScientificSession facade that advertises only the tools and methods permitted for the current session.
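
As a rough sketch of this facade idea (all names are hypothetical and appear here only for concreteness; nothing below is an existing capOS API), a session object would advertise only the tool roots its profile and broker policy grant:

```rust
/// Hypothetical sketch: a ScientificSession facade that exposes only the
/// tool roots granted for the current session. All names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToolRoot {
    ComputerAlgebra,
    NumberTheory,
    SmtSolver,
    OptimizationSolver,
    ProofChecker,
    NotebookSession,
}

struct ScientificSession {
    profile: &'static str,
    granted: Vec<ToolRoot>,
}

impl ScientificSession {
    /// A caller can only obtain a tool root that the profile granted;
    /// everything else is a capability error, not a silent fallback.
    fn tool(&self, root: ToolRoot) -> Result<ToolRoot, String> {
        if self.granted.contains(&root) {
            Ok(root)
        } else {
            Err(format!("{:?} not granted by profile {}", root, self.profile))
        }
    }
}

fn main() {
    // scientific-base grants only the low-risk roots.
    let base = ScientificSession {
        profile: "scientific-base",
        granted: vec![
            ToolRoot::NumberTheory,
            ToolRoot::SmtSolver,
            ToolRoot::OptimizationSolver,
            ToolRoot::ProofChecker,
        ],
    };
    assert!(base.tool(ToolRoot::SmtSolver).is_ok());
    assert!(base.tool(ToolRoot::NotebookSession).is_err());
}
```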

Capability Surface

Catalog And Environment

  • ScientificCatalog: lists installed profiles, backend identities, supported interfaces, licenses, package closures, and known reproducibility caveats.
  • PackageCatalog: resolves named package environments to content-addressed closures.
  • PackageClosure: immutable description of packages, build inputs, toolchain versions, hashes, license metadata, vulnerability metadata, and supported CPU/GPU features.
  • Environment: starts a bounded interpreter, solver, proof, notebook, or job process with exactly the selected closure and granted caps (see the sketch after this list).
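
A minimal sketch of how a pinned closure and a bounded environment might relate (all names and fields are hypothetical, not a fixed schema):

```rust
/// Hypothetical sketch: an immutable closure description and the bounded
/// environment launched from it. Field names are illustrative only.
struct PackageClosure {
    /// Content address of the full closure (packages, build inputs, toolchains).
    closure_hash: String,
    packages: Vec<String>,
    licenses: Vec<String>,
    /// CPU/GPU features the closure was built for.
    cpu_features: Vec<String>,
}

struct ResourceEnvelope {
    cpu_millis: u64,
    memory_bytes: u64,
    wall_clock_secs: u64,
}

struct Environment {
    /// Environments are pinned to exactly one closure; resolution happened earlier.
    closure: PackageClosure,
    /// The caps granted to the launched process, nothing ambient.
    granted_caps: Vec<String>,
    limits: ResourceEnvelope,
}

impl Environment {
    fn describe(&self) -> String {
        format!(
            "closure {} with {} caps, {}s wall clock",
            self.closure.closure_hash,
            self.granted_caps.len(),
            self.limits.wall_clock_secs
        )
    }
}

fn main() {
    let env = Environment {
        closure: PackageClosure {
            closure_hash: "sha256:...".into(),
            packages: vec!["pari".into(), "z3".into()],
            licenses: vec!["GPL-2.0-or-later".into(), "MIT".into()],
            cpu_features: vec!["avx2".into()],
        },
        granted_caps: vec!["workspace:rw".into(), "artifact-store:append".into()],
        limits: ResourceEnvelope { cpu_millis: 60_000, memory_bytes: 1 << 30, wall_clock_secs: 120 },
    };
    println!("{}", env.describe());
}
```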

Workspaces And Artifacts

  • ResearchWorkspace: branchable namespace for source, notebooks, data, generated files, proofs, and run records.
  • ArtifactStore: immutable objects for solver inputs, proof logs, notebooks, datasets, plots, tables, binaries, and transcripts.
  • ProvenanceLog: append-only record of who or which agent produced an artifact, with model/tool/package/session metadata.
  • ExperimentRegistry: immutable run specifications plus mutable review status, labels, and publication decisions.

CAS And Mathematical Services

  • ComputerAlgebra: general symbolic manipulation facade for factorization, simplification, integration, exact linear algebra, polynomial operations, and expression normalization.
  • NumberTheory: PARI-backed exact number theory, elliptic curves, modular forms, algebraic number fields, L-functions, and related computations.
  • DiscreteAlgebra: GAP-backed group, representation, finite algebra, and combinatorics workflows.
  • PolynomialAlgebra: Singular-backed ideals, modules, Groebner bases, quotient rings, and algebraic geometry computations.
  • JuliaAlgebraKernel: OSCAR/Nemo/Hecke/AbstractAlgebra workflows for cases where a general Julia session is the correct backend.

Each method returns structured values when practical and always records the backend, package closure, input, output, elapsed time, and resource envelope.
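
As a rough sketch (all field names hypothetical, not a committed capOS schema), the per-call record could look like this, returned alongside the structured value:

```rust
use std::time::Duration;

/// Hypothetical sketch of the metadata every CAS call carries back alongside
/// its structured value. Names are illustrative, not a fixed schema.
struct CallRecord {
    backend: String,          // backend identity, e.g. the PARI or GAP service
    closure_hash: String,     // package closure the backend ran under
    input_artifact: String,   // artifact id of the serialized input
    output_artifact: String,  // artifact id of the serialized output
    elapsed: Duration,
    cpu_millis: u64,
    peak_memory_bytes: u64,
}

/// A structured result: the value when it can be represented, plus the record.
struct CasResult<T> {
    value: Option<T>,
    record: CallRecord,
}
```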

Solvers

  • SmtSolver: typed SMT-LIB import/export, assertions, check-sat, model, unsat core, timeout, random seed, proof/certificate metadata, and backend selection among Z3, cvc5, or future solvers.
  • OptimizationSolver: LP, MIP, QP, conic, CP-SAT, routing, scheduling, and nonlinear solve jobs with declared model format, backend, objective, constraints, time limit, memory limit, gap/tolerance policy, and solution status.
  • ModelingSession: JuMP/CVXPY/OR-Tools-style language session for models that need high-level construction rather than direct serialized input.

Solver calls must distinguish optimal, feasible, infeasible, unbounded, unknown, timeout, resource_exceeded, and backend_error. User-facing tools should not collapse these into a single textual answer.
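
A minimal sketch of that distinction as a typed result; the variant names follow the list above, and the struct and summary function are illustrative rather than an existing API:

```rust
/// Sketch: solver outcomes kept distinct instead of collapsed into text.
/// The variants mirror the statuses listed above; names are illustrative.
#[derive(Debug, PartialEq, Eq)]
enum SolveStatus {
    Optimal,
    Feasible,         // a solution exists but optimality was not proven
    Infeasible,
    Unbounded,
    Unknown,          // solver gave up without a classification
    Timeout,
    ResourceExceeded,
    BackendError,
}

struct SolveOutcome {
    status: SolveStatus,
    /// Objective value is only meaningful for Optimal/Feasible.
    objective: Option<f64>,
    backend: String,
}

fn user_facing_summary(o: &SolveOutcome) -> String {
    // A UI or agent tool should surface the status explicitly rather than
    // printing a bare number that hides timeouts or infeasibility.
    match (&o.status, o.objective) {
        (SolveStatus::Optimal, Some(v)) => format!("optimal, objective {v}"),
        (SolveStatus::Feasible, Some(v)) => format!("feasible (not proven optimal), objective {v}"),
        (s, _) => format!("{s:?} from backend {}", o.backend),
    }
}
```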

Formal Proof

  • ProofCatalog: installed proof assistants, libraries, theorem indexes, and package closures.
  • ProofSession: checkout, edit, build, query goals, run tactics, run tests, and produce checked proof artifacts.
  • ProofChecker: batch verification of a named theorem or project under a pinned closure.
  • LemmaSearch: retrieval over local proof libraries, declarations, docs, and prior accepted project artifacts.

The first implementation target should be Lean plus mathlib because it is the most useful default for current agent-assisted mathematics. Rocq, Isabelle, and Agda should remain first-class future backends with separate kernels and project layouts.
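
A minimal sketch, with hypothetical names, of what a checker-backed acceptance record might carry for the Lean-first target:

```rust
/// Sketch: proof acceptance is recorded only when the proof kernel accepted
/// the artifact under a pinned closure. All names are illustrative.
#[derive(Debug)]
enum ProofBackend {
    Lean,     // first implementation target
    Rocq,     // future backends with separate kernels and project layouts
    Isabelle,
    Agda,
}

struct CheckedProof {
    backend: ProofBackend,
    theorem: String,            // fully qualified theorem or project target name
    closure_hash: String,       // pinned package closure used for the build
    build_log_artifact: String, // artifact id of the full build/check log
    /// True only if the backend's own kernel accepted the artifact;
    /// CAS evidence, SMT results, or agent confidence never set this.
    kernel_accepted: bool,
}
```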

Notebook And Interactive Kernel Sessions

  • NotebookDocument: immutable or branchable notebook object with cells, outputs, attachments, environment id, and execution provenance.
  • NotebookSession: starts kernels, executes cells, captures outputs, renders rich media, and gates side effects.
  • KernelSession: Python, Sage, Julia, R, Octave, Lean, GAP, PARI, or other REPL-like process with explicit workspace and package environment caps.

Notebook execution is authority-bearing. Opening a notebook for reading should not execute it. Running a notebook should prompt or use session policy for network access, package installation, writes outside the workspace, long jobs, credential access, GPU use, and publication.
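
A minimal sketch of that gating, with hypothetical effect and policy names:

```rust
/// Sketch: side effects a notebook run may request, each gated separately.
/// Opening a notebook for reading requests none of these. Names illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SideEffect {
    NetworkAccess,
    PackageInstall,
    WriteOutsideWorkspace,
    LongJob,
    CredentialAccess,
    GpuUse,
    Publication,
}

#[derive(Debug, Clone, Copy)]
enum Decision {
    Allow,
    Prompt, // ask the user before the cell runs
    Deny,
}

struct NotebookPolicy {
    rules: Vec<(SideEffect, Decision)>,
}

impl NotebookPolicy {
    fn decide(&self, effect: SideEffect) -> Decision {
        self.rules
            .iter()
            .find(|(e, _)| *e == effect)
            .map(|(_, d)| *d)
            // Anything not explicitly listed falls back to prompting.
            .unwrap_or(Decision::Prompt)
    }
}
```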

Agent Lab Architecture

An LLM agent research lab on capOS should be a service graph:

```mermaid
flowchart LR
    User[User] --> Runner[Agent Runner]
    Runner --> Broker[AuthorityBroker]
    Runner --> Model[LanguageModel]
    Runner --> Sci[ScientificSession]
    Sci --> CAS[CAS Services]
    Sci --> Solvers[Solver Services]
    Sci --> Proof[Proof Services]
    Sci --> Notebook[NotebookSession]
    Sci --> Jobs[JobGraph]
    Sci --> Workspace[ResearchWorkspace]
    Workspace --> Artifacts[ArtifactStore]
    Jobs --> Compute[CPU/GPU/Storage/Network Caps]
    Runner --> Audit[ProvenanceLog]
```

The runner owns the user session and applies tool policy. The model service does not hold scientific tool caps directly. Tool calls from the model become typed proposals to the runner, and the runner invokes ScientificSession methods only when broker policy allows it.
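
A minimal sketch of that flow, assuming hypothetical AuthorityBroker and ScientificSession interfaces (none of these types exist yet): the model emits a typed proposal, and only the runner, after a broker check, turns it into a tool call.

```rust
/// Sketch of the proposal flow: the model never holds tool caps; the runner
/// checks broker policy before executing. All names are illustrative.
#[derive(Debug)]
struct ToolProposal {
    tool: String,      // e.g. "SmtSolver.check"
    arguments: String, // serialized, typed arguments
    session_id: String,
}

trait AuthorityBroker {
    /// Policy decision for this session and tool; the broker sees the full
    /// proposal, not just the tool name.
    fn allows(&self, proposal: &ToolProposal) -> bool;
}

trait ScientificSession {
    fn invoke(&self, tool: &str, arguments: &str) -> Result<String, String>;
}

fn run_proposal(
    broker: &dyn AuthorityBroker,
    session: &dyn ScientificSession,
    proposal: &ToolProposal,
) -> Result<String, String> {
    if !broker.allows(proposal) {
        // Denials are returned to the model as data, never executed.
        return Err(format!("broker denied {}", proposal.tool));
    }
    // Only the runner ever reaches this call; the model has no session cap.
    session.invoke(&proposal.tool, &proposal.arguments)
}
```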

Authority And Safety Rules

  • A tool cap grants only the named interface. NumberTheory does not imply file, shell, network, package install, or proof-publication authority.
  • Package installation and environment resolution are separate authorities from executing an already-pinned environment.
  • External network fetch is separate from local computation. Literature search, package download, model-provider calls, and dataset upload are different caps.
  • Every long-running calculation must have a job id, quota, cancellation path, and durable status.
  • GPU use requires a GPU/session cap and should record driver/runtime/kernel metadata.
  • Proof acceptance must be checker-backed. Agent confidence, CAS evidence, or SMT success is advisory unless the proof kernel accepts the artifact.
  • Published results must cite the artifact ids and package closure ids that produced them.
  • Commercial or proprietary engines must be opt-in, labeled, and grantable only through site policy.
  • Linux workload placement must distinguish ordinary resource-limited work from capOS-native auto-nohz-eligible work. Linux nohz_full inside a guest may be useful as compatibility or benchmark state, but capOS CPU isolation, auto full-nohz activation, housekeeping placement, IRQ routing, and exclusive CPU use are outer scheduler-authority decisions, not options an agent tool descriptor can set by itself.

Linux Workload And Virtualization Strategy

The first implementation is likely to consume the generic Linux workload sandbox substrate for large scientific stacks. The backend for a scientific job should be selected by its trust and compatibility class:

| Backend | Use | Boundary claim |
|---|---|---|
| namespace/cgroup/seccomp/Landlock sandbox | trusted batch tools and fast command wrappers | shares host Linux kernel; useful policy layer, not strong multi-tenant isolation |
| bubblewrap/nsjail | early command-wrapper executor for gp, solvers, proof checkers, and scripts | structured process sandbox over Linux primitives |
| User-Mode Linux | developer/debug fallback when KVM is unavailable | Linux-as-host-process compatibility; not the main strong-isolation path |
| gVisor | container-compatible higher-risk workloads | per-sandbox application kernel reduces direct host-kernel exposure |
| QEMU/KVM Linux guest | broad compatibility, full distro roots, package builds, untrusted notebooks | hardware-backed guest kernel boundary |
| Firecracker or Kata-style microVM | repeated stateless solver/proof/notebook jobs with narrow device models | hardware-backed microVM boundary with smaller operational surface |
| dedicated host or single-tenant node | high-risk tenants, sensitive data, GPU/device passthrough, side-channel-sensitive jobs, long-lived browser/GUI workloads | reduces shared-host and VM-escape blast radius beyond ordinary VM tenancy |

The generic LinuxWorkloadSandbox service should record backend, image/rootfs/package hashes, sandbox policy or VM device model, kernel version, CPU affinity, cgroup quota, deployment location, external-host placement metadata, capOS NoHzEligibility/NoHzActivation state for capOS-scheduled proxies or VMMs, guest tickless/nohz state, network policy, artifact inputs, artifact outputs, and exit reason. A result from a namespace sandbox and a result from a KVM guest may be functionally equivalent, but their security, scheduler, and reproducibility claims are different.
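
A compressed sketch of such a run record, with illustrative field names rather than a schema commitment:

```rust
/// Sketch: what a LinuxWorkloadSandbox run record might carry so results from
/// a namespace sandbox and a KVM guest stay distinguishable. Names illustrative.
#[derive(Debug)]
enum SandboxBackend {
    NamespaceSandbox, // namespaces/cgroups/seccomp/Landlock on a shared kernel
    Gvisor,
    UserModeLinux,
    QemuKvmGuest,
    MicroVm,          // Firecracker or Kata-style
    DedicatedHost,
}

struct SandboxRunRecord {
    backend: SandboxBackend,
    image_hash: String,
    kernel_version: String,
    cpu_affinity: Vec<u32>,
    cgroup_quota: String,
    network_policy: String,
    /// capOS-side scheduler state for the proxy/VMM, recorded separately from
    /// any tickless/nohz state the guest Linux reports about itself.
    capos_nohz_state: String,
    guest_nohz_state: Option<String>,
    input_artifacts: Vec<String>,
    output_artifacts: Vec<String>,
    exit_reason: String,
}
```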

For the native capOS auto full-nohz scheduler track, scientific jobs should use the generic workload placement classes (a data sketch follows the list):

  • ordinary placement: cgroup v2 resource limits and optional affinity for normal solver, proof, CAS, package, and notebook jobs.
  • auto-nohz-eligible placement: explicit capOS eligibility plus CPU-time authority for low-jitter benchmark, realtime, GPU-feed, SQPOLL-like, or latency-bound workload loops. The outer capOS scheduler must know the workload’s vCPU/helper/poller threads and must also account for housekeeping CPUs, IRQ placement, timers, and deferred kernel work. Guest Linux tickless state and external Linux-host isolation state are recorded separately and do not by themselves activate capOS nohz.
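
A minimal data sketch of the two placement classes (all names illustrative):

```rust
/// Sketch: the two generic placement classes for scientific jobs.
/// Eligibility metadata alone never activates capOS nohz; activation stays a
/// scheduler-authority decision. Names are illustrative.
enum Placement {
    /// Normal solver, proof, CAS, package, and notebook jobs.
    Ordinary {
        cgroup_limits: String,
        affinity: Option<Vec<u32>>,
    },
    /// Low-jitter loops: benchmark, realtime, GPU-feed, SQPOLL-like workloads.
    AutoNohzEligible {
        /// Explicit capOS eligibility plus CPU-time authority.
        cpu_time_grant: String,
        /// The outer scheduler must know every vCPU/helper/poller thread.
        pinned_threads: Vec<String>,
        /// Recorded for provenance but not authority-bearing by itself.
        guest_tickless_state: Option<String>,
    },
}
```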

Existing Solutions To Adapt

| Area | Adapt first | Reasonable capOS adaptation |
|---|---|---|
| Number theory | PARI/GP | Wrap gp early; use PARI C library for stable service calls later. |
| Broad math | SageMath | Host as a Python/Sage kernel with pinned closure and notebook integration. |
| Discrete algebra | GAP | Wrap CLI and package loading; later expose common group-theory methods. |
| Polynomial algebra | Singular | Wrap command/batch mode; later expose polynomial/ideal operations. |
| Algebra research | OSCAR | Host Julia/OSCAR kernel; avoid flattening its object model prematurely. |
| Symbolic Python | SymPy | Embed in Python service for lightweight symbolic calls and code generation. |
| Scientific Python | NumPy/SciPy | Provide Python kernel and batch-job environments with BLAS/LAPACK metadata. |
| Statistics | R | Provide Rscript and R kernel sessions with package closure capture. |
| MATLAB-like workflows | GNU Octave | Provide batch and interactive kernel sessions. |
| SMT | Z3, cvc5 | Provide SmtSolver with backend identity, model, core, and timeout fields. |
| Optimization engines | HiGHS, SCIP, OR-Tools | Provide direct solve jobs and higher-level modeling sessions. |
| Modeling layers | JuMP, CVXPY | Host Julia/Python modeling kernels and export normalized model artifacts. |
| Formal proof | Lean/mathlib first; Rocq, Isabelle, Agda later | Provide proof sessions, build logs, theorem search, and checked artifacts. |
| Notebooks | JupyterLab model | Reuse .ipynb concepts and kernels but replace ambient authority with caps. |
| Package closure | Nix, Guix, Spack | Ingest closures and recipes; expose capOS package catalogs and Store objects. |
| HPC containers | Apptainer | Use as a Linux-sidecar compatibility bridge, not as native authority. |

Staged Implementation

Stage 0: Interface-Only Design

Define schemas for ScientificSession, ArtifactStore, PackageClosure, SmtSolver, OptimizationSolver, ProofSession, and NotebookSession. No backend porting is required. The goal is to make the authority and result model reviewable.

Stage 1: Linux Sidecar Prototype

Run tools on a controlled Linux host or hardware-backed Linux guest and expose them to capOS through a capability proxy. Namespace/cgroup/seccomp/Landlock wrappers are acceptable for trusted batch tools, but untrusted notebooks, model-generated code, package builds, and multi-tenant jobs should use a QEMU/KVM guest first and Firecracker/Kata-style microVMs later. High-risk tenants, sensitive data, GPU/device passthrough, and side-channel-sensitive jobs may require single-tenant hosts instead of shared VM hosts. User-Mode Linux may remain a developer/debug fallback when KVM is unavailable, but it is not the default strong-isolation backend. This proves the API, audit, and reproducibility model before native userspace can run Python, Julia, R, and large C++ stacks.

Initial tools:

  • PARI/GP;
  • SymPy;
  • Z3 and cvc5;
  • HiGHS;
  • Lean plus mathlib project build;
  • immutable artifact store and provenance records.

Stage 2: Native Wrapper Services

When capOS userspace has the necessary binary/runtime support, add command wrapper services for gp, lean/lake, z3, cvc5, highs, Rscript, octave, gap, and Singular. Each wrapper runs with an explicit workspace, environment, timeout, and resource ledger.
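
A minimal sketch of what a wrapper invocation and its ledger entry could carry, with hypothetical field names; the actual spawn path would use whatever process and Environment primitives capOS userspace has by that stage:

```rust
use std::time::Duration;

/// Sketch: a wrapper invocation carries its workspace, pinned environment,
/// timeout, and a ledger entry; it never inherits ambient authority.
/// All names are illustrative.
struct WrapperInvocation {
    /// The wrapped binary, e.g. "gp", "z3", "lake", "Rscript".
    program: String,
    args: Vec<String>,
    workspace_cap: String,
    environment_closure: String,
    timeout: Duration,
}

struct ResourceLedgerEntry {
    invocation_id: String,
    cpu_millis: u64,
    peak_memory_bytes: u64,
    exit_code: Option<i32>,
    timed_out: bool,
}

fn describe(inv: &WrapperInvocation) -> String {
    format!(
        "{} {:?} in workspace {} under closure {} (timeout {:?})",
        inv.program, inv.args, inv.workspace_cap, inv.environment_closure, inv.timeout
    )
}
```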

Stage 3: Notebook And Language Kernels

Add Jupyter-compatible document storage and kernel-launch policy. Python, Sage, Julia, R, Octave, Lean, and GAP kernels can then run as KernelSession services with capOS-owned artifact capture.

Stage 4: Package-Closure Store

Import or build Nix/Guix/Spack-style closures into capOS Store and Namespace capabilities. Package resolution stays outside the kernel. The important kernel-visible property is that executable environments are immutable objects with explicit resource and authority grants.

Stage 5: Lab Workflow

Combine scientific sessions with hosted-agent workspaces, experiment registry, review gates, browser/literature tools, GPU/model services, and stateful job graphs. This is the point where capOS becomes a credible LLM agent research lab rather than a collection of math commands.

Open Questions

  • Should the first sidecar protocol be Cap’n Proto RPC directly, MCP through a gateway, or both?
  • Which package-closure source should capOS ingest first: Nix for breadth, Guix for scientific reproducibility, or Spack for HPC variants?
  • Which hardware-backed Linux guest backend should be first after QEMU/KVM: Firecracker for narrow batch workers, Kata-style VM containers for OCI integration, or both?
  • Which workload classes are eligible for capOS native auto full-nohz placement, and how should that map to future CpuIsolationLease, NoHzEligibility, NoHzActivation, and SchedulingContext authority?
  • How much of .ipynb should be preserved versus represented as a capOS-native notebook object with import/export?
  • Which proof artifacts can be reduced to small trusted checker inputs, and which require full project build logs for confidence?
  • How should floating-point nondeterminism and randomized solver behavior be summarized so agents do not overclaim exactness?
  • Where should license policy live: package catalog, broker policy, or both?

Near-Term Recommendation

Do not start by porting SageMath or Jupyter. Start with a small scientific-base sidecar proof (an illustrative call sequence follows the list):

  • NumberTheory.eval backed by PARI/GP;
  • SmtSolver.check backed by Z3 and cvc5;
  • OptimizationSolver.solve backed by HiGHS for LP/QP/MIP smoke cases;
  • ProofChecker.build backed by Lean/mathlib for a pinned project;
  • immutable artifact/provenance records for every call.
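
Purely as illustration, reusing the method names from the list above inside a hypothetical session interface (nothing here is an existing API), a single smoke run against the base profile might look like:

```rust
/// Reuses the shape of the earlier ScientificSession sketch; redefined here so
/// the snippet stands alone. All method names are hypothetical.
trait ScientificSession {
    fn invoke(&self, tool: &str, arguments: &str) -> Result<String, String>;
}

fn base_profile_smoke(session: &dyn ScientificSession) -> Result<(), String> {
    // Exact arithmetic through the PARI/GP-backed service (GP input syntax).
    let factored = session.invoke("NumberTheory.eval", "factor(2^67 - 1)")?;
    // Constraint check through Z3 or cvc5 (SMT-LIB input).
    let sat = session.invoke(
        "SmtSolver.check",
        "(declare-const x Int) (assert (> x 0)) (check-sat)",
    )?;
    // LP smoke case through HiGHS; the model text here is illustrative only,
    // not a real input format.
    let lp = session.invoke("OptimizationSolver.solve", "lp: max x; x <= 3")?;
    // Pinned Lean/mathlib project build (illustrative target name).
    let proof = session.invoke("ProofChecker.build", "project: demo, target: Demo.Main")?;
    // Every returned value already carries an artifact id; publishing a result
    // cites those ids rather than re-running the tools.
    println!("{factored}\n{sat}\n{lp}\n{proof}");
    Ok(())
}
```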

That profile gives users, agent runners, and batch services exact arithmetic, constraint checking, optimization, and formal proof validation while keeping authority narrow enough for review. SageMath, OSCAR, Jupyter, R, Octave, and full package closure support should follow after the base interfaces and audit model are credible.