# Genode OS Framework: Research Report for capOS

Research on Genode's capability-based component framework, session routing,
VFS architecture, and POSIX compatibility -- with lessons for capOS.

## 1. Capability-Based Component Framework

### Core Abstraction: RPC Objects

Genode's fundamental abstraction is the **RPC object**. Every service in the
system is implemented as an RPC object that can be invoked by clients holding
a capability to it. The capability is an unforgeable reference -- a kernel-
protected token that names a specific RPC object and grants the holder the
right to invoke its methods.

Genode supports multiple microkernels (NOVA, seL4, Fiasco.OC, and its own
`base-hw` kernel). The capability model is consistent across all of them,
though the kernel-level implementation details differ. The framework
abstracts kernel capabilities into its own uniform model.

Key properties of Genode capabilities:

- **Unforgeable.** A capability can only be obtained by delegation from a
  holder or creation by the kernel. There is no mechanism to synthesize a
  capability from an integer or address.
- **Typed.** Each capability refers to an RPC object with a specific
  interface. The C++ type system enforces interface contracts at compile time.
- **Delegatable.** A capability holder can pass it to another component via
  RPC arguments, allowing authority to flow through the system graph.
- **Revocable.** Capabilities can be revoked (invalidated). When an RPC
  object is destroyed, all capabilities pointing to it become invalid.
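
The unforgeable, delegatable, and revocable properties can be illustrated with a toy model. This is a sketch under stated assumptions, not Genode's API: `Kernel`, `Capability`, `LogSession`, and `InvalidCapability` are hypothetical names. Python cannot enforce unforgeability, so the underscore-prefixed fields only mark what a real kernel would keep out of a component's reach. The compile-time typing property is not modeled.

```python
class InvalidCapability(Exception):
    """Raised when a capability names an RPC object that no longer exists."""


class Kernel:
    """Toy stand-in for the kernel: owns the table of live RPC objects."""

    def __init__(self):
        self._objects = {}   # object id -> RPC object
        self._next_id = 0

    def create(self, rpc_object):
        """Register an RPC object and return the first capability to it."""
        self._next_id += 1
        self._objects[self._next_id] = rpc_object
        return Capability(self, self._next_id)

    def destroy(self, cap):
        """Destroy the RPC object; every capability naming it becomes invalid."""
        self._objects.pop(cap._object_id, None)

    def _lookup(self, object_id):
        try:
            return self._objects[object_id]
        except KeyError:
            raise InvalidCapability(object_id) from None


class Capability:
    """Opaque reference, produced only by Kernel.create() or delegate()."""

    def __init__(self, kernel, object_id):
        self._kernel = kernel
        self._object_id = object_id

    def invoke(self, method, *args):
        """Call a method on the referenced RPC object."""
        target = self._kernel._lookup(self._object_id)
        return getattr(target, method)(*args)

    def delegate(self):
        """Hand out an independent reference to the same RPC object."""
        return Capability(self._kernel, self._object_id)


class LogSession:
    """Hypothetical RPC object that collects log lines."""

    def __init__(self):
        self.lines = []

    def write(self, text):
        self.lines.append(text)
        return len(text)
```

A holder that received its reference through `delegate()` can invoke the object until the object is destroyed. After that, every outstanding reference fails with `InvalidCapability`.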

### Capability Types in Genode

Genode distinguishes several kinds of capabilities based on what they refer to:

1. **Session capabilities.** The most common type. A session capability
   refers to a service session -- an ongoing relationship between a client
   and a server. Example: a `Log_session` capability lets a client write
   log messages to a specific log session on a LOG server.

2. **Parent capability.** Every component holds an implicit capability to
   its parent. This is the channel through which it requests resources and
   sessions. The parent capability is never explicitly passed -- it's
   built into the component framework.

3. **Dataspace capabilities.** Represent shared-memory regions. A
   `Ram_dataspace` capability grants access to a specific region of
   physical memory. Dataspaces are the mechanism for bulk data transfer
   between components (the RPC path is for small messages and control).

4. **Signal capabilities.** Used for asynchronous notifications. A
   signal source produces signals; holders of the signal capability can
   register handlers. Signals are Genode's primary async notification
   mechanism -- they don't carry data, just wake up the receiver.

### Sessions: The Service Contract

A **session** is the central concept of Genode's inter-component communication.
It represents an established relationship between a client component and a
server component, with negotiated resource commitments.

Session lifecycle:

1. **Request.** A client asks its parent to create a session of a specific
   type (e.g., `Gui::Session`, `File_system::Session`, `Nic::Session`).
   The request includes a label string and optional session arguments.

2. **Routing.** The parent routes the session request according to its
   policy (see Section 2). The request may traverse multiple levels of
   the component tree.

3. **Creation.** The server creates a session object, allocates resources
   for it (e.g., a shared-memory buffer), and returns a session capability
   to the client.

4. **Use.** The client invokes RPC methods on the session capability.
   The server handles the calls. Both sides can use shared dataspaces
   for bulk data.

5. **Close.** Either side can close the session. Resources committed
   to the session are released back.

This model is fundamentally different from Unix IPC (anonymous
pipes/sockets). Every session is:

- **Typed** -- the interface is known at compile time.
- **Named** -- sessions carry a label used for routing and policy.
- **Resource-accounted** -- the client explicitly donates RAM to the
  server via a "session quota" to fund the server-side state for this
  session. This prevents denial-of-service through resource exhaustion.

### Resource Trading

Genode accounts for resources explicitly instead of allocating them on
demand from a global pool. Resources (primarily RAM) flow through the
component tree:

- The kernel grants a fixed RAM budget to core (the root component).
- Core grants budgets to its children (typically just init).
- Init grants budgets to its children according to the deployment config.
- Each component can donate RAM to servers when opening sessions.

The `session_quota` mechanism works as follows: when a client opens a
session, it specifies how much RAM it donates. This RAM transfer goes
from the client's budget to the server's budget. The server uses this
donated RAM to allocate server-side state for the session. When the
session closes, the RAM flows back.

This creates a closed accounting system:
- No component can use more RAM than it was granted.
- Servers don't need their own large budgets -- clients fund their sessions.
- Resource exhaustion is contained: a misbehaving client can only exhaust
  its own budget, not the server's.
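
The quota transfer described above can be sketched as a pair of budget movements. This is a minimal model under stated assumptions: `Component`, `Session`, `transfer`, and `QuotaExceeded` are hypothetical names, budgets are plain integers, and the whole donation returns to the client on close.

```python
class QuotaExceeded(Exception):
    """Raised when a component tries to donate more RAM than it owns."""


class Component:
    def __init__(self, name, ram):
        self.name = name
        self.ram = ram   # RAM quota currently owned, in bytes


def transfer(src, dst, amount):
    """Move quota between two budgets; the total never changes."""
    if amount < 0:
        raise ValueError("amount must be non-negative")
    if amount > src.ram:
        raise QuotaExceeded(f"{src.name} owns {src.ram}, needs {amount}")
    src.ram -= amount
    dst.ram += amount


class Session:
    """A session funded by the client's donation to the server."""

    def __init__(self, client, server, quota):
        transfer(client, server, quota)   # fails before any state is created
        self.client = client
        self.server = server
        self.quota = quota
        self.is_open = True

    def close(self):
        """Return the donated quota to the client exactly once."""
        if self.is_open:
            transfer(self.server, self.client, self.quota)
            self.is_open = False
```

Because a failed `transfer` raises before either budget changes, a client that asks for more than it owns leaves the server's budget untouched, which is the containment property described above.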

### Capability Invocation vs. Delegation

Genode distinguishes two fundamental operations on capabilities:

**Invocation:** calling an RPC method on the capability. The caller sends
a message to the RPC object named by the capability, the server processes
it and returns a result. This is synchronous in Genode -- the caller
blocks until the server replies. (Asynchronous interaction uses signals
and shared memory.)

**Delegation:** passing a capability as an argument in an RPC call. When
a capability appears as a parameter or return value, the kernel transfers
the capability reference to the receiving component. The receiver now
holds an independent reference to the same RPC object. This is how
authority propagates through the system.

Example: when a client opens a `File_system::Session`, the session
creation returns a session capability, which is delegated from the
server to the client. Bulk data follows the same pattern: the server
allocates the session's shared buffer from the donated quota and hands
the client a dataspace capability for it.

Capabilities in Genode RPC are transferred by the kernel during the IPC
operation -- the framework marshals them into a special "capability
argument" slot in the IPC message, and the kernel copies the capability
reference into the receiver's capability space. This is transparent to
application code: capabilities appear as typed C++ objects in the RPC
interface.

## 2. Session Routing

### The Problem Session Routing Solves

In a traditional OS, services are found via well-known names in a global
namespace (D-Bus addresses, socket paths, service names). This creates
ambient authority -- any process can connect to any service if it knows
the name.

Genode has no global service namespace. A component can only obtain
sessions through its parent. The parent decides *which* server to route
each session request to. This means:

- Service visibility is controlled structurally.
- A component can only reach services its parent explicitly allows.
- Different children of the same parent can be routed to different
  servers for the same service type.

### Parent-Child Relationship

Every Genode component (except core) has exactly one parent. The parent:

1. Created the child (spawned it with an initial set of resources).
2. Intercepts all session requests from the child.
3. Routes requests according to its routing policy.
4. Can deny requests entirely (the child gets an error).

This creates a tree structure where authority flows downward. A child
cannot bypass its parent to reach a service the parent didn't approve.

### Init's Routing Configuration

The init process (Genode's `init`) reads an XML configuration that
specifies which services to start and how to route their session requests.
This is the core of system policy.

A minimal init config:

```xml
<config>
  <parent-provides>
    <service name="LOG"/>
    <service name="ROM"/>
    <service name="CPU"/>
    <service name="RAM"/>
    <service name="PD"/>
  </parent-provides>

  <start name="timer">
    <resource name="RAM" quantum="1M"/>
    <provides> <service name="Timer"/> </provides>
    <route>
      <service name="ROM"> <parent/> </service>
      <service name="LOG"> <parent/> </service>
      <service name="CPU"> <parent/> </service>
      <service name="RAM"> <parent/> </service>
      <service name="PD">  <parent/> </service>
    </route>
  </start>

  <start name="test-log">
    <resource name="RAM" quantum="1M"/>
    <route>
      <service name="Timer"> <child name="timer"/> </service>
      <service name="LOG">   <parent/> </service>
      <!-- every other service falls through to the catch-all below -->
      <any-service> <parent/> </any-service>
    </route>
  </start>
</config>
```

Key routing directives:

- `<parent/>` -- route to the parent (upward delegation).
- `<child name="x"/>` -- route to a specific child (sibling routing).
- `<any-child/>` -- route to any child that provides the service.
- `<any-service>` -- catch-all for unspecified service types.
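
How a parent might evaluate such a route can be sketched as a first-match lookup. The encoding below is invented for illustration: rules are `(service, target)` pairs, `"*"` stands for `<any-service>`, and targets are the strings `"parent"`, `"child:NAME"`, or `"any-child"`. First-match ordering, and falling through to the next rule when no child provides the service, are assumptions of this sketch rather than documented init behavior.

```python
ANY_SERVICE = "*"


class RouteDenied(Exception):
    """Raised when no rule matches; the child's request is refused."""


def resolve(rules, service, providers):
    """Return ("parent", None) or ("child", name) for a session request.

    rules     -- ordered list of (service_pattern, target) pairs
    providers -- dict mapping a service name to the children providing it
    """
    for pattern, target in rules:
        if pattern not in (service, ANY_SERVICE):
            continue
        if target == "parent":
            return ("parent", None)
        if target == "any-child":
            children = providers.get(service, [])
            if children:
                return ("child", children[0])
            continue   # nobody provides it; try the next rule
        kind, _, name = target.partition(":")
        if kind == "child" and name:
            return ("child", name)
    raise RouteDenied(service)


# The <route> of the "test-log" component from the config above.
TEST_LOG_RULES = [
    ("Timer", "child:timer"),
    ("LOG", "parent"),
    (ANY_SERVICE, "parent"),
]
```

With `TEST_LOG_RULES`, a `Timer` request resolves to the `timer` child and anything else goes to the parent. Dropping the catch-all turns every unlisted service into a `RouteDenied` error.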

### Label-Based Routing

Labels are strings attached to session requests. They carry
context about *who* is requesting and *what* they want, enabling
fine-grained routing decisions.

When a client requests a session, it attaches a label. As the request
traverses the routing tree, each intermediate component (typically init)
can prepend its own label. By the time the request reaches the server,
the label encodes the full path through the component tree.

Example: a component named `my-app` inside an init subsystem named
`apps` requests a `File_system` session with label `"data"`. The
composed label arriving at the file system server is:
`"apps -> my-app -> data"`.
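
The prefixing step can be sketched in a few lines. The function names are hypothetical, and the `" -> "` separator is taken from the example label above.

```python
SEPARATOR = " -> "


def prefix_label(component_name, label):
    """Prepend one component's name, as an intermediate router would."""
    return component_name + SEPARATOR + label if label else component_name


def compose(path, label):
    """Build the label the server sees.

    path  -- component names from the outermost subsystem to the client
    label -- the label the client attached to its request
    """
    for name in reversed(path):
        label = prefix_label(name, label)
    return label


def requester(label):
    """Everything before the last element identifies who is asking."""
    head, sep, _ = label.rpartition(SEPARATOR)
    return head if sep else ""
```

A server could key its per-client policy on `requester(label)`, for example by choosing a storage directory for each distinct requester.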

The server can use this label for:
- **Access control.** Grant different permissions based on who is asking.
- **Isolation.** Store data in different directories per client.
- **Logging.** Identify which component generated a message.

Label-based routing in init config:

```xml
<start name="fs">
  <provides> <service name="File_system"/> </provides>
  <route> ... </route>
</start>

<start name="app-a">
  <route>
    <service name="File_system" label="data">
      <child name="fs"/>
    </service>
    <service name="File_system" label="config">
      <child name="config-fs"/>
    </service>
  </route>
</start>
```

Here, `app-a`'s file system requests are split: requests labeled `"data"`
go to one server, requests labeled `"config"` go to another. The
application code is unchanged -- the routing is entirely a deployment
decision.

### Routing as Policy

The critical insight is that **routing IS access control**. There is no
separate permission system. If a component's route config doesn't include
a path to a network service, that component has no network access --
period. It cannot discover the network service because it has no way to
name it.

This replaces:
- Firewall rules (routing controls which network services are reachable)
- File permissions (routing controls which file system sessions are available)
- Process isolation policies (routing controls everything)

The routing configuration is equivalent to a whitelist of allowed
service connections for each component. Adding or removing access
means editing the init config, not modifying the component's code
or the server's access control lists.

### Dynamic Routing and Sculpt

In the static case (Genode's test scenarios), routing is defined once
in init's config. In Sculpt OS (Section 6), the routing configuration
can be modified at runtime, allowing users to install applications and
connect them to services dynamically.

## 3. VFS on Top of Capabilities

### The VFS Layer

Genode's VFS (Virtual File System) is a library-level abstraction, not a
kernel feature. It provides a path-based file-like interface implemented
as a plugin architecture within a component's address space.

The VFS exists because many existing applications (and libc) expect
file-like access patterns. Rather than forcing all code to use Genode's
native session/capability model, the VFS provides a translation layer.

Architecture:

```
Application code
  |
  |  POSIX: open(), read(), write()
  v
libc (Genode's port of FreeBSD libc)
  |
  |  VFS API: vfs_open(), vfs_read(), vfs_write()
  v
VFS library (in-process)
  |
  |  Plugin dispatch based on mount point
  v
VFS plugins (in-process)
  |
  +--> <ram> plugin (in-memory file system)
  +--> <fs> plugin (delegates to File_system session)
  +--> <terminal> plugin (delegates to Terminal session)
  +--> <log> plugin (delegates to LOG session)
  +--> <nic> plugin (delegates to Nic session, for socket layer)
  +--> <block> plugin (delegates to Block session)
  +--> <dir> plugin (combines subtrees)
  +--> <tar> plugin (read-only tar archive)
  +--> <import> plugin (populate from ROM)
  +--> <pipe> plugin (in-process pipe pair)
  +--> <rtc> plugin (system clock)
  +--> <zero> plugin (/dev/zero equivalent)
  +--> <null> plugin (/dev/null equivalent)
  ...
```

### VFS Plugin Architecture

Each VFS plugin is a dynamically loadable library (or statically linked
module) that implements a file-system-like interface. Plugins handle:

- **open/close** -- create/destroy file handles
- **read/write** -- data transfer
- **stat** -- metadata queries
- **readdir** -- directory enumeration
- **ioctl** -- device-specific control (limited)

Plugins are composed by the VFS configuration, which is XML embedded in
the component's config:

```xml
<config>
  <vfs>
    <dir name="dev">
      <log/>
      <null/>
      <zero/>
      <terminal name="stdin" label="input"/>
      <inline name="rtc">2024-01-01 00:00</inline>
    </dir>
    <dir name="tmp"> <ram/> </dir>
    <dir name="data"> <fs label="persistent"/> </dir>
    <dir name="socket"> <lxip dhcp="yes"/> </dir>
  </vfs>
  <libc stdout="/dev/log" stderr="/dev/log" stdin="/dev/stdin"
        rtc="/dev/rtc" socket="/socket"/>
</config>
```

This config creates a virtual filesystem tree:
- `/dev/log` -- writes go to the LOG session
- `/dev/null`, `/dev/zero` -- standard synthetic files
- `/dev/stdin` -- reads from a Terminal session
- `/tmp/` -- in-memory filesystem (RAM-backed)
- `/data/` -- delegates to a File_system session labeled "persistent"
- `/socket/` -- network sockets via the in-process `lxip` stack (a port of the Linux TCP/IP stack)

The `<fs>` plugin is the bridge from VFS to Genode's capability world.
When the application does `open("/data/foo.txt")`, the `<fs>` plugin
translates this into a `File_system::Session` RPC call to the external
file system server that the component's routing connects to.
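
The dispatch step, choosing a plugin from the path, can be sketched as a longest-prefix match over mount points. This is an illustration only: `Vfs`, `mount`, and `resolve` are hypothetical names, and backends are represented by plain strings instead of plugin objects.

```python
import posixpath


class Vfs:
    """Maps mount points to backends and resolves paths against them."""

    def __init__(self):
        self._mounts = {}   # normalized mount point -> backend

    def mount(self, mount_point, backend):
        self._mounts[posixpath.normpath(mount_point)] = backend

    def resolve(self, path):
        """Return (backend, path relative to the mount point).

        Walks from the full path up toward the root, so the deepest
        matching mount point wins.
        """
        path = posixpath.normpath(path)
        candidate = path
        while True:
            if candidate in self._mounts:
                relative = path[len(candidate):].lstrip("/")
                return self._mounts[candidate], relative
            parent = posixpath.dirname(candidate)
            if parent == candidate:        # reached the top without a match
                raise FileNotFoundError(path)
            candidate = parent


# Mirrors part of the <vfs> config above.
vfs = Vfs()
vfs.mount("/dev/log", "log")
vfs.mount("/tmp", "ram")
vfs.mount("/data", "fs:persistent")
```

A path outside every mount point fails with `FileNotFoundError`. The application cannot name anything the configuration did not place in its tree.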

### File System Components

Genode has several file system server components:

- **ram_fs** -- in-memory file system server. Multiple components can
  share files through it by routing their `File_system` sessions to it.
- **vfs_server** (previously `vfs`) -- a file system server backed by
  the VFS plugin architecture itself. This enables recursive composition:
  a VFS server can mount another VFS server.
- **fatfs** -- FAT file system driver over a Block session.
- **ext2_fs** -- ext2/3/4 via a ported NetBSD implementation (rump kernel).
- **store_fs / recall_fs** -- content-hash-based storage (experimental
  in some Genode releases).

The file system server is a regular Genode component. It receives a
Block session (from a block device driver), provides File_system sessions,
and the routing determines who can access what:

```
block_driver -> provides Block session
       |
       v
fatfs -> consumes Block session, provides File_system session
       |
       v
application -> consumes File_system session via VFS <fs> plugin
```

### Libc Integration

Genode ports a substantial subset of FreeBSD's libc. The integration
point is the VFS: libc's file operations are implemented by calling
the VFS layer, which dispatches to plugins, which invoke Genode
sessions as needed.

The libc port modifies FreeBSD libc minimally. Most changes are in
the "backend" layer that replaces kernel syscalls with VFS calls:

- `open()` -> `vfs_open()` -> VFS plugin dispatch
- `read()` -> `vfs_read()` -> VFS plugin
- `socket()` -> via VFS socket plugin (`<lxip>` or `<lwip>`)
- `mmap()` -> supported for anonymous mappings and file-backed read-only
- `fork()` -> **NOT supported** (no `fork()` in Genode)
- `exec()` -> **NOT supported** (no in-place process replacement)
- `pthreads` -> supported via Genode's Thread API
- `select()/poll()` -> supported via VFS notification mechanism
- `signal()` -> partial support (SIGCHLD, basic signal delivery)

The key architectural decision: libc talks to the VFS library (in-process),
the VFS talks to Genode sessions (cross-process RPC). Application code
never directly touches Genode capabilities -- the VFS mediates everything.

## 4. POSIX Compatibility

### The Noux Approach (Historical)

Genode's early POSIX approach was **Noux**, a process runtime that emulated
Unix-like process semantics (fork, exec, pipe) on top of Genode. Noux
ran as a single Genode component containing multiple "Noux processes"
that shared an address space but had separate VFS views.

Noux supported:
- `fork()` via copy-on-write within the Noux address space
- `exec()` via in-place program replacement
- `pipe()` for inter-process communication
- A shared file system namespace

Noux was eventually deprecated because:
1. It conflated multiple processes in one address space, undermining
   Genode's isolation model.
2. Fork emulation was fragile and slow.
3. The libc-based VFS approach (Section 3) achieved better compatibility
   with less complexity.

### Current Approach: libc + VFS

The current POSIX compatibility strategy:

1. **FreeBSD libc port.** Provides standard C library functions. Modified
   to use Genode's VFS instead of kernel syscalls.

2. **VFS plugins as POSIX backends.** Each POSIX I/O pattern maps to a
   VFS plugin:
   - File I/O -> `<fs>` plugin -> File_system session
   - Sockets -> `<lxip>` or `<lwip>` plugin -> Nic session (in-process
     TCP/IP stack)
   - Terminal I/O -> `<terminal>` plugin -> Terminal session
   - Device access -> custom VFS plugins

3. **No fork().** The most significant POSIX omission. Programs that
   require `fork()` must be modified to use `posix_spawn()` or Genode's
   native child-spawning mechanism. In practice, many programs use
   fork() only for daemon patterns or subprocess creation, and can be
   adapted.

4. **No exec().** Related to no fork(): there's no in-place process
   replacement. New processes are created as new Genode components.

5. **Signals.** Basic support -- enough for SIGCHLD notification and
   simple signal handling. Complex signal semantics (real-time signals,
   signal-driven I/O) are not supported.

6. **pthreads.** Fully supported via Genode's native threading.

7. **mmap.** Anonymous mappings and read-only file-backed mappings work.
   MAP_SHARED with write semantics is limited.

### What Works in Practice

Genode has successfully ported:

- **Qt5/Qt6** -- the full widget toolkit, including QtWebEngine (Chromium).
  Qt-based applications run on Sculpt.
- **VirtualBox** -- full x86 virtualization (runs Windows, Linux guests).
- **Mesa/Gallium** -- GPU-accelerated 3D graphics.
- **curl, wget, fetchmail** -- network utilities.
- **GCC toolchain** -- compiler, assembler, linker running on Genode.
- **bash** -- with limitations (no job control via signals, no fork-heavy
  patterns). Works for simple scripting.
- **vim, nano** -- terminal editors.
- **OpenSSL/LibreSSL** -- cryptographic libraries.
- **Various system utilities** -- ls, cp, rm, etc. via Coreutils port.

Applications that don't port well:
- Anything deeply dependent on fork+exec patterns (e.g., traditional
  Unix shells for complex scripting).
- Programs relying on procfs, sysfs, or Linux-specific interfaces.
- Daemons using inotify or Linux-specific async I/O.
- Programs that assume global file system namespace visibility.

### Practical Porting Effort

For most POSIX applications, porting involves:

1. Build the application using Genode's ports system (downloads upstream
   source, applies patches, builds with Genode's toolchain).
2. Write a VFS configuration that provides the file-like resources the
   application expects.
3. Write a routing configuration that connects the application to
   required services.
4. Patch `fork()` calls if present (usually replacing with
   `posix_spawn()` or restructuring to avoid subprocess creation).
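
The shape of step 4 can be shown with the standard `posix_spawn` call, here through Python's `os.posix_spawn` wrapper on an ordinary POSIX host. This only illustrates the spawn-and-wait pattern that replaces `fork()` plus `exec()`; it says nothing about how Genode's libc implements the call. `run_child` and `python_child` are hypothetical helper names.

```python
import os
import sys


def run_child(argv):
    """Start argv[0] as a new process and return its exit code.

    The child starts from a fresh program image, so the caller never
    needs a duplicated address space. Returns -1 if the child was
    terminated by a signal.
    """
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


def python_child(code):
    """Convenience wrapper: run a snippet in a separate interpreter."""
    return run_child([sys.executable, "-c", code])
```

Code that forks only to run a helper program usually maps onto this pattern directly. Code that forks to keep running the same program in the child needs restructuring.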

The VFS configuration is where the "impedance mismatch" between POSIX
expectations and Genode capabilities is resolved. The application thinks
it's accessing `/etc/resolv.conf` -- the VFS plugin infrastructure
translates this to capability-mediated access.

## 5. Component Architecture

### Core, Init, and User Components

**Core** (or `base-hw`/`base-nova`/etc.): the lowest-level component,
running directly on the microkernel. Core provides the fundamental
services: RAM allocation, protection domains (PD sessions), CPU time
(CPU sessions), ROM access (boot modules), IRQ delivery, and I/O memory
access. Core is the only component with direct hardware access.
Everything else goes through core.

**Init**: the first user-level component, child of core. Init reads its
XML configuration and manages the component tree. Init's responsibilities:
- Parse `<start>` entries and spawn components.
- Route session requests between components according to `<route>` rules.
- Manage component lifecycle (restart policies, resource reclamation).
- Propagate configuration changes (dynamic reconfiguration in Sculpt).

**User components**: all other components. They can be:
- **Servers** that provide sessions (drivers, file systems, network stacks).
- **Clients** that consume sessions (applications).
- **Both** simultaneously (a network stack consumes NIC sessions and
  provides socket-level sessions).
- **Sub-inits** -- components that run their own init-like management
  for a subtree of components.

### Resource Trading in Practice

Resources in Genode flow through the tree. A concrete example:

1. Core has 256 MB RAM total.
2. Core grants 250 MB to init, keeps 6 MB for kernel structures.
3. Init grants 10 MB to the timer driver, 50 MB to the GUI subsystem,
   20 MB to the network subsystem, 5 MB to a log server.
4. When the GUI subsystem starts a framebuffer driver, it donates 8 MB
   from its 50 MB budget to the driver as a session quota.
5. The framebuffer driver uses this donated RAM for the frame buffer
   allocation.

If the GUI subsystem wants more RAM for a new application, it can
reclaim RAM by closing sessions (getting donated RAM back) or requesting
more from its parent (init).

The accounting is strict: at any point, the sum of all RAM budgets
across all components equals the total system RAM. There is no
over-commit. This prevents the "OOM killer" problem -- each component
knows exactly how much RAM it can use.
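
The numbers in the example can be checked with a short calculation. The sketch assumes that the 165 MB init does not hand out stays in init's own budget, which the example does not state. `example_budgets` and the dictionary keys are hypothetical names.

```python
def example_budgets():
    """Return (total, budgets) in MB after step 4 of the example."""
    total = 256
    core_reserved = 6
    init_budget = total - core_reserved              # 250 MB go to init

    budgets = {"timer": 10, "gui": 50, "net": 20, "log": 5}
    init_unassigned = init_budget - sum(budgets.values())   # 250 - 85 = 165

    # Step 4: the GUI subsystem donates 8 MB to the framebuffer driver.
    budgets["gui"] -= 8
    budgets["fb_drv"] = 8

    budgets["core"] = core_reserved
    budgets["init"] = init_unassigned
    return total, budgets


total, budgets = example_budgets()
# The invariant from the text: budgets always add up to total RAM.
assert sum(budgets.values()) == total
```

The donation in step 4 changes two entries but not the sum, so the invariant holds before and after the session is opened.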

### Practical Component Patterns

**Driver components** follow a common pattern:
- Receive: Platform session (for I/O port/memory access), IRQ session
- Provide: A device-specific session (NIC, Block, GPU, Audio, etc.)
- Stateless: all per-client state funded by session quota

**Multiplexer components**:
- Receive: one instance of a service
- Provide: multiple instances to clients
- Example: NIC router receives one NIC session, provides multiple
  sessions with packet routing between clients

**Proxy components**:
- Forward one session type, possibly filtering or transforming
- Example: nic_bridge, nitpicker (GUI multiplexer), VFS server

**Subsystem inits**:
- A component running its own init for a group of related components
- Isolates the subtree: crash of the subsystem doesn't affect siblings
- Example: Sculpt's drivers subsystem, network subsystem

## 6. Sculpt OS

### What Sculpt Demonstrates

Sculpt OS is Genode's demonstration desktop operating system. It turns
the component framework into a usable system where:

- Users install and run applications at runtime.
- Each application runs in its own isolated component with explicitly
  configured capabilities.
- A GUI lets users connect applications to services (routing).
- The entire system is reconfigurable without reboot.

### Architecture

Sculpt's component tree:

```
core
  |
  init
    |
    +--> drivers subsystem (sub-init)
    |      +--> platform_drv (PCI, IOMMU)
    |      +--> fb_drv (framebuffer)
    |      +--> usb_drv (USB host controller)
    |      +--> wifi_drv (wireless)
    |      +--> ahci_drv (SATA)
    |      +--> nvme_drv (NVMe)
    |      +--> ...
    |
    +--> runtime subsystem (sub-init, user-managed)
    |      +--> (user-installed applications)
    |
    +--> leitzentrale (management GUI)
    |      +--> system shell
    |      +--> config editor
    |
    +--> nitpicker (GUI multiplexer)
    +--> nic_router (network multiplexer)
    +--> ram_fs (shared file system)
    +--> ...
```

### User Experience of Capabilities

In Sculpt, installing an application means:

1. Download the package (a Genode component archive).
2. Edit a "deploy" configuration that specifies which services the
   application can access (routing rules).
3. The runtime subsystem spawns the component with the specified routing.

A text editor gets: `File_system` session (to read/write files), `GUI`
session (for display), `Terminal` session (optionally). It does NOT get:
network access, block device access, or access to other applications'
file systems.

A web browser gets: `GUI` session, `Nic` session (for network), `GPU`
session (for rendering), `File_system` session (for downloads). Each
service connection is an explicit choice.

The deploy config is the security policy. A user can see exactly what
authority each application has, and can change it by editing the config.

### Lessons from Sculpt

1. **Capabilities need a management UI.** Raw capability graphs are
   incomprehensible to users. Sculpt provides a GUI that presents
   service connections in an understandable way (though it's still
   oriented toward power users).

2. **Routing is the killer feature.** Being able to route the same
   session type to different servers for different clients is extremely
   powerful. One application's "file system" is local storage; another's
   is a network share -- same code, different routing.

3. **Sub-inits provide failure isolation.** The drivers subsystem can
   crash and restart without affecting applications. Sculpt's robustness
   comes from this hierarchical isolation.

4. **Dynamic reconfiguration is essential.** A static boot config
   (like capOS's current manifest) is fine for servers and embedded
   systems, but a general-purpose OS needs to add/remove/reconfigure
   components at runtime.

5. **Package management is a routing problem.** Installing an application
   in Sculpt is not "copy binary to disk" -- it's "add a component to
   the runtime subsystem with specific routing rules." The binary is
   almost secondary to the routing.

6. **POSIX compat through VFS works.** Sculpt runs real desktop
   applications (Qt-based apps, VirtualBox, web browser) using the
   VFS-mediated POSIX layer. The capability model doesn't prevent
   running complex existing software -- it just requires explicit
   service configuration.

## 7. Relevance to capOS

### VFS Capability Design

**Genode's approach:** The VFS is an in-process library with a plugin
architecture. It mediates between libc/POSIX and Genode sessions. The VFS
configuration is per-component XML.

**Lessons for capOS:**

1. **Don't put the VFS in the kernel.** Genode's VFS is entirely
   userspace, which is correct for a capability OS. capOS should do the
   same -- the VFS is a library linked into processes that need POSIX
   compatibility, not a kernel subsystem.

2. **Plugin model maps well to Cap'n Proto.** Each Genode VFS plugin
   bridges to a specific session type. In capOS, each VFS "backend"
   would bridge to a specific capability interface:

   | Genode VFS plugin | capOS VFS backend |
   |---|---|
   | `<fs>` -> File_system session | `FsBackend` -> Namespace + Store caps |
   | `<terminal>` -> Terminal session | `TerminalBackend` -> Console cap |
   | `<lxip>` -> Nic session | `NetBackend` -> TcpSocket/UdpSocket caps |
   | `<log>` -> LOG session | `LogBackend` -> Console cap |
   | `<ram>` -> in-process RAM | `RamBackend` -> in-process (no cap needed) |

3. **VFS config should be declarative.** Rather than hardcoding mount
   points, capOS processes using `libcapos-posix` should receive a
   VFS mount table as part of their initial capability set. This could
   be a Cap'n Proto struct:

   ```capnp
   struct VfsMountTable {
       mounts @0 :List(VfsMount);
   }

   struct VfsMount {
       path @0 :Text;           # mount point, e.g. "/data"
       union {
           namespace @1 :Void;  # use the Namespace cap named in capName
           console @2 :Void;    # use a Console cap
           ram @3 :Void;        # in-memory filesystem
           socket @4 :Void;     # socket interface
       }
       capName @5 :Text;        # name of the cap in CapSet backing this mount
   }
   ```

   This separates the VFS topology (a deployment decision) from the
   application code (which just calls `open()`).

4. **Genode's `<fs>` plugin is the key analog.** capOS's Namespace
   capability is equivalent to Genode's File_system session. The
   `libcapos-posix` path resolution layer (`open()` -> `namespace.resolve()`)
   is exactly Genode's `<fs>` VFS plugin. The existing capOS design
   in `docs/proposals/userspace-binaries-proposal.md` is already on the right track.

5. **Consider streaming for large files.** Genode uses shared-memory
   dataspaces for bulk data transfer in file system sessions. capOS's
   current Store interface returns `Data` (a capnp blob), which means
   the entire object is copied per `get()` call. For large files, a
   streaming interface (with a shared-memory buffer and cursor) would
   be more efficient. This is capOS's Open Question #4.
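
The streaming idea from item 5 can be sketched as a cursor that refills a fixed-size window instead of returning the whole object. This is not the capOS Store interface: `BlobStream`, `next_chunk`, and `read_all` are hypothetical names, and a `bytearray` stands in for the shared-memory buffer.

```python
class BlobStream:
    """Exposes a large object one window at a time."""

    def __init__(self, data, window_size):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self._data = memoryview(data)
        self.window = bytearray(window_size)   # stand-in for the shared buffer
        self._cursor = 0

    def next_chunk(self):
        """Refill the window from the cursor; return the valid byte count.

        A return value of 0 means the end of the object was reached.
        """
        chunk = self._data[self._cursor:self._cursor + len(self.window)]
        count = len(chunk)
        self.window[:count] = chunk
        self._cursor += count
        return count


def read_all(stream):
    """Consumer loop: memory held at any moment is bounded by the window."""
    out = bytearray()
    while (count := stream.next_chunk()) > 0:
        out += stream.window[:count]
    return bytes(out)
```

Only the window crosses the process boundary on each call, so the per-call cost depends on the window size, not the object size.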

### Session Routing Patterns

**Genode's approach:** XML-configured routing in init, label-based
dispatch, parent mediates all session requests.

**Lessons for capOS:**

1. **The manifest IS the routing config.** capOS's `SystemManifest`
   with structured `CapRef` source entries such as
   `{ service = { service = "net-stack", export = "nic" } }` is
   functionally equivalent to Genode's init routing config. The
   capOS design already handles the static case well.

2. **Label-based routing is valuable.** Genode's ability to route
   different requests from the same client to different servers
   (based on labels) maps directly to capOS's capability naming.
   capOS already does this implicitly -- a process can receive
   separate `Namespace` caps for "config" and "data". The key
   insight is that this should be a deployment-time decision, not
   an application-time decision.

3. **Consider dynamic routing.** capOS's current manifest is static
   (baked into the ISO). For a more flexible system, init should
   support runtime reconfiguration:
   - Reload the manifest from a Store cap.
   - Add/remove services without reboot.
   - Re-route sessions when services restart.

   Genode achieves this via init's config ROM, which can be updated
   at runtime. capOS could achieve it by having init watch a
   `Namespace` cap for manifest updates.

4. **Parent-mediated routing has costs.** In Genode, every session
   request traverses the component tree. This adds latency and
   complexity. capOS's direct capability passing (a process holds
   a cap directly, not through its parent) avoids this overhead.
   The tradeoff: capOS has less runtime control over routing (once
   a cap is passed, the parent can't intercept invocations on it).

   This is a deliberate design choice. capOS favors direct caps
   (lower overhead, simpler) over proxied caps (more control).
   Genode's session routing is powerful but adds a layer of
   indirection that may not be worth it for capOS's use case.

5. **Service export needs a protocol.** Genode's session model
   has server components explicitly `announce` what services they
   provide. capOS's `ProcessHandle.exported()` mechanism serves
   the same purpose. The manifest's `exports` field pre-declares
   what a service will export, which helps init plan the dependency
   graph before spawning anything.
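
The manifest-watching idea in lesson 3 could be expressed as a small interface that init holds a cap to. This is a minimal sketch under stated assumptions: `ManifestSource` and its methods are hypothetical, and the manifest is carried as opaque `Data` only to keep the fragment self-contained (a real schema would presumably return `SystemManifest` directly).

```capnp
# Hypothetical sketch: not part of capOS's current schema.
interface ManifestSource {
    # Return the current manifest together with a monotonically
    # increasing version number.
    current @0 () -> (manifest :Data, version :UInt64);

    # Complete only once a manifest newer than `knownVersion` exists,
    # so init can leave this call pending instead of polling.
    waitForUpdate @1 (knownVersion :UInt64) -> (version :UInt64);
}
```

On each completion of `waitForUpdate`, init would fetch the new manifest, diff it against the running set of services, and spawn, stop, or re-route only what changed.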

### POSIX Compatibility Without Compromising Capabilities

**Genode's approach:** libc port + VFS + per-component VFS config.
No global namespace. No fork(). Applications see a curated file
tree, not the real system.

**Lessons for capOS:**

1. **The VFS is a capability adapter, not a capability.** The VFS
   library runs inside the process that needs POSIX compatibility.
   It doesn't weaken the capability model because it can only
   access capabilities the process was granted. This matches capOS's
   `libcapos-posix` design exactly.

2. **musl over FreeBSD libc.** Genode uses FreeBSD libc because of
   its clean backend interface. capOS plans to use musl, which has
   an even cleaner `__syscall()` interface. This is a good choice.
   Genode's experience shows that the libc implementation matters
   less than the VFS/backend layer quality.

3. **No fork() is fine.** Genode's component model has no fork(),
   and Genode has run complex software (Qt, VirtualBox, Chromium)
   that way for well over a decade; its libc offers only a limited
   fork()/execve() emulation aimed at Unix command-line tools. The
   applications that truly need fork() are rare and usually need
   only `posix_spawn()` semantics. capOS should not attempt to
   implement fork() -- focus on `posix_spawn()` backed by a
   `ProcessSpawner` cap.

4. **Sockets via in-process TCP/IP stack.** Genode's `<lwip>` and
   `<lxip>` VFS plugins run a TCP/IP stack (lwIP or a port of the
   Linux stack, respectively) inside the application process,
   using a NIC session for raw packet I/O. This avoids the
   overhead of routing every socket call through a separate network
   stack component.

   capOS could offer a similar choice:
   - **Out-of-process:** socket calls go to the network stack
     component via `TcpSocket`/`UdpSocket` caps (safer, more
     isolated, more overhead).
   - **In-process:** an lwIP/smoltcp library runs inside the
     application, consuming a raw `Nic` cap (less isolation, less
     overhead, more authority).

   For most applications, out-of-process sockets via caps are fine.
   For high-performance networking (database, web server), an
   in-process stack over a raw NIC cap may be needed.

5. **select/poll/epoll need async caps.** Genode implements
   select/poll via VFS notifications (signals on file readiness).
   capOS needs the async capability rings (io_uring-inspired) from
   Stage 4 before select/poll can work. This is a natural fit:
   each polled fd maps to a pending capability invocation in the
   completion ring.
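
Lesson 5's mapping from polled descriptors to pending invocations can be made concrete with a readiness interface. This is a minimal sketch: `EventMask`, `Pollable`, and `waitReady` are hypothetical names introduced here for illustration, not part of capOS.

```capnp
# Hypothetical sketch: not part of capOS's current schema.
struct EventMask {
    readable @0 :Bool;
    writable @1 :Bool;
    hangup   @2 :Bool;
}

interface Pollable {
    # Stay pending until at least one event in `interest` is ready,
    # then report which events are ready.
    waitReady @0 (interest :EventMask) -> (ready :EventMask);
}
```

Under this sketch, a `poll()` over N descriptors would submit N `waitReady` invocations to the ring and return when the first completion arrives, translating each `ready` mask back into `revents` for its descriptor.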

### Component Patterns for Cap'n Proto Interfaces

**Genode's patterns and their capOS/Cap'n Proto equivalents:**

1. **Session creation = factory method on a capability.**

   Genode: client requests a `Nic::Session` from its parent, which
   routes to a NIC driver server.

   capOS: client holds a `NetworkManager` cap and calls
   `createTcpSocket()` to get a `TcpSocket` cap. The factory
   pattern is the same, but capOS does it via direct cap invocation
   instead of parent-mediated session requests.

   Cap'n Proto naturally supports this via interfaces that return
   interfaces:

   ```capnp
   interface NetworkManager {
       createTcpSocket @0 () -> (socket :TcpSocket);
       createUdpSocket @1 () -> (socket :UdpSocket);
       createTcpListener @2 (addr :IpAddress, port :UInt16)
           -> (listener :TcpListener);
   }
   ```

2. **Resource quotas in session creation.**

   Genode: session requests include a RAM quota donated from client
   to server.

   capOS should consider this pattern. Currently, capOS processes
   receive a `FrameAllocator` cap for memory. If a server needs
   to allocate memory per-client, the client should fund it.
   A Cap'n Proto schema could encode this:

   ```capnp
   interface FileSystem {
       open @0 (path :Text, bufferPages :UInt32)
           -> (file :File);
       # bufferPages: number of pages the client donates for
       # server-side buffering. Server allocates from a shared
       # FrameAllocator or the client passes frames explicitly.
   }
   ```

   This prevents the denial-of-service problem where a client
   opens many sessions, exhausting the server's memory.

3. **Multiplexer components.**

   Genode: `nic_router` takes one NIC session, provides many.
   `nitpicker` takes one framebuffer, provides many GUI sessions.

   capOS equivalent: a process that consumes a `Nic` cap and
   provides multiple `TcpSocket`/`UdpSocket` caps. This is
   already what the network stack component does in capOS's
   service architecture proposal. Cap'n Proto's interface model
   makes this natural -- the multiplexer implements one interface
   (NetworkManager) using another (Nic).

4. **Attenuation = capability narrowing.**

   Genode: servers can return restricted capabilities (e.g., a
   read-only file handle from a read-write file system session).

   capOS: already planned via Fetch -> HttpEndpoint narrowing,
   Store -> read-only Store, Namespace -> scoped Namespace. The
   pattern is sound. Cap'n Proto interfaces make the attenuation
   explicit in the schema.

5. **Dataspace pattern for bulk data.**

   Genode uses shared-memory dataspaces for efficient bulk transfer
   (file contents, network packets, framebuffers). The RPC path
   carries only small control messages and capability references.

   capOS currently moves Cap'n Proto control messages through capability
   rings and bounded kernel scratch, with no zero-copy bulk-data object yet.
   For bulk data, capOS should add a `SharedBuffer` capability:

   ```capnp
   interface SharedBuffer {
       # Map a shared memory region into caller's address space
       map @0 () -> (addr :UInt64, size :UInt64);
       # Notify that data has been written to the buffer
       signal @1 (offset :UInt64, length :UInt64) -> ();
   }
   ```

   File system and network operations would use SharedBuffer for
   data transfer and capability invocations for control, matching
   Genode's split between RPC and dataspaces.

6. **Sub-init pattern for failure domains.**

   Genode: a sub-init manages a subtree of components. If a
   component in the subtree crashes, the sub-init restarts it
   without involving the rest of the system.

   capOS: a supervisor process (not necessarily init) holds a
   `ProcessSpawner` cap and manages a group of services. This is
   already described in the service architecture proposal's
   supervision tree. The key addition from Genode: make
   sub-supervisors a first-class pattern with their own manifest
   fragments, not just ad-hoc supervision loops.
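
The attenuation pattern in item 4 can be shown with the Store -> read-only Store case. This is a minimal sketch: the method names, the `Data` key type, and the `ReadOnlyStore` interface are assumptions made for illustration, not capOS's actual `Store` schema.

```capnp
# Hypothetical sketch: not capOS's actual Store schema.
interface ReadOnlyStore {
    get @0 (key :Data) -> (value :Data);
}

interface Store {
    get @0 (key :Data) -> (value :Data);
    put @1 (key :Data, value :Data) -> ();

    # Mint a new capability that exposes only `get`. The server backs
    # it with a distinct object, so its holder cannot regain `put`.
    readOnly @2 () -> (view :ReadOnlyStore);
}
```

`ReadOnlyStore` is deliberately a separate interface rather than a base of `Store`: if the narrowed cap pointed at the same object, a holder could cast it back to the full interface and recover write access.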

## Summary of Key Takeaways for capOS

| Area | Genode approach | capOS adaptation |
|---|---|---|
| Capability model | Kernel-enforced caps to RPC objects | Kernel-enforced caps to Cap'n Proto objects (aligned) |
| Service discovery | Parent-mediated session routing | Manifest-driven cap passing at spawn (simpler, less dynamic) |
| VFS | In-process library with plugin architecture | `libcapos-posix` with mount table from CapSet (same pattern) |
| POSIX | FreeBSD libc + VFS backends | musl + `libcapos-posix` backends (same architecture) |
| fork() | Not supported | Not supported (use posix_spawn -> ProcessSpawner) |
| Bulk data | Shared-memory dataspaces | SharedBuffer design exists; implementation pending |
| Resource accounting | Session quotas (RAM donated per session) | Authority-accounting design exists; unified ledgers pending |
| Routing labels | String labels on session requests, routed by init | Cap naming in manifest serves same purpose |
| Dynamic reconfig | Init config ROM updated at runtime | Manifest reload via Store cap (future) |
| Failure isolation | Sub-inits as failure domains | Supervisor processes (same concept, different mechanism) |
| Async notification | Signal capabilities | Async cap rings / io_uring model (more general) |

### Top Recommendations

1. **Add session quotas / resource trading.** This is the most important
   Genode pattern capOS hasn't adopted yet. Without it, a malicious client
   can exhaust a server's memory by opening many capability sessions.
   Design resource donation into the Cap'n Proto schema for session-creating
   interfaces.

2. **Design a SharedBuffer capability.** Copying capnp messages through the
   kernel is acceptable for small control messages but costs a copy per
   transfer, which is too expensive for bulk data. A shared-memory
   mechanism (like Genode's dataspaces) is essential for file I/O, networking,
   and GPU rendering.

3. **Keep VFS as a library, not a service.** Genode's in-process VFS is the
   right pattern. capOS's `libcapos-posix` should work the same way -- a
   library that translates POSIX calls to capability invocations within the
   process. No VFS server component needed (though a file system *server*
   implementing the Namespace/Store interface is separate).

4. **Add a declarative VFS mount table to process init.** Each POSIX-compat
   process should receive a mount table (as a capnp struct) that maps paths
   to capabilities. This separates deployment policy from application code,
   matching Genode's per-component VFS config.

5. **Plan for dynamic reconfiguration.** The static manifest is fine for now,
   but Sculpt shows that a usable capability OS needs runtime service
   management. Design init so it can accept manifest updates through a cap,
   not just from the boot image.

6. **Don't over-engineer routing.** Genode's parent-mediated session routing
   is powerful but complex. capOS's direct capability passing is simpler and
   sufficient for most use cases. Add proxy/mediator patterns only when
   specific needs arise (e.g., capability revocation, load balancing).
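
The mount table from recommendation 4 could be a small capnp struct handed to a process at startup. This is a minimal sketch: `MountTable`, `Mount`, and every field name are hypothetical, and entries are assumed to refer to caps by their name in the process's CapSet rather than embedding the caps directly.

```capnp
# Hypothetical sketch: not part of capOS's current schema.
struct MountTable {
    entries @0 :List(Mount);

    struct Mount {
        path     @0 :Text;  # POSIX path prefix, e.g. "/etc"
        capName  @1 :Text;  # name of the backing cap in the CapSet
        readOnly @2 :Bool;  # reject writes below this prefix
    }
}
```

`libcapos-posix` would resolve each path against the longest matching `path` prefix and forward the call to the named cap, so the same binary could be deployed with different file trees without recompilation.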

## References

- Genode Foundations book (genode.org/documentation/genode-foundations/)
  -- the authoritative source for architecture, session model, routing,
  VFS, and component composition.
- Norman Feske, "Genode Operating System Framework" (2008-2025) --
  release notes and design documentation at genode.org.
- Sculpt OS documentation at genode.org/download/sculpt -- practical
  deployment of the capability model.
- Genode source repository: github.com/genodelabs/genode -- reference
  implementations of VFS plugins, file system servers, libc port.
