# Research: Browser Engines, Document Engines, and Agent Browsers

Survey of mainstream browser engines, embedding paths, automation protocols,
and Donut Browser-style profile orchestration for
[Browser Capability and Agent Web Sessions](../proposals/browser-capability-proposal.md).

## Source Snapshot

Checked on 2026-04-30:

- Chromium Ozone overview:
  <https://chromium.googlesource.com/chromium/src/+/main/docs/ozone_overview.md>
- Chromium Embedded Framework:
  <https://github.com/chromiumembedded/cef>
- Microsoft Edge WebView2:
  <https://developer.microsoft.com/en-us/microsoft-edge/webview2>
- WebKit ports and WPE WebKit:
  <https://docs.webkit.org/Ports/Introduction.html>,
  <https://webkit.org/wpe/>
- Mozilla Gecko and GeckoView:
  <https://firefox-source-docs.mozilla.org/overview/gecko.html>,
  <https://firefox-source-docs.mozilla.org/mobile/android/geckoview/contributor/geckoview-architecture.html>
- Servo:
  <https://servo.org/>
- Ladybird:
  <https://ladybird.org/>,
  <https://github.com/LadybirdBrowser/ladybird>
- SpiderMonkey:
  <https://spidermonkey.dev/>,
  <https://firefox-source-docs.mozilla.org/js/>
- Boa:
  <https://github.com/boa-dev/boa>
- JavaScriptCore:
  <https://docs.webkit.org/Deep%20Dive/JSC/JavaScriptCore.html>
- QuickJS:
  <https://www.bellard.org/quickjs/>
- Chrome DevTools Protocol:
  <https://chromedevtools.github.io/devtools-protocol/>
- WebDriver BiDi:
  <https://www.w3.org/TR/webdriver-bidi/>
- Playwright browser support:
  <https://playwright.dev/docs/browsers>
- Model Context Protocol, version 2025-11-25 (latest as of this snapshot):
  <https://modelcontextprotocol.io/specification/2025-11-25/>
- Donut Browser:
  <https://github.com/zhom/donutbrowser>,
  <https://donutbrowser.com/mission/>,
  <https://donutbrowser.com/use-cases/automation/>

## Design Consequences For capOS

- Do not make a browser engine a near-term kernel or GUI prerequisite. Modern
  browser engines assume a large userspace substrate: processes, threads,
  shared memory, timers, files, DNS, sockets/TLS, fonts, image codecs, GPU or
  software compositing, profile storage, crash handling, and a sandbox.
- Split browser work into three tracks:
  **agent/shell browser sessions first**, a **cap-native document engine** as
  the middle target, then **visual browser after GUI**. The first track can
  start as a capability wrapper around an external or hosted engine. The middle
  track validates cap-backed web host APIs over provided document data. The
  visual-browser track needs compositor, input, fonts, storage, networking,
  and userspace-driver safety.
- Treat browser profiles as capability objects. Cookies, local storage,
  cache, permissions, proxy selection, downloads, and automation endpoints
  should be held by `BrowserProfile`/`BrowserContext` caps, not ambient files
  under a hidden profile directory.
- Standardize the agent-facing surface above CDP/WebDriver BiDi, not below it.
  CDP is powerful and Chromium-specific; WebDriver BiDi is standardizing
  bidirectional browser automation. capOS should expose a typed, narrowed
  `BrowserSession` capability and use CDP/BiDi/Playwright only as backends.
- Borrow Donut Browser's useful product ideas -- profile isolation, local API,
  persistent sessions, per-profile proxy/VPN selection, MCP integration, and
  AI-control hooks -- without adopting anti-detection as a capOS goal.
  Fingerprint, geolocation, locale, proxy, and user-agent choices must be
  explicit, auditable policy, not stealth defaults.
- Reuse the project rule "the interface is the permission." A process with
  `BrowserNavigate` can navigate; a process with `BrowserReadPage` can inspect
  page state; a process with `BrowserInput` can click/type; a process with
  `BrowserDownload` and a granted `DownloadSink` can receive downloaded bytes.
  Bundling all of those into one raw DevTools port would recreate ambient
  authority.
- Treat a browser as a shell capability, not as the shell. The native shell or
  agent runner may hold a browser session and use it as a tool, but browser
  JavaScript must not directly hold the shell's file, launch, network, or
  approval capabilities.
- Scope the middle track as a **cap-native document engine**: JS, DOM/CSS,
  layout, rendering, and perhaps WebAssembly over caller-provided
  document/resource data, with web host APIs backed by explicit capOS
  capabilities. This is not full internet browsing, but it could power local
  HTML/CSS/JS apps and test the browser authority model earlier.
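
A minimal Rust sketch of the "interface is the permission" rule listed above.
The trait names mirror the caps named in this section, but the method
signatures, `MockSession`, and `crawl_step` are hypothetical and exist only for
illustration. In capOS the real boundary would be the capability handle a
process holds; Rust trait objects only model that shape in-process.

```rust
// Hypothetical facet traits: one narrow interface per granted authority.
pub trait BrowserNavigate {
    fn navigate(&mut self, url: &str) -> Result<(), String>;
}

pub trait BrowserReadPage {
    fn title(&self) -> String;
}

/// Stand-in for a session owned by the trusted browser service. A real
/// backend would talk to CDP/BiDi here; this mock only records the URL.
#[derive(Default)]
pub struct MockSession {
    current_url: String,
}

impl BrowserNavigate for MockSession {
    fn navigate(&mut self, url: &str) -> Result<(), String> {
        // Assumed policy for the sketch: only http(s) navigation is allowed.
        if !(url.starts_with("https://") || url.starts_with("http://")) {
            return Err(format!("scheme not allowed: {}", url));
        }
        self.current_url = url.to_string();
        Ok(())
    }
}

impl BrowserReadPage for MockSession {
    fn title(&self) -> String {
        format!("page at {}", self.current_url)
    }
}

/// A workload that was granted only the navigate facet. Its signature gives
/// it no way to read page state, send input, or receive downloads.
pub fn crawl_step(nav: &mut dyn BrowserNavigate, url: &str) -> Result<(), String> {
    nav.navigate(url)
}
```

The holder of the full `MockSession` can still call `title()`, while
`crawl_step` cannot, which is the separation a single raw DevTools port would
erase.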

## Engine Portability Surface

### Chromium / Blink

Chromium has the broadest web compatibility and the strongest automation
ecosystem. Ozone is the relevant porting layer: it centralizes low-level input
and graphics behind platform interfaces, supports runtime platform binding, and
expects new platforms to implement an Ozone backend. CEF is the production
embedding path for many native applications: it wraps Chromium/Blink behind
stable APIs, binary distributions, and release branches tracking Chromium.
WebView2 is Microsoft's Windows embedding product around Edge/Chromium, with
evergreen and fixed-version runtime choices.

capOS implications:

- The best near-term backend for agent/shell usage is an external
  Chromium-family process controlled through CDP, WebDriver BiDi, or
  Playwright, with capOS wrapping the endpoint as typed caps.
- A native capOS Chromium port is a very large post-GUI project. The likely
  port boundary is Ozone plus a capOS sandbox/profile/network/storage backend,
  not direct Blink surgery.
- CDP must not be directly handed to ordinary capOS workloads. It exposes
  navigation, DOM, network, runtime, storage, input, tracing, and debugging
  authority in one endpoint and has no stable backward-compatibility guarantee
  for tip-of-tree protocol use.

### WebKit / WPE

WebKit's upstream port model makes ports first-class maintainable units.
WebKitGTK and WPE are maintained by Igalia; WPE is specifically designed as a
small-footprint embedded WebKit port with a backend architecture, hardware
acceleration, GStreamer media, and periodic releases.

capOS implications:

- WPE is the most plausible visual-browser candidate once capOS has a GUI
  substrate because it is meant for embedded systems without a full desktop
  toolkit.
- WPE still needs a platform backend, graphics/EGL or software fallback,
  input, fonts, networking/TLS, storage, media dependencies, and an update
  story. It is not an early shell feature.
- WebKit's port/release discipline is useful precedent for a capOS browser
  backend: keep platform-specific code narrow and upstreamable where possible.

### Gecko / GeckoView

Gecko is Firefox's full web platform: JavaScript, layout, graphics, media,
networking, profiles, preferences, principals, and more. GeckoView is Mozilla's
Android embedding library and powers active Mozilla Android browsers. Its API
separates `GeckoRuntime`, `GeckoSession`, and `GeckoView`, delegates storage and
UI behavior to embedders, and hides internal principals from the public API.

capOS implications:

- Gecko is credible as an external backend, especially for browser diversity
  and WebDriver BiDi, but GeckoView itself is Android-specific and not a
  desktop/no-OS embedding path for capOS.
- Gecko's principal model is important precedent: origin/security context is
  a first-class internal object. capOS should make origin/session policy
  explicit in its browser capability layer rather than flattening it to URLs.
- The runtime/session/view split maps cleanly to capOS capabilities:
  engine/service supervision, per-profile context, and visual surface should
  be separate authorities.
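
To illustrate why origin should be a first-class object rather than a URL
string, here is a small Rust sketch. `Origin`, `SessionPolicy`, and their
methods are hypothetical names, and the parser is deliberately tiny: it accepts
only lowercase `http`/`https` schemes and does not handle userinfo or every
IPv6 form. A real service would use a full URL parser.

```rust
use std::collections::HashSet;

/// Scheme/host/port triple, compared as a unit instead of as a URL prefix.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Origin {
    pub scheme: String,
    pub host: String,
    pub port: u16,
}

impl Origin {
    /// Returns None for anything that is not a plain http/https URL.
    pub fn from_url(url: &str) -> Option<Origin> {
        let (scheme, rest) = url.split_once("://")?;
        let default_port = match scheme {
            "http" => 80,
            "https" => 443,
            _ => return None,
        };
        // The authority ends at the first '/', '?' or '#'.
        let authority = rest
            .split(|c: char| c == '/' || c == '?' || c == '#')
            .next()?;
        if authority.is_empty() || authority.contains('@') {
            return None; // userinfo is out of scope for this sketch
        }
        let (host, port) = match authority.rsplit_once(':') {
            Some((h, p)) => (h, p.parse::<u16>().ok()?),
            None => (authority, default_port),
        };
        Some(Origin {
            scheme: scheme.to_string(),
            host: host.to_ascii_lowercase(),
            port,
        })
    }
}

/// Hypothetical per-session policy: an explicit set of permitted origins.
pub struct SessionPolicy {
    allowed: HashSet<Origin>,
}

impl SessionPolicy {
    pub fn from_urls(urls: &[&str]) -> SessionPolicy {
        SessionPolicy {
            allowed: urls.iter().filter_map(|u| Origin::from_url(u)).collect(),
        }
    }

    pub fn permits(&self, url: &str) -> bool {
        Origin::from_url(url).map_or(false, |o| self.allowed.contains(&o))
    }
}
```

A policy built from `https://example.org/` permits `https://example.org/x` but
not `https://example.org.evil.test/x`, which a naive string-prefix check on
URLs would let through.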

### Servo

Servo is a Rust browser engine with WebView embedding ambitions, WebGL/WebGPU
support, modular architecture, parallel layout, and active cross-platform work.
It is not yet a mainstream compatibility replacement for Chromium/WebKit/Gecko,
but it is closer to capOS's implementation culture than the large C++ engines.

capOS implications:

- Servo is the best research-aligned engine to track for a future native
  capOS engine experiment because Rust and modular embedding fit capOS better
  than direct Chromium/Gecko ports.
- It is not the first user-facing browser choice if the goal is broad web
  compatibility for operators or agents.
- Servo's WebView API and crate decomposition are worth watching for a
  possible `BrowserView`/`BrowserSession` backend once capOS has GUI and
  ordinary userspace dependencies.

### Ladybird / LibWeb

Ladybird is building an independent browser engine from scratch, with an alpha
target for Linux and macOS in 2026. It uses a multi-process architecture and is
focused on standards rather than embedding today. It is valuable prior art for
independent engine architecture and process separation, not a near-term capOS
dependency.

capOS implications:

- Track Ladybird for architecture ideas: isolated renderer processes, separate
  network and image-decoder processes, and specification-driven development.
- Do not depend on Ladybird for capOS's browser plan until its API, platform
  support, and compatibility stabilize.
- Its "no inherited engine" posture is inspirational but not pragmatic for
  capOS in the near term. capOS should expose capability-native browser APIs
  while reusing maintained engines underneath.

## Cap-Native Document Engine Substrate

A cap-native document engine is a smaller target than a full browser. It
executes a document graph supplied by capOS -- for example a boot package,
Store object, generated UI bundle, or test fixture -- and returns a rendered
surface, screenshot, event stream, and bounded DOM/accessibility snapshot.
Networking, storage, permissions, clipboard, downloads, and device access are
not internal browser privileges; they are host bindings backed by separate
capabilities.
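
One way to picture the "bounded DOM/accessibility snapshot" is a traversal
with a hard node budget. The Rust sketch below assumes a hypothetical `Node`
tree and `bounded_snapshot` function; the line format and the budget parameter
are illustrative choices, not a defined capOS format.

```rust
/// Hypothetical simplified accessibility node.
#[derive(Debug, Clone)]
pub struct Node {
    pub role: String,
    pub name: String,
    pub children: Vec<Node>,
}

impl Node {
    pub fn new(role: &str, name: &str, children: Vec<Node>) -> Node {
        Node { role: role.to_string(), name: name.to_string(), children }
    }
}

#[derive(Debug)]
pub struct Snapshot {
    pub lines: Vec<String>,
    /// True when the budget ran out before the whole tree was visited.
    pub truncated: bool,
}

/// Depth-first, document-order walk that stops after `max_nodes` nodes.
pub fn bounded_snapshot(root: &Node, max_nodes: usize) -> Snapshot {
    let mut snap = Snapshot { lines: Vec::new(), truncated: false };
    // Explicit stack instead of recursion, so deep trees cannot overflow
    // the call stack.
    let mut stack: Vec<(&Node, usize)> = vec![(root, 0)];
    while let Some((current, depth)) = stack.pop() {
        if snap.lines.len() >= max_nodes {
            snap.truncated = true;
            break;
        }
        snap.lines.push(format!(
            "{}{} \"{}\"",
            "  ".repeat(depth),
            current.role,
            current.name
        ));
        // Push children in reverse so they pop in document order.
        for child in current.children.iter().rev() {
            stack.push((child, depth + 1));
        }
    }
    snap
}
```

The caller always learns whether the snapshot is complete, so an agent can
request a narrower subtree instead of silently working from partial page
state.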

This track changes the portability question. Instead of asking "which browser
can capOS port?", it asks "which engine pieces can run with capOS as the host
environment?"

### Servo As A Document Engine

Servo is the closest architectural fit for this middle track. It is written in
Rust, is embeddable, modular, and parallel, and already presents itself as a
WebView-capable engine. The value for capOS is not only memory safety but also
the possibility of treating the embedding API as the boundary where `fetch`,
storage, permission prompts, surfaces, and resource loading are backed by
typed caps.

Risks:

- Servo still brings a large standards surface.
- API stability and completeness must be checked at implementation time.
- A WebView embedding API is not the same as a small deterministic
  document-rendering library; capOS may still need substantial host glue.

### Ladybird / LibWeb As A Document Engine

Ladybird's LibWeb/LibJS stack is attractive as readable independent-engine
prior art. Its multi-process browser architecture also maps well to capOS
service decomposition. However, Ladybird is focused on building a full browser,
not on providing a stable embeddable document engine for external hosts.

capOS should track it for design ideas and perhaps future experiments, but
should not treat it as the near-term substrate for local HTML/CSS/JS apps.

### SpiderMonkey

SpiderMonkey is Mozilla's JavaScript and WebAssembly engine, used by Firefox
and Servo, and can be embedded in C++ and Rust projects. It is useful if capOS
wants a serious JS/Wasm runtime while building DOM/layout/rendering and host
bindings separately or while experimenting with Servo components.

The tradeoff is that SpiderMonkey is only the JS/Wasm engine. DOM, CSS, layout,
rendering, networking, storage, event loops, Web APIs, and browser security
objects remain host responsibilities unless capOS embeds a larger engine.

### JavaScriptCore

JavaScriptCore is WebKit's ECMAScript engine and an optimizing VM with
interpreter and JIT tiers. It is a mature engine, but its natural home is
inside WebKit. For capOS, JavaScriptCore is most relevant if the visual-browser
track chooses WPE/WebKit; it is less obviously attractive as a standalone
cap-native document-engine substrate than Servo or a Rust-native JS engine.

### Boa

Boa is an embeddable JavaScript engine written in Rust, with actively
maintained crates and a focus on ECMAScript conformance. It is attractive for
capOS experiments because it is Rust, smaller than the mainstream browser JS
engines, and easier to embed in native services.

The tradeoff is compatibility and performance. Boa is a plausible substrate
for trusted/local UI scripting or early host-binding proofs, not a replacement
for the JS engine in a general web browser.

### QuickJS

QuickJS is a small embeddable JavaScript engine. It is useful as a reference
for tiny host-controlled JS runtimes and deterministic local scripting. It is
not a DOM/layout/rendering engine, and embedding it does not provide browser
compatibility.

### Consequences

- A cap-native document engine should start with local/trusted bundles, not
  arbitrary internet pages.
- The host API contract matters more than the JS engine choice. `fetch`,
  storage, clipboard, downloads, timers, workers, and Wasm imports must all be
  explicit cap-backed facets.
- The first proof can be intentionally small: render a packaged HTML/CSS/JS
  dashboard or demo UI, capture a screenshot and accessibility/DOM snapshot,
  and prove that missing network/storage/download caps fail closed.
- Full browser compatibility remains a later engine-port problem. This track
  buys capOS-native web UI and authority-model validation, not Chrome parity.
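
The fail-closed requirement above can be sketched in Rust as host bindings
whose facets are optional. `HostBindings`, `FetchCap`, `StorageCap`,
`HostError`, and `BundleFetch` are hypothetical names for illustration; the
real host API contract is still to be designed.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum HostError {
    /// The document asked for a facet that was never granted.
    MissingCap(&'static str),
}

pub trait FetchCap {
    fn fetch(&self, url: &str) -> Vec<u8>;
}

pub trait StorageCap {
    fn get(&self, key: &str) -> Option<String>;
}

/// Everything the document engine may reach. `None` means "not granted";
/// there is no ambient fallback.
#[derive(Default)]
pub struct HostBindings {
    pub fetch: Option<Box<dyn FetchCap>>,
    pub storage: Option<Box<dyn StorageCap>>,
}

impl HostBindings {
    /// What a script-level `fetch()` would bottom out in.
    pub fn js_fetch(&self, url: &str) -> Result<Vec<u8>, HostError> {
        match &self.fetch {
            Some(cap) => Ok(cap.fetch(url)),
            None => Err(HostError::MissingCap("fetch")),
        }
    }

    /// What a script-level storage read would bottom out in.
    pub fn js_storage_get(&self, key: &str) -> Result<Option<String>, HostError> {
        match &self.storage {
            Some(cap) => Ok(cap.get(key)),
            None => Err(HostError::MissingCap("storage")),
        }
    }
}

/// Test double that serves one fixed resource, as a packaged bundle might.
pub struct BundleFetch;

impl FetchCap for BundleFetch {
    fn fetch(&self, _url: &str) -> Vec<u8> {
        b"<h1>dashboard</h1>".to_vec()
    }
}
```

With `HostBindings::default()` every host call returns `MissingCap`, which is
the behavior the first proof is meant to demonstrate.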

## Automation And Agent Protocols

### CDP

Chrome DevTools Protocol can instrument, inspect, debug, profile, capture
screenshots, manipulate DOM/runtime/network state, and control browser targets.
It is excellent as a backend and dangerous as a user-facing authority surface.
The tip-of-tree protocol changes frequently and is not compatibility-stable.

capOS implication: a CDP endpoint is equivalent to a broad browser-admin cap.
Only a trusted browser service should hold it. Ordinary agents receive narrowed
typed operations.
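
As a rough illustration of narrowing, the trusted browser service could map
each granted facet to a short allowlist of CDP methods and refuse everything
else. The CDP method names below exist in the published protocol; the `Facet`
enum, the grouping, and the function names are assumptions made for this Rust
sketch.

```rust
/// Hypothetical operation classes a workload can be granted.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Facet {
    Navigate,
    ReadPage,
    Input,
}

/// CDP methods the service is willing to issue on behalf of each facet.
/// Anything not listed (for example `Runtime.evaluate`) is never forwarded.
pub fn allowed_methods(facet: Facet) -> &'static [&'static str] {
    match facet {
        Facet::Navigate => &["Page.navigate", "Page.reload"],
        Facet::ReadPage => &[
            "Page.captureScreenshot",
            "DOM.getDocument",
            "Accessibility.getFullAXTree",
        ],
        Facet::Input => &["Input.dispatchMouseEvent", "Input.dispatchKeyEvent"],
    }
}

/// True only if some granted facet explicitly lists the method.
pub fn permits(granted: &[Facet], method: &str) -> bool {
    granted
        .iter()
        .any(|f| allowed_methods(*f).iter().any(|m| *m == method))
}
```

The check is default-deny: an empty grant list permits nothing, and adding a
new CDP method requires a deliberate change to the table.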

### WebDriver BiDi

WebDriver BiDi is a W3C Working Draft for bidirectional remote control of user
agents. It introduces event streaming over WebSocket and includes modules for
sessions, the browser and its user contexts, browsing contexts, emulation,
network, script, storage, and input.

capOS implication: BiDi is a better standards-shaped backend contract than
raw CDP for cross-engine automation, but it still exposes more authority than
most capOS workloads should receive directly.
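
For a sense of what the backend contract looks like on the wire, a BiDi
command is a JSON object with `id`, `method`, and `params`. The Rust sketch
below frames a `browsingContext.navigate` command by hand; the helper names are
hypothetical, and a real service would use a JSON library rather than this
minimal escaper.

```rust
/// Minimal JSON string escaping: quotes, backslashes, and control characters.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds the frame a trusted browser service would send after a workload
/// invokes its typed navigate operation. `context` is the browsing context id
/// returned earlier by the backend.
pub fn bidi_navigate(id: u64, context: &str, url: &str) -> String {
    format!(
        "{{\"id\":{},\"method\":\"browsingContext.navigate\",\"params\":{{\"context\":\"{}\",\"url\":\"{}\",\"wait\":\"complete\"}}}}",
        id,
        json_escape(context),
        json_escape(url)
    )
}
```

Only the service builds such frames; the workload never sees the WebSocket, so
it cannot send other BiDi commands such as `script.evaluate`.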

### Playwright

Playwright operates across Chromium, WebKit, and Firefox and manages specific
browser versions for each Playwright release. It is practical as an early
host-side harness or browser-service backend while capOS lacks native browser
engine support.

capOS implication: use Playwright for development and host-side proof harnesses,
but keep it out of the capOS ABI. The capOS ABI should be the typed
`BrowserSession`/`BrowserProfile` capability surface.

### MCP Browser Tools

MCP standardizes how LLM applications connect to external tools, resources,
and prompts, with explicit consent and tool-safety guidance. Browser tools are
already becoming a common MCP shape: navigate, snapshot, click, type,
screenshot, download, and inspect network state.

capOS implication: the browser capability can export an MCP adapter for
external agents, but MCP is only an adapter. It must not smuggle raw browser,
network, file, or shell authority around the capOS broker.
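
A sketch of how such an adapter might stay inside the broker: it advertises
only the tools whose required cap the session holds, and re-checks the grant
on every call. The tool names, the `Cap` enum, and both functions are
hypothetical; they are not taken from the MCP specification or an existing
server.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Cap {
    Navigate,
    ReadPage,
    Input,
    Download,
}

pub struct ToolSpec {
    pub name: &'static str,
    pub requires: Cap,
}

/// Hypothetical tool table for the adapter.
pub const TOOLS: &[ToolSpec] = &[
    ToolSpec { name: "browser_navigate", requires: Cap::Navigate },
    ToolSpec { name: "browser_snapshot", requires: Cap::ReadPage },
    ToolSpec { name: "browser_click", requires: Cap::Input },
    ToolSpec { name: "browser_download", requires: Cap::Download },
];

/// Tool names to list for a client holding `grants`.
pub fn advertised(grants: &[Cap]) -> Vec<&'static str> {
    TOOLS
        .iter()
        .filter(|t| grants.contains(&t.requires))
        .map(|t| t.name)
        .collect()
}

/// Checked again at call time, so a stale or forged tool name gains nothing.
pub fn authorize(grants: &[Cap], tool: &str) -> Result<Cap, String> {
    let spec = TOOLS
        .iter()
        .find(|t| t.name == tool)
        .ok_or_else(|| format!("unknown tool: {}", tool))?;
    if grants.contains(&spec.requires) {
        Ok(spec.requires)
    } else {
        Err(format!("tool {} requires {:?}", tool, spec.requires))
    }
}
```

Because the table has no entry for shell, file, or raw network access, the
adapter has nothing to forward for those requests even if a client asks.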

## Donut Browser Lessons

Donut Browser is an open-source anti-detect browser application with a Tauri
Rust/TypeScript codebase, AGPL app licensing, per-profile isolation, local REST
API, MCP server, proxy/VPN controls, persistent sessions, sync, and engine
choice through Wayfern (Chromium-based) and Camoufox (Firefox-based). Its own
mission page states that the app is open source while the browser-engine
anti-detection components have a mixed proprietary/open-source model.

Useful to adapt:

- Profile manager as the primary product object.
- Per-profile cookies, storage, extensions, fingerprint settings, proxy/VPN,
  and persistent session state.
- Local API and MCP server as automation surfaces.
- Ability to launch a profile and attach Playwright/Puppeteer/Selenium through
  a backend automation endpoint.
- Default-browser routing where each link chooses a profile/context.

Not adopted:

- Anti-detection as a default product promise.
- Closed fingerprint-spoofing logic as a security dependency.
- Treating "looks like a real device" as a capOS correctness goal.
- Exposing a broad local browser-control API without capability-scoped grants.

capOS replacement framing:

- `BrowserPersona` is explicit policy: user agent, viewport, locale, timezone,
  geolocation, WebRTC exposure, proxy, and storage partition.
- `BrowserProfile` holds state and can be cloned, snapshotted, exported, or
  destroyed through typed caps.
- `BrowserAutomation` is split by operation class, not by one admin token.
- Audits record profile, persona, network route, downloads, uploads, and
  whether a human or agent initiated each action.
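
The persona and audit framing can be sketched as plain data in Rust. The field
set follows the `BrowserPersona` bullet above; the concrete types, the
baseline values, and `audit_line` are assumptions for illustration only.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct BrowserPersona {
    pub user_agent: String,
    pub viewport: (u32, u32),
    pub locale: String,
    pub timezone: String,
    pub geolocation: Option<(f64, f64)>,
    pub expose_webrtc_local_ips: bool,
    pub proxy: Option<String>,
    pub storage_partition: String,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Initiator {
    Human,
    Agent,
}

impl BrowserPersona {
    /// Hypothetical default persona; the values are placeholders.
    pub fn baseline() -> BrowserPersona {
        BrowserPersona {
            user_agent: "capOS-browser/0 (placeholder)".to_string(),
            viewport: (1280, 720),
            locale: "en-US".to_string(),
            timezone: "UTC".to_string(),
            geolocation: None,
            expose_webrtc_local_ips: false,
            proxy: None,
            storage_partition: "default".to_string(),
        }
    }

    /// Names of fields that differ from `base`, in declaration order, so every
    /// override is visible in audit output instead of being a stealth default.
    pub fn overrides(&self, base: &BrowserPersona) -> Vec<&'static str> {
        let mut out = Vec::new();
        if self.user_agent != base.user_agent { out.push("user_agent"); }
        if self.viewport != base.viewport { out.push("viewport"); }
        if self.locale != base.locale { out.push("locale"); }
        if self.timezone != base.timezone { out.push("timezone"); }
        if self.geolocation != base.geolocation { out.push("geolocation"); }
        if self.expose_webrtc_local_ips != base.expose_webrtc_local_ips {
            out.push("webrtc");
        }
        if self.proxy != base.proxy { out.push("proxy"); }
        if self.storage_partition != base.storage_partition {
            out.push("storage_partition");
        }
        out
    }
}

/// One audit entry: which profile, who initiated it, and which persona
/// fields were overridden relative to the baseline.
pub fn audit_line(profile: &str, persona: &BrowserPersona, who: Initiator, action: &str) -> String {
    format!(
        "profile={} initiator={:?} action={} overrides={:?}",
        profile,
        who,
        action,
        persona.overrides(&BrowserPersona::baseline())
    )
}
```

Keeping the persona as an ordinary value means it can be diffed, logged, and
reviewed like any other policy object.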

## Open Research Gaps

- Which backend should be the first in-capOS visual engine candidate: WPE or
  Servo?
- Which substrate should be tried first for a cap-native document engine:
  Servo WebView components, Ladybird/LibWeb experimentation, SpiderMonkey with
  a custom DOM, Boa for trusted local UI scripting, or QuickJS for tiny proofs?
- How much of a browser profile should be persistent Store state versus
  revocable in-memory session state?
- What is the smallest useful DOM/screenshot/accessibility snapshot for an
  LLM tool that avoids dumping excessive page data into model context?
- How should downloads and uploads preserve provenance and consent across
  browser, shell, and storage caps?
- Can WebDriver BiDi become the only external automation backend, or is CDP
  unavoidable for practical Chromium compatibility?