Research: Browser Engines, Document Engines, and Agent Browsers

Survey of mainstream browser engines, embedding paths, automation protocols, and Donut Browser-style profile orchestration for Browser Capability and Agent Web Sessions.

Source Snapshot

Checked on 2026-04-30:

Design Consequences For capOS

  • Do not make a browser engine a near-term kernel or GUI prerequisite. Modern browser engines assume a large userspace substrate: processes, threads, shared memory, timers, files, DNS, sockets/TLS, fonts, image codecs, GPU or software compositing, profile storage, crash handling, and a sandbox.
  • Split browser work into three tracks: agent/shell browser sessions first, a cap-native document engine as the middle target, then a visual browser once the GUI exists. The first track can start as a capability wrapper around an external or hosted engine. The middle track validates cap-backed web host APIs over caller-provided document data. The visual-browser track needs compositor, input, fonts, storage, networking, and userspace-driver safety.
  • Treat browser profiles as capability objects. Cookies, local storage, cache, permissions, proxy selection, downloads, and automation endpoints should be held by BrowserProfile/BrowserContext caps, not ambient files under a hidden profile directory.
  • Standardize the agent-facing surface above CDP/WebDriver BiDi, not below it. CDP is powerful and Chromium-specific; WebDriver BiDi is standardizing bidirectional browser automation. capOS should expose a typed, narrowed BrowserSession capability and use CDP/BiDi/Playwright only as backends.
  • Borrow Donut Browser’s useful product ideas – profile isolation, local API, persistent sessions, per-profile proxy/VPN selection, MCP integration, and AI-control hooks – without adopting anti-detection as a capOS goal. Fingerprint, geolocation, locale, proxy, and user-agent choices must be explicit, auditable policy, not stealth defaults.
  • Reuse the project rule “the interface is the permission.” A process with BrowserNavigate can navigate; a process with BrowserReadPage can inspect page state; a process with BrowserInput can click/type; a process with BrowserDownload and a granted DownloadSink can receive downloaded bytes. Bundling all of those into one raw DevTools port would recreate ambient authority.
  • Treat a browser as a shell capability, not as the shell. The native shell or agent runner may hold a browser session and use it as a tool, but browser JavaScript must not directly hold the shell’s file, launch, network, or approval capabilities.
  • Add a middle track for a cap-native document engine: JS, DOM/CSS, layout, rendering, and perhaps WebAssembly over caller-provided document/resource data, with web host APIs backed by explicit capOS capabilities. This is not full internet browsing, but it could power local HTML/CSS/JS apps and test the browser authority model earlier.
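The "interface is the permission" rule can be made concrete with a small in-process Rust model. This is a sketch under stated assumptions: BrowserNavigate, BrowserReadPage, and BrowserInput are the facet names used above, but SessionState, open_session, and every method name are hypothetical, and Rc<RefCell<..>> merely stands in for capability references to a trusted browser service.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical session state owned by the trusted browser service.
#[derive(Default)]
struct SessionState {
    url: String,
    typed: String,
}

// Each facet is a distinct type: holding the value is the permission.
// Rc<RefCell<..>> stands in for a capability reference to the service.
struct BrowserNavigate(Rc<RefCell<SessionState>>);
struct BrowserReadPage(Rc<RefCell<SessionState>>);
struct BrowserInput(Rc<RefCell<SessionState>>);

impl BrowserNavigate {
    fn navigate(&self, url: &str) {
        self.0.borrow_mut().url = url.to_string();
    }
}

impl BrowserReadPage {
    fn current_url(&self) -> String {
        self.0.borrow().url.clone()
    }

    fn typed_text(&self) -> String {
        self.0.borrow().typed.clone()
    }
}

impl BrowserInput {
    fn type_text(&self, text: &str) {
        self.0.borrow_mut().typed.push_str(text);
    }
}

// Hypothetical constructor: the service mints one facet per operation class,
// and a broker hands each workload only the facets it was granted.
fn open_session() -> (BrowserNavigate, BrowserReadPage, BrowserInput) {
    let state = Rc::new(RefCell::new(SessionState::default()));
    (
        BrowserNavigate(Rc::clone(&state)),
        BrowserReadPage(Rc::clone(&state)),
        BrowserInput(state),
    )
}
```

A workload handed only the BrowserReadPage value can observe the page but has no method through which to navigate or type. A BrowserDownload facet paired with a DownloadSink would follow the same pattern.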

Engine Portability Surface

Chromium has the broadest web compatibility and the strongest automation ecosystem. Ozone is the relevant porting layer: it centralizes low-level input and graphics behind platform interfaces, supports runtime platform binding, and expects new platforms to implement an Ozone backend. CEF is the production embedding path for many native applications: it wraps Chromium/Blink behind stable APIs, binary distributions, and release branches tracking Chromium. WebView2 is Microsoft’s Windows embedding product around Edge/Chromium, with evergreen and fixed-version runtime choices.

capOS implications:

  • Best near-term backend for agent/shell usage is an external Chromium-family process controlled through CDP, WebDriver BiDi, or Playwright, with capOS wrapping the endpoint as typed caps.
  • A native capOS Chromium port is a very large post-GUI project. The likely port boundary is Ozone plus a capOS sandbox/profile/network/storage backend, not direct Blink surgery.
  • CDP must not be directly handed to ordinary capOS workloads. It exposes navigation, DOM, network, runtime, storage, input, tracing, and debugging authority in one endpoint and has no stable backward-compatibility guarantee for tip-of-tree protocol use.

WebKit / WPE

WebKit’s upstream port model makes ports first-class maintainable units. WebKitGTK and WPE are maintained by Igalia; WPE is specifically designed as a small-footprint embedded WebKit port with a backend architecture, hardware acceleration, GStreamer media, and periodic releases.

capOS implications:

  • WPE is the most plausible visual-browser candidate once capOS has a GUI substrate because it is meant for embedded systems without a full desktop toolkit.
  • WPE still needs a platform backend, graphics/EGL or software fallback, input, fonts, networking/TLS, storage, media dependencies, and an update story. It is not an early shell feature.
  • WebKit’s port/release discipline is useful precedent for a capOS browser backend: keep platform-specific code narrow and upstreamable where possible.

Gecko / GeckoView

Gecko is Firefox’s full web platform: JavaScript, layout, graphics, media, networking, profiles, preferences, principals, and more. GeckoView is Mozilla’s Android embedding library and powers active Mozilla Android browsers. Its API separates GeckoRuntime, GeckoSession, and GeckoView, delegates storage and UI behavior to embedders, and hides internal principals from the public API.

capOS implications:

  • Gecko is credible as an external backend, especially for browser diversity and WebDriver BiDi, but GeckoView itself is Android-specific and not a desktop/no-OS embedding path for capOS.
  • Gecko’s principal model is important precedent: origin/security context is a first-class internal object. capOS should make origin/session policy explicit in its browser capability layer rather than flattening it to URLs.
  • The runtime/session/view split maps cleanly to capOS capabilities: engine/service supervision, per-profile context, and visual surface should be separate authorities.
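Making origin a first-class object rather than a raw URL string can be sketched with the tuple origin (scheme, host, port) that web standards use for same-origin checks. This is a simplified, hypothetical helper: it handles only ASCII hosts and ignores IDNA, userinfo, IPv6 literals, and opaque origins such as data: URLs, all of which a real browser capability layer would need.

```rust
// Hypothetical tuple origin: (scheme, host, port).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Origin {
    scheme: String,
    host: String,
    port: u16,
}

fn default_port(scheme: &str) -> Option<u16> {
    match scheme {
        "http" | "ws" => Some(80),
        "https" | "wss" => Some(443),
        _ => None,
    }
}

// Parses "scheme://host[:port][/...]"; returns None for anything else,
// so unknown schemes fail closed instead of matching loosely.
fn parse_origin(url: &str) -> Option<Origin> {
    let (scheme, rest) = url.split_once("://")?;
    let scheme = scheme.to_ascii_lowercase();
    // The authority ends at the first '/', '?' or '#'.
    let end = rest
        .find(|c: char| c == '/' || c == '?' || c == '#')
        .unwrap_or(rest.len());
    let authority = &rest[..end];
    let (host, port) = match authority.rsplit_once(':') {
        Some((h, p)) => (h, p.parse::<u16>().ok()?),
        None => (authority, default_port(&scheme)?),
    };
    if host.is_empty() {
        return None;
    }
    Some(Origin {
        scheme,
        host: host.to_ascii_lowercase(),
        port,
    })
}

// Two URLs are same-origin only if both parse and all three fields match.
fn same_origin(a: &str, b: &str) -> bool {
    match (parse_origin(a), parse_origin(b)) {
        (Some(x), Some(y)) => x == y,
        _ => false,
    }
}
```

Session policy (cookie partitions, permission grants, download approvals) can then be keyed by Origin values instead of string-matching URLs.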

Servo

Servo is a Rust browser engine with WebView embedding ambitions, WebGL/WebGPU support, modular architecture, parallel layout, and active cross-platform work. It is not yet a mainstream compatibility replacement for Chromium/WebKit/Gecko, but it is closer to capOS’s implementation culture than the large C++ engines.

capOS implications:

  • Servo is the best research-aligned engine to track for a future native capOS engine experiment because Rust and modular embedding fit capOS better than direct Chromium/Gecko ports.
  • It is not the first user-facing browser choice if the goal is broad web compatibility for operators or agents.
  • Servo’s WebView API and crate decomposition are worth watching for a possible BrowserView/BrowserSession backend once capOS has GUI and ordinary userspace dependencies.

Ladybird / LibWeb

Ladybird is building an independent browser engine from scratch, with an alpha target for Linux and macOS in 2026. It uses a multi-process architecture and is focused on standards rather than embedding today. It is valuable prior art for independent engine architecture and process separation, not a near-term capOS dependency.

capOS implications:

  • Track Ladybird for architecture ideas: isolated renderer processes, separate network and image-decoder processes, and specification-driven development.
  • Do not depend on Ladybird for capOS’s browser plan until its API, platform support, and compatibility stabilize.
  • Its “no inherited engine” posture is inspiring but not pragmatic for capOS in the near term. capOS should expose capability-native browser APIs while reusing maintained engines underneath.

Cap-Native Document Engine Substrate

A cap-native document engine is a smaller target than a full browser. It executes a document graph supplied by capOS – for example a boot package, Store object, generated UI bundle, or test fixture – and returns a rendered surface, screenshot, event stream, and bounded DOM/accessibility snapshot. Networking, storage, permissions, clipboard, downloads, and device access are not internal browser privileges; they are host bindings backed by separate capabilities.
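Resource loading over a caller-supplied document graph can be sketched as a loader that resolves only from an in-memory bundle. DocumentBundle, LoadError, and the path convention are hypothetical; the point is that a reference to a network URL is refused because no network capability was supplied, rather than silently fetched.

```rust
use std::collections::HashMap;

// Hypothetical in-memory document graph handed to the engine by capOS,
// e.g. unpacked from a boot package or Store object.
struct DocumentBundle {
    entry: String,
    resources: HashMap<String, Vec<u8>>,
}

#[derive(Debug, PartialEq)]
enum LoadError {
    NotInBundle(String),
    NetworkNotGranted(String),
}

impl DocumentBundle {
    // Relative bundle paths resolve locally; absolute network URLs are
    // refused because this loader holds no network capability.
    fn load(&self, reference: &str) -> Result<&[u8], LoadError> {
        if reference.contains("://") {
            return Err(LoadError::NetworkNotGranted(reference.to_string()));
        }
        let key = reference.trim_start_matches("./");
        self.resources
            .get(key)
            .map(|bytes| bytes.as_slice())
            .ok_or_else(|| LoadError::NotInBundle(key.to_string()))
    }

    fn entry_document(&self) -> Result<&[u8], LoadError> {
        self.load(&self.entry)
    }
}
```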

This track changes the portability question. Instead of asking “which browser can capOS port?”, it asks “which engine pieces can run with capOS as the host environment?”

Servo As A Document Engine

Servo is the closest architectural fit for this middle track. It is written in Rust, embeddable, modular, and parallel, and it already presents itself as a WebView-capable engine. The value for capOS is not only memory safety. It is the possibility of treating the embedding API as the boundary where fetch, storage, permission prompts, surfaces, and resource loading are backed by typed caps.

Risks:

  • Servo still brings a large standards surface.
  • API stability and completeness must be checked at implementation time.
  • A WebView embedding API is not the same as a small deterministic document-rendering library; capOS may still need substantial host glue.

Ladybird / LibWeb As A Document Engine

Ladybird’s LibWeb/LibJS stack is attractive as readable independent-engine prior art. Its multi-process browser architecture also maps well to capOS service decomposition. However, Ladybird is focused on building a full browser, not on providing a stable embeddable document engine for external hosts.

capOS should track it for design ideas and perhaps future experiments, but should not treat it as the near-term substrate for local HTML/CSS/JS apps.

SpiderMonkey

SpiderMonkey is Mozilla’s JavaScript and WebAssembly engine, used by Firefox and Servo, and can be embedded in C++ and Rust projects. It is useful if capOS wants a serious JS/Wasm runtime while building DOM/layout/rendering and host bindings separately or while experimenting with Servo components.

The tradeoff is that SpiderMonkey is only the JS/Wasm engine. DOM, CSS, layout, rendering, networking, storage, event loops, Web APIs, and browser security objects remain host responsibilities unless capOS embeds a larger engine.

JavaScriptCore

JavaScriptCore is WebKit’s ECMAScript engine and an optimizing VM with interpreter and JIT tiers. It is a mature engine, but its natural home is inside WebKit. For capOS, JavaScriptCore is most relevant if the visual-browser track chooses WPE/WebKit; it is less obviously attractive as a standalone cap-native document-engine substrate than Servo or a Rust-native JS engine.

Boa

Boa is an embeddable JavaScript engine written in Rust, with actively maintained crates and a focus on ECMAScript conformance. It is attractive for capOS experiments because it is Rust-native, smaller than the mainstream browser JS engines, and easier to embed in native services.

The tradeoff is compatibility and performance. Boa is a plausible substrate for trusted/local UI scripting or early host-binding proofs, not a replacement for the JS engine in a general web browser.

QuickJS

QuickJS is a small embeddable JavaScript engine. It is useful as a reference for tiny host-controlled JS runtimes and deterministic local scripting. It is not a DOM/layout/rendering engine and should not be mistaken for browser compatibility.

Consequences

  • A cap-native document engine should start with local/trusted bundles, not arbitrary internet pages.
  • The host API contract matters more than the JS engine choice. fetch, storage, clipboard, downloads, timers, workers, and Wasm imports must all be explicit cap-backed facets.
  • The first proof can be intentionally small: render a packaged HTML/CSS/JS dashboard or demo UI, capture a screenshot and accessibility/DOM snapshot, and prove that missing network/storage/download caps fail closed.
  • Full browser compatibility remains a later engine-port problem. This track buys capOS-native web UI and authority-model validation, not Chrome parity.
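The fail-closed proof can be sketched as a host-binding table in which every web host API is an optional, separately granted facet. FetchCap, StorageCap, HostBindings, and the "bundle:" URL scheme are hypothetical names for illustration; a missing facet yields an explicit CapabilityMissing error instead of falling back to ambient authority.

```rust
// Hypothetical host-binding facets; in capOS each would be a separate capability.
trait FetchCap {
    fn fetch(&self, url: &str) -> Result<Vec<u8>, String>;
}

trait StorageCap {
    fn get(&self, key: &str) -> Option<String>;
}

#[derive(Debug, PartialEq)]
enum HostError {
    CapabilityMissing(&'static str),
    Backend(String),
}

// Web host APIs the document engine may call; every facet defaults to absent.
#[derive(Default)]
struct HostBindings {
    fetch: Option<Box<dyn FetchCap>>,
    storage: Option<Box<dyn StorageCap>>,
}

impl HostBindings {
    fn fetch(&self, url: &str) -> Result<Vec<u8>, HostError> {
        let cap = self
            .fetch
            .as_ref()
            .ok_or(HostError::CapabilityMissing("fetch"))?;
        cap.fetch(url).map_err(HostError::Backend)
    }

    fn local_storage_get(&self, key: &str) -> Result<Option<String>, HostError> {
        let cap = self
            .storage
            .as_ref()
            .ok_or(HostError::CapabilityMissing("storage"))?;
        Ok(cap.get(key))
    }
}

// A test double that serves exactly one fixture URL.
struct FixtureFetch;

impl FetchCap for FixtureFetch {
    fn fetch(&self, url: &str) -> Result<Vec<u8>, String> {
        if url == "bundle:/data.json" {
            Ok(b"{}".to_vec())
        } else {
            Err(format!("not found: {url}"))
        }
    }
}
```

Because HostBindings::default() grants nothing, the smallest proof is simply to run the packaged demo with defaults and check that every network or storage call surfaces CapabilityMissing.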

Automation And Agent Protocols

CDP

Chrome DevTools Protocol can instrument, inspect, debug, profile, capture screenshots, manipulate DOM/runtime/network state, and control browser targets. It is excellent as a backend and dangerous as a user-facing authority surface. The tip-of-tree protocol changes frequently and is not compatibility-stable.

capOS implication: a CDP endpoint is equivalent to a broad browser-admin cap. Only a trusted browser service should hold it. Ordinary agents receive narrowed typed operations.
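One way the trusted browser service could narrow CDP is an explicit allowlist from facet to CDP method names, checked before any command reaches the DevTools socket. The CDP method names below (Page.navigate, Page.captureScreenshot, DOMSnapshot.captureSnapshot, Accessibility.getFullAXTree, Input.dispatchMouseEvent, and so on) are real protocol methods, but the Facet enum and the mapping itself are a hypothetical policy, not a recommended final set.

```rust
// Hypothetical facet names mirroring the narrowed operations in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Facet {
    Navigate,
    ReadPage,
    Input,
}

// CDP methods each facet may reach. Anything absent (Runtime.evaluate,
// Network.*, Storage.*, Browser.*, Tracing.*) is denied by construction.
fn allowed_methods(facet: Facet) -> &'static [&'static str] {
    match facet {
        Facet::Navigate => &["Page.navigate", "Page.reload"],
        Facet::ReadPage => &[
            "Page.captureScreenshot",
            "DOMSnapshot.captureSnapshot",
            "Accessibility.getFullAXTree",
        ],
        Facet::Input => &[
            "Input.dispatchMouseEvent",
            "Input.dispatchKeyEvent",
            "Input.insertText",
        ],
    }
}

// Checks an outgoing CDP call against the facets the caller holds.
fn authorize(held: &[Facet], method: &str) -> Result<(), String> {
    let allowed = held
        .iter()
        .any(|facet| allowed_methods(*facet).iter().any(|m| *m == method));
    if allowed {
        Ok(())
    } else {
        Err(format!("denied: {method}"))
    }
}
```

An allowlist is preferred over a denylist here because CDP adds and changes methods frequently; new methods stay unreachable until the service deliberately maps them to a facet.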

WebDriver BiDi

WebDriver BiDi is a W3C Working Draft for bidirectional remote control of user agents. It introduces event streaming over WebSocket and includes modules for browser contexts, browsing contexts, emulation, network, script, and input.

capOS implication: BiDi is a better standards-shaped backend contract than raw CDP for cross-engine automation, but it still exposes more authority than most capOS workloads should receive directly.
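Keeping the typed capOS operation above both protocols can be sketched as one backend trait with CDP and BiDi encoders. Both protocols frame commands as JSON objects with id, method, and params; CDP's Page.navigate takes a url, while BiDi's browsingContext.navigate takes a context, a url, and an optional wait. The Op enum, the AutomationBackend trait, and the hand-rolled JSON string escaper are hypothetical simplifications (CDP sessionId routing is omitted).

```rust
// Minimal JSON string encoder: quotes, backslashes, and control characters.
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

// Hypothetical typed operations as a capOS BrowserSession might express them.
enum Op {
    Navigate { url: String },
    Screenshot,
}

// Hypothetical backend contract: turn a typed op into a wire command.
trait AutomationBackend {
    fn encode(&self, id: u64, op: &Op) -> String;
}

struct Cdp;

struct Bidi {
    context: String,
}

impl AutomationBackend for Cdp {
    fn encode(&self, id: u64, op: &Op) -> String {
        match op {
            Op::Navigate { url } => format!(
                r#"{{"id":{},"method":"Page.navigate","params":{{"url":{}}}}}"#,
                id,
                json_str(url)
            ),
            Op::Screenshot => format!(
                r#"{{"id":{},"method":"Page.captureScreenshot","params":{{}}}}"#,
                id
            ),
        }
    }
}

impl AutomationBackend for Bidi {
    fn encode(&self, id: u64, op: &Op) -> String {
        match op {
            Op::Navigate { url } => format!(
                r#"{{"id":{},"method":"browsingContext.navigate","params":{{"context":{},"url":{},"wait":"complete"}}}}"#,
                id,
                json_str(&self.context),
                json_str(url)
            ),
            Op::Screenshot => format!(
                r#"{{"id":{},"method":"browsingContext.captureScreenshot","params":{{"context":{}}}}}"#,
                id,
                json_str(&self.context)
            ),
        }
    }
}
```

Workloads only ever construct Op values; which wire protocol carries them is a private choice of the browser service.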

Playwright

Playwright operates across Chromium, WebKit, and Firefox and manages specific browser versions for each Playwright release. It is practical as an early host-side harness or browser-service backend while capOS lacks native browser engine support.

capOS implication: use Playwright for development and host-side proof harnesses, but keep it out of the capOS ABI. The capOS ABI should be the typed BrowserSession/BrowserProfile capability surface.

MCP Browser Tools

MCP standardizes how LLM applications connect to external tools, resources, and prompts, with explicit consent and tool-safety guidance. Browser tools are already becoming a common MCP shape: navigate, snapshot, click, type, screenshot, download, and inspect network state.

capOS implication: the browser capability can export an MCP adapter for external agents, but MCP is only an adapter. It must not smuggle raw browser, network, file, or shell authority around the capOS broker.
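An MCP adapter that cannot smuggle authority can be sketched as one that derives its answer to MCP's tools/list from the facets it actually holds, and rejects any tools/call for a tool outside that set before touching the backend. The tool names and the facet mapping are hypothetical.

```rust
// Hypothetical facets held by the adapter.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Facet {
    Navigate,
    ReadPage,
    Input,
    Download,
}

// Hypothetical MCP tool names and the facet each one requires.
const TOOLS: &[(&str, Facet)] = &[
    ("browser_navigate", Facet::Navigate),
    ("browser_snapshot", Facet::ReadPage),
    ("browser_screenshot", Facet::ReadPage),
    ("browser_click", Facet::Input),
    ("browser_type", Facet::Input),
    ("browser_download", Facet::Download),
];

struct McpAdapter {
    held: Vec<Facet>,
}

impl McpAdapter {
    // Answer to tools/list: only tools whose facet the adapter holds.
    fn list_tools(&self) -> Vec<&'static str> {
        TOOLS
            .iter()
            .filter(|(_, f)| self.held.contains(f))
            .map(|(name, _)| *name)
            .collect()
    }

    // Gate for tools/call: unknown or ungranted tools are rejected.
    fn check_call(&self, tool: &str) -> Result<Facet, String> {
        match TOOLS.iter().find(|(name, _)| *name == tool) {
            Some((_, f)) if self.held.contains(f) => Ok(*f),
            Some(_) => Err(format!("tool {tool} not granted")),
            None => Err(format!("unknown tool {tool}")),
        }
    }
}
```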

Donut Browser Lessons

Donut Browser is an open-source anti-detect browser application with a Tauri Rust/TypeScript codebase, AGPL app licensing, per-profile isolation, local REST API, MCP server, proxy/VPN controls, persistent sessions, sync, and engine choice through Wayfern (Chromium-based) and Camoufox (Firefox-based). Its own mission page states that the app is open source while the browser-engine anti-detection components have a mixed proprietary/open-source model.

Useful to adapt:

  • Profile manager as the primary product object.
  • Per-profile cookies, storage, extensions, fingerprint settings, proxy/VPN, and persistent session state.
  • Local API and MCP server as automation surfaces.
  • Ability to launch a profile and attach Playwright/Puppeteer/Selenium through a backend automation endpoint.
  • Default-browser routing where each link chooses a profile/context.
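Default-browser routing can be sketched as a first-match rule table from host suffix to profile, with a fallback profile. Rule, host_of, and route are hypothetical; the host parser is deliberately simplified (no userinfo or IPv6), and suffixes are matched on label boundaries so that a look-alike host does not inherit another profile's cookies.

```rust
// Hypothetical rule: hosts equal to, or ending in ".suffix", open in `profile`.
struct Rule {
    host_suffix: &'static str,
    profile: &'static str,
}

// Extracts the host from "scheme://host[:port]/..." (simplified).
fn host_of(url: &str) -> Option<&str> {
    let rest = url.split_once("://")?.1;
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    let host = authority.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

// First matching rule wins; unparsable links go to the default profile.
fn route(url: &str, rules: &[Rule], default: &'static str) -> &'static str {
    let Some(host) = host_of(url) else {
        return default;
    };
    let host = host.to_ascii_lowercase();
    rules
        .iter()
        .find(|r| host == r.host_suffix || host.ends_with(&format!(".{}", r.host_suffix)))
        .map(|r| r.profile)
        .unwrap_or(default)
}
```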

Not adopted:

  • Anti-detection as a default product promise.
  • Closed fingerprint-spoofing logic as a security dependency.
  • Treating “looks like a real device” as a capOS correctness goal.
  • Exposing a broad local browser-control API without capability-scoped grants.

capOS replacement framing:

  • BrowserPersona is explicit policy: user agent, viewport, locale, timezone, geolocation, WebRTC exposure, proxy, and storage partition.
  • BrowserProfile holds state and can be cloned, snapshotted, exported, or destroyed through typed caps.
  • BrowserAutomation is split by operation class, not by one admin token.
  • Audits record profile, persona, network route, downloads, uploads, and whether a human or agent initiated each action.
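The persona and audit framing can be sketched as plain data types in which every fingerprint-relevant choice is a visible field. All type names, fields, and the "direct" route label are hypothetical; the intent is that a reviewer can read a BrowserPersona and an audit record and see exactly what the browser disclosed and who asked for it.

```rust
// Hypothetical explicit persona: each field is a visible policy choice.
#[derive(Debug, Clone, PartialEq)]
struct BrowserPersona {
    user_agent: Option<String>,      // None = engine default, reported honestly
    viewport: (u32, u32),
    locale: String,
    timezone: String,
    geolocation: Option<(f64, f64)>, // None = geolocation denied
    webrtc_exposure: WebRtcExposure,
    proxy: Option<String>,           // None = direct route
    storage_partition: String,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum WebRtcExposure {
    Disabled,
    ProxiedOnly,
    Default,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Initiator {
    Human,
    Agent,
}

// Hypothetical audit entry: profile, persona, network route, action, initiator.
#[derive(Debug)]
struct AuditRecord {
    profile: String,
    persona: BrowserPersona,
    route: String,
    action: String,
    initiator: Initiator,
}

#[derive(Default)]
struct AuditLog {
    records: Vec<AuditRecord>,
}

impl AuditLog {
    fn record(&mut self, profile: &str, persona: &BrowserPersona, action: &str, initiator: Initiator) {
        let route = persona.proxy.clone().unwrap_or_else(|| "direct".to_string());
        self.records.push(AuditRecord {
            profile: profile.to_string(),
            persona: persona.clone(),
            route,
            action: action.to_string(),
            initiator,
        });
    }

    // Lets a reviewer separate agent-initiated actions from human ones.
    fn agent_actions(&self) -> impl Iterator<Item = &AuditRecord> {
        self.records.iter().filter(|r| r.initiator == Initiator::Agent)
    }
}
```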

Open Research Gaps

  • Which backend should be the first in-capOS visual engine candidate: WPE or Servo?
  • Which substrate should be tried first for a cap-native document engine: Servo WebView components, Ladybird/LibWeb experimentation, SpiderMonkey with a custom DOM, Boa for trusted local UI scripting, or QuickJS for tiny proofs?
  • How much of a browser profile should be persistent Store state versus revocable in-memory session state?
  • What is the smallest useful DOM/screenshot/accessibility snapshot for an LLM tool that avoids dumping excessive page data into model context?
  • How should downloads and uploads preserve provenance and consent across browser, shell, and storage caps?
  • Can WebDriver BiDi become the only external automation backend, or is CDP unavoidable for practical Chromium compatibility?
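For the snapshot-size question, one possible starting experiment is a budgeted serializer: depth-first over an accessibility tree, capped at a node budget and a per-name character budget, with an explicit marker when content was dropped so the model knows the view is partial. AxNode and snapshot are hypothetical, and the budgets are tuning parameters, not recommendations.

```rust
// Hypothetical accessibility node as a document engine might expose it.
struct AxNode {
    role: &'static str,
    name: String,
    children: Vec<AxNode>,
}

fn count(node: &AxNode) -> usize {
    1 + node.children.iter().map(count).sum::<usize>()
}

// Depth-first, budgeted snapshot: at most `max_nodes` lines, names clipped
// to `max_name` characters, plus a marker when nodes were omitted.
fn snapshot(root: &AxNode, max_nodes: usize, max_name: usize) -> String {
    let mut out = String::new();
    let mut emitted = 0;
    let mut stack = vec![(root, 0usize)];
    while let Some((node, depth)) = stack.pop() {
        if emitted == max_nodes {
            break;
        }
        let name: String = node.name.chars().take(max_name).collect();
        let clipped = if node.name.chars().count() > max_name { "..." } else { "" };
        out.push_str(&format!("{}{} \"{}{}\"\n", "  ".repeat(depth), node.role, name, clipped));
        emitted += 1;
        // Push children in reverse so they are visited in document order.
        for child in node.children.iter().rev() {
            stack.push((child, depth + 1));
        }
    }
    let total = count(root);
    if emitted < total {
        out.push_str(&format!("[omitted {} of {} nodes]\n", total - emitted, total));
    }
    out
}
```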