Current Status
This page describes current repository behavior, not the full long-term design.
Current Snapshot
capOS boots on x86_64 QEMU, starts a standalone init process from the default
manifest, and runs the native shell plus resident demo services through typed
capabilities. The current operator path starts as an anonymous shell session;
login prompts for username> and hidden password>, validates the selected
bootstrap account through SessionManager and CredentialStore, and upgrades
to a broker-issued operator bundle. Default password-authenticated local
operator sessions do not expire by wall-clock timestamp; they remain intended
to end through logout, terminal/connection/process-tree close, or
administrator revocation. Manifests can still set a non-default operator
lifetime for focused expiry proofs. setup can create a volatile local
operator credential and then follows the same login upgrade path.
The implemented baseline includes isolated processes, user-mode ELF loading, shared-memory capability rings, endpoint IPC, copy/move capability transfer, thread and park primitives, init-owned service spawning, local shell login, focused Telnet shell demo, resident chat/adventure services, and the Paperclips terminal demo. The current selected milestone is GCE Self-Hosted Web UI: the next visible path is serving the remote-session Web UI through the Phase C userspace network stack and proving private GCE reachability before any public endpoint. Installable System is the completed previous selected milestone for the bounded local/QEMU contract: data-region mount, config-overlay merge, generation/rollback machinery, integrated bootable disk, install, first-boot provision, update/rollback, and structural proposal/body wording reconcile have landed. Device Driver Foundation is also complete; its GCP-first provider rollup has live operator-access, selected NIC raw-frame, selected storage I/O, and gVNIC portability evidence. Durable multi-account credential storage, broader account policy, production SSH/WebShell ingress, public GCE ingress/TLS, AWS/Azure providers, broader storage variants, high-throughput NIC work, direct-remapping DMA, and persistence beyond the landed installable data-region and generation paths remain future work.
Recent Status Notes
Update 2026-06-11 19:21 UTC: the GCE Self-Hosted Web UI local readiness wave
is reconciled here. Since the local Web UI L4 proof, the following landed as
local QEMU/cloudboot or no-spend harness evidence only: legacy GCE virtio-net
Web UI serving (make run-cloud-gce-legacy-virtio-webui-serving proves a host
HTTP peer fetching the byte-verified UI bundle over the kernel-brokered legacy
virtio 0.9 runtime that backs the typed Nic cap), a browser-facing hardening
set proved on the L4 gate (single public-origin policy, IAP-aware SameSite
cookie policy, JSON content-type guard, security response headers with a strict
CSP, GFE-range-pinned X-Forwarded-Proto trust, the public /healthz
health-check contract, and in-guest login peer-gate/failure-backoff hardening),
and a no-spend provider-harness fixture set (the private-proof
--preflight-only mode, private and public proof-evidence validators, the
public ingress resource plan gate, the journal-driven teardown engine, and the
provider-command allowlist gate), each fixture gate driving recording stub
provider CLIs only, with no real provider invocation or mutation on any
path. None of this proves private GCE
reachability, public exposure, TLS custody, or production readiness; the live
private and public proofs stay on hold.
The selected GCE Self-Hosted Web UI evidence ladder is:
- Landed local serve-from-userspace proof:
cloud-prod-userspace-network-stack-smoltcp-local-proofproves a non-qemucloudboot application client using a userspace-servedTcpListenAuthorityfor one local hostfwd TCP request/response. This is local QEMU/cloudboot evidence, not provider reachability. - Landed local IPv4 configuration proof:
cloud-prod-network-stack-dhcp-ipv4-config-local-proofproves DHCP/IPv4 lease, default route, and ARP/neighbor state for the userspace smoltcp network-stack process under local QEMU/cloudboot. - Landed cloudboot Web UI authority inventory:
remote-session-webui-cloudboot-authority-inventoryrecords theremote-session-web-uirequired and forbidden grants, trusted listener/source metadata, browser-visible forbidden markers, and expected local L4 proof markers for the non-qemucloudboot path. - Landed local Web UI L4 proof and hardening set:
cloud-prod-remote-session-web-ui-l4-local-proofservesremote-session-web-uithrough the non-qemucloudboot L4 path with the full fixed-name bundle, browser login, backend-heldSystemInfo, logout/stale failure, manual viewer, and browser-boundary checks. The samemake run-cloud-prod-remote-session-web-ui-l4gate now also proves server-side session hardening, per-connection deadlines, in-guest login peer-gate and failure-backoff hardening, the single public-origin policy, the IAP-aware SameSite cookie policy, the JSON content-type guard, the security response headers and strict CSP, the GFE-range-pinned forwarded-scheme trust, and the public/healthzhealth-check contract. All of this is local QEMU/cloudboot evidence, not private provider reachability or public exposure. - Landed legacy GCE virtio-net serving proof:
cloud-gce-legacy-virtio-webui-serving-local-proofcloses the legacy-virtio serving gap locally:make run-cloud-gce-legacy-virtio-webui-servingproves a host HTTP peer fetching the byte-verified Web UI bundle over the kernel-brokered legacy virtio 0.9 runtime underdisable-modern=on. This closes only the local serving story for the GCE NIC shape; it is not live GCE reachability. - Landed no-spend provider-harness gates: the private-proof harness
--preflight-onlymode, the private and public proof-evidence validators, the public ingress resource plan gate, the journal-driven teardown engine, and the provider-command allowlist gate validate the future private/public proofs’ evidence, resource graph, teardown, and provider-command boundaries against recording stub provider CLIs; no real provider is invoked or mutated on any fixture path. These are local fixture gates only; they authorize no spend, exposure, or provider mutation. A matching public-harness no-spend preflight task is dispatchable future work, not landed. - Private GCE proof:
cloud-gce-private-self-hosted-webui-proofremains on hold on missing firewall IAM against GCE default-deny ingress (shared with the on-hold private ICMP proof) and on per-run billable authorization; the legacy-virtio serving gap was closed locally on 2026-06-11. It must keep the no-public-IP cloudboot posture and prove a private probe that crosses the live GCE NIC. - On-hold public ingress/TLS proof:
cloud-gce-public-self-hosted-webui-ingress-tlsstill requires the private proof first and explicit public-exposure approval. The local plan/teardown/evidence/allowlist gates above bound that future run but are not exposure authorization. The status above does not authorize public IPs, firewall widening, DNS, certificate issuance, TLS key custody, browser production readiness, or a release. The capOS-terminated TLS successor (cloud-tls-self-hosted-webui-terminated-endpointand the on-hold Let’s Encrypt direct-termination proof) is a separate later evidence class behind the provider-terminated first public proof.
Update 2026-06-08 10:21 UTC: local bounded ICMPv4 Echo Reply diagnostics have
landed for the Phase C userspace network stack. make run-cloud-prod-icmp-echo-reply first reruns the served
TcpListenAuthority local proof, then boots the ICMP manifest, acquires the
QEMU SLIRP DHCP lease 10.0.2.15/24, proves same-subnet ARP and ICMP Echo
Request / Echo Reply preservation for identifier 0x04d2, sequence 9, and a
23-byte payload, and rejects bad-checksum, invalid-code, truncated,
address-family, oversized-payload, and oversized-frame controls. This answers
bounded local ping for diagnostics only; it is not Web UI readiness and does
not open public ICMP or change GCE firewall posture.
Update 2026-06-08 08:24 UTC: Phase C slice 7c-ii(b) has a local
serve-from-userspace proof, and the legacy kernel socket-path grant is retired
for non-qemu production manifests. make run-cloud-prod-userspace-network-stack-smoltcp boots the non-qemu cloudboot
manifest, starts a userspace smoltcp network-stack service, grants an
application client only Console plus a served TcpListenAuthority, and
completes one hostfwd TCP request/response through served
TcpListener/TcpSocket caps while preserving host_physical_user_visible=0.
Non-qemu manifests that request kernel network_manager or
tcp_listen_authority now fail closed instead of reaching
kernel/src/virtio_stub.rs; remaining kernel socket grants are qemu-only
fixtures. DHCP/IPv4 configuration, Web UI L4, private GCE reachability, public
ingress/TLS, and kernel smoltcp/virtio-net cleanup remain separate tasks. The
current GCE Self-Hosted Web UI evidence ladder lives in the
2026-06-11 19:21 UTC update above.
Update 2026-06-07 18:20 UTC (through commit 12b8334a, committed
2026-06-07 18:19 UTC): Installable System is closed for the bounded local/QEMU
contract. The closeout reconciles the proposal, backlog, proposal index,
roadmap, and status page to the landed data-region, overlay, generation,
install, provision, and update/rollback proofs while preserving the RAM-only
Namespace caveat and leaving secure boot/signing, production release
authority, public ingress, broader provider support, full userspace smoltcp/L4
readiness, and full durable account policy as future work. The selected
milestone is now GCE Self-Hosted Web UI.
Update 2026-06-07 08:23 UTC (commit ef8d98c2): Device Driver Foundation
production-authority closeout is recorded by
ddf-production-authority-closeout. The closeout ties together the landed
provider-driver, interrupt, audit, and DMA-policy prerequisites and keeps
public ingress, AWS/Azure support, direct-remapping hardware, device-autonomous
MSI-X, high-throughput NIC, and userspace smoltcp/L4 readiness as future
follow-up work.
Update 2026-06-07 05:26 UTC: the GCP-first usable-instance provider rollup is
closed by cloud-usable-instance-provider-nic-storage. The rollup cites real
GCE serial-console operator access (1779868872-2424), live legacy virtio-net
raw-frame provider-nic-bound (1780412056-e1cb), live NVMe Persistent Disk
brokered READ (1780806087-bf69), and separate live gVNIC raw-frame /
typed-Nic portability evidence (1780794927-1aa9, 1780796615-decc). This is
not a public L4 ingress, SSH/WebShell, AWS/Azure, high-throughput NIC, broader
storage, direct-remapping DMA, or production cloud-image release claim.
Update 2026-05-23 16:51 UTC (commit c86374f8): make run-ddf-provider-consumer now has a stable userspace virtio-net provider
closeout proof line. The line ties together selected queue 1 TX
descriptor/avail/doorbell/used-ring/CQ ownership across the full QEMU TX queue
depth, bounded queue 0 RX synthetic-token CQ identity, selected TX/RX
MSI-X/LAPIC wait/ack/EOI, selected-route reset/reassignment, teardown,
stale-handle blocking, and explicit no-silent-provider-fallback boundaries.
This remains local bounded provider evidence over manager-owned bounce buffers:
live hardware RX used-ring ownership, full virtio-net ownership, direct
DMA/IOMMU, cloud NIC/storage readiness, and virtio block/storage drivers remain
future work.
Update 2026-05-23 13:36 UTC (commit e248d42b): make run-ddf-provider-consumer now exercises selected userspace virtio-net TX CQ
ownership across the full eight-entry TX queue depth used by the smoke. Eight
manager-owned bounce buffers can be live before the first completion, the ninth
allocation fails closed, wrong-order completion of descriptor 7 preserves
descriptor 0, CQ identity is delivered and acknowledged in order for
descriptors 0 through 7, release drains seven incomplete descriptors as
teardown-only, and provider TX release retires seven delivered but
unacknowledged CQ events. This remains bounded selected-queue evidence over
bounce buffers: live hardware RX used-ring ownership, direct DMA/IOMMU, full
userspace virtio-net ownership, and cloud NIC/storage readiness remain open.
Update 2026-05-12 16:40 UTC: make run-ddf-provider-consumer now exercises
the selected userspace virtio-net TX path through bounded queue 1
descriptor/avail publication, exactly one selected notify doorbell, and a
runtime-visible tx_interrupt completion event tied to the used-ring handoff.
The selected submit path validates the live DMABuffer record, scrubs the
bounce page, consumes the live no-write notify_mmio policy, publishes the
stored descriptor/avail entry, and rings the notify doorbell only after those
gates pass. DMABuffer.completeDescriptor then observes the real TX used-ring
entry for the stored software descriptor generation, clears the manager
in-flight record, and delivers a bounded
selected-tx-used-ring-completeDescriptor event to a live tx_interrupt.wait
for the same route. This is still selected-route proof coverage rather than
full userspace virtio-net ownership: arbitrary doorbells, production NIC or
storage migration, cloud readiness, hardware IRQ ownership, hardware
acknowledgement/mask/unmask, direct DMA, IOMMU programming, broader CQ
ownership, and grantable device ownership remain open.
Update 2026-05-11 14:39 UTC, commit f04a14f4: make run-ddf-provider-consumer now turns the selected userspace virtio-net TX
doorbell gate into an explicit staged claimed-notify-offset admission proof.
The selected queue 1 provider entry reports accepted notify-offset policy,
blocked wrong-queue policy, blocked wrong-offset policy, and no_doorbell=true
after descriptor authority validation and submit scrub. The same smoke submits
queue 0 first and proves it remains neutral rather than selected/backend
doorbell-capable. This remains bounded manager-owned bounce-buffer evidence
only: no virtio-net notify BAR handle is granted, no notify register is
written, no real virtio-net descriptor ring is mutated, and production
userspace NIC or cloud readiness is not claimed.
Update 2026-05-11 12:01 UTC: make run-ddf-provider-consumer now extends the
bounded provider-visible submit effect with a selected provider-owned queue
entry. Accepted DMABuffer.submitDescriptor still validates descriptor
authority and scrubs the manager-owned bounce page first, then writes queue
magic, queue id, tail, descriptor id, submitted length, and flags before the
submit marker. The focused smoke maps the buffer after completion and proves
the queue entry and marker remain visible outside the completed byte range; it
also rejects submits shorter than the full 24-byte provider-effect footprint
with zero in-flight accounting and no provider mutation. This is bounded
bounce-buffer evidence only: no hardware
descriptor ring or CQ is published, no direct DMA or IOMMU/remapping is
enabled, host physical/IOVA values stay hidden, no MMIO doorbell is written,
and provider-driver IRQ consumption plus cloud NIC/storage readiness remain
future work.
Update 2026-05-11 11:22 UTC: make run-ddf-provider-consumer now proves a
bounded descriptor-ring-equivalent provider side effect after
DMABuffer.submitDescriptor authority validation. The accepted submit path
scrubs the manager-owned bounce page, writes a provider-visible shadow
descriptor entry with magic, queue, descriptor id, submitted length, and
flags, and then writes the existing submit marker before the in-flight record
is committed. The same process maps the buffer after
DMABuffer.completeDescriptor and proves the shadow descriptor and marker
remain visible outside the completed byte range. A follow-up boundary check
rejects submits shorter than the 24-byte provider-effect footprint as
dmabuffer-provider-effect-too-short, preserving zero in-flight accounting and
blocking side effects. This is still bounded bounce-buffer evidence: no
hardware descriptor ring or CQ is published, no direct DMA or IOMMU/remapping
is enabled, host physical/IOVA values stay hidden, arbitrary MMIO doorbells
remain blocked, and provider-driver IRQ consumption plus cloud NIC/storage
readiness remain future work.
Update 2026-05-11 10:46 UTC: make run-ddf-provider-consumer now proves the
first bounded provider-visible DMA side effect in the four-cap provider
consumer. On accepted DMABuffer.submitDescriptor, the manager-owned bounce
page is scrubbed, then a submit marker is written only after descriptor
authority validation succeeds. The same process maps the buffer after
DMABuffer.completeDescriptor and proves the completion pattern is limited to
the completed byte range while the submit marker outside that range remains
visible. This is still bounded bounce-buffer evidence: no descriptor ring or
CQ is published, no direct DMA or IOMMU/remapping is enabled, host
physical/IOVA values stay hidden, arbitrary MMIO doorbells remain blocked, and
provider-driver IRQ consumption plus cloud NIC/storage readiness remain future
work.
Update 2026-05-11 10:06 UTC: commit c52064c0 extends the same
make run-ddf-provider-consumer four-cap provider-consumer smoke beyond
bounded DMABuffer submit/complete accounting. The service now also calls
brokered DeviceMmio.read32, the existing claimed-register
DeviceMmio.write32 path, and brokered readback before DeviceMmio.unmap.
The interrupt half proves one async Interrupt.wait completes as delivered
after route unmask, a second async wait stays pending for a kernel turn, and
Interrupt.mask completes that second waiter as cancelled. This remains
bounded provider-authority composition evidence only: DMA is still the
manager-owned bounce-buffer path, MMIO writes remain limited to the claimed
register policy with no arbitrary doorbell, IRQ behavior is bounded
route-generation-checked waiter delivery/cancellation, and production
NIC/storage migration, IOMMU/remapping, descriptor-ring mutation,
completion-queue publication, provider-driver interrupt consumption, and
cloud readiness remain future work.
Update 2026-05-11 09:25 UTC: make run-ddf-provider-consumer now extends the
same four-cap provider-consumer smoke across the bounded
DMABuffer submit/complete descriptor-accounting path. After allocating and unmapping one
manager-owned bounce buffer, the service calls typed
DMABuffer.submitDescriptor and DMABuffer.completeDescriptor, asserts
manager-inflight-recorded then manager-inflight-completed, and checks
DMAPool.info reports live_inflight=1 after submit and live_inflight=0
after completion before freeing the buffer. This remains bounded
provider-authority composition evidence only: no descriptor ring is mutated,
no CQ is published, direct DMA stays blocked, host physical/IOVA exposure stays
hidden, arbitrary MMIO writes and doorbells remain blocked, and production
NIC/block migration remains future work.
Update 2026-05-11 03:09 UTC: commit 9c0a5183 carries the
manager-owned fixed bounce-buffer DMAPool budget ledger into
DMAPool.allocateBuffer. The manifest-granted three-slot pool now attaches a
device-manager-owned budget policy with three live buffers/pages, 12288 bytes,
four queues, eight descriptors per queue, one in-flight descriptor per live
slot, zero MMIO mappings/bytes, and zero interrupt holds. With all three fixed
slots live, a fourth valid 4096-byte allocation returns no result cap and
reports result=dmapool-budget-exceeded, reason=over-buffer-budget,
sideEffect=side-effect-blocked, and bufferPresent=false before slot
selection, frame allocation, generation allocation, cap minting, or manager
ledger mutation. Imported live virtio-net proof records continue to use the
kernel-owned device_dma:virtio-net budget policy. This remains the bounded
manager-owned bounce-buffer path: direct DMA stays blocked, host physical
addresses and IOVAs stay hidden, descriptor rings and completion queues are
not mutated, and IOMMU/remapping plus production driver consumption remain
future work.
Update 2026-05-11 00:20 UTC: make run-ddf-provider-consumer now boots one
focused service that receives console, DMAPool, DeviceMmio, and
Interrupt in the same CapSet. The smoke uses existing typed runtime clients
and unchanged ABI surfaces: it validates DMAPool.info, allocates one
manager-owned bounce-buffer DMABuffer, maps and unmaps it through the
existing userspace bounce-buffer path, frees it with scrub-before-frame-free
evidence, maps and unmaps a boot-preseeded DeviceMmio BAR page read-only,
and exercises bounded Interrupt.wait, acknowledge, unmask, and mask
on the manager-attached route. The harness asserts the four-cap service spawn,
one manager-grant-source acquire/release for each authority family, and the
stable provider-consumer proof line with
authorities=dmapool,device_mmio,interrupt, no direct DMA, no host
physical/IOVA exposure, no arbitrary MMIO write or doorbell, no real interrupt
delivery requirement, and no production NIC/block migration. This is bounded
composition evidence only; it does not add schema/generated/runtime changes,
IOMMU programming, provider-driver interrupt consumption, or production
storage/network driver migration.
Update 2026-05-10 20:06 UTC: manifest-granted
DeviceMmio.write32(offset, value) now performs a bounded kernel-side
volatile MMIO write after validating the active manager-attached handle,
owner/state, region and policy binding, pure
DeviceMmioOperation::Write authority, a dword-aligned in-BAR range, and the
single PCI MSI-X metadata-derived provider claim, including BDF, BAR, BAR
base, offset, and value. The effect uses only
the boot-preseeded kernel MMIO mapping cache already used by
DeviceMmio.read32; it does not install a post-userspace kernel mapping, and
the userspace BAR VMA remains read-only. The
focused smoke uses the claimed virtio-rng MSI-X entry-0 vector-control mask
dword, reports side_effect=mmio-write-performed and
register_write=performed, then reads the same value back through brokered
read32 and the read-only userspace VMA. It also attempts an unclaimed
message-address dword write and reads back the original value unchanged.
Invalid range and unclaimed-register paths remain typed side-effect-blocked
results, while stale or released handles fail closed before any write and do
not return a write32 result payload. This does not add writable userspace
MMIO, arbitrary register writes, doorbells, host physical/IOVA exposure, IOMMU
programming, or a production provider-driver consumer.
Update 2026-05-10 19:29 UTC: DMABuffer.completeDescriptor now produces a
bounded userspace-visible completion effect on the manager-owned bounce-buffer
page. On the existing valid matching manager-inflight-completed path, after
active owner/epoch/slot validation and submitted-length checks pass, the
manager writes a deterministic byte pattern into the first completionLength
bytes of that slot’s bounce page before clearing the in-flight record. The
focused DMAPool smoke maps the same slot after completion and proves byte 0
and the last completed byte match the pattern while the next byte remains
unchanged. Invalid, stale, no-inflight, mismatched, length-exceeded,
mapped-live, and after-free paths keep their fail-closed labels and do not
write. This is bounce-buffer completion data only: no direct DMA, descriptor
ring mutation, CQ publication, host physical/IOVA exposure, IOMMU programming,
or production driver consumer is added.
Update 2026-05-10 16:47 UTC: the manifest-granted Interrupt.wait path now
has a fixed-table deferred waiter object for the current manager-attached
route. A masked wait still fails closed synchronously as
stale-pending-irq-masked / route-masked / side-effect-blocked, but an
unmasked wait now returns pending. The focused interrupt smoke submits that
wait nonblocking, drives the kernel once, observes it remains pending, calls
Interrupt.mask, then finishes the original wait as
interrupt-waiter-cancelled / route-masked /
waiter-completed-no-irq with wake_blocked=false, matching source/route
generations, unchanged delivery counts, and no real IRQ delivery. This is
no-IRQ cancellation waiter behavior only; hardware acknowledgement, MSI/MSI-X
programming, IRQ-delivery userspace waiters, and production interrupt dispatch
remain future work.
Update 2026-05-10 15:33 UTC: the manifest-granted DeviceMmio.map path now
maps a boot-preseeded MMIO BAR page into the caller’s userspace address space
as read-only, user-accessible, no-execute, no-cache PTEs. Accepted requests
return userspace-mmio-bar-mapped,
boot-preseeded-read-only-bar-page, user-vma-mapped, and a nonzero
page-aligned userspace address; writable, executable, unknown-protection,
zero-size, unaligned, out-of-BAR, overflow, and duplicate active map requests
return typed no-side-effect results. The focused smoke reads the same QEMU BAR
value through the returned userspace address and brokered DeviceMmio.read32,
then exercises explicit DeviceMmio.unmap, a no-op second unmap, remap after
unmap, and stale unmap failure after cap release. Release, drop,
driver-crash, and reset-disable cleanup revoke any borrowed user VMA before the
manager record is detached. This does not add writable MMIO, doorbells,
volatile register writes, host physical/IOVA exposure, post-userspace kernel
MMIO mappings, IOMMU programming, or a production provider-driver consumer.
Update 2026-05-10 14:12 UTC: DMABuffer.submitDescriptor /
DMABuffer.completeDescriptor now keep in-flight descriptor identity on each
live DMABuffer slot instead of one pool-global descriptor. The focused
DMAPool grant smoke proves slot 0 and slot 1 can both be in flight,
duplicate submit on the same slot still fails closed, mismatched completion
preserves both live slot descriptors, completing slot 0 decrements aggregate
DMAPool.info live_inflight from 2 to 1, explicit freeBuffer of the
remaining in-flight slot fails closed, and cap release of an in-flight slot
drains only that slot while preserving another slot’s in-flight accounting.
This remains bounded manager accounting: no descriptor ring is mutated, no CQ
entry is published, no direct DMA is attempted, no IOVA or host physical
address is exposed, and no IOMMU programming or production driver consumer is
added.
Update 2026-05-10 13:45 UTC: commit 3bbeb3d4 makes DMABuffer.unmap
explicitly remove the manager-owned bounce-buffer userspace VMA for the calling
process without freeing or scrubbing the bounce page, detaching the
DMABuffer record, changing DMAPool.info live buffer/page/in-flight
accounting, or touching real DMA state. The typed result reports
userspace-bounce-buffer-unmapped / single-page-bounce-buffer /
user-vma-unmapped when a live mapping is removed, and
dmabuffer-mapping-absent / no-user-mapping with no side effect on a second
unmap. The focused smoke maps slot 0 read-only, rejects a writable remap while
the read-only mapping remains live, unmaps it, proves the second unmap is a
typed no-op, remaps the same VMA writable, and verifies stale unmap after
freeBuffer fails closed like info, map, submitDescriptor, and
completeDescriptor. While VMA teardown is in progress, the cap records an
in-progress mapping state so concurrent map/free/release paths fail closed
instead of observing an absent mapping before the page-table unmap and TLB wait
complete.
Update 2026-05-10 12:49 UTC: commit 28e16431 extends the manifest-granted
DMAPool bounce-buffer allocator from two live result caps to three fixed
manager-owned slots. The focused smoke now proves slot 0, slot 1, and slot 2
can be live at the same time, DMAPool.info reports
live_buffers=3 live_pages=3 live_bytes=12288, a fourth allocation fails
closed as dmapool-already-attached / active-buffer-attached, and freeing
slot 0 while slots 1 and 2 remain live leaves two
committed/resident/unswappable pages. The same smoke reallocates slot 0 with a
fresh generation while slots 1 and 2 remain live, verifies the old cap stays
revoked, explicitly frees the reused slot 0 and slot 2 buffers, and releases
slot 1 while one bounded descriptor submission remains in flight so
parent-first DMAPool release completes only after the last live result
buffer detaches and drains manager-owned accounting. This is a fixed
three-slot bounce-buffer allocator; it still does not expose direct DMA, IOVA
or host physical addresses, descriptor-ring mutation, CQ publication, IOMMU
programming, hostile isolation coverage, or a production driver consumer.
Update 2026-05-10 11:44 UTC: commit 75beeeb8 extends the
manifest-granted DMAPool bounce-buffer allocator from one live result cap to
two fixed manager-owned slots. The focused smoke now proves slot 0 and slot 1
can be live at the same time, DMAPool.info reports
live_buffers=2 live_pages=2 live_bytes=8192, a third allocation fails closed
as dmapool-already-attached / active-buffer-attached, and freeing slot 0
while slot 1 remains live leaves one committed/resident/unswappable page. The
same smoke reallocates slot 0 with a fresh generation while slot 1 remains
live, verifies the old cap stays revoked, and releases the parent DMAPool
before the remaining buffers so the staged pool detach completes only after
the final DMABuffer release drains bounded in-flight accounting. This is a
fixed two-slot bounce-buffer allocator; it still does not expose direct DMA,
IOVA or host physical addresses, descriptor-ring mutation, CQ publication,
IOMMU programming, hostile isolation coverage, or a production driver
consumer.
Update 2026-05-10 10:56 UTC: commit 9659763e adds typed
DeviceMmio.write32(offset, value) admission on the manifest-granted
DeviceMmio cap. The kernel validates the active manager-attached handle,
owner/state, region and policy binding, DeviceMmioOperation::Write
authority, and 32-bit aligned in-BAR range before returning
admission-check-only, real-mmio-write-not-programmed, and
side-effect-blocked with register_write=blocked. The focused smoke
asserts accepted-shaped admission, unaligned/out-of-BAR/overflow
mmio-write32-range-invalid denials, stale-after-release failure, and two
sequential grant-cycle runs. This is an admission proof only; no volatile
register write, userspace BAR mapping, doorbell, host physical exposure,
IOMMU programming, or production driver consumer is added.
Update 2026-05-10 10:20 UTC: commit 3777a50d gives the
manifest-granted proof-buffer DMABuffer descriptor path an explicit
single in-flight descriptor identity. After one valid submitDescriptor,
the focused grant smoke now proves a duplicate submit for the same
queue/descriptor returns dmabuffer-descriptor-already-inflight with
side-effect-blocked, and a valid-shaped completion for a different live
descriptor returns dmabuffer-inflight-descriptor-mismatch with
side-effect-blocked. Both refusals preserve DMAPool.info
live_inflight=1; the matching completion still restores it to 0. This
is still bounded manager accounting only, not descriptor-ring mutation, CQ
publication, direct DMA, IOVA export, or a production driver consumer.
Update 2026-05-10 05:03 UTC: unsupported-protection DeviceMmio.map
requests now stay on the typed admission result path. The focused grant smoke
decodes range_result=mmio-map-prot-invalid,
range_reason=unsupported-map-prot, range_side_effect=side-effect-blocked,
addr=0, and unchanged manager identity fields instead of treating executable
or missing-read protections as a capability exception. This remains
admission-only evidence; no real BAR mapping, register access, doorbell write,
or host physical address exposure is added.
Update 2026-05-10 09:36 UTC: a harness-hardening follow-up, commit
7dfa1d65, strengthens make run-dmapool-grant around that bounded
userspace bounce-buffer map. The current focused smoke maps the first slot
generation read-only, proves the zeroed page is readable, and asserts a
same-cap writable remap attempt fails while that read-only mapping remains
live. It then writes and reads a marker through the second slot generation’s
read-write mapping while preserving the existing typed partial-range,
protection, and free/release cleanup assertions. This is a stronger proof of
the existing mapping permission contract, not new direct DMA, IOVA, host
physical, descriptor-ring, CQ, IOMMU, hostile-isolation, or production-driver
authority.
Update 2026-05-10 08:53 UTC: the manifest-granted DMABuffer.map path now
maps the single manager-owned bounce-buffer page into the caller’s userspace
VMA. Accepted readable full-page requests return
userspace-bounce-buffer-mapped, single-page-bounce-buffer, and
user-vma-mapped with a nonzero page-aligned userspace address while still
reporting real_dma_mapping=not-programmed, direct_dma=blocked, and
host_physical_user_visible=false. The feature slice proved the mapped page
could be reached from userspace, kept typed partial-range/protection denials
at addr=0, and proved DMABuffer.freeBuffer / cap release revoke the user
mapping before the bounce page is scrubbed and freed. This is userspace access
to the kernel-managed bounce buffer only; it does not expose IOVA, host
physical addresses, direct DMA, descriptor-ring mutation, CQ publication,
IOMMU programming, or a production driver consumer.
Update 2026-05-10 06:37 UTC: the manifest-granted DMAPool.allocateBuffer
and DMABuffer.freeBuffer path now uses production bounce-buffer labels for
the single-page userspace allocation/free authority. A valid 4096-byte
request still mints exactly one same-session DMABuffer result cap, but the
cap surfaces now report userspace_dmapool=manager-issued-bounce-buffer,
allocation=single-bounce-buffer-page,
record_pool=userspace-bounce-buffer-live, and
free_buffer=bounce-buffer-page; zero-live records report
zero-live-dmapool-bounce-buffer, and oversized requests fail as
size-exceeds-bounce-buffer. The backing frame remains ledger-owned,
resident, unswappable, scrubbed before frame free, and hidden from userspace.
That slice advanced allocation/free authority; the later DMABuffer.map
slice above adds userspace bounce-buffer VMA access while descriptor methods
still only update bounded manager accounting, and no IOVA, host physical
address, CQ publication, descriptor-ring mutation, IOMMU programming, or
production driver consumer is added.
Update 2026-05-10 04:40 UTC: invalid-size DMAPool.allocateBuffer requests
now use the same typed no-result-cap rejection shape as duplicate-active
requests. Zero-size and over-bounce-buffer calls return
result=dmapool-allocation-request-invalid, the exact request reason,
side_effect=side-effect-blocked, buffer_present=false, no result cap, and
no page mutation instead of relying on a capability exception string.
Update 2026-05-10 04:27 UTC: the manifest-granted Interrupt.mask and
Interrupt.unmask methods perform bounded route-state control over the
manager-attached dispatch slot. unmask changes claimed-masked to
driver-unmasked, mask changes it back to claimed-masked, both preserve
delivery counts, and release masks retained manager-grant routes before
detaching. The 2026-05-10 16:47 UTC status above supersedes the original
post-unmask wait placeholder with deferred no-IRQ cancellation behavior.
Update 2026-05-10 03:23 UTC: DMAPool.allocateBuffer now reports the
duplicate-active bounded proof-buffer rejection as typed result data instead
of requiring userspace to infer the label from a capability exception string.
When the first proof DMABuffer is still live, the second valid-size
allocation returns no result cap and reports result=dmapool-already-attached,
reason=active-buffer-attached, side_effect=side-effect-blocked, and
buffer_present=false; the smoke then proves DMAPool.info still reports one
live 4096-byte proof frame. This remains bounded proof-buffer guard evidence,
not real multi-buffer DMA allocation or production userspace DMA authority.
Update 2026-05-10 02:52 UTC: the focused DMAPool grant smoke now proves
active duplicate allocation is refused while the single proof buffer is still
attached. The smoke calls DMAPool.allocateBuffer a second time after the
first result DMABuffer becomes live, requires the failure path, then re-reads
DMAPool.info to prove the record still reports exactly one live 4096-byte
proof frame. This is a bounded proof-buffer guard only; it does not add real
multi-buffer allocation, DMA mappings, IOVA/physical exposure, or production
driver authority.
Update 2026-05-10 02:21 UTC: DMAPool.info now reports the live manager
record accounting for the bounded proof-buffer path. The manifest-granted
DMAPool starts as zero-live-dmapool-proof, moves to
synthetic-live-dmapool-proof with one live 4096-byte page while the
manager-attached DMABuffer result cap is active, and returns to zero-live
after typed DMABuffer.freeBuffer scrubs/releases the proof frame. The
focused grant smoke asserts the live and after-free accounting lines through
the typed runtime client. This remains bounded proof-buffer accounting only;
it does not implement real device-visible DMA mappings, IOVA/physical address
exposure, production descriptor side effects, or production page lifecycle.
Update 2026-05-09 23:52 UTC: the manifest-granted Interrupt skeleton now
also has bounded Interrupt.mask and Interrupt.unmask admission methods.
They validate the current manager-attached route through the existing
Mask/Unmask authority paths and return typed no-side-effect labels while
proving route state and delivery counts stay unchanged. This does not
implement real route mask/unmask mutation, hardware acknowledgement, blocking
userspace waiters, MSI/MSI-X table programming, or real interrupt delivery.
Update 2026-05-09 23:21 UTC: the manifest-granted Interrupt skeleton now
also has a bounded Interrupt.acknowledge admission method. It validates the
current manager-attached route through the existing Acknowledge authority
path and returns typed no-side-effect labels without acknowledging hardware,
waking waiters, or changing delivery counts. This does not implement real
interrupt acknowledgement, blocking userspace waiters, real mask/unmask route
mutation, or real interrupt delivery.
Update 2026-05-09 19:18 UTC: the manifest-granted Interrupt skeleton now
has a bounded Interrupt.wait admission method. It validates the current
manager-attached route, delegates to the shared pending-IRQ token validator,
and returns typed masked-route labels without waking a waiter or advancing
delivery counts. This does not implement blocking userspace waiters,
hardware acknowledgement, real route mask/unmask mutation, or real interrupt
delivery.
Update 2026-04-28 22:02 UTC: normal shell client @... grants reject explicit
badge N selector syntax and preserve delegated client endpoint identity when
the selector is omitted. Low-level and hostile-path tests still carry explicit
selector fixtures.
Update 2026-04-29 05:59 UTC: the focused chat manifest now routes the kernel
singleton chat_endpoint through init to the resident chat server, and the
focused chat shell no longer receives a manifest-forwarded chat service
export. Its normal chat authority comes from the broker-issued operator shell
bundle, matching the default and remote shell paths while the resident bot
keeps its manifest service grant.
Update 2026-04-29 07:35 UTC: Session-Bound Invocation Context core gates are landed. Implemented pieces include the process-session invariant, endpoint caller-session metadata, stale normal endpoint rejection, transfer scopes, field-granular disclosure gating, session expiry for broker-issued shell bundle caps, guest bundle narrowing, chat session-keyed membership, and Aurelian player state keyed by live endpoint caller-session metadata.
Update 2026-05-01 08:47 UTC: default password-authenticated local operator
sessions now mint with no wall-clock expiration. Short-expiry operator proofs
remain available by setting a non-default sessionLifetimes.operatorMs in the
manifest.
Update 2026-04-29 09:44 UTC: later Gate 4 cleanup put terminal output behind
live caller-session dispatch, bound shell-serviced stdio bridge waits to opaque
live caller-session metadata, removed remaining badge-facing service-common
handler APIs from normal chat paths, and widened non-adventure endpoint
caller-session opaque references to 128 bits while preserving the
scoped_ref ABI field as the low half.
Update 2026-04-29 10:20 UTC: non-adventure endpoint caller-session references
now use an entropy-backed boot secret and HMAC-SHA256 over a non-reused
endpoint service-scope id plus kernel session id. The ABI layout is unchanged,
but scoped_ref is no longer value-compatible with the old unkeyed hash.
References rotate on reboot and endpoint object replacement.
Update 2026-04-29 21:40 UTC: Gate 4 of the Session-Bound Invocation Context
milestone is implemented and verified on mainline. Commit faeff80 at
2026-04-29 21:39 UTC records the final closeout: normal chat, adventure,
terminal, and stdio paths no longer expose caller-selected receiver identity,
focused make run-adventure passed with session-bound Adventure/chat service
grants, and focused make docs passed after docs PDF render hardening. The
paper/status alignment now records the session-bound shared-service evidence
as landed. Remaining work in this area is future stable service-audit identity
across upgrades, not additional shared-service migration.
Implemented
Visible Milestone Proofs
- First Packet: commit
b56a5c1at2026-04-24 15:37 UTC. - First HTTP: commit
a4f1722at2026-04-24 16:47 UTC. - The Unprivileged Stranger: commit
d4016abat2026-04-22 16:35 UTC. - Native Cap Shell: commit
f554e88at2026-04-23 08:41 UTC. - Boot to Shell: commit
e5adafbat2026-04-23 13:39 UTC. - The Revocable Read: commit
7f19af2at2026-04-23 16:15 UTC. - First Chat: commit
2cd85a8at2026-04-24 00:13 UTC. - Local MUD: commit
add7f9bat2026-04-24 01:40 UTC. - Verified Core: commit
d43b691at2026-04-23 22:09 UTC. - Ring as Black Box: commit
da5f5e9at2026-04-24 03:13 UTC. - First AP Scheduler: commit
d88bca7at2026-04-25 11:31 UTC. - Telnet Shell Demo: commit
2834bfcat2026-04-25 20:25 UTC. Demo scope: plaintext, loopback-only research demo proving theTerminalSession/SessionManager/AuthorityBroker/RestrictedShellLauncherboundary over a real TCP socket; not a shippable Telnet service. Production remote shell tracks the SSH Shell Gateway in SSH Shell Gateway. - Multi-Process SMP Concurrency: commit
3fb89923at2026-04-30 09:45 UTC. - Default Run Telnet Wiring Retired: commit
367117beat2026-05-01 16:54 UTCremoves default host-local Telnet gateway forwarding and the default manifest service. Currentmake runstarts the foreground shell, chat service, and remote-session CapSet gateway, and forwards only the remote CapSet endpoint. The plaintext demo was later retired with qemu-only kernel TCP listener removal;make run-telnetnow exits before QEMU with a retirement diagnostic. Earlier commit7a155f4at2026-04-26 21:02 UTCmoves Telnet IAC filtering into the kernel socket terminal (best-effort silent swallow for the BSD/netkit clients we test against; no WONT/DONT replies) so a normal Telnet client lands at the shell prompt without a userspace pre-handoff recv, and refactors the gateway to loop accept/handoff/launch_shell/wait per connection so repeated host Telnet connections succeed. - Service Object Routing/Lifecycle: commit
a4655f0at2026-04-28 14:10 UTC;make run-service-object-routingproves trusted service-object minting, receiver-cookie dispatch, payload-spoof rejection, copy/move IPC transfer, nested spawn delegation, generation-checked receiver cookies, close/revoke rejection, and stale-cookie rejection after record reuse. This is now historical low-level coverage: the implemented Session-Bound Invocation Context baseline gives normal workload processes one immutable session context, and endpoint subject disclosure is private by default. - Session Context Invariant: commit
3edee90at2026-04-28 16:26 UTCaddsmake run-session-context, proving every spawned process has one immutable session context, raw child spawns inherit the caller context, a copiedUserSessioncap cannot relabel invocation context, and a broker-issued launcher can select a validated child context while mismatched profile requests fail closed. Commit3469c27at2026-04-28 16:54 UTCextends the proof so expired guest-session bundle refreshes fail closed at the broker. Commit687511aat2026-04-28 17:43 UTCadds privacy-preserving endpoint caller-session metadata and stale normal endpoint rejection: endpoint servers receive only a service-scoped opaque caller-session reference, epoch, and live/stale flags by default, spoofeduser/session/rolepayload labels do not affect the delivered invocation context, and calls after the process session expires fail before transfer preparation or enqueue. Commitf0cb74bat2026-04-28 18:38 UTCadds session-aware cap transfer scopes: same-session-only caps cannot cross into another session, explicitly shareable caps may cross and then invoke under the receiver session, and service-regrant-only caps require a trusted fixed-session broker/launcher path. Commit0f92d77at2026-04-28 19:33 UTCadds explicit endpoint subject disclosure gating: request without scope and scope without request expose no subject fields, request plus matching scope exposes only allowed fields, and broader requests are narrowed. Commitdc7ece4at2026-04-28 20:06 UTCmigrates the chat demo to session-keyed membership: chat member state is keyed by the endpoint caller-session reference, the focused chat manifest no longer assigns static chat badges, andmake run-chatproves operator-session chat clients plus rejected delegated endpoint relabeling. A follow-up review fix keeps join handles as request data and uses service-assigned visible member labels. 2026-04-28 20:48 UTCnarrows guest shell bundles: guest sessions require an explicit manifest guest seed, guest bundles receive no default chat/adventure service endpoint caps, and guest launcher policy comes from resource-profilelauncherProfilerather than the full manifest binary list.
Boot and Kernel Baseline
- Limine boots the x86_64 kernel in QEMU.
- The kernel initializes dual UART output, GDT, IDT, LAPIC, syscall MSRs, memory management, page tables, heap allocation, and the global capability registry. The legacy PIC/PIT path remains as a fallback when LAPIC timer setup or PIT-based calibration is unavailable.
- User page-table map, unmap, and protect operations are routed through a TLB shootdown helper keyed by address-space CPU residency. Remote targets get pending full-TLB flush generations plus vector-49 IPIs, and the sender waits for observed target completion after ring dispatch releases address-space, cap-table, and scratch locks. Deferred queue slots are reserved before page-table mutation, and drains flush the current CPU before waiting. Delayed maskable interrupt delivery is covered by syscall-entry and flush-before-user-return hooks. Scheduler CR3 handoff marks the current CPU resident, including AP cpu=1 during the first AP scheduler-owner proof, so remote shootdown targets become active when an address space has run on more than one CPU.
- AP cpu=1 can own scheduler/user execution under
-smp 2: APs register theirPerCpurecords, program LAPIC timers from the BSP calibration, update AP TSS.RSP0 during context switches, and enter the scheduler from the AP idle loop when AP timer setup succeeds. This proof keeps one scheduler owner; when AP cpu=1 is online with a programmed timer, the BSP stays in kernel idle so the process-wide capability ring is not executed concurrently. - The Multi-Process SMP Concurrency milestone is complete at commit
3fb89923(2026-04-30 09:45 UTC). The scheduler tree includes a narrow reschedule-IPI wake path for halted scheduler-owner loops, andmake run-smp-process-scalebuildscapos-smp-process-scale.isofromsystem-smp-process-scale.cue, runs repeated-smp 1,-smp 2, and best-effort 4-vCPU QEMU cases, parses compact verified timing lines, stores raw serial logs undertarget/smp-process-scale/<timestamp>/, and enforces the 1.6x median speedup threshold when KVM-backed evidence is available. The accepted run intarget/smp-process-scale/cycle-balanced-default/recorded1.608x1-to-2 speedup. A latercapos-benchnested-QEMU/KVM rerun on GCEn2-highcpu-8at commit0d89a91b(2026-04-30 11:09 UTC) pinned QEMU to host CPUs0,1,2,3and recorded1.873xcapOS 1-to-2 speedup; the matching Linux guest baseline under the same CPU pinning recorded1.934x. The same run recorded capOSsmp4=1111scaled cycles, or1.475xfrom the 1-vCPU baseline but slower than the 2-vCPU median, while Linux recorded3.774x1-to-4 speedup; capOS therefore still claims only the 1-to-2 milestone gate. The closeout also reran ordinaryrun-smokeandrun-spawnunder-smp 2, with logs intarget/smp2-smokes/, covering the default manifest, ring, thread lifecycle, park cleanup, generic child waits, and process exit. - The kernel creates its own page tables with per-section permissions and keeps the higher-half direct map for physical memory access.
- SMEP/SMAP are enabled when the QEMU CPU advertises support.
Code: kernel/src/main.rs, kernel/src/arch/x86_64/, kernel/src/mem/.
Validation: cargo build --features qemu, make run-smoke.
Process and Userspace Runtime
- Processes have isolated address spaces, one or more internal Thread records with per-thread kernel stacks and saved CPU context, CapSet bootstrap pages, capability rings, and local capability tables.
- ELF loading supports static no_std userspace binaries and TLS setup.
capos-rtowns the userspace entry path, allocator initialization, ring-client access, typed clients, result-cap parsing, and owned-handle release.capos-rtis the only source owner for the userspace_start, panic, global allocator, raw syscall, andcapos_rt_mainhandoff surfaces; a source check guards this split.targets/x86_64-unknown-capos.jsondefines the capOS userspace target for bootedinit,demos,shell, andcapos-rtruntime builds; the kernel default remainsx86_64-unknown-none.- The 7.1.0 in-process threading contract defines the split between
process-owned address-space/capability state and thread-owned execution
state, plus thread/kernel-stack quotas and generation-checked waiter
identity. 7.2.0 moved saved context, kernel stack, FS base, and block state
into
Threadrecords; 7.2.1 schedules and wakes generation-checkedThreadRefvalues; 7.2.2 adds process-localThreadSpawnerandThreadHandlecaps plusThreadControl.exitThreadfor create, join, detach, self-join rejection, exit-code observation, and last-thread process exit; and 7.2.3 adds private ParkSpace wait/wake with timeout, wake, and reserved waiter completion semantics. SharedParkSpace park-words remain future work.
Code: kernel/src/spawn.rs, kernel/src/process.rs, capos-rt/src/,
init/src/main.rs, demos/, shell/, targets/x86_64-unknown-capos.json,
tools/check-userspace-runtime-surface.sh.
Design: In-Process Threading, Park Authority.
Validation: tools/check-userspace-runtime-surface.sh,
make capos-rt-check, make init-capos-build, make demos-capos-build,
make shell-capos-build, make capos-rt-capos-build, make run-smoke,
make run-spawn.
Programming Language Support
- Native capOS Rust is the only implemented booted Rust language path. It uses
#![no_std],alloc,capos-rt, static ELF binaries, and thetargets/x86_64-unknown-capos.jsoncustom target. - Native C boots through the libcapos C-substrate (Phase 0;
make run-c-helloexercises Console + Timer + EntropySource + a 4 KiB anonymous VM roundtrip) and through the POSIX adapter (Phase P1.2 Phase B; the historicalmake run-posix-dns-smokeresolvedexample.comover the qemu-only kernelUdpSocketcap via QEMU slirp DNS at 10.0.2.3:53, but that target is retired after kernel socket-owner removal;make run-posix-pipe-smokeandmake run-posix-spawn-smokeexercise pipe, fork-for-exec, directposix_spawn, minimal file actions, read, and waitpid overProcessSpawner/Pipe; the Console-backed stdio proof landed at commitaa6a56d7(2026-05-13 11:03 UTC) andmake run-posix-stdio-smokeexerciseswrite(1, ...)andwrite(2, ...)over a granted Console while provingread(0, ...)stays closed without a stdin grant; the file/directory fd closeout landed at commitf97d9833(2026-05-23 06:23 UTC) andmake run-posix-fileexercisesopen(),write(),lseek(),read(),opendir(),readdir(), andclosedir()over a granted rootDirectory;make run-posix-printfexercises the focused printf/string subset: formatted output, string/mem, numeric conversion, and ctype helpers;make run-posix-signal-timeexercises Timer-backedtime,nanosleep, andsleepplus the documented fail-closed signal-delivery stubs). Both bypass WASI – they are static ELF binaries linked againstlibcapos.aand, for POSIX smokes,libcapos_posix.a. POSIXposix_spawn()accepts argv/envp for source compatibility but does not deliver them until LaunchParameters / environment support lands. Broader C/libcapos surface and full POSIX adapter scope remain future design. - Sandboxed
wasm32-wasiis the first booted WASI-hosted language path. Phase W.5 (filesystem;capos-wasm/src/wasi/fs.rs) closed and is exercised bymake run-wasi-fs: thewasm-hostinstalls the manifest-granted rootDirectorycap as a preopened fd, the WASI payload writes and reads back a file throughpath_open/fd_write/fd_close/ re-open /fd_read, and the preopen sandbox refuses absolute paths and parent-escape..segments. The WASI host adapter closed Phase W.4 at commitb0f6939f(2026-05-07 20:09 UTC); Phase W.3 closed at commitca41ecc1(2026-05-07 18:29 UTC; the surrounding W.3 narrative stamps from2026-05-07 18:25 UTCpredate the feat commit by a few minutes); Phase W.2 closed at commit7bfcb1d8(2026-05-07 10:53 UTC): thewasm-hostuserspace binary (capos-wasm/ standalone crate over vendored wasmi 1.0.9) hosts WebAssembly modules whosewasi_snapshot_preview1imports are backed by typed capOS capabilities (Console + Timer + BootPackage, the per-instance argv text grant from W.3, the 2026-05-13 bounded environment text grant throughinitConfig.init.wasiEnv, and the optional W.4EntropySourcecap looked up from the per-instance CapSet under the well-known namerandom).make run-wasi-hello-rust,make run-wasi-hello-c,make run-wasi-cli-args, andmake run-wasi-envare the regression smokes;make run-wasi-randomis the W.4 granted gate (the payload reads N=64 bytes throughrandom_getand prints[wasi-random] entropy_bytes=64 entropy_bound_ok=true) andmake run-wasi-random-ungrantedis the matching refusal gate (the same payload observesERRNO_NOSYS = 52from the closed-fail branch when the manifest withholds the grant). A 2026-05-13 authority-free compatibility slice addsmake run-wasi-stdio-fd, whose direct-import payload provesclock_res_get(MONOTONIC),sched_yield,fd_fdstat_get(1/2), andfd_seek(1/2)no longer returnERRNO_NOSYS;make run-wasi-envproves one granted environment value reaches a WASI payload throughenviron_get/environ_sizes_get;make run-wasi-preview1-refusalsremains the storage/socket fail-closed gate forpath_open,fd_read,sock_send, andsock_recv. Wall-clock support stays deferred until capOS has a typedWallClock/RealTimeClockcap;clock_time_get(CLOCKID_REALTIME)keeps returningERRNO_NOSYSuntil that cap lands. - Rust
std, C++, Go, Python, JavaScript/TypeScript, and full POSIX shell/utilities are not implemented as supported capOS runtime paths. - Lua has a Phase 0 in-tree capability-aware Lua-subset interpreter under
demos/lua-smoke/(gated bymake run-lua-smoke); it validates the long-term capability-userdata host API design but is NOT a PUC Lua dialect-compatible runner. Dialect compatibility waits on the future C/libcapos PUC port. - The planned compatibility story is split by adapter type rather than one generic “compatibility layer”: native runtime adapters for languages such as Rust and Go, capability-native bindings over Cap’n Proto interfaces, POSIX compatibility adapters over scoped file/socket/process caps, and WASI host adapters backed by capabilities.
Design: Programming Languages, Userspace Runtime, Userspace Binaries, Go Runtime, Lua Scripting.
Validation: current native Rust validation uses
tools/check-userspace-runtime-surface.sh, custom-target userspace builds, and
the runtime QEMU smokes listed above. Native C/POSIX validation is through the
focused make run-c-* and make run-posix-* smokes named in this section,
including make run-posix-file for the File/Directory fd surface. WASI
filesystem validation uses make run-wasi-fs (Phase W.5 preopened-directory
round-trip and sandbox proof).
Capability Ring and IPC
- The shared ring ABI supports CALL, RECV, RETURN, RELEASE, CANCEL, NOP, and compact ParkSpace PARK/UNPARK transport operations.
cap_enterprocesses submissions and can block until completions arrive or a timeout expires.- Endpoints route ring-native IPC between processes.
- Direct IPC handoff lets a blocked receiver run before unrelated round-robin work after a matching CALL arrives.
- Transport errors and application exceptions are surfaced through CQEs and typed runtime client errors.
- Ordinary capability implementation errors, revoked ordinary/endpoint use,
live endpoint target errors after endpoint identification, and endpoint
RETURN application failures use serialized
CapExceptionpayloads when a caller result buffer can safely receive one. No-payload application failures reportCAP_ERR_APPLICATION_EXCEPTION_TRUNCATED; malformed transport metadata and unsafe result-buffer paths remain transport errors. - Endpoint RETURN can propagate a serialized
CapExceptionfrom a userspace endpoint server to the original cross-process caller. debug_tapbuilds export metadata-onlyringtap:records for observed SQEs and posted CQEs on the QEMU/debug UART. The format is fixed, bounded, and deliberately recordspayload_len = 0until a separate payload-capture authority lands.tools/ringtap-viewer/parsesringtap:logs into SQE/CQE summaries and can decode authorized Cap’n Proto payloads forCapException,TerminalSession.readLineparams, andProcessHandle.waitresults when future tap output includespayload_schemaandpayload_hexfields.make run-ringtap-failing-callboots the default shell smoke withdebug_tap, drives the knowntyped-callmethod-99 launcher failure, runs the viewer over the captured kernel log, and leaves offline inspection logs intarget/ringtap-failing-call-*.log.
Code: capos-config/src/ring.rs, kernel/src/cap/ring.rs,
kernel/src/cap/endpoint.rs, kernel/src/debug_tap.rs,
capos-rt/src/ring.rs, capos-rt/src/client.rs,
tools/ringtap-failing-call-smoke.sh, tools/ringtap-viewer/.
Validation: cargo test-ring-loom, make run-smoke, make run-spawn,
make run-smoke CARGO_FLAGS='--features debug_tap',
cd tools/ringtap-viewer && cargo test, make run-ringtap-failing-call.
Capabilities
Implemented kernel capabilities include:
- Console for debug UART output.
- TerminalSession for the separate session UART with line input/output,
bounded
readLine, visible/hidden echo, structured cancellation, and a single move-only foreground holder. - BootPackage for read-only, chunked boot manifest reads from init.
- FrameAllocator for typed
MemoryObjectframe ownership grants. - MemoryObject for owned physical frame ranges, caller-local map/unmap/protect, and final backing release after cap/mapping teardown.
- Endpoint for IPC rendezvous.
- VirtualMemory for anonymous user page map, unmap, and protect operations.
- Timer for monotonic tick/time reads and bounded sleep completions through the capability ring.
- ThreadControl for runtime-owned FS-base get/set and current-thread
exitThreadon the current thread. - ThreadSpawner and ThreadHandle for process-local in-process thread creation, one-shot join, exit-code observation, detach-on-release, and retained-status cleanup.
- ParkSpace for process-local private park wait/wake on 32-bit userspace words, with per-thread blocking and reserved waiter CQE credits.
- ProcessSpawner and ProcessHandle for init-driven child process creation and wait semantics.
- Retired NetworkManager, TcpListener, and TcpSocket qemu-only kernel socket capabilities. Their entry points now fail closed; the active TCP/UDP socket authority shape is the Phase C userspace network-stack path.
MemoryObject holders and anonymous VirtualMemory mappings charge the same
per-process ResourceLedger::frame_grant_pages quota. Mapping a held
MemoryObject records borrowed address-space pages and reserves mapping quota
until unmap so backing frames cannot stay pinned after the cap charge is
released.
Code: kernel/src/cap/console.rs, kernel/src/cap/terminal_session.rs,
kernel/src/cap/boot_package.rs, kernel/src/cap/frame_alloc.rs,
kernel/src/cap/endpoint.rs, kernel/src/cap/virtual_memory.rs,
kernel/src/cap/timer.rs, kernel/src/cap/thread_control.rs,
kernel/src/cap/thread_handle.rs, kernel/src/cap/process_spawner.rs,
kernel/src/cap/network.rs.
Validation: make run-smoke, make run-memoryobject-shared, make run-spawn,
make run-shell, make run-terminal, make run-net, cargo test-lib.
Capability Transfer and Release
- IPC CALL and RETURN support sideband transfer descriptors.
- Copy and move transfer are implemented.
- Move transfer reserves the sender slot until destination insertion and commit.
- Transfer result caps carry interface ids to userspace.
CAP_OP_RELEASEremoves local capability-table slots. Runtime owned-handle drop queues one local release, andRuntime::flush_releases()forces queued releases when code cannot wait for the next ring-client acquisition/drop.
Code: kernel/src/cap/transfer.rs, kernel/src/cap/ring.rs,
capos-lib/src/cap_table.rs, capos-rt/src/ring.rs.
Validation: cargo test-lib, make run-smoke.
Manifest Tooling and Smokes
tools/mkmanifestturnssystem.cueinto a Cap’n Proto boot manifest.- The build uses repo-pinned Cap’n Proto and CUE tool paths through the
Makefile; direct
mkmanifestinvocation also rejects missing, unpinned, or version-mismatched CUE compilers.mkmanifest cue-to-capnpextends the same pinned-tool policy to general CUE-authored data messages: it exports CUE as JSON, validatesCAPOS_CAPNP, and delegates arbitrary specified schema-rooted struct serialization tocapnp convert json:binary. - Default scripted QEMU smoke still uses the focused shell-led
system-smoke.cuepath: anonymous session on boot,loginprompting for username before hidden password entry, generic failed-auth output on a wrong password, successful operator login, broker upgrade to the operator bundle, child terminal isolation, stale-handle release, single-capos-shellinit boot, and clean halt. The default operator-facingsystem.cuepath is init-owned and is exercised bymake run. system.cueis now the default init-owned manifest. The kernel starts only the firstinitservice, and init startscapos-shell, the remote-session CapSet gateway, and the default demo services from the manifest service graph. The shell receives terminal/creds/sessions/audit/broker caps and mints its own anonymous session.system-shell.cueis the focused anonymous-shell proof (no verifier), which exercises the shell in its anonymous bundle and asserts that the anonymous launcher rejects spawns because its allowlist is empty.system-chat.cueis the focused First Chat prototype proof. It starts a residentChatendpoint service on the kernel singletonchat_endpoint, a resident bot participant, and the shell;make run-chatdrivesrun "chat-client"with explicitStdIOplus the broker-issuedchatendpoint grant, sends one line, and checks that the bot reply is printed by the foreground client.system-adventure.cueis the focused adventure prototype proof. It keeps adventure out of shell builtins and drivesrun "adventure-client"through explicitStdIO,adventure, andchatendpoint grants. See the Aurelian Frontier (proof slice) page for the current mission, commands, and transcript coverage.system-paperclips.cueis the focused clean-room Paperclips-style terminal demo proof. As of commit532207c1(2026-04-30 20:54 UTC), it boots Paperclips server services plus a terminal client. The server owns generated content, game state, regular timer cadence, unlock checks, game-rule mutation, and proof-command gating; the terminal client receives explicitStdIOplus aPaperclipsGameendpoint and renders server-modehelpfrom the server’s structured command specs. Commite9ae4e97(2026-04-30 22:02 UTC) adds structured plain-status snapshots, so server-mode plainstatusis rendered from the server’s structuredPaperclipsStatusSnapshot. Commit32462e9f(2026-04-30 22:32 UTC) adds the structured project-list follow-up: server-provided project entries for terminal-rendered plainprojects, whileproject <id>remains a raw text request that mutates server-owned game state.make run-paperclipsfirst proves that normal server authority rejectsrun <ms>fast-forward plus rejection of a forgedproof_accelerator: @timergrant, then relaunches against the proof server endpoint with the focused manifest’s explicitproof_acceleratorcap for transcript acceleration. The accelerated proof drives one-at-a-time manual production, locked-purchase and insufficient-funds refusal output, bulk-manual rejection, high-price zero-demand sale refusal, no-wire manual production refusal, explicit sales, immediate repeat-sale cooldown refusal, repeatable marketing, autoclipper unlock, real-time automation, generated Cap’n Proto content loading, first project completion, scaled business-phase production, thedesign-searchandforecast-engineproject chain,survey-drones, the visible== autonomous phase ==transition, then representative autonomous drone/factory scaling with local-matter conversion and additional clip production,mesh-coordination,seed-probes, the visible== cosmic phase ==transition, one bounded probe interval with cosmic matter conversion, probe replication, and additional production, then asserts a compactstatus --jsonmachine-readable status line and verifiesfinal-conversionremains locked before clean process exit through the native shell. Active schema, content, rules, and smoke sources use clean-room Strategy internals, and host tests reject explicit zero-count purchases without mutating state. Host tests cover the one-real-time-hour non-completion property under a generous normal-play creativity upper bound.demos/service-common/holds the shared caller-session endpoint loop and chat actor bootstrap/polling helpers used by the chat/adventure resident services, chat bot, and adventure NPC processes. New shared endpoint loop code usesEndpointUserData; the old badge-named user-data alias remains only for compatibility while peer branches migrate. Shared event queues remain deferred until another service has queue needs matching chat history/inbox behavior.system-spawn.cueremains the focused ProcessSpawner smoke for endpoint, IPC, VirtualMemory, Timer, ThreadControl, FrameAllocator cleanup, and hostile spawn inputs.make run-spawnasserts that the kernel boot-launches only the standaloneinit, that init validates BootPackage metadata, and that the init-owned manifest executor spawns and waits for every focused child service, including thetimer-smokemonotonic now/sleep proof,timer-floodper-process Timer sleep quota proof,runtime-fs-baseruntime-owned FS-base proof,single-thread-runtimeVirtualMemory plus Timer runtime checkpoint, andthread-lifecyclein-process thread/park proof.
Code: tools/mkmanifest/, system.cue, system-chat.cue,
system-adventure.cue, system-paperclips.cue, system-spawn.cue, demos/,
capos-rt/.
Validation: cargo test-mkmanifest, make generated-code-check,
make run-smoke, make run-chat, make run-adventure,
make run-paperclips, make run-spawn.
Partially Implemented
Login Boot and Init-Owned Spawn
Default make run now uses the init-owned default manifest. The kernel
validates the kernel-owned boot boundary, boot-launches standalone init, and
leaves the service graph plus login/session/broker flow in userspace. Init
starts the foreground capos-shell service, resident demo services, and the
host-local remote-session CapSet gateway; make run forwards host-local TCP
to guest port 2327 for the remote CapSet path only. The foreground shell mints
its own anonymous UserSession on boot; login and setup
commands drive CredentialStore/SessionManager/AuthorityBroker to upgrade
the session in place. Local password login is username-aware on the ordinary
foreground shell path, while durable multi-account credential storage remains
future work.
The plaintext Telnet gateway was only a focused
make run-telnet / system-telnet.cue research demo. That target is retired
after qemu-only kernel TCP listener removal, and the gateway demo, its
manifest, and the kernel SocketTerminalSession shim are removed; use the
in-guest login smokes for current shell coverage and rebuild any
socket-backed terminal proof on the Phase C userspace network stack before
using it as validation.
The focused init-owned spawn path remains under make run-spawn. There the
kernel boot-launches init with Console, BootPackage, and ProcessSpawner.
Parent endpoint facets used for later service-sourced imports are returned by
ProcessSpawner during child spawn, not granted at boot. init
performs metadata-only manifest validation, resolves kernel and service cap
sources, spawns children through ProcessSpawner, records exports, waits for
children, and reports failures through Console output. The QEMU target now
asserts the single-init boot markers, the three-cap init bundle, BootPackage
validation, child exit records, manifest child waits, spawn-loop completion,
and clean halt.
Measurement startup now follows the same boundary. make run-measure uses a
focused system-measure.cue manifest where the kernel boots standalone init
with Console, BootPackage, and ProcessSpawner, and init spawns ring-nop with
Console, FrameAllocator, the measurement-only NullCap, and the measurement-only
ParkBench cap through ProcessSpawner grants. It also spawns
thread-lifecycle with ThreadControl, ThreadSpawner, ParkSpace, and a
measurement marker cap. The demos print compact versus generic park-shaped
failed-wait/empty-wake cycle averages plus real ParkSpace blocked/resume
cycle averages before the measure-feature kernel prints segmented dispatch
counts, total cycles, and averages for SQE processing, validation, cap lookup,
capnp decode, method body dispatch, CQE posting, and waiter wake/check. Kernel
bootstrap now loads only
initConfig.init and validates only the kernel-owned manifest boundary;
mkmanifest and init own initConfig.services graph validation for focused
BootPackage executor manifests.
SSH Shell Gateway
The SSH Shell Gateway proof targets are implemented, covering the authority prerequisites and fixture authentication path that precede an encrypted SSH transport. Bounded QEMU smokes exist for:
- Host-key fixture signing (
make run-ssh-host-key): a development-only non-productionSshHostKeycap returns public metadata, signs bounded fixture exchange hashes for QEMU proof, fails wrong-algorithm requests closed, and does not leak the private host-key seed. - Authorized-key lookup (
make run-ssh-authorized-key): a manifest-seededAuthorizedKeyStorecap accepts configuredssh-ed25519public keys mapped to seed-account principals, denies unknown, disabled, and unsupported-algorithm keys, and does not leak private key material. - Public-key session minting (
make run-ssh-public-key-session,make run-ssh-public-key-auth):SessionManager.sshPublicKeyrechecks configured key records, verifies a boundedssh-ed25519signature over fixture authentication bytes, mints apublicKeyUserSessiononly after the signature succeeds, and logs stable audit reason codes for each denial path without leaking principal or profile metadata.UserSession.auditContextfails closed after logout through the sameensure_session_liveguard asinfo(). - Unsupported feature policy (
make run-ssh-feature-policy): acapos-config::ssh_policysurface classifies password auth, exec requests, SFTP, direct-tcpip, agent/X11 forwarding, env import, and multiple session or shell channels into stable audit reason codes; all denied paths produceevent=session result=denied reason=policyaudit records. - Restricted shell launcher (
make run-restricted-shell-launcher): a manifest-declaredRestrictedShellLaunchercap launches onlycapos-shell, injects supplied terminal/session caps plus child-local stdio, rejects session/profile mismatch and kernel-sourced or dangerous pass-through grant attempts, and strips hidden process-supervision result caps. - Bounded terminal-host proof (retired):
make run-ssh-gateway-terminal-hostwired scopedTcpListenAuthoritylisten, authorized-key lookup, public-keyUserSessionminting, broker profile matching, socket-to-TerminalSessionconversion, and restricted shell launch over a host-local plain TCP connection. It sat on the qemu-only kernel socket owner and the kernelSocketTerminalSession, both of which are retired with the userspace network-stack migration; the smoke now exits with a retirement diagnostic and a future terminal host must target the userspace network stack.
Encrypted SSH packet transport, OpenSSH-compatible key exchange and channel
handling, full SSH userauth transcript validation, channel binding,
TerminalSessionFromByteStream terminal-factory wiring, a terminal host over
the userspace network stack, and a production OpenSSH harness remain open.
The landed proofs use development/fixture key material; they are not a
production SSH service and are not safe for non-loopback deployment.
Design: SSH Shell Gateway.
Code: kernel/src/cap/ssh_host_key.rs, kernel/src/cap/authorized_key_store.rs,
kernel/src/cap/restricted_launcher.rs, capos-config/src/ (ssh_policy),
demos/ssh-*/, tools/qemu-ssh-*-smoke.sh.
Validation: make run-ssh-host-key, make run-ssh-authorized-key,
make run-ssh-public-key-session, make run-ssh-public-key-auth,
make run-ssh-feature-policy.
Hardware and Networking
The hardware bring-up path has bounded ACPI RSDP/RSDT/XSDT, MADT, MCFG, DMAR,
and IVRS diagnostics plus reusable PCI config-space access through legacy I/O
ports and Q35 PCIe ECAM, and the x86 path programs masked MADT-backed I/O APIC
routes for legacy IRQs while honoring source overrides. IOMMU reporting is
policy-only: malformed DMAR/IVRS structures fail closed, DMAR DRHD include-all
or single-hop PCI endpoint device-scope metadata can mark retained DMA-capable
PCI functions as IOMMU-attached/covered; bridge and multi-hop scopes remain
diagnostic-only until PCI topology traversal exists, and include-all fallback
fails closed when retained DMAR coverage metadata is capped. Direct DMA remains
blocked with zero trusted domains, and every retained DMA-capable prototype
function requires bounce buffering. The current staged domain-policy proof
also reports that future claimed DMA-capable devices use a
device-manager-owned per-device domain or trusted sharing group, exported
device addresses are IOVA-only, host
physical addresses are not user-visible, remapping tables are not programmed,
and production userspace hardware authority is still blocked. That
blocked-direct-DMA admission decision now runs through the host-tested
capos-lib::device_authority helper used for device-authority validation, so
the PCI proof line and diagnostics mirrors share the same fail-closed labels
for absent, malformed, unsupported, or retained-capped remapping metadata.
Active device-manager DMAPool policy records also carry a software
remapping-domain ledger staging record. The QEMU lifecycle/imported-live
proofs bind it to the active record and matching handle with
diagnostics-only static ACPI/PCI coverage, remapping_domain_owner=device-manager,
remapping_domain_ready=false, remapping_tables=not-programmed,
iova_export=disabled-future-only, direct_dma=blocked, and
host_physical_user_visible=0; no remapping tables, direct-DMA trusted
domains, host physical addresses, or IOVAs are exposed.
Bounded
manifest grants now exist for DMAPool, DeviceMmio, and Interrupt:
DeviceMmio exposes bounded .info, read-only userspace .map /
.unmap over boot-preseeded BAR pages, brokered read-only .read32 backed by
the same boot-preseeded 64-page kernel mapping cache, and bounded brokered
claimed-register .write32; Interrupt exposes bounded .info plus
admission-only .wait, .acknowledge, .mask, and .unmask, and
DMAPool reports conservative .info status and can mint eight fixed
manager-attached bounce-buffer DMABuffer result caps via
request-shaped allocateBuffer. Valid one-page bounce-buffer requests report
requested bytes, allocated bytes, page count, and request labels; zero-size
and over-bounce-buffer requests fail closed as
dmapool-allocation-request-invalid before result-cap or page mutation.
The same DMAPool.info result now exposes the attached manager record’s
owner/pool labels, live buffer/page/byte counts, in-flight submissions, and
committed/resident/unswappable/scrub-before-release flags. The bounded
manifest smoke proves zero-live accounting before allocation, one, two, three,
and eight live 4096-byte bounce pages while result DMABuffers are active, a
ninth-allocation full-pool rejection, restoration to the existing four-buffer
descriptor working set after freeing slots 4 through 7, three-live accounting
after freeing one descriptor-test slot, slot-0 reuse while the other three
descriptor-test slots remain live, and zero-live again after all typed
DMABuffer releases complete.
That DMABuffer now supports typed .freeBuffer, single-page userspace
bounce-buffer .map and .unmap, and
bounded manager-accounted .submitDescriptor / .completeDescriptor. The
.map path validates the live bounce-buffer epoch, accepts readable full-page
requests, maps the manager-owned bounce page into the caller’s userspace
address space, returns userspace-bounce-buffer-mapped,
single-page-bounce-buffer, user-vma-mapped, and a nonzero userspace
address, rejects zero-size, partial-page/out-of-range, and
executable/unknown protections with typed range/protection labels, and still reports
real_dma_mapping=not-programmed, direct DMA blocked, and no host physical
address exposure. The .unmap path validates the live buffer record first,
removes only that cap-owned borrowed VMA for the caller process, reports a
typed no-op when no mapping is present, and leaves page lifetime plus pool and
descriptor accounting unchanged. The descriptor paths carry
request labels plus the bounded proof counts (queue_count=4,
descriptor_count=8, buffer_bytes=4096): valid submits return
manager-inflight-recorded and raise the attached DMAPool.info
live_inflight count to 1, valid completions return
manager-inflight-completed and restore it to 0, and valid completions with
no outstanding submission return dmabuffer-no-inflight-submission with
side-effect-blocked. The manager record also tracks the single live
descriptor identity: duplicate submits for that live queue/descriptor return
dmabuffer-descriptor-already-inflight, and valid-shaped completions for a
different descriptor return dmabuffer-inflight-descriptor-mismatch; both
paths leave live_inflight=1 until the matching completion arrives.
Out-of-range queues/descriptors, zero submit lengths, submit lengths beyond
the bounce buffer, and completion lengths beyond the bounce buffer still fail
closed as dmabuffer-descriptor-request-invalid without mutating the counter.
The default manager-accounting descriptor path also preflights result
serialization before mutating accounting, and cap-table release drains bounded
in-flight accounting before detaching so a removed userspace cap cannot strand
the bounce buffer. The selected provider-TX exception for
make run-ddf-provider-consumer is narrower and runtime-visible: queue 1
submits may publish the selected eight-entry TX queue depth, descriptors 0..7,
into the existing kernel-owned virtio-net TX ring after the same DMABuffer
authority, bounce-scrub, and live notify_mmio policy gates; that selected path
then rings exactly one notify doorbell per accepted provider descriptor and
lets DMABuffer.completeDescriptor consume the stored software descriptor
generation from the real TX used ring before clearing each manager in-flight
record. Live tx_interrupt.wait calls over that selected route can observe
the ordered bounded completion events, and provider tx_interrupt release
proves bounded teardown by draining seven incomplete descriptor handoffs or
retiring seven delivered-but-unacked completion events with no pending provider
waiters. Wrong queue, stale
DMABuffer, stale notify policy, inflight publication, duplicate completion,
and stale tx_interrupt issue paths still fail closed before their guarded
side effects.
DMABuffer.freeBuffer, cap release, driver-crash, reset-disable, and drop
cleanup still revoke any remaining user mapping before scrub/free. None of these methods
program direct DMA, publish arbitrary CQ entries, transfer full virtio-net
ownership, or expose host physical addresses.
Allocations beyond the eight fixed bounce-buffer slots, DMA map/submit/complete
side effects outside the selected provider-TX proof, writable userspace BAR
mappings, arbitrary MMIO writes and doorbells, unbrokered register access,
blocking IRQ wait beyond the bounded selected-route completion waiter, real
hardware acknowledgement, hardware IRQ ownership, hardware mask/unmask,
hardware MSI/MSI-X programming, and general IRQ delivery remain blocked;
parent-first
DMAPool release
defers until all live DMABuffer slots release, or successful DMABuffer
driver-crash/reset-disable cleanup frees the remaining bounce pages and
completes the staged zero-live pool detach. make run-hardware-grant-cycle
proves
sequential DeviceMmio/Interrupt skeleton grants can release and reacquire
fresh DeviceMmio mapping generations while the Interrupt grant retains its
source generation and refreshes only the route generation, and its read-only
HardwareAuditLog.snapshot check decodes those two-cycle audit records through
the current volatile unsigned audit surface. make run-hardware-audit-interrupt-waiter also decodes recent boot-time
DmaBuffer, DmaPool, and Interrupt driver-crash / reset-disable
lifecycle records through that typed volatile snapshot path, and its cursor
snapshot requests from the first older retained DeviceMmio lifecycle
sequence to decode rows outside the default latest 16-record tail. make run-hardware-audit also proves below-oldest cursor clamping and past-end empty
cursor metadata on the overflowed volatile ring, and the QEMU-only local-ring
proof now checks those same cursor edges without mutating live audit records.
Unsafe retained
metadata fails closed, and prototype devices remain kernel-owned
bounce-buffer-only. The device-manager DMAPool
attachment path now stores
that explicit bounce-buffer policy on the attached pool record and the QEMU
lifecycle/imported-live proofs read it through the active manager record and
matching DmaPoolHandle; no new cross-manager lock was added, so the existing
PCI_DEVICE_MANAGER before DEVICE_INTERRUPT_ROUTES order remains unchanged.
PCI memory-BAR subregions are validated and mapped through a shared kernel
helper before in-kernel drivers use device MMIO, and PCI capability walking
reports non-programming MSI/MSI-X metadata for the QEMU virtio-net function.
make run-pci-nvme now applies the same metadata-only PCI path to a QEMU NVMe
controller: class/subclass/programming-interface, memory BAR, capability, and
MSI-X metadata are visible, while userspace device authority, DMAPool,
DeviceMmio, Interrupt, controller init, admin queues, I/O queues, MMIO
doorbells, and direct DMA remain not started or blocked.
make run-diagnostics now boots a
feature-gated COM1 early-boot diagnostics prompt before capability,
scheduler, timer, manifest, or userspace startup, with bounded commands for
status, CPU, memory, ACPI, PCI, IRQ, timers, devices, logs, reboot placeholder,
and halt. The ACPI and PCI diagnostics commands now also print bounded
MADT/MCFG/DMAR/IVRS record details, PCI function/config-header summaries, BAR
summaries, capability counts, MSI/MSI-X summaries, and bounded PCI
DMA-attachment policy counters/details when present; devices reports PCI
totals plus network/storage/display/bridge class counts and mirrors the
current DMA-domain policy without owner identity: direct DMA is blocked,
trusted-domain and ready-domain counts are zero, remapping tables are not
programmed, future exported device addresses are IOVA-only, userspace device
authority is not started, and prototype devices remain kernel-owned
bounce-buffer-only. The current runtime-state diagnostics slice also attaches
QEMU virtio-net plus the second virtio-rng proof device before the prompt and
reports virtio_net=ready,
bounded RX/TX virtqueue ring state, MSI-X route/vector/counter state, live
buffer state, the kernel-owned DMA owner/pool ledger, and device interrupt
route aggregates plus per-route delivery counters. Future driver extensions,
production teardown/lifecycle diagnostics, IOMMU remapping table programming,
production DMAPool, and userspace driver authority remain planned. The QEMU
virtio-net path has a
make run-net boot target, modern virtio PCI transport discovery for the
common, notify, ISR, and device-specific MMIO regions, feature negotiation, and
RX/TX split-virtqueue initialization, a TX descriptor completion proof, minimal
Ethernet ARP resolution, and ICMP echo validation against the QEMU user-mode
gateway. It no longer wraps the virtio driver in a kernel smoltcp interface or
performs a kernel TCP HTTP GET; the remaining run-net evidence is a
lower-layer QEMU fixture, and TCP/UDP socket proof lives under the Phase C
userspace network-stack gates.
QEMU currently exposes a transitional 1af4:1000 virtio-net function with
modern vendor capabilities; capOS accepts that shape only through the modern
capability layout and now selects a usable MSI-X capability for config/RX/TX
table entries, records kernel-owned MSI-X sources for config/RX/TX in the
device interrupt dispatch table, programs those entries through the typed PCI
MSI-X table helper using a bounded first-fit LAPIC device MSI vector pool,
lets the in-kernel virtio-net owner claim and unmask only its routes, assigns
the virtio common/config and queue MSI-X vector fields, and keeps descriptor,
ARP, and ICMP fixture evidence in make run-net after the kernel L4 owner is
retired. Device-autonomous virtio-net MSI-X delivery is covered by the dedicated
userspace-provider gates. A masked lifecycle probe on the unused virtio-net
MSI-X table entry proves claimed-route
reassignment, stale old-route rejection, old-vector unregistered delivery,
reassigned-vector masked delivery, unsupported-vector delivery, and release
before the live routes are registered. The QEMU virtio-rng second-device
metadata path also exercises the device-manager ownership model, the claimed
MSI-X route handoff, and a bounded teardown-trigger contract for cap release,
process exit, driver crash, reset/disable escalation, interrupt waiter, future
DeviceMmio, and future DMAPool trigger labels through the same
claim/transfer/revoke/release transaction. It also proves a bounded
manager-owned DeviceMmio record lifecycle: active-owner attach, stale and
owner-mismatch rejection, duplicate attach rejection, active RingDoorbell
validation through capos-lib::device_authority, binding to the first decoded
PCI memory BAR region from the tested PciDevice,
region_source=pci-decoded-memory-bar, region_bound_to_manager=true,
bar_present=true, bar_memory=true, bar_base, bar_length,
fail-closed wrong-BDF, wrong-BAR, and zero-length region metadata as
devicemmio-region-invalid, no invalid mapping created, negative side-effect
blocking,
stale-after-revoke rejection as devicemmio-stale-handle with
stale-owner-generation and side-effect-blocked, the
RevokingHandles -> MmioRevoked transition blocking while attached, bounded
detach, and no userspace handle, real BAR mapping, or doorbell write.
For the bounded userspace DeviceMmio.map path, the shared pure
capos-lib::device_authority validator now accepts only page-aligned in-BAR
read-only requests and denies writable, executable, unknown-protection,
zero-size, unaligned, offset-size overflow, and out-of-BAR requests before the
kernel maps anything. Accepted map requests install a caller-owned borrowed
read-only userspace VMA over boot-preseeded BAR pages only, and
DeviceMmio.unmap removes that borrowed VMA after checking the active manager
record and caller address space. Duplicate map, second unmap, stale release,
drop, driver-crash, and reset-disable paths all preserve fail-closed VMA
revocation and no-side-effect labels.
For the brokered userspace DeviceMmio.read32 path, the same authority layer
validates active handle/state/policy and in-BAR dword-aligned offsets before a
single kernel-side volatile read from the boot-preseeded cache. The cap call
path never installs new kernel mappings, returns typed
mmio-read32-range-invalid denials for unaligned, overflowing, and out-of-BAR
offsets, reports register_read=performed only for accepted reads, and fails
closed after cap release.
For the brokered userspace DeviceMmio.write32 path, the kernel runs the same
manager-attached identity, policy, and range checks before one volatile dword
write through the boot-preseeded kernel MMIO mapping cache, and it accepts
only the single PCI MSI-X metadata-derived provider-scoped masked
vector-control claim. The focused proof uses that idempotent dword on the virtio-rng
BAR, confirms it through both read32 and the read-only userspace VMA, and
proves an unclaimed message-address dword write leaves the original value
unchanged. Unclaimed, unaligned, overflowing, and out-of-BAR calls report typed
blocked labels before any write; stale or released handles fail closed before a
write and do not return a write32 result payload.
The same path proves a bounded zero-live DMAPool record lifecycle:
active-owner attach, stale and owner-mismatch rejection, duplicate attach
rejection, generation invalidation on revocation, DmaMappingsRemoved blocking
while attached, teardown detach, and terminal release. It now also issues and
records a bounded manager-attached DMA buffer handle under that attached pool,
validates active SubmitDescriptor through the pure DMA-buffer validator, and
records stale-after-revoke (stale-owner-generation), freed-buffer (freed),
and reused-slot (stale-slot-generation) rejection with
side-effect-blocked; the attached buffer record now also blocks zero-live
pool teardown as dmapool-buffer-attached, rejects a stale same-slot
proof-scoped FreeBuffer as dmabuffer-stale-handle with
stale-slot-generation and side-effect-blocked while preserving the
manager-owned buffer record, then validates an active FreeBuffer and
manager-owned buffer-record detach as ok, after which the existing
DMAPool detach succeeds. This still exposes no userspace handle and attempts
no real DMA. The pending IRQ token path now also delegates the
source/route-generation, masked, unregistered, invalid-owner, and malformed
identity decision to a host-tested capos-lib::device_authority validator
after snapshotting the live dispatch slot, while preserving the existing
stale-pending-irq-* QEMU labels. This remains validator/adapter evidence,
not production userspace interrupt waiter authority. The virtio-net smoke now also
derives an imported live-accounting
DMAPool record from the authoritative kernel-owned DMA ledger, records live
buffer/page count, live bytes, in-flight submissions,
committed/resident/unswappable flags, and scrub-before-release policy, and
proves both teardown detach and DmaMappingsRemoved fail closed while that
ledger is live. The live proof now consumes the device_dma teardown-evidence
API, records the expected authoritative-ledger-live block with matching
imported live accounting, and defers completion because this slice has no
authoritative zero-live/scrubbed evidence for the live virtio-net ledger and
does not attempt real DMA teardown, scrub, DmaMappingsRemoved, terminal
Dead, or release for the live virtio-net record. A separate scratch-ledger
proof reaches authoritative-ledger-zero-live only after both quiesce and
scrub markers are set, without touching live virtio-net DMA. Another scratch
proof covers stale DMA page handles by generation-tagging same-phys reuse and
rejecting stale, wrong-queue, wrong-label, and duplicate-free attempts without
mutating the active ledger. Userspace DMAPool/DeviceMmio/Interrupt
authority, real lifecycle hook plumbing, real page quiesce/scrub/release
cleanup, and broader driver interrupt dispatch remain planned.
The kernel negotiates VIRTIO_F_VERSION_1 plus MAC when safe and
VIRTIO_NET_F_MRG_RXBUF for QEMU’s merged-buffer virtio-net header, maps the
virtio MMIO regions after kernel paging is live, allocates
kernel-owned DMA pages for RX/TX descriptor, available, used rings, RX packet
buffers, and one-shot TX buffers, submits a descriptor proof frame, sends an
ARP request from 10.0.2.15 to 10.0.2.2, and observes the ARP reply in
make run-net. Those current DMA pages now pass through a bounded
kernel-owned device_dma pool ledger that proves live pool bytes, page counts,
page-rounded MMIO mapping bytes, config/RX/TX interrupt holds, RX/TX ring
depths, and RX/TX descriptor submission/completion accounting while no
userspace DMA/MMIO/interrupt handles are exposed. The net smoke also proves the
current kernel-owned budget/OOM policy with a scratch ledger: page and byte
allocation over budget, overlarge queue depth, duplicate and over-budget MMIO
holds, MMIO byte over budget, duplicate and over-budget interrupt holds, and
descriptor submission beyond queue depth all fail closed while the live
virtio-net ledger still validates normally, and the live device-manager record
proof is derived from that same ledger without zeroing the copied record as a
stand-in for real cleanup. The device-manager DMAPool record now also
carries that budget profile, and the lifecycle/imported-live proofs read it
through the active manager record plus matching DmaPoolHandle before
checking zero-live accounting or live aggregate in-flight accounting against
derived total budgets. The same scratch-proof pattern now covers zero-live
teardown evidence and stale DMA page handles. Production userspace DMAPool,
DeviceMmio, and Interrupt handles, production userspace DMA-buffer handles,
real page cleanup/reuse, real DeviceMmio mapping objects, cache
attributes/write policy enforcement, hostile stale-MMIO/DMA smokes, S.11.2
hostile smokes, and real doorbell writes remain unavailable; the current
malformed-region and manager-attached buffer proofs are only bounded
fail-closed metadata evidence in the manager proof path. The same smoke
sends
an IPv4 ICMP echo request to 10.0.2.2,
validates the echo reply identifier, sequence, payload, IPv4/ICMP checksums,
and addresses, and prints an icmp echo ok proof line. The former kernel
smoltcp TCP HTTP smoke, scheduler-polled smoltcp runtime, Phase B
NetworkManager/TcpListener/TcpSocket qemu-only cap objects, socket-backed
Telnet terminal handoff, and POSIX DNS UdpSocket smoke are retired. The
kernel no longer depends on smoltcp; qemu-only kernel TCP/UDP socket entry
points fail closed; and the corresponding Make targets exit before QEMU with
retirement diagnostics. Phase C (Networking
Part 3) moves TCP/IP behavior into a userspace network stack process and keeps
the kernel production surface focused on DMAPool/DeviceMmio/Interrupt
device capabilities.
The local serve-from-userspace proof now boots a non-qemu cloudboot manifest
where a userspace smoltcp service grants an application client a
TcpListenAuthority and serves TcpListener/TcpSocket caps for one hostfwd
TCP round trip. A later local DHCP/IPv4 proof now lands the first
lease/default-route/ARP configuration evidence on that userspace stack. Local
bounded ICMPv4 Echo Reply diagnostics are also proved through a local cloudboot
manifest, but remain diagnostic-only and outside the Web UI readiness ladder.
For the selected GCE Self-Hosted Web UI milestone, the evidence order is local
served TcpListenAuthority, local DHCP/IPv4, local Web UI L4, private GCE
reachability, then the separately authorized public ingress/TLS proof. The
legacy kernel socket owner no longer accepts non-qemu production manifest
grants; qemu-only fixtures keep their explicit kernel socket sources until the
broader Phase C exit cleanup removes that path.
Code: kernel/src/acpi.rs, kernel/src/diagnostics.rs,
kernel/src/pci.rs, kernel/src/device_interrupt.rs,
kernel/src/device_manager/, kernel/src/device_dma.rs,
kernel/src/virtio.rs, kernel/src/cap/network.rs,
kernel/src/cap/ring.rs, kernel/src/sched.rs, kernel/src/mem/paging.rs,
kernel/src/arch/x86_64/pci_config.rs, Makefile,
tools/qemu-diagnostics-smoke.sh, tools/qemu-iommu-acpi-smoke.sh,
tools/qemu-net-smoke.sh, tools/qemu-net-harness.sh.
Validation: make run-diagnostics, make run-iommu-acpi, make run-net,
make qemu-net-harness.
Security and Verification Track
The repo has Miri, proptest, fuzz, Loom, Kani, generated-code, dependency
policy, trusted-build-input, panic-surface, and DMA-isolation work. CI now runs
a bounded Kani gate for capos-lib bitmap, cap-table stale-handle, transfer
preflight, transfer rollback split between source-visible rollback and
destination-ledger restoration, and frame-grant accounting invariants. The
heavier prepare-copy to provisional-destination seam proof passed in the
high-memory make kani-lib-full Cloud Build gate, but coverage is not complete
for every trust boundary.
References: Trusted Build Inputs, Panic Surface Inventory, DMA Isolation, and Security and Verification Proposal.
Future Work
Future architecture includes service restart policy, capability-scoped system monitoring, notification objects, promise pipelining, service-facing SharedBuffer APIs on top of the MemoryObject substrate, scheduling-context donation, session quotas, SMP, storage and naming, userspace networking, cloud boot support, user identity, policy enforcement, multi-front-end terminal hosts, richer native command surfaces, and broader language/runtime support.
Design references: