# Proposal: Volume Encryption

Encrypting system and user volumes in a capability OS where storage is already a stack of typed capabilities and keys can be first-class capability objects.
## Problem

capOS currently has no persistent storage, no crypto, no TPM driver, and no block-device drivers. That is the right moment to decide what encryption-at-rest looks like, before storage interfaces and service graphs harden around plaintext assumptions.
Traditional OSes bolt encryption on as a kernel subsystem
(dm-crypt/LUKS, BitLocker, FileVault, fscrypt). That choice follows
from those kernels’ architecture: the kernel owns block I/O, the
filesystem, the keyring, and the trust domain between processes, so
encryption logically lives there too. capOS has made the opposite bet —
the kernel is a capability router, block I/O lives in userspace
services, filesystems are userspace services, and there is no ambient
keyring because there is no ambient anything.
Putting crypto in the kernel would contradict Design Principle 5 (“the kernel is becoming a capnp-rpc router”) and Principle 7 (“pragmatic reuse” — let userspace crates do what they already do well). Putting it nowhere leaves the system unable to protect data at rest. The proposal below places encryption in userspace services expressed as capabilities, with no new kernel mechanism.
## Threat Model

Four attackers worth distinguishing up front, because the defenses differ:
- Offline disk theft. Attacker has the storage medium, no live system, no running key service, possibly no hardware attestation. Ciphertext must reveal nothing about plaintext beyond length and block boundaries.
- Ciphertext tampering at rest. Attacker can write to the medium and hopes to flip ciphertext bits to produce attacker-chosen plaintext changes (classic XTS malleability). Modification must be detected, not merely scrambled.
- Peer userspace service holding the raw `BlockDevice` cap. The virtio-blk driver, a backup agent, a telemetry exporter, or any service that is on the physical I/O path. They hold authority to read sectors but must not see plaintext for volumes whose key they do not hold.
- Compromised session with a live key cap. Once an attacker is inside a user’s session and holds the user’s `SymmetricKey` cap, that user’s data is lost. The goal is lateral containment: no cross-user leverage, no escalation to the system volume, no access to other sessions’ keys.
Out of scope for a first pass:
- Cold-boot RAM attacks and side channels (mitigation: use TPM-bound keys when available, but physical memory reads against a running host are not defended).
- Evil-maid attacks on the unencrypted portion of the boot image (addressed separately by secure boot / measured boot — see storage-and-naming-proposal.md Open Question #5).
- Traffic analysis against encrypted backups or encrypted replication.
- Key escrow for legal recovery. capOS takes no position; a deployment can add an escrow `KeySource` without changing the model.
## Keys Are Capabilities

Key material never crosses cap boundaries. Callers hold `SymmetricKey` or `PrivateKey` capabilities whose methods run inside the service that holds the key; the holder gets encrypt/decrypt/sign authority, not the bytes. Attenuation (decrypt-only, AAD-pinned, purpose-bound) is built from wrapper CapObjects, the same mechanism that builds read-only `File`s.
This proposal does not define those interfaces. They belong to cryptography-and-key-management-proposal.md, which covers `SymmetricKey`, `PrivateKey`/`PublicKey`, `KeySource`, `KeyVault`, algorithm and purpose enums, seal policies, and the set of concrete key sources (manifest-embedded, passphrase, passkey PRF, TPM 2.0, cloud KMS, attestation, network, software-stored). Volume encryption is one consumer among many.
## Layer Placement

Two layers exist, and a first-class design uses both.
### Layer A — EncryptedBlockDevice (LUKS analog)

A userspace service holds two caps — `BlockDevice` (raw) and `SymmetricKey` — and exports a new `BlockDevice` cap that looks identical to its input but encrypts writes and decrypts reads transparently. Everything above the wrapper (filesystems, the Store service, content-addressed backends) is oblivious.

```
Raw block device
  → virtio-blk / NVMe driver      → BlockDevice cap (ciphertext)
  → EncryptedBlockDevice service    holds [BlockDevice + SymmetricKey]
  → BlockDevice cap (plaintext-view)
  → FAT / ext4 / Store service
  → File / Directory / Namespace caps
  → App
```
Properties:
- One key per volume (or per-range, see “Key hierarchy” below).
- Granularity is a sector/block. Metadata in the filesystem layer is encrypted along with data — the shape of the directory tree is invisible to threat #3.
- Incompatible with zero-copy device DMA into user pages (see “SharedBuffer” below).
Layer A defends against threats #1, #2, and #3.
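To make the wrapper's position in the cap graph concrete, here is a minimal runnable sketch. Everything in it is illustrative: the class and method names are not the capnp interfaces, and the sector cipher is a toy XOR keystream (so the example is stdlib-only) standing in for the authenticated AES-256-GCM-SIV construction chosen later in this proposal — it demonstrates the oblivious-wrapper shape and threat #3 containment, not tamper detection.

```python
import hashlib

SECTOR = 512  # stand-in for VolumeFormat.sectorSize

def _keystream(key: bytes, lba: int, n: int) -> bytes:
    """Toy per-sector keystream from (key, LBA). Illustration only — a real
    EncryptedBlockDevice uses an AEAD and provides no-keystream-reuse proofs."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(
            key + lba.to_bytes(8, "big") + counter.to_bytes(4, "big")
        ).digest()
        counter += 1
    return out[:n]

class RamBlockDevice:
    """Stand-in for the raw BlockDevice cap (the ciphertext view)."""
    def __init__(self, sectors: int):
        self.data = [bytes(SECTOR) for _ in range(sectors)]
    def read(self, lba: int) -> bytes:
        return self.data[lba]
    def write(self, lba: int, buf: bytes) -> None:
        self.data[lba] = buf

class EncryptedBlockDevice:
    """Exports the same read/write shape as its input; callers are oblivious."""
    def __init__(self, raw: RamBlockDevice, key: bytes):
        self.raw, self.key = raw, key
    def read(self, lba: int) -> bytes:
        ct = self.raw.read(lba)
        return bytes(a ^ b for a, b in zip(ct, _keystream(self.key, lba, SECTOR)))
    def write(self, lba: int, buf: bytes) -> None:
        ks = _keystream(self.key, lba, SECTOR)
        self.raw.write(lba, bytes(a ^ b for a, b in zip(buf, ks)))

raw = RamBlockDevice(8)
plain = EncryptedBlockDevice(raw, b"k" * 32)
plain.write(3, b"A" * SECTOR)
assert plain.read(3) == b"A" * SECTOR   # wrapper round-trips
assert raw.read(3) != b"A" * SECTOR     # peer holding the raw cap sees ciphertext
```

The last assertion is the Layer A property in miniature: a service that holds only the raw `BlockDevice` cap reads sectors but never plaintext.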
### Layer B — per-user Namespace / Directory encryption (fscrypt analog)

Layered above a filesystem or Store, Layer B encrypts object contents and, optionally, object names, using a per-user key. The underlying block device may or may not also be encrypted.

```
BlockDevice (ciphertext or plaintext)
  → Store service                  → Store/Namespace caps (ciphertext objects)
  → EncryptedNamespace service       holds [Namespace + UserKey]
  → Namespace cap (plaintext-view)
  → User's session services
```
Properties:
- One key per user (or per session, per device, per tenant).
- Metadata at the filesystem/Store layer is visible to threat #3 unless Layer A is also in place.
- Cap boundaries are naturally per-user — revocation is “drop the cap,” no filesystem rekeying.
- Compatible with shared filesystems across users (per-entry encryption).
Layer B defends primarily against #4-lateral (a compromise of user Bob’s session does not reveal user Alice’s data) and against a compromised shared filesystem service when the underlying block layer is unencrypted.
### Recommendation
Use both. Layer A for the system volume and for the per-tenant block substrate in multi-tenant deployments; Layer B for per-user data on top of a shared filesystem or store. Users who run single-tenant desktops can skip B. Cloud VMs that rely on provider-side encryption of block storage (see “Cloud integration”) can skip A and keep B. The proposal does not mandate either layer; it standardizes the interface so both compose.
## Volume-Specific Schemas

`SymmetricKey`, `KeySource`, `KeyAlgorithm`, `KeyPurpose`, and `SealPolicy` are defined in cryptography-and-key-management-proposal.md. This proposal adds only the wrapper-factory and on-disk-format schemas.
### EncryptedBlockDevice

Exposes nothing new — it implements the existing `BlockDevice` interface. The distinction is where it sits in the cap graph. A factory cap creates it:

```capnp
interface EncryptedBlockDeviceFactory {
  open @0 (raw :BlockDevice, key :SymmetricKey, format :VolumeFormat)
      -> (plain :BlockDevice);
  format @1 (raw :BlockDevice, key :SymmetricKey, params :FormatParams)
      -> (plain :BlockDevice);
}

struct VolumeFormat {
  superblock    @0 :Data;               # read from raw device during open()
  algorithm     @1 :SymmetricAlgorithm; # defined in key-management proposal
  sectorSize    @2 :UInt32;
  tagAreaLayout @3 :TagAreaLayout;
}
```
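`TagAreaLayout` is named above but not yet specified. A hypothetical sketch of its shape, covering the two layouts discussed under "Open Questions" (every field name here is illustrative, not normative):

```capnp
struct TagAreaLayout {
  tagBits @0 :UInt16;          # 128 for GCM-SIV / Poly1305 tags
  union {
    sidecar     @1 :Void;      # dm-integrity-style journal, separate range
    groupFooter @2 :UInt32;    # sectors per block group sharing one tag footer
  }
}
```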
## Cryptographic Construction

Two separate questions — block layer and object layer — with different answers.

### Block layer (Layer A)

Requirement: authenticate every block. XTS alone is not enough; it defends against #1 but not #2.
Shortlist:

- AES-256-GCM-SIV with LBA-derived nonce + separate tag area. The nonce is `HMAC(K_nonce, LBA)` (deterministic, no extra storage). The tag (128 bits) is stored in a reserved tag area, either a sidecar journal (dm-integrity style) or a reserved footer per block group. Cost: ~3% storage overhead for the tag, one extra read/write to the tag area per I/O (usually absorbed by sector grouping). Defends against #1 and #2.
- XChaCha20-Poly1305 with random nonce + tag. Same tag-storage problem as GCM-SIV; XChaCha’s 192-bit nonce removes nonce-reuse concerns entirely. Slower than AES on hardware that has AES-NI, faster on hardware that doesn’t (e.g. low-end ARM).
- AES-256-XTS alone. The LUKS1/LUKS2 default. Reject this as the sole defense; it fails #2. May still be useful as a building block under an external MAC (dm-integrity + dm-crypt in Linux).
- Wide-block constructions (HCTR2, Adiantum). Length-preserving, no MAC. Better diffusion than XTS but still fail #2. Useful only when storage overhead for tags is unacceptable and tamper-detection is being provided elsewhere.
Recommendation: AES-256-GCM-SIV with LBA-derived nonce and a dedicated tag area, fallback to XChaCha20-Poly1305 on hardware without AES-NI. Document the tag-area layout in `VolumeFormat`; don’t invent a scheme per deployment.
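The nonce derivation and the tag-overhead arithmetic from the shortlist can be sketched directly. Assumptions: HMAC-SHA-256 as the PRF and truncation to GCM-SIV's 96-bit nonce size (the shortlist names the HMAC construction but not the truncation).

```python
import hmac, hashlib

def lba_nonce(k_nonce: bytes, lba: int) -> bytes:
    """Deterministic per-sector nonce: HMAC(K_nonce, LBA), truncated to
    GCM-SIV's 96-bit nonce. Re-derivable on read, so nothing is stored."""
    mac = hmac.new(k_nonce, lba.to_bytes(8, "big"), hashlib.sha256).digest()
    return mac[:12]

k = b"\x01" * 32
assert lba_nonce(k, 7) == lba_nonce(k, 7)   # deterministic
assert lba_nonce(k, 7) != lba_nonce(k, 8)   # distinct per LBA

# Tag overhead: one 128-bit (16-byte) tag per sector.
assert 16 / 512 * 100 == 3.125              # ~3% at 512-byte sectors
assert 16 / 4096 * 100 == 0.390625          # well under 1% at 4 KiB sectors
```

Because the nonce is a function of the LBA alone, rewriting a sector reuses its nonce; GCM-SIV's misuse resistance is what makes that acceptable, which is part of why the shortlist prefers it over plain GCM.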
### Object layer (Layer B)

Requirement: per-object authentication; compatibility with content-addressed storage where possible.

Options, with the honest tradeoffs:
- Per-tenant keys, `hash(ciphertext)` as address. Each user’s Store encrypts with their key. Dedup works within a volume, not across. Metadata (object size, access patterns) is visible to a peer holding the backing `BlockDevice`. This is the recommended default.
- Per-tenant keys, `HMAC(K, plaintext)` as address. An address derived deterministically from plaintext allows a user to look up their own objects by plaintext hash without scanning. Same cross-tenant properties as above.
- Convergent encryption (key = `hash(plaintext)`). Global dedup across users, but leaks equality: “user X holds the same file as user Y.” Rejected as a default; too much leakage for a capability-based OS that treats ambient authority as a bug.
All three use an AEAD (GCM-SIV or XChaCha20-Poly1305) per object with a random nonce stored with the object.
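The two recommended addressing schemes are easy to compare in code. A stdlib sketch, with SHA-256 standing in for whatever content hash the Store settles on:

```python
import hashlib, hmac, secrets

def addr_by_ciphertext(ciphertext: bytes) -> str:
    # Recommended default: address = hash(ciphertext). Dedup works only
    # within one tenant, because different keys yield different ciphertexts.
    return hashlib.sha256(ciphertext).hexdigest()

def addr_by_plaintext_mac(key: bytes, plaintext: bytes) -> str:
    # Alternative: HMAC(K, plaintext). Lets a tenant locate their own object
    # from its plaintext without scanning, still unlinkable across tenants.
    return hmac.new(key, plaintext, hashlib.sha256).hexdigest()

doc = b"quarterly report"
k_alice, k_bob = secrets.token_bytes(32), secrets.token_bytes(32)

# Same plaintext stored twice by Alice -> same address (within-tenant dedup):
assert addr_by_plaintext_mac(k_alice, doc) == addr_by_plaintext_mac(k_alice, doc)
# Alice and Bob storing the same plaintext -> different addresses
# (no cross-tenant equality leak, unlike convergent encryption):
assert addr_by_plaintext_mac(k_alice, doc) != addr_by_plaintext_mac(k_bob, doc)
# Distinct ciphertexts -> distinct content addresses:
assert addr_by_ciphertext(b"ct-1") != addr_by_ciphertext(b"ct-2")
```

The last property is exactly the leakage convergent encryption would reintroduce, which is why it is rejected as a default above.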
## System Volume Flow

- Boot firmware loads Limine, which loads the kernel + init + boot services from an unencrypted boot partition.
- Kernel spawns init. Init spawns a minimal service graph: block device driver, console service, `KeySource` service (one of passphrase / TPM / cloud KMS / manifest-embedded), and the `EncryptedBlockDeviceFactory` service.
- Init obtains the unlock context. For interactive boot: read a passphrase via the console login flow in boot-to-shell-proposal.md. For unattended boot: invoke TPM unseal, KMS decrypt, or an attestation protocol. Contexts that require networking (cloud KMS, Tang) come up after the network stack.
- Init hands `(BlockDevice, SymmetricKey)` to `EncryptedBlockDeviceFactory.open` and receives a plaintext-view `BlockDevice`.
- Init hands that `BlockDevice` to the filesystem or Store service, which becomes the system storage root.
- Init pivots to the services graph baked into the now-readable system volume. Services that do not need direct I/O never see a raw `BlockDevice` and therefore never see ciphertext.

Analogous to Linux’s initramfs pattern, but with capabilities instead of /dev paths.
## User Volume Flow

- User authenticates through the login flow in boot-to-shell-proposal.md. Success yields a session and a `CredentialStore` response.
- `SessionManager` invokes the user’s `KeySource` — passkey PRF, password-derived, or cloud-held — yielding a user `SymmetricKey`.
- `SessionManager` hands `(UserNamespace, UserKey)` to `EncryptedNamespaceFactory.open` and receives a plaintext-view `Namespace`.
- The plaintext `Namespace` is installed in the session’s CapSet. Services in the session see only the user’s decrypted view.
- On logout, the session is torn down; the user `SymmetricKey` cap is released; the key service’s in-process material is zeroized. `EncryptedNamespace` stops decrypting. Ciphertext remains intact on disk.

Revocation is a cap-drop, not a filesystem rekey.
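The cap-drop semantics can be shown in a few lines. This is a toy stand-in (the class name and methods are illustrative, not the capnp interface): the key service keeps the material, the holder keeps only use-authority, and revocation kills the authority without touching ciphertext.

```python
class SymmetricKeyCap:
    """Toy model: the key service holds the bytes; the cap exposes only use."""
    def __init__(self, material: bytes):
        self._material = material
    def use(self) -> bytes:
        if self._material is None:
            raise PermissionError("cap revoked")
        return self._material
    def revoke(self) -> None:
        # Key service zeroizes its in-process material; holders keep a dead cap.
        self._material = None

cap = SymmetricKeyCap(b"\x02" * 32)
assert cap.use() == b"\x02" * 32
cap.revoke()
try:
    cap.use()
    raise AssertionError("revoked cap still usable")
except PermissionError:
    pass  # EncryptedNamespace stops decrypting; disk ciphertext is untouched
```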
## SharedBuffer and DMA

SharedBuffer (docs/roadmap.md Stage 6 / MemoryObject) exists so devices can DMA directly into app pages. Software block encryption is inherently incompatible with that: the device writes ciphertext; the app expects plaintext.
Three honest answers:

- Extra copy. Driver DMAs into a scratch page held by the `EncryptedBlockDevice` service, which decrypts into the app’s `SharedBuffer`. One extra copy per I/O. Simple; correct; first implementation. Cost is dominated by the crypto itself, not the copy, for typical I/O sizes.
- Decrypt in place. Device DMAs ciphertext into the app’s `SharedBuffer`; the service decrypts it in-place before completion is posted. Saves a copy, keeps CPU crypto on the hot path, and complicates reuse of the buffer (the app sees ciphertext briefly, then plaintext). Viable once the buffer lifetime is well-specified.
- Hardware inline crypto. NVMe OPAL, SED drives, Intel CSE, AES-XTS block engines on some ARM SoCs. Device sees the key; DMA paths see plaintext; software sees an unencrypted-looking device. Different trust model — the device is now in the TCB — and a different key-provisioning story (IEEE 1667 / TCG Opal PSID). Note for future work; not a first-implementation target.

First implementation: #1. Revisit #2 when I/O performance matters. Treat #3 as a separate capability shape (`SelfEncryptingBlockDevice`) rather than a flag on the main interface.
## Boot Order and the Unencrypted Boot Partition
By construction there must be an unencrypted partition containing at least: Limine, kernel, init, the block device driver, the key-source service(s), the encrypted block device factory, and — if the key source requires it — a minimal networking stack.
This partition is the trust root for the whole system. It does not need to be encrypted, because its contents are either integrity-protected by a measured-boot chain or considered public anyway (the capOS binaries are open source). It does need to be integrity-protected, which is secure boot / measured boot — addressed in storage-and-naming-proposal.md Open Question #5 and not duplicated here.
Relationship to that question: a TPM-sealed `KeySource` requires measured boot to be useful. Without measurement, a tampered boot partition can unseal the key under attacker-controlled code. A passphrase `KeySource` does not require measured boot, only the expectation that the user will notice if the boot UI looks wrong. A cloud KMS `KeySource` relies on cloud-provider instance identity, which is a parallel trust story (see below).
## Cloud Integration
Cloud environments change every part of this picture: the block device is virtual, the key store is a network service, instance identity is provider-signed, object storage exists as a first-class primitive, and backups are a product, not a script. capOS should treat each of these as a capability and reuse them.
### Cloud block storage (EBS, GCP Persistent Disk, Azure Disk)
These volumes are already encrypted at rest by the provider. The question is whose key performs the encryption:
| Model | Provider sees plaintext? | Customer controls key? | Customer does crypto? |
|---|---|---|---|
| Provider-managed (default) | Yes (plaintext in volume) | No | No |
| Customer-managed (CMEK) | Yes (plaintext in volume) | Yes (via KMS) | No |
| Customer-supplied (CSEK) | Briefly, during request | Yes | No |
| Client-side (Layer A) | No | Yes | Yes |
capOS’s `BlockDevice` cap is indifferent to which of the first three the provider is doing. For the fourth — client-side encryption — capOS wraps the provider’s `BlockDevice` cap in its own `EncryptedBlockDevice`. The provider sees only ciphertext and cannot read the volume even with a compelled-disclosure order.
Deployment guidance:

- Untrusted provider / compliance-driven: Layer A over cloud block storage. Provider-side encryption becomes a belt-and-braces redundancy.
- Trusted provider / operational simplicity: rely on CMEK, skip Layer A. The capability model still contains peer services — a compromised capOS service does not get raw block I/O unless it holds the cap.
- Confidential-computing VMs (SEV-SNP / TDX / Nitro): use Layer A with an attestation-gated `KeySource`. The attestation report proves the VM is genuine and running approved code; KMS releases the DEK only against a valid report.
### Cloud KMS (AWS KMS, GCP KMS, Azure Key Vault, Vault, …)

Envelope encryption is the universal pattern: the cloud KMS holds a key-encrypting key (KEK) with tight IAM-bound access; the actual data-encrypting key (DEK) is generated by capOS, wrapped by the KEK, stored alongside the ciphertext, and unwrapped by KMS at unlock time.
Map to capabilities:

- A `CloudKmsKeySource` service implements `KeySource`. `unlock(blob)` sends the wrapped DEK to KMS for `Decrypt`, receives the plaintext DEK, constructs a local `SymmetricKey` cap around it, and returns it.
- The service authenticates to KMS using the VM’s instance identity, obtained from a `CloudMetadata`-derived `InstanceIdentity` cap (see cloud-metadata-proposal.md). No long-lived credentials are baked into the image.
- `seal(key, KmsPolicy{kmsKeyId, grant})` calls KMS `Encrypt` to wrap the key under the named KEK and returns the opaque blob.
- KMS audit logs record every unwrap. This is a free observability win capOS inherits by delegation; nothing in the OS needs to log key usage separately.
Benefits of envelope encryption that capOS gets by following the pattern:

- Free KEK rotation. Rotating the KEK requires only re-wrapping the DEK (fast, metadata-only). The DEK itself stays; the volume is not rewritten. A `rewrap` method on `KeySource` makes this explicit.
- Revocation. Disable the KMS key or revoke the IAM grant; the next `unlock` fails. Running instances with a cached DEK continue until reboot — matches Linux behavior.
- Cross-region / cross-account access. KMS grants move ciphertext-readable capability between accounts without handing over the key material. capOS reads that as “the receiving account holds a `KeySource` cap whose policy the grant satisfies.”
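The envelope pattern, including rewrap-only KEK rotation, can be shown end-to-end. Loud caveat: `ToyKms` is a structural stand-in whose wrap is a homemade hash-keystream plus HMAC so the sketch runs with only the standard library; a real KMS wraps with AES-GCM inside its HSM boundary, and only the shape of the flow is being illustrated.

```python
import secrets, hashlib, hmac

class ToyKms:
    """Structural stand-in for a cloud KMS. KEKs never leave this object."""
    def __init__(self):
        self._keks = {}                                  # kmsKeyId -> KEK
    def create_key(self, key_id: str) -> None:
        self._keks[key_id] = secrets.token_bytes(32)
    def encrypt(self, key_id: str, dek: bytes) -> bytes:   # wrap a 32-byte DEK
        kek = self._keks[key_id]
        nonce = secrets.token_bytes(16)
        ct = bytes(a ^ b for a, b in zip(dek, hashlib.sha256(kek + nonce).digest()))
        tag = hmac.new(kek, nonce + ct, hashlib.sha256).digest()
        return nonce + ct + tag
    def decrypt(self, key_id: str, blob: bytes) -> bytes:  # unwrap
        kek = self._keks[key_id]
        nonce, ct, tag = blob[:16], blob[16:48], blob[48:]
        assert hmac.compare_digest(
            tag, hmac.new(kek, nonce + ct, hashlib.sha256).digest())
        return bytes(a ^ b for a, b in zip(ct, hashlib.sha256(kek + nonce).digest()))

kms = ToyKms()
kms.create_key("kek-a")
dek = secrets.token_bytes(32)          # generated by capOS; encrypts the volume
blob = kms.encrypt("kek-a", dek)       # wrapped DEK, stored next to the ciphertext

assert kms.decrypt("kek-a", blob) == dek   # unlock-time unwrap

# KEK rotation = rewrap only: the DEK (and therefore the volume) never changes.
kms.create_key("kek-b")
blob2 = kms.encrypt("kek-b", kms.decrypt("kek-a", blob))
assert kms.decrypt("kek-b", blob2) == dek
```

The final assertion is the "free KEK rotation" bullet above: only the small wrapped blob is rewritten, never a block of the volume.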
Non-AWS KMS providers (Vault, HSM clusters, KMIP devices) fit the same interface. The `CloudKmsKeySource` service name is a placeholder; production likely wants one service per provider, or one generic service with a provider-selection parameter.
### Instance identity and attestation
Cloud VMs authenticate to KMS without baked-in credentials because the hypervisor signs identity tokens. AWS IMDSv2, GCP metadata identity tokens, and Azure IMDS all produce short-lived signed JWTs. Confidential-computing platforms extend this with hardware attestation reports (SEV-SNP, TDX, Nitro).
An `InstanceIdentity` capability — carved out of cloud-metadata-proposal.md — exposes these token and attestation paths. Key-source services consume that cap instead of pulling from an ambient metadata endpoint. Revoking a service’s access to the metadata service becomes a cap-graph edit: no firewall rules, no iptables on 169.254.169.254.
### OIDC-gated volume unlock (workload identity federation)

`InstanceIdentity` is the raw material. Modern clouds consume it through OIDC token exchange (RFC 8693) rather than a provider-specific identity API. That pattern is defined in oidc-and-oauth2-proposal.md as `WorkloadIdentityFederation`; volume encryption consumes it through `OidcFederatedKeySource` (see cryptography-and-key-management-proposal.md).
System-volume flow:

- Boot the key-less image. `init` starts the block driver, the metadata service, and the OAuth service, but never holds raw cloud credentials.
- `CloudMetadata` returns an `InstanceIdentity` cap (a signed JWT from the hypervisor).
- `WorkloadIdentityFederation.exchange` posts that JWT to the cloud STS with `grant_type = urn:ietf:params:oauth:grant-type:token-exchange` and `subject_token_type = urn:ietf:params:oauth:token-type:jwt`. It receives a short-lived cloud access token bound to the instance’s identity.
- `OidcFederatedKeySource` uses that access token to authenticate a `Decrypt` call on the wrapped DEK at the cloud KMS. The plaintext DEK returns as a `SymmetricKey` cap.
- `EncryptedBlockDeviceFactory.open` composes that key with the raw `BlockDevice` and returns a plaintext-view `BlockDevice`.
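The exchange step above reduces to one RFC 8693 form POST. A sketch of the request body `WorkloadIdentityFederation.exchange` would send; the endpoint, audience value, and the JWT placeholder are illustrative assumptions, while the two `urn:` values are the ones named in the flow.

```python
# Placeholder for the InstanceIdentity cap's signed JWT (illustrative only).
instance_jwt = "<hypervisor-signed-jwt>"

# RFC 8693 token-exchange form body, POSTed to the cloud STS token endpoint.
token_exchange_request = {
    "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
    "subject_token": instance_jwt,
    "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
    "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
    "audience": "https://kms.example.cloud",  # illustrative target service
}

# The STS response carries the short-lived access token the KMS Decrypt
# call then presents; no long-lived credential ever appears in this flow.
assert "client_secret" not in token_exchange_request
```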
Per-user volume flow (Layer B):

- Alice authenticates through console or web shell OIDC; the IdP issues an ID token and an access token.
- `SessionManager` mints her `UserSession`; her `AccessToken` cap is handed to `OidcFederatedKeySource` wrapped inside the broker-returned session bundle — never as a bearer string.
- The key service enforces `SealPolicy.tokenExchange { issuer, audience, subjectPattern, requiredClaims, minAuthStrength }`. It verifies the access token (or an ID token it exchanges for) against its pinned IdP trust record and only then releases Alice’s DEK.
- `EncryptedNamespaceFactory.open` yields Alice’s plaintext namespace. Logout drops the cap; the in-process key material zeroizes.
Properties this adds on top of plain `CloudKmsKeySource`:

- No long-lived IAM credentials anywhere in the image. The historical instance-role access-key pair is gone; what remains is a short-lived access token tied to the live workload.
- Audit keyed on principal. Cloud KMS logs the OIDC `sub` of every Decrypt, so “Alice’s laptop unlocked her volume at 09:14” is observable without extra audit glue.
- Step-up authentication on the unlock path. `TokenExchangePolicy.minAuthStrength` maps to X.1254 LoA. A volume requiring `loa3` cannot be unlocked by a passwords-only session.
- Revocation through IdP or KMS. Disable Alice at the IdP or revoke the IAM grant and the next unlock fails. Cached DEKs in running instances survive until reboot — identical to today’s cloud KMS semantics but explicit.
### Token TTL vs. cached DEK

OIDC access tokens typically expire in minutes; DEKs typically live for as long as a volume is mounted. `OidcFederatedKeySource.unlock` is called once per mount; the DEK cap is held by the encrypted block/namespace service until the mount ends. Token expiry after unlock does not re-lock the volume. This matches every other KMS-unwrap pattern (`CloudKmsKeySource`, `Tpm2KeySource`), but it is worth saying aloud: short-lived tokens give short-lived authorization freshness, not short-lived key availability. Deployments that want stricter revocation can:
- require periodic re-unlock (re-mount) via broker policy,
- keep the volume mounted read-only by default and require a fresh token for each write window,
- or use a confidential-computing + attestation-gated KEK that the hardware refuses to re-release on policy change.
### No baked credentials policy

The capOS ISO must contain neither a long-lived cloud IAM credential nor a long-lived bearer token. `ManifestEmbeddedKeySource` remains dev/CI only. Production builds pass through one of: `Tpm2KeySource`, `AttestationKeySource`, `CloudKmsKeySource` (instance-identity flow), or `OidcFederatedKeySource` (workload-federation flow). The manifest validator should refuse a production-profile image that embeds a symmetric volume key or a long-lived cloud credential.
### Object storage (S3, GCS, Azure Blob)

Object storage is a natural backend for the capability-native Store. The Store service holds an `S3Bucket` cap, serializes capnp messages as S3 objects keyed by their content hash, and exports `Store` / `Namespace` caps to clients.
Encryption trust tiers mirror block storage:
| Model | Provider sees plaintext? | Customer key? | Customer does crypto? |
|---|---|---|---|
| SSE-S3 | Yes | No | No |
| SSE-KMS | Yes | Yes (KMS) | No |
| SSE-C | Briefly | Yes | No |
| Client-side (Layer B in Store) | No | Yes | Yes |
Client-side is the interesting case for capOS. The content-addressed Store can encrypt each blob with a per-tenant DEK before upload, keying objects by `hash(ciphertext)` or `HMAC(K, plaintext)`. The DEK is wrapped by cloud KMS; the bucket can be world-readable without leaking plaintext. This is a deployment where “the provider stores our data” and “the provider cannot read our data” coexist.
Nonce management across objects becomes the main design question. Either:

- a random 192-bit nonce per object (XChaCha), stored as an object header; or
- a nonce derived from object identity (`HMAC(K_n, object_id)`), which requires that the same plaintext object is never uploaded twice under the same key — consistent with content-addressing semantics.
### Backups

Backups are where encryption choices pay off or hurt:
- Block-level snapshot / cross-region replication. The provider handles it. A snapshot of a Layer-A-encrypted EBS volume is ciphertext; restoring requires the KMS key. Cross-region replication requires the key to be grant-accessible in the target region. Free; handled by the provider.
- Application-level backup service. A backup service holds a `Store` or `Directory` cap, reads objects, writes them to an object-storage bucket, and records the backup manifest. If Layer B is in place, the backup bytes are already encrypted — no re-encryption needed, and the backup destination does not need the user’s key. If only Layer A is in place, the backup service sees plaintext because Layer A wraps below the Directory; the backup service must re-encrypt for the destination.
- Restore to a different account / region / capOS install. The key must be reachable in the target environment. For KMS-wrapped DEKs: cross-account grants, multi-region KMS keys, or replicated key material. For TPM-sealed DEKs: explicit re-seal to the target TPM before restore. capOS does not need to implement this directly; it needs the `KeySource` abstraction to not hide the provider-specific primitives that enable it.
A backup `KeyPolicy` worth documenting: “this key is usable in regions A, B, and C, wrapped under KMS keys k_a, k_b, k_c, all granting access to the instance identity role backup-reader.” This is routine on AWS and routinely surprising to people who expect Linux dm-crypt semantics.
### Keys never in the image

The capOS ISO must never contain production keys. The `ManifestEmbeddedKeySource` (key-management proposal) exists for development and CI only; the manifest validator should refuse to boot from an image that embeds a non-development key on a production-profile manifest. The production flow is always: boot from a key-less image, obtain identity from the cloud, fetch the wrapping policy from the cloud, unwrap a DEK via KMS, mount the volume. Same property as AWS’s “EBS with KMS requires no bootstrap secrets on the instance.”
### Confidential computing

SEV-SNP, TDX, and AWS Nitro Enclaves produce attestation reports that include measurements of the VM image. A KMS policy can require a matching attestation before releasing the wrapping key. In capOS:

- `AttestationService` exposes `attestation(nonce) -> report` (the report includes the image measurement, firmware version, and VM metadata signed by the hardware root of trust).
- A `KeySource` of kind `attestation` collects the report and submits it as part of the KMS `Decrypt` request; KMS enforces the policy server-side.
- The trust story becomes: “this capOS image, unmodified, running on genuine SEV-SNP / TDX / Nitro hardware, is the only thing that can unlock this volume.” That is materially stronger than instance-identity alone.
This composes cleanly with Layer A: the confidential VM reads ciphertext from a cloud disk, unwraps the DEK via attestation-gated KMS, and decrypts locally. The cloud provider never sees plaintext and a stolen snapshot cannot be decrypted outside the attested VM.
## Phases

No implementation exists. Phases here cover only the volume-specific work; the underlying key abstractions, key sources, and KMS integration are phased in cryptography-and-key-management-proposal.md. Volume encryption tracks, but does not duplicate, that sequence.
### Phase V1 — EncryptedBlockDevice over RAM block device

- Add `EncryptedBlockDeviceFactory`, `VolumeFormat`, `TagAreaLayout`, and `FormatParams` to schema/capos.capnp.
- Wire the service between a RAM-backed `BlockDevice` and the Store or a toy FAT reader. Key source is `ManifestEmbeddedKeySource` from the key-management proposal’s Phase 1.
- Implement AES-256-GCM-SIV with a reserved tag area; document the on-disk format (superblock, tag area layout, block size).
- Measurement: demonstrate a Store survives a ciphertext read of the raw RAM disk and fails decrypt after a flipped bit.
### Phase V2 — EncryptedNamespace and user-volume path

- Add the `EncryptedNamespaceFactory` schema.
- Layer B over a RAM-backed Store. Depends on `PassphraseKeySource` (key-management Phase 4) and `PasskeyPrfKeySource` once passkey infrastructure lands.
- Revocation tests: dropping a session’s key cap renders the namespace unreadable without rebooting.
### Phase V3 — Persistent storage integration
- Promote Phase V1 from RAM disk to virtio-blk.
- System volume unlock in the normal boot path. Default dev build uses a manifest-embedded key; production build requires passphrase/TPM/KMS.
- QEMU smoke: system volume encrypted with a passphrase, reboot survives, wrong passphrase fails closed.
### Phase V4 — TPM-backed system volume

- Depends on `Tpm2KeySource` from key-management Phase 5.
- Measured-boot chain: firmware, bootloader, kernel, init, key service. PCR composition for a sealed system volume documented.
### Phase V5 — Cloud deployment

- Depends on `CloudKmsKeySource` from key-management Phase 6.
- Client-side encrypted block volume over cloud block storage.
- Optional: client-side encrypted Store backend over object storage.
### Phase V5b — OIDC-federated unlock

- Depends on `OidcFederatedKeySource` from key-management Phase 6b and on `WorkloadIdentityFederation` from oidc-and-oauth2-proposal.md Phase 5.
- System volume unlocks through token-exchange against the cloud STS; no long-lived IAM credentials in the image.
- Per-user `EncryptedNamespace` unlocks from a user `AccessToken` under `SealPolicy.tokenExchange`.
- QEMU smoke against a local fake STS (e.g. dex) proves the flow end-to-end before targeting a real cloud.
### Phase V6 — Confidential computing

- Depends on `AttestationKeySource` from key-management Phase 7.
- Attestation-gated system volume unlock on SEV-SNP / TDX / Nitro.
- QEMU SEV-SNP smoke (where toolchain supports it).
## Relationship to Other Proposals

- cryptography-and-key-management-proposal.md — primary dependency. Defines `SymmetricKey`, `KeySource`, `KeyVault`, `KeyAlgorithm`, `KeyPurpose`, `SealPolicy`, and every concrete key source this proposal names. This proposal adds only the volume wrapper factories and on-disk format.
- storage-and-naming-proposal.md — Open Question #5 (manifest trust and secure boot) is a prerequisite for a TPM-sealed `KeySource` to be meaningful. This proposal extends the storage stack with `EncryptedBlockDevice` and `EncryptedNamespace` as optional wrapper services; the `BlockDevice`, `File`, `Directory`, `Store`, and `Namespace` interfaces are unchanged.
- boot-to-shell-proposal.md — the passphrase / passkey unlock path at the console and in the web gateway feeds `KeySource` implementations. `CredentialStore`, `SessionManager`, and `AuthorityBroker` already think about missing credentials not implying an unlocked system; this proposal extends that to “missing key source implies missing system volume, not zero-fill.”
- user-identity-and-policy-proposal.md — user-volume keys are bound to session identity. The cap chain that yields “you are Alice” also yields Alice’s KEK.
- cloud-metadata-proposal.md — `CloudMetadata` and the `InstanceIdentity` cap carved out of it are what the cloud `KeySource` implementations consume to authenticate to KMS without baked-in credentials.
- oidc-and-oauth2-proposal.md — the `WorkloadIdentityFederation` and token-exchange primitives behind `OidcFederatedKeySource`. Also the source of the `AccessToken`/`IdToken` cap shape used in per-user volume unlock and the policy inputs consumed by `SealPolicy.tokenExchange`.
- cloud-deployment-proposal.md — hardware abstraction for NVMe and SED drives sets the ground for a future `SelfEncryptingBlockDevice` capability (hardware inline crypto), distinct from this proposal’s software-crypto Layer A.
- security-and-verification-proposal.md — the encrypted block format is a good target for the tiered tooling plan: fuzz corrupted ciphertext at the block boundary, proptest round-trips through the wrapper, Loom-model the volume unlock state machine, Kani-prove LBA-nonce uniqueness invariants. General crypto-side invariants are tracked in the key-management proposal.
- system-monitoring-proposal.md — volume unlock, decrypt failure, and format-params events are audit-worthy. The `EncryptedBlockDevice` service emits them through the audit cap. Generic key events are emitted by the key-management services.
- live-upgrade-proposal.md — replacing the `EncryptedBlockDevice` service must preserve in-flight I/O and the DEK. The service holds sensitive state (the key material); live upgrade needs a state-transfer path that does not touch the disk and does not leak the key through shared memory.
## Open Questions
- Tag area layout. Sidecar journal (dm-integrity style, separate device or partition) vs. reserved footer per block group vs. derived-nonce-only-plus-separate-MAC-area. Affects write amplification, recovery, and fsync semantics. A small measurement study under QEMU would settle it.
- Key rotation at scale. Rewrap-only (KEK rotation) is cheap. Rekeying a DEK on a live volume means re-encrypting every block. Online rekey is a research problem; for capOS a controlled offline rekey service reading old-key and writing new-key is the honest first answer.
- Metadata leakage in Layer B. fscrypt-style filename encryption is fiddly (deterministic encryption to preserve directory lookups vs. randomized encryption that breaks them). Decide whether Layer B encrypts names as well as contents, and how lookups work if names are randomized.
- Backup re-encryption. A backup crossing trust boundaries needs either shared key material at both ends or an explicit re-encrypt step. Who does the re-encryption — the backup service, a dedicated re-encryption service, or a KMS-side primitive? Policy question, not a mechanism question, but worth documenting defaults.
- Hardware inline crypto as a separate capability. NVMe OPAL and SED drives do not fit the software-AEAD model. Define `SelfEncryptingBlockDevice` with its own `open`/`lock`/`unlock` methods and a separate trust story (the device is in the TCB).
- Swap / paging. No swap yet. When added, encrypted swap with a per-boot ephemeral key is standard. The memory-pressure policy, page-eligibility rules, and swap lifecycle now live in oom-and-swap-proposal.md.
- Firmware and boot-partition integrity. This proposal assumes secure boot / measured boot is available when TPM-sealed keys are in use. The actual secure-boot work is owned by storage-and-naming-proposal.md Open Question #5 and is prerequisite, not in scope here.
Algorithm enum scope, side-channel hardening, post-quantum migration, GOST support, and audit granularity are answered in cryptography-and-key-management-proposal.md’s open-questions section rather than duplicated here.