Proposal: Volume Encryption

Encrypting system and user volumes in a capability OS where storage is already a stack of typed capabilities and keys can be first-class capability objects.

Problem

capOS currently has no persistent storage, no crypto, no TPM driver, and no block-device drivers. That is the right moment to decide what encryption-at-rest looks like, before storage interfaces and service graphs harden around plaintext assumptions.

Traditional OSes bolt encryption on as a kernel subsystem (dm-crypt/LUKS, BitLocker, FileVault, fscrypt). That choice follows from those kernels’ architecture: the kernel owns block I/O, the filesystem, the keyring, and the trust domain between processes, so encryption logically lives there too. capOS has made the opposite bet — the kernel is a capability router, block I/O lives in userspace services, filesystems are userspace services, and there is no ambient keyring because there is no ambient anything.

Putting crypto in the kernel would contradict Design Principle 5 (“the kernel is becoming a capnp-rpc router”) and Principle 7 (“pragmatic reuse” — let userspace crates do what they already do well). Putting it nowhere leaves the system unable to protect data at rest. The proposal below places encryption in userspace services expressed as capabilities, with no new kernel mechanism.

Threat Model

Four attackers worth distinguishing up front, because the defenses differ:

  1. Offline disk theft. Attacker has the storage medium, no live system, no running key service, possibly no hardware attestation. Ciphertext must reveal nothing about plaintext beyond length and block boundaries.
  2. Ciphertext tampering at rest. Attacker can write to the medium and hopes to flip ciphertext bits to produce attacker-chosen plaintext changes (classic XTS malleability). Modification must be detected, not merely scrambled.
  3. Peer userspace service holding the raw BlockDevice cap. The virtio-blk driver, a backup agent, a telemetry exporter, or any service that is on the physical I/O path. They hold authority to read sectors but must not see plaintext for volumes whose key they do not hold.
  4. Compromised session with a live key cap. Once an attacker is inside a user’s session and holds the user’s SymmetricKey cap, that user’s data is lost. The goal is lateral containment: no cross-user leverage, no escalation to the system volume, no access to other sessions’ keys.

Out of scope for a first pass:

  • Cold-boot RAM attacks and side channels (mitigation: use TPM-bound keys when available, but physical memory reads against a running host are not defended).
  • Evil-maid attacks on the unencrypted portion of the boot image (addressed separately by secure boot / measured boot — see storage-and-naming-proposal.md Open Question #5).
  • Traffic analysis against encrypted backups or encrypted replication.
  • Key escrow for legal recovery. capOS takes no position; a deployment can add an escrow KeySource without changing the model.

Keys Are Capabilities

Key material never crosses cap boundaries. Callers hold SymmetricKey or PrivateKey capabilities whose methods run inside the service that holds the key; the holder gets encrypt/decrypt/sign authority, not the bytes. Attenuation (decrypt-only, AAD-pinned, purpose-bound) is expressed as wrapper CapObjects, the same mechanism that builds read-only Files.

This proposal does not define those interfaces. They belong to cryptography-and-key-management-proposal.md, which covers SymmetricKey, PrivateKey/PublicKey, KeySource, KeyVault, algorithm and purpose enums, seal policies, and the set of concrete key sources (manifest-embedded, passphrase, passkey PRF, TPM 2.0, cloud KMS, attestation, network, software-stored). Volume encryption is one consumer among many.

Layer Placement

Two layers exist, and a first-class design uses both.

Layer A — EncryptedBlockDevice (LUKS analog)

A userspace service holds two caps — BlockDevice (raw) and SymmetricKey — and exports a new BlockDevice cap that looks identical to its input but encrypts writes and decrypts reads transparently. Everything above the wrapper (filesystems, the Store service, content-addressed backends) is oblivious.

Raw block device
  → virtio-blk / NVMe driver → BlockDevice cap (ciphertext)
    → EncryptedBlockDevice service holds [BlockDevice + SymmetricKey]
      → BlockDevice cap (plaintext-view)
        → FAT / ext4 / Store service
          → File / Directory / Namespace caps
            → App

Properties:

  • One key per volume (or per-range, see “Key hierarchy” below).
  • Granularity is a sector/block. Metadata in the filesystem layer is encrypted along with data — the shape of the directory tree is invisible to threat #3.
  • Incompatible with zero-copy device DMA into user pages (see “SharedBuffer” below).

Layer A defends against threats #1, #2, and #3.
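The wrapper's transparency can be sketched in a few lines. This is a toy host-side model, not capOS code: a dict stands in for the raw BlockDevice cap, a SHA-256 counter-mode keystream plus truncated HMAC stands in for a real AEAD like AES-GCM-SIV, and the class and names are illustrative only.

```python
import hashlib, hmac, secrets

SECTOR = 512

def _keystream(key: bytes, lba: int, n: int) -> bytes:
    # Toy keystream: SHA-256 in counter mode keyed by (key, LBA).
    # Stands in for a real cipher; do not use for actual encryption.
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + lba.to_bytes(8, "big") + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

class EncryptedBlockDevice:
    """Holds [raw BlockDevice + key]; exports the same read/write surface
    but the backing store only ever contains ciphertext + per-sector tag."""
    def __init__(self, raw: dict, key: bytes):
        self.raw, self.key = raw, key

    def write(self, lba: int, plaintext: bytes):
        assert len(plaintext) == SECTOR
        ct = bytes(a ^ b for a, b in zip(plaintext, _keystream(self.key, lba, SECTOR)))
        tag = hmac.new(self.key, lba.to_bytes(8, "big") + ct, hashlib.sha256).digest()[:16]
        self.raw[lba] = (ct, tag)          # a peer holding `raw` sees only this

    def read(self, lba: int) -> bytes:
        ct, tag = self.raw[lba]
        want = hmac.new(self.key, lba.to_bytes(8, "big") + ct, hashlib.sha256).digest()[:16]
        if not hmac.compare_digest(tag, want):
            raise ValueError("tag mismatch: sector tampered or wrong key")
        return bytes(a ^ b for a, b in zip(ct, _keystream(self.key, lba, SECTOR)))

raw = {}                                    # stands in for the raw BlockDevice cap
dev = EncryptedBlockDevice(raw, secrets.token_bytes(32))
dev.write(7, b"A" * SECTOR)
assert raw[7][0] != b"A" * SECTOR           # threat #3: raw-cap holder sees ciphertext
assert dev.read(7) == b"A" * SECTOR         # wrapper-cap holder sees plaintext
```

Everything above the wrapper is handed `dev`, never `raw`, which is exactly the cap-graph placement the diagram above describes.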

Layer B — per-user Namespace / Directory encryption (fscrypt analog)

Layered above a filesystem or Store, Layer B encrypts object contents and, optionally, object names, using a per-user key. The underlying block device may or may not also be encrypted.

BlockDevice (ciphertext or plaintext)
  → Store service → Store/Namespace caps (ciphertext objects)
    → EncryptedNamespace service holds [Namespace + UserKey]
      → Namespace cap (plaintext-view)
        → User's session services

Properties:

  • One key per user (or per session, per device, per tenant).
  • Metadata at the filesystem/Store layer is visible to threat #3 unless Layer A is also in place.
  • Cap boundaries are naturally per-user — revocation is “drop the cap,” no filesystem rekeying.
  • Compatible with shared filesystems across users (per-entry encryption).

Layer B defends primarily against the lateral component of #4 (a compromise of user Bob’s session does not reveal user Alice’s data) and against a compromised shared filesystem service when the underlying block layer is unencrypted.

Recommendation

Use both. Layer A for the system volume and for the per-tenant block substrate in multi-tenant deployments; Layer B for per-user data on top of a shared filesystem or store. Users who run single-tenant desktops can skip B. Cloud VMs that rely on provider-side encryption of block storage (see “Cloud integration”) can skip A and keep B. The proposal does not mandate either layer; it standardizes the interface so both compose.

Volume-Specific Schemas

SymmetricKey, KeySource, KeyAlgorithm, KeyPurpose, and SealPolicy are defined in cryptography-and-key-management-proposal.md. This proposal adds only the wrapper-factory and on-disk-format schemas.

EncryptedBlockDevice

Exposes nothing new — it implements the existing BlockDevice interface. The distinction is where it sits in the cap graph. A factory cap creates it:

interface EncryptedBlockDeviceFactory {
    open @0 (raw :BlockDevice, key :SymmetricKey, format :VolumeFormat)
         -> (plain :BlockDevice);
    format @1 (raw :BlockDevice, key :SymmetricKey, params :FormatParams)
           -> (plain :BlockDevice);
}

struct VolumeFormat {
    superblock     @0 :Data;  # read from raw device during open()
    algorithm      @1 :SymmetricAlgorithm;  # defined in key-management proposal
    sectorSize     @2 :UInt32;
    tagAreaLayout  @3 :TagAreaLayout;
}

Cryptographic Construction

Two separate questions — block layer and object layer — with different answers.

Block layer (Layer A)

Requirement: authenticate every block. XTS alone is not enough; it defends against #1 but not #2.

Shortlist:

  • AES-256-GCM-SIV with LBA-derived nonce + separate tag area. The nonce is HMAC(K_nonce, LBA) (deterministic, no extra storage). The tag (128 bits) is stored in a reserved tag area, either a sidecar journal (dm-integrity style) or a reserved footer per block group. Cost: ~3% storage overhead for the tag, one extra read/write to the tag area per I/O (usually absorbed by sector grouping). Defends against #1 and #2.
  • XChaCha20-Poly1305 with random nonce + tag. Same tag-storage problem as GCM-SIV; XChaCha’s 192-bit nonce removes nonce-reuse concerns entirely. Slower than AES on hardware that has AES-NI, faster on hardware that doesn’t (e.g. low-end ARM).
  • AES-256-XTS alone. The LUKS1/LUKS2 default. Reject this as the sole defense; it fails #2. May still be useful as a building block under an external MAC (dm-integrity + dm-crypt in Linux).
  • Wide-block constructions (HCTR2, Adiantum). Length-preserving, no MAC. Better diffusion than XTS but still fail #2. Useful only when storage overhead for tags is unacceptable and tamper-detection is being provided elsewhere.

Recommendation: AES-256-GCM-SIV with LBA-derived nonce and a dedicated tag area, fallback to XChaCha20-Poly1305 on hardware without AES-NI. Document the tag-area layout in VolumeFormat; don’t invent a scheme per deployment.
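The two load-bearing details of that recommendation — a deterministic LBA-derived nonce and the tag-area cost — can be checked with stdlib primitives. Names here (`lba_nonce`, `K_nonce`) are illustrative; the HMAC-truncation construction is the one described above.

```python
import hashlib, hmac

def lba_nonce(k_nonce: bytes, lba: int) -> bytes:
    # Deterministic 96-bit nonce: HMAC(K_nonce, LBA), truncated.
    # Needs no per-sector nonce storage; distinct LBAs give distinct
    # nonces up to HMAC collision resistance.
    return hmac.new(k_nonce, lba.to_bytes(8, "big"), hashlib.sha256).digest()[:12]

k = b"\x01" * 32
nonces = {lba_nonce(k, i) for i in range(10_000)}
assert len(nonces) == 10_000              # all distinct in this range

# Tag-area overhead: one 128-bit tag per 512-byte sector.
overhead = 16 / 512
assert abs(overhead - 0.03125) < 1e-9     # ~3%, matching the figure above
```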

Object layer (Layer B)

Requirement: per-object authentication; compatibility with content-addressed storage where possible.

Options, with the honest tradeoffs:

  • Per-tenant keys, hash(ciphertext) as address. Each user’s Store encrypts with their key. Dedup works within a volume, not across. Metadata (object size, access patterns) is visible to a peer holding the backing BlockDevice. This is the recommended default.
  • Per-tenant keys, HMAC(K, plaintext) as address. Address derived deterministically from plaintext allows a user to look up their own objects by plaintext hash without scanning. Same cross-tenant properties as above.
  • Convergent encryption (key = hash(plaintext)). Global dedup across users, but leaks equality: “user X holds the same file as user Y.” Rejected as a default; too much leakage for a capability-based OS that treats ambient authority as a bug.

All three use an AEAD (GCM-SIV or XChaCha20-Poly1305) per object with a random nonce stored with the object.
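The three addressing schemes and their leakage profiles can be contrasted concretely. A toy XOR keystream stands in for the AEAD (addressing behavior is what matters here); all names are illustrative.

```python
import hashlib, hmac, secrets

def toy_encrypt(key: bytes, nonce: bytes, pt: bytes) -> bytes:
    # Toy keystream stand-in for a real AEAD; illustrates addressing only.
    ks, ctr = b"", 0
    while len(ks) < len(pt):
        ks += hashlib.sha256(key + nonce + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return bytes(a ^ b for a, b in zip(pt, ks[:len(pt)]))

doc = b"quarterly report"
alice_key, bob_key = secrets.token_bytes(32), secrets.token_bytes(32)

# 1. hash(ciphertext) addressing: random nonce, so the same plaintext
#    stored by two tenants yields unrelated addresses -- no equality leak,
#    dedup only within a tenant.
a_addr = hashlib.sha256(toy_encrypt(alice_key, secrets.token_bytes(24), doc)).hexdigest()
b_addr = hashlib.sha256(toy_encrypt(bob_key, secrets.token_bytes(24), doc)).hexdigest()
assert a_addr != b_addr

# 2. HMAC(K, plaintext) addressing: deterministic per tenant, so Alice can
#    re-derive her own object's address from the plaintext without a scan.
assert hmac.new(alice_key, doc, hashlib.sha256).hexdigest() == \
       hmac.new(alice_key, doc, hashlib.sha256).hexdigest()

# 3. Convergent encryption: key = hash(plaintext), so identical files
#    produce identical ciphertext across tenants -- global dedup, but
#    also an equality oracle ("X holds the same file as Y").
conv = lambda pt: hashlib.sha256(
    toy_encrypt(hashlib.sha256(pt).digest(), b"\0" * 24, pt)).hexdigest()
alice_sees, bob_sees = conv(doc), conv(doc)
assert alice_sees == bob_sees
```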

System Volume Flow

  1. Boot firmware loads Limine, which loads the kernel + init + boot services from an unencrypted boot partition.
  2. Kernel spawns init. Init spawns a minimal service graph: block device driver, console service, KeySource service (one of passphrase / TPM / cloud KMS / manifest-embedded), and the EncryptedBlockDeviceFactory service.
  3. Init obtains the unlock context. For interactive boot: read a passphrase via the console login flow in boot-to-shell-proposal.md. For unattended boot: invoke TPM unseal, KMS decrypt, or an attestation protocol. Contexts that require networking (cloud KMS, Tang) come up after the network stack.
  4. Init hands (BlockDevice, SymmetricKey) to EncryptedBlockDeviceFactory.open and receives a plaintext-view BlockDevice.
  5. Init hands that BlockDevice to the filesystem or Store service, which becomes the system storage root.
  6. Init pivots to the services graph baked in the now-readable system volume. Services that do not need direct I/O never see a raw BlockDevice and therefore never see ciphertext.

Analogous to Linux’s initramfs pattern, but with capabilities instead of /dev paths.

User Volume Flow

  1. User authenticates through the login flow in boot-to-shell-proposal.md. Success yields a session and a CredentialStore response.
  2. SessionManager invokes the user’s KeySource — passkey PRF, password-derived, or cloud-held — yielding a user SymmetricKey.
  3. SessionManager hands (UserNamespace, UserKey) to an EncryptedNamespaceFactory.open and receives a plaintext-view Namespace.
  4. The plaintext Namespace is installed in the session’s CapSet. Services in the session see only the user’s decrypted view.
  5. On logout, the session is torn down; the user SymmetricKey cap is released; the key service’s in-process material is zeroized. EncryptedNamespace stops decrypting. Ciphertext remains intact on disk.

Revocation is a cap-drop, not a filesystem rekey.
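The cap-drop semantics of step 5 can be modeled directly: the namespace wrapper holds a key capability whose authority, not whose bytes, gates every read. All class and method names here are hypothetical sketches of the cap shapes above.

```python
import secrets

class SymmetricKeyCap:
    # Toy key cap: a method surface, not key bytes; revocation kills it.
    def __init__(self):
        self._key = secrets.token_bytes(32)
    def decrypt(self, blob: bytes) -> bytes:
        if self._key is None:
            raise PermissionError("key cap revoked")
        return blob          # actual decryption elided; the authority check is the point
    def revoke(self):
        self._key = None     # zeroize / drop the in-process material

class EncryptedNamespace:
    def __init__(self, backing: dict, key_cap: SymmetricKeyCap):
        self.backing, self.key_cap = backing, key_cap
    def read(self, name: str) -> bytes:
        return self.key_cap.decrypt(self.backing[name])

ns = EncryptedNamespace({"diary": b"<ciphertext>"}, SymmetricKeyCap())
assert ns.read("diary")      # session live: reads succeed
ns.key_cap.revoke()          # logout: cap-drop -- no rekey, no reboot
try:
    ns.read("diary")
    raise AssertionError("read should have failed")
except PermissionError:
    pass                     # ciphertext intact on disk, plaintext gone
```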

SharedBuffer and DMA

SharedBuffer (docs/roadmap.md Stage 6 / MemoryObject) exists so devices can DMA directly into app pages. Software block encryption is inherently incompatible with that: the device writes ciphertext; the app expects plaintext.

Three honest answers:

  1. Extra copy. Driver DMAs into a scratch page held by the EncryptedBlockDevice service, which decrypts into the app’s SharedBuffer. One extra copy per I/O. Simple; correct; first implementation. Cost is dominated by the crypto itself, not the copy, for typical I/O sizes.
  2. Decrypt in place. Device DMAs ciphertext into the app’s SharedBuffer; the service decrypts it in-place before completion is posted. Saves a copy, keeps CPU crypto on the hot path, and complicates reuse of the buffer (the app sees ciphertext briefly, then plaintext). Viable once the buffer lifetime is well-specified.
  3. Hardware inline crypto. NVMe OPAL, SED drives, Intel CSE, AES-XTS block engines on some ARM SoCs. Device sees the key; DMA paths see plaintext; software sees an unencrypted-looking device. Different trust model — the device is now in the TCB — and different key-provisioning story (IEEE 1667 / TCG Opal PSID). Note for future work; not a first-implementation target.

First implementation: #1. Revisit #2 when I/O performance matters. Treat #3 as a separate capability shape (SelfEncryptingBlockDevice) rather than a flag on the main interface.

Boot Order and the Unencrypted Boot Partition

By construction there must be an unencrypted partition containing at least: Limine, kernel, init, the block device driver, the key-source service(s), the encrypted block device factory, and — if the key source requires it — a minimal networking stack.

This partition is the trust root for the whole system. It does not need to be encrypted, because its contents are either integrity-protected by a measured-boot chain or considered public anyway (the capOS binaries are open source). It does need to be integrity-protected, which is secure boot / measured boot — addressed in storage-and-naming-proposal.md Open Question #5 and not duplicated here.

Relationship to that question: a TPM-sealed KeySource requires measured boot to be useful. Without measurement, a tampered boot partition can unseal the key under attacker-controlled code. A passphrase KeySource does not require measured boot, only the expectation that the user will notice if the boot UI looks wrong. A cloud KMS KeySource relies on cloud-provider instance identity, which is a parallel trust story (see below).

Cloud Integration

Cloud environments change every part of this picture: the block device is virtual, the key store is a network service, instance identity is provider-signed, object storage exists as a first-class primitive, and backups are a product, not a script. capOS should treat each of these as a capability and reuse them.

Cloud block storage (EBS, GCP Persistent Disk, Azure Disk)

These volumes are already encrypted at rest by the provider. The question is whose key performs the encryption:

Model                        Provider sees plaintext?    Customer controls key?   Customer does crypto?
Provider-managed (default)   Yes (plaintext in volume)   No                       No
Customer-managed (CMEK)      Yes (plaintext in volume)   Yes (via KMS)            No
Customer-supplied (CSEK)     Briefly, during request     Yes                      No
Client-side (Layer A)        No                          Yes                      Yes

capOS’s BlockDevice cap is indifferent to which of the first three the provider is doing. For the fourth — client-side encryption — capOS wraps the provider’s BlockDevice cap in its own EncryptedBlockDevice. The provider sees only ciphertext and cannot read the volume even with a compelled-disclosure order.

Deployment guidance:

  • Untrusted provider / compliance-driven: Layer A over cloud block storage. Provider-side encryption becomes a belt-and-braces redundancy.
  • Trusted provider / operational simplicity: rely on CMEK, skip Layer A. Capability model still contains peer services — a compromised capOS service does not get raw block I/O unless it holds the cap.
  • Confidential-computing VMs (SEV-SNP / TDX / Nitro): use Layer A with an attestation-gated KeySource. The attestation report proves the VM is genuine and running approved code; KMS releases the DEK only against a valid report.

Cloud KMS (AWS KMS, GCP KMS, Azure Key Vault, Vault, …)

Envelope encryption is the universal pattern: the cloud KMS holds a key-encrypting key (KEK) with tight IAM-bound access; the actual data-encrypting key (DEK) is generated by capOS, wrapped by the KEK, stored alongside the ciphertext, and unwrapped by KMS at unlock time.
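The shape of envelope encryption, and why KEK rotation is cheap, can be shown with a toy wrap (XOR against a KEK-derived pad plus an HMAC tag, standing in for KMS Encrypt or AES-KW; not real key wrapping):

```python
import hashlib, hmac, secrets

def toy_wrap(kek: bytes, dek: bytes) -> bytes:
    # Toy key wrap; stands in for KMS Encrypt / AES-KW.
    pad = hashlib.sha256(kek + b"wrap").digest()
    body = bytes(a ^ b for a, b in zip(dek, pad))
    return body + hmac.new(kek, body, hashlib.sha256).digest()[:16]

def toy_unwrap(kek: bytes, blob: bytes) -> bytes:
    body, tag = blob[:32], blob[32:]
    if not hmac.compare_digest(tag, hmac.new(kek, body, hashlib.sha256).digest()[:16]):
        raise ValueError("wrong KEK or corrupted blob")
    pad = hashlib.sha256(kek + b"wrap").digest()
    return bytes(a ^ b for a, b in zip(body, pad))

dek = secrets.token_bytes(32)        # generated locally; encrypts the volume
kek_v1 = secrets.token_bytes(32)     # held by KMS; never leaves it
wrapped = toy_wrap(kek_v1, dek)      # opaque blob stored next to the ciphertext

assert toy_unwrap(kek_v1, wrapped) == dek      # unlock path

# KEK rotation: rewrap the 32-byte DEK; the volume is never rewritten.
kek_v2 = secrets.token_bytes(32)
wrapped = toy_wrap(kek_v2, toy_unwrap(kek_v1, wrapped))
assert toy_unwrap(kek_v2, wrapped) == dek
```

The last three lines are the `rewrap` operation the bullet list below calls "fast, metadata-only": only the 48-byte wrapped blob changes.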

Map to capabilities:

  • A CloudKmsKeySource service implements KeySource. unlock(blob) sends the wrapped DEK to KMS for Decrypt, receives the plaintext DEK, constructs a local SymmetricKey cap around it, and returns it.
  • The service authenticates to KMS using the VM’s instance identity, obtained from a CloudMetadata-derived InstanceIdentity cap (see cloud-metadata-proposal.md). No long-lived credentials are baked into the image.
  • seal(key, KmsPolicy{kmsKeyId, grant}) calls KMS Encrypt to wrap the key under the named KEK and returns the opaque blob.
  • KMS audit logs record every unwrap. This is a free observability win capOS inherits by delegation; nothing in the OS needs to log key usage separately.

Benefits of envelope encryption that capOS gets by following the pattern:

  • Free KEK rotation. Rotating the KEK requires only re-wrapping the DEK (fast, metadata-only). The DEK itself stays; the volume is not rewritten. A rewrap method on KeySource makes this explicit.
  • Revocation. Disable the KMS key or revoke the IAM grant; the next unlock fails. Running instances with a cached DEK continue until reboot — matches Linux behavior.
  • Cross-region / cross-account access. KMS grants move ciphertext-readable capability between accounts without handing over the key material. capOS reads that as “the receiving account holds a KeySource cap whose policy the grant satisfies.”

Non-AWS KMS providers (Vault, HSM clusters, KMIP devices) fit the same interface. The CloudKmsKeySource service name is a placeholder; production likely wants one service per provider, or one generic service with a provider-selection parameter.

Instance identity and attestation

Cloud VMs authenticate to KMS without baked-in credentials because the hypervisor signs identity tokens. AWS IMDSv2, GCP metadata identity tokens, and Azure IMDS all produce short-lived signed JWTs. Confidential-computing platforms extend this with hardware attestation reports (SEV-SNP, TDX, Nitro).

An InstanceIdentity capability — carved out of cloud-metadata-proposal.md — exposes these token and attestation paths. Key-source services consume that cap instead of pulling from an ambient metadata endpoint. Revoking a service’s access to the metadata service becomes a cap-graph edit: no firewall rules, no iptables on 169.254.169.254.

OIDC-gated volume unlock (workload identity federation)

InstanceIdentity is the raw material. Modern clouds consume it through OIDC token exchange (RFC 8693) rather than a provider-specific identity API. That pattern is defined in oidc-and-oauth2-proposal.md as WorkloadIdentityFederation; volume encryption consumes it through OidcFederatedKeySource (see cryptography-and-key-management-proposal.md).

System-volume flow:

  1. Boot the key-less image. init starts the block driver, the metadata service, and the OAuth service, but never holds raw cloud credentials.
  2. CloudMetadata returns an InstanceIdentity cap (a signed JWT from the hypervisor).
  3. WorkloadIdentityFederation.exchange posts that JWT to the cloud STS with grant_type = urn:ietf:params:oauth:grant-type:token-exchange and subject_token_type = urn:ietf:params:oauth:token-type:jwt. It receives a short-lived cloud access token bound to the instance’s identity.
  4. OidcFederatedKeySource uses that access token to authenticate a Decrypt call on the wrapped DEK at the cloud KMS. The plaintext DEK returns as a SymmetricKey cap.
  5. EncryptedBlockDeviceFactory.open composes that key with the raw BlockDevice and returns a plaintext-view BlockDevice.
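Step 3's exchange request is an ordinary RFC 8693 form body. A minimal sketch (the JWT value and KMS audience are hypothetical placeholders; the `grant_type` and `subject_token_type` URNs are the ones RFC 8693 defines):

```python
from urllib.parse import urlencode, parse_qs

instance_jwt = "<signed instance-identity JWT from the hypervisor>"  # hypothetical

# Form body POSTed to the cloud STS token endpoint (RFC 8693 token exchange).
body = urlencode({
    "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
    "subject_token": instance_jwt,
    "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
    "audience": "https://kms.example.invalid",                       # hypothetical
})

parsed = parse_qs(body)
assert parsed["grant_type"] == ["urn:ietf:params:oauth:grant-type:token-exchange"]
assert parsed["subject_token_type"] == ["urn:ietf:params:oauth:token-type:jwt"]
```

The STS response is a short-lived access token; nothing longer-lived ever exists in the image.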

Per-user volume flow (Layer B):

  1. Alice authenticates through console or web shell OIDC; the IdP issues an ID token and an access token.
  2. SessionManager mints her UserSession; her AccessToken cap is handed to OidcFederatedKeySource wrapped inside the broker-returned session bundle — never as a bearer string.
  3. The key service enforces SealPolicy.tokenExchange { issuer, audience, subjectPattern, requiredClaims, minAuthStrength }. It verifies the access token (or an ID token it exchanges for) against its pinned IdP trust record and only then releases Alice’s DEK.
  4. EncryptedNamespaceFactory.open yields Alice’s plaintext namespace. Logout drops the cap; the in-process key material zeroizes.

Properties this adds on top of plain CloudKmsKeySource:

  • No long-lived IAM credentials anywhere in the image. The historical instance-role access-key pair is gone; what remains is a short-lived access token tied to the live workload.
  • Audit keyed on principal. Cloud KMS logs the OIDC sub of every Decrypt, so “Alice’s laptop unlocked her volume at 09:14” is observable without extra audit glue.
  • Step-up authentication on the unlock path. TokenExchangePolicy.minAuthStrength maps to X.1254 LoA. A volume requiring loa3 cannot be unlocked by a passwords-only session.
  • Revocation through IdP or KMS. Disable Alice at the IdP or revoke the IAM grant and the next unlock fails. Cached DEKs in running instances survive until reboot — identical to today’s cloud KMS semantics but explicit.

Token TTL vs. cached DEK

OIDC access tokens typically expire in minutes; DEKs typically live for as long as a volume is mounted. OidcFederatedKeySource.unlock is called once per mount; the DEK cap is held by the encrypted block/namespace service until mount ends. Token expiry after unlock does not re-lock the volume. This matches every other KMS-unwrap pattern (CloudKmsKeySource, Tpm2KeySource), but it is worth saying aloud: short-lived tokens give short-lived authorization freshness, not short-lived key availability. Deployments that want stricter revocation can:

  • require periodic re-unlock (re-mount) via broker policy,
  • keep the volume mounted read-only by default and require a fresh token for each write window,
  • or use a confidential-computing + attestation-gated KEK that the hardware refuses to re-release on policy change.

No baked credentials policy

The capOS ISO must contain neither a long-lived cloud IAM credential nor a long-lived bearer token. ManifestEmbeddedKeySource remains dev/CI only. Production builds pass through one of: Tpm2KeySource, AttestationKeySource, CloudKmsKeySource (instance-identity flow), or OidcFederatedKeySource (workload-federation flow). The manifest validator should refuse a production-profile image that embeds a symmetric volume key or a long-lived cloud credential.

Object storage (S3, GCS, Azure Blob)

Object storage is a natural backend for the capability-native Store. The Store service holds an S3Bucket cap, serializes capnp messages as S3 objects keyed by their content hash, and exports Store / Namespace caps to clients.

Encryption trust tiers mirror block storage:

Model                            Provider sees plaintext?   Customer key?   Customer does crypto?
SSE-S3                           Yes                        No              No
SSE-KMS                          Yes                        Yes (KMS)       No
SSE-C                            Briefly                    Yes             No
Client-side (Layer B in Store)   No                         Yes             Yes

Client-side is the interesting case for capOS. The content-addressed Store can encrypt each blob with a per-tenant DEK before upload, keying objects by hash(ciphertext) or HMAC(K, plaintext). The DEK is wrapped by cloud KMS; the bucket can be world-readable without leaking plaintext. This is a deployment where “the provider stores our data” and “the provider cannot read our data” coexist.

Nonce management across objects becomes the main design question. Either:

  • random 192-bit nonce per object (XChaCha), stored as an object header; or
  • derived nonce from object identity (HMAC(K_n, object_id)), requires that the same plaintext object is never uploaded twice under the same key, which is consistent with content-addressing semantics.

Backups

Backups are where encryption choices pay off or hurt:

  • Block-level snapshot / cross-region replication. The provider handles it. A snapshot of a Layer-A-encrypted EBS volume is ciphertext; restoring requires the KMS key. Cross-region replication requires the key to be grant-accessible in the target region. Free; handled by the provider.
  • Application-level backup service. A backup service holds a Store or Directory cap, reads objects, writes them to an object-storage bucket, and records the backup manifest. If Layer B is in place, the backup bytes are already encrypted — no re-encryption needed, and the backup destination does not need the user’s key. If only Layer A is in place, the backup service sees plaintext because Layer A wraps below the Directory; the backup service must re-encrypt for the destination.
  • Restore to a different account / region / capOS install. The key must be reachable in the target environment. For KMS-wrapped DEKs: cross-account grants, multi-region KMS keys, or replicated key material. For TPM-sealed DEKs: explicit re-seal to the target TPM before restore. capOS does not need to implement this directly; it needs the KeySource abstraction to not hide the provider-specific primitives that enable it.

A backup KeyPolicy worth documenting: “this key is usable in regions A, B, and C, wrapped under KMS keys k_a, k_b, k_c, all granting access to the instance identity role backup-reader.” This is routine on AWS and routinely surprising to people who expect Linux dm-crypt semantics.
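That KeyPolicy has a simple in-memory shape: one DEK, one wrapped blob per region, one grant role. Everything below is a hypothetical sketch (region names, role string, and the elided wrap all stand in for provider-specific primitives):

```python
import secrets

dek = secrets.token_bytes(32)
policy = {
    "grant_role": "backup-reader",
    # One KMS-wrapped copy of the same DEK per region; wrapping elided.
    "wraps": {region: ("kms-key-" + region, b"<wrapped DEK blob>")
              for region in ("region-a", "region-b", "region-c")},
}

def restorable(policy: dict, region: str, role: str) -> bool:
    # Restore succeeds only where a regional wrap exists AND the
    # caller's identity satisfies the grant.
    return role == policy["grant_role"] and region in policy["wraps"]

assert restorable(policy, "region-b", "backup-reader")
assert not restorable(policy, "region-d", "backup-reader")   # no regional wrap
assert not restorable(policy, "region-b", "telemetry")       # grant mismatch
```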

Keys never in the image

The capOS ISO must never contain production keys. The ManifestEmbeddedKeySource (key-management proposal) exists for development and CI only; the manifest validator should refuse to boot from an image that embeds a non-development key on a production-profile manifest. The production flow is always: boot from a key-less image, obtain identity from the cloud, fetch the wrapping policy from the cloud, unwrap a DEK via KMS, mount the volume. Same property as AWS’s “EBS with KMS requires no bootstrap secrets on the instance.”

Confidential computing

SEV-SNP, TDX, and AWS Nitro Enclaves produce attestation reports that include measurements of the VM image. A KMS policy can require a matching attestation before releasing the wrapping key. In capOS:

  • AttestationService exposes attestation(nonce) -> report (the report includes the image measurement, firmware version, and VM metadata signed by the hardware root of trust).
  • KeySource of kind attestation collects the report and submits it as part of the KMS Decrypt request; KMS enforces the policy server-side.
  • The trust story becomes: “this capOS image, unmodified, running on genuine SEV-SNP / TDX / Nitro hardware, is the only thing that can unlock this volume.” That is materially stronger than instance-identity alone.

This composes cleanly with Layer A: the confidential VM reads ciphertext from a cloud disk, unwraps the DEK via attestation-gated KMS, and decrypts locally. The cloud provider never sees plaintext and a stolen snapshot cannot be decrypted outside the attested VM.

Phases

No implementation exists. Phases here cover only the volume-specific work; the underlying key abstractions, key sources, and KMS integration are phased in cryptography-and-key-management-proposal.md. Volume encryption tracks, but does not duplicate, that sequence.

Phase V1 — EncryptedBlockDevice over RAM block device

  • Add EncryptedBlockDeviceFactory, VolumeFormat, TagAreaLayout, and FormatParams to schema/capos.capnp.
  • Wire the service between a RAM-backed BlockDevice and the Store or a toy FAT reader. Key source is ManifestEmbeddedKeySource from the key-management proposal’s Phase 1.
  • Implement AES-256-GCM-SIV with a reserved tag area; document the on-disk format (superblock, tag area layout, block size).
  • Measurement: demonstrate a Store survives a ciphertext read of the raw RAM disk and fails decrypt after a flipped bit.
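The flipped-bit half of that measurement can be prototyped host-side before any capOS code exists. A toy encrypt-then-MAC over one sector (SHA-256 counter-mode keystream plus truncated HMAC, standing in for GCM-SIV) shows the fail-closed behavior: a single flipped ciphertext bit must fail the tag check rather than yield scrambled plaintext.

```python
import hashlib, hmac, secrets

key = secrets.token_bytes(32)
lba = 42
pt = secrets.token_bytes(512)

# Encrypt-then-MAC one 512-byte sector (16 x 32-byte keystream blocks).
ks = b"".join(hashlib.sha256(key + lba.to_bytes(8, "big") + i.to_bytes(4, "big")).digest()
              for i in range(16))
ct = bytearray(a ^ b for a, b in zip(pt, ks))
tag = hmac.new(key, lba.to_bytes(8, "big") + bytes(ct), hashlib.sha256).digest()[:16]

ct[100] ^= 0x01   # the flipped bit from the measurement above
recomputed = hmac.new(key, lba.to_bytes(8, "big") + bytes(ct), hashlib.sha256).digest()[:16]
assert not hmac.compare_digest(tag, recomputed)   # read fails closed, as required
```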

Phase V2 — EncryptedNamespace and user-volume path

  • Add EncryptedNamespaceFactory schema.
  • Layer B over a RAM-backed Store. Depends on PassphraseKeySource (key-management Phase 4) and PasskeyPrfKeySource once passkey infrastructure lands.
  • Revocation tests: dropping a session’s key cap renders the namespace unreadable without rebooting.

Phase V3 — Persistent storage integration

  • Promote Phase V1 from RAM disk to virtio-blk.
  • System volume unlock in the normal boot path. Default dev build uses a manifest-embedded key; production build requires passphrase/TPM/KMS.
  • QEMU smoke: system volume encrypted with a passphrase, reboot survives, wrong passphrase fails closed.

Phase V4 — TPM-backed system volume

  • Depends on Tpm2KeySource from key-management Phase 5.
  • Measured-boot chain: firmware, bootloader, kernel, init, key service. PCR composition for a sealed system volume documented.

Phase V5 — Cloud deployment

  • Depends on CloudKmsKeySource from key-management Phase 6.
  • Client-side encrypted block volume over cloud block storage.
  • Optional: client-side encrypted Store backend over object storage.

Phase V5b — OIDC-federated unlock

  • Depends on OidcFederatedKeySource from key-management Phase 6b and on WorkloadIdentityFederation from oidc-and-oauth2-proposal.md Phase 5.
  • System volume unlocks through token-exchange against the cloud STS; no long-lived IAM credentials in the image.
  • Per-user EncryptedNamespace unlocks from a user AccessToken under SealPolicy.tokenExchange.
  • QEMU smoke against a local fake STS (e.g. dex) proves the flow end-to-end before targeting a real cloud.

Phase V6 — Confidential computing

  • Depends on AttestationKeySource from key-management Phase 7.
  • Attestation-gated system volume unlock on SEV-SNP / TDX / Nitro.
  • QEMU SEV-SNP smoke (where toolchain supports it).

Relationship to Other Proposals

  • cryptography-and-key-management-proposal.md — primary dependency. Defines SymmetricKey, KeySource, KeyVault, KeyAlgorithm, KeyPurpose, SealPolicy, and every concrete key source this proposal names. This proposal adds only the volume wrapper factories and on-disk format.
  • storage-and-naming-proposal.md — Open Question #5 (manifest trust and secure boot) is a prerequisite for a TPM-sealed KeySource to be meaningful. This proposal extends the storage stack with EncryptedBlockDevice and EncryptedNamespace as optional wrapper services; the BlockDevice, File, Directory, Store, and Namespace interfaces are unchanged.
  • boot-to-shell-proposal.md — the passphrase / passkey unlock path at the console and in the web gateway feeds KeySource implementations. CredentialStore, SessionManager, and AuthorityBroker already think about missing credentials not implying an unlocked system; this proposal extends that to “missing key source implies missing system volume, not zero-fill.”
  • user-identity-and-policy-proposal.md — user-volume keys are bound to session identity. The cap chain that yields “you are Alice” also yields Alice’s KEK.
  • cloud-metadata-proposal.md — CloudMetadata and the InstanceIdentity cap carved out of it are what the cloud KeySource implementations consume to authenticate to KMS without baked-in credentials.
  • oidc-and-oauth2-proposal.md — the WorkloadIdentityFederation and token-exchange primitives behind OidcFederatedKeySource. Also the source of the AccessToken / IdToken cap shape used in per-user volume unlock and the policy inputs consumed by SealPolicy.tokenExchange.
  • cloud-deployment-proposal.md — hardware abstraction for NVMe and SED drives sets the ground for a future SelfEncryptingBlockDevice capability (hardware inline crypto), distinct from this proposal’s software-crypto Layer A.
  • security-and-verification-proposal.md — the encrypted block format is a good target for the tiered tooling plan: fuzz corrupted ciphertext at the block boundary, proptest round-trips through the wrapper, Loom-model the volume unlock state machine, Kani-prove LBA-nonce uniqueness invariants. General crypto-side invariants are tracked in the key-management proposal.
  • system-monitoring-proposal.md — volume unlock, decrypt failure, and format-params events are audit-worthy. The EncryptedBlockDevice service emits them through the audit cap. Generic key events are emitted by the key-management services.
  • live-upgrade-proposal.md — replacing the EncryptedBlockDevice service must preserve in-flight I/O and the DEK. The service holds sensitive state (the key material); live upgrade needs a state-transfer path that does not touch the disk and does not leak the key through shared memory.

Open Questions

  1. Tag area layout. Sidecar journal (dm-integrity style, separate device or partition) vs. reserved footer per block group vs. derived-nonce-only-plus-separate-MAC-area. Affects write amplification, recovery, and fsync semantics. A small measurement study under QEMU would settle it.
  2. Key rotation at scale. Rewrap-only (KEK rotation) is cheap. Rekeying a DEK on a live volume means re-encrypting every block. Online rekey is a research problem; for capOS a controlled offline rekey service reading old-key and writing new-key is the honest first answer.
  3. Metadata leakage in Layer B. fscrypt-style filename encryption is fiddly (deterministic encryption to preserve directory lookups vs. randomized encryption that breaks them). Decide whether Layer B encrypts names as well as contents, and how lookups work if names are randomized.
  4. Backup re-encryption. A backup crossing trust boundaries needs either shared key material at both ends or an explicit re-encrypt step. Who does the re-encryption — the backup service, a dedicated re-encryption service, or a KMS-side primitive? Policy question, not a mechanism question, but worth documenting defaults.
  5. Hardware inline crypto as a separate capability. NVMe OPAL and SED drives do not fit the software-AEAD model. Define SelfEncryptingBlockDevice with its own open/lock/unlock methods and a separate trust story (the device is in the TCB).
  6. Swap / paging. No swap yet. When added, encrypted swap with a per-boot ephemeral key is standard. The memory-pressure policy, page-eligibility rules, and swap lifecycle now live in oom-and-swap-proposal.md.
  7. Firmware and boot-partition integrity. This proposal assumes secure boot / measured boot is available when TPM-sealed keys are in use. The actual secure-boot work is owned by storage-and-naming-proposal.md Open Question #5 and is prerequisite, not in scope here.

Algorithm enum scope, side-channel hardening, post-quantum migration, GOST support, and audit granularity are answered in cryptography-and-key-management-proposal.md’s open-questions section rather than duplicated here.