Proposal: Cloud Instance Bootstrap
Picking up instance-specific configuration — SSH keys, hostname, network config, user-supplied payload — from cloud provider metadata sources, without porting the Canonical cloud-init stack.
Problem
A capOS ISO built once has to boot on any cloud VM and adapt to its environment: different instance IDs, different public IPs, different operator-supplied SSH keys, different user-data payloads. Without this, every instance needs a custom-baked ISO — and the content-addressed-boot story (“same hash boots identically on N machines”) loses its value at exactly the point where it would matter for operations.
The Linux convention is cloud-init: a Python daemon that reads
metadata from provider-specific sources and applies it by writing
files under /etc, invoking systemctl, creating users, and running
shell scripts. Porting it is a non-starter:
- Python, POSIX, systemd-dependent.
- Runs as root with ambient authority: parses untrusted user-data as shell scripts, mutates arbitrary system state.
- ~100k lines covering hundreds of rarely-used modules (chef, puppet, seed_random, phone_home).
- Assumes a package manager and init system that do not exist on capOS.
capOS needs the pattern — consume provider metadata, use it to bootstrap the instance — reshaped to the capability model.
Metadata Sources
All major clouds expose instance metadata through one or more of:
- HTTP IMDS. `169.254.169.254`. AWS IMDSv2 requires a `PUT` token-exchange handshake; GCP and Azure accept direct `GET`. Paths differ per provider. Needs a running network stack.
- ConfigDrive. An ISO9660 filesystem attached as a block device, containing `meta_data.json` (or equivalent) and an optional user-data file. OpenStack, older Azure. Needs a block driver and filesystem reader, no network.
- SMBIOS / DMI. Vendor, product, serial-number, and UUID fields populated by the hypervisor. Good for provider detection before networking comes up.
- NoCloud. Seed files baked into the image or on an attached FAT disk. Useful for development and bare-metal.
The bootstrap service should read from whichever source is present rather than hardcoding one. Provider detection via SMBIOS runs first (no dependencies), then the appropriate transport is initialized.
CloudMetadata Capability
A single capnp interface; one or more implementations:
```capnp
interface CloudMetadata {
  # Instance identity
  instanceId @0 () -> (id :Text);
  instanceType @1 () -> (type :Text);
  hostname @2 () -> (name :Text);
  region @3 () -> (region :Text);

  # Network configuration (primary interface addresses, gateway, DNS)
  networkConfig @4 () -> (config :NetworkConfig);

  # Authentication material
  sshKeys @5 () -> (keys :List(Text));

  # User-supplied payload. Opaque to the metadata provider.
  userData @6 () -> (data :Data, contentType :Text);

  # Vendor-supplied payload. Separate from userData so the
  # bootstrap policy can trust them differently.
  vendorData @7 () -> (data :Data, contentType :Text);
}

struct NetworkConfig {
  interfaces @0 :List(Interface);

  struct Interface {
    macAddress @0 :Text;
    ipv4 @1 :List(IpAddress);
    ipv6 @2 :List(IpAddress);
    gateway @3 :Text;
    dnsServers @4 :List(Text);
    mtu @5 :UInt16;
  }
}
```
Implementations:
- `HttpMetadata` — fetches from `169.254.169.254`; one variant per provider because paths and auth handshakes differ (AWS IMDSv2 token, GCP `Metadata-Flavor: Google`, Azure API version).
- `ConfigDriveMetadata` — reads an ISO9660 seed disk.
- `NoCloudMetadata` — reads a seed blob from the initial manifest.
Detection lives in a small probe service that inspects SMBIOS
(System Manufacturer: Google, Amazon EC2, Microsoft Corporation,
…) and grants the cloud-bootstrap service the appropriate
CloudMetadata implementation as part of a manifest delta.
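A minimal sketch of that detection step, in Rust; `Provider` and `detect_provider` are illustrative names, not the actual capOS probe API:

```rust
#[derive(Debug, PartialEq, Eq)]
enum Provider {
    Gcp,
    Aws,
    Azure,
    Unknown,
}

/// Map the SMBIOS System Manufacturer string to a provider.
/// Substring match, case-insensitive: firmware strings vary in
/// padding and casing across hypervisor versions.
fn detect_provider(manufacturer: &str) -> Provider {
    let m = manufacturer.trim().to_ascii_lowercase();
    if m.contains("google") {
        Provider::Gcp
    } else if m.contains("amazon") {
        Provider::Aws
    } else if m.contains("microsoft") {
        Provider::Azure
    } else {
        Provider::Unknown
    }
}

fn main() {
    assert_eq!(detect_provider("Amazon EC2"), Provider::Aws);
    // QEMU without provider emulation falls through to Unknown,
    // which would route probing to ConfigDrive/NoCloud instead.
    assert_eq!(detect_provider("QEMU"), Provider::Unknown);
}
```

An `Unknown` result is not an error here; it just means the non-network sources (ConfigDrive, NoCloud) are tried next.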
Bootstrap Service
A single service — cloud-bootstrap — runs once per boot:
```yaml
cloud-bootstrap:
  caps:
    - metadata: CloudMetadata        # from probe service
    - manifest: ManifestUpdater      # narrow authority to extend the graph
    - network: NetworkConfigurator   # apply interface addresses
    - ssh_keys: KeyStore             # target store for authorized keys
  user_data_handlers:
    - application/x-capos-manifest: ManifestDeltaHandler
    # operator-installed handlers for other content types
```
Sequence:
- Gather identity and declarative config (`instanceId`, `hostname`, `networkConfig`, `sshKeys`); apply through the narrow caps above.
- `(data, ct) = metadata.userData()` — dispatch by content type. If no handler is registered, log and skip.
- Exit.
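The dispatch step can be sketched as follows; `Dispatcher` and `Handler` are plain-Rust stand-ins for the handler caps, not the real service API:

```rust
use std::collections::HashMap;

type Handler = Box<dyn Fn(&[u8]) -> Result<(), String>>;

struct Dispatcher {
    handlers: HashMap<String, Handler>,
}

impl Dispatcher {
    fn new() -> Self {
        Dispatcher { handlers: HashMap::new() }
    }

    fn register(&mut self, content_type: &str, handler: Handler) {
        self.handlers.insert(content_type.to_string(), handler);
    }

    /// Dispatch by content type; an unregistered type is logged and
    /// skipped, never executed. Returns whether a handler ran.
    fn dispatch(&self, content_type: &str, data: &[u8]) -> bool {
        match self.handlers.get(content_type) {
            Some(handler) => {
                let _ = handler(data);
                true
            }
            None => {
                eprintln!("user-data: no handler for {content_type}; skipping");
                false
            }
        }
    }
}

fn main() {
    let mut d = Dispatcher::new();
    d.register(
        "application/x-capos-manifest",
        Box::new(|bytes| {
            // Stand-in for ManifestDeltaHandler: decode + apply.
            println!("applying {}-byte delta", bytes.len());
            Ok(())
        }),
    );
    assert!(d.dispatch("application/x-capos-manifest", b"\x00\x01"));
    // No shell fallback: an unregistered type is a logged no-op.
    assert!(!d.dispatch("text/x-shellscript", b"#!/bin/sh"));
}
```

The point of the shape is that the set of executable content types is exactly the set of handler caps the service was granted, nothing more.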
The service never holds ProcessSpawner directly. It holds
ManifestUpdater, a wrapper that accepts capnp-encoded
ManifestDelta messages and applies them through the existing init
spawn path. The decoder and apply path are shared with the build-time
pipeline (same capos-config crate, same spawn loop). The precise
shape of ManifestDelta is an open question — see “Open Questions”
below — but at minimum it covers hostname, network config, SSH keys,
and authorized application-level service additions:
```capnp
struct ManifestDelta {
  addServices @0 :List(ServiceEntry);
  addBinaries @1 :List(NamedBlob);
  setHostname @2 :Text;
  setNetworkConfig @3 :NetworkConfig;
}
```
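The merge semantics could look roughly like this; the struct shapes are simplified stand-ins for the capnp types, and `merge` is a hypothetical name for the capos-config routine:

```rust
#[derive(Clone, Debug, PartialEq)]
struct ServiceEntry {
    name: String,
}

#[derive(Default)]
struct SystemManifest {
    hostname: Option<String>,
    services: Vec<ServiceEntry>,
}

#[derive(Default)]
struct ManifestDelta {
    set_hostname: Option<String>,
    add_services: Vec<ServiceEntry>,
}

/// Apply a delta to the running manifest; return only the newly added
/// services so the caller can feed them to the existing spawn loop.
/// Additions augment the base graph; an existing entry is never
/// replaced, which preserves the content-addressed base manifest.
fn merge(base: &mut SystemManifest, delta: ManifestDelta) -> Vec<ServiceEntry> {
    if let Some(h) = delta.set_hostname {
        base.hostname = Some(h);
    }
    let mut to_spawn = Vec::new();
    for svc in delta.add_services {
        if !base.services.iter().any(|s| s.name == svc.name) {
            base.services.push(svc.clone());
            to_spawn.push(svc);
        }
    }
    to_spawn
}

fn main() {
    let mut base = SystemManifest {
        hostname: None,
        services: vec![ServiceEntry { name: "console".into() }],
    };
    let delta = ManifestDelta {
        set_hostname: Some("vm-1".into()),
        add_services: vec![
            ServiceEntry { name: "console".into() }, // duplicate: ignored
            ServiceEntry { name: "nginx".into() },
        ],
    };
    let spawned = merge(&mut base, delta);
    assert_eq!(spawned, vec![ServiceEntry { name: "nginx".into() }]);
    assert_eq!(base.services.len(), 2);
}
```

The augment-never-replace rule is what keeps the delta from being able to shadow a service already in the base ISO.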
Relationship to the Build-Time Manifest Pipeline
The existing build-time pipeline (system.cue →
tools/mkmanifest → manifest.bin → Limine boot module →
capos-config decoder → init spawn loop) and the cloud-metadata
bootstrap path are not two parallel systems. They are the same
pipeline with different transports and different trust scopes.
| Stage | Build-time (baked ISO) | Runtime (cloud metadata) |
|---|---|---|
| Authoring | system.cue in the repo | user-data.cue on the operator’s host |
| Compile | mkmanifest (CUE → capnp) | same tool, same output |
| Transport | Limine boot module | HTTP IMDS / ConfigDrive / NoCloud disk |
| Wire format | capnp-encoded SystemManifest | capnp-encoded ManifestDelta |
| Decoder | capos-config | capos-config |
| Apply | init spawn loop | same spawn loop, invoked via ManifestUpdater |
Three practical consequences:
- CUE is a host-side authoring convenience, not an on-wire format. Neither kernel nor init evaluates CUE. An operator supplying user-data writes `user-data.cue`, runs `mkmanifest user-data.cue user-data.bin` on their host, and ships the capnp bytes (base64 into the provider's user-data metadata field for GCP/AWS, or as a file on a ConfigDrive ISO).
- NoCloud is a Limine boot module by another name. A NoCloud seed blob is the same bytes as a baked-in `manifest.bin`, attached via a disk or bundled into the ISO instead of handed over by the bootloader. The only difference is who hands the bytes to the parser.
- No new schema surface. `ManifestDelta` is defined alongside `SystemManifest` in `schema/capos.capnp`, and sharing the decoder means `ManifestUpdater`'s apply path is a thin merge-and-spawn on top of code that already boots the base system.
The trust model stays clean precisely because ManifestDelta is
not SystemManifest. The base manifest is inside the
content-addressed ISO hash (fully trusted, reproducible). The
runtime delta is applied by a narrowly-permitted service whose caps
define what fields of the delta can actually take effect — the
content-addressed-boot story is preserved because cloud metadata
augments the base graph, it cannot replace it.
User-Data Model
User-data on the wire is a capnp blob, not a shell script. Content
type application/x-capos-manifest identifies the canonical case:
the payload is a ManifestDelta message produced by mkmanifest
on the operator’s host and consumed directly by the bootstrap
service.
For cross-cloud-vendor compatibility, operators can install user-data dispatcher services for other content types (YAML, other capnp schemas, signed manifests, etc.). The bootstrap service holds a handler cap per content type; unknown types are logged and ignored, not executed.
Shell-script user-data — the Linux default — has nowhere to run on
capOS because there is no shell and no ambient-authority process to
execute it under. An operator who insists on this can install a
shell service and a handler that routes text/x-shellscript to it,
but that is a deliberate choice, not a default fallback.
Trust Model
The capability angle earns its keep here.
- The metadata endpoint is assumed to be as trustworthy as the hypervisor running the VM — the same assumption Linux cloud-init makes.
- The bootstrap service holds narrow caps (`ManifestUpdater`, `NetworkConfigurator`, `KeyStore`), not ambient root. A bug or a malicious metadata response can at most spawn services the `ManifestUpdater` accepts, set network config the `NetworkConfigurator` accepts, and drop keys into the `KeyStore`. It cannot reach for arbitrary system state.
- `vendorData` and `userData` are separated on the wire. A policy that trusts the cloud provider but not the operator (e.g., apply `vendorData` as-is, route `userData` through a signature check) is expressible by granting different handler caps to each.
- User-data content-type dispatch is capability-mediated: the bootstrap service cannot execute a content type it wasn't given a handler for. There is no fallback "try to run it as shell."
Phased Implementation
Most of the manifest-handling machinery already exists from the
build-time pipeline (capos-config, mkmanifest, init’s spawn
loop). The new work is transports, provider detection, and the
ManifestDelta merge semantics.
- `ManifestDelta` schema and `ManifestUpdater` cap. Add the delta type to `schema/capos.capnp` alongside `SystemManifest`, extend `capos-config` with a merge routine (SystemManifest + ManifestDelta → new services to spawn), and expose `ManifestUpdater` as a cap in init. `NoCloudMetadata` seeded from a test fixture is enough to demo the apply path end-to-end without any cloud dependency.
- Provider detection via SMBIOS. Kernel-side primitive or capability that reads SMBIOS DMI tables and exposes manufacturer / product strings. No network required.
- ConfigDrive support. ISO9660 reader plus `ConfigDriveMetadata`. Gives a working real-transport metadata source with no dependency on userspace networking. QEMU can attach one via `-drive file=configdrive.iso,if=virtio` for local testing.
- `HttpMetadata` per provider. Requires the userspace network stack (Stage 6+). GCP first (simplest auth), then AWS (IMDSv2 token flow), then Azure.
- Cross-provider Cloud Metadata demo. Same ISO hash boots under QEMU, GCP, AWS, and Azure; the only difference is the SMBIOS manufacturer string, which the probe service uses to pick the right `HttpMetadata` variant. This is the Cloud Metadata observable milestone.
Open Questions
Which fields of system.cue are runtime-modifiable?
system.cue today is a handful of service entries with kernel Console cap
grants encoded as structured source variants. That will grow. Plausible additions as capOS
matures: driver process definitions (virtio-net, virtio-blk, NVMe) with
device MMIO, interrupt, and frame allocator grants; scheduler tuning
(priority, budget, CPU pinning); filesystem driver services; memory-policy
hooks; ACPI/SMBIOS consumers.
Most of those are either fragile (kernel-adjacent; a bad value bricks
the instance), sensitive (granting kernel:frame_allocator to a
user-data-declared service is effectively root), or both. A
ManifestDelta with full SystemManifest equivalence hands every
such knob to whoever controls user-data.
The narrowing has to happen somewhere, but there are several places it could live:
- Different schema. `ManifestDelta` is not structurally a subset of `SystemManifest` — it omits driver entries, scheduler config, and kernel cap sources entirely. Schema-level guarantee; rigid but unambiguous.
- Shared schema, policy-narrowing cap. `ManifestUpdater` accepts a full delta but validates at apply time: kernel source variants are rejected unless explicitly allow-listed by the cap's parameters; additions that touch driver-level service entries fail. Flexible, but the narrowing logic is code that has to be audited, not a schema that is self-documenting.
- Tiered deltas. `PrivilegedDelta` (drivers, scheduler) and `ApplicationDelta` (hostname, SSH keys, app services), minted by different caps. An operator supervisor holds `PrivilegedManifestUpdater`; `cloud-bootstrap` holds only `ApplicationManifestUpdater`. Compositional; matches the capability-model grain but doubles the schema surface.
- Tag-based field permissions. Fields in `ServiceEntry` carry a privilege tag; `ManifestUpdater` is parameterized with a permitted-tag set. One schema, orthogonal policy.
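As one concrete illustration, the tag-based option could be sketched like this (all names hypothetical; a real version would validate whole deltas, not single entries):

```rust
use std::collections::HashSet;

#[derive(Hash, PartialEq, Eq, Clone, Copy, Debug)]
enum Tag {
    Application,
    Driver,
    Kernel,
}

struct TaggedEntry {
    name: String,
    tag: Tag,
}

/// A ManifestUpdater cap minted with a permitted-tag set; the
/// narrowing happens at apply time, against the cap's parameters,
/// not against the schema.
struct ManifestUpdater {
    permitted: HashSet<Tag>,
}

impl ManifestUpdater {
    fn validate(&self, entry: &TaggedEntry) -> Result<(), String> {
        if self.permitted.contains(&entry.tag) {
            Ok(())
        } else {
            Err(format!(
                "{}: tag {:?} outside this cap's grant",
                entry.name, entry.tag
            ))
        }
    }
}

fn main() {
    // cloud-bootstrap would hold only the application-tag cap.
    let app_updater = ManifestUpdater {
        permitted: [Tag::Application].into_iter().collect(),
    };
    let web = TaggedEntry { name: "web".into(), tag: Tag::Application };
    let nic = TaggedEntry { name: "virtio-net".into(), tag: Tag::Driver };
    assert!(app_updater.validate(&web).is_ok());
    assert!(app_updater.validate(&nic).is_err());
}
```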
Picking one prematurely would either over-constrain the cloud path
(option 1 before we know what apps legitimately need) or
under-constrain it (option 2 without clarity on what to check
against). This proposal commits only to the shared pipeline
(decoder, spawn loop, authoring tool). The shape of the public
type(s) the cap accepts is deferred until system.cue has grown
enough that the privileged vs. application split is visible in
concrete form.
Related open question: whether kernel cap sources should be expressible in
system.cue at all, or whether the build-time manifest should also declare
them through a narrower mechanism so that the same discipline that protects
cloud user-data also protects the baked-in manifest from accidental
over-grants. If they remain expressible, they should be structured enum/union
variants, not free-form strings; the associated interface TYPE_ID is only a
schema compatibility check and does not identify the authority being granted.
Non-Goals
- cloud-init compatibility. No parsing of
#cloud-configYAML, no#!/bin/bashexecution, noinclude-url, no MIME multipart handling. Operators who need these install their own dispatcher services; the base system does not. - Runtime package installation. The capOS equivalent of “install nginx on boot” is “include nginx in the manifest.” User-data can add services to the manifest; it cannot install packages (there is no package manager to install into).
- Re-running on every boot. cloud-init distinguishes
per-boot,per-instance, andper-oncemodules. The capOS bootstrap service runs once per boot; the manifest it produces is cached under the instance ID, and subsequent boots read the cache and skip the metadata round-trip. A full mode matrix is future work. - IPv6-only bring-up in the first iteration. Many clouds expose both; the schema supports both; the first implementations do whichever is easier per provider (typically IPv4).
- Automatic secret rotation. Metadata often exposes short-lived credentials (IAM role tokens on AWS, service-account tokens on GCP). Refresh logic belongs to the service that consumes the credential, not to cloud-bootstrap.
Related Work
- cloud-init (Canonical). The Linux reference. Huge scope, shell-script-centric, assumes root and POSIX. The capOS design intentionally takes the pattern and drops everything that depends on ambient authority.
- ignition (CoreOS/Flatcar). Runs once in initramfs, consumes a JSON spec, fails-fast if the spec can’t be applied. Closer in spirit to the capOS design — small, single-pass, declarative. Worth studying for its rollback and error-handling approach.
- AWS IMDSv2. The token-exchange handshake is the one thing the HTTP client needs to handle that is not plain `GET`s. Designing the `HttpMetadata` interface without accounting for it up front leads to a rewrite later.
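That handshake, sketched as raw HTTP/1.1 request construction (header names and the token path are the ones AWS documents; socket transport and response parsing are omitted):

```rust
const IMDS_HOST: &str = "169.254.169.254";

/// Step 1: PUT to the token endpoint. The response body is a
/// session token valid for `ttl_seconds`.
fn token_request(ttl_seconds: u32) -> String {
    format!(
        "PUT /latest/api/token HTTP/1.1\r\nHost: {IMDS_HOST}\r\nX-aws-ec2-metadata-token-ttl-seconds: {ttl_seconds}\r\n\r\n"
    )
}

/// Step 2: every metadata GET presents the token; a plain GET
/// without it is rejected when IMDSv2 is enforced.
fn metadata_request(path: &str, token: &str) -> String {
    format!(
        "GET {path} HTTP/1.1\r\nHost: {IMDS_HOST}\r\nX-aws-ec2-metadata-token: {token}\r\n\r\n"
    )
}

fn main() {
    let req = token_request(21600);
    assert!(req.starts_with("PUT /latest/api/token"));
    let get = metadata_request("/latest/meta-data/instance-id", "tok123");
    assert!(get.contains("X-aws-ec2-metadata-token: tok123"));
}
```

Modeling the token as an explicit first step in the `HttpMetadata` AWS variant keeps the GCP and Azure variants (single-request, fixed-header) from inheriting state they do not need.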