Proposal: Time and Clock Capability Authority

How capOS should expose wall-clock time, clock discipline, and trusted timestamps without introducing ambient real time, allowing a service to forge timestamps, or creating a covert timing channel between processes.

Problem

Today capOS has one time-related capability: Timer, which exposes now() -> (monotonicNs, tick) and sleep(). The monotonic counter is useful for scheduling and rate limiting, but it carries no provenance, has no relationship to wall-clock time, and is not a trusted source for security decisions.

Several upcoming capability surfaces implicitly need trustworthy wall-clock time:

  • TLS certificate validation (certificates-and-tls-proposal.md) must compare notBefore/notAfter fields against a wall-clock source whose provenance the validator trusts.
  • OIDC token expiry (oidc-and-oauth2-proposal.md) must compare exp and iat claims against wall-clock time.
  • Audit records must carry a timestamp that a security reviewer can trust. A service must not be able to backdate its own audit entries.
  • WASI clock_time_get(CLOCKID_REALTIME) currently returns NOSYS. Any WASM payload that needs the current time, including TLS libraries compiled to WASM, hits this gap.
  • Cloud metadata bootstrap (cloud-metadata-proposal.md) supplies instance-launch time; any cloud image verification that checks a timestamp needs a root-of-trust for time.

None of these can be satisfied by handing callers a monotonic tick offset and asking them to add a boot-time offset they supply themselves: the capability model requires that time provenance be part of the granted interface, not an ambient convention.

User Stories

  • A TLS handshake service holds a WallClock cap whose reads are labeled ntpSynced. It calls wallTime() to get the current UTC time and checks it against a certificate’s validity window. If the provenance were untrusted, it would refuse validation or surface a warning.
  • An audit service receives timestamped records from AuthorityBroker and session services. It does not trust the caller-supplied timestamp; it reads its own granted WallClock and stamps records at ingestion time.
  • A WASM payload loaded by capos-wasm calls clock_time_get(CLOCKID_REALTIME). The WASI host adapter reads the WallClock cap that was granted to the wasm-host process at launch, returns the wall-clock seconds, and sets the provenance flag in the host’s internal WASI state so that WASM callers cannot assume sync quality beyond what was granted.
  • An init operator grants ClockDiscipline to a userspace NTP service. The NTP service calls step() or slew() to advance or discipline the system clock. No other process may call these methods.
  • A process running in an environment with no NTP synchronization receives a WallClock labeled measuredBootMonotonic. It can compute elapsed time accurately but knows that absolute wall time is only as accurate as the firmware real-time clock was at boot.

Design

Existing Timer Interface

interface Timer {
    now @0 () -> (monotonicNs :UInt64, tick :UInt64);
    sleep @1 (durationNs :UInt64) -> ();
}

Timer remains the canonical interface for deadlines, sleep, and monotonic elapsed time. It does not change. WallClock is a separate, orthogonal capability whose provenance tracks the quality of the absolute time signal.

WallClock Interface

enum ClockProvenance {
    # Zero-value is fail-closed: an unset, default, or unrecognized provenance
    # decodes as untrusted, so a caller that skips the check never treats an
    # unknown source as trusted. No reliable source known; callers must fail
    # closed on sensitive decisions.
    untrusted        @0;
    # Synchronized to a trusted NTP source within the last sync window.
    ntpSynced        @1;
    # PTP hardware clock; higher precision, same trust level as ntpSynced.
    ptpSynced        @2;
    # Firmware RTC at boot; advanced monotonically since; no network sync.
    measuredBootMonotonic @3;
    # Set manually by an operator holding ClockDiscipline authority.
    manualSet        @4;
}

interface WallClock {
    # Returns UTC seconds since Unix epoch, nanoseconds within the second,
    # the current monotonic offset from the same Timer.now() base, and
    # the provenance label for this clock source.
    wallTime @0 () -> (
        utcSeconds  :Int64,
        utcNanos    :UInt32,
        monotonicNs :UInt64,
        provenance  :ClockProvenance
    );
}

Key properties:

  • No ambient access. A process must hold a granted WallClock cap to read wall time. Init-owned processes receive it via the manifest bundle; ordinary services receive it only if their supervisor grants it.
  • Provenance is part of the response, not a separate call. A validator that requires ntpSynced can check the provenance field on every read without a separate round-trip.
  • Monotonic offset is included. The returned monotonicNs ties the wall-clock sample to the Timer.now() timeline so callers can compute elapsed time without a second Timer call. The kernel ensures both fields are read from a consistent snapshot within the same tick.
  • Single method. WallClock is read-only and has no state. Its simplicity makes attenuation straightforward: a wrapper that downgrades provenance to untrusted or truncates resolution is trivially composable.
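The attenuation point in the last bullet can be sketched as a wrapper that sits in front of any WallClock. This is a minimal illustration, not the capos-rt API: the WallClock trait, the WallTime struct, AttenuatedClock, and FixedClock are hypothetical names introduced here, and the granularity is assumed to be a sub-second value.

```rust
/// Mirrors the ClockProvenance enum from the schema above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ClockProvenance {
    Untrusted,
    NtpSynced,
    PtpSynced,
    MeasuredBootMonotonic,
    ManualSet,
}

/// One wallTime() result, field for field.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct WallTime {
    pub utc_seconds: i64,
    pub utc_nanos: u32,
    pub monotonic_ns: u64,
    pub provenance: ClockProvenance,
}

/// Hypothetical client-side trait; the real typed client may look different.
pub trait WallClock {
    fn wall_time(&self) -> WallTime;
}

/// Wrapper that coarsens resolution and can hide the provenance label.
pub struct AttenuatedClock<C: WallClock> {
    inner: C,
    granularity_ns: u32,
    force_untrusted: bool,
}

impl<C: WallClock> AttenuatedClock<C> {
    pub fn new(inner: C, granularity_ns: u32, force_untrusted: bool) -> Self {
        // Zero would divide by zero below; treat it as "no truncation".
        Self { inner, granularity_ns: granularity_ns.max(1), force_untrusted }
    }
}

impl<C: WallClock> WallClock for AttenuatedClock<C> {
    fn wall_time(&self) -> WallTime {
        let mut t = self.inner.wall_time();
        // Round the sub-second part and the monotonic sample down.
        t.utc_nanos -= t.utc_nanos % self.granularity_ns;
        t.monotonic_ns -= t.monotonic_ns % u64::from(self.granularity_ns);
        if self.force_untrusted {
            t.provenance = ClockProvenance::Untrusted;
        }
        t
    }
}

/// Fixed-value clock, used only to exercise the wrapper.
pub struct FixedClock(pub WallTime);

impl WallClock for FixedClock {
    fn wall_time(&self) -> WallTime {
        self.0
    }
}
```

Because the wrapper implements the same trait as the clock it wraps, a supervisor could hand it to a less-trusted child in place of the original, and wrappers can be stacked.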

ClockDiscipline Interface

Clock setting and NTP/PTP synchronization require a separate, stronger capability. No userspace process can discipline the clock without holding it.

interface ClockDiscipline {
    # Atomically step the wall-clock by the given signed delta in nanoseconds.
    # Used for large corrections (initial set from RTC, NTP step).
    step @0 (deltaNs :Int64) -> ();

    # Gradually slew the clock toward the target offset, bounded to
    # `maxRateNsPerS` nanoseconds per second.  Used for NTP drift correction.
    slew @1 (offsetNs :Int64, maxRateNsPerS :UInt32) -> ();

    # Declare the current source and its estimated error bound.
    setProvenance @2 (
        provenance    :ClockProvenance,
        errorBoundNs  :UInt64
    ) -> ();

    # Read the current sync state.
    syncState @3 () -> (
        provenance    :ClockProvenance,
        lastSyncMonotonicNs :UInt64,
        lastStepMonotonicNs :UInt64,
        errorBoundNs  :UInt64,
        slewRateNsPerS :Int32
    );
}

ClockDiscipline is init-owned at boot. The manifest may grant it to a dedicated NTP service process. No service other than the designated NTP/PTP daemon should hold this cap.

step() adjusts only the UTC offset, never the monotonic base. Per the prior-art note’s clock-step/leap-second lesson (a monotonic timeline must never jump backwards), a step retargets the wall-clock offset layered on Timer.now(); it does not rewind the monotonic timeline that scheduler deadlines, ring timeouts, and slew() rate-limiting depend on. Large discontinuities use step() (initial set / NTP step), small drift uses slew(), and leap seconds are absorbed by slewing (smear) rather than a backwards step so ordered timestamps never regress. The lastStepMonotonicNs field of syncState() records when the last step happened, so a step since a cached observation can be detected and the time re-read. Because syncState() is reachable only through ClockDiscipline, an ordinary WallClock consumer would need that signal relayed to it.
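The offset-on-monotonic model described above can be sketched as a small state struct. This is an illustrative model under stated assumptions, not the kernel implementation: WallClockState here, its field layout, and the slew_tick helper (which applies one bounded increment of a pending slew) are all hypothetical.

```rust
/// Model: UTC is an offset layered on the monotonic timeline.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct WallClockState {
    /// UTC nanoseconds since the Unix epoch at monotonic time zero.
    pub utc_offset_ns: i128,
    /// Monotonic time of the last step, mirroring lastStepMonotonicNs.
    pub last_step_monotonic_ns: u64,
}

impl WallClockState {
    /// UTC nanoseconds for a given monotonic sample.
    pub fn utc_ns_at(&self, monotonic_ns: u64) -> i128 {
        self.utc_offset_ns + i128::from(monotonic_ns)
    }

    /// step(): retarget the offset at once. The monotonic value is only
    /// read, never modified.
    pub fn step(&mut self, delta_ns: i64, now_monotonic_ns: u64) {
        self.utc_offset_ns += i128::from(delta_ns);
        self.last_step_monotonic_ns = now_monotonic_ns;
    }

    /// One slew increment: apply as much of `remaining_ns` as the rate
    /// bound allows over `elapsed_ns` of monotonic time, and return the
    /// correction still owed.
    pub fn slew_tick(
        &mut self,
        remaining_ns: i64,
        max_rate_ns_per_s: u32,
        elapsed_ns: u64,
    ) -> i64 {
        let budget =
            i128::from(max_rate_ns_per_s) * i128::from(elapsed_ns) / 1_000_000_000;
        let applied = i128::from(remaining_ns).clamp(-budget, budget);
        self.utc_offset_ns += applied;
        // |applied| <= |remaining_ns|, so the cast cannot truncate.
        remaining_ns - applied as i64
    }
}
```

In this model a negative step moves utc_ns_at() backwards for the same monotonic input, while the monotonic input itself is untouched, which is the separation the paragraph above relies on.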

Timezone and Locale Data

Timezone and locale data are not ambient. They are delivered as named entries in a Directory-backed data store (per storage-and-naming-proposal.md). A process that needs timezone conversion receives a scoped read-only Directory cap pointing at the relevant tzdata namespace entry, not an environment variable or a path under a global filesystem.

Rationale: environment variables are not capability-scoped, and a process should not observe the host’s timezone as a side channel. Explicit directory delivery makes timezone data just another granted resource.

Manifest Seeding

The boot manifest may include a seedUtcSeconds field in SystemConfig (or an extension struct). At first kernel tick, the kernel initializes the wall-clock state from this seed with measuredBootMonotonic provenance. If no seed is present, the firmware RTC is read during early boot; if no RTC is available, provenance is untrusted.

After init starts the NTP service and that service has disciplined the clock, the NTP service calls ClockDiscipline.setProvenance(ntpSynced, ...) to upgrade the provenance label. From that point, WallClock.wallTime() calls return ntpSynced until a later setProvenance() call changes the label.
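The boot-time selection order (manifest seed, then firmware RTC, then nothing) can be sketched as a single function. The names initial_wall_clock, InitialWallClock, and FALLBACK_UTC_SECONDS are hypothetical. The sketch assumes the RTC path is labeled measuredBootMonotonic, following the enum comment, and picks an arbitrary placeholder base for the no-source case, which the proposal does not specify.

```rust
/// Mirrors the ClockProvenance enum from the schema above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ClockProvenance {
    Untrusted,
    NtpSynced,
    PtpSynced,
    MeasuredBootMonotonic,
    ManualSet,
}

/// Wall-clock starting point chosen during early boot.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct InitialWallClock {
    pub utc_seconds: i64,
    pub provenance: ClockProvenance,
}

/// Assumption: placeholder base when neither a seed nor an RTC exists.
pub const FALLBACK_UTC_SECONDS: i64 = 0;

/// Prefer the manifest seed, then the firmware RTC; otherwise fail closed.
pub fn initial_wall_clock(
    seed_utc_seconds: Option<i64>,
    rtc_utc_seconds: Option<i64>,
) -> InitialWallClock {
    match seed_utc_seconds.or(rtc_utc_seconds) {
        Some(utc_seconds) => InitialWallClock {
            utc_seconds,
            provenance: ClockProvenance::MeasuredBootMonotonic,
        },
        None => InitialWallClock {
            utc_seconds: FALLBACK_UTC_SECONDS,
            provenance: ClockProvenance::Untrusted,
        },
    }
}
```

Nothing in this path ever yields ntpSynced; that upgrade is reserved for a later setProvenance() call by the holder of ClockDiscipline.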

Audit Timestamps

Audit records must carry a server-stamped timestamp, not a caller-supplied one.

The audit service holds a WallClock cap. When it ingests a record from AuthorityBroker, SessionManager, or any other producer, it stamps the record with the time returned by its own WallClock call at ingestion. The producer may supply a monotonic offset for correlation, but the wall-clock stamp is always the audit service’s own read.

Audit record timestamps carry the same ClockProvenance enum value that was returned by WallClock.wallTime() at ingestion time. A security reviewer can verify that audit entries were timestamped with a synchronized source and reject or flag entries timestamped under untrusted.
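A minimal sketch of ingestion-time stamping follows. The Submission and AuditRecord shapes, the WallClock trait, and the ingest and needs_review helpers are hypothetical names for illustration; the actual audit record schema is not defined in this proposal.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ClockProvenance {
    Untrusted,
    NtpSynced,
    PtpSynced,
    MeasuredBootMonotonic,
    ManualSet,
}

#[derive(Clone, Copy, Debug)]
pub struct WallTime {
    pub utc_seconds: i64,
    pub utc_nanos: u32,
    pub monotonic_ns: u64,
    pub provenance: ClockProvenance,
}

/// Hypothetical client-side view of the audit service's own WallClock cap.
pub trait WallClock {
    fn wall_time(&self) -> WallTime;
}

/// What a producer sends. Any wall-clock claim it makes is not trusted.
pub struct Submission {
    pub producer: String,
    pub message: String,
    pub claimed_utc_seconds: Option<i64>,
    pub producer_monotonic_ns: Option<u64>,
}

/// What the audit service stores.
#[derive(Debug)]
pub struct AuditRecord {
    pub producer: String,
    pub message: String,
    pub utc_seconds: i64,
    pub utc_nanos: u32,
    pub provenance: ClockProvenance,
    /// Kept for correlation only; never used as the timestamp.
    pub producer_monotonic_ns: Option<u64>,
}

/// Stamp at ingestion from the service's own clock. The producer's
/// claimed_utc_seconds is deliberately never read.
pub fn ingest<C: WallClock>(clock: &C, s: Submission) -> AuditRecord {
    let now = clock.wall_time();
    AuditRecord {
        producer: s.producer,
        message: s.message,
        utc_seconds: now.utc_seconds,
        utc_nanos: now.utc_nanos,
        provenance: now.provenance,
        producer_monotonic_ns: s.producer_monotonic_ns,
    }
}

/// Reviewer-side check: entries stamped under untrusted get flagged.
pub fn needs_review(rec: &AuditRecord) -> bool {
    rec.provenance == ClockProvenance::Untrusted
}

/// Fixed-value clock, used only to exercise the sketch.
pub struct FixedClock(pub WallTime);

impl WallClock for FixedClock {
    fn wall_time(&self) -> WallTime {
        self.0
    }
}
```

Since the claimed timestamp has no path into the stored record, a producer cannot backdate an entry even if it fills that field.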

WASI Integration

capos-wasm Phase W.3+ adds WallClock as a grantable cap in the per-instance CapSet launched by wasm-host. The WASI Preview 1 host function clock_time_get(CLOCKID_REALTIME, ...) reads from the granted WallClock, returns the UTC time as the single nanosecond timestamp that WASI Preview 1 defines for this call, and records the provenance in the host state so that the wasm-host audit trail can assert what time quality the WASM instance saw. If no WallClock cap was granted, clock_time_get(CLOCKID_REALTIME) returns NOSYS as it does today.
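The host-side behavior can be sketched as follows. WASI Preview 1 reports clock_time_get results as one 64-bit nanosecond count, so the sketch folds the second/nanosecond pair into a single value. WasiClockHost, the Errno enum, and the choice to report pre-epoch or out-of-range times as an overflow error are assumptions made for illustration, not the capos-wasm implementation.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ClockProvenance {
    Untrusted,
    NtpSynced,
    PtpSynced,
    MeasuredBootMonotonic,
    ManualSet,
}

#[derive(Clone, Copy, Debug)]
pub struct WallTime {
    pub utc_seconds: i64,
    pub utc_nanos: u32,
    pub monotonic_ns: u64,
    pub provenance: ClockProvenance,
}

/// Hypothetical client-side trait for the granted cap.
pub trait WallClock {
    fn wall_time(&self) -> WallTime;
}

/// Illustrative error set; a real adapter would map to WASI errno values.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Errno {
    NoSys,
    Overflow,
}

/// Per-instance host state for the realtime clock.
pub struct WasiClockHost {
    wall_clock: Option<Box<dyn WallClock>>,
    /// Provenance seen on the most recent read, kept for the audit trail.
    pub last_provenance: Option<ClockProvenance>,
}

impl WasiClockHost {
    pub fn new(wall_clock: Option<Box<dyn WallClock>>) -> Self {
        Self { wall_clock, last_provenance: None }
    }

    /// Backs clock_time_get(CLOCKID_REALTIME): nanoseconds since the epoch.
    pub fn clock_time_get_realtime(&mut self) -> Result<u64, Errno> {
        // No granted cap: same NOSYS behavior as before.
        let clock = self.wall_clock.as_ref().ok_or(Errno::NoSys)?;
        let t = clock.wall_time();
        self.last_provenance = Some(t.provenance);
        // Assumption: times before the epoch cannot be represented.
        let secs = u64::try_from(t.utc_seconds).map_err(|_| Errno::Overflow)?;
        secs.checked_mul(1_000_000_000)
            .and_then(|ns| ns.checked_add(u64::from(t.utc_nanos)))
            .ok_or(Errno::Overflow)
    }
}

/// Fixed-value clock, used only to exercise the sketch.
pub struct FixedClock(pub WallTime);

impl WallClock for FixedClock {
    fn wall_time(&self) -> WallTime {
        self.0
    }
}
```

The WASM guest only ever sees the timestamp; the provenance stays on the host side, where the audit trail can read it.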

No Cross-Process Skew Side Channel

WallClock exposes only the current time from the kernel’s single wall-clock state. It does not expose skew history, NTP offset measurements, or raw clock-adjustment rates. ClockDiscipline.syncState() is the only path to sync state, and the ClockDiscipline cap is held by at most one NTP service.

A process cannot learn another process’s read pattern from WallClock because there is no shared counter or read-cursor that leaks observer timing. The monotonic offset in the wallTime() response is derived from the same TSC baseline as Timer.now() and does not introduce new covert-channel surface.

Fail-Closed Policy

Services that receive a WallClock cap and make security decisions on its output must treat untrusted provenance as a failure condition, not a degraded-but-functional mode. The recommended pattern:

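// One call returns both the time and the provenance of its source.
let (utc, _, _, prov) = wall_clock.wallTime()?;
// Fail closed: never make a security decision on an untrusted clock.
if prov == ClockProvenance::Untrusted {
    return Err(CapError::ClockProvenanceInsufficient);
}
validate_cert_notafter(utc, cert)?;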

Callers that accept measuredBootMonotonic for non-security uses (e.g., log timestamps, cache TTLs) should document the provenance they accept. Callers that accept only ntpSynced or ptpSynced for security decisions should reject all other values.
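One way to make the accepted provenance explicit is an allowlist that each caller declares once. This is a sketch under stated assumptions: require, ProvenanceInsufficient, and the two constant lists are hypothetical names, and the lists encode only the two policies described in the paragraph above.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ClockProvenance {
    Untrusted,
    NtpSynced,
    PtpSynced,
    MeasuredBootMonotonic,
    ManualSet,
}

/// Provenance accepted for security decisions.
pub const SECURITY: &[ClockProvenance] =
    &[ClockProvenance::NtpSynced, ClockProvenance::PtpSynced];

/// Provenance accepted for non-security uses such as log timestamps.
pub const INFORMATIONAL: &[ClockProvenance] = &[
    ClockProvenance::NtpSynced,
    ClockProvenance::PtpSynced,
    ClockProvenance::MeasuredBootMonotonic,
];

/// Error carrying the provenance that was rejected.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ProvenanceInsufficient(pub ClockProvenance);

/// Accept only listed values. Untrusted is rejected even if a caller
/// lists it by mistake, so the check stays fail-closed.
pub fn require(
    prov: ClockProvenance,
    accepted: &[ClockProvenance],
) -> Result<(), ProvenanceInsufficient> {
    if prov != ClockProvenance::Untrusted && accepted.contains(&prov) {
        Ok(())
    } else {
        Err(ProvenanceInsufficient(prov))
    }
}
```

An allowlist also behaves well when new variants are added later: a value the caller never listed is rejected by default, rather than being accepted because it is not untrusted.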

Phasing

Phase 1 — WallClock Read and Provenance

Status: landed (2026-05-24 09:31 UTC), fixed-boot-base variant. The WallClock read cap and ClockProvenance enum exist end-to-end: schema + generated bindings, kernel/src/cap/wall_clock.rs, the capos-config wall_clock kernel source, the capos-rt WallClockClient, and a shell date command proven by make run-shell. The follow-up bullets below (manifest seed, stateful WallClockState, init audit/TLS grants, WASM realtime clock) remain Phase 1.x / Phase 2.

  • Add WallClock interface and ClockProvenance enum to schema/capos.capnp. Landed.
  • Landed (fixed-boot-base variant): the kernel cap derives UTC from a fixed compile-time base over the existing monotonic timebase and reports the fail-closed untrusted provenance (the ClockProvenance zero value). It is not read from firmware RTC and is not network-synchronized, so untrusted is the honest label; this also proves the zero-value fail-closed enum semantics end-to-end. A stateful WallClockState (UTC offset, provenance, last-sync tick, error bound) and a manifest seedUtcSeconds seed with measuredBootMonotonic provenance are deferred to Phase 1.x / Phase 2 where ClockDiscipline can upgrade the label.
  • cap/wall_clock.rs implements the cap; capos-rt adds a typed client. Landed (WallClockClient, with a fail-closed ClockProvenance::from_schema unknown-variant decode).
  • Init grants WallClock to audit service and TLS service in the manifest bundle. (Deferred; the landed proof grants wall_clock directly to the shell-as-init in system-shell.cue.)
  • WASM host adapter: clock_time_get(CLOCKID_REALTIME) reads the instance’s granted WallClock; if absent, returns NOSYS as before. (Deferred.)
  • Smoke: a shell date command in make run-shell boots, reads WallClock, prints UTC seconds/nanos/monotonic plus the provenance label, and exits cleanly. Landed (asserted in tools/qemu-shell-smoke.sh).

Phase 2 — Clock Discipline and NTP Service

  • Add ClockDiscipline interface to schema.
  • Kernel implements step(), slew(), setProvenance(), and syncState().
  • A userspace NTP client process receives ClockDiscipline from init and synchronizes to a configured NTP server (requires UdpSocket from the networking capability).
  • After first successful sync, calls setProvenance(ntpSynced, errorBoundNs). All subsequent WallClock.wallTime() calls return ntpSynced.
  • Audit entries timestamped post-sync carry ntpSynced provenance.

Phase 3 — PTP, Leap Second, and Suspend Recovery

  • PTP hardware clock support for environments that have it.
  • Leap-second policy: step vs. smear, configurable per ClockDiscipline.
  • Suspend/resume: WallClock provenance downgrades to measuredBootMonotonic after a suspend event until NTP re-syncs. (Cross-links to the future power/suspend proposal; no dependency today.)
  • Timezone delivery: a Directory namespace entry backed by tzdata is seeded from the manifest and delivered as a cap to timezone-aware services.

Hazards and Invariants

Monotonic vs. wall-clock relationship. The wall-clock state is an offset applied to the Timer monotonic base. step() changes the offset; the underlying monotonic timeline never goes backward. Callers that need monotonic guarantees must use Timer.now(); callers that need calendar time use WallClock.wallTime(). This separation prevents a clock step from violating monotonicity promises made to schedulers or ring timeouts.
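The difference between the two timelines shows up when two wallTime() samples straddle a step. In the sketch below, Sample, elapsed_ns, and wall_delta_ns are hypothetical names; the point is only that elapsed time should come from the monotonicNs field, not from subtracting UTC values.

```rust
/// The time fields of one wallTime() result.
#[derive(Clone, Copy, Debug)]
pub struct Sample {
    pub utc_seconds: i64,
    pub utc_nanos: u32,
    pub monotonic_ns: u64,
}

/// Elapsed time between two samples on the monotonic timeline.
/// Returns None if the samples are passed in the wrong order.
pub fn elapsed_ns(earlier: &Sample, later: &Sample) -> Option<u64> {
    later.monotonic_ns.checked_sub(earlier.monotonic_ns)
}

/// Difference in calendar time. This can be negative across a backwards
/// step, so it must not be used for timeouts or ordering.
pub fn wall_delta_ns(earlier: &Sample, later: &Sample) -> i128 {
    let to_ns = |s: &Sample| {
        i128::from(s.utc_seconds) * 1_000_000_000 + i128::from(s.utc_nanos)
    };
    to_ns(later) - to_ns(earlier)
}
```

For example, if the clock is stepped back 50 seconds between two reads taken one second apart, elapsed_ns still reports one second while wall_delta_ns reports minus 49 seconds.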

ABI stability. ClockProvenance enum variants must only be added, never removed or reordered. Binaries compiled against an older schema that see an unrecognized provenance value should treat it as untrusted (fail-closed). Two things make that hold. First, untrusted is @0 in the schema above, so an unset or default field decodes as untrusted rather than as the most-trusted value; any future schema change must keep it there. Second, a variant added later is not zero on the wire, so the zero default alone does not cover it: the generated bindings need an explicit unknown-variant path that maps it to untrusted, as the landed ClockProvenance::from_schema decode does.
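The fail-closed decode can be sketched directly over the wire value. Cap'n Proto encodes enums as 16-bit integers, so the sketch takes a u16; from_raw is a hypothetical name used for illustration and is not the signature of the landed from_schema function.

```rust
/// Mirrors the ClockProvenance enum from the schema above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ClockProvenance {
    Untrusted,
    NtpSynced,
    PtpSynced,
    MeasuredBootMonotonic,
    ManualSet,
}

impl ClockProvenance {
    /// Decode a wire value. Zero (unset or default) and any value this
    /// binary does not know both decode as Untrusted.
    pub fn from_raw(raw: u16) -> Self {
        match raw {
            1 => ClockProvenance::NtpSynced,
            2 => ClockProvenance::PtpSynced,
            3 => ClockProvenance::MeasuredBootMonotonic,
            4 => ClockProvenance::ManualSet,
            _ => ClockProvenance::Untrusted,
        }
    }
}
```

With this shape, an older binary that receives a variant added after it was compiled sees Untrusted and takes its fail-closed path without any schema-version check.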

DMA and IRQ neutrality. WallClock and ClockDiscipline do not touch device memory, DMA pools, or interrupt grant paths. They are pure kernel-state caps. No DMA/MMIO/IRQ hazard applies.

No capability-transfer amplification. WallClock is a read-only snapshot surface. Transferring it to another process does not grant clock-setting authority. ClockDiscipline must not be transferable through normal cap-grant paths; it should be restricted to init-owned grant at boot and explicit manifest-operator grants.

Relevant Research and Prior Art

In-Tree Grounding

  • NO_HZ, SQPOLL, and Realtime Scheduling records the Linux timer-stack split between clock sources (monotonic timeline counters) and clock events (hardware devices that interrupt at selected future times), and concludes capOS should “introduce a monotonic now_ns clocksource layer” distinct from the scheduler tick. This proposal builds directly on that separation: Timer.now()/WallClock.wallTime() expose the clocksource timeline, while clock-event programming stays a scheduler concern. The wall-clock offset rides on the same monotonic base so a clock step never rewinds the timeline the scheduler and ring timeouts depend on — the monotonicity invariant called out in that note.
  • Future Scheduler Architecture reinforces the same clocksource/clockevent boundary and the lesson that absolute-deadline waiters should be stored by expiry time, not periodic tick count. That confirms WallClock must not become the deadline substrate: deadlines remain monotonic, and wall-clock time is a separate, disciplinable view layered on top.

External Precedent and Lessons

  • Linux clock_gettime / adjtimex. Linux exposes distinct clock IDs (CLOCK_MONOTONIC vs CLOCK_REALTIME) and gates clock discipline behind a privileged interface: adjtimex/clock_adjtime and stepping the realtime clock require CAP_SYS_TIME. Lesson: reading time and disciplining time are different authorities. capOS encodes this as a read-only WallClock cap held by ordinary services and a separate, stronger ClockDiscipline cap held only by a designated sync service — the capability-native analog of the read/CAP_SYS_TIME split.
  • Linux time namespaces. CLOCK_MONOTONIC/CLOCK_BOOTTIME offsets can be virtualized per namespace so a container observes a different boot/monotonic origin. Lesson: time can be a per-context value rather than a single global ambient fact, which supports delivering wall-clock as a granted, attenuable cap (and timezone data as a scoped Directory) instead of a process-wide environment.
  • Fuchsia/Zircon UTC clock objects. Fuchsia models UTC as a kernel clock object distributed to processes as read-only handles, with a separate privileged maintainer service holding the write handle that disciplines the clock; clock reads carry an error bound and a “started/synced” signal so a reader can tell whether the clock is yet trustworthy. Lesson: this is the closest capability-native precedent for the design here. capOS’s read-only WallClock with a ClockProvenance label maps to Fuchsia’s read-only UTC handle plus its synced/error-bound signal, and ClockDiscipline maps to the single write-handle maintainer. (The in-tree Zircon report covers handles, rights, and VMOs but not the UTC clock object specifically; the UTC-clock mapping is external precedent, not yet captured as an in-tree research note.)
  • NTP step vs. slew. NTP daemons step the clock for large offsets and slew (bounded rate adjustment) for small drift, precisely because abruptly rewinding wall time breaks timestamp ordering and timeouts. Lesson: capOS exposes step() and slew() as distinct ClockDiscipline methods rather than a single “set time”, so the discipline policy is explicit at the cap boundary.
  • IEEE-1588 PTP. Precision Time Protocol provides sub-microsecond hardware timestamping via a dedicated hardware clock, distinct from software NTP. Lesson: provenance is not binary. The ptpSynced vs ntpSynced distinction in ClockProvenance lets a validator that needs high-precision time distinguish the two without conflating accuracy with mere network sync.

Dedicated Research Note

  • Time and Clock Authority is the focused prior-art survey for this proposal: verified Linux CAP_SYS_TIME read/discipline split, time namespaces as per-context clock offsets, chrony/NTP step/slew/smear discipline, PTP/IEEE-1588 hardware timestamping, Fuchsia’s ZX_RIGHT_READ/ZX_RIGHT_WRITE UTC clock object, and leap-second smearing vs stepping, each with its capOS lesson and real sources. It is the primary external grounding for WallClock, ClockDiscipline, and ClockProvenance.

Residual research still owed before Phase 2/3 implementation: the servo / loop-filter behavior, holdover and error-bound estimation, and suspend/resume clock recovery are the highest-risk underspecified areas and should be deepened in that note (or a follow-on) rather than fixed by this proposal’s sketch.

Relevant Proposals

  • Certificates and TLS — TLS validation delegates certificate validity-window checks to a granted WallClock.
  • OIDC and OAuth2 — Token expiry checks (exp, iat, nbf) use a granted WallClock with at least measuredBootMonotonic provenance.
  • WASI Host Adapter — Phase W.3+ clock_time_get(CLOCKID_REALTIME) backed by a per-instance WallClock cap.
  • Cloud Metadata — Cloud instance launch time delivered through the metadata capability; the WallClock seed path integrates with this bootstrap.
  • System Monitoring — Audit records carry ClockProvenance-labeled timestamps from the audit service’s own WallClock read at ingestion.
  • Storage and Naming — Timezone and locale data delivered as a read-only Directory cap, not an ambient environment.