# Proposal: Time and Clock Capability Authority

How capOS should expose wall-clock time, clock discipline, and trusted
timestamps without introducing ambient real time, allowing a service to forge
timestamps, or creating a covert timing channel between processes.


## Problem

Today capOS has one time-related capability: `Timer`, which exposes
`now() -> (monotonicNs, tick)` and `sleep()`. The monotonic counter is useful
for scheduling and rate limiting, but it carries no provenance, has no
relationship to wall-clock time, and is not a trusted source for security
decisions.

Several upcoming capability surfaces implicitly need trustworthy wall-clock
time:

- **TLS certificate validation** (`certificates-and-tls-proposal.md`) must
  compare `notBefore`/`notAfter` fields against a wall-clock source whose
  provenance the validator trusts.
- **OIDC token expiry** (`oidc-and-oauth2-proposal.md`) must compare `exp`
  and `iat` claims against wall-clock time.
- **Audit records** must carry a timestamp that a security reviewer can trust.
  A service must not be able to backdate its own audit entries.
- **WASI `clock_time_get(CLOCKID_REALTIME)`** currently returns `NOSYS`. Any
  WASM payload that needs the current time, including TLS libraries compiled to
  WASM, hits this gap.
- **Cloud metadata bootstrap** (`cloud-metadata-proposal.md`) supplies
  instance-launch time; any cloud image verification that checks a timestamp
  needs a root-of-trust for time.

None of these can be satisfied by handing callers a monotonic tick offset and
asking them to add a boot-time offset they supply themselves: the
capability model requires that time provenance be part of the granted interface,
not an ambient convention.

## User Stories

- A TLS handshake service holds a `WallClock` cap whose reads report `ntpSynced`
  provenance. It calls `wallTime()` to get the current UTC time and validates a
  certificate's validity window. If the provenance were `untrusted`, it would
  refuse validation and report the failure rather than proceed.
- An audit service receives timestamped records from `AuthorityBroker` and
  session services. It does not trust the caller-supplied timestamp; it reads
  its own granted `WallClock` and stamps records at ingestion time.
- A WASM payload loaded by `capos-wasm` calls
  `clock_time_get(CLOCKID_REALTIME)`. The WASI host adapter reads the
  `WallClock` cap that was granted to the `wasm-host` process at launch,
  returns the wall-clock seconds, and sets the provenance flag in the host's
  internal WASI state so that WASM callers cannot assume sync quality beyond
  what was granted.
- An init operator grants `ClockDiscipline` to a userspace NTP service. The
  NTP service calls `step()` or `slew()` to correct the system wall clock. No
  other process may call these methods, because no other process holds the cap.
- A process running in an environment with no NTP synchronization receives a
  `WallClock` whose reads report `measuredBootMonotonic`. It can compute elapsed
  time accurately but knows that absolute wall time is only as accurate as the
  firmware real-time clock at boot.

## Design

### Existing `Timer` Interface

```capnp
interface Timer {
    now @0 () -> (monotonicNs :UInt64, tick :UInt64);
    sleep @1 (durationNs :UInt64) -> ();
}
```

`Timer` remains the canonical interface for deadlines, sleep, and monotonic
elapsed time. It does not change. `WallClock` is a separate, orthogonal
capability whose provenance tracks the quality of the absolute time signal.

### `WallClock` Interface

```capnp
enum ClockProvenance {
    # No reliable source known; callers must fail closed on sensitive
    # decisions. This is the zero value, so an unset or default provenance
    # decodes as untrusted, and decoders map unrecognized values here too:
    # a caller that skips the check never treats an unknown source as trusted.
    untrusted        @0;
    # Synchronized to a trusted NTP source within the last sync window.
    ntpSynced        @1;
    # PTP hardware clock; higher precision, same trust level as ntpSynced.
    ptpSynced        @2;
    # Firmware RTC at boot; advanced monotonically since; no network sync.
    measuredBootMonotonic @3;
    # Manual set by an operator with clockDiscipline authority.
    manualSet        @4;
}

interface WallClock {
    # Returns UTC seconds since Unix epoch, nanoseconds within the second,
    # the current monotonic offset from the same Timer.now() base, and
    # the provenance label for this clock source.
    wallTime @0 () -> (
        utcSeconds  :Int64,
        utcNanos    :UInt32,
        monotonicNs :UInt64,
        provenance  :ClockProvenance
    );
}
```

Key properties:

- **No ambient access.** A process must hold a granted `WallClock` cap to read
  wall time. Init-owned processes receive it via the manifest bundle; ordinary
  services receive it only if their supervisor grants it.
- **Provenance is part of the response, not a separate call.** A validator that
  requires `ntpSynced` can check the provenance field on every read without a
  separate round-trip.
- **Monotonic offset is included.** The returned `monotonicNs` ties the
  wall-clock sample to the `Timer.now()` timeline so callers can compute
  elapsed time without a second `Timer` call. The kernel ensures both fields
  are read from a consistent snapshot within the same tick.
- **Single method.** `WallClock` is read-only and holds no per-caller state. Its
  simplicity makes attenuation straightforward: a wrapper that downgrades
  provenance to `untrusted` or truncates resolution is trivially composable.
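The attenuation point can be illustrated with a small proxy. This is a minimal sketch that assumes a hypothetical `WallClockRead` trait and `WallTime` struct standing in for the generated client. It is not the `capos-rt` API. The `Attenuated` wrapper forwards each read and can only lower what its holder sees: coarser resolution, or an `untrusted` label.

```rust
/// Provenance labels mirroring the `ClockProvenance` schema enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ClockProvenance {
    Untrusted,
    NtpSynced,
    PtpSynced,
    MeasuredBootMonotonic,
    ManualSet,
}

/// One `wallTime()` result (hypothetical Rust shape of the schema fields).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct WallTime {
    pub utc_seconds: i64,
    pub utc_nanos: u32,
    pub monotonic_ns: u64,
    pub provenance: ClockProvenance,
}

/// Hypothetical stand-in for a held `WallClock` cap.
pub trait WallClockRead {
    fn wall_time(&self) -> WallTime;
}

/// Proxy that forwards reads but can only weaken what its holder observes.
pub struct Attenuated<C: WallClockRead> {
    inner: C,
    /// Granularity in nanoseconds; 1 means no truncation.
    resolution_ns: u32,
    /// Relabel every sample `Untrusted` regardless of the real source.
    force_untrusted: bool,
}

impl<C: WallClockRead> Attenuated<C> {
    pub fn new(inner: C, resolution_ns: u32, force_untrusted: bool) -> Self {
        // Treat 0 as "no truncation" so the modulo below never divides by zero.
        let resolution_ns = resolution_ns.max(1);
        Self { inner, resolution_ns, force_untrusted }
    }
}

impl<C: WallClockRead> WallClockRead for Attenuated<C> {
    fn wall_time(&self) -> WallTime {
        let mut t = self.inner.wall_time();
        // Truncate both timelines. Leaving monotonic_ns at full precision
        // would leak the resolution the proxy is meant to hide.
        t.utc_nanos -= t.utc_nanos % self.resolution_ns;
        t.monotonic_ns -= t.monotonic_ns % u64::from(self.resolution_ns);
        if self.force_untrusted {
            t.provenance = ClockProvenance::Untrusted;
        }
        t
    }
}
```

An `Attenuated` value is itself a `WallClockRead`, so wrappers nest. Each layer can only remove precision or trust and never add it back.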

### `ClockDiscipline` Interface

Clock setting and NTP/PTP synchronization require a separate, stronger
capability. No userspace process can discipline the clock without holding it.

```capnp
interface ClockDiscipline {
    # Atomically step the wall-clock by the given signed delta in nanoseconds.
    # Used for large corrections (initial set from RTC, NTP step).
    step @0 (deltaNs :Int64) -> ();

    # Gradually slew the clock toward the target offset, bounded to
    # `maxRateNsPerS` nanoseconds per second.  Used for NTP drift correction.
    slew @1 (offsetNs :Int64, maxRateNsPerS :UInt32) -> ();

    # Declare the current source and its estimated error bound.
    setProvenance @2 (
        provenance    :ClockProvenance,
        errorBoundNs  :UInt64
    ) -> ();

    # Read the current sync state.
    syncState @3 () -> (
        provenance    :ClockProvenance,
        lastSyncMonotonicNs :UInt64,
        lastStepMonotonicNs :UInt64,
        errorBoundNs  :UInt64,
        slewRateNsPerS :Int32
    );
}
```

`ClockDiscipline` is init-owned at boot. The manifest may grant it to a
dedicated NTP service process. No service other than the designated NTP/PTP
daemon should hold this cap.

**`step()` adjusts only the UTC offset, never the monotonic base.** Per the
prior-art note's clock-step/leap-second lesson (a monotonic timeline must never
jump backwards), a step retargets the wall-clock offset layered on
`Timer.now()`; it does not rewind the monotonic timeline that scheduler
deadlines, ring timeouts, and `slew()` rate-limiting depend on. Large
discontinuities use `step()` (initial set / NTP step), small drift uses
`slew()`, and leap seconds are absorbed by slewing (smear) rather than a
backwards step so ordered timestamps never regress. The `lastStepMonotonicNs`
field lets a `WallClock` consumer detect that a step happened since a cached
observation and re-read.
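The offset model can be sketched as kernel-side state. This is a minimal illustration and not the planned `WallClockState`. `ClockModel`, `MAX_SLEW_NS_PER_S`, and the rule that a step cancels an in-flight slew are assumptions. Wall time is `monotonic + offset`. `step()` rewrites the offset at once and records `last_step_mono_ns`, the analog of `lastStepMonotonicNs`. `slew()` applies its correction no faster than rate × elapsed, and the monotonic input is never modified.

```rust
/// Hypothetical upper bound on slew rate. Keeping it below 1e9 ns/s means a
/// negative slew slows wall time but never runs it backwards.
pub const MAX_SLEW_NS_PER_S: u32 = 500_000;

/// Wall time modeled as an offset over the monotonic timeline:
/// utc_ns = monotonic_ns + offset. Nothing here touches the monotonic base.
pub struct ClockModel {
    /// Offset in effect, excluding any slew still in flight.
    base_offset_ns: i128,
    /// Signed correction requested by the most recent `slew()`.
    slew_total_ns: i64,
    slew_rate_ns_per_s: u32,
    /// Monotonic time at which the current slew began.
    slew_start_mono_ns: u64,
    /// Monotonic time of the most recent `step()` (0 if never stepped).
    pub last_step_mono_ns: u64,
}

impl ClockModel {
    pub fn new(utc_ns_at_mono_zero: i128) -> Self {
        Self {
            base_offset_ns: utc_ns_at_mono_zero,
            slew_total_ns: 0,
            slew_rate_ns_per_s: 0,
            slew_start_mono_ns: 0,
            last_step_mono_ns: 0,
        }
    }

    /// Portion of the current slew applied by `mono_now`, bounded by rate * elapsed.
    fn slew_applied_ns(&self, mono_now: u64) -> i128 {
        let elapsed = i128::from(mono_now.saturating_sub(self.slew_start_mono_ns));
        let budget = elapsed * i128::from(self.slew_rate_ns_per_s) / 1_000_000_000;
        i128::from(self.slew_total_ns).clamp(-budget, budget)
    }

    fn offset_ns(&self, mono_now: u64) -> i128 {
        self.base_offset_ns + self.slew_applied_ns(mono_now)
    }

    /// Shift wall time by `delta_ns` at once. Assumption: a step cancels any
    /// slew still in flight.
    pub fn step(&mut self, mono_now: u64, delta_ns: i64) {
        self.base_offset_ns = self.offset_ns(mono_now) + i128::from(delta_ns);
        self.slew_total_ns = 0;
        self.slew_start_mono_ns = mono_now;
        self.last_step_mono_ns = mono_now;
    }

    /// Move wall time by `offset_ns` no faster than `max_rate_ns_per_s`.
    pub fn slew(&mut self, mono_now: u64, offset_ns: i64, max_rate_ns_per_s: u32) {
        // Keep whatever part of the previous slew has already taken effect.
        self.base_offset_ns = self.offset_ns(mono_now);
        self.slew_total_ns = offset_ns;
        self.slew_rate_ns_per_s = max_rate_ns_per_s.min(MAX_SLEW_NS_PER_S);
        self.slew_start_mono_ns = mono_now;
    }

    /// UTC nanoseconds since the Unix epoch at monotonic time `mono_now`.
    pub fn utc_ns(&self, mono_now: u64) -> i128 {
        i128::from(mono_now) + self.offset_ns(mono_now)
    }
}
```

Reads are pure functions of `mono_now`. The slew is computed from total elapsed time since it began, not accumulated per read, so frequent reads cannot lose fractional nanoseconds of correction.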

### Timezone and Locale Data

Timezone and locale data are not ambient. They are delivered as named entries
in a `Directory`-backed data store (per `storage-and-naming-proposal.md`).
A process that needs timezone conversion receives a scoped read-only `Directory`
cap pointing at the relevant tzdata namespace entry, not an environment variable
or a path under a global filesystem.

Rationale: environment variables are not capability-scoped, and a process should
not observe the host's timezone as a side channel. Explicit directory delivery
makes timezone data just another granted resource.

### Manifest Seeding

The boot manifest may include a `seedUtcSeconds` field in `SystemConfig`
(or an extension struct). At the first kernel tick, the kernel initializes the
wall-clock state from this seed with `measuredBootMonotonic` provenance. If no
seed is present, the kernel reads the firmware RTC during early boot and uses
the same `measuredBootMonotonic` label; if no RTC is available, provenance is
`untrusted`.

After init starts the NTP service and that service has disciplined the clock,
the NTP service calls `ClockDiscipline.setProvenance(ntpSynced, ...)` to upgrade
the provenance label. From that point, all `WallClock.wallTime()` calls return
`ntpSynced` until a later `setProvenance()` call changes it.
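The boot-time fallback order can be written as one selection function. This is a minimal sketch in which `InitialClock`, `initial_clock`, the RTC reading, and the fallback base are hypothetical. It follows the order given above: manifest seed first, then firmware RTC, both labeled `measuredBootMonotonic`, else `untrusted`.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ClockProvenance {
    Untrusted,
    NtpSynced,
    PtpSynced,
    MeasuredBootMonotonic,
    ManualSet,
}

/// Wall-clock state chosen at the first kernel tick (hypothetical shape).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct InitialClock {
    pub utc_seconds: i64,
    pub provenance: ClockProvenance,
}

/// Pick the boot time source: manifest `seedUtcSeconds`, then the firmware
/// RTC, else an untrusted built-in base. `rtc_utc_seconds` is a hypothetical
/// early-boot RTC reading; `fallback_utc_seconds` a hypothetical built-in base.
pub fn initial_clock(
    seed_utc_seconds: Option<i64>,
    rtc_utc_seconds: Option<i64>,
    fallback_utc_seconds: i64,
) -> InitialClock {
    match seed_utc_seconds.or(rtc_utc_seconds) {
        Some(utc_seconds) => InitialClock {
            utc_seconds,
            provenance: ClockProvenance::MeasuredBootMonotonic,
        },
        // Nothing measured: keep running, but say so in the label.
        None => InitialClock {
            utc_seconds: fallback_utc_seconds,
            provenance: ClockProvenance::Untrusted,
        },
    }
}
```

Only the later `setProvenance()` call from the NTP service moves the label beyond `measuredBootMonotonic`. The boot path never chooses a synced label on its own.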

### Audit Timestamps

Audit records must carry a server-stamped timestamp, not a caller-supplied one.

The audit service holds a `WallClock` cap. When it ingests a record from
`AuthorityBroker`, `SessionManager`, or any other producer, it stamps the record
with the time returned by its own `WallClock` call at ingestion. The producer
may supply a monotonic offset for correlation, but the wall-clock stamp is
always the audit service's own read.

Audit record timestamps carry the same `ClockProvenance` enum value that was
returned by `WallClock.wallTime()` at ingestion time. A security reviewer can
verify that audit entries were timestamped with a synchronized source and reject
or flag entries timestamped under `untrusted`.
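A sketch of ingestion-time stamping makes the "no caller-supplied wall time" rule structural. Under the assumption of a hypothetical record shape, `ProducerRecord` simply has no wall-clock field. The audit service fills `utc_seconds`, `utc_nanos`, and `provenance` from its own read, modeled here as a closure standing in for its `WallClock` call. `ingest` and `stamped_by_synced_source` are illustrative names.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ClockProvenance {
    Untrusted,
    NtpSynced,
    PtpSynced,
    MeasuredBootMonotonic,
    ManualSet,
}

/// What a producer submits: there is no wall-clock field to forge.
pub struct ProducerRecord {
    pub producer: String,
    /// Optional correlation hint on the producer's monotonic timeline.
    pub producer_monotonic_ns: Option<u64>,
    pub payload: String,
}

/// What the audit service stores.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct AuditEntry {
    pub producer: String,
    pub utc_seconds: i64,
    pub utc_nanos: u32,
    pub provenance: ClockProvenance,
    pub producer_monotonic_ns: Option<u64>,
    pub payload: String,
}

/// Stamp a record at ingestion with the audit service's own read.
/// `read_wall_clock` stands in for a call on the service's `WallClock` cap.
pub fn ingest<F>(record: ProducerRecord, read_wall_clock: F) -> AuditEntry
where
    F: FnOnce() -> (i64, u32, ClockProvenance),
{
    let (utc_seconds, utc_nanos, provenance) = read_wall_clock();
    AuditEntry {
        producer: record.producer,
        utc_seconds,
        utc_nanos,
        provenance,
        producer_monotonic_ns: record.producer_monotonic_ns,
        payload: record.payload,
    }
}

/// Reviewer check: was this entry stamped from a synchronized source?
pub fn stamped_by_synced_source(entry: &AuditEntry) -> bool {
    matches!(entry.provenance, ClockProvenance::NtpSynced | ClockProvenance::PtpSynced)
}
```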

### WASI Integration

`capos-wasm` Phase W.3+ adds `WallClock` as a grantable cap in the per-instance
`CapSet` launched by `wasm-host`. The WASI Preview 1 host function
`clock_time_get(CLOCKID_REALTIME, ...)` reads from the granted `WallClock`,
converts the UTC second/nanosecond pair into the single nanosecond `timestamp`
that WASI returns, and records the provenance in the host state so that the
wasm-host audit trail can assert what time quality the WASM instance saw. If no
`WallClock` cap was granted, `clock_time_get(REALTIME)` returns `NOSYS` as it
does today.
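The host-side conversion can be sketched as follows. WASI Preview 1 returns realtime as one unsigned 64-bit nanosecond `timestamp`. `HostState`, `clock_time_get_realtime`, and the `WasiErrno` subset (names only, no numeric codes) are hypothetical. The closure stands in for the granted `WallClock`. Mapping pre-1970 or unrepresentable times to `Overflow` is an assumed policy choice, not something this proposal fixes.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ClockProvenance {
    Untrusted,
    NtpSynced,
    PtpSynced,
    MeasuredBootMonotonic,
    ManualSet,
}

/// Subset of WASI errno names used here (numeric codes omitted).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum WasiErrno {
    Nosys,
    Overflow,
}

/// Per-instance host state. `C` stands in for the granted `WallClock` client.
pub struct HostState<C> {
    pub wall_clock: Option<C>,
    /// Provenance behind the most recent realtime read, for the audit trail.
    pub last_realtime_provenance: Option<ClockProvenance>,
}

/// Host side of `clock_time_get(CLOCKID_REALTIME, ...)`: fold the
/// second/nanosecond pair into one u64 nanosecond timestamp.
pub fn clock_time_get_realtime<C>(state: &mut HostState<C>) -> Result<u64, WasiErrno>
where
    C: Fn() -> (i64, u32, ClockProvenance),
{
    // No granted cap: keep today's behavior.
    let read = state.wall_clock.as_ref().ok_or(WasiErrno::Nosys)?;
    let (utc_seconds, utc_nanos, provenance) = read();
    state.last_realtime_provenance = Some(provenance);
    // Negative (pre-1970) seconds do not fit the unsigned timestamp.
    let secs = u64::try_from(utc_seconds).map_err(|_| WasiErrno::Overflow)?;
    secs.checked_mul(1_000_000_000)
        .and_then(|ns| ns.checked_add(u64::from(utc_nanos)))
        .ok_or(WasiErrno::Overflow)
}
```

The guest only ever sees the timestamp. The provenance stays in host state, where the wasm-host audit trail can read it.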

### No Cross-Process Skew Side Channel

`WallClock` exposes only the current time from the kernel's single
wall-clock state. It does not expose skew history, NTP offset measurements,
or raw clock-adjustment rates. `ClockDiscipline.syncState()` is the only
path to sync state, and the `ClockDiscipline` cap is held by at most one NTP
service.

A process cannot learn another process's read pattern from `WallClock` because
there is no shared counter or read-cursor that leaks observer timing. The
monotonic offset in the `wallTime()` response is derived from the same TSC
baseline as `Timer.now()` and does not introduce new covert-channel surface.

### Fail-Closed Policy

Services that receive a `WallClock` cap and make security decisions on its
output must treat `untrusted` provenance as a failure condition, not a
degraded-but-functional mode. The recommended pattern:

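```rust
// Illustrative: the client method, `CapError` variant, and
// `validate_cert_notafter` helper names are not fixed by this proposal.
let (utc_seconds, _utc_nanos, _monotonic_ns, provenance) = wall_clock.wall_time()?;
if provenance == ClockProvenance::Untrusted {
    // Fail closed: an untrusted clock must not gate a certificate decision.
    return Err(CapError::ClockProvenanceInsufficient);
}
validate_cert_notafter(utc_seconds, cert)?;
```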

Callers that accept `measuredBootMonotonic` for non-security uses (e.g., log
timestamps, cache TTLs) should document the provenance they accept. Callers
that accept only `ntpSynced` or `ptpSynced` for security decisions should
reject all other values.
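The two acceptance tiers can be captured in one policy function. This is a minimal sketch. `ClockUse` and `accepts` are hypothetical, and rejecting `manualSet` for both uses is an assumed choice that the proposal leaves open. Security use accepts only `ntpSynced` or `ptpSynced`. Non-security use also accepts `measuredBootMonotonic`.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ClockProvenance {
    Untrusted,
    NtpSynced,
    PtpSynced,
    MeasuredBootMonotonic,
    ManualSet,
}

/// How the caller will use the time (hypothetical classification).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ClockUse {
    /// Certificate windows, token expiry, anything that gates authority.
    Security,
    /// Log timestamps, cache TTLs, display.
    NonSecurity,
}

/// Whether a sample with `provenance` may be used for `purpose`.
/// The match is exhaustive, so a new `ClockProvenance` variant fails to
/// compile until this policy decides how to treat it.
pub fn accepts(purpose: ClockUse, provenance: ClockProvenance) -> bool {
    match (purpose, provenance) {
        (_, ClockProvenance::Untrusted) => false,
        (_, ClockProvenance::NtpSynced | ClockProvenance::PtpSynced) => true,
        (ClockUse::NonSecurity, ClockProvenance::MeasuredBootMonotonic) => true,
        (ClockUse::Security, ClockProvenance::MeasuredBootMonotonic) => false,
        // Operator-set time is rejected in this sketch; a deployment could
        // document a different choice for non-security uses.
        (_, ClockProvenance::ManualSet) => false,
    }
}
```

Avoiding a wildcard arm over provenance values is deliberate. A wildcard returning `true` would silently accept any label added later.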

## Phasing

### Phase 1 — WallClock Read and Provenance

**Status: landed (2026-05-24 09:31 UTC), fixed-boot-base variant.** The
`WallClock` read cap and `ClockProvenance` enum exist end-to-end: schema +
generated bindings, `kernel/src/cap/wall_clock.rs`, the `capos-config`
`wall_clock` kernel source, the `capos-rt` `WallClockClient`, and a shell `date`
command proven by `make run-shell`. The follow-up bullets below (manifest seed,
stateful `WallClockState`, init audit/TLS grants, WASM realtime clock) remain
Phase 1.x / Phase 2.

- Add `WallClock` interface and `ClockProvenance` enum to `schema/capos.capnp`.
  **Landed.**
- **Landed (fixed-boot-base variant):** the kernel cap derives UTC from a fixed
  compile-time base over the existing monotonic timebase and reports the
  fail-closed `untrusted` provenance (the `ClockProvenance` zero value). It is
  not read from firmware RTC and is not network-synchronized, so `untrusted` is
  the honest label; this also proves the zero-value fail-closed enum semantics
  end-to-end. A stateful `WallClockState` (UTC offset, provenance, last-sync
  tick, error bound) and a manifest `seedUtcSeconds` seed with
  `measuredBootMonotonic` provenance are deferred to Phase 1.x / Phase 2 where
  `ClockDiscipline` can upgrade the label.
- `cap/wall_clock.rs` implements the cap; `capos-rt` adds a typed client.
  **Landed** (`WallClockClient`, with a fail-closed `ClockProvenance::from_schema`
  unknown-variant decode).
- Init grants `WallClock` to audit service and TLS service in the manifest
  bundle. *(Deferred; the landed proof grants `wall_clock` directly to the
  shell-as-init in `system-shell.cue`.)*
- WASM host adapter: `clock_time_get(CLOCKID_REALTIME)` reads the instance's
  granted `WallClock`; if absent, returns `NOSYS` as before. *(Deferred.)*
- Smoke: a shell `date` command in `make run-shell` boots, reads `WallClock`,
  prints UTC seconds/nanos/monotonic plus the provenance label, and exits
  cleanly. **Landed** (asserted in `tools/qemu-shell-smoke.sh`).
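The fixed-boot-base variant amounts to a constant plus elapsed monotonic time. This is a minimal sketch and not the code in `kernel/src/cap/wall_clock.rs`. `BOOT_BASE_UTC_SECONDS` and its value are hypothetical, and `fixed_base_wall_time` is an illustrative name. The label is always `untrusted`, matching the landed behavior described above.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ClockProvenance {
    Untrusted,
    NtpSynced,
    PtpSynced,
    MeasuredBootMonotonic,
    ManualSet,
}

/// Hypothetical compile-time base: the UTC second assigned to monotonic zero.
pub const BOOT_BASE_UTC_SECONDS: i64 = 1_767_225_600; // 2026-01-01T00:00:00Z

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct WallTime {
    pub utc_seconds: i64,
    pub utc_nanos: u32,
    pub monotonic_ns: u64,
    pub provenance: ClockProvenance,
}

/// UTC = fixed base + elapsed monotonic time, always labeled `Untrusted`
/// because neither an RTC nor a network source was consulted.
pub fn fixed_base_wall_time(monotonic_ns: u64) -> WallTime {
    const NS_PER_S: u64 = 1_000_000_000;
    WallTime {
        // u64::MAX / 1e9 is about 1.8e10, so the cast and add cannot overflow i64.
        utc_seconds: BOOT_BASE_UTC_SECONDS + (monotonic_ns / NS_PER_S) as i64,
        utc_nanos: (monotonic_ns % NS_PER_S) as u32,
        monotonic_ns,
        provenance: ClockProvenance::Untrusted,
    }
}
```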

### Phase 2 — Clock Discipline and NTP Service

- Add `ClockDiscipline` interface to schema.
- Kernel implements `step()`, `slew()`, `setProvenance()`, and `syncState()`.
- A userspace NTP client process receives `ClockDiscipline` from init and
  synchronizes to a configured NTP server (requires `UdpSocket` from the
  networking capability).
- After the first successful sync, the NTP client calls
  `setProvenance(ntpSynced, errorBoundNs)`. All subsequent
  `WallClock.wallTime()` calls return `ntpSynced`.
- Audit entries timestamped post-sync carry `ntpSynced` provenance.
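The NTP client's per-measurement decision can be sketched against a local mirror of the `ClockDiscipline` methods. `STEP_THRESHOLD_NS`, `SLEW_RATE_NS_PER_S`, and `NtpClient` are hypothetical. Real servo and loop-filter behavior is still listed as owed research. Each measurement is either stepped or slewed. The provenance upgrade happens once, after the first successful sync.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ClockProvenance {
    Untrusted,
    NtpSynced,
    PtpSynced,
    MeasuredBootMonotonic,
    ManualSet,
}

/// Hypothetical policy: offsets larger than this are stepped, smaller slewed.
pub const STEP_THRESHOLD_NS: u64 = 128_000_000;
/// Hypothetical rate passed to `slew()`.
pub const SLEW_RATE_NS_PER_S: u32 = 500_000;

/// Local mirror of the `ClockDiscipline` methods this sketch calls.
pub trait ClockDiscipline {
    fn step(&mut self, delta_ns: i64);
    fn slew(&mut self, offset_ns: i64, max_rate_ns_per_s: u32);
    fn set_provenance(&mut self, provenance: ClockProvenance, error_bound_ns: u64);
}

/// Minimal client state: only whether the first sync has happened.
#[derive(Default)]
pub struct NtpClient {
    synced: bool,
}

impl NtpClient {
    /// Apply one measurement; `offset_ns` is server time minus local time.
    pub fn apply_sample<D: ClockDiscipline>(
        &mut self,
        disc: &mut D,
        offset_ns: i64,
        error_bound_ns: u64,
    ) {
        if offset_ns.unsigned_abs() > STEP_THRESHOLD_NS {
            disc.step(offset_ns);
        } else {
            disc.slew(offset_ns, SLEW_RATE_NS_PER_S);
        }
        // Upgrade the label once, after the first successful sync.
        if !self.synced {
            disc.set_provenance(ClockProvenance::NtpSynced, error_bound_ns);
            self.synced = true;
        }
    }
}
```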

### Phase 3 — PTP, Leap Second, and Suspend Recovery

- PTP hardware clock support for environments that have it.
- Leap-second policy: step vs. smear, configurable per `ClockDiscipline`.
- Suspend/resume: `WallClock` provenance downgrades to `measuredBootMonotonic`
  after a suspend event until NTP re-syncs. (Cross-links to the future
  power/suspend proposal; no dependency today.)
- Timezone delivery: a `Directory` namespace entry backed by tzdata is seeded
  from the manifest and delivered as a cap to timezone-aware services.
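Two of the Phase 3 rules can be sketched as pure functions. Both are assumptions, not settled design. `leap_correction_ns` compares a step policy with a linear smear over a window centered on the leap instant, which is one common choice. With a negative correction (-1 s for an inserted leap second), the `Step` branch produces the backwards jump the Design section avoids. `provenance_after_resume` drops synced labels to `measuredBootMonotonic` and never raises any other label.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ClockProvenance {
    Untrusted,
    NtpSynced,
    PtpSynced,
    MeasuredBootMonotonic,
    ManualSet,
}

/// Leap-second handling choices named in Phase 3.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LeapPolicy {
    /// Apply the whole correction at the leap instant.
    Step,
    /// Spread it linearly over a window centered on the leap instant.
    Smear { window_ns: u64 },
}

/// Correction (ns) to add to the uncorrected wall clock at `now_ns` for a
/// leap whose full correction is `total_ns` (-1 s for an inserted second).
pub fn leap_correction_ns(policy: LeapPolicy, now_ns: i128, leap_at_ns: i128, total_ns: i64) -> i64 {
    let window = match policy {
        LeapPolicy::Smear { window_ns } if window_ns > 0 => i128::from(window_ns),
        // Step, or a zero-width smear, applies everything at the leap instant.
        _ => return if now_ns >= leap_at_ns { total_ns } else { 0 },
    };
    let start = leap_at_ns - window / 2;
    let elapsed = (now_ns - start).clamp(0, window);
    // |total| * elapsed / window <= |total|, so the cast back to i64 is lossless.
    (i128::from(total_ns) * elapsed / window) as i64
}

/// Assumed rule: a suspend invalidates network sync, so synced labels fall
/// back to `MeasuredBootMonotonic` until NTP re-syncs. Other labels are left
/// unchanged, so a resume can never raise trust.
pub fn provenance_after_resume(p: ClockProvenance) -> ClockProvenance {
    match p {
        ClockProvenance::NtpSynced | ClockProvenance::PtpSynced => {
            ClockProvenance::MeasuredBootMonotonic
        }
        other => other,
    }
}
```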

## Hazards and Invariants

**Monotonic vs. wall-clock relationship.** The wall-clock state is an offset
applied to the `Timer` monotonic base. `step()` changes the offset; the
underlying monotonic timeline never goes backward. Callers that need monotonic
guarantees must use `Timer.now()`; callers that need calendar time use
`WallClock.wallTime()`. This separation prevents a clock step from violating
monotonicity promises made to schedulers or ring timeouts.

**ABI stability.** `ClockProvenance` enum variants must only be added, never
removed or reordered. Binaries compiled against an older schema that see an
unrecognized provenance value must treat it as `untrusted` (fail-closed). Two
mechanisms combine here. First, Cap'n Proto decodes an unset or default enum
field as its `@0` value, so the schema puts `untrusted @0` first: the zero
default is fail-closed rather than the most-trusted value. Second, a value
added by a newer schema is not zero, and capnp-rust surfaces it as an unknown
(`NotInSchema`) value instead. The generated bindings therefore need an
explicit unknown-variant path that maps it to `untrusted`, as the landed
`ClockProvenance::from_schema` decode does.
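The fail-closed decode can be shown independently of generated code by working on the raw 16-bit ordinal that Cap'n Proto stores for an enum. `from_wire` is a hypothetical name, not the landed `from_schema`. The sketch also marks `Untrusted` as the Rust `Default`, so zero-initialized state agrees with the wire default.

```rust
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum ClockProvenance {
    /// Ordinal 0: the wire default and the Rust default are both fail-closed.
    #[default]
    Untrusted,
    NtpSynced,
    PtpSynced,
    MeasuredBootMonotonic,
    ManualSet,
}

impl ClockProvenance {
    /// Decode the schema ordinal (`@N`). Any value this binary does not
    /// know, for example one added by a newer schema, fails closed.
    pub fn from_wire(raw: u16) -> Self {
        match raw {
            0 => Self::Untrusted,
            1 => Self::NtpSynced,
            2 => Self::PtpSynced,
            3 => Self::MeasuredBootMonotonic,
            4 => Self::ManualSet,
            _ => Self::Untrusted,
        }
    }
}
```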

**DMA and IRQ neutrality.** `WallClock` and `ClockDiscipline` do not touch
device memory, DMA pools, or interrupt grant paths. They are pure kernel-state
caps. No DMA/MMIO/IRQ hazard applies.

**No capability-transfer amplification.** `WallClock` is a read-only snapshot
surface. Transferring it to another process does not grant clock-setting
authority. `ClockDiscipline` must not be transferable through normal cap-grant
paths; it should be restricted to init-owned grant at boot and explicit
manifest-operator grants.
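The transfer restriction reduces to a check on the grant path. This is a minimal sketch in which `CapKind`, `GrantPath`, and `grant_allowed` are hypothetical kernel-side names. `WallClock` may move over any path. `ClockDiscipline` is accepted only from init at boot or from an explicit manifest-operator grant.

```rust
/// Clock-related cap kinds (hypothetical kernel tag).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CapKind {
    WallClock,
    ClockDiscipline,
}

/// How a cap is reaching its new holder.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum GrantPath {
    /// Init distributing caps at boot.
    InitBoot,
    /// An explicit operator grant written into the manifest.
    ManifestOperator,
    /// Ordinary runtime transfer between processes.
    RuntimeTransfer,
}

/// Read-only `WallClock` moves freely; `ClockDiscipline` only via init or manifest.
pub fn grant_allowed(kind: CapKind, path: GrantPath) -> bool {
    match kind {
        CapKind::WallClock => true,
        CapKind::ClockDiscipline => {
            matches!(path, GrantPath::InitBoot | GrantPath::ManifestOperator)
        }
    }
}
```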

## Relevant Research and Prior Art

### In-Tree Grounding

- [NO_HZ, SQPOLL, and Realtime Scheduling](../research/nohz-sqpoll-realtime.md)
  records the Linux timer-stack split between **clock sources** (monotonic
  timeline counters) and **clock events** (hardware devices that interrupt at
  selected future times), and concludes capOS should "introduce a monotonic
  `now_ns` clocksource layer" distinct from the scheduler tick. This proposal
  builds directly on that separation: `Timer.now()`/`WallClock.wallTime()`
  expose the clocksource timeline, while clock-event programming stays a
  scheduler concern. The wall-clock offset rides on the same monotonic base so
  a clock step never rewinds the timeline the scheduler and ring timeouts
  depend on — the monotonicity invariant called out in that note.
- [Future Scheduler Architecture](../research/future-scheduler-architecture.md)
  reinforces the same clocksource/clockevent boundary and the lesson that
  absolute-deadline waiters should be stored by expiry time, not periodic tick
  count. That confirms `WallClock` must not become the deadline substrate:
  deadlines remain monotonic, and wall-clock time is a separate, disciplinable
  view layered on top.

### External Precedent and Lessons

- **Linux `clock_gettime` / `adjtimex`.** Linux exposes distinct clock IDs
  (`CLOCK_MONOTONIC` vs `CLOCK_REALTIME`) and gates clock discipline behind a
  privileged interface: `adjtimex`/`clock_adjtime` and stepping the realtime
  clock require `CAP_SYS_TIME`. *Lesson:* reading time and disciplining time are
  different authorities. capOS encodes this as a read-only `WallClock` cap held
  by ordinary services and a separate, stronger `ClockDiscipline` cap held only
  by a designated sync service — the capability-native analog of the
  read/`CAP_SYS_TIME` split.
- **Linux time namespaces.** `CLOCK_MONOTONIC`/`CLOCK_BOOTTIME` offsets can be
  virtualized per namespace so a container observes a different boot/monotonic
  origin. *Lesson:* time can be a per-context value rather than a single global
  ambient fact, which supports delivering wall-clock as a granted, attenuable
  cap (and timezone data as a scoped `Directory`) instead of a process-wide
  environment.
- **Fuchsia/Zircon UTC clock objects.** Fuchsia models UTC as a kernel clock
  object distributed to processes as **read-only** handles, with a separate
  privileged maintainer service holding the write handle that disciplines the
  clock; clock reads carry an error bound and a "started/synced" signal so a
  reader can tell whether the clock is yet trustworthy. *Lesson:* this is the
  closest capability-native precedent for the design here. capOS's
  read-only `WallClock` with a `ClockProvenance` label maps to Fuchsia's
  read-only UTC handle plus its synced/error-bound signal, and
  `ClockDiscipline` maps to the single write-handle maintainer. (The in-tree
  Zircon report covers handles, rights, and VMOs but not the UTC clock object
  specifically; the UTC-clock mapping is external precedent, not yet captured
  as an in-tree research note.)
- **NTP step vs. slew.** NTP daemons step the clock for large offsets and slew
  (bounded rate adjustment) for small drift, precisely because abruptly
  rewinding wall time breaks timestamp ordering and timeouts. *Lesson:* capOS
  exposes `step()` and `slew()` as distinct `ClockDiscipline` methods rather
  than a single "set time", so the discipline policy is explicit at the cap
  boundary.
- **IEEE-1588 PTP.** Precision Time Protocol provides sub-microsecond hardware
  timestamping via a dedicated hardware clock, distinct from software NTP.
  *Lesson:* provenance is not binary. The `ptpSynced` vs `ntpSynced` distinction
  in `ClockProvenance` lets a validator that needs high-precision time
  distinguish the two without conflating accuracy with mere network sync.

### Dedicated Research Note

- [Time and Clock Authority](../research/time-and-clock-authority.md) is the
  focused prior-art survey for this proposal: verified Linux `CAP_SYS_TIME`
  read/discipline split, time namespaces as per-context clock offsets,
  chrony/NTP step/slew/smear discipline, PTP/IEEE-1588 hardware timestamping,
  Fuchsia's `ZX_RIGHT_READ`/`ZX_RIGHT_WRITE` UTC clock object, and leap-second
  smearing vs stepping, each with its capOS lesson and real sources. It is the
  primary external grounding for `WallClock`, `ClockDiscipline`, and
  `ClockProvenance`.

Residual research still owed before Phase 2/3 implementation: the servo /
loop-filter behavior, holdover and error-bound estimation, and suspend/resume
clock recovery are the highest-risk underspecified areas and should be deepened
in that note (or a follow-on) rather than fixed by this proposal's sketch.

## Relevant Proposals

- [Certificates and TLS](certificates-and-tls-proposal.md) — TLS validation
  delegates certificate validity-window checks to a granted `WallClock`.
- [OIDC and OAuth2](oidc-and-oauth2-proposal.md) — Token expiry checks
  (`exp`, `iat`, `nbf`) use a granted `WallClock` with at least
  `measuredBootMonotonic` provenance.
- [WASI Host Adapter](wasi-host-adapter-proposal.md) — Phase W.3+
  `clock_time_get(CLOCKID_REALTIME)` backed by a per-instance `WallClock` cap.
- [Cloud Metadata](cloud-metadata-proposal.md) — Cloud instance launch time
  delivered through the metadata capability; the `WallClock` seed path
  integrates with this bootstrap.
- [System Monitoring](system-monitoring-proposal.md) — Audit records carry
  `ClockProvenance`-labeled timestamps from the audit service's own `WallClock`
  read at ingestion.
- [Storage and Naming](storage-and-naming-proposal.md) — Timezone and locale
  data delivered as a read-only `Directory` cap, not an ambient environment.
