Research: Time and Clock Authority in Operating Systems

This note records verified external grounding for capOS’s time and clock authority design. It covers Linux clock IDs and the privilege model, time namespaces, NTP/chrony discipline, PTP/IEEE-1588, Fuchsia’s UTC clock object, and leap-second handling. Findings feed directly into the WallClock / ClockDiscipline / ClockProvenance design in Time and Clock.


1. Linux: Clock IDs and the Read/Discipline Split

Clock IDs

Linux exposes multiple clock IDs through clock_gettime(2):

  • CLOCK_REALTIME: settable system-wide wall clock, measuring time since the Unix epoch. Jumps forward or backward when set with settimeofday(2) or clock_settime(2) or stepped by an NTP daemon, and is also affected by the incremental adjustments of adjtime(3) and NTP. Requires CAP_SYS_TIME to set.
  • CLOCK_MONOTONIC: non-settable system-wide monotonic clock, counting from an unspecified point in the past (on Linux, system boot). Unaffected by discontinuous jumps such as NTP steps; only the incremental (frequency) adjustments of adjtime(3) and NTP affect it. Does not count time spent in suspend.
  • CLOCK_BOOTTIME: identical to CLOCK_MONOTONIC except that it also counts time spent in suspend. Non-settable. Useful for suspend-aware timers that must avoid the discontinuities of CLOCK_REALTIME.
  • CLOCK_TAI: non-settable clock derived from wall-clock time but counting leap seconds (TAI = International Atomic Time). Unlike CLOCK_REALTIME, it has no discontinuity when a leap second is inserted.
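The difference between these clocks is visible from ordinary user code. The Rust standard library documents Instant as backed by the monotonic clock of clock_gettime on Unix, and SystemTime by the realtime clock. The minimal sketch below assumes a Linux host and uses a hypothetical helper, wall_elapsed, to show why a realtime difference must be treated as fallible (a backward step makes it negative) while a monotonic difference need not be.

```rust
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

/// Elapsed wall-clock time between two realtime readings.
/// Returns Err(size of the backward jump) if `later` is earlier than
/// `earlier`, e.g. because the clock was stepped back between the reads.
fn wall_elapsed(earlier: SystemTime, later: SystemTime) -> Result<Duration, Duration> {
    later.duration_since(earlier).map_err(|e| e.duration())
}

fn main() {
    // Monotonic reading: on Linux, Instant is backed by CLOCK_MONOTONIC,
    // so an elapsed duration can never be negative.
    let start = Instant::now();
    let mono = start.elapsed();

    // Realtime reading: SystemTime is backed by CLOCK_REALTIME.
    let since_epoch = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("realtime clock is set before 1970");
    println!("monotonic elapsed {mono:?}; {} s since the epoch", since_epoch.as_secs());

    // Simulated 1 s backward step between two realtime reads.
    let before = UNIX_EPOCH + Duration::from_secs(100);
    let after_step = UNIX_EPOCH + Duration::from_secs(99);
    match wall_elapsed(before, after_step) {
        Ok(d) => println!("elapsed {d:?}"),
        Err(back) => println!("realtime clock went backward by {back:?}"),
    }
}
```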

The CAP_SYS_TIME Privilege

CAP_SYS_TIME gates the operations that modify the kernel clock: settimeofday(2), clock_settime(2), stime(2), adjtimex(2)/clock_adjtime(2) when modes != 0, and setting the hardware RTC. Reading the clock, including a read-only adjtimex call with modes = 0, requires no privilege. The clock_adjtime(2) variant (added in Linux 2.6.39) accepts an additional clk_id argument, so callers can target a specific clock rather than only the system-wide realtime clock.

Concretely: any process can call clock_gettime(CLOCK_REALTIME, &ts) without privilege, while only a process holding CAP_SYS_TIME (typically the NTP daemon) can call adjtimex() with nonzero modes or clock_settime(CLOCK_REALTIME, &ts).

Lesson for capOS

This is the direct prior art for splitting WallClock (read-only cap, granted to ordinary processes) from ClockDiscipline (stronger cap, held only by the designated sync service). The Linux CAP_SYS_TIME flag is a coarse ambient privilege bit; capOS encodes the same split as two distinct capability types, with no ambient privilege required and no escalation path between them.
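The two-capability split can be sketched in plain Rust. This is a hypothetical model, not the capOS interface: WallClock, ClockDiscipline, ClockState, boot, and the nanosecond-offset representation are all assumptions. WallClock can be cloned and passed on but exposes only reads; ClockDiscipline is deliberately not Clone, and nothing converts a WallClock back into a ClockDiscipline, so holding the weaker cap gives no escalation path.

```rust
use std::sync::{Arc, Mutex};

/// Shared state: offset (ns) from the monotonic base to the Unix epoch.
struct ClockState {
    wall_offset_ns: i128,
}

/// Read-only capability: cloneable, no mutating methods.
#[derive(Clone)]
struct WallClock {
    state: Arc<Mutex<ClockState>>,
}

/// Discipline capability: not Clone, and there is no WallClock -> ClockDiscipline conversion.
struct ClockDiscipline {
    state: Arc<Mutex<ClockState>>,
}

impl WallClock {
    /// Wall time (ns since the epoch) corresponding to a monotonic reading.
    fn wall_time_ns(&self, mono_ns: u64) -> i128 {
        mono_ns as i128 + self.state.lock().unwrap().wall_offset_ns
    }
}

impl ClockDiscipline {
    /// Step the shared offset; every WallClock derived from it observes the change.
    fn step(&self, delta_ns: i128) {
        self.state.lock().unwrap().wall_offset_ns += delta_ns;
    }

    /// Mint a read-only view for an ordinary process.
    fn reader(&self) -> WallClock {
        WallClock { state: Arc::clone(&self.state) }
    }
}

/// Boot-time construction: the returned ClockDiscipline goes only to the sync service.
fn boot(initial_offset_ns: i128) -> ClockDiscipline {
    ClockDiscipline { state: Arc::new(Mutex::new(ClockState { wall_offset_ns: initial_offset_ns })) }
}

fn main() {
    let discipline = boot(1_000);
    let app_clock = discipline.reader();
    let forwarded = app_clock.clone(); // read caps can be passed on freely
    println!("{}", app_clock.wall_time_ns(500));
    discipline.step(-200);
    println!("{}", forwarded.wall_time_ns(500));
}
```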


2. Linux Time Namespaces

What Is Namespaced

Linux time namespaces (added in Linux 5.6) let processes inside a namespace observe different values for CLOCK_MONOTONIC and CLOCK_BOOTTIME than the host. The per-namespace offsets are written to /proc/pid/timens_offsets before any process enters the namespace; once the first process has entered, writes return EACCES. The format is:

<clock-id> <offset-secs> <offset-nanosecs>

CLOCK_REALTIME is deliberately not namespaced: the time_namespaces(7) man page cites “reasons of complexity and overhead” within the kernel. In practice, CLOCK_REALTIME is already settable, and its step/slew machinery is global rather than per-namespace.

The offsets are constant values (whole seconds plus nanoseconds); there is no per-namespace frequency correction or NTP discipline. The feature exists mainly for container checkpoint/restore and migration (CRIU), where the monotonic and boot-time clocks must appear consistent before and after the move.
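A parser for the timens_offsets line format shows how small the namespaced state is: a clock selector plus a constant offset added to the host reading. This sketch assumes that a clock is named either monotonic/boottime or by its numeric ID (1 for CLOCK_MONOTONIC, 7 for CLOCK_BOOTTIME) and that offset-nanosecs lies in [0, 999999999]; parse_line and apply are hypothetical helpers and never touch /proc.

```rust
/// Clocks that a Linux time namespace can offset.
#[derive(Debug, Clone, Copy, PartialEq)]
enum NsClock {
    Monotonic,
    Boottime,
}

/// One parsed line of /proc/PID/timens_offsets.
#[derive(Debug, PartialEq)]
struct Offset {
    clock: NsClock,
    secs: i64,
    nanos: u32,
}

/// Parse "<clock-id> <offset-secs> <offset-nanosecs>".
/// Accepts the clock name or its numeric ID (1 = CLOCK_MONOTONIC, 7 = CLOCK_BOOTTIME).
fn parse_line(line: &str) -> Result<Offset, String> {
    let mut fields = line.split_whitespace();
    let clock = match fields.next() {
        Some("monotonic") | Some("1") => NsClock::Monotonic,
        Some("boottime") | Some("7") => NsClock::Boottime,
        other => return Err(format!("unsupported clock id: {other:?}")),
    };
    let secs = fields
        .next()
        .ok_or("missing offset-secs")?
        .parse::<i64>()
        .map_err(|e| e.to_string())?;
    let nanos = fields
        .next()
        .ok_or("missing offset-nanosecs")?
        .parse::<u32>()
        .map_err(|e| e.to_string())?;
    if nanos >= 1_000_000_000 {
        return Err("offset-nanosecs out of range".to_string());
    }
    Ok(Offset { clock, secs, nanos })
}

/// Value a process inside the namespace observes for a host reading (ns).
fn apply(offset: &Offset, host_ns: i128) -> i128 {
    host_ns + offset.secs as i128 * 1_000_000_000 + offset.nanos as i128
}

fn main() {
    let table = "monotonic 86400 0\nboottime -5 500000000";
    for line in table.lines() {
        let off = parse_line(line).expect("valid line");
        println!("{:?} -> {}", off.clock, apply(&off, 10_000_000_000));
    }
}
```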

Lesson for capOS

Time is not an ambient global fact — it can be a per-context offset applied to a shared monotonic base. capOS’s WallClock cap fits this shape directly: the cap object holds the offset from the kernel monotonic timeline to the wall epoch, and different processes can hold caps with different offsets (timezone, test-clock injection, container clock virtualization). Freezing offsets at namespace creation maps to the capOS invariant that WallClock cannot be retroactively shifted by the holder — only ClockDiscipline can adjust the shared reference.


3. NTP Discipline: chrony and ntpd

Step vs. Slew

NTP daemons correct clock drift using two mechanisms:

  • Slew (gradual): run the clock slightly fast or slow until the offset is absorbed. Linux adjtime(3) and adjtimex(ADJ_OFFSET) implement slewing; the kernel bounds the rate to 500 ppm and clamps an ADJ_OFFSET correction to ±0.5 seconds. Because readings keep advancing, slewing preserves monotonicity.
  • Step (abrupt): set the clock directly to the reference value. A backward step breaks timestamp ordering for any process comparing consecutive readings across it; a forward step makes an interval vanish.

chrony makestep: the directive makestep threshold limit allows stepping when the offset exceeds threshold seconds, but only within the first limit clock updates. For example, makestep 1.0 3 steps offsets larger than 1 second during the first three updates and only slews thereafter, which protects running applications from abrupt clock changes once the system is up. A negative limit removes the update-count restriction, allowing a step at any time.
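The makestep rule reduces to a small decision function. The sketch below is a simplified model, not chrony's code: it assumes updates are counted from 1 since chronyd started and that the offset is compared by absolute value; makestep and Correction are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Correction {
    Step,
    Slew,
}

/// Simplified model of chrony's `makestep threshold limit` rule.
/// `update` is the 1-based count of clock updates since chronyd started;
/// a negative `limit` means "no restriction on the update count".
fn makestep(offset_s: f64, threshold_s: f64, limit: i32, update: u32) -> Correction {
    let within_limit = limit < 0 || update <= limit as u32;
    if offset_s.abs() > threshold_s && within_limit {
        Correction::Step
    } else {
        Correction::Slew
    }
}

fn main() {
    // makestep 1.0 3: a 2 s offset is stepped on update 2 but slewed on update 4.
    for (update, offset) in [(2, 2.0), (4, 2.0), (1, 0.4)] {
        println!("update {update}, offset {offset} s -> {:?}", makestep(offset, 1.0, 3, update));
    }
}
```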

Leap Second Handling (leapsecmode)

chrony’s leapsecmode directive supports four ways of handling a UTC leap second:

  • system (default): the kernel steps the clock at the UTC boundary.
  • step: chronyd performs the step rather than delegating to the kernel.
  • slew: chronyd absorbs the leap second by slewing; with the default maxslewrate on Linux (83333.333 ppm), the correction takes about 12 seconds.
  • ignore: no automatic correction; the offset is absorbed during normal tracking.

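For servers distributing time to clients unaware of leap seconds, chrony combines leapsecmode slew with smoothtime to smear the correction much more gradually. In the configuration example in the chrony documentation (maxslewrate 1000 with smoothtime 400 0.001 leaponly), the smoothing takes about 17 hours 34 minutes.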
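The durations quoted above follow from simple arithmetic. This sketch assumes a constant slew rate for leapsecmode slew, and for smoothtime a simplified two-ramp profile in which the frequency offset grows at the wander rate and then shrinks symmetrically without reaching the configured maximum (400 ppm here). Under those assumptions, 1 s at 83333.333 ppm takes 12 s, the same second at 500 ppm would take 2000 s, and a wander of 0.001 ppm/s gives roughly 17 h 34 min with a peak frequency offset near 31.6 ppm. slew_duration_s and smooth_profile are hypothetical helpers.

```rust
/// Seconds needed to slew away `offset_s` at a constant rate of `rate_ppm`.
fn slew_duration_s(offset_s: f64, rate_ppm: f64) -> f64 {
    offset_s.abs() / (rate_ppm * 1e-6)
}

/// Simplified smoothing model: the frequency offset ramps up at
/// `wander_ppm_per_s`, then ramps back down symmetrically, never reaching
/// the configured maximum. Each ramp of length h removes w*h^2/2 seconds,
/// so both ramps together remove w*h^2 = offset.
/// Returns (total duration in s, peak frequency offset in ppm).
fn smooth_profile(offset_s: f64, wander_ppm_per_s: f64) -> (f64, f64) {
    let w = wander_ppm_per_s * 1e-6; // frequency change per second (1/s^2)
    let h = (offset_s.abs() / w).sqrt(); // length of each ramp in seconds
    (2.0 * h, w * h * 1e6)
}

fn main() {
    println!("leapsecmode slew at 83333.333 ppm: {:.1} s", slew_duration_s(1.0, 83_333.333));
    println!("same second at 500 ppm: {:.0} s", slew_duration_s(1.0, 500.0));
    let (t, peak) = smooth_profile(1.0, 0.001);
    let (h, m) = ((t / 3600.0) as u32, ((t % 3600.0) / 60.0) as u32);
    println!("smoothtime with 0.001 ppm/s wander: {h} h {m} min, peak {peak:.3} ppm");
}
```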

Sync State Exposure

chronyc tracking reports the reference source, stratum, system time offset, frequency error, and RMS offset; chronyc sourcestats shows per-source statistics. These are the client-visible trust and sync signals that a capOS ClockProvenance would encode: a sync-source flag (ntpSynced or ptpSynced) plus an error bound.

Lesson for capOS

ClockDiscipline.step() and ClockDiscipline.slew() as distinct cap methods are justified by this split: an NTP daemon that calls step() at startup but only slew() at steady state exposes its policy at the capability boundary. Callers that need monotonic-safe time can check ClockProvenance to distinguish a recently-stepped clock from a stably-slewed one.


4. PTP / IEEE-1588: Hardware Timestamping

What PTP Provides

IEEE 1588 Precision Time Protocol synchronizes clocks using timestamps captured by NIC hardware at the Media Independent Interface (MII) boundary, typically within 100 ns of frame ingress/egress. This eliminates software scheduling jitter that limits NTP to millisecond accuracy. With hardware support, PTP achieves sub-microsecond accuracy.

Linux implements PTP through ptp4l (the daemon running the protocol state machine) and phc2sys (which typically synchronizes the system clock to the NIC’s PTP hardware clock, or PHC). ptp4l can configure a system as an Ordinary Clock (single port) or a Boundary Clock (multiple ports).

Use Cases vs. NTP

NTP is adequate for general server synchronization (sub-10 ms, typically 1–10 ms LAN, sub-ms with GPS). PTP is used where sub-microsecond accuracy is required: industrial automation, 5G RAN timing, financial trading, and audio/video bridging (AVB/TSN). The distinction is hardware timestamping support in the NIC and a local Grandmaster or GNSS-disciplined boundary clock.

Lesson for capOS

Provenance is not binary (synced vs. unsynced). The ptpSynced vs ntpSynced distinction in ClockProvenance is justified: a process requiring microsecond timestamps for audio-visual synchronization or hardware scheduling needs to distinguish PTP discipline from NTP discipline. A cap validator checking ClockProvenance before accepting a timestamp for a hard real-time claim should require ptpSynced and an error bound below the application’s tolerance.
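A validator over a non-binary provenance label could look like the following. SyncSource, ClockProvenance, meets, and accept_for_hard_realtime are hypothetical names; the sketch models ntpSynced/ptpSynced as an ordered enum so a policy can demand "at least NTP" for logging or "PTP only" for a hard real-time claim, always combined with an error-bound check.

```rust
/// Hypothetical sync-source ranking; the derived Ord makes Ptp > Ntp > Unsynced.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum SyncSource {
    Unsynced,
    Ntp,
    Ptp,
}

/// Hypothetical provenance label carried with a wall-clock reading.
#[derive(Debug, Clone, Copy)]
struct ClockProvenance {
    source: SyncSource,
    error_bound_ns: u64,
}

/// Generic check: source at least `min_source`, error bound under `tolerance_ns`.
fn meets(p: &ClockProvenance, min_source: SyncSource, tolerance_ns: u64) -> bool {
    p.source >= min_source && p.error_bound_ns < tolerance_ns
}

/// Policy for a hard real-time timestamp claim: PTP discipline only.
fn accept_for_hard_realtime(p: &ClockProvenance, tolerance_ns: u64) -> bool {
    meets(p, SyncSource::Ptp, tolerance_ns)
}

fn main() {
    let ptp = ClockProvenance { source: SyncSource::Ptp, error_bound_ns: 200 };
    let ntp = ClockProvenance { source: SyncSource::Ntp, error_bound_ns: 2_000_000 };
    println!("ptp ok for 1 us: {}", accept_for_hard_realtime(&ptp, 1_000));
    println!("ntp ok for 1 us: {}", accept_for_hard_realtime(&ntp, 1_000));
    println!("ntp ok for 10 ms logging: {}", meets(&ntp, SyncSource::Ntp, 10_000_000));
}
```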


5. Fuchsia / Zircon: UTC Clock Objects

Clock as a Kernel Object

Fuchsia models UTC time as a first-class kernel clock object (created with zx_clock_create() and accessed through a handle), not as a syscall or global variable. A clock is a one-dimensional affine transformation of the monotonic reference timeline, maintained atomically and observed through typed operations.

Rights Model

Zircon clock handles carry typed rights:

  • ZX_RIGHT_READ: permits zx_clock_read() (read current time) and zx_clock_get_details() (read transformation parameters and error bound).
  • ZX_RIGHT_WRITE: permits zx_clock_update() — adjusting the clock’s absolute value, frequency (in ppm), and error bound (in nanoseconds).

Any process holding ZX_RIGHT_WRITE acts as a clock maintainer. There is no separate “maintain” right; the write right IS the maintain authority.

Monotonic option: clocks created with ZX_CLOCK_OPT_MONOTONIC reject any zx_clock_update() that would cause the clock to go backward.

Continuous option: clocks created with ZX_CLOCK_OPT_CONTINUOUS allow setting the absolute value only on the first update; subsequent absolute-value changes are rejected, allowing only frequency adjustments.
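The affine model and the two creation options can be captured in a few lines. This is a simplified, hypothetical model of the behavior described above, not the zx_clock_* API: Transform, Clock, and UpdateError are assumptions, times are integer nanoseconds, the rate adjustment is in ppm, and a new clock is simply treated as running from zero, ignoring start-up and backstop behavior.

```rust
/// Affine map from the monotonic reference timeline to clock time, in ns.
#[derive(Debug, Clone, Copy)]
struct Transform {
    ref_mono: i64,
    ref_clock: i64,
    ppm: i32,
}

impl Transform {
    fn apply(&self, mono: i64) -> i64 {
        let dt = (mono - self.ref_mono) as i128;
        self.ref_clock + (dt + dt * self.ppm as i128 / 1_000_000) as i64
    }
}

#[derive(Debug, PartialEq)]
enum UpdateError {
    WouldGoBackward,
    ValueAlreadySet,
}

/// Simplified model of a clock with the monotonic / continuous options.
struct Clock {
    t: Transform,
    monotonic: bool,
    continuous: bool,
    value_set: bool,
}

impl Clock {
    fn new(monotonic: bool, continuous: bool) -> Self {
        Clock { t: Transform { ref_mono: 0, ref_clock: 0, ppm: 0 }, monotonic, continuous, value_set: false }
    }

    fn read(&self, mono: i64) -> i64 {
        self.t.apply(mono)
    }

    /// Update at monotonic time `now`: optionally set a new absolute value and/or rate.
    /// All checks run before any state changes, so a rejected update has no effect.
    fn update(&mut self, now: i64, value: Option<i64>, ppm: Option<i32>) -> Result<(), UpdateError> {
        let current = self.read(now);
        if let Some(v) = value {
            if self.continuous && self.value_set {
                return Err(UpdateError::ValueAlreadySet);
            }
            if self.monotonic && v < current {
                return Err(UpdateError::WouldGoBackward);
            }
            self.value_set = true;
        }
        self.t = Transform {
            ref_mono: now,
            ref_clock: value.unwrap_or(current),
            ppm: ppm.unwrap_or(self.t.ppm),
        };
        Ok(())
    }
}

fn main() {
    let mut utc_like = Clock::new(false, false); // neither option: may be stepped back
    utc_like.update(0, Some(10_000), None).unwrap();
    utc_like.update(0, Some(9_000), None).unwrap();
    println!("stepped back to {}", utc_like.read(0));

    let mut cont = Clock::new(true, true);
    cont.update(100, Some(1_000), None).unwrap();
    println!("second value set: {:?}", cont.update(300, Some(5_000), None));
}
```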

UTC Maintainer Service

All components started by Fuchsia’s Component Manager receive a UTC clock handle with read-only rights. Only the Timekeeper service receives the write handle. Timekeeper synchronizes against an RTC or a network time source and calls zx_clock_update() to discipline the UTC clock.

The UTC clock has a “backstop” guarantee: it never reports a time earlier than the timestamp of the latest build commit (the backstop value). Before Timekeeper first synchronizes, the clock may be in a fixed state (stopped at backstop) or running-but-unsynchronized state. Fuchsia documents that the UTC clock “is neither monotonic nor continuous” — Timekeeper may step it backward when corrections are needed. Callers needing a reliable timestamp must query the clock details to determine whether the clock has been synchronized.

Lesson for capOS

This is the closest capability-native precedent for capOS’s design. The mapping:

| Fuchsia/Zircon | capOS |
| --- | --- |
| Clock kernel object with ZX_RIGHT_READ handle | WallClock capability (read-only) |
| Clock handle with ZX_RIGHT_WRITE held by Timekeeper | ClockDiscipline capability (init-granted) |
| zx_clock_get_details() error bound and sync signal | ClockProvenance label on WallClock |
| Backstop guarantee (never before build timestamp) | Provenance downgrades on suspend/resume or loss of sync |
| ZX_CLOCK_OPT_MONOTONIC flag | The invariant that the Timer.now() monotonic base is never adjusted |

The Fuchsia UTC design confirms that the right model is: one strong-authority maintainer, many read-only observers, with a typed signal for trust state. capOS extends this by making provenance an explicit labeled field on the cap rather than a query-on-demand operation.


6. Leap Seconds and Clock Steps: Smearing vs. Stepping

The Problem

UTC inserts or deletes leap seconds at irregular intervals, decided by the International Earth Rotation and Reference Systems Service (IERS). Inserting a leap second means UTC has a second labeled 23:59:60 before rolling to midnight, creating a discontinuity in POSIX time, which assumes every day has exactly 86,400 seconds. Deleting a leap second would mean skipping 23:59:59.

For software:

  • Stepping: CLOCK_REALTIME jumps by one second at the UTC boundary (backward on insertion, forward on deletion). An application comparing two CLOCK_REALTIME readings across the boundary can see a negative elapsed time (on insertion) or an elapsed time one second too long (on deletion). CLOCK_MONOTONIC must not step; it continues forward through the leap second unaffected.
  • Slewing/Smearing: the correction is distributed over a window. No discontinuity occurs, but CLOCK_REALTIME temporarily deviates from true UTC during the smear window.

Industry Smear Practice

Google has smeared leap seconds since 2008; its current practice is a 24-hour linear smear (noon to noon UTC), in which each second in the smear window is about 11.6 µs longer than an SI second (1 s spread over 86,400 s). AWS’s Amazon Time Sync Service applies the same 24-hour noon-to-noon linear smear automatically. Both services suppress the leap second indicator in their NTP responses so clients do not attempt their own step.

The smear approach means that any client synchronized to Google Public NTP or Amazon Time Sync is not tracking true UTC during the smear window; it tracks “smeared UTC”, which is coordinated across clients but not the same as civil UTC. This is a design choice that accepts brief inaccuracy in exchange for a wall clock that never steps.
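The linear smear can be modeled directly. This hypothetical sketch assumes a single inserted leap second at midnight, a 24-hour window centered on it, and a reference clock that steps back one second at the leap; it computes how far the smeared clock sits from that stepped UTC. Under these assumptions the deviation grows to 0.5 s on either side of midnight, which is the worst-case gap between smeared and civil time.

```rust
/// Linear noon-to-noon smear window (24 h) for one inserted leap second.
const WINDOW_S: f64 = 86_400.0;

/// Portion of the leap second already absorbed, `x` SI seconds after the window opens.
fn smear_applied_s(x: f64) -> f64 {
    (x / WINDOW_S).clamp(0.0, 1.0)
}

/// Smeared clock minus a clock that steps back 1 s at the leap (midnight, the
/// window midpoint), both expressed as POSIX-style readings.
fn deviation_from_stepped_utc(x: f64) -> f64 {
    let applied = smear_applied_s(x);
    if x < WINDOW_S / 2.0 {
        -applied // before the leap: smeared clock lags
    } else {
        1.0 - applied // after the step: smeared clock is ahead until noon
    }
}

fn main() {
    println!("extra length per smeared second: {:.2} µs", smear_applied_s(1.0) * 1e6);
    for x in [0.0, 21_600.0, 43_199.9, 43_200.0, 64_800.0, 86_400.0] {
        println!("t+{x:>8} s: deviation {:+.4} s", deviation_from_stepped_utc(x));
    }
}
```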

CLOCK_MONOTONIC Must Not Jump

CLOCK_MONOTONIC is specifically designed to be immune to steps. Linux documents it as “nonsettable” — no process can set it; only frequency adjustments are permitted. The rationale: timers, timeouts, and scheduling deadlines depend on monotonic ordering. Any step in the monotonic timeline would silently break all in-flight waiters.

Lesson for capOS

The monotonic timeline (Timer.now()) must be the invariant substrate. WallClock is a separate, disciplinable offset layered on top. A ClockDiscipline.step() call adjusts the wall-clock offset without touching the monotonic base — ensuring in-flight ring timeouts and scheduler deadlines are never invalidated. The ClockProvenance.lastStep timestamp lets an auditor see when the wall clock was last stepped, so validators can reject timestamps taken during or shortly after a step if their use case requires continuity.
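The separation between the monotonic base and the wall offset can be demonstrated with a simulated clock. In this hypothetical sketch (Clocks and its methods are assumptions, not capOS APIs), a deadline computed on the monotonic timeline still fires on schedule after a backward wall-clock step, and the last step is recorded on the monotonic timeline so that recency checks are themselves immune to wall-clock steps.

```rust
/// Hypothetical model: the monotonic base is a counter nobody can set;
/// the wall clock is `mono + offset`, and only `step` changes the offset.
struct Clocks {
    mono_ns: u64,
    wall_offset_ns: i64,
    last_step_mono_ns: Option<u64>,
}

impl Clocks {
    /// Time passes: only the monotonic counter moves.
    fn advance(&mut self, ns: u64) {
        self.mono_ns += ns;
    }

    /// Stand-in for Timer.now(): never affected by steps.
    fn now(&self) -> u64 {
        self.mono_ns
    }

    /// Wall time derived from the monotonic base plus the disciplined offset.
    fn wall(&self) -> i64 {
        self.mono_ns as i64 + self.wall_offset_ns
    }

    /// ClockDiscipline-style step: changes only the offset and records when.
    fn step(&mut self, delta_ns: i64) {
        self.wall_offset_ns += delta_ns;
        self.last_step_mono_ns = Some(self.mono_ns);
    }

    /// True if the wall clock was stepped within the last `window_ns` of monotonic time.
    fn recently_stepped(&self, window_ns: u64) -> bool {
        self.last_step_mono_ns
            .map_or(false, |t| self.mono_ns.saturating_sub(t) < window_ns)
    }
}

fn main() {
    let mut clocks = Clocks { mono_ns: 0, wall_offset_ns: 1_700_000_000_000_000_000, last_step_mono_ns: None };
    let deadline = clocks.now() + 1_000_000; // 1 ms timeout on the monotonic base
    clocks.step(-5_000_000_000); // wall clock stepped back 5 s
    clocks.advance(1_000_000);
    println!("deadline reached: {}", clocks.now() >= deadline);
    println!("wall: {} ns, recently stepped: {}", clocks.wall(), clocks.recently_stepped(10_000_000));
}
```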


Applicability to capOS

Read vs. Discipline Authority

Every system surveyed maintains a hard split between reading time (no privilege required, granted to all processes) and adjusting time (strong authority, held by one designated service):

  • Linux: clock_gettime (unprivileged) vs adjtimex/CAP_SYS_TIME (privileged)
  • Fuchsia: ZX_RIGHT_READ handle (distributed to all components) vs ZX_RIGHT_WRITE handle (held only by Timekeeper)
  • chrony/ntpd: any client queries sync state; only the daemon calls adjtimex

capOS should encode this as: WallClock (read-only cap, grantable and attenuable) and ClockDiscipline (separate stronger cap, init-granted at boot, not transferable through normal cap-grant paths).

Clock Provenance as a Typed Signal

Fuchsia’s per-clock error bound and sync signal, and chrony’s tracking command, both expose metadata about trust state alongside the time value itself. capOS’s ClockProvenance label on WallClock captures this: a validator that needs trustworthy time checks provenance rather than relying on the presence of the cap alone.

The ptpSynced / ntpSynced distinction maps directly to the PTP vs NTP accuracy gap: hardware timestamping is a stronger claim than software NTP, and an OS-level audit trail needs to encode which applies.

Wall Clock as a Granted, Attenuable Cap

Linux time namespaces demonstrate that clock offsets can be virtualized per-context rather than being a single global ambient fact. capOS takes this further: WallClock is a capability object, not a process-wide environment variable. A test harness can inject a fake WallClock; a container process can receive a WallClock with a different UTC offset (timezone) without any global state change; a WASI host adapter can supply a per-instance WallClock to each wasm module without sharing a mutable global.

Step vs. Slew as Distinct Cap Methods

chrony’s makestep and leapsecmode options distinguish step (abrupt correction) from slew (rate adjustment). capOS should expose these as distinct ClockDiscipline methods so the discipline policy is explicit at the capability boundary — a sync service can be audited for whether it steps or only slews, and the ClockProvenance.lastStep field makes a step visible to downstream validators.

Monotonic Invariant Is Non-Negotiable

Every surveyed system — Linux CLOCK_MONOTONIC, Fuchsia ZX_CLOCK_OPT_MONOTONIC, chrony slew-only mode — treats monotonic ordering as inviolable. Any step in the monotonic timeline breaks in-flight timers, scheduling deadlines, and ring timeouts. capOS’s Timer.now() monotonic base must never be adjusted; only the wall-clock offset layered above it is disciplinable.

Audit Timestamps and Trusted Time

Audit log entries in capOS will carry timestamps. The ClockProvenance label on the WallClock used to generate those timestamps becomes the evidence of timestamp trustworthiness: an audit consumer can reject entries generated while provenance was unsynchronized or stepped (within a recency window after a step), rather than silently accepting timestamps of unknown reliability.
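An audit consumer applying this rule needs the provenance captured at the moment each timestamp was generated, not the clock's current state. The sketch below is hypothetical (AuditEntry, SyncState, trusted, and the step-recency field are assumptions): it keeps an entry only if the clock was synchronized and had not been stepped within a configurable window.

```rust
/// Hypothetical provenance snapshot taken when the timestamp was generated.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SyncState {
    Unsynced,
    Ntp,
    Ptp,
}

struct AuditEntry {
    wall_ns: i64,
    sync: SyncState,
    /// Monotonic ns between the most recent step and this entry (None: never stepped).
    ns_since_last_step: Option<u64>,
}

/// Trust an entry only if the clock was synchronized and not stepped within `step_window_ns`.
fn trusted(e: &AuditEntry, step_window_ns: u64) -> bool {
    e.sync != SyncState::Unsynced && e.ns_since_last_step.map_or(true, |d| d >= step_window_ns)
}

/// Wall timestamps of the entries that pass the provenance check.
fn trusted_times(entries: &[AuditEntry], step_window_ns: u64) -> Vec<i64> {
    entries.iter().filter(|e| trusted(e, step_window_ns)).map(|e| e.wall_ns).collect()
}

fn main() {
    let log = [
        AuditEntry { wall_ns: 10, sync: SyncState::Ntp, ns_since_last_step: None },
        AuditEntry { wall_ns: 20, sync: SyncState::Unsynced, ns_since_last_step: None },
        AuditEntry { wall_ns: 30, sync: SyncState::Ptp, ns_since_last_step: Some(5) },
    ];
    println!("{:?}", trusted_times(&log, 100));
}
```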

WASI Realtime Clock Mapping

WASI Preview 1 clock_time_get(CLOCKID_REALTIME) maps naturally to WallClock.wallTime(). A per-instance WASI WallClock cap — granted at module instantiation — means a wasm module receives the same read-only, provenance-labeled time view that native capOS services receive, with no special privilege and no ambient global.
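A per-instance adapter might look like this. The sketch assumes a trait standing in for the WallClock capability, WASI Preview 1 clock ID 0 for the realtime clock, u64 nanosecond timestamps, and local stand-ins for the WASI errno values inval and notsup; only the realtime clock is handled, and all type names are hypothetical. Because each instance owns its own clock object, a test can inject a fixed clock without any shared global.

```rust
use std::convert::TryFrom;

/// Stand-in for a read-only WallClock capability (hypothetical trait).
trait WallClock {
    /// Nanoseconds since the Unix epoch; negative before 1970.
    fn wall_time_ns(&self) -> i128;
}

/// A fixed clock, e.g. injected by a test harness.
struct FixedClock(i128);

impl WallClock for FixedClock {
    fn wall_time_ns(&self) -> i128 {
        self.0
    }
}

/// WASI Preview 1 clock ID for the realtime clock.
const CLOCKID_REALTIME: u32 = 0;

/// Local stand-ins for the WASI errno values `inval` and `notsup`.
#[derive(Debug, PartialEq)]
enum WasiErrno {
    Inval,
    Notsup,
}

/// One wasm instance with its own, unshared wall clock.
struct WasiInstance<C: WallClock> {
    wall: C,
}

impl<C: WallClock> WasiInstance<C> {
    /// clock_time_get(id, precision): precision is ignored in this sketch.
    fn clock_time_get(&self, id: u32, _precision: u64) -> Result<u64, WasiErrno> {
        match id {
            // Times before the epoch cannot be represented as a WASI timestamp.
            CLOCKID_REALTIME => u64::try_from(self.wall.wall_time_ns()).map_err(|_| WasiErrno::Inval),
            _ => Err(WasiErrno::Notsup),
        }
    }
}

fn main() {
    let instance = WasiInstance { wall: FixedClock(1_700_000_000_000_000_000) };
    println!("{:?}", instance.clock_time_get(CLOCKID_REALTIME, 1));
}
```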


Sources