Research: Time and Clock Authority in Operating Systems
This note records verified external grounding for capOS’s time and clock
authority design. It covers Linux clock IDs and privilege model, time
namespaces, NTP/chrony discipline, PTP/IEEE-1588, Fuchsia’s UTC clock object,
and leap-second handling. Findings feed directly into the WallClock /
ClockDiscipline / ClockProvenance design in
Time and Clock.
1. Linux: Clock IDs and the Read/Discipline Split
Clock IDs
Linux exposes multiple clock IDs through clock_gettime(2):
- CLOCK_REALTIME: settable system-wide wall clock. Measures seconds since the Unix epoch. Can jump forward or backward when set with settimeofday or stepped by NTP, and is also subject to NTP frequency (slew) adjustments. Requires CAP_SYS_TIME to set.
- CLOCK_MONOTONIC: non-settable system-wide monotonic clock. Counts from an unspecified point in the past (on Linux, system boot). Cannot jump and is unaffected by NTP steps; only frequency adjustments affect it. Does not include time spent in suspend.
- CLOCK_BOOTTIME: identical to CLOCK_MONOTONIC but includes time spent in suspend. Non-settable. Useful for suspend-aware timers without exposure to CLOCK_REALTIME jumps.
- CLOCK_TAI: non-settable clock derived from wall time but counting leap seconds (TAI = International Atomic Time). Unlike CLOCK_REALTIME, it has no discontinuity when a leap second is inserted.
The CAP_SYS_TIME Privilege
CAP_SYS_TIME gates all operations that modify the kernel clock:
settimeofday(2), stime(2), clock_settime(2), adjtimex(2)/clock_adjtime(2) when
modes != 0, and setting the hardware RTC. Reading the clock, including a
read-only adjtimex call with modes = 0, requires no privilege. The
clock_adjtime(2) variant (added in Linux 2.6.39) accepts an additional
clk_id argument so callers can target a specific clock rather than only the
system-wide realtime clock.
Concretely: any process can call clock_gettime(CLOCK_REALTIME, &ts) without
privilege. Only a process holding CAP_SYS_TIME (in practice, the NTP daemon) can
call adjtimex() with nonzero modes or clock_settime(CLOCK_REALTIME, &ts).
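Rust's standard library performs the same unprivileged reads. Its documentation states that on Unix, `SystemTime::now()` reads the realtime clock and `Instant::now()` reads the monotonic clock, both through clock_gettime. Here is a minimal sketch. The helper names `read_clocks` and `elapsed_monotonic` are our own.

```rust
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

/// Reads both clocks; neither read needs CAP_SYS_TIME.
pub fn read_clocks() -> (Duration, Instant) {
    // Wall time since the Unix epoch. This can jump if the clock is stepped,
    // and duration_since fails if the clock is set before 1970.
    let wall = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("realtime clock is set before the Unix epoch");
    // Monotonic time: opaque, meaningful only as a difference between readings.
    let mono = Instant::now();
    (wall, mono)
}

/// Elapsed time measured on the monotonic clock, which is immune to steps.
pub fn elapsed_monotonic(start: Instant) -> Duration {
    start.elapsed()
}
```

Measuring a timeout with `Instant` rather than `SystemTime` is the same CLOCK_MONOTONIC versus CLOCK_REALTIME choice described above, and it requires no privilege.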
Lesson for capOS
This is the direct prior art for splitting WallClock (read-only cap, granted
to ordinary processes) from ClockDiscipline (stronger cap, held only by the
designated sync service). The Linux CAP_SYS_TIME flag is a coarse ambient
privilege bit; capOS encodes the same split as two distinct capability types,
with no ambient privilege required and no escalation path between them.
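The following Rust sketch shows what the two-type split could look like. It assumes a single shared wall offset in nanoseconds and a monotonic reading supplied by the caller. `WallClock`, `ClockDiscipline`, `create_clock`, and every method name are hypothetical illustrations, not the capOS interface. `WallClock` is `Clone` (freely grantable) and has no method that yields a `ClockDiscipline`. `ClockDiscipline` is deliberately not `Clone`, so it can be moved but not duplicated.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;

/// Shared reference state: offset (ns) from the monotonic timeline to the wall epoch.
struct ClockState {
    wall_offset_ns: AtomicI64,
}

/// Read-only authority. Cloning it is how the cap is granted to more holders.
#[derive(Clone)]
pub struct WallClock {
    state: Arc<ClockState>,
}

/// Write authority. Not Clone: it can be handed over (moved) but not duplicated.
pub struct ClockDiscipline {
    state: Arc<ClockState>,
}

/// Hypothetical boot-time constructor: init receives both halves exactly once.
pub fn create_clock(initial_offset_ns: i64) -> (WallClock, ClockDiscipline) {
    let state = Arc::new(ClockState {
        wall_offset_ns: AtomicI64::new(initial_offset_ns),
    });
    (WallClock { state: Arc::clone(&state) }, ClockDiscipline { state })
}

impl WallClock {
    /// Wall time for a given monotonic reading (both in ns).
    pub fn wall_time_ns(&self, mono_ns: i64) -> i64 {
        mono_ns + self.state.wall_offset_ns.load(Ordering::SeqCst)
    }
}

impl ClockDiscipline {
    /// Shifts the shared wall offset; every WallClock holder observes the change.
    pub fn step(&self, delta_ns: i64) {
        self.state.wall_offset_ns.fetch_add(delta_ns, Ordering::SeqCst);
    }
}
```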
2. Linux Time Namespaces
What Is Namespaced
Linux time namespaces (added in Linux 5.6) let processes inside a namespace
observe different values for CLOCK_MONOTONIC and CLOCK_BOOTTIME than the
host. The per-namespace offsets are written to /proc/pid/timens_offsets before
any process enters the namespace; once the first process has entered, writes
return EACCES. The format is:
<clock-id> <offset-secs> <offset-nanosecs>
CLOCK_REALTIME is deliberately not namespaced: the kernel documentation
cites “reasons of complexity and overhead” — in practice, CLOCK_REALTIME is
already settable and the step/slew machinery is not per-namespace.
The offsets are pure integers (seconds + nanoseconds); there is no per-namespace frequency correction or NTP discipline within the namespace. This feature is primarily used for container checkpoint/restore (CRIU) where the monotonic clock must appear consistent before and after migration.
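The offsets file format and the freeze-on-entry rule can be modeled in a few lines. This toy Rust sketch makes three assumptions:

- Only the symbolic clock names `monotonic` and `boottime` are accepted.
- The nanosecond field is normalized to 0..1_000_000_000, as in a timespec.
- A generic access error stands in for EACCES.

`TimeNamespace`, `parse_offset_line`, and the error type are hypothetical names, not kernel interfaces.

```rust
use std::collections::HashMap;

/// Clocks that a Linux time namespace can offset (CLOCK_REALTIME is not one of them).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum NsClock {
    Monotonic,
    Boottime,
}

/// Offset as seconds plus non-negative nanoseconds, like a timespec.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Offset {
    pub secs: i64,
    pub nanos: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum NsError {
    Parse(String),
    Access, // stands in for EACCES
}

/// Parses one line of the form "<clock-id> <offset-secs> <offset-nanosecs>".
pub fn parse_offset_line(line: &str) -> Result<(NsClock, Offset), NsError> {
    let fields: Vec<&str> = line.split_whitespace().collect();
    if fields.len() != 3 {
        return Err(NsError::Parse(format!("expected 3 fields in {:?}", line)));
    }
    let clock = match fields[0] {
        "monotonic" => NsClock::Monotonic,
        "boottime" => NsClock::Boottime,
        other => return Err(NsError::Parse(format!("clock not namespaced: {}", other))),
    };
    let secs: i64 = fields[1].parse().map_err(|_| NsError::Parse(fields[1].to_string()))?;
    let nanos: u32 = fields[2].parse().map_err(|_| NsError::Parse(fields[2].to_string()))?;
    if nanos >= 1_000_000_000 {
        return Err(NsError::Parse(format!("nanoseconds out of range: {}", nanos)));
    }
    Ok((clock, Offset { secs, nanos }))
}

/// Toy namespace: offsets are writable only until the first process enters.
#[derive(Default)]
pub struct TimeNamespace {
    offsets: HashMap<NsClock, Offset>,
    entered: bool,
}

impl TimeNamespace {
    pub fn write_offsets(&mut self, line: &str) -> Result<(), NsError> {
        if self.entered {
            return Err(NsError::Access);
        }
        let (clock, offset) = parse_offset_line(line)?;
        self.offsets.insert(clock, offset);
        Ok(())
    }

    pub fn enter(&mut self) {
        self.entered = true;
    }

    /// Namespaced reading = host reading + fixed offset; there is no frequency term.
    pub fn observe_ns(&self, clock: NsClock, host_ns: i128) -> i128 {
        match self.offsets.get(&clock) {
            Some(o) => host_ns + i128::from(o.secs) * 1_000_000_000 + i128::from(o.nanos),
            None => host_ns,
        }
    }
}
```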
Lesson for capOS
Time is not an ambient global fact — it can be a per-context offset applied to
a shared monotonic base. capOS’s WallClock cap fits this shape directly: the
cap object holds the offset from the kernel monotonic timeline to the wall epoch,
and different processes can hold caps with different offsets (timezone,
test-clock injection, container clock virtualization). Freezing offsets at
namespace creation maps to the capOS invariant that WallClock cannot be
retroactively shifted by the holder — only ClockDiscipline can adjust the
shared reference.
3. NTP Discipline: chrony and ntpd
Step vs. Slew
NTP daemons correct clock drift using two mechanisms:
- Slew (gradual): adjust the clock rate so the offset converges slowly. Linux adjtime(3) slews at up to 500 ppm (0.5 ms per second), and adjtimex(ADJ_OFFSET) feeds the kernel PLL, which clamps each offset to ±0.5 seconds. Slewing preserves monotonicity.
- Step (abrupt): directly set the clock to the reference value. A backward step breaks timestamp ordering for any process comparing consecutive readings across it; a forward step makes elapsed time appear longer than it was.
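The difference shows up directly in consecutive readings. The following toy Rust model makes three simplifying assumptions:

- The clock starts 0.5 s ahead of true time.
- It is sampled every 100 ms of true time.
- Any slew runs at a constant rate (the kernel PLL's behavior is more involved).

All names are hypothetical.

```rust
/// Sample every 100 ms of true time.
const INTERVAL_US: i64 = 100_000;

/// Readings (µs) when a step removes the whole error just before sample `step_at`.
pub fn readings_with_step(error_us: i64, samples: i64, step_at: i64) -> Vec<i64> {
    (0..samples)
        .map(|i| {
            let true_us = i * INTERVAL_US;
            if i < step_at { true_us + error_us } else { true_us }
        })
        .collect()
}

/// Readings (µs) when the clock instead runs slow by `rate_ppm` until the error is gone.
pub fn readings_with_slew(error_us: i64, samples: i64, rate_ppm: i64) -> Vec<i64> {
    (0..samples)
        .map(|i| {
            let true_us = i * INTERVAL_US;
            // rate_ppm microseconds of error are removed per true second.
            let removed = (true_us * rate_ppm / 1_000_000).min(error_us);
            true_us + error_us - removed
        })
        .collect()
}

/// True if every reading is later than the one before it.
pub fn strictly_increasing(readings: &[i64]) -> bool {
    readings.windows(2).all(|w| w[1] > w[0])
}
```

With a 0.5 s error, the step makes one reading 400 ms earlier than the previous one. At 500 ppm, the slewed clock still advances 99.95 ms per 100 ms sample.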
chrony makestep: makestep threshold limit allows stepping if the offset
exceeds threshold seconds, but only within the first limit clock updates.
For example, makestep 1.0 3 steps for offsets over 1 second during the first
three updates and only slews after that. A negative limit removes the
update-count restriction entirely. Once the update limit is exhausted, chrony
corrects only by slewing, protecting running applications from abrupt clock changes.
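The `makestep` rule reduces to a small decision function. This sketch assumes a 1-based count of clock updates since chronyd started. `MakeStep`, `Correction`, and `decide` are hypothetical names, not chrony source code.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Correction {
    Step,
    Slew,
}

/// Simplified model of chrony's `makestep threshold limit` directive.
pub struct MakeStep {
    pub threshold_s: f64, // step only if |offset| exceeds this many seconds
    pub limit: i32,       // only during the first `limit` updates; negative = no limit
}

impl MakeStep {
    /// `update` is the 1-based index of the current clock update.
    pub fn decide(&self, offset_s: f64, update: u32) -> Correction {
        let within_limit = self.limit < 0 || i64::from(update) <= i64::from(self.limit);
        if within_limit && offset_s.abs() > self.threshold_s {
            Correction::Step
        } else {
            Correction::Slew
        }
    }
}
```

For `makestep 1.0 3`, a 2.5 s offset is stepped on the first update but only slewed on the fourth.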
Leap Second Handling (leapsecmode)
chrony supports four modes for handling a UTC leap second:
- system (the default where the kernel supports leap seconds): the kernel steps the clock at the UTC boundary.
- step: chronyd performs the step itself rather than delegating to the kernel.
- slew: the leap second is absorbed by slewing; with chrony's default maxslewrate (83333.333 ppm) on Linux, the correction takes about 12 seconds.
- ignore: no leap-second correction is applied; the resulting one-second offset is corrected later during normal tracking.
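Slew durations follow from dividing the offset by the slew rate. The sketch below compares two rates: 83333.333 ppm, chrony's documented default `maxslewrate` on Linux, and 500 ppm, the kernel adjtime(3) rate mentioned above. The function name is hypothetical.

```rust
/// Seconds needed to remove `offset_s` of error while slewing at `rate_ppm`.
/// A rate of 1 ppm removes 1 microsecond of error per second.
pub fn slew_duration_s(offset_s: f64, rate_ppm: f64) -> f64 {
    assert!(rate_ppm > 0.0, "slew rate must be positive");
    offset_s.abs() / (rate_ppm * 1e-6)
}
```

A one-second leap takes about 12 s to absorb at 83333.333 ppm. At 500 ppm it would take 2000 s (about 33 minutes), during which the clock is off by up to a second.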
For servers distributing time to clients unaware of leap seconds, chrony
combines leapsecmode slew with smoothtime to smear the correction outward
over up to 17 hours 34 minutes (when limiting slew to 1000 ppm).
Sync State Exposure
chronyc tracking reports the reference source, stratum, system time offset,
frequency error, and RMS offset. chronyc sourcestats shows per-source
statistics. These are the client-visible trust and sync signals that a capOS
ClockProvenance would encode: a sync-source flag (ntpSynced or ptpSynced) plus
an error bound.
Lesson for capOS
ClockDiscipline.step() and ClockDiscipline.slew() as distinct cap methods
are justified by this split: an NTP daemon that calls step() at startup but
only slew() at steady state exposes its policy at the capability boundary.
Callers that need monotonic-safe time can check ClockProvenance to distinguish
a recently-stepped clock from a stably-slewed one.
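Keeping step and slew as separate methods lets the discipline cap record exactly the event that matters to callers. This Rust sketch assumes provenance stores only the monotonic time of the last step. Actually applying a slew rate over time is omitted. `ClockDiscipline`, `ClockProvenance`, and their fields are hypothetical.

```rust
/// Hypothetical provenance record: when (on the monotonic timeline) the wall
/// clock was last stepped, if ever.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ClockProvenance {
    pub last_step_mono_ns: Option<u64>,
}

/// Hypothetical discipline cap with step and slew as distinct methods.
pub struct ClockDiscipline {
    wall_offset_ns: i64,
    rate_adjust_ppm: i32,
    provenance: ClockProvenance,
}

impl ClockDiscipline {
    pub fn new(wall_offset_ns: i64) -> Self {
        ClockDiscipline {
            wall_offset_ns,
            rate_adjust_ppm: 0,
            provenance: ClockProvenance { last_step_mono_ns: None },
        }
    }

    /// Abrupt correction: shifts the wall offset and records when it happened.
    pub fn step(&mut self, delta_ns: i64, now_mono_ns: u64) {
        self.wall_offset_ns += delta_ns;
        self.provenance.last_step_mono_ns = Some(now_mono_ns);
    }

    /// Gradual correction: changes only the rate, so no step is recorded.
    pub fn slew(&mut self, rate_adjust_ppm: i32) {
        self.rate_adjust_ppm = rate_adjust_ppm;
    }

    pub fn wall_offset_ns(&self) -> i64 {
        self.wall_offset_ns
    }

    pub fn rate_adjust_ppm(&self) -> i32 {
        self.rate_adjust_ppm
    }

    pub fn provenance(&self) -> ClockProvenance {
        self.provenance
    }
}
```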
4. PTP / IEEE-1588: Hardware Timestamping
What PTP Provides
IEEE 1588 Precision Time Protocol synchronizes clocks using timestamps captured by NIC hardware at the Media Independent Interface (MII) boundary, typically within 100 ns of frame ingress/egress. This eliminates software scheduling jitter that limits NTP to millisecond accuracy. With hardware support, PTP achieves sub-microsecond accuracy.
Linux implements PTP through ptp4l (PTP daemon managing the protocol state
machine) and phc2sys (synchronizing the hardware PTP clock to the system
clock). ptp4l can configure a system as an Ordinary Clock (single port) or
Boundary Clock (multi-port).
Use Cases vs. NTP
NTP is adequate for general server synchronization. It typically keeps time within tens of milliseconds over the public Internet, and can do better than one millisecond on a LAN under good conditions or with a local GPS reference. PTP is used where sub-microsecond accuracy is required: industrial automation, 5G RAN timing, financial trading, and audio/video bridging (AVB/TSN). The practical prerequisites are hardware timestamping support in the NIC and a local grandmaster or GNSS-disciplined boundary clock.
Lesson for capOS
Provenance is not binary (synced vs. unsynced). The ptpSynced vs ntpSynced
distinction in ClockProvenance is justified: a process requiring microsecond
timestamps for audio-visual synchronization or hardware scheduling needs to
distinguish PTP discipline from NTP discipline. A cap validator checking
ClockProvenance before accepting a timestamp for a hard real-time claim should
require ptpSynced and an error bound below the application’s tolerance.
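Such a validator is a two-part check. The following Rust sketch assumes provenance is a sync-source enum plus an error bound in nanoseconds. `SyncSource`, `ClockProvenance`, and `accept_for_realtime_claim` are hypothetical names.

```rust
/// Hypothetical sync-source signal carried by ClockProvenance.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyncSource {
    Unsynced,
    NtpSynced,
    PtpSynced,
}

/// Hypothetical provenance label: where the time came from and how wrong it may be.
#[derive(Debug, Clone, Copy)]
pub struct ClockProvenance {
    pub source: SyncSource,
    pub error_bound_ns: u64,
}

/// Accepts a timestamp for a hard real-time claim only if it came from a
/// PTP-disciplined clock whose error bound is below the application's tolerance.
pub fn accept_for_realtime_claim(p: &ClockProvenance, tolerance_ns: u64) -> bool {
    p.source == SyncSource::PtpSynced && p.error_bound_ns < tolerance_ns
}
```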
5. Fuchsia / Zircon: UTC Clock Objects
Clock as a Kernel Object
Fuchsia models UTC time as a first-class kernel object (a clock object created with zx_clock_create() and accessed through a handle), not as
a syscall or global variable. A clock is a one-dimensional affine
transformation of the monotonic reference timeline, maintained atomically and
observed through typed operations.
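The affine form can be written out directly. This Rust sketch is modeled on the shape of Zircon's `zx_clock_transformation_t`: a reference offset, a synthetic offset, and a rational rate. The Rust type and method are illustrative, not the Zircon API. Integer division here truncates, which is a simplification.

```rust
/// Modeled on zx_clock_transformation_t: an anchor point on the reference
/// (monotonic) timeline, the clock's value at that point, and a rational rate.
#[derive(Debug, Clone, Copy)]
pub struct ClockTransformation {
    pub reference_offset: i64, // reference time of the anchor point (ns)
    pub synthetic_offset: i64, // clock value at the anchor point (ns)
    pub synthetic_ticks: u32,  // rate numerator
    pub reference_ticks: u32,  // rate denominator
}

impl ClockTransformation {
    /// synthetic = (reference - reference_offset) * rate + synthetic_offset,
    /// computed in i128 to avoid overflow; the division truncates toward zero.
    pub fn apply(&self, reference: i64) -> i64 {
        let delta = i128::from(reference) - i128::from(self.reference_offset);
        let scaled = delta * i128::from(self.synthetic_ticks) / i128::from(self.reference_ticks);
        (i128::from(self.synthetic_offset) + scaled) as i64
    }
}
```

A rate of 1_000_001/1_000_000 describes a clock running 1 ppm fast. After one reference second, it has advanced one second plus one microsecond.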
Rights Model
Zircon clock handles carry typed rights:
- ZX_RIGHT_READ: permits zx_clock_read() (read the current time) and zx_clock_get_details() (read the transformation parameters and error bound).
- ZX_RIGHT_WRITE: permits zx_clock_update(), which adjusts the clock's absolute value, frequency (in ppm), and error bound (in nanoseconds).
Any process holding ZX_RIGHT_WRITE acts as a clock maintainer. There is no
separate “maintain” right; the write right IS the maintain authority.
Monotonic option: clocks created with ZX_CLOCK_OPT_MONOTONIC reject any
zx_clock_update() that would cause the clock to go backward.
Continuous option: clocks created with ZX_CLOCK_OPT_CONTINUOUS allow
setting the absolute value only on the first update; subsequent absolute-value
changes are rejected, allowing only frequency adjustments.
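The two creation options amount to checks on each update. This is a toy Rust sketch with three simplifications:

- It ignores the passage of time; the stored value is just the last value set.
- Rate-only updates are always allowed.
- As Zircon's clock documentation describes, a continuous clock must also be monotonic.

`ToyClock` and `UpdateError` are hypothetical.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum UpdateError {
    WouldGoBackward,       // rejected by the monotonic option
    AbsoluteSetNotAllowed, // rejected by the continuous option
}

/// Toy clock with options modeled on ZX_CLOCK_OPT_MONOTONIC and ZX_CLOCK_OPT_CONTINUOUS.
pub struct ToyClock {
    monotonic: bool,
    continuous: bool,
    value: Option<i64>, // None until the first absolute set
    rate_adjust_ppm: i32,
}

impl ToyClock {
    pub fn new(monotonic: bool, continuous: bool) -> Self {
        assert!(!continuous || monotonic, "a continuous clock must also be monotonic");
        ToyClock { monotonic, continuous, value: None, rate_adjust_ppm: 0 }
    }

    /// Absolute-value update, checked against the creation options.
    pub fn set_value(&mut self, new_value: i64) -> Result<(), UpdateError> {
        match self.value {
            None => {}
            Some(_) if self.continuous => return Err(UpdateError::AbsoluteSetNotAllowed),
            Some(current) if self.monotonic && new_value < current => {
                return Err(UpdateError::WouldGoBackward)
            }
            Some(_) => {}
        }
        self.value = Some(new_value);
        Ok(())
    }

    /// Rate-only update: permitted under either option in this sketch.
    pub fn set_rate_ppm(&mut self, ppm: i32) {
        self.rate_adjust_ppm = ppm;
    }

    pub fn rate_ppm(&self) -> i32 {
        self.rate_adjust_ppm
    }
}
```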
UTC Maintainer Service
All components started by Fuchsia’s Component Manager receive a UTC clock handle
with read-only rights. Only the Timekeeper service receives the write handle.
Timekeeper synchronizes against an RTC or a network time source and calls
zx_clock_update() to discipline the UTC clock.
The UTC clock has a “backstop” guarantee: it never reports a time earlier than the timestamp of the latest build commit (the backstop value). Before Timekeeper first synchronizes, the clock may be in a fixed state (stopped at backstop) or running-but-unsynchronized state. Fuchsia documents that the UTC clock “is neither monotonic nor continuous” — Timekeeper may step it backward when corrections are needed. Callers needing a reliable timestamp must query the clock details to determine whether the clock has been synchronized.
Lesson for capOS
This is the closest capability-native precedent for capOS’s design. The mapping:
| Fuchsia/Zircon | capOS |
|---|---|
| Clock kernel object with ZX_RIGHT_READ handle | WallClock capability (read-only) |
| Clock handle with ZX_RIGHT_WRITE held by Timekeeper | ClockDiscipline capability (init-granted) |
| zx_clock_get_details() error bound and sync signal | ClockProvenance label on WallClock |
| Backstop guarantee (never before build timestamp) | Provenance downgrades on suspend/resume or loss of sync |
| ZX_CLOCK_OPT_MONOTONIC flag | The invariant that the Timer.now() monotonic base is never adjusted |
The Fuchsia UTC design confirms that the right model is: one strong-authority maintainer, many read-only observers, with a typed signal for trust state. capOS extends this by making provenance an explicit labeled field on the cap rather than a query-on-demand operation.
6. Leap Seconds and Clock Steps: Smearing vs. Stepping
The Problem
UTC inserts or deletes leap seconds at irregular intervals, decided by the
International Earth Rotation and Reference Systems Service (IERS). Inserting a
leap second means UTC has a second labeled 23:59:60 before rolling to midnight,
creating a discontinuity in POSIX time (which counts seconds without leap
seconds). Deleting a leap second would mean skipping a second.
For software:
- Stepping: CLOCK_REALTIME jumps by ±1 second at the UTC boundary. Any application comparing two CLOCK_REALTIME readings across the boundary sees a negative elapsed time (on insert) or a missing second (on delete). CLOCK_MONOTONIC must not step; it continues forward through the leap second unaffected.
- Slewing/Smearing: the correction is distributed over a window. No discontinuity occurs, but CLOCK_REALTIME temporarily deviates from true UTC during the smear window.
Industry Smear Practice
Google has smeared leap seconds since 2008 and now applies a 24-hour linear smear (noon to noon UTC): each second in the smear window is ~11.6 µs longer than an SI second. AWS's Amazon Time Sync Service applies the same 24-hour noon-to-noon linear smear automatically. Both services suppress the leap-second indicator in their NTP responses so clients do not attempt their own step.
The smear approach means that any client synchronized to Google Public NTP or Amazon Time Sync is not tracking true UTC during the smear window. Instead it tracks “smeared UTC”, which is consistent across clients of the same service but differs from civil UTC by up to 0.5 seconds. This design choice accepts brief inaccuracy in exchange for wall time that never steps.
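The 11.6 µs figure is one second spread over 86,400 smeared seconds. This sketch assumes a single inserted leap second and measures elapsed time on the smeared clock itself. The function names are hypothetical, and the code does not reproduce either provider's implementation.

```rust
/// Length of the smear window on the smeared clock: 24 hours, noon to noon UTC.
pub const SMEAR_WINDOW_S: f64 = 86_400.0;

/// Seconds of the (single, one-second) leap absorbed after `elapsed_s` seconds
/// of the window, measured on the smeared clock.
pub fn smear_absorbed_s(elapsed_s: f64) -> f64 {
    elapsed_s.clamp(0.0, SMEAR_WINDOW_S) / SMEAR_WINDOW_S
}

/// How much longer each smeared second is than an SI second, in microseconds.
pub fn smeared_second_excess_us() -> f64 {
    1e6 / SMEAR_WINDOW_S
}
```

At the midpoint (midnight UTC, where the leap second actually occurs), half a second has been absorbed. Smeared time is therefore 0.5 s away from civil UTC, the largest deviation in the window.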
CLOCK_MONOTONIC Must Not Jump
CLOCK_MONOTONIC is specifically designed to be immune to steps. Linux
documents it as “nonsettable” — no process can set it; only frequency
adjustments are permitted. The rationale: timers, timeouts, and scheduling
deadlines depend on monotonic ordering. Any step in the monotonic timeline would
silently break all in-flight waiters.
Lesson for capOS
The monotonic timeline (Timer.now()) must be the invariant substrate.
WallClock is a separate, disciplinable offset layered on top. A
ClockDiscipline.step() call adjusts the wall-clock offset without touching the
monotonic base — ensuring in-flight ring timeouts and scheduler deadlines are
never invalidated. The ClockProvenance.lastStep timestamp lets an auditor see
when the wall clock was last stepped, so validators can reject timestamps taken
during or shortly after a step if their use case requires continuity.
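A validator that enforces such a window needs only two instants: when the step happened and when the timestamp was taken. This sketch assumes both are recorded as monotonic nanoseconds and that only the most recent step is tracked. `continuity_ok` and its parameters are hypothetical names for a ClockProvenance.lastStep check.

```rust
/// Returns true if a wall-clock timestamp taken at `taken_mono_ns` lies outside
/// the post-step window. Both instants are on the monotonic timeline, so the
/// step being checked cannot distort the measurement.
pub fn continuity_ok(last_step_mono_ns: Option<u64>, taken_mono_ns: u64, window_ns: u64) -> bool {
    match last_step_mono_ns {
        // Never stepped: nothing to reject.
        None => true,
        // Taken before the most recent step (accepting it is a policy choice here).
        Some(step) if taken_mono_ns < step => true,
        // Reject if the timestamp falls within `window_ns` after the step.
        Some(step) => taken_mono_ns - step >= window_ns,
    }
}
```

Measuring the window on the monotonic timeline matters, because the wall clock is exactly the thing that just jumped.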
Applicability to capOS
Read vs. Discipline Authority
Every system surveyed maintains a hard split between reading time (no privilege required, granted to all processes) and adjusting time (strong authority, held by one designated service):
- Linux: clock_gettime (unprivileged) vs adjtimex / CAP_SYS_TIME (privileged)
- Fuchsia: ZX_RIGHT_READ handle (distributed to all components) vs ZX_RIGHT_WRITE handle (held only by Timekeeper)
- chrony/ntpd: any client queries sync state; only the daemon calls adjtimex
capOS should encode this as: WallClock (read-only cap, grantable and
attenuable) and ClockDiscipline (separate stronger cap, init-granted at boot,
not transferable through normal cap-grant paths).
Clock Provenance as a Typed Signal
Fuchsia’s per-clock error bound and sync signal, and chrony’s tracking
command, both expose metadata about trust state alongside the time value
itself. capOS’s ClockProvenance label on WallClock captures this: a
validator that needs trustworthy time checks provenance rather than relying on
the presence of the cap alone.
The ptpSynced / ntpSynced distinction maps directly to the PTP vs NTP
accuracy gap: hardware timestamping is a stronger claim than software NTP, and
an OS-level audit trail needs to encode which applies.
Wall Clock as a Granted, Attenuable Cap
Linux time namespaces demonstrate that clock offsets can be virtualized
per-context rather than being a single global ambient fact. capOS takes this
further: WallClock is a capability object, not a process-wide environment
variable. A test harness can inject a fake WallClock; a container process can
receive a WallClock with a different UTC offset (timezone) without any global
state change; a WASI host adapter can supply a per-instance WallClock to each
wasm module without sharing a mutable global.
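Because the clock is an object rather than ambient state, injecting a fake is ordinary dependency injection. Here is a Rust sketch under that assumption. `WallClockCap`, `OffsetClock`, `FakeClock`, and `is_expired` are hypothetical names.

```rust
use std::cell::Cell;

/// Hypothetical trait for what a WallClock-shaped capability exposes.
pub trait WallClockCap {
    fn wall_time_ns(&self) -> i64;
}

/// A view built from a monotonic source plus an offset fixed when granted.
pub struct OffsetClock<F> {
    pub monotonic_ns: F,
    pub offset_ns: i64,
}

impl<F: Fn() -> i64> WallClockCap for OffsetClock<F> {
    fn wall_time_ns(&self) -> i64 {
        (self.monotonic_ns)() + self.offset_ns
    }
}

/// Test double: time moves only when the test advances it.
pub struct FakeClock {
    now_ns: Cell<i64>,
}

impl FakeClock {
    pub fn new(start_ns: i64) -> Self {
        FakeClock { now_ns: Cell::new(start_ns) }
    }

    pub fn advance(&self, by_ns: i64) {
        self.now_ns.set(self.now_ns.get() + by_ns);
    }
}

impl WallClockCap for FakeClock {
    fn wall_time_ns(&self) -> i64 {
        self.now_ns.get()
    }
}

/// Code under test sees only the capability, never a global clock.
pub fn is_expired(clock: &dyn WallClockCap, expires_at_ns: i64) -> bool {
    clock.wall_time_ns() >= expires_at_ns
}
```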
Step vs. Slew as Distinct Cap Methods
chrony’s makestep and leapsecmode options distinguish step (abrupt
correction) from slew (rate adjustment). capOS should expose these as distinct
ClockDiscipline methods so the discipline policy is explicit at the capability
boundary — a sync service can be audited for whether it steps or only slews,
and the ClockProvenance.lastStep field makes a step visible to downstream
validators.
Monotonic Invariant Is Non-Negotiable
Every surveyed system keeps a timeline whose monotonic ordering is treated as
inviolable: Linux CLOCK_MONOTONIC, Fuchsia clocks created with ZX_CLOCK_OPT_MONOTONIC,
and chrony's slew-only steady state. Any step in
the monotonic timeline breaks in-flight timers, scheduling deadlines, and ring
timeouts. capOS’s Timer.now() monotonic base must never be adjusted; only the
wall-clock offset layered above it is disciplinable.
Audit Timestamps and Trusted Time
Audit log entries in capOS will carry timestamps. The ClockProvenance label on
the WallClock used to generate those timestamps becomes the evidence of
timestamp trustworthiness: an audit consumer can reject entries generated while
provenance was unsynchronized or stepped (within a recency window after a
step), rather than silently accepting timestamps of unknown reliability.
WASI Realtime Clock Mapping
WASI Preview 1 clock_time_get(CLOCKID_REALTIME) maps naturally to
WallClock.wallTime(). A per-instance WASI WallClock cap — granted at module
instantiation — means a wasm module receives the same read-only, provenance-labeled
time view that native capOS services receive, with no special privilege and no
ambient global.
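A host adapter only has to translate between the capability and WASI's `clock_time_get`, whose result is a u64 count of nanoseconds. This sketch assumes the Preview 1 clock IDs (realtime = 0, monotonic = 1) and ignores the precision argument. `InstanceClocks`, `WasiError`, and the closures standing in for WallClock.wallTime() and Timer.now() are hypothetical.

```rust
use std::convert::TryFrom;

/// WASI Preview 1 clock IDs (wasi_snapshot_preview1).
pub const CLOCKID_REALTIME: u32 = 0;
pub const CLOCKID_MONOTONIC: u32 = 1;

/// Hypothetical per-instance capabilities granted to one wasm module.
pub struct InstanceClocks {
    /// Stands in for WallClock.wallTime(): ns since the Unix epoch (may be negative).
    pub wall_time_ns: Box<dyn Fn() -> i128>,
    /// Stands in for Timer.now(): ns on the monotonic timeline.
    pub monotonic_ns: Box<dyn Fn() -> u64>,
}

/// Adapter-level errors; a real adapter would map these to WASI errno values.
#[derive(Debug, PartialEq, Eq)]
pub enum WasiError {
    Inval,    // clock ID not offered to this instance
    Overflow, // wall time not representable as a WASI timestamp
}

/// Host side of clock_time_get; `_precision` is accepted but ignored here.
pub fn clock_time_get(clocks: &InstanceClocks, id: u32, _precision: u64) -> Result<u64, WasiError> {
    match id {
        CLOCKID_REALTIME => u64::try_from((clocks.wall_time_ns)()).map_err(|_| WasiError::Overflow),
        CLOCKID_MONOTONIC => Ok((clocks.monotonic_ns)()),
        _ => Err(WasiError::Inval),
    }
}
```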
Sources
- clock_gettime(2) — Linux manual page
- capabilities(7) — Linux manual page
- adjtimex / clock_adjtime(2) — Ubuntu manpage
- time_namespaces(7) — Linux manual page
- Clock — Fuchsia reference (kernel objects)
- UTC behavior — Fuchsia
- chrony.conf(5) — chrony 4.3 manual
- Leap Second Smearing — Google for Developers
- Look Before You Leap — AWS blog on leap second and smearing
- Configuring PTP Using ptp4l — Red Hat RHEL 7 System Administrator’s Guide
- IEEE 1588 Precision Time Protocol — NTP.org reference