# Research: Time and Clock Authority in Operating Systems

This note records verified external grounding for capOS's time and clock
authority design. It covers Linux clock IDs and privilege model, time
namespaces, NTP/chrony discipline, PTP/IEEE-1588, Fuchsia's UTC clock object,
and leap-second handling. Findings feed directly into the `WallClock` /
`ClockDiscipline` / `ClockProvenance` design in
[Time and Clock](../proposals/time-and-clock-proposal.md).

---

## 1. Linux: Clock IDs and the Read/Discipline Split

### Clock IDs

Linux exposes multiple clock IDs through `clock_gettime(2)`:

- **`CLOCK_REALTIME`** — settable system-wide wall clock. Measures seconds since
  the Unix epoch. Can jump forward or backward when disciplined by `settimeofday`
  or NTP. Requires `CAP_SYS_TIME` to set.
- **`CLOCK_MONOTONIC`** — non-settable system-wide monotonic clock. Counts from
  an unspecified boot-adjacent point. Cannot jump; unaffected by NTP steps;
  responds to frequency adjustments only. Does not include suspend time.
- **`CLOCK_BOOTTIME`** — identical to `CLOCK_MONOTONIC` but includes suspended
  time. Non-settable. Useful for suspend-aware timers without `CLOCK_REALTIME`
  jump exposure.
- **`CLOCK_TAI`** — non-settable clock based on wall time but counting leap
  seconds (TAI = International Atomic Time). Unlike `CLOCK_REALTIME`, it has no
  discontinuity on leap second insertion.
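
As a quick illustration of the read side, the sketch below samples whichever of these clocks the running platform exposes through Python's standard `time` module. The helper name `read_clocks` is hypothetical. `CLOCK_BOOTTIME` and `CLOCK_TAI` are not defined on every platform or Python build, so the code probes for each constant rather than assuming it exists.

```python
import time

# Clock names from the list above; not every platform defines all of them.
CLOCK_NAMES = ["CLOCK_REALTIME", "CLOCK_MONOTONIC", "CLOCK_BOOTTIME", "CLOCK_TAI"]


def read_clocks():
    """Return {name: seconds} for each clock ID this platform can read."""
    readings = {}
    for name in CLOCK_NAMES:
        clock_id = getattr(time, name, None)
        if clock_id is None:
            continue  # constant not defined by this Python build or OS
        try:
            readings[name] = time.clock_gettime(clock_id)
        except OSError:
            continue  # clock is defined but not readable here
    return readings


if __name__ == "__main__":
    for name, seconds in read_clocks().items():
        print(f"{name:16s} {seconds:.9f}")
```

None of these reads needs elevated privilege, which is the property the next subsection relies on.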

### The `CAP_SYS_TIME` Privilege

`CAP_SYS_TIME` gates all operations that *modify* the kernel clock:
`settimeofday(2)`, `stime(2)`, `adjtimex(2)`/`clock_adjtime(2)` when
`modes != 0`, and setting the hardware RTC. Reading the clock — including a
read-only `adjtimex` call with `modes = 0` — requires no privilege. The
`clock_adjtime(2)` variant (added in Linux 2.6.39) accepts an additional
`clk_id` argument so callers can target a specific clock rather than only the
system-wide realtime clock.

Concretely: any process can call `clock_gettime(CLOCK_REALTIME, &ts)` without
privilege; only a privileged NTP daemon calls `adjtimex()` or
`clock_settime(CLOCK_REALTIME, &ts)`.

### Lesson for capOS

This is the direct prior art for splitting `WallClock` (read-only cap, granted
to ordinary processes) from `ClockDiscipline` (stronger cap, held only by the
designated sync service). The Linux `CAP_SYS_TIME` flag is a coarse ambient
privilege bit; capOS encodes the same split as two distinct capability types,
with no ambient privilege required and no escalation path between them.
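
A minimal sketch of that split, assuming a shared state object behind two separate handle types. Every name here (`_ClockState`, `make_clock_pair`, `wall_time`, `step`) is hypothetical and chosen for illustration; the proposal's actual interface may differ. Python cannot enforce the isolation a real capability system provides, so the sketch only shows the shape: the read handle has no mutating method at all.

```python
class _ClockState:
    """Shared state: wall-clock offset (seconds) over a monotonic base."""

    def __init__(self, monotonic_source, offset_s=0.0):
        self.monotonic_source = monotonic_source
        self.offset_s = offset_s


class WallClock:
    """Read-only view: exposes no method that changes the offset."""

    def __init__(self, state):
        self._state = state

    def wall_time(self):
        return self._state.monotonic_source() + self._state.offset_s


class ClockDiscipline:
    """Stronger handle: the only type that can change the offset."""

    def __init__(self, state):
        self._state = state

    def step(self, delta_s):
        self._state.offset_s += delta_s


def make_clock_pair(monotonic_source, offset_s=0.0):
    """Create one state object and hand out the two handles separately."""
    state = _ClockState(monotonic_source, offset_s)
    return WallClock(state), ClockDiscipline(state)
```

An init process would call `make_clock_pair` once, pass the `WallClock` to ordinary services, and give the `ClockDiscipline` only to the sync service. Holding the first gives no way to obtain the second.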

---

## 2. Linux Time Namespaces

### What Is Namespaced

Linux time namespaces (added in Linux 5.6) let processes inside a namespace
observe different values for `CLOCK_MONOTONIC` and `CLOCK_BOOTTIME` than the
host. The per-namespace offsets are written to `/proc/<pid>/timens_offsets`
before any process enters the namespace; once the first process has entered,
the offsets are frozen and writes fail with `EACCES`. Each line has the form:

```
<clock-id> <offset-secs> <offset-nanosecs>
```

`CLOCK_REALTIME` is deliberately **not** namespaced: the kernel documentation
cites "reasons of complexity and overhead" — in practice, `CLOCK_REALTIME` is
already settable and the step/slew machinery is not per-namespace.

The offsets are pure integers (seconds + nanoseconds); there is no per-namespace
frequency correction or NTP discipline within the namespace. This feature is
primarily used for container checkpoint/restore (CRIU) where the monotonic
clock must appear consistent before and after migration.
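
The offset model is simple enough to sketch. The code below parses one line in the format shown above and applies the offset to a host reading in nanoseconds. The function names are hypothetical, the clock-id spellings `monotonic` and `boottime` are an assumption about what the file accepts, and the range check on the nanoseconds field is this sketch's own validation rather than a statement of the kernel's exact rules.

```python
NANOS_PER_SEC = 1_000_000_000
NAMESPACED_CLOCKS = {"monotonic", "boottime"}  # assumed clock-id spellings


def parse_offset_line(line):
    """Parse '<clock-id> <offset-secs> <offset-nanosecs>' into a tuple."""
    clock_id, secs, nanos = line.split()
    if clock_id not in NAMESPACED_CLOCKS:
        raise ValueError(f"clock not namespaced: {clock_id}")
    secs, nanos = int(secs), int(nanos)
    if not 0 <= nanos < NANOS_PER_SEC:
        raise ValueError("offset-nanosecs out of range")
    return clock_id, secs, nanos


def apply_offset(host_ns, offset_secs, offset_nanos):
    """Namespace view of a clock: host reading plus the fixed offset."""
    return host_ns + offset_secs * NANOS_PER_SEC + offset_nanos
```

For example, an offset line of `monotonic 172800 0` makes the namespace's monotonic clock read two days ahead of the host's, with no change to the rate at which it advances.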

### Lesson for capOS

Time is not an ambient global fact — it can be a per-context offset applied to
a shared monotonic base. capOS's `WallClock` cap fits this shape directly: the
cap object holds the offset from the kernel monotonic timeline to the wall epoch,
and different processes can hold caps with different offsets (timezone,
test-clock injection, container clock virtualization). Freezing the offsets
once the first process enters the namespace maps to the capOS invariant that
`WallClock` cannot be retroactively shifted by the holder; only
`ClockDiscipline` can adjust the shared reference.

---

## 3. NTP Discipline: chrony and ntpd

### Step vs. Slew

NTP daemons correct clock drift using two mechanisms:

- **Slew** (gradual): adjust the clock frequency so the offset converges
  slowly. On Linux, `adjtime(3)` and `adjtimex(2)` with `ADJ_OFFSET` implement
  slew. The kernel limits the slew rate to 500 ppm, and `adjtimex` clamps an
  `ADJ_OFFSET` value to ±0.5 seconds. Because the clock never jumps,
  consecutive readings stay ordered.
- **Step** (abrupt): directly set the clock to the reference value. Breaks
  timestamp ordering for any process comparing consecutive readings across the
  step.

**chrony `makestep`**: `makestep threshold limit` allows stepping if the offset
exceeds `threshold` seconds, but only within the first `limit` clock updates.
For example, `makestep 1.0 3` steps for offsets over 1 second during the first
three updates, then slews only thereafter. A negative limit removes the
update-count restriction entirely. After an initial step, chrony reverts to pure
slew to protect running applications from abrupt clock changes.
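
The rule can be modeled as a small predicate. This is a sketch of the behavior as described above, not chrony's implementation; the function name and the `updates_so_far` counter are hypothetical.

```python
def should_step(offset_s, updates_so_far, threshold_s, limit):
    """Model of `makestep threshold limit`: True means step, False means slew.

    updates_so_far counts clock updates already made since the daemon started.
    """
    if abs(offset_s) <= threshold_s:
        return False  # offsets within the threshold are always slewed
    if limit < 0:
        return True  # negative limit: no update-count restriction
    return updates_so_far < limit
```

With `makestep 1.0 3`, a 2.5-second offset on the first update is stepped, while the same offset on the fourth update is slewed.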

### Leap Second Handling (`leapsecmode`)

chrony supports four modes for the UTC leap second insertion:

- **`system`** (default): the kernel steps the clock at the UTC boundary.
- **`step`**: chronyd performs the step rather than delegating to the kernel.
- **`slew`**: the leap second is absorbed by slewing (about 12 seconds of
  correction at chrony's default `maxslewrate` of 83333.333 ppm on Linux).
- **`ignore`**: no automatic correction; the offset is absorbed during normal
  tracking.

For servers distributing time to clients unaware of leap seconds, chrony
combines `leapsecmode slew` with `smoothtime` to smear the correction over a
window of hours (17 hours 34 minutes in the documented example, which also
limits slew to 1000 ppm).
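
The durations involved follow from simple arithmetic, sketched below under two simplified models. The first assumes a constant slew rate. The second assumes the frequency offset ramps up linearly and then back down (a triangular profile), with the ramp slope given as a wander in ppm per second. Both function names are hypothetical, and the 0.001 ppm/s wander used in the note after the code is an assumed illustrative value, so check the chrony manual for the `smoothtime` parameters recommended for a given version.

```python
import math


def slew_duration_s(offset_s, rate_ppm):
    """Seconds needed to absorb offset_s at a constant slew of rate_ppm."""
    return abs(offset_s) / (rate_ppm * 1e-6)


def triangular_smear(offset_s, wander_ppm_per_s):
    """Smear where the frequency ramps up linearly, then back down.

    Returns (peak_ppm, total_seconds). The area under the triangular
    frequency profile must equal the offset being absorbed.
    """
    offset_us = abs(offset_s) * 1e6  # ppm * seconds = microseconds
    peak_ppm = math.sqrt(offset_us * wander_ppm_per_s)
    total_s = 2 * peak_ppm / wander_ppm_per_s
    return peak_ppm, total_s
```

Under the constant-rate model, one second takes 2000 s to absorb at 500 ppm and about 12 s at 83333.333 ppm. Under the triangular model with a wander of 0.001 ppm/s, one second takes roughly 63,246 s, which is about 17 hours 34 minutes, with a peak frequency offset near 31.6 ppm.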

### Sync State Exposure

`chronyc tracking` reports the reference source, stratum, system time offset,
frequency error, and RMS offset. `chronyc sourcestats` shows per-source
statistics. These are the client-visible trust/sync signals that a capOS
`ClockProvenance` would encode: a sync-source flag such as `ntpSynced` or
`ptpSynced`, plus an error bound.

### Lesson for capOS

`ClockDiscipline.step()` and `ClockDiscipline.slew()` as distinct cap methods
are justified by this split: an NTP daemon that calls `step()` at startup but
only `slew()` at steady state exposes its policy at the capability boundary.
Callers that need monotonic-safe time can check `ClockProvenance` to distinguish
a recently-stepped clock from a stably-slewed one.

---

## 4. PTP / IEEE-1588: Hardware Timestamping

### What PTP Provides

IEEE 1588 Precision Time Protocol synchronizes clocks using timestamps captured
by NIC hardware at the Media Independent Interface (MII) boundary, typically
within 100 ns of frame ingress/egress. This eliminates software scheduling
jitter that limits NTP to millisecond accuracy. With hardware support, PTP
achieves sub-microsecond accuracy.

Linux implements PTP through `ptp4l` (the PTP daemon managing the protocol
state machine) and `phc2sys` (which typically synchronizes the system clock to
the NIC's PTP hardware clock). `ptp4l` can configure a system as an Ordinary
Clock (single port) or Boundary Clock (multi-port).

### Use Cases vs. NTP

NTP is adequate for general server synchronization: typically 1–10 ms on a
LAN, and sub-millisecond with a local GPS reference. PTP is used where
sub-microsecond accuracy is required: industrial automation, 5G RAN timing,
financial trading, and audio/video bridging (AVB/TSN). What separates the two
in practice is hardware timestamping support in the NIC and a local
Grandmaster or GNSS-disciplined boundary clock.

### Lesson for capOS

Provenance is not binary (synced vs. unsynced). The `ptpSynced` vs `ntpSynced`
distinction in `ClockProvenance` is justified: a process requiring microsecond
timestamps for audio-visual synchronization or hardware scheduling needs to
distinguish PTP discipline from NTP discipline. A cap validator checking
`ClockProvenance` before accepting a timestamp for a hard real-time claim should
require `ptpSynced` and an error bound below the application's tolerance.
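
A validator of that kind could look like the sketch below. The `ClockProvenance` shape shown here is an assumption made for illustration: the enum members mirror the `ntpSynced` / `ptpSynced` labels used in this note, while `error_bound_ns` and the function name are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class SyncSource(Enum):
    UNSYNCHRONIZED = "unsynchronized"
    NTP_SYNCED = "ntpSynced"
    PTP_SYNCED = "ptpSynced"


@dataclass(frozen=True)
class ClockProvenance:
    source: SyncSource
    error_bound_ns: int  # hypothetical field: worst-case error estimate


def accept_for_hard_realtime(provenance, tolerance_ns):
    """Accept only PTP-disciplined time whose error bound fits the tolerance."""
    return (
        provenance.source is SyncSource.PTP_SYNCED
        and provenance.error_bound_ns < tolerance_ns
    )
```

An NTP-synced clock is rejected by this check even when its reported error bound happens to be small, because the claim being validated depends on hardware timestamping, not only on the number.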

---

## 5. Fuchsia / Zircon: UTC Clock Objects

### Clock as a Kernel Object

Fuchsia models UTC time as a first-class kernel object (created with
`zx_clock_create()` and referenced through a handle), not as a bare syscall or
global variable. A clock is a one-dimensional affine transformation of the
monotonic reference timeline, maintained atomically and observed through typed
operations.

### Rights Model

Zircon clock handles carry typed rights:

- **`ZX_RIGHT_READ`**: permits `zx_clock_read()` (read current time) and
  `zx_clock_get_details()` (read transformation parameters and error bound).
- **`ZX_RIGHT_WRITE`**: permits `zx_clock_update()` — adjusting the clock's
  absolute value, frequency (in ppm), and error bound (in nanoseconds).

Any process holding `ZX_RIGHT_WRITE` acts as a clock maintainer. There is no
separate "maintain" right; the write right *is* the maintain authority.

**Monotonic option**: clocks created with `ZX_CLOCK_OPT_MONOTONIC` reject any
`zx_clock_update()` that would cause the clock to go backward.

**Continuous option**: clocks created with `ZX_CLOCK_OPT_CONTINUOUS` allow
setting the absolute value only on the first update; subsequent absolute-value
changes are rejected, allowing only frequency adjustments.
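
The two options can be illustrated with a toy model of an affine clock. This is a Python sketch of the rules as described above, not the Zircon API: `ModelClock`, its method names, and the use of `ValueError` are all hypothetical, and treating a continuous clock as also monotonic is a simplification made by this model.

```python
class ModelClock:
    """Toy affine clock: value = base_value + rate * (mono - base_mono)."""

    def __init__(self, monotonic=False, continuous=False):
        # In this model a continuous clock is also treated as monotonic.
        self.monotonic = monotonic or continuous
        self.continuous = continuous
        self.started = False
        self.base_mono = 0.0
        self.base_value = 0.0
        self.rate = 1.0

    def read(self, mono_now):
        if not self.started:
            return self.base_value  # clock has not been started yet
        return self.base_value + self.rate * (mono_now - self.base_mono)

    def update(self, mono_now, value=None, rate_adjust_ppm=None):
        current = self.read(mono_now)
        if value is not None and self.started:
            if self.continuous:
                raise ValueError("continuous clock: value already set")
            if self.monotonic and value < current:
                raise ValueError("monotonic clock: update would go backward")
        # Re-anchor the transformation at mono_now so reads stay consistent.
        self.base_value = current if value is None else value
        self.base_mono = mono_now
        if rate_adjust_ppm is not None:
            self.rate = 1.0 + rate_adjust_ppm * 1e-6
        if value is not None:
            self.started = True
```

A continuous clock in this model accepts one initial value and then only rate changes; a monotonic clock accepts later value changes as long as they do not move the reading backward.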

### UTC Maintainer Service

All components started by Fuchsia's Component Manager receive a UTC clock handle
with **read-only** rights. Only the Timekeeper service receives the write handle.
Timekeeper synchronizes against an RTC or a network time source and calls
`zx_clock_update()` to discipline the UTC clock.

The UTC clock has a "backstop" guarantee: it never reports a time earlier than
the timestamp of the latest build commit (the backstop value). Before Timekeeper
first synchronizes, the clock may be in a fixed state (stopped at backstop) or
running-but-unsynchronized state. Fuchsia documents that the UTC clock "is
neither monotonic nor continuous" — Timekeeper may step it backward when
corrections are needed. Callers needing a reliable timestamp must query the clock
details to determine whether the clock has been synchronized.

### Lesson for capOS

This is the closest capability-native precedent for capOS's design. The mapping:

| Fuchsia/Zircon | capOS |
|---|---|
| Clock kernel object with `ZX_RIGHT_READ` handle | `WallClock` capability (read-only) |
| Clock handle with `ZX_RIGHT_WRITE` held by Timekeeper | `ClockDiscipline` capability (init-granted) |
| `zx_clock_get_details()` error bound and sync signal | `ClockProvenance` label on `WallClock` |
| Backstop guarantee (never before build timestamp) | Provenance downgrades on suspend/resume or loss of sync |
| `ZX_CLOCK_OPT_MONOTONIC` flag | The invariant that `Timer.now()` monotonic base is never adjusted |

The Fuchsia UTC design confirms that the right model is: one strong-authority
maintainer, many read-only observers, with a typed signal for trust state. capOS
extends this by making provenance an explicit labeled field on the cap rather
than a query-on-demand operation.

---

## 6. Leap Seconds and Clock Steps: Smearing vs. Stepping

### The Problem

UTC inserts or deletes leap seconds at irregular intervals, decided by the
International Earth Rotation and Reference Systems Service (IERS). Inserting a
leap second means UTC has a second labeled `23:59:60` before rolling to midnight,
creating a discontinuity in POSIX time (which counts seconds without leap
seconds). Deleting a leap second would mean skipping a second.

For software:

- **Stepping**: `CLOCK_REALTIME` jumps by ±1 second at the UTC boundary. An
  application comparing two `CLOCK_REALTIME` readings across the boundary can
  see an elapsed time that is one second too short, possibly negative (on
  insert), or one second too long (on delete). `CLOCK_MONOTONIC` does not
  step; it continues forward through the leap second unaffected.
- **Slewing/Smearing**: the correction is distributed over a window. No
  discontinuity occurs, but `CLOCK_REALTIME` temporarily deviates from true UTC
  during the smear window.

### Industry Smear Practice

Google has smeared leap seconds since 2008 and now uses a 24-hour linear smear
(noon-to-noon UTC): each second in the smear window is ~11.6 µs longer than an
SI second. AWS's Amazon Time Sync Service applies the same 24-hour
noon-to-noon linear smear automatically. Both services suppress the leap
second indicator on their NTP responses so clients do not attempt their own
step.

The smear approach means that any client synchronized to Google Public NTP or
Amazon Time Sync is not tracking true UTC during the smear window. It tracks
"smeared UTC", which is consistent across those servers but differs from civil
UTC by up to half a second. The design trades that brief inaccuracy for a wall
clock that never steps.
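
The arithmetic of a 24-hour linear smear can be sketched as follows. The function names are hypothetical; the model assumes a positive one-second leap spread evenly over an 86,400-second window.

```python
SMEAR_WINDOW_S = 86_400  # 24 hours, noon to noon UTC


def smear_applied_s(seconds_since_window_start, leap_s=1.0):
    """Portion of the leap second already absorbed at a point in the window."""
    fraction = seconds_since_window_start / SMEAR_WINDOW_S
    fraction = min(max(fraction, 0.0), 1.0)  # clamp outside the window
    return leap_s * fraction


def stretch_per_second_us(leap_s=1.0):
    """Extra length of each smeared second, in microseconds."""
    return leap_s / SMEAR_WINDOW_S * 1e6
```

Halfway through the window, at the midnight where the leap second falls, half a second has been absorbed. The per-second stretch works out to about 11.57 µs, matching the ~11.6 µs figure quoted above.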

### `CLOCK_MONOTONIC` Must Not Jump

`CLOCK_MONOTONIC` is specifically designed to be immune to steps. Linux
documents it as "nonsettable" — no process can set it; only frequency
adjustments are permitted. The rationale: timers, timeouts, and scheduling
deadlines depend on monotonic ordering. Any step in the monotonic timeline would
silently break all in-flight waiters.

### Lesson for capOS

The monotonic timeline (`Timer.now()`) must be the invariant substrate.
`WallClock` is a separate, disciplinable offset layered on top. A
`ClockDiscipline.step()` call adjusts the wall-clock offset without touching the
monotonic base — ensuring in-flight ring timeouts and scheduler deadlines are
never invalidated. The `ClockProvenance.lastStep` timestamp lets an auditor see
when the wall clock was last stepped, so validators can reject timestamps taken
during or shortly after a step if their use case requires continuity.
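
A sketch of that layering, assuming nanosecond integers and a kernel-side state object that sits behind both capabilities. The class, its fields, and `stepped_recently` are hypothetical names; only the idea of recording the last step against the monotonic timeline comes from the design described here.

```python
class WallClockState:
    """Wall clock as an offset over a monotonic base that is never adjusted."""

    def __init__(self, monotonic_source, offset_ns):
        self._mono = monotonic_source
        self._offset_ns = offset_ns
        self.last_step_mono_ns = None  # monotonic time of the last step

    def wall_time_ns(self):
        return self._mono() + self._offset_ns

    def step(self, delta_ns):
        # Only the offset changes; the monotonic source is left untouched.
        self._offset_ns += delta_ns
        self.last_step_mono_ns = self._mono()


def stepped_recently(state, now_mono_ns, window_ns):
    """True if the wall clock was stepped within the last window_ns."""
    last = state.last_step_mono_ns
    return last is not None and now_mono_ns - last < window_ns
```

Because the step is recorded in monotonic time, the recency check itself cannot be confused by the step it is reporting on.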

---

## Applicability to capOS

### Read vs. Discipline Authority

Every system surveyed maintains a hard split between *reading* time (no
privilege required, granted to all processes) and *adjusting* time (strong
authority, held by one designated service):

- Linux: `clock_gettime` (unprivileged) vs `adjtimex`/`CAP_SYS_TIME`
  (privileged)
- Fuchsia: `ZX_RIGHT_READ` handle (distributed to all components) vs
  `ZX_RIGHT_WRITE` handle (held only by Timekeeper)
- chrony/ntpd: any client queries sync state; only the daemon calls `adjtimex`

capOS should encode this as: `WallClock` (read-only cap, grantable and
attenuable) and `ClockDiscipline` (separate stronger cap, init-granted at boot,
not transferable through normal cap-grant paths).

### Clock Provenance as a Typed Signal

Fuchsia's per-clock error bound and sync signal, and chrony's `tracking`
command, both expose *metadata about trust state* alongside the time value
itself. capOS's `ClockProvenance` label on `WallClock` captures this: a
validator that needs trustworthy time checks `provenance` rather than relying on
the presence of the cap alone.

The `ptpSynced` / `ntpSynced` distinction maps directly to the PTP vs NTP
accuracy gap: hardware timestamping is a stronger claim than software NTP, and
an OS-level audit trail needs to encode which applies.

### Wall Clock as a Granted, Attenuable Cap

Linux time namespaces demonstrate that clock offsets can be virtualized
per-context rather than being a single global ambient fact. capOS takes this
further: `WallClock` is a capability object, not a process-wide environment
variable. A test harness can inject a fake `WallClock`; a container process can
receive a `WallClock` with a different UTC offset (timezone) without any global
state change; a WASI host adapter can supply a per-instance `WallClock` to each
wasm module without sharing a mutable global.

### Step vs. Slew as Distinct Cap Methods

chrony's `makestep` and `leapsecmode` options distinguish step (abrupt
correction) from slew (rate adjustment). capOS should expose these as distinct
`ClockDiscipline` methods so the discipline policy is explicit at the capability
boundary — a sync service can be audited for whether it steps or only slews,
and the `ClockProvenance.lastStep` field makes a step visible to downstream
validators.

### Monotonic Invariant Is Non-Negotiable

Every surveyed system — Linux `CLOCK_MONOTONIC`, Fuchsia `ZX_CLOCK_OPT_MONOTONIC`,
chrony slew-only mode — treats monotonic ordering as inviolable. Any step in
the monotonic timeline breaks in-flight timers, scheduling deadlines, and ring
timeouts. capOS's `Timer.now()` monotonic base must never be adjusted; only the
wall-clock offset layered above it is disciplinable.

### Audit Timestamps and Trusted Time

Audit log entries in capOS will carry timestamps. The `ClockProvenance` label on
the `WallClock` used to generate those timestamps becomes the evidence of
timestamp trustworthiness: an audit consumer can reject entries generated while
provenance was `unsynchronized` or `stepped` (within a recency window after a
step), rather than silently accepting timestamps of unknown reliability.

### WASI Realtime Clock Mapping

WASI Preview 1 `clock_time_get(CLOCKID_REALTIME)` maps naturally to
`WallClock.wallTime()`. A per-instance WASI `WallClock` cap — granted at module
instantiation — means a wasm module receives the same read-only, provenance-labeled
time view that native capOS services receive, with no special privilege and no
ambient global.

---

## Sources

- [clock\_gettime(2) — Linux manual page](https://www.man7.org/linux/man-pages/man2/clock_gettime.2.html)
- [capabilities(7) — Linux manual page](https://www.man7.org/linux/man-pages/man7/capabilities.7.html)
- [adjtimex / clock\_adjtime(2) — Ubuntu manpage](https://manpages.ubuntu.com/manpages//jammy/man2/clock_adjtime.2.html)
- [time\_namespaces(7) — Linux manual page](https://man7.org/linux/man-pages/man7/time_namespaces.7.html)
- [Clock — Fuchsia reference (kernel objects)](https://fuchsia.dev/fuchsia-src/reference/kernel_objects/clock)
- [UTC behavior — Fuchsia](https://fuchsia.dev/fuchsia-src/concepts/kernel/time/utc/behavior)
- [chrony.conf(5) — chrony 4.3 manual](https://chrony-project.org/doc/4.3/chrony.conf.html)
- [Leap Second Smearing — Google for Developers](https://developers.google.com/time/smear)
- [Look Before You Leap — AWS blog on leap second and smearing](https://aws.amazon.com/blogs/aws/look-before-you-leap-the-coming-leap-second-and-aws/)
- [Configuring PTP Using ptp4l — Red Hat RHEL 7 System Administrator's Guide](https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/system_administrators_guide/ch-configuring_ptp_using_ptp4l)
- [IEEE 1588 Precision Time Protocol — NTP.org reference](https://www.ntp.org/reflib/ptp/)
