# Proposal: Default User Avatar From Identity Hash

How capOS should pick a default avatar for an account or session in a way that
is deterministic, stable across reboots and devices, free of network
side-effects, and easy for the user to override with an explicit choice.


## Problem

Today every `UserSession` is metadata-only: name (sometimes), profile class
(`anonymous` / `guest` / `operator` / future durable accounts), and a
session-token entropy field. Any UI that needs to show "who is this" — login
screen, shell prompt, remote-session client, future GUI — has nothing to draw
beyond a profile-class fallback. The consequences:

* New accounts and anonymous sessions look identical even when they have
  different identities, which is misleading in any multi-account context.
* If an admin assigns avatars by hand, the assignment lives outside the
  identity surface and is not stable across re-imports of the account.
* Without an authority-controlled default, every UI invents its own,
  including potentially a Gravatar-style network call that exposes the
  account email to a third party.

The branding asset set already ships 144 curated rounded-card tiles
(`branding/user-icons/set-flat/`, 72 icons; `branding/user-icons/set-modern/`,
72 icons). They are typed as user avatars but have no consumer yet.


## Goals

1. Every account and session resolves to a concrete avatar without relying on
   network lookups or external services.
2. The default is **stable**: the same identity always resolves to the same
   tile, on every host that imports the account, until the user explicitly
   sets an override.
3. The default is **derived from a stable identifier**, not from mutable
   profile fields like display name. Renaming an account does not change its
   avatar.
4. The override is **persistent** and travels with the account, not with a
   per-host UI preference store.
5. Anonymous and short-lived sessions still get *some* deterministic avatar so
   they look distinct from each other within a session lifetime, without
   leaking durable identity.
6. The avatar surface is **a capability**, not an ambient lookup. UI code asks
   for an `Avatar` from the `UserSession` it already holds.


## Non-Goals

* Generative identicons (jdenticon-style pixel art). The curated tile sets
  are already on disk and visually consistent with the rest of the branding.
* Per-user avatar uploads. The override is a *selection* from the shipped set
  for now; arbitrary blob uploads are a separate, larger design question
  (storage, scanning, capability scope).
* Avatar themes that follow OS dark/light mode. Theme handling is the
  responsibility of the rendering surface; the identity layer commits to a
  single tile per account.
* Group/role icons, badges, presence indicators. Those layer on top of the
  avatar, they do not replace it.


## Design

### Identity Inputs

The hash input is the **stable account identifier**, never the display name
and never anything that can be rotated for security reasons. Every input is
**domain-tagged** with its subject class and every field is **length-framed**,
so an attacker who can choose bytes for one class cannot synthesize a
collision against another:

```
input := classTag || u16(len(field_1)) || field_1 || ... || u16(len(field_k)) || field_k
```

| Subject                          | Class tag           | Fields (in order)                                       |
| -------------------------------- | ------------------- | ------------------------------------------------------- |
| Durable account                  | `"acct"`            | `principalId`                                           |
| Manifest-seeded operator account | `"oper"`            | `principalId` (resolved to the seeded operator)         |
| Service identity                 | `"svc "`            | `principalId` (manifest or registry)                    |
| Federated account                | `"fed "`            | `providerKind`, `issuer`, `tenant`, `subject`           |
| Anonymous session                | `"anon"`            | session-token entropy                                   |
| Guest session                    | `"gst "`            | session-token entropy                                   |

Class tags are 4 fixed bytes (space-padded where shorter) so the input
prefix is unambiguous without needing a separator. The federated layout
matches the canonical external subject key from
[user identity and policy](user-identity-and-policy-proposal.md): the same
`(providerKind, issuer, tenant, subject)` tuple that produces
`AccountExternalBinding.subjectHash` — `subject` alone is not unique across
identity providers and must not be used directly. Length-framing ensures
that, e.g., `(issuer="A", tenant="BC")` and `(issuer="AB", tenant="C")`
hash differently even though their concatenations would otherwise be equal.
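Here is a minimal Python sketch of the framing rule. It assumes the `u16` length prefix is big-endian, because the proposal does not fix the byte order. It also assumes text fields such as `issuer` are UTF-8 encoded before framing. `frame_input`, `CLASS_TAGS`, and the `"oidc"` provider kind are illustrative names, not part of any capOS API.

```python
import struct

# Four-byte class tags from the table above, space-padded where shorter.
CLASS_TAGS = {
    "acct": b"acct",
    "oper": b"oper",
    "svc": b"svc ",
    "fed": b"fed ",
    "anon": b"anon",
    "gst": b"gst ",
}


def frame_input(subject_class: str, *fields: bytes) -> bytes:
    """Return classTag || u16(len(f1)) || f1 || ... || u16(len(fk)) || fk.

    Assumption: the u16 length prefix is big-endian.
    """
    out = bytearray(CLASS_TAGS[subject_class])
    for field in fields:
        if len(field) > 0xFFFF:
            raise ValueError("field does not fit a u16 length prefix")
        out += struct.pack(">H", len(field))  # big-endian u16 length
        out += field
    return bytes(out)


# Federated fields in table order: providerKind, issuer, tenant, subject.
a = frame_input("fed", b"oidc", b"A", b"BC", b"user-1")
b = frame_input("fed", b"oidc", b"AB", b"C", b"user-1")
assert b"A" + b"BC" == b"AB" + b"C"  # naive concatenation is ambiguous
assert a != b                        # length framing keeps them apart
```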

### Hash and Mapping

```
digest = BLAKE2b-256(personalization = "capos-avatar-v1",
                    message         = input)
tile_index = u32_be(digest[0:4]) % len(active_set)
```
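The digest and mapping above map directly onto Python's standard `hashlib.blake2b`. Its `person` argument takes a personalization string of up to 16 bytes and zero-pads the 15-byte tag. This is a sketch for checking the arithmetic, not the capOS resolver. `avatar_digest` and `tile_index` are illustrative names. The example input is framed by hand under the same big-endian length assumption.

```python
import hashlib

PERSONALIZATION = b"capos-avatar-v1"  # public tag, 15 bytes (BLAKE2b allows up to 16)


def avatar_digest(framed_input: bytes) -> bytes:
    """BLAKE2b-256 of the framed identity input under the avatar personalization."""
    return hashlib.blake2b(
        framed_input, digest_size=32, person=PERSONALIZATION
    ).digest()


def tile_index(framed_input: bytes, set_len: int) -> int:
    """u32_be(digest[0:4]) % len(active_set)."""
    if set_len <= 0:
        raise ValueError("active set must contain at least one tile")
    digest = avatar_digest(framed_input)
    return int.from_bytes(digest[0:4], "big") % set_len


# Hand-framed durable account: "acct" || u16_be(4) || b"id-1".
example = b"acct" + (4).to_bytes(2, "big") + b"id-1"
idx = tile_index(example, 72)
assert 0 <= idx < 72
assert idx == tile_index(example, 72)  # same identity, same tile
```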

* `BLAKE2b` is the digest primitive named by the
  [cryptography and key management proposal](cryptography-and-key-management-proposal.md);
  no new primitive.
* `"capos-avatar-v1"` is the **public personalization tag** (BLAKE2's
  built-in domain-separation parameter), not a secret key. The avatar
  selection is fully derivable from public account metadata; no MAC and no
  HKDF subkey derivation is involved. Bumping `v1` to `v2` would let us
  re-issue defaults across the fleet (e.g., if a future tile set deprecates
  an icon) without affecting any other hash that consumes the same
  identifier.
* `u32_be` over the first four digest bytes is sufficient: the tile-set
  size (72) is far smaller than `2^32`, and the worst-case relative modulo
  bias of a 32-bit value reduced against 72 buckets is below `2^-25`, which
  is visually irrelevant.
* Collisions are *fine* and expected: with 72 tiles, two arbitrary accounts
  collide with probability 1/72 (~1.4%). By the birthday bound, a tenant of
  only 10 users already has a roughly 48% chance that some pair shares a
  tile, and at 36 users a collision is a near certainty (above 99.9%). The
  avatar conveys an identity hint, not identity proof. Higher-assurance UIs
  combine the avatar with the display name and account id.
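The bias and collision figures in the last two bullets are easy to recheck. This sketch computes two things. The first is the worst-case relative modulo bias of reducing a 32-bit value mod 72. The second is the exact birthday probability for a given number of users. The modelling assumption is that tiles are assigned uniformly.

```python
def relative_modulo_bias(space_bits: int, buckets: int) -> float:
    """Worst-case relative excess probability when reducing x mod `buckets`.

    With 2**space_bits values, every bucket receives floor(2**space_bits / buckets)
    preimages and (2**space_bits % buckets) of them receive one more.
    """
    floor_count, remainder = divmod(1 << space_bits, buckets)
    return 0.0 if remainder == 0 else 1 / floor_count


def birthday_collision(users: int, tiles: int) -> float:
    """P(at least two of `users` uniform tile draws land on the same tile)."""
    p_all_distinct = 1.0
    for k in range(users):
        p_all_distinct *= (tiles - k) / tiles
    return 1.0 - p_all_distinct


assert relative_modulo_bias(32, 72) < 2 ** -25
assert birthday_collision(36, 72) > 0.999
print(f"10 users: {birthday_collision(10, 72):.2f}")  # prints "10 users: 0.48"
```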

### Active Set

Each system commits to one active set (`set-flat` or `set-modern`). The
active set is a system-level configuration value, not a per-user choice, so
that:

* All accounts on a host look stylistically uniform.
* Switching the system theme remaps every account's default deterministically
  but consistently — every account moves to its set-modern tile of the same
  hash position, not to a random new one.

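The active set is exposed via `SystemInfo.avatarSet` (an extension to the
existing `SystemInfo` capability). Future themes add new sets without
reshuffling assignments within existing sets. A new set with a different
tile count keeps each account's digest but reduces it modulo its own size.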

### Override

A durable account can pin an explicit tile that wins over the hash-derived
default. The override is a new optional field on `AccountRecord`:

```capnp
struct AccountRecord {
  # ...existing fields @0..@18 from the identity proposal...
  avatarOverride @19 :AvatarRef;       # zero-length set/name means "no override"
}

struct AvatarRef {
  set  @0 :Text;   # e.g. "set-flat", "set-modern"
  name @1 :Text;   # e.g. "panda" (the bare semantic name, without NNN- prefix)
}
```

* The override is mutated through the existing
  `AccountStoreManager.update(recordId, expectedStoreEpoch,
  expectedRecordVersion, expectedHash, patch)` compare-and-set protocol
  defined by the identity proposal. Setting or clearing an avatar bumps
  `recordVersion`, recomputes `contentHash`, and links to `previousHash`
  exactly like any other field change; nothing about avatar overrides
  bypasses the record-version, store-epoch, or hash-chain checks.
* Validation: `set` must name a set the active build ships, and `name`
  must resolve to a tile within that set. Records that fail this check at
  load time fall back to the hash-derived default and emit an audit
  record; the record is not silently rewritten.
* The override is checked first; the hash is the fallback.
* `update` patches use the standard "absent field means unchanged"
  convention. Clearing an override is an explicit operation: the patch
  must contain `avatarOverride` with both `set` and `name` empty. An
  unrelated update that omits `avatarOverride` from its patch must not
  drop a previously pinned override.
* The override is **a name**, not a tile blob. Storing only the name
  keeps the account record compact, makes shipped-asset replacement
  automatic (a re-rendered tile with the same name applies everywhere),
  and avoids embedding image data in identity records.
* Account export/import carries the field unchanged: since the import
  path validates `set`/`name` against the importing host's shipped tile
  catalog, an override that names a tile the importing host does not
  ship is downgraded to the hash-derived default at import time and
  audited, never silently dropped.
* Anonymous and guest sessions cannot pin an override: they are
  short-lived and have nowhere durable to store it. Their hash result
  is the only avatar they get.
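The patch rules for `avatarOverride` can be modelled in a few lines. This Python sketch covers only the field semantics: an absent field means unchanged, two empty fields mean clear, and anything else must name a shipped tile. It also covers the load-time fallback. The compare-and-set, record-version, and hash-chain checks of `AccountStoreManager.update` are out of scope. `AvatarRef` mirrors the schema above. `SHIPPED_TILES`, `apply_avatar_patch`, and `effective_override` are hypothetical names, and the two-tile catalog is made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class AvatarRef:
    set: str   # e.g. "set-flat"
    name: str  # bare semantic name, e.g. "panda"

    def is_clear(self) -> bool:
        return self.set == "" and self.name == ""


# Hypothetical shipped catalog: set -> bare tile names.
SHIPPED_TILES: Dict[str, Set[str]] = {
    "set-flat": {"panda", "robot"},
    "set-modern": {"panda", "robot"},
}


def is_shipped(ref: AvatarRef) -> bool:
    return ref.name in SHIPPED_TILES.get(ref.set, set())


def apply_avatar_patch(current: Optional[AvatarRef], patch: dict) -> Optional[AvatarRef]:
    """Stored override after one update patch; None means "no override"."""
    if "avatarOverride" not in patch:
        return current  # unrelated update: keep any pinned override
    ref = patch["avatarOverride"]
    if ref.is_clear():
        return None     # explicit clear: both set and name empty
    if not is_shipped(ref):
        raise ValueError(f"no shipped tile {ref.set}:{ref.name}")
    return ref


def effective_override(stored: Optional[AvatarRef], audit: List[str]) -> Optional[AvatarRef]:
    """Load-time check: an invalid override falls back to the hash default.

    The stored record is left untouched; only an audit entry is emitted.
    """
    if stored is None or is_shipped(stored):
        return stored
    audit.append(f"avatar override {stored.set}:{stored.name} not shipped; using default")
    return None
```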

### Capability Surface

`UserSession` gains:

```capnp
interface UserSession {
  info         @0 () -> (info :SessionInfo);
  auditContext @1 () -> (sessionId :Data, principalId :Data);
  logout       @2 () -> ();
  avatar       @3 () -> (avatar :Avatar);
}

interface Avatar {
  # Stable, content-addressed handle for the chosen tile. `digest` is the
  # SHA-256 of the encoded WebP bytes, NOT the identity hash. Two accounts
  # that resolve to the same tile (whether through hash collision or
  # explicit override) return the same `digest`, so UIs can cache by it.
  ref  @0 () -> (set :Text, name :Text, digest :Data);

  # Bytes of the encoded WebP, when the caller is allowed to render it
  # locally. Same caps that grant `UserSession` are sufficient; no separate
  # avatar-read authority.
  read @1 () -> (image :Data, mime :Text);
}
```

The `avatar` ordinal `@3` follows the existing `info @0`, `auditContext @1`,
`logout @2` ordinals on `UserSession` and takes the next free position. If
another schema change lands ahead of this one, the avatar method moves to the
next free ordinal instead. Until the change ships, the method name is the
contract. Once it ships, the ordinal is fixed, because Cap'n Proto identifies
methods on the wire by ordinal, not by name.

* `ref` returns a content-addressed identifier suitable for caching across
  reboots without re-reading the bytes. The asset SHA-256 is computed once
  per shipped tile at build time and is identical across accounts that
  resolve to the same tile, so UI clients can key their local cache by
  `digest` and dedupe across many sessions. The identity-derived digest
  from the `Hash and Mapping` section is internal to the avatar resolver
  and is not exposed by `ref`.
* `read` returns the WebP bytes of the resolved tile. The ABI does not
  expose alternate formats; surfaces that need PNG can decode locally.
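Here is a sketch of the `ref` caching contract. The build step hashes each shipped tile once with SHA-256. A client keys its cache by that digest, so sessions that resolve to the same tile share one entry. The sketch assumes the `branding/user-icons/<set>/<NNN>-<name>.webp` layout. `build_digest_catalog` and `TileCache` are hypothetical helpers, not capOS APIs.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, Tuple


def build_digest_catalog(root: Path) -> Dict[Tuple[str, str], bytes]:
    """Map (set, bare name) -> SHA-256 of the tile's encoded WebP bytes."""
    catalog = {}
    for tile in sorted(root.glob("*/*.webp")):
        bare_name = tile.stem.split("-", 1)[1]  # drop the NNN- prefix
        catalog[(tile.parent.name, bare_name)] = hashlib.sha256(tile.read_bytes()).digest()
    return catalog


class TileCache:
    """Client-side cache keyed by asset digest, so identical tiles are fetched once."""

    def __init__(self) -> None:
        self._by_digest: Dict[bytes, bytes] = {}

    def get(self, digest: bytes, fetch: Callable[[], bytes]) -> bytes:
        if digest not in self._by_digest:
            self._by_digest[digest] = fetch()  # e.g. a call to Avatar.read
        return self._by_digest[digest]
```

Two sessions whose `ref` returns the same `digest` hit the same cache entry, so `fetch` runs only once for both.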

### When the Avatar is Bound

Resolution happens lazily, the first time `avatar()` is called on a session:

1. If the underlying account has an override, pick that tile.
2. Otherwise, frame and hash the account/session identity input, compute
   `tile_index` against the active set as in Hash and Mapping, and look up the
   matching `branding/user-icons/<set>/<NNN>-<name>.webp`.
3. Cache the result on the in-memory session object until the session is torn
   down.

There is no precomputation step at boot or login; the cost is one
personalized BLAKE2b digest plus one filesystem read per session, both
negligible.
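The three resolution steps can be sketched as a small session object that resolves on first use and reuses the result afterwards. Everything here is hypothetical scaffolding. `Session` is not the real `UserSession`, and the catalog is passed in as sorted file listings. The sketch assumes `tile_index` selects the entry at that position in the sorted listing of the active set, since the proposal does not spell out how indices map to `NNN` prefixes. The framing uses the same big-endian length assumption as before.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

PERSONALIZATION = b"capos-avatar-v1"


def _frame(tag: bytes, *fields: bytes) -> bytes:
    out = bytearray(tag)
    for field in fields:
        out += len(field).to_bytes(2, "big") + field  # assumed big-endian u16 length
    return bytes(out)


def _tile_index(framed: bytes, set_len: int) -> int:
    digest = hashlib.blake2b(framed, digest_size=32, person=PERSONALIZATION).digest()
    return int.from_bytes(digest[:4], "big") % set_len


class Session:
    """Hypothetical in-memory session; resolves its avatar lazily, once."""

    def __init__(self, tag: bytes, fields: Tuple[bytes, ...],
                 override: Optional[Tuple[str, str]], active_set: str,
                 catalog: Dict[str, List[str]]) -> None:
        self._tag, self._fields = tag, fields
        self._override = override      # (set, bare name) or None
        self._active_set = active_set
        self._catalog = catalog        # set -> sorted "NNN-name.webp" listing
        self._cached: Optional[str] = None
        self.resolutions = 0           # how many times _resolve actually ran

    def avatar_path(self) -> str:
        if self._cached is None:       # step 3: cache until teardown
            self.resolutions += 1
            self._cached = self._resolve()
        return self._cached

    def _resolve(self) -> str:
        if self._override is not None:  # step 1: an explicit override wins
            set_name, bare = self._override
            for fname in self._catalog.get(set_name, []):
                if fname.split("-", 1)[1] == bare + ".webp":
                    return f"branding/user-icons/{set_name}/{fname}"
            # Unknown override: fall back to the hash default (auditing omitted).
        listing = self._catalog[self._active_set]  # step 2: hash-derived default
        idx = _tile_index(_frame(self._tag, *self._fields), len(listing))
        return f"branding/user-icons/{self._active_set}/{listing[idx]}"
```

Both shipped sets hold 72 tiles. Building the same session against `set-modern` instead of `set-flat` therefore lands on the same `NNN` position, which matches the Active Set behaviour.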


## Surfaces That Consume Avatars

* **Login UI** (text shell `login`, future web login, future GUI): show the
  avatar next to the typed username while waiting for the password prompt,
  so the user has a non-cryptographic visual confirmation that they are
  selecting the account they expect. The avatar itself is not a secret and
  exposing it pre-auth is intentional — the same is true of display names.
* **Shell prompt and `whoami`**: optional inline avatar reference for
  graphical terminals; pure text shells fall back to `set/name`.
* **Remote-session client and Tauri wrapper**: the bridge already receives a
  view model from the trusted backend; add `avatar` to the session view
  model so the browser/desktop UI never queries identity directly.
* **System monitoring / audit views**: the avatar identifies the actor in
  human-readable timelines without leaking the underlying id.


## Anonymous and Guest Sessions

Anonymous and guest sessions get a hash-derived avatar, with these
constraints:

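* The input uses the four-byte class tags `"anon"` and `"gst "` from the
  framed-input table above. An anonymous session whose entropy bytes happen
  to equal another subject's identifier therefore still produces a different
  hash input (the tiles may still coincide, as with any two subjects).
* The avatar lives only as long as the session-token entropy. Re-anonymizing
  draws fresh entropy and therefore a fresh digest, so the new tile differs
  from the old one with probability 71/72.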
* The login UI distinguishes `anonymous` and `guest` sessions from durable
  accounts by a chrome accent (border colour, badge, label), not by reusing
  one fixed tile. Reusing a fixed tile would make every anonymous user look
  identical, which loses the "tell sessions apart at a glance" property.
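The two hashing properties above can be illustrated directly. The same entropy bytes framed under `"anon"` and `"gst "` give different digests, and fresh entropy gives a fresh digest. `session_digest` and `new_session_entropy` are hypothetical names. Python's `secrets.token_bytes` stands in for the real session-token entropy source. The framing again assumes a big-endian length prefix.

```python
import hashlib
import secrets

PERSONALIZATION = b"capos-avatar-v1"


def session_digest(tag: bytes, entropy: bytes) -> bytes:
    """Avatar digest for an anonymous ("anon") or guest ("gst ") session."""
    framed = tag + len(entropy).to_bytes(2, "big") + entropy
    return hashlib.blake2b(framed, digest_size=32, person=PERSONALIZATION).digest()


def new_session_entropy() -> bytes:
    """Stand-in for session-token entropy: 32 bytes from the OS CSPRNG."""
    return secrets.token_bytes(32)


entropy = new_session_entropy()
# Same bytes, different class tag: different hash input and digest.
assert session_digest(b"anon", entropy) != session_digest(b"gst ", entropy)
# Re-anonymizing: new entropy, new digest (the tile can still repeat, 1 in 72).
assert session_digest(b"anon", entropy) != session_digest(b"anon", new_session_entropy())
```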


## Privacy and Security

* The hash uses a **public personalization tag**, not a secret key. The tile
  derivation is fully reproducible from public account metadata; the privacy
  guarantee is "the avatar leaks no extra information beyond what the
  identity surface already exposes," not "the underlying id is hidden by
  cryptography." The identity digest never leaves the identity layer — only
  its modulo-N tile-index result reaches UI surfaces, embedded in the
  resolved `set/name`.
* **Cross-host correlation is intentionally observable.** Because the hash
  has no per-host salt, the same durable or pseudonymous principal imported
  into two hosts produces the same `set/name` and `digest` on both. Anyone
  who watches the avatar surface on multiple hosts can correlate "same
  account is here too," and combined with display name or session metadata
  the avatar acts as a low-entropy identifier. This is the same correlation
  surface the principal id and external-binding `subjectHash` already
  expose, so we treat it as acceptable for ordinary multi-host accounts and
  call it out explicitly so privacy-sensitive deployments can pin a generic
  override or set a per-host override policy.
* Operators can audit-log avatar overrides as account-record edits, like
  any other identity field; the override mutation goes through
  `AccountStoreManager.update` and produces the same store-epoch /
  record-version / content-hash audit chain as other record changes.
* The avatar is **not authentication**. Two accounts with the same tile are
  not equivalent; the system always uses the principal id internally. The
  avatar is an identification *aid* for humans, like a display name.
* No network lookups. No Gravatar, no third-party calls.


## Open Questions

* Should the hash include a per-system salt so an account imported into two
  hosts does not always show the same default tile, similar to how Unix
  uid/gid space is host-local? This proposal currently says **no** —
  cross-host stability is more useful than host-local distinctness, since
  durable accounts already have a globally unique id.
* Should `Avatar.read` expose only the active-set bytes, or both
  `set-flat`/`set-modern` so a UI can render adaptive variants? Current
  preference: only the active set. Adaptive themes are the surface's job,
  not the identity layer's.
* How should the manifest seed an override for the operator account? A
  `seed.operator.avatar = "set-flat:robot"` field in `system.cue` is the
  natural extension, but only if operators express a need — the hash-derived
  default is already deterministic.
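For the per-system salt question, note that BLAKE2b already has a built-in `salt` parameter of up to 16 bytes, and `hashlib.blake2b` exposes it directly. This sketch is not part of the proposal, whose current answer is **no**. It only illustrates the trade-off. Without a salt, every host computes the same tile. With hypothetical per-host salts (`b"host-A"`, `b"host-B"`), two hosts disagree for most accounts.

```python
import hashlib
from typing import Optional

PERSONALIZATION = b"capos-avatar-v1"


def tile_for_host(framed: bytes, set_len: int, host_salt: Optional[bytes] = None) -> int:
    """Tile index as specified (no salt), or under a hypothetical per-host salt."""
    if host_salt is None:
        h = hashlib.blake2b(framed, digest_size=32, person=PERSONALIZATION)
    else:
        h = hashlib.blake2b(framed, digest_size=32, person=PERSONALIZATION, salt=host_salt)
    return int.from_bytes(h.digest()[:4], "big") % set_len


# 200 synthetic framed account inputs: "acct" || u16_be(2) || u16_be(i).
accounts = [b"acct" + (2).to_bytes(2, "big") + i.to_bytes(2, "big") for i in range(200)]
same_unsalted = sum(tile_for_host(a, 72) == tile_for_host(a, 72) for a in accounts)
same_salted = sum(
    tile_for_host(a, 72, b"host-A") == tile_for_host(a, 72, b"host-B") for a in accounts
)
assert same_unsalted == len(accounts)  # unsalted: every host agrees
assert same_salted < len(accounts)     # salted: only chance agreements remain
```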


## Relevant Research

* Curated tile set with set-aware rounded-card masking:
  `branding/extract_user_icons.py`, `branding/user-icons/set-{flat,modern}/`.
* Identity surfaces that will host the new method:
  [User Identity, Sessions, and Policy](user-identity-and-policy-proposal.md);
  [Boot to Shell](boot-to-shell-proposal.md);
  [Delegated Subject Context](delegated-subject-context-proposal.md).
* Hash primitive and key-personalization conventions:
  [Cryptography and Key Management](cryptography-and-key-management-proposal.md).
