Managed-update sidecar

The chatalot-updater service and how to operate it.

What it is

The chatalot-updater container runs alongside chatalot-server in the same Compose stack. It owns the managed-update lifecycle:

Fetches release manifests from updates.seglamater.app over HTTPS.
Takes a pre-upgrade Postgres snapshot.
Pulls the new image (with digest pinning), runs pre-flight and migration scripts as throwaway containers, and recreates the server container on the new image.
Polls /health on the new container; on failure, restores the snapshot and redeploys the previous image.
Persists every apply's state + event log to a local SQLite database (/var/lib/chatalot-updater/state.db, mode 0600).

An apply is kicked off by an authenticated call from chatalot-server to the updater's HTTP API; operators never call the updater directly in normal use. The admin UI on the server surfaces apply status and event history by proxying signed requests to the updater.

Container layout

Enabling the sidecar adds two services (both opt-in via the updater Compose profile):

Container	Image	Purpose
`chatalot-updater`	built from `Dockerfile.updater`	Orchestrator + HTTP API on port 8081
`chatalot-socket-proxy`	`registry.seglamater.app/seglamater/docker-socket-proxy:0.3.0`	Filtered gateway to the host Docker socket

The socket-proxy image is mirrored from docker.io/tecnativa/docker-socket-proxy:0.3.0 into the Seglamater Forgejo registry so the managed-update system does not depend on Docker Hub availability. If you're self-hosting chatalot against a different registry, repeat the mirror step from the section Mirroring the socket-proxy image below and update the image: reference in docker-compose.yml.

Both containers live on an isolated internal Docker network (updater_internal). Neither the updater's port 8081 nor the proxy's port 2375 is published to the host. The updater also joins the stack's existing internal network so it can reach Postgres for pg_dump.

Socket-proxy ACL

The proxy permits exactly the Docker API surface BollardDockerClient uses (see crates/chatalot-updater/src/update/docker_client.rs):

CONTAINERS=1  IMAGES=1  NETWORKS=1  POST=1
EXEC=0  VOLUMES=0  SYSTEM=0  SECRETS=0  CONFIGS=0
SERVICES=0  TASKS=0  SWARM=0  PLUGINS=0  NODES=0
BUILD=0  COMMIT=0  DISTRIBUTION=0

EXEC=0 is the critical one — it means a compromised updater cannot run arbitrary commands inside any existing container. Extending the orchestrator with a new Docker call means revisiting this ACL; the module docstring flags that obligation.
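
The ACL above can be spot-checked with read-only GET probes sent through the proxy. The following is a minimal sketch, not part of the updater: the function names and the DOCKER_PROXY_URL override are hypothetical, and it assumes the proxy answers forbidden endpoints with HTTP 403, so any other response is read as "the request reached the daemon".

```shell
# Map an HTTP status from the socket proxy to a verdict.
# Assumption: the proxy rejects forbidden API sections with 403.
acl_verdict() {
    case "$1" in
        403) echo "denied" ;;
        2??|3??|404) echo "allowed" ;;   # the daemon answered, so the proxy let it through
        *) echo "unknown" ;;             # e.g. 000 when the proxy is unreachable
    esac
}

# Probe one Docker API path with a GET and print "<path> <verdict>".
# $1 = API path, e.g. /volumes
acl_probe() {
    local base="${DOCKER_PROXY_URL:-http://chatalot-socket-proxy:2375}"
    local status
    status="$(curl -s -o /dev/null -w '%{http_code}' "${base}$1")" || status="000"
    printf '%s %s\n' "$1" "$(acl_verdict "$status")"
}
```

Run from a shell inside the chatalot-updater container (the image ships curl), acl_probe /containers/json would be expected to report allowed and acl_probe /volumes denied, provided the 403 assumption holds for the proxy version in use.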

Image size

The runtime image is debian:bookworm-slim plus postgresql-client-17 (for pg_dump / psql) and curl (for the healthcheck probe). Expect roughly 110–130 MB pulled. The ticket originally aimed for under 30 MB, but the Postgres client alone is about 50 MB, and it cannot be dropped: the updater can't orchestrate DB snapshots without it.

Operator workflow

Enabling

Copy the secret placeholders:

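# Start from the placeholder, restrict the mode first, then overwrite the
# contents with a fresh 256-bit random token (64 hex characters).
cp secrets/updater_token.example secrets/updater_token
chmod 600 secrets/updater_token
openssl rand -hex 32 > secrets/updater_token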

cp secrets/cosign_pub.example secrets/cosign_pub
# Paste the cosign public key (see the .example file or the
# canonical chatalot.pub at https://updates.seglamater.app/.well-known/keys/chatalot.pub
# — or generate your own if self-publishing).
chmod 600 secrets/cosign_pub

Bring up the sidecar:

docker compose --profile updater up -d \
    chatalot-socket-proxy chatalot-updater

Verify health from inside the stack:

docker compose exec chatalot curl -sf http://chatalot-updater:8081/health

Expect {"status":"ok","version":"0.24.6"} (or the current updater crate version).
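
The enabling steps above can be double-checked with two small helpers. This is a sketch under stated assumptions: both function names are hypothetical, stat -c is GNU syntax (so a Linux host is assumed), and the retry counts in the example are arbitrary.

```shell
# Sanity-check the token file: exactly one line of 64 lowercase hex
# characters (the shape `openssl rand -hex 32` emits) and mode 600.
check_token_file() {
    local path="$1" mode
    [ -f "$path" ] || { echo "missing: $path" >&2; return 1; }
    [ "$(wc -l < "$path")" -eq 1 ] || { echo "expected exactly one line" >&2; return 1; }
    grep -Eqx '[0-9a-f]{64}' "$path" || { echo "unexpected token format" >&2; return 1; }
    mode="$(stat -c '%a' "$path")"    # GNU stat; assumes a Linux host
    [ "$mode" = "600" ] || { echo "mode is $mode, expected 600" >&2; return 1; }
}

# Retry a probe command until it succeeds or the attempts run out.
# $1 = max attempts, $2 = seconds to sleep after a failed attempt,
# remaining arguments = the command to run.
wait_until_ok() {
    local attempts="$1" delay="$2" i
    shift 2
    for ((i = 1; i <= attempts; i++)); do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
    done
    return 1
}

# Example (not executed here):
#   check_token_file secrets/updater_token
#   wait_until_ok 12 5 docker compose exec -T chatalot \
#       curl -sf http://chatalot-updater:8081/health
```

Passing the probe as a plain argument list keeps the retry loop independent of how the health endpoint is reached, so the same helper works from the host or from inside the stack.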

Inspecting apply state

The quickest view is the SQLite database inside the container:

docker compose exec chatalot-updater \
    sqlite3 -readonly /var/lib/chatalot-updater/state.db \
    'SELECT id, target_version, current_state, outcome FROM applies ORDER BY started_at DESC LIMIT 10;'

For event-level detail on a single apply:

docker compose exec chatalot-updater \
    sqlite3 -readonly /var/lib/chatalot-updater/state.db \
    "SELECT ts, event_code, detail FROM apply_events WHERE apply_id='<uuid>' ORDER BY ts;"

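When the apply id is not at hand, the two queries above can be folded into one that selects the events of the most recently started apply. A small sketch, using only the table and column names shown above; the function name is hypothetical.

```shell
# Print SQL that lists the events of the most recently started apply.
latest_apply_events_sql() {
    cat <<'SQL'
SELECT ts, event_code, detail
FROM apply_events
WHERE apply_id = (SELECT id FROM applies ORDER BY started_at DESC LIMIT 1)
ORDER BY ts;
SQL
}

# Example (not executed here):
#   docker compose exec -T chatalot-updater \
#       sqlite3 -readonly /var/lib/chatalot-updater/state.db \
#       "$(latest_apply_events_sql)"
```
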
Restart safety

If the updater container is restarted (operator-triggered or OOM-killed) mid-apply, the next startup's orphan scan transitions any in-flight apply to FrozenMaintenanceRequired. Operators should treat a frozen state as an intervention point — see the rollback recipe below.
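
After a restart, a query along these lines could list any applies the orphan scan has frozen. This is a sketch that assumes the state is stored in current_state as text containing "frozen" in some casing; the exact stored spelling is not confirmed here, so the match relies on SQLite's LIKE, which is case-insensitive for ASCII by default. The function name is hypothetical.

```shell
# Print SQL that lists applies whose state mentions "frozen".
# Assumption: current_state holds the state name as text.
frozen_applies_sql() {
    printf '%s\n' "SELECT id, target_version, current_state, started_at FROM applies WHERE current_state LIKE '%frozen%' ORDER BY started_at DESC;"
}

# Example (not executed here):
#   docker compose exec -T chatalot-updater \
#       sqlite3 -readonly /var/lib/chatalot-updater/state.db \
#       "$(frozen_applies_sql)"
```

An empty result would suggest no apply is waiting for intervention; a non-empty one gives the id to use in the rollback recipe.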

Rollback recipe

When an apply lands in FrozenMaintenanceRequired, rollback requires manual review. The snapshot produced during the failed apply is named pre-apply-<version>-<timestamp>.sql.gz under /var/backups/chatalot.
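
Because the snapshot name embeds the target version, a directory listing can be narrowed to the candidates for one apply. The helper below is a sketch with a hypothetical name; it relies only on the naming pattern stated above and makes no assumption about the timestamp format. Note that a version prefix such as 0.24.6 would also match a name like pre-apply-0.24.6-rc1-..., so the cross-check against the event log is still needed.

```shell
# Filter file names (one per line on stdin) down to snapshots taken before
# applying the given target version.
# $1 = target version, e.g. 0.24.6
snapshots_for_version() {
    local version="$1" name
    while IFS= read -r name; do
        case "$name" in
            "pre-apply-${version}-"*.sql.gz) printf '%s\n' "$name" ;;
        esac
    done
}

# Example (not executed here):
#   docker compose exec -T chatalot-updater ls /var/backups/chatalot \
#       | snapshots_for_version 0.24.6
# A candidate's archive integrity can then be checked with gzip's test mode:
#   docker compose exec -T chatalot-updater \
#       gzip -t /var/backups/chatalot/<snapshot-file>
```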

Identify the snapshot:

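# Sort by modification time, oldest first, so the newest snapshots are the
# last lines shown (a plain name sort is not chronological across versions).
docker compose exec chatalot-updater \
    ls -lahtr /var/backups/chatalot | tail -10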

Confirm the snapshot matches the failed apply (cross-check the snapshot_started event's detail blob for the snapshot id).
Restore via the CHAT-17 CLI:

docker compose exec chatalot \
    /app/chatalot-snapshot restore --id <snapshot-id>

Recreate the server container on the previous image. The updater records the previous image reference in the rollback_started event's detail:

docker compose exec chatalot-updater \
    sqlite3 -readonly /var/lib/chatalot-updater/state.db \
    "SELECT detail FROM apply_events WHERE apply_id='<uuid>' AND event_code='rollback_started';"

After chatalot-server is healthy, transition the frozen row to a terminal outcome so it no longer trips the orphan-scan alert:

docker compose exec chatalot-updater \
    sqlite3 /var/lib/chatalot-updater/state.db \
    "UPDATE applies SET outcome='rolled_back', error='manual-recovery' WHERE id='<uuid>';"

Wave-1 limitations

Called out here so operators don't assume they're bugs:

Cosign verification is stubbed. StubCosignVerifier validates the pubkey file is readable at startup and rejects missing/empty signatures on verify calls, but does not cryptographically verify anything. Every verify call emits a loud WARN target="cosign_stub" log — grep for that string to confirm you're still on wave-1.
Maintenance broadcast is a log-only stub. The pre-disruption broadcast that tells users "maintenance in N seconds" isn't wired to the server-side WebSocket API yet; the phase just logs and sleeps for CHATALOT_UPDATER_BROADCAST_GRACE_SECS.
No WebSocket progress stream. The admin UI polls /v1/apply/:id; sub-second live progress lands in wave-2.

All three are tracked as separate wave-2 tickets and will slot in without Compose changes.
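
The grep check mentioned for the cosign stub can be wrapped so it always prints a count. A minimal sketch with a hypothetical function name; it only counts lines containing the cosign_stub string and does not parse the log format.

```shell
# Count log lines on stdin that mention the stub verifier's WARN target.
# grep -c exits non-zero when the count is 0, so the exit status is normalized.
count_cosign_stub_warnings() {
    grep -c 'cosign_stub' || true
}

# Example (not executed here):
#   docker compose logs --no-color chatalot-updater | count_cosign_stub_warnings
```

A non-zero count after at least one verify call indicates the stub is still in use.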

Configuration reference

All env vars the updater reads (defaults in the second column):

Variable	Default	Purpose
`CHATALOT_UPDATER_LISTEN_ADDR`	`0.0.0.0:8081`	HTTP listen
`CHATALOT_UPDATER_API_TOKEN_FILE`	`/run/secrets/updater_token`	HMAC shared secret
`CHATALOT_UPDATER_SQLITE_PATH`	`/var/lib/chatalot-updater/state.db`	Apply state + event log
`DOCKER_HOST`	`tcp://chatalot-socket-proxy:2375`	Daemon endpoint
`CHATALOT_UPDATER_SERVER_CONTAINER`	`chatalot-server`	Container the orchestrator manipulates
`CHATALOT_UPDATER_COSIGN_PUBKEY_PATH`	`/run/secrets/cosign_pub`	Cosign public key
`CHATALOT_UPDATER_PRE_FLIGHT_TIMEOUT_SECS`	`30`	Pre-flight phase timeout
`CHATALOT_UPDATER_COSIGN_TIMEOUT_SECS`	`30`	Cosign verify timeout
`CHATALOT_UPDATER_SNAPSHOT_TIMEOUT_SECS`	`300`	pg_dump timeout
`CHATALOT_UPDATER_PULL_TIMEOUT_SECS`	`600`	docker pull timeout
`CHATALOT_UPDATER_MIGRATE_TIMEOUT_SECS`	`600`	Migrate one-shot timeout
`CHATALOT_UPDATER_HEALTH_CHECK_TIMEOUT_SECS`	`60`	Post-start health wait
`CHATALOT_UPDATER_BROADCAST_GRACE_SECS`	`30`	Maintenance broadcast grace

All timeouts are clamped to [1, 3600] seconds with a warning log on out-of-range values.
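
The clamping rule can be illustrated in shell. This only mirrors the documented behavior; the updater's own implementation lives in the Rust crate and is not reproduced here, and the function name and warning text are made up for the example.

```shell
# Clamp a timeout to the documented [1, 3600] range, warning on stderr
# when the supplied value falls outside it.
# $1 = variable name (used in the warning only), $2 = integer value in seconds
clamp_timeout() {
    local name="$1" value="$2"
    if [ "$value" -lt 1 ]; then
        echo "WARN: $name=$value is below range, using 1" >&2
        echo 1
    elif [ "$value" -gt 3600 ]; then
        echo "WARN: $name=$value is above range, using 3600" >&2
        echo 3600
    else
        echo "$value"
    fi
}
```

For instance, a CHATALOT_UPDATER_PULL_TIMEOUT_SECS of 7200 would take effect as 3600 under this rule, and a value of 0 as 1.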

Mirroring the socket-proxy image

The stock docker-compose.yml references registry.seglamater.app/seglamater/docker-socket-proxy:0.3.0 — a mirror of the upstream tecnativa/docker-socket-proxy:0.3.0 image published via the Seglamater Forgejo registry. This keeps the managed-update install path Docker-Hub-independent: once chatalot is running, pulling updates and their required sidecar images hits only Seglamater infrastructure.

If you're self-hosting chatalot against your own registry, replicate the mirror:

# 1. Pull the upstream image from Docker Hub.
docker pull tecnativa/docker-socket-proxy:0.3.0

# 2. Verify the content digest matches what upstream published at
#    mirror time. sha256:9e4b9e7517a6b660f2cc903a19b257b1852d5b3344794e3ea334ff00ae677ac2
docker inspect --format '{{index .RepoDigests 0}}' \
    tecnativa/docker-socket-proxy:0.3.0

# 3. Re-tag for your registry (example: my-registry.example.com, org `acme`).
docker tag tecnativa/docker-socket-proxy:0.3.0 \
    my-registry.example.com/acme/docker-socket-proxy:0.3.0

# 4. Push.
echo "$REGISTRY_TOKEN" | docker login my-registry.example.com -u acme --password-stdin
docker push my-registry.example.com/acme/docker-socket-proxy:0.3.0

# 5. Update docker-compose.yml so chatalot-socket-proxy uses the
#    mirrored path, then restart the sidecar:
#      docker compose --profile updater up -d chatalot-socket-proxy
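
Step 2 above leaves the digest comparison to the eye; it can be scripted so a mismatch stops the mirror before anything is pushed. A small sketch with a hypothetical helper name, comparing the part after the @ of a RepoDigests entry with the expected digest.

```shell
# Succeed when a RepoDigests entry ("repo@sha256:...") carries the expected digest.
# $1 = RepoDigests entry, $2 = expected digest including the "sha256:" prefix
digest_matches() {
    local repo_digest="$1" expected="$2"
    [ "${repo_digest##*@}" = "$expected" ]
}

# Example (not executed here):
#   actual="$(docker inspect --format '{{index .RepoDigests 0}}' \
#       tecnativa/docker-socket-proxy:0.3.0)"
#   digest_matches "$actual" \
#       "sha256:9e4b9e7517a6b660f2cc903a19b257b1852d5b3344794e3ea334ff00ae677ac2" \
#       || { echo "digest mismatch, not pushing" >&2; exit 1; }
```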

After mirroring, you can (and should) sign the image in your registry with cosign, using the same key that signs your chatalot release manifests, so the supply chain is covered symmetrically end-to-end. Cosign is stubbed in wave-1, so a missing or invalid signature does not block the pull yet; the signature is simply in place for wave-2 to verify.

Crate: crates/chatalot-updater/
Orchestrator: crates/chatalot-updater/src/update/orchestrator.rs
HTTP API: crates/chatalot-updater/src/update/http_api.rs
Dockerfile: Dockerfile.updater
Compose services: docker-compose.yml (chatalot-updater + chatalot-socket-proxy)