Troubleshooting

Common failure modes for a self-hosted Hivemind install, drawn from real deploy and dogfood experience. Each entry names the symptom you'll see, the root cause, and the concrete fix. Commands are copy-pasteable.

If you hit something not listed here, check the server boot log first; most install-time problems announce themselves there:

docker compose logs --tail=50 server

Look for the INFO lines that confirm each subsystem came up (database migrations applied, BYOK LLM provider initialised, integration credential key loaded, listening) and for any WARN/ERROR lines explaining which one didn't.
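To turn that scan into a checklist, here is a minimal POSIX-shell sketch. summarize_boot_log is a hypothetical helper, and its marker strings are paraphrased from the subsystem list above rather than copied from Hivemind's exact log text, so adjust them to whatever your release actually prints.

```shell
# Hypothetical helper: summarize a Hivemind boot log read from stdin.
# ASSUMPTION: the marker strings below are paraphrases of the subsystem
# list in this guide, not guaranteed to match the exact log wording.
summarize_boot_log() {
  log=$(cat)
  for marker in "migrations applied" "BYOK LLM provider initialised" \
                "integration credential key loaded" "listening"; do
    # -i: case-insensitive, -F: fixed string, --: marker is never an option
    if printf '%s\n' "$log" | grep -qiF -- "$marker"; then
      printf '[ok] %s\n' "$marker"
    else
      printf '[missing] %s\n' "$marker"
    fi
  done
  # Echo every WARN/ERROR line so the explanation sits under the checklist.
  printf '%s\n' "$log" | grep -E 'WARN|ERROR' || true
}

# Usage:
#   docker compose logs --tail=50 server | summarize_boot_log
```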

Install + boot

Integration routes return 503 `integration_key_missing`

You see this from POST /api/v1/agents/{id}/chatalot/connect (or any other /chatalot/* lifecycle route) with body {"error":"integration_key_missing", ...}.

Cause: HIVEMIND_INTEGRATION_ENCRYPTION_KEY (or _FILE) is not wired into hivemind-server. The server logs this at boot: HIVEMIND_INTEGRATION_ENCRYPTION_KEY (or _FILE) not set — integration routes will return 503.

Fix (v0.1.9 and later): scripts/install.sh provisions secrets/integration_encryption_key automatically. Confirm the file is there:

ls -la secrets/integration_encryption_key
# Expect: -rw------- ... 64 bytes (64 hex characters = 32 random bytes), or 65 with a trailing newline
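For a stricter check than reading the ls output, a sketch like the following tests the file's mode and contents. check_key_file is a hypothetical helper; its expectations (mode 0600, 64 hex characters) come from the listing above and the openssl rand -hex 32 command below, not from Hivemind's source.

```shell
# Hypothetical helper: sanity-check the integration key file.
# ASSUMPTION: a valid key is exactly 64 hex characters with mode 0600,
# as produced by `openssl rand -hex 32` plus chmod 0600.
check_key_file() {
  f=${1:-secrets/integration_encryption_key}
  [ -f "$f" ] || { echo "missing: $f"; return 1; }
  # find -perm with a plain octal mode matches that exact mode (POSIX).
  [ -n "$(find "$f" -perm 600)" ] || { echo "mode is not 0600: $f"; return 1; }
  # Strip newlines; openssl rand -hex appends one.
  key=$(tr -d '\n' < "$f")
  printf '%s\n' "$key" | grep -Eq '^[0-9a-fA-F]{64}$' \
    || { echo "not 64 hex characters: $f"; return 1; }
  echo "ok: $f"
}

# Usage (from the install directory):
#   check_key_file secrets/integration_encryption_key
```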

If it's missing, re-run scripts/install.sh or generate it by hand:

openssl rand -hex 32 > secrets/integration_encryption_key
chmod 0600 secrets/integration_encryption_key
docker compose up -d --force-recreate server

After the recreate, the boot log should read integration credential key loaded — chatalot integrations enabled.

Fix (pre-v0.1.9): upgrade. v0.1.9 added the compose secret and the installer provisioning together; on older releases you have to create the file by hand and edit docker-compose.yml to mount it as a docker secret.

Boot log warns `LLM startup ping FAILED — missing HIVEMIND_LLM_API_KEY`

The server boots and is healthy and reachable, but LLM-dependent surfaces (public chat endpoint, agent runtime tool-call loop, content studio drafting) are degraded.

Cause: the BYOK LLM provider isn't configured in .env.

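Fix: set the provider triplet in .env, then recreate the server container so it picks up the new values (docker compose restart does not apply configuration changes):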

# In .env (Anthropic example; see byok-llm.md for OpenAI / Ollama / OpenAI-compat)
HIVEMIND_LLM_PROVIDER=anthropic
HIVEMIND_LLM_API_KEY=sk-ant-...
HIVEMIND_LLM_MODEL=claude-haiku-4-5-20251001

docker compose up -d server

The next boot should log BYOK LLM provider initialised provider="anthropic" endpoint=... model=... without the LLM startup ping FAILED line.
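To check the triplet without waiting for a boot, a small sketch can report which keys are missing or empty. missing_env_keys is a hypothetical helper, not part of Hivemind, and it assumes plain KEY=value lines (no export prefix, no spaces around =, no multi-line values).

```shell
# Hypothetical helper: print each KEY that is absent or empty in an env file.
# ASSUMPTION: simple KEY=value lines; if a key appears twice, the last line wins here.
missing_env_keys() {
  file=$1; shift
  rc=0
  for key in "$@"; do
    # Everything after the first '=' is the value (values may contain '=').
    value=$(grep -E "^${key}=" "$file" | tail -n 1 | cut -d= -f2-)
    if [ -z "$value" ]; then
      echo "missing or empty: $key"
      rc=1
    fi
  done
  return "$rc"
}

# Usage:
#   missing_env_keys .env HIVEMIND_LLM_PROVIDER HIVEMIND_LLM_API_KEY HIVEMIND_LLM_MODEL
# The same call works for the keys the server requires at startup:
#   missing_env_keys .env DB_PASSWORD ADMIN_API_KEY COOKIE_SECRET JWT_SIGNING_KEY
```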

`/api/v1/health` is `ok` but `/api/v1/agents/{id}/chatalot/*` returns 404

The server is up and routes for the rest of the API work, but every /chatalot/* path 404s.

Cause: on releases before v0.1.9, the chatalot lifecycle routes are only mounted when the integration key is set. Missing key → no mount → 404 (rather than the v0.1.9+ 503).

Fix: wire the integration key as above (integration_key_missing entry) and upgrade to v0.1.9 or later, where the same condition surfaces as a clearer 503 instead of a misleading 404.
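If you're not sure which release behavior you're seeing, a sketch can map the status code and response body to the entries above. diagnose_chatalot_status and connect.json are hypothetical; the usage reuses the real connect call, so a 2xx response means it actually connected.

```shell
# Hypothetical helper: map a /chatalot/* status code (and optional body)
# to the diagnoses documented on this page. Anything else falls through.
diagnose_chatalot_status() {
  code=$1 body=${2:-}
  case "$code" in
    404) echo "route not mounted: most likely a pre-v0.1.9 server without the integration key" ;;
    503)
      case "$body" in
        *integration_key_missing*) echo "integration key not wired: set HIVEMIND_INTEGRATION_ENCRYPTION_KEY (or _FILE)" ;;
        *) echo "503 for another reason: read the response body and the server log" ;;
      esac ;;
    2??) echo "route mounted and integration key loaded" ;;
    *)   echo "status $code: read the response body and the server log" ;;
  esac
}

# Usage (connect.json is a hypothetical file holding your connect body):
#   resp=$(mktemp)
#   code=$(curl -s -o "$resp" -w '%{http_code}' -X POST -H "X-API-Key: $API_KEY" \
#     -H 'Content-Type: application/json' -d @connect.json \
#     "$HIVE/api/v1/agents/$AGENT_ID/chatalot/connect")
#   diagnose_chatalot_status "$code" "$(cat "$resp")"
```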

Chatalot integration

`POST /chatalot/connect` returns `chatalot_mint_bot_failed` / opaque "HTTP transport error"

Connecting to a chatalot instance produces:

{"error":"chatalot_mint_bot_failed",
 "detail":"HTTP transport error: ..."}

Cause: the chatalot instance uses a self-signed or private-CA TLS certificate that the hivemind-server container's system CA store doesn't trust. The connector verifies TLS by default. On v0.1.9 and later, the enriched error reads TLS/certificate error (connection failed): self-signed certificate — if the chatalot instance uses a self-signed or private-CA cert, set verify_tls=false on the integration.

Fix: include "verify_tls": false in the connect body. It is a deliberate, per-integration opt-out, and Hivemind logs it:

curl -fsS -X POST -H "X-API-Key: $API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "instance_url": "https://chat.example.internal",
    "admin_jwt": "<one-time chatalot admin JWT>",
    "bot_username": "ops-bot",
    "display_name": "Ops Bot",
    "expires_in_hours": 720,
    "verify_tls": false
  }' \
  "$HIVE/api/v1/agents/$AGENT_ID/chatalot/connect"

Hivemind logs a WARN on every client build for an integration with verify_tls=false. Disabling TLS verification removes MITM protection for that integration, so use it only against an instance you control. When available, the secure path is a publicly trusted certificate (e.g., Let's Encrypt) or a private CA added to the container's trust store.
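To see what the certificate problem actually is before opting out, openssl s_client reports a verification result. tls_verify_result is a hypothetical helper that extracts that line. Run it from a machine with openssl installed; the host's trust store may differ from the container's, so a clean result on the host does not prove the container trusts the cert.

```shell
# Hypothetical helper: extract the first "Verify return code" line from
# `openssl s_client` output read on stdin.
tls_verify_result() {
  sed -n 's/^[[:space:]]*Verify return code: //p' | head -n 1
}

# Usage (hostname taken from the connect example above):
#   openssl s_client -connect chat.example.internal:443 \
#     -servername chat.example.internal </dev/null 2>/dev/null | tls_verify_result
# Typical results (wording varies slightly across OpenSSL versions):
#   0 (ok)                                    trusted
#   18 (self-signed certificate)              self-signed leaf
#   19 (self-signed certificate in certificate chain)
#   20 (unable to get local issuer certificate) / 21 (unable to verify
#   the first certificate)                    usually an untrusted private CA
```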

`send_message` tool returns 403 / `require_webhook_manager` upstream

An agent's send_message call returns an UpstreamError{status:403,...} or chatalot_tool_failed with chatalot complaining about webhook-manager permissions.

Cause: chatalot has no plain REST message-send. The Hivemind send_message tool routes through a per-channel webhook (POST /api/webhooks/execute/{token}), which means the bot has to auto-create that webhook on the first send. Webhook creation in chatalot requires the bot to hold a community owner/admin role on the target channel's community. A bot that's only a member of the channel gets 403.

Fix: grant the bot community-admin via chatalot's bot administration API (on the chatalot instance):

curl -fsS -X POST -H "Authorization: Bearer $CHATALOT_ADMIN_JWT" \
  -H 'Content-Type: application/json' \
  -d '{"community_id":"<community-uuid>","role":"admin"}' \
  "$CHATALOT/api/admin/bots/$BOT_ID/communities"

The next send_message call mints the webhook (POST /api/channels/{id}/webhooks) cleanly and proceeds to execute it.

Routing + networking

`hivemind-server` container times out reaching an on-prem chatalot

Symptom: POST /chatalot/connect (or /test) against https://chat.your-domain times out from inside the hivemind-server container, even though curl https://chat.your-domain/api/health works from the host shell on the same machine.

Cause: the hostname resolves (via DNS) to the host's LAN IP, but container → host LAN IP traffic to :443 is dropped by the host firewall (ufw-docker and friends route container-network → host-LAN asymmetrically). The TLS path itself is fine; only the route through the host IP fails.

Fix: route the hostname directly to the reverse-proxy's container-network IP instead, by adding an extra_hosts entry in docker-compose.yml on the server service:

services:
  server:
    extra_hosts:
      - "chat.your-domain:<proxy-container-ip>"

<proxy-container-ip> is the reverse proxy's address on a docker network the hivemind-server container is also attached to (a shared overlay or bridge). The full HTTPS path then runs end-to-end inside the container network, never traverses the host LAN, and TLS / SNI stay intact.
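To find <proxy-container-ip>, docker inspect can print a container's address on one specific network. proxy_ip is a hypothetical helper, and the container and network names in the usage are placeholders for your own proxy and shared network.

```shell
# Hypothetical helper: print a container's IP address on one named network.
# Prints nothing if the container is not attached to that network.
proxy_ip() {
  container=$1 network=$2
  docker inspect -f "{{with index .NetworkSettings.Networks \"$network\"}}{{.IPAddress}}{{end}}" "$container"
}

# Usage: list the networks hivemind-server is attached to, then look up
# the proxy's address on the one they share.
#   docker inspect -f '{{range $k, $v := .NetworkSettings.Networks}}{{$k}} {{end}}' hivemind-server
#   proxy_ip <proxy-container> <shared-network>
```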

Long-term, replacing the extra_hosts entry with a docker network alias on the proxy service (so resolution happens by docker DNS) drops the hardcoded IP.

Health + verification

`docker compose ps` shows `hivemind-server` as `(unhealthy)` after install

The container is running but its healthcheck (curl /api/v1/health) fails repeatedly. /api/v1/health is what scripts/deploy.sh and the container HEALTHCHECK directive both poll.

Cause: usually one of:

- The server failed to apply its migrations (Postgres not ready yet, or wrong DATABASE_URL). The logs show a sqlx::Error near the top.
- .env is missing a required key (DB_PASSWORD, ADMIN_API_KEY, COOKIE_SECRET, JWT_SIGNING_KEY). The server panics at startup, and the logs name the missing key.
- Host port ${API_PORT:-8585} is already in use by another process. Docker can't publish the port, so the container doesn't start; docker compose up -d reports the bind error (for example, "port is already allocated").

Fix: read the actual error in the logs and act on it:

docker compose logs --tail=80 server
docker compose logs --tail=40 postgres

For missing .env keys, re-run scripts/install.sh; it is idempotent and fills in any blanks without regenerating existing secrets. For port conflicts, set API_PORT=<free-port> in .env and run docker compose up -d server.
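To confirm a port conflict, or to vet a candidate API_PORT before using it, here is a sketch assuming Linux with iproute2's ss. port_in_use is a hypothetical helper.

```shell
# Hypothetical helper: succeed if some process is listening on TCP port $1.
# ASSUMPTION: Linux with iproute2's ss; no root needed without -p.
port_in_use() {
  # ss always prints a header line; any line after it is a listening socket.
  ss -ltn "sport = :$1" | tail -n +2 | grep -q .
}

# Usage:
#   port_in_use 8585 && echo "8585 is taken; pick another API_PORT"
```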

If the healthcheck still fails after these fixes, run curl -fsS http://localhost:8585/api/v1/health from the host (substitute your API_PORT if you changed it); it should return {"status":"ok","version":"…"}. If that works but docker compose ps still reports unhealthy, the problem is the in-container curl (for example, the image is missing curl); confirm you're running the published registry.seglamater.app/seglamater/hivemind-server:<ver> image.
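To wait out a slow start instead of re-running curl by hand, a polling sketch follows. wait_for_health is a hypothetical helper, and its 30 tries at 2-second intervals are arbitrary defaults.

```shell
# Hypothetical helper: poll a health URL until it reports "status":"ok".
# Prints the final body on success; returns 1 after the last try.
wait_for_health() {
  url=${1:-http://localhost:8585/api/v1/health}
  tries=${2:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl fail on HTTP errors, so only a 2xx body is inspected.
    if body=$(curl -fsS "$url" 2>/dev/null) &&
       printf '%s\n' "$body" | grep -Eq '"status" *: *"ok"'; then
      printf '%s\n' "$body"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "not healthy after $tries tries: $url" >&2
  return 1
}

# Usage (substitute your API_PORT if you changed it):
#   wait_for_health http://localhost:8585/api/v1/health 30
```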
