ByteLyst Hermes Operations Runbook
Operational runbook for the private Telegram-driven Hermes Agent setup on the ByteLyst VM.
Current baseline
Observed on 2026-05-27:
- Hermes version: v0.14.0 (2026.5.16)
- Shared source checkout: /usr/local/lib/hermes-agent at upstream 0b6ace649 after the 2026-05-27 late upgrade pass
- Install path: /usr/local/lib/hermes-agent
- Active profile: default
- Primary provider: OpenAI Codex OAuth
- Root Telegram gateway: hermes-gateway.service, system service, enabled and running
- Uma Telegram gateway: uma-hermes-gateway.service, user service for uma, enabled and running
- Root and Uma default model: gpt-5.5, model.routing.enabled: false
- Backup cron: Sync Hermes persistent-data backup to GitHub, every 30 minutes, local delivery
- Watchdog cron: ByteLyst Hermes gateway/backup/disk watchdog, every 15 minutes, Telegram delivery on failure only
- Dashboard policy: do not expose Hermes dashboard/API publicly without explicit approval
- Tailscale: installed and tailscaled enabled/running; authenticated as tailnet IP 100.87.53.10
- Private dashboards:
  - Root: http://100.87.53.10:9119/, hermes-root-dashboard.service
  - Uma: http://100.87.53.10:9120/, uma-hermes-dashboard.service
Safety guardrail: no public Hermes dashboard/API
Before adding any new Caddy hostname, Docker port, or dashboard/API feature, verify that it does not expose the Hermes dashboard or API publicly.
# Inspect public Caddy routes and obvious Hermes/API/dashboard references.
docker ps --format '{{.Names}} {{.Ports}}' | grep -i caddy || true
grep -RniE 'hermes|dashboard|api-server|API_SERVER|8000|8080|3000|5173' /etc/caddy /root/bytelyst.ai 2>/dev/null | head -100
# Inspect listening ports. Review any 0.0.0.0 listeners before exposing a hostname.
ss -ltnp
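The listener review above can be narrowed to the sockets that matter. The following is a minimal sketch, not part of the tracked scripts: wildcard_listeners is a hypothetical helper name, and it assumes the usual ss column layout in which the fourth field is the local address.

```shell
# Print listeners that are bound to all interfaces (0.0.0.0, *, or [::]).
# Reads `ss -ltn`-style text from stdin so it can be run on captured output.
wildcard_listeners() {
  awk '$1 == "LISTEN" {
    addr = $4                      # local address:port column
    if (addr ~ /^(0\.0\.0\.0|\*|\[::\]):/) print addr
  }'
}

# Usage on the VM:
#   ss -ltn | wildcard_listeners
```

An empty result means no wildcard listeners were found. Any line that is printed still needs a manual check against the Caddy and Docker configuration before a hostname is exposed.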
Allowed private access patterns for a future Hermes dashboard:
- local-only binding (127.0.0.1)
- SSH tunnel
- Tailscale/WireGuard private network
- Cloudflare Access or equivalent identity gate
- basic auth plus IP allowlist only if public routing is unavoidable and explicitly approved
Current private network access:
tailscale status
tailscale ip -4
# Expected server IPv4: 100.87.53.10
Private dashboard services:
systemctl status hermes-root-dashboard --no-pager
systemctl status uma-hermes-dashboard --no-pager
ss -ltnp | grep -E ':(9119|9120)'
# Expected listeners are Tailscale-only:
# 100.87.53.10:9119
# 100.87.53.10:9120
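The expected-listener check above can be turned into a pass/fail test. This is a sketch under the same assumption about ss column layout; port_bound_only_to is a hypothetical helper name, not an existing script on the VM.

```shell
# Succeed only if the port has at least one listener and every listener on it
# is bound to the expected address. Reads `ss -ltn`-style text from stdin.
port_bound_only_to() {
  expected_ip="$1"
  port="$2"
  awk -v want="${expected_ip}:${port}" -v suffix=":${port}" '
    $1 == "LISTEN" && substr($4, length($4) - length(suffix) + 1) == suffix {
      seen = 1                     # something listens on this port
      if ($4 != want) bad = 1      # ...but on an unexpected address
    }
    END { exit ((seen && !bad) ? 0 : 1) }
  '
}

# Usage on the VM:
#   ss -ltn | port_bound_only_to 100.87.53.10 9119 || echo "check port 9119"
#   ss -ltn | port_bound_only_to 100.87.53.10 9120 || echo "check port 9120"
```

The helper fails both when the dashboard is not listening at all and when it is bound somewhere other than the Tailscale address, so a non-zero exit always warrants a look at the service unit.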
Tracked service unit templates:
systemd/hermes-root-dashboard.service
systemd/uma-hermes-dashboard.service
Health baseline commands
hermes --version
hermes config check
hermes doctor --fix
hermes status --all
hermes cron list
systemctl status hermes-gateway --no-pager
sudo -u uma XDG_RUNTIME_DIR=/run/user/1002 systemctl --user status uma-hermes-gateway --no-pager
df -h /
free -h
ss -ltnp
Notes:
- hermes doctor --fix migrated root and Uma configs to version 24 on 2026-05-27.
- Optional providers/search backends are mostly not configured yet. Configure through Hermes setup/auth flows only; never commit credentials.
Gateway recovery
systemctl status hermes-gateway --no-pager
journalctl -u hermes-gateway -n 100 --no-pager
hermes gateway restart
# If the CLI restart path is unavailable:
sudo systemctl restart hermes-gateway
# Uma user gateway:
sudo -u uma XDG_RUNTIME_DIR=/run/user/1002 systemctl --user status uma-hermes-gateway --no-pager
sudo -u uma XDG_RUNTIME_DIR=/run/user/1002 journalctl --user -u uma-hermes-gateway -n 100 --no-pager
sudo -u uma XDG_RUNTIME_DIR=/run/user/1002 systemctl --user restart uma-hermes-gateway
After restart, verify from Telegram:
- inbound message receives a response
- outbound completion messages work
- approval prompts still reach the allowed user
- media/file delivery works for a known safe file if needed
Cron and watchdogs
List jobs:
hermes cron list
Current watchdog script:
~/.hermes/scripts/hermes_health_watchdog.py
Tracked source copy:
scripts/hermes-health-watchdog.py
Behavior:
- no output on success, so the cron stays silent
- sends a Telegram message only when it detects an actionable failure
- checks gateway service state, Hermes cron backup visibility/status, backup repo freshness when discoverable, and root disk usage
- also checks memory pressure plus critical Caddy/Gitea Docker containers (caddy, gitea-npm-registry)
Manual smoke test:
python3 ~/.hermes/scripts/hermes_health_watchdog.py
# Healthy output should be empty.
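The silent-on-success behavior described above can be illustrated with a single check. This is a shell sketch for illustration only: the real watchdog is the Python script named above, and the 90% limit, DISK_LIMIT_PERCENT, and both function names are assumptions rather than values taken from it.

```shell
# Hypothetical threshold; the real watchdog's limit is not documented here.
DISK_LIMIT_PERCENT="${DISK_LIMIT_PERCENT:-90}"

# Print a message and return non-zero only when usage exceeds the limit.
# Prints nothing on success, so a cron wrapper stays silent.
check_disk_percent() {
  used="$1"
  limit="${2:-$DISK_LIMIT_PERCENT}"
  if [ "$used" -gt "$limit" ]; then
    echo "root disk usage ${used}% exceeds ${limit}%"
    return 1
  fi
  return 0
}

# Read root filesystem usage as a bare integer from POSIX-format df output.
root_disk_percent() {
  df -P / | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

# Usage on the VM:
#   check_disk_percent "$(root_disk_percent)"
```

Keeping the threshold comparison separate from the df call makes the check easy to exercise with fixed numbers before trusting it in a scheduled job.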
Backup and restore drill outline
The persistent-data backup repo intentionally excludes raw secrets and state.db.
Quarterly restore drill:
- Run the backup sync manually or wait for a successful cron run.
- Clone the backup repo into a temporary directory.
- Inspect git contents for accidental raw secrets:
  git grep -nE '(API_KEY|TOKEN|SECRET|PASSWORD|BEGIN .*PRIVATE KEY)' || true
- Restore into a non-production Hermes profile/test directory only.
- Verify config, skills, sessions JSON exports, cron definitions, memories, and scripts are present.
- Confirm .env, OAuth files, SQLite WAL/SHM files, logs, caches, and raw state.db are absent.
- Delete the temporary restore directory when done.
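The absence check in the drill above could be scripted along these lines. This is a sketch with assumptions: find_excluded_files is a hypothetical helper, the file-name patterns are guesses at common names, and OAuth files and cache directories are not covered because the runbook does not name them.

```shell
# List files in a restored tree that the backup is meant to exclude.
# Prints nothing and returns 0 when none of the patterns match.
find_excluded_files() {
  restore_dir="$1"
  found="$(find "$restore_dir" -type f \( \
      -name '.env' -o -name 'state.db' \
      -o -name '*.db-wal' -o -name '*.db-shm' \
      -o -name '*.log' \) 2>/dev/null)"
  if [ -n "$found" ]; then
    printf '%s\n' "$found"
    return 1
  fi
}

# Usage against a temporary restore directory:
#   find_excluded_files /tmp/hermes-restore-test-root
```

The exact-name match on .env is deliberate so that a template file with a longer name is not flagged; such a template should still be scanned for token patterns with the git grep step.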
2026-05-27 restore rehearsal:
- Restored root backup into /tmp/hermes-restore-test-root.
- Verified portable directories/files were present: config.yaml, skills/, sessions/, cron/, memories/, and scripts.
- Verified raw state.db was absent.
- Scanned restored .env template and config.yaml for common token patterns; no hits.
Upgrade checklist
Before upgrade:
hermes --version
hermes status --all
hermes config check
hermes cron list
python3 ~/.hermes/scripts/sync_hermes_persistent_backup.py
Upgrade from an interactive/private shell only:
hermes update
After upgrade:
hermes doctor --fix
hermes gateway restart
hermes --version
hermes status --all
hermes cron list
python3 ~/.hermes/scripts/hermes_health_watchdog.py
Then run Telegram smoke tests and record any manual fixups in this doc or the roadmap.
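To make the before/after comparison in the checklist above easier to record, the command output can be saved to files and diffed. This is a sketch only: snapshot_hermes, cron_unchanged, and the HERMES_BIN override are hypothetical names, and it uses only the hermes --version and hermes cron list commands already shown in this runbook.

```shell
# Overridable so the helpers can be dry-run against a stub binary.
HERMES_BIN="${HERMES_BIN:-hermes}"

# Save version and cron listing into the given directory.
snapshot_hermes() {
  out_dir="$1"
  mkdir -p "$out_dir"
  "$HERMES_BIN" --version > "$out_dir/version.txt" 2>&1
  "$HERMES_BIN" cron list > "$out_dir/cron.txt" 2>&1
}

# Show a unified diff of the cron listings; returns 0 when they are identical.
cron_unchanged() {
  diff -u "$1/cron.txt" "$2/cron.txt"
}

# Usage around an upgrade:
#   snapshot_hermes /tmp/hermes-before
#   hermes update
#   snapshot_hermes /tmp/hermes-after
#   cron_unchanged /tmp/hermes-before /tmp/hermes-after
```

If the cron listing includes run times or other changing fields, the diff will not be empty even when the job definitions are unchanged, so treat a difference as a prompt to read the output rather than as a failure.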
2026-05-27 late upgrade pass:
- Backed up root/Uma configs and service units under /root/hermes-fix-backups/20260527-roadmap-noncreds/.
- Fast-forwarded /usr/local/lib/hermes-agent to upstream 0b6ace649.
- Restarted both gateways.
- Verified provider smoke tests with exact responses root-roadmap-ok and uma-roadmap-ok.
Provider and tool changes
Use Hermes flows rather than editing secrets into git-tracked files:
hermes model
hermes setup model
hermes tools list
hermes tools enable <toolset>
hermes tools disable <toolset>
Restart/reset requirement:
- gateway config changes: /restart from Telegram or hermes gateway restart
- CLI session tool changes: start a new session or /reset
- provider auth changes: start a new session after switching models/providers
Telegram topics and session handling
Root and Uma currently use the standard Telegram gateway session handling. Do not enable or change topic/session behavior without a concrete routing need.
Review these before changing Telegram routing:
systemctl status hermes-gateway --no-pager
sudo -u uma XDG_RUNTIME_DIR=/run/user/1002 systemctl --user status uma-hermes-gateway --no-pager
grep -RniE 'topic|thread|TELEGRAM_.*THREAD|HOME_CHANNEL' /root/.hermes /home/uma/.hermes 2>/dev/null | head -100
Multi-agent execution conventions
Use the smallest execution surface that fits the task:
- direct tool call: one-shot local checks, edits, commits, pushes, status reads
- delegate_task: bounded research or code inspection that can return inside the parent session
- background terminal process: long-running local commands that need monitoring
- cron job: recurring, deterministic, silent-on-success maintenance
- Kanban worker: durable multi-agent project coordination after the board is intentionally configured
Telegram progress/completion updates should keep the user's numbered-prefix convention (1, 2, etc. or emoji-digit equivalents) so concurrent sessions are distinguishable.