# ByteLyst Hermes Operations Runbook

Operational runbook for the private Telegram-driven Hermes Agent setup on the ByteLyst VM.

## Current baseline

Observed on 2026-05-27:

- Hermes version: `v0.14.0 (2026.5.16)`
- Shared source checkout: `/usr/local/lib/hermes-agent` at upstream `0b6ace649` after the 2026-05-27 late upgrade pass
- Install path: `/usr/local/lib/hermes-agent`
- Active profile: `default`
- Primary provider: OpenAI Codex OAuth
- Root Telegram gateway: `hermes-gateway.service`, system service, enabled and running
- Uma Telegram gateway: `uma-hermes-gateway.service`, user service for `uma`, enabled and running
- Root and Uma default model: `gpt-5.5`, `model.routing.enabled: false`
- Backup cron: `Sync Hermes persistent-data backup to GitHub`, every 30 minutes, local delivery
- Watchdog cron: `ByteLyst Hermes gateway/backup/disk watchdog`, every 15 minutes, Telegram delivery on failure only
- Dashboard policy: do not expose Hermes dashboard/API publicly without explicit approval
- Tailscale: installed and `tailscaled` enabled/running; authenticated as tailnet IP `100.87.53.10`
- Private dashboards:
  - Root: `http://100.87.53.10:9119/`, `hermes-root-dashboard.service`
  - Uma: `http://100.87.53.10:9120/`, `uma-hermes-dashboard.service`

## Safety guardrail: no public Hermes dashboard/API

Before adding any new Caddy hostname, Docker port, or dashboard/API feature, verify that it does not expose the Hermes dashboard/API publicly.

```bash
# Inspect public Caddy routes and obvious Hermes/API/dashboard references.
docker ps --format '{{.Names}} {{.Ports}}' | grep -i caddy || true
grep -RniE 'hermes|dashboard|api-server|API_SERVER|8000|8080|3000|5173' /etc/caddy /root/bytelyst.ai 2>/dev/null | head -100

# Inspect listening ports. Review any 0.0.0.0 listeners before exposing a hostname.
ss -ltnp
```

Allowed private access patterns for a future Hermes dashboard:

1. local-only binding (`127.0.0.1`)
2. SSH tunnel
3. Tailscale/WireGuard private network
4. Cloudflare Access or equivalent identity gate
5. basic auth plus IP allowlist only if public routing is unavoidable and explicitly approved

Current private network access:

```bash
tailscale status
tailscale ip -4
# Expected server IPv4: 100.87.53.10
```

Private dashboard services:

```bash
systemctl status hermes-root-dashboard --no-pager
systemctl status uma-hermes-dashboard --no-pager
ss -ltnp | grep -E ':(9119|9120)'
# Expected listeners are Tailscale-only:
# 100.87.53.10:9119
# 100.87.53.10:9120
```

Tracked service unit templates:

```bash
systemd/hermes-root-dashboard.service
systemd/uma-hermes-dashboard.service
```

## Health baseline commands

```bash
hermes --version
hermes config check
hermes doctor --fix
hermes status --all
hermes cron list
systemctl status hermes-gateway --no-pager
sudo -u uma XDG_RUNTIME_DIR=/run/user/1002 systemctl --user status uma-hermes-gateway --no-pager
df -h /
free -h
ss -ltnp
```

Notes:

- `hermes doctor --fix` migrated root and Uma configs to version `24` on 2026-05-27.
- Optional providers/search backends are mostly not configured yet. Configure through Hermes setup/auth flows only; never commit credentials.
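
The `ss -ltnp` review above can be narrowed to the listeners that matter for the no-public-exposure guardrail. The following is a minimal sketch, not part of the tracked scripts: `wildcard_listeners` is a hypothetical helper name, and it assumes the usual `ss -ltn` column layout in which the fourth field is the local address and port. A wildcard bind is only a prompt for review; whether it is reachable from outside still depends on firewall and routing.

```shell
# Hypothetical helper (not in the tracked scripts).
# Reads `ss -ltn` style output on stdin and prints the local address of every
# TCP listener bound to all interfaces: 0.0.0.0, [::], or *.
# Assumption: field 4 is "Local Address:Port". The header row is skipped
# because its first field is not "LISTEN".
wildcard_listeners() {
  awk '$1 == "LISTEN" && $4 ~ /^(0\.0\.0\.0|\[::\]|\*):/ { print $4 }'
}

# Usage on the VM:
#   ss -ltn | wildcard_listeners
```

Empty output means every listener is bound to a specific address such as `127.0.0.1` or the tailnet IP. Any printed line should be checked against the allowed access patterns before a hostname or port is exposed.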
## Gateway recovery

```bash
systemctl status hermes-gateway --no-pager
journalctl -u hermes-gateway -n 100 --no-pager
hermes gateway restart

# If the CLI restart path is unavailable:
sudo systemctl restart hermes-gateway

# Uma user gateway:
sudo -u uma XDG_RUNTIME_DIR=/run/user/1002 systemctl --user status uma-hermes-gateway --no-pager
sudo -u uma XDG_RUNTIME_DIR=/run/user/1002 journalctl --user -u uma-hermes-gateway -n 100 --no-pager
sudo -u uma XDG_RUNTIME_DIR=/run/user/1002 systemctl --user restart uma-hermes-gateway
```

After restart, verify from Telegram:

- inbound message receives a response
- outbound completion messages work
- approval prompts still reach the allowed user
- media/file delivery works for a known safe file if needed

## Cron and watchdogs

List jobs:

```bash
hermes cron list
```

Current watchdog script:

```bash
~/.hermes/scripts/hermes_health_watchdog.py
```

Tracked source copy:

```bash
scripts/hermes-health-watchdog.py
```

Behavior:

- no output on success, so the cron stays silent
- sends a Telegram message only when it detects an actionable failure
- checks gateway service state, Hermes cron backup visibility/status, backup repo freshness when discoverable, and root disk usage
- also checks memory pressure plus critical Caddy/Gitea Docker containers (`caddy`, `gitea-npm-registry`)

Manual smoke test:

```bash
python3 ~/.hermes/scripts/hermes_health_watchdog.py
# Healthy output should be empty.
```

## Backup and restore drill outline

The persistent-data backup repo intentionally excludes raw secrets and `state.db`.

Quarterly restore drill:

1. Run the backup sync manually or wait for a successful cron run.
2. Clone the backup repo into a temporary directory.
3. Inspect git contents for accidental raw secrets:

   ```bash
   git grep -nE '(API_KEY|TOKEN|SECRET|PASSWORD|BEGIN .*PRIVATE KEY)' || true
   ```

4. Restore into a non-production Hermes profile/test directory only.
5. Verify config, skills, sessions JSON exports, cron definitions, memories, and scripts are present.
6. Confirm `.env`, OAuth files, SQLite WAL/SHM files, logs, caches, and raw `state.db` are absent.
7. Delete the temporary restore directory when done.

2026-05-27 restore rehearsal:

- Restored root backup into `/tmp/hermes-restore-test-root`.
- Verified portable directories/files were present: `config.yaml`, `skills/`, `sessions/`, `cron/`, `memories/`, and scripts.
- Verified raw `state.db` was absent.
- Scanned restored `.env` template and `config.yaml` for common token patterns; no hits.

## Upgrade checklist

Before upgrade:

```bash
hermes --version
hermes status --all
hermes config check
hermes cron list
python3 ~/.hermes/scripts/sync_hermes_persistent_backup.py
```

Upgrade from an interactive/private shell only:

```bash
hermes update
```

After upgrade:

```bash
hermes doctor --fix
hermes gateway restart
hermes --version
hermes status --all
hermes cron list
python3 ~/.hermes/scripts/hermes_health_watchdog.py
```

Then run Telegram smoke tests and record any manual fixups in this doc or the roadmap.

2026-05-27 late upgrade pass:

- Backed up root/Uma configs and service units under `/root/hermes-fix-backups/20260527-roadmap-noncreds/`.
- Fast-forwarded `/usr/local/lib/hermes-agent` to upstream `0b6ace649`.
- Restarted both gateways.
- Verified provider smoke tests with exact responses `root-roadmap-ok` and `uma-roadmap-ok`.
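
Step 6 of the restore drill above (confirming that excluded files are absent) can be scripted against the temporary restore directory. This is a rough sketch under stated assumptions: `find_forbidden_restore_files` is a hypothetical helper, and its name patterns cover only `.env`, `state.db`, SQLite `-wal`/`-shm` sidecar files, and `*.log`. OAuth and cache file names are not specified in this runbook, so they are not matched and still need a manual check.

```shell
# Hypothetical helper (not in the tracked scripts).
# Lists files under a restored directory that the backup is meant to exclude.
# The patterns are assumptions drawn from step 6 of the restore drill; extend
# them to match the real exclusion list (OAuth and cache names are unknown here).
find_forbidden_restore_files() {
  restore_dir="$1"
  find "$restore_dir" -type f \( \
      -name '.env' -o -name 'state.db' \
      -o -name '*-wal' -o -name '*-shm' \
      -o -name '*.log' \
    \) -print | LC_ALL=C sort
}

# Usage against the rehearsal directory:
#   find_forbidden_restore_files /tmp/hermes-restore-test-root
```

Empty output means none of the listed patterns were found. A file such as `.env.template` is deliberately not matched, because the pattern for `.env` is an exact name.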
## Provider and tool changes

Use Hermes flows rather than editing secrets into git-tracked files:

```bash
hermes model
hermes setup model
hermes tools list
hermes tools enable
hermes tools disable
```

Restart/reset requirement:

- gateway config changes: `/restart` from Telegram or `hermes gateway restart`
- CLI session tool changes: start a new session or `/reset`
- provider auth changes: start a new session after switching models/providers

## Safe local Gitea Git token flow

Root Hermes has a least-privilege local Gitea Git path for repository reads:

- token file: `/root/.gitea_npm_token_home`
- askpass helper: `/root/.local/bin/gitea-git-askpass`
- Git wrapper: `/root/.local/bin/gitea-git`
- default username: `learning_ai_user`
- local Gitea URL: `http://localhost:3300`

The token value must never be placed in a remote URL, shell history, Git config, docs, logs, or Hermes chat. The wrapper sets `GIT_TERMINAL_PROMPT=0` and `GIT_ASKPASS=/root/.local/bin/gitea-git-askpass`; the askpass helper reads the token from the root-only token file only when Git prompts for a password.

Safe read-only test:

```bash
/root/.local/bin/gitea-git ls-remote http://localhost:3300/bytelyst/learning_ai_common_plat.git HEAD
```

Hermes-safe prompt pattern:

```text
Use the terminal tool only. Run exactly this read-only command and report only whether it succeeded and the first 12 characters of the HEAD hash: /root/.local/bin/gitea-git ls-remote http://localhost:3300/bytelyst/learning_ai_common_plat.git HEAD. Do not print any token, credential, environment variable, or file contents.
```

Verification recorded on 2026-05-27:

- local Gitea version endpoint returned `1.22.6`
- token file permissions are root-only
- profile-read API access returned a scope denial, confirming the token is not broad enough for user-profile reads
- direct wrapper test returned HEAD `59c4638f85be...`
- Hermes one-shot test reported success with truncated HEAD `59c4638f85be`

For write operations, create a separate repo-scoped token and store it in a new root-only token file. Do not reuse this read-focused token for broad automation unless the required scope is explicitly reviewed first.

## Telegram topics and session handling

Root and Uma currently use the standard Telegram gateway session handling. Do not enable or change topic/session behavior without a concrete routing need.

Review these before changing Telegram routing:

```bash
systemctl status hermes-gateway --no-pager
sudo -u uma XDG_RUNTIME_DIR=/run/user/1002 systemctl --user status uma-hermes-gateway --no-pager
grep -RniE 'topic|thread|TELEGRAM_.*THREAD|HOME_CHANNEL' /root/.hermes /home/uma/.hermes 2>/dev/null | head -100
```

## Multi-agent execution conventions

Use the smallest execution surface that fits the task:

- direct tool call: one-shot local checks, edits, commits, pushes, status reads
- `delegate_task`: bounded research or code inspection that can return inside the parent session
- background terminal process: long-running local commands that need monitoring
- cron job: recurring, deterministic, silent-on-success maintenance
- Kanban worker: durable multi-agent project coordination after the board is intentionally configured

Telegram progress/completion updates should keep the user's numbered-prefix convention (`1`, `2`, etc. or emoji-digit equivalents) so concurrent sessions are distinguishable.