ByteLyst Hermes Operations Runbook
Operational runbook for the private Telegram-driven Hermes Agent setup on the ByteLyst VM.
Current baseline
Observed on 2026-05-27:
- Hermes version: v0.14.0 (2026.5.16)
- Shared source checkout: /usr/local/lib/hermes-agent at upstream 0b6ace649 after the 2026-05-27 late upgrade pass
- Install path: /usr/local/lib/hermes-agent
- Active profile: default
- Primary provider: OpenAI Codex OAuth
- Root Telegram gateway: hermes-gateway.service, system service, enabled and running
- Uma Telegram gateway: uma-hermes-gateway.service, user service for uma, enabled and running
- Root and Uma default model: gpt-5.5, model.routing.enabled: false
- Shared local fallback chain via Ollama on demand:
  - qwen2.5-coder:1.5b
  - llama3.2:1b
  - llama3.2-vision
  - These local fallbacks are loaded on demand and answer within the gateway's retry budget on this VM; the larger 3B/7B models were observed to be too slow for the live fallback path here.
- Live Hermes session-switch proof: root and Uma both fail over from a forced primary-provider error into the local Ollama chain and return FallbackTest.
- Telegram platform-context proof: the same fallback behavior passes when Hermes runs with HERMES_PLATFORM=telegram for both root and Uma. This is platform-context proof, not a separately replayed inbound Telegram network message.
- Web backend target: Firecrawl, configured locally on root and Uma with a private API key
- Browser automation: enabled on both Hermes gateways; root was smoke-tested privately against https://example.com
- Backup cron: Sync Hermes persistent-data backup to GitHub, every 30 minutes, local delivery
- Systemd persistent backup timers: hermes-root-backup.timer and uma-hermes-backup.timer, every 10 minutes
- Watchdog cron: ByteLyst Hermes gateway/backup/disk watchdog, every 15 minutes, Telegram delivery on failure only
- Dashboard policy: do not expose Hermes dashboard/API publicly without explicit approval
- Tailscale: installed and tailscaled enabled/running; authenticated as tailnet IP 100.87.53.10
- Private dashboards:
  - Root: http://100.87.53.10:9119/, hermes-root-dashboard.service
  - Uma: http://100.87.53.10:9120/, uma-hermes-dashboard.service
  - Live ops panel shows gateway state, active sessions, refresh delta, cron state, backup freshness, sanitized alerts, and runbook links for both instances.
Safety guardrail: no public Hermes dashboard/API
Before adding any new Caddy hostname, Docker port, or dashboard/API feature, verify that it is not a Hermes dashboard/API public exposure.
# Inspect public Caddy routes and obvious Hermes/API/dashboard references.
docker ps --format '{{.Names}} {{.Ports}}' | grep -i caddy || true
grep -RniE 'hermes|dashboard|api-server|API_SERVER|8000|8080|3000|5173' /etc/caddy /root/bytelyst.ai 2>/dev/null | head -100
# Inspect listening ports. Review any 0.0.0.0 listeners before exposing a hostname.
ss -ltnp
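The listener review above can be narrowed to the sockets that matter most. The following is a minimal sketch, not part of the tracked scripts: the function name wildcard_listeners is hypothetical, and it assumes the local address is the fourth whitespace-separated column of ss -ltn output.

```shell
# Hypothetical helper: print listeners bound to every interface.
# Reads `ss -ltn` output on stdin; assumes column 4 is the local address.
wildcard_listeners() {
    awk '$1 == "LISTEN" && ($4 ~ /^0\.0\.0\.0:/ || $4 ~ /^\*:/ || $4 ~ /^\[::\]:/) { print $4 }'
}
```

Usage: ss -ltn | wildcard_listeners. Any line it prints is a socket reachable on all interfaces and is worth reviewing before a hostname is exposed.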
Allowed private access patterns for a future Hermes dashboard:
- local-only binding (127.0.0.1)
- SSH tunnel
- Tailscale/WireGuard private network
- Cloudflare Access or equivalent identity gate
- basic auth plus IP allowlist only if public routing is unavoidable and explicitly approved
Current private network access:
tailscale status
tailscale ip -4
# Expected server IPv4: 100.87.53.10
Private dashboard services:
systemctl status hermes-root-dashboard --no-pager
systemctl status uma-hermes-dashboard --no-pager
ss -ltnp | grep -E ':(9119|9120)'
# Expected listeners are Tailscale-only:
# 100.87.53.10:9119
# 100.87.53.10:9120
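The expected-listener check can also be made pass/fail. This is a sketch under the same assumption about ss -ltn column layout; port_bound_only_to is a hypothetical name, not an existing script on the VM.

```shell
# Hypothetical helper: succeed only if the port has at least one listener
# and every listener on it is bound to the expected address.
# Reads `ss -ltn` output on stdin.
port_bound_only_to() {
    expected_addr=$1
    port=$2
    awk -v want="$expected_addr:$port" -v suffix=":$port" '
        $1 == "LISTEN" && substr($4, length($4) - length(suffix) + 1) == suffix {
            seen = 1
            if ($4 != want) bad = 1
        }
        END { exit ((seen && !bad) ? 0 : 1) }
    '
}
```

Usage: ss -ltn | port_bound_only_to 100.87.53.10 9119 && echo "root dashboard is Tailscale-only". A missing listener is treated as a failure so that a stopped dashboard does not pass silently.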
Tracked service unit templates:
systemd/hermes-gateway.service
systemd/uma-hermes-gateway.service
systemd/hermes-root-dashboard.service
systemd/uma-hermes-dashboard.service
systemd/hermes-root-backup.service
systemd/hermes-root-backup.timer
systemd/uma-hermes-backup.service
systemd/uma-hermes-backup.timer
Health baseline commands
hermes --version
hermes config check
hermes doctor --fix
hermes status --all
hermes cron list
systemctl status hermes-gateway --no-pager
sudo -u uma XDG_RUNTIME_DIR=/run/user/1002 systemctl --user status uma-hermes-gateway --no-pager
df -h /
free -h
ss -ltnp
Notes:
- hermes doctor --fix migrated root and Uma configs to version 24 on 2026-05-27.
- Optional providers/search backends are mostly not configured yet. Configure through Hermes setup/auth flows only; never commit credentials.
- Local Ollama fallback models are installed on demand, not kept hot permanently. Both Hermes instances can reach the shared host service at http://127.0.0.1:11434/v1. The live fallback order is qwen2.5-coder:1.5b -> llama3.2:1b -> llama3.2-vision. gemma4 was attempted but the installed Ollama runtime rejected it, so the vision fallback is llama3.2-vision.
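To illustrate what an ordered fallback chain means in practice, here is a minimal sketch. It is not how Hermes implements fallback internally; first_responding_model and the probe command are hypothetical, and the probe is assumed to be quiet on stdout and to return success when a model answers.

```shell
# Hypothetical helper: try each model in order with a caller-supplied probe.
# Prints the first model for which the probe succeeds; fails if none does.
first_responding_model() {
    probe=$1
    shift
    for model in "$@"; do
        if "$probe" "$model"; then
            printf '%s\n' "$model"
            return 0
        fi
    done
    return 1
}
```

Usage: first_responding_model my_probe qwen2.5-coder:1.5b llama3.2:1b llama3.2-vision, where my_probe could wrap a short request against the local Ollama endpoint. Order matters: the first model to pass the probe wins, so the smallest model should be listed first.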
Gateway recovery
systemctl status hermes-gateway --no-pager
journalctl -u hermes-gateway -n 100 --no-pager
hermes gateway restart
# If the CLI restart path is unavailable:
sudo systemctl restart hermes-gateway
# Uma user gateway:
sudo -u uma XDG_RUNTIME_DIR=/run/user/1002 systemctl --user status uma-hermes-gateway --no-pager
sudo -u uma XDG_RUNTIME_DIR=/run/user/1002 journalctl --user -u uma-hermes-gateway -n 100 --no-pager
sudo -u uma XDG_RUNTIME_DIR=/run/user/1002 systemctl --user restart uma-hermes-gateway
After restart, verify from Telegram:
- inbound message receives a response
- outbound completion messages work
- approval prompts still reach the allowed user
- media/file delivery works for a known safe file if needed
Cron and watchdogs
List jobs:
hermes cron list
Current watchdog script:
~/.hermes/scripts/hermes_health_watchdog.py
Tracked source copy:
scripts/hermes-health-watchdog.py
Behavior:
- no output on success, so the cron stays silent
- sends a Telegram message only when it detects an actionable failure
- checks gateway service state, Hermes cron backup visibility/status, backup repo freshness when discoverable, and root disk usage
- also checks memory pressure plus critical Caddy/Gitea Docker containers (caddy, gitea-npm-registry)
Manual smoke test:
python3 ~/.hermes/scripts/hermes_health_watchdog.py
# Healthy output should be empty.
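The silent-on-success behavior can be illustrated with a single check. The sketch below is not the watchdog's actual code (the real script is Python); disk_alert is a hypothetical name and the 90 percent default threshold is an assumed value.

```shell
# Hypothetical check: read `df -P /` output on stdin and print an alert only
# when usage is at or above the threshold. Default of 90 is an assumption.
disk_alert() {
    threshold=${1:-90}
    awk -v limit="$threshold" 'NR == 2 {
        used = $5
        sub(/%/, "", used)
        if (used + 0 >= limit + 0) printf "root disk usage at %d%%\n", used
    }'
}
```

Usage: df -P / | disk_alert 90. Because a healthy system produces no output, a cron wrapper can forward any non-empty output as the failure message and stay quiet otherwise.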
Persistent backup timers:
systemctl status hermes-root-backup.timer uma-hermes-backup.timer --no-pager
systemctl list-timers --all --no-pager | grep 'hermes.*backup'
Backup and restore drill outline
The persistent-data backup repo intentionally excludes raw secrets and state.db.
For full VM rebuild steps, use docs/hermes-disaster-recovery.md.
For break-glass recovery of raw secrets/auth/state that are excluded from GitHub backups, use:
scripts/hermes-emergency-bundle-create.sh
scripts/hermes-emergency-bundle-decrypt.sh
scripts/hermes-emergency-bundle-upload-drive.sh
Store only the encrypted .gpg bundle in Google Drive or similar private storage. Never upload the plaintext staging directory.
Automated Drive upload:
/root/.local/share/hermes-drive-uploader-venv/bin/python scripts/hermes-google-drive-oauth-login.py
systemctl status hermes-emergency-drive-upload.timer --no-pager
systemctl start hermes-emergency-drive-upload.service
journalctl -u hermes-emergency-drive-upload.service -n 80 --no-pager
Personal Google Drive requires OAuth user credentials. A service account can see shared personal folders but cannot upload because it has no personal Drive storage quota.
General one-file Drive upload:
scripts/google-drive-upload-file.sh /path/to/file --target vijay
scripts/google-drive-upload-file.sh /path/to/file --target bheem --encrypt
The general uploader refuses sensitive-looking files by default, including .env, auth tokens, private keys, SQLite DBs, and Google credential files. Use --encrypt for private files. Use --allow-sensitive only after explicit approval.
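A refusal rule of this kind can be sketched as a file name check. The patterns below are assumptions chosen to match the categories listed above; they are not copied from scripts/google-drive-upload-file.sh, and looks_sensitive is a hypothetical name.

```shell
# Hypothetical guard: succeed when the file name matches an assumed
# sensitive pattern (env files, private keys, SQLite DBs, tokens, credentials).
looks_sensitive() {
    name=${1##*/}
    case "$name" in
        .env|.env.*|*.pem|*.key|id_rsa|id_ed25519|*.db|*.sqlite|*.sqlite3|*token*|*credentials*.json)
            return 0 ;;
        *)
            return 1 ;;
    esac
}
```

Usage: looks_sensitive "$path" && echo "refusing: use --encrypt". A name-based check is only a first line of defense; it does not inspect file contents.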
Telegram usage pattern:
Upload the file I just sent to Vijay Google Drive. Do not print file contents. Find the local attachment path, then use scripts/google-drive-upload-file.sh with --target vijay.
Quarterly restore drill:
- Run the backup sync manually or wait for a successful cron run.
- Clone the backup repo into a temporary directory.
- Inspect git contents for accidental raw secrets: git grep -nE '(API_KEY|TOKEN|SECRET|PASSWORD|BEGIN .*PRIVATE KEY)' || true
- Restore into a non-production Hermes profile/test directory only.
- Verify config, skills, sessions JSON exports, cron definitions, memories, and scripts are present.
- Confirm .env, OAuth files, SQLite WAL/SHM files, logs, caches, and raw state.db are absent.
- Delete the temporary restore directory when done.
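The absence check in the drill can be scripted. This is a minimal sketch: find_excluded_files is a hypothetical name, the file name patterns are assumptions, and OAuth files and caches are left out because their names are not stated in this runbook.

```shell
# Hypothetical check: list files under a restored directory that the backup
# is meant to exclude. Prints nothing when the restore is clean.
find_excluded_files() {
    find "$1" -type f \( \
        -name '.env' -o -name 'state.db' -o \
        -name '*-wal' -o -name '*-shm' -o -name '*.log' \
    \) -print
}
```

Usage: find_excluded_files /tmp/hermes-restore-test-root. Any path it prints should be investigated before the drill is recorded as passed.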
2026-05-27 restore rehearsal:
- Restored root backup into /tmp/hermes-restore-test-root.
- Verified portable directories/files were present: config.yaml, skills/, sessions/, cron/, memories/, and scripts.
- Verified raw state.db was absent.
- Scanned restored .env template and config.yaml for common token patterns; no hits.
Upgrade checklist
Before upgrade:
hermes --version
hermes status --all
hermes config check
hermes cron list
python3 ~/.hermes/scripts/sync_hermes_persistent_backup.py
Upgrade from an interactive/private shell only:
hermes update
After upgrade:
hermes doctor --fix
hermes gateway restart
hermes --version
hermes status --all
hermes cron list
python3 ~/.hermes/scripts/hermes_health_watchdog.py
Then run Telegram smoke tests and record any manual fixups in this doc or the roadmap.
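The before and after command lists can be run as a stop-on-first-failure sequence. The following is a sketch only; run_checklist is a hypothetical helper, not an existing Hermes command.

```shell
# Hypothetical helper: run each argument as a shell command, in order,
# and stop at the first failure so later steps never run on a broken state.
run_checklist() {
    for step in "$@"; do
        printf '==> %s\n' "$step"
        if ! sh -c "$step"; then
            printf 'FAILED: %s\n' "$step" >&2
            return 1
        fi
    done
}
```

Usage: run_checklist 'hermes --version' 'hermes status --all' 'hermes config check' 'hermes cron list'. Keeping hermes update itself outside the helper preserves the rule that upgrades run from an interactive shell.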
2026-05-27 late upgrade pass:
- Backed up root/Uma configs and service units under /root/hermes-fix-backups/20260527-roadmap-noncreds/.
- Fast-forwarded /usr/local/lib/hermes-agent to upstream 0b6ace649.
- Restarted both gateways.
- Verified provider smoke tests with exact responses root-roadmap-ok and uma-roadmap-ok.
Provider and tool changes
Use Hermes flows rather than editing secrets into git-tracked files:
hermes model
hermes setup model
hermes tools list
hermes tools enable <toolset>
hermes tools disable <toolset>
Restart/reset requirement:
- gateway config changes: /restart from Telegram or hermes gateway restart
- CLI session tool changes: start a new session or /reset
- provider auth changes: start a new session after switching models/providers
Safe local Gitea Git token flow
Root Hermes has a least-privilege local Gitea Git path for repository reads:
- token file: /root/.gitea_npm_token_home
- askpass helper: /root/.local/bin/gitea-git-askpass
- Git wrapper: /root/.local/bin/gitea-git
- default username: learning_ai_user
- local Gitea URL: http://localhost:3300
The token value must never be placed in a remote URL, shell history, Git config, docs, logs, or Hermes chat. The wrapper sets GIT_TERMINAL_PROMPT=0 and GIT_ASKPASS=/root/.local/bin/gitea-git-askpass; the askpass helper reads the token from the root-only token file only when Git prompts for a password.
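The askpass mechanism can be sketched as follows. Git invokes the GIT_ASKPASS program with the prompt text as its first argument and reads the reply from standard output. The function below is a hypothetical illustration of that contract, not the contents of /root/.local/bin/gitea-git-askpass, and its parameter order is an assumption.

```shell
# Hypothetical askpass logic: answer Git's prompt without ever placing the
# token in a URL, environment variable, or command line.
askpass_reply() {
    prompt=$1
    token_file=$2
    username=$3
    case "$prompt" in
        Username*) printf '%s\n' "$username" ;;
        Password*) cat "$token_file" ;;   # token is read only when Git asks
        *) return 1 ;;
    esac
}
```

A wrapper in this style would export GIT_TERMINAL_PROMPT=0 and GIT_ASKPASS before calling git, so a missing or unreadable token file makes the command fail instead of falling back to an interactive prompt.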
Safe read-only test:
/root/.local/bin/gitea-git ls-remote http://localhost:3300/bytelyst/learning_ai_common_plat.git HEAD
Hermes-safe prompt pattern:
Use the terminal tool only. Run exactly this read-only command and report only whether it succeeded and the first 12 characters of the HEAD hash: /root/.local/bin/gitea-git ls-remote http://localhost:3300/bytelyst/learning_ai_common_plat.git HEAD. Do not print any token, credential, environment variable, or file contents.
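The truncated-hash report requested in the prompt can also be produced locally. This is a small sketch; short_head is a hypothetical name, and it assumes ls-remote prints the hash and the ref name as two whitespace-separated columns.

```shell
# Hypothetical helper: read `git ls-remote <url> HEAD` output on stdin and
# print only the first 12 characters of the HEAD hash.
short_head() {
    awk '$2 == "HEAD" { print substr($1, 1, 12); exit }'
}
```

Usage: /root/.local/bin/gitea-git ls-remote http://localhost:3300/bytelyst/learning_ai_common_plat.git HEAD | short_head. Empty output means no HEAD line was returned, which should be treated as a failed read.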
Verification recorded on 2026-05-27:
- local Gitea version endpoint returned 1.22.6
- token file permissions are root-only
- profile-read API access returned a scope denial, confirming the token is not broad enough for user-profile reads
- direct wrapper test returned HEAD 59c4638f85be...
- Hermes one-shot test reported success with truncated HEAD 59c4638f85be
For write operations, create a separate repo-scoped token and store it in a new root-only token file. Do not reuse this read-focused token for broad automation unless the required scope is explicitly reviewed first.
GitHub credential ownership
Root Git operations already have GitHub push credentials through the root Git credential store. Root is the operator account for both:
- https://github.com/saravanakumardb/learning_ai_devops_tools.git
- https://github.com/umadev0931/uma_hostinger_hermes_vm.git
Uma does not need a separate /home/uma/.git-credentials file for the current workflow because repo maintenance and pushes are performed from root. Do not copy root GitHub credentials into Uma's home directory unless there is a concrete need for Uma-user GitHub pushes.
Remaining audit item: confirm in GitHub that the root token is fine-grained or otherwise limited to the intended repos and permissions. Do not print the token while checking this.
Telegram topics and session handling
Root and Uma currently use the standard Telegram gateway session handling. Do not enable or change topic/session behavior without a concrete routing need.
Review these before changing Telegram routing:
systemctl status hermes-gateway --no-pager
sudo -u uma XDG_RUNTIME_DIR=/run/user/1002 systemctl --user status uma-hermes-gateway --no-pager
grep -RniE 'topic|thread|TELEGRAM_.*THREAD|HOME_CHANNEL' /root/.hermes /home/uma/.hermes 2>/dev/null | head -100
Multi-agent execution conventions
Use the smallest execution surface that fits the task:
- direct tool call: one-shot local checks, edits, commits, pushes, status reads
- delegate_task: bounded research or code inspection that can return inside the parent session
- spawned Hermes/tmux session: long-running mission that must outlive the parent turn
- background terminal process: long-running local commands that need monitoring
- cron job: recurring, deterministic, silent-on-success maintenance
- worktree: independent coding agent branch space when tasks can overlap
- Kanban worker: durable multi-agent project coordination after the board is intentionally configured
Telegram progress/completion updates should keep the user's numbered-prefix convention (1, 2, etc. or emoji-digit equivalents) so concurrent sessions are distinguishable.
Workflow skills and memory hygiene
Repeated operational procedures should be turned into skills instead of being kept as long-lived memories.
Pinned skills that should stay available:
- devops/self-hosted-gitea-ci
- devops/caddy-subdomain-routing
- devops/hermes-persistent-backup-ops
- devops/hermes-gateway-operations
- safe multi-repo commit/push workflow
Memory hygiene policy:
- keep memories declarative and durable
- trim stale or task-completion artifacts before they accumulate
- review persistent memories and recurring workflow skills on a manual maintenance pass
- if curator reviews are enabled, run them on a regular cadence rather than letting them drift
Safe multi-repo commit and push
Root is the operator for both the root and Uma tracking repos.
Safe sequence:
- Work in the target repo only.
- Run the repo's tests or checks before committing.
- Commit the smallest coherent change.
- Push from root using the already-approved GitHub credential path.
- Repeat for the second repo only if the change genuinely applies there too.
Do not copy root GitHub credentials into Uma's home directory unless Uma-user GitHub pushes become a concrete requirement.