# Hermes Disaster Recovery Runbook

Goal: rebuild the ByteLyst root Hermes and Uma/Bheem Hermes setup on a new VM quickly, with durable memory, sessions, cron definitions, skills, scripts, and dashboard/service configuration restored from GitHub-backed artifacts.

Last verified: 2026-05-27.

## Current Recovery Confidence

**High for durable Hermes state.** Both root and Uma now have sanitized `.hermes` persistent backups pushed to GitHub and recurring systemd backup timers.

What is recoverable:

- root Hermes config, memories, skills, sessions JSON exports, cron definitions, scripts, channel directory, gateway state, SOUL, and Kanban DB
- Uma Hermes config, memories, skills, sessions JSON exports, cron definitions, scripts, channel directory, gateway state, SOUL, and Kanban DB
- root and Uma gateway systemd unit definitions
- root and Uma private dashboard systemd unit definitions
- root and Uma backup timer systemd unit definitions
- Uma wrapper/memory/docs repo content
- root operational docs and rebuild knowledge in this repo

What still requires operator-provided credentials or re-authentication:

- GitHub token or credentials for clone/push if the new VM does not already have them
- OpenAI Codex OAuth/provider login, unless restored from an encrypted emergency bundle
- Telegram bot/user credentials, unless restored from an encrypted emergency bundle
- Tailscale login for the new machine, unless restoring Tailscale state is explicitly chosen
- any optional provider/search/browser API keys

What is intentionally not restored from git:

- raw `.env` secret values
- Hermes `auth.json`
- raw `state.db`, SQLite WAL/SHM files, logs, cache directories, sandboxes, locks, and PIDs
- live OS processes or in-flight terminal commands that were running at the exact moment the VM was lost

Expected data-loss window:

- durable backups run every 10 minutes through systemd timers
- latest in-memory/live process activity since the last backup may need manual reconstruction from Telegram/GitHub context

## Backup Sources

| Instance | GitHub repo | Backup path | Recurring sync |
| --- | --- | --- | --- |
| root/vijay | `https://github.com/saravanakumardb/bytelyst_hostinger_hermes_vm.git` | `hermes_persistent_backup/` | `hermes-root-backup.timer` every 10 minutes |
| Uma/bheem | `https://github.com/umadev0931/uma_hostinger_hermes_vm.git` | `hermes_persistent_backup/` | `uma-hermes-backup.timer` every 10 minutes |
| ops docs | `https://github.com/saravanakumardb/learning_ai_devops_tools.git` | `docs/`, `systemd/`, `scripts/` | pushed manually after changes |

## Encrypted Emergency Bundle

Normal GitHub backups are sanitized and intentionally exclude raw secrets, auth state, and raw `state.db`. For faster break-glass recovery, create a separate encrypted bundle and store the encrypted `.gpg` file in Google Drive or another private location.

Create bundle on the old/current VM:

```bash
/root/repos/learning_ai_devops_tools/scripts/hermes-emergency-bundle-create.sh
```

The script creates:

```text
/root/hermes-emergency-bundles/hermes-emergency-bundle--.tar.zst.gpg
```

It includes an allow-list only:

- `/root/.hermes/.env`, `auth.json`, `state.db*`
- `/home/uma/.hermes/.env`, `auth.json`, `state.db*`
- `/root/.git-credentials`
- `/root/.gitea_admin_password`, `/root/.gitea_npm_token`, `/root/.gitea_npm_token_home`
- `/var/lib/tailscale/tailscaled.state`

It does not include logs, caches, locks, PIDs, or sandboxes.

Decrypt on a new VM into staging only:

```bash
/root/repos/learning_ai_devops_tools/scripts/hermes-emergency-bundle-decrypt.sh \
  /path/to/hermes-emergency-bundle.tar.zst.gpg
```

The decrypt script extracts to `/root/hermes-emergency-restore-staging/...` by default. It does not overwrite live `.hermes` or credential files. Inspect the staging directory first, then manually copy only the files needed for the recovery.
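
The inspect-then-copy step for the staging directory could look roughly like the sketch below. The helper names `list_staged` and `restore_one` are hypothetical and are not part of the bundle scripts, and the directory layout inside staging is an assumption to confirm by listing it first. The helpers print paths only, never file contents, and refuse to overwrite a file that already exists.

```bash
# Hypothetical helpers; not part of the hermes-emergency-bundle scripts.

# List staged files as paths relative to the staging root (contents are never printed).
list_staged() {
  local staging_dir="$1"
  [ -d "$staging_dir" ] || { echo "missing staging dir: $staging_dir" >&2; return 1; }
  (cd "$staging_dir" && find . -type f | sort)
}

# Copy one staged file into place with owner-only permissions (0600).
# Refuses to overwrite an existing destination; the destination directory must exist.
restore_one() {
  local src="$1" dest="$2"
  [ -f "$src" ] || { echo "not staged: $src" >&2; return 1; }
  [ ! -e "$dest" ] || { echo "refusing to overwrite: $dest" >&2; return 1; }
  install -m 600 "$src" "$dest"
}
```

A possible use, assuming the staged tree mirrors the original absolute paths (verify with the listing before copying anything): run `list_staged /root/hermes-emergency-restore-staging`, then call `restore_one` once per file that the recovery actually needs.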

For unattended operation, both scripts support:

```bash
export BUNDLE_PASSPHRASE_FILE=/root/path/to/passphrase-file
```

Keep the passphrase outside GitHub and outside the encrypted bundle.

Latest verified commits on 2026-05-27:

- root persistent backup: `d286a03`
- Uma persistent backup: `bbad574`
- ops docs/systemd templates: update after this runbook commit

## Fast Rebuild Order

### 1. Prepare Base VM

Install the minimum system packages:

```bash
apt-get update
apt-get install -y git curl rsync python3 python3-venv nodejs npm systemd
```

Create Uma if missing:

```bash
id uma || useradd -m -s /bin/bash uma
loginctl enable-linger uma
```

### 2. Restore Git Access

Root is the operator for both root and Uma repo pushes. Restore GitHub credentials for root without printing them:

```bash
git config --global credential.helper store
chmod 700 /root
# Create /root/.git-credentials from the external secret source.
chmod 600 /root/.git-credentials
```

Then clone the three recovery repos:

```bash
mkdir -p /root/repos /home/uma/repos
git clone https://github.com/saravanakumardb/learning_ai_devops_tools.git /root/repos/learning_ai_devops_tools
git clone https://github.com/saravanakumardb/bytelyst_hostinger_hermes_vm.git /root/repos/bytelyst_hostinger_hermes_vm
git clone https://github.com/umadev0931/uma_hostinger_hermes_vm.git /home/uma/repos/uma_hostinger_hermes_vm
chown -R uma:uma /home/uma/repos
```

### 3. Install Hermes Source

Use the official Hermes source and the same shared install path:

```bash
mkdir -p /usr/local/lib
git clone https://github.com/NousResearch/hermes-agent.git /usr/local/lib/hermes-agent
cd /usr/local/lib/hermes-agent
python3 -m venv venv
./venv/bin/pip install -e .
```

If the repo provides a setup/update script in the future, prefer the official upstream instructions, then verify:

```bash
/usr/local/lib/hermes-agent/venv/bin/hermes --version
```

### 4. Restore Root Hermes Persistent Data

```bash
HERMES_HOME=/root/.hermes \
  /root/repos/bytelyst_hostinger_hermes_vm/restore_hermes_persistent_data.sh \
  /root/repos/bytelyst_hostinger_hermes_vm/hermes_persistent_backup
```

Re-enter secrets from the external source into `/root/.hermes/.env` or via Hermes auth flows. Do not copy secrets from docs or chat.

Verify:

```bash
HERMES_HOME=/root/.hermes /usr/local/lib/hermes-agent/venv/bin/hermes doctor --fix
HERMES_HOME=/root/.hermes /usr/local/lib/hermes-agent/venv/bin/hermes cron list
```

### 5. Restore Uma Hermes Persistent Data

```bash
mkdir -p /home/uma/.hermes
HERMES_HOME=/home/uma/.hermes \
  /home/uma/repos/uma_hostinger_hermes_vm/restore_hermes_persistent_data.sh \
  /home/uma/repos/uma_hostinger_hermes_vm/hermes_persistent_backup
chown -R uma:uma /home/uma/.hermes
```

Re-enter Uma secrets from the external source into `/home/uma/.hermes/.env` or via Hermes auth flows.

Verify:

```bash
sudo -u uma HERMES_HOME=/home/uma/.hermes /usr/local/lib/hermes-agent/venv/bin/hermes doctor --fix
sudo -u uma HERMES_HOME=/home/uma/.hermes /usr/local/lib/hermes-agent/venv/bin/hermes cron list
```

### 6. Reinstall Systemd Units

```bash
cp /root/repos/learning_ai_devops_tools/systemd/hermes-gateway.service /etc/systemd/system/hermes-gateway.service
cp /root/repos/learning_ai_devops_tools/systemd/hermes-root-dashboard.service /etc/systemd/system/hermes-root-dashboard.service
cp /root/repos/learning_ai_devops_tools/systemd/uma-hermes-dashboard.service /etc/systemd/system/uma-hermes-dashboard.service
cp /root/repos/learning_ai_devops_tools/systemd/hermes-root-backup.service /etc/systemd/system/hermes-root-backup.service
cp /root/repos/learning_ai_devops_tools/systemd/hermes-root-backup.timer /etc/systemd/system/hermes-root-backup.timer
cp /root/repos/learning_ai_devops_tools/systemd/uma-hermes-backup.service /etc/systemd/system/uma-hermes-backup.service
cp /root/repos/learning_ai_devops_tools/systemd/uma-hermes-backup.timer /etc/systemd/system/uma-hermes-backup.timer
```

Install Uma user gateway:

```bash
mkdir -p /home/uma/.config/systemd/user
cp /root/repos/learning_ai_devops_tools/systemd/uma-hermes-gateway.service /home/uma/.config/systemd/user/uma-hermes-gateway.service
chown -R uma:uma /home/uma/.config
```

Enable services:

```bash
systemctl daemon-reload
systemctl enable --now hermes-gateway.service
systemctl enable --now hermes-root-backup.timer uma-hermes-backup.timer
sudo -u uma XDG_RUNTIME_DIR=/run/user/$(id -u uma) systemctl --user daemon-reload
sudo -u uma XDG_RUNTIME_DIR=/run/user/$(id -u uma) systemctl --user enable --now uma-hermes-gateway.service
```

### 7. Reconnect Tailscale And Dashboards

```bash
curl -fsSL https://tailscale.com/install.sh | sh
systemctl enable --now tailscaled
tailscale up
tailscale ip -4
```

Update the dashboard service files if the new Tailscale IP differs from the old `100.87.53.10`, then:

```bash
systemctl daemon-reload
systemctl enable --now hermes-root-dashboard.service uma-hermes-dashboard.service
```

### 8. Final Verification

```bash
systemctl status hermes-gateway.service --no-pager
sudo -u uma XDG_RUNTIME_DIR=/run/user/$(id -u uma) systemctl --user status uma-hermes-gateway.service --no-pager
systemctl status hermes-root-backup.timer uma-hermes-backup.timer --no-pager
systemctl list-timers --all --no-pager | grep 'hermes.*backup'
HERMES_HOME=/root/.hermes /usr/local/lib/hermes-agent/venv/bin/hermes cron list
sudo -u uma HERMES_HOME=/home/uma/.hermes /usr/local/lib/hermes-agent/venv/bin/hermes cron list
python3 /root/.hermes/scripts/sync_hermes_persistent_backup.py
HERMES_HOME=/home/uma/.hermes \
  HERMES_BACKUP_REPO=/home/uma/repos/uma_hostinger_hermes_vm \
  HERMES_BACKUP_REMOTE=https://github.com/umadev0931/uma_hostinger_hermes_vm.git \
  python3 /home/uma/.hermes/scripts/sync_uma_hermes_persistent_backup.py
```

Telegram smoke tests:

- send root Hermes: `Hi`
- send Uma/Bheem Hermes: `Hi`
- verify both reply without model-provider errors
- verify root and Uma dashboards return HTTP 200 on the current Tailscale IP/ports

## Restore Test Evidence

Root restore test on 2026-05-27:

- restored into `/tmp/hermes-restore-test-root-current`
- `MANIFEST.json` source: `/root/.hermes`
- restored file count: `751`
- restored cron job count: `1`
- confirmed absent: `state.db`, `auth.json`, `logs/`

Uma restore test on 2026-05-27:

- restored into `/tmp/hermes-restore-test-uma`
- `MANIFEST.json` source: `/home/uma/.hermes`
- restored file count: `600`
- restored cron job count: `2`
- confirmed absent: `state.db`, `auth.json`, `logs/`

## Hard Rule During Recovery

Do not expose Hermes dashboard/API publicly during rebuild. Use only local shell, SSH tunnel, or Tailscale/private network unless S explicitly approves the hostname, authentication gate, and access path.
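
One way to spot-check the rule above during a rebuild is to look for TCP listeners bound to a wildcard address. This is a minimal sketch, not an established part of this runbook: the function name `public_listeners` is hypothetical, and it assumes the column layout printed by `ss -ltnH` from iproute2, where the fourth field is the local address and port.

```bash
# Hypothetical helper: read `ss -ltnH` output on stdin and print every
# local address:port bound to a wildcard (0.0.0.0, [::], or *).
public_listeners() {
  awk '$4 ~ /^(0\.0\.0\.0|\[::\]|\*):[0-9]+$/ { print $4 }'
}

# Usage on the new VM (review each line it prints):
#   ss -ltnH | public_listeners
```

A wildcard bind is only a hint. A firewall may still block it, and a service bound directly to a public IP would not be flagged, so treat any Hermes dashboard or API port in the output as a prompt to check its service file rather than as proof of exposure.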