# Hermes Disaster Recovery Runbook

Goal: rebuild the ByteLyst root Hermes and Uma/Bheem Hermes setup on a new VM quickly, with durable memory, sessions, cron definitions, skills, scripts, and dashboard/service configuration restored from GitHub-backed artifacts.

Last verified: 2026-05-27.

## Current Recovery Confidence

**High for durable Hermes state.** Both root and Uma now push sanitized `.hermes` persistent backups to GitHub on recurring systemd backup timers.

What is recoverable:

- root Hermes config, memories, skills, sessions JSON exports, cron definitions, scripts, channel directory, gateway state, SOUL, and Kanban DB
- Uma Hermes config, memories, skills, sessions JSON exports, cron definitions, scripts, channel directory, gateway state, SOUL, and Kanban DB
- root and Uma gateway systemd unit definitions
- root and Uma private dashboard systemd unit definitions
- root and Uma backup timer systemd unit definitions
- Uma wrapper/memory/docs repo content
- root operational docs and rebuild knowledge in this repo

What still requires operator-provided credentials or re-authentication:

- GitHub token or credentials for clone/push, if the new VM does not already have them
- OpenAI Codex OAuth/provider login, unless restored from an encrypted emergency bundle
- Telegram bot/user credentials, unless restored from an encrypted emergency bundle
- Tailscale login for the new machine, unless you explicitly choose to restore Tailscale state
- any optional provider/search/browser API keys

What is intentionally not restored from git:

- raw `.env` secret values
- Hermes `auth.json`
- raw `state.db`, SQLite WAL/SHM files, logs, cache directories, sandboxes, locks, and PIDs
- live OS processes or in-flight terminal commands that were running at the moment the VM was lost

Expected data-loss window:

- durable backups run every 10 minutes through systemd timers, so expect to lose up to roughly the last 10 minutes of durable state
- any in-memory or live process activity since the last backup may need manual reconstruction from Telegram/GitHub context

## Backup Sources

| Instance | GitHub repo | Backup path | Recurring sync |
| --- | --- | --- | --- |
| root/vijay | `https://github.com/saravanakumardb/bytelyst_hostinger_hermes_vm.git` | `hermes_persistent_backup/` | `hermes-root-backup.timer` every 10 minutes |
| Uma/bheem | `https://github.com/umadev0931/uma_hostinger_hermes_vm.git` | `hermes_persistent_backup/` | `uma-hermes-backup.timer` every 10 minutes |
| ops docs | `https://github.com/saravanakumardb/learning_ai_devops_tools.git` | `docs/`, `systemd/`, `scripts/` | pushed manually after changes |

## Encrypted Emergency Bundle

Normal GitHub backups are sanitized. They intentionally exclude raw secrets, auth state, and raw `state.db`. For faster break-glass recovery, create a separate encrypted bundle and store the encrypted `.gpg` file in Google Drive or another private location.

Create the bundle on the old/current VM:

```bash
/root/repos/learning_ai_devops_tools/scripts/hermes-emergency-bundle-create.sh
```

The script creates:

```text
/root/hermes-emergency-bundles/hermes-emergency-bundle--.tar.zst.gpg
```

It includes only this allow-list:

- `/root/.hermes/.env`, `auth.json`, `state.db*`
- `/home/uma/.hermes/.env`, `auth.json`, `state.db*`
- `/root/.git-credentials`
- `/root/.gitea_admin_password`, `/root/.gitea_npm_token`, `/root/.gitea_npm_token_home`
- `/var/lib/tailscale/tailscaled.state`

It does not include logs, caches, locks, PIDs, or sandboxes.

Decrypt on the new VM into staging only:

```bash
/root/repos/learning_ai_devops_tools/scripts/hermes-emergency-bundle-decrypt.sh \
  /path/to/hermes-emergency-bundle.tar.zst.gpg
```

By default, the decrypt script extracts to `/root/hermes-emergency-restore-staging/...`. It does not overwrite live `.hermes` or credential files. Inspect the staging directory first, then manually copy only the files needed for the recovery.
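A small helper can show which allow-listed files actually came through before you copy anything out of staging. The function below is a hypothetical sketch and is not one of the ops-repo scripts. It assumes the decrypted bundle mirrors absolute paths under the staging root (for example `$STAGING/root/.hermes/.env`). That layout has not been confirmed against `hermes-emergency-bundle-decrypt.sh`, so adjust the prefix if the real layout differs.

```bash
# Hypothetical helper (not an ops-repo script): report which allow-listed
# emergency files are present in a decrypted staging directory.
# ASSUMPTION: the bundle mirrors absolute paths under the staging root,
# e.g. "$STAGING/root/.hermes/.env".
report_staged_files() (
  # Subshell body, so the nullglob setting below does not leak to the caller.
  staging="${1:?usage: report_staged_files STAGING_DIR}"
  shopt -s nullglob
  # Allow-list copied from this runbook; the state.db* globs also match -wal/-shm files.
  patterns=(
    /root/.hermes/.env /root/.hermes/auth.json '/root/.hermes/state.db*'
    /home/uma/.hermes/.env /home/uma/.hermes/auth.json '/home/uma/.hermes/state.db*'
    /root/.git-credentials
    /root/.gitea_admin_password /root/.gitea_npm_token /root/.gitea_npm_token_home
    /var/lib/tailscale/tailscaled.state
  )
  for pattern in "${patterns[@]}"; do
    found=0
    # $pattern is unquoted so the trailing * can glob; these paths contain no spaces.
    for match in "$staging"$pattern; do
      # Literal (non-glob) paths survive nullglob even when absent, so test existence.
      [[ -e "$match" ]] || continue
      echo "present  ${match#"$staging"}"
      found=1
    done
    (( found )) || echo "missing  $pattern"
  done
)
```

Point it at the staging directory the decrypt script reports. A `missing` line is expected for an instance you are not restoring, for example Uma on a root-only rebuild.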
For unattended operation, both scripts can read the passphrase from a file named by `BUNDLE_PASSPHRASE_FILE`:

```bash
export BUNDLE_PASSPHRASE_FILE=/root/path/to/passphrase-file
```

Keep the passphrase outside GitHub and outside the encrypted bundle.

Automated Google Drive upload to a personal Drive uses OAuth user credentials, not the service account. Service accounts can read metadata for folders shared from a personal Drive. However, uploads into a personal Drive fail because service accounts have no personal Drive storage quota. Use the service account path only for Shared Drives or Workspace delegation.

Personal Drive OAuth setup:

1. In Google Cloud Console, create an OAuth client of type **Desktop app** in the `hermes-emergency-backups` project.
2. Save the downloaded JSON as:

   ```text
   /root/.config/hermes-google-drive/oauth-client.json
   ```

3. Run:

   ```bash
   /root/.local/share/hermes-drive-uploader-venv/bin/python \
     /root/repos/learning_ai_devops_tools/scripts/hermes-google-drive-oauth-login.py
   ```

4. Open the printed URL, approve access, and paste the code back into the terminal.
5. Confirm that `/root/.config/hermes-google-drive/user-token.json` exists with mode `600`.

Automated Google Drive upload is configured to use:

- OAuth client: `/root/.config/hermes-google-drive/oauth-client.json`
- OAuth token: `/root/.config/hermes-google-drive/user-token.json`
- passphrase file: `/root/.config/hermes-google-drive/bundle-passphrase`
- uploader venv: `/root/.local/share/hermes-drive-uploader-venv`
- uploader script: `scripts/hermes-emergency-bundle-upload-drive.sh`
- timer: `hermes-emergency-drive-upload.timer`, daily around `03:17 UTC`

Drive targets:

- Vijay folder: `1KIlSJzpf5fuaH5LYvfbLsUbOSYY23YGm`
- Bheem folder: `1Ac5cbDC0dSWas8LeeWe_9XFqCquz7kZT`

The uploader creates one encrypted bundle and uploads the same encrypted file to both folders. It keeps the latest 12 encrypted bundles in each Drive folder.
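You can script the mode-`600` check from step 5 and run it before the upload timer fires. The sketch below is hypothetical and is not part of the ops-repo uploader. It relies on GNU coreutils `stat -c '%a'`. The runbook only states the `600` requirement for `user-token.json`; applying the same rule to `oauth-client.json` and `bundle-passphrase` is an assumption.

```bash
# Hypothetical pre-flight check for the Drive uploader's secret files.
# Prints OK / MISSING / BAD_MODE and returns non-zero on any failure.
check_secret_file() {
  local f="$1" mode
  if [[ ! -f "$f" ]]; then
    echo "MISSING $f"
    return 1
  fi
  # GNU stat: %a prints the octal permission bits, e.g. 600.
  mode=$(stat -c '%a' "$f")
  if [[ "$mode" != "600" ]]; then
    echo "BAD_MODE $mode $f"
    return 1
  fi
  echo "OK $f"
}

# ASSUMPTION: all three files should be mode 600, not only user-token.json.
check_drive_secrets() {
  local dir="${1:-/root/.config/hermes-google-drive}" rc=0 name
  for name in oauth-client.json user-token.json bundle-passphrase; do
    check_secret_file "$dir/$name" || rc=1
  done
  return "$rc"
}
```

With no argument, `check_drive_secrets` checks `/root/.config/hermes-google-drive`. It prints one line per file and keeps checking after a failure, so a single run reports every problem.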
Latest verified commits on 2026-05-27:

- root persistent backup: `d286a03`
- Uma persistent backup: `bbad574`
- ops docs/systemd templates: update this entry after the runbook commit

## Fast Rebuild Order

### 1. Prepare Base VM

Install the minimum system packages:

```bash
apt-get update
apt-get install -y git curl rsync python3 python3-venv nodejs npm systemd
# The .tar.zst.gpg emergency bundle needs gnupg to decrypt and zstd to unpack.
apt-get install -y gnupg zstd
```

Create Uma if missing. Enable lingering so that Uma's user services run without an active login session:

```bash
id uma || useradd -m -s /bin/bash uma
loginctl enable-linger uma
```

### 2. Restore Git Access

Root is the operator for both root and Uma repo pushes. Restore GitHub credentials for root without printing them:

```bash
git config --global credential.helper store
chmod 700 /root
# Create /root/.git-credentials from the external secret source
# (or copy it from the emergency bundle staging directory).
chmod 600 /root/.git-credentials
```

Then clone the three recovery repos:

```bash
mkdir -p /root/repos /home/uma/repos
git clone https://github.com/saravanakumardb/learning_ai_devops_tools.git /root/repos/learning_ai_devops_tools
git clone https://github.com/saravanakumardb/bytelyst_hostinger_hermes_vm.git /root/repos/bytelyst_hostinger_hermes_vm
git clone https://github.com/umadev0931/uma_hostinger_hermes_vm.git /home/uma/repos/uma_hostinger_hermes_vm
chown -R uma:uma /home/uma/repos
```

### 3. Install Hermes Source

Use the official Hermes source and the same shared install path:

```bash
mkdir -p /usr/local/lib
git clone https://github.com/NousResearch/hermes-agent.git /usr/local/lib/hermes-agent
cd /usr/local/lib/hermes-agent
python3 -m venv venv
./venv/bin/pip install -e .
```

If upstream later ships a setup/update script, follow the official upstream instructions instead. Either way, verify the install:

```bash
/usr/local/lib/hermes-agent/venv/bin/hermes --version
```

### 4. Restore Root Hermes Persistent Data

```bash
HERMES_HOME=/root/.hermes \
  /root/repos/bytelyst_hostinger_hermes_vm/restore_hermes_persistent_data.sh \
  /root/repos/bytelyst_hostinger_hermes_vm/hermes_persistent_backup
```

Re-enter secrets into `/root/.hermes/.env` from the external source, or use the Hermes auth flows. Do not copy secrets from docs or chat.

Verify:

```bash
HERMES_HOME=/root/.hermes /usr/local/lib/hermes-agent/venv/bin/hermes doctor --fix
HERMES_HOME=/root/.hermes /usr/local/lib/hermes-agent/venv/bin/hermes cron list
```

### 5. Restore Uma Hermes Persistent Data

```bash
mkdir -p /home/uma/.hermes
HERMES_HOME=/home/uma/.hermes \
  /home/uma/repos/uma_hostinger_hermes_vm/restore_hermes_persistent_data.sh \
  /home/uma/repos/uma_hostinger_hermes_vm/hermes_persistent_backup
chown -R uma:uma /home/uma/.hermes
```

Re-enter Uma secrets into `/home/uma/.hermes/.env` from the external source, or use the Hermes auth flows. If you write the file as root, make sure it stays owned by `uma`.

Verify:

```bash
sudo -u uma HERMES_HOME=/home/uma/.hermes /usr/local/lib/hermes-agent/venv/bin/hermes doctor --fix
sudo -u uma HERMES_HOME=/home/uma/.hermes /usr/local/lib/hermes-agent/venv/bin/hermes cron list
```

### 6. Reinstall Systemd Units

```bash
cp /root/repos/learning_ai_devops_tools/systemd/hermes-gateway.service /etc/systemd/system/hermes-gateway.service
cp /root/repos/learning_ai_devops_tools/systemd/hermes-root-dashboard.service /etc/systemd/system/hermes-root-dashboard.service
cp /root/repos/learning_ai_devops_tools/systemd/uma-hermes-dashboard.service /etc/systemd/system/uma-hermes-dashboard.service
cp /root/repos/learning_ai_devops_tools/systemd/hermes-root-backup.service /etc/systemd/system/hermes-root-backup.service
cp /root/repos/learning_ai_devops_tools/systemd/hermes-root-backup.timer /etc/systemd/system/hermes-root-backup.timer
cp /root/repos/learning_ai_devops_tools/systemd/uma-hermes-backup.service /etc/systemd/system/uma-hermes-backup.service
cp /root/repos/learning_ai_devops_tools/systemd/uma-hermes-backup.timer /etc/systemd/system/uma-hermes-backup.timer
```

Install the Uma user gateway:

```bash
mkdir -p /home/uma/.config/systemd/user
cp /root/repos/learning_ai_devops_tools/systemd/uma-hermes-gateway.service /home/uma/.config/systemd/user/uma-hermes-gateway.service
chown -R uma:uma /home/uma/.config
```

Enable services:

```bash
systemctl daemon-reload
systemctl enable --now hermes-gateway.service
systemctl enable --now hermes-root-backup.timer uma-hermes-backup.timer
sudo -u uma XDG_RUNTIME_DIR="/run/user/$(id -u uma)" systemctl --user daemon-reload
sudo -u uma XDG_RUNTIME_DIR="/run/user/$(id -u uma)" systemctl --user enable --now uma-hermes-gateway.service
```

### 7. Reconnect Tailscale And Dashboards

```bash
curl -fsSL https://tailscale.com/install.sh | sh
systemctl enable --now tailscaled
tailscale up
tailscale ip -4
```

If the new Tailscale IP differs from the old `100.87.53.10`, update the IP in `/etc/systemd/system/hermes-root-dashboard.service` and `/etc/systemd/system/uma-hermes-dashboard.service`. Then run:

```bash
systemctl daemon-reload
systemctl enable --now hermes-root-dashboard.service uma-hermes-dashboard.service
```

### 8. Final Verification

```bash
# Gateways and backup timers
systemctl status hermes-gateway.service --no-pager
sudo -u uma XDG_RUNTIME_DIR="/run/user/$(id -u uma)" systemctl --user status uma-hermes-gateway.service --no-pager
systemctl status hermes-root-backup.timer uma-hermes-backup.timer --no-pager
systemctl list-timers --all --no-pager | grep 'hermes.*backup'

# Restored cron definitions
HERMES_HOME=/root/.hermes /usr/local/lib/hermes-agent/venv/bin/hermes cron list
sudo -u uma HERMES_HOME=/home/uma/.hermes /usr/local/lib/hermes-agent/venv/bin/hermes cron list

# One manual backup sync per instance
python3 /root/.hermes/scripts/sync_hermes_persistent_backup.py
HERMES_HOME=/home/uma/.hermes \
HERMES_BACKUP_REPO=/home/uma/repos/uma_hostinger_hermes_vm \
HERMES_BACKUP_REMOTE=https://github.com/umadev0931/uma_hostinger_hermes_vm.git \
  python3 /home/uma/.hermes/scripts/sync_uma_hermes_persistent_backup.py
```

Telegram smoke tests:

- send root Hermes: `Hi`
- send Uma/Bheem Hermes: `Hi`
- verify both reply without model-provider errors
- verify the root and Uma dashboards return HTTP 200 on the current Tailscale IP/ports

## Restore Test Evidence

Root restore test on 2026-05-27:

- restored into `/tmp/hermes-restore-test-root-current`
- `MANIFEST.json` source: `/root/.hermes`
- restored file count: `751`
- restored cron job count: `1`
- confirmed absent: `state.db`, `auth.json`, `logs/`

Uma restore test on 2026-05-27:

- restored into `/tmp/hermes-restore-test-uma`
- `MANIFEST.json` source: `/home/uma/.hermes`
- restored file count: `600`
- restored cron job count: `2`
- confirmed absent: `state.db`, `auth.json`, `logs/`

## Hard Rule During Recovery

Do not expose the Hermes dashboard or API publicly during the rebuild. Use only the local shell, an SSH tunnel, or Tailscale/private network access, unless S explicitly approves the hostname, authentication gate, and access path.
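You can spot-check the hard rule before enabling the dashboards. The helper below is a hypothetical sketch and is not part of the ops repo. It accepts an IPv4 bind address only if it is loopback (`127.0.0.0/8`) or inside Tailscale's CGNAT range (`100.64.0.0/10`, which includes the old `100.87.53.10`). It rejects `0.0.0.0`, public addresses, and anything that is not a dotted IPv4 literal, so IPv6 binds and hostnames still need a manual check.

```bash
# Hypothetical check: is an IPv4 bind address private enough for a Hermes dashboard?
# Accepts loopback (127.0.0.0/8) and Tailscale CGNAT (100.64.0.0/10) only.
is_private_bind() {
  local addr="$1" a b c d o
  IFS=. read -r a b c d <<< "$addr"
  # Require exactly four decimal octets in 0..255; anything else is rejected.
  # 10# forces base 10 so octets such as "08" are not parsed as octal.
  for o in "$a" "$b" "$c" "$d"; do
    [[ "$o" =~ ^[0-9]{1,3}$ ]] && (( 10#$o <= 255 )) || return 1
  done
  (( 10#$a == 127 )) && return 0
  # 100.64.0.0/10 spans 100.64.0.0 through 100.127.255.255.
  (( 10#$a == 100 && 10#$b >= 64 && 10#$b <= 127 )) && return 0
  return 1
}
```

For example, run it on the address part of each `Local Address:Port` entry that `ss -ltn` reports for the dashboard ports. Any address it rejects means the dashboard is listening somewhere the hard rule does not allow.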