diff --git a/docs/hermes-disaster-recovery.md b/docs/hermes-disaster-recovery.md
index 5136a8f..5a0be03 100644
--- a/docs/hermes-disaster-recovery.md
+++ b/docs/hermes-disaster-recovery.md
@@ -21,9 +21,9 @@ What is recoverable:
 What still requires operator-provided credentials or re-authentication:
 
 - GitHub token or credentials for clone/push if the new VM does not already have them
-- OpenAI Codex OAuth/provider login
-- Telegram bot/user credentials if not restored from an external secret manager
-- Tailscale login for the new machine
+- OpenAI Codex OAuth/provider login, unless restored from an encrypted emergency bundle
+- Telegram bot/user credentials, unless restored from an encrypted emergency bundle
+- Tailscale login for the new machine, unless restoring Tailscale state is explicitly chosen
 - any optional provider/search/browser API keys
 
 What is intentionally not restored from git:
@@ -46,6 +46,49 @@ Expected data-loss window:
 | Uma/bheem | `https://github.com/umadev0931/uma_hostinger_hermes_vm.git` | `hermes_persistent_backup/` | `uma-hermes-backup.timer` every 10 minutes |
 | ops docs | `https://github.com/saravanakumardb/learning_ai_devops_tools.git` | `docs/`, `systemd/`, `scripts/` | pushed manually after changes |
 
+## Encrypted Emergency Bundle
+
+Normal GitHub backups are sanitized and intentionally exclude raw secrets, auth state, and raw `state.db`. For faster break-glass recovery, create a separate encrypted bundle and store the encrypted `.gpg` file in Google Drive or another private location.
+
+Create bundle on the old/current VM:
+
+```bash
+/root/repos/learning_ai_devops_tools/scripts/hermes-emergency-bundle-create.sh
+```
+
+The script creates:
+
+```text
+/root/hermes-emergency-bundles/hermes-emergency-bundle--.tar.zst.gpg
+```
+
+It includes an allow-list only:
+
+- `/root/.hermes/.env`, `auth.json`, `state.db*`
+- `/home/uma/.hermes/.env`, `auth.json`, `state.db*`
+- `/root/.git-credentials`
+- `/root/.gitea_admin_password`, `/root/.gitea_npm_token`, `/root/.gitea_npm_token_home`
+- `/var/lib/tailscale/tailscaled.state`
+
+It does not include logs, caches, locks, PIDs, or sandboxes.
+
+Decrypt on a new VM into staging only:
+
+```bash
+/root/repos/learning_ai_devops_tools/scripts/hermes-emergency-bundle-decrypt.sh \
+  /path/to/hermes-emergency-bundle.tar.zst.gpg
+```
+
+The decrypt script extracts to `/root/hermes-emergency-restore-staging/...` by default. It does not overwrite live `.hermes` or credential files. Inspect the staging directory first, then manually copy only the files needed for the recovery.
+
+For unattended operation, both scripts support:
+
+```bash
+export BUNDLE_PASSPHRASE_FILE=/root/path/to/passphrase-file
+```
+
+Keep the passphrase outside GitHub and outside the encrypted bundle.
+
 Latest verified commits on 2026-05-27:
 
 - root persistent backup: `d286a03`
diff --git a/docs/hermes-operations.md b/docs/hermes-operations.md
index 090803a..c9e3126 100644
--- a/docs/hermes-operations.md
+++ b/docs/hermes-operations.md
@@ -166,6 +166,15 @@ The persistent-data backup repo intentionally excludes raw secrets and `state.db
 
 For full VM rebuild steps, use `docs/hermes-disaster-recovery.md`.
 
+For break-glass recovery of raw secrets/auth/state that are excluded from GitHub backups, use:
+
+```bash
+scripts/hermes-emergency-bundle-create.sh
+scripts/hermes-emergency-bundle-decrypt.sh
+```
+
+Store only the encrypted `.gpg` bundle in Google Drive or similar private storage. Never upload the plaintext staging directory.
+
 Quarterly restore drill:
 
 1. Run the backup sync manually or wait for a successful cron run.
diff --git a/scripts/hermes-emergency-bundle-create.sh b/scripts/hermes-emergency-bundle-create.sh
new file mode 100755
index 0000000..0440f39
--- /dev/null
+++ b/scripts/hermes-emergency-bundle-create.sh
@@ -0,0 +1,109 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+usage() {
+  cat <<'USAGE'
+Usage:
+  hermes-emergency-bundle-create.sh [output-dir]
+
+Creates a GPG-encrypted emergency bundle containing sensitive recovery files
+that are intentionally excluded from the normal GitHub Hermes backups.
+
+Default output-dir:
+  /root/hermes-emergency-bundles
+
+Passphrase:
+  Interactive GPG prompt by default.
+  Or set BUNDLE_PASSPHRASE_FILE=/root/path/to/passphrase-file for unattended use.
+
+Safety:
+  - Does not print secret values.
+  - Uses an allow-list of sensitive recovery files.
+  - Does not include logs, caches, locks, PIDs, or sandboxes.
+USAGE
+}
+
+if [ "${1:-}" = "-h" ] || [ "${1:-}" = "--help" ]; then
+  usage
+  exit 0
+fi
+
+OUT_DIR="${1:-/root/hermes-emergency-bundles}"
+STAMP="$(date -u +%Y%m%dT%H%M%SZ)"
+HOST="$(hostname -s 2>/dev/null || hostname)"
+WORK_DIR="$(mktemp -d)"
+PAYLOAD_DIR="$WORK_DIR/payload"
+ARCHIVE="$WORK_DIR/hermes-emergency-bundle-${HOST}-${STAMP}.tar.zst"
+OUT_FILE="$OUT_DIR/hermes-emergency-bundle-${HOST}-${STAMP}.tar.zst.gpg"
+
+cleanup() {
+  rm -rf "$WORK_DIR"
+}
+trap cleanup EXIT
+
+install -d -m 700 "$OUT_DIR"
+install -d -m 700 "$PAYLOAD_DIR"
+
+copy_if_exists() {
+  src="$1"
+  dest="$PAYLOAD_DIR/${src#/}"
+  if [ -e "$src" ]; then
+    install -d -m 700 "$(dirname "$dest")"
+    cp -a "$src" "$dest"
+    printf '%s\n' "${src#/}" >> "$PAYLOAD_DIR/MANIFEST.paths"
+  fi
+}
+
+# Root Hermes sensitive state.
+copy_if_exists /root/.hermes/.env
+copy_if_exists /root/.hermes/auth.json
+copy_if_exists /root/.hermes/state.db
+copy_if_exists /root/.hermes/state.db-shm
+copy_if_exists /root/.hermes/state.db-wal
+
+# Uma Hermes sensitive state.
+copy_if_exists /home/uma/.hermes/.env
+copy_if_exists /home/uma/.hermes/auth.json
+copy_if_exists /home/uma/.hermes/state.db
+copy_if_exists /home/uma/.hermes/state.db-shm
+copy_if_exists /home/uma/.hermes/state.db-wal
+
+# Git and local registry credentials used for recovery operations.
+copy_if_exists /root/.git-credentials
+copy_if_exists /root/.gitea_admin_password
+copy_if_exists /root/.gitea_npm_token
+copy_if_exists /root/.gitea_npm_token_home
+
+# Tailscale machine state is sensitive. Restoring it is optional; a fresh
+# `tailscale up` login is often cleaner, but this preserves a break-glass copy.
+copy_if_exists /var/lib/tailscale/tailscaled.state
+
+if [ ! -s "$PAYLOAD_DIR/MANIFEST.paths" ]; then
+  echo "No emergency files found to bundle." >&2
+  exit 1
+fi
+
+cat > "$PAYLOAD_DIR/README.txt" < [staging-dir]
+
+Decrypts a Hermes emergency bundle into a staging directory.
+
+Default staging-dir:
+  /root/hermes-emergency-restore-staging/
+
+Passphrase:
+  Interactive GPG prompt by default.
+  Or set BUNDLE_PASSPHRASE_FILE=/root/path/to/passphrase-file for unattended use.
+
+Safety:
+  - Does not write into /root/.hermes or /home/uma/.hermes.
+  - Does not overwrite live credentials.
+  - Review extracted files, then copy only the needed files manually.
+USAGE
+}
+
+if [ "${1:-}" = "-h" ] || [ "${1:-}" = "--help" ] || [ "$#" -lt 1 ]; then
+  usage
+  exit 0
+fi
+
+BUNDLE="$1"
+if [ ! -f "$BUNDLE" ]; then
+  echo "Bundle not found: $BUNDLE" >&2
+  exit 1
+fi
+
+base="$(basename "$BUNDLE" .gpg)"
+STAGING_DIR="${2:-/root/hermes-emergency-restore-staging/$base}"
+WORK_DIR="$(mktemp -d)"
+ARCHIVE="$WORK_DIR/$base"
+
+cleanup() {
+  rm -rf "$WORK_DIR"
+}
+trap cleanup EXIT
+
+install -d -m 700 "$STAGING_DIR"
+
+gpg_args=(--decrypt --output "$ARCHIVE")
+if [ -n "${BUNDLE_PASSPHRASE_FILE:-}" ]; then
+  gpg_args=(--batch --yes --pinentry-mode loopback --passphrase-file "$BUNDLE_PASSPHRASE_FILE" "${gpg_args[@]}")
+fi
+
+gpg "${gpg_args[@]}" "$BUNDLE"
+tar -C "$STAGING_DIR" -I zstd -xf "$ARCHIVE"
+chmod -R go-rwx "$STAGING_DIR"
+
+echo "Bundle decrypted into staging directory: $STAGING_DIR"
+echo
+echo "Included paths:"
+if [ -f "$STAGING_DIR/MANIFEST.paths" ]; then
+  sed -n '1,200p' "$STAGING_DIR/MANIFEST.paths"
+else
+  find "$STAGING_DIR" -type f | sed "s#^$STAGING_DIR/##" | sort | sed -n '1,200p'
+fi
+echo
+echo "Next step: inspect staging, then manually copy only the needed files into place."
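
The decrypt script ends by telling the operator to inspect staging before copying anything into place. As a rough sketch of what that inspection could look like, the helper below checks that every path listed in `MANIFEST.paths` is actually present under the staging directory. `verify_staging` is a hypothetical function and is not part of this change. It also assumes the archive unpacks with `MANIFEST.paths` at the staging root and the files at the relative paths that `copy_if_exists` records, which the decrypt script's own manifest lookup suggests but this diff does not show in full.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this change): confirm that every path
# recorded in MANIFEST.paths exists under the given staging directory.
# Assumes entries are relative paths such as "root/.hermes/.env".
verify_staging() {
  local staging="$1"
  local manifest="$staging/MANIFEST.paths"
  local missing=0
  local rel

  if [ ! -f "$manifest" ]; then
    echo "No MANIFEST.paths in $staging" >&2
    return 2
  fi

  # Read one relative path per line; tolerate a missing trailing newline.
  while IFS= read -r rel || [ -n "$rel" ]; do
    [ -n "$rel" ] || continue
    if [ ! -e "$staging/$rel" ]; then
      echo "missing: $rel" >&2
      missing=1
    fi
  done < "$manifest"

  # 0 when every listed path is present, 1 when at least one is missing.
  return "$missing"
}
```

A possible use is `verify_staging /path/to/staging-dir && echo "staging matches manifest"`, run before any manual copy. The function only reads file names, so it never prints secret values.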