bytelyst-devops-tools/docs/prompts/phase4-bheem-uma-parity.md

# Delegation Brief — Phase 4: Bheem/Uma parity

> Self-contained task brief for a VM-side agent (Hermes `delegate_task`, a
> manual ops session, or a fresh remote Devin session that has SSH/console
> access to the Hostinger VM). This is **VM operations work**, not codebase
> work: apart from the documentation updates in tasks 3 and 4, nothing below
> is done by editing files in this repo.
>
> Related: `docs/hermes_dashboard_v2_roadmap.md` (Phase 4),
> `docs/hermes-operations.md`, `docs/hermes-disaster-recovery.md`,
> `scripts/hermes-health-watchdog.py`.

---

ROLE: Operator with sudo on the Hostinger VM and Telegram + Uma GitHub access.

OBJECTIVE: Bring the Bheem (Uma-user) Hermes instance up to parity with
Vijay (root) so the dashboard's `getHermesOpsSnapshot()` ops-panel stops
surfacing Bheem-only warnings (backup-timer-inactive, repo-not-readable,
google-token-missing). When this brief is done, "Healthy instances" should
read `2/2` and the per-instance roll-up cards on `/hermes` should show Bheem
green across the board.

CONTEXT (read first):
- VM: `bytelyst@hostinger-vm` (Tailscale only — no public ingress).
- Two Hermes instances colocated:
  - **Vijay**: root user, `/root/.hermes`, gateway = `hermes-gateway.service`,
    backup timer = `hermes-root-backup.timer`, backup repo =
    `bytelyst/bytelyst_hostinger_hermes_vm` on GitHub, watchdog =
    `scripts/hermes-health-watchdog.py` running under root systemd, alerts
    to root's Telegram chat.
  - **Bheem**: `uma` user, `/home/uma/.hermes`, gateway =
    `uma-hermes-gateway.service` (user systemd), no backup timer yet, no
    persistent backup repo, no watchdog. This is the gap.
- Decision #5 (in the v2 roadmap): Bheem **self-pushes** its own backup
  with a Uma-owned, repo-scoped GitHub PAT. Root must NOT push Uma's
  backup. Each instance owns its own credentials.

GUARDRAILS:
- Tailscale-only access; never expose any new port publicly.
- Sanitize before commit/push: `state.db`, SQLite WAL/SHM, secrets, OAuth
  tokens, `.env` files must all be gitignored in the persistent-backup
  repo. Use the same allowlist/denylist that root's backup uses (see
  `scripts/hermes-persistent-backup.sh` or equivalent).
- Don't commit credentials anywhere.
- Mirror root's design — don't invent a new pattern.
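
The sanitization guardrail can be sketched as a denylist plus a check on a fresh clone of the backup repo. This is a minimal illustration, not root's actual list: the patterns only cover the items named above, the `*token*` pattern is an assumption about file naming, and the authoritative list is whatever `scripts/hermes-persistent-backup.sh` uses. `write_backup_gitignore` and `assert_no_sensitive_files` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch only: mirror root's real allowlist/denylist rather than trusting this one.

# Write a denylist .gitignore into the backup working tree given as $1.
write_backup_gitignore() {
  cat > "$1/.gitignore" <<'EOF'
# SQLite state and its write-ahead / shared-memory sidecars
state.db
*.db-wal
*.db-shm
# Secrets and OAuth material (file names are assumptions)
.env
*.env
*token*
EOF
}

# Fail (non-zero) if a sensitive file exists under $1, ignoring the .git dir.
# Intended for a fresh clone of the backup repo, not the live HERMES_HOME.
assert_no_sensitive_files() {
  local hits
  hits=$(find "$1" -path "$1/.git" -prune -o -type f \
    \( -name 'state.db' -o -name '*.db-wal' -o -name '*.db-shm' \
       -o -name '.env' -o -name '*.env' \) -print)
  if [ -n "$hits" ]; then
    printf 'sensitive files present:\n%s\n' "$hits" >&2
    return 1
  fi
}
```

Running the check against a clone rather than the live directory matters: the live `HERMES_HOME` legitimately contains `state.db`, so only the pushed contents say whether sanitization worked.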

TASKS (in order):

1. **Uma persistent-backup repo + timer.**
   - Create `umadev0931/uma_hostinger_hermes_vm` on GitHub (private).
   - Generate a fine-grained PAT scoped to that repo only (`Contents: rw`,
     `Metadata: ro`). Store it in `~uma/.config/hermes/github_token` mode
     `600`, owned by `uma:uma`.
   - Mirror `scripts/hermes-persistent-backup.sh` into a Uma-owned variant
     (could be the same script with `HERMES_HOME=/home/uma/.hermes
     HERMES_BACKUP_REPO=...`). Run it once manually to populate the repo
     and confirm sanitization.
   - Install `uma-hermes-backup.service` + `uma-hermes-backup.timer` as
     **user** systemd units (`~uma/.config/systemd/user/`). Enable with
     `systemctl --user --machine=uma@.host enable --now uma-hermes-backup.timer`.
   - Verify:
     `runuser -u uma -- systemctl --user is-active uma-hermes-backup.timer`
     returns `active`. The dashboard's hermes-ops endpoint uses exactly this
     probe.
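
Task 1 could be sketched roughly as follows. The unit contents, the `OnCalendar` schedule, and the script path are assumptions, not copies of root's units; `install_secret_file` and `write_backup_units` are hypothetical helper names. The `HERMES_BACKUP_REPO` value is elided in this brief, so the sketch leaves it out; it would go in as a second `Environment=` line once known.

```shell
#!/usr/bin/env bash
# Sketch only: unit contents, schedule, and script path are assumptions.

# Read a secret from stdin into file $1, created under umask 077 with mode 600.
install_secret_file() {
  ( umask 077 && mkdir -p "$(dirname "$1")" && cat > "$1" && chmod 600 "$1" )
}

# Write the backup service + timer into unit dir $1; $2 is the backup script path.
write_backup_units() {
  local unit_dir="$1" script="$2"
  mkdir -p "$unit_dir"
  cat > "$unit_dir/uma-hermes-backup.service" <<EOF
[Unit]
Description=Bheem (Uma) Hermes persistent backup

[Service]
Type=oneshot
Environment=HERMES_HOME=/home/uma/.hermes
ExecStart=$script
EOF
  cat > "$unit_dir/uma-hermes-backup.timer" <<'EOF'
[Unit]
Description=Run the Bheem (Uma) Hermes backup on a schedule

[Timer]
# Assumed cadence; copy root's OnCalendar value instead.
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

# On the VM, as an operator with sudo (not executed by this sketch):
#   systemctl --user --machine=uma@.host daemon-reload
#   systemctl --user --machine=uma@.host enable --now uma-hermes-backup.timer
#   runuser -u uma -- systemctl --user is-active uma-hermes-backup.timer
```

If the helpers are run as root, the resulting files need `chown uma:uma` afterwards. User timers only fire without an active login session when lingering is enabled for `uma` (`loginctl enable-linger uma`); since the gateway already runs under user systemd, this is probably in place, but it is worth confirming.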

2. **Uma health watchdog.**
   - Mirror `scripts/hermes-health-watchdog.py` into a Uma-owned variant:
     same checks (gateway active, dashboard reachable, backup repo
     freshness, disk, memory), but reading from `/home/uma/.hermes` and
     posting to **Uma's** Telegram chat (separate token + chat ID from
     root's).
   - Telegram credentials: store in `~uma/.config/hermes/telegram` mode
     `600`. Format: two lines, `BOT_TOKEN=...` then `CHAT_ID=...`.
   - Silent on success — only post when something is wrong (mirror the
     root watchdog's behaviour). Verify by manually breaking a check
     (e.g. stop the gateway briefly) and confirming the alert lands in
     Uma's Telegram, not root's.
   - Install as `uma-hermes-health-watchdog.timer` user-systemd unit, run
     every 5 minutes.
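
The credential format and the silent-on-success rule above can be illustrated in shell. This is a sketch under stated assumptions, not a port of `scripts/hermes-health-watchdog.py`: the helper names and the `ALERT_CMD` override are hypothetical, and the alert text is invented. The only external API used is the Telegram Bot API `sendMessage` method.

```shell
#!/usr/bin/env bash
# Sketch only: the real checks live in scripts/hermes-health-watchdog.py.

# Parse the two-line credentials file without sourcing it as shell code.
load_telegram_creds() {
  local key value
  BOT_TOKEN="" CHAT_ID=""
  while IFS='=' read -r key value || [ -n "$key" ]; do
    case "$key" in
      BOT_TOKEN) BOT_TOKEN="$value" ;;
      CHAT_ID)   CHAT_ID="$value" ;;
    esac
  done < "$1"
  [ -n "$BOT_TOKEN" ] && [ -n "$CHAT_ID" ]
}

# Post one message through the Telegram Bot API sendMessage method.
send_alert() {
  curl -fsS --max-time 10 \
    --data-urlencode "chat_id=${CHAT_ID}" \
    --data-urlencode "text=$1" \
    "https://api.telegram.org/bot${BOT_TOKEN}/sendMessage" > /dev/null
}

# Run a check; stay silent on success, alert on failure.
# ALERT_CMD is a hypothetical override so the alert path can be exercised offline.
run_check() {
  local name="$1"; shift
  if ! "$@" > /dev/null 2>&1; then
    "${ALERT_CMD:-send_alert}" "Bheem watchdog: ${name} check failed"
    return 1
  fi
}

# Example wiring on the VM (not executed by this sketch):
#   load_telegram_creds ~uma/.config/hermes/telegram
#   run_check gateway systemctl --user is-active --quiet uma-hermes-gateway.service
```

For the 5-minute cadence, a user timer with `OnCalendar=*:0/5` is one way to express it in systemd calendar syntax.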

3. **First Uma restore rehearsal.**
   - Pick a temporary `HERMES_HOME=/tmp/uma-restore-rehearsal-<date>`.
   - Clone `umadev0931/uma_hostinger_hermes_vm` into it.
   - Verify the rehearsal Hermes starts cleanly (gateway probe + sessions
     load). Tear down the rehearsal dir.
   - Document the exact steps you ran in `docs/hermes-disaster-recovery.md`
     under a new "Bheem (Uma) restore" section — same depth as the
     existing root section.
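
The rehearsal steps above might look like this in shell. The `YYYY-MM-DD` date format, the HTTPS clone URL form, and the helper names are assumptions; the command that starts Hermes against the rehearsal directory is not specified in this brief, so it is left as a placeholder.

```shell
#!/usr/bin/env bash
# Sketch only: the date format and clone URL/auth method are assumptions.

# Build the rehearsal HERMES_HOME path for a given date (default: today).
rehearsal_home() {
  printf '/tmp/uma-restore-rehearsal-%s\n' "${1:-$(date +%F)}"
}

# Clone repo URL $1 into rehearsal dir $2, refusing to reuse an existing path.
clone_for_rehearsal() {
  if [ -e "$2" ]; then
    echo "refusing to overwrite existing $2" >&2
    return 1
  fi
  git clone --quiet "$1" "$2"
}

# Example run on the VM (not executed by this sketch):
#   export HERMES_HOME="$(rehearsal_home)"
#   trap 'rm -rf "$HERMES_HOME"' EXIT   # tear down even if a step fails
#   clone_for_rehearsal \
#     https://github.com/umadev0931/uma_hostinger_hermes_vm.git "$HERMES_HOME"
#   ...start Hermes against $HERMES_HOME and run the gateway/session probes...
```

Registering the teardown with `trap` before cloning means a failed probe still leaves `/tmp` clean, which keeps repeated drills from colliding.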

4. **Quarterly restore-drill reminder.**
   - Add a calendar reminder (or a Hermes cron entry on either instance)
     to repeat the restore rehearsal every 90 days. Document the cadence
     in `docs/hermes-operations.md`.
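
For the 90-day cadence, the next drill date can be computed from the last one. A minimal sketch, assuming GNU `date` (the `-d` option) is available on the VM; `next_drill_date` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: relies on GNU date (-d), as found on typical Linux VMs.

# Print the date 90 days after the drill date given as $1 (YYYY-MM-DD).
next_drill_date() {
  date -u -d "$1 +90 days" +%F
}
```

For example, `next_drill_date 2025-01-01` prints `2025-04-01`, which is the value to put in the calendar reminder and in `docs/hermes-operations.md`.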

5. **Confirm the dashboard agrees.**
   - Hit `GET /api/hermes/ops` (admin token, via Tailscale or SSH tunnel).
     The `instances[].backup.timer.active` for Bheem should be `true`,
     `instances[].backup.repo.status` should be `up`, and
     `instances[].google.workspaceToken` should be `true` (if Google
     workspace integration is part of Bheem's scope — confirm with the
     founder before scoping that in).
   - Hit `GET /api/hermes/telemetry/bheem`. `backupHistory.status` should
     be `up` (the new Uma backup repo is readable), and `watchdog.status`
     should be `up` (the new watchdog log exists and parses).
   - Open the Mission Control dashboard. The "Per-instance roll-up"
     section should show Bheem with no warnings; the ops panel should
     read "Healthy instances 2/2".
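
The ops-endpoint check above can be scripted with `jq`. This is a sketch under assumptions: the brief does not say how an instance is identified in the response, so the `id` field is hypothetical, as are the `HERMES_DASHBOARD_URL` and `HERMES_ADMIN_TOKEN` variables and the bearer-token header. The Google token field is left out because its scope is still to be confirmed.

```shell
#!/usr/bin/env bash
# Sketch only: the `id` field, base URL variable, and auth header are assumptions.

# Read an /api/hermes/ops response on stdin; succeed only if instance $1
# has an active backup timer and a backup repo with status "up".
ops_instance_healthy() {
  jq -e --arg id "$1" '
    .instances[]
    | select(.id == $id)
    | (.backup.timer.active == true) and (.backup.repo.status == "up")
  ' > /dev/null
}

# Example run over Tailscale or an SSH tunnel (not executed by this sketch):
#   curl -fsS -H "Authorization: Bearer $HERMES_ADMIN_TOKEN" \
#     "$HERMES_DASHBOARD_URL/api/hermes/ops" | ops_instance_healthy bheem
```

With `jq -e`, a missing instance produces no output and therefore a non-zero exit, so a typo in the instance name fails the check rather than passing silently.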

REPORTING:
When finished, report (commit-style summary):
- Repo URL of the new Uma backup repo + sample commit list.
- Paths of the new systemd unit files.
- Output of `runuser -u uma -- systemctl --user list-timers`.
- Output of `GET /api/hermes/ops` (sanitized).
- Output of `GET /api/hermes/telemetry/bheem` (sanitized).
- A summary diff of `docs/hermes-disaster-recovery.md` and
  `docs/hermes-operations.md`.

DEFINITION OF DONE:
- All Bheem-only warnings closed in `getHermesOpsSnapshot()`.
- Telemetry endpoint reports `up` for backup-history + watchdog on bheem.
- Restore drill is documented and the next-drill reminder is scheduled.
- Operator (you) signs off in the corresponding roadmap checkboxes.