bytelyst-devops-tools/docs/prompts/phase4-bheem-uma-parity.md
Hermes VM efdf41f2bb feat(dashboard): Phase 7 — gate /hermes/ops on requireAdmin + Phase 4 brief
Two threads, one commit because they're both about closing dashboard-
side roadmap items that don't need their own slice.

Phase 7 — auth coverage on hermes routes:
  - `/api/hermes/ops` was the last unauthenticated Hermes endpoint —
    despite revealing instance / gateway / Tailscale-IP / backup-repo /
    warnings state. Now gated on `requireAdmin`, matching the new
    `/api/hermes/telemetry/:instance` from the previous slice and
    every other privileged route in this backend.
  - Privilege-surface table in `dashboard/DEPLOYMENT.md` updated to
    show `requireAdmin` for both Hermes routes; the previous
    "no auth, read-only ops snapshot" carve-out is gone.
  - Roadmap Phase 7 ticks for "require auth on hermes routes" + "keep
    hermes data private-only" with verification notes.

Phase 4 — Bheem/Uma parity (delegation brief):
  - Phase 4 is **VM ops, not codebase work** — it requires sudo on the
    Hostinger VM, Uma-owned GitHub credentials, and Telegram bot
    tokens. None of it is editable in this repo. Wrote
    `docs/prompts/phase4-bheem-uma-parity.md` as a self-contained
    delegation brief covering: Uma persistent-backup repo + timer,
    Uma health watchdog, first restore rehearsal, quarterly drill
    reminder, and the dashboard-side verification (the /hermes/ops +
    /hermes/telemetry/bheem outputs that confirm the gap is closed).
  - Phase 4 section header in the roadmap now points at the brief
    and explains why the checkboxes stay open in this repo.

Verified: backend 57/57 unit tests, web 7/7 E2E (Playwright
mocks bypass requireAdmin since they fulfill before the request
reaches Fastify; real auth'd users get the same flow as every other
admin route). Lint 0 errors, build green.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-30 07:59:48 +00:00


Delegation Brief — Phase 4: Bheem/Uma parity

Self-contained task brief for a VM-side agent (Hermes delegate_task, a manual ops session, or a fresh remote Devin session that has SSH/console access to the Hostinger VM). This is VM operations work, not codebase work: apart from the documentation updates called for in tasks 3 and 4, none of the steps below involve editing files in this repo.

Related: docs/hermes_dashboard_v2_roadmap.md (Phase 4), docs/hermes-operations.md, docs/hermes-disaster-recovery.md, scripts/hermes-health-watchdog.py.


ROLE: Operator with sudo on the Hostinger VM and Telegram + Uma GitHub access.

OBJECTIVE: Bring the Bheem (Uma-user) Hermes instance up to parity with Vijay (root) so the dashboard's getHermesOpsSnapshot() ops-panel stops surfacing Bheem-only warnings (backup-timer-inactive, repo-not-readable, google-token-missing). When this brief is done, "Healthy instances" should read 2/2 and the per-instance roll-up cards on /hermes should show Bheem green across the board.

CONTEXT (read first):

  • VM: bytelyst@hostinger-vm (Tailscale only — no public ingress).
  • Two Hermes instances colocated:
    • Vijay: root user, /root/.hermes, gateway = hermes-gateway.service, backup timer = hermes-root-backup.timer, backup repo = bytelyst/bytelyst_hostinger_hermes_vm on GitHub, watchdog = scripts/hermes-health-watchdog.py running under root systemd, alerts to root's Telegram chat.
    • Bheem: uma user, /home/uma/.hermes, gateway = uma-hermes-gateway.service (user systemd), no backup timer yet, no persistent backup repo, no watchdog. This is the gap.
  • Decision #5 (in the v2 roadmap): Bheem self-pushes its own backup with a Uma-owned, repo-scoped GitHub PAT. Root must NOT push Uma's backup. Each instance owns its own credentials.

GUARDRAILS:

  • Tailscale-only access; never expose any new port publicly.
  • Sanitize before commit/push: state.db, SQLite WAL/SHM, secrets, OAuth tokens, .env files must all be gitignored in the persistent-backup repo. Use the same allowlist/denylist that root's backup uses (see scripts/hermes-persistent-backup.sh or equivalent).
  • Don't commit credentials anywhere.
  • Mirror root's design — don't invent a new pattern.
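
The sanitization guardrail can be spot-checked before the first push. The sketch below is illustrative only: the pattern list is a hypothetical stand-in for the allowlist/denylist that root's backup script actually uses, and find_sensitive is a name invented here. It flags any candidate path whose file name looks like a database, WAL/SHM sidecar, .env file, token, or secret.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# ASSUMPTION: hypothetical denylist for illustration. The authoritative list
# is whatever root's persistent-backup script already applies.
DENY_PATTERNS = [
    "state.db",   # Hermes state database
    "*.db-wal",   # SQLite write-ahead log
    "*.db-shm",   # SQLite shared-memory file
    ".env",       # environment files
    ".env.*",
    "*token*",    # OAuth tokens, PATs, bot tokens
    "*secret*",
]


def find_sensitive(paths):
    """Return the paths whose file name matches any denylist pattern."""
    flagged = []
    for path in paths:
        name = PurePosixPath(path).name.lower()
        if any(fnmatch(name, pattern) for pattern in DENY_PATTERNS):
            flagged.append(path)
    return flagged


if __name__ == "__main__":
    import sys

    # Read one path per line from stdin and exit non-zero if anything is flagged.
    hits = find_sensitive(line.strip() for line in sys.stdin if line.strip())
    for hit in hits:
        print(f"refusing to back up: {hit}")
    sys.exit(1 if hits else 0)
```

One way to use it is to pipe the output of git ls-files from the backup working tree into the script and treat a non-zero exit as a reason to stop before pushing.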

TASKS (in order):

  1. Uma persistent-backup repo + timer.

    • Create umadev0931/uma_hostinger_hermes_vm on GitHub (private).
    • Generate a fine-grained PAT scoped to that repo only (Contents: rw, Metadata: ro). Store it in ~uma/.config/hermes/github_token mode 600, owned by uma:uma.
    • Mirror scripts/hermes-persistent-backup.sh into a Uma-owned variant (could be the same script with HERMES_HOME=/home/uma/.hermes HERMES_BACKUP_REPO=...). Run it once manually to populate the repo and confirm sanitization.
    • Install uma-hermes-backup.service + uma-hermes-backup.timer as user systemd units (~uma/.config/systemd/user/). Enable the timer with systemctl --user --machine=uma@.host enable --now uma-hermes-backup.timer.
    • Verify: runuser -u uma -- systemctl --user is-active uma-hermes-backup.timer returns active. The dashboard's hermes-ops endpoint uses exactly this probe.
  2. Uma health watchdog.

    • Mirror scripts/hermes-health-watchdog.py into a Uma-owned variant: same checks (gateway active, dashboard reachable, backup repo freshness, disk, memory), but reading from /home/uma/.hermes and posting to Uma's Telegram chat (separate token + chat ID from root's).
    • Telegram credentials: store in ~uma/.config/hermes/telegram mode 600. Format: two lines, BOT_TOKEN=... then CHAT_ID=....
    • Silent on success — only post when something is wrong (mirror the root watchdog's behaviour). Verify by manually breaking a check (e.g. stop the gateway briefly) and confirming the alert lands in Uma's Telegram, not root's.
    • Install as uma-hermes-health-watchdog.timer user-systemd unit, run every 5 minutes.
  3. First Uma restore rehearsal.

    • Pick a temporary HERMES_HOME=/tmp/uma-restore-rehearsal-<date>.
    • Clone umadev0931/uma_hostinger_hermes_vm into it.
    • Verify the rehearsal Hermes starts cleanly (gateway probe + sessions load). Tear down the rehearsal dir.
    • Document the exact steps you ran in docs/hermes-disaster-recovery.md under a new "Bheem (Uma) restore" section — same depth as the existing root section.
  4. Quarterly restore-drill reminder.

    • Add a calendar reminder (or a Hermes cron entry on either instance) to repeat the restore rehearsal every 90 days. Document the cadence in docs/hermes-operations.md.
  5. Confirm the dashboard agrees.

    • Hit GET /api/hermes/ops (admin token, via Tailscale or SSH tunnel). The instances[].backup.timer.active for Bheem should be true, instances[].backup.repo.status should be up, and instances[].google.workspaceToken should be true (if Google workspace integration is part of Bheem's scope — confirm with the founder before scoping that in).
    • Hit GET /api/hermes/telemetry/bheem. backupHistory.status should be up (the new Uma backup repo is readable), watchdog.status should be up (the new watchdog log exists and parses).
    • Open the Mission Control dashboard. The "Per-instance roll-up" section should show Bheem with no warnings; the ops panel should read "Healthy instances 2/2".
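
The checks in task 5 can be scripted against saved JSON responses. This is a minimal sketch, not the dashboard's own logic: it assumes the field paths quoted above (backup.timer.active, backup.repo.status, google.workspaceToken, backupHistory.status, watchdog.status) and further assumes, hypothetically, that each entry in instances[] carries a "name" field equal to "bheem". Adjust the identifier to whatever the real payload uses.

```python
def _get(obj, dotted, default=None):
    """Walk a dotted path through nested dicts; return default if a key is missing."""
    for key in dotted.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return obj


def check_bheem(ops, telemetry, require_google=False):
    """Return a list of failure messages; an empty list means the checks passed."""
    # ASSUMPTION: instances are identified by a "name" field (hypothetical).
    instances = ops.get("instances", [])
    bheem = next((i for i in instances if i.get("name") == "bheem"), None)
    if bheem is None:
        return ["bheem instance not found in ops snapshot"]

    failures = []
    if _get(bheem, "backup.timer.active") is not True:
        failures.append("backup.timer.active is not true")
    if _get(bheem, "backup.repo.status") != "up":
        failures.append("backup.repo.status is not up")
    # Only enforced when Google workspace integration is in Bheem's scope.
    if require_google and _get(bheem, "google.workspaceToken") is not True:
        failures.append("google.workspaceToken is not true")

    for field in ("backupHistory.status", "watchdog.status"):
        if _get(telemetry, field) != "up":
            failures.append(f"telemetry {field} is not up")
    return failures
```

Pass the parsed body of GET /api/hermes/ops as ops and the parsed body of GET /api/hermes/telemetry/bheem as telemetry. The require_google flag stays off until the founder confirms that integration is in scope.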

REPORTING: When finished, report (commit-style summary):

  • Repo URL of the new Uma backup repo + sample commit list.
  • Paths of the new systemd unit files.
  • Output of runuser -u uma -- systemctl --user list-timers.
  • Output of GET /api/hermes/ops (sanitized).
  • Output of GET /api/hermes/telemetry/bheem (sanitized).
  • A summary diff of docs/hermes-disaster-recovery.md and docs/hermes-operations.md.
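
For the "(sanitized)" outputs above, a small redaction pass can help before pasting JSON into a report. The sketch below is one possible approach under stated assumptions: the key-name pattern and the placeholder strings are choices made here, not project conventions. It masks string values under secret-looking keys and replaces addresses in Tailscale's 100.64.0.0/10 range, while leaving booleans such as google.workspaceToken untouched.

```python
import re

# ASSUMPTION: key names that indicate a secret value (illustrative, not exhaustive).
SENSITIVE_KEY = re.compile(r"token|secret|password|chat_?id", re.IGNORECASE)

# Tailscale assigns IPv4 addresses from the 100.64.0.0/10 range.
TAILSCALE_IP = re.compile(
    r"\b100\.(?:6[4-9]|[7-9]\d|1[01]\d|12[0-7])\.\d{1,3}\.\d{1,3}\b"
)


def redact(value, key=""):
    """Recursively copy a JSON-like structure with sensitive strings masked."""
    if isinstance(value, dict):
        return {k: redact(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(item, key) for item in value]
    if isinstance(value, str):
        if SENSITIVE_KEY.search(key):
            return "[REDACTED]"
        return TAILSCALE_IP.sub("[TAILSCALE-IP]", value)
    # Numbers, booleans, and None pass through unchanged.
    return value
```

The function returns a new structure and does not mutate its input, so the raw response can still be inspected locally. A manual read-through of the result is still advisable, since a pattern-based pass can miss secrets stored under unexpected keys.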

DEFINITION OF DONE:

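  • All Bheem-only warnings closed in getHermesOpsSnapshot() (the objective names backup-timer-inactive, repo-not-readable, and google-token-missing; the last applies only if Google workspace integration is in scope).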
  • Telemetry endpoint reports up for backup-history + watchdog on bheem.
  • Restore drill is documented and the next-drill reminder is scheduled.
  • Operator (you) signs off in the corresponding roadmap checkboxes.
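
To make the 90-day drill cadence concrete when scheduling the reminder, the date arithmetic can be sketched as follows. The function names are hypothetical, and the interval is the 90 days stated in task 4.

```python
from datetime import date, timedelta

DRILL_INTERVAL_DAYS = 90  # cadence stated in task 4


def next_drill(last_drill: date) -> date:
    """Return the date on which the next restore rehearsal is due."""
    return last_drill + timedelta(days=DRILL_INTERVAL_DAYS)


def is_overdue(last_drill: date, today: date) -> bool:
    """True once today is past the due date of the next rehearsal."""
    return today > next_drill(last_drill)
```

For example, a rehearsal completed on 2026-01-01 gives a next due date of 2026-04-01.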