Two threads, one commit, because both close dashboard-side roadmap
items that don't need their own slice.

Phase 7 (auth coverage on Hermes routes):
- `/api/hermes/ops` was the last unauthenticated Hermes endpoint, even
though it reveals instance, gateway, Tailscale-IP, backup-repo, and
warnings state. It is now gated on `requireAdmin`, matching the new
`/api/hermes/telemetry/:instance` route from the previous slice and
every other privileged route in this backend.
- Privilege-surface table in `dashboard/DEPLOYMENT.md` updated to
show `requireAdmin` for both Hermes routes; the previous
"no auth, read-only ops snapshot" carve-out is gone.
- Ticked the roadmap Phase 7 items "require auth on hermes routes" and
"keep hermes data private-only", with verification notes.

Phase 4 (Bheem/Uma parity, delegation brief):
- Phase 4 is **VM ops, not codebase work**: it requires sudo on the
Hostinger VM, Uma-owned GitHub credentials, and Telegram bot
tokens. None of it is editable in this repo. Wrote
`docs/prompts/phase4-bheem-uma-parity.md` as a self-contained
delegation brief covering the Uma persistent-backup repo + timer,
the Uma health watchdog, the first restore rehearsal, the quarterly
drill reminder, and the dashboard-side verification (the
`/api/hermes/ops` and `/api/hermes/telemetry/bheem` outputs that
confirm the gap is closed).
- Phase 4 section header in the roadmap now points at the brief
and explains why the checkboxes stay open in this repo.

Verified: backend 57/57 unit tests ✅, web 7/7 E2E ✅ (Playwright
mocks bypass `requireAdmin` because they fulfill the request before it
reaches Fastify; real authenticated users get the same flow as every
other admin route). Lint 0 errors, build green.

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
# Delegation Brief — Phase 4: Bheem/Uma parity

> Self-contained task brief for a VM-side agent (Hermes `delegate_task`, a
> manual ops session, or a fresh remote Devin session that has SSH/console
> access to the Hostinger VM). This is **VM operations work**, not codebase
> work: none of the steps below involve editing files in this repo.
>
> Related: `docs/hermes_dashboard_v2_roadmap.md` (Phase 4),
> `docs/hermes-operations.md`, `docs/hermes-disaster-recovery.md`,
> `scripts/hermes-health-watchdog.py`.

---

ROLE: Operator with sudo on the Hostinger VM and Telegram + Uma GitHub access.

OBJECTIVE: Bring the Bheem (Uma-user) Hermes instance up to parity with
Vijay (root) so the dashboard's `getHermesOpsSnapshot()` ops panel stops
surfacing Bheem-only warnings (backup-timer-inactive, repo-not-readable,
google-token-missing). When this brief is done, "Healthy instances" should
read `2/2` and the per-instance roll-up cards on `/hermes` should show Bheem
green across the board.

CONTEXT (read first):
- VM: `bytelyst@hostinger-vm` (Tailscale only; no public ingress).
- Two Hermes instances colocated:
  - **Vijay**: root user, `/root/.hermes`, gateway = `hermes-gateway.service`,
    backup timer = `hermes-root-backup.timer`, backup repo =
    `bytelyst/bytelyst_hostinger_hermes_vm` on GitHub, watchdog =
    `scripts/hermes-health-watchdog.py` running under root systemd, alerts
    to root's Telegram chat.
  - **Bheem**: `uma` user, `/home/uma/.hermes`, gateway =
    `uma-hermes-gateway.service` (user systemd), no backup timer yet, no
    persistent backup repo, no watchdog. This is the gap.
- Decision #5 (in the v2 roadmap): Bheem **self-pushes** its own backup
  with a Uma-owned, repo-scoped GitHub PAT. Root must NOT push Uma's
  backup. Each instance owns its own credentials.

GUARDRAILS:
- Tailscale-only access; never expose any new port publicly.
- Sanitize before commit/push: `state.db`, SQLite WAL/SHM, secrets, OAuth
  tokens, and `.env` files must all be gitignored in the persistent-backup
  repo. Use the same allowlist/denylist that root's backup uses (see
  `scripts/hermes-persistent-backup.sh` or equivalent).
- Don't commit credentials anywhere.
- Mirror root's design; don't invent a new pattern.

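The sanitization guardrail can be backed by a pre-push scan. The sketch below is illustrative only: `find_unsanitized` and `assert_sanitized` are hypothetical helper names, and the file patterns are inferred from the guardrail list rather than copied from root's backup script, whose allowlist/denylist remains authoritative.

```shell
# Hypothetical helper: list files in a backup staging dir that must never be pushed.
# Patterns are assumptions based on the guardrails (state.db, SQLite WAL/SHM,
# .env files, token files); they are not taken from root's backup script.
find_unsanitized() {
    staging_dir="$1"
    find "$staging_dir" -path "$staging_dir/.git" -prune -o -type f \( \
        -name 'state.db' -o \
        -name '*.db-wal' -o -name '*.db-shm' -o \
        -name '.env' -o -name '.env.*' -o -name '*.env' -o \
        -name '*token*' \
    \) -print
}

# Return non-zero (and print the offenders to stderr) if anything forbidden is staged.
assert_sanitized() {
    offenders="$(find_unsanitized "$1")"
    if [ -n "$offenders" ]; then
        printf 'refusing to push, unsanitized files found:\n%s\n' "$offenders" >&2
        return 1
    fi
    return 0
}
```

Calling `assert_sanitized` on the staging directory just before `git push` would abort the run if a denylisted file slipped through. The `*token*` pattern is deliberately broad and may need narrowing.
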
TASKS (in order):

1. **Uma persistent-backup repo + timer.**
   - Create `umadev0931/uma_hostinger_hermes_vm` on GitHub (private).
   - Generate a fine-grained PAT scoped to that repo only (`Contents: rw`,
     `Metadata: ro`). Store it in `~uma/.config/hermes/github_token`, mode
     `600`, owned by `uma:uma`.
   - Mirror `scripts/hermes-persistent-backup.sh` into a Uma-owned variant
     (could be the same script with
     `HERMES_HOME=/home/uma/.hermes HERMES_BACKUP_REPO=...`). Run it once
     manually to populate the repo and confirm sanitization.
   - Install `uma-hermes-backup.service` + `uma-hermes-backup.timer` as
     **user** systemd units (`~uma/.config/systemd/user/`). Enable with
     `systemctl --user --machine=uma@.host enable --now uma-hermes-backup.timer`.
   - Verify:
     `runuser -u uma -- systemctl --user is-active uma-hermes-backup.timer`
     returns `active`. The dashboard's hermes-ops endpoint uses exactly
     this probe.

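For the unit-installation step in task 1, the two user units could look roughly like the sketch below. `write_backup_units` is a hypothetical helper; the script path and the `HERMES_BACKUP_REPO` value are passed in rather than assumed, and `OnCalendar=daily` is a placeholder cadence that should be replaced with whatever `hermes-root-backup.timer` uses.

```shell
# Write uma-hermes-backup.service and uma-hermes-backup.timer into a user-unit dir.
# $1: unit directory (e.g. ~uma/.config/systemd/user)
# $2: absolute path to the Uma-owned backup script (location is up to the operator)
# $3: value for HERMES_BACKUP_REPO
write_backup_units() {
    unit_dir="$1"; script_path="$2"; backup_repo="$3"
    mkdir -p "$unit_dir"

    cat > "$unit_dir/uma-hermes-backup.service" <<EOF
[Unit]
Description=Hermes persistent backup (Bheem, uma user)

[Service]
Type=oneshot
Environment=HERMES_HOME=/home/uma/.hermes
Environment=HERMES_BACKUP_REPO=$backup_repo
ExecStart=$script_path
EOF

    cat > "$unit_dir/uma-hermes-backup.timer" <<EOF
[Unit]
Description=Schedule the Hermes persistent backup for uma

[Timer]
# Assumed cadence; align with hermes-root-backup.timer.
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

Run it as `uma`, then `systemctl --user daemon-reload` before enabling the timer. User timers only fire without an active login session when lingering is enabled (`loginctl enable-linger uma`); the existing user-level gateway suggests it already is, but that is worth confirming.
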
2. **Uma health watchdog.**
   - Mirror `scripts/hermes-health-watchdog.py` into a Uma-owned variant:
     same checks (gateway active, dashboard reachable, backup repo
     freshness, disk, memory), but reading from `/home/uma/.hermes` and
     posting to **Uma's** Telegram chat (separate token + chat ID from
     root's).
   - Telegram credentials: store in `~uma/.config/hermes/telegram`, mode
     `600`. Format: two lines, `BOT_TOKEN=...` then `CHAT_ID=...`.
   - Silent on success: only post when something is wrong (mirror the
     root watchdog's behaviour). Verify by manually breaking a check
     (e.g. stop the gateway briefly) and confirming the alert lands in
     Uma's Telegram, not root's.
   - Install as a `uma-hermes-health-watchdog.timer` user systemd unit
     that runs every 5 minutes.

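The credentials format and the silent-on-success rule from task 2 can be sketched as follows. This is not the logic of `scripts/hermes-health-watchdog.py` (which is Python); the helper names `read_cred`, `cred_mode_ok`, and `alert_if_failed` are hypothetical, and `stat -c` assumes GNU coreutils on the VM. The only external API used is Telegram's `sendMessage` Bot API method.

```shell
# Read the value of KEY from a two-line KEY=value credentials file.
# $1: credentials file, $2: key (BOT_TOKEN or CHAT_ID)
read_cred() {
    sed -n "s/^$2=//p" "$1" | head -n 1
}

# Succeed only if the file mode is exactly 600 (GNU stat syntax).
cred_mode_ok() {
    [ "$(stat -c '%a' "$1")" = "600" ]
}

# Post to Telegram only when at least one check failed; stay silent otherwise.
# $1: credentials file, $2: failure summary (empty string means all checks passed)
alert_if_failed() {
    cred_file="$1"; failures="$2"
    [ -z "$failures" ] && return 0
    cred_mode_ok "$cred_file" || { echo "bad mode on $cred_file" >&2; return 1; }
    token="$(read_cred "$cred_file" BOT_TOKEN)"
    chat_id="$(read_cred "$cred_file" CHAT_ID)"
    curl -fsS "https://api.telegram.org/bot${token}/sendMessage" \
        --data-urlencode "chat_id=${chat_id}" \
        --data-urlencode "text=Bheem watchdog: ${failures}" >/dev/null
}
```

Because the chat ID comes from Uma's own file, an alert raised here cannot land in root's chat unless the wrong credentials were stored, which is what the manual break-a-check verification is meant to catch.
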
3. **First Uma restore rehearsal.**
   - Pick a temporary `HERMES_HOME=/tmp/uma-restore-rehearsal-<date>`.
   - Clone `umadev0931/uma_hostinger_hermes_vm` into it.
   - Verify the rehearsal Hermes starts cleanly (gateway probe + sessions
     load). Tear down the rehearsal dir.
   - Document the exact steps you ran in `docs/hermes-disaster-recovery.md`
     under a new "Bheem (Uma) restore" section, at the same depth as the
     existing root section.

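The clone and teardown steps of the rehearsal might be wrapped as below. The helper names are hypothetical and the `YYYY-MM-DD` date suffix is an assumption (the brief only says `<date>`); starting the rehearsal Hermes and probing the gateway are left out because the brief does not specify those commands.

```shell
# Build the rehearsal HERMES_HOME path; the date format is an assumption.
rehearsal_home() {
    printf '/tmp/uma-restore-rehearsal-%s\n' "$(date +%F)"
}

# Clone the backup repo into a fresh rehearsal directory.
# $1: clone URL for umadev0931/uma_hostinger_hermes_vm, $2: target directory
start_rehearsal() {
    repo_url="$1"; target="$2"
    if [ -e "$target" ]; then
        echo "refusing to reuse existing $target" >&2
        return 1
    fi
    git clone --quiet "$repo_url" "$target"
}

# Remove the rehearsal dir, but only if it carries the expected prefix,
# so a typo can never delete the live /home/uma/.hermes.
teardown_rehearsal() {
    case "$1" in
        /tmp/uma-restore-rehearsal-*) rm -rf -- "$1" ;;
        *) echo "not a rehearsal dir: $1" >&2; return 1 ;;
    esac
}
```

Keeping the exact invocations in a script like this also makes it easier to paste the steps into the new "Bheem (Uma) restore" section afterwards.
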
4. **Quarterly restore-drill reminder.**
   - Add a calendar reminder (or a Hermes cron entry on either instance)
     to repeat the restore rehearsal every 90 days. Document the cadence
     in `docs/hermes-operations.md`.

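If the reminder is scripted rather than kept in a calendar, the 90-day cadence reduces to simple date arithmetic. A minimal sketch, assuming GNU `date` (its `-d` option) and hypothetical helper names; how the last drill date is recorded is left to the operator.

```shell
# Print the date 90 days after a given YYYY-MM-DD (GNU date only).
next_drill_date() {
    date -u -d "$1 +90 days" +%F
}

# Succeed if a drill is due: today ($2) is on or after last drill ($1) + 90 days.
drill_due() {
    due="$(next_drill_date "$1")"
    [ "$(date -u -d "$2" +%s)" -ge "$(date -u -d "$due" +%s)" ]
}
```

For example, a drill run on 2025-01-01 puts the next one on 2025-04-01.
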
5. **Confirm the dashboard agrees.**
   - Hit `GET /api/hermes/ops` (admin token, via Tailscale or SSH tunnel).
     For Bheem, `instances[].backup.timer.active` should be `true`,
     `instances[].backup.repo.status` should be `up`, and
     `instances[].google.workspaceToken` should be `true` (only if Google
     Workspace integration is part of Bheem's scope; confirm with the
     founder before scoping that in).
   - Hit `GET /api/hermes/telemetry/bheem`. `backupHistory.status` should
     be `up` (the new Uma backup repo is readable) and `watchdog.status`
     should be `up` (the new watchdog log exists and parses).
   - Open the Mission Control dashboard. The "Per-instance roll-up"
     section should show Bheem with no warnings; the ops panel should
     read "Healthy instances 2/2".

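The JSON checks in task 5 can be scripted against saved responses. The sketch below assumes `jq` is installed and, importantly, that each entry in `instances[]` is identified by a `name` field; that key is a guess, so adjust the selector to the real payload. The Google token check is omitted because it depends on the scoping decision.

```shell
# Check Bheem's entry in a saved /api/hermes/ops response.
# $1: JSON file, $2: instance name (defaults to "bheem").
# ASSUMPTION: instances carry a "name" field; the real key may differ.
check_bheem_ops() {
    jq -e --arg name "${2:-bheem}" '
        [ .instances[] | select(.name == $name) ] as $m
        | ($m | length) == 1
          and $m[0].backup.timer.active == true
          and $m[0].backup.repo.status == "up"
    ' "$1" >/dev/null
}

# Check a saved /api/hermes/telemetry/bheem response.
check_bheem_telemetry() {
    jq -e '.backupHistory.status == "up" and .watchdog.status == "up"' "$1" >/dev/null
}
```

Both functions exit non-zero when a field is missing or has the wrong value, so they can gate the sign-off step. Save each endpoint's response to a file first, using whatever admin-token mechanism the dashboard expects.
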
REPORTING:
When finished, report (commit-style summary):
- Repo URL of the new Uma backup repo + sample commit list.
- Paths of the new systemd unit files.
- Output of `runuser -u uma -- systemctl --user list-timers`.
- Output of `GET /api/hermes/ops` (sanitized).
- Output of `GET /api/hermes/telemetry/bheem` (sanitized).
- A summary diff of `docs/hermes-disaster-recovery.md` and
  `docs/hermes-operations.md`.

DEFINITION OF DONE:
- All five Bheem-only warnings closed in `getHermesOpsSnapshot()`.
- Telemetry endpoint reports `up` for backup-history + watchdog on bheem.
- Restore drill is documented and the next-drill reminder is scheduled.
- Operator (you) signs off in the corresponding roadmap checkboxes.