From efdf41f2bb57766f2c0505e3c5f5efe43a8f61a5 Mon Sep 17 00:00:00 2001
From: Hermes VM
Date: Sat, 30 May 2026 07:59:36 +0000
Subject: [PATCH] =?UTF-8?q?feat(dashboard):=20Phase=207=20=E2=80=94=20gate?=
 =?UTF-8?q?=20/hermes/ops=20on=20requireAdmin=20+=20Phase=204=20brief?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two threads, one commit: both close dashboard-side roadmap items that
don't need a slice of their own.

Phase 7 — auth coverage on hermes routes:

- `/api/hermes/ops` was the last unauthenticated Hermes endpoint, even
  though it exposes instance, gateway, Tailscale IP, backup-repo, and
  warnings state. It is now gated on `requireAdmin`, matching the new
  `/api/hermes/telemetry/:instance` from the previous slice and every
  other privileged route in this backend.
- The privilege-surface table in `dashboard/DEPLOYMENT.md` now shows
  `requireAdmin` for both Hermes routes; the previous "no auth,
  read-only ops snapshot" carve-out is gone.
- Roadmap Phase 7: ticked "require auth on hermes routes" and "keep
  hermes data private-only", each with verification notes.

Phase 4 — Bheem/Uma parity (delegation brief):

- Phase 4 is **VM ops, not codebase work**: it requires sudo on the
  Hostinger VM, Uma-owned GitHub credentials, and Telegram bot tokens,
  none of which is editable in this repo. Wrote
  `docs/prompts/phase4-bheem-uma-parity.md` as a self-contained
  delegation brief covering the Uma persistent-backup repo + timer, the
  Uma health watchdog, the first restore rehearsal, a quarterly drill
  reminder, and the dashboard-side verification (the `/hermes/ops` and
  `/hermes/telemetry/bheem` outputs that confirm the gap is closed).
- The Phase 4 section header in the roadmap now points at the brief and
  explains why its checkboxes stay open in this repo.
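
The Phase 7 gate relies on Fastify running a route's `preHandler` hook
before its handler and replying with the error, without calling the
handler, when an async hook throws. The sketch below models that
ordering with no dependencies. `FakeRequest`, `HttpError`,
`requireAdminSketch`, `runRoute`, and the `role` field are all
hypothetical stand-ins, not the real `lib/auth.ts` contract, which this
patch does not show. The 401/403 split is also an assumption.

```typescript
// Hypothetical model of the Phase 7 gate. None of these names are the
// real lib/auth.ts or Fastify APIs; they only illustrate the ordering.

interface FakeRequest {
  headers: Record<string, string | undefined>;
  // Assumption: token verification already happened and produced a role.
  user?: { role: 'admin' | 'member' };
}

class HttpError extends Error {
  readonly statusCode: number;
  constructor(statusCode: number, message: string) {
    super(message);
    this.statusCode = statusCode;
  }
}

// Stand-in for requireAdmin: resolves for admins, throws otherwise.
async function requireAdminSketch(req: FakeRequest): Promise<void> {
  if (!req.headers['authorization']) {
    throw new HttpError(401, 'missing bearer token');
  }
  if (req.user?.role !== 'admin') {
    throw new HttpError(403, 'admin role required');
  }
}

type Handler = (req: FakeRequest) => Promise<unknown>;

// Reads a numeric statusCode off a thrown value, the property Fastify
// uses when turning an error into a reply.
function statusOf(err: unknown): number | undefined {
  const code = (err as { statusCode?: unknown } | null)?.statusCode;
  return typeof code === 'number' ? code : undefined;
}

// Runs the preHandler first; if it throws, the handler body never runs
// and the error becomes the reply, roughly as Fastify does for hooks.
async function runRoute(
  req: FakeRequest,
  preHandler: (req: FakeRequest) => Promise<void>,
  handler: Handler,
): Promise<{ statusCode: number; body: unknown }> {
  try {
    await preHandler(req);
    return { statusCode: 200, body: await handler(req) };
  } catch (err) {
    const statusCode = statusOf(err);
    if (statusCode === undefined) throw err;
    return { statusCode, body: { error: (err as Error).message } };
  }
}
```

In this model an anonymous caller gets 401 and a non-admin gets 403.
Only an admin reaches the snapshot handler, which is the intent of
moving `/hermes/ops` behind `requireAdmin`.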
Verified: backend 57/57 unit tests ✅, web 7/7 E2E ✅ (Playwright mocks
bypass requireAdmin since they fulfill before the request reaches
Fastify; real auth'd users get the same flow as every other admin
route). Lint 0 errors, build green.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
---
 dashboard/DEPLOYMENT.md                      |   3 +-
 .../backend/src/modules/hermes-ops/routes.ts |   9 +-
 docs/hermes_dashboard_v2_roadmap.md          |  10 +-
 docs/prompts/phase4-bheem-uma-parity.md      | 122 ++++++++++++++++++
 4 files changed, 138 insertions(+), 6 deletions(-)
 create mode 100644 docs/prompts/phase4-bheem-uma-parity.md

diff --git a/dashboard/DEPLOYMENT.md b/dashboard/DEPLOYMENT.md
index b7ca70e..23549fd 100644
--- a/dashboard/DEPLOYMENT.md
+++ b/dashboard/DEPLOYMENT.md
@@ -383,7 +383,8 @@ expected via Tailscale or an SSH tunnel. Any reverse proxy in front of it
 | `GET /vm/ollama/models`, `DELETE /vm/ollama/models/:name` | `vm/repository.ts` | HTTP-only (talks to host Ollama via `host-gateway`). No shell-out. | `requireAdmin` |
 | `POST /code-quality/check` | `code-quality/repository.ts` | `npm run typecheck`, `npm run lint`, `npm run build`, `npm run test:run` in the request-supplied `projectPath`. | `requireAdmin` *(added concurrently with this doc; previously unauthenticated — see the Phase 5 P1 commit)* |
 | `POST /deployments/trigger/:serviceId` | `deployments/orchestrator.ts` | `bash ` from the registered service registry (paths are stored at create-time, not request-time). | `requireAdmin` |
-| `/hermes/ops` (snapshot) | `hermes-ops/repository.ts` | Read-only probes: `systemctl is-active/is-enabled`, `git status`, `du -sh`, `ps`, `tailscale ip`, `runuser -u uma -- systemctl --user ...`. No state-changing commands. | none (read-only ops snapshot) |
+| `/hermes/ops` (snapshot) | `hermes-ops/repository.ts` | Read-only probes: `systemctl is-active/is-enabled`, `git status`, `du -sh`, `ps`, `tailscale ip`, `runuser -u uma -- systemctl --user ...`. No state-changing commands. | `requireAdmin` *(Phase 7 — private-only)* |
+| `/hermes/telemetry/:instance` | `hermes-telemetry/repository.ts` | Read-only: `runuser -u -- hermes sessions/cron/memory/skills list --json`, `git -C log`, tail of the watchdog log. No state-changing commands. | `requireAdmin` |
 
 ### Blast radius if an admin token is leaked
 
diff --git a/dashboard/backend/src/modules/hermes-ops/routes.ts b/dashboard/backend/src/modules/hermes-ops/routes.ts
index 9c5b536..022bebc 100644
--- a/dashboard/backend/src/modules/hermes-ops/routes.ts
+++ b/dashboard/backend/src/modules/hermes-ops/routes.ts
@@ -1,9 +1,16 @@
 import type { FastifyInstance } from 'fastify';
 import { getHermesOpsSnapshot } from './repository.js';
 import { HermesOpsSnapshotSchema } from './types.js';
+import { requireAdmin } from '../../lib/auth.js';
 
 export async function hermesOpsRoutes(fastify: FastifyInstance) {
-  fastify.get('/hermes/ops', async (req, reply) => {
+  // Admin-only. Phase 7 — every Hermes route is private-only and requires
+  // a valid platform admin token. The snapshot reveals which instances /
+  // gateways are up, the Tailscale IP, the persistent-backup repo paths,
+  // and recent ops warnings; not a public endpoint.
+  fastify.get('/hermes/ops', {
+    preHandler: async (req) => requireAdmin(req),
+  }, async (req, reply) => {
     try {
       const snapshot = await getHermesOpsSnapshot();
       // Validate our own output against the stable contract before sending, so a
diff --git a/docs/hermes_dashboard_v2_roadmap.md b/docs/hermes_dashboard_v2_roadmap.md
index 1bfe770..21748cb 100644
--- a/docs/hermes_dashboard_v2_roadmap.md
+++ b/docs/hermes_dashboard_v2_roadmap.md
@@ -112,6 +112,8 @@ Define the ingestion contract first, then convert panes. Keep any pane with no r
 
 This is the biggest operational asymmetry and the reason half the ops-panel warnings are Bheem-only.
 
+> **VM ops, not codebase work.** This phase requires sudo on the Hostinger VM, Uma-owned GitHub credentials, and Telegram bot tokens — none of it is editable in this repo. The full delegation brief is in [`docs/prompts/phase4-bheem-uma-parity.md`](./prompts/phase4-bheem-uma-parity.md). When the brief's Definition-of-Done is met, tick the boxes below and the summary line at the bottom of this file.
+
 - [ ] Stand up a **Uma persistent backup repo + `uma-hermes-backup.timer`** mirroring the root design (sanitized `hermes_persistent_backup/`, secrets and `state.db` excluded), pushing to `umadev0931/uma_hostinger_hermes_vm` **with a Uma-owned, repo-scoped token (Bheem self-pushes; root no longer pushes Uma's backup — Decision #5)**.
 - [ ] Install a **Uma health watchdog** (mirror `scripts/hermes-health-watchdog.py`), silent-on-success, alerting Uma's Telegram.
 - [ ] Run the **first Uma restore rehearsal** into a temporary `HERMES_HOME`; document in `docs/hermes-operations.md` / `docs/hermes-disaster-recovery.md`.
@@ -139,10 +141,10 @@ This is the biggest operational asymmetry and the reason half the ops-panel warn
 
 ## Phase 7 — Security & access (G8)
 
-- [ ] Require authentication on the DevOps dashboard's hermes routes/endpoints (reuse platform-service auth already used elsewhere).
-- [ ] Decide and document `security.redact_secrets` and `privacy.redact_pii` for gateway sessions (per instance).
-- [ ] Finish the GitHub/Gitea **least-privilege token audit** (root currently pushes both repos) and rotate any migrated/exposed credentials — completed naturally by Decision #5 (Bheem self-pushes with its own scoped token).
-- [ ] Keep all hermes data private-only; never expose the `hermes-ops` snapshot or task data on a public route.
+- [x] Require authentication on the DevOps dashboard's hermes routes/endpoints (reuse platform-service auth already used elsewhere). *(Both `/api/hermes/ops` and the new `/api/hermes/telemetry/:instance` now gate on `requireAdmin`. Privilege-surface table in `dashboard/DEPLOYMENT.md` updated to match. The previous "read-only ops snapshot, no auth" carve-out is gone — all Hermes routes are admin-only.)*
+- [ ] Decide and document `security.redact_secrets` and `privacy.redact_pii` for gateway sessions (per instance). *(Deferred — needs a founder decision on PII handling for session content; not a code-only change.)*
+- [ ] Finish the GitHub/Gitea **least-privilege token audit** (root currently pushes both repos) and rotate any migrated/exposed credentials — completed naturally by Decision #5 (Bheem self-pushes with its own scoped token). *(Resolves naturally when Phase 4 ships — see the Phase 4 delegation brief.)*
+- [x] Keep all hermes data private-only; never expose the `hermes-ops` snapshot or task data on a public route. *(Verified: no Caddy/public route added; the dashboard is bound to `127.0.0.1` and reached via Tailscale or SSH tunnel only — see `dashboard/DEPLOYMENT.md` "Ports — quick reference" + "Privilege Surface" sections. With this commit's `requireAdmin` change, even an attacker with internal network access still needs a valid admin JWT to read the ops snapshot.)*
 
 ## Phase 8 — Notifications & Telegram loop (G9)
 
diff --git a/docs/prompts/phase4-bheem-uma-parity.md b/docs/prompts/phase4-bheem-uma-parity.md
new file mode 100644
index 0000000..e29dfbb
--- /dev/null
+++ b/docs/prompts/phase4-bheem-uma-parity.md
@@ -0,0 +1,122 @@
+# Delegation Brief — Phase 4: Bheem/Uma parity
+
+> Self-contained task brief for a VM-side agent (Hermes `delegate_task`, a
+> manual ops session, or a fresh remote Devin session that has SSH/console
+> access to the Hostinger VM). This is **VM operations work**, not codebase
+> work — none of the steps below are made by editing files in this repo.
+>
+> Related: `docs/hermes_dashboard_v2_roadmap.md` (Phase 4),
+> `docs/hermes-operations.md`, `docs/hermes-disaster-recovery.md`,
+> `scripts/hermes-health-watchdog.py`.
+
+---
+
+ROLE: Operator with sudo on the Hostinger VM and Telegram + Uma GitHub access.
+
+OBJECTIVE: Bring the Bheem (Uma-user) Hermes instance up to parity with
+Vijay (root) so the dashboard's `getHermesOpsSnapshot()` ops-panel stops
+surfacing Bheem-only warnings (backup-timer-inactive, repo-not-readable,
+google-token-missing). When this brief is done, "Healthy instances" should
+read `2/2` and the per-instance roll-up cards on `/hermes` should show Bheem
+green across the board.
+
+CONTEXT (read first):
+- VM: `bytelyst@hostinger-vm` (Tailscale only — no public ingress).
+- Two Hermes instances colocated:
+  - **Vijay**: root user, `/root/.hermes`, gateway = `hermes-gateway.service`,
+    backup timer = `hermes-root-backup.timer`, backup repo =
+    `bytelyst/bytelyst_hostinger_hermes_vm` on GitHub, watchdog =
+    `scripts/hermes-health-watchdog.py` running under root systemd, alerts
+    to root's Telegram chat.
+  - **Bheem**: `uma` user, `/home/uma/.hermes`, gateway =
+    `uma-hermes-gateway.service` (user systemd), no backup timer yet, no
+    persistent backup repo, no watchdog. This is the gap.
+- Decision #5 (in the v2 roadmap): Bheem **self-pushes** its own backup
+  with a Uma-owned, repo-scoped GitHub PAT. Root must NOT push Uma's
+  backup. Each instance owns its own credentials.
+
+GUARDRAILS:
+- Tailscale-only access; never expose any new port publicly.
+- Sanitize before commit/push: `state.db`, SQLite WAL/SHM, secrets, OAuth
+  tokens, `.env` files must all be gitignored in the persistent-backup
+  repo. Use the same allowlist/denylist that root's backup uses (see
+  `scripts/hermes-persistent-backup.sh` or equivalent).
+- Don't commit credentials anywhere.
+- Mirror root's design — don't invent a new pattern.
+
+TASKS (in order):
+
+1. **Uma persistent-backup repo + timer.**
+   - Create `umadev0931/uma_hostinger_hermes_vm` on GitHub (private).
+   - Generate a fine-grained PAT scoped to that repo only (`Contents: rw`,
+     `Metadata: ro`). Store it in `~uma/.config/hermes/github_token` mode
+     `600`, owned by `uma:uma`.
+   - Mirror `scripts/hermes-persistent-backup.sh` into a Uma-owned variant
+     (could be the same script with `HERMES_HOME=/home/uma/.hermes
+     HERMES_BACKUP_REPO=...`). Run it once manually to populate the repo
+     and confirm sanitization.
+   - Install `uma-hermes-backup.service` + `uma-hermes-backup.timer` as
+     **user** systemd units (`~uma/.config/systemd/user/`). Enable with
+     `systemctl --user --machine=uma@.host enable --now`.
+   - Verify: `runuser -u uma -- systemctl --user is-active uma-hermes-backup.timer`
+     returns `active`. The dashboard's hermes-ops endpoint uses exactly
+     this probe.
+
+2. **Uma health watchdog.**
+   - Mirror `scripts/hermes-health-watchdog.py` into a Uma-owned variant:
+     same checks (gateway active, dashboard reachable, backup repo
+     freshness, disk, memory), but reading from `/home/uma/.hermes` and
+     posting to **Uma's** Telegram chat (separate token + chat ID from
+     root's).
+   - Telegram credentials: store in `~uma/.config/hermes/telegram` mode
+     `600`. Format: two lines, `BOT_TOKEN=...` then `CHAT_ID=...`.
+   - Silent on success — only post when something is wrong (mirror the
+     root watchdog's behaviour). Verify by manually breaking a check
+     (e.g. stop the gateway briefly) and confirming the alert lands in
+     Uma's Telegram, not root's.
+   - Install as `uma-hermes-health-watchdog.timer` user-systemd unit, run
+     every 5 minutes.
+
+3. **First Uma restore rehearsal.**
+   - Pick a temporary `HERMES_HOME=/tmp/uma-restore-rehearsal-`.
+   - Clone `umadev0931/uma_hostinger_hermes_vm` into it.
+   - Verify the rehearsal Hermes starts cleanly (gateway probe + sessions
+     load). Tear down the rehearsal dir.
+   - Document the exact steps you ran in `docs/hermes-disaster-recovery.md`
+     under a new "Bheem (Uma) restore" section — same depth as the
+     existing root section.
+
+4. **Quarterly restore-drill reminder.**
+   - Add a calendar reminder (or a Hermes cron entry on either instance)
+     to repeat the restore rehearsal every 90 days. Document the cadence
+     in `docs/hermes-operations.md`.
+
+5. **Confirm the dashboard agrees.**
+   - Hit `GET /api/hermes/ops` (admin token, via Tailscale or SSH tunnel).
+     The `instances[].backup.timer.active` for Bheem should be `true`,
+     `instances[].backup.repo.status` should be `up`, and
+     `instances[].google.workspaceToken` should be `true` (if Google
+     workspace integration is part of Bheem's scope — confirm with the
+     founder before scoping that in).
+   - Hit `GET /api/hermes/telemetry/bheem`. `backupHistory.status` should
+     be `up` (the new Uma backup repo is readable), `watchdog.status`
+     should be `up` (the new watchdog log exists and parses).
+   - Open the Mission Control dashboard. The "Per-instance roll-up"
+     section should show Bheem with no warnings; the ops panel should
+     read "Healthy instances 2/2".
+
+REPORTING:
+When finished, report (commit-style summary):
+- Repo URL of the new Uma backup repo + sample commit list.
+- Paths of the new systemd unit files.
+- Output of `runuser -u uma -- systemctl --user list-timers`.
+- Output of `GET /api/hermes/ops` (sanitized).
+- Output of `GET /api/hermes/telemetry/bheem` (sanitized).
+- A summary diff of `docs/hermes-disaster-recovery.md` and
+  `docs/hermes-operations.md`.
+
+DEFINITION OF DONE:
+- All five Bheem-only warnings closed in `getHermesOpsSnapshot()`.
+- Telemetry endpoint reports `up` for backup-history + watchdog on bheem.
+- Restore drill is documented and the next-drill reminder is scheduled.
+- Operator (you) signs off in the corresponding roadmap checkboxes.
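
Step 5 of the brief and its Definition of Done come down to a few field checks against the two admin-only endpoints. The TypeScript sketch below assumes the JSON has already been fetched with an admin token. It models only the fields the brief names. The `name` field used to pick Bheem out of `instances[]` is hypothetical, and so is `checkBheemParity` itself, because the real `HermesOpsSnapshotSchema` is not part of this patch.

```typescript
// Hypothetical response shapes: only fields named in the brief are modelled,
// and `name` is an assumed way to find Bheem in `instances[]`.
interface OpsInstance {
  name: string;
  backup: { timer: { active: boolean }; repo: { status: string } };
  google?: { workspaceToken: boolean };
}

interface OpsSnapshot {
  instances: OpsInstance[];
}

interface TelemetrySnapshot {
  backupHistory: { status: string };
  watchdog: { status: string };
}

// Returns one message per failed check; an empty array means step 5 passes.
function checkBheemParity(
  ops: OpsSnapshot,
  telemetry: TelemetrySnapshot,
  requireGoogle: boolean,
): string[] {
  const bheem = ops.instances.find((i) => i.name.toLowerCase() === 'bheem');
  if (bheem === undefined) {
    return ['bheem missing from GET /api/hermes/ops'];
  }

  const failures: string[] = [];
  if (!bheem.backup.timer.active) {
    failures.push('ops: backup.timer.active is not true');
  }
  if (bheem.backup.repo.status !== 'up') {
    failures.push(`ops: backup.repo.status is "${bheem.backup.repo.status}"`);
  }
  // Google workspace is only in scope once the founder confirms it.
  if (requireGoogle && bheem.google?.workspaceToken !== true) {
    failures.push('ops: google.workspaceToken is not true');
  }
  if (telemetry.backupHistory.status !== 'up') {
    failures.push(`telemetry: backupHistory.status is "${telemetry.backupHistory.status}"`);
  }
  if (telemetry.watchdog.status !== 'up') {
    failures.push(`telemetry: watchdog.status is "${telemetry.watchdog.status}"`);
  }
  return failures;
}
```

Running it before and after the VM work gives a quick pass/fail. `requireGoogle` is an explicit flag because the brief says Google workspace scope still needs founder confirmation.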