Hermes Mission Control v2 — Two-Instance Dashboard Roadmap

Date: 2026-05-30
Owner: ByteLyst / S (saravanakumardb)
Repo: learning_ai_devops_tools (GitHub remote: bytelyst-devops-tools)
Dashboard: dashboard/, a Next.js 16 web app (web/; container port 3000, published on Compose host port 3049) plus a Fastify 5 backend (backend/, port 4004)

What This Roadmap Is

The two existing roadmaps are effectively complete for their original scope:

docs/hermes_dashboard_roadmap.md — built the 7-pane Hermes Mission Control UI. All checklist items are checked, but every pane except the live ops panel renders mock/seed data from web/src/lib/hermes.
docs/hermes-setup-upgrade-roadmap.md — stood up and hardened the two Hermes instances on the VM (~68% checked; the open items are mostly Uma parity, credentials, and policy decisions).

This v2 roadmap supersedes the open dashboard-related items in both and adds the missing theme: powering the unified dashboard with real data from BOTH Hermes instances (Vijay/root and Bheem/Uma) and closing every known gap (mock data, backend hardening, two-instance parity, app/CI hygiene, UX polish, security, and notifications).

It does not re-do anything already verified in v1. It builds on the one piece that is already real and two-instance-aware: the backend hermes-ops module.

The Two Instances (authoritative topology)

Source of truth is dashboard/backend/src/modules/hermes-ops/repository.ts.

| Codename | OS user | `HERMES_HOME` | Gateway service | Private dashboard | Backup timer | Backup repo (local → GitHub) | Drive folder |
|---|---|---|---|---|---|---|---|
| Vijay | `root` | `/root/.hermes` | `hermes-gateway.service` (system) | `hermes-root-dashboard.service` → :9119 | `hermes-root-backup.timer` | `/root/repos/bytelyst_hostinger_hermes_vm` → `saravanakumardb/bytelyst_hostinger_hermes_vm` | Vijay Drive |
| Bheem | `uma` | `/home/uma/.hermes` | `uma-hermes-gateway.service` (uma user systemd) | `uma-hermes-dashboard.service` → :9120 | `uma-hermes-backup.timer` | `/home/uma/repos/uma_hostinger_hermes_vm` → `umadev0931/uma_hostinger_hermes_vm` | Bheem Drive |

Both private dashboards are reachable only over Tailscale at 100.87.53.10 (:9119 Vijay, :9120 Bheem). There is no public Caddy route; this is a hard guardrail.
A root-level hermes-emergency-drive-upload.timer also pushes encrypted bundles to each instance's Google Drive folder.

Three dashboard surfaces exist today

1. Native per-instance Hermes dashboards on :9119 (Vijay) and :9120 (Bheem), one per user, over Tailscale. Operationally scoped and separate from this codebase.
2. ByteLyst Mission Control: the /hermes/* suite in this repo's DevOps dashboard (7 panes), intended to be the unified pane of glass over both instances.
3. The live hermes-ops panel, embedded in the Mission Control overview (web/src/components/hermes-ops-panel.tsx), which already renders real, both-instance status: gateways, private dashboards, backup timers, repo HEAD/cleanliness, Google token, restore-payload counts, cron timers, emergency drive, Tailscale IP, active session count, and warnings.

Decision baked into this roadmap: invest in surface #2/#3 — make unified Mission Control the real two-instance command center — rather than expanding the native per-instance dashboards. The hermes-ops module is the seed; everything below extends it.

Goal / Target State

A single private dashboard where, for both Vijay and Bheem, S can see at a glance:

live instance health (gateway, dashboard, cron, backup freshness, disk/mem, Google auth) — real, cached, robust
everything each Hermes is doing / did / failed / is blocked on — from real session/cron/task telemetry, filterable by instance
backup & disaster-recovery posture at parity across both instances
what needs founder attention, pushed to the right Telegram chat

…with the whole thing private-only, authenticated, tested, and in CI.

Gap Inventory (consolidated)

| ID | Gap | Source | Severity |
|---|---|---|---|
| G1 | 6 of 7 Mission Control panes are mock (`web/src/lib/hermes`) | v1 roadmap / README "read-only mock" | High |
| G2 | Tasks/Products/History/Agents have no instance dimension (Vijay vs Bheem) | this review | High |
| G3 | `hermes-ops` backend not hardened: no cache (~20 `execFile` per 60s poll), brittle Uma checks (`ps` string-match + hardcoded `existsSync`), errors swallowed to `null`, no tests | REVIEW_ACTIONS P1 #3 | High |
| G4 | No real telemetry ingestion (sessions, cron, memory, skills, alerts, backup history, task events) | v1 roadmap "real telemetry plan" | High |
| G5 | App/CI hygiene: CI path wrong (P0), lint is a no-op echo (P0), thin tests (P1), SSE disabled (P1), doc drift 3000 vs 3049 (P1), privileged docker socket/host mounts undocumented (P1), no prod logging (P2), E2E not wired (P2) | REVIEW_ACTIONS | P0–P2 |
| G6 | Mission Control polish: warning severity filters, trend cards, ops→ledger deep links, per-instance action rows, theme toggle | v1 "Next Improvements" | Med |
| G7 | Bheem/Uma parity: no persistent backup repo + timer equivalent to root, no watchdog cron, restore never tested, no quarterly drill | setup roadmap (lines 20, 146, 172, 185, 432, 447, 461) | High |
| G8 | Security/access: devops dashboard hermes routes need auth; `security.redact_secrets` / `privacy.redact_pii` undecided; GitHub/Gitea least-privilege audit + rotation pending | setup roadmap Phase 11 | High |
| G9 | Notifications: dashboard warnings not pushed to Telegram; approval-prompt flow + media/file delivery UX unvalidated | setup roadmap Phase 6 | Med |

Phase 0 — Guardrails (must hold throughout)

No public Caddy route or public listener for any Hermes dashboard, the hermes-ops API, or the DevOps dashboard's hermes data. Private-only via Tailscale / SSH tunnel / loopback.
Keep Hermes command approvals at manual or smart; no gateway approval bypass.
No raw secrets, tokens, OAuth files, state.db, or SQLite WAL/SHM in any git backup or in this repo.
Re-run the Caddy/port review (docs/hermes-operations.md) before adding any route or hostname.
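The private-only rule can also be checked mechanically, either in the Caddy/port review or as a startup guard. A minimal sketch, assuming hypothetical `isPrivateBindHost` and `assertPrivateBind` helpers fed with each listen address: it accepts loopback and Tailscale's CGNAT range 100.64.0.0/10 (which contains 100.87.53.10) and rejects everything else, including the 0.0.0.0 wildcard. It is meant for host-level listeners; a container that binds 0.0.0.0 internally would need the check applied to its published host address instead.

```typescript
import { isIPv4 } from "node:net";

// Converts a dotted-quad IPv4 string to an unsigned 32-bit integer.
function ipv4ToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => (acc * 256 + Number(octet)) >>> 0, 0);
}

// True if `ip` falls inside base/prefixLen (IPv4 only).
function inCidr(ip: string, base: string, prefixLen: number): boolean {
  const mask = prefixLen === 0 ? 0 : (~0 << (32 - prefixLen)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

// Allows loopback (127.0.0.0/8, ::1, "localhost") and Tailscale's 100.64.0.0/10 range only.
// Wildcards (0.0.0.0, ::), public addresses, and other hostnames are rejected.
function isPrivateBindHost(host: string): boolean {
  if (host === "localhost" || host === "::1") return true;
  if (!isIPv4(host)) return false;
  return inCidr(host, "127.0.0.0", 8) || inCidr(host, "100.64.0.0", 10);
}

// Hypothetical startup guard: fail fast instead of silently exposing Hermes data.
function assertPrivateBind(host: string): void {
  if (!isPrivateBindHost(host)) {
    throw new Error(`refusing to expose Hermes data on non-private host "${host}"`);
  }
}
```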

Phase 1 — Make the unified backend authoritative and hardened (G3)

The hermes-ops snapshot becomes the single source of truth for live status. Before building UI on it, harden it.

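Add a short-TTL cache (mirroring the health module's 30s cache) so that requests arriving within the TTL, such as several open tabs or the panel plus the overview, share one snapshot instead of each fanning out ~20 systemctl/git/ps/du subprocesses; serve the cached snapshot with its generatedAt timestamp. Note that a single client polling every 60s always outlives a 30s TTL, so the cache bounds fan-out across concurrent viewers rather than per poll; lengthen the TTL if per-poll cost matters.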
Replace brittle Bheem/Uma checks in repository.ts (runuser systemctl --user with ps/existsSync fallback so a failed probe degrades to the legacy check, not a false "down"):
- isUmaGatewayActive() (currently ps -eo string match) → runuser -u uma -- systemctl --user is-active uma-hermes-gateway.service (or --machine=uma@.host).
- isUmaGatewayEnabled() (currently hardcoded existsSync of a wants-symlink) → systemctl --user is-enabled via the same path.
Stop swallowing every failure to null indiscriminately: distinguish "unit inactive" from "probe failed/timed out" and surface per-field status so the UI can show unknown vs down.
Add Zod validation + a stable typed contract for HermesOpsSnapshot on the route.
Add unit tests for the hermes-ops repository (mock execFile/fs) — closes the REVIEW_ACTIONS "only services has tests" gap for this module.
Read Bheem/Uma state via a self-reporting ops exporter (Decision #2): a read-only uma user-systemd timer writes a sanitized JSON snapshot to a known path; the root backend reads + aggregates it (Vijay gets a symmetric exporter). Interim stopgap until it ships: runuser -u uma -- systemctl --user is-active/is-enabled instead of the ps/existsSync checks.
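The cache and the unknown-vs-down split above could look roughly like this. A sketch under stated assumptions: `ProbeStatus`, `ProbeResult`, `classifyIsActive`, and `createTtlCache` are hypothetical names rather than existing repository.ts code, the TTL passed in would be the 30s mirrored from the health module, and the classifier relies on `systemctl is-active` printing the unit state (`active`, `inactive`, `failed`, ...) on stdout. A probe that could not run at all, or that printed nothing (for example because the uma user bus was unreachable), is reported as `unknown` instead of a false `down`.

```typescript
// Per-field probe status: "unknown" means the probe itself failed, not that the unit is down.
type ProbeStatus = "up" | "down" | "unknown";

// Hypothetical reduced view of one execFile result.
interface ProbeResult {
  stdout: string;
  spawnFailed: boolean; // command could not run at all (ENOENT, timeout, killed)
}

// Map the printed unit state instead of trusting the exit code alone.
function classifyIsActive(result: ProbeResult): ProbeStatus {
  if (result.spawnFailed) return "unknown";
  const state = result.stdout.trim();
  if (state === "active" || state === "reloading") return "up";
  if (state === "inactive" || state === "failed") return "down";
  return "unknown"; // empty output (e.g. user bus unreachable), "activating", etc.
}

interface Cached<T> {
  data: T;
  generatedAt: number; // epoch ms when `load` resolved
}

// TTL cache with single-flight: callers arriving while a refresh is running share its promise,
// so concurrent requests trigger one round of subprocesses. A rejected load is not cached.
function createTtlCache<T>(load: () => Promise<T>, ttlMs: number, now: () => number = Date.now) {
  let value: Cached<T> | undefined;
  let inFlight: Promise<Cached<T>> | undefined;

  return function get(): Promise<Cached<T>> {
    if (value && now() - value.generatedAt < ttlMs) return Promise.resolve(value);
    if (!inFlight) {
      inFlight = load()
        .then((data) => {
          const entry = { data, generatedAt: now() };
          value = entry;
          return entry;
        })
        .finally(() => {
          inFlight = undefined;
        });
    }
    return inFlight;
  };
}
```

The route handler would call the cached getter and return `data` together with `generatedAt`, so the UI can show how old the snapshot is.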

Phase 2 — Instance dimension across Mission Control (G2)

Add instanceId: 'vijay' | 'bheem' to the core types in web/src/lib/hermes (HermesTask, HermesProduct, HermesEvent, HermesRun, agent/overview models) and to the backend contracts.
Add a global instance switcher in HermesShell (All / Vijay (root) / Bheem (uma)) with persisted selection; thread it through every pane.
Overview: show per-instance cards and a combined roll-up (extend the existing "Healthy instances 2/2" pattern from the ops panel to the whole overview).
Ledger / Products / History / Agents: filter and badge by instance.
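The switcher and per-pane filtering reduce to a small shared helper. A minimal sketch, assuming the selection is persisted as a plain string (for example in localStorage; the storage mechanism is not specified above) and that every row already carries the `instanceId` field proposed in this phase. `parseInstanceSelection`, `filterByInstance`, and `healthyRollup` are hypothetical names.

```typescript
type HermesInstanceId = "vijay" | "bheem";
type InstanceSelection = HermesInstanceId | "all";

const INSTANCE_SELECTIONS: readonly InstanceSelection[] = ["all", "vijay", "bheem"];

// A persisted selection is an untrusted string; anything unrecognized falls back to "all".
function parseInstanceSelection(raw: string | null | undefined): InstanceSelection {
  const candidate = raw as InstanceSelection;
  return INSTANCE_SELECTIONS.includes(candidate) ? candidate : "all";
}

// Shared by every pane: "all" passes everything through, otherwise keep one instance's rows.
function filterByInstance<T extends { instanceId: HermesInstanceId }>(
  items: readonly T[],
  selection: InstanceSelection,
): T[] {
  return selection === "all" ? [...items] : items.filter((item) => item.instanceId === selection);
}

// Overview roll-up in the same shape as the ops panel's "Healthy instances 2/2".
function healthyRollup(statuses: readonly { instanceId: HermesInstanceId; healthy: boolean }[]): string {
  const healthy = statuses.filter((s) => s.healthy).length;
  return `Healthy instances ${healthy}/${statuses.length}`;
}
```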

Phase 3 — Real per-instance telemetry, replacing mock pane by pane (G1, G4)

Define the ingestion contract first, then convert panes. Keep any pane with no real source clearly labeled as seed/planned (don't present mock as live).

Primary source = real artifacts (Decision #1): sessions, cron, watchdog alerts, backup history — read-only and cached. Treat a Hermes session as the work unit. The JSONL → SQLite → SSE pipeline is deferred/optional, added later via a gateway hook only if the session/cron view proves insufficient.
Backend endpoints per instance, reading real Hermes state:
- Sessions + stats (hermes sessions stats — baseline today: Vijay 59 sessions/5225 msgs, Bheem 18/635).
- Cron jobs (hermes cron list) including backup + watchdog timers.
- Memory + skills inventory.
- Watchdog alerts feed (from hermes-health-watchdog.py output / logs).
- Backup history (git log of each backup repo: HEAD, last-commit age, freshness).
Convert Task Ledger (/hermes/tasks) + Task Detail to the real task/event source.
Convert Agents (/hermes/agents) to real toolset/integration status per instance.
Convert History (/hermes/history) to real session/cron/backup trends.
Products (/hermes/products): repoint at the real service registry (backend/src/modules/services/) + health module (Decision #3); drop the fabricated 50-item mock. Optional manual entries for not-yet-deployed products come later.
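Backup history can be read without touching secrets by asking git for the last commit only. A sketch, assuming hypothetical `parseLastCommit` and `readBackupHistory` helpers and a staleness threshold chosen from the backup timer's schedule (this roadmap does not define one). `%h` and `%ct` are git's abbreviated-hash and committer-timestamp (Unix seconds) format placeholders.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

interface BackupHistory {
  head: string; // abbreviated HEAD hash
  lastCommitAgeS: number; // seconds since the last backup commit
  freshness: "fresh" | "stale";
}

// Parses `git log -1 --format="%h %ct"` output: "<short-hash> <committer unix seconds>".
function parseLastCommit(stdout: string, nowMs: number, staleAfterS: number): BackupHistory {
  const [head, ts] = stdout.trim().split(/\s+/);
  const committedAt = Number(ts);
  if (!head || !Number.isFinite(committedAt)) {
    throw new Error(`unexpected git log output: ${JSON.stringify(stdout)}`);
  }
  const lastCommitAgeS = Math.max(0, Math.floor(nowMs / 1000) - committedAt);
  return { head, lastCommitAgeS, freshness: lastCommitAgeS > staleAfterS ? "stale" : "fresh" };
}

// Read-only probe of one backup repo; execFile passes args directly, so no shell is involved.
async function readBackupHistory(repoPath: string, staleAfterS: number): Promise<BackupHistory> {
  const { stdout } = await execFileAsync("git", ["-C", repoPath, "log", "-1", "--format=%h %ct"], {
    timeout: 5_000,
  });
  return parseLastCommit(stdout, Date.now(), staleAfterS);
}
```

For example, `readBackupHistory("/home/uma/repos/uma_hostinger_hermes_vm", 26 * 3600)` would assume a daily timer plus two hours of slack; that number is illustrative only.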

Phase 4 — Bheem/Uma parity so the dashboard shows two equal instances (G7)

This is the biggest operational asymmetry and the reason half the ops-panel warnings are Bheem-only.

Stand up a Uma persistent backup repo + uma-hermes-backup.timer mirroring the root design (sanitized hermes_persistent_backup/, secrets and state.db excluded), pushing to umadev0931/uma_hostinger_hermes_vm with a Uma-owned, repo-scoped token (Bheem self-pushes; root no longer pushes Uma's backup — Decision #5).
Install a Uma health watchdog (mirror scripts/hermes-health-watchdog.py), silent-on-success, alerting Uma's Telegram.
Run the first Uma restore rehearsal into a temporary HERMES_HOME; document in docs/hermes-operations.md / docs/hermes-disaster-recovery.md.
Schedule a quarterly Uma restore-drill reminder (parity with root).
Confirm these close the corresponding Bheem warnings emitted by getHermesOpsSnapshot() (backup timer active, repo HEAD readable + clean, Google token present).
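The "zero standing Bheem warnings" exit condition can be expressed as a check over the signals named above. A sketch, assuming hypothetical field names (the real `HermesOpsSnapshot` returned by `getHermesOpsSnapshot()` may be shaped differently) and the Phase 1 convention that `null` means the probe failed, so an unknown value produces a distinct warning rather than being counted as broken.

```typescript
// null = the probe failed, so the state is unknown rather than bad.
type Tri = boolean | null;

// Hypothetical per-instance fields; not the actual HermesOpsSnapshot shape.
interface InstanceOpsFields {
  instance: "vijay" | "bheem";
  backupTimerActive: Tri;
  repoHeadReadable: Tri;
  repoClean: Tri;
  googleTokenPresent: Tri;
}

type CheckKey = Exclude<keyof InstanceOpsFields, "instance">;

const PARITY_CHECKS: readonly [CheckKey, string][] = [
  ["backupTimerActive", "backup timer not active"],
  ["repoHeadReadable", "backup repo HEAD not readable"],
  ["repoClean", "backup repo has uncommitted changes"],
  ["googleTokenPresent", "Google token missing"],
];

// One warning per failing or unknown check; an empty array is the Phase 4 exit condition.
function parityWarnings(fields: InstanceOpsFields): string[] {
  return PARITY_CHECKS.flatMap(([key, message]): string[] => {
    const value = fields[key];
    if (value === true) return [];
    return [`${fields.instance}: ${value === null ? `unknown (${key} probe failed)` : message}`];
  });
}
```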

Phase 5 — Dashboard app hardening (G5)

P0: Fix the CI workspace path (${{ gitea.workspace }}) in .gitea/workflows/ci.yml, DEPLOYMENT.md, and scripts/deploy-hotcopy.sh, which currently point at the non-existent /opt/bytelyst/bytelyst-devops-tools/... path.
P0: Replace the no-op lint echo with real linting (next lint for web, minimal ESLint for backend); make pnpm lint fail on bad code.
P1: Add tests for auth, csrf, deployments/orchestrator, health, and hermes-ops; add a pnpm test:coverage gate. (35 new unit tests. v8 coverage thresholds in backend/vitest.config.ts are gated on the six tested files (≥85% lines/funcs/stmts, ≥65% branches) and wired into Gitea CI as a dedicated step. Today's actuals: ≥95% lines on every gated file. Ratchet the thresholds up as more modules get tested.)
P1: Resolve the SSE TODO — either ship a Fastify-5-compatible log-stream or remove the SSE claim from docs/UI. (Chose remove: dropped fastify-sse-v2 dep, deleted commented-out plugin import + TODO from server.ts and deployments/routes.ts, rewrote the README/DEPLOYMENT.md "Log Streaming" section as "Logs (JSON-polled, no SSE)". Web client already polls /deployments/:id/logs via apiRequest — no UI change needed. If a real-time stream is wanted later, implement via reply.raw and update docs in the same change.)
P1: Fix doc drift (web port 3000 vs 3049; endpoint URLs; merge duplicate deployment docs). (DEPLOYMENT.md is now canonical; DEPLOYMENT_GUIDE.md reduced to a redirect stub; deploy.sh updated. Added an explicit "Ports — quick reference" table to DEPLOYMENT.md distinguishing container :3000, Compose host :3049, Traefik production. README and ENDPOINTS.md cross-link to it. Marks REVIEW_ACTIONS #5 resolved.)
P1: Document the docker-socket + host-log/script mount privilege surface (the backend reads cross-user/host paths — blast radius must be written down; consider an allow-list wrapper over the raw socket). (New "Privilege Surface" section in dashboard/DEPLOYMENT.md enumerating every mount, every shell-outing route + commands + auth gate, the blast-radius if an admin token leaks, five known sharp edges, and a P1→P3 mitigation roadmap. Concurrent fix: /code-quality/check was reachable unauthenticated despite shelling out to npm run in a caller-supplied path — requireAdmin added. Allow-list wrapper around docker/bash/npm invocations and projectPath validation are queued as the next P1s; running the container as non-root and replacing the raw docker.sock with a verb-restricted proxy are P2/P3.)
P2: Structured backend logging (pino → stdout); wire E2E (hermes.spec.ts) into CI with a started stack. (Two commits: (1) lib/logger.ts exposes a configured pino instance shared between Fastify (via loggerInstance) and any non-request code path, with LOG_LEVEL env knob and built-in redaction for Authorization/Cookie headers + common secret-shaped field names; runtime console.error sites in deployments/orchestrator, system, backup, and vm modules ported over to structured logs. (2) E2E in CI: hermes spec now intercepts /api/hermes/ops with a fixture snapshot so it's deterministic without a live backend; CI workflow runs playwright install --with-deps chromium then pnpm test:e2e (web suite starts its own Next dev via Playwright's webServer config). Verified locally: 6/6 E2E green, 51/51 unit tests green, coverage gate ≥95% lines.)
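The two queued P1 mitigations from the Privilege Surface work, an allow-list around shell-outs and projectPath validation, can be sketched with Node's path module. Assumptions: the verb list is hypothetical, bash is left out deliberately, and the root directory that projects must live under is whatever DEPLOYMENT.md settles on. This does not replace the `requireAdmin` gate; it narrows what an authenticated caller can reach.

```typescript
import { isAbsolute, relative, resolve, sep } from "node:path";

// Hypothetical allow-list: each command maps to the subcommands the backend may run.
// bash is intentionally absent; allow-listing arbitrary scripts needs its own design.
const ALLOWED_VERBS: ReadonlyMap<string, readonly string[]> = new Map([
  ["docker", ["ps", "logs", "inspect"]],
  ["npm", ["run"]],
]);

function checkInvocation(command: string, args: readonly string[]): void {
  const verbs = ALLOWED_VERBS.get(command);
  const verb = args[0] ?? "";
  if (!verbs || !verbs.includes(verb)) {
    throw new Error(`not allow-listed: ${command} ${verb}`.trim());
  }
}

// Resolves a caller-supplied projectPath against a fixed root and rejects escapes such as
// "../etc" or an absolute path elsewhere. Symlinks are not followed; compare
// fs.realpathSync results on both sides if the tree can contain them.
function validateProjectPath(root: string, candidate: string): string {
  const rootAbs = resolve(root);
  const target = resolve(rootAbs, candidate);
  const rel = relative(rootAbs, target);
  if (rel === ".." || rel.startsWith(`..${sep}`) || isAbsolute(rel)) {
    throw new Error(`projectPath escapes ${rootAbs}: ${candidate}`);
  }
  return target;
}
```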

Phase 6 — Mission Control UX polish (G6)

Severity-tag warnings (info/warn/critical) and add a severity filter to the ops panel.
Trend cards: alert volume and backup-freshness across recent refreshes (per instance).
Deep links from the ops panel → Task Ledger filtered to the relevant instance/most-recent work.
Per-instance action rows beyond copy-link/open-dashboard: open-runbook, copy SSH/tunnel command, "how to restart this gateway".
Optional dark/light theme toggle if the shell supports it.
Unified alerts feed across both instances on the overview.
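Severity filtering and trend cards need only a rank order and per-refresh counts. A minimal sketch, assuming warnings are tagged with the three levels above and that the client keeps the last few refreshes in memory; `OpsWarning`, `filterBySeverity`, and `alertVolumeTrend` are hypothetical names.

```typescript
type Severity = "info" | "warn" | "critical";
const SEVERITY_RANK: Record<Severity, number> = { info: 0, warn: 1, critical: 2 };

interface OpsWarning {
  instanceId: "vijay" | "bheem";
  severity: Severity;
  message: string;
}

// Keep warnings at or above the selected minimum severity, most severe first.
function filterBySeverity(warnings: readonly OpsWarning[], min: Severity): OpsWarning[] {
  return warnings
    .filter((w) => SEVERITY_RANK[w.severity] >= SEVERITY_RANK[min])
    .sort((a, b) => SEVERITY_RANK[b.severity] - SEVERITY_RANK[a.severity]);
}

// Trend card input: per-instance warning counts for each recent refresh, oldest first.
function alertVolumeTrend(refreshes: readonly (readonly OpsWarning[])[]): Record<"vijay" | "bheem", number[]> {
  const trend = { vijay: [] as number[], bheem: [] as number[] };
  for (const snapshot of refreshes) {
    trend.vijay.push(snapshot.filter((w) => w.instanceId === "vijay").length);
    trend.bheem.push(snapshot.filter((w) => w.instanceId === "bheem").length);
  }
  return trend;
}
```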

Phase 7 — Security & access (G8)

Require authentication on the DevOps dashboard's hermes routes/endpoints (reuse platform-service auth already used elsewhere).
Decide and document security.redact_secrets and privacy.redact_pii for gateway sessions (per instance).
Finish the GitHub/Gitea least-privilege token audit (root currently pushes both repos) and rotate any migrated/exposed credentials — completed naturally by Decision #5 (Bheem self-pushes with its own scoped token).
Keep all hermes data private-only; never expose the hermes-ops snapshot or task data on a public route.

Phase 8 — Notifications & Telegram loop (G9)

Push new dashboard-detected warnings to the correct Telegram (Vijay → root chat, Bheem → Uma chat), reusing the watchdog delivery path; silent on healthy.
Validate the Telegram approval-prompt flow and media/file delivery end-to-end (the two unchecked v1 items).
Preserve the numbered-emoji progress convention (1️⃣, 2️⃣, …) for completion updates.
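Delivery should go through the existing watchdog path; the logic that decides what to send can still be sketched: diff against the previous refresh, group by instance, route to that instance's chat, and number the lines. The helper names, the warning `key` field, and the chat IDs are hypothetical. `sendTelegram` calls the public Telegram Bot API `sendMessage` method only to illustrate the request shape (assuming Node 18+ for global `fetch`); the watchdog's own delivery code may differ.

```typescript
type InstanceId = "vijay" | "bheem";

interface OpsWarning {
  instanceId: InstanceId;
  key: string; // stable identifier, e.g. "backup-timer"; used to detect "new" warnings
  message: string;
}

// Numbered-emoji convention: keycap digits are digit + U+FE0F + U+20E3; 10 is the keycap-ten emoji.
function numberEmoji(n: number): string {
  if (Number.isInteger(n) && n >= 0 && n <= 9) return `${n}\uFE0F\u20E3`;
  if (n === 10) return "\uD83D\uDD1F";
  return `${n}.`; // no single keycap emoji beyond 10
}

// Only warnings absent from the previous refresh are pushed; a healthy or unchanged refresh sends nothing.
function newWarnings(previous: readonly OpsWarning[], current: readonly OpsWarning[]): OpsWarning[] {
  const seen = new Set(previous.map((w) => `${w.instanceId}:${w.key}`));
  return current.filter((w) => !seen.has(`${w.instanceId}:${w.key}`));
}

interface TelegramMessage {
  chat_id: string;
  text: string;
}

// Groups new warnings per instance and routes each group to that instance's chat.
function buildMessages(fresh: readonly OpsWarning[], chatIds: Record<InstanceId, string>): TelegramMessage[] {
  const byInstance = new Map<InstanceId, OpsWarning[]>();
  for (const w of fresh) {
    const group = byInstance.get(w.instanceId) ?? [];
    group.push(w);
    byInstance.set(w.instanceId, group);
  }
  return Array.from(byInstance.entries()).map(([id, ws]) => ({
    chat_id: chatIds[id],
    text: ws.map((w, i) => `${numberEmoji(i + 1)} ${w.message}`).join("\n"),
  }));
}

// Telegram Bot API sendMessage: POST https://api.telegram.org/bot<token>/sendMessage
async function sendTelegram(botToken: string, message: TelegramMessage): Promise<void> {
  const res = await fetch(`https://api.telegram.org/bot${botToken}/sendMessage`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(message),
  });
  if (!res.ok) throw new Error(`sendMessage failed: HTTP ${res.status}`);
}
```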

Data Model Additions

```typescript
// web/src/lib/hermes (and mirrored in backend contracts)
export type HermesInstanceId = 'vijay' | 'bheem';

export interface HermesInstanceRef {
  id: HermesInstanceId;
  label: string;        // "Vijay / root", "Bheem / Uma"
  user: string;         // "root" | "uma"
  hermesHome: string;
}

// add `instanceId: HermesInstanceId` to:
//   HermesTask, HermesProduct, HermesEvent, HermesRun, HermesAgentStatus, HermesOverview
```
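The topology table can also seed a typed registry so the switcher, filters, and exporter readers share one source of instance metadata. A sketch: `HERMES_INSTANCES`, `dashboardPort`, `isHermesInstanceId`, `WithInstance`, and `tagInstance` are hypothetical additions, not existing exports of web/src/lib/hermes.

```typescript
type HermesInstanceId = "vijay" | "bheem";

interface HermesInstanceRef {
  id: HermesInstanceId;
  label: string;
  user: string;
  hermesHome: string;
}

// Seeded from the topology table; dashboardPort is an extra field beyond HermesInstanceRef.
const HERMES_INSTANCES: Readonly<Record<HermesInstanceId, HermesInstanceRef & { dashboardPort: number }>> = {
  vijay: { id: "vijay", label: "Vijay / root", user: "root", hermesHome: "/root/.hermes", dashboardPort: 9119 },
  bheem: { id: "bheem", label: "Bheem / Uma", user: "uma", hermesHome: "/home/uma/.hermes", dashboardPort: 9120 },
};

// Type guard for values arriving from JSON, query strings, or exporter files.
function isHermesInstanceId(value: unknown): value is HermesInstanceId {
  return value === "vijay" || value === "bheem";
}

// Attaches the instance dimension to any existing model, e.g. WithInstance<HermesTask>.
type WithInstance<T> = T & { instanceId: HermesInstanceId };

function tagInstance<T extends object>(item: T, instanceId: HermesInstanceId): WithInstance<T> {
  return { ...item, instanceId };
}
```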

Acceptance Criteria

This roadmap is complete when:

The overview, ledger, agents, and history panes render real data for both Vijay and Bheem, filterable by instance; only panes without a real source remain (clearly labeled) seed data.
hermes-ops is cached, uses robust Uma user-systemd checks, distinguishes unknown vs down, and has unit tests.
Bheem has a persistent backup repo + timer, a watchdog, and one completed restore rehearsal — and the dashboard shows 2/2 healthy with zero standing Bheem warnings.
CI is green on the correct path, lint is real, and coverage includes auth/csrf/orchestrator/health/hermes-ops.
Hermes routes require auth and remain private-only; redact policies are decided and documented.
Dashboard warnings reach the correct Telegram chat per instance.

Implementation Status Checklist

Update only with evidence (source review, tests, build output, or browser/VM verification).

Phase 0 — Guardrails reconfirmed
Phase 1 — hermes-ops hardened + tested
Phase 2 — Instance dimension + switcher
Phase 3 — Real telemetry ingestion + panes converted
Phase 4 — Bheem/Uma parity (backup, watchdog, restore drill)
Phase 5 — App/CI hardening (P0/P1/P2 items done; the P1–P3 follow-ups in the DEPLOYMENT.md mitigation roadmap remain)
Phase 6 — UX polish
Phase 7 — Security & access
Phase 8 — Notifications & Telegram

Decisions (resolved 2026-05-30)

1. Task data source: derive from real artifacts now; defer the JSONL pipeline. Hermes' real unit of work is the session (+ cron jobs), and there's no evidence the agent emits a task-level JSONL ledger today. Build the ledger/activity views from what already exists: hermes sessions (+ stats), hermes cron list (+ last-run), watchdog alerts, and backup git history. Add a JSONL session/event pipeline → SQLite later and only if the session/cron view proves insufficient (via a gateway hook that appends records). Do not fabricate a task store.
2. Reading Bheem state: a self-reporting ops exporter per instance. Each instance runs a tiny read-only exporter (Bheem as a uma user-systemd timer, Vijay symmetrically) that writes a sanitized JSON snapshot (booleans, counts, timestamps, short HEADs; no secrets) to a known path; the unified backend just reads and aggregates the two files. No cross-user command execution or reaching into /home/uma/.hermes. Interim stopgap until the exporter ships: replace the brittle ps/existsSync Uma checks with runuser -u uma -- systemctl --user is-active/is-enabled.
3. Products: repoint at the real service registry; drop the fabricated mock. The dashboard already has a live service registry (backend/src/modules/services/, with health). Back the Products pane with that real data instead of a 50-item fiction; allow optional manual entries later for not-yet-deployed products. Relabel clearly until the mapping lands.
4. Auth: reuse platform-service JWT, defense-in-depth. Put the hermes routes behind the same platform-service auth (backend/src/lib/auth.ts) the rest of the dashboard uses; keep the network private (Tailscale/loopback) as a second layer. No separate basic-auth gate (that's only the never-used "if forced public" path).
5. Bheem backup: same repo + Drive, but a Uma-owned least-privilege token; Bheem self-pushes. Keep umadev0931/uma_hostinger_hermes_vm and Bheem Drive, but give Bheem its own repo-scoped credential so it backs itself up rather than depending on root's broad credential. Root stops pushing Uma's backup; this also closes the standing GitHub least-privilege audit item.
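Decision #2's exporter contract splits into a writer that runs as each instance's own user and a reader in the root backend. A sketch, assuming a hypothetical snapshot shape and file location; in practice the reader would validate with the Zod schema from Phase 1 and the writer would be triggered by the uma (or root) user-systemd timer. A missing, malformed, or stale file surfaces as `unknown`, matching the unknown-vs-down rule.

```typescript
import { mkdirSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

// Hypothetical sanitized snapshot: booleans, counts, timestamps, short HEADs only. No secrets.
interface ExportedSnapshot {
  instanceId: "vijay" | "bheem";
  generatedAt: string; // ISO-8601
  gatewayActive: boolean;
  backupTimerActive: boolean;
  repoHead: string | null;
  activeSessions: number;
}

// Exporter side: write a temp file, then rename it into place. rename() within one
// filesystem is atomic on POSIX, so readers never see a partially written file.
function writeSnapshot(path: string, snapshot: ExportedSnapshot): void {
  mkdirSync(dirname(path), { recursive: true });
  const tmp = `${path}.tmp`;
  writeFileSync(tmp, JSON.stringify(snapshot));
  renameSync(tmp, path);
}

type ReadResult =
  | { status: "ok"; snapshot: ExportedSnapshot }
  | { status: "unknown"; reason: string };

// Backend side: a missing, malformed, or stale file yields "unknown", never "down".
function readSnapshot(path: string, maxAgeMs: number, now: number = Date.now()): ReadResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(readFileSync(path, "utf8"));
  } catch {
    return { status: "unknown", reason: "snapshot missing, unreadable, or not JSON" };
  }
  const generatedAt = (parsed as { generatedAt?: unknown } | null)?.generatedAt;
  if (typeof generatedAt !== "string") {
    return { status: "unknown", reason: "snapshot has no generatedAt" };
  }
  const ageMs = now - Date.parse(generatedAt);
  if (!Number.isFinite(ageMs) || ageMs > maxAgeMs) {
    return { status: "unknown", reason: "snapshot is stale" };
  }
  // A real reader would validate the full shape here (e.g. with the Phase 1 Zod schema).
  return { status: "ok", snapshot: parsed as ExportedSnapshot };
}
```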

Suggested Execution Order

Phase 5 P0 (CI path + lint) — unblocks everything.
Phase 1 (harden hermes-ops) — the foundation the real UI sits on.
Phase 2 (instance dimension) + Phase 4 (Bheem parity) in parallel — make "two instances" first-class in both data and ops.
Phase 3 (real telemetry, pane by pane).
Phase 7 (auth) before any wider access; Phase 8 (Telegram) and Phase 6 (polish) last.

Each item is sized to land as a single PR with incremental commits to main.
