docs: Phase 8 — Telegram convention + delegation brief

Closes the Phase 8 line that's actually a docs/codebase change. The other two
Phase 8 items are VM-ops work (bot tokens + watchdog extensions) and live as a
delegation brief.

What's in this repo

- `docs/hermes-operations.md` gains a "Telegram Notification Convention"
  section codifying:
  * routing per instance (Vijay → root chat, Bheem → Uma chat,
    cross-cutting → root)
  * silent-on-healthy + post-on-recovery
  * the numbered-emoji progress convention (`1️⃣`, `2️⃣`, …) and why it
    survives Telegram client rendering
  * approval-prompt UI expectation
  * "don't paste secrets" pointer back to `lib/logger.ts`'s redaction
    path-list
- `docs/prompts/phase8-telegram-loop.md`: full delegation brief for the
  VM-side implementation. Design: dashboard backend writes new warnings
  (with `instance=<id>` tag, deduped over 1h) to an append-only log; both
  watchdogs tail it and route through the existing Telegram delivery path.
  Avoids splitting the delivery code into two places that would each need
  rate-limit + token-rotation handling. Brief is gated on Phase 4: Uma's
  watchdog must exist first.
- Roadmap Phase 8 ticked for "preserve numbered-emoji convention" (codified
  in operations doc); the other two items have notes pointing at the brief.

Phase 8 doesn't fully close in this repo because the delivery loop needs real
bot tokens and the Phase 4 Uma watchdog before it can be end-to-end validated.
The codebase's contribution is everything that doesn't need a token: the
convention, the design, and the delegation brief.

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-30 08:05:52 +00:00 · 2026-05-30 08:05:52 +00:00 · a8cf61a281
commit a8cf61a281
parent 14c7a8f59a
3 changed files with 165 additions and 4 deletions
--- a/docs/hermes-operations.md
+++ b/docs/hermes-operations.md
@@ -392,3 +392,41 @@ Safe sequence:
 5. Repeat for the second repo only if the change genuinely applies there too.

 Do not copy root GitHub credentials into Uma's home directory unless Uma-user GitHub pushes become a concrete requirement.
+
+## Telegram Notification Convention
+
+Phase 8 of the dashboard roadmap and the watchdog scripts that ship Telegram
+alerts today follow a small set of conventions worth keeping consistent.
+
+**Routing per instance**
+- Vijay (root) alerts go to the root Telegram chat.
+- Bheem (uma) alerts go to Uma's Telegram chat.
+- Cross-cutting alerts (e.g. "the dashboard itself is unreachable") go to the
+  root chat — root is the operator account.
+
+**Silent on healthy**
+- Watchdog scripts and (in future) the dashboard's own Telegram hook **only**
+  post when something is wrong. A green poll is a no-op.
+- Recoveries ARE a Telegram event (one line: "back to healthy") so the chat
+  history reflects the full incident lifecycle.
+
+**Numbered-emoji progress convention**
+- When a multi-step operation is being narrated to Telegram, prefix each step
+  with the corresponding numbered emoji: `1️⃣`, `2️⃣`, `3️⃣`, … up to `🔟`.
+- This survives copy-paste across clients (unlike `1.`, which Telegram tends
+  to render inconsistently in dark mode) and makes the chat scannable.
+- The watchdog scripts already emit completion updates this way; any
+  dashboard-originated message that runs through the same delivery path
+  should match.
+
+**Approval prompts**
+- Approval-required actions still land in Telegram with two inline buttons
+  (✅ approve / ❌ deny). The dashboard does not yet trigger these — see the
+  Phase 8 delegation brief in `docs/prompts/phase8-telegram-loop.md` for the
+  design that closes the loop end-to-end.
+
+**Don't paste secrets**
+- Bot tokens and chat IDs live in `~<user>/.config/hermes/telegram` mode `600`,
+  never in repo files. The dashboard's `lib/logger.ts` redacts
+  `Authorization` / `Cookie` / `*.token` paths from any logged object so an
+  accidental `req.log.info({ tg })` won't dump credentials.
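
Review note on the numbered-emoji convention in the hunk above: a minimal TypeScript sketch of how a step prefix could be built. The helper names `stepEmoji` and `formatStep` are hypothetical and do not come from the repo or the watchdog scripts; the only behaviour taken from the doc is "prefix each step with `1️⃣` … `🔟`". Rejecting steps outside 1 to 10 is an assumption, since the doc does not say what happens past `🔟`.

```typescript
// Hypothetical helpers: neither function name exists in the repo.
// Keycap digits 1-9 are the ASCII digit followed by U+FE0F (variation
// selector-16) and U+20E3 (combining enclosing keycap).
const KEYCAP_SUFFIX = "\uFE0F\u20E3";
// "Keycap ten" is a single code point, U+1F51F, written here as a surrogate pair.
const KEYCAP_TEN = "\uD83D\uDD1F";

function stepEmoji(step: number): string {
  // Assumption: anything outside 1..10 is a caller bug, so fail loudly.
  if (Math.floor(step) !== step || step < 1 || step > 10) {
    throw new RangeError("step must be an integer from 1 to 10, got " + step);
  }
  return step === 10 ? KEYCAP_TEN : String(step) + KEYCAP_SUFFIX;
}

function formatStep(step: number, text: string): string {
  // One Telegram line per step: "<keycap> <text>".
  return stepEmoji(step) + " " + text;
}
```

Under these assumptions, `formatStep(2, "restart gateway")` yields the `2️⃣` keycap followed by a space and the step text.
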
--- a/docs/hermes_dashboard_v2_roadmap.md
+++ b/docs/hermes_dashboard_v2_roadmap.md
@@ -148,9 +148,11 @@ This is the biggest operational asymmetry and the reason half the ops-panel warn

 ## Phase 8 — Notifications & Telegram loop (G9)

- [ ] Push new dashboard-detected warnings to the correct Telegram (Vijay → root chat, Bheem → Uma chat), reusing the watchdog delivery path; silent on healthy.
- [ ] Validate the Telegram approval-prompt flow and media/file delivery end-to-end (the two unchecked v1 items).
- [ ] Preserve the numbered-emoji progress convention (`1️⃣`, `2️⃣`, …) for completion updates.
+> **Mostly VM ops + bot-token configuration**, with two small backend hooks. Full delegation brief in [`docs/prompts/phase8-telegram-loop.md`](./prompts/phase8-telegram-loop.md). The dashboard's documentation half is already done — see `docs/hermes-operations.md` "Telegram Notification Convention".
+
+- [ ] Push new dashboard-detected warnings to the correct Telegram (Vijay → root chat, Bheem → Uma chat), reusing the watchdog delivery path; silent on healthy. *(Design captured in the brief: `lib/dashboard-alerts.ts` writes new warnings to a tag-prefixed log; both watchdogs tail it. Implementation gated on Phase 4 (Uma watchdog must exist first) and on bot tokens.)*
+- [ ] Validate the Telegram approval-prompt flow and media/file delivery end-to-end (the two unchecked v1 items). *(Brief item 3.)*
+- [x] Preserve the numbered-emoji progress convention (`1️⃣`, `2️⃣`, …) for completion updates. *(Codified in `docs/hermes-operations.md` under a new "Telegram Notification Convention" section, alongside the routing-per-instance, silent-on-healthy, and never-paste-secrets rules. The brief references this as the source of truth so VM-side implementers stay consistent.)*

 ---

@@ -194,7 +196,7 @@ Update only with evidence (source review, tests, build output, or browser/VM ver
 - [x] Phase 5 — App/CI hardening (P0/P1/P2 done; P2 follow-ups in DEPLOYMENT.md mitigation roadmap remain)
 - [x] Phase 6 — UX polish (severity tags + deep links + per-instance actions; trend cards + theme toggle deferred)
 - [x] Phase 7 — Security & access (auth on hermes routes + privacy stance documented; redact_secrets/redact_pii decision deferred)
- [ ] Phase 8 — Notifications & Telegram
+- [ ] Phase 8 — Notifications & Telegram (convention codified; delivery loop is VM ops, see brief)

 ## Decisions (resolved 2026-05-30)

--- a/docs/prompts/phase8-telegram-loop.md
+++ b/docs/prompts/phase8-telegram-loop.md
@@ -0,0 +1,121 @@
+# Delegation Brief — Phase 8: Telegram notification loop
+
+> Self-contained task brief. Mostly **VM ops + bot-token configuration**, with
+> two small backend-side hooks. The dashboard has already done its half of
+> Phase 8 — see `docs/hermes-operations.md` "Telegram Notification Convention".
+>
+> Related: `docs/hermes_dashboard_v2_roadmap.md` (Phase 8),
+> `docs/hermes-operations.md`, `scripts/hermes-health-watchdog.py`,
+> `docs/prompts/phase4-bheem-uma-parity.md` (Bheem watchdog needs to exist
+> first).
+
+---
+
+ROLE: Operator with sudo on the Hostinger VM and admin access to both Telegram
+bots (root + Uma).
+
+OBJECTIVE: Close the loop between dashboard-detected warnings and the
+existing watchdog Telegram delivery path so that:
+
+1. New warnings that the dashboard surfaces in `getHermesOpsSnapshot()` (and
+   in the per-instance telemetry endpoint) reach the right Telegram chat
+   (Vijay → root; Bheem → Uma).
+2. Approval-required actions (currently only the watchdog uses these) work
+   end-to-end including media/file delivery — these are the two unchecked
+   items left over from Hermes v1.
+3. The numbered-emoji progress convention is preserved.
+
+PREREQUISITES:
+- Phase 4 (Bheem/Uma parity) must be complete so Uma has its own watchdog +
+  bot. Without Uma's bot, Bheem warnings have nowhere to go. Don't start
+  Phase 8 until the Phase 4 brief signs off.
+- Root + Uma watchdog scripts already deliver to Telegram successfully on a
+  manually-broken probe. Confirm before proceeding.
+
+DESIGN (least-invasive, no new long-lived service):
+
+The dashboard does NOT open its own Telegram connection. Instead, the
+backend writes new dashboard-detected warnings to a small append-only log
+that the existing watchdog tails.
+
+- Path: `/var/log/hermes-dashboard-warnings.log` (root-writeable;
+  world-readable so both watchdogs can tail).
+- Format: one line per warning, RFC3339-ish timestamp + severity token +
+  message — same shape as `hermes-health-watchdog.log`. Reuse the parser in
+  `backend/src/modules/hermes-telemetry/repository.ts:WATCHDOG_LINE`.
+- Routing: each line carries an explicit `instance=<vijay|bheem|all>` tag
+  so the watchdog knows which Telegram bot to use. `instance=all` posts to
+  both chats (cross-cutting).
+- De-dup: the dashboard backend keeps a 1h in-memory hash of recent
+  warnings and only appends each one once. Restart resets the hash — that's
+  fine; an alert reappearing post-restart is signal, not noise.
+
+TASKS:
+
+1. **Backend hook** (small):
+   - Add `lib/dashboard-alerts.ts` that exposes
+     `appendDashboardWarning({ severity, instance, message })`. Internals:
+     append + dedup hash. Tests should mock `fs.promises.appendFile`.
+   - Wire it into `getHermesOpsSnapshot()` so each new warning in
+     `snapshot.warnings` (only the ones not in the dedup hash) is written
+     out. Same wiring on `getHermesTelemetrySnapshot()` for `warnings` and
+     `watchdog.alerts` of severity `critical`.
+   - Gate the file-write behind an env flag `HERMES_DASHBOARD_ALERT_LOG`
+     pointing at the path so dev/CI doesn't try to write to `/var/log`.
+   - Unit-test: `appendDashboardWarning` de-dups within the 1h window,
+     expires old entries, and writes the right line format. Add to the coverage gate.
+
+2. **Watchdog tail-extension** (VM ops):
+   - Modify both watchdog scripts (root + Uma's mirror) to ALSO tail the
+     new dashboard-warnings log. Filter by `instance=` tag — root's
+     watchdog only acts on `instance=vijay` or `instance=all`; Uma's only
+     on `instance=bheem` or `instance=all`.
+   - Forward each parsed line into the existing Telegram delivery (same
+     format / same numbered-emoji convention). Silent on no-new-lines.
+
+3. **Approval-prompt + media validation** (the two unchecked v1 items):
+   - Pick a Telegram approval-required action that already exists in the
+     watchdog (e.g. "restart degraded gateway"). Confirm the inline ✅/❌
+     buttons land, the callback hits the watchdog, and the action runs.
+   - Confirm the watchdog can deliver a small file (e.g. last 200 lines of
+     a log) when an alert says "investigate", and that the file lands as
+     a Telegram document, not a truncated message.
+   - Document both flows in `docs/hermes-operations.md` under "Telegram
+     Notification Convention" so anyone reading it knows what's wired.
+
+4. **End-to-end test**:
+   - From the dashboard, trigger a transient warning (e.g. stop a
+     non-critical timer for 30 seconds).
+   - Confirm the right Telegram chat receives one alert (numbered-emoji
+     formatted) and one recovery message when the timer comes back. The
+     OTHER chat must stay silent.
+   - Repeat for Bheem.
+   - Repeat with `instance=all` (cross-cutting) and confirm BOTH chats
+     receive the alert.
+
+GUARDRAILS:
+- Bot tokens never go in repo files. They live in
+  `~<user>/.config/hermes/telegram` mode `600`, owned by the right user.
+- The dashboard backend only WRITES to the alert log. It must NOT call
+  Telegram directly (that would split the delivery path and create two
+  places where rate limits / token rotation matter).
+- Don't emit chatty health pings — silent-on-success is the rule.
+- Numbered-emoji convention is mandatory for completion-update messages
+  (`1️⃣`, `2️⃣`, …); see `docs/hermes-operations.md`.
+
+REPORTING:
+When finished, report:
+- Diff of `lib/dashboard-alerts.ts` and the wiring sites.
+- Output of `tail -20 /var/log/hermes-dashboard-warnings.log` after the
+  end-to-end test.
+- Screenshots / chat-export of the test alerts in both Telegram chats
+  (sanitized).
+- Updated `docs/hermes-operations.md` "Telegram Notification Convention"
+  section with the wired approval + media flows.
+
+DEFINITION OF DONE:
+- Dashboard-detected warnings reach the right Telegram chat per instance.
+- Recoveries reach the same chat.
+- Approval prompt + file delivery validated end-to-end.
+- Numbered-emoji convention preserved.
+- Operator (you) ticks the corresponding Phase 8 roadmap checkboxes.
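
Review note on the brief's DESIGN and TASKS 1-2: the sketch below shows one way the dedup-then-append hook and the watchdog-side `instance=` filter could fit together. It is not the implementation. Only the `{ severity, instance, message }` argument shape, the `vijay|bheem|all` tag values, and the 1h window come from the brief. The class name, the helper names, the severity tokens, and the exact line layout are assumptions (the brief only says "RFC3339-ish timestamp + severity token + message"). The sink is injected so tests need no filesystem; in the backend it would presumably wrap `fs.promises.appendFile` on the path from `HERMES_DASHBOARD_ALERT_LOG`. The filter is written in TypeScript for consistency even though the real watchdog is a Python script.

```typescript
// Sketch only: names and line layout are assumptions, see the note above.
type Instance = "vijay" | "bheem" | "all";

interface DashboardWarning {
  severity: string; // e.g. "warning" or "critical" (assumed tokens)
  instance: Instance;
  message: string;
}

const DEDUP_WINDOW_MS = 60 * 60 * 1000; // the brief's 1h window

function formatWarningLine(w: DashboardWarning, atMs: number): string {
  // Collapse whitespace so each warning stays on exactly one log line.
  const message = w.message.replace(/\s+/g, " ");
  return new Date(atMs).toISOString() + " " + w.severity.toUpperCase() +
    " instance=" + w.instance + " " + message;
}

class DashboardAlertLog {
  // key -> epoch ms of the last write; in-memory, so a restart resets it.
  private seen: { [key: string]: number } = {};
  private sink: (line: string) => void;
  private now: () => number;

  constructor(sink: (line: string) => void, now?: () => number) {
    this.sink = sink;
    this.now = now || Date.now;
  }

  // Returns true if a line was written, false if it was de-duplicated.
  append(w: DashboardWarning): boolean {
    const t = this.now();
    const keys = Object.keys(this.seen);
    for (let i = 0; i < keys.length; i++) {
      // Forget entries once the window has fully elapsed.
      if (t - this.seen[keys[i]] >= DEDUP_WINDOW_MS) delete this.seen[keys[i]];
    }
    const key = w.instance + "|" + w.severity + "|" + w.message;
    if (Object.prototype.hasOwnProperty.call(this.seen, key)) return false;
    this.seen[key] = t;
    this.sink(formatWarningLine(w, t));
    return true;
  }
}

// Watchdog-side rule from TASK 2: act on your own instance or on "all".
function watchdogShouldForward(line: string, own: "vijay" | "bheem"): boolean {
  const match = /\binstance=(vijay|bheem|all)\b/.exec(line);
  if (match === null) return false; // untagged lines are ignored
  return match[1] === own || match[1] === "all";
}
```

Keying the dedup entry on instance, severity, and message means the same text raised for a different instance is still written once per instance, which matches the per-chat routing the brief asks for.
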