bytelyst-devops-tools/docs/prompts/phase8-telegram-loop.md

# Delegation Brief — Phase 8: Telegram notification loop

> Self-contained task brief. Mostly **VM ops + bot-token configuration**, with
> two small backend-side hooks. The dashboard has already done its half of
> Phase 8 — see `docs/hermes-operations.md` "Telegram Notification Convention".
>
> Related: `docs/hermes_dashboard_v2_roadmap.md` (Phase 8),
> `docs/hermes-operations.md`, `scripts/hermes-health-watchdog.py`,
> `docs/prompts/phase4-bheem-uma-parity.md` (Bheem watchdog needs to exist
> first).

---

ROLE: Operator with sudo on the Hostinger VM and admin access to both Telegram
bots (root + Uma).

OBJECTIVE: Close the loop between dashboard-detected warnings and the
existing watchdog Telegram delivery path so that:

1. New warnings that the dashboard surfaces in `getHermesOpsSnapshot()` (and
   in the per-instance telemetry endpoint) reach the right Telegram chat
   (Vijay → root; Bheem → Uma).
2. Approval-required actions (currently only the watchdog uses these) work
   end-to-end including media/file delivery — these are the two unchecked
   items left over from Hermes v1.
3. The numbered-emoji progress convention is preserved.

PREREQUISITES:
- Phase 4 (Bheem/Uma parity) must be complete so Uma has its own watchdog +
  bot. Without Uma's bot, Bheem warnings have nowhere to go. Don't start
  Phase 8 until the Phase 4 brief signs off.
- The root and Uma watchdog scripts must already deliver to Telegram when a
  probe is broken by hand. Confirm this before proceeding.

DESIGN (least-invasive, no new long-lived service):

The dashboard does NOT open its own Telegram connection. Instead, the
backend writes new dashboard-detected warnings to a small append-only log
that the existing watchdog tails.

- Path: `/var/log/hermes-dashboard-warnings.log` (root-writeable;
  world-readable so both watchdogs can tail it).
- Format: one line per warning, RFC3339-ish timestamp + severity token +
  message — same shape as `hermes-health-watchdog.log`. Reuse the parser in
  `backend/src/modules/hermes-telemetry/repository.ts:WATCHDOG_LINE`.
- Routing: each line carries an explicit `instance=<vijay|bheem|all>` tag
  so the watchdog knows which Telegram bot to use. `instance=all` posts to
  both chats (cross-cutting).
- De-dup: the dashboard backend keeps an in-memory set of hashes of the
  warnings seen in the last hour and appends each warning only once per
  window. A restart clears the set, which is fine: an alert reappearing
  post-restart is signal, not noise.
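
The routing rule above can be sketched from the watchdog's side. This is a minimal illustration, not the real parser: the exact line shape is whatever `WATCHDOG_LINE` accepts, and the layout assumed here (timestamp, severity token, `instance=` tag, then free-text message) as well as the names `LINE_RE`, `parse_line`, and `should_forward` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# ASSUMED line shape, for illustration only:
#   2025-01-01T12:00:00Z WARN instance=vijay gateway latency above threshold
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<severity>[A-Za-z]+)\s+"
    r"instance=(?P<instance>vijay|bheem|all)\s+(?P<message>.+)$"
)


class DashboardWarning(NamedTuple):
    ts: str
    severity: str
    instance: str
    message: str


def parse_line(line: str) -> Optional[DashboardWarning]:
    """Parse one log line; return None for anything that does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return DashboardWarning(
        m["ts"], m["severity"].lower(), m["instance"], m["message"]
    )


def should_forward(warning: DashboardWarning, own_instance: str) -> bool:
    """A watchdog acts on its own instance tag and on the cross-cutting 'all'."""
    return warning.instance in (own_instance, "all")
```

Under these assumptions, root's watchdog would call `should_forward(w, "vijay")` and Uma's `should_forward(w, "bheem")`; a line tagged `instance=all` passes both checks, and an unparseable line is dropped rather than forwarded.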

TASKS:

1. **Backend hook** (small):
   - Add `lib/dashboard-alerts.ts` that exposes
     `appendDashboardWarning({ severity, instance, message })`. Internals:
     append + dedup hash. Tests should mock `fs.promises.appendFile`.
   - Wire it into `getHermesOpsSnapshot()` so each new warning in
     `snapshot.warnings` (only the ones not in the dedup hash) is written
     out. Same wiring on `getHermesTelemetrySnapshot()` for `warnings` and
     `watchdog.alerts` of severity `critical`.
   - Gate the file write behind the env var `HERMES_DASHBOARD_ALERT_LOG`,
     which holds the log path. When it is unset, skip the write so dev/CI
     never tries to write to `/var/log`.
   - Unit-test: `appendDashboardWarning` de-dups within the window, writes
     a warning again once its entry has expired, and emits the right line
     format. Add to the coverage gate.

2. **Watchdog tail-extension** (VM ops):
   - Modify both watchdog scripts (root + Uma's mirror) to ALSO tail the
     new dashboard-warnings log. Filter by `instance=` tag — root's
     watchdog only acts on `instance=vijay` or `instance=all`; Uma's only
     on `instance=bheem` or `instance=all`.
   - Forward each parsed line into the existing Telegram delivery (same
     format / same numbered-emoji convention). Silent on no-new-lines.

3. **Approval-prompt + media validation** (the two unchecked v1 items):
   - Pick a Telegram approval-required action that already exists in the
     watchdog (e.g. "restart degraded gateway"). Confirm the inline ✅/❌
     buttons land, the callback hits the watchdog, and the action runs.
   - Confirm the watchdog can deliver a small file (e.g. last 200 lines of
     a log) when an alert says "investigate", and that the file lands as
     a Telegram document, not a truncated message.
   - Document both flows in `docs/hermes-operations.md` under "Telegram
     Notification Convention" so anyone reading it knows what's wired.

4. **End-to-end test**:
   - From the dashboard, trigger a transient warning (e.g. stop a
     non-critical timer for 30 seconds).
   - Confirm the right Telegram chat receives one alert (numbered-emoji
     formatted) and one recovery message when the timer comes back. The
     OTHER chat must stay silent.
   - Repeat for Bheem.
   - Repeat with `instance=all` (cross-cutting) and confirm BOTH chats
     receive the alert.
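
For task 2, one way the watchdog could pick up only the lines appended since its last run is to remember a byte offset between runs. The sketch below assumes the watchdog runs periodically and can persist that offset somewhere; `read_new_lines` and `lines_for` are hypothetical helpers, not functions that exist in `hermes-health-watchdog.py`.

```python
import os
from typing import List, Tuple


def read_new_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Return complete lines appended after `offset`, plus the new offset."""
    try:
        size = os.path.getsize(path)
    except FileNotFoundError:
        return [], 0  # log not created yet: nothing to forward
    if size < offset:
        offset = 0  # file was truncated or rotated; start from the top
    with open(path, "rb") as fh:
        fh.seek(offset)
        chunk = fh.read()
    # Consume only up to the last newline so a half-written line is
    # picked up whole on the next run instead of being forwarded twice.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def lines_for(lines: List[str], own_instance: str) -> List[str]:
    """Keep lines tagged for this watchdog's instance or for 'all'."""
    wanted = (f"instance={own_instance}", "instance=all")
    return [ln for ln in lines if any(tag in ln.split() for tag in wanted)]
```

When `read_new_lines` returns an empty list the watchdog sends nothing, which keeps the silent-on-no-new-lines behaviour. Each surviving line would then go through the existing Telegram delivery unchanged.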

GUARDRAILS:
- Bot tokens never go in repo files. They live in
  `~<user>/.config/hermes/telegram`, mode `600`, owned by the right user.
- The dashboard backend only WRITES to the alert log. It must NOT call
  Telegram directly (that would split the delivery path and create two
  places where rate limits / token rotation matter).
- Don't emit chatty health pings — silent-on-success is the rule.
- Numbered-emoji convention is mandatory for completion-update messages
  (`1️⃣`, `2️⃣`, …); see `docs/hermes-operations.md`.
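
The token-file guardrail can be enforced at load time. The following is a sketch under two assumptions: that the file contains only the token (the real file may use a different layout), and that the watchdog runs as the user who owns it. `read_bot_token` is a hypothetical helper.

```python
import os
import stat


def read_bot_token(path: str) -> str:
    """Read a bot token, refusing files that are not mode 600 or not ours."""
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)  # permission bits only
    if mode != 0o600:
        raise PermissionError(f"{path} has mode {mode:o}, expected 600")
    if st.st_uid != os.getuid():
        raise PermissionError(f"{path} is not owned by the current user")
    # ASSUMPTION: the file holds just the token, possibly with a newline.
    with open(path, "r", encoding="utf-8") as fh:
        token = fh.read().strip()
    if not token:
        raise ValueError(f"{path} is empty")
    return token
```

Failing loudly here means a token file with loose permissions is caught on the first watchdog run instead of going unnoticed.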

REPORTING:
When finished, report:
- Diff of `lib/dashboard-alerts.ts` and the wiring sites.
- Output of `tail -20 /var/log/hermes-dashboard-warnings.log` after the
  end-to-end test.
- Screenshots / chat-export of the test alerts in both Telegram chats
  (sanitized).
- Updated `docs/hermes-operations.md` "Telegram Notification Convention"
  section with the wired approval + media flows.

DEFINITION OF DONE:
- Dashboard-detected warnings reach the right Telegram chat per instance.
- Recoveries reach the same chat.
- Approval prompt + file delivery validated end-to-end.
- Numbered-emoji convention preserved.
- Operator (you) ticks the corresponding Phase 8 roadmap checkboxes.