bytelyst-devops-tools/docs/prompts/phase8-telegram-loop.md
Hermes VM a8cf61a281 docs: Phase 8 — Telegram convention + delegation brief
Closes the Phase 8 line that's actually a docs/codebase change. The
other two Phase 8 items are VM-ops work (bot tokens + watchdog
extensions) and live as a delegation brief.

What's in this repo
  - `docs/hermes-operations.md` gains a "Telegram Notification
    Convention" section codifying:
      * routing per instance (Vijay → root chat, Bheem → Uma chat,
        cross-cutting → root)
      * silent-on-healthy + post-on-recovery
      * the numbered-emoji progress convention (`1️⃣`, `2️⃣`, …) and
        why it survives Telegram client rendering
      * approval-prompt UI expectation
      * "don't paste secrets" pointer back to `lib/logger.ts`'s
        redaction path-list
  - `docs/prompts/phase8-telegram-loop.md` — full delegation brief
    for the VM-side implementation. Design: dashboard backend writes
    new warnings (with `instance=<id>` tag, deduped over 1h) to an
    append-only log; both watchdogs tail it and route through the
    existing Telegram delivery path. Avoids splitting the delivery
    code into two places that would each need rate-limit + token-
    rotation handling. Brief is gated on Phase 4 — Uma's watchdog
    must exist first.
  - Roadmap Phase 8 ticked for "preserve numbered-emoji convention"
    (codified in operations doc); the other two items have notes
    pointing at the brief.

Phase 8 doesn't fully close in this repo because the delivery loop
needs real bot tokens and the Phase 4 Uma watchdog before it can be
end-to-end validated. The codebase's contribution is everything that
doesn't need a token: the convention, the design, and the delegation
brief.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-30 08:05:52 +00:00


Delegation Brief — Phase 8: Telegram notification loop

Self-contained task brief. Mostly VM ops + bot-token configuration, with two small backend-side hooks. The dashboard has already done its half of Phase 8 — see docs/hermes-operations.md "Telegram Notification Convention".

Related: docs/hermes_dashboard_v2_roadmap.md (Phase 8), docs/hermes-operations.md, scripts/hermes-health-watchdog.py, docs/prompts/phase4-bheem-uma-parity.md (Uma's watchdog for Bheem needs to exist first).


ROLE: Operator with sudo on the Hostinger VM and admin access to both Telegram bots (root + Uma).

OBJECTIVE: Close the loop between dashboard-detected warnings and the existing watchdog Telegram delivery path so that:

  1. New warnings that the dashboard surfaces in getHermesOpsSnapshot() (and in the per-instance telemetry endpoint) reach the right Telegram chat (Vijay → root; Bheem → Uma).
  2. Approval-required actions (currently only the watchdog uses these) work end-to-end including media/file delivery — these are the two unchecked items left over from Hermes v1.
  3. The numbered-emoji progress convention is preserved.

PREREQUISITES:

  • Phase 4 (Bheem/Uma parity) must be complete so Uma has its own watchdog + bot. Without Uma's bot, Bheem warnings have nowhere to go. Don't start Phase 8 until the Phase 4 brief signs off.
  • Root + Uma watchdog scripts already deliver to Telegram successfully on a manually-broken probe. Confirm before proceeding.

DESIGN (least-invasive, no new long-lived service):

The dashboard does NOT open its own Telegram connection. Instead, the backend writes new dashboard-detected warnings to a small append-only log that the existing watchdog tails.

  • Path: /var/log/hermes-dashboard-warnings.log (root-writeable; world-readable so both watchdogs can tail).
  • Format: one line per warning, RFC3339-ish timestamp + severity token + message — same shape as hermes-health-watchdog.log. Reuse the parser in backend/src/modules/hermes-telemetry/repository.ts:WATCHDOG_LINE.
  • Routing: each line carries an explicit instance=<vijay|bheem|all> tag so the watchdog knows which Telegram bot to use. instance=all posts to both chats (cross-cutting).
  • De-dup: the dashboard backend keeps a 1h in-memory hash of recent warnings and only appends each one once. Restart resets the hash — that's fine; an alert reappearing post-restart is signal, not noise.
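
The line format and de-dup rule above can be sketched as follows. This is a minimal illustration in Python, not the backend implementation: the brief places the real code in lib/dashboard-alerts.ts, and the exact field order of a log line is an assumption here (the brief only says it matches hermes-health-watchdog.log). The names `format_warning_line` and `DedupWindow` are hypothetical.

```python
import hashlib
from datetime import datetime, timezone

DEDUP_WINDOW_SECONDS = 3600  # the 1h window from the design


def format_warning_line(ts: datetime, severity: str, instance: str, message: str) -> str:
    """Render one log line. The field order is an assumption, not the confirmed format."""
    if instance not in ("vijay", "bheem", "all"):
        raise ValueError(f"unknown instance: {instance}")
    stamp = ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Collapse newlines and runs of whitespace so each warning stays on one line.
    one_line = " ".join(message.split())
    return f"{stamp} {severity} instance={instance} {one_line}\n"


class DedupWindow:
    """In-memory map of warning hash to first-seen time; lost on restart by design."""

    def __init__(self, window_seconds: float = DEDUP_WINDOW_SECONDS) -> None:
        self.window_seconds = window_seconds
        self._seen: dict[str, float] = {}

    def should_emit(self, severity: str, instance: str, message: str, now: float) -> bool:
        # Drop entries older than the window so memory stays bounded.
        self._seen = {
            key: seen_at
            for key, seen_at in self._seen.items()
            if now - seen_at < self.window_seconds
        }
        key = hashlib.sha256(f"{severity}|{instance}|{message}".encode()).hexdigest()
        if key in self._seen:
            return False
        self._seen[key] = now
        return True
```

Keying the hash on severity, instance, and message together means the same message for a different instance is treated as a separate warning; whether that is the desired behaviour is a design choice the brief leaves open.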

TASKS:

  1. Backend hook (small):

    • Add lib/dashboard-alerts.ts that exposes appendDashboardWarning({ severity, instance, message }). Internals: append + dedup hash. Tests should mock fs.promises.appendFile.
    • Wire it into getHermesOpsSnapshot() so each new warning in snapshot.warnings (only the ones not in the dedup hash) is written out. Same wiring on getHermesTelemetrySnapshot() for warnings and watchdog.alerts of severity critical.
    • Gate the file-write behind an env flag HERMES_DASHBOARD_ALERT_LOG pointing at the path so dev/CI doesn't try to write to /var/log.
    • Unit-test: appendDashboardWarning de-dups within the window, expires entries once the window has passed, and writes the right line format. Add to the coverage gate.
  2. Watchdog tail-extension (VM ops):

    • Modify both watchdog scripts (root + Uma's mirror) to ALSO tail the new dashboard-warnings log. Filter by instance= tag — root's watchdog only acts on instance=vijay or instance=all; Uma's only on instance=bheem or instance=all.
    • Forward each parsed line into the existing Telegram delivery (same format / same numbered-emoji convention). Silent on no-new-lines.
  3. Approval-prompt + media validation (the two unchecked v1 items):

    • Pick a Telegram approval-required action that already exists in the watchdog (e.g. "restart degraded gateway"). Confirm the inline buttons land, the callback hits the watchdog, and the action runs.
    • Confirm the watchdog can deliver a small file (e.g. last 200 lines of a log) when an alert says "investigate", and that the file lands as a Telegram document, not a truncated message.
    • Document both flows in docs/hermes-operations.md under "Telegram Notification Convention" so anyone reading it knows what's wired.
  4. End-to-end test:

    • From the dashboard, trigger a transient warning (e.g. stop a non-critical timer for 30 seconds).
    • Confirm the right Telegram chat receives one alert (numbered-emoji formatted) and one recovery message when the timer comes back. The OTHER chat must stay silent.
    • Repeat for Bheem.
    • Repeat with instance=all (cross-cutting) and confirm BOTH chats receive the alert.
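
The watchdog-side filtering in task 2 could look roughly like the sketch below. It assumes the tag appears literally as `instance=<vijay|bheem|all>` somewhere in the line, and that the watchdog remembers a byte offset between runs; `should_handle` and `read_new_lines` are hypothetical names, not functions from the existing watchdog script.

```python
import re

# Assumed tag syntax, taken from the routing rule in the design section.
INSTANCE_TAG = re.compile(r"\binstance=(vijay|bheem|all)\b")


def should_handle(line: str, own_instance: str) -> bool:
    """True when this watchdog should forward the line to its Telegram chat."""
    match = INSTANCE_TAG.search(line)
    if match is None:
        return False  # untagged lines are ignored rather than guessed at
    return match.group(1) in (own_instance, "all")


def read_new_lines(path: str, offset: int) -> tuple[list[str], int]:
    """Return complete lines appended since `offset`, plus the new offset."""
    with open(path, "rb") as fh:
        fh.seek(0, 2)  # jump to end of file to learn its current size
        size = fh.tell()
        if size < offset:
            offset = 0  # file was truncated or rotated; start over
        fh.seek(offset)
        data = fh.read()
    # Consume only up to the last newline so a half-written line is retried later.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

Root's watchdog would call `should_handle(line, "vijay")` and Uma's `should_handle(line, "bheem")`; a run that returns no new lines sends nothing, which keeps the silent-on-no-new-lines rule.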

GUARDRAILS:

  • Bot tokens never go in repo files. They live in ~<user>/.config/hermes/telegram, mode 600, owned by the right user.
  • The dashboard backend only WRITES to the alert log. It must NOT call Telegram directly (that would split the delivery path and create two places where rate limits / token rotation matter).
  • Don't emit chatty health pings — silent-on-success is the rule.
  • Numbered-emoji convention is mandatory for completion-update messages (1️⃣, 2️⃣, …); see docs/hermes-operations.md.
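
The token-file guardrail can be checked mechanically before the end-to-end test. The following is a small sketch, assuming a POSIX host; `token_file_problems` is a hypothetical helper, and the caller supplies the path and the UID that should own the file.

```python
import os
import stat


def token_file_problems(path: str, expected_uid: int) -> list[str]:
    """Return a list of guardrail violations for a bot-token file (empty means OK)."""
    st = os.stat(path)
    problems: list[str] = []
    mode = stat.S_IMODE(st.st_mode)  # permission bits only, without the file-type bits
    if mode != 0o600:
        problems.append(f"mode is {mode:o}, expected 600")
    if st.st_uid != expected_uid:
        problems.append(f"owner uid is {st.st_uid}, expected {expected_uid}")
    return problems
```

An empty list means the file satisfies the mode and ownership rule; anything else is worth fixing before a token is written to the file.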

REPORTING: When finished, report:

  • Diff of lib/dashboard-alerts.ts and the wiring sites.
  • Output of tail -20 /var/log/hermes-dashboard-warnings.log after the end-to-end test.
  • Screenshots / chat-export of the test alerts in both Telegram chats (sanitized).
  • Updated docs/hermes-operations.md "Telegram Notification Convention" section with the wired approval + media flows.

DEFINITION OF DONE:

  • Dashboard-detected warnings reach the right Telegram chat per instance.
  • Recoveries reach the same chat.
  • Approval prompt + file delivery validated end-to-end.
  • Numbered-emoji convention preserved.
  • Operator (you) ticks the corresponding Phase 8 roadmap checkboxes.