bytelyst-devops-tools/docs/prompts/phase8-telegram-loop.md
Hermes VM a8cf61a281 docs: Phase 8 — Telegram convention + delegation brief
Closes the Phase 8 line that's actually a docs/codebase change. The
other two Phase 8 items are VM-ops work (bot tokens + watchdog
extensions) and live as a delegation brief.

What's in this repo
  - `docs/hermes-operations.md` gains a "Telegram Notification
    Convention" section codifying:
      * routing per instance (Vijay → root chat, Bheem → Uma chat,
        cross-cutting → root)
      * silent-on-healthy + post-on-recovery
      * the numbered-emoji progress convention (`1️⃣`, `2️⃣`, …) and
        why it survives Telegram client rendering
      * approval-prompt UI expectation
      * "don't paste secrets" pointer back to `lib/logger.ts`'s
        redaction path-list
  - `docs/prompts/phase8-telegram-loop.md` — full delegation brief
    for the VM-side implementation. Design: dashboard backend writes
    new warnings (with `instance=<id>` tag, deduped over 1h) to an
    append-only log; both watchdogs tail it and route through the
    existing Telegram delivery path. Avoids splitting the delivery
    code into two places that would each need rate-limit + token-
    rotation handling. Brief is gated on Phase 4 — Uma's watchdog
    must exist first.
  - Roadmap Phase 8 ticked for "preserve numbered-emoji convention"
    (codified in operations doc); the other two items have notes
    pointing at the brief.

Phase 8 doesn't fully close in this repo because the delivery loop
needs real bot tokens and the Phase 4 Uma watchdog before it can be
end-to-end validated. The codebase's contribution is everything that
doesn't need a token: the convention, the design, and the delegation
brief.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-30 08:05:52 +00:00


# Delegation Brief — Phase 8: Telegram notification loop
> Self-contained task brief. Mostly **VM ops + bot-token configuration**, with
> two small backend-side hooks. The dashboard has already done its half of
> Phase 8 — see `docs/hermes-operations.md` "Telegram Notification Convention".
>
> Related: `docs/hermes_dashboard_v2_roadmap.md` (Phase 8),
> `docs/hermes-operations.md`, `scripts/hermes-health-watchdog.py`,
> `docs/prompts/phase4-bheem-uma-parity.md` (Uma's watchdog needs to exist
> first).

---

ROLE: Operator with sudo on the Hostinger VM and admin access to both Telegram
bots (root + Uma).

OBJECTIVE: Close the loop between dashboard-detected warnings and the
existing watchdog Telegram delivery path so that:

1. New warnings that the dashboard surfaces in `getHermesOpsSnapshot()` (and
   in the per-instance telemetry endpoint) reach the right Telegram chat
   (Vijay → root; Bheem → Uma).
2. Approval-required actions (currently only the watchdog uses these) work
   end-to-end, including media/file delivery. These are the two unchecked
   items left over from Hermes v1.
3. The numbered-emoji progress convention is preserved.

PREREQUISITES:

- Phase 4 (Bheem/Uma parity) must be complete so Uma has its own watchdog +
  bot. Without Uma's bot, Bheem warnings have nowhere to go. Don't start
  Phase 8 until the Phase 4 brief signs off.
- The root and Uma watchdog scripts must already deliver to Telegram
  successfully when a probe is broken manually. Confirm this before
  proceeding.

DESIGN (least-invasive, no new long-lived service):

The dashboard does NOT open its own Telegram connection. Instead, the
backend writes new dashboard-detected warnings to a small append-only log
that the existing watchdogs tail.

- Path: `/var/log/hermes-dashboard-warnings.log` (root-writable and
  world-readable so both watchdogs can tail it).
- Format: one line per warning, made of an RFC3339-ish timestamp, a severity
  token and the message, in the same shape as `hermes-health-watchdog.log`.
  Reuse the parser in
  `backend/src/modules/hermes-telemetry/repository.ts:WATCHDOG_LINE`.
- Routing: each line carries an explicit `instance=<vijay|bheem|all>` tag
  so the watchdog knows which Telegram bot to use. `instance=all` posts to
  both chats (cross-cutting).
- De-dup: the dashboard backend keeps a 1h in-memory hash of recent
  warnings and appends a given warning only once per window. A restart
  resets the hash, which is fine: an alert that reappears after a restart is
  signal, not noise.
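
The line format and routing rule above can be sketched in TypeScript. This is a minimal sketch under stated assumptions. The field order, the severity tokens (`INFO`/`WARN`/`CRITICAL`) and the regex `LINE_RE` are hypothetical placeholders for whatever `WATCHDOG_LINE` in `repository.ts` actually accepts. `formatWarningLine`, `parseWarningLine` and `shouldHandle` are illustrative names, not existing functions. The routing predicate follows the brief: root handles `vijay` and `all`, and Uma handles `bheem` and `all`. The real watchdogs are Python (`scripts/hermes-health-watchdog.py`), so the same check would need to be ported there.

```typescript
// Hypothetical line shape: "<timestamp> <SEVERITY> instance=<id> <message>".
// The real field order must match what WATCHDOG_LINE in repository.ts parses.
type Instance = "vijay" | "bheem" | "all";
type Severity = "INFO" | "WARN" | "CRITICAL"; // assumed severity tokens

interface DashboardWarning {
  timestamp: string; // RFC3339-style, e.g. new Date().toISOString()
  severity: Severity;
  instance: Instance;
  message: string;
}

// Assumed pattern for illustration only; NOT the real WATCHDOG_LINE regex.
const LINE_RE = /^(\S+) (INFO|WARN|CRITICAL) instance=(vijay|bheem|all) (.+)$/;

function formatWarningLine(w: DashboardWarning): string {
  // Collapse embedded newlines so one warning is always exactly one log line.
  const message = w.message.replace(/\s*\n\s*/g, " ");
  return `${w.timestamp} ${w.severity} instance=${w.instance} ${message}`;
}

function parseWarningLine(line: string): DashboardWarning | null {
  const m = LINE_RE.exec(line.trim());
  if (m === null) return null; // not a dashboard-warning line
  return {
    timestamp: m[1],
    severity: m[2] as Severity,
    instance: m[3] as Instance,
    message: m[4],
  };
}

// Routing rule from the brief: root handles vijay|all, Uma handles bheem|all.
function shouldHandle(watchdog: "root" | "uma", instance: Instance): boolean {
  if (instance === "all") return true; // cross-cutting: both chats
  return watchdog === "root" ? instance === "vijay" : instance === "bheem";
}
```

The sketch places `instance=` at a fixed position ahead of the free-text message. As a result, a message that happens to contain the string `instance=` cannot change where the line is routed.
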

TASKS:

1. **Backend hook** (small):
   - Add `lib/dashboard-alerts.ts` that exposes
     `appendDashboardWarning({ severity, instance, message })`. Internals:
     append + dedup hash. Tests should mock `fs.promises.appendFile`.
   - Wire it into `getHermesOpsSnapshot()` so each new warning in
     `snapshot.warnings` (only the ones not already in the dedup hash) is
     written out. Wire `getHermesTelemetrySnapshot()` the same way, for its
     `warnings` and for any `watchdog.alerts` entries of severity `critical`.
   - Gate the file write behind an env var, `HERMES_DASHBOARD_ALERT_LOG`,
     that holds the log path, so dev/CI (where it is unset) never tries to
     write to `/var/log`.
   - Unit-test that `appendDashboardWarning` de-dups within the window, lets
     entries expire after it, and writes the right line format. Add the
     module to the coverage gate.
2. **Watchdog tail-extension** (VM ops):
   - Modify both watchdog scripts (root + Uma's mirror) to ALSO tail the
     new dashboard-warnings log. Filter by the `instance=` tag: root's
     watchdog only acts on `instance=vijay` or `instance=all`; Uma's only
     on `instance=bheem` or `instance=all`.
   - Forward each parsed line into the existing Telegram delivery (same
     format, same numbered-emoji convention). Stay silent when there are
     no new lines.
3. **Approval-prompt + media validation** (the two unchecked v1 items):
   - Pick a Telegram approval-required action that already exists in the
     watchdog (e.g. "restart degraded gateway"). Confirm the inline ✅/❌
     buttons land, the callback hits the watchdog, and the action runs.
   - Confirm the watchdog can deliver a small file (e.g. the last 200 lines
     of a log) when an alert says "investigate", and that the file lands as
     a Telegram document, not a truncated message.
   - Document both flows in `docs/hermes-operations.md` under "Telegram
     Notification Convention" so anyone reading it knows what's wired.
4. **End-to-end test**:
   - Trigger a transient warning that the dashboard detects (e.g. stop a
     non-critical timer for 30 seconds).
   - Confirm the right Telegram chat receives one alert (numbered-emoji
     formatted) and one recovery message when the timer comes back. The
     OTHER chat must stay silent.
   - Repeat for Bheem.
   - Repeat with `instance=all` (cross-cutting) and confirm BOTH chats
     receive the alert.
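
The following TypeScript is a minimal sketch of the task 1 backend hook. It assumes the line shape `<ISO timestamp> <SEVERITY> instance=<id> <message>`, which is hypothetical and should be aligned with `WATCHDOG_LINE`. Only three details come from the brief: the function name, the `HERMES_DASHBOARD_ALERT_LOG` env var, and the 1h window. The rest is illustrative:

- The SHA-256 dedup key.
- The severity tokens.
- The extra `now` and `appendFile` parameters, which let a test inject a clock and a fake writer.

```typescript
import { promises as fs } from "node:fs";
import { createHash } from "node:crypto";

type Instance = "vijay" | "bheem" | "all";
type Severity = "INFO" | "WARN" | "CRITICAL"; // assumed severity tokens

interface WarningInput {
  severity: Severity;
  instance: Instance;
  message: string;
}

type AppendFn = (path: string, data: string) => Promise<void>;

const DEDUP_WINDOW_MS = 60 * 60 * 1000; // 1h window from the brief

// dedup key -> epoch ms when the warning was last appended. In-memory only,
// so a process restart clears it, as the brief intends.
const recent = new Map<string, number>();

function dedupKey(w: WarningInput): string {
  return createHash("sha256")
    .update(`${w.instance}\u0000${w.severity}\u0000${w.message}`)
    .digest("hex");
}

/** Appends `w` unless an identical warning was appended within the last hour.
 *  Resolves to true when a line was written. */
async function appendDashboardWarning(
  w: WarningInput,
  now: number = Date.now(),
  appendFile: AppendFn = fs.appendFile,
): Promise<boolean> {
  const logPath = process.env.HERMES_DASHBOARD_ALERT_LOG;
  if (!logPath) return false; // unset in dev/CI: never touch /var/log

  // Expire entries older than the window so the map stays small.
  recent.forEach((seenAt, key) => {
    if (now - seenAt >= DEDUP_WINDOW_MS) recent.delete(key);
  });

  const key = dedupKey(w);
  if (recent.has(key)) return false; // duplicate within the window

  const message = w.message.replace(/\s*\n\s*/g, " ");
  const line =
    `${new Date(now).toISOString()} ${w.severity} instance=${w.instance} ${message}\n`;

  // Claim the key before awaiting so overlapping calls can't double-append;
  // release it if the write fails so the next snapshot retries.
  recent.set(key, now);
  try {
    await appendFile(logPath, line);
  } catch (err) {
    recent.delete(key);
    throw err;
  }
  return true;
}
```

Default parameter expressions are evaluated on every call. The default therefore reads `fs.appendFile` at call time, so a mock installed on `fs.promises.appendFile` (as the brief suggests) should still take effect. Because the key is recorded before the write, two snapshot calls that overlap cannot both append the same warning.
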

GUARDRAILS:

- Bot tokens never go in repo files. They live in
  `~<user>/.config/hermes/telegram` (mode `600`, owned by the right user).
- The dashboard backend only WRITES to the alert log. It must NOT call
  Telegram directly (that would split the delivery path and create two
  places where rate limits / token rotation matter).
- Don't emit chatty health pings; silent-on-success is the rule.
- The numbered-emoji convention is mandatory for completion-update messages
  (`1️⃣`, `2️⃣`, …); see `docs/hermes-operations.md`.
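
A small formatter can keep completion updates on the fully qualified keycap form. Each keycap emoji is a digit followed by U+FE0F (the emoji presentation selector) and U+20E3 (combining enclosing keycap). Without U+FE0F, Unicode's emoji data classes the sequence as "unqualified", and some clients may render it as a plain digit with a combining mark. This TypeScript is a hypothetical helper: `numberedEmoji` and `formatCompletionUpdate` are illustrative names. The fallback for numbers above 10 is an assumption, because the convention only shows `1️⃣`, `2️⃣`, ….

```typescript
// Fully qualified keycap emoji: digit + U+FE0F (emoji presentation) + U+20E3.
const VS16 = "\uFE0F";
const COMBINING_KEYCAP = "\u20E3";
const KEYCAP_TEN = "\u{1F51F}"; // 🔟 is a single code point, not a sequence

function numberedEmoji(n: number): string {
  if (Number.isInteger(n) && n >= 0 && n <= 9) {
    return `${n}${VS16}${COMBINING_KEYCAP}`;
  }
  if (n === 10) return KEYCAP_TEN;
  return `${n}.`; // assumed fallback; the convention only shows 1️⃣, 2️⃣, …
}

// One numbered line per completed step, in order.
function formatCompletionUpdate(steps: string[]): string {
  return steps.map((step, i) => `${numberedEmoji(i + 1)} ${step}`).join("\n");
}
```

For example, `formatCompletionUpdate(["restarted gateway", "probe green"])` yields two lines prefixed with `1️⃣` and `2️⃣`.
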

REPORTING:

When finished, report:

- Diff of `lib/dashboard-alerts.ts` and the wiring sites.
- Output of `tail -20 /var/log/hermes-dashboard-warnings.log` after the
  end-to-end test.
- Screenshots / chat-export of the test alerts in both Telegram chats
  (sanitized).
- Updated `docs/hermes-operations.md` "Telegram Notification Convention"
  section with the wired approval + media flows.

DEFINITION OF DONE:

- Dashboard-detected warnings reach the right Telegram chat per instance.
- Recoveries reach the same chat.
- Approval prompt + file delivery validated end-to-end.
- Numbered-emoji convention preserved.
- Operator (you) ticks the corresponding Phase 8 roadmap checkboxes.