Closes the Phase 8 line that's actually a docs/codebase change. The
other two Phase 8 items are VM-ops work (bot tokens + watchdog
extensions) and live as a delegation brief.

What's in this repo

- `docs/hermes-operations.md` gains a "Telegram Notification
  Convention" section codifying:
  * routing per instance (Vijay → root chat, Bheem → Uma chat,
    cross-cutting → root)
  * silent-on-healthy + post-on-recovery
  * the numbered-emoji progress convention (`1️⃣`, `2️⃣`, …) and
    why it survives Telegram client rendering
  * approval-prompt UI expectation
  * "don't paste secrets" pointer back to `lib/logger.ts`'s
    redaction path-list
- `docs/prompts/phase8-telegram-loop.md`: full delegation brief
  for the VM-side implementation. Design: dashboard backend writes
  new warnings (with `instance=<id>` tag, deduped over 1h) to an
  append-only log; both watchdogs tail it and route through the
  existing Telegram delivery path. Avoids splitting the delivery
  code into two places that would each need rate-limit +
  token-rotation handling. Brief is gated on Phase 4: Uma's watchdog
  must exist first.
- Roadmap Phase 8 ticked for "preserve numbered-emoji convention"
  (codified in operations doc); the other two items have notes
  pointing at the brief.

Phase 8 doesn't fully close in this repo because the delivery loop
needs real bot tokens and the Phase 4 Uma watchdog before it can be
end-to-end validated. The codebase's contribution is everything that
doesn't need a token: the convention, the design, and the delegation
brief.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
# Delegation Brief - Phase 8: Telegram notification loop

> Self-contained task brief. Mostly **VM ops + bot-token configuration**, with
> two small backend-side hooks. The dashboard has already done its half of
> Phase 8; see `docs/hermes-operations.md` "Telegram Notification Convention".
>
> Related: `docs/hermes_dashboard_v2_roadmap.md` (Phase 8),
> `docs/hermes-operations.md`, `scripts/hermes-health-watchdog.py`,
> `docs/prompts/phase4-bheem-uma-parity.md` (Bheem watchdog needs to exist
> first).

---

ROLE: Operator with sudo on the Hostinger VM and admin access to both Telegram
bots (root + Uma).

OBJECTIVE: Close the loop between dashboard-detected warnings and the
existing watchdog Telegram delivery path so that:

1. New warnings that the dashboard surfaces in `getHermesOpsSnapshot()` (and
   in the per-instance telemetry endpoint) reach the right Telegram chat
   (Vijay → root; Bheem → Uma).
2. Approval-required actions (currently only the watchdog uses these) work
   end-to-end, including media/file delivery; these are the two unchecked
   items left over from Hermes v1.
3. The numbered-emoji progress convention is preserved.

PREREQUISITES:
- Phase 4 (Bheem/Uma parity) must be complete so Uma has its own watchdog +
  bot. Without Uma's bot, Bheem warnings have nowhere to go. Don't start
  Phase 8 until the Phase 4 brief signs off.
- Root + Uma watchdog scripts already deliver to Telegram successfully on a
  manually-broken probe. Confirm before proceeding.

DESIGN (least-invasive, no new long-lived service):

The dashboard does NOT open its own Telegram connection. Instead, the
backend writes new dashboard-detected warnings to a small append-only log
that the existing watchdogs tail.

- Path: `/var/log/hermes-dashboard-warnings.log` (root-writeable;
  world-readable so both watchdogs can tail it).
- Format: one line per warning, RFC3339-ish timestamp + severity token +
  message, the same shape as `hermes-health-watchdog.log`. Reuse the parser in
  `backend/src/modules/hermes-telemetry/repository.ts:WATCHDOG_LINE`.
- Routing: each line carries an explicit `instance=<vijay|bheem|all>` tag
  so the watchdog knows which Telegram bot to use. `instance=all` posts to
  both chats (cross-cutting).
- De-dup: the dashboard backend keeps a 1h in-memory hash of recent
  warnings and only appends each one once per window. A restart resets the
  hash, which is fine: an alert reappearing post-restart is signal, not noise.

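As a rough illustration of the format and routing bullets above, the sketch below renders one log line and applies the per-watchdog filter. The field order (`<timestamp> <SEVERITY> instance=<id> <message>`), the `Severity` values, and every function name here are assumptions made for the example; the authoritative layout is whatever `WATCHDOG_LINE` already parses. The watchdog scripts are Python, so the filter is only a statement of the rule, not a drop-in.

```typescript
// Hypothetical types: only `critical` and the three instance ids appear in this brief.
type Severity = "warning" | "critical";
type Instance = "vijay" | "bheem" | "all";
type Watchdog = "root" | "uma";

interface DashboardWarning {
  severity: Severity;
  instance: Instance;
  message: string;
}

// Assumed layout: "<RFC 3339 timestamp> <SEVERITY> instance=<id> <message>\n".
function formatWarningLine(w: DashboardWarning, now: Date = new Date()): string {
  // Collapse whitespace (including newlines) so one warning is exactly one line.
  const message = w.message.replace(/\s+/g, " ").trim();
  return `${now.toISOString()} ${w.severity.toUpperCase()} instance=${w.instance} ${message}\n`;
}

// Extract the routing tag; null when the line carries no recognized tag.
function parseInstanceTag(line: string): Instance | null {
  const match = /(?:^|\s)instance=(vijay|bheem|all)(?=\s|$)/.exec(line);
  return match ? (match[1] as Instance) : null;
}

// Instance owned by each watchdog, per the routing rule (Vijay → root, Bheem → Uma).
const OWNED_INSTANCE: Record<Watchdog, Instance> = { root: "vijay", uma: "bheem" };

// A watchdog acts on its own instance and on cross-cutting (`all`) lines.
function shouldHandle(watchdog: Watchdog, line: string): boolean {
  const instance = parseInstanceTag(line);
  if (instance === null) return false; // assumption: untagged lines are skipped, not guessed
  return instance === "all" || instance === OWNED_INSTANCE[watchdog];
}
```

Keeping the tag as a plain `key=value` token means either watchdog can filter with a single regular expression before handing the line to its existing parser.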
TASKS:

1. **Backend hook** (small):
   - Add `lib/dashboard-alerts.ts` that exposes
     `appendDashboardWarning({ severity, instance, message })`. Internals:
     append + dedup hash. Tests should mock `fs.promises.appendFile`.
   - Wire it into `getHermesOpsSnapshot()` so each new warning in
     `snapshot.warnings` (only the ones not in the dedup hash) is written
     out. Same wiring on `getHermesTelemetrySnapshot()` for `warnings` and
     `watchdog.alerts` of severity `critical`.
   - Gate the file-write behind an env flag `HERMES_DASHBOARD_ALERT_LOG`
     pointing at the path so dev/CI doesn't try to write to `/var/log`.
   - Unit-test: `appendDashboardWarning` de-dups within the window, expires
     entries after it, and writes the right line format. Add to the
     coverage gate.

2. **Watchdog tail-extension** (VM ops):
   - Modify both watchdog scripts (root + Uma's mirror) to ALSO tail the
     new dashboard-warnings log. Filter by `instance=` tag: root's
     watchdog only acts on `instance=vijay` or `instance=all`; Uma's only
     on `instance=bheem` or `instance=all`.
   - Forward each parsed line into the existing Telegram delivery (same
     format / same numbered-emoji convention). Silent on no-new-lines.

3. **Approval-prompt + media validation** (the two unchecked v1 items):
   - Pick a Telegram approval-required action that already exists in the
     watchdog (e.g. "restart degraded gateway"). Confirm the inline ✅/❌
     buttons land, the callback hits the watchdog, and the action runs.
   - Confirm the watchdog can deliver a small file (e.g. last 200 lines of
     a log) when an alert says "investigate", and that the file lands as
     a Telegram document, not a truncated message.
   - Document both flows in `docs/hermes-operations.md` under "Telegram
     Notification Convention" so anyone reading it knows what's wired.

4. **End-to-end test**:
   - From the dashboard, trigger a transient warning (e.g. stop a
     non-critical timer for 30 seconds).
   - Confirm the right Telegram chat receives one alert (numbered-emoji
     formatted) and one recovery message when the timer comes back. The
     OTHER chat must stay silent.
   - Repeat for Bheem.
   - Repeat with `instance=all` (cross-cutting) and confirm BOTH chats
     receive the alert.

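A minimal sketch of what the Task 1 hook could look like, assuming the dedup key is a SHA-256 of severity, instance, and message, and that the line layout is `<timestamp> <SEVERITY> instance=<id> <message>`. Both are assumptions for illustration, as are the helper names `dedupKey` and `shouldEmit` and the `Severity` values; only `appendDashboardWarning`, `HERMES_DASHBOARD_ALERT_LOG`, and the 1h window come from this brief.

```typescript
import { promises as fs } from "node:fs";
import { createHash } from "node:crypto";

type Severity = "warning" | "critical"; // hypothetical set of severity tokens
type Instance = "vijay" | "bheem" | "all";

interface DashboardWarning {
  severity: Severity;
  instance: Instance;
  message: string;
}

const WINDOW_MS = 60 * 60 * 1000; // 1h dedup window

// key -> epoch ms when the warning was last emitted. In-memory only, so a restart clears it.
const seen = new Map<string, number>();

function dedupKey(w: DashboardWarning): string {
  return createHash("sha256").update(`${w.severity}|${w.instance}|${w.message}`).digest("hex");
}

// True if the warning has not been emitted within the window; records it when true.
function shouldEmit(w: DashboardWarning, nowMs: number): boolean {
  // Evict expired entries first so the map cannot grow without bound.
  seen.forEach((at, key) => {
    if (nowMs - at >= WINDOW_MS) seen.delete(key);
  });
  const key = dedupKey(w);
  if (seen.has(key)) return false;
  seen.set(key, nowMs);
  return true;
}

// Resolves to true when a line was appended, false when gated off or de-duplicated.
async function appendDashboardWarning(w: DashboardWarning, now: Date = new Date()): Promise<boolean> {
  const logPath = process.env.HERMES_DASHBOARD_ALERT_LOG;
  if (!logPath) return false; // flag unset (dev/CI): never touch the filesystem
  if (!shouldEmit(w, now.getTime())) return false;
  const message = w.message.replace(/\s+/g, " ").trim();
  const line = `${now.toISOString()} ${w.severity.toUpperCase()} instance=${w.instance} ${message}\n`;
  try {
    await fs.appendFile(logPath, line, "utf8");
  } catch (err) {
    seen.delete(dedupKey(w)); // a failed write must not suppress the next attempt
    throw err;
  }
  return true;
}
```

Passing the clock in as an argument keeps the window and expiry cases testable without fake timers, and a mocked `fs.promises.appendFile` covers the write path.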
GUARDRAILS:
- Bot tokens never go in repo files. They live in
  `~<user>/.config/hermes/telegram`, mode `600`, owned by the right user.
- The dashboard backend only WRITES to the alert log. It must NOT call
  Telegram directly (that would split the delivery path and create two
  places where rate limits / token rotation matter).
- Don't emit chatty health pings; silent-on-success is the rule.
- Numbered-emoji convention is mandatory for completion-update messages
  (`1️⃣`, `2️⃣`, …); see `docs/hermes-operations.md`.

REPORTING:
When finished, report:
- Diff of `lib/dashboard-alerts.ts` and the wiring sites.
- Output of `tail -20 /var/log/hermes-dashboard-warnings.log` after the
  end-to-end test.
- Screenshots / chat-export of the test alerts in both Telegram chats
  (sanitized).
- Updated `docs/hermes-operations.md` "Telegram Notification Convention"
  section with the wired approval + media flows.

DEFINITION OF DONE:
- Dashboard-detected warnings reach the right Telegram chat per instance.
- Recoveries reach the same chat.
- Approval prompt + file delivery validated end-to-end.
- Numbered-emoji convention preserved.
- Operator (you) ticks the corresponding Phase 8 roadmap checkboxes.