# ByteLyst Hermes Operations Runbook

Operational runbook for the private Telegram-driven Hermes Agent setup on the ByteLyst VM.

## Current baseline

Observed on 2026-05-27:

- Hermes version: `v0.14.0 (2026.5.16)`
- Shared source checkout: `/usr/local/lib/hermes-agent` at upstream `0b6ace649` after the 2026-05-27 late upgrade pass
- Install path: `/usr/local/lib/hermes-agent`
- Active profile: `default`
- Primary provider: OpenAI Codex OAuth
- Root Telegram gateway: `hermes-gateway.service`, system service, enabled and running
- Uma Telegram gateway: `uma-hermes-gateway.service`, user service for `uma`, enabled and running
- Root and Uma default model: `gpt-5.5`, `model.routing.enabled: false`
- Shared local fallback chain via Ollama on demand:
  - `qwen2.5-coder:1.5b`
  - `llama3.2:1b`
  - `llama3.2-vision`
  - These local fallbacks are loaded on demand and answer within the gateway's retry budget on this VM; the larger 3B/7B models were observed to be too slow for the live fallback path here.
- Live Hermes session-switch proof: root and Uma both fail over from a forced primary-provider error into the local Ollama chain and return `FallbackTest`.
- Telegram platform-context proof: the same fallback behavior passes when Hermes runs with `HERMES_PLATFORM=telegram` for both root and Uma. This is platform-context proof, not a separately replayed inbound Telegram network message.
- Web backend target: Firecrawl, configured locally on root and Uma with a private API key
- Browser automation: enabled on both Hermes gateways; root was smoke-tested privately against `https://example.com`
- Backup cron: `Sync Hermes persistent-data backup to GitHub`, every 30 minutes, local delivery
- Systemd persistent backup timers: `hermes-root-backup.timer` and `uma-hermes-backup.timer`, every 10 minutes
- Watchdog cron: `ByteLyst Hermes gateway/backup/disk watchdog`, every 15 minutes, Telegram delivery on failure only
- Dashboard policy: do not expose Hermes dashboard/API publicly without explicit approval
- Tailscale: installed and `tailscaled` enabled/running; authenticated as tailnet IP `100.87.53.10`
- Private dashboards:
  - Root: `http://100.87.53.10:9119/`, `hermes-root-dashboard.service`
  - Uma: `http://100.87.53.10:9120/`, `uma-hermes-dashboard.service`
  - Live ops panel shows gateway state, active sessions, refresh delta, cron state, backup freshness, sanitized alerts, and runbook links for both instances.

## Safety guardrail: no public Hermes dashboard/API

Before adding any new Caddy hostname, Docker port, or dashboard/API feature, verify that it does not publicly expose the Hermes dashboard or API.

Session privacy policy for dashboard/telemetry surfaces:

- Treat gateway session content as private by default for both Vijay and Bheem.
- Dashboard routes may show counts, statuses, timestamps, IDs, sanitized warning
  messages, cron names, skill/memory names, and backup commit subjects.
- Dashboard telemetry may show sanitized session JSONL event projections:
  event type, role, timestamp, source filename, tool names, item types, and
  status. Raw message content remains redacted before it reaches the UI.
- Dashboard routes must not expose raw prompts, full session transcripts, raw
  command output containing secrets, `.env` values, OAuth payloads, raw
  `state.db`, Telegram tokens, provider keys, or personal message content.
- If a future session-event pipeline is added, enable secret and PII redaction
  at ingestion time and store only the redacted event projection used by the UI.

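A minimal sketch of ingestion-time secret redaction for a JSONL event stream, assuming one event per line and secrets with recognizable shapes. The function name `redact_secrets` and every pattern below are illustrative assumptions, not the dashboard's actual redactor; pattern filters miss free-form secrets and do nothing for PII, so treat this as a first layer only.

```bash
# Hypothetical ingestion-time filter: reads event lines on stdin, writes redacted lines on stdout.
# Expressions, in order (all case-sensitive, all illustrative):
#   1. bearer tokens in Authorization-style text
#   2. KEY=value pairs whose key contains TOKEN, SECRET, PASSWORD, or API_KEY
#   3. Telegram bot-token shape (numeric bot id, colon, long opaque suffix)
#   4. PEM private keys that sit on one JSON-escaped line
redact_secrets() {
  sed -E \
    -e 's#(Bearer )[A-Za-z0-9._~+/=-]+#\1[REDACTED]#g' \
    -e 's#([A-Za-z_]*(TOKEN|SECRET|PASSWORD|API_KEY)[A-Za-z_]*=)[^[:space:]"]+#\1[REDACTED]#g' \
    -e 's#[0-9]{6,}:[A-Za-z0-9_-]{30,}#[REDACTED_TELEGRAM_TOKEN]#g' \
    -e 's#-----BEGIN [A-Z ]*PRIVATE KEY-----.*-----END [A-Z ]*PRIVATE KEY-----#[REDACTED_PRIVATE_KEY]#g'
}

# Example (hypothetical file names):
#   redact_secrets < raw-events.jsonl > redacted-events.jsonl
```

Only the redacted output should ever be stored where the UI can read it.
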
```bash
# Inspect public Caddy routes and obvious Hermes/API/dashboard references.
docker ps --format '{{.Names}} {{.Ports}}' | grep -i caddy || true
grep -RniE 'hermes|dashboard|api-server|API_SERVER|8000|8080|3000|5173' /etc/caddy /root/bytelyst.ai 2>/dev/null | head -100

# Inspect listening ports. Review any 0.0.0.0 listeners before exposing a hostname.
ss -ltnp
```

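Reviewing `ss` output by eye is easy to get wrong, so the wildcard check can be scripted. A minimal sketch, assuming the default `ss -ltn` layout in which column 4 is the local `address:port`; `flag_wildcard_listeners` is a hypothetical helper name. A wildcard bind only marks a port for review: Docker-published ports and host firewall rules still decide what is actually reachable.

```bash
# Hypothetical helper: print local addresses of listeners bound to every interface
# (IPv4 0.0.0.0, IPv6 [::], or the bare * that some ss versions print).
# Input: `ss -ltn` or `ss -ltnp` output on stdin; the header row is skipped.
flag_wildcard_listeners() {
  awk 'NR > 1 && ($4 ~ /^0\.0\.0\.0:/ || $4 ~ /^\[::\]:/ || $4 ~ /^\*:/) { print $4 }'
}

# Example:
#   ss -ltn | flag_wildcard_listeners
```

Empty output means no TCP listener is bound to all interfaces.
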
Allowed private access patterns for a future Hermes dashboard:

1. local-only binding (`127.0.0.1`)
2. SSH tunnel
3. Tailscale/WireGuard private network
4. Cloudflare Access or equivalent identity gate
5. basic auth plus IP allowlist only if public routing is unavoidable and explicitly approved

Current private network access:

```bash
tailscale status
tailscale ip -4
# Expected server IPv4: 100.87.53.10
```

Private dashboard services:

```bash
systemctl status hermes-root-dashboard --no-pager
systemctl status uma-hermes-dashboard --no-pager
ss -ltnp | grep -E ':(9119|9120)'

# Expected listeners are Tailscale-only:
# 100.87.53.10:9119
# 100.87.53.10:9120
```

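The expected-listener comment can be turned into a check that fails loudly. A minimal sketch, assuming `ss -ltn` output on stdin with the local address in column 4; `check_dashboard_binding` is a hypothetical name. It reports a dashboard port that is missing from the Tailscale address, plus any extra bind (wildcard, loopback, or IPv6) on the same port.

```bash
# Hypothetical check: each port must listen on the expected address and nowhere else.
# Args: EXPECTED_IP PORT...; input: `ss -ltn` output on stdin. Returns 1 on any problem.
check_dashboard_binding() {
  local expected_ip=$1; shift
  local listeners port others status=0
  listeners=$(awk 'NR > 1 { print $4 }')   # local address:port column
  for port in "$@"; do
    if ! grep -qxF "${expected_ip}:${port}" <<<"$listeners"; then
      echo "missing: ${expected_ip}:${port}"
      status=1
    fi
    # Any other local address on the same port is an unexpected bind.
    others=$(grep -E ":${port}\$" <<<"$listeners" | grep -vxF "${expected_ip}:${port}" || true)
    if [ -n "$others" ]; then
      echo "unexpected bind on port ${port}: ${others//$'\n'/ }"
      status=1
    fi
  done
  return "$status"
}

# Example:
#   ss -ltn | check_dashboard_binding 100.87.53.10 9119 9120
```
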
Tracked service unit templates:

```text
systemd/hermes-gateway.service
systemd/uma-hermes-gateway.service
systemd/hermes-root-dashboard.service
systemd/uma-hermes-dashboard.service
systemd/hermes-root-backup.service
systemd/hermes-root-backup.timer
systemd/uma-hermes-backup.service
systemd/uma-hermes-backup.timer
systemd/hermes-health-watchdog.service
systemd/hermes-health-watchdog.timer
systemd/uma-hermes-health-watchdog.service
systemd/uma-hermes-health-watchdog.timer
systemd/hermes-ops-exporter.service
systemd/hermes-ops-exporter.timer
systemd/uma-hermes-ops-exporter.service
systemd/uma-hermes-ops-exporter.timer
```

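Because these templates are tracked in git while the live copies sit under systemd directories, the two can drift. A minimal sketch that compares them byte-for-byte with `cmp`; `unit_drift` is a hypothetical helper. The install directories in the usage comments follow the exporter install steps in this runbook (`/etc/systemd/system` for root, `/home/uma/.config/systemd/user` for Uma); adjust the unit lists to wherever each unit is actually installed, since not every Uma-named unit is a user unit.

```bash
# Hypothetical helper: report tracked units that are missing from, or differ in, an install dir.
# Args: SRC_DIR DEST_DIR UNIT...; returns 1 if any drift was found.
unit_drift() {
  local src_dir=$1 dest_dir=$2; shift 2
  local unit drift=0
  for unit in "$@"; do
    if [ ! -f "$dest_dir/$unit" ]; then
      echo "missing: $unit"
      drift=1
    elif ! cmp -s "$src_dir/$unit" "$dest_dir/$unit"; then
      echo "differs: $unit"
      drift=1
    fi
  done
  return "$drift"
}

# Examples (run from the repo root):
#   unit_drift systemd /etc/systemd/system hermes-ops-exporter.service hermes-ops-exporter.timer
#   unit_drift systemd /home/uma/.config/systemd/user uma-hermes-ops-exporter.service uma-hermes-ops-exporter.timer
```

After syncing a changed unit, run `systemctl daemon-reload` (or the `--user` equivalent) before restarting it.
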
## Mission Control ops exporter

Mission Control can read a sanitized per-instance ops export before falling back
to live cross-user probes. This reduces brittle root-to-Uma inspection and keeps
the dashboard contract free of secrets or session content.

Tracked exporter:

```text
scripts/hermes-ops-exporter.py
```

Output paths:

```text
/root/.hermes/ops-export.json
/home/uma/.hermes/ops-export.json
```

The JSON contains only service booleans/status, timer timestamps, short Git
metadata, restore counts, and whether a Google token file exists. It does not
include token values, raw `state.db`, logs, prompt/session text, OAuth payloads,
or environment files.

Install root exporter:

```bash
cp systemd/hermes-ops-exporter.service /etc/systemd/system/hermes-ops-exporter.service
cp systemd/hermes-ops-exporter.timer /etc/systemd/system/hermes-ops-exporter.timer
systemctl daemon-reload
systemctl enable --now hermes-ops-exporter.timer
systemctl status hermes-ops-exporter.timer --no-pager
```

Install Uma exporter as user systemd:

```bash
install -d -o uma -g uma /home/uma/.config/systemd/user
cp systemd/uma-hermes-ops-exporter.service /home/uma/.config/systemd/user/uma-hermes-ops-exporter.service
cp systemd/uma-hermes-ops-exporter.timer /home/uma/.config/systemd/user/uma-hermes-ops-exporter.timer
chown uma:uma /home/uma/.config/systemd/user/uma-hermes-ops-exporter.*
# systemctl --user needs Uma's runtime dir (uid 1002) to reach her user manager.
runuser -u uma -- env XDG_RUNTIME_DIR=/run/user/1002 systemctl --user daemon-reload
runuser -u uma -- env XDG_RUNTIME_DIR=/run/user/1002 systemctl --user enable --now uma-hermes-ops-exporter.timer
runuser -u uma -- env XDG_RUNTIME_DIR=/run/user/1002 systemctl --user status uma-hermes-ops-exporter.timer --no-pager
```

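To confirm the exporter timers are actually refreshing the JSON, check each file's age. A minimal sketch using GNU `stat -c %Y`; `export_is_fresh` is a hypothetical helper, and the 1800-second threshold in the usage comments is an assumption, because the timer interval is not recorded in this runbook.

```bash
# Hypothetical helper: succeed if FILE exists and was modified within MAX_AGE seconds.
export_is_fresh() {
  local file=$1 max_age=$2 age
  if [ ! -f "$file" ]; then
    echo "missing: $file"
    return 1
  fi
  age=$(( $(date +%s) - $(stat -c %Y "$file") ))   # GNU stat: mtime in epoch seconds
  if [ "$age" -gt "$max_age" ]; then
    echo "stale: $file (${age}s old)"
    return 1
  fi
}

# Examples (threshold is an assumption):
#   export_is_fresh /root/.hermes/ops-export.json 1800
#   export_is_fresh /home/uma/.hermes/ops-export.json 1800
```
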
## Health baseline commands

```bash
hermes --version
hermes config check
hermes doctor --fix
hermes status --all
hermes cron list
systemctl status hermes-gateway --no-pager
sudo -u uma XDG_RUNTIME_DIR=/run/user/1002 systemctl --user status uma-hermes-gateway --no-pager
df -h /
free -h
ss -ltnp
```

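`df -h` is meant for humans; scripted checks are simpler against `df -P`, whose POSIX output keeps each filesystem on one line with the capacity percentage in column 5. A minimal sketch; `root_disk_pct` is a hypothetical name, and the 85% threshold in the usage comments is an assumption, not the watchdog's configured value.

```bash
# Hypothetical helper: print root-filesystem usage as a bare integer percentage.
# Input: `df -P /` output on stdin (header row, then one data row).
root_disk_pct() {
  awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Example (threshold is an assumption):
#   pct=$(df -P / | root_disk_pct)
#   [ "$pct" -lt 85 ] || echo "root disk at ${pct}%"
```
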
Notes:

- `hermes doctor --fix` migrated root and Uma configs to version `24` on 2026-05-27.
- Optional providers/search backends are mostly not configured yet. Configure them through Hermes setup/auth flows only; never commit credentials.
- Local Ollama fallback models are installed on demand, not kept hot permanently. Both Hermes instances can reach the shared host service at `http://127.0.0.1:11434/v1`. The live fallback order is `qwen2.5-coder:1.5b` -> `llama3.2:1b` -> `llama3.2-vision`. `gemma4` was attempted but the installed Ollama runtime rejected it, so the vision fallback is `llama3.2-vision`.

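To confirm the fallback chain is still pulled locally (models can be removed by hand or lost in a rebuild), compare `ollama list` against the documented order. A minimal sketch, assuming `ollama list` prints a header row followed by one `name:tag` per line in the first column, and that an untagged name such as `llama3.2-vision` means `:latest`; `missing_fallback_models` is a hypothetical helper name.

```bash
# Hypothetical helper: print each requested model that `ollama list` does not show.
# Input: `ollama list` output on stdin; args: model names in fallback order.
missing_fallback_models() {
  local installed model
  installed=$(awk 'NR > 1 { print $1 }')
  for model in "$@"; do
    case $model in
      *:*) ;;                      # already has an explicit tag
      *) model="$model:latest" ;;  # Ollama's default tag
    esac
    grep -qxF "$model" <<<"$installed" || echo "$model"
  done
}

# Example:
#   ollama list | missing_fallback_models qwen2.5-coder:1.5b llama3.2:1b llama3.2-vision
```

Empty output means all three fallback models are present.
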
## Gateway recovery

```bash
systemctl status hermes-gateway --no-pager
journalctl -u hermes-gateway -n 100 --no-pager
hermes gateway restart
# If the CLI restart path is unavailable:
sudo systemctl restart hermes-gateway

# Uma user gateway:
sudo -u uma XDG_RUNTIME_DIR=/run/user/1002 systemctl --user status uma-hermes-gateway --no-pager
sudo -u uma XDG_RUNTIME_DIR=/run/user/1002 journalctl --user -u uma-hermes-gateway -n 100 --no-pager
sudo -u uma XDG_RUNTIME_DIR=/run/user/1002 systemctl --user restart uma-hermes-gateway
```

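Restart commands can return before the gateway is ready, so it can help to poll the unit state before moving on to the Telegram checks. A minimal sketch; `wait_until` is a hypothetical helper, while `systemctl is-active --quiet` is the standard systemd status probe. The 60-second timeout and 2-second interval in the usage comments are assumptions.

```bash
# Hypothetical helper: rerun a command until it succeeds or TIMEOUT seconds pass.
# Args: TIMEOUT INTERVAL COMMAND...; returns 1 on timeout.
wait_until() {
  local timeout=$1 interval=$2; shift 2
  local deadline=$((SECONDS + timeout))
  until "$@"; do
    if [ "$SECONDS" -ge "$deadline" ]; then
      echo "timed out after ${timeout}s: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Examples:
#   wait_until 60 2 systemctl is-active --quiet hermes-gateway
#   wait_until 60 2 sudo -u uma XDG_RUNTIME_DIR=/run/user/1002 systemctl --user is-active --quiet uma-hermes-gateway
```

An `active` unit only means the process is running; the Telegram checks remain the end-to-end test.
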
After restart, verify from Telegram:

- inbound message receives a response
- outbound completion messages work
- approval prompts still reach the allowed user
- media/file delivery works for a known safe file if needed

## Cron and watchdogs

List jobs:

```bash
hermes cron list
```

Current watchdog script:

```text
~/.hermes/scripts/hermes_health_watchdog.py
```

Tracked source copy:

```text
scripts/hermes-health-watchdog.py
```

Behavior:

- no output on success, so the cron stays silent
- sends a Telegram message only when it detects an actionable failure
- checks gateway service state, Hermes cron backup visibility/status, backup repo freshness when discoverable, and root disk usage
- also checks memory pressure plus critical Caddy/Gitea Docker containers (`caddy`, `gitea-npm-registry`)

Manual smoke test:

```bash
python3 ~/.hermes/scripts/hermes_health_watchdog.py
# Healthy output should be empty.
```

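Since a healthy pass prints nothing, a scripted smoke test should require both a zero exit status and empty output. A minimal sketch; `silent_ok` is a hypothetical wrapper. Note that running the real watchdog by hand may itself send a Telegram alert if something is unhealthy.

```bash
# Hypothetical wrapper: healthy only if COMMAND exits 0 AND prints nothing (stdout or stderr).
silent_ok() {
  local out rc=0
  out=$("$@" 2>&1) || rc=$?
  if [ "$rc" -ne 0 ] || [ -n "$out" ]; then
    printf 'unhealthy (exit %s):\n%s\n' "$rc" "$out"
    return 1
  fi
}

# Example:
#   silent_ok python3 ~/.hermes/scripts/hermes_health_watchdog.py
```
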
Tracked systemd watchdog timers:

```bash
systemctl status hermes-health-watchdog.timer --no-pager
systemctl --user --machine=uma@.host status uma-hermes-health-watchdog.timer --no-pager
tail -n 20 /root/.hermes/logs/hermes-health-watchdog.log
tail -n 20 /home/uma/.hermes/logs/hermes-health-watchdog.log
```

Dashboard warning bridge:

```text
/var/log/hermes-dashboard-warnings.log
```

The dashboard backend appends deduplicated warning lines there when
`HERMES_DASHBOARD_ALERT_LOG` is configured. Both watchdogs tail the same file
and route by `instance=vijay`, `instance=bheem`, or `instance=all`.
Telegram delivery is attempted only when `~<user>/.config/hermes/telegram`
exists with `BOT_TOKEN=`/`CHAT_ID=` or `TELEGRAM_BOT_TOKEN=`/`TELEGRAM_CHAT_ID=`.
If that file is absent, the watchdog still writes a local warning log line and
records `Telegram delivery skipped or failed`.

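The routing and config rules above can be sketched as two shell helpers. This is an illustration under stated assumptions, not the watchdog's code: `warnings_for`, `load_telegram_config`, `TG_BOT_TOKEN`, and `TG_CHAT_ID` are hypothetical names, and the config file is assumed to hold plain `KEY=value` lines. The loader keeps values in shell variables and never echoes them.

```bash
# Hypothetical filter: keep warning lines tagged for one instance, plus instance=all.
# Input: warning log lines on stdin; arg: vijay or bheem.
warnings_for() {
  local instance=$1
  grep -E "(^|[[:space:]])instance=(${instance}|all)([[:space:]]|\$)"
}

# Hypothetical loader: accepts BOT_TOKEN/CHAT_ID or TELEGRAM_BOT_TOKEN/TELEGRAM_CHAT_ID.
# Returns 1 if the file is unreadable or either value is missing.
load_telegram_config() {
  local file=$1 key value
  TG_BOT_TOKEN='' TG_CHAT_ID=''
  [ -r "$file" ] || return 1
  while IFS='=' read -r key value || [ -n "$key" ]; do   # also handles a missing final newline
    case $key in
      BOT_TOKEN | TELEGRAM_BOT_TOKEN) TG_BOT_TOKEN=$value ;;
      CHAT_ID | TELEGRAM_CHAT_ID) TG_CHAT_ID=$value ;;
    esac
  done < "$file"
  [ -n "$TG_BOT_TOKEN" ] && [ -n "$TG_CHAT_ID" ]
}

# Examples:
#   tail -n 200 /var/log/hermes-dashboard-warnings.log | warnings_for vijay
#   load_telegram_config /root/.config/hermes/telegram || echo "Telegram delivery skipped or failed"
```
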
2026-05-31 Telegram delivery validation:

- `instance=bheem` synthetic warning: consumed only by Uma watchdog; root log
  had zero matches; Telegram delivery succeeded.
- `instance=vijay` synthetic warning: consumed only by root watchdog; Uma log
  had zero matches; Telegram delivery succeeded.
- `instance=all` synthetic warning: consumed by both watchdogs; Telegram
  delivery succeeded for both chats.
- Recovery messages: after each alert, the next healthy watchdog pass sent
  `recovery: back to healthy` and logged `Telegram recovery delivery succeeded`.
- Approval prompt/media validation: root and Uma bots returned Telegram `200`
  for harmless inline-button prompt delivery and small document upload.
- Approval callback execution evidence: live gateway logs contain real
  `Telegram button resolved 1 approval(s)` entries for root through
  2026-05-30, including a deny choice, and for Uma on 2026-05-25. Telegram's
  Bot API cannot synthesize user callback clicks, so callback execution proof
  comes from these receiver logs plus source review of the Telegram callback
  handler.

Persistent backup timers:

```bash
systemctl status hermes-root-backup.timer uma-hermes-backup.timer --no-pager
systemctl list-timers --all --no-pager | grep 'hermes.*backup'
```

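Backup freshness can be approximated by the age of the newest commit in a local checkout of the backup repo. A minimal sketch; `backup_age_seconds` is a hypothetical helper and the checkout path in the usage comment is a placeholder, since this section does not name it. If the sync skips commits when nothing changed, a quiet instance will look stale, so treat a large age as a prompt to inspect `systemctl list-timers` rather than as proof of failure.

```bash
# Hypothetical helper: print seconds since the newest commit in a git checkout.
backup_age_seconds() {
  local repo=$1 last
  last=$(git -C "$repo" log -1 --format=%ct) || return 1   # committer time, epoch seconds
  echo $(( $(date +%s) - last ))
}

# Example (placeholder path):
#   age=$(backup_age_seconds /path/to/backup-checkout) && echo "last backup commit ${age}s ago"
```
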
## Backup and restore drill outline

The persistent-data backup repo intentionally excludes raw secrets and `state.db`.

For full VM rebuild steps, use `docs/hermes-disaster-recovery.md`.

For break-glass recovery of raw secrets/auth/state that are excluded from GitHub backups, use:

```bash
scripts/hermes-emergency-bundle-create.sh
scripts/hermes-emergency-bundle-decrypt.sh
scripts/hermes-emergency-bundle-upload-drive.sh
```

Store only the encrypted `.gpg` bundle in Google Drive or similar private storage. Never upload the plaintext staging directory.

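A cheap pre-upload guard can enforce the encrypted-only rule. A minimal sketch; `assert_encrypted_bundle` is a hypothetical helper and its plaintext test is a heuristic: GPG output is binary and, for any realistically sized bundle, contains NUL bytes, while plaintext files normally contain none. It does not prove the file decrypts; the decrypt script remains the real check.

```bash
# Hypothetical guard: refuse anything that is not a non-empty, binary-looking .gpg file.
assert_encrypted_bundle() {
  local f=$1
  if [ ! -f "$f" ] || [ ! -s "$f" ]; then
    echo "refusing: $f is not a non-empty regular file (never upload staging directories)" >&2
    return 1
  fi
  case $f in
    *.gpg) ;;
    *) echo "refusing: $f does not end in .gpg" >&2; return 1 ;;
  esac
  # Heuristic: if deleting NUL bytes leaves the file unchanged, it contains none and looks like plaintext.
  if tr -d '\000' < "$f" | cmp -s - "$f"; then
    echo "refusing: $f contains no NUL bytes and looks like plaintext" >&2
    return 1
  fi
}
```
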
Automated Drive upload:

```bash
/root/.local/share/hermes-drive-uploader-venv/bin/python scripts/hermes-google-drive-oauth-login.py
systemctl status hermes-emergency-drive-upload.timer --no-pager
systemctl start hermes-emergency-drive-upload.service
journalctl -u hermes-emergency-drive-upload.service -n 80 --no-pager
```

Personal Google Drive requires OAuth user credentials. A service account can see personal folders shared with it but cannot upload to them, because it has no personal Drive storage quota.

General one-file Drive upload:

```bash
scripts/google-drive-upload-file.sh /path/to/file --target vijay
scripts/google-drive-upload-file.sh /path/to/file --target bheem --encrypt
```

The general uploader refuses sensitive-looking files by default, including `.env` files, auth tokens, private keys, SQLite DBs, and Google credential files. Use `--encrypt` for private files, and use `--allow-sensitive` only after explicit approval.

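For illustration, a filename screen in the spirit of that default refusal could look like the sketch below. The patterns are assumptions approximating the categories listed above; they are not copied from `scripts/google-drive-upload-file.sh`, which may match more or different names. `looks_sensitive` is a hypothetical helper.

```bash
# Hypothetical filename screen: returns 0 if the name looks sensitive, 1 otherwise.
looks_sensitive() {
  local name
  name=$(basename -- "$1")
  case $name in
    .env | .env.* | *.env) return 0 ;;                              # environment files
    *.pem | *.key | id_rsa | id_ecdsa | id_ed25519) return 0 ;;     # private keys
    *.db | *.sqlite | *.sqlite3 | *-wal | *-shm) return 0 ;;        # SQLite DBs and journals
    credentials.json | client_secret*.json | *token*) return 0 ;;   # Google creds, auth tokens
  esac
  return 1
}

# Example:
#   looks_sensitive "$file" && echo "refusing $file without --encrypt or explicit approval"
```
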
Telegram usage pattern:

```text
Upload the file I just sent to Vijay Google Drive. Do not print file contents. Find the local attachment path, then use scripts/google-drive-upload-file.sh with --target vijay.
```

Quarterly restore drill:

1. Run the backup sync manually or wait for a successful cron run.
2. Clone the backup repo into a temporary directory.
3. Inspect git contents for accidental raw secrets:

   ```bash
   git grep -nE '(API_KEY|TOKEN|SECRET|PASSWORD|BEGIN .*PRIVATE KEY)' || true
   ```

4. Restore into a non-production Hermes profile/test directory only.
5. Verify config, skills, session JSON exports, cron definitions, memories, and scripts are present.
6. Confirm `.env`, OAuth files, SQLite WAL/SHM files, logs, caches, and raw `state.db` are absent.
7. Delete the temporary restore directory when done.

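Step 6 of the drill can be scripted with `find`. A minimal sketch; `find_forbidden` is a hypothetical helper, and its name patterns are assumptions: the runbook does not list the exact OAuth or cache file names, so `*oauth*` is a guess and cache directories are not covered. Any output means the backup contains something it should not.

```bash
# Hypothetical check: print restored files that the backup should never contain.
# The .git directory of the cloned backup repo is pruned from the search.
find_forbidden() {
  find "$1" -name .git -prune -o -type f \( \
    -name '.env' -o -name 'state.db' -o -name '*-wal' -o -name '*-shm' \
    -o -name '*.log' -o -iname '*oauth*' \) -print
}

# Example:
#   find_forbidden /tmp/hermes-restore-test-root
```
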
2026-05-27 restore rehearsal:

- Restored root backup into `/tmp/hermes-restore-test-root`.
- Verified portable directories/files were present: `config.yaml`, `skills/`, `sessions/`, `cron/`, `memories/`, and scripts.
- Verified raw `state.db` was absent.
- Scanned restored `.env` template and `config.yaml` for common token patterns; no hits.

## Upgrade checklist

Before upgrade:

```bash
hermes --version
hermes status --all
hermes config check
hermes cron list
python3 ~/.hermes/scripts/sync_hermes_persistent_backup.py
```

Upgrade from an interactive/private shell only:

```bash
hermes update
```

After upgrade:

```bash
hermes doctor --fix
hermes gateway restart
# Uma's gateway is a separate user service; restart it too:
sudo -u uma XDG_RUNTIME_DIR=/run/user/1002 systemctl --user restart uma-hermes-gateway
hermes --version
hermes status --all
hermes cron list
python3 ~/.hermes/scripts/hermes_health_watchdog.py
```

Then run Telegram smoke tests and record any manual fixups in this doc or the roadmap.

2026-05-27 late upgrade pass:

- Backed up root/Uma configs and service units under `/root/hermes-fix-backups/20260527-roadmap-noncreds/`.
- Fast-forwarded `/usr/local/lib/hermes-agent` to upstream `0b6ace649`.
- Restarted both gateways.
- Verified provider smoke tests with exact responses `root-roadmap-ok` and `uma-roadmap-ok`.

## Provider and tool changes

Use Hermes flows rather than editing secrets into git-tracked files:

```bash
hermes model
hermes setup model
hermes tools list
hermes tools enable <toolset>
hermes tools disable <toolset>
```

Restart/reset requirement:

- gateway config changes: `/restart` from Telegram or `hermes gateway restart`
- CLI session tool changes: start a new session or `/reset`
- provider auth changes: start a new session after switching models/providers

## Safe local Gitea Git token flow

Root Hermes has a least-privilege local Gitea Git path for repository reads:

- token file: `/root/.gitea_npm_token_home`
- askpass helper: `/root/.local/bin/gitea-git-askpass`
- Git wrapper: `/root/.local/bin/gitea-git`
- default username: `learning_ai_user`
- local Gitea URL: `http://localhost:3300`

The token value must never be placed in a remote URL, shell history, Git config, docs, logs, or Hermes chat. The wrapper sets `GIT_TERMINAL_PROMPT=0` and `GIT_ASKPASS=/root/.local/bin/gitea-git-askpass`; the askpass helper reads the token from the root-only token file only when Git prompts for a password.

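For readers unfamiliar with the askpass protocol: Git runs the `GIT_ASKPASS` program with the prompt text as its only argument and reads the answer from its standard output. A sketch of a helper in that style follows, written as a shell function for readability; it is not the installed `/root/.local/bin/gitea-git-askpass`, and the `GITEA_TOKEN_FILE` and `GITEA_USERNAME` override variables are hypothetical additions for testing.

```bash
# Sketch of an askpass responder. Git calls it once for the username and once
# for the password; the token file is only read for the password prompt.
gitea_askpass() {
  local token_file=${GITEA_TOKEN_FILE:-/root/.gitea_npm_token_home}
  case $1 in
    Username*) printf '%s\n' "${GITEA_USERNAME:-learning_ai_user}" ;;
    Password*) head -n 1 -- "$token_file" ;;
    *) return 1 ;;   # unknown prompt: answer nothing
  esac
}

# The wrapper then only needs to point Git at the helper, e.g.:
#   GIT_TERMINAL_PROMPT=0 GIT_ASKPASS=/root/.local/bin/gitea-git-askpass exec git "$@"
```

As a standalone executable, the helper would run the same `case` on its own `$1`.
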
Safe read-only test:

```bash
/root/.local/bin/gitea-git ls-remote http://localhost:3300/bytelyst/learning_ai_common_plat.git HEAD
```

Hermes-safe prompt pattern:

```text
Use the terminal tool only. Run exactly this read-only command and report only whether it succeeded and the first 12 characters of the HEAD hash: /root/.local/bin/gitea-git ls-remote http://localhost:3300/bytelyst/learning_ai_common_plat.git HEAD. Do not print any token, credential, environment variable, or file contents.
```

Verification recorded on 2026-05-27:

- local Gitea version endpoint returned `1.22.6`
- token file permissions are root-only
- profile-read API access returned a scope denial, confirming the token is not broad enough for user-profile reads
- direct wrapper test returned HEAD `59c4638f85be...`
- Hermes one-shot test reported success with truncated HEAD `59c4638f85be`

For write operations, create a separate repo-scoped token and store it in a new root-only token file. Do not reuse this read-focused token for broad automation unless the required scope is explicitly reviewed first.

## GitHub credential ownership

Root Git operations already have GitHub push credentials through the root Git credential store. Root is the operator account for both:

- `https://github.com/saravanakumardb/learning_ai_devops_tools.git`
- `https://github.com/umadev0931/uma_hostinger_hermes_vm.git`

Uma does not need a separate `/home/uma/.git-credentials` file for the current workflow because repo maintenance and pushes are performed from root. Do not copy root GitHub credentials into Uma's home directory unless there is a concrete need for Uma-user GitHub pushes.

Remaining audit item: confirm in GitHub that the root token is fine-grained or otherwise limited to the intended repos and permissions. Do not print the token while checking this.

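For classic personal access tokens, GitHub reports granted scopes in the `X-OAuth-Scopes` response header, the same kind of scope header the token audit in this runbook refers to. A minimal sketch that compares that header against an allowed list; `extra_github_scopes` is a hypothetical helper, and the commented fetch assumes the `https://user:token@github.com` line format of `/root/.git-credentials`. It passes the token to `curl` through `-K -` (config on stdin) so the token never appears in the process list or on screen. An empty result can also mean the header is absent, as may happen with a fine-grained token, so confirm that in GitHub's token settings.

```bash
# Hypothetical helper: read HTTP response headers on stdin and print each granted
# scope that is not in the allowed list passed as arguments.
extra_github_scopes() {
  local allowed=" $* " scopes scope
  scopes=$(tr -d '\r' | awk -F': ' 'tolower($1) == "x-oauth-scopes" { print $2 }')
  for scope in ${scopes//,/ }; do
    case $allowed in
      *" $scope "*) ;;               # allowed scope
      *) printf '%s\n' "$scope" ;;   # broader than intended
    esac
  done
}

# Example fetch (token stays in a shell variable and on curl's stdin):
#   token=$(sed -n 's#^https://[^:]*:\([^@]*\)@github\.com.*#\1#p' /root/.git-credentials | head -n 1)
#   printf 'header = "Authorization: Bearer %s"\n' "$token" |
#     curl -sS -o /dev/null -D - -K - https://api.github.com/user | extra_github_scopes repo
```
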
## Telegram topics and session handling

Root and Uma currently use the standard Telegram gateway session handling. Do not enable or change topic/session behavior without a concrete routing need.

Review these before changing Telegram routing:

```bash
systemctl status hermes-gateway --no-pager
sudo -u uma XDG_RUNTIME_DIR=/run/user/1002 systemctl --user status uma-hermes-gateway --no-pager
grep -RniE 'topic|thread|TELEGRAM_.*THREAD|HOME_CHANNEL' /root/.hermes /home/uma/.hermes 2>/dev/null | head -100
```

## Multi-agent execution conventions

Use the smallest execution surface that fits the task:

- direct tool call: one-shot local checks, edits, commits, pushes, status reads
- `delegate_task`: bounded research or code inspection that can return inside the parent session
- spawned Hermes/tmux session: long-running mission that must outlive the parent turn
- background terminal process: long-running local commands that need monitoring
- cron job: recurring, deterministic, silent-on-success maintenance
- worktree: independent coding agent branch space when tasks can overlap
- Kanban worker: durable multi-agent project coordination after the board is intentionally configured

Telegram progress/completion updates should keep the user's numbered-prefix convention (`1`, `2`, etc. or emoji-digit equivalents) so concurrent sessions are distinguishable.

## Workflow Skills And Memory Hygiene

Repeated operational procedures should be turned into skills instead of being kept as long-lived memories.

Pinned skills that should stay available:

- `devops/self-hosted-gitea-ci`
- `devops/caddy-subdomain-routing`
- `devops/hermes-persistent-backup-ops`
- `devops/hermes-gateway-operations`
- safe multi-repo commit/push workflow

Memory hygiene policy:

- keep memories declarative and durable
- trim stale or task-completion artifacts before they accumulate
- review persistent memories and recurring workflow skills on a manual maintenance pass
- if curator reviews are enabled, run them on a regular cadence rather than letting them drift

## Safe Multi-Repo Commit And Push

Root is the operator for both the root and Uma tracking repos.

Safe sequence:

1. Work in the target repo only.
2. Run the repo's tests or checks before committing.
3. Commit the smallest coherent change.
4. Push from root using the already-approved GitHub credential path.
5. Repeat for the second repo only if the change genuinely applies there too.

Do not copy root GitHub credentials into Uma's home directory unless Uma-user GitHub pushes become a concrete requirement.

## Telegram Notification Convention

Phase 8 of the dashboard roadmap, and the watchdog scripts that already ship
Telegram alerts, follow a small set of conventions worth keeping consistent.

**Routing per instance**

- Vijay (root) alerts go to the root Telegram chat.
- Bheem (uma) alerts go to Uma's Telegram chat.
- Cross-cutting alerts (e.g. "the dashboard itself is unreachable") go to the
  root chat, because root is the operator account.

**Silent on healthy**

- Watchdog scripts and (in future) the dashboard's own Telegram hook **only**
  post when something is wrong. A green poll is a no-op.
- Recoveries ARE a Telegram event (one line: "back to healthy") so the chat
  history reflects the full incident lifecycle.

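The silent-on-healthy plus recovery rule is a small state machine. A minimal sketch that keeps the previous result in a state file; `watchdog_transition`, the state-file path, and the `alert` output are hypothetical. This version alerts once per incident; how the real watchdog handles a continuing failure is not documented here.

```bash
# Hypothetical state machine: print the Telegram message for this pass, if any.
# $1 = state file remembering the previous result, $2 = healthy|failing.
watchdog_transition() {
  local state_file=$1 current=$2 previous=healthy
  if [ -f "$state_file" ]; then
    previous=$(cat "$state_file")
  fi
  printf '%s\n' "$current" > "$state_file"
  case "$previous:$current" in
    healthy:failing) echo "alert" ;;                       # new incident
    failing:healthy) echo "recovery: back to healthy" ;;   # close the incident
    *) ;;                                                  # steady state stays silent
  esac
}

# Example (hypothetical state path):
#   msg=$(watchdog_transition /var/tmp/hermes-watchdog.state failing)
#   [ -z "$msg" ] || echo "would send: $msg"
```
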
**Numbered-emoji progress convention**

- When a multi-step operation is being narrated to Telegram, prefix each step
  with the corresponding numbered emoji: `1️⃣`, `2️⃣`, `3️⃣`, … up to `🔟`.
- This survives copy-paste across clients (unlike `1.`, which Telegram tends
  to render inconsistently in dark mode) and makes the chat scannable.
- The watchdog scripts already emit completion updates this way; any
  dashboard-originated message that runs through the same delivery path
  should match.

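A minimal sketch of the numbering convention as a line filter; `narrate_steps` is a hypothetical helper. Each keycap digit is the digit followed by U+FE0F and U+20E3, while `🔟` is a single code point. The convention stops at `🔟`, so the plain `11.` fallback past ten is an assumption, even though plain `1.`-style numbering is the form the convention tries to avoid.

```bash
# Hypothetical filter: prefix each stdin line with its step number as a keycap emoji.
narrate_steps() {
  local keycaps=(1️⃣ 2️⃣ 3️⃣ 4️⃣ 5️⃣ 6️⃣ 7️⃣ 8️⃣ 9️⃣ 🔟)
  local n=0 line
  while IFS= read -r line; do
    if [ "$n" -lt "${#keycaps[@]}" ]; then
      printf '%s %s\n' "${keycaps[n]}" "$line"
    else
      printf '%s. %s\n' "$((n + 1))" "$line"   # no keycap exists past 🔟
    fi
    n=$((n + 1))
  done
}

# Example:
#   printf '%s\n' 'check gateway' 'restart gateway' 'verify Telegram' | narrate_steps
```
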
**Approval prompts**

- Approval-required actions still land in Telegram with two inline buttons
  (✅ approve / ❌ deny). The dashboard does not yet trigger these; see the
  Phase 8 delegation brief in `docs/prompts/phase8-telegram-loop.md` for the
  design that closes the loop end-to-end.
- 2026-05-31 delivery smoke test: root and Uma bots both returned Telegram
  `200` for a harmless inline-button approval prompt. This smoke test did not
  exercise callback handling, because that requires a human button press and an
  action receiver; callback execution evidence comes from the live gateway logs
  recorded in the 2026-05-31 Telegram delivery validation notes.

**Media/file delivery**

- 2026-05-31 delivery smoke test: root and Uma bots both returned Telegram
  `200` for a small text document upload.

**Don't paste secrets**

- Bot tokens and chat IDs live in `~<user>/.config/hermes/telegram` mode `600`,
  never in repo files. The dashboard's `lib/logger.ts` redacts
  `Authorization` / `Cookie` / `*.token` paths from any logged object so an
  accidental `req.log.info({ tg })` won't dump credentials.

## Token audit status

Checked on 2026-05-31 without printing token values:

- Gitea package tokens exist at `/opt/bytelyst/.gitea_token`,
  `/root/.gitea_npm_token`, and `/root/.gitea_npm_token_home`, mode `600`.
  They can read package metadata from the local Gitea npm registry and receive
  `403` from `/api/v1/user`, which is consistent with package-only/no-profile
  scope.
- Root GitHub credentials exist in `/root/.git-credentials`. GitHub API scope
  headers report `gist, read:org, repo, workflow`; this is broader than the
  desired least-privilege backup scope.
- No Uma-owned GitHub token file was found under `/home/uma` during the metadata
  scan, and the active `uma-hermes-backup.service` still runs as root. Keep the
  existing backup path running until a fine-grained Uma-owned token is provided,
  then migrate Bheem self-push and re-audit.

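The permission part of this audit can be re-run without opening the files. A minimal sketch using GNU `stat -c %a`; `check_token_modes` is a hypothetical helper, and it checks permission bits only, not ownership or token scope.

```bash
# Hypothetical helper: report token files that are missing or not mode 600.
# Only file metadata is read; token contents are never opened.
check_token_modes() {
  local f mode bad=0
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f"
      bad=1
      continue
    fi
    mode=$(stat -c '%a' "$f")   # GNU stat: octal permission bits
    if [ "$mode" != 600 ]; then
      echo "mode $mode: $f"
      bad=1
    fi
  done
  return "$bad"
}

# Example:
#   check_token_modes /opt/bytelyst/.gitea_token /root/.gitea_npm_token /root/.gitea_npm_token_home
```
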