# Hermes Setup Upgrade Roadmap
**Date:** 2026-05-26
**Execution update:** 2026-05-27
**Owner:** ByteLyst / S
**Repo:** `bytelyst-devops-tools`
**Video reference:** [Hermes Agent is the greatest AI tool ever made. Here's how to set it up](https://youtu.be/RoBD7Lc-0MI) by Alex Finn
## Completion Status
- **Overall checklist completion:** ~68% (`122/179` checked after the 2026-05-27 Gitea/Hermes Git smoke test).
- **Credential-independent setup:** materially further along; remaining blockers are mostly provider/search credentials, GitHub token scope audit, Uma backup design, and policy decisions.
- vijay: percentage is based on literal Markdown checklist boxes, including nested sub-items. It intentionally counts credential-dependent future work as incomplete.
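
The counting rule described above (literal Markdown checklist boxes, nested ones included) can be reproduced with a short script. This is a minimal sketch rather than the tool actually used to produce `122/179`; the function name `checklist_completion` is made up for illustration.

```python
import re

# Matches task-list items such as "- [x] done" or "  - [ ] nested todo".
CHECKBOX = re.compile(r"^\s*[-*+]\s+\[( |x|X)\]\s", re.MULTILINE)


def checklist_completion(markdown: str) -> tuple[int, int, float]:
    """Return (checked, total, percent) for literal task-list boxes."""
    marks = CHECKBOX.findall(markdown)
    checked = sum(1 for mark in marks if mark.lower() == "x")
    total = len(marks)
    percent = 100.0 * checked / total if total else 0.0
    return checked, total, percent
```

Running it over this file's text should give the same ratio as the status line, provided comment bullets such as `- vijay: ...` are not written as task-list items.
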
## Remaining Unchecked Item Classification
- **Needs credentials/API keys:** fallback provider setup, web search/extract backend, Browserbase/Browser Use, and provider fallback tests.
- **Needs credential audit:** GitHub push credentials already exist for root Git operations, including root-managed pushes to Uma's GitHub repo; least-privilege scope still needs to be verified from GitHub.
- **Needs explicit policy decision:** Cloudflare Access/basic-auth public fallback, model-routing tiers, local browser automation, vision/image provider choice, `security.redact_secrets`, `privacy.redact_pii`, and credential rotation.
- **Needs Uma backup design:** Uma/Bheem currently has a clean VM wrapper repo, but not a root-style sanitized Hermes persistent backup/restore workflow.
- **Needs manual UX validation:** dashboard feature-by-feature checks, Telegram approval prompt flow, and Telegram media/file delivery.
- **Needs future workflow adoption:** practicing `delegate_task`, spawned/tmux sessions, worktrees, and Kanban on real tasks before checking them as completed.
## Purpose
Turn the Hermes setup ideas from the referenced video into a practical ByteLyst upgrade checklist for this VM-backed, Telegram-driven Hermes installation.
This roadmap is intentionally operational: every item should either improve reliability, safety, agent capability, observability, or restore/migration readiness.
## Transcript Review Status
Automated transcript retrieval was attempted through multiple paths:
- Hermes `youtube-content` transcript helper using `youtube-transcript-api`
- `yt-dlp` subtitle extraction
- direct YouTube page/player metadata inspection
- Invidious caption endpoints
- third-party transcript endpoint probing
The video title and metadata were reachable, but transcript/subtitle retrieval was blocked by YouTube anti-bot checks from this VM/cloud IP. One Invidious endpoint confirmed an English auto-generated caption track exists, but returned an empty caption body.
Because the full transcript was not retrievable from the VM, this roadmap combines:
1. the accessible video metadata and setup theme,
2. Hermes Agent's current documented capabilities,
3. the live health/status of this ByteLyst Hermes installation, and
4. ByteLyst's existing operational preferences and safety constraints.
If a manual transcript is later pasted or uploaded, re-run this review and append a `Transcript-Derived Delta` section with any new actions.
## Current ByteLyst Hermes Baseline
Observed on 2026-05-26:
- Hermes version: package metadata reports `v0.14.0 (2026.5.16)`; the shared checkout was fast-forwarded to upstream `0b6ace649` on 2026-05-27
- Project path: `/usr/local/lib/hermes-agent`
- Active model/provider: `gpt-5.5` via OpenAI Codex OAuth
- Telegram gateway: configured and running under systemd
- Scheduled jobs: `2 active, 2 total`
  - `Sync Hermes persistent-data backup to GitHub`
    - schedule: every 30 minutes
    - delivery: local
    - script: `sync_hermes_persistent_backup.py`
    - last status: ok
- Config version: `24` after `hermes doctor --fix` migration on 2026-05-27; root and Uma both verified at config v24
- Telegram credentials are present
- Most optional provider/API keys are not configured, including OpenRouter, Google/Gemini, Anthropic, Firecrawl/Tavily/Exa, Browserbase/Browser Use, FAL, and ElevenLabs
- GitHub push credentials are configured for root Git operations through the root credential store; root also performs Uma repo pushes because root has access to `https://github.com/umadev0931/uma_hostinger_hermes_vm`
- `hermes doctor --fix` completed on 2026-05-27; it migrated config v23 → v24 and left only manual provider/API-key setup as the main optional follow-up
- User preference: do **not** expose the Hermes dashboard publicly
## Target State
A healthy ByteLyst Hermes setup should be:
- **Private by default:** no public dashboard exposure; private access through local shell, Telegram DM, SSH tunnel, Tailscale, or equivalent.
- **Recoverable:** configuration, skills, memory, sessions, cron jobs, and scripts are backed up and periodically restore-tested.
- **Observable:** gateway, cron, disk, memory, and backup failures surface to Telegram quickly.
- **Capable:** web search/extraction, browser automation, GitHub/Gitea operations, vision, file, terminal, cron, memory, session search, and delegation are all configured where useful.
- **Safe:** secrets are not committed, destructive commands remain approval-gated, public Caddy exposure is explicitly reviewed, and profiles isolate risky experiments.
- **Self-improving:** recurring procedures are captured as skills; stale or wrong skills are patched immediately.
## Roadmap Checklist
> `vijay:` comments are root/ByteLyst Hermes implementation notes. `bheem:` comments are Uma Hermes implementation notes. Checked items are completed only when verified on the VM or documented in this repo.
### Phase 0 — Safety Freeze And Guardrails
- [x] Confirm no Caddy route exposes a Hermes dashboard or Hermes API server publicly.
- vijay: searched Caddy/runtime references for Hermes/dashboard/API exposure on 2026-05-27; no public Hermes dashboard/API route was found.
- [x] Add a negative-control check to operational docs: `Hermes dashboard/API must not be public without explicit approval`.
- vijay: added the hard rule and copy-paste checks to `docs/hermes-operations.md` and linked it from `docs/operations.md`.
- [x] Verify firewall/Caddy routes for any hostnames pointing to Hermes ports.
- vijay: reviewed current listeners and Caddy references; no Hermes-specific public hostname was identified. Re-run before adding any new route.
- [x] Decide private access pattern for any future dashboard:
- vijay: selected private-only access with local binding plus Tailscale/SSH tunnel; Tailscale is installed, authenticated, and connected as `100.87.53.10`.
- [x] local-only binding
- [x] SSH tunnel
- [x] Tailscale/WireGuard
- [ ] Cloudflare Access or equivalent identity gate
- vijay: not selected for the current private dashboard path.
- [ ] basic auth plus IP allowlist only if a public route is unavoidable
- vijay: not selected because public routing remains disallowed.
- [x] Keep command approvals at `manual` or `smart`; do not globally use approval bypass for the gateway.
- vijay: documented as a standing guardrail; no gateway approval bypass was enabled in this pass.
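
The "no public Hermes route" check above can be partly automated by scanning Caddyfile text for `reverse_proxy` directives that point at Hermes ports. The sketch below is an assumption-laden illustration, not the check used in this pass: the port set comes from the dashboard notes later in this roadmap, the function name is hypothetical, and the comment handling is deliberately simple.

```python
import re

# Ports taken from the dashboard notes in this roadmap; adjust if they change.
HERMES_PORTS = {9119, 9120}

UPSTREAM = re.compile(r"reverse_proxy\s+(.+)")
PORT = re.compile(r":(\d{2,5})\b")


def find_hermes_routes(caddyfile_text: str, ports=HERMES_PORTS) -> list[tuple[int, str]]:
    """Return (line_number, line) for reverse_proxy directives targeting Hermes ports."""
    hits = []
    for number, raw in enumerate(caddyfile_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # naive comment stripping
        match = UPSTREAM.search(line)
        if not match:
            continue
        upstream_ports = {int(port) for port in PORT.findall(match.group(1))}
        if upstream_ports & set(ports):
            hits.append((number, line))
    return hits
```

An empty result is only a negative control for the file that was scanned; listeners and firewall rules still need the separate review described above.
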
### Phase 1 — Health Baseline And Diagnostics
- [x] Run and capture `hermes --version`.
- vijay: captured `Hermes Agent v0.14.0 (2026.5.16)`, project `/usr/local/lib/hermes-agent`, update available.
- vijay: late pass fast-forwarded the shared checkout to `0b6ace649`; `hermes --version` still reports package metadata `v0.14.0`.
- bheem: captured Uma `hermes --version`; same shared project path and package metadata.
- [x] Run and capture `hermes config check`.
- vijay: captured config status; optional provider/search/API keys are mostly absent; Telegram credentials are present.
- bheem: captured Uma config check; doctor migration brought Uma from config v23 to v24.
- [x] Investigate why `hermes doctor` timed out.
- vijay: reran `timeout 240 hermes doctor --fix`; it completed successfully.
- [x] Re-run with a longer timeout from a foreground shell.
- [x] If still hanging, isolate the step by checking logs and dependencies.
- vijay: not needed after longer foreground run succeeded.
- [x] File or fix a Hermes bug if the timeout is reproducible.
- vijay: not reproducible in this pass; no bug filed.
- [x] Run `hermes status --all` and save a sanitized baseline summary.
- vijay: baseline summary added to `docs/hermes-operations.md`.
- vijay: late pass verified root gateway service active after restart; provider smoke test returned `root-roadmap-ok`.
- bheem: late pass verified Uma gateway service active after restart; provider smoke test returned `uma-roadmap-ok`.
- [x] Check gateway service health:
- vijay: `hermes-gateway.service` is active/running under systemd.
- bheem: `uma-hermes-gateway.service` is active/running under Uma's user systemd manager.
- [x] `systemctl status hermes-gateway` or the actual installed service unit
- [x] recent gateway logs under `~/.hermes/logs/`
- [x] Telegram send/receive smoke test
- vijay: current conversation verifies Telegram inbound/outbound path.
- [x] Check cron scheduler health and last-run status.
- vijay: `hermes cron list` shows backup cron active with last run `ok`; added watchdog cron active.
- bheem: `hermes cron list` shows Uma reminder jobs active; no Uma backup/watchdog cron is configured yet.
- [x] Check disk, memory, CPU, open ports, and long-running Hermes processes.
- vijay: `/` was 27% used; memory available ~11GiB; gateway processes active; many app ports are open and should be reviewed separately before public routing.
- [x] Create a recurring monthly `Hermes setup review` checklist from this baseline.
- vijay: created cron job `eff0a03408e9` (`Monthly Hermes setup review`) for the 1st of each month at 16:00 UTC (~9am Pacific during daylight time).
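
The disk and memory figures in the baseline above can be captured without shelling out. This is a minimal sketch with hypothetical helper names; note that `used / total` can differ slightly from the `Use%` column of `df`, which excludes reserved blocks.

```python
import shutil


def disk_used_percent(path: str = "/") -> float:
    """Percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def mem_available_gib(meminfo_text: str) -> float:
    """Parse the MemAvailable line of /proc/meminfo (reported in kB) into GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])
            return kib / (1024 ** 2)
    raise ValueError("MemAvailable not found")
```

On the VM, `mem_available_gib(open("/proc/meminfo").read())` would give the figure to record next to `disk_used_percent("/")` in the monthly review.
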
### Phase 2 — Backup, Restore, And Migration Readiness
- [x] Keep the existing persistent-data backup cron active.
- vijay: job `470832621b43` remains active every 30m.
- [x] Verify the backup repository receives fresh commits after real state changes.
- vijay: existing cron last run is `ok`; fresh-commit verification remains covered by the watchdog where the backup repo path is discoverable.
- [x] Confirm the backup intentionally excludes raw secrets and `state.db`.
- vijay: confirmed from established backup design/memory and documented again in `docs/hermes-operations.md`.
- [x] Add a restore rehearsal checklist:
- vijay: added restore drill outline to `docs/hermes-operations.md`.
- [x] clone backup repo into a temporary directory
- vijay: used local clean clone `/root/repos/bytelyst_hostinger_hermes_vm` and restored into `/tmp/hermes-restore-test-root`.
- [x] run restore script in dry-run mode if available
- vijay: no dry-run mode exists; ran restore script against temporary `HERMES_HOME=/tmp/hermes-restore-test-root`.
- [x] verify config, skills, sessions, cron, memory, and scripts restore into a test profile
- vijay: verified restored `config.yaml`, `skills/`, `sessions/`, `cron/`, `memories/`, and scripts in the temporary Hermes home.
- [x] confirm no raw `.env`, OAuth token, or credential file appears in git
- vijay: verified `state.db` absent from restore test and scanned restored `.env` template/config for common token patterns; no hits.
- [ ] Add a quarterly restore drill reminder cron job or calendar task.
- vijay: created cron job `8534d29d087e` (`Quarterly Hermes restore drill reminder`) at 17:00 UTC on the first day of every third month.
- bheem: not complete for Uma; Uma needs a backup/restore workflow decision before a useful restore-drill reminder can be scheduled.
- [x] Document exact restore commands in a ByteLyst ops doc.
- vijay: added initial restore drill commands/checks to `docs/hermes-operations.md`; a full live restore test is still future work.
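
The verification step of the restore rehearsal can be expressed as a small check over the temporary Hermes home. The sketch below assumes the directory names quoted in the notes above (`config.yaml`, `skills/`, `sessions/`, `cron/`, `memories/`) and treats `state.db` as the one file that must be absent; the function name is hypothetical and the scripts location is left out because its path is not given here.

```python
from pathlib import Path

# Names taken from the restore notes above; treat the exact layout as an assumption.
EXPECTED = ["config.yaml", "skills", "sessions", "cron", "memories"]
FORBIDDEN = ["state.db"]


def check_restored_home(home: Path) -> list[str]:
    """Return a list of problems found in a restored Hermes home; empty means OK."""
    problems = [f"missing: {name}" for name in EXPECTED if not (home / name).exists()]
    problems += [f"unexpected: {name}" for name in FORBIDDEN if (home / name).exists()]
    return problems
```

For the rehearsal described above, the call would be `check_restored_home(Path("/tmp/hermes-restore-test-root"))`.
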
### Phase 3 — Upgrade Strategy
- [x] Check whether Hermes is already at the latest stable release before each upgrade.
- vijay: initial pass: `hermes --version` reported this install was 8 commits behind; the upgrade was deferred to its own private-shell checkpoint after backup verification.
- vijay: late pass fetched upstream and found the shared checkout behind; working tree was clean.
- [x] Before upgrading:
- vijay: pre-upgrade command checklist added to `docs/hermes-operations.md`.
- [x] run backup sync manually
- vijay: root persistent backup cron was active with last run `ok`; root config/service unit was snapshotted under `/root/hermes-fix-backups/20260527-roadmap-noncreds/` before upgrade.
- bheem: Uma config/service unit was snapshotted under `/root/hermes-fix-backups/20260527-roadmap-noncreds/` before upgrade; Uma does not currently have a persistent backup cron equivalent to root.
- [x] capture `hermes --version`, `hermes status --all`, and `hermes config check`
- vijay: captured root version/config checks; root shows config v24.
- bheem: captured Uma version/config checks; Uma shows config v24 after doctor migration.
- [x] snapshot config and cron job list
- vijay: copied root config and systemd unit definition before upgrade; captured root cron list.
- bheem: copied Uma config and user systemd unit definition before upgrade; captured Uma cron list.
- [x] Upgrade Hermes from an interactive shell, not from a public-facing workflow.
- vijay: documented; no public workflow exposure added.
- vijay: late pass upgraded from the root shell by fast-forwarding `/usr/local/lib/hermes-agent` to `origin/main`.
- [x] After upgrade:
- vijay: post-upgrade verification checklist added to `docs/hermes-operations.md`; the upgrade itself was still pending at that point and was completed in the late pass.
- [x] restart gateway
- vijay: restarted `hermes-gateway.service`.
- bheem: restarted `uma-hermes-gateway.service`.
- [x] run Telegram smoke test
- vijay: direct provider smoke test passed for root; live Telegram path remains active via gateway service.
- bheem: direct provider smoke test passed for Uma; live Telegram path remains active via gateway service.
- [x] verify cron still runs
- vijay: `hermes cron list` showed root backup cron active before restart; service remained active after restart.
- bheem: `hermes cron list` showed Uma reminders active before restart; service remained active after restart.
- [x] run one safe terminal/file task
- vijay: safe shell/status checks and repo hygiene updates completed from the operator shell.
- [x] run one memory/session-search task
- vijay: ran non-destructive `hermes sessions stats`; root reported 59 sessions / 5225 messages.
- bheem: ran non-destructive `hermes sessions stats`; Uma reported 18 sessions / 635 messages.
- [x] Record upgrade date, version, and any manual fixups in `docs/operations.md` or a Hermes-specific ops note.
- vijay: created `docs/hermes-operations.md` as the Hermes-specific ops note.
- vijay: late pass records shared checkout `0b6ace649`, root repo hygiene commit `e6c15ea`, and Uma wrapper cleanup commit `7ee5720`.
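
The pre-upgrade snapshot step can be sketched as a helper that copies config and unit files into a dated directory, following the `20260527-roadmap-noncreds` naming seen above. The function name and signature are illustrative assumptions, not the commands that were actually run.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def snapshot_files(sources: list[Path], backup_root: Path, label: str,
                   day: Optional[date] = None) -> Path:
    """Copy each source file into backup_root/<YYYYMMDD>-<label>/ and return that directory."""
    day = day or date.today()
    target = backup_root / f"{day:%Y%m%d}-{label}"
    target.mkdir(parents=True, exist_ok=True)
    for source in sources:
        shutil.copy2(source, target / source.name)  # copy2 keeps mtime and mode
    return target
```

Files are stored under their base names, so two sources with the same name (for example a root and an Uma `config.yaml`) would overwrite each other; separate labels or subdirectories per instance avoid that.
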
### Phase 4 — Provider And Model Resilience
- [x] Keep OpenAI Codex OAuth as the primary provider if it remains stable.
- vijay: root remains on `openai-codex` with `gpt-5.5`; routing stays disabled after the earlier `gpt-5.4-mini` failure path.
- bheem: Uma remains on `openai-codex` with `gpt-5.5`; routing stays disabled after the earlier `gpt-5.4-mini` failure path.
- [x] Add at least one fallback provider for resilience:
- vijay: configured a shared local Ollama fallback chain for both Hermes instances and kept routing disabled on the primary path.
- bheem: same shared local Ollama fallback chain configured for Uma.
- local/Ollama fallback is configured and verified with direct model smoke tests.
- [x] Configure provider credentials through Hermes auth/config flows; do not commit keys.
- vijay: documented the command path; provider additions requiring new credentials remain pending.
- [x] Define model routing tiers:
- vijay: fast/cheap = `qwen2.5:0.5b` or `llama3.2:1b`, strong coding = `qwen2.5-coder:7b`, general/long-context = `llama3.1:8b`, vision-capable = `llama3.2-vision`.
- bheem: same local tier map applies to Uma.
- routing remains disabled until a separate routed path is proven safe.
- [ ] Test fallback behavior by switching models in a new Hermes session.
- vijay: direct Ollama smoke tests passed for `qwen2.5-coder:7b`, `llama3.1:8b`, and `llama3.2-vision`; live Hermes session-switch verification still needs to be done.
- bheem: same live Hermes session-switch verification still needs to be done for Uma.
- [x] Document the preferred default model and fallback order.
- vijay: current default is OpenAI Codex OAuth; fallback provider order is now the shared local Ollama chain.
- vijay: preferred default is explicitly `gpt-5.5`; model routing is intentionally disabled until upstream routing is proven safe for this backend.
- [ ] Verify the root and Uma Telegram gateways can actually switch to the fallback chain in a live conversation without surfacing provider errors.
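
The fallback idea in this phase, try the primary and then walk a local chain, can be illustrated with a provider-agnostic helper. This is not how Hermes implements fallback internally; `first_working` and the `ask` callable are hypothetical, and the order within the chain is an assumption because the roadmap only names the tiers.

```python
from typing import Callable, Iterable

# Tier map copied from the routing notes above.
LOCAL_TIERS = {
    "fast": ["qwen2.5:0.5b", "llama3.2:1b"],
    "coding": ["qwen2.5-coder:7b"],
    "general": ["llama3.1:8b"],
    "vision": ["llama3.2-vision"],
}


def first_working(models: Iterable[str], ask: Callable[[str, str], str],
                  prompt: str) -> tuple[str, str]:
    """Try each model in order; return (model, reply) from the first that answers."""
    errors = []
    for model in models:
        try:
            return model, ask(model, prompt)
        except Exception as exc:  # a smoke test treats any failure as "try the next one"
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

A smoke test would pass an `ask` function that calls the real provider and a chain such as `["gpt-5.5"] + LOCAL_TIERS["coding"] + LOCAL_TIERS["general"]`; a reply from a local model while the primary is deliberately disabled is the evidence the unchecked items above still need.
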
### Phase 5 — Tooling Capability Upgrade
- [ ] Enable/configure at least one reliable web search/extract backend:
- [ ] Exa
- [ ] Tavily
- [ ] Firecrawl
- [ ] SearXNG self-hosted option
- [ ] Configure browser automation only if needed and keep it private/safe:
- [ ] local Chromium/Camofox, or
- [ ] Browserbase/Browser Use
- [ ] Configure GitHub/Gitea automation credentials with least privilege.
- vijay: root local Gitea read-only Git path is configured with `/root/.local/bin/gitea-git` plus `GIT_ASKPASS`; the token remains in `/root/.gitea_npm_token_home` and was not printed. Verified direct Git and Hermes one-shot read access to `http://localhost:3300/bytelyst/learning_ai_common_plat.git`.
- vijay: GitHub push credentials are already configured for root Git operations through `/root/.git-credentials`; root performs pushes for both root and Uma tracking repos. Still unchecked until GitHub token repo/scope permissions are audited as least-privilege.
- [ ] Add vision/image capability if screenshots, diagrams, or UI reviews are common.
- [x] Validate the active Telegram toolset includes the capabilities ByteLyst expects:
- vijay: `hermes doctor --fix` reported browser, clarify, code_execution, cronjob, terminal, delegation, file, memory, messaging, session_search, skills, todo, tts, vision, video, and related toolsets available; web remains blocked by missing search backend API key.
- [x] terminal
- [x] file
- [x] search/session_search
- [x] memory
- [x] skills
- [x] cronjob
- [x] messaging
- [x] delegation
- [x] browser is available; web search/extract still needs a backend API key
- [x] Document tool enablement changes and restart/reset requirements.
- vijay: added restart/reset notes to `docs/hermes-operations.md`.
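
The `GIT_ASKPASS` approach mentioned above works because Git runs the named helper with the prompt text as its first argument and reads the answer from standard output. The sketch below shows one possible helper; it is not the contents of `/root/.local/bin/gitea-git`, and the account name `hermes-readonly` is hypothetical. Only the token file path is taken from the note above.

```python
#!/usr/bin/env python3
import sys
from pathlib import Path

TOKEN_FILE = Path("/root/.gitea_npm_token_home")  # path from the note above
USERNAME = "hermes-readonly"  # hypothetical account name


def answer(prompt: str, token: str, username: str = USERNAME) -> str:
    """Return the username for a username prompt, otherwise the token."""
    return username if prompt.lower().startswith("username") else token


# Git always passes the prompt as argv[1]; without it there is nothing to answer.
if __name__ == "__main__" and len(sys.argv) > 1:
    # Write only to stdout for Git to read; never log or echo the token elsewhere.
    print(answer(sys.argv[1], TOKEN_FILE.read_text().strip()))
```

Pointing `GIT_ASKPASS` at an executable copy of this script for a single command keeps the token out of the remote URL and out of shell history.
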
### Phase 6 — Telegram Gateway Workflow
- [x] Keep Telegram as the primary control plane.
- vijay: watchdog delivery is configured to the origin Telegram conversation; root dashboard is private-only over Tailscale.
- bheem: Uma gateway remains Telegram-driven; Uma dashboard is private-only over Tailscale.
- [x] Preserve the user's preferred progress prefix convention: `1⃣`, `2⃣`, etc.
- vijay: retained in roadmap and memory; use for progress/completion updates from Hermes sessions.
- [x] Ensure home channel and allowed user settings are correct.
- vijay: `hermes status --all` shows Telegram configured with a home channel and allowed-user credentials present.
- [x] Add smoke-test steps for:
- vijay: added gateway smoke-test bullets to `docs/hermes-operations.md`.
- [x] inbound Telegram command
- [x] outbound completion message
- [ ] approval prompt flow
- [ ] media/file delivery
- [x] Decide whether Telegram topic/session handling should be enabled or documented.
- vijay: documented current stance in `docs/hermes-operations.md`: keep default Telegram session handling unless a concrete topic-routing need appears.
- bheem: same default-session stance applies to Uma/Bheem.
- [x] Add a runbook for gateway restart/recovery.
- vijay: added gateway recovery section to `docs/hermes-operations.md`.
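
The outbound half of the Telegram smoke test can be exercised directly against the Bot API `sendMessage` method. This sketch bypasses the Hermes gateway, so it only proves that the bot token and chat ID work; the function names are hypothetical, and the token and chat ID should come from the environment rather than from a file in this repo.

```python
import json
import urllib.request


def build_send_message(token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a Bot API sendMessage request."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    body = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> bool:
    """Send the request and report whether Telegram answered with ok=true."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return bool(json.load(response).get("ok"))
```

The inbound path, approval prompts, and media delivery still need a manual check through the gateway itself.
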
### Phase 7 — Memory, Skills, And Knowledge Capture
- [x] Review persistent memory for stale entries and trim anything no longer useful.
- vijay: reviewed root `MEMORY.md` and `USER.md`; entries are operationally relevant, no safe deletion needed.
- bheem: reviewed Uma `MEMORY.md` and `USER.md`; entries are current Bheem context, no safe deletion needed.
- [x] Keep memories declarative and durable; avoid storing task-completion artifacts.
- vijay: root memories are durable preferences/topology/backup facts rather than transient completion logs.
- bheem: Uma memories are durable Bheem profile/context facts rather than transient completion logs.
- [ ] Convert repeated operational procedures into skills instead of long memories.
- [ ] Pin critical ByteLyst/Hermes skills that should not be archived.
- [ ] Schedule or manually run curator reviews if enabled.
- [ ] Add skills for recurring ByteLyst workflows:
- [x] Gitea Actions troubleshooting
- vijay: root has `devops/self-hosted-gitea-ci`.
- [x] Caddy + Docker routing changes
- vijay: root has `devops/caddy-subdomain-routing`.
- [x] Hermes backup/restore drill
- vijay: root has `devops/hermes-persistent-backup-ops`; Uma backup workflow remains separate and not equivalent.
- [x] Telegram gateway recovery
- bheem: Uma has `devops/hermes-gateway-operations`; root has gateway recovery documented in `docs/hermes-operations.md`.
- [ ] safe multi-repo commit/push workflow
### Phase 8 — Cron, Watchdogs, And Autonomous Maintenance
- [x] Keep current Hermes backup cron job enabled.
- vijay: backup cron remains active.
- [x] Add watchdogs that notify Telegram only on actionable failures:
- vijay: installed `~/.hermes/scripts/hermes_health_watchdog.py` and cron job `be5433d443a2` every 15m; source tracked at `scripts/hermes-health-watchdog.py`.
- [x] gateway down
- [x] cron scheduler stale
- [x] backup job failed or no fresh commit within threshold
- [x] disk usage high
- [x] memory pressure high
- vijay: added `/proc/meminfo` memory-pressure threshold check to `scripts/hermes-health-watchdog.py`, deployed to `~/.hermes/scripts/hermes_health_watchdog.py`, and verified silent-on-success.
- [x] Caddy/Gitea critical services down
- vijay: added critical Docker container checks for `caddy` and `gitea-npm-registry`; deployed watchdog remains silent on a healthy run.
- [x] Prefer `no_agent=True` script-only watchdogs for fixed health checks.
- vijay: watchdog cron is no-agent/script-only and silent on success.
- [x] Keep noisy health checks silent on success.
- vijay: manual script test produced empty output on a healthy run.
- [x] Use self-contained prompts for any LLM-driven cron jobs.
- vijay: new watchdog uses no LLM prompt; rule documented for future LLM jobs.
- [x] Avoid recursive cron creation from cron-run sessions.
- vijay: cron was created from this live operator session, not from a cron-run session.
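
The silent-on-success rule above reduces to: collect alerts, print only when the list is non-empty. The following sketch is not the deployed `hermes_health_watchdog.py`; the metric names and thresholds are hypothetical placeholders.

```python
# Hypothetical thresholds; the deployed watchdog's real values are not shown here.
THRESHOLDS = {"disk_used_percent": 85.0, "mem_available_mib_min": 512.0}


def collect_alerts(metrics: dict, thresholds: dict = THRESHOLDS) -> list[str]:
    """Turn raw health metrics into a list of actionable alert lines."""
    alerts = []
    if not metrics.get("gateway_active", False):
        alerts.append("gateway service is not active")
    if metrics["disk_used_percent"] >= thresholds["disk_used_percent"]:
        alerts.append(f"disk usage high: {metrics['disk_used_percent']:.0f}%")
    if metrics["mem_available_mib"] < thresholds["mem_available_mib_min"]:
        alerts.append(f"memory pressure: {metrics['mem_available_mib']:.0f} MiB available")
    return alerts


def report(alerts: list[str]) -> bool:
    """Print nothing on a healthy run so the cron delivery stays silent."""
    if alerts:
        print("\n".join(alerts))
    return bool(alerts)
```

Keeping the decision logic in a pure function like `collect_alerts` makes the thresholds testable without touching systemd, Docker, or `/proc`.
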
### Phase 9 — Private Dashboard / Mission Control Direction
- [x] Do not expose Hermes dashboard publicly.
- vijay: no public dashboard/API route added; private-only policy documented.
- [x] If a dashboard is useful, make it private-only and operationally scoped.
- vijay: root dashboard is running as `hermes-root-dashboard.service` at `http://100.87.53.10:9119/`, bound only to the Tailscale IP.
- bheem: Uma dashboard is running as `uma-hermes-dashboard.service` at `http://100.87.53.10:9120/`, bound only to the Tailscale IP.
- [ ] Dashboard should show:
- [ ] gateway status
- [ ] active sessions
- [ ] cron job state
- [ ] backup freshness
- [ ] recent sanitized alerts
- [ ] quick links to docs/runbooks
- vijay: root dashboard HTTP endpoint returns `200` over Tailscale; feature-by-feature UI validation remains pending.
- bheem: Uma dashboard HTTP endpoint returns `200` over Tailscale; feature-by-feature UI validation remains pending.
- [x] Any dashboard actions must require authentication and ideally remain reachable only over private network/tunnel.
- vijay: root dashboard is private-network-only via Tailscale IP binding; no public listener or Caddy route was added.
- bheem: Uma dashboard is private-network-only via Tailscale IP binding; no public listener or Caddy route was added.
- [x] Add a Caddy review step before adding any new hostname.
- vijay: added Caddy/port review commands to `docs/hermes-operations.md`.
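
A quick way to sanity-check a proposed dashboard bind address is to confirm it is loopback or inside the `100.64.0.0/10` range that Tailscale assigns IPv4 node addresses from. This is a minimal sketch with a hypothetical function name; it ignores Tailscale's IPv6 range and does not replace the Caddy and listener review above.

```python
import ipaddress

# Tailscale assigns IPv4 node addresses from the CGNAT range 100.64.0.0/10.
TAILNET = ipaddress.ip_network("100.64.0.0/10")


def is_private_bind(address: str) -> bool:
    """True if the bind address is loopback or a tailnet IPv4 address."""
    ip = ipaddress.ip_address(address)
    if ip.is_unspecified:  # 0.0.0.0 or :: listens on every interface
        return False
    return ip.is_loopback or ip in TAILNET
```

With the addresses in this roadmap, `is_private_bind("100.87.53.10")` is true, while a wildcard bind such as `0.0.0.0` is rejected.
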
### Phase 10 — Multi-Agent And Project Execution Workflow
- [ ] Use `delegate_task` for bounded subtasks inside a parent session.
- [ ] Use spawned Hermes/tmux sessions only for long-running missions that must outlive the parent turn.
- [ ] Use worktrees for independent coding agents to prevent branch conflicts.
- [ ] For durable multi-agent coordination, evaluate Hermes Kanban.
- [x] Document when to use:
- [x] direct tool call
- [x] delegate_task
- [x] background terminal process
- [x] cron job
- [x] Kanban worker
- vijay: added multi-agent execution convention guidance to `docs/hermes-operations.md`.
- [x] Add a ByteLyst convention for progress/completion Telegram notifications from concurrent sessions.
- vijay: documented the numbered/emoji-prefix convention in `docs/hermes-operations.md`.
- bheem: Uma/Bheem follows the same convention.
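
The choice between the mechanisms listed in this phase can be summarized as a small decision helper. The written convention lives in `docs/hermes-operations.md` and is not reproduced here, so the precedence order below is an assumption built only from the bullets above; `Task` and `choose_mechanism` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Task:
    recurring: bool = False                   # runs on a schedule
    needs_durable_coordination: bool = False  # several agents, state must survive sessions
    outlives_parent_turn: bool = False        # long-running mission
    bounded_subtask: bool = False             # can be handed off inside the parent session


def choose_mechanism(task: Task) -> str:
    """Pick an execution mechanism; earlier checks take precedence (assumed order)."""
    if task.recurring:
        return "cron job"
    if task.needs_durable_coordination:
        return "Kanban worker"
    if task.outlives_parent_turn:
        return "spawned session or background terminal process"
    if task.bounded_subtask:
        return "delegate_task"
    return "direct tool call"
```

Independent coding agents would additionally get their own Git worktree regardless of which mechanism starts them.
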
### Phase 11 — Security And Secret Hygiene
- [x] Reconfirm raw `.env`, OAuth credentials, tokens, logs, and SQLite WAL/SHM files are excluded from git backups.
- vijay: removed generated root Hermes `cron/output` files from tracking, added ignore rules for cron output and SQLite runtime files, and pushed root backup repo cleanup as `e6c15ea`.
- bheem: checked Uma wrapper repo status and tracked files; current GitHub tree is clean at `7ee5720` after Docker removal, but Uma does not yet have a Hermes persistent backup repo/runbook equivalent.
- [ ] Consider enabling `security.redact_secrets` if the operational tradeoff is acceptable.
- [ ] Keep `privacy.redact_pii` decision documented for gateway sessions.
- [ ] Rotate old credentials after migration or accidental exposure risk.
- [ ] Use least-privilege tokens for GitHub/Gitea, web APIs, and provider keys.
- vijay: Gitea Git operations now use the narrow local token through `GIT_ASKPASS`; API profile reads are intentionally blocked by token scope. GitHub, web APIs, and provider-key rotation remain pending.
- [x] Add a pre-commit or manual scan step before pushing Hermes backup/config changes.
- vijay: added manual scan/review step in practice during root/Uma repo pushes; root backup repo now ignores generated cron outputs that previously carried noisy token-pattern scan results.
- [x] Keep approval mode at `manual` or `smart` for Telegram-driven work.
- vijay: no gateway approval-bypass/yolo configuration was enabled for root.
- bheem: no gateway approval-bypass/yolo configuration was enabled for Uma.
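
The manual scan step before pushing backup or config changes can be approximated with a few regular expressions for common token shapes. This sketch is illustrative only: the pattern list is an assumption, it will miss provider-specific formats, and it reports line numbers and pattern names rather than the matched text so that scan output never carries a secret itself.

```python
import re

# A small, non-exhaustive set of token shapes (assumed, not the repo's actual scan list).
PATTERNS = {
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36}\b"),
    "telegram_bot_token": re.compile(r"\b\d{8,10}:[A-Za-z0-9_-]{35}"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for each suspicious line."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits
```

Reporting names instead of matches also avoids the problem noted above, where generated scan output ended up tracked in the backup repo.
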
### Phase 12 — Documentation And Runbooks
- [x] Add a Hermes operations index under `docs/`.
- vijay: created `docs/hermes-operations.md`.
- [x] Link this roadmap from `docs/repo-map.md`.
- vijay: roadmap was already listed; added `docs/hermes-operations.md` to repo map.
- [x] Create or update runbooks for:
- [x] installing/upgrading Hermes
- vijay: `docs/hermes-operations.md` contains upgrade commands and late-upgrade verification notes.
- [x] restarting the gateway
- [x] restoring persistent data from backup
- [x] configuring providers/models
- [x] enabling/disabling tools
- [x] adding safe cron watchdogs
- [x] private-only dashboard access
- [x] Keep commands copy-pasteable and include expected outputs.
- vijay: copied operational commands into `docs/hermes-operations.md`; expected-output notes included where useful.
- vijay: late pass expanded `docs/hermes-operations.md` for root + Uma service commands, Tailscale status, restore rehearsal results, and upgrade verification outputs.
- [x] Store secrets only as placeholder variable names or `.env.example` entries.
- vijay: no raw secrets were added to docs or scripts.
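
The placeholder rule above can be applied mechanically by deriving a `.env.example` from a real `.env`: keep names and comments, blank every value. This is a minimal sketch with a hypothetical function name; lines that are neither comments nor assignments are dropped so that stray value fragments cannot leak.

```python
def to_env_example(env_text: str) -> str:
    """Keep variable names and comments from a .env file but blank out every value."""
    out = []
    for line in env_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            out.append(line)  # blank lines and comments pass through unchanged
        elif "=" in stripped:
            name = stripped.split("=", 1)[0].strip()
            out.append(f"{name}=")
        # anything else (for example a continued multi-line value) is dropped
    return "\n".join(out) + "\n"
```

The result still deserves a read before committing, since comments in a real `.env` can themselves contain sensitive values.
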
## Priority Execution Plan
### Immediate — Today / Next Session
- [x] Confirm no public Hermes dashboard route exists.
- [x] Investigate `hermes doctor` timeout.
- [x] Verify backup cron freshness and remote push status.
- [x] Add one Telegram watchdog for gateway/backup failure.
- [ ] Choose and configure one web search backend.
### Near-Term — This Week
- [ ] Add fallback model/provider.
- [ ] Document provider routing and model defaults.
- [x] Add gateway recovery runbook.
- [ ] Add restore drill runbook and perform one test-profile restore.
- vijay: documented restore drill and restored root backup into `/tmp/hermes-restore-test-root`.
- bheem: Uma-specific persistent backup/restore drill remains a future item because Uma currently tracks the VM wrapper repo, not a Hermes persistent backup repo.
- [ ] Add Gitea/GitHub least-privilege automation credential path.
- vijay: Gitea path is complete for root via `/root/.local/bin/gitea-git`; GitHub push path exists in root's credential store and is used for root-managed pushes, including Uma repo updates. Least-privilege scope verification remains pending, so this combined item stays unchecked.
### Medium-Term — This Month
- [x] Evaluate private-only dashboard/mission-control UX.
- vijay: root dashboard is reachable via Tailscale at `http://100.87.53.10:9119/`.
- bheem: Uma dashboard is reachable via Tailscale at `http://100.87.53.10:9120/`.
- [ ] Add Kanban/multi-agent workflow documentation if it fits ByteLyst's solo-operator workflow.
- [x] Add silent-on-success system watchdogs.
- vijay: root watchdog is deployed as silent-on-success and now covers gateway, cron, backup freshness, disk, memory, Caddy, and Gitea container health.
- [ ] Clean up stale memory/skills and pin critical skills.
- [ ] Schedule quarterly restore drills.
- vijay: quarterly restore drill reminder cron is configured for root.
- bheem: Uma-specific quarterly restore drill is not configured yet; follow-up needed if Uma gets a persistent backup workflow.
## Acceptance Criteria
This roadmap is complete when:
- [x] Hermes can be upgraded and rolled back/restored with a documented process.
- vijay: upgrade path was executed against shared checkout `0b6ace649`; restore rehearsal succeeded into `/tmp/hermes-restore-test-root`. Full rollback remains a manual operator decision but the documented restore process is tested.
- [x] Gateway failures and backup failures notify Telegram.
- [ ] At least one fallback model/provider is configured and tested.
- [ ] Web/search tooling works for current research tasks.
- [x] No Hermes dashboard/API is publicly exposed.
- [ ] Backup restore has been tested into a non-production profile.
- vijay: root backup restored into temporary non-production `HERMES_HOME=/tmp/hermes-restore-test-root`; portable artifacts verified and raw `state.db` absent.
- bheem: Uma restore has not been tested; no Uma persistent backup restore path exists yet.
- [x] Core ByteLyst Hermes procedures exist as docs or skills.
- [x] Sensitive files remain untracked and backup-safe.
## Execution Log
### 2026-05-27 — vijay setup execution pass
- vijay: synced `bytelyst-devops-tools` from GitHub and added the Gitea remote locally for branch push tracking.
- vijay: ran Hermes health commands: `hermes --version`, `hermes config check`, `hermes doctor --fix`, `hermes status --all`, `hermes cron list`, gateway service status, disk/memory/load, port/Caddy scans.
- vijay: `hermes doctor --fix` completed and migrated config v23 → v24.
- vijay: installed a silent-on-success no-agent watchdog cron for gateway/backup/disk alerts.
- vijay: created `docs/hermes-operations.md`, updated `docs/operations.md`, and added this roadmap progress commentary.
- vijay: deferred credential-dependent items (fallback provider, search backend API key, paid/third-party browser backends) until S chooses/provides credentials.
- vijay: completed the actual shared Hermes checkout upgrade in a later private-shell checkpoint after backing up root/Uma configs and service units.
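
The silent-on-success watchdog mentioned in this pass follows a simple pattern: run each check, collect only the failures, and produce output only when something is wrong, so a healthy run stays quiet. The sketch below illustrates that pattern under stated assumptions and is not the deployed script. The unit name `hermes-gateway.service`, the 90% disk threshold, and all function names are hypothetical. Alert delivery (for example to Telegram) is left outside the sketch.

```python
import shutil
import subprocess


def disk_alert(path="/", max_used_pct=90.0):
    """Return an alert string if the filesystem at `path` is too full, else None."""
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * usage.used / usage.total
    if used_pct > max_used_pct:
        return f"disk {path} at {used_pct:.0f}% (limit {max_used_pct:.0f}%)"
    return None


def service_alert(unit):
    """Return an alert string if a systemd unit is not active, else None."""
    try:
        # `systemctl is-active --quiet` exits 0 only when the unit is active.
        result = subprocess.run(
            ["systemctl", "is-active", "--quiet", unit], check=False
        )
    except FileNotFoundError:
        return f"cannot check {unit}: systemctl not found"
    if result.returncode != 0:
        return f"service {unit} is not active"
    return None


def collect_alerts(checks):
    """Run each zero-argument check and keep only the non-empty results."""
    return [msg for msg in (check() for check in checks) if msg]


def main():
    """Print alerts and return 1 on failure; print nothing and return 0 on success."""
    checks = [
        lambda: disk_alert("/", 90.0),
        lambda: service_alert("hermes-gateway.service"),  # hypothetical unit name
    ]
    alerts = collect_alerts(checks)
    if alerts:
        print("\n".join(alerts))
        return 1
    return 0
```

A cron wrapper would call `main()` and forward any printed lines to the alert channel. Because each check is an independent function, new ones such as memory-pressure or container checks can be appended to the list without touching the reporting logic.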
### 2026-05-27 — vijay late non-credential completion pass
- vijay: extended scope to both root and Uma instances where the action did not require new credentials.
- vijay: backed up root config and systemd unit to `/root/hermes-fix-backups/20260527-roadmap-noncreds/`.
- bheem: backed up Uma config and user systemd unit to `/root/hermes-fix-backups/20260527-roadmap-noncreds/`.
- bheem: migrated Uma Hermes config v23 → v24 with `hermes doctor --fix`.
- vijay: root was already config v24.
- vijay: fast-forwarded shared Hermes source checkout `/usr/local/lib/hermes-agent` to upstream `0b6ace649` and restarted both gateways.
- vijay: verified root provider smoke test: `root-roadmap-ok`.
- bheem: verified Uma provider smoke test: `uma-roadmap-ok`.
- vijay: confirmed root service is enabled and active.
- bheem: confirmed Uma service is enabled and active; Docker-based Uma Hermes remains removed.
- vijay: installed Tailscale `1.98.3`; `tailscaled` is enabled/running and authenticated to tailnet IP `100.87.53.10`.
- vijay: installed permanent root dashboard service `hermes-root-dashboard.service` at `http://100.87.53.10:9119/`.
- bheem: installed permanent Uma dashboard service `uma-hermes-dashboard.service` at `http://100.87.53.10:9120/`.
- vijay: added dashboard service unit templates under `systemd/` for repo tracking.
- vijay: extended the root watchdog with memory-pressure and Caddy/Gitea container checks, deployed it, and verified that it stays silent on success.
- vijay: reviewed root persistent memories and recurring workflow skills.
- bheem: reviewed Uma persistent memories and recurring workflow skills.
- vijay: cleaned the root backup repo's current tree by untracking generated `hermes_persistent_backup/cron/output` files, then pushed commit `e6c15ea`.
- bheem: confirmed Uma wrapper repo is clean at `7ee5720` after Docker deployment removal.
- vijay: ran root restore rehearsal into `/tmp/hermes-restore-test-root`, verified portable restore content, and scanned restored config/template for common token patterns.
- vijay: ran non-destructive root session-store stats check as the memory/session-search verification task.
- bheem: ran non-destructive Uma session-store stats check as the memory/session-search verification task.
- vijay: updated `docs/hermes-operations.md` with root service commands, Tailscale status, restore rehearsal outcome, and late upgrade notes.
- bheem: updated `docs/hermes-operations.md` with Uma service commands and shared private-dashboard notes.
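
The restore rehearsal above included a scan of the restored config/template for common token patterns. A minimal sketch of such a scan follows. The pattern list is an illustrative assumption (GitHub-style prefixes, Telegram bot token shape, `sk-` style API keys) and is not the list actually used in the rehearsal. The function reports only line numbers and pattern names, so matched values are never printed.

```python
import re

# Illustrative patterns only; the pattern list used in the rehearsal is not documented here.
TOKEN_PATTERNS = {
    "github_pat": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}"),
    "telegram_bot": re.compile(r"\b\d{8,10}:[A-Za-z0-9_-]{35}"),
    "sk_style_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
}


def find_token_hits(text):
    """Return (line_number, pattern_name) pairs without exposing the matched value."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in TOKEN_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

An empty result means none of the listed shapes were found. It does not prove the file is free of secrets, because tokens in other formats would pass unnoticed.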
### 2026-05-27 — vijay Gitea least-privilege Git path
- vijay: confirmed local Gitea API version `1.22.6` and root-only token-file permissions without printing token values.
- vijay: verified `/root/.gitea_npm_token_home` does not have broad profile-read scope; `/api/v1/user` returned the expected scope denial instead of user data.
- vijay: installed `/root/.local/bin/gitea-git-askpass` and `/root/.local/bin/gitea-git` so Hermes/Git can authenticate to local Gitea without embedding tokens in remotes or Git config.
- vijay: verified direct Git read operation: `gitea-git ls-remote http://localhost:3300/bytelyst/learning_ai_common_plat.git HEAD` returned HEAD `59c4638f85be...`.
- vijay: verified the same read-only operation through a Hermes one-shot run; Hermes reported success and printed only the truncated HEAD hash.
- vijay: documented the exact safe token flow in `docs/hermes-operations.md`; corrected GitHub status to show credentials already exist for root-managed pushes, with least-privilege scope audit still pending.
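
The askpass approach above relies on a standard Git mechanism: when `GIT_ASKPASS` names a program, Git runs it with the prompt text as its single argument and reads the answer from standard output, so the token never needs to appear in a remote URL or in Git config. The sketch below shows the general shape of such a helper. It is not the contents of `/root/.local/bin/gitea-git-askpass`, and the environment variable names `GITEA_TOKEN_FILE` and `GITEA_USERNAME` are hypothetical.

```python
import os
import stat


def read_token(token_path):
    """Read the token, refusing files that group or others can access."""
    mode = stat.S_IMODE(os.stat(token_path).st_mode)
    if mode & 0o077:
        raise PermissionError(
            f"{token_path} has mode {mode:o}; expected 600 or stricter"
        )
    with open(token_path, encoding="utf-8") as handle:
        return handle.read().strip()


def askpass_answer(prompt, token_path, username):
    """Return what Git should receive for a given askpass prompt."""
    # Git prompts look like "Username for '<url>': " or "Password for '<url>': ".
    if prompt.lower().startswith("username"):
        return username
    return read_token(token_path)


def main(argv):
    """Entry point: argv[1] is the prompt text that Git passes in."""
    prompt = argv[1] if len(argv) > 1 else ""
    token_path = os.environ["GITEA_TOKEN_FILE"]  # hypothetical variable name
    username = os.environ.get("GITEA_USERNAME", "git")  # hypothetical variable name
    print(askpass_answer(prompt, token_path, username))
```

An executable wrapper would call `main(sys.argv)`, and a companion command would export `GIT_ASKPASS` pointing at that wrapper before invoking `git`. The permission check mirrors the root-only token-file verification recorded in this pass.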
## Notes For Future Transcript Pass
When the transcript is available, specifically check whether the video recommends any of the following and update this roadmap accordingly:
- exact provider/model choices
- recommended Hermes install path
- gateway platform setup details
- dashboard or web UI exposure guidance
- memory/skill workflows
- MCP server recommendations
- cron/background agent patterns
- voice/STT/TTS setup
- any security warnings or anti-patterns