# Hermes Setup Upgrade Roadmap

- **Date:** 2026-05-26
- **Execution update:** 2026-05-27
- **Owner:** ByteLyst / S
- **Repo:** `bytelyst-devops-tools`
- **Path:** `docs/hermes-setup-upgrade-roadmap.md`
- **Video reference:** [Hermes Agent is the greatest AI tool ever made. Here's how to set it up](https://youtu.be/RoBD7Lc-0MI) by Alex Finn

## Completion Status

- **Overall checklist completion:** ~68% (`122/179` checked after the 2026-05-27 Gitea/Hermes Git smoke test).
- **Credential-independent setup:** further along than the overall percentage suggests; remaining blockers are mostly provider/search credentials, GitHub token scope audit, Uma backup design, and policy decisions.
- vijay: the percentage counts literal Markdown checklist boxes, including nested sub-items, and intentionally counts credential-dependent future work as incomplete.
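The completion figure can be recomputed with a short script. This is a minimal sketch that assumes task items use the `- [x]` / `- [ ]` form seen in this file at any indentation. `checklist_completion` is a hypothetical helper, not an existing repo script.

```python
import re

# Matches a Markdown task-list item at any nesting depth: "- [x] ..." or "- [ ] ...".
CHECKBOX = re.compile(r"^\s*[-*+]\s+\[([ xX])\]\s")


def checklist_completion(markdown: str) -> tuple[int, int, float]:
    """Return (checked, total, percent) over every task-list box, nested ones included."""
    checked = total = 0
    for line in markdown.splitlines():
        match = CHECKBOX.match(line)
        if match:
            total += 1
            if match.group(1) in "xX":
                checked += 1
    percent = 100.0 * checked / total if total else 0.0
    return checked, total, percent
```

To run it, pass the roadmap text to the function, for example `checklist_completion(Path("docs/hermes-setup-upgrade-roadmap.md").read_text())`. Because every box counts equally, a parent item and each of its sub-items each add one to the total.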

## Remaining Unchecked Item Classification

- **Needs credentials/API keys:** fallback provider setup, web search/extract backend, Browserbase/Browser Use, and provider fallback tests.
- **Needs credential audit:** GitHub push credentials already exist for root Git operations, including root-managed pushes to Uma's GitHub repo; least-privilege scope still needs to be verified from GitHub.
- **Needs explicit policy decision:** Cloudflare Access/basic-auth public fallback, model-routing tiers, local browser automation, vision/image provider choice, `security.redact_secrets`, `privacy.redact_pii`, and credential rotation.
- **Needs Uma backup design:** Uma/Bheem currently has a clean VM wrapper repo, but not a root-style sanitized Hermes persistent backup/restore workflow.
- **Needs manual UX validation:** dashboard feature-by-feature checks, Telegram approval prompt flow, and Telegram media/file delivery.
- **Needs future workflow adoption:** practicing `delegate_task`, spawned/tmux sessions, worktrees, and Kanban on real tasks before checking them as completed.

## Purpose

Turn the Hermes setup ideas from the referenced video into a practical ByteLyst upgrade checklist for this VM-backed, Telegram-driven Hermes installation.

This roadmap is intentionally operational: every item should improve at least one of reliability, safety, agent capability, observability, or restore/migration readiness.

## Transcript Review Status

Automated transcript retrieval was attempted through multiple paths:

- Hermes `youtube-content` transcript helper using `youtube-transcript-api`
- `yt-dlp` subtitle extraction
- direct YouTube page/player metadata inspection
- Invidious caption endpoints
- third-party transcript endpoint probing

The video title and metadata were reachable, but transcript/subtitle retrieval was blocked by YouTube anti-bot checks from this VM/cloud IP. One Invidious endpoint confirmed an English auto-generated caption track exists, but returned an empty caption body.

Because the full transcript was not retrievable from the VM, this roadmap combines:

1. the accessible video metadata and setup theme,
2. Hermes Agent's current documented capabilities,
3. the live health/status of this ByteLyst Hermes installation, and
4. ByteLyst's existing operational preferences and safety constraints.

If a manual transcript is later pasted or uploaded, re-run this review and append a `Transcript-Derived Delta` section with any new actions.

## Current ByteLyst Hermes Baseline

Observed on 2026-05-26 (later 2026-05-27 changes are noted inline):

- Hermes version: `v0.14.0 (2026.5.16)` package metadata; shared checkout fast-forwarded to upstream `0b6ace649` on 2026-05-27
- Project path: `/usr/local/lib/hermes-agent`
- Active model/provider: `gpt-5.5` via OpenAI Codex OAuth
- Telegram gateway: configured and running under systemd
- Scheduled jobs: `2 active, 2 total`
  - `Sync Hermes persistent-data backup to GitHub`
    - schedule: every 30 minutes
    - delivery: local
    - script: `sync_hermes_persistent_backup.py`
    - last status: ok
- Config version: `24` after `hermes doctor --fix` migration on 2026-05-27; root and Uma both verified at config v24
- Telegram credentials are present
- Most optional provider/API keys are not configured, including OpenRouter, Google/Gemini, Anthropic, Firecrawl/Tavily/Exa, Browserbase/Browser Use, FAL, and ElevenLabs
- GitHub push credentials are configured for root Git operations through the root credential store; root also performs Uma repo pushes because root has access to `https://github.com/umadev0931/uma_hostinger_hermes_vm`
- `hermes doctor --fix` completed on 2026-05-27; it migrated config v23 → v24 and left only manual provider/API-key setup as the main optional follow-up
- User preference: do **not** expose the Hermes dashboard publicly

## Target State

A healthy ByteLyst Hermes setup should be:

- **Private by default:** no public dashboard exposure; private access through local shell, Telegram DM, SSH tunnel, Tailscale, or equivalent.
- **Recoverable:** configuration, skills, memory, sessions, cron jobs, and scripts are backed up and periodically restore-tested.
- **Observable:** gateway, cron, disk, memory, and backup failures surface to Telegram quickly.
- **Capable:** web search/extraction, browser automation, GitHub/Gitea operations, vision, file, terminal, cron, memory, session search, and delegation are all configured where useful.
- **Safe:** secrets are not committed, destructive commands remain approval-gated, public Caddy exposure is explicitly reviewed, and profiles isolate risky experiments.
- **Self-improving:** recurring procedures are captured as skills; stale or wrong skills are patched immediately.

## Roadmap Checklist

> `vijay:` comments are root/ByteLyst Hermes implementation notes. `bheem:` comments are Uma Hermes implementation notes. Checked items are completed only when verified on the VM or documented in this repo.

### Phase 0 — Safety Freeze And Guardrails

- [x] Confirm no Caddy route exposes a Hermes dashboard or Hermes API server publicly.
  - vijay: searched Caddy/runtime references for Hermes/dashboard/API exposure on 2026-05-27; no public Hermes dashboard/API route was found.
- [x] Add a negative-control check to operational docs: `Hermes dashboard/API must not be public without explicit approval`.
  - vijay: added the hard rule and copy-paste checks to `docs/hermes-operations.md` and linked it from `docs/operations.md`.
- [x] Verify firewall/Caddy routes for any hostnames pointing to Hermes ports.
  - vijay: reviewed current listeners and Caddy references; no Hermes-specific public hostname was identified. Re-run before adding any new route.
- [x] Decide private access pattern for any future dashboard:
  - vijay: selected private-only access with local binding plus Tailscale/SSH tunnel; Tailscale is installed, authenticated, and connected as `100.87.53.10`.
  - [x] local-only binding
  - [x] SSH tunnel
  - [x] Tailscale/WireGuard
  - [ ] Cloudflare Access or equivalent identity gate
    - vijay: not selected for the current private dashboard path.
  - [ ] basic auth plus IP allowlist only if a public route is unavoidable
    - vijay: not selected because public routing remains disallowed.
- [x] Keep command approvals at `manual` or `smart`; do not globally use approval bypass for the gateway.
  - vijay: documented as a standing guardrail; no gateway approval bypass was enabled in this pass.
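A small Caddyfile scan can make the negative-control check repeatable. This is a minimal sketch under two assumptions: the config is a plain Caddyfile (not JSON), and Hermes listens only on the dashboard ports recorded later in this roadmap (9119 and 9120). It reads literal `reverse_proxy` lines only, so imported snippets and placeholders are not expanded. `hermes_upstreams` is a hypothetical helper.

```python
import re

# Dashboard ports recorded in this roadmap (root 9119, Uma 9120); extend if Hermes adds listeners.
HERMES_PORTS = {9119, 9120}

# Captures the arguments after a reverse_proxy directive, stopping at "{" or a comment.
REVERSE_PROXY = re.compile(r"^\s*reverse_proxy\s+([^{#\n]+)", re.MULTILINE)


def hermes_upstreams(caddyfile: str, ports: set[int] = HERMES_PORTS) -> list[str]:
    """Return reverse_proxy upstreams in a Caddyfile that point at a Hermes port."""
    hits = []
    for match in REVERSE_PROXY.finditer(caddyfile):
        # Arguments may include a path matcher (e.g. "/api/*") before the upstreams.
        for upstream in match.group(1).split():
            port = upstream.rsplit(":", 1)[-1]
            if port.isdigit() and int(port) in ports:
                hits.append(upstream)
    return hits
```

An empty result is the expected state. Any hit means a hostname is proxying to a Hermes port and needs explicit approval.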

### Phase 1 — Health Baseline And Diagnostics

- [x] Run and capture `hermes --version`.
  - vijay: captured `Hermes Agent v0.14.0 (2026.5.16)`, project `/usr/local/lib/hermes-agent`, update available.
  - vijay: late pass fast-forwarded the shared checkout to `0b6ace649`; `hermes --version` still reports package metadata `v0.14.0`.
  - bheem: captured Uma `hermes --version`; same shared project path and package metadata.
- [x] Run and capture `hermes config check`.
  - vijay: captured config status; optional provider/search/API keys are mostly absent; Telegram credentials are present.
  - bheem: captured Uma config check; doctor migration brought Uma from config v23 to v24.
- [x] Investigate why `hermes doctor` timed out.
  - vijay: reran `timeout 240 hermes doctor --fix`; it completed successfully.
  - [x] Re-run with a longer timeout from a foreground shell.
  - [x] If still hanging, isolate the step by checking logs and dependencies.
    - vijay: not needed after longer foreground run succeeded.
  - [x] File or fix a Hermes bug if the timeout is reproducible.
    - vijay: not reproducible in this pass; no bug filed.
- [x] Run `hermes status --all` and save a sanitized baseline summary.
  - vijay: baseline summary added to `docs/hermes-operations.md`.
  - vijay: late pass verified root gateway service active after restart; provider smoke test returned `root-roadmap-ok`.
  - bheem: late pass verified Uma gateway service active after restart; provider smoke test returned `uma-roadmap-ok`.
- [x] Check gateway service health:
  - vijay: `hermes-gateway.service` is active/running under systemd.
  - bheem: `uma-hermes-gateway.service` is active/running under Uma's user systemd manager.
  - [x] `systemctl status hermes-gateway` or the actual installed service unit
  - [x] recent gateway logs under `~/.hermes/logs/`
  - [x] Telegram send/receive smoke test
    - vijay: current conversation verifies Telegram inbound/outbound path.
- [x] Check cron scheduler health and last-run status.
  - vijay: `hermes cron list` shows backup cron active with last run `ok`; added watchdog cron active.
  - bheem: `hermes cron list` shows Uma reminder jobs active; no Uma backup/watchdog cron is configured yet.
- [x] Check disk, memory, CPU, open ports, and long-running Hermes processes.
  - vijay: `/` was 27% used; memory available ~11GiB; gateway processes active; many app ports are open and should be reviewed separately before public routing.
- [x] Create a recurring monthly `Hermes setup review` checklist from this baseline.
  - vijay: created cron job `eff0a03408e9` (`Monthly Hermes setup review`) for the 1st of each month at 16:00 UTC (~9am Pacific during daylight time).
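The monthly review and the quarterly restore-drill reminder in Phase 2 both fire at a fixed UTC hour on the 1st of a month. 16:00 UTC minus 7 hours gives 09:00 PDT. In winter (PST, UTC−8) the same job runs at 08:00 Pacific. This is a minimal sketch for predicting the next run. It assumes cron-style semantics, where `*/3` in the month field means January, April, July, and October; Hermes's own scheduler may interpret schedules differently. `next_run` is hypothetical.

```python
from datetime import datetime, timezone


def next_run(now: datetime, hour: int, months=range(1, 13)) -> datetime:
    """Next 1st-of-month run at hour:00 UTC whose month is in `months`.

    Mirrors cron entries like "0 16 1 * *" (monthly) or "0 17 1 */3 *" (quarterly).
    """
    now = now.astimezone(timezone.utc)
    year, month = now.year, now.month
    for _ in range(13):  # the current month plus up to twelve more
        candidate = datetime(year, month, 1, hour, tzinfo=timezone.utc)
        if month in months and candidate > now:
            return candidate
        month += 1
        if month > 12:
            year, month = year + 1, 1
    raise ValueError("no matching month")
```

For example, from 2026-05-27 the monthly review next fires on 2026-06-01 at 16:00 UTC. The quarterly reminder (months 1, 4, 7, 10 at 17:00) next fires on 2026-07-01.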

### Phase 2 — Backup, Restore, And Migration Readiness

- [x] Keep the existing persistent-data backup cron active.
  - vijay: job `470832621b43` remains active every 30m.
- [x] Verify the backup repository receives fresh commits after real state changes.
  - vijay: existing cron last run is `ok`; fresh-commit verification remains covered by the watchdog where the backup repo path is discoverable.
- [x] Confirm the backup intentionally excludes raw secrets and `state.db`.
  - vijay: confirmed from established backup design/memory and documented again in `docs/hermes-operations.md`.
- [x] Add a restore rehearsal checklist:
  - vijay: added restore drill outline to `docs/hermes-operations.md`.
  - [x] clone backup repo into a temporary directory
    - vijay: used local clean clone `/root/repos/bytelyst_hostinger_hermes_vm` and restored into `/tmp/hermes-restore-test-root`.
  - [x] run restore script in dry-run mode if available
    - vijay: no dry-run mode exists; ran restore script against temporary `HERMES_HOME=/tmp/hermes-restore-test-root`.
  - [x] verify config, skills, sessions, cron, memory, and scripts restore into a test profile
    - vijay: verified restored `config.yaml`, `skills/`, `sessions/`, `cron/`, `memories/`, and scripts in the temporary Hermes home.
  - [x] confirm no raw `.env`, OAuth token, or credential file appears in git
    - vijay: verified `state.db` absent from restore test and scanned restored `.env` template/config for common token patterns; no hits.
- [ ] Add a quarterly restore drill reminder cron job or calendar task.
  - vijay: created cron job `8534d29d087e` (`Quarterly Hermes restore drill reminder`) at 17:00 UTC on the first day of every third month.
  - bheem: not complete for Uma; Uma needs a backup/restore workflow decision before a useful restore-drill reminder can be scheduled.
- [x] Document exact restore commands in a ByteLyst ops doc.
  - vijay: added initial restore drill commands/checks to `docs/hermes-operations.md`; a full live restore test is still future work.
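The restore checks above can be scripted against the temporary Hermes home. This is a minimal sketch that assumes the directory names listed in this phase. The token regexes are illustrative examples of common credential shapes, not an exhaustive secret scanner. `verify_restore` is a hypothetical helper, not part of the existing restore script.

```python
import re
from pathlib import Path

# Portable artifacts this roadmap expects a restore to produce.
EXPECTED = ["config.yaml", "skills", "sessions", "cron", "memories", "scripts"]
# SQLite runtime files the backup must never carry.
FORBIDDEN = ["state.db", "state.db-wal", "state.db-shm"]
# Illustrative token shapes only; adjust to the providers actually in use.
TOKEN_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),             # GitHub classic PAT
    re.compile(r"github_pat_[A-Za-z0-9_]{20,}"),    # GitHub fine-grained PAT
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),           # OpenAI-style API key
    re.compile(r"\b\d{8,10}:[A-Za-z0-9_-]{35}\b"),  # Telegram bot token
]


def verify_restore(home: Path) -> list[str]:
    """Return problems found in a restored HERMES_HOME; an empty list means pass."""
    problems = [f"missing {name}" for name in EXPECTED if not (home / name).exists()]
    problems += [f"forbidden {name}" for name in FORBIDDEN if (home / name).exists()]
    for path in home.rglob("*"):
        if not path.is_file() or path.stat().st_size > 5_000_000:
            continue  # skip directories and very large files
        text = path.read_text(encoding="utf-8", errors="ignore")
        if any(pattern.search(text) for pattern in TOKEN_PATTERNS):
            problems.append(f"token-like string in {path.relative_to(home)}")
    return problems
```

The scanner reports file paths only, never the matched text, so its output can be pasted into Telegram or an ops note safely.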

### Phase 3 — Upgrade Strategy

- [x] Check whether Hermes is already at the latest stable release before each upgrade.
  - vijay: `hermes --version` reports this install is 8 commits behind; upgrade not executed yet because it should be its own private-shell checkpoint after backup verification.
  - vijay: late pass fetched upstream and found the shared checkout behind; working tree was clean.
- [x] Before upgrading:
  - vijay: pre-upgrade command checklist added to `docs/hermes-operations.md`.
  - [x] run backup sync manually
    - vijay: root persistent backup cron was active with last run `ok`; root config/service unit was snapshotted under `/root/hermes-fix-backups/20260527-roadmap-noncreds/` before upgrade.
    - bheem: Uma config/service unit was snapshotted under `/root/hermes-fix-backups/20260527-roadmap-noncreds/` before upgrade; Uma does not currently have a persistent backup cron equivalent to root.
  - [x] capture `hermes --version`, `hermes status --all`, and `hermes config check`
    - vijay: captured root version/config checks; root shows config v24.
    - bheem: captured Uma version/config checks; Uma shows config v24 after doctor migration.
  - [x] snapshot config and cron job list
    - vijay: copied root config and systemd unit definition before upgrade; captured root cron list.
    - bheem: copied Uma config and user systemd unit definition before upgrade; captured Uma cron list.
- [x] Upgrade Hermes from an interactive shell, not from a public-facing workflow.
  - vijay: documented; no public workflow exposure added.
  - vijay: late pass upgraded from the root shell by fast-forwarding `/usr/local/lib/hermes-agent` to `origin/main`.
- [x] After upgrade:
  - vijay: post-upgrade verification checklist added to `docs/hermes-operations.md`; the late-pass upgrade to `0b6ace649` was then verified with the checks below.
  - [x] restart gateway
    - vijay: restarted `hermes-gateway.service`.
    - bheem: restarted `uma-hermes-gateway.service`.
  - [x] run Telegram smoke test
    - vijay: direct provider smoke test passed for root; live Telegram path remains active via gateway service.
    - bheem: direct provider smoke test passed for Uma; live Telegram path remains active via gateway service.
  - [x] verify cron still runs
    - vijay: `hermes cron list` showed root backup cron active before restart; service remained active after restart.
    - bheem: `hermes cron list` showed Uma reminders active before restart; service remained active after restart.
  - [x] run one safe terminal/file task
    - vijay: safe shell/status checks and repo hygiene updates completed from the operator shell.
  - [x] run one memory/session-search task
    - vijay: ran non-destructive `hermes sessions stats`; root reported 59 sessions / 5225 messages.
    - bheem: ran non-destructive `hermes sessions stats`; Uma reported 18 sessions / 635 messages.
- [x] Record upgrade date, version, and any manual fixups in `docs/operations.md` or a Hermes-specific ops note.
  - vijay: created `docs/hermes-operations.md` as the Hermes-specific ops note.
  - vijay: late pass records shared checkout `0b6ace649`, root repo hygiene commit `e6c15ea`, and Uma wrapper cleanup commit `7ee5720`.
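The fast-forward step from the late pass can be wrapped so it refuses to run over local edits. This is a minimal sketch that assumes the checkout tracks `origin/main` as described above. It automates only the Git part: the backup, gateway restarts, and smoke tests in this checklist still apply, and any dependency changes an update needs are not covered. Function names are hypothetical.

```python
import subprocess

REPO = "/usr/local/lib/hermes-agent"  # shared checkout named in this roadmap


def git(*args: str, repo: str = REPO) -> str:
    """Run a git command in the checkout and return stripped stdout (raises on failure)."""
    return subprocess.run(
        ["git", "-C", repo, *args], check=True, capture_output=True, text=True
    ).stdout.strip()


def plan_upgrade(porcelain: str, behind: int) -> str:
    """Decide from `git status --porcelain` output and the commits-behind count."""
    if porcelain.strip():
        return "stop: working tree has local changes"
    if behind == 0:
        return "up to date"
    return f"fast-forward {behind} commit(s)"


def upgrade(branch: str = "origin/main") -> str:
    git("fetch", "origin")
    decision = plan_upgrade(
        git("status", "--porcelain"),
        int(git("rev-list", "--count", f"HEAD..{branch}")),
    )
    if decision.startswith("fast-forward"):
        # --ff-only fails instead of creating a merge commit if local commits exist.
        git("merge", "--ff-only", branch)
    return decision
```

Keeping `plan_upgrade` separate from the Git calls means the stop conditions can be reviewed and tested without touching the live checkout.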

### Phase 4 — Provider And Model Resilience

- [x] Keep OpenAI Codex OAuth as the primary provider if it remains stable.
  - vijay: root remains on `openai-codex` with `gpt-5.5`; routing stays disabled after the earlier `gpt-5.4-mini` failure path.
  - bheem: Uma remains on `openai-codex` with `gpt-5.5`; routing stays disabled after the earlier `gpt-5.4-mini` failure path.
- [x] Add at least one fallback provider for resilience:
  - vijay: configured a shared local Ollama fallback chain for both Hermes instances and kept routing disabled on the primary path.
  - bheem: same shared local Ollama fallback chain configured for Uma.
  - local/Ollama fallback is configured and verified with direct model smoke tests.
- [x] Configure provider credentials through Hermes auth/config flows; do not commit keys.
  - vijay: documented the command path; provider additions requiring new credentials remain pending.
- [x] Define model routing tiers:
  - vijay: fast/cheap = `qwen2.5:0.5b` or `llama3.2:1b`, strong coding = `qwen2.5-coder:7b`, general/long-context = `llama3.1:8b`, vision-capable = `llama3.2-vision`.
  - bheem: same local tier map applies to Uma.
  - routing remains disabled until a separate routed path is proven safe.
- [ ] Test fallback behavior by switching models in a new Hermes session.
  - vijay: direct Ollama smoke tests passed for `qwen2.5-coder:7b`, `llama3.1:8b`, and `llama3.2-vision`; live Hermes session-switch verification still needs to be done.
  - bheem: same live Hermes session-switch verification still needs to be done for Uma.
- [x] Document the preferred default model and fallback order.
  - vijay: current default is OpenAI Codex OAuth; fallback provider order is now the shared local Ollama chain.
  - vijay: preferred default is explicitly `gpt-5.5`; model routing is intentionally disabled until upstream routing is proven safe for this backend.
- [ ] Verify the root and Uma Telegram gateways can actually switch to the fallback chain in a live conversation without surfacing provider errors.
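The direct Ollama smoke tests can be repeated per routing tier. This is a minimal sketch that assumes Ollama serves its HTTP API on the default `localhost:11434` and that the models above are already pulled. It exercises the models directly, not Hermes's fallback chain, so it does not satisfy the live session-switch items above. `TIERS` and `smoke_test` are hypothetical names.

```python
import json
import urllib.request

OLLAMA = "http://localhost:11434"  # Ollama's default listen address

# Tier map from this roadmap; models within a tier are tried in order.
TIERS = {
    "fast": ["qwen2.5:0.5b", "llama3.2:1b"],
    "coding": ["qwen2.5-coder:7b"],
    "general": ["llama3.1:8b"],
    "vision": ["llama3.2-vision"],
}


def generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{OLLAMA}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def smoke_test(timeout: float = 120.0) -> dict[str, str]:
    """Return tier -> first model that produced a non-empty reply, or an error note."""
    results = {}
    for tier, models in TIERS.items():
        results[tier] = "no model answered"
        for model in models:
            try:
                request = generate_request(model, "Reply with: ok")
                with urllib.request.urlopen(request, timeout=timeout) as resp:
                    reply = json.load(resp).get("response", "")
            except OSError as exc:  # URLError, HTTPError, and timeouts all subclass OSError
                results[tier] = f"{model}: {exc}"
                continue
            if reply.strip():
                results[tier] = model
                break
    return results
```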

### Phase 5 — Tooling Capability Upgrade

- [ ] Enable/configure at least one reliable web search/extract backend:
  - [ ] Exa
  - [ ] Tavily
  - [ ] Firecrawl
    - vijay: Firecrawl is selected in both Hermes configs; waiting on API key or a self-hosted endpoint.
    - bheem: same pending auth state applies to Uma.
  - [ ] SearXNG self-hosted option
- [ ] Configure browser automation only if needed and keep it private/safe:
  - [ ] local Chromium/Camofox, or
  - [ ] Browserbase/Browser Use
- [ ] Configure GitHub/Gitea automation credentials with least privilege.
  - vijay: root local Gitea read-only Git path is configured with `/root/.local/bin/gitea-git` plus `GIT_ASKPASS`; the token remains in `/root/.gitea_npm_token_home` and was not printed. Verified direct Git and Hermes one-shot read access to `http://localhost:3300/bytelyst/learning_ai_common_plat.git`.
  - vijay: GitHub push credentials are already configured for root Git operations through `/root/.git-credentials`; root performs pushes for both root and Uma tracking repos. Still unchecked until GitHub token repo/scope permissions are audited as least-privilege.
- [ ] Add vision/image capability if screenshots, diagrams, or UI reviews are common.
- [x] Validate the active Telegram toolset includes the capabilities ByteLyst expects:
  - vijay: `hermes doctor --fix` reported browser, clarify, code_execution, cronjob, terminal, delegation, file, memory, messaging, session_search, skills, todo, tts, vision, video, and related toolsets available; web remains blocked by missing search backend API key.
  - [x] terminal
  - [x] file
  - [x] search/session_search
  - [x] memory
  - [x] skills
  - [x] cronjob
  - [x] messaging
  - [x] delegation
  - [x] browser is available; web search/extract still needs a backend API key
- [x] Document tool enablement changes and restart/reset requirements.
  - vijay: added restart/reset notes to `docs/hermes-operations.md`.
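A `GIT_ASKPASS` helper keeps a token out of remote URLs and shell history. Git runs the program with the prompt text (for example `Username for 'http://localhost:3300': `) as its first argument and reads the answer from standard output. This is a minimal sketch of that pattern, assuming HTTP basic auth against the local Gitea. It is not the contents of `/root/.local/bin/gitea-git`, and the username is a hypothetical placeholder.

```python
#!/usr/bin/env python3
"""Hypothetical GIT_ASKPASS helper: answers Git's credential prompts from a token file."""
import sys
from pathlib import Path

TOKEN_FILE = Path("/root/.gitea_npm_token_home")  # token location named in this roadmap
USERNAME = "hermes-bot"  # hypothetical; use the account that owns the token


def answer(prompt: str, token_file: Path = TOKEN_FILE, username: str = USERNAME) -> str:
    """Git passes the prompt as argv[1]; reply with the matching credential."""
    if prompt.lower().startswith("username"):
        return username
    if prompt.lower().startswith("password"):
        return token_file.read_text(encoding="utf-8").strip()
    raise SystemExit(f"unexpected prompt: {prompt!r}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Git reads the credential from stdout; nothing else should be printed.
    print(answer(sys.argv[1]))
```

In use, the helper is made executable and referenced per command, for example `GIT_ASKPASS=/path/to/helper.py git ls-remote http://localhost:3300/bytelyst/learning_ai_common_plat.git`. Git consults configured credential helpers before falling back to askpass.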

### Phase 6 — Telegram Gateway Workflow

- [x] Keep Telegram as the primary control plane.
  - vijay: watchdog delivery is configured to the origin Telegram conversation; root dashboard is private-only over Tailscale.
  - bheem: Uma gateway remains Telegram-driven; Uma dashboard is private-only over Tailscale.
- [x] Preserve the user's preferred progress prefix convention: `1️⃣`, `2️⃣`, etc.
  - vijay: retained in roadmap and memory; use for progress/completion updates from Hermes sessions.
- [x] Ensure home channel and allowed user settings are correct.
  - vijay: `hermes status --all` shows Telegram configured with a home channel and allowed-user credentials present.
- [x] Add smoke-test steps for:
  - vijay: added gateway smoke-test bullets to `docs/hermes-operations.md`.
  - [x] inbound Telegram command
  - [x] outbound completion message
  - [ ] approval prompt flow
  - [ ] media/file delivery
- [x] Decide whether Telegram topic/session handling should be enabled or documented.
  - vijay: documented current stance in `docs/hermes-operations.md`: keep default Telegram session handling unless a concrete topic-routing need appears.
  - bheem: same default-session stance applies to Uma/Bheem.
- [x] Add a runbook for gateway restart/recovery.
  - vijay: added gateway recovery section to `docs/hermes-operations.md`.
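The `1️⃣`, `2️⃣` prefixes are Unicode keycap sequences: the digit, then U+FE0F (emoji presentation), then U+20E3 (combining enclosing keycap). This is a minimal formatter sketch. Unicode has no multi-digit keycap beyond the dedicated keycap ten character, so the sketch falls back to plain text past ten. The function names are hypothetical, not Hermes code.

```python
KEYCAP = "\ufe0f\u20e3"  # emoji variation selector + combining enclosing keycap


def progress_prefix(n: int) -> str:
    """Keycap emoji for step n: digits 0-9 as sequences, 10 as U+1F51F."""
    if 0 <= n <= 9:
        return f"{n}{KEYCAP}"
    if n == 10:
        return "\U0001F51F"  # KEYCAP TEN
    return f"{n}."  # no keycap emoji past ten; fall back to plain text


def progress_message(step: int, text: str) -> str:
    return f"{progress_prefix(step)} {text}"
```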

### Phase 7 — Memory, Skills, And Knowledge Capture

- [x] Review persistent memory for stale entries and trim anything no longer useful.
  - vijay: reviewed root `MEMORY.md` and `USER.md`; entries are operationally relevant, no safe deletion needed.
  - bheem: reviewed Uma `MEMORY.md` and `USER.md`; entries are current Bheem context, no safe deletion needed.
- [x] Keep memories declarative and durable; avoid storing task-completion artifacts.
  - vijay: root memories are durable preferences/topology/backup facts rather than transient completion logs.
  - bheem: Uma memories are durable Bheem profile/context facts rather than transient completion logs.
- [ ] Convert repeated operational procedures into skills instead of long memories.
- [ ] Pin critical ByteLyst/Hermes skills that should not be archived.
- [ ] Schedule or manually run curator reviews if enabled.
- [ ] Add skills for recurring ByteLyst workflows:
  - [x] Gitea Actions troubleshooting
    - vijay: root has `devops/self-hosted-gitea-ci`.
  - [x] Caddy + Docker routing changes
    - vijay: root has `devops/caddy-subdomain-routing`.
  - [x] Hermes backup/restore drill
    - vijay: root has `devops/hermes-persistent-backup-ops`; Uma backup workflow remains separate and not equivalent.
  - [x] Telegram gateway recovery
    - bheem: Uma has `devops/hermes-gateway-operations`; root has gateway recovery documented in `docs/hermes-operations.md`.
  - [ ] safe multi-repo commit/push workflow

### Phase 8 — Cron, Watchdogs, And Autonomous Maintenance

- [x] Keep current Hermes backup cron job enabled.
  - vijay: backup cron remains active.
- [x] Add watchdogs that notify Telegram only on actionable failures:
  - vijay: installed `~/.hermes/scripts/hermes_health_watchdog.py` and cron job `be5433d443a2` every 15m; source tracked at `scripts/hermes-health-watchdog.py`.
  - [x] gateway down
  - [x] cron scheduler stale
  - [x] backup job failed or no fresh commit within threshold
  - [x] disk usage high
  - [x] memory pressure high
    - vijay: added `/proc/meminfo` memory-pressure threshold check to `scripts/hermes-health-watchdog.py`, deployed to `~/.hermes/scripts/hermes_health_watchdog.py`, and verified silent-on-success.
  - [x] Caddy/Gitea critical services down
    - vijay: added critical Docker container checks for `caddy` and `gitea-npm-registry`; deployed watchdog remains silent on a healthy run.
- [x] Prefer `no_agent=True` script-only watchdogs for fixed health checks.
  - vijay: watchdog cron is no-agent/script-only and silent on success.
- [x] Keep noisy health checks silent on success.
  - vijay: manual script test produced empty output on a healthy run.
- [x] Use self-contained prompts for any LLM-driven cron jobs.
  - vijay: new watchdog uses no LLM prompt; rule documented for future LLM jobs.
- [x] Avoid recursive cron creation from cron-run sessions.
  - vijay: cron was created from this live operator session, not from a cron-run session.
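The watchdog pattern above uses fixed checks, no LLM, and produces output only on failure. It fits in a short script. This is a minimal sketch whose thresholds are hypothetical, not the values in `scripts/hermes-health-watchdog.py`. The caller supplies the newest backup commit time, for example from `git log -1 --format=%ct` in the backup repo.

```python
import shutil
import subprocess
import time

# Hypothetical thresholds; the deployed watchdog may use different values.
DISK_LIMIT = 0.85          # alert when more than 85% of / is used
MEM_AVAILABLE_MIN = 0.10   # alert when MemAvailable drops below 10% of MemTotal
BACKUP_MAX_AGE = 2 * 3600  # alert when the newest backup commit is older than 2 h
CONTAINERS = ("caddy", "gitea-npm-registry")  # critical containers named in this roadmap


def mem_available_ratio(meminfo: str) -> float:
    """Parse /proc/meminfo text and return MemAvailable / MemTotal."""
    fields = {}
    for line in meminfo.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            fields[key] = int(rest.split()[0])  # first token is the value (kB for memory lines)
    return fields["MemAvailable"] / fields["MemTotal"]


def container_running(name: str) -> bool:
    """True if `docker inspect` reports the container as running."""
    result = subprocess.run(
        ["docker", "inspect", "-f", "{{.State.Running}}", name],
        capture_output=True, text=True,
    )
    return result.returncode == 0 and result.stdout.strip() == "true"


def collect_alerts(disk_used: float, mem_ratio: float, backup_age: float,
                   down: tuple[str, ...] = ()) -> list[str]:
    """Turn raw measurements into readable alerts; an empty list means healthy."""
    alerts = []
    if disk_used > DISK_LIMIT:
        alerts.append(f"disk {disk_used:.0%} used")
    if mem_ratio < MEM_AVAILABLE_MIN:
        alerts.append(f"only {mem_ratio:.0%} memory available")
    if backup_age > BACKUP_MAX_AGE:
        alerts.append(f"no backup commit for {backup_age / 3600:.1f} h")
    alerts += [f"container {name} not running" for name in down]
    return alerts


def run(last_backup_epoch: float) -> None:
    """Measure, then print only when something is wrong (silent on success)."""
    usage = shutil.disk_usage("/")
    with open("/proc/meminfo", encoding="ascii") as fh:
        mem_ratio = mem_available_ratio(fh.read())
    down = tuple(name for name in CONTAINERS if not container_running(name))
    alerts = collect_alerts(usage.used / usage.total, mem_ratio,
                            time.time() - last_backup_epoch, down)
    if alerts:
        print("Hermes watchdog: " + "; ".join(alerts))
```

Keeping the threshold logic in `collect_alerts` separate from the measurements lets the alert wording be tested without a live host.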

### Phase 9 — Private Dashboard / Mission Control Direction

- [x] Do not expose Hermes dashboard publicly.
  - vijay: no public dashboard/API route added; private-only policy documented.
- [x] If a dashboard is useful, make it private-only and operationally scoped.
  - vijay: root dashboard is running as `hermes-root-dashboard.service` at `http://100.87.53.10:9119/`, bound only to the Tailscale IP.
  - bheem: Uma dashboard is running as `uma-hermes-dashboard.service` at `http://100.87.53.10:9120/`, bound only to the Tailscale IP.
- [ ] Dashboard should show:
  - [ ] gateway status
  - [ ] active sessions
  - [ ] cron job state
  - [ ] backup freshness
  - [ ] recent sanitized alerts
  - [ ] quick links to docs/runbooks
  - vijay: root dashboard HTTP endpoint returns `200` over Tailscale; feature-by-feature UI validation remains pending.
  - bheem: Uma dashboard HTTP endpoint returns `200` over Tailscale; feature-by-feature UI validation remains pending.
- [x] Any dashboard actions must require authentication and ideally remain reachable only over private network/tunnel.
  - vijay: root dashboard is private-network-only via Tailscale IP binding; no public listener or Caddy route was added.
  - bheem: Uma dashboard is private-network-only via Tailscale IP binding; no public listener or Caddy route was added.
- [x] Add a Caddy review step before adding any new hostname.
  - vijay: added Caddy/port review commands to `docs/hermes-operations.md`.
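The private-only binding can be rechecked from the live listener table rather than trusted from unit files. This is a minimal sketch that assumes `ss -ltnH` output and treats loopback and Tailscale's `100.64.0.0/10` range as the only acceptable bind addresses for the dashboard ports. `exposed_listeners` is hypothetical. A clean result does not by itself prove that the firewall or Caddy config is correct.

```python
import ipaddress

TAILNET = ipaddress.ip_network("100.64.0.0/10")  # Tailscale's IPv4 address range
DASHBOARD_PORTS = {9119, 9120}  # root and Uma dashboards from this roadmap


def exposed_listeners(ss_output: str, ports: set[int] = DASHBOARD_PORTS) -> list[str]:
    """Flag dashboard ports bound to anything other than loopback or a tailnet address.

    Expects `ss -ltnH` lines: State Recv-Q Send-Q Local:Port Peer:Port.
    """
    flagged = []
    for line in ss_output.splitlines():
        cols = line.split()
        if len(cols) < 4:
            continue
        host, _, port = cols[3].rpartition(":")
        if not port.isdigit() or int(port) not in ports:
            continue
        host = host.strip("[]").split("%")[0]  # "[::1]" -> "::1", "127.0.0.1%lo" -> "127.0.0.1"
        try:
            addr = ipaddress.ip_address(host)
        except ValueError:  # "*" means every address
            flagged.append(cols[3])
            continue
        if not (addr.is_loopback or addr in TAILNET):
            flagged.append(cols[3])
    return flagged
```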

### Phase 10 — Multi-Agent And Project Execution Workflow

- [ ] Use `delegate_task` for bounded subtasks inside a parent session.
- [ ] Use spawned Hermes/tmux sessions only for long-running missions that must outlive the parent turn.
- [ ] Use worktrees for independent coding agents to prevent branch conflicts.
- [ ] For durable multi-agent coordination, evaluate Hermes Kanban.
- [x] Document when to use:
  - [x] direct tool call
  - [x] delegate_task
  - [x] background terminal process
  - [x] cron job
  - [x] Kanban worker
  - vijay: added multi-agent execution convention guidance to `docs/hermes-operations.md`.
- [x] Add a ByteLyst convention for progress/completion Telegram notifications from concurrent sessions.
  - vijay: documented the numbered/emoji-prefix convention in `docs/hermes-operations.md`.
  - bheem: Uma/Bheem follows the same convention.

### Phase 11 — Security And Secret Hygiene

- [x] Reconfirm raw `.env`, OAuth credentials, tokens, logs, and SQLite WAL/SHM files are excluded from git backups.
  - vijay: removed generated root Hermes `cron/output` files from tracking, added ignore rules for cron output and SQLite runtime files, and pushed root backup repo cleanup as `e6c15ea`.
  - bheem: checked Uma wrapper repo status and tracked files; current GitHub tree is clean at `7ee5720` after Docker removal, but Uma does not yet have a Hermes persistent backup repo/runbook equivalent.
- [ ] Consider enabling `security.redact_secrets` if the operational tradeoff is acceptable.
- [ ] Keep `privacy.redact_pii` decision documented for gateway sessions.
- [ ] Rotate old credentials after migration or accidental exposure risk.
- [ ] Use least-privilege tokens for GitHub/Gitea, web APIs, and provider keys.
  - vijay: Gitea Git operations now use the narrow local token through `GIT_ASKPASS`; API profile reads are intentionally blocked by token scope. GitHub, web APIs, and provider-key rotation remain pending.
- [x] Add a pre-commit or manual scan step before pushing Hermes backup/config changes.
  - vijay: a manual scan/review step is now part of root/Uma repo pushes; the root backup repo also ignores generated cron outputs that previously produced noisy token-pattern scan hits.
- [x] Keep approval mode at `manual` or `smart` for Telegram-driven work.
  - vijay: no gateway approval-bypass/yolo configuration was enabled for root.
  - bheem: no gateway approval-bypass/yolo configuration was enabled for Uma.
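For the GitHub half of the least-privilege audit, classic personal access tokens report their scopes in the `X-OAuth-Scopes` response header. This is a minimal sketch that assumes the operator loads the token from the root credential store and never prints it. The `BROAD_SCOPES` set is an illustrative judgment, not a GitHub definition. Even the classic `repo` scope covers every repository the account can reach, so limiting pushes to the root and Uma repos would need a fine-grained token.

```python
from __future__ import annotations

import urllib.request

# Classic-token scopes beyond what pushing to a few repos needs (illustrative, not exhaustive).
BROAD_SCOPES = {
    "admin:org", "admin:repo_hook", "admin:public_key", "admin:gpg_key",
    "admin:enterprise", "delete_repo", "write:packages", "user", "gist",
}


def parse_scopes(header: str | None) -> set[str]:
    """Parse GitHub's comma-separated X-OAuth-Scopes header."""
    return {scope.strip() for scope in (header or "").split(",") if scope.strip()}


def broad_scopes(header: str | None) -> set[str]:
    """Scopes in the header that exceed the illustrative push-only baseline."""
    return parse_scopes(header) & BROAD_SCOPES


def fetch_scope_header(token: str) -> str | None:
    """Return X-OAuth-Scopes for a token; a missing header usually means a fine-grained token."""
    req = urllib.request.Request(
        "https://api.github.com/user",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req, timeout=15) as resp:
        return resp.headers.get("X-OAuth-Scopes")
```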

### Phase 12 — Documentation And Runbooks

- [x] Add a Hermes operations index under `docs/`.
  - vijay: created `docs/hermes-operations.md`.
- [x] Link this roadmap from `docs/repo-map.md`.
  - vijay: roadmap was already listed; added `docs/hermes-operations.md` to repo map.
- [x] Create or update runbooks for:
  - [x] installing/upgrading Hermes
    - vijay: `docs/hermes-operations.md` contains upgrade commands and late-upgrade verification notes.
  - [x] restarting the gateway
  - [x] restoring persistent data from backup
  - [x] configuring providers/models
  - [x] enabling/disabling tools
  - [x] adding safe cron watchdogs
  - [x] private-only dashboard access
- [x] Keep commands copy-pasteable and include expected outputs.
  - vijay: copied operational commands into `docs/hermes-operations.md`; expected-output notes included where useful.
  - vijay: late pass expanded `docs/hermes-operations.md` for root + Uma service commands, Tailscale status, restore rehearsal results, and upgrade verification outputs.
- [x] Store secrets only as placeholder variable names or `.env.example` entries.
  - vijay: no raw secrets were added to docs or scripts.

## Priority Execution Plan

### Immediate — Today / Next Session

- [x] Confirm no public Hermes dashboard route exists.
- [x] Investigate `hermes doctor` timeout.
- [x] Verify backup cron freshness and remote push status.
- [x] Add one Telegram watchdog for gateway/backup failure.
- [ ] Choose and configure one web search backend.

### Near-Term — This Week

- [ ] Add fallback model/provider.
- [ ] Document provider routing and model defaults.
- [x] Add gateway recovery runbook.
- [ ] Add restore drill runbook and perform one test-profile restore.
  - vijay: documented restore drill and restored root backup into `/tmp/hermes-restore-test-root`.
  - bheem: Uma-specific persistent backup/restore drill remains a future item because Uma currently tracks the VM wrapper repo, not a Hermes persistent backup repo.
- [ ] Add Gitea/GitHub least-privilege automation credential path.
  - vijay: Gitea path is complete for root via `/root/.local/bin/gitea-git`; GitHub push path exists in root's credential store and is used for root-managed pushes, including Uma repo updates. Least-privilege scope verification remains pending, so this combined item stays unchecked.

### Medium-Term — This Month

- [x] Evaluate private-only dashboard/mission-control UX.
  - vijay: root dashboard is reachable via Tailscale at `http://100.87.53.10:9119/`.
  - bheem: Uma dashboard is reachable via Tailscale at `http://100.87.53.10:9120/`.
- [ ] Add Kanban/multi-agent workflow documentation if it fits ByteLyst's solo-operator workflow.
- [x] Add silent-on-success system watchdogs.
  - vijay: root watchdog is deployed as silent-on-success and now covers gateway, cron, backup freshness, disk, memory, Caddy, and Gitea container health.
- [ ] Clean up stale memory/skills and pin critical skills.
- [ ] Schedule quarterly restore drills.
  - vijay: quarterly restore drill reminder cron is configured for root.
  - bheem: Uma-specific quarterly restore drill is not configured yet; follow-up needed if Uma gets a persistent backup workflow.

## Acceptance Criteria

This roadmap is complete when:

- [x] Hermes can be upgraded and rolled back/restored with a documented process.
  - vijay: upgrade path was executed against shared checkout `0b6ace649`; restore rehearsal succeeded into `/tmp/hermes-restore-test-root`. Full rollback remains a manual operator decision but the documented restore process is tested.
- [x] Gateway failures and backup failures notify Telegram.
- [ ] At least one fallback model/provider is configured and tested.
- [ ] Web/search tooling works for current research tasks.
- [x] No Hermes dashboard/API is publicly exposed.
- [ ] Backup restore has been tested into a non-production profile.
  - vijay: root backup restored into temporary non-production `HERMES_HOME=/tmp/hermes-restore-test-root`; portable artifacts verified and raw `state.db` absent.
  - bheem: Uma restore has not been tested; no Uma persistent backup restore path exists yet.
- [x] Core ByteLyst Hermes procedures exist as docs or skills.
- [x] Sensitive files remain untracked and backup-safe.

## Execution Log

### 2026-05-27 — vijay setup execution pass

- vijay: synced `bytelyst-devops-tools` from GitHub and added the Gitea remote locally for branch push tracking.
- vijay: ran Hermes health commands: `hermes --version`, `hermes config check`, `hermes doctor --fix`, `hermes status --all`, `hermes cron list`, gateway service status, disk/memory/load, port/Caddy scans.
- vijay: `hermes doctor --fix` completed and migrated config v23 → v24.
- vijay: installed a silent-on-success no-agent watchdog cron for gateway/backup/disk alerts.
- vijay: created `docs/hermes-operations.md`, updated `docs/operations.md`, and added this roadmap progress commentary.
- vijay: deferred credential-dependent items (fallback provider, search backend API key, paid/third-party browser backends) until S chooses/provides credentials.
- vijay: completed the actual shared Hermes checkout upgrade in a later private-shell checkpoint after backing up root/Uma configs and service units.
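The read-only subset of that health pass can be wrapped in a small runner so failures are summarised in one place. This sketch assumes each `hermes` subcommand signals failure with a non-zero exit code. `hermes doctor --fix` is deliberately left out because it can rewrite config (as it did for the v23 → v24 migration), and the 120-second timeout is an arbitrary choice.

```python
"""Sketch: run the read-only Hermes health commands and collect failures.

Assumption: each command reports failure through a non-zero exit code.
`hermes doctor --fix` is excluded because it may modify config.
"""
import subprocess
import sys

HEALTH_COMMANDS = [
    ["hermes", "--version"],
    ["hermes", "config", "check"],
    ["hermes", "status", "--all"],
    ["hermes", "cron", "list"],
]


def run_health(commands, timeout=120):
    """Return (command, reason) pairs for every command that failed."""
    failed = []
    for cmd in commands:
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
            if result.returncode != 0:
                stderr = result.stderr.strip()[:200]  # keep alerts short
                failed.append((cmd, f"exit {result.returncode}: {stderr}"))
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Missing binary or hung command both count as failures.
            failed.append((cmd, str(exc)))
    return failed


def report(failures) -> int:
    """Print failures (if any) and return a process exit code."""
    for cmd, reason in failures:
        print(f"FAIL {' '.join(cmd)}: {reason}", file=sys.stderr)
    return 1 if failures else 0
```

A script built on this would end with `raise SystemExit(report(run_health(HEALTH_COMMANDS)))`.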

### 2026-05-27 — vijay late non-credential completion pass

- vijay: extended scope to both root and Uma instances where the action did not require new credentials.
- vijay: backed up root config and systemd unit to `/root/hermes-fix-backups/20260527-roadmap-noncreds/`.
- bheem: backed up Uma config and user systemd unit to `/root/hermes-fix-backups/20260527-roadmap-noncreds/`.
- bheem: migrated Uma Hermes config v23 → v24 with `hermes doctor --fix`.
- vijay: root was already config v24.
- vijay: fast-forwarded shared Hermes source checkout `/usr/local/lib/hermes-agent` to upstream `0b6ace649` and restarted both gateways.
- vijay: verified root provider smoke test: `root-roadmap-ok`.
- bheem: verified Uma provider smoke test: `uma-roadmap-ok`.
- vijay: confirmed root service is enabled and active.
- bheem: confirmed Uma service is enabled and active; Docker-based Uma Hermes remains removed.
- vijay: installed Tailscale `1.98.3`; `tailscaled` is enabled/running and authenticated to tailnet IP `100.87.53.10`.
- vijay: installed permanent root dashboard service `hermes-root-dashboard.service` at `http://100.87.53.10:9119/`.
- bheem: installed permanent Uma dashboard service `uma-hermes-dashboard.service` at `http://100.87.53.10:9120/`.
- vijay: added dashboard service unit templates under `systemd/` for repo tracking.
- vijay: extended the root watchdog with memory-pressure and Caddy/Gitea container-health checks, deployed it, and verified that it stays silent on success.
- vijay: reviewed root persistent memories and recurring workflow skills.
- bheem: reviewed Uma persistent memories and recurring workflow skills.
- vijay: cleaned the root backup repo's current tree by untracking generated `hermes_persistent_backup/cron/output` files, then pushed commit `e6c15ea`.
- bheem: confirmed Uma wrapper repo is clean at `7ee5720` after Docker deployment removal.
- vijay: ran root restore rehearsal into `/tmp/hermes-restore-test-root`, verified portable restore content, and scanned restored config/template for common token patterns.
- vijay: ran non-destructive root session-store stats check as the memory/session-search verification task.
- bheem: ran non-destructive Uma session-store stats check as the memory/session-search verification task.
- vijay: updated `docs/hermes-operations.md` with root service commands, Tailscale status, restore rehearsal outcome, and late upgrade notes.
- bheem: updated `docs/hermes-operations.md` with Uma service commands and shared private-dashboard notes.
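Because both dashboards bind to the tailnet address, one way to re-check the "no Hermes dashboard/API is publicly exposed" criterion is to classify listening addresses. This is a sketch that assumes iproute2's `ss -ltn` column layout and treats only loopback and Tailscale's IPv4 range `100.64.0.0/10` as private. Everything else, including wildcard binds such as `0.0.0.0` and `[::]`, is flagged for review. Ports 9119 and 9120 come from the dashboard services above.

```python
"""Sketch: flag dashboard ports that listen outside loopback or the tailnet.

Assumptions: `ss -ltn` prints a header line followed by rows whose fourth
column is "LocalAddress:Port"; only IPv4 tailnet addresses are recognised.
"""
import ipaddress
import subprocess

TAILNET_V4 = ipaddress.ip_network("100.64.0.0/10")  # Tailscale's IPv4 address range


def split_host_port(local: str) -> tuple[str, int]:
    """Split '100.87.53.10:9119' or '[::]:9120' into (host, port)."""
    host, _, port = local.rpartition(":")
    host = host.strip("[]").split("%", 1)[0]  # drop IPv6 brackets and zone suffix
    return host, int(port)


def classify(host: str) -> str:
    """Return 'loopback', 'tailnet', or 'flag' (needs review)."""
    if host in ("*", ""):
        return "flag"  # wildcard bind: listens on every interface
    addr = ipaddress.ip_address(host)
    if addr.is_unspecified:
        return "flag"  # 0.0.0.0 or ::
    if addr.is_loopback:
        return "loopback"
    if addr.version == 4 and addr in TAILNET_V4:
        return "tailnet"
    return "flag"


def listening_ports(ports: set[int]) -> dict[int, list[tuple[str, str]]]:
    """Map each watched port to the (host, classification) pairs it listens on."""
    out = subprocess.run(["ss", "-ltn"], capture_output=True, text=True, check=True).stdout
    results: dict[int, list[tuple[str, str]]] = {}
    for line in out.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        host, port = split_host_port(fields[3])
        if port in ports:
            results.setdefault(port, []).append((host, classify(host)))
    return results
```

`listening_ports({9119, 9120})` returns the bound hosts per port; any `flag` entry deserves a closer look before treating a dashboard as private.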

### 2026-05-27 — vijay Gitea least-privilege Git path

- vijay: confirmed local Gitea API version `1.22.6` and root-only token-file permissions without printing token values.
- vijay: verified `/root/.gitea_npm_token_home` does not have broad profile-read scope; `/api/v1/user` returned the expected scope denial instead of user data.
- vijay: installed `/root/.local/bin/gitea-git-askpass` and `/root/.local/bin/gitea-git` so Hermes/Git can authenticate to local Gitea without embedding tokens in remotes or Git config.
- vijay: verified direct Git read operation: `gitea-git ls-remote http://localhost:3300/bytelyst/learning_ai_common_plat.git HEAD` returned HEAD `59c4638f85be...`.
- vijay: verified the same read-only operation through Hermes one-shot; Hermes reported success and only the truncated HEAD hash.
- vijay: documented the exact safe token flow in `docs/hermes-operations.md`; corrected GitHub status to show credentials already exist for root-managed pushes, with least-privilege scope audit still pending.
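The askpass approach relies on Git's `GIT_ASKPASS` mechanism: Git runs the named program with the prompt text as its only argument and reads the answer from stdout, so the token never lands in a remote URL or Git config. The installed helper is not reproduced here; this is a hypothetical Python equivalent. The `GITEA_TOKEN_FILE` and `GITEA_GIT_USER` environment variables, the `hermes` default username, and the default token path `/root/.gitea_npm_token_home` are assumptions, since this roadmap does not say which token file or username the real helper uses.

```python
"""Hypothetical GIT_ASKPASS helper sketch for local Gitea (not the installed script).

Git invokes the program named in GIT_ASKPASS with the prompt as argv[1] and
reads the answer from stdout. GITEA_TOKEN_FILE / GITEA_GIT_USER are assumed names.
"""
import os
import sys
from pathlib import Path


def answer(prompt: str, token_file: Path, username: str) -> str:
    """Answer a Git username/password prompt; refuse anything else."""
    lowered = prompt.lower()
    if lowered.startswith("username"):
        return username
    if lowered.startswith("password"):
        token = token_file.read_text(encoding="utf-8").strip()
        if not token:
            raise ValueError(f"{token_file} is empty")
        return token
    raise ValueError(f"unexpected prompt: {prompt!r}")


def main() -> int:
    prompt = sys.argv[1] if len(sys.argv) > 1 else ""
    # Assumed default path; the real helper's token source is not documented here.
    token_file = Path(os.environ.get("GITEA_TOKEN_FILE", "/root/.gitea_npm_token_home"))
    username = os.environ.get("GITEA_GIT_USER", "hermes")  # hypothetical default
    try:
        print(answer(prompt, token_file, username))
    except (OSError, ValueError) as exc:
        # Errors go to stderr and never include the token value.
        print(f"gitea askpass: {exc}", file=sys.stderr)
        return 1
    return 0
```

A wrapper in the spirit of `gitea-git` only needs to export `GIT_ASKPASS` (pointing at a script that ends with `raise SystemExit(main())`) and `GIT_TERMINAL_PROMPT=0`, then exec `git` with its arguments.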

## Notes For Future Transcript Pass

When the transcript is available, specifically check whether the video recommends any of the following and update this roadmap accordingly:

- exact provider/model choices
- recommended Hermes install path
- gateway platform setup details
- dashboard or web UI exposure guidance
- memory/skill workflows
- MCP server recommendations
- cron/background agent patterns
- voice/STT/TTS setup
- any security warnings or anti-patterns