bytelyst-devops-tools/docs/hermes-setup-upgrade-roadmap.md

Hermes Setup Upgrade Roadmap

Date: 2026-05-26
Execution update: 2026-05-27
Owner: ByteLyst / S
Repo: bytelyst-devops-tools
Video reference: "Hermes Agent is the greatest AI tool ever made. Here's how to set it up" by Alex Finn

Completion Status

  • Overall checklist completion: ~68% (122/179 checked after the 2026-05-27 Gitea/Hermes Git smoke test).
  • Credential-independent setup: materially further along; remaining blockers are mostly provider/search credentials, GitHub token scope audit, Uma backup design, and policy decisions.
  • vijay: percentage is based on literal Markdown checklist boxes, including nested sub-items. It intentionally counts credential-dependent future work as incomplete.
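
The completion figure can be recomputed from the Markdown source. A minimal sketch, assuming the checklist uses the usual `- [ ]` / `- [x]` task-list syntax; the function name and sample text are illustrative and not part of this repo:

```python
import re

# Matches Markdown task-list items at any nesting depth: "- [ ]", "* [x]", "+ [X]".
CHECKBOX = re.compile(r"^[ \t]*[-*+][ \t]+\[([ xX])\]", re.MULTILINE)


def checklist_completion(markdown: str) -> tuple[int, int, float]:
    """Return (checked, total, percent) for all task-list boxes in the text."""
    marks = CHECKBOX.findall(markdown)
    checked = sum(1 for mark in marks if mark in "xX")
    total = len(marks)
    percent = 100.0 * checked / total if total else 0.0
    return checked, total, percent


# Illustrative input: nested items count the same as top-level items.
sample = """\
- [x] Confirm no public route
  - [x] nested verified item
- [ ] Audit token scope
"""
print(checklist_completion(sample))
```

Applied to this roadmap, 122 checked boxes out of 179 round to the 68% quoted above.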

Remaining Unchecked Item Classification

  • Needs credentials/API keys: fallback provider setup, web search/extract backend, Browserbase/Browser Use, and provider fallback tests.
  • Needs credential audit: GitHub push credentials already exist for root Git operations, including root-managed pushes to Uma's GitHub repo; least-privilege scope still needs to be verified from GitHub.
  • Needs explicit policy decision: Cloudflare Access/basic-auth public fallback, model-routing tiers, local browser automation, vision/image provider choice, security.redact_secrets, privacy.redact_pii, and credential rotation.
  • Needs Uma backup design: Uma/Bheem currently has a clean VM wrapper repo, but not a root-style sanitized Hermes persistent backup/restore workflow.
  • Needs manual UX validation: dashboard feature-by-feature checks, Telegram approval prompt flow, and Telegram media/file delivery.
  • Needs future workflow adoption: practicing delegate_task, spawned/tmux sessions, worktrees, and Kanban on real tasks before checking them as completed.

Next To-Dos

The remaining work is now mostly hardening rather than feature delivery:

  • finish the GitHub/Gitea least-privilege audit for the root-managed push path
  • decide whether security.redact_secrets should be enabled by default
  • document the gateway-session privacy.redact_pii policy
  • rotate any credentials that were migrated or exposed during the setup work
  • tighten least-privilege token scopes for GitHub/Gitea, web APIs, and provider keys

Purpose

Turn the Hermes setup ideas from the referenced video into a practical ByteLyst upgrade checklist for this VM-backed, Telegram-driven Hermes installation.

This roadmap is intentionally operational: every item should either improve reliability, safety, agent capability, observability, or restore/migration readiness.

Transcript Review Status

Automated transcript retrieval was attempted through multiple paths:

  • Hermes youtube-content transcript helper using youtube-transcript-api
  • yt-dlp subtitle extraction
  • direct YouTube page/player metadata inspection
  • Invidious caption endpoints
  • third-party transcript endpoint probing

The video title and metadata were reachable, but transcript/subtitle retrieval was blocked by YouTube anti-bot checks from this VM/cloud IP. One Invidious endpoint confirmed an English auto-generated caption track exists, but returned an empty caption body.

Because the full transcript was not retrievable from the VM, this roadmap combines:

  1. the accessible video metadata and setup theme,
  2. Hermes Agent's current documented capabilities,
  3. the live health/status of this ByteLyst Hermes installation, and
  4. ByteLyst's existing operational preferences and safety constraints.

If a manual transcript is later pasted or uploaded, re-run this review and append a Transcript-Derived Delta section with any new actions.

Current ByteLyst Hermes Baseline

Observed on 2026-05-26:

  • Hermes version: v0.14.0 (2026.5.16) package metadata; shared checkout fast-forwarded to upstream 0b6ace649 on 2026-05-27
  • Project path: /usr/local/lib/hermes-agent
  • Active model/provider: gpt-5.5 via OpenAI Codex OAuth
  • Telegram gateway: configured and running under systemd
  • Scheduled jobs: 2 active, 2 total
    • Sync Hermes persistent-data backup to GitHub
      • schedule: every 30 minutes
      • delivery: local
      • script: sync_hermes_persistent_backup.py
      • last status: ok
  • Config version: 24 after hermes doctor --fix migration on 2026-05-27; root and Uma both verified at config v24
  • Telegram credentials are present
  • Most optional provider/API keys are not configured, including OpenRouter, Google/Gemini, Anthropic, Firecrawl/Tavily/Exa, Browserbase/Browser Use, FAL, and ElevenLabs
  • GitHub push credentials are configured for root Git operations through the root credential store; root also performs Uma repo pushes because root has access to https://github.com/umadev0931/uma_hostinger_hermes_vm
  • hermes doctor --fix completed on 2026-05-27; it migrated config v23 → v24 and left only manual provider/API-key setup as the main optional follow-up
  • User preference: do not expose the Hermes dashboard publicly

Target State

A healthy ByteLyst Hermes setup should be:

  • Private by default: no public dashboard exposure; private access through local shell, Telegram DM, SSH tunnel, Tailscale, or equivalent.
  • Recoverable: configuration, skills, memory, sessions, cron jobs, and scripts are backed up and periodically restore-tested.
  • Observable: gateway, cron, disk, memory, and backup failures surface to Telegram quickly.
  • Capable: web search/extraction, browser automation, GitHub/Gitea operations, vision, file, terminal, cron, memory, session search, and delegation are all configured where useful.
  • Safe: secrets are not committed, destructive commands remain approval-gated, public Caddy exposure is explicitly reviewed, and profiles isolate risky experiments.
  • Self-improving: recurring procedures are captured as skills; stale or wrong skills are patched immediately.

Roadmap Checklist

Comments prefixed with vijay: are root/ByteLyst Hermes implementation notes; comments prefixed with bheem: are Uma Hermes implementation notes. An item counts as checked only when it has been verified on the VM or documented in this repo.

Phase 0 — Safety Freeze And Guardrails

  • Confirm no Caddy route exposes a Hermes dashboard or Hermes API server publicly.
    • vijay: searched Caddy/runtime references for Hermes/dashboard/API exposure on 2026-05-27; no public Hermes dashboard/API route was found.
  • Add a negative-control check to operational docs: Hermes dashboard/API must not be public without explicit approval.
    • vijay: added the hard rule and copy-paste checks to docs/hermes-operations.md and linked it from docs/operations.md.
  • Verify firewall/Caddy routes for any hostnames pointing to Hermes ports.
    • vijay: reviewed current listeners and Caddy references; no Hermes-specific public hostname was identified. Re-run before adding any new route.
  • Decide private access pattern for any future dashboard:
    • vijay: selected private-only access with local binding plus Tailscale/SSH tunnel; Tailscale is installed, authenticated, and connected as 100.87.53.10.
    • local-only binding
    • SSH tunnel
    • Tailscale/WireGuard
    • Cloudflare Access or equivalent identity gate
      • vijay: not selected for the current private dashboard path.
    • basic auth plus IP allowlist only if a public route is unavoidable
      • vijay: not selected because public routing remains disallowed.
  • Keep command approvals at manual or smart; do not globally use approval bypass for the gateway.
    • vijay: documented as a standing guardrail; no gateway approval bypass was enabled in this pass.
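
The Caddy review in this phase can be partly automated. A minimal sketch, assuming routes are written as single-line `reverse_proxy host:port` directives and that the ports to guard are the dashboard ports recorded later in this roadmap (9119 and 9120); the hostnames in the sample are hypothetical, and this is a line-based scan rather than a full Caddyfile parser:

```python
import re

# Dashboard ports recorded in this roadmap; extend if other Hermes listeners are added.
HERMES_PORTS = {9119, 9120}

PORT = re.compile(r":(\d{2,5})\b")


def find_hermes_routes(caddyfile_text: str, ports=HERMES_PORTS) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs whose reverse_proxy upstream uses a Hermes port."""
    hits = []
    for number, line in enumerate(caddyfile_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # drop trailing comments
        if "reverse_proxy" not in code:
            continue
        upstreams = code.split("reverse_proxy", 1)[1]
        if {int(port) for port in PORT.findall(upstreams)} & set(ports):
            hits.append((number, line.strip()))
    return hits


# Hypothetical Caddyfile: the second site would violate the private-only rule.
sample = """\
git.example.com {
    reverse_proxy localhost:3300
}
hermes.example.com {
    reverse_proxy 127.0.0.1:9119
}
"""
for number, line in find_hermes_routes(sample):
    print(f"line {number}: {line}")
```

An empty result is the expected state; any hit should block the change until it is explicitly approved.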

Phase 1 — Health Baseline And Diagnostics

  • Run and capture hermes --version.
    • vijay: captured Hermes Agent v0.14.0 (2026.5.16), project /usr/local/lib/hermes-agent, update available.
    • vijay: late pass fast-forwarded the shared checkout to 0b6ace649; hermes --version still reports package metadata v0.14.0.
    • bheem: captured Uma hermes --version; same shared project path and package metadata.
  • Run and capture hermes config check.
    • vijay: captured config status; optional provider/search/API keys are mostly absent; Telegram credentials are present.
    • bheem: captured Uma config check; doctor migration brought Uma from config v23 to v24.
  • Investigate why hermes doctor timed out.
    • vijay: reran timeout 240 hermes doctor --fix; it completed successfully.
    • Re-run with a longer timeout from a foreground shell.
    • If still hanging, isolate the step by checking logs and dependencies.
      • vijay: not needed after longer foreground run succeeded.
    • File or fix a Hermes bug if the timeout is reproducible.
      • vijay: not reproducible in this pass; no bug filed.
  • Run hermes status --all and save a sanitized baseline summary.
    • vijay: baseline summary added to docs/hermes-operations.md.
    • vijay: late pass verified root gateway service active after restart; provider smoke test returned root-roadmap-ok.
    • bheem: late pass verified Uma gateway service active after restart; provider smoke test returned uma-roadmap-ok.
  • Check gateway service health:
    • vijay: hermes-gateway.service is active/running under systemd.
    • bheem: uma-hermes-gateway.service is active/running under Uma's user systemd manager.
    • systemctl status hermes-gateway or the actual installed service unit
    • recent gateway logs under ~/.hermes/logs/
    • Telegram send/receive smoke test
      • vijay: current conversation verifies Telegram inbound/outbound path.
  • Check cron scheduler health and last-run status.
    • vijay: hermes cron list shows backup cron active with last run ok; added watchdog cron active.
    • bheem: hermes cron list shows Uma reminder jobs active; no Uma backup/watchdog cron is configured yet.
  • Check disk, memory, CPU, open ports, and long-running Hermes processes.
    • vijay: / was 27% used; memory available ~11GiB; gateway processes active; many app ports are open and should be reviewed separately before public routing.
  • Create a recurring monthly Hermes setup review checklist from this baseline.
    • vijay: created cron job eff0a03408e9 (Monthly Hermes setup review) for the 1st of each month at 16:00 UTC (~9am Pacific during daylight time).
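
The baseline capture above can be scripted so that a hanging command (as `hermes doctor` did initially) is reported instead of blocking the run. A minimal sketch using the commands named in this phase; the helper name, result shape, and 240-second default (borrowed from the `timeout 240` rerun) are illustrative choices:

```python
import subprocess


def capture(cmd: list[str], timeout_s: float = 240.0) -> dict:
    """Run one command and return its outcome without raising on failure."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except FileNotFoundError:
        return {"cmd": cmd, "status": "missing", "output": ""}
    except subprocess.TimeoutExpired:
        return {"cmd": cmd, "status": "timeout", "output": ""}
    status = "ok" if done.returncode == 0 else f"exit {done.returncode}"
    return {"cmd": cmd, "status": status, "output": done.stdout + done.stderr}


# Commands named in this phase; sanitize the output before committing it anywhere.
BASELINE_COMMANDS = [
    ["hermes", "--version"],
    ["hermes", "config", "check"],
    ["hermes", "status", "--all"],
    ["hermes", "cron", "list"],
]

if __name__ == "__main__":
    for command in BASELINE_COMMANDS:
        result = capture(command)
        print(f"$ {' '.join(command)}  [{result['status']}]")
        print(result["output"])
```

A `timeout` status points at the step to isolate next; a `missing` status means the binary is not on the PATH of the calling shell.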

Phase 2 — Backup, Restore, And Migration Readiness

  • Keep the existing persistent-data backup cron active.
    • vijay: job 470832621b43 remains active every 30m.
  • Verify the backup repository receives fresh commits after real state changes.
    • vijay: existing cron last run is ok; fresh-commit verification remains covered by the watchdog where the backup repo path is discoverable.
  • Confirm the backup intentionally excludes raw secrets and state.db.
    • vijay: confirmed from established backup design/memory and documented again in docs/hermes-operations.md.
  • Add a restore rehearsal checklist:
    • vijay: added restore drill outline to docs/hermes-operations.md.
    • clone backup repo into a temporary directory
      • vijay: used local clean clone /root/repos/bytelyst_hostinger_hermes_vm and restored into /tmp/hermes-restore-test-root.
    • run restore script in dry-run mode if available
      • vijay: no dry-run mode exists; ran restore script against temporary HERMES_HOME=/tmp/hermes-restore-test-root.
    • verify config, skills, sessions, cron, memory, and scripts restore into a test profile
      • vijay: verified restored config.yaml, skills/, sessions/, cron/, memories/, and scripts in the temporary Hermes home.
    • confirm no raw .env, OAuth token, or credential file appears in git
      • vijay: verified state.db absent from restore test and scanned restored .env template/config for common token patterns; no hits.
  • Add a quarterly restore drill reminder cron job or calendar task.
    • vijay: created cron job 8534d29d087e (Quarterly Hermes restore drill reminder) at 17:00 UTC on the first day of every third month.
    • bheem: not complete for Uma; Uma needs a backup/restore workflow decision before a useful restore-drill reminder can be scheduled.
  • Document exact restore commands in a ByteLyst ops doc.
    • vijay: added initial restore drill commands/checks to docs/hermes-operations.md; a full live restore test is still future work.
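
The restore rehearsal checks can be expressed as a small verifier run against the temporary Hermes home. A minimal sketch, assuming the portable artifacts are named as in the notes above; treating `.env` and the SQLite `-wal`/`-shm` side files as forbidden names is an assumption, and a differently named `.env` template would pass:

```python
import tempfile
from pathlib import Path

# Portable artifacts the drill notes expect to see after a restore.
EXPECTED = ["config.yaml", "skills", "sessions", "cron", "memories", "scripts"]
# Files that must not come back from the backup repo (assumed file names).
FORBIDDEN = ["state.db", "state.db-wal", "state.db-shm", ".env"]


def verify_restore(home: Path) -> list[str]:
    """Return the problems found in a restored Hermes home; an empty list means clean."""
    problems = [f"missing: {name}" for name in EXPECTED if not (home / name).exists()]
    problems += [f"unexpected: {name}" for name in FORBIDDEN if (home / name).exists()]
    return problems


# Demonstration against a throwaway directory instead of a real restore target.
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp)
    (home / "config.yaml").write_text("placeholder: true\n")
    for directory in ["skills", "sessions", "cron", "memories", "scripts"]:
        (home / directory).mkdir()
    print(verify_restore(home))
```

In a real drill the path would be the temporary `HERMES_HOME` used for the restore, such as `/tmp/hermes-restore-test-root`.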

Phase 3 — Upgrade Strategy

  • Check whether Hermes is already at the latest stable release before each upgrade.
    • vijay: hermes --version reported this install as 8 commits behind; the upgrade was deferred in the first pass so that it could run as its own private-shell checkpoint after backup verification.
    • vijay: late pass fetched upstream and found the shared checkout behind; working tree was clean.
  • Before upgrading:
    • vijay: pre-upgrade command checklist added to docs/hermes-operations.md.
    • run backup sync manually
      • vijay: root persistent backup cron was active with last run ok; root config/service unit was snapshotted under /root/hermes-fix-backups/20260527-roadmap-noncreds/ before upgrade.
      • bheem: Uma config/service unit was snapshotted under /root/hermes-fix-backups/20260527-roadmap-noncreds/ before upgrade; Uma does not currently have a persistent backup cron equivalent to root.
    • capture hermes --version, hermes status --all, and hermes config check
      • vijay: captured root version/config checks; root shows config v24.
      • bheem: captured Uma version/config checks; Uma shows config v24 after doctor migration.
    • snapshot config and cron job list
      • vijay: copied root config and systemd unit definition before upgrade; captured root cron list.
      • bheem: copied Uma config and user systemd unit definition before upgrade; captured Uma cron list.
  • Upgrade Hermes from an interactive shell, not from a public-facing workflow.
    • vijay: documented; no public workflow exposure added.
    • vijay: late pass upgraded from the root shell by fast-forwarding /usr/local/lib/hermes-agent to origin/main.
  • After upgrade:
    • vijay: post-upgrade verification checklist added to docs/hermes-operations.md; the upgrade was still pending when this note was written and was completed in the late pass.
    • restart gateway
      • vijay: restarted hermes-gateway.service.
      • bheem: restarted uma-hermes-gateway.service.
    • run Telegram smoke test
      • vijay: direct provider smoke test passed for root; live Telegram path remains active via gateway service.
      • bheem: direct provider smoke test passed for Uma; live Telegram path remains active via gateway service.
    • verify cron still runs
      • vijay: hermes cron list showed root backup cron active before restart; service remained active after restart.
      • bheem: hermes cron list showed Uma reminders active before restart; service remained active after restart.
    • run one safe terminal/file task
      • vijay: safe shell/status checks and repo hygiene updates completed from the operator shell.
    • run one memory/session-search task
      • vijay: ran non-destructive hermes sessions stats; root reported 59 sessions / 5225 messages.
      • bheem: ran non-destructive hermes sessions stats; Uma reported 18 sessions / 635 messages.
  • Record upgrade date, version, and any manual fixups in docs/operations.md or a Hermes-specific ops note.
    • vijay: created docs/hermes-operations.md as the Hermes-specific ops note.
    • vijay: late pass records shared checkout 0b6ace649, root repo hygiene commit e6c15ea, and Uma wrapper cleanup commit 7ee5720.

Phase 4 — Provider And Model Resilience

  • Keep OpenAI Codex OAuth as the primary provider if it remains stable.
    • vijay: root remains on openai-codex with gpt-5.5; routing stays disabled after the earlier gpt-5.4-mini failure path.
    • bheem: Uma remains on openai-codex with gpt-5.5; routing stays disabled after the earlier gpt-5.4-mini failure path.
  • Add at least one fallback provider for resilience:
    • vijay: configured a shared local Ollama fallback chain for both Hermes instances and kept routing disabled on the primary path.
    • bheem: same shared local Ollama fallback chain configured for Uma.
    • local/Ollama fallback is configured and verified with direct model smoke tests.
  • Configure provider credentials through Hermes auth/config flows; do not commit keys.
    • vijay: documented the command path; provider additions requiring new credentials remain pending.
  • Define model routing tiers:
    • vijay: fast/cheap = qwen2.5-coder:1.5b or llama3.2:1b, strong coding = qwen2.5-coder:1.5b, general/fast fallback = llama3.2:1b, vision-capable = llama3.2-vision.
    • bheem: same local tier map applies to Uma.
    • routing remains disabled until a separate routed path is proven safe.
  • Test fallback behavior by switching models in a new Hermes session.
    • vijay: direct Ollama smoke tests passed for qwen2.5-coder:1.5b, llama3.2:1b, and llama3.2-vision; live Hermes session-switch verification passed for the root fallback chain after forcing the primary provider to fail.
    • bheem: same fallback-chain proof passed for the Uma profile as well.
  • Document the preferred default model and fallback order.
    • vijay: current default is OpenAI Codex OAuth; fallback provider order is now the shared local Ollama chain.
    • vijay: preferred default is explicitly gpt-5.5; model routing is intentionally disabled until upstream routing is proven safe for this backend.
  • Verify the root and Uma Telegram session path can switch to the fallback chain without surfacing provider errors.
    • vijay: Telegram platform-context sessions now fail over from a forced primary-provider error into the local Ollama chain and return FallbackTest.
    • bheem: same Telegram platform-context fallback proof passed for Uma.
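
The fallback behaviour tested in this phase follows a simple pattern: try the primary, then each fallback in order, and surface an error only if every provider fails. The sketch below illustrates that pattern with stub functions; it is not Hermes's implementation, and the provider labels and their order in the chain are illustrative:

```python
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when no provider in the chain returned a reply."""


def complete_with_fallback(
    prompt: str, providers: Sequence[tuple[str, Callable[[str], str]]]
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (name, reply) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any provider error moves on to the next entry
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stubs standing in for real provider calls.
def failing_primary(prompt: str) -> str:
    raise TimeoutError("primary provider unavailable")


def local_stub(prompt: str) -> str:
    return "FallbackTest"


chain = [
    ("openai-codex/gpt-5.5", failing_primary),
    ("ollama/qwen2.5-coder:1.5b", local_stub),
]
print(complete_with_fallback("ping", chain))
```

Forcing the first entry to fail, as the stub does here, mirrors how the fallback proof above was run.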

Phase 5 — Tooling Capability Upgrade

  • Enable/configure at least one reliable web search/extract backend:
    • Exa
    • Tavily
    • Firecrawl
      • vijay: Firecrawl is selected in both Hermes configs and the local API key is now loaded for root.
      • bheem: same local Firecrawl configuration is loaded for Uma.
    • SearXNG self-hosted option
  • Configure browser automation only if needed and keep it private/safe:
    • vijay: local browser automation is enabled and smoke-tested over the private gateway.
    • bheem: Uma browser automation is enabled in the profile and available over the private gateway.
  • Configure GitHub/Gitea automation credentials with least privilege.
    • vijay: root local Gitea read-only Git path is configured with /root/.local/bin/gitea-git plus GIT_ASKPASS; the token remains in /root/.gitea_npm_token_home and was not printed. Verified direct Git and Hermes one-shot read access to http://localhost:3300/bytelyst/learning_ai_common_plat.git.
    • vijay: GitHub push credentials are already configured for root Git operations through /root/.git-credentials; root performs pushes for both root and Uma tracking repos. Still unchecked until GitHub token repo/scope permissions are audited as least-privilege.
  • Add vision/image capability if screenshots, diagrams, or UI reviews are common.
    • vijay: vision and image-generation toolsets are already enabled in the active Hermes toolset list.
    • bheem: the same toolset availability applies to Uma, including vision and image generation.
  • Validate the active Telegram toolset includes the capabilities ByteLyst expects:
    • vijay: hermes doctor --fix reported browser, clarify, code_execution, cronjob, terminal, delegation, file, memory, messaging, session_search, skills, todo, tts, vision, video, and related toolsets available; web remains blocked by missing search backend API key.
    • terminal
    • file
    • search/session_search
    • memory
    • skills
    • cronjob
    • messaging
    • delegation
    • browser is available; web search/extract still needs a backend API key
  • Document tool enablement changes and restart/reset requirements.
    • vijay: added restart/reset notes to docs/hermes-operations.md.
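
For reference, a `GIT_ASKPASS` helper is a program that Git invokes with the prompt text as its first argument and whose standard output Git uses as the answer. The sketch below shows that contract only; it is not the content of /root/.local/bin/gitea-git, and the username is a hypothetical placeholder:

```python
#!/usr/bin/env python3
import sys
from pathlib import Path

TOKEN_FILE = Path("/root/.gitea_npm_token_home")  # token path recorded in this roadmap
USERNAME = "hermes-readonly"  # hypothetical account name


def answer(prompt: str, token_file: Path = TOKEN_FILE, username: str = USERNAME) -> str:
    """Answer a Git credential prompt: the username for 'Username ...', else the token."""
    if prompt.lower().startswith("username"):
        return username
    return token_file.read_text().strip()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Git passes the prompt as argv[1] and reads the reply from stdout.
    print(answer(sys.argv[1]))
```

Keeping the token in a file read at prompt time means it never needs to appear in a remote URL, shell history, or repository config.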

Phase 6 — Telegram Gateway Workflow

  • Keep Telegram as the primary control plane.
    • vijay: watchdog delivery is configured to the origin Telegram conversation; root dashboard is private-only over Tailscale.
    • bheem: Uma gateway remains Telegram-driven; Uma dashboard is private-only over Tailscale.
  • Preserve the user's preferred progress prefix convention: 1, 2, etc.
    • vijay: retained in roadmap and memory; use for progress/completion updates from Hermes sessions.
  • Ensure home channel and allowed user settings are correct.
    • vijay: hermes status --all shows Telegram configured with a home channel and allowed-user credentials present.
  • Add smoke-test steps for:
    • vijay: added gateway smoke-test bullets to docs/hermes-operations.md.
    • inbound Telegram command
    • outbound completion message
    • approval prompt flow
    • media/file delivery
  • Decide whether Telegram topic/session handling should be enabled or documented.
    • vijay: documented current stance in docs/hermes-operations.md: keep default Telegram session handling unless a concrete topic-routing need appears.
    • bheem: same default-session stance applies to Uma/Bheem.
  • Add a runbook for gateway restart/recovery.
    • vijay: added gateway recovery section to docs/hermes-operations.md.

Phase 7 — Memory, Skills, And Knowledge Capture

  • Review persistent memory for stale entries and trim anything no longer useful.
    • vijay: reviewed root MEMORY.md and USER.md; entries are operationally relevant, no safe deletion needed.
    • bheem: reviewed Uma MEMORY.md and USER.md; entries are current Bheem context, no safe deletion needed.
  • Keep memories declarative and durable; avoid storing task-completion artifacts.
    • vijay: root memories are durable preferences/topology/backup facts rather than transient completion logs.
    • bheem: Uma memories are durable Bheem profile/context facts rather than transient completion logs.
  • Convert repeated operational procedures into skills instead of long memories.
  • Pin critical ByteLyst/Hermes skills that should not be archived.
  • Schedule or manually run curator reviews if enabled.
  • Add skills for recurring ByteLyst workflows:
    • Gitea Actions troubleshooting
      • vijay: root has devops/self-hosted-gitea-ci.
    • Caddy + Docker routing changes
      • vijay: root has devops/caddy-subdomain-routing.
    • Hermes backup/restore drill
      • vijay: root has devops/hermes-persistent-backup-ops; Uma backup workflow remains separate and not equivalent.
    • Telegram gateway recovery
      • bheem: Uma has devops/hermes-gateway-operations; root has gateway recovery documented in docs/hermes-operations.md.
    • safe multi-repo commit/push workflow

Phase 8 — Cron, Watchdogs, And Autonomous Maintenance

  • Keep current Hermes backup cron job enabled.
    • vijay: backup cron remains active.
  • Add watchdogs that notify Telegram only on actionable failures:
    • vijay: installed ~/.hermes/scripts/hermes_health_watchdog.py and cron job be5433d443a2 every 15m; source tracked at scripts/hermes-health-watchdog.py.
    • gateway down
    • cron scheduler stale
    • backup job failed or no fresh commit within threshold
    • disk usage high
    • memory pressure high
      • vijay: added /proc/meminfo memory-pressure threshold check to scripts/hermes-health-watchdog.py, deployed to ~/.hermes/scripts/hermes_health_watchdog.py, and verified silent-on-success.
    • Caddy/Gitea critical services down
      • vijay: added critical Docker container checks for caddy and gitea-npm-registry; deployed watchdog remains silent on a healthy run.
  • Prefer no_agent=True script-only watchdogs for fixed health checks.
    • vijay: watchdog cron is no-agent/script-only and silent on success.
  • Keep noisy health checks silent on success.
    • vijay: manual script test produced empty output on a healthy run.
  • Use self-contained prompts for any LLM-driven cron jobs.
    • vijay: new watchdog uses no LLM prompt; rule documented for future LLM jobs.
  • Avoid recursive cron creation from cron-run sessions.
    • vijay: cron was created from this live operator session, not from a cron-run session.
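
The silent-on-success rule means a script-only watchdog prints nothing on a healthy run and prints one line per actionable failure. The sketch below shows the disk and memory checks in that style; it is not the deployed hermes_health_watchdog.py, and the thresholds are assumed values:

```python
import shutil


def disk_used_percent(path: str = "/") -> float:
    """Return the used share of the filesystem containing path, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def mem_available_percent(meminfo_text: str) -> float:
    """Parse /proc/meminfo text and return MemAvailable as a percentage of MemTotal."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key] = int(parts[0])  # first column is the numeric value
    return 100.0 * fields["MemAvailable"] / fields["MemTotal"]


def collect_alerts(disk_pct: float, mem_avail_pct: float,
                   disk_max: float = 85.0, mem_min: float = 10.0) -> list[str]:
    """Return one message per breached threshold; thresholds are assumed defaults."""
    alerts = []
    if disk_pct > disk_max:
        alerts.append(f"disk usage high: {disk_pct:.0f}% used")
    if mem_avail_pct < mem_min:
        alerts.append(f"memory pressure high: {mem_avail_pct:.0f}% available")
    return alerts


if __name__ == "__main__":
    with open("/proc/meminfo") as handle:  # Linux only
        alerts = collect_alerts(disk_used_percent("/"), mem_available_percent(handle.read()))
    if alerts:  # print only on failure so a healthy run produces empty output
        print("\n".join(alerts))
```

Because output is the only signal, the cron delivery can forward any non-empty result to Telegram without a separate success/failure protocol.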

Phase 9 — Private Dashboard / Mission Control Direction

  • Do not expose Hermes dashboard publicly.
    • vijay: no public dashboard/API route added; private-only policy documented.
  • If a dashboard is useful, make it private-only and operationally scoped.
    • vijay: root dashboard is running as hermes-root-dashboard.service at http://100.87.53.10:9119/, bound only to the Tailscale IP.
    • bheem: Uma dashboard is running as uma-hermes-dashboard.service at http://100.87.53.10:9120/, bound only to the Tailscale IP.
  • Dashboard should show:
    • gateway status
    • active sessions
    • cron job state
    • backup freshness
    • recent sanitized alerts
    • quick links to docs/runbooks
    • vijay: root live ops panel now shows gateway state, active sessions, cron state, backup freshness, sanitized alerts, and runbook links over Tailscale.
    • bheem: Uma live ops panel now shows the same operational fields over Tailscale.
  • Any dashboard actions must require authentication and ideally remain reachable only over private network/tunnel.
    • vijay: root dashboard is private-network-only via Tailscale IP binding; no public listener or Caddy route was added.
    • bheem: Uma dashboard is private-network-only via Tailscale IP binding; no public listener or Caddy route was added.
  • Add a Caddy review step before adding any new hostname.
    • vijay: added Caddy/port review commands to docs/hermes-operations.md.
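
The private-only binding rule can be checked mechanically for any listener address. A minimal sketch, assuming "private" means loopback or an address in Tailscale's 100.64.0.0/10 range; Tailscale IPv6 addresses are not handled, and the function name is illustrative:

```python
import ipaddress

# Tailscale assigns IPv4 node addresses from the CGNAT range 100.64.0.0/10.
TAILSCALE_NET = ipaddress.ip_network("100.64.0.0/10")


def is_private_bind(address: str) -> bool:
    """Return True if a listener bound to this address is loopback-only or tailnet-only."""
    ip = ipaddress.ip_address(address)
    if ip.is_unspecified:  # 0.0.0.0 or :: listens on every interface
        return False
    return ip.is_loopback or (ip.version == 4 and ip in TAILSCALE_NET)


for candidate in ["100.87.53.10", "127.0.0.1", "0.0.0.0"]:
    print(candidate, is_private_bind(candidate))
```

A wildcard bind is rejected even when the firewall currently blocks the port, so that a later firewall change cannot expose the dashboard by accident.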

Phase 10 — Multi-Agent And Project Execution Workflow

  • Use delegate_task for bounded subtasks inside a parent session.
  • Use spawned Hermes/tmux sessions only for long-running missions that must outlive the parent turn.
  • Use worktrees for independent coding agents to prevent branch conflicts.
  • For durable multi-agent coordination, evaluate Hermes Kanban.
  • Document when to use:
    • direct tool call
    • delegate_task
    • background terminal process
    • cron job
    • Kanban worker
    • vijay: added multi-agent execution convention guidance to docs/hermes-operations.md.
  • Add a ByteLyst convention for progress/completion Telegram notifications from concurrent sessions.
    • vijay: documented the numbered/emoji-prefix convention in docs/hermes-operations.md.
    • bheem: Uma/Bheem follows the same convention.

Phase 11 — Security And Secret Hygiene

  • Reconfirm raw .env, OAuth credentials, tokens, logs, and SQLite WAL/SHM files are excluded from git backups.
    • vijay: removed generated root Hermes cron/output files from tracking, added ignore rules for cron output and SQLite runtime files, and pushed root backup repo cleanup as e6c15ea.
    • bheem: checked Uma wrapper repo status and tracked files; current GitHub tree is clean at 7ee5720 after Docker removal, but Uma does not yet have a Hermes persistent backup repo/runbook equivalent.
  • Consider enabling security.redact_secrets if the operational tradeoff is acceptable.
  • Keep privacy.redact_pii decision documented for gateway sessions.
  • Rotate old credentials after migration or accidental exposure risk.
  • Use least-privilege tokens for GitHub/Gitea, web APIs, and provider keys.
    • vijay: Gitea Git operations now use the narrow local token through GIT_ASKPASS; API profile reads are intentionally blocked by token scope. GitHub, web APIs, and provider-key rotation remain pending.
  • Add a pre-commit or manual scan step before pushing Hermes backup/config changes.
    • vijay: performed a manual scan/review step during root/Uma repo pushes; the root backup repo now ignores generated cron outputs, which previously carried noisy token-pattern scan results.
  • Keep approval mode at manual or smart for Telegram-driven work.
    • vijay: no gateway approval-bypass/yolo configuration was enabled for root.
    • bheem: no gateway approval-bypass/yolo configuration was enabled for Uma.
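
The pre-push scan can start as a short pattern check over the files about to be committed. A minimal sketch with a few assumed patterns (GitHub token prefixes, `sk-` style API keys, PEM private-key headers); it reports only line numbers and pattern names so the scan output never contains the secret itself, and it does not replace a dedicated secret scanner:

```python
import re

# Assumed patterns; extend for the providers actually in use.
PATTERNS = {
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),
    "sk_style_api_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every line that looks like it holds a secret."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings


# Placeholder values pass; a token-shaped value is flagged.
sample = "OPENAI_API_KEY=${OPENAI_API_KEY}\nGITHUB_TOKEN=ghp_" + "A" * 36 + "\n"
print(scan_text(sample))
```

Placeholder entries such as `${OPENAI_API_KEY}` pass the scan, which matches the rule of storing secrets only as variable names or .env.example entries.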

Phase 12 — Documentation And Runbooks

  • Add a Hermes operations index under docs/.
    • vijay: created docs/hermes-operations.md.
  • Link this roadmap from docs/repo-map.md.
    • vijay: roadmap was already listed; added docs/hermes-operations.md to repo map.
  • Create or update runbooks for:
    • installing/upgrading Hermes
      • vijay: docs/hermes-operations.md contains upgrade commands and late-upgrade verification notes.
    • restarting the gateway
    • restoring persistent data from backup
    • configuring providers/models
    • enabling/disabling tools
    • adding safe cron watchdogs
    • private-only dashboard access
  • Keep commands copy-pasteable and include expected outputs.
    • vijay: copied operational commands into docs/hermes-operations.md; expected-output notes included where useful.
    • vijay: late pass expanded docs/hermes-operations.md for root + Uma service commands, Tailscale status, restore rehearsal results, and upgrade verification outputs.
  • Store secrets only as placeholder variable names or .env.example entries.
    • vijay: no raw secrets were added to docs or scripts.

Priority Execution Plan

Immediate — Today / Next Session

  • Confirm no public Hermes dashboard route exists.
  • Investigate hermes doctor timeout.
  • Verify backup cron freshness and remote push status.
  • Add one Telegram watchdog for gateway/backup failure.
  • Choose and configure one web search backend.

Near-Term — This Week

  • Add fallback model/provider.
  • Document provider routing and model defaults.
  • Add gateway recovery runbook.
  • Add restore drill runbook and perform one test-profile restore.
    • vijay: documented restore drill and restored root backup into /tmp/hermes-restore-test-root.
    • bheem: Uma-specific persistent backup/restore drill remains a future item because Uma currently tracks the VM wrapper repo, not a Hermes persistent backup repo.
  • Add Gitea/GitHub least-privilege automation credential path.
    • vijay: Gitea path is complete for root via /root/.local/bin/gitea-git; GitHub push path exists in root's credential store and is used for root-managed pushes, including Uma repo updates. Least-privilege scope verification remains pending, so this combined item stays unchecked.

Medium-Term — This Month

  • Evaluate private-only dashboard/mission-control UX.
    • vijay: root dashboard is reachable via Tailscale at http://100.87.53.10:9119/.
    • bheem: Uma dashboard is reachable via Tailscale at http://100.87.53.10:9120/.
  • Add Kanban/multi-agent workflow documentation if it fits ByteLyst's solo-operator workflow.
  • Add silent-on-success system watchdogs.
    • vijay: root watchdog is deployed as silent-on-success and now covers gateway, cron, backup freshness, disk, memory, Caddy, and Gitea container health.
  • Clean up stale memory/skills and pin critical skills.
  • Schedule quarterly restore drills.
    • vijay: quarterly restore drill reminder cron is configured for root.
    • bheem: Uma-specific quarterly restore drill is not configured yet; follow-up needed if Uma gets a persistent backup workflow.

Acceptance Criteria

This roadmap is complete when:

  • Hermes can be upgraded and rolled back/restored with a documented process.
    • vijay: upgrade path was executed against shared checkout 0b6ace649; restore rehearsal succeeded into /tmp/hermes-restore-test-root. Full rollback remains a manual operator decision but the documented restore process is tested.
  • Gateway failures and backup failures notify Telegram.
  • At least one fallback model/provider is configured and tested.
  • Web/search tooling works for current research tasks.
  • No Hermes dashboard/API is publicly exposed.
  • Backup restore has been tested into a non-production profile.
    • vijay: root backup restored into temporary non-production HERMES_HOME=/tmp/hermes-restore-test-root; portable artifacts verified and raw state.db absent.
    • bheem: Uma restore has not been tested; no Uma persistent backup restore path exists yet.
  • Core ByteLyst Hermes procedures exist as docs or skills.
  • Sensitive files remain untracked and backup-safe.
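
The restore criterion above (portable artifacts present, raw state.db absent) lends itself to a small automated check. The sketch below is one possible shape for it, under stated assumptions: the list of required artifacts is supplied by the caller and is hypothetical here, and only the state.db exclusion comes from this roadmap.

```python
from pathlib import Path
from typing import Iterable, List


def verify_restore(
    restore_home: Path,
    required: Iterable[str],
    forbidden_names: Iterable[str] = ("state.db",),
) -> List[str]:
    """Return a sorted list of problems found in a restored tree (empty means OK).

    `required` holds paths relative to `restore_home` that must exist.
    `forbidden_names` holds file names that must not appear at any depth.
    """
    problems = []
    for rel in required:
        if not (restore_home / rel).exists():
            problems.append(f"missing required artifact: {rel}")
    forbidden = set(forbidden_names)
    for path in restore_home.rglob("*"):
        if path.is_file() and path.name in forbidden:
            rel_path = path.relative_to(restore_home).as_posix()
            problems.append(f"forbidden file present: {rel_path}")
    return sorted(problems)
```

For a rehearsal into a temporary HERMES_HOME, a call such as `verify_restore(Path("/tmp/hermes-restore-test-root"), required=[...])` would return an empty list when the restore meets both conditions. The required entries are left elided because the exact artifact set is not listed in this document.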

Execution Log

2026-05-27 — vijay setup execution pass

  • vijay: synced bytelyst-devops-tools from GitHub and added the Gitea remote locally for branch push tracking.
  • vijay: ran Hermes health commands (hermes --version, hermes config check, hermes doctor --fix, hermes status --all, hermes cron list) and checked gateway service status, disk/memory/load, and port/Caddy scans.
  • vijay: hermes doctor --fix completed and migrated config v23 → v24.
  • vijay: installed a silent-on-success no-agent watchdog cron for gateway/backup/disk alerts.
  • vijay: created docs/hermes-operations.md, updated docs/operations.md, and added this roadmap progress commentary.
  • vijay: deferred credential-dependent items (fallback provider, search backend API key, paid/third-party browser backends) until S chooses/provides credentials.
  • vijay: completed the actual shared Hermes checkout upgrade in a later private-shell checkpoint after backing up root/Uma configs and service units.

2026-05-27 — vijay late non-credential completion pass

  • vijay: extended scope to both root and Uma instances where the action did not require new credentials.
  • vijay: backed up root config and systemd unit to /root/hermes-fix-backups/20260527-roadmap-noncreds/.
  • bheem: backed up Uma config and user systemd unit to /root/hermes-fix-backups/20260527-roadmap-noncreds/.
  • bheem: migrated Uma Hermes config v23 → v24 with hermes doctor --fix.
  • vijay: root was already config v24.
  • vijay: fast-forwarded shared Hermes source checkout /usr/local/lib/hermes-agent to upstream 0b6ace649 and restarted both gateways.
  • vijay: verified root provider smoke test: root-roadmap-ok.
  • bheem: verified Uma provider smoke test: uma-roadmap-ok.
  • vijay: confirmed root service is enabled and active.
  • bheem: confirmed Uma service is enabled and active; Docker-based Uma Hermes remains removed.
  • vijay: installed Tailscale 1.98.3; tailscaled is enabled/running and authenticated to tailnet IP 100.87.53.10.
  • vijay: installed permanent root dashboard service hermes-root-dashboard.service at http://100.87.53.10:9119/.
  • bheem: installed permanent Uma dashboard service uma-hermes-dashboard.service at http://100.87.53.10:9120/.
  • vijay: added dashboard service unit templates under systemd/ for repo tracking.
  • vijay: extended the root watchdog with memory-pressure and Caddy/Gitea container checks, deployed it, and verified it stays silent on success.
  • vijay: reviewed root persistent memories and recurring workflow skills.
  • bheem: reviewed Uma persistent memories and recurring workflow skills.
  • vijay: cleaned the current tree of the root backup repo by untracking generated hermes_persistent_backup/cron/output files, then pushed commit e6c15ea.
  • bheem: confirmed Uma wrapper repo is clean at 7ee5720 after Docker deployment removal.
  • vijay: ran root restore rehearsal into /tmp/hermes-restore-test-root, verified portable restore content, and scanned restored config/template for common token patterns.
  • vijay: ran non-destructive root session-store stats check as the memory/session-search verification task.
  • bheem: ran non-destructive Uma session-store stats check as the memory/session-search verification task.
  • vijay: updated docs/hermes-operations.md with root service commands, Tailscale status, restore rehearsal outcome, and late upgrade notes.
  • bheem: updated docs/hermes-operations.md with Uma service commands and shared private-dashboard notes.
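
The token-pattern scan mentioned in the restore rehearsal entry above could look roughly like this. It is a sketch under assumptions: the three patterns are illustrative examples of well-known credential shapes, not the set actually used in the rehearsal, and the function names are hypothetical. Findings report only the file, line number, and pattern name, so matched values are never printed.

```python
import re
from pathlib import Path
from typing import Dict, List, Tuple

# Illustrative patterns only; a real scan would need a broader, maintained list.
TOKEN_PATTERNS: Dict[str, re.Pattern] = {
    "github_classic_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (line number, pattern name) pairs without echoing the match."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in TOKEN_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


def scan_tree(root: Path) -> List[Tuple[str, int, str]]:
    """Scan every readable file under `root`; return (relative path, line, pattern name)."""
    findings = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable files are skipped in this sketch
        rel = path.relative_to(root).as_posix()
        for lineno, name in scan_text(text):
            findings.append((rel, lineno, name))
    return findings
```

An empty result from `scan_tree` does not prove a tree is free of secrets; it only means none of the listed shapes matched.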

2026-05-27 — vijay Gitea least-privilege Git path

  • vijay: confirmed local Gitea API version 1.22.6 and root-only token-file permissions without printing token values.
  • vijay: verified the token stored in /root/.gitea_npm_token_home does not have broad profile-read scope; /api/v1/user returned the expected scope denial instead of user data.
  • vijay: installed /root/.local/bin/gitea-git-askpass and /root/.local/bin/gitea-git so Hermes/Git can authenticate to local Gitea without embedding tokens in remotes or Git config.
  • vijay: verified direct Git read operation: gitea-git ls-remote http://localhost:3300/bytelyst/learning_ai_common_plat.git HEAD returned HEAD 59c4638f85be....
  • vijay: verified the same read-only operation through Hermes one-shot; Hermes reported success and only the truncated HEAD hash.
  • vijay: documented the exact safe token flow in docs/hermes-operations.md; corrected GitHub status to show credentials already exist for root-managed pushes, with least-privilege scope audit still pending.
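
For readers unfamiliar with the askpass approach used in this pass, the sketch below shows the general mechanism. It is not the contents of gitea-git-askpass or gitea-git, which are not reproduced in this document. It relies on documented Git behaviour: when GIT_ASKPASS is set, Git runs that program with the prompt text as its first argument and reads the reply from stdout. The function names, the username argument, and the assumption that Gitea accepts the access token as the HTTP password are illustrative.

```python
import os
import stat
from pathlib import Path
from typing import Dict, Optional


def is_owner_only(path: Path) -> bool:
    """True when group and others have no permission bits on the file."""
    return stat.S_IMODE(path.stat().st_mode) & 0o077 == 0


def answer_prompt(prompt: str, token_file: Path, username: str) -> str:
    """Return the reply for a Git credential prompt.

    Git prompts look like "Username for 'http://host': " or
    "Password for 'http://user@host': ". The token is read only when a
    password is requested, and only from an owner-only file.
    """
    kind = prompt.strip().lower()
    if kind.startswith("username"):
        return username
    if kind.startswith("password"):
        if not is_owner_only(token_file):
            raise PermissionError(f"{token_file} is accessible by group or others")
        return token_file.read_text(encoding="utf-8").strip()
    raise ValueError("unrecognised credential prompt")


def git_env(askpass_program: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build an environment that routes Git credential prompts to the helper."""
    env = dict(os.environ if base is None else base)
    env["GIT_ASKPASS"] = askpass_program
    env["GIT_TERMINAL_PROMPT"] = "0"  # fail instead of prompting on a terminal
    return env
```

A helper script would print `answer_prompt(sys.argv[1], ...)` and a wrapper would run Git with `git_env(...)`. The design keeps the token out of remote URLs and Git config, so it lives only in the token file and in the helper's stdout for the duration of one command.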

Notes For Future Transcript Pass

When the transcript is available, specifically check whether the video recommends any of the following and update this roadmap accordingly:

  • exact provider/model choices
  • recommended Hermes install path
  • gateway platform setup details
  • dashboard or web UI exposure guidance
  • memory/skill workflows
  • MCP server recommendations
  • cron/background agent patterns
  • voice/STT/TTS setup
  • any security warnings or anti-patterns