
Hermes Setup Upgrade Roadmap

Date: 2026-05-26
Execution update: 2026-05-27
Owner: ByteLyst / S
Repo: bytelyst-devops-tools
Video reference: "Hermes Agent is the greatest AI tool ever made. Here's how to set it up" by Alex Finn

Purpose

Turn the Hermes setup ideas from the referenced video into a practical ByteLyst upgrade checklist for this VM-backed, Telegram-driven Hermes installation.

This roadmap is intentionally operational: every item should improve at least one of reliability, safety, agent capability, observability, or restore/migration readiness.

Transcript Review Status

Automated transcript retrieval was attempted through multiple paths:

Hermes youtube-content transcript helper using youtube-transcript-api
yt-dlp subtitle extraction
direct YouTube page/player metadata inspection
Invidious caption endpoints
third-party transcript endpoint probing

The video title and metadata were reachable, but transcript/subtitle retrieval was blocked by YouTube anti-bot checks from this VM/cloud IP. One Invidious endpoint confirmed an English auto-generated caption track exists, but returned an empty caption body.

Because the full transcript was not retrievable from the VM, this roadmap combines:

the accessible video metadata and setup theme,
Hermes Agent's current documented capabilities,
the live health/status of this ByteLyst Hermes installation, and
ByteLyst's existing operational preferences and safety constraints.

If a manual transcript is later pasted or uploaded, re-run this review and append a Transcript-Derived Delta section with any new actions.

Current ByteLyst Hermes Baseline

Observed on 2026-05-26 (config version and hermes doctor results updated on 2026-05-27):

Hermes version: v0.14.0 (2026.5.16); hermes --version reports an update available (8 commits behind)
Project path: /usr/local/lib/hermes-agent
Active model/provider: gpt-5.5 via OpenAI Codex OAuth
Telegram gateway: configured and running under systemd
Scheduled jobs: 2 active, 2 total
- Sync Hermes persistent-data backup to GitHub
  - schedule: every 30 minutes
  - delivery: local
  - script: sync_hermes_persistent_backup.py
  - last status: ok
Config version: 24 after hermes doctor --fix migration on 2026-05-27
Telegram credentials are present
Most optional provider/API keys are not configured, including OpenRouter, Google/Gemini, Anthropic, Firecrawl/Tavily/Exa, Browserbase/Browser Use, GitHub token, FAL, and ElevenLabs
hermes doctor --fix completed on 2026-05-27; it migrated config v23 → v24 and left manual provider/API-key setup as the main optional follow-up
User preference: do not expose the Hermes dashboard publicly

Target State

A healthy ByteLyst Hermes setup should be:

Private by default: no public dashboard exposure; private access through local shell, Telegram DM, SSH tunnel, Tailscale, or equivalent.
Recoverable: configuration, skills, memory, sessions, cron jobs, and scripts are backed up and periodically restore-tested.
Observable: gateway, cron, disk, memory, and backup failures surface to Telegram quickly.
Capable: web search/extraction, browser automation, GitHub/Gitea operations, vision, file, terminal, cron, memory, session search, and delegation are all configured where useful.
Safe: secrets are not committed, destructive commands remain approval-gated, public Caddy exposure is explicitly reviewed, and profiles isolate risky experiments.
Self-improving: recurring procedures are captured as skills; stale or wrong skills are patched immediately.

Roadmap Checklist

Lines prefixed with "vijay:" are live implementation notes from the 2026-05-27 setup execution pass. An item is checked off only after it has been verified on the VM or documented in this repo.

Phase 0 — Safety Freeze And Guardrails

Confirm no Caddy route exposes a Hermes dashboard or Hermes API server publicly.
- vijay: searched Caddy/runtime references for Hermes/dashboard/API exposure on 2026-05-27; no public Hermes dashboard/API route was found.
Add a negative-control check to operational docs: Hermes dashboard/API must not be public without explicit approval.
- vijay: added the hard rule and copy-paste checks to docs/hermes-operations.md and linked it from docs/operations.md.
Verify firewall/Caddy routes for any hostnames pointing to Hermes ports.
- vijay: reviewed current listeners and Caddy references; no Hermes-specific public hostname was identified. Re-run before adding any new route.
Decide private access pattern for any future dashboard:
- local-only binding
- SSH tunnel
- Tailscale/WireGuard
- Cloudflare Access or equivalent identity gate
- basic auth plus IP allowlist only if a public route is unavoidable
Keep command approvals at manual or smart; do not globally use approval bypass for the gateway.
- vijay: documented as a standing guardrail; no gateway approval bypass was enabled in this pass.
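The Caddy half of the Phase 0 route review can be partly scripted. The sketch below is a text heuristic, not a real Caddyfile parser. It assumes the common `site-address { ... reverse_proxy host:port }` layout and ignores imported snippets and JSON configs. It takes the set of Hermes ports as input because this roadmap does not record them, so confirm those with `ss -ltnp` first. Any hit means a hostname proxies to a Hermes port and needs the explicit approval described above.

```python
import re

# Captures the port in upstreams such as "localhost:9119" or "http://127.0.0.1:9119".
UPSTREAM_PORT = re.compile(r":(\d+)\b")


def hermes_routes(caddyfile_text: str, ports: set[int]) -> list[tuple[str, str]]:
    """Return (site address, directive) pairs whose reverse_proxy targets one of `ports`."""
    site = "<global>"
    depth = 0
    hits = []
    for raw in caddyfile_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and indentation
        if not line:
            continue
        if depth == 0 and line.endswith("{"):
            # A top-level block opener names the site (or is the global options block).
            site = line[:-1].strip() or "<global>"
        if line.startswith("reverse_proxy"):
            if {int(p) for p in UPSTREAM_PORT.findall(line)} & ports:
                hits.append((site, line))
        depth += line.count("{") - line.count("}")
    return hits
```

Calling `hermes_routes(open("/etc/caddy/Caddyfile").read(), ports)` with the real Hermes port set should return an empty list on this VM. The Caddyfile path is an assumption about where the config lives.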

Phase 1 — Health Baseline And Diagnostics

Run and capture hermes --version.
- vijay: captured Hermes Agent v0.14.0 (2026.5.16), project /usr/local/lib/hermes-agent, update available.
Run and capture hermes config check.
- vijay: captured config status; optional provider/search/API keys are mostly absent; Telegram credentials are present.
Investigate why hermes doctor timed out.
- vijay: reran timeout 240 hermes doctor --fix; it completed successfully.
- Re-run with a longer timeout from a foreground shell.
- If still hanging, isolate the step by checking logs and dependencies.
  - vijay: not needed after longer foreground run succeeded.
- File or fix a Hermes bug if the timeout is reproducible.
  - vijay: not reproducible in this pass; no bug filed.
Run hermes status --all and save a sanitized baseline summary.
- vijay: baseline summary added to docs/hermes-operations.md.
Check gateway service health:
- vijay: hermes-gateway.service is active/running under systemd.
- systemctl status hermes-gateway or the actual installed service unit
- recent gateway logs under ~/.hermes/logs/
- Telegram send/receive smoke test
  - vijay: current conversation verifies Telegram inbound/outbound path.
Check cron scheduler health and last-run status.
- vijay: hermes cron list shows the backup cron active with last run ok; the newly added watchdog cron is also active.
Check disk, memory, CPU, open ports, and long-running Hermes processes.
- vijay: the root filesystem (/) was 27% used; about 11 GiB of memory was available; gateway processes were active; many app ports are open and should be reviewed separately before any public routing.
Create a recurring monthly Hermes setup review checklist from this baseline.
- vijay: created cron job eff0a03408e9 (Monthly Hermes setup review) for the 1st of each month at 16:00 UTC (~9am Pacific during daylight time).
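Here is a minimal sketch of the Phase 1 capture, assuming Python 3.9+ on the VM. It runs the commands this checklist names and applies a per-command timeout, so one hang cannot stall the whole baseline. The 240 s timeout matches the `timeout 240 hermes doctor --fix` rerun. Values that follow key/token/secret/password labels are masked before anything is saved. `hermes doctor` runs without `--fix` here so the baseline stays read-only. The redaction regex is a heuristic, so still eyeball the output before committing it.

```python
import re
import subprocess

# Read-only baseline commands from the Phase 1 checklist (`hermes doctor` without --fix).
BASELINE_COMMANDS = [
    ["hermes", "--version"],
    ["hermes", "config", "check"],
    ["hermes", "doctor"],
    ["hermes", "status", "--all"],
    ["hermes", "cron", "list"],
    ["systemctl", "status", "hermes-gateway", "--no-pager"],
]

# Heuristic: mask the value after labels like API_KEY=, BOT_TOKEN:, secret=, password:.
SECRET_LIKE = re.compile(r"(?i)((?:api[_-]?key|token|secret|password)\w*[ \t]*[:=][ \t]*)\S+")


def sanitize(text: str) -> str:
    return SECRET_LIKE.sub(r"\1[REDACTED]", text)


def run_step(cmd: list[str], timeout: int = 240) -> dict:
    """Run one command without raising; record exit code and sanitized output."""
    label = " ".join(cmd)
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        return {"cmd": label, "rc": None, "output": "command not found"}
    except subprocess.TimeoutExpired:
        return {"cmd": label, "rc": None, "output": f"timed out after {timeout}s"}
    return {"cmd": label, "rc": proc.returncode, "output": sanitize(proc.stdout + proc.stderr)}


def capture_baseline(commands=BASELINE_COMMANDS, timeout: int = 240) -> list[dict]:
    """Run every baseline command in order; one hang or failure does not stop the rest."""
    return [run_step(cmd, timeout) for cmd in commands]
```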

Phase 2 — Backup, Restore, And Migration Readiness

Keep the existing persistent-data backup cron active.
- vijay: job 470832621b43 remains active every 30m.
Verify the backup repository receives fresh commits after real state changes.
- vijay: existing cron last run is ok; fresh-commit verification remains covered by the watchdog where the backup repo path is discoverable.
Confirm the backup intentionally excludes raw secrets and state.db.
- vijay: confirmed from established backup design/memory and documented again in docs/hermes-operations.md.
Add a restore rehearsal checklist:
- vijay: added restore drill outline to docs/hermes-operations.md.
- clone backup repo into a temporary directory
- run restore script in dry-run mode if available
- verify config, skills, sessions, cron, memory, and scripts restore into a test profile
- confirm no raw .env, OAuth token, or credential file appears in git
Add a quarterly restore drill reminder cron job or calendar task.
- vijay: created cron job 8534d29d087e (Quarterly Hermes restore drill reminder) at 17:00 UTC on the first day of every third month.
Document exact restore commands in a ByteLyst ops doc.
- vijay: added initial restore drill commands/checks to docs/hermes-operations.md; a full live restore test is still future work.
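The "confirm no raw .env, OAuth token, or credential file appears in git" step can be checked mechanically against a fresh clone during the restore drill. This sketch assumes the exclusions this roadmap lists: .env, OAuth/credential files, state.db, SQLite WAL/SHM sidecars, and logs. The exact Hermes file names, such as `auth.json`, are assumptions to confirm against `~/.hermes` on the VM. `.env.example` is deliberately allowed.

```python
import fnmatch
import subprocess
from pathlib import PurePosixPath

# File-name patterns that must never be tracked in the backup repo (names are assumptions).
FORBIDDEN_NAMES = [".env", "*.env", "auth.json", "*credential*.json", "state.db",
                   "*-wal", "*-shm", "*.log"]
# Directory names whose contents must never be tracked.
FORBIDDEN_DIRS = {"logs"}


def forbidden_paths(tracked: list[str]) -> list[str]:
    """Return tracked paths that match a forbidden file name or sit under a forbidden dir."""
    bad = []
    for path in tracked:
        p = PurePosixPath(path)
        name_hit = any(fnmatch.fnmatch(p.name, pat) for pat in FORBIDDEN_NAMES)
        dir_hit = bool(FORBIDDEN_DIRS & set(p.parts[:-1]))
        if name_hit or dir_hit:
            bad.append(path)
    return bad


def tracked_files(repo: str) -> list[str]:
    """List files git tracks in `repo` (e.g. a temporary clone of the backup repo)."""
    out = subprocess.run(["git", "-C", repo, "ls-files"],
                         capture_output=True, text=True, check=True)
    return out.stdout.splitlines()
```

Run it against the temporary clone, e.g. `forbidden_paths(tracked_files("/tmp/hermes-restore-check"))` (hypothetical path). Anything it returns should block the drill until the backup exclusions are fixed.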

Phase 3 — Upgrade Strategy

Check whether Hermes is already at the latest stable release before each upgrade.
- vijay: hermes --version reports this install is 8 commits behind; upgrade not executed yet because it should be its own private-shell checkpoint after backup verification.
Before upgrading:
- vijay: pre-upgrade command checklist added to docs/hermes-operations.md.
- run backup sync manually
- capture hermes --version, hermes status --all, and hermes config check
- snapshot config and cron job list
Upgrade Hermes from an interactive shell, not from a public-facing workflow.
- vijay: documented; no public workflow exposure added.
After upgrade:
- vijay: post-upgrade verification checklist added to docs/hermes-operations.md; actual upgrade still pending.
- restart gateway
- run Telegram smoke test
- verify cron still runs
- run one safe terminal/file task
- run one memory/session-search task
Record upgrade date, version, and any manual fixups in docs/operations.md or a Hermes-specific ops note.
- vijay: created docs/hermes-operations.md as the Hermes-specific ops note.
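The "already at the latest release?" check and the backup gate can be combined into one pre-upgrade decision. The regexes below assume `hermes --version` prints output shaped like the values this roadmap recorded, `v0.14.0 (2026.5.16)` and "8 commits behind". The real wording may differ, so test the regexes against actual output before relying on the result. `backup_fresh` is whatever the manual backup sync and commit check report.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed output shape, reconstructed from values recorded in this roadmap.
VERSION_RE = re.compile(r"v(\d+)\.(\d+)\.(\d+)\s*\(([\d.]+)\)")
BEHIND_RE = re.compile(r"(\d+)\s+commits?\s+behind", re.IGNORECASE)


@dataclass
class VersionInfo:
    version: tuple[int, int, int]
    build: str
    commits_behind: Optional[int]  # None when no "behind" note is printed


def parse_version(output: str) -> VersionInfo:
    m = VERSION_RE.search(output)
    if not m:
        raise ValueError("no version string found")
    behind = BEHIND_RE.search(output)
    return VersionInfo(
        version=(int(m.group(1)), int(m.group(2)), int(m.group(3))),
        build=m.group(4),
        commits_behind=int(behind.group(1)) if behind else None,
    )


def upgrade_decision(info: VersionInfo, backup_fresh: bool) -> str:
    """Upgrade only when behind, and only after a verified fresh backup."""
    if not info.commits_behind:
        return "up to date: nothing to do"
    if not backup_fresh:
        return "blocked: run the backup sync and verify a fresh commit first"
    return f"ready: {info.commits_behind} commits behind; upgrade from an interactive shell"
```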

Phase 4 — Provider And Model Resilience

Keep OpenAI Codex OAuth as the primary provider if it remains stable.
Add at least one fallback provider for resilience:
- OpenRouter
- Google/Gemini
- Anthropic
- local/Ollama if useful for low-risk offline tasks
Configure provider credentials through Hermes auth/config flows; do not commit keys.
- vijay: documented the command path; provider additions requiring new credentials remain pending.
Define model routing tiers:
- fast/cheap model for routine summaries and simple ops
- strong coding model for repo work
- vision-capable model for screenshots/images
- long-context model for large transcripts and audits
Test fallback behavior by switching models in a new session.
Document the preferred default model and fallback order.
- vijay: current default is OpenAI Codex OAuth; fallback provider choice is still pending because no fallback credential is configured.
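One way to write down the routing tiers and fallback order is a small table plus a "first configured provider wins" rule. This is a hypothetical sketch, not Hermes configuration syntax. The provider identifiers are illustrative. Every model except the current primary (gpt-5.5 via OpenAI Codex OAuth) is a placeholder until a fallback credential is configured and tested.

```python
# Hypothetical tier -> ordered (provider, model) list; "<...>" entries are placeholders.
ROUTING = {
    "routine": [("openai-codex", "gpt-5.5"), ("openrouter", "<cheap-model>"), ("ollama", "<local-model>")],
    "coding": [("openai-codex", "gpt-5.5"), ("anthropic", "<coding-model>")],
    "vision": [("gemini", "<vision-model>"), ("openai-codex", "gpt-5.5")],
    "long_context": [("gemini", "<long-context-model>"), ("openai-codex", "gpt-5.5")],
}


def pick_model(tier: str, configured: set[str]) -> tuple[str, str]:
    """Return the first (provider, model) in the tier whose provider has credentials."""
    for provider, model in ROUTING[tier]:
        if provider in configured:
            return provider, model
    raise LookupError(f"no configured provider for tier {tier!r}")
```

Before the real model-switch test in a new session, you can cheaply rehearse a primary outage: drop `openai-codex` from `configured` and confirm that the next entry is chosen.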

Phase 5 — Tooling Capability Upgrade

Enable/configure at least one reliable web search/extract backend:
- Exa
- Tavily
- Firecrawl
- SearXNG self-hosted option
Configure browser automation only if needed and keep it private/safe:
- local Chromium/Camofox, or
- Browserbase/Browser Use
Configure GitHub/Gitea automation credentials with least privilege.
Add vision/image capability if screenshots, diagrams, or UI reviews are common.
Validate the active Telegram toolset includes the capabilities ByteLyst expects:
- vijay: hermes doctor --fix reported browser, clarify, code_execution, cronjob, terminal, delegation, file, memory, messaging, session_search, skills, todo, tts, vision, video, and related toolsets available; web remains blocked by missing search backend API key.
- terminal
- file
- search/session_search
- memory
- skills
- cronjob
- messaging
- delegation
- browser is available; web search/extract still needs a backend API key
Document tool enablement changes and restart/reset requirements.
- vijay: added restart/reset notes to docs/hermes-operations.md.
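The toolset gap recorded above (web blocked on a missing search backend key) can be recomputed after each change. The `AVAILABLE` set is copied from the 2026-05-27 `hermes doctor --fix` output. The environment variable names for the search backends are assumptions based on each vendor's usual naming, so confirm them with `hermes config check`. The helper reports only which backends are configured, never their values.

```python
import os

# Toolsets reported by `hermes doctor --fix` on 2026-05-27 (copied from this roadmap).
AVAILABLE = {
    "browser", "clarify", "code_execution", "cronjob", "terminal", "delegation",
    "file", "memory", "messaging", "session_search", "skills", "todo", "tts",
    "vision", "video",
}
# Capabilities ByteLyst expects on the Telegram toolset, per the checklist above.
EXPECTED = {
    "terminal", "file", "session_search", "memory", "skills",
    "cronjob", "messaging", "delegation", "browser", "web",
}
# Assumed variable names per search backend.
SEARCH_BACKEND_KEYS = {
    "exa": "EXA_API_KEY",
    "tavily": "TAVILY_API_KEY",
    "firecrawl": "FIRECRAWL_API_KEY",
    "searxng": "SEARXNG_URL",
}


def missing_toolsets(available: set[str], expected: set[str] = EXPECTED) -> set[str]:
    return expected - available


def configured_search_backends(env=os.environ) -> list[str]:
    """Names of backends whose variable is set and non-blank; values are never returned."""
    return sorted(name for name, var in SEARCH_BACKEND_KEYS.items() if env.get(var, "").strip())
```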

Phase 6 — Telegram Gateway Workflow

Keep Telegram as the primary control plane.
- vijay: watchdog delivery is configured to the origin Telegram conversation; dashboard remains private-only/pending.
Preserve the user's preferred progress prefix convention: 1️⃣, 2️⃣, etc.
- vijay: retained in roadmap and memory; use for progress/completion updates from Hermes sessions.
Ensure home channel and allowed user settings are correct.
- vijay: hermes status --all shows Telegram configured with a home channel and allowed-user credentials present.
Add smoke-test steps for:
- vijay: added gateway smoke-test bullets to docs/hermes-operations.md.
- inbound Telegram command
- outbound completion message
- approval prompt flow
- media/file delivery
Decide whether Telegram topic/session handling should be enabled or documented.
Add a runbook for gateway restart/recovery.
- vijay: added gateway recovery section to docs/hermes-operations.md.
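A direct Telegram Bot API call is enough for the outbound half of the smoke test and for checking the 1️⃣, 2️⃣ prefix convention. This sketch builds a `sendMessage` request with the standard library. It exercises the bot token and home-channel chat ID independently of Hermes, so the inbound-command and approval-prompt steps still need a real message through the gateway. Reading the token from the environment (for example, a `TELEGRAM_BOT_TOKEN` variable) is an assumption. The request URL embeds the token, so never log it.

```python
import json
import urllib.request


def progress_prefix(step: int) -> str:
    """Keycap emoji for steps 0-9 (digit + U+FE0F + U+20E3), 🔟 for 10, plain "n." above."""
    if 0 <= step <= 9:
        return f"{step}\ufe0f\u20e3"
    if step == 10:
        return "\U0001F51F"
    return f"{step}."


def build_send_message(token: str, chat_id: str, step: int, text: str) -> urllib.request.Request:
    """Build (but do not send) a Bot API sendMessage request."""
    body = json.dumps({"chat_id": chat_id, "text": f"{progress_prefix(step)} {text}"}).encode()
    return urllib.request.Request(
        f"https://api.telegram.org/bot{token}/sendMessage",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 10.0) -> bool:
    """Send the request; True when Telegram answers with "ok": true."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return bool(json.load(resp).get("ok", False))
```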

Phase 7 — Memory, Skills, And Knowledge Capture

Review persistent memory for stale entries and trim anything no longer useful.
Keep memories declarative and durable; avoid storing task-completion artifacts.
Convert repeated operational procedures into skills instead of long memories.
Pin critical ByteLyst/Hermes skills that should not be archived.
Schedule or manually run curator reviews if enabled.
Add skills for recurring ByteLyst workflows:
- Gitea Actions troubleshooting
- Caddy + Docker routing changes
- Hermes backup/restore drill
- Telegram gateway recovery
- safe multi-repo commit/push workflow
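File age is a reasonable starting point for the stale-skill part of the Phase 7 review. This sketch assumes each skill is a directory containing a `SKILL.md` somewhere under `~/.hermes/skills`. `PINNED` is a hypothetical local list, not Hermes' own pinning mechanism. Age is only a proxy for staleness, so treat the output as a review list rather than a deletion list.

```python
import time
from pathlib import Path
from typing import Optional

# Hypothetical ByteLyst list of skills that should never be flagged.
PINNED = {"hermes-backup-restore-drill", "telegram-gateway-recovery"}


def stale_skills(root: Path, max_age_days: int = 90, pinned: set[str] = PINNED,
                 now: Optional[float] = None) -> list[str]:
    """Return skill directory names whose SKILL.md is older than max_age_days, skipping pinned ones."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for skill_md in sorted(root.rglob("SKILL.md")):
        name = skill_md.parent.name
        if name not in pinned and skill_md.stat().st_mtime < cutoff:
            stale.append(name)
    return stale
```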

Phase 8 — Cron, Watchdogs, And Autonomous Maintenance

Keep current Hermes backup cron job enabled.
- vijay: backup cron remains active.
Add watchdogs that notify Telegram only on actionable failures:
- vijay: installed ~/.hermes/scripts/hermes_health_watchdog.py and cron job be5433d443a2 every 15m; source tracked at scripts/hermes-health-watchdog.py.
- gateway down
- cron scheduler stale
- backup job failed or no fresh commit within threshold
- disk usage high
- memory pressure high
- Caddy/Gitea critical services down
Prefer no_agent=True script-only watchdogs for fixed health checks.
- vijay: watchdog cron is no-agent/script-only and silent on success.
Keep noisy health checks silent on success.
- vijay: manual script test produced empty output on a healthy run.
Use self-contained prompts for any LLM-driven cron jobs.
- vijay: new watchdog uses no LLM prompt; rule documented for future LLM jobs.
Avoid recursive cron creation from cron-run sessions.
- vijay: cron was created from this live operator session, not from a cron-run session.
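This is a sketch of the shape a silent-on-success watchdog can take. It is not a copy of `scripts/hermes-health-watchdog.py`. The thresholds are placeholders. The backup check uses the last commit time in a local clone of the backup repo, with the path supplied by the caller. Its window is generous because commits only land after real state changes. `check()` is pure, so it can be tested without systemd. `gather()` reads the live system. A cron script would print the joined alerts only when the list is non-empty.

```python
import shutil
import subprocess
import time
from typing import Optional

DISK_MAX_USED = 0.85          # placeholder thresholds; tune for this VM
MEM_MIN_AVAILABLE = 0.10
BACKUP_MAX_AGE_S = 24 * 3600  # commits only land after state changes, so keep this generous


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo into {field: kB}."""
    out = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            out[key.strip()] = int(parts[0])
    return out


def check(disk_used_frac: float, mem: dict[str, int], gateway_active: bool,
          last_backup_commit: Optional[float], now: float) -> list[str]:
    """Return alert lines; an empty list means healthy and nothing should be printed."""
    alerts = []
    if not gateway_active:
        alerts.append("gateway: hermes-gateway is not active")
    if disk_used_frac > DISK_MAX_USED:
        alerts.append(f"disk: {disk_used_frac:.0%} used")
    total, avail = mem.get("MemTotal"), mem.get("MemAvailable")
    if total and avail is not None and avail / total < MEM_MIN_AVAILABLE:
        alerts.append("memory: available memory below threshold")
    if last_backup_commit is None or now - last_backup_commit > BACKUP_MAX_AGE_S:
        alerts.append("backup: no fresh commit within threshold")
    return alerts


def gather(backup_repo: str) -> list[str]:
    """Collect live readings (Linux + systemd + git) and evaluate them with check()."""
    usage = shutil.disk_usage("/")
    with open("/proc/meminfo") as fh:
        mem = parse_meminfo(fh.read())
    active = subprocess.run(["systemctl", "is-active", "--quiet", "hermes-gateway"]).returncode == 0
    log = subprocess.run(["git", "-C", backup_repo, "log", "-1", "--format=%ct"],
                         capture_output=True, text=True)
    last = float(log.stdout) if log.returncode == 0 and log.stdout.strip() else None
    return check(usage.used / usage.total, mem, active, last, time.time())
```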

Phase 9 — Private Dashboard / Mission Control Direction

Do not expose Hermes dashboard publicly.
- vijay: no public dashboard/API route added; private-only policy documented.
If a dashboard is useful, make it private-only and operationally scoped.
Dashboard should show:
- gateway status
- active sessions
- cron job state
- backup freshness
- recent sanitized alerts
- quick links to docs/runbooks
Any dashboard actions must require authentication and ideally remain reachable only over private network/tunnel.
Add a Caddy review step before adding any new hostname.
- vijay: added Caddy/port review commands to docs/hermes-operations.md.
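For the "local-only binding" access pattern, the useful check is that Hermes ports listen only on loopback. This sketch parses `ss -ltn` output and flags any listener on the given ports that is not loopback-only. It assumes the usual column layout, with the local address in the fourth column. The port numbers in the example are hypothetical. A Tailscale-only binding would also be flagged, which is intended, because it should be reviewed deliberately.

```python
import ipaddress


def public_listeners(ss_output: str, ports: set[int]) -> list[str]:
    """Return local addresses from `ss -ltn` output that listen on `ports` beyond loopback."""
    exposed = []
    for line in ss_output.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local = fields[3]  # "Local Address:Port", e.g. 127.0.0.1:9119 or [::1]:9119
        host, _, port = local.rpartition(":")
        if not port.isdigit() or int(port) not in ports:
            continue
        host = host.strip("[]").split("%")[0]  # unwrap IPv6 brackets, drop "%iface" suffix
        if host == "*":
            exposed.append(local)  # wildcard: all interfaces
            continue
        try:
            if not ipaddress.ip_address(host).is_loopback:
                exposed.append(local)
        except ValueError:
            exposed.append(local)  # unparsable host: flag for manual review
    return exposed
```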

Phase 10 — Multi-Agent And Project Execution Workflow

Use delegate_task for bounded subtasks inside a parent session.
Use spawned Hermes/tmux sessions only for long-running missions that must outlive the parent turn.
Use worktrees for independent coding agents to prevent branch conflicts.
For durable multi-agent coordination, evaluate Hermes Kanban.
Document when to use:
- direct tool call
- delegate_task
- background terminal process
- cron job
- Kanban worker
Add a ByteLyst convention for progress/completion Telegram notifications from concurrent sessions.
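The "when to use which" rules from this phase can be written down as a tiny decision function. This is a hypothetical encoding of the guidance above, not Hermes behavior. Its precedence order (schedule first, then durable coordination, then lifetime, then scope) is an assumption for ByteLyst to confirm.

```python
from dataclasses import dataclass


@dataclass
class Task:
    recurring: bool = False             # must run on a schedule
    multi_agent_durable: bool = False   # several agents coordinating over days
    outlives_parent_turn: bool = False  # long-running mission beyond the current turn
    bounded_subtask: bool = False       # self-contained piece of the current job


def execution_mode(task: Task) -> str:
    """Map a task description to one of the execution options listed above."""
    if task.recurring:
        return "cron job"
    if task.multi_agent_durable:
        return "Kanban worker"
    if task.outlives_parent_turn:
        return "background terminal process (spawned Hermes/tmux session)"
    if task.bounded_subtask:
        return "delegate_task"
    return "direct tool call"
```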

Phase 11 — Security And Secret Hygiene

Reconfirm raw .env, OAuth credentials, tokens, logs, and SQLite WAL/SHM files are excluded from git backups.
Consider enabling security.redact_secrets if the operational tradeoff is acceptable.
Document the privacy.redact_pii decision for gateway sessions.
Rotate old credentials after migration or accidental exposure risk.
Use least-privilege tokens for GitHub/Gitea, web APIs, and provider keys.
Add a pre-commit or manual scan step before pushing Hermes backup/config changes.
Keep approval mode at manual or smart for Telegram-driven work.
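The manual scan step can be a short content check run on staged files before pushing. These patterns are heuristics for token shapes relevant to this setup: Telegram bot tokens, GitHub personal access tokens, `sk-`-style API keys, and PEM private keys. A dedicated scanner such as gitleaks covers far more patterns and is a reasonable pre-commit hook instead.

```python
import re
from pathlib import Path

# Heuristic token shapes; not exhaustive.
PATTERNS = {
    "telegram_bot_token": re.compile(r"\b\d{8,10}:[A-Za-z0-9_-]{35}"),
    "github_pat": re.compile(r"\b(?:ghp|github_pat)_[A-Za-z0-9_]{36,}"),
    "sk_style_api_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_text(text: str) -> list[str]:
    """Names of patterns that match anywhere in `text`."""
    return [name for name, pat in PATTERNS.items() if pat.search(text)]


def scan_files(paths: list[str]) -> dict[str, list[str]]:
    """Map each readable file that looks like it holds a secret to the matching pattern names."""
    hits = {}
    for p in paths:
        try:
            found = scan_text(Path(p).read_text(errors="ignore"))
        except (IsADirectoryError, FileNotFoundError):
            continue
        if found:
            hits[p] = found
    return hits
```

To get a minimal gate, feed it the output of `git diff --cached --name-only` and refuse the push on any non-empty result.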

Phase 12 — Documentation And Runbooks

Add a Hermes operations index under docs/.
- vijay: created docs/hermes-operations.md.
Link this roadmap from docs/repo-map.md.
- vijay: roadmap was already listed; added docs/hermes-operations.md to repo map.
Create or update runbooks for:
- installing/upgrading Hermes
- restarting the gateway
- restoring persistent data from backup
- configuring providers/models
- enabling/disabling tools
- adding safe cron watchdogs
- private-only dashboard access
Keep commands copy-pasteable and include expected outputs.
- vijay: copied operational commands into docs/hermes-operations.md; expected-output notes included where useful.
Store secrets only as placeholder variable names or .env.example entries.
- vijay: no raw secrets were added to docs or scripts.
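Generating `.env.example` from the real `.env` keeps the placeholder names in sync without copying values. Here is a minimal sketch. Comments and blank lines are kept, `NAME=value` lines become `NAME=`, and an `export` prefix is dropped. Any other line is discarded in case a malformed line holds a secret.

```python
def env_example(env_text: str) -> str:
    """Turn .env content into .env.example content with every value removed."""
    out = []
    for line in env_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            out.append(line)  # keep blank lines and comments for readability
            continue
        if "=" not in stripped:
            continue  # malformed line: drop it rather than risk leaking a value
        name = stripped.split("=", 1)[0].strip()
        if name.startswith("export "):
            name = name[len("export "):].strip()
        out.append(f"{name}=")
    return "\n".join(out) + "\n"
```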

Priority Execution Plan

Immediate — Today / Next Session

Confirm no public Hermes dashboard route exists.
Investigate hermes doctor timeout.
Verify backup cron freshness and remote push status.
Add one Telegram watchdog for gateway/backup failure.
Choose and configure one web search backend.

Near-Term — This Week

Add fallback model/provider.
Document provider routing and model defaults.
Add gateway recovery runbook.
Add restore drill runbook and perform one test-profile restore.
Add Gitea/GitHub least-privilege automation credential path.

Medium-Term — This Month

Evaluate private-only dashboard/mission-control UX.
Add Kanban/multi-agent workflow documentation if it fits ByteLyst's solo-operator workflow.
Add silent-on-success system watchdogs.
Clean up stale memory/skills and pin critical skills.
Schedule quarterly restore drills.

Acceptance Criteria

This roadmap is complete when:

Hermes can be upgraded and rolled back/restored with a documented process.
Gateway failures and backup failures notify Telegram.
At least one fallback model/provider is configured and tested.
Web/search tooling works for current research tasks.
No Hermes dashboard/API is publicly exposed.
Backup restore has been tested into a non-production profile.
Core ByteLyst Hermes procedures exist as docs or skills.
Sensitive files remain untracked and backup-safe.

Execution Log

2026-05-27 — vijay setup execution pass

vijay: synced bytelyst-devops-tools from GitHub and added the Gitea remote locally for branch push tracking.
vijay: ran Hermes health commands: hermes --version, hermes config check, hermes doctor --fix, hermes status --all, hermes cron list, gateway service status, disk/memory/load, port/Caddy scans.
vijay: hermes doctor --fix completed and migrated config v23 → v24.
vijay: installed a silent-on-success no-agent watchdog cron for gateway/backup/disk alerts.
vijay: created docs/hermes-operations.md, updated docs/operations.md, and added this roadmap progress commentary.
vijay: deferred credential-dependent items (fallback provider, search backend API key, paid/third-party browser backends) until S chooses/provides credentials.
vijay: deferred the actual Hermes version upgrade to a dedicated checkpoint because the install is 8 commits behind and should be upgraded only after a fresh backup/smoke-test window.

Notes For Future Transcript Pass

When the transcript is available, specifically check whether the video recommends any of the following and update this roadmap accordingly:

exact provider/model choices
recommended Hermes install path
gateway platform setup details
dashboard or web UI exposure guidance
memory/skill workflows
MCP server recommendations
cron/background agent patterns
voice/STT/TTS setup
any security warnings or anti-patterns
