bytelyst-devops-tools

Author	SHA1	Message	Date
Hermes VM	ecd1f20d59	feat(dashboard): Phase 2 — instance dimension across Mission Control Closes Phase 2. Every entity in `web/src/lib/hermes` now carries an `instanceId: 'vijay' | 'bheem'` (with `'all'` allowed for cross-cutting agents like Hermes Core / GitHub link), and a global instance switcher above every Mission Control pane filters them. Library changes (`web/src/lib/hermes.ts`): - New `HermesInstanceId` / `HermesInstanceFilter` types + `HERMES_INSTANCES` metadata array. - `instanceId` added to `HermesProduct`, `HermesTask`, `HermesEvent`, `HermesRun`, `HermesAgentStatus`. Seed data deterministically split ~50/50 across instances; agents tagged per-scope (Local VM runner → bheem, CLI runner / Scheduler → vijay, Hermes Core / GitHub / OpenClaw / deployment / notifications → all). - `getHermesTasks({instance})`, `getHermesProducts(view, instance)`, `getHermesAgents(instance)`, `getHermesHistory(instance)`, `getHermesOverview(instance)` all accept the filter; helper `instanceMatches(scope, filter)` keeps the semantics consistent (always-match for `'all'` on either side). UI changes: - New `HermesInstanceProvider` (React context, localStorage-backed under `hermes.instanceFilter.v1`, SSR-safe default to avoid hydration mismatch) mounted in `app/hermes/layout.tsx`. - New `HermesInstanceSwitcher` segmented control (radiogroup with aria-checked) rendered in the layout header above every pane. - New `HermesInstanceBadge` shown on task rows (Active Missions + Task Ledger), product cards (overview minicards + portfolio cards), and agent cards. - `/hermes` overview gains a "Per-instance roll-up" section that always shows Vijay vs Bheem side-by-side regardless of the active filter — that's the always-cross-instance comparison view, while the eight metric cards above it are filtered by the switcher. Tests: - 2 new unit tests in `lib/hermes.test.ts` (instance tagging on seed data + filter semantics across tasks/products/agents/overview). - 1 new E2E test asserting the switcher's radiogroup, default selection, and persistence-friendly state change. - All green: 13/13 web unit tests, 7/7 E2E. `web/test-results/` and `web/playwright-report/` added to `.gitignore` since they're regenerated per run. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-30 07:43:55 +00:00
Hermes VM	13e5e1c551	ci(dashboard): Phase 5 P2 — wire Playwright E2E into Gitea CI Closes the Phase 5 P2 checkbox (second half — first half: pino logging in `1e64d75`). Phase 5 is now fully green. Two changes: 1. `web/e2e/hermes.spec.ts` now intercepts `/api/hermes/ops` with a fixture snapshot. The backend's hermes-ops endpoint shells out to `systemctl` / `git` / `ps` / `du` on the live VM and is therefore neither available nor deterministic in CI. Mocking it lets the suite run against the web stack alone (no backend, no live VM). Fixture shape mirrors the Zod schema in `backend/src/modules/hermes-ops/types.ts`. 2. `.gitea/workflows/ci.yml` re-enables the previously-commented-out E2E step. Adds a preceding `playwright install --with-deps chromium` step so the runner pulls the browser fresh per run. The web suite starts its own Next dev server via Playwright's `webServer` config (`pnpm exec next dev -p 3200`), so we do NOT start the backend in CI — every backend route used by the suite is mocked via `page.route` (auth, csrf, services, deployments, health/cache, seed, hermes-ops). Verified locally: `pnpm exec playwright test` → 6 passed in 19.5s (2 hermes specs + 4 dashboard/login specs across desktop + mobile). Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-30 07:28:50 +00:00
Hermes VM	c6ec1a06ea	docs(dashboard): Phase 5 P1 — document privilege surface; gate /code-quality/check Closes the final Phase 5 P1 checkbox and REVIEW_ACTIONS #6. The backend container has root-equivalent host access via the docker socket, host log mounts, and the VM scripts mount, but until now the "who can do what to the host?" answer was scattered across compose files and route handlers. This commit centralizes it. DEPLOYMENT.md gains a "Privilege Surface" section that lists: - every host mount + container path + mode + purpose - every shell-outing route, the actual commands it runs, and the auth gate on each - what an admin token can do today (≈ host shell) - five known sharp edges (un-allow-listed container names, unvalidated projectPath, no per-route audit-log on shell-outs, container runs as root, global rate-limit only) - a P1 → P3 mitigation roadmap (allow-list wrapper around shell-outs, projectPath validation, audit-logging shell-outs, drop root in container, replace docker.sock with a verb-restricted proxy) Concurrent code fix: `POST /code-quality/check` was reachable unauthenticated despite shelling out to `npm run typecheck/lint/build/test:run` in a caller-supplied `projectPath`. Added `preHandler: requireAdmin` to bring it in line with every other shell-outing route in the dashboard. Same commit because the documentation table promises this gate exists. REVIEW_ACTIONS #6 marked RESOLVED with the rationale; roadmap checkbox ticked. Tests, typecheck, lint (0 errors), build, and coverage gate (≥95% lines on every gated file) all stay green. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-30 07:05:51 +00:00
Hermes VM	824f31586a	docs(dashboard): Phase 5 P1 — fix port/endpoint drift, dedupe deployment docs Closes the Phase 5 P1 doc-drift checkbox and REVIEW_ACTIONS #5. The 3000-vs-3049 confusion came from prose claims in three docs that each picked a different "right" answer. The truth is: the web container listens on :3000; docker-compose maps `127.0.0.1:3049:3000`; production is fronted by Traefik on `https://devops.bytelyst.com`. Encoding that explicitly so future readers don't have to dig through compose files: - DEPLOYMENT.md becomes canonical. Its content is now the (more accurate) old DEPLOYMENT_GUIDE.md merged with a "Ports — quick reference" table covering Local dev / Docker Compose / Production Traefik, plus a Local-development section for `pnpm dev`. - DEPLOYMENT_GUIDE.md → 5-line redirect stub pointing at DEPLOYMENT.md (kept for `deploy.sh` and any external links). - deploy.sh updated to point at DEPLOYMENT.md. - README.md "Web port: 3000" line rewritten to spell out container vs Compose-host vs dev-mode and link to the port table. - ENDPOINTS.md gets a top-of-file note: every `localhost:3000` URL in that file is the `pnpm dev` workflow; substitute `:3049` for the Dockerized stack. - REVIEW_ACTIONS.md #5 marked RESOLVED with the rationale. No code, behavior, lint, or test changes. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-30 07:03:05 +00:00
Hermes VM	3fc471e880	chore(dashboard): Phase 5 P1 — remove dead SSE log-stream claim Closes the long-standing SSE TODO. The previous attempt with `fastify-sse-v2 ^4` was incompatible with Fastify 5 and was never wired in; the README/DEPLOYMENT.md kept advertising "real-time log streaming" that didn't exist. The web client never used EventSource — `web/src/lib/api.ts` already polls `/deployments/:id/logs` via the normal `apiRequest` helper. Resolution: remove the claim, not ship the feature. - drop `fastify-sse-v2` dep from `backend/package.json` + lockfile - delete the commented-out plugin import + register in `server.ts`, replace with a NOTE explaining the JSON-polling decision and how to add a stream later (`reply.raw`) - remove the `TODO: Re-enable SSE` comment in `deployments/routes.ts`; the endpoint already returns JSON, document that explicitly - rewrite the README "Deployment Log Streaming" section as "Deployment Logs" (JSON-polled, no SSE); fix the endpoint table - flip the DEPLOYMENT.md bullet from "Real-time log streaming (SSE)" to "Deployment log retrieval (JSON polling — no SSE)" - mark REVIEW_ACTIONS #4 RESOLVED with the reasoning - tick the roadmap checkbox If a real-time stream is wanted later, ship it explicitly via `reply.raw` and update README/DEPLOYMENT.md/the route comment in the same change. Don't reintroduce a half-disabled plugin. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-30 07:00:07 +00:00
Hermes VM	18180aab78	test(dashboard): Phase 5 P1 — auth/csrf/health/orchestrator tests + coverage gate Closes the Phase 5 P1 testing checkbox. Adds 35 new unit tests across the modules called out in the roadmap and wires a v8 coverage gate into CI. Coverage of newly-tested files (lines / branches): lib/auth.ts 94.4% / 100% lib/csrf.ts 95.1% / 90% modules/health/repository.ts 100% / 92% modules/deployments/orchestrator.ts 95.2% / 74% modules/services/repository.ts 100% / 100% modules/hermes-ops/repository.ts 95.2% / 68% Threshold (lines/funcs/stmts ≥85%, branches ≥65%) is scoped to those six files via `coverage.include` so untested legacy modules (vm, system, audit, route handlers) report but don't gate. Add files there as they gain real tests — ratchet up, never relax. Test approach mirrors the existing services/hermes-ops suites: hoisted mocks for I/O (fetch, child_process, fs/promises, cosmos-init), real JOSE-signed JWTs for the auth path, fake timers for cache TTL and CSRF expiry assertions. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-30 06:56:16 +00:00
Hermes VM	cf5428acd1	feat(dashboard): Phase 1 — harden hermes-ops backend + tests - Short-TTL (30s) snapshot cache + in-flight coalescing so the panel poll and concurrent refreshes don't fan out ~20 systemctl/git/ps/du subprocesses each time; snapshot carries a `cached` flag and `getHermesOpsSnapshot({force})`. - Distinguish "unit inactive" (down) from "probe couldn't run" (unknown): a new exec() wrapper reports whether the command actually ran (ENOENT/timeout = unknown) vs exited non-zero with output (e.g. systemctl is-active -> inactive). Per-field ProbeStatus on gateway/dashboard/timer/repo; warnings differentiate "is not active" from "status could not be determined". - Robust Bheem/Uma checks: `runuser -u uma -- systemctl --user is-active/is-enabled` with a ps / existsSync fallback so a failed probe degrades to the legacy check instead of a false "down". - Zod schema (HermesOpsSnapshotSchema) as the stable typed contract; the route validates output before sending. New status fields are additive (active/enabled/url/etc. preserved) so the existing web client is unaffected. - Unit tests (mock execFile/fs): healthy snapshot, down vs unknown mapping, runuser->ps fallback, unreadable repo, cache hit + force bypass, request coalescing. Backend: 16 tests green. Roadmap: check off Phase 1 items and Phase 5 P0 in hermes_dashboard_v2_roadmap.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 06:50:32 +00:00
Hermes VM	a8dd166108	docs: add Hermes dashboard v2 roadmap + CI/E2E delegation brief Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 05:26:49 +00:00
saravanakumardb1	8f725f8587	docs(repo-map): register agent-queue tool directory	2026-05-28 21:35:59 -07:00
saravanakumardb1	a049e9c602	docs(roadmap): record post-roadmap follow-ups complete (v15) - docker-lint CI propagated to all 9 remaining consumer repos - all 10 remaining repos mirrored to Gitea; 9/9 docker-lint jobs green - Gitea Actions runner hardened (capacity 1->2, env_file token) + documented - repair corrupted §10 execution-log region from prior rebase	2026-05-28 18:07:36 -07:00
Hermes VM	0e1905aa33	docs: document local LLM utility workflows	2026-05-28 00:21:06 +00:00
Hermes VM	44fd6a462a	fix: bind DevOps dashboard ports to loopback	2026-05-27 21:55:46 +00:00
Hermes VM	f936c2231c	docs: record product port hardening	2026-05-27 21:53:08 +00:00
Hermes VM	b15c570587	docs: record common-platform port hardening	2026-05-27 21:32:31 +00:00
Hermes VM	d60c81ebda	docs: record internal port loopback hardening	2026-05-27 21:25:38 +00:00
Hermes VM	2fc23d6baa	feat(vm): fix devops-backend VM module — Phase 0.1 complete - Switch backend runner from node:20-alpine to node:20-slim so GNU df flags (--output=pcent/avail) work inside the container - Add volume mounts to docker-compose.yml: scripts (ro), VM logs (rw), docker.sock; set VM_SCRIPTS_PATH + VM_LOG_DIR env vars - Rebuild repository.ts: env-configurable paths, cron history parser, unhealthy-container inspector, Ollama model endpoints - Add routes: GET /api/vm/cron-status, unhealthy containers, Ollama models, container restart, model unload - vm-cleanup.sh: add step_cosmos_pglog, step_docker_aged_images; fix (( count++ )) → count=$(( count + 1 )) for set -e compatibility - Add docs/VM_OBSERVABILITY_ROADMAP.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 21:13:45 +00:00
Hermes VM	5a2d92f519	docs: record VM container health fix	2026-05-27 21:12:45 +00:00
Saravana Kumar	e2db92f3b1	Add Hermes snapshot diff view	2026-05-27 21:05:57 +00:00
Saravana Kumar	8f522e3505	Add Hermes dashboard improvement backlog	2026-05-27 21:02:23 +00:00
Hermes VM	9210a8890f	feat: detect stale VM automation	2026-05-27 21:00:43 +00:00
Hermes VM	3d5f369f3d	docs: record Gitea runner recovery	2026-05-27 20:58:16 +00:00
Hermes VM	1f2eea8268	docs: record VM backup and cron fixes	2026-05-27 20:56:11 +00:00
Saravana Kumar	90f6db2014	Complete Hermes ops dashboard and roadmap	2026-05-27 20:53:58 +00:00
Hermes VM	e3d1dddf51	docs: add VM exposure inventory	2026-05-27 20:51:27 +00:00
Saravana Kumar	98a7915a38	Reconcile Hermes roadmap and dashboard status	2026-05-27 20:46:16 +00:00
Saravana Kumar	ac79591903	Mark web search tooling complete	2026-05-27 20:46:16 +00:00
Hermes VM	313a775fa0	docs: strengthen VM security roadmap gates	2026-05-27 20:34:37 +00:00
Hermes VM	2c125adb05	docs: add VM security blind spots roadmap	2026-05-27 20:21:52 +00:00
Saravana Kumar	c89018ae47	Tighten Telegram fallback wording	2026-05-27 20:18:46 +00:00
Saravana Kumar	8145484136	Verify Telegram fallback platform context	2026-05-27 20:16:30 +00:00
Saravana Kumar	8da66497cc	Tighten Hermes local fallback chain	2026-05-27 19:58:09 +00:00
Saravana Kumar	3e26f0da31	Close Hermes browser and web backend items	2026-05-27 19:23:55 +00:00
root	d1f234fc01	Mark Firecrawl as locally configured	2026-05-27 18:57:50 +00:00
Hermes VM	70d96d7684	feat: add gitea backup timer assets	2026-05-27 18:53:20 +00:00
Hermes VM	147db72330	docs: add hostinger maintenance operations entry	2026-05-27 18:53:20 +00:00
Hermes VM	0a2d303f93	add HostingerVM health-check and cleanup scripts - vm-health-check.sh: read-only checks for disk, load, RAM, swap, Docker containers (crash-loops + healthchecks), build cache, journal. Flags: --quiet, --json, --notify (Telegram). Exit 0/1/2 = OK/WARN/CRIT. - vm-cleanup.sh: safe periodic cleanup. Default (weekly): build cache, journal, apt, npm, .next/cache. --full (monthly): adds docker system prune, pnpm store, old logs, HOLD cleanup. --dry-run, --install-cron, --uninstall-cron. Logs to /var/log/vm-cleanup.log. Related: docs/hostinger-vm-maintenance.md, scripts/VMs/HostingerVM/CRON_SETUP.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 18:53:20 +00:00
root	4249b17afc	Document Firecrawl backend selection	2026-05-27 18:52:39 +00:00
root	08f32a79e8	Clarify remaining Hermes fallback verification	2026-05-27 18:46:32 +00:00
root	8fbb535d90	Add shared local Hermes fallback chain	2026-05-27 18:43:30 +00:00
saravanakumardb1	babe2e6c13	docs(roadmap): v14 — ALL 20 ITEMS COMPLETE (C5 closed end-to-end) C5 fully closed by: 1. Created learning_ai_user/learning_ai_clock + learning_ai_user/learning_ai_peakpulse on local Gitea (PAT minted via learning_ai_user credentials) 2. Pushed main branch → act_runner (Homebrew service) picked it up 3. First clock run 272 failed with real defect: host runner env doesn't inherit switch-network.sh exports. Fix landed in both pilots' ci.yml docker-lint job: explicit env: block + read token from ~/.gitea_npm_token at step time. 4. Verified green: - clock run 273 job 675 docker-lint → success - peakpulse runs 274 + 275 docker-lint → success Roadmap final state: 20/20 items DONE.	2026-05-27 05:20:48 -07:00
root	3cc9a1456e	Add Google Drive single file uploader	2026-05-27 12:19:45 +00:00
root	79ca56ffce	Add Google Drive emergency bundle upload	2026-05-27 12:08:41 +00:00
saravanakumardb1	484c82c4b1	docs(roadmap): repair v13 §10 corruption + finalize C5 partial-validation note A prior rebase merged the v13/v13.1 edits into §10 with mangled text (steps 11–20 out of order; step 10 garbled). Rebuilt the section cleanly from v12 base + appended the new v13/v13.1 steps: 11. Phase E1/E2/E5 12. Phase B 13. Phase B4 + E3/E4/E6 14. Phase C (8/9; C5 partial) 15. Phase D.1 16. Phase D.2 17. B7-4 AGENTS.md warnings 18. Phase D extension (MindLyst, LysnrAI, talk2obsidian) 19. Phase D.3 advisory cleanup 20. C5 partial validation (this session) Restored the lost "ported back to clock" trailing line for step 9. No content changes beyond what was already documented in v13/v13.1.	2026-05-27 04:34:53 -07:00
saravanakumardb1	2d13ae4c54	docs(roadmap): v13.1 — C5 partial validation (Gitea hosting gap documented) Findings from dummy check-in attempt: - Pilot workflow YAML parses cleanly (6 jobs on clock incl. docker-lint) - Local simulation of docker-lint job (gitea-doctor + docker-doctor) exits 0 on both pilots - Pilot repos are NOT hosted on Gitea (`git push gitea` returns 404). Only `learning_ai_uxui_web` exists at localhost:3300 - Until pilot repos are mirrored to Gitea, the .gitea/workflows/ci.yml file ships but the runner never fires - C5 marked as partial; gap recorded explicitly in §Phase C and §10	2026-05-27 04:32:33 -07:00
root	bb15a225cd	Add encrypted Hermes emergency bundle scripts	2026-05-27 11:31:58 +00:00
saravanakumardb1	e96b555f07	docs(roadmap): v13 — 12/12 consumer repos PASS docker-doctor (Phase D extension + D.3) Final-state summary: - All 12 consumer repos now PASS docker-doctor with zero errors - MindLyst + LysnrAI + talk2obsidian onboarded (was previously out of scope) - docker-doctor learned Python Dockerfile detection - 10 repos received advisory-warning cleanup commits (compose build.args + healthcheck.start_period) - C5 (CI green confirmation) is the only remaining follow-up The roadmap is now in a fully landed state for in-scope repos.	2026-05-27 04:27:15 -07:00
root	19fdba752c	Add Hermes disaster recovery runbook	2026-05-27 11:23:07 +00:00
saravanakumardb1	ccd6ee4f7f	docs(roadmap): v12 — all phases (A, B, C, D, E) complete for 9 consumer repos - B7-4 AGENTS.md warnings landed in all 9 repos - C9 web smoke test (Playwright) landed on clock to guard F11 regression - D.2 per-repo Dockerfile/compose fixes applied to all 7 consumer repos via idempotent fixer; docker-doctor PASS on every consumer repo - 3 non-consumer repos (MindLyst KMP, LysnrAI multi-target, talk2obsidian) remain out of scope; documented as follow-up - C5 confirmation pending next Gitea CI run Final status: 18 of 18 in-scope items complete.	2026-05-27 04:17:52 -07:00
root	547a9d00fa	Clarify root GitHub credential ownership	2026-05-27 11:10:48 +00:00
saravanakumardb1	6a4e289edc	docs(roadmap): v11 — Phases B4/E3/E4/E6 + C (7/9 gates) + D.1 (artifacts rolled out) - B4: pre-commit guard + husky wiring landed - E3/E4/E6: CI job + pre-commit warn-only + make doctor target - C1–C4, C6–C8: verified on pilots; C5 pending CI, C9 deferred - D.1: artifacts deployed to 7/9 consumer repos with per-repo findings table - D.2: per-repo Dockerfile fixes captured as a fix matrix (follow-up work) - All commit refs documented in §10 execution order	2026-05-27 04:07:27 -07:00
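
Commit ecd1f20d59 above describes an `instanceMatches(scope, filter)` helper that always matches when either side is `'all'`. The real code in `web/src/lib/hermes.ts` is not shown in this log, so the following is only a minimal sketch of that rule. `HermesInstanceId`, `HermesInstanceFilter` and `instanceMatches` are names taken from the commit message; `HermesInstanceScope`, `filterByInstance` and the sample tasks are hypothetical and exist only for illustration.

```typescript
// Names from the commit message; the definitions are assumed.
type HermesInstanceId = 'vijay' | 'bheem';
type HermesInstanceFilter = HermesInstanceId | 'all';

// Hypothetical: the tag an entity carries ('all' marks cross-cutting agents).
type HermesInstanceScope = HermesInstanceId | 'all';

// "Always-match for 'all' on either side", otherwise require equality.
function instanceMatches(
  scope: HermesInstanceScope,
  filter: HermesInstanceFilter,
): boolean {
  return scope === 'all' || filter === 'all' || scope === filter;
}

// Hypothetical generic helper applying the rule to any tagged entity list.
function filterByInstance<T extends { instanceId: HermesInstanceScope }>(
  items: T[],
  filter: HermesInstanceFilter,
): T[] {
  return items.filter((item) => instanceMatches(item.instanceId, filter));
}

// Illustrative data only; not the project's seed data.
const sampleTasks: { id: string; instanceId: HermesInstanceScope }[] = [
  { id: 't1', instanceId: 'vijay' },
  { id: 't2', instanceId: 'bheem' },
  { id: 't3', instanceId: 'all' },
];

const vijayView = filterByInstance(sampleTasks, 'vijay').map((t) => t.id);
const allView = filterByInstance(sampleTasks, 'all').map((t) => t.id);
```

Under this sketch, the `'vijay'` view keeps the vijay-tagged task plus the cross-cutting one, and the `'all'` filter keeps every task.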

81 Commits