bytelyst-devops-tools

Author	SHA1	Message	Date
Hermes VM	7047d625ef	feat(dashboard/vm): Phases 1.1, 1.3, 3.1, 3.4 — VM page panels Phase 3.1 — VM Score Card (0–100): - 6 weighted dimensions: steal time, RAM, disk, service health, maintenance hygiene, LLM readiness (matching roadmap scoring) - Color-coded gauge + per-dimension progress bars with detail text - Computed from health + cron + unhealthy data; degrades gracefully when any source is unavailable Phase 1.3 — Unhealthy Container Detail Panel: - Loads independently from GET /api/vm/containers/unhealthy - Per-container: name, unhealthy since, restart count, last health logs - Expandable row for health check output - One-click restart with spinner, feedback toast, auto-refresh after 3s Phase 1.1 — Cron Status Panel: - Loads from GET /api/vm/cron-status - Table: 4 managed jobs × schedule | last run | freed MB | status | next - Collapsible run history (last 10) with step-by-step log expansion Phase 3.4 — Ollama/LLM Panel: - Loads from GET /api/vm/ollama/models - Currently-loaded section with RAM pressure warning (<4 GB free) - RAM bar visualisation showing model footprint - Model list with size + last-used time - One-click unload button Other improvements: - All data fetched in parallel (Promise.allSettled) — any panel failure does not block the rest of the page - Add steal, failed_units, cron_missing_paths to CHECK_META/CHECK_ORDER - Refresh now updates all 5 data sources atomically - web/package-lock.json regenerated (was stale, caused build failure) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 21:49:23 +00:00
Hermes VM	b15c570587	docs: record common-platform port hardening	2026-05-27 21:32:31 +00:00
Hermes VM	d9618ba7b0	feat(vm): Phases 1.2, 1.4, 2.1 — steal time, swap pressure, health watchdog Phase 1.2 — CPU steal time metric in vm-health-check.sh: - Samples /proc/stat twice 1s apart for accurate current steal % - Thresholds: >5% WARN, >15% CRIT (currently 0.8% on this host) - Inserts before memory check so steal is visible alongside load Phase 1.4 — Swap pressure indicator: - Reads SwapCached from /proc/meminfo as secondary metric - Raises SWAP_USED_WARN_GB 1→1.5 to reduce noise (current usage 0.6G) - New WARN path: SwapCached > 200MB signals recent pressure even when current swap usage looks ok (catches post-spike state) Phase 2.1 — Docker health-check watchdog: - docker-health-watchdog.sh: checks unhealthy containers every 10 min, restarts only after 3 consecutive failing health checks (30min grace) - docker-health-watchdog.service + .timer: enabled, fires every 10 min - Sends Telegram notification on each auto-restart - Rollback: systemctl disable docker-health-watchdog.timer Phase 2.2 already complete: sync_hermes_persistent_backup.py handles diverge gracefully with rebase/reset-hard fallback; running successfully. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 21:31:09 +00:00
Hermes VM	d60c81ebda	docs: record internal port loopback hardening	2026-05-27 21:25:38 +00:00
Hermes VM	2fc23d6baa	feat(vm): fix devops-backend VM module — Phase 0.1 complete - Switch backend runner from node:20-alpine to node:20-slim so GNU df flags (--output=pcent/avail) work inside the container - Add volume mounts to docker-compose.yml: scripts (ro), VM logs (rw), docker.sock; set VM_SCRIPTS_PATH + VM_LOG_DIR env vars - Rebuild repository.ts: env-configurable paths, cron history parser, unhealthy-container inspector, Ollama model endpoints - Add routes: GET /api/vm/cron-status, unhealthy containers, Ollama models, container restart, model unload - vm-cleanup.sh: add step_cosmos_pglog, step_docker_aged_images; fix (( count++ )) → count=$(( count + 1 )) for set -e compatibility - Add docs/VM_OBSERVABILITY_ROADMAP.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 21:13:45 +00:00
Hermes VM	5a2d92f519	docs: record VM container health fix	2026-05-27 21:12:45 +00:00
Saravana Kumar	e2db92f3b1	Add Hermes snapshot diff view	2026-05-27 21:05:57 +00:00
Saravana Kumar	8f522e3505	Add Hermes dashboard improvement backlog	2026-05-27 21:02:23 +00:00
Hermes VM	9210a8890f	feat: detect stale VM automation	2026-05-27 21:00:43 +00:00
Hermes VM	3d5f369f3d	docs: record Gitea runner recovery	2026-05-27 20:58:16 +00:00
Hermes VM	1f2eea8268	docs: record VM backup and cron fixes	2026-05-27 20:56:11 +00:00
Saravana Kumar	90f6db2014	Complete Hermes ops dashboard and roadmap	2026-05-27 20:53:58 +00:00
Hermes VM	e3d1dddf51	docs: add VM exposure inventory	2026-05-27 20:51:27 +00:00
Saravana Kumar	98a7915a38	Reconcile Hermes roadmap and dashboard status	2026-05-27 20:46:16 +00:00
Saravana Kumar	ac79591903	Mark web search tooling complete	2026-05-27 20:46:16 +00:00
Hermes VM	313a775fa0	docs: strengthen VM security roadmap gates	2026-05-27 20:34:37 +00:00
Hermes VM	2c125adb05	docs: add VM security blind spots roadmap	2026-05-27 20:21:52 +00:00
Saravana Kumar	c89018ae47	Tighten Telegram fallback wording	2026-05-27 20:18:46 +00:00
Saravana Kumar	8145484136	Verify Telegram fallback platform context	2026-05-27 20:16:30 +00:00
Saravana Kumar	8da66497cc	Tighten Hermes local fallback chain	2026-05-27 19:58:09 +00:00
Saravana Kumar	3e26f0da31	Close Hermes browser and web backend items	2026-05-27 19:23:55 +00:00
root	d1f234fc01	Mark Firecrawl as locally configured	2026-05-27 18:57:50 +00:00
Hermes VM	70d96d7684	feat: add gitea backup timer assets	2026-05-27 18:53:20 +00:00
Hermes VM	147db72330	docs: add hostinger maintenance operations entry	2026-05-27 18:53:20 +00:00
Hermes VM	31b414d62b	fix: systematic bug fixes — code-quality parser, env key, config warnings, auth cleanup, deployment safety - code-quality/repository.ts: fix tsErrorMatch[3] → [4] for type field (group 3 is column, 4 is error|warning) - code-quality/repository.ts: fix ESLint regex to make rule brackets optional (not all formatters include them) - code-quality/repository.ts: fix Vitest test count — parse 'Tests' line (individual tests) instead of 'Test Files' (file count); improve Jest regex to capture pass/fail independently - env/repository.ts: replace raw process.env.ENCRYPTION_KEY with config.ENCRYPTION_KEY so the validated default flows through a single source of truth - config.ts: add startup console.warn when CSRF_SECRET or ENCRYPTION_KEY are using insecure defaults - deployments/orchestrator.ts: refactor runDeploymentScript to use try/catch/finally — deployment record is now always written in the finally block, preventing zombie 'running' states if updateDeployment itself throws - auth.tsx: remove dead 'user &&' guard (user is always truthy after the !user check above); remove debug console.log calls, keep console.error Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 18:53:20 +00:00
Hermes VM	cdc23696b2	fix: resolve all TypeScript errors — green tsc Primitives.tsx (TS2339): - asChild branch read children.props.className before the cast applied, making props typed as unknown. Extract typedChild first, then read props. hermes/page.tsx + agents/page.tsx + tasks/page.tsx + tasks/[id]/page.tsx (TS2322): - Badge.variant accepts 'neutral'|'success'|'warning'|'error'|'info' but callers were passing 'danger' (should be 'error') and 'default' (should be 'neutral'). MetricCard.tone is a separate type and is correct as-is. Changes: - statusTone map in hermes/page.tsx: 'danger' → 'error', 'default' → 'neutral' - getTaskTone fallback: 'default' → 'neutral'; explicit return type added - levelTone in tasks/[id]/page.tsx: 'danger' → 'error'; explicit return type added - Inline Badge variants: all remaining 'danger' → 'error' across 3 files Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 18:53:20 +00:00
Hermes VM	1099d518ef	improve: dashboard security, code quality, and UX fixes Security (backend): - env/routes: add requireAdmin to all 6 env endpoints — GET /env was fully open, exposing all secret values to unauthenticated requests - deployments/routes: add requireAdmin to all 4 GET endpoints (deployment history and logs were publicly readable) - health/routes: remove duplicate requireAdmin call from DELETE /health/cache handler body (was already enforced via preHandler) Frontend — auth/api: - system/page: replace raw fetch + localStorage token with apiRequest (mutations now go through CSRF flow) - vm/page: same — replace raw fetch with vmApi from api.ts - api.ts: add vmApi (getHealth, getCleanupLog, runCleanup) + shared VmHealthResult / VmCheck / VmCheckLevel types Shared utilities: - utils.ts: add formatBytes() and getStatusColor() shared helpers - system/page: remove duplicate formatBytes, import from utils - health/page: remove duplicate getStatusColor, import from utils - page.tsx (home): remove duplicate getStatusColor, import from utils UX improvements: - page.tsx: remove Seed Services button from normal header (debug tool) - page.tsx: deploy button now always enabled; shows inline warning banner when service is not 'up' instead of silently disabling the button - metrics: fix bar chart — bars now grow from bottom (flex-col-reverse), add empty state, fix date parsing timezone edge case - sidebar-nav: theme toggle now functional — persists to localStorage and toggles document.documentElement class 'dark' Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 18:53:20 +00:00
Hermes VM	d0b8ce2c74	feat: add VM Health page to devops dashboard Backend (Fastify): - New module: modules/vm/ (types, repository, routes) - GET /api/vm/health — runs vm-health-check.sh --json, returns structured result - GET /api/vm/cleanup-log — tails /var/log/vm-cleanup.log - POST /api/vm/cleanup — triggers vm-cleanup.sh (weekly / monthly / dry-run) - Registered vmRoutes in server.ts Frontend (Next.js): - New page: /vm — VM Health - Overall status banner (OK/WARN/CRIT) with issue summary - Per-check cards: disk, load, RAM, swap, crash loops, container health, build cache, docker images, journal, syslog — color-coded by level - Cleanup trigger buttons (dry-run, weekly, monthly) with output viewer - Collapsible cleanup log viewer (last 40 lines) - Auto-refresh every 60s - sidebar-nav.tsx: added 'VM Health' entry with Server icon Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 18:53:20 +00:00
Hermes VM	678430d77d	fix: cron health-check entry should call vm-health-check.sh --notify The 07:00 daily cron was incorrectly pointing to vm-cleanup.sh instead of vm-health-check.sh. Health check is read-only; cleanup is not. Also add --notify so Telegram alerts fire when WARNING/CRITICAL. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 18:53:20 +00:00
Hermes VM	0a2d303f93	add HostingerVM health-check and cleanup scripts - vm-health-check.sh: read-only checks for disk, load, RAM, swap, Docker containers (crash-loops + healthchecks), build cache, journal. Flags: --quiet, --json, --notify (Telegram). Exit 0/1/2 = OK/WARN/CRIT. - vm-cleanup.sh: safe periodic cleanup. Default (weekly): build cache, journal, apt, npm, .next/cache. --full (monthly): adds docker system prune, pnpm store, old logs, HOLD cleanup. --dry-run, --install-cron, --uninstall-cron. Logs to /var/log/vm-cleanup.log. Related: docs/hostinger-vm-maintenance.md, scripts/VMs/HostingerVM/CRON_SETUP.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 18:53:20 +00:00
root	4249b17afc	Document Firecrawl backend selection	2026-05-27 18:52:39 +00:00
root	08f32a79e8	Clarify remaining Hermes fallback verification	2026-05-27 18:46:32 +00:00
root	8fbb535d90	Add shared local Hermes fallback chain	2026-05-27 18:43:30 +00:00
root	9ee060e839	Harden Hermes operations dashboard status	2026-05-27 17:45:41 +00:00
root	0e6528b366	Add live Hermes operations dashboard	2026-05-27 13:04:36 +00:00
saravanakumardb1	babe2e6c13	docs(roadmap): v14 — ALL 20 ITEMS COMPLETE (C5 closed end-to-end) C5 fully closed by: 1. Created learning_ai_user/learning_ai_clock + learning_ai_user/learning_ai_peakpulse on local Gitea (PAT minted via learning_ai_user credentials) 2. Pushed main branch → act_runner (Homebrew service) picked it up 3. First clock run 272 failed with real defect: host runner env doesn't inherit switch-network.sh exports. Fix landed in both pilots' ci.yml docker-lint job: explicit env: block + read token from ~/.gitea_npm_token at step time. 4. Verified green: - clock run 273 job 675 docker-lint → success - peakpulse runs 274 + 275 docker-lint → success Roadmap final state: 20/20 items DONE.	2026-05-27 05:20:48 -07:00
root	3cc9a1456e	Add Google Drive single file uploader	2026-05-27 12:19:45 +00:00
root	79ca56ffce	Add Google Drive emergency bundle upload	2026-05-27 12:08:41 +00:00
saravanakumardb1	484c82c4b1	docs(roadmap): repair v13 §10 corruption + finalize C5 partial-validation note A prior rebase merged the v13/v13.1 edits into §10 with mangled text (steps 11–20 out of order; step 10 garbled). Rebuilt the section cleanly from v12 base + appended the new v13/v13.1 steps: 11. Phase E1/E2/E5 12. Phase B 13. Phase B4 + E3/E4/E6 14. Phase C (8/9; C5 partial) 15. Phase D.1 16. Phase D.2 17. B7-4 AGENTS.md warnings 18. Phase D extension (MindLyst, LysnrAI, talk2obsidian) 19. Phase D.3 advisory cleanup 20. C5 partial validation (this session) Restored the lost "ported back to clock" trailing line for step 9. No content changes beyond what was already documented in v13/v13.1.	2026-05-27 04:34:53 -07:00
saravanakumardb1	2d13ae4c54	docs(roadmap): v13.1 — C5 partial validation (Gitea hosting gap documented) Findings from dummy check-in attempt: - Pilot workflow YAML parses cleanly (6 jobs on clock incl. docker-lint) - Local simulation of docker-lint job (gitea-doctor + docker-doctor) exits 0 on both pilots - Pilot repos are NOT hosted on Gitea (`git push gitea` returns 404). Only `learning_ai_uxui_web` exists at localhost:3300 - Until pilot repos are mirrored to Gitea, the .gitea/workflows/ci.yml file ships but the runner never fires - C5 marked as partial; gap recorded explicitly in §Phase C and §10	2026-05-27 04:32:33 -07:00
root	bb15a225cd	Add encrypted Hermes emergency bundle scripts	2026-05-27 11:31:58 +00:00
saravanakumardb1	e96b555f07	docs(roadmap): v13 — 12/12 consumer repos PASS docker-doctor (Phase D extension + D.3) Final-state summary: - All 12 consumer repos now PASS docker-doctor with zero errors - MindLyst + LysnrAI + talk2obsidian onboarded (was previously out of scope) - docker-doctor learned Python Dockerfile detection - 10 repos received advisory-warning cleanup commits (compose build.args + healthcheck.start_period) - C5 (CI green confirmation) is the only remaining follow-up The roadmap is now in a fully landed state for in-scope repos.	2026-05-27 04:27:15 -07:00
root	19fdba752c	Add Hermes disaster recovery runbook	2026-05-27 11:23:07 +00:00
saravanakumardb1	ccd6ee4f7f	docs(roadmap): v12 — all phases (A, B, C, D, E) complete for 9 consumer repos - B7-4 AGENTS.md warnings landed in all 9 repos - C9 web smoke test (Playwright) landed on clock to guard F11 regression - D.2 per-repo Dockerfile/compose fixes applied to all 7 consumer repos via idempotent fixer; docker-doctor PASS on every consumer repo - 3 non-consumer repos (MindLyst KMP, LysnrAI multi-target, talk2obsidian) remain out of scope; documented as follow-up - C5 confirmation pending next Gitea CI run Final status: 18 of 18 in-scope items complete.	2026-05-27 04:17:52 -07:00
root	547a9d00fa	Clarify root GitHub credential ownership	2026-05-27 11:10:48 +00:00
saravanakumardb1	6a4e289edc	docs(roadmap): v11 — Phases B4/E3/E4/E6 + C (7/9 gates) + D.1 (artifacts rolled out) - B4: pre-commit guard + husky wiring landed - E3/E4/E6: CI job + pre-commit warn-only + make doctor target - C1–C4, C6–C8: verified on pilots; C5 pending CI, C9 deferred - D.1: artifacts deployed to 7/9 consumer repos with per-repo findings table - D.2: per-repo Dockerfile fixes captured as a fix matrix (follow-up work) - All commit refs documented in §10 execution order	2026-05-27 04:07:27 -07:00
root	416f25794c	Document Hermes Gitea token flow	2026-05-27 11:06:15 +00:00
saravanakumardb1	11c185e772	docs(roadmap): v10 — Phase B complete (canonical docker-prep + sync tooling) - All B-tasks complete except B4 (husky hook) and B7-4 (AGENTS.md updates) - Canonical home landed at common-plat@a418a23e - Both pilots synced; end-to-end verified on clock + peakpulse - 3 bonus capabilities documented (--check, portable sed, .gitkeep preservation) - §10 execution step 12 marked done with commit refs	2026-05-27 03:49:52 -07:00
root	8de72351de	Complete Hermes dashboard and watchdog roadmap audit	2026-05-27 10:45:29 +00:00
saravanakumardb1	15ac960faf	docs(roadmap): v9 — Phase E1/E2/E5 done, docker-doctor.sh landed - Marked E1, E2, E5 complete in Phase E checklist - Added step 11 to §10 execution order with commit refs - Renumbered remaining steps; deferred E3/E4/E6 to after Phase B	2026-05-27 03:33:35 -07:00
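
Commit d9618ba7b0 describes the steal-time check as sampling /proc/stat twice, 1s apart, with thresholds of >5% WARN and >15% CRIT. The calculation behind that description can be sketched roughly as below. This is a Python illustration only, not the code in vm-health-check.sh (which is a shell script); the function names `parse_cpu_line`, `steal_percent`, `classify`, and `sample_steal` are hypothetical, and the sketch assumes the standard Linux field order on the aggregate `cpu` line (user, nice, system, idle, iowait, irq, softirq, steal).

```python
import time


def parse_cpu_line(stat_text):
    """Return the first 8 counters of the aggregate 'cpu' line as ints."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            values = [int(v) for v in line.split()[1:]]
            # Pad with zeros in case a kernel reports fewer than 8 fields.
            return (values + [0] * 8)[:8]
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before, after):
    """Steal time as a percentage of all CPU time between two samples."""
    a = parse_cpu_line(before)
    b = parse_cpu_line(after)
    deltas = [y - x for x, y in zip(a, b)]
    total = sum(deltas)  # guest time is already counted inside user/nice
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total  # index 7 is the steal counter


def classify(pct, warn=5.0, crit=15.0):
    """Map a steal percentage to a level; thresholds are strict (>)."""
    if pct > crit:
        return "CRIT"
    if pct > warn:
        return "WARN"
    return "OK"


def sample_steal(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return steal_percent(before, after)
```

Using the difference between two samples, rather than the raw counters, gives the current steal rate instead of an average since boot, which is presumably why the commit mentions sampling twice.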


164 Commits