Commit Graph

163 Commits

Author SHA1 Message Date
Hermes VM
b15c570587 docs: record common-platform port hardening
Some checks failed
pre-commit / pre-commit (push) Failing after 37s
2026-05-27 21:32:31 +00:00
Hermes VM
d9618ba7b0 feat(vm): Phases 1.2, 1.4, 2.1 — steal time, swap pressure, health watchdog
Phase 1.2 — CPU steal time metric in vm-health-check.sh:
- Samples /proc/stat twice 1s apart for accurate current steal %
- Thresholds: >5% WARN, >15% CRIT (currently 0.8% on this host)
- Inserts before memory check so steal is visible alongside load

Phase 1.4 — Swap pressure indicator:
- Reads SwapCached from /proc/meminfo as secondary metric
- Raises SWAP_USED_WARN_GB 1→1.5 to reduce noise (current usage 0.6G)
- New WARN path: SwapCached > 200MB signals recent pressure even when
  current swap usage looks ok (catches post-spike state)

Phase 2.1 — Docker health-check watchdog:
- docker-health-watchdog.sh: checks unhealthy containers every 10 min,
  restarts only after 3 consecutive failing health checks (30min grace)
- docker-health-watchdog.service + .timer: enabled, fires every 10 min
- Sends Telegram notification on each auto-restart
- Rollback: systemctl disable docker-health-watchdog.timer

Phase 2.2 already complete: sync_hermes_persistent_backup.py handles
diverge gracefully with rebase/reset-hard fallback; running successfully.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 21:31:09 +00:00
Hermes VM
d60c81ebda docs: record internal port loopback hardening
Some checks failed
pre-commit / pre-commit (push) Failing after 38s
2026-05-27 21:25:38 +00:00
Hermes VM
2fc23d6baa feat(vm): fix devops-backend VM module — Phase 0.1 complete
- Switch backend runner from node:20-alpine to node:20-slim so GNU df
  flags (--output=pcent/avail) work inside the container
- Add volume mounts to docker-compose.yml: scripts (ro), VM logs (rw),
  docker.sock; set VM_SCRIPTS_PATH + VM_LOG_DIR env vars
- Rebuild repository.ts: env-configurable paths, cron history parser,
  unhealthy-container inspector, Ollama model endpoints
- Add routes: GET /api/vm/cron-status, unhealthy containers, Ollama
  models, container restart, model unload
- vm-cleanup.sh: add step_cosmos_pglog, step_docker_aged_images; fix
  (( count++ )) → count=$(( count + 1 )) for set -e compatibility
- Add docs/VM_OBSERVABILITY_ROADMAP.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 21:13:45 +00:00
Hermes VM
5a2d92f519 docs: record VM container health fix
Some checks failed
pre-commit / pre-commit (push) Failing after 33s
2026-05-27 21:12:45 +00:00
e2db92f3b1 Add Hermes snapshot diff view 2026-05-27 21:05:57 +00:00
8f522e3505 Add Hermes dashboard improvement backlog 2026-05-27 21:02:23 +00:00
Hermes VM
9210a8890f feat: detect stale VM automation
Some checks failed
pre-commit / pre-commit (push) Failing after 32s
2026-05-27 21:00:43 +00:00
Hermes VM
3d5f369f3d docs: record Gitea runner recovery
Some checks failed
pre-commit / pre-commit (push) Failing after 40s
2026-05-27 20:58:16 +00:00
Hermes VM
1f2eea8268 docs: record VM backup and cron fixes
Some checks failed
pre-commit / pre-commit (push) Has been cancelled
2026-05-27 20:56:11 +00:00
90f6db2014 Complete Hermes ops dashboard and roadmap 2026-05-27 20:53:58 +00:00
Hermes VM
e3d1dddf51 docs: add VM exposure inventory
Some checks are pending
pre-commit / pre-commit (push) Waiting to run
2026-05-27 20:51:27 +00:00
98a7915a38 Reconcile Hermes roadmap and dashboard status 2026-05-27 20:46:16 +00:00
ac79591903 Mark web search tooling complete 2026-05-27 20:46:16 +00:00
Hermes VM
313a775fa0 docs: strengthen VM security roadmap gates
Some checks are pending
pre-commit / pre-commit (push) Waiting to run
2026-05-27 20:34:37 +00:00
Hermes VM
2c125adb05 docs: add VM security blind spots roadmap
Some checks are pending
pre-commit / pre-commit (push) Waiting to run
2026-05-27 20:21:52 +00:00
c89018ae47 Tighten Telegram fallback wording 2026-05-27 20:18:46 +00:00
8145484136 Verify Telegram fallback platform context 2026-05-27 20:16:30 +00:00
8da66497cc Tighten Hermes local fallback chain 2026-05-27 19:58:09 +00:00
3e26f0da31 Close Hermes browser and web backend items 2026-05-27 19:23:55 +00:00
root
d1f234fc01 Mark Firecrawl as locally configured 2026-05-27 18:57:50 +00:00
Hermes VM
70d96d7684 feat: add gitea backup timer assets 2026-05-27 18:53:20 +00:00
Hermes VM
147db72330 docs: add hostinger maintenance operations entry 2026-05-27 18:53:20 +00:00
Hermes VM
31b414d62b fix: systematic bug fixes — code-quality parser, env key, config warnings, auth cleanup, deployment safety
- code-quality/repository.ts: fix tsErrorMatch[3] → [4] for type field (group 3 is column, 4 is error|warning)
- code-quality/repository.ts: fix ESLint regex to make rule brackets optional (not all formatters include them)
- code-quality/repository.ts: fix Vitest test count — parse 'Tests' line (individual tests) instead of 'Test Files' (file count); improve Jest regex to capture pass/fail independently
- env/repository.ts: replace raw process.env.ENCRYPTION_KEY with config.ENCRYPTION_KEY so the validated default flows through a single source of truth
- config.ts: add startup console.warn when CSRF_SECRET or ENCRYPTION_KEY are using insecure defaults
- deployments/orchestrator.ts: refactor runDeploymentScript to use try/catch/finally — deployment record is now always written in the finally block, preventing zombie 'running' states if updateDeployment itself throws
- auth.tsx: remove dead 'user &&' guard (user is always truthy after the !user check above); remove debug console.log calls, keep console.error

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 18:53:20 +00:00
Hermes VM
cdc23696b2 fix: resolve all TypeScript errors — green tsc
Primitives.tsx (TS2339):
- asChild branch read children.props.className before the cast applied,
  making props typed as unknown. Extract typedChild first, then read props.

hermes/page.tsx + agents/page.tsx + tasks/page.tsx + tasks/[id]/page.tsx (TS2322):
- Badge.variant accepts 'neutral'|'success'|'warning'|'error'|'info' but
  callers were passing 'danger' (should be 'error') and 'default' (should
  be 'neutral'). MetricCard.tone is a separate type and is correct as-is.

Changes:
- statusTone map in hermes/page.tsx: 'danger' → 'error', 'default' → 'neutral'
- getTaskTone fallback: 'default' → 'neutral'; explicit return type added
- levelTone in tasks/[id]/page.tsx: 'danger' → 'error'; explicit return type added
- Inline Badge variants: all remaining 'danger' → 'error' across 3 files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 18:53:20 +00:00
Hermes VM
1099d518ef improve: dashboard security, code quality, and UX fixes
Security (backend):
- env/routes: add requireAdmin to all 6 env endpoints — GET /env was
  fully open, exposing all secret values to unauthenticated requests
- deployments/routes: add requireAdmin to all 4 GET endpoints (deployment
  history and logs were publicly readable)
- health/routes: remove duplicate requireAdmin call from DELETE /health/cache
  handler body (was already enforced via preHandler)

Frontend — auth/api:
- system/page: replace raw fetch + localStorage token with apiRequest
  (mutations now go through CSRF flow)
- vm/page: same — replace raw fetch with vmApi from api.ts
- api.ts: add vmApi (getHealth, getCleanupLog, runCleanup) + shared
  VmHealthResult / VmCheck / VmCheckLevel types

Shared utilities:
- utils.ts: add formatBytes() and getStatusColor() shared helpers
- system/page: remove duplicate formatBytes, import from utils
- health/page: remove duplicate getStatusColor, import from utils
- page.tsx (home): remove duplicate getStatusColor, import from utils

UX improvements:
- page.tsx: remove Seed Services button from normal header (debug tool)
- page.tsx: deploy button now always enabled; shows inline warning banner
  when service is not 'up' instead of silently disabling the button
- metrics: fix bar chart — bars now grow from bottom (flex-col-reverse),
  add empty state, fix date parsing timezone edge case
- sidebar-nav: theme toggle now functional — persists to localStorage and
  toggles document.documentElement class 'dark'

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 18:53:20 +00:00
Hermes VM
d0b8ce2c74 feat: add VM Health page to devops dashboard
Backend (Fastify):
- New module: modules/vm/ (types, repository, routes)
- GET  /api/vm/health      — runs vm-health-check.sh --json, returns structured result
- GET  /api/vm/cleanup-log — tails /var/log/vm-cleanup.log
- POST /api/vm/cleanup     — triggers vm-cleanup.sh (weekly / monthly / dry-run)
- Registered vmRoutes in server.ts

Frontend (Next.js):
- New page: /vm — VM Health
  - Overall status banner (OK/WARN/CRIT) with issue summary
  - Per-check cards: disk, load, RAM, swap, crash loops, container health,
    build cache, docker images, journal, syslog — color-coded by level
  - Cleanup trigger buttons (dry-run, weekly, monthly) with output viewer
  - Collapsible cleanup log viewer (last 40 lines)
  - Auto-refresh every 60s
- sidebar-nav.tsx: added 'VM Health' entry with Server icon

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 18:53:20 +00:00
Hermes VM
678430d77d fix: cron health-check entry should call vm-health-check.sh --notify
The 07:00 daily cron was incorrectly pointing to vm-cleanup.sh instead
of vm-health-check.sh. Health check is read-only; cleanup is not.
Also add --notify so Telegram alerts fire when WARNING/CRITICAL.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 18:53:20 +00:00
Hermes VM
0a2d303f93 add HostingerVM health-check and cleanup scripts
- vm-health-check.sh: read-only checks for disk, load, RAM, swap,
  Docker containers (crash-loops + healthchecks), build cache, journal.
  Flags: --quiet, --json, --notify (Telegram). Exit 0/1/2 = OK/WARN/CRIT.

- vm-cleanup.sh: safe periodic cleanup.
  Default (weekly): build cache, journal, apt, npm, .next/cache.
  --full (monthly): adds docker system prune, pnpm store, old logs, HOLD cleanup.
  --dry-run, --install-cron, --uninstall-cron.
  Logs to /var/log/vm-cleanup.log.

Related: docs/hostinger-vm-maintenance.md, scripts/VMs/HostingerVM/CRON_SETUP.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 18:53:20 +00:00
root
4249b17afc Document Firecrawl backend selection 2026-05-27 18:52:39 +00:00
root
08f32a79e8 Clarify remaining Hermes fallback verification 2026-05-27 18:46:32 +00:00
root
8fbb535d90 Add shared local Hermes fallback chain 2026-05-27 18:43:30 +00:00
root
9ee060e839 Harden Hermes operations dashboard status 2026-05-27 17:45:41 +00:00
root
0e6528b366 Add live Hermes operations dashboard 2026-05-27 13:04:36 +00:00
saravanakumardb1
babe2e6c13 docs(roadmap): v14 \xe2\x80\x94 ALL 20 ITEMS COMPLETE (C5 closed end-to-end)
C5 fully closed by:
1. Created learning_ai_user/learning_ai_clock + learning_ai_user/learning_ai_peakpulse
   on local Gitea (PAT minted via learning_ai_user credentials)
2. Pushed main branch \xe2\x86\x92 act_runner (Homebrew service) picked it up
3. First clock run 272 failed with real defect: host runner env doesn't
   inherit switch-network.sh exports. Fix landed in both pilots' ci.yml
   docker-lint job: explicit env: block + read token from
   ~/.gitea_npm_token at step time.
4. Verified green:
   - clock run 273 job 675 docker-lint \xe2\x86\x92 success
   - peakpulse runs 274 + 275 docker-lint \xe2\x86\x92 success

Roadmap final state: 20/20 items DONE.
2026-05-27 05:20:48 -07:00
root
3cc9a1456e Add Google Drive single file uploader 2026-05-27 12:19:45 +00:00
root
79ca56ffce Add Google Drive emergency bundle upload 2026-05-27 12:08:41 +00:00
saravanakumardb1
484c82c4b1 docs(roadmap): repair v13 \xc2\xa710 corruption + finalize C5 partial-validation note
A prior rebase merged the v13/v13.1 edits into \xc2\xa710 with mangled text
(steps 11\xe2\x80\x9320 out of order; step 10 garbled). Rebuilt the section
cleanly from v12 base + appended the new v13/v13.1 steps:

  11. Phase E1/E2/E5
  12. Phase B
  13. Phase B4 + E3/E4/E6
  14. Phase C (8/9; C5 partial)
  15. Phase D.1
  16. Phase D.2
  17. B7-4 AGENTS.md warnings
  18. Phase D extension (MindLyst, LysnrAI, talk2obsidian)
  19. Phase D.3 advisory cleanup
  20. C5 partial validation (this session)

Restored the lost "ported back to clock" trailing line for step 9.
No content changes beyond what was already documented in v13/v13.1.
2026-05-27 04:34:53 -07:00
saravanakumardb1
2d13ae4c54 docs(roadmap): v13.1 \xe2\x80\x94 C5 partial validation (Gitea hosting gap documented)
Findings from dummy check-in attempt:
- Pilot workflow YAML parses cleanly (6 jobs on clock incl. docker-lint)
- Local simulation of docker-lint job (gitea-doctor + docker-doctor)
  exits 0 on both pilots
- Pilot repos are NOT hosted on Gitea (`git push gitea` returns 404).
  Only `learning_ai_uxui_web` exists at localhost:3300
- Until pilot repos are mirrored to Gitea, the .gitea/workflows/ci.yml
  file ships but the runner never fires
- C5 marked as partial; gap recorded explicitly in \xc2\xa7Phase C and \xc2\xa710
2026-05-27 04:32:33 -07:00
root
bb15a225cd Add encrypted Hermes emergency bundle scripts 2026-05-27 11:31:58 +00:00
saravanakumardb1
e96b555f07 docs(roadmap): v13 \xe2\x80\x94 12/12 consumer repos PASS docker-doctor (Phase D extension + D.3)
Final-state summary:
- All 12 consumer repos now PASS docker-doctor with zero errors
- MindLyst + LysnrAI + talk2obsidian onboarded (was previously out of scope)
- docker-doctor learned Python Dockerfile detection
- 10 repos received advisory-warning cleanup commits (compose build.args
  + healthcheck.start_period)
- C5 (CI green confirmation) is the only remaining follow-up

The roadmap is now in a fully landed state for in-scope repos.
2026-05-27 04:27:15 -07:00
root
19fdba752c Add Hermes disaster recovery runbook 2026-05-27 11:23:07 +00:00
saravanakumardb1
ccd6ee4f7f docs(roadmap): v12 \xe2\x80\x94 all phases (A, B, C, D, E) complete for 9 consumer repos
- B7-4 AGENTS.md warnings landed in all 9 repos
- C9 web smoke test (Playwright) landed on clock to guard F11 regression
- D.2 per-repo Dockerfile/compose fixes applied to all 7 consumer repos
  via idempotent fixer; docker-doctor PASS on every consumer repo
- 3 non-consumer repos (MindLyst KMP, LysnrAI multi-target, talk2obsidian)
  remain out of scope; documented as follow-up
- C5 confirmation pending next Gitea CI run

Final status: 18 of 18 in-scope items complete.
2026-05-27 04:17:52 -07:00
root
547a9d00fa Clarify root GitHub credential ownership 2026-05-27 11:10:48 +00:00
saravanakumardb1
6a4e289edc docs(roadmap): v11 \xe2\x80\x94 Phases B4/E3/E4/E6 + C (7/9 gates) + D.1 (artifacts rolled out)
- B4: pre-commit guard + husky wiring landed
- E3/E4/E6: CI job + pre-commit warn-only + make doctor target
- C1\xe2\x80\x93C4, C6\xe2\x80\x93C8: verified on pilots; C5 pending CI, C9 deferred
- D.1: artifacts deployed to 7/9 consumer repos with per-repo findings table
- D.2: per-repo Dockerfile fixes captured as a fix matrix (follow-up work)
- All commit refs documented in \xc2\xa710 execution order
2026-05-27 04:07:27 -07:00
root
416f25794c Document Hermes Gitea token flow 2026-05-27 11:06:15 +00:00
saravanakumardb1
11c185e772 docs(roadmap): v10 — Phase B complete (canonical docker-prep + sync tooling)
- All B-tasks complete except B4 (husky hook) and B7-4 (AGENTS.md updates)
- Canonical home landed at common-plat@a418a23e
- Both pilots synced; end-to-end verified on clock + peakpulse
- 3 bonus capabilities documented (--check, portable sed, .gitkeep preservation)
- \xc2\xa710 execution step 12 marked done with commit refs
2026-05-27 03:49:52 -07:00
root
8de72351de Complete Hermes dashboard and watchdog roadmap audit 2026-05-27 10:45:29 +00:00
saravanakumardb1
15ac960faf docs(roadmap): v9 — Phase E1/E2/E5 done, docker-doctor.sh landed
- Marked E1, E2, E5 complete in Phase E checklist
- Added step 11 to \xc2\xa710 execution order with commit refs
- Renumbered remaining steps; deferred E3/E4/E6 to after Phase B
2026-05-27 03:33:35 -07:00
root
a6e509247f Record Tailscale login for Hermes 2026-05-27 10:31:23 +00:00