bytelyst-devops-tools/dashboard/web/src
Hermes VM 13a105ba23 feat(vm): Phase 5 closure — GPU/freshness checks, chaos validation, I/O alert
vm-health-check.sh:
- check_gpu(): nvidia-smi probe; reports "CPU-only" as OK on this VM (no GPU).
- check_image_freshness(): flag containers running images >30d old.
  Skips third-party images (gitea, grafana, prom, mcr.microsoft, axllent,
  caddy, traefik, valkey, cadvisor) — they have their own rebuild cadence.
  Currently flags 19 stale product images (~60d old).
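
A minimal sketch of the freshness rule described above, not the actual contents of vm-health-check.sh. The function names (is_third_party, image_age_days, freshness_status) and the substring matching are assumptions made for illustration; only the 30-day threshold and the skip list come from the commit message.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the >30d freshness rule.

# Returns 0 if the image name matches a skipped third-party pattern.
# Substring matching is an assumption; the real script may match differently.
is_third_party() {
  case "$1" in
    *gitea*|*grafana*|*prom*|*mcr.microsoft*|*axllent*|*caddy*|*traefik*|*valkey*|*cadvisor*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Whole days between an image creation time ($1) and "now" ($2), both epoch seconds.
image_age_days() {
  echo $(( ($2 - $1) / 86400 ))
}

# Prints "skip" for third-party images, "stale" when older than max_days, else "ok".
freshness_status() {
  local image="$1" created="$2" now="$3" max_days="${4:-30}"
  if is_third_party "$image"; then
    echo "skip"
    return
  fi
  if [ "$(image_age_days "$created" "$now")" -gt "$max_days" ]; then
    echo "stale"
  else
    echo "ok"
  fi
}
```

In a real check, the creation time would likely come from something like `docker inspect --format '{{.Created}}' <image>` converted to epoch seconds with GNU `date -d`; that wiring is left out here so the rule itself stays easy to test.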

chaos-validation.sh:
- Monthly chaos test: kill PID 1 in chronomind-web, then wait up to 35 min
  for docker-health-watchdog to detect and restart it. Reports pass/fail
  via Telegram.
- Refuses to run if target not healthy. systemd timer fires 1st of month
  at 10:00 UTC (after 08:00 weekly digest).
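
The wait-for-recovery step can be sketched as a bounded polling loop. This is an illustration under assumptions, not the code in chaos-validation.sh: wait_for_recovery and its arguments are hypothetical names, and the probe is passed in as a function so the loop can be exercised without Docker.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a probe until it succeeds or the deadline passes.
#   $1 = name of a command/function that returns 0 when the target is healthy
#   $2 = timeout in seconds (35 min would be 2100)
#   $3 = seconds to sleep between probes
wait_for_recovery() {
  local probe="$1" timeout_s="$2" interval_s="$3"
  local deadline=$(( $(date +%s) + timeout_s ))
  while :; do
    if "$probe"; then
      return 0            # target recovered: the test passes
    fi
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1            # watchdog did not restart it in time: the test fails
    fi
    sleep "$interval_s"
  done
}
```

A probe for the real target might wrap `docker inspect --format '{{.State.Health.Status}}' chronomind-web` and compare the output to `healthy`. The same probe could serve as the pre-flight guard that refuses to run when the target is not healthy to begin with.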

vm-io-anomaly-check.sh:
- 6h avg sda write rate; transition alerts at WARN (1 GB/hr) /
  CRIT (2.5 GB/hr). De-dupes via /var/log/vm-io-anomaly-state so the
  alert fires once per transition, not every 6h. Current baseline:
  ~1.94 GB/hr (orphan-container state-file writes; see Phase 0.3).
- Reports recovery to OK when rate drops back.
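
A minimal sketch of the transition-only alerting described above. The names classify_rate and record_transition are hypothetical, and treating the thresholds as inclusive (>=) is an assumption; the commit message only gives the WARN and CRIT values and the state-file path.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating transition alerts with a state file.

# Maps a write rate in GB/hr to OK, WARN (>= 1.0) or CRIT (>= 2.5).
# awk handles the floating-point comparison that plain bash cannot do.
classify_rate() {
  awk -v r="$1" 'BEGIN {
    if (r >= 2.5)      print "CRIT";
    else if (r >= 1.0) print "WARN";
    else               print "OK";
  }'
}

# Compares the new level with the one stored in the state file.
# On a change: stores the new level, prints "OLD -> NEW", returns 0.
# On no change: prints nothing and returns 1, so no alert is sent.
# A missing state file is treated as a previous level of OK.
record_transition() {
  local state_file="$1" new_level="$2" old_level="OK"
  if [ -f "$state_file" ]; then
    old_level="$(cat "$state_file")"
  fi
  [ "$new_level" = "$old_level" ] && return 1
  printf '%s\n' "$new_level" > "$state_file"
  printf '%s -> %s\n' "$old_level" "$new_level"
}
```

With the state file at /var/log/vm-io-anomaly-state, a run every 6h would call `record_transition` with the output of `classify_rate` and send an alert only when it returns 0. That covers both escalation and the recovery back to OK.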

vm/page.tsx: gpu + image_freshness added to CHECK_META so they render
with proper icon/label and slot into CHECK_ORDER.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 05:26:49 +00:00
app feat(vm): Phase 5 closure — GPU/freshness checks, chaos validation, I/O alert 2026-05-30 05:26:49 +00:00
components Add Hermes snapshot diff view 2026-05-27 21:05:57 +00:00
lib feat(dashboard/vm): Phase 3.3 — All Containers panel with CPU/RAM, logs, bulk restart 2026-05-30 05:26:49 +00:00
styles fix(devops-web): add design tokens with Docker-compatible approach 2026-05-11 02:34:53 +00:00
test feat(devops-web): fix responsive layout and add comprehensive dashboard pages 2026-05-11 03:10:31 +00:00