bytelyst-devops-tools/scripts/VMs/HostingerVM
Hermes VM 13a105ba23 feat(vm): Phase 5 closure — GPU/freshness checks, chaos validation, I/O alert
vm-health-check.sh:
- check_gpu(): nvidia-smi probe; "CPU-only" OK on this VM (no GPU)
- check_image_freshness(): flag containers running images >30d old.
  Skips third-party images (gitea, grafana, prom, mcr.microsoft, axllent,
  caddy, traefik, valkey, cadvisor) — they have their own rebuild cadence.
  Currently flags 19 stale product images (~60d old).
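
The freshness rule above could be sketched roughly as follows. This is an illustration, not the contents of vm-health-check.sh: the 30-day cutoff and the skip list come from the commit message, while is_third_party, image_age_days, the substring matching, and the output format are assumptions. It relies on GNU date for timestamp parsing.

```shell
# Sketch only. The 30-day cutoff and skip list come from the commit message;
# helper names other than check_image_freshness are hypothetical.

MAX_AGE_DAYS=30

# Returns 0 when the image name contains one of the skipped third-party names.
# Assumption: plain substring matching, so "prom" also matches "prometheus".
is_third_party() {
  case "$1" in
    *gitea*|*grafana*|*prom*|*mcr.microsoft*|*axllent*|*caddy*|*traefik*|*valkey*|*cadvisor*)
      return 0 ;;
  esac
  return 1
}

# Prints the whole-day age of an image.
# $1 = creation time in ISO 8601 (as docker prints it), $2 = now in epoch seconds.
image_age_days() {
  local created_epoch
  # Drop fractional seconds; -u makes the remainder parse as UTC (GNU date).
  created_epoch=$(date -u -d "${1%%.*}" +%s) || return 1
  echo $(( ($2 - created_epoch) / 86400 ))
}

# Prints one "STALE <image> (<age>d)" line per running first-party image
# whose image is older than MAX_AGE_DAYS.
check_image_freshness() {
  local now image created age
  now=$(date +%s)
  docker ps --format '{{.Image}}' | sort -u | while read -r image; do
    is_third_party "$image" && continue
    created=$(docker image inspect --format '{{.Created}}' "$image") || continue
    age=$(image_age_days "$created" "$now") || continue
    if [ "$age" -gt "$MAX_AGE_DAYS" ]; then
      echo "STALE ${image} (${age}d)"
    fi
  done
}
```

Piping the result through `grep -c '^STALE'` would give a count comparable to the figure quoted above. Substring matching is deliberately loose here; a product image whose name happens to contain "prom" would also be skipped, so a real skip list may want anchored patterns.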

chaos-validation.sh:
- Monthly chaos test: kill PID 1 in chronomind-web, then wait up to 35 min
  for docker-health-watchdog to detect the failure and restart the
  container. Reports pass/fail via Telegram.
- Refuses to run if the target is not healthy. The systemd timer fires on
  the 1st of each month at 10:00 UTC (after the 08:00 weekly digest).
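
A minimal sketch of the flow described above, under stated assumptions: the target name, the 35-minute ceiling, and the refusal when the target is unhealthy come from the commit message. The function names, the 30-second poll interval, and the use of `docker kill --signal KILL` to take down the container's main process from the host are guesses, not the script's confirmed approach. Telegram reporting is left out.

```shell
# Sketch only. Not the contents of chaos-validation.sh.

TARGET="${TARGET:-chronomind-web}"
TIMEOUT_S="${TIMEOUT_S:-2100}"   # 35 min, from the commit message
POLL_S="${POLL_S:-30}"           # hypothetical poll interval

# Prints "<running> <health>", e.g. "true healthy"; "missing" if the container
# does not exist or defines no healthcheck.
container_health() {
  docker inspect --format '{{.State.Running}} {{.State.Health.Status}}' "$1" 2>/dev/null \
    || echo "missing"
}

# Polls until the container is running and healthy, or the timeout elapses.
# $1 = container, $2 = timeout seconds, $3 = poll interval seconds.
# On success prints the seconds waited and returns 0; returns 1 on timeout.
wait_for_healthy() {
  local waited=0
  while [ "$waited" -lt "$2" ]; do
    if [ "$(container_health "$1")" = "true healthy" ]; then
      echo "$waited"
      return 0
    fi
    sleep "$3"
    waited=$(( waited + $3 ))
  done
  return 1
}

run_chaos_test() {
  local recovered_after
  if [ "$(container_health "$TARGET")" != "true healthy" ]; then
    echo "SKIP: ${TARGET} is not healthy; refusing to inject a failure"
    return 2
  fi
  # Assumption: signal the container's main process (PID 1) from the host.
  docker kill --signal KILL "$TARGET" >/dev/null || return 1
  if recovered_after=$(wait_for_healthy "$TARGET" "$TIMEOUT_S" "$POLL_S"); then
    echo "PASS: ${TARGET} healthy again after ~${recovered_after}s"
  else
    echo "FAIL: ${TARGET} not healthy within ${TIMEOUT_S}s"
    return 1
  fi
}
```

Checking the running flag alongside the health status avoids treating a stale "healthy" reading on a stopped container as recovery. The PASS/FAIL line is where a notification hook would attach.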

vm-io-anomaly-check.sh:
- Tracks the 6h average sda write rate and alerts on transitions across
  WARN (1 GB/hr) and CRIT (2.5 GB/hr). De-dupes via
  /var/log/vm-io-anomaly-state so the alert fires once per transition,
  not every 6h. Current baseline: ~1.94 GB/hr, already above WARN
  (orphan-container state-file writes; see Phase 0.3).
- Reports recovery to OK when the rate drops back.
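
The threshold and de-duplication logic above can be sketched like this. The thresholds and the state-file path come from the commit message. The function names, the decimal MB/hr unit, the alert wording, and the use of the sectors-written counter in /proc/diskstats are assumptions; the real script may get its 6h average from another source.

```shell
# Sketch only. Not the contents of vm-io-anomaly-check.sh.

STATE_FILE="${STATE_FILE:-/var/log/vm-io-anomaly-state}"
WARN_MB_PER_HR=1000   # 1 GB/hr, assuming decimal units
CRIT_MB_PER_HR=2500   # 2.5 GB/hr

# Prints the cumulative sectors written for a device (field 10 of
# /proc/diskstats, counted in 512-byte units).
sectors_written() {
  awk -v dev="$1" '$3 == dev { print $10 }' /proc/diskstats
}

# Converts two sector counters taken $3 hours apart into whole MB/hr.
rate_mb_per_hr() {
  echo $(( ($2 - $1) * 512 / 1000000 / $3 ))
}

# Maps a write rate in whole MB/hr to OK, WARN, or CRIT.
classify_rate() {
  if [ "$1" -ge "$CRIT_MB_PER_HR" ]; then echo CRIT
  elif [ "$1" -ge "$WARN_MB_PER_HR" ]; then echo WARN
  else echo OK
  fi
}

# Prints an alert line only when the level differs from the stored one,
# then records the new level. Prints nothing when the level is unchanged.
report_transition() {
  local rate="$1" level previous
  level=$(classify_rate "$rate")
  previous=$(cat "$STATE_FILE" 2>/dev/null || echo OK)
  if [ "$level" != "$previous" ]; then
    echo "${previous} -> ${level}: sda writes ${rate} MB/hr (6h avg)"
    echo "$level" > "$STATE_FILE"
  fi
}
```

With these thresholds, the quoted baseline of roughly 1940 MB/hr classifies as WARN. A first run would print one transition line, and later runs at the same level would stay silent until the rate crosses a threshold again.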

vm/page.tsx: gpu + image_freshness added to CHECK_META so they render
with proper icon/label and slot into CHECK_ORDER.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 05:26:49 +00:00
chaos-validation.sh feat(vm): Phase 5 closure — GPU/freshness checks, chaos validation, I/O alert 2026-05-30 05:26:49 +00:00
CRON_SETUP.md add HostingerVM health-check and cleanup scripts 2026-05-27 18:53:20 +00:00
docker-health-watchdog.sh feat(vm): Phases 1.2, 1.4, 2.1 — steal time, swap pressure, health watchdog 2026-05-27 21:31:09 +00:00
login.sh chore: move login sh into scripts folder 2026-05-09 15:58:08 -07:00
orphan-containers.md feat(vm): Phase 2.3 closure — OOM watchdog + orphan-container docs 2026-05-30 05:26:49 +00:00
vm-cleanup.sh feat(vm): fix devops-backend VM module — Phase 0.1 complete 2026-05-27 21:13:45 +00:00
vm-health-check.sh feat(vm): Phase 5 closure — GPU/freshness checks, chaos validation, I/O alert 2026-05-30 05:26:49 +00:00
vm-io-anomaly-check.sh feat(vm): Phase 5 closure — GPU/freshness checks, chaos validation, I/O alert 2026-05-30 05:26:49 +00:00
vm-oom-watchdog.sh feat(vm): Phase 2.3 closure — OOM watchdog + orphan-container docs 2026-05-30 05:26:49 +00:00
vm-weekly-digest.sh feat(dashboard/vm): Phases 4.1-4.3 — Prometheus trends, sparklines, weekly digest 2026-05-30 05:26:49 +00:00