vm-health-check.sh:
- check_gpu(): nvidia-smi probe; "CPU-only" OK on this VM (no GPU)
- check_image_freshness(): flag containers running images >30d old. Skips third-party images (gitea, grafana, prom, mcr.microsoft, axllent, caddy, traefik, valkey, cadvisor), which have their own rebuild cadence. Currently flags 19 stale product images (~60d old).

chaos-validation.sh:
- Monthly chaos test: kill PID 1 in chronomind-web, wait up to 35 min for docker-health-watchdog to detect + restart. Telegram pass/fail.
- Refuses to run if target not healthy. systemd timer fires 1st of month at 10:00 UTC (after 08:00 weekly digest).

vm-io-anomaly-check.sh:
- 6h avg sda write rate; transition alerts at WARN (1 GB/hr) / CRIT (2.5 GB/hr). De-dupes via /var/log/vm-io-anomaly-state so the alert fires once per transition, not every 6h. Current baseline: ~1.94 GB/hr (orphan-container state-file writes; see Phase 0.3).
- Reports recovery to OK when rate drops back.

vm/page.tsx: gpu + image_freshness added to CHECK_META so they render with proper icon/label and slot into CHECK_ORDER.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

| Name |
|---|
| VMs |
| gitea-backup.sh |
| gitea-git |
| gitea-git-askpass |
| google-drive-upload-file.py |
| google-drive-upload-file.sh |
| hermes-emergency-bundle-create.sh |
| hermes-emergency-bundle-decrypt.sh |
| hermes-emergency-bundle-upload-drive.py |
| hermes-emergency-bundle-upload-drive.sh |
| hermes-google-drive-oauth-login.py |
| hermes-health-watchdog.py |
| monitor-lucky25-execution.sh |
| README.md |
| ubuntu-vm-security-update.sh |
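The commit message for vm-io-anomaly-check.sh describes alerts that fire once per level transition instead of on every run. A minimal sketch of that idea follows. The WARN and CRIT thresholds come from the commit message; the function names, the integer MB/hr unit, and the one-word state-file format are assumptions made for illustration and may not match the real script.

```shell
#!/usr/bin/env bash
# Sketch only: function names, units, and state-file format are assumptions.
set -eu

WARN_MB_PER_HR=1000   # 1 GB/hr, from the commit message
CRIT_MB_PER_HR=2500   # 2.5 GB/hr, from the commit message

# Map a write rate (integer MB/hr) to a severity level.
classify_rate() {
  local rate="$1"
  if [ "$rate" -ge "$CRIT_MB_PER_HR" ]; then
    echo "CRIT"
  elif [ "$rate" -ge "$WARN_MB_PER_HR" ]; then
    echo "WARN"
  else
    echo "OK"
  fi
}

# Print an alert line only when the level differs from the one recorded in
# the state file, then record the new level. A missing file counts as OK.
alert_on_transition() {
  local rate="$1" state_file="$2"
  local level previous="OK"
  level="$(classify_rate "$rate")"
  if [ -f "$state_file" ]; then
    previous="$(cat "$state_file")"
  fi
  if [ "$level" != "$previous" ]; then
    echo "io-anomaly: ${previous} -> ${level} (${rate} MB/hr)"
    printf '%s\n' "$level" > "$state_file"
  fi
}
```

Calling `alert_on_transition 1940 /var/log/vm-io-anomaly-state` twice in a row would print a line on the first call only, and a later drop below 1000 MB/hr would print the recovery to OK. Keeping the last level in a file is what lets a timer-driven script stay quiet between transitions.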
Scripts
This directory is the preferred home for self-contained operational scripts.
Current Scripts
ubuntu-vm-security-update.sh
- Supported.
- Purpose: update and harden Ubuntu VMs with unattended upgrades, UFW, and fail2ban.
- Risk level: high, because it modifies packages, firewall rules, and reboot behavior.
VMs/HostingerVM/vm-health-check.sh
- Supported.
- Purpose: read-only VM health and drift check for disk, memory, swap, Docker health, failed systemd units, and stale root crontab script paths.
- Risk level: low, because it is read-only apart from an optional local log write.
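The latest commit message for vm-health-check.sh also mentions a check_image_freshness() check that flags product images older than 30 days and skips third-party images. A read-only check of that kind might look roughly like the sketch below. The 30-day limit and the skip list come from the commit message; the helper names, the substring matching, and the timestamp arguments are assumptions, and the real function may be structured differently.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and matching strategy are assumptions.
set -eu

MAX_AGE_DAYS=30

# Return 0 if the image name matches a third-party pattern from the skip list.
is_third_party() {
  case "$1" in
    *gitea*|*grafana*|*prom*|*mcr.microsoft*|*axllent*|*caddy*|*traefik*|*valkey*|*cadvisor*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Print "STALE <image> <age>d" when a product image is older than MAX_AGE_DAYS.
# created_epoch and now_epoch are Unix timestamps in seconds.
check_image_age() {
  local image="$1" created_epoch="$2" now_epoch="$3"
  local age_days
  if is_third_party "$image"; then
    return 0
  fi
  age_days=$(( (now_epoch - created_epoch) / 86400 ))
  if [ "$age_days" -gt "$MAX_AGE_DAYS" ]; then
    echo "STALE ${image} ${age_days}d"
  fi
}
```

The function only reads its arguments and prints a line, which fits the read-only, low-risk profile described above. In a real check the creation time would have to come from the container runtime; how the actual script obtains it is not shown in the source.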
Conventions
- New standalone operational scripts should go here instead of the repo root.
- Each script should document:
  - prerequisites
  - required environment variables
  - destructive or privileged behavior
  - example usage
- Scripts that change host state should support --help and a non-destructive preview mode when practical.
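The conventions above could be met with a skeleton along these lines. This is a hedged sketch, not an existing script in this directory: the script name, the `--dry-run` flag name, the `TARGET_DIR` variable, and the `mkdir` step are all hypothetical placeholders.

```shell
#!/usr/bin/env bash
#
# example-maintenance.sh (hypothetical name)
# Prerequisites: bash, mkdir
# Environment:   TARGET_DIR (optional, default /tmp/example-maintenance)
# Privileged or destructive behavior: creates TARGET_DIR unless --dry-run is given
# Usage:         example-maintenance.sh [--dry-run] [--help]
set -eu

DRY_RUN=0

usage() {
  echo "Usage: example-maintenance.sh [--dry-run] [--help]"
  echo "  --dry-run  print the commands that would run without executing them"
  echo "  --help     show this message"
}

# Execute a command, or only print it when preview mode is active.
run() {
  if [ "$DRY_RUN" -eq 1 ]; then
    echo "[dry-run] $*"
  else
    "$@"
  fi
}

main() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      --help) usage; return 0 ;;
      --dry-run) DRY_RUN=1 ;;
      *) echo "unknown option: $arg" >&2; usage >&2; return 2 ;;
    esac
  done
  # The only state-changing step goes through run() so preview mode covers it.
  run mkdir -p "${TARGET_DIR:-/tmp/example-maintenance}"
}

main "$@"
```

Routing every state-changing command through a single `run` wrapper keeps the preview mode honest, because a command cannot bypass the check by accident. The header comment covers the four items each script is asked to document.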
Legacy Note
The repo root still contains older shell utilities. Not all of them are deprecated, but new work should go in scripts/ for clearer ownership and discoverability.