Phase 1.2 - CPU steal time metric in vm-health-check.sh:
- Samples /proc/stat twice 1s apart for accurate current steal %
- Thresholds: >5% WARN, >15% CRIT (currently 0.8% on this host)
- Inserts before memory check so steal is visible alongside load

Phase 1.4 - Swap pressure indicator:
- Reads SwapCached from /proc/meminfo as secondary metric
- Raises SWAP_USED_WARN_GB 1→1.5 to reduce noise (current usage 0.6G)
- New WARN path: SwapCached > 200MB signals recent pressure even when current swap usage looks ok (catches post-spike state)

Phase 2.1 - Docker health-check watchdog:
- docker-health-watchdog.sh: checks unhealthy containers every 10 min, restarts only after 3 consecutive failing health checks (30min grace)
- docker-health-watchdog.service + .timer: enabled, fires every 10 min
- Sends Telegram notification on each auto-restart
- Rollback: systemctl disable docker-health-watchdog.timer

Phase 2.2 already complete: sync_hermes_persistent_backup.py handles diverge gracefully with rebase/reset-hard fallback; running successfully.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

| Name |
|---|
| VMs |
| gitea-backup.sh |
| gitea-git |
| gitea-git-askpass |
| google-drive-upload-file.py |
| google-drive-upload-file.sh |
| hermes-emergency-bundle-create.sh |
| hermes-emergency-bundle-decrypt.sh |
| hermes-emergency-bundle-upload-drive.py |
| hermes-emergency-bundle-upload-drive.sh |
| hermes-google-drive-oauth-login.py |
| hermes-health-watchdog.py |
| monitor-lucky25-execution.sh |
| README.md |
| ubuntu-vm-security-update.sh |
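The commit message above says the Docker watchdog restarts a container only after 3 consecutive failing health checks. The sketch below illustrates that counting rule in Python; it is not the contents of docker-health-watchdog.sh. The function name `update_failure_counts` and the in-memory counter dictionary are assumptions made for illustration, and persisting the counters between timer runs, querying Docker, and sending the Telegram notification are left out.

```python
from typing import Dict, Iterable, List, Tuple

# From the commit message: restart only after 3 consecutive failing checks.
FAILURES_BEFORE_RESTART = 3


def update_failure_counts(
    counts: Dict[str, int],
    unhealthy: Iterable[str],
    threshold: int = FAILURES_BEFORE_RESTART,
) -> Tuple[Dict[str, int], List[str]]:
    """Return (new_counts, containers_to_restart) for one watchdog run.

    counts    -- consecutive-failure counters carried over from earlier runs
    unhealthy -- container names reported unhealthy in this run
    (Hypothetical helper; the real script may track state differently.)
    """
    new_counts: Dict[str, int] = {}
    to_restart: List[str] = []
    for name in sorted(set(unhealthy)):
        failures = counts.get(name, 0) + 1
        if failures >= threshold:
            to_restart.append(name)
            failures = 0  # start a fresh grace period after the restart
        new_counts[name] = failures
    # Containers missing from `unhealthy` have recovered, so their
    # counters are dropped and any later failure starts again from 1.
    return new_counts, to_restart
```

With a 10-minute timer, a container has to be reported unhealthy in three runs in a row before it appears in the restart list, which corresponds to the 30-minute grace period mentioned in the commit. One way to obtain the list of unhealthy names is `docker ps --filter health=unhealthy --format '{{.Names}}'`, though the script may collect it differently.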
Scripts
This directory is the preferred home for self-contained operational scripts.
Current Scripts
- ubuntu-vm-security-update.sh
  - Supported.
  - Purpose: update and harden Ubuntu VMs with unattended upgrades, UFW, and fail2ban.
  - Risk level: high, because it modifies packages, firewall rules, and reboot behavior.
- VMs/HostingerVM/vm-health-check.sh
  - Supported.
  - Purpose: read-only VM health and drift check for disk, memory, swap, Docker health, failed systemd units, and stale root crontab script paths.
  - Risk level: low, because it is read-only apart from an optional local log write.
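To make the read-only checks more concrete, here is a minimal Python sketch of the CPU steal and swap pressure checks described in the commit message. It is not the code in vm-health-check.sh. The thresholds come from the commit message; the function names, the status strings, and the binary unit conversions (1 GB treated as 1024 * 1024 kB) are assumptions for illustration.

```python
import time
from typing import Dict, List

# Thresholds quoted in the commit message; unit conversions are assumptions.
STEAL_WARN_PCT = 5.0
STEAL_CRIT_PCT = 15.0
SWAP_USED_WARN_GB = 1.5
SWAP_CACHED_WARN_MB = 200


def parse_cpu_line(line: str) -> List[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into tick counters."""
    fields = line.split()
    if len(fields) < 9 or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line with a steal field")
    return [int(value) for value in fields[1:]]


def steal_percent(first: str, second: str) -> float:
    """Steal share of all CPU ticks elapsed between two 'cpu' samples."""
    a, b = parse_cpu_line(first), parse_cpu_line(second)
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already counted in user, so only the first eight are summed.
    total = sum(b[:8]) - sum(a[:8])
    steal = b[7] - a[7]
    return 100.0 * steal / total if total > 0 else 0.0


def classify_steal(pct: float) -> str:
    if pct > STEAL_CRIT_PCT:
        return "CRIT"
    if pct > STEAL_WARN_PCT:
        return "WARN"
    return "OK"


def sample_steal_percent(interval: float = 1.0) -> float:
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    def read_cpu_line() -> str:
        with open("/proc/stat") as handle:
            return handle.readline()  # the first line is the aggregate 'cpu' line

    first = read_cpu_line()
    time.sleep(interval)
    return steal_percent(first, read_cpu_line())


def parse_meminfo(text: str) -> Dict[str, int]:
    """Map /proc/meminfo keys to their integer values (kB for sized entries)."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            values[key.strip()] = int(parts[0])
    return values


def classify_swap(meminfo_text: str) -> str:
    info = parse_meminfo(meminfo_text)
    used_kb = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    cached_kb = info.get("SwapCached", 0)
    if used_kb > SWAP_USED_WARN_GB * 1024 * 1024:
        return "WARN: swap used"
    if cached_kb > SWAP_CACHED_WARN_MB * 1024:
        # Usage looks fine now, but a large SwapCached hints at a recent spike.
        return "WARN: recent swap pressure"
    return "OK"
```

The two samples are needed because the counters in /proc/stat are cumulative since boot, so only the difference between readings reflects the current steal share. On a Linux host, `classify_steal(sample_steal_percent())` would give the status for the last second.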
Conventions
- New standalone operational scripts should go here instead of the repo root.
- Each script should document:
  - prerequisites
  - required environment variables
  - destructive or privileged behavior
  - example usage
- Scripts that change host state should support --help and a non-destructive preview mode when practical.
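As an illustration of the last convention, the sketch below shows one way a Python script could offer `--help` and a preview mode. The flag name `--dry-run`, the program name, and the placeholder command are assumptions, not names used by any script in this directory; `argparse` adds `-h/--help` automatically.

```python
import argparse
import subprocess
from typing import List, Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="example-host-script",  # hypothetical script name
        description="Hypothetical script that changes host state.",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="print the planned commands without executing them",
    )
    return parser


def plan_actions() -> List[List[str]]:
    """Commands the script would run; a harmless placeholder is used here."""
    return [["echo", "restarting", "example.service"]]


def run(argv: Optional[Sequence[str]] = None) -> List[str]:
    """Execute or preview the planned commands and return a log of what happened."""
    args = build_parser().parse_args(argv)
    log: List[str] = []
    for command in plan_actions():
        rendered = " ".join(command)
        if args.dry_run:
            log.append("DRY-RUN: " + rendered)
        else:
            subprocess.run(command, check=True)
            log.append("RAN: " + rendered)
    return log
```

Calling `run(["--dry-run"])` only records the planned command, so an operator can review it before running the script for real. Building the command list first and deciding later whether to execute it keeps the preview and the real run on the same code path.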
Legacy Note
The repo root still contains older shell utilities. Those are not all deprecated, but new work should prefer scripts/ for clearer ownership and discoverability.