# Hostinger VM — Cron Setup

Automated maintenance schedule for `srv1491630`.
Scripts: `vm-health-check.sh` (read-only) + `vm-cleanup.sh` (safe cleanup).

---

## Quick install

SSH into the VM and run:

```bash
bash /opt/bytelyst/learning_ai_devops_tools/scripts/VMs/HostingerVM/vm-cleanup.sh --install-cron
```

This installs the full recommended schedule. To remove it:

```bash
bash /opt/bytelyst/learning_ai_devops_tools/scripts/VMs/HostingerVM/vm-cleanup.sh --uninstall-cron
```
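To confirm that the schedule landed, one option is to count the crontab lines that reference the two scripts. The sketch below is not part of either script: `count_vm_cron_entries` is a hypothetical helper, and an expected count of 4 assumes `--install-cron` writes one entry per row of the schedule table.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count active (non-comment) crontab lines that
# reference vm-health-check.sh or vm-cleanup.sh.
# Reads crontab text on stdin so it works for any user's crontab.
count_vm_cron_entries() {
  grep -v '^[[:space:]]*#' | grep -cE 'vm-(health-check|cleanup)\.sh' || true
}

# Usage:
#   crontab -l 2>/dev/null | count_vm_cron_entries
```

A result of 0 after `--uninstall-cron` would indicate the entries were removed.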

---

## What gets scheduled

| Schedule | Time (UTC) | Command | What it does |
|---|---|---|---|
| Daily | 07:00 | `vm-health-check.sh` | Read-only check; sends a Telegram alert on WARNING/CRITICAL |
| Daily | 03:00 | `vm-cleanup.sh` | Prune Docker build cache only (always safe) |
| Weekly | Sun 02:00 | `vm-cleanup.sh` | Standard cleanup (see below) |
| Monthly | 1st 01:00 | `vm-cleanup.sh --full` | Full cleanup (see below) |

---

## What each mode does

### Standard weekly cleanup (`vm-cleanup.sh`)

All steps are labelled **SAFE**: each one either changes nothing or removes only regenerable caches and old journal entries.

| Step | What's removed | Risk |
|---|---|---|
| Docker build cache | Layer cache from `docker build` runs | Zero — rebuilds just take longer next time |
| Crash loop check | Detection only, no changes | Zero |
| Journal vacuum | Old journal entries beyond 200MB / 7 days | Zero — logs are already captured in syslog |
| APT cache | `/var/cache/apt/archives/` | Zero — packages can be re-downloaded |
| NPM cache | `~/.npm/_cacache/` | Zero — cache is re-populated on next `npm install` |
| `.next/cache` | Webpack/babel/TSC build cache dirs | Zero — rebuilt automatically on next `next build` |
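To see how much space the standard run is likely to reclaim, the filesystem paths in the table can be measured before and after. This is a minimal sketch; `report_sizes` is a hypothetical helper and not something the cleanup script is known to provide.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the size in KiB of each path that exists,
# silently skipping paths that do not.
report_sizes() {
  local path
  for path in "$@"; do
    if [ -e "$path" ]; then
      du -sk -- "$path"
    fi
  done
  return 0
}

# Usage (paths taken from the table above):
#   report_sizes /var/cache/apt/archives "$HOME/.npm/_cacache"
```

For the two rows that are not plain directories, `docker system df` reports build cache usage and `journalctl --disk-usage` reports journal size.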

### Monthly full cleanup (`vm-cleanup.sh --full`)

Adds these **CAREFUL** steps on top of the standard run:

| Step | What's removed | Risk |
|---|---|---|
| Docker system prune | Stopped containers, unused networks, dangling images | Low — does NOT remove images used by any container |
| pnpm store prune | Packages not referenced by any `node_modules` | Low — only removes truly orphaned packages |
| Old log files | `.gz` log rotations older than 30 days | Low — old compressed logs |
| HOLD node_modules | `node_modules` in `/opt/bytelyst/HOLD` archived projects | Low — code intact, can reinstall with `pnpm install` |
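Before the first full run, it can help to preview what the last two steps would match. The sketch below only lists candidates and deletes nothing. The helper names are hypothetical, and `/var/log` as the log directory is an assumption, since this document does not say where the script looks for rotated logs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list compressed log rotations older than N days
# (default 30) under a directory. Lists only; nothing is deleted.
list_old_gz_logs() {
  local dir="$1" days="${2:-30}"
  find "$dir" -type f -name '*.gz' -mtime +"$days" -print 2>/dev/null
}

# Hypothetical helper: list node_modules directories under a root
# without descending into them (nested node_modules are not printed).
list_node_modules() {
  local root="$1"
  find "$root" -type d -name node_modules -prune -print 2>/dev/null
}

# Usage (log location is an assumption; HOLD path is from the table above):
#   list_old_gz_logs /var/log 30
#   list_node_modules /opt/bytelyst/HOLD
```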

### Never touched (by design)

- `/opt/bytelyst/*/node_modules` (active repos)
- `/opt/bytelyst/*/src`, `/app`, `/backend`, `/web` source code
- `.next/standalone` (production Next.js builds)
- Docker images used by currently configured containers
- `/usr/local/lib/hermes-agent/`
- `/usr/share/ollama/` (models)
- `/swapfile`
- Any database volumes

---

## Manual crontab (if you prefer not to use `--install-cron`)

```
# Health check daily 07:00 UTC
0 7 * * * bash /opt/bytelyst/learning_ai_devops_tools/scripts/VMs/HostingerVM/vm-health-check.sh --quiet --notify 2>&1 | logger -t vm-health

# Build cache prune daily 03:00 UTC
0 3 * * * bash /opt/bytelyst/learning_ai_devops_tools/scripts/VMs/HostingerVM/vm-cleanup.sh --quiet 2>&1 | logger -t vm-cleanup

# Standard weekly cleanup Sunday 02:00 UTC
0 2 * * 0 bash /opt/bytelyst/learning_ai_devops_tools/scripts/VMs/HostingerVM/vm-cleanup.sh --quiet 2>&1 | logger -t vm-cleanup

# Full monthly cleanup 1st of month 01:00 UTC
0 1 1 * * bash /opt/bytelyst/learning_ai_devops_tools/scripts/VMs/HostingerVM/vm-cleanup.sh --full --quiet 2>&1 | logger -t vm-cleanup
```

Edit with: `crontab -e`
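If you script the manual route instead of pasting into `crontab -e`, appending lines blindly creates duplicates on a second run. Here is a minimal sketch of an idempotent append; `merge_cron_line` is a hypothetical helper, not a flag of either script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read an existing crontab on stdin and print it back,
# appending the given line only if an identical line is not already present.
merge_cron_line() {
  local new_line="$1" existing
  existing="$(cat)"
  if [ -n "$existing" ]; then
    printf '%s\n' "$existing"
  fi
  # -x matches whole lines, -F treats the cron line as a fixed string
  if ! printf '%s\n' "$existing" | grep -qxF -- "$new_line"; then
    printf '%s\n' "$new_line"
  fi
}

# Usage:
#   line='0 7 * * * bash /opt/bytelyst/learning_ai_devops_tools/scripts/VMs/HostingerVM/vm-health-check.sh --quiet --notify 2>&1 | logger -t vm-health'
#   crontab -l 2>/dev/null | merge_cron_line "$line" | crontab -
```

The helper reads all of stdin before printing anything, so piping from `crontab -l` back into `crontab -` does not truncate the existing entries.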

---

## Monitoring logs

```bash
# Tail cleanup log
tail -f /var/log/vm-cleanup.log

# Tail health check log
tail -f /var/log/vm-health-check.log

# See all cron output via syslog
grep vm-cleanup /var/log/syslog | tail -20
grep vm-health /var/log/syslog | tail -20
```
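On hosts where rsyslog is not installed, `/var/log/syslog` may not exist and the `logger` output lands only in the systemd journal. The sketch below tries the journal first and falls back to the syslog file; `show_tagged_logs` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show the last N entries logged under a logger tag.
# Prefers the systemd journal; falls back to /var/log/syslog when
# journalctl is not installed.
show_tagged_logs() {
  local tag="$1" count="${2:-20}"
  if command -v journalctl >/dev/null 2>&1; then
    journalctl -t "$tag" -n "$count" --no-pager
  elif [ -r /var/log/syslog ]; then
    grep -F -- "$tag" /var/log/syslog | tail -n "$count"
  else
    echo "no journal or syslog available" >&2
    return 1
  fi
}

# Usage:
#   show_tagged_logs vm-cleanup
#   show_tagged_logs vm-health 50
```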

---

## Telegram alerts

The health check script sends a Telegram message when it detects WARNING or CRITICAL.
It reads credentials from `$HERMES_HOME/.env` (usually `/root/.hermes/.env`).

Required keys in that file:
```
TELEGRAM_BOT_TOKEN=<your-bot-token>
TELEGRAM_CHAT_ID=<your-chat-id>
```

Both are already set if the Hermes gateway is configured. Test with:

```bash
bash /opt/bytelyst/learning_ai_devops_tools/scripts/VMs/HostingerVM/vm-health-check.sh --notify
```
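If no message arrives, it can be useful to test the credentials separately from the health check. The sketch below calls the Telegram Bot API `sendMessage` method directly with `curl`. The helper names are hypothetical, and it assumes the `.env` values are unquoted `KEY=value` lines; it is not how `vm-health-check.sh` is known to send its alerts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read one KEY=value entry from an env file
# without sourcing (executing) the file. Assumes unquoted values.
read_env_key() {
  local key="$1" file="$2"
  grep -E "^${key}=" "$file" 2>/dev/null | tail -n 1 | cut -d= -f2-
}

# Hypothetical helper: send a fixed test message via the Bot API.
send_telegram_test() {
  local env_file="${HERMES_HOME:-/root/.hermes}/.env" token chat_id
  token="$(read_env_key TELEGRAM_BOT_TOKEN "$env_file")"
  chat_id="$(read_env_key TELEGRAM_CHAT_ID "$env_file")"
  if [ -z "$token" ] || [ -z "$chat_id" ]; then
    echo "missing Telegram keys in $env_file" >&2
    return 1
  fi
  # -f makes curl exit non-zero on an HTTP error such as 401 (bad token)
  curl -fsS "https://api.telegram.org/bot${token}/sendMessage" \
    --data-urlencode "chat_id=${chat_id}" \
    --data-urlencode "text=vm-health-check test message"
}

# Usage:
#   send_telegram_test
```

A non-zero exit with no output usually points at the token or chat ID rather than at the health check itself.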

---

## Alert thresholds (from `vm-health-check.sh`)

| Metric | WARNING | CRITICAL |
|---|---|---|
| Disk used `%` | > 55% | > 70% |
| Load average | > 4.0 | > 8.0 |
| RAM available | < 3 GB | < 1 GB |
| Swap used | > 1 GB | > 3 GB |
| Container restarts | > 10 | > 50 |
| Build cache | > 5 GB | > 20 GB |

Thresholds are defined as constants at the top of each script, so they are easy to adjust.
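The comparison behind each row is a two-level check. The sketch below shows one way to express it in shell, using the disk and RAM values from the table. The function names are hypothetical, and the real script's constant names are not shown in this document. It uses integer comparison only, so a fractional metric such as load average would need `awk` or `bc` instead.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify an integer metric where higher is worse
# (disk %, container restarts, build cache GB).
classify_above() {
  local value="$1" warn="$2" crit="$3"
  if [ "$value" -gt "$crit" ]; then
    echo CRITICAL
  elif [ "$value" -gt "$warn" ]; then
    echo WARNING
  else
    echo OK
  fi
}

# Hypothetical helper: classify an integer metric where lower is worse
# (available RAM in GB).
classify_below() {
  local value="$1" warn="$2" crit="$3"
  if [ "$value" -lt "$crit" ]; then
    echo CRITICAL
  elif [ "$value" -lt "$warn" ]; then
    echo WARNING
  else
    echo OK
  fi
}

# Hypothetical helper: used percentage of the root filesystem as an integer.
root_disk_used_pct() {
  df -P / | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Usage (thresholds from the table above):
#   classify_above "$(root_disk_used_pct)" 55 70
#   classify_below 2 3 1
```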

---

## What this would have caught in the May 2026 incident

If this schedule had been running during the May 26 incident:

- **07:00 daily health check** → `container_loops CRIT: admin-web(50x)` → Telegram alert sent within hours of the loop starting
- **03:00 daily build cache prune** → would have kept build cache under 5 GB instead of growing to 84 GB
- **Monthly full cleanup** → would have cleared the HOLD node_modules and old logs before they became a storage crisis