learning_ai_common_plat/services/monitoring
Saravana Kumar fe8338c2c5 feat(monitoring): add VM Overview Grafana dashboard
12-panel dashboard auto-provisioned via /var/lib/grafana/dashboards:
  - 4 stat tiles (disk %, RAM avail, swap used, CPU steal) with
    threshold colouring matching vm-health-check.sh
  - 4 time-series (disk %, RAM trend, steal, sda write GB/hr) — 7d default
  - 2 bargauge top-10 by RAM and CPU (cAdvisor container_memory_working_set,
    container_cpu_usage)
  - Load average (1/5/15) + network throughput (RX/TX, host interfaces)

uid: vm-overview. Picked up on next Grafana boot.
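
A minimal TypeScript sketch of how one of the stat tiles and its threshold colouring could be modelled. It is not the committed dashboard JSON. The PromQL expression (which assumes node_exporter filesystem metrics), the 80/90 cut-offs, the colour names, and the helper colourFor are all hypothetical; the real thresholds live in vm-health-check.sh, which is not shown here. The object shape follows Grafana's dashboard JSON model, where a step with a null value is the base colour.

```typescript
// Hypothetical sketch: values below are assumptions, not taken from the repo.
interface ThresholdStep {
  color: string;
  value: number | null; // null marks the base step in Grafana's JSON model
}

interface StatPanel {
  type: "stat";
  title: string;
  targets: { expr: string }[];
  fieldConfig: {
    defaults: {
      unit: string;
      thresholds: { mode: "absolute"; steps: ThresholdStep[] };
    };
  };
}

// Assumed disk-usage tile; the query presumes node_exporter is the data source.
const diskUsedTile: StatPanel = {
  type: "stat",
  title: "Disk used %",
  targets: [
    {
      expr:
        '100 * (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})',
    },
  ],
  fieldConfig: {
    defaults: {
      unit: "percent",
      thresholds: {
        mode: "absolute",
        steps: [
          { color: "green", value: null }, // base colour
          { color: "orange", value: 80 }, // assumed warning cut-off
          { color: "red", value: 90 }, // assumed critical cut-off
        ],
      },
    },
  },
};

// Returns the colour of the last step whose lower bound is <= value.
// Assumes steps are sorted ascending with the base (null) step first.
function colourFor(value: number, steps: ThresholdStep[]): string {
  let colour = "";
  for (const step of steps) {
    if (step.value === null || value >= step.value) {
      colour = step.color;
    }
  }
  return colour;
}
```

Keeping the cut-offs in one steps array makes it easier to compare them by eye against the shell script when either side changes.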

Closes Phase 5: "Add Grafana" item from VM observability roadmap.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 21:26:35 +00:00
Name                   Last commit                                                               Date
grafana                feat(monitoring): add VM Overview Grafana dashboard                       2026-05-29 21:26:35 +00:00
loki                   fix(common): configure ESLint 9 and fix lint issues                       2026-02-12 16:37:30 -08:00
prometheus             feat(observability): add phase 2 monitoring and valkey services           2026-03-31 06:57:12 +00:00
health-check.local.sh  fix(monitoring): update health-check endpoints for consolidated services  2026-02-17 20:53:37 -08:00
health-check.ts        chore(monitoring): document health-check output                           2026-05-04 16:34:27 -07:00
package.json           feat(monitoring): add @bytelyst/monitoring package                        2026-02-14 15:57:41 -08:00
tsconfig.json          feat(services): add monitoring (Loki + Grafana config, health-check)      2026-02-12 11:39:24 -08:00