learning_ai_common_plat

saravanakumardb1 c63736459b	2026-06-01 23:02:08 -07:00

feat(fleet): anti-flap hysteresis + autoscale Prometheus series & dashboard (ops #5)

Make the capacity autoscaling signal safe to act on automatically and observable in Grafana.

Anti-flap hysteresis:
- New pure applyHysteresis: suppresses a direction reversal (scale_in after scale_out, or vice versa) within a cooldown window so a consumer cannot thrash capacity. A critical scale-out (queued work, zero usable capacity) always bypasses the cooldown. Cooldown anchor only advances on an emitted action, so a suppressed signal keeps counting down from the real last action.
- Process-wide per-product cooldown state (mirrors reaper/breaker in-mem state) with a test seam; cooldown tunable via FLEET_AUTOSCALE_COOLDOWN_SEC (default 300).
- GET /fleet/autoscale[/all] now serve the debounced (stateful) recommendation.

Observability:
- Prometheus exposition emits the RAW recommendation per product (fleet_autoscale_recommended_seats/delta/pressure + one-hot fleet_autoscale_action{action}). RAW (not stateful) so a scrape never mutates the cooldown anchors.
- Grafana "Fleet Overview" gains two panels: products recommending scale-out (stat) + recommended seat delta vs backlog (timeseries).

Docs: FLEET_AUTOSCALE_COOLDOWN_SEC in .env.example.

Tests: +10 (hysteresis/stateful/cooldown + prom autoscale series); full suite 1856 green; lint + tsc clean.

Verified live: a throwaway Prometheus scraped the running service and the dashboard PromQL returned real scale-out/scale-in recommendations across products.

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
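
The anti-flap rule described in the commit message can be illustrated with a minimal TypeScript sketch. This is not the repository's code: only the name applyHysteresis, the scale_in/scale_out actions, and the 300-second default come from the commit message. The types, the `hold` action, the `critical` flag, the function signature, and the choice to emit `hold` when a reversal is suppressed are all assumptions made for illustration.

```typescript
// Hypothetical types; the real shapes in the repository are not shown in the source.
type ScaleAction = "scale_out" | "scale_in";
type Action = ScaleAction | "hold";

interface Recommendation {
  action: Action;
  // Assumed flag: queued work with zero usable capacity.
  critical: boolean;
}

interface CooldownState {
  lastAction: ScaleAction | null;
  lastActionAtMs: number | null;
}

interface HysteresisResult {
  emitted: Recommendation;
  state: CooldownState;
  suppressed: boolean;
}

// Pure function: takes the previous state and returns the next one,
// never mutating its inputs.
function applyHysteresis(
  raw: Recommendation,
  state: CooldownState,
  nowMs: number,
  cooldownSec: number = 300, // default taken from FLEET_AUTOSCALE_COOLDOWN_SEC
): HysteresisResult {
  const action = raw.action;

  // Assumption: a "hold" is passed through and does not move the anchor.
  if (action === "hold") {
    return { emitted: raw, state, suppressed: false };
  }

  const isReversal = state.lastAction !== null && state.lastAction !== action;
  const withinCooldown =
    state.lastActionAtMs !== null &&
    nowMs - state.lastActionAtMs < cooldownSec * 1000;
  // A critical scale-out always bypasses the cooldown.
  const bypass = action === "scale_out" && raw.critical;

  if (isReversal && withinCooldown && !bypass) {
    // Suppressed: the state is returned unchanged, so the cooldown keeps
    // counting down from the last action that was actually emitted.
    return {
      emitted: { action: "hold", critical: false },
      state,
      suppressed: true,
    };
  }

  // Emitted: advance the cooldown anchor to now.
  return {
    emitted: raw,
    state: { lastAction: action, lastActionAtMs: nowMs },
    suppressed: false,
  };
}
```

Under these assumptions, a scale_in arriving 60 seconds after an emitted scale_out would be suppressed and leave the anchor untouched, while the same scale_in arriving once the full 300 seconds have elapsed would be emitted. Keeping the function pure is also what would let a metrics scrape report the raw recommendation without touching the cooldown state.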
..
grafana	feat(fleet): anti-flap hysteresis + autoscale Prometheus series & dashboard (ops #5)	2026-06-01 23:02:08 -07:00
loki	fix(common): configure ESLint 9 and fix lint issues	2026-02-12 16:37:30 -08:00
prometheus	feat(fleet): Prometheus metrics export + Grafana dashboard (ops #4)	2026-06-01 22:24:03 -07:00
health-check.local.sh	fix(monitoring): update health-check endpoints for consolidated services	2026-02-17 20:53:37 -08:00
health-check.ts	chore(monitoring): document health-check output	2026-05-04 16:34:27 -07:00
package.json	chore(deps): bump @types/node 22 -> 25 (dev types)	2026-05-31 04:02:56 -07:00
tsconfig.json	feat(services): add monitoring (Loki + Grafana config, health-check)	2026-02-12 11:39:24 -08:00