learning_ai_common_plat/services/platform-service
saravanakumardb1 c63736459b feat(fleet): anti-flap hysteresis + autoscale Prometheus series & dashboard (ops #5)
Make the capacity autoscaling signal safe to act on automatically and observable
in Grafana.

Anti-flap hysteresis:
- New pure applyHysteresis: suppresses a direction reversal (scale_in after
  scale_out, or vice versa) within a cooldown window so a consumer cannot thrash
  capacity. A critical scale-out (queued work, zero usable capacity) always
  bypasses the cooldown. The cooldown anchor advances only on an emitted action, so a
  suppressed signal keeps counting down from the last real action.
- Process-wide per-product cooldown state (mirrors reaper/breaker in-mem state)
  with a test seam; cooldown tunable via FLEET_AUTOSCALE_COOLDOWN_SEC (default 300).
- GET /fleet/autoscale[/all] now serve the debounced (stateful) recommendation.
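
A minimal TypeScript sketch of the hysteresis rule above, for illustration only. Every name in it (Recommendation, CooldownState, the "hold" action, recommendStateful, resetCooldownStateForTests) is hypothetical rather than the service's actual code. It also assumes that a "hold" result is not an emitted action and so never advances the anchor.

```typescript
// Hypothetical shapes; the service's real types are not shown in this commit.
type Action = "scale_out" | "scale_in" | "hold";

interface Recommendation {
  action: Action;
  critical: boolean; // e.g. queued work with zero usable capacity
}

interface CooldownState {
  lastAction: Action;     // last action actually emitted
  lastActionAtMs: number; // cooldown anchor: when that action was emitted
}

interface HysteresisResult {
  emitted: Recommendation;
  suppressed: boolean;
  next: CooldownState | undefined;
}

function isReversal(prev: Action, next: Action): boolean {
  return (
    (prev === "scale_out" && next === "scale_in") ||
    (prev === "scale_in" && next === "scale_out")
  );
}

// Pure: no I/O and no clock reads; the caller supplies nowMs.
function applyHysteresis(
  raw: Recommendation,
  prev: CooldownState | undefined,
  nowMs: number,
  cooldownMs: number,
): HysteresisResult {
  // A critical scale-out always bypasses the cooldown.
  const bypass = raw.action === "scale_out" && raw.critical;
  if (
    prev !== undefined &&
    !bypass &&
    isReversal(prev.lastAction, raw.action) &&
    nowMs - prev.lastActionAtMs < cooldownMs
  ) {
    // Suppress the reversal and keep the old anchor so the countdown continues.
    return { emitted: { action: "hold", critical: false }, suppressed: true, next: prev };
  }
  if (raw.action === "hold") {
    // Assumption: "hold" is not an emitted action, so the anchor stays put.
    return { emitted: raw, suppressed: false, next: prev };
  }
  // A real action was emitted: advance the anchor to now.
  return {
    emitted: raw,
    suppressed: false,
    next: { lastAction: raw.action, lastActionAtMs: nowMs },
  };
}

// Hypothetical process-wide, per-product in-memory state.
const cooldownByProduct = new Map<string, CooldownState>();

function recommendStateful(
  product: string,
  raw: Recommendation,
  nowMs: number,
  cooldownMs: number,
): Recommendation {
  const result = applyHysteresis(raw, cooldownByProduct.get(product), nowMs, cooldownMs);
  if (result.next !== undefined) cooldownByProduct.set(product, result.next);
  return result.emitted;
}

// Test seam: clears all per-product anchors between tests.
function resetCooldownStateForTests(): void {
  cooldownByProduct.clear();
}
```

Taking nowMs as a parameter keeps applyHysteresis pure and lets tests drive the clock; only the small stateful wrapper touches the shared Map.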

Observability:
- Prometheus exposition emits the RAW recommendation per product
  (fleet_autoscale_recommended_seats/delta/pressure + one-hot fleet_autoscale_action
  {action}). RAW (not stateful) so a scrape never mutates the cooldown anchors.
- Grafana "Fleet Overview" gains two panels: products recommending scale-out
  (stat) + recommended seat delta vs backlog (timeseries).
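
A hedged sketch of what a RAW-only exposition could look like in the Prometheus text format. Only fleet_autoscale_recommended_seats and fleet_autoscale_action{action} come from the commit; the product label name, the RawRecommendation shape, the "hold" action value, and renderAutoscaleMetrics are assumptions. The delta and pressure gauges would follow the same pattern as the seats gauge, but their exact names are not spelled out here, so they are omitted.

```typescript
// Hypothetical raw recommendation shape; field names are assumptions.
type Action = "scale_out" | "scale_in" | "hold";

interface RawRecommendation {
  product: string;
  recommendedSeats: number;
  action: Action;
}

const ACTIONS: readonly Action[] = ["scale_out", "scale_in", "hold"];

// Escape a label value per the Prometheus text format: backslash, quote, newline.
function escapeLabel(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Renders RAW recommendations only. It neither reads nor writes cooldown
// state, so a scrape cannot move the hysteresis anchors.
function renderAutoscaleMetrics(recs: readonly RawRecommendation[]): string {
  const lines: string[] = [
    "# HELP fleet_autoscale_recommended_seats Raw recommended seat count per product.",
    "# TYPE fleet_autoscale_recommended_seats gauge",
  ];
  for (const r of recs) {
    lines.push(
      `fleet_autoscale_recommended_seats{product="${escapeLabel(r.product)}"} ${r.recommendedSeats}`,
    );
  }
  lines.push(
    "# HELP fleet_autoscale_action One-hot raw recommended action per product.",
    "# TYPE fleet_autoscale_action gauge",
  );
  for (const r of recs) {
    const p = escapeLabel(r.product);
    // One-hot: exactly one action series per product carries the value 1.
    for (const a of ACTIONS) {
      lines.push(`fleet_autoscale_action{product="${p}",action="${a}"} ${r.action === a ? 1 : 0}`);
    }
  }
  return lines.join("\n") + "\n";
}
```

Emitting every action with an explicit 0 (rather than only the active one) keeps each series continuous, so a panel counting products recommending scale-out can simply sum the action="scale_out" series.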

Docs: FLEET_AUTOSCALE_COOLDOWN_SEC in .env.example.
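
A minimal sketch of reading the tunable, assuming the service parses it once into milliseconds. Only the variable name and the 300-second default come from the commit; readCooldownMs and its fallback policy (invalid or negative values revert to the default, 0 disables the cooldown) are hypothetical choices.

```typescript
// Default from the commit message: 300 seconds.
const DEFAULT_COOLDOWN_SEC = 300;

// Hypothetical helper: returns the cooldown in milliseconds.
function readCooldownMs(env: Record<string, string | undefined>): number {
  const raw = env.FLEET_AUTOSCALE_COOLDOWN_SEC;
  if (raw === undefined || raw.trim() === "") return DEFAULT_COOLDOWN_SEC * 1000;
  const sec = Number(raw);
  // Reject garbage and negatives instead of silently disabling hysteresis.
  if (!Number.isFinite(sec) || sec < 0) return DEFAULT_COOLDOWN_SEC * 1000;
  // 0 is accepted and effectively turns the cooldown off.
  return sec * 1000;
}
```

In the service this would typically be called as readCooldownMs(process.env); passing the environment explicitly keeps the helper testable.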

Tests: +10 (hysteresis/stateful/cooldown + prom autoscale series); full suite 1856
green; lint + tsc clean. Verified live: a throwaway Prometheus scraped the running
service and the dashboard PromQL returned real scale-out/scale-in recommendations
across products.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-06-01 23:02:08 -07:00
..
scripts chore(platform): document script CLI output 2026-05-04 16:45:42 -07:00
src feat(fleet): anti-flap hysteresis + autoscale Prometheus series & dashboard (ops #5) 2026-06-01 23:02:08 -07:00
.gitignore fix(fleet): Phase 3 hardening — budget authz, idempotent accrual, cycle detection, artifact 2026-05-31 02:45:52 -07:00
Dockerfile fix(docker): INFRA-gap-02 unblock full-stack docker compose up 2026-04-16 15:48:32 -07:00
package.json chore(deps): bump @types/node 22 -> 25 (dev types) 2026-05-31 04:02:56 -07:00
POSTAL_SMTP_SETUP.md feat(platform-service): add smtp email delivery and postal setup 2026-03-14 05:52:28 +00:00
tsconfig.json feat(services): add platform-service (auth, audit, flags, notifications, blob) 2026-02-12 11:39:00 -08:00
vitest.config.ts fix(ci): add --pool forks to all vitest test scripts to fix kill EPERM on Node v25 2026-03-27 23:23:38 -07:00