learning_ai_common_plat

bytelyst/learning_ai_common_plat


Commit Graph

Author	SHA1	Message	Date


saravanakumardb1

c63736459b

feat(fleet): anti-flap hysteresis + autoscale Prometheus series & dashboard (ops #5)

Make the capacity autoscaling signal safe to act on automatically and observable
in Grafana.

Anti-flap hysteresis:
- New pure applyHysteresis function: suppresses a direction reversal (scale_in
  after scale_out, or vice versa) within a cooldown window so a consumer cannot
  thrash capacity. A critical scale-out (queued work, zero usable capacity)
  always bypasses the cooldown. The cooldown anchor advances only on an emitted
  action, so a suppressed signal keeps counting down from the real last action.
- Process-wide per-product cooldown state (mirrors the in-memory reaper/breaker
  state) with a test seam; the cooldown is tunable via
  FLEET_AUTOSCALE_COOLDOWN_SEC (default 300 seconds).
- GET /fleet/autoscale and GET /fleet/autoscale/all now serve the debounced
  (stateful) recommendation.
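A minimal TypeScript sketch of the hysteresis rule described above. Only the names applyHysteresis and FLEET_AUTOSCALE_COOLDOWN_SEC come from the commit; the types (ScaleAction with a "hold" value, Recommendation with a critical flag, CooldownState) and the helpers cooldownMsFromEnv, statefulRecommendation and resetCooldownStateForTests are hypothetical, since the real signatures are not shown. It also assumes that a same-direction action emitted during the window restarts the anchor, because that action counts as emitted.

```typescript
// Hypothetical types: the commit names applyHysteresis but not its signature.
type ScaleAction = "scale_out" | "scale_in" | "hold";

interface Recommendation {
  action: ScaleAction;
  critical: boolean; // assumed flag: queued work and zero usable capacity
}

interface CooldownState {
  lastAction: ScaleAction | null; // last emitted non-hold action, if any
  lastActionAtMs: number; // cooldown anchor; moves only when an action is emitted
}

function isReversal(prev: ScaleAction | null, next: ScaleAction): boolean {
  return (
    (prev === "scale_out" && next === "scale_in") ||
    (prev === "scale_in" && next === "scale_out")
  );
}

// Pure: the next state is returned, never mutated in place.
function applyHysteresis(
  raw: Recommendation,
  state: CooldownState,
  nowMs: number,
  cooldownMs: number,
): { rec: Recommendation; state: CooldownState } {
  if (raw.action === "hold") return { rec: raw, state };
  const inCooldown =
    state.lastAction !== null && nowMs - state.lastActionAtMs < cooldownMs;
  const criticalScaleOut = raw.critical && raw.action === "scale_out";
  if (inCooldown && isReversal(state.lastAction, raw.action) && !criticalScaleOut) {
    // Suppressed: keep the old anchor so the window counts from the real last action.
    return { rec: { action: "hold", critical: false }, state };
  }
  return { rec: raw, state: { lastAction: raw.action, lastActionAtMs: nowMs } };
}

// Reads the cooldown (seconds) from an env-like map, falling back to 300 s.
function cooldownMsFromEnv(env: Record<string, string | undefined>): number {
  const sec = Number(env.FLEET_AUTOSCALE_COOLDOWN_SEC ?? "300");
  return Number.isFinite(sec) && sec >= 0 ? sec * 1000 : 300 * 1000;
}

// Process-wide, per-product state (in memory, so it resets on restart).
const cooldownByProduct = new Map<string, CooldownState>();

function statefulRecommendation(
  product: string,
  raw: Recommendation,
  nowMs: number,
  cooldownMs: number,
): Recommendation {
  const prev = cooldownByProduct.get(product) ?? { lastAction: null, lastActionAtMs: 0 };
  const { rec, state } = applyHysteresis(raw, prev, nowMs, cooldownMs);
  cooldownByProduct.set(product, state);
  return rec;
}

// Test seam: clear every per-product anchor between tests.
function resetCooldownStateForTests(): void {
  cooldownByProduct.clear();
}
```

Keeping applyHysteresis pure and confining the mutable map to one thin wrapper is what lets the route serve the debounced value while other callers can still compute the raw one without touching the anchors.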

Observability:
- Prometheus exposition emits the RAW recommendation per product
  (fleet_autoscale_recommended_seats/delta/pressure plus a one-hot
  fleet_autoscale_action{action} series). It is RAW, not stateful, so a scrape
  never mutates the cooldown anchors.
- Grafana "Fleet Overview" gains two panels: products recommending scale-out
  (stat) + recommended seat delta vs backlog (timeseries).
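A sketch of how the raw recommendation could be rendered in the Prometheus text exposition format, including the one-hot action series. The metric names fleet_autoscale_recommended_seats and fleet_autoscale_action come from the commit; the product label name, the "hold" action value, and the helpers renderAutoscaleProm and escapeLabel are assumptions for illustration. The delta and pressure series would follow the same per-product shape and are omitted here. Because the renderer takes raw recommendations as input, a scrape never reaches the cooldown state.

```typescript
// Hypothetical shapes; the real renderer's inputs are not shown in the commit.
type ScaleAction = "scale_out" | "scale_in" | "hold";

interface RawRecommendation {
  product: string;
  recommendedSeats: number;
  action: ScaleAction;
}

const ACTIONS: ScaleAction[] = ["scale_out", "scale_in", "hold"];

// Escape a label value as the text exposition format requires: \ " and newline.
function escapeLabel(v: string): string {
  return v.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

function renderAutoscaleProm(recs: RawRecommendation[]): string {
  const out: string[] = [];
  out.push("# HELP fleet_autoscale_recommended_seats Raw (non-debounced) recommended seats.");
  out.push("# TYPE fleet_autoscale_recommended_seats gauge");
  for (const r of recs) {
    out.push(`fleet_autoscale_recommended_seats{product="${escapeLabel(r.product)}"} ${r.recommendedSeats}`);
  }
  out.push("# HELP fleet_autoscale_action One-hot raw recommended action per product.");
  out.push("# TYPE fleet_autoscale_action gauge");
  for (const r of recs) {
    // Exactly one action per product is 1; the others are 0.
    for (const a of ACTIONS) {
      out.push(
        `fleet_autoscale_action{product="${escapeLabel(r.product)}",action="${a}"} ${r.action === a ? 1 : 0}`,
      );
    }
  }
  // The exposition format expects a trailing newline.
  return out.join("\n") + "\n";
}
```

With one-hot encoding, a stat panel counting products that recommend scale-out can simply sum the series, for example `sum(fleet_autoscale_action{action="scale_out"})`; that query is illustrative and not necessarily the one shipped in the dashboard.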

Docs: document FLEET_AUTOSCALE_COOLDOWN_SEC in .env.example.

Tests: +10 (hysteresis/stateful/cooldown + prom autoscale series); full suite 1856
green; lint + tsc clean. Verified live: a throwaway Prometheus scraped the running
service and the dashboard PromQL returned real scale-out/scale-in recommendations
across products.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

2026-06-01 23:02:08 -07:00

saravanakumardb1

93d1caf4a2

feat(fleet): Prometheus metrics export + Grafana dashboard (ops #4 )

Exports fleet observability to Prometheus/Grafana (previously JSON-only).

- GET /api/fleet/metrics/prom: global, product-labelled Prometheus exposition
  (queue depth, blocked/active, per-stage histogram, factory health/seats/
  utilization, active alerts, budget spent/ceiling/projected) plus process-wide
  reaper/GC counters and engine circuit-breaker state. Pure renderer
  (renderFleetMetricsProm) is unit-tested; route auth accepts a FLEET_METRICS_TOKEN
  bearer (scrape path) or an admin JWT, so the endpoint is never world-readable
  by default.
- Infra: add a prometheus container to docker-compose + a platform-service-fleet
  scrape job; pin the Prometheus Grafana datasource uid; add a provisioned
  "Fleet Overview" dashboard (breakers, dead-letter, stale factories, alerts,
  queue depth, utilization, budget burn, reaper rate) with a product template var.
- Document FLEET_METRICS_TOKEN + the fleet feature flags in .env.example.
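A minimal sketch of the route auth rule above: accept a bearer matching FLEET_METRICS_TOKEN, otherwise fall back to an admin-JWT check, and deny everything else. The function authorizeMetricsScrape, its AuthResult values, and the injected isAdminJwt verifier are hypothetical; the real route's middleware is not shown. The hand-written constant-time compare keeps the sketch dependency-free; in Node, crypto.timingSafeEqual on equal-length Buffers is the more usual choice.

```typescript
type AuthResult = "scrape_token" | "admin_jwt" | "denied";

// Compares without early exit on the first differing character.
// Note: it still reveals whether the lengths differ.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  return diff === 0;
}

function authorizeMetricsScrape(
  authorizationHeader: string | undefined,
  metricsToken: string | undefined, // e.g. the FLEET_METRICS_TOKEN env value
  isAdminJwt: (jwt: string) => boolean, // hypothetical admin-JWT verifier
): AuthResult {
  const match = /^Bearer\s+(.+)$/i.exec(authorizationHeader ?? "");
  if (!match) return "denied"; // no credentials: never world-readable
  const presented = match[1].trim();
  // An unset or empty scrape token must never match anything.
  if (metricsToken && constantTimeEqual(presented, metricsToken)) return "scrape_token";
  if (isAdminJwt(presented)) return "admin_jwt";
  return "denied";
}
```

Checking the static token first keeps the Prometheus scrape path cheap, while the JWT fallback lets an administrator hit the same endpoint from a browser session.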

No default behavior change: the endpoint is additive and the new container is
opt-in via the compose stack.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

2026-06-01 22:24:03 -07:00

2 Commits