Make the capacity autoscaling signal safe to act on automatically and observable
in Grafana.
Anti-flap hysteresis:
- New pure function applyHysteresis: suppresses a direction reversal (scale_in
after scale_out, or vice versa) within a cooldown window so a consumer cannot
thrash capacity. A critical scale-out (queued work, zero usable capacity) always
bypasses the cooldown. The cooldown anchor advances only on an emitted action, so
a suppressed signal keeps counting down from the real last action.
- Process-wide per-product cooldown state (mirrors the reaper/breaker in-memory
state) with a test seam; cooldown tunable via FLEET_AUTOSCALE_COOLDOWN_SEC
(default 300).
- GET /fleet/autoscale[/all] now serve the debounced (stateful) recommendation.
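The hysteresis rule described above can be sketched roughly as follows. This is a minimal illustration, not the committed implementation: the type names, the "hold" action, the `critical` flag, and the function signature are all assumptions made for the example. It also assumes the anchor advances on every emitted non-hold action, including same-direction repeats.

```typescript
// Hypothetical shapes; the real types in the codebase may differ.
type Action = "scale_out" | "scale_in" | "hold";

interface Recommendation {
  action: Action;
  critical: boolean; // assumed flag: queued work with zero usable capacity
}

interface CooldownState {
  lastAction: Action | null; // last emitted non-hold action, if any
  lastActionAtMs: number;    // timestamp of that emitted action
}

interface HysteresisResult {
  recommendation: Recommendation;
  state: CooldownState;
  suppressed: boolean;
}

// Pure: takes the previous state and the current time, returns the next state.
function applyHysteresis(
  raw: Recommendation,
  state: CooldownState,
  nowMs: number,
  cooldownSec = 300,
): HysteresisResult {
  // A hold never reverses direction and never moves the anchor.
  if (raw.action === "hold") {
    return { recommendation: raw, state, suppressed: false };
  }

  const isReversal = state.lastAction !== null && state.lastAction !== raw.action;
  const withinCooldown = nowMs - state.lastActionAtMs < cooldownSec * 1000;
  const bypass = raw.action === "scale_out" && raw.critical;

  if (isReversal && withinCooldown && !bypass) {
    // Suppressed: state is returned unchanged, so the cooldown keeps
    // counting down from the real last emitted action.
    return {
      recommendation: { action: "hold", critical: false },
      state,
      suppressed: true,
    };
  }

  // Emitted: only now does the anchor advance.
  return {
    recommendation: raw,
    state: { lastAction: raw.action, lastActionAtMs: nowMs },
    suppressed: false,
  };
}
```

Keeping the function pure means the process-wide per-product state can live in a thin wrapper, and tests can pass in an explicit clock value instead of mocking time.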
Observability:
- Prometheus exposition emits the RAW recommendation per product
(fleet_autoscale_recommended_seats/delta/pressure + one-hot fleet_autoscale_action
{action}). RAW (not stateful) so a scrape never mutates the cooldown anchors.
- Grafana "Fleet Overview" gains two panels: products recommending scale-out
(stat) + recommended seat delta vs backlog (timeseries).
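As a rough illustration of the one-hot action series, the sketch below renders Prometheus text exposition from raw recommendations. Only the metric names fleet_autoscale_recommended_seats and fleet_autoscale_action and the `action` label come from this change; the `product` label name, the "hold" action value, the input shape, and the function name are assumptions for the example.

```typescript
// Hypothetical action set; only scale_out and scale_in are named in this change.
type Action = "scale_out" | "scale_in" | "hold";
const ACTIONS: readonly Action[] = ["scale_out", "scale_in", "hold"];

// Hypothetical input shape for the raw (stateless) recommendation.
interface RawRecommendation {
  product: string;
  action: Action;
  recommendedSeats: number;
}

// Escape a label value per the Prometheus text exposition format.
function escapeLabel(v: string): string {
  return v.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Pure rendering from raw recommendations: no cooldown state is read or
// written, so a scrape cannot move the anchors.
function renderAutoscaleMetrics(recs: RawRecommendation[]): string {
  const lines: string[] = ["# TYPE fleet_autoscale_recommended_seats gauge"];
  for (const r of recs) {
    const product = escapeLabel(r.product);
    lines.push(`fleet_autoscale_recommended_seats{product="${product}"} ${r.recommendedSeats}`);
  }

  lines.push("# TYPE fleet_autoscale_action gauge");
  for (const r of recs) {
    const product = escapeLabel(r.product);
    // One-hot: exactly one action series per product is 1, the rest are 0.
    for (const a of ACTIONS) {
      const value = r.action === a ? 1 : 0;
      lines.push(`fleet_autoscale_action{product="${product}",action="${a}"} ${value}`);
    }
  }
  return lines.join("\n") + "\n";
}
```

With this encoding, a query such as `sum(fleet_autoscale_action{action="scale_out"})` would count the products currently recommending scale-out, which is the kind of value a stat panel can display directly.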
Docs: FLEET_AUTOSCALE_COOLDOWN_SEC in .env.example.
Tests: +10 (hysteresis/stateful/cooldown + prom autoscale series); full suite 1856
green; lint + tsc clean. Verified live: a throwaway Prometheus scraped the running
service and the dashboard PromQL returned real scale-out/scale-in recommendations
across products.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>