# Exchange API Degradation Runbook

## Incident description

The exchange API responds slowly or returns errors, which affects ENTRY/EXIT order execution and the data used for reconciliation.

## Symptoms

- The exchange latency histogram in `/metrics` shows spikes, and the exchange connector logs errors.
- `tradingLoopHealthy` or `monitorLoopHealthy` is `false` because the loops are hitting timeouts.
- Logs show `exchange timeout` or repeated `429`/`503` responses.

## Metrics to check

- `/internal/health` → `tradingLoopHealthy`, `monitorLoopHealthy`, `exchangeLatencyHistogram`.
- `/metrics` → `exchange_api_latency_seconds`, `exchange_api_errors_total`, `entry_orders_rejections_total`.

## Immediate mitigation

1. Hold back new ENTRY signals for the affected profiles while the exchange is unreachable.
2. Make sure each order has its deterministic `clientOrderId` before any retry, so that retries resubmit the same order instead of issuing new ones.
3. Activate the retry/backoff logic in the connectors, and log each retry with its correlation ID.
4. Notify downstream systems (dashboard, ops) that the system is in a degraded state.

## Expected self-recovery

- The exchange recovers and accepts pending requests; the trading loop resumes once latency returns to normal.
- The reconciliation loop eventually runs against fresh data, and metrics return to baseline.

## When to escalate

- Errors persist for more than 5 minutes despite retries.
- The exchange reports credential or rate-limit problems that require manual intervention.
- Business-critical trading windows are missed.

Escalate to the Exchange Account Manager and Cloud Ops, and reference `docs/runbooks/exchange-degradation.md`.

## What NOT to do

- Do not flood the exchange with retries; respect the backoff policy.
- Do not change API keys mid-incident without direction from the exchange team.
- Do not pause reconciliation; accurate state is needed to diagnose missing fills.
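The mitigation steps and the "What NOT to do" list describe retries that reuse a deterministic `clientOrderId`, back off between attempts, and log a correlation ID on every retry. The Python sketch below shows one way a connector *could* do this. It is not the connector's actual code: `ExchangeError`, `client_order_id`, `backoff_delay`, `submit_with_retry`, the `send` callable, the SHA-256 ID scheme, and the base/cap/attempt values are all hypothetical. It also assumes the exchange deduplicates or rejects a resubmitted `clientOrderId`, so a retry after an ambiguous timeout cannot create a second order.

```python
import hashlib
import logging
import random
import time
from typing import Callable, Optional

logger = logging.getLogger("exchange_connector")

# Status codes this runbook names as transient; a timeout is modeled as status=None.
RETRYABLE_STATUS = {429, 503}


class ExchangeError(Exception):
    """Hypothetical connector error carrying the HTTP status (None for a timeout)."""

    def __init__(self, status: Optional[int], message: str = ""):
        super().__init__(message or f"exchange error status={status}")
        self.status = status


def client_order_id(profile: str, signal_id: str, side: str) -> str:
    """Hypothetical scheme: derive the ID from the signal so every retry reuses it."""
    digest = hashlib.sha256(f"{profile}:{signal_id}:{side}".encode()).hexdigest()
    return f"{profile[:8]}-{digest[:24]}"


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def submit_with_retry(
    send: Callable[[str], dict],
    order_id: str,
    correlation_id: str,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Resubmit the SAME clientOrderId on transient errors; never mint a new one."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return send(order_id)
        except ExchangeError as exc:
            retryable = exc.status is None or exc.status in RETRYABLE_STATUS
            # Non-retryable errors (e.g. credentials) and the final attempt propagate.
            if not retryable or attempt == max_attempts - 1:
                raise
            delay = backoff_delay(attempt)
            logger.warning(
                "exchange retry attempt=%d status=%s clientOrderId=%s "
                "correlationId=%s delay=%.2fs",
                attempt + 1, exc.status, order_id, correlation_id, delay,
            )
            sleep(delay)
    # Not reachable: the last attempt either returns or re-raises above.
    raise RuntimeError("retry loop exited unexpectedly")
```

Full jitter spreads retries from many loops so they do not hit the exchange in lockstep during a `429` burst, which is the "do not flood the exchange" concern. If the exchange sends a `Retry-After` header, honoring it instead of the computed delay would be a reasonable refinement.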