# Reconciliation Divergence Runbook

## Incident description
The reconciliation loop detects mismatches between Supabase orders/positions and exchange open orders, which means the database has drifted from exchange truth.

## Symptoms
- `reconciliationMismatchCount`, `reconciliationMissingFromExchange`, or `reconciliationMissingInDb` metrics rise.
- Logs show lifecycle-safe handlers executing to correct entry/exit states.
- Dashboard shows active orders or positions that do not exist on the exchange.

## Metrics to check
- `/internal/health` → `reconciliationLoopHealthy`, `reconciliationMismatchCount`, `reconciliationLastRun`.
- `/metrics` → `reconciliation_mismatch_total`, `reconciliation_missing_from_exchange_total`, `reconciliation_missing_in_db_total`.
- `/internal/health` runtime fields:
  - `reconciliationParityMismatchTrades`
  - `reconciliationParityQuarantinedTrades`
  - `reconciliationParityAutoClosedTrades`
  - `reconciliationParityMaxMismatchNotionalUsd`
  - `reconciliationParityTotalMismatchNotionalUsd`
  - `reconciliationIntegrityWatchdogTriggered`

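The health fields listed above can be triaged with a small helper. The following is a minimal sketch, not part of the service: it assumes `/internal/health` returns a flat JSON object with those field names, that the boolean fields are JSON booleans, and that the count fields are numbers. The function name `triage_health` is hypothetical.

```python
from __future__ import annotations

from typing import Any

# Field names come from this runbook. The flat JSON layout and the value
# types (booleans and numbers) are assumptions made for illustration.
COUNT_FIELDS = (
    "reconciliationMismatchCount",
    "reconciliationParityMismatchTrades",
    "reconciliationParityQuarantinedTrades",
)


def triage_health(payload: dict[str, Any]) -> list[str]:
    """Return human-readable findings for a decoded /internal/health payload."""
    findings: list[str] = []
    if payload.get("reconciliationLoopHealthy") is not True:
        findings.append("reconciliation loop is not reporting healthy")
    if payload.get("reconciliationIntegrityWatchdogTriggered") is True:
        findings.append("integrity watchdog has triggered")
    for field in COUNT_FIELDS:
        value = payload.get(field) or 0  # treat a missing or null field as 0
        if value > 0:
            findings.append(f"{field} = {value}")
    return findings
```

An empty list means none of the checked fields indicate divergence. Fetching the payload (authentication, host, transport) is deliberately left out because the runbook does not specify it.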
## Automated parity heartbeat (ghost self-healing)
- Feature flag: `ENABLE_RECON_POSITION_PARITY_HEARTBEAT=true` (default is `true`; set `false` only for controlled rollback).
- Confirmation gate: `RECON_POSITION_PARITY_CONFIRMATIONS` (default `3` consecutive checks).
- Attribution safety gate: `RECON_POSITION_PARITY_REQUIRE_SUBTAG_ATTRIBUTION` (default `true`).
- Watchdog threshold: `RECON_POSITION_PARITY_MAX_NOTIONAL_PCT` (default `0.5` of allocated capital).
- Auto-resume gate: `ENABLE_RECON_WATCHDOG_AUTO_RESUME=true`.
- Auto-resume delay: `RECON_WATCHDOG_AUTO_RESUME_MIN_PAUSE_MS` (default `900000`, i.e. 15 minutes).
- Auto-resume clean streak: `RECON_WATCHDOG_AUTO_RESUME_CLEAN_CYCLES` (default `2`).
- Auto-resume cooldown: `RECON_WATCHDOG_AUTO_RESUME_COOLDOWN_MS` (default `1800000`, i.e. 30 minutes).
- Dry-run mode: `RECON_POSITION_PARITY_DRY_RUN=true` to observe without applying synthetic exits.

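To make the defaults above concrete, here is a minimal sketch of how these variables could be read into a typed config. It is an illustration only, not the service's actual loader: `ParityConfig`, `load_parity_config`, and `_flag` are hypothetical names, and the boolean parsing rule (only the string `true` enables a flag) is an assumption. The runbook does not state defaults for `ENABLE_RECON_WATCHDOG_AUTO_RESUME` or `RECON_POSITION_PARITY_DRY_RUN`; the sketch assumes `true` and `false` respectively.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a boolean flag; only the string 'true' enables it (assumed rule)."""
    raw = env.get(name)
    return default if raw is None else raw.strip().lower() == "true"


@dataclass(frozen=True)
class ParityConfig:
    heartbeat_enabled: bool
    confirmations: int
    require_subtag_attribution: bool
    max_notional_pct: float
    auto_resume_enabled: bool
    auto_resume_min_pause_ms: int
    auto_resume_clean_cycles: int
    auto_resume_cooldown_ms: int
    dry_run: bool


def load_parity_config(env: Mapping[str, str] = os.environ) -> ParityConfig:
    """Read the parity heartbeat settings, falling back to documented defaults."""
    return ParityConfig(
        heartbeat_enabled=_flag(env, "ENABLE_RECON_POSITION_PARITY_HEARTBEAT", True),
        confirmations=int(env.get("RECON_POSITION_PARITY_CONFIRMATIONS", "3")),
        require_subtag_attribution=_flag(
            env, "RECON_POSITION_PARITY_REQUIRE_SUBTAG_ATTRIBUTION", True
        ),
        max_notional_pct=float(env.get("RECON_POSITION_PARITY_MAX_NOTIONAL_PCT", "0.5")),
        # Default not stated in the runbook; True is an assumption.
        auto_resume_enabled=_flag(env, "ENABLE_RECON_WATCHDOG_AUTO_RESUME", True),
        auto_resume_min_pause_ms=int(
            env.get("RECON_WATCHDOG_AUTO_RESUME_MIN_PAUSE_MS", "900000")
        ),
        auto_resume_clean_cycles=int(
            env.get("RECON_WATCHDOG_AUTO_RESUME_CLEAN_CYCLES", "2")
        ),
        auto_resume_cooldown_ms=int(
            env.get("RECON_WATCHDOG_AUTO_RESUME_COOLDOWN_MS", "1800000")
        ),
        # Default not stated in the runbook; False is an assumption.
        dry_run=_flag(env, "RECON_POSITION_PARITY_DRY_RUN", False),
    )
```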
Heartbeat behavior:
- Detects ghost lifecycle slices where virtual open qty remains but exchange position is effectively zero.
- Requires consecutive mismatch confirmations before synthetic EXIT reconciliation.
- Enforces sub-tag attribution before any synthetic close; unattributed slices are quarantined.
- Triggers integrity watchdog pause when cumulative mismatch notional exceeds configured capital ratio.
- Auto-resumes trading only when pause source is parity watchdog and reconciliation stays clean for required consecutive cycles.

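The heartbeat rules above can be read as three small decisions. The sketch below is one possible reading under stated assumptions, not the service's implementation: all names (`SliceState`, `heartbeat_action`, `watchdog_should_pause`, `can_auto_resume`, the `"parity_watchdog"` source label, and the returned action strings) are hypothetical, "effectively zero" is modeled with an assumed epsilon, the attribution check is assumed to run only after the confirmation gate passes, and the auto-resume cooldown is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SliceState:
    """Per-slice counter of consecutive ghost detections."""
    consecutive_mismatches: int = 0


def heartbeat_action(
    state: SliceState,
    *,
    virtual_open_qty: float,
    exchange_qty: float,
    has_subtag_attribution: bool,
    confirmations: int = 3,
    require_subtag_attribution: bool = True,
    dry_run: bool = False,
    qty_epsilon: float = 1e-9,  # assumed tolerance for "effectively zero"
) -> str:
    """Decide what one heartbeat cycle does with one lifecycle slice."""
    is_ghost = virtual_open_qty > qty_epsilon and abs(exchange_qty) <= qty_epsilon
    if not is_ghost:
        state.consecutive_mismatches = 0  # any clean check resets the streak
        return "clean"
    state.consecutive_mismatches += 1
    if state.consecutive_mismatches < confirmations:
        return "pending"
    if require_subtag_attribution and not has_subtag_attribution:
        return "quarantine"
    return "dry_run" if dry_run else "synthetic_exit"


def watchdog_should_pause(
    total_mismatch_notional_usd: float,
    allocated_capital_usd: float,
    max_notional_pct: float = 0.5,
) -> bool:
    """Pause when cumulative mismatch notional exceeds the capital ratio."""
    return total_mismatch_notional_usd > max_notional_pct * allocated_capital_usd


def can_auto_resume(
    *,
    pause_source: str,
    paused_for_ms: int,
    clean_cycles: int,
    min_pause_ms: int = 900_000,
    required_clean_cycles: int = 2,
) -> bool:
    """Resume only for parity-watchdog pauses after a long enough clean streak."""
    return (
        pause_source == "parity_watchdog"
        and paused_for_ms >= min_pause_ms
        and clean_cycles >= required_clean_cycles
    )
```

With the default of three confirmations, a ghost slice returns `"pending"` on the first two cycles and only becomes eligible for a synthetic exit on the third, which is what keeps a single stale exchange snapshot from closing a live trade.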
## EXIT backfill safety gates
- `RECON_EXIT_BACKFILL_REQUIRE_STRONG_ATTRIBUTION=true`:
  - only uses exchange fills that are attributable to the profile/trade (`sub_tag`, deterministic `client_order_id`, or explicit `trade_id` hint).
  - prevents auto-backfill from consuming unrelated account activity.
- `RECON_EXIT_BACKFILL_ALLOW_HEURISTIC_MATCH=false`:
  - disables heuristic assignment modes (`single_open_trade`, `qty_unique`) by default.
  - keeps unmatched rows in `NO_GO` for operator review instead of auto-closing.
- `RECON_EXIT_BACKFILL_FILL_AFTER_TRADE_GRACE_MINUTES=5`:
  - rejects stale fill evidence that predates the lifecycle slice timestamp beyond grace.
  - blocks historical fills from being attached to newer open trades.

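The three gates above combine into a single eligibility check per fill. The following sketch shows one way they could interact. It is an assumption-laden illustration: the `Fill` and `OpenTrade` shapes, the `backfill_decision` function, and the `"GO"` and `"HEURISTIC"` labels are hypothetical (only `NO_GO` appears in this runbook), and the order in which the gates are evaluated is assumed.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Fill:
    filled_at: datetime
    sub_tag: Optional[str] = None
    client_order_id: Optional[str] = None
    trade_id_hint: Optional[str] = None


@dataclass(frozen=True)
class OpenTrade:
    trade_id: str
    sub_tag: str
    expected_client_order_id: str  # the deterministic ID the trade would have used
    opened_at: datetime            # lifecycle slice timestamp


def backfill_decision(
    fill: Fill,
    trade: OpenTrade,
    *,
    require_strong_attribution: bool = True,
    allow_heuristic_match: bool = False,
    grace_minutes: int = 5,
) -> str:
    """Classify one exchange fill as evidence for closing one open trade."""
    # Gate 1: reject fills that predate the slice by more than the grace window.
    if fill.filled_at < trade.opened_at - timedelta(minutes=grace_minutes):
        return "NO_GO"
    # Gate 2: strong attribution via sub_tag, client_order_id, or trade_id hint.
    strongly_attributed = (
        fill.sub_tag == trade.sub_tag
        or fill.client_order_id == trade.expected_client_order_id
        or fill.trade_id_hint == trade.trade_id
    )
    if strongly_attributed:
        return "GO"
    # Gate 3: heuristic matching only when both flags permit it.
    if require_strong_attribution or not allow_heuristic_match:
        return "NO_GO"
    return "HEURISTIC"
```

Under the default flag values every unattributed fill ends in `NO_GO`, which matches the intent that ambiguous rows wait for operator review rather than being auto-closed.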
## Immediate mitigation
1. Confirm the reconciliation lock is available for the affected profile to avoid double processing.
2. Allow the reconciliation loop to run; it will route mismatches through lifecycle-safe handlers (`reconcileEntryFill`, `reconcileExitFill`, `reconcileCancel`).
3. If divergence persists, manually inspect `trade_history` and `positions` for inconsistent state.
4. Notify stakeholders that reconciliation is running and that no manual edits should occur during the fix.

## Expected self-recovery
- Handler corrections align DB orders/positions with exchange data, and the mismatch metrics return to zero.
- The capital ledger recalculates reservations, and dashboard data becomes consistent.

## When to escalate
- Mismatch metrics stay elevated after two reconciliation runs.
- Reconciliation lock contention prevents the loop from running.
- Exchanges report stale or unknown fills after reconciliation.

Escalate to the trading engineering lead and reference `docs/runbooks/reconciliation.md` and `docs/runbooks/lifecycle-incident.md` for follow-up.

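The escalation criteria above can be encoded as a simple check for alerting or on-call tooling. This is a minimal sketch with hypothetical names (`should_escalate` and its parameters). It assumes the caller records the mismatch count observed after each reconciliation run since the incident began, oldest first.

```python
from __future__ import annotations

from typing import Sequence


def should_escalate(
    mismatch_counts_per_run: Sequence[int],
    *,
    lock_contended: bool = False,
    stale_or_unknown_fills: bool = False,
    runs_required: int = 2,
) -> bool:
    """Return True when any escalation criterion from the runbook is met."""
    recent = mismatch_counts_per_run[-runs_required:]
    # "Stay elevated" is read as: the last N completed runs all ended non-zero.
    still_elevated = len(recent) == runs_required and all(c > 0 for c in recent)
    return still_elevated or lock_contended or stale_or_unknown_fills
```

For example, counts of `[5, 3]` after two runs escalate, while `[5, 0]` do not because the second run cleared the mismatches.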
## What NOT to do
- Do not manually patch `orders` or `positions` tables while reconciliation is active.
- Do not disable the reconciliation loop; divergence will only grow.
- Do not trigger new ENTRY/EXIT flows for the affected profile until reconciliation completes.