Reconciliation Divergence Runbook
Incident description
The reconciliation loop detects mismatches between Supabase orders/positions and the exchange's open orders. This means the database has drifted from the exchange, which is the source of truth.
Symptoms
- `reconciliationMismatchCount`, `reconciliationMissingFromExchange`, or `reconciliationMissingInDb` metrics rise.
- Logs show lifecycle-safe handlers executing to correct entry/exit states.
- Dashboard shows active orders or positions that do not exist on the exchange.
Metrics to check
- `/internal/health`: `reconciliationLoopHealthy`, `reconciliationMismatchCount`, `reconciliationLastRun`.
- `/metrics`: `reconciliation_mismatch_total`, `reconciliation_missing_from_exchange_total`, `reconciliation_missing_in_db_total`.
- `/internal/health` runtime fields:
  - `reconciliationParityMismatchTrades`
  - `reconciliationParityQuarantinedTrades`
  - `reconciliationParityAutoClosedTrades`
  - `reconciliationParityMaxMismatchNotionalUsd`
  - `reconciliationParityTotalMismatchNotionalUsd`
  - `reconciliationIntegrityWatchdogTriggered`
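The checks above can be scripted. The sketch below is a hypothetical TypeScript helper that turns an already-fetched `/internal/health` payload into a list of findings. Only the field names come from this runbook. The `ReconciliationHealth` type, the assumption that `reconciliationLastRun` is an ISO-8601 string, and the `maxStalenessMs` threshold are illustrative assumptions, not documented behavior.

```typescript
// Hypothetical payload shape: the field names come from the runbook,
// the types are assumptions.
interface ReconciliationHealth {
  reconciliationLoopHealthy: boolean;
  reconciliationMismatchCount: number;
  reconciliationLastRun: string; // assumed ISO-8601 timestamp
  reconciliationParityQuarantinedTrades: number;
  reconciliationIntegrityWatchdogTriggered: boolean;
}

// Returns findings worth acting on; an empty array means nothing stood out.
function triageHealth(
  health: ReconciliationHealth,
  now: Date,
  maxStalenessMs: number, // hypothetical threshold, not a documented setting
): string[] {
  const findings: string[] = [];
  if (!health.reconciliationLoopHealthy) {
    findings.push("reconciliation loop reports unhealthy");
  }
  const lastRunMs = Date.parse(health.reconciliationLastRun);
  if (Number.isNaN(lastRunMs) || now.getTime() - lastRunMs > maxStalenessMs) {
    findings.push("reconciliationLastRun is unparseable or stale");
  }
  if (health.reconciliationMismatchCount > 0) {
    findings.push(`mismatch count is ${health.reconciliationMismatchCount}`);
  }
  if (health.reconciliationParityQuarantinedTrades > 0) {
    findings.push("quarantined parity slices need operator review");
  }
  if (health.reconciliationIntegrityWatchdogTriggered) {
    findings.push("integrity watchdog has triggered");
  }
  return findings;
}
```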
Automated parity heartbeat (ghost self-healing)
- Feature flag: `ENABLE_RECON_POSITION_PARITY_HEARTBEAT=true` (default is `true`; set `false` only for a controlled rollback).
- Confirmation gate: `RECON_POSITION_PARITY_CONFIRMATIONS` (default `3` consecutive checks).
- Attribution safety gate: `RECON_POSITION_PARITY_REQUIRE_SUBTAG_ATTRIBUTION` (default `true`).
- Watchdog threshold: `RECON_POSITION_PARITY_MAX_NOTIONAL_PCT` (default `0.5` of allocated capital).
- Auto-resume gate: `ENABLE_RECON_WATCHDOG_AUTO_RESUME=true`.
- Auto-resume delay: `RECON_WATCHDOG_AUTO_RESUME_MIN_PAUSE_MS` (default `900000`, i.e. 15 minutes).
- Auto-resume clean streak: `RECON_WATCHDOG_AUTO_RESUME_CLEAN_CYCLES` (default `2`).
- Auto-resume cooldown: `RECON_WATCHDOG_AUTO_RESUME_COOLDOWN_MS` (default `1800000`, i.e. 30 minutes).
- Dry-run mode: `RECON_POSITION_PARITY_DRY_RUN=true` to observe mismatches without applying synthetic exits.
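The following is a minimal sketch of how the settings above could be parsed. It assumes they are read from environment variables as strings. The variable names and documented defaults come from this list. The runbook does not state defaults for `ENABLE_RECON_WATCHDOG_AUTO_RESUME` or `RECON_POSITION_PARITY_DRY_RUN`, so the sketch assumes `false` for both. `loadParityConfig` and its boolean parsing rule are hypothetical.

```typescript
type Env = Record<string, string | undefined>;

interface ParityConfig {
  heartbeatEnabled: boolean;
  confirmations: number;
  requireSubtagAttribution: boolean;
  maxNotionalPct: number; // raw value; the runbook does not say if 0.5 means 0.5% or 50%
  autoResumeEnabled: boolean;
  autoResumeMinPauseMs: number;
  autoResumeCleanCycles: number;
  autoResumeCooldownMs: number;
  dryRun: boolean;
}

// Treats "true" or "1" (case-insensitive) as true; unset or empty uses the fallback.
function readBool(env: Env, key: string, fallback: boolean): boolean {
  const raw = env[key]?.trim().toLowerCase();
  if (raw === undefined || raw === "") return fallback;
  return raw === "true" || raw === "1";
}

// Unset or empty uses the fallback; anything non-numeric fails fast.
function readNumber(env: Env, key: string, fallback: number): number {
  const raw = env[key]?.trim();
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value)) throw new Error(`${key} must be numeric, got "${raw}"`);
  return value;
}

function loadParityConfig(env: Env): ParityConfig {
  return {
    heartbeatEnabled: readBool(env, "ENABLE_RECON_POSITION_PARITY_HEARTBEAT", true),
    confirmations: readNumber(env, "RECON_POSITION_PARITY_CONFIRMATIONS", 3),
    requireSubtagAttribution: readBool(env, "RECON_POSITION_PARITY_REQUIRE_SUBTAG_ATTRIBUTION", true),
    maxNotionalPct: readNumber(env, "RECON_POSITION_PARITY_MAX_NOTIONAL_PCT", 0.5),
    autoResumeEnabled: readBool(env, "ENABLE_RECON_WATCHDOG_AUTO_RESUME", false), // default assumed
    autoResumeMinPauseMs: readNumber(env, "RECON_WATCHDOG_AUTO_RESUME_MIN_PAUSE_MS", 900000), // 15 min
    autoResumeCleanCycles: readNumber(env, "RECON_WATCHDOG_AUTO_RESUME_CLEAN_CYCLES", 2),
    autoResumeCooldownMs: readNumber(env, "RECON_WATCHDOG_AUTO_RESUME_COOLDOWN_MS", 1800000), // 30 min
    dryRun: readBool(env, "RECON_POSITION_PARITY_DRY_RUN", false), // default assumed
  };
}
```

In a Node.js service this would typically be called as `loadParityConfig(process.env)`. The sketch fails fast on a non-numeric value by design, so that a typo in a threshold does not silently fall back to the default.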
Heartbeat behavior:
- Detects ghost lifecycle slices, where virtual open quantity remains but the exchange position is effectively zero.
- Requires `RECON_POSITION_PARITY_CONFIRMATIONS` consecutive mismatch confirmations before applying a synthetic EXIT reconciliation.
- Enforces sub-tag attribution before any synthetic close; unattributed slices are quarantined.
- Triggers an integrity watchdog pause when cumulative mismatch notional exceeds the configured capital ratio (`RECON_POSITION_PARITY_MAX_NOTIONAL_PCT`).
- Auto-resumes trading only when the pause source is the parity watchdog and reconciliation stays clean for the required number of consecutive cycles (`RECON_WATCHDOG_AUTO_RESUME_CLEAN_CYCLES`).
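The heartbeat rules above can be sketched as three small pieces. This is an illustration under stated assumptions, not the actual implementation. `LifecycleSlice`, the `"parity_watchdog"` pause-source label, and the `1e-9` tolerance for "effectively zero" are hypothetical. The runbook does not say whether the `0.5` default for `RECON_POSITION_PARITY_MAX_NOTIONAL_PCT` means 0.5% or 50%, so `watchdogShouldPause` takes a plain fraction and leaves that conversion to the caller. The cooldown is assumed to be the minimum gap between two consecutive auto-resumes.

```typescript
// Hypothetical slice shape; not the real schema.
interface LifecycleSlice {
  tradeId: string;
  subTag: string | null; // null: slice cannot be attributed to a sub-tag
  virtualOpenQty: number;
}

type SliceAction = "none" | "pending" | "quarantine" | "synthetic_exit" | "dry_run_exit";

// Counts consecutive ghost observations per trade across heartbeat ticks.
class ParityHeartbeat {
  private readonly streaks = new Map<string, number>();
  private readonly confirmations: number;
  private readonly requireSubtag: boolean;
  private readonly dryRun: boolean;
  private readonly qtyEpsilon: number; // assumed "effectively zero" tolerance

  constructor(confirmations: number, requireSubtag: boolean, dryRun: boolean, qtyEpsilon = 1e-9) {
    this.confirmations = confirmations;
    this.requireSubtag = requireSubtag;
    this.dryRun = dryRun;
    this.qtyEpsilon = qtyEpsilon;
  }

  evaluate(slice: LifecycleSlice, exchangeQty: number): SliceAction {
    const ghost =
      slice.virtualOpenQty > this.qtyEpsilon && Math.abs(exchangeQty) <= this.qtyEpsilon;
    if (!ghost) {
      this.streaks.delete(slice.tradeId); // a clean check resets the streak
      return "none";
    }
    const streak = (this.streaks.get(slice.tradeId) ?? 0) + 1;
    this.streaks.set(slice.tradeId, streak);
    if (streak < this.confirmations) return "pending";
    if (this.requireSubtag && !slice.subTag) return "quarantine";
    return this.dryRun ? "dry_run_exit" : "synthetic_exit";
  }
}

// Pauses when cumulative absolute mismatch notional exceeds the allowed
// fraction of allocated capital (0.005 means 0.5%).
function watchdogShouldPause(
  mismatchNotionalsUsd: number[],
  allocatedCapitalUsd: number,
  maxNotionalFraction: number,
): boolean {
  const total = mismatchNotionalsUsd.reduce((sum, n) => sum + Math.abs(n), 0);
  return total > allocatedCapitalUsd * maxNotionalFraction;
}

interface PauseState {
  source: string; // hypothetical label; "parity_watchdog" marks parity pauses
  pausedAtMs: number;
  consecutiveCleanCycles: number;
  lastAutoResumeAtMs: number | null;
}

interface AutoResumeConfig {
  enabled: boolean;
  minPauseMs: number;
  requiredCleanCycles: number;
  cooldownMs: number;
}

function canAutoResume(state: PauseState, nowMs: number, cfg: AutoResumeConfig): boolean {
  if (!cfg.enabled || state.source !== "parity_watchdog") return false;
  if (nowMs - state.pausedAtMs < cfg.minPauseMs) return false;
  if (state.consecutiveCleanCycles < cfg.requiredCleanCycles) return false;
  // Assumed cooldown semantics: minimum gap between two auto-resumes.
  const last = state.lastAutoResumeAtMs;
  return last === null || nowMs - last >= cfg.cooldownMs;
}
```

In this sketch, a slice is quarantined only after its confirmation streak completes. A transient mismatch therefore never lands in quarantine. That ordering is a choice made for the sketch, not something the runbook specifies.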
EXIT backfill safety gates
- `RECON_EXIT_BACKFILL_REQUIRE_STRONG_ATTRIBUTION=true`:
  - Only uses exchange fills that are attributable to the profile/trade (`sub_tag`, deterministic `client_order_id`, or an explicit `trade_id` hint).
  - Prevents auto-backfill from consuming unrelated account activity.
- `RECON_EXIT_BACKFILL_ALLOW_HEURISTIC_MATCH=false`:
  - Disables the heuristic assignment modes (`single_open_trade`, `qty_unique`) by default.
  - Keeps unmatched rows in `NO_GO` for operator review instead of auto-closing them.
- `RECON_EXIT_BACKFILL_FILL_AFTER_TRADE_GRACE_MINUTES=5`:
  - Rejects stale fill evidence that predates the lifecycle slice timestamp by more than the grace period.
  - Blocks historical fills from being attached to newer open trades.
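The three gates above could combine into a fill filter like the following sketch. The `ExchangeFill` and `OpenTrade` shapes and the `exit-<tradeId>` client order ID format are hypothetical. Only these pieces come from this runbook: the attribution keys (`sub_tag`, deterministic `client_order_id`, `trade_id` hint), the grace window, and the `NO_GO` outcome. The sketch models only the documented defaults, so it deliberately leaves out the heuristic modes (`single_open_trade`, `qty_unique`).

```typescript
// Hypothetical shapes; only the attribution keys come from the runbook.
interface ExchangeFill {
  fillId: string;
  subTag: string | null;
  clientOrderId: string | null;
  tradeIdHint: string | null;
  timestampMs: number;
}

interface OpenTrade {
  tradeId: string;
  subTag: string;
  sliceOpenedAtMs: number; // lifecycle slice timestamp
}

// Hypothetical deterministic client_order_id scheme; the real format is not documented here.
function expectedClientOrderId(trade: OpenTrade): string {
  return `exit-${trade.tradeId}`;
}

// Strong attribution: explicit trade_id hint, deterministic client_order_id, or sub_tag.
function isStronglyAttributed(fill: ExchangeFill, trade: OpenTrade): boolean {
  return (
    fill.tradeIdHint === trade.tradeId ||
    fill.clientOrderId === expectedClientOrderId(trade) ||
    fill.subTag === trade.subTag
  );
}

// Models the documented defaults: strong attribution required, heuristics off.
// Returns the usable fills, or "NO_GO" so an operator reviews the row instead.
function selectBackfillFills(
  fills: ExchangeFill[],
  trade: OpenTrade,
  graceMinutes = 5,
): ExchangeFill[] | "NO_GO" {
  // Fills older than the slice timestamp minus the grace window are stale evidence.
  const earliestAllowedMs = trade.sliceOpenedAtMs - graceMinutes * 60_000;
  const usable = fills.filter(
    (f) => f.timestampMs >= earliestAllowedMs && isStronglyAttributed(f, trade),
  );
  return usable.length > 0 ? usable : "NO_GO";
}
```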
Immediate mitigation
- Confirm the reconciliation lock is available for the affected profile to avoid double processing.
- Allow the reconciliation loop to run; it routes mismatches through the lifecycle-safe handlers (`reconcileEntryFill`, `reconcileExitFill`, `reconcileCancel`).
- If divergence persists, manually inspect the `trade_history` and `positions` tables for inconsistent state. Inspection is read-only: do not patch them while reconciliation is active.
- Notify stakeholders that reconciliation is running and that no manual edits should occur during the fix.
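The sketch below shows the order of operations in the first two steps: take the per-profile lock, then route each mismatch to a lifecycle-safe handler. `ProfileLocks` is an in-process stand-in for illustration only. A lock shared across workers would more plausibly live in the database, for example a Postgres advisory lock taken with `pg_try_advisory_lock`, but that is an assumption about this system. The mismatch kinds and the `routeMismatches` helper are hypothetical. The handler names come from this runbook; the sketch returns them rather than calling them.

```typescript
// Hypothetical classification of mismatches; the handler names are from the runbook.
type MismatchKind = "entry_filled" | "exit_filled" | "cancelled";

const HANDLER_FOR: Record<MismatchKind, string> = {
  entry_filled: "reconcileEntryFill",
  exit_filled: "reconcileExitFill",
  cancelled: "reconcileCancel",
};

// In-process stand-in for the per-profile reconciliation lock.
class ProfileLocks {
  private readonly held = new Set<string>();

  tryAcquire(profileId: string): boolean {
    if (this.held.has(profileId)) return false;
    this.held.add(profileId);
    return true;
  }

  release(profileId: string): void {
    this.held.delete(profileId);
  }
}

// Returns the handler to run for each mismatch, or null when another
// worker already holds the profile's lock (avoids double processing).
function routeMismatches(
  locks: ProfileLocks,
  profileId: string,
  kinds: MismatchKind[],
): string[] | null {
  if (!locks.tryAcquire(profileId)) return null;
  try {
    return kinds.map((kind) => HANDLER_FOR[kind]);
  } finally {
    locks.release(profileId); // always release, even if routing throws
  }
}
```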
Expected self-recovery
- Handler corrections align DB orders/positions with exchange data, and the mismatch metrics return to zero.
- The capital ledger recalculates reservations, and dashboard data becomes consistent.
When to escalate
- Mismatch metrics stay elevated after two reconciliation runs.
- Reconciliation lock contention prevents the loop from running.
- Exchanges report stale or unknown fills after reconciliation.

Escalate to the trading engineering lead and reference `docs/runbooks/reconciliation.md` and `docs/runbooks/lifecycle-incident.md` for follow-up.
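The escalation criteria above can be checked mechanically from a short history of reconciliation attempts. This is a sketch: the `RunSample` shape is hypothetical. The runbook does not define a threshold for lock contention, so the sketch assumes contention means the two most recent attempts both failed to acquire the lock.

```typescript
// One entry per reconciliation attempt, oldest first (hypothetical shape).
interface RunSample {
  lockAcquired: boolean;
  mismatchCount: number;
  staleOrUnknownFills: number;
}

function escalationReasons(runs: RunSample[]): string[] {
  const reasons: string[] = [];
  const completed = runs.filter((r) => r.lockAcquired);

  // Mismatches still present after two completed runs.
  const lastTwo = completed.slice(-2);
  if (lastTwo.length === 2 && lastTwo.every((r) => r.mismatchCount > 0)) {
    reasons.push("mismatch metrics elevated after two runs");
  }

  // Assumed contention rule: the two latest attempts could not take the lock.
  const lastAttempts = runs.slice(-2);
  if (lastAttempts.length === 2 && lastAttempts.every((r) => !r.lockAcquired)) {
    reasons.push("lock contention is blocking the loop");
  }

  // The latest completed run still saw stale or unknown fills.
  const latest = completed[completed.length - 1];
  if (latest !== undefined && latest.staleOrUnknownFills > 0) {
    reasons.push("exchange reports stale or unknown fills");
  }
  return reasons;
}
```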
What NOT to do
- Do not manually patch the `orders` or `positions` tables while reconciliation is active.
- Do not disable the reconciliation loop; divergence will only grow.
- Do not trigger new ENTRY/EXIT flows for the affected profile until reconciliation completes.
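One way to enforce the last rule in code is a guard at the start of every ENTRY/EXIT flow. This is a hypothetical sketch. How the service tracks which profiles are mid-reconciliation (here a plain `ReadonlySet`) is an assumption.

```typescript
type FlowKind = "ENTRY" | "EXIT";

// Hypothetical guard called before any ENTRY/EXIT flow starts; how the set of
// profiles under reconciliation is maintained is an assumption.
function assertFlowAllowed(
  profileId: string,
  kind: FlowKind,
  reconcilingProfiles: ReadonlySet<string>,
): void {
  if (reconcilingProfiles.has(profileId)) {
    throw new Error(`${kind} blocked for ${profileId}: reconciliation in progress`);
  }
}
```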