ENTRY Lock Contention Spike Runbook
Incident description
A profile experiences repeated failures to acquire the row-based entry lock, blocking ENTRY signals and indicating pressure on horizontal scaling.
Symptoms
- lockContentionCount increments in /internal/health and /metrics.
- Logs show fn_try_acquire_entry_lock_row returning false with owner tokens different from the caller.
- Trading loop reports "lock acquisition failed" warnings and may skip signals.
Metrics to check
- /internal/health: lockContentionCount, tradingLoopHealthy, reconciliationLoopHealthy.
- /metrics: entry_lock_contention_total, lock_acquisition_latency_seconds, entry_lock_holder_info (if available).
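The health fields above can be compared against a known baseline with a small helper. This is a minimal sketch that assumes /internal/health returns a flat JSON object keyed by the field names listed above; the payload shape, the function name summarize_health, and the baseline parameter are assumptions, not part of the service.

```python
import json


def summarize_health(payload: str, baseline_contention: int = 0) -> dict:
    """Summarize a health payload (assumed to be flat JSON) for triage."""
    data = json.loads(payload)
    # Missing fields are treated as 0 / unhealthy so gaps are not mistaken for "fine".
    count = int(data.get("lockContentionCount", 0))
    return {
        "contention_delta": count - baseline_contention,
        "trading_ok": bool(data.get("tradingLoopHealthy", False)),
        "reconciliation_ok": bool(data.get("reconciliationLoopHealthy", False)),
    }
```

A positive contention_delta between two samples taken a TTL apart suggests contention is still occurring rather than already resolved.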
Immediate mitigation
- Identify the profile_id and symbol from logs; confirm if another worker legitimately holds the lock.
- Check whether the existing lock owner is still alive or has crashed; use Supabase to inspect the entry_locks TTL.
- Wait for TTL expiry (default 30s) before retrying if the owner appears stuck.
- Avoid forcing lock release unless owner is confirmed dead; manual deletion risks concurrent exchange submission.
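The mitigation steps above amount to a small decision rule, sketched here under assumptions: the lock row is assumed to expose an expiry timestamp (called expires_at here, a hypothetical name), and "owner confirmed dead" is a judgment the operator supplies, not something the code detects.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_expiry(expires_at: datetime, now: datetime) -> float:
    """Remaining TTL in seconds; 0.0 once the lock has expired."""
    return max(0.0, (expires_at - now).total_seconds())


def next_action(expires_at: datetime, now: datetime, owner_confirmed_dead: bool) -> str:
    """Map the runbook's mitigation guidance onto one of three actions."""
    if owner_confirmed_dead:
        # Forcing release is only acceptable once the owner is confirmed dead.
        return "manual_release_permissible"
    if seconds_until_expiry(expires_at, now) > 0:
        # Owner may be legitimate or merely stuck: let the TTL run out.
        return "wait_for_ttl"
    # Expired but still present and the owner is not confirmed dead: do not delete.
    return "escalate"
```

Both timestamps should be timezone-aware and in the same zone (UTC is the safest choice) so the subtraction is meaningful.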
Expected self-recovery
- The TTL expires, the lock row is updated or deleted, and the next signal acquires the lock.
- Metrics return to baseline if contention was transient.
When to escalate
- Contention persists beyond three TTL cycles (90s).
- Multiple profiles report contention simultaneously.
- Lock rows show expired timestamps but fail to refresh.
Escalate to Platform Ops and refer to docs/runbooks/lock-timeout.md (if it exists) for lock escalation.
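The escalation criteria can be expressed as a simple check, shown here as a sketch. The 30s TTL and three-cycle threshold come from this runbook; the function name and its three inputs are hypothetical and would need to be gathered by the operator from logs, metrics, and the entry_locks table.

```python
TTL_SECONDS = 30      # default TTL stated in this runbook
MAX_TTL_CYCLES = 3    # escalate once contention outlasts three cycles (90s)


def escalation_reasons(contention_seconds: float,
                       affected_profiles: int,
                       stale_expired_rows: int) -> list[str]:
    """Return the escalation criteria that currently apply (empty list: keep waiting)."""
    reasons = []
    if contention_seconds > TTL_SECONDS * MAX_TTL_CYCLES:
        reasons.append("contention beyond three TTL cycles")
    if affected_profiles > 1:
        reasons.append("multiple profiles affected")
    if stale_expired_rows > 0:
        reasons.append("expired lock rows not refreshing")
    return reasons
```

Any non-empty result corresponds to at least one of the bullets above and is a reason to hand off to Platform Ops.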
What NOT to do
- Do not delete lock rows manually while other workers are active.
- Do not restart all workers; indiscriminate restarts magnify contention.
- Do not trigger new ENTRY signals for the affected profile until the lock clears.