bytelyst/learning_ai_invt_trdg

Saravana Achu Mac 3cbbd6ccaa feat: scaffold trading monorepo foundation

2026-04-04 11:18:21 -07:00


ENTRY Lock Contention Spike Runbook

Incident description

A profile experiences repeated failures to acquire the row-based entry lock, blocking ENTRY signals and indicating pressure on horizontal scaling.

Symptoms

lockContentionCount increments in /internal/health and /metrics.
Logs show fn_try_acquire_entry_lock_row returning false with an owner token that differs from the caller's.
Trading loop reports "lock acquisition failed" warnings and may skip signals.

Metrics to check

/internal/health → lockContentionCount, tradingLoopHealthy, reconciliationLoopHealthy.
/metrics → entry_lock_contention_total, lock_acquisition_latency_seconds, entry_lock_holder_info (if available).
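
A minimal sketch of reading the signals listed above. It assumes /internal/health returns a JSON object with the three fields as top-level keys and that /metrics uses the Prometheus text exposition format; neither is confirmed by this runbook, and the helper names are hypothetical. Fetching the two endpoints is left out so the parsing can be checked offline.

```python
import json


def summarize_health(health_json: str) -> dict:
    """Pull the three runbook fields out of an /internal/health payload.

    Assumption: the fields are top-level JSON keys. Missing keys come back as None.
    """
    payload = json.loads(health_json)
    return {
        "lockContentionCount": payload.get("lockContentionCount"),
        "tradingLoopHealthy": payload.get("tradingLoopHealthy"),
        "reconciliationLoopHealthy": payload.get("reconciliationLoopHealthy"),
    }


def sum_counter(metrics_text: str, name: str) -> float:
    """Sum every sample of one counter across all label sets.

    Assumption: Prometheus text format, one sample per line, with or
    without a {label="value"} section. Comment lines are skipped.
    """
    total = 0.0
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # The sample name ends at the first '{' (labels) or space (no labels).
        sample_name = line.split("{", 1)[0].split(" ", 1)[0]
        if sample_name != name:
            continue
        if "{" in line:
            rest = line.rsplit("}", 1)[-1]
        else:
            rest = line.split(" ", 1)[1]
        # The first token after the name/labels is the value.
        total += float(rest.split()[0])
    return total
```

Summing across label sets gives a fleet-wide view; to find the affected profile, inspect the individual labelled samples instead.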

Immediate mitigation

Identify the profile_id and symbol from logs; confirm whether another worker legitimately holds the lock.
Check whether the existing lock owner is still alive or has crashed; use Supabase to inspect the TTL on the matching entry_locks row.
Wait for TTL expiry (default 30s) before retrying if owner appears stuck.
Avoid forcing lock release unless owner is confirmed dead; manual deletion risks concurrent exchange submission.
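
The triage steps above can be sketched as a small decision function. The row shape (profile_id, symbol, owner_token, expires_at) is a hypothetical guess at the entry_locks columns, and the returned action names are illustrative only; the rule it encodes is the one stated above: wait for the TTL, and consider manual release only when the owner is confirmed dead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class LockRow:
    # Hypothetical shape of an entry_locks row; real column names may differ.
    profile_id: str
    symbol: str
    owner_token: str
    expires_at: datetime


def triage_lock(row: LockRow, now: datetime, owner_alive: Optional[bool]) -> str:
    """Map a lock row to a runbook action.

    owner_alive: True if the owner is confirmed healthy, False if it is
    confirmed dead, None if its state is unknown or it appears stuck.
    """
    if row.expires_at <= now:
        # TTL already elapsed: the next signal should be able to acquire.
        return "retry"
    if owner_alive is True:
        # Another worker legitimately holds the lock.
        return "wait_for_owner"
    if owner_alive is False:
        # Only a confirmed-dead owner makes manual release acceptable.
        return "manual_release_permitted"
    # Unknown or stuck owner: never force release, let the TTL run out.
    return "wait_for_ttl"


def seconds_until_expiry(row: LockRow, now: datetime) -> float:
    """Remaining TTL in seconds, never negative."""
    return max(0.0, (row.expires_at - now).total_seconds())
```

Use timezone-aware datetimes for both expires_at and now; comparing naive and aware values raises a TypeError in Python.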

Expected self-recovery

Once the TTL expires, the stale lock row is refreshed or removed, and the next signal acquires the lock.
Metrics return to baseline if contention was transient.
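
Because entry_lock_contention_total is a cumulative counter (assuming Prometheus counter semantics), "return to baseline" means it stops increasing rather than dropping back to zero. A minimal sketch of that check over consecutive scrapes; the function names and the quiet_scrapes threshold are hypothetical:

```python
from typing import List, Sequence


def increases(totals: Sequence[float]) -> List[float]:
    """Per-scrape increase of a cumulative counter.

    A drop between scrapes is treated as a counter reset (for example a
    worker restart), so the new value itself counts as the increase.
    """
    result = []
    for prev, curr in zip(totals, totals[1:]):
        result.append(curr if curr < prev else curr - prev)
    return result


def has_recovered(totals: Sequence[float], quiet_scrapes: int = 2) -> bool:
    """True when the last `quiet_scrapes` intervals saw no new contention."""
    deltas = increases(totals)
    if len(deltas) < quiet_scrapes:
        return False  # not enough history to decide
    return all(delta == 0 for delta in deltas[-quiet_scrapes:])
```

For example, scrapes of 10, 14, 14, 14 count as recovered, while 10, 14, 14, 15 do not.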

When to escalate

Contention persists beyond three TTL cycles (90s).
Multiple profiles report contention simultaneously.
Lock rows show expired timestamps but fail to refresh.

Escalate to Platform Ops and refer to docs/runbooks/lock-timeout.md (if it exists) for lock escalation.
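
The escalation criteria can be sketched as a single check. The 30s TTL and three-cycle limit come from this runbook; the function name, parameters, and reason labels are hypothetical.

```python
from typing import List

DEFAULT_TTL_SECONDS = 30  # default TTL stated in this runbook
MAX_TTL_CYCLES = 3        # 3 cycles x 30s = 90s


def escalation_reasons(
    contention_seconds: float,
    affected_profiles: int,
    expired_rows_not_refreshing: bool,
    ttl_seconds: int = DEFAULT_TTL_SECONDS,
) -> List[str]:
    """Return the escalation criteria that are met; an empty list means keep waiting."""
    reasons = []
    if contention_seconds > MAX_TTL_CYCLES * ttl_seconds:
        reasons.append("persisted_beyond_three_ttl_cycles")
    if affected_profiles > 1:
        reasons.append("multiple_profiles_affected")
    if expired_rows_not_refreshing:
        reasons.append("expired_rows_not_refreshing")
    return reasons
```

Any non-empty result is a reason to escalate; with the default TTL, contention of exactly 90s does not yet cross the threshold.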

What NOT to do

Do not delete lock rows manually while other workers are active.
Do not restart all workers; indiscriminate restarts magnify contention.
Do not trigger new ENTRY signals for the affected profile until lock clears.