reapExpiredLeases implements the full section-25 recovery (fence the zombie
holder via a leaseEpoch bump, return the job to queued/blocked, preserve the
checkpoint), but nothing ever called it: no route, no cron, no timer. When a
factory crashed, lost network connectivity, or shut down, its in-flight job
stayed stuck in an active stage indefinitely and was never requeued; the
recovery code was dormant.
Add a process-wide background reaper (leases are queried across all products)
that runs reapExpiredLeases every 30s, started at server boot and stopped on
graceful shutdown, mirroring the diagnostics trigger-job pattern. A failing pass
is logged and retried on the next tick rather than crashing the service.
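
For illustration only, the reaper loop described above could look roughly like
the sketch below. It is not the code in this change: startLeaseReaper, the
log callback, and the overlap guard are hypothetical, and reapExpiredLeases is
passed in as a parameter because its real signature is not shown here.

```typescript
// Minimal sketch of a process-wide lease reaper (assumed shape, not the
// actual implementation). Only the name reapExpiredLeases and the 30s
// interval come from the description above; everything else is hypothetical.
type ReapFn = () => Promise<unknown>;
type LogFn = (msg: string, err?: unknown) => void;

interface LeaseReaper {
  stop(): void;
}

const REAP_INTERVAL_MS = 30_000;

function startLeaseReaper(
  reapExpiredLeases: ReapFn,
  intervalMs: number = REAP_INTERVAL_MS,
  log: LogFn = console.error,
): LeaseReaper {
  // Assumption: skip a tick if the previous pass is still running, so two
  // passes never overlap. The real reaper may or may not do this.
  let running = false;

  const tick = async (): Promise<void> => {
    if (running) return;
    running = true;
    try {
      await reapExpiredLeases();
    } catch (err) {
      // A failing pass is logged and left for the next tick to retry.
      log("lease reaper pass failed; retrying on next tick", err);
    } finally {
      running = false;
    }
  };

  const timer: ReturnType<typeof setInterval> = setInterval(() => {
    void tick();
  }, intervalMs);

  // Call stop() from the graceful-shutdown path to cancel future ticks.
  return { stop: () => clearInterval(timer) };
}
```

In this sketch the server would call startLeaseReaper once at boot and call
stop() on the returned handle during graceful shutdown.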
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>