bytelyst-devops-tools/agent-queue/lib
saravanakumardb1 2ae1af1930 feat(agent-queue): resilient lease renewal + graceful drain
Two runner-side reliability improvements (additive, opt-out via env where relevant):

- Lease-renewal retry: a renewal lost to a transient blip (timeout/5xx/proxy)
  previously let the lease expire and the coordinator reclaim a still-running job,
  wasting the work. fleet_lease_renew now retries a TRANSIENT failure a few times
  with a short backoff (AQ_FLEET_RENEW_RETRIES=2, AQ_FLEET_RENEW_BACKOFF_SEC=2);
  a 409/412 FENCE is terminal and never retried.

- Graceful drain: a new `drain` command signals a running loop (via a $STATE/draining
  flag) to stop taking NEW work (coordinator claim + local inbox), let in-flight
  jobs finish, release their leases, and exit cleanly — ideal before a deploy.
  Distinct from `stop` (kills workers immediately) and the `--drain`/--once startup
  flag (drains the queue to empty). The flag is cleared at run-loop start and on exit.

bash -n clean on both files; ./selftest.sh PASS; `drain` smoke-tested.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-06-01 12:24:45 -07:00
..
fleet-client.sh feat(agent-queue): resilient lease renewal + graceful drain 2026-06-01 12:24:45 -07:00
fleet-dash.mjs feat(agent-queue): re-point TUI dashboard at /fleet API (parity) 2026-05-30 19:47:56 -07:00
fleet-dash.test.mjs feat(agent-queue): re-point TUI dashboard at /fleet API (parity) 2026-05-30 19:47:56 -07:00