learning_ai_common_plat/services/platform-service/src
saravanakumardb1 42d27d8a4f feat(fleet): enforce job.retry — auto-requeue, backoff, and dead-letter
job.retry (max / on / backoff) was persisted but never enforced: a failed
attempt just went to `failed` and required a manual operator requeue. Now, when
a factory releases a lease reporting a failure, the coordinator applies the policy:

- retryable result (matches a retry.on class) with attempts remaining ⇒ requeue
  (queued, or blocked if deps are now unmet) with a retry backoff;
- retryable but attempts exhausted ⇒ dead_letter;
- no policy or non-retryable result (capability_mismatch/no_engine) ⇒ failed,
  exactly as before (behavior-preserving).
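
The three cases above can be sketched as a pure function. This is a minimal illustration, not the repository's `decideFailureOutcome`: the type shapes, the `Sketch` names, the reading of `max` as a total-attempt limit, and the `"timeout"` failure class in the usage note are all assumptions.

```typescript
// Hypothetical shapes; the real job.retry type in the repository may differ.
interface RetryPolicySketch {
  max: number;      // assumed: maximum number of attempts allowed
  on: string[];     // failure classes treated as retryable
  backoff?: string; // backoff spec, parsed separately
}

type FailureOutcomeSketch = "requeue" | "dead_letter" | "failed";

// Pure decision: no I/O, so it can be unit-tested in isolation.
function decideFailureOutcomeSketch(
  policy: RetryPolicySketch | undefined,
  failureClass: string,
  attemptsSoFar: number,
): FailureOutcomeSketch {
  // No policy, or a result class outside retry.on: keep the old behavior.
  if (!policy || !policy.on.includes(failureClass)) return "failed";
  // Retryable: requeue while attempts remain, otherwise dead-letter.
  return attemptsSoFar < policy.max ? "requeue" : "dead_letter";
}

// A requeued job is queued, or blocked if its dependencies are now unmet.
function requeueStatusSketch(depsMet: boolean): "queued" | "blocked" {
  return depsMet ? "queued" : "blocked";
}
```

With a policy of `{ max: 3, on: ["timeout"] }`, a first `"timeout"` failure would requeue, a third would dead-letter, and a `capability_mismatch` would go straight to `failed`.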

Backoff is honored via a new job.retryNotBefore timestamp; the scheduler skips a
queued job until it elapses (new pure isAwaitingRetryBackoff gate in selectJob).
parseBackoffMs supports "<n>", "<n>s|m|h", "<n>ms", and "exp" (30s·2^(n-1), capped
at 1h). retry_scheduled / dead_letter audit events are emitted. decideFailureOutcome
and parseBackoffMs are pure and unit-tested (retry.test.ts), plus scheduler-gate
and end-to-end releaseLease coverage.
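
As a rough sketch of the backoff parsing and the scheduler gate described above: the function names carry a `Sketch` suffix because the real signatures are not shown here. Treating a bare "<n>" as milliseconds, taking n as the 1-based attempt number, and storing retryNotBefore as an ISO 8601 string are all assumptions.

```typescript
const SECOND_MS = 1000;
const HOUR_MS = 3600 * SECOND_MS;
const UNIT_MS: Record<string, number> = {
  ms: 1,
  s: SECOND_MS,
  m: 60 * SECOND_MS,
  h: HOUR_MS,
};

// Returns the delay in milliseconds, or undefined for an unrecognized spec.
function parseBackoffMsSketch(spec: string, attempt: number): number | undefined {
  const s = spec.trim();
  if (s === "exp") {
    // 30s * 2^(n-1), capped at 1h; n is assumed to be the 1-based attempt number.
    const n = Math.max(1, attempt);
    return Math.min(30 * SECOND_MS * Math.pow(2, n - 1), HOUR_MS);
  }
  const m = /^(\d+)(ms|s|m|h)?$/.exec(s);
  if (!m) return undefined;
  // Assumption: a bare "<n>" with no unit means milliseconds.
  return Number(m[1]) * UNIT_MS[m[2] ?? "ms"];
}

// Pure gate: true while a job's retry backoff has not yet elapsed.
function isAwaitingRetryBackoffSketch(
  job: { retryNotBefore?: string },
  nowMs: number,
): boolean {
  if (!job.retryNotBefore) return false;
  const notBefore = Date.parse(job.retryNotBefore);
  return !Number.isNaN(notBefore) && nowMs < notBefore;
}
```

Passing the current time in as a parameter, rather than reading the clock inside the gate, is what keeps it pure and easy to test.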

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-06-01 11:40:33 -07:00
lib feat(fleet): M0 RU gate — cheap per-product queue version + skip-claim 2026-05-31 23:18:27 -07:00
migrations refactor(platform-service): migrate webhooks/routes and migrations/runner from cosmos.js to datastore 2026-03-02 02:02:47 -08:00
modules feat(fleet): enforce job.retry — auto-requeue, backoff, and dead-letter 2026-06-01 11:40:33 -07:00
nodemailer.d.ts feat(platform-service): add smtp email delivery and postal setup 2026-03-14 05:52:28 +00:00
server.test.ts feat(runtime): add platform runtime projection api 2026-04-04 01:14:37 -07:00
server.ts fix(fleet): schedule the lease reaper so dead-factory jobs are recovered 2026-06-01 11:11:14 -07:00