learning_ai_common_plat/services
saravanakumardb1 a6adaee835 feat(fleet): operator re-drive for dead-letter jobs + dead-letter alert/UI
Closes the loop on the retry automation. Until now, a job that exhausted its
retries landed in dead_letter with no way to recover it. This change adds:

- New `redrive` operator action: requeues the job AND grants a fresh retry budget
  by anchoring a new `attemptsBase` to the current `attempts` (and clearing any
  retryNotBefore backoff). `attempts` stays monotonic so run ids never collide; a
  plain `requeue` leaves the budget exhausted and would instantly re-dead-letter.
  The retry policy now measures used budget as `attempts - attemptsBase`.
- fleetMetrics raises a `dead_letter` warning alert when any job is dead-lettered.
- tracker-web: a "Re-drive" button on dead_letter/failed jobs; the timeline already
  renders the retry_scheduled / dead_letter / pr_merged / pr_merge_failed /
  factory_stale events generically.

Backward compatible: attemptsBase defaults to 0 and old docs without it read as 0.
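
The budget arithmetic described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the `FleetJob` shape, the `MAX_RETRIES` value, and the function names `usedBudget`, `hasRetryBudget`, `requeue`, and `redrive` are all assumptions; only `attempts`, `attemptsBase`, `retryNotBefore`, and the `dead_letter` status come from the message itself.

```typescript
// Hypothetical job document. Only attempts, attemptsBase, retryNotBefore and
// the dead_letter status are named in the commit message; the rest is assumed.
interface FleetJob {
  status: "queued" | "running" | "failed" | "dead_letter";
  attempts: number;        // monotonic, never reset, so run ids stay unique
  attemptsBase?: number;   // absent on old docs, which read as 0
  retryNotBefore?: number; // backoff timestamp (assumed epoch milliseconds)
}

const MAX_RETRIES = 3; // assumed limit, for illustration only

// Used budget is measured relative to the last re-drive anchor.
function usedBudget(job: FleetJob): number {
  return job.attempts - (job.attemptsBase ?? 0);
}

function hasRetryBudget(job: FleetJob, maxRetries: number = MAX_RETRIES): boolean {
  return usedBudget(job) < maxRetries;
}

// Plain requeue: status changes, but the budget stays exhausted.
function requeue(job: FleetJob): FleetJob {
  return { ...job, status: "queued" };
}

// Re-drive: requeue, anchor attemptsBase to the current attempts, and clear
// any pending backoff. attempts itself is left untouched.
function redrive(job: FleetJob): FleetJob {
  const next: FleetJob = { ...job, status: "queued", attemptsBase: job.attempts };
  delete next.retryNotBefore;
  return next;
}
```

Under these assumptions, a dead-lettered job with `attempts: 3` still has no budget after `requeue`, while after `redrive` its used budget is 0 and `attempts` is still 3.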

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-06-01 12:11:46 -07:00
cowork-service chore(deps): bump @types/node 22 -> 25 (dev types) 2026-05-31 04:02:56 -07:00
extraction-service chore(deps): bump @types/node 22 -> 25 (dev types) 2026-05-31 04:02:56 -07:00
mcp-server chore(deps): bump @types/node 22 -> 25 (dev types) 2026-05-31 04:02:56 -07:00
monitoring chore(deps): bump @types/node 22 -> 25 (dev types) 2026-05-31 04:02:56 -07:00
platform-service feat(fleet): operator re-drive for dead-letter jobs + dead-letter alert/UI 2026-06-01 12:11:46 -07:00