From a6adaee835af9819143c2e639d04b52219c0a567 Mon Sep 17 00:00:00 2001
From: saravanakumardb1
Date: Mon, 1 Jun 2026 12:11:46 -0700
Subject: [PATCH] feat(fleet): operator re-drive for dead-letter jobs + dead-letter alert/UI
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes the loop on the retry automation. Previously, a job that exhausted its
retries landed in dead_letter with no way to recover it:

- New `redrive` operator action: requeues the job AND grants a fresh retry
  budget by anchoring a new `attemptsBase` to the current `attempts` (and
  clearing any retryNotBefore backoff). `attempts` stays monotonic so run ids
  never collide; a plain `requeue` leaves the budget exhausted, so the job
  would be dead-lettered again immediately. The retry policy now measures used
  budget as `attempts - attemptsBase`.
- fleetMetrics raises a `dead_letter` warning alert when any job is
  dead-lettered.
- tracker-web: a "Re-drive" button on dead_letter/failed jobs; the timeline
  already renders the retry_scheduled / dead_letter / pr_merged /
  pr_merge_failed / factory_stale events generically.

Backward compatible: attemptsBase defaults to 0, and old docs without the
field read as 0.
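
The budget arithmetic described above can be sketched as follows. This is a
minimal illustration, not the code in coordinator.ts: the `FleetJob` shape,
the `MAX_RETRIES` value, and the helper names `usedBudget`, `hasRetryBudget`,
`requeue`, and `redrive` are all assumptions made for the example. Only the
field names `attempts`, `attemptsBase`, and `retryNotBefore` and the formula
`attempts - attemptsBase` come from this message.

```typescript
// Hypothetical job shape; the real type lives in src/modules/fleet/types.ts
// and may carry more fields.
type Stage = "queued" | "failed" | "dead_letter" | "shipped";

interface FleetJob {
  stage: Stage;
  attempts: number;        // monotonic; never reset, so run ids stay unique
  attemptsBase?: number;   // absent on old docs, which read as 0
  retryNotBefore?: number; // backoff timestamp (ms); cleared by a re-drive
}

// Assumed limit for the example; the actual policy value is not shown here.
const MAX_RETRIES = 3;

// Used budget is measured relative to the most recent anchor.
function usedBudget(job: FleetJob): number {
  return job.attempts - (job.attemptsBase ?? 0);
}

function hasRetryBudget(job: FleetJob, maxRetries: number = MAX_RETRIES): boolean {
  return usedBudget(job) < maxRetries;
}

// Plain requeue: the stage changes but the budget stays exhausted.
function requeue(job: FleetJob): FleetJob {
  return { ...job, stage: "queued" };
}

// Re-drive: requeue, anchor attemptsBase at the current attempts, and drop
// retryNotBefore by not copying it into the new object.
function redrive(job: FleetJob): FleetJob {
  return { stage: "queued", attempts: job.attempts, attemptsBase: job.attempts };
}
```

Under these assumptions, a dead-lettered job with `attempts: 3` and no
`attemptsBase` has no budget left after `requeue`, while `redrive` leaves
`attempts` at 3, sets `attemptsBase` to 3, and so reports a used budget of 0.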

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
---
 .../app/dashboard/fleet/jobs/[id]/page.tsx    | 10 ++++
 .../tracker-web/src/lib/fleet-client.ts       |  2 +-
 .../src/modules/fleet/coordinator.ts          | 30 +++++++++--
 .../src/modules/fleet/retry.test.ts           | 50 +++++++++++++++++++
 .../src/modules/fleet/types.ts                |  7 +++
 5 files changed, 94 insertions(+), 5 deletions(-)

diff --git a/dashboards/tracker-web/src/app/dashboard/fleet/jobs/[id]/page.tsx b/dashboards/tracker-web/src/app/dashboard/fleet/jobs/[id]/page.tsx
index 6204bce2..706bb46e 100644
--- a/dashboards/tracker-web/src/app/dashboard/fleet/jobs/[id]/page.tsx
+++ b/dashboards/tracker-web/src/app/dashboard/fleet/jobs/[id]/page.tsx
@@ -229,6 +229,16 @@ export default function FleetJobDetailPage() {
 {acting === 'requeue' ? 'Requeuing...' : 'Requeue'}
 )}
+ {(job.stage === 'dead_letter' || job.stage === 'failed') && (
+
+ )}
 {job.stage !== 'shipped' && job.stage !== 'failed' && job.stage !== 'dead_letter' && (
 <>