Closes the loop on the retry automation — a job that exhausts its retries lands
in dead_letter with no way to recover it:
- New `redrive` operator action: requeues the job AND grants a fresh retry budget
by anchoring a new `attemptsBase` to the current `attempts` (and clearing any
retryNotBefore backoff). `attempts` stays monotonic so run ids never collide; a
plain `requeue` leaves the budget exhausted and would instantly re-dead-letter.
The retry policy now measures used budget as `attempts - attemptsBase`.
- fleetMetrics raises a `dead_letter` warning alert when any job is dead-lettered.
- tracker-web: a "Re-drive" button on dead_letter/failed jobs; the timeline already
renders the retry_scheduled / dead_letter / pr_merged / pr_merge_failed /
factory_stale events generically.
Backward compatible: attemptsBase defaults to 0 and old docs without it read as 0.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
availableEnginesForProduct skipped only health:down factories, so an engine
advertised solely by a host that had stopped heartbeating could still be offered
in the picker. Also skip factories whose lastHeartbeatAt is older than 90s
(mirrors the coordinator's DEFAULT_STALE_FACTORY_MS), and treat an unparseable
timestamp as stale. Adds unit coverage for the engine-collection, down, stale,
and graceful-degradation paths.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Add GET /fleet/factories (lists a product's factory docs with capabilities) —
also fixes the fleet map's empty factory cards (listFactories had no route and
silently returned []). The New-Job form now loads the selected factory's
engine:* capabilities and constrains the engine dropdown to those (e.g. hides
codex when the host doesn't have it), keeping the current pick valid; falls back
to all engines when capabilities are unknown.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Add an optional concrete `engine` to a job (overrides engineClass; resolved by
the runner's resolve_engine where an explicit engine wins). All additive +
optional, so existing engineless jobs keep falling back to the factory default.
- types: FLEET_ENGINES enum; engine on SubmitJob/FleetJobDoc/UpdateDraft.
- coordinator: store engine on create/supersede/updateDraft; run.engine at claim
prefers job.engine, then engineClass, then 'unknown'.
- tracker-web: Engine dropdown on the New-Job form (default devin) + editable on
draft/queued jobs; shown in the detail config grid.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Backend: insights now carry engine + sessionId/sessionUrl; releaseLease promotes
the reported engine onto the run (was created with the abstract engineClass,
usually 'unknown').
tracker-web job detail:
- Runs: show the concrete engine (insights.engine, falls back off 'unknown') and
the agent session (Devin session id with a `devin --resume <id>` hint, or a
link when a sessionUrl is present).
- PromptCard: edit repo/baseBranch/verify/autoMerge (not just the prompt) while
draft/queued/blocked.
- Timeline: filter by event type (default collapses heartbeat runs).
- Show a "no PR — needs verify / not PR mode" hint when parked in review.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The job detail timeline was buried under hundreds of lease_renewed rows on
long-running jobs. Collapse consecutive high-frequency events (lease_renewed)
into one "type xN - over Nm" summary row; everything else renders verbatim. Add
a prominent Pull Request banner (link + state) sourced from whichever run opened
the PR, instead of only the per-attempt Runs column.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The form defaulted capabilities to "build" — a token no agent-queue factory
advertises (caps are os:* / engine:* / node:* / has:*), so every default UI
submission was unroutable and stranded in queued (queue_starvation). Default
capabilities to empty (any capable factory claims it), and replace the stale
hardcoded mac-1/mac-2 factory dropdown with the 4 live factories (lysnrai /
chronomind / mindlyst / nomgap, ids matching AQ_FACTORY_ID).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Backend (platform-service):
- New `draft` stage (not claimable; scheduler only takes queued/blocked).
- submitJob accepts `draft: true` → parks a new/superseded job as a draft.
- updateDraft(): edit prompt/config in place while draft/queued/blocked;
recomputes contentHash; rejected (conflict) once picked up (assigned+).
- submitDraft(): promote draft → queued (or blocked on unmet deps); idempotent.
- Routes: PATCH /fleet/jobs/:id/draft, POST /fleet/jobs/:id/submit.
- tracker-bridge: map draft → item status `open`. Tests + FLEET_STAGES updated.
Frontend (tracker-web):
- New-Job form: add "Save as draft" alongside "Submit".
- Job detail: edit the prompt + Save while draft/queued/blocked, "Submit" a
draft, and lock it read-only once a factory picks it up.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The fleet job detail page never rendered the prompt (bodyMd) or the repo/
verify/auto-merge/capabilities/deps config. Add a Prompt card (verbatim body,
scrollable) + a target/config grid, with a read-only badge once the job leaves
queued/draft (a factory may already be acting on it). Expose verify/autoMerge/
deps on the FleetJob client type.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
- Clickable stage chips with live counts (color-coded) replace the plain dropdown;
click to filter, click again to clear.
- Search box (key / repo / id) + 'Hide shipped' checkbox (persisted) + 'N of M shown'.
- Redesigned table: color-coded stage badges, priority emphasis, a Repo column
(PR target), newest-first sort, bordered/zebra layout.
- All filtering is client-side over a 100-job window (instant + accurate counts);
list/links/New-Job form unchanged. FleetJob gains repo/baseBranch.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
On ship (Ship button / operator action / autoship PATCH), when the run has an open
PR and FLEET_SHIP_MERGES_PR=1, the coordinator squash-merges it via gh (best-effort,
where gh is authed) and marks the run prState=merged. UI button reads 'Ship & merge
PR' when an open PR exists; Ship refreshes runs.
- telemetry proxy attaches an X-Install-Token (derived from the payload, with a
fallback) so the backend ingest auth gate stops returning 401 on browser beacons.
- job-detail Cost shows ~$x.xx approx when the figure is estimated (token-based).
Runs now carry prState (open when the PR is opened, merged when auto-merge
succeeds), reported on lease release. Job-detail Runs table shows a status badge
next to the PR link.
New Job form: pick a Factory (2 hardcoded for this machine) -> the job is submitted
to that factory's product (submitJob productId override) and the dashboard view
switches to it so the job is visible. Confirmation shows factory + product.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Add job-level verify (command run in the PR checkout before opening the PR) and
autoMerge (squash-merge the PR once opened). Surfaced in the New Job form as a
Verify-command field + Auto-merge checkbox (PR mode only); confirmation now shows
PR-mode/repo. More repos added to the dropdown.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
MVP: the New Job form picks a PR target from a fixed dropdown of local repos; base
branch is fixed to main. Empty selection = no PR (plain job).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Make "shipped" produce a real artifact. A job can now carry an optional repo
(owner/name or clone URL) + baseBranch; the factory's PR mode runs the agent in an
isolated checkout, opens a PR, and records the link.
Backend:
- SubmitJobSchema + FleetJobDoc: optional repo/baseBranch (recorded on submit).
- FleetRunDoc: optional prUrl/branch.
- ReleaseLease report carries prUrl/branch -> stored on the run.
- +2 coordinator tests.
UI (tracker-web):
- New Job form gains optional Repo + Base branch fields (and fixes the priority
options to the valid critical/high/medium/low; "normal" was rejected by the API).
- Job detail Runs table shows a PR ↗ link from run.prUrl.
- fleet-client: submitJob repo/baseBranch; FleetRun prUrl/branch; OperatorAction +ship.
Docs: FLEET_CONTROL_PLANE.md "PR deliverable (PR mode)" section.
Verified: tsc clean; fleet suite 182; tracker-web 230.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Add a collapsible 'New Job' form on the fleet jobs page (task body, priority,
capabilities) wired to a new fleet-client submitJob() -> POST /fleet/jobs, with
inline success/error and auto-refresh. Also add 'ship' to the OperatorAction type
for parity with the coordinator. The existing job-detail 'Ship' button already
drives the human-gate testing -> shipped transition.
Verified: tsc clean; tracker-web suite 230/230.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
product-context.test.tsx failed with "localStorage.clear is not a function".
Root cause: Node 25 ships a global `localStorage` Web Storage stub that is
non-functional without --localstorage-file, and it shadows the test DOM
environment's storage. The two DOM tests also relied on `jsdom`, which was only
present transitively (not a tracker-web dependency) while the rest of the
monorepo standardizes on happy-dom.
- Add happy-dom as a tracker-web devDependency; switch the two `@vitest-environment
jsdom` tests (product-context, command-menu) to happy-dom.
- Add vitest.setup.ts that installs a real in-memory Web Storage over Node 25's
non-functional stub when the active localStorage/sessionStorage lacks the
Storage API; wire it via test.setupFiles.
Verified: full tracker-web suite 230/230 (was 228/2).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Operator actions (ship/requeue/cancel) are bodyless POSTs. The proxy always set
Content-Type: application/json, so the backend rejected them with
FST_ERR_CTP_EMPTY_JSON_BODY (500). Only declare the JSON content type when a
body is actually forwarded. Fixes the fleet dashboard action buttons.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The login form posted {email,password} but platform-service LoginSchema
requires productId, so real logins returned 400 (only the mocked e2e passed).
Send the selected product (tracker_selected_product) or the default PRODUCT_ID.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Job detail Runs table now shows Duration, Model, Tokens (in/out + cached) and
Cost per run, plus a per-job totals header (cost / tokens / wall-time). Artifacts
get a view/download button via a fresh signed URL. Also fix the fleet API proxy
to forward to /api/fleet/* (backend mounts fleet under /api) so a live backend
resolves; previously it returned 404 and only the mocked e2e passed.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
- budget page: guard spend bar against missing/zero ceiling (no NaN width);
show an explicit "no ceiling set" state. Add pure budgetUsagePct() helper.
- job detail: replace silent live/poll toggle with an explicit stream-mode
badge (Live vs Polling) so operators see when SSE degrades to polling.
- fleet-client: extend patchJob to carry optional checkpoint/blockedReason
matching the server PatchJobSchema; add FleetCheckpoint type.
- tests: unit cover budgetUsagePct + patchJob checkpoint forwarding; e2e
asserts the polling indicator appears when the stream is unavailable.
- ci: add a Gitea Playwright e2e job that runs the fleet control-plane specs.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implements the §14 Phase 3 review gate. requestReview() routes a building
job into the review stage (fencing any worker), carrying a normalized policy
(requiredApprovals + reviewer allowlist) and clearing prior decisions.
submitReview() records one decision per reviewer (last-write-wins, identity-
normalized), advances the job to testing once distinct approvals reach the
quorum, and treats any reject as a veto that returns the job to queued for
rework. Adds POST /fleet/jobs/:id/review/request and POST /fleet/jobs/:id/review,
a typed client, and a review-gate card on the job-detail page.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds coordinator.fleetMetrics() computing queue depth, stage histogram,
oldest-queued age (starvation signal), factory health and seat utilization,
plus derived alerts (no_live_capacity, all_factories_down, queue_starvation,
saturated, stale_factories). Exposed via GET /fleet/metrics and surfaced as a
metrics+alerts panel on the fleet overview. Thresholds injectable for tests.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Backend: GET /fleet/jobs/:id/events/stream emits a snapshot (seq > Last-Event-ID)
then long-polls the append-only event log, closing after a bounded window so
EventSource-style clients reconnect cleanly. Honors Last-Event-ID resume,
keepalive comments, and a terminal error frame.
Frontend: subscribeJobEvents uses fetch streaming (to send auth + product
headers) with parseSseFrames, Last-Event-ID resume, reconnect backoff, and a
fatal-on-error-frame fallback to polling. Job detail page subscribes live
(deduped by seq), falls back to 4s polling on failure, and shows a Live badge;
refresh() now merges events so a slow snapshot can't clobber streamed ones.
Tests: +3 route (snapshot, resume cursor, append-after-connect), +5 client
(parseSseFrames x2, subscribe deliver/error/resume/error-frame). fleet 150, web 222.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- coordinator.costBurndown() aggregates completed run cost (insights.costUsd)
by UTC day over a window, returning a gap-free cumulative series + ceiling
- repository.listRunsByProduct() cross-partition run query
- GET /fleet/budgets/:productId/burndown?days=N route
- fleet-client.getBudgetBurndown() + CostBurndown/BurndownPoint types
- BurndownChart on the budget page: cumulative daily bars with a dashed
ceiling overlay; bars turn red past the ceiling; degrades gracefully
- Tests: +2 coordinator, +1 routes, +2 fleet-client (fleet 147, web 216)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds 'why does this job route here?' to the §7 scheduler:
- coordinator.explainJob() re-runs scoreCandidate against every live factory,
returning per-factory weighted breakdown, eligibility + reasons, deps state,
and the best eligible factory (read-only, side-effect free)
- GET /fleet/jobs/:id/explain route (404 when job missing)
- fleet-client.getJobExplain() + JobExplain/ScoreBreakdown types
- ExplainPanel on the job detail page: score table per factory with the six
weighted terms, eligibility, and unmet-deps note; degrades gracefully
- Tests: +2 coordinator, +1 routes, +2 fleet-client (fleet 144 green,
tracker-web 214 green)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fleet overview page with factory cards + recent jobs polling
- Job table with stage filter tabs
- Job detail page with events timeline, runs, artifacts, DAG subtree, SHIP action
- Budget page with usage bar, pause/resume controls
- API proxy route forwarding /api/fleet/* to platform-service
- Typed fleet-client.ts with graceful 404 degradation
- 16 unit tests for fleet-client (198 total tracker-web tests green)
- Added Fleet nav item to dashboard layout
- Full monorepo build + test green
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the hand-rolled sticky top nav in the dashboard layout with the
shared AppShell (AppShellSidebar/AppShellNav/AppShellNavItem/AppShellMain/
AppShellSkipLink + mobile toggle + overlay). The sidebar keeps the
ProductSwitcher, user email and Sign out, and adds a ⌘K trigger (replays
the global hotkey) and a theme toggle. Nav items use aria-current for the
active route and client-side navigation; the skip-link targets the
focusable main region. AppShell exports are routed through the Primitives
adapter (CC.6 ratchet) and covered by the export-presence test.
AppShellPageHeader is intentionally not used so the per-page PageHeader
(UX-10) remains the single h1 per route (no duplicate headings).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Add @bytelyst/motion (workspace:* + minimal lockfile importer entry).
Apply Reveal to the dashboard overview cards/charts (staggered) and the
items DataTable, and NumberFlow to the overview KPI totals. All motion
primitives no-op under prefers-reduced-motion.
The public /roadmap is intentionally left without entrance motion: the
offline @axe-core gate scans the page synchronously and the Reveal
transient sub-1 opacity trips color-contrast (a hard CC.4 a11y gate). The
dashboard/items surfaces are auth-gated and not axe-scanned, so they keep
the motion.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Mount the shared ToastProvider in providers.tsx and replace the inline
error/success banners across the overview, items list, board, item detail
and public roadmap with toast() calls — load errors, create, delete, vote,
status/priority/visibility updates, comment add, and idea submission. The
roadmap submit now toasts + closes the dialog instead of an in-modal
banner; the item-detail page keeps a single inline empty-state for the
hard "couldn't load this item" case.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Add @bytelyst/command-palette (workspace:* + minimal lockfile importer
entry). Mount CommandRegistryProvider + a lazily-loaded CommandMenu in
providers.tsx, opened with ⌘K / Ctrl-K. Register navigate commands
(Overview/Items/Board/Roadmap), New item (navigates to items with ?new=1
which auto-opens the create modal), Toggle theme, Sign out, and per-product
Switch commands wired to setProductId. Command building lives in the pure
src/lib/command-registry.ts. Add command-menu.test.tsx (jsdom) asserting the
builder set and that the palette opens on ⌘K and lists commands.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Add @bytelyst/charts + @bytelyst/data-viz as workspace:* deps (minimal
importer link entries in the lockfile). Replace the badge-pill breakdowns
on /dashboard with KpiCards (total/open/in-progress/done) and a dynamically
imported chart surface: Donut for By Status + By Type (centered total) and
a BarChart for By Priority, coloured from the bridged --bl-chart-* palette.
Pure transforms live in src/lib/overview-charts.ts (finite-coerced, no NaN
reaches the SVG); the heavy chart surface is split into overview-charts.tsx
and loaded via next/dynamic (ssr:false) to keep it out of the route bundle.
Add overview-charts.test.tsx rendering the surface with mocked stats via
react-dom/server (no NaN in paths) + transform unit tests; dedupe react in
vitest so the SSR render uses a single React instance.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Replace the bespoke local Modal (and its slate/blue/white chrome) with the
shared @bytelyst/ui Modal (Radix dialog — focus-trap/Esc/scroll-lock) for
both the Submit Idea and vote email-prompt dialogs. The dialog titles
become the accessible heading; form fields move to Input/Select/Textarea
and the submit-result message to AlertBanner. Behaviour preserved.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Replace the hardcoded slate-*/blue-*/white utility classes on /roadmap
with tokenized equivalents (bg-background/bg-card/text-foreground/
text-muted-foreground/border-border). Swap the type/priority pills to the
shared Badge, the status column/list markers to StatusDot tones, the stat
tiles to MetricCard, search/type filter to Input/Select, the Submit Idea
button to Button, and load/vote errors to AlertBanner. Behaviour, the
board/list SegmentedControl toggle, and the vote-button a11y attrs
(aria-pressed/aria-label) are preserved. The submit + email-prompt modals
are migrated to the shared Modal in UX-3.2.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Replace the bespoke total-count card chrome with the shared MetricCard on
the dashboard overview (breakdown cards stay until the UX-4 chart swap),
and surface load errors via AlertBanner. Wrap the items-list delete
confirm() in the accessible ConfirmDialog (focus-trapped AlertDialog)
instead of the native browser prompt.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Replace the raw search input, the three filter selects, the New Item
button, and the hand-rolled create modal with @bytelyst/ui Input/Select/
Field/Button/Modal (via the Primitives adapter). The shared Modal closes
the focus-trap/Esc/scroll-lock a11y gap. Swap the inline type/status/
priority cell pills to StatusBadge with token-driven tones.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>