job.autoMerge was persisted but ignored — PR merging fired only when the host
set FLEET_SHIP_MERGES_PR=1, and a failed merge was silent (PR left open, no
signal). Now:
- mergeRunPrOnShip merges when EITHER the job opted in (job.autoMerge) OR the
global flag is set (new pure, unit-tested shouldMergePrOnShip gate). Existing
global-flag behavior is preserved.
- Merge outcomes are surfaced as job events: pr_merged on success (inline or via
background retry) and pr_merge_failed when the inline attempt + 4 background
retries all fail, so a stuck PR shows on the timeline instead of vanishing.
Still fully best-effort and gated (no merge attempted unless opted in), so the
real-world side effect only happens when explicitly requested.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Hardens the factory credential lifecycle (§12):
- Token expiry: tokens now carry an absolute expiresAt (FLEET_TOKEN_TTL_DAYS,
default 90; 0 disables). verifyToken rejects an expired token regardless of
status, bounding the blast radius of a leak.
- Enforcement default: factoryTokenEnforcementEnabled now defaults ON in
production and OFF in development/test (an explicit FLEET_REQUIRE_FACTORY_TOKEN
still wins) — real deployments are secure by default while the local prototype
and the test suite keep working without enrollment.
- Token GC: pruneInvalidatedTokens deletes revoked, expired, and rotating-past-
grace tokens; wired into the hourly fleet GC sweep (SweepResult.tokensDeleted)
so the credential store stays bounded.
Covered by new enrollment.test.ts cases (expiry, TTL=0, enforcement default
matrix, prune) and the reaper/sweep accounting.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
job.retry (max / on / backoff) was persisted but never enforced: a failed
attempt just went to `failed` and required a manual operator requeue. Now, when
a factory releases a lease reporting a failure, the coordinator applies the policy:
- retryable result (matches a retry.on class) with attempts remaining ⇒ requeue
(queued, or blocked if deps are now unmet) with a retry backoff;
- retryable but attempts exhausted ⇒ dead_letter;
- no policy or non-retryable result (capability_mismatch/no_engine) ⇒ failed,
exactly as before (behavior-preserving).
Backoff is honored via a new job.retryNotBefore timestamp; the scheduler skips a
queued job until it elapses (new pure isAwaitingRetryBackoff gate in selectJob).
parseBackoffMs supports "<n>", "<n>s|m|h", "<n>ms", and "exp" (30s·2^(n-1), capped
1h). retry_scheduled / dead_letter audit events are emitted. decideFailureOutcome
and parseBackoffMs are pure and unit-tested (retry.test.ts), plus scheduler-gate
and end-to-end releaseLease coverage.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Two recovery/cleanup gaps left the coordinator's containers growing without
bound and jobs stuck longer than necessary:
- reclaimStaleFactoryLeases: a crashed/partitioned factory stops heartbeating
~90s before its 900s lease TTL expires; the reaper now reclaims held leases of
stale (or vanished) holders within one stale window, via the same fence +
checkpoint-preserving path as the expiry reaper (refactored into reclaimLeaseJob).
- sweepFleetGarbage: deletes ephemeral coordination state on by default (finished
expired/released leases past a 24h TTL; factory docs with no heartbeat for 7d —
a live host just re-registers). Terminal-job retention (jobs + their runs/events/
artifacts+blobs) is OPT-IN only via FLEET_GC_RETENTION_DAYS (default 0 = never
delete history). Every delete is best-effort so one failure can't stall the sweep.
Both are wired into the existing reaper loop: recovery scans run every 30s, the
deletion sweep is throttled to hourly. New repo helpers (listHeldLeases,
listFinishedLeasesOlderThan, deleteLease, listAllFactories, deleteFactory,
listTerminalJobsOlderThan, deleteRun, deleteEvent) back the new coordinator
functions. Covered by cleanup.test.ts + expanded reaper.test.ts.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
reapExpiredLeases implements the full section-25 recovery (fence the zombie
holder via a leaseEpoch bump, return the job to queued/blocked, preserve the
checkpoint) but nothing ever called it: no route, no cron, no timer. So when a
factory crashed, lost network, or shut down, its in-flight job stayed stuck in
an active stage forever and was never requeued — the recovery code was dormant.
Add a process-wide background reaper (leases are queried across all products)
that runs reapExpiredLeases every 30s, started at server boot and stopped on
graceful shutdown, mirroring the diagnostics trigger-job pattern. A failing pass
is logged and retried on the next tick rather than crashing the service.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
availableEnginesForProduct skipped only health:down factories, so an engine
advertised solely by a host that had stopped heartbeating could still be offered
in the picker. Also skip factories whose lastHeartbeatAt is older than 90s
(mirrors the coordinator's DEFAULT_STALE_FACTORY_MS), and treat an unparseable
timestamp as stale. Adds unit coverage for the engine-collection, down, stale,
and graceful-degradation paths.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The engine picker only constrained the UI; the router never gated on the
chosen engine, so a job pinned to e.g. engine:codex could be claimed by a
factory that doesn't run codex (the runner's resolve_engine honors an explicit
engine with no availability check, so it would then fail at execution time).
Add a pure engineEligible(job, caps) hard gate to the section-7 scheduler filter
(and preemption): a concrete-engine job runs only on a factory advertising the
matching engine:<e> cap. Gated only against engine-aware factories (those that
advertise any engine:* token); engine-agnostic/legacy factories stay eligible,
mirroring the picker's "engine set unknown => offer all" fallback. explainJob now
surfaces the mismatch reason. No DB migration; behavior-preserving for legacy.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Add GET /fleet/factories (lists a product's factory docs with capabilities) —
also fixes the fleet map's empty factory cards (listFactories had no route and
silently returned []). The New-Job form now loads the selected factory's
engine:* capabilities and constrains the engine dropdown to those (e.g. hides
codex when the host doesn't have it), keeping the current pick valid; falls back
to all engines when capabilities are unknown.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Add an optional concrete `engine` to a job (overrides engineClass; resolved by
the runner's resolve_engine where an explicit engine wins). All additive +
optional, so existing engineless jobs keep falling back to the factory default.
- types: FLEET_ENGINES enum; engine on SubmitJob/FleetJobDoc/UpdateDraft.
- coordinator: store engine on create/supersede/updateDraft; run.engine at claim
prefers job.engine, then engineClass, then 'unknown'.
- tracker-web: Engine dropdown on the New-Job form (default devin) + editable on
draft/queued jobs; shown in the detail config grid.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Backend: insights now carry engine + sessionId/sessionUrl; releaseLease promotes
the reported engine onto the run (was created with the abstract engineClass,
usually 'unknown').
tracker-web job detail:
- Runs: show the concrete engine (insights.engine, falls back off 'unknown') and
the agent session (Devin session id with a `devin --resume <id>` hint, or a
link when a sessionUrl is present).
- PromptCard: edit repo/baseBranch/verify/autoMerge (not just the prompt) while
draft/queued/blocked.
- Timeline: filter by event type (default collapses heartbeat runs).
- Show a "no PR — needs verify / not PR mode" hint when parked in review.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The job detail timeline was buried under hundreds of lease_renewed rows on
long-running jobs. Collapse consecutive high-frequency events (lease_renewed)
into one "type xN - over Nm" summary row; everything else renders verbatim. Add
a prominent Pull Request banner (link + state) sourced from whichever run opened
the PR, instead of only the per-attempt Runs column.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Adds dashboards/tracker-web/launchd/ (boot script + install.sh + README) so
tracker-web (:3003) auto-starts on login and restarts on crash/reboot, instead
of dying silently between sessions. Mirrors agent-queue/launchd: boot script
repairs PATH, loads JWT_SECRET from platform-service/.env (+ ~/.tracker-web.env
overrides), points at the local platform-service, and execs `pnpm dev`. plist
uses unconditional KeepAlive (restart on any exit, incl. a clean SIGTERM) + a 10s
throttle; install.sh frees :3003 first to avoid a clash with deploy-gigafactory.
Verified: killing the process respawns it and :3003 returns.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
updateDraft changes caps/deps/priority (and body) without a stage change, so it
did not bump fleet_queue_state — gated factories (AQ_FLEET_GATE=1) would not
re-evaluate claimability until the safety interval. Bump the gate on edit so an
edit that makes a job claimable wakes factories promptly.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The form defaulted capabilities to "build" — a token no agent-queue factory
advertises (caps are os:* / engine:* / node:* / has:*), so every default UI
submission was unroutable and stranded in queued (queue_starvation). Default
capabilities to empty (any capable factory claims it), and replace the stale
hardcoded mac-1/mac-2 factory dropdown with the 4 live factories (lysnrai /
chronomind / mindlyst / nomgap, ids matching AQ_FACTORY_ID).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Backend (platform-service):
- New `draft` stage (not claimable; scheduler only takes queued/blocked).
- submitJob accepts `draft: true` → parks a new/superseded job as a draft.
- updateDraft(): edit prompt/config in place while draft/queued/blocked;
recomputes contentHash; rejected (conflict) once picked up (assigned+).
- submitDraft(): promote draft → queued (or blocked on unmet deps); idempotent.
- Routes: PATCH /fleet/jobs/:id/draft, POST /fleet/jobs/:id/submit.
- tracker-bridge: map draft → item status `open`. Tests + FLEET_STAGES updated.
Frontend (tracker-web):
- New-Job form: add "Save as draft" alongside "Submit".
- Job detail: edit the prompt + Save while draft/queued/blocked, "Submit" a
draft, and lock it read-only once a factory picks it up.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The fleet job detail page never rendered the prompt (bodyMd) or the repo/
verify/auto-merge/capabilities/deps config. Add a Prompt card (verbatim body,
scrollable) + a target/config grid, with a read-only badge once the job leaves
queued/draft (a factory may already be acting on it). Expose verify/autoMerge/
deps on the FleetJob client type.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
- fleet module README: add fleet_queue_state container + GET /fleet/queue-state
and /fleet/metrics; note the heartbeat cadence must stay under the 90s stale
threshold (AQ_FLEET_LEASE_RENEW_SEC).
- FLEET_CONTROL_PLANE: correct wrong endpoint paths (/fleet/claim and
/fleet/factories/heartbeat were documented as /fleet/jobs/:id/claim and
/fleet/factories/:id/heartbeat; removed a non-existent GET /fleet/factories);
add enroll, metrics, and the M0 queue-state endpoint.
- ROADMAP_COMPLETION_AUDIT: dated status banner — roadmap §0 now reconciled and
Phase-4 M0 shipped, superseding the older "stale §0 / not started" findings.
- README: point to FLEET_DISPATCH_REDESIGN.md + the M0 gate.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Adds fleet_queue_state (monotonic version per product), bumped on job create +
every stage change in the repository layer (best-effort, never fails a job
write), and a GET /fleet/queue-state read endpoint. Lets a polling factory
detect "work changed" with a ~1 RU point read instead of a full listJobs scan
on every claim. Registers the container; tests cover the bump + endpoint.
See agent-queue docs/GIGAFACTORY/FLEET_DISPATCH_REDESIGN.md §8/§12 (M0).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Adds an optional displayName claim to the platform access token so
downstream product backends can source the user's display name from the
JWT (single source of truth = platform auth), not from per-product DB
copies. verifyToken already exposes displayName; this populates it at all
token-minting sites (password login, register, refresh, SSO, OAuth,
magic-link, passkeys, QR, enterprise SAML/OIDC). Additive and backward
compatible.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Clickable stage chips with live counts (color-coded) replace the plain dropdown;
click to filter, click again to clear.
- Search box (key / repo / id) + 'Hide shipped' checkbox (persisted) + 'N of M shown'.
- Redesigned table: color-coded stage badges, priority emphasis, a Repo column
(PR target), newest-first sort, bordered/zebra layout.
- All filtering is client-side over a 100-job window (instant + accurate counts);
list/links/New-Job form unchanged. FleetJob gains repo/baseBranch.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The corporate proxy intermittently 407s GitHub's API, so a single gh pr merge can
fail transiently. Try once inline (fast path), then retry in the background with
backoff (3s/8s/20s/45s) without blocking the ship; mark prState=merged when one
lands. Best-effort throughout.
On ship (Ship button / operator action / autoship PATCH), when the run has an open
PR and FLEET_SHIP_MERGES_PR=1, the coordinator squash-merges it via gh (best-effort,
where gh is authed) and marks the run prState=merged. UI button reads 'Ship & merge
PR' when an open PR exists; Ship refreshes runs.
- telemetry proxy attaches an X-Install-Token (derived from the payload, with a
fallback) so the backend ingest auth gate stops returning 401 on browser beacons.
- job-detail Cost shows ~$x.xx approx when the figure is estimated (token-based).
Runs now carry prState (open when the PR is opened, merged when auto-merge
succeeds), reported on lease release. Job-detail Runs table shows a status badge
next to the PR link.
New Job form: pick a Factory (2 hardcoded for this machine) -> the job is submitted
to that factory's product (submitJob productId override) and the dashboard view
switches to it so the job is visible. Confirmation shows factory + product.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Add job-level verify (command run in the PR checkout before opening the PR) and
autoMerge (squash-merge the PR once opened). Surfaced in the New Job form as a
Verify-command field + Auto-merge checkbox (PR mode only); confirmation now shows
PR-mode/repo. More repos added to the dropdown.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
MVP: the New Job form picks a PR target from a fixed dropdown of local repos; base
branch is fixed to main. Empty selection = no PR (plain job).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Make "shipped" produce a real artifact. A job can now carry an optional repo
(owner/name or clone URL) + baseBranch; the factory's PR mode runs the agent in an
isolated checkout, opens a PR, and records the link.
Backend:
- SubmitJobSchema + FleetJobDoc: optional repo/baseBranch (recorded on submit).
- FleetRunDoc: optional prUrl/branch.
- ReleaseLease report carries prUrl/branch -> stored on the run.
- +2 coordinator tests.
UI (tracker-web):
- New Job form gains optional Repo + Base branch fields (and fixes the priority
options to the valid critical/high/medium/low; "normal" was rejected by the API).
- Job detail Runs table shows a PR ↗ link from run.prUrl.
- fleet-client: submitJob repo/baseBranch; FleetRun prUrl/branch; OperatorAction +ship.
Docs: FLEET_CONTROL_PLANE.md "PR deliverable (PR mode)" section.
Verified: tsc clean; fleet suite 182; tracker-web 230.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Add a collapsible 'New Job' form on the fleet jobs page (task body, priority,
capabilities) wired to a new fleet-client submitJob() -> POST /fleet/jobs, with
inline success/error and auto-refresh. Also add 'ship' to the OperatorAction type
for parity with the coordinator. The existing job-detail 'Ship' button already
drives the human-gate testing -> shipped transition.
Verified: tsc clean; tracker-web suite 230/230.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Some keydown events (autofill, IME/composition, certain extensions) arrive with
e.key === undefined, crashing the hotkey handler at e.key.toLowerCase(). Guard
e.key (and hotkey.key) before comparing. +1 test (undefined-key keydown is
ignored without throwing). 27 pass.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
A job can reach `shipped` via autoship PATCH, the `ship` operator action, or a
terminal lease release, but the run-level `result` was left at whatever the
factory last reported (e.g. `review`), so the dashboard showed a shipped job with
a non-terminal run result.
- Add markLatestRunShipped(): on any transition to `shipped`, set the latest run
result to `shipped` (+ endedAt if unset). Idempotent, best-effort.
- Wire it into patchJobFenced (ungated; budget accrual stays flag-gated) and the
`ship` operator action.
- Document the testing->shipped paths (factory autoship vs `ship` operator action)
and the run-mirroring in docs/GIGAFACTORY/FLEET_CONTROL_PLANE.md.
Tests: +2 (patchJobFenced->shipped and operator ship both set run.result=shipped).
Fleet suite 180 pass; tsc clean.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
product-context.test.tsx failed with "localStorage.clear is not a function".
Root cause: Node 25 ships a global `localStorage` Web Storage stub that is
non-functional without --localstorage-file, and it shadows the test DOM
environment's storage. The two DOM tests also relied on `jsdom`, which was only
present transitively (not a tracker-web dependency) while the rest of the
monorepo standardizes on happy-dom.
- Add happy-dom as a tracker-web devDependency; switch the two `@vitest-environment
jsdom` tests (product-context, command-menu) to happy-dom.
- Add vitest.setup.ts that installs a real in-memory Web Storage over Node 25's
non-functional stub when the active localStorage/sessionStorage lacks the
Storage API; wire it via test.setupFiles.
Verified: full tracker-web suite 230/230 (was 228/2).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Verified: full workspace build (tsc) green across all packages/services/dashboards;
fleet+items tests pass. Compile-time only.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Bump stripe 17 -> 20 and adapt to the breaking API changes:
- PromotionCode: coupon moved under `promotion` ({ type: 'coupon', coupon }).
mapPromo now reads p.promotion.coupon; create now passes
promotion: { type: 'coupon', coupon: id } instead of a top-level coupon.
- Subscription.current_period_end removed (now per subscription item). Add
getSubscriptionPeriodEnd() = max(items[].current_period_end) and use it in the
customer.subscription.updated webhook handler.
- Update the promos route test fixture to the new promotion.coupon shape.
Verified: platform-service build (tsc) clean; promos (14) + stripe/subscriptions/
billing tests pass; full suite 1692/1692.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Applied fresh on current main (the matching dependabot branches were 350-430
commits behind and would have conflicted on the lockfile):
- @azure/cosmos 4.9.1 -> 4.9.3
- jose 6.1.3 -> 6.2.3 (mcp-server stays on the 5.x line: 5.9.6 -> 5.10.0)
- @typescript-eslint/parser 8.0 -> 8.60.0
Verified: full workspace build green; platform-service suite 1684 pass (only the
pre-existing single-fork migration-isolation flake, passes isolated); tracker-web
228 pass (only the pre-existing happy-dom product-context failures). No new
regressions. Major bumps (fastify/cors, happy-dom, lint-staged, stripe,
types/node) deferred for separate review.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>