Commit Graph

226 Commits

Author SHA1 Message Date
saravanakumardb1
6bddc88f0f feat(fleet): reconcile PR state against GitHub (detect externally-merged PR)
A PR merged in the GitHub UI was invisible to the platform — prState only
flipped to `merged` when the platform merged it (ghMergePr on ship) or a runner
reported it, so the job details page kept showing the PR as open. This adds a
simple, pull-based reconcile (no inbound webhook / public ingress needed).

- coordinator.reconcileJobPrState(jobId, productId, fetcher?): finds the latest
  run carrying a prUrl and, when `gh pr view --json state` reports MERGED, flips
  the run's prState to `merged` and appends a `pr_merged` event (data.via:
  'reconcile'). The GitHub lookup is injectable for tests; pure `mapGhPrState`
  maps MERGED/OPEN/other. Best-effort: any gh failure is a no-op.
- POST /fleet/jobs/:id/pr/reconcile route; echoes the outcome to the tracker
  Item when a merge is detected.
- tracker-web: reconcilePrState() client + a "Refresh PR status" button on the
  job details PR section (shown until the PR is merged) that reconciles then
  refreshes the view.

Tests: +5 (mapGhPrState, reconcile merged/open/no_pr/not_found, route wiring);
full suite 1861 green; lint + tsc clean (service + tracker-web). Deployed: rebuilt
the docker platform-service; POST .../pr/reconcile returns 401 (wired), not 404.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-06-01 23:58:13 -07:00
saravanakumardb1
acf7c36cda feat(tracker-web): dynamic, owner-scoped project switcher via /products/mine
Wires the switcher to real per-user data instead of a static list:

- New /api/products/[...path] proxy to platform-service (mirrors the fleet
  proxy), exposing GET /api/products/mine.
- ProductProvider fetches the caller's owner-scoped projects on mount (when a
  tracker_token is present) and uses them as the switcher list. Best-effort:
  any failure / empty result / unauthenticated state keeps the configured
  fallback list, so dev and logged-out rendering still work.

Combined with the earlier config-driven list + auto-hide, the switcher now
reflects the authenticated user's projects on a generic platform.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-06-01 16:48:27 -07:00
saravanakumardb1
2c4357b71b feat(tracker-web): make product switcher generic — configurable list + auto-hide
Steps toward a tenant-neutral platform (not hardcoded to the ByteLyst products):

- The selectable product list is now configurable via NEXT_PUBLIC_PRODUCTS
  (JSON array of { id, name, icon? }), defaulting to the built-in set. A pure,
  defensive parser (parseProductsEnv) falls back to the default on any malformed
  value so a bad env can never blank the switcher.
- The sidebar switcher auto-hides when there is <= 1 product, so a solo / freelance
  / single-tenant deployment shows no switcher clutter.
- Dedupe: the server product-config now re-exports the single client-safe list
  instead of keeping a second hardcoded copy.

NOTE: true per-user "only your projects" scoping + server-side tenant
authorization still requires an ownership/membership model that does not exist
yet (ProductDoc has no owner/members; products are a global registry). That is a
deliberate, separate effort needing a product decision and is not included here.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-06-01 16:27:14 -07:00
saravanakumardb1
705d8e8eaa feat(tracker-web): consolidated fleet overview — breaker panel, budget guardrail, dead-letter triage
Surfaces the §2/§3 signal on the fleet overview page (read-only over existing
endpoints):

- Budget guardrail card: spend vs ceiling bar, projected end-of-window burn
  (highlighted when over ceiling), and per-engine sub-ceiling breakdown — from
  the new /fleet/metrics budget summary.
- Engine circuit-breaker panel: lists only tripped/probing (factory, engine)
  pairs from metrics.engineBreakers.
- Dead-letter triage table with a Re-drive button wired to the redrive operator
  action; filtered client-side so it is correct regardless of server filtering.

All panels render only when their data is present, so the page is unchanged for
fleets without budgets/breakers/dead-letters. Adds a happy-dom page test.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-06-01 13:29:52 -07:00
saravanakumardb1
a6adaee835 feat(fleet): operator re-drive for dead-letter jobs + dead-letter alert/UI
Closes the loop on the retry automation — a job that exhausts its retries lands
in dead_letter with no way to recover it:

- New `redrive` operator action: requeues the job AND grants a fresh retry budget
  by anchoring a new `attemptsBase` to the current `attempts` (and clearing any
  retryNotBefore backoff). `attempts` stays monotonic so run ids never collide; a
  plain `requeue` leaves the budget exhausted and would instantly re-dead-letter.
  The retry policy now measures used budget as `attempts - attemptsBase`.
- fleetMetrics raises a `dead_letter` warning alert when any job is dead-lettered.
- tracker-web: a "Re-drive" button on dead_letter/failed jobs; the timeline already
  renders the retry_scheduled / dead_letter / pr_merged / pr_merge_failed /
  factory_stale events generically.

Backward compatible: attemptsBase defaults to 0 and old docs without it read as 0.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-06-01 12:11:46 -07:00
saravanakumardb1
2ed19464c5 fix(tracker-web): exclude stale factories from the engine picker
availableEnginesForProduct skipped only health:down factories, so an engine
advertised solely by a host that had stopped heartbeating could still be offered
in the picker. Also skip factories whose lastHeartbeatAt is older than 90s
(mirrors the coordinator's DEFAULT_STALE_FACTORY_MS), and treat an unparseable
timestamp as stale. Adds unit coverage for the engine-collection, down, stale,
and graceful-degradation paths.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-06-01 11:02:56 -07:00
saravanakumardb1
d318e0fa2a feat(fleet): engine picker offers only engines the factory advertises
Add GET /fleet/factories (lists a product's factory docs with capabilities) —
also fixes the fleet map's empty factory cards (listFactories had no route and
silently returned []). The New-Job form now loads the selected factory's
engine:* capabilities and constrains the engine dropdown to those (e.g. hides
codex when the host doesn't have it), keeping the current pick valid; falls back
to all engines when capabilities are unknown.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-06-01 02:38:35 -07:00
saravanakumardb1
6c31577cf2 feat(fleet): per-job engine picker (devin/claude/codex/copilot, default devin)
Add an optional concrete `engine` to a job (overrides engineClass; resolved by
the runner's resolve_engine where an explicit engine wins). All additive +
optional, so existing engineless jobs keep falling back to the factory default.

- types: FLEET_ENGINES enum; engine on SubmitJob/FleetJobDoc/UpdateDraft.
- coordinator: store engine on create/supersede/updateDraft; run.engine at claim
  prefers job.engine, then engineClass, then 'unknown'.
- tracker-web: Engine dropdown on the New-Job form (default devin) + editable on
  draft/queued jobs; shown in the detail config grid.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-06-01 02:30:22 -07:00
saravanakumardb1
928edad0af feat(fleet): surface engine + agent session, editable config, timeline filter
Backend: insights now carry engine + sessionId/sessionUrl; releaseLease promotes
the reported engine onto the run (was created with the abstract engineClass,
usually 'unknown').

tracker-web job detail:
- Runs: show the concrete engine (insights.engine, falls back off 'unknown') and
  the agent session (Devin session id with a `devin --resume <id>` hint, or a
  link when a sessionUrl is present).
- PromptCard: edit repo/baseBranch/verify/autoMerge (not just the prompt) while
  draft/queued/blocked.
- Timeline: filter by event type (default collapses heartbeat runs).
- Show a "no PR — needs verify / not PR mode" hint when parked in review.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-06-01 02:19:38 -07:00
saravanakumardb1
5262583e8b feat(tracker-web): collapse heartbeat noise + surface PR on job detail
The job detail timeline was buried under hundreds of lease_renewed rows on
long-running jobs. Collapse consecutive high-frequency events (lease_renewed)
into one "type xN - over Nm" summary row; everything else renders verbatim. Add
a prominent Pull Request banner (link + state) sourced from whichever run opened
the PR, instead of only the per-attempt Runs column.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-06-01 02:06:33 -07:00
saravanakumardb1
5ad521ad4c feat(tracker-web): macOS LaunchAgent keep-alive for the fleet web tracker
Adds dashboards/tracker-web/launchd/ (boot script + install.sh + README) so
tracker-web (:3003) auto-starts on login and restarts on crash/reboot, instead
of dying silently between sessions. Mirrors agent-queue/launchd: boot script
repairs PATH, loads JWT_SECRET from platform-service/.env (+ ~/.tracker-web.env
overrides), points at the local platform-service, and execs `pnpm dev`. plist
uses unconditional KeepAlive (restart on any exit, incl. a clean SIGTERM) + a 10s
throttle; install.sh frees :3003 first to avoid a clash with deploy-gigafactory.

Verified: killing the process respawns it and :3003 returns.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-06-01 01:46:53 -07:00
saravanakumardb1
14e982d04f fix(tracker-web): make New-Job form submissions routable to live factories
The form defaulted capabilities to "build" — a token no agent-queue factory
advertises (caps are os:* / engine:* / node:* / has:*), so every default UI
submission was unroutable and stranded in queued (queue_starvation). Default
capabilities to empty (any capable factory claims it), and replace the stale
hardcoded mac-1/mac-2 factory dropdown with the 4 live factories (lysnrai /
chronomind / mindlyst / nomgap, ids matching AQ_FACTORY_ID).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-06-01 00:54:02 -07:00
saravanakumardb1
b7df779e1d feat(fleet): draft jobs + editable prompt (save-as-draft, submit, lock on pickup)
Backend (platform-service):
- New `draft` stage (not claimable; scheduler only takes queued/blocked).
- submitJob accepts `draft: true` → parks a new/superseded job as a draft.
- updateDraft(): edit prompt/config in place while draft/queued/blocked;
  recomputes contentHash; rejected (conflict) once picked up (assigned+).
- submitDraft(): promote draft → queued (or blocked on unmet deps); idempotent.
- Routes: PATCH /fleet/jobs/:id/draft, POST /fleet/jobs/:id/submit.
- tracker-bridge: map draft → item status `open`. Tests + FLEET_STAGES updated.

Frontend (tracker-web):
- New-Job form: add "Save as draft" alongside "Submit".
- Job detail: edit the prompt + Save while draft/queued/blocked, "Submit" a
  draft, and lock it read-only once a factory picks it up.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-06-01 00:44:10 -07:00
saravanakumardb1
cb4f7a7606 feat(tracker-web): show job prompt + PR/target config on the detail page
The fleet job detail page never rendered the prompt (bodyMd) or the repo/
verify/auto-merge/capabilities/deps config. Add a Prompt card (verbatim body,
scrollable) + a target/config grid, with a read-only badge once the job leaves
queued/draft (a factory may already be acting on it). Expose verify/autoMerge/
deps on the FleetJob client type.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-06-01 00:29:29 -07:00
saravanakumardb1
6709862c1a feat(tracker-web): redesign fleet jobs list — stage chips, filters, hide-shipped
- Clickable stage chips with live counts (color-coded) replace the plain dropdown;
  click to filter, click again to clear.
- Search box (key / repo / id) + 'Hide shipped' checkbox (persisted) + 'N of M shown'.
- Redesigned table: color-coded stage badges, priority emphasis, a Repo column
  (PR target), newest-first sort, bordered/zebra layout.
- All filtering is client-side over a 100-job window (instant + accurate counts);
  list/links/New-Job form unchanged. FleetJob gains repo/baseBranch.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-31 16:58:12 -07:00
saravanakumardb1
740335a149 feat(fleet): Ship can also merge the linked PR (gh pr merge)
On ship (Ship button / operator action / autoship PATCH), when the run has an open
PR and FLEET_SHIP_MERGES_PR=1, the coordinator squash-merges it via gh (best-effort,
where gh is authed) and marks the run prState=merged. UI button reads 'Ship & merge
PR' when an open PR exists; Ship refreshes runs.
2026-05-31 16:08:28 -07:00
saravanakumardb1
37d049eb69 fix(tracker-web): telemetry ingest auth (X-Install-Token) + show cost as approx
- telemetry proxy attaches an X-Install-Token (derived from the payload, with a
  fallback) so the backend ingest auth gate stops returning 401 on browser beacons.
- job-detail Cost shows ~$x.xx approx when the figure is estimated (token-based).
2026-05-31 15:58:26 -07:00
saravanakumardb1
696ee4189e feat(fleet): track + show PR status (open/merged) on the run
Runs now carry prState (open when the PR is opened, merged when auto-merge
succeeds), reported on lease release. Job-detail Runs table shows a status badge
next to the PR link.
2026-05-31 13:56:33 -07:00
saravanakumardb1
e9c1714c13 feat(tracker-web): factory dropdown routes job to the selected factory's product
New Job form: pick a Factory (2 hardcoded for this machine) -> the job is submitted
to that factory's product (submitJob productId override) and the dashboard view
switches to it so the job is visible. Confirmation shows factory + product.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-31 06:28:28 -07:00
saravanakumardb1
c239abeec9 feat(fleet): per-repo verify + auto-merge options for PR jobs
Add job-level verify (command run in the PR checkout before opening the PR) and
autoMerge (squash-merge the PR once opened). Surfaced in the New Job form as a
Verify-command field + Auto-merge checkbox (PR mode only); confirmation now shows
PR-mode/repo. More repos added to the dropdown.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-31 06:17:19 -07:00
saravanakumardb1
2adddce754 feat(tracker-web): hardcoded repo dropdown for PR-mode jobs (base=main)
MVP: the New Job form picks a PR target from a fixed dropdown of local repos; base
branch is fixed to main. Empty selection = no PR (plain job).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-31 05:36:50 -07:00
saravanakumardb1
883cf329e5 feat(fleet): PR deliverables — jobs target a repo, factory opens a PR, link recorded
Make "shipped" produce a real artifact. A job can now carry an optional repo
(owner/name or clone URL) + baseBranch; the factory's PR mode runs the agent in an
isolated checkout, opens a PR, and records the link.

Backend:
- SubmitJobSchema + FleetJobDoc: optional repo/baseBranch (recorded on submit).
- FleetRunDoc: optional prUrl/branch.
- ReleaseLease report carries prUrl/branch -> stored on the run.
- +2 coordinator tests.

UI (tracker-web):
- New Job form gains optional Repo + Base branch fields (and fixes the priority
  options to the valid critical/high/medium/low; "normal" was rejected by the API).
- Job detail Runs table shows a PR ↗ link from run.prUrl.
- fleet-client: submitJob repo/baseBranch; FleetRun prUrl/branch; OperatorAction +ship.

Docs: FLEET_CONTROL_PLANE.md "PR deliverable (PR mode)" section.

Verified: tsc clean; fleet suite 182; tracker-web 230.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-31 05:27:11 -07:00
saravanakumardb1
176b778a1f feat(tracker-web): submit fleet jobs from the dashboard
Add a collapsible 'New Job' form on the fleet jobs page (task body, priority,
capabilities) wired to a new fleet-client submitJob() -> POST /fleet/jobs, with
inline success/error and auto-refresh. Also add 'ship' to the OperatorAction type
for parity with the coordinator. The existing job-detail 'Ship' button already
drives the human-gate testing -> shipped transition.

Verified: tsc clean; tracker-web suite 230/230.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-31 04:53:23 -07:00
saravanakumardb1
06d7d881a0 fix(tracker-web): stable test DOM env + working localStorage (Node 25)
product-context.test.tsx failed with "localStorage.clear is not a function".
Root cause: Node 25 ships a global `localStorage` Web Storage stub that is
non-functional without --localstorage-file, and it shadows the test DOM
environment's storage. The two DOM tests also relied on `jsdom`, which was only
present transitively (not a tracker-web dependency) while the rest of the
monorepo standardizes on happy-dom.

- Add happy-dom as a tracker-web devDependency; switch the two `@vitest-environment
  jsdom` tests (product-context, command-menu) to happy-dom.
- Add vitest.setup.ts that installs a real in-memory Web Storage over Node 25's
  non-functional stub when the active localStorage/sessionStorage lacks the
  Storage API; wire it via test.setupFiles.

Verified: full tracker-web suite 230/230 (was 228/2).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-31 04:12:34 -07:00
saravanakumardb1
8fe26027e7 chore(deps): bump @types/node 22 -> 25 (dev types)
Verified: full workspace build (tsc) green across all packages/services/dashboards;
fleet+items tests pass. Compile-time only.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-31 04:02:56 -07:00
saravanakumardb1
0079274bff chore(deps): bump bcryptjs 2 -> 3 (+ @types/bcryptjs 3)
Verified: auth + platform-service build (tsc, default import still resolves);
packages/auth (31) + platform-service auth/api-key (65) pass (hash/compare work).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-31 03:46:19 -07:00
saravanakumardb1
168bff6e27 chore(deps): bump lint-staged 15 -> 16 (pre-commit tooling)
Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-31 03:43:42 -07:00
saravanakumardb1
808f615124 chore(deps): bump @azure/cosmos, jose, @typescript-eslint/parser
Applied fresh on current main (the matching dependabot branches were 350-430
commits behind and would have conflicted on the lockfile):
- @azure/cosmos 4.9.1 -> 4.9.3
- jose 6.1.3 -> 6.2.3 (mcp-server stays on the 5.x line: 5.9.6 -> 5.10.0)
- @typescript-eslint/parser 8.0 -> 8.60.0

Verified: full workspace build green; platform-service suite 1684 pass (only the
pre-existing single-fork migration-isolation flake, passes isolated); tracker-web
228 pass (only the pre-existing happy-dom product-context failures). No new
regressions. Major bumps (fastify/cors, happy-dom, lint-staged, stripe,
types/node) deferred for separate review.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-31 03:28:40 -07:00
saravanakumardb1
c1aad8a819 fix(tracker-web): do not send Content-Type on bodyless fleet proxy POSTs
Operator actions (ship/requeue/cancel) are bodyless POSTs. The proxy always set
Content-Type: application/json, so the backend rejected them with
FST_ERR_CTP_EMPTY_JSON_BODY (500). Only declare the JSON content type when a
body is actually forwarded. Fixes the fleet dashboard action buttons.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-31 01:50:11 -07:00
saravanakumardb1
939c7b4621 fix(tracker-web): include productId in login (LoginSchema requires it)
The login form posted {email,password} but platform-service LoginSchema
requires productId, so real logins returned 400 (only the mocked e2e passed).
Send the selected product (tracker_selected_product) or the default PRODUCT_ID.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-31 01:21:58 -07:00
saravanakumardb1
2253f888c7 feat(tracker-web): per-run cost/token/time metrics + log download; fix fleet proxy
Job detail Runs table now shows Duration, Model, Tokens (in/out + cached) and
Cost per run, plus a per-job totals header (cost / tokens / wall-time). Artifacts
get a view/download button via a fresh signed URL. Also fix the fleet API proxy
to forward to /api/fleet/* (backend mounts fleet under /api) so a live backend
resolves; previously it returned 404 and only the mocked e2e passed.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-31 01:19:57 -07:00
Saravanakumar D
0799f69c30 feat(fleet-web): harden budget bar, surface SSE polling, allow checkpoint in patchJob
- budget page: guard spend bar against missing/zero ceiling (no NaN width);
  show an explicit "no ceiling set" state. Add pure budgetUsagePct() helper.
- job detail: replace silent live/poll toggle with an explicit stream-mode
  badge (Live vs Polling) so operators see when SSE degrades to polling.
- fleet-client: extend patchJob to carry optional checkpoint/blockedReason
  matching the server PatchJobSchema; add FleetCheckpoint type.
- tests: unit cover budgetUsagePct + patchJob checkpoint forwarding; e2e
  asserts the polling indicator appears when the stream is unavailable.
- ci: add a Gitea Playwright e2e job that runs the fleet control-plane specs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-30 20:35:05 -07:00
Saravanakumar D
4ac499f301 feat: add multi-reviewer human gate (review-policy routing)
Implements the §14 Phase 3 review gate. requestReview() routes a building
job into the review stage (fencing any worker), carrying a normalized policy
(requiredApprovals + reviewer allowlist) and clearing prior decisions.
submitReview() records one decision per reviewer (last-write-wins, identity-
normalized), advances the job to testing once distinct approvals reach the
quorum, and treats any reject as a veto that returns the job to queued for
rework. Adds POST /fleet/jobs/:id/review/request and POST /fleet/jobs/:id/review,
a typed client, and a review-gate card on the job-detail page.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-30 19:04:24 -07:00
Saravanakumar D
c9c2c174db feat: add fleet metrics + alerting (GET /fleet/metrics)
Adds coordinator.fleetMetrics() computing queue depth, stage histogram,
oldest-queued age (starvation signal), factory health and seat utilization,
plus derived alerts (no_live_capacity, all_factories_down, queue_starvation,
saturated, stale_factories). Exposed via GET /fleet/metrics and surfaced as a
metrics+alerts panel on the fleet overview. Thresholds injectable for tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-30 18:51:59 -07:00
Saravanakumar D
d780739cbe test: add fleet control-plane Playwright e2e coverage
New e2e/fleet.spec.ts with a method- and URL-aware /api/fleet/** mock that
holds mutable state so operator actions and budget toggles reflect in
follow-up GETs. Covers: fleet overview (factory cards + recent jobs), jobs
table + stage filter, job detail requeue (stage building->queued) with the
SSE-driven Live badge, and budget pause/resume. All 4 specs green.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-30 18:44:53 -07:00
Saravanakumar D
ea42602407 feat: add resumable SSE live event stream for fleet jobs
Backend: GET /fleet/jobs/:id/events/stream emits a snapshot (seq > Last-Event-ID)
then long-polls the append-only event log, closing after a bounded window so
EventSource-style clients reconnect cleanly. Honors Last-Event-ID resume,
keepalive comments, and a terminal error frame.

Frontend: subscribeJobEvents uses fetch streaming (to send auth + product
headers) with parseSseFrames, Last-Event-ID resume, reconnect backoff, and a
fatal-on-error-frame fallback to polling. Job detail page subscribes live
(deduped by seq), falls back to 4s polling on failure, and shows a Live badge;
refresh() now merges events so a slow snapshot can't clobber streamed ones.

Tests: +3 route (snapshot, resume cursor, append-after-connect), +5 client
(parseSseFrames x2, subscribe deliver/error/resume/error-frame). fleet 150, web 222.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-30 18:38:50 -07:00
Saravanakumar D
89860e39f9 feat: add fleet cost burndown chart
- coordinator.costBurndown() aggregates completed run cost (insights.costUsd)
  by UTC day over a window, returning a gap-free cumulative series + ceiling
- repository.listRunsByProduct() cross-partition run query
- GET /fleet/budgets/:productId/burndown?days=N route
- fleet-client.getBudgetBurndown() + CostBurndown/BurndownPoint types
- BurndownChart on the budget page: cumulative daily bars with a dashed
  ceiling overlay; bars turn red past the ceiling; degrades gracefully
- Tests: +2 coordinator, +1 routes, +2 fleet-client (fleet 147, web 216)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-30 18:25:27 -07:00
Saravanakumar D
2d5f9be642 feat: surface scoring explainability in fleet control plane
Adds 'why does this job route here?' to the §7 scheduler:
- coordinator.explainJob() re-runs scoreCandidate against every live factory,
  returning per-factory weighted breakdown, eligibility + reasons, deps state,
  and the best eligible factory (read-only, side-effect free)
- GET /fleet/jobs/:id/explain route (404 when job missing)
- fleet-client.getJobExplain() + JobExplain/ScoreBreakdown types
- ExplainPanel on the job detail page: score table per factory with the six
  weighted terms, eligibility, and unmet-deps note; degrades gracefully
- Tests: +2 coordinator, +1 routes, +2 fleet-client (fleet 144 green,
  tracker-web 214 green)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-30 18:21:14 -07:00
Saravanakumar D
283383561c feat: complete operator job actions (requeue/reject/cancel)
Adds lease-free operator lifecycle control to the fleet control plane:
- coordinator.operatorAction() fences any current factory holder by bumping
  leaseEpoch (mirrors the reaper), preserves checkpoint, and routes the job:
  requeue -> queued (or blocked if deps unmet), reject -> dead_letter,
  cancel -> failed. Shipped jobs are terminal (invalid_state).
- POST /fleet/jobs/:id/actions/:action route (400 on unknown action)
- fleet-client.operatorAction() + Requeue/Cancel/Reject buttons on job detail
- Tests: +5 coordinator, +1 routes, +2 fleet-client (platform fleet 140 green,
  tracker-web 212 green)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-30 18:12:11 -07:00
4777b28698 feat(dashboards): add ops cockpit and execution pipeline
Some checks failed
Publish @bytelyst/* packages / publish (push) Failing after 14s
CI — Common Platform / Build, Test & Typecheck (push) Successful in 42s
2026-05-30 23:12:06 +00:00
87acb8e414 test(admin-web): stabilize blocking e2e suite
Some checks failed
Publish @bytelyst/* packages / publish (push) Failing after 13s
CI — Common Platform / Build, Test & Typecheck (push) Successful in 44s
2026-05-30 22:20:14 +00:00
7465b21d91 style(admin-web): format dashboard sources
Some checks failed
Publish @bytelyst/* packages / publish (push) Failing after 13s
CI — Common Platform / Build, Test & Typecheck (push) Successful in 53s
2026-05-30 21:18:09 +00:00
20e1ac0e67 feat(admin-web): sync product context changes
Some checks failed
Publish @bytelyst/* packages / publish (push) Failing after 13s
CI — Common Platform / Build, Test & Typecheck (push) Successful in 52s
2026-05-30 21:04:04 +00:00
f77797881b feat(admin-web): add dashboard retry state
Some checks failed
Publish @bytelyst/* packages / publish (push) Failing after 11s
CI — Common Platform / Build, Test & Typecheck (push) Successful in 42s
2026-05-30 21:02:21 +00:00
e69ffadb4b feat(tracker-web): sync admin product context
Some checks failed
Publish @bytelyst/* packages / publish (push) Failing after 13s
CI — Common Platform / Build, Test & Typecheck (push) Successful in 48s
2026-05-30 20:27:27 +00:00
c1a88a39e2 feat(tracker-web): add dashboard stats retry state
Some checks failed
Publish @bytelyst/* packages / publish (push) Failing after 13s
CI — Common Platform / Build, Test & Typecheck (push) Successful in 44s
2026-05-30 20:23:14 +00:00
6c49296d40 test(tracker-web): stabilize dashboard e2e assertion
Some checks failed
Publish @bytelyst/* packages / publish (push) Failing after 13s
CI — Common Platform / Build, Test & Typecheck (push) Successful in 52s
2026-05-30 20:00:06 +00:00
359d99b67f feat(tracker-web): expose webhook ingestion proxy
Some checks failed
Publish @bytelyst/* packages / publish (push) Failing after 12s
CI — Common Platform / Build, Test & Typecheck (push) Successful in 47s
2026-05-30 19:56:28 +00:00
6e023f3bdc feat(tracker-web): expose agent v1 proxy
Some checks failed
Publish @bytelyst/* packages / publish (push) Failing after 14s
CI — Common Platform / Build, Test & Typecheck (push) Successful in 42s
2026-05-30 19:54:56 +00:00
ccfbfd194a feat(tracker-web): add admin settings page
Some checks failed
Publish @bytelyst/* packages / publish (push) Failing after 11s
CI — Common Platform / Build, Test & Typecheck (push) Successful in 38s
2026-05-30 19:53:16 +00:00