# Gigafactory: Tasks to Complete
Companion to `ROADMAP_COMPLETION_AUDIT.md`. Tasks are ordered by priority; update the checkboxes as work lands.
- [x] Operator job actions: requeue / reject / cancel
  - Priority: P0 (highest-impact safe slice; completes the Phase-3 §14 "approve/ship/reject/requeue" item)
  - Current status: ✅ DONE. `operatorAction` + route + client + UI buttons + 8 tests; fleet 141 green.
  - Files involved:
    - `services/platform-service/src/modules/fleet/coordinator.ts` (new `operatorAction`)
    - `services/platform-service/src/modules/fleet/routes.ts` (new `POST /fleet/jobs/:id/actions/:action`)
    - `services/platform-service/src/modules/fleet/coordinator.test.ts` (tests)
    - `dashboards/tracker-web/src/lib/fleet-client.ts` (client function)
    - `dashboards/tracker-web/src/app/dashboard/fleet/jobs/[id]/page.tsx` (buttons)
  - Implementation plan: an operator action does NOT require a held lease. It bumps `leaseEpoch` to fence any current holder (mirroring the reaper), preserves the checkpoint, sets the stage (requeue → `queued`/`blocked`, reject → `dead_letter`, cancel → `failed`), and appends an event.
  - Acceptance criteria: requeueing a `building` job yields stage `queued`, epoch + 1, and a fenced (409) zombie report; reject → `dead_letter`; cancel → `failed`; unknown action → 400; behavior is flag-independent; all prior tests stay green.
  - Verification command: `pnpm --filter @lysnrai/platform-service exec vitest run src/modules/fleet`
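The fencing rule in the implementation plan can be sketched as follows. This is a minimal in-memory illustration, not the actual `coordinator.ts` code: the `Job` shape, the `FleetError` class, the `reportProgress` guard, and the `hasUnmetDeps` flag (used here to choose between `queued` and `blocked` on requeue) are all hypothetical.

```typescript
type Stage = "queued" | "blocked" | "building" | "review" | "testing" | "dead_letter" | "failed";

interface JobEvent {
  type: string;
  at: number;
  detail: { from: Stage; to: Stage };
}

// Hypothetical, simplified job record.
interface Job {
  id: string;
  stage: Stage;
  leaseEpoch: number;
  checkpoint: unknown; // carried over untouched by operator actions
  hasUnmetDeps: boolean; // assumption: decides queued vs blocked on requeue
  events: JobEvent[];
}

class FleetError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

const ACTIONS: ReadonlySet<string> = new Set(["requeue", "reject", "cancel"]);

// No lease is required: bumping leaseEpoch fences whichever worker still holds
// the old epoch, the same way the reaper does. Returns a new job object.
function operatorAction(job: Job, action: string, now: number = Date.now()): Job {
  if (!ACTIONS.has(action)) throw new FleetError(400, `unknown action: ${action}`);
  const to: Stage =
    action === "requeue" ? (job.hasUnmetDeps ? "blocked" : "queued")
    : action === "reject" ? "dead_letter"
    : "failed"; // cancel
  return {
    ...job, // keeps id and checkpoint as they were
    stage: to,
    leaseEpoch: job.leaseEpoch + 1,
    events: [...job.events, { type: `operator.${action}`, at: now, detail: { from: job.stage, to } }],
  };
}

// Hypothetical worker-report guard: a report stamped with a stale epoch is a 409.
function reportProgress(job: Job, workerEpoch: number): void {
  if (workerEpoch !== job.leaseEpoch) throw new FleetError(409, "stale lease epoch");
}
```

Because the operator path never checks a lease, it works even when the holder is a zombie; the epoch bump alone is what turns that worker's next report into a 409.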
- [x] Scoring explainability surfaced in UI
  - Priority: P1 (data already computed; Phase-3 §14)
  - Current status: ✅ DONE. `explainJob` + `GET /fleet/jobs/:id/explain` + `ExplainPanel`; fleet 144 green.
  - Files involved: `scheduler.ts`, `coordinator.ts`, `routes.ts`, `fleet-client.ts`, fleet job detail page
  - Implementation plan: add `GET /fleet/jobs/:id/explain`, returning the would-be score breakdown against the current factories; render a "why this routes here" panel.
  - Acceptance criteria: the endpoint returns per-factor contributions; the UI shows them; the panel degrades gracefully if they are absent.
  - Verification command: `pnpm --filter @lysnrai/platform-service exec vitest run src/modules/fleet`
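A per-factor breakdown like the one `GET /fleet/jobs/:id/explain` returns can be sketched as a weighted sum whose individual terms are kept instead of discarded. The factor names, weights, and the `Factory`/`JobReq` shapes below are hypothetical; the real `scheduler.ts` scoring may use different factors or a non-linear combination.

```typescript
// Hypothetical shapes; the real fleet types may carry more fields.
interface Factory {
  id: string;
  capabilities: string[];
  freeSeats: number;
  totalSeats: number;
  costPerHour: number;
}
interface JobReq {
  requiredCapabilities: string[];
}

// Each factor maps (job, factory) to a value in [0, 1]; weights are illustrative.
type FactorFn = (job: JobReq, f: Factory) => number;
const FACTORS: Record<string, { weight: number; fn: FactorFn }> = {
  capability: { weight: 0.5, fn: (j, f) => (j.requiredCapabilities.every((c) => f.capabilities.includes(c)) ? 1 : 0) },
  headroom: { weight: 0.3, fn: (_j, f) => (f.totalSeats > 0 ? f.freeSeats / f.totalSeats : 0) },
  cost: { weight: 0.2, fn: (_j, f) => 1 / (1 + f.costPerHour) },
};

interface Explanation {
  factoryId: string;
  total: number;
  contributions: Record<string, number>; // weight * factor value, per factor
}

// Score every factory but keep each term, so the UI can show why one wins.
function explainJob(job: JobReq, factories: Factory[]): Explanation[] {
  return factories
    .map((f) => {
      const contributions: Record<string, number> = {};
      let total = 0;
      for (const name of Object.keys(FACTORS)) {
        const { weight, fn } = FACTORS[name];
        contributions[name] = weight * fn(job, f);
        total += contributions[name];
      }
      return { factoryId: f.id, total, contributions };
    })
    .sort((a, b) => b.total - a.total); // best-scoring factory first
}
```

An empty factory list yields an empty array, which is one simple signal the panel can use to fall back to its degraded state.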
- [x] Cost burndown chart
  - Priority: P1
  - Current status: ✅ DONE. `costBurndown` + `GET /fleet/budgets/:id/burndown` + `BurndownChart`; fleet 147 green.
  - Files involved: `dashboards/tracker-web/src/app/dashboard/fleet/budget/page.tsx`, new client function
  - Implementation plan: aggregate run cost by day from events/runs; render the burndown with a ceiling overlay.
  - Acceptance criteria: per-day spend is visible with a ceiling line; an empty state is shown when there is no data.
  - Verification command: `pnpm --filter @bytelyst/tracker-web test`
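A sketch of the day-bucketed aggregation behind the burndown, assuming each run carries an ISO-8601 end timestamp and a cost in USD (both field names are hypothetical) and that days are bucketed in UTC. The ceiling overlay then only needs `cumulative` and `remaining` per day.

```typescript
// Hypothetical run-cost record: ISO-8601 end time plus cost in USD.
interface RunCost {
  endedAt: string;
  costUsd: number;
}

interface BurndownPoint {
  day: string; // YYYY-MM-DD (UTC)
  spent: number; // spend on that day
  cumulative: number; // spend up to and including that day
  remaining: number; // ceiling minus cumulative (negative once over budget)
}

function costBurndown(runs: RunCost[], ceilingUsd: number): BurndownPoint[] {
  const byDay = new Map<string, number>();
  for (const r of runs) {
    const t = Date.parse(r.endedAt);
    if (Number.isNaN(t)) continue; // skip malformed timestamps
    const day = new Date(t).toISOString().slice(0, 10);
    byDay.set(day, (byDay.get(day) ?? 0) + r.costUsd);
  }
  let cumulative = 0;
  return Array.from(byDay.keys())
    .sort() // ISO dates sort chronologically as strings
    .map((day) => {
      const spent = byDay.get(day) ?? 0;
      cumulative += spent;
      return { day, spent, cumulative, remaining: ceilingUsd - cumulative };
    });
}
```

An empty input returns an empty array, which maps directly onto the chart's empty state.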
- [x] SSE live log streaming
  - Priority: P2 (larger; §17 single-stream contract)
  - Current status: ✅ DONE. `GET /fleet/jobs/:id/events/stream` (resumable SSE) + a `subscribeJobEvents` fetch-streaming consumer with `Last-Event-ID` resume, polling fallback, and a Live badge; fleet 150, web 222 green.
  - Files involved:
    - `services/platform-service/src/modules/fleet/routes.ts` (stream route + `clampInt`/`delay`)
    - `dashboards/tracker-web/src/lib/fleet-client.ts` (`parseSseFrames`, `subscribeJobEvents`)
    - job detail page (live subscribe + fallback + Live indicator)
    - route + client tests
  - Implementation plan: `GET /fleet/jobs/:id/events/stream` (SSE) emits appended events; the UI subscribes via fetch streaming (so auth headers can be sent) with a polling fallback.
  - Acceptance criteria: new events appear without a refresh; reconnect and fallback both work.
  - Verification command: `pnpm --filter @lysnrai/platform-service test`
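The client side of the resumable stream boils down to splitting the response text into SSE frames. The sketch below follows the SSE wire format (fields `id`, `event`, `data`; frames end at a blank line; lines starting with `:` are comments), but its signature is hypothetical and may not match the real `parseSseFrames`. It returns any incomplete trailing frame so the caller can prepend it to the next chunk.

```typescript
interface SseFrame {
  id?: string;
  event: string; // defaults to "message", as in the SSE spec
  data: string; // multiple data: lines joined with "\n"
}

// Parse every complete frame (terminated by a blank line) out of `buffer`.
// `rest` is the unterminated tail; prepend it to the next network chunk.
// Assumes LF or CRLF line endings (the spec also allows a bare CR).
// Simplification: frames without any data: line (e.g. id-only) are dropped.
function parseSseFrames(buffer: string): { frames: SseFrame[]; rest: string } {
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = blocks.pop() ?? "";
  const frames: SseFrame[] = [];
  for (const block of blocks) {
    let id: string | undefined;
    let event = "message";
    const data: string[] = [];
    for (const line of block.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // blank or comment/keep-alive
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // one leading space is stripped
      if (field === "data") data.push(value);
      else if (field === "event") event = value;
      else if (field === "id") id = value;
    }
    if (data.length > 0) frames.push({ id, event, data: data.join("\n") });
  }
  return { frames, rest };
}
```

A `subscribeJobEvents`-style consumer would remember the `id` of the last frame it handled and send it as the `Last-Event-ID` request header when it reconnects, so the server can resume from that point.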
- [x] Fleet Playwright e2e
  - Priority: P2 (Phase-3 exit gate)
  - Current status: ✅ DONE. `e2e/fleet.spec.ts`, 4 specs (overview, jobs table, job-detail requeue + live badge, budget pause/resume) against a method/URL-aware mock of `/api/fleet/**`; all green.
  - Files involved: `dashboards/tracker-web/e2e/fleet.spec.ts`
  - Implementation plan: cover fleet map render, jobs table, job detail action, and budget pause/resume against a mocked fleet API.
  - Acceptance criteria: e2e green in the CI config.
  - Verification command: `pnpm --filter @bytelyst/tracker-web exec playwright test fleet`
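One way to make the mocked `/api/fleet/**` method- and URL-aware is a small dispatch table keyed on both the HTTP method and a path pattern, with mutable state so that actions (requeue, pause/resume) show up on the next read. The budget paths, the `MockState` shape, and `dispatchMock` are hypothetical; only the `/api/fleet/**` prefix and the job-action route shape come from this document.

```typescript
interface MockJob {
  id: string;
  stage: string;
}
interface MockState {
  jobs: Record<string, MockJob>;
  budgetPaused: boolean;
}
interface MockResponse {
  status: number;
  body: unknown;
}
type Handler = (match: RegExpMatchArray, state: MockState) => MockResponse;

const notFound: MockResponse = { status: 404, body: { error: "not found" } };

// Matched on method AND path, so GET vs POST on one URL can get different stubs.
// Paths are hypothetical stand-ins for the real fleet API.
const ROUTES: Array<[string, RegExp, Handler]> = [
  ["GET", /^\/api\/fleet\/jobs$/, (_m, s) => ({ status: 200, body: Object.keys(s.jobs).map((k) => s.jobs[k]) })],
  ["GET", /^\/api\/fleet\/jobs\/([^/]+)$/, (m, s) => (s.jobs[m[1]] ? { status: 200, body: s.jobs[m[1]] } : notFound)],
  [
    "POST",
    /^\/api\/fleet\/jobs\/([^/]+)\/actions\/requeue$/,
    (m, s) => {
      const job = s.jobs[m[1]];
      if (!job) return notFound;
      job.stage = "queued"; // mutate state so the next GET reflects the action
      return { status: 200, body: job };
    },
  ],
  [
    "POST",
    /^\/api\/fleet\/budgets\/[^/]+\/(pause|resume)$/,
    (m, s) => {
      s.budgetPaused = m[1] === "pause";
      return { status: 200, body: { paused: s.budgetPaused } };
    },
  ],
];

function dispatchMock(method: string, url: string, state: MockState): MockResponse {
  const path = new URL(url).pathname;
  for (const [verb, pattern, handler] of ROUTES) {
    if (verb !== method.toUpperCase()) continue;
    const match = path.match(pattern);
    if (match) return handler(match, state);
  }
  return { status: 501, body: { error: `unmocked ${method} ${path}` } };
}
```

In a spec, a `page.route('**/api/fleet/**', ...)` handler could pass `route.request().method()` and `route.request().url()` to `dispatchMock` and answer with `route.fulfill({ status, contentType: 'application/json', body: JSON.stringify(body) })`. Returning 501 for anything unmatched makes an unmocked call visible in the test instead of passing through to a real backend.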
- [ ] Phase-1 `budget.wall` enforcement (P3): wall-clock ceiling extending the timeout in `agent-queue.sh`.
- [ ] Node `dash` tag surfacing (P3): profile/priority/caps/tracker-item link in `dashboard.mjs`.
- [ ] Roadmap §14 reconciliation (P3): tick the Phase-2/3 boxes in `learning_ai_devops_tools`.
- [x] Fleet metrics + alerting (P3): ✅ DONE. `GET /fleet/metrics` (`coordinator.fleetMetrics`) reports queue depth, a stage histogram, oldest-queued age (starvation), factory health/seat utilization, and derived alerts (`no_live_capacity`, `all_factories_down`, `queue_starvation`, `saturated`, `stale_factories`). These are surfaced as a metrics + alerts panel on the fleet overview (`getFleetMetrics`). Files: `coordinator.ts`, `routes.ts`, `fleet-client.ts`, `dashboard/fleet/page.tsx` + tests + e2e.
- [x] Multi-reviewer routing (P3): ✅ DONE. Review-policy human gate (§14). `requestReview` routes a building job into `review` (fencing the worker); `submitReview` records per-reviewer approve/reject votes (last-write-wins, identity-normalized) and advances the job to `testing` once distinct approvals reach the quorum, while any reject vetoes it back to `queued` for rework. Routes: `POST /fleet/jobs/:id/review/request`, `POST /fleet/jobs/:id/review`. UI: a review-gate card on the job detail page (`requestReview`/`submitReview`). Files: `types.ts`, `coordinator.ts`, `routes.ts`, `fleet-client.ts`, `dashboard/fleet/jobs/[id]/page.tsx` + coordinator/route/client tests + e2e.
- [ ] TUI re-point at `/fleet` (P3): Phase-3 §14.
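The derived fleet alerts can be sketched as pure predicates over a metrics snapshot. The alert names come from this document; their exact conditions do not, so the thresholds (15-minute starvation, 2-minute heartbeat staleness, 90% seat saturation), the snapshot shape, and the `deriveAlerts` name are assumptions for illustration.

```typescript
interface FactorySnapshot {
  id: string;
  healthy: boolean;
  lastHeartbeatMs: number;
  seatsUsed: number;
  seatsTotal: number;
}
interface MetricsSnapshot {
  now: number;
  queueDepth: number;
  oldestQueuedAtMs?: number; // undefined when the queue is empty
  factories: FactorySnapshot[];
}
type Alert = "no_live_capacity" | "all_factories_down" | "queue_starvation" | "saturated" | "stale_factories";

// Illustrative thresholds (assumptions).
const STARVATION_MS = 15 * 60 * 1000; // oldest queued job older than 15 min
const STALE_MS = 2 * 60 * 1000; // no heartbeat for more than 2 min
const SATURATION = 0.9; // at least 90% of live seats in use

function deriveAlerts(m: MetricsSnapshot): Alert[] {
  const isStale = (f: FactorySnapshot) => m.now - f.lastHeartbeatMs > STALE_MS;
  const live = m.factories.filter((f) => f.healthy && !isStale(f));
  const totalSeats = live.reduce((n, f) => n + f.seatsTotal, 0);
  const freeSeats = live.reduce((n, f) => n + Math.max(0, f.seatsTotal - f.seatsUsed), 0);

  const alerts: Alert[] = [];
  if (m.factories.length > 0 && live.length === 0) alerts.push("all_factories_down");
  if (m.queueDepth > 0 && freeSeats === 0) alerts.push("no_live_capacity");
  if (m.oldestQueuedAtMs !== undefined && m.now - m.oldestQueuedAtMs > STARVATION_MS) alerts.push("queue_starvation");
  if (totalSeats > 0 && (totalSeats - freeSeats) / totalSeats >= SATURATION) alerts.push("saturated");
  if (m.factories.some(isStale)) alerts.push("stale_factories");
  return alerts;
}
```

Keeping the alert logic pure over a snapshot means the panel and any future notifier can share it, and each threshold can be tested in isolation.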
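The review-gate rule described in the multi-reviewer item (last write wins per reviewer, identities normalized, distinct approvals counted against a quorum, any reject vetoes) can be sketched as a pure state transition. Normalizing by trimming and lower-casing, keeping the recorded votes after a veto, and the `ReviewState` shape are assumptions, not the documented behavior of `submitReview`.

```typescript
type Verdict = "approve" | "reject";
interface ReviewState {
  stage: "review" | "testing" | "queued";
  quorum: number; // distinct approvals needed
  votes: Map<string, Verdict>; // normalized reviewer id -> latest verdict
}

// Assumption: identity normalization = trim + lower-case.
function normalizeReviewer(id: string): string {
  return id.trim().toLowerCase();
}

function submitReview(state: ReviewState, reviewer: string, verdict: Verdict): ReviewState {
  if (state.stage !== "review") throw new Error(`job is not in review (stage: ${state.stage})`);
  const votes = new Map(state.votes);
  votes.set(normalizeReviewer(reviewer), verdict); // last write wins per reviewer
  if (verdict === "reject") return { ...state, votes, stage: "queued" }; // veto: back for rework
  let approvals = 0;
  votes.forEach((v) => {
    if (v === "approve") approvals += 1;
  });
  return { ...state, votes, stage: approvals >= state.quorum ? "testing" : "review" };
}
```

Keying votes by normalized identity is what makes "distinct approvals" meaningful: the same person approving twice, or under two spellings, still counts once.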
## Phase 4 / 5 (post-MVP, tracked only)
- Message broker (NATS/Redis) push dispatch + backpressure
- Autoscaling hooks (ephemeral factories)
- Capability marketplace + cross-product fairness
- Load + chaos suite
- Outcome feature capture · offline eval harness · A/B weight tuning · recommendations