learning_ai_common_plat/docs/TASKS_TO_COMPLETE.md

# Gigafactory — Tasks to Complete

> Companion to `ROADMAP_COMPLETION_AUDIT.md`. Ordered by priority. Update checkboxes as work lands.

---

- [x] **Operator job actions — requeue / reject / cancel**
  - Priority: P0 (highest-impact safe slice; completes Phase-3 §14 "approve/ship/reject/requeue")
  - Current status: ✅ DONE — `operatorAction` + route + client + UI buttons + 8 tests; fleet 141 green
  - Files involved:
    - `services/platform-service/src/modules/fleet/coordinator.ts` (new `operatorAction`)
    - `services/platform-service/src/modules/fleet/routes.ts` (new `POST /fleet/jobs/:id/actions/:action`)
    - `services/platform-service/src/modules/fleet/coordinator.test.ts` (tests)
    - `dashboards/tracker-web/src/lib/fleet-client.ts` (client fn)
    - `dashboards/tracker-web/src/app/dashboard/fleet/jobs/[id]/page.tsx` (buttons)
  - Implementation plan: operator action does NOT require a held lease; it bumps `leaseEpoch`
    to fence any current holder (mirrors the reaper), preserves checkpoint, sets stage
    (requeue→queued/blocked, reject→dead_letter, cancel→failed), appends an event.
  - Acceptance criteria: requeue a building job → stage queued, epoch+1, zombie report fenced (409);
    reject → dead_letter; cancel → failed; unknown action → 400; flag-independent; all prior tests green.
  - Verification command: `pnpm --filter @lysnrai/platform-service exec vitest run src/modules/fleet`
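
The fencing rule in the plan above can be illustrated with a minimal sketch. It is not the actual `operatorAction` in `coordinator.ts`: the `Job` shape, the event type names, and the `acceptWorkerReport` helper are assumptions made for illustration, and requeue always targets `queued` here even though the plan also allows `blocked`.

```typescript
// Hypothetical shapes; the real fleet types may differ.
type Stage = "queued" | "blocked" | "building" | "review" | "testing" | "dead_letter" | "failed";
type OperatorAction = "requeue" | "reject" | "cancel";

interface JobEvent { type: string; at: number }
interface Job {
  id: string;
  stage: Stage;
  leaseEpoch: number;
  checkpoint?: unknown;
  events: JobEvent[];
}

// Target stage per action (requeue -> queued is a simplification of queued/blocked).
const TARGET_STAGE: Record<OperatorAction, Stage> = {
  requeue: "queued",
  reject: "dead_letter",
  cancel: "failed",
};

function isOperatorAction(value: string): value is OperatorAction {
  return Object.prototype.hasOwnProperty.call(TARGET_STAGE, value);
}

// No lease is required: the epoch bump alone invalidates any current holder.
function operatorAction(job: Job, action: string, now: number = Date.now()): Job {
  if (!isOperatorAction(action)) {
    // A route layer would presumably map this to HTTP 400.
    throw new RangeError(`unknown action: ${action}`);
  }
  return {
    ...job, // checkpoint is carried over untouched
    stage: TARGET_STAGE[action],
    leaseEpoch: job.leaseEpoch + 1,
    events: [...job.events, { type: `operator_${action}`, at: now }], // assumed event name
  };
}

// A worker report carrying a stale epoch is fenced with 409.
function acceptWorkerReport(job: Job, reportEpoch: number): 200 | 409 {
  return reportEpoch === job.leaseEpoch ? 200 : 409;
}
```

Because the epoch check is the only gate, a worker that still believes it holds the lease (a "zombie") is rejected on its next report without the operator needing to contact it.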

- [x] **Scoring explainability surfaced in UI**
  - Priority: P1 (data already computed; Phase-3 §14)
  - Current status: ✅ DONE — `explainJob` + `GET /fleet/jobs/:id/explain` + ExplainPanel; fleet 144 green
  - Files involved: `scheduler.ts`, `coordinator.ts`, `routes.ts`, `fleet-client.ts`, fleet job detail page
  - Implementation plan: add `GET /fleet/jobs/:id/explain` returning the would-be score breakdown
    against current factories; render a "why this routes here" panel.
  - Acceptance criteria: endpoint returns per-factor contributions; UI shows them and degrades gracefully when the breakdown is absent.
  - Verification command: `pnpm --filter @lysnrai/platform-service exec vitest run src/modules/fleet`
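
As a rough illustration of what "per-factor contributions" could look like, the sketch below assumes a simple weighted-sum score. The real scoring in `scheduler.ts` is not shown in this document, so the factor names, the weighted-sum model, and the `explainScore` helper are all hypothetical.

```typescript
// Hypothetical model: score = sum over factors of (value * weight).
interface Contribution {
  factor: string;
  value: number;
  weight: number;
  contribution: number;
}

interface ScoreExplanation {
  total: number;
  contributions: Contribution[];
}

function explainScore(
  values: Partial<Record<string, number>>,
  weights: Record<string, number>,
): ScoreExplanation {
  const contributions: Contribution[] = Object.entries(weights).map(([factor, weight]) => {
    const value = values[factor] ?? 0; // a missing factor contributes nothing
    return { factor, value, weight, contribution: value * weight };
  });
  // Largest absolute effect first, so a UI panel can lead with the dominant factor.
  contributions.sort((a, b) => Math.abs(b.contribution) - Math.abs(a.contribution));
  const total = contributions.reduce((sum, c) => sum + c.contribution, 0);
  return { total, contributions };
}
```

Returning the breakdown alongside the total keeps the panel honest: the displayed factors always sum to the score being explained.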

- [x] **Cost burndown chart**
  - Priority: P1
  - Current status: ✅ DONE — `costBurndown` + `GET /fleet/budgets/:id/burndown` + BurndownChart; fleet 147 green
  - Files involved: `dashboards/tracker-web/src/app/dashboard/fleet/budget/page.tsx`, new client fn
  - Implementation plan: aggregate run cost by day from events/runs; render burndown vs ceiling overlay.
  - Acceptance criteria: per-day spend visible with ceiling line; empty state when no data.
  - Verification command: `pnpm --filter @bytelyst/tracker-web test`
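
The per-day aggregation described in the plan above might look roughly like the following. This is a sketch under assumptions, not the shipped `costBurndown`: the `RunCost` shape, the use of integer cents, and bucketing by UTC day are illustrative choices.

```typescript
// Hypothetical input: one record per finished run, cost in integer cents.
interface RunCost {
  finishedAt: string; // ISO-8601 timestamp
  costCents: number;
}

interface BurndownPoint {
  day: string; // YYYY-MM-DD (UTC)
  spent: number; // spend on that day
  cumulative: number; // running total up to and including that day
  remaining: number; // ceiling minus cumulative; negative means over budget
}

function costBurndownSketch(runs: RunCost[], ceilingCents: number): BurndownPoint[] {
  const perDay = new Map<string, number>();
  for (const run of runs) {
    // toISOString() is always UTC, so the first 10 characters are the UTC day.
    const day = new Date(run.finishedAt).toISOString().slice(0, 10);
    perDay.set(day, (perDay.get(day) ?? 0) + run.costCents);
  }
  // ISO day strings sort chronologically under plain string comparison.
  const days = Array.from(perDay.entries()).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  let cumulative = 0;
  return days.map(([day, spent]) => {
    cumulative += spent;
    return { day, spent, cumulative, remaining: ceilingCents - cumulative };
  });
}
```

An empty input yields an empty array, which maps directly onto the empty-state acceptance criterion.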

- [x] **SSE live log streaming**
  - Priority: P2 (larger; §17 single-stream contract)
  - Current status: ✅ DONE — `GET /fleet/jobs/:id/events/stream` (resumable SSE) + `subscribeJobEvents`
    fetch-streaming consumer with Last-Event-ID resume, polling fallback, and a Live badge; fleet 150,
    web 222 green
  - Files involved: `services/platform-service/src/modules/fleet/routes.ts` (stream route + clampInt/delay),
    `dashboards/tracker-web/src/lib/fleet-client.ts` (`parseSseFrames`, `subscribeJobEvents`),
    job detail page (live subscribe + fallback + Live indicator), route + client tests
  - Implementation plan: `GET /fleet/jobs/:id/events/stream` (SSE) emitting appended events;
    UI subscribes via fetch streaming (auth headers) with polling fallback.
  - Acceptance criteria: new events appear without refresh; reconnect + fallback work.
  - Verification command: `pnpm --filter @lysnrai/platform-service test`
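
To make the fetch-streaming approach above concrete, here is a minimal sketch of incremental SSE frame parsing plus the resume header. It is not the repository's `parseSseFrames`; the function names, the `{ frames, rest }` return shape, and the helper `resumeHeaders` are assumptions. Only `\n` and `\r\n` line endings are handled.

```typescript
interface SseFrame {
  id?: string;
  event?: string;
  data: string;
}

// Parses every complete frame in `buffer` and returns the unfinished tail,
// which the caller prepends to the next chunk read from the stream.
function parseFramesSketch(buffer: string): { frames: SseFrame[]; rest: string } {
  const parts = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = parts.pop() ?? ""; // the last piece is incomplete until a blank line arrives
  const frames: SseFrame[] = [];
  for (const part of parts) {
    const frame: SseFrame = { data: "" };
    const dataLines: string[] = [];
    for (const line of part.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // blank line or comment/heartbeat
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // one optional leading space
      if (field === "data") dataLines.push(value);
      else if (field === "id") frame.id = value;
      else if (field === "event") frame.event = value;
    }
    if (dataLines.length === 0) continue; // frames without data are not dispatched
    frame.data = dataLines.join("\n");
    frames.push(frame);
  }
  return { frames, rest };
}

// Headers for a (re)connect; Last-Event-ID asks the server to resume after that event.
function resumeHeaders(lastEventId?: string): Record<string, string> {
  const headers: Record<string, string> = { Accept: "text/event-stream" };
  if (lastEventId !== undefined) headers["Last-Event-ID"] = lastEventId;
  return headers;
}
```

A consumer would keep `rest` between reads, remember the `id` of the last dispatched frame, and pass it to `resumeHeaders` when reconnecting; if the stream cannot be opened at all, it can fall back to polling.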

- [x] **Fleet Playwright e2e**
  - Priority: P2 (Phase-3 exit gate)
  - Current status: ✅ DONE — `e2e/fleet.spec.ts`, 4 specs (overview, jobs table, job-detail requeue +
    live badge, budget pause/resume) against a method/URL-aware mocked `/api/fleet/**`; all green
  - Files involved: `dashboards/tracker-web/e2e/fleet.spec.ts`
  - Implementation plan: cover fleet map render, jobs table, job detail action, budget pause/resume
    against a mocked fleet API.
  - Acceptance criteria: e2e suite passes under the CI config.
  - Verification command: `pnpm --filter @bytelyst/tracker-web exec playwright test fleet`

- [ ] **Phase-1 `budget.wall` enforcement** — P3 — `agent-queue.sh` — wall-clock ceiling extending timeout.
- [ ] **Node `dash` tag surfacing** — P3 — `dashboard.mjs` — profile/priority/caps/tracker-item link.
- [ ] **Roadmap §14 reconciliation** — P3 — tick Phase-2/3 boxes in `learning_ai_devops_tools`.
- [x] **Fleet metrics + alerting** — P3 — ✅ DONE — `GET /fleet/metrics` (`coordinator.fleetMetrics`):
      queue depth, stage histogram, oldest-queued age (starvation), factory health/seat utilization, and
      derived alerts (`no_live_capacity`, `all_factories_down`, `queue_starvation`, `saturated`,
      `stale_factories`). Surfaced as a metrics+alerts panel on the fleet overview (`getFleetMetrics`).
      Files: `coordinator.ts`, `routes.ts`, `fleet-client.ts`, `dashboard/fleet/page.tsx` + tests + e2e.
- [x] **Multi-reviewer routing** — P3 — ✅ DONE — review-policy human gate (§14). `requestReview`
      routes a building job into `review` (fences worker); `submitReview` records per-reviewer
      approve/reject (last-write-wins, identity-normalized), then advances the job to `testing` once
      distinct approvals reach the quorum, or sends it back to `queued` for rework on any reject. Routes:
      `POST /fleet/jobs/:id/review/request`, `POST /fleet/jobs/:id/review`. UI: review-gate card on
      job detail (`requestReview`/`submitReview`). Files: `types.ts`, `coordinator.ts`, `routes.ts`,
      `fleet-client.ts`, `dashboard/fleet/jobs/[id]/page.tsx` + coordinator/route/client tests + e2e.
- [ ] **TUI re-point at `/fleet`** — P3 — Phase-3 §14.
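
The quorum rule from the multi-reviewer routing item above can be sketched as follows. This is an illustration under assumptions rather than the real `submitReview`: the `ReviewState` shape, the trim-and-lowercase normalization, and the returned stage values are hypothetical.

```typescript
type Verdict = "approve" | "reject";
type ReviewOutcome = "review" | "testing" | "queued";

// Hypothetical state: one verdict per normalized reviewer identity.
interface ReviewState {
  quorum: number;
  verdicts: Map<string, Verdict>;
}

// Assumed normalization: trim whitespace and ignore case.
function normalizeReviewer(identity: string): string {
  return identity.trim().toLowerCase();
}

function submitReviewSketch(state: ReviewState, reviewer: string, verdict: Verdict): ReviewOutcome {
  // Last write wins: a later verdict from the same identity replaces the earlier one.
  state.verdicts.set(normalizeReviewer(reviewer), verdict);
  const verdicts = Array.from(state.verdicts.values());
  // Any reject acts as a veto and sends the job back for rework.
  if (verdicts.some((v) => v === "reject")) return "queued";
  // Map keys are unique, so this counts distinct approving reviewers.
  const approvals = verdicts.filter((v) => v === "approve").length;
  return approvals >= state.quorum ? "testing" : "review";
}
```

Keying the map by normalized identity is what prevents one reviewer from satisfying the quorum alone by approving twice under differently cased names.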

### Phase 4 / 5 (post-MVP, tracked only)

- [ ] Message broker (NATS/Redis) push dispatch + backpressure
- [ ] Autoscaling hooks (ephemeral factories)
- [ ] Capability marketplace + cross-product fairness
- [ ] Load + chaos suite
- [ ] Outcome feature capture · offline eval harness · A/B weight tuning · recommendations