learning_ai_common_plat/docs/TASKS_TO_COMPLETE.md
Saravanakumar D 69f553d432 docs: mark operator job actions complete in TASKS_TO_COMPLETE
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-30 18:15:53 -07:00

72 lines
4.3 KiB
Markdown

# Gigafactory — Tasks to Complete
> Companion to `ROADMAP_COMPLETION_AUDIT.md`. Ordered by priority. Update checkboxes as work lands.
---
- [x] **Operator job actions — requeue / reject / cancel**
- Priority: P0 (highest-impact safe slice; completes Phase-3 §14 "approve/ship/reject/requeue")
- Current status: ✅ DONE — `operatorAction` + route + client + UI buttons + 8 tests; fleet 141 green
- Files involved:
- `services/platform-service/src/modules/fleet/coordinator.ts` (new `operatorAction`)
- `services/platform-service/src/modules/fleet/routes.ts` (new `POST /fleet/jobs/:id/actions/:action`)
- `services/platform-service/src/modules/fleet/coordinator.test.ts` (tests)
- `dashboards/tracker-web/src/lib/fleet-client.ts` (client fn)
- `dashboards/tracker-web/src/app/dashboard/fleet/jobs/[id]/page.tsx` (buttons)
- Implementation plan: operator action does NOT require a held lease; it bumps `leaseEpoch`
to fence any current holder (mirrors the reaper), preserves checkpoint, sets stage
(requeue→queued/blocked, reject→dead_letter, cancel→failed), appends an event.
- Acceptance criteria: requeue a building job → stage queued, epoch+1, zombie report fenced (409);
reject → dead_letter; cancel → failed; unknown action → 400; flag-independent; all prior tests green.
- Verification command: `pnpm --filter @lysnrai/platform-service exec vitest run src/modules/fleet`
- [ ] **Scoring explainability surfaced in UI**
- Priority: P1 (data already computed; Phase-3 §14)
- Current status: `ScoreBreakdown` computed in `scheduler.ts`, discarded after selection
- Files involved: `scheduler.ts`, `coordinator.ts`, `routes.ts`, `fleet-client.ts`, fleet job detail page
- Implementation plan: add `GET /fleet/jobs/:id/explain` returning the would-be score breakdown
against current factories; render a "why this routes here" panel.
- Acceptance criteria: endpoint returns per-factor contributions; UI shows them; degrade if absent.
- Verification command: `pnpm --filter @lysnrai/platform-service exec vitest run src/modules/fleet`
- [ ] **Cost burndown chart**
- Priority: P1
- Current status: budget page shows a spend bar only
- Files involved: `dashboards/tracker-web/src/app/dashboard/fleet/budget/page.tsx`, new client fn
- Implementation plan: aggregate run cost by day from events/runs; render burndown vs ceiling overlay.
- Acceptance criteria: per-day spend visible with ceiling line; empty state when no data.
- Verification command: `pnpm --filter @bytelyst/tracker-web test`
- [ ] **SSE live log streaming**
- Priority: P2 (larger; §17 single-stream contract)
- Current status: polling only
- Files involved: new streaming route in platform-service; EventSource consumer in job detail page
- Implementation plan: `GET /fleet/jobs/:id/events/stream` (SSE) emitting appended events;
UI subscribes via EventSource with polling fallback.
- Acceptance criteria: new events appear without refresh; reconnect + fallback work.
- Verification command: `pnpm --filter @lysnrai/platform-service test`
- [ ] **Fleet Playwright e2e**
- Priority: P2 (Phase-3 exit gate)
- Current status: none for fleet pages
- Files involved: `dashboards/tracker-web/e2e/fleet.spec.ts`
- Implementation plan: cover fleet map render, jobs table, job detail action, budget pause/resume
against a mocked fleet API.
- Acceptance criteria: e2e green in CI config.
- Verification command: `pnpm --filter @bytelyst/tracker-web exec playwright test fleet`
- [ ] **Phase-1 `budget.wall` enforcement** — P3 — `agent-queue.sh` — wall-clock ceiling extending timeout.
- [ ] **Node `dash` tag surfacing** — P3 — `dashboard.mjs` — profile/priority/caps/tracker-item link.
- [ ] **Roadmap §14 reconciliation** — P3 — tick Phase-2/3 boxes in `learning_ai_devops_tools`.
- [ ] **Fleet metrics + alerting** — P3 — queue depth, assign latency, utilization, reclaim counts (§17).
- [ ] **Multi-reviewer routing** — P3 — Phase-3 §14.
- [ ] **TUI re-point at `/fleet`** — P3 — Phase-3 §14.
### Phase 4 / 5 (post-MVP, tracked only)
- [ ] Message broker (NATS/Redis) push dispatch + backpressure
- [ ] Autoscaling hooks (ephemeral factories)
- [ ] Capability marketplace + cross-product fairness
- [ ] Load + chaos suite
- [ ] Outcome feature capture · offline eval harness · A/B weight tuning · recommendations