learning_ai_common_plat/docs/TASKS_TO_COMPLETE.md
Saravanakumar D 0f903b935a audit: document current Gigafactory completion state
- ROADMAP_COMPLETION_AUDIT.md: verified state vs GIGAFACTORY_ROADMAP source of truth
- TASKS_TO_COMPLETE.md: prioritized remaining work with acceptance criteria
- Key finding: roadmap §0 tracker is stale (P2 ~95%, P3 ~70% actual vs 80%/0% claimed)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-30 18:06:33 -07:00


Gigafactory — Tasks to Complete

Companion to ROADMAP_COMPLETION_AUDIT.md. Ordered by priority. Update checkboxes as work lands.


  • Operator job actions — requeue / reject / cancel

    • Priority: P0 (highest-impact safe slice; completes Phase-3 §14 "approve/ship/reject/requeue")
    • Current status: SHIP exists; requeue/reject/cancel missing
    • Files involved:
      • services/platform-service/src/modules/fleet/coordinator.ts (new operatorAction)
      • services/platform-service/src/modules/fleet/routes.ts (new POST /fleet/jobs/:id/actions/:action)
      • services/platform-service/src/modules/fleet/coordinator.test.ts (tests)
      • dashboards/tracker-web/src/lib/fleet-client.ts (client fn)
      • dashboards/tracker-web/src/app/dashboard/fleet/jobs/[id]/page.tsx (buttons)
    • Implementation plan: an operator action does NOT require a held lease. It bumps leaseEpoch to fence any current holder (mirroring the reaper), preserves the checkpoint, sets the stage (requeue→queued/blocked, reject→dead_letter, cancel→failed), and appends an event.
    • Acceptance criteria: requeue a building job → stage queued, epoch+1, zombie report fenced (409); reject → dead_letter; cancel → failed; unknown action → 400; flag-independent; all prior tests green.
    • Verification command: pnpm --filter @lysnrai/platform-service exec vitest run src/modules/fleet
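The operator-action plan above can be sketched as a pure state transition. This is a minimal illustration, not the coordinator's actual code: the `Job`, `JobEvent`, and `Result` shapes, the `leaseHolder` field, the `operator_*` event names, and the `acceptReport` helper are all hypothetical. The plan lists requeue→queued/blocked without stating the rule that picks between them, so the sketch only covers the `queued` case.

```typescript
// Hypothetical shapes; the real types in coordinator.ts may differ.
type Stage = "queued" | "blocked" | "building" | "dead_letter" | "failed";
type OperatorAction = "requeue" | "reject" | "cancel";

interface JobEvent { type: string; at: string }
interface Job {
  id: string;
  stage: Stage;
  leaseEpoch: number;
  leaseHolder: string | null;
  checkpoint: unknown;
  events: JobEvent[];
}

type Result =
  | { status: 200; job: Job }
  | { status: 400 | 409; error: string };

// Target stage per action (the "blocked" variant of requeue is omitted here).
const TARGET_STAGE: Record<OperatorAction, Stage> = {
  requeue: "queued",
  reject: "dead_letter",
  cancel: "failed",
};

function isOperatorAction(value: string): value is OperatorAction {
  return Object.prototype.hasOwnProperty.call(TARGET_STAGE, value);
}

// Applies an operator action without requiring a held lease.
// Returns a new job object; the input is not mutated.
function operatorAction(job: Job, action: string, now: Date = new Date()): Result {
  if (!isOperatorAction(action)) {
    return { status: 400, error: `unknown action: ${action}` };
  }
  const next: Job = {
    ...job, // checkpoint is carried over unchanged
    stage: TARGET_STAGE[action],
    leaseEpoch: job.leaseEpoch + 1, // fences whoever held the old epoch
    leaseHolder: null,
    events: [...job.events, { type: `operator_${action}`, at: now.toISOString() }],
  };
  return { status: 200, job: next };
}

// A report carrying a stale epoch (a "zombie" holder) is rejected with 409.
function acceptReport(job: Job, reportEpoch: number): Result {
  if (reportEpoch !== job.leaseEpoch) {
    return { status: 409, error: "stale lease epoch" };
  }
  return { status: 200, job };
}
```

Returning status codes from the transition keeps the route handler thin: under these assumptions it would only parse `:id` and `:action`, call `operatorAction`, and map the result onto the HTTP response.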
  • Scoring explainability surfaced in UI

    • Priority: P1 (data already computed; Phase-3 §14)
    • Current status: ScoreBreakdown computed in scheduler.ts, discarded after selection
    • Files involved: scheduler.ts, coordinator.ts, routes.ts, fleet-client.ts, fleet job detail page
    • Implementation plan: add GET /fleet/jobs/:id/explain returning the would-be score breakdown against current factories; render a "why this routes here" panel.
    • Acceptance criteria: endpoint returns per-factor contributions; UI shows them; UI degrades gracefully when the breakdown is absent.
    • Verification command: pnpm --filter @lysnrai/platform-service exec vitest run src/modules/fleet
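One possible shape for the explain response, sketched under loud assumptions: the real ScoreBreakdown in scheduler.ts is not shown in this document, so the linear weighted sum, the factor names, and every type below (`FactorySignals`, `FactorContribution`, `ScoreBreakdownSketch`, `explainRouting`) are hypothetical stand-ins that only illustrate "per-factor contributions".

```typescript
// Hypothetical types; not the ScoreBreakdown defined in scheduler.ts.
interface FactorySignals {
  id: string;
  signals: Record<string, number>;
}

interface FactorContribution {
  factor: string;
  weight: number;
  value: number;
  contribution: number; // weight * value
}

interface ScoreBreakdownSketch {
  factoryId: string;
  total: number;
  factors: FactorContribution[];
}

// Scores every current factory and keeps the per-factor terms instead of
// discarding them after selection. Assumes a linear weighted sum.
function explainRouting(
  weights: Record<string, number>,
  factories: FactorySignals[],
): ScoreBreakdownSketch[] {
  const breakdowns = factories.map((factory) => {
    const factors = Object.entries(weights).map(([factor, weight]) => {
      const value = factory.signals[factor] ?? 0; // a missing signal contributes nothing
      return { factor, weight, value, contribution: weight * value };
    });
    const total = factors.reduce((sum, f) => sum + f.contribution, 0);
    return { factoryId: factory.id, total, factors };
  });
  // Best candidate first, so the UI can label it as the would-be route.
  return breakdowns.sort((a, b) => b.total - a.total);
}
```

With no factories the function returns an empty array, which gives the UI panel a natural absent state to render.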
  • Cost burndown chart

    • Priority: P1
    • Current status: budget page shows a spend bar only
    • Files involved: dashboards/tracker-web/src/app/dashboard/fleet/budget/page.tsx, new client fn
    • Implementation plan: aggregate run cost by day from events/runs; render the burndown with the budget ceiling overlaid.
    • Acceptance criteria: per-day spend visible with ceiling line; empty state when no data.
    • Verification command: pnpm --filter @bytelyst/tracker-web test
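The per-day aggregation in the burndown plan could look roughly like the sketch below. The `RunCost` record, its `finishedAt` and `cost` fields, and the `BurndownPoint` shape are assumptions for illustration; the actual run and event schema is not described here. Days are bucketed in UTC.

```typescript
// Hypothetical input and output shapes for the chart data.
interface RunCost {
  finishedAt: string; // ISO 8601 timestamp
  cost: number;
}

interface BurndownPoint {
  day: string; // YYYY-MM-DD (UTC)
  spent: number; // spend on that day
  cumulative: number; // running total up to and including that day
  remaining: number; // ceiling - cumulative; negative means overspend
}

function burndown(runs: RunCost[], ceiling: number): BurndownPoint[] {
  // Sum cost per UTC calendar day.
  const perDay = new Map<string, number>();
  for (const run of runs) {
    const day = new Date(run.finishedAt).toISOString().slice(0, 10);
    perDay.set(day, (perDay.get(day) ?? 0) + run.cost);
  }

  // YYYY-MM-DD strings sort chronologically under plain string ordering.
  const days = Array.from(perDay.keys()).sort();

  let cumulative = 0;
  return days.map((day) => {
    const spent = perDay.get(day) ?? 0;
    cumulative += spent;
    return { day, spent, cumulative, remaining: ceiling - cumulative };
  });
}
```

An empty input yields an empty array, which maps directly onto the empty state in the acceptance criteria. Days with no runs are simply absent, so a chart that needs a continuous axis would have to fill the gaps itself.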
  • SSE live log streaming

    • Priority: P2 (larger; §17 single-stream contract)
    • Current status: polling only
    • Files involved: new streaming route in platform-service; EventSource consumer in job detail page
    • Implementation plan: GET /fleet/jobs/:id/events/stream (SSE) emitting appended events; UI subscribes via EventSource with polling fallback.
    • Acceptance criteria: new events appear without refresh; reconnect + fallback work.
    • Verification command: pnpm --filter @lysnrai/platform-service test
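For the SSE item above, the server side mostly comes down to framing appended events and replaying missed ones on reconnect. The sketch below assumes each event carries a monotonically increasing `seq` (a hypothetical field, as are `FleetEvent`, `formatSseFrame`, and `framesSince`); it relies on the standard behavior that a reconnecting EventSource sends the last seen `id` back in the `Last-Event-ID` request header.

```typescript
// Hypothetical event shape; the real fleet event record may differ.
interface FleetEvent {
  seq: number; // assumed monotonically increasing per job
  type: string;
  payload: unknown;
}

// Serializes one event as an SSE frame: id, event, and data fields,
// terminated by a blank line. JSON.stringify output contains no raw
// newlines, so a single data: line is enough.
function formatSseFrame(event: FleetEvent): string {
  const data = JSON.stringify(event.payload ?? null);
  return `id: ${event.seq}\nevent: ${event.type}\ndata: ${data}\n\n`;
}

// Builds the replay for a (re)connecting client. A missing or
// non-numeric Last-Event-ID replays everything.
function framesSince(events: FleetEvent[], lastEventId: string | undefined): string {
  const parsed = lastEventId ? Number(lastEventId) : NaN;
  const cursor = Number.isFinite(parsed) ? parsed : -Infinity;
  return events
    .filter((event) => event.seq > cursor)
    .map(formatSseFrame)
    .join("");
}
```

Because the frames carry a named `event:` field, a browser client would subscribe with `addEventListener` for each event type rather than `onmessage`. The polling fallback can reuse the same cursor by remembering the highest `seq` it has rendered.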
  • Fleet Playwright e2e

    • Priority: P2 (Phase-3 exit gate)
    • Current status: none for fleet pages
    • Files involved: dashboards/tracker-web/e2e/fleet.spec.ts
    • Implementation plan: cover fleet map render, jobs table, job detail action, budget pause/resume against a mocked fleet API.
    • Acceptance criteria: the fleet e2e suite passes under the CI configuration.
    • Verification command: pnpm --filter @bytelyst/tracker-web exec playwright test fleet
  • Phase-1 budget.wall enforcement — P3 — agent-queue.sh — wall-clock ceiling extending timeout.

  • Node dash tag surfacing — P3 — dashboard.mjs — profile/priority/caps/tracker-item link.

  • Roadmap §14 reconciliation — P3 — tick Phase-2/3 boxes in learning_ai_devops_tools.

  • Fleet metrics + alerting — P3 — queue depth, assign latency, utilization, reclaim counts (§17).

  • Multi-reviewer routing — P3 — Phase-3 §14.

  • TUI re-point at /fleet — P3 — Phase-3 §14.

Phase 4 / 5 (post-MVP, tracked only)

  • Message broker (NATS/Redis) push dispatch + backpressure
  • Autoscaling hooks (ephemeral factories)
  • Capability marketplace + cross-product fairness
  • Load + chaos suite
  • Outcome feature capture · offline eval harness · A/B weight tuning · recommendations