- Created docs/FLEET_CONTROL_PLANE.md, a full operational guide covering:
  - Feature flags (FLEET_PREEMPTION, FLEET_BUDGETS)
  - Tunable scoring weights + resolution order
  - Preemption rules and behavior
  - DAG job decomposition API
  - Per-product budgets with auto-pause
  - Fleet Control Plane UI pages and configuration
  - API reference summary
  - Architecture decisions
- Updated docs/gigafactory-phase3-progress.md: all 5 slices DONE with commit SHAs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Gigafactory Phase 3 — Progress
| Slice | Name | Status | Commit | Verify Gate |
|---|---|---|---|---|
| 1 | Tunable scoring weights + preemption | DONE | 4a209e23 | 119 fleet tests ✅, full build ✅, pnpm test ✅ |
| 2 | DAG job decomposition | DONE | 26606c85 | 127 fleet tests ✅, full build ✅, pnpm test ✅ |
| 3 | Per-product budgets | DONE | fd1b18d7 | 134 fleet tests ✅, full build ✅, pnpm test ✅ |
| 4 | tracker-web Fleet Control Plane UI | DONE | 39ade652 | 198 tracker-web tests ✅, full build ✅ |
| 5 | Docs + roadmap | DONE | (this) | — |
Slice 1 — Tunable scoring weights + preemption
Key files:
- services/platform-service/src/modules/fleet/scheduler.ts: added resolveWeights(), selectPreemptionVictim(), FleetWeightRegistry, RunningJobView
- services/platform-service/src/modules/fleet/coordinator.ts: added isPreemptionEnabled(), setWeightRegistry(), seat-limit enforcement, preemption wiring
Flags: FLEET_PREEMPTION (default OFF = byte-for-byte Phase 2 behavior)
Tests added: 18 (14 scheduler pure + 4 coordinator integration)
- Weight resolution: defaults, partial override, per-request precedence, backward compat
- Preemption pure: critical evicts lower, never evicts equal/higher, picks lowest victim, capability checks
- Preemption integration: flag OFF no eviction, flag ON eviction + checkpoint preserved + zombie fenced + event
Verify gate: pnpm --filter @lysnrai/platform-service exec vitest run src/modules/fleet → 119/119 ✅; pnpm build && pnpm test → all green
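The weight resolution order and the victim-selection rules covered by these tests can be sketched as follows. This is a minimal illustration, not the code in scheduler.ts: the weight fields (priority, age, cost), the numeric priority scale, the precedence of the registry between defaults and per-request values, and the names resolveWeightsSketch and pickVictimSketch are all assumptions made for the example. The sketch also does not model any restriction on which priority classes are allowed to preempt.

```typescript
// Hypothetical weight shape; the real fields in scheduler.ts may differ.
interface Weights {
  priority: number;
  age: number;
  cost: number;
}

const DEFAULT_WEIGHTS: Weights = { priority: 1, age: 0.5, cost: 0.25 };

// Assumed resolution order: defaults < registry entry < per-request override.
// Later spreads win, so a partial override leaves the other fields untouched.
function resolveWeightsSketch(
  registry: Partial<Weights> = {},
  perRequest: Partial<Weights> = {},
): Weights {
  return { ...DEFAULT_WEIGHTS, ...registry, ...perRequest };
}

// Hypothetical view of a running job; higher number means more important.
interface RunningJob {
  id: string;
  priority: number;
  capabilities: string[]; // capabilities of the seat the job occupies
}

interface IncomingJob {
  priority: number;
  requiredCapabilities: string[];
}

// Returns the lowest-priority running job that is strictly below the incoming
// job and whose seat satisfies the incoming job's capabilities, or null.
function pickVictimSketch(
  incoming: IncomingJob,
  running: RunningJob[],
): RunningJob | null {
  let victim: RunningJob | null = null;
  for (const job of running) {
    // Never evict a job of equal or higher priority.
    if (job.priority >= incoming.priority) continue;
    // Skip seats that could not run the incoming job anyway.
    const capable = incoming.requiredCapabilities.every(
      (cap) => job.capabilities.indexOf(cap) !== -1,
    );
    if (!capable) continue;
    if (victim === null || job.priority < victim.priority) victim = job;
  }
  return victim;
}
```

Returning null when no candidate qualifies gives the caller a cheap "do nothing" path in this sketch; whether the real scheduler signals that case the same way is not stated in this document.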
Slice 2 — DAG job decomposition
Key files:
- services/platform-service/src/modules/fleet/types.ts: SubmitChildrenSchema, added children[] to SubmitJobSchema
- services/platform-service/src/modules/fleet/repository.ts: listChildrenByParent()
- services/platform-service/src/modules/fleet/coordinator.ts: maybeUnblockParent(), submitChildren(), getDagSubtree()
- services/platform-service/src/modules/fleet/routes.ts: POST /fleet/jobs/:id/children, GET /fleet/jobs/:id/dag
Design: Children's idempotency keys are added to the parent's deps[], so the existing unmetDeps()/stageForDeps() logic handles blocking and unblocking. Atomic fan-out is available via submitJob() with a children[] array.
Tests added: 8 (DAG fan-out submit, child unblock parent, subtree retrieval)
Verify gate: 127/127 fleet tests ✅; full build + test green
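The design note for this slice (child idempotency keys appended to the parent's deps[], with the existing dependency check doing the blocking) can be illustrated with a small in-memory model. Everything here is hypothetical: the JobSketch shape, the boolean done flag standing in for a terminal stage, the de-duplication of repeated keys, and the function names fanOutSketch and unmetDepsSketch are assumptions for the example and are not taken from coordinator.ts.

```typescript
// Hypothetical, simplified job record.
interface JobSketch {
  idempotencyKey: string;
  deps: string[]; // idempotency keys this job is waiting on
  done: boolean;  // stand-in for "reached a terminal stage"
}

type Store = Record<string, JobSketch>;

// Registers each child and appends its key to the parent's deps[].
// Re-submitting the same child key adds no duplicate (an assumption).
function fanOutSketch(store: Store, parentKey: string, childKeys: string[]): void {
  const parent = store[parentKey];
  if (!parent) throw new Error(`unknown parent ${parentKey}`);
  for (const key of childKeys) {
    if (!store[key]) {
      store[key] = { idempotencyKey: key, deps: [], done: false };
    }
    if (parent.deps.indexOf(key) === -1) parent.deps.push(key);
  }
}

// A job is blocked while any dependency is missing or not yet done.
function unmetDepsSketch(store: Store, job: JobSketch): string[] {
  return job.deps.filter((key) => store[key]?.done !== true);
}
```

Because the parent is unblocked by the same dependency check used for ordinary deps, this model needs no separate "waiting on children" state.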
Slice 3 — Per-product budgets
Key files:
- services/platform-service/src/modules/fleet/types.ts: FleetBudgetDoc, UpsertBudgetSchema
- services/platform-service/src/modules/fleet/repository.ts: budget CRUD (getBudget, upsertBudget, updateBudget)
- services/platform-service/src/modules/fleet/coordinator.ts: isBudgetsEnabled(), budget enforcement in claimNextJob, accrueSpend() with auto-pause
- services/platform-service/src/modules/fleet/routes.ts: GET/PUT /fleet/budgets/:productId, POST pause/resume
Flags: FLEET_BUDGETS (default OFF)
Design: The budget is checked first in the claim loop: if the budget is paused or its ceiling is exceeded, the claim returns null immediately without scanning jobs. accrueSpend() auto-pauses the budget once the ceiling is reached.
Tests added: 7
Verify gate: 134/134 fleet tests ✅; full build + test green
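The budget gate and the auto-pause rule described in the design note can be sketched as two small pure functions. The BudgetSketch fields, the function names, the treatment of a missing budget document as "no limit", and the use of >= for both the ceiling check and the auto-pause trigger are assumptions for illustration; the real FleetBudgetDoc and accrueSpend() may differ.

```typescript
// Hypothetical, simplified budget document.
interface BudgetSketch {
  ceiling: number; // maximum allowed spend
  spent: number;   // spend accrued so far
  paused: boolean;
}

// True when a claim should return null before any job scan.
function claimBlockedSketch(budget: BudgetSketch | null): boolean {
  // Assumption: a product without a budget document is not limited.
  if (budget === null) return false;
  return budget.paused || budget.spent >= budget.ceiling;
}

// Adds spend and pauses the budget once the ceiling is reached.
// Returns a new object rather than mutating the input.
function accrueSpendSketch(budget: BudgetSketch, amount: number): BudgetSketch {
  const spent = budget.spent + amount;
  return {
    ...budget,
    spent,
    paused: budget.paused || spent >= budget.ceiling,
  };
}
```

Running the gate before the job scan means a paused product costs one budget lookup per claim attempt in this model, rather than a full scan that is then discarded.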
Slice 4 — tracker-web Fleet Control Plane UI
Key files:
- dashboards/tracker-web/src/lib/fleet-client.ts: Typed API client with graceful 404 → null degradation
- dashboards/tracker-web/src/app/api/fleet/[...path]/route.ts: Proxy route to platform-service
- dashboards/tracker-web/src/app/dashboard/fleet/page.tsx: Fleet overview (factory cards + recent jobs)
- dashboards/tracker-web/src/app/dashboard/fleet/jobs/page.tsx: Job table with stage filter tabs
- dashboards/tracker-web/src/app/dashboard/fleet/jobs/[id]/page.tsx: Job detail (events timeline, runs, artifacts, DAG, SHIP action)
- dashboards/tracker-web/src/app/dashboard/fleet/budget/page.tsx: Budget panel (ceiling/spent bar, pause/resume)
- dashboards/tracker-web/src/app/dashboard/layout.tsx: Added "Fleet" nav item
UI degrades gracefully: If the platform-service fleet module returns 404 (feature flags off), the pages show informational empty states.
Tests added: 16 (fleet-client unit tests, 198 total tracker-web)
Verify gate: 198 tracker-web tests ✅; full build green
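The "404 → null" degradation can be pictured as a status-handling rule applied to every fleet response. The sketch below isolates that rule as a pure function so it does not depend on any particular fetch implementation. The ResponseLike shape, the function names unwrapFleetResponse and describeJobs, and the empty-state wording are hypothetical and not taken from fleet-client.ts.

```typescript
// Minimal response shape for the sketch (hypothetical).
interface ResponseLike<T> {
  status: number;
  body?: T;
}

// 404 means "fleet module not available" and maps to null.
// Any other non-2xx status is treated as a real error.
function unwrapFleetResponse<T>(res: ResponseLike<T>): T | null {
  if (res.status === 404) return null;
  if (res.status < 200 || res.status >= 300) {
    throw new Error(`fleet request failed with status ${res.status}`);
  }
  if (res.body === undefined) {
    throw new Error("fleet response had no body");
  }
  return res.body;
}

// A page can then branch on null to render an informational empty state.
function describeJobs(jobs: string[] | null): string {
  if (jobs === null) return "Fleet module is not enabled.";
  return `${jobs.length} job(s)`;
}
```

Keeping 404 distinct from other failures lets a page tell "feature turned off" apart from "service is broken", which is what makes an empty state appropriate in the first case only.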
Slice 5 — Docs + roadmap
See docs/FLEET_CONTROL_PLANE.md for the operational guide.
Follow-ups
- Weight registry could be loaded from Cosmos (per-product config doc) in a later phase
- Seat limit enforcement is tied to FLEET_PREEMPTION flag; could be decoupled later
- E2E Playwright tests for fleet UI (pending Playwright setup in CI)
- Budget history/audit log endpoint
- Real-time WebSocket updates for job stage transitions in the UI