learning_ai_common_plat/docs/gigafactory-phase3-progress.md
Saravanakumar D 4468a69526 feat(fleet): tunable scoring weights + preemption (Phase 3 Slice 1)
- Add FleetWeightRegistry + resolveWeights() for per-product/per-request
  weight tunability with defaults fallback (backward compatible)
- Add selectPreemptionVictim() pure function: only critical jobs may
  trigger, never evicts equal/higher priority, picks lowest-priority victim
- Wire preemption into coordinator behind FLEET_PREEMPTION flag (default OFF)
- Seat-limit enforcement: at seatLimit factories skip normal selection and
  attempt preemption of lower-priority running jobs for critical newcomers
- Eviction preserves checkpoint, bumps leaseEpoch (fences zombie), requeues
- 18 new tests (pure scheduler + coordinator integration)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-30 09:49:23 -07:00

# Gigafactory Phase 3 — Progress
| Slice | Name | Status | Commit | Verify Gate |
| ----- | ------------------------------------ | ------- | ------ | ----------------------------------------------- |
| 1 | Tunable scoring weights + preemption | DONE | TBD | 119 fleet tests ✅, full build ✅, pnpm test ✅ |
| 2 | DAG job decomposition | WIP | — | — |
| 3 | Per-product budgets | pending | — | — |
| 4 | tracker-web Fleet Control Plane UI | pending | — | — |
| 5 | Docs + roadmap | pending | — | — |
## Slice 1 — Tunable scoring weights + preemption
**Key files:**
- `services/platform-service/src/modules/fleet/scheduler.ts` — added `resolveWeights()`, `selectPreemptionVictim()`, `FleetWeightRegistry`, `RunningJobView`
- `services/platform-service/src/modules/fleet/coordinator.ts` — added `isPreemptionEnabled()`, `setWeightRegistry()`, seat-limit enforcement, preemption wiring
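
As a rough illustration of the weight resolution that `resolveWeights()` is described as performing (defaults first, then per-product overrides, then per-request overrides), a minimal sketch might look like the following. The weight field names (`capability`, `load`, `locality`), the default values, the `Map`-based registry shape, and the function signature are all assumptions made for illustration; the actual definitions in `scheduler.ts` are not shown in this document.

```typescript
// Hypothetical weight shape: the real scoring dimensions are not listed in this document.
interface FleetWeights {
  capability: number;
  load: number;
  locality: number;
}

// Assumed defaults, standing in for the Phase 2 behavior.
const DEFAULT_WEIGHTS: FleetWeights = { capability: 1, load: 1, locality: 1 };

// Hypothetical registry: partial per-product overrides keyed by product id.
type FleetWeightRegistry = Map<string, Partial<FleetWeights>>;

function resolveWeights(
  registry: FleetWeightRegistry | undefined,
  productId: string | undefined,
  requestOverride?: Partial<FleetWeights>,
): FleetWeights {
  const productOverride =
    (productId !== undefined ? registry?.get(productId) : undefined) ?? {};
  // Later spreads win, so precedence is: defaults < per-product < per-request.
  return { ...DEFAULT_WEIGHTS, ...productOverride, ...(requestOverride ?? {}) };
}
```

Because missing keys fall through to the defaults, a caller that passes no registry and no override gets the default weights unchanged, which is one way such a change can stay backward compatible.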

**Flags:** `FLEET_PREEMPTION` (default OFF; with the flag off, behavior is byte-for-byte identical to Phase 2)

**Tests added:** 18 (14 pure scheduler + 4 coordinator integration)
- Weight resolution: defaults, partial override, per-request precedence, backward compatibility
- Preemption (pure): a critical job evicts a lower-priority one, never evicts equal or higher priority, picks the lowest-priority victim, respects capability checks
- Preemption (integration): with the flag OFF there is no eviction; with the flag ON, eviction occurs, the checkpoint is preserved, the zombie is fenced, and an event is emitted
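
The pure preemption rules listed above (only critical jobs trigger, equal or higher priority is never evicted, the lowest-priority candidate is chosen) can be sketched roughly as follows. The priority scale, the fields on `RunningJobView`, the `IncomingJob` type, and the form of the capability check are hypothetical; only the function name and the rules themselves come from this document.

```typescript
// Assumed priority scale: only "critical" is named in this document.
type Priority = "low" | "normal" | "high" | "critical";
const PRIORITY_RANK: Record<Priority, number> = { low: 0, normal: 1, high: 2, critical: 3 };

// Hypothetical view of a running job and the seat it occupies.
interface RunningJobView {
  jobId: string;
  priority: Priority;
  factoryCapabilities: string[];
}

// Hypothetical shape of the job asking for a seat.
interface IncomingJob {
  priority: Priority;
  requiredCapabilities: string[];
}

function selectPreemptionVictim(
  incoming: IncomingJob,
  running: readonly RunningJobView[],
): RunningJobView | null {
  // Only critical newcomers may trigger preemption.
  if (incoming.priority !== "critical") return null;
  const incomingRank = PRIORITY_RANK[incoming.priority];

  let victim: RunningJobView | null = null;
  for (const job of running) {
    // Never evict a job of equal or higher priority.
    if (PRIORITY_RANK[job.priority] >= incomingRank) continue;
    // Assumed capability check: the freed seat must be able to run the newcomer.
    const seatFits = incoming.requiredCapabilities.every((c) =>
      job.factoryCapabilities.includes(c),
    );
    if (!seatFits) continue;
    // Keep the lowest-priority candidate; the first one seen wins ties.
    if (victim === null || PRIORITY_RANK[job.priority] < PRIORITY_RANK[victim.priority]) {
      victim = job;
    }
  }
  return victim;
}
```

Keeping the function pure (no I/O, no mutation of its inputs) is what lets these rules be covered by plain unit tests, separate from the coordinator integration tests.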

**Verify gate:** `pnpm --filter @lysnrai/platform-service exec vitest run src/modules/fleet` → 119/119 ✅; `pnpm build && pnpm test` → all green
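
For the eviction path exercised by the integration tests (checkpoint preserved, `leaseEpoch` bumped so a zombie worker is fenced, job requeued), a simplified sketch under assumed types could look like this. `JobRecord`, `evictJob()`, and `isLeaseCurrent()` are hypothetical names; the coordinator's real data model is not shown in this document.

```typescript
// Hypothetical job record; field names other than leaseEpoch are assumptions.
interface JobRecord {
  jobId: string;
  status: "queued" | "running";
  leaseEpoch: number;
  checkpoint: string | null; // opaque reference to saved progress
  factoryId: string | null;  // seat currently holding the job, if any
}

// Returns the requeued form of an evicted job without mutating the input.
function evictJob(job: JobRecord): JobRecord {
  return {
    jobId: job.jobId,
    status: "queued",
    // Bumping the epoch invalidates the lease held by the old worker.
    leaseEpoch: job.leaseEpoch + 1,
    // The checkpoint is carried over so the job can resume later.
    checkpoint: job.checkpoint,
    // The seat is released for the critical newcomer.
    factoryId: null,
  };
}

// A worker's writes are accepted only if it still holds the current epoch.
function isLeaseCurrent(job: JobRecord, workerEpoch: number): boolean {
  return job.leaseEpoch === workerEpoch;
}
```

In this sketch, a worker that still believes it owns the job at the old epoch fails the `isLeaseCurrent()` check after eviction, so its late writes can be rejected.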
## Follow-ups
- The weight registry could be loaded from Cosmos (per-product config doc) in a later phase
- Seat-limit enforcement is tied to the `FLEET_PREEMPTION` flag; it could be decoupled later