learning_ai_common_plat/docs/gigafactory-phase3-progress.md
Saravanakumar D 4468a69526 feat(fleet): tunable scoring weights + preemption (Phase 3 Slice 1)
- Add FleetWeightRegistry + resolveWeights() for per-product/per-request
  weight tunability with defaults fallback (backward compatible)
- Add selectPreemptionVictim() pure function: only critical jobs may
  trigger, never evicts equal/higher priority, picks lowest-priority victim
- Wire preemption into coordinator behind FLEET_PREEMPTION flag (default OFF)
- Seat-limit enforcement: at seatLimit factories skip normal selection and
  attempt preemption of lower-priority running jobs for critical newcomers
- Eviction preserves checkpoint, bumps leaseEpoch (fences zombie), requeues
- 18 new tests (pure scheduler + coordinator integration)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-30 09:49:23 -07:00


Gigafactory Phase 3 — Progress

| Slice | Name | Status | Commit | Verify Gate |
|-------|------|--------|--------|-------------|
| 1 | Tunable scoring weights + preemption | DONE | TBD | 119 fleet tests, full build, pnpm test |
| 2 | DAG job decomposition | WIP | | |
| 3 | Per-product budgets | pending | | |
| 4 | tracker-web Fleet Control Plane UI | pending | | |
| 5 | Docs + roadmap | pending | | |

Slice 1 — Tunable scoring weights + preemption

Key files:

  • services/platform-service/src/modules/fleet/scheduler.ts — added resolveWeights(), selectPreemptionVictim(), FleetWeightRegistry, RunningJobView
  • services/platform-service/src/modules/fleet/coordinator.ts — added isPreemptionEnabled(), setWeightRegistry(), seat-limit enforcement, preemption wiring
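
A minimal sketch of how the weight resolution in scheduler.ts might layer its sources, assuming a precedence of defaults, then per-product registry entry, then per-request override. The weight field names (`capability`, `load`, `locality`), the default values, the `Map`-based registry shape, and the parameter list are all hypothetical; only the names `resolveWeights` and `FleetWeightRegistry` come from this document.

```typescript
// Hypothetical weight fields: the real ones in scheduler.ts are not listed here.
interface FleetWeights {
  capability: number;
  load: number;
  locality: number;
}

// Assumed defaults, used whenever nothing overrides a field.
const DEFAULT_WEIGHTS: FleetWeights = { capability: 1, load: 1, locality: 1 };

// Assumed registry shape: product id -> partial override of the defaults.
type FleetWeightRegistry = Map<string, Partial<FleetWeights>>;

function resolveWeights(
  registry: FleetWeightRegistry,
  productId?: string,
  requestOverride?: Partial<FleetWeights>,
): FleetWeights {
  // Unknown or missing product ids contribute no override.
  const productOverride: Partial<FleetWeights> =
    productId !== undefined ? registry.get(productId) ?? {} : {};
  // Later spreads win: defaults < per-product < per-request.
  return { ...DEFAULT_WEIGHTS, ...productOverride, ...requestOverride };
}

// Usage: "product-a" raises the load weight, and one request also tweaks locality.
const exampleRegistry: FleetWeightRegistry = new Map<string, Partial<FleetWeights>>([
  ["product-a", { load: 3 }],
]);
const exampleWeights = resolveWeights(exampleRegistry, "product-a", { locality: 0.5 });
```

Called with an empty registry and no override, this sketch returns the defaults unchanged, which is one way to get the backward compatibility the slice describes.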

Flags: FLEET_PREEMPTION (default OFF; with the flag OFF, behavior is byte-for-byte identical to Phase 2)
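
One way a default-OFF flag check could look, as a sketch only. The document states that FLEET_PREEMPTION defaults to OFF but not how the value is parsed, so the accepted values (`1`, `true`, `on`) and the choice to pass the environment in as a parameter are assumptions, not the actual `isPreemptionEnabled()` in coordinator.ts.

```typescript
// Environment passed in explicitly so the check stays pure and easy to test.
type Env = Record<string, string | undefined>;

// Hypothetical parsing rule: only a small set of explicit values turns the flag ON.
function isPreemptionEnabled(env: Env): boolean {
  const raw = env["FLEET_PREEMPTION"];
  if (raw === undefined) {
    return false; // unset means OFF, preserving the pre-flag behavior
  }
  const value = raw.trim().toLowerCase();
  // Anything unrecognized also falls back to OFF.
  return value === "1" || value === "true" || value === "on";
}
```

Treating unrecognized values as OFF keeps a typo in deployment config from silently enabling eviction.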

Tests added: 18 (14 scheduler pure + 4 coordinator integration)

  • Weight resolution: defaults, partial override, per-request precedence, backward compat
  • Preemption pure: critical evicts lower, never evicts equal/higher, picks lowest victim, capability checks
  • Preemption integration: flag OFF no eviction, flag ON eviction + checkpoint preserved + zombie fenced + event
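
The preemption rules covered by these tests can be sketched as a pure function plus a small eviction step. This is an illustration under stated assumptions, not the code in scheduler.ts or coordinator.ts: the priority names other than `critical`, the field names on `RunningJobView`, the `IncomingJob` and `JobRecord` types, the capability check, and the `evict` and `isLeaseCurrent` helpers are all hypothetical.

```typescript
// Assumed priority ladder; only "critical" is named in this document.
type Priority = "low" | "normal" | "high" | "critical";
const PRIORITY_RANK: Record<Priority, number> = { low: 0, normal: 1, high: 2, critical: 3 };

// Hypothetical view of a running job and the seat it occupies.
interface RunningJobView {
  jobId: string;
  priority: Priority;
  capabilities: string[]; // capabilities of the occupied seat (assumed)
}

interface IncomingJob {
  jobId: string;
  priority: Priority;
  requiredCapabilities: string[];
}

function selectPreemptionVictim(
  incoming: IncomingJob,
  running: RunningJobView[],
): RunningJobView | null {
  // Only critical jobs may trigger preemption.
  if (incoming.priority !== "critical") return null;
  let victim: RunningJobView | null = null;
  for (const job of running) {
    // Never evict a job of equal or higher priority.
    if (PRIORITY_RANK[job.priority] >= PRIORITY_RANK[incoming.priority]) continue;
    // Assumed capability check: the freed seat must satisfy the newcomer.
    const seatFits = incoming.requiredCapabilities.every(
      (cap) => job.capabilities.indexOf(cap) !== -1,
    );
    if (!seatFits) continue;
    // Keep the lowest-priority candidate seen so far.
    if (victim === null || PRIORITY_RANK[job.priority] < PRIORITY_RANK[victim.priority]) {
      victim = job;
    }
  }
  return victim;
}

// Hypothetical persisted job record.
interface JobRecord {
  jobId: string;
  status: "queued" | "running";
  leaseEpoch: number;
  checkpoint?: string;
}

// Eviction requeues the job, keeps its checkpoint, and bumps the lease epoch.
function evict(job: JobRecord): JobRecord {
  return { ...job, status: "queued", leaseEpoch: job.leaseEpoch + 1 };
}

// A worker still holding the old epoch (a zombie) no longer matches and is rejected.
function isLeaseCurrent(job: JobRecord, workerEpoch: number): boolean {
  return workerEpoch === job.leaseEpoch;
}
```

Bumping the epoch on eviction means the evicted worker needs no explicit kill signal: any write it attempts with the stale epoch fails the `isLeaseCurrent` check.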

Verify gate: pnpm --filter @lysnrai/platform-service exec vitest run src/modules/fleet → 119/119; pnpm build && pnpm test → all green

Follow-ups

  • Weight registry could be loaded from Cosmos (per-product config doc) in a later phase
  • Seat-limit enforcement is currently tied to the FLEET_PREEMPTION flag; it could be decoupled later