bytelyst-devops-tools/agent-queue/docs/jobs/phase2-scheduler.md

---
engine: devin
cwd: /Users/sd9235/code/mygh/learning_ai_common_plat
yolo: true
lock: common-plat-scheduler
timeout: 4h
---

ROLE: Senior backend engineer. Implement the PHASE 2 SCHEDULER / ROUTER CORE (§7)
for the fleet coordinator: a deterministic, fixed-weight scoring engine that picks
WHICH job a claiming factory gets, and wire it into the atomic claim.

PARALLEL-SAFETY (two other Devins are running — DO NOT collide):
- You OWN: services/platform-service/src/modules/fleet/scheduler.ts (NEW),
  scheduler.test.ts (NEW), and the candidate-ranking section of coordinator.ts +
  coordinator.test.ts.
- You MUST NOT touch: types.ts, repository.ts, routes.ts, cosmos-init.ts, server.ts
  (another Devin is editing those for fleet_artifacts). If you need a new type, define
  it inside scheduler.ts. If the wiring seems to require a types.ts change, add the type
  in scheduler.ts and export it from there instead. Import the existing
  FleetJobDoc/FleetFactoryDoc from types.ts (read-only).
- A third Devin is in a different repo (agent-queue) — no overlap.

READ FIRST:
- services/platform-service/src/modules/fleet/coordinator.ts — claimNextJob /
  tryClaimJob: today it selects "highest-priority, oldest, deps-satisfied, capability-
  subset". You will replace the SELECTION step with the scoring engine (keep the atomic
  tryClaimJob CAS exactly as-is).
- types.ts (read-only) — FleetJobDoc (priority, capabilities, budget, createdAt, deps,
  stage), FleetFactoryDoc (capabilities, health, load, seatLimit).
- ../learning_ai_devops_tools/agent-queue/docs/GIGAFACTORY_ROADMAP.md §7 (the formula
  + tie-breaks + phasing note: Phase 2 = fixed weights; Phase 3 = tunable + preemption).
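
For orientation, the split between SELECTION (replaceable) and the atomic claim (untouched) can be sketched as below. This is a minimal in-memory illustration, not the coordinator's code: `JobRow`, `InMemoryJobs`, `claimNext`, and the retry-on-lost-race loop are hypothetical stand-ins, and the real tryClaimJob is presumably async and also handles lease/fence fields. Only the idea of a rev-guarded `updateIfMatch` comes from this spec.

```typescript
// Hypothetical row shape; the real FleetJobDoc in types.ts has more fields.
interface JobRow {
  id: string;
  rev: number;
  status: "queued" | "claimed";
  claimedBy?: string;
}

// Hypothetical in-memory stand-in for a rev-guarded repository update.
class InMemoryJobs {
  private readonly rows = new Map<string, JobRow>();

  constructor(seed: readonly JobRow[]) {
    for (const row of seed) this.rows.set(row.id, { ...row });
  }

  get(id: string): JobRow | undefined {
    const row = this.rows.get(id);
    return row ? { ...row } : undefined;
  }

  // Compare-and-swap: apply the patch only if the stored rev still matches.
  updateIfMatch(id: string, expectedRev: number, patch: Partial<JobRow>): boolean {
    const current = this.rows.get(id);
    if (!current || current.rev !== expectedRev) return false;
    this.rows.set(id, { ...current, ...patch, rev: current.rev + 1 });
    return true;
  }
}

// The selector is a pure function injected into the claim loop, so swapping
// the selection strategy never touches the CAS step.
function claimNext(
  store: InMemoryJobs,
  snapshot: readonly JobRow[],
  factoryId: string,
  select: (candidates: readonly JobRow[]) => JobRow | null,
): JobRow | null {
  let remaining = snapshot.filter((job) => job.status === "queued");
  for (;;) {
    const pick = select(remaining);
    if (pick === null) return null;
    const won = store.updateIfMatch(pick.id, pick.rev, {
      status: "claimed",
      claimedBy: factoryId,
    });
    if (won) return store.get(pick.id) ?? null;
    // Assumption: on a lost race, drop the pick and try the next-best candidate.
    remaining = remaining.filter((job) => job.id !== pick.id);
  }
}
```

Two factories working from the same stale snapshot never claim the same job here: the second one fails the rev check and falls through to its next candidate.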

PREREQUISITE / BRANCHING: branch off CURRENT main → feat/gigafactory-p2-scheduler.
Push + open PR. DO NOT merge.

DELIVERABLES
1. scheduler.ts (pure, no I/O, fully unit-testable):
   - Weight config (fixed defaults, overridable via a passed-in object — NOT env here):
     score = w1·capabilityFit + w2·affinity(prefersEngine/repo-stickiness)
           + w3·(1/(1+load)) + w4·costFit(budget) + w5·health − w6·starvationPenalty(age)
   - `scoreCandidate(job, factory, ctx, weights?) → { score, breakdown }` — return the
     per-term breakdown for explainability (§7/Phase-3 readiness).
   - `selectJob(candidates: FleetJobDoc[], factory, ctx, weights?) → FleetJobDoc | null` —
     filter to deps-satisfied + capability-subset (reuse the coordinator's existing
     predicates; if they're inline, extract pure helpers INTO scheduler.ts), then rank by
     score; deterministic tie-break: higher priority → older createdAt → lower cost class.
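   - Pure, synchronous, no datastore calls. Health/load come from the factory doc; age
     from job.createdAt vs ctx.now (coordinator-authoritative time, passed in). Because the
     starvation term is subtracted, starvationPenalty(age) must shrink as age grows so that
     older jobs rank higher (see the starvation test).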
2. Wire into coordinator.claimNextJob: replace the ad-hoc selection with
   `selectJob(...)`, passing the existing candidate set + the claiming factory + ctx.now.
   Keep tryClaimJob's rev/updateIfMatch CAS and lease/fence logic byte-for-byte unchanged.
   If the claim has no factory capabilities/health context today, thread the minimal fields
   through ClaimContext (additive, in coordinator.ts only).
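
One possible shape for the scoring function in deliverable 1 is sketched below. Everything beyond the formula itself is an assumption made for illustration: the trimmed-down `JobLike`/`FactoryLike` shapes (the real docs live in types.ts), the `costClass`/`maxCostClass`/`engine` fields, the weight values, the 0.5 score for a degraded factory, and the choice of `1/(1+ageHours)` as a starvation penalty that decays with age. The roadmap §7 remains the source of truth for the actual term definitions.

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
interface JobLike {
  id: string;
  priority: number;
  capabilities: string[];
  costClass: number; // assumption: numeric cost class derived from budget
  createdAt: string; // ISO-8601 timestamp
  prefersEngine?: string;
}

interface FactoryLike {
  capabilities: string[];
  engine: string; // assumption
  health: "ok" | "degraded";
  load: number;
  maxCostClass: number; // assumption
}

interface ScoreContext {
  now: number; // coordinator-authoritative time, epoch milliseconds
}

interface Weights {
  capabilityFit: number;
  affinity: number;
  load: number;
  costFit: number;
  health: number;
  starvation: number;
}

// Illustrative defaults, not the roadmap's values.
const DEFAULT_WEIGHTS: Weights = {
  capabilityFit: 3,
  affinity: 1,
  load: 2,
  costFit: 1,
  health: 2,
  starvation: 1,
};

interface ScoreResult {
  score: number;
  breakdown: Record<keyof Weights, number>; // signed, already weighted
}

function scoreCandidate(
  job: JobLike,
  factory: FactoryLike,
  ctx: ScoreContext,
  weights: Weights = DEFAULT_WEIGHTS,
): ScoreResult {
  const offered = new Set(factory.capabilities);
  const required = job.capabilities;
  // Fraction of required capabilities the factory offers (1 when none required).
  const capabilityFit =
    required.length === 0
      ? 1
      : required.filter((cap) => offered.has(cap)).length / required.length;
  const affinity = job.prefersEngine === factory.engine ? 1 : 0;
  const loadTerm = 1 / (1 + Math.max(0, factory.load));
  const costFit = job.costClass <= factory.maxCostClass ? 1 : 0;
  const health = factory.health === "ok" ? 1 : 0.5;
  // Assumption: the penalty decays with age, so subtracting it favors older jobs.
  const ageHours = Math.max(0, ctx.now - Date.parse(job.createdAt)) / 3600000;
  const starvationPenalty = 1 / (1 + ageHours);

  const breakdown: Record<keyof Weights, number> = {
    capabilityFit: weights.capabilityFit * capabilityFit,
    affinity: weights.affinity * affinity,
    load: weights.load * loadTerm,
    costFit: weights.costFit * costFit,
    health: weights.health * health,
    starvation: -weights.starvation * starvationPenalty,
  };
  // Summing the breakdown itself guarantees the terms add up to the score.
  const score = Object.values(breakdown).reduce((sum, term) => sum + term, 0);
  return { score, breakdown };
}
```

Storing the starvation term with its sign already applied keeps the "breakdown sums to score" property trivially true, which is what the explainability test checks.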

TESTS (scheduler.test.ts + additions to coordinator.test.ts — tests are sacred):
- capabilityFit: a factory missing a required cap → candidate filtered out (never selected).
- priority dominates when all else equal; age breaks ties deterministically.
- load: higher-load factory lowers score (1/(1+load)); health: degraded < ok.
- starvation: an old low-priority job eventually outranks a fresh low-priority one.
- costFit: a job exceeding the factory/budget cost class is penalized/last.
- breakdown: scoreCandidate returns each weighted term; the signed terms sum to score.
- selectJob determinism: same inputs → same pick across runs; empty/no-eligible → null.
- coordinator integration: claimNextJob still returns exactly one winner under the existing
  concurrency tests (all prior fleet tests stay green); selection now follows the score.
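
The filtering and tie-break cases above can be exercised against a comparator along these lines. This is a sketch under assumptions, not the required implementation: `Candidate` is a hypothetical trimmed-down job shape, `costClass` is assumed to be numeric, the score function is injected to keep the example self-contained, and the final tie-break on `id` is an addition here to make the ordering total (the spec lists only priority, createdAt, and cost class).

```typescript
// Hypothetical, trimmed-down job shape for illustration only.
interface Candidate {
  id: string;
  priority: number;
  createdAt: string; // ISO-8601 timestamp
  costClass: number; // assumption: lower is cheaper
  capabilities: string[];
  deps: string[];
}

interface Scored {
  job: Candidate;
  score: number;
}

// Eligible = every dependency is done and every required capability is offered.
function isEligible(
  job: Candidate,
  factoryCaps: ReadonlySet<string>,
  doneJobIds: ReadonlySet<string>,
): boolean {
  return (
    job.deps.every((dep) => doneJobIds.has(dep)) &&
    job.capabilities.every((cap) => factoryCaps.has(cap))
  );
}

// Score desc, then priority desc, then older createdAt, then lower cost class.
// The id comparison is an extra assumption so that no two jobs ever compare equal.
function compareCandidates(a: Scored, b: Scored): number {
  return (
    b.score - a.score ||
    b.job.priority - a.job.priority ||
    Date.parse(a.job.createdAt) - Date.parse(b.job.createdAt) ||
    a.job.costClass - b.job.costClass ||
    (a.job.id < b.job.id ? -1 : a.job.id > b.job.id ? 1 : 0)
  );
}

function selectJob(
  candidates: readonly Candidate[],
  factoryCaps: ReadonlySet<string>,
  doneJobIds: ReadonlySet<string>,
  scoreOf: (job: Candidate) => number,
): Candidate | null {
  const ranked = candidates
    .filter((job) => isEligible(job, factoryCaps, doneJobIds))
    .map((job) => ({ job, score: scoreOf(job) }))
    .sort(compareCandidates);
  return ranked[0]?.job ?? null;
}
```

Because the comparator defines a total order, the pick does not depend on the order in which candidates arrive, which is the property the determinism test needs.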

VERIFY GATE:
- pnpm --filter @lysnrai/platform-service exec vitest run src/modules/fleet  (all green)
- pnpm --filter @lysnrai/platform-service build
- pnpm build && pnpm test  (no regression)

CONSTRAINTS: ESM .js imports; no `any`; no console.log; fixed weights this phase (tunable +
preemption are Phase 3 — do NOT build them); pure scheduler (no I/O); conventional commits
(feat(platform-service): ...); do not touch the files reserved above; do not edit the
agent-queue repo.

FINAL OUTPUT — report in EXACTLY this format:
## Implementation Report — Phase 2 Scheduler/Router Core (§7)
### Branch & commits / PR
### Files changed
### What was implemented (scoring terms, tie-breaks, coordinator wiring)
### Tests added (+ pnpm test summary)
### Verify gate results
### Deviations / assumptions (what ctx fields were threaded, weight defaults chosen)
### Suggested next slice (Phase 3 tunable weights + preemption + explainability UI)