---
engine: devin
cwd: /Users/sd9235/code/mygh/learning_ai_common_plat
yolo: true
lock: common-plat-scheduler
timeout: 4h
---

ROLE: Senior backend engineer. Implement the PHASE 2 SCHEDULER / ROUTER CORE (§7)
for the fleet coordinator: a deterministic, fixed-weight scoring engine that picks
WHICH job a claiming factory gets, and wire it into the atomic claim.

PARALLEL-SAFETY (two other Devins are running — DO NOT collide):
- You OWN: services/platform-service/src/modules/fleet/scheduler.ts (NEW),
  scheduler.test.ts (NEW), and the candidate-ranking section of coordinator.ts +
  coordinator.test.ts.
- You MUST NOT touch: types.ts, repository.ts, routes.ts, cosmos-init.ts, server.ts
  (another Devin is editing those for fleet_artifacts). If you need a new type, define
  it inside scheduler.ts. If wiring truly requires a types.ts change, instead re-export
  from scheduler.ts. Import existing FleetJobDoc/FleetFactoryDoc from types.ts (read-only).
- A third Devin is in a different repo (agent-queue) — no overlap.

READ FIRST:
- services/platform-service/src/modules/fleet/coordinator.ts — claimNextJob /
  tryClaimJob: today it selects "highest-priority, oldest, deps-satisfied,
  capability-subset". You will replace the SELECTION step with the scoring engine (keep
  the atomic tryClaimJob CAS exactly as-is).
- types.ts (read-only) — FleetJobDoc (priority, capabilities, budget, createdAt, deps,
  stage), FleetFactoryDoc (capabilities, health, load, seatLimit).
- ../learning_ai_devops_tools/agent-queue/docs/GIGAFACTORY_ROADMAP.md §7 (the formula +
  tie-breaks + phasing note: Phase 2 = fixed weights; Phase 3 = tunable + preemption).

PREREQUISITE / BRANCHING: branch off CURRENT main → feat/gigafactory-p2-scheduler.
Push + open PR. DO NOT merge.

DELIVERABLES
1. scheduler.ts (pure, no I/O, fully unit-testable):
   - Weight config (fixed defaults, overridable via a passed-in object — NOT env here):
       score = w1·capabilityFit + w2·affinity(prefersEngine/repo-stickiness)
             + w3·(1/(1+load)) + w4·costFit(budget) + w5·health − w6·starvationPenalty(age)
   - `scoreCandidate(job, factory, ctx, weights?) → { score, breakdown }` — return the
     per-term breakdown for explainability (§7/Phase-3 readiness).
   - `selectJob(candidates: FleetJobDoc[], factory, ctx, weights?) → FleetJobDoc | null` —
     filter to deps-satisfied + capability-subset (reuse the coordinator's existing
     predicates; if they're inline, extract pure helpers INTO scheduler.ts), then rank by
     score; deterministic tie-break: higher priority → older createdAt → lower cost class.
   - Pure, synchronous, no datastore calls. Health/load come from the factory doc; age
     from job.createdAt vs ctx.now (coordinator-authoritative time, passed in).
2. Wire into coordinator.claimNextJob: replace the ad-hoc selection with
   `selectJob(...)`, passing the existing candidate set + the claiming factory + ctx.now.
   Keep tryClaimJob's rev/updateIfMatch CAS and lease/fence logic byte-for-byte unchanged.
   If the claim has no factory capabilities/health context today, thread the minimal fields
   through ClaimContext (additive, in coordinator.ts only).
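
ILLUSTRATIVE SKETCH (non-normative): one possible shape for the pure scorer and selector described in deliverable 1. Everything here is an assumption made for illustration: the `JobLike` / `FactoryLike` shapes are trimmed-down stand-ins for the real docs in types.ts, the weight values are placeholders, and each term's definition (in particular a starvation penalty that decays with age, so that subtracting it lets older jobs rise) is a guess. GIGAFACTORY_ROADMAP.md §7 remains authoritative for the formula.

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
// The real FleetJobDoc / FleetFactoryDoc come from types.ts and may differ.
interface JobLike {
  id: string;
  priority: number; // assumption: larger = more urgent
  capabilities: string[]; // capabilities the job requires (no duplicates assumed)
  costClass: number; // assumption: numeric class derived from budget
  createdAt: number; // assumption: epoch milliseconds
  depsSatisfied: boolean; // assumption: resolved by the caller
  prefersEngine?: string;
}

interface FactoryLike {
  engine: string;
  capabilities: string[];
  health: number; // assumption: 0..1, where 1 = ok
  load: number;
  maxCostClass: number; // assumption: highest cost class this factory should take
}

interface SchedulerCtx {
  now: number; // coordinator-authoritative time, epoch milliseconds
}

interface Weights {
  capabilityFit: number;
  affinity: number;
  load: number;
  costFit: number;
  health: number;
  starvation: number;
}

type Breakdown = Record<keyof Weights, number>;

// Placeholder defaults, not values taken from the roadmap.
const DEFAULT_WEIGHTS: Weights = {
  capabilityFit: 3, affinity: 1, load: 2, costFit: 1, health: 2, starvation: 1,
};

const MS_PER_HOUR = 3600000;

// Hard filter: dependencies satisfied and required capabilities are a subset of the factory's.
function isEligible(job: JobLike, factory: FactoryLike): boolean {
  const offered = new Set(factory.capabilities);
  return job.depsSatisfied && job.capabilities.every((cap) => offered.has(cap));
}

function scoreCandidate(
  job: JobLike, factory: FactoryLike, ctx: SchedulerCtx, weights: Weights = DEFAULT_WEIGHTS,
): { score: number; breakdown: Breakdown } {
  const ageHours = Math.max(0, ctx.now - job.createdAt) / MS_PER_HOUR;
  const offeredCount = factory.capabilities.length;
  // Each entry is already multiplied by its weight, so the entries sum to the score.
  const breakdown: Breakdown = {
    capabilityFit:
      weights.capabilityFit * (offeredCount === 0 ? 1 : job.capabilities.length / offeredCount),
    affinity: weights.affinity * (job.prefersEngine === factory.engine ? 1 : 0),
    load: weights.load * (1 / (1 + factory.load)),
    costFit: weights.costFit * (job.costClass <= factory.maxCostClass ? 1 : 0),
    health: weights.health * factory.health,
    // Assumption: the penalty is largest for a fresh job and decays as the job ages.
    starvation: -weights.starvation * (1 / (1 + ageHours)),
  };
  const score = breakdown.capabilityFit + breakdown.affinity + breakdown.load
    + breakdown.costFit + breakdown.health + breakdown.starvation;
  return { score, breakdown };
}

function selectJob(
  candidates: JobLike[], factory: FactoryLike, ctx: SchedulerCtx, weights: Weights = DEFAULT_WEIGHTS,
): JobLike | null {
  const ranked = candidates
    .filter((job) => isEligible(job, factory))
    .map((job) => ({ job, score: scoreCandidate(job, factory, ctx, weights).score }))
    .sort((a, b) =>
      b.score - a.score // higher score first
      || b.job.priority - a.job.priority // then higher priority
      || a.job.createdAt - b.job.createdAt // then older createdAt
      || a.job.costClass - b.job.costClass // then lower cost class
      || (a.job.id < b.job.id ? -1 : a.job.id > b.job.id ? 1 : 0)); // extra id tie-break (assumption)
  return ranked[0]?.job ?? null;
}
```

The final id comparison is not part of the stated tie-break; it is added here only so the ordering is total and the pick does not depend on input order.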

TESTS (scheduler.test.ts + additions to coordinator.test.ts — tests are sacred):
- capabilityFit: a factory missing a required cap → candidate filtered out (never selected).
- priority dominates when all else equal; age breaks ties deterministically.
- load: higher-load factory lowers score (1/(1+load)); health: degraded < ok.
- starvation: an old low-priority job eventually outranks a fresh low-priority one.
- costFit: a job exceeding the factory/budget cost class is penalized/last.
- breakdown: scoreCandidate returns each weighted term (sums to score).
- selectJob determinism: same inputs → same pick across runs; empty/no-eligible → null.
- coordinator integration: claimNextJob still returns exactly one winner under the existing
  concurrency tests (all prior fleet tests stay green); selection now follows the score.
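
ILLUSTRATIVE SKETCH (non-normative): why ranking by score need not disturb the "exactly one winner" property, shown with an in-memory stand-in. `InMemoryJobs`, `pickBest`, and `claimNext` are hypothetical names, and the retry-on-conflict loop is an assumption about how a claim might recover from a lost race; the real guarantee comes from tryClaimJob's existing rev/updateIfMatch CAS, which stays unchanged.

```typescript
// Hypothetical stored shape: score is precomputed here to keep the sketch small.
interface StoredJob {
  id: string;
  rev: number;
  claimedBy: string | null;
  score: number;
}

class InMemoryJobs {
  private readonly jobs = new Map<string, StoredJob>();

  put(job: StoredJob): void {
    this.jobs.set(job.id, { ...job });
  }

  // Returns copies, so a caller holds a snapshot that can go stale.
  unclaimed(): StoredJob[] {
    return Array.from(this.jobs.values())
      .filter((job) => job.claimedBy === null)
      .map((job) => ({ ...job }));
  }

  // Compare-and-set: succeeds only if the stored rev still matches the rev the caller read.
  updateIfMatch(id: string, expectedRev: number, factoryId: string): boolean {
    const current = this.jobs.get(id);
    if (current === undefined || current.rev !== expectedRev || current.claimedBy !== null) {
      return false;
    }
    this.jobs.set(id, { ...current, rev: current.rev + 1, claimedBy: factoryId });
    return true;
  }
}

// Stand-in for selectJob: highest score first, id as a deterministic tie-break.
function pickBest(candidates: StoredJob[]): StoredJob | null {
  const sorted = candidates
    .slice()
    .sort((a, b) => b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  return sorted[0] ?? null;
}

// Select, attempt the CAS, and on conflict drop that candidate and re-rank the rest.
function claimNext(
  store: InMemoryJobs, factoryId: string, snapshot: StoredJob[] = store.unclaimed(),
): string | null {
  let remaining = snapshot;
  for (;;) {
    const best = pickBest(remaining);
    if (best === null) return null;
    if (store.updateIfMatch(best.id, best.rev, factoryId)) return best.id;
    const lostId = best.id;
    remaining = remaining.filter((job) => job.id !== lostId);
  }
}
```

Passing the same stale snapshot to two claimers simulates a race: both rank the same job first, only one CAS succeeds, and the loser falls through to its next-best candidate or to null.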

VERIFY GATE:
- pnpm --filter @lysnrai/platform-service exec vitest run src/modules/fleet (all green)
- pnpm --filter @lysnrai/platform-service build
- pnpm build && pnpm test (no regression)

CONSTRAINTS: ESM .js imports; no any; no console.log; fixed weights this phase (tunable +
preemption are Phase 3 — do NOT build them); pure scheduler (no I/O); conventional commits
(feat(platform-service): ...); do not touch the files reserved above; do not edit the
agent-queue repo.

FINAL OUTPUT — report in EXACTLY this format:
## Implementation Report — Phase 2 Scheduler/Router Core (§7)
### Branch & commits / PR
### Files changed
### What was implemented (scoring terms, tie-breaks, coordinator wiring)
### Tests added (+ pnpm test summary)
### Verify gate results
### Deviations / assumptions (what ctx fields were threaded, weight defaults chosen)
### Suggested next slice (Phase 3 tunable weights + preemption + explainability UI)