diff --git a/agent-queue/docs/jobs/phase2-artifacts.md b/agent-queue/docs/jobs/phase2-artifacts.md
new file mode 100644
index 0000000..3c01402
--- /dev/null
+++ b/agent-queue/docs/jobs/phase2-artifacts.md
@@ -0,0 +1,86 @@
+---
+engine: devin
+cwd: /Users/sd9235/code/mygh/learning_ai_common_plat
+yolo: true
+lock: common-plat-artifacts
+timeout: 4h
+---
+
+ROLE: Senior backend engineer. Implement FLEET ARTIFACTS + BLOB WIRING (§13 leftover):
+large run outputs (logs, coverage, screenshots, build output) are stored in blob
+storage and only POINTERS (with size/content-type/SAS) live in the `fleet_artifacts`
+Cosmos container — NEVER inline in Cosmos (doc-size + RU limits).
+
+PARALLEL-SAFETY (two other Devins are running — DO NOT collide):
+- You OWN the fleet_artifacts surface: types.ts (artifact schema only), repository.ts
+  (artifact repo only), routes.ts (artifact endpoints only), cosmos-init.ts (only if the
+  fleet_artifacts container needs registration), and a NEW artifacts.test.ts.
+- You MUST NOT touch: coordinator.ts, coordinator.test.ts, scheduler.ts (another Devin owns
+  the scheduler + claim ranking). Keep your edits to types/repository/routes additive and
+  localized to the artifact pieces — do not refactor the job/lease/claim code.
+- A third Devin is in a different repo (agent-queue) — no overlap.
+
+READ FIRST:
+- services/platform-service/src/modules/fleet/types.ts — find FleetArtifactDoc (the
+  foundation may already declare it, pk /jobId). repository.ts — see if an artifacts repo
+  already exists; extend, don't duplicate. cosmos-init.ts — see if fleet_artifacts is
+  already registered.
+- packages/blob (@bytelyst/blob) — the Azure Blob client + SAS token helpers. Learn the
+  exact API (upload, container/key conventions, SAS generation, the memory/dev fallback).
+  Use it the same way other consumers do (grep for existing @bytelyst/blob usage).
+- ../learning_ai_devops_tools/agent-queue/docs/GIGAFACTORY_ROADMAP.md §13 (fleet_artifacts
+  bullet) + §26 (insights/artifacts).
+
+PREREQUISITE / BRANCHING: branch off CURRENT main → feat/gigafactory-p2-artifacts.
+Push + open PR. DO NOT merge.
+
+DELIVERABLES
+1. FleetArtifactDoc (in types.ts — confirm/extend): { id, productId, jobId, runId?, kind
+   ('log'|'coverage'|'screenshot'|'build'|'other'), blobKey, contentType, sizeBytes,
+   sha256?, createdAt }. Zod schema → inferred type. productId on the doc.
+2. repository.ts — artifacts repo: createArtifact, listArtifactsByJob(jobId),
+   getArtifact(id, productId), deleteArtifact. Single-partition (pk /jobId). Do not touch
+   the job/lease/run repos beyond importing shared helpers.
+3. Blob integration (a small artifacts service fn, e.g. in a NEW
+   modules/fleet/artifacts-blob.ts): uploadArtifact(jobId, kind, bytes/stream, contentType)
+   → stores in @bytelyst/blob under a deterministic key
+   (`fleet///-`), returns the persisted FleetArtifactDoc with a
+   short-lived SAS read URL. getArtifactDownload(id) → re-issues a SAS URL. Large content
+   NEVER goes into Cosmos.
+4. routes.ts — guarded endpoints (auth + productId, Zod-validated), additive only:
+     POST /fleet/jobs/:id/artifacts (multipart or base64 body → upload + pointer)
+     GET /fleet/jobs/:id/artifacts (list pointers)
+     GET /fleet/artifacts/:artifactId (pointer + fresh SAS download URL)
+     DELETE /fleet/artifacts/:artifactId
+   Register exactly like the existing fleet routes (do not reorder/rewrite the others).
+
+TESTS (artifacts.test.ts — memory blob + memory datastore; tests are sacred):
+- upload → a fleet_artifacts pointer doc is created with productId, blobKey, sizeBytes,
+  contentType; the bytes live in blob, NOT in the Cosmos doc (assert the doc has no inline
+  payload field).
+- list by job returns only that job's artifacts (partition isolation).
+- get returns a (fresh) SAS download URL; a large payload (> a Cosmos-safe threshold) still
+  succeeds (proves blob offload).
+- delete removes the pointer (and blob if your helper does so).
+- routes via fastify inject: upload/list/get/delete; auth + productId enforced; invalid body
+  → 400; unknown id → 404.
+- existing fleet tests (jobs/leases/claim/events) remain green and untouched.
+
+VERIFY GATE:
+- pnpm --filter @lysnrai/platform-service exec vitest run src/modules/fleet (all green)
+- pnpm --filter @lysnrai/platform-service build
+- pnpm build && pnpm test (no consumer regressed)
+
+CONSTRAINTS: ESM .js imports; no any; no console.log; productId on every doc; large logs in
+blob never Cosmos; conventional commits (feat(platform-service): ...); do not touch the files
+reserved for the other Devins; do not edit the agent-queue repo.
+
+FINAL OUTPUT — report in EXACTLY this format:
+## Implementation Report — Fleet Artifacts + Blob Wiring (§13)
+### Branch & commits / PR
+### Files changed
+### What was implemented (artifact schema, blob key scheme, SAS, routes)
+### Tests added (+ pnpm test summary; esp. the "bytes in blob not Cosmos" assertion)
+### Verify gate results
+### Deviations / assumptions (blob API used, dev/memory fallback, SAS TTL)
+### Suggested next slice
diff --git a/agent-queue/docs/jobs/phase2-scheduler.md b/agent-queue/docs/jobs/phase2-scheduler.md
new file mode 100644
index 0000000..40169d9
--- /dev/null
+++ b/agent-queue/docs/jobs/phase2-scheduler.md
@@ -0,0 +1,84 @@
+---
+engine: devin
+cwd: /Users/sd9235/code/mygh/learning_ai_common_plat
+yolo: true
+lock: common-plat-scheduler
+timeout: 4h
+---
+
+ROLE: Senior backend engineer. Implement the PHASE 2 SCHEDULER / ROUTER CORE (§7)
+for the fleet coordinator: a deterministic, fixed-weight scoring engine that picks
+WHICH job a claiming factory gets, and wire it into the atomic claim.
+
+PARALLEL-SAFETY (two other Devins are running — DO NOT collide):
+- You OWN: services/platform-service/src/modules/fleet/scheduler.ts (NEW),
+  scheduler.test.ts (NEW), and the candidate-ranking section of coordinator.ts +
+  coordinator.test.ts.
+- You MUST NOT touch: types.ts, repository.ts, routes.ts, cosmos-init.ts, server.ts
+  (another Devin is editing those for fleet_artifacts). If you need a new type, define
+  it inside scheduler.ts. If wiring truly requires a types.ts change, instead re-export
+  from scheduler.ts. Import existing FleetJobDoc/FleetFactoryDoc from types.ts (read-only).
+- A third Devin is in a different repo (agent-queue) — no overlap.
+
+READ FIRST:
+- services/platform-service/src/modules/fleet/coordinator.ts — claimNextJob /
+  tryClaimJob: today it selects "highest-priority, oldest, deps-satisfied, capability-
+  subset". You will replace the SELECTION step with the scoring engine (keep the atomic
+  tryClaimJob CAS exactly as-is).
+- types.ts (read-only) — FleetJobDoc (priority, capabilities, budget, createdAt, deps,
+  stage), FleetFactoryDoc (capabilities, health, load, seatLimit).
+- ../learning_ai_devops_tools/agent-queue/docs/GIGAFACTORY_ROADMAP.md §7 (the formula +
+  tie-breaks + phasing note: Phase 2 = fixed weights; Phase 3 = tunable + preemption).
+
+PREREQUISITE / BRANCHING: branch off CURRENT main → feat/gigafactory-p2-scheduler.
+Push + open PR. DO NOT merge.
+
+DELIVERABLES
+1. scheduler.ts (pure, no I/O, fully unit-testable):
+   - Weight config (fixed defaults, overridable via a passed-in object — NOT env here):
+     score = w1·capabilityFit + w2·affinity(prefersEngine/repo-stickiness) +
+             w3·(1/(1+load)) + w4·costFit(budget) + w5·health − w6·starvationPenalty(age)
+   - `scoreCandidate(job, factory, ctx, weights?) → { score, breakdown }` — return the
+     per-term breakdown for explainability (§7/Phase-3 readiness).
+   - `selectJob(candidates: FleetJobDoc[], factory, ctx, weights?)
→ FleetJobDoc | null` —
+     filter to deps-satisfied + capability-subset (reuse the coordinator's existing
+     predicates; if they're inline, extract pure helpers INTO scheduler.ts), then rank by
+     score; deterministic tie-break: higher priority → older createdAt → lower cost class.
+   - Pure, synchronous, no datastore calls. Health/load come from the factory doc; age
+     from job.createdAt vs ctx.now (coordinator-authoritative time, passed in).
+2. Wire into coordinator.claimNextJob: replace the ad-hoc selection with
+   `selectJob(...)`, passing the existing candidate set + the claiming factory + ctx.now.
+   Keep tryClaimJob's rev/updateIfMatch CAS and lease/fence logic byte-for-byte unchanged.
+   If the claim has no factory capabilities/health context today, thread the minimal fields
+   through ClaimContext (additive, in coordinator.ts only).
+
+TESTS (scheduler.test.ts + additions to coordinator.test.ts — tests are sacred):
+- capabilityFit: a factory missing a required cap → candidate filtered out (never selected).
+- priority dominates when all else equal; age breaks ties deterministically.
+- load: higher-load factory lowers score (1/(1+load)); health: degraded < ok.
+- starvation: an old low-priority job eventually outranks a fresh low-priority one.
+- costFit: a job exceeding the factory/budget cost class is penalized/last.
+- breakdown: scoreCandidate returns each weighted term (sums to score).
+- selectJob determinism: same inputs → same pick across runs; empty/no-eligible → null.
+- coordinator integration: claimNextJob still returns exactly one winner under the existing
+  concurrency tests (all prior fleet tests stay green); selection now follows the score.
+
+VERIFY GATE:
+- pnpm --filter @lysnrai/platform-service exec vitest run src/modules/fleet (all green)
+- pnpm --filter @lysnrai/platform-service build
+- pnpm build && pnpm test (no regression)
+
+CONSTRAINTS: ESM .js imports; no any; no console.log; fixed weights this phase (tunable +
+preemption are Phase 3 — do NOT build them); pure scheduler (no I/O); conventional commits
+(feat(platform-service): ...); do not touch the files reserved above; do not edit the
+agent-queue repo.
+
+FINAL OUTPUT — report in EXACTLY this format:
+## Implementation Report — Phase 2 Scheduler/Router Core (§7)
+### Branch & commits / PR
+### Files changed
+### What was implemented (scoring terms, tie-breaks, coordinator wiring)
+### Tests added (+ pnpm test summary)
+### Verify gate results
+### Deviations / assumptions (what ctx fields were threaded, weight defaults chosen)
+### Suggested next slice (Phase 3 tunable weights + preemption + explainability UI)
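
Review note: the scoring deliverable in phase2-scheduler.md may be easier to reason about with a small sketch. The block below is an illustration only, not the implementation the job asks for. Every name other than `scoreCandidate` and `selectJob` is hypothetical: `JobLike` and `FactoryLike` are trimmed stand-ins for FleetJobDoc and FleetFactoryDoc, the weight defaults are invented, and each term's shape is an assumption. In particular, `starvationPenalty(age)` is assumed to be `1 / (1 + ageHours)`, so the subtracted penalty shrinks as a job waits, which is one way to satisfy the starvation test in the spec.

```typescript
// Hypothetical, trimmed-down shapes; the real docs live in types.ts.
type Health = "ok" | "degraded";

interface JobLike {
  id: string;
  priority: number;
  capabilities: string[];
  costClass: number;
  createdAt: number; // epoch milliseconds (assumed representation)
  depsSatisfied: boolean;
  prefersEngine?: string;
}

interface FactoryLike {
  engine: string;
  capabilities: string[];
  load: number;
  health: Health;
  maxCostClass: number;
}

interface ScoreContext {
  now: number; // coordinator-authoritative time, epoch milliseconds
}

interface Weights {
  capabilityFit: number;
  affinity: number;
  load: number;
  costFit: number;
  health: number;
  starvation: number;
}

type Breakdown = Record<keyof Weights, number>;

// Invented defaults, for illustration only.
const DEFAULT_WEIGHTS: Weights = {
  capabilityFit: 3, affinity: 1, load: 2, costFit: 1, health: 2, starvation: 1,
};

const HOUR_MS = 3600000;

// Eligibility filter: deps satisfied and required caps are a subset of the factory's.
function isEligible(job: JobLike, factory: FactoryLike): boolean {
  return job.depsSatisfied &&
    job.capabilities.every((cap) => factory.capabilities.includes(cap));
}

function scoreCandidate(
  job: JobLike, factory: FactoryLike, ctx: ScoreContext, weights: Weights = DEFAULT_WEIGHTS,
): { score: number; breakdown: Breakdown } {
  const ageHours = Math.max(0, ctx.now - job.createdAt) / HOUR_MS;
  const fit = factory.capabilities.length === 0
    ? 1
    : job.capabilities.length / factory.capabilities.length;
  // Each entry is already weighted; starvation is stored negative so the terms sum to the score.
  const breakdown: Breakdown = {
    capabilityFit: weights.capabilityFit * fit,
    affinity: weights.affinity * (job.prefersEngine === factory.engine ? 1 : 0),
    load: weights.load * (1 / (1 + factory.load)),
    costFit: weights.costFit * (job.costClass <= factory.maxCostClass ? 1 : 0),
    health: weights.health * (factory.health === "ok" ? 1 : 0.5),
    starvation: -weights.starvation * (1 / (1 + ageHours)),
  };
  const score = breakdown.capabilityFit + breakdown.affinity + breakdown.load +
    breakdown.costFit + breakdown.health + breakdown.starvation;
  return { score, breakdown };
}

function selectJob(
  candidates: JobLike[], factory: FactoryLike, ctx: ScoreContext, weights: Weights = DEFAULT_WEIGHTS,
): JobLike | null {
  const ranked = candidates
    .filter((job) => isEligible(job, factory))
    .map((job) => ({ job, score: scoreCandidate(job, factory, ctx, weights).score }))
    // Tie-break order from the spec: higher priority, older createdAt, lower cost class.
    .sort((a, b) =>
      b.score - a.score ||
      b.job.priority - a.job.priority ||
      a.job.createdAt - b.job.createdAt ||
      a.job.costClass - b.job.costClass);
  return ranked[0]?.job ?? null;
}
```

Because both functions take `ctx.now` as an argument and never read a clock or a datastore, the same inputs always produce the same pick, which is what the determinism test in the spec relies on.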