docs(agent-queue): draft parallel P2 prompts — scheduler/router core (§7) + fleet artifacts blob wiring (§13)

2026-05-29 22:09:00 -07:00 · 2026-05-29 22:09:00 -07:00 · 10395983e7
commit 10395983e7
parent a8dd166108
2 changed files with 170 additions and 0 deletions
--- a/agent-queue/docs/jobs/phase2-artifacts.md
+++ b/agent-queue/docs/jobs/phase2-artifacts.md
@@ -0,0 +1,86 @@
+---
+engine: devin
+cwd: /Users/sd9235/code/mygh/learning_ai_common_plat
+yolo: true
+lock: common-plat-artifacts
+timeout: 4h
+---
+
+ROLE: Senior backend engineer. Implement FLEET ARTIFACTS + BLOB WIRING (§13 leftover):
+large run outputs (logs, coverage, screenshots, build output) are stored in blob
+storage and only POINTERS (with size/content-type/SAS) live in the `fleet_artifacts`
+Cosmos container — NEVER inline in Cosmos (doc-size + RU limits).
+
+PARALLEL-SAFETY (two other Devins are running — DO NOT collide):
+- You OWN the fleet_artifacts surface: types.ts (artifact schema only), repository.ts
+  (artifact repo only), routes.ts (artifact endpoints only), cosmos-init.ts (only if the
+  fleet_artifacts container needs registration), and a NEW artifacts.test.ts.
+- You MUST NOT touch: coordinator.ts, coordinator.test.ts, scheduler.ts (another Devin owns
+  the scheduler + claim ranking). Keep your edits to types/repository/routes additive and
+  localized to the artifact pieces — do not refactor the job/lease/claim code.
+- A third Devin is in a different repo (agent-queue) — no overlap.
+
+READ FIRST:
+- services/platform-service/src/modules/fleet/types.ts — find FleetArtifactDoc (the
+  foundation may already declare it, pk /jobId). repository.ts — see if an artifacts repo
+  already exists; extend, don't duplicate. cosmos-init.ts — see if fleet_artifacts is
+  already registered.
+- packages/blob (@bytelyst/blob) — the Azure Blob client + SAS token helpers. Learn the
+  exact API (upload, container/key conventions, SAS generation, the memory/dev fallback).
+  Use it the same way other consumers do (grep for existing @bytelyst/blob usage).
+- ../learning_ai_devops_tools/agent-queue/docs/GIGAFACTORY_ROADMAP.md §13 (fleet_artifacts
+  bullet) + §26 (insights/artifacts).
+
+PREREQUISITE / BRANCHING: branch off CURRENT main → feat/gigafactory-p2-artifacts.
+Push + open PR. DO NOT merge.
+
+DELIVERABLES
+1. FleetArtifactDoc (in types.ts — confirm/extend): { id, productId, jobId, runId?, kind
+   ('log'|'coverage'|'screenshot'|'build'|'other'), blobKey, contentType, sizeBytes,
+   sha256?, createdAt }. Zod schema → inferred type. productId on the doc.
+2. repository.ts — artifacts repo: createArtifact, listArtifactsByJob(jobId),
+   getArtifact(id, productId), deleteArtifact. Single-partition (pk /jobId). Do not touch
+   the job/lease/run repos beyond importing shared helpers.
+3. Blob integration (a small artifacts service fn, e.g. in a NEW
+   modules/fleet/artifacts-blob.ts): uploadArtifact(jobId, kind, bytes/stream, contentType)
+   → stores in @bytelyst/blob under a deterministic key
+   (`fleet/<productId>/<jobId>/<id>-<kind>`), returns the persisted FleetArtifactDoc with a
+   short-lived SAS read URL. getArtifactDownload(id) → re-issues a SAS URL. Large content
+   NEVER goes into Cosmos.
+4. routes.ts — guarded endpoints (auth + productId, Zod-validated), additive only:
+   POST   /fleet/jobs/:id/artifacts        (multipart or base64 body → upload + pointer)
+   GET    /fleet/jobs/:id/artifacts        (list pointers)
+   GET    /fleet/artifacts/:artifactId     (pointer + fresh SAS download URL)
+   DELETE /fleet/artifacts/:artifactId
+   Register exactly like the existing fleet routes (do not reorder/rewrite the others).
+
+TESTS (artifacts.test.ts — memory blob + memory datastore; tests are sacred):
+- upload → a fleet_artifacts pointer doc is created with productId, blobKey, sizeBytes,
+  contentType; the bytes live in blob, NOT in the Cosmos doc (assert the doc has no inline
+  payload field).
+- list by job returns only that job's artifacts (partition isolation).
+- get returns a (fresh) SAS download URL; a large payload (above Cosmos DB's 2 MB item-size
+  limit) still succeeds (proves blob offload).
+- delete removes the pointer (and blob if your helper does so).
+- routes via fastify inject: upload/list/get/delete; auth + productId enforced; invalid body
+  → 400; unknown id → 404.
+- existing fleet tests (jobs/leases/claim/events) remain green and untouched.
+
+VERIFY GATE:
+- pnpm --filter @lysnrai/platform-service exec vitest run src/modules/fleet  (all green)
+- pnpm --filter @lysnrai/platform-service build
+- pnpm build && pnpm test  (no consumer regressed)
+
+CONSTRAINTS: ESM .js imports; no any; no console.log; productId on every doc; large logs in
+blob never Cosmos; conventional commits (feat(platform-service): ...); do not touch the files
+reserved for the other Devins; do not edit the agent-queue repo.
+
+FINAL OUTPUT — report in EXACTLY this format:
+## Implementation Report — Fleet Artifacts + Blob Wiring (§13)
+### Branch & commits / PR
+### Files changed
+### What was implemented (artifact schema, blob key scheme, SAS, routes)
+### Tests added (+ pnpm test summary; esp. the "bytes in blob not Cosmos" assertion)
+### Verify gate results
+### Deviations / assumptions (blob API used, dev/memory fallback, SAS TTL)
+### Suggested next slice
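
The artifacts prompt above asks for a deterministic blob key and a pointer-only document. A minimal sketch of that shape, which is not part of this commit, might look like the following. `buildBlobKey`, `toPointerDoc`, and `PointerInput` are hypothetical names; the field list and the key scheme `fleet/<productId>/<jobId>/<id>-<kind>` come from the prompt. The actual `@bytelyst/blob` upload and SAS calls are left out on purpose, since their API is not shown here.

```typescript
// Hypothetical sketch: only the field names and the key scheme come from the prompt.
type ArtifactKind = "log" | "coverage" | "screenshot" | "build" | "other";

interface FleetArtifactDoc {
  id: string;
  productId: string;
  jobId: string; // partition key (/jobId)
  runId?: string;
  kind: ArtifactKind;
  blobKey: string;
  contentType: string;
  sizeBytes: number;
  sha256?: string;
  createdAt: string; // ISO timestamp
}

// Assumed input shape for illustration; the real service may accept a stream instead.
interface PointerInput {
  id: string;
  productId: string;
  jobId: string;
  kind: ArtifactKind;
  bytes: Uint8Array;
  contentType: string;
  now: Date;
}

// Deterministic key: the same identifiers always map to the same blob path.
function buildBlobKey(productId: string, jobId: string, id: string, kind: ArtifactKind): string {
  return `fleet/${productId}/${jobId}/${id}-${kind}`;
}

// Builds the Cosmos pointer document. The payload bytes are measured but never copied in.
// The blob upload itself would happen before this doc is persisted.
function toPointerDoc(input: PointerInput): FleetArtifactDoc {
  return {
    id: input.id,
    productId: input.productId,
    jobId: input.jobId,
    kind: input.kind,
    blobKey: buildBlobKey(input.productId, input.jobId, input.id, input.kind),
    contentType: input.contentType,
    sizeBytes: input.bytes.byteLength,
    createdAt: input.now.toISOString(),
  };
}
```

Keeping the pointer builder pure makes the "bytes in blob, not Cosmos" test a plain assertion on the returned object, independent of whichever blob backend is wired in.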
--- a/agent-queue/docs/jobs/phase2-scheduler.md
+++ b/agent-queue/docs/jobs/phase2-scheduler.md
@@ -0,0 +1,84 @@
+---
+engine: devin
+cwd: /Users/sd9235/code/mygh/learning_ai_common_plat
+yolo: true
+lock: common-plat-scheduler
+timeout: 4h
+---
+
+ROLE: Senior backend engineer. Implement the PHASE 2 SCHEDULER / ROUTER CORE (§7)
+for the fleet coordinator: a deterministic, fixed-weight scoring engine that picks
+WHICH job a claiming factory gets, and wire it into the atomic claim.
+
+PARALLEL-SAFETY (two other Devins are running — DO NOT collide):
+- You OWN: services/platform-service/src/modules/fleet/scheduler.ts (NEW),
+  scheduler.test.ts (NEW), and the candidate-ranking section of coordinator.ts +
+  coordinator.test.ts.
+- You MUST NOT touch: types.ts, repository.ts, routes.ts, cosmos-init.ts, server.ts
+  (another Devin is editing those for fleet_artifacts). If you need a new type, define
+  and export it from scheduler.ts; do not change types.ts even if the wiring seems to
+  require it. Import existing FleetJobDoc/FleetFactoryDoc from types.ts (read-only).
+- A third Devin is in a different repo (agent-queue) — no overlap.
+
+READ FIRST:
+- services/platform-service/src/modules/fleet/coordinator.ts — claimNextJob /
+  tryClaimJob: today it selects "highest-priority, oldest, deps-satisfied, capability-
+  subset". You will replace the SELECTION step with the scoring engine (keep the atomic
+  tryClaimJob CAS exactly as-is).
+- types.ts (read-only) — FleetJobDoc (priority, capabilities, budget, createdAt, deps,
+  stage), FleetFactoryDoc (capabilities, health, load, seatLimit).
+- ../learning_ai_devops_tools/agent-queue/docs/GIGAFACTORY_ROADMAP.md §7 (the formula
+  + tie-breaks + phasing note: Phase 2 = fixed weights; Phase 3 = tunable + preemption).
+
+PREREQUISITE / BRANCHING: branch off CURRENT main → feat/gigafactory-p2-scheduler.
+Push + open PR. DO NOT merge.
+
+DELIVERABLES
+1. scheduler.ts (pure, no I/O, fully unit-testable):
+   - Weight config (fixed defaults, overridable via a passed-in object — NOT env here):
+     score = w1·capabilityFit + w2·affinity(prefersEngine/repo-stickiness)
+           + w3·(1/(1+load)) + w4·costFit(budget) + w5·health − w6·starvationPenalty(age)
+   - `scoreCandidate(job, factory, ctx, weights?) → { score, breakdown }` — return the
+     per-term breakdown for explainability (§7/Phase-3 readiness).
+   - `selectJob(candidates: FleetJobDoc[], factory, ctx, weights?) → FleetJobDoc | null` —
+     filter to deps-satisfied + capability-subset (reuse the coordinator's existing
+     predicates; if they're inline, extract pure helpers INTO scheduler.ts), then rank by
+     score; deterministic tie-break: higher priority → older createdAt → lower cost class.
+   - Pure, synchronous, no datastore calls. Health/load come from the factory doc; age
+     from job.createdAt vs ctx.now (coordinator-authoritative time, passed in).
+2. Wire into coordinator.claimNextJob: replace the ad-hoc selection with
+   `selectJob(...)`, passing the existing candidate set + the claiming factory + ctx.now.
+   Keep tryClaimJob's rev/updateIfMatch CAS and lease/fence logic byte-for-byte unchanged.
+   If the claim has no factory capabilities/health context today, thread the minimal fields
+   through ClaimContext (additive, in coordinator.ts only).
+
+TESTS (scheduler.test.ts + additions to coordinator.test.ts — tests are sacred):
+- capabilityFit: a factory missing a required cap → candidate filtered out (never selected).
+- priority dominates when all else equal; age breaks ties deterministically.
+- load: higher-load factory lowers score (1/(1+load)); health: degraded < ok.
+- starvation: an old low-priority job eventually outranks a fresh low-priority one.
+- costFit: a job exceeding the factory/budget cost class is penalized/last.
+- breakdown: scoreCandidate returns each weighted term (the signed terms sum to score).
+- selectJob determinism: same inputs → same pick across runs; empty/no-eligible → null.
+- coordinator integration: claimNextJob still returns exactly one winner under the existing
+  concurrency tests (all prior fleet tests stay green); selection now follows the score.
+
+VERIFY GATE:
+- pnpm --filter @lysnrai/platform-service exec vitest run src/modules/fleet  (all green)
+- pnpm --filter @lysnrai/platform-service build
+- pnpm build && pnpm test  (no regression)
+
+CONSTRAINTS: ESM .js imports; no any; no console.log; fixed weights this phase (tunable +
+preemption are Phase 3 — do NOT build them); pure scheduler (no I/O); conventional commits
+(feat(platform-service): ...); do not touch the files reserved above; do not edit the
+agent-queue repo.
+
+FINAL OUTPUT — report in EXACTLY this format:
+## Implementation Report — Phase 2 Scheduler/Router Core (§7)
+### Branch & commits / PR
+### Files changed
+### What was implemented (scoring terms, tie-breaks, coordinator wiring)
+### Tests added (+ pnpm test summary)
+### Verify gate results
+### Deviations / assumptions (what ctx fields were threaded, weight defaults chosen)
+### Suggested next slice (Phase 3 tunable weights + preemption + explainability UI)
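
The scheduler prompt above describes a pure scoring function plus a deterministic tie-break. A minimal sketch under stated assumptions, which is not part of this commit, could look like the following. `JobLike`, `FactoryLike`, the unit weights, and each term's formula except `1/(1+load)` are hypothetical; the real `FleetJobDoc`/`FleetFactoryDoc` shapes and the roadmap's weights are not reproduced here. In particular, the sketch assumes `starvationPenalty(age)` shrinks as a job ages, so that subtracting it lets an old job rise in rank.

```typescript
// Simplified stand-ins for FleetJobDoc / FleetFactoryDoc; the real shapes live in types.ts.
interface JobLike {
  id: string;
  priority: number;
  capabilities: string[]; // required capabilities
  createdAt: string; // ISO timestamp
  costClass: number;
  depsSatisfied: boolean; // assumption: precomputed by the caller
  prefersEngine?: string;
}
interface FactoryLike {
  engine: string;
  capabilities: string[];
  load: number;
  health: "ok" | "degraded";
  maxCostClass: number;
}
interface Ctx {
  now: number; // coordinator-authoritative time, epoch ms
}
type Term = "capabilityFit" | "affinity" | "load" | "costFit" | "health" | "starvation";
type Weights = Record<Term, number>;

// Illustrative defaults only.
const DEFAULT_WEIGHTS: Weights = {
  capabilityFit: 1, affinity: 1, load: 1, costFit: 1, health: 1, starvation: 1,
};

const hasAllCaps = (job: JobLike, f: FactoryLike): boolean =>
  job.capabilities.every((c) => f.capabilities.includes(c));

function scoreCandidate(job: JobLike, f: FactoryLike, ctx: Ctx, w: Weights = DEFAULT_WEIGHTS) {
  const ageHours = Math.max(0, (ctx.now - Date.parse(job.createdAt)) / 3600000);
  // Signed, weighted terms, so the breakdown sums to the score.
  const breakdown: Record<Term, number> = {
    capabilityFit: w.capabilityFit * (hasAllCaps(job, f) ? 1 : 0),
    affinity: w.affinity * (job.prefersEngine === f.engine ? 1 : 0),
    load: w.load * (1 / (1 + f.load)),
    costFit: w.costFit * (job.costClass <= f.maxCostClass ? 1 : 0),
    health: w.health * (f.health === "ok" ? 1 : 0.5),
    // Assumption: the penalty shrinks with age, so waiting raises the score.
    starvation: -w.starvation * (1 / (1 + ageHours)),
  };
  const score = Object.values(breakdown).reduce((sum, t) => sum + t, 0);
  return { score, breakdown };
}

function selectJob(
  candidates: JobLike[], f: FactoryLike, ctx: Ctx, w: Weights = DEFAULT_WEIGHTS,
): JobLike | null {
  const ranked = candidates
    .filter((j) => j.depsSatisfied && hasAllCaps(j, f))
    .map((job) => ({ job, score: scoreCandidate(job, f, ctx, w).score }))
    .sort(
      (a, b) =>
        b.score - a.score || // higher score first
        b.job.priority - a.job.priority || // then higher priority
        Date.parse(a.job.createdAt) - Date.parse(b.job.createdAt) || // then older
        a.job.costClass - b.job.costClass, // then lower cost class
    );
  return ranked[0]?.job ?? null;
}
```

Because both functions take `ctx.now` as an argument and touch no datastore, the same inputs always produce the same pick, which is what the determinism test in the prompt relies on.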