docs(agent-queue): draft parallel P2 prompts — scheduler/router core (§7) + fleet artifacts blob wiring (§13)
parent a8dd166108
commit 10395983e7
86
agent-queue/docs/jobs/phase2-artifacts.md
Normal file
@@ -0,0 +1,86 @@
---
engine: devin
cwd: /Users/sd9235/code/mygh/learning_ai_common_plat
yolo: true
lock: common-plat-artifacts
timeout: 4h
---

ROLE: Senior backend engineer. Implement FLEET ARTIFACTS + BLOB WIRING (§13 leftover):
large run outputs (logs, coverage, screenshots, build output) are stored in blob
storage and only POINTERS (with size/content-type/SAS) live in the `fleet_artifacts`
Cosmos container — NEVER inline in Cosmos (doc-size + RU limits).

PARALLEL-SAFETY (two other Devins are running — DO NOT collide):
- You OWN the fleet_artifacts surface: types.ts (artifact schema only), repository.ts
  (artifact repo only), routes.ts (artifact endpoints only), cosmos-init.ts (only if the
  fleet_artifacts container needs registration), and a NEW artifacts.test.ts.
- You MUST NOT touch: coordinator.ts, coordinator.test.ts, scheduler.ts (another Devin owns
  the scheduler + claim ranking). Keep your edits to types/repository/routes additive and
  localized to the artifact pieces — do not refactor the job/lease/claim code.
- A third Devin is in a different repo (agent-queue) — no overlap.

READ FIRST:
- services/platform-service/src/modules/fleet/types.ts — find FleetArtifactDoc (the
  foundation may already declare it, pk /jobId). repository.ts — see if an artifacts repo
  already exists; extend, don't duplicate. cosmos-init.ts — see if fleet_artifacts is
  already registered.
- packages/blob (@bytelyst/blob) — the Azure Blob client + SAS token helpers. Learn the
  exact API (upload, container/key conventions, SAS generation, the memory/dev fallback).
  Use it the same way other consumers do (grep for existing @bytelyst/blob usage).
- ../learning_ai_devops_tools/agent-queue/docs/GIGAFACTORY_ROADMAP.md §13 (fleet_artifacts
  bullet) + §26 (insights/artifacts).

PREREQUISITE / BRANCHING: branch off CURRENT main → feat/gigafactory-p2-artifacts.
Push + open PR. DO NOT merge.

DELIVERABLES
1. FleetArtifactDoc (in types.ts — confirm/extend): { id, productId, jobId, runId?, kind
   ('log'|'coverage'|'screenshot'|'build'|'other'), blobKey, contentType, sizeBytes,
   sha256?, createdAt }. Zod schema → inferred type. productId on the doc.
2. repository.ts — artifacts repo: createArtifact, listArtifactsByJob(jobId),
   getArtifact(id, productId), deleteArtifact. Single-partition (pk /jobId). Do not touch
   the job/lease/run repos beyond importing shared helpers.
3. Blob integration (a small artifacts service fn, e.g. in a NEW
   modules/fleet/artifacts-blob.ts): uploadArtifact(jobId, kind, bytes/stream, contentType)
   → stores in @bytelyst/blob under a deterministic key
   (`fleet/<productId>/<jobId>/<id>-<kind>`), returns the persisted FleetArtifactDoc with a
   short-lived SAS read URL. getArtifactDownload(id) → re-issues a SAS URL. Large content
   NEVER goes into Cosmos.
4. routes.ts — guarded endpoints (auth + productId, Zod-validated), additive only:
     POST /fleet/jobs/:id/artifacts (multipart or base64 body → upload + pointer)
     GET /fleet/jobs/:id/artifacts (list pointers)
     GET /fleet/artifacts/:artifactId (pointer + fresh SAS download URL)
     DELETE /fleet/artifacts/:artifactId
   Register exactly like the existing fleet routes (do not reorder/rewrite the others).
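
SKETCH (illustrative only, not the required implementation): one possible shape for the
pointer doc and the deterministic key from deliverables 1 and 3. The helper names
`buildBlobKey` and `toPointer` are hypothetical, and the real schema should be a Zod schema
in types.ts. The @bytelyst/blob upload and SAS calls are deliberately left out because
their API must be learned from the package first.

```typescript
// Sketch only: helper names are hypothetical; field names follow deliverable 1.
type ArtifactKind = 'log' | 'coverage' | 'screenshot' | 'build' | 'other';

interface ArtifactPointer {
  id: string;
  productId: string;
  jobId: string;
  runId?: string;
  kind: ArtifactKind;
  blobKey: string;
  contentType: string;
  sizeBytes: number;
  sha256?: string;
  createdAt: string;
}

// Builds `fleet/<productId>/<jobId>/<id>-<kind>`. Empty segments and embedded
// slashes are rejected so the same inputs always map to exactly one key.
function buildBlobKey(productId: string, jobId: string, id: string, kind: ArtifactKind): string {
  for (const segment of [productId, jobId, id]) {
    if (segment.length === 0 || segment.indexOf('/') !== -1) {
      throw new Error(`invalid blob key segment: "${segment}"`);
    }
  }
  return `fleet/${productId}/${jobId}/${id}-${kind}`;
}

// Derives the Cosmos pointer from upload metadata. The bytes are only measured,
// never copied onto the returned document.
function toPointer(
  meta: { id: string; productId: string; jobId: string; kind: ArtifactKind; contentType: string },
  bytes: Uint8Array,
  now: Date,
): ArtifactPointer {
  return {
    id: meta.id,
    productId: meta.productId,
    jobId: meta.jobId,
    kind: meta.kind,
    blobKey: buildBlobKey(meta.productId, meta.jobId, meta.id, meta.kind),
    contentType: meta.contentType,
    sizeBytes: bytes.byteLength,
    createdAt: now.toISOString(),
  };
}
```

Keeping the key builder pure makes it easy to unit-test separately from the blob client
and keeps getArtifactDownload free to re-derive nothing: it can read `blobKey` straight
from the stored pointer.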

TESTS (artifacts.test.ts — memory blob + memory datastore; tests are sacred):
- upload → a fleet_artifacts pointer doc is created with productId, blobKey, sizeBytes,
  contentType; the bytes live in blob, NOT in the Cosmos doc (assert the doc has no inline
  payload field).
- list by job returns only that job's artifacts (partition isolation).
- get returns a (fresh) SAS download URL; a large payload (> a Cosmos-safe threshold) still
  succeeds (proves blob offload).
- delete removes the pointer (and blob if your helper does so).
- routes via fastify inject: upload/list/get/delete; auth + productId enforced; invalid body
  → 400; unknown id → 404.
- existing fleet tests (jobs/leases/claim/events) remain green and untouched.
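
SKETCH (illustrative only): the "bytes in blob, not Cosmos" assertion could lean on a small
pure check like the one below. The name `findInlinePayload` and the 64 KiB threshold are
assumptions chosen for the example, not values taken from the roadmap.

```typescript
// Hypothetical test helper. The threshold is an arbitrary, conservative assumption.
const MAX_INLINE_CHARS = 64 * 1024;

// Returns the name of the first field that looks like an inline payload,
// or null when the document is a plain pointer.
function findInlinePayload(doc: Record<string, unknown>): string | null {
  for (const key of Object.keys(doc)) {
    const value = doc[key];
    // Raw binary on the doc is always a failure (Node's Buffer is a Uint8Array).
    if (value instanceof Uint8Array || value instanceof ArrayBuffer) {
      return key;
    }
    // An oversized string usually means base64 or log text was stored inline.
    if (typeof value === 'string' && value.length > MAX_INLINE_CHARS) {
      return key;
    }
  }
  // Catch many medium-sized fields that only add up to too much together.
  if (JSON.stringify(doc).length > MAX_INLINE_CHARS) {
    return '(whole document)';
  }
  return null;
}
```

In a vitest case this would read roughly `expect(findInlinePayload(storedDoc)).toBeNull()`,
run against the doc read back from the memory datastore after an upload.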

VERIFY GATE:
- pnpm --filter @lysnrai/platform-service exec vitest run src/modules/fleet (all green)
- pnpm --filter @lysnrai/platform-service build
- pnpm build && pnpm test (no consumer regressed)

CONSTRAINTS: ESM .js imports; no any; no console.log; productId on every doc; large logs in
blob never Cosmos; conventional commits (feat(platform-service): ...); do not touch the files
reserved for the other Devins; do not edit the agent-queue repo.

FINAL OUTPUT — report in EXACTLY this format:
## Implementation Report — Fleet Artifacts + Blob Wiring (§13)
### Branch & commits / PR
### Files changed
### What was implemented (artifact schema, blob key scheme, SAS, routes)
### Tests added (+ pnpm test summary; esp. the "bytes in blob not Cosmos" assertion)
### Verify gate results
### Deviations / assumptions (blob API used, dev/memory fallback, SAS TTL)
### Suggested next slice
84
agent-queue/docs/jobs/phase2-scheduler.md
Normal file
@@ -0,0 +1,84 @@
---
engine: devin
cwd: /Users/sd9235/code/mygh/learning_ai_common_plat
yolo: true
lock: common-plat-scheduler
timeout: 4h
---

ROLE: Senior backend engineer. Implement the PHASE 2 SCHEDULER / ROUTER CORE (§7)
for the fleet coordinator: a deterministic, fixed-weight scoring engine that picks
WHICH job a claiming factory gets, and wire it into the atomic claim.

PARALLEL-SAFETY (two other Devins are running — DO NOT collide):
- You OWN: services/platform-service/src/modules/fleet/scheduler.ts (NEW),
  scheduler.test.ts (NEW), and the candidate-ranking section of coordinator.ts +
  coordinator.test.ts.
- You MUST NOT touch: types.ts, repository.ts, routes.ts, cosmos-init.ts, server.ts
  (another Devin is editing those for fleet_artifacts). If you need a new type, define
  it inside scheduler.ts. If wiring truly requires a types.ts change, instead re-export
  from scheduler.ts. Import existing FleetJobDoc/FleetFactoryDoc from types.ts (read-only).
- A third Devin is in a different repo (agent-queue) — no overlap.

READ FIRST:
- services/platform-service/src/modules/fleet/coordinator.ts — claimNextJob /
  tryClaimJob: today it selects "highest-priority, oldest, deps-satisfied,
  capability-subset". You will replace the SELECTION step with the scoring engine (keep
  the atomic tryClaimJob CAS exactly as-is).
- types.ts (read-only) — FleetJobDoc (priority, capabilities, budget, createdAt, deps,
  stage), FleetFactoryDoc (capabilities, health, load, seatLimit).
- ../learning_ai_devops_tools/agent-queue/docs/GIGAFACTORY_ROADMAP.md §7 (the formula
  + tie-breaks + phasing note: Phase 2 = fixed weights; Phase 3 = tunable + preemption).

PREREQUISITE / BRANCHING: branch off CURRENT main → feat/gigafactory-p2-scheduler.
Push + open PR. DO NOT merge.

DELIVERABLES
1. scheduler.ts (pure, no I/O, fully unit-testable):
   - Weight config (fixed defaults, overridable via a passed-in object — NOT env here):
       score = w1·capabilityFit + w2·affinity(prefersEngine/repo-stickiness)
             + w3·(1/(1+load)) + w4·costFit(budget) + w5·health − w6·starvationPenalty(age)
   - `scoreCandidate(job, factory, ctx, weights?) → { score, breakdown }` — return the
     per-term breakdown for explainability (§7/Phase-3 readiness).
   - `selectJob(candidates: FleetJobDoc[], factory, ctx, weights?) → FleetJobDoc | null` —
     filter to deps-satisfied + capability-subset (reuse the coordinator's existing
     predicates; if they're inline, extract pure helpers INTO scheduler.ts), then rank by
     score; deterministic tie-break: higher priority → older createdAt → lower cost class.
   - Pure, synchronous, no datastore calls. Health/load come from the factory doc; age
     from job.createdAt vs ctx.now (coordinator-authoritative time, passed in).
2. Wire into coordinator.claimNextJob: replace the ad-hoc selection with
   `selectJob(...)`, passing the existing candidate set + the claiming factory + ctx.now.
   Keep tryClaimJob's rev/updateIfMatch CAS and lease/fence logic byte-for-byte unchanged.
   If the claim has no factory capabilities/health context today, thread the minimal fields
   through ClaimContext (additive, in coordinator.ts only).
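
SKETCH (illustrative only, not the required implementation): a minimal pure version of the
scoring formula and tie-break chain from deliverable 1. The `JobLike` / `FactoryLike`
shapes, the fields `costClass`, `maxCostClass`, `engine`, `depsSatisfied`, the default
weights, and the final id tie-break are all assumptions for the example; the real fields
come from FleetJobDoc / FleetFactoryDoc and the §7 roadmap text. One further assumption:
starvationPenalty is modelled as 1/(1+ageHours), so it shrinks as a job ages and an old
job loses less score than a fresh one, which matches the starvation test expectation.

```typescript
// Sketch only: every shape, field name and default weight here is an assumption.
interface JobLike {
  id: string;
  priority: number;        // higher = more urgent
  capabilities: string[];  // capabilities the job requires
  createdAt: string;       // ISO-8601 timestamp
  costClass: number;       // lower = cheaper
  prefersEngine?: string;
  depsSatisfied: boolean;
}
interface FactoryLike {
  engine: string;
  capabilities: string[];
  health: 'ok' | 'degraded';
  load: number;            // >= 0
  maxCostClass: number;
}
interface Ctx { now: number }  // coordinator-authoritative time, epoch ms
interface Weights { w1: number; w2: number; w3: number; w4: number; w5: number; w6: number }
interface Breakdown {
  capabilityFit: number; affinity: number; load: number;
  costFit: number; health: number; starvation: number;
}

const DEFAULT_WEIGHTS: Weights = { w1: 3, w2: 1, w3: 2, w4: 1, w5: 2, w6: 1 };

function hasAllCapabilities(job: JobLike, factory: FactoryLike): boolean {
  return job.capabilities.every((cap) => factory.capabilities.indexOf(cap) !== -1);
}

function scoreCandidate(
  job: JobLike, factory: FactoryLike, ctx: Ctx, weights: Weights = DEFAULT_WEIGHTS,
): { score: number; breakdown: Breakdown } {
  const ageHours = Math.max(0, (ctx.now - Date.parse(job.createdAt)) / 3600000);
  // Each entry is already weighted; starvation is stored negative so the
  // breakdown sums to the score.
  const breakdown: Breakdown = {
    capabilityFit: weights.w1 * (hasAllCapabilities(job, factory) ? 1 : 0),
    affinity: weights.w2 * (job.prefersEngine === factory.engine ? 1 : 0),
    load: weights.w3 * (1 / (1 + factory.load)),
    costFit: weights.w4 * (job.costClass <= factory.maxCostClass ? 1 : 0),
    health: weights.w5 * (factory.health === 'ok' ? 1 : 0.5),
    starvation: -weights.w6 * (1 / (1 + ageHours)),
  };
  const score = breakdown.capabilityFit + breakdown.affinity + breakdown.load
    + breakdown.costFit + breakdown.health + breakdown.starvation;
  return { score, breakdown };
}

function selectJob(
  candidates: JobLike[], factory: FactoryLike, ctx: Ctx, weights: Weights = DEFAULT_WEIGHTS,
): JobLike | null {
  const ranked = candidates
    .filter((job) => job.depsSatisfied && hasAllCapabilities(job, factory))
    .map((job) => ({ job, score: scoreCandidate(job, factory, ctx, weights).score }))
    .sort((a, b) =>
      b.score - a.score                                                // higher score first
      || b.job.priority - a.job.priority                               // then higher priority
      || Date.parse(a.job.createdAt) - Date.parse(b.job.createdAt)     // then older
      || a.job.costClass - b.job.costClass                             // then cheaper
      || (a.job.id < b.job.id ? -1 : a.job.id > b.job.id ? 1 : 0));    // assumed final tie-break
  return ranked.length > 0 ? ranked[0].job : null;
}
```

Because the comparator ends in a total order on `id`, the pick does not depend on the
input order of `candidates`, which is what the determinism test needs.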

TESTS (scheduler.test.ts + additions to coordinator.test.ts — tests are sacred):
- capabilityFit: a factory missing a required cap → candidate filtered out (never selected).
- priority dominates when all else equal; age breaks ties deterministically.
- load: higher-load factory lowers score (1/(1+load)); health: degraded < ok.
- starvation: an old low-priority job eventually outranks a fresh low-priority one.
- costFit: a job exceeding the factory/budget cost class is penalized/last.
- breakdown: scoreCandidate returns each weighted term (sums to score).
- selectJob determinism: same inputs → same pick across runs; empty/no-eligible → null.
- coordinator integration: claimNextJob still returns exactly one winner under the existing
  concurrency tests (all prior fleet tests stay green); selection now follows the score.

VERIFY GATE:
- pnpm --filter @lysnrai/platform-service exec vitest run src/modules/fleet (all green)
- pnpm --filter @lysnrai/platform-service build
- pnpm build && pnpm test (no regression)

CONSTRAINTS: ESM .js imports; no any; no console.log; fixed weights this phase (tunable +
preemption are Phase 3 — do NOT build them); pure scheduler (no I/O); conventional commits
(feat(platform-service): ...); do not touch the files reserved above; do not edit the
agent-queue repo.

FINAL OUTPUT — report in EXACTLY this format:
## Implementation Report — Phase 2 Scheduler/Router Core (§7)
### Branch & commits / PR
### Files changed
### What was implemented (scoring terms, tie-breaks, coordinator wiring)
### Tests added (+ pnpm test summary)
### Verify gate results
### Deviations / assumptions (what ctx fields were threaded, weight defaults chosen)
### Suggested next slice (Phase 3 tunable weights + preemption + explainability UI)