Move GIGAFACTORY_ROADMAP.md and GIGAFACTORY_SYSTEM_OVERVIEW.md under agent-queue/docs/gigafactory/ so the scattered top-level docs are easy to discover. Update the README links, the overview code-map, and all phase job-spec source-of-truth paths to the new location. Pure docs move; no behavior change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| engine | cwd | yolo | lock | timeout |
|---|---|---|---|---|
| devin | /Users/sd9235/code/mygh/learning_ai_common_plat | true | common-plat | 4h |
ROLE: Senior backend/distributed-systems engineer. Implement Phase 2 — Slice 1: the FLEET DATA MODEL + REPOSITORIES as a new platform-service module. This is the durable backbone (§13) that supersedes the single-host stand-ins. NO atomic claim/lease/fencing logic yet — that is Phase 2 Slice 2. This slice is schemas, repositories, container registration, basic guarded CRUD, and tests.
NOTE: This runs in a DIFFERENT repo (learning_ai_common_plat), so it does NOT conflict with the agent-queue (devops-tools) slices and can run independently.
READ FIRST (this is NOT the platform-service you may assume — verify conventions):
- services/platform-service/src/modules/items/{types,repository,routes}.ts — copy this module pattern EXACTLY (types.ts -> repository.ts -> routes.ts, Zod schemas, the cloud-agnostic datastore, productId on every doc, req.log/app.log).
- packages/cosmos (container registry) + how existing modules register containers.
- The fleet container spec in the roadmap: agent-queue/docs/gigafactory/GIGAFACTORY_ROADMAP.md §13, which lives in the devops-tools repo at ../learning_ai_devops_tools. Read it for the field lists (fleet_jobs incl. bodyMd + checkpoint; fleet_runs incl. token/cost/tool/diff insights; fleet_leases incl. leaseEpoch; fleet_factories; fleet_profiles; fleet_events; fleet_artifacts), and also read §25/§26.
PREREQUISITE / BRANCHING:
- Branch off CURRENT main of learning_ai_common_plat.
- New branch: feat/gigafactory-p2-slice1. Push + open a PR. DO NOT merge.
STRICT SCOPE:
- Add a NEW module: services/platform-service/src/modules/fleet/ (+ its tests).
- Register the new Cosmos containers via the existing registration path.
- Do NOT modify unrelated modules. Do NOT hand-edit shared infra (.npmrc, docker-prep.sh, tsconfig.base, pnpm-workspace) — those are template-managed.
- ESM everywhere ("type": "module", .js import suffixes). No any (Zod inference or explicit types). No console.log (use req.log/app.log). Every Cosmos doc has productId. Tests are sacred.
DELIVERABLES
1. types.ts: Zod schemas + inferred types for each container, each with productId:
- FleetJobDoc (pk /productId): manifestSnapshot, bodyMd (verbatim instructions), stage, idempotencyKey, trackerItemId?, parentId?, kind ('leaf'|'composite', default 'leaf'), checkpoint? { wipBranch, wipBase, wipCommit }, priority, capabilities[], engineClass?, profile?, deps[], depsMode?, timestamps.
- FleetRunDoc (pk /jobId): jobId, attempt, factoryId?, engine, profileSnapshot?, startedAt, endedAt?, exit?, verifyResult?, result?, and insights: model?, tokensIn?, tokensOut?, tokensCached?, costUsd?, estimated?, turns?, toolCalls?, filesChanged?, linesAdded?, linesDeleted?.
- FleetLeaseDoc (pk /jobId): jobId, holderFactoryId?, expiresAt?, leaseEpoch (number, default 0), renewals, status. (Fields only — reclaim/claim logic is S2.)
- FleetFactoryDoc (pk /productId): factoryId, descriptor, capabilities[], health, load, lastHeartbeatAt, seatLimit.
- FleetProfileDoc (pk /productId): name, version, immutable snapshot (persona, defaults).
- FleetEventDoc (pk /jobId): append-only event { type, at, data }.
- FleetArtifactDoc (pk /jobId): pointers to blob-stored artifacts (no inline logs).
- Define enums for stage and result that MATCH the agent-queue lifecycle.
2. repository.ts: one repository per container using the existing datastore abstraction (so DB_PROVIDER=memory works in tests, cosmos in prod):
- CRUD: create, getById, list (by productId; jobs also by stage), update (optimistic via _etag where the datastore supports it — expose the etag, even though the ATOMIC claim flow is S2), delete where sensible.
- appendEvent(jobId, event) for the append-only fleet_events stream.
- All queries partition-aware; no cross-partition fan-out in hot paths.
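The optimistic update described above can be illustrated with a minimal, dependency-free sketch. It is not the platform-service datastore abstraction: `MemoryRepository`, `EtagMismatchError`, the `Stored<T>` helper, and the counter-based etag are all hypothetical names and choices made for this example, and the methods are synchronous only for brevity (a real repository would presumably be async). The sketch only shows the shape of the contract: every write returns a fresh `_etag`, an update with a stale etag is rejected, and reads stay inside one productId partition.

```typescript
// Hypothetical in-memory stand-in; NOT the real datastore abstraction.
type Stored<T> = T & { _etag: string };

class EtagMismatchError extends Error {
  constructor(id: string) {
    super(`etag mismatch for ${id}`);
    this.name = "EtagMismatchError";
  }
}

class MemoryRepository<T extends { id: string; productId: string }> {
  // Outer key: partition (productId). Inner key: document id.
  private readonly partitions = new Map<string, Map<string, Stored<T>>>();
  private version = 0;

  // Assumption: a monotonically increasing counter is enough to model etags here.
  private nextEtag(): string {
    this.version += 1;
    return `v${this.version}`;
  }

  create(doc: T): Stored<T> {
    const partition = this.partitions.get(doc.productId) ?? new Map<string, Stored<T>>();
    if (partition.has(doc.id)) throw new Error(`duplicate id ${doc.id}`);
    const stored: Stored<T> = { ...doc, _etag: this.nextEtag() };
    partition.set(doc.id, stored);
    this.partitions.set(doc.productId, partition);
    return stored;
  }

  getById(productId: string, id: string): Stored<T> | undefined {
    return this.partitions.get(productId)?.get(id);
  }

  // Reads a single partition only; the optional filter models "list by stage".
  list(productId: string, filter?: (doc: Stored<T>) => boolean): Stored<T>[] {
    const docs = [...(this.partitions.get(productId)?.values() ?? [])];
    return filter ? docs.filter(filter) : docs;
  }

  // Rejects the write when the caller's etag is stale (lost-update guard).
  update(productId: string, id: string, patch: Partial<T>, ifMatch: string): Stored<T> {
    const partition = this.partitions.get(productId);
    const current = partition?.get(id);
    if (!partition || !current) throw new Error(`not found: ${id}`);
    if (current._etag !== ifMatch) throw new EtagMismatchError(id);
    // id and productId are re-applied last so a patch cannot move the document.
    const next: Stored<T> = { ...current, ...patch, id, productId, _etag: this.nextEtag() };
    partition.set(id, next);
    return next;
  }
}
```

A caller would pass the `_etag` from its last read into `update`; a second writer holding the older etag gets `EtagMismatchError` instead of silently overwriting. The atomic claim flow built on this guard remains Slice 2 work.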
3. container registration: register all fleet_* containers with correct partition keys via the existing cosmos container registry; memory provider auto-handles.
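For reference, the container names and partition keys stated in this spec can be collected in one table. The sketch below is only that: the names and key paths are taken from the lists above, while `FLEET_CONTAINERS`, `ContainerDefinition`, and `fleetContainerDefinitions` are hypothetical and do not reflect the real packages/cosmos registry API, which has to be read from the repo.

```typescript
// Container names and partition key paths as listed in this spec.
// The surrounding types and helper are hypothetical, not the packages/cosmos API.
const FLEET_CONTAINERS = {
  fleet_jobs: "/productId",
  fleet_runs: "/jobId",
  fleet_leases: "/jobId",
  fleet_factories: "/productId",
  fleet_profiles: "/productId",
  fleet_events: "/jobId",
  fleet_artifacts: "/jobId",
} as const;

type FleetContainerName = keyof typeof FLEET_CONTAINERS;

interface ContainerDefinition {
  name: FleetContainerName;
  partitionKeyPath: string;
}

// Flattens the table into a list that a registration call could iterate over.
function fleetContainerDefinitions(): ContainerDefinition[] {
  return (Object.keys(FLEET_CONTAINERS) as FleetContainerName[]).map((name) => ({
    name,
    partitionKeyPath: FLEET_CONTAINERS[name],
  }));
}
```

Keeping the keys in a single constant makes it easy for a test to assert that every fleet_* container is registered with the partition key the schemas assume.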
4. routes.ts: minimal guarded REST under the existing auth + productId middleware:
- POST /fleet/jobs (create), GET /fleet/jobs (list by stage/productId), GET /fleet/jobs/:id, PATCH /fleet/jobs/:id (stage/fields), and read endpoints for runs (GET /fleet/jobs/:id/runs) + events. Keep it thin — claim/lease endpoints are S2. Validate all bodies with the Zod schemas.
- Register the route module in the platform-service app the same way items does.
TESTS (Vitest — write alongside; memory provider; tests sacred):
- schema validation: valid docs pass; missing productId / bad enum fail with precise errors; at least one invalid case per container.
- repository CRUD round-trip per container (create→get→list→update→delete) on the memory provider; list filters by productId and by stage (jobs).
- appendEvent produces an ordered, append-only stream for a jobId.
- routes: create+get+list+patch a job via fastify inject (use the shared testing helpers); auth/productId enforced; invalid body rejected.
- _etag surfaced on update (lost-update guard groundwork) — assert the etag flows.
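The ordered, append-only property that the appendEvent test checks can be sketched without any datastore. The event payload shape { type, at, data } and the per-job partitioning come from this spec; the `seq` field, the `MemoryEventStream` class, and `listEvents` are assumptions made for illustration, and the real repository may order events differently (for example by timestamp).

```typescript
// Payload shape follows the FleetEventDoc description; everything else is hypothetical.
interface FleetEventInput {
  productId: string;
  type: string;
  at: string; // ISO-8601 timestamp
  data?: unknown;
}

interface StoredFleetEvent extends FleetEventInput {
  jobId: string; // partition key for fleet_events
  seq: number; // assumed ordering field: position in the job's stream, from 0
}

class MemoryEventStream {
  private readonly byJob = new Map<string, StoredFleetEvent[]>();

  // Appends to the end of the job's stream; there is no update or delete.
  appendEvent(jobId: string, event: FleetEventInput): StoredFleetEvent {
    const stream = this.byJob.get(jobId) ?? [];
    const stored: StoredFleetEvent = Object.freeze({ ...event, jobId, seq: stream.length });
    stream.push(stored);
    this.byJob.set(jobId, stream);
    return stored;
  }

  // Returns a copy so callers cannot reorder or truncate the stored stream.
  listEvents(jobId: string): StoredFleetEvent[] {
    return [...(this.byJob.get(jobId) ?? [])];
  }
}
```

A Vitest case built on this idea would append several events for one jobId, interleave events for another, and assert that each job's list comes back in insertion order and is unaffected by mutating the returned array.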
VERIFY GATE (must pass):
- pnpm --filter @lysnrai/platform-service typecheck
- pnpm --filter @lysnrai/platform-service test (new tests green; none weakened)
- pnpm --filter @lysnrai/platform-service build
DOCS:
- Short module README or header docblock describing the containers + that claim/lease/fencing is Phase 2 Slice 2.
- You may NOT edit the roadmap in ../learning_ai_devops_tools (different repo). Instead, note in your report which §13 items are now satisfied so I can tick them.
CONSTRAINTS: follow the items-module conventions precisely; ESM .js imports; no any; no console.log; productId everywhere; conventional commits (feat(platform-service): ...); do not touch template-managed infra files.
FINAL OUTPUT — print the report in EXACTLY this format:
Implementation Report — Phase 2 Slice 1
Branch & commits
- branch / based-on / PR
- commits:
Files changed
- :
What was implemented (1-4)
- containers + schemas + repos + routes; partition keys; etag handling
Tests added
- : (+ pnpm test summary: N passed)
Verify gate results
- typecheck / test / build:
§13 items now satisfied
- <list which roadmap §13 boxes are done so the human can tick them>
Deviations / assumptions
- <datastore/etag/provider choices>
Suggested next slice
- Phase 2 Slice 2: atomic claim (_etag/If-Match) + lease renew/release + heartbeat + reaper + fencing (leaseEpoch), the concurrency core.