---
engine: devin
cwd: /Users/sd9235/code/mygh/learning_ai_common_plat
yolo: true
lock: common-plat
timeout: 4h
---

ROLE: Senior backend/distributed-systems engineer.

Implement Phase 2 Slice 1: the FLEET DATA MODEL + REPOSITORIES as a new platform-service module. This is the durable backbone (§13) that supersedes the single-host stand-ins. NO atomic claim/lease/fencing logic yet; that is Phase 2 Slice 2. This slice covers schemas, repositories, container registration, basic guarded CRUD, and tests.

NOTE: This runs in a DIFFERENT repo (learning_ai_common_plat), so it does NOT conflict with the agent-queue (devops-tools) slices and can run independently.

READ FIRST (this is NOT the platform-service you may assume; verify its conventions):
- services/platform-service/src/modules/items/{types,repository,routes}.ts: copy this module pattern EXACTLY (types.ts -> repository.ts -> routes.ts, Zod schemas, the cloud-agnostic datastore, productId on every doc, req.log/app.log).
- packages/cosmos (container registry) and how existing modules register their containers.
- The fleet container spec: §13 of agent-queue/docs/GIGAFACTORY_ROADMAP.md, which lives in the devops-tools repo at ../learning_ai_devops_tools. Read it for the field lists (fleet_jobs incl. bodyMd + checkpoint; fleet_runs incl. token/cost/tool/diff insights; fleet_leases incl. leaseEpoch; fleet_factories; fleet_profiles; fleet_events; fleet_artifacts), and read §25/§26 as well.

PREREQUISITE / BRANCHING:
- Branch off the CURRENT `main` of learning_ai_common_plat.
- New branch: feat/gigafactory-p2-slice1. Push and open a PR. DO NOT merge.

STRICT SCOPE:
- Add a NEW module: services/platform-service/src/modules/fleet/ (plus its tests).
- Register the new Cosmos containers via the existing registration path.
- Do NOT modify unrelated modules. Do NOT hand-edit shared infra (.npmrc, docker-prep.sh, tsconfig.base, pnpm-workspace); those are template-managed.
- ESM everywhere ("type": "module", .js import suffixes).
- No `any` (use Zod inference or explicit types). No console.log (use req.log/app.log). Every Cosmos doc has productId. Tests are sacred.

DELIVERABLES
1. types.ts: Zod schemas + inferred types for each container, each doc carrying productId:
   - FleetJobDoc (pk /productId): manifestSnapshot, bodyMd (verbatim instructions), stage, idempotencyKey, trackerItemId?, parentId?, kind ('leaf'|'composite', default 'leaf'), checkpoint? { wipBranch, wipBase, wipCommit }, priority, capabilities[], engineClass?, profile?, deps[], depsMode?, timestamps.
   - FleetRunDoc (pk /jobId): jobId, attempt, factoryId?, engine, profileSnapshot?, startedAt, endedAt?, exit?, verifyResult?, result?, and insights: model?, tokensIn?, tokensOut?, tokensCached?, costUsd?, estimated?, turns?, toolCalls?, filesChanged?, linesAdded?, linesDeleted?.
   - FleetLeaseDoc (pk /jobId): jobId, holderFactoryId?, expiresAt?, leaseEpoch (number, default 0), renewals, status. (Fields only; reclaim/claim logic is S2.)
   - FleetFactoryDoc (pk /productId): factoryId, descriptor, capabilities[], health, load, lastHeartbeatAt, seatLimit.
   - FleetProfileDoc (pk /productId): name, version, immutable snapshot (persona, defaults).
   - FleetEventDoc (pk /jobId): append-only event { type, at, data }.
   - FleetArtifactDoc (pk /jobId): pointers to blob-stored artifacts (no inline logs).
   - Define enums for stage and result that MATCH the agent-queue lifecycle.
2. repository.ts: one repository per container, built on the existing datastore abstraction (so DB_PROVIDER=memory works in tests and cosmos in prod):
   - CRUD: create, getById, list (by productId; jobs also by stage), update (optimistic via _etag where the datastore supports it; expose the etag even though the ATOMIC claim flow is S2), and delete where sensible.
   - appendEvent(jobId, event) for the append-only fleet_events stream.
   - All queries partition-aware; no cross-partition fan-out in hot paths.
3. Container registration: register all fleet_* containers with the correct partition keys via the existing Cosmos container registry; the memory provider handles them automatically.
4. routes.ts: minimal guarded REST under the existing auth + productId middleware:
   - POST /fleet/jobs (create), GET /fleet/jobs (list by stage/productId), GET /fleet/jobs/:id, PATCH /fleet/jobs/:id (stage/fields), plus read endpoints for runs (GET /fleet/jobs/:id/runs) and events. Keep it thin; claim/lease endpoints are S2. Validate all bodies with the Zod schemas.
   - Register the route module in the platform-service app the same way items does.

TESTS (Vitest; write them alongside the code; memory provider; tests are sacred):
- Schema validation: valid docs pass; missing productId or a bad enum value fails with a precise error; at least one invalid case per container.
- Repository CRUD round-trip per container (create → get → list → update → delete) on the memory provider; list filters by productId and, for jobs, by stage.
- appendEvent produces an ordered, append-only stream for a jobId.
- Routes: create + get + list + patch a job via fastify inject (use the shared testing helpers); auth/productId enforced; invalid body rejected.
- _etag surfaced on update (groundwork for the lost-update guard): assert that the etag flows through.

VERIFY GATE (must pass):
- pnpm --filter @lysnrai/platform-service typecheck
- pnpm --filter @lysnrai/platform-service test (new tests green; none weakened)
- pnpm --filter @lysnrai/platform-service build

DOCS:
- A short module README or header docblock describing the containers and stating that claim/lease/fencing is Phase 2 Slice 2.
- You may NOT edit the roadmap in ../learning_ai_devops_tools (different repo). Instead, note in your report which §13 items are now satisfied so I can tick them.

CONSTRAINTS: follow the items-module conventions precisely; ESM .js imports; no `any`; no console.log; productId everywhere; conventional commits (feat(platform-service): ...); do not touch template-managed infra files.
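For orientation only, the optimistic-update and append-only requirements in deliverable 2 can be sketched without any dependencies. This is NOT the expected implementation: the real repository must go through the existing datastore abstraction and Zod schemas, and every name here (`FleetJobSketch`, `MemoryFleetJobRepo`, `MemoryFleetEventStream`, `EtagMismatchError`, the `seq` field, the counter-based etag) is hypothetical. The sketch shows docs stored per partition key so get/list never fan out, a fresh etag minted on every write, updates rejected when the caller's etag is stale, and events that are only ever appended in order.

```typescript
// Hypothetical in-memory sketch; the real module uses the existing datastore abstraction.
export interface FleetJobSketch {
  id: string;
  productId: string; // partition key (/productId)
  stage: string;     // real values must come from the stage enum matching the agent-queue lifecycle
  bodyMd: string;
  _etag?: string;    // minted by the store on every write
}

export interface FleetEventSketch {
  jobId: string; // partition key (/jobId)
  seq: number;   // hypothetical per-job sequence that fixes stream order
  type: string;
  at: string;
  data: Record<string, unknown>;
}

export class EtagMismatchError extends Error {
  constructor(id: string) {
    super(`etag mismatch for ${id}`);
    this.name = "EtagMismatchError";
  }
}

// Hypothetical etag source: a counter, so every write gets a distinct opaque token.
let etagCounter = 0;
const nextEtag = (): string => `"${++etagCounter}"`;

export class MemoryFleetJobRepo {
  // productId -> (id -> doc): every read targets exactly one partition.
  private readonly partitions = new Map<string, Map<string, FleetJobSketch>>();

  private partition(productId: string): Map<string, FleetJobSketch> {
    let p = this.partitions.get(productId);
    if (!p) {
      p = new Map<string, FleetJobSketch>();
      this.partitions.set(productId, p);
    }
    return p;
  }

  create(doc: Omit<FleetJobSketch, "_etag">): FleetJobSketch {
    const p = this.partition(doc.productId);
    if (p.has(doc.id)) throw new Error(`conflict: ${doc.id} already exists`);
    const stored: FleetJobSketch = { ...doc, _etag: nextEtag() };
    p.set(doc.id, stored);
    return { ...stored };
  }

  getById(productId: string, id: string): FleetJobSketch | undefined {
    const doc = this.partitions.get(productId)?.get(id);
    return doc ? { ...doc } : undefined;
  }

  // Single-partition list, optionally filtered by stage.
  list(productId: string, stage?: string): FleetJobSketch[] {
    const p = this.partitions.get(productId);
    if (!p) return [];
    return [...p.values()]
      .filter((d) => stage === undefined || d.stage === stage)
      .map((d) => ({ ...d }));
  }

  // Optimistic update: the caller passes the etag it last read; a stale etag is rejected.
  update(
    productId: string,
    id: string,
    patch: Partial<Pick<FleetJobSketch, "stage" | "bodyMd">>,
    ifMatch: string,
  ): FleetJobSketch {
    const p = this.partition(productId);
    const current = p.get(id);
    if (!current) throw new Error(`not found: ${id}`);
    if (current._etag !== ifMatch) throw new EtagMismatchError(id);
    const next: FleetJobSketch = { ...current, ...patch, _etag: nextEtag() };
    p.set(id, next);
    return { ...next };
  }

  delete(productId: string, id: string): boolean {
    return this.partitions.get(productId)?.delete(id) ?? false;
  }
}

export class MemoryFleetEventStream {
  private readonly streams = new Map<string, FleetEventSketch[]>();

  // Append-only: events are pushed with the next seq and never updated or removed.
  appendEvent(jobId: string, event: Omit<FleetEventSketch, "jobId" | "seq">): FleetEventSketch {
    const stream = this.streams.get(jobId) ?? [];
    const stored: FleetEventSketch = { ...event, jobId, seq: stream.length };
    stream.push(stored);
    this.streams.set(jobId, stream);
    return { ...stored };
  }

  list(jobId: string): FleetEventSketch[] {
    return (this.streams.get(jobId) ?? []).map((e) => ({ ...e }));
  }
}
```

In the real module, `ifMatch` would map to whatever conditional-write option the datastore abstraction exposes (for Cosmos, an If-Match condition on `_etag`); confirm how packages/cosmos surfaces this before relying on it.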
FINAL OUTPUT: print the report in EXACTLY this format:

## Implementation Report — Phase 2 Slice 1

### Branch & commits
- branch / based-on / PR
- commits:

### Files changed
- :

### What was implemented (1-4)
- containers + schemas + repos + routes; partition keys; etag handling

### Tests added
- : (+ pnpm test summary: N passed)

### Verify gate results
- typecheck / test / build:

### §13 items now satisfied
-

### Deviations / assumptions
-

### Suggested next slice
- Phase 2 Slice 2: atomic claim (_etag/If-Match) + lease renew/release + heartbeat + reaper + fencing (leaseEpoch), i.e. the concurrency core.