---
engine: devin
cwd: /Users/sd9235/code/mygh/learning_ai_common_plat
yolo: true
lock: common-plat
timeout: 4h
---

ROLE: Senior backend/distributed-systems engineer. Implement Phase 2, Slice 1:
the FLEET DATA MODEL + REPOSITORIES as a new platform-service module. This is the
durable backbone (§13) that supersedes the single-host stand-ins. NO atomic
claim/lease/fencing logic yet; that is Phase 2 Slice 2. This slice covers schemas,
repositories, container registration, basic guarded CRUD, and tests.

NOTE: This runs in a DIFFERENT repo (learning_ai_common_plat), so it does NOT
conflict with the agent-queue (devops-tools) slices and can run independently.

READ FIRST (do NOT assume you already know this platform-service; verify its conventions):
- services/platform-service/src/modules/items/{types,repository,routes}.ts: copy
  this module pattern EXACTLY (types.ts -> repository.ts -> routes.ts, Zod schemas,
  the cloud-agnostic datastore, productId on every doc, req.log/app.log).
- packages/cosmos (container registry) + how existing modules register containers.
- The fleet container spec in the roadmap: agent-queue/docs/GIGAFACTORY_ROADMAP.md
  §13, which lives in the devops-tools repo at ../learning_ai_devops_tools. Read it
  for the field lists (fleet_jobs incl. bodyMd + checkpoint; fleet_runs incl.
  token/cost/tool/diff insights; fleet_leases incl. leaseEpoch; fleet_factories;
  fleet_profiles; fleet_events; fleet_artifacts). Also read §25/§26.

PREREQUISITE / BRANCHING:
- Branch off CURRENT `main` of learning_ai_common_plat.
- New branch: feat/gigafactory-p2-slice1. Push + open a PR. DO NOT merge.

STRICT SCOPE:
- Add a NEW module: services/platform-service/src/modules/fleet/ (+ its tests).
- Register the new Cosmos containers via the existing registration path.
- Do NOT modify unrelated modules. Do NOT hand-edit shared infra (.npmrc,
  docker-prep.sh, tsconfig.base, pnpm-workspace); those are template-managed.
- ESM everywhere ("type": "module", .js import suffixes). No `any` (use Zod inference
  or explicit types). No console.log (use req.log/app.log). Every Cosmos doc has
  productId. Tests are sacred (never weaken existing ones).

DELIVERABLES

1. types.ts: Zod schemas + inferred types for each container, each with productId:
   - FleetJobDoc (pk /productId): manifestSnapshot, bodyMd (verbatim instructions),
     stage, idempotencyKey, trackerItemId?, parentId?, kind ('leaf'|'composite',
     default 'leaf'), checkpoint? { wipBranch, wipBase, wipCommit }, priority,
     capabilities[], engineClass?, profile?, deps[], depsMode?, timestamps.
   - FleetRunDoc (pk /jobId): jobId, attempt, factoryId?, engine, profileSnapshot?,
     startedAt, endedAt?, exit?, verifyResult?, result?, and insights: model?,
     tokensIn?, tokensOut?, tokensCached?, costUsd?, estimated?, turns?, toolCalls?,
     filesChanged?, linesAdded?, linesDeleted?.
   - FleetLeaseDoc (pk /jobId): jobId, holderFactoryId?, expiresAt?, leaseEpoch
     (number, default 0), renewals, status. (Fields only; reclaim/claim logic is S2.)
   - FleetFactoryDoc (pk /productId): factoryId, descriptor, capabilities[], health,
     load, lastHeartbeatAt, seatLimit.
   - FleetProfileDoc (pk /productId): name, version, immutable snapshot (persona,
     defaults).
   - FleetEventDoc (pk /jobId): append-only event { type, at, data }.
   - FleetArtifactDoc (pk /jobId): pointers to blob-stored artifacts (no inline logs).
   - Define enums for stage and result that MATCH the agent-queue lifecycle.

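As a rough illustration of the schema style asked for in types.ts, here is a minimal sketch of a subset of FleetJobDoc. The name `FleetJobCoreSchema`, the `id` field, the string/integer field types, and the empty-array defaults are assumptions, not taken from the roadmap. `stage` is left out on purpose because its values must come from the agent-queue lifecycle.

```typescript
import { z } from "zod";

// Hypothetical subset of FleetJobDoc. Field types are assumptions; the real
// field list lives in roadmap §13.
const CheckpointSchema = z.object({
  wipBranch: z.string().min(1),
  wipBase: z.string().min(1),
  wipCommit: z.string().min(1),
});

export const FleetJobCoreSchema = z.object({
  id: z.string().min(1), // assumed document id
  productId: z.string().min(1), // required on every doc; partition key for jobs
  bodyMd: z.string(), // verbatim instructions
  idempotencyKey: z.string().min(1),
  kind: z.enum(["leaf", "composite"]).default("leaf"),
  checkpoint: CheckpointSchema.optional(),
  priority: z.number().int(),
  capabilities: z.array(z.string()).default([]), // default is an assumption
  deps: z.array(z.string()).default([]), // default is an assumption
});

// Inferred type, so no hand-written interface and no `any`.
export type FleetJobCore = z.infer<typeof FleetJobCoreSchema>;
```

Calling `FleetJobCoreSchema.safeParse(input)` returns a result object rather than throwing, which makes it straightforward to assert in tests that a doc without productId is rejected with an issue whose path points at `productId`.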
2. repository.ts: one repository per container using the existing datastore
   abstraction (so DB_PROVIDER=memory works in tests, cosmos in prod):
   - CRUD: create, getById, list (by productId; jobs also by stage), update
     (optimistic via _etag where the datastore supports it; expose the etag now,
     even though the ATOMIC claim flow is S2), delete where sensible.
   - appendEvent(jobId, event) for the append-only fleet_events stream.
   - All queries partition-aware; no cross-partition fan-out in hot paths.

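The optimistic-update and partition-aware requirements above can be sketched with a small in-memory repository. Everything here is hypothetical (`InMemoryRepo`, `EtagMismatchError`, the counter-based etag): the real module should go through the repo's existing datastore abstraction, whose interface may differ.

```typescript
// Hypothetical stand-in for the datastore abstraction; names are illustrative.
type Stored<T> = T & { _etag: string };

export class EtagMismatchError extends Error {
  constructor(id: string) {
    super(`etag mismatch for ${id}`);
    this.name = "EtagMismatchError";
  }
}

export class InMemoryRepo<T extends { id: string }> {
  // One inner map per partition-key value, so reads never fan out across partitions.
  private readonly partitions = new Map<string, Map<string, Stored<T>>>();
  private readonly partitionKeyOf: (doc: T) => string;
  private version = 0;

  constructor(partitionKeyOf: (doc: T) => string) {
    this.partitionKeyOf = partitionKeyOf;
  }

  create(doc: T): Stored<T> {
    const pk = this.partitionKeyOf(doc);
    const partition = this.partitions.get(pk) ?? new Map<string, Stored<T>>();
    if (partition.has(doc.id)) throw new Error(`duplicate id ${doc.id}`);
    const stored: Stored<T> = { ...doc, _etag: this.nextEtag() };
    partition.set(doc.id, stored);
    this.partitions.set(pk, partition);
    return stored;
  }

  getById(pk: string, id: string): Stored<T> | undefined {
    return this.partitions.get(pk)?.get(id);
  }

  list(pk: string, filter: (doc: T) => boolean = () => true): Array<Stored<T>> {
    const partition = this.partitions.get(pk);
    return partition ? Array.from(partition.values()).filter(filter) : [];
  }

  // Optimistic update: succeeds only if the caller presents the current etag.
  update(pk: string, id: string, ifMatch: string, patch: Partial<T>): Stored<T> {
    const partition = this.partitions.get(pk);
    const current = partition?.get(id);
    if (!partition || !current) throw new Error(`not found: ${id}`);
    if (current._etag !== ifMatch) throw new EtagMismatchError(id);
    const next: Stored<T> = { ...current, ...patch, id, _etag: this.nextEtag() };
    partition.set(id, next);
    return next;
  }

  private nextEtag(): string {
    this.version += 1;
    return `v${this.version}`;
  }
}
```

Every read takes the partition-key value explicitly, which is one way to make a cross-partition query impossible to write by accident. A second `update` that reuses a stale etag throws `EtagMismatchError`, which is the lost-update guard that Slice 2 would build the atomic claim on.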
3. container registration: register all fleet_* containers with correct partition
   keys via the existing cosmos container registry; the memory provider handles
   these automatically.

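For reference, the container names listed under READ FIRST and the partition keys listed under types.ts can be collected into one table. The `ContainerSpec` shape and the `partitionKeyFor` helper are hypothetical, and the pairing of each doc type with its `fleet_*` container is inferred from the names. The actual registration call must follow whatever packages/cosmos already exposes.

```typescript
// Hypothetical table shape; names and partition keys are from the spec.
interface ContainerSpec {
  readonly name: string;
  readonly partitionKey: "/productId" | "/jobId";
}

export const FLEET_CONTAINERS: readonly ContainerSpec[] = [
  { name: "fleet_jobs", partitionKey: "/productId" },
  { name: "fleet_runs", partitionKey: "/jobId" },
  { name: "fleet_leases", partitionKey: "/jobId" },
  { name: "fleet_factories", partitionKey: "/productId" },
  { name: "fleet_profiles", partitionKey: "/productId" },
  { name: "fleet_events", partitionKey: "/jobId" },
  { name: "fleet_artifacts", partitionKey: "/jobId" },
];

// Looks up the partition key for a fleet container, failing loudly on typos.
export function partitionKeyFor(container: string): string {
  const spec = FLEET_CONTAINERS.find((c) => c.name === container);
  if (!spec) throw new Error(`unknown fleet container: ${container}`);
  return spec.partitionKey;
}
```

Note that containers partitioned by /jobId still carry productId on every document; the partition key only decides how documents are grouped for queries.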
4. routes.ts: minimal guarded REST under the existing auth + productId middleware:
   - POST /fleet/jobs (create), GET /fleet/jobs (list by stage/productId),
     GET /fleet/jobs/:id, PATCH /fleet/jobs/:id (stage/fields), and read endpoints
     for runs (GET /fleet/jobs/:id/runs) + events. Keep it thin; claim/lease
     endpoints are S2. Validate all bodies with the Zod schemas.
   - Register the route module in the platform-service app the same way items does.

TESTS (Vitest; write alongside the code; memory provider; tests sacred):
- schema validation: valid docs pass; missing productId / bad enum fail with
  precise errors; at least one invalid case per container.
- repository CRUD round-trip per container (create→get→list→update→delete) on the
  memory provider; list filters by productId and by stage (jobs).
- appendEvent produces an ordered, append-only stream for a jobId.
- routes: create+get+list+patch a job via fastify inject (use the shared testing
  helpers); auth/productId enforced; invalid body rejected.
- _etag surfaced on update (lost-update guard groundwork): assert that the etag
  flows through.

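The "ordered, append-only stream" property that the appendEvent test should assert can be illustrated with a small in-memory sketch. `InMemoryEventStream`, `listEvents`, and the per-job `seq` counter are assumptions made for illustration; the real repository may order events differently (for example by the `at` timestamp).

```typescript
// Hypothetical in-memory stand-in for the fleet_events stream.
interface FleetEventInput {
  productId: string;
  type: string;
  at: string; // ISO-8601 timestamp
  data: Record<string, unknown>;
}

interface StoredEvent extends FleetEventInput {
  jobId: string;
  seq: number; // assumed per-job sequence number used for ordering
}

export class InMemoryEventStream {
  private readonly byJob = new Map<string, StoredEvent[]>();

  // Appends to the end of the job's stream; there is no update or delete.
  appendEvent(jobId: string, event: FleetEventInput): StoredEvent {
    const events = this.byJob.get(jobId) ?? [];
    // Shallow freeze: top-level fields of a stored event cannot be reassigned.
    const stored: StoredEvent = Object.freeze({ ...event, jobId, seq: events.length });
    events.push(stored);
    this.byJob.set(jobId, events);
    return stored;
  }

  // Returns a copy in append order, scoped to a single jobId partition.
  listEvents(jobId: string): readonly StoredEvent[] {
    return [...(this.byJob.get(jobId) ?? [])];
  }
}
```

A test along these lines would append several events to one jobId, append one to another jobId, and then assert that the first stream comes back in insertion order and does not contain the other job's event.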
VERIFY GATE (must pass):
- pnpm --filter @lysnrai/platform-service typecheck
- pnpm --filter @lysnrai/platform-service test (new tests green; none weakened)
- pnpm --filter @lysnrai/platform-service build

DOCS:
- Short module README or header docblock describing the containers and noting that
  claim/lease/fencing is Phase 2 Slice 2.
- Do NOT edit the roadmap in ../learning_ai_devops_tools (different repo). Instead,
  note in your report which §13 items are now satisfied so I can tick them.

CONSTRAINTS: follow the items-module conventions precisely; ESM .js imports; no `any`;
no console.log; productId everywhere; conventional commits (feat(platform-service):
...); do not touch template-managed infra files.

FINAL OUTPUT: print the report in EXACTLY this format:

## Implementation Report — Phase 2 Slice 1
### Branch & commits
- branch / based-on / PR
- commits: <sha> <message>
### Files changed
- <path>: <one-line summary>
### What was implemented (1-4)
- containers + schemas + repos + routes; partition keys; etag handling
### Tests added
- <test name>: <assertion> (+ pnpm test summary: N passed)
### Verify gate results
- typecheck / test / build: <results>
### §13 items now satisfied
- <list which roadmap §13 boxes are done so the human can tick them>
### Deviations / assumptions
- <datastore/etag/provider choices>
### Suggested next slice
- Phase 2 Slice 2: atomic claim (_etag/If-Match) + lease renew/release + heartbeat
  + reaper + fencing (leaseEpoch); this is the concurrency core.