docs(agent-queue): draft Slice 4 (tracker adapter) + Phase 2 Slice 1 (fleet data model)
parent 0443590ce4
commit 7c4f5bc9b0

agent-queue/docs/jobs/phase1-slice4.md (new file, +125 lines)
@@ -0,0 +1,125 @@
---
engine: devin
cwd: /Users/sd9235/code/mygh/learning_ai_devops_tools
yolo: true
lock: devops-tools
timeout: 3h
---

ROLE: Senior engineer. Implement Phase 1, Slice 4: TRACKER ADAPTER (single host).
This CLOSES Phase 1: a task in the tracker can become a job, and job outcomes echo
back to the tracker (the task<->job round-trip; §10, the last Phase-1 §14 item).

SOURCE OF TRUTH: agent-queue/docs/GIGAFACTORY_ROADMAP.md (read §10 tracker
integration, §5 manifest incl. tracker-item + idempotency-key, §24.5 one-way echo).

PREREQUISITE / BRANCHING:
- Slice 1, Slice 3, AND Slice 2 (profiles/deps) are merged into `main`. Branch off
  the CURRENT `main`. This slice MUST run AFTER Slice 2 is merged (the two slices
  share agent-queue.sh); do not start it until then.
- New branch: feat/gigafactory-p1-slice4. Push + open a PR. DO NOT merge.
- Keep ALL existing selftest checks green (regression).

STRICT SCOPE:
- Edit ONLY under agent-queue/ (agent-queue.sh, selftest.sh, README.md,
  docs/GIGAFACTORY_ROADMAP.md). No other repo is modified.
- You MAY READ (not edit)
  ../learning_ai_common_plat/services/platform-service/src/modules/items/{types,routes}.ts
  to match the real Item API contract (paths, fields, auth header). Do not change that repo.
- bash, single host, mac+linux safe, zero new runtime deps (curl only).

CONFIG (all via env; document in README; never hardcode URLs/tokens/secrets):
- AQ_TRACKER_API: base URL of the items API (default http://localhost:4003).
- AQ_TRACKER_TOKEN: bearer token for auth (required for real calls).
- AQ_PRODUCT_ID: productId to stamp/filter (every tracker Item has a productId).
- A single `tracker_api <method> <path> [json]` wrapper does ALL HTTP via curl
  (bearer header, content-type, base URL). It MUST be overridable for tests via
  AQ_TRACKER_API_CMD (a stub script path) so selftest needs NO live service.
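
Illustrative only (the deliverable is bash in agent-queue.sh): a minimal TypeScript sketch of the decision the `tracker_api` wrapper makes for each call. It assumes the stub named by AQ_TRACKER_API_CMD receives the same `<method> <path> [json]` arguments as the wrapper and that the JSON body goes to curl via `--data-binary`; both are assumptions, and `buildTrackerCall` / `TrackerCall` are hypothetical names.

```typescript
// Hypothetical model of how tracker_api could turn one call into an argv.
// The real wrapper is bash; only the decision logic is modeled here.

type Env = Record<string, string | undefined>;

interface TrackerCall {
  program: string; // executable to run
  args: string[]; // argv after the program name
}

const DEFAULT_TRACKER_API = "http://localhost:4003";

function buildTrackerCall(env: Env, method: string, path: string, json?: string): TrackerCall {
  // Test override: pass the wrapper's own arguments straight to the stub,
  // so selftest never needs a live service.
  const stub = env.AQ_TRACKER_API_CMD;
  if (stub) {
    return { program: stub, args: json === undefined ? [method, path] : [method, path, json] };
  }

  // Base URL from the environment; strip trailing slashes before joining.
  const base = (env.AQ_TRACKER_API || DEFAULT_TRACKER_API).replace(/\/+$/, "");
  const args = ["-sS", "-X", method];

  // Token only from the environment, never hardcoded; omitted when unset.
  const token = env.AQ_TRACKER_TOKEN;
  if (token) args.push("-H", `Authorization: Bearer ${token}`);

  if (json !== undefined) {
    args.push("-H", "Content-Type: application/json", "--data-binary", json);
  }

  // Print the HTTP status on a final line so the caller can detect 4xx/5xx.
  args.push("-w", "\n%{http_code}", `${base}${path}`);
  return { program: "curl", args };
}
```

One design note, hedged: passing the token in argv makes it visible to `ps` on a shared host; curl 7.55+ can read headers from a file (`-H @file`), which may be worth using if that matters here.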

DELIVERABLES

1. `aq from-tracker <ITEM_ID>`: pull a tracker Item and materialize a job in inbox/.
   - GET the item via tracker_api; map item fields onto the job file:
       title/description -> job body (the instruction markdown, verbatim)
       item type/labels  -> engine-class/profile/capabilities/priority where labels
                            carry them (e.g. label `engine-class:agentic-coder`,
                            `profile:backend-engineer`, `priority:high`,
                            `cap:os:mac`); otherwise sane defaults.
       item id           -> `tracker-item: <ITEM_ID>` and
                            `idempotency-key: tracker-<ITEM_ID>` (stable).
   - IDEMPOTENT: if a job for this tracker-item already exists in any stage
     (reuse Slice 1 idempotency on the derived key) → no duplicate enqueue.
   - On success print the created inbox filename; on a missing item → clear error,
     nonzero exit.

2. `aq to-tracker <job>`: push a job's CURRENT outcome to its tracker Item
   (one-way echo, child -> tracker; §24.5). Applies only if the job meta has a
   tracker-item.
   - Map stage/result -> item status PATCH:
       building/review/testing -> in_progress
       shipped                 -> done
       failed                  -> blocked (or the API's failure status) + note
   - Post a comment/note with result, attempts, and an insights summary
     (duration, tokens/cost if present), reusing the Slice 3 metrics. Metrics only,
     NEVER prompt content or secrets.
   - IDEMPOTENT: re-running to-tracker for an unchanged outcome is a no-op
     (track the last-echoed state in meta, e.g. `tracker_echoed=<status>`).

3. Auto-echo hook (opt-in, default OFF): an env flag (e.g. AQ_TRACKER_AUTO=1)
   makes the worker call `to-tracker` automatically on each stage transition it
   already performs (enqueue→building→review/testing/failed/shipped). When OFF,
   echo is manual via the command. Never block or fail a job because an echo failed:
   log the echo error and continue (the tracker is downstream, not authoritative
   for execution).

4. `status` / `aq insights`: show the tracker-item id and last echoed status where
   present (you already surface tracker-item in status from Slice 1; extend it).
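
The mapping rules in deliverables 1 and 2 can be pinned down as pure functions. This sketch is TypeScript for precision only; the real implementation is bash. The `TrackerItem` shape (`id`, `title`, `description`, `labels`) is an assumption to be replaced by the contract in items/types.ts, the stage names are inferred from the lifecycle listed above, the comma-separated `capabilities` value is a placeholder for whatever format Slice 2 parses, and omitting unlabeled fields (so existing defaults apply) is one reading of "sane defaults". All names are hypothetical.

```typescript
// Hypothetical sketch of the from-tracker / to-tracker mapping rules.
// Item shape and stage names are assumptions; match items/types.ts and agent-queue.sh.

interface TrackerItem {
  id: string;
  title: string;
  description?: string;
  labels?: string[];
}

interface JobDraft {
  frontmatter: Record<string, string>;
  body: string;
}

// Label prefixes that carry manifest fields (from the examples in deliverable 1).
const LABEL_FIELDS = ["engine-class", "profile", "priority"];

function itemToJob(item: TrackerItem): JobDraft {
  const frontmatter: Record<string, string> = {
    "tracker-item": item.id,
    "idempotency-key": `tracker-${item.id}`, // stable, so Slice 1 dedupe applies
  };
  const caps: string[] = [];
  for (const label of item.labels ?? []) {
    if (label.startsWith("cap:")) {
      caps.push(label.slice("cap:".length)); // "cap:os:mac" -> "os:mac"
      continue;
    }
    const sep = label.indexOf(":");
    if (sep < 0) continue; // plain labels carry no manifest field
    const key = label.slice(0, sep);
    if (LABEL_FIELDS.includes(key)) frontmatter[key] = label.slice(sep + 1);
  }
  if (caps.length > 0) frontmatter["capabilities"] = caps.join(","); // placeholder format
  // Fields with no label are omitted so agent-queue's existing defaults apply.
  return { frontmatter, body: item.description ?? item.title };
}

type Stage = "inbox" | "building" | "review" | "testing" | "shipped" | "failed";

const STAGE_TO_STATUS: Partial<Record<Stage, string>> = {
  building: "in_progress",
  review: "in_progress",
  testing: "in_progress",
  shipped: "done",
  failed: "blocked", // or the API's own failure status
};

// Status to PATCH, or null when there is nothing new to echo.
function planEcho(stage: Stage, lastEchoed?: string): string | null {
  const status = STAGE_TO_STATUS[stage];
  if (status === undefined) return null; // unmapped stage, e.g. inbox
  return status === lastEchoed ? null : status; // tracker_echoed makes re-runs no-ops
}

interface RunMetrics {
  result: string;
  attempts: number;
  durationSec?: number;
  tokens?: number;
  costUsd?: number;
}

// Comment text built from a fixed whitelist of metrics, so prompt text
// or secrets have no path into the tracker.
function echoComment(m: RunMetrics): string {
  const parts = [`result=${m.result}`, `attempts=${m.attempts}`];
  if (m.durationSec !== undefined) parts.push(`duration=${m.durationSec}s`);
  if (m.tokens !== undefined) parts.push(`tokens=${m.tokens}`);
  if (m.costUsd !== undefined) parts.push(`cost_usd=${m.costUsd.toFixed(4)}`);
  return parts.join(" ");
}
```

Returning `null` for an unmapped stage such as inbox is a choice made in this sketch; the spec above does not fix what, if anything, inbox should echo.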

TESTS (selftest.sh; only ADD; NO live service: use an AQ_TRACKER_API_CMD stub that
returns canned JSON and records the calls it received):
- from-tracker creates an inbox job: stub returns an item JSON →
  `aq from-tracker T-1` creates one inbox/*.md whose frontmatter has
  tracker-item: T-1 and idempotency-key: tracker-T-1, body = item description.
- from-tracker label mapping: item with labels [engine-class:agentic-coder,
  priority:high] → frontmatter reflects them.
- from-tracker idempotent: calling it twice for T-1 → exactly one job (dedupe).
- to-tracker status echo: a shipped job → stub receives a PATCH to status=done and
  a comment with the insights summary; assert no prompt body is sent.
- to-tracker idempotent: second call with unchanged outcome → no duplicate
  PATCH/comment (tracker_echoed honored).
- echo failure is non-fatal: stub returns HTTP 500 → `to-tracker` logs the error
  and exits without corrupting job state; the job's stage is unchanged.
- REGRESSION: all existing checks (Slice 0/1/2/3) still green.
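
To make the stub-based cases concrete, here is an in-memory TypeScript model of a recording stub and of a non-fatal echo step. In the real selftest the stub is a shell script named by AQ_TRACKER_API_CMD; `StubTracker`, `echoJob`, the `/items/<id>` and `/items/<id>/comments` paths, and the `{ status }` / `{ body }` payloads are all hypothetical and must be matched to the real Item API.

```typescript
// Hypothetical in-memory model of the selftest stub plus a non-fatal echo step.

interface RecordedCall {
  method: string;
  path: string;
  body?: string;
}

interface StubResponse {
  status: number;
  body: string;
}

class StubTracker {
  readonly calls: RecordedCall[] = [];
  private readonly canned: Record<string, StubResponse>;

  constructor(canned: Record<string, StubResponse>) {
    this.canned = canned;
  }

  // Records every call, then answers from the canned table (404 if unknown).
  call(method: string, path: string, body?: string): StubResponse {
    this.calls.push({ method, path, body });
    return this.canned[`${method} ${path}`] ?? { status: 404, body: "{}" };
  }
}

interface JobMeta {
  stage: string;
  trackerItem?: string;
  trackerEchoed?: string;
}

// Echoes one status. Failures are logged and swallowed; meta.stage is never touched.
function echoJob(
  tracker: StubTracker,
  meta: JobMeta,
  status: string,
  comment: string,
  log: (msg: string) => void,
): void {
  if (!meta.trackerItem || meta.trackerEchoed === status) return; // no item, or already echoed
  const path = `/items/${meta.trackerItem}`;
  const patch = tracker.call("PATCH", path, JSON.stringify({ status }));
  if (patch.status >= 400) {
    log(`tracker echo failed: PATCH HTTP ${patch.status}`);
    return;
  }
  const note = tracker.call("POST", `${path}/comments`, JSON.stringify({ body: comment }));
  if (note.status >= 400) {
    log(`tracker echo failed: comment HTTP ${note.status}`);
    return;
  }
  meta.trackerEchoed = status; // recorded only after both calls succeeded
}
```

The key property is ordering: `trackerEchoed` is written only after the tracker accepted both calls, so a failed echo leaves nothing half-recorded and a later run simply retries.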

DOCS:
- README: "Tracker integration" section covering from-tracker/to-tracker, the env
  config, the label→manifest mapping table, the one-way-echo rule, AQ_TRACKER_AUTO,
  and a note that real use needs platform-service running + a token.
- docs/GIGAFACTORY_ROADMAP.md: tick the §10 single-host items + the §14 Phase-1
  "tracker adapter" item; set §0 Phase 1 → complete (or note the exact remaining %).

CONSTRAINTS: bash style consistent with the script; curl-only HTTP through the one
wrapper; mac+linux safe; no emojis; conventional commits; tests sacred.

VERIFY GATE: bash agent-queue/selftest.sh fully green; bash -n agent-queue/agent-queue.sh;
node --check dashboard.mjs; shellcheck --severity=error clean.

FINAL OUTPUT: print the report in EXACTLY this format:

## Implementation Report — Phase 1 Slice 4
### Branch & commits
- branch / based-on: <name>
- commits: <sha> <message>
- PR: <url or "opened, not merged">
### Files changed
- <path>: <one-line summary>
### What was implemented (1-4)
- <item>: <how, key functions; the Item API contract you matched>
### Tests added
- <test name>: <what it asserts> (+ selftest PASS/FAIL summary)
### Verify gate results
- selftest / bash -n / node --check / shellcheck: <results>
### Deviations / assumptions
- <API path/field/status mapping choices; anything stubbed>
### Phase 1 status
- <which §14 items now complete; what (if anything) remains>
### Suggested next slice
- Phase 2 Slice 1 (fleet data model + repositories in platform-service)

agent-queue/docs/jobs/phase2-slice1.md (new file, +125 lines)
@@ -0,0 +1,125 @@
---
engine: devin
cwd: /Users/sd9235/code/mygh/learning_ai_common_plat
yolo: true
lock: common-plat
timeout: 4h
---

ROLE: Senior backend/distributed-systems engineer. Implement Phase 2, Slice 1:
the FLEET DATA MODEL + REPOSITORIES as a new platform-service module. This is the
durable backbone (§13) that supersedes the single-host stand-ins. NO atomic
claim/lease/fencing logic yet; that is Phase 2 Slice 2. This slice is schemas,
repositories, container registration, basic guarded CRUD, and tests.

NOTE: This runs in a DIFFERENT repo (learning_ai_common_plat), so it does NOT
conflict with the agent-queue (devops-tools) slices and can run independently.

READ FIRST (this is NOT the platform-service you may assume; verify its conventions):
- services/platform-service/src/modules/items/{types,repository,routes}.ts: copy
  this module pattern EXACTLY (types.ts -> repository.ts -> routes.ts, Zod schemas,
  the cloud-agnostic datastore, productId on every doc, req.log/app.log).
- packages/cosmos (container registry) + how existing modules register containers.
- The fleet container spec is roadmap §13 in the devops-tools repo:
  ../learning_ai_devops_tools/agent-queue/docs/GIGAFACTORY_ROADMAP.md. Read it for
  the field lists (fleet_jobs incl. bodyMd + checkpoint; fleet_runs incl.
  token/cost/tool/diff insights; fleet_leases incl. leaseEpoch; fleet_factories;
  fleet_profiles; fleet_events; fleet_artifacts), and also read §25/§26.

PREREQUISITE / BRANCHING:
- Branch off CURRENT `main` of learning_ai_common_plat.
- New branch: feat/gigafactory-p2-slice1. Push + open a PR. DO NOT merge.

STRICT SCOPE:
- Add a NEW module: services/platform-service/src/modules/fleet/ (+ its tests).
- Register the new Cosmos containers via the existing registration path.
- Do NOT modify unrelated modules. Do NOT hand-edit shared infra (.npmrc,
  docker-prep.sh, tsconfig.base, pnpm-workspace); those are template-managed.
- ESM everywhere ("type": "module", .js import suffixes). No `any` (use Zod inference
  or explicit types). No console.log (use req.log/app.log). Every Cosmos doc has
  productId. Tests are sacred.

DELIVERABLES

1. types.ts: Zod schemas + inferred types for each container, each with productId:
   - FleetJobDoc (pk /productId): manifestSnapshot, bodyMd (verbatim instructions),
     stage, idempotencyKey, trackerItemId?, parentId?, kind ('leaf'|'composite',
     default 'leaf'), checkpoint? { wipBranch, wipBase, wipCommit }, priority,
     capabilities[], engineClass?, profile?, deps[], depsMode?, timestamps.
   - FleetRunDoc (pk /jobId): jobId, attempt, factoryId?, engine, profileSnapshot?,
     startedAt, endedAt?, exit?, verifyResult?, result?, and insights: model?,
     tokensIn?, tokensOut?, tokensCached?, costUsd?, estimated?, turns?, toolCalls?,
     filesChanged?, linesAdded?, linesDeleted?.
   - FleetLeaseDoc (pk /jobId): jobId, holderFactoryId?, expiresAt?, leaseEpoch
     (number, default 0), renewals, status. (Fields only; reclaim/claim logic is S2.)
   - FleetFactoryDoc (pk /productId): factoryId, descriptor, capabilities[], health,
     load, lastHeartbeatAt, seatLimit.
   - FleetProfileDoc (pk /productId): name, version, immutable snapshot (persona,
     defaults).
   - FleetEventDoc (pk /jobId): append-only event { type, at, data }.
   - FleetArtifactDoc (pk /jobId): pointers to blob-stored artifacts (no inline logs).
   - Define enums for stage and result that MATCH the agent-queue lifecycle.

2. repository.ts: one repository per container using the existing datastore
   abstraction (so DB_PROVIDER=memory works in tests and cosmos in prod):
   - CRUD: create, getById, list (by productId; jobs also by stage), update
     (optimistic via _etag where the datastore supports it; expose the etag
     even though the ATOMIC claim flow is S2), and delete where sensible.
   - appendEvent(jobId, event) for the append-only fleet_events stream.
   - All queries partition-aware; no cross-partition fan-out in hot paths.

3. Container registration: register all fleet_* containers with the correct partition
   keys via the existing cosmos container registry; the memory provider handles them
   automatically.

4. routes.ts: minimal guarded REST under the existing auth + productId middleware:
   - POST /fleet/jobs (create), GET /fleet/jobs (list by stage/productId),
     GET /fleet/jobs/:id, PATCH /fleet/jobs/:id (stage/fields), and read endpoints
     for runs (GET /fleet/jobs/:id/runs) + events. Keep it thin; claim/lease
     endpoints are S2. Validate all bodies with the Zod schemas.
   - Register the route module in the platform-service app the same way items does.
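
For orientation, a dependency-free TypeScript sketch of the FleetJobDoc shape, the stage enum, and the container/partition-key table from deliverables 1 and 3. The real types.ts should use Zod exactly as the items module does; this hand-rolled validator only illustrates the precise, per-field errors the schema tests expect, and only a subset of FleetJobDoc fields is modeled. The stage values are assumed from the Phase 1 agent-queue lifecycle and must be confirmed; `FLEET_STAGES`, `FLEET_CONTAINERS`, `validateFleetJob`, and `partitionKeyValue` are hypothetical and are not the packages/cosmos API.

```typescript
// Hypothetical sketch; the real types.ts should define Zod schemas and use z.infer.

// Assumed stage values, taken from the Phase 1 agent-queue lifecycle; confirm them.
const FLEET_STAGES = ["inbox", "building", "review", "testing", "shipped", "failed"] as const;
type FleetStage = (typeof FLEET_STAGES)[number];

// Subset of FleetJobDoc fields, enough to show required vs optional vs defaulted.
interface FleetJobDoc {
  id: string;
  productId: string; // partition key (/productId)
  stage: FleetStage;
  idempotencyKey: string;
  bodyMd: string; // verbatim instructions
  priority: string;
  kind: "leaf" | "composite"; // defaults to "leaf"
  capabilities: string[];
  deps: string[];
  trackerItemId?: string;
  parentId?: string;
}

// Container id -> partition key path, as annotated in deliverable 1.
const FLEET_CONTAINERS: Record<string, string> = {
  fleet_jobs: "/productId",
  fleet_runs: "/jobId",
  fleet_leases: "/jobId",
  fleet_factories: "/productId",
  fleet_profiles: "/productId",
  fleet_events: "/jobId",
  fleet_artifacts: "/jobId",
};

type Validation<T> = { ok: true; value: T } | { ok: false; errors: string[] };

function isNonEmptyString(v: unknown): v is string {
  return typeof v === "string" && v.length > 0;
}

function validateFleetJob(input: unknown): Validation<FleetJobDoc> {
  if (typeof input !== "object" || input === null) {
    return { ok: false, errors: ["(root): expected an object"] };
  }
  const doc = input as Record<string, unknown>;
  const errors: string[] = [];
  for (const key of ["id", "productId", "idempotencyKey", "bodyMd", "priority"]) {
    if (!isNonEmptyString(doc[key])) errors.push(`${key}: required non-empty string`);
  }
  if (!FLEET_STAGES.some((s) => s === doc.stage)) {
    errors.push(`stage: expected one of ${FLEET_STAGES.join("|")}`);
  }
  const kind = doc.kind ?? "leaf";
  if (kind !== "leaf" && kind !== "composite") errors.push("kind: expected leaf|composite");
  for (const key of ["capabilities", "deps"]) {
    const v = doc[key] ?? [];
    if (!Array.isArray(v) || !v.every((x) => typeof x === "string")) {
      errors.push(`${key}: expected string[]`);
    }
  }
  if (errors.length > 0) return { ok: false, errors };
  // Safe after the checks above; apply the documented defaults.
  const value = { ...doc, kind, capabilities: doc.capabilities ?? [], deps: doc.deps ?? [] };
  return { ok: true, value: value as unknown as FleetJobDoc };
}

// Reads a doc's partition-key value, e.g. "/jobId" -> doc.jobId.
function partitionKeyValue(container: string, doc: Record<string, unknown>): string {
  const path = FLEET_CONTAINERS[container];
  if (path === undefined) throw new Error(`unknown container: ${container}`);
  const field = path.slice(1);
  const value = doc[field];
  if (!isNonEmptyString(value)) throw new Error(`${container}: missing ${field}`);
  return value;
}
```

Keeping the partition-key paths in one table means the registration step and any partition-aware query can read the same source, rather than repeating `/productId` and `/jobId` literals across repositories.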

TESTS (Vitest, written alongside; memory provider; tests sacred):
- schema validation: valid docs pass; missing productId / bad enum fail with
  precise errors; at least one invalid case per container.
- repository CRUD round-trip per container (create→get→list→update→delete) on the
  memory provider; list filters by productId and by stage (jobs).
- appendEvent produces an ordered, append-only stream for a jobId.
- routes: create+get+list+patch a job via fastify inject (use the shared testing
  helpers); auth/productId enforced; invalid body rejected.
- _etag surfaced on update (lost-update guard groundwork): assert the etag flows.
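
A sketch of the repository semantics the tests above exercise, using plain in-memory maps in place of the datastore abstraction (whose real interface is not described in this spec). It shows partition-scoped get/list, an etag that changes on every write, an update that rejects a stale `ifMatch` etag (the lost-update guard groundwork), and an ordered, append-only event stream per jobId. `PartitionedRepo`, `EventLog`, `EtagMismatchError`, the `seq` field, and the counter-based etag values are assumptions; Cosmos issues its own opaque `_etag`.

```typescript
// Hypothetical in-memory model of a partition-aware repository with etags.

interface StoredDoc {
  id: string;
  _etag: string;
  [field: string]: unknown;
}

class EtagMismatchError extends Error {
  constructor(id: string) {
    super(`etag mismatch on ${id}`);
    this.name = "EtagMismatchError";
  }
}

class PartitionedRepo {
  // partition key value -> (id -> doc); no call ever scans another partition.
  private readonly partitions: Record<string, Record<string, StoredDoc>> = {};
  private writes = 0;

  private nextEtag(): string {
    this.writes += 1;
    return `"${this.writes}"`; // stand-in for the datastore's opaque etag
  }

  private partition(pk: string): Record<string, StoredDoc> {
    if (!(pk in this.partitions)) this.partitions[pk] = {};
    return this.partitions[pk];
  }

  create(pk: string, doc: { id: string; [field: string]: unknown }): StoredDoc {
    const part = this.partition(pk);
    if (doc.id in part) throw new Error(`conflict: ${doc.id} already exists`);
    const stored: StoredDoc = { ...doc, _etag: this.nextEtag() };
    part[doc.id] = stored;
    return stored;
  }

  getById(pk: string, id: string): StoredDoc | undefined {
    return this.partition(pk)[id];
  }

  list(pk: string, filter: (d: StoredDoc) => boolean = () => true): StoredDoc[] {
    return Object.values(this.partition(pk)).filter(filter);
  }

  // Optimistic update: a caller holding a stale etag is rejected, not merged.
  update(pk: string, id: string, patch: Record<string, unknown>, ifMatch?: string): StoredDoc {
    const current = this.getById(pk, id);
    if (current === undefined) throw new Error(`not found: ${id}`);
    if (ifMatch !== undefined && ifMatch !== current._etag) throw new EtagMismatchError(id);
    const next: StoredDoc = { ...current, ...patch, id, _etag: this.nextEtag() };
    this.partition(pk)[id] = next;
    return next;
  }

  delete(pk: string, id: string): boolean {
    const part = this.partition(pk);
    if (!(id in part)) return false;
    delete part[id];
    return true;
  }
}

interface FleetEvent {
  type: string;
  at: string; // ISO-8601 timestamp
  data?: Record<string, unknown>;
}

interface StoredEvent extends FleetEvent {
  jobId: string;
  seq: number; // 1-based position in the job's stream
}

class EventLog {
  private readonly streams: Record<string, StoredEvent[]> = {};

  // Append-only: each event gets the next seq for its jobId and is frozen.
  appendEvent(jobId: string, event: FleetEvent): StoredEvent {
    const stream = this.streams[jobId] ?? (this.streams[jobId] = []);
    const stored = Object.freeze({ ...event, jobId, seq: stream.length + 1 });
    stream.push(stored);
    return stored;
  }

  list(jobId: string): StoredEvent[] {
    return (this.streams[jobId] ?? []).slice(); // copy, so callers cannot reorder
  }
}
```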

VERIFY GATE (must pass):
- pnpm --filter @lysnrai/platform-service typecheck
- pnpm --filter @lysnrai/platform-service test (new tests green; none weakened)
- pnpm --filter @lysnrai/platform-service build

DOCS:
- Short module README or header docblock describing the containers and noting that
  claim/lease/fencing is Phase 2 Slice 2.
- Do NOT edit the roadmap in ../learning_ai_devops_tools (different repo); instead,
  note in your report which §13 items are now satisfied so I can tick them.

CONSTRAINTS: follow the items-module conventions precisely; ESM .js imports; no any;
no console.log; productId everywhere; conventional commits
(`feat(platform-service): ...`); do not touch template-managed infra files.

FINAL OUTPUT: print the report in EXACTLY this format:

## Implementation Report — Phase 2 Slice 1
### Branch & commits
- branch / based-on / PR
- commits: <sha> <message>
### Files changed
- <path>: <one-line summary>
### What was implemented (1-4)
- containers + schemas + repos + routes; partition keys; etag handling
### Tests added
- <test name>: <assertion> (+ pnpm test summary: N passed)
### Verify gate results
- typecheck / test / build: <results>
### §13 items now satisfied
- <list which roadmap §13 boxes are done so the human can tick them>
### Deviations / assumptions
- <datastore/etag/provider choices>
### Suggested next slice
- Phase 2 Slice 2: atomic claim (_etag/If-Match) + lease renew/release + heartbeat + reaper + fencing (leaseEpoch) — the concurrency core.