docs(agent-queue): add gigafactory vision + checklist implementation roadmap

- `docs/GIGAFACTORY_ROADMAP.md`: distributed multi-machine fleet vision (factory x tool x profile routing) as a checklist-driven, phased implementation roadmap (Phase 0-5) with acceptance criteria, verify gates, and a 100% Definition-of-Done rubric
- committed path: coordinator as a platform-service module + control plane on tracker-web, reached via a thin tracker adapter first; bash runner survives as the offline edge factory agent
- README: add vision/roadmap pointer

parent 7877e64f90 · commit 90366e59bb

````
@@ -5,6 +5,11 @@ A zero-dependency **folder "kanban" runner** for headless coding-agent CLIs —
and they get executed (in auto-approve mode) one slot at a time, moving through
`inbox → building → review → testing → shipped` (plus `failed`) with live status.

> **Vision & roadmap:** where this is headed — a distributed multi-machine "gigafactory"
> (fleet of factories × tools × profiles, scheduler-routed, built on platform-service +
> tracker-web) — is specified as a checklist-driven implementation roadmap in
> [`docs/GIGAFACTORY_ROADMAP.md`](docs/GIGAFACTORY_ROADMAP.md).

**Build/ship lifecycle — auto-QA, manual ship:**

```
````

`agent-queue/docs/GIGAFACTORY_ROADMAP.md` · Normal file · +461

@@ -0,0 +1,461 @@

# Agent Gigafactory — Vision & Implementation Roadmap

> **One-liner:** Evolve today's single-host `agent-queue` bash runner into a distributed **gigafactory** — a fleet of heterogeneous machines (Mac/Ubuntu/Windows), each running different coding-agent CLIs (Devin/Codex/Claude/Copilot/…), where a scheduler **auto-picks jobs from a shared inbox and routes each `.md` to the best factory × tool × profile** — built service-side on `platform-service` + `tracker-web`, with the bash runner surviving as the offline edge agent.

> **How to use this doc:** It is both a PRD and an execution checklist. Every feature is a `- [ ]` checkbox with **acceptance criteria** and a **verify gate**. A phase is "100% done" only when every box is checked, its gate passes, and the phase **Definition of Done** rubric (§16) is green. Update the progress table (§0) as you go.

---

## 0. Progress tracker

| Phase | Theme | Status | % | Gate |
| ----- | ----- | ------ | - | ---- |
| **0** | Baseline (today) | ✅ shipped | 100% | `selftest.sh` green |
| **1** | Manifest + profiles + capabilities + tracker adapter (single host) | ☐ not started | 0% | adapter e2e + selftest |
| **2** | Coordinator as platform-service module + Cosmos + multi-factory leasing | ☐ not started | 0% | fleet e2e + module tests |
| **3** | Fleet control plane in tracker-web + DAG deps + budgets + scoring router | ☐ not started | 0% | web e2e + router tests |
| **4** | Message bus + autoscaling + cross-OS capability marketplace | ☐ not started | 0% | load/chaos suite |
| **5** | Self-optimizing / learned routing | ☐ not started | 0% | offline eval + A/B |

Legend: ☐ not started · ◐ in progress · ✅ done. Keep per-phase checklists below as the source of truth; this table is the summary.

---

## 1. Vision & metaphor

A **gigafactory** turns raw intent (`.md` task files / tracker items) into shipped software with minimal human touch. The mental model is a physical factory network:

| Term | Meaning |
| ---- | ------- |
| **Fleet** | The whole network of machines under one control plane. |
| **Factory** | One physical/virtual machine (a Mac, an Ubuntu box, a Windows host). Has an OS, installed tools, creds, capacity. |
| **Station** | A tool/engine slot inside a factory (a Devin seat, a Codex CLI, a Claude Code session, a Copilot agent). |
| **Worker** | A single running agent process executing one job at a station. |
| **Job** | A unit of work: a prompt/`.md` + manifest (profile, scope, gates, budget). |
| **Profile** | The *role* doing the work (developer, backend engineer, UX/UI designer, QA, reviewer) = persona prompt **+** capability requirements. |
| **Capability** | A tag a factory advertises and a job requires (`os:mac`, `has:xcode`, `has:figma`, `gpu`, `engine:devin`). |
| **Lease** | A time-boxed claim of a job by a worker; expires → job is reclaimable (crash recovery). |
| **Gate** | A checkpoint a job must pass: auto-QA `verify`, human review, ship approval. |
| **Artifact** | Any captured output: commits/PRs, logs, screenshots, reports, build outputs. |

**North star:** drop work into one inbox (or file a tracker task), and the fleet figures out *where* (factory), *with what* (tool/engine), *as whom* (profile), runs it in parallel, self-heals on crash, gates quality automatically, and surfaces everything in one live control plane — while a human only approves the final ship.

```
┌────────────────────────────── CONTROL PLANE (tracker-web) ──────────────────────────────┐
│            plan/intake · roadmap · Fleet map · live logs · cost · approvals             │
└───────────────────▲─────────────────────────────────────────────────┬───────────────────┘
                    │ REST/SSE                                        │
┌───────────────────┴────── COORDINATOR (platform-service module) ────▼───────────────────┐
│ queue · scheduler/router · leases · profiles · capabilities · events · budgets (Cosmos) │
└────────▲──────────────────────▲──────────────────────▲──────────────────────▲───────────┘
         │ claim/lease/report   │                      │                      │
┌────────┴─────────┐   ┌────────┴─────────┐   ┌────────┴─────────┐   ┌────────┴─────────┐
│ FACTORY: mac     │   │ FACTORY: ubuntu  │   │ FACTORY: windows │   │ FACTORY: mac-2   │
│ devin, claude    │   │ codex, claude    │   │ copilot, codex   │   │ devin (xcode)    │
│ [agent-queue]    │   │ [agent-queue]    │   │ [agent-queue]    │   │ [agent-queue]    │
└──────────────────┘   └──────────────────┘   └──────────────────┘   └──────────────────┘
```

---

## 2. Current state (Phase 0 baseline — already shipped)

Today's `agent-queue.sh` + `dashboard.mjs` (single host, zero-dep bash + Node):

- **Folder kanban lifecycle:** `inbox → building → review → testing → shipped` (+ `failed`).
- **Auto-QA gate:** when the agent exits with rc=0 the job moves to `review/`; if a `verify:` command is set, it runs in `cwd` and a pass moves the job to `testing/`, a failure to `failed/`; with no `verify`, the job parks in `review/`. A manual `ship` is the human gate.
- **Per-job frontmatter:** `engine` (devin/claude/codex), `cwd`, `yolo` (→ dangerous/auto-approve), `lock` (per-repo serialization), `timeout`, `verify`.
- **Concurrency:** `AGENT_QUEUE_MAX` (default 3), per-`lock` serialization so same-repo jobs never collide.
- **State & logs:** `.state/<job>.meta` heartbeats + `logs/<job>.log`; git-tracked queue (audit-by-commit).
- **Interactive dashboard:** numbered selectable job list, single-key actions (promote/ship/reject/requeue), live log viewer, run/stop, all shelling out to `agent-queue.sh`.

**Carries forward:** the `.md`-in-`inbox` UX, frontmatter contract, lifecycle stage names, `verify` gate, lock/affinity concept, the bash runner itself (becomes the factory agent).

**Must change for the fleet:** single-host run loop → distributed leasing; file-only state → service + Cosmos; one engine choice → capability/profile routing; local dashboard → shared control plane.

- [x] Phase 0 complete — baseline shipped and self-tested. *(reference, not a work item)*

---

## 3. Goals & non-goals

**Goals**
- One intake, many machines: parallel execution across heterogeneous OS/tools.
- Automatic routing to the best `factory × tool × profile` with affinity, fairness, budget, and health awareness.
- Self-healing (lease expiry/requeue), quality gates, and full observability.
- Reuse the ByteLyst stack (`platform-service`, Cosmos, `@bytelyst/*`, tracker-web) — no parallel infra.
- Preserve offline/zero-dep edge operation via the bash runner.

**Non-goals**
- Not a CI/CD replacement (it *triggers* CI; CI still gates merges).
- Not a general-purpose workflow engine (scoped to coding-agent execution).
- Not a model/inference host (it orchestrates agent CLIs, doesn't serve models).
- Not abandoning the simple `.md` mental model — humans still drop files / file tasks.

---

## 4. Core concepts contract (must hold across all phases)

- [ ] Every job has a stable **id**, an immutable **manifest**, and an append-only **event log**.
- [ ] Every Cosmos document carries `productId` (ByteLyst rule).
- [ ] A job in flight is always covered by exactly one **lease**; no lease → reclaimable.
- [ ] Lifecycle stages are canonical and shared: `queued → assigned → building → review → testing → shipped` (+ `failed`, `dead_letter`).
- [ ] The bash runner and the service speak the **same manifest + event vocabulary** (one schema, two transports).

---

## 5. The evolved Job manifest (feature)

Extend today's frontmatter into a richer, **backward-compatible** manifest. Old `.md` files keep working (new fields optional with sane defaults).

```yaml
---
# --- existing (unchanged) ---
engine: devin # explicit engine; overrides profile/engine-class
cwd: /abs/path/repo
yolo: true
lock: my-repo
timeout: 45m
verify: pnpm -s test
# --- new ---
profile: backend-engineer # role: persona + capability requirements
engine-class: agentic-coder # abstract; scheduler picks a concrete engine if `engine` unset
capabilities: [os:any, node>=20] # hard requirements a factory MUST satisfy
prefers: [factory:mac-2] # soft routing hints (affinity)
priority: high # critical|high|medium|low → SLA + preemption
budget: { usd: 5, tokens: 2M, wall: 4h } # hard ceilings; exceed → pause/fail
deps: [job-123, job-456] # DAG: don't start until these reach `shipped`/`testing`
idempotency-key: nomgap-ux-2 # dedupe: a second identical submit is a no-op
retry: { max: 2, backoff: 5m, on: [timeout, verify_failed] }
review-policy: manual # auto|manual|reviewers:[@alice]
artifacts: [coverage, screenshots] # what to capture beyond commits
tracker-item: ITEM-789 # link back to the originating tracker task
---
```
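
The backward-compat rule can be sketched as a pure function over frontmatter that has already been parsed into a plain object (YAML parsing is out of scope here). It is dependency-free TypeScript rather than Zod, and it covers only a subset of fields. The helper names and every default value (`priority: medium`, `retry.max: 0`, `review-policy: manual`, `yolo: false`) are assumptions for illustration, not values this roadmap fixes.

```typescript
// Hypothetical sketch: default a parsed frontmatter object into a full manifest.
// Default values are illustrative assumptions; the roadmap only requires that
// Phase-0 files (engine/cwd/yolo only) keep parsing.
type Priority = "critical" | "high" | "medium" | "low";

interface Manifest {
  engine?: string;
  cwd?: string;
  yolo: boolean;
  lock?: string;
  timeout?: string;
  verify?: string;
  profile?: string;
  capabilities: string[];
  prefers: string[];
  priority: Priority;
  deps: string[];
  retry: { max: number; on: string[] };
  reviewPolicy: string;
}

const PRIORITIES: readonly string[] = ["critical", "high", "medium", "low"];

function optStr(fm: Record<string, unknown>, key: string): string | undefined {
  const v = fm[key];
  if (v === undefined) return undefined;
  if (typeof v !== "string") throw new Error(`${key}: expected a string`);
  return v;
}

function strList(v: unknown, field: string): string[] {
  if (v === undefined) return [];
  if (!Array.isArray(v) || !v.every((x) => typeof x === "string")) {
    throw new Error(`${field}: expected a list of strings`);
  }
  return v;
}

function toManifest(fm: Record<string, unknown>): Manifest {
  const priority = optStr(fm, "priority") ?? "medium";
  if (!PRIORITIES.includes(priority)) {
    throw new Error(`priority: expected one of ${PRIORITIES.join("|")}, got "${priority}"`);
  }
  const retry = (fm["retry"] ?? {}) as { max?: unknown; on?: unknown };
  const max = retry.max ?? 0;
  if (typeof max !== "number" || !Number.isInteger(max) || max < 0) {
    throw new Error("retry.max: expected a non-negative integer");
  }
  return {
    engine: optStr(fm, "engine"),
    cwd: optStr(fm, "cwd"),
    yolo: fm["yolo"] === true,
    lock: optStr(fm, "lock"),
    timeout: optStr(fm, "timeout"),
    verify: optStr(fm, "verify"),
    profile: optStr(fm, "profile"),
    capabilities: strList(fm["capabilities"], "capabilities"),
    prefers: strList(fm["prefers"], "prefers"),
    priority: priority as Priority,
    deps: strList(fm["deps"], "deps"),
    retry: { max, on: strList(retry.on, "retry.on") },
    reviewPolicy: optStr(fm, "review-policy") ?? "manual",
  };
}
```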

- [ ] Define the manifest schema (Zod in the service; documented YAML for `.md`).
- [ ] Backward-compat: a Phase-0 `.md` (only `engine/cwd/yolo`) parses with all new fields defaulted.
- [ ] `idempotency-key` dedupe semantics specified (same key + same content hash = no-op).
- [ ] `deps` DAG semantics specified (blocked state, cycle detection, fan-in/out).
- **Acceptance:** a manifest fixture suite parses/validates; invalid manifests fail with precise errors.
- **Verify gate:** schema unit tests (≥ 1 per field incl. defaults + 5 invalid cases).
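
The `deps` semantics could start from two checks, sketched here under assumptions. A job stays blocked while any dependency is unknown or has not reached `testing`/`shipped` (the stages named in the manifest comment), and a submission is rejected if it closes a cycle, detected with a standard three-colour depth-first search. The `Stage` union mirrors the canonical stages in §4; the function names are hypothetical.

```typescript
type Stage =
  | "queued" | "assigned" | "building" | "review"
  | "testing" | "shipped" | "failed" | "dead_letter";

// Stages that satisfy a dependency, per the `deps` comment in the manifest example.
const SATISFIED: ReadonlySet<Stage> = new Set<Stage>(["testing", "shipped"]);

/** Blocked while any dep is unknown or has not reached a satisfying stage. */
function isBlocked(deps: string[], stages: Map<string, Stage>): boolean {
  return deps.some((id) => {
    const s = stages.get(id);
    return s === undefined || !SATISFIED.has(s);
  });
}

/** Returns one cycle as job ids (first id repeated at the end), or null if the graph is a DAG. */
function findCycle(graph: Map<string, string[]>): string[] | null {
  const color = new Map<string, "grey" | "black">(); // absent = unvisited
  const path: string[] = [];

  const visit = (id: string): string[] | null => {
    color.set(id, "grey");
    path.push(id);
    for (const dep of graph.get(id) ?? []) {
      const c = color.get(dep);
      if (c === "grey") return [...path.slice(path.indexOf(dep)), dep]; // back edge
      if (c === undefined) {
        const found = visit(dep);
        if (found !== null) return found;
      }
    }
    path.pop();
    color.set(id, "black");
    return null;
  };

  for (const id of graph.keys()) {
    if (!color.has(id)) {
      const found = visit(id);
      if (found !== null) return found;
    }
  }
  return null;
}
```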

---

## 6. Profiles — persona + capability (feature)

A **profile** = a versioned file combining a persona (system-prompt overlay), required capabilities, default gates, preferred engine/model, and allowed repo scopes. Stored as `profiles/<name>.md` (Phase 1) → Cosmos `fleet_profiles` container (Phase 2, §13).

```yaml
# profiles/backend-engineer.md
---
name: backend-engineer
persona: |
  You are a senior backend engineer. Favor minimal, well-tested changes...
capabilities: [node>=20, has:pnpm]
default-verify: pnpm -s typecheck && pnpm -s test
engine-class: agentic-coder
prefers-engine: [devin, claude]
allowed-scope: ["backend/**", "packages/**"] # blast-radius guardrail
review-policy: manual
---
```

- [ ] Author starter catalog: `developer`, `backend-engineer`, `frontend-engineer`, `ux-designer`, `ui-designer`, `qa`, `reviewer`, `docs-writer`.
- [ ] Persona overlay is **prepended** to the job body before the agent runs (and logs are scrubbed of secrets).
- [ ] Profile supplies default `verify`, `capabilities`, `engine-class`, `allowed-scope` when the job omits them.
- [ ] Profile versioning: changing a profile doesn't mutate in-flight jobs (snapshot at assign time).
- [ ] `allowed-scope` enforced as a guardrail (warn in P1, enforce/deny in P2 via pre-flight diff check).
- **Acceptance:** a job with `profile: backend-engineer` and no `verify` inherits the profile's verify + persona.
- **Verify gate:** profile-resolution unit tests; persona-injection golden test.
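
Profile resolution is essentially a defaults merge plus a prompt prefix. The sketch below assumes a profile already loaded from `profiles/<name>.md`, with the YAML keys renamed to camelCase. Three choices here are assumptions: the `version` field, the blank-line separator between persona and body, and taking the union of job and profile capabilities (rather than letting the job replace them), which follows the router's `job ∪ profile` filter.

```typescript
interface Profile {
  name: string;
  version: number; // hypothetical: bumped on every edit
  persona: string;
  capabilities: string[];
  defaultVerify?: string;
  engineClass?: string;
  allowedScope: string[];
}

interface JobInput {
  body: string;
  verify?: string;
  capabilities?: string[];
  engineClass?: string;
  allowedScope?: string[];
}

interface ResolvedJob {
  prompt: string;
  verify?: string;
  capabilities: string[];
  engineClass?: string;
  allowedScope: string[];
  profileSnapshot: { name: string; version: number };
}

function resolveJob(job: JobInput, profile: Profile): ResolvedJob {
  return {
    // Persona first, so the job body can narrow or override it.
    prompt: `${profile.persona.trim()}\n\n${job.body}`,
    // Job values win; the profile only fills gaps.
    verify: job.verify ?? profile.defaultVerify,
    engineClass: job.engineClass ?? profile.engineClass,
    allowedScope: job.allowedScope ?? profile.allowedScope,
    // Union, matching the router's "job ∪ profile capabilities" filter.
    capabilities: Array.from(new Set([...(job.capabilities ?? []), ...profile.capabilities])),
    // Recorded at assign time so later profile edits don't change this job.
    profileSnapshot: { name: profile.name, version: profile.version },
  };
}
```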

---

## 7. The scheduler / router (the heart) (feature)

Given a `queued` job and the current fleet, choose `(factory, station/engine, profile)` and issue a lease.

**Inputs:** job manifest (capabilities, priority, budget, deps, prefers, lock), profile requirements, live factory descriptors (capabilities, load, health, cost class), lock/affinity table, fairness counters.

**Algorithm (deterministic, explainable):**
1. **Filter** factories by **hard capability match** (job ∪ profile capabilities ⊆ factory capabilities) and by having a free station for a compatible engine.
2. **Block** if `deps` are unmet or the `lock` is already held → leave the job `queued`/`blocked`.
3. **Score** each candidate factory:
   `score = w1·capabilityFit + w2·affinity(prefers, repo-stickiness) + w3·(1/load) + w4·costFit(budget) + w5·health − w6·starvationPenalty`
4. **Tie-break:** highest priority job first; then oldest; then lowest cost class.
5. **Assign** → write lease (TTL), set job `assigned`, decrement station capacity, bump fairness counter.
6. **Preemption (P3+):** a `critical` job may pause a `low` job at a needed station (checkpoint + requeue).
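
Because the algorithm is meant to be pure and explainable, steps 1, 3 and 4 fit in a single function with no I/O. This is a sketch under several assumptions: factory descriptors are plain objects with the hypothetical fields below, capabilities are matched as exact tags, step 2 (deps/lock blocking) has already run, and each score term uses a made-up normalisation. It uses `1/(1 + load)` instead of the `1/load` above so an idle factory does not divide by zero, and it reads the starvation term as a penalty on factories that recently received many assignments. Ties fall back to cost class and then factory id, which keeps the decision deterministic.

```typescript
interface FactoryView {
  id: string;
  capabilities: Set<string>;
  freeEngines: Set<string>;  // engines with an idle station right now
  load: number;              // running workers
  health: number;            // 0..1, derived from heartbeats
  costClass: number;         // 0 = cheapest
  warmRepos: Set<string>;    // repos with a warm checkout
  recentAssignments: number; // fairness counter
}

interface JobView {
  capabilities: string[]; // job ∪ profile requirements
  engines: string[];      // acceptable concrete engines, in preference order
  prefers: string[];      // soft hints such as "factory:mac-2"
  repo?: string;
}

type Term = "cap" | "affinity" | "load" | "cost" | "health" | "starvation";
type Weights = Record<Term, number>;

interface Decision {
  factoryId: string;
  engine: string;
  costClass: number;
  score: number;
  breakdown: Record<Term, number>; // what the event log would record as "why"
}

const DEFAULT_WEIGHTS: Weights = { cap: 1, affinity: 2, load: 1, cost: 0.5, health: 1, starvation: 0.5 };

function route(job: JobView, fleet: FactoryView[], w: Weights = DEFAULT_WEIGHTS): Decision | null {
  const candidates: Decision[] = [];
  for (const f of fleet) {
    // 1. Hard filter: every required tag present and a compatible engine free.
    if (!job.capabilities.every((c) => f.capabilities.has(c))) continue;
    const engine = job.engines.find((e) => f.freeEngines.has(e));
    if (engine === undefined) continue;

    // 3. Score. `cap` favours factories that don't tie up capabilities the job doesn't need.
    const warm = job.repo !== undefined && f.warmRepos.has(job.repo);
    const breakdown: Record<Term, number> = {
      cap: f.capabilities.size === 0 ? 1 : job.capabilities.length / f.capabilities.size,
      affinity: warm || job.prefers.includes(`factory:${f.id}`) ? 1 : 0,
      load: 1 / (1 + f.load),
      cost: 1 / (1 + f.costClass),
      health: f.health,
      starvation: f.recentAssignments,
    };
    const score =
      w.cap * breakdown.cap + w.affinity * breakdown.affinity + w.load * breakdown.load +
      w.cost * breakdown.cost + w.health * breakdown.health - w.starvation * breakdown.starvation;
    candidates.push({ factoryId: f.id, engine, costClass: f.costClass, score, breakdown });
  }
  // 4. Deterministic tie-break: higher score, then cheaper cost class, then factory id.
  candidates.sort(
    (a, b) =>
      b.score - a.score ||
      a.costClass - b.costClass ||
      (a.factoryId < b.factoryId ? -1 : a.factoryId > b.factoryId ? 1 : 0),
  );
  return candidates[0] ?? null;
}
```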

- [ ] Implement pure, unit-testable scoring function (no I/O) with configurable weights.
- [ ] Hard-filter correctness: never assign a job to a factory missing a required capability.
- [ ] Affinity/stickiness: same-repo jobs prefer the factory that has the warm checkout (lock-aware).
- [ ] Fairness: no factory or product starves under sustained load (counter + penalty).
- [ ] Explainability: every assignment records *why* (matched caps, score breakdown) in the event log.
- [ ] Determinism: same inputs → same decision (seeded tie-breaks) for testability.
- **Acceptance:** scenario fixtures (10+) produce expected assignments incl. starvation + capability-miss + budget-exceed.
- **Verify gate:** router unit suite ≥ 95% branch coverage on the scoring/filter core.

---

## 8. Factory model & registration (feature)

Each machine runs a **factory agent** (the evolved `agent-queue` runner) that registers, heartbeats, claims jobs, and reports events.

- [ ] **Capability auto-detection** at boot: OS, installed engines (devin/claude/codex/copilot), tool probes (xcode, figma-cli, docker, gpu), node/pnpm versions, available creds (presence only, never values).
- [ ] **Registration**: `POST /fleet/factories` with descriptor → receives a factory id + token.
- [ ] **Heartbeat**: periodic `PUT /fleet/factories/:id/heartbeat` (load, free stations, health); after N missed heartbeats the factory is marked `offline` and its leases are reclaimed.
- [ ] **Claim loop**: `POST /fleet/leases/claim` advertising capabilities/free stations; receives a job + lease TTL.
- [ ] **Report**: stream stage/log/event back (`POST /fleet/runs/:id/events`); renew lease while alive.
- [ ] **Graceful drain**: factory can stop claiming, finish in-flight, deregister.
- **Acceptance:** a factory registers, claims a matching job, heartbeats, completes, and a killed factory's job is reclaimed by another within the lease TTL.
- **Verify gate:** factory-agent integration test against a mock coordinator; crash-recovery test.
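
The auto-detected descriptor and the requirement strings in manifests need a shared matching rule, which this roadmap does not spell out yet. One possible convention, sketched purely as an assumption: plain tags such as `has:xcode` match exactly, `os:any` always matches, and `name>=N` is satisfied when the factory advertises `name:<version>` with a major version of at least N. Returning the unmet requirements, rather than a bare boolean, gives the router the "why" it is expected to log.

```typescript
/** Hypothetical requirement syntax: "os:any", "has:xcode", "node>=20". */
function meets(requirement: string, advertised: string[]): boolean {
  if (requirement === "os:any") return true;
  const m = /^([\w-]+)>=(\d+)$/.exec(requirement);
  if (m === null) return advertised.includes(requirement); // plain tag: exact match
  const name = m[1] ?? "";
  const min = Number(m[2]);
  // Versions are assumed to be advertised as "<name>:<semver>", e.g. "node:22.3.0".
  return advertised.some((tag) => {
    if (!tag.startsWith(`${name}:`)) return false;
    const major = Number.parseInt(tag.slice(name.length + 1), 10);
    return Number.isFinite(major) && major >= min;
  });
}

/** Requirements the factory cannot satisfy; an empty list means it is eligible. */
function unmetCapabilities(required: string[], advertised: string[]): string[] {
  return required.filter((r) => !meets(r, advertised));
}
```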

---

## 9. Coordination architecture (decision + path)

Three transports were evaluated. **Decision: platform-service-native coordinator is the spine; git-queue stays for the offline edge; broker added only at scale.**

| Option | Pros | Cons | Verdict |
| ------ | ---- | ---- | ------- |
| (a) **Git-synced queue** (evolve folders) | zero infra, audit-by-commit, offline | weak/racey leasing, latency, merge churn | **Edge/offline only** |
| (b) **Coordinator service** (platform-service module) | real leases, fairness, observability, reuses auth/Cosmos/productId | a service to run | **Chosen spine (P2)** |
| (c) **Message broker** (NATS/Redis/SQS) | scale, backpressure, push dispatch | most moving parts/ops | **P4 when throughput demands** |

- [ ] Document the decision + rationale in-repo (this section is the canonical record).
- [ ] Define the **claim/lease protocol** once; both git-queue (poll) and service (API) implement it.
- [ ] Offline-degrade: a factory cut off from the coordinator falls back to its local git-queue and reconciles on reconnect (idempotency-key prevents double-execution).
- **Acceptance:** the same job manifest runs identically through the bash/git path and the service path.
- **Verify gate:** contract test asserting protocol parity (git vs service).
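
To make "define the claim/lease protocol once" concrete, here is a hypothetical in-memory lease table with an injectable clock: `claim` leases the first job in queue order that nobody holds, `renew` extends a live lease only for its holder, and an expired lease makes the job claimable again. The names and the millisecond TTL are assumptions. A shared store would also need `claim` to be an atomic compare-and-set (in Cosmos, for example, an ETag-conditioned write), which a single-threaded map gets for free.

```typescript
interface Lease {
  jobId: string;
  holder: string;    // factory id
  expiresAt: number; // epoch ms
}

/** Hypothetical in-memory lease table; `now` is injectable so expiry is testable. */
class LeaseTable {
  private readonly leases = new Map<string, Lease>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  /** The live lease for a job; an expired one is dropped, making the job reclaimable. */
  private live(jobId: string): Lease | undefined {
    const lease = this.leases.get(jobId);
    if (lease !== undefined && lease.expiresAt <= this.now()) {
      this.leases.delete(jobId);
      return undefined;
    }
    return lease;
  }

  /** Lease the first job in `queue` order that nobody currently holds. */
  claim(queue: string[], holder: string): Lease | null {
    for (const jobId of queue) {
      if (this.live(jobId) !== undefined) continue;
      const lease: Lease = { jobId, holder, expiresAt: this.now() + this.ttlMs };
      this.leases.set(jobId, lease);
      return lease;
    }
    return null;
  }

  /** Only the current holder may renew, and only before the lease expires. */
  renew(jobId: string, holder: string): boolean {
    const lease = this.live(jobId);
    if (lease === undefined || lease.holder !== holder) return false;
    lease.expiresAt = this.now() + this.ttlMs;
    return true;
  }

  /** Called when the holder reports a terminal stage. */
  release(jobId: string, holder: string): void {
    if (this.live(jobId)?.holder === holder) this.leases.delete(jobId);
  }
}
```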

---

## 10. tracker-web / platform-service integration (committed path)

**Layering:** tracker = *WHAT/WHY* (plan, intake, prioritize, roadmap, votes) · gigafactory = *HOW* (execute) · platform-service = shared brain · agent-queue runner = offline edge. Grounded in the real `tracker-service` model (`Item`: `type` bug/feature/**task**, `status` open/in_progress/done/closed/wont_fix, priority, labels, assignee, `source` incl. **auto_detected**, votes, comments, public roadmap) and the `tracker-web` `/api/tracker/[...path]` proxy pattern.

### Phase 1 — Adapter (no new infra)
- [ ] **task → job**: a tracker `Item` of `type: task` (e.g. `assignee: @agent` or label `agent:run`) is exported to a job `.md` (manifest mapped: title/description → body, priority → priority, labels → capabilities/profile hints).
- [ ] **job → tracker**: lifecycle events post back as **status updates + comments** — `building` → status `in_progress` + comment "started on factory X"; `shipped` → `done` + comment with commit SHAs / PR link / verify results; `failed` → comment with reason (status stays `in_progress` for human triage).
- [ ] Idempotency: re-running the adapter for the same item doesn't create duplicate jobs (idempotency-key = item id + content hash).
- [ ] Adapter is a thin script/CLI (`aq from-tracker ITEM-789`) + optional poller.
- **Acceptance:** filing a tracker task and labeling it `agent:run` results in a queued job; on ship, the item flips to `done` with a SHA comment.
- **Verify gate:** adapter e2e against a tracker-service test instance (or mock); round-trip assertion.
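
A sketch of both directions plus the idempotency key, under stated assumptions. The `TrackerItem` shape is a trimmed-down, hypothetical subset of the fields named above (including its priority values), `agent:run` is the trigger label, and `profile:<name>` labels become the profile hint (a made-up convention). The 32-bit FNV-1a hash stands in for the content hash only to stay dependency-free; a production adapter would more plausibly use SHA-256. Failed jobs deliberately map to no status change, so the item stays `in_progress` for triage.

```typescript
interface TrackerItem {
  id: string;
  type: "bug" | "feature" | "task";
  title: string;
  description: string;
  priority: "critical" | "high" | "medium" | "low";
  labels: string[];
}

type JobStage = "building" | "shipped" | "failed";

/** 32-bit FNV-1a over UTF-16 code units; a dependency-free stand-in for a real content hash. */
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

/** task → job: only `type: task` items labeled `agent:run` are exported. */
function itemToJob(item: TrackerItem) {
  if (item.type !== "task" || !item.labels.includes("agent:run")) return null;
  const body = `# ${item.title}\n\n${item.description}`;
  const profile = item.labels.find((l) => l.startsWith("profile:"))?.slice("profile:".length);
  return {
    body,
    priority: item.priority,
    profile,
    trackerItem: item.id,
    // Same item + same content → same key → resubmitting is a no-op.
    idempotencyKey: `${item.id}:${fnv1a(body)}`,
  };
}

/** job → tracker: status to set for a stage; undefined means "comment only". */
function trackerStatusFor(stage: JobStage): "in_progress" | "done" | undefined {
  if (stage === "building") return "in_progress";
  if (stage === "shipped") return "done";
  return undefined; // failed: status stays in_progress for human triage
}
```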

### Phase 2 — Native spine
- [ ] Stand up a `fleet` (a.k.a. `orchestrator`) module **inside platform-service**, sibling to `tracker-service`: pattern `types.ts → repository.ts → routes.ts`, ESM, Cosmos, `productId`, `req.log`.
- [ ] Endpoints: jobs CRUD, claim/lease, events/report, factories register/heartbeat, profiles, stats.
- [ ] Runners (bash + any) become API clients of this module; tracker adapter calls it directly.
- **Acceptance:** a job submitted via the module is claimed by a real factory and shipped, with all state in Cosmos.
- **Verify gate:** module test suite (repository + routes) using the shared `@bytelyst/testing` inject helpers.

### Phase 3 — Unified control plane
- [ ] Add a **Fleet** surface to `tracker-web` reusing auth/Primitives/DataTable/product switcher: fleet map (factories + load/health), job table, job DAG, **live log streaming (SSE)**, lease/heartbeat status, cost burndown, approve/ship buttons.
- [ ] The Node TUI dashboard becomes a thin client of the same `/fleet` API (parity with web).
- **Acceptance:** an operator can watch all factories + tail any job log + ship from the browser.
- **Verify gate:** web e2e (Playwright) covering fleet map render, live log, and a ship action.
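
The live-log stream could use plain Server-Sent Events, which browsers consume with `EventSource` and a Node TUI can parse line by line. In the `text/event-stream` format each message is a block of `field: value` lines terminated by a blank line, and multi-line text has to be sent as several `data:` lines. The encoder below is a sketch; the `log` event name, the `LogChunk` shape, and the `<jobId>:<seq>` event id (so a reconnecting client can resume from `Last-Event-ID`) are assumptions.

```typescript
interface LogChunk {
  jobId: string;
  seq: number;  // monotonically increasing per job
  text: string; // may contain several lines
}

/** A comment line: EventSource ignores it, but it keeps idle proxies from closing the stream. */
const SSE_KEEPALIVE = ": keep-alive\n\n";

/** One SSE message: event/id fields, one `data:` line per text line, blank-line terminator. */
function toSse(chunk: LogChunk, event = "log"): string {
  const data = chunk.text
    .split(/\r\n|\r|\n/)
    .map((line) => `data: ${line}`)
    .join("\n");
  return `event: ${event}\nid: ${chunk.jobId}:${chunk.seq}\n${data}\n\n`;
}
```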

---

## 11. Lifecycle & gates at scale (feature)

- [ ] Canonical stages enforced server-side: `queued → assigned → building → review → testing → shipped` (+ `failed`, `dead_letter`).
- [ ] Per-profile default `verify`; per-job override; verify runs at the factory, result reported as an event.
- [ ] Human gates: `review-policy` routes to reviewers; multi-reviewer support (P3).
- [ ] **Dead-letter**: after `retry.max` exhausted, job → `dead_letter` with full diagnostics; never silently dropped.
- [ ] **Backpressure**: when no factory can take more, jobs stay `queued` (no thrash); SLA timers visible.
- **Acceptance:** a perpetually-failing job lands in `dead_letter` after configured retries; a passing one auto-advances to `testing` then waits for human `ship`.
- **Verify gate:** lifecycle state-machine unit tests (all transitions + illegal-transition rejection).
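
The server-side state machine and the dead-letter rule fit naturally into a transition table, which is also the unit the "all transitions + illegal-transition rejection" gate would test. The edges below are one reading of the lifecycle in this document and should be treated as assumptions. Lease loss returns `assigned`/`building` jobs to `queued`, a verify failure sends `review` to `failed`, a human reject sends `testing` to `failed`, and `failed` either retries or dead-letters.

```typescript
type Stage =
  | "queued" | "assigned" | "building" | "review"
  | "testing" | "shipped" | "failed" | "dead_letter";

const NEXT: Record<Stage, readonly Stage[]> = {
  queued: ["assigned"],
  assigned: ["building", "queued"],         // queued again if the lease expires
  building: ["review", "failed", "queued"], // rc=0 → review, rc≠0 → failed, lease lost → queued
  review: ["testing", "failed"],            // verify pass → testing, verify fail → failed
  testing: ["shipped", "failed"],           // human ship, or human reject
  shipped: [],                              // terminal
  failed: ["queued", "dead_letter"],        // retry, or give up with diagnostics
  dead_letter: ["queued"],                  // only via an explicit manual requeue
};

function transition(from: Stage, to: Stage): Stage {
  if (!NEXT[from].includes(to)) throw new Error(`illegal transition ${from} -> ${to}`);
  return to;
}

/** After the Nth failed attempt: retry while N <= retry.max, otherwise dead-letter (never drop). */
function afterFailure(failedAttempts: number, retryMax: number): Stage {
  return failedAttempts <= retryMax ? "queued" : "dead_letter";
}
```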

---

## 12. Security, safety & governance (feature — critical with `yolo`/dangerous)

- [ ] **Secret isolation**: creds live on each factory (env/keychain), **never** in the queue, manifest, logs, or Cosmos. Factory advertises *presence* of a cred capability, not the value.
- [ ] **Scoped git tokens** per factory/repo; least-privilege; rotation documented.
- [ ] **Push policy**: protected branches; agents push to feature branches + open PRs by default; direct-to-main gated by profile/flag.
- [ ] **Blast-radius guardrail**: enforce `allowed-scope` — pre-flight + post-run diff check; out-of-scope changes block the ship gate.
- [ ] **Budget kill-switch**: exceed `budget` (usd/tokens/wall) → pause worker, alert, require human resume.
- [ ] **Supply-chain safety**: edits to shared `@bytelyst/*` packages require `reviewer` profile + human gate (never auto-ship).
- [ ] **Audit trail**: append-only event log per job (who/what/when/where/cost); immutable.
- [ ] **Corp network/proxy**: honor `NETWORK`/proxy + truststore conventions on factories that need them.
- [ ] **Kill switch (global)**: one command/flag halts all claiming fleet-wide (incident response).
- **Acceptance:** a job attempting an out-of-scope edit is blocked at the gate; a budget overrun pauses and alerts; no secret ever appears in any persisted artifact (scanner test).
- **Verify gate:** security test suite incl. a secret-leak scanner over logs/meta + scope-enforcement test.
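
The scope guardrail becomes testable once the post-run diff is reduced to a list of changed paths (for example from `git diff --name-only`) and matched against the profile's `allowed-scope` globs. The matcher below is a deliberately tiny, hypothetical glob dialect in which `**` may cross `/`, `*` may not, and nothing else is special. A real implementation would more likely rely on an established glob library.

```typescript
/** Convert the tiny glob dialect to an anchored RegExp: `**` crosses "/", `*` does not. */
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*" && glob.charAt(i + 1) === "*") {
      re += ".*";
      i++; // consume the second "*"
    } else if (ch === "*") {
      re += "[^/]*";
    } else {
      re += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

/** Changed paths outside every allowed glob; a non-empty result should block the ship gate. */
function outOfScope(changedPaths: string[], allowedScope: string[]): string[] {
  const patterns = allowedScope.map(globToRegExp);
  return changedPaths.filter((p) => !patterns.some((re) => re.test(p)));
}
```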

---

## 13. Data model (Cosmos containers, P2+)

Each container is partitioned on the key (pk) listed below; every doc has `productId`.

- [ ] `fleet_jobs` (pk `/productId`) — manifest snapshot, current stage, idempotency-key, tracker-item link.
- [ ] `fleet_runs` (pk `/jobId`) — one per execution attempt: factory, engine, profile snapshot, timings, cost, exit, verify result.
- [ ] `fleet_leases` (pk `/jobId`) — holder factory, TTL, renewals; TTL index for auto-expiry.
- [ ] `fleet_factories` (pk `/productId`) — descriptor, capabilities, health, load, last heartbeat.
- [ ] `fleet_profiles` (pk `/productId`) — versioned profile snapshots.
- [ ] `fleet_events` (pk `/jobId`) — append-only audit/event stream (stage changes, logs ptr, cost ticks, decisions).
- [ ] Relate to existing tracker `Item` via `tracker-item` (no duplication of planning data).
- **Acceptance:** repository CRUD + query tests per container; lease TTL expiry verified.
- **Verify gate:** repository unit/integration tests (memory + Cosmos provider via `DB_PROVIDER`).

---

## 14. Phased build roadmap (checklists)

Each phase: **Goal → checklist → Exit criteria**. Don't start a phase until the prior phase's Exit criteria are green. Tick boxes here as the canonical progress.

### Phase 1 — Manifest + profiles + capabilities + tracker adapter (single host)
**Goal:** richer single-host runner that understands profiles/capabilities and bridges to tracker — no distributed infra yet.

- [ ] Extend `agent-queue.sh` frontmatter parsing for all new manifest fields (§5), defaulted + backward-compatible.
- [ ] Add `profiles/` directory + profile resolution (persona injection, default verify/caps/scope) (§6).
- [ ] Local capability detection + a job/factory capability match check before launch (§8 subset).
- [ ] `priority` ordering in the inbox pick (replace pure FIFO with priority-then-age).
- [ ] `deps` (DAG) blocking on a single host; `idempotency-key` dedupe on `add`.
- [ ] `retry` with backoff into `failed`/requeue; `budget.wall` enforced (extends `timeout`).
- [ ] `allowed-scope` guardrail (warn-only this phase) + post-run diff report.
- [ ] **Tracker adapter** `aq from-tracker <ITEM>` + `aq to-tracker` event poster (§10 P1).
- [ ] Dashboard shows profile + priority + capability tags + tracker-item link.
- [ ] Update `selftest.sh` with: manifest parse fixtures, profile resolution, priority order, dep-block, idempotency, adapter round-trip (mock).
- [ ] Update README + this doc's progress table.
- **Exit criteria:** all boxes ✅; `selftest.sh` green; a tracker task → executed → tracker `done` with SHA comment, fully on one host; no regression to Phase-0 `.md` files.
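
Replacing FIFO with priority-then-age is a two-key ordering. The sketch below assumes each inbox entry exposes its manifest `priority` and a submission timestamp; in the bash runner that might be the file's modification time, which is an assumption. Jobs blocked on `deps` or a held `lock` are skipped through a predicate rather than reordered, and the job id is a final tie-break so the pick is deterministic.

```typescript
type Priority = "critical" | "high" | "medium" | "low";

interface InboxEntry {
  id: string;
  priority: Priority;
  submittedAt: number; // epoch ms; lower = older
}

const RANK: Record<Priority, number> = { critical: 0, high: 1, medium: 2, low: 3 };

/** Next runnable job: highest priority first, then oldest, then id so ties are deterministic. */
function pickNext(inbox: InboxEntry[], isRunnable: (e: InboxEntry) => boolean): InboxEntry | undefined {
  return [...inbox] // copy so the caller's array is not reordered
    .sort(
      (a, b) =>
        RANK[a.priority] - RANK[b.priority] ||
        a.submittedAt - b.submittedAt ||
        (a.id < b.id ? -1 : a.id > b.id ? 1 : 0),
    )
    .find(isRunnable);
}
```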

### Phase 2 — Coordinator as platform-service module + Cosmos + multi-factory leasing
**Goal:** the service spine; ≥2 real factories executing in parallel via leases.

- [ ] Scaffold `fleet`/`orchestrator` module in `platform-service` (`types/repository/routes`, Zod, ESM, `productId`).
- [ ] Cosmos containers (§13) + repository layer (memory + Cosmos providers).
- [ ] Claim/lease protocol endpoints + TTL expiry + reclaim (§8, §9).
- [ ] Port `agent-queue` runner to a **factory agent** API client (register/heartbeat/claim/report) while keeping git-queue fallback.
- [ ] Scheduler/router core (§7) as a pure module + wired into assignment.
- [ ] Tracker adapter calls the module directly (not just file export).
- [ ] Auth: factory tokens; scoped; secret isolation enforced (§12 subset).
- [ ] Module test suite (repository + routes via `@bytelyst/testing`); crash-recovery + lease-expiry tests.
- [ ] Two-factory demo (e.g. mac + ubuntu) running 3 parallel jobs end-to-end.
- **Exit criteria:** all boxes ✅; `pnpm --filter @lysnrai/platform-service test` green; killing a factory mid-job → another reclaims and completes; all state in Cosmos with `productId`.

### Phase 3 — Fleet control plane in tracker-web + DAG + budgets + scoring router
**Goal:** one browser control plane; smart routing + budgets live.

- [ ] `fleet` API client in `tracker-web` (reuse `/api/tracker`-style proxy → `/fleet`).
- [ ] Fleet map page (factories, load, health, capabilities) on `@bytelyst/*` components.
- [ ] Job table + job detail + **DAG view**; live log via **SSE**; approve/ship/reject/requeue actions.
- [ ] Cost burndown + budget kill-switch UI; multi-reviewer routing.
- [ ] Scoring router with configurable weights + explainability surfaced in UI.
- [ ] Preemption of low-priority by critical jobs (checkpoint + requeue).
- [ ] TUI dashboard re-pointed at `/fleet` API (parity).
- [ ] Web e2e (Playwright): fleet map, live log, ship, budget-pause.
- **Exit criteria:** all boxes ✅; web `verify` (typecheck+lint+test+e2e) green; an operator runs the whole 3-repo parallel workload from the browser, including a budget pause + resume.

### Phase 4 — Message bus + autoscaling + cross-OS capability marketplace
**Goal:** scale-out and elasticity.

- [ ] Introduce broker (NATS/Redis) for push dispatch + backpressure; coordinator publishes, factories subscribe by capability.
- [ ] Autoscaling hooks (spin ephemeral factories: cloud VM / container) keyed to queue depth + SLA.
- [ ] Capability "marketplace": jobs requiring rare caps (xcode/figma/gpu) routed to the few factories that have them; queueing fairness across products.
- [ ] Load + chaos test suite (factory churn, broker outage, thundering herd).
- **Exit criteria:** all boxes ✅; sustained N×throughput vs Phase 3 under load test; graceful degradation on broker outage (fallback to poll).

### Phase 5 — Self-optimizing / learned routing
**Goal:** the scheduler learns from history to cut time/cost and raise first-pass success.

- [ ] Capture outcome features per run (engine, profile, repo, duration, cost, verify pass, human-edit rate).
- [ ] Offline eval harness comparing learned vs heuristic routing on historical data.
- [ ] Shadow/A-B rollout with guardrails; auto-tune scoring weights.
- [ ] Recommendations surfaced ("route NomGap UX jobs to claude on mac-2: 23% faster, 11% cheaper").
- **Exit criteria:** all boxes ✅; learned router beats heuristic on the eval set without regressing safety gates; A/B shows measurable improvement on a target metric.

---

## 15. Cross-cutting feature catalog (quick index)

| Feature | First phase | Section |
| ------- | ----------- | ------- |
| Evolved job manifest | P1 | §5 |
| Profiles (persona + capability) | P1 | §6 |
| Capability matching | P1→P2 | §6/§8 |
| Priority + SLA | P1 | §5/§7 |
| DAG dependencies | P1→P3 | §5/§11 |
| Idempotency / dedupe | P1 | §5 |
| Retry + dead-letter | P1→P2 | §11 |
| Budgets + kill-switch | P1(wall)→P3 | §5/§12 |
| Scheduler/router scoring | P2→P3 | §7 |
| Factory registration/heartbeat/lease | P2 | §8 |
| Coordinator (platform-service module) | P2 | §9/§10 |
| Cosmos data model | P2 | §13 |
| Tracker bi-directional sync | P1→P2 | §10 |
| Web control plane + SSE logs | P3 | §10/§17 |
| Security/scope/secret isolation | P1→P2 | §12 |
| Broker + autoscaling | P4 | §14 |
| Learned routing | P5 | §14 |

---

## 16. Definition of Done — the "100% accuracy" rubric

A feature/phase is **not done** until **every** item below is true (this is the bar for "100% end-to-end"):

- [ ] **Functionality**: acceptance criteria met; happy path + documented edge cases handled.
- [ ] **Tests**: unit + integration written *first or alongside*, all green; no weakened/deleted tests; coverage targets met (router ≥95% core).
- [ ] **Verify gate**: the phase's named gate command passes locally (and in CI where applicable).
- [ ] **Idempotency & recovery**: re-runs are safe; crash mid-step recovers (lease/idempotency).
- [ ] **Security review**: secret-leak scan clean; scope guardrail honored; least-privilege tokens.
- [ ] **Observability**: events/logs/metrics emitted; failures are diagnosable from the control plane.
- [ ] **Docs**: this roadmap's checkboxes ticked; README/AGENTS updated; manifest/profile docs current.
- [ ] **Backward-compat**: existing `.md`/Phase-0 behavior unbroken (regression check).
- [ ] **Drift checks**: shared-infra templates (`.npmrc`, `docker-prep`) untouched/synced; conventional commits.
- [ ] **No `console.log`/`print`** in service code; `req.log`/`os.Logger` used; ESM `.js` imports.

---

## 17. Observability & control plane details

- [ ] **Live logs** via SSE from factory → coordinator → web/TUI (single stream contract).
- [ ] **Metrics**: queue depth, assign latency, run duration, verify pass-rate, cost, factory utilization, fairness.
- [ ] **Alerting**: stall (no log output for N min), failure spikes, budget breach, factory offline, dead-letter.
- [ ] **Tracing**: a job's full timeline (queued→…→shipped) reconstructable from `fleet_events`.
- [ ] **Cost burndown** per job/product/day with budget overlays.

---

## 18. Risks & gaps explicitly tracked (expert call-outs)

- [ ] **Duplicate execution** across transports (git fallback + service) — mitigated by `idempotency-key` + lease.
- [ ] **Crash recovery** — lease TTL + reclaim; checkpoint long jobs where engines allow.
- [ ] **Shared-package conflicts** — two jobs editing `@bytelyst/*` simultaneously → lock + reviewer gate.
- [ ] **Starvation/fairness** — per-product + per-factory counters with penalty.
- [ ] **Cost runaway** — hard budgets + global kill switch.
- [ ] **Tool-version drift / reproducibility** — record engine + tool versions per run; pin where possible.
- [ ] **Windows quirks** — path/shell differences in the factory agent; capability-gate Windows-only work.
- [ ] **Human-review bottleneck** — auto-verify as much as possible; batch review UI; reviewer routing.
- [ ] **Result capture beyond commits** — artifacts (coverage, screenshots, build logs) attached to runs.
- [ ] **Secret sprawl** — never in queue/manifest/logs/Cosmos; presence-only capabilities.
- [ ] **Data retention** — event/log retention + archival policy (extend today's `clean`).
- [ ] **Engine API churn** — engines mapped in one place (`build_agent_cmd`); capability matrix versioned.

---

## 19. Success metrics

- Throughput: jobs shipped/day; parallel utilization (% of fleet busy).
- Quality: % auto-verified, first-pass success rate, escaped-defect rate, human-edit rate post-agent.
- Speed: mean time queued→shipped; assign latency.
- Cost: $/shipped job; budget-breach rate.
- Reliability: lease-reclaim success, dead-letter rate, factory uptime.
- Fairness: max/min product wait-time ratio.

---

## 20. Open questions

- [ ] Copilot headless feasibility as an engine/station (CLI/automation surface?).
- [ ] Who owns merge/push authority — agents open PRs only, or auto-merge on green for low-risk profiles?
- [ ] Multi-user/tenant: per-user queues + RBAC in the control plane?
- [ ] On-call/ownership for the fleet (alerts routing, runbooks)?
- [ ] Cloud factory provisioning (Phase 4) — which provider/runtime, cost guardrails?
- [ ] Profile authorship/governance — who can create/edit profiles, and review of persona prompts?

---

*This document is the single source of truth for the gigafactory build. Keep the §0 table and per-phase checkboxes updated; a phase ships only when its Exit criteria and the §16 Definition-of-Done rubric are fully green.*