Adds AQ_FLEET_GATE (default OFF): the run loop point-reads the cheap per-product
queue version (GET /fleet/queue-state) and SKIPS the expensive /fleet/claim while
the version is unchanged and it is not mid-drain, with a periodic safety backstop
and fail-open-on-read-error so work is never stranded. Keeps POLL_SECONDS for
local job responsiveness rather than raising it globally. selftest 39b covers the
gate decisions; reconciles the M0 section of the dispatch redesign doc.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Proposes moving fleet work-dispatch off Cosmos busy-polling onto Azure Service
Bus in a coordinator-owns-scheduling / broker-owns-delivery hybrid, fixing the
product-as-queue routing smell and the idle-poll RU cost. Includes phased
migration (M0 RU quick win -> shadow -> cutover -> scale-to-zero) with a ticked
checklist. Self-reviewed (v2) for the outbox/change-feed, message-size,
long-job lock, idempotency, and routing-model consistency issues.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Devin's export has tokens but no USD cost; estimate cost_usd from a per-model
$/1M price map (Opus/Sonnet/Haiku) and flag usage_estimated so the dashboard
shows it as approx.
When AQ_FLEET_REPO_BASE/<repo> is an existing checkout, create a git worktree off it
for branch aq/job/<id> (shares objects + remotes, leaves the main checkout
untouched) instead of cloning. Falls back to clone for remote-only repos. selftest
exercises the worktree path.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Claim now carries verify (drives the existing verify gate -> PR opens only if it
passes) and autoMerge (squash-merge via gh pr merge after the PR opens, non-fatal).
selftest covers both.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
In PR mode the agent is asked to write .aq_pr.md (line 1 = PR title, then a markdown
description) based on the task + the diff it produced. The factory reads it for
`gh pr create` (via --body-file) and removes it before committing (never part of the
PR). Falls back to a derived title if absent. selftest asserts the authored title is
used and .aq_pr.md is not committed.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
MVP: when AQ_FLEET_REPO_BASE/<repo> is an existing local checkout, use it as the
clone source (fast, no network) and push/PR to its GitHub origin — embedded creds
in the local origin URL are stripped (gh credential helper handles auth). Selftest
PASS (full-path bare-repo fallback unchanged).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
When AQ_FLEET_PR=1 and a claimed fleet job carries a `repo`, run the agent in an
isolated checkout on branch aq/job/<fleetJobId> (off baseBranch), then on a passing
verify commit/push and `gh pr create`. The PR URL + branch are recorded in the meta
and reported on lease release (-> the coordinator stores them on the run).
- fleet-client: parse repo/baseBranch from the claim, carry them in frontmatter;
fleet_report_insights now sends prUrl/branch.
- _fleet_pr_prepare (clone/fetch + branch, local-path aware, identity fallback) and
_fleet_pr_open (commit/push/gh pr create). WIP checkpointing is skipped for PR jobs
(the pushed branch is the durable artifact).
- New flags: AQ_FLEET_PR, AQ_FLEET_REPOS_DIR, GH_BIN. README documented.
- selftest: +1 case (bare-repo origin + gh stub) — branch pushed, PR opened, prUrl
reported on release. Full self-test PASS.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Previously the factory reported up to `review` and "shipping is always manual",
so a coordinator job never reached a terminal stage autonomously.
- On a passing local verify, always report `testing` to the coordinator so its
stage reflects that QA passed (was stuck at `review`).
- New AQ_FLEET_AUTOSHIP=1: the factory's verify gate IS the test phase, so advance
the coordinator job testing -> shipped and land it in shipped/ locally. This
closes the testing->shipped gap for an autonomous submit -> shipped pipeline.
Default off keeps the human review gate authoritative (job rests at testing).
selftest: +2 cases (autoship reports testing+shipped + lands in shipped/; autoship
OFF reports testing but withholds shipped). Full self-test PASS.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Devin does not surface token/cost in its stdout or local log, so parse_usage
previously emitted nothing for the devin engine (runs showed no metrics). Devin
DOES expose per-step usage in its ATIF conversation export.
- build_agent_cmd: pass `--export <path>` for the devin engine (path derived from
the job log path so parse_usage can find it; harmless 4th arg for other engines).
- parse_usage devin: read the export and sum per-step metadata.metrics
input_tokens / output_tokens / cache_read_tokens; take model from agent.model_name.
Pure grep/awk, no new dependency. USD cost is left unset (the export carries token
counts but not cost) — the dashboard shows tokens + model, cost stays blank.
These feed fleet_report_insights, so live devin fleet runs now report tokens +
model to the coordinator (verified live: model "Claude Opus 4.8", tokensIn/out +
cache populated on a real run).
selftest: +1 case (parse_usage devin sums per-step tokens + model from --export).
Full self-test PASS.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
#1 fleet_report_insights: on a successful fleet run the factory now reports the
parsed cost/token/effort metrics (model, tokensIn/Out/cached, costUsd, turns,
toolCalls) plus the run result onto the coordinator run via POST
.../lease/release (which also frees the lease). parse_usage already extracted
these into the job meta; they were never sent. Engines that do not expose usage
locally (devin) still land result + endedAt.
#2 normalize AQ_FLEET_API: platform-service mounts fleet under /api, so a base
without it silently returned 404 on every call. Strip a trailing slash and
append /api unless already present, so AQ_FLEET_API=http://host:4003 works too.
selftest: +2 cases (insights reported via lease/release; API-base normalization).
Full self-test PASS.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Document how the daemon + agents must run after a review found jobs executing
in --yolo/dangerous mode directly against live working trees (the root cause of
repo dirtiness + duplicate commits). Policy: per-job worktree off origin/main,
branch-per-task + PR, yolo:false by default (dangerous only in disposable
sandboxes), clean-tree contract, one writer per repo. Linked from the README.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Rename agent-queue/docs/gigafactory/ to docs/GIGAFACTORY/ and update every
reference (README, system-overview code-map, and all phase job specs). Add an
index README that lists the docs and points to the companion docs in
learning_ai_common_plat. Docs-only; no behavior change.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move GIGAFACTORY_ROADMAP.md and GIGAFACTORY_SYSTEM_OVERVIEW.md under
agent-queue/docs/gigafactory/ so the scattered top-level docs are easy to
discover. Update the README links, the overview code-map, and all phase
job-spec source-of-truth paths to the new location. Pure docs move; no
behavior change.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Jobs move through .state/inbox/building/testing/review/failed/shipped/logs at
runtime, which constantly dirtied the repo and blocked clean rebases. Ignore
the per-job lifecycle files (keeping each dir via .gitkeep) and stop tracking
the consumed inbox job instances.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Add GIGAFACTORY_SYSTEM_OVERVIEW.md — a current-state companion to the roadmap
spec covering: what the Agent Gigafactory is, a completion snapshot, three
Mermaid diagrams (component architecture, job-lifecycle state machine, atomic
claim + lease-fencing sequence), the Cosmos data model, the scoring router,
subsystem map, full /fleet REST surface, feature flags, the two control planes,
a cross-repo code map, test coverage, next steps (Phase 4/5), and an honest
bugs/gaps/risks section. All three Mermaid blocks validated with mermaid.parse.
Also correct documentation drift in GIGAFACTORY_ROADMAP.md found during the
review:
- §0 progress table showed Phase 3 as "0% not started" while every Phase-3 box
is ticked; updated phases 1-3 to done with realistic percentages.
- Phase-2 boxes "scheduler/router wired into assignment", "tracker adapter
direct call", and "factory enrollment + scoped tokens" are implemented in
common-plat (coordinator.ts uses selectJob; routes.ts enforces
enrollment.enforceFactoryToken; tracker-bridge.ts) but were left unticked —
ticked with evidence and refreshed the stale "remaining for 100%" notes.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add an opt-in fleet mode to the dashboard so an operator can drive the
coordinator fleet from the same TUI used for the local folder queue.
- lib/fleet-dash.mjs: dependency-injectable read/act adapter over the
platform-service /fleet REST surface (jobs, metrics, factories, events,
ship/requeue/reject). Pure-ish + fully unit-testable without a live service.
- dashboard.mjs: render + act in fleet mode when AQ_FLEET_DASH=1 — board with
counts, factories (per-factory rows or metrics aggregate), alerts, running
(by lease/factory), actionable JOBS with manifest tags, recent, and a
per-job events log. Single-flight async refresh keeps the last good board on
failure; ship re-GETs a fresh leaseEpoch before PATCH; run/stop/promote are
disabled (no safe server contract). Local mode is byte-for-byte unchanged.
- lib/fleet-dash.test.mjs: 22 node:assert assertions (config, stage mapping,
toBoard, fetch headers/timeout/errors, board assembly + graceful degradation,
events, job actions) wired into selftest.sh.
- docs: tick the Phase 3 "TUI re-pointed at /fleet" roadmap boxes.
Verified: selftest.sh green (incl. new fleet-dash checks); live non-TTY render
smoke against a stub /fleet server (both factories and metrics-aggregate paths);
local mode unchanged.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Render a per-job tags line on the RUNNING workers and JOBS lists showing the
routing inputs operators care about: priority, profile, capabilities, and the
tracker-item reference. Tags come from the launched meta, falling back to the
job's .md frontmatter for never-launched inbox jobs (new readManifest parser).
The tracker-item becomes a clickable terminal hyperlink when AQ_TRACKER_WEB is
set. Also renders the new budget_exceeded result as a failed RECENT row.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Parse the wall ceiling from the budget manifest map (budget: { wall: <dur> })
and arm it alongside the per-run timeout. Whichever ceiling fires first binds;
the kill is recorded as result=timeout or result=budget_exceeded accordingly.
budget.wall extends timeout: a job with only a budget.wall (no timeout) is now
hard-killed at the ceiling. budget_exceeded is a terminal, non-retryable class
by default and maps to the failed tracker status.
Adds _budget_wall_secs + _effective_kill helpers (pure, unit-tested) and live
selftest coverage; usd/tokens remain best-effort and are not enforced here.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Close the final Phase-2 exit-criteria box: >=2 factories executing jobs in parallel
through one coordinator, proving the concurrency guarantees end-to-end. This is a
DEMO HARNESS over the existing runtime — agent-queue.sh and lib/fleet-client.sh are
unchanged (read + called, not modified).
demo/two-factory-demo.sh: starts two real `agent-queue.sh run` daemons (mac-1 +
ubuntu-1, separate queues/cwds) that compete ONLY through the coordinator, then
asserts: (a) no double-assign — each of 3 jobs executed by exactly one factory;
(b) fencing + reclaim — kill a factory mid-job, the reaper returns its job, the
survivor reclaims + completes it, and the dead worker's late/zombie report (stale
leaseEpoch) is FENCED (HTTP 409, never shipped); (c) parallelism — both factories
hold active jobs concurrently. Dual-mode: CI-safe stateful stub by default; live
platform-service when AQ_FLEET_API/AQ_FLEET_TOKEN set.
demo/coordinator-stub.sh: stateful, mkdir-lock-guarded, file-backed coordinator
implementing claim/lease/fence/renew/release + reaper-reclaim via the existing
AQ_FLEET_API_CMD seam — the selftest stub pattern extended with shared state so
>=2 processes coordinate through one coordinator.
demo/README.md: stub + real invocations, env knobs, what each guarantee proves,
what-to-watch guide.
selftest.sh: +3 headless stub-mode checks (existing 68 unchanged byte-for-byte ->
71 total green).
docs/GIGAFACTORY_ROADMAP.md: tick the §14 two-factory-demo box; annotate Phase-2
exit criteria; bump §0 Phase 2 to 80% (remaining: scheduler-core wiring [common-plat
PR #31], tracker-direct call, factory enrollment).
bash 3.2 + awk/sed/grep/pgrep only; mac+linux safe; no new runtime deps.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Add a safe, reversible path to validate the fleet coordinator against the proven
single-host path BEFORE cutover, via three independently-toggleable flags:
AQ_FLEET=0 pure offline (zero coordinator calls; offline path unchanged)
AQ_FLEET_ROUTE=1 route_via_service: coordinator authoritative for claim (default = P2-S3)
AQ_FLEET_ROUTE=0 local inbox authoritative (coordinator not used to source work)
AQ_FLEET_SHADOW=1 dual-run (needs AQ_FLEET=1 + ROUTE=0): query coordinator in parallel,
record divergence, NEVER act on it
Precedence: SHADOW only when ROUTE=0; if ROUTE=1 + SHADOW=1, ROUTE wins (one-shot warning).
lib/fleet-client.sh: fleet_route_enabled / fleet_shadow_enabled / fleet_flags_warn_once /
fleet_flags_state; fleet_shadow_claim (read-only — isolated `-shadow` factoryId +
dryRun, releases any real lease, never materializes), fleet_shadow_compare
(AGREE/DIVERGE/COORD_EMPTY/LOCAL_EMPTY → .state/fleet-shadow.log), fleet_shadow_report
(shadow:true, response never acted on), cmd_fleet_shadow_report (counts + agreement rate).
agent-queue.sh: ROUTE-gate claim sourcing (claim only when route_via_service);
shadow hook after the local authoritative decision each iteration (best-effort,
error-swallowed — shadow can never fail a real job); `fleet-shadow-report` subcommand
+ help; resolved flags surfaced in `status`/`fleet-status`. tryClaim/fence/offline
paths unchanged.
Strictly side-effect-free on real job state: shadow never ships, quarantines, or
mutates real jobs. Offline path byte-for-byte unchanged when AQ_FLEET=0.
selftest.sh: +8 checks (shadow AGREE/DIVERGE/COORD_EMPTY, non-fatal 5xx, ROUTE
precedence, ROUTE=0 local-authoritative, fleet-shadow-report summary, shadow_report
unit). 60 prior checks unchanged → 68 total green. README + GIGAFACTORY_ROADMAP
document the flag model + cutover ladder.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Sources lib/fleet-client.sh and adds a few fleet_enabled-gated hooks so the offline
git-queue path is byte-for-byte unchanged when AQ_FLEET is unset/0:
- cmd_run: register at loop start; per-iteration heartbeat (cadence) + lease renew
for in-flight fleet jobs + claim one coordinator job into inbox when capacity.
- meta: persist fleet_job_id + fleet_lease_epoch (from claim frontmatter).
- run_worker: report `building` (with WIP checkpoint) after WIP setup and `review`
before accepting the agent's output — a FENCED (stale-epoch/409) report self-aborts
and quarantines (never ships); 5xx/unreachable degrades (finish locally).
- _auto_echo: for fleet jobs route the outcome echo through the coordinator
(fleet_events) instead of the direct tracker echo; offline jobs unchanged.
- cmd_ship: fence-check before shipping a fleet job; release lease after.
- status: show factory id + per-job fleet=<id>@e<epoch>; insights lists fleet_* fields.
- dispatch + help: `fleet-status` command + a FLEET env section.
New sourced library implementing the factory side of the Phase-2 `fleet`
coordinator contract — curl-only + POSIX awk, reusing the Slice-4 HTTP/JSON
helper patterns, no new deps. Every function is a no-op unless AQ_FLEET=1.
- fleet_enabled / fleet_api (AQ_FLEET_API_CMD test seam) / _fleet_call
- fleet_detect_caps (reuses detect_capabilities) -> JSON caps array
- fleet_heartbeat (+ _maybe cadence): registration == first heartbeat
- fleet_claim: POST /fleet/claim, parse job id/bodyMd/leaseEpoch, materialize a
transient local .md (fleet-job-id + fleet-lease-epoch in frontmatter)
- fleet_report: PATCH fenced stage transition {stage, leaseEpoch, checkpoint?};
returns ok / FENCED(2, stale epoch -> self-abort) / degraded(1, unreachable)
- fleet_lease_renew / fleet_lease_release / fleet_renew_active (fenced)
- fleet_quarantine: park a reclaimed (fenced) job in failed/ for human triage
- cmd_fleet_status: register + print factory identity/caps
Report payloads carry only stage/epoch/checkpoint — never prompt/bodyMd/token.
Implements §10 single-host tracker integration, closing the last Phase-1 §14 item:
- tracker_api: one curl-only HTTP wrapper (base URL + bearer + productId header),
overridable via AQ_TRACKER_API_CMD so tests need no live service. Emits the
response body + a trailing HTTP-code line; _api_call splits into API_BODY/API_CODE.
- aq from-tracker <ITEM_ID>: GET the Item, map title/description -> job body,
labels (engine-class:/profile:/priority:/cap:) + Item priority -> frontmatter,
and stamp tracker-item + a stable idempotency-key tracker-<id>. Materializes a
.md into inbox/ via cmd_add; idempotent (Slice 1 dedupe) so a re-pull never dups.
JSON parsed with POSIX awk (no jq) — mac + linux safe.
- aq to-tracker <job>: one-way echo (child -> tracker, §24.5). PATCHes the Item
status (building/review/testing->in_progress, shipped->done, failures->wont_fix,
all overridable) and posts a metrics-only comment (result/attempts/duration/
tokens/cost/diff — NEVER prompt content or secrets). Idempotent via meta
tracker_echoed; an echo failure (e.g. HTTP 500) is logged and non-fatal — the
tracker is downstream, never authoritative for execution.
- Opt-in auto-echo (AQ_TRACKER_AUTO=1, default OFF): the worker echoes on each
transition (building via cmd_run, review/testing/failed via run_worker, shipped
via ship/promote); never blocks or fails a job.
- status + insights surface tracker-item and the last echoed status.
curl-only HTTP; no new runtime deps; conventional + backward-compatible.
Implements roadmap §6 (profiles) and §5 deps on the bash runner, backward-compatible
(jobs without profile/deps behave exactly as before).
Profiles (§6):
- profile_get / profile_persona / fm_eff helpers + PROFILES_DIR (AGENT_QUEUE_PROFILES
override). A job's `profile:` inherits verify (<- default-verify), capabilities,
engine-class, prefers-engine, allowed-scope, review-policy when the job omits them;
job fields always override (precedence job > profile > default). Resolution runs via
fm_eff inside the capability gate and resolve_engine, so inherited caps/engine-class
take effect before launch.
- persona injection: the profile's persona block is prepended to the stripped body
fed to the engine (job .md unchanged on disk; nothing secret logged).
- allowed-scope guardrail (WARN-ONLY): scope_check logs a non-blocking WARNING +
records scope_warning= for changed paths outside the globs; path_in_scope is a
pure, unit-testable matcher (`dir/**` = subtree).
deps / DAG, single host (§5):
- deps reference other jobs by idempotency-key. dep_satisfied: shipped/ (hard) or
shipped/+testing/ (deps-mode: soft). deps_unmet drives a block-with-reason skip in
inbox selection (never launched/failed); cmd_status surfaces "blocked (waiting on
<keys>)". deps_would_cycle rejects cyclic submits on `add`.
- _drain_pending: `--once` drains past dep-blocked jobs (idle can't satisfy them)
while still waiting on retry/recovery backoff timers.
Meta now records effective (inherited) capabilities/engine-class/prefers-engine/
review-policy/allowed-scope so `status` reflects resolved config.