bytelyst-devops-tools/agent-queue/README.md
Saravanakumar D 237481247e docs(gigafactory): uppercase GIGAFACTORY folder + add index README
Rename agent-queue/docs/gigafactory/ to docs/GIGAFACTORY/ and update every
reference (README, system-overview code-map, and all phase job specs). Add an
index README that lists the docs and points to the companion docs in
learning_ai_common_plat. Docs-only; no behavior change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-30 21:21:31 -07:00

531 lines
30 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# agent-queue
A zero-dependency **folder "kanban" runner** for headless coding-agent CLIs —
**Devin**, **Claude Code**, and **OpenAI Codex**. Drop prompt `.md` files into a folder,
and they get executed (in auto-approve mode) one slot at a time, moving through
`inbox → building → review → testing → shipped` (plus `failed`) with live status.
> **Vision & roadmap:** where this is headed — a distributed multi-machine "gigafactory"
> (fleet of factories × tools × profiles, scheduler-routed, built on platform-service +
> tracker-web) — is specified as a checklist-driven implementation roadmap in
> [`docs/GIGAFACTORY/GIGAFACTORY_ROADMAP.md`](docs/GIGAFACTORY/GIGAFACTORY_ROADMAP.md).
> A full architecture overview, diagrams, code map and onboarding live alongside it in
> [`docs/GIGAFACTORY/`](docs/GIGAFACTORY/).
**Build/ship lifecycle — auto-QA, manual ship:**
```
inbox ─▶ building ─▶ review ─▶ testing ─▶ shipped
(queued) (agent (rc=0; (verify (you ran
running) awaiting passed — `ship`)
verify) QA gate)
agent rc≠0 / │ verify fails
timeout ──────────┴──────────────▶ failed
```
- **Auto:** agent exits 0 → `review/`. If a `verify:` command is configured it runs
automatically: **pass → `testing/` (QA)**, **fail → `failed/`**. No `verify:`
the job parks in `review/` for a manual `promote`.
- **Manual:** you `ship` a `testing/` job → `shipped/` (the human gate). Shipping is
never automatic.
> **Why this exists:** the agent CLIs ship a minimal local interface (no built-in
> batch/queue/dashboard — that lives in their *cloud* products). This is the
> zero-dependency bash glue that turns "run one prompt interactively" into
> "queue many and walk away."
---
## Quick start
```bash
cd learning_ai_devops_tools/agent-queue
chmod +x agent-queue.sh
./agent-queue.sh init
# queue a roadmap for Devin, running in the tracker-web repo, auto-approving everything
./agent-queue.sh add ~/roadmaps/UX-2.md \
--engine devin \
--cwd /Users/sd9235/code/mygh/learning_ai_common_plat/dashboards/tracker-web \
--yolo
# start processing (foreground; Ctrl-C to stop). Run up to 3 agents at once (default).
./agent-queue.sh run --max 3
```
In a **second terminal**, watch progress:
```bash
./agent-queue.sh watch
```
```
AGENT QUEUE /…/agent-queue/queue
inbox 3 building 2 review 1 testing 2 shipped 5 failed 0 running 2/2
RUNNING
20260528-2130__UX-2 devin 4m12s pid 51234 ⏺ Edited src/app/dashboard/items/page.tsx
20260528-2131__UX-3 claude 1m02s pid 51290 Running: pnpm typecheck
```
---
## How a task is configured
Each `.md` carries optional **frontmatter** telling the runner which engine to use,
which directory to run in, and whether to auto-approve:
```md
---
engine: devin # devin | claude | codex | copilot (default: $AGENT_QUEUE_ENGINE)
cwd: /abs/path/to/repo # where the agent executes (default: cwd when added)
yolo: true # auto-approve ALL tools (default: true)
lock: my-repo # optional mutex key (default: cwd). Jobs sharing a key run serially
timeout: 45m # optional. 90s|45m|2h|1d. On expiry → failed (result=timeout)
verify: pnpm -s test # optional auto-QA gate. Runs in cwd after rc=0:
# pass → testing/ (QA), fail → failed/
# (omit to park in review/ for manual promote)
---
# Your task / roadmap goes here
...
```
`add --engine/--cwd/--yolo` will inject this frontmatter for you if the file doesn't
already have a `---` block.
### Manifest fields (Gigafactory Phase 1)
The runner parses the richer [gigafactory manifest](docs/GIGAFACTORY/GIGAFACTORY_ROADMAP.md#5-the-evolved-job-manifest-feature)
**backward-compatibly** — a legacy `engine`/`cwd`/`yolo`-only `.md` behaves exactly as before.
Fields marked **RESERVED** are parsed, stored in `.state/<job>.meta`, and shown in `status`, but
are otherwise **no-ops until a later phase** (they do not yet affect execution).
| Field | Status | Default | Meaning |
| ----- | ------ | ------- | ------- |
| `engine` | active | `$AGENT_QUEUE_ENGINE` | explicit engine (`devin\|claude\|codex\|copilot`) — always wins over `engine-class` |
| `cwd` / `yolo` / `lock` / `timeout` / `verify` | active | see above | Phase-0 behavior, unchanged |
| `priority` | **active** | `medium` | `critical\|high\|medium\|low`. Inbox is picked **highest-priority first, then oldest** (was pure FIFO) |
| `engine-class` | **active** | _(none)_ | used only when `engine` is unset: `agentic-coder`→`devin,claude,codex`; `chat-coder`→`copilot`. Picks the first **available** engine. No engine available → job fails `result=no_engine` |
| `prefers-engine` | **active** | _(none)_ | optional order hint for `engine-class` resolution, e.g. `[claude, devin]` |
| `capabilities` | **active** | _(none)_ | hard host requirements, e.g. `[os:any, node>=20, has:git]`. If the host can't satisfy them the job is sent to `failed/` with `result=capability_mismatch` **and the agent is never launched** (grammar below) |
| `idempotency-key` | **active** | _(none)_ | dedupe on `add` (semantics below) |
| `profile` | **active** | _(none)_ | inherit persona + verify/caps/engine-class/prefers-engine/allowed-scope/review-policy from `profiles/<name>.md` (job fields override — see **Profiles**) |
| `prefers` | RESERVED | _(none)_ | soft routing/affinity hints (e.g. `[factory:mac-2]`) |
| `budget` | RESERVED | _(none)_ | `{ usd, tokens, wall }` ceilings (`wall` enforcement is a later slice) |
| `deps` / `deps-mode` | **active** | _(none)_ | block until each referenced `idempotency-key` is in `shipped/` (or `testing/` when `deps-mode: soft`). Submit-time cycle detection (see **Profiles & deps**) |
| `retry` | **active** | _(none)_ | `{ max: N, backoff: 5m, on: [timeout, verify_failed, crash] }` — requeue failures with backoff up to `max`, then `retries_exhausted` (see **Resilience**) |
| `review-policy` | RESERVED | _(none)_ | `auto\|manual\|reviewers:[…]` |
| `artifacts` | RESERVED | _(none)_ | extra outputs to capture (coverage, screenshots) |
| `tracker-item` | RESERVED | _(none)_ | link back to the originating tracker task |
**Capability grammar** (a job matches a host iff **every** required token is satisfied):
| Token form | Example | Satisfied when |
| ---------- | ------- | -------------- |
| `key` (bare presence) | `gpu` | the host advertises `key` in any form |
| `key:value` (exact) | `os:mac`, `engine:devin`, `has:git` | the host advertises that exact token |
| `key:any` (wildcard) | `os:any` | the host advertises any `key:*` (so `os:any` matches every host) |
| `key<op>version` (`>=` `>` `=` `<=` `<`) | `node>=20` | numeric/semver-major compare vs the host's `key:<n>` |
The host advertises (via `detect_capabilities`): `os:<mac\|linux>`, `engine:<each available engine>`,
`node:<major>`, and `has:<git\|pnpm\|docker>` when present.
**`idempotency-key` semantics** (on `add`, hashing the frontmatter-stripped body):
- same key **+ same body** → **no-op** (logged `duplicate, skipped`).
- same key **+ different body**, prior job still in `inbox/`**supersedes** it (replaces the queued file).
- same key **+ different body**, prior job already past `inbox/` (building/review/testing/shipped) →
**rejected** with a clear error (use a new key, or requeue the existing job).
## Engine mapping
| `engine:` | Command run | Auto-approve flag (`yolo: true`) |
| --------- | ----------- | -------------------------------- |
| `devin` | `devin -p --prompt-file <body>` | `--permission-mode dangerous` |
| `claude` | `claude -p` (body on **stdin**) | `--dangerously-skip-permissions` |
| `codex` | `codex exec` (body on **stdin**) | `--dangerously-bypass-approvals-and-sandbox` |
| `copilot` | `copilot -p` (body on **stdin**) | `--allow-all-tools` _(best-effort; chat-coder class target)_ |
The frontmatter is **stripped** before the body reaches the agent, and
claude/codex receive it on **stdin** so a body starting with `--` is never
misparsed as a flag.
> Flags drift between CLI versions — if one changes, edit `build_agent_cmd()` in
> `agent-queue.sh` (it's the single place each engine is mapped).
## Commands
| Command | What it does |
| ------- | ------------ |
| `init` | create the `queue/` folders |
| `add <file> [--engine E] [--cwd P] [--yolo\|--no-yolo]` | queue a prompt into `inbox/` |
| `run [--max N] [--engine E] [--once]` | process the inbox (foreground loop) |
| `status` | kanban counts + running-worker table (marks `⚠ stalled`; per-job insights sub-line) |
| `watch [interval]` | live `status` (bash), redrawn every N seconds (default 2) |
| `insights [job]` | per-job metrics, or a recent-jobs table + per-engine token/cost/success rollup (see **Insights**) |
| `recover` | reclaim orphaned `building/` jobs (dead worker) back to `inbox/` (see **Resilience**) |
| `dash [--interval N]` | **interactive Node dashboard** — navigable numbered job list with single-key actions (see below) |
| `stop` | kill running workers + the run loop |
| `logs <job> [-f]` | print / follow a job's log |
| `promote <job>` | advance one stage forward: `review → testing → shipped` |
| `ship <job>` | **manual gate:** move a `testing/` (QA) job → `shipped/` |
| `reject <job>` | send a `review/` or `testing/` job → `failed/` |
| `requeue <job>` | move a `failed`/`review`/`testing` job back to `inbox/` for a fresh run |
| `clean [--keep N]` | archive finished logs+meta beyond the newest N (default 50) into `queue/.archive/` |
Only one `run` loop may be active per queue — a second `run` against the same
queue is refused while the first is alive (a stale `daemon.pid` is cleared).
### Interactive dashboard (`dash`)
`dash` is a single-script, menu-driven control panel (think a tiny "glassbox").
It shows the kanban counts, live RUNNING workers (engine, elapsed, last log
line, stall), a **navigable numbered JOBS list**, and RECENT finished jobs — and
lets you act on jobs without leaving the screen. Every action shells out to
`agent-queue.sh`, so the script stays the single source of truth.
| Key | Action |
| --- | ------ |
| `↑`/`↓`, `j`/`k`, `1``9` | select a job in the JOBS list |
| `enter` / `l` | view the selected job's log (live, auto-refreshing) |
| `p` | promote (`review → testing → shipped`) |
| `s` | ship (`testing`/QA → `shipped`, the manual gate) |
| `x` | reject (`review`/`testing` → `failed`) — asks `y/n` |
| `u` | requeue (`failed`/`review`/`testing` → `inbox`) — asks `y/n` |
| `r` | start the `run` loop (detached → `logs/run-loop.log`) |
| `S` | stop the run loop + running workers |
| `g` | refresh now · `?`/`h` help · `q`/`Ctrl-C` quit |
The header shows a `● run loop pid N` / `○ run loop stopped` indicator. Run it in
a TTY for the interactive mode; piped/non-TTY it falls back to a read-only live view.
## Via `bytelyst-cli.sh`
Wired into the repo's unified CLI (no GitHub token required for this subcommand):
```bash
./bytelyst-cli.sh agent-queue run --max 3 # full passthrough
./bytelyst-cli.sh aq status # short alias
```
## Folder layout
```
queue/
inbox/ # drop / queued .md files (oldest eligible picked first)
building/ # currently executing (agent running)
review/ # agent exited 0 — awaiting the auto-QA verify gate (or manual promote)
testing/ # verify passed (QA) — awaiting manual `ship`
shipped/ # manually shipped — the terminal success stage
failed/ # non-zero exit, bad cwd, timeout, verify failure, or manual reject
logs/ # <job>.log — full agent + verify output
locks/ # per-key flock files (Linux hardening; unused on macOS)
.state/ # <job>.meta heartbeats + daemon.pid (runtime only)
.archive/ # <ts>/ — logs+meta moved here by `clean`
```
**`result=` values** written to `<job>.meta`: `review`, `testing`, `shipped`,
`failed`, `timeout`, `verify_failed`, `rejected`, `requeued`, `capability_mismatch`
(host missing a required capability — agent never launched), `no_engine`
(an `engine-class` had no available engine), `retries_exhausted` (failed after
`retry.max` attempts — single-host dead-letter stand-in), `retry_scheduled`
(transient: requeued for another attempt), `recovered` (transient: an orphan was
reclaimed to `inbox/`).
## Profiles & deps
### Profiles (roadmap §6)
A **profile** is a reusable role preset in `profiles/<name>.md`. A job opts in with
`profile: <name>` and inherits any of these fields it does **not** set itself:
`verify` (from the profile's `default-verify`), `capabilities`, `engine-class`,
`prefers-engine`, `allowed-scope`, `review-policy`. The profile's `persona` block is
**prepended** to the body sent to the engine (the job `.md` on disk is unchanged;
secrets are never logged). Resolution runs **before** the capability gate and engine
resolution, so inherited caps / engine-class take effect.
**Precedence:** `job field > profile field > built-in default`. Set `AGENT_QUEUE_PROFILES`
to point at a different catalog directory (defaults to `./profiles`).
Starter catalog: `developer`, `backend-engineer`, `frontend-engineer`, `ux-designer`,
`ui-designer`, `qa`, `reviewer`, `docs-writer`, and a reserved `planner`. Each presets
`name`, `persona`, `capabilities`, `default-verify`, `engine-class`, `prefers-engine`,
`allowed-scope`, and `review-policy`.
**allowed-scope (warn-only this phase).** After a run on a git `cwd`, changed paths
outside the profile/job `allowed-scope` globs (`dir/**` matches the whole subtree) are
logged as a `WARNING` and recorded as `scope_warning=` in the meta — **non-blocking**
(the job is not failed). `path_in_scope` is exposed as a unit-testable function.
### deps / DAG, single host (roadmap §5)
`deps: [keyA, keyB]` references other jobs by their author-controlled
`idempotency-key`. A dep is **satisfied** when a job with that key is in `shipped/`
(default), or in `shipped/` **or** `testing/` when the dependent job sets
`deps-mode: soft`. A job with unmet deps is **blocked**: it is skipped in inbox
selection (never launched, never failed) and surfaced in `status` as
`blocked (waiting on: <keys>)`, then re-evaluated every loop until its deps are met.
`add` performs **submit-time cycle detection** over the inbox + active-stage dep graph
and rejects (nonzero exit) a job that would create a cycle. Cross-machine deps are P2.
## Resilience (crash recovery & work preservation)
Single-host implementations of the durability model (roadmap §25):
- **Orphan recovery.** A job left in `building/` whose worker process is dead (no
live `pid`, PID-reuse-guarded by `pidstart`) is an orphan from a previous
crash/power-off. On `run` startup and on every loop iteration (or on demand via
`agent-queue.sh recover`) it is moved back to `inbox/` with `attempts`
incremented. Recovery is **idempotent** — once moved out of `building/` it is
never recovered twice.
- **WIP checkpointing.** When a job's `cwd` is a git repo, the worker creates/checks
out a dedicated branch **`aq/wip/<job>`** at start and commits any changes to it
on **every** exit path — success, failure, timeout, and SIGTERM/SIGINT (via a
trap). It **never** commits to `main`/your current branch. Non-git `cwd` is
skipped cleanly. `wip_branch` / `wip_base` / `wip_commit` are recorded in the meta.
- **Resume.** When an orphan/retry of a job whose `aq/wip/<job>` branch already
exists is relaunched, that branch is checked out first so the agent **continues
from the checkpoint** instead of from zero.
- **Retry policy** (`retry` frontmatter, now active). On a failure whose class is in
`on` (`crash`/`agent_error` for a non-zero agent exit, `timeout`, `verify_failed`)
the job is requeued to `inbox/` honoring `backoff` (selection skips it until
`next_eligible`) up to `max` attempts; on exhaustion it lands in `failed/` with
`result=retries_exhausted`, preserving the WIP branch + full log. No `retry` =
no retry (Phase-0 behavior).
All bookkeeping (`attempts`, `next_eligible`, `wip_*`) is append-only in the meta
and re-derivable from the meta + folder location, so recovery is crash-safe.
## Insights (metrics & token accounting)
Each finished run records into `<job>.meta`: `duration_s`, `exit`, `result`,
`attempts`, and — for a git `cwd``files_changed` / `lines_added` /
`lines_deleted` (diffed `wip_base..HEAD`). A single `parse_usage <engine> <log>`
adapter extracts `model` / `tokens_in` / `tokens_out` / `tokens_cached` /
`cost_usd` / `turns` / `tool_calls` when the engine exposes them.
```bash
agent-queue.sh insights <job> # full metrics for one job
agent-queue.sh insights # recent-jobs table + per-engine rollup
```
> **Token caveat (honest):** real usage is captured only where the engine surfaces
> it. A cooperating wrapper may emit a machine-readable `AQ_USAGE key=value …` line;
> otherwise per-engine heuristics apply (Claude/Codex token fields parsed; Devin
> session metrics + Copilot are API-only and currently TODO in `parse_usage`). When
> a value is not provider-reported it is **omitted or flagged `usage_estimated`** —
> numbers are never fabricated. The per-engine rollup marks totals that include any
> estimated value with `*`.
## Tracker integration (§10)
Closes the task ↔ job round-trip against the platform-service **items API**: a
tracker Item can become a job, and a job's outcome echoes back to the Item.
```bash
agent-queue.sh from-tracker <ITEM_ID> # pull an Item -> materialize a job in inbox/
agent-queue.sh to-tracker <job> # echo the job's current outcome to its Item
```
All HTTP goes through one curl wrapper (`tracker_api`); there are no other network
calls. Real use needs **platform-service running and a bearer token**.
### Config (env)
| Var | Default | Meaning |
| --- | ------- | ------- |
| `AQ_TRACKER_API` | `http://localhost:4003` | base URL of the items API (routes live under `/api`) |
| `AQ_TRACKER_TOKEN` | _(none)_ | bearer token — **required** for real calls; never hardcode |
| `AQ_PRODUCT_ID` | _(none)_ | productId (sent as `X-Product-Id`; every Item has one) |
| `AQ_TRACKER_CWD` | `$PWD` | cwd a tracker-derived job runs in (Items carry no cwd) |
| `AQ_TRACKER_AUTO` | `0` | `1` = auto-echo on each transition (default OFF — echo is manual) |
| `AQ_TRACKER_STATUS_INPROGRESS` / `_DONE` / `_FAILED` | `in_progress` / `done` / `wont_fix` | Item status per bucket (the API has no blocked/failed status) |
| `AQ_TRACKER_API_CMD` | _(none)_ | test seam: a stub that replaces the curl HTTP entirely (selftest uses it) |
### `from-tracker` — Item → job
`GET /api/items/<id>`, then maps fields to job frontmatter:
| Item | Job |
| ---- | --- |
| `title` + `description` | job body (verbatim instruction markdown) |
| `id` | `tracker-item: <id>` and `idempotency-key: tracker-<id>` (stable) |
| `priority` | `priority:` (label overrides; else Item priority; else `medium`) |
| label `engine-class:<x>` | `engine-class: <x>` |
| label `profile:<x>` | `profile: <x>` |
| label `priority:<x>` | `priority: <x>` |
| label `cap:<token>` | a `capabilities: [...]` entry |
Idempotent on the derived `idempotency-key` (Slice 1 dedupe) — pulling the same
Item twice never enqueues a duplicate.
### `to-tracker` — job → Item (one-way echo, §24.5)
Only if the job's meta has a `tracker-item`. Maps the job's stage/result to an Item
status and `PATCH /api/items/<id>/status`, then `POST /api/items/<id>/comments`
with a **metrics-only** summary (result, attempts, duration, tokens/cost, +/- lines —
**never prompt content or secrets**):
| job result/stage | Item status |
| ---------------- | ----------- |
| building / review / testing / recovered | `in_progress` |
| shipped | `done` |
| failed / timeout / verify_failed / retries_exhausted / capability_mismatch / no_engine / rejected | `wont_fix` (override via `AQ_TRACKER_STATUS_FAILED`) |
Idempotent via `tracker_echoed` in the meta (re-echoing an unchanged outcome is a
no-op). The echo is **one-way** (child → tracker) and **never authoritative for
execution**: an echo failure is logged and the job continues unchanged. With
`AQ_TRACKER_AUTO=1` the worker echoes automatically on each transition; otherwise
echo is manual. `status` / `insights` surface the `tracker-item` and last echoed status.
## Fleet integration (Phase 2)
Behind the `AQ_FLEET` flag, the runner becomes a **factory** that registers,
heartbeats, claims, and reports against the platform-service `fleet` coordinator —
so coordinator jobs run alongside local `.md` files on the same host. All
coordinator logic lives in [`lib/fleet-client.sh`](lib/fleet-client.sh) (curl-only +
POSIX awk, sourced by `agent-queue.sh`); the few hook points in the runner are all
gated on `fleet_enabled`.
> **Offline vs fleet mode.** With `AQ_FLEET` unset/`0` (the default) the runner is
> the pure offline git-queue described above — **zero** coordinator calls, behavior
> byte-for-byte unchanged. With `AQ_FLEET=1` the run loop also registers + claims
> from the coordinator, reports fenced stage transitions, renews leases, and (in
> fleet mode) routes the outcome echo through the coordinator's `fleet_events`
> instead of the direct tracker echo. The tracker echo remains the offline path.
```bash
AQ_FLEET=1 AQ_FLEET_TOKEN=AQ_PRODUCT_ID=… agent-queue.sh fleet-status # register + show identity
AQ_FLEET=1 AQ_FLEET_TOKEN=AQ_PRODUCT_ID=… agent-queue.sh run # claim + execute coordinator jobs
```
### Config (env)
| Var | Default | Meaning |
| --- | ------- | ------- |
| `AQ_FLEET` | `0` | master switch — `1` enables coordinator integration; `0`/unset = offline git-queue (zero coordinator calls) |
| `AQ_FLEET_ROUTE` | `1` | `route_via_service`: `1` = coordinator is authoritative for claim (P2-S3 behavior); `0` = local inbox authoritative (coordinator not used to source work) |
| `AQ_FLEET_SHADOW` | `0` | shadow/dual-run: `1` (requires `AQ_FLEET=1` + `AQ_FLEET_ROUTE=0`) queries the coordinator in parallel and records divergence, **never acting on it** |
| `AQ_FLEET_SHADOW_FACTORY_ID` | `<factory>-shadow` | isolated id used for the read-only shadow claim (never the real factory id) |
| `AQ_FLEET_SHADOW_LOG` | `.state/fleet-shadow.log` | structured shadow-divergence log (`ts⇥localJob⇥coordJob⇥verdict`) |
| `AQ_FLEET_API` | `http://localhost:4003/api` | coordinator base URL (already includes `/api`) |
| `AQ_FLEET_TOKEN` | _(none)_ | bearer token — never hardcode |
| `AQ_PRODUCT_ID` | _(none)_ | productId (sent as `X-Product-Id`; shared with the tracker config) |
| `AQ_FACTORY_ID` | `<hostname>-<pid>` | stable factory identity for this process |
| `AQ_FLEET_LEASE_RENEW_SEC` | `300` | heartbeat / lease-renew cadence |
| `AQ_FLEET_CAPS` | _(auto)_ | override the auto-detected capability tokens (comma/space list) |
| `AQ_FLEET_CWD` | `$PWD` | cwd a claimed coordinator job runs in |
| `AQ_FLEET_API_CMD` | _(none)_ | test seam: a stub that replaces the curl HTTP entirely (selftest uses it) |
### Protocol (claim / heartbeat / report / fence / renew)
- **register / heartbeat:** `POST /fleet/factories/heartbeat {factoryId, capabilities[], health, load}` — registration *is* the first heartbeat; re-sent on `AQ_FLEET_LEASE_RENEW_SEC` cadence.
- **claim:** `POST /fleet/claim {factoryId, capabilities[], leaseSeconds}`. A returned job (`id`, `bodyMd`, `leaseEpoch`) is materialized as a transient local `.md` (frontmatter `fleet-job-id` + `fleet-lease-epoch`) so the existing runner executes it unchanged, interleaved with local files.
- **report (fenced):** each stage transition (`building`/`review`/`testing`/`shipped`/`failed`) is `PATCH /fleet/jobs/:id {stage, leaseEpoch, checkpoint?}`. The coordinator writes `fleet_events` server-side. The payload carries only stage/epoch/checkpoint — **never** the prompt/`bodyMd` or token.
- **fencing (§18):** if a report/renew returns **conflict/409** (stale `leaseEpoch` → the coordinator reclaimed us), the worker **self-aborts**: it stops, does **not** ship/merge, and **quarantines** the local result to `failed/` (`result=fenced_quarantine`) for human triage. A reclaimed zombie can never corrupt coordinator state.
- **lease renew / release:** `POST /fleet/jobs/:id/lease/renew` while building (fenced); `…/lease/release` on terminal stages.
- **checkpoint:** the WIP `{wipBranch, wipCommit}` is sent with the building report so a reclaim can resume (§25).
### Offline-degrade + quarantine (§9)
If the coordinator is **unreachable** mid-job (5xx / connection error), the report
is treated as *degraded* (logged, `fleet_degraded=1`): the in-flight job **finishes
locally** rather than being abandoned. On the next reachable call the worker
presents its `leaseEpoch`; if the coordinator now reports it **stale** (it was
reclaimed during the outage), the local result is **quarantined** (marked, not
auto-shipped) and surfaced for human triage — split-brain is resolved in favor of
the coordinator without losing the work. `status` shows the factory id + per-job
`fleet=<id>@e<epoch>`; `insights` lists the `fleet_*` fields.
### Feature flags + shadow / dual-run (Slice 4, §16/§27)
Three explicit, independently-toggleable levels gate the coordinator — a safe,
reversible path to validate the fleet coordinator against the proven single-host
(P1) behavior **before** any real cutover:
| Flag | Effect |
| ---- | ------ |
| `AQ_FLEET=0` | **Pure offline.** Zero coordinator calls (including shadow). Offline git-queue path is byte-for-byte unchanged. |
| `AQ_FLEET_ROUTE=1` (default) | **route_via_service** — the coordinator is *authoritative* for claim/assignment (today's P2-S3 behavior). |
| `AQ_FLEET_ROUTE=0` | **Local inbox authoritative** — the coordinator is *not* used to source work (the pre-cutover state). |
| `AQ_FLEET_SHADOW=1` | **Shadow / dual-run** (requires `AQ_FLEET=1` **and** `AQ_FLEET_ROUTE=0`): run the offline path as authoritative **and** query the coordinator in parallel, recording divergence **without acting on it**. |
**Precedence.** Shadow is only meaningful when `ROUTE=0`. If both `AQ_FLEET_ROUTE=1`
and `AQ_FLEET_SHADOW=1` are set, **ROUTE wins** and shadow is disabled (a one-shot
warning is logged) — you never route *and* shadow at the same time.
**Side-effect-free by construction.** Shadow **never** ships, quarantines, or
mutates real job state. `fleet_shadow_claim` asks the coordinator what it *would*
assign using an **isolated `-shadow` factoryId** + `"dryRun":true,"shadow":true`;
if a coordinator without dry-run support actually assigned, the lease is **released
immediately** so no real assignment persists. The would-be job is never
materialized, run, or shipped locally. `fleet_shadow_report` mirrors the local
stage as a shadow event (`"shadow":true`) purely to exercise reporting — the
coordinator response is logged but **never acted on** (no fence/quarantine).
Each iteration `fleet_shadow_compare` classifies the local (authoritative) decision
vs the coordinator's would-be decision as **AGREE / DIVERGE / COORD_EMPTY /
LOCAL_EMPTY** and appends a line to the shadow log. Summarize it any time:
```bash
agent-queue.sh fleet-shadow-report # per-verdict counts + agreement rate + recent divergences
agent-queue.sh fleet-shadow-report 25 # last 25 divergence/error events
agent-queue.sh status # surfaces the three flags' resolved state
```
**Cutover ladder (rollback at any step):**
1. **Observe (zero risk):** `AQ_FLEET=1 AQ_FLEET_ROUTE=0 AQ_FLEET_SHADOW=1 run`
the local path stays authoritative; the coordinator is only shadowed.
2. **Inspect agreement:** `fleet-shadow-report` — drive `AGREEMENT` toward 100%,
investigating each `DIVERGE`.
3. **Cut over:** once agreement is high, flip `AQ_FLEET_ROUTE=1` (coordinator
becomes authoritative).
4. **Rollback:** set `AQ_FLEET_ROUTE=0` (and/or `AQ_FLEET=0`) at any time — instant
return to the local/offline path, no data migration.
## Config (env overrides)
| Var | Default | Meaning |
| --- | ------- | ------- |
| `AGENT_QUEUE_ROOT` | `./queue` | where the kanban folders live |
| `AGENT_QUEUE_MAX` | `3` | max concurrent agents (override per-run with `run --max N`) |
| `AGENT_QUEUE_ENGINE` | `devin` | default engine when none in frontmatter |
| `AGENT_QUEUE_POLL` | `3` | inbox poll interval (seconds) |
| `AGENT_QUEUE_VERIFY` | _(empty)_ | default auto-QA verify command; per-job `verify:` overrides it |
| `AGENT_QUEUE_STALL_MIN` | `10` | minutes of unchanged log before a worker is `⚠ stalled` |
| `DEVIN_BIN` / `CLAUDE_BIN` / `CODEX_BIN` / `COPILOT_BIN` | autodetected | override CLI binary paths |
| `FLOCK_BIN` / `TIMEOUT_BIN` | autodetected | `flock` (lock hardening) and `timeout`/`gtimeout` (hard timeouts); absent on stock macOS — see notes |
## ⚠️ Safety
Running agents with `yolo: true` means **no approval prompts** — they will edit files,
run shell commands, and commit unattended. Mitigate:
- Prefer **scope-locked** prompt files (e.g. "edit only under `dashboards/tracker-web/`").
- Tell prompts **not to `git push`** — review commits before they leave your machine.
- **Same-repo safety is automatic:** jobs sharing a `cwd` (or `lock:` key) are
serialized, so two agents never run in one repo at once — even at `--max 2+`.
- Set a `timeout:` on long jobs so a wedged agent can't run forever.
- Watch cost: each job is a full agent session.
### Portability notes
- **macOS** has no `flock`/`timeout`; locking relies on the single run-loop
(enforced by the second-run refusal) and timeouts use a pure-bash watchdog.
Install coreutils (`gtimeout`) for hard process-tree kills.
- **Linux** (incl. Gitea CI) uses `flock` + `timeout` for cross-process hardening.
## Roadmap / nice-to-haves
- [x] Per-repo lock to serialize same-repo jobs automatically (`lock:` / cwd).
- [x] Per-job `timeout:` with hard kill (or bash watchdog fallback).
- [x] Stall detection in `status`/`dash`.
- [x] `requeue` failed jobs + `clean`/archive old runs.
- [x] Build/ship lifecycle: `review → testing → shipped` with auto-QA `verify:` gate + manual `ship`.
- [ ] `--push` opt-in policy + commit review gate.
- [ ] Optional notifications (Slack/desktop) on done/failed/stall.
- [ ] Persisted run-loop as a daemon/service with auto-restart.