Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
553 lines
31 KiB
Markdown
553 lines
31 KiB
Markdown
# agent-queue
|
||
|
||
A zero-dependency **folder "kanban" runner** for headless coding-agent CLIs —
|
||
**Devin**, **Claude Code**, and **OpenAI Codex**. Drop prompt `.md` files into a folder,
|
||
and they get executed (in auto-approve mode) one slot at a time, moving through
|
||
`inbox → building → review → testing → shipped` (plus `failed`) with live status.
|
||
|
||
> **Vision & roadmap:** where this is headed — a distributed multi-machine "gigafactory"
|
||
> (fleet of factories × tools × profiles, scheduler-routed, built on platform-service +
|
||
> tracker-web) — is specified as a checklist-driven implementation roadmap in
|
||
> [`docs/GIGAFACTORY/GIGAFACTORY_ROADMAP.md`](docs/GIGAFACTORY/GIGAFACTORY_ROADMAP.md).
|
||
> A full architecture overview, diagrams, code map and onboarding live alongside it in
|
||
> [`docs/GIGAFACTORY/`](docs/GIGAFACTORY/).
|
||
|
||
> **Run safety:** how the daemon and its agents must operate (isolated worktrees,
|
||
> branch-per-task, least-privilege instead of blanket `--yolo`/dangerous on live
|
||
> checkouts) is defined in [`docs/RUN_POLICY.md`](docs/RUN_POLICY.md). Read it
|
||
> before enabling `yolo: true`.
|
||
|
||
**Build/ship lifecycle — auto-QA, manual ship:**
|
||
|
||
```
|
||
inbox ─▶ building ─▶ review ─▶ testing ─▶ shipped
|
||
(queued) (agent (rc=0; (verify (you ran
|
||
running) awaiting passed — `ship`)
|
||
verify) QA gate)
|
||
│
|
||
agent rc≠0 / │ verify fails
|
||
timeout ──────────┴──────────────▶ failed
|
||
```
|
||
|
||
- **Auto:** agent exits 0 → `review/`. If a `verify:` command is configured it runs
|
||
automatically: **pass → `testing/` (QA)**, **fail → `failed/`**. No `verify:` →
|
||
the job parks in `review/` for a manual `promote`.
|
||
- **Manual:** you `ship` a `testing/` job → `shipped/` (the human gate). Shipping is
|
||
never automatic.
|
||
|
||
> **Why this exists:** the agent CLIs ship a minimal local interface (no built-in
|
||
> batch/queue/dashboard — that lives in their *cloud* products). This is the
|
||
> zero-dependency bash glue that turns "run one prompt interactively" into
|
||
> "queue many and walk away."
|
||
|
||
---
|
||
|
||
## Quick start
|
||
|
||
```bash
|
||
cd learning_ai_devops_tools/agent-queue
|
||
chmod +x agent-queue.sh
|
||
./agent-queue.sh init
|
||
|
||
# queue a roadmap for Devin, running in the tracker-web repo, auto-approving everything
|
||
./agent-queue.sh add ~/roadmaps/UX-2.md \
|
||
--engine devin \
|
||
--cwd /Users/sd9235/code/mygh/learning_ai_common_plat/dashboards/tracker-web \
|
||
--yolo
|
||
|
||
# start processing (foreground; Ctrl-C to stop). Run up to 3 agents at once (default).
|
||
./agent-queue.sh run --max 3
|
||
```
|
||
|
||
In a **second terminal**, watch progress:
|
||
|
||
```bash
|
||
./agent-queue.sh watch
|
||
```
|
||
|
||
```
|
||
AGENT QUEUE /…/agent-queue/queue
|
||
inbox 3 building 2 review 1 testing 2 shipped 5 failed 0 running 2/2
|
||
|
||
RUNNING
|
||
20260528-2130__UX-2 devin 4m12s pid 51234 ⏺ Edited src/app/dashboard/items/page.tsx
|
||
20260528-2131__UX-3 claude 1m02s pid 51290 Running: pnpm typecheck
|
||
```
|
||
|
||
---
|
||
|
||
## How a task is configured
|
||
|
||
Each `.md` carries optional **frontmatter** telling the runner which engine to use,
|
||
which directory to run in, and whether to auto-approve:
|
||
|
||
```md
|
||
---
|
||
engine: devin # devin | claude | codex | copilot (default: $AGENT_QUEUE_ENGINE)
|
||
cwd: /abs/path/to/repo # where the agent executes (default: cwd when added)
|
||
yolo: true # auto-approve ALL tools (default: true)
|
||
lock: my-repo # optional mutex key (default: cwd). Jobs sharing a key run serially
|
||
timeout: 45m # optional. 90s|45m|2h|1d. On expiry → failed (result=timeout)
|
||
verify: pnpm -s test # optional auto-QA gate. Runs in cwd after rc=0:
|
||
# pass → testing/ (QA), fail → failed/
|
||
# (omit to park in review/ for manual promote)
|
||
---
|
||
|
||
# Your task / roadmap goes here
|
||
...
|
||
```
|
||
|
||
`add --engine/--cwd/--yolo` will inject this frontmatter for you if the file doesn't
|
||
already have a `---` block.
|
||
|
||
### Manifest fields (Gigafactory Phase 1)
|
||
|
||
The runner parses the richer [gigafactory manifest](docs/GIGAFACTORY/GIGAFACTORY_ROADMAP.md#5-the-evolved-job-manifest-feature)
|
||
**backward-compatibly** — a legacy `engine`/`cwd`/`yolo`-only `.md` behaves exactly as before.
|
||
Fields marked **RESERVED** are parsed, stored in `.state/<job>.meta`, and shown in `status`, but
|
||
are otherwise **no-ops until a later phase** (they do not yet affect execution).
|
||
|
||
| Field | Status | Default | Meaning |
|
||
| ----- | ------ | ------- | ------- |
|
||
| `engine` | active | `$AGENT_QUEUE_ENGINE` | explicit engine (`devin\|claude\|codex\|copilot`) — always wins over `engine-class` |
|
||
| `cwd` / `yolo` / `lock` / `timeout` / `verify` | active | see above | Phase-0 behavior, unchanged |
|
||
| `priority` | **active** | `medium` | `critical\|high\|medium\|low`. Inbox is picked **highest-priority first, then oldest** (was pure FIFO) |
|
||
| `engine-class` | **active** | _(none)_ | used only when `engine` is unset: `agentic-coder`→`devin,claude,codex`; `chat-coder`→`copilot`. Picks the first **available** engine. No engine available → job fails `result=no_engine` |
|
||
| `prefers-engine` | **active** | _(none)_ | optional order hint for `engine-class` resolution, e.g. `[claude, devin]` |
|
||
| `capabilities` | **active** | _(none)_ | hard host requirements, e.g. `[os:any, node>=20, has:git]`. If the host can't satisfy them the job is sent to `failed/` with `result=capability_mismatch` **and the agent is never launched** (grammar below) |
|
||
| `idempotency-key` | **active** | _(none)_ | dedupe on `add` (semantics below) |
|
||
| `profile` | **active** | _(none)_ | inherit persona + verify/caps/engine-class/prefers-engine/allowed-scope/review-policy from `profiles/<name>.md` (job fields override — see **Profiles**) |
|
||
| `prefers` | RESERVED | _(none)_ | soft routing/affinity hints (e.g. `[factory:mac-2]`) |
|
||
| `budget` | RESERVED | _(none)_ | `{ usd, tokens, wall }` ceilings (`wall` enforcement is a later slice) |
|
||
| `deps` / `deps-mode` | **active** | _(none)_ | block until each referenced `idempotency-key` is in `shipped/` (or `testing/` when `deps-mode: soft`). Submit-time cycle detection (see **Profiles & deps**) |
|
||
| `retry` | **active** | _(none)_ | `{ max: N, backoff: 5m, on: [timeout, verify_failed, crash] }` — requeue failures with backoff up to `max`, then `retries_exhausted` (see **Resilience**) |
|
||
| `review-policy` | RESERVED | _(none)_ | `auto\|manual\|reviewers:[…]` |
|
||
| `artifacts` | RESERVED | _(none)_ | extra outputs to capture (coverage, screenshots) |
|
||
| `tracker-item` | RESERVED | _(none)_ | link back to the originating tracker task |
|
||
|
||
**Capability grammar** (a job matches a host iff **every** required token is satisfied):
|
||
|
||
| Token form | Example | Satisfied when |
|
||
| ---------- | ------- | -------------- |
|
||
| `key` (bare presence) | `gpu` | the host advertises `key` in any form |
|
||
| `key:value` (exact) | `os:mac`, `engine:devin`, `has:git` | the host advertises that exact token |
|
||
| `key:any` (wildcard) | `os:any` | the host advertises any `key:*` (so `os:any` matches every host) |
|
||
| `key<op>version` (`>=` `>` `=` `<=` `<`) | `node>=20` | numeric/semver-major compare vs the host's `key:<n>` |
|
||
|
||
The host advertises (via `detect_capabilities`): `os:<mac\|linux>`, `engine:<each available engine>`,
|
||
`node:<major>`, and `has:<git\|pnpm\|docker>` when present.
|
||
|
||
**`idempotency-key` semantics** (on `add`, hashing the frontmatter-stripped body):
|
||
|
||
- same key **+ same body** → **no-op** (logged `duplicate, skipped`).
|
||
- same key **+ different body**, prior job still in `inbox/` → **supersedes** it (replaces the queued file).
|
||
- same key **+ different body**, prior job already past `inbox/` (building/review/testing/shipped) →
|
||
**rejected** with a clear error (use a new key, or requeue the existing job).
|
||
|
||
## Engine mapping
|
||
|
||
| `engine:` | Command run | Auto-approve flag (`yolo: true`) |
|
||
| --------- | ----------- | -------------------------------- |
|
||
| `devin` | `devin -p --prompt-file <body>` | `--permission-mode dangerous` |
|
||
| `claude` | `claude -p` (body on **stdin**) | `--dangerously-skip-permissions` |
|
||
| `codex` | `codex exec` (body on **stdin**) | `--dangerously-bypass-approvals-and-sandbox` |
|
||
| `copilot` | `copilot -p` (body on **stdin**) | `--allow-all-tools` _(best-effort; chat-coder class target)_ |
|
||
|
||
The frontmatter is **stripped** before the body reaches the agent, and
|
||
claude/codex receive it on **stdin** so a body starting with `--` is never
|
||
misparsed as a flag.
|
||
|
||
> Flags drift between CLI versions — if one changes, edit `build_agent_cmd()` in
|
||
> `agent-queue.sh` (it's the single place each engine is mapped).
|
||
|
||
## Commands
|
||
|
||
| Command | What it does |
|
||
| ------- | ------------ |
|
||
| `init` | create the `queue/` folders |
|
||
| `add <file> [--engine E] [--cwd P] [--yolo\|--no-yolo]` | queue a prompt into `inbox/` |
|
||
| `run [--max N] [--engine E] [--once]` | process the inbox (foreground loop) |
|
||
| `status` | kanban counts + running-worker table (marks `⚠ stalled`; per-job insights sub-line) |
|
||
| `watch [interval]` | live `status` (bash), redrawn every N seconds (default 2) |
|
||
| `insights [job]` | per-job metrics, or a recent-jobs table + per-engine token/cost/success rollup (see **Insights**) |
|
||
| `recover` | reclaim orphaned `building/` jobs (dead worker) back to `inbox/` (see **Resilience**) |
|
||
| `dash [--interval N]` | **interactive Node dashboard** — navigable numbered job list with single-key actions (see below) |
|
||
| `stop` | kill running workers + the run loop |
|
||
| `logs <job> [-f]` | print / follow a job's log |
|
||
| `promote <job>` | advance one stage forward: `review → testing → shipped` |
|
||
| `ship <job>` | **manual gate:** move a `testing/` (QA) job → `shipped/` |
|
||
| `reject <job>` | send a `review/` or `testing/` job → `failed/` |
|
||
| `requeue <job>` | move a `failed`/`review`/`testing` job back to `inbox/` for a fresh run |
|
||
| `clean [--keep N]` | archive finished logs+meta beyond the newest N (default 50) into `queue/.archive/` |
|
||
|
||
Only one `run` loop may be active per queue — a second `run` against the same
|
||
queue is refused while the first is alive (a stale `daemon.pid` is cleared).
|
||
|
||
### Interactive dashboard (`dash`)
|
||
|
||
`dash` is a single-script, menu-driven control panel (think a tiny "glassbox").
|
||
It shows the kanban counts, live RUNNING workers (engine, elapsed, last log
|
||
line, stall), a **navigable numbered JOBS list**, and RECENT finished jobs — and
|
||
lets you act on jobs without leaving the screen. Every action shells out to
|
||
`agent-queue.sh`, so the script stays the single source of truth.
|
||
|
||
| Key | Action |
|
||
| --- | ------ |
|
||
| `↑`/`↓`, `j`/`k`, `1`–`9` | select a job in the JOBS list |
|
||
| `enter` / `l` | view the selected job's log (live, auto-refreshing) |
|
||
| `p` | promote (`review → testing → shipped`) |
|
||
| `s` | ship (`testing`/QA → `shipped`, the manual gate) |
|
||
| `x` | reject (`review`/`testing` → `failed`) — asks `y/n` |
|
||
| `u` | requeue (`failed`/`review`/`testing` → `inbox`) — asks `y/n` |
|
||
| `r` | start the `run` loop (detached → `logs/run-loop.log`) |
|
||
| `S` | stop the run loop + running workers |
|
||
| `g` | refresh now · `?`/`h` help · `q`/`Ctrl-C` quit |
|
||
|
||
The header shows a `● run loop pid N` / `○ run loop stopped` indicator. Run it in
|
||
a TTY for the interactive mode; piped/non-TTY it falls back to a read-only live view.
|
||
|
||
## Via `bytelyst-cli.sh`
|
||
|
||
Wired into the repo's unified CLI (no GitHub token required for this subcommand):
|
||
|
||
```bash
|
||
./bytelyst-cli.sh agent-queue run --max 3 # full passthrough
|
||
./bytelyst-cli.sh aq status # short alias
|
||
```
|
||
|
||
## Boot-persistence (auto-start on login)
|
||
|
||
To run the worker non-stop and survive **reboot / crash / logout** (not just a
|
||
closed terminal), install the macOS LaunchAgent — it auto-starts `agent-queue run`
|
||
on login under `caffeinate` and restarts it via `KeepAlive`:
|
||
|
||
```bash
|
||
bash launchd/install.sh # install + start now
|
||
bash launchd/install.sh --uninstall # stop + remove
|
||
```
|
||
|
||
Override engine/concurrency/secrets in `~/.agent-queue.env` (e.g.
|
||
`AGENT_QUEUE_ENGINE=codex`, `AGENT_QUEUE_MAX=1`). See [`launchd/README.md`](launchd/README.md)
|
||
for the full layer comparison (tmux/caffeinate vs LaunchAgent) and gotchas.
|
||
|
||
## Folder layout
|
||
|
||
```
|
||
queue/
|
||
inbox/ # drop / queued .md files (oldest eligible picked first)
|
||
building/ # currently executing (agent running)
|
||
review/ # agent exited 0 — awaiting the auto-QA verify gate (or manual promote)
|
||
testing/ # verify passed (QA) — awaiting manual `ship`
|
||
shipped/ # manually shipped — the terminal success stage
|
||
failed/ # non-zero exit, bad cwd, timeout, verify failure, or manual reject
|
||
logs/ # <job>.log — full agent + verify output
|
||
locks/ # per-key flock files (Linux hardening; unused on macOS)
|
||
.state/ # <job>.meta heartbeats + daemon.pid (runtime only)
|
||
.archive/ # <ts>/ — logs+meta moved here by `clean`
|
||
```
|
||
|
||
**`result=` values** written to `<job>.meta`: `review`, `testing`, `shipped`,
|
||
`failed`, `timeout`, `verify_failed`, `rejected`, `requeued`, `capability_mismatch`
|
||
(host missing a required capability — agent never launched), `no_engine`
|
||
(an `engine-class` had no available engine), `retries_exhausted` (failed after
|
||
`retry.max` attempts — single-host dead-letter stand-in), `retry_scheduled`
|
||
(transient: requeued for another attempt), `recovered` (transient: an orphan was
|
||
reclaimed to `inbox/`).
|
||
|
||
## Profiles & deps
|
||
|
||
### Profiles (roadmap §6)
|
||
|
||
A **profile** is a reusable role preset in `profiles/<name>.md`. A job opts in with
|
||
`profile: <name>` and inherits any of these fields it does **not** set itself:
|
||
`verify` (from the profile's `default-verify`), `capabilities`, `engine-class`,
|
||
`prefers-engine`, `allowed-scope`, `review-policy`. The profile's `persona` block is
|
||
**prepended** to the body sent to the engine (the job `.md` on disk is unchanged;
|
||
secrets are never logged). Resolution runs **before** the capability gate and engine
|
||
resolution, so inherited caps / engine-class take effect.
|
||
|
||
**Precedence:** `job field > profile field > built-in default`. Set `AGENT_QUEUE_PROFILES`
|
||
to point at a different catalog directory (defaults to `./profiles`).
|
||
|
||
Starter catalog: `developer`, `backend-engineer`, `frontend-engineer`, `ux-designer`,
|
||
`ui-designer`, `qa`, `reviewer`, `docs-writer`, and a reserved `planner`. Each presets
|
||
`name`, `persona`, `capabilities`, `default-verify`, `engine-class`, `prefers-engine`,
|
||
`allowed-scope`, and `review-policy`.
|
||
|
||
**allowed-scope (warn-only this phase).** After a run on a git `cwd`, changed paths
|
||
outside the profile/job `allowed-scope` globs (`dir/**` matches the whole subtree) are
|
||
logged as a `WARNING` and recorded as `scope_warning=` in the meta — **non-blocking**
|
||
(the job is not failed). `path_in_scope` is exposed as a unit-testable function.
|
||
|
||
### deps / DAG, single host (roadmap §5)
|
||
|
||
`deps: [keyA, keyB]` references other jobs by their author-controlled
|
||
`idempotency-key`. A dep is **satisfied** when a job with that key is in `shipped/`
|
||
(default), or in `shipped/` **or** `testing/` when the dependent job sets
|
||
`deps-mode: soft`. A job with unmet deps is **blocked**: it is skipped in inbox
|
||
selection (never launched, never failed) and surfaced in `status` as
|
||
`blocked (waiting on: <keys>)`, then re-evaluated every loop until its deps are met.
|
||
`add` performs **submit-time cycle detection** over the inbox + active-stage dep graph
|
||
and rejects (nonzero exit) a job that would create a cycle. Cross-machine deps are P2.
|
||
|
||
## Resilience (crash recovery & work preservation)
|
||
|
||
Single-host implementations of the durability model (roadmap §25):
|
||
|
||
- **Orphan recovery.** A job left in `building/` whose worker process is dead (no
|
||
live `pid`, PID-reuse-guarded by `pidstart`) is an orphan from a previous
|
||
crash/power-off. On `run` startup and on every loop iteration (or on demand via
|
||
`agent-queue.sh recover`) it is moved back to `inbox/` with `attempts`
|
||
incremented. Recovery is **idempotent** — once moved out of `building/` it is
|
||
never recovered twice.
|
||
- **WIP checkpointing.** When a job's `cwd` is a git repo, the worker creates/checks
|
||
out a dedicated branch **`aq/wip/<job>`** at start and commits any changes to it
|
||
on **every** exit path — success, failure, timeout, and SIGTERM/SIGINT (via a
|
||
trap). It **never** commits to `main`/your current branch. Non-git `cwd` is
|
||
skipped cleanly. `wip_branch` / `wip_base` / `wip_commit` are recorded in the meta.
|
||
- **Resume.** When an orphan/retry of a job whose `aq/wip/<job>` branch already
|
||
exists is relaunched, that branch is checked out first so the agent **continues
|
||
from the checkpoint** instead of from zero.
|
||
- **Retry policy** (`retry` frontmatter, now active). On a failure whose class is in
|
||
`on` (`crash`/`agent_error` for a non-zero agent exit, `timeout`, `verify_failed`)
|
||
the job is requeued to `inbox/` honoring `backoff` (selection skips it until
|
||
`next_eligible`) up to `max` attempts; on exhaustion it lands in `failed/` with
|
||
`result=retries_exhausted`, preserving the WIP branch + full log. No `retry` =
|
||
no retry (Phase-0 behavior).
|
||
|
||
All bookkeeping (`attempts`, `next_eligible`, `wip_*`) is append-only in the meta
|
||
and re-derivable from the meta + folder location, so recovery is crash-safe.
|
||
|
||
## Insights (metrics & token accounting)
|
||
|
||
Each finished run records into `<job>.meta`: `duration_s`, `exit`, `result`,
|
||
`attempts`, and — for a git `cwd` — `files_changed` / `lines_added` /
|
||
`lines_deleted` (diffed `wip_base..HEAD`). A single `parse_usage <engine> <log>`
|
||
adapter extracts `model` / `tokens_in` / `tokens_out` / `tokens_cached` /
|
||
`cost_usd` / `turns` / `tool_calls` when the engine exposes them.
|
||
|
||
```bash
|
||
agent-queue.sh insights <job> # full metrics for one job
|
||
agent-queue.sh insights # recent-jobs table + per-engine rollup
|
||
```
|
||
|
||
> **Token caveat (honest):** real usage is captured only where the engine surfaces
|
||
> it. A cooperating wrapper may emit a machine-readable `AQ_USAGE key=value …` line;
|
||
> otherwise per-engine heuristics apply (Claude/Codex token fields parsed; Devin
|
||
> session metrics + Copilot are API-only and currently TODO in `parse_usage`). When
|
||
> a value is not provider-reported it is **omitted or flagged `usage_estimated`** —
|
||
> numbers are never fabricated. The per-engine rollup marks totals that include any
|
||
> estimated value with `*`.
|
||
|
||
## Tracker integration (§10)
|
||
|
||
Closes the task ↔ job round-trip against the platform-service **items API**: a
|
||
tracker Item can become a job, and a job's outcome echoes back to the Item.
|
||
|
||
```bash
|
||
agent-queue.sh from-tracker <ITEM_ID> # pull an Item -> materialize a job in inbox/
|
||
agent-queue.sh to-tracker <job> # echo the job's current outcome to its Item
|
||
```
|
||
|
||
All HTTP goes through one curl wrapper (`tracker_api`); there are no other network
|
||
calls. Real use needs **platform-service running and a bearer token**.
|
||
|
||
### Config (env)
|
||
|
||
| Var | Default | Meaning |
|
||
| --- | ------- | ------- |
|
||
| `AQ_TRACKER_API` | `http://localhost:4003` | base URL of the items API (routes live under `/api`) |
|
||
| `AQ_TRACKER_TOKEN` | _(none)_ | bearer token — **required** for real calls; never hardcode |
|
||
| `AQ_PRODUCT_ID` | _(none)_ | productId (sent as `X-Product-Id`; every Item has one) |
|
||
| `AQ_TRACKER_CWD` | `$PWD` | cwd a tracker-derived job runs in (Items carry no cwd) |
|
||
| `AQ_TRACKER_AUTO` | `0` | `1` = auto-echo on each transition (default OFF — echo is manual) |
|
||
| `AQ_TRACKER_STATUS_INPROGRESS` / `_DONE` / `_FAILED` | `in_progress` / `done` / `wont_fix` | Item status per bucket (the API has no blocked/failed status) |
|
||
| `AQ_TRACKER_API_CMD` | _(none)_ | test seam: a stub that replaces the curl HTTP entirely (selftest uses it) |
|
||
|
||
### `from-tracker` — Item → job
|
||
|
||
`GET /api/items/<id>`, then maps fields to job frontmatter:
|
||
|
||
| Item | Job |
|
||
| ---- | --- |
|
||
| `title` + `description` | job body (verbatim instruction markdown) |
|
||
| `id` | `tracker-item: <id>` and `idempotency-key: tracker-<id>` (stable) |
|
||
| `priority` | `priority:` (label overrides; else Item priority; else `medium`) |
|
||
| label `engine-class:<x>` | `engine-class: <x>` |
|
||
| label `profile:<x>` | `profile: <x>` |
|
||
| label `priority:<x>` | `priority: <x>` |
|
||
| label `cap:<token>` | a `capabilities: [...]` entry |
|
||
|
||
Idempotent on the derived `idempotency-key` (Slice 1 dedupe) — pulling the same
|
||
Item twice never enqueues a duplicate.
|
||
|
||
### `to-tracker` — job → Item (one-way echo, §24.5)
|
||
|
||
Only if the job's meta has a `tracker-item`. Maps the job's stage/result to an Item
|
||
status and `PATCH /api/items/<id>/status`, then `POST /api/items/<id>/comments`
|
||
with a **metrics-only** summary (result, attempts, duration, tokens/cost, +/- lines —
|
||
**never prompt content or secrets**):
|
||
|
||
| job result/stage | Item status |
|
||
| ---------------- | ----------- |
|
||
| building / review / testing / recovered | `in_progress` |
|
||
| shipped | `done` |
|
||
| failed / timeout / verify_failed / retries_exhausted / capability_mismatch / no_engine / rejected | `wont_fix` (override via `AQ_TRACKER_STATUS_FAILED`) |
|
||
|
||
Idempotent via `tracker_echoed` in the meta (re-echoing an unchanged outcome is a
|
||
no-op). The echo is **one-way** (child → tracker) and **never authoritative for
|
||
execution**: an echo failure is logged and the job continues unchanged. With
|
||
`AQ_TRACKER_AUTO=1` the worker echoes automatically on each transition; otherwise
|
||
echo is manual. `status` / `insights` surface the `tracker-item` and last echoed status.
|
||
|
||
## Fleet integration (Phase 2)
|
||
|
||
Behind the `AQ_FLEET` flag, the runner becomes a **factory** that registers,
|
||
heartbeats, claims, and reports against the platform-service `fleet` coordinator —
|
||
so coordinator jobs run alongside local `.md` files on the same host. All
|
||
coordinator logic lives in [`lib/fleet-client.sh`](lib/fleet-client.sh) (curl-only +
|
||
POSIX awk, sourced by `agent-queue.sh`); the few hook points in the runner are all
|
||
gated on `fleet_enabled`.
|
||
|
||
> **Offline vs fleet mode.** With `AQ_FLEET` unset/`0` (the default) the runner is
|
||
> the pure offline git-queue described above — **zero** coordinator calls, behavior
|
||
> byte-for-byte unchanged. With `AQ_FLEET=1` the run loop also registers + claims
|
||
> from the coordinator, reports fenced stage transitions, renews leases, and (in
|
||
> fleet mode) routes the outcome echo through the coordinator's `fleet_events`
|
||
> instead of the direct tracker echo. The tracker echo remains the offline path.
|
||
|
||
```bash
|
||
AQ_FLEET=1 AQ_FLEET_TOKEN=… AQ_PRODUCT_ID=… agent-queue.sh fleet-status # register + show identity
|
||
AQ_FLEET=1 AQ_FLEET_TOKEN=… AQ_PRODUCT_ID=… agent-queue.sh run # claim + execute coordinator jobs
|
||
```
|
||
|
||
### Config (env)
|
||
|
||
| Var | Default | Meaning |
|
||
| --- | ------- | ------- |
|
||
| `AQ_FLEET` | `0` | master switch — `1` enables coordinator integration; `0`/unset = offline git-queue (zero coordinator calls) |
|
||
| `AQ_FLEET_ROUTE` | `1` | `route_via_service`: `1` = coordinator is authoritative for claim (P2-S3 behavior); `0` = local inbox authoritative (coordinator not used to source work) |
|
||
| `AQ_FLEET_AUTOSHIP` | `0` | `1` = when the local verify gate passes, advance the coordinator job `testing → shipped` (the factory's verify **is** the test phase); `0` = report `testing` and rest for the human review gate |
|
||
| `AQ_FLEET_SHADOW` | `0` | shadow/dual-run: `1` (requires `AQ_FLEET=1` + `AQ_FLEET_ROUTE=0`) queries the coordinator in parallel and records divergence, **never acting on it** |
|
||
| `AQ_FLEET_SHADOW_FACTORY_ID` | `<factory>-shadow` | isolated id used for the read-only shadow claim (never the real factory id) |
|
||
| `AQ_FLEET_SHADOW_LOG` | `.state/fleet-shadow.log` | structured shadow-divergence log (`ts⇥localJob⇥coordJob⇥verdict`) |
|
||
| `AQ_FLEET_API` | `http://localhost:4003/api` | coordinator base URL (already includes `/api`) |
|
||
| `AQ_FLEET_TOKEN` | _(none)_ | bearer token — never hardcode |
|
||
| `AQ_PRODUCT_ID` | _(none)_ | productId (sent as `X-Product-Id`; shared with the tracker config) |
|
||
| `AQ_FACTORY_ID` | `<hostname>-<pid>` | stable factory identity for this process |
|
||
| `AQ_FLEET_LEASE_RENEW_SEC` | `300` | heartbeat / lease-renew cadence |
|
||
| `AQ_FLEET_CAPS` | _(auto)_ | override the auto-detected capability tokens (comma/space list) |
|
||
| `AQ_FLEET_CWD` | `$PWD` | cwd a claimed coordinator job runs in |
|
||
| `AQ_FLEET_API_CMD` | _(none)_ | test seam: a stub that replaces the curl HTTP entirely (selftest uses it) |
|
||
|
||
### Protocol (claim / heartbeat / report / fence / renew)
|
||
|
||
- **register / heartbeat:** `POST /fleet/factories/heartbeat {factoryId, capabilities[], health, load}` — registration *is* the first heartbeat; re-sent on `AQ_FLEET_LEASE_RENEW_SEC` cadence.
|
||
- **claim:** `POST /fleet/claim {factoryId, capabilities[], leaseSeconds}`. A returned job (`id`, `bodyMd`, `leaseEpoch`) is materialized as a transient local `.md` (frontmatter `fleet-job-id` + `fleet-lease-epoch`) so the existing runner executes it unchanged, interleaved with local files.
|
||
- **report (fenced):** each stage transition (`building`/`review`/`testing`/`shipped`/`failed`) is `PATCH /fleet/jobs/:id {stage, leaseEpoch, checkpoint?}`. The coordinator writes `fleet_events` server-side. The payload carries only stage/epoch/checkpoint — **never** the prompt/`bodyMd` or token.
|
||
- **fencing (§18):** if a report/renew returns **conflict/409** (stale `leaseEpoch` → the coordinator reclaimed us), the worker **self-aborts**: it stops, does **not** ship/merge, and **quarantines** the local result to `failed/` (`result=fenced_quarantine`) for human triage. A reclaimed zombie can never corrupt coordinator state.
|
||
- **lease renew / release:** `POST /fleet/jobs/:id/lease/renew` while building (fenced); `…/lease/release` on terminal stages.
|
||
- **checkpoint:** the WIP `{wipBranch, wipCommit}` is sent with the building report so a reclaim can resume (§25).
|
||
|
||
### Offline-degrade + quarantine (§9)
|
||
|
||
If the coordinator is **unreachable** mid-job (5xx / connection error), the report
|
||
is treated as *degraded* (logged, `fleet_degraded=1`): the in-flight job **finishes
|
||
locally** rather than being abandoned. On the next reachable call the worker
|
||
presents its `leaseEpoch`; if the coordinator now reports it **stale** (it was
|
||
reclaimed during the outage), the local result is **quarantined** (marked, not
|
||
auto-shipped) and surfaced for human triage — split-brain is resolved in favor of
|
||
the coordinator without losing the work. `status` shows the factory id + per-job
|
||
`fleet=<id>@e<epoch>`; `insights` lists the `fleet_*` fields.
|
||
|
||
### Feature flags + shadow / dual-run (Slice 4, §16/§27)
|
||
|
||
Three explicit, independently-toggleable levels gate the coordinator — a safe,
|
||
reversible path to validate the fleet coordinator against the proven single-host
|
||
(P1) behavior **before** any real cutover:
|
||
|
||
| Flag | Effect |
|
||
| ---- | ------ |
|
||
| `AQ_FLEET=0` | **Pure offline.** Zero coordinator calls (including shadow). Offline git-queue path is byte-for-byte unchanged. |
|
||
| `AQ_FLEET_ROUTE=1` (default) | **route_via_service** — the coordinator is *authoritative* for claim/assignment (today's P2-S3 behavior). |
|
||
| `AQ_FLEET_ROUTE=0` | **Local inbox authoritative** — the coordinator is *not* used to source work (the pre-cutover state). |
|
||
| `AQ_FLEET_AUTOSHIP=1` | **Autonomous ship.** On a passing local verify, advance the coordinator job `testing → shipped` (closing the `testing → shipped` gap). Default `0` reports `testing` and leaves the job for the human review gate / `ship` operator action. |
|
||
| `AQ_FLEET_SHADOW=1` | **Shadow / dual-run** (requires `AQ_FLEET=1` **and** `AQ_FLEET_ROUTE=0`): run the offline path as authoritative **and** query the coordinator in parallel, recording divergence **without acting on it**. |
|
||
|
||
**Precedence.** Shadow is only meaningful when `ROUTE=0`. If both `AQ_FLEET_ROUTE=1`
|
||
and `AQ_FLEET_SHADOW=1` are set, **ROUTE wins** and shadow is disabled (a one-shot
|
||
warning is logged) — you never route *and* shadow at the same time.
|
||
|
||
**Side-effect-free by construction.** Shadow **never** ships, quarantines, or
|
||
mutates real job state. `fleet_shadow_claim` asks the coordinator what it *would*
|
||
assign using an **isolated `-shadow` factoryId** + `"dryRun":true,"shadow":true`;
|
||
if a coordinator without dry-run support actually assigned, the lease is **released
|
||
immediately** so no real assignment persists. The would-be job is never
|
||
materialized, run, or shipped locally. `fleet_shadow_report` mirrors the local
|
||
stage as a shadow event (`"shadow":true`) purely to exercise reporting — the
|
||
coordinator response is logged but **never acted on** (no fence/quarantine).
|
||
|
||
Each iteration `fleet_shadow_compare` classifies the local (authoritative) decision
|
||
vs the coordinator's would-be decision as **AGREE / DIVERGE / COORD_EMPTY /
|
||
LOCAL_EMPTY** and appends a line to the shadow log. Summarize it any time:
|
||
|
||
```bash
|
||
agent-queue.sh fleet-shadow-report # per-verdict counts + agreement rate + recent divergences
|
||
agent-queue.sh fleet-shadow-report 25 # last 25 divergence/error events
|
||
agent-queue.sh status # surfaces the three flags' resolved state
|
||
```
|
||
|
||
**Cutover ladder (rollback at any step):**
|
||
|
||
1. **Observe (zero risk):** `AQ_FLEET=1 AQ_FLEET_ROUTE=0 AQ_FLEET_SHADOW=1 run` —
|
||
the local path stays authoritative; the coordinator is only shadowed.
|
||
2. **Inspect agreement:** `fleet-shadow-report` — drive `AGREEMENT` toward 100%,
|
||
investigating each `DIVERGE`.
|
||
3. **Cut over:** once agreement is high, flip `AQ_FLEET_ROUTE=1` (coordinator
|
||
becomes authoritative).
|
||
4. **Rollback:** set `AQ_FLEET_ROUTE=0` (and/or `AQ_FLEET=0`) at any time — instant
|
||
return to the local/offline path, no data migration.
|
||
|
||
## Config (env overrides)
|
||
|
||
| Var | Default | Meaning |
|
||
| --- | ------- | ------- |
|
||
| `AGENT_QUEUE_ROOT` | `./queue` | where the kanban folders live |
|
||
| `AGENT_QUEUE_MAX` | `3` | max concurrent agents (override per-run with `run --max N`) |
|
||
| `AGENT_QUEUE_ENGINE` | `devin` | default engine when none in frontmatter |
|
||
| `AGENT_QUEUE_POLL` | `3` | inbox poll interval (seconds) |
|
||
| `AGENT_QUEUE_VERIFY` | _(empty)_ | default auto-QA verify command; per-job `verify:` overrides it |
|
||
| `AGENT_QUEUE_STALL_MIN` | `10` | minutes of unchanged log before a worker is `⚠ stalled` |
|
||
| `DEVIN_BIN` / `CLAUDE_BIN` / `CODEX_BIN` / `COPILOT_BIN` | autodetected | override CLI binary paths |
|
||
| `FLOCK_BIN` / `TIMEOUT_BIN` | autodetected | `flock` (lock hardening) and `timeout`/`gtimeout` (hard timeouts); absent on stock macOS — see notes |
|
||
|
||
## ⚠️ Safety
|
||
|
||
Running agents with `yolo: true` means **no approval prompts** — they will edit files,
|
||
run shell commands, and commit unattended. Mitigate:
|
||
|
||
- Prefer **scope-locked** prompt files (e.g. "edit only under `dashboards/tracker-web/`").
|
||
- Tell prompts **not to `git push`** — review commits before they leave your machine.
|
||
- **Same-repo safety is automatic:** jobs sharing a `cwd` (or `lock:` key) are
|
||
serialized, so two agents never run in one repo at once — even at `--max 2+`.
|
||
- Set a `timeout:` on long jobs so a wedged agent can't run forever.
|
||
- Watch cost: each job is a full agent session.
|
||
|
||
### Portability notes
|
||
|
||
- **macOS** has no `flock`/`timeout`; locking relies on the single run-loop
|
||
(enforced by the second-run refusal) and timeouts use a pure-bash watchdog.
|
||
Install coreutils (`gtimeout`) for hard process-tree kills.
|
||
- **Linux** (incl. Gitea CI) uses `flock` + `timeout` for cross-process hardening.
|
||
|
||
## Roadmap / nice-to-haves
|
||
|
||
- [x] Per-repo lock to serialize same-repo jobs automatically (`lock:` / cwd).
|
||
- [x] Per-job `timeout:` with hard kill (or bash watchdog fallback).
|
||
- [x] Stall detection in `status`/`dash`.
|
||
- [x] `requeue` failed jobs + `clean`/archive old runs.
|
||
- [x] Build/ship lifecycle: `review → testing → shipped` with auto-QA `verify:` gate + manual `ship`.
|
||
- [ ] `--push` opt-in policy + commit review gate.
|
||
- [ ] Optional notifications (Slack/desktop) on done/failed/stall.
|
||
- [ ] Persisted run-loop as a daemon/service with auto-restart.
|