# agent-queue A zero-dependency **folder "kanban" runner** for headless coding-agent CLIs — **Devin**, **Claude Code**, and **OpenAI Codex**. Drop prompt `.md` files into a folder, and they get executed (in auto-approve mode) one slot at a time, moving through `inbox → building → review → testing → shipped` (plus `failed`) with live status. > **Vision & roadmap:** where this is headed — a distributed multi-machine "gigafactory" > (fleet of factories × tools × profiles, scheduler-routed, built on platform-service + > tracker-web) — is specified as a checklist-driven implementation roadmap in > [`docs/GIGAFACTORY_ROADMAP.md`](docs/GIGAFACTORY_ROADMAP.md). **Build/ship lifecycle — auto-QA, manual ship:** ``` inbox ─▶ building ─▶ review ─▶ testing ─▶ shipped (queued) (agent (rc=0; (verify (you ran running) awaiting passed — `ship`) verify) QA gate) │ agent rc≠0 / │ verify fails timeout ──────────┴──────────────▶ failed ``` - **Auto:** agent exits 0 → `review/`. If a `verify:` command is configured it runs automatically: **pass → `testing/` (QA)**, **fail → `failed/`**. No `verify:` → the job parks in `review/` for a manual `promote`. - **Manual:** you `ship` a `testing/` job → `shipped/` (the human gate). Shipping is never automatic. > **Why this exists:** the agent CLIs ship a minimal local interface (no built-in > batch/queue/dashboard — that lives in their *cloud* products). This is the > zero-dependency bash glue that turns "run one prompt interactively" into > "queue many and walk away." --- ## Quick start ```bash cd learning_ai_devops_tools/agent-queue chmod +x agent-queue.sh ./agent-queue.sh init # queue a roadmap for Devin, running in the tracker-web repo, auto-approving everything ./agent-queue.sh add ~/roadmaps/UX-2.md \ --engine devin \ --cwd /Users/sd9235/code/mygh/learning_ai_common_plat/dashboards/tracker-web \ --yolo # start processing (foreground; Ctrl-C to stop). Run up to 3 agents at once (default). ./agent-queue.sh run --max 3 ``` In a **second terminal**, watch progress: ```bash ./agent-queue.sh watch ``` ``` AGENT QUEUE /…/agent-queue/queue inbox 3 building 2 review 1 testing 2 shipped 5 failed 0 running 2/2 RUNNING 20260528-2130__UX-2 devin 4m12s pid 51234 ⏺ Edited src/app/dashboard/items/page.tsx 20260528-2131__UX-3 claude 1m02s pid 51290 Running: pnpm typecheck ``` --- ## How a task is configured Each `.md` carries optional **frontmatter** telling the runner which engine to use, which directory to run in, and whether to auto-approve: ```md --- engine: devin # devin | claude | codex | copilot (default: $AGENT_QUEUE_ENGINE) cwd: /abs/path/to/repo # where the agent executes (default: cwd when added) yolo: true # auto-approve ALL tools (default: true) lock: my-repo # optional mutex key (default: cwd). Jobs sharing a key run serially timeout: 45m # optional. 90s|45m|2h|1d. On expiry → failed (result=timeout) verify: pnpm -s test # optional auto-QA gate. Runs in cwd after rc=0: # pass → testing/ (QA), fail → failed/ # (omit to park in review/ for manual promote) --- # Your task / roadmap goes here ... ``` `add --engine/--cwd/--yolo` will inject this frontmatter for you if the file doesn't already have a `---` block. ### Manifest fields (Gigafactory Phase 1) The runner parses the richer [gigafactory manifest](docs/GIGAFACTORY_ROADMAP.md#5-the-evolved-job-manifest-feature) **backward-compatibly** — a legacy `engine`/`cwd`/`yolo`-only `.md` behaves exactly as before. Fields marked **RESERVED** are parsed, stored in `.state/.meta`, and shown in `status`, but are otherwise **no-ops until a later phase** (they do not yet affect execution). | Field | Status | Default | Meaning | | ----- | ------ | ------- | ------- | | `engine` | active | `$AGENT_QUEUE_ENGINE` | explicit engine (`devin\|claude\|codex\|copilot`) — always wins over `engine-class` | | `cwd` / `yolo` / `lock` / `timeout` / `verify` | active | see above | Phase-0 behavior, unchanged | | `priority` | **active** | `medium` | `critical\|high\|medium\|low`. Inbox is picked **highest-priority first, then oldest** (was pure FIFO) | | `engine-class` | **active** | _(none)_ | used only when `engine` is unset: `agentic-coder`→`devin,claude,codex`; `chat-coder`→`copilot`. Picks the first **available** engine. No engine available → job fails `result=no_engine` | | `prefers-engine` | **active** | _(none)_ | optional order hint for `engine-class` resolution, e.g. `[claude, devin]` | | `capabilities` | **active** | _(none)_ | hard host requirements, e.g. `[os:any, node>=20, has:git]`. If the host can't satisfy them the job is sent to `failed/` with `result=capability_mismatch` **and the agent is never launched** (grammar below) | | `idempotency-key` | **active** | _(none)_ | dedupe on `add` (semantics below) | | `profile` | **active** | _(none)_ | inherit persona + verify/caps/engine-class/prefers-engine/allowed-scope/review-policy from `profiles/.md` (job fields override — see **Profiles**) | | `prefers` | RESERVED | _(none)_ | soft routing/affinity hints (e.g. `[factory:mac-2]`) | | `budget` | RESERVED | _(none)_ | `{ usd, tokens, wall }` ceilings (`wall` enforcement is a later slice) | | `deps` / `deps-mode` | **active** | _(none)_ | block until each referenced `idempotency-key` is in `shipped/` (or `testing/` when `deps-mode: soft`). Submit-time cycle detection (see **Profiles & deps**) | | `retry` | **active** | _(none)_ | `{ max: N, backoff: 5m, on: [timeout, verify_failed, crash] }` — requeue failures with backoff up to `max`, then `retries_exhausted` (see **Resilience**) | | `review-policy` | RESERVED | _(none)_ | `auto\|manual\|reviewers:[…]` | | `artifacts` | RESERVED | _(none)_ | extra outputs to capture (coverage, screenshots) | | `tracker-item` | RESERVED | _(none)_ | link back to the originating tracker task | **Capability grammar** (a job matches a host iff **every** required token is satisfied): | Token form | Example | Satisfied when | | ---------- | ------- | -------------- | | `key` (bare presence) | `gpu` | the host advertises `key` in any form | | `key:value` (exact) | `os:mac`, `engine:devin`, `has:git` | the host advertises that exact token | | `key:any` (wildcard) | `os:any` | the host advertises any `key:*` (so `os:any` matches every host) | | `keyversion` (`>=` `>` `=` `<=` `<`) | `node>=20` | numeric/semver-major compare vs the host's `key:` | The host advertises (via `detect_capabilities`): `os:`, `engine:`, `node:`, and `has:` when present. **`idempotency-key` semantics** (on `add`, hashing the frontmatter-stripped body): - same key **+ same body** → **no-op** (logged `duplicate, skipped`). - same key **+ different body**, prior job still in `inbox/` → **supersedes** it (replaces the queued file). - same key **+ different body**, prior job already past `inbox/` (building/review/testing/shipped) → **rejected** with a clear error (use a new key, or requeue the existing job). ## Engine mapping | `engine:` | Command run | Auto-approve flag (`yolo: true`) | | --------- | ----------- | -------------------------------- | | `devin` | `devin -p --prompt-file ` | `--permission-mode dangerous` | | `claude` | `claude -p` (body on **stdin**) | `--dangerously-skip-permissions` | | `codex` | `codex exec` (body on **stdin**) | `--dangerously-bypass-approvals-and-sandbox` | | `copilot` | `copilot -p` (body on **stdin**) | `--allow-all-tools` _(best-effort; chat-coder class target)_ | The frontmatter is **stripped** before the body reaches the agent, and claude/codex receive it on **stdin** so a body starting with `--` is never misparsed as a flag. > Flags drift between CLI versions — if one changes, edit `build_agent_cmd()` in > `agent-queue.sh` (it's the single place each engine is mapped). ## Commands | Command | What it does | | ------- | ------------ | | `init` | create the `queue/` folders | | `add [--engine E] [--cwd P] [--yolo\|--no-yolo]` | queue a prompt into `inbox/` | | `run [--max N] [--engine E] [--once]` | process the inbox (foreground loop) | | `status` | kanban counts + running-worker table (marks `⚠ stalled`; per-job insights sub-line) | | `watch [interval]` | live `status` (bash), redrawn every N seconds (default 2) | | `insights [job]` | per-job metrics, or a recent-jobs table + per-engine token/cost/success rollup (see **Insights**) | | `recover` | reclaim orphaned `building/` jobs (dead worker) back to `inbox/` (see **Resilience**) | | `dash [--interval N]` | **interactive Node dashboard** — navigable numbered job list with single-key actions (see below) | | `stop` | kill running workers + the run loop | | `logs [-f]` | print / follow a job's log | | `promote ` | advance one stage forward: `review → testing → shipped` | | `ship ` | **manual gate:** move a `testing/` (QA) job → `shipped/` | | `reject ` | send a `review/` or `testing/` job → `failed/` | | `requeue ` | move a `failed`/`review`/`testing` job back to `inbox/` for a fresh run | | `clean [--keep N]` | archive finished logs+meta beyond the newest N (default 50) into `queue/.archive/` | Only one `run` loop may be active per queue — a second `run` against the same queue is refused while the first is alive (a stale `daemon.pid` is cleared). ### Interactive dashboard (`dash`) `dash` is a single-script, menu-driven control panel (think a tiny "glassbox"). It shows the kanban counts, live RUNNING workers (engine, elapsed, last log line, stall), a **navigable numbered JOBS list**, and RECENT finished jobs — and lets you act on jobs without leaving the screen. Every action shells out to `agent-queue.sh`, so the script stays the single source of truth. | Key | Action | | --- | ------ | | `↑`/`↓`, `j`/`k`, `1`–`9` | select a job in the JOBS list | | `enter` / `l` | view the selected job's log (live, auto-refreshing) | | `p` | promote (`review → testing → shipped`) | | `s` | ship (`testing`/QA → `shipped`, the manual gate) | | `x` | reject (`review`/`testing` → `failed`) — asks `y/n` | | `u` | requeue (`failed`/`review`/`testing` → `inbox`) — asks `y/n` | | `r` | start the `run` loop (detached → `logs/run-loop.log`) | | `S` | stop the run loop + running workers | | `g` | refresh now · `?`/`h` help · `q`/`Ctrl-C` quit | The header shows a `● run loop pid N` / `○ run loop stopped` indicator. Run it in a TTY for the interactive mode; piped/non-TTY it falls back to a read-only live view. ## Via `bytelyst-cli.sh` Wired into the repo's unified CLI (no GitHub token required for this subcommand): ```bash ./bytelyst-cli.sh agent-queue run --max 3 # full passthrough ./bytelyst-cli.sh aq status # short alias ``` ## Folder layout ``` queue/ inbox/ # drop / queued .md files (oldest eligible picked first) building/ # currently executing (agent running) review/ # agent exited 0 — awaiting the auto-QA verify gate (or manual promote) testing/ # verify passed (QA) — awaiting manual `ship` shipped/ # manually shipped — the terminal success stage failed/ # non-zero exit, bad cwd, timeout, verify failure, or manual reject logs/ # .log — full agent + verify output locks/ # per-key flock files (Linux hardening; unused on macOS) .state/ # .meta heartbeats + daemon.pid (runtime only) .archive/ # / — logs+meta moved here by `clean` ``` **`result=` values** written to `.meta`: `review`, `testing`, `shipped`, `failed`, `timeout`, `verify_failed`, `rejected`, `requeued`, `capability_mismatch` (host missing a required capability — agent never launched), `no_engine` (an `engine-class` had no available engine), `retries_exhausted` (failed after `retry.max` attempts — single-host dead-letter stand-in), `retry_scheduled` (transient: requeued for another attempt), `recovered` (transient: an orphan was reclaimed to `inbox/`). ## Profiles & deps ### Profiles (roadmap §6) A **profile** is a reusable role preset in `profiles/.md`. A job opts in with `profile: ` and inherits any of these fields it does **not** set itself: `verify` (from the profile's `default-verify`), `capabilities`, `engine-class`, `prefers-engine`, `allowed-scope`, `review-policy`. The profile's `persona` block is **prepended** to the body sent to the engine (the job `.md` on disk is unchanged; secrets are never logged). Resolution runs **before** the capability gate and engine resolution, so inherited caps / engine-class take effect. **Precedence:** `job field > profile field > built-in default`. Set `AGENT_QUEUE_PROFILES` to point at a different catalog directory (defaults to `./profiles`). Starter catalog: `developer`, `backend-engineer`, `frontend-engineer`, `ux-designer`, `ui-designer`, `qa`, `reviewer`, `docs-writer`, and a reserved `planner`. Each presets `name`, `persona`, `capabilities`, `default-verify`, `engine-class`, `prefers-engine`, `allowed-scope`, and `review-policy`. **allowed-scope (warn-only this phase).** After a run on a git `cwd`, changed paths outside the profile/job `allowed-scope` globs (`dir/**` matches the whole subtree) are logged as a `WARNING` and recorded as `scope_warning=` in the meta — **non-blocking** (the job is not failed). `path_in_scope` is exposed as a unit-testable function. ### deps / DAG, single host (roadmap §5) `deps: [keyA, keyB]` references other jobs by their author-controlled `idempotency-key`. A dep is **satisfied** when a job with that key is in `shipped/` (default), or in `shipped/` **or** `testing/` when the dependent job sets `deps-mode: soft`. A job with unmet deps is **blocked**: it is skipped in inbox selection (never launched, never failed) and surfaced in `status` as `blocked (waiting on: )`, then re-evaluated every loop until its deps are met. `add` performs **submit-time cycle detection** over the inbox + active-stage dep graph and rejects (nonzero exit) a job that would create a cycle. Cross-machine deps are P2. ## Resilience (crash recovery & work preservation) Single-host implementations of the durability model (roadmap §25): - **Orphan recovery.** A job left in `building/` whose worker process is dead (no live `pid`, PID-reuse-guarded by `pidstart`) is an orphan from a previous crash/power-off. On `run` startup and on every loop iteration (or on demand via `agent-queue.sh recover`) it is moved back to `inbox/` with `attempts` incremented. Recovery is **idempotent** — once moved out of `building/` it is never recovered twice. - **WIP checkpointing.** When a job's `cwd` is a git repo, the worker creates/checks out a dedicated branch **`aq/wip/`** at start and commits any changes to it on **every** exit path — success, failure, timeout, and SIGTERM/SIGINT (via a trap). It **never** commits to `main`/your current branch. Non-git `cwd` is skipped cleanly. `wip_branch` / `wip_base` / `wip_commit` are recorded in the meta. - **Resume.** When an orphan/retry of a job whose `aq/wip/` branch already exists is relaunched, that branch is checked out first so the agent **continues from the checkpoint** instead of from zero. - **Retry policy** (`retry` frontmatter, now active). On a failure whose class is in `on` (`crash`/`agent_error` for a non-zero agent exit, `timeout`, `verify_failed`) the job is requeued to `inbox/` honoring `backoff` (selection skips it until `next_eligible`) up to `max` attempts; on exhaustion it lands in `failed/` with `result=retries_exhausted`, preserving the WIP branch + full log. No `retry` = no retry (Phase-0 behavior). All bookkeeping (`attempts`, `next_eligible`, `wip_*`) is append-only in the meta and re-derivable from the meta + folder location, so recovery is crash-safe. ## Insights (metrics & token accounting) Each finished run records into `.meta`: `duration_s`, `exit`, `result`, `attempts`, and — for a git `cwd` — `files_changed` / `lines_added` / `lines_deleted` (diffed `wip_base..HEAD`). A single `parse_usage ` adapter extracts `model` / `tokens_in` / `tokens_out` / `tokens_cached` / `cost_usd` / `turns` / `tool_calls` when the engine exposes them. ```bash agent-queue.sh insights # full metrics for one job agent-queue.sh insights # recent-jobs table + per-engine rollup ``` > **Token caveat (honest):** real usage is captured only where the engine surfaces > it. A cooperating wrapper may emit a machine-readable `AQ_USAGE key=value …` line; > otherwise per-engine heuristics apply (Claude/Codex token fields parsed; Devin > session metrics + Copilot are API-only and currently TODO in `parse_usage`). When > a value is not provider-reported it is **omitted or flagged `usage_estimated`** — > numbers are never fabricated. The per-engine rollup marks totals that include any > estimated value with `*`. ## Config (env overrides) | Var | Default | Meaning | | --- | ------- | ------- | | `AGENT_QUEUE_ROOT` | `./queue` | where the kanban folders live | | `AGENT_QUEUE_MAX` | `3` | max concurrent agents (override per-run with `run --max N`) | | `AGENT_QUEUE_ENGINE` | `devin` | default engine when none in frontmatter | | `AGENT_QUEUE_POLL` | `3` | inbox poll interval (seconds) | | `AGENT_QUEUE_VERIFY` | _(empty)_ | default auto-QA verify command; per-job `verify:` overrides it | | `AGENT_QUEUE_STALL_MIN` | `10` | minutes of unchanged log before a worker is `⚠ stalled` | | `DEVIN_BIN` / `CLAUDE_BIN` / `CODEX_BIN` / `COPILOT_BIN` | autodetected | override CLI binary paths | | `FLOCK_BIN` / `TIMEOUT_BIN` | autodetected | `flock` (lock hardening) and `timeout`/`gtimeout` (hard timeouts); absent on stock macOS — see notes | ## ⚠️ Safety Running agents with `yolo: true` means **no approval prompts** — they will edit files, run shell commands, and commit unattended. Mitigate: - Prefer **scope-locked** prompt files (e.g. "edit only under `dashboards/tracker-web/`"). - Tell prompts **not to `git push`** — review commits before they leave your machine. - **Same-repo safety is automatic:** jobs sharing a `cwd` (or `lock:` key) are serialized, so two agents never run in one repo at once — even at `--max 2+`. - Set a `timeout:` on long jobs so a wedged agent can't run forever. - Watch cost: each job is a full agent session. ### Portability notes - **macOS** has no `flock`/`timeout`; locking relies on the single run-loop (enforced by the second-run refusal) and timeouts use a pure-bash watchdog. Install coreutils (`gtimeout`) for hard process-tree kills. - **Linux** (incl. Gitea CI) uses `flock` + `timeout` for cross-process hardening. ## Roadmap / nice-to-haves - [x] Per-repo lock to serialize same-repo jobs automatically (`lock:` / cwd). - [x] Per-job `timeout:` with hard kill (or bash watchdog fallback). - [x] Stall detection in `status`/`dash`. - [x] `requeue` failed jobs + `clean`/archive old runs. - [x] Build/ship lifecycle: `review → testing → shipped` with auto-QA `verify:` gate + manual `ship`. - [ ] `--push` opt-in policy + commit review gate. - [ ] Optional notifications (Slack/desktop) on done/failed/stall. - [ ] Persisted run-loop as a daemon/service with auto-restart.