Redesign the kanban runner stages from inbox->doing->done/failed to inbox->building->review->testing->shipped (+ failed):

- worker: agent rc=0 lands in review/, then runs the configurable verify command (frontmatter verify: / AGENT_QUEUE_VERIFY) in cwd; pass -> testing/ (QA), fail -> failed/, none -> parks in review/
- new commands: ship (testing->shipped, manual gate), promote (advance one stage), reject (review/testing->failed); requeue now also pulls from review/testing
- status + dashboard.mjs render all six stages; RECENT panel labels shipped/testing/review/verify_failed/timeout/rejected
- README: new lifecycle diagram, verify: frontmatter, result= glossary, command table + folder layout
- selftest: assert no-verify->review, verify-pass->testing->ship->shipped, verify-fail->failed
- rename queue/doing->building, queue/done->review; add testing/ shipped/

# agent-queue

A zero-dependency **folder "kanban" runner** for headless coding-agent CLIs:
**Devin**, **Claude Code**, and **OpenAI Codex**. Drop prompt `.md` files into a folder,
and they get executed (in auto-approve mode) as worker slots free up, moving through
`inbox → building → review → testing → shipped` (plus `failed`) with live status.

**Build/ship lifecycle (auto-QA, manual ship):**

```
inbox ─▶ building ─▶ review ─▶ testing ─▶ shipped
(queued) (agent      (rc=0;    (verify    (you ran
          running)    awaiting  passed,    `ship`)
                      verify)   QA gate)
                       │
     agent rc≠0 /      │ verify fails
     timeout ──────────┴──────────────▶ failed
```

- **Auto:** agent exits 0 → `review/`. If a `verify:` command is configured it runs
  automatically: **pass → `testing/` (QA)**, **fail → `failed/`**. No `verify:` →
  the job parks in `review/` for a manual `promote`.
- **Manual:** you `ship` a `testing/` job → `shipped/` (the human gate). Shipping is
  never automatic.

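The routing rules above can be sketched as a single decision function. This is a minimal illustration, not the runner's actual code: `next_stage` and its argument order are hypothetical names chosen for this example.

```bash
# Hypothetical helper: decide which folder a job lands in once the agent exits.
#   $1 = agent exit code
#   $2 = verify command ("" when none is configured)
#   $3 = verify exit code (ignored when $2 is empty)
next_stage() {
  local agent_rc=$1 verify_cmd=$2 verify_rc=${3:-0}
  if [ "$agent_rc" -ne 0 ]; then
    echo failed      # non-zero agent exit (a timeout also ends up here)
  elif [ -z "$verify_cmd" ]; then
    echo review      # no verify: park for a manual `promote`
  elif [ "$verify_rc" -eq 0 ]; then
    echo testing     # verify passed: wait for a manual `ship`
  else
    echo failed      # verify failed
  fi
}

next_stage 0 "" 0               # prints: review
next_stage 0 "pnpm -s test" 0   # prints: testing
next_stage 0 "pnpm -s test" 1   # prints: failed
next_stage 1 "pnpm -s test" 0   # prints: failed
```

Note that no branch returns `shipped`: that stage is only reachable through a manual command.
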
> **Why this exists:** the agent CLIs ship a minimal local interface (no built-in
> batch/queue/dashboard; that lives in their *cloud* products). This is the
> zero-dependency bash glue that turns "run one prompt interactively" into
> "queue many and walk away."

---

## Quick start

```bash
cd learning_ai_devops_tools/agent-queue
chmod +x agent-queue.sh
./agent-queue.sh init

# queue a roadmap for Devin, running in the tracker-web repo, auto-approving everything
./agent-queue.sh add ~/roadmaps/UX-2.md \
  --engine devin \
  --cwd /Users/sd9235/code/mygh/learning_ai_common_plat/dashboards/tracker-web \
  --yolo

# start processing (foreground; Ctrl-C to stop). Run up to 2 agents at once.
./agent-queue.sh run --max 2
```

In a **second terminal**, watch progress:

```bash
./agent-queue.sh watch
```

```
AGENT QUEUE  /…/agent-queue/queue
inbox 3   building 2   review 1   testing 2   shipped 5   failed 0   running 2/2

RUNNING
  20260528-2130__UX-2   devin    4m12s   pid 51234   ⏺ Edited src/app/dashboard/items/page.tsx
  20260528-2131__UX-3   claude   1m02s   pid 51290   Running: pnpm typecheck
```

---

## How a task is configured

Each `.md` carries optional **frontmatter** telling the runner which engine to use,
which directory to run in, and whether to auto-approve, plus optional `lock:`,
`timeout:`, and `verify:` settings:

```md
---
engine: devin            # devin | claude | codex   (default: $AGENT_QUEUE_ENGINE)
cwd: /abs/path/to/repo   # where the agent executes (default: cwd when added)
yolo: true               # auto-approve ALL tools   (default: true)
lock: my-repo            # optional mutex key (default: cwd). Jobs sharing a key run serially
timeout: 45m             # optional. 90s|45m|2h|1d. On expiry → failed (result=timeout)
verify: pnpm -s test     # optional auto-QA gate. Runs in cwd after rc=0:
                         #   pass → testing/ (QA), fail → failed/
                         #   (omit to park in review/ for manual promote)
---

# Your task / roadmap goes here
...
```

`add --engine/--cwd/--yolo` will inject this frontmatter for you if the file doesn't
already have a `---` block.

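To make the frontmatter handling concrete, here is a rough sketch of reading one key and of stripping the block before the body is handed to an agent. The helper names `fm_get` and `fm_strip` are hypothetical, and treating a trailing ` # ...` as a comment is an assumption about the format; the actual parser in `agent-queue.sh` may behave differently.

```bash
# fm_get <key> <file>: print the value of one frontmatter key (empty if absent).
fm_get() {
  awk -v key="$1" '
    NR == 1 && $0 != "---" { exit }          # file has no frontmatter block
    NR > 1 && $0 == "---"  { exit }          # closing fence: stop looking
    NR > 1 {
      i = index($0, ":")
      if (i > 0 && substr($0, 1, i - 1) == key) {
        val = substr($0, i + 1)
        sub(/[ \t]+#.*$/, "", val)           # assumption: " # ..." is a comment
        gsub(/^[ \t]+|[ \t]+$/, "", val)     # trim surrounding whitespace
        print val
        exit
      }
    }
  ' "$2"
}

# fm_strip <file>: print the body without the leading frontmatter block.
fm_strip() {
  awk '
    NR == 1 && $0 == "---" { infm = 1; next }
    infm && $0 == "---"    { infm = 0; next }
    !infm                  { print }
  ' "$1"
}

# Demo on a throwaway file:
job=$(mktemp)
printf '%s\n' '---' 'engine: claude   # devin | claude | codex' \
  'timeout: 45m' '---' '# Task' > "$job"
fm_get engine "$job"    # prints: claude
fm_strip "$job"         # prints: # Task
rm -f "$job"
```

Because only a `---` on the very first line opens the block, a `---` horizontal rule later in the prompt body is passed through untouched.
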
## Engine mapping

| `engine:` | Command run | Auto-approve flag (`yolo: true`) |
| --------- | ----------- | -------------------------------- |
| `devin` | `devin -p --prompt-file <body>` | `--permission-mode dangerous` |
| `claude` | `claude -p` (body on **stdin**) | `--dangerously-skip-permissions` |
| `codex` | `codex exec` (body on **stdin**) | `--dangerously-bypass-approvals-and-sandbox` |

The frontmatter is **stripped** before the body reaches the agent, and
claude/codex receive the body on **stdin** so a body starting with `--` is never
misparsed as a flag.

> Flags drift between CLI versions. If one changes, edit `build_agent_cmd()` in
> `agent-queue.sh` (it's the single place each engine is mapped).

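As an illustration of the table, the mapping could be written as a `case` statement that builds an argument vector. This is a sketch only: `sketch_agent_argv` is a hypothetical name (the real logic lives in `build_agent_cmd()` and may differ), the flags are copied from the table, and where the auto-approve flag sits relative to the other arguments is an assumption.

```bash
# sketch_agent_argv <engine> <yolo:true|false> <body_file>
# Prints the command words one per line. For claude/codex the body file is
# NOT an argument; the caller would redirect it to the command's stdin.
sketch_agent_argv() {
  local engine=$1 yolo=$2 body=$3
  case "$engine" in
    devin)
      set -- devin -p --prompt-file "$body"
      if [ "$yolo" = true ]; then set -- "$@" --permission-mode dangerous; fi ;;
    claude)
      set -- claude -p
      if [ "$yolo" = true ]; then set -- "$@" --dangerously-skip-permissions; fi ;;
    codex)
      set -- codex exec
      if [ "$yolo" = true ]; then
        set -- "$@" --dangerously-bypass-approvals-and-sandbox
      fi ;;
    *)
      echo "unknown engine: $engine" >&2
      return 1 ;;
  esac
  printf '%s\n' "$@"
}

sketch_agent_argv devin false task-body.md
# prints, one word per line: devin, -p, --prompt-file, task-body.md
```

Keeping the words as separate arguments (rather than one concatenated string) means a `cwd` or body path containing spaces survives intact.
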
## Commands

| Command | What it does |
| ------- | ------------ |
| `init` | create the `queue/` folders |
| `add <file> [--engine E] [--cwd P] [--yolo\|--no-yolo]` | queue a prompt into `inbox/` |
| `run [--max N] [--engine E] [--once]` | process the inbox (foreground loop) |
| `status` | kanban counts + running-worker table (marks `⚠ stalled` workers) |
| `watch [interval]` | live `status` (bash), redrawn every N seconds (default 2) |
| `dash [--interval N]` | richer **Node** live dashboard: running workers (engine, elapsed, last log line, stall) + recent shipped/failed |
| `stop` | kill running workers + the run loop |
| `logs <job> [-f]` | print / follow a job's log |
| `promote <job>` | advance one stage forward: `review → testing → shipped` |
| `ship <job>` | **manual gate:** move a `testing/` (QA) job → `shipped/` |
| `reject <job>` | send a `review/` or `testing/` job → `failed/` |
| `requeue <job>` | move a `failed`/`review`/`testing` job back to `inbox/` for a fresh run |
| `clean [--keep N]` | archive finished logs+meta beyond the newest N (default 50) into `queue/.archive/` |

Only one `run` loop may be active per queue: a second `run` against the same
queue is refused while the first is alive (a stale `daemon.pid` is cleared).

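The single-run-loop rule can be approximated with a PID file check. The sketch below assumes `daemon.pid` contains just the PID of the active loop and uses a hypothetical function name, `claim_run_loop`; the check-then-write sequence is not atomic, so treat it as an illustration of the stale-PID idea rather than a hardened lock.

```bash
# claim_run_loop <state_dir>
# Returns 1 if a live run loop already owns <state_dir>/daemon.pid;
# otherwise clears any stale file and records the current shell's PID.
claim_run_loop() {
  local pidfile="$1/daemon.pid" old
  if [ -f "$pidfile" ]; then
    old=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the process exists.
    if [ -n "$old" ] && kill -0 "$old" 2>/dev/null; then
      echo "run loop already active (pid $old)" >&2
      return 1
    fi
    rm -f "$pidfile"      # stale: the recorded process is gone
  fi
  echo "$$" > "$pidfile"
}

# Usage (commented out so the sketch has no side effects):
#   claim_run_loop queue/.state || exit 1
```
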
## Via `bytelyst-cli.sh`

Wired into the repo's unified CLI (no GitHub token required for this subcommand):

```bash
./bytelyst-cli.sh agent-queue run --max 2   # full passthrough
./bytelyst-cli.sh aq status                 # short alias
```

## Folder layout

```
queue/
  inbox/      # drop / queued .md files (oldest eligible picked first)
  building/   # currently executing (agent running)
  review/     # agent exited 0; awaiting the auto-QA verify gate (or manual promote)
  testing/    # verify passed (QA); awaiting manual `ship`
  shipped/    # manually shipped; the terminal success stage
  failed/     # non-zero exit, bad cwd, timeout, verify failure, or manual reject
  logs/       # <job>.log: full agent + verify output
  locks/      # per-key flock files (Linux hardening; unused on macOS)
  .state/     # <job>.meta heartbeats + daemon.pid (runtime only)
  .archive/   # <ts>/: logs+meta moved here by `clean`
```

**`result=` values** written to `<job>.meta`: `review`, `testing`, `shipped`,
`failed`, `timeout`, `verify_failed`, `rejected`, `requeued`.

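Since every stage is just a folder, a manual transition such as `promote` amounts to a file move plus a status note. The following is a sketch under stated assumptions: jobs are stored as `<job>.md`, the meta file accepts an appended `result=<stage>` line, and `promote_job` is a hypothetical name, not the script's real function.

```bash
# promote_job <queue_root> <job>
# Moves the job one stage forward (review -> testing -> shipped) and prints
# the stage it landed in. Fails if the job is in neither review/ nor testing/.
promote_job() {
  local root=$1 job=$2 from to
  for from in review testing; do
    if [ -f "$root/$from/$job.md" ]; then
      case "$from" in
        review)  to=testing ;;
        testing) to=shipped ;;
      esac
      mv "$root/$from/$job.md" "$root/$to/$job.md"
      # Assumption: the latest result= line in the meta file is authoritative.
      printf 'result=%s\n' "$to" >> "$root/.state/$job.meta"
      echo "$to"
      return 0
    fi
  done
  echo "no such job in review/ or testing/: $job" >&2
  return 1
}
```

The function returns as soon as one move succeeds, so a single call never skips a stage.
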
## Config (env overrides)

| Var | Default | Meaning |
| --- | ------- | ------- |
| `AGENT_QUEUE_ROOT` | `./queue` | where the kanban folders live |
| `AGENT_QUEUE_MAX` | `2` | max concurrent agents |
| `AGENT_QUEUE_ENGINE` | `devin` | default engine when none in frontmatter |
| `AGENT_QUEUE_POLL` | `3` | inbox poll interval (seconds) |
| `AGENT_QUEUE_VERIFY` | _(empty)_ | default auto-QA verify command; per-job `verify:` overrides it |
| `AGENT_QUEUE_STALL_MIN` | `10` | minutes of unchanged log before a worker is `⚠ stalled` |
| `DEVIN_BIN` / `CLAUDE_BIN` / `CODEX_BIN` | autodetected | override CLI binary paths |
| `FLOCK_BIN` / `TIMEOUT_BIN` | autodetected | `flock` (lock hardening) and `timeout`/`gtimeout` (hard timeouts); absent on stock macOS; see notes |

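For readers who want to see how such overrides typically work in shell, the sketch below applies the table's defaults and shows two of the values in use. The helper names `resolve_verify` and `is_stalled` are hypothetical, and using the log file's modification time as the stall signal is an assumption based on the "unchanged log" wording.

```bash
# Defaults from the table; a value already set in the environment wins.
: "${AGENT_QUEUE_ROOT:=./queue}"
: "${AGENT_QUEUE_MAX:=2}"
: "${AGENT_QUEUE_ENGINE:=devin}"
: "${AGENT_QUEUE_POLL:=3}"
: "${AGENT_QUEUE_VERIFY:=}"
: "${AGENT_QUEUE_STALL_MIN:=10}"

# resolve_verify <job_verify>: the per-job `verify:` value overrides the
# environment default; prints an empty line when neither is set.
resolve_verify() {
  printf '%s\n' "${1:-$AGENT_QUEUE_VERIFY}"
}

# is_stalled <logfile>: true when the log was last modified more than
# AGENT_QUEUE_STALL_MIN minutes ago (find prints the path only in that case).
is_stalled() {
  [ -n "$(find "$1" -mmin +"$AGENT_QUEUE_STALL_MIN" 2>/dev/null)" ]
}
```

For a one-off override, set the variable on the command line, for example `AGENT_QUEUE_MAX=4 ./agent-queue.sh run`.
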
## ⚠️ Safety

Running agents with `yolo: true` means **no approval prompts**: they will edit files,
run shell commands, and commit unattended. Mitigate:

- Prefer **scope-locked** prompt files (e.g. "edit only under `dashboards/tracker-web/`").
- Tell prompts **not to `git push`**; review commits before they leave your machine.
- **Same-repo safety is automatic:** jobs sharing a `cwd` (or `lock:` key) are
  serialized, so two agents never run in one repo at once, even at `--max 2+`.
- Set a `timeout:` on long jobs so a wedged agent can't run forever.
- Watch cost: each job is a full agent session.

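The same-repo serialization in the list above boils down to a key comparison before a job is started. A minimal sketch, with hypothetical helper names and a plain newline-separated list standing in for whatever bookkeeping the runner really uses:

```bash
# lock_key <lock> <cwd>: the job's `lock:` value if set, else its cwd.
lock_key() {
  printf '%s\n' "${1:-$2}"
}

# can_start <key> <busy_keys>: busy_keys holds the keys of running jobs,
# one per line. Succeeds only when no running job holds the same key.
can_start() {
  if printf '%s\n' "$2" | grep -Fxq -- "$1"; then
    return 1    # key already held: leave the job in inbox/ for now
  fi
  return 0
}

busy=$(printf '%s\n' /repos/tracker-web my-repo)
can_start "$(lock_key "" /repos/other)" "$busy" && echo "start"   # prints: start
can_start "$(lock_key my-repo /repos/x)" "$busy" || echo "wait"   # prints: wait
```

`grep -Fx` compares whole lines literally, so `/repos/a` does not collide with `/repos/ab`.
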
### Portability notes

- **macOS** has no `flock`/`timeout`; locking relies on the single run-loop
  (enforced by the second-run refusal) and timeouts use a pure-bash watchdog.
  Install coreutils (`gtimeout`) for hard process-tree kills.
- **Linux** (incl. Gitea CI) uses `flock` + `timeout` for cross-process hardening.

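To show what a shell-only timeout can look like, here is a rough sketch that converts a `timeout:` value (`90s|45m|2h|1d`) to seconds and runs a command under a background watchdog. Both function names are hypothetical and this is not the script's actual implementation. Like the fallback described above, it signals only the direct child, not the whole process tree.

```bash
# to_seconds <duration>: 90s | 45m | 2h | 1d -> seconds; returns 1 on bad input.
to_seconds() {
  local arg=$1 n unit
  n=${arg%[smhd]}                 # numeric part
  unit=${arg#"$n"}                # trailing unit letter (empty if missing)
  case "$n" in ''|*[!0-9]*) return 1 ;; esac
  n=${n#"${n%%[!0]*}"}            # drop leading zeros so 08 is not read as octal
  n=${n:-0}
  case "$unit" in
    s) echo "$n" ;;
    m) echo $((n * 60)) ;;
    h) echo $((n * 3600)) ;;
    d) echo $((n * 86400)) ;;
    *) return 1 ;;
  esac
}

# run_with_watchdog <seconds> <cmd...>: returns the command's exit status,
# or 143 (128 + SIGTERM) when the watchdog had to terminate it.
run_with_watchdog() {
  local secs=$1 pid watcher rc
  shift
  "$@" &
  pid=$!
  ( sleep "$secs"; kill -TERM "$pid" 2>/dev/null ) >/dev/null 2>&1 &
  watcher=$!
  wait "$pid"
  rc=$?
  kill "$watcher" 2>/dev/null     # command finished first: cancel the watchdog
  wait "$watcher" 2>/dev/null
  return "$rc"
}

to_seconds 45m    # prints: 2700
```

A caller would combine the two, for example `run_with_watchdog "$(to_seconds 45m)" some-command`, and map exit status 143 to `result=timeout`.
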
## Roadmap / nice-to-haves

- [x] Per-repo lock to serialize same-repo jobs automatically (`lock:` / cwd).
- [x] Per-job `timeout:` with hard kill (or bash watchdog fallback).
- [x] Stall detection in `status`/`dash`.
- [x] `requeue` failed jobs + `clean`/archive old runs.
- [x] Build/ship lifecycle: `review → testing → shipped` with auto-QA `verify:` gate + manual `ship`.
- [ ] `--push` opt-in policy + commit review gate.
- [ ] Optional notifications (Slack/desktop) on done/failed/stall.
- [ ] Persisted run-loop as a daemon/service with auto-restart.