bytelyst-devops-tools/agent-queue/README.md
saravanakumardb1 af1bc6904e feat(agent-queue): build/ship lifecycle with auto-QA verify gate + manual ship
Redesign the kanban runner stages from inbox->doing->done/failed to
inbox->building->review->testing->shipped (+ failed):

- worker: agent rc=0 lands in review/, then runs the configurable
  verify command (frontmatter verify: / AGENT_QUEUE_VERIFY) in cwd;
  pass -> testing/ (QA), fail -> failed/, none -> parks in review/
- new commands: ship (testing->shipped, manual gate), promote
  (advance one stage), reject (review/testing->failed); requeue now
  also pulls from review/testing
- status + dashboard.mjs render all six stages; RECENT panel labels
  shipped/testing/review/verify_failed/timeout/rejected
- README: new lifecycle diagram, verify: frontmatter, result= glossary,
  command table + folder layout
- selftest: assert no-verify->review, verify-pass->testing->ship->shipped,
  verify-fail->failed
- rename queue/doing->building, queue/done->review; add testing/ shipped/
2026-05-29 16:03:01 -07:00

8.4 KiB

agent-queue

A zero-dependency folder "kanban" runner for headless coding-agent CLIs — Devin, Claude Code, and OpenAI Codex. Drop prompt .md files into a folder, and they get executed (in auto-approve mode) one slot at a time, moving through inbox → building → review → testing → shipped (plus failed) with live status.

Build/ship lifecycle — auto-QA, manual ship:

inbox ─▶ building ─▶ review ─▶ testing ─▶ shipped
  (queued)  (agent     (rc=0;    (verify    (you ran
            running)   awaiting  passed —    `ship`)
                       verify)   QA gate)
                          │
        agent rc≠0 /      │ verify fails
        timeout ──────────┴──────────────▶ failed
  • Auto: agent exits 0 → review/. If a verify: command is configured it runs automatically: pass → testing/ (QA), fail → failed/. No verify: → the job parks in review/ for a manual promote.
  • Manual: you ship a testing/ job → shipped/ (the human gate). Shipping is never automatic.

Why this exists: the agent CLIs ship a minimal local interface (no built-in batch/queue/dashboard — that lives in their cloud products). This is the zero-dependency bash glue that turns "run one prompt interactively" into "queue many and walk away."


Quick start

cd learning_ai_devops_tools/agent-queue
chmod +x agent-queue.sh
./agent-queue.sh init

# queue a roadmap for Devin, running in the tracker-web repo, auto-approving everything
./agent-queue.sh add ~/roadmaps/UX-2.md \
  --engine devin \
  --cwd /Users/sd9235/code/mygh/learning_ai_common_plat/dashboards/tracker-web \
  --yolo

# start processing (foreground; Ctrl-C to stop). Run up to 2 agents at once.
./agent-queue.sh run --max 2

In a second terminal, watch progress:

./agent-queue.sh watch
  AGENT QUEUE  /…/agent-queue/queue
  inbox 3   building 2   review 1   testing 2   shipped 5   failed 0   running 2/2

  RUNNING
    20260528-2130__UX-2        devin     4m12s  pid 51234  ⏺ Edited src/app/dashboard/items/page.tsx
    20260528-2131__UX-3        claude    1m02s  pid 51290  Running: pnpm typecheck

How a task is configured

Each .md carries optional frontmatter telling the runner which engine to use, which directory to run in, and whether to auto-approve:

---
engine: devin          # devin | claude | codex   (default: $AGENT_QUEUE_ENGINE)
cwd: /abs/path/to/repo # where the agent executes   (default: cwd when added)
yolo: true             # auto-approve ALL tools      (default: true)
lock: my-repo          # optional mutex key (default: cwd). Jobs sharing a key run serially
timeout: 45m           # optional. 90s|45m|2h|1d. On expiry → failed (result=timeout)
verify: pnpm -s test   # optional auto-QA gate. Runs in cwd after rc=0:
                       #   pass → testing/ (QA),  fail → failed/
                       #   (omit to park in review/ for manual promote)
---

# Your task / roadmap goes here
...

add --engine/--cwd/--yolo will inject this frontmatter for you if the file doesn't already have a --- block.

Engine mapping

engine: Command run Auto-approve flag (yolo: true)
devin devin -p --prompt-file <body> --permission-mode dangerous
claude claude -p (body on stdin) --dangerously-skip-permissions
codex codex exec (body on stdin) --dangerously-bypass-approvals-and-sandbox

The frontmatter is stripped before the body reaches the agent, and claude/codex receive it on stdin so a body starting with -- is never misparsed as a flag.

Flags drift between CLI versions — if one changes, edit build_agent_cmd() in agent-queue.sh (it's the single place each engine is mapped).

Commands

Command What it does
init create the queue/ folders
add <file> [--engine E] [--cwd P] [--yolo|--no-yolo] queue a prompt into inbox/
run [--max N] [--engine E] [--once] process the inbox (foreground loop)
status kanban counts + running-worker table (marks ⚠ stalled workers)
watch [interval] live status (bash), redrawn every N seconds (default 2)
dash [--interval N] richer Node live dashboard — running workers (engine, elapsed, last log line, stall) + recent shipped/failed
stop kill running workers + the run loop
logs <job> [-f] print / follow a job's log
promote <job> advance one stage forward: review → testing → shipped
ship <job> manual gate: move a testing/ (QA) job → shipped/
reject <job> send a review/ or testing/ job → failed/
requeue <job> move a failed/review/testing job back to inbox/ for a fresh run
clean [--keep N] archive finished logs+meta beyond the newest N (default 50) into queue/.archive/

Only one run loop may be active per queue — a second run against the same queue is refused while the first is alive (a stale daemon.pid is cleared).

Via bytelyst-cli.sh

Wired into the repo's unified CLI (no GitHub token required for this subcommand):

./bytelyst-cli.sh agent-queue run --max 2     # full passthrough
./bytelyst-cli.sh aq status                   # short alias

Folder layout

queue/
  inbox/    # drop / queued .md files (oldest eligible picked first)
  building/ # currently executing (agent running)
  review/   # agent exited 0 — awaiting the auto-QA verify gate (or manual promote)
  testing/  # verify passed (QA) — awaiting manual `ship`
  shipped/  # manually shipped — the terminal success stage
  failed/   # non-zero exit, bad cwd, timeout, verify failure, or manual reject
  logs/     # <job>.log — full agent + verify output
  locks/    # per-key flock files (Linux hardening; unused on macOS)
  .state/   # <job>.meta heartbeats + daemon.pid (runtime only)
  .archive/ # <ts>/ — logs+meta moved here by `clean`

result= values written to <job>.meta: review, testing, shipped, failed, timeout, verify_failed, rejected, requeued.

Config (env overrides)

Var Default Meaning
AGENT_QUEUE_ROOT ./queue where the kanban folders live
AGENT_QUEUE_MAX 2 max concurrent agents
AGENT_QUEUE_ENGINE devin default engine when none in frontmatter
AGENT_QUEUE_POLL 3 inbox poll interval (seconds)
AGENT_QUEUE_VERIFY (empty) default auto-QA verify command; per-job verify: overrides it
AGENT_QUEUE_STALL_MIN 10 minutes of unchanged log before a worker is ⚠ stalled
DEVIN_BIN / CLAUDE_BIN / CODEX_BIN autodetected override CLI binary paths
FLOCK_BIN / TIMEOUT_BIN autodetected flock (lock hardening) and timeout/gtimeout (hard timeouts); absent on stock macOS — see notes

⚠️ Safety

Running agents with yolo: true means no approval prompts — they will edit files, run shell commands, and commit unattended. Mitigate:

  • Prefer scope-locked prompt files (e.g. "edit only under dashboards/tracker-web/").
  • Tell prompts not to git push — review commits before they leave your machine.
  • Same-repo safety is automatic: jobs sharing a cwd (or lock: key) are serialized, so two agents never run in one repo at once — even at --max 2+.
  • Set a timeout: on long jobs so a wedged agent can't run forever.
  • Watch cost: each job is a full agent session.

Portability notes

  • macOS has no flock/timeout; locking relies on the single run-loop (enforced by the second-run refusal) and timeouts use a pure-bash watchdog. Install coreutils (gtimeout) for hard process-tree kills.
  • Linux (incl. Gitea CI) uses flock + timeout for cross-process hardening.

Roadmap / nice-to-haves

  • Per-repo lock to serialize same-repo jobs automatically (lock: / cwd).
  • Per-job timeout: with hard kill (or bash watchdog fallback).
  • Stall detection in status/dash.
  • requeue failed jobs + clean/archive old runs.
  • Build/ship lifecycle: review → testing → shipped with auto-QA verify: gate + manual ship.
  • --push opt-in policy + commit review gate.
  • Optional notifications (Slack/desktop) on done/failed/stall.
  • Persisted run-loop as a daemon/service with auto-restart.