bytelyst-devops-tools/agent-queue/README.md
saravanakumardb1 dde677f4b9 feat(agent-queue): interactive dashboard — navigable job list + single-key actions
Turn dash into a menu-driven control panel (single mjs script):
- numbered, arrow/j-k/1-9 selectable JOBS list (review/testing/failed/inbox)
- single-key actions wired to agent-queue.sh (single source of truth):
  p promote, s ship, x reject, u requeue (reject/requeue confirm y/n)
- enter/l opens a live log viewer; r starts a detached run loop, S stops it
- run-loop pid indicator, transient action flashes, ? help overlay
- non-TTY falls back to the read-only live view
- README: dash command + interactive key table
2026-05-29 16:19:23 -07:00

220 lines
9.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# agent-queue
A zero-dependency **folder "kanban" runner** for headless coding-agent CLIs —
**Devin**, **Claude Code**, and **OpenAI Codex**. Drop prompt `.md` files into a folder,
and they get executed (in auto-approve mode) one slot at a time, moving through
`inbox → building → review → testing → shipped` (plus `failed`) with live status.
**Build/ship lifecycle — auto-QA, manual ship:**
```
inbox ─▶ building ─▶ review ─▶ testing ─▶ shipped
(queued) (agent (rc=0; (verify (you ran
running) awaiting passed — `ship`)
verify) QA gate)
agent rc≠0 / │ verify fails
timeout ──────────┴──────────────▶ failed
```
- **Auto:** agent exits 0 → `review/`. If a `verify:` command is configured it runs
automatically: **pass → `testing/` (QA)**, **fail → `failed/`**. No `verify:`
the job parks in `review/` for a manual `promote`.
- **Manual:** you `ship` a `testing/` job → `shipped/` (the human gate). Shipping is
never automatic.
> **Why this exists:** the agent CLIs ship a minimal local interface (no built-in
> batch/queue/dashboard — that lives in their *cloud* products). This is the
> zero-dependency bash glue that turns "run one prompt interactively" into
> "queue many and walk away."
---
## Quick start
```bash
cd learning_ai_devops_tools/agent-queue
chmod +x agent-queue.sh
./agent-queue.sh init
# queue a roadmap for Devin, running in the tracker-web repo, auto-approving everything
./agent-queue.sh add ~/roadmaps/UX-2.md \
--engine devin \
--cwd /Users/sd9235/code/mygh/learning_ai_common_plat/dashboards/tracker-web \
--yolo
# start processing (foreground; Ctrl-C to stop). Run up to 3 agents at once (default).
./agent-queue.sh run --max 3
```
In a **second terminal**, watch progress:
```bash
./agent-queue.sh watch
```
```
AGENT QUEUE /…/agent-queue/queue
inbox 3 building 2 review 1 testing 2 shipped 5 failed 0 running 2/2
RUNNING
20260528-2130__UX-2 devin 4m12s pid 51234 ⏺ Edited src/app/dashboard/items/page.tsx
20260528-2131__UX-3 claude 1m02s pid 51290 Running: pnpm typecheck
```
---
## How a task is configured
Each `.md` carries optional **frontmatter** telling the runner which engine to use,
which directory to run in, and whether to auto-approve:
```md
---
engine: devin # devin | claude | codex (default: $AGENT_QUEUE_ENGINE)
cwd: /abs/path/to/repo # where the agent executes (default: cwd when added)
yolo: true # auto-approve ALL tools (default: true)
lock: my-repo # optional mutex key (default: cwd). Jobs sharing a key run serially
timeout: 45m # optional. 90s|45m|2h|1d. On expiry → failed (result=timeout)
verify: pnpm -s test # optional auto-QA gate. Runs in cwd after rc=0:
# pass → testing/ (QA), fail → failed/
# (omit to park in review/ for manual promote)
---
# Your task / roadmap goes here
...
```
`add --engine/--cwd/--yolo` will inject this frontmatter for you if the file doesn't
already have a `---` block.
## Engine mapping
| `engine:` | Command run | Auto-approve flag (`yolo: true`) |
| --------- | ----------- | -------------------------------- |
| `devin` | `devin -p --prompt-file <body>` | `--permission-mode dangerous` |
| `claude` | `claude -p` (body on **stdin**) | `--dangerously-skip-permissions` |
| `codex` | `codex exec` (body on **stdin**) | `--dangerously-bypass-approvals-and-sandbox` |
The frontmatter is **stripped** before the body reaches the agent, and
claude/codex receive it on **stdin** so a body starting with `--` is never
misparsed as a flag.
> Flags drift between CLI versions — if one changes, edit `build_agent_cmd()` in
> `agent-queue.sh` (it's the single place each engine is mapped).
## Commands
| Command | What it does |
| ------- | ------------ |
| `init` | create the `queue/` folders |
| `add <file> [--engine E] [--cwd P] [--yolo\|--no-yolo]` | queue a prompt into `inbox/` |
| `run [--max N] [--engine E] [--once]` | process the inbox (foreground loop) |
| `status` | kanban counts + running-worker table (marks `⚠ stalled` workers) |
| `watch [interval]` | live `status` (bash), redrawn every N seconds (default 2) |
| `dash [--interval N]` | **interactive Node dashboard** — navigable numbered job list with single-key actions (see below) |
| `stop` | kill running workers + the run loop |
| `logs <job> [-f]` | print / follow a job's log |
| `promote <job>` | advance one stage forward: `review → testing → shipped` |
| `ship <job>` | **manual gate:** move a `testing/` (QA) job → `shipped/` |
| `reject <job>` | send a `review/` or `testing/` job → `failed/` |
| `requeue <job>` | move a `failed`/`review`/`testing` job back to `inbox/` for a fresh run |
| `clean [--keep N]` | archive finished logs+meta beyond the newest N (default 50) into `queue/.archive/` |
Only one `run` loop may be active per queue — a second `run` against the same
queue is refused while the first is alive (a stale `daemon.pid` is cleared).
### Interactive dashboard (`dash`)
`dash` is a single-script, menu-driven control panel (think a tiny "glassbox").
It shows the kanban counts, live RUNNING workers (engine, elapsed, last log
line, stall), a **navigable numbered JOBS list**, and RECENT finished jobs — and
lets you act on jobs without leaving the screen. Every action shells out to
`agent-queue.sh`, so the script stays the single source of truth.
| Key | Action |
| --- | ------ |
| `↑`/`↓`, `j`/`k`, `1``9` | select a job in the JOBS list |
| `enter` / `l` | view the selected job's log (live, auto-refreshing) |
| `p` | promote (`review → testing → shipped`) |
| `s` | ship (`testing`/QA → `shipped`, the manual gate) |
| `x` | reject (`review`/`testing` → `failed`) — asks `y/n` |
| `u` | requeue (`failed`/`review`/`testing` → `inbox`) — asks `y/n` |
| `r` | start the `run` loop (detached → `logs/run-loop.log`) |
| `S` | stop the run loop + running workers |
| `g` | refresh now · `?`/`h` help · `q`/`Ctrl-C` quit |
The header shows a `● run loop pid N` / `○ run loop stopped` indicator. Run it in
a TTY for the interactive mode; piped/non-TTY it falls back to a read-only live view.
## Via `bytelyst-cli.sh`
Wired into the repo's unified CLI (no GitHub token required for this subcommand):
```bash
./bytelyst-cli.sh agent-queue run --max 3 # full passthrough
./bytelyst-cli.sh aq status # short alias
```
## Folder layout
```
queue/
inbox/ # drop / queued .md files (oldest eligible picked first)
building/ # currently executing (agent running)
review/ # agent exited 0 — awaiting the auto-QA verify gate (or manual promote)
testing/ # verify passed (QA) — awaiting manual `ship`
shipped/ # manually shipped — the terminal success stage
failed/ # non-zero exit, bad cwd, timeout, verify failure, or manual reject
logs/ # <job>.log — full agent + verify output
locks/ # per-key flock files (Linux hardening; unused on macOS)
.state/ # <job>.meta heartbeats + daemon.pid (runtime only)
.archive/ # <ts>/ — logs+meta moved here by `clean`
```
**`result=` values** written to `<job>.meta`: `review`, `testing`, `shipped`,
`failed`, `timeout`, `verify_failed`, `rejected`, `requeued`.
## Config (env overrides)
| Var | Default | Meaning |
| --- | ------- | ------- |
| `AGENT_QUEUE_ROOT` | `./queue` | where the kanban folders live |
| `AGENT_QUEUE_MAX` | `3` | max concurrent agents (override per-run with `run --max N`) |
| `AGENT_QUEUE_ENGINE` | `devin` | default engine when none in frontmatter |
| `AGENT_QUEUE_POLL` | `3` | inbox poll interval (seconds) |
| `AGENT_QUEUE_VERIFY` | _(empty)_ | default auto-QA verify command; per-job `verify:` overrides it |
| `AGENT_QUEUE_STALL_MIN` | `10` | minutes of unchanged log before a worker is `⚠ stalled` |
| `DEVIN_BIN` / `CLAUDE_BIN` / `CODEX_BIN` | autodetected | override CLI binary paths |
| `FLOCK_BIN` / `TIMEOUT_BIN` | autodetected | `flock` (lock hardening) and `timeout`/`gtimeout` (hard timeouts); absent on stock macOS — see notes |
## ⚠️ Safety
Running agents with `yolo: true` means **no approval prompts** — they will edit files,
run shell commands, and commit unattended. Mitigate:
- Prefer **scope-locked** prompt files (e.g. "edit only under `dashboards/tracker-web/`").
- Tell prompts **not to `git push`** — review commits before they leave your machine.
- **Same-repo safety is automatic:** jobs sharing a `cwd` (or `lock:` key) are
serialized, so two agents never run in one repo at once — even at `--max 2+`.
- Set a `timeout:` on long jobs so a wedged agent can't run forever.
- Watch cost: each job is a full agent session.
### Portability notes
- **macOS** has no `flock`/`timeout`; locking relies on the single run-loop
(enforced by the second-run refusal) and timeouts use a pure-bash watchdog.
Install coreutils (`gtimeout`) for hard process-tree kills.
- **Linux** (incl. Gitea CI) uses `flock` + `timeout` for cross-process hardening.
## Roadmap / nice-to-haves
- [x] Per-repo lock to serialize same-repo jobs automatically (`lock:` / cwd).
- [x] Per-job `timeout:` with hard kill (or bash watchdog fallback).
- [x] Stall detection in `status`/`dash`.
- [x] `requeue` failed jobs + `clean`/archive old runs.
- [x] Build/ship lifecycle: `review → testing → shipped` with auto-QA `verify:` gate + manual `ship`.
- [ ] `--push` opt-in policy + commit review gate.
- [ ] Optional notifications (Slack/desktop) on done/failed/stall.
- [ ] Persisted run-loop as a daemon/service with auto-restart.