diff --git a/agent-queue/README.md b/agent-queue/README.md index 864c650..9366abf 100644 --- a/agent-queue/README.md +++ b/agent-queue/README.md @@ -3,7 +3,25 @@ A zero-dependency **folder "kanban" runner** for headless coding-agent CLIs — **Devin**, **Claude Code**, and **OpenAI Codex**. Drop prompt `.md` files into a folder, and they get executed (in auto-approve mode) one slot at a time, moving through -`inbox → doing → done/failed` with live status. +`inbox → building → review → testing → shipped` (plus `failed`) with live status. + +**Build/ship lifecycle — auto-QA, manual ship:** + +``` +inbox ─▶ building ─▶ review ─▶ testing ─▶ shipped + (queued) (agent (rc=0; (verify (you ran + running) awaiting passed — `ship`) + verify) QA gate) + │ + agent rc≠0 / │ verify fails + timeout ──────────┴──────────────▶ failed +``` + +- **Auto:** agent exits 0 → `review/`. If a `verify:` command is configured it runs + automatically: **pass → `testing/` (QA)**, **fail → `failed/`**. No `verify:` → + the job parks in `review/` for a manual `promote`. +- **Manual:** you `ship` a `testing/` job → `shipped/` (the human gate). Shipping is + never automatic. > **Why this exists:** the agent CLIs ship a minimal local interface (no built-in > batch/queue/dashboard — that lives in their *cloud* products). This is the @@ -37,7 +55,7 @@ In a **second terminal**, watch progress: ``` AGENT QUEUE /…/agent-queue/queue - inbox 3 doing 2 done 5 failed 0 running 2/2 + inbox 3 building 2 review 1 testing 2 shipped 5 failed 0 running 2/2 RUNNING 20260528-2130__UX-2 devin 4m12s pid 51234 ⏺ Edited src/app/dashboard/items/page.tsx @@ -58,6 +76,9 @@ cwd: /abs/path/to/repo # where the agent executes (default: cwd when added) yolo: true # auto-approve ALL tools (default: true) lock: my-repo # optional mutex key (default: cwd). Jobs sharing a key run serially timeout: 45m # optional. 90s|45m|2h|1d. On expiry → failed (result=timeout) +verify: pnpm -s test # optional auto-QA gate. Runs in cwd after rc=0: + # pass → testing/ (QA), fail → failed/ + # (omit to park in review/ for manual promote) --- # Your task / roadmap goes here @@ -91,10 +112,13 @@ misparsed as a flag. | `run [--max N] [--engine E] [--once]` | process the inbox (foreground loop) | | `status` | kanban counts + running-worker table (marks `⚠ stalled` workers) | | `watch [interval]` | live `status` (bash), redrawn every N seconds (default 2) | -| `dash [--interval N]` | richer **Node** live dashboard — running workers (engine, elapsed, last log line, stall) + recent done/failed | +| `dash [--interval N]` | richer **Node** live dashboard — running workers (engine, elapsed, last log line, stall) + recent shipped/failed | | `stop` | kill running workers + the run loop | | `logs [-f]` | print / follow a job's log | -| `requeue ` | move a failed job back to `inbox/` for a fresh run | +| `promote ` | advance one stage forward: `review → testing → shipped` | +| `ship ` | **manual gate:** move a `testing/` (QA) job → `shipped/` | +| `reject ` | send a `review/` or `testing/` job → `failed/` | +| `requeue ` | move a `failed`/`review`/`testing` job back to `inbox/` for a fresh run | | `clean [--keep N]` | archive finished logs+meta beyond the newest N (default 50) into `queue/.archive/` | Only one `run` loop may be active per queue — a second `run` against the same @@ -114,15 +138,20 @@ Wired into the repo's unified CLI (no GitHub token required for this subcommand) ``` queue/ inbox/ # drop / queued .md files (oldest eligible picked first) - doing/ # currently executing - done/ # exited 0 - failed/ # non-zero exit, bad cwd, or timeout (result=timeout) - logs/ # .log — full agent output + building/ # currently executing (agent running) + review/ # agent exited 0 — awaiting the auto-QA verify gate (or manual promote) + testing/ # verify passed (QA) — awaiting manual `ship` + shipped/ # manually shipped — the terminal success stage + failed/ # non-zero exit, bad cwd, timeout, verify failure, or manual reject + logs/ # .log — full agent + verify output locks/ # per-key flock files (Linux hardening; unused on macOS) .state/ # .meta heartbeats + daemon.pid (runtime only) .archive/ # / — logs+meta moved here by `clean` ``` +**`result=` values** written to `.meta`: `review`, `testing`, `shipped`, +`failed`, `timeout`, `verify_failed`, `rejected`, `requeued`. + ## Config (env overrides) | Var | Default | Meaning | @@ -131,6 +160,7 @@ queue/ | `AGENT_QUEUE_MAX` | `2` | max concurrent agents | | `AGENT_QUEUE_ENGINE` | `devin` | default engine when none in frontmatter | | `AGENT_QUEUE_POLL` | `3` | inbox poll interval (seconds) | +| `AGENT_QUEUE_VERIFY` | _(empty)_ | default auto-QA verify command; per-job `verify:` overrides it | | `AGENT_QUEUE_STALL_MIN` | `10` | minutes of unchanged log before a worker is `⚠ stalled` | | `DEVIN_BIN` / `CLAUDE_BIN` / `CODEX_BIN` | autodetected | override CLI binary paths | | `FLOCK_BIN` / `TIMEOUT_BIN` | autodetected | `flock` (lock hardening) and `timeout`/`gtimeout` (hard timeouts); absent on stock macOS — see notes | @@ -160,6 +190,7 @@ run shell commands, and commit unattended. Mitigate: - [x] Per-job `timeout:` with hard kill (or bash watchdog fallback). - [x] Stall detection in `status`/`dash`. - [x] `requeue` failed jobs + `clean`/archive old runs. +- [x] Build/ship lifecycle: `review → testing → shipped` with auto-QA `verify:` gate + manual `ship`. - [ ] `--push` opt-in policy + commit review gate. - [ ] Optional notifications (Slack/desktop) on done/failed/stall. - [ ] Persisted run-loop as a daemon/service with auto-restart. diff --git a/agent-queue/agent-queue.sh b/agent-queue/agent-queue.sh index ad79082..e75ab30 100755 --- a/agent-queue/agent-queue.sh +++ b/agent-queue/agent-queue.sh @@ -4,10 +4,15 @@ # # Drop a prompt .md file into queue/inbox/, and `agent-queue run` will: # 1. pick the oldest file (respecting --max concurrency), -# 2. move it inbox/ -> doing/, +# 2. move it inbox/ -> building/, # 3. launch the chosen agent CLI (devin | claude | codex) in --yolo mode, -# 4. on success move doing/ -> done/, on failure -> failed/, -# 5. write a per-job log + live state so `status`/`watch` can show progress. +# 4. on agent rc=0 move building/ -> review/, then run the auto-QA verify gate: +# verify pass -> testing/ verify fail -> failed/ (no verify -> stays in review/) +# 5. on agent failure/timeout move building/ -> failed/, +# 6. you manually `ship` testing/ -> shipped/ (the human gate), +# 7. write a per-job log + live state so `status`/`watch` can show progress. +# +# Lifecycle: inbox -> building -> review -> testing -> shipped (+ failed) # # Per-task config travels in YAML-ish frontmatter at the top of the .md: # --- @@ -16,7 +21,8 @@ # yolo: true # auto-approve all tools (default: true) # --- # -# Subcommands: init | add | run | status | watch | stop | logs | help +# Subcommands: init | add | run | status | watch | dash | stop | logs | +# promote | ship | reject | requeue | clean | help # set -uo pipefail @@ -24,8 +30,10 @@ set -uo pipefail SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" QUEUE_ROOT="${AGENT_QUEUE_ROOT:-$SCRIPT_DIR/queue}" INBOX="$QUEUE_ROOT/inbox" -DOING="$QUEUE_ROOT/doing" -DONE="$QUEUE_ROOT/done" +BUILDING="$QUEUE_ROOT/building" +REVIEW="$QUEUE_ROOT/review" +TESTING="$QUEUE_ROOT/testing" +SHIPPED="$QUEUE_ROOT/shipped" FAILED="$QUEUE_ROOT/failed" LOGS="$QUEUE_ROOT/logs" STATE="$QUEUE_ROOT/.state" @@ -38,6 +46,11 @@ POLL_SECONDS="${AGENT_QUEUE_POLL:-3}" # A running worker is flagged "stalled" if its log has not changed in this many # minutes (no new agent output) — surfaced in status + dash. STALL_MIN="${AGENT_QUEUE_STALL_MIN:-10}" +# Auto-QA verify command. After an agent exits 0 the job lands in review/; if a +# verify command is set (frontmatter `verify:` overrides this default) it runs in +# the job's cwd: pass -> testing/ (QA), fail -> failed/. Empty default = jobs park +# in review/ for manual `promote`. Shipping (testing -> shipped) is always manual. +DEFAULT_VERIFY="${AGENT_QUEUE_VERIFY:-}" # flock is used for cross-process lock hardening when available (Linux). macOS # has no flock; mutual exclusion there relies on the single run-loop (see cmd_run). @@ -63,7 +76,7 @@ err() { printf '%s[agent-queue]%s %s\n' "$C_RED" "$C_RESET" "$*" >&2; } die() { err "$*"; exit 1; } # ── Init ──────────────────────────────────────────────────────────── -ensure_dirs() { mkdir -p "$INBOX" "$DOING" "$DONE" "$FAILED" "$LOGS" "$STATE" "$LOCKS"; } +ensure_dirs() { mkdir -p "$INBOX" "$BUILDING" "$REVIEW" "$TESTING" "$SHIPPED" "$FAILED" "$LOGS" "$STATE" "$LOCKS"; } # ── Frontmatter parsing ───────────────────────────────────────────── # fm_get @@ -291,19 +304,45 @@ run_worker() { fi rm -f "$tmo_flag" - echo "ended=$(date +%s)" >> "$metaf" echo "exit=$rc" >> "$metaf" if $timed_out; then mv "$doing_file" "$FAILED/" 2>/dev/null echo "result=timeout" >> "$metaf" + echo "ended=$(date +%s)" >> "$metaf" echo "TIMED OUT after ${tmo}s (rc=$rc): $(date)" >> "$logf" elif [[ $rc -eq 0 ]]; then - mv "$doing_file" "$DONE/" 2>/dev/null - echo "result=done" >> "$metaf" - echo "completed OK (rc=0): $(date)" >> "$logf" + # Agent succeeded: land in review/, then run the auto-QA verify gate. The + # worker is still alive here so the concurrency slot stays held through + # verification — `ended=` is written only once we reach a resting stage. + mv "$doing_file" "$REVIEW/" 2>/dev/null + local review_file="$REVIEW/$job.md" + echo "completed OK (rc=0): landed in review — $(date)" >> "$logf" + local verify; verify=$(fm_get "$review_file" verify "$DEFAULT_VERIFY") + if [[ -z "$verify" ]]; then + echo "result=review" >> "$metaf" + echo "ended=$(date +%s)" >> "$metaf" + echo "no verify command — parked in review for manual promote: $(date)" >> "$logf" + else + echo "----- verify: $verify -----" >> "$logf" + local vrc=0 + ( cd "$cwd" && bash -c "$verify" ) >> "$logf" 2>&1 || vrc=$? + echo "verify_exit=$vrc" >> "$metaf" + if [[ $vrc -eq 0 ]]; then + mv "$review_file" "$TESTING/" 2>/dev/null + echo "result=testing" >> "$metaf" + echo "ended=$(date +%s)" >> "$metaf" + echo "VERIFY PASSED — promoted to testing (QA): $(date)" >> "$logf" + else + mv "$review_file" "$FAILED/" 2>/dev/null + echo "result=verify_failed" >> "$metaf" + echo "ended=$(date +%s)" >> "$metaf" + echo "VERIFY FAILED (rc=$vrc): $(date)" >> "$logf" + fi + fi else mv "$doing_file" "$FAILED/" 2>/dev/null echo "result=failed" >> "$metaf" + echo "ended=$(date +%s)" >> "$metaf" echo "FAILED (rc=$rc): $(date)" >> "$logf" fi } @@ -385,7 +424,7 @@ cmd_run() { [[ -z "$next" ]] && break local job; job=$(basename "$next"); job=${job%.md} - local doing_file="$DOING/$(basename "$next")" + local doing_file="$BUILDING/$(basename "$next")" mv "$next" "$doing_file" local w_eng w_cwd w_yolo w_key w_eng=$(fm_get "$doing_file" engine "$DEFAULT_ENGINE") @@ -420,14 +459,16 @@ _count() { ls -1 "$1"/*.md 2>/dev/null | wc -l | tr -d ' '; } cmd_status() { ensure_dirs - local ib dg dn fl - ib=$(_count "$INBOX"); dg=$(_count "$DOING"); dn=$(_count "$DONE"); fl=$(_count "$FAILED") + local ib bd rv ts sh fl + ib=$(_count "$INBOX"); bd=$(_count "$BUILDING"); rv=$(_count "$REVIEW") + ts=$(_count "$TESTING"); sh=$(_count "$SHIPPED"); fl=$(_count "$FAILED") local running; running=$(active_workers) echo printf '%s AGENT QUEUE %s %s\n' "$C_BOLD" "$C_DIM$QUEUE_ROOT$C_RESET" "" - printf ' %sinbox%s %-3s %sdoing%s %-3s %sdone%s %-3s %sfailed%s %-3s %srunning%s %s/%s\n\n' \ - "$C_BLUE" "$C_RESET" "$ib" "$C_YEL" "$C_RESET" "$dg" \ - "$C_GREEN" "$C_RESET" "$dn" "$C_RED" "$C_RESET" "$fl" \ + printf ' %sinbox%s %-3s %sbuilding%s %-3s %sreview%s %-3s %stesting%s %-3s %sshipped%s %-3s %sfailed%s %-3s %srunning%s %s/%s\n\n' \ + "$C_BLUE" "$C_RESET" "$ib" "$C_YEL" "$C_RESET" "$bd" \ + "$C_CYAN" "$C_RESET" "$rv" "$C_CYAN" "$C_RESET" "$ts" \ + "$C_GREEN" "$C_RESET" "$sh" "$C_RED" "$C_RESET" "$fl" \ "$C_BOLD" "$C_RESET" "$running" "$MAX_CONCURRENCY" # running table @@ -489,19 +530,77 @@ cmd_logs() { if [[ -n "$follow" ]]; then tail -f "$lf"; else cat "$lf"; fi } -# requeue — move a failed job back to inbox/ for a fresh run. +# _find_job — echo the first matching .md across the given dirs +# (exact ".md" preferred, else newest fuzzy match). Empty if none found. +_find_job() { + local job=$1; shift + local d f + for d in "$@"; do + [[ -f "$d/$job.md" ]] && { printf '%s' "$d/$job.md"; return; } + done + for d in "$@"; do + f=$(ls -1t "$d"/*"$job"*.md 2>/dev/null | head -1) + [[ -f "$f" ]] && { printf '%s' "$f"; return; } + done +} + +# requeue — move a job back to inbox/ for a fresh run (from failed/review/testing). cmd_requeue() { ensure_dirs local job="${1:-}" [[ -n "$job" ]] || die "usage: requeue " - local f="$FAILED/$job.md" - [[ -f "$f" ]] || f=$(ls -1t "$FAILED"/*"$job"*.md 2>/dev/null | head -1) - [[ -f "$f" ]] || die "no failed job matching '$job'" - local base name; base=$(basename "$f"); name=${base%.md} + local f; f=$(_find_job "$job" "$FAILED" "$REVIEW" "$TESTING") + [[ -n "$f" ]] || die "no failed/review/testing job matching '$job'" + local base name from; base=$(basename "$f"); name=${base%.md}; from=$(basename "$(dirname "$f")") mv "$f" "$INBOX/$base" # drop stale state so it re-runs cleanly rm -f "$STATE/$name.meta" "$STATE/$name.body.md" "$STATE/$name.timedout" - log "requeued $C_BOLD$base$C_RESET (failed → inbox)" + log "requeued $C_BOLD$base$C_RESET ($from → inbox)" +} + +# ship — manual promotion testing/ (QA) → shipped/. The human gate. +cmd_ship() { + ensure_dirs + local job="${1:-}" + [[ -n "$job" ]] || die "usage: ship " + local f; f=$(_find_job "$job" "$TESTING") + [[ -n "$f" ]] || die "no job in testing/ matching '$job' (only QA-passed jobs can ship)" + local base name; base=$(basename "$f"); name=${base%.md} + mv "$f" "$SHIPPED/$base" + [[ -f "$STATE/$name.meta" ]] && echo "result=shipped" >> "$STATE/$name.meta" + log "shipped $C_BOLD$base$C_RESET (testing → shipped)" +} + +# promote — advance one stage forward: review → testing → shipped. +cmd_promote() { + ensure_dirs + local job="${1:-}" + [[ -n "$job" ]] || die "usage: promote " + local f; f=$(_find_job "$job" "$REVIEW" "$TESTING") + [[ -n "$f" ]] || die "no job in review/ or testing/ matching '$job'" + local base name from dest result; base=$(basename "$f"); name=${base%.md} + from=$(basename "$(dirname "$f")") + case "$from" in + review) dest="$TESTING"; result="testing";; + testing) dest="$SHIPPED"; result="shipped";; + *) die "promote: '$base' is in '$from' — nothing to promote";; + esac + mv "$f" "$dest/$base" + [[ -f "$STATE/$name.meta" ]] && echo "result=$result" >> "$STATE/$name.meta" + log "promoted $C_BOLD$base$C_RESET ($from → $result)" +} + +# reject — move a review/testing job to failed/ (manual gate rejection). +cmd_reject() { + ensure_dirs + local job="${1:-}" + [[ -n "$job" ]] || die "usage: reject " + local f; f=$(_find_job "$job" "$REVIEW" "$TESTING") + [[ -n "$f" ]] || die "no job in review/ or testing/ matching '$job'" + local base name from; base=$(basename "$f"); name=${base%.md}; from=$(basename "$(dirname "$f")") + mv "$f" "$FAILED/$base" + [[ -f "$STATE/$name.meta" ]] && echo "result=rejected" >> "$STATE/$name.meta" + log "rejected $C_BOLD$base$C_RESET ($from → failed)" } # clean [--keep N] — archive finished jobs' logs+meta beyond the newest N @@ -556,14 +655,19 @@ ${C_BOLD}COMMANDS${C_RESET} process inbox/ (foreground loop; Ctrl-C to stop) status show kanban counts + running workers watch [interval] live status (default 2s, bash) - dash [--interval N] richer live Node dashboard (recent done/failed too) + dash [--interval N] richer live Node dashboard (recent shipped/failed too) stop kill running workers + the run loop logs [-f] print (or follow) a job's log - requeue move a failed job back to inbox/ + promote advance one stage (review → testing → shipped) + ship manual gate: testing (QA) → shipped + reject send a review/testing job to failed/ + requeue move a failed/review/testing job back to inbox/ clean [--keep N] archive finished logs+meta beyond newest N (default 50) help this message -${C_BOLD}KANBAN${C_RESET} inbox → doing → done / failed (logs/ + .state/ alongside) +${C_BOLD}KANBAN${C_RESET} inbox → building → review → testing → shipped (+ failed; logs/ + .state/ alongside) + auto: agent rc=0 → review; verify pass → testing; verify fail → failed + manual: ship (testing → shipped) ${C_BOLD}TASK FRONTMATTER${C_RESET} (top of each .md) --- @@ -572,11 +676,13 @@ ${C_BOLD}TASK FRONTMATTER${C_RESET} (top of each .md) yolo: true lock: my-repo # optional; defaults to cwd. Jobs sharing a key run serially timeout: 45m # optional; 90s|45m|2h|1d. On expiry -> failed (result=timeout) + verify: pnpm -s test # optional; auto-QA gate. pass -> testing, fail -> failed --- ${C_BOLD}ENV${C_RESET} AGENT_QUEUE_ROOT (=$QUEUE_ROOT) AGENT_QUEUE_MAX (=$MAX_CONCURRENCY) - AGENT_QUEUE_ENGINE (=$DEFAULT_ENGINE) DEVIN_BIN / CLAUDE_BIN / CODEX_BIN + AGENT_QUEUE_ENGINE (=$DEFAULT_ENGINE) AGENT_QUEUE_VERIFY (default verify cmd) + DEVIN_BIN / CLAUDE_BIN / CODEX_BIN EOF } @@ -591,6 +697,9 @@ main() { dash|dashboard) cmd_dash "$@";; stop) cmd_stop "$@";; logs) cmd_logs "$@";; + promote) cmd_promote "$@";; + ship) cmd_ship "$@";; + reject) cmd_reject "$@";; requeue) cmd_requeue "$@";; clean) cmd_clean "$@";; help|-h|--help) usage;; diff --git a/agent-queue/dashboard.mjs b/agent-queue/dashboard.mjs index 148a973..aaf9c31 100644 --- a/agent-queue/dashboard.mjs +++ b/agent-queue/dashboard.mjs @@ -3,7 +3,9 @@ // // Reads the same queue/ state written by agent-queue.sh and re-renders a board // every interval: kanban counts, running workers (engine, elapsed, last log line), -// and the most recent done/failed jobs. +// and the most recent shipped/failed jobs. +// +// Lifecycle: inbox → building → review → testing → shipped (+ failed) // // Usage: node dashboard.mjs [--interval 2] [--root /path/to/queue] // AGENT_QUEUE_ROOT=/path node dashboard.mjs @@ -28,8 +30,10 @@ const STALL_MIN = Math.max(1, parseInt(process.env.AGENT_QUEUE_STALL_MIN || '10' const DIRS = { inbox: path.join(ROOT, 'inbox'), - doing: path.join(ROOT, 'doing'), - done: path.join(ROOT, 'done'), + building: path.join(ROOT, 'building'), + review: path.join(ROOT, 'review'), + testing: path.join(ROOT, 'testing'), + shipped: path.join(ROOT, 'shipped'), failed: path.join(ROOT, 'failed'), logs: path.join(ROOT, 'logs'), state: path.join(ROOT, '.state'), @@ -112,8 +116,9 @@ function render() { .sort((a, b) => Number(b.ended) - Number(a.ended)); const counts = { - inbox: count(DIRS.inbox), doing: count(DIRS.doing), - done: count(DIRS.done), failed: count(DIRS.failed), + inbox: count(DIRS.inbox), building: count(DIRS.building), + review: count(DIRS.review), testing: count(DIRS.testing), + shipped: count(DIRS.shipped), failed: count(DIRS.failed), }; const out = []; @@ -123,8 +128,10 @@ function render() { out.push(''); out.push( ` ${c('blue', '▢ inbox')} ${String(counts.inbox).padEnd(3)}` + - ` ${c('yellow', '◧ doing')} ${String(counts.doing).padEnd(3)}` + - ` ${c('green', '▣ done')} ${String(counts.done).padEnd(3)}` + + ` ${c('yellow', '◧ building')} ${String(counts.building).padEnd(3)}` + + ` ${c('cyan', '◔ review')} ${String(counts.review).padEnd(3)}` + + ` ${c('cyan', '◕ testing')} ${String(counts.testing).padEnd(3)}` + + ` ${c('green', '▣ shipped')} ${String(counts.shipped).padEnd(3)}` + ` ${c('red', '✕ failed')} ${String(counts.failed).padEnd(3)}` + ` ${C.bold}running ${running.length}${C.reset}` ); @@ -161,13 +168,24 @@ function render() { out.push(` ${c('dim', 'nothing finished yet')}`); } else { for (const m of recent) { - const ok = m.result === 'done'; + const res = m.result || ''; + const failedRes = res === 'failed' || res === 'timeout' || res === 'verify_failed' || res === 'rejected'; + const ok = !failedRes; const mark = ok ? c('green', '▣') : c('red', '✕'); const when = m.ended ? new Date(Number(m.ended) * 1000).toLocaleTimeString() : ''; + let label; + if (res === 'shipped') label = c('green', 'shipped'); + else if (res === 'testing') label = c('cyan', 'testing (QA)'); + else if (res === 'review') label = c('cyan', 'review'); + else if (res === 'verify_failed') label = c('red', 'verify failed'); + else if (res === 'timeout') label = c('red', 'timeout'); + else if (res === 'rejected') label = c('red', 'rejected'); + else if (res === 'failed') label = c('red', 'failed rc=' + (m.exit || '?')); + else label = c('gray', res || '?'); out.push( ` ${mark} ${trunc(m.job || '?', 34).padEnd(34)} ` + `${c('gray', (m.engine || '').padEnd(7))} ` + - `${ok ? c('green', 'done') : c('red', 'failed rc=' + (m.exit || '?'))} ${c('gray', when)}` + `${label} ${c('gray', when)}` ); } } diff --git a/agent-queue/queue/doing/.gitkeep b/agent-queue/queue/building/.gitkeep similarity index 100% rename from agent-queue/queue/doing/.gitkeep rename to agent-queue/queue/building/.gitkeep diff --git a/agent-queue/queue/done/.gitkeep b/agent-queue/queue/review/.gitkeep similarity index 100% rename from agent-queue/queue/done/.gitkeep rename to agent-queue/queue/review/.gitkeep diff --git a/agent-queue/queue/shipped/.gitkeep b/agent-queue/queue/shipped/.gitkeep new file mode 100644 index 0000000..e69de29 diff --git a/agent-queue/queue/testing/.gitkeep b/agent-queue/queue/testing/.gitkeep new file mode 100644 index 0000000..e69de29 diff --git a/agent-queue/selftest.sh b/agent-queue/selftest.sh index 6949cb6..fa31217 100755 --- a/agent-queue/selftest.sh +++ b/agent-queue/selftest.sh @@ -59,12 +59,45 @@ printf '%s\n' '---' 'engine: devin' "cwd: $work" 'yolo: true' '---' '' '# self-t DEVIN_BIN="$stub" "$AQ" add "$task" >/dev/null DEVIN_BIN="$stub" "$AQ" run --once >/dev/null 2>&1 -if ls "$AGENT_QUEUE_ROOT"/done/*.md >/dev/null 2>&1; then - pass "init/add/run --once → task landed in done/" +if ls "$AGENT_QUEUE_ROOT"/review/*.md >/dev/null 2>&1; then + pass "no-verify cycle → task parked in review/" else echo "--- queue state ---" >&2 ls -R "$AGENT_QUEUE_ROOT" >&2 || true - fail "no-op cycle did not complete (expected a file in done/)" + fail "no-op cycle did not complete (expected a file in review/)" +fi + +# 5. verify-pass gate: rc=0 + passing verify → testing/, then manual ship → shipped/ +task2="$tmp/task-verify.md" +printf '%s\n' '---' 'engine: devin' "cwd: $work" 'yolo: true' 'verify: true' '---' '' '# self-test verify-pass task' > "$task2" +DEVIN_BIN="$stub" "$AQ" add "$task2" >/dev/null +DEVIN_BIN="$stub" "$AQ" run --once >/dev/null 2>&1 +if ls "$AGENT_QUEUE_ROOT"/testing/*.md >/dev/null 2>&1; then + pass "verify-pass cycle → task promoted to testing/" +else + echo "--- queue state ---" >&2 + ls -R "$AGENT_QUEUE_ROOT" >&2 || true + fail "verify-pass cycle did not reach testing/ (expected a file in testing/)" +fi +shipjob="$(basename "$(ls -1t "$AGENT_QUEUE_ROOT"/testing/*.md | head -1)" .md)" +"$AQ" ship "$shipjob" >/dev/null 2>&1 +if ls "$AGENT_QUEUE_ROOT"/shipped/*.md >/dev/null 2>&1; then + pass "manual ship → task landed in shipped/" +else + fail "ship did not move job to shipped/" +fi + +# 6. verify-fail gate: rc=0 + failing verify → failed/ +task3="$tmp/task-verifyfail.md" +printf '%s\n' '---' 'engine: devin' "cwd: $work" 'yolo: true' 'verify: false' '---' '' '# self-test verify-fail task' > "$task3" +DEVIN_BIN="$stub" "$AQ" add "$task3" >/dev/null +DEVIN_BIN="$stub" "$AQ" run --once >/dev/null 2>&1 +if ls "$AGENT_QUEUE_ROOT"/failed/*verifyfail*.md >/dev/null 2>&1; then + pass "verify-fail cycle → task routed to failed/" +else + echo "--- queue state ---" >&2 + ls -R "$AGENT_QUEUE_ROOT" >&2 || true + fail "verify-fail cycle did not route to failed/" fi # status must not error