bytelyst-devops-tools/agent-queue/README.md
saravanakumardb1 e183919c60 docs(agent-queue): profiles + deps docs; tick §5/§6 + bump Phase 1 to 80% (P1-S2)
README: Profiles & deps section (resolution precedence, persona, allowed-scope warn-only, deps/blocked + cycle detection); manifest table moves profile/deps/deps-mode to active. Roadmap: tick §6 catalog/persona/inheritance/allowed-scope and §5 deps + the §14 profile/deps/scope boxes; add P1-S2 slice note; §0 Phase 1 -> 80%.
2026-05-29 19:26:33 -07:00

362 lines
19 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# agent-queue
A zero-dependency **folder "kanban" runner** for headless coding-agent CLIs —
**Devin**, **Claude Code**, and **OpenAI Codex**. Drop prompt `.md` files into a folder,
and they get executed (in auto-approve mode) one slot at a time, moving through
`inbox → building → review → testing → shipped` (plus `failed`) with live status.
> **Vision & roadmap:** where this is headed — a distributed multi-machine "gigafactory"
> (fleet of factories × tools × profiles, scheduler-routed, built on platform-service +
> tracker-web) — is specified as a checklist-driven implementation roadmap in
> [`docs/GIGAFACTORY_ROADMAP.md`](docs/GIGAFACTORY_ROADMAP.md).
**Build/ship lifecycle — auto-QA, manual ship:**
```
inbox ─▶ building ─▶ review ─▶ testing ─▶ shipped
(queued) (agent (rc=0; (verify (you ran
running) awaiting passed — `ship`)
verify) QA gate)
agent rc≠0 / │ verify fails
timeout ──────────┴──────────────▶ failed
```
- **Auto:** agent exits 0 → `review/`. If a `verify:` command is configured it runs
automatically: **pass → `testing/` (QA)**, **fail → `failed/`**. No `verify:`
the job parks in `review/` for a manual `promote`.
- **Manual:** you `ship` a `testing/` job → `shipped/` (the human gate). Shipping is
never automatic.
> **Why this exists:** the agent CLIs ship a minimal local interface (no built-in
> batch/queue/dashboard — that lives in their *cloud* products). This is the
> zero-dependency bash glue that turns "run one prompt interactively" into
> "queue many and walk away."
---
## Quick start
```bash
cd learning_ai_devops_tools/agent-queue
chmod +x agent-queue.sh
./agent-queue.sh init
# queue a roadmap for Devin, running in the tracker-web repo, auto-approving everything
./agent-queue.sh add ~/roadmaps/UX-2.md \
--engine devin \
--cwd /Users/sd9235/code/mygh/learning_ai_common_plat/dashboards/tracker-web \
--yolo
# start processing (foreground; Ctrl-C to stop). Run up to 3 agents at once (default).
./agent-queue.sh run --max 3
```
In a **second terminal**, watch progress:
```bash
./agent-queue.sh watch
```
```
AGENT QUEUE /…/agent-queue/queue
inbox 3 building 2 review 1 testing 2 shipped 5 failed 0 running 2/2
RUNNING
20260528-2130__UX-2 devin 4m12s pid 51234 ⏺ Edited src/app/dashboard/items/page.tsx
20260528-2131__UX-3 claude 1m02s pid 51290 Running: pnpm typecheck
```
---
## How a task is configured
Each `.md` carries optional **frontmatter** telling the runner which engine to use,
which directory to run in, and whether to auto-approve:
```md
---
engine: devin # devin | claude | codex | copilot (default: $AGENT_QUEUE_ENGINE)
cwd: /abs/path/to/repo # where the agent executes (default: cwd when added)
yolo: true # auto-approve ALL tools (default: true)
lock: my-repo # optional mutex key (default: cwd). Jobs sharing a key run serially
timeout: 45m # optional. 90s|45m|2h|1d. On expiry → failed (result=timeout)
verify: pnpm -s test # optional auto-QA gate. Runs in cwd after rc=0:
# pass → testing/ (QA), fail → failed/
# (omit to park in review/ for manual promote)
---
# Your task / roadmap goes here
...
```
`add --engine/--cwd/--yolo` will inject this frontmatter for you if the file doesn't
already have a `---` block.
### Manifest fields (Gigafactory Phase 1)
The runner parses the richer [gigafactory manifest](docs/GIGAFACTORY_ROADMAP.md#5-the-evolved-job-manifest-feature)
**backward-compatibly** — a legacy `engine`/`cwd`/`yolo`-only `.md` behaves exactly as before.
Fields marked **RESERVED** are parsed, stored in `.state/<job>.meta`, and shown in `status`, but
are otherwise **no-ops until a later phase** (they do not yet affect execution).
| Field | Status | Default | Meaning |
| ----- | ------ | ------- | ------- |
| `engine` | active | `$AGENT_QUEUE_ENGINE` | explicit engine (`devin\|claude\|codex\|copilot`) — always wins over `engine-class` |
| `cwd` / `yolo` / `lock` / `timeout` / `verify` | active | see above | Phase-0 behavior, unchanged |
| `priority` | **active** | `medium` | `critical\|high\|medium\|low`. Inbox is picked **highest-priority first, then oldest** (was pure FIFO) |
| `engine-class` | **active** | _(none)_ | used only when `engine` is unset: `agentic-coder`→`devin,claude,codex`; `chat-coder`→`copilot`. Picks the first **available** engine. No engine available → job fails `result=no_engine` |
| `prefers-engine` | **active** | _(none)_ | optional order hint for `engine-class` resolution, e.g. `[claude, devin]` |
| `capabilities` | **active** | _(none)_ | hard host requirements, e.g. `[os:any, node>=20, has:git]`. If the host can't satisfy them the job is sent to `failed/` with `result=capability_mismatch` **and the agent is never launched** (grammar below) |
| `idempotency-key` | **active** | _(none)_ | dedupe on `add` (semantics below) |
| `profile` | **active** | _(none)_ | inherit persona + verify/caps/engine-class/prefers-engine/allowed-scope/review-policy from `profiles/<name>.md` (job fields override — see **Profiles**) |
| `prefers` | RESERVED | _(none)_ | soft routing/affinity hints (e.g. `[factory:mac-2]`) |
| `budget` | RESERVED | _(none)_ | `{ usd, tokens, wall }` ceilings (`wall` enforcement is a later slice) |
| `deps` / `deps-mode` | **active** | _(none)_ | block until each referenced `idempotency-key` is in `shipped/` (or `testing/` when `deps-mode: soft`). Submit-time cycle detection (see **Profiles & deps**) |
| `retry` | **active** | _(none)_ | `{ max: N, backoff: 5m, on: [timeout, verify_failed, crash] }` — requeue failures with backoff up to `max`, then `retries_exhausted` (see **Resilience**) |
| `review-policy` | RESERVED | _(none)_ | `auto\|manual\|reviewers:[…]` |
| `artifacts` | RESERVED | _(none)_ | extra outputs to capture (coverage, screenshots) |
| `tracker-item` | RESERVED | _(none)_ | link back to the originating tracker task |
**Capability grammar** (a job matches a host iff **every** required token is satisfied):
| Token form | Example | Satisfied when |
| ---------- | ------- | -------------- |
| `key` (bare presence) | `gpu` | the host advertises `key` in any form |
| `key:value` (exact) | `os:mac`, `engine:devin`, `has:git` | the host advertises that exact token |
| `key:any` (wildcard) | `os:any` | the host advertises any `key:*` (so `os:any` matches every host) |
| `key<op>version` (`>=` `>` `=` `<=` `<`) | `node>=20` | numeric/semver-major compare vs the host's `key:<n>` |
The host advertises (via `detect_capabilities`): `os:<mac\|linux>`, `engine:<each available engine>`,
`node:<major>`, and `has:<git\|pnpm\|docker>` when present.
**`idempotency-key` semantics** (on `add`, hashing the frontmatter-stripped body):
- same key **+ same body** → **no-op** (logged `duplicate, skipped`).
- same key **+ different body**, prior job still in `inbox/`**supersedes** it (replaces the queued file).
- same key **+ different body**, prior job already past `inbox/` (building/review/testing/shipped) →
**rejected** with a clear error (use a new key, or requeue the existing job).
## Engine mapping
| `engine:` | Command run | Auto-approve flag (`yolo: true`) |
| --------- | ----------- | -------------------------------- |
| `devin` | `devin -p --prompt-file <body>` | `--permission-mode dangerous` |
| `claude` | `claude -p` (body on **stdin**) | `--dangerously-skip-permissions` |
| `codex` | `codex exec` (body on **stdin**) | `--dangerously-bypass-approvals-and-sandbox` |
| `copilot` | `copilot -p` (body on **stdin**) | `--allow-all-tools` _(best-effort; chat-coder class target)_ |
The frontmatter is **stripped** before the body reaches the agent, and
claude/codex receive it on **stdin** so a body starting with `--` is never
misparsed as a flag.
> Flags drift between CLI versions — if one changes, edit `build_agent_cmd()` in
> `agent-queue.sh` (it's the single place each engine is mapped).
## Commands
| Command | What it does |
| ------- | ------------ |
| `init` | create the `queue/` folders |
| `add <file> [--engine E] [--cwd P] [--yolo\|--no-yolo]` | queue a prompt into `inbox/` |
| `run [--max N] [--engine E] [--once]` | process the inbox (foreground loop) |
| `status` | kanban counts + running-worker table (marks `⚠ stalled`; per-job insights sub-line) |
| `watch [interval]` | live `status` (bash), redrawn every N seconds (default 2) |
| `insights [job]` | per-job metrics, or a recent-jobs table + per-engine token/cost/success rollup (see **Insights**) |
| `recover` | reclaim orphaned `building/` jobs (dead worker) back to `inbox/` (see **Resilience**) |
| `dash [--interval N]` | **interactive Node dashboard** — navigable numbered job list with single-key actions (see below) |
| `stop` | kill running workers + the run loop |
| `logs <job> [-f]` | print / follow a job's log |
| `promote <job>` | advance one stage forward: `review → testing → shipped` |
| `ship <job>` | **manual gate:** move a `testing/` (QA) job → `shipped/` |
| `reject <job>` | send a `review/` or `testing/` job → `failed/` |
| `requeue <job>` | move a `failed`/`review`/`testing` job back to `inbox/` for a fresh run |
| `clean [--keep N]` | archive finished logs+meta beyond the newest N (default 50) into `queue/.archive/` |
Only one `run` loop may be active per queue — a second `run` against the same
queue is refused while the first is alive (a stale `daemon.pid` is cleared).
### Interactive dashboard (`dash`)
`dash` is a single-script, menu-driven control panel (think a tiny "glassbox").
It shows the kanban counts, live RUNNING workers (engine, elapsed, last log
line, stall), a **navigable numbered JOBS list**, and RECENT finished jobs — and
lets you act on jobs without leaving the screen. Every action shells out to
`agent-queue.sh`, so the script stays the single source of truth.
| Key | Action |
| --- | ------ |
| `↑`/`↓`, `j`/`k`, `1``9` | select a job in the JOBS list |
| `enter` / `l` | view the selected job's log (live, auto-refreshing) |
| `p` | promote (`review → testing → shipped`) |
| `s` | ship (`testing`/QA → `shipped`, the manual gate) |
| `x` | reject (`review`/`testing` → `failed`) — asks `y/n` |
| `u` | requeue (`failed`/`review`/`testing` → `inbox`) — asks `y/n` |
| `r` | start the `run` loop (detached → `logs/run-loop.log`) |
| `S` | stop the run loop + running workers |
| `g` | refresh now · `?`/`h` help · `q`/`Ctrl-C` quit |
The header shows a `● run loop pid N` / `○ run loop stopped` indicator. Run it in
a TTY for the interactive mode; piped/non-TTY it falls back to a read-only live view.
## Via `bytelyst-cli.sh`
Wired into the repo's unified CLI (no GitHub token required for this subcommand):
```bash
./bytelyst-cli.sh agent-queue run --max 3 # full passthrough
./bytelyst-cli.sh aq status # short alias
```
## Folder layout
```
queue/
inbox/ # drop / queued .md files (oldest eligible picked first)
building/ # currently executing (agent running)
review/ # agent exited 0 — awaiting the auto-QA verify gate (or manual promote)
testing/ # verify passed (QA) — awaiting manual `ship`
shipped/ # manually shipped — the terminal success stage
failed/ # non-zero exit, bad cwd, timeout, verify failure, or manual reject
logs/ # <job>.log — full agent + verify output
locks/ # per-key flock files (Linux hardening; unused on macOS)
.state/ # <job>.meta heartbeats + daemon.pid (runtime only)
.archive/ # <ts>/ — logs+meta moved here by `clean`
```
**`result=` values** written to `<job>.meta`: `review`, `testing`, `shipped`,
`failed`, `timeout`, `verify_failed`, `rejected`, `requeued`, `capability_mismatch`
(host missing a required capability — agent never launched), `no_engine`
(an `engine-class` had no available engine), `retries_exhausted` (failed after
`retry.max` attempts — single-host dead-letter stand-in), `retry_scheduled`
(transient: requeued for another attempt), `recovered` (transient: an orphan was
reclaimed to `inbox/`).
## Profiles & deps
### Profiles (roadmap §6)
A **profile** is a reusable role preset in `profiles/<name>.md`. A job opts in with
`profile: <name>` and inherits any of these fields it does **not** set itself:
`verify` (from the profile's `default-verify`), `capabilities`, `engine-class`,
`prefers-engine`, `allowed-scope`, `review-policy`. The profile's `persona` block is
**prepended** to the body sent to the engine (the job `.md` on disk is unchanged;
secrets are never logged). Resolution runs **before** the capability gate and engine
resolution, so inherited caps / engine-class take effect.
**Precedence:** `job field > profile field > built-in default`. Set `AGENT_QUEUE_PROFILES`
to point at a different catalog directory (defaults to `./profiles`).
Starter catalog: `developer`, `backend-engineer`, `frontend-engineer`, `ux-designer`,
`ui-designer`, `qa`, `reviewer`, `docs-writer`, and a reserved `planner`. Each presets
`name`, `persona`, `capabilities`, `default-verify`, `engine-class`, `prefers-engine`,
`allowed-scope`, and `review-policy`.
**allowed-scope (warn-only this phase).** After a run on a git `cwd`, changed paths
outside the profile/job `allowed-scope` globs (`dir/**` matches the whole subtree) are
logged as a `WARNING` and recorded as `scope_warning=` in the meta — **non-blocking**
(the job is not failed). `path_in_scope` is exposed as a unit-testable function.
### deps / DAG, single host (roadmap §5)
`deps: [keyA, keyB]` references other jobs by their author-controlled
`idempotency-key`. A dep is **satisfied** when a job with that key is in `shipped/`
(default), or in `shipped/` **or** `testing/` when the dependent job sets
`deps-mode: soft`. A job with unmet deps is **blocked**: it is skipped in inbox
selection (never launched, never failed) and surfaced in `status` as
`blocked (waiting on: <keys>)`, then re-evaluated every loop until its deps are met.
`add` performs **submit-time cycle detection** over the inbox + active-stage dep graph
and rejects (nonzero exit) a job that would create a cycle. Cross-machine deps are P2.
## Resilience (crash recovery & work preservation)
Single-host implementations of the durability model (roadmap §25):
- **Orphan recovery.** A job left in `building/` whose worker process is dead (no
live `pid`, PID-reuse-guarded by `pidstart`) is an orphan from a previous
crash/power-off. On `run` startup and on every loop iteration (or on demand via
`agent-queue.sh recover`) it is moved back to `inbox/` with `attempts`
incremented. Recovery is **idempotent** — once moved out of `building/` it is
never recovered twice.
- **WIP checkpointing.** When a job's `cwd` is a git repo, the worker creates/checks
out a dedicated branch **`aq/wip/<job>`** at start and commits any changes to it
on **every** exit path — success, failure, timeout, and SIGTERM/SIGINT (via a
trap). It **never** commits to `main`/your current branch. Non-git `cwd` is
skipped cleanly. `wip_branch` / `wip_base` / `wip_commit` are recorded in the meta.
- **Resume.** When an orphan/retry of a job whose `aq/wip/<job>` branch already
exists is relaunched, that branch is checked out first so the agent **continues
from the checkpoint** instead of from zero.
- **Retry policy** (`retry` frontmatter, now active). On a failure whose class is in
`on` (`crash`/`agent_error` for a non-zero agent exit, `timeout`, `verify_failed`)
the job is requeued to `inbox/` honoring `backoff` (selection skips it until
`next_eligible`) up to `max` attempts; on exhaustion it lands in `failed/` with
`result=retries_exhausted`, preserving the WIP branch + full log. No `retry` =
no retry (Phase-0 behavior).
All bookkeeping (`attempts`, `next_eligible`, `wip_*`) is append-only in the meta
and re-derivable from the meta + folder location, so recovery is crash-safe.
## Insights (metrics & token accounting)
Each finished run records into `<job>.meta`: `duration_s`, `exit`, `result`,
`attempts`, and — for a git `cwd``files_changed` / `lines_added` /
`lines_deleted` (diffed `wip_base..HEAD`). A single `parse_usage <engine> <log>`
adapter extracts `model` / `tokens_in` / `tokens_out` / `tokens_cached` /
`cost_usd` / `turns` / `tool_calls` when the engine exposes them.
```bash
agent-queue.sh insights <job> # full metrics for one job
agent-queue.sh insights # recent-jobs table + per-engine rollup
```
> **Token caveat (honest):** real usage is captured only where the engine surfaces
> it. A cooperating wrapper may emit a machine-readable `AQ_USAGE key=value …` line;
> otherwise per-engine heuristics apply (Claude/Codex token fields parsed; Devin
> session metrics + Copilot are API-only and currently TODO in `parse_usage`). When
> a value is not provider-reported it is **omitted or flagged `usage_estimated`** —
> numbers are never fabricated. The per-engine rollup marks totals that include any
> estimated value with `*`.
## Config (env overrides)
| Var | Default | Meaning |
| --- | ------- | ------- |
| `AGENT_QUEUE_ROOT` | `./queue` | where the kanban folders live |
| `AGENT_QUEUE_MAX` | `3` | max concurrent agents (override per-run with `run --max N`) |
| `AGENT_QUEUE_ENGINE` | `devin` | default engine when none in frontmatter |
| `AGENT_QUEUE_POLL` | `3` | inbox poll interval (seconds) |
| `AGENT_QUEUE_VERIFY` | _(empty)_ | default auto-QA verify command; per-job `verify:` overrides it |
| `AGENT_QUEUE_STALL_MIN` | `10` | minutes of unchanged log before a worker is `⚠ stalled` |
| `DEVIN_BIN` / `CLAUDE_BIN` / `CODEX_BIN` / `COPILOT_BIN` | autodetected | override CLI binary paths |
| `FLOCK_BIN` / `TIMEOUT_BIN` | autodetected | `flock` (lock hardening) and `timeout`/`gtimeout` (hard timeouts); absent on stock macOS — see notes |
## ⚠️ Safety
Running agents with `yolo: true` means **no approval prompts** — they will edit files,
run shell commands, and commit unattended. Mitigate:
- Prefer **scope-locked** prompt files (e.g. "edit only under `dashboards/tracker-web/`").
- Tell prompts **not to `git push`** — review commits before they leave your machine.
- **Same-repo safety is automatic:** jobs sharing a `cwd` (or `lock:` key) are
serialized, so two agents never run in one repo at once — even at `--max 2+`.
- Set a `timeout:` on long jobs so a wedged agent can't run forever.
- Watch cost: each job is a full agent session.
### Portability notes
- **macOS** has no `flock`/`timeout`; locking relies on the single run-loop
(enforced by the second-run refusal) and timeouts use a pure-bash watchdog.
Install coreutils (`gtimeout`) for hard process-tree kills.
- **Linux** (incl. Gitea CI) uses `flock` + `timeout` for cross-process hardening.
## Roadmap / nice-to-haves
- [x] Per-repo lock to serialize same-repo jobs automatically (`lock:` / cwd).
- [x] Per-job `timeout:` with hard kill (or bash watchdog fallback).
- [x] Stall detection in `status`/`dash`.
- [x] `requeue` failed jobs + `clean`/archive old runs.
- [x] Build/ship lifecycle: `review → testing → shipped` with auto-QA `verify:` gate + manual `ship`.
- [ ] `--push` opt-in policy + commit review gate.
- [ ] Optional notifications (Slack/desktop) on done/failed/stall.
- [ ] Persisted run-loop as a daemon/service with auto-restart.