bytelyst-devops-tools/agent-queue/README.md
saravanakumardb1 d0e800247c feat(agent-queue): PR mode clones from local repo base (AQ_FLEET_REPO_BASE)
MVP: when AQ_FLEET_REPO_BASE/<repo> is an existing local checkout, use it as the
clone source (fast, no network) and push/PR to its GitHub origin — embedded creds
in the local origin URL are stripped (gh credential helper handles auth). Selftest
PASS (full-path bare-repo fallback unchanged).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-31 05:36:46 -07:00


agent-queue

A zero-dependency folder "kanban" runner for headless coding-agent CLIs — Devin, Claude Code, and OpenAI Codex. Drop prompt .md files into a folder, and they get executed (in auto-approve mode) one slot at a time, moving through inbox → building → review → testing → shipped (plus failed) with live status.

Vision & roadmap: the long-term direction is a distributed multi-machine "gigafactory" (fleet of factories × tools × profiles, scheduler-routed, built on platform-service + tracker-web). It is specified as a checklist-driven implementation roadmap in docs/GIGAFACTORY/GIGAFACTORY_ROADMAP.md. A full architecture overview, diagrams, a code map, and onboarding notes live alongside it in docs/GIGAFACTORY/.

Run safety: how the daemon and its agents must operate (isolated worktrees, branch-per-task, least-privilege instead of blanket --yolo/dangerous on live checkouts) is defined in docs/RUN_POLICY.md. Read it before enabling yolo: true.

Build/ship lifecycle — auto-QA, manual ship:

inbox ─▶ building ─▶ review ─▶ testing ─▶ shipped
  (queued)  (agent     (rc=0;    (verify    (you ran
            running)   awaiting  passed —    `ship`)
                       verify)   QA gate)
                          │
        agent rc≠0 /      │ verify fails
        timeout ──────────┴──────────────▶ failed
  • Auto: agent exits 0 → review/. If a verify: command is configured it runs automatically: pass → testing/ (QA), fail → failed/. No verify: → the job parks in review/ for a manual promote.
  • Manual: you ship a testing/ job → shipped/ (the human gate). Shipping is never automatic.

Why this exists: the agent CLIs ship a minimal local interface (no built-in batch/queue/dashboard — that lives in their cloud products). This is the zero-dependency bash glue that turns "run one prompt interactively" into "queue many and walk away."


Quick start

cd learning_ai_devops_tools/agent-queue
chmod +x agent-queue.sh
./agent-queue.sh init

# queue a roadmap for Devin, running in the tracker-web repo, auto-approving everything
./agent-queue.sh add ~/roadmaps/UX-2.md \
  --engine devin \
  --cwd /Users/sd9235/code/mygh/learning_ai_common_plat/dashboards/tracker-web \
  --yolo

# start processing (foreground; Ctrl-C to stop). Run up to 3 agents at once (default).
./agent-queue.sh run --max 3

In a second terminal, watch progress:

./agent-queue.sh watch
  AGENT QUEUE  /…/agent-queue/queue
  inbox 3   building 2   review 1   testing 2   shipped 5   failed 0   running 2/2

  RUNNING
    20260528-2130__UX-2        devin     4m12s  pid 51234  ⏺ Edited src/app/dashboard/items/page.tsx
    20260528-2131__UX-3        claude    1m02s  pid 51290  Running: pnpm typecheck

How a task is configured

Each .md carries optional frontmatter telling the runner which engine to use, which directory to run in, and whether to auto-approve:

---
engine: devin          # devin | claude | codex | copilot  (default: $AGENT_QUEUE_ENGINE)
cwd: /abs/path/to/repo # where the agent executes   (default: cwd when added)
yolo: true             # auto-approve ALL tools      (default: true)
lock: my-repo          # optional mutex key (default: cwd). Jobs sharing a key run serially
timeout: 45m           # optional. 90s|45m|2h|1d. On expiry → failed (result=timeout)
verify: pnpm -s test   # optional auto-QA gate. Runs in cwd after rc=0:
                       #   pass → testing/ (QA),  fail → failed/
                       #   (omit to park in review/ for manual promote)
---

# Your task / roadmap goes here
...

add --engine/--cwd/--yolo will inject this frontmatter for you if the file doesn't already have a --- block.
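As a rough illustration of how such a frontmatter block can be read and then stripped before the body reaches an agent, here is a minimal sketch. The helper names fm_get and fm_strip are hypothetical and are not taken from agent-queue.sh; the sketch assumes flat `key: value` lines with optional trailing `# comments`, as in the example above.

```shell
# Hypothetical helpers (not from agent-queue.sh); assumes flat "key: value" lines.

# fm_get KEY FILE: print the value of one frontmatter key, minus any "# comment".
fm_get() {
  awk -v k="$1" '
    NR == 1 && $0 != "---" { exit }        # file has no frontmatter block
    NR > 1 && $0 == "---"  { exit }        # reached the end of the block
    NR > 1 && index($0, k ":") == 1 {
      v = substr($0, length(k) + 2)
      sub(/[ \t]+#.*$/, "", v)             # drop an inline comment
      gsub(/^[ \t]+|[ \t]+$/, "", v)       # trim surrounding whitespace
      print v; exit
    }' "$2"
}

# fm_strip FILE: print the body with a leading frontmatter block removed.
fm_strip() {
  awk '
    NR == 1 && $0 == "---" { infm = 1; next }
    infm && $0 == "---"    { infm = 0; next }
    !infm                  { print }' "$1"
}
```

For example, `fm_get engine task.md` would print the engine name, and `fm_strip task.md | claude -p` would pass only the body on stdin.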

Manifest fields (Gigafactory Phase 1)

The runner parses the richer gigafactory manifest backward-compatibly — a legacy engine/cwd/yolo-only .md behaves exactly as before. Fields marked RESERVED are parsed, stored in .state/<job>.meta, and shown in status, but are otherwise no-ops until a later phase (they do not yet affect execution).

| Field | Status | Default | Meaning |
| --- | --- | --- | --- |
| engine | active | $AGENT_QUEUE_ENGINE | explicit engine (devin\|claude\|codex\|copilot); always wins over engine-class |
| cwd / yolo / lock / timeout / verify | active | see above | Phase-0 behavior, unchanged |
| priority | active | medium | critical\|high\|medium\|low. Inbox is picked highest-priority first, then oldest (was pure FIFO) |
| engine-class | active | (none) | used only when engine is unset: agentic-coder → devin,claude,codex; chat-coder → copilot. Picks the first available engine. No engine available → job fails result=no_engine |
| prefers-engine | active | (none) | optional order hint for engine-class resolution, e.g. [claude, devin] |
| capabilities | active | (none) | hard host requirements, e.g. [os:any, node>=20, has:git]. If the host can't satisfy them the job is sent to failed/ with result=capability_mismatch and the agent is never launched (grammar below) |
| idempotency-key | active | (none) | dedupe on add (semantics below) |
| profile | active | (none) | inherit persona + verify/caps/engine-class/prefers-engine/allowed-scope/review-policy from profiles/<name>.md (job fields override; see Profiles) |
| prefers | RESERVED | (none) | soft routing/affinity hints (e.g. [factory:mac-2]) |
| budget | RESERVED | (none) | { usd, tokens, wall } ceilings (wall enforcement is a later slice) |
| deps / deps-mode | active | (none) | block until each referenced idempotency-key is in shipped/ (or testing/ when deps-mode: soft). Submit-time cycle detection (see Profiles & deps) |
| retry | active | (none) | { max: N, backoff: 5m, on: [timeout, verify_failed, crash] }: requeue failures with backoff up to max, then retries_exhausted (see Resilience) |
| review-policy | RESERVED | (none) | auto\|manual\|reviewers:[…] |
| artifacts | RESERVED | (none) | extra outputs to capture (coverage, screenshots) |
| tracker-item | RESERVED | (none) | link back to the originating tracker task |

Capability grammar (a job matches a host iff every required token is satisfied):

| Token form | Example | Satisfied when |
| --- | --- | --- |
| key (bare presence) | gpu | the host advertises key in any form |
| key:value (exact) | os:mac, engine:devin, has:git | the host advertises that exact token |
| key:any (wildcard) | os:any | the host advertises any key:* (so os:any matches every host) |
| key<op>version (>= > = <= <) | node>=20 | numeric/semver-major compare vs the host's key:<n> |

The host advertises (via detect_capabilities): os:<mac|linux>, engine:<each available engine>, node:<major>, and has:<git|pnpm|docker> when present.
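The grammar above can be sketched as a small matcher. This is an illustrative reading of the four token forms, not the runner's actual code: cap_ok and caps_match are hypothetical names, and comparing only the major version is an assumption based on the "semver-major" wording.

```shell
# Hypothetical sketch of the capability grammar (names are not from agent-queue.sh).

# cap_ok REQUIRED HOST_TOKEN...: succeed when the host tokens satisfy one required token.
cap_ok() {
  local req="$1" tok key rest want op have; shift
  case "$req" in
    *[\<\>=]*)                                   # key<op>version
      key="${req%%[<>=]*}"; rest="${req#"$key"}"
      want="${rest##*[<>=]}"; op="${rest%"$want"}"
      for tok in "$@"; do
        case "$tok" in "$key":*) ;; *) continue ;; esac
        have="${tok#*:}"; have="${have%%.*}"     # assumption: compare the major version only
        case "$have" in ''|*[!0-9]*) continue ;; esac
        case "$op" in
          '>=') [ "$have" -ge "$want" ] && return 0 ;;
          '<=') [ "$have" -le "$want" ] && return 0 ;;
          '>')  [ "$have" -gt "$want" ] && return 0 ;;
          '<')  [ "$have" -lt "$want" ] && return 0 ;;
          '=')  [ "$have" -eq "$want" ] && return 0 ;;
        esac
      done
      return 1 ;;
  esac
  for tok in "$@"; do
    case "$req" in
      *:any) case "$tok" in "${req%:any}":*) return 0 ;; esac ;;   # wildcard
      *:*)   [ "$tok" = "$req" ] && return 0 ;;                    # exact
      *)     case "$tok" in "$req"|"$req":*) return 0 ;; esac ;;   # bare presence
    esac
  done
  return 1
}

# caps_match "req1 req2 ..." "host1 host2 ...": every required token must be satisfied.
caps_match() {
  local req
  for req in $1; do
    cap_ok "$req" $2 || return 1
  done
  return 0
}
```

With a host advertising `os:linux engine:devin node:22 has:git`, `caps_match "os:any node>=20 has:git" "$host"` succeeds while `caps_match "os:mac" "$host"` fails.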

idempotency-key semantics (on add, hashing the frontmatter-stripped body):

  • same key + same body → no-op (logged duplicate, skipped).
  • same key + different body, prior job still in inbox/ → supersedes it (replaces the queued file).
  • same key + different body, prior job already past inbox/ (building/review/testing/shipped) → rejected with a clear error (use a new key, or requeue the existing job).
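The three outcomes above can be sketched as a single decision function. The names body_hash and dedupe_decision are hypothetical, and cksum merely stands in for whatever hash the runner actually uses. Treating every non-inbox stage (including failed/) as "reject" is also an assumption, since the list above only names building/review/testing/shipped.

```shell
# Hypothetical sketch; cksum is a stand-in for the runner's real body hash.

# body_hash FILE: checksum of the body with any leading frontmatter block removed.
body_hash() {
  awk 'NR == 1 && $0 == "---" { fm = 1; next }
       fm && $0 == "---"      { fm = 0; next }
       !fm' "$1" | cksum | cut -d' ' -f1
}

# dedupe_decision NEW_FILE PRIOR_FILE PRIOR_STAGE -> noop | supersede | reject
dedupe_decision() {
  if [ "$(body_hash "$1")" = "$(body_hash "$2")" ]; then
    echo noop                 # same key + same body
  elif [ "$3" = inbox ]; then
    echo supersede            # prior job has not started yet
  else
    echo reject               # prior job is already past inbox/
  fi
}
```

Because only the frontmatter-stripped body is hashed, changing a field such as priority without touching the body still counts as the same job.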

Engine mapping

| engine: | Command run | Auto-approve flag (yolo: true) |
| --- | --- | --- |
| devin | devin -p --prompt-file <body> | --permission-mode dangerous |
| claude | claude -p (body on stdin) | --dangerously-skip-permissions |
| codex | codex exec (body on stdin) | --dangerously-bypass-approvals-and-sandbox |
| copilot | copilot -p (body on stdin) | --allow-all-tools (best-effort; chat-coder class target) |

The frontmatter is stripped before the body reaches the agent, and claude/codex receive it on stdin so a body starting with -- is never misparsed as a flag.

Flags drift between CLI versions — if one changes, edit build_agent_cmd() in agent-queue.sh (it's the single place each engine is mapped).
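To make the mapping concrete, here is a minimal sketch of a single-place engine switch built from the table above. It is not the real build_agent_cmd(): the function name sketch_agent_cmd, its argument order, and the placement of the auto-approve flag at the end of the command are all assumptions.

```shell
# Hypothetical sketch; the real mapping lives in build_agent_cmd() in agent-queue.sh.
# sketch_agent_cmd ENGINE YOLO BODY_FILE: print the command line for display.
# For claude/codex/copilot the caller feeds the body on stdin, so it is not an argument.
sketch_agent_cmd() {
  local engine="$1" yolo="$2" body="$3" cmd flag
  case "$engine" in
    devin)   cmd="devin -p --prompt-file $body"; flag="--permission-mode dangerous" ;;
    claude)  cmd="claude -p";  flag="--dangerously-skip-permissions" ;;
    codex)   cmd="codex exec"; flag="--dangerously-bypass-approvals-and-sandbox" ;;
    copilot) cmd="copilot -p"; flag="--allow-all-tools" ;;
    *) echo "unknown engine: $engine" >&2; return 2 ;;
  esac
  [ "$yolo" = true ] && cmd="$cmd $flag"
  printf '%s\n' "$cmd"
}
```

The sketch returns a printable string only; code that actually executes the command would need to keep arguments separate so that paths containing spaces survive.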

Commands

| Command | What it does |
| --- | --- |
| init | create the queue/ folders |
| add <file> [--engine E] [--cwd P] [--yolo\|--no-yolo] | queue a prompt into inbox/ |
| run [--max N] [--engine E] [--once] | process the inbox (foreground loop) |
| status | kanban counts + running-worker table (marks ⚠ stalled; per-job insights sub-line) |
| watch [interval] | live status (bash), redrawn every N seconds (default 2) |
| insights [job] | per-job metrics, or a recent-jobs table + per-engine token/cost/success rollup (see Insights) |
| recover | reclaim orphaned building/ jobs (dead worker) back to inbox/ (see Resilience) |
| dash [--interval N] | interactive Node dashboard: navigable numbered job list with single-key actions (see below) |
| stop | kill running workers + the run loop |
| logs <job> [-f] | print / follow a job's log |
| promote <job> | advance one stage forward: review → testing → shipped |
| ship <job> | manual gate: move a testing/ (QA) job → shipped/ |
| reject <job> | send a review/ or testing/ job → failed/ |
| requeue <job> | move a failed/review/testing job back to inbox/ for a fresh run |
| clean [--keep N] | archive finished logs+meta beyond the newest N (default 50) into queue/.archive/ |

Only one run loop may be active per queue — a second run against the same queue is refused while the first is alive (a stale daemon.pid is cleared).
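One common way to get this "refuse a second loop, clear a stale pid" behavior is a pid-file guard like the sketch below. The function name acquire_run_lock is hypothetical and the runner's actual check may differ (for instance, it may also guard against PID reuse).

```shell
# Hypothetical sketch of a single-run-loop guard; not the runner's actual code.
# acquire_run_lock PIDFILE: refuse when a live process owns the pid file; clear a stale one.
acquire_run_lock() {
  local pidfile="$1" old
  if [ -f "$pidfile" ]; then
    old="$(cat "$pidfile")"
    if [ -n "$old" ] && kill -0 "$old" 2>/dev/null; then
      echo "run loop already active (pid $old)" >&2
      return 1
    fi
    rm -f "$pidfile"            # stale: the recorded process is gone
  fi
  echo "$$" > "$pidfile"
}
```

Note that the check-then-write sequence is not atomic, which is one reason a flock-based lock is the stronger option where flock is available.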

Interactive dashboard (dash)

dash is a single-script, menu-driven control panel (think a tiny "glassbox"). It shows the kanban counts, live RUNNING workers (engine, elapsed, last log line, stall), a navigable numbered JOBS list, and RECENT finished jobs — and lets you act on jobs without leaving the screen. Every action shells out to agent-queue.sh, so the script stays the single source of truth.

| Key | Action |
| --- | --- |
| ↑/↓, j/k, 1–9 | select a job in the JOBS list |
| enter / l | view the selected job's log (live, auto-refreshing) |
| p | promote (review → testing → shipped) |
| s | ship (testing/QA → shipped, the manual gate) |
| x | reject (review/testing → failed); asks y/n |
| u | requeue (failed/review/testing → inbox); asks y/n |
| r | start the run loop (detached → logs/run-loop.log) |
| S | stop the run loop + running workers |
| g | refresh now |
| ?/h | help |
| q/Ctrl-C | quit |

The header shows a ● run loop pid N / ○ run loop stopped indicator. Run it in a TTY for the interactive mode; piped/non-TTY it falls back to a read-only live view.

Via bytelyst-cli.sh

Wired into the repo's unified CLI (no GitHub token required for this subcommand):

./bytelyst-cli.sh agent-queue run --max 3     # full passthrough
./bytelyst-cli.sh aq status                   # short alias

Boot-persistence (auto-start on login)

To run the worker non-stop and survive reboot / crash / logout (not just a closed terminal), install the macOS LaunchAgent — it auto-starts agent-queue run on login under caffeinate and restarts it via KeepAlive:

bash launchd/install.sh             # install + start now
bash launchd/install.sh --uninstall # stop + remove

Override engine/concurrency/secrets in ~/.agent-queue.env (e.g. AGENT_QUEUE_ENGINE=codex, AGENT_QUEUE_MAX=1). See launchd/README.md for the full layer comparison (tmux/caffeinate vs LaunchAgent) and gotchas.

Folder layout

queue/
  inbox/    # drop / queued .md files (oldest eligible picked first)
  building/ # currently executing (agent running)
  review/   # agent exited 0 — awaiting the auto-QA verify gate (or manual promote)
  testing/  # verify passed (QA) — awaiting manual `ship`
  shipped/  # manually shipped — the terminal success stage
  failed/   # non-zero exit, bad cwd, timeout, verify failure, or manual reject
  logs/     # <job>.log — full agent + verify output
  locks/    # per-key flock files (Linux hardening; unused on macOS)
  .state/   # <job>.meta heartbeats + daemon.pid (runtime only)
  .archive/ # <ts>/ — logs+meta moved here by `clean`

result= values written to <job>.meta: review, testing, shipped, failed, timeout, verify_failed, rejected, requeued, capability_mismatch (host missing a required capability — agent never launched), no_engine (an engine-class had no available engine), retries_exhausted (failed after retry.max attempts — single-host dead-letter stand-in), retry_scheduled (transient: requeued for another attempt), recovered (transient: an orphan was reclaimed to inbox/).

Profiles & deps

Profiles (roadmap §6)

A profile is a reusable role preset in profiles/<name>.md. A job opts in with profile: <name> and inherits any of these fields it does not set itself: verify (from the profile's default-verify), capabilities, engine-class, prefers-engine, allowed-scope, review-policy. The profile's persona block is prepended to the body sent to the engine (the job .md on disk is unchanged; secrets are never logged). Resolution runs before the capability gate and engine resolution, so inherited caps / engine-class take effect.

Precedence: job field > profile field > built-in default. Set AGENT_QUEUE_PROFILES to point at a different catalog directory (defaults to ./profiles).

Starter catalog: developer, backend-engineer, frontend-engineer, ux-designer, ui-designer, qa, reviewer, docs-writer, and a reserved planner. Each presets name, persona, capabilities, default-verify, engine-class, prefers-engine, allowed-scope, and review-policy.

allowed-scope (warn-only this phase). After a run on a git cwd, changed paths outside the profile/job allowed-scope globs (dir/** matches the whole subtree) are logged as a WARNING and recorded as scope_warning= in the meta — non-blocking (the job is not failed). path_in_scope is exposed as a unit-testable function.
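A scope check of this kind can be sketched with plain shell pattern matching. The function below is an illustration only and is deliberately named in_scope rather than path_in_scope, since the real function's signature is not shown here. It treats `dir/**` as "anything under dir/" and lets any other glob match the full path.

```shell
# Hypothetical sketch of an allowed-scope check; not the runner's path_in_scope.
# in_scope PATH GLOB...: succeed when PATH matches any glob.
in_scope() {
  local path="$1" glob; shift
  for glob in "$@"; do
    case "$glob" in
      */\*\*) case "$path" in "${glob%/??}"/*) return 0 ;; esac ;;  # dir/** covers the subtree
      *)      case "$path" in $glob) return 0 ;; esac ;;            # ordinary glob on the full path
    esac
  done
  return 1
}
```

A caller could loop over the changed paths reported by git and log a warning for each one where `in_scope` fails.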

deps / DAG, single host (roadmap §5)

deps: [keyA, keyB] references other jobs by their author-controlled idempotency-key. A dep is satisfied when a job with that key is in shipped/ (default), or in shipped/ or testing/ when the dependent job sets deps-mode: soft. A job with unmet deps is blocked: it is skipped in inbox selection (never launched, never failed) and surfaced in status as blocked (waiting on: <keys>), then re-evaluated every loop until its deps are met. add performs submit-time cycle detection over the inbox + active-stage dep graph and rejects (nonzero exit) a job that would create a cycle. Cross-machine deps are P2.
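The satisfaction rule can be sketched as a lookup over the stage folders. This is a simplified illustration: unmet_deps is a hypothetical name, and it assumes each job file carries its key as an exact `idempotency-key: <key>` line, which may not match how the runner indexes keys.

```shell
# Hypothetical sketch; assumes an exact "idempotency-key: <key>" line in each job file.
# unmet_deps QUEUE_ROOT MODE KEY...: print the dep keys that are not yet satisfied.
unmet_deps() {
  local root="$1" mode="$2" stages="shipped" key stage found; shift 2
  [ "$mode" = soft ] && stages="shipped testing"   # deps-mode: soft also accepts testing/
  for key in "$@"; do
    found=0
    for stage in $stages; do
      if grep -qsxF "idempotency-key: $key" "$root/$stage"/*.md; then found=1; break; fi
    done
    [ "$found" -eq 1 ] || echo "$key"
  done
}
```

An empty result means the job is eligible; a non-empty result is the list that a status line such as `blocked (waiting on: <keys>)` would show.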

Resilience (crash recovery & work preservation)

Single-host implementations of the durability model (roadmap §25):

  • Orphan recovery. A job left in building/ whose worker process is dead (no live pid, PID-reuse-guarded by pidstart) is an orphan from a previous crash/power-off. On run startup and on every loop iteration (or on demand via agent-queue.sh recover) it is moved back to inbox/ with attempts incremented. Recovery is idempotent — once moved out of building/ it is never recovered twice.
  • WIP checkpointing. When a job's cwd is a git repo, the worker creates/checks out a dedicated branch aq/wip/<job> at start and commits any changes to it on every exit path — success, failure, timeout, and SIGTERM/SIGINT (via a trap). It never commits to main/your current branch. Non-git cwd is skipped cleanly. wip_branch / wip_base / wip_commit are recorded in the meta.
  • Resume. When an orphan/retry of a job whose aq/wip/<job> branch already exists is relaunched, that branch is checked out first so the agent continues from the checkpoint instead of from zero.
  • Retry policy (retry frontmatter, now active). When a failure's class is listed in on (crash/agent_error for a non-zero agent exit, timeout, verify_failed), the job is requeued to inbox/ and selection skips it until next_eligible, honoring backoff, for up to max attempts. On exhaustion it lands in failed/ with result=retries_exhausted, preserving the WIP branch + full log. A job with no retry block is never retried (Phase-0 behavior).

All bookkeeping (attempts, next_eligible, wip_*) is append-only in the meta and re-derivable from the meta + folder location, so recovery is crash-safe.
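The retry bookkeeping can be illustrated with a small decision function. Everything here is a sketch under assumptions: to_seconds and retry_decision are hypothetical names, and "attempts" is taken to mean the number of retries already used, which the text above does not pin down.

```shell
# Hypothetical sketch of the retry decision; not the runner's actual code.

# to_seconds DURATION: 90s|45m|2h|1d -> seconds (bare digits are taken as seconds).
to_seconds() {
  local n="${1%[smhd]}"
  case "$1" in
    *s) echo "$n" ;;
    *m) echo $(( n * 60 )) ;;
    *h) echo $(( n * 3600 )) ;;
    *d) echo $(( n * 86400 )) ;;
    *)  echo "$1" ;;
  esac
}

# retry_decision CLASS ATTEMPTS MAX BACKOFF NOW ON_CLASS...
# prints "retry <next_eligible_epoch>", "exhausted", or "fail" (class not retryable).
retry_decision() {
  local class="$1" attempts="$2" max="$3" backoff="$4" now="$5" c; shift 5
  for c in "$@"; do
    if [ "$c" = "$class" ]; then
      if [ "$attempts" -lt "$max" ]; then
        echo "retry $(( now + $(to_seconds "$backoff") ))"
      else
        echo exhausted
      fi
      return 0
    fi
  done
  echo fail
}
```

For instance, `retry_decision timeout 0 2 5m "$(date +%s)" timeout verify_failed` would schedule the next attempt five minutes from now.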

Insights (metrics & token accounting)

Each finished run records into <job>.meta: duration_s, exit, result, attempts, and, for a git cwd, files_changed / lines_added / lines_deleted (diffed wip_base..HEAD). A single parse_usage <engine> <log> adapter extracts model / tokens_in / tokens_out / tokens_cached / cost_usd / turns / tool_calls when the engine exposes them.

agent-queue.sh insights <job>   # full metrics for one job
agent-queue.sh insights         # recent-jobs table + per-engine rollup

Token caveat (honest): real usage is captured only where the engine surfaces it. A cooperating wrapper may emit a machine-readable AQ_USAGE key=value … line; otherwise per-engine heuristics apply (Claude/Codex token fields parsed; Devin session metrics + Copilot are API-only and currently TODO in parse_usage). When a value is not provider-reported it is omitted or flagged usage_estimated — numbers are never fabricated. The per-engine rollup marks totals that include any estimated value with *.

Tracker integration (§10)

Closes the task ↔ job round-trip against the platform-service items API: a tracker Item can become a job, and a job's outcome echoes back to the Item.

agent-queue.sh from-tracker <ITEM_ID>   # pull an Item -> materialize a job in inbox/
agent-queue.sh to-tracker <job>         # echo the job's current outcome to its Item

All HTTP goes through one curl wrapper (tracker_api); there are no other network calls. Real use needs platform-service running and a bearer token.

Config (env)

| Var | Default | Meaning |
| --- | --- | --- |
| AQ_TRACKER_API | http://localhost:4003 | base URL of the items API (routes live under /api) |
| AQ_TRACKER_TOKEN | (none) | bearer token, required for real calls; never hardcode |
| AQ_PRODUCT_ID | (none) | productId (sent as X-Product-Id; every Item has one) |
| AQ_TRACKER_CWD | $PWD | cwd a tracker-derived job runs in (Items carry no cwd) |
| AQ_TRACKER_AUTO | 0 | 1 = auto-echo on each transition (default OFF; echo is manual) |
| AQ_TRACKER_STATUS_INPROGRESS / _DONE / _FAILED | in_progress / done / wont_fix | Item status per bucket (the API has no blocked/failed status) |
| AQ_TRACKER_API_CMD | (none) | test seam: a stub that replaces the curl HTTP entirely (selftest uses it) |

from-tracker — Item → job

GET /api/items/<id>, then maps fields to job frontmatter:

| Item | Job |
| --- | --- |
| title + description | job body (verbatim instruction markdown) |
| id | tracker-item: <id> and idempotency-key: tracker-<id> (stable) |
| priority | priority: (label overrides; else Item priority; else medium) |
| label engine-class:<x> | engine-class: <x> |
| label profile:<x> | profile: <x> |
| label priority:<x> | priority: <x> |
| label cap:<token> | a capabilities: [...] entry |

Idempotent on the derived idempotency-key (Slice 1 dedupe) — pulling the same Item twice never enqueues a duplicate.

to-tracker — job → Item (one-way echo, §24.5)

Only if the job's meta has a tracker-item. Maps the job's stage/result to an Item status and PATCH /api/items/<id>/status, then POST /api/items/<id>/comments with a metrics-only summary (result, attempts, duration, tokens/cost, +/- lines — never prompt content or secrets):

| job result/stage | Item status |
| --- | --- |
| building / review / testing / recovered | in_progress |
| shipped | done |
| failed / timeout / verify_failed / retries_exhausted / capability_mismatch / no_engine / rejected | wont_fix (override via AQ_TRACKER_STATUS_FAILED) |

Idempotent via tracker_echoed in the meta (re-echoing an unchanged outcome is a no-op). The echo is one-way (child → tracker) and never authoritative for execution: an echo failure is logged and the job continues unchanged. With AQ_TRACKER_AUTO=1 the worker echoes automatically on each transition; otherwise echo is manual. status / insights surface the tracker-item and last echoed status.
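The stage-to-status mapping described in this section fits in one case statement. The sketch below uses the hypothetical name item_status and the override variables from the config table; returning non-zero for an unmapped outcome is an assumption.

```shell
# Hypothetical sketch of the job -> Item status mapping.
# item_status RESULT_OR_STAGE: print the Item status for a job outcome.
item_status() {
  case "$1" in
    building|review|testing|recovered)
      echo "${AQ_TRACKER_STATUS_INPROGRESS:-in_progress}" ;;
    shipped)
      echo "${AQ_TRACKER_STATUS_DONE:-done}" ;;
    failed|timeout|verify_failed|retries_exhausted|capability_mismatch|no_engine|rejected)
      echo "${AQ_TRACKER_STATUS_FAILED:-wont_fix}" ;;
    *) return 1 ;;   # assumption: unmapped outcomes are not echoed
  esac
}
```

Setting `AQ_TRACKER_STATUS_FAILED=blocked`, for example, would make every failure-class outcome echo as `blocked` instead of `wont_fix`.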

Fleet integration (Phase 2)

Behind the AQ_FLEET flag, the runner becomes a factory that registers, heartbeats, claims, and reports against the platform-service fleet coordinator — so coordinator jobs run alongside local .md files on the same host. All coordinator logic lives in lib/fleet-client.sh (curl-only + POSIX awk, sourced by agent-queue.sh); the few hook points in the runner are all gated on fleet_enabled.

Offline vs fleet mode. With AQ_FLEET unset/0 (the default) the runner is the pure offline git-queue described above — zero coordinator calls, behavior byte-for-byte unchanged. With AQ_FLEET=1 the run loop also registers + claims from the coordinator, reports fenced stage transitions, renews leases, and (in fleet mode) routes the outcome echo through the coordinator's fleet_events instead of the direct tracker echo. The tracker echo remains the offline path.

AQ_FLEET=1 AQ_FLEET_TOKEN=… AQ_PRODUCT_ID=… agent-queue.sh fleet-status   # register + show identity
AQ_FLEET=1 AQ_FLEET_TOKEN=… AQ_PRODUCT_ID=… agent-queue.sh run            # claim + execute coordinator jobs

Config (env)

| Var | Default | Meaning |
| --- | --- | --- |
| AQ_FLEET | 0 | master switch: 1 enables coordinator integration; 0/unset = offline git-queue (zero coordinator calls) |
| AQ_FLEET_ROUTE | 1 | route_via_service: 1 = coordinator is authoritative for claim (P2-S3 behavior); 0 = local inbox authoritative (coordinator not used to source work) |
| AQ_FLEET_AUTOSHIP | 0 | 1 = when the local verify gate passes, advance the coordinator job testing → shipped (the factory's verify is the test phase); 0 = report testing and rest for the human review gate |
| AQ_FLEET_PR | 0 | 1 = for a job carrying a repo, run the agent in an isolated checkout on branch aq/job/<id>, then commit/push and gh pr create; the PR URL is reported back and recorded on the run |
| AQ_FLEET_REPOS_DIR | .state/repos | cache dir for PR-mode repo checkouts (one per repo) |
| AQ_FLEET_REPO_BASE | (none) | base dir of existing local repos; a job repo matching <base>/<repo> is cloned from there (fast, no network) and PRs are pushed to its GitHub origin (embedded creds stripped) |
| GH_BIN | gh | GitHub CLI used to open PRs in PR mode |
| AQ_FLEET_SHADOW | 0 | shadow/dual-run: 1 (requires AQ_FLEET=1 + AQ_FLEET_ROUTE=0) queries the coordinator in parallel and records divergence, never acting on it |
| AQ_FLEET_SHADOW_FACTORY_ID | <factory>-shadow | isolated id used for the read-only shadow claim (never the real factory id) |
| AQ_FLEET_SHADOW_LOG | .state/fleet-shadow.log | structured shadow-divergence log (ts⇥localJob⇥coordJob⇥verdict) |
| AQ_FLEET_API | http://localhost:4003/api | coordinator base URL (already includes /api) |
| AQ_FLEET_TOKEN | (none) | bearer token; never hardcode |
| AQ_PRODUCT_ID | (none) | productId (sent as X-Product-Id; shared with the tracker config) |
| AQ_FACTORY_ID | <hostname>-<pid> | stable factory identity for this process |
| AQ_FLEET_LEASE_RENEW_SEC | 300 | heartbeat / lease-renew cadence |
| AQ_FLEET_CAPS | (auto) | override the auto-detected capability tokens (comma/space list) |
| AQ_FLEET_CWD | $PWD | cwd a claimed coordinator job runs in |
| AQ_FLEET_API_CMD | (none) | test seam: a stub that replaces the curl HTTP entirely (selftest uses it) |

Protocol (claim / heartbeat / report / fence / renew)

  • register / heartbeat: POST /fleet/factories/heartbeat {factoryId, capabilities[], health, load} — registration is the first heartbeat; re-sent on AQ_FLEET_LEASE_RENEW_SEC cadence.
  • claim: POST /fleet/claim {factoryId, capabilities[], leaseSeconds}. A returned job (id, bodyMd, leaseEpoch) is materialized as a transient local .md (frontmatter fleet-job-id + fleet-lease-epoch) so the existing runner executes it unchanged, interleaved with local files.
  • report (fenced): each stage transition (building/review/testing/shipped/failed) is PATCH /fleet/jobs/:id {stage, leaseEpoch, checkpoint?}. The coordinator writes fleet_events server-side. The payload carries only stage/epoch/checkpoint — never the prompt/bodyMd or token.
  • fencing (§18): if a report/renew returns conflict/409 (stale leaseEpoch → the coordinator reclaimed us), the worker self-aborts: it stops, does not ship/merge, and quarantines the local result to failed/ (result=fenced_quarantine) for human triage. A reclaimed zombie can never corrupt coordinator state.
  • lease renew / release: POST /fleet/jobs/:id/lease/renew while building (fenced); …/lease/release on terminal stages.
  • checkpoint: the WIP {wipBranch, wipCommit} is sent with the building report so a reclaim can resume (§25).

Offline-degrade + quarantine (§9)

If the coordinator is unreachable mid-job (5xx / connection error), the report is treated as degraded (logged, fleet_degraded=1): the in-flight job finishes locally rather than being abandoned. On the next reachable call the worker presents its leaseEpoch; if the coordinator now reports it stale (it was reclaimed during the outage), the local result is quarantined (marked, not auto-shipped) and surfaced for human triage — split-brain is resolved in favor of the coordinator without losing the work. status shows the factory id + per-job fleet=<id>@e<epoch>; insights lists the fleet_* fields.
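The fence and degrade rules from the two sections above boil down to classifying the HTTP outcome of a report or renew call. The sketch below is illustrative: classify_report is a hypothetical name, and it assumes the status code is captured with curl's `-w '%{http_code}'`, which prints 000 when no response was received.

```shell
# Hypothetical sketch; assumes the code comes from curl -w '%{http_code}'.
# classify_report HTTP_CODE -> ok | fenced | degraded | error
classify_report() {
  case "$1" in
    2??)     echo ok ;;         # report accepted
    409)     echo fenced ;;     # stale leaseEpoch: self-abort and quarantine
    5??|000) echo degraded ;;   # coordinator unreachable: finish the job locally
    *)       echo error ;;      # anything else (e.g. auth problems) needs a human
  esac
}
```

A worker loop could branch on the result: stop and quarantine on `fenced`, set fleet_degraded=1 and keep going on `degraded`.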

Feature flags + shadow / dual-run (Slice 4, §16/§27)

Three explicit, independently-toggleable levels gate the coordinator — a safe, reversible path to validate the fleet coordinator against the proven single-host (P1) behavior before any real cutover:

| Flag | Effect |
| --- | --- |
| AQ_FLEET=0 | Pure offline. Zero coordinator calls (including shadow). Offline git-queue path is byte-for-byte unchanged. |
| AQ_FLEET_ROUTE=1 (default) | route_via_service: the coordinator is authoritative for claim/assignment (today's P2-S3 behavior). |
| AQ_FLEET_ROUTE=0 | Local inbox authoritative: the coordinator is not used to source work (the pre-cutover state). |
| AQ_FLEET_AUTOSHIP=1 | Autonomous ship. On a passing local verify, advance the coordinator job testing → shipped (closing the testing → shipped gap). Default 0 reports testing and leaves the job for the human review gate / ship operator action. |
| AQ_FLEET_SHADOW=1 | Shadow / dual-run (requires AQ_FLEET=1 and AQ_FLEET_ROUTE=0): run the offline path as authoritative and query the coordinator in parallel, recording divergence without acting on it. |

Precedence. Shadow is only meaningful when ROUTE=0. If both AQ_FLEET_ROUTE=1 and AQ_FLEET_SHADOW=1 are set, ROUTE wins and shadow is disabled (a one-shot warning is logged) — you never route and shadow at the same time.
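The precedence rules can be written out as a small resolver. This is a sketch: fleet_mode and the four mode labels (offline, route, local, shadow) are hypothetical names chosen for illustration, and the sketch warns on every call rather than once.

```shell
# Hypothetical sketch of flag resolution; mode labels are illustrative.
# fleet_mode: print offline | route | local | shadow from the AQ_FLEET* variables.
fleet_mode() {
  local fleet="${AQ_FLEET:-0}" route="${AQ_FLEET_ROUTE:-1}" shadow="${AQ_FLEET_SHADOW:-0}"
  if [ "$fleet" != 1 ]; then
    echo offline                # AQ_FLEET=0 wins over everything, including shadow
  elif [ "$route" = 1 ]; then
    [ "$shadow" = 1 ] && echo "warning: AQ_FLEET_SHADOW ignored while AQ_FLEET_ROUTE=1" >&2
    echo route                  # coordinator is authoritative
  elif [ "$shadow" = 1 ]; then
    echo shadow                 # local authoritative, coordinator observed only
  else
    echo local                  # local inbox authoritative, no shadowing
  fi
}
```

Running `AQ_FLEET=1 AQ_FLEET_ROUTE=0 AQ_FLEET_SHADOW=1 fleet_mode` prints `shadow`, which corresponds to the "Observe" step of the cutover ladder.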

Side-effect-free by construction. Shadow never ships, quarantines, or mutates real job state. fleet_shadow_claim asks the coordinator what it would assign using an isolated -shadow factoryId + "dryRun":true,"shadow":true; if a coordinator without dry-run support actually assigned, the lease is released immediately so no real assignment persists. The would-be job is never materialized, run, or shipped locally. fleet_shadow_report mirrors the local stage as a shadow event ("shadow":true) purely to exercise reporting — the coordinator response is logged but never acted on (no fence/quarantine).

Each iteration fleet_shadow_compare classifies the local (authoritative) decision vs the coordinator's would-be decision as AGREE / DIVERGE / COORD_EMPTY / LOCAL_EMPTY and appends a line to the shadow log. Summarize it any time:

agent-queue.sh fleet-shadow-report        # per-verdict counts + agreement rate + recent divergences
agent-queue.sh fleet-shadow-report 25     # last 25 divergence/error events
agent-queue.sh status                     # surfaces the three flags' resolved state
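For intuition, the per-iteration comparison can be sketched as follows. The names shadow_verdict and shadow_log_line are hypothetical, the tab-separated layout follows the ts⇥localJob⇥coordJob⇥verdict format of AQ_FLEET_SHADOW_LOG, and treating "both sides idle" as AGREE is an assumption that the text does not confirm.

```shell
# Hypothetical sketch of the shadow comparison; not the runner's fleet_shadow_compare.
# shadow_verdict LOCAL_JOB COORD_JOB: classify the two decisions.
shadow_verdict() {
  if   [ -z "$1" ] && [ -z "$2" ]; then echo AGREE        # assumption: both idle counts as agreement
  elif [ -z "$1" ]; then echo LOCAL_EMPTY
  elif [ -z "$2" ]; then echo COORD_EMPTY
  elif [ "$1" = "$2" ]; then echo AGREE
  else echo DIVERGE
  fi
}

# shadow_log_line TS LOCAL_JOB COORD_JOB: one tab-separated log record.
shadow_log_line() {
  printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$(shadow_verdict "$2" "$3")"
}
```

An agreement rate can then be derived from such a log by counting AGREE lines against the total.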

Cutover ladder (rollback at any step):

  1. Observe (zero risk): AQ_FLEET=1 AQ_FLEET_ROUTE=0 AQ_FLEET_SHADOW=1 run — the local path stays authoritative; the coordinator is only shadowed.
  2. Inspect agreement: fleet-shadow-report — drive AGREEMENT toward 100%, investigating each DIVERGE.
  3. Cut over: once agreement is high, flip AQ_FLEET_ROUTE=1 (coordinator becomes authoritative).
  4. Rollback: set AQ_FLEET_ROUTE=0 (and/or AQ_FLEET=0) at any time — instant return to the local/offline path, no data migration.

Config (env overrides)

| Var | Default | Meaning |
| --- | --- | --- |
| AGENT_QUEUE_ROOT | ./queue | where the kanban folders live |
| AGENT_QUEUE_MAX | 3 | max concurrent agents (override per-run with run --max N) |
| AGENT_QUEUE_ENGINE | devin | default engine when none in frontmatter |
| AGENT_QUEUE_POLL | 3 | inbox poll interval (seconds) |
| AGENT_QUEUE_VERIFY | (empty) | default auto-QA verify command; per-job verify: overrides it |
| AGENT_QUEUE_STALL_MIN | 10 | minutes of unchanged log before a worker is ⚠ stalled |
| DEVIN_BIN / CLAUDE_BIN / CODEX_BIN / COPILOT_BIN | autodetected | override CLI binary paths |
| FLOCK_BIN / TIMEOUT_BIN | autodetected | flock (lock hardening) and timeout/gtimeout (hard timeouts); absent on stock macOS (see Portability notes) |

⚠️ Safety

Running agents with yolo: true means no approval prompts — they will edit files, run shell commands, and commit unattended. Mitigate:

  • Prefer scope-locked prompt files (e.g. "edit only under dashboards/tracker-web/").
  • Tell prompts not to git push — review commits before they leave your machine.
  • Same-repo safety is automatic: jobs sharing a cwd (or lock: key) are serialized, so two agents never run in one repo at once — even at --max 2+.
  • Set a timeout: on long jobs so a wedged agent can't run forever.
  • Watch cost: each job is a full agent session.

Portability notes

  • macOS has no flock/timeout; locking relies on the single run-loop (enforced by the second-run refusal) and timeouts use a pure-bash watchdog. Install coreutils (gtimeout) for hard process-tree kills.
  • Linux (incl. Gitea CI) uses flock + timeout for cross-process hardening.
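A pure-shell watchdog of the kind mentioned for macOS can be sketched as below. This is not the runner's implementation: run_with_timeout is a hypothetical name, the 124 exit code is borrowed from GNU timeout's convention, and the sketch signals only the direct child, which is exactly the limitation that a hard process-tree kill via timeout/gtimeout avoids.

```shell
# Hypothetical watchdog sketch; signals only the direct child, not its process tree.
# run_with_timeout SECONDS CMD...: return 124 on expiry, else CMD's exit status.
run_with_timeout() {
  local secs="$1" flag pid dog rc=0; shift
  flag="$(mktemp)"
  "$@" &
  pid=$!
  # Watchdog: after SECONDS, record that it fired, then terminate the command.
  ( sleep "$secs"; echo fired > "$flag"; kill -TERM "$pid" 2>/dev/null ) &
  dog=$!
  wait "$pid" || rc=$?
  kill "$dog" 2>/dev/null || true      # command finished first: cancel the watchdog
  wait "$dog" 2>/dev/null || true
  [ -s "$flag" ] && rc=124             # the watchdog fired, so report a timeout
  rm -f "$flag"
  return "$rc"
}
```

For example, `run_with_timeout 2700 claude -p < body.md` would give an agent 45 minutes. A cancelled watchdog leaves its `sleep` running until it expires on its own, which is harmless but worth knowing.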

Roadmap / nice-to-haves

  • Per-repo lock to serialize same-repo jobs automatically (lock: / cwd).
  • Per-job timeout: with hard kill (or bash watchdog fallback).
  • Stall detection in status/dash.
  • requeue failed jobs + clean/archive old runs.
  • Build/ship lifecycle: review → testing → shipped with auto-QA verify: gate + manual ship.
  • --push opt-in policy + commit review gate.
  • Optional notifications (Slack/desktop) on done/failed/stall.
  • Persisted run-loop as a daemon/service with auto-restart.