agent-queue
A zero-dependency folder "kanban" runner for headless coding-agent CLIs —
Devin, Claude Code, and OpenAI Codex. Drop prompt .md files into a folder,
and they get executed (in auto-approve mode) one slot at a time, moving through
inbox → building → review → testing → shipped (plus failed) with live status.
Vision & roadmap: where this is headed is a distributed multi-machine "gigafactory" (fleet of factories × tools × profiles, scheduler-routed, built on platform-service + tracker-web). It is specified as a checklist-driven implementation roadmap in
docs/GIGAFACTORY/GIGAFACTORY_ROADMAP.md. A full architecture overview, diagrams, code map and onboarding live alongside it in docs/GIGAFACTORY/.
Run safety: how the daemon and its agents must operate (isolated worktrees, branch-per-task, least-privilege instead of blanket
--yolo/dangerous on live checkouts) is defined in docs/RUN_POLICY.md. Read it before enabling yolo: true.
Build/ship lifecycle — auto-QA, manual ship:
inbox ─▶ building ─▶ review ─▶ testing ─▶ shipped
(queued) (agent      (rc=0;    (verify    (you ran
         running)    awaiting  passed;    `ship`)
                     verify)   QA gate)
                        │
     agent rc≠0 /       │ verify fails
     timeout ───────────┴──────────────▶ failed
- Auto: agent exits 0 → review/. If a verify: command is configured it runs automatically: pass → testing/ (QA), fail → failed/. No verify: → the job parks in review/ for a manual promote.
- Manual: you ship a testing/ job → shipped/ (the human gate). Shipping is never automatic.
Why this exists: the agent CLIs ship a minimal local interface (no built-in batch/queue/dashboard — that lives in their cloud products). This is the zero-dependency bash glue that turns "run one prompt interactively" into "queue many and walk away."
Quick start
cd learning_ai_devops_tools/agent-queue
chmod +x agent-queue.sh
./agent-queue.sh init
# queue a roadmap for Devin, running in the tracker-web repo, auto-approving everything
./agent-queue.sh add ~/roadmaps/UX-2.md \
--engine devin \
--cwd /Users/sd9235/code/mygh/learning_ai_common_plat/dashboards/tracker-web \
--yolo
# start processing (foreground; Ctrl-C to stop). Run up to 3 agents at once (default).
./agent-queue.sh run --max 3
In a second terminal, watch progress:
./agent-queue.sh watch
AGENT QUEUE /…/agent-queue/queue
inbox 3 building 2 review 1 testing 2 shipped 5 failed 0 running 2/2
RUNNING
20260528-2130__UX-2 devin 4m12s pid 51234 ⏺ Edited src/app/dashboard/items/page.tsx
20260528-2131__UX-3 claude 1m02s pid 51290 Running: pnpm typecheck
How a task is configured
Each .md carries optional frontmatter telling the runner which engine to use,
which directory to run in, and whether to auto-approve:
---
engine: devin # devin | claude | codex | copilot (default: $AGENT_QUEUE_ENGINE)
cwd: /abs/path/to/repo # where the agent executes (default: cwd when added)
yolo: true # auto-approve ALL tools (default: true)
lock: my-repo # optional mutex key (default: cwd). Jobs sharing a key run serially
timeout: 45m # optional. 90s|45m|2h|1d. On expiry → failed (result=timeout)
verify: pnpm -s test # optional auto-QA gate. Runs in cwd after rc=0:
# pass → testing/ (QA), fail → failed/
# (omit to park in review/ for manual promote)
---
# Your task / roadmap goes here
...
add --engine/--cwd/--yolo will inject this frontmatter for you if the file doesn't
already have a --- block.
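As an illustration of how such a frontmatter block can be read, here is a minimal sketch. The helper name fm_get is hypothetical and is not taken from agent-queue.sh; it assumes flat key: value lines, an optional trailing "# comment", and a block delimited by --- lines.

```shell
# Hypothetical helper (not from agent-queue.sh): print one frontmatter value.
# Assumes the file starts with '---' and the block ends at the next '---'.
fm_get() {
  local file="$1" key="$2"
  awk -v key="$key" '
    NR == 1 && $0 != "---" { exit }     # no frontmatter block at all
    NR == 1 { next }                     # skip the opening ---
    $0 == "---" { exit }                 # closing --- ends the block
    /^[ \t]*#/ { next }                  # comment-only continuation line
    {
      line = $0
      sub(/[ \t]+#.*$/, "", line)        # drop a trailing "# comment"
      idx = index(line, ":")
      if (idx == 0) next
      k = substr(line, 1, idx - 1)
      v = substr(line, idx + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key) { print v; exit }
    }
  ' "$file"
}
```

Called as fm_get task.md engine, it prints the engine value, or nothing when the key is absent. Values that legitimately contain " #" would be truncated by this sketch.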
Manifest fields (Gigafactory Phase 1)
The runner parses the richer gigafactory manifest
backward-compatibly — a legacy engine/cwd/yolo-only .md behaves exactly as before.
Fields marked RESERVED are parsed, stored in .state/<job>.meta, and shown in status, but
are otherwise no-ops until a later phase (they do not yet affect execution).
| Field | Status | Default | Meaning |
|---|---|---|---|
| engine | active | $AGENT_QUEUE_ENGINE | explicit engine (devin\|claude\|codex\|copilot); always wins over engine-class |
| cwd / yolo / lock / timeout / verify | active | see above | Phase-0 behavior, unchanged |
| priority | active | medium | critical\|high\|medium\|low. Inbox is picked highest-priority first, then oldest (was pure FIFO) |
| engine-class | active | (none) | used only when engine is unset: agentic-coder→devin,claude,codex; chat-coder→copilot. Picks the first available engine. No engine available → job fails result=no_engine |
| prefers-engine | active | (none) | optional order hint for engine-class resolution, e.g. [claude, devin] |
| capabilities | active | (none) | hard host requirements, e.g. [os:any, node>=20, has:git]. If the host can't satisfy them the job is sent to failed/ with result=capability_mismatch and the agent is never launched (grammar below) |
| idempotency-key | active | (none) | dedupe on add (semantics below) |
| profile | active | (none) | inherit persona + verify/caps/engine-class/prefers-engine/allowed-scope/review-policy from profiles/<name>.md (job fields override; see Profiles) |
| prefers | RESERVED | (none) | soft routing/affinity hints (e.g. [factory:mac-2]) |
| budget | RESERVED | (none) | { usd, tokens, wall } ceilings (wall enforcement is a later slice) |
| deps / deps-mode | active | (none) | block until each referenced idempotency-key is in shipped/ (or testing/ when deps-mode: soft). Submit-time cycle detection (see Profiles & deps) |
| retry | active | (none) | { max: N, backoff: 5m, on: [timeout, verify_failed, crash] }: requeue failures with backoff up to max, then retries_exhausted (see Resilience) |
| review-policy | RESERVED | (none) | auto\|manual\|reviewers:[…] |
| artifacts | RESERVED | (none) | extra outputs to capture (coverage, screenshots) |
| tracker-item | RESERVED | (none) | link back to the originating tracker task |
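The priority rule in the table (highest priority first, then oldest) can be sketched as a sort. The function names below are hypothetical, and the sketch assumes job names carry a sortable timestamp prefix, as in the sample watch output, so that lexical order equals age order.

```shell
# Hypothetical sketch of "highest priority first, then oldest".
priority_rank() {
  case "$1" in
    critical) echo 0 ;;
    high)     echo 1 ;;
    low)      echo 3 ;;
    *)        echo 2 ;;   # medium, which is also the documented default
  esac
}

# Reads "priority job-name" pairs on stdin and prints the job to pick next.
pick_next() {
  local prio job
  while read -r prio job; do
    printf '%s %s\n' "$(priority_rank "$prio")" "$job"
  done | sort -k1,1n -k2,2 | head -n 1 | cut -d' ' -f2-
}
```

For example, feeding it one low job and two high jobs yields the older of the two high jobs.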
Capability grammar (a job matches a host iff every required token is satisfied):
| Token form | Example | Satisfied when |
|---|---|---|
| key (bare presence) | gpu | the host advertises key in any form |
| key:value (exact) | os:mac, engine:devin, has:git | the host advertises that exact token |
| key:any (wildcard) | os:any | the host advertises any key:* (so os:any matches every host) |
| key<op>version (>= > = <= <) | node>=20 | numeric/semver-major compare vs the host's key:<n> |
The host advertises (via detect_capabilities): os:<mac\|linux>, engine:<each available engine>,
node:<major>, and has:<git\|pnpm\|docker> when present.
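The grammar above can be sketched as a single matcher. The name cap_satisfied is hypothetical (it is not claimed to be the runner's own function), and the version compare looks only at the major component, which is an assumption drawn from the "semver-major" wording.

```shell
# Hypothetical sketch: cap_satisfied TOKEN HOST_CAP...
# Returns 0 when the required TOKEN is satisfied by the advertised HOST_CAPs.
cap_satisfied() {
  local req="$1" cap key op want have
  shift
  case $req in
    *'>='*) op=-ge; key=${req%%'>='*}; want=${req#*'>='} ;;
    *'<='*) op=-le; key=${req%%'<='*}; want=${req#*'<='} ;;
    *'>'*)  op=-gt; key=${req%%'>'*};  want=${req#*'>'} ;;
    *'<'*)  op=-lt; key=${req%%'<'*};  want=${req#*'<'} ;;
    *'='*)  op=-eq; key=${req%%'='*};  want=${req#*'='} ;;
    *)      op= ;;
  esac
  for cap in "$@"; do
    if [ -n "$op" ]; then
      # Version compare against the host's key:<n>, major component only.
      case $cap in "$key":*) ;; *) continue ;; esac
      have=${cap#*:}; have=${have%%.*}; want=${want%%.*}
      case $have in ''|*[!0-9]*) continue ;; esac
      [ "$have" "$op" "$want" ] && return 0
    else
      case $req in
        *:any) case $cap in "${req%:any}":*) return 0 ;; esac ;;
        *:*)   [ "$cap" = "$req" ] && return 0 ;;
        *)     case $cap in "$req"|"$req":*) return 0 ;; esac ;;
      esac
    fi
  done
  return 1
}
```

A job matches a host when this returns 0 for every token in its capabilities list.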
idempotency-key semantics (on add, hashing the frontmatter-stripped body):
- same key + same body → no-op (logged duplicate, skipped).
- same key + different body, prior job still in inbox/ → supersedes it (replaces the queued file).
- same key + different body, prior job already past inbox/ (building/review/testing/shipped) → rejected with a clear error (use a new key, or requeue the existing job).
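The three rules above reduce to a small decision function. This is a sketch with hypothetical names; the use of cksum as the body hash is an assumption (the text does not say which hash is used), and treating a prior job in failed/ like any other non-inbox stage is also an assumption.

```shell
# Hypothetical sketch: dedupe_action PRIOR_STAGE PRIOR_HASH NEW_HASH
# PRIOR_STAGE is empty when no earlier job carries the same key.
dedupe_action() {
  local stage="$1" prior_hash="$2" new_hash="$3"
  if [ -z "$stage" ]; then
    echo enqueue
  elif [ "$prior_hash" = "$new_hash" ]; then
    echo skip_duplicate
  elif [ "$stage" = inbox ]; then
    echo supersede
  else
    echo reject          # assumption: any stage past inbox/ is rejected
  fi
}

# Hash of the body with the leading frontmatter block removed.
# cksum is an illustrative choice, not necessarily what the runner uses.
body_hash() {
  awk 'NR == 1 && $0 == "---" { fm = 1; next }
       fm && $0 == "---"      { fm = 0; next }
       !fm' "$1" | cksum | cut -d' ' -f1
}
```

Because the hash ignores frontmatter, changing only engine: or cwd: on a re-add counts as the same body.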
Engine mapping
| engine: | Command run | Auto-approve flag (yolo: true) |
|---|---|---|
| devin | devin -p --prompt-file <body> | --permission-mode dangerous |
| claude | claude -p (body on stdin) | --dangerously-skip-permissions |
| codex | codex exec (body on stdin) | --dangerously-bypass-approvals-and-sandbox |
| copilot | copilot -p (body on stdin) | --allow-all-tools (best-effort; chat-coder class target) |
The frontmatter is stripped before the body reaches the agent, and
claude/codex receive it on stdin so a body starting with -- is never
misparsed as a flag.
Flags drift between CLI versions. If one changes, edit
build_agent_cmd() in agent-queue.sh (it's the single place each engine is mapped).
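To make the mapping concrete, the table can be expressed as one case statement. This is a hypothetical sketch, not the actual build_agent_cmd(): the function name, the flag position (appended last), and the string-based output are assumptions, and the result is meant for display rather than execution.

```shell
# Hypothetical sketch of the engine table; prints the command line as text.
# Real code should build an argv array so paths with spaces survive.
sketch_agent_cmd() {
  local engine="$1" yolo="$2" body_file="$3" cmd flag
  case $engine in
    devin)   cmd="devin -p --prompt-file $body_file"; flag="--permission-mode dangerous" ;;
    claude)  cmd="claude -p";  flag="--dangerously-skip-permissions" ;;
    codex)   cmd="codex exec"; flag="--dangerously-bypass-approvals-and-sandbox" ;;
    copilot) cmd="copilot -p"; flag="--allow-all-tools" ;;
    *) echo "unknown engine: $engine" >&2; return 1 ;;
  esac
  # The auto-approve flag is only added when yolo is true.
  if [ "$yolo" = true ]; then cmd="$cmd $flag"; fi
  printf '%s\n' "$cmd"
}
```

Keeping every engine in one case statement is what makes a flag rename a one-line change.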
Commands
| Command | What it does |
|---|---|
| init | create the queue/ folders |
| add <file> [--engine E] [--cwd P] [--yolo\|--no-yolo] | queue a prompt into inbox/ |
| run [--max N] [--engine E] [--once] | process the inbox (foreground loop) |
| status | kanban counts + running-worker table (marks ⚠ stalled; per-job insights sub-line) |
| watch [interval] | live status (bash), redrawn every N seconds (default 2) |
| insights [job] | per-job metrics, or a recent-jobs table + per-engine token/cost/success rollup (see Insights) |
| recover | reclaim orphaned building/ jobs (dead worker) back to inbox/ (see Resilience) |
| dash [--interval N] | interactive Node dashboard: navigable numbered job list with single-key actions (see below) |
| stop | kill running workers + the run loop |
| logs <job> [-f] | print / follow a job's log |
| promote <job> | advance one stage forward: review → testing → shipped |
| ship <job> | manual gate: move a testing/ (QA) job → shipped/ |
| reject <job> | send a review/ or testing/ job → failed/ |
| requeue <job> | move a failed/review/testing job back to inbox/ for a fresh run |
| clean [--keep N] | archive finished logs+meta beyond the newest N (default 50) into queue/.archive/ |
Only one run loop may be active per queue — a second run against the same
queue is refused while the first is alive (a stale daemon.pid is cleared).
Interactive dashboard (dash)
dash is a single-script, menu-driven control panel (think a tiny "glassbox").
It shows the kanban counts, live RUNNING workers (engine, elapsed, last log
line, stall), a navigable numbered JOBS list, and RECENT finished jobs — and
lets you act on jobs without leaving the screen. Every action shells out to
agent-queue.sh, so the script stays the single source of truth.
| Key | Action |
|---|---|
| ↑/↓, j/k, 1–9 | select a job in the JOBS list |
| enter / l | view the selected job's log (live, auto-refreshing) |
| p | promote (review → testing → shipped) |
| s | ship (testing/QA → shipped, the manual gate) |
| x | reject (review/testing → failed); asks y/n |
| u | requeue (failed/review/testing → inbox); asks y/n |
| r | start the run loop (detached → logs/run-loop.log) |
| S | stop the run loop + running workers |
| g | refresh now · ?/h help · q/Ctrl-C quit |
The header shows a ● run loop pid N / ○ run loop stopped indicator. Run it in
a TTY for the interactive mode; piped/non-TTY it falls back to a read-only live view.
Via bytelyst-cli.sh
Wired into the repo's unified CLI (no GitHub token required for this subcommand):
./bytelyst-cli.sh agent-queue run --max 3 # full passthrough
./bytelyst-cli.sh aq status # short alias
Boot-persistence (auto-start on login)
To run the worker non-stop and survive reboot / crash / logout (not just a
closed terminal), install the macOS LaunchAgent — it auto-starts agent-queue run
on login under caffeinate and restarts it via KeepAlive:
bash launchd/install.sh # install + start now
bash launchd/install.sh --uninstall # stop + remove
Override engine/concurrency/secrets in ~/.agent-queue.env (e.g.
AGENT_QUEUE_ENGINE=codex, AGENT_QUEUE_MAX=1). See launchd/README.md
for the full layer comparison (tmux/caffeinate vs LaunchAgent) and gotchas.
Folder layout
queue/
inbox/ # drop / queued .md files (highest priority, then oldest eligible, picked first)
building/ # currently executing (agent running)
review/ # agent exited 0 — awaiting the auto-QA verify gate (or manual promote)
testing/ # verify passed (QA) — awaiting manual `ship`
shipped/ # manually shipped — the terminal success stage
failed/ # non-zero exit, bad cwd, timeout, verify failure, or manual reject
logs/ # <job>.log — full agent + verify output
locks/ # per-key flock files (Linux hardening; unused on macOS)
.state/ # <job>.meta heartbeats + daemon.pid (runtime only)
.archive/ # <ts>/ — logs+meta moved here by `clean`
result= values written to <job>.meta: review, testing, shipped,
failed, timeout, verify_failed, rejected, requeued, capability_mismatch
(host missing a required capability — agent never launched), no_engine
(an engine-class had no available engine), retries_exhausted (failed after
retry.max attempts — single-host dead-letter stand-in), retry_scheduled
(transient: requeued for another attempt), recovered (transient: an orphan was
reclaimed to inbox/).
Profiles & deps
Profiles (roadmap §6)
A profile is a reusable role preset in profiles/<name>.md. A job opts in with
profile: <name> and inherits any of these fields it does not set itself:
verify (from the profile's default-verify), capabilities, engine-class,
prefers-engine, allowed-scope, review-policy. The profile's persona block is
prepended to the body sent to the engine (the job .md on disk is unchanged;
secrets are never logged). Resolution runs before the capability gate and engine
resolution, so inherited caps / engine-class take effect.
Precedence: job field > profile field > built-in default. Set AGENT_QUEUE_PROFILES
to point at a different catalog directory (defaults to ./profiles).
Starter catalog: developer, backend-engineer, frontend-engineer, ux-designer,
ui-designer, qa, reviewer, docs-writer, and a reserved planner. Each presets
name, persona, capabilities, default-verify, engine-class, prefers-engine,
allowed-scope, and review-policy.
allowed-scope (warn-only this phase). After a run on a git cwd, changed paths
outside the profile/job allowed-scope globs (dir/** matches the whole subtree) are
logged as a WARNING and recorded as scope_warning= in the meta — non-blocking
(the job is not failed). path_in_scope is exposed as a unit-testable function.
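The scope check described above can be sketched with shell pattern matching. The name in_scope_sketch is hypothetical and this is not the runner's path_in_scope; note that in a case pattern * also matches "/", so a glob like docs/*.md is looser here than in some glob dialects.

```shell
# Hypothetical sketch: in_scope_sketch PATH GLOB...
# Returns 0 if PATH matches any GLOB; "dir/**" covers the whole subtree.
in_scope_sketch() {
  local path="$1" glob dir
  shift
  for glob in "$@"; do
    case $glob in
      *'/**')
        dir=${glob%'/**'}
        case $path in "$dir"/*) return 0 ;; esac ;;
      *)
        # Unquoted on purpose so the glob is treated as a pattern.
        case $path in $glob) return 0 ;; esac ;;
    esac
  done
  return 1
}
```

A warn-only caller would log the path and record scope_warning= when this returns non-zero, without failing the job.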
deps / DAG, single host (roadmap §5)
deps: [keyA, keyB] references other jobs by their author-controlled
idempotency-key. A dep is satisfied when a job with that key is in shipped/
(default), or in shipped/ or testing/ when the dependent job sets
deps-mode: soft. A job with unmet deps is blocked: it is skipped in inbox
selection (never launched, never failed) and surfaced in status as
blocked (waiting on: <keys>), then re-evaluated every loop until its deps are met.
add performs submit-time cycle detection over the inbox + active-stage dep graph
and rejects (nonzero exit) a job that would create a cycle. Cross-machine deps are P2.
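The satisfaction rule (shipped/ by default, shipped/ or testing/ under deps-mode: soft) can be sketched as a folder lookup. This is a hypothetical sketch: it assumes the job .md in each stage folder still carries its idempotency-key frontmatter line, whereas the runner may read the key from .state/<job>.meta instead.

```shell
# Hypothetical sketch: dep_satisfied KEY MODE QUEUE_ROOT
# MODE "soft" also accepts jobs in testing/; anything else requires shipped/.
dep_satisfied() {
  local key="$1" mode="$2" root="$3" stage stages
  stages="shipped"
  if [ "$mode" = soft ]; then stages="shipped testing"; fi
  for stage in $stages; do
    # -s hides the "no such file" error when a stage folder is empty.
    grep -qs "^idempotency-key:[[:space:]]*$key[[:space:]]*\$" "$root/$stage"/*.md && return 0
  done
  return 1
}
```

A job stays blocked until this returns 0 for every key in its deps list.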
Resilience (crash recovery & work preservation)
Single-host implementations of the durability model (roadmap §25):
- Orphan recovery. A job left in building/ whose worker process is dead (no live pid, PID-reuse-guarded by pidstart) is an orphan from a previous crash/power-off. On run startup and on every loop iteration (or on demand via agent-queue.sh recover) it is moved back to inbox/ with attempts incremented. Recovery is idempotent: once moved out of building/ it is never recovered twice.
- WIP checkpointing. When a job's cwd is a git repo, the worker creates/checks out a dedicated branch aq/wip/<job> at start and commits any changes to it on every exit path: success, failure, timeout, and SIGTERM/SIGINT (via a trap). It never commits to main/your current branch. Non-git cwd is skipped cleanly. wip_branch/wip_base/wip_commit are recorded in the meta.
- Resume. When an orphan/retry of a job whose aq/wip/<job> branch already exists is relaunched, that branch is checked out first so the agent continues from the checkpoint instead of from zero.
- Retry policy (retry frontmatter, now active). On a failure whose class is in on (crash/agent_error for a non-zero agent exit, timeout, verify_failed) the job is requeued to inbox/ honoring backoff (selection skips it until next_eligible) up to max attempts; on exhaustion it lands in failed/ with result=retries_exhausted, preserving the WIP branch + full log. No retry = no retry (Phase-0 behavior).
All bookkeeping (attempts, next_eligible, wip_*) is append-only in the meta
and re-derivable from the meta + folder location, so recovery is crash-safe.
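The backoff bookkeeping can be sketched as two small functions. Names are hypothetical; the sketch assumes attempts counts the runs already made and that next_eligible is an epoch-seconds timestamp, neither of which the text states explicitly.

```shell
# Hypothetical sketch: convert 90s|45m|2h|1d to seconds.
duration_to_seconds() {
  local d="$1" n unit
  n=${d%[smhd]}          # strip one trailing unit letter, if present
  unit=${d#"$n"}         # whatever was stripped (empty if none)
  case $n in ''|*[!0-9]*) echo "bad duration: $d" >&2; return 1 ;; esac
  case $unit in
    s) echo "$n" ;;
    m) echo $(( n * 60 )) ;;
    h) echo $(( n * 3600 )) ;;
    d) echo $(( n * 86400 )) ;;
    *) echo "bad duration: $d" >&2; return 1 ;;
  esac
}

# retry_decision ATTEMPTS MAX BACKOFF NOW
# Prints "retry_scheduled <next_eligible>" or "retries_exhausted".
retry_decision() {
  local attempts="$1" max="$2" backoff="$3" now="$4" delay
  if [ "$attempts" -ge "$max" ]; then
    echo retries_exhausted
    return 0
  fi
  delay=$(duration_to_seconds "$backoff") || return 1
  echo "retry_scheduled $(( now + delay ))"
}
```

Inbox selection would then skip any job whose next_eligible is still in the future.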
Insights (metrics & token accounting)
Each finished run records into <job>.meta: duration_s, exit, result,
attempts, and — for a git cwd — files_changed / lines_added /
lines_deleted (diffed wip_base..HEAD). A single parse_usage <engine> <log>
adapter extracts model / tokens_in / tokens_out / tokens_cached /
cost_usd / turns / tool_calls when the engine exposes them.
agent-queue.sh insights <job> # full metrics for one job
agent-queue.sh insights # recent-jobs table + per-engine rollup
Token caveat (honest): real usage is captured only where the engine surfaces it. A cooperating wrapper may emit a machine-readable
AQ_USAGE key=value … line; otherwise per-engine heuristics apply (Claude/Codex token fields parsed; Devin session metrics + Copilot are API-only and currently TODO in parse_usage). When a value is not provider-reported it is omitted or flagged usage_estimated; numbers are never fabricated. The per-engine rollup marks totals that include any estimated value with *.
Tracker integration (§10)
Closes the task ↔ job round-trip against the platform-service items API: a tracker Item can become a job, and a job's outcome echoes back to the Item.
agent-queue.sh from-tracker <ITEM_ID> # pull an Item -> materialize a job in inbox/
agent-queue.sh to-tracker <job> # echo the job's current outcome to its Item
All HTTP goes through one curl wrapper (tracker_api); there are no other network
calls. Real use needs platform-service running and a bearer token.
Config (env)
| Var | Default | Meaning |
|---|---|---|
| AQ_TRACKER_API | http://localhost:4003 | base URL of the items API (routes live under /api) |
| AQ_TRACKER_TOKEN | (none) | bearer token; required for real calls; never hardcode |
| AQ_PRODUCT_ID | (none) | productId (sent as X-Product-Id; every Item has one) |
| AQ_TRACKER_CWD | $PWD | cwd a tracker-derived job runs in (Items carry no cwd) |
| AQ_TRACKER_AUTO | 0 | 1 = auto-echo on each transition (default OFF; echo is manual) |
| AQ_TRACKER_STATUS_INPROGRESS / _DONE / _FAILED | in_progress / done / wont_fix | Item status per bucket (the API has no blocked/failed status) |
| AQ_TRACKER_API_CMD | (none) | test seam: a stub that replaces the curl HTTP entirely (selftest uses it) |
from-tracker — Item → job
GET /api/items/<id>, then maps fields to job frontmatter:
| Item | Job |
|---|---|
| title + description | job body (verbatim instruction markdown) |
| id | tracker-item: <id> and idempotency-key: tracker-<id> (stable) |
| priority | priority: (label overrides; else Item priority; else medium) |
| label engine-class:<x> | engine-class: <x> |
| label profile:<x> | profile: <x> |
| label priority:<x> | priority: <x> |
| label cap:<token> | a capabilities: [...] entry |
Idempotent on the derived idempotency-key (Slice 1 dedupe) — pulling the same
Item twice never enqueues a duplicate.
to-tracker — job → Item (one-way echo, §24.5)
Only if the job's meta has a tracker-item. Maps the job's stage/result to an Item
status and PATCH /api/items/<id>/status, then POST /api/items/<id>/comments
with a metrics-only summary (result, attempts, duration, tokens/cost, +/- lines —
never prompt content or secrets):
| job result/stage | Item status |
|---|---|
| building / review / testing / recovered | in_progress |
| shipped | done |
| failed / timeout / verify_failed / retries_exhausted / capability_mismatch / no_engine / rejected | wont_fix (override via AQ_TRACKER_STATUS_FAILED) |
Idempotent via tracker_echoed in the meta (re-echoing an unchanged outcome is a
no-op). The echo is one-way (child → tracker) and never authoritative for
execution: an echo failure is logged and the job continues unchanged. With
AQ_TRACKER_AUTO=1 the worker echoes automatically on each transition; otherwise
echo is manual. status / insights surface the tracker-item and last echoed status.
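The result-to-status table can be sketched as a lookup that honors the three override variables. The function name tracker_status_for is hypothetical; transient results such as retry_scheduled are left unmapped because the table does not list them.

```shell
# Hypothetical sketch: map a job result/stage to an Item status.
# Returns non-zero for results the table does not cover.
tracker_status_for() {
  case $1 in
    building|review|testing|recovered)
      echo "${AQ_TRACKER_STATUS_INPROGRESS:-in_progress}" ;;
    shipped)
      echo "${AQ_TRACKER_STATUS_DONE:-done}" ;;
    failed|timeout|verify_failed|retries_exhausted|capability_mismatch|no_engine|rejected)
      echo "${AQ_TRACKER_STATUS_FAILED:-wont_fix}" ;;
    *)
      return 1 ;;
  esac
}
```

With AQ_TRACKER_STATUS_FAILED set to another status, every failure-class result maps to that value instead of wont_fix.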
Fleet integration (Phase 2)
Behind the AQ_FLEET flag, the runner becomes a factory that registers,
heartbeats, claims, and reports against the platform-service fleet coordinator —
so coordinator jobs run alongside local .md files on the same host. All
coordinator logic lives in lib/fleet-client.sh (curl-only +
POSIX awk, sourced by agent-queue.sh); the few hook points in the runner are all
gated on fleet_enabled.
Offline vs fleet mode. With AQ_FLEET unset/0 (the default) the runner is the pure offline git-queue described above: zero coordinator calls, behavior byte-for-byte unchanged. With AQ_FLEET=1 the run loop also registers + claims from the coordinator, reports fenced stage transitions, renews leases, and (in fleet mode) routes the outcome echo through the coordinator's fleet_events instead of the direct tracker echo. The tracker echo remains the offline path.
AQ_FLEET=1 AQ_FLEET_TOKEN=… AQ_PRODUCT_ID=… agent-queue.sh fleet-status # register + show identity
AQ_FLEET=1 AQ_FLEET_TOKEN=… AQ_PRODUCT_ID=… agent-queue.sh run # claim + execute coordinator jobs
Config (env)
| Var | Default | Meaning |
|---|---|---|
| AQ_FLEET | 0 | master switch: 1 enables coordinator integration; 0/unset = offline git-queue (zero coordinator calls) |
| AQ_FLEET_ROUTE | 1 | route_via_service: 1 = coordinator is authoritative for claim (P2-S3 behavior); 0 = local inbox authoritative (coordinator not used to source work) |
| AQ_FLEET_AUTOSHIP | 0 | 1 = when the local verify gate passes, advance the coordinator job testing → shipped (the factory's verify is the test phase); 0 = report testing and rest for the human review gate |
| AQ_FLEET_SHADOW | 0 | shadow/dual-run: 1 (requires AQ_FLEET=1 + AQ_FLEET_ROUTE=0) queries the coordinator in parallel and records divergence, never acting on it |
| AQ_FLEET_SHADOW_FACTORY_ID | <factory>-shadow | isolated id used for the read-only shadow claim (never the real factory id) |
| AQ_FLEET_SHADOW_LOG | .state/fleet-shadow.log | structured shadow-divergence log (ts⇥localJob⇥coordJob⇥verdict) |
| AQ_FLEET_API | http://localhost:4003/api | coordinator base URL (already includes /api) |
| AQ_FLEET_TOKEN | (none) | bearer token; never hardcode |
| AQ_PRODUCT_ID | (none) | productId (sent as X-Product-Id; shared with the tracker config) |
| AQ_FACTORY_ID | <hostname>-<pid> | stable factory identity for this process |
| AQ_FLEET_LEASE_RENEW_SEC | 300 | heartbeat / lease-renew cadence |
| AQ_FLEET_CAPS | (auto) | override the auto-detected capability tokens (comma/space list) |
| AQ_FLEET_CWD | $PWD | cwd a claimed coordinator job runs in |
| AQ_FLEET_API_CMD | (none) | test seam: a stub that replaces the curl HTTP entirely (selftest uses it) |
Protocol (claim / heartbeat / report / fence / renew)
- register / heartbeat: POST /fleet/factories/heartbeat {factoryId, capabilities[], health, load}. Registration is the first heartbeat; it is re-sent on the AQ_FLEET_LEASE_RENEW_SEC cadence.
- claim: POST /fleet/claim {factoryId, capabilities[], leaseSeconds}. A returned job (id, bodyMd, leaseEpoch) is materialized as a transient local .md (frontmatter fleet-job-id + fleet-lease-epoch) so the existing runner executes it unchanged, interleaved with local files.
- report (fenced): each stage transition (building/review/testing/shipped/failed) is PATCH /fleet/jobs/:id {stage, leaseEpoch, checkpoint?}. The coordinator writes fleet_events server-side. The payload carries only stage/epoch/checkpoint, never the prompt/bodyMd or token.
- fencing (§18): if a report/renew returns conflict/409 (stale leaseEpoch → the coordinator reclaimed us), the worker self-aborts: it stops, does not ship/merge, and quarantines the local result to failed/ (result=fenced_quarantine) for human triage. A reclaimed zombie can never corrupt coordinator state.
- lease renew / release: POST /fleet/jobs/:id/lease/renew while building (fenced); …/lease/release on terminal stages.
- checkpoint: the WIP {wipBranch, wipCommit} is sent with the building report so a reclaim can resume (§25).
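How a worker might react to a report or renew response can be sketched as a classifier on the HTTP status. The function name and the returned labels ok/degraded/error are hypothetical; treating 000 as a connection failure reflects what curl prints for %{http_code} when no response was received.

```shell
# Hypothetical sketch: classify the HTTP status of a fenced report/renew.
# A status could be captured with, for example:
#   curl -s -o /dev/null -w '%{http_code}' -X PATCH "$url" ...
fleet_report_action() {
  case $1 in
    2??)     echo ok ;;                  # transition accepted
    409)     echo fenced_quarantine ;;   # stale leaseEpoch: self-abort
    5??|000) echo degraded ;;            # coordinator unreachable: finish locally
    *)       echo error ;;
  esac
}
```

Only the 409 branch stops the job; the degraded branch lets the in-flight job finish locally and defers the epoch check to the next reachable call.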
Offline-degrade + quarantine (§9)
If the coordinator is unreachable mid-job (5xx / connection error), the report
is treated as degraded (logged, fleet_degraded=1): the in-flight job finishes
locally rather than being abandoned. On the next reachable call the worker
presents its leaseEpoch; if the coordinator now reports it stale (it was
reclaimed during the outage), the local result is quarantined (marked, not
auto-shipped) and surfaced for human triage — split-brain is resolved in favor of
the coordinator without losing the work. status shows the factory id + per-job
fleet=<id>@e<epoch>; insights lists the fleet_* fields.
Feature flags + shadow / dual-run (Slice 4, §16/§27)
Three explicit, independently-toggleable levels gate the coordinator — a safe, reversible path to validate the fleet coordinator against the proven single-host (P1) behavior before any real cutover:
| Flag | Effect |
|---|---|
| AQ_FLEET=0 | Pure offline. Zero coordinator calls (including shadow). Offline git-queue path is byte-for-byte unchanged. |
| AQ_FLEET_ROUTE=1 (default) | route_via_service: the coordinator is authoritative for claim/assignment (today's P2-S3 behavior). |
| AQ_FLEET_ROUTE=0 | Local inbox authoritative: the coordinator is not used to source work (the pre-cutover state). |
| AQ_FLEET_AUTOSHIP=1 | Autonomous ship. On a passing local verify, advance the coordinator job testing → shipped (closing the testing → shipped gap). Default 0 reports testing and leaves the job for the human review gate / ship operator action. |
| AQ_FLEET_SHADOW=1 | Shadow / dual-run (requires AQ_FLEET=1 and AQ_FLEET_ROUTE=0): run the offline path as authoritative and query the coordinator in parallel, recording divergence without acting on it. |
Precedence. Shadow is only meaningful when ROUTE=0. If both AQ_FLEET_ROUTE=1
and AQ_FLEET_SHADOW=1 are set, ROUTE wins and shadow is disabled (a one-shot
warning is logged) — you never route and shadow at the same time.
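The precedence rules can be sketched as a resolver over the three variables. The function name fleet_mode and the mode labels it prints are hypothetical; the defaults (AQ_FLEET=0, AQ_FLEET_ROUTE=1, AQ_FLEET_SHADOW=0) come from the config table.

```shell
# Hypothetical sketch: resolve the three flags to one effective mode.
fleet_mode() {
  local fleet="${AQ_FLEET:-0}" route="${AQ_FLEET_ROUTE:-1}" shadow="${AQ_FLEET_SHADOW:-0}"
  if [ "$fleet" != 1 ]; then
    echo offline                  # zero coordinator calls, shadow included
  elif [ "$route" = 1 ]; then
    if [ "$shadow" = 1 ]; then
      echo "warning: AQ_FLEET_SHADOW ignored while AQ_FLEET_ROUTE=1" >&2
    fi
    echo route                    # coordinator authoritative
  elif [ "$shadow" = 1 ]; then
    echo shadow                   # local authoritative, coordinator observed
  else
    echo local                    # local authoritative, coordinator not sourced
  fi
}
```

The runner itself logs the warning once; this sketch prints it on every call.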
Side-effect-free by construction. Shadow never ships, quarantines, or
mutates real job state. fleet_shadow_claim asks the coordinator what it would
assign using an isolated -shadow factoryId + "dryRun":true,"shadow":true;
if a coordinator without dry-run support actually assigned, the lease is released
immediately so no real assignment persists. The would-be job is never
materialized, run, or shipped locally. fleet_shadow_report mirrors the local
stage as a shadow event ("shadow":true) purely to exercise reporting — the
coordinator response is logged but never acted on (no fence/quarantine).
Each iteration fleet_shadow_compare classifies the local (authoritative) decision
vs the coordinator's would-be decision as AGREE / DIVERGE / COORD_EMPTY /
LOCAL_EMPTY and appends a line to the shadow log. Summarize it any time:
agent-queue.sh fleet-shadow-report # per-verdict counts + agreement rate + recent divergences
agent-queue.sh fleet-shadow-report 25 # last 25 divergence/error events
agent-queue.sh status # surfaces the three flags' resolved state
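The four verdicts can be sketched as a comparison of two job ids. Function names are hypothetical and this is not fleet_shadow_compare itself; treating "both sides empty" as AGREE is an assumption, and the log line follows the tab-separated ts, localJob, coordJob, verdict layout from the config table.

```shell
# Hypothetical sketch: classify local vs coordinator decisions.
shadow_verdict() {
  local local_job="$1" coord_job="$2"
  if [ -z "$local_job" ] && [ -z "$coord_job" ]; then
    echo AGREE                    # assumption: nothing to do on either side
  elif [ -z "$coord_job" ]; then
    echo COORD_EMPTY
  elif [ -z "$local_job" ]; then
    echo LOCAL_EMPTY
  elif [ "$local_job" = "$coord_job" ]; then
    echo AGREE
  else
    echo DIVERGE
  fi
}

# shadow_log_line TS LOCAL_JOB COORD_JOB -> one tab-separated log record
shadow_log_line() {
  printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$(shadow_verdict "$2" "$3")"
}
```

An agreement rate is then the share of AGREE records among all records in the log.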
Cutover ladder (rollback at any step):
- Observe (zero risk): AQ_FLEET=1 AQ_FLEET_ROUTE=0 AQ_FLEET_SHADOW=1 run. The local path stays authoritative; the coordinator is only shadowed.
- Inspect agreement: fleet-shadow-report. Drive AGREEMENT toward 100%, investigating each DIVERGE.
- Cut over: once agreement is high, flip AQ_FLEET_ROUTE=1 (coordinator becomes authoritative).
- Rollback: set AQ_FLEET_ROUTE=0 (and/or AQ_FLEET=0) at any time for an instant return to the local/offline path, with no data migration.
Config (env overrides)
| Var | Default | Meaning |
|---|---|---|
| AGENT_QUEUE_ROOT | ./queue | where the kanban folders live |
| AGENT_QUEUE_MAX | 3 | max concurrent agents (override per-run with run --max N) |
| AGENT_QUEUE_ENGINE | devin | default engine when none in frontmatter |
| AGENT_QUEUE_POLL | 3 | inbox poll interval (seconds) |
| AGENT_QUEUE_VERIFY | (empty) | default auto-QA verify command; per-job verify: overrides it |
| AGENT_QUEUE_STALL_MIN | 10 | minutes of unchanged log before a worker is ⚠ stalled |
| DEVIN_BIN / CLAUDE_BIN / CODEX_BIN / COPILOT_BIN | autodetected | override CLI binary paths |
| FLOCK_BIN / TIMEOUT_BIN | autodetected | flock (lock hardening) and timeout/gtimeout (hard timeouts); absent on stock macOS (see notes) |
⚠️ Safety
Running agents with yolo: true means no approval prompts — they will edit files,
run shell commands, and commit unattended. Mitigate:
- Prefer scope-locked prompt files (e.g. "edit only under dashboards/tracker-web/").
- Tell prompts not to git push; review commits before they leave your machine.
- Same-repo safety is automatic: jobs sharing a cwd (or lock: key) are serialized, so two agents never run in one repo at once, even at --max 2+.
- Set a timeout: on long jobs so a wedged agent can't run forever.
- Watch cost: each job is a full agent session.
Portability notes
- macOS has no flock/timeout; locking relies on the single run-loop (enforced by the second-run refusal) and timeouts use a pure-bash watchdog. Install coreutils (gtimeout) for hard process-tree kills.
- Linux (incl. Gitea CI) uses flock + timeout for cross-process hardening.
Roadmap / nice-to-haves
- Per-repo lock to serialize same-repo jobs automatically (lock: / cwd).
- Per-job timeout: with hard kill (or bash watchdog fallback).
- Stall detection in status/dash.
- requeue failed jobs + clean/archive old runs.
- Build/ship lifecycle: review → testing → shipped with auto-QA verify: gate + manual ship.
- --push opt-in policy + commit review gate.
- Optional notifications (Slack/desktop) on done/failed/stall.
- Persisted run-loop as a daemon/service with auto-restart.