saravanakumardb1 d0e800247c feat(agent-queue): PR mode clones from local repo base (AQ_FLEET_REPO_BASE)

MVP: when AQ_FLEET_REPO_BASE/<repo> is an existing local checkout, use it as the
clone source (fast, no network) and push/PR to its GitHub origin — embedded creds
in the local origin URL are stripped (gh credential helper handles auth). Selftest
PASS (full-path bare-repo fallback unchanged).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

2026-05-31 05:36:46 -07:00


agent-queue

A zero-dependency folder "kanban" runner for headless coding-agent CLIs: Devin, Claude Code, and OpenAI Codex. Drop prompt .md files into a folder and they are executed (in auto-approve mode) as worker slots free up, up to --max at once, moving through inbox → building → review → testing → shipped (plus failed) with live status.

Vision & roadmap: where this is headed — a distributed multi-machine "gigafactory" (fleet of factories × tools × profiles, scheduler-routed, built on platform-service + tracker-web) — is specified as a checklist-driven implementation roadmap in docs/GIGAFACTORY/GIGAFACTORY_ROADMAP.md. A full architecture overview, diagrams, code map and onboarding live alongside it in docs/GIGAFACTORY/.

Run safety: how the daemon and its agents must operate (isolated worktrees, branch-per-task, least-privilege instead of blanket --yolo/dangerous on live checkouts) is defined in docs/RUN_POLICY.md. Read it before enabling yolo: true.

Build/ship lifecycle — auto-QA, manual ship:

inbox ─▶ building ─▶ review ─▶ testing ─▶ shipped
  (queued)  (agent     (rc=0;    (verify    (you ran
            running)   awaiting  passed —    `ship`)
                       verify)   QA gate)
                          │
        agent rc≠0 /      │ verify fails
        timeout ──────────┴──────────────▶ failed

Auto: agent exits 0 → review/. If a verify: command is configured it runs automatically: pass → testing/ (QA), fail → failed/. No verify: → the job parks in review/ for a manual promote.
Manual: you ship a testing/ job → shipped/ (the human gate). Shipping is never automatic.
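
The routing rules above can be summarized as one small decision function. This is a minimal sketch rather than the runner's actual code; `next_stage` and its arguments are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: decide which folder a job lands in once the agent exits.
#   $1 = agent exit code (a timeout is assumed to surface as non-zero)
#   $2 = verify command ("" when no verify: is configured)
#   $3 = verify exit code (ignored when $2 is empty)
next_stage() {
  local agent_rc="$1" verify_cmd="$2" verify_rc="${3:-0}"
  if [ "$agent_rc" -ne 0 ]; then
    echo failed      # agent rc != 0 or timeout
  elif [ -z "$verify_cmd" ]; then
    echo review      # no verify: park for a manual promote
  elif [ "$verify_rc" -eq 0 ]; then
    echo testing     # verify passed: QA, awaiting manual ship
  else
    echo failed      # verify failed
  fi
}
```

Note that the function never returns `shipped`: that stage is only reached through the manual `ship` command.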

Why this exists: the agent CLIs ship a minimal local interface (no built-in batch/queue/dashboard — that lives in their cloud products). This is the zero-dependency bash glue that turns "run one prompt interactively" into "queue many and walk away."

Quick start

cd learning_ai_devops_tools/agent-queue
chmod +x agent-queue.sh
./agent-queue.sh init

# queue a roadmap for Devin, running in the tracker-web repo, auto-approving everything
./agent-queue.sh add ~/roadmaps/UX-2.md \
  --engine devin \
  --cwd /Users/sd9235/code/mygh/learning_ai_common_plat/dashboards/tracker-web \
  --yolo

# start processing (foreground; Ctrl-C to stop). Run up to 3 agents at once (default).
./agent-queue.sh run --max 3

In a second terminal, watch progress:

./agent-queue.sh watch

  AGENT QUEUE  /…/agent-queue/queue
  inbox 3   building 2   review 1   testing 2   shipped 5   failed 0   running 2/2

  RUNNING
    20260528-2130__UX-2        devin     4m12s  pid 51234  ⏺ Edited src/app/dashboard/items/page.tsx
    20260528-2131__UX-3        claude    1m02s  pid 51290  Running: pnpm typecheck

How a task is configured

Each .md carries optional frontmatter telling the runner which engine to use, which directory to run in, and whether to auto-approve:

---
engine: devin          # devin | claude | codex | copilot  (default: $AGENT_QUEUE_ENGINE)
cwd: /abs/path/to/repo # where the agent executes   (default: cwd when added)
yolo: true             # auto-approve ALL tools      (default: true)
lock: my-repo          # optional mutex key (default: cwd). Jobs sharing a key run serially
timeout: 45m           # optional. 90s|45m|2h|1d. On expiry → failed (result=timeout)
verify: pnpm -s test   # optional auto-QA gate. Runs in cwd after rc=0:
                       #   pass → testing/ (QA),  fail → failed/
                       #   (omit to park in review/ for manual promote)
---

# Your task / roadmap goes here
...

add --engine/--cwd/--yolo will inject this frontmatter for you if the file doesn't already have a --- block.
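
To make the frontmatter handling concrete, here is a rough sketch of reading one field and stripping the block before the body is handed to an agent. `fm_get` and `fm_body` are hypothetical helpers, not functions from agent-queue.sh, and they assume simple `key: value` lines with optional trailing `# comments`.

```shell
# fm_get <file> <key>: print the value of a top-level frontmatter key.
# Assumption: values do not themselves contain " #".
fm_get() {
  awk -v key="$2" '
    NR == 1 && $0 != "---" { exit }        # file has no frontmatter block
    NR > 1 && $0 == "---"  { exit }        # reached the end of the block
    NR > 1 && index($0, key ":") == 1 {
      val = substr($0, length(key) + 2)
      sub(/[ \t]+#.*$/, "", val)           # drop an inline comment
      sub(/^[ \t]+/, "", val)
      sub(/[ \t]+$/, "", val)
      print val
      exit
    }
  ' "$1"
}

# fm_body <file>: print the file with the leading frontmatter block removed.
fm_body() {
  awk '
    NR == 1 && $0 == "---" { in_fm = 1; next }
    in_fm && $0 == "---"   { in_fm = 0; next }
    !in_fm                 { print }
  ' "$1"
}
```

For example, `fm_get task.md engine` would print `devin` for the block shown above, and `fm_body task.md` would print only the task text.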

Manifest fields (Gigafactory Phase 1)

The runner parses the richer gigafactory manifest backward-compatibly — a legacy engine/cwd/yolo-only .md behaves exactly as before. Fields marked RESERVED are parsed, stored in .state/<job>.meta, and shown in status, but are otherwise no-ops until a later phase (they do not yet affect execution).

Field	Status	Default	Meaning
`engine`	active	`$AGENT_QUEUE_ENGINE`	explicit engine (`devin\|claude\|codex\|copilot`) — always wins over `engine-class`
`cwd` / `yolo` / `lock` / `timeout` / `verify`	active	see above	Phase-0 behavior, unchanged
`priority`	active	`medium`	`critical\|high\|medium\|low`. Inbox is picked highest-priority first, then oldest (was pure FIFO)
`engine-class`	active	(none)	used only when `engine` is unset: `agentic-coder`→`devin,claude,codex`; `chat-coder`→`copilot`. Picks the first available engine. No engine available → job fails `result=no_engine`
`prefers-engine`	active	(none)	optional order hint for `engine-class` resolution, e.g. `[claude, devin]`
`capabilities`	active	(none)	hard host requirements, e.g. `[os:any, node>=20, has:git]`. If the host can't satisfy them the job is sent to `failed/` with `result=capability_mismatch` and the agent is never launched (grammar below)
`idempotency-key`	active	(none)	dedupe on `add` (semantics below)
`profile`	active	(none)	inherit persona + verify/caps/engine-class/prefers-engine/allowed-scope/review-policy from `profiles/<name>.md` (job fields override — see Profiles)
`prefers`	RESERVED	(none)	soft routing/affinity hints (e.g. `[factory:mac-2]`)
`budget`	RESERVED	(none)	`{ usd, tokens, wall }` ceilings (`wall` enforcement is a later slice)
`deps` / `deps-mode`	active	(none)	block until each referenced `idempotency-key` is in `shipped/` (or `testing/` when `deps-mode: soft`). Submit-time cycle detection (see Profiles & deps)
`retry`	active	(none)	`{ max: N, backoff: 5m, on: [timeout, verify_failed, crash] }` — requeue failures with backoff up to `max`, then `retries_exhausted` (see Resilience)
`review-policy`	RESERVED	(none)	`auto\|manual\|reviewers:[…]`
`artifacts`	RESERVED	(none)	extra outputs to capture (coverage, screenshots)
`tracker-item`	RESERVED	(none)	link back to the originating tracker task

Capability grammar (a job matches a host iff every required token is satisfied):

Token form	Example	Satisfied when
`key` (bare presence)	`gpu`	the host advertises `key` in any form
`key:value` (exact)	`os:mac`, `engine:devin`, `has:git`	the host advertises that exact token
`key:any` (wildcard)	`os:any`	the host advertises any `key:*` (so `os:any` matches every host)
`key<op>version` (`>=` `>` `=` `<=` `<`)	`node>=20`	numeric/semver-major compare vs the host's `key:<n>`

The host advertises (via detect_capabilities): os:<mac|linux>, engine:<each available engine>, node:<major>, and has:<git|pnpm|docker> when present.
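
The grammar can be sketched as a single matcher. `cap_ok` below is a hypothetical function written for illustration (it is not the runner's matcher); it compares only the major version, which is an assumption based on the "semver-major" wording above.

```shell
# cap_ok <required-token> <host-token>...
# Returns 0 when the host tokens satisfy the required token.
cap_ok() {
  local req="$1" key op want have h
  shift
  case $req in
    *'>='*) op=ge; key=${req%%'>='*}; want=${req#*'>='} ;;
    *'<='*) op=le; key=${req%%'<='*}; want=${req#*'<='} ;;
    *'>'*)  op=gt; key=${req%%'>'*};  want=${req#*'>'} ;;
    *'<'*)  op=lt; key=${req%%'<'*};  want=${req#*'<'} ;;
    *'='*)  op=eq; key=${req%%'='*};  want=${req#*'='} ;;
    *)      op= ;;
  esac
  for h in "$@"; do
    if [ -n "$op" ]; then
      # version compare against the host's key:<n> (major part only)
      case $h in "$key":*) have=${h#*:}; have=${have%%.*} ;; *) continue ;; esac
      case $have in ''|*[!0-9]*) continue ;; esac
      [ "$have" "-$op" "${want%%.*}" ] && return 0
    else
      case $req in
        *:any) case $h in "${req%any}"*) return 0 ;; esac ;;        # wildcard
        *:*)   [ "$h" = "$req" ] && return 0 ;;                     # exact
        *)     case $h in "$req"|"$req":*) return 0 ;; esac ;;      # bare presence
      esac
    fi
  done
  return 1
}
```

A job matches a host when `cap_ok` succeeds for every token in its `capabilities` list, for example `cap_ok 'node>=20' os:mac node:22 has:git`.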

idempotency-key semantics (on add, hashing the frontmatter-stripped body):

same key + same body → no-op (logged duplicate, skipped).
same key + different body, prior job still in inbox/ → supersedes it (replaces the queued file).
same key + different body, prior job already past inbox/ (building/review/testing/shipped) → rejected with a clear error (use a new key, or requeue the existing job).
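
A sketch of the three outcomes follows. Both helpers are hypothetical: `body_hash` uses `cksum` only because it is available everywhere (the hash the runner actually uses is not stated here), and `dedupe_action` returns `unknown` for any prior stage the rules above do not cover.

```shell
# body_hash <file>: checksum of the body with the frontmatter block stripped.
body_hash() {
  awk 'NR == 1 && $0 == "---" { fm = 1; next }
       fm && $0 == "---"      { fm = 0; next }
       !fm' "$1" | cksum | cut -d' ' -f1
}

# dedupe_action <same_body yes|no> <prior_stage>
# Prints what `add` would do for a job whose key already exists.
dedupe_action() {
  local same_body="$1" prior_stage="$2"
  if [ "$same_body" = yes ]; then
    echo skip                      # duplicate: logged and ignored
    return 0
  fi
  case $prior_stage in
    inbox)                            echo supersede ;;  # replace the queued file
    building|review|testing|shipped)  echo reject ;;     # already past inbox/
    *)                                echo unknown ;;    # not covered by the rules above
  esac
}
```

Because the hash ignores frontmatter, changing only `priority:` or `engine:` on a resubmission still counts as the same body.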

Engine mapping

`engine:`	Command run	Auto-approve flag (`yolo: true`)
`devin`	`devin -p --prompt-file <body>`	`--permission-mode dangerous`
`claude`	`claude -p` (body on stdin)	`--dangerously-skip-permissions`
`codex`	`codex exec` (body on stdin)	`--dangerously-bypass-approvals-and-sandbox`
`copilot`	`copilot -p` (body on stdin)	`--allow-all-tools` (best-effort; chat-coder class target)

The frontmatter is stripped before the body reaches the agent, and claude/codex/copilot receive the body on stdin so a body starting with -- is never misparsed as a flag.

Flags drift between CLI versions — if one changes, edit build_agent_cmd() in agent-queue.sh (it's the single place each engine is mapped).
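
As an illustration of keeping the mapping in one place, the table above could be expressed as a single `case` statement. `engine_cmd` is a hypothetical stand-in, not the real build_agent_cmd(); the flags are copied from the table, and appending the auto-approve flag at the end of the command is an assumption.

```shell
# engine_cmd <engine> <yolo true|false> <body-file>
# Prints the command line for an engine (devin reads the body from a file,
# the other engines are expected to receive it on stdin).
engine_cmd() {
  local engine="$1" yolo="$2" body_file="$3" cmd flag
  case $engine in
    devin)   cmd="devin -p --prompt-file $body_file"; flag="--permission-mode dangerous" ;;
    claude)  cmd="claude -p";  flag="--dangerously-skip-permissions" ;;
    codex)   cmd="codex exec"; flag="--dangerously-bypass-approvals-and-sandbox" ;;
    copilot) cmd="copilot -p"; flag="--allow-all-tools" ;;
    *)       echo "unknown engine: $engine" >&2; return 1 ;;
  esac
  if [ "$yolo" = true ]; then
    cmd="$cmd $flag"
  fi
  echo "$cmd"
}
```

When a CLI renames a flag, only the matching `case` arm needs to change.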

Commands

Command	What it does
`init`	create the `queue/` folders
`add <file> [--engine E] [--cwd P] [--yolo\|--no-yolo]`	queue a prompt into `inbox/`
`run [--max N] [--engine E] [--once]`	process the inbox (foreground loop)
`status`	kanban counts + running-worker table (marks `⚠ stalled`; per-job insights sub-line)
`watch [interval]`	live `status` (bash), redrawn every N seconds (default 2)
`insights [job]`	per-job metrics, or a recent-jobs table + per-engine token/cost/success rollup (see Insights)
`recover`	reclaim orphaned `building/` jobs (dead worker) back to `inbox/` (see Resilience)
`dash [--interval N]`	interactive Node dashboard — navigable numbered job list with single-key actions (see below)
`stop`	kill running workers + the run loop
`logs <job> [-f]`	print / follow a job's log
`promote <job>`	advance one stage forward: `review → testing → shipped`
`ship <job>`	manual gate: move a `testing/` (QA) job → `shipped/`
`reject <job>`	send a `review/` or `testing/` job → `failed/`
`requeue <job>`	move a `failed`/`review`/`testing` job back to `inbox/` for a fresh run
`clean [--keep N]`	archive finished logs+meta beyond the newest N (default 50) into `queue/.archive/`

Only one run loop may be active per queue — a second run against the same queue is refused while the first is alive (a stale daemon.pid is cleared).
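
The single-loop guard can be pictured as a pid-file check. The sketch below is illustrative only: `acquire_daemon_pid` is a hypothetical name, and the check-then-write sequence is not atomic, so it is not a substitute for `flock` where that is available.

```shell
# acquire_daemon_pid <pidfile>
# Refuses (returns 1) when a live process owns the pid file; otherwise clears a
# stale file and records the current shell's pid.
acquire_daemon_pid() {
  local pidfile="$1" old
  if [ -f "$pidfile" ]; then
    old=$(cat "$pidfile")
    if [ -n "$old" ] && kill -0 "$old" 2>/dev/null; then
      echo "run loop already active (pid $old)" >&2
      return 1
    fi
    rm -f "$pidfile"    # stale: the recorded process is gone
  fi
  echo $$ > "$pidfile"
}
```

`kill -0` sends no signal; it only reports whether the pid exists and can be signalled, which is what makes it usable as a liveness probe.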

Interactive dashboard (`dash`)

dash is a single-script, menu-driven control panel (think a tiny "glassbox"). It shows the kanban counts, live RUNNING workers (engine, elapsed, last log line, stall), a navigable numbered JOBS list, and RECENT finished jobs — and lets you act on jobs without leaving the screen. Every action shells out to agent-queue.sh, so the script stays the single source of truth.

Key	Action
`↑`/`↓`, `j`/`k`, `1`–`9`	select a job in the JOBS list
`enter` / `l`	view the selected job's log (live, auto-refreshing)
`p`	promote (`review → testing → shipped`)
`s`	ship (`testing`/QA → `shipped`, the manual gate)
`x`	reject (`review`/`testing` → `failed`) — asks `y/n`
`u`	requeue (`failed`/`review`/`testing` → `inbox`) — asks `y/n`
`r`	start the `run` loop (detached → `logs/run-loop.log`)
`S`	stop the run loop + running workers
`g`	refresh now · `?`/`h` help · `q`/`Ctrl-C` quit

The header shows a ● run loop pid N / ○ run loop stopped indicator. Run it in a TTY for the interactive mode; when piped or otherwise non-TTY, it falls back to a read-only live view.

Via `bytelyst-cli.sh`

Wired into the repo's unified CLI (no GitHub token required for this subcommand):

./bytelyst-cli.sh agent-queue run --max 3     # full passthrough
./bytelyst-cli.sh aq status                   # short alias

Boot-persistence (auto-start on login)

To run the worker non-stop and survive reboot / crash / logout (not just a closed terminal), install the macOS LaunchAgent — it auto-starts agent-queue run on login under caffeinate and restarts it via KeepAlive:

bash launchd/install.sh             # install + start now
bash launchd/install.sh --uninstall # stop + remove

Override engine/concurrency/secrets in ~/.agent-queue.env (e.g. AGENT_QUEUE_ENGINE=codex, AGENT_QUEUE_MAX=1). See launchd/README.md for the full layer comparison (tmux/caffeinate vs LaunchAgent) and gotchas.

Folder layout

queue/
  inbox/    # drop / queued .md files (highest priority first, then oldest eligible)
  building/ # currently executing (agent running)
  review/   # agent exited 0 — awaiting the auto-QA verify gate (or manual promote)
  testing/  # verify passed (QA) — awaiting manual `ship`
  shipped/  # manually shipped — the terminal success stage
  failed/   # non-zero exit, bad cwd, timeout, verify failure, or manual reject
  logs/     # <job>.log — full agent + verify output
  locks/    # per-key flock files (Linux hardening; unused on macOS)
  .state/   # <job>.meta heartbeats + daemon.pid (runtime only)
  .archive/ # <ts>/ — logs+meta moved here by `clean`

result= values written to <job>.meta: review, testing, shipped, failed, timeout, verify_failed, rejected, requeued, capability_mismatch (host missing a required capability — agent never launched), no_engine (an engine-class had no available engine), retries_exhausted (failed after retry.max attempts — single-host dead-letter stand-in), retry_scheduled (transient: requeued for another attempt), recovered (transient: an orphan was reclaimed to inbox/).

Profiles & deps

Profiles (roadmap §6)

A profile is a reusable role preset in profiles/<name>.md. A job opts in with profile: <name> and inherits any of these fields it does not set itself: verify (from the profile's default-verify), capabilities, engine-class, prefers-engine, allowed-scope, review-policy. The profile's persona block is prepended to the body sent to the engine (the job .md on disk is unchanged; secrets are never logged). Resolution runs before the capability gate and engine resolution, so inherited caps / engine-class take effect.

Precedence: job field > profile field > built-in default. Set AGENT_QUEUE_PROFILES to point at a different catalog directory (defaults to ./profiles).

Starter catalog: developer, backend-engineer, frontend-engineer, ux-designer, ui-designer, qa, reviewer, docs-writer, and a reserved planner. Each presets name, persona, capabilities, default-verify, engine-class, prefers-engine, allowed-scope, and review-policy.

allowed-scope (warn-only this phase). After a run on a git cwd, changed paths outside the profile/job allowed-scope globs (dir/** matches the whole subtree) are logged as a WARNING and recorded as scope_warning= in the meta — non-blocking (the job is not failed). path_in_scope is exposed as a unit-testable function.
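
A scope check of this kind can be sketched with plain shell pattern matching. `in_scope` is a hypothetical helper, not the runner's path_in_scope; in shell `case` patterns `*` also matches `/`, so the explicit `dir/**` arm is there mainly for clarity.

```shell
# in_scope <path> <glob>...
# Returns 0 when <path> matches at least one allowed-scope glob.
in_scope() {
  local path="$1" glob
  shift
  for glob in "$@"; do
    case $glob in
      *'/**')
        # dir/** covers the whole subtree under dir/
        case $path in "${glob%'**'}"*) return 0 ;; esac ;;
      *)
        case $path in $glob) return 0 ;; esac ;;
    esac
  done
  return 1
}
```

A warn-only pass would run this over each changed path (for example from `git diff --name-only`) and log the ones for which it returns non-zero.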

deps / DAG, single host (roadmap §5)

deps: [keyA, keyB] references other jobs by their author-controlled idempotency-key. A dep is satisfied when a job with that key is in shipped/ (default), or in shipped/ or testing/ when the dependent job sets deps-mode: soft. A job with unmet deps is blocked: it is skipped in inbox selection (never launched, never failed) and surfaced in status as blocked (waiting on: <keys>), then re-evaluated every loop until its deps are met. add performs submit-time cycle detection over the inbox + active-stage dep graph and rejects (nonzero exit) a job that would create a cycle. Cross-machine deps are P2.
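
The satisfaction rule can be sketched as a folder lookup. `dep_met` is a hypothetical helper; it assumes each job file keeps an `idempotency-key: <key>` line in the stage folder it currently sits in, and for brevity it matches that line anywhere in the file rather than only inside the frontmatter.

```shell
# dep_met <queue-root> <key> [hard|soft]
# Returns 0 when a job carrying <key> is in shipped/ (or testing/ in soft mode).
dep_met() {
  local root="$1" key="$2" mode="${3:-hard}" stages stage
  stages="shipped"
  if [ "$mode" = soft ]; then
    stages="shipped testing"
  fi
  for stage in $stages; do
    # -x whole-line match, -F literal key, -s quiet about missing files
    grep -qsxF -- "idempotency-key: $key" "$root/$stage"/*.md && return 0
  done
  return 1
}
```

A job would be treated as blocked while `dep_met` fails for any key in its `deps` list, and re-checked on the next loop iteration.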

Resilience (crash recovery & work preservation)

Single-host implementations of the durability model (roadmap §25):

Orphan recovery. A job left in building/ whose worker process is dead (no live pid, PID-reuse-guarded by pidstart) is an orphan from a previous crash/power-off. On run startup and on every loop iteration (or on demand via agent-queue.sh recover) it is moved back to inbox/ with attempts incremented. Recovery is idempotent — once moved out of building/ it is never recovered twice.
WIP checkpointing. When a job's cwd is a git repo, the worker creates/checks out a dedicated branch aq/wip/<job> at start and commits any changes to it on every exit path — success, failure, timeout, and SIGTERM/SIGINT (via a trap). It never commits to main/your current branch. Non-git cwd is skipped cleanly. wip_branch / wip_base / wip_commit are recorded in the meta.
Resume. When an orphan/retry of a job whose aq/wip/<job> branch already exists is relaunched, that branch is checked out first so the agent continues from the checkpoint instead of from zero.
Retry policy (retry frontmatter, now active). On a failure whose class is in on (crash/agent_error for a non-zero agent exit, timeout, verify_failed) the job is requeued to inbox/ honoring backoff (selection skips it until next_eligible) up to max attempts; on exhaustion it lands in failed/ with result=retries_exhausted, preserving the WIP branch + full log. A job without a retry field is never retried (Phase-0 behavior).

All bookkeeping (attempts, next_eligible, wip_*) is append-only in the meta and re-derivable from the meta + folder location, so recovery is crash-safe.
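
The retry bookkeeping can be sketched as two small functions. Both names are hypothetical, and whether `attempts` counts the run that just failed is an assumption made here for the example.

```shell
# to_seconds <duration>: convert 90s | 45m | 2h | 1d to seconds.
to_seconds() {
  local n="${1%[smhd]}" mult
  case $n in ''|*[!0-9]*) return 1 ;; esac
  case $1 in
    *s) mult=1 ;;
    *m) mult=60 ;;
    *h) mult=3600 ;;
    *d) mult=86400 ;;
    *)  return 1 ;;
  esac
  echo $((n * mult))
}

# retry_decision <class> <attempts> <max> <backoff> <now-epoch> <on-class>...
# <attempts> is assumed to include the attempt that just failed.
retry_decision() {
  local class="$1" attempts="$2" max="$3" backoff="$4" now="$5" c delay
  shift 5
  for c in "$@"; do
    [ "$c" = "$class" ] || continue
    if [ "$attempts" -ge "$max" ]; then
      echo retries_exhausted
      return 0
    fi
    delay=$(to_seconds "$backoff") || return 1
    echo "retry_scheduled next_eligible=$((now + delay))"
    return 0
  done
  echo failed    # failure class is not listed in `on`
}
```

With `retry: { max: 3, backoff: 5m, on: [timeout, verify_failed] }`, a first timeout at epoch 1000 would yield `retry_scheduled next_eligible=1300` under these assumptions.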

Insights (metrics & token accounting)

Each finished run records into <job>.meta: duration_s, exit, result, attempts, and — for a git cwd — files_changed / lines_added / lines_deleted (diffed wip_base..HEAD). A single parse_usage <engine> <log> adapter extracts model / tokens_in / tokens_out / tokens_cached / cost_usd / turns / tool_calls when the engine exposes them.

agent-queue.sh insights <job>   # full metrics for one job
agent-queue.sh insights         # recent-jobs table + per-engine rollup

Token caveat (honest): real usage is captured only where the engine surfaces it. A cooperating wrapper may emit a machine-readable AQ_USAGE key=value … line; otherwise per-engine heuristics apply (Claude/Codex token fields parsed; Devin session metrics + Copilot are API-only and currently TODO in parse_usage). When a value is not provider-reported it is omitted or flagged usage_estimated — numbers are never fabricated. The per-engine rollup marks totals that include any estimated value with *.
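
For the cooperating-wrapper case, extracting a value from an `AQ_USAGE` line might look like the sketch below. `usage_field` is a hypothetical helper, not the runner's parse_usage; it assumes the line starts with `AQ_USAGE` and carries space-separated `key=value` pairs whose values contain no spaces.

```shell
# usage_field <log> <key>
# Prints <key>'s value from the last AQ_USAGE line in the log; prints nothing
# when the key is absent, so a missing number is never invented.
usage_field() {
  awk -v key="$2" '
    $1 == "AQ_USAGE" {
      val = ""
      for (i = 2; i <= NF; i++)
        if (index($i, key "=") == 1) val = substr($i, length(key) + 2)
      last = val
      seen = 1
    }
    END { if (seen && last != "") print last }
  ' "$1"
}
```

An empty result maps naturally onto "omitted" in the insights output.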

Tracker integration (§10)

Closes the task ↔ job round-trip against the platform-service items API: a tracker Item can become a job, and a job's outcome echoes back to the Item.

agent-queue.sh from-tracker <ITEM_ID>   # pull an Item -> materialize a job in inbox/
agent-queue.sh to-tracker <job>         # echo the job's current outcome to its Item

All HTTP goes through one curl wrapper (tracker_api); there are no other network calls. Real use needs platform-service running and a bearer token.

Config (env)

Var	Default	Meaning
`AQ_TRACKER_API`	`http://localhost:4003`	base URL of the items API (routes live under `/api`)
`AQ_TRACKER_TOKEN`	(none)	bearer token — required for real calls; never hardcode
`AQ_PRODUCT_ID`	(none)	productId (sent as `X-Product-Id`; every Item has one)
`AQ_TRACKER_CWD`	`$PWD`	cwd a tracker-derived job runs in (Items carry no cwd)
`AQ_TRACKER_AUTO`	`0`	`1` = auto-echo on each transition (default OFF — echo is manual)
`AQ_TRACKER_STATUS_INPROGRESS` / `_DONE` / `_FAILED`	`in_progress` / `done` / `wont_fix`	Item status per bucket (the API has no blocked/failed status)
`AQ_TRACKER_API_CMD`	(none)	test seam: a stub that replaces the curl HTTP entirely (selftest uses it)

`from-tracker` — Item → job

GET /api/items/<id>, then maps fields to job frontmatter:

Item	Job
`title` + `description`	job body (verbatim instruction markdown)
`id`	`tracker-item: <id>` and `idempotency-key: tracker-<id>` (stable)
`priority`	`priority:` (label overrides; else Item priority; else `medium`)
label `engine-class:<x>`	`engine-class: <x>`
label `profile:<x>`	`profile: <x>`
label `priority:<x>`	`priority: <x>`
label `cap:<token>`	a `capabilities: [...]` entry

Idempotent on the derived idempotency-key (Slice 1 dedupe) — pulling the same Item twice never enqueues a duplicate.

`to-tracker` — job → Item (one-way echo, §24.5)

Only if the job's meta has a tracker-item. Maps the job's stage/result to an Item status and PATCH /api/items/<id>/status, then POST /api/items/<id>/comments with a metrics-only summary (result, attempts, duration, tokens/cost, +/- lines — never prompt content or secrets):

job result/stage	Item status
building / review / testing / recovered	`in_progress`
shipped	`done`
failed / timeout / verify_failed / retries_exhausted / capability_mismatch / no_engine / rejected	`wont_fix` (override via `AQ_TRACKER_STATUS_FAILED`)

Idempotent via tracker_echoed in the meta (re-echoing an unchanged outcome is a no-op). The echo is one-way (child → tracker) and never authoritative for execution: an echo failure is logged and the job continues unchanged. With AQ_TRACKER_AUTO=1 the worker echoes automatically on each transition; otherwise echo is manual. status / insights surface the tracker-item and last echoed status.
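
The status mapping in the table above, including the env overrides, can be sketched as follows. `item_status` is a hypothetical name; the defaults and override variables are the ones listed in the config table.

```shell
# item_status <job result or stage>
# Prints the Item status to send, honoring the AQ_TRACKER_STATUS_* overrides.
item_status() {
  case $1 in
    building|review|testing|recovered)
      echo "${AQ_TRACKER_STATUS_INPROGRESS:-in_progress}" ;;
    shipped)
      echo "${AQ_TRACKER_STATUS_DONE:-done}" ;;
    failed|timeout|verify_failed|retries_exhausted|capability_mismatch|no_engine|rejected)
      echo "${AQ_TRACKER_STATUS_FAILED:-wont_fix}" ;;
    *)
      return 1 ;;   # unmapped value: echo nothing rather than guess
  esac
}
```

Setting `AQ_TRACKER_STATUS_FAILED` to another status changes what every failure bucket echoes, without touching the mapping itself.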

Fleet integration (Phase 2)

Behind the AQ_FLEET flag, the runner becomes a factory that registers, heartbeats, claims, and reports against the platform-service fleet coordinator — so coordinator jobs run alongside local .md files on the same host. All coordinator logic lives in lib/fleet-client.sh (curl-only + POSIX awk, sourced by agent-queue.sh); the few hook points in the runner are all gated on fleet_enabled.

Offline vs fleet mode. With AQ_FLEET unset/0 (the default) the runner is the pure offline git-queue described above: zero coordinator calls, behavior byte-for-byte unchanged. With AQ_FLEET=1 the run loop also registers with and claims from the coordinator, reports fenced stage transitions, renews leases, and routes the outcome echo through the coordinator's fleet_events instead of the direct tracker echo. The direct tracker echo remains the offline path.

AQ_FLEET=1 AQ_FLEET_TOKEN=… AQ_PRODUCT_ID=… agent-queue.sh fleet-status   # register + show identity
AQ_FLEET=1 AQ_FLEET_TOKEN=… AQ_PRODUCT_ID=… agent-queue.sh run            # claim + execute coordinator jobs

Config (env)

Var	Default	Meaning
`AQ_FLEET`	`0`	master switch — `1` enables coordinator integration; `0`/unset = offline git-queue (zero coordinator calls)
`AQ_FLEET_ROUTE`	`1`	`route_via_service`: `1` = coordinator is authoritative for claim (P2-S3 behavior); `0` = local inbox authoritative (coordinator not used to source work)
`AQ_FLEET_AUTOSHIP`	`0`	`1` = when the local verify gate passes, advance the coordinator job `testing → shipped` (the factory's verify is the test phase); `0` = report `testing` and rest for the human review gate
`AQ_FLEET_PR`	`0`	`1` = for a job carrying a `repo`, run the agent in an isolated checkout on branch `aq/job/<id>`, then commit/push and `gh pr create`; the PR URL is reported back and recorded on the run
`AQ_FLEET_REPOS_DIR`	`.state/repos`	cache dir for PR-mode repo checkouts (one per repo)
`AQ_FLEET_REPO_BASE`	(none)	base dir of existing local repos; a job `repo` matching `<base>/<repo>` is cloned from there (fast, no network) and PRs are pushed to its GitHub origin (embedded creds stripped)
`GH_BIN`	`gh`	GitHub CLI used to open PRs in PR mode
`AQ_FLEET_SHADOW`	`0`	shadow/dual-run: `1` (requires `AQ_FLEET=1` + `AQ_FLEET_ROUTE=0`) queries the coordinator in parallel and records divergence, never acting on it
`AQ_FLEET_SHADOW_FACTORY_ID`	`<factory>-shadow`	isolated id used for the read-only shadow claim (never the real factory id)
`AQ_FLEET_SHADOW_LOG`	`.state/fleet-shadow.log`	structured shadow-divergence log (`ts⇥localJob⇥coordJob⇥verdict`)
`AQ_FLEET_API`	`http://localhost:4003/api`	coordinator base URL (already includes `/api`)
`AQ_FLEET_TOKEN`	(none)	bearer token — never hardcode
`AQ_PRODUCT_ID`	(none)	productId (sent as `X-Product-Id`; shared with the tracker config)
`AQ_FACTORY_ID`	`<hostname>-<pid>`	stable factory identity for this process
`AQ_FLEET_LEASE_RENEW_SEC`	`300`	heartbeat / lease-renew cadence
`AQ_FLEET_CAPS`	(auto)	override the auto-detected capability tokens (comma/space list)
`AQ_FLEET_CWD`	`$PWD`	cwd a claimed coordinator job runs in
`AQ_FLEET_API_CMD`	(none)	test seam: a stub that replaces the curl HTTP entirely (selftest uses it)
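
For the AQ_FLEET_REPO_BASE case, stripping embedded credentials from a local checkout's origin URL could look like this. It is a sketch under stated assumptions: `strip_url_creds` is a hypothetical name, and only `scheme://user:secret@host/...` URLs are rewritten (scp-style `git@host:path` remotes carry no secret and are left alone).

```shell
# strip_url_creds <url>
# Removes a "user[:secret]@" part that follows the scheme, if present.
strip_url_creds() {
  printf '%s\n' "$1" |
    sed -E 's#^([a-zA-Z][a-zA-Z0-9+.-]*://)[^/@]*@#\1#'
}
```

The cleaned URL can then be used as the push remote while the `gh` credential helper supplies authentication.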

Protocol (claim / heartbeat / report / fence / renew)

register / heartbeat: POST /fleet/factories/heartbeat {factoryId, capabilities[], health, load} — registration is the first heartbeat; re-sent on AQ_FLEET_LEASE_RENEW_SEC cadence.
claim: POST /fleet/claim {factoryId, capabilities[], leaseSeconds}. A returned job (id, bodyMd, leaseEpoch) is materialized as a transient local .md (frontmatter fleet-job-id + fleet-lease-epoch) so the existing runner executes it unchanged, interleaved with local files.
report (fenced): each stage transition (building/review/testing/shipped/failed) is PATCH /fleet/jobs/:id {stage, leaseEpoch, checkpoint?}. The coordinator writes fleet_events server-side. The payload carries only stage/epoch/checkpoint — never the prompt/bodyMd or token.
fencing (§18): if a report/renew returns conflict/409 (stale leaseEpoch → the coordinator reclaimed us), the worker self-aborts: it stops, does not ship/merge, and quarantines the local result to failed/ (result=fenced_quarantine) for human triage. A reclaimed zombie can never corrupt coordinator state.
lease renew / release: POST /fleet/jobs/:id/lease/renew while building (fenced); …/lease/release on terminal stages.
checkpoint: the WIP {wipBranch, wipCommit} is sent with the building report so a reclaim can resume (§25).

Offline-degrade + quarantine (§9)

If the coordinator is unreachable mid-job (5xx / connection error), the report is treated as degraded (logged, fleet_degraded=1): the in-flight job finishes locally rather than being abandoned. On the next reachable call the worker presents its leaseEpoch; if the coordinator now reports it stale (it was reclaimed during the outage), the local result is quarantined (marked, not auto-shipped) and surfaced for human triage — split-brain is resolved in favor of the coordinator without losing the work. status shows the factory id + per-job fleet=<id>@e<epoch>; insights lists the fleet_* fields.
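
The fencing and offline-degrade rules reduce to a classification of the HTTP status of a report or renew call. The sketch below is illustrative: `classify_report` is a hypothetical name, `000` is what curl's `%{http_code}` prints when no response was received, and treating any other status as a plain `error` is an assumption.

```shell
# classify_report <http-status>
# Prints how the worker should treat a coordinator response.
classify_report() {
  case $1 in
    2??)        echo ok ;;        # accepted
    409)        echo fenced ;;    # stale leaseEpoch: self-abort and quarantine
    5??|000|"") echo degraded ;;  # unreachable: finish locally, reconcile later
    *)          echo error ;;     # anything else (assumption: log and surface)
  esac
}
```

A `fenced` verdict corresponds to result=fenced_quarantine, while `degraded` corresponds to fleet_degraded=1 with the job continuing locally.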

Feature flags + shadow / dual-run (Slice 4, §16/§27)

Three explicit, independently-toggleable levels gate the coordinator — a safe, reversible path to validate the fleet coordinator against the proven single-host (P1) behavior before any real cutover:

Flag	Effect
`AQ_FLEET=0`	Pure offline. Zero coordinator calls (including shadow). Offline git-queue path is byte-for-byte unchanged.
`AQ_FLEET_ROUTE=1` (default)	route_via_service — the coordinator is authoritative for claim/assignment (today's P2-S3 behavior).
`AQ_FLEET_ROUTE=0`	Local inbox authoritative — the coordinator is not used to source work (the pre-cutover state).
`AQ_FLEET_AUTOSHIP=1`	Autonomous ship. On a passing local verify, advance the coordinator job `testing → shipped` (closing the `testing → shipped` gap). Default `0` reports `testing` and leaves the job for the human review gate / `ship` operator action.
`AQ_FLEET_SHADOW=1`	Shadow / dual-run (requires `AQ_FLEET=1` and `AQ_FLEET_ROUTE=0`): run the offline path as authoritative and query the coordinator in parallel, recording divergence without acting on it.

Precedence. Shadow is only meaningful when ROUTE=0. If both AQ_FLEET_ROUTE=1 and AQ_FLEET_SHADOW=1 are set, ROUTE wins and shadow is disabled (a one-shot warning is logged) — you never route and shadow at the same time.
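
The precedence rule can be sketched as a resolver over the three variables. `resolve_fleet_mode` and the mode names it prints are hypothetical; the defaults are the ones from the config table.

```shell
# resolve_fleet_mode
# Prints one of: offline | route | shadow | local-inbox
resolve_fleet_mode() {
  local fleet="${AQ_FLEET:-0}" route="${AQ_FLEET_ROUTE:-1}" shadow="${AQ_FLEET_SHADOW:-0}"
  if [ "$fleet" != 1 ]; then
    echo offline                  # zero coordinator calls, shadow included
    return 0
  fi
  if [ "$route" = 1 ]; then
    if [ "$shadow" = 1 ]; then
      echo "warning: AQ_FLEET_ROUTE=1 wins, shadow disabled" >&2
    fi
    echo route                    # coordinator authoritative for claim
  elif [ "$shadow" = 1 ]; then
    echo shadow                   # local authoritative, coordinator observed
  else
    echo local-inbox              # local authoritative, coordinator not sourcing work
  fi
}
```

Resolving the mode once at startup keeps the "never route and shadow at the same time" rule in a single place.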

Side-effect-free by construction. Shadow never ships, quarantines, or mutates real job state. fleet_shadow_claim asks the coordinator what it would assign using an isolated -shadow factoryId + "dryRun":true,"shadow":true; if a coordinator without dry-run support actually assigned, the lease is released immediately so no real assignment persists. The would-be job is never materialized, run, or shipped locally. fleet_shadow_report mirrors the local stage as a shadow event ("shadow":true) purely to exercise reporting — the coordinator response is logged but never acted on (no fence/quarantine).

Each iteration fleet_shadow_compare classifies the local (authoritative) decision vs the coordinator's would-be decision as AGREE / DIVERGE / COORD_EMPTY / LOCAL_EMPTY and appends a line to the shadow log. Summarize it any time:

agent-queue.sh fleet-shadow-report        # per-verdict counts + agreement rate + recent divergences
agent-queue.sh fleet-shadow-report 25     # last 25 divergence/error events
agent-queue.sh status                     # surfaces the three flags' resolved state
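
The per-iteration comparison can be sketched as below. This is not fleet_shadow_compare itself: `shadow_verdict` and `shadow_log_line` are hypothetical names, jobs are compared by plain string equality of their ids, and treating "both sides empty" as AGREE is an assumption.

```shell
# shadow_verdict <local-job-id> <coordinator-job-id>
# Either argument may be empty when that side picked nothing.
shadow_verdict() {
  local local_job="$1" coord_job="$2"
  if [ -z "$local_job" ] && [ -z "$coord_job" ]; then
    echo AGREE            # assumption: nothing to run on either side
  elif [ -z "$coord_job" ]; then
    echo COORD_EMPTY
  elif [ -z "$local_job" ]; then
    echo LOCAL_EMPTY
  elif [ "$local_job" = "$coord_job" ]; then
    echo AGREE
  else
    echo DIVERGE
  fi
}

# shadow_log_line <ts> <local-job-id> <coordinator-job-id>
# Prints one tab-separated record: ts, localJob, coordJob, verdict.
shadow_log_line() {
  printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$(shadow_verdict "$2" "$3")"
}
```

Appending the output of `shadow_log_line` to the file named by `AQ_FLEET_SHADOW_LOG` gives a log that can be summarized with a simple count per verdict.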

Cutover ladder (rollback at any step):

Observe (zero risk): AQ_FLEET=1 AQ_FLEET_ROUTE=0 AQ_FLEET_SHADOW=1 run — the local path stays authoritative; the coordinator is only shadowed.
Inspect agreement: fleet-shadow-report — drive AGREEMENT toward 100%, investigating each DIVERGE.
Cut over: once agreement is high, flip AQ_FLEET_ROUTE=1 (coordinator becomes authoritative).
Rollback: set AQ_FLEET_ROUTE=0 (and/or AQ_FLEET=0) at any time — instant return to the local/offline path, no data migration.

Config (env overrides)

Var	Default	Meaning
`AGENT_QUEUE_ROOT`	`./queue`	where the kanban folders live
`AGENT_QUEUE_MAX`	`3`	max concurrent agents (override per-run with `run --max N`)
`AGENT_QUEUE_ENGINE`	`devin`	default engine when none in frontmatter
`AGENT_QUEUE_POLL`	`3`	inbox poll interval (seconds)
`AGENT_QUEUE_VERIFY`	(empty)	default auto-QA verify command; per-job `verify:` overrides it
`AGENT_QUEUE_STALL_MIN`	`10`	minutes of unchanged log before a worker is `⚠ stalled`
`DEVIN_BIN` / `CLAUDE_BIN` / `CODEX_BIN` / `COPILOT_BIN`	autodetected	override CLI binary paths
`FLOCK_BIN` / `TIMEOUT_BIN`	autodetected	`flock` (lock hardening) and `timeout`/`gtimeout` (hard timeouts); absent on stock macOS — see notes

⚠️ Safety

Running agents with yolo: true means no approval prompts — they will edit files, run shell commands, and commit unattended. Mitigate:

Prefer scope-locked prompt files (e.g. "edit only under dashboards/tracker-web/").
Tell prompts not to git push — review commits before they leave your machine.
Same-repo safety is automatic: jobs sharing a cwd (or lock: key) are serialized, so two agents never run in one repo at once — even at --max 2+.
Set a timeout: on long jobs so a wedged agent can't run forever.
Watch cost: each job is a full agent session.

Portability notes

macOS has no flock/timeout; locking relies on the single run-loop (enforced by the second-run refusal) and timeouts use a pure-bash watchdog. Install coreutils (gtimeout) for hard process-tree kills.
Linux (incl. Gitea CI) uses flock + timeout for cross-process hardening.
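
A rough sketch of the detection and fallback described in these notes follows. The function names are hypothetical and the watchdog is deliberately simple: it signals only the direct child, which is why installing `gtimeout` is still the better option for killing a whole process tree.

```shell
# find_timeout_bin: honor TIMEOUT_BIN, else look for timeout, then gtimeout.
find_timeout_bin() {
  if [ -n "${TIMEOUT_BIN:-}" ]; then
    echo "$TIMEOUT_BIN"
  elif command -v timeout >/dev/null 2>&1; then
    echo timeout
  elif command -v gtimeout >/dev/null 2>&1; then
    echo gtimeout
  else
    return 1
  fi
}

# run_with_timeout <seconds> <command> [args...]
run_with_timeout() {
  local secs="$1" bin pid dog rc
  shift
  if bin=$(find_timeout_bin); then
    "$bin" "$secs" "$@"
    return
  fi
  # Fallback watchdog: run the command in the background and signal it on expiry.
  "$@" &
  pid=$!
  ( sleep "$secs"; kill -TERM "$pid" 2>/dev/null ) &
  dog=$!
  wait "$pid"
  rc=$?
  kill "$dog" 2>/dev/null   # command finished first: cancel the watchdog
  return "$rc"
}
```

Either path returns non-zero when the command is cut short, which is enough for the caller to record result=timeout.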

Roadmap / nice-to-haves

Per-repo lock to serialize same-repo jobs automatically (lock: / cwd).
Per-job timeout: with hard kill (or bash watchdog fallback).
Stall detection in status/dash.
requeue failed jobs + clean/archive old runs.
Build/ship lifecycle: review → testing → shipped with auto-QA verify: gate + manual ship.
--push opt-in policy + commit review gate.
Optional notifications (Slack/desktop) on done/failed/stall.
Persisted run-loop as a daemon/service with auto-restart.
