diff --git a/agent-queue/README.md b/agent-queue/README.md index 073aacd..f91437b 100644 --- a/agent-queue/README.md +++ b/agent-queue/README.md @@ -76,7 +76,7 @@ which directory to run in, and whether to auto-approve: ```md --- -engine: devin # devin | claude | codex (default: $AGENT_QUEUE_ENGINE) +engine: devin # devin | claude | codex | copilot (default: $AGENT_QUEUE_ENGINE) cwd: /abs/path/to/repo # where the agent executes (default: cwd when added) yolo: true # auto-approve ALL tools (default: true) lock: my-repo # optional mutex key (default: cwd). Jobs sharing a key run serially @@ -93,6 +93,50 @@ verify: pnpm -s test # optional auto-QA gate. Runs in cwd after rc=0: `add --engine/--cwd/--yolo` will inject this frontmatter for you if the file doesn't already have a `---` block. +### Manifest fields (Gigafactory Phase 1) + +The runner parses the richer [gigafactory manifest](docs/GIGAFACTORY_ROADMAP.md#5-the-evolved-job-manifest-feature) +**backward-compatibly** — a legacy `engine`/`cwd`/`yolo`-only `.md` behaves exactly as before. +Fields marked **RESERVED** are parsed, stored in `.state/.meta`, and shown in `status`, but +are otherwise **no-ops until a later phase** (they do not yet affect execution). + +| Field | Status | Default | Meaning | +| ----- | ------ | ------- | ------- | +| `engine` | active | `$AGENT_QUEUE_ENGINE` | explicit engine (`devin\|claude\|codex\|copilot`) — always wins over `engine-class` | +| `cwd` / `yolo` / `lock` / `timeout` / `verify` | active | see above | Phase-0 behavior, unchanged | +| `priority` | **active** | `medium` | `critical\|high\|medium\|low`. Inbox is picked **highest-priority first, then oldest** (was pure FIFO) | +| `engine-class` | **active** | _(none)_ | used only when `engine` is unset: `agentic-coder`→`devin,claude,codex`; `chat-coder`→`copilot`. Picks the first **available** engine. No engine available → job fails `result=no_engine` | +| `prefers-engine` | **active** | _(none)_ | optional order hint for `engine-class` resolution, e.g. `[claude, devin]` | +| `capabilities` | **active** | _(none)_ | hard host requirements, e.g. `[os:any, node>=20, has:git]`. If the host can't satisfy them the job is sent to `failed/` with `result=capability_mismatch` **and the agent is never launched** (grammar below) | +| `idempotency-key` | **active** | _(none)_ | dedupe on `add` (semantics below) | +| `profile` | RESERVED | _(none)_ | role/persona + caps (profiles land in a later slice) | +| `prefers` | RESERVED | _(none)_ | soft routing/affinity hints (e.g. `[factory:mac-2]`) | +| `budget` | RESERVED | _(none)_ | `{ usd, tokens, wall }` ceilings (`wall` enforcement is a later slice) | +| `deps` / `deps-mode` | RESERVED | _(none)_ | DAG dependencies (single-host blocking is a later slice) | +| `retry` | RESERVED | _(none)_ | `{ max, backoff, on }` retry policy | +| `review-policy` | RESERVED | _(none)_ | `auto\|manual\|reviewers:[…]` | +| `artifacts` | RESERVED | _(none)_ | extra outputs to capture (coverage, screenshots) | +| `tracker-item` | RESERVED | _(none)_ | link back to the originating tracker task | + +**Capability grammar** (a job matches a host iff **every** required token is satisfied): + +| Token form | Example | Satisfied when | +| ---------- | ------- | -------------- | +| `key` (bare presence) | `gpu` | the host advertises `key` in any form | +| `key:value` (exact) | `os:mac`, `engine:devin`, `has:git` | the host advertises that exact token | +| `key:any` (wildcard) | `os:any` | the host advertises any `key:*` (so `os:any` matches every host) | +| `keyversion` (`>=` `>` `=` `<=` `<`) | `node>=20` | numeric/semver-major compare vs the host's `key:` | + +The host advertises (via `detect_capabilities`): `os:`, `engine:`, +`node:`, and `has:` when present. + +**`idempotency-key` semantics** (on `add`, hashing the frontmatter-stripped body): + +- same key **+ same body** → **no-op** (logged `duplicate, skipped`). +- same key **+ different body**, prior job still in `inbox/` → **supersedes** it (replaces the queued file). +- same key **+ different body**, prior job already past `inbox/` (building/review/testing/shipped) → + **rejected** with a clear error (use a new key, or requeue the existing job). + ## Engine mapping | `engine:` | Command run | Auto-approve flag (`yolo: true`) | @@ -100,6 +144,7 @@ already have a `---` block. | `devin` | `devin -p --prompt-file ` | `--permission-mode dangerous` | | `claude` | `claude -p` (body on **stdin**) | `--dangerously-skip-permissions` | | `codex` | `codex exec` (body on **stdin**) | `--dangerously-bypass-approvals-and-sandbox` | +| `copilot` | `copilot -p` (body on **stdin**) | `--allow-all-tools` _(best-effort; chat-coder class target)_ | The frontmatter is **stripped** before the body reaches the agent, and claude/codex receive it on **stdin** so a body starting with `--` is never @@ -178,7 +223,9 @@ queue/ ``` **`result=` values** written to `.meta`: `review`, `testing`, `shipped`, -`failed`, `timeout`, `verify_failed`, `rejected`, `requeued`. +`failed`, `timeout`, `verify_failed`, `rejected`, `requeued`, `capability_mismatch` +(host missing a required capability — agent never launched), `no_engine` +(an `engine-class` had no available engine). ## Config (env overrides) @@ -190,7 +237,7 @@ queue/ | `AGENT_QUEUE_POLL` | `3` | inbox poll interval (seconds) | | `AGENT_QUEUE_VERIFY` | _(empty)_ | default auto-QA verify command; per-job `verify:` overrides it | | `AGENT_QUEUE_STALL_MIN` | `10` | minutes of unchanged log before a worker is `⚠ stalled` | -| `DEVIN_BIN` / `CLAUDE_BIN` / `CODEX_BIN` | autodetected | override CLI binary paths | +| `DEVIN_BIN` / `CLAUDE_BIN` / `CODEX_BIN` / `COPILOT_BIN` | autodetected | override CLI binary paths | | `FLOCK_BIN` / `TIMEOUT_BIN` | autodetected | `flock` (lock hardening) and `timeout`/`gtimeout` (hard timeouts); absent on stock macOS — see notes | ## ⚠️ Safety diff --git a/agent-queue/agent-queue.sh b/agent-queue/agent-queue.sh index 58287db..ee5bba3 100755 --- a/agent-queue/agent-queue.sh +++ b/agent-queue/agent-queue.sh @@ -62,6 +62,7 @@ TIMEOUT_BIN="${TIMEOUT_BIN:-$(command -v timeout || command -v gtimeout || true) DEVIN_BIN="${DEVIN_BIN:-$(command -v devin || echo "$HOME/.local/bin/devin")}" CLAUDE_BIN="${CLAUDE_BIN:-$(command -v claude || echo claude)}" CODEX_BIN="${CODEX_BIN:-$(command -v codex || echo codex)}" +COPILOT_BIN="${COPILOT_BIN:-$(command -v copilot || echo copilot)}" # ── Colors ────────────────────────────────────────────────────────── if [[ -t 1 ]]; then @@ -194,6 +195,188 @@ busy_keys() { done } +# ── Manifest helpers (Phase 1 — §5/§6/§7/§8 single-host subset) ────── +# +# parse_list -> one token per line. Accepts a YAML-ish inline list +# ("[a, b]"), comma- or space-separated values, with surrounding quotes +# stripped. Empty tokens are dropped. Used for `capabilities`, `prefers`, +# `prefers-engine`, `deps`, etc. +parse_list() { + local raw cleaned t + cleaned=$(printf '%s' "${1:-}" | tr '[],' ' ') + for t in $cleaned; do + t=${t#[\"\']}; t=${t%[\"\']} + [[ -n "$t" ]] && printf '%s\n' "$t" + done +} + +# _body_hash -> stable content hash of the frontmatter-STRIPPED body. +# Drives idempotency-key dedupe on `add`. Prefers shasum/sha1sum, falls back +# to cksum so it works on a bare macOS/Linux host with no extra tools. +_body_hash() { + local f=$1 + if command -v shasum >/dev/null 2>&1; then + strip_frontmatter "$f" | shasum | awk '{print $1}' + elif command -v sha1sum >/dev/null 2>&1; then + strip_frontmatter "$f" | sha1sum | awk '{print $1}' + else + strip_frontmatter "$f" | cksum | awk '{print $1"-"$2}' + fi +} + +# priority_rank -> sort rank (lower = picked first). Unknown/empty +# defaults to medium. Used by inbox_sorted for priority-then-age selection. +priority_rank() { + case "$1" in + critical) echo 0;; high) echo 1;; medium) echo 2;; low) echo 3;; *) echo 2;; + esac +} + +# inbox_sorted -> inbox/*.md ordered by priority (critical first) then oldest +# (filename embeds the queue timestamp, so a plain string sort = age order). +# Replaces the pure-FIFO `ls | sort`; per-lock serialization is unchanged +# (the caller still skips files whose lock key is busy). +inbox_sorted() { + local f p r + for f in "$INBOX"/*.md; do + [[ -e "$f" ]] || continue + p=$(fm_get "$f" priority medium); r=$(priority_rank "$p") + printf '%s\t%s\n' "$r" "$f" + done | sort -t"$(printf '\t')" -k1,1n -k2,2 | cut -f2- +} + +# engine_available -> 0 if a runnable binary exists for the engine +# (executable path or a name resolvable on PATH). Honors the *_BIN overrides +# (so the self-test stub counts as "available"). +engine_available() { + local eng=$1 bin="" + case "$eng" in + devin) bin=$DEVIN_BIN;; + claude) bin=$CLAUDE_BIN;; + codex) bin=$CODEX_BIN;; + copilot) bin=$COPILOT_BIN;; + *) return 1;; + esac + [[ -n "$bin" ]] || return 1 + [[ -x "$bin" ]] && return 0 + command -v "$bin" >/dev/null 2>&1 +} + +# engine_class_engines -> candidate engines (priority order) for an +# engine-class (§5 taxonomy). `review-only` has no concrete mapping yet +# (reserved). Unknown class -> empty. +engine_class_engines() { + case "$1" in + agentic-coder) echo "devin claude codex";; + chat-coder) echo "copilot";; + review-only) echo "";; + *) echo "";; + esac +} + +# detect_capabilities -> one capability token per line describing THIS host: +# os: (os:any required-side wildcard matches this) +# engine: for each available engine +# node: the host node major (compared by nodeN) +# has: tool probes (only emitted when present) +detect_capabilities() { + case "$(uname -s 2>/dev/null)" in + Darwin) echo "os:mac";; + Linux) echo "os:linux";; + *) echo "os:other";; + esac + local e + for e in devin claude codex copilot; do + engine_available "$e" && echo "engine:$e" + done + if command -v node >/dev/null 2>&1; then + local nv; nv=$(node -v 2>/dev/null | sed 's/^v//' | cut -d. -f1) + [[ "$nv" =~ ^[0-9]+$ ]] && echo "node:$nv" + fi + local t + for t in git pnpm docker; do + command -v "$t" >/dev/null 2>&1 && echo "has:$t" + done +} + +# _cap_token_ok -> 0 if satisfied (§5 grammar): +# key:any wildcard — host advertises any "key:*" token (os:any matches all) +# keyver numeric/semver-major compare against the host's "key:" token +# (op in >= > = <= <) +# key:value exact token match +# key bare presence — exact, or advertised in any "key:"/"key" form +_cap_token_ok() { + local tok=$1 avail=$2 a + if [[ "$tok" == *:any ]]; then + local wkey=${tok%:any} + for a in $avail; do [[ "$a" == "$wkey":* ]] && return 0; done + return 1 + fi + if [[ "$tok" =~ ^([A-Za-z0-9_.:-]+)(\>=|\<=|=|\>|\<)([0-9][0-9.]*)$ ]]; then + local key=${BASH_REMATCH[1]} op=${BASH_REMATCH[2]} want=${BASH_REMATCH[3]} hostval="" + for a in $avail; do [[ "$a" == "$key":* ]] && { hostval=${a#*:}; break; }; done + [[ -n "$hostval" ]] || return 1 + local hv=${hostval%%.*} wv=${want%%.*} + [[ "$hv" =~ ^[0-9]+$ && "$wv" =~ ^[0-9]+$ ]] || return 1 + case "$op" in + ">=") [[ "$hv" -ge "$wv" ]];; + ">") [[ "$hv" -gt "$wv" ]];; + "=") [[ "$hv" -eq "$wv" ]];; + "<=") [[ "$hv" -le "$wv" ]];; + "<") [[ "$hv" -lt "$wv" ]];; + esac + return $? + fi + if [[ "$tok" == *:* ]]; then + for a in $avail; do [[ "$a" == "$tok" ]] && return 0; done + return 1 + fi + for a in $avail; do + case "$a" in "$tok"|"$tok":*|"$tok"\>=*|"$tok"\>*|"$tok"=*|"$tok"\<=*|"$tok"\<*) return 0;; esac + done + return 1 +} + +# caps_match -> 0 iff EVERY required token +# is satisfied by the available set (both passed as whitespace-separated lists). +caps_match() { + local required=$1 available=$2 r + for r in $required; do + _cap_token_ok "$r" "$available" || return 1 + done + return 0 +} + +# resolve_engine -> the concrete engine to run: +# explicit `engine:` always wins; else `engine-class` picks the first +# available engine honoring `prefers-engine` then the class default order; +# else the global default engine. Prints "" only when an engine-class was +# requested but no engine in it is available (caller fails with no_engine). +resolve_engine() { + local f=$1 eng cls prefers + eng=$(fm_get "$f" engine "") + if [[ -n "$eng" ]]; then printf '%s' "$eng"; return 0; fi + cls=$(fm_get "$f" engine-class "") + if [[ -z "$cls" ]]; then printf '%s' "$DEFAULT_ENGINE"; return 0; fi + local class_engines; class_engines=$(engine_class_engines "$cls") + prefers=$(fm_get "$f" prefers-engine "") + local ordered=() seen=" " p c + if [[ -n "$prefers" ]]; then + while IFS= read -r p; do + [[ -n "$p" ]] || continue + case " $class_engines " in *" $p "*) ordered+=("$p"); seen+="$p ";; esac + done < <(parse_list "$prefers") + fi + for c in $class_engines; do + case "$seen" in *" $c "*) ;; *) ordered+=("$c"); seen+="$c ";; esac + done + local cand + for cand in ${ordered[@]+"${ordered[@]}"}; do + engine_available "$cand" && { printf '%s' "$cand"; return 0; } + done + printf '%s' "" +} + # ── Engine driver: builds argv into AGENT_CMD[]; sets AGENT_STDIN if the ── # prompt should be fed on stdin (claude/codex) rather than a flag. $pf is the # frontmatter-STRIPPED body file, so a body starting with '--' is never @@ -216,7 +399,14 @@ build_agent_cmd() { [[ "$yolo" == "true" ]] && AGENT_CMD+=( --dangerously-bypass-approvals-and-sandbox ) AGENT_STDIN="$pf" ;; - *) die "unknown engine '$engine' (use: devin | claude | codex)";; + copilot) + # Best-effort GitHub Copilot CLI mapping for the chat-coder engine-class. + # Flags drift between CLI versions — this is the single place to edit. + AGENT_CMD=( "$COPILOT_BIN" -p ) + [[ "$yolo" == "true" ]] && AGENT_CMD+=( --allow-all-tools ) + AGENT_STDIN="$pf" + ;; + *) die "unknown engine '$engine' (use: devin | claude | codex | copilot)";; esac } @@ -226,7 +416,6 @@ run_worker() { local job; job=$(basename "$doing_file") job=${job%.md} local engine cwd yolo logf metaf - engine=$(fm_get "$doing_file" engine "$DEFAULT_ENGINE") cwd=$(fm_get "$doing_file" cwd "$PWD") yolo=$(fm_get "$doing_file" yolo "true") logf="$LOGS/$job.log" @@ -234,6 +423,38 @@ run_worker() { # NOTE: the parent (cmd_run) creates $metaf with job/engine/cwd/started/pid. # The worker only ever APPENDS (ended/exit/result) to avoid a truncation race. + # ── Capability gate (§5/§8 single-host): if the job declares `capabilities` + # this host does not satisfy, route to failed/ WITHOUT launching the agent. ── + local req_caps; req_caps=$(parse_list "$(fm_get "$doing_file" capabilities "")" | tr '\n' ' ') + if [[ -n "${req_caps// /}" ]]; then + local avail; avail=$(detect_capabilities) + if ! caps_match "$req_caps" "$avail"; then + { + echo "===== agent-queue job: $job =====" + echo "CAPABILITY MISMATCH — host cannot satisfy required: $req_caps" + echo "host advertises: $(printf '%s' "$avail" | tr '\n' ' ')" + echo "agent NOT launched — routed to failed/ ($(date))" + } >> "$logf" + mv "$doing_file" "$FAILED/" 2>/dev/null + { echo "result=capability_mismatch"; echo "ended=$(date +%s)"; } >> "$metaf" + return 0 + fi + fi + + # ── Engine resolution (§5): explicit `engine` wins, else `engine-class`. + # No available engine for a requested class -> fail (no_engine), no launch. ── + engine=$(resolve_engine "$doing_file") + if [[ -z "$engine" ]]; then + { + echo "===== agent-queue job: $job =====" + echo "NO ENGINE — engine-class '$(fm_get "$doing_file" engine-class "")' has no available engine on this host" + echo "agent NOT launched — routed to failed/ ($(date))" + } >> "$logf" + mv "$doing_file" "$FAILED/" 2>/dev/null + { echo "result=no_engine"; echo "ended=$(date +%s)"; } >> "$metaf" + return 0 + fi + { echo "===== agent-queue job: $job =====" echo "engine=$engine cwd=$cwd yolo=$yolo" @@ -363,6 +584,50 @@ cmd_add() { esac done [[ -n "$file" && -f "$file" ]] || die "usage: add [--engine devin|claude|codex] [--cwd PATH] [--yolo|--no-yolo]" + + # ── idempotency-key dedupe (§5) ── + # If the file declares an idempotency-key, compare a content hash of its + # stripped body against existing jobs in any active stage: + # same key + same body -> no-op (skip; "duplicate") + # same key + different body, prior in inbox/ -> supersede (replace it) + # same key + different body, prior elsewhere -> reject (clear error) + local idem; idem=$(fm_get "$file" idempotency-key "") + if [[ -n "$idem" ]]; then + local newhash; newhash=$(_body_hash "$file") + local d ef ekey ehash stage + # pass 1: exact duplicate anywhere active -> no-op + for d in "$INBOX" "$BUILDING" "$REVIEW" "$TESTING" "$SHIPPED"; do + for ef in "$d"/*.md; do + [[ -e "$ef" ]] || continue + ekey=$(fm_get "$ef" idempotency-key "") + [[ "$ekey" == "$idem" ]] || continue + ehash=$(_body_hash "$ef") + if [[ "$ehash" == "$newhash" ]]; then + log "duplicate, skipped (idempotency-key=$C_BOLD$idem$C_RESET matches $(basename "$ef"))" + return 0 + fi + done + done + # pass 2: same key, different body, prior is past inbox -> reject + for d in "$BUILDING" "$REVIEW" "$TESTING" "$SHIPPED"; do + for ef in "$d"/*.md; do + [[ -e "$ef" ]] || continue + ekey=$(fm_get "$ef" idempotency-key "") + [[ "$ekey" == "$idem" ]] || continue + stage=$(basename "$d") + die "idempotency-key '$idem' already in use by $(basename "$ef") (stage: $stage) with different content — refusing. Use a new key, or requeue the existing job." + done + done + # pass 3: same key, different body, prior still queued -> supersede + for ef in "$INBOX"/*.md; do + [[ -e "$ef" ]] || continue + ekey=$(fm_get "$ef" idempotency-key "") + [[ "$ekey" == "$idem" ]] || continue + log "superseding inbox job $(basename "$ef") (same idempotency-key=$idem, changed content)" + rm -f "$ef" + done + fi + local base; base=$(basename "$file") local stamp; stamp=$(date +%Y%m%d-%H%M%S) local dest="$INBOX/${stamp}__${base}" @@ -411,8 +676,9 @@ cmd_run() { local running; running=$(active_workers) # launch jobs while we have capacity and an eligible inbox file while [[ "$running" -lt "$MAX_CONCURRENCY" ]]; do - # pick the oldest inbox file whose lock key is not currently busy, so two - # jobs sharing a cwd (or `lock:` key) never run at once, regardless of --max. + # pick by priority (critical→low) then age, skipping files whose lock key + # is currently busy, so two jobs sharing a cwd (or `lock:` key) never run + # at once regardless of --max. inbox_sorted replaces the old pure-FIFO sort. local busy; busy=$(busy_keys) local next="" cand cand_key while IFS= read -r cand; do @@ -420,25 +686,45 @@ cmd_run() { cand_key=$(lock_key_for "$cand") if printf '%s\n' "$busy" | grep -qxF -- "$cand_key"; then continue; fi next="$cand"; break - done < <(ls -1 "$INBOX"/*.md 2>/dev/null | sort) + done < <(inbox_sorted) [[ -z "$next" ]] && break local job; job=$(basename "$next"); job=${job%.md} local doing_file="$BUILDING/$(basename "$next")" mv "$next" "$doing_file" local w_eng w_cwd w_yolo w_key - w_eng=$(fm_get "$doing_file" engine "$DEFAULT_ENGINE") + # resolve the concrete engine now (explicit engine / engine-class) so the + # meta + status reflect what will actually run; run_worker re-resolves and + # is the authority on capability/engine gates. + w_eng=$(resolve_engine "$doing_file") w_cwd=$(fm_get "$doing_file" cwd "$PWD") w_yolo=$(fm_get "$doing_file" yolo "true") w_key=$(lock_key_for "$doing_file") - # write meta BEFORE launch (no pid yet), then append the worker pid from $! + # write meta BEFORE launch (no pid yet), then append the worker pid from $!. + # The new manifest fields (§5) are recorded here; only priority, + # capabilities, engine-class and idempotency-key are functional this + # phase — the rest are stored for `status`/audit but otherwise inert. { echo "job=$job" - echo "engine=$w_eng" + echo "engine=${w_eng:-}" echo "cwd=$w_cwd" echo "yolo=$w_yolo" echo "lock=$w_key" echo "started=$(date +%s)" + echo "priority=$(fm_get "$doing_file" priority medium)" + echo "profile=$(fm_get "$doing_file" profile "")" + echo "engine_class=$(fm_get "$doing_file" engine-class "")" + echo "capabilities=$(fm_get "$doing_file" capabilities "")" + echo "prefers=$(fm_get "$doing_file" prefers "")" + echo "prefers_engine=$(fm_get "$doing_file" prefers-engine "")" + echo "budget=$(fm_get "$doing_file" budget "")" + echo "deps=$(fm_get "$doing_file" deps "")" + echo "deps_mode=$(fm_get "$doing_file" deps-mode "")" + echo "idempotency_key=$(fm_get "$doing_file" idempotency-key "")" + echo "retry=$(fm_get "$doing_file" retry "")" + echo "review_policy=$(fm_get "$doing_file" review-policy "")" + echo "artifacts=$(fm_get "$doing_file" artifacts "")" + echo "tracker_item=$(fm_get "$doing_file" tracker-item "")" } > "$STATE/$job.meta" run_worker "$doing_file" & { echo "pid=$!"; echo "pidstart=$(_pidstart "$!")"; } >> "$STATE/$job.meta" @@ -490,6 +776,15 @@ cmd_status() { [[ "$age" -gt $(( STALL_MIN * 60 )) ]] && stall="${C_RED}⚠ stalled${C_RESET} " printf ' %s%-26s%s %-7s %3dm%02ds pid %-6s %s%s%s%s\n' \ "$C_BOLD" "$job" "$C_RESET" "$eng" $((el/60)) $((el%60)) "$pid" "$stall" "$C_DIM" "$last" "$C_RESET" + # manifest sub-line (Phase 1): priority + profile + capabilities + tracker-item + local m_prio m_prof m_caps m_trk extra="" + m_prio=$(grep '^priority=' "$f" | cut -d= -f2-); m_prof=$(grep '^profile=' "$f" | cut -d= -f2-) + m_caps=$(grep '^capabilities=' "$f" | cut -d= -f2-); m_trk=$(grep '^tracker_item=' "$f" | cut -d= -f2-) + [[ -n "$m_prio" ]] && extra+="prio=$m_prio " + [[ -n "$m_prof" ]] && extra+="profile=$m_prof " + [[ -n "$m_caps" ]] && extra+="caps=$m_caps " + [[ -n "$m_trk" ]] && extra+="tracker=$m_trk " + [[ -n "$extra" ]] && printf ' %s%s%s\n' "$C_DIM" "$extra" "$C_RESET" done $printed || printf ' %sno workers running%s\n' "$C_DIM" "$C_RESET" echo @@ -671,18 +966,26 @@ ${C_BOLD}KANBAN${C_RESET} inbox → building → review → testing → shipped ${C_BOLD}TASK FRONTMATTER${C_RESET} (top of each .md) --- - engine: devin + engine: devin # devin|claude|codex|copilot. Explicit engine always wins cwd: /Users/you/code/repo yolo: true lock: my-repo # optional; defaults to cwd. Jobs sharing a key run serially timeout: 45m # optional; 90s|45m|2h|1d. On expiry -> failed (result=timeout) verify: pnpm -s test # optional; auto-QA gate. pass -> testing, fail -> failed + # --- Phase 1 manifest (active) --- + priority: high # critical|high|medium|low (default medium). Picked highest-first, then oldest + engine-class: agentic-coder # used only if `engine` unset: agentic-coder->devin,claude,codex; chat-coder->copilot + prefers-engine: [claude] # optional order hint for engine-class resolution + capabilities: [os:any, node>=20, has:git] # hard host requirements; unmet -> failed (capability_mismatch) + idempotency-key: my-task-1 # re-adding same key+body = no-op; same key+different body = reject/supersede + # --- reserved (parsed + shown in status, but no-op until a later phase) --- + profile: prefers: budget: deps: deps-mode: retry: review-policy: artifacts: tracker-item: --- ${C_BOLD}ENV${C_RESET} AGENT_QUEUE_ROOT (=$QUEUE_ROOT) AGENT_QUEUE_MAX (=$MAX_CONCURRENCY) AGENT_QUEUE_ENGINE (=$DEFAULT_ENGINE) AGENT_QUEUE_VERIFY (default verify cmd) - DEVIN_BIN / CLAUDE_BIN / CODEX_BIN + DEVIN_BIN / CLAUDE_BIN / CODEX_BIN / COPILOT_BIN EOF } diff --git a/agent-queue/docs/GIGAFACTORY_ROADMAP.md b/agent-queue/docs/GIGAFACTORY_ROADMAP.md index 7367b01..ee26297 100644 --- a/agent-queue/docs/GIGAFACTORY_ROADMAP.md +++ b/agent-queue/docs/GIGAFACTORY_ROADMAP.md @@ -11,7 +11,7 @@ | Phase | Theme | Status | % | Gate | | ----- | ----- | ------ | - | ---- | | **0** | Baseline (today) | ✅ shipped | 100% | `selftest.sh` green | -| **1** | Manifest + profiles + capabilities + tracker adapter (single host) | ☐ not started | 0% | adapter e2e + selftest | +| **1** | Manifest + profiles + capabilities + tracker adapter (single host) | ◐ in progress | 35% | adapter e2e + selftest | | **2** | Coordinator as platform-service module + Cosmos + multi-factory leasing | ☐ not started | 0% | fleet e2e + module tests | | **3** | Fleet control plane in tracker-web + DAG deps + budgets + scoring router | ☐ not started | 0% | web e2e + router tests | | **4** | Message bus + autoscaling + cross-OS capability marketplace | ☐ not started | 0% | load/chaos suite | @@ -138,10 +138,10 @@ tracker-item: ITEM-789 # link back to the originating tracker task ``` - [ ] Define the manifest schema (Zod in the service; documented YAML for `.md`). -- [ ] Backward-compat: a Phase-0 `.md` (only `engine/cwd/yolo`) parses with all new fields defaulted. -- [ ] **Capability grammar** defined: tokens are `key` (presence, e.g. `has:xcode`), `key:value` (e.g. `os:mac`, `engine:devin`), or `keyversion` with `op ∈ {>=,>,=,<=,<}` (e.g. `node>=20`). `os:any` is a wildcard that matches every factory. A job matches a factory iff every required token is satisfied by the factory descriptor. -- [ ] **`engine-class` taxonomy** defined as an enum (`agentic-coder`, `chat-coder`, `review-only`) with a documented engine→class map (`devin,claude,codex → agentic-coder`; `copilot → chat-coder`). If `engine` is set it wins; else the scheduler picks any free engine in the class honoring `prefers-engine`. -- [ ] **`idempotency-key` semantics:** `key + content-hash` identical ⇒ no-op (returns existing job). Same `key`, **different** content ⇒ **rejected with 409** unless the prior job is still `queued`/`blocked` (then it is superseded). A re-`run`/`retry` of an existing job is **not** a new submit and never trips dedupe. +- [x] Backward-compat: a Phase-0 `.md` (only `engine/cwd/yolo`) parses with all new fields defaulted. *(P1-S1: bash runner; Zod schema still P2. selftest backward-compat case green.)* +- [x] **Capability grammar** defined: tokens are `key` (presence, e.g. `has:xcode`), `key:value` (e.g. `os:mac`, `engine:devin`), or `keyversion` with `op ∈ {>=,>,=,<=,<}` (e.g. `node>=20`). `os:any` is a wildcard that matches every factory. A job matches a factory iff every required token is satisfied by the factory descriptor. *(P1-S1: `caps_match`/`detect_capabilities` in `agent-queue.sh`.)* +- [x] **`engine-class` taxonomy** defined as an enum (`agentic-coder`, `chat-coder`, `review-only`) with a documented engine→class map (`devin,claude,codex → agentic-coder`; `copilot → chat-coder`). If `engine` is set it wins; else the scheduler picks any free engine in the class honoring `prefers-engine`. *(P1-S1: `resolve_engine`; `review-only` mapping reserved.)* +- [x] **`idempotency-key` semantics:** `key + content-hash` identical ⇒ no-op (returns existing job). Same `key`, **different** content ⇒ **rejected with 409** unless the prior job is still `queued`/`blocked` (then it is superseded). A re-`run`/`retry` of an existing job is **not** a new submit and never trips dedupe. *(P1-S1: add-time dedupe; bash maps "409" → clear error, `queued` → still in `inbox/` ⇒ superseded.)* - [ ] **`deps` semantics:** a dep is satisfied when it reaches `shipped` (default) or `testing` if `deps-mode: soft`. Submit-time **cycle detection** rejects cyclic graphs; unmet deps put the job in `blocked` (not `queued`). Cross-factory deps require the coordinator (P2); single-host deps work in P1. - **Acceptance:** a manifest fixture suite parses/validates; invalid manifests fail with precise errors; capability-grammar + dep-cycle + idempotency-conflict cases covered. - **Verify gate:** schema unit tests (≥ 1 per field incl. defaults + 5 invalid cases + grammar/cycle/409 cases). @@ -340,17 +340,19 @@ Each phase: **Goal → checklist → Exit criteria**. Don't start a phase until ### Phase 1 — Manifest + profiles + capabilities + tracker adapter (single host) **Goal:** richer single-host runner that understands profiles/capabilities and bridges to tracker — no distributed infra yet. -- [ ] Extend `agent-queue.sh` frontmatter parsing for all new manifest fields (§5), defaulted + backward-compatible. +> **Slice progress — P1-S1 (this commit):** manifest parsing (all §5 fields, defaulted + backward-compatible), `priority` ordering, capability detection+match gate, `engine-class` resolution, and `idempotency-key` dedupe are **done** on the bash runner. Profiles, `deps` DAG, `retry`/`budget.wall`, `allowed-scope`, the tracker adapter, and dashboard surfacing remain **for later slices**. + +- [x] Extend `agent-queue.sh` frontmatter parsing for all new manifest fields (§5), defaulted + backward-compatible. *(P1-S1)* - [ ] Add `profiles/` directory + profile resolution (persona injection, default verify/caps/scope) (§6). -- [ ] Local capability detection + a job/factory capability match check before launch (§8 subset). -- [ ] `priority` ordering in the inbox pick (replace pure FIFO with priority-then-age). -- [ ] `deps` (DAG) blocking on a single host; `idempotency-key` dedupe on `add`. +- [x] Local capability detection + a job/factory capability match check before launch (§8 subset). *(P1-S1: `detect_capabilities` + `caps_match`; mismatch ⇒ `failed/` `result=capability_mismatch`, agent never launched.)* +- [x] `priority` ordering in the inbox pick (replace pure FIFO with priority-then-age). *(P1-S1: `inbox_sorted`; per-lock serialization preserved.)* +- [ ] `deps` (DAG) blocking on a single host; `idempotency-key` dedupe on `add`. *(P1-S1: `idempotency-key` dedupe DONE; `deps` DAG blocking still pending.)* - [ ] `retry` with backoff into `failed`/requeue; `budget.wall` enforced (extends `timeout`). - [ ] `allowed-scope` guardrail (warn-only this phase) + post-run diff report. - [ ] **Tracker adapter** `aq from-tracker ` + `aq to-tracker` event poster (§10 P1). -- [ ] Dashboard shows profile + priority + capability tags + tracker-item link. -- [ ] Update `selftest.sh` with: manifest parse fixtures, profile resolution, priority order, dep-block, idempotency, adapter round-trip (mock). -- [ ] Update README + this doc's progress table. +- [ ] Dashboard shows profile + priority + capability tags + tracker-item link. *(P1-S1: `status` shows priority/profile/caps/tracker-item; Node `dash` surfacing pending.)* +- [ ] Update `selftest.sh` with: manifest parse fixtures, profile resolution, priority order, dep-block, idempotency, adapter round-trip (mock). *(P1-S1: added backward-compat, priority, capability-mismatch, engine-class, idempotency cases; profile/dep-block/adapter pending.)* +- [x] Update README + this doc's progress table. *(P1-S1)* - **Exit criteria:** all boxes ✅; `selftest.sh` green; a tracker task → executed → tracker `done` with SHA comment, fully on one host; no regression to Phase-0 `.md` files. ### Phase 2 — Coordinator as platform-service module + Cosmos + multi-factory leasing diff --git a/agent-queue/selftest.sh b/agent-queue/selftest.sh index fa31217..c15b3e8 100755 --- a/agent-queue/selftest.sh +++ b/agent-queue/selftest.sh @@ -103,4 +103,131 @@ fi # status must not error "$AQ" status >/dev/null 2>&1 && pass "status runs clean" +# ───────────────────────────────────────────────────────────────────── +# Phase 1 — Slice 1 cases (manifest/priority/capabilities/engine-class/idempotency). +# Each uses its OWN AGENT_QUEUE_ROOT; the no-op engine stub means no real CLI runs. +# ───────────────────────────────────────────────────────────────────── + +# 7. backward-compat: a legacy engine/cwd/yolo-only .md still completes → review/ +export AGENT_QUEUE_ROOT="$tmp/queue-bc" +bc="$tmp/bc-legacy.md" +printf '%s\n' '---' 'engine: devin' "cwd: $work" 'yolo: true' '---' '' '# legacy task' > "$bc" +"$AQ" init >/dev/null +DEVIN_BIN="$stub" "$AQ" add "$bc" >/dev/null +DEVIN_BIN="$stub" "$AQ" run --once >/dev/null 2>&1 +if ls "$AGENT_QUEUE_ROOT"/review/*.md >/dev/null 2>&1; then + pass "backward-compat: legacy engine/cwd/yolo-only .md → review/" +else + ls -R "$AGENT_QUEUE_ROOT" >&2 || true + fail "legacy .md did not land in review/" +fi + +# 8. priority: with --max 1, a 'critical' job queued AFTER a 'low' job runs first. +# An order-recording devin-style stub appends each job's TASKID as it launches. +export AGENT_QUEUE_ROOT="$tmp/queue-prio" +ostub="$tmp/order-engine" +cat > "$ostub" <<'STUB' +#!/usr/bin/env bash +# order-recording no-op engine stub (devin-style: --prompt-file ) +pf="" +while [ $# -gt 0 ]; do + case "$1" in + --prompt-file) pf="${2:-}"; shift 2;; + *) shift;; + esac +done +if [ -n "${pf:-}" ] && [ -n "${AQ_ORDER:-}" ]; then + grep -m1 '^TASKID=' "$pf" >> "$AQ_ORDER" 2>/dev/null || true +fi +exit 0 +STUB +chmod +x "$ostub" +export AQ_ORDER="$tmp/prio-order.log"; : > "$AQ_ORDER" +plow="$tmp/p-low.md" +printf '%s\n' '---' 'engine: devin' "cwd: $work" 'yolo: true' 'priority: low' '---' '' 'TASKID=low' > "$plow" +pcrit="$tmp/p-crit.md" +printf '%s\n' '---' 'engine: devin' "cwd: $work" 'yolo: true' 'priority: critical' '---' '' 'TASKID=critical' > "$pcrit" +"$AQ" init >/dev/null +DEVIN_BIN="$ostub" "$AQ" add "$plow" >/dev/null +sleep 1 # ensure the critical job has a strictly newer (later) queue timestamp +DEVIN_BIN="$ostub" "$AQ" add "$pcrit" >/dev/null +DEVIN_BIN="$ostub" "$AQ" run --once --max 1 >/dev/null 2>&1 +if [ "$(head -1 "$AQ_ORDER" 2>/dev/null || true)" = "TASKID=critical" ]; then + pass "priority: critical (queued later) ran before low" +else + echo "--- execution order ---" >&2; cat "$AQ_ORDER" >&2 || true + fail "priority ordering did not pick the critical job first" +fi + +# 9. capability mismatch: a job requiring an absent tool → failed/ with +# result=capability_mismatch, and the agent is NEVER launched. +export AGENT_QUEUE_ROOT="$tmp/queue-cap" +launchflag="$tmp/cap-launched.flag"; rm -f "$launchflag" +launchstub="$tmp/launch-engine" +printf '#!/usr/bin/env bash\ntouch %q\nexit 0\n' "$launchflag" > "$launchstub" +chmod +x "$launchstub" +capjob="$tmp/cap.md" +printf '%s\n' '---' 'engine: devin' "cwd: $work" 'yolo: true' \ + 'capabilities: [has:definitely-not-installed]' '---' '' '# capability task' > "$capjob" +"$AQ" init >/dev/null +DEVIN_BIN="$launchstub" "$AQ" add "$capjob" >/dev/null +DEVIN_BIN="$launchstub" "$AQ" run --once >/dev/null 2>&1 +if ls "$AGENT_QUEUE_ROOT"/failed/*.md >/dev/null 2>&1 \ + && grep -q '^result=capability_mismatch' "$AGENT_QUEUE_ROOT"/.state/*.meta 2>/dev/null; then + pass "capability mismatch → failed/ (result=capability_mismatch)" +else + ls -R "$AGENT_QUEUE_ROOT" >&2 || true + fail "capability-mismatch job not routed to failed/ with result=capability_mismatch" +fi +if [ -e "$launchflag" ]; then + fail "agent WAS launched on capability mismatch (it must not be)" +else + pass "capability mismatch: agent never launched" +fi + +# 10. engine-class: a job with engine-class:agentic-coder and no engine, with +# DEVIN_BIN stubbed (available), resolves to devin, runs, and lands in review/. +export AGENT_QUEUE_ROOT="$tmp/queue-ec" +ecjob="$tmp/ec.md" +printf '%s\n' '---' 'engine-class: agentic-coder' "cwd: $work" 'yolo: true' '---' '' '# engine-class task' > "$ecjob" +"$AQ" init >/dev/null +DEVIN_BIN="$stub" "$AQ" add "$ecjob" >/dev/null +DEVIN_BIN="$stub" "$AQ" run --once >/dev/null 2>&1 +if ls "$AGENT_QUEUE_ROOT"/review/*.md >/dev/null 2>&1; then + pass "engine-class: agentic-coder (no engine) resolved to devin → review/" +else + ls -R "$AGENT_QUEUE_ROOT" >&2 || true + fail "engine-class job did not resolve+run to review/" +fi + +# 11. idempotency-key dedupe on add. +export AGENT_QUEUE_ROOT="$tmp/queue-idem" +"$AQ" init >/dev/null +ia="$tmp/idem-a.md" +printf '%s\n' '---' 'engine: devin' "cwd: $work" 'yolo: true' 'idempotency-key: dedup-1' '---' '' 'alpha body' > "$ia" +DEVIN_BIN="$stub" "$AQ" add "$ia" >/dev/null +DEVIN_BIN="$stub" "$AQ" add "$ia" >/dev/null # identical key+body → no-op +cnt=$(find "$AGENT_QUEUE_ROOT/inbox" -maxdepth 1 -type f -name '*.md' 2>/dev/null | wc -l | tr -d ' ') +[ "$cnt" = "1" ] && pass "idempotency: same key+body added twice → exactly 1 inbox file" \ + || fail "idempotency: expected 1 inbox file after duplicate add, got $cnt" +# same key, different body, prior STILL in inbox → supersede (still exactly 1) +ib="$tmp/idem-b.md" +printf '%s\n' '---' 'engine: devin' "cwd: $work" 'yolo: true' 'idempotency-key: dedup-1' '---' '' 'beta body (changed)' > "$ib" +DEVIN_BIN="$stub" "$AQ" add "$ib" >/dev/null +cnt=$(find "$AGENT_QUEUE_ROOT/inbox" -maxdepth 1 -type f -name '*.md' 2>/dev/null | wc -l | tr -d ' ') +[ "$cnt" = "1" ] && pass "idempotency: same key + changed body, prior in inbox → superseded (1 file)" \ + || fail "idempotency: expected 1 inbox file after supersede, got $cnt" +# drain (prior leaves inbox → review), then same key + different body → REJECT +DEVIN_BIN="$stub" "$AQ" run --once >/dev/null 2>&1 +ic="$tmp/idem-c.md" +printf '%s\n' '---' 'engine: devin' "cwd: $work" 'yolo: true' 'idempotency-key: dedup-1' '---' '' 'gamma body (changed again)' > "$ic" +if DEVIN_BIN="$stub" "$AQ" add "$ic" >/dev/null 2>&1; then + fail "idempotency: same key + different body (prior past inbox) should be rejected" +else + pass "idempotency: same key + different body, prior past inbox → rejected" +fi +cnt=$(find "$AGENT_QUEUE_ROOT/inbox" -maxdepth 1 -type f -name '*.md' 2>/dev/null | wc -l | tr -d ' ') +[ "$cnt" = "0" ] && pass "idempotency: a rejected add enqueues nothing" \ + || fail "idempotency: rejected add should not enqueue (inbox=$cnt)" + echo "self-test PASS"