bytelyst-devops-tools

Author	SHA1	Message	Date
saravanakumardb1	b8f0369f63	feat(agent-queue): approximate Devin run cost from tokens (model price map) Devin's export has tokens but no USD cost; estimate cost_usd from a per-model $/1M price map (Opus/Sonnet/Haiku) and flag usage_estimated so the dashboard shows it as approx.	2026-05-31 15:58:35 -07:00
saravanakumardb1	c2dbbaf188	feat(agent-queue): report PR state (open/merged) on the run	2026-05-31 13:56:46 -07:00
saravanakumardb1	d6fa1d9e28	feat(agent-queue): PR mode uses existing local repo via git worktree (no clone) When AQ_FLEET_REPO_BASE/<repo> is an existing checkout, create a git worktree off it for branch aq/job/<id> (shares objects + remotes, leaves the main checkout untouched) instead of cloning. Falls back to clone for remote-only repos. selftest exercises the worktree path. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-31 06:28:36 -07:00
saravanakumardb1	b442b95728	feat(agent-queue): per-repo verify + opt-in auto-merge for PR jobs Claim now carries verify (drives the existing verify gate -> PR opens only if it passes) and autoMerge (squash-merge via gh pr merge after the PR opens, non-fatal). selftest covers both. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-31 06:17:28 -07:00
saravanakumardb1	e634d4915f	feat(agent-queue): agent authors PR title + description (.aq_pr.md) In PR mode the agent is asked to write .aq_pr.md (line 1 = PR title, then a markdown description) based on the task + the diff it produced. The factory reads it for `gh pr create` (via --body-file) and removes it before committing (never part of the PR). Falls back to a derived title if absent. selftest asserts the authored title is used and .aq_pr.md is not committed. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-31 05:48:37 -07:00
saravanakumardb1	d0e800247c	feat(agent-queue): PR mode clones from local repo base (AQ_FLEET_REPO_BASE) MVP: when AQ_FLEET_REPO_BASE/<repo> is an existing local checkout, use it as the clone source (fast, no network) and push/PR to its GitHub origin — embedded creds in the local origin URL are stripped (gh credential helper handles auth). Selftest PASS (full-path bare-repo fallback unchanged). Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-31 05:36:46 -07:00
saravanakumardb1	cfbcc2da9d	feat(agent-queue): PR mode — open a PR per fleet job (AQ_FLEET_PR) When AQ_FLEET_PR=1 and a claimed fleet job carries a `repo`, run the agent in an isolated checkout on branch aq/job/<fleetJobId> (off baseBranch), then on a passing verify commit/push and `gh pr create`. The PR URL + branch are recorded in the meta and reported on lease release (-> the coordinator stores them on the run). - fleet-client: parse repo/baseBranch from the claim, carry them in frontmatter; fleet_report_insights now sends prUrl/branch. - _fleet_pr_prepare (clone/fetch + branch, local-path aware, identity fallback) and _fleet_pr_open (commit/push/gh pr create). WIP checkpointing is skipped for PR jobs (the pushed branch is the durable artifact). - New flags: AQ_FLEET_PR, AQ_FLEET_REPOS_DIR, GH_BIN. README documented. - selftest: +1 case (bare-repo origin + gh stub) — branch pushed, PR opened, prUrl reported on release. Full self-test PASS. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-31 05:27:41 -07:00
saravanakumardb1	df65b7a245	feat(agent-queue): report testing + optional autoship to the fleet (close testing->shipped) Previously the factory reported up to `review` and "shipping is always manual", so a coordinator job never reached a terminal stage autonomously. - On a passing local verify, always report `testing` to the coordinator so its stage reflects that QA passed (was stuck at `review`). - New AQ_FLEET_AUTOSHIP=1: the factory's verify gate IS the test phase, so advance the coordinator job testing -> shipped and land it in shipped/ locally. This closes the testing->shipped gap for an autonomous submit -> shipped pipeline. Default off keeps the human review gate authoritative (job rests at testing). selftest: +2 cases (autoship reports testing+shipped + lands in shipped/; autoship OFF reports testing but withholds shipped). Full self-test PASS. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-31 04:21:44 -07:00
saravanakumardb1	8085501506	feat(agent-queue): extract Devin token usage from the conversation export Devin does not surface token/cost in its stdout or local log, so parse_usage previously emitted nothing for the devin engine (runs showed no metrics). Devin DOES expose per-step usage in its ATIF conversation export. - build_agent_cmd: pass `--export <path>` for the devin engine (path derived from the job log path so parse_usage can find it; harmless 4th arg for other engines). - parse_usage devin: read the export and sum per-step metadata.metrics input_tokens / output_tokens / cache_read_tokens; take model from agent.model_name. Pure grep/awk, no new dependency. USD cost is left unset (the export carries token counts but not cost) — the dashboard shows tokens + model, cost stays blank. These feed fleet_report_insights, so live devin fleet runs now report tokens + model to the coordinator (verified live: model "Claude Opus 4.8", tokensIn/out + cache populated on a real run). selftest: +1 case (parse_usage devin sums per-step tokens + model from --export). Full self-test PASS. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-31 02:55:11 -07:00
saravanakumardb1	57831e3e7a	feat(agent-queue): report run insights to the fleet + normalize API base #1 fleet_report_insights: on a successful fleet run the factory now reports the parsed cost/token/effort metrics (model, tokensIn/Out/cached, costUsd, turns, toolCalls) plus the run result onto the coordinator run via POST .../lease/release (which also frees the lease). parse_usage already extracted these into the job meta; they were never sent. Engines that do not expose usage locally (devin) still land result + endedAt. #2 normalize AQ_FLEET_API: platform-service mounts fleet under /api, so a base without it silently returned 404 on every call. Strip a trailing slash and append /api unless already present, so AQ_FLEET_API=http://host:4003 works too. selftest: +2 cases (insights reported via lease/release; API-base normalization). Full self-test PASS. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-31 02:27:51 -07:00
Saravanakumar D	7f77e9abc7	feat(agent-queue): enforce budget.wall as a hard wall-clock ceiling Parse the wall ceiling from the budget manifest map (budget: { wall: <dur> }) and arm it alongside the per-run timeout. Whichever ceiling fires first binds; the kill is recorded as result=timeout or result=budget_exceeded accordingly. budget.wall extends timeout: a job with only a budget.wall (no timeout) is now hard-killed at the ceiling. budget_exceeded is a terminal, non-retryable class by default and maps to the failed tracker status. Adds _budget_wall_secs + _effective_kill helpers (pure, unit-tested) and live selftest coverage; usd/tokens remain best-effort and are not enforced here. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-30 19:21:49 -07:00
saravanakumardb1	fbecbe82b6	feat(agent-queue): fleet feature flags + shadow/dual-run (Phase 2) Add a safe, reversible path to validate the fleet coordinator against the proven single-host path BEFORE cutover, via three independently-toggleable flags: AQ_FLEET=0 pure offline (zero coordinator calls; offline path unchanged) AQ_FLEET_ROUTE=1 route_via_service: coordinator authoritative for claim (default = P2-S3) AQ_FLEET_ROUTE=0 local inbox authoritative (coordinator not used to source work) AQ_FLEET_SHADOW=1 dual-run (needs AQ_FLEET=1 + ROUTE=0): query coordinator in parallel, record divergence, NEVER act on it Precedence: SHADOW only when ROUTE=0; if ROUTE=1 + SHADOW=1, ROUTE wins (one-shot warning). lib/fleet-client.sh: fleet_route_enabled / fleet_shadow_enabled / fleet_flags_warn_once / fleet_flags_state; fleet_shadow_claim (read-only — isolated `-shadow` factoryId + dryRun, releases any real lease, never materializes), fleet_shadow_compare (AGREE/DIVERGE/COORD_EMPTY/LOCAL_EMPTY → .state/fleet-shadow.log), fleet_shadow_report (shadow:true, response never acted on), cmd_fleet_shadow_report (counts + agreement rate). agent-queue.sh: ROUTE-gate claim sourcing (claim only when route_via_service); shadow hook after the local authoritative decision each iteration (best-effort, error-swallowed — shadow can never fail a real job); `fleet-shadow-report` subcommand + help; resolved flags surfaced in `status`/`fleet-status`. tryClaim/fence/offline paths unchanged. Strictly side-effect-free on real job state: shadow never ships, quarantines, or mutates real jobs. Offline path byte-for-byte unchanged when AQ_FLEET=0. selftest.sh: +8 checks (shadow AGREE/DIVERGE/COORD_EMPTY, non-fatal 5xx, ROUTE precedence, ROUTE=0 local-authoritative, fleet-shadow-report summary, shadow_report unit). 60 prior checks unchanged → 68 total green. README + GIGAFACTORY_ROADMAP document the flag model + cutover ladder. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-30 00:22:48 -07:00
saravanakumardb1	1d84712b47	feat(agent-queue): wire runner to fleet coordinator at minimal hook points (P2-S3) Sources lib/fleet-client.sh and adds a few fleet_enabled-gated hooks so the offline git-queue path is byte-for-byte unchanged when AQ_FLEET is unset/0: - cmd_run: register at loop start; per-iteration heartbeat (cadence) + lease renew for in-flight fleet jobs + claim one coordinator job into inbox when capacity. - meta: persist fleet_job_id + fleet_lease_epoch (from claim frontmatter). - run_worker: report `building` (with WIP checkpoint) after WIP setup and `review` before accepting the agent's output — a FENCED (stale-epoch/409) report self-aborts and quarantines (never ships); 5xx/unreachable degrades (finish locally). - _auto_echo: for fleet jobs route the outcome echo through the coordinator (fleet_events) instead of the direct tracker echo; offline jobs unchanged. - cmd_ship: fence-check before shipping a fleet job; release lease after. - status: show factory id + per-job fleet=<id>@e<epoch>; insights lists fleet_* fields. - dispatch + help: `fleet-status` command + a FLEET env section.	2026-05-29 22:45:44 -07:00
saravanakumardb1	b7a9ea1b7a	feat(agent-queue): tracker adapter — task <-> job round-trip (P1-S4) Implements §10 single-host tracker integration, closing the last Phase-1 §14 item: - tracker_api: one curl-only HTTP wrapper (base URL + bearer + productId header), overridable via AQ_TRACKER_API_CMD so tests need no live service. Emits the response body + a trailing HTTP-code line; _api_call splits into API_BODY/API_CODE. - aq from-tracker <ITEM_ID>: GET the Item, map title/description -> job body, labels (engine-class:/profile:/priority:/cap:) + Item priority -> frontmatter, and stamp tracker-item + a stable idempotency-key tracker-<id>. Materializes a .md into inbox/ via cmd_add; idempotent (Slice 1 dedupe) so a re-pull never dups. JSON parsed with POSIX awk (no jq) — mac + linux safe. - aq to-tracker <job>: one-way echo (child -> tracker, §24.5). PATCHes the Item status (building/review/testing->in_progress, shipped->done, failures->wont_fix, all overridable) and posts a metrics-only comment (result/attempts/duration/ tokens/cost/diff — NEVER prompt content or secrets). Idempotent via meta tracker_echoed; an echo failure (e.g. HTTP 500) is logged and non-fatal — the tracker is downstream, never authoritative for execution. - Opt-in auto-echo (AQ_TRACKER_AUTO=1, default OFF): the worker echoes on each transition (building via cmd_run, review/testing/failed via run_worker, shipped via ship/promote); never blocks or fails a job. - status + insights surface tracker-item and the last echoed status. curl-only HTTP; no new runtime deps; conventional + backward-compatible.	2026-05-29 21:35:06 -07:00
saravanakumardb1	3d99f04427	feat(agent-queue): profiles (persona + presets) and single-host deps/DAG (P1-S2) Implements roadmap §6 (profiles) and §5 deps on the bash runner, backward-compatible (jobs without profile/deps behave exactly as before). Profiles (§6): - profile_get / profile_persona / fm_eff helpers + PROFILES_DIR (AGENT_QUEUE_PROFILES override). A job's `profile:` inherits verify (<- default-verify), capabilities, engine-class, prefers-engine, allowed-scope, review-policy when the job omits them; job fields always override (precedence job > profile > default). Resolution runs via fm_eff inside the capability gate and resolve_engine, so inherited caps/engine-class take effect before launch. - persona injection: the profile's persona block is prepended to the stripped body fed to the engine (job .md unchanged on disk; nothing secret logged). - allowed-scope guardrail (WARN-ONLY): scope_check logs a non-blocking WARNING + records scope_warning= for changed paths outside the globs; path_in_scope is a pure, unit-testable matcher (`dir/**` = subtree). deps / DAG, single host (§5): - deps reference other jobs by idempotency-key. dep_satisfied: shipped/ (hard) or shipped/+testing/ (deps-mode: soft). deps_unmet drives a block-with-reason skip in inbox selection (never launched/failed); cmd_status surfaces "blocked (waiting on <keys>)". deps_would_cycle rejects cyclic submits on `add`. - _drain_pending: `--once` drains past dep-blocked jobs (idle can't satisfy them) while still waiting on retry/recovery backoff timers. Meta now records effective (inherited) capabilities/engine-class/prefers-engine/ review-policy/allowed-scope so `status` reflects resolved config.	2026-05-29 19:26:16 -07:00
saravanakumardb1	1758bc1ab1	feat(agent-queue): single-host crash recovery, WIP checkpoint/resume, retry + insights (P1-S3) Implements the single-host bash equivalents of roadmap §25 (durability/crash recovery) and §26 (execution insights), plus §11 retry/dead-letter stand-in. Resilience (A1-A4): - recover_orphans + `recover` command: building/ jobs with a dead worker (dead pid, pidstart reuse-guard) are moved back to inbox/ with attempts incremented, on `run` startup and each loop. Idempotent (folder location is the guard). - WIP checkpointing: for a git cwd, _wip_start creates/checks out aq/wip/<job> and _wip_checkpoint commits changes on every exit path via an EXIT/INT/TERM trap; never commits to main/current branch; non-git cwd skipped. RESUME: a relaunch whose aq/wip/<job> exists checks it out first (continue from checkpoint). wip_base persisted in a write-once sidecar. - retry policy (now functional): retry { max, backoff, on } requeues failures whose class (timeout|verify_failed|crash) is in `on`, honoring backoff via next_eligible (selection skips until eligible), up to max attempts; exhaustion -> failed/ result=retries_exhausted with the WIP branch + full log preserved. - state integrity: all meta writes stay append-only; attempts/next_eligible/wip_* are re-derivable; recovery is crash-safe. Insights (B1-B6): - per-run metrics into meta: duration_s, exit, result, attempts, and (git cwd) files_changed/lines_added/lines_deleted from numstat wip_base..HEAD. - parse_usage(engine, log) adapter: generic AQ_USAGE line + Claude/Codex token heuristics; Devin/Copilot TODO; usage_estimated flag; never fabricates numbers. - status insights sub-line; new `insights [job]` command (per-job metrics or a recent table + per-engine token/cost/success/duration rollup). - privacy: only metrics are recorded, never prompt content or secrets. Backward-compatible: legacy .md and non-git cwd behave exactly as before.	2026-05-29 18:43:21 -07:00
saravanakumardb1	0be5b34123	feat(agent-queue): evolved manifest, priority, capabilities, engine-class, idempotency (P1-S1) Implements Gigafactory Phase 1 - Slice 1 in the bash runner (backward-compatible; a legacy engine/cwd/yolo-only .md behaves exactly as before): - Parse all new §5 manifest keys via fm_get with safe defaults; record them in <job>.meta and surface priority/profile/capabilities/tracker-item in `status`. Only priority, capabilities, engine-class and idempotency-key are functional this slice; the rest (profile, prefers, budget, deps, deps-mode, retry, review-policy, artifacts, tracker-item) are stored but inert. - priority ordering: inbox_sorted picks critical>high>medium>low, ties by oldest; per-lock serialization preserved. - capability grammar + match: detect_capabilities advertises os/engine/node/has tokens; caps_match honors key, key:value, key<op>version and os:any. A job whose declared capabilities the host cannot satisfy is moved to failed/ with result=capability_mismatch and the agent is never launched. - engine-class resolution: explicit engine wins; else engine-class picks the first available engine honoring prefers-engine (agentic-coder->devin,claude,codex; chat-coder->copilot). No available engine -> result=no_engine. Adds copilot to the engine driver + COPILOT_BIN. - idempotency-key dedupe on add: same key+body -> no-op; same key+different body supersedes an inbox prior, else is rejected with a clear error. No change to queue/ data or the run/ship lifecycle. macOS + Linux safe.	2026-05-29 17:44:19 -07:00
saravanakumardb1	4ed4d75a67	feat(agent-queue): default max concurrency 2->3 (still env/flag configurable) - AGENT_QUEUE_MAX default 3 (override via env or run --max N) - sync README quick-start + env table + bytelyst-cli example to --max 3	2026-05-29 16:09:12 -07:00
saravanakumardb1	af1bc6904e	feat(agent-queue): build/ship lifecycle with auto-QA verify gate + manual ship Redesign the kanban runner stages from inbox->doing->done/failed to inbox->building->review->testing->shipped (+ failed): - worker: agent rc=0 lands in review/, then runs the configurable verify command (frontmatter verify: / AGENT_QUEUE_VERIFY) in cwd; pass -> testing/ (QA), fail -> failed/, none -> parks in review/ - new commands: ship (testing->shipped, manual gate), promote (advance one stage), reject (review/testing->failed); requeue now also pulls from review/testing - status + dashboard.mjs render all six stages; RECENT panel labels shipped/testing/review/verify_failed/timeout/rejected - README: new lifecycle diagram, verify: frontmatter, result= glossary, command table + folder layout - selftest: assert no-verify->review, verify-pass->testing->ship->shipped, verify-fail->failed - rename queue/doing->building, queue/done->review; add testing/ shipped/	2026-05-29 16:03:01 -07:00
saravanakumardb1	27feba36fa	fix(agent-queue): status used undefined live_workers; call active_workers	2026-05-29 15:27:15 -07:00
saravanakumardb1	1f15520c4f	feat(agent-queue): add requeue and clean commands - requeue <job>: move a failed job back to inbox/ and drop stale meta/body so it re-runs cleanly - clean [--keep N]: archive finished jobs' logs+meta beyond the newest N (default 50) into queue/.archive/<ts>/; running jobs + .md records untouched - document both in usage + bytelyst-cli subcommand list	2026-05-28 22:31:56 -07:00
saravanakumardb1	4239648876	fix(agent-queue): verify pid start time to defeat pid reuse Record pidstart (ps lstart) at launch and verify it in all liveness checks (_meta_active, status, stop) via _pid_alive, so a recycled pid can never be mistaken for our worker. Falls back to plain liveness when no start time recorded.	2026-05-28 22:24:50 -07:00
saravanakumardb1	a849a30e11	feat(agent-queue): refuse a second run when a daemon is already active cmd_run now checks daemon.pid liveness up front: if a run loop is alive it exits with an error (protecting the single-launcher invariant locking depends on); a stale daemon.pid (dead pid) is cleared and the run proceeds.	2026-05-28 22:21:31 -07:00
saravanakumardb1	11935d0539	fix(agent-queue): reserve concurrency slot before backgrounding worker Replace live_workers with reservation-aware active_workers + shared _meta_active: a job counts toward --max the moment its meta is written (before the worker is backgrounded), so --max can never be exceeded. A <30s guard prevents a meta orphaned mid-launch from pinning a slot. busy_keys now shares _meta_active.	2026-05-28 22:17:36 -07:00
saravanakumardb1	79331d591f	feat(agent-queue): flag stalled workers in status + dash Mark a running worker '⚠ stalled' when its log has not changed for more than AGENT_QUEUE_STALL_MIN minutes (default 10), using log mtime as the freshness signal. Implemented in both the bash status table and the Node dashboard.	2026-05-28 22:15:26 -07:00
saravanakumardb1	3b71f0117a	feat(agent-queue): per-job timeout via frontmatter timeout: Honor 'timeout: 45m' (90s|45m|2h|1d) by wrapping the agent in timeout/gtimeout when available (hard process-tree kill), else a portable bash watchdog. On expiry the job moves doing->failed with result=timeout and a TIMED OUT log line.	2026-05-28 22:13:50 -07:00
saravanakumardb1	f14e6c2336	feat(agent-queue): per-cwd locking so two agents never share a repo Serialize jobs by lock key (frontmatter 'lock:' override, default cwd) via the single run-loop's pre-launch eligibility check; the oldest non-busy job is picked regardless of --max. Adds a flock-based worker guard where flock exists (Linux); macOS relies on the single-daemon model. Records lock= in job meta.	2026-05-28 22:10:30 -07:00
saravanakumardb1	169e944c3c	feat(agent-queue): Node live dashboard + bytelyst-cli integration - dashboard.mjs: zero-dep Node TUI (running workers w/ engine, elapsed, cwd, last log line + recent done/failed); 'dash' subcommand execs it - bytelyst-cli.sh: 'agent-queue' / 'aq' passthrough handled before the GITHUB_TOKEN + jq/curl gates; usage + interactive-menu entry - README: document dash + bytelyst-cli usage	2026-05-28 21:39:25 -07:00
saravanakumardb1	179108504f	feat(agent-queue): folder-kanban runner for devin/claude/codex CLIs Add a zero-dependency, bash 3.2-compatible queue runner that executes prompt .md files through headless coding-agent CLIs in auto-approve mode, moving them inbox -> doing -> done/failed with per-job logs and live status. - pluggable engine drivers (devin --prompt-file, claude/codex via stdin) - per-task YAML frontmatter: engine, cwd, yolo - subcommands: init, add, run (--max N), status, watch, stop, logs - runtime queue/ state gitignored	2026-05-28 21:35:59 -07:00
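
Commit 57831e3e7a describes normalizing AQ_FLEET_API by stripping a trailing slash and appending /api unless it is already present. The rule could be sketched roughly as below; the function name `normalize_fleet_api` is hypothetical and is not taken from the repository, and the real implementation may differ.

```shell
# Hypothetical helper (name assumed for illustration, not from the repo).
# Normalizes a fleet API base URL: drop one trailing slash, then make sure
# the base ends in /api.
normalize_fleet_api() {
  local base
  base="${1%/}"              # strip a single trailing slash, if any
  case "$base" in
    */api) ;;                # already ends in /api: leave unchanged
    *) base="$base/api" ;;   # otherwise append /api
  esac
  printf '%s\n' "$base"
}

normalize_fleet_api "http://host:4003"       # prints http://host:4003/api
normalize_fleet_api "http://host:4003/api/"  # prints http://host:4003/api
```

Using only parameter expansion and `case` keeps the sketch free of external tools, in line with the runner's stated zero-dependency goal.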

29 Commits
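
Commits 3b71f0117a and 7f77e9abc7 describe duration values such as 90s, 45m, 2h and 1d, and a rule where the per-run timeout and budget.wall are both armed and whichever fires first decides the recorded result. A minimal sketch of that logic follows. The names `dur_to_secs` and `effective_kill` are hypothetical (the log only names `_budget_wall_secs` and `_effective_kill` without showing them), and the tie-break in favor of `timeout` is an assumption.

```shell
# Hypothetical helpers (names and exact behavior assumed for illustration).

# Convert a duration like 90s, 45m, 2h, 1d (or a bare number of seconds)
# to seconds. Returns non-zero on malformed input.
dur_to_secs() {
  local d="$1" n mult
  case "$d" in
    *s) n="${d%s}"; mult=1 ;;
    *m) n="${d%m}"; mult=60 ;;
    *h) n="${d%h}"; mult=3600 ;;
    *d) n="${d%d}"; mult=86400 ;;
    *)  n="$d";     mult=1 ;;
  esac
  case "$n" in ''|*[!0-9]*) return 1 ;; esac   # digits only
  echo $((n * mult))
}

# Given timeout seconds and budget.wall seconds (either may be empty),
# print "<seconds> <result>" for the ceiling that fires first.
# Assumption: on a tie, the timeout is reported.
effective_kill() {
  local t="$1" w="$2"
  if [ -n "$w" ] && { [ -z "$t" ] || [ "$w" -lt "$t" ]; }; then
    echo "$w budget_exceeded"
  elif [ -n "$t" ]; then
    echo "$t timeout"
  fi
}

effective_kill "$(dur_to_secs 45m)" "$(dur_to_secs 10m)"   # prints 600 budget_exceeded
```

With only a wall ceiling set, `effective_kill "" 600` still yields a kill time, which matches the log's note that a job with budget.wall and no timeout is hard-killed at the ceiling.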