bytelyst-devops-tools

Author	SHA1	Message	Date
saravanakumardb1	c2dbbaf188	feat(agent-queue): report PR state (open/merged) on the run	2026-05-31 13:56:46 -07:00
saravanakumardb1	d6fa1d9e28	feat(agent-queue): PR mode uses existing local repo via git worktree (no clone) When AQ_FLEET_REPO_BASE/<repo> is an existing checkout, create a git worktree off it for branch aq/job/<id> (shares objects + remotes, leaves the main checkout untouched) instead of cloning. Falls back to clone for remote-only repos. selftest exercises the worktree path. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-31 06:28:36 -07:00
saravanakumardb1	b442b95728	feat(agent-queue): per-repo verify + opt-in auto-merge for PR jobs Claim now carries verify (drives the existing verify gate -> PR opens only if it passes) and autoMerge (squash-merge via gh pr merge after the PR opens, non-fatal). selftest covers both. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-31 06:17:28 -07:00
saravanakumardb1	e634d4915f	feat(agent-queue): agent authors PR title + description (.aq_pr.md) In PR mode the agent is asked to write .aq_pr.md (line 1 = PR title, then a markdown description) based on the task + the diff it produced. The factory reads it for `gh pr create` (via --body-file) and removes it before committing (never part of the PR). Falls back to a derived title if absent. selftest asserts the authored title is used and .aq_pr.md is not committed. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-31 05:48:37 -07:00
saravanakumardb1	cfbcc2da9d	feat(agent-queue): PR mode — open a PR per fleet job (AQ_FLEET_PR) When AQ_FLEET_PR=1 and a claimed fleet job carries a `repo`, run the agent in an isolated checkout on branch aq/job/<fleetJobId> (off baseBranch), then on a passing verify commit/push and `gh pr create`. The PR URL + branch are recorded in the meta and reported on lease release (-> the coordinator stores them on the run). - fleet-client: parse repo/baseBranch from the claim, carry them in frontmatter; fleet_report_insights now sends prUrl/branch. - _fleet_pr_prepare (clone/fetch + branch, local-path aware, identity fallback) and _fleet_pr_open (commit/push/gh pr create). WIP checkpointing is skipped for PR jobs (the pushed branch is the durable artifact). - New flags: AQ_FLEET_PR, AQ_FLEET_REPOS_DIR, GH_BIN. README documented. - selftest: +1 case (bare-repo origin + gh stub) — branch pushed, PR opened, prUrl reported on release. Full self-test PASS. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-31 05:27:41 -07:00
saravanakumardb1	df65b7a245	feat(agent-queue): report testing + optional autoship to the fleet (close testing->shipped) Previously the factory reported up to `review` and "shipping is always manual", so a coordinator job never reached a terminal stage autonomously. - On a passing local verify, always report `testing` to the coordinator so its stage reflects that QA passed (was stuck at `review`). - New AQ_FLEET_AUTOSHIP=1: the factory's verify gate IS the test phase, so advance the coordinator job testing -> shipped and land it in shipped/ locally. This closes the testing->shipped gap for an autonomous submit -> shipped pipeline. Default off keeps the human review gate authoritative (job rests at testing). selftest: +2 cases (autoship reports testing+shipped + lands in shipped/; autoship OFF reports testing but withholds shipped). Full self-test PASS. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-31 04:21:44 -07:00
saravanakumardb1	8085501506	feat(agent-queue): extract Devin token usage from the conversation export Devin does not surface token/cost in its stdout or local log, so parse_usage previously emitted nothing for the devin engine (runs showed no metrics). Devin DOES expose per-step usage in its ATIF conversation export. - build_agent_cmd: pass `--export <path>` for the devin engine (path derived from the job log path so parse_usage can find it; harmless 4th arg for other engines). - parse_usage devin: read the export and sum per-step metadata.metrics input_tokens / output_tokens / cache_read_tokens; take model from agent.model_name. Pure grep/awk, no new dependency. USD cost is left unset (the export carries token counts but not cost) — the dashboard shows tokens + model, cost stays blank. These feed fleet_report_insights, so live devin fleet runs now report tokens + model to the coordinator (verified live: model "Claude Opus 4.8", tokensIn/out + cache populated on a real run). selftest: +1 case (parse_usage devin sums per-step tokens + model from --export). Full self-test PASS. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-31 02:55:11 -07:00
saravanakumardb1	57831e3e7a	feat(agent-queue): report run insights to the fleet + normalize API base #1 fleet_report_insights: on a successful fleet run the factory now reports the parsed cost/token/effort metrics (model, tokensIn/Out/cached, costUsd, turns, toolCalls) plus the run result onto the coordinator run via POST .../lease/release (which also frees the lease). parse_usage already extracted these into the job meta; they were never sent. Engines that do not expose usage locally (devin) still land result + endedAt. #2 normalize AQ_FLEET_API: platform-service mounts fleet under /api, so a base without it silently returned 404 on every call. Strip a trailing slash and append /api unless already present, so AQ_FLEET_API=http://host:4003 works too. selftest: +2 cases (insights reported via lease/release; API-base normalization). Full self-test PASS. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-31 02:27:51 -07:00
Saravanakumar D	66c91233da	feat(agent-queue): re-point TUI dashboard at /fleet API (parity) Add an opt-in fleet mode to the dashboard so an operator can drive the coordinator fleet from the same TUI used for the local folder queue. - lib/fleet-dash.mjs: dependency-injectable read/act adapter over the platform-service /fleet REST surface (jobs, metrics, factories, events, ship/requeue/reject). Pure-ish + fully unit-testable without a live service. - dashboard.mjs: render + act in fleet mode when AQ_FLEET_DASH=1 — board with counts, factories (per-factory rows or metrics aggregate), alerts, running (by lease/factory), actionable JOBS with manifest tags, recent, and a per-job events log. Single-flight async refresh keeps the last good board on failure; ship re-GETs a fresh leaseEpoch before PATCH; run/stop/promote are disabled (no safe server contract). Local mode is byte-for-byte unchanged. - lib/fleet-dash.test.mjs: 22 node:assert assertions (config, stage mapping, toBoard, fetch headers/timeout/errors, board assembly + graceful degradation, events, job actions) wired into selftest.sh. - docs: tick the Phase 3 "TUI re-pointed at /fleet" roadmap boxes. Verified: selftest.sh green (incl. new fleet-dash checks); live non-TTY render smoke against a stub /fleet server (both factories and metrics-aggregate paths); local mode unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-30 19:47:56 -07:00
Saravanakumar D	7f77e9abc7	feat(agent-queue): enforce budget.wall as a hard wall-clock ceiling Parse the wall ceiling from the budget manifest map (budget: { wall: <dur> }) and arm it alongside the per-run timeout. Whichever ceiling fires first binds; the kill is recorded as result=timeout or result=budget_exceeded accordingly. budget.wall extends timeout: a job with only a budget.wall (no timeout) is now hard-killed at the ceiling. budget_exceeded is a terminal, non-retryable class by default and maps to the failed tracker status. Adds _budget_wall_secs + _effective_kill helpers (pure, unit-tested) and live selftest coverage; usd/tokens remain best-effort and are not enforced here. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-30 19:21:49 -07:00
saravanakumardb1	0cde7def6a	feat(agent-queue): two-factory parallel demo — Phase 2 exit criteria (§14) Close the final Phase-2 exit-criteria box: >=2 factories executing jobs in parallel through one coordinator, proving the concurrency guarantees end-to-end. This is a DEMO HARNESS over the existing runtime — agent-queue.sh and lib/fleet-client.sh are unchanged (read + called, not modified). demo/two-factory-demo.sh: starts two real `agent-queue.sh run` daemons (mac-1 + ubuntu-1, separate queues/cwds) that compete ONLY through the coordinator, then asserts: (a) no double-assign — each of 3 jobs executed by exactly one factory; (b) fencing + reclaim — kill a factory mid-job, the reaper returns its job, the survivor reclaims + completes it, and the dead worker's late/zombie report (stale leaseEpoch) is FENCED (HTTP 409, never shipped); (c) parallelism — both factories hold active jobs concurrently. Dual-mode: CI-safe stateful stub by default; live platform-service when AQ_FLEET_API/AQ_FLEET_TOKEN set. demo/coordinator-stub.sh: stateful, mkdir-lock-guarded, file-backed coordinator implementing claim/lease/fence/renew/release + reaper-reclaim via the existing AQ_FLEET_API_CMD seam — the selftest stub pattern extended with shared state so >=2 processes coordinate through one coordinator. demo/README.md: stub + real invocations, env knobs, what each guarantee proves, what-to-watch guide. selftest.sh: +3 headless stub-mode checks (existing 68 unchanged byte-for-byte -> 71 total green). docs/GIGAFACTORY_ROADMAP.md: tick the §14 two-factory-demo box; annotate Phase-2 exit criteria; bump §0 Phase 2 to 80% (remaining: scheduler-core wiring [common-plat PR #31], tracker-direct call, factory enrollment). bash 3.2 + awk/sed/grep/pgrep only; mac+linux safe; no new runtime deps. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-30 01:53:36 -07:00
saravanakumardb1	fbecbe82b6	feat(agent-queue): fleet feature flags + shadow/dual-run (Phase 2) Add a safe, reversible path to validate the fleet coordinator against the proven single-host path BEFORE cutover, via three independently-toggleable flags: AQ_FLEET=0 pure offline (zero coordinator calls; offline path unchanged) AQ_FLEET_ROUTE=1 route_via_service: coordinator authoritative for claim (default = P2-S3) AQ_FLEET_ROUTE=0 local inbox authoritative (coordinator not used to source work) AQ_FLEET_SHADOW=1 dual-run (needs AQ_FLEET=1 + ROUTE=0): query coordinator in parallel, record divergence, NEVER act on it Precedence: SHADOW only when ROUTE=0; if ROUTE=1 + SHADOW=1, ROUTE wins (one-shot warning). lib/fleet-client.sh: fleet_route_enabled / fleet_shadow_enabled / fleet_flags_warn_once / fleet_flags_state; fleet_shadow_claim (read-only — isolated `-shadow` factoryId + dryRun, releases any real lease, never materializes), fleet_shadow_compare (AGREE/DIVERGE/COORD_EMPTY/LOCAL_EMPTY → .state/fleet-shadow.log), fleet_shadow_report (shadow:true, response never acted on), cmd_fleet_shadow_report (counts + agreement rate). agent-queue.sh: ROUTE-gate claim sourcing (claim only when route_via_service); shadow hook after the local authoritative decision each iteration (best-effort, error-swallowed — shadow can never fail a real job); `fleet-shadow-report` subcommand + help; resolved flags surfaced in `status`/`fleet-status`. tryClaim/fence/offline paths unchanged. Strictly side-effect-free on real job state: shadow never ships, quarantines, or mutates real jobs. Offline path byte-for-byte unchanged when AQ_FLEET=0. selftest.sh: +8 checks (shadow AGREE/DIVERGE/COORD_EMPTY, non-fatal 5xx, ROUTE precedence, ROUTE=0 local-authoritative, fleet-shadow-report summary, shadow_report unit). 60 prior checks unchanged → 68 total green. README + GIGAFACTORY_ROADMAP document the flag model + cutover ladder. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-30 00:22:48 -07:00
saravanakumardb1	064dbf3d8f	test(agent-queue): fleet integration selftest cases (P2-S3) Adds 7 stub-driven fleet cases (AQ_FLEET_API_CMD stub, no live coordinator); never weakens the prior 53 (full suite now 60 green): - flag OFF (default): zero coordinator calls; offline job completes unchanged - register(heartbeat)+claim -> coordinator job materialized + executed to review/ - report+checkpoint: PATCH carries stage+leaseEpoch (+ wipBranch on building) - FENCING: stale-epoch 409 -> self-abort + quarantine (never shipped) - lease renew (unit): POST .../lease/renew with current leaseEpoch - offline-degrade: coordinator 5xx -> job completes locally (degraded), not quarantined - no-leak: bodyMd/token never appear in report payloads	2026-05-29 22:45:44 -07:00
saravanakumardb1	1e0a17bbc0	test(agent-queue): tracker adapter selftest cases (P1-S4) Adds (never weakens) 7 stub-driven cases (AQ_TRACKER_API_CMD stub, no live service): from-tracker create + label mapping + idempotent; to-tracker shipped echo (PATCH done + metrics comment, asserts NO prompt body sent) + idempotent; HTTP 500 non-fatal; AQ_TRACKER_AUTO auto-echo on run. Full suite green (53 checks).	2026-05-29 21:35:16 -07:00
saravanakumardb1	71d8a7cd4e	test(agent-queue): profiles + deps/DAG selftest cases (P1-S2) Adds (never weakens) temp-catalog + temp-git cases: profile verify inheritance + job-override precedence, persona-injection golden, profile capability inheritance, allowed-scope warn-only + path_in_scope unit, deps block->run, deps-mode soft (testing/), and submit-time cycle rejection. Full suite green (46 checks).	2026-05-29 19:26:26 -07:00
saravanakumardb1	f46dd38adb	test(agent-queue): resilience + insights selftest cases (P1-S3) Adds (never weakens) temp-git-repo + stub cases: orphan recovery (+idempotent), WIP checkpoint/numstat, non-git skip, WIP resume, retry on verify_failed and crash (incl. no-retry when class absent), parse_usage extraction, per-engine aggregate. Inbox-empty-safe counts; avoids the pipefail+grep -q SIGPIPE trap.	2026-05-29 18:43:30 -07:00
saravanakumardb1	4600a41e5d	test(agent-queue): self-test cases for manifest/priority/capabilities/engine-class/idempotency (P1-S1) Adds (never weakens existing) cases, each in its own temp AGENT_QUEUE_ROOT using the no-op engine stub: - backward-compat: legacy engine/cwd/yolo-only .md still lands in review/. - priority: with --max 1, a critical job queued after a low job runs first (order-recording stub). - capability mismatch: has:definitely-not-installed -> failed/ result=capability_mismatch, asserting the agent was never launched. - engine-class: agentic-coder + no engine, DEVIN_BIN stubbed -> review/. - idempotency: same key+body twice -> 1 inbox file; same key+changed body in inbox -> superseded; same key+different body after drain -> rejected. Inbox counts use find (not a globbing ls) so set -e/pipefail tolerate an empty inbox.	2026-05-29 17:44:27 -07:00
saravanakumardb1	af1bc6904e	feat(agent-queue): build/ship lifecycle with auto-QA verify gate + manual ship Redesign the kanban runner stages from inbox->doing->done/failed to inbox->building->review->testing->shipped (+ failed): - worker: agent rc=0 lands in review/, then runs the configurable verify command (frontmatter verify: / AGENT_QUEUE_VERIFY) in cwd; pass -> testing/ (QA), fail -> failed/, none -> parks in review/ - new commands: ship (testing->shipped, manual gate), promote (advance one stage), reject (review/testing->failed); requeue now also pulls from review/testing - status + dashboard.mjs render all six stages; RECENT panel labels shipped/testing/review/verify_failed/timeout/rejected - README: new lifecycle diagram, verify: frontmatter, result= glossary, command table + folder layout - selftest: assert no-verify->review, verify-pass->testing->ship->shipped, verify-fail->failed - rename queue/doing->building, queue/done->review; add testing/ shipped/	2026-05-29 16:03:01 -07:00
saravanakumardb1	9b49c28af5	chore(agent-queue): add self-test harness (shellcheck + no-op run cycle)	2026-05-28 22:07:15 -07:00
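
Note on commit 57831e3e7a (normalize AQ_FLEET_API): the rule described there, strip a trailing slash and append /api unless it is already present, can be sketched roughly as below. This is an illustration only, not the code in lib/fleet-client.sh; the function name `normalize_fleet_api` is hypothetical, and stripping every trailing slash (rather than exactly one) is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the API-base normalization described in
# commit 57831e3e7a. The name and exact behavior are assumptions, not the
# repository's implementation. Uses only bash 3.2-compatible constructs.
normalize_fleet_api() {
  local base="$1"
  # Strip trailing slashes (assumption: all of them, not just one).
  while [ "${base%/}" != "$base" ]; do
    base="${base%/}"
  done
  # Append /api unless the base already ends with it.
  case "$base" in
    */api) ;;
    *) base="$base/api" ;;
  esac
  printf '%s\n' "$base"
}

# All three forms resolve to the same base URL.
normalize_fleet_api "http://host:4003"
normalize_fleet_api "http://host:4003/"
normalize_fleet_api "http://host:4003/api/"
```

Under these assumptions each call prints http://host:4003/api, which matches the commit's claim that AQ_FLEET_API=http://host:4003 works without the /api suffix.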

19 Commits
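
Note on commit 7f77e9abc7 (budget.wall): the message says the per-run timeout and the budget.wall ceiling are armed together, whichever fires first binds, and the kill is recorded as result=timeout or result=budget_exceeded accordingly. A minimal sketch of that selection follows. It is not the repository's `_effective_kill` helper; the name `effective_kill_sketch`, the seconds-as-integers inputs, the "0 none" output for no ceiling, and the tie-break in favor of timeout are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of "whichever ceiling fires first binds".
# Arguments: $1 = per-run timeout in seconds, $2 = budget.wall in seconds.
# An empty or 0 value means "not set". Prints "<seconds> <result class>".
effective_kill_sketch() {
  local timeout="${1:-0}" wall="${2:-0}"
  if [ "$timeout" -gt 0 ] && [ "$wall" -gt 0 ]; then
    # Both ceilings set: the smaller one binds (assumption: ties go to timeout).
    if [ "$wall" -lt "$timeout" ]; then
      echo "$wall budget_exceeded"
    else
      echo "$timeout timeout"
    fi
  elif [ "$wall" -gt 0 ]; then
    # Only budget.wall set: the job is still hard-killed at the ceiling.
    echo "$wall budget_exceeded"
  elif [ "$timeout" -gt 0 ]; then
    echo "$timeout timeout"
  else
    # Neither ceiling set (assumed representation).
    echo "0 none"
  fi
}

effective_kill_sketch 600 300   # prints: 300 budget_exceeded
effective_kill_sketch 600 ""    # prints: 600 timeout
effective_kill_sketch "" 900    # prints: 900 budget_exceeded
```

The third call covers the case the commit highlights: a job with only a budget.wall and no timeout still gets a hard ceiling.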