Commit Graph

10 Commits

Author SHA1 Message Date
Saravanakumar D
7f77e9abc7 feat(agent-queue): enforce budget.wall as a hard wall-clock ceiling
Parse the wall ceiling from the budget manifest map (budget: { wall: <dur> })
and arm it alongside the per-run timeout. Whichever ceiling fires first binds;
the kill is recorded as result=timeout or result=budget_exceeded accordingly.
budget.wall extends timeout: a job with only a budget.wall (no timeout) is now
hard-killed at the ceiling. budget_exceeded is a terminal, non-retryable class
by default and maps to the failed tracker status.

Adds _budget_wall_secs + _effective_kill helpers (pure, unit-tested) and live
selftest coverage; usd/tokens remain best-effort and are not enforced here.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-30 19:21:49 -07:00
saravanakumardb1
0cde7def6a feat(agent-queue): two-factory parallel demo — Phase 2 exit criteria (§14)
Close the final Phase-2 exit-criteria box: >=2 factories executing jobs in parallel
through one coordinator, proving the concurrency guarantees end-to-end. This is a
DEMO HARNESS over the existing runtime — agent-queue.sh and lib/fleet-client.sh are
unchanged (read + called, not modified).

demo/two-factory-demo.sh: starts two real `agent-queue.sh run` daemons (mac-1 +
ubuntu-1, separate queues/cwds) that compete ONLY through the coordinator, then
asserts: (a) no double-assign — each of 3 jobs executed by exactly one factory;
(b) fencing + reclaim — kill a factory mid-job, the reaper returns its job, the
survivor reclaims + completes it, and the dead worker's late/zombie report (stale
leaseEpoch) is FENCED (HTTP 409, never shipped); (c) parallelism — both factories
hold active jobs concurrently. Dual-mode: CI-safe stateful stub by default; live
platform-service when AQ_FLEET_API/AQ_FLEET_TOKEN set.

demo/coordinator-stub.sh: stateful, mkdir-lock-guarded, file-backed coordinator
implementing claim/lease/fence/renew/release + reaper-reclaim via the existing
AQ_FLEET_API_CMD seam — the selftest stub pattern extended with shared state so
>=2 processes coordinate through one coordinator.

demo/README.md: stub + real invocations, env knobs, what each guarantee proves,
what-to-watch guide.

selftest.sh: +3 headless stub-mode checks (existing 68 unchanged byte-for-byte ->
71 total green).

docs/GIGAFACTORY_ROADMAP.md: tick the §14 two-factory-demo box; annotate Phase-2
exit criteria; bump §0 Phase 2 to 80% (remaining: scheduler-core wiring [common-plat
PR #31], tracker-direct call, factory enrollment).

bash 3.2 + awk/sed/grep/pgrep only; mac+linux safe; no new runtime deps.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-30 01:53:36 -07:00
saravanakumardb1
fbecbe82b6 feat(agent-queue): fleet feature flags + shadow/dual-run (Phase 2)
Add a safe, reversible path to validate the fleet coordinator against the proven
single-host path BEFORE cutover, via three independently-toggleable flags:
  AQ_FLEET=0          pure offline (zero coordinator calls; offline path unchanged)
  AQ_FLEET_ROUTE=1    route_via_service: coordinator authoritative for claim (default = P2-S3)
  AQ_FLEET_ROUTE=0    local inbox authoritative (coordinator not used to source work)
  AQ_FLEET_SHADOW=1   dual-run (needs AQ_FLEET=1 + ROUTE=0): query coordinator in parallel,
                      record divergence, NEVER act on it
Precedence: SHADOW only when ROUTE=0; if ROUTE=1 + SHADOW=1, ROUTE wins (one-shot warning).

lib/fleet-client.sh: fleet_route_enabled / fleet_shadow_enabled / fleet_flags_warn_once /
fleet_flags_state; fleet_shadow_claim (read-only — isolated `-shadow` factoryId +
dryRun, releases any real lease, never materializes), fleet_shadow_compare
(AGREE/DIVERGE/COORD_EMPTY/LOCAL_EMPTY → .state/fleet-shadow.log), fleet_shadow_report
(shadow:true, response never acted on), cmd_fleet_shadow_report (counts + agreement rate).

agent-queue.sh: ROUTE-gate claim sourcing (claim only when route_via_service);
shadow hook after the local authoritative decision each iteration (best-effort,
error-swallowed — shadow can never fail a real job); `fleet-shadow-report` subcommand
+ help; resolved flags surfaced in `status`/`fleet-status`. tryClaim/fence/offline
paths unchanged.

Strictly side-effect-free on real job state: shadow never ships, quarantines, or
mutates real jobs. Offline path byte-for-byte unchanged when AQ_FLEET=0.

selftest.sh: +8 checks (shadow AGREE/DIVERGE/COORD_EMPTY, non-fatal 5xx, ROUTE
precedence, ROUTE=0 local-authoritative, fleet-shadow-report summary, shadow_report
unit). 60 prior checks unchanged → 68 total green. README + GIGAFACTORY_ROADMAP
document the flag model + cutover ladder.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-30 00:22:48 -07:00
saravanakumardb1
064dbf3d8f test(agent-queue): fleet integration selftest cases (P2-S3)
Adds 7 stub-driven fleet cases (AQ_FLEET_API_CMD stub, no live coordinator); never
weakens the prior 53 (full suite now 60 green):
- flag OFF (default): zero coordinator calls; offline job completes unchanged
- register(heartbeat)+claim -> coordinator job materialized + executed to review/
- report+checkpoint: PATCH carries stage+leaseEpoch (+ wipBranch on building)
- FENCING: stale-epoch 409 -> self-abort + quarantine (never shipped)
- lease renew (unit): POST .../lease/renew with current leaseEpoch
- offline-degrade: coordinator 5xx -> job completes locally (degraded), not quarantined
- no-leak: bodyMd/token never appear in report payloads
2026-05-29 22:45:44 -07:00
saravanakumardb1
1e0a17bbc0 test(agent-queue): tracker adapter selftest cases (P1-S4)
Adds (never weakens) 7 stub-driven cases (AQ_TRACKER_API_CMD stub, no live service): from-tracker create + label mapping + idempotent; to-tracker shipped echo (PATCH done + metrics comment, asserts NO prompt body sent) + idempotent; HTTP 500 non-fatal; AQ_TRACKER_AUTO auto-echo on run. Full suite green (53 checks).
2026-05-29 21:35:16 -07:00
saravanakumardb1
71d8a7cd4e test(agent-queue): profiles + deps/DAG selftest cases (P1-S2)
Adds (never weakens) temp-catalog + temp-git cases: profile verify inheritance + job-override precedence, persona-injection golden, profile capability inheritance, allowed-scope warn-only + path_in_scope unit, deps block->run, deps-mode soft (testing/), and submit-time cycle rejection. Full suite green (46 checks).
2026-05-29 19:26:26 -07:00
saravanakumardb1
f46dd38adb test(agent-queue): resilience + insights selftest cases (P1-S3)
Adds (never weakens) temp-git-repo + stub cases: orphan recovery (+idempotent), WIP checkpoint/numstat, non-git skip, WIP resume, retry on verify_failed and crash (incl. no-retry when class absent), parse_usage extraction, per-engine aggregate. Inbox-empty-safe counts; avoids the pipefail+grep -q SIGPIPE trap.
2026-05-29 18:43:30 -07:00
saravanakumardb1
4600a41e5d test(agent-queue): self-test cases for manifest/priority/capabilities/engine-class/idempotency (P1-S1)
Adds (never weakens existing) cases, each in its own temp AGENT_QUEUE_ROOT using
the no-op engine stub:
- backward-compat: legacy engine/cwd/yolo-only .md still lands in review/.
- priority: with --max 1, a critical job queued after a low job runs first
  (order-recording stub).
- capability mismatch: has:definitely-not-installed -> failed/
  result=capability_mismatch, asserting the agent was never launched.
- engine-class: agentic-coder + no engine, DEVIN_BIN stubbed -> review/.
- idempotency: same key+body twice -> 1 inbox file; same key+changed body in
  inbox -> superseded; same key+different body after drain -> rejected.

Inbox counts use find (not a globbing ls) so set -e/pipefail tolerate an empty inbox.
2026-05-29 17:44:27 -07:00
saravanakumardb1
af1bc6904e feat(agent-queue): build/ship lifecycle with auto-QA verify gate + manual ship
Redesign the kanban runner stages from inbox->doing->done/failed to
inbox->building->review->testing->shipped (+ failed):

- worker: agent rc=0 lands in review/, then runs the configurable
  verify command (frontmatter verify: / AGENT_QUEUE_VERIFY) in cwd;
  pass -> testing/ (QA), fail -> failed/, none -> parks in review/
- new commands: ship (testing->shipped, manual gate), promote
  (advance one stage), reject (review/testing->failed); requeue now
  also pulls from review/testing
- status + dashboard.mjs render all six stages; RECENT panel labels
  shipped/testing/review/verify_failed/timeout/rejected
- README: new lifecycle diagram, verify: frontmatter, result= glossary,
  command table + folder layout
- selftest: assert no-verify->review, verify-pass->testing->ship->shipped,
  verify-fail->failed
- rename queue/doing->building, queue/done->review; add testing/ shipped/
2026-05-29 16:03:01 -07:00
saravanakumardb1
9b49c28af5 chore(agent-queue): add self-test harness (shellcheck + no-op run cycle) 2026-05-28 22:07:15 -07:00