bytelyst-devops-tools/agent-queue/docs/jobs/phase2-feature-flags-shadow.md

---
engine: devin
cwd: /Users/sd9235/code/mygh/learning_ai_devops_tools
yolo: true
lock: agent-queue
timeout: 4h
---

ROLE: Senior bash + distributed-systems engineer. Implement PHASE 2 — FLEET FEATURE FLAGS
+ SHADOW / DUAL-RUN for the agent-queue runner: a safe, reversible path to validate the
fleet coordinator against the proven single-host (P1) behavior BEFORE any real cutover.

PARALLEL-SAFETY (another Devin is running in a DIFFERENT repo — learning_ai_common_plat —
on enrollment/tokens; no file overlap with you. Stay within the agent-queue repo):
- You OWN: agent-queue/lib/fleet-client.sh, agent-queue/agent-queue.sh (the fleet hook
  points only), agent-queue/selftest.sh, agent-queue/README.md,
  agent-queue/docs/GIGAFACTORY/GIGAFACTORY_ROADMAP.md.
- Keep the offline git-queue path unchanged when fleet is off. All 60 existing selftest
  checks MUST stay green.

READ FIRST:
- agent-queue/lib/fleet-client.sh — the P2-S3 client: fleet_enabled, fleet_api,
  fleet_claim, fleet_report, lease renew/release, fleet_quarantine. You EXTEND this.
- agent-queue/agent-queue.sh — the run loop + the existing fleet hook points + the offline
  path (cmd_add/run_worker/ship). Study how AQ_FLEET gates everything today.
- agent-queue/docs/GIGAFACTORY/GIGAFACTORY_ROADMAP.md §9 (split-brain / offline degrade), §16/§17
  (feature flags fleet.enabled / fleet.route_via_service), §27 (cutover & rollback).

PREREQUISITE / BRANCHING: branch off CURRENT main → feat/gigafactory-p2-flags-shadow.
Push + open PR. DO NOT merge.

FLAG MODEL (three explicit flags, each set independently; document precedence where they interact):
- AQ_FLEET=0|1            master switch (exists). 0 ⇒ pure offline, zero coordinator calls.
- AQ_FLEET_ROUTE=0|1      route_via_service: when 1 (and AQ_FLEET=1) the coordinator is
                          AUTHORITATIVE for claim/assignment (today's P2-S3 behavior).
                          When 0, the LOCAL inbox is authoritative (coordinator not used to
                          source work) — this is the pre-cutover state.
- AQ_FLEET_SHADOW=0|1     shadow/dual-run: when 1 (requires AQ_FLEET=1, AQ_FLEET_ROUTE=0)
                          the runner does its normal OFFLINE/local processing as the
                          authoritative path, and IN PARALLEL queries the coordinator
                          (shadow claim + shadow report) WITHOUT acting on its responses —
                          purely to compare decisions and record divergence. Shadow NEVER
                          ships, quarantines, or mutates real job state.
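
ILLUSTRATIVE SKETCH (not prescriptive): one way the three flags above could resolve into a
single mode, with ROUTE winning over SHADOW. Assumptions: unset flags default to 0 (the spec
does not fix defaults), `log_warn` and `fleet_mode` are hypothetical names, and `fleet_enabled`
here is only a stand-in for the existing P2-S3 helper.

```bash
#!/usr/bin/env bash
# Sketch only: flag resolution with the precedence described in FLAG MODEL.

# Hypothetical logger; the real script may log differently.
log_warn() { printf 'WARN: %s\n' "$*" >&2; }

# Stand-in for the existing P2-S3 helper (master switch).
fleet_enabled() { [ "${AQ_FLEET:-0}" = "1" ]; }

# ROUTE only counts when the master switch is on.
fleet_route_enabled() { fleet_enabled && [ "${AQ_FLEET_ROUTE:-0}" = "1" ]; }

# SHADOW only counts when AQ_FLEET=1 and ROUTE=0; ROUTE wins with a warning.
fleet_shadow_enabled() {
  fleet_enabled || return 1
  [ "${AQ_FLEET_SHADOW:-0}" = "1" ] || return 1
  if fleet_route_enabled; then
    log_warn "AQ_FLEET_ROUTE=1 overrides AQ_FLEET_SHADOW=1; shadow disabled"
    return 1
  fi
  return 0
}

# Resolved mode for status output: offline | route | shadow | local
fleet_mode() {
  if ! fleet_enabled; then echo offline
  elif fleet_route_enabled; then echo route
  elif fleet_shadow_enabled; then echo shadow
  else echo local
  fi
}
```

A resolver like `fleet_mode` could also back the flag summary in `aq status`, so the printed
state and the runtime gating cannot drift apart.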

DELIVERABLES
1. fleet-client.sh additions (all guarded; no-ops unless their flag is on):
   - fleet_route_enabled / fleet_shadow_enabled helpers (precedence: SHADOW only meaningful
     when ROUTE=0; if both ROUTE=1 and SHADOW=1, ROUTE wins and a warning is logged).
   - fleet_shadow_claim — asks the coordinator what it WOULD assign for this factory's caps,
     without claiming a lease for real (read-only / dry-run). If the API has no dry-run,
     either claim and immediately release the lease, or use a shadow factoryId; pick the
     least invasive option and document it. Returns the would-be job id (or none).
   - fleet_shadow_compare — given the LOCAL decision (the job the offline path actually ran)
     and the coordinator's would-be decision, classify AGREE / DIVERGE / COORD_EMPTY /
     LOCAL_EMPTY and append a structured line to a shadow log
     (agent-queue/queue/.state/fleet-shadow.log: ts, localJob, coordJob, verdict).
   - fleet_shadow_report — mirrors stage transitions to the coordinator as shadow events
     (clearly flagged shadow=1) so reporting is exercised, but divergence in the coordinator
     response is logged, never acted on.
2. agent-queue.sh wiring (minimal, flag-gated):
   - run loop: if SHADOW on, after the local authoritative decision each iteration, call
     fleet_shadow_claim + fleet_shadow_compare (best-effort, error-swallowed — shadow must
     NEVER fail a real job).
   - ROUTE flag: thread it so claim sourcing honors it (ROUTE=1 ⇒ coordinator-sourced as
     today; ROUTE=0 ⇒ local inbox authoritative even when AQ_FLEET=1).
   - new subcommand `aq fleet-shadow-report` — summarize the shadow log (counts of
     AGREE/DIVERGE/…, last N divergences). Add to dispatch + help.
   - surface the three flags' resolved state in `aq status` / `aq fleet-status`.
3. Cutover safety: document the recommended rollout ladder in README — (1) AQ_FLEET=1,
   ROUTE=0, SHADOW=1 (observe, zero risk) → (2) inspect agreement rate → (3) flip ROUTE=1
   once agreement is high → rollback = set ROUTE=0 (and/or AQ_FLEET=0) at any time.
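
ILLUSTRATIVE SKETCH (not prescriptive): a possible shape for the compare step and its
best-effort call from the run loop. Assumptions: the `key=value` log line syntax,
`fleet_shadow_verdict`, `fleet_shadow_step`, and the `AQ_SHADOW_LOG` override are all
hypothetical; treating "both empty" as AGREE is a guess the spec does not settle; and
`fleet_shadow_claim` is expected to be defined elsewhere and to print the would-be job id.

```bash
#!/usr/bin/env bash
# Sketch only. AQ_SHADOW_LOG is a hypothetical override for illustration;
# the default is the log path named in the deliverables.
AQ_SHADOW_LOG="${AQ_SHADOW_LOG:-agent-queue/queue/.state/fleet-shadow.log}"

# Classify the local decision against the coordinator's would-be decision.
# An empty string means "no job". Both-empty => AGREE is an assumption.
fleet_shadow_verdict() {
  local local_job="$1" coord_job="$2"
  if [ -z "$local_job" ] && [ -z "$coord_job" ]; then echo AGREE
  elif [ -z "$local_job" ]; then echo LOCAL_EMPTY
  elif [ -z "$coord_job" ]; then echo COORD_EMPTY
  elif [ "$local_job" = "$coord_job" ]; then echo AGREE
  else echo DIVERGE
  fi
}

# Append one structured line: ts, localJob, coordJob, verdict ("-" = none).
fleet_shadow_compare() {
  local local_job="$1" coord_job="$2" verdict ts
  verdict="$(fleet_shadow_verdict "$local_job" "$coord_job")"
  ts="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  mkdir -p "$(dirname "$AQ_SHADOW_LOG")"
  printf 'ts=%s localJob=%s coordJob=%s verdict=%s\n' \
    "$ts" "${local_job:--}" "${coord_job:--}" "$verdict" >> "$AQ_SHADOW_LOG"
}

# Best-effort wrapper for the run loop: the subshell isolates any failure,
# and the function always returns 0 so shadow cannot fail a real job.
fleet_shadow_step() {
  local local_job="$1"
  (
    coord_job="$(fleet_shadow_claim)" || exit 1
    fleet_shadow_compare "$local_job" "$coord_job"
  ) || printf 'shadow-error: claim/compare failed for %s\n' "${local_job:--}" >&2
  return 0
}
```

In the run loop this would be called once per iteration, after the local decision, as
`fleet_shadow_step "$job_id"` (where `$job_id` is whatever variable holds the local job).
Job ids containing spaces would need escaping in this line format.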

TESTS — extend selftest.sh (stub the coordinator like the P2-S3 fleet stub; all 60 prior
checks stay green):
- flags off: AQ_FLEET=0 ⇒ zero coordinator calls (incl. shadow); offline flow identical.
- shadow agree: stub returns the same job the local path runs ⇒ shadow log records AGREE;
  the real job still ships via the offline/local path; coordinator state NOT mutated for real.
- shadow diverge: stub returns a different job or no job ⇒ DIVERGE or COORD_EMPTY logged,
  respectively; real job still completes; nothing quarantined.
- shadow is non-fatal: coordinator 5xx/timeout during shadow ⇒ real job still completes,
  exit 0, a shadow-error noted.
- ROUTE precedence: ROUTE=1 + SHADOW=1 ⇒ ROUTE path taken, warning logged, no shadow compare.
- ROUTE=0 + AQ_FLEET=1 ⇒ local inbox is authoritative (coordinator not used to source work).
- fleet-shadow-report summarizes the log counts correctly.
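
ILLUSTRATIVE SKETCH (not prescriptive): the summary behind `aq fleet-shadow-report` could be
a single POSIX awk pass over the shadow log, which also makes the last test case easy to
assert against a fixture. Assumptions: `fleet_shadow_summary` is a hypothetical name, and the
log uses a `key=value` line syntax (the spec fixes the fields, not the syntax).

```bash
#!/usr/bin/env bash
# Sketch only: summarize a shadow log whose lines look like
#   ts=... localJob=... coordJob=... verdict=AGREE
# Usage: fleet_shadow_summary <log-file> [last-N-divergences]
fleet_shadow_summary() {
  local log="$1" last_n="${2:-5}"
  [ -f "$log" ] || { echo "no shadow log at $log"; return 0; }
  awk -v n="$last_n" '
    {
      # Find the verdict=... field wherever it sits on the line.
      verdict = ""
      for (i = 1; i <= NF; i++)
        if ($i ~ /^verdict=/) verdict = substr($i, 9)
      if (verdict == "") next        # skip malformed lines
      count[verdict]++
      total++
      if (verdict == "DIVERGE") div[++nd] = $0
    }
    END {
      printf "total=%d AGREE=%d DIVERGE=%d COORD_EMPTY=%d LOCAL_EMPTY=%d\n",
        total, count["AGREE"], count["DIVERGE"],
        count["COORD_EMPTY"], count["LOCAL_EMPTY"]
      # Print only the last n divergences, oldest first.
      start = nd - n + 1
      if (start < 1) start = 1
      for (i = start; i <= nd; i++) print "divergence: " div[i]
    }
  ' "$log"
}
```

A selftest case could write a few known lines to a temp file, call
`fleet_shadow_summary "$tmp" 1`, and compare the first output line against the expected counts.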

VERIFY GATE:
- bash agent-queue/selftest.sh   (60 prior + new shadow/flag cases; none weakened)
- bash -n agent-queue/agent-queue.sh && bash -n agent-queue/lib/fleet-client.sh
- shellcheck --severity=error agent-queue/agent-queue.sh agent-queue/lib/fleet-client.sh
- node --check agent-queue/dashboard.mjs (sanity check; this file is expected to stay unchanged)

CONSTRAINTS: bash + curl + POSIX awk only (no jq/new deps); reuse P2-S3 helpers; shadow must
be strictly side-effect-free on real job state; offline path unchanged when AQ_FLEET=0;
never hardcode tokens; conventional commits (feat(agent-queue): ...); never weaken a test;
do not edit the common-plat repo.

FINAL OUTPUT — report in EXACTLY this format:
## Implementation Report — Phase 2 Feature Flags + Shadow/Dual-run
### Branch & commits / PR
### Files changed
### What was implemented (flag model + precedence, shadow claim/compare/report, cutover ladder)
### Tests added (+ selftest summary = 60 prior + N new; esp. flags-off no-op, shadow non-fatal, ROUTE precedence)
### Verify gate results
### Deviations / assumptions (how shadow claim avoids real lease mutation)
### Suggested next slice