# Two-Factory Parallel Demo (Phase-2 Exit Criteria, §14) This demo closes the final Phase-2 exit-criteria box: **≥2 factories executing jobs in parallel through one coordinator**, proving the concurrency guarantees end-to-end. It is a **harness over the existing runtime** — it does *not* change `agent-queue.sh` or `lib/fleet-client.sh`; it starts two real `agent-queue.sh run` daemons (distinct factoryIds, separate queues/cwds) that compete **only** through the coordinator, then observes and asserts. ## The three guarantees it proves | # | Guarantee | How it's shown | |---|-----------|----------------| | **(a)** | **No double-assign** | Each of the 3 jobs is claimed/executed by exactly **one** factory. The coordinator's atomic claim (lock-guarded; only a `queued` job is claimable) means two concurrent claimers never get the same job version. | | **(b)** | **Fencing + reclaim** | One factory is **killed mid-job**. The reaper returns its in-flight job to `queued` with a **bumped lease epoch**; the surviving factory **reclaims and completes** it. The dead worker's late/zombie report (stale epoch) is **fenced (HTTP 409)** and never ships. | | **(c)** | **Parallelism** | Both factories hold an active job **simultaneously** (observed in coordinator state) — work is concurrent, not serialized. | ## Run it ### Stub mode (default, zero dependencies, CI-safe) ```bash bash demo/two-factory-demo.sh ``` Drives [`coordinator-stub.sh`](coordinator-stub.sh) — a stateful, lock-guarded, file-backed coordinator that implements the same claim / lease / fence / reaper contract as platform-service, via the existing `AQ_FLEET_API_CMD` test seam. No platform-service, no Cosmos, no network. This is exactly what `selftest.sh` runs headlessly. ### Real-coordinator mode (against a live platform-service) ```bash DEMO_MODE=real \ AQ_FLEET_API=http://localhost:4003/api \ AQ_FLEET_TOKEN= \ AQ_PRODUCT_ID= \ bash demo/two-factory-demo.sh ``` In real mode the demo submits via the platform-service fleet API and relies on the coordinator's **own lease reaper** to reclaim the killed factory's job (it waits `DEMO_REAP_WAIT` seconds; pair with a short `AQ_FLEET_LEASE_SECONDS` so the lease expires quickly). Submit endpoint is overridable via `DEMO_SUBMIT_PATH` (default `/fleet/jobs`). Real mode is observational/best-effort — the machine-checked assertions run in stub mode (and in `selftest.sh`). ## Env knobs | Var | Default | Meaning | |-----|---------|---------| | `DEMO_MODE` | `stub` | `stub` or `real` (auto-set to `real` when `AQ_FLEET_API`+`AQ_FLEET_TOKEN` are set and `DEMO_MODE` ≠ `stub`) | | `DEMO_JOB_SLEEP` | `2` | per-job engine seconds — the window during which the victim is killed mid-job | | `DEMO_TIMEOUT` | `60` | max seconds to wait for the survivor to drain all 3 jobs | | `DEMO_POLL` | `0.2` | coordinator-state poll interval | | `DEMO_FACTORY_1` / `DEMO_FACTORY_2` | `mac-1` / `ubuntu-1` | factory ids (F1 is the victim) | | `DEMO_KEEP` | `0` | `1` keeps the temp dir (queues, logs, coordinator state) for inspection | | `DEMO_REAP_WAIT` / `DEMO_DRAIN_WAIT` | `20` / `30` | real-mode waits for the coordinator reaper / drain | ## What to watch The demo prints a step-by-step trace and a final `RESULTS` block. The key lines: - `PARALLELISM observed: mac-1 and ubuntu-1 both holding active jobs concurrently` — guarantee (c). - `killed factory mac-1 ... mid-job` then `reaper reclaimed mac-1's lease(s)` — the crash + reclaim. - `zombie report for @epoch=N was FENCED (HTTP 409)` — guarantee (b) fencing. - `RESULTS` shows each job's winning factory; the reclaimed job's winner is the **survivor**. With `DEMO_KEEP=1`, inspect under the printed temp dir: - `coord/events.log` — the coordinator's audit trail: `CLAIM` / `PATCH:` / `RECLAIM` / `FENCE` events (factory + epoch on each). - `coord/jobs/.job` — final per-job `stage` / `holder` / `epoch`. - `log-mac-1.txt`, `log-ubuntu-1.txt` — each factory's run-loop log (claims, the `▶ launching`, the fenced/quarantine path on the killed worker). ## Files - `two-factory-demo.sh` — the orchestrator (start factories, kill/reclaim/fence, assert). - `coordinator-stub.sh` — the stateful coordinator stub (claim/patch/fence/renew/release/reap, mkdir-locked). - `start-fleet.example.sh` — reference launcher for a **real** multi-product local fleet against a live platform-service (one `agent-queue.sh run` daemon per product). Parameterized via env; ships the two settings you must get right — `AQ_FLEET_GATE=1` (M0 RU gate) and `AQ_FLEET_LEASE_RENEW_SEC=30` (heartbeat cadence < the 90s stale threshold). Copy + adjust for your sandbox.