# Fleet Dispatch Redesign — Broker-Backed, On-Demand Factories > Design proposal (no code yet). Companion to `GIGAFACTORY_SYSTEM_OVERVIEW.md` > (what exists today) and `GIGAFACTORY_ROADMAP.md` (source-of-truth spec). This > doc realizes roadmap **Phase 4** ("Message bus + autoscaling") and the > routing-model cleanup that comes with it. Last reviewed: **2026-05-31**. > > **Review log** > - v1 (2026-05-31): initial proposal. > - v2 (2026-05-31): self-review pass — reconciled the routing model > (coordinator-targeted as primary), fixed the Cosmos outbox transactionality > claim (change feed *is* the log), constrained message size (jobId + routing > props only), addressed long-job vs Service Bus 5-min lock, corrected the > idempotency key (`MessageId = jobId`), renamed migration steps `M0–M3` to > avoid collision with roadmap phases, fixed the Phase-0 RU figure, and added a > ticked roadmap checklist + auth/observability notes. > - v3 (2026-05-31): added **§5.5 Error handling & cleanup** (current behavior + > lease-release-on-failure, branch/worktree GC, same-repo worktree clobber). > Review fixes: unified the field name to `targetFactoryId` (§5.1), reconciled > §5.3 with the complete-on-claim model (broker is not the redelivery path), > aligned §6 token scoping with per-factory subscriptions, and added the GC / > `POST /fleet/fail` checklist block to §12. --- ## 1. Why this doc exists (the two smells) Two structural problems surfaced while running the local fleet against `tracker-web` + `platform-service`: ### 1.1 Product-as-queue is conflated with repo-as-work-target - `fleet_jobs` is partitioned by **`/productId`**, and a factory is bound to a single product via `AQ_PRODUCT_ID`. The job's **`repo`** is just a payload field (the PR target). Routing uses `productId`; the repo is orthogonal. - Consequence observed: a `learning_ai_notes` job submitted via the form was filed under **`chronomind`** (because the form's Factory dropdown maps `mac-2 → chronomind`), and would have opened a PR to the notes repo from a "chronomind" factory. Nothing ties the product to the repo, and nothing guarantees the chosen factory even has that repo checked out. - The form (`dashboards/tracker-web/.../fleet/jobs/page.tsx`) hardcodes `FLEET_FACTORIES = [mac-1→lysnrai, mac-2→chronomind]` and defaults `capabilities = "build"` — a capability **no agent-queue factory ever advertises** (`detect_capabilities` only emits `os:*`, `engine:*`, `node:*`, `has:*`). So default UI submissions are unroutable to live factories. ### 1.2 Pull-poll daemons burn Cosmos RU to stay "ready" - The run loop iterates every **`POLL_SECONDS=3`**; with `AQ_FLEET_ROUTE=1` (default) each iteration calls `POST /fleet/claim`. - `claimNextJob` runs `repo.listJobs({ productId })` — **reads every job doc in the product partition, no stage filter, no limit** — on every claim, plus a `getLease` point-read per active job when preemption is on. - One process **per product** (`_start_fleet.sh` spawns 4) ⇒ ~`4 × (1/3s)` ≈ **115k claim queries/day at idle**, each scaling with partition size, billed continuously whether or not work exists. The machine must also stay up running the loop. > **Root cause:** `productId` is doing double duty as *tenant/billing scope* and > *work-routing queue*, and work discovery is a busy-poll against the state store. --- ## 2. Goals, non-goals, constraints **Goals** - Eliminate idle-poll RU cost; pay (near) zero when there is no work. - Make a factory a **generic build worker** (host + capabilities + engines + checked-out repos), not a product-bound process. - Route work by what actually matters (**capabilities + repo**), while keeping per-product **billing, budgets, visibility, and token scoping**. - Preserve the existing **weighted scheduler** and **leaseEpoch fencing** (exactly-once assignment, zombie-writer protection). - Enable later **on-demand spawn** (scale-to-zero) without re-architecting. **Non-goals (this phase)** - Replacing Cosmos as the system of record for job/lease/event/budget **state**. - Rewriting the scheduler's scoring math. - Multi-region / cross-cloud dispatch. **Hard constraints (ecosystem rules)** - Every Cosmos doc keeps a `productId` (platform rule) — product stays a first-class **tag**, even when it is no longer the routing key. - Per-product budgets (`fleet_budgets /productId`), enrollment tokens (§12), and the `tracker-web` per-product views must keep working. - Changes must be flag-gated and reversible (match the existing `AQ_FLEET` / `AQ_FLEET_ROUTE` / `AQ_FLEET_SHADOW` cutover discipline). --- ## 3. Decision summary 1. **Do NOT build A3 ("single shared queue") inside Cosmos.** A single logical queue tempts a hot partition; scaling it forces a synthetic partition key and a **cross-partition "find next job" query**, which *increases* RU — the opposite of the goal. It also dissolves the per-product isolation the platform's tenancy/budget/token model depends on. 2. **Get the shared-queue behavior from a real broker (B3), not from Cosmos.** Adopt **Azure Service Bus** as the dispatch substrate. Cosmos remains product-partitioned for **state**; the broker owns **delivery**. 3. **Keep the scheduler.** Use a **coordinator-owns-scheduling / broker-owns-delivery hybrid** (B2 ⊕ B3): the coordinator decides *which factory* should run a job and pushes a **targeted** message; the broker handles transport, visibility timeout, retries, and dead-lettering. 4. **Ship the cheap RU win first (B1) as step M0** — it is reversible, needs no new infra, and de-risks the broker migration by removing the bleed while the bigger change is built and shadowed. > Net: the shared-queue *experience* (generic workers, one work stream) comes > from Service Bus topics/subscriptions; Cosmos stays `/productId`-partitioned > for state, budgets, and visibility. --- ## 4. Target architecture ### 4.1 Components & ownership | Concern | Owner (target) | Notes | | --- | --- | --- | | Job/lease/event/budget **state** | **Cosmos** (`/productId`, `/jobId` as today) | unchanged system of record | | **Scheduling** (which factory) | **Coordinator** (platform-service) | existing weighted scorer + preemption | | **Dispatch / delivery** | **Service Bus** | competing consumers, visibility timeout, DLQ | | **Fencing** (zombie writers) | **Cosmos `leaseEpoch`** | broker visibility ≠ correctness boundary | | Per-product billing/budgets/tokens | **Cosmos + coordinator** | enforced at submit + assign, not by partition | | Control planes | `tracker-web`, `agent-queue` dashboard | unchanged REST surface | ### 4.2 Service Bus topology - One **topic** `fleet-dispatch`. - **Primary model — coordinator-targeted (preserves the scheduler):** the coordinator picks the factory, then publishes a message stamped with `targetFactoryId`. Each factory has its **own subscription** with a **correlation filter** `targetFactoryId = ''`. The broker does no policy — it just delivers the scorer's decision. **This is the model the rest of this doc assumes.** - **Fallback model — self-select (only if the scheduler is disabled):** capability/repo **SQL filters** on message application properties let consumers self-match. Multi-valued `capabilities` do **not** filter cleanly as one string, so encode each as a boolean property (`cap_os_mac=true`, `repo_learning_ai_notes=true`) rather than `LIKE '%…%'`. Subscription filters are why Service Bus beats Storage Queue / SQS (which can't filter → a queue-per-class sprawl). - **Messages stay small.** A message carries only `{ jobId, productId, repo, caps, priority, targetFactoryId }` — **not** `bodyMd`/manifest. The consumer reads the full job from Cosmos by `jobId`. (Service Bus max message is **256 KB** Standard / 1 MB Premium; job bodies can approach that — reinforcing "broker = transport, Cosmos = state".) - **DLQ** per subscription ⇒ maps onto `failed` / `retries_exhausted`. - **Sessions** (optional) keyed by `repo` to serialize same-repo work and avoid worktree/branch contention on one host. ### 4.3 Why this keeps the scheduler A vanilla broker is FIFO competing-consumers and does **no** weighted scoring. To preserve the existing scorer (`capabilityFit / affinity / load / costFit / health / starvation`) + preemption + seat limits, the coordinator stays in the decision path: it **selects the target factory** and publishes a message whose filter routes it to *that* factory's subscription (or a per-factory subscription). The broker is transport, not policy. --- ## 5. Key flows ### 5.1 Submit → dispatch (consistency) The **Cosmos change feed on `fleet_jobs` is the durable, ordered event log**, so no separate outbox container is needed for the primary design: 1. `submitJob` writes the `fleet_jobs` doc (`stage: queued`). That write *is* the event. 2. A single **dispatcher** (coordinator process) tails the `fleet_jobs` change feed (via a lease container), runs the scheduler for each new/`queued` job, stamps `targetFactoryId` on the job (CAS), and **publishes** the targeted Service Bus message. 3. **Crash-safe & idempotent:** the change feed redelivers from the last checkpoint on dispatcher restart; Service Bus **duplicate detection** keyed on **`MessageId = jobId`** collapses re-publishes. The consumer is idempotent because the authoritative claim is a Cosmos CAS on `leaseEpoch` — a second delivery is simply fenced (`leaseEpoch` is assigned *at claim*, so it is **not** a valid dedup key for the message itself). > A separate **transactional outbox** is only needed if you ever publish *inline* > at submit instead of via the change feed. Cross-container writes are **not > atomic** in Cosmos, so an outbox row would have to live in the **same container > + same partition** as the job and be written with a **Cosmos transactional > batch** — or, simpler, carried as an `outboxState` field on the job doc itself. > The change-feed design avoids this entirely. > Net effect: the per-factory busy-poll is replaced by one change-feed-driven > dispatcher. Idle cost is event-driven, not a per-3s full-partition scan. ### 5.2 Deliver → claim → fence 1. Factory **receives** a message (long-poll/`receiveMessages`, no RU). 2. Factory calls `POST /fleet/claim` (or a lighter `/fleet/accept`) with `{ jobId, factoryId }`. Coordinator does the **CAS lease** in Cosmos exactly as today (`revUpdateJob` + `leaseEpoch` bump) and returns the new epoch. 409 ⇒ fenced ⇒ factory abandons the message (it goes back / to DLQ). 3. The **broker lock** governs redelivery (a dead consumer's message reappears); the **Cosmos `leaseEpoch`** governs *correctness* (a zombie writer is rejected on PATCH). Two distinct mechanisms — do not collapse them. 4. **Long-running jobs vs the broker lock.** Service Bus message lock max is **5 minutes**; a coding job runs far longer. Two viable patterns: - **(recommended) complete-on-claim:** complete the message immediately after a successful Cosmos claim. The **Cosmos lease + reaper** then own liveness — on crash the reaper sets the job back to `queued`, which is a change-feed event that **re-dispatches** (§5.1). This decouples job runtime from the 5-min lock entirely. - **renew-lock:** keep the message locked and call `renewMessageLock` on a timer, reusing the existing `AQ_FLEET_LEASE_RENEW_SEC` cadence to renew *both* the Cosmos lease and the broker lock. Simpler delivery semantics, but couples runtime to the broker and risks redelivery storms on long jobs. ### 5.3 Failure / retry / DLQ Assumes the recommended **complete-on-claim** model (§5.2): the broker message is completed at claim, so the broker is **not** the redelivery path — re-dispatch is driven by Cosmos stage changes through the change feed (§5.1). - **Logical failure** (engine error / verify-fail) ⇒ coordinator transitions `failed` and **releases the lease immediately** (new `/fleet/fail`, see §5.5); no redelivery (a logical failure is terminal unless a retry policy applies). - **Retryable failure** ⇒ coordinator sets the job back to `queued` (attempts++, backoff) ⇒ change-feed re-dispatch to the next best factory. - **Crash / lease-expiry** ⇒ the **reaper** reclaims the Cosmos lease (bumps `leaseEpoch`, fencing the dead holder) and returns the job to `queued` ⇒ change-feed re-dispatch. (With the alternative *renew-lock* model, broker redelivery is the trigger instead — pick one, not both.) - **Exhausted retries** ⇒ Cosmos `retries_exhausted`; mirror to the broker DLQ for visibility. ### 5.4 Routing model (the §1.1 fix) - Job carries `repo` + required `capabilities` (real tokens: `os:*`, `engine:*`, `has:git`, plus a new `repo:` token). - The **scheduler** does the matching: it picks among factories that advertise those caps **and** have the repo locally (or can clone it), then targets the winner (§4.2 primary model: message stamped `targetFactoryId`, delivered via that factory's correlation-filtered subscription). - **Product is a property/tag** used for billing/visibility and budget checks — **not** the routing key. (In the self-select fallback, product/caps/repo become subscription SQL filters instead.) - Fix the `tracker-web` form in lockstep: derive factories/repos from live data, drop the bogus default `capabilities = "build"`, and stop hardcoding `mac-1/mac-2`. ### 5.5 Error handling & cleanup (worktrees, branches, leases) **Today (single-host, agent-queue.sh).** The worker already handles errors well: the stage machine routes `timeout`/`budget_exceeded`/`crash`/`verify_failed`/ `capability_mismatch`/`no_engine` through `_finish_failure` (→ `failed/`, with a retry policy that requeues to `inbox/` with backoff); a `trap` writes a WIP checkpoint to `aq/wip/` on **every** exit path; `recover_orphans` requeues dead-worker `building/` jobs; and a **FENCED** report (stale `leaseEpoch`) triggers `fleet_quarantine` → `failed/` that **never ships or merges** (split-brain guard). PR/merge cleanup: `.aq_pr.md` is removed before commit; the PR branch `aq/job/` is deleted on auto-merge (`--delete-branch`); the repo worktree is force-recreated at the next job for that repo. **Gaps this redesign must close.** These are real loose ends in the current code: 1. **No client-side lease release on failure.** `_finish_failure` is fleet-agnostic, so a failed fleet job's lease only frees on **expiry** via the reaper — slow recovery. Target: a `POST /fleet/fail` (stage=`failed`/`queued` + release lease) so failure is reflected and the lease freed **immediately**. 2. **Unbounded git artifacts.** `aq/wip/` branches are never GC'd; worktrees are cleaned only on reuse; unmerged `aq/job/` branches accumulate on origin when auto-merge is off or blocked by branch protection. Target: a periodic **GC** sweep — delete merged `aq/job/*`, prune stale worktrees, and sweep `aq/wip/*` after a job reaches a terminal/shipped stage. 3. **Same-repo concurrency can clobber a worktree.** The per-repo worktree is force-recreated, so two same-repo jobs on one host collide. Target: **Service Bus sessions keyed by `repo`** (serialize same-repo work) plus a per-`(host, repo)` lock as a local backstop. **Target invariants.** - Terminal failure ⇒ Cosmos `failed` + lease released now (no expiry wait); DLQ mirrors `retries_exhausted` for visibility. - Crash / fence ⇒ reaper bumps `leaseEpoch` (fences zombie) ⇒ `queued` ⇒ change-feed re-dispatch (§5.3). - Cleanup is **explicit and idempotent** — safe to re-run, never deletes a branch with unmerged work or a worktree with an in-flight job. (Checklist in §12.) --- ## 6. Per-product tenancy without product-partitioned queues - **Budgets:** checked by the coordinator at **assign time** (it already reads `fleet_budgets /productId` in `claimNextJob`); unchanged, just moved to the dispatcher. - **Tokens (§12):** the factory token still scopes `productId + capabilities + factoryId`. In the primary (coordinator-targeted) model the dispatcher only ever targets a factory the scheduler deemed eligible, and the coordinator **re-checks the token on `/fleet/claim`** — so least-privilege holds without relying on the subscription topology. (In the self-select fallback, scope it with per-product/per-token subscription filters instead.) - **Visibility:** `tracker-web` keeps querying per product (state is still product-partitioned), so the UX is unchanged. --- ## 7. Alternatives considered | Option | Verdict | Reason | | --- | --- | --- | | **A3 shared queue in Cosmos** | ✗ | hot partition; cross-partition claim = more RU; loses tenancy isolation | | **A1 validate ownership only** | partial | fixes "wrong factory" but not the RU/poll model or process-per-product | | **Storage Queue / SQS broker** | ✗ (for now) | no subscription filters ⇒ queue-per-capability sprawl; weaker DLQ/visibility ergonomics | | **B2 change feed, no broker** | viable | good for dispatch signal, but still needs a transport to *reach* factories; pairs naturally with B3 | | **Plain competing-consumers (drop scheduler)** | ✗ | throws away weighted scoring + preemption + cost/affinity routing | | **B3 Service Bus + coordinator hybrid** | ✓ chosen | zero idle RU, keeps scheduler + fencing, filters give capability/repo routing, paves path to B4 | --- ## 8. Phased migration > Steps are labelled **M0–M3** to avoid collision with the roadmap's Phase 0–5 > numbering; all of M0–M3 sit *inside* roadmap **Phase 4**. The ticked checklist > is in §12. ### M0 — RU quick win (no new infra, fully reversible) — *do now* - Add a per-product `queue_version`/`pending_count` doc bumped on submit/stage change. The factory's loop does a **1-RU point-read** of that doc each tick and only runs the expensive `listJobs`/claim when it changed. - Raise `POLL_SECONDS` (e.g. 3 → 15–30) and add jittered backoff when idle. - Expected: **~10–50× fewer claim queries at idle**, behavior otherwise identical. Gate behind a flag; trivially revertible. ### M1 — Stand up the broker in **shadow** - Provision Service Bus (`fleet-dispatch` topic + subscriptions) with **managed-identity** auth (no connection-string keys in env/`.env`). Coordinator publishes messages **in parallel** with the existing claim path but factories still source work from Cosmos. Use the existing `AQ_FLEET_SHADOW` discipline: record divergence (did the broker route match the scorer's pick?) without acting on it. ### M2 — Cutover delivery to the broker - Flip a flag so factories source work from Service Bus + `/fleet/claim` for fencing; Cosmos poll path becomes the fallback only. Keep the reaper + lease fencing untouched. Validate exactly-once + crash recovery on multi-host. ### M3 — On-demand factories (B4) - KEDA / Container Apps scale-to-zero on subscription depth: spin a factory only when depth > 0; idle ⇒ **zero** running workers and zero RU. Warm-pool a single small instance if cold-start latency matters. --- ## 9. Risks & mitigations | Risk | Mitigation | | --- | --- | | Dual source-of-truth (broker + Cosmos) drift | change-feed *is* the log (no separate outbox); SB duplicate-detection on `MessageId=jobId`; claim is a Cosmos CAS on `leaseEpoch` | | Broker lock vs `leaseEpoch` confusion | explicit rule: broker lock = *delivery*, `leaseEpoch` = *correctness*; never merge (§5.2) | | Long job > 5-min broker lock | **complete-on-claim** (reaper + change feed re-dispatch) or `renewMessageLock` on the lease cadence (§5.2) | | Message > 256 KB | message carries `jobId` + routing props only; consumer reads body from Cosmos (§4.2) | | Same-repo worktree contention across hosts | Service Bus **sessions** keyed by `repo` to serialize same-repo jobs | | Lost scheduler features under FIFO | coordinator keeps assignment; broker only transports targeted messages | | Token scope leak in shared subscriptions | per-factory subscription + correlation filter; coordinator re-checks the §12 token on claim | | Secrets in env (`.env` keys) | **managed identity** for Service Bus + Cosmos; no connection-string keys committed | | Blind operation | emit metrics: subscription depth, dispatch lag, claim-conflict (409) rate, DLQ count, change-feed lag — wire to existing monitoring | | Migration regressions | M1 shadow measures divergence before any cutover; all flag-gated | --- ## 10. Open questions 1. **Per-factory subscription scale.** The chosen coordinator-targeted model uses one subscription per factory (correlation filter on `targetFactoryId`). Service Bus allows up to **2,000 subscriptions/topic**, so this scales for realistic fleets. If factory churn is high, fall back to a single subscription with a per-consumer `targetFactoryId` SQL filter. 2. **Where does the dispatcher run?** A new lightweight loop in platform-service vs a separate worker. A change-feed lease container is required either way; a single active dispatcher (leader-elected) avoids double-publish. 3. **Cost envelope:** Service Bus tier (Standard vs Premium). Standard likely sufficient; Premium only if sessions/large messages/VNet are needed. Confirm against expected message volume. 4. **Do we keep the Cosmos poll path permanently** as an offline/degraded fallback (like today's `AQ_FLEET_ROUTE=0`)? Recommend yes. 5. **Repo advertisement.** How does a factory tell the coordinator which repos it has locally (for the `repo:` capability)? Extend the heartbeat payload with a `repos[]` list, or derive from `AQ_FLEET_REPO_BASE`. --- ## 11. Appendix — idle RU cost sketch (today vs M0 vs target) | Model | Claim/work-find ops at idle (4 factories) | Notes | | --- | --- | --- | | **Today** (poll 3s) | ~115k/day full-partition `listJobs` | scales with partition size; ~`4 × 28.8k` | | **M0** (poll 15–30s + gate) | ~12–23k/day **1-RU point-reads** + ~0 full scans | full scan only when the gate doc changes | | **Target (B3)** | ~0 | long-poll receive, no RU; full scan never on the hot path | > Figures are order-of-magnitude to frame the decision, not a billing estimate. > A full-partition `listJobs` costs many RU and grows with partition size; a > point-read is ~1 RU and flat. The point: idle cost goes from "linear in > partition size, forever" to "≈ zero". --- ## 12. Roadmap & checklist (roadmap Phase 4) Acceptance gate for the whole effort: **idle work-find RU ≈ 0**, the "wrong-factory / ineligible-capability" stranding is gone, exactly-once assignment + crash recovery still hold on multi-host, and every step is flag-gated + reversible. ### M0 — RU quick win (no new infra) - [ ] Add per-product `queue_version`/`pending_count` doc; bump on submit + stage change. - [ ] Factory loop point-reads the gate each tick; run `listJobs`/claim only when it changed. - [ ] Raise `POLL_SECONDS` default (3 → 15–30) + jittered idle backoff. - [ ] Flag-gate (e.g. `AQ_FLEET_GATE=1`) with a clean off-switch. - [ ] Verify: idle claim queries drop ~10–50×; functional behavior unchanged (selftest green). ### Routing-model fix (lands with M0/M1) - [ ] Add `repo:` capability token; factories advertise local repos via heartbeat (`repos[]`). - [ ] Scheduler matches on caps **+ repo**; product becomes a tag, not the routing key. - [ ] Fix `tracker-web` New-Job form: drop default `capabilities="build"`, stop hardcoding `mac-1/mac-2`, derive factories/repos from live data. - [ ] Add product→repo ownership validation (reject/route mismatches) — the A1 safety net. ### M1 — Broker in shadow - [ ] Provision Service Bus `fleet-dispatch` topic + per-factory subscriptions (managed identity, no keys). - [ ] Change-feed dispatcher (leader-elected) tails `fleet_jobs`, runs scheduler, publishes targeted messages (`MessageId=jobId`, dup-detection on). - [ ] Publish in **shadow** alongside the Cosmos claim path; record route divergence (no action taken). - [ ] Verify: ≥ N hours shadow with broker-route == scorer-pick within tolerance. ### M2 — Cutover delivery - [ ] Factories consume from Service Bus; `/fleet/accept` does the Cosmos CAS claim + returns `leaseEpoch`. - [ ] Implement **complete-on-claim** (reaper + change-feed re-dispatch owns liveness). - [ ] Cosmos poll path retained as flag-gated fallback (`AQ_FLEET_ROUTE=0`). - [ ] Emit metrics: subscription depth, dispatch lag, 409 claim-conflict rate, DLQ count, change-feed lag. - [ ] Verify exactly-once + crash recovery on a real multi-host run; DLQ ↔ `failed`/`retries_exhausted` mapping correct. ### Error handling & cleanup (lands with M2) — see §5.5 - [ ] Add `POST /fleet/fail` so a failed job sets the coordinator stage + **releases the lease immediately** (no expiry wait); wire it into `_finish_failure` / `fleet_quarantine`. - [ ] GC sweep (idempotent): delete merged `aq/job/*` branches, prune stale worktrees, sweep `aq/wip/*` after a job reaches a terminal/shipped stage. - [ ] Prevent same-repo worktree clobber: Service Bus **sessions keyed by `repo`** + a per-`(host, repo)` local lock. - [ ] Verify: failed jobs free their lease promptly; no orphaned worktrees/branches after N jobs; GC never deletes unmerged work or an in-flight worktree. ### M3 — On-demand factories (scale-to-zero) - [ ] KEDA / Container Apps scaler on subscription depth; idle ⇒ zero running workers. - [ ] Optional warm-pool (1 small instance) if cold-start latency matters. - [ ] Verify: zero idle workers + zero idle RU; cold-start latency within target. ### Docs to update on completion - [ ] `GIGAFACTORY_ROADMAP.md` — tick Phase 4; correct the stale §0 progress table. - [ ] `GIGAFACTORY_SYSTEM_OVERVIEW.md` — add the broker/dispatcher to the architecture + code map. - [ ] common_plat `docs/GIGAFACTORY/` — mirror the backend/dispatcher changes.