# Runbook — Run a Devin Fleet Job End‑to‑End (local)

> **Audience:** developers and coding agents.
>
> **Goal:** stand up `platform-service` + `tracker-web` + a **fleet factory** (the
> `agent-queue` runner) so a submitted job is claimed and executed **autonomously
> by the Devin CLI** against a target repo (worked example: `learning_ai_notes`),
> pushing a branch and opening a **real pull request**.
>
> ⚠️ **This is a real, cost‑incurring, side‑effecting operation.** The factory runs
> an autonomous coding agent (Devin) that consumes API credits, can run for a long
> time, pushes a branch, and opens a **real PR** on GitHub. Read [§9 Safety &
> cost](#9-safety--cost) before launching. For unattended local prototyping only —
> not a production deployment guide.

---

## 1. Architecture (what talks to what)

```
you ──▶ tracker-web (:3003) ─┐
                             │  REST + SSE (/api/fleet/*)
coding agent / curl  ────────┼─▶ platform-service (:4003) ──▶ Azure Cosmos (jobs/runs/leases/events)
                             │          ▲          ▲
           Prometheus (:9090)┘          │          │ claim / lease-renew / report (Bearer JWT + X-Product-Id)
            Grafana (:3000) ───────────┘          │
                                                   │
                  agent-queue FACTORY (fleet mode) ──▶ Devin CLI ──▶ git push + gh pr create
                  (learning_ai_devops_tools/agent-queue)             (target repo, e.g. learning_ai_notes)
```

- **platform-service** — the fleet **coordinator**. Owns the job lifecycle
  (`queued → assigned → building → review → testing → shipped|failed|dead_letter`),
  atomic claim, leases, events, budgets, metrics. Code:
  `services/platform-service/src/modules/fleet/`.
- **tracker-web** (`:3003`) — submit/inspect jobs (`/dashboard/fleet/jobs/...`).
- **factory** — `learning_ai_devops_tools/agent-queue` in **fleet mode**. Polls
  `POST /api/fleet/claim`, runs the agent CLI in an isolated checkout, reports back,
  and (PR mode) opens the PR.
- **Prometheus/Grafana** — fleet metrics + the "Fleet Overview" dashboard.


Lifecycle the factory drives:

```
queued ─▶ assigned ─▶ building ─▶ review ─▶ testing ─▶ shipped
          (claim)     (agent      (rc=0)    (verify    (manual/auto ship)
                       running)              passed)
                         └─ agent rc≠0 / timeout / verify fail ─▶ failed ─▶ (retry|dead_letter)
```

---

## 2. Prerequisites

| Tool | Why | Check |
| --- | --- | --- |
| Node ≥ 20 + `pnpm` (corepack) | host-run service, scripts, tracker-web, build | `node -v && pnpm -v` |
| `git` + `gh` (authenticated) | factory clones target repo, pushes branch, opens PR; `gh pr merge`/reconcile | `gh auth status` |
| `devin` CLI (authenticated) | the agent the factory runs | `devin --version` |
| Both repos cloned side‑by‑side | coordinator/dashboards + the factory | see below |
| repo `.env` (root of `learning_ai_common_plat`) | `JWT_SECRET`, Cosmos creds, `FLEET_METRICS_TOKEN` | `test -f .env` |
| Docker | **optional** — only for the Docker path (§3 Option B) / Grafana+Prometheus | `docker info` |

> **Node version:** the Docker image pins **node 22**; for the host path any **Node ≥ 20**
> works. Use one Node (nvm/asdf) for both repos to avoid native-module surprises.

### 2.1 First‑time setup (fresh machine)

Clone both repos as **siblings** (the factory clones targets relative to a shared parent):

```bash
mkdir -p ~/code && cd ~/code
git clone /learning_ai_common_plat.git
git clone /learning_ai_devops_tools.git   # contains agent-queue (the factory)
```

Create and fill `.env` at the **root of `learning_ai_common_plat`**:

```bash
cd ~/code/learning_ai_common_plat
cp .env.example .env
# then edit .env — minimum for the fleet flow:
#   JWT_SECRET=
#   FLEET_METRICS_TOKEN=changeme-fleet-metrics-token   # only needed for Prometheus
#   COSMOS_* / connection vars -> see note below
```

- `JWT_SECRET` — HS256 secret platform-service verifies tokens with. Any strong value; it
  only needs to be **internally consistent on this machine** (the token you mint in §5 and
  the running service must share it). **Required.**
- **Cosmos** — the default prototype talks to a **real Azure Cosmos account** (no emulator
  in the default compose). On a new machine you must either (a) point `.env` at the
  **same Cosmos account** (to see/share existing jobs) or (b) point at your own DB and set
  `COSMOS_AUTO_INIT=true` so containers are created on boot. Without valid Cosmos creds the
  service starts but every fleet call fails.
- `FLEET_METRICS_TOKEN` — only needed if you run Prometheus (§4); must match
  `services/monitoring/prometheus/prometheus.yml` (`credentials:`).

### 2.2 Install + build the workspace (required for the host path)

Host-run resolves `@bytelyst/*` workspace packages from their **`dist/`** (the `exports`
field points at `dist`), so you must build them once before `tsx`/Next can import them:

```bash
cd ~/code/learning_ai_common_plat
pnpm install
pnpm -r build   # builds all workspace packages (incl. @bytelyst/* → dist/)
# (faster, just the platform-service closure:)
# pnpm -r --filter @lysnrai/platform-service... build
```

> Skipping this is the #1 fresh-machine failure: `tsx watch` crashes with
> `Cannot find module '@bytelyst/...'/dist/index.js`. Re-run `pnpm -r build` after pulling
> changes to shared packages.

---

## 3. Bring up platform-service + tracker-web

Two ways. **Option A (all localhost, no Docker)** is recommended for a single dev Mac /
WSL box — everything runs on the host, so `gh`-backed features work out of the box.
**Option B (Docker)** is for when you also want the Grafana/Prometheus stack.

### Option A — all localhost, no Docker (recommended)

Two long‑lived processes, each in its own terminal. Both assume §2.1/§2.2 are done
(`.env` filled, `pnpm -r build` run).

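
Before starting the two processes, a quick preflight can catch the most common fresh-machine gaps (a missing tool, or a `.env` without `JWT_SECRET`). The helper below is a minimal sketch, not part of either repo: the function names `env_has_key` and `preflight` are hypothetical, and the default `REPO_ROOT` simply assumes the `~/code` layout from §2.1.

```bash
# Preflight sketch for the host path (Option A).
# Assumption: repos live under ~/code as in §2.1; override REPO_ROOT otherwise.
REPO_ROOT="${REPO_ROOT:-$HOME/code/learning_ai_common_plat}"

# env_has_key FILE KEY -> succeeds if FILE has an uncommented, non-empty KEY=value line.
env_has_key() {
  grep -Eq "^$2=.+" "$1" 2>/dev/null
}

# preflight -> prints each problem found and returns non-zero if there is any.
preflight() {
  local rc=0 tool
  for tool in node pnpm git gh devin; do
    command -v "$tool" >/dev/null 2>&1 || { echo "missing tool: $tool"; rc=1; }
  done
  if [ ! -f "$REPO_ROOT/.env" ]; then
    echo "missing $REPO_ROOT/.env"
    rc=1
  elif ! env_has_key "$REPO_ROOT/.env" JWT_SECRET; then
    echo "JWT_SECRET is empty or unset in $REPO_ROOT/.env"
    rc=1
  fi
  return "$rc"
}
```

Source the snippet and run `preflight && echo "preflight OK"`. It only checks that the tools are on `PATH` and that the key is present; it does not verify `gh`/`devin` authentication or Cosmos credentials.
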

**Terminal 1 — coordinator (platform-service, :4003):**

```bash
cd ~/code/learning_ai_common_plat/services/platform-service
pnpm exec tsx watch --env-file=../../.env src/server.ts
```

`tsx watch` hot-reloads on source changes. Use the explicit `--env-file=../../.env` (the
bare `pnpm dev` script does **not** load the root `.env`, so `JWT_SECRET`/Cosmos would be
missing). `FLEET_METRICS_TOKEN` is already in `.env` if you set it in §2.1.

**Terminal 2 — dashboard (tracker-web, :3003):**

```bash
cd ~/code/learning_ai_common_plat/dashboards/tracker-web
pnpm dev   # serves http://localhost:3003 (proxies /api → :4003)
```

That's the whole coordinator + UI. **Monitoring (Grafana/Prometheus) is optional** on the
host path — `GET /api/fleet/metrics` (JSON), `GET /api/fleet/autoscale`, and the
tracker-web job pages cover observability without it. To get the Grafana "Fleet Overview"
dashboard you need Prometheus + Grafana (run them via Docker — Option B — or Homebrew
binaries pointed at `services/monitoring/...`).

Because everything is on the host, `gh` is on `PATH` → the PR‑state **reconcile** (§8) and
ship‑time `gh pr merge` work (unlike the Docker container, which has no `gh`).

Health checks:

```bash
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:4003/health   # 200
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:3003          # 200
```

### Option B — Docker (adds Grafana + Prometheus)

```bash
cd ~/code/learning_ai_common_plat
# targeted fleet subset that always builds cleanly:
docker compose up -d --build platform-service prometheus grafana
# (full stack: bash scripts/prototype-up.sh)
```

Starts `platform-service` (`:4003`), `prometheus` (`:9090`), `grafana` (`:3000`,
admin/`lysnrai`) + deps. Still run **tracker-web from source** (Option A, Terminal 2).

> **Docker caveats:**
>
> - `prototype-up.sh` may fail building the **dashboard** images when
>   `corepack prepare pnpm@…` can't fetch pnpm on a restricted network → use the targeted
>   subset above.
> - **`gh` is NOT in the container** → coordinator‑side `gh pr merge` and PR‑reconcile (§8)
>   are no‑ops in Docker. Use the host path (Option A) if you need them.
> - Don't run both: the container and a host `tsx` both bind `:4003`
>   (`docker compose stop platform-service` before host‑running).

---

## 4. Make Prometheus auth work (only if running Prometheus)

Skip this on the host path unless you also run Prometheus. `prometheus.yml` scrapes
`/api/fleet/metrics/prom` with a bearer, so the running `platform-service` must see the
same `FLEET_METRICS_TOKEN`:

```bash
cd ~/code/learning_ai_common_plat
grep -q '^FLEET_METRICS_TOKEN=' .env || \
  printf '\nFLEET_METRICS_TOKEN=changeme-fleet-metrics-token\n' >> .env
# host path: restart Terminal-1 tsx so it re-reads .env
# docker path: docker compose up -d platform-service
```

Verify (if Prometheus is up): `http://localhost:9090/api/v1/targets` →
`platform-service-fleet` is `up`. The value must equal `credentials:` in
`services/monitoring/prometheus/prometheus.yml`.

---

## 5. Mint a local API token (dev only)

`platform-service` verifies HS256 JWTs signed with `JWT_SECRET` and requires
`type: "access"`. The tracker-web UI obtains one via login; for scripts/agents and the
factory, mint one directly. **Local dev only — never commit tokens or the secret.**

Save `mint-token.mjs` (resolve `jose` from the workspace):

```js
import { readFileSync } from 'node:fs';
// adjust the jose path to your checkout if needed:
import { SignJWT } from '/ABS/PATH/learning_ai_common_plat/node_modules/.pnpm/jose@5.10.0/node_modules/jose/dist/node/esm/index.js';

const env = readFileSync('/ABS/PATH/learning_ai_common_plat/.env', 'utf8');
const secret = new TextEncoder().encode(env.match(/^JWT_SECRET=(.*)$/m)[1].trim());
const ttl = process.argv[2] || '15m'; // e.g. '15m' for scripts, '24h' for a factory
process.stdout.write(
  await new SignJWT({ sub: 'local-dev', role: 'admin', type: 'access' })
    .setProtectedHeader({ alg: 'HS256' })
    .setIssuedAt()
    .setExpirationTime(ttl)
    .sign(secret)
);
```

```bash
node mint-token.mjs 15m > /tmp/tok      # short-lived, for API calls
node mint-token.mjs 24h > /tmp/factok   # longer-lived, for the factory daemon
```

> Find the jose path with:
> `find . -path '*/node_modules/jose/dist/*/esm/index.js' | head -1`.

Requests must also carry the **product**: header `X-Product-Id: ` (e.g. `notelett`).
`role: admin` bypasses tenant ownership checks when `FLEET_TENANT_ENFORCEMENT` is on (it's
off by default).

---

## 6. Submit a job

### Via tracker-web (preferred)

Open `http://localhost:3003/dashboard/fleet/jobs`, "New job". **Set the correct product
first** (the product selector) — a job is partitioned by `productId`, and submitting under
the wrong product misattributes cost/metrics/ownership and the factory won't see it under
the product it polls. PR‑mode fields that matter:

- **`repo`** — must be `owner/name` (e.g. `saravanakumardb1/learning_ai_notes`) or a clone
  URL, **not** a bare name (the factory feeds it to `gh`).
- **`baseBranch`** — e.g. `main`.
- **`engine`** — `devin` (pins the agent; otherwise the factory's default/engineClass).
- **`autoMerge`** — leave **`false`** for a human merge gate (recommended for large PRs).

### Via API

```bash
JOB=$(curl -s -X POST http://localhost:4003/api/fleet/jobs \
  -H "Authorization: Bearer $(cat /tmp/tok)" \
  -H "X-Product-Id: notelett" -H 'Content-Type: application/json' \
  -d '{
    "idempotencyKey": "notelett-demo-1",
    "bodyMd": "# Task\n…full prompt…",
    "priority": "high",
    "engine": "devin",
    "repo": "saravanakumardb1/learning_ai_notes",
    "baseBranch": "main",
    "autoMerge": false
  }')
echo "$JOB"   # → { outcome: "created", job: { id: "fjob_…", stage: "queued", ... } }
```

The job is now `queued` and claimable.
It will **not run** until a factory polls for its product (next step).

---

## 7. Start the factory (agent-queue, fleet mode)

The factory lives in a **separate repo**: `learning_ai_devops_tools/agent-queue`. Run it on
the **host** (needs `devin` + `gh`). Read its `docs/RUN_POLICY.md` first.

### 7a. Sanity‑check connectivity (safe — registers + heartbeats only)

```bash
cd learning_ai_devops_tools/agent-queue
./agent-queue.sh init   # idempotent
AQ_FLEET=1 AQ_FLEET_ROUTE=1 \
  AQ_FLEET_API=http://localhost:4003/api \
  AQ_PRODUCT_ID=notelett \
  AQ_FLEET_TOKEN="$(cat /tmp/factok)" \
  AQ_FACTORY_ID=mac-local-1 \
  AQ_FLEET_CAPS=engine:devin \
  AQ_FLEET_LEASE_RENEW_SEC=60 \
  ./agent-queue.sh fleet-status   # → "heartbeat OK (registered)."
```

### 7b. Launch the run loop (claims + runs the agent)

```bash
cd learning_ai_devops_tools/agent-queue
AQ_FLEET=1 AQ_FLEET_ROUTE=1 AQ_FLEET_PR=1 \
  AQ_FLEET_API=http://localhost:4003/api \
  AQ_PRODUCT_ID=notelett \
  AQ_FLEET_TOKEN="$(cat /tmp/factok)" \
  AQ_FACTORY_ID=mac-local-1 \
  AQ_FLEET_CAPS=engine:devin \
  AQ_FLEET_LEASE_RENEW_SEC=60 \
  ./agent-queue.sh run --max 1
```

> ⚠️ **Set `AQ_FLEET_LEASE_RENEW_SEC` below 90 (e.g. 60).** This is the heartbeat/
> lease‑renew cadence. The coordinator's reaper marks a factory **stale after 90s**
> (`DEFAULT_STALE_FACTORY_MS`, a constant — no env knob) and **reclaims its in‑flight
> lease**. The default cadence is **300s**, so a busy single‑slot factory looks stale
> for most of every cycle and its running job gets requeued mid‑run (`leaseEpoch`
> climbs, stage flips back to `queued`, and the final report is **fenced** so the job
> never tidies to `review`/`shipped`). 60s keeps it comfortably live. (Add the same env
> to the §7a `fleet-status` check for consistency.)

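
The cadence rule in the warning above is simple arithmetic (renew interval must be below the 90s stale window), so it can be enforced before the run loop starts. The following is a hedged sketch only: `renew_cadence_ok` and `launch_guard` are hypothetical helpers that are not part of `agent-queue`, and `STALE_WINDOW_SEC=90` is hard-coded here to mirror the `DEFAULT_STALE_FACTORY_MS` value quoted above.

```bash
# Assumption: 90s mirrors the coordinator's stale window described in the warning.
STALE_WINDOW_SEC=90

# renew_cadence_ok SECONDS -> succeeds only for a positive integer below the stale window.
renew_cadence_ok() {
  local sec="${1:-}"
  case "$sec" in
    ''|*[!0-9]*) return 1 ;;   # unset, empty, or non-numeric
  esac
  [ "$sec" -gt 0 ] && [ "$sec" -lt "$STALE_WINDOW_SEC" ]
}

# launch_guard -> checks the exported cadence, falling back to the documented 300s default.
launch_guard() {
  local sec="${AQ_FLEET_LEASE_RENEW_SEC:-300}"
  if renew_cadence_ok "$sec"; then
    echo "lease renew cadence ${sec}s is inside the ${STALE_WINDOW_SEC}s stale window"
  else
    echo "refusing to start: AQ_FLEET_LEASE_RENEW_SEC=${sec} must be 1..$((STALE_WINDOW_SEC - 1))" >&2
    return 1
  fi
}
```

With the fleet variables exported, `launch_guard && ./agent-queue.sh run --max 1` would refuse to start when the cadence is unset (falling back to 300s) or set to 90 or more.
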

Key fleet env vars (see `lib/fleet-client.sh`):

| Var | Meaning |
| --- | --- |
| `AQ_FLEET=1` | master switch — enable coordinator calls (0 = pure offline) |
| `AQ_FLEET_ROUTE=1` | coordinator is **authoritative** for claim (pulls work from platform-service) |
| `AQ_FLEET_PR=1` | PR mode — open a PR for jobs that target a `repo` |
| `AQ_FLEET_API` | base URL **including `/api`** (`http://localhost:4003/api`) |
| `AQ_FLEET_TOKEN` | **Bearer JWT** (mint per §5; ≥ run duration, e.g. 24h) |
| `AQ_PRODUCT_ID` | product to poll — sent as `X-Product-Id` (must match the job's product) |
| `AQ_FACTORY_ID` | this factory's id (registered/heartbeated) |
| `AQ_FLEET_CAPS` | advertised capabilities, e.g. `engine:devin` |
| `AQ_FLEET_LEASE_RENEW_SEC` | **set `<90`** (e.g. `60`) — heartbeat/renew cadence vs the 90s stale window (see warning) |
| `AQ_FLEET_REPO_BASE` | _(optional)_ dir of local checkouts; if `…//.git` exists it uses a **git worktree**, else it `git clone`s `https://github.com/.git` into its cache |
| `AQ_FLEET_AUTOSHIP=1` | _(optional)_ auto-advance to `shipped` (skips the manual gate) |

The run loop `claim → assigned → building`, runs Devin in an isolated checkout, heartbeats +
renews the lease (`lease_renewed` events) so the reaper doesn't reclaim it, then on agent
exit moves to `review` and (PR mode) opens the PR. With `autoMerge:false` it **stops at the
human merge gate**.

> **Repo checkout:** the job's `repo` is `owner/name`, so by default the factory
> `git clone`s `https://github.com//.git` into its own cache
> (`queue/.state/repos/…`) — clean isolation, nothing touches your working copies. To
> reuse an existing local clone via a **git worktree** instead, set
> `AQ_FLEET_REPO_BASE=` where `///.git` exists.

---

## 8. Observe progress

- **Factory/agent logs (the live Devin transcript):** use the helper
  `scripts/fleet-logs.sh` (auto-finds the agent-queue logs; takes a full or partial job id,
  defaults to the newest job):

  ```bash
  scripts/fleet-logs.sh ls                  # list jobs: slot + step count
  scripts/fleet-logs.sh status 3c0586ce     # steps count + slot + latest step
  scripts/fleet-logs.sh steps 3c0586ce 20   # last 20 transcript steps
  scripts/fleet-logs.sh watch 3c0586ce      # live-refresh the tail
  scripts/fleet-logs.sh tail 3c0586ce       # follow the runner lifecycle .log
  scripts/fleet-logs.sh full 3c0586ce       # all agent messages in your pager
  ```

  (Override the factory location with `AQ=/path/to/agent-queue`. Needs `jq` for the
  transcript commands.)

- **tracker-web:** `http://localhost:3003/dashboard/fleet/jobs/` — live event stream (SSE),
  runs, PR link + state. (Select the job's **product** in the UI first, or it shows "job
  does not exist" — every call is scoped by `X-Product-Id`.)
- **Events/API:**

  ```bash
  curl -s http://localhost:4003/api/fleet/jobs//events \
    -H "Authorization: Bearer $(cat /tmp/tok)" -H "X-Product-Id: notelett"
  ```

- **Metrics:** `GET /api/fleet/metrics` (JSON, per product) · `GET /api/fleet/metrics/prom`
  (Prometheus, all products; needs `FLEET_METRICS_TOKEN`) · Grafana **Fleet Overview**
  (`http://localhost:3000/d/fleet-overview`).
- **Autoscale signal:** `GET /api/fleet/autoscale` (this product) / `…/autoscale/all`.

### PR‑state reconcile (externally‑merged PRs)

If you merge the PR in the GitHub UI, the coordinator doesn't know until told. Trigger a
reconcile (flips run `prState → merged` when `gh pr view` reports MERGED):

- UI: **"Refresh PR status"** button on the job's PR section, or
- API: `POST /api/fleet/jobs//pr/reconcile`.

> Requires `gh` where platform-service runs → use the **host path** (§3 Option A);
> it's a no‑op in the Docker container (no `gh`).

---

## 9. Safety & cost

- **Billable + autonomous + long‑running.** Each run consumes Devin credits and can run for
  a long time unattended. Scope jobs deliberately; very large multi‑workstream specs are
  better split into several jobs.
- **Real PR.** PR mode pushes a branch and opens a PR on the target repo. Keep
  `autoMerge:false` so a human reviews/merges; `gh pr merge` (auto) only fires when the job
  opts in or `FLEET_SHIP_MERGES_PR=1`.
- **Isolation.** The factory works in an isolated worktree/clone, never your main checkout
  (per `agent-queue/docs/RUN_POLICY.md`). Avoid blanket `--yolo` on live trees.
- **Stopping the daemon** mid‑run lets the lease expire; the coordinator's reaper then
  reclaims and requeues the job (so partial work may be retried). Stop intentionally.
- **Tokens/secrets:** the minted JWT and `JWT_SECRET` are sensitive — never commit them or
  paste into shared logs. `.env` is git‑ignored; keep it that way.

---

## 10. Teardown

```bash
# stop the factory: Ctrl-C the run loop
# host path: Ctrl-C the tsx (Terminal 1) and pnpm dev (Terminal 2)
# docker path:
#   cd ~/code/learning_ai_common_plat && docker compose down   # keep volumes
#   docker compose down -v                                     # also drop volumes
rm -f /tmp/tok /tmp/factok   # discard minted tokens
```

---

## 11. Troubleshooting

| Symptom | Cause → Fix |
| --- | --- |
| `Cannot find module '@bytelyst/…/dist/index.js'` on `tsx`/Next start | workspace packages not built → `pnpm -r build` (§2.2). |
| `401 {"error":"Invalid or expired token"}` | JWT expired/mis‑signed → re‑mint (§5); ensure same `JWT_SECRET` as the running service. |
| Job claimed then flips back to `queued` mid‑run; `leaseEpoch` keeps climbing; final report **fenced**; PR opens but job never reaches `review`/`shipped` | factory heartbeat cadence (`AQ_FLEET_LEASE_RENEW_SEC`, default **300s**) > reaper stale window (**90s**) → set `AQ_FLEET_LEASE_RENEW_SEC=60` (§7). To recover the record after the fact, reconcile PR state (§8). |
| Job stays `queued`, never claimed | No factory for that product → `fleet-status` shows it registered? `AQ_PRODUCT_ID` must equal the job's product. Check `GET /api/fleet/factories` (X‑Product‑Id) for `0 live`. |
| `POST …/pr/reconcile` or ship auto‑merge does nothing | `gh` not present where platform-service runs (Docker container) → run the host path (§3 Option A). |
| Prometheus target `platform-service-fleet` = `down (401)` | service missing `FLEET_METRICS_TOKEN` → §4 (restart host `tsx` / recreate container). |
| `prototype-up.sh` build fails on `corepack prepare pnpm` | dashboard image network issue → use the targeted subset, or just use the host path (Option A). |
| `POST …/actions/` returns 500 "Body cannot be empty" | sent `Content-Type: application/json` with no body → omit the header or send `{}`. |
| Port `4003` conflict | host `tsx watch` and a `platform-service` container both bind `4003` → run only one. |
| `gh pr create` fails | `repo` is a bare name → must be `owner/name` or a clone URL; confirm `gh auth status`. |
| PR/cost attributed to wrong product | job submitted under the wrong `productId` partition → resubmit under the right product and cancel the stray (`POST …/actions/cancel`). |
| `vitest` exits non‑zero with `kill EPERM` after all suites pass | worker‑pool teardown artifact (sandbox), not a test failure → re‑run; all suites already passed. |

---

## 12. Copy‑paste quickstart — all localhost (notelett → learning_ai_notes)

Assumes §2.1/§2.2 done (`.env` filled, `pnpm -r build` run). Four terminals.

```bash
# Terminal 1 — coordinator
cd ~/code/learning_ai_common_plat/services/platform-service
pnpm exec tsx watch --env-file=../../.env src/server.ts

# Terminal 2 — dashboard
cd ~/code/learning_ai_common_plat/dashboards/tracker-web && pnpm dev   # :3003

# Terminal 3 — tokens + submit (save mint-token.mjs from §5; fix ABS paths)
node mint-token.mjs 15m > /tmp/tok
node mint-token.mjs 24h > /tmp/factok
curl -s -X POST http://localhost:4003/api/fleet/jobs \
  -H "Authorization: Bearer $(cat /tmp/tok)" -H "X-Product-Id: notelett" \
  -H 'Content-Type: application/json' \
  -d '{"idempotencyKey":"notelett-demo-1","bodyMd":"# Task…","priority":"high","engine":"devin","repo":"saravanakumardb1/learning_ai_notes","baseBranch":"main","autoMerge":false}'

# Terminal 4 — factory (runs Devin → opens a real PR). NOTE the <90s heartbeat cadence.
cd ~/code/learning_ai_devops_tools/agent-queue && ./agent-queue.sh init
AQ_FLEET=1 AQ_FLEET_ROUTE=1 AQ_FLEET_PR=1 AQ_FLEET_API=http://localhost:4003/api \
  AQ_PRODUCT_ID=notelett AQ_FLEET_TOKEN="$(cat /tmp/factok)" \
  AQ_FACTORY_ID=mac-local-1 AQ_FLEET_CAPS=engine:devin AQ_FLEET_LEASE_RENEW_SEC=60 \
  ./agent-queue.sh run --max 1
```

---

## 13. WSL on Windows — differences to note

The flow is identical **inside a WSL2 (Ubuntu) shell**, with these adjustments. Treat WSL as
"the Linux host" — install and run **everything inside WSL**, not Windows.

- **Keep repos on the WSL filesystem, not `/mnt/c`.** Clone under e.g. `~/code` inside WSL.
  On `/mnt/c` (the Windows drive over 9p) `tsx watch`/Next file‑watching is unreliable
  (inotify doesn't fire) and git/pnpm are far slower. This is the single most important
  difference.
- **Install the toolchain inside WSL** (Linux builds): `node`/`pnpm` (nvm), `git`, **`gh`**,
  and the **`devin` CLI** — and run `gh auth login` + Devin auth **inside WSL**. A `gh`/
  `devin` installed on Windows is not visible to the WSL bash factory.
- **Line endings.** Clone inside WSL (don't reuse a Windows checkout with
  `core.autocrlf=true`) so the `*.sh` scripts stay LF — CRLF breaks `agent-queue.sh`
  (`bad interpreter`/`\r`). If needed: `git config --global core.autocrlf input`.
- **Reaching the UI from the Windows browser.** WSL2 forwards `localhost`, so
  `http://localhost:3003` / `:4003` usually work from a Windows browser. If they don't
  (older Windows / mirrored‑networking off), use the WSL IP (`hostname -I`) or set
  `networkingMode=mirrored` in `.wslconfig`.
- **Ports.** Make sure nothing on the **Windows** side already binds `3003`/`4003` (WSL2
  publishes to the same localhost). Stop the Windows process or change ports.
- **Docker (Option B), if used.** Use **Docker Desktop with the WSL2 backend** and run
  `docker compose` from inside the WSL shell. `host.docker.internal` resolves from
  containers to the host as on Mac.
- **`/tmp` token paths** (`/tmp/tok`, `/tmp/factok`) are the WSL `/tmp` — fine; just keep
  all four terminals in the same WSL distro so they share it.
- **Clock skew.** If WSL's clock drifts after sleep, JWT `iat/exp` checks can fail
  (`Invalid or expired token`) — `sudo hwclock -s` (or restart WSL) to resync.

Everything else — env vars, `pnpm -r build`, `tsx --env-file`, the factory env incl.
`AQ_FLEET_LEASE_RENEW_SEC=60`, token minting — is identical to the Mac host path.

---

### Reference

- Coordinator routes: `services/platform-service/src/modules/fleet/routes.ts`
- Coordinator logic: `services/platform-service/src/modules/fleet/coordinator.ts`
- Factory fleet client: `learning_ai_devops_tools/agent-queue/lib/fleet-client.sh`
- Factory runner + PR mode: `learning_ai_devops_tools/agent-queue/agent-queue.sh`
- Gigafactory spec/roadmap: `learning_ai_devops_tools/agent-queue/docs/GIGAFACTORY/`
- Prometheus scrape config: `services/monitoring/prometheus/prometheus.yml`
- Grafana dashboard: `services/monitoring/grafana/dashboards/fleet-overview.json`