# Fleet Control Plane: Operational Guide

> Phase 3 of the Agent Gigafactory. Adds tunable scoring, preemption, DAG decomposition, per-product budgets, and a tracker-web UI.

## Feature Flags

All Phase 3 features are **gated behind environment variables** (default OFF) for safe rollout:

| Flag               | Default | Effect                                                                        |
| ------------------ | ------- | ----------------------------------------------------------------------------- |
| `FLEET_PREEMPTION` | `""`    | Enables seat-limit enforcement + critical-job preemption                      |
| `FLEET_BUDGETS`    | `""`    | Enables per-product USD ceiling enforcement. Pauses jobs when budget exceeded |

Set to any truthy value (`"1"`, `"true"`, `"yes"`) to enable.

## Tunable Scoring Weights

Scoring determines which queued job a factory picks up next. The formula:

```
score = w.age * ageMinutes
      + w.priority * priorityOrder
      + w.retries * attempts
      + w.capabilities * capabilityBonus
```

### Weight Resolution Order

1. **Per-request override**: `weights` field in `POST /fleet/jobs/:id/claim` body
2. **Product registry**: set via `setWeightRegistry({ [productId]: weights })`
3. **Defaults**: `{ age: 1, priority: 10, retries: -2, capabilities: 5 }`

Each level does a **per-field merge** (not full object replacement).

## Preemption

When `FLEET_PREEMPTION` is enabled and a factory is at its `seatLimit`:

1. A critical-priority job arrives in `claimNextJob`
2. `selectPreemptionVictim(runningJobs, incomingJob)` picks the lowest-scoring running job
3. The victim is evicted: its lease is released with `checkpoint: true`, ensuring the job can resume
4. The critical job takes the freed seat
5. An event `{ type: 'preempted', victim, preemptor }` is recorded

**Rules:**

- Only `critical` priority can trigger preemption
- Never preempts jobs of equal or higher priority
- Capability mismatch disqualifies a factory from preemption

## DAG Job Decomposition

Submit a composite job with children for parallel fan-out:

```http
POST /fleet/jobs

{
  "idempotencyKey": "parent-job",
  "kind": "composite",
  "children": [
    { "idempotencyKey": "child-1", "bodyMd": "..." },
    { "idempotencyKey": "child-2", "bodyMd": "..." }
  ]
}
```

Or add children later:

```http
POST /fleet/jobs/:parentId/children

{
  "children": [
    { "idempotencyKey": "child-3", "bodyMd": "..." }
  ]
}
```

**Behavior:**

- Parent is automatically blocked until all children complete (children's idempotency keys become parent deps)
- Children unblock parent via `maybeUnblockParent()` when transitioning to `shipped`/`done`
- View the full DAG: `GET /fleet/jobs/:id/dag`

## Per-Product Budgets

Control spend per product with USD ceilings:

```http
PUT /fleet/budgets/:productId

{ "ceilingUsd": 100, "window": "monthly" }
```

| Endpoint                           | Method | Effect                  |
| ---------------------------------- | ------ | ----------------------- |
| `/fleet/budgets/:productId`        | GET    | Read current budget     |
| `/fleet/budgets/:productId`        | PUT    | Create/update ceiling   |
| `/fleet/budgets/:productId/pause`  | POST   | Manually pause spending |
| `/fleet/budgets/:productId/resume` | POST   | Resume spending         |

**Enforcement:** When `FLEET_BUDGETS` is enabled, `claimNextJob` checks budget status FIRST. If paused or ceiling exceeded → returns null (no job scan).

**Auto-pause:** `accrueSpend(productId, amount)` auto-pauses when `spentUsd >= ceilingUsd`.

## Fleet Control Plane UI (tracker-web)

Navigate to **Dashboard → Fleet** in tracker-web.

### Pages

| Route                        | Description                                     |
| ---------------------------- | ----------------------------------------------- |
| `/dashboard/fleet`           | Overview: factory health cards + recent jobs    |
| `/dashboard/fleet/jobs`      | Job list with stage filter tabs                 |
| `/dashboard/fleet/jobs/[id]` | Job detail: events, runs, artifacts, DAG, SHIP  |
| `/dashboard/fleet/budget`    | Budget view: spend bar, pause/resume controls   |

### Graceful Degradation

The UI calls platform-service fleet endpoints via the `/api/fleet/[...path]` proxy. If the fleet module returns 404 (flags off), pages display informational empty states instead of errors.

### Configuration

| Env Var            | Default                 | Purpose                             |
| ------------------ | ----------------------- | ----------------------------------- |
| `PLATFORM_API_URL` | `http://localhost:4003` | Platform-service base URL for proxy |

## Job Lifecycle & Shipping (testing → shipped)

Stages: `queued → assigned → building → review → testing → shipped` (plus `blocked`, `failed`, `dead_letter`).

A factory drives `assigned → building → review`, then runs its local verify gate. The `testing → shipped` transition has no claimable lease holder after the review gate, so a job reaches the terminal `shipped` stage in one of two ways:

1. **Factory autoship** (`AQ_FLEET_AUTOSHIP=1` on the agent-queue factory): when the factory's local verify passes it reports `testing`, then advances the coordinator job `testing → shipped` autonomously (the factory's verify **is** the test phase). This is the autonomous `submit → … → shipped` path. Default off.
2. **`ship` operator action** (`POST /fleet/jobs/:id/actions/:action` with `ship`): an operator/controller marks a non-terminal job `shipped`. Lease-free (works after the human review gate), idempotent, and retries on optimistic-concurrency conflict.
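
The `ship` operator action from the list above can be sketched as a small client helper. This is a minimal illustration, not the project's client: `actionUrl`, `shipJob`, and the `PostFn` type are hypothetical names, and the sketch assumes the fleet routes are served directly under the base URL passed in. The response body is not described in this guide, so only the HTTP status is returned.

```typescript
// The four operator actions listed in this guide's API reference table.
type OperatorAction = "requeue" | "reject" | "cancel" | "ship";

// Minimal POST signature (assumption) so the sketch does not depend on a
// specific HTTP client; the global `fetch` satisfies it.
type PostFn = (url: string, init: { method: "POST" }) => Promise<{ status: number }>;

// Builds the URL for POST /fleet/jobs/:id/actions/:action.
function actionUrl(baseUrl: string, jobId: string, action: OperatorAction): string {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/fleet/jobs/${encodeURIComponent(jobId)}/actions/${action}`;
}

// Hypothetical helper: sends the `ship` action and returns the HTTP status.
// No lease token is sent because the action is described as lease-free.
async function shipJob(baseUrl: string, jobId: string, post: PostFn): Promise<number> {
  const res = await post(actionUrl(baseUrl, jobId, "ship"), { method: "POST" });
  return res.status;
}
```

Because the action is documented as idempotent, a caller that times out can reasonably call `shipJob` again with the same job ID. Passing the HTTP function as a parameter keeps the helper easy to exercise with a stub.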

With `AQ_FLEET_AUTOSHIP=0` (default), a verify-passing job rests at `testing` for the **human review gate** (`review/request` + multi-reviewer `review` approve) or a manual `ship`.

Whenever a job reaches `shipped` (autoship PATCH, `ship` action, or a terminal lease release), the coordinator mirrors the outcome onto the latest **run** (`result = 'shipped'`, `endedAt` set) and, if budgets are enabled, accrues that run's `insights.costUsd`. The dashboard's per-run result/cost/tokens therefore stay consistent with the job stage.

### PR deliverable (PR mode)

A job may carry an optional **`repo`** (`owner/name` or a clone URL) + **`baseBranch`**. When the factory runs with `AQ_FLEET_PR=1`, it runs the agent in an isolated checkout on branch `aq/job/`, then commits, pushes, and opens a PR via `gh`. The PR URL and branch are reported on lease release and recorded on the **run** (`run.prUrl`, `run.branch`); the dashboard shows a **PR ↗** link in the job's Runs table.

Submit `repo`/`baseBranch` from the dashboard "New Job" form or the `POST /fleet/jobs` body. This round only opens the PR (merge stays a human/CI step); opt-in auto-merge is a planned follow-up.
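
As an illustration of submitting `repo`/`baseBranch` in the `POST /fleet/jobs` body, the sketch below builds such a body. It uses only field names that appear in this guide; putting `bodyMd` on the top-level job (rather than only on children) is an assumption, and `looksLikeRepo` is a hypothetical client-side check that may not match what the coordinator actually accepts.

```typescript
// Request body for POST /fleet/jobs in PR mode. Only fields named in this
// guide are used; `bodyMd` on the top-level job is an assumption.
interface PrJobBody {
  idempotencyKey: string;
  bodyMd: string;
  repo: string; // "owner/name" or a clone URL
  baseBranch: string;
}

// Hypothetical check: accepts "owner/name", a URL with a scheme, or an
// scp-style "git@host:path" address. The real validation may differ.
function looksLikeRepo(repo: string): boolean {
  const slug = /^[\w.-]+\/[\w.-]+$/;
  const schemeUrl = /^[a-z][a-z0-9+.-]*:\/\//i;
  const scpStyle = /^git@/;
  return slug.test(repo) || schemeUrl.test(repo) || scpStyle.test(repo);
}

// Builds the body, rejecting values that do not look like a repository.
function buildPrJobBody(
  idempotencyKey: string,
  bodyMd: string,
  repo: string,
  baseBranch: string,
): PrJobBody {
  if (!looksLikeRepo(repo)) {
    throw new Error(`unrecognized repo: ${repo}`);
  }
  return { idempotencyKey, bodyMd, repo, baseBranch };
}
```

The result can be serialized with `JSON.stringify` and sent as the request body. Checking the `repo` shape before submitting is a convenience only; the factory still needs `AQ_FLEET_PR=1` for the PR to be opened.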

## API Reference Summary

| Endpoint                           | Method | Phase | Notes                                                     |
| ---------------------------------- | ------ | ----- | --------------------------------------------------------- |
| `/fleet/jobs`                      | GET    | 2     | List jobs (query: stage, productId, limit, offset)        |
| `/fleet/jobs`                      | POST   | 2     | Submit job (+ optional children[] for DAG)                |
| `/fleet/jobs/:id`                  | GET    | 2     | Get job                                                   |
| `/fleet/jobs/:id`                  | PATCH  | 2     | Update stage (fenced)                                     |
| `/fleet/jobs/:id/actions/:action`  | POST   | 3     | Operator action: `requeue` / `reject` / `cancel` / `ship` |
| `/fleet/jobs/:id/lease/release`    | POST   | 2     | Release lease (optional `stage`, `insights`, `result`)    |
| `/fleet/jobs/:id/claim`            | POST   | 2     | Factory claims next job                                   |
| `/fleet/jobs/:id/children`         | POST   | 3     | Add children to existing job                              |
| `/fleet/jobs/:id/dag`              | GET    | 3     | Get DAG subtree                                           |
| `/fleet/factories`                 | GET    | 2     | List factories                                            |
| `/fleet/factories/:id/heartbeat`   | POST   | 2     | Factory heartbeat                                         |
| `/fleet/budgets/:productId`        | GET    | 3     | Get budget                                                |
| `/fleet/budgets/:productId`        | PUT    | 3     | Upsert budget                                             |
| `/fleet/budgets/:productId/pause`  | POST   | 3     | Pause budget                                              |
| `/fleet/budgets/:productId/resume` | POST   | 3     | Resume budget                                             |

## Architecture Decisions

1. **Feature flags default OFF**: zero breaking changes to Phase 2 behavior
2. **Budget checked first**: avoids expensive job scan when budget is exhausted
3. **DAG via deps array**: reuses existing dependency resolution; no new scheduler logic needed
4. **Preemption requires seat limit**: only triggers when factory genuinely can't take more work
5. **UI degrades gracefully**: all API calls handle 404 → null/empty; no hard failures
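
The last decision (404 → null/empty) can be sketched as a thin wrapper around the `/api/fleet/[...path]` proxy. This is a rough illustration under stated assumptions, not tracker-web's actual code: `classifyStatus`, `fetchFleetOrNull`, `GetFn`, and `MinimalResponse` are hypothetical names, and throwing on non-404 failures is a choice made for this sketch that the guide does not specify.

```typescript
// Minimal response shape (assumption) so the sketch works with any HTTP
// client; the standard fetch Response satisfies it.
interface MinimalResponse {
  status: number;
  json(): Promise<unknown>;
}

type GetFn = (url: string) => Promise<MinimalResponse>;

// Maps a status code to how a page might treat it. A 404 is read as
// "fleet module off", which the UI renders as an empty state.
function classifyStatus(status: number): "data" | "empty" | "error" {
  if (status === 404) return "empty";
  return status >= 200 && status < 300 ? "data" : "error";
}

// Hypothetical wrapper: returns null on 404 so callers can show an
// informational empty state. Other failures throw (sketch-only choice).
async function fetchFleetOrNull<T>(path: string, get: GetFn): Promise<T | null> {
  const res = await get(`/api/fleet/${path}`);
  const kind = classifyStatus(res.status);
  if (kind === "empty") return null;
  if (kind === "error") throw new Error(`fleet request failed: ${res.status}`);
  return (await res.json()) as T;
}
```

A page could call `fetchFleetOrNull("jobs", fetch)` and render its empty state whenever the result is `null`. Keeping the status mapping in one pure function makes the degradation rule easy to test without a running platform-service.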