docs(roadmap): P3 platform deepening roadmap — reviewed and audited

- 6 phases: Event Bus, Agent Runtime, AI Budget, AI Evals, Reviews, Support Cases
- 7-sprint mapping (14 weeks) with parallelization opportunities
- Cross-referenced all claims against the actual codebase; 15 bugs/gaps found and fixed, including:
  - DurableEventBus already exists in @bytelyst/events (not just in-memory)
  - jobs/ has 25 tests (not 6), support-cases/ has 4 (not 3)
  - ai-budgets already has a verdict engine (252 LOC), alert generation, and model allowlists
  - BudgetScopeType only supports product+agent (not org/workspace yet)
  - reviews/notifications.ts already has notifyReviewAssigned()
  - Phase 2 effort overflowed (17d in a 15d, 3-week phase); rebalanced
  - Test baseline corrected to 1,278 (not 1,308)
  - Identified the Cosmos QueueStore gap as the critical path for Phase 1
  - ai-diagnostics has 5,235 LOC but 0 tests; flagged as a risk
- Estimated ~138 new tests, bringing the total to ~1,416
2026-03-20 01:20:49 -07:00 · 2026-03-20 01:20:49 -07:00 · 17f5671595
commit 17f5671595
parent 9e510f7b49
1 changed files with 342 additions and 0 deletions
--- a/docs/roadmaps/P3_PLATFORM_DEEPENING_ROADMAP.md
+++ b/docs/roadmaps/P3_PLATFORM_DEEPENING_ROADMAP.md
@@ -0,0 +1,342 @@
+# P3 — Platform Deepening Roadmap
+
+> **Scope:** 6 remaining P3 work items for `learning_ai_common_plat`  
+> **Created:** 2026-03-20  
+> **Status:** Draft — pending review
+
+---
+
+## Executive Summary
+
+All P0–P2 work is complete. The 6 remaining P3 items deepen **already-scaffolded** modules in `platform-service`. Every module listed below already has `types.ts`, `repository.ts`, and `routes.ts`, and every module except `ai-diagnostics/` (0 tests) has tests. The remaining work adds production-quality features, cross-module integrations, and comprehensive test coverage.
+
+### Current Scaffold Inventory (verified 2026-03-20)
+
+| Module            | LOC   | Files | Tests | Status                                                               |
+| ----------------- | ----- | ----- | ----- | -------------------------------------------------------------------- |
+| `jobs/`           | 1,269 | 10    | 25    | Runner + cron + built-in jobs (most mature scaffold)                 |
+| `runs/`           | 680   | 7     | 5     | Run + step tracking + tracker utility                                |
+| `reviews/`        | 424   | 6     | 3     | Review queue with decisions + notification wiring                    |
+| `agent-evals/`    | 704   | 5     | 4     | Eval definitions + results                                           |
+| `ai-budgets/`     | 681   | 5     | 4     | Budget policies + spend tracking + alert generation + verdict engine |
+| `ai-diagnostics/` | 5,235 | 10    | 0     | NL query, clustering, LLM analysis (NO tests)                        |
+| `support-cases/`  | 514   | 5     | 4     | Cases + notes + escalation                                           |
+
+### Related Packages Already Built
+
+| Package                      | Purpose                                                                | Maturity             |
+| ---------------------------- | ---------------------------------------------------------------------- | -------------------- |
+| `@bytelyst/events`           | `EventBus` (in-memory) + `DurableEventBus` (queue-backed with polling) | **Has durable mode** |
+| `@bytelyst/event-store`      | Persistent event log (file-store + memory-store)                       | Scaffolded           |
+| `@bytelyst/queue`            | In-process task queue with `QueueWorker` + pluggable stores            | Scaffolded           |
+| `@bytelyst/webhook-dispatch` | Webhook delivery with HMAC signing + retry                             | Production           |
+| `@bytelyst/fastify-sse`      | Server-Sent Events hub + plugin                                        | Production           |
+| `@bytelyst/llm-router`       | LLM provider routing, fallback, health checks                          | Production           |
+| `@bytelyst/llm`              | LLM client abstraction (factory, testing mock)                         | Production           |
+
+---
+
+## Sprint Plan (Next 3 Sprints)
+
+For 2-week sprints, here's the recommended execution order:
+
+| Sprint       | Weeks | Focus                                      | Deliverables                                                                             |
+| ------------ | ----- | ------------------------------------------ | ---------------------------------------------------------------------------------------- |
+| **Sprint 1** | 1–2   | Phase 1: Event Bus core + worker hardening | Event subscription registry, dispatcher wiring, DLQ, worker improvements, ~20 tests      |
+| **Sprint 2** | 3–4   | Phase 1 finish + Phase 2 start             | Event replay, remaining event bus tests, agent executor, tool binding runtime, ~20 tests |
+| **Sprint 3** | 5–6   | Phase 2 finish                             | Run streaming, agent scheduling, cancellation, token tracking, agent metrics, ~15 tests  |
+
+After Sprint 3, Phases 3–6 can proceed (2 weeks each; Phases 3 and 6 can run in parallel).
+
+---
+
+## Phase 1 — Durable Event Bus + Worker Runtime (3 weeks)
+
+**Goal:** Wire the existing `DurableEventBus` and `@bytelyst/queue` into a subscription-driven dispatch system that powers webhooks, notifications, and job triggers across all modules.
+
+### What Exists (already built)
+
+- `@bytelyst/events` — `EventBus` (in-memory) + **`DurableEventBus`** (queue-backed with `QueueWorker` polling, 153 LOC)
+- `@bytelyst/event-store` — persistent event log (file-store + memory-store implementations)
+- `@bytelyst/queue` — `QueueWorker` with pluggable `QueueStore` (file-store + memory-store)
+- `modules/jobs/` — job runner with cron scheduling, built-in jobs, registry (1,269 LOC, **25 tests**)
+- `modules/webhooks/` — HMAC-signed delivery with retry + auto-disable
+
+### What Needs Building
+
+| #   | Task                                                                                                                                                                                                                            | Effort | Priority |
+| --- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------ | -------- |
+| 1.1 | **Event subscription registry** — new `modules/event-subscriptions/` module: Cosmos container `event_subscriptions` with topic, handler type (webhook / job / notification / SSE), filter expression, active flag. CRUD routes. | 2d     | Critical |
+| 1.2 | **Event dispatcher** — new `src/lib/event-dispatcher.ts`: consumes `DurableEventBus`, on each event looks up matching subscriptions, routes to handler (invoke webhook-dispatch, trigger job, push notification, broadcast SSE) | 3d     | Critical |
+| 1.3 | **Cosmos outbox store** — `QueueStore` implementation backed by Cosmos (currently only file + memory stores exist in `@bytelyst/queue`), so `DurableEventBus` can persist across restarts                                       | 2d     | Critical |
+| 1.4 | **Dead-letter queue** — failed events after max retries go to `event_dlq` container with retry/purge admin endpoints                                                                                                            | 1d     | High     |
+| 1.5 | **Worker runtime hardening** — `modules/jobs/runner.ts`: add concurrency limits, graceful shutdown, heartbeat liveness, stuck-job recovery                                                                                      | 2d     | High     |
+| 1.6 | **Event replay** — admin endpoint to replay events from event-store by time range or topic (idempotency keys prevent duplicates)                                                                                                | 1d     | Medium   |
+| 1.7 | **Tests** — subscription CRUD tests, dispatcher routing tests, Cosmos queue store tests, DLQ tests, worker lifecycle tests                                                                                                      | 2d     | Critical |
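
Tasks 1.1 and 1.2 come down to matching each incoming event against stored subscriptions and routing it by handler type. A minimal sketch, assuming a hypothetical `EventSubscription` shape with an equality-only `filter` and a `run.*` prefix wildcard (none of which is a settled design); it leaves out the `DurableEventBus` wiring, whose API this roadmap does not show.

```typescript
// Hypothetical shapes for the planned `event_subscriptions` container (task 1.1).
// Field names and the filter format are illustrative, not a settled schema.
type HandlerType = "webhook" | "job" | "notification" | "sse";

interface EventSubscription {
  id: string;
  topic: string; // exact topic, or a prefix wildcard such as "run.*"
  handlerType: HandlerType;
  filter?: Record<string, string | number | boolean>; // equality-only filter (assumption)
  active: boolean;
}

interface PlatformEvent {
  topic: string;
  payload: Record<string, unknown>;
}

type Handler = (sub: EventSubscription, event: PlatformEvent) => void;

// "run.*" matches "run.completed" but not "run" or "runs.completed".
function topicMatches(pattern: string, topic: string): boolean {
  if (pattern.endsWith(".*")) {
    return topic.startsWith(pattern.slice(0, -1));
  }
  return pattern === topic;
}

// Every key in the filter must equal the corresponding payload field.
function filterMatches(
  filter: EventSubscription["filter"],
  payload: Record<string, unknown>,
): boolean {
  if (!filter) return true;
  return Object.entries(filter).every(([key, expected]) => payload[key] === expected);
}

// Route one event to every active, matching subscription; returns the IDs dispatched.
function dispatch(
  event: PlatformEvent,
  subscriptions: EventSubscription[],
  handlers: Record<HandlerType, Handler>,
): string[] {
  const dispatched: string[] = [];
  for (const sub of subscriptions) {
    if (!sub.active) continue;
    if (!topicMatches(sub.topic, event.topic)) continue;
    if (!filterMatches(sub.filter, event.payload)) continue;
    handlers[sub.handlerType](sub, event);
    dispatched.push(sub.id);
  }
  return dispatched;
}
```

In the real dispatcher each handler would most likely enqueue its work (webhook delivery, job trigger) rather than run it inline, so one slow subscriber cannot stall the rest.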
+
+**Deliverables:** `event_subscriptions` + `event_dlq` containers, Cosmos-backed `QueueStore`, dispatcher wired into `server.ts` startup, ~25 new tests.
+
+**Dependencies:** None — foundational for all subsequent phases.
+
+> **Note:** An earlier draft of this roadmap proposed creating a new `@bytelyst/event-bus` package, but `DurableEventBus` already exists in `@bytelyst/events`. The real gaps are a Cosmos-backed `QueueStore` (only file + memory stores exist) and the subscription registry + dispatcher.
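
The Cosmos store (task 1.3) and the dead-letter queue (task 1.4) hinge on the same semantics: a leased message stays hidden until it is acked, reappears if its worker dies, and moves to the DLQ after too many attempts. The in-memory class below illustrates those semantics under assumed names (`LeasingQueueStore`, `lease`, `nack`); it is not the `QueueStore` interface from `@bytelyst/queue`, whose methods aren't shown here. A Cosmos-backed version would presumably need to make the lease update conditional (for example with an ETag check) so two workers cannot claim the same message.

```typescript
// Hypothetical contract: the real `QueueStore` interface in @bytelyst/queue may differ.
interface QueueMessage {
  id: string;
  body: string;
  attempts: number;
  leaseUntil: number; // epoch ms; the message is visible once leaseUntil <= now
}

interface LeasingQueueStore {
  enqueue(id: string, body: string): void;
  lease(now: number, leaseMs: number): QueueMessage | undefined;
  ack(id: string): void;
  nack(id: string, now: number): void;
}

class InMemoryLeasingStore implements LeasingQueueStore {
  private readonly messages = new Map<string, QueueMessage>();
  private readonly maxAttempts: number;
  readonly deadLetters: QueueMessage[] = []; // stand-in for the `event_dlq` container

  constructor(maxAttempts: number) {
    this.maxAttempts = maxAttempts;
  }

  enqueue(id: string, body: string): void {
    // Re-publishing an ID that is still queued is a no-op.
    if (!this.messages.has(id)) {
      this.messages.set(id, { id, body, attempts: 0, leaseUntil: 0 });
    }
  }

  lease(now: number, leaseMs: number): QueueMessage | undefined {
    for (const msg of Array.from(this.messages.values())) {
      // Still held by a worker; once the lease expires, a stuck message is visible again.
      if (msg.leaseUntil > now) continue;
      if (msg.attempts >= this.maxAttempts) {
        // Last attempt expired without an ack (e.g. worker crash): dead-letter it.
        this.deadLetter(msg);
        continue;
      }
      msg.attempts += 1;
      msg.leaseUntil = now + leaseMs;
      return { ...msg };
    }
    return undefined;
  }

  ack(id: string): void {
    this.messages.delete(id);
  }

  nack(id: string, now: number): void {
    const msg = this.messages.get(id);
    if (!msg) return;
    if (msg.attempts >= this.maxAttempts) {
      this.deadLetter(msg);
    } else {
      msg.leaseUntil = now; // visible again immediately; a real store would add backoff
    }
  }

  private deadLetter(msg: QueueMessage): void {
    this.messages.delete(msg.id);
    this.deadLetters.push(msg);
  }
}
```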
+
+---
+
+## Phase 2 — Agent Runtime Orchestration (3 weeks)
+
+**Goal:** Complete the agent execution lifecycle — from definition to versioned deployment, run tracking, step execution, and observability.
+
+### What Exists
+
+- `modules/agents/` — agent registry with version lifecycle (publish/deprecate), key lookup (13 tests)
+- `modules/runs/` — run + step tracking with status machine (5 tests)
+- `modules/runs/tracker.ts` — run tracking utility (118 LOC)
+- `@bytelyst/llm-router` — provider/model selection with fallback + health
+
+### What Needs Building
+
+| #   | Task                                                                                                                                                                                          | Effort | Priority |
+| --- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------ | -------- |
+| 2.1 | **Agent executor** — new `modules/agents/executor.ts`: resolve published version → build prompt → select model via llm-router → create run (via `tracker.ts`) → execute steps → record output | 3d     | Critical |
+| 2.2 | **Tool binding runtime** — resolve `toolBindings[]` from agent version to callable functions, sandboxed execution with timeout + token limits (allowlist-only, no arbitrary code)             | 2d     | Critical |
+| 2.3 | **Run step streaming** — SSE endpoint `GET /runs/:id/stream` for real-time step progress (consumes `@bytelyst/fastify-sse`)                                                                   | 1d     | High     |
+| 2.4 | **Agent scheduling** — wire agents into jobs/cron: `POST /agents/:id/schedule` creates a recurring job that triggers agent execution                                                          | 1d     | High     |
+| 2.5 | **Parent-child runs** — enable `parentRunId` linking for multi-agent orchestration (agent A triggers agent B), DAG query endpoint                                                             | 1d     | Medium   |
+| 2.6 | **Run cancellation** — `POST /runs/:id/cancel` with graceful abort propagation to in-flight LLM calls                                                                                         | 1d     | High     |
+| 2.7 | **Token usage tracking** — extend `RunStepDoc` with `promptTokens`, `completionTokens`, `costUsd`; auto-record into `ai-budgets` spend via existing `POST /ai-budgets/spend` endpoint         | 1d     | High     |
+| 2.8 | **Agent metrics** — `GET /agents/:id/metrics`: success rate, avg latency, token cost, run count (aggregated from runs collection)                                                             | 2d     | Medium   |
+| 2.9 | **Tests** — executor unit tests, tool binding tests, scheduling tests, cancellation tests, metrics tests                                                                                      | 2d     | Critical |
+
+> **Effort total: 14d** (fits in 3 weeks with 1d buffer)
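
Task 2.2's two guardrails, allowlist-only tools and per-call timeouts, can both be applied when bindings are resolved. A sketch under assumptions: `ToolBinding`, its `timeoutMs` field, and the registry map are hypothetical names, and token limits are left out. `Promise.race` only stops *waiting* on a slow tool; actually stopping it would need cooperative cancellation (for example an `AbortSignal` passed into the tool) or process-level isolation.

```typescript
// Hypothetical types: `ToolBinding` and the allowlist registry are illustrative,
// not the actual agent-version schema.
interface ToolBinding {
  toolName: string;
  timeoutMs: number;
}

type ToolFn = (input: string) => Promise<string>;

interface BoundTool {
  name: string;
  invoke: (input: string) => Promise<string>;
}

// Reject if `work` has not settled within `ms`. This stops waiting, but it does
// not stop the underlying work.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Only tools present in the allowlist can be bound; nothing is loaded dynamically.
function resolveToolBindings(
  bindings: ToolBinding[],
  allowlist: ReadonlyMap<string, ToolFn>,
): BoundTool[] {
  const unknown = bindings.map((b) => b.toolName).filter((name) => !allowlist.has(name));
  if (unknown.length > 0) {
    throw new Error(`Tools not on allowlist: ${unknown.join(", ")}`);
  }
  return bindings.map((b) => {
    const fn = allowlist.get(b.toolName)!;
    return {
      name: b.toolName,
      invoke: (input: string) => withTimeout(fn(input), b.timeoutMs, b.toolName),
    };
  });
}
```

Failing the whole resolution on any unknown tool (rather than silently dropping it) keeps a misconfigured agent version from running with a partial toolset.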
+
+**Deliverables:** Agent executor pipeline, tool runtime, SSE streaming, scheduling integration, ~30 new tests.
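
Tasks 2.7 and 2.8 meet in one aggregation: per-step token counts roll up into per-agent cost, latency, and success figures. The sketch below does this in memory over a hypothetical `Run` shape (`startedAt`/`endedAt` in epoch ms, a three-value `status`); only `promptTokens`, `completionTokens`, and `costUsd` come from task 2.7. A production endpoint would more likely push the aggregation into a Cosmos query than load every run.

```typescript
// Hypothetical run/step shapes; the token and cost fields follow task 2.7.
interface RunStep {
  promptTokens: number;
  completionTokens: number;
  costUsd: number;
}

interface Run {
  agentId: string;
  status: "succeeded" | "failed" | "cancelled";
  startedAt: number; // epoch ms
  endedAt: number; // epoch ms
  steps: RunStep[];
}

interface AgentMetrics {
  runCount: number;
  successRate: number; // 0..1
  avgLatencyMs: number;
  totalTokens: number;
  totalCostUsd: number;
}

function agentMetrics(runs: Run[], agentId: string): AgentMetrics {
  const mine = runs.filter((r) => r.agentId === agentId);
  const runCount = mine.length;
  if (runCount === 0) {
    return { runCount: 0, successRate: 0, avgLatencyMs: 0, totalTokens: 0, totalCostUsd: 0 };
  }
  let succeeded = 0;
  let latency = 0;
  let tokens = 0;
  let cost = 0;
  for (const run of mine) {
    if (run.status === "succeeded") succeeded += 1;
    latency += run.endedAt - run.startedAt;
    for (const step of run.steps) {
      tokens += step.promptTokens + step.completionTokens;
      cost += step.costUsd;
    }
  }
  return {
    runCount,
    successRate: succeeded / runCount,
    avgLatencyMs: latency / runCount,
    totalTokens: tokens,
    totalCostUsd: cost,
  };
}
```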
+
+**Dependencies:** Phase 1 (events for run lifecycle events, job runner for scheduling).
+
+> **Note:** `modules/runs/tracker.ts` (118 LOC) already provides run-tracking helpers. Task 2.1 builds on top of it rather than starting from scratch. `parentRunId` is already a field in `RunSchema` — task 2.5 adds the DAG query, not the schema.
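
Assuming `parentRunId` is a single optional field, each run has at most one parent, so the "DAG" query of task 2.5 effectively returns a tree rooted at the triggering run. A minimal sketch over a hypothetical `RunNode` shape, with a guard against corrupted data that links runs in a cycle:

```typescript
// Minimal run shape: only `id` and `parentRunId` are assumed.
interface RunNode {
  id: string;
  parentRunId?: string;
}

interface RunTree {
  id: string;
  children: RunTree[];
}

// Build the tree of runs descending from `rootId`.
function buildRunTree(runs: RunNode[], rootId: string): RunTree {
  // Index children by parent once, so the walk is linear in the number of runs.
  const childrenOf = new Map<string, string[]>();
  for (const run of runs) {
    if (!run.parentRunId) continue;
    const list = childrenOf.get(run.parentRunId) ?? [];
    list.push(run.id);
    childrenOf.set(run.parentRunId, list);
  }
  // With one parent per run, a node can only be revisited if the data has a cycle.
  const visited = new Set<string>();
  const build = (id: string): RunTree => {
    if (visited.has(id)) throw new Error(`Cycle detected at run ${id}`);
    visited.add(id);
    return { id, children: (childrenOf.get(id) ?? []).map(build) };
  };
  return build(rootId);
}
```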
+
+---
+
+## Phase 3 — AI Budget & Cost Governance (2 weeks)
+
+**Goal:** Extend existing budget verdict engine with org/workspace scopes, automated cost ingestion from runs, and cost reporting.
+
+### What Exists (already built — more than expected)
+
+- `modules/ai-budgets/` — budget policies + spend tracking + alert generation + verdict engine (681 LOC, 4 tests)
+- Types: `BudgetPolicyDoc` (limits by period, soft/hard thresholds), `BudgetSpendEntryDoc` (tracked spend per call), `BudgetAlertDoc` (severity: warn/block)
+- Scope types: currently `product` and `agent` only (via `BudgetScopeTypeSchema`)
+- `POST /ai-budgets/spend` **already evaluates** budget verdict (allow/warn/block), generates alerts at threshold breaches, enforces model allowlists
+- `GET /ai-budgets/policies/:id/status` already returns current spend vs. budget with verdict
+
+### What Needs Building
+
+| #   | Task                                                                                                                                                                                                                                                  | Effort | Priority |
+| --- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------ | -------- |
+| 3.1 | **Budget enforcement middleware** — Fastify preHandler wrapping the existing verdict logic: check budget before LLM calls, return 429 when `block` verdict. Currently callers must manually call `POST /ai-budgets/spend` — middleware automates this | 1d     | Critical |
+| 3.2 | **Expand scope types** — add `org` and `workspace` to `BudgetScopeTypeSchema`, implement scope inheritance (agent → workspace → org → product fallback chain)                                                                                         | 2d     | High     |
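+| 3.3 | **Cost ingestion from runs** — subscribe to `run.completed` events (Phase 1), auto-record token costs via existing spend endpoint. Eliminates manual spend recording; must be reconciled with task 2.7, which also auto-records spend, so run costs are not counted twice | 1d     | High     |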
+| 3.4 | **Alert notifications** — wire existing `BudgetAlertDoc` creation into notifications module + optional webhook event dispatch (alert generation itself already works)                                                                                 | 1d     | High     |
+| 3.5 | **Cost breakdown API** — `GET /ai-budgets/costs`: breakdown by agent, model, time period, org. Supports CSV export                                                                                                                                    | 2d     | Medium   |
+| 3.6 | **Budget rollover** — configurable rollover policy: reset, carry-forward, or accumulate unused budget                                                                                                                                                 | 1d     | Low      |
+| 3.7 | **Tests** — enforcement middleware tests, scope resolution tests, event-driven ingestion tests, cost aggregation tests                                                                                                                                | 1d     | Critical |
+
+> **Effort total: 9d** (fits in 2 weeks with 1d buffer)
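
Task 3.2's fallback chain can be read as "use the policy at the most specific scope that has one". A sketch of that reading with hypothetical `BudgetPolicy` and `CallContext` shapes; `org` and `workspace` appear only because task 3.2 proposes adding them.

```typescript
// Hypothetical shapes; `workspace` and `org` scopes do not exist yet (task 3.2 adds them).
type ScopeType = "agent" | "workspace" | "org" | "product";

interface BudgetPolicy {
  id: string;
  scopeType: ScopeType;
  scopeId: string;
}

// IDs identifying where an LLM call comes from; any level may be missing.
type CallContext = Partial<Record<ScopeType, string>>;

// Most specific first, mirroring the agent -> workspace -> org -> product chain.
const FALLBACK_CHAIN: readonly ScopeType[] = ["agent", "workspace", "org", "product"];

// Returns the policy at the most specific scope that has one, or undefined.
function resolvePolicy(policies: BudgetPolicy[], ctx: CallContext): BudgetPolicy | undefined {
  for (const scopeType of FALLBACK_CHAIN) {
    const scopeId = ctx[scopeType];
    if (scopeId === undefined) continue;
    const match = policies.find((p) => p.scopeType === scopeType && p.scopeId === scopeId);
    if (match) return match;
  }
  return undefined;
}
```

The other plausible reading is to evaluate every applicable policy and let the most restrictive verdict win; which one task 3.2 intends should be settled before implementation.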
+
+**Deliverables:** Budget enforcement middleware, expanded scope types, event-driven cost ingestion, alert notifications, cost reporting, ~18 new tests.
+
+**Dependencies:** Phase 2 (token tracking from runs), Phase 1 (event-driven cost ingestion).
+
+> **Note:** The existing `POST /ai-budgets/spend` endpoint already has sophisticated verdict logic (252 LOC) with multi-policy evaluation, model allowlist enforcement, and alert generation. Phase 3 work is primarily about automation (middleware + event-driven ingestion) and scope expansion, not building the verdict engine from scratch.
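
The middleware in task 3.1 only has to turn policy state into a verdict and the verdict into a response. A framework-agnostic sketch, assuming a simplified `PolicyState` (limit, spend so far, soft and hard thresholds as fractions of the limit, optional model allowlist) rather than the real `BudgetPolicyDoc`, and assuming (not confirmed here) that the most restrictive verdict across policies wins. Inside Fastify, the `preCheck` result would drive a preHandler that replies 429 on `block`.

```typescript
// Simplified stand-in for budget policy state; not the real BudgetPolicyDoc.
type Verdict = "allow" | "warn" | "block";

interface PolicyState {
  limitUsd: number;
  spentUsd: number;
  softThreshold: number; // fraction of the limit, e.g. 0.8
  hardThreshold: number; // fraction of the limit, e.g. 1.0
  allowedModels?: string[];
}

const SEVERITY: Record<Verdict, number> = { allow: 0, warn: 1, block: 2 };

// Verdict for one policy, projecting spend forward by the estimated call cost.
function verdictFor(p: PolicyState, model: string, estimatedCostUsd: number): Verdict {
  if (p.allowedModels && !p.allowedModels.includes(model)) return "block";
  const projected = (p.spentUsd + estimatedCostUsd) / p.limitUsd;
  if (projected >= p.hardThreshold) return "block";
  if (projected >= p.softThreshold) return "warn";
  return "allow";
}

// Assumption: the most restrictive verdict across all applicable policies wins.
function combinedVerdict(policies: PolicyState[], model: string, estimatedCostUsd: number): Verdict {
  return policies
    .map((p) => verdictFor(p, model, estimatedCostUsd))
    .reduce<Verdict>((worst, v) => (SEVERITY[v] > SEVERITY[worst] ? v : worst), "allow");
}

type PreCheck = { proceed: true; warn: boolean } | { proceed: false; statusCode: 429 };

// What a preHandler would act on: 429 stops the request before the LLM call.
function preCheck(verdict: Verdict): PreCheck {
  if (verdict === "block") return { proceed: false, statusCode: 429 };
  return { proceed: true, warn: verdict === "warn" };
}
```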
+
+---
+
+## Phase 4 — AI Governance & Evals (2 weeks)
+
+**Goal:** Evaluate agent quality with automated test suites, regression detection, and compliance checks before version promotion.
+
+### What Exists
+
+- `modules/agent-evals/` — eval definitions + result storage (704 LOC, 4 tests)
+- `modules/agents/` — version lifecycle with publish/deprecate
+- `@bytelyst/llm-router` — model routing
+- `modules/ai-diagnostics/` — NL query, clustering, error normalization (5,235 LOC)
+
+### What Needs Building
+
+| #   | Task                                                                                                                                                                | Effort | Priority |
+| --- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------ | -------- |
+| 4.1 | **Eval runner** — `POST /agent-evals/:id/execute`: run eval test cases against an agent version, record pass/fail/score per case                                    | 3d     | Critical |
+| 4.2 | **Eval test case management** — CRUD for test cases within an eval: input, expected output, scoring rubric (exact match, LLM-as-judge, regex, contains)             | 2d     | Critical |
+| 4.3 | **Regression detection** — compare eval results across agent versions: flag regressions where score drops >N%, block publish if regression gate is enabled          | 1d     | High     |
+| 4.4 | **Pre-publish gate** — optional policy: agent version cannot be published unless latest eval passes threshold (wired into `POST /agents/:id/versions/:vId/publish`) | 1d     | High     |
+| 4.5 | **Eval scheduling** — recurring evals on published versions (e.g., daily smoke test) via jobs/cron                                                                  | 1d     | Medium   |
+| 4.6 | **Eval report API** — `GET /agent-evals/:id/report`: aggregate results, version comparison chart data, trend over time                                              | 1d     | Medium   |
+| 4.7 | **Compliance checks** — configurable rules: max response length, PII detection, banned phrases, required disclaimers. Run as post-eval validation                   | 2d     | Medium   |
+| 4.8 | **Tests** — eval runner tests, regression detection tests, gate enforcement tests, compliance tests                                                                 | 1d     | Critical |
+
+> **Effort total: 12d** (exceeds the 10d available in 2 weeks by 2d)
+
+**Deliverables:** Eval execution pipeline, test case management, regression gates, compliance engine, ~25 new tests.
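
The four rubric types in task 4.2 differ mainly in how a single output is scored. A minimal sketch, assuming a hypothetical `EvalCase` shape in which `expected` holds the pattern for regex cases, and with LLM-as-judge injected as a plain function (no real LLM client API is assumed), so the deterministic rubrics can be tested offline:

```typescript
// Hypothetical shapes for task 4.2.
type Rubric =
  | { kind: "exact" }
  | { kind: "contains" }
  | { kind: "regex"; flags?: string }
  | { kind: "llm-judge"; passScore: number };

interface EvalCase {
  id: string;
  input: string;
  expected: string; // expected text, or the pattern for regex cases
  rubric: Rubric;
}

interface CaseResult {
  caseId: string;
  pass: boolean;
  score: number; // 0..1
}

// Injected judge: returns a score in [0, 1] for (input, expected, actual).
type Judge = (input: string, expected: string, actual: string) => number;

function scoreCase(c: EvalCase, actual: string, judge: Judge): CaseResult {
  const rubric = c.rubric;
  // Deterministic rubrics are all-or-nothing (threshold 1).
  const done = (score: number, threshold = 1): CaseResult => ({
    caseId: c.id,
    pass: score >= threshold,
    score,
  });
  switch (rubric.kind) {
    case "exact":
      return done(actual.trim() === c.expected.trim() ? 1 : 0);
    case "contains":
      return done(actual.includes(c.expected) ? 1 : 0);
    case "regex":
      return done(new RegExp(c.expected, rubric.flags).test(actual) ? 1 : 0);
    case "llm-judge":
      return done(judge(c.input, c.expected, actual), rubric.passScore);
    default:
      throw new Error("Unknown rubric");
  }
}
```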
+
+**Dependencies:** Phase 2 (agent executor for running evals), Phase 1 (events for eval completion notifications).
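
The regression check in task 4.3 and the pre-publish gate in task 4.4 reduce to comparing aggregate scores between two versions. A sketch under stated assumptions: per-case scores lie in [0, 1] and are aggregated by mean, and "drops >N%" is read as a relative drop against the baseline version (an absolute drop in percentage points is the other plausible reading). `VersionScores`, `maxDropPct`, and `minScore` are hypothetical names.

```typescript
// Hypothetical: per-case scores in [0, 1] for one agent version.
interface VersionScores {
  versionId: string;
  scores: number[];
}

interface RegressionReport {
  baseline: number; // mean score of the currently published version
  candidate: number; // mean score of the version being promoted
  dropPct: number; // relative drop in percent; negative means improvement
  regressed: boolean;
}

function mean(xs: number[]): number {
  return xs.length === 0 ? 0 : xs.reduce((a, b) => a + b, 0) / xs.length;
}

// `maxDropPct` plays the role of N in task 4.3 (interpreted as a relative drop).
function detectRegression(
  baseline: VersionScores,
  candidate: VersionScores,
  maxDropPct: number,
): RegressionReport {
  const b = mean(baseline.scores);
  const c = mean(candidate.scores);
  const dropPct = b === 0 ? 0 : ((b - c) / b) * 100;
  return { baseline: b, candidate: c, dropPct, regressed: dropPct > maxDropPct };
}

// Pre-publish gate (task 4.4): only enforced when the policy enables it.
function canPublish(report: RegressionReport, gateEnabled: boolean, minScore: number): boolean {
  if (!gateEnabled) return true;
  return !report.regressed && report.candidate >= minScore;
}
```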
+
+---
+
+## Phase 5 — Human Review / Approval Queue (2 weeks)
+
+**Goal:** Deepen the review module into a full human-in-the-loop approval system for agent actions, content changes, and sensitive operations.
+
+### What Exists (already built)
+
+- `modules/reviews/` — review items with decisions + notification wiring (424 LOC, 3 tests)
+- `reviews/notifications.ts` — `notifyReviewAssigned()` already exists and is called on create/update
+- Review types: `ReviewItemDoc` with status machine (pending → assigned → approved/rejected/cancelled/expired)
+- `POST /reviews/:id/decision` — approve/reject/cancel with resolution audit trail (reason + actedBy + actedAt)
+- `dueAt` field already exists on `ReviewItemDoc` (but no auto-expiry job yet)
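
The status machine above can be encoded as a transition table that later work (batch decisions, delegation, auto-expiry) could share. The table below is an assumption about which moves are legal, for example that a pending item must be assigned before it can be approved or rejected; `decide` and the field names mirror the resolution audit trail described above but are not the module's actual code.

```typescript
// Assumed transition table for the listed statuses; not the module's actual rules.
type ReviewStatus = "pending" | "assigned" | "approved" | "rejected" | "cancelled" | "expired";

const TRANSITIONS: Record<ReviewStatus, readonly ReviewStatus[]> = {
  pending: ["assigned", "cancelled", "expired"],
  // assigned -> assigned models reassignment or delegation (task 5.4).
  assigned: ["assigned", "approved", "rejected", "cancelled", "expired"],
  approved: [],
  rejected: [],
  cancelled: [],
  expired: [],
};

function canTransition(from: ReviewStatus, to: ReviewStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

interface Resolution {
  reason: string;
  actedBy: string;
  actedAt: string; // ISO-8601
}

interface ReviewItem {
  id: string;
  status: ReviewStatus;
  resolution?: Resolution;
}

type Decision = "approved" | "rejected" | "cancelled";

// Returns a new item carrying the decision and its audit trail; throws on illegal moves.
function decide(item: ReviewItem, to: Decision, actedBy: string, reason: string, now: Date): ReviewItem {
  if (!canTransition(item.status, to)) {
    throw new Error(`Cannot move review ${item.id} from ${item.status} to ${to}`);
  }
  return { ...item, status: to, resolution: { reason, actedBy, actedAt: now.toISOString() } };
}
```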
+
+### What Needs Building
+
+| #   | Task                                                                                                                                                                          | Effort | Priority |
+| --- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------ | -------- |
+| 5.1 | **Review policies** — configurable rules: which agent actions require review, auto-approve after N successful runs, escalation timers                                         | 2d     | Critical |
+| 5.2 | **Batch review** — `POST /reviews/batch-decide`: approve/reject multiple items with shared reason (max 50)                                                                    | 1d     | High     |
+| 5.3 | **Auto-expiry** — background job (via `modules/jobs/`) expires stale reviews past `dueAt`, with configurable default TTL per policy                                           | 1d     | High     |
+| 5.4 | **Delegation** — `POST /reviews/:id/delegate`: reassign review to another user with audit trail                                                                               | 1d     | Medium   |
+| 5.5 | **Review queue stats** — `GET /reviews/stats`: pending count by priority/category/assignee, avg resolution time, SLA compliance                                               | 1d     | High     |
+| 5.6 | **Review integration with agent runs** — when agent action requires review, run pauses at step, creates review item, resumes on approval (consumes Phase 2 executor)          | 2d     | Critical |
+| 5.7 | **Expand review notifications** — `notifyReviewAssigned()` already exists; add: review expiring soon, review decided, escalation triggered (wire into event bus from Phase 1) | 1d     | Medium   |
+| 5.8 | **Tests** — policy enforcement tests, batch review tests, auto-expiry tests, delegation tests, stats tests                                                                    | 1d     | Critical |
+
+> **Effort total: 10d** (fits in 2 weeks)
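
The auto-expiry job in task 5.3 needs to decide, for each open review, when it becomes stale: an explicit `dueAt` wins, otherwise a per-policy default TTL applies. A sketch under assumed names (`ttlByPolicy`, `defaultTtlMs`, `policyId`), with times as epoch milliseconds since the actual type of `dueAt` on `ReviewItemDoc` isn't stated here:

```typescript
// Hypothetical review shape; times are epoch milliseconds (an assumption).
interface ReviewRecord {
  id: string;
  status: "pending" | "assigned" | "approved" | "rejected" | "cancelled" | "expired";
  createdAt: number;
  dueAt?: number; // explicit due date, if set
  policyId?: string; // review policy (task 5.1) that may supply a default TTL
}

// An explicit dueAt wins; otherwise createdAt + the policy's TTL, else the global default.
function effectiveDueAt(
  r: ReviewRecord,
  ttlByPolicy: ReadonlyMap<string, number>,
  defaultTtlMs: number,
): number {
  if (r.dueAt !== undefined) return r.dueAt;
  const policyTtl = r.policyId !== undefined ? ttlByPolicy.get(r.policyId) : undefined;
  return r.createdAt + (policyTtl ?? defaultTtlMs);
}

// IDs of open reviews that are due at or before `now`.
function selectExpired(
  reviews: ReviewRecord[],
  now: number,
  ttlByPolicy: ReadonlyMap<string, number>,
  defaultTtlMs: number,
): string[] {
  return reviews
    .filter((r) => r.status === "pending" || r.status === "assigned")
    .filter((r) => effectiveDueAt(r, ttlByPolicy, defaultTtlMs) <= now)
    .map((r) => r.id);
}
```

The job would then move each selected item to `expired` with a conditional update, so a reviewer's decision that lands at the same moment is not overwritten.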
+
+**Deliverables:** Review policies, batch operations, auto-expiry job, agent integration, queue analytics, ~20 new tests.
+
+**Dependencies:** Phase 2 (agent run pause/resume), Phase 1 (events + job runner for expiry).
+
+> **Note:** The review module is more mature than typical scaffolds — it already has notification wiring, decision audit trails, and workspace-scoped reviews. The main gaps are policies (automation rules), batch operations, and the agent-run integration.
+
+---
+
+## Phase 6 — Support Case Management (2 weeks)
+
+**Goal:** Deepen support cases into a complete ticket system with SLA tracking, auto-triage, knowledge base integration, and customer communication.
+
+### What Exists (already built)
+
+- `modules/support-cases/` — cases + notes + escalation events (514 LOC, **4 tests**)
+- Types: `SupportCaseDoc` (7 statuses, 4 priorities, 4 sources), `SupportCaseNoteDoc` (internal/customer visibility), `SupportEscalationEventDoc`
+- Full CRUD routes: create/list/get/update cases, add notes, list notes, create escalation, list escalations
+- Linked fields: `runId`, `reviewId`, `knowledgeBaseId` already on `SupportCaseDoc`
+- `modules/knowledge/` — knowledge base with text search + retrieval (9 tests)
+- `modules/ai-diagnostics/` — NL query, error clustering, LLM analysis (5,235 LOC, 0 tests)
+
+### What Needs Building
+
+| #   | Task                                                                                                                                                                             | Effort | Priority |
+| --- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------ | -------- |
+| 6.1 | **SLA engine** — define SLA policies per priority (response time, resolution time), track compliance, fire alerts on breach via event bus                                        | 2d     | Critical |
+| 6.2 | **Auto-triage** — on case creation, use LLM to classify priority + category + suggest knowledge articles, auto-assign based on rules                                             | 2d     | High     |
+| 6.3 | **Knowledge integration** — `POST /support-cases/:id/suggest-articles`: search linked knowledge base (via existing `searchChunks`) for relevant content, attach top matches      | 1d     | High     |
+| 6.4 | **Case timeline** — unified timeline API merging notes, status changes, escalations, and linked run/review events                                                                | 1d     | High     |
+| 6.5 | **Case metrics** — `GET /support-cases/metrics`: open count by status/priority, MTTR, SLA compliance %, top categories                                                           | 1d     | Medium   |
+| 6.6 | **Customer communication** — internal vs. customer-visible notes (visibility field already exists on `SupportCaseNoteDoc`), email notification on customer-visible note creation | 1d     | Medium   |
+| 6.7 | **Case linking** — link related cases (duplicate, parent/child), merge duplicates with note consolidation                                                                        | 1d     | Medium   |
+| 6.8 | **Tests** — SLA engine tests, auto-triage tests, knowledge suggestion tests, timeline tests, metrics tests                                                                       | 1d     | Critical |
+
+> **Effort total: 10d** (fits in 2 weeks)
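
In the SLA engine (task 6.1), each target is a deadline measured from case creation, and each target is in one of three states. A sketch that assumes wall-clock minutes (no business hours, no pausing while waiting on the customer) and placeholder priority names and targets; the real four `SupportCaseDoc` priorities and SLA values are not specified in this roadmap.

```typescript
// Placeholder priorities and targets, not the actual SupportCaseDoc values.
type Priority = "low" | "normal" | "high" | "urgent";

interface SlaPolicy {
  responseMins: number;
  resolutionMins: number;
}

const SLA: Record<Priority, SlaPolicy> = {
  urgent: { responseMins: 15, resolutionMins: 4 * 60 },
  high: { responseMins: 60, resolutionMins: 24 * 60 },
  normal: { responseMins: 4 * 60, resolutionMins: 3 * 24 * 60 },
  low: { responseMins: 24 * 60, resolutionMins: 7 * 24 * 60 },
};

interface CaseTimes {
  priority: Priority;
  openedAt: number; // epoch ms
  firstResponseAt?: number; // first customer-visible reply
  resolvedAt?: number;
}

type SlaState = "met" | "pending" | "breached";

interface SlaStatus {
  response: SlaState;
  resolution: SlaState;
}

const MIN = 60_000; // milliseconds per minute

function checkTarget(openedAt: number, doneAt: number | undefined, targetMins: number, now: number): SlaState {
  const deadline = openedAt + targetMins * MIN;
  if (doneAt !== undefined) return doneAt <= deadline ? "met" : "breached";
  return now > deadline ? "breached" : "pending";
}

function slaStatus(c: CaseTimes, now: number): SlaStatus {
  const policy = SLA[c.priority];
  return {
    response: checkTarget(c.openedAt, c.firstResponseAt, policy.responseMins, now),
    resolution: checkTarget(c.openedAt, c.resolvedAt, policy.resolutionMins, now),
  };
}
```

A breach alert would fire on the transition from `pending` to `breached`, which a periodic job can detect by comparing against the previously stored state.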
+
+**Deliverables:** SLA engine, auto-triage pipeline, knowledge integration, unified timeline, ~20 new tests.
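
The unified timeline in task 6.4 merges several sources into one shape, ordered by time. A minimal sketch covering notes, status changes, and escalations (linked run and review events would map the same way); the source shapes are hypothetical, and the ordering relies on every timestamp being an ISO-8601 UTC string in the same format, since such strings sort chronologically as plain strings.

```typescript
// Hypothetical source shapes; timestamps are same-format ISO-8601 UTC strings.
interface Note {
  createdAt: string;
  visibility: "internal" | "customer";
  body: string;
}

interface StatusChange {
  changedAt: string;
  from: string;
  to: string;
}

interface Escalation {
  createdAt: string;
  reason: string;
}

interface TimelineEntry {
  at: string;
  kind: "note" | "status" | "escalation";
  summary: string;
}

function buildTimeline(notes: Note[], statusChanges: StatusChange[], escalations: Escalation[]): TimelineEntry[] {
  const entries: TimelineEntry[] = [
    ...notes.map((n): TimelineEntry => ({ at: n.createdAt, kind: "note", summary: `[${n.visibility}] ${n.body}` })),
    ...statusChanges.map((s): TimelineEntry => ({ at: s.changedAt, kind: "status", summary: `${s.from} -> ${s.to}` })),
    ...escalations.map((e): TimelineEntry => ({ at: e.createdAt, kind: "escalation", summary: e.reason })),
  ];
  // Array.prototype.sort is stable (ES2019+), so equal timestamps keep source order.
  return entries.sort((a, b) => (a.at < b.at ? -1 : a.at > b.at ? 1 : 0));
}
```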
+
+**Dependencies:** Phase 1 (events for SLA timer jobs). Phase 3 is a **soft dependency** (budget awareness for LLM triage calls — can use existing spend endpoint directly if Phase 3 isn't complete).
+
+> **Note:** The support-cases module already has robust types with visibility on notes, escalation events, and linked fields to runs/reviews/knowledge bases. Task 6.6 effort is reduced because the `visibility` enum (internal/customer) already exists on `SupportCaseNoteDoc` — the work is wiring email notifications, not schema changes.
+
+---
+
+## Summary Timeline
+
+```
+Phase 1: Durable Event Bus + Worker Runtime         [Weeks 1-3]   ██████████████
+Phase 2: Agent Runtime Orchestration                 [Weeks 4-6]   ██████████████
+Phase 3: AI Budget & Cost Governance                 [Weeks 7-8]   █████████
+Phase 4: AI Governance & Evals                       [Weeks 9-10]  █████████
+Phase 5: Human Review / Approval Queue               [Weeks 11-12] █████████
+Phase 6: Support Case Management                     [Weeks 13-14] █████████
+                                                                    │
+                                                    Total: ~14 weeks │
+```
+
+### Parallelization Opportunities
+
+- **Phase 6** (Support Cases) hard-depends only on Phase 1 (its dependency on Phase 3 is soft), so it can run **in parallel** with Phases 3–5
+- **Phases 3 + 4** can overlap if token tracking (2.7) is completed early in Phase 2
+
+### Sprint Mapping (2-week sprints)
+
+| Sprint   | Weeks | Phases                             | Key Milestone                                  |
+| -------- | ----- | ---------------------------------- | ---------------------------------------------- |
+| Sprint 1 | 1–2   | Phase 1 (core)                     | Event subscriptions + dispatcher + DLQ working |
+| Sprint 2 | 3–4   | Phase 1 (finish) + Phase 2 (start) | Agent executor + tool binding prototype        |
+| Sprint 3 | 5–6   | Phase 2 (finish)                   | Full agent runtime with streaming + metrics    |
+| Sprint 4 | 7–8   | Phase 3 + Phase 6 (parallel)       | Budget middleware + SLA engine                 |
+| Sprint 5 | 9–10  | Phase 4 + Phase 6 (finish)         | Eval runner + pre-publish gates                |
+| Sprint 6 | 11–12 | Phase 5                            | Review policies + agent-run integration        |
+| Buffer   | 13–14 | Hardening                          | Cross-module integration testing, docs         |
+
+## Dependency Graph
+
+```
+Phase 1 (Event Bus)
+  ├── Phase 2 (Agent Runtime) ──── requires events + job runner
+  │     ├── Phase 3 (AI Budget) ── requires token tracking from runs (task 2.7)
+  │     ├── Phase 4 (AI Evals) ─── requires agent executor (task 2.1)
+  │     └── Phase 5 (Reviews) ──── requires agent run pause/resume (task 2.1)
+  └── Phase 6 (Support Cases) ──── requires events for SLA timers (soft dep on Phase 3)
+```
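
The dependency graph can be checked mechanically. A sketch that computes each phase's earliest start from the durations and hard dependencies stated in this roadmap, assuming unlimited parallel capacity (which the roadmap does not assume): Phase 2 and Phase 6 can start after 3 weeks, Phases 3 to 5 after 6, for an 8-week lower bound. The ~14-week Summary Timeline runs the phases back to back, which is why it is longer than this bound.

```typescript
// Phase durations (weeks) and hard dependencies, taken from this roadmap.
// The soft Phase 6 -> Phase 3 dependency is deliberately left out.
const durationWeeks: Record<number, number> = { 1: 3, 2: 3, 3: 2, 4: 2, 5: 2, 6: 2 };
const dependsOn: Record<number, number[]> = { 1: [], 2: [1], 3: [1, 2], 4: [1, 2], 5: [1, 2], 6: [1] };

// Earliest start (weeks elapsed) per phase: a memoized longest-path walk over the DAG.
function earliestStarts(): Map<number, number> {
  const start = new Map<number, number>();
  const inProgress = new Set<number>();
  const visit = (phase: number): number => {
    const cached = start.get(phase);
    if (cached !== undefined) return cached;
    if (inProgress.has(phase)) throw new Error(`Dependency cycle at phase ${phase}`);
    inProgress.add(phase);
    const s = Math.max(0, ...dependsOn[phase].map((d) => visit(d) + durationWeeks[d]));
    inProgress.delete(phase);
    start.set(phase, s);
    return s;
  };
  Object.keys(durationWeeks).forEach((k) => visit(Number(k)));
  return start;
}

// Shortest possible end-to-end duration: the latest finish across all phases.
function minimumWeeks(): number {
  let end = 0;
  earliestStarts().forEach((s, phase) => {
    end = Math.max(end, s + durationWeeks[phase]);
  });
  return end;
}
```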
+
+## Estimated New Test Count
+
+> **Baseline:** 1,278 tests (verified 2026-03-20)
+
+| Phase             | New Tests | Cumulative |
+| ----------------- | --------- | ---------- |
+| 1 — Event Bus     | ~25       | 1,303      |
+| 2 — Agent Runtime | ~30       | 1,333      |
+| 3 — AI Budget     | ~18       | 1,351      |
+| 4 — AI Evals      | ~25       | 1,376      |
+| 5 — Reviews       | ~20       | 1,396      |
+| 6 — Support Cases | ~20       | 1,416      |
+| **Total**         | **~138**  | **~1,416** |
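
The cumulative column is a running sum over the per-phase estimates, so it can be re-derived whenever an estimate changes:

```typescript
// Baseline and per-phase estimates from the table above.
const baseline = 1278;
const newTests = [25, 30, 18, 25, 20, 20]; // Phases 1–6

// Running total after each phase.
function cumulative(start: number, increments: number[]): number[] {
  const out: number[] = [];
  let running = start;
  for (const n of increments) {
    running += n;
    out.push(running);
  }
  return out;
}

// cumulative(baseline, newTests) -> [1303, 1333, 1351, 1376, 1396, 1416]
```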
+
+## Risk Factors
+
+1. **LLM cost in evals** — Running eval suites against real LLMs can be expensive. Mitigate with mock mode + budget caps from Phase 3.
+2. **Cosmos outbox store** — `@bytelyst/queue` currently only has file + memory stores. A Cosmos-backed `QueueStore` is required for `DurableEventBus` to survive restarts. This is the critical path for Phase 1.
+3. **Tool binding security** — Agent tool execution needs sandboxing. Start with allowlist-only tools, no arbitrary code execution.
+4. **Phase coupling** — Phases 3–5 all depend on Phase 2. If Phase 2 slips, everything shifts. Mitigate by parallelizing Phase 6 (independent of Phase 2).
+5. **ai-diagnostics has 0 tests** — 5,235 LOC with zero test coverage. Not in P3 scope but a significant tech debt item that should be tracked.
+
+## Audit Log — Bugs/Gaps Found During Review (2026-03-20)
+
+Issues found by cross-referencing the original draft against the actual codebase:
+
+| #   | Issue                                                                                                                                    | Severity | Fix Applied                                                                                 |
+| --- | ---------------------------------------------------------------------------------------------------------------------------------------- | -------- | ------------------------------------------------------------------------------------------- |
+| 1   | `@bytelyst/events` already has `DurableEventBus` (queue-backed) — doc incorrectly described it as "event types + in-memory emitter"      | High     | ✅ Corrected "What Exists" + removed redundant task to create `@bytelyst/event-bus` package |
+| 2   | `jobs/` has **25 tests** — doc said 6                                                                                                    | Medium   | ✅ Fixed inventory table                                                                    |
+| 3   | `support-cases/` has **4 tests** — doc said 3                                                                                            | Low      | ✅ Fixed inventory table + Phase 6                                                          |
+| 4   | `ai-budgets` types are `BudgetPolicyDoc` + `BudgetSpendEntryDoc` + `BudgetAlertDoc` — doc said "BudgetPolicy + BudgetUsage"              | Medium   | ✅ Fixed Phase 3 "What Exists" with correct type names                                      |
+| 5   | `BudgetScopeTypeSchema` only supports `product` and `agent` — doc claimed org/workspace scopes already existed                           | High     | ✅ Reframed task 3.2 as "expand scope types" rather than "already supports"                 |
+| 6   | `POST /ai-budgets/spend` already has verdict logic (allow/warn/block), alert generation, model allowlist — Phase 3 tasks overstated work | High     | ✅ Rewrote Phase 3 to acknowledge existing 252 LOC verdict engine                           |
+| 7   | `reviews/notifications.ts` already has `notifyReviewAssigned()` — Phase 5 task 5.7 overstated                                            | Medium   | ✅ Reframed as "expand notifications"                                                       |
+| 8   | Test cumulative count started at 1,308 — actual baseline is **1,278**                                                                    | Medium   | ✅ Fixed all cumulative counts                                                              |
+| 9   | Phase 2 effort totaled 17d against a 15d (3-week) phase, an overflow                                                                    | Medium   | ✅ Reduced tasks 2.4, 2.5 to 1d each; added effort total callout                            |
+| 10  | Phase 6 dependency on Phase 3 (budget for LLM triage) is soft, not hard                                                                  | Low      | ✅ Marked as soft dependency                                                                |
+| 11  | `parentRunId` already exists in `RunSchema` — Phase 2 task 2.5 implied schema work                                                       | Low      | ✅ Clarified task is DAG query, not schema                                                  |
+| 12  | `SupportCaseNoteDoc.visibility` (internal/customer) already exists — Phase 6 task 6.6 overstated                                         | Low      | ✅ Reduced effort from 2d to 1d                                                             |
+| 13  | Missing sprint-level breakdown for "next 3 sprints" question                                                                             | Medium   | ✅ Added Sprint Plan section + 7-sprint mapping                                             |
+| 14  | `@bytelyst/queue` only has file + memory stores — Cosmos-backed store needed for production durability                                   | High     | ✅ Added as explicit task 1.3                                                               |
+| 15  | `ai-diagnostics/` has 5,235 LOC but **0 tests** — not called out as risk                                                                 | Medium   | ✅ Added to risk factors                                                                    |
+
+---
+
+**Next Step:** Review this roadmap, then start Phase 1 execution.