docs(roadmap): P3 platform deepening roadmap — reviewed and audited

- 6 phases: Event Bus, Agent Runtime, AI Budget, AI Evals, Reviews, Support Cases
- 7-sprint mapping (14 weeks) with parallelization opportunities
- Cross-referenced all claims against actual codebase — 15 bugs/gaps found and fixed:
  - DurableEventBus already exists in @bytelyst/events (not just in-memory)
  - jobs/ has 25 tests (not 6), support-cases/ has 4 (not 3)
  - ai-budgets already has verdict engine (252 LOC), alert generation, model allowlists
  - BudgetScopeType only supports product+agent (not org/workspace yet)
  - reviews/notifications.ts already has notifyReviewAssigned()
  - Phase 2 effort overflowed (17d in 15d sprint) — rebalanced
  - Test baseline corrected to 1,278 (not 1,308)
  - Identified Cosmos QueueStore gap as critical path for Phase 1
  - ai-diagnostics has 5,235 LOC but 0 tests — flagged as risk
- Estimated ~138 new tests bringing total to ~1,416
saravanakumardb1 2026-03-20 01:20:49 -07:00
parent 9e510f7b49
commit 17f5671595

@@ -0,0 +1,342 @@
# P3 — Platform Deepening Roadmap
> **Scope:** 6 remaining P3 work items for `learning_ai_common_plat`
> **Created:** 2026-03-20
> **Status:** Draft — pending review
---
## Executive Summary
All P0–P2 work is complete. The 6 remaining P3 items deepen **already-scaffolded** modules in `platform-service`. Every module listed below already has `types.ts`, `repository.ts`, `routes.ts`, and tests. The work is to add production-quality features, cross-module integrations, and comprehensive test coverage.
### Current Scaffold Inventory (verified 2026-03-20)
| Module | LOC | Files | Tests | Status |
| ----------------- | ----- | ----- | ----- | -------------------------------------------------------------------- |
| `jobs/` | 1,269 | 10 | 25 | Runner + cron + built-in jobs (most mature scaffold) |
| `runs/` | 680 | 7 | 5 | Run + step tracking + tracker utility |
| `reviews/` | 424 | 6 | 3 | Review queue with decisions + notification wiring |
| `agent-evals/` | 704 | 5 | 4 | Eval definitions + results |
| `ai-budgets/` | 681 | 5 | 4 | Budget policies + spend tracking + alert generation + verdict engine |
| `ai-diagnostics/` | 5,235 | 10 | 0 | NL query, clustering, LLM analysis (NO tests) |
| `support-cases/` | 514 | 5 | 4 | Cases + notes + escalation |
### Related Packages Already Built
| Package | Purpose | Maturity |
| ---------------------------- | ---------------------------------------------------------------------- | -------------------- |
| `@bytelyst/events` | `EventBus` (in-memory) + `DurableEventBus` (queue-backed with polling) | **Has durable mode** |
| `@bytelyst/event-store` | Persistent event log (file-store + memory-store) | Scaffolded |
| `@bytelyst/queue` | In-process task queue with `QueueWorker` + pluggable stores | Scaffolded |
| `@bytelyst/webhook-dispatch` | Webhook delivery with HMAC signing + retry | Production |
| `@bytelyst/fastify-sse` | Server-Sent Events hub + plugin | Production |
| `@bytelyst/llm-router` | LLM provider routing, fallback, health checks | Production |
| `@bytelyst/llm` | LLM client abstraction (factory, testing mock) | Production |
---
## Sprint Plan (Next 3 Sprints)
For 2-week sprints, here's the recommended execution order:
| Sprint | Weeks | Focus | Deliverables |
| ------------ | ----- | ------------------------------------------ | ---------------------------------------------------------------------------------------- |
| **Sprint 1** | 1–2   | Phase 1: Event Bus core + worker hardening | Event subscription registry, dispatcher wiring, DLQ, worker improvements, ~20 tests |
| **Sprint 2** | 3–4   | Phase 1 finish + Phase 2 start | Event replay, remaining event bus tests, agent executor, tool binding runtime, ~25 tests |
| **Sprint 3** | 5–6   | Phase 2 finish | Run streaming, agent scheduling, cancellation, token tracking, agent metrics, ~25 tests |
After Sprint 3, Phases 3–6 can proceed (2 weeks each, Phases 3+6 parallelizable).
---
## Phase 1 — Durable Event Bus + Worker Runtime (3 weeks)
**Goal:** Wire the existing `DurableEventBus` and `@bytelyst/queue` into a subscription-driven dispatch system that powers webhooks, notifications, and job triggers across all modules.
### What Exists (already built)
- `@bytelyst/events` — `EventBus` (in-memory) + **`DurableEventBus`** (queue-backed with `QueueWorker` polling, 153 LOC)
- `@bytelyst/event-store` — persistent event log (file-store + memory-store implementations)
- `@bytelyst/queue` — `QueueWorker` with pluggable `QueueStore` (file-store + memory-store)
- `modules/jobs/` — job runner with cron scheduling, built-in jobs, registry (1,269 LOC, **25 tests**)
- `modules/webhooks/` — HMAC-signed delivery with retry + auto-disable
### What Needs Building
| # | Task | Effort | Priority |
| --- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------ | -------- |
| 1.1 | **Event subscription registry** — new `modules/event-subscriptions/` module: Cosmos container `event_subscriptions` with topic, handler type (webhook / job / notification / SSE), filter expression, active flag. CRUD routes. | 2d | Critical |
| 1.2 | **Event dispatcher** — new `src/lib/event-dispatcher.ts`: consumes `DurableEventBus`, on each event looks up matching subscriptions, routes to handler (invoke webhook-dispatch, trigger job, push notification, broadcast SSE) | 3d | Critical |
| 1.3 | **Cosmos outbox store** — `QueueStore` implementation backed by Cosmos (currently only file + memory stores exist in `@bytelyst/queue`), so `DurableEventBus` can persist across restarts | 2d | Critical |
| 1.4 | **Dead-letter queue** — failed events after max retries go to `event_dlq` container with retry/purge admin endpoints | 1d | High |
| 1.5 | **Worker runtime hardening** — `modules/jobs/runner.ts`: add concurrency limits, graceful shutdown, heartbeat liveness, stuck-job recovery | 2d | High |
| 1.6 | **Event replay** — admin endpoint to replay events from event-store by time range or topic (idempotency keys prevent duplicates) | 1d | Medium |
| 1.7 | **Tests** — subscription CRUD tests, dispatcher routing tests, Cosmos queue store tests, DLQ tests, worker lifecycle tests | 2d | Critical |
**Deliverables:** `event_subscriptions` + `event_dlq` containers, Cosmos-backed `QueueStore`, dispatcher wired into `server.ts` startup, ~25 new tests.
**Dependencies:** None — foundational for all subsequent phases.
> **Note:** The roadmap originally proposed creating a new `@bytelyst/event-bus` package, but `DurableEventBus` already exists in `@bytelyst/events`. The real gap is a Cosmos-backed `QueueStore` (only file + memory stores exist) and the subscription registry + dispatcher.
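
The routing described in tasks 1.2 and 1.4 can be sketched in a few lines. This is a minimal illustration, not the planned implementation: the `EventSubscription`, `PlatformEvent`, and `DeadLetter` shapes, the `dispatch` and `topicMatches` helpers, and the `"run.*"` prefix wildcard are all assumptions made for the example. Retries run inline here for brevity, whereas the real dispatcher would presumably retry through the queue.

```typescript
// Hypothetical shapes: field names loosely follow task 1.1, the real schema may differ.
type HandlerType = "webhook" | "job" | "notification" | "sse";

interface EventSubscription {
  id: string;
  topic: string; // exact topic, or a prefix wildcard such as "run.*" (assumed syntax)
  handlerType: HandlerType;
  active: boolean;
}

interface PlatformEvent {
  id: string;
  topic: string;
  payload: unknown;
}

interface DeadLetter {
  event: PlatformEvent;
  subscriptionId: string;
  attempts: number;
  lastError: string;
}

type Handler = (event: PlatformEvent, sub: EventSubscription) => void;

function topicMatches(pattern: string, topic: string): boolean {
  if (pattern.endsWith(".*")) {
    // "run.*" matches every topic that starts with "run."
    return topic.startsWith(pattern.slice(0, -1));
  }
  return pattern === topic;
}

// Routes one event to every active matching subscription.
// A subscription whose handler fails maxAttempts times is recorded in the DLQ.
// Returns the number of successful deliveries.
function dispatch(
  event: PlatformEvent,
  subscriptions: EventSubscription[],
  handlers: Record<HandlerType, Handler>,
  maxAttempts: number,
  dlq: DeadLetter[],
): number {
  let delivered = 0;
  for (const sub of subscriptions) {
    if (!sub.active || !topicMatches(sub.topic, event.topic)) continue;
    let ok = false;
    let lastError = "";
    for (let attempt = 1; attempt <= maxAttempts && !ok; attempt++) {
      try {
        handlers[sub.handlerType](event, sub);
        ok = true;
      } catch (err) {
        lastError = err instanceof Error ? err.message : String(err);
      }
    }
    if (ok) {
      delivered++;
    } else {
      dlq.push({ event, subscriptionId: sub.id, attempts: maxAttempts, lastError });
    }
  }
  return delivered;
}
```

Keeping the DLQ entry per subscription rather than per event means one failing handler does not block or replay deliveries that already succeeded.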
---
## Phase 2 — Agent Runtime Orchestration (3 weeks)
**Goal:** Complete the agent execution lifecycle — from definition to versioned deployment, run tracking, step execution, and observability.
### What Exists
- `modules/agents/` — agent registry with version lifecycle (publish/deprecate), key lookup (13 tests)
- `modules/runs/` — run + step tracking with status machine (5 tests)
- `modules/runs/tracker.ts` — run tracking utility (118 LOC)
- `@bytelyst/llm-router` — provider/model selection with fallback + health
### What Needs Building
| # | Task | Effort | Priority |
| --- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------ | -------- |
| 2.1 | **Agent executor** — new `modules/agents/executor.ts`: resolve published version → build prompt → select model via llm-router → create run (via `tracker.ts`) → execute steps → record output | 3d | Critical |
| 2.2 | **Tool binding runtime** — resolve `toolBindings[]` from agent version to callable functions, sandboxed execution with timeout + token limits (allowlist-only, no arbitrary code) | 2d | Critical |
| 2.3 | **Run step streaming** — SSE endpoint `GET /runs/:id/stream` for real-time step progress (consumes `@bytelyst/fastify-sse`) | 1d | High |
| 2.4 | **Agent scheduling** — wire agents into jobs/cron: `POST /agents/:id/schedule` creates a recurring job that triggers agent execution | 1d | High |
| 2.5 | **Parent-child runs** — enable `parentRunId` linking for multi-agent orchestration (agent A triggers agent B), DAG query endpoint | 1d | Medium |
| 2.6 | **Run cancellation** — `POST /runs/:id/cancel` with graceful abort propagation to in-flight LLM calls | 1d | High |
| 2.7 | **Token usage tracking** — extend `RunStepDoc` with `promptTokens`, `completionTokens`, `costUsd`; auto-record into `ai-budgets` spend via existing `POST /ai-budgets/spend` endpoint | 1d | High |
| 2.8 | **Agent metrics** — `GET /agents/:id/metrics`: success rate, avg latency, token cost, run count (aggregated from runs collection) | 2d | Medium |
| 2.9 | **Tests** — executor unit tests, tool binding tests, scheduling tests, cancellation tests, metrics tests | 2d | Critical |
> **Effort total: 14d** (fits in 3 weeks with 1d buffer)
**Deliverables:** Agent executor pipeline, tool runtime, SSE streaming, scheduling integration, ~30 new tests.
**Dependencies:** Phase 1 (events for run lifecycle events, job runner for scheduling).
> **Note:** `modules/runs/tracker.ts` (118 LOC) already provides run-tracking helpers. Task 2.1 builds on top of it rather than starting from scratch. `parentRunId` is already a field in `RunSchema` — task 2.5 adds the DAG query, not the schema.
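
Since `parentRunId` already exists, the DAG query in task 2.5 is mostly a matter of assembling a tree from flat run documents. The sketch below assumes a simplified `RunNode` shape and a hypothetical `buildRunTree` helper; only `parentRunId` comes from the roadmap, and the real `RunSchema` has more fields.

```typescript
// Simplified, assumed shape: only parentRunId is taken from the roadmap.
interface RunNode {
  id: string;
  parentRunId?: string;
  status: string;
}

interface RunTree {
  run: RunNode;
  children: RunTree[];
}

// Builds the tree of runs rooted at rootId from a flat list of runs.
// Returns undefined when the root run is not in the list.
function buildRunTree(runs: RunNode[], rootId: string): RunTree | undefined {
  const byParent = new Map<string, RunNode[]>();
  let root: RunNode | undefined;
  for (const run of runs) {
    if (run.id === rootId) root = run;
    if (run.parentRunId !== undefined) {
      const siblings = byParent.get(run.parentRunId) ?? [];
      siblings.push(run);
      byParent.set(run.parentRunId, siblings);
    }
  }
  if (!root) return undefined;

  // The visited set guards against corrupt data where parent links form a cycle.
  const visited = new Set<string>();
  const expand = (run: RunNode): RunTree => {
    visited.add(run.id);
    const children = (byParent.get(run.id) ?? [])
      .filter((child) => !visited.has(child.id))
      .map(expand);
    return { run, children };
  };
  return expand(root);
}
```

Indexing children by parent up front keeps the assembly linear in the number of runs, which matters if the endpoint loads every run that shares a root.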
---
## Phase 3 — AI Budget & Cost Governance (2 weeks)
**Goal:** Extend existing budget verdict engine with org/workspace scopes, automated cost ingestion from runs, and cost reporting.
### What Exists (already built — more than expected)
- `modules/ai-budgets/` — budget policies + spend tracking + alert generation + verdict engine (681 LOC, 4 tests)
- Types: `BudgetPolicyDoc` (limits by period, soft/hard thresholds), `BudgetSpendEntryDoc` (tracked spend per call), `BudgetAlertDoc` (severity: warn/block)
- Scope types: currently `product` and `agent` only (via `BudgetScopeTypeSchema`)
- `POST /ai-budgets/spend` **already evaluates** budget verdict (allow/warn/block), generates alerts at threshold breaches, enforces model allowlists
- `GET /ai-budgets/policies/:id/status` already returns current spend vs. budget with verdict
### What Needs Building
| # | Task | Effort | Priority |
| --- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------ | -------- |
| 3.1 | **Budget enforcement middleware** — Fastify preHandler wrapping the existing verdict logic: check budget before LLM calls, return 429 when `block` verdict. Currently callers must manually call `POST /ai-budgets/spend` — middleware automates this | 1d | Critical |
| 3.2 | **Expand scope types** — add `org` and `workspace` to `BudgetScopeTypeSchema`, implement scope inheritance (agent → workspace → org → product fallback chain) | 2d | High |
| 3.3 | **Cost ingestion from runs** — subscribe to `run.completed` events (Phase 1), auto-record token costs via existing spend endpoint. Eliminates manual spend recording | 1d | High |
| 3.4 | **Alert notifications** — wire existing `BudgetAlertDoc` creation into notifications module + optional webhook event dispatch (alert generation itself already works) | 1d | High |
| 3.5 | **Cost breakdown API** — `GET /ai-budgets/costs`: breakdown by agent, model, time period, org. Supports CSV export | 2d | Medium |
| 3.6 | **Budget rollover** — configurable rollover policy: reset, carry-forward, or accumulate unused budget | 1d | Low |
| 3.7 | **Tests** — enforcement middleware tests, scope resolution tests, event-driven ingestion tests, cost aggregation tests | 1d | Critical |
> **Effort total: 9d** (fits in 2 weeks with 1d buffer)
**Deliverables:** Budget enforcement middleware, expanded scope types, event-driven cost ingestion, alert notifications, cost reporting, ~18 new tests.
**Dependencies:** Phase 2 (token tracking from runs), Phase 1 (event-driven cost ingestion).
> **Note:** The existing `POST /ai-budgets/spend` endpoint already has sophisticated verdict logic (252 LOC) with multi-policy evaluation, model allowlist enforcement, and alert generation. Phase 3 work is primarily about automation (middleware + event-driven ingestion) and scope expansion, not building the verdict engine from scratch.
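
To make the scope inheritance in task 3.2 concrete, here is a minimal sketch of the agent → workspace → org → product fallback chain followed by a single-policy verdict. It is not the existing 252 LOC verdict engine, which evaluates multiple policies. The `BudgetPolicy` and `SpendContext` shapes, the `softThreshold` fraction, and the `resolvePolicy` and `evaluate` names are assumptions for illustration.

```typescript
type ScopeType = "agent" | "workspace" | "org" | "product";
type Verdict = "allow" | "warn" | "block";

// Assumed, simplified policy shape: one limit and one soft threshold.
interface BudgetPolicy {
  scopeType: ScopeType;
  scopeId: string;
  limitUsd: number;
  softThreshold: number; // fraction of the limit that triggers "warn", e.g. 0.8
}

interface SpendContext {
  agentId?: string;
  workspaceId?: string;
  orgId?: string;
  productId: string;
}

// Most specific scope first, matching the fallback chain in task 3.2.
const FALLBACK_CHAIN: ScopeType[] = ["agent", "workspace", "org", "product"];

function resolvePolicy(ctx: SpendContext, policies: BudgetPolicy[]): BudgetPolicy | undefined {
  const ids: Record<ScopeType, string | undefined> = {
    agent: ctx.agentId,
    workspace: ctx.workspaceId,
    org: ctx.orgId,
    product: ctx.productId,
  };
  for (const scopeType of FALLBACK_CHAIN) {
    const id = ids[scopeType];
    if (id === undefined) continue;
    const match = policies.find((p) => p.scopeType === scopeType && p.scopeId === id);
    if (match) return match;
  }
  return undefined;
}

// Verdict for a request that would add requestUsd on top of spentUsd.
function evaluate(policy: BudgetPolicy, spentUsd: number, requestUsd: number): Verdict {
  const projected = spentUsd + requestUsd;
  if (projected > policy.limitUsd) return "block";
  if (projected >= policy.limitUsd * policy.softThreshold) return "warn";
  return "allow";
}
```

A preHandler as described in task 3.1 could call `resolvePolicy` and `evaluate` and respond with 429 on `"block"`. Whether the most specific policy should win outright, or every policy in the chain should be checked, is a design decision this sketch does not settle.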
---
## Phase 4 — AI Governance & Evals (2 weeks)
**Goal:** Evaluate agent quality with automated test suites, regression detection, and compliance checks before version promotion.
### What Exists
- `modules/agent-evals/` — eval definitions + result storage (704 LOC, 4 tests)
- `modules/agents/` — version lifecycle with publish/deprecate
- `@bytelyst/llm-router` — model routing
- `modules/ai-diagnostics/` — NL query, clustering, error normalization (5,235 LOC)
### What Needs Building
| # | Task | Effort | Priority |
| --- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------ | -------- |
| 4.1 | **Eval runner** — `POST /agent-evals/:id/execute`: run eval test cases against an agent version, record pass/fail/score per case | 3d | Critical |
| 4.2 | **Eval test case management** — CRUD for test cases within an eval: input, expected output, scoring rubric (exact match, LLM-as-judge, regex, contains) | 2d | Critical |
| 4.3 | **Regression detection** — compare eval results across agent versions: flag regressions where score drops >N%, block publish if regression gate is enabled | 1d | High |
| 4.4 | **Pre-publish gate** — optional policy: agent version cannot be published unless latest eval passes threshold (wired into `POST /agents/:id/versions/:vId/publish`) | 1d | High |
| 4.5 | **Eval scheduling** — recurring evals on published versions (e.g., daily smoke test) via jobs/cron | 1d | Medium |
| 4.6 | **Eval report API** — `GET /agent-evals/:id/report`: aggregate results, version comparison chart data, trend over time | 1d | Medium |
| 4.7 | **Compliance checks** — configurable rules: max response length, PII detection, banned phrases, required disclaimers. Run as post-eval validation | 2d | Medium |
| 4.8 | **Tests** — eval runner tests, regression detection tests, gate enforcement tests, compliance tests | 1d | Critical |
> **Effort total: 12d** (exceeds the 10d available in 2 weeks by 2d)
**Deliverables:** Eval execution pipeline, test case management, regression gates, compliance engine, ~25 new tests.
**Dependencies:** Phase 2 (agent executor for running evals), Phase 1 (events for eval completion notifications).
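
The regression gate in tasks 4.3 and 4.4 reduces to comparing per-case scores between two agent versions. The sketch below is illustrative only: the `EvalSummary` shape (scores keyed by test case id), the `findRegressions` and `canPublish` helpers, and the choice to skip cases missing from the candidate are all assumptions.

```typescript
// Assumed shape: one score per test case id, higher is better.
interface EvalSummary {
  agentVersion: string;
  scores: Record<string, number>;
}

interface Regression {
  caseId: string;
  baseline: number;
  candidate: number;
  dropPct: number;
}

// Flags every case whose score dropped by more than maxDropPct percent.
// Cases missing from the candidate are skipped; a stricter gate might fail them instead.
function findRegressions(
  baseline: EvalSummary,
  candidate: EvalSummary,
  maxDropPct: number,
): Regression[] {
  const regressions: Regression[] = [];
  for (const caseId of Object.keys(baseline.scores)) {
    const base = baseline.scores[caseId];
    const cand = candidate.scores[caseId];
    if (base === undefined || cand === undefined || base <= 0) continue;
    const dropPct = ((base - cand) / base) * 100;
    if (dropPct > maxDropPct) {
      regressions.push({ caseId, baseline: base, candidate: cand, dropPct });
    }
  }
  return regressions;
}

// Publishing is blocked only when the gate is enabled and a regression exists.
function canPublish(regressions: Regression[], gateEnabled: boolean): boolean {
  return !gateEnabled || regressions.length === 0;
}
```

Using a relative drop rather than an absolute one keeps the threshold meaningful across rubrics with different score ranges, at the cost of being noisy for cases whose baseline score is already close to zero.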
---
## Phase 5 — Human Review / Approval Queue (2 weeks)
**Goal:** Deepen the review module into a full human-in-the-loop approval system for agent actions, content changes, and sensitive operations.
### What Exists (already built)
- `modules/reviews/` — review items with decisions + notification wiring (424 LOC, 3 tests)
- `reviews/notifications.ts` — `notifyReviewAssigned()` already exists and is called on create/update
- Review types: `ReviewItemDoc` with status machine (pending → assigned → approved/rejected/cancelled/expired)
- `POST /reviews/:id/decision` — approve/reject/cancel with resolution audit trail (reason + actedBy + actedAt)
- `dueAt` field already exists on `ReviewItemDoc` (but no auto-expiry job yet)
### What Needs Building
| # | Task | Effort | Priority |
| --- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------ | -------- |
| 5.1 | **Review policies** — configurable rules: which agent actions require review, auto-approve after N successful runs, escalation timers | 2d | Critical |
| 5.2 | **Batch review** — `POST /reviews/batch-decide`: approve/reject multiple items with shared reason (max 50) | 1d | High |
| 5.3 | **Auto-expiry** — background job (via `modules/jobs/`) expires stale reviews past `dueAt`, with configurable default TTL per policy | 1d | High |
| 5.4 | **Delegation** — `POST /reviews/:id/delegate`: reassign review to another user with audit trail | 1d | Medium |
| 5.5 | **Review queue stats** — `GET /reviews/stats`: pending count by priority/category/assignee, avg resolution time, SLA compliance | 1d | High |
| 5.6 | **Review integration with agent runs** — when agent action requires review, run pauses at step, creates review item, resumes on approval (consumes Phase 2 executor) | 2d | Critical |
| 5.7 | **Expand review notifications** — `notifyReviewAssigned()` already exists; add: review expiring soon, review decided, escalation triggered (wire into event bus from Phase 1) | 1d | Medium |
| 5.8 | **Tests** — policy enforcement tests, batch review tests, auto-expiry tests, delegation tests, stats tests | 1d | Critical |
> **Effort total: 10d** (fits in 2 weeks)
**Deliverables:** Review policies, batch operations, auto-expiry job, agent integration, queue analytics, ~20 new tests.
**Dependencies:** Phase 2 (agent run pause/resume), Phase 1 (events + job runner for expiry).
> **Note:** The review module is more mature than typical scaffolds — it already has notification wiring, decision audit trails, and workspace-scoped reviews. The main gaps are policies (automation rules), batch operations, and the agent-run integration.
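
The auto-expiry job in task 5.3 can be expressed as a pure function that the job runner calls on a schedule. This is a sketch under assumptions: the `ReviewItem` shape is reduced to the fields needed here (`status` and `dueAt` come from the roadmap, `createdAt` and the `defaultTtlMs` fallback are hypothetical), and `expireStaleReviews` is an illustrative name.

```typescript
type ReviewStatus = "pending" | "assigned" | "approved" | "rejected" | "cancelled" | "expired";

// Reduced, assumed shape. Timestamps are ISO 8601 strings.
interface ReviewItem {
  id: string;
  status: ReviewStatus;
  createdAt: string;
  dueAt?: string;
}

// Returns a new list in which every open review past its deadline is marked expired.
// Items without dueAt fall back to createdAt + defaultTtlMs (the per-policy default TTL).
function expireStaleReviews(items: ReviewItem[], now: Date, defaultTtlMs: number): ReviewItem[] {
  return items.map((item): ReviewItem => {
    // Only open reviews can expire; decided reviews are left untouched.
    if (item.status !== "pending" && item.status !== "assigned") return item;
    const deadline =
      item.dueAt !== undefined
        ? Date.parse(item.dueAt)
        : Date.parse(item.createdAt) + defaultTtlMs;
    return deadline <= now.getTime() ? { ...item, status: "expired" } : item;
  });
}
```

Passing `now` in as a parameter keeps the function deterministic, so the expiry rules can be tested without mocking the clock.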
---
## Phase 6 — Support Case Management (2 weeks)
**Goal:** Deepen support cases into a complete ticket system with SLA tracking, auto-triage, knowledge base integration, and customer communication.
### What Exists (already built)
- `modules/support-cases/` — cases + notes + escalation events (514 LOC, **4 tests**)
- Types: `SupportCaseDoc` (7 statuses, 4 priorities, 4 sources), `SupportCaseNoteDoc` (internal/customer visibility), `SupportEscalationEventDoc`
- Full CRUD routes: create/list/get/update cases, add notes, list notes, create escalation, list escalations
- Linked fields: `runId`, `reviewId`, `knowledgeBaseId` already on `SupportCaseDoc`
- `modules/knowledge/` — knowledge base with text search + retrieval (9 tests)
- `modules/ai-diagnostics/` — NL query, error clustering, LLM analysis (5,235 LOC, 0 tests)
### What Needs Building
| # | Task | Effort | Priority |
| --- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------ | -------- |
| 6.1 | **SLA engine** — define SLA policies per priority (response time, resolution time), track compliance, fire alerts on breach via event bus | 2d | Critical |
| 6.2 | **Auto-triage** — on case creation, use LLM to classify priority + category + suggest knowledge articles, auto-assign based on rules | 2d | High |
| 6.3 | **Knowledge integration** — `POST /support-cases/:id/suggest-articles`: search linked knowledge base (via existing `searchChunks`) for relevant content, attach top matches | 1d | High |
| 6.4 | **Case timeline** — unified timeline API merging notes, status changes, escalations, and linked run/review events | 1d | High |
| 6.5 | **Case metrics** — `GET /support-cases/metrics`: open count by status/priority, MTTR, SLA compliance %, top categories | 1d | Medium |
| 6.6 | **Customer communication** — internal vs. customer-visible notes (visibility field already exists on `SupportCaseNoteDoc`), email notification on customer-visible note creation | 1d | Medium |
| 6.7 | **Case linking** — link related cases (duplicate, parent/child), merge duplicates with note consolidation | 1d | Medium |
| 6.8 | **Tests** — SLA engine tests, auto-triage tests, knowledge suggestion tests, timeline tests, metrics tests | 1d | Critical |
> **Effort total: 10d** (fits in 2 weeks)
**Deliverables:** SLA engine, auto-triage pipeline, knowledge integration, unified timeline, ~20 new tests.
**Dependencies:** Phase 1 (events for SLA timer jobs). Phase 3 is a **soft dependency** (budget awareness for LLM triage calls — can use existing spend endpoint directly if Phase 3 isn't complete).
> **Note:** The support-cases module already has robust types with visibility on notes, escalation events, and linked fields to runs/reviews/knowledge bases. Task 6.6 effort is reduced because the `visibility` enum (internal/customer) already exists on `SupportCaseNoteDoc` — the work is wiring email notifications, not schema changes.
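
The core check behind the SLA engine in task 6.1 is a comparison of elapsed time against per-priority targets. The sketch below assumes four priority names (the roadmap only says there are four), a `SlaPolicy` with response and resolution targets in minutes, and hypothetical `firstResponseAt` and `resolvedAt` timestamps on the case.

```typescript
// Priority names are assumed; the roadmap only states that there are 4 priorities.
type Priority = "low" | "medium" | "high" | "urgent";

interface SlaPolicy {
  responseMinutes: number;
  resolutionMinutes: number;
}

// Assumed timestamp fields, ISO 8601 strings.
interface CaseTimes {
  priority: Priority;
  createdAt: string;
  firstResponseAt?: string;
  resolvedAt?: string;
}

interface SlaStatus {
  responseBreached: boolean;
  resolutionBreached: boolean;
}

// Evaluates both SLA clocks. A clock that has not stopped yet is measured up to `now`,
// so an open case is reported as breached as soon as its target has passed.
function checkSla(c: CaseTimes, policies: Record<Priority, SlaPolicy>, now: Date): SlaStatus {
  const policy = policies[c.priority];
  const created = Date.parse(c.createdAt);
  const elapsedMinutes = (endIso?: string): number =>
    ((endIso !== undefined ? Date.parse(endIso) : now.getTime()) - created) / 60000;
  return {
    responseBreached: elapsedMinutes(c.firstResponseAt) > policy.responseMinutes,
    resolutionBreached: elapsedMinutes(c.resolvedAt) > policy.resolutionMinutes,
  };
}
```

A periodic job could run `checkSla` over open cases and publish a breach event on the event bus the first time either flag turns true.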
---
## Summary Timeline
```
Phase 1: Durable Event Bus + Worker Runtime [Weeks 1-3] ██████████████
Phase 2: Agent Runtime Orchestration [Weeks 4-6] ██████████████
Phase 3: AI Budget & Cost Governance [Weeks 7-8] █████████
Phase 4: AI Governance & Evals [Weeks 9-10] █████████
Phase 5: Human Review / Approval Queue [Weeks 11-12] █████████
Phase 6: Support Case Management [Weeks 13-14] █████████
Total: ~14 weeks │
```
### Parallelization Opportunities
- **Phase 6** (Support Cases) has only a soft dependency on Phase 3 — can run **in parallel** with Phases 3–5
- **Phases 3 + 4** can overlap if token tracking (2.7) is completed early in Phase 2
### Sprint Mapping (2-week sprints)
| Sprint | Weeks | Phases | Key Milestone |
| -------- | ----- | ---------------------------------- | ---------------------------------------------- |
| Sprint 1 | 1–2   | Phase 1 (core)                     | Event subscriptions + dispatcher + DLQ working |
| Sprint 2 | 3–4   | Phase 1 (finish) + Phase 2 (start) | Agent executor + tool binding prototype        |
| Sprint 3 | 5–6   | Phase 2 (finish)                   | Full agent runtime with streaming + metrics    |
| Sprint 4 | 7–8   | Phase 3 + Phase 6 (parallel)       | Budget middleware + SLA engine                 |
| Sprint 5 | 9–10  | Phase 4 + Phase 6 (finish)         | Eval runner + pre-publish gates                |
| Sprint 6 | 11–12 | Phase 5                            | Review policies + agent-run integration        |
| Buffer   | 13–14 | Hardening                          | Cross-module integration testing, docs         |
## Dependency Graph
```
Phase 1 (Event Bus)
├── Phase 2 (Agent Runtime) ──── requires events + job runner
│ ├── Phase 3 (AI Budget) ── requires token tracking from runs (task 2.7)
│ ├── Phase 4 (AI Evals) ─── requires agent executor (task 2.1)
│ └── Phase 5 (Reviews) ──── requires agent run pause/resume (task 2.1)
└── Phase 6 (Support Cases) ──── requires events for SLA timers (soft dep on Phase 3)
```
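
The graph above can also be read as data. The sketch below encodes only the hard dependencies from the graph (the soft Phase 3 dependency of Phase 6 is left out) and groups phases into waves, where each wave contains the phases whose prerequisites are all complete. The `scheduleWaves` helper is hypothetical, and the result ignores staffing limits, so it shows what could start when rather than a recommended plan.

```typescript
// Hard dependencies only, transcribed from the dependency graph.
const hardDeps: Record<string, string[]> = {
  "Phase 1": [],
  "Phase 2": ["Phase 1"],
  "Phase 3": ["Phase 2"],
  "Phase 4": ["Phase 2"],
  "Phase 5": ["Phase 2"],
  "Phase 6": ["Phase 1"],
};

// Layered topological sort: each wave holds every phase whose dependencies
// were all completed in earlier waves.
function scheduleWaves(deps: Record<string, string[]>): string[][] {
  const done = new Set<string>();
  const remaining = new Set<string>(Object.keys(deps));
  const waves: string[][] = [];
  while (remaining.size > 0) {
    const ready = Array.from(remaining).filter((phase) =>
      (deps[phase] ?? []).every((dep) => done.has(dep)),
    );
    if (ready.length === 0) throw new Error("dependency cycle detected");
    for (const phase of ready) {
      remaining.delete(phase);
      done.add(phase);
    }
    waves.push(ready);
  }
  return waves;
}

const waves = scheduleWaves(hardDeps);
// waves[0]: Phase 1
// waves[1]: Phase 2 and Phase 6
// waves[2]: Phases 3, 4 and 5
```

The second wave is what makes Phase 6 a candidate for parallel work: by hard dependencies alone it can start as soon as Phase 1 is done.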
## Estimated New Test Count
> **Baseline:** 1,278 tests (verified 2026-03-20)
| Phase | New Tests | Cumulative |
| ----------------- | --------- | ---------- |
| 1 — Event Bus | ~25 | 1,303 |
| 2 — Agent Runtime | ~30 | 1,333 |
| 3 — AI Budget | ~18 | 1,351 |
| 4 — AI Evals | ~25 | 1,376 |
| 5 — Reviews | ~20 | 1,396 |
| 6 — Support Cases | ~20 | 1,416 |
| **Total** | **~138** | **~1,416** |
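
The cumulative column is a running sum over the per-phase estimates, starting from the verified baseline. A short check of the arithmetic, using the numbers from the table (the variable names are illustrative):

```typescript
const baselineTests = 1278;
// New tests per phase, in order: Phases 1 to 6.
const newTestsByPhase = [25, 30, 18, 25, 20, 20];

const cumulative: number[] = [];
let runningTotal = baselineTests;
for (const added of newTestsByPhase) {
  runningTotal += added;
  cumulative.push(runningTotal);
}

const totalNew = runningTotal - baselineTests;
// cumulative: 1303, 1333, 1351, 1376, 1396, 1416
// totalNew: 138
```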
## Risk Factors
1. **LLM cost in evals** — Running eval suites against real LLMs can be expensive. Mitigate with mock mode + budget caps from Phase 3.
2. **Cosmos outbox store** — `@bytelyst/queue` currently only has file + memory stores. A Cosmos-backed `QueueStore` is required for `DurableEventBus` to survive restarts. This is the critical path for Phase 1.
3. **Tool binding security** — Agent tool execution needs sandboxing. Start with allowlist-only tools, no arbitrary code execution.
4. **Phase coupling** — Phases 3–5 all depend on Phase 2. If Phase 2 slips, everything shifts. Mitigate by parallelizing Phase 6 (independent of Phase 2).
5. **ai-diagnostics has 0 tests** — 5,235 LOC with zero test coverage. Not in P3 scope but a significant tech debt item that should be tracked.
## Audit Log — Bugs/Gaps Found During Review (2026-03-20)
Issues found by cross-referencing the original draft against the actual codebase:
| # | Issue | Severity | Fix Applied |
| --- | ---------------------------------------------------------------------------------------------------------------------------------------- | -------- | ------------------------------------------------------------------------------------------- |
| 1 | `@bytelyst/events` already has `DurableEventBus` (queue-backed) — doc incorrectly described it as "event types + in-memory emitter" | High | ✅ Corrected "What Exists" + removed redundant task to create `@bytelyst/event-bus` package |
| 2 | `jobs/` has **25 tests** — doc said 6 | Medium | ✅ Fixed inventory table |
| 3 | `support-cases/` has **4 tests** — doc said 3 | Low | ✅ Fixed inventory table + Phase 6 |
| 4 | `ai-budgets` types are `BudgetPolicyDoc` + `BudgetSpendEntryDoc` + `BudgetAlertDoc` — doc said "BudgetPolicy + BudgetUsage" | Medium | ✅ Fixed Phase 3 "What Exists" with correct type names |
| 5 | `BudgetScopeTypeSchema` only supports `product` and `agent` — doc claimed org/workspace scopes already existed | High | ✅ Reframed task 3.2 as "expand scope types" rather than "already supports" |
| 6 | `POST /ai-budgets/spend` already has verdict logic (allow/warn/block), alert generation, model allowlist — Phase 3 tasks overstated work | High | ✅ Rewrote Phase 3 to acknowledge existing 252 LOC verdict engine |
| 7 | `reviews/notifications.ts` already has `notifyReviewAssigned()` — Phase 5 task 5.7 overstated | Medium | ✅ Reframed as "expand notifications" |
| 8 | Test cumulative count started at 1,308 — actual baseline is **1,278** | Medium | ✅ Fixed all cumulative counts |
| 9 | Phase 2 effort totaled 17d in a 15d (3-week) sprint — overflow | Medium | ✅ Reduced tasks 2.4, 2.5 to 1d each; added effort total callout |
| 10 | Phase 6 dependency on Phase 3 (budget for LLM triage) is soft, not hard | Low | ✅ Marked as soft dependency |
| 11 | `parentRunId` already exists in `RunSchema` — Phase 2 task 2.5 implied schema work | Low | ✅ Clarified task is DAG query, not schema |
| 12 | `SupportCaseNoteDoc.visibility` (internal/customer) already exists — Phase 6 task 6.6 overstated | Low | ✅ Reduced effort from 2d to 1d |
| 13 | Missing sprint-level breakdown for "next 3 sprints" question | Medium | ✅ Added Sprint Plan section + 7-sprint mapping |
| 14 | `@bytelyst/queue` only has file + memory stores — Cosmos-backed store needed for production durability | High | ✅ Added as explicit task 1.3 |
| 15 | `ai-diagnostics/` has 5,235 LOC but **0 tests** — not called out as risk | Medium | ✅ Added to risk factors |
---
**Next Step:** Review this roadmap, then start Phase 1 execution.