docs(roadmaps): add AI diagnostic assistant, A/B testing, and churn prediction roadmaps

- AI Diagnostic Assistant: LLM-powered root cause analysis, error clustering, natural language queries
- Intelligent A/B Testing: Thompson sampling, Bayesian early stopping, AI hypothesis generation
- Predictive Churn & Health: XGBoost models, health scoring, automated retention campaigns

All roadmaps include:
- Implementation tracking tables with status/commit columns
- Quick reference sections with file structures
- Phase-by-phase task breakdowns with [X.Y.Z] codes
2026-03-03 11:38:56 -08:00
commit e98380003b
parent d510867b87
3 changed files with 2164 additions and 0 deletions
--- a/docs/roadmaps/AI_DIAGNOSTIC_ASSISTANT_ROADMAP.md
+++ b/docs/roadmaps/AI_DIAGNOSTIC_ASSISTANT_ROADMAP.md
@@ -0,0 +1,597 @@
 # AI Diagnostic Assistant — Implementation Roadmap
 > **Module:** `platform-service/src/modules/ai-diagnostics/`  
 > **Admin UI:** `/ops/ai-diagnostics/`  
 > **Target:** LLM-powered root cause analysis from telemetry + debug sessions  
 > **Estimated Effort:** 2–3 weeks  
 > **Status:** 🟡 Planning
 ---
 ## Executive Summary
 This roadmap delivers an **AI-powered diagnostic assistant** that analyzes error patterns, debug session data, and telemetry to automatically suggest root causes—like having a senior engineer on-call 24/7. Engineers can ask natural language questions like _"Why did the iOS keyboard crash yesterday?"_ and receive AI-generated hypotheses with supporting evidence.
 ### Key Differentiators vs. Manual Debugging
 | Feature           | Manual Debugging            | AI Diagnostic Assistant             |
 | ----------------- | --------------------------- | ----------------------------------- |
 | Query             | SQL + log grep              | **Natural language**                |
 | Pattern Detection | Hours of manual correlation | **AI finds hidden patterns**        |
 | Context Assembly  | Check 5+ systems manually   | **Auto-assembles timeline**         |
 | Hypothesis        | Engineer intuition          | **LLM-generated + evidence**        |
 | Learning          | Per-engineer experience     | **Accumulates across all sessions** |
 ---
 ## Phase 1: Data Pipeline & Embeddings (Week 1)
 **Goal:** Extract, normalize, and embed error data for semantic search and clustering.
 ### 1.1 Error Fingerprinting & Clustering
 - [ ] **1.1.1** Create `modules/ai-diagnostics/types.ts`
  - [ ] `ErrorClusterDoc` — grouped similar errors with signature
  - [ ] `ErrorFingerprint` — normalized stack trace hash
  - [ ] `ClusterAnalysis` — AI-generated pattern description
  - [ ] Zod schemas for all inputs
  _Commit format:_ `git commit -m "feat(ai-diagnostics): add error clustering types [1.1.1]"` → `https://github.com/saravanakumardb1/learning_ai_common_plat/commit/<hash>`
 - [ ] **1.1.2** Add Cosmos containers to `cosmos-init.ts`
  - [ ] `error_clusters` (pk: `/productId`, TTL: 90 days)
  - [ ] `error_fingerprints` (pk: `/fingerprintHash`, unique index)
  - [ ] `diagnostic_insights` (pk: `/clusterId`, AI-generated analyses)
  _Commit format:_ `git commit -m "feat(ai-diagnostics): add cosmos containers for error clustering [1.1.2]"`
 - [ ] **1.1.3** Implement error normalization
  - [ ] Stack trace parsing (remove line numbers, file paths)
  - [ ] Message templating (replace UUIDs, timestamps, user IDs with placeholders)
  - [ ] Fingerprint generation (SHA-256 of normalized error)
  - [ ] Similarity scoring (Levenshtein for near-matches)
  _Commit format:_ `git commit -m "feat(ai-diagnostics): implement error normalization and fingerprinting [1.1.3]"`
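
The normalization steps in 1.1.3 could be sketched roughly as follows. This is a minimal illustration, not the planned implementation: the placeholder patterns, the V8-style `at fn (path:line:col)` frame format, and the function names `templateMessage`, `normalizeStack`, and `fingerprint` are all assumptions. Levenshtein similarity scoring is left out.

```typescript
import { createHash } from "node:crypto";

// Hypothetical placeholder patterns; the real set depends on the log formats in use.
const UUID_RE = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;
const ISO_TS_RE = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?/g;
const LONG_NUMBER_RE = /\b\d{4,}\b/g;

// Replace volatile values with placeholders so equivalent messages compare equal.
export function templateMessage(message: string): string {
  return message
    .replace(UUID_RE, "<uuid>")
    .replace(ISO_TS_RE, "<timestamp>")
    .replace(LONG_NUMBER_RE, "<num>");
}

// Keep only function names from "at fn (path:line:col)" frames,
// dropping file paths and line numbers.
export function normalizeStack(stack: string): string {
  return stack
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("at "))
    .map((line) => {
      const match = /^at\s+(.+?)\s+\(.*\)$/.exec(line);
      // Frames without a function name ("at path:line:col") become <anonymous>.
      return match ? match[1] : "<anonymous>";
    })
    .join("|");
}

// SHA-256 over the normalized error, returned as a 64-character hex string.
export function fingerprint(errorType: string, message: string, stack: string): string {
  const normalized = [errorType, templateMessage(message), normalizeStack(stack)].join("\n");
  return createHash("sha256").update(normalized).digest("hex");
}
```

Two occurrences that differ only in IDs, timestamps, paths, or line numbers then hash to the same `fingerprintHash`, which is what makes the unique index on `error_fingerprints` useful.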
 ### 1.2 Vector Embeddings for Semantic Search
 - [ ] **1.2.1** Create embedding pipeline
  - [ ] Azure OpenAI `text-embedding-3-small` integration
  - [ ] Error message + stack trace → 1536-dim vector
  - [ ] Batch embedding job (100 errors at a time)
 - [ ] **1.2.2** Cosmos DB vector search setup
  - [ ] Store embeddings in `error_clusters` documents
  - [ ] Cosine similarity query function
  - [ ] Similar error lookup by vector distance
 - [ ] **1.2.3** Clustering algorithm
  - [ ] HDBSCAN for density-based clustering
  - [ ] DBSCAN fallback for smaller datasets
  - [ ] Auto-determine cluster count (no manual k)
  - [ ] Re-cluster nightly as new errors arrive
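
For 1.2.2, the similarity function itself is small. The sketch below computes cosine similarity in application code and ranks clusters by brute force. It is only a stand-in for illustration, since the roadmap intends the lookup to run as a Cosmos DB vector query; `EmbeddedCluster` and `findSimilarClusters` are hypothetical names.

```typescript
// Cosine similarity between two equal-length embedding vectors.
export function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical shape: only the fields needed for the lookup.
interface EmbeddedCluster {
  id: string;
  embedding: number[];
}

// Brute-force top-k lookup, standing in for the database-side vector query.
export function findSimilarClusters(
  query: readonly number[],
  clusters: readonly EmbeddedCluster[],
  k = 5
): Array<{ id: string; score: number }> {
  return clusters
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A brute-force version like this is mainly useful in unit tests, where it can act as a reference against which the database query results are checked.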
 ### 1.3 Telemetry Ingestion for Context
 - [ ] **1.3.1** Link telemetry to errors
  - [ ] `correlationId` propagation across services
  - [ ] 5-minute window: error → preceding telemetry events
  - [ ] Session state reconstruction (what user was doing)
 - [ ] **1.3.2** Enrich error context
  - [ ] Device info (OS version, model, memory)
  - [ ] App state (screen, feature flags, config)
  - [ ] Recent API calls (network trace from diagnostics)
  - [ ] Recent user actions (breadcrumb trail)
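
The 5-minute window in 1.3.1 amounts to a filter on `correlationId` and timestamp. A minimal in-memory sketch, assuming ISO 8601 timestamps and hypothetical `TelemetryEvent` / `ErrorOccurrence` shapes (the real pipeline would presumably push this filter into the telemetry store query):

```typescript
// Hypothetical minimal shapes; the real documents carry more fields.
interface TelemetryEvent {
  correlationId: string;
  timestamp: string; // ISO 8601
  name: string;
}

interface ErrorOccurrence {
  correlationId: string;
  timestamp: string; // ISO 8601
}

const CONTEXT_WINDOW_MS = 5 * 60 * 1000;

// Return the events sharing the error's correlationId that occurred within
// the window before the error, oldest first.
export function precedingEvents(
  error: ErrorOccurrence,
  events: readonly TelemetryEvent[],
  windowMs: number = CONTEXT_WINDOW_MS
): TelemetryEvent[] {
  const errorTime = Date.parse(error.timestamp);
  const windowStart = errorTime - windowMs;
  return events
    .filter((e) => e.correlationId === error.correlationId)
    .map((e) => ({ event: e, time: Date.parse(e.timestamp) }))
    .filter(({ time }) => time >= windowStart && time <= errorTime)
    .sort((a, b) => a.time - b.time)
    .map(({ event }) => event);
}
```

The ordered result is the raw material for the breadcrumb trail and session state reconstruction listed above.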
 **Phase 1 Exit Criteria:**
 - [ ] Errors auto-clustered with 90%+ accuracy
 - [ ] Vector search returns semantically similar errors
 - [ ] 10,000+ historical errors embedded and clustered
 - [ ] Correlation pipeline links errors to telemetry context
 ---
 ## Phase 2: LLM Analysis Engine (Week 1–2)
 ### 2.1 Prompt Engineering & Analysis Pipeline
 - [ ] **2.1.1** Create analysis prompts
  - [ ] `ROOT_CAUSE_ANALYSIS` prompt template
    ```
    Given this error cluster:
    - Error signature: {fingerprint}
    - Sample stack traces: {samples}
    - Common context: {deviceStats}, {appState}
    - Preceding events: {breadcrumbSummary}
    - Similar resolved issues: {relatedClusters}
    Analyze and provide:
    1. Likely root cause category (config, dependency, logic, resource, external)
    2. Specific hypothesis with reasoning
    3. Evidence confidence (high/medium/low)
    4. Suggested investigation steps
    5. Potential fix direction
    ```
  - [ ] `PATTERN_SUMMARY` prompt for cluster descriptions
  - [ ] `COMPARATIVE_ANALYSIS` for error vs. baseline
 - [ ] **2.1.2** LLM integration
  - [ ] Azure OpenAI GPT-4o-mini for analysis (cost-effective)
  - [ ] GPT-4o for complex multi-factor analysis
  - [ ] Response JSON schema enforcement
  - [ ] Retry logic with exponential backoff
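
The retry item in 2.1.2 could look roughly like this. It is a generic sketch rather than anything tied to a specific Azure OpenAI client: the base delay, cap, attempt count, and jitter fraction are all assumed values, and `withRetry` is a hypothetical helper name.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxDelayMs.
export function backoffDelayMs(attempt: number, baseDelayMs = 500, maxDelayMs = 8000): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run `operation`, retrying on any thrown error up to maxAttempts times in total.
export async function withRetry<T>(operation: () => Promise<T>, maxAttempts = 4): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Do not sleep after the final failed attempt.
      if (attempt < maxAttempts - 1) {
        const delay = backoffDelayMs(attempt);
        // Up to 20% random jitter so concurrent callers do not retry in lockstep.
        await sleep(delay + Math.random() * delay * 0.2);
      }
    }
  }
  throw lastError;
}
```

In practice the retry should probably be limited to transient failures such as rate limiting or timeouts; schema validation errors in the LLM response are better handled by re-prompting than by blind retries.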
 ### 2.2 Insight Generation Service
 - [ ] **2.2.1** Create `modules/ai-diagnostics/analyzer.ts`
  - [ ] `analyzeCluster(clusterId)` — full analysis workflow
  - [ ] `generateInsight(errorContext)` — single error analysis
  - [ ] `compareClusters(clusterA, clusterB)` — diff analysis
 - [ ] **2.2.2** Analysis workflow
  - [ ] Fetch cluster data + related telemetry
  - [ ] Build LLM context (respect token limits)
  - [ ] Call LLM with structured prompt
  - [ ] Parse and validate response
  - [ ] Store insight in `diagnostic_insights`
 - [ ] **2.2.3** Confidence scoring
  - [ ] Evidence count weighting
  - [ ] Similar resolved issue bonus
  - [ ] Recency decay (older patterns = lower confidence)
  - [ ] Multi-model consensus (if available)
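
One way the factors in 2.2.3 might combine into the `confidenceScore` (0.0 to 1.0) and `confidence` label stored on an insight is sketched below. Every constant here (the saturation rate, the 0.15 bonus, the 30-day half-life, the label thresholds) is a placeholder assumption to be calibrated against feedback data, and multi-model consensus is not modeled.

```typescript
// Hypothetical inputs for a single insight.
interface ConfidenceInputs {
  evidenceCount: number; // pieces of supporting evidence
  hasSimilarResolvedIssue: boolean;
  daysSinceLastSeen: number; // age of the underlying pattern
}

export function confidenceScore(inputs: ConfidenceInputs): number {
  // Evidence weighting: saturates toward 1 so extra evidence has diminishing returns.
  const evidence = 1 - Math.exp(-inputs.evidenceCount / 3);
  // Flat bonus when a similar issue has already been resolved.
  const bonus = inputs.hasSimilarResolvedIssue ? 0.15 : 0;
  // Recency decay: the score halves every 30 days.
  const decay = Math.pow(0.5, inputs.daysSinceLastSeen / 30);
  const raw = (0.85 * evidence + bonus) * decay;
  return Math.min(1, Math.max(0, raw));
}

// Map the numeric score onto the high/medium/low label.
export function confidenceLabel(score: number): "high" | "medium" | "low" {
  if (score >= 0.7) return "high";
  if (score >= 0.4) return "medium";
  return "low";
}
```

Keeping the scoring a pure function makes the recalibration step in 2.3.1 easier: the constants can be refit offline and compared against actual resolution rates before being changed.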
 ### 2.3 Continuous Learning
 - [ ] **2.3.1** Feedback loop
  - [ ] Engineer feedback: "Was this insight helpful? 👍/👎"
  - [ ] Resolution tracking (link commits to clusters)
  - [ ] Confidence recalibration based on outcomes
 - [ ] **2.3.2** Pattern accumulation
  - [ ] "Known issues" database (manually curated)
  - [ ] Historical fix patterns (what solved similar issues)
  - [ ] Regression detection (old issue reappearing)
 **Phase 2 Exit Criteria:**
 - [ ] LLM generates root cause hypotheses with evidence
 - [ ] Confidence scores align with actual resolution rates
 - [ ] Analysis completes in < 5 seconds for typical clusters
 - [ ] Feedback loop capturing engineer ratings
 ---
 ## Phase 3: Natural Language Query Interface (Week 2)
 ### 3.1 Query Understanding
 - [ ] **3.1.1** Create `modules/ai-diagnostics/query-parser.ts`
  - [ ] Intent classification (root cause, pattern search, comparison, trend)
  - [ ] Entity extraction (product, time range, error type, user segment)
  - [ ] Temporal parsing ("yesterday", "last week", "since v2.1")
  - [ ] Constraint identification ("only iOS", "excluding beta users")
 - [ ] **3.1.2** Query patterns
  - [ ] Root cause: _"Why did X happen?"_ → analyze cluster
  - [ ] Pattern search: _"Show me similar crashes"_ → vector search
  - [ ] Comparison: _"Did error rate increase after release?"_ → trend analysis
  - [ ] User impact: _"How many users affected by Y?"_ → aggregation query
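
Before involving an LLM, the intent and temporal parsing in 3.1 can be prototyped with simple rules. The sketch below is a keyword-based baseline under stated assumptions: the rule list is illustrative, "last week" is interpreted as the trailing seven days, all dates are UTC, and release-relative phrases such as "since v2.1" are not handled because they need a release lookup.

```typescript
type Intent = "root_cause" | "pattern_search" | "comparison" | "trend" | "impact";

// Ordered, hypothetical keyword rules; the first match wins.
const INTENT_RULES: Array<{ pattern: RegExp; intent: Intent }> = [
  { pattern: /^\s*why\b/i, intent: "root_cause" },
  { pattern: /\bsimilar\b/i, intent: "pattern_search" },
  { pattern: /\bhow many\b|\baffected\b/i, intent: "impact" },
  { pattern: /\b(increase|decrease|compared?|versus|vs)\b/i, intent: "comparison" },
  { pattern: /\b(trend|over time)\b/i, intent: "trend" },
];

// Returns null when no rule matches, so the caller can fall back to an LLM.
export function classifyIntent(query: string): Intent | null {
  for (let i = 0; i < INTENT_RULES.length; i++) {
    if (INTENT_RULES[i].pattern.test(query)) return INTENT_RULES[i].intent;
  }
  return null;
}

interface TimeRange {
  start: string; // ISO 8601, inclusive
  end: string; // ISO 8601, exclusive
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Resolve relative phrases against `now` (injected so tests are deterministic).
export function parseTimeRange(query: string, now: Date = new Date()): TimeRange | null {
  const todayStart = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  if (/\byesterday\b/i.test(query)) {
    return {
      start: new Date(todayStart - DAY_MS).toISOString(),
      end: new Date(todayStart).toISOString(),
    };
  }
  if (/\blast week\b/i.test(query)) {
    // Assumption: "last week" means the trailing seven days, not the previous calendar week.
    return {
      start: new Date(now.getTime() - 7 * DAY_MS).toISOString(),
      end: now.toISOString(),
    };
  }
  return null;
}
```

A rule-based baseline like this also gives the 90%+ intent accuracy exit criterion something concrete to be measured against.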
 ### 3.2 Query Execution Engine
 - [ ] **3.2.1** Query → data pipeline
  - [ ] Map entities to Cosmos queries
  - [ ] Fetch relevant clusters, telemetry, sessions
  - [ ] Assemble context for response generation
 - [ ] **3.2.2** Response generation
  - [ ] Direct answers for simple queries
  - [ ] AI-generated summaries for complex analysis
  - [ ] Data + visualization suggestions
  - [ ] Drill-down links for exploration
 ### 3.3 REST API Routes
 - [ ] **3.3.1** Create `modules/ai-diagnostics/routes.ts`
  - [ ] `POST /ai-diagnostics/query` — natural language question
  - [ ] `GET /ai-diagnostics/clusters/:id/analysis` — pre-computed insight
  - [ ] `POST /ai-diagnostics/clusters/:id/analyze` — trigger fresh analysis
  - [ ] `GET /ai-diagnostics/suggestions` — auto-suggested investigations
  - [ ] `POST /ai-diagnostics/feedback` — submit insight rating
 **Phase 3 Exit Criteria:**
 - [ ] Natural language queries parse correctly (90%+ intent accuracy)
 - [ ] Query → response pipeline < 3 seconds
 - [ ] Complex queries return structured answers with evidence
 - [ ] API routes tested and documented
 ---
 ## Phase 4: Admin Dashboard UI (Week 2–3)
 ### 4.1 AI Insights Page
 - [ ] **4.1.1** Create `/ops/ai-diagnostics/page.tsx`
  - [ ] Smart search bar (natural language input)
  - [ ] Suggested queries based on recent errors
  - [ ] Recent AI-generated insights list
  - [ ] Trending clusters (auto-detected anomalies)
 - [ ] **4.1.2** Query results view
  - [ ] AI-generated answer with confidence badge
  - [ ] Supporting evidence cards (cluster stats, sample errors)
  - [ ] Related debug sessions (linked traces)
  - [ ] Timeline visualization of error pattern
  - [ ] "Investigate further" actions
 ### 4.2 Cluster Detail with AI Analysis
 - [ ] **4.2.1** Enhance error cluster detail
  - [ ] AI-generated summary card ("This appears to be...")
  - [ ] Root cause hypothesis with confidence
  - [ ] Evidence breakdown (stack samples, device patterns, API failures)
  - [ ] Suggested fixes from similar resolved issues
  - [ ] "Request deeper analysis" button (GPT-4o)
 - [ ] **4.2.2** Interactive investigation
  - [ ] Compare with other clusters ("Show me similar issues")
  - [ ] Filter by context (OS version, app version, feature flags)
  - [ ] View affected user journeys (breadcrumb trails)
 ### 4.3 Proactive Alerts
 - [ ] **4.3.1** Anomaly detection
  - [ ] Auto-detect emerging error clusters
  - [ ] Spike in existing cluster frequency
  - [ ] New error types after releases
 - [ ] **4.3.2** AI-generated alerts
  - [ ] Slack/Teams notification with summary
  - [ ] "Investigate in AI Diagnostics" deep link
  - [ ] Auto-started debug session recommendations
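
The "spike in existing cluster frequency" check in 4.3.1 could start as a plain z-score test against recent history. This is only one possible baseline, with an assumed threshold of 3 standard deviations and an assumed input of per-interval occurrence counts; it does not cover emerging clusters or post-release error types.

```typescript
// Flag `current` as a spike when it exceeds the historical mean by more than
// zThreshold sample standard deviations.
export function isSpike(history: readonly number[], current: number, zThreshold = 3): boolean {
  // Too little history to estimate variability.
  if (history.length < 2) return false;

  let sum = 0;
  for (let i = 0; i < history.length; i++) sum += history[i];
  const mean = sum / history.length;

  let squaredDeviations = 0;
  for (let i = 0; i < history.length; i++) {
    squaredDeviations += (history[i] - mean) ** 2;
  }
  const std = Math.sqrt(squaredDeviations / (history.length - 1));

  // Perfectly flat history: any increase counts as a spike in this sketch.
  if (std === 0) return current > mean;
  return (current - mean) / std > zThreshold;
}
```

Low-volume clusters tend to produce noisy z-scores, so a minimum absolute count is likely needed before an alert is sent to Slack or Teams.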
 **Phase 4 Exit Criteria:**
 - [ ] Admin can ask questions and get AI-generated answers
 - [ ] Cluster detail shows AI analysis with evidence
 - [ ] Proactive alerts for emerging issues
 - [ ] Full test coverage (UI + API)
 ---
 ## Phase 5: Advanced Capabilities (Future)
 ### 5.1 Multi-Modal Analysis
 - [ ] Analyze screenshots from debug sessions for UI issues
 - [ ] Voice transcription analysis (for voice app errors)
 - [ ] Performance trace visualization with AI annotations
 ### 5.2 Predictive Diagnostics
 - [ ] Pre-crash pattern detection (warn before crash happens)
 - [ ] Resource exhaustion prediction (memory, disk, API quotas)
 - [ ] Config drift detection ("this setting combination often fails")
 ### 5.3 Self-Healing Suggestions
 - [ ] Auto-generated config recommendations
 - [ ] Feature flag rollback suggestions
 - [ ] Circuit breaker threshold recommendations
 ## Implementation Tracking
 | Phase | Task                       | Status | Commit |
 | ----- | -------------------------- | ------ | ------ |
 | 1.1   | Error clustering types     | ⬜     | —      |
 | 1.1   | Cosmos containers          | ⬜     | —      |
 | 1.1   | Error normalization        | ⬜     | —      |
 | 1.2   | Embedding pipeline         | ⬜     | —      |
 | 1.2   | Vector search setup        | ⬜     | —      |
 | 1.2   | Clustering algorithm       | ⬜     | —      |
 | 1.3   | Telemetry linking          | ⬜     | —      |
 | 1.3   | Error context enrichment   | ⬜     | —      |
 | 2.1   | Analysis prompts           | ⬜     | —      |
 | 2.1   | LLM integration            | ⬜     | —      |
 | 2.2   | Insight generation service | ⬜     | —      |
 | 2.2   | Analysis workflow          | ⬜     | —      |
 | 2.2   | Confidence scoring         | ⬜     | —      |
 | 2.3   | Feedback loop              | ⬜     | —      |
 | 2.3   | Pattern accumulation       | ⬜     | —      |
 | 3.1   | Query parser               | ⬜     | —      |
 | 3.1   | Query patterns             | ⬜     | —      |
 | 3.2   | Query execution            | ⬜     | —      |
 | 3.2   | Response generation        | ⬜     | —      |
 | 3.3   | REST API routes            | ⬜     | —      |
 | 4.1   | AI insights page           | ⬜     | —      |
 | 4.1   | Query results view         | ⬜     | —      |
 | 4.2   | Cluster detail             | ⬜     | —      |
 | 4.2   | Interactive investigation  | ⬜     | —      |
 | 4.3   | Proactive alerts           | ⬜     | —      |
 **Legend:** ⬜ Not started | 🟡 In progress | ✅ Complete | ⏸️ Deferred
 ---
 ## Quick Reference for Implementing Agent
 **📋 Full Roadmap:** `/Users/sd9235/code/mygh/learning_ai_common_plat/docs/roadmaps/AI_DIAGNOSTIC_ASSISTANT_ROADMAP.md`
 **Key Files to Modify/Create:**
 ```
 services/platform-service/
 ├── src/
 │   ├── modules/ai-diagnostics/
 │   │   ├── types.ts              # [1.1.1] Error clustering types
 │   │   ├── repository.ts         # [1.2] Data access layer
 │   │   ├── analyzer.ts           # [2.2] LLM analysis engine
 │   │   ├── query-parser.ts       # [3.1] NL query understanding
 │   │   ├── query-executor.ts     # [3.2] Query execution
 │   │   ├── routes.ts             # [3.3] REST API
 │   │   └── ai-diagnostics.test.ts # Tests
 │   ├── lib/
 │   │   ├── cosmos-init.ts        # [1.1.2] Add containers
 │   │   ├── embedding-client.ts   # [1.2.1] Azure OpenAI embeddings
 │   │   └── pii-redaction.ts      # Reuse existing
 │   └── server.ts                 # [3.3] Register routes
 dashboards/admin-web/
 ├── src/
 │   ├── app/(dashboard)/
 │   │   ├── ai-diagnostics/
 │   │   │   ├── page.tsx          # [4.1] Main insights page
 │   │   │   └── [id]/
 │   │   │       └── page.tsx      # [4.2] Cluster detail
 │   ├── lib/
 │   │   └── ai-diagnostics-client.ts # API client
 │   └── components/
 │       └── ai-diagnostics/       # Reusable components
 ```
 **Commit Message Format:**
 ```
 feat(ai-diagnostics): <description> [<task.code>]
 ```
 **Example:**
 ```bash
 git add services/platform-service/src/modules/ai-diagnostics/
 git commit -m "feat(ai-diagnostics): add error clustering types and cosmos containers [1.1.1-1.1.2]"
 ```
 **Testing Requirements:**
 - Unit tests: 20+ Vitest tests for clustering, embeddings, LLM responses
 - Integration tests: End-to-end query → analysis pipeline
 **Dependencies:**
 - Telemetry module (error events)
 - Azure OpenAI (embeddings + GPT-4o)
 - Existing diagnostics module (optional linking)
 ---
 ## Appendix A: Data Models
 ### ErrorClusterDoc
 ```typescript
 interface ErrorClusterDoc {
  id: string; // ec_<uuid>
  productId: string; // partition key
  fingerprintHash: string; // SHA-256 of normalized error
  // Cluster metadata
  firstSeenAt: string; // ISO 8601
  lastSeenAt: string;
  occurrenceCount: number; // Total occurrences
  uniqueUsers: number; // Affected user count
  // Error signature
  errorType: string; // Exception class/name
  messageTemplate: string; // Normalized message with placeholders
  stackSignature: string; // Normalized stack frames
  // Vector embedding for semantic search
  embedding: number[]; // 1536-dim from text-embedding-3-small
  embeddingVersion: string; // Model version for re-embedding
  // Context patterns (auto-extracted)
  commonContext: {
    osVersions: Array<{ version: string; count: number }>;
    appVersions: Array<{ version: string; count: number }>;
    deviceModels: Array<{ model: string; count: number }>;
    screenContexts: Array<{ screen: string; count: number }>;
  };
  // Related data
  relatedClusterIds: string[]; // Similar clusters (vector similarity)
  mergedIntoClusterId?: string; // If deduplicated
  // Resolution tracking
  status: 'active' | 'investigating' | 'resolved' | 'ignored';
  resolvedAt?: string;
  resolutionCommit?: string; // Link to fix
  // Timestamps
  createdAt: string;
  updatedAt: string;
  ttl: number; // 90 days
 }
 ```
 ### DiagnosticInsightDoc
 ```typescript
 interface DiagnosticInsightDoc {
  id: string; // di_<uuid>
  clusterId: string; // partition key (with productId)
  productId: string;
  // AI-generated analysis
  analysisType: 'root_cause' | 'pattern' | 'comparison' | 'trend';
  generatedAt: string;
  // LLM output
  rootCauseCategory: 'config' | 'dependency' | 'logic' | 'resource' | 'external' | 'unknown';
  hypothesis: string; // Natural language explanation
  reasoning: string; // Why LLM thinks this
  confidence: 'high' | 'medium' | 'low';
  confidenceScore: number; // 0.0–1.0
  // Evidence
  evidence: Array<{
    type:
      | 'stack_trace'
      | 'telemetry_pattern'
      | 'device_correlation'
      | 'api_failure'
      | 'similar_issue';
    description: string;
    strength: 'strong' | 'moderate' | 'weak';
    data: Record<string, unknown>;
  }>;
  // Suggested actions
  suggestedInvestigation: string[];
  potentialFixDirection?: string;
  similarResolvedIssues?: Array<{
    clusterId: string;
    resolution: string;
    confidence: number;
  }>;
  // Feedback
  feedbackStats: {
    helpful: number;
    notHelpful: number;
    engineerNotes: string[];
  };
  // LLM metadata
  modelUsed: string; // gpt-4o, gpt-4o-mini
  promptTokens: number;
  completionTokens: number;
  createdAt: string;
  ttl: number; // 90 days
 }
 ```
 ### NaturalLanguageQueryDoc
 ```typescript
 interface NaturalLanguageQueryDoc {
  id: string; // nq_<uuid>
  userId: string; // Admin who asked
  productId?: string; // Optional filter
  // Query
  rawQuery: string; // "Why did iOS keyboard crash yesterday?"
  parsedIntent: 'root_cause' | 'pattern_search' | 'comparison' | 'trend' | 'impact';
  extractedEntities: {
    products?: string[];
    timeRange?: { start: string; end: string };
    errorTypes?: string[];
    platforms?: string[];
    userSegments?: string[];
  };
  // Execution
  executedQuery: string; // Translated Cosmos query
  dataSources: string[]; // Clusters, telemetry, sessions accessed
  executionTimeMs: number;
  // Response
  aiResponse: string; // Generated answer
  confidence: number; // Overall confidence
  supportingData: Array<{
    type: 'cluster' | 'telemetry' | 'session';
    id: string;
    relevanceScore: number;
  }>;
  // Feedback
  userRating?: 'helpful' | 'not_helpful';
  userComment?: string;
  createdAt: string;
  ttl: number; // 30 days
 }
 ```
 ---
 ## Appendix B: API Reference
 | Method | Endpoint                                | Auth  | Description                             |
 | ------ | --------------------------------------- | ----- | --------------------------------------- |
 | POST   | `/ai-diagnostics/query`                 | Admin | Natural language diagnostic query       |
 | GET    | `/ai-diagnostics/clusters`              | Admin | List error clusters (with AI summaries) |
 | GET    | `/ai-diagnostics/clusters/:id`          | Admin | Cluster detail with AI analysis         |
 | POST   | `/ai-diagnostics/clusters/:id/analyze`  | Admin | Trigger fresh LLM analysis              |
 | GET    | `/ai-diagnostics/clusters/:id/analysis` | Admin | Get pre-computed insight                |
 | GET    | `/ai-diagnostics/suggestions`           | Admin | AI-suggested investigations             |
 | POST   | `/ai-diagnostics/feedback`              | Admin | Rate insight helpfulness                |
 | POST   | `/ai-diagnostics/search`                | Admin | Semantic search across errors           |
 ---
 ## Appendix C: Integration Points
 ### With Telemetry Module
 - Error events auto-create/update clusters
 - Telemetry context enriches error analysis
 - Correlation IDs link errors to user journeys
 ### With Diagnostics Module
 - Debug sessions linked to error clusters
 - Screenshots from sessions aid visual analysis
 - Network traces provide API failure context
 ### With Event Bus
 | Event                           | Action                                                    |
 | ------------------------------- | --------------------------------------------------------- |
 | `telemetry.error.ingested`      | Update/create cluster, trigger re-analysis if new pattern |
 | `diagnostics.session.completed` | Link session to related clusters, analyze captured logs   |
 | `diagnostics.ingest.fatal`      | High-priority cluster analysis, alert if novel pattern    |
 ---
 ## Appendix D: Cost Estimation
 | Component                | Monthly Cost (est.)             |
 | ------------------------ | ------------------------------- |
 | Azure OpenAI embeddings  | $50–100 (10K errors/day)        |
 | GPT-4o-mini analysis     | $100–200 (1K analyses/day)      |
 | GPT-4o deep analysis     | $50–100 (100 deep analyses/day) |
 | Cosmos DB vector storage | $20–50                          |
 | **Total**                | **$220–450/month**              |
 Optimization:
 - Cache frequent cluster analyses (24hr TTL)
 - Use GPT-4o-mini for 90% of queries
 - Batch embedding jobs during off-peak
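
The first optimization, caching cluster analyses for 24 hours, can be illustrated with a small in-process TTL cache. `TtlCache` is a hypothetical helper; a multi-instance deployment would more likely need a shared cache, or the `ttl` on stored insights, rather than per-process memory.

```typescript
const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

// Minimal in-memory cache whose entries expire after ttlMs.
export class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number = DEFAULT_TTL_MS, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      // Expired entries are removed lazily on read.
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

Keying the cache by cluster ID plus the cluster's `updatedAt` would avoid serving a stale analysis after new errors have changed the cluster.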
 ---
 ## Current Status
 - [ ] **Design complete** — Target: 2026-03-10
 - [ ] **Phase 1: Data Pipeline** — Not started
 - [ ] **Phase 2: LLM Engine** — Not started
 - [ ] **Phase 3: Query Interface** — Not started
 - [ ] **Phase 4: Admin UI** — Not started
 - [ ] **Phase 5: Advanced Capabilities** — Future
 **Estimated Timeline:** 2–3 weeks (Phases 1–4)
 **Dependencies:**
 - Telemetry module (must be collecting errors)
 - Diagnostics module (optional, for rich context)
 - Azure OpenAI deployment (embedding + GPT-4o access)
 ---
 _Last Updated: 2026-03-03_
--- a/docs/roadmaps/INTELLIGENT_AB_TESTING_ROADMAP.md
+++ b/docs/roadmaps/INTELLIGENT_AB_TESTING_ROADMAP.md
@@ -0,0 +1,719 @@
 # Intelligent A/B Testing — Implementation Roadmap
 > **Module:** `platform-service/src/modules/ab-testing/`  
 > **Admin UI:** `/ops/experiments/`  
 > **Target:** AI-powered experiment management with auto-allocation, early stopping, and hypothesis generation  
 > **Estimated Effort:** 2.5–3 weeks  
 > **Status:** 🟡 Planning
 ---
 ## Executive Summary
 This roadmap delivers an **intelligent A/B testing platform** that goes beyond traditional feature flags. Unlike manual percentage rollouts, this system uses **Thompson sampling** for automatic traffic allocation, **Bayesian early stopping** when a variant clearly wins or loses, and **LLM-powered hypothesis generation** from feature flag usage patterns.
 ### Key Differentiators vs. Static Feature Flags
 | Capability         | Static Flags (Current) | Intelligent A/B Testing                   |
 | ------------------ | ---------------------- | ----------------------------------------- |
 | Traffic Allocation | Manual percentage      | **Multi-armed bandit optimization**       |
 | Stopping Decision  | Manual monitoring      | **Auto-stop at statistical significance** |
 | Winner Selection   | Human judgment         | **Bayesian probability of superiority**   |
 | Test Duration      | Fixed (often wrong)    | **Dynamic based on effect size**          |
 | Hypothesis         | Human-written          | **AI-generated from usage patterns**      |
 | Sample Size        | Guesswork              | **Power analysis + sequential testing**   |
 ---
 ## Phase 1: Core Experiment Engine (Week 1)
 ### 1.1 Data Model & Schemas
 - [ ] **1.1.1** Create `modules/ab-testing/types.ts`
  - [ ] `ExperimentDoc` — experiment definition and config
  - [ ] `VariantDoc` — variant metadata + metrics
  - [ ] `AssignmentDoc` — user → variant assignments
  - [ ] `MetricDoc` — event types being tracked
  - [ ] `ExperimentResult` — statistical analysis results
  - [ ] Zod schemas for all inputs
 - [ ] **1.1.2** Add Cosmos containers to `cosmos-init.ts`
  - [ ] `experiments` (pk: `/productId`, TTL: 2 years for completed)
  - [ ] `experiment_variants` (pk: `/experimentId`)
  - [ ] `experiment_assignments` (pk: `/userId`, query by experiment)
  - [ ] `experiment_events` (pk: `/experimentId` + `/timestamp` for time-series)
  - [ ] `experiment_metrics` (pk: `/experimentId`, computed aggregates)
 ### 1.2 Assignment & Bucketing
 - [ ] **1.2.1** Create deterministic bucketing
  - [ ] Consistent hashing (userId + experimentId → variant)
  - [ ] FNV-1a hash algorithm (same as feature flags)
  - [ ] Sticky assignments (user always sees same variant)
  - [ ] Override capability (force specific variant for QA)
 - [ ] **1.2.2** Assignment strategies
  - [ ] `random` — Simple randomization (control vs static)
  - [ ] `thompson` — Thompson sampling (multi-armed bandit)
  - [ ] `epsilon_greedy` — Epsilon-greedy exploration
  - [ ] `ucb` — Upper Confidence Bound algorithm
 - [ ] **1.2.3** Audience targeting
  - [ ] User property filters (platform, version, region, subscription tier)
  - [ ] Percentage rollout within target segment
  - [ ] Exclusion lists (beta users, internal accounts)
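
The deterministic bucketing in 1.2.1 can be sketched with a 32-bit FNV-1a hash. The hash itself follows the published FNV-1a constants. The key format `experimentId:userId`, the `VariantWeight` shape, and the weighted walk are assumptions, and may not match how the existing feature flag module builds its hash input.

```typescript
// 32-bit FNV-1a over the UTF-8 bytes of the input.
export function fnv1a32(input: string): number {
  const bytes = new TextEncoder().encode(input);
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i];
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force unsigned
}

// Hypothetical shape; weights are relative and need not sum to 1.
interface VariantWeight {
  key: string;
  weight: number;
}

// The same (userId, experimentId) pair always maps to the same point in
// [0, total), so the assignment is stable while the weights stay unchanged.
export function assignVariant(
  userId: string,
  experimentId: string,
  variants: readonly VariantWeight[]
): string {
  if (variants.length === 0) throw new Error("At least one variant is required");
  let total = 0;
  for (let i = 0; i < variants.length; i++) total += variants[i].weight;
  if (total <= 0) throw new Error("Total weight must be positive");

  const point = (fnv1a32(`${experimentId}:${userId}`) / 2 ** 32) * total;
  let cumulative = 0;
  for (let i = 0; i < variants.length; i++) {
    cumulative += variants[i].weight;
    if (point < cumulative) return variants[i].key;
  }
  // Guard against floating-point rounding at the upper edge.
  return variants[variants.length - 1].key;
}
```

Hashing alone is only sticky while weights are fixed. Once an adaptive strategy changes the allocation, stickiness depends on looking up the stored assignment first and hashing only for users who have none.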
 ### 1.3 Event Tracking Pipeline
 - [ ] **1.3.1** Metric definitions
  - [ ] `conversion` — Binary (did/didn't convert)
  - [ ] `count` — Integer events (sessions, messages)
  - [ ] `duration` — Time-based (session length, task time)
  - [ ] `revenue` — Monetary (purchase amount, LTV)
  - [ ] `custom` — Arbitrary numeric values
 - [ ] **1.3.2** Event ingestion
  - [ ] `POST /ab-testing/events` batch endpoint
  - [ ] Client SDK: `track(event, value, metadata)`
  - [ ] Automatic attribution (which variant caused this event)
  - [ ] Deduplication (eventId + userId uniqueness)
 **Phase 1 Exit Criteria:**
 - [ ] Experiments created with multiple variants
 - [ ] Users consistently assigned to variants
 - [ ] Events tracked and attributed correctly
 - [ ] 20+ tests for assignment and ingestion
 ---
 ## Phase 2: Statistical Analysis Engine (Week 1–2)
 ### 2.1 Bayesian Inference
 - [ ] **2.1.1** Create `modules/ab-testing/statistics.ts`
  - [ ] `BetaDistribution` for conversion rates
  - [ ] `GammaDistribution` for count/duration metrics
  - [ ] `NormalDistribution` for continuous metrics
  - [ ] Monte Carlo simulation (10,000 samples)
 - [ ] **2.1.2** Probability calculations
  - [ ] `probabilityVariantBeatsControl(variant, control)`
  - [ ] `expectedLossIfChosen(variant)`
  - [ ] `probabilityBeatAllVariants(variant)`
 - [ ] **2.1.3** Credible intervals
  - [ ] 95% credible interval for each variant's true metric
  - [ ] Visualization-ready (lower, mean, upper bounds)
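
For conversion metrics, the quantities in 2.1.1 and 2.1.2 can be written down compactly. The notation below is an assumption for illustration: a $\mathrm{Beta}(\alpha_0, \beta_0)$ prior, $s_k$ conversions among $n_k$ users in variant $k$, and $N = 10{,}000$ joint posterior draws $\theta_k^{(i)}$. The first line is the standard Beta-Binomial posterior, the second is the Monte Carlo estimate behind `probabilityVariantBeatsControl`, and the third is one common definition of the expected loss used by `expectedLossIfChosen`.

```latex
\theta_k \mid \text{data} \sim \mathrm{Beta}\left(\alpha_0 + s_k,\; \beta_0 + n_k - s_k\right)

P(\theta_V > \theta_C) \approx \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\left[\theta_V^{(i)} > \theta_C^{(i)}\right]

\mathbb{E}\left[\mathrm{loss}_V\right] \approx \frac{1}{N} \sum_{i=1}^{N} \left( \max_{j} \theta_j^{(i)} - \theta_V^{(i)} \right)
```

The 95% credible interval for a variant is then the 2.5th and 97.5th percentiles of its posterior draws.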
 ### 2.2 Early Stopping Rules
 - [ ] **2.2.1** Stopping criteria
  - [ ] **Winner found:** Variant has > 95% probability of beating control
  - [ ] **Loser clear:** Control has > 95% probability of beating variant
  - [ ] **Practical significance:** Minimum detectable effect not reached
  - [ ] **Time bound:** Max duration reached (safety limit)
 - [ ] **2.2.2** Auto-promotion
  - [ ] Auto-rollout winner to 100% when threshold hit
  - [ ] Notify admins via Slack/email
  - [ ] Create audit log entry
 - [ ] **2.2.3** Guardrails
  - [ ] Minimum sample size before early stopping (100 users/variant)
  - [ ] Business hours only for auto-actions
  - [ ] Require approval for revenue-impacting experiments
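
The stopping criteria and the sample-size guardrail above can be combined into a single decision function. The sketch below is a simplified reading: it checks the time bound first, then the minimum sample size, then the 95% thresholds, and it omits the practical-significance criterion, the business-hours rule, and the approval workflow. The input shape and decision names are hypothetical.

```typescript
type StopDecision = "continue" | "stop_winner" | "stop_loser" | "stop_time_bound";

// Hypothetical inputs, produced by the statistics module and experiment config.
interface StoppingInputs {
  probVariantBeatsControl: number; // posterior probability, 0 to 1
  smallestVariantSampleSize: number; // users in the least-populated variant
  elapsedDays: number;
  maxDays: number; // safety limit on duration
}

export function evaluateStopping(
  inputs: StoppingInputs,
  threshold = 0.95,
  minSampleSize = 100
): StopDecision {
  // The time bound applies regardless of how much data has arrived.
  if (inputs.elapsedDays >= inputs.maxDays) return "stop_time_bound";

  // Guardrail: never stop early on too little data.
  if (inputs.smallestVariantSampleSize < minSampleSize) return "continue";

  if (inputs.probVariantBeatsControl > threshold) return "stop_winner";

  // For continuous posteriors, P(control beats variant) = 1 - P(variant beats control).
  if (1 - inputs.probVariantBeatsControl > threshold) return "stop_loser";

  return "continue";
}
```

Checking a fixed threshold repeatedly as data arrives raises the chance of stopping on noise, which is one reason the minimum sample size guardrail matters.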
 ### 2.3 Thompson Sampling
 - [ ] **2.3.1** Multi-armed bandit implementation
  - [ ] Sample from posterior distributions
  - [ ] Assign user to variant with highest sample
  - [ ] Re-balance traffic every hour based on performance
 - [ ] **2.3.2** Exploration vs exploitation
  - [ ] Exploration rate decays over time
  - [ ] High uncertainty = more exploration
  - [ ] Clear winner = more traffic to winner
 - [ ] **2.3.3** Regret minimization
  - [ ] Track cumulative regret vs optimal variant
  - [ ] Regret bounds reporting
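
For conversion metrics, the core of 2.3.1 (sample from each posterior, pick the highest draw) might look like the sketch below. It assumes a uniform Beta(1, 1) prior and draws Beta samples from two Gamma draws using the Marsaglia-Tsang method. The `ArmStats` shape and function names are hypothetical, and the random source is injectable so runs can be made reproducible.

```typescript
type Rng = () => number; // returns a value in [0, 1)

// Standard normal draw via the Box-Muller transform.
function sampleNormal(rng: Rng): number {
  const u1 = 1 - rng(); // in (0, 1], avoids log(0)
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Gamma(shape, 1) draw via Marsaglia-Tsang; this sketch supports shape >= 1 only.
function sampleGamma(shape: number, rng: Rng): number {
  if (shape < 1) throw new RangeError("This sketch only supports shape >= 1");
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    const x = sampleNormal(rng);
    const v = (1 + c * x) ** 3;
    if (v <= 0) continue;
    const u = 1 - rng(); // in (0, 1]
    if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v;
  }
}

// Beta(a, b) draw as X / (X + Y) with X ~ Gamma(a), Y ~ Gamma(b).
export function sampleBeta(a: number, b: number, rng: Rng): number {
  const x = sampleGamma(a, rng);
  const y = sampleGamma(b, rng);
  return x / (x + y);
}

// Hypothetical per-variant counters.
export interface ArmStats {
  key: string;
  conversions: number;
  users: number;
}

// Draw once from each arm's posterior and return the arm with the highest draw.
export function chooseVariant(arms: readonly ArmStats[], rng: Rng = Math.random): string {
  if (arms.length === 0) throw new Error("At least one arm is required");
  let bestKey = arms[0].key;
  let bestSample = -1;
  for (let i = 0; i < arms.length; i++) {
    const arm = arms[i];
    // Uniform Beta(1, 1) prior updated with observed successes and failures.
    const sample = sampleBeta(1 + arm.conversions, 1 + arm.users - arm.conversions, rng);
    if (sample > bestSample) {
      bestSample = sample;
      bestKey = arm.key;
    }
  }
  return bestKey;
}
```

For the hourly re-balance, one option is to call `chooseVariant` a few thousand times and use the fraction of wins per arm as the next hour's traffic weights, which keeps per-request assignment cheap.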
 **Phase 2 Exit Criteria:**
 - [ ] Bayesian probabilities calculated correctly
 - [ ] Early stopping triggers at appropriate thresholds
 - [ ] Thompson sampling re-allocates traffic dynamically
 - [ ] Statistical tests validate correctness
 ---
 ## Phase 3: AI-Powered Hypothesis Generation (Week 2)
 ### 3.1 Pattern Detection
 - [ ] **3.1.1** Usage pattern analysis
  - [ ] Analyze feature flag usage telemetry
  - [ ] Segment analysis (iOS vs Android, free vs pro)
  - [ ] Temporal patterns (day of week, time of day)
  - [ ] User behavior sequences (funnel analysis)
 - [ ] **3.1.2** Anomaly detection
  - [ ] Unexpected drop in feature adoption
  - [ ] Performance regression signals
  - [ ] User segment showing different behavior
 - [ ] **3.1.3** Opportunity identification
  - [ ] Underperforming features (low adoption)
  - [ ] High-dropoff flows
  - [ ] Competitor feature gaps
 ### 3.2 Hypothesis Generation
 - [ ] **3.2.1** LLM hypothesis prompts
  ```
  Given this feature usage data:
  - Feature: {featureName}
  - Current adoption: {adoptionRate}% (baseline: {baseline}%)
  - Segment performance: {segmentData}
  - User feedback: {feedbackSamples}
  - Competitor analysis: {competitorFeatures}
  Generate experiment hypotheses:
  1. Primary hypothesis: "Changing X will improve Y because..."
  2. Secondary hypotheses (2-3 alternatives)
  3. Expected effect size (conservative estimate)
  4. Success metric recommendation
  5. Risk assessment
  ```
 - [ ] **3.2.2** Hypothesis ranking
  - [ ] Expected impact scoring
  - [ ] Implementation difficulty estimate
  - [ ] Statistical power prediction
  - [ ] Risk-adjusted expected value
 - [ ] **3.2.3** Suggested experiment design
  - [ ] Variant count recommendation
  - [ ] Traffic allocation suggestion
  - [ ] Duration estimate
  - [ ] Required sample size calculation
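
For the required sample size, a common starting point is the classical two-proportion formula. The sketch below hard-codes a two-sided significance level of 0.05 and 80% power, both assumed defaults, and takes the minimum detectable effect as an absolute change in conversion rate. It is a frequentist planning estimate, so it gives a rough duration forecast rather than a stopping rule for the Bayesian engine.

```typescript
const Z_ALPHA_TWO_SIDED_05 = 1.96; // z for two-sided alpha = 0.05
const Z_POWER_80 = 0.8416; // z for power = 0.80

// Users needed per variant to detect an absolute change of `minDetectableLift`
// from `baselineRate`, using the two-proportion normal approximation.
export function sampleSizePerVariant(baselineRate: number, minDetectableLift: number): number {
  const p1 = baselineRate;
  const p2 = baselineRate + minDetectableLift;
  if (!(p1 > 0 && p1 < 1 && p2 > 0 && p2 < 1) || minDetectableLift === 0) {
    throw new RangeError("Rates must lie strictly between 0 and 1, with a non-zero lift");
  }
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  const z = Z_ALPHA_TWO_SIDED_05 + Z_POWER_80;
  return Math.ceil((z * z * variance) / (p2 - p1) ** 2);
}
```

Dividing the result by the expected daily traffic per variant gives the duration estimate, and halving the detectable lift roughly quadruples the required sample.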
 ### 3.3 Auto-Experiment Suggestions
 - [ ] **3.3.1** Weekly AI reports
  - [ ] Top 5 experiment opportunities
  - [ ] Hypotheses with supporting evidence
  - [ ] Prioritized by expected impact
 - [ ] **3.3.2** One-click experiment creation
  - [ ] Pre-fill experiment from hypothesis
  - [ ] Suggested variants with descriptions
  - [ ] Pre-configured metrics
 **Phase 3 Exit Criteria:**
 - [ ] AI generates meaningful hypotheses from usage data
 - [ ] Hypothesis quality rated by product team (80%+ useful)
 - [ ] Auto-suggested experiments created in 1 click
 - [ ] Weekly reports generated automatically
 ---
 ## Phase 4: Admin Dashboard UI (Week 2–3)
 ### 4.1 Experiments List Page
 - [ ] **4.1.1** Create `/ops/experiments/page.tsx`
  - [ ] Experiment cards (status, duration, sample size)
  - [ ] Quick filters (running, completed, draft)
  - [ ] AI-generated hypothesis badge
  - [ ] Health indicators (traffic balance, event flow)
 - [ ] **4.1.2** Experiment creation wizard
  - [ ] Step 1: Define hypothesis (AI suggestions available)
  - [ ] Step 2: Create variants (name, description, config)
  - [ ] Step 3: Select metrics (primary + secondary)
  - [ ] Step 4: Audience targeting
  - [ ] Step 5: Traffic allocation (manual or Thompson)
  - [ ] Step 6: Review and launch
 ### 4.2 Live Experiment Dashboard
 - [ ] **4.2.1** Create `/ops/experiments/[id]/page.tsx`
  - [ ] Real-time metrics comparison
  - [ ] Variant performance table (conversions, counts, durations)
  - [ ] Bayesian probability visualization
  - [ ] Credible interval charts
 - [ ] **4.2.2** Statistical summary card
  - [ ] Probability of beating control (per variant)
  - [ ] Expected lift if implemented
  - [ ] Sample size progress bar
  - [ ] Days to significance estimate
 - [ ] **4.2.3** Action buttons
  - [ ] Adjust traffic allocation
  - [ ] Pause/resume experiment
  - [ ] Stop and declare winner
  - [ ] Rollout winner to 100%
  - [ ] Archive experiment
 ### 4.3 Results & Reporting
 - [ ] **4.3.1** Results page
  - [ ] Final statistical summary
  - [ ] Variant comparison visualization
  - [ ] Segment breakdown (iOS vs Android, etc.)
  - [ ] Confidence intervals over time
 - [ ] **4.3.2** AI insights panel
  - [ ] Why this result occurred (LLM summary)
  - [ ] Unexpected findings
  - [ ] Follow-up experiment suggestions
 - [ ] **4.3.3** Export capabilities
  - [ ] CSV export of raw data
  - [ ] PDF report generation
  - [ ] API endpoint for data warehouse sync
 **Phase 4 Exit Criteria:**
 - [ ] Full experiment lifecycle manageable in UI
 - [ ] Real-time stats visible and accurate
 - [ ] Bayesian visualizations clear to non-statisticians
 - [ ] Export and reporting functional
 ---
 ## Phase 5: Advanced Capabilities (Future)
 ### 5.1 Multi-Variate Testing
 - [ ] Test multiple variables simultaneously
 - [ ] Full factorial and fractional factorial designs
 - [ ] Interaction effect detection
 ### 5.2 Sequential Experimentation
 - [ ] Multi-phase experiments (qualification → main → validation)
 - [ ] Holdout groups for long-term validation
 - [ ] Global holdout (never-exposed users)
 ### 5.3 Personalization Layer
 - [ ] Contextual bandits (different variants for different users)
 - [ ] ML model for variant selection
 - [ ] Automatic personalization optimization
 ### 5.4 Experiment Coordination
 - [ ] Mutually exclusive experiments
 - [ ] Experiment priority rules
 - [ ] Layered experimentation (orthogonal tests)
 ---
 ## Appendix A: Data Models
 ### ExperimentDoc
 ```typescript
 interface ExperimentDoc {
  id: string; // exp_<uuid>
  productId: string; // partition key
  // Experiment definition
  name: string;
  description: string;
  hypothesis: string;
  aiGeneratedHypothesis?: boolean; // Flag for AI-suggested
  // Status lifecycle: draft → running → paused | stopped | completed
  status: 'draft' | 'running' | 'paused' | 'stopped' | 'completed';
  // Variants
  controlVariantId: string; // Baseline variant
  variantIds: string[]; // All variant IDs
  // Configuration
  allocationStrategy: 'random' | 'thompson' | 'epsilon_greedy' | 'ucb';
  targetPercent: number; // % of eligible traffic
  // Audience targeting
  targeting: {
    platforms?: string[]; // ios, android, web
    appVersions?: { min: string; max?: string };
    regions?: string[];
    userSegments?: string[]; // pro, free, enterprise
    userProperties?: Record<string, string | number | boolean>;
  };
  // Metrics
  primaryMetric: {
    name: string;
    type: 'conversion' | 'count' | 'duration' | 'revenue' | 'custom';
    eventName: string; // Telemetry event to track
    aggregation: 'sum' | 'mean' | 'count' | 'unique';
    direction: 'increase' | 'decrease'; // Is higher better?
    minimumDetectableEffect: number; // % change we want to detect
  };
  secondaryMetrics: Array<{
    name: string;
    type: 'conversion' | 'count' | 'duration' | 'revenue' | 'custom';
    eventName: string;
  }>;
  // Guardrails
  guardrails: {
    minSampleSizePerVariant: number; // Default: 100
    maxDurationDays: number; // Safety limit, default: 30
    autoStopEnabled: boolean;
    winnerThreshold: number; // % probability to auto-stop, default: 95
    requireApprovalFor: 'none' | 'revenue' | 'all';
  };
  // Scheduling
  startAt?: string; // Scheduled start (ISO 8601)
  endAt?: string; // Scheduled end or actual stop
  // Stats (denormalized for fast reads)
  totalParticipants: number;
  totalEvents: number;
  // Timestamps
  createdAt: string;
  updatedAt: string;
  startedAt?: string;
  completedAt?: string;
  ttl: number; // 2 years for completed
 }
 ```
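The status lifecycle in `ExperimentDoc` could be enforced with a small transition table. This is a sketch under assumptions: the table itself is not specified in the roadmap, paused experiments are assumed to be resumable (matching the pause/resume action in Phase 4), and `canTransition` and `applyTransition` are hypothetical helper names.

```typescript
type ExperimentStatus = 'draft' | 'running' | 'paused' | 'stopped' | 'completed';

// Assumed transition table (not specified in the roadmap): paused experiments can
// resume, and 'stopped' / 'completed' are treated as terminal states.
const ALLOWED_TRANSITIONS: Record<ExperimentStatus, readonly ExperimentStatus[]> = {
  draft: ['running'],
  running: ['paused', 'stopped', 'completed'],
  paused: ['running', 'stopped', 'completed'],
  stopped: [],
  completed: [],
};

function canTransition(from: ExperimentStatus, to: ExperimentStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

interface StatusFields {
  status: ExperimentStatus;
  updatedAt: string; // ISO 8601
}

// Returns an updated copy; throws on a transition the table does not allow.
function applyTransition(
  doc: StatusFields,
  to: ExperimentStatus,
  now: Date = new Date(),
): StatusFields {
  if (!canTransition(doc.status, to)) {
    throw new Error(`Invalid status transition: ${doc.status} -> ${to}`);
  }
  return { ...doc, status: to, updatedAt: now.toISOString() };
}
```

Keeping the table in one place means the REST routes and the dashboard action buttons can share the same validation.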
 ### VariantDoc
 ```typescript
 interface VariantDoc {
  id: string; // var_<uuid>
  experimentId: string; // partition key
  // Variant definition
  name: string; // "Control", "New Button Color", etc.
  description?: string;
  isControl: boolean;
  // Feature flag configuration
  flagConfig: Record<string, unknown>; // Arbitrary config payload
  // Traffic allocation (dynamic for bandit strategies)
  currentAllocationPercent: number; // 0–100%
  // Statistics (real-time computed)
  stats: {
    participants: number;
    events: number;
    // Primary metric
    primaryMetricValue: number; // Mean or conversion rate
    primaryMetricStdDev?: number;
    // For conversion metrics
    conversions?: number;
    conversionRate?: number; // 0–1
    // Bayesian posterior parameters
    betaAlpha?: number; // For Beta distribution
    betaBeta?: number;
    gammaShape?: number; // For Gamma distribution
    gammaScale?: number;
  };
  // Bayesian results
  bayesianResults?: {
    probabilityBeatsControl: number; // 0–1
    probabilityBeatsAll: number; // 0–1
    expectedLiftPercent: number; // Relative to control
    expectedLoss: number; // Risk of choosing this variant
    credibleInterval: {
      lower: number;
      mean: number;
      upper: number;
    };
  };
  createdAt: string;
  updatedAt: string;
 }
 ```
 ### ExperimentAssignmentDoc
 ```typescript
 interface ExperimentAssignmentDoc {
  id: string; // ea_<uuid>
  userId: string; // partition key (for user lookups)
  experimentId: string;
  variantId: string;
  // Assignment metadata
  assignedAt: string; // First assignment
  firstExposedAt?: string; // First actual exposure (feature use)
  // Context at assignment
  assignmentContext: {
    platform: string;
    appVersion: string;
    osVersion: string;
    deviceModel?: string;
    region?: string;
  };
  // Events attributed to this assignment
  eventCount: number;
  lastEventAt?: string;
  // TTL: Remove after experiment completes + analysis period
  ttl: number; // experimentEnd + 90 days
 }
 ```
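Sticky assignment is usually achieved by hashing a stable key so the same user always lands in the same bucket. The sketch below uses 32-bit FNV-1a, which the file list names for `bucketing.ts`. The key format (`experimentId:userId`), the 10,000-bucket resolution, and the `VariantAllocation` shape are assumptions for illustration, and IDs are assumed to be ASCII.

```typescript
// 32-bit FNV-1a over the string's char codes. Assumes ASCII IDs such as
// "exp_<uuid>", so each char code fits in one byte.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i) & 0xff;
    // Multiply by the FNV prime 16777619 = 2^24 + 2^8 + 2^7 + 2^4 + 2^1 + 1
    // using shifts, then reduce modulo 2^32 with the unsigned shift.
    hash =
      (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return hash >>> 0;
}

interface VariantAllocation {
  variantId: string;
  allocationPercent: number; // 0 to 100; all variants should sum to 100
}

// Deterministic: the same (experimentId, userId) pair always yields the same variant
// as long as the allocation list is unchanged.
function assignVariant(
  experimentId: string,
  userId: string,
  variants: VariantAllocation[],
): string {
  if (variants.length === 0) {
    throw new Error('at least one variant is required');
  }
  const bucket = fnv1a32(experimentId + ':' + userId) % 10000; // 0.01% resolution
  let cumulative = 0;
  let chosen = '';
  for (const variant of variants) {
    chosen = variant.variantId; // falls back to the last variant on rounding gaps
    cumulative += variant.allocationPercent * 100;
    if (bucket < cumulative) break;
  }
  return chosen;
}
```

Because bandit strategies change `currentAllocationPercent` over time, a stored `ExperimentAssignmentDoc` would still be needed to keep earlier assignments sticky; the hash alone only guarantees stability for a fixed allocation.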
 ### ExperimentEventDoc
 ```typescript
 interface ExperimentEventDoc {
  id: string; // ee_<uuid>
  experimentId: string; // partition key
  timestamp: string; // Sort key for time-series queries
  // Attribution
  userId: string;
  variantId: string;
  assignmentId: string;
  // Event details
  metricName: string;
  metricType: 'conversion' | 'count' | 'duration' | 'revenue' | 'custom';
  value: number; // Numeric value
  // Conversion tracking (for binary metrics)
  converted?: boolean; // Set only for conversion metrics
  // Context
  eventMetadata?: Record<string, unknown>;
  // Denormalized for filtering
  platform: string;
  appVersion: string;
  // TTL: Shorter for raw events
  ttl: number; // 90 days
 }
 ```
 ---
 ## Implementation Tracking
 | Phase | Task                          | Status | Commit |
 | ----- | ----------------------------- | ------ | ------ |
 | 1.1   | Experiment types & schemas    | ⬜     | —      |
 | 1.1   | Cosmos containers             | ⬜     | —      |
 | 1.2   | Deterministic bucketing       | ⬜     | —      |
 | 1.2   | Assignment strategies         | ⬜     | —      |
 | 1.2   | Audience targeting            | ⬜     | —      |
 | 1.3   | Metric definitions            | ⬜     | —      |
 | 1.3   | Event ingestion               | ⬜     | —      |
 | 2.1   | Bayesian inference engine     | ⬜     | —      |
 | 2.1   | Probability calculations      | ⬜     | —      |
 | 2.1   | Credible intervals            | ⬜     | —      |
 | 2.2   | Early stopping rules          | ⬜     | —      |
 | 2.2   | Auto-promotion                | ⬜     | —      |
 | 2.2   | Guardrails                    | ⬜     | —      |
 | 2.3   | Thompson sampling             | ⬜     | —      |
 | 2.3   | Exploration vs exploitation   | ⬜     | —      |
 | 2.3   | Regret minimization           | ⬜     | —      |
 | 3.1   | Pattern detection             | ⬜     | —      |
 | 3.1   | Anomaly detection             | ⬜     | —      |
 | 3.2   | Hypothesis generation prompts | ⬜     | —      |
 | 3.2   | Hypothesis ranking            | ⬜     | —      |
 | 3.3   | Auto-experiment suggestions   | ⬜     | —      |
 | 4.1   | Experiments list page         | ⬜     | —      |
 | 4.1   | Creation wizard               | ⬜     | —      |
 | 4.2   | Live dashboard                | ⬜     | —      |
 | 4.2   | Statistical summary           | ⬜     | —      |
 | 4.3   | Results & reporting           | ⬜     | —      |
 | 4.3   | AI insights panel             | ⬜     | —      |
 **Legend:** ⬜ Not started | 🟡 In progress | ✅ Complete | ⏸️ Deferred
 ---
 ## Quick Reference for Implementing Agent
 **📋 Full Roadmap:** `docs/roadmaps/INTELLIGENT_AB_TESTING_ROADMAP.md`
 **Key Files to Modify/Create:**
 ```
 services/platform-service/
 ├── src/
 │   ├── modules/ab-testing/
 │   │   ├── types.ts              # [1.1] Experiment, Variant, Assignment types
 │   │   ├── repository.ts         # [1.2] Data access layer
 │   │   ├── bucketing.ts          # [1.2] FNV-1a hash, sticky assignments
 │   │   ├── statistics.ts         # [2.1] Bayesian inference, Beta/Normal distributions
 │   │   ├── allocation.ts         # [2.3] Thompson sampling, bandit strategies
 │   │   ├── hypothesis-generator.ts # [3.2] LLM pattern analysis
 │   │   ├── routes.ts             # [4] REST API
 │   │   └── ab-testing.test.ts    # Tests
 │   ├── lib/
 │   │   └── cosmos-init.ts        # [1.1] Add containers
 │   └── server.ts                 # Register routes
 dashboards/admin-web/
 ├── src/
 │   ├── app/(dashboard)/
 │   │   ├── experiments/
 │   │   │   ├── page.tsx          # [4.1] Experiments list
 │   │   │   ├── new/page.tsx      # [4.1] Creation wizard
 │   │   │   └── [id]/
 │   │   │       └── page.tsx      # [4.2] Live dashboard
 │   ├── lib/
 │   │   └── experiments-client.ts # API client
 │   └── components/
 │       └── experiments/          # Bayesian charts, variant cards
 ```
 **Commit Message Format:**
 ```
 feat(ab-testing): <description> [<task.code>]
 ```
 **Example:**
 ```bash
 git add services/platform-service/src/modules/ab-testing/
 git commit -m "feat(ab-testing): add experiment types and cosmos containers [1.1]"
 ```
 **Testing Requirements:**
 - Unit tests: 25+ Vitest tests for bucketing, statistics, bandit algorithms
 - Statistical validation: A/A tests, known distribution tests
 - Integration: End-to-end experiment lifecycle
 **Dependencies:**
 - Feature flags module (reuse bucketing logic)
 - Telemetry module (event tracking)
 - Azure OpenAI (hypothesis generation)
 ---
 ## Appendix B: Statistical Methods
 ### Bayesian A/B Testing
 **Conversion Metrics (Beta-Binomial):**
 ```
 Posterior: Beta(α + conversions, β + non-conversions)
 Where α = β = 1 (uniform prior)
 Probability variant beats control:
 P(variant > control) = ∫(x=0 to 1) [BetaCDF_control(x)] * [BetaPDF_variant(x)] dx
 ```
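For conversion metrics, the probability that a variant beats control can be computed without sampling when the posterior parameters are integers, which holds for a uniform prior plus whole-number counts. The sketch below uses a known closed-form sum for two Beta posteriors; the function names are hypothetical, and the direct log-factorial loop is chosen for clarity rather than speed.

```typescript
interface BetaPosterior {
  alpha: number; // 1 + conversions
  beta: number; // 1 + non-conversions
}

// Beta(1 + conversions, 1 + non-conversions), i.e. a uniform Beta(1, 1) prior.
function betaPosterior(conversions: number, participants: number): BetaPosterior {
  return { alpha: 1 + conversions, beta: 1 + participants - conversions };
}

// ln Gamma(n) = ln((n - 1)!) for positive integers, summed directly (sketch only).
function lnGammaInt(n: number): number {
  if (n < 1 || Math.floor(n) !== n) {
    throw new RangeError('expected a positive integer');
  }
  let sum = 0;
  for (let k = 2; k < n; k++) sum += Math.log(k);
  return sum;
}

// ln of the Beta function B(a, b) for positive integers a and b.
function lnBeta(a: number, b: number): number {
  return lnGammaInt(a) + lnGammaInt(b) - lnGammaInt(a + b);
}

// P(variant rate > control rate) for integer-parameter Beta posteriors:
// sum over i in [0, alpha_v) of
//   B(alpha_c + i, beta_c + beta_v) / ((beta_v + i) * B(1 + i, beta_v) * B(alpha_c, beta_c))
function probabilityBeatsControl(variant: BetaPosterior, control: BetaPosterior): number {
  const lnControlNorm = lnBeta(control.alpha, control.beta);
  let total = 0;
  for (let i = 0; i < variant.alpha; i++) {
    total += Math.exp(
      lnBeta(control.alpha + i, control.beta + variant.beta) -
        Math.log(variant.beta + i) -
        lnBeta(1 + i, variant.beta) -
        lnControlNorm,
    );
  }
  return total;
}
```

Working in log space keeps the intermediate Beta-function values from underflowing at realistic sample sizes. For non-integer priors or continuous metrics, Monte Carlo sampling from both posteriors remains the more general option.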
 **Continuous Metrics (Normal):**
 ```
 Posterior: Normal(μ_n, σ_n²)
 Where μ_n, σ_n updated via conjugate prior
 Probability variant beats control via Monte Carlo sampling
 ```
 ### Thompson Sampling
 ```
 For each incoming user:
  For each variant:
    Sample θ_i from variant's posterior distribution
  Assign user to variant with max(θ_i)
 Update variant's posterior after observing outcome
 ```
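The Thompson sampling loop above can be sketched for conversion metrics as follows. This assumes Beta posteriors with a uniform prior and integer counts, so a Beta draw can be built from sums of exponential draws; the `VariantPosterior` shape, the injectable `rng` parameter, and the function names are assumptions for illustration.

```typescript
type Rng = () => number; // returns a uniform value in [0, 1)

// Gamma(shape = k, scale = 1) for integer k, as a sum of k exponential draws.
function sampleGammaInt(k: number, rng: Rng): number {
  let sum = 0;
  for (let i = 0; i < k; i++) sum -= Math.log(1 - rng());
  return sum;
}

// Beta(a, b) for positive integers a and b via two Gamma draws: X / (X + Y).
function sampleBetaInt(a: number, b: number, rng: Rng): number {
  const x = sampleGammaInt(a, rng);
  const y = sampleGammaInt(b, rng);
  return x / (x + y);
}

interface VariantPosterior {
  variantId: string;
  conversions: number;
  participants: number;
}

// One assignment step: sample theta per variant, pick the variant with the largest draw.
function thompsonAssign(variants: VariantPosterior[], rng: Rng = Math.random): string {
  if (variants.length === 0) {
    throw new Error('at least one variant is required');
  }
  let bestId = '';
  let bestTheta = -1;
  for (const v of variants) {
    const theta = sampleBetaInt(1 + v.conversions, 1 + v.participants - v.conversions, rng);
    if (theta > bestTheta) {
      bestTheta = theta;
      bestId = v.variantId;
    }
  }
  return bestId;
}

// Posterior update after observing one outcome for the assigned variant.
function recordOutcome(v: VariantPosterior, converted: boolean): VariantPosterior {
  return {
    ...v,
    participants: v.participants + 1,
    conversions: v.conversions + (converted ? 1 : 0),
  };
}
```

The exponential-sum sampler costs time proportional to the counts, so a production version would likely swap in a constant-time Gamma sampler once variants accumulate many participants.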
 ### Early Stopping
 ```
 Stop experiment when:
  samples_per_variant > min_sample_size AND (
    max_variant P(beats control) > 0.95    → Winner found
    OR max_variant P(beats control) < 0.05 → No winner
  )
  OR days_running > max_duration           → Time limit reached
 ```
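A possible shape for the stopping check, wired to the `guardrails` fields from `ExperimentDoc`. This sketch assumes one reading of the rule: the probability thresholds apply only once every variant has passed the minimum sample size, while the duration limit applies unconditionally. The `StoppingInput` and `StopDecision` types are hypothetical.

```typescript
type StopDecision =
  | { stop: false }
  | { stop: true; reason: 'winner_found' | 'no_winner' | 'max_duration' };

interface StoppingInput {
  bestProbabilityBeatsControl: number; // max over non-control variants, 0 to 1
  minParticipantsPerVariant: number; // smallest participant count of any variant
  daysRunning: number;
}

interface StoppingGuardrails {
  minSampleSizePerVariant: number;
  maxDurationDays: number;
  winnerThreshold: number; // percent, e.g. 95
}

function evaluateEarlyStopping(
  input: StoppingInput,
  guardrails: StoppingGuardrails,
): StopDecision {
  const upper = guardrails.winnerThreshold / 100;
  const lower = 1 - upper;
  // Probability-based stops are only trusted after the minimum sample size.
  if (input.minParticipantsPerVariant > guardrails.minSampleSizePerVariant) {
    if (input.bestProbabilityBeatsControl > upper) {
      return { stop: true, reason: 'winner_found' };
    }
    if (input.bestProbabilityBeatsControl < lower) {
      return { stop: true, reason: 'no_winner' };
    }
  }
  // Safety limit: stop regardless of sample size once the duration is exceeded.
  if (input.daysRunning > guardrails.maxDurationDays) {
    return { stop: true, reason: 'max_duration' };
  }
  return { stop: false };
}
```

Returning a reason alongside the decision lets the caller decide whether to emit `ab.early_stopping.triggered` or simply mark the experiment as timed out.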
 ---
 ## Appendix C: API Reference
 | Method | Endpoint                               | Auth     | Description                      |
 | ------ | -------------------------------------- | -------- | -------------------------------- |
 | POST   | `/ab-testing/experiments`              | Admin    | Create experiment                |
 | GET    | `/ab-testing/experiments`              | Admin    | List experiments                 |
 | GET    | `/ab-testing/experiments/:id`          | Admin    | Get experiment details           |
 | PATCH  | `/ab-testing/experiments/:id`          | Admin    | Update experiment                |
 | DELETE | `/ab-testing/experiments/:id`          | Admin    | Stop/archive experiment          |
 | POST   | `/ab-testing/experiments/:id/start`    | Admin    | Start experiment                 |
 | POST   | `/ab-testing/experiments/:id/pause`    | Admin    | Pause experiment                 |
 | POST   | `/ab-testing/experiments/:id/complete` | Admin    | Complete with winner             |
 | POST   | `/ab-testing/assign`                   | Any auth | Get variant assignment for user  |
 | POST   | `/ab-testing/events`                   | Any auth | Track experiment event           |
 | GET    | `/ab-testing/experiments/:id/results`  | Admin    | Get statistical results          |
 | GET    | `/ab-testing/suggestions`              | Admin    | AI-generated experiment ideas    |
 | POST   | `/ab-testing/hypotheses`               | Admin    | Generate hypothesis from pattern |
 ---
 ## Appendix D: Integration Points
 ### With Feature Flags Module
 - Experiments build on feature flag infrastructure
 - Flag state = variant assignment
 - Consistent bucketing with existing flags
 ### With Telemetry Module
 - Experiment events enriched with telemetry context
 - Automatic metric tracking from existing events
 - Funnel analysis using telemetry breadcrumbs
 ### With Event Bus
 | Event                         | Action                              |
 | ----------------------------- | ----------------------------------- |
 | `ab.experiment.started`       | Notify stakeholders, log audit      |
 | `ab.experiment.completed`     | Generate report, suggest follow-ups |
 | `ab.variant.declared_winner`  | Trigger auto-rollout if enabled     |
 | `ab.early_stopping.triggered` | Alert experiment owner              |
 ---
 ## Appendix E: Cost Estimation
 | Component                    | Monthly Cost (est.)      |
 | ---------------------------- | ------------------------ |
 | Cosmos DB (experiment data)  | $100–200                 |
 | LLM hypothesis generation    | $50–100 (weekly reports) |
 | Compute (statistical engine) | $50 (negligible)         |
 | **Total**                    | **$200–350/month**       |
 ---
 ## Current Status
 - [ ] **Design complete** — Target: 2026-03-10
 - [ ] **Phase 1: Core Engine** — Not started
 - [ ] **Phase 2: Statistics** — Not started
 - [ ] **Phase 3: AI Hypotheses** — Not started
 - [ ] **Phase 4: Admin UI** — Not started
 - [ ] **Phase 5: Advanced** — Future
 **Estimated Timeline:** 2.5–3 weeks (Phases 1–4)
 **Dependencies:**
 - Feature flags module (for assignment infrastructure)
 - Telemetry module (for event tracking)
 - Azure OpenAI (for hypothesis generation)
 ---
 _Last Updated: 2026-03-03_
--- a/docs/roadmaps/PREDICTIVE_CHURN_HEALTH_SCORING_ROADMAP.md
+++ b/docs/roadmaps/PREDICTIVE_CHURN_HEALTH_SCORING_ROADMAP.md
@ -0,0 +1,848 @@
 # Predictive Churn & Health Scoring — Implementation Roadmap
 > **Module:** `platform-service/src/modules/predictive-analytics/`  
 > **Admin UI:** `/ops/health-dashboard/`  
 > **Target:** ML-powered churn prediction, health scoring, and proactive retention  
 > **Estimated Effort:** 3 weeks  
 > **Status:** 🟡 Planning
 ---
 ## Executive Summary
 This roadmap delivers a **predictive analytics platform** that forecasts user churn 7–30 days in advance and computes product health scores from telemetry. Unlike reactive dashboards that show what happened, this system **predicts what will happen**—enabling proactive retention campaigns, resource allocation, and product improvements before users leave.
 ### Key Differentiators vs. Reactive Analytics
 | Capability          | Traditional Analytics      | Predictive Churn & Health         |
 | ------------------- | -------------------------- | --------------------------------- |
 | Insight Type        | Historical (what happened) | **Predictive (what will happen)** |
 | Churn Detection     | After user leaves          | **7–30 days before churn**        |
 | Health View         | Current snapshot only      | **Trending + forecasted**         |
 | Interventions       | Reactive recovery          | **Proactive prevention**          |
 | Product Insights    | Manual pattern search      | **Auto-detected risk signals**    |
 | Resource Allocation | Guesswork                  | **Risk-weighted prioritization**  |
 ---
 ## Phase 1: Feature Engineering Pipeline (Week 1)
 ### 1.1 Telemetry Feature Extraction
 - [ ] **1.1.1** Create `modules/predictive-analytics/feature-extractor.ts`
  - [ ] User behavior features (session frequency, depth, recency)
  - [ ] Engagement features (feature usage diversity, core action completion)
  - [ ] Performance features (error rate, latency exposure, crash frequency)
  - [ ] Social features (sharing, collaboration, network effects)
  - [ ] Revenue features (payment history, plan changes, support tickets)
 - [ ] **1.1.2** Time-window aggregations
  - [ ] Last 24 hours (recent behavior)
  - [ ] Last 7 days (weekly patterns)
  - [ ] Last 30 days (monthly trends)
  - [ ] Life-to-date (all-time totals)
 - [ ] **1.1.3** Rolling window features
  - [ ] 7-day rolling average (trend smoothing)
  - [ ] Week-over-week change (acceleration)
  - [ ] Cohort-normalized scores (vs. similar users)
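The rolling window features in 1.1.3 could be computed along these lines. This is a minimal sketch: it assumes each input is an array of daily values ordered oldest first, uses a population z-score for cohort normalization, and the function names are hypothetical.

```typescript
// All inputs are daily values ordered oldest first (e.g. sessions per day).
function sumValues(values: number[]): number {
  return values.reduce((acc, v) => acc + v, 0);
}

// One average per full window; returns an empty array if there is not enough data.
function rollingAverage(daily: number[], windowDays: number = 7): number[] {
  const averages: number[] = [];
  for (let end = windowDays; end <= daily.length; end++) {
    averages.push(sumValues(daily.slice(end - windowDays, end)) / windowDays);
  }
  return averages;
}

// Relative change of the last 7 days vs. the 7 days before them.
// Returns null when there is under 14 days of data or the earlier week is zero.
function weekOverWeekChange(daily: number[]): number | null {
  if (daily.length < 14) return null;
  const thisWeek = sumValues(daily.slice(-7));
  const lastWeek = sumValues(daily.slice(-14, -7));
  return lastWeek === 0 ? null : (thisWeek - lastWeek) / lastWeek;
}

// Z-score of one user's value against a cohort of similar users.
function cohortZScore(value: number, cohortValues: number[]): number {
  if (cohortValues.length === 0) return 0;
  const mean = sumValues(cohortValues) / cohortValues.length;
  const variance =
    sumValues(cohortValues.map((v) => (v - mean) * (v - mean))) / cohortValues.length;
  return variance === 0 ? 0 : (value - mean) / Math.sqrt(variance);
}
```

For example, a user whose sessions fall from 2 per day to 1 per day gets a week-over-week change of -0.5, the kind of signal the churn explanations in Phase 2 refer to.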
 ### 1.2 Feature Store
 - [ ] **1.2.1** Create `modules/predictive-analytics/feature-store.ts`
  - [ ] `UserFeatureVector` — normalized feature values per user
  - [ ] `ProductHealthMetrics` — aggregated product-level scores
  - [ ] Feature versioning (track feature schema changes)
 - [ ] **1.2.2** Add Cosmos containers to `cosmos-init.ts`
  - [ ] `user_features` (pk: `/userId`, TTL: 90 days)
  - [ ] `product_health` (pk: `/productId` + `/date`, time-series)
  - [ ] `feature_definitions` (pk: `/productId`, feature metadata)
 - [ ] **1.2.3** Feature computation jobs
  - [ ] Daily feature computation (nightly batch)
  - [ ] Real-time feature updates (on key events)
  - [ ] Feature backfill (compute historical features)
 ### 1.3 Product-Specific Feature Catalog
 - [ ] **1.3.1** Define features per product
  - [ ] **NomGap:** Fast completion rate, protocol adherence, streak length, autophagy engagement
  - [ ] **JarvisJr:** Session frequency, agent diversity, voice/text ratio, skill progression
  - [ ] **ChronoMind:** Timer completion rate, cascade effectiveness, routine adherence, urgency response
  - [ ] **MindLyst:** Brain usage diversity, triage accuracy, memory capture frequency, reflection completion
  - [ ] **PeakPulse:** Session frequency, goal completion, streak maintenance, social sharing
  - [ ] **LysnrAI:** Dictation frequency, accuracy rate, hotkey usage, vocabulary growth
 - [ ] **1.3.2** Feature importance tracking
  - [ ] Which features correlate with churn/retention
  - [ ] Feature drift detection (behavior changes over time)
  - [ ] Auto-suggest new features based on patterns
 **Phase 1 Exit Criteria:**
 - [ ] 50+ features extracted per product
 - [ ] Feature store populated for all active users
 - [ ] Daily feature computation job running
 - [ ] Feature importance analysis completed
 ---
 ## Phase 2: Churn Prediction Model (Week 1–2)
 ### 2.1 Model Architecture
 - [ ] **2.1.1** Create `modules/predictive-analytics/churn-model.ts`
  - [ ] Binary classification (will churn in 7 days? 30 days?)
  - [ ] Gradient Boosted Trees (XGBoost/LightGBM) baseline
  - [ ] Neural network ensemble (for comparison)
 - [ ] **2.1.2** Training pipeline
  - [ ] Label definition: No activity for N days = churned
  - [ ] Train/validation/test split (time-based, not random)
  - [ ] Cross-validation with temporal folds
  - [ ] Hyperparameter tuning (Optuna/Ray Tune)
 - [ ] **2.1.3** Model evaluation
  - [ ] ROC-AUC (discrimination ability)
  - [ ] Precision/Recall at different thresholds
  - [ ] Calibration (predicted prob vs. actual rate)
  - [ ] Per-product performance breakdown
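Two of the evaluation steps above are easy to get subtly wrong, so a small reference sketch may help. It shows a time-based split and a direct pairwise ROC-AUC. Both function names and the row shapes are assumptions; timestamps are assumed to be ISO 8601 strings in UTC with a uniform format, so they compare correctly as strings.

```typescript
interface TimestampedRow {
  timestamp: string; // ISO 8601, UTC, uniform format
}

// Time-based split: everything before the cutoff trains, the rest evaluates.
// This avoids leaking future behavior into training, unlike a random split.
function temporalSplit<T extends TimestampedRow>(
  rows: T[],
  cutoffIso: string,
): { train: T[]; test: T[] } {
  return {
    train: rows.filter((r) => r.timestamp < cutoffIso),
    test: rows.filter((r) => r.timestamp >= cutoffIso),
  };
}

interface ScoredExample {
  score: number; // predicted churn probability
  churned: boolean; // ground-truth label
}

// ROC-AUC as the fraction of (churned, retained) pairs ranked correctly.
// Ties count as half. Quadratic in the input size, so intended as a reference check.
function rocAuc(examples: ScoredExample[]): number {
  const positives = examples.filter((e) => e.churned);
  const negatives = examples.filter((e) => !e.churned);
  if (positives.length === 0 || negatives.length === 0) {
    throw new Error('ROC-AUC needs at least one churned and one retained example');
  }
  let correctlyRanked = 0;
  for (const p of positives) {
    for (const n of negatives) {
      if (p.score > n.score) correctlyRanked += 1;
      else if (p.score === n.score) correctlyRanked += 0.5;
    }
  }
  return correctlyRanked / (positives.length * negatives.length);
}
```

A value of 0.5 means the scores rank churned users no better than chance, and 1.0 means every churned user is scored above every retained one.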
 ### 2.2 Prediction Service
 - [ ] **2.2.1** Real-time scoring API
  - [ ] `POST /predictive/churn-score` — single user prediction
  - [ ] `POST /predictive/churn-batch` — batch scoring
  - [ ] Latency < 100ms for single prediction
 - [ ] **2.2.2** Risk segmentation
  - [ ] Risk buckets: Critical (>80%), High (60–80%), Medium (30–60%), Low (<30%)
  - [ ] Risk score components (which features drive the score)
  - [ ] Confidence intervals on predictions
 - [ ] **2.2.3** Model versioning
  - [ ] A/B test model versions
  - [ ] Shadow mode (predict without acting)
  - [ ] Rollback capability
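The risk buckets in 2.2.2 map directly onto a small classifier. The sketch below assumes the shared boundaries (60% and 30%) belong to the higher-risk bucket and that exactly 80% counts as High, since the source lists Critical as strictly above 80%. The function names are hypothetical.

```typescript
type RiskSegment = 'critical' | 'high' | 'medium' | 'low';

// churnProbability is expected in [0, 1].
// Assumed boundary handling: > 0.8 critical, [0.6, 0.8] high, [0.3, 0.6) medium, < 0.3 low.
function riskSegment(churnProbability: number): RiskSegment {
  if (!(churnProbability >= 0 && churnProbability <= 1)) {
    throw new RangeError('churnProbability must be between 0 and 1');
  }
  if (churnProbability > 0.8) return 'critical';
  if (churnProbability >= 0.6) return 'high';
  if (churnProbability >= 0.3) return 'medium';
  return 'low';
}

// Distribution of users across segments, e.g. for a dashboard summary.
function segmentCounts(probabilities: number[]): Record<RiskSegment, number> {
  const counts: Record<RiskSegment, number> = { critical: 0, high: 0, medium: 0, low: 0 };
  for (const p of probabilities) {
    counts[riskSegment(p)] += 1;
  }
  return counts;
}
```

Keeping the thresholds in one function makes it straightforward to validate the segmentation against historical churn, as the Phase 2 exit criteria require.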
 ### 2.3 Explanation Engine
 - [ ] **2.3.1** SHAP value computation
  - [ ] Feature contributions to each prediction
  - [ ] Global feature importance (what drives churn overall)
  - [ ] Local explanations (why this specific user is at risk)
 - [ ] **2.3.2** Natural language explanations
  ```
  "This user shows 78% churn risk because:
  - Session frequency dropped 60% in the last week
  - No core feature usage in 5 days
  - Error rate increased 3x vs. their baseline
  - Similar users who showed these patterns had 85% churn rate"
  ```
 - [ ] **2.3.3** Actionable insight extraction
  - [ ] Top 3 risk factors per user
  - [ ] Suggested intervention based on risk profile
  - [ ] Priority ranking (who to contact first)
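Once SHAP values are available, picking the top risk factors and rendering a short explanation is mostly bookkeeping. The sketch below assumes contributions arrive as plain numbers per feature, ranks them by absolute magnitude, and uses a fixed template instead of an LLM; all names and the wording of the template are assumptions.

```typescript
interface FeatureContribution {
  feature: string;
  contribution: number; // SHAP value; positive values push toward churn
}

interface RiskFactor extends FeatureContribution {
  direction: 'positive' | 'negative'; // increases or decreases churn risk
}

// Largest contributions by absolute magnitude, without mutating the input.
function topRiskFactors(contributions: FeatureContribution[], limit: number = 3): RiskFactor[] {
  return contributions
    .slice()
    .sort((a, b) => Math.abs(b.contribution) - Math.abs(a.contribution))
    .slice(0, limit)
    .map(
      (c): RiskFactor => ({
        feature: c.feature,
        contribution: c.contribution,
        direction: c.contribution >= 0 ? 'positive' : 'negative',
      }),
    );
}

// Template-based explanation in the style of the example above.
function explainPrediction(churnProbability: number, factors: RiskFactor[]): string {
  const header = `This user shows ${Math.round(churnProbability * 100)}% churn risk because:`;
  const lines = factors.map((f) => {
    const effect = f.direction === 'positive' ? 'raises' : 'lowers';
    return `- ${f.feature} ${effect} churn risk (SHAP ${f.contribution.toFixed(2)})`;
  });
  return [header].concat(lines).join('\n');
}
```

A template like this can serve as a deterministic fallback when the LLM-generated explanation is unavailable or fails validation.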
 **Phase 2 Exit Criteria:**
 - [ ] Model achieves ROC-AUC > 0.75 on test set
 - [ ] Real-time scoring API < 100ms latency
 - [ ] Explanations generated for all predictions
 - [ ] Risk segmentation validated against historical churn
 ---
 ## Phase 3: Product Health Scoring (Week 2)
 ### 3.1 Health Metric Framework
 - [ ] **3.1.1** Create `modules/predictive-analytics/health-scoring.ts`
  - [ ] Health dimensions: Acquisition, Activation, Retention, Engagement, Revenue, Stability
  - [ ] Composite health score (weighted average)
  - [ ] Per-dimension scores with drill-down
 - [ ] **3.1.2** Product health indicators
  - [ ] Daily Active Users (DAU) trend
  - [ ] New user activation rate (Day-1, Day-7)
  - [ ] Cohort retention curves (Day-1, Day-7, Day-30)
  - [ ] Feature adoption rates (new feature uptake)
  - [ ] Error rates and stability scores
  - [ ] Support ticket volume and sentiment
  - [ ] Revenue metrics (MRR, ARPU, LTV)
 ### 3.2 Health Score Computation
 - [ ] **3.2.1** Baseline establishment
  - [ ] Historical 90-day baseline for each metric
  - [ ] Peer product comparison (ChronoMind vs. JarvisJr benchmarks)
  - [ ] Industry benchmarks (if available)
 - [ ] **3.2.2** Scoring algorithm
  - [ ] Z-score normalization (how many std devs from baseline)
  - [ ] Trend direction (improving vs. declining)
  - [ ] Volatility adjustment (consistent vs. erratic)
  - [ ] 0–100 health score scale
 - [ ] **3.2.3** Alert thresholds
  - [ ] Critical: Score < 60 or 20% drop from baseline
  - [ ] Warning: Score 60–75 or 10% drop
  - [ ] Healthy: Score > 75 and stable
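The scoring pieces above (z-score against a baseline, weighted composite, alert thresholds) could fit together roughly like this. The sketch assumes the percentage drop is measured relative to a baseline score, that the boundary values 60 and 75 fall in the Warning band, and that dimension weights are supplied by the caller; none of these details are fixed by the roadmap.

```typescript
type HealthStatus = 'critical' | 'warning' | 'healthy';

interface DimensionScore {
  score: number; // 0 to 100
  weight: number; // relative importance, any positive scale
}

// How many standard deviations a metric sits from its baseline mean.
function zScore(value: number, baselineMean: number, baselineStdDev: number): number {
  return baselineStdDev === 0 ? 0 : (value - baselineMean) / baselineStdDev;
}

// Weighted average of dimension scores, clamped to the 0 to 100 scale.
function compositeHealthScore(dimensions: DimensionScore[]): number {
  const totalWeight = dimensions.reduce((acc, d) => acc + d.weight, 0);
  if (totalWeight <= 0) {
    throw new RangeError('weights must sum to a positive value');
  }
  const weighted = dimensions.reduce((acc, d) => acc + d.score * d.weight, 0);
  return Math.min(100, Math.max(0, weighted / totalWeight));
}

// Thresholds from 3.2.3: critical below 60 or a 20% drop, warning up to 75 or a 10% drop.
function classifyHealth(score: number, baselineScore: number): HealthStatus {
  const drop = baselineScore > 0 ? (baselineScore - score) / baselineScore : 0;
  if (score < 60 || drop >= 0.2) return 'critical';
  if (score <= 75 || drop >= 0.1) return 'warning';
  return 'healthy';
}
```

Note that a product can score 78 and still be flagged critical if its baseline was 100, which is the intent of the drop-based rule.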
 ### 3.3 Anomaly Detection
 - [ ] **3.3.1** Statistical anomaly detection
  - [ ] Prophet/ARIMA for time-series forecasting
  - [ ] Forecast vs. actual deviation detection
  - [ ] Seasonal pattern recognition (day-of-week, monthly)
 - [ ] **3.3.2** Multi-dimensional anomaly detection
  - [ ] Correlation breakdown detection (metrics usually correlated diverging)
  - [ ] Cohort-specific anomalies (specific region, platform, segment)
 - [ ] **3.3.3** Root cause suggestion
  - [ ] Correlation with deployments/releases
  - [ ] Error spike correlation
  - [ ] External factor detection (holidays, events)
 **Phase 3 Exit Criteria:**
 - [ ] Health scores computed daily for all products
 - [ ] Anomaly detection with < 5% false positive rate
 - [ ] Historical baseline established for all metrics
 ---
 ## Phase 4: Proactive Intervention System (Week 2–3)
 ### 4.1 Retention Campaign Automation
 - [ ] **4.1.1** Campaign trigger rules
  - [ ] High-risk user enters segment → trigger email
  - [ ] Medium-risk + specific behavior → trigger in-app message
  - [ ] Critical risk → trigger personal outreach task
 - [ ] **4.1.2** Personalized messaging
  - [ ] Message variant based on risk factors
  - [ ] Feature recommendations based on unused capabilities
  - [ ] Success stories from similar users
 - [ ] **4.1.3** Campaign effectiveness tracking
  - [ ] Control group vs. treatment
  - [ ] Churn rate comparison
  - [ ] Revenue impact measurement
 ### 4.2 Auto-Trigger Flows
 - [ ] **4.2.1** Platform integrations
  - [ ] Email delivery via existing `modules/delivery/`
  - [ ] Push notifications via `modules/notifications/`
  - [ ] Slack notifications for CS team
  - [ ] CRM integration (create outreach tasks)
 - [ ] **4.2.2** Smart scheduling
  - [ ] Optimal contact time prediction
  - [ ] Frequency capping (don't spam)
  - [ ] Multi-channel orchestration
 - [ ] **4.2.3** Feedback loop
  - [ ] Track intervention outcomes
  - [ ] Retrain model with intervention effectiveness
  - [ ] A/B test intervention strategies
 ### 4.3 Risk Dashboard for CS Team
 - [ ] **4.3.1** At-risk user list
  - [ ] Sortable by churn probability
  - [ ] Filter by product, segment, risk factors
  - [ ] Last activity preview
 - [ ] **4.3.2** User risk profile
  - [ ] Churn probability trend over time
  - [ ] Key risk factors highlighted
  - [ ] Recommended actions
  - [ ] User activity timeline
 - [ ] **4.3.3** Intervention tracking
  - [ ] Contact history
  - [ ] Response tracking
  - [ ] Outcome recording (retained/churned)
 **Phase 4 Exit Criteria:**
 - [ ] Automated campaigns triggered for high-risk users
 - [ ] CS team dashboard with at-risk user queue
 - [ ] Intervention effectiveness measurement in place
 - [ ] Feedback loop improving model accuracy
 ---
 ## Phase 5: Admin Dashboard UI (Week 3)
 ### 5.1 Product Health Overview
 - [ ] **5.1.1** Create `/ops/health-dashboard/page.tsx`
  - [ ] Health score cards for each product
  - [ ] Trend sparklines (7-day, 30-day)
  - [ ] Alert summary (critical issues count)
  - [ ] Product comparison table
 - [ ] **5.1.2** Health detail view
  - [ ] Dimension breakdown (acquisition, activation, retention, etc.)
  - [ ] Metric time-series charts
  - [ ] Anomaly markers on charts
  - [ ] Cohort retention curves
 ### 5.2 Churn Prediction Dashboard
 - [ ] **5.2.1** Churn risk overview
  - [ ] Risk distribution pie chart
  - [ ] At-risk user count by product
  - [ ] Predicted churn impact (revenue at risk)
  - [ ] Model performance metrics (AUC, calibration)
 - [ ] **5.2.2** User risk explorer
  - [ ] Search/filter at-risk users
  - [ ] Risk score with explanation
  - [ ] Top risk factors
  - [ ] Recommended interventions
 - [ ] **5.2.3** Model insights
  - [ ] Global feature importance chart
  - [ ] Model performance over time
  - [ ] Feature drift alerts
 ### 5.3 Campaign Management
 - [ ] **5.3.1** Campaign list
  - [ ] Active/paused/completed campaigns
  - [ ] Trigger rules summary
  - [ ] Performance stats (sent, opened, converted)
 - [ ] **5.3.2** Campaign editor
  - [ ] Trigger condition builder
  - [ ] Message template editor
  - [ ] Audience targeting
  - [ ] A/B test configuration
 - [ ] **5.3.3** Campaign analytics
  - [ ] Funnel: triggered → sent → opened → retained
  - [ ] Revenue impact
  - [ ] Comparison to control group
 **Phase 5 Exit Criteria:**
 - [ ] Health dashboard shows all products with trends
 - [ ] Churn predictions visible with explanations
 - [ ] Campaign creation and management functional
 - [ ] Full test coverage
 ---
 ## Phase 6: Advanced Capabilities (Future)
 ### 6.1 Cohort-Specific Models
 - [ ] Segment-specific churn models (iOS vs. Android, free vs. pro)
 - [ ] Regional models (different behaviors by geography)
 - [ ] Temporal models (seasonal churn patterns)
 ### 6.2 LTV Prediction
 - [ ] Predict lifetime value at signup
 - [ ] Predict upgrade probability (free → pro)
 - [ ] Optimize acquisition channels by predicted LTV
 ### 6.3 Product Recommendations
 - [ ] Suggest features to at-risk users based on successful cohorts
 - [ ] Personalized onboarding based on predicted needs
 - [ ] Next-best-action recommendations
 ---
 ## Appendix A: Data Models
 ### UserChurnPredictionDoc
 ```typescript
 interface UserChurnPredictionDoc {
  id: string; // cp_<uuid>
  userId: string; // partition key
  productId: string;
  // Prediction
  predictionHorizon: 7 | 14 | 30; // Days
  churnProbability: number; // 0–1
  riskSegment: 'critical' | 'high' | 'medium' | 'low';
  // Feature vector snapshot
  features: Record<string, number>; // Normalized feature values
  featureVersion: string; // Schema version
  // Model info
  modelVersion: string;
  modelType: 'xgboost' | 'neural';
  predictionTimestamp: string;
  // Explanation (SHAP values)
  explanation: {
    topRiskFactors: Array<{
      feature: string;
      contribution: number; // SHAP value
      direction: 'positive' | 'negative'; // Increases or decreases churn risk
    }>;
    globalFeatureImportance: Array<{
      feature: string;
      importance: number;
    }>;
  };
  // Natural language summary
  nlExplanation: string; // Auto-generated explanation
  // Intervention
  suggestedActions: string[];
  interventionHistory: Array<{
    action: string;
    timestamp: string;
    outcome?: 'responded' | 'ignored' | 'churned' | 'retained';
  }>;
  // Validation (ground truth)
  actualChurned?: boolean;
  validationDate?: string;
  createdAt: string;
  ttl: number; // predictionHorizon + 90 days
 }
 ```
 ### ProductHealthScoreDoc
 ```typescript
 interface ProductHealthScoreDoc {
  id: string; // ph_<uuid>
  productId: string; // partition key
  date: string; // Sort key (YYYY-MM-DD)
  // Composite score
  overallHealthScore: number; // 0–100
  healthStatus: 'critical' | 'warning' | 'healthy';
  // Dimension scores
  dimensions: {
    acquisition: {
      score: number; // 0–100
      metrics: {
        newUsers: number;
        activationRateDay1: number;
        activationRateDay7: number;
        cac: number;
      };
      trend: 'improving' | 'stable' | 'declining';
    };
    activation: {
      score: number;
      metrics: {
        firstValueMomentRate: number;
        timeToFirstAction: number;
        onboardingCompletionRate: number;
      };
      trend: 'improving' | 'stable' | 'declining';
    };
    retention: {
      score: number;
      metrics: {
        dau: number;
        mau: number;
        dauMauRatio: number;
        day7Retention: number;
        day30Retention: number;
      };
      trend: 'improving' | 'stable' | 'declining';
    };
    engagement: {
      score: number;
      metrics: {
        avgSessionLength: number;
        sessionsPerUser: number;
        featureAdoption: Record<string, number>;
      };
      trend: 'improving' | 'stable' | 'declining';
    };
    revenue: {
      score: number;
      metrics: {
        mrr: number;
        arpu: number;
        churnRate: number;
        upgradeRate: number;
      };
      trend: 'improving' | 'stable' | 'declining';
    };
    stability: {
      score: number;
      metrics: {
        crashFreeRate: number;
        errorRate: number;
        avgLatency: number;
        uptimePercent: number;
      };
      trend: 'improving' | 'stable' | 'declining';
    };
  };
  // Anomalies detected
  anomalies: Array<{
    metric: string;
    expectedValue: number;
    actualValue: number;
    deviationPercent: number;
    severity: 'critical' | 'warning';
    suggestedCause?: string;
  }>;
  // Forecasts
  forecasts: {
    next7Days: {
      expectedHealthScore: number;
      confidenceInterval: [number, number];
    };
    next30Days: {
      expectedHealthScore: number;
      confidenceInterval: [number, number];
    };
  };
  // Benchmarks
  vsBaseline7Day: number; // % change vs. 7-day baseline
  vsBaseline30Day: number; // % change vs. 30-day baseline
  createdAt: string;
  ttl: number; // 2 years
 }
 ```
 ### RetentionCampaignDoc
 ```typescript
 interface RetentionCampaignDoc {
  id: string; // rc_<uuid>
  productId: string; // partition key
  // Campaign definition
  name: string;
  description: string;
  status: 'draft' | 'active' | 'paused' | 'completed';
  // Trigger conditions
  trigger: {
    type: 'churn_risk' | 'health_score_drop' | 'behavioral' | 'scheduled';
    conditions: Array<{
      field: string;
      operator: 'gt' | 'lt' | 'eq' | 'in';
      value: unknown;
    }>;
  };
  // Audience
  audience: {
    riskSegments?: string[]; // 'critical', 'high', etc.
    products?: string[];
    userSegments?: string[];
    excludeRecentContact?: number; // Hours (frequency capping)
  };
  // Message content
  messages: Array<{
    channel: 'email' | 'push' | 'in_app' | 'slack_cs';
    templateId: string;
    variant?: string; // A/B test variant
    delayHours?: number; // Delay after trigger
    conditions?: Array<{
      field: string;
      operator: 'gt' | 'lt' | 'eq' | 'in';
      value: unknown;
    }>;
  }>;
  // Performance tracking
  stats: {
    triggered: number;
    sent: number;
    opened: number;
    clicked: number;
    converted: number;
    controlGroupSize: number;
    controlChurnRate: number;
    treatmentChurnRate: number;
  };
  createdAt: string;
  updatedAt: string;
  ttl: number; // 1 year after completion
 }
 ```
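The `trigger.conditions` array above can be evaluated against a user record with a small interpreter for the four operators. This is a sketch under two assumptions the schema does not state: conditions are combined with logical AND, and the record is a flat key/value object. The field names in the usage note are hypothetical.

```typescript
type Operator = 'gt' | 'lt' | 'eq' | 'in';

interface TriggerCondition {
  field: string;
  operator: Operator;
  value: unknown;
}

// Evaluate a single condition against a flat record of user attributes.
function matchesCondition(
  record: Record<string, unknown>,
  cond: TriggerCondition,
): boolean {
  const actual = record[cond.field];
  switch (cond.operator) {
    case 'gt':
      return typeof actual === 'number' && typeof cond.value === 'number' && actual > cond.value;
    case 'lt':
      return typeof actual === 'number' && typeof cond.value === 'number' && actual < cond.value;
    case 'eq':
      return actual === cond.value;
    case 'in':
      return Array.isArray(cond.value) && cond.value.includes(actual);
  }
}

// Assumption: every condition must hold for the campaign to trigger.
function shouldTrigger(
  record: Record<string, unknown>,
  conditions: TriggerCondition[],
): boolean {
  return conditions.every((cond) => matchesCondition(record, cond));
}
```

For example, a record `{ churnProbability: 0.82, riskSegment: 'high' }` would satisfy the pair of conditions `churnProbability gt 0.7` and `riskSegment in ['critical', 'high']`.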
 ### UserFeatureVectorDoc
 ```typescript
 interface UserFeatureVectorDoc {
  id: string; // fv_<uuid>
  userId: string; // partition key
  productId: string;
  // Computed features
  features: {
    // Recency features
    daysSinceLastSession: number;
    daysSinceLastCoreAction: number;
    // Frequency features
    sessionsLast7Days: number;
    sessionsLast30Days: number;
    avgSessionsPerWeek: number;
    // Engagement depth
    avgSessionDuration: number;
    actionsPerSession: number;
    uniqueFeaturesUsed: number;
    // Product-specific (examples)
    // NomGap
    fastCompletionRate?: number;
    streakLength?: number;
    // JarvisJr
    agentDiversityScore?: number;
    voiceSessionRatio?: number;
    // ChronoMind
    timerCompletionRate?: number;
    routineAdherenceScore?: number;
    // Error/stability
    errorRateLast7Days: number;
    crashCountLast30Days: number;
    // Revenue
    planTier: number; // 0=free, 1=pro, 2=enterprise
    lifetimeValue: number;
    daysSinceLastPayment?: number;
  };
  // Normalized (0–1) for model input
  normalizedFeatures: Record<string, number>;
  // Metadata
  featureSchemaVersion: string;
  computedAt: string;
  // Time windows
  observationWindow: {
    start: string;
    end: string;
  };
  ttl: number; // 90 days
 }
 ```
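The schema states that `normalizedFeatures` holds values in the 0–1 range but does not say how they are produced. One common option is min-max scaling, sketched below. The `FeatureRange` type, the `normalizeFeatures` name, and the choice to clamp out-of-range values are assumptions, and the ranges would have to come from the training data.

```typescript
// Hypothetical per-feature range, e.g. computed over the training set.
interface FeatureRange {
  min: number;
  max: number;
}

// Min-max scale each raw feature into [0, 1].
function normalizeFeatures(
  raw: Record<string, number>,
  ranges: Record<string, FeatureRange>,
): Record<string, number> {
  const normalized: Record<string, number> = {};
  for (const [name, value] of Object.entries(raw)) {
    const range = ranges[name];
    // Skip features with no known range or a degenerate (constant) range.
    if (!range || range.max === range.min) continue;
    const scaled = (value - range.min) / (range.max - range.min);
    // Clamp so values outside the training range stay within [0, 1].
    normalized[name] = Math.min(1, Math.max(0, scaled));
  }
  return normalized;
}
```

Storing `featureSchemaVersion` alongside the vector, as the interface does, lets a scorer reject vectors that were normalized with a different set of ranges.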
 ---
 ## Implementation Tracking
 | Phase | Task                          | Status | Commit |
 | ----- | ----------------------------- | ------ | ------ |
 | 1.1   | Telemetry feature extraction  | ⬜     | —      |
 | 1.1   | Time-window aggregations      | ⬜     | —      |
 | 1.1   | Rolling window features       | ⬜     | —      |
 | 1.2   | Feature store                 | ⬜     | —      |
 | 1.2   | Cosmos containers             | ⬜     | —      |
 | 1.2   | Feature computation jobs      | ⬜     | —      |
 | 1.3   | Product-specific features     | ⬜     | —      |
 | 1.3   | Feature importance tracking   | ⬜     | —      |
 | 2.1   | XGBoost model architecture    | ⬜     | —      |
 | 2.1   | Training pipeline             | ⬜     | —      |
 | 2.1   | Model evaluation              | ⬜     | —      |
 | 2.2   | Real-time scoring API         | ⬜     | —      |
 | 2.2   | Risk segmentation             | ⬜     | —      |
 | 2.2   | Model versioning              | ⬜     | —      |
 | 2.3   | SHAP explanations             | ⬜     | —      |
 | 2.3   | Natural language explanations | ⬜     | —      |
 | 2.3   | Actionable insights           | ⬜     | —      |
 | 3.1   | Health metric framework       | ⬜     | —      |
 | 3.1   | Health indicators             | ⬜     | —      |
 | 3.2   | Baseline establishment        | ⬜     | —      |
 | 3.2   | Scoring algorithm             | ⬜     | —      |
 | 3.2   | Alert thresholds              | ⬜     | —      |
 | 3.3   | Anomaly detection             | ⬜     | —      |
 | 4.1   | Campaign trigger rules        | ⬜     | —      |
 | 4.1   | Personalized messaging        | ⬜     | —      |
 | 4.2   | Platform integrations         | ⬜     | —      |
 | 4.3   | CS team dashboard             | ⬜     | —      |
 | 5.1   | Health overview UI            | ⬜     | —      |
 | 5.2   | Churn prediction dashboard    | ⬜     | —      |
 | 5.3   | Campaign management           | ⬜     | —      |
 **Legend:** ⬜ Not started | 🟡 In progress | ✅ Complete | ⏸️ Deferred
 ---
 ## Quick Reference for Implementing Agent
 **📋 Full Roadmap:** `docs/roadmaps/PREDICTIVE_CHURN_HEALTH_SCORING_ROADMAP.md` (path relative to the repository root)
 **Key Files to Modify/Create:**
 ```
 services/platform-service/
 ├── src/
 │   ├── modules/predictive-analytics/
 │   │   ├── types.ts              # [1.2] Feature, HealthScore, Prediction types
 │   │   ├── repository.ts         # Data access layer
 │   │   ├── feature-extractor.ts  # [1.1] Telemetry → features
 │   │   ├── feature-store.ts      # [1.2] Feature vector storage
 │   │   ├── churn-model.ts        # [2.1] XGBoost training & inference
 │   │   ├── scoring-api.ts        # [2.2] Real-time prediction endpoint
 │   │   ├── explanation-engine.ts # [2.3] SHAP + NL explanations
 │   │   ├── health-scoring.ts     # [3] Health dimension calculation
 │   │   ├── anomaly-detection.ts  # [3.3] Prophet/ARIMA forecasting
 │   │   ├── campaign-engine.ts    # [4] Retention automation
 │   │   ├── routes.ts             # [5] REST API
 │   │   └── predictive-analytics.test.ts # Tests
 │   ├── lib/
 │   │   └── cosmos-init.ts        # [1.2] Add containers
 │   └── server.ts                 # Register routes
 dashboards/admin-web/
 ├── src/
 │   ├── app/(dashboard)/
 │   │   ├── health-dashboard/
 │   │   │   └── page.tsx          # [5.1] Product health overview
 │   │   └── predictive/
 │   │       ├── at-risk/
 │   │       │   └── page.tsx      # [4.3] At-risk user list
 │   │       └── campaigns/
 │   │           └── page.tsx      # [5.3] Campaign management
 │   ├── lib/
 │   │   └── predictive-client.ts  # API client
 │   └── components/
 │       └── predictive/           # Risk cards, health charts
 ```
 **Commit Message Format:**
 ```
 feat(predictive-analytics): <description> [<task.code>]
 ```
 **Example:**
 ```bash
 git add services/platform-service/src/modules/predictive-analytics/
 git commit -m "feat(predictive-analytics): add feature extraction and store [1.1-1.2]"
 ```
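The commit format above could be checked automatically, for instance in a commit hook. The sketch below is one possible check; the regular expression and the `isValidCommitMessage` name are assumptions, and the accepted task-code shapes (`[2.3]`, `[1.1-1.2]`) are inferred from the example rather than specified.

```typescript
// Assumed pattern: feat(predictive-analytics): <description> [<task.code>]
// where the task code is dotted digits, optionally a range such as 1.1-1.2.
const COMMIT_PATTERN =
  /^feat\(predictive-analytics\): \S.* \[\d+(?:\.\d+)*(?:-\d+(?:\.\d+)*)?\]$/;

function isValidCommitMessage(subject: string): boolean {
  return COMMIT_PATTERN.test(subject);
}
```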
 **Testing Requirements:**
 - Unit tests: 20+ Vitest tests covering feature extraction and model inference
 - Model validation: AUC, calibration, precision@k metrics
 - Integration: End-to-end prediction pipeline
 **Dependencies:**
 - Telemetry module (feature extraction)
 - Delivery module (retention campaigns)
 - Azure ML or scikit-learn (model training)
 ---
 ## Appendix B: API Reference
 | Method | Endpoint                               | Auth          | Description                   |
 | ------ | -------------------------------------- | ------------- | ----------------------------- |
 | GET    | `/predictive/health`                   | Admin         | Get all product health scores |
 | GET    | `/predictive/health/:productId`        | Admin         | Get product health detail     |
 | GET    | `/predictive/health/:productId/trends` | Admin         | Historical health trends      |
 | POST   | `/predictive/churn-score`              | Admin/Service | Get churn prediction for user |
 | POST   | `/predictive/churn-batch`              | Admin         | Batch churn scoring           |
 | GET    | `/predictive/at-risk-users`            | Admin/CS      | List users by risk segment    |
 | GET    | `/predictive/users/:id/risk-profile`   | Admin/CS      | User churn risk details       |
 | GET    | `/predictive/model/performance`        | Admin         | Model accuracy metrics        |
 | GET    | `/predictive/model/features`           | Admin         | Feature importance ranking    |
 | GET    | `/predictive/campaigns`                | Admin         | List retention campaigns      |
 | POST   | `/predictive/campaigns`                | Admin         | Create campaign               |
 | PATCH  | `/predictive/campaigns/:id`            | Admin         | Update campaign               |
 | GET    | `/predictive/campaigns/:id/stats`      | Admin         | Campaign performance          |
 | POST   | `/predictive/campaigns/:id/trigger`    | Admin         | Manual trigger for testing    |
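As an illustration of how `predictive-client.ts` might call one of these endpoints, the sketch below builds a request for `POST /predictive/churn-score`. The request body shape (`userId`, `productId`), the bearer-token header, and the `buildChurnScoreCall` name are assumptions; the table above specifies only the method, path, and auth role.

```typescript
// Hypothetical request body; the roadmap does not define the payload.
interface ChurnScoreRequest {
  userId: string;
  productId: string;
}

// Build the URL and fetch options without sending, so callers can inspect or mock them.
function buildChurnScoreCall(baseUrl: string, token: string, body: ChurnScoreRequest) {
  return {
    url: `${baseUrl.replace(/\/+$/, '')}/predictive/churn-score`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${token}`, // assumed auth scheme
      },
      body: JSON.stringify(body),
    },
  };
}
```

A caller would then run `const { url, init } = buildChurnScoreCall(...)` followed by `await fetch(url, init)`.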
 ---
 ## Appendix C: Integration Points
 ### With Telemetry Module
 - Raw events feed feature extraction
 - Error rates flow into health scores
 - Correlation IDs link behaviors to predictions
 ### With Diagnostics Module
 - Debug sessions enrich feature vectors
 - Error clusters correlate with churn risk
 - Screenshot patterns analyzed for UX issues
 ### With Event Bus
 | Event                               | Action                                   |
 | ----------------------------------- | ---------------------------------------- |
 | `predictive.churn.risk_detected`    | Trigger retention campaign               |
 | `predictive.health.critical`        | Alert leadership, suggest debug sessions |
 | `predictive.anomaly.detected`       | Create incident, notify on-call          |
 | `user.retention.campaign_responded` | Update model with outcome                |
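The event-to-action mapping in the table can be expressed as a typed lookup, which lets the compiler flag an event that has no handler entry. This is a sketch only: the action labels are hypothetical identifiers standing in for the behaviors listed above, and the real event bus API is not shown.

```typescript
type PredictiveEvent =
  | 'predictive.churn.risk_detected'
  | 'predictive.health.critical'
  | 'predictive.anomaly.detected'
  | 'user.retention.campaign_responded';

// Action labels are illustrative placeholders, not existing handler names.
const actionsByEvent: Record<PredictiveEvent, string[]> = {
  'predictive.churn.risk_detected': ['trigger_retention_campaign'],
  'predictive.health.critical': ['alert_leadership', 'suggest_debug_sessions'],
  'predictive.anomaly.detected': ['create_incident', 'notify_on_call'],
  'user.retention.campaign_responded': ['update_model_with_outcome'],
};

// Return the actions for an event name, or an empty list for unknown events.
function actionsFor(eventName: string): string[] {
  return Object.prototype.hasOwnProperty.call(actionsByEvent, eventName)
    ? actionsByEvent[eventName as PredictiveEvent]
    : [];
}
```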
 ### With Delivery Module
 - Retention campaigns use email templates
 - Push notifications for urgent interventions
 - A/B test message variants
 ---
 ## Appendix D: Cost Estimation
 | Component                            | Monthly Cost (est.)        |
 | ------------------------------------ | -------------------------- |
 | Cosmos DB (features + predictions)   | $150–300                   |
 | Model training (Azure ML)            | $100–200                   |
 | Inference compute                    | $50–100                    |
 | Email delivery (retention campaigns) | $50–200 (volume-dependent) |
 | **Total**                            | **$350–800/month**         |
 ROI: If the system prevents 5% of predicted churn at $50 LTV with 10K at-risk users/month:
 - 10,000 × 5% = 500 users retained × $50 = $25K/month value
 - Roughly 31:1 to 71:1 ROI against the $350–800/month cost estimate (comfortably above 10:1)
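The ROI arithmetic can be reproduced with a few lines, which makes it easy to rerun with different assumptions. The `estimateRoi` function is a hypothetical helper; the inputs are the figures quoted in this appendix.

```typescript
// Hypothetical helper reproducing the ROI estimate from the figures in this appendix.
function estimateRoi(
  atRiskUsers: number,
  preventedFraction: number,
  ltv: number,
  monthlyCost: number,
) {
  const retainedUsers = atRiskUsers * preventedFraction;
  const monthlyValue = retainedUsers * ltv;
  return { retainedUsers, monthlyValue, roi: monthlyValue / monthlyCost };
}

// Bounds from the cost table: $800/month (high cost) and $350/month (low cost).
const worstCase = estimateRoi(10_000, 0.05, 50, 800);
const bestCase = estimateRoi(10_000, 0.05, 50, 350);
console.log(worstCase.roi.toFixed(1), bestCase.roi.toFixed(1));
```

With these inputs the ratio falls between about 31:1 and 71:1, so the estimate is most sensitive to the 5% prevention rate and the $50 LTV rather than to infrastructure cost.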
 ---
 ## Appendix E: Success Metrics
 ### Model Performance
 - [ ] AUC > 0.75 (discrimination)
 - [ ] Calibration slope 0.9–1.1 (well-calibrated probabilities)
 - [ ] Precision@10% > 60% (high-risk predictions are accurate)
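Precision@10% asks what share of the top-scored 10% of users actually churned. The sketch below shows one way to compute it from labeled predictions; the `ScoredUser` shape and the function name are assumptions, and the roadmap does not specify how ties or rounding of the cutoff are handled.

```typescript
// Hypothetical evaluation record: a model score plus the observed outcome.
interface ScoredUser {
  churnProbability: number;
  churned: boolean;
}

// Precision among the top `fraction` of users ranked by predicted churn probability.
function precisionAtTopFraction(users: ScoredUser[], fraction: number): number {
  if (users.length === 0) return 0;
  // Assumption: round the cutoff to the nearest whole user, with at least one user.
  const k = Math.max(1, Math.round(users.length * fraction));
  const topK = [...users]
    .sort((a, b) => b.churnProbability - a.churnProbability)
    .slice(0, k);
  return topK.filter((u) => u.churned).length / topK.length;
}
```

Calling `precisionAtTopFraction(validationSet, 0.1)` and comparing the result against 0.6 would correspond to the target above.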
 ### Business Impact
 - [ ] 10%+ reduction in churn rate for targeted cohorts
 - [ ] 5%+ increase in re-engagement campaign response
 - [ ] CS team satisfaction with at-risk user visibility
 ### Operational
 - [ ] Prediction latency < 100ms
 - [ ] Feature freshness < 24 hours
 - [ ] Model retraining automation
 ---
 ## Current Status
 - [ ] **Design complete** — Target: 2026-03-10
 - [ ] **Phase 1: Feature Pipeline** — Not started
 - [ ] **Phase 2: Churn Model** — Not started
 - [ ] **Phase 3: Health Scoring** — Not started
 - [ ] **Phase 4: Interventions** — Not started
 - [ ] **Phase 5: Admin UI** — Not started
 - [ ] **Phase 6: Advanced** — Future
 **Estimated Timeline:** 3 weeks (Phases 1–5)
 **Dependencies:**
 - Telemetry module (for feature extraction)
 - Azure ML or similar (for model training)
 - Delivery module (for retention campaigns)
 ---
 _Last Updated: 2026-03-03_