docs(roadmaps): add AI diagnostic assistant, A/B testing, and churn prediction roadmaps

- AI Diagnostic Assistant: LLM-powered root cause analysis, error clustering, natural language queries
- Intelligent A/B Testing: Thompson sampling, Bayesian early stopping, AI hypothesis generation
- Predictive Churn & Health: XGBoost models, health scoring, automated retention campaigns

All roadmaps include:
- Implementation tracking tables with status/commit columns
- Quick reference sections with file structures
- Phase-by-phase task breakdowns with [X.Y.Z] codes
2026-03-03 11:38:56 -08:00 · e98380003b
commit e98380003b
parent d510867b87
3 changed files with 2164 additions and 0 deletions
--- a/docs/roadmaps/AI_DIAGNOSTIC_ASSISTANT_ROADMAP.md
+++ b/docs/roadmaps/AI_DIAGNOSTIC_ASSISTANT_ROADMAP.md
@@ -0,0 +1,597 @@
+# AI Diagnostic Assistant — Implementation Roadmap
+
+> **Module:** `platform-service/src/modules/ai-diagnostics/`  
+> **Admin UI:** `/ops/ai-diagnostics/`  
+> **Target:** LLM-powered root cause analysis from telemetry + debug sessions  
+> **Estimated Effort:** 2–3 weeks  
+> **Status:** 🟡 Planning
+
+---
+
+## Executive Summary
+
+This roadmap delivers an **AI-powered diagnostic assistant** that analyzes error patterns, debug session data, and telemetry to automatically suggest root causes—like having a senior engineer on-call 24/7. Engineers can ask natural language questions like _"Why did the iOS keyboard crash yesterday?"_ and receive AI-generated hypotheses with supporting evidence.
+
+### Key Differentiators vs. Manual Debugging
+
+| Feature           | Manual Debugging            | AI Diagnostic Assistant             |
+| ----------------- | --------------------------- | ----------------------------------- |
+| Query             | SQL + log grep              | **Natural language**                |
+| Pattern Detection | Hours of manual correlation | **AI finds hidden patterns**        |
+| Context Assembly  | Check 5+ systems manually   | **Auto-assembles timeline**         |
+| Hypothesis        | Engineer intuition          | **LLM-generated + evidence**        |
+| Learning          | Per-engineer experience     | **Accumulates across all sessions** |
+
+---
+
+## Phase 1: Data Pipeline & Embeddings (Week 1)
+
+**Goal:** Extract, normalize, and embed error data for semantic search and clustering.
+
+### 1.1 Error Fingerprinting & Clustering
+
+- [ ] **1.1.1** Create `modules/ai-diagnostics/types.ts`
+  - [ ] `ErrorClusterDoc` — grouped similar errors with signature
+  - [ ] `ErrorFingerprint` — normalized stack trace hash
+  - [ ] `ClusterAnalysis` — AI-generated pattern description
+  - [ ] Zod schemas for all inputs
+
+  _Commit format:_ `git commit -m "feat(ai-diagnostics): add error clustering types [1.1.1]"` → `https://github.com/saravanakumardb1/learning_ai_common_plat/commit/<hash>`
+
+- [ ] **1.1.2** Add Cosmos containers to `cosmos-init.ts`
+  - [ ] `error_clusters` (pk: `/productId`, TTL: 90 days)
+  - [ ] `error_fingerprints` (pk: `/fingerprintHash`, unique index)
+  - [ ] `diagnostic_insights` (pk: `/clusterId`, AI-generated analyses)
+
+  _Commit format:_ `git commit -m "feat(ai-diagnostics): add cosmos containers for error clustering [1.1.2]"`
+
+- [ ] **1.1.3** Implement error normalization
+  - [ ] Stack trace parsing (remove line numbers, file paths)
+  - [ ] Message templating (replace UUIDs, timestamps, user IDs with placeholders)
+  - [ ] Fingerprint generation (SHA-256 of normalized error)
+  - [ ] Similarity scoring (Levenshtein for near-matches)
+
+  _Commit format:_ `git commit -m "feat(ai-diagnostics): implement error normalization and fingerprinting [1.1.3]"`
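
A minimal sketch of the normalization steps listed in 1.1.3, assuming plain regex templating is enough for a first pass. The function names (`normalizeMessage`, `normalizeStackFrame`, `levenshtein`), the placeholder tokens, and the "four or more digits counts as an ID" rule are illustrative assumptions, not part of the roadmap. The SHA-256 fingerprint would then be computed over the normalized output.

```typescript
// Hypothetical patterns; tune these against real error payloads.
const UUID_RE = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;
const ISO_TS_RE = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?/g;
const LONG_NUM_RE = /\b\d{4,}\b/g; // assumption: long digit runs are IDs

// Replace volatile values with placeholders so equal errors compare equal.
function normalizeMessage(message: string): string {
  return message
    .replace(UUID_RE, "<uuid>")
    .replace(ISO_TS_RE, "<timestamp>")
    .replace(LONG_NUM_RE, "<id>");
}

// Drop the trailing :line[:col] and the directory part of a stack frame.
function normalizeStackFrame(frame: string): string {
  return frame
    .trim()
    .replace(/:\d+(?::\d+)?(?=\)?$)/, "")
    .replace(/(\(?)[^\s()]*[\\\/]/, "$1");
}

// Classic two-row Levenshtein distance for near-match scoring.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}
```

Replacing UUIDs before timestamps and numeric IDs matters here: the digit-run rule would otherwise rewrite parts of a UUID first.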
+
+### 1.2 Vector Embeddings for Semantic Search
+
+- [ ] **1.2.1** Create embedding pipeline
+  - [ ] Azure OpenAI `text-embedding-3-small` integration
+  - [ ] Error message + stack trace → 1536-dim vector
+  - [ ] Batch embedding job (100 errors at a time)
+- [ ] **1.2.2** Cosmos DB vector search setup
+  - [ ] Store embeddings in `error_clusters` documents
+  - [ ] Cosine similarity query function
+  - [ ] Similar error lookup by vector distance
+- [ ] **1.2.3** Clustering algorithm
+  - [ ] HDBSCAN for density-based clustering
+  - [ ] DBSCAN fallback for smaller datasets
+  - [ ] Auto-determine cluster count (no manual k)
+  - [ ] Re-cluster nightly as new errors arrive
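
The batching and similarity steps in 1.2.1 and 1.2.2 can be sketched in memory as follows. This is only an illustration: `chunk`, `cosineSimilarity`, `nearest`, and the `EmbeddedCluster` shape are hypothetical names, and a real lookup would presumably run as a vector query inside the database rather than in application code.

```typescript
interface EmbeddedCluster {
  id: string;
  embedding: number[];
}

// Split items into batches, e.g. 100 errors per embedding request.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be >= 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k candidates most similar to the query vector, best first.
function nearest(query: number[], candidates: EmbeddedCluster[], k: number) {
  return candidates
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```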
+
+### 1.3 Telemetry Ingestion for Context
+
+- [ ] **1.3.1** Link telemetry to errors
+  - [ ] `correlationId` propagation across services
+  - [ ] 5-minute window: error → preceding telemetry events
+  - [ ] Session state reconstruction (what user was doing)
+- [ ] **1.3.2** Enrich error context
+  - [ ] Device info (OS version, model, memory)
+  - [ ] App state (screen, feature flags, config)
+  - [ ] Recent API calls (network trace from diagnostics)
+  - [ ] Recent user actions (breadcrumb trail)
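
One possible shape for the 5-minute correlation window in 1.3.1, assuming events carry ISO 8601 timestamps and a shared `correlationId`. The interfaces and the `precedingEvents` helper are hypothetical and only illustrate the filtering rule.

```typescript
interface TelemetryEvent {
  correlationId: string;
  timestamp: string; // ISO 8601
  name: string;
}

interface ErrorOccurrence {
  correlationId: string;
  timestamp: string; // ISO 8601
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;

// Events with the same correlationId that happened at or before the error,
// no more than windowMs earlier, returned oldest first.
function precedingEvents(
  error: ErrorOccurrence,
  events: TelemetryEvent[],
  windowMs: number = FIVE_MINUTES_MS,
): TelemetryEvent[] {
  const errorAt = Date.parse(error.timestamp);
  return events
    .filter((e) => {
      const at = Date.parse(e.timestamp);
      return (
        e.correlationId === error.correlationId &&
        at <= errorAt &&
        errorAt - at <= windowMs
      );
    })
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
}
```

The ordered result doubles as the breadcrumb trail that 1.3.2 asks for.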
+
+**Phase 1 Exit Criteria:**
+
+- [ ] Errors auto-clustered with 90%+ accuracy
+- [ ] Vector search returns semantically similar errors
+- [ ] 10,000+ historical errors embedded and clustered
+- [ ] Correlation pipeline links errors to telemetry context
+
+---
+
+## Phase 2: LLM Analysis Engine (Week 1–2)
+
+### 2.1 Prompt Engineering & Analysis Pipeline
+
+- [ ] **2.1.1** Create analysis prompts
+  - [ ] `ROOT_CAUSE_ANALYSIS` prompt template
+
+    ```
+    Given this error cluster:
+    - Error signature: {fingerprint}
+    - Sample stack traces: {samples}
+    - Common context: {deviceStats}, {appState}
+    - Preceding events: {breadcrumbSummary}
+    - Similar resolved issues: {relatedClusters}
+
+    Analyze and provide:
+    1. Likely root cause category (config, dependency, logic, resource, external)
+    2. Specific hypothesis with reasoning
+    3. Evidence confidence (high/medium/low)
+    4. Suggested investigation steps
+    5. Potential fix direction
+    ```
+
+  - [ ] `PATTERN_SUMMARY` prompt for cluster descriptions
+  - [ ] `COMPARATIVE_ANALYSIS` for error vs. baseline
+
+- [ ] **2.1.2** LLM integration
+  - [ ] Azure OpenAI GPT-4o-mini for analysis (cost-effective)
+  - [ ] GPT-4o for complex multi-factor analysis
+  - [ ] Response JSON schema enforcement
+  - [ ] Retry logic with exponential backoff
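
The retry item in 2.1.2 might look like the sketch below. The base delay, cap, and attempt count are assumed values, and `withRetry` takes the LLM call and a `sleep` function as parameters so that it does not depend on any particular client library or timer API.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Run `call`, retrying on any thrown error up to maxAttempts times in total.
async function withRetry<T>(
  call: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 4,
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be >= 1");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      // No sleep after the final failed attempt.
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

A production version would likely retry only on throttling or transient errors and add random jitter to the delay. Both are left out here to keep the schedule deterministic.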
+
+### 2.2 Insight Generation Service
+
+- [ ] **2.2.1** Create `modules/ai-diagnostics/analyzer.ts`
+  - [ ] `analyzeCluster(clusterId)` — full analysis workflow
+  - [ ] `generateInsight(errorContext)` — single error analysis
+  - [ ] `compareClusters(clusterA, clusterB)` — diff analysis
+- [ ] **2.2.2** Analysis workflow
+  - [ ] Fetch cluster data + related telemetry
+  - [ ] Build LLM context (respect token limits)
+  - [ ] Call LLM with structured prompt
+  - [ ] Parse and validate response
+  - [ ] Store insight in `diagnostic_insights`
+- [ ] **2.2.3** Confidence scoring
+  - [ ] Evidence count weighting
+  - [ ] Similar resolved issue bonus
+  - [ ] Recency decay (older patterns = lower confidence)
+  - [ ] Multi-model consensus (if available)
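
To make the scoring factors in 2.2.3 concrete, here is one way they could combine into the `confidenceScore` and `confidence` fields. Every constant below (the evidence scale of 5, the 0.1 bonus, the 30-day half-life, and the 0.7 and 0.4 label cutoffs) is an assumption chosen for illustration. Multi-model consensus is omitted.

```typescript
interface ConfidenceInputs {
  evidenceCount: number; // pieces of supporting evidence
  similarResolvedCount: number; // similar clusters already resolved
  ageDays: number; // days since the pattern was last seen
}

function confidenceScore(input: ConfidenceInputs): number {
  // Evidence weighting with diminishing returns: 0 with no evidence, tends to 1.
  const evidence = 1 - Math.exp(-input.evidenceCount / 5);
  // Flat bonus when at least one similar issue was resolved before.
  const bonus = input.similarResolvedCount > 0 ? 0.1 : 0;
  // Recency decay: the score halves every 30 days.
  const decay = Math.pow(0.5, input.ageDays / 30);
  return Math.min(1, Math.max(0, (evidence + bonus) * decay));
}

// Map the numeric score onto the three labels used by DiagnosticInsightDoc.
function confidenceLabel(score: number): "high" | "medium" | "low" {
  if (score >= 0.7) return "high";
  if (score >= 0.4) return "medium";
  return "low";
}
```

The recalibration step in 2.3.1 would then amount to adjusting these constants until the labels track actual resolution rates.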
+
+### 2.3 Continuous Learning
+
+- [ ] **2.3.1** Feedback loop
+  - [ ] Engineer feedback: "Was this insight helpful? 👍/👎"
+  - [ ] Resolution tracking (link commits to clusters)
+  - [ ] Confidence recalibration based on outcomes
+- [ ] **2.3.2** Pattern accumulation
+  - [ ] "Known issues" database (manually curated)
+  - [ ] Historical fix patterns (what solved similar issues)
+  - [ ] Regression detection (old issue reappearing)
+
+**Phase 2 Exit Criteria:**
+
+- [ ] LLM generates root cause hypotheses with evidence
+- [ ] Confidence scores align with actual resolution rates
+- [ ] Analysis completes in < 5 seconds for typical clusters
+- [ ] Feedback loop capturing engineer ratings
+
+---
+
+## Phase 3: Natural Language Query Interface (Week 2)
+
+### 3.1 Query Understanding
+
+- [ ] **3.1.1** Create `modules/ai-diagnostics/query-parser.ts`
+  - [ ] Intent classification (root cause, pattern search, comparison, trend)
+  - [ ] Entity extraction (product, time range, error type, user segment)
+  - [ ] Temporal parsing ("yesterday", "last week", "since v2.1")
+  - [ ] Constraint identification ("only iOS", "excluding beta users")
+- [ ] **3.1.2** Query patterns
+  - [ ] Root cause: _"Why did X happen?"_ → analyze cluster
+  - [ ] Pattern search: _"Show me similar crashes"_ → vector search
+  - [ ] Comparison: _"Did error rate increase after release?"_ → trend analysis
+  - [ ] User impact: _"How many users affected by Y?"_ → aggregation query
+
+### 3.2 Query Execution Engine
+
+- [ ] **3.2.1** Query → data pipeline
+  - [ ] Map entities to Cosmos queries
+  - [ ] Fetch relevant clusters, telemetry, sessions
+  - [ ] Assemble context for response generation
+- [ ] **3.2.2** Response generation
+  - [ ] Direct answers for simple queries
+  - [ ] AI-generated summaries for complex analysis
+  - [ ] Data + visualization suggestions
+  - [ ] Drill-down links for exploration
+
+### 3.3 REST API Routes
+
+- [ ] **3.3.1** Create `modules/ai-diagnostics/routes.ts`
+  - [ ] `POST /ai-diagnostics/query` — natural language question
+  - [ ] `GET /ai-diagnostics/clusters/:id/analysis` — pre-computed insight
+  - [ ] `POST /ai-diagnostics/clusters/:id/analyze` — trigger fresh analysis
+  - [ ] `GET /ai-diagnostics/suggestions` — auto-suggested investigations
+  - [ ] `POST /ai-diagnostics/feedback` — submit insight rating
+
+**Phase 3 Exit Criteria:**
+
+- [ ] Natural language queries parse correctly (90%+ intent accuracy)
+- [ ] Query → response pipeline < 3 seconds
+- [ ] Complex queries return structured answers with evidence
+- [ ] API routes tested and documented
+
+---
+
+## Phase 4: Admin Dashboard UI (Week 2–3)
+
+### 4.1 AI Insights Page
+
+- [ ] **4.1.1** Create `/ops/ai-diagnostics/page.tsx`
+  - [ ] Smart search bar (natural language input)
+  - [ ] Suggested queries based on recent errors
+  - [ ] Recent AI-generated insights list
+  - [ ] Trending clusters (auto-detected anomalies)
+- [ ] **4.1.2** Query results view
+  - [ ] AI-generated answer with confidence badge
+  - [ ] Supporting evidence cards (cluster stats, sample errors)
+  - [ ] Related debug sessions (linked traces)
+  - [ ] Timeline visualization of error pattern
+  - [ ] "Investigate further" actions
+
+### 4.2 Cluster Detail with AI Analysis
+
+- [ ] **4.2.1** Enhance error cluster detail
+  - [ ] AI-generated summary card ("This appears to be...")
+  - [ ] Root cause hypothesis with confidence
+  - [ ] Evidence breakdown (stack samples, device patterns, API failures)
+  - [ ] Suggested fixes from similar resolved issues
+  - [ ] "Request deeper analysis" button (GPT-4o)
+- [ ] **4.2.2** Interactive investigation
+  - [ ] Compare with other clusters ("Show me similar issues")
+  - [ ] Filter by context (OS version, app version, feature flags)
+  - [ ] View affected user journeys (breadcrumb trails)
+
+### 4.3 Proactive Alerts
+
+- [ ] **4.3.1** Anomaly detection
+  - [ ] Auto-detect emerging error clusters
+  - [ ] Spike in existing cluster frequency
+  - [ ] New error types after releases
+- [ ] **4.3.2** AI-generated alerts
+  - [ ] Slack/Teams notification with summary
+  - [ ] "Investigate in AI Diagnostics" deep link
+  - [ ] Auto-started debug session recommendations
+
+**Phase 4 Exit Criteria:**
+
+- [ ] Admin can ask questions and get AI-generated answers
+- [ ] Cluster detail shows AI analysis with evidence
+- [ ] Proactive alerts for emerging issues
+- [ ] Full test coverage (UI + API)
+
+---
+
+## Phase 5: Advanced Capabilities (Future)
+
+### 5.1 Multi-Modal Analysis
+
+- [ ] Analyze screenshots from debug sessions for UI issues
+- [ ] Voice transcription analysis (for voice app errors)
+- [ ] Performance trace visualization with AI annotations
+
+### 5.2 Predictive Diagnostics
+
+- [ ] Pre-crash pattern detection (warn before crash happens)
+- [ ] Resource exhaustion prediction (memory, disk, API quotas)
+- [ ] Config drift detection ("this setting combination often fails")
+
+### 5.3 Self-Healing Suggestions
+
+- [ ] Auto-generated config recommendations
+- [ ] Feature flag rollback suggestions
+- [ ] Circuit breaker threshold recommendations
+
+## Implementation Tracking
+
+| Phase | Task                       | Status | Commit |
+| ----- | -------------------------- | ------ | ------ |
+| 1.1   | Error clustering types     | ⬜     | —      |
+| 1.1   | Cosmos containers          | ⬜     | —      |
+| 1.1   | Error normalization        | ⬜     | —      |
+| 1.2   | Embedding pipeline         | ⬜     | —      |
+| 1.2   | Vector search setup        | ⬜     | —      |
+| 1.2   | Clustering algorithm       | ⬜     | —      |
+| 1.3   | Telemetry linking          | ⬜     | —      |
+| 1.3   | Error context enrichment   | ⬜     | —      |
+| 2.1   | Analysis prompts           | ⬜     | —      |
+| 2.1   | LLM integration            | ⬜     | —      |
+| 2.2   | Insight generation service | ⬜     | —      |
+| 2.2   | Analysis workflow          | ⬜     | —      |
+| 2.2   | Confidence scoring         | ⬜     | —      |
+| 2.3   | Feedback loop              | ⬜     | —      |
+| 2.3   | Pattern accumulation       | ⬜     | —      |
+| 3.1   | Query parser               | ⬜     | —      |
+| 3.1   | Query patterns             | ⬜     | —      |
+| 3.2   | Query execution            | ⬜     | —      |
+| 3.2   | Response generation        | ⬜     | —      |
+| 3.3   | REST API routes            | ⬜     | —      |
+| 4.1   | AI insights page           | ⬜     | —      |
+| 4.1   | Query results view         | ⬜     | —      |
+| 4.2   | Cluster detail             | ⬜     | —      |
+| 4.2   | Interactive investigation  | ⬜     | —      |
+| 4.3   | Proactive alerts           | ⬜     | —      |
+
+**Legend:** ⬜ Not started | 🟡 In progress | ✅ Complete | ⏸️ Deferred
+
+---
+
+## Quick Reference for Implementing Agent
+
+**📋 Full Roadmap:** `docs/roadmaps/AI_DIAGNOSTIC_ASSISTANT_ROADMAP.md`
+
+**Key Files to Modify/Create:**
+
+```
+services/platform-service/
+├── src/
+│   ├── modules/ai-diagnostics/
+│   │   ├── types.ts              # [1.1.1] Error clustering types
+│   │   ├── repository.ts         # [1.2] Data access layer
+│   │   ├── analyzer.ts           # [2.2] LLM analysis engine
+│   │   ├── query-parser.ts       # [3.1] NL query understanding
+│   │   ├── query-executor.ts     # [3.2] Query execution
+│   │   ├── routes.ts             # [3.3] REST API
+│   │   └── ai-diagnostics.test.ts # Tests
+│   ├── lib/
+│   │   ├── cosmos-init.ts        # [1.1.2] Add containers
+│   │   ├── embedding-client.ts   # [1.2.1] Azure OpenAI embeddings
+│   │   └── pii-redaction.ts      # Reuse existing
+│   └── server.ts                 # [3.3] Register routes
+dashboards/admin-web/
+├── src/
+│   ├── app/(dashboard)/
+│   │   ├── ai-diagnostics/
+│   │   │   ├── page.tsx          # [4.1] Main insights page
+│   │   │   └── [id]/
+│   │   │       └── page.tsx      # [4.2] Cluster detail
+│   ├── lib/
+│   │   └── ai-diagnostics-client.ts # API client
+│   └── components/
+│       └── ai-diagnostics/       # Reusable components
+```
+
+**Commit Message Format:**
+
+```
+feat(ai-diagnostics): <description> [<task.code>]
+```
+
+**Example:**
+
+```bash
+git add services/platform-service/src/modules/ai-diagnostics/ services/platform-service/src/lib/cosmos-init.ts
+git commit -m "feat(ai-diagnostics): add error clustering types and cosmos containers [1.1.1-1.1.2]"
+```
+
+**Testing Requirements:**
+
+- Unit tests: 20+ Vitest tests for clustering, embeddings, LLM responses
+- Integration tests: End-to-end query → analysis pipeline
+
+**Dependencies:**
+
+- Telemetry module (error events)
+- Azure OpenAI (embeddings + GPT-4o)
+- Existing diagnostics module (optional linking)
+
+---
+
+### ErrorClusterDoc
+
+```typescript
+interface ErrorClusterDoc {
+  id: string; // ec_<uuid>
+  productId: string; // partition key
+  fingerprintHash: string; // SHA-256 of normalized error
+
+  // Cluster metadata
+  firstSeenAt: string; // ISO 8601
+  lastSeenAt: string;
+  occurrenceCount: number; // Total occurrences
+  uniqueUsers: number; // Affected user count
+
+  // Error signature
+  errorType: string; // Exception class/name
+  messageTemplate: string; // Normalized message with placeholders
+  stackSignature: string; // Normalized stack frames
+
+  // Vector embedding for semantic search
+  embedding: number[]; // 1536-dim from text-embedding-3-small
+  embeddingVersion: string; // Model version for re-embedding
+
+  // Context patterns (auto-extracted)
+  commonContext: {
+    osVersions: Array<{ version: string; count: number }>;
+    appVersions: Array<{ version: string; count: number }>;
+    deviceModels: Array<{ model: string; count: number }>;
+    screenContexts: Array<{ screen: string; count: number }>;
+  };
+
+  // Related data
+  relatedClusterIds: string[]; // Similar clusters (vector similarity)
+  mergedIntoClusterId?: string; // If deduplicated
+
+  // Resolution tracking
+  status: 'active' | 'investigating' | 'resolved' | 'ignored';
+  resolvedAt?: string;
+  resolutionCommit?: string; // Link to fix
+
+  // Timestamps
+  createdAt: string;
+  updatedAt: string;
+  ttl: number; // 90 days, in seconds (7776000)
+}
+```
+
+### DiagnosticInsightDoc
+
+```typescript
+interface DiagnosticInsightDoc {
+  id: string; // di_<uuid>
+  clusterId: string; // partition key (with productId)
+  productId: string;
+
+  // AI-generated analysis
+  analysisType: 'root_cause' | 'pattern' | 'comparison' | 'trend';
+  generatedAt: string;
+
+  // LLM output
+  rootCauseCategory: 'config' | 'dependency' | 'logic' | 'resource' | 'external' | 'unknown';
+  hypothesis: string; // Natural language explanation
+  reasoning: string; // Why LLM thinks this
+  confidence: 'high' | 'medium' | 'low';
+  confidenceScore: number; // 0.0–1.0
+
+  // Evidence
+  evidence: Array<{
+    type:
+      | 'stack_trace'
+      | 'telemetry_pattern'
+      | 'device_correlation'
+      | 'api_failure'
+      | 'similar_issue';
+    description: string;
+    strength: 'strong' | 'moderate' | 'weak';
+    data: Record<string, unknown>;
+  }>;
+
+  // Suggested actions
+  suggestedInvestigation: string[];
+  potentialFixDirection?: string;
+  similarResolvedIssues?: Array<{
+    clusterId: string;
+    resolution: string;
+    confidence: number;
+  }>;
+
+  // Feedback
+  feedbackStats: {
+    helpful: number;
+    notHelpful: number;
+    engineerNotes: string[];
+  };
+
+  // LLM metadata
+  modelUsed: string; // gpt-4o, gpt-4o-mini
+  promptTokens: number;
+  completionTokens: number;
+
+  createdAt: string;
+  ttl: number; // 90 days, in seconds (7776000)
+}
+```
+
+### NaturalLanguageQueryDoc
+
+```typescript
+interface NaturalLanguageQueryDoc {
+  id: string; // nq_<uuid>
+  userId: string; // Admin who asked
+  productId?: string; // Optional filter
+
+  // Query
+  rawQuery: string; // "Why did iOS keyboard crash yesterday?"
+  parsedIntent: 'root_cause' | 'pattern_search' | 'comparison' | 'trend' | 'impact';
+  extractedEntities: {
+    products?: string[];
+    timeRange?: { start: string; end: string };
+    errorTypes?: string[];
+    platforms?: string[];
+    userSegments?: string[];
+  };
+
+  // Execution
+  executedQuery: string; // Translated Cosmos query
+  dataSources: string[]; // Clusters, telemetry, sessions accessed
+  executionTimeMs: number;
+
+  // Response
+  aiResponse: string; // Generated answer
+  confidence: number; // Overall confidence
+  supportingData: Array<{
+    type: 'cluster' | 'telemetry' | 'session';
+    id: string;
+    relevanceScore: number;
+  }>;
+
+  // Feedback
+  userRating?: 'helpful' | 'not_helpful';
+  userComment?: string;
+
+  createdAt: string;
+  ttl: number; // 30 days, in seconds (2592000)
+}
+```
+
+---
+
+## Appendix B: API Reference
+
+| Method | Endpoint                                | Auth  | Description                             |
+| ------ | --------------------------------------- | ----- | --------------------------------------- |
+| POST   | `/ai-diagnostics/query`                 | Admin | Natural language diagnostic query       |
+| GET    | `/ai-diagnostics/clusters`              | Admin | List error clusters (with AI summaries) |
+| GET    | `/ai-diagnostics/clusters/:id`          | Admin | Cluster detail with AI analysis         |
+| POST   | `/ai-diagnostics/clusters/:id/analyze`  | Admin | Trigger fresh LLM analysis              |
+| GET    | `/ai-diagnostics/clusters/:id/analysis` | Admin | Get pre-computed insight                |
+| GET    | `/ai-diagnostics/suggestions`           | Admin | AI-suggested investigations             |
+| POST   | `/ai-diagnostics/feedback`              | Admin | Rate insight helpfulness                |
+| POST   | `/ai-diagnostics/search`                | Admin | Semantic search across errors           |
+
+---
+
+## Appendix C: Integration Points
+
+### With Telemetry Module
+
+- Error events auto-create/update clusters
+- Telemetry context enriches error analysis
+- Correlation IDs link errors to user journeys
+
+### With Diagnostics Module
+
+- Debug sessions linked to error clusters
+- Screenshots from sessions aid visual analysis
+- Network traces provide API failure context
+
+### With Event Bus
+
+| Event                           | Action                                                    |
+| ------------------------------- | --------------------------------------------------------- |
+| `telemetry.error.ingested`      | Update/create cluster, trigger re-analysis if new pattern |
+| `diagnostics.session.completed` | Link session to related clusters, analyze captured logs   |
+| `diagnostics.ingest.fatal`      | High-priority cluster analysis, alert if novel pattern    |
+
+---
+
+## Appendix D: Cost Estimation
+
+| Component                | Monthly Cost (est.)             |
+| ------------------------ | ------------------------------- |
+| Azure OpenAI embeddings  | $50–100 (10K errors/day)        |
+| GPT-4o-mini analysis     | $100–200 (1K analyses/day)      |
+| GPT-4o deep analysis     | $50–100 (100 deep analyses/day) |
+| Cosmos DB vector storage | $20–50                          |
+| **Total**                | **$220–450/month**              |
+
+Optimization:
+
+- Cache frequent cluster analyses (24hr TTL)
+- Use GPT-4o-mini for 90% of queries
+- Batch embedding jobs during off-peak
+
+---
+
+## Current Status
+
+- [ ] **Design complete** — Target: 2026-03-10
+- [ ] **Phase 1: Data Pipeline** — Not started
+- [ ] **Phase 2: LLM Engine** — Not started
+- [ ] **Phase 3: Query Interface** — Not started
+- [ ] **Phase 4: Admin UI** — Not started
+- [ ] **Phase 5: Advanced Capabilities** — Future
+
+**Estimated Timeline:** 2–3 weeks (Phases 1–4)
+
+**Dependencies:**
+
+- Telemetry module (must be collecting errors)
+- Diagnostics module (optional, for rich context)
+- Azure OpenAI deployment (embedding + GPT-4o access)
+
+---
+
+_Last Updated: 2026-03-03_
--- a/docs/roadmaps/INTELLIGENT_AB_TESTING_ROADMAP.md
+++ b/docs/roadmaps/INTELLIGENT_AB_TESTING_ROADMAP.md
@@ -0,0 +1,719 @@
+# Intelligent A/B Testing — Implementation Roadmap
+
+> **Module:** `platform-service/src/modules/ab-testing/`  
+> **Admin UI:** `/ops/experiments/`  
+> **Target:** AI-powered experiment management with auto-allocation, early stopping, and hypothesis generation  
+> **Estimated Effort:** 2.5–3 weeks  
+> **Status:** 🟡 Planning
+
+---
+
+## Executive Summary
+
+This roadmap delivers an **intelligent A/B testing platform** that goes beyond traditional feature flags. Instead of manual percentage rollouts, the system uses **Thompson sampling** for automatic traffic allocation, **Bayesian early stopping** when a variant clearly wins or loses, and **LLM-powered hypothesis generation** from feature flag usage patterns.
+
+### Key Differentiators vs. Static Feature Flags
+
+| Capability         | Static Flags (Current) | Intelligent A/B Testing                   |
+| ------------------ | ---------------------- | ----------------------------------------- |
+| Traffic Allocation | Manual percentage      | **Multi-armed bandit optimization**       |
+| Stopping Decision  | Manual monitoring      | **Auto-stop at statistical significance** |
+| Winner Selection   | Human judgment         | **Bayesian probability of superiority**   |
+| Test Duration      | Fixed (often wrong)    | **Dynamic based on effect size**          |
+| Hypothesis         | Human-written          | **AI-generated from usage patterns**      |
+| Sample Size        | Guesswork              | **Power analysis + sequential testing**   |
+
+---
+
+## Phase 1: Core Experiment Engine (Week 1)
+
+### 1.1 Data Model & Schemas
+
+- [ ] **1.1.1** Create `modules/ab-testing/types.ts`
+  - [ ] `ExperimentDoc` — experiment definition and config
+  - [ ] `VariantDoc` — variant metadata + metrics
+  - [ ] `AssignmentDoc` — user → variant assignments
+  - [ ] `MetricDoc` — event types being tracked
+  - [ ] `ExperimentResult` — statistical analysis results
+  - [ ] Zod schemas for all inputs
+- [ ] **1.1.2** Add Cosmos containers to `cosmos-init.ts`
+  - [ ] `experiments` (pk: `/productId`, TTL: 2 years for completed)
+  - [ ] `experiment_variants` (pk: `/experimentId`)
+  - [ ] `experiment_assignments` (pk: `/userId`, query by experiment)
+  - [ ] `experiment_events` (pk: `/experimentId` + `/timestamp` for time-series)
+  - [ ] `experiment_metrics` (pk: `/experimentId`, computed aggregates)
+
+### 1.2 Assignment & Bucketing
+
+- [ ] **1.2.1** Create deterministic bucketing
+  - [ ] Consistent hashing (userId + experimentId → variant)
+  - [ ] FNV-1a hash algorithm (same as feature flags)
+  - [ ] Sticky assignments (user always sees same variant)
+  - [ ] Override capability (force specific variant for QA)
+- [ ] **1.2.2** Assignment strategies
+  - [ ] `random` — Simple randomization (control vs static)
+  - [ ] `thompson` — Thompson sampling (multi-armed bandit)
+  - [ ] `epsilon_greedy` — Epsilon-greedy exploration
+  - [ ] `ucb` — Upper Confidence Bound algorithm
+- [ ] **1.2.3** Audience targeting
+  - [ ] User property filters (platform, version, region, subscription tier)
+  - [ ] Percentage rollout within target segment
+  - [ ] Exclusion lists (beta users, internal accounts)
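
A sketch of the deterministic bucketing described in 1.2.1, using 32-bit FNV-1a. The key format (`experimentId:userId`), the 10,000-bucket resolution, and the `assignVariant` signature are assumptions. The hash loop treats each character as a single byte, so it assumes ASCII identifiers.

```typescript
// 32-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i) & 0xff; // assumption: ASCII identifiers
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, kept unsigned
  }
  return hash >>> 0;
}

interface VariantWeight {
  key: string;
  weight: number; // fraction of traffic; weights are expected to sum to 1
}

// Same user and experiment always map to the same variant (sticky).
// `overrides` forces a variant for specific users, e.g. for QA.
function assignVariant(
  userId: string,
  experimentId: string,
  variants: VariantWeight[],
  overrides: Record<string, string> = {},
): string {
  if (variants.length === 0) throw new Error("no variants");
  const forced = overrides[userId];
  if (forced !== undefined) return forced;
  const bucket = (fnv1a32(`${experimentId}:${userId}`) % 10000) / 10000;
  let cumulative = 0;
  for (const v of variants) {
    cumulative += v.weight;
    if (bucket < cumulative) return v.key;
  }
  return variants[variants.length - 1].key; // guard against rounding
}
```

Including the experiment ID in the hashed key keeps a user's bucket in one experiment independent of their bucket in another.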
+
+### 1.3 Event Tracking Pipeline
+
+- [ ] **1.3.1** Metric definitions
+  - [ ] `conversion` — Binary (did/didn't convert)
+  - [ ] `count` — Integer events (sessions, messages)
+  - [ ] `duration` — Time-based (session length, task time)
+  - [ ] `revenue` — Monetary (purchase amount, LTV)
+  - [ ] `custom` — Arbitrary numeric values
+- [ ] **1.3.2** Event ingestion
+  - [ ] `POST /ab-testing/events` batch endpoint
+  - [ ] Client SDK: `track(event, value, metadata)`
+  - [ ] Automatic attribution (which variant caused this event)
+  - [ ] Deduplication (eventId + userId uniqueness)
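
The attribution and deduplication rules in 1.3.2 could be applied per batch roughly as follows. The record shapes, the composite key format, and the in-memory `seen` map are hypothetical stand-ins for whatever the real store provides.

```typescript
interface TrackedEvent {
  eventId: string;
  userId: string;
  experimentId: string;
  metric: string;
  value: number;
}

interface AttributedEvent extends TrackedEvent {
  variant: string;
}

// assignments: "experimentId|userId" -> variant key
// seen: dedup keys accepted so far; mutated so later batches see them too
function ingestBatch(
  events: TrackedEvent[],
  assignments: Record<string, string>,
  seen: Record<string, boolean> = {},
): AttributedEvent[] {
  const accepted: AttributedEvent[] = [];
  for (const e of events) {
    const dedupKey = `${e.eventId}|${e.userId}`;
    if (seen[dedupKey]) continue; // duplicate delivery
    const variant = assignments[`${e.experimentId}|${e.userId}`];
    if (variant === undefined) continue; // user is not in this experiment
    seen[dedupKey] = true;
    accepted.push({ ...e, variant });
  }
  return accepted;
}
```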
+
+**Phase 1 Exit Criteria:**
+
+- [ ] Experiments created with multiple variants
+- [ ] Users consistently assigned to variants
+- [ ] Events tracked and attributed correctly
+- [ ] 20+ tests for assignment and ingestion
+
+---
+
+## Phase 2: Statistical Analysis Engine (Week 1–2)
+
+### 2.1 Bayesian Inference
+
+- [ ] **2.1.1** Create `modules/ab-testing/statistics.ts`
+  - [ ] `BetaDistribution` for conversion rates
+  - [ ] `GammaDistribution` for count/duration metrics
+  - [ ] `NormalDistribution` for continuous metrics
+  - [ ] Monte Carlo simulation (10,000 samples)
+- [ ] **2.1.2** Probability calculations
+  - [ ] `probabilityVariantBeatsControl(variant, control)`
+  - [ ] `expectedLossIfChosen(variant)`
+  - [ ] `probabilityBeatAllVariants(variant)`
+- [ ] **2.1.3** Credible intervals
+  - [ ] 95% credible interval for each variant's true metric
+  - [ ] Visualization-ready (lower, mean, upper bounds)
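
A rough sketch of `probabilityVariantBeatsControl` from 2.1.2 for a conversion metric, assuming a uniform Beta(1, 1) prior and 10,000 Monte Carlo draws. The seeded PRNG (mulberry32), the Box-Muller normal sampler, and the Marsaglia-Tsang gamma sampler are implementation choices made here for a dependency-free example. The roadmap does not prescribe them.

```typescript
// Small seeded PRNG (mulberry32) so that results are reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal draw via Box-Muller.
function sampleNormal(rng: () => number): number {
  const u1 = 1 - rng(); // in (0, 1], avoids log(0)
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * rng());
}

// Gamma(shape, 1) draw via Marsaglia-Tsang; this sketch supports shape >= 1.
function sampleGamma(shape: number, rng: () => number): number {
  if (shape < 1) throw new Error("shape must be >= 1 in this sketch");
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    const x = sampleNormal(rng);
    const v = Math.pow(1 + c * x, 3);
    if (v <= 0) continue;
    const u = 1 - rng();
    if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v;
  }
}

// Beta(alpha, beta) draw as X / (X + Y) with X, Y gamma-distributed.
function sampleBeta(alpha: number, beta: number, rng: () => number): number {
  const x = sampleGamma(alpha, rng);
  const y = sampleGamma(beta, rng);
  return x / (x + y);
}

interface ArmStats {
  conversions: number;
  trials: number;
}

// Fraction of joint posterior draws in which the variant's rate is higher.
function probabilityVariantBeatsControl(
  variant: ArmStats,
  control: ArmStats,
  samples = 10000,
  seed = 42,
): number {
  const rng = mulberry32(seed);
  let wins = 0;
  for (let i = 0; i < samples; i++) {
    const pv = sampleBeta(1 + variant.conversions, 1 + variant.trials - variant.conversions, rng);
    const pc = sampleBeta(1 + control.conversions, 1 + control.trials - control.conversions, rng);
    if (pv > pc) wins++;
  }
  return wins / samples;
}
```

With the Beta(1, 1) prior both shape parameters are always at least 1, which is why the restricted gamma sampler is sufficient here.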
+
+### 2.2 Early Stopping Rules
+
+- [ ] **2.2.1** Stopping criteria
+  - [ ] **Winner found:** Variant has > 95% probability of beating control
+  - [ ] **Loser clear:** Control has > 95% probability of beating variant
+  - [ ] **Practical significance:** Stop for futility when the effect is unlikely to reach the minimum detectable effect
+  - [ ] **Time bound:** Max duration reached (safety limit)
+- [ ] **2.2.2** Auto-promotion
+  - [ ] Auto-rollout winner to 100% when threshold hit
+  - [ ] Notify admins via Slack/email
+  - [ ] Create audit log entry
+- [ ] **2.2.3** Guardrails
+  - [ ] Minimum sample size before early stopping (100 users/variant)
+  - [ ] Business hours only for auto-actions
+  - [ ] Require approval for revenue-impacting experiments
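
The stopping criteria in 2.2.1 and the sample-size guardrail in 2.2.3 can be expressed as one pure decision function. The sketch below is an assumption about how they might be ordered. It leaves out the futility check, the business-hours rule, and the approval step, and `evaluateStopping` and its input shape are hypothetical.

```typescript
type StopDecision = "winner" | "loser" | "time_bound" | "continue";

interface StopInputs {
  probVariantBeatsControl: number; // from the Bayesian analysis, 0..1
  minUsersPerVariant: number; // size of the smallest arm so far
  elapsedDays: number;
  maxDays: number; // safety limit on experiment duration
}

function evaluateStopping(
  s: StopInputs,
  threshold = 0.95,
  minSample = 100,
): StopDecision {
  // The time bound applies regardless of the statistics.
  if (s.elapsedDays >= s.maxDays) return "time_bound";
  // Guardrail: no early stopping before every arm has enough users.
  if (s.minUsersPerVariant < minSample) return "continue";
  if (s.probVariantBeatsControl > threshold) return "winner";
  // P(control beats variant) is the complement for continuous posteriors.
  if (1 - s.probVariantBeatsControl > threshold) return "loser";
  return "continue";
}
```

Keeping the decision separate from the auto-promotion side effects makes it straightforward to unit test each rule in isolation.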
+
+### 2.3 Thompson Sampling
+
+- [ ] **2.3.1** Multi-armed bandit implementation
+  - [ ] Sample from posterior distributions
+  - [ ] Assign user to variant with highest sample
+  - [ ] Re-balance traffic every hour based on performance
+- [ ] **2.3.2** Exploration vs exploitation
+  - [ ] Exploration rate decays over time
+  - [ ] High uncertainty = more exploration
+  - [ ] Clear winner = more traffic to winner
+- [ ] **2.3.3** Regret minimization
+  - [ ] Track cumulative regret vs optimal variant
+  - [ ] Regret bounds reporting
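
The bandit loop in 2.3.1 reduces to "draw once from each arm's posterior, pick the largest draw". The sketch below takes the Beta sampler as a parameter so that it stays independent of any statistics library. `thompsonPick`, `allocationShares`, and the Beta(1, 1) prior are assumptions made for illustration.

```typescript
interface Arm {
  key: string;
  conversions: number;
  trials: number;
}

// Any function that returns one draw from Beta(alpha, beta).
type BetaSampler = (alpha: number, beta: number) => number;

// One Thompson step: sample every posterior and keep the arm with the top draw.
function thompsonPick(arms: Arm[], sampleBeta: BetaSampler): string {
  if (arms.length === 0) throw new Error("no arms");
  let bestKey = arms[0].key;
  let bestDraw = -Infinity;
  for (const arm of arms) {
    const draw = sampleBeta(1 + arm.conversions, 1 + arm.trials - arm.conversions);
    if (draw > bestDraw) {
      bestDraw = draw;
      bestKey = arm.key;
    }
  }
  return bestKey;
}

// Estimate traffic shares for the next period by repeating the pick many
// times; an hourly re-balance could write these shares back as weights.
function allocationShares(
  arms: Arm[],
  sampleBeta: BetaSampler,
  draws = 1000,
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const arm of arms) counts[arm.key] = 0;
  for (let i = 0; i < draws; i++) counts[thompsonPick(arms, sampleBeta)] += 1;
  const shares: Record<string, number> = {};
  for (const arm of arms) shares[arm.key] = counts[arm.key] / draws;
  return shares;
}
```

Exploration comes from the sampling itself: an arm with few trials has a wide posterior and will sometimes produce the top draw, while a well-measured weak arm rarely will.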
+
+**Phase 2 Exit Criteria:**
+
+- [ ] Bayesian probabilities calculated correctly
+- [ ] Early stopping triggers at appropriate thresholds
+- [ ] Thompson sampling re-allocates traffic dynamically
+- [ ] Statistical tests validate correctness
+
+---
+
+## Phase 3: AI-Powered Hypothesis Generation (Week 2)
+
+### 3.1 Pattern Detection
+
+- [ ] **3.1.1** Usage pattern analysis
+  - [ ] Analyze feature flag usage telemetry
+  - [ ] Segment analysis (iOS vs Android, free vs pro)
+  - [ ] Temporal patterns (day of week, time of day)
+  - [ ] User behavior sequences (funnel analysis)
+- [ ] **3.1.2** Anomaly detection
+  - [ ] Unexpected drop in feature adoption
+  - [ ] Performance regression signals
+  - [ ] User segment showing different behavior
+- [ ] **3.1.3** Opportunity identification
+  - [ ] Underperforming features (low adoption)
+  - [ ] High-dropoff flows
+  - [ ] Competitor feature gaps
+
+### 3.2 Hypothesis Generation
+
+- [ ] **3.2.1** LLM hypothesis prompts
+
+  ```
+  Given this feature usage data:
+  - Feature: {featureName}
+  - Current adoption: {adoptionRate}% (baseline: {baseline}%)
+  - Segment performance: {segmentData}
+  - User feedback: {feedbackSamples}
+  - Competitor analysis: {competitorFeatures}
+
+  Generate experiment hypotheses:
+  1. Primary hypothesis: "Changing X will improve Y because..."
+  2. Secondary hypotheses (2-3 alternatives)
+  3. Expected effect size (conservative estimate)
+  4. Success metric recommendation
+  5. Risk assessment
+  ```
+
+- [ ] **3.2.2** Hypothesis ranking
+  - [ ] Expected impact scoring
+  - [ ] Implementation difficulty estimate
+  - [ ] Statistical power prediction
+  - [ ] Risk-adjusted expected value
+- [ ] **3.2.3** Suggested experiment design
+  - [ ] Variant count recommendation
+  - [ ] Traffic allocation suggestion
+  - [ ] Duration estimate
+  - [ ] Required sample size calculation
+
+### 3.3 Auto-Experiment Suggestions
+
+- [ ] **3.3.1** Weekly AI reports
+  - [ ] Top 5 experiment opportunities
+  - [ ] Hypotheses with supporting evidence
+  - [ ] Prioritized by expected impact
+- [ ] **3.3.2** One-click experiment creation
+  - [ ] Pre-fill experiment from hypothesis
+  - [ ] Suggested variants with descriptions
+  - [ ] Pre-configured metrics
+
+**Phase 3 Exit Criteria:**
+
+- [ ] AI generates meaningful hypotheses from usage data
+- [ ] Hypothesis quality rated by product team (80%+ useful)
+- [ ] Auto-suggested experiments created in 1 click
+- [ ] Weekly reports generated automatically
+
+---
+
+## Phase 4: Admin Dashboard UI (Week 2–3)
+
+### 4.1 Experiments List Page
+
+- [ ] **4.1.1** Create `/ops/experiments/page.tsx`
+  - [ ] Experiment cards (status, duration, sample size)
+  - [ ] Quick filters (running, completed, draft)
+  - [ ] AI-generated hypothesis badge
+  - [ ] Health indicators (traffic balance, event flow)
+- [ ] **4.1.2** Experiment creation wizard
+  - [ ] Step 1: Define hypothesis (AI suggestions available)
+  - [ ] Step 2: Create variants (name, description, config)
+  - [ ] Step 3: Select metrics (primary + secondary)
+  - [ ] Step 4: Audience targeting
+  - [ ] Step 5: Traffic allocation (manual or Thompson)
+  - [ ] Step 6: Review and launch
+
+### 4.2 Live Experiment Dashboard
+
+- [ ] **4.2.1** Create `/ops/experiments/[id]/page.tsx`
+  - [ ] Real-time metrics comparison
+  - [ ] Variant performance table (conversions, counts, durations)
+  - [ ] Bayesian probability visualization
+  - [ ] Credible interval charts
+- [ ] **4.2.2** Statistical summary card
+  - [ ] Probability of beating control (per variant)
+  - [ ] Expected lift if implemented
+  - [ ] Sample size progress bar
+  - [ ] Days to significance estimate
+- [ ] **4.2.3** Action buttons
+  - [ ] Adjust traffic allocation
+  - [ ] Pause/resume experiment
+  - [ ] Stop and declare winner
+  - [ ] Rollout winner to 100%
+  - [ ] Archive experiment
+
+### 4.3 Results & Reporting
+
+- [ ] **4.3.1** Results page
+  - [ ] Final statistical summary
+  - [ ] Variant comparison visualization
+  - [ ] Segment breakdown (iOS vs Android, etc.)
+  - [ ] Confidence intervals over time
+- [ ] **4.3.2** AI insights panel
+  - [ ] Why this result occurred (LLM summary)
+  - [ ] Unexpected findings
+  - [ ] Follow-up experiment suggestions
+- [ ] **4.3.3** Export capabilities
+  - [ ] CSV export of raw data
+  - [ ] PDF report generation
+  - [ ] API endpoint for data warehouse sync
+
+**Phase 4 Exit Criteria:**
+
+- [ ] Full experiment lifecycle manageable in UI
+- [ ] Real-time stats visible and accurate
+- [ ] Bayesian visualizations clear to non-statisticians
+- [ ] Export and reporting functional
+
+---
+
+## Phase 5: Advanced Capabilities (Future)
+
+### 5.1 Multi-Variate Testing
+
+- [ ] Test multiple variables simultaneously
+- [ ] Full factorial and fractional factorial designs
+- [ ] Interaction effect detection
+
+### 5.2 Sequential Experimentation
+
+- [ ] Multi-phase experiments (qualification → main → validation)
+- [ ] Holdout groups for long-term validation
+- [ ] Global holdout (never-exposed users)
+
+### 5.3 Personalization Layer
+
+- [ ] Contextual bandits (different variants for different users)
+- [ ] ML model for variant selection
+- [ ] Automatic personalization optimization
+
+### 5.4 Experiment Coordination
+
+- [ ] Mutually exclusive experiments
+- [ ] Experiment priority rules
+- [ ] Layered experimentation (orthogonal tests)
+
+---
+
+## Appendix A: Data Models
+
+### ExperimentDoc
+
+```typescript
+interface ExperimentDoc {
+  id: string; // exp_<uuid>
+  productId: string; // partition key
+
+  // Experiment definition
+  name: string;
+  description: string;
+  hypothesis: string;
+  aiGeneratedHypothesis?: boolean; // Flag for AI-suggested
+
+  // Status lifecycle: draft → running → paused | stopped | completed
+  status: 'draft' | 'running' | 'paused' | 'stopped' | 'completed';
+
+  // Variants
+  controlVariantId: string; // Baseline variant
+  variantIds: string[]; // All variant IDs
+
+  // Configuration
+  allocationStrategy: 'random' | 'thompson' | 'epsilon_greedy' | 'ucb';
+  targetPercent: number; // % of eligible traffic
+
+  // Audience targeting
+  targeting: {
+    platforms?: string[]; // ios, android, web
+    appVersions?: { min: string; max?: string };
+    regions?: string[];
+    userSegments?: string[]; // pro, free, enterprise
+    userProperties?: Record<string, string | number | boolean>;
+  };
+
+  // Metrics
+  primaryMetric: {
+    name: string;
+    type: 'conversion' | 'count' | 'duration' | 'revenue' | 'custom';
+    eventName: string; // Telemetry event to track
+    aggregation: 'sum' | 'mean' | 'count' | 'unique';
+    direction: 'increase' | 'decrease'; // Is higher better?
+    minimumDetectableEffect: number; // % change we want to detect
+  };
+  secondaryMetrics: Array<{
+    name: string;
+    type: 'conversion' | 'count' | 'duration' | 'revenue' | 'custom';
+    eventName: string;
+  }>;
+
+  // Guardrails
+  guardrails: {
+    minSampleSizePerVariant: number; // Default: 100
+    maxDurationDays: number; // Safety limit, default: 30
+    autoStopEnabled: boolean;
+    winnerThreshold: number; // % probability to auto-stop, default: 95
+    requireApprovalFor: 'none' | 'revenue' | 'all';
+  };
+
+  // Scheduling
+  startAt?: string; // Scheduled start (ISO 8601)
+  endAt?: string; // Scheduled end or actual stop
+
+  // Stats (denormalized for fast reads)
+  totalParticipants: number;
+  totalEvents: number;
+
+  // Timestamps
+  createdAt: string;
+  updatedAt: string;
+  startedAt?: string;
+  completedAt?: string;
+  ttl: number; // 2 years for completed
+}
+```
+
+### VariantDoc
+
+```typescript
+interface VariantDoc {
+  id: string; // var_<uuid>
+  experimentId: string; // partition key
+
+  // Variant definition
+  name: string; // "Control", "New Button Color", etc.
+  description?: string;
+  isControl: boolean;
+
+  // Feature flag configuration
+  flagConfig: Record<string, unknown>; // Arbitrary config payload
+
+  // Traffic allocation (dynamic for bandit strategies)
+  currentAllocationPercent: number; // 0–100%
+
+  // Statistics (real-time computed)
+  stats: {
+    participants: number;
+    events: number;
+
+    // Primary metric
+    primaryMetricValue: number; // Mean or conversion rate
+    primaryMetricStdDev?: number;
+
+    // For conversion metrics
+    conversions?: number;
+    conversionRate?: number; // 0–1
+
+    // Bayesian posterior parameters
+    betaAlpha?: number; // For Beta distribution
+    betaBeta?: number;
+
+    gammaShape?: number; // For Gamma distribution
+    gammaScale?: number;
+  };
+
+  // Bayesian results
+  bayesianResults?: {
+    probabilityBeatsControl: number; // 0–1
+    probabilityBeatsAll: number; // 0–1
+    expectedLiftPercent: number; // Relative to control
+    expectedLoss: number; // Risk of choosing this variant
+    credibleInterval: {
+      lower: number;
+      mean: number;
+      upper: number;
+    };
+  };
+
+  createdAt: string;
+  updatedAt: string;
+}
+```
+
+### ExperimentAssignmentDoc
+
+```typescript
+interface ExperimentAssignmentDoc {
+  id: string; // ea_<uuid>
+  userId: string; // partition key (for user lookups)
+
+  experimentId: string;
+  variantId: string;
+
+  // Assignment metadata
+  assignedAt: string; // First assignment
+  firstExposedAt?: string; // First actual exposure (feature use)
+
+  // Context at assignment
+  assignmentContext: {
+    platform: string;
+    appVersion: string;
+    osVersion: string;
+    deviceModel?: string;
+    region?: string;
+  };
+
+  // Events attributed to this assignment
+  eventCount: number;
+  lastEventAt?: string;
+
+  // TTL: Remove after experiment completes + analysis period
+  ttl: number; // experimentEnd + 90 days
+}
+```
+
+### ExperimentEventDoc
+
+```typescript
+interface ExperimentEventDoc {
+  id: string; // ee_<uuid>
+  experimentId: string; // partition key
+  timestamp: string; // Sort key for time-series queries
+
+  // Attribution
+  userId: string;
+  variantId: string;
+  assignmentId: string;
+
+  // Event details
+  metricName: string;
+  metricType: 'conversion' | 'count' | 'duration' | 'revenue' | 'custom';
+  value: number; // Numeric value
+
+  // Conversion tracking (for binary metrics)
+  converted?: boolean; // Set only when metricType is 'conversion'
+
+  // Context
+  eventMetadata?: Record<string, unknown>;
+
+  // Denormalized for filtering
+  platform: string;
+  appVersion: string;
+
+  // TTL: Shorter for raw events
+  ttl: number; // 90 days
+}
+```
+
+---
+
+## Implementation Tracking
+
+| Phase | Task                          | Status | Commit |
+| ----- | ----------------------------- | ------ | ------ |
+| 1.1   | Experiment types & schemas    | ⬜     | —      |
+| 1.1   | Cosmos containers             | ⬜     | —      |
+| 1.2   | Deterministic bucketing       | ⬜     | —      |
+| 1.2   | Assignment strategies         | ⬜     | —      |
+| 1.2   | Audience targeting            | ⬜     | —      |
+| 1.3   | Metric definitions            | ⬜     | —      |
+| 1.3   | Event ingestion               | ⬜     | —      |
+| 2.1   | Bayesian inference engine     | ⬜     | —      |
+| 2.1   | Probability calculations      | ⬜     | —      |
+| 2.1   | Credible intervals            | ⬜     | —      |
+| 2.2   | Early stopping rules          | ⬜     | —      |
+| 2.2   | Auto-promotion                | ⬜     | —      |
+| 2.2   | Guardrails                    | ⬜     | —      |
+| 2.3   | Thompson sampling             | ⬜     | —      |
+| 2.3   | Exploration vs exploitation   | ⬜     | —      |
+| 2.3   | Regret minimization           | ⬜     | —      |
+| 3.1   | Pattern detection             | ⬜     | —      |
+| 3.1   | Anomaly detection             | ⬜     | —      |
+| 3.2   | Hypothesis generation prompts | ⬜     | —      |
+| 3.2   | Hypothesis ranking            | ⬜     | —      |
+| 3.3   | Auto-experiment suggestions   | ⬜     | —      |
+| 4.1   | Experiments list page         | ⬜     | —      |
+| 4.1   | Creation wizard               | ⬜     | —      |
+| 4.2   | Live dashboard                | ⬜     | —      |
+| 4.2   | Statistical summary           | ⬜     | —      |
+| 4.3   | Results & reporting           | ⬜     | —      |
+| 4.3   | AI insights panel             | ⬜     | —      |
+
+**Legend:** ⬜ Not started | 🟡 In progress | ✅ Complete | ⏸️ Deferred
+
+---
+
+## Quick Reference for Implementing Agent
+
+**📋 Full Roadmap:** `docs/roadmaps/INTELLIGENT_AB_TESTING_ROADMAP.md`
+
+**Key Files to Modify/Create:**
+
+```
+services/platform-service/
+├── src/
+│   ├── modules/ab-testing/
+│   │   ├── types.ts              # [1.1] Experiment, Variant, Assignment types
+│   │   ├── repository.ts         # [1.2] Data access layer
+│   │   ├── bucketing.ts          # [1.2] FNV-1a hash, sticky assignments
+│   │   ├── statistics.ts         # [2.1] Bayesian inference, Beta/Normal distributions
+│   │   ├── allocation.ts         # [2.3] Thompson sampling, bandit strategies
+│   │   ├── hypothesis-generator.ts # [3.2] LLM pattern analysis
+│   │   ├── routes.ts             # [4] REST API
+│   │   └── ab-testing.test.ts    # Tests
+│   ├── lib/
+│   │   └── cosmos-init.ts        # [1.1] Add containers
+│   └── server.ts                 # Register routes
+dashboards/admin-web/
+├── src/
+│   ├── app/(dashboard)/
+│   │   ├── experiments/
+│   │   │   ├── page.tsx          # [4.1] Experiments list
+│   │   │   ├── new/page.tsx      # [4.1] Creation wizard
+│   │   │   └── [id]/
+│   │   │       └── page.tsx      # [4.2] Live dashboard
+│   ├── lib/
+│   │   └── experiments-client.ts # API client
+│   └── components/
+│       └── experiments/          # Bayesian charts, variant cards
+```
+
+**Commit Message Format:**
+
+```
+feat(ab-testing): <description> [<task.code>]
+```
+
+**Example:**
+
+```bash
+git add services/platform-service/src/modules/ab-testing/
+git commit -m "feat(ab-testing): add experiment types and cosmos containers [1.1]"
+```
+
+**Testing Requirements:**
+
+- Unit tests: 25+ Vitest tests for bucketing, statistics, bandit algorithms
+- Statistical validation: A/A tests, known distribution tests
+- Integration: End-to-end experiment lifecycle
+
+**Dependencies:**
+
+- Feature flags module (reuse bucketing logic)
+- Telemetry module (event tracking)
+- Azure OpenAI (hypothesis generation)
+
+---
+
+## Appendix B: Statistical Methods
+
+### Bayesian A/B Testing
+
+**Conversion Metrics (Beta-Binomial):**
+
+```
+Posterior: Beta(α + conversions, β + non-conversions)
+Where α = β = 1 (uniform prior)
+
+Probability variant beats control:
+P(variant > control) = ∫₀¹ BetaPDF_variant(x) · BetaCDF_control(x) dx
+```
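For Beta posteriors, the probability that a variant beats control equals the integral over [0, 1] of the variant's density times the control's cumulative distribution. A minimal sketch of evaluating that integral on a uniform grid follows. The names `BetaPosterior`, `posteriorFromCounts`, and `probabilityBeatsControl` are illustrative assumptions, and the grid size is an arbitrary choice rather than a tuned value.

```typescript
interface BetaPosterior {
  alpha: number;
  beta: number;
}

// Uniform Beta(1, 1) prior updated with observed conversions and non-conversions.
function posteriorFromCounts(conversions: number, participants: number): BetaPosterior {
  return { alpha: 1 + conversions, beta: 1 + (participants - conversions) };
}

// Probability mass of a Beta density on the midpoints of a uniform grid over (0, 1).
// Normalising by the sum avoids computing the Beta function; assumes alpha, beta >= 1.
function betaMassOnGrid(posterior: BetaPosterior, gridSize: number): number[] {
  const logDensity: number[] = [];
  let maxLog = -Infinity;
  for (let i = 0; i < gridSize; i++) {
    const x = (i + 0.5) / gridSize;
    const value = (posterior.alpha - 1) * Math.log(x) + (posterior.beta - 1) * Math.log(1 - x);
    logDensity.push(value);
    if (value > maxLog) maxLog = value;
  }
  // Subtract the maximum before exponentiating to avoid underflow for large counts.
  const weights = logDensity.map((value) => Math.exp(value - maxLog));
  const total = weights.reduce((sum, weight) => sum + weight, 0);
  return weights.map((weight) => weight / total);
}

// P(variant > control), approximated as sum over cells of variantMass * controlCdf.
function probabilityBeatsControl(
  variant: BetaPosterior,
  control: BetaPosterior,
  gridSize: number = 4000,
): number {
  const variantMass = betaMassOnGrid(variant, gridSize);
  const controlMass = betaMassOnGrid(control, gridSize);
  let controlCdf = 0;
  let probability = 0;
  for (let i = 0; i < gridSize; i++) {
    // Control mass strictly below this cell, plus half of the cell itself.
    probability += variantMass[i] * (controlCdf + controlMass[i] / 2);
    controlCdf += controlMass[i];
  }
  return probability;
}
```

A grid is deterministic, which keeps dashboard numbers stable between refreshes. Monte Carlo sampling from the posteriors is the usual alternative when more than two variants are compared at once.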
+
+**Continuous Metrics (Normal):**
+
+```
+Posterior: Normal(μ_n, σ_n²)
+Where μ_n, σ_n updated via conjugate prior
+
+Probability variant beats control via Monte Carlo sampling
+```
+
+### Thompson Sampling
+
+```
+For each incoming user:
+  For each variant:
+    Sample θ_i from variant's posterior distribution
+  Assign user to variant with max(θ_i)
+
+Update variant's posterior after observing outcome
+```
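The loop above can be sketched for conversion metrics as follows, assuming a uniform Beta(1, 1) prior per variant. The Beta draw is built from two Gamma draws using the Marsaglia-Tsang method, which requires shape parameters of at least 1 (always true with this prior). All names here (`ArmStats`, `chooseVariant`, `sampleBeta`) are hypothetical and not taken from the planned `allocation.ts`.

```typescript
type Rng = () => number; // uniform in [0, 1)

// Standard normal draw via the Box-Muller transform.
function sampleStandardNormal(rng: Rng): number {
  const u1 = 1 - rng(); // shift to (0, 1] so the logarithm is finite
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Gamma(shape, scale = 1) draw via Marsaglia-Tsang; valid for shape >= 1.
function sampleGamma(shape: number, rng: Rng): number {
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  while (true) {
    const x = sampleStandardNormal(rng);
    const v = (1 + c * x) ** 3;
    if (v <= 0) continue;
    const u = 1 - rng();
    if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) {
      return d * v;
    }
  }
}

// Beta(alpha, beta) draw as a ratio of Gamma draws.
function sampleBeta(alpha: number, beta: number, rng: Rng): number {
  const a = sampleGamma(alpha, rng);
  const b = sampleGamma(beta, rng);
  return a / (a + b);
}

interface ArmStats {
  variantId: string;
  conversions: number;
  participants: number;
}

// One Thompson sampling step: draw from each posterior, pick the largest draw.
function chooseVariant(arms: ArmStats[], rng: Rng = Math.random): string {
  if (arms.length === 0) throw new Error('at least one variant is required');
  let bestId = arms[0].variantId;
  let bestDraw = -Infinity;
  for (const arm of arms) {
    const draw = sampleBeta(1 + arm.conversions, 1 + arm.participants - arm.conversions, rng);
    if (draw > bestDraw) {
      bestDraw = draw;
      bestId = arm.variantId;
    }
  }
  return bestId;
}
```

Updating the posterior is just incrementing `conversions` and `participants` on the chosen arm once the outcome is observed. Passing a seeded `rng` instead of `Math.random` makes allocation decisions reproducible in tests.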
+
+### Early Stopping
+
+```
+Stop experiment when:
+  days_running > max_duration                      → Time limit reached
+  OR (samples_per_variant > min_sample_size
+      AND (max_variant P(beats control) > 0.95     → Winner found
+           OR max_variant P(beats control) < 0.05)) → No winner
+```
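A minimal sketch of these rules as a pure function follows. It assumes that the duration limit applies unconditionally, that the minimum sample size gates only the probability-based rules, and that the "no winner" cutoff mirrors `winnerThreshold` (0.05 when the threshold is 95). The types `VariantSnapshot`, `StopGuardrails`, and `StopDecision` are hypothetical.

```typescript
interface VariantSnapshot {
  variantId: string;
  participants: number;
  probabilityBeatsControl: number; // 0 to 1, produced by the Bayesian engine
}

interface StopGuardrails {
  minSampleSizePerVariant: number;
  maxDurationDays: number;
  winnerThreshold: number; // percent, e.g. 95
}

type StopDecision =
  | { stop: false }
  | { stop: true; reason: 'winner'; variantId: string }
  | { stop: true; reason: 'no_winner' | 'max_duration' };

function evaluateEarlyStopping(
  challengers: VariantSnapshot[], // every variant except control
  controlParticipants: number,
  daysRunning: number,
  guardrails: StopGuardrails,
): StopDecision {
  // Safety limit: stop regardless of how much data was collected.
  if (daysRunning > guardrails.maxDurationDays) {
    return { stop: true, reason: 'max_duration' };
  }
  // Probability rules only apply once every arm has enough samples.
  const smallestArm = Math.min(controlParticipants, ...challengers.map((v) => v.participants));
  if (challengers.length === 0 || smallestArm <= guardrails.minSampleSizePerVariant) {
    return { stop: false };
  }
  const best = challengers.reduce((a, b) =>
    b.probabilityBeatsControl > a.probabilityBeatsControl ? b : a,
  );
  const threshold = guardrails.winnerThreshold / 100;
  if (best.probabilityBeatsControl > threshold) {
    return { stop: true, reason: 'winner', variantId: best.variantId };
  }
  if (best.probabilityBeatsControl < 1 - threshold) {
    return { stop: true, reason: 'no_winner' };
  }
  return { stop: false };
}
```

Keeping the decision pure (no database or clock access) makes it straightforward to unit test and to replay against historical experiments before enabling `autoStopEnabled`.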
+
+---
+
+## Appendix C: API Reference
+
+| Method | Endpoint                               | Auth     | Description                      |
+| ------ | -------------------------------------- | -------- | -------------------------------- |
+| POST   | `/ab-testing/experiments`              | Admin    | Create experiment                |
+| GET    | `/ab-testing/experiments`              | Admin    | List experiments                 |
+| GET    | `/ab-testing/experiments/:id`          | Admin    | Get experiment details           |
+| PATCH  | `/ab-testing/experiments/:id`          | Admin    | Update experiment                |
+| DELETE | `/ab-testing/experiments/:id`          | Admin    | Stop/archive experiment          |
+| POST   | `/ab-testing/experiments/:id/start`    | Admin    | Start experiment                 |
+| POST   | `/ab-testing/experiments/:id/pause`    | Admin    | Pause experiment                 |
+| POST   | `/ab-testing/experiments/:id/complete` | Admin    | Complete with winner             |
+| POST   | `/ab-testing/assign`                   | Any auth | Get variant assignment for user  |
+| POST   | `/ab-testing/events`                   | Any auth | Track experiment event           |
+| GET    | `/ab-testing/experiments/:id/results`  | Admin    | Get statistical results          |
+| GET    | `/ab-testing/suggestions`              | Admin    | AI-generated experiment ideas    |
+| POST   | `/ab-testing/hypotheses`               | Admin    | Generate hypothesis from pattern |
+
+---
+
+## Appendix D: Integration Points
+
+### With Feature Flags Module
+
+- Experiments build on feature flag infrastructure
+- Flag state = variant assignment
+- Consistent bucketing with existing flags
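Consistent bucketing can be illustrated with a 32-bit FNV-1a hash over a stable key. This is a sketch under assumptions: the key format (`experimentId:userId`), the 10,000-bucket resolution, and the function names are illustrative, and the existing feature flags module may use different choices.

```typescript
// 32-bit FNV-1a over the UTF-8 bytes of the input.
function fnv1a32(input: string): number {
  const bytes = new TextEncoder().encode(input);
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i];
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // convert to unsigned
}

interface Allocation {
  variantId: string;
  percent: number; // share of experiment traffic; values are expected to sum to 100
}

// Returns the variant for a user, or null when the user falls outside the experiment.
function assignVariant(
  experimentId: string,
  userId: string,
  targetPercent: number, // percent of eligible traffic enrolled in the experiment
  allocations: Allocation[],
): string | null {
  const bucket = fnv1a32(`${experimentId}:${userId}`) % 10000; // 0..9999
  const position = bucket / 100; // 0..99.99
  if (position >= targetPercent) return null;
  // Rescale the enrolled slice to 0..100 and walk the cumulative allocation.
  const withinExperiment = (position / targetPercent) * 100;
  let cumulative = 0;
  for (const allocation of allocations) {
    cumulative += allocation.percent;
    if (withinExperiment < cumulative) return allocation.variantId;
  }
  return null; // allocations summed to less than 100
}
```

Hashing alone is deterministic only while allocations stay fixed. Under bandit strategies, where `currentAllocationPercent` changes over time, a stored assignment record is still needed to keep a user on the variant first seen.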
+
+### With Telemetry Module
+
+- Experiment events enriched with telemetry context
+- Automatic metric tracking from existing events
+- Funnel analysis using telemetry breadcrumbs
+
+### With Event Bus
+
+| Event                         | Action                              |
+| ----------------------------- | ----------------------------------- |
+| `ab.experiment.started`       | Notify stakeholders, log audit      |
+| `ab.experiment.completed`     | Generate report, suggest follow-ups |
+| `ab.variant.declared_winner`  | Trigger auto-rollout if enabled     |
+| `ab.early_stopping.triggered` | Alert experiment owner              |
+
+---
+
+## Appendix E: Cost Estimation
+
+| Component                    | Monthly Cost (est.)      |
+| ---------------------------- | ------------------------ |
+| Cosmos DB (experiment data)  | $100–200                 |
+| LLM hypothesis generation    | $50–100 (weekly reports) |
+| Compute (statistical engine) | $50 (negligible)         |
+| **Total**                    | **$200–350/month**       |
+
+---
+
+## Current Status
+
+- [ ] **Design complete** — Target: 2026-03-10
+- [ ] **Phase 1: Core Engine** — Not started
+- [ ] **Phase 2: Statistics** — Not started
+- [ ] **Phase 3: AI Hypotheses** — Not started
+- [ ] **Phase 4: Admin UI** — Not started
+- [ ] **Phase 5: Advanced** — Future
+
+**Estimated Timeline:** 2.5–3 weeks (Phases 1–4)
+
+**Dependencies:**
+
+- Feature flags module (for assignment infrastructure)
+- Telemetry module (for event tracking)
+- Azure OpenAI (for hypothesis generation)
+
+---
+
+_Last Updated: 2026-03-03_
--- a/docs/roadmaps/PREDICTIVE_CHURN_HEALTH_SCORING_ROADMAP.md
+++ b/docs/roadmaps/PREDICTIVE_CHURN_HEALTH_SCORING_ROADMAP.md
@ -0,0 +1,848 @@
+# Predictive Churn & Health Scoring — Implementation Roadmap
+
+> **Module:** `platform-service/src/modules/predictive-analytics/`  
+> **Admin UI:** `/ops/health-dashboard/`  
+> **Target:** ML-powered churn prediction, health scoring, and proactive retention  
+> **Estimated Effort:** 3 weeks  
+> **Status:** 🟡 Planning
+
+---
+
+## Executive Summary
+
+This roadmap delivers a **predictive analytics platform** that forecasts user churn 7–30 days in advance and computes product health scores from telemetry. Unlike reactive dashboards that show what happened, this system **predicts what will happen**—enabling proactive retention campaigns, resource allocation, and product improvements before users leave.
+
+### Key Differentiators vs. Reactive Analytics
+
+| Capability          | Traditional Analytics      | Predictive Churn & Health         |
+| ------------------- | -------------------------- | --------------------------------- |
+| Insight Type        | Historical (what happened) | **Predictive (what will happen)** |
+| Churn Detection     | After user leaves          | **7–30 days before churn**        |
+| Health View         | Current snapshot only      | **Trending + forecasted**         |
+| Interventions       | Reactive recovery          | **Proactive prevention**          |
+| Product Insights    | Manual pattern search      | **Auto-detected risk signals**    |
+| Resource Allocation | Guesswork                  | **Risk-weighted prioritization**  |
+
+---
+
+## Phase 1: Feature Engineering Pipeline (Week 1)
+
+### 1.1 Telemetry Feature Extraction
+
+- [ ] **1.1.1** Create `modules/predictive-analytics/feature-extractor.ts`
+  - [ ] User behavior features (session frequency, depth, recency)
+  - [ ] Engagement features (feature usage diversity, core action completion)
+  - [ ] Performance features (error rate, latency exposure, crash frequency)
+  - [ ] Social features (sharing, collaboration, network effects)
+  - [ ] Revenue features (payment history, plan changes, support tickets)
+- [ ] **1.1.2** Time-window aggregations
+  - [ ] Last 24 hours (recent behavior)
+  - [ ] Last 7 days (weekly patterns)
+  - [ ] Last 30 days (monthly trends)
+  - [ ] Life-to-date (all-time totals)
+- [ ] **1.1.3** Rolling window features
+  - [ ] 7-day rolling average (trend smoothing)
+  - [ ] Week-over-week change (acceleration)
+  - [ ] Cohort-normalized scores (vs. similar users)
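The rolling-window items in 1.1.3 can be sketched over a per-user array of daily counts (oldest first). The function names and the choice to return `null` when a comparison is undefined are assumptions made for illustration.

```typescript
// Trailing mean over the last `window` days; early entries average what is available.
function rollingMean(daily: number[], window: number): number[] {
  const result: number[] = [];
  let sum = 0;
  for (let i = 0; i < daily.length; i++) {
    sum += daily[i];
    if (i >= window) sum -= daily[i - window]; // drop the value leaving the window
    result.push(sum / Math.min(i + 1, window));
  }
  return result;
}

// Relative change of the most recent 7 days versus the 7 days before them.
// Returns null when there is less than 14 days of data or the earlier week is zero.
function weekOverWeekChange(daily: number[]): number | null {
  if (daily.length < 14) return null;
  const sumRange = (from: number, to: number): number =>
    daily.slice(from, to).reduce((sum, value) => sum + value, 0);
  const n = daily.length;
  const previous = sumRange(n - 14, n - 7);
  const current = sumRange(n - 7, n);
  return previous === 0 ? null : (current - previous) / previous;
}
```

For example, a user with seven days of one session followed by seven days of two sessions has a week-over-week change of 1.0, meaning activity doubled.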
+
+### 1.2 Feature Store
+
+- [ ] **1.2.1** Create `modules/predictive-analytics/feature-store.ts`
+  - [ ] `UserFeatureVector` — normalized feature values per user
+  - [ ] `ProductHealthMetrics` — aggregated product-level scores
+  - [ ] Feature versioning (track feature schema changes)
+- [ ] **1.2.2** Add Cosmos containers to `cosmos-init.ts`
+  - [ ] `user_features` (pk: `/userId`, TTL: 90 days)
+  - [ ] `product_health` (pk: `/productId` + `/date`, time-series)
+  - [ ] `feature_definitions` (pk: `/productId`, feature metadata)
+- [ ] **1.2.3** Feature computation jobs
+  - [ ] Daily feature computation (nightly batch)
+  - [ ] Real-time feature updates (on key events)
+  - [ ] Feature backfill (compute historical features)
+
+### 1.3 Product-Specific Feature Catalog
+
+- [ ] **1.3.1** Define features per product
+  - [ ] **NomGap:** Fast completion rate, protocol adherence, streak length, autophagy engagement
+  - [ ] **JarvisJr:** Session frequency, agent diversity, voice/text ratio, skill progression
+  - [ ] **ChronoMind:** Timer completion rate, cascade effectiveness, routine adherence, urgency response
+  - [ ] **MindLyst:** Brain usage diversity, triage accuracy, memory capture frequency, reflection completion
+  - [ ] **PeakPulse:** Session frequency, goal completion, streak maintenance, social sharing
+  - [ ] **LysnrAI:** Dictation frequency, accuracy rate, hotkey usage, vocabulary growth
+- [ ] **1.3.2** Feature importance tracking
+  - [ ] Which features correlate with churn/retention
+  - [ ] Feature drift detection (behavior changes over time)
+  - [ ] Auto-suggest new features based on patterns
+
+**Phase 1 Exit Criteria:**
+
+- [ ] 50+ features extracted per product
+- [ ] Feature store populated for all active users
+- [ ] Daily feature computation job running
+- [ ] Feature importance analysis completed
+
+---
+
+## Phase 2: Churn Prediction Model (Week 1–2)
+
+### 2.1 Model Architecture
+
+- [ ] **2.1.1** Create `modules/predictive-analytics/churn-model.ts`
+  - [ ] Binary classification (will churn in 7 days? 30 days?)
+  - [ ] Gradient Boosted Trees (XGBoost/LightGBM) baseline
+  - [ ] Neural network ensemble (for comparison)
+- [ ] **2.1.2** Training pipeline
+  - [ ] Label definition: No activity for N days = churned
+  - [ ] Train/validation/test split (time-based, not random)
+  - [ ] Cross-validation with temporal folds
+  - [ ] Hyperparameter tuning (Optuna/Ray Tune)
+- [ ] **2.1.3** Model evaluation
+  - [ ] ROC-AUC (discrimination ability)
+  - [ ] Precision/Recall at different thresholds
+  - [ ] Calibration (predicted prob vs. actual rate)
+  - [ ] Per-product performance breakdown
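The label definition and the time-based split in 2.1.2 can be sketched as below. The `Snapshot` shape and both function names are hypothetical; the only idea carried over from the roadmap is that rows are partitioned by date rather than at random, so the model is never evaluated on data older than what it trained on.

```typescript
interface Snapshot {
  userId: string;
  snapshotAt: string; // ISO 8601 time at which the feature vector was computed
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// A user counts as churned when no activity follows the snapshot within the horizon.
function labelChurned(
  snapshotAt: string,
  nextActivityAt: string | null, // first activity after the snapshot, if any
  horizonDays: number,
): boolean {
  if (nextActivityAt === null) return true;
  const gapMs = Date.parse(nextActivityAt) - Date.parse(snapshotAt);
  return gapMs > horizonDays * MS_PER_DAY;
}

// Partition rows by snapshot date: train < validationStart <= validation < testStart <= test.
function temporalSplit<T extends Snapshot>(
  rows: T[],
  validationStart: string,
  testStart: string,
): { train: T[]; validation: T[]; test: T[] } {
  const validationMs = Date.parse(validationStart);
  const testMs = Date.parse(testStart);
  const train: T[] = [];
  const validation: T[] = [];
  const test: T[] = [];
  for (const row of rows) {
    const t = Date.parse(row.snapshotAt);
    if (t < validationMs) train.push(row);
    else if (t < testMs) validation.push(row);
    else test.push(row);
  }
  return { train, validation, test };
}
```

Snapshots whose horizon has not yet elapsed cannot be labelled reliably, so a real pipeline would need to exclude rows newer than the current date minus the horizon.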
+
+### 2.2 Prediction Service
+
+- [ ] **2.2.1** Real-time scoring API
+  - [ ] `POST /predictive/churn-score` — single user prediction
+  - [ ] `POST /predictive/churn-batch` — batch scoring
+  - [ ] Latency < 100ms for single prediction
+- [ ] **2.2.2** Risk segmentation
+  - [ ] Risk buckets: Critical (>80%), High (60–80%), Medium (30–60%), Low (<30%)
+  - [ ] Risk score components (which features drive the score)
+  - [ ] Confidence intervals on predictions
+- [ ] **2.2.3** Model versioning
+  - [ ] A/B test model versions
+  - [ ] Shadow mode (predict without acting)
+  - [ ] Rollback capability
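The risk buckets in 2.2.2 map directly onto a small function. The bucket boundaries come from the list above; which bucket owns an exact boundary value (0.3 and 0.6 going to the higher-risk side, 0.8 staying in High because Critical is strictly above 80%) is an assumption, as are the function names.

```typescript
type RiskSegment = 'critical' | 'high' | 'medium' | 'low';

// Maps a churn probability in [0, 1] to the roadmap's risk buckets.
function toRiskSegment(churnProbability: number): RiskSegment {
  if (churnProbability < 0 || churnProbability > 1 || Number.isNaN(churnProbability)) {
    throw new RangeError('churnProbability must be between 0 and 1');
  }
  if (churnProbability > 0.8) return 'critical';
  if (churnProbability >= 0.6) return 'high';
  if (churnProbability >= 0.3) return 'medium';
  return 'low';
}

// Counts users per segment, e.g. for a risk distribution chart.
function countBySegment(probabilities: number[]): Record<RiskSegment, number> {
  const counts: Record<RiskSegment, number> = { critical: 0, high: 0, medium: 0, low: 0 };
  for (const probability of probabilities) {
    counts[toRiskSegment(probability)] += 1;
  }
  return counts;
}
```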
+
+### 2.3 Explanation Engine
+
+- [ ] **2.3.1** SHAP value computation
+  - [ ] Feature contributions to each prediction
+  - [ ] Global feature importance (what drives churn overall)
+  - [ ] Local explanations (why this specific user is at risk)
+- [ ] **2.3.2** Natural language explanations
+  ```
+  "This user shows 78% churn risk because:
+  - Session frequency dropped 60% in the last week
+  - No core feature usage in 5 days
+  - Error rate increased 3x vs. their baseline
+  - Similar users who showed these patterns had 85% churn rate"
+  ```
+- [ ] **2.3.3** Actionable insight extraction
+  - [ ] Top 3 risk factors per user
+  - [ ] Suggested intervention based on risk profile
+  - [ ] Priority ranking (who to contact first)
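Extracting the top risk factors from per-feature SHAP contributions can be sketched as a sort by absolute contribution. This assumes the SHAP values have already been computed elsewhere and arrive as a plain feature-to-value map; the `RiskFactor` shape mirrors the `topRiskFactors` entries in Appendix A, and `topRiskFactors` as a function name is hypothetical.

```typescript
interface RiskFactor {
  feature: string;
  contribution: number; // SHAP value for this prediction
  direction: 'positive' | 'negative'; // positive increases churn risk
}

// Returns the `limit` features with the largest absolute contribution.
function topRiskFactors(shapValues: Record<string, number>, limit: number = 3): RiskFactor[] {
  return Object.entries(shapValues)
    .map(([feature, contribution]): RiskFactor => ({
      feature,
      contribution,
      direction: contribution >= 0 ? 'positive' : 'negative',
    }))
    .sort((a, b) => Math.abs(b.contribution) - Math.abs(a.contribution))
    .slice(0, limit);
}
```

Ranking by absolute value keeps strongly protective features visible as well, which can help when choosing an intervention: a negative contribution marks behavior worth reinforcing rather than fixing.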
+
+**Phase 2 Exit Criteria:**
+
+- [ ] Model achieves ROC-AUC > 0.75 on test set
+- [ ] Real-time scoring API < 100ms latency
+- [ ] Explanations generated for all predictions
+- [ ] Risk segmentation validated against historical churn
+
+---
+
+## Phase 3: Product Health Scoring (Week 2)
+
+### 3.1 Health Metric Framework
+
+- [ ] **3.1.1** Create `modules/predictive-analytics/health-scoring.ts`
+  - [ ] Health dimensions: Acquisition, Activation, Retention, Engagement, Revenue, Stability
+  - [ ] Composite health score (weighted average)
+  - [ ] Per-dimension scores with drill-down
+- [ ] **3.1.2** Product health indicators
+  - [ ] Daily Active Users (DAU) trend
+  - [ ] New user activation rate (Day-1, Day-7)
+  - [ ] Cohort retention curves (Day-1, Day-7, Day-30)
+  - [ ] Feature adoption rates (new feature uptake)
+  - [ ] Error rates and stability scores
+  - [ ] Support ticket volume and sentiment
+  - [ ] Revenue metrics (MRR, ARPU, LTV)
+
+### 3.2 Health Score Computation
+
+- [ ] **3.2.1** Baseline establishment
+  - [ ] Historical 90-day baseline for each metric
+  - [ ] Peer product comparison (ChronoMind vs. JarvisJr benchmarks)
+  - [ ] Industry benchmarks (if available)
+- [ ] **3.2.2** Scoring algorithm
+  - [ ] Z-score normalization (how many std devs from baseline)
+  - [ ] Trend direction (improving vs. declining)
+  - [ ] Volatility adjustment (consistent vs. erratic)
+  - [ ] 0–100 health score scale
+- [ ] **3.2.3** Alert thresholds
+  - [ ] Critical: Score < 60 or 20% drop from baseline
+  - [ ] Warning: Score 60–75 or 10% drop
+  - [ ] Healthy: Score > 75 and stable
+
+### 3.3 Anomaly Detection
+
+- [ ] **3.3.1** Statistical anomaly detection
+  - [ ] Prophet/ARIMA for time-series forecasting
+  - [ ] Forecast vs. actual deviation detection
+  - [ ] Seasonal pattern recognition (day-of-week, monthly)
+- [ ] **3.3.2** Multi-dimensional anomaly detection
+  - [ ] Correlation breakdown detection (metrics usually correlated diverging)
+  - [ ] Cohort-specific anomalies (specific region, platform, segment)
+- [ ] **3.3.3** Root cause suggestion
+  - [ ] Correlation with deployments/releases
+  - [ ] Error spike correlation
+  - [ ] External factor detection (holidays, events)
+
+**Phase 3 Exit Criteria:**
+
+- [ ] Health scores computed daily for all products
+- [ ] Anomaly detection with < 5% false positive rate
+- [ ] Historical baseline established for all metrics
+
+---
+
+## Phase 4: Proactive Intervention System (Week 2–3)
+
+### 4.1 Retention Campaign Automation
+
+- [ ] **4.1.1** Campaign trigger rules
+  - [ ] High-risk user enters segment → trigger email
+  - [ ] Medium-risk + specific behavior → trigger in-app message
+  - [ ] Critical risk → trigger personal outreach task
+- [ ] **4.1.2** Personalized messaging
+  - [ ] Message variant based on risk factors
+  - [ ] Feature recommendations based on unused capabilities
+  - [ ] Success stories from similar users
+- [ ] **4.1.3** Campaign effectiveness tracking
+  - [ ] Control group vs. treatment
+  - [ ] Churn rate comparison
+  - [ ] Revenue impact measurement
+
+### 4.2 Auto-Trigger Flows
+
+- [ ] **4.2.1** Platform integrations
+  - [ ] Email delivery via existing `modules/delivery/`
+  - [ ] Push notifications via `modules/notifications/`
+  - [ ] Slack notifications for CS team
+  - [ ] CRM integration (create outreach tasks)
+- [ ] **4.2.2** Smart scheduling
+  - [ ] Optimal contact time prediction
+  - [ ] Frequency capping (don't spam)
+  - [ ] Multi-channel orchestration
+- [ ] **4.2.3** Feedback loop
+  - [ ] Track intervention outcomes
+  - [ ] Retrain model with intervention effectiveness
+  - [ ] A/B test intervention strategies
+
+### 4.3 Risk Dashboard for CS Team
+
+- [ ] **4.3.1** At-risk user list
+  - [ ] Sortable by churn probability
+  - [ ] Filter by product, segment, risk factors
+  - [ ] Last activity preview
+- [ ] **4.3.2** User risk profile
+  - [ ] Churn probability trend over time
+  - [ ] Key risk factors highlighted
+  - [ ] Recommended actions
+  - [ ] User activity timeline
+- [ ] **4.3.3** Intervention tracking
+  - [ ] Contact history
+  - [ ] Response tracking
+  - [ ] Outcome recording (retained/churned)
+
+**Phase 4 Exit Criteria:**
+
+- [ ] Automated campaigns triggered for high-risk users
+- [ ] CS team dashboard with at-risk user queue
+- [ ] Intervention effectiveness measurement in place
+- [ ] Feedback loop improving model accuracy
+
+---
+
+## Phase 5: Admin Dashboard UI (Week 3)
+
+### 5.1 Product Health Overview
+
+- [ ] **5.1.1** Create `/ops/health-dashboard/page.tsx`
+  - [ ] Health score cards for each product
+  - [ ] Trend sparklines (7-day, 30-day)
+  - [ ] Alert summary (critical issues count)
+  - [ ] Product comparison table
+- [ ] **5.1.2** Health detail view
+  - [ ] Dimension breakdown (acquisition, activation, retention, etc.)
+  - [ ] Metric time-series charts
+  - [ ] Anomaly markers on charts
+  - [ ] Cohort retention curves
+
+### 5.2 Churn Prediction Dashboard
+
+- [ ] **5.2.1** Churn risk overview
+  - [ ] Risk distribution pie chart
+  - [ ] At-risk user count by product
+  - [ ] Predicted churn impact (revenue at risk)
+  - [ ] Model performance metrics (AUC, calibration)
+- [ ] **5.2.2** User risk explorer
+  - [ ] Search/filter at-risk users
+  - [ ] Risk score with explanation
+  - [ ] Top risk factors
+  - [ ] Recommended interventions
+- [ ] **5.2.3** Model insights
+  - [ ] Global feature importance chart
+  - [ ] Model performance over time
+  - [ ] Feature drift alerts
+
+### 5.3 Campaign Management
+
+- [ ] **5.3.1** Campaign list
+  - [ ] Active/paused/completed campaigns
+  - [ ] Trigger rules summary
+  - [ ] Performance stats (sent, opened, converted)
+- [ ] **5.3.2** Campaign editor
+  - [ ] Trigger condition builder
+  - [ ] Message template editor
+  - [ ] Audience targeting
+  - [ ] A/B test configuration
+- [ ] **5.3.3** Campaign analytics
+  - [ ] Funnel: triggered → sent → opened → retained
+  - [ ] Revenue impact
+  - [ ] Comparison to control group
+
+**Phase 5 Exit Criteria:**
+
+- [ ] Health dashboard shows all products with trends
+- [ ] Churn predictions visible with explanations
+- [ ] Campaign creation and management functional
+- [ ] Full test coverage
+
+---
+
+## Phase 6: Advanced Capabilities (Future)
+
+### 6.1 Cohort-Specific Models
+
+- [ ] Segment-specific churn models (iOS vs. Android, free vs. pro)
+- [ ] Regional models (different behaviors by geography)
+- [ ] Temporal models (seasonal churn patterns)
+
+### 6.2 LTV Prediction
+
+- [ ] Predict lifetime value at signup
+- [ ] Predict upgrade probability (free → pro)
+- [ ] Optimize acquisition channels by predicted LTV
+
+### 6.3 Product Recommendations
+
+- [ ] Suggest features to at-risk users based on successful cohorts
+- [ ] Personalized onboarding based on predicted needs
+- [ ] Next-best-action recommendations
+
+---
+
+## Appendix A: Data Models
+
+### UserChurnPredictionDoc
+
+```typescript
+interface UserChurnPredictionDoc {
+  id: string; // cp_<uuid>
+  userId: string; // partition key
+  productId: string;
+
+  // Prediction
+  predictionHorizon: 7 | 14 | 30; // Days
+  churnProbability: number; // 0–1
+  riskSegment: 'critical' | 'high' | 'medium' | 'low';
+
+  // Feature vector snapshot
+  features: Record<string, number>; // Normalized feature values
+  featureVersion: string; // Schema version
+
+  // Model info
+  modelVersion: string;
+  modelType: 'xgboost' | 'neural';
+  predictionTimestamp: string;
+
+  // Explanation (SHAP values)
+  explanation: {
+    topRiskFactors: Array<{
+      feature: string;
+      contribution: number; // SHAP value
+      direction: 'positive' | 'negative'; // Increases or decreases churn risk
+    }>;
+    globalFeatureImportance: Array<{
+      feature: string;
+      importance: number;
+    }>;
+  };
+
+  // Natural language summary
+  nlExplanation: string; // Auto-generated explanation
+
+  // Intervention
+  suggestedActions: string[];
+  interventionHistory: Array<{
+    action: string;
+    timestamp: string;
+    outcome?: 'responded' | 'ignored' | 'churned' | 'retained';
+  }>;
+
+  // Validation (ground truth)
+  actualChurned?: boolean;
+  validationDate?: string;
+
+  createdAt: string;
+  ttl: number; // predictionHorizon + 90 days
+}
+```
+
+### ProductHealthScoreDoc
+
+```typescript
+interface ProductHealthScoreDoc {
+  id: string; // ph_<uuid>
+  productId: string; // partition key
+  date: string; // Sort key (YYYY-MM-DD)
+
+  // Composite score
+  overallHealthScore: number; // 0–100
+  healthStatus: 'critical' | 'warning' | 'healthy';
+
+  // Dimension scores
+  dimensions: {
+    acquisition: {
+      score: number; // 0–100
+      metrics: {
+        newUsers: number;
+        activationRateDay1: number;
+        activationRateDay7: number;
+        cac: number;
+      };
+      trend: 'improving' | 'stable' | 'declining';
+    };
+    activation: {
+      score: number;
+      metrics: {
+        firstValueMomentRate: number;
+        timeToFirstAction: number;
+        onboardingCompletionRate: number;
+      };
+      trend: 'improving' | 'stable' | 'declining';
+    };
+    retention: {
+      score: number;
+      metrics: {
+        dau: number;
+        mau: number;
+        dauMauRatio: number;
+        day7Retention: number;
+        day30Retention: number;
+      };
+      trend: 'improving' | 'stable' | 'declining';
+    };
+    engagement: {
+      score: number;
+      metrics: {
+        avgSessionLength: number;
+        sessionsPerUser: number;
+        featureAdoption: Record<string, number>;
+      };
+      trend: 'improving' | 'stable' | 'declining';
+    };
+    revenue: {
+      score: number;
+      metrics: {
+        mrr: number;
+        arpu: number;
+        churnRate: number;
+        upgradeRate: number;
+      };
+      trend: 'improving' | 'stable' | 'declining';
+    };
+    stability: {
+      score: number;
+      metrics: {
+        crashFreeRate: number;
+        errorRate: number;
+        avgLatency: number;
+        uptimePercent: number;
+      };
+      trend: 'improving' | 'stable' | 'declining';
+    };
+  };
+
+  // Anomalies detected
+  anomalies: Array<{
+    metric: string;
+    expectedValue: number;
+    actualValue: number;
+    deviationPercent: number;
+    severity: 'critical' | 'warning';
+    suggestedCause?: string;
+  }>;
+
+  // Forecasts
+  forecasts: {
+    next7Days: {
+      expectedHealthScore: number;
+      confidenceInterval: [number, number];
+    };
+    next30Days: {
+      expectedHealthScore: number;
+      confidenceInterval: [number, number];
+    };
+  };
+
+  // Benchmarks
+  vsBaseline7Day: number; // % change vs. 7-day baseline
+  vsBaseline30Day: number; // % change vs. 30-day baseline
+
+  createdAt: string;
+  ttl: number; // 2 years
+}
+```
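A minimal sketch of how `overallHealthScore` and `healthStatus` could be derived from the six dimension scores in `ProductHealthScoreDoc`. The weights, the 50/75 status thresholds, and the function names are assumptions for illustration; the roadmap does not specify them.

```typescript
type HealthStatus = 'critical' | 'warning' | 'healthy';
type Dimension =
  | 'acquisition'
  | 'activation'
  | 'retention'
  | 'engagement'
  | 'revenue'
  | 'stability';

// Hypothetical weights (assumption): not defined in the roadmap.
const DIMENSION_WEIGHTS: Record<Dimension, number> = {
  acquisition: 0.15,
  activation: 0.15,
  retention: 0.25,
  engagement: 0.15,
  revenue: 0.15,
  stability: 0.15,
};

// Weighted average of the 0-100 dimension scores, rounded to an integer.
function computeOverallHealthScore(scores: Record<Dimension, number>): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const dim of Object.keys(DIMENSION_WEIGHTS) as Dimension[]) {
    weightedSum += DIMENSION_WEIGHTS[dim] * scores[dim];
    totalWeight += DIMENSION_WEIGHTS[dim];
  }
  return Math.round(weightedSum / totalWeight);
}

// Hypothetical thresholds (assumption): below 50 is critical, below 75 is warning.
function toHealthStatus(score: number): HealthStatus {
  if (score < 50) return 'critical';
  if (score < 75) return 'warning';
  return 'healthy';
}
```

Dividing by the total weight keeps the result on the 0-100 scale even if the weights are later retuned and no longer sum to 1.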
+
+### RetentionCampaignDoc
+
+```typescript
+interface RetentionCampaignDoc {
+  id: string; // rc_<uuid>
+  productId: string; // partition key
+
+  // Campaign definition
+  name: string;
+  description: string;
+  status: 'draft' | 'active' | 'paused' | 'completed';
+
+  // Trigger conditions
+  trigger: {
+    type: 'churn_risk' | 'health_score_drop' | 'behavioral' | 'scheduled';
+    conditions: Array<{
+      field: string;
+      operator: 'gt' | 'lt' | 'eq' | 'in';
+      value: unknown;
+    }>;
+  };
+
+  // Audience
+  audience: {
+    riskSegments?: string[]; // 'critical', 'high', etc.
+    products?: string[];
+    userSegments?: string[];
+    excludeRecentContact?: number; // Hours (frequency capping)
+  };
+
+  // Message content
+  messages: Array<{
+    channel: 'email' | 'push' | 'in_app' | 'slack_cs';
+    templateId: string;
+    variant?: string; // A/B test variant
+    delayHours?: number; // Delay after trigger
+    conditions?: Array<{
+      field: string;
+      operator: string;
+      value: unknown;
+    }>;
+  }>;
+
+  // Performance tracking
+  stats: {
+    triggered: number;
+    sent: number;
+    opened: number;
+    clicked: number;
+    converted: number;
+    controlGroupSize: number;
+    controlChurnRate: number;
+    treatmentChurnRate: number;
+  };
+
+  createdAt: string;
+  updatedAt: string;
+  ttl: number; // 1 year after completion
+}
+```
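The `trigger.conditions` array in `RetentionCampaignDoc` could be evaluated roughly as follows. This is a sketch under two assumptions that the roadmap does not state: all conditions are combined with AND, and `field` names a top-level property of the record being tested. The subject fields used in the usage example (`churnProbability`, `riskSegment`) are hypothetical.

```typescript
type Operator = 'gt' | 'lt' | 'eq' | 'in';

interface Condition {
  field: string;
  operator: Operator;
  value: unknown;
}

// Evaluate a single condition against a flat record of values.
function matchesCondition(
  cond: Condition,
  subject: Record<string, unknown>,
): boolean {
  const actual = subject[cond.field];
  switch (cond.operator) {
    case 'gt':
      return (
        typeof actual === 'number' &&
        typeof cond.value === 'number' &&
        actual > cond.value
      );
    case 'lt':
      return (
        typeof actual === 'number' &&
        typeof cond.value === 'number' &&
        actual < cond.value
      );
    case 'eq':
      return actual === cond.value;
    case 'in':
      return Array.isArray(cond.value) && cond.value.indexOf(actual) !== -1;
    default:
      return false;
  }
}

// Assumption: a campaign fires only when every condition matches (logical AND).
function shouldTrigger(
  conditions: Condition[],
  subject: Record<string, unknown>,
): boolean {
  return conditions.every((cond) => matchesCondition(cond, subject));
}

// Usage with hypothetical field names:
const fires = shouldTrigger(
  [
    { field: 'churnProbability', operator: 'gt', value: 0.7 },
    { field: 'riskSegment', operator: 'in', value: ['critical', 'high'] },
  ],
  { churnProbability: 0.82, riskSegment: 'high' },
);
```

Type-checking the operands before comparing means a missing or mistyped field makes the condition fail rather than silently coercing values.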
+
+### UserFeatureVectorDoc
+
+```typescript
+interface UserFeatureVectorDoc {
+  id: string; // fv_<uuid>
+  userId: string; // partition key
+  productId: string;
+
+  // Computed features
+  features: {
+    // Recency features
+    daysSinceLastSession: number;
+    daysSinceLastCoreAction: number;
+
+    // Frequency features
+    sessionsLast7Days: number;
+    sessionsLast30Days: number;
+    avgSessionsPerWeek: number;
+
+    // Engagement depth
+    avgSessionDuration: number;
+    actionsPerSession: number;
+    uniqueFeaturesUsed: number;
+
+    // Product-specific (examples)
+    // NomGap
+    fastCompletionRate?: number;
+    streakLength?: number;
+
+    // JarvisJr
+    agentDiversityScore?: number;
+    voiceSessionRatio?: number;
+
+    // ChronoMind
+    timerCompletionRate?: number;
+    routineAdherenceScore?: number;
+
+    // Error/stability
+    errorRateLast7Days: number;
+    crashCountLast30Days: number;
+
+    // Revenue
+    planTier: number; // 0=free, 1=pro, 2=enterprise
+    lifetimeValue: number;
+    daysSinceLastPayment?: number;
+  };
+
+  // Normalized (0–1) for model input
+  normalizedFeatures: Record<string, number>;
+
+  // Metadata
+  featureSchemaVersion: string;
+  computedAt: string;
+
+  // Time windows
+  observationWindow: {
+    start: string;
+    end: string;
+  };
+
+  ttl: number; // 90 days
+}
+```
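One way `normalizedFeatures` could be populated from `features` is min-max scaling with clamping, sketched below. The per-feature ranges, the `FeatureRange` type, and the function name are assumptions; the roadmap only states that normalized values lie in 0–1.

```typescript
// Hypothetical per-feature bounds (assumption), e.g. learned from training data.
interface FeatureRange {
  min: number;
  max: number;
}

// Min-max scale each raw feature into [0, 1]; values outside the range are clamped.
function normalizeFeatures(
  raw: Record<string, number | undefined>,
  ranges: Record<string, FeatureRange>,
): Record<string, number> {
  const normalized: Record<string, number> = {};
  for (const name of Object.keys(raw)) {
    const value = raw[name];
    const range = ranges[name];
    // Skip optional product-specific features that are absent or have no known range.
    if (value === undefined || range === undefined) continue;
    const span = range.max - range.min;
    const scaled = span === 0 ? 0 : (value - range.min) / span;
    normalized[name] = Math.min(1, Math.max(0, scaled));
  }
  return normalized;
}

// Usage: streakLength is omitted because the user has no NomGap data.
const normalizedExample = normalizeFeatures(
  { daysSinceLastSession: 15, sessionsLast7Days: 20, streakLength: undefined },
  {
    daysSinceLastSession: { min: 0, max: 30 },
    sessionsLast7Days: { min: 0, max: 14 },
  },
);
```

Clamping keeps outliers (here, 20 sessions against an assumed maximum of 14) from pushing model inputs outside the expected range.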
+
+---
+
+## Implementation Tracking
+
+| Phase | Task                          | Status | Commit |
+| ----- | ----------------------------- | ------ | ------ |
+| 1.1   | Telemetry feature extraction  | ⬜     | —      |
+| 1.1   | Time-window aggregations      | ⬜     | —      |
+| 1.1   | Rolling window features       | ⬜     | —      |
+| 1.2   | Feature store                 | ⬜     | —      |
+| 1.2   | Cosmos containers             | ⬜     | —      |
+| 1.2   | Feature computation jobs      | ⬜     | —      |
+| 1.3   | Product-specific features     | ⬜     | —      |
+| 1.3   | Feature importance tracking   | ⬜     | —      |
+| 2.1   | XGBoost model architecture    | ⬜     | —      |
+| 2.1   | Training pipeline             | ⬜     | —      |
+| 2.1   | Model evaluation              | ⬜     | —      |
+| 2.2   | Real-time scoring API         | ⬜     | —      |
+| 2.2   | Risk segmentation             | ⬜     | —      |
+| 2.2   | Model versioning              | ⬜     | —      |
+| 2.3   | SHAP explanations             | ⬜     | —      |
+| 2.3   | Natural language explanations | ⬜     | —      |
+| 2.3   | Actionable insights           | ⬜     | —      |
+| 3.1   | Health metric framework       | ⬜     | —      |
+| 3.1   | Health indicators             | ⬜     | —      |
+| 3.2   | Baseline establishment        | ⬜     | —      |
+| 3.2   | Scoring algorithm             | ⬜     | —      |
+| 3.2   | Alert thresholds              | ⬜     | —      |
+| 3.3   | Anomaly detection             | ⬜     | —      |
+| 4.1   | Campaign trigger rules        | ⬜     | —      |
+| 4.1   | Personalized messaging        | ⬜     | —      |
+| 4.2   | Platform integrations         | ⬜     | —      |
+| 4.3   | CS team dashboard             | ⬜     | —      |
+| 5.1   | Health overview UI            | ⬜     | —      |
+| 5.2   | Churn prediction dashboard    | ⬜     | —      |
+| 5.3   | Campaign management           | ⬜     | —      |
+
+**Legend:** ⬜ Not started | 🟡 In progress | ✅ Complete | ⏸️ Deferred
+
+---
+
+## Quick Reference for Implementing Agent
+
+**📋 Full Roadmap:** `docs/roadmaps/PREDICTIVE_CHURN_HEALTH_SCORING_ROADMAP.md`
+
+**Key Files to Modify/Create:**
+
+```
+services/platform-service/
+├── src/
+│   ├── modules/predictive-analytics/
+│   │   ├── types.ts              # [1.2] Feature, HealthScore, Prediction types
+│   │   ├── repository.ts         # Data access layer
+│   │   ├── feature-extractor.ts  # [1.1] Telemetry → features
+│   │   ├── feature-store.ts      # [1.2] Feature vector storage
+│   │   ├── churn-model.ts        # [2.1] XGBoost training & inference
+│   │   ├── scoring-api.ts        # [2.2] Real-time prediction endpoint
+│   │   ├── explanation-engine.ts # [2.3] SHAP + NL explanations
+│   │   ├── health-scoring.ts     # [3] Health dimension calculation
+│   │   ├── anomaly-detection.ts  # [3.3] Prophet/ARIMA forecasting
+│   │   ├── campaign-engine.ts    # [4] Retention automation
+│   │   ├── routes.ts             # [5] REST API
+│   │   └── predictive-analytics.test.ts # Tests
+│   ├── lib/
+│   │   └── cosmos-init.ts        # [1.2] Add containers
+│   └── server.ts                 # Register routes
+dashboards/admin-web/
+├── src/
+│   ├── app/(dashboard)/
+│   │   ├── health-dashboard/
+│   │   │   └── page.tsx          # [5.1] Product health overview
+│   │   └── predictive/
+│   │       ├── at-risk/
+│   │       │   └── page.tsx      # [4.3] At-risk user list
+│   │       └── campaigns/
+│   │           └── page.tsx      # [5.3] Campaign management
+│   ├── lib/
+│   │   └── predictive-client.ts  # API client
+│   └── components/
+│       └── predictive/           # Risk cards, health charts
+```
+
+**Commit Message Format:**
+
+```
+feat(predictive-analytics): <description> [<task.code>]
+```
+
+**Example:**
+
+```bash
+git add services/platform-service/src/modules/predictive-analytics/
+git commit -m "feat(predictive-analytics): add feature extraction and store [1.1-1.2]"
+```
+
+**Testing Requirements:**
+
+- Unit tests: 20+ Vitest tests for feature extraction, model inference
+- Model validation: AUC, calibration, precision@k metrics
+- Integration: End-to-end prediction pipeline
+
+**Dependencies:**
+
+- Telemetry module (feature extraction)
+- Delivery module (retention campaigns)
+- Azure ML or scikit-learn (model training)
+
+---
+
+## Appendix B: API Reference
+
+| Method | Endpoint                               | Auth          | Description                   |
+| ------ | -------------------------------------- | ------------- | ----------------------------- |
+| GET    | `/predictive/health`                   | Admin         | Get all product health scores |
+| GET    | `/predictive/health/:productId`        | Admin         | Get product health detail     |
+| GET    | `/predictive/health/:productId/trends` | Admin         | Historical health trends      |
+| POST   | `/predictive/churn-score`              | Admin/Service | Get churn prediction for user |
+| POST   | `/predictive/churn-batch`              | Admin         | Batch churn scoring           |
+| GET    | `/predictive/at-risk-users`            | Admin/CS      | List users by risk segment    |
+| GET    | `/predictive/users/:id/risk-profile`   | Admin/CS      | User churn risk details       |
+| GET    | `/predictive/model/performance`        | Admin         | Model accuracy metrics        |
+| GET    | `/predictive/model/features`           | Admin         | Feature importance ranking    |
+| GET    | `/predictive/campaigns`                | Admin         | List retention campaigns      |
+| POST   | `/predictive/campaigns`                | Admin         | Create campaign               |
+| PATCH  | `/predictive/campaigns/:id`            | Admin         | Update campaign               |
+| GET    | `/predictive/campaigns/:id/stats`      | Admin         | Campaign performance          |
+| POST   | `/predictive/campaigns/:id/trigger`    | Admin         | Manual trigger for testing    |
+
+---
+
+## Appendix C: Integration Points
+
+### With Telemetry Module
+
+- Raw events feed feature extraction
+- Error rates flow into health scores
+- Correlation IDs link behaviors to predictions
+
+### With Diagnostics Module
+
+- Debug sessions enrich feature vectors
+- Error clusters correlate with churn risk
+- Screenshot patterns analyzed for UX issues
+
+### With Event Bus
+
+| Event                               | Action                                   |
+| ----------------------------------- | ---------------------------------------- |
+| `predictive.churn.risk_detected`    | Trigger retention campaign               |
+| `predictive.health.critical`        | Alert leadership, suggest debug sessions |
+| `predictive.anomaly.detected`       | Create incident, notify on-call          |
+| `user.retention.campaign_responded` | Update model with outcome                |
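The event-to-action mapping in the table above could be wired up as a typed handler map. In this sketch the event names and actions come from the table, while the payload shape, the handler signature, and returning action labels as strings (instead of calling real services) are assumptions made to keep the example self-contained.

```typescript
type PredictiveEvent =
  | 'predictive.churn.risk_detected'
  | 'predictive.health.critical'
  | 'predictive.anomaly.detected'
  | 'user.retention.campaign_responded';

// Hypothetical payload shape (assumption).
interface EventPayload {
  productId: string;
  userId?: string;
}

// Each handler returns the actions it would take; a real handler would call services.
type Handler = (payload: EventPayload) => string[];

const handlers: Record<PredictiveEvent, Handler> = {
  'predictive.churn.risk_detected': () => ['trigger retention campaign'],
  'predictive.health.critical': () => ['alert leadership', 'suggest debug sessions'],
  'predictive.anomaly.detected': () => ['create incident', 'notify on-call'],
  'user.retention.campaign_responded': () => ['update model with outcome'],
};

function dispatch(event: PredictiveEvent, payload: EventPayload): string[] {
  return handlers[event](payload);
}
```

Typing `handlers` as `Record<PredictiveEvent, Handler>` makes the compiler flag any event that is added to the union without a matching handler.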
+
+### With Delivery Module
+
+- Retention campaigns use email templates
+- Push notifications for urgent interventions
+- A/B test message variants
+
+---
+
+## Appendix D: Cost Estimation
+
+| Component                            | Monthly Cost (est.)        |
+| ------------------------------------ | -------------------------- |
+| Cosmos DB (features + predictions)   | $150–300                   |
+| Model training (Azure ML)            | $100–200                   |
+| Inference compute                    | $50–100                    |
+| Email delivery (retention campaigns) | $50–200 (volume-dependent) |
+| **Total**                            | **$350–800/month**         |
+
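+ROI estimate: if the system prevents 5% of predicted churn among 10K at-risk users/month, at $50 LTV per user:
+
+- 10,000 × 5% = 500 users retained; 500 × $50 = $25K/month in retained value
+- 10:1+ ROI (about 31:1 even at the $800/month upper cost bound)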
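The ROI estimate above can be reproduced as a short calculation. The inputs are the roadmap's own figures; the variable names are illustrative, and the calculation makes the simplifying assumption that each retained user's full LTV counts as value in the month they are retained.

```typescript
// Inputs from the roadmap's ROI estimate and cost table.
const atRiskUsersPerMonth = 10000;
const preventedChurnPercent = 5;
const lifetimeValueUsd = 50;
const monthlyCostRangeUsd = { low: 350, high: 800 };

// Integer percent avoids floating-point drift in the user count.
const usersRetained = (atRiskUsersPerMonth * preventedChurnPercent) / 100;
const retainedValueUsd = usersRetained * lifetimeValueUsd;

// Ratio of retained value to monthly cost at each end of the cost range.
const roiAtHighCost = retainedValueUsd / monthlyCostRangeUsd.high;
const roiAtLowCost = retainedValueUsd / monthlyCostRangeUsd.low;

console.log(
  `${usersRetained} users retained, $${retainedValueUsd}/month, ` +
    `ROI ${roiAtHighCost.toFixed(1)}:1 to ${roiAtLowCost.toFixed(1)}:1`,
);
```

Even at the upper cost bound the ratio stays well above the 10:1 floor quoted in the estimate, so the conclusion is not sensitive to where costs land within the range.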
+
+---
+
+## Appendix E: Success Metrics
+
+### Model Performance
+
+- [ ] AUC > 0.75 (discrimination)
+- [ ] Calibration slope 0.9–1.1 (well-calibrated probabilities)
+- [ ] Precision@10% > 60% (at least 60% of the top-10% highest-risk users actually churn)
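As an illustration of the Precision@10% target, the sketch below ranks users by predicted risk and measures how many of the top fraction actually churned. `actualChurned` matches the ground-truth field in the prediction data model; the `churnProbability` field name, the `ScoredUser` type, and rounding the cutoff to the nearest whole user are assumptions.

```typescript
interface ScoredUser {
  churnProbability: number; // hypothetical field name for the model's score
  actualChurned: boolean; // ground truth recorded after the prediction horizon
}

// Fraction of the top-`fraction` highest-risk users who actually churned.
function precisionAtTopFraction(users: ScoredUser[], fraction: number): number {
  if (users.length === 0) return 0;
  // Always evaluate at least one user; round the cutoff to a whole user count.
  const k = Math.max(1, Math.round(users.length * fraction));
  const top = users
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.churnProbability - a.churnProbability)
    .slice(0, k);
  const hits = top.filter((u) => u.actualChurned).length;
  return hits / k;
}
```

With this helper, the success criterion corresponds to `precisionAtTopFraction(validatedPredictions, 0.1) > 0.6`, where `validatedPredictions` would be the predictions whose outcomes are already known.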
+
+### Business Impact
+
+- [ ] 10%+ reduction in churn rate for targeted cohorts
+- [ ] 5%+ increase in re-engagement campaign response rate
+- [ ] CS team satisfaction with at-risk user visibility
+
+### Operational
+
+- [ ] Prediction latency < 100ms
+- [ ] Feature freshness < 24 hours
+- [ ] Model retraining automation
+
+---
+
+## Current Status
+
+- [ ] **Design complete** — Target: 2026-03-10
+- [ ] **Phase 1: Feature Pipeline** — Not started
+- [ ] **Phase 2: Churn Model** — Not started
+- [ ] **Phase 3: Health Scoring** — Not started
+- [ ] **Phase 4: Interventions** — Not started
+- [ ] **Phase 5: Admin UI** — Not started
+- [ ] **Phase 6: Advanced** — Future
+
+**Estimated Timeline:** 3 weeks (Phases 1–5)
+
+**Dependencies:**
+
+- Telemetry module (for feature extraction)
+- Azure ML or similar (for model training)
+- Delivery module (for retention campaigns)
+
+---
+
+_Last Updated: 2026-03-03_