diff --git a/docs/roadmaps/AI_DIAGNOSTIC_ASSISTANT_ROADMAP.md b/docs/roadmaps/AI_DIAGNOSTIC_ASSISTANT_ROADMAP.md
new file mode 100644
index 00000000..9634374f
--- /dev/null
+++ b/docs/roadmaps/AI_DIAGNOSTIC_ASSISTANT_ROADMAP.md
@@ -0,0 +1,597 @@
+# AI Diagnostic Assistant - Implementation Roadmap
+
+> **Module:** `platform-service/src/modules/ai-diagnostics/`
+> **Admin UI:** `/ops/ai-diagnostics/`
+> **Target:** LLM-powered root cause analysis from telemetry + debug sessions
+> **Estimated Effort:** 2–3 weeks
+> **Status:** 🟡 Planning
+
+---
+
+## Executive Summary
+
+This roadmap delivers an **AI-powered diagnostic assistant** that analyzes error patterns, debug session data, and telemetry to automatically suggest root causes, like having a senior engineer on call 24/7. Engineers can ask natural language questions like _"Why did the iOS keyboard crash yesterday?"_ and receive AI-generated hypotheses with supporting evidence.
+
+### Key Differentiators vs. Manual Debugging
+
+| Feature | Manual Debugging | AI Diagnostic Assistant |
+| ----------------- | --------------------------- | ----------------------------------- |
+| Query | SQL + log grep | **Natural language** |
+| Pattern Detection | Hours of manual correlation | **AI finds hidden patterns** |
+| Context Assembly | Check 5+ systems manually | **Auto-assembles timeline** |
+| Hypothesis | Engineer intuition | **LLM-generated + evidence** |
+| Learning | Per-engineer experience | **Accumulates across all sessions** |
+
+---
+
+## Phase 1: Data Pipeline & Embeddings (Week 1)
+
+**Goal:** Extract, normalize, and embed error data for semantic search and clustering.
+
+### 1.1 Error Fingerprinting & Clustering
+
+- [ ] **1.1.1** Create `modules/ai-diagnostics/types.ts`
+  - [ ] `ErrorClusterDoc` - grouped similar errors with signature
+  - [ ] `ErrorFingerprint` - normalized stack trace hash
+  - [ ] `ClusterAnalysis` - AI-generated pattern description
+  - [ ] Zod schemas for all inputs
+
+  _Commit format:_ `git commit -m "feat(ai-diagnostics): add error clustering types [1.1.1]"` → `https://github.com/saravanakumardb1/learning_ai_common_plat/commit/`
+
+- [ ] **1.1.2** Add Cosmos containers to `cosmos-init.ts`
+  - [ ] `error_clusters` (pk: `/productId`, TTL: 90 days)
+  - [ ] `error_fingerprints` (pk: `/fingerprintHash`, unique index)
+  - [ ] `diagnostic_insights` (pk: `/clusterId`, AI-generated analyses)
+
+  _Commit format:_ `git commit -m "feat(ai-diagnostics): add cosmos containers for error clustering [1.1.2]"`
+
+- [ ] **1.1.3** Implement error normalization
+  - [ ] Stack trace parsing (remove line numbers, file paths)
+  - [ ] Message templating (replace UUIDs, timestamps, user IDs with placeholders)
+  - [ ] Fingerprint generation (SHA-256 of normalized error)
+  - [ ] Similarity scoring (Levenshtein for near-matches)
+
+  _Commit format:_ `git commit -m "feat(ai-diagnostics): implement error normalization and fingerprinting [1.1.3]"`
+
+### 1.2 Vector Embeddings for Semantic Search
+
+- [ ] **1.2.1** Create embedding pipeline
+  - [ ] Azure OpenAI `text-embedding-3-small` integration
+  - [ ] Error message + stack trace → 1536-dim vector
+  - [ ] Batch embedding job (100 errors at a time)
+- [ ] **1.2.2** Cosmos DB vector search setup
+  - [ ] Store embeddings in `error_clusters` documents
+  - [ ] Cosine similarity query function
+  - [ ] Similar error lookup by vector distance
+- [ ] **1.2.3** Clustering algorithm
+  - [ ] HDBSCAN for density-based clustering
+  - [ ] DBSCAN fallback for smaller datasets
+  - [ ] Auto-determine cluster count (no manual k)
+  - [ ] Re-cluster nightly as new errors arrive
+
+### 1.3 Telemetry
Ingestion for Context
+
+- [ ] **1.3.1** Link telemetry to errors
+  - [ ] `correlationId` propagation across services
+  - [ ] 5-minute window: error → preceding telemetry events
+  - [ ] Session state reconstruction (what user was doing)
+- [ ] **1.3.2** Enrich error context
+  - [ ] Device info (OS version, model, memory)
+  - [ ] App state (screen, feature flags, config)
+  - [ ] Recent API calls (network trace from diagnostics)
+  - [ ] Recent user actions (breadcrumb trail)
+
+**Phase 1 Exit Criteria:**
+
+- [ ] Errors auto-clustered with 90%+ accuracy
+- [ ] Vector search returns semantically similar errors
+- [ ] 10,000+ historical errors embedded and clustered
+- [ ] Correlation pipeline links errors to telemetry context
+
+---
+
+## Phase 2: LLM Analysis Engine (Week 1–2)
+
+### 2.1 Prompt Engineering & Analysis Pipeline
+
+- [ ] **2.1.1** Create analysis prompts
+  - [ ] `ROOT_CAUSE_ANALYSIS` prompt template
+
+    ```
+    Given this error cluster:
+    - Error signature: {fingerprint}
+    - Sample stack traces: {samples}
+    - Common context: {deviceStats}, {appState}
+    - Preceding events: {breadcrumbSummary}
+    - Similar resolved issues: {relatedClusters}
+
+    Analyze and provide:
+    1. Likely root cause category (config, dependency, logic, resource, external)
+    2. Specific hypothesis with reasoning
+    3. Evidence confidence (high/medium/low)
+    4. Suggested investigation steps
+    5. Potential fix direction
+    ```
+
+  - [ ] `PATTERN_SUMMARY` prompt for cluster descriptions
+  - [ ] `COMPARATIVE_ANALYSIS` for error vs.
baseline
+
+- [ ] **2.1.2** LLM integration
+  - [ ] Azure OpenAI GPT-4o-mini for analysis (cost-effective)
+  - [ ] GPT-4o for complex multi-factor analysis
+  - [ ] Response JSON schema enforcement
+  - [ ] Retry logic with exponential backoff
+
+### 2.2 Insight Generation Service
+
+- [ ] **2.2.1** Create `modules/ai-diagnostics/analyzer.ts`
+  - [ ] `analyzeCluster(clusterId)` - full analysis workflow
+  - [ ] `generateInsight(errorContext)` - single error analysis
+  - [ ] `compareClusters(clusterA, clusterB)` - diff analysis
+- [ ] **2.2.2** Analysis workflow
+  - [ ] Fetch cluster data + related telemetry
+  - [ ] Build LLM context (respect token limits)
+  - [ ] Call LLM with structured prompt
+  - [ ] Parse and validate response
+  - [ ] Store insight in `diagnostic_insights`
+- [ ] **2.2.3** Confidence scoring
+  - [ ] Evidence count weighting
+  - [ ] Similar resolved issue bonus
+  - [ ] Recency decay (older patterns = lower confidence)
+  - [ ] Multi-model consensus (if available)
+
+### 2.3 Continuous Learning
+
+- [ ] **2.3.1** Feedback loop
+  - [ ] Engineer feedback: "Was this insight helpful? 👍/👎"
+  - [ ] Resolution tracking (link commits to clusters)
+  - [ ] Confidence recalibration based on outcomes
+- [ ] **2.3.2** Pattern accumulation
+  - [ ] "Known issues" database (manually curated)
+  - [ ] Historical fix patterns (what solved similar issues)
+  - [ ] Regression detection (old issue reappearing)
+
+**Phase 2 Exit Criteria:**
+
+- [ ] LLM generates root cause hypotheses with evidence
+- [ ] Confidence scores align with actual resolution rates
+- [ ] Analysis completes in < 5 seconds for typical clusters
+- [ ] Feedback loop capturing engineer ratings
+
+---
+
+## Phase 3: Natural Language Query Interface (Week 2)
+
+### 3.1 Query Understanding
+
+- [ ] **3.1.1** Create `modules/ai-diagnostics/query-parser.ts`
+  - [ ] Intent classification (root cause, pattern search, comparison, trend)
+  - [ ] Entity extraction (product, time range, error type, user segment)
+  - [ ] Temporal parsing ("yesterday", "last week", "since v2.1")
+  - [ ] Constraint identification ("only iOS", "excluding beta users")
+- [ ] **3.1.2** Query patterns
+  - [ ] Root cause: _"Why did X happen?"_ → analyze cluster
+  - [ ] Pattern search: _"Show me similar crashes"_ → vector search
+  - [ ] Comparison: _"Did error rate increase after release?"_ → trend analysis
+  - [ ] User impact: _"How many users affected by Y?"_ → aggregation query
+
+### 3.2 Query Execution Engine
+
+- [ ] **3.2.1** Query → data pipeline
+  - [ ] Map entities to Cosmos queries
+  - [ ] Fetch relevant clusters, telemetry, sessions
+  - [ ] Assemble context for response generation
+- [ ] **3.2.2** Response generation
+  - [ ] Direct answers for simple queries
+  - [ ] AI-generated summaries for complex analysis
+  - [ ] Data + visualization suggestions
+  - [ ] Drill-down links for exploration
+
+### 3.3 REST API Routes
+
+- [ ] **3.3.1** Create `modules/ai-diagnostics/routes.ts`
+  - [ ] `POST /ai-diagnostics/query` - natural language question
+  - [ ] `GET /ai-diagnostics/clusters/:id/analysis` -
pre-computed insight
+  - [ ] `POST /ai-diagnostics/clusters/:id/analyze` - trigger fresh analysis
+  - [ ] `GET /ai-diagnostics/suggestions` - auto-suggested investigations
+  - [ ] `POST /ai-diagnostics/feedback` - submit insight rating
+
+**Phase 3 Exit Criteria:**
+
+- [ ] Natural language queries parse correctly (90%+ intent accuracy)
+- [ ] Query → response pipeline < 3 seconds
+- [ ] Complex queries return structured answers with evidence
+- [ ] API routes tested and documented
+
+---
+
+## Phase 4: Admin Dashboard UI (Week 2–3)
+
+### 4.1 AI Insights Page
+
+- [ ] **4.1.1** Create `/ops/ai-diagnostics/page.tsx`
+  - [ ] Smart search bar (natural language input)
+  - [ ] Suggested queries based on recent errors
+  - [ ] Recent AI-generated insights list
+  - [ ] Trending clusters (auto-detected anomalies)
+- [ ] **4.1.2** Query results view
+  - [ ] AI-generated answer with confidence badge
+  - [ ] Supporting evidence cards (cluster stats, sample errors)
+  - [ ] Related debug sessions (linked traces)
+  - [ ] Timeline visualization of error pattern
+  - [ ] "Investigate further" actions
+
+### 4.2 Cluster Detail with AI Analysis
+
+- [ ] **4.2.1** Enhance error cluster detail
+  - [ ] AI-generated summary card ("This appears to be...")
+  - [ ] Root cause hypothesis with confidence
+  - [ ] Evidence breakdown (stack samples, device patterns, API failures)
+  - [ ] Suggested fixes from similar resolved issues
+  - [ ] "Request deeper analysis" button (GPT-4o)
+- [ ] **4.2.2** Interactive investigation
+  - [ ] Compare with other clusters ("Show me similar issues")
+  - [ ] Filter by context (OS version, app version, feature flags)
+  - [ ] View affected user journeys (breadcrumb trails)
+
+### 4.3 Proactive Alerts
+
+- [ ] **4.3.1** Anomaly detection
+  - [ ] Auto-detect emerging error clusters
+  - [ ] Spike in existing cluster frequency
+  - [ ] New error types after releases
+- [ ] **4.3.2** AI-generated alerts
+  - [ ] Slack/Teams notification with summary
+  - [ ] "Investigate in AI Diagnostics" deep link
+  - [ ] Auto-started debug session recommendations
+
+**Phase 4 Exit Criteria:**
+
+- [ ] Admin can ask questions and get AI-generated answers
+- [ ] Cluster detail shows AI analysis with evidence
+- [ ] Proactive alerts for emerging issues
+- [ ] Full test coverage (UI + API)
+
+---
+
+## Phase 5: Advanced Capabilities (Future)
+
+### 5.1 Multi-Modal Analysis
+
+- [ ] Analyze screenshots from debug sessions for UI issues
+- [ ] Voice transcription analysis (for voice app errors)
+- [ ] Performance trace visualization with AI annotations
+
+### 5.2 Predictive Diagnostics
+
+- [ ] Pre-crash pattern detection (warn before crash happens)
+- [ ] Resource exhaustion prediction (memory, disk, API quotas)
+- [ ] Config drift detection ("this setting combination often fails")
+
+### 5.3 Self-Healing Suggestions
+
+- [ ] Auto-generated config recommendations
+- [ ] Feature flag rollback suggestions
+- [ ] Circuit breaker threshold recommendations
+
+## Implementation Tracking
+
+| Phase | Task | Status | Commit |
+| ----- | -------------------------- | ------ | ------ |
+| 1.1 | Error clustering types | ⬜ | - |
+| 1.1 | Cosmos containers | ⬜ | - |
+| 1.1 | Error normalization | ⬜ | - |
+| 1.2 | Embedding pipeline | ⬜ | - |
+| 1.2 | Vector search setup | ⬜ | - |
+| 1.2 | Clustering algorithm | ⬜ | - |
+| 1.3 | Telemetry linking | ⬜ | - |
+| 1.3 | Error context enrichment | ⬜ | - |
+| 2.1 | Analysis prompts | ⬜ | - |
+| 2.1 | LLM integration | ⬜ | - |
+| 2.2 | Insight generation service | ⬜ | - |
+| 2.2 | Analysis workflow | ⬜ | - |
+| 2.2 | Confidence scoring | ⬜ | - |
+| 2.3 | Feedback loop | ⬜ | - |
+| 2.3 | Pattern accumulation | ⬜ | - |
+| 3.1 | Query parser | ⬜ | - |
+| 3.1 | Query patterns | ⬜ | - |
+| 3.2 | Query execution | ⬜ | - |
+| 3.2 | Response generation | ⬜ | - |
+| 3.3 | REST API routes | ⬜ | - |
+| 4.1 | AI insights page | ⬜ | - |
+| 4.1 | Query results view | ⬜ | - |
+|
4.2 | Cluster detail | ⬜ | - |
+| 4.2 | Interactive investigation | ⬜ | - |
+| 4.3 | Proactive alerts | ⬜ | - |
+
+**Legend:** ⬜ Not started | 🟡 In progress | ✅ Complete | ⏸️ Deferred
+
+---
+
+## Quick Reference for Implementing Agent
+
+**📋 Full Roadmap:** `/Users/sd9235/code/mygh/learning_ai_common_plat/docs/roadmaps/AI_DIAGNOSTIC_ASSISTANT_ROADMAP.md`
+
+**Key Files to Modify/Create:**
+
+```
+services/platform-service/
+├── src/
+│   ├── modules/ai-diagnostics/
+│   │   ├── types.ts # [1.1.1] Error clustering types
+│   │   ├── repository.ts # [1.2] Data access layer
+│   │   ├── analyzer.ts # [2.2] LLM analysis engine
+│   │   ├── query-parser.ts # [3.1] NL query understanding
+│   │   ├── query-executor.ts # [3.2] Query execution
+│   │   ├── routes.ts # [3.3] REST API
+│   │   └── ai-diagnostics.test.ts # Tests
+│   ├── lib/
+│   │   ├── cosmos-init.ts # [1.1.2] Add containers
+│   │   ├── embedding-client.ts # [1.2.1] Azure OpenAI embeddings
+│   │   └── pii-redaction.ts # Reuse existing
+│   └── server.ts # [3.3] Register routes
+dashboards/admin-web/
+├── src/
+│   ├── app/(dashboard)/
+│   │   ├── ai-diagnostics/
+│   │   │   ├── page.tsx # [4.1] Main insights page
+│   │   │   └── [id]/
+│   │   │       └── page.tsx # [4.2] Cluster detail
+│   ├── lib/
+│   │   └── ai-diagnostics-client.ts # API client
+│   └── components/
+│       └── ai-diagnostics/ # Reusable components
+```
+
+**Commit Message Format:**
+
+```
+feat(ai-diagnostics): []
+```
+
+**Example:**
+
+```bash
+git add services/platform-service/src/modules/ai-diagnostics/
+git commit -m "feat(ai-diagnostics): add error clustering types and cosmos containers [1.1.1-1.1.2]"
+```
+
+**Testing Requirements:**
+
+- Unit tests: 20+ Vitest tests for clustering, embeddings, LLM responses
+- Integration tests: End-to-end query → analysis pipeline
+
+**Dependencies:**
+
+- Telemetry module (error
events)
+- Azure OpenAI (embeddings + GPT-4o)
+- Existing diagnostics module (optional linking)
+
+---
+
+### ErrorClusterDoc
+
+```typescript
+interface ErrorClusterDoc {
+  id: string; // ec_
+  productId: string; // partition key
+  fingerprintHash: string; // SHA-256 of normalized error
+
+  // Cluster metadata
+  firstSeenAt: string; // ISO 8601
+  lastSeenAt: string;
+  occurrenceCount: number; // Total occurrences
+  uniqueUsers: number; // Affected user count
+
+  // Error signature
+  errorType: string; // Exception class/name
+  messageTemplate: string; // Normalized message with placeholders
+  stackSignature: string; // Normalized stack frames
+
+  // Vector embedding for semantic search
+  embedding: number[]; // 1536-dim from text-embedding-3-small
+  embeddingVersion: string; // Model version for re-embedding
+
+  // Context patterns (auto-extracted)
+  commonContext: {
+    osVersions: Array<{ version: string; count: number }>;
+    appVersions: Array<{ version: string; count: number }>;
+    deviceModels: Array<{ model: string; count: number }>;
+    screenContexts: Array<{ screen: string; count: number }>;
+  };
+
+  // Related data
+  relatedClusterIds: string[]; // Similar clusters (vector similarity)
+  mergedIntoClusterId?: string; // If deduplicated
+
+  // Resolution tracking
+  status: 'active' | 'investigating' | 'resolved' | 'ignored';
+  resolvedAt?: string;
+  resolutionCommit?: string; // Link to fix
+
+  // Timestamps
+  createdAt: string;
+  updatedAt: string;
+  ttl: number; // 90 days
+}
+```
+
+### DiagnosticInsightDoc
+
+```typescript
+interface DiagnosticInsightDoc {
+  id: string; // di_
+  clusterId: string; // partition key (with productId)
+  productId: string;
+
+  // AI-generated analysis
+  analysisType: 'root_cause' | 'pattern' | 'comparison' | 'trend';
+  generatedAt: string;
+
+  // LLM output
+  rootCauseCategory: 'config' | 'dependency' | 'logic' | 'resource' | 'external' | 'unknown';
+  hypothesis: string; // Natural language explanation
+  reasoning: string; // Why
LLM thinks this
+  confidence: 'high' | 'medium' | 'low';
+  confidenceScore: number; // 0.0–1.0
+
+  // Evidence
+  evidence: Array<{
+    type:
+      | 'stack_trace'
+      | 'telemetry_pattern'
+      | 'device_correlation'
+      | 'api_failure'
+      | 'similar_issue';
+    description: string;
+    strength: 'strong' | 'moderate' | 'weak';
+    data: Record<string, unknown>;
+  }>;
+
+  // Suggested actions
+  suggestedInvestigation: string[];
+  potentialFixDirection?: string;
+  similarResolvedIssues?: Array<{
+    clusterId: string;
+    resolution: string;
+    confidence: number;
+  }>;
+
+  // Feedback
+  feedbackStats: {
+    helpful: number;
+    notHelpful: number;
+    engineerNotes: string[];
+  };
+
+  // LLM metadata
+  modelUsed: string; // gpt-4o, gpt-4o-mini
+  promptTokens: number;
+  completionTokens: number;
+
+  createdAt: string;
+  ttl: number; // 90 days
+}
+```
+
+### NaturalLanguageQueryDoc
+
+```typescript
+interface NaturalLanguageQueryDoc {
+  id: string; // nq_
+  userId: string; // Admin who asked
+  productId?: string; // Optional filter
+
+  // Query
+  rawQuery: string; // "Why did iOS keyboard crash yesterday?"
+  parsedIntent: 'root_cause' | 'pattern_search' | 'comparison' | 'trend' | 'impact';
+  extractedEntities: {
+    products?: string[];
+    timeRange?: { start: string; end: string };
+    errorTypes?: string[];
+    platforms?: string[];
+    userSegments?: string[];
+  };
+
+  // Execution
+  executedQuery: string; // Translated Cosmos query
+  dataSources: string[]; // Clusters, telemetry, sessions accessed
+  executionTimeMs: number;
+
+  // Response
+  aiResponse: string; // Generated answer
+  confidence: number; // Overall confidence
+  supportingData: Array<{
+    type: 'cluster' | 'telemetry' | 'session';
+    id: string;
+    relevanceScore: number;
+  }>;
+
+  // Feedback
+  userRating?: 'helpful' | 'not_helpful';
+  userComment?: string;
+
+  createdAt: string;
+  ttl: number; // 30 days
+}
+```
+
+---
+
+## Appendix B: API Reference
+
+| Method | Endpoint | Auth | Description |
+| ------ | --------------------------------------- | ----- | --------------------------------------- |
+| POST | `/ai-diagnostics/query` | Admin | Natural language diagnostic query |
+| GET | `/ai-diagnostics/clusters` | Admin | List error clusters (with AI summaries) |
+| GET | `/ai-diagnostics/clusters/:id` | Admin | Cluster detail with AI analysis |
+| POST | `/ai-diagnostics/clusters/:id/analyze` | Admin | Trigger fresh LLM analysis |
+| GET | `/ai-diagnostics/clusters/:id/analysis` | Admin | Get pre-computed insight |
+| GET | `/ai-diagnostics/suggestions` | Admin | AI-suggested investigations |
+| POST | `/ai-diagnostics/feedback` | Admin | Rate insight helpfulness |
+| POST | `/ai-diagnostics/search` | Admin | Semantic search across errors |
+
+---
+
+## Appendix C: Integration Points
+
+### With Telemetry Module
+
+- Error events auto-create/update clusters
+- Telemetry context enriches error analysis
+- Correlation IDs link errors to user journeys
+
+### With Diagnostics Module
+
+- Debug sessions linked to error clusters
+- Screenshots from sessions aid visual analysis
+- Network traces provide API failure
context
+
+### With Event Bus
+
+| Event | Action |
+| ------------------------------- | --------------------------------------------------------- |
+| `telemetry.error.ingested` | Update/create cluster, trigger re-analysis if new pattern |
+| `diagnostics.session.completed` | Link session to related clusters, analyze captured logs |
+| `diagnostics.ingest.fatal` | High-priority cluster analysis, alert if novel pattern |
+
+---
+
+## Appendix D: Cost Estimation
+
+| Component | Monthly Cost (est.) |
+| ------------------------ | ------------------------------- |
+| Azure OpenAI embeddings | $50–100 (10K errors/day) |
+| GPT-4o-mini analysis | $100–200 (1K analyses/day) |
+| GPT-4o deep analysis | $50–100 (100 deep analyses/day) |
+| Cosmos DB vector storage | $20–50 |
+| **Total** | **$220–450/month** |
+
+Optimization:
+
+- Cache frequent cluster analyses (24hr TTL)
+- Use GPT-4o-mini for 90% of queries
+- Batch embedding jobs during off-peak
+
+---
+
+## Current Status
+
+- [ ] **Design complete** - Target: 2026-03-10
+- [ ] **Phase 1: Data Pipeline** - Not started
+- [ ] **Phase 2: LLM Engine** - Not started
+- [ ] **Phase 3: Query Interface** - Not started
+- [ ] **Phase 4: Admin UI** - Not started
+- [ ] **Phase 5: Advanced Capabilities** - Future
+
+**Estimated Timeline:** 2–3 weeks (Phases 1–4)
+
+**Dependencies:**
+
+- Telemetry module (must be collecting errors)
+- Diagnostics module (optional, for rich context)
+- Azure OpenAI deployment (embedding + GPT-4o access)
+
+---
+
+_Last Updated: 2026-03-03_
diff --git a/docs/roadmaps/INTELLIGENT_AB_TESTING_ROADMAP.md b/docs/roadmaps/INTELLIGENT_AB_TESTING_ROADMAP.md
new file mode 100644
index 00000000..abe3750a
--- /dev/null
+++ b/docs/roadmaps/INTELLIGENT_AB_TESTING_ROADMAP.md
@@ -0,0 +1,719 @@
+# Intelligent A/B Testing - Implementation Roadmap
+
+> **Module:** `platform-service/src/modules/ab-testing/`
+> **Admin UI:** `/ops/experiments/`
+> **Target:** AI-powered experiment management with
auto-allocation, early stopping, and hypothesis generation
+> **Estimated Effort:** 2.5–3 weeks
+> **Status:** 🟡 Planning
+
+---
+
+## Executive Summary
+
+This roadmap delivers an **intelligent A/B testing platform** that goes beyond traditional feature flags. Unlike manual percentage rollouts, this system uses statistical algorithms for **Thompson sampling**-based auto-allocation, **Bayesian early stopping** when variants clearly win/lose, and **LLM-powered hypothesis generation** from feature flag usage patterns.
+
+### Key Differentiators vs. Static Feature Flags
+
+| Capability | Static Flags (Current) | Intelligent A/B Testing |
+| ------------------ | ---------------------- | ----------------------------------------- |
+| Traffic Allocation | Manual percentage | **Multi-armed bandit optimization** |
+| Stopping Decision | Manual monitoring | **Auto-stop at statistical significance** |
+| Winner Selection | Human judgment | **Bayesian probability of superiority** |
+| Test Duration | Fixed (often wrong) | **Dynamic based on effect size** |
+| Hypothesis | Human-written | **AI-generated from usage patterns** |
+| Sample Size | Guesswork | **Power analysis + sequential testing** |
+
+---
+
+## Phase 1: Core Experiment Engine (Week 1)
+
+### 1.1 Data Model & Schemas
+
+- [ ] **1.1.1** Create `modules/ab-testing/types.ts`
+  - [ ] `ExperimentDoc` - experiment definition and config
+  - [ ] `VariantDoc` - variant metadata + metrics
+  - [ ] `AssignmentDoc` - user → variant assignments
+  - [ ] `MetricDoc` - event types being tracked
+  - [ ] `ExperimentResult` - statistical analysis results
+  - [ ] Zod schemas for all inputs
+- [ ] **1.1.2** Add Cosmos containers to `cosmos-init.ts`
+  - [ ] `experiments` (pk: `/productId`, TTL: 2 years for completed)
+  - [ ] `experiment_variants` (pk: `/experimentId`)
+  - [ ] `experiment_assignments` (pk: `/userId`, query by experiment)
+  - [ ] `experiment_events` (pk: `/experimentId` + `/timestamp` for time-series)
+  - [
] `experiment_metrics` (pk: `/experimentId`, computed aggregates)
+
+### 1.2 Assignment & Bucketing
+
+- [ ] **1.2.1** Create deterministic bucketing
+  - [ ] Consistent hashing (userId + experimentId → variant)
+  - [ ] FNV-1a hash algorithm (same as feature flags)
+  - [ ] Sticky assignments (user always sees same variant)
+  - [ ] Override capability (force specific variant for QA)
+- [ ] **1.2.2** Assignment strategies
+  - [ ] `random` - Simple randomization (control vs static)
+  - [ ] `thompson` - Thompson sampling (multi-armed bandit)
+  - [ ] `epsilon_greedy` - Epsilon-greedy exploration
+  - [ ] `ucb` - Upper Confidence Bound algorithm
+- [ ] **1.2.3** Audience targeting
+  - [ ] User property filters (platform, version, region, subscription tier)
+  - [ ] Percentage rollout within target segment
+  - [ ] Exclusion lists (beta users, internal accounts)
+
+### 1.3 Event Tracking Pipeline
+
+- [ ] **1.3.1** Metric definitions
+  - [ ] `conversion` - Binary (did/didn't convert)
+  - [ ] `count` - Integer events (sessions, messages)
+  - [ ] `duration` - Time-based (session length, task time)
+  - [ ] `revenue` - Monetary (purchase amount, LTV)
+  - [ ] `custom` - Arbitrary numeric values
+- [ ] **1.3.2** Event ingestion
+  - [ ] `POST /ab-testing/events` batch endpoint
+  - [ ] Client SDK: `track(event, value, metadata)`
+  - [ ] Automatic attribution (which variant caused this event)
+  - [ ] Deduplication (eventId + userId uniqueness)
+
+**Phase 1 Exit Criteria:**
+
+- [ ] Experiments created with multiple variants
+- [ ] Users consistently assigned to variants
+- [ ] Events tracked and attributed correctly
+- [ ] 20+ tests for assignment and ingestion
+
+---
+
+## Phase 2: Statistical Analysis Engine (Week 1–2)
+
+### 2.1 Bayesian Inference
+
+- [ ] **2.1.1** Create `modules/ab-testing/statistics.ts`
+  - [ ] `BetaDistribution` for conversion rates
+  - [ ] `GammaDistribution` for count/duration metrics
+  - [ ] `NormalDistribution` for continuous metrics
+
- [ ] Monte Carlo simulation (10,000 samples)
+- [ ] **2.1.2** Probability calculations
+  - [ ] `probabilityVariantBeatsControl(variant, control)`
+  - [ ] `expectedLossIfChosen(variant)`
+  - [ ] `probabilityBeatAllVariants(variant)`
+- [ ] **2.1.3** Credible intervals
+  - [ ] 95% credible interval for each variant's true metric
+  - [ ] Visualization-ready (lower, mean, upper bounds)
+
+### 2.2 Early Stopping Rules
+
+- [ ] **2.2.1** Stopping criteria
+  - [ ] **Winner found:** Variant has > 95% probability of beating control
+  - [ ] **Loser clear:** Control has > 95% probability of beating variant
+  - [ ] **Practical significance:** Minimum detectable effect not reached
+  - [ ] **Time bound:** Max duration reached (safety limit)
+- [ ] **2.2.2** Auto-promotion
+  - [ ] Auto-rollout winner to 100% when threshold hit
+  - [ ] Notify admins via Slack/email
+  - [ ] Create audit log entry
+- [ ] **2.2.3** Guardrails
+  - [ ] Minimum sample size before early stopping (100 users/variant)
+  - [ ] Business hours only for auto-actions
+  - [ ] Require approval for revenue-impacting experiments
+
+### 2.3 Thompson Sampling
+
+- [ ] **2.3.1** Multi-armed bandit implementation
+  - [ ] Sample from posterior distributions
+  - [ ] Assign user to variant with highest sample
+  - [ ] Re-balance traffic every hour based on performance
+- [ ] **2.3.2** Exploration vs exploitation
+  - [ ] Exploration rate decays over time
+  - [ ] High uncertainty = more exploration
+  - [ ] Clear winner = more traffic to winner
+- [ ] **2.3.3** Regret minimization
+  - [ ] Track cumulative regret vs optimal variant
+  - [ ] Regret bounds reporting
+
+**Phase 2 Exit Criteria:**
+
+- [ ] Bayesian probabilities calculated correctly
+- [ ] Early stopping triggers at appropriate thresholds
+- [ ] Thompson sampling re-allocates traffic dynamically
+- [ ] Statistical tests validate correctness
+
+---
+
+## Phase 3: AI-Powered Hypothesis Generation (Week 2)
+
+### 3.1 Pattern Detection
+
+- [ ] **3.1.1** Usage pattern
analysis
+  - [ ] Analyze feature flag usage telemetry
+  - [ ] Segment analysis (iOS vs Android, free vs pro)
+  - [ ] Temporal patterns (day of week, time of day)
+  - [ ] User behavior sequences (funnel analysis)
+- [ ] **3.1.2** Anomaly detection
+  - [ ] Unexpected drop in feature adoption
+  - [ ] Performance regression signals
+  - [ ] User segment showing different behavior
+- [ ] **3.1.3** Opportunity identification
+  - [ ] Underperforming features (low adoption)
+  - [ ] High-dropoff flows
+  - [ ] Competitor feature gaps
+
+### 3.2 Hypothesis Generation
+
+- [ ] **3.2.1** LLM hypothesis prompts
+
+  ```
+  Given this feature usage data:
+  - Feature: {featureName}
+  - Current adoption: {adoptionRate}% (baseline: {baseline}%)
+  - Segment performance: {segmentData}
+  - User feedback: {feedbackSamples}
+  - Competitor analysis: {competitorFeatures}
+
+  Generate experiment hypotheses:
+  1. Primary hypothesis: "Changing X will improve Y because..."
+  2. Secondary hypotheses (2-3 alternatives)
+  3. Expected effect size (conservative estimate)
+  4. Success metric recommendation
+  5.
Risk assessment
+  ```
+
+- [ ] **3.2.2** Hypothesis ranking
+  - [ ] Expected impact scoring
+  - [ ] Implementation difficulty estimate
+  - [ ] Statistical power prediction
+  - [ ] Risk-adjusted expected value
+- [ ] **3.2.3** Suggested experiment design
+  - [ ] Variant count recommendation
+  - [ ] Traffic allocation suggestion
+  - [ ] Duration estimate
+  - [ ] Required sample size calculation
+
+### 3.3 Auto-Experiment Suggestions
+
+- [ ] **3.3.1** Weekly AI reports
+  - [ ] Top 5 experiment opportunities
+  - [ ] Hypotheses with supporting evidence
+  - [ ] Prioritized by expected impact
+- [ ] **3.3.2** One-click experiment creation
+  - [ ] Pre-fill experiment from hypothesis
+  - [ ] Suggested variants with descriptions
+  - [ ] Pre-configured metrics
+
+**Phase 3 Exit Criteria:**
+
+- [ ] AI generates meaningful hypotheses from usage data
+- [ ] Hypothesis quality rated by product team (80%+ useful)
+- [ ] Auto-suggested experiments created in 1 click
+- [ ] Weekly reports generated automatically
+
+---
+
+## Phase 4: Admin Dashboard UI (Week 2–3)
+
+### 4.1 Experiments List Page
+
+- [ ] **4.1.1** Create `/ops/experiments/page.tsx`
+  - [ ] Experiment cards (status, duration, sample size)
+  - [ ] Quick filters (running, completed, draft)
+  - [ ] AI-generated hypothesis badge
+  - [ ] Health indicators (traffic balance, event flow)
+- [ ] **4.1.2** Experiment creation wizard
+  - [ ] Step 1: Define hypothesis (AI suggestions available)
+  - [ ] Step 2: Create variants (name, description, config)
+  - [ ] Step 3: Select metrics (primary + secondary)
+  - [ ] Step 4: Audience targeting
+  - [ ] Step 5: Traffic allocation (manual or Thompson)
+  - [ ] Step 6: Review and launch
+
+### 4.2 Live Experiment Dashboard
+
+- [ ] **4.2.1** Create `/ops/experiments/[id]/page.tsx`
+  - [ ] Real-time metrics comparison
+  - [ ] Variant performance table (conversions, counts, durations)
+  - [ ] Bayesian probability visualization
+  - [ ] Credible interval charts
+- [ ] **4.2.2** Statistical
summary card
+  - [ ] Probability of beating control (per variant)
+  - [ ] Expected lift if implemented
+  - [ ] Sample size progress bar
+  - [ ] Days to significance estimate
+- [ ] **4.2.3** Action buttons
+  - [ ] Adjust traffic allocation
+  - [ ] Pause/resume experiment
+  - [ ] Stop and declare winner
+  - [ ] Rollout winner to 100%
+  - [ ] Archive experiment
+
+### 4.3 Results & Reporting
+
+- [ ] **4.3.1** Results page
+  - [ ] Final statistical summary
+  - [ ] Variant comparison visualization
+  - [ ] Segment breakdown (iOS vs Android, etc.)
+  - [ ] Confidence intervals over time
+- [ ] **4.3.2** AI insights panel
+  - [ ] Why this result occurred (LLM summary)
+  - [ ] Unexpected findings
+  - [ ] Follow-up experiment suggestions
+- [ ] **4.3.3** Export capabilities
+  - [ ] CSV export of raw data
+  - [ ] PDF report generation
+  - [ ] API endpoint for data warehouse sync
+
+**Phase 4 Exit Criteria:**
+
+- [ ] Full experiment lifecycle manageable in UI
+- [ ] Real-time stats visible and accurate
+- [ ] Bayesian visualizations clear to non-statisticians
+- [ ] Export and reporting functional
+
+---
+
+## Phase 5: Advanced Capabilities (Future)
+
+### 5.1 Multi-Variate Testing
+
+- [ ] Test multiple variables simultaneously
+- [ ] Full factorial and fractional factorial designs
+- [ ] Interaction effect detection
+
+### 5.2 Sequential Experimentation
+
+- [ ] Multi-phase experiments (qualification → main → validation)
+- [ ] Holdout groups for long-term validation
+- [ ] Global holdout (never-exposed users)
+
+### 5.3 Personalization Layer
+
+- [ ] Contextual bandits (different variants for different users)
+- [ ] ML model for variant selection
+- [ ] Automatic personalization optimization
+
+### 5.4 Experiment Coordination
+
+- [ ] Mutually exclusive experiments
+- [ ] Experiment priority rules
+- [ ] Layered experimentation (orthogonal tests)
+
+---
+
+## Appendix A: Data Models
+
+### ExperimentDoc
+
+```typescript
+interface ExperimentDoc {
+  id: string; // exp_
+  productId: string; // partition key
+
+  // Experiment definition
+  name: string;
+  description: string;
+  hypothesis: string;
+  aiGeneratedHypothesis?: boolean; // True when the hypothesis was AI-suggested
+
+  // Status lifecycle: draft → running → paused | stopped | completed
+  status: 'draft' | 'running' | 'paused' | 'stopped' | 'completed';
+
+  // Variants
+  controlVariantId: string; // Baseline variant
+  variantIds: string[]; // All variant IDs
+
+  // Configuration
+  allocationStrategy: 'random' | 'thompson' | 'epsilon_greedy' | 'ucb';
+  targetPercent: number; // % of eligible traffic
+
+  // Audience targeting
+  targeting: {
+    platforms?: string[]; // ios, android, web
+    appVersions?: { min: string; max?: string };
+    regions?: string[];
+    userSegments?: string[]; // pro, free, enterprise
+    userProperties?: Record<string, unknown>;
+  };
+
+  // Metrics
+  primaryMetric: {
+    name: string;
+    type: 'conversion' | 'count' | 'duration' | 'revenue' | 'custom';
+    eventName: string; // Telemetry event to track
+    aggregation: 'sum' | 'mean' | 'count' | 'unique';
+    direction: 'increase' | 'decrease'; // Is higher better?
+    minimumDetectableEffect: number; // % change we want to detect
+  };
+  secondaryMetrics: Array<{
+    name: string;
+    type: 'conversion' | 'count' | 'duration' | 'revenue' | 'custom';
+    eventName: string;
+  }>;
+
+  // Guardrails
+  guardrails: {
+    minSampleSizePerVariant: number; // Default: 100
+    maxDurationDays: number; // Safety limit, default: 30
+    autoStopEnabled: boolean;
+    winnerThreshold: number; // % probability to auto-stop, default: 95
+    requireApprovalFor: 'none' | 'revenue' | 'all';
+  };
+
+  // Scheduling
+  startAt?: string; // Scheduled start (ISO 8601)
+  endAt?: string; // Scheduled end or actual stop
+
+  // Stats (denormalized for fast reads)
+  totalParticipants: number;
+  totalEvents: number;
+
+  // Timestamps
+  createdAt: string;
+  updatedAt: string;
+  startedAt?: string;
+  completedAt?: string;
+  ttl: number; // 2 years for completed
+}
+```
+
+### VariantDoc
+
+```typescript
+interface VariantDoc {
+  id: string; // var_
+  experimentId: string; // partition key
+
+  // Variant definition
+  name: string; // "Control", "New Button Color", etc.
+  description?: string;
+  isControl: boolean;
+
+  // Feature flag configuration
+  flagConfig: Record<string, unknown>; // Arbitrary config payload
+
+  // Traffic allocation (dynamic for bandit strategies)
+  currentAllocationPercent: number; // 0–100%
+
+  // Statistics (real-time computed)
+  stats: {
+    participants: number;
+    events: number;
+
+    // Primary metric
+    primaryMetricValue: number; // Mean or conversion rate
+    primaryMetricStdDev?: number;
+
+    // For conversion metrics
+    conversions?: number;
+    conversionRate?: number; // 0–1
+
+    // Bayesian posterior parameters
+    betaAlpha?: number; // For Beta distribution
+    betaBeta?: number;
+
+    gammaShape?: number; // For Gamma distribution
+    gammaScale?: number;
+  };
+
+  // Bayesian results
+  bayesianResults?: {
+    probabilityBeatsControl: number; // 0–1
+    probabilityBeatsAll: number; // 0–1
+    expectedLiftPercent: number; // Relative to control
+    expectedLoss: number; // Risk of choosing this variant
+    credibleInterval: {
+      lower: number;
+      mean: number;
+      upper: number;
+    };
+  };
+
+  createdAt: string;
+  updatedAt: string;
+}
+```
+
+### ExperimentAssignmentDoc
+
+```typescript
+interface ExperimentAssignmentDoc {
+  id: string; // ea_
+  userId: string; // partition key (for user lookups)
+
+  experimentId: string;
+  variantId: string;
+
+  // Assignment metadata
+  assignedAt: string; // First assignment
+  firstExposedAt?: string; // First actual exposure (feature use)
+
+  // Context at assignment
+  assignmentContext: {
+    platform: string;
+    appVersion: string;
+    osVersion: string;
+    deviceModel?: string;
+    region?: string;
+  };
+
+  // Events attributed to this assignment
+  eventCount: number;
+  lastEventAt?: string;
+
+  // TTL: Remove after experiment completes + analysis period
+  ttl: number; // experimentEnd + 90 days
+}
+```
+
+### ExperimentEventDoc
+
+```typescript
+interface ExperimentEventDoc {
+  id: string; // ee_
+  experimentId: string; // partition key
+  timestamp: string; // Sort key for time-series queries
+
+  // Attribution
+  userId: string;
+  variantId: string;
+  assignmentId: string;
+
+  // Event details
+  metricName: string;
+  metricType: 'conversion' | 'count' | 'duration' | 'revenue' | 'custom';
+  value: number; // Numeric value
+
+  // Conversion tracking (for binary metrics)
+  converted: boolean; // For conversion metrics
+
+  // Context
+  eventMetadata?: Record<string, unknown>;
+
+  // Denormalized for filtering
+  platform: string;
+  appVersion: string;
+
+  // TTL: Shorter for raw events
+  ttl: number; // 90 days
+}
+```
+
+---
+
+## Implementation Tracking
+
+| Phase | Task                          | Status | Commit |
+| ----- | ----------------------------- | ------ | ------ |
+| 1.1   | Experiment types & schemas    | ⬜     | -      |
+| 1.1   | Cosmos containers             | ⬜     | -      |
+| 1.2   | Deterministic bucketing       | ⬜     | -      |
+| 1.2   | Assignment strategies         | ⬜     | -      |
+| 1.2   | Audience targeting            | ⬜     | -      |
+| 1.3   | Metric definitions            | ⬜     | -      |
+| 1.3   | Event ingestion               | ⬜     | -      |
+| 2.1   | Bayesian inference engine     | ⬜     | -      |
+| 2.1   | Probability calculations      | ⬜     | -      |
+| 2.1   | Credible intervals            | ⬜     | -      |
+| 2.2   | Early stopping rules          | ⬜     | -      |
+| 2.2   | Auto-promotion                | ⬜     | -      |
+| 2.2   | Guardrails                    | ⬜     | -      |
+| 2.3   | Thompson sampling             | ⬜     | -      |
+| 2.3   | Exploration vs exploitation   | ⬜     | -      |
+| 2.3   | Regret minimization           | ⬜     | -      |
+| 3.1   | Pattern detection             | ⬜     | -      |
+| 3.1   | Anomaly detection             | ⬜     | -      |
+| 3.2   | Hypothesis generation prompts | ⬜     | -      |
+| 3.2   | Hypothesis ranking            | ⬜     | -      |
+| 3.3   | Auto-experiment suggestions   | ⬜     | -      |
+| 4.1   | Experiments list page         | ⬜     | -      |
+| 4.1   | Creation wizard               | ⬜     | -      |
+| 4.2   | Live dashboard                | ⬜     | -      |
+| 4.2   | Statistical summary           | ⬜     | -      |
+| 4.3   | Results & reporting           | ⬜     | -      |
+| 4.3   | AI insights panel             | ⬜     | -      |
+
+**Legend:** ⬜ Not started | 🟡 In progress | ✅ Complete | ⏸️ Deferred
+
+---
+
+## Quick Reference for Implementing Agent
+
+**📋 Full Roadmap:**
`/Users/sd9235/code/mygh/learning_ai_common_plat/docs/roadmaps/INTELLIGENT_AB_TESTING_ROADMAP.md`
+
+**Key Files to Modify/Create:**
+
+```
+services/platform-service/
+├── src/
+│   ├── modules/ab-testing/
+│   │   ├── types.ts                 # [1.1] Experiment, Variant, Assignment types
+│   │   ├── repository.ts            # [1.2] Data access layer
+│   │   ├── bucketing.ts             # [1.2] FNV-1a hash, sticky assignments
+│   │   ├── statistics.ts            # [2.1] Bayesian inference, Beta/Normal distributions
+│   │   ├── allocation.ts            # [2.3] Thompson sampling, bandit strategies
+│   │   ├── hypothesis-generator.ts  # [3.2] LLM pattern analysis
+│   │   ├── routes.ts                # [4] REST API
+│   │   └── ab-testing.test.ts       # Tests
+│   ├── lib/
+│   │   └── cosmos-init.ts           # [1.1] Add containers
+│   └── server.ts                    # Register routes
+dashboards/admin-web/
+├── src/
+│   ├── app/(dashboard)/
+│   │   ├── experiments/
+│   │   │   ├── page.tsx             # [4.1] Experiments list
+│   │   │   ├── new/page.tsx         # [4.1] Creation wizard
+│   │   │   └── [id]/
+│   │   │       └── page.tsx         # [4.2] Live dashboard
+│   ├── lib/
+│   │   └── experiments-client.ts    # API client
+│   └── components/
+│       └── experiments/             # Bayesian charts, variant cards
+```
+
+**Commit Message Format:**
+
+```
+feat(ab-testing): []
+```
+
+**Example:**
+
+```bash
+git add services/platform-service/src/modules/ab-testing/
+git commit -m "feat(ab-testing): add experiment types and cosmos containers [1.1]"
+```
+
+**Testing Requirements:**
+
+- Unit tests: 25+ Vitest tests for bucketing, statistics, bandit algorithms
+- Statistical validation: A/A tests, known distribution tests
+- Integration: End-to-end experiment lifecycle
+
+**Dependencies:**
+
+- Feature flags module (reuse bucketing logic)
+- Telemetry module (event tracking)
+- Azure OpenAI (hypothesis generation)
+
+---
+
+## Appendix B: Statistical Methods
+
+### Bayesian A/B Testing
+
+**Conversion Metrics
(Beta-Binomial):**
+
+```
+Posterior: Beta(α + conversions, β + non-conversions)
+Where α = β = 1 (uniform prior)
+
+Probability variant beats control:
+P(variant > control) = ∫₀¹ BetaCDF_control(x) * BetaPDF_variant(x) dx
+                     ≈ (1/n) Σ(i=1 to n) [BetaCDF_control(x_i)] * [BetaPDF_variant(x_i)]
+Where x_i = (i - 0.5) / n (midpoint grid on [0, 1])
+```
+
+**Continuous Metrics (Normal):**
+
+```
+Posterior: Normal(μ_n, σ_n²)
+Where μ_n, σ_n updated via conjugate prior
+
+Probability variant beats control via Monte Carlo sampling
+```
+
+### Thompson Sampling
+
+```
+For each incoming user:
+  For each variant:
+    Sample θ_i from variant's posterior distribution
+  Assign user to variant with max(θ_i)
+
+Update variant's posterior after observing outcome
+```
+
+### Early Stopping
+
+```
+Stop experiment when:
+  max_variant P(beats control) > 0.95 → Winner found
+  OR max_variant P(beats control) < 0.05 → No winner
+  OR days_running > max_duration
+  AND samples_per_variant > min_sample_size
+```
+
+---
+
+## Appendix C: API Reference
+
+| Method | Endpoint                               | Auth     | Description                      |
+| ------ | -------------------------------------- | -------- | -------------------------------- |
+| POST   | `/ab-testing/experiments`              | Admin    | Create experiment                |
+| GET    | `/ab-testing/experiments`              | Admin    | List experiments                 |
+| GET    | `/ab-testing/experiments/:id`          | Admin    | Get experiment details           |
+| PATCH  | `/ab-testing/experiments/:id`          | Admin    | Update experiment                |
+| DELETE | `/ab-testing/experiments/:id`          | Admin    | Stop/archive experiment          |
+| POST   | `/ab-testing/experiments/:id/start`    | Admin    | Start experiment                 |
+| POST   | `/ab-testing/experiments/:id/pause`    | Admin    | Pause experiment                 |
+| POST   | `/ab-testing/experiments/:id/complete` | Admin    | Complete with winner             |
+| POST   | `/ab-testing/assign`                   | Any auth | Get variant assignment for user  |
+| POST   | `/ab-testing/events`                   | Any auth | Track experiment event           |
+| GET    | `/ab-testing/experiments/:id/results`  | Admin    | Get statistical results          |
+| GET    | `/ab-testing/suggestions`              | Admin    | AI-generated experiment ideas    |
+| POST   | `/ab-testing/hypotheses`               | Admin    | Generate hypothesis from pattern |
+
+---
+
+## Appendix D: Integration Points
+
+### With Feature Flags Module
+
+- Experiments build on feature flag infrastructure
+- Flag state = variant assignment
+- Consistent bucketing with existing flags
+
+### With Telemetry Module
+
+- Experiment events enriched with telemetry context
+- Automatic metric tracking from existing events
+- Funnel analysis using telemetry breadcrumbs
+
+### With Event Bus
+
+| Event                         | Action                              |
+| ----------------------------- | ----------------------------------- |
+| `ab.experiment.started`       | Notify stakeholders, log audit      |
+| `ab.experiment.completed`     | Generate report, suggest follow-ups |
+| `ab.variant.declared_winner`  | Trigger auto-rollout if enabled     |
+| `ab.early_stopping.triggered` | Alert experiment owner              |
+
+---
+
+## Appendix E: Cost Estimation
+
+| Component                    | Monthly Cost (est.)      |
+| ---------------------------- | ------------------------ |
+| Cosmos DB (experiment data)  | $100–200                 |
+| LLM hypothesis generation    | $50–100 (weekly reports) |
+| Compute (statistical engine) | $50 (negligible)         |
+| **Total**                    | **$200–350/month**       |
+
+---
+
+## Current Status
+
+- [ ] **Design complete** - Target: 2026-03-10
+- [ ] **Phase 1: Core Engine** - Not started
+- [ ] **Phase 2: Statistics** - Not started
+- [ ] **Phase 3: AI Hypotheses** - Not started
+- [ ] **Phase 4: Admin UI** - Not started
+- [ ] **Phase 5: Advanced** - Future
+
+**Estimated Timeline:** 2.5–3 weeks (Phases 1–4)
+
+**Dependencies:**
+
+- Feature flags module (for assignment infrastructure)
+- Telemetry module (for event tracking)
+- Azure OpenAI (for hypothesis generation)
+
+---
+
+_Last Updated: 2026-03-03_
diff --git a/docs/roadmaps/PREDICTIVE_CHURN_HEALTH_SCORING_ROADMAP.md b/docs/roadmaps/PREDICTIVE_CHURN_HEALTH_SCORING_ROADMAP.md
new file mode 100644
index 00000000..a54f97ee
--- /dev/null
+++ b/docs/roadmaps/PREDICTIVE_CHURN_HEALTH_SCORING_ROADMAP.md
@@ -0,0 +1,848 @@
+# Predictive Churn & Health Scoring: Implementation Roadmap
+
+> **Module:** `platform-service/src/modules/predictive-analytics/`
+> **Admin UI:** `/ops/health-dashboard/`
+> **Target:** ML-powered churn prediction, health scoring, and proactive retention
+> **Estimated Effort:** 3 weeks
+> **Status:** 🟡 Planning
+
+---
+
+## Executive Summary
+
+This roadmap delivers a **predictive analytics platform** that forecasts user churn 7–30 days in advance and computes product health scores from telemetry. Unlike reactive dashboards that show what happened, this system **predicts what will happen**, enabling proactive retention campaigns, resource allocation, and product improvements before users leave.
+
+### Key Differentiators vs. Reactive Analytics
+
+| Capability          | Traditional Analytics      | Predictive Churn & Health         |
+| ------------------- | -------------------------- | --------------------------------- |
+| Insight Type        | Historical (what happened) | **Predictive (what will happen)** |
+| Churn Detection     | After user leaves          | **7–30 days before churn**        |
+| Health View         | Current snapshot only      | **Trending + forecasted**         |
+| Interventions       | Reactive recovery          | **Proactive prevention**          |
+| Product Insights    | Manual pattern search      | **Auto-detected risk signals**    |
+| Resource Allocation | Guesswork                  | **Risk-weighted prioritization**  |
+
+---
+
+## Phase 1: Feature Engineering Pipeline (Week 1)
+
+### 1.1 Telemetry Feature Extraction
+
+- [ ] **1.1.1** Create `modules/predictive-analytics/feature-extractor.ts`
+  - [ ] User behavior features (session frequency, depth, recency)
+  - [ ] Engagement features (feature usage diversity, core action completion)
+  - [ ] Performance features (error rate, latency exposure, crash frequency)
+  - [ ] Social features (sharing, collaboration, network effects)
+  - [ ] Revenue features (payment history, plan changes, support tickets)
+- [ ] **1.1.2** Time-window aggregations
+  - [ ] Last 24 hours (recent behavior)
+  - [ ] Last 7
days (weekly patterns)
+  - [ ] Last 30 days (monthly trends)
+  - [ ] Life-to-date (all-time totals)
+- [ ] **1.1.3** Rolling window features
+  - [ ] 7-day rolling average (trend smoothing)
+  - [ ] Week-over-week change (acceleration)
+  - [ ] Cohort-normalized scores (vs. similar users)
+
+### 1.2 Feature Store
+
+- [ ] **1.2.1** Create `modules/predictive-analytics/feature-store.ts`
+  - [ ] `UserFeatureVector`: normalized feature values per user
+  - [ ] `ProductHealthMetrics`: aggregated product-level scores
+  - [ ] Feature versioning (track feature schema changes)
+- [ ] **1.2.2** Add Cosmos containers to `cosmos-init.ts`
+  - [ ] `user_features` (pk: `/userId`, TTL: 90 days)
+  - [ ] `product_health` (pk: `/productId` + `/date`, time-series)
+  - [ ] `feature_definitions` (pk: `/productId`, feature metadata)
+- [ ] **1.2.3** Feature computation jobs
+  - [ ] Daily feature computation (nightly batch)
+  - [ ] Real-time feature updates (on key events)
+  - [ ] Feature backfill (compute historical features)
+
+### 1.3 Product-Specific Feature Catalog
+
+- [ ] **1.3.1** Define features per product
+  - [ ] **NomGap:** Fast completion rate, protocol adherence, streak length, autophagy engagement
+  - [ ] **JarvisJr:** Session frequency, agent diversity, voice/text ratio, skill progression
+  - [ ] **ChronoMind:** Timer completion rate, cascade effectiveness, routine adherence, urgency response
+  - [ ] **MindLyst:** Brain usage diversity, triage accuracy, memory capture frequency, reflection completion
+  - [ ] **PeakPulse:** Session frequency, goal completion, streak maintenance, social sharing
+  - [ ] **LysnrAI:** Dictation frequency, accuracy rate, hotkey usage, vocabulary growth
+- [ ] **1.3.2** Feature importance tracking
+  - [ ] Which features correlate with churn/retention
+  - [ ] Feature drift detection (behavior changes over time)
+  - [ ] Auto-suggest new features based on patterns
+
+**Phase 1 Exit Criteria:**
+
+- [ ] 50+ features extracted per product
+- [ ]
Feature store populated for all active users
+- [ ] Daily feature computation job running
+- [ ] Feature importance analysis completed
+
+---
+
+## Phase 2: Churn Prediction Model (Week 1–2)
+
+### 2.1 Model Architecture
+
+- [ ] **2.1.1** Create `modules/predictive-analytics/churn-model.ts`
+  - [ ] Binary classification (will churn in 7 days? 30 days?)
+  - [ ] Gradient Boosted Trees (XGBoost/LightGBM) baseline
+  - [ ] Neural network ensemble (for comparison)
+- [ ] **2.1.2** Training pipeline
+  - [ ] Label definition: No activity for N days = churned
+  - [ ] Train/validation/test split (time-based, not random)
+  - [ ] Cross-validation with temporal folds
+  - [ ] Hyperparameter tuning (Optuna/Ray Tune)
+- [ ] **2.1.3** Model evaluation
+  - [ ] ROC-AUC (discrimination ability)
+  - [ ] Precision/Recall at different thresholds
+  - [ ] Calibration (predicted prob vs. actual rate)
+  - [ ] Per-product performance breakdown
+
+### 2.2 Prediction Service
+
+- [ ] **2.2.1** Real-time scoring API
+  - [ ] `POST /predictive/churn-score`: single user prediction
+  - [ ] `POST /predictive/churn-batch`: batch scoring
+  - [ ] Latency < 100ms for single prediction
+- [ ] **2.2.2** Risk segmentation
+  - [ ] Risk buckets: Critical (>80%), High (60–80%), Medium (30–60%), Low (<30%)
+  - [ ] Risk score components (which features drive the score)
+  - [ ] Confidence intervals on predictions
+- [ ] **2.2.3** Model versioning
+  - [ ] A/B test model versions
+  - [ ] Shadow mode (predict without acting)
+  - [ ] Rollback capability
+
+### 2.3 Explanation Engine
+
+- [ ] **2.3.1** SHAP value computation
+  - [ ] Feature contributions to each prediction
+  - [ ] Global feature importance (what drives churn overall)
+  - [ ] Local explanations (why this specific user is at risk)
+- [ ] **2.3.2** Natural language explanations
+  ```
+  "This user shows 78% churn risk because:
+  - Session frequency dropped 60% in the last week
+  - No core feature usage in 5 days
+  - Error rate increased 3x vs.
their baseline
+  - Similar users who showed these patterns had 85% churn rate"
+  ```
+- [ ] **2.3.3** Actionable insight extraction
+  - [ ] Top 3 risk factors per user
+  - [ ] Suggested intervention based on risk profile
+  - [ ] Priority ranking (who to contact first)
+
+**Phase 2 Exit Criteria:**
+
+- [ ] Model achieves ROC-AUC > 0.75 on test set
+- [ ] Real-time scoring API < 100ms latency
+- [ ] Explanations generated for all predictions
+- [ ] Risk segmentation validated against historical churn
+
+---
+
+## Phase 3: Product Health Scoring (Week 2)
+
+### 3.1 Health Metric Framework
+
+- [ ] **3.1.1** Create `modules/predictive-analytics/health-scoring.ts`
+  - [ ] Health dimensions: Acquisition, Activation, Retention, Revenue, Engagement
+  - [ ] Composite health score (weighted average)
+  - [ ] Per-dimension scores with drill-down
+- [ ] **3.1.2** Product health indicators
+  - [ ] Daily Active Users (DAU) trend
+  - [ ] New user activation rate (Day-1, Day-7)
+  - [ ] Cohort retention curves (Day-1, Day-7, Day-30)
+  - [ ] Feature adoption rates (new feature uptake)
+  - [ ] Error rates and stability scores
+  - [ ] Support ticket volume and sentiment
+  - [ ] Revenue metrics (MRR, ARPU, LTV)
+
+### 3.2 Health Score Computation
+
+- [ ] **3.2.1** Baseline establishment
+  - [ ] Historical 90-day baseline for each metric
+  - [ ] Peer product comparison (ChronoMind vs. JarvisJr benchmarks)
+  - [ ] Industry benchmarks (if available)
+- [ ] **3.2.2** Scoring algorithm
+  - [ ] Z-score normalization (how many std devs from baseline)
+  - [ ] Trend direction (improving vs. declining)
+  - [ ] Volatility adjustment (consistent vs.
erratic)
+  - [ ] 0–100 health score scale
+- [ ] **3.2.3** Alert thresholds
+  - [ ] Critical: Score < 60 or 20% drop from baseline
+  - [ ] Warning: Score 60–75 or 10% drop
+  - [ ] Healthy: Score > 75 and stable
+
+### 3.3 Anomaly Detection
+
+- [ ] **3.3.1** Statistical anomaly detection
+  - [ ] Prophet/ARIMA for time-series forecasting
+  - [ ] Forecast vs. actual deviation detection
+  - [ ] Seasonal pattern recognition (day-of-week, monthly)
+- [ ] **3.3.2** Multi-dimensional anomaly detection
+  - [ ] Correlation breakdown detection (metrics usually correlated diverging)
+  - [ ] Cohort-specific anomalies (specific region, platform, segment)
+- [ ] **3.3.3** Root cause suggestion
+  - [ ] Correlation with deployments/releases
+  - [ ] Error spike correlation
+  - [ ] External factor detection (holidays, events)
+
+**Phase 3 Exit Criteria:**
+
+- [ ] Health scores computed daily for all products
+- [ ] Anomaly detection with < 5% false positive rate
+- [ ] Historical baseline established for all metrics
+
+---
+
+## Phase 4: Proactive Intervention System (Week 2–3)
+
+### 4.1 Retention Campaign Automation
+
+- [ ] **4.1.1** Campaign trigger rules
+  - [ ] High-risk user enters segment → trigger email
+  - [ ] Medium-risk + specific behavior → trigger in-app message
+  - [ ] Critical risk → trigger personal outreach task
+- [ ] **4.1.2** Personalized messaging
+  - [ ] Message variant based on risk factors
+  - [ ] Feature recommendations based on unused capabilities
+  - [ ] Success stories from similar users
+- [ ] **4.1.3** Campaign effectiveness tracking
+  - [ ] Control group vs.
treatment
+  - [ ] Churn rate comparison
+  - [ ] Revenue impact measurement
+
+### 4.2 Auto-Trigger Flows
+
+- [ ] **4.2.1** Platform integrations
+  - [ ] Email delivery via existing `modules/delivery/`
+  - [ ] Push notifications via `modules/notifications/`
+  - [ ] Slack notifications for CS team
+  - [ ] CRM integration (create outreach tasks)
+- [ ] **4.2.2** Smart scheduling
+  - [ ] Optimal contact time prediction
+  - [ ] Frequency capping (don't spam)
+  - [ ] Multi-channel orchestration
+- [ ] **4.2.3** Feedback loop
+  - [ ] Track intervention outcomes
+  - [ ] Retrain model with intervention effectiveness
+  - [ ] A/B test intervention strategies
+
+### 4.3 Risk Dashboard for CS Team
+
+- [ ] **4.3.1** At-risk user list
+  - [ ] Sortable by churn probability
+  - [ ] Filter by product, segment, risk factors
+  - [ ] Last activity preview
+- [ ] **4.3.2** User risk profile
+  - [ ] Churn probability trend over time
+  - [ ] Key risk factors highlighted
+  - [ ] Recommended actions
+  - [ ] User activity timeline
+- [ ] **4.3.3** Intervention tracking
+  - [ ] Contact history
+  - [ ] Response tracking
+  - [ ] Outcome recording (retained/churned)
+
+**Phase 4 Exit Criteria:**
+
+- [ ] Automated campaigns triggered for high-risk users
+- [ ] CS team dashboard with at-risk user queue
+- [ ] Intervention effectiveness measurement in place
+- [ ] Feedback loop improving model accuracy
+
+---
+
+## Phase 5: Admin Dashboard UI (Week 3)
+
+### 5.1 Product Health Overview
+
+- [ ] **5.1.1** Create `/ops/health-dashboard/page.tsx`
+  - [ ] Health score cards for each product
+  - [ ] Trend sparklines (7-day, 30-day)
+  - [ ] Alert summary (critical issues count)
+  - [ ] Product comparison table
+- [ ] **5.1.2** Health detail view
+  - [ ] Dimension breakdown (acquisition, activation, retention, etc.)
+  - [ ] Metric time-series charts
+  - [ ] Anomaly markers on charts
+  - [ ] Cohort retention curves
+
+### 5.2 Churn Prediction Dashboard
+
+- [ ] **5.2.1** Churn risk overview
+  - [ ] Risk distribution pie chart
+  - [ ] At-risk user count by product
+  - [ ] Predicted churn impact (revenue at risk)
+  - [ ] Model performance metrics (AUC, calibration)
+- [ ] **5.2.2** User risk explorer
+  - [ ] Search/filter at-risk users
+  - [ ] Risk score with explanation
+  - [ ] Top risk factors
+  - [ ] Recommended interventions
+- [ ] **5.2.3** Model insights
+  - [ ] Global feature importance chart
+  - [ ] Model performance over time
+  - [ ] Feature drift alerts
+
+### 5.3 Campaign Management
+
+- [ ] **5.3.1** Campaign list
+  - [ ] Active/paused/completed campaigns
+  - [ ] Trigger rules summary
+  - [ ] Performance stats (sent, opened, converted)
+- [ ] **5.3.2** Campaign editor
+  - [ ] Trigger condition builder
+  - [ ] Message template editor
+  - [ ] Audience targeting
+  - [ ] A/B test configuration
+- [ ] **5.3.3** Campaign analytics
+  - [ ] Funnel: triggered → sent → opened → retained
+  - [ ] Revenue impact
+  - [ ] Comparison to control group
+
+**Phase 5 Exit Criteria:**
+
+- [ ] Health dashboard shows all products with trends
+- [ ] Churn predictions visible with explanations
+- [ ] Campaign creation and management functional
+- [ ] Full test coverage
+
+---
+
+## Phase 6: Advanced Capabilities (Future)
+
+### 6.1 Cohort-Specific Models
+
+- [ ] Segment-specific churn models (iOS vs. Android, free vs.
pro)
+- [ ] Regional models (different behaviors by geography)
+- [ ] Temporal models (seasonal churn patterns)
+
+### 6.2 LTV Prediction
+
+- [ ] Predict lifetime value at signup
+- [ ] Predict upgrade probability (free → pro)
+- [ ] Optimize acquisition channels by predicted LTV
+
+### 6.3 Product Recommendations
+
+- [ ] Suggest features to at-risk users based on successful cohorts
+- [ ] Personalized onboarding based on predicted needs
+- [ ] Next-best-action recommendations
+
+---
+
+## Appendix A: Data Models
+
+### UserChurnPredictionDoc
+
+```typescript
+interface UserChurnPredictionDoc {
+  id: string; // cp_
+  userId: string; // partition key
+  productId: string;
+
+  // Prediction
+  predictionHorizon: 7 | 14 | 30; // Days
+  churnProbability: number; // 0–1
+  riskSegment: 'critical' | 'high' | 'medium' | 'low';
+
+  // Feature vector snapshot
+  features: Record<string, number>; // Normalized feature values
+  featureVersion: string; // Schema version
+
+  // Model info
+  modelVersion: string;
+  modelType: 'xgboost' | 'neural';
+  predictionTimestamp: string;
+
+  // Explanation (SHAP values)
+  explanation: {
+    topRiskFactors: Array<{
+      feature: string;
+      contribution: number; // SHAP value
+      direction: 'positive' | 'negative'; // Increases or decreases churn risk
+    }>;
+    globalFeatureImportance: Array<{
+      feature: string;
+      importance: number;
+    }>;
+  };
+
+  // Natural language summary
+  nlExplanation: string; // Auto-generated explanation
+
+  // Intervention
+  suggestedActions: string[];
+  interventionHistory: Array<{
+    action: string;
+    timestamp: string;
+    outcome?: 'responded' | 'ignored' | 'churned' | 'retained';
+  }>;
+
+  // Validation (ground truth)
+  actualChurned?: boolean;
+  validationDate?: string;
+
+  createdAt: string;
+  ttl: number; // predictionHorizon + 90 days
+}
+```
+
+### ProductHealthScoreDoc
+
+```typescript
+interface ProductHealthScoreDoc {
+  id: string; // ph_
+  productId: string; // partition key
+  date: string; // Sort key (YYYY-MM-DD)
+
+  // Composite score
+  overallHealthScore: number; // 0–100
+  healthStatus: 'critical' | 'warning' | 'healthy';
+
+  // Dimension scores
+  dimensions: {
+    acquisition: {
+      score: number; // 0–100
+      metrics: {
+        newUsers: number;
+        activationRateDay1: number;
+        activationRateDay7: number;
+        cac: number;
+      };
+      trend: 'improving' | 'stable' | 'declining';
+    };
+    activation: {
+      score: number;
+      metrics: {
+        firstValueMomentRate: number;
+        timeToFirstAction: number;
+        onboardingCompletionRate: number;
+      };
+      trend: 'improving' | 'stable' | 'declining';
+    };
+    retention: {
+      score: number;
+      metrics: {
+        dau: number;
+        mau: number;
+        dauMauRatio: number;
+        day7Retention: number;
+        day30Retention: number;
+      };
+      trend: 'improving' | 'stable' | 'declining';
+    };
+    engagement: {
+      score: number;
+      metrics: {
+        avgSessionLength: number;
+        sessionsPerUser: number;
+        featureAdoption: Record<string, number>;
+      };
+      trend: 'improving' | 'stable' | 'declining';
+    };
+    revenue: {
+      score: number;
+      metrics: {
+        mrr: number;
+        arpu: number;
+        churnRate: number;
+        upgradeRate: number;
+      };
+      trend: 'improving' | 'stable' | 'declining';
+    };
+    stability: {
+      score: number;
+      metrics: {
+        crashFreeRate: number;
+        errorRate: number;
+        avgLatency: number;
+        uptimePercent: number;
+      };
+      trend: 'improving' | 'stable' | 'declining';
+    };
+  };
+
+  // Anomalies detected
+  anomalies: Array<{
+    metric: string;
+    expectedValue: number;
+    actualValue: number;
+    deviationPercent: number;
+    severity: 'critical' | 'warning';
+    suggestedCause?: string;
+  }>;
+
+  // Forecasts
+  forecasts: {
+    next7Days: {
+      expectedHealthScore: number;
+      confidenceInterval: [number, number];
+    };
+    next30Days: {
+      expectedHealthScore: number;
+      confidenceInterval: [number, number];
+    };
+  };
+
+  // Benchmarks
+  vsBaseline7Day: number; // % change vs. 7-day baseline
+  vsBaseline30Day: number; // % change vs. 30-day baseline
+
+  createdAt: string;
+  ttl: number; // 2 years
+}
+```
+
+### RetentionCampaignDoc
+
+```typescript
+interface RetentionCampaignDoc {
+  id: string; // rc_
+  productId: string; // partition key
+
+  // Campaign definition
+  name: string;
+  description: string;
+  status: 'draft' | 'active' | 'paused' | 'completed';
+
+  // Trigger conditions
+  trigger: {
+    type: 'churn_risk' | 'health_score_drop' | 'behavioral' | 'scheduled';
+    conditions: Array<{
+      field: string;
+      operator: 'gt' | 'lt' | 'eq' | 'in';
+      value: unknown;
+    }>;
+  };
+
+  // Audience
+  audience: {
+    riskSegments?: string[]; // 'critical', 'high', etc.
+    products?: string[];
+    userSegments?: string[];
+    excludeRecentContact?: number; // Hours (frequency capping)
+  };
+
+  // Message content
+  messages: Array<{
+    channel: 'email' | 'push' | 'in_app' | 'slack_cs';
+    templateId: string;
+    variant?: string; // A/B test variant
+    delayHours?: number; // Delay after trigger
+    conditions?: Array<{
+      field: string;
+      operator: string;
+      value: unknown;
+    }>;
+  }>;
+
+  // Performance tracking
+  stats: {
+    triggered: number;
+    sent: number;
+    opened: number;
+    clicked: number;
+    converted: number;
+    controlGroupSize: number;
+    controlChurnRate: number;
+    treatmentChurnRate: number;
+  };
+
+  createdAt: string;
+  updatedAt: string;
+  ttl: number; // 1 year after completion
+}
+```
+
+### UserFeatureVectorDoc
+
+```typescript
+interface UserFeatureVectorDoc {
+  id: string; // fv_
+  userId: string; // partition key
+  productId: string;
+
+  // Computed features
+  features: {
+    // Recency features
+    daysSinceLastSession: number;
+    daysSinceLastCoreAction: number;
+
+    // Frequency features
+    sessionsLast7Days: number;
+    sessionsLast30Days: number;
+    avgSessionsPerWeek: number;
+
+    // Engagement depth
+    avgSessionDuration: number;
+    actionsPerSession: number;
+    uniqueFeaturesUsed: number;
+
+    // Product-specific (examples)
+    // NomGap
+    fastCompletionRate?: number;
+    streakLength?: number;
+
+    // JarvisJr
agentDiversityScore?: number; + voiceSessionRatio?: number; + + // ChronoMind + timerCompletionRate?: number; + routineAdherenceScore?: number; + + // Error/stability + errorRateLast7Days: number; + crashCountLast30Days: number; + + // Revenue + planTier: number; // 0=free, 1=pro, 2=enterprise + lifetimeValue: number; + daysSinceLastPayment?: number; + }; + + // Normalized (0–1) for model input + normalizedFeatures: Record; + + // Metadata + featureSchemaVersion: string; + computedAt: string; + + // Time windows + observationWindow: { + start: string; + end: string; + }; + + ttl: number; // 90 days +} +``` + +--- + +## Implementation Tracking + +| Phase | Task | Status | Commit | +| ----- | ----------------------------- | ------ | ------ | +| 1.1 | Telemetry feature extraction | ⬜ | β€” | +| 1.1 | Time-window aggregations | ⬜ | β€” | +| 1.1 | Rolling window features | ⬜ | β€” | +| 1.2 | Feature store | ⬜ | β€” | +| 1.2 | Cosmos containers | ⬜ | β€” | +| 1.2 | Feature computation jobs | ⬜ | β€” | +| 1.3 | Product-specific features | ⬜ | β€” | +| 1.3 | Feature importance tracking | ⬜ | β€” | +| 2.1 | XGBoost model architecture | ⬜ | β€” | +| 2.1 | Training pipeline | ⬜ | β€” | +| 2.1 | Model evaluation | ⬜ | β€” | +| 2.2 | Real-time scoring API | ⬜ | β€” | +| 2.2 | Risk segmentation | ⬜ | β€” | +| 2.2 | Model versioning | ⬜ | β€” | +| 2.3 | SHAP explanations | ⬜ | β€” | +| 2.3 | Natural language explanations | ⬜ | β€” | +| 2.3 | Actionable insights | ⬜ | β€” | +| 3.1 | Health metric framework | ⬜ | β€” | +| 3.1 | Health indicators | ⬜ | β€” | +| 3.2 | Baseline establishment | ⬜ | β€” | +| 3.2 | Scoring algorithm | ⬜ | β€” | +| 3.2 | Alert thresholds | ⬜ | β€” | +| 3.3 | Anomaly detection | ⬜ | β€” | +| 4.1 | Campaign trigger rules | ⬜ | β€” | +| 4.1 | Personalized messaging | ⬜ | β€” | +| 4.2 | Platform integrations | ⬜ | β€” | +| 4.3 | CS team dashboard | ⬜ | β€” | +| 5.1 | Health overview UI | ⬜ | β€” | +| 5.2 | Churn prediction dashboard | ⬜ | β€” | +| 5.3 | 
Campaign management | ⬜ | — | + +**Legend:** ⬜ Not started | 🟡 In progress | ✅ Complete | ⏸️ Deferred + +--- + +## Quick Reference for Implementing Agent + +**📋 Full Roadmap:** `/Users/sd9235/code/mygh/learning_ai_common_plat/docs/roadmaps/PREDICTIVE_CHURN_HEALTH_SCORING_ROADMAP.md` + +**Key Files to Modify/Create:** + +``` +services/platform-service/ +├── src/ +│ ├── modules/predictive-analytics/ +│ │ ├── types.ts # [1.2] Feature, HealthScore, Prediction types +│ │ ├── repository.ts # Data access layer +│ │ ├── feature-extractor.ts # [1.1] Telemetry → features +│ │ ├── feature-store.ts # [1.2] Feature vector storage +│ │ ├── churn-model.ts # [2.1] XGBoost training & inference +│ │ ├── scoring-api.ts # [2.2] Real-time prediction endpoint +│ │ ├── explanation-engine.ts # [2.3] SHAP + NL explanations +│ │ ├── health-scoring.ts # [3] Health dimension calculation +│ │ ├── anomaly-detection.ts # [3.3] Prophet/ARIMA forecasting +│ │ ├── campaign-engine.ts # [4] Retention automation +│ │ ├── routes.ts # [5] REST API +│ │ └── predictive-analytics.test.ts # Tests +│ ├── lib/ +│ │ └── cosmos-init.ts # [1.2] Add containers +│ └── server.ts # Register routes +dashboards/admin-web/ +├── src/ +│ ├── app/(dashboard)/ +│ │ ├── health-dashboard/ +│ │ │ └── page.tsx # [5.1] Product health overview +│ │ └── predictive/ +│ │ ├── at-risk/ +│ │ │ └── page.tsx # [4.3] At-risk user list +│ │ └── campaigns/ +│ │ └── page.tsx # [5.3] Campaign management +│ ├── lib/ +│ │ └── predictive-client.ts # API client +│ └── components/ +│ └── predictive/ # Risk cards, health charts +``` + +**Commit Message Format:** + +``` +feat(predictive-analytics): [] +``` + +**Example:** + +```bash +git add services/platform-service/src/modules/predictive-analytics/ +git commit -m
"feat(predictive-analytics): add feature extraction and store [1.1-1.2]" +``` + +**Testing Requirements:** + +- Unit tests: 20+ Vitest tests for feature extraction, model inference +- Model validation: AUC, calibration, precision@k metrics +- Integration: End-to-end prediction pipeline + +**Dependencies:** + +- Telemetry module (feature extraction) +- Delivery module (retention campaigns) +- Azure ML or scikit-learn (model training) + +--- + +## Appendix B: API Reference + +| Method | Endpoint | Auth | Description | +| ------ | -------------------------------------- | ------------- | ----------------------------- | +| GET | `/predictive/health` | Admin | Get all product health scores | +| GET | `/predictive/health/:productId` | Admin | Get product health detail | +| GET | `/predictive/health/:productId/trends` | Admin | Historical health trends | +| POST | `/predictive/churn-score` | Admin/Service | Get churn prediction for user | +| POST | `/predictive/churn-batch` | Admin | Batch churn scoring | +| GET | `/predictive/at-risk-users` | Admin/CS | List users by risk segment | +| GET | `/predictive/users/:id/risk-profile` | Admin/CS | User churn risk details | +| GET | `/predictive/model/performance` | Admin | Model accuracy metrics | +| GET | `/predictive/model/features` | Admin | Feature importance ranking | +| GET | `/predictive/campaigns` | Admin | List retention campaigns | +| POST | `/predictive/campaigns` | Admin | Create campaign | +| PATCH | `/predictive/campaigns/:id` | Admin | Update campaign | +| GET | `/predictive/campaigns/:id/stats` | Admin | Campaign performance | +| POST | `/predictive/campaigns/:id/trigger` | Admin | Manual trigger for testing | + +--- + +## Appendix C: Integration Points + +### With Telemetry Module + +- Raw events feed feature extraction +- Error rates flow into health scores +- Correlation IDs link behaviors to predictions + +### With Diagnostics Module + +- Debug sessions enrich feature vectors +- Error clusters correlate with 
churn risk +- Screenshot patterns analyzed for UX issues + +### With Event Bus + +| Event | Action | +| ----------------------------------- | ---------------------------------------- | +| `predictive.churn.risk_detected` | Trigger retention campaign | +| `predictive.health.critical` | Alert leadership, suggest debug sessions | +| `predictive.anomaly.detected` | Create incident, notify on-call | +| `user.retention.campaign_responded` | Update model with outcome | + +### With Delivery Module + +- Retention campaigns use email templates +- Push notifications for urgent interventions +- A/B test message variants + +--- + +## Appendix D: Cost Estimation + +| Component | Monthly Cost (est.) | +| ------------------------------------ | -------------------------- | +| Cosmos DB (features + predictions) | $150–300 | +| Model training (Azure ML) | $100–200 | +| Inference compute | $50–100 | +| Email delivery (retention campaigns) | $50–200 (volume-dependent) | +| **Total** | **$350–800/month** | + +ROI: If the system prevents 5% of predicted churn at $50 LTV with 10K at-risk users/month: + +- 500 users retained × $50 = $25K/month value +- 10:1+ ROI + +--- + +## Appendix E: Success Metrics + +### Model Performance + +- [ ] AUC > 75% (discrimination) +- [ ] Calibration slope 0.9–1.1 (well-calibrated probabilities) +- [ ] Precision@10% > 60% (high-risk predictions are accurate) + +### Business Impact + +- [ ] 10%+ reduction in churn rate for targeted cohorts +- [ ] 5%+ increase in re-engagement campaign response +- [ ] CS team satisfaction with at-risk user visibility + +### Operational + +- [ ] Prediction latency < 100ms +- [ ] Feature freshness < 24 hours +- [ ] Model retraining automation + +--- + +## Current Status + +- [ ] **Design complete** — Target: 2026-03-10 +- [ ] **Phase 1: Feature Pipeline** — Not started +- [ ] **Phase 2: Churn Model** — Not started +- [ ] **Phase 3: Health Scoring** — Not started +- [ ] **Phase 4: Interventions** — Not started +- [ ]
**Phase 5: Admin UI** — Not started +- [ ] **Phase 6: Advanced** — Future + +**Estimated Timeline:** 3 weeks (Phases 1–5) + +**Dependencies:** + +- Telemetry module (for feature extraction) +- Azure ML or similar (for model training) +- Delivery module (for retention campaigns) + +--- + +_Last Updated: 2026-03-03_
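
The Precision@10% target listed under Model Performance in Appendix E could be checked with a helper along the following lines. This is a minimal sketch under stated assumptions, not the roadmap's implementation: the `ScoredUser` shape, the `precisionAtFraction` function, and the sample data are hypothetical names and values introduced only for illustration.

```typescript
// Hypothetical record: one scored user plus the outcome observed later.
interface ScoredUser {
  userId: string;
  churnScore: number; // model output in [0, 1]
  churned: boolean; // ground truth after the prediction window closes
}

// Precision at a fraction: of the top `fraction` of users ranked by score,
// what share actually churned. Precision@10% corresponds to fraction = 0.1.
function precisionAtFraction(users: ScoredUser[], fraction: number): number {
  if (users.length === 0 || fraction <= 0 || fraction > 1) {
    throw new RangeError('need at least one user and 0 < fraction <= 1');
  }
  // Copy before sorting so the caller's array is not mutated.
  const ranked = [...users].sort((a, b) => b.churnScore - a.churnScore);
  // The small epsilon guards against float products such as 3.0000000000000004.
  const k = Math.max(1, Math.ceil(ranked.length * fraction - 1e-9));
  const hits = ranked.slice(0, k).filter((u) => u.churned).length;
  return hits / k;
}

// Illustrative data only (not real results).
const sample: ScoredUser[] = [
  { userId: 'u1', churnScore: 0.9, churned: true },
  { userId: 'u2', churnScore: 0.1, churned: false },
  { userId: 'u3', churnScore: 0.8, churned: false },
  { userId: 'u4', churnScore: 0.3, churned: true },
];

console.log(precisionAtFraction(sample, 0.5)); // 0.5 (top two are u1 and u3, one churned)
```

Ranking by score and taking a fixed share of users mirrors how a CS team would work an at-risk list from the top down, which is why this metric sits alongside AUC and calibration rather than replacing them.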