docs(roadmaps): add AI diagnostic assistant, A/B testing, and churn prediction roadmaps

- AI Diagnostic Assistant: LLM-powered root cause analysis, error clustering, natural language queries
- Intelligent A/B Testing: Thompson sampling, Bayesian early stopping, AI hypothesis generation
- Predictive Churn & Health: XGBoost models, health scoring, automated retention campaigns

All roadmaps include:
- Implementation tracking tables with status/commit columns
- Quick reference sections with file structures
- Phase-by-phase task breakdowns with [X.Y.Z] codes
saravanakumardb1 2026-03-03 11:38:56 -08:00
parent d510867b87
commit e98380003b
3 changed files with 2164 additions and 0 deletions


@@ -0,0 +1,597 @@
# AI Diagnostic Assistant — Implementation Roadmap
> **Module:** `platform-service/src/modules/ai-diagnostics/`
> **Admin UI:** `/ops/ai-diagnostics/`
> **Target:** LLM-powered root cause analysis from telemetry + debug sessions
> **Estimated Effort:** 2-3 weeks
> **Status:** 🟡 Planning
---
## Executive Summary
This roadmap delivers an **AI-powered diagnostic assistant** that analyzes error patterns, debug session data, and telemetry to automatically suggest root causes—like having a senior engineer on-call 24/7. Engineers can ask natural language questions like _"Why did the iOS keyboard crash yesterday?"_ and receive AI-generated hypotheses with supporting evidence.
### Key Differentiators vs. Manual Debugging
| Feature | Manual Debugging | AI Diagnostic Assistant |
| ----------------- | --------------------------- | ----------------------------------- |
| Query | SQL + log grep | **Natural language** |
| Pattern Detection | Hours of manual correlation | **AI finds hidden patterns** |
| Context Assembly | Check 5+ systems manually | **Auto-assembles timeline** |
| Hypothesis | Engineer intuition | **LLM-generated + evidence** |
| Learning | Per-engineer experience | **Accumulates across all sessions** |
---
## Phase 1: Data Pipeline & Embeddings (Week 1)
**Goal:** Extract, normalize, and embed error data for semantic search and clustering.
### 1.1 Error Fingerprinting & Clustering
- [ ] **1.1.1** Create `modules/ai-diagnostics/types.ts`
- [ ] `ErrorClusterDoc` — grouped similar errors with signature
- [ ] `ErrorFingerprint` — normalized stack trace hash
- [ ] `ClusterAnalysis` — AI-generated pattern description
- [ ] Zod schemas for all inputs
_Commit format:_ `git commit -m "feat(ai-diagnostics): add error clustering types [1.1.1]"` (commit link: `https://github.com/saravanakumardb1/learning_ai_common_plat/commit/<hash>`)
- [ ] **1.1.2** Add Cosmos containers to `cosmos-init.ts`
- [ ] `error_clusters` (pk: `/productId`, TTL: 90 days)
- [ ] `error_fingerprints` (pk: `/fingerprintHash`, unique index)
- [ ] `diagnostic_insights` (pk: `/clusterId`, AI-generated analyses)
_Commit format:_ `git commit -m "feat(ai-diagnostics): add cosmos containers for error clustering [1.1.2]"`
- [ ] **1.1.3** Implement error normalization
- [ ] Stack trace parsing (remove line numbers, file paths)
- [ ] Message templating (replace UUIDs, timestamps, user IDs with placeholders)
- [ ] Fingerprint generation (SHA-256 of normalized error)
- [ ] Similarity scoring (Levenshtein for near-matches)
_Commit format:_ `git commit -m "feat(ai-diagnostics): implement error normalization and fingerprinting [1.1.3]"`
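The normalization and near-match steps in 1.1.3 could be sketched roughly as follows. All function names and regular expressions here are illustrative assumptions, not the module's actual code, and the SHA-256 hashing of the normalized text is left out to keep the sketch dependency-free.

```typescript
// Hypothetical helpers for task 1.1.3; names and patterns are assumptions.

/** Replace volatile tokens so equivalent errors normalize to the same text. */
function normalizeMessage(message: string): string {
  return message
    .replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, "<uuid>")
    .replace(/\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?/g, "<timestamp>")
    .replace(/\b\d+\b/g, "<num>");
}

/** Keep only function names from V8-style "at fn (path:line:col)" frames. */
function normalizeStack(stack: string): string {
  return stack
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("at "))
    .map((line) => {
      const match = /^at\s+([^\s(]+)\s*\(/.exec(line);
      return match ? `at ${match[1]}` : "at <anonymous>";
    })
    .join("\n");
}

/** Classic two-row Levenshtein edit distance. */
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

/** Similarity in [0, 1]; 1 means identical strings. */
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}
```

Two errors whose normalized messages differ only in a few characters score close to 1 and can be treated as near-matches; the cut-off value is a tuning decision the roadmap does not specify.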
### 1.2 Vector Embeddings for Semantic Search
- [ ] **1.2.1** Create embedding pipeline
- [ ] Azure OpenAI `text-embedding-3-small` integration
- [ ] Error message + stack trace → 1536-dim vector
- [ ] Batch embedding job (100 errors at a time)
- [ ] **1.2.2** Cosmos DB vector search setup
- [ ] Store embeddings in `error_clusters` documents
- [ ] Cosine similarity query function
- [ ] Similar error lookup by vector distance
- [ ] **1.2.3** Clustering algorithm
- [ ] HDBSCAN for density-based clustering
- [ ] DBSCAN fallback for smaller datasets
- [ ] Auto-determine cluster count (no manual k)
- [ ] Re-cluster nightly as new errors arrive
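As a rough illustration of the cosine similarity lookup in 1.2.2, the sketch below ranks clusters in memory. It is a stand-in for intuition and tests only: the roadmap plans to run this through Cosmos DB vector search, whose query API is not shown here, and `EmbeddedCluster` and `findSimilar` are hypothetical names.

```typescript
/** Cosine similarity between two equal-length vectors; 0 if either is all zeros. */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical minimal shape; the real documents carry many more fields.
interface EmbeddedCluster {
  id: string;
  embedding: number[];
}

/** Return the k clusters most similar to the query vector, best first. */
function findSimilar(
  query: number[],
  clusters: EmbeddedCluster[],
  k: number,
): Array<{ id: string; score: number }> {
  return clusters
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A brute-force scan like this is linear in the number of clusters, which is fine for unit tests but is one reason the roadmap delegates the lookup to the database.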
### 1.3 Telemetry Ingestion for Context
- [ ] **1.3.1** Link telemetry to errors
- [ ] `correlationId` propagation across services
- [ ] 5-minute window: error → preceding telemetry events
- [ ] Session state reconstruction (what user was doing)
- [ ] **1.3.2** Enrich error context
- [ ] Device info (OS version, model, memory)
- [ ] App state (screen, feature flags, config)
- [ ] Recent API calls (network trace from diagnostics)
- [ ] Recent user actions (breadcrumb trail)
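The 5-minute correlation window from 1.3.1 might look like the sketch below, assuming both errors and telemetry events carry a `correlationId` and an ISO 8601 `timestamp`. The `ErrorRecord` and `TelemetryEvent` shapes are hypothetical simplifications.

```typescript
// Hypothetical minimal shapes for illustration only.
interface TelemetryEvent {
  correlationId: string;
  timestamp: string; // ISO 8601
  name: string;
}

interface ErrorRecord {
  correlationId: string;
  timestamp: string; // ISO 8601
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;

/** Telemetry events with the same correlationId in the window before the error, oldest first. */
function precedingEvents(
  error: ErrorRecord,
  events: TelemetryEvent[],
  windowMs: number = FIVE_MINUTES_MS,
): TelemetryEvent[] {
  const errorTime = Date.parse(error.timestamp);
  return events
    .filter((e) => e.correlationId === error.correlationId)
    .filter((e) => {
      const t = Date.parse(e.timestamp);
      return t <= errorTime && errorTime - t <= windowMs;
    })
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
}
```

The ordered result is the raw material for the breadcrumb trail and session state reconstruction listed above.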
**Phase 1 Exit Criteria:**
- [ ] Errors auto-clustered with 90%+ accuracy
- [ ] Vector search returns semantically similar errors
- [ ] 10,000+ historical errors embedded and clustered
- [ ] Correlation pipeline links errors to telemetry context
---
## Phase 2: LLM Analysis Engine (Week 1-2)
### 2.1 Prompt Engineering & Analysis Pipeline
- [ ] **2.1.1** Create analysis prompts
- [ ] `ROOT_CAUSE_ANALYSIS` prompt template
```
Given this error cluster:
- Error signature: {fingerprint}
- Sample stack traces: {samples}
- Common context: {deviceStats}, {appState}
- Preceding events: {breadcrumbSummary}
- Similar resolved issues: {relatedClusters}
Analyze and provide:
1. Likely root cause category (config, dependency, logic, resource, external)
2. Specific hypothesis with reasoning
3. Evidence confidence (high/medium/low)
4. Suggested investigation steps
5. Potential fix direction
```
- [ ] `PATTERN_SUMMARY` prompt for cluster descriptions
- [ ] `COMPARATIVE_ANALYSIS` for error vs. baseline
- [ ] **2.1.2** LLM integration
- [ ] Azure OpenAI GPT-4o-mini for analysis (cost-effective)
- [ ] GPT-4o for complex multi-factor analysis
- [ ] Response JSON schema enforcement
- [ ] Retry logic with exponential backoff
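The last two items of 2.1.2 (schema enforcement and retry with exponential backoff) could be sketched as below. This is a minimal illustration under stated assumptions: the roadmap plans Zod schemas, whereas this sketch uses a hand-written check to stay dependency-free, and `withRetry`, `backoffDelayMs`, and `parseAnalysis` are hypothetical names with assumed default values.

```typescript
/** Delay before retry number `attempt` (0-based): base * 2^attempt, capped. */
function backoffDelayMs(attempt: number, baseDelayMs = 500, maxDelayMs = 8000): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
}

/** Run an async operation, retrying failures with exponential backoff plus jitter. */
async function withRetry<T>(operation: () => Promise<T>, maxAttempts = 4): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts - 1) break; // no sleep after the final attempt
      const delay = backoffDelayMs(attempt);
      const jitter = Math.random() * delay * 0.1; // up to 10% extra to spread retries
      await new Promise<void>((resolve) => setTimeout(resolve, delay + jitter));
    }
  }
  throw lastError;
}

const CATEGORIES = ["config", "dependency", "logic", "resource", "external", "unknown"] as const;
type RootCauseCategory = (typeof CATEGORIES)[number];

interface AnalysisResponse {
  rootCauseCategory: RootCauseCategory;
  hypothesis: string;
  confidence: "high" | "medium" | "low";
}

/** Parse the model's JSON text and reject anything outside the expected shape. */
function parseAnalysis(raw: string): AnalysisResponse {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) throw new Error("Response is not a JSON object");
  const { rootCauseCategory, hypothesis, confidence } = data as Record<string, unknown>;
  const category = CATEGORIES.find((c) => c === rootCauseCategory);
  if (category === undefined) throw new Error("Unknown rootCauseCategory");
  if (typeof hypothesis !== "string" || hypothesis.length === 0) throw new Error("Missing hypothesis");
  if (confidence !== "high" && confidence !== "medium" && confidence !== "low") {
    throw new Error("Invalid confidence");
  }
  return { rootCauseCategory: category, hypothesis, confidence };
}
```

In practice the LLM call would be wrapped as `withRetry(() => callModel(prompt))`, where `callModel` is whatever client function the module ends up using, followed by `parseAnalysis` on the returned text.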
### 2.2 Insight Generation Service
- [ ] **2.2.1** Create `modules/ai-diagnostics/analyzer.ts`
- [ ] `analyzeCluster(clusterId)` — full analysis workflow
- [ ] `generateInsight(errorContext)` — single error analysis
- [ ] `compareClusters(clusterA, clusterB)` — diff analysis
- [ ] **2.2.2** Analysis workflow
- [ ] Fetch cluster data + related telemetry
- [ ] Build LLM context (respect token limits)
- [ ] Call LLM with structured prompt
- [ ] Parse and validate response
- [ ] Store insight in `diagnostic_insights`
- [ ] **2.2.3** Confidence scoring
- [ ] Evidence count weighting
- [ ] Similar resolved issue bonus
- [ ] Recency decay (older patterns = lower confidence)
- [ ] Multi-model consensus (if available)
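One possible way to combine the first three factors in 2.2.3 is sketched below. The weights, the bonus cap, the 30-day half-life, and the label thresholds are placeholder assumptions chosen for illustration; the roadmap does not fix any of them, and multi-model consensus is not modeled.

```typescript
// Hypothetical inputs; field names are assumptions for this sketch.
interface ConfidenceInputs {
  evidenceCount: number; // pieces of supporting evidence
  similarResolvedCount: number; // similar clusters that were resolved
  daysSinceLastSeen: number; // age of the pattern
}

/** Score in [0, 1]: saturating evidence weight, capped bonus, exponential recency decay. */
function confidenceScore(input: ConfidenceInputs, halfLifeDays = 30): number {
  const evidence = 1 - Math.exp(-input.evidenceCount / 5); // approaches 1 as evidence grows
  const bonus = Math.min(0.2, 0.1 * input.similarResolvedCount); // at most +0.2
  const decay = Math.pow(0.5, input.daysSinceLastSeen / halfLifeDays); // halves every halfLifeDays
  return Math.min(1, Math.max(0, (0.8 * evidence + bonus) * decay));
}

/** Map the numeric score to the high/medium/low label stored on an insight. */
function confidenceLabel(score: number): "high" | "medium" | "low" {
  if (score >= 0.7) return "high";
  if (score >= 0.4) return "medium";
  return "low";
}
```

Whatever formula is chosen, the feedback loop in 2.3.1 is what would let these constants be recalibrated against actual resolution outcomes.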
### 2.3 Continuous Learning
- [ ] **2.3.1** Feedback loop
- [ ] Engineer feedback: "Was this insight helpful? 👍/👎"
- [ ] Resolution tracking (link commits to clusters)
- [ ] Confidence recalibration based on outcomes
- [ ] **2.3.2** Pattern accumulation
- [ ] "Known issues" database (manually curated)
- [ ] Historical fix patterns (what solved similar issues)
- [ ] Regression detection (old issue reappearing)
**Phase 2 Exit Criteria:**
- [ ] LLM generates root cause hypotheses with evidence
- [ ] Confidence scores align with actual resolution rates
- [ ] Analysis completes in < 5 seconds for typical clusters
- [ ] Feedback loop capturing engineer ratings
---
## Phase 3: Natural Language Query Interface (Week 2)
### 3.1 Query Understanding
- [ ] **3.1.1** Create `modules/ai-diagnostics/query-parser.ts`
- [ ] Intent classification (root cause, pattern search, comparison, trend)
- [ ] Entity extraction (product, time range, error type, user segment)
- [ ] Temporal parsing ("yesterday", "last week", "since v2.1")
- [ ] Constraint identification ("only iOS", "excluding beta users")
- [ ] **3.1.2** Query patterns
- [ ] Root cause: _"Why did X happen?"_ → analyze cluster
- [ ] Pattern search: _"Show me similar crashes"_ → vector search
- [ ] Comparison: _"Did error rate increase after release?"_ → trend analysis
- [ ] User impact: _"How many users affected by Y?"_ → aggregation query
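Before an LLM-based parser exists, the intent and temporal steps in 3.1 could be prototyped with simple rules, as in the sketch below. The keyword patterns are assumptions, only "yesterday" and "last week" are handled, and "last week" is interpreted as the trailing seven full UTC days, which is one of several reasonable readings.

```typescript
type Intent = "root_cause" | "pattern_search" | "comparison" | "impact" | "unknown";

/** Keyword-based intent guess; a rule-based stand-in, not the planned classifier. */
function classifyIntent(query: string): Intent {
  const q = query.toLowerCase();
  if (/^why\b|root cause/.test(q)) return "root_cause";
  if (/\bsimilar\b/.test(q)) return "pattern_search";
  if (/\bincrease|\bdecrease|\bafter release|\bcompared?\b/.test(q)) return "comparison";
  if (/\bhow many\b|\baffected\b/.test(q)) return "impact";
  return "unknown";
}

const DAY_MS = 24 * 60 * 60 * 1000;

/** Resolve a relative time phrase against `now` into an ISO 8601 range (UTC days). */
function parseTimeRange(query: string, now: Date): { start: string; end: string } | undefined {
  const q = query.toLowerCase();
  const startOfToday = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  const iso = (ms: number): string => new Date(ms).toISOString();
  if (q.includes("yesterday")) return { start: iso(startOfToday - DAY_MS), end: iso(startOfToday) };
  if (q.includes("last week")) return { start: iso(startOfToday - 7 * DAY_MS), end: iso(startOfToday) };
  return undefined;
}
```

Passing `now` explicitly keeps the parser deterministic in tests, and the returned range matches the `timeRange` shape used by `NaturalLanguageQueryDoc`.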
### 3.2 Query Execution Engine
- [ ] **3.2.1** Query → data pipeline
- [ ] Map entities to Cosmos queries
- [ ] Fetch relevant clusters, telemetry, sessions
- [ ] Assemble context for response generation
- [ ] **3.2.2** Response generation
- [ ] Direct answers for simple queries
- [ ] AI-generated summaries for complex analysis
- [ ] Data + visualization suggestions
- [ ] Drill-down links for exploration
### 3.3 REST API Routes
- [ ] **3.3.1** Create `modules/ai-diagnostics/routes.ts`
- [ ] `POST /ai-diagnostics/query` — natural language question
- [ ] `GET /ai-diagnostics/clusters/:id/analysis` — pre-computed insight
- [ ] `POST /ai-diagnostics/clusters/:id/analyze` — trigger fresh analysis
- [ ] `GET /ai-diagnostics/suggestions` — auto-suggested investigations
- [ ] `POST /ai-diagnostics/feedback` — submit insight rating
**Phase 3 Exit Criteria:**
- [ ] Natural language queries parse correctly (90%+ intent accuracy)
- [ ] Query → response pipeline < 3 seconds
- [ ] Complex queries return structured answers with evidence
- [ ] API routes tested and documented
---
## Phase 4: Admin Dashboard UI (Week 2-3)
### 4.1 AI Insights Page
- [ ] **4.1.1** Create `/ops/ai-diagnostics/page.tsx`
- [ ] Smart search bar (natural language input)
- [ ] Suggested queries based on recent errors
- [ ] Recent AI-generated insights list
- [ ] Trending clusters (auto-detected anomalies)
- [ ] **4.1.2** Query results view
- [ ] AI-generated answer with confidence badge
- [ ] Supporting evidence cards (cluster stats, sample errors)
- [ ] Related debug sessions (linked traces)
- [ ] Timeline visualization of error pattern
- [ ] "Investigate further" actions
### 4.2 Cluster Detail with AI Analysis
- [ ] **4.2.1** Enhance error cluster detail
- [ ] AI-generated summary card ("This appears to be...")
- [ ] Root cause hypothesis with confidence
- [ ] Evidence breakdown (stack samples, device patterns, API failures)
- [ ] Suggested fixes from similar resolved issues
- [ ] "Request deeper analysis" button (GPT-4o)
- [ ] **4.2.2** Interactive investigation
- [ ] Compare with other clusters ("Show me similar issues")
- [ ] Filter by context (OS version, app version, feature flags)
- [ ] View affected user journeys (breadcrumb trails)
### 4.3 Proactive Alerts
- [ ] **4.3.1** Anomaly detection
- [ ] Auto-detect emerging error clusters
- [ ] Spike in existing cluster frequency
- [ ] New error types after releases
- [ ] **4.3.2** AI-generated alerts
- [ ] Slack/Teams notification with summary
- [ ] "Investigate in AI Diagnostics" deep link
- [ ] Auto-started debug session recommendations
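The "spike in existing cluster frequency" check in 4.3.1 could start from a simple z-score test like the sketch below. The roadmap does not name a detection method, so the z-score approach and the threshold of 3 are assumptions; `isSpike` is a hypothetical name.

```typescript
/**
 * Flag `current` as a spike when it lies more than `zThreshold` sample
 * standard deviations above the mean of `history` (e.g. hourly counts).
 */
function isSpike(history: number[], current: number, zThreshold = 3): boolean {
  if (history.length < 2) return false; // not enough data to estimate spread
  const mean = history.reduce((sum, x) => sum + x, 0) / history.length;
  const variance =
    history.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (history.length - 1);
  const std = Math.sqrt(variance);
  if (std === 0) return current > mean; // flat history: any increase stands out
  return (current - mean) / std > zThreshold;
}
```

A check like this is cheap enough to run on every ingestion batch, with the LLM analysis triggered only for clusters that trip it.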
**Phase 4 Exit Criteria:**
- [ ] Admin can ask questions and get AI-generated answers
- [ ] Cluster detail shows AI analysis with evidence
- [ ] Proactive alerts for emerging issues
- [ ] Full test coverage (UI + API)
---
## Phase 5: Advanced Capabilities (Future)
### 5.1 Multi-Modal Analysis
- [ ] Analyze screenshots from debug sessions for UI issues
- [ ] Voice transcription analysis (for voice app errors)
- [ ] Performance trace visualization with AI annotations
### 5.2 Predictive Diagnostics
- [ ] Pre-crash pattern detection (warn before crash happens)
- [ ] Resource exhaustion prediction (memory, disk, API quotas)
- [ ] Config drift detection ("this setting combination often fails")
### 5.3 Self-Healing Suggestions
- [ ] Auto-generated config recommendations
- [ ] Feature flag rollback suggestions
- [ ] Circuit breaker threshold recommendations
## Implementation Tracking
| Phase | Task | Status | Commit |
| ----- | -------------------------- | ------ | ------ |
| 1.1 | Error clustering types | ⬜ | — |
| 1.1 | Cosmos containers | ⬜ | — |
| 1.1 | Error normalization | ⬜ | — |
| 1.2 | Embedding pipeline | ⬜ | — |
| 1.2 | Vector search setup | ⬜ | — |
| 1.2 | Clustering algorithm | ⬜ | — |
| 1.3 | Telemetry linking | ⬜ | — |
| 1.3 | Error context enrichment | ⬜ | — |
| 2.1 | Analysis prompts | ⬜ | — |
| 2.1 | LLM integration | ⬜ | — |
| 2.2 | Insight generation service | ⬜ | — |
| 2.2 | Analysis workflow | ⬜ | — |
| 2.2 | Confidence scoring | ⬜ | — |
| 2.3 | Feedback loop | ⬜ | — |
| 2.3 | Pattern accumulation | ⬜ | — |
| 3.1 | Query parser | ⬜ | — |
| 3.1 | Query patterns | ⬜ | — |
| 3.2 | Query execution | ⬜ | — |
| 3.2 | Response generation | ⬜ | — |
| 3.3 | REST API routes | ⬜ | — |
| 4.1 | AI insights page | ⬜ | — |
| 4.1 | Query results view | ⬜ | — |
| 4.2 | Cluster detail | ⬜ | — |
| 4.2 | Interactive investigation | ⬜ | — |
| 4.3 | Proactive alerts | ⬜ | — |
**Legend:** ⬜ Not started | 🟡 In progress | ✅ Complete | ⏸️ Deferred
---
## Quick Reference for Implementing Agent
**📋 Full Roadmap:** `/Users/sd9235/code/mygh/learning_ai_common_plat/docs/roadmaps/AI_DIAGNOSTIC_ASSISTANT_ROADMAP.md`
**Key Files to Modify/Create:**
```
services/platform-service/
├── src/
│   ├── modules/ai-diagnostics/
│   │   ├── types.ts                 # [1.1.1] Error clustering types
│   │   ├── repository.ts            # [1.2] Data access layer
│   │   ├── analyzer.ts              # [2.2] LLM analysis engine
│   │   ├── query-parser.ts          # [3.1] NL query understanding
│   │   ├── query-executor.ts        # [3.2] Query execution
│   │   ├── routes.ts                # [3.3] REST API
│   │   └── ai-diagnostics.test.ts   # Tests
│   ├── lib/
│   │   ├── cosmos-init.ts           # [1.1.2] Add containers
│   │   ├── embedding-client.ts      # [1.2.1] Azure OpenAI embeddings
│   │   └── pii-redaction.ts         # Reuse existing
│   └── server.ts                    # [3.3] Register routes
dashboards/admin-web/
├── src/
│   ├── app/(dashboard)/
│   │   ├── ai-diagnostics/
│   │   │   ├── page.tsx             # [4.1] Main insights page
│   │   │   └── [id]/
│   │   │       └── page.tsx         # [4.2] Cluster detail
│   ├── lib/
│   │   └── ai-diagnostics-client.ts # API client
│   └── components/
│       └── ai-diagnostics/          # Reusable components
```
**Commit Message Format:**
```
feat(ai-diagnostics): <description> [<task.code>]
```
**Example:**
```bash
git add services/platform-service/src/modules/ai-diagnostics/
git commit -m "feat(ai-diagnostics): add error clustering types and cosmos containers [1.1.1-1.1.2]"
```
**Testing Requirements:**
- Unit tests: 20+ Vitest tests for clustering, embeddings, LLM responses
- Integration tests: End-to-end query → analysis pipeline
**Dependencies:**
- Telemetry module (error events)
- Azure OpenAI (embeddings + GPT-4o)
- Existing diagnostics module (optional linking)
---
## Appendix A: Data Models
### ErrorClusterDoc
```typescript
interface ErrorClusterDoc {
id: string; // ec_<uuid>
productId: string; // partition key
fingerprintHash: string; // SHA-256 of normalized error
// Cluster metadata
firstSeenAt: string; // ISO 8601
lastSeenAt: string;
occurrenceCount: number; // Total occurrences
uniqueUsers: number; // Affected user count
// Error signature
errorType: string; // Exception class/name
messageTemplate: string; // Normalized message with placeholders
stackSignature: string; // Normalized stack frames
// Vector embedding for semantic search
embedding: number[]; // 1536-dim from text-embedding-3-small
embeddingVersion: string; // Model version for re-embedding
// Context patterns (auto-extracted)
commonContext: {
osVersions: Array<{ version: string; count: number }>;
appVersions: Array<{ version: string; count: number }>;
deviceModels: Array<{ model: string; count: number }>;
screenContexts: Array<{ screen: string; count: number }>;
};
// Related data
relatedClusterIds: string[]; // Similar clusters (vector similarity)
mergedIntoClusterId?: string; // If deduplicated
// Resolution tracking
status: 'active' | 'investigating' | 'resolved' | 'ignored';
resolvedAt?: string;
resolutionCommit?: string; // Link to fix
// Timestamps
createdAt: string;
updatedAt: string;
ttl: number; // 90 days
}
```
### DiagnosticInsightDoc
```typescript
interface DiagnosticInsightDoc {
id: string; // di_<uuid>
clusterId: string; // partition key (with productId)
productId: string;
// AI-generated analysis
analysisType: 'root_cause' | 'pattern' | 'comparison' | 'trend';
generatedAt: string;
// LLM output
rootCauseCategory: 'config' | 'dependency' | 'logic' | 'resource' | 'external' | 'unknown';
hypothesis: string; // Natural language explanation
reasoning: string; // Why LLM thinks this
confidence: 'high' | 'medium' | 'low';
confidenceScore: number; // 0.0-1.0
// Evidence
evidence: Array<{
type:
| 'stack_trace'
| 'telemetry_pattern'
| 'device_correlation'
| 'api_failure'
| 'similar_issue';
description: string;
strength: 'strong' | 'moderate' | 'weak';
data: Record<string, unknown>;
}>;
// Suggested actions
suggestedInvestigation: string[];
potentialFixDirection?: string;
similarResolvedIssues?: Array<{
clusterId: string;
resolution: string;
confidence: number;
}>;
// Feedback
feedbackStats: {
helpful: number;
notHelpful: number;
engineerNotes: string[];
};
// LLM metadata
modelUsed: string; // gpt-4o, gpt-4o-mini
promptTokens: number;
completionTokens: number;
createdAt: string;
ttl: number; // 90 days
}
```
### NaturalLanguageQueryDoc
```typescript
interface NaturalLanguageQueryDoc {
id: string; // nq_<uuid>
userId: string; // Admin who asked
productId?: string; // Optional filter
// Query
rawQuery: string; // "Why did iOS keyboard crash yesterday?"
parsedIntent: 'root_cause' | 'pattern_search' | 'comparison' | 'trend' | 'impact';
extractedEntities: {
products?: string[];
timeRange?: { start: string; end: string };
errorTypes?: string[];
platforms?: string[];
userSegments?: string[];
};
// Execution
executedQuery: string; // Translated Cosmos query
dataSources: string[]; // Clusters, telemetry, sessions accessed
executionTimeMs: number;
// Response
aiResponse: string; // Generated answer
confidence: number; // Overall confidence
supportingData: Array<{
type: 'cluster' | 'telemetry' | 'session';
id: string;
relevanceScore: number;
}>;
// Feedback
userRating?: 'helpful' | 'not_helpful';
userComment?: string;
createdAt: string;
ttl: number; // 30 days
}
```
---
## Appendix B: API Reference
| Method | Endpoint | Auth | Description |
| ------ | --------------------------------------- | ----- | --------------------------------------- |
| POST | `/ai-diagnostics/query` | Admin | Natural language diagnostic query |
| GET | `/ai-diagnostics/clusters` | Admin | List error clusters (with AI summaries) |
| GET | `/ai-diagnostics/clusters/:id` | Admin | Cluster detail with AI analysis |
| POST | `/ai-diagnostics/clusters/:id/analyze` | Admin | Trigger fresh LLM analysis |
| GET | `/ai-diagnostics/clusters/:id/analysis` | Admin | Get pre-computed insight |
| GET | `/ai-diagnostics/suggestions` | Admin | AI-suggested investigations |
| POST | `/ai-diagnostics/feedback` | Admin | Rate insight helpfulness |
| POST | `/ai-diagnostics/search` | Admin | Semantic search across errors |
---
## Appendix C: Integration Points
### With Telemetry Module
- Error events auto-create/update clusters
- Telemetry context enriches error analysis
- Correlation IDs link errors to user journeys
### With Diagnostics Module
- Debug sessions linked to error clusters
- Screenshots from sessions aid visual analysis
- Network traces provide API failure context
### With Event Bus
| Event | Action |
| ------------------------------- | --------------------------------------------------------- |
| `telemetry.error.ingested` | Update/create cluster, trigger re-analysis if new pattern |
| `diagnostics.session.completed` | Link session to related clusters, analyze captured logs |
| `diagnostics.ingest.fatal` | High-priority cluster analysis, alert if novel pattern |
---
## Appendix D: Cost Estimation
| Component | Monthly Cost (est.) |
| ------------------------ | ------------------------------- |
| Azure OpenAI embeddings  | $50-100 (10K errors/day)        |
| GPT-4o-mini analysis     | $100-200 (1K analyses/day)      |
| GPT-4o deep analysis     | $50-100 (100 deep analyses/day) |
| Cosmos DB vector storage | $20-50                          |
| **Total**                | **$220-450/month**              |
Optimization:
- Cache frequent cluster analyses (24hr TTL)
- Use GPT-4o-mini for 90% of queries
- Batch embedding jobs during off-peak
---
## Current Status
- [ ] **Design complete** — Target: 2026-03-10
- [ ] **Phase 1: Data Pipeline** — Not started
- [ ] **Phase 2: LLM Engine** — Not started
- [ ] **Phase 3: Query Interface** — Not started
- [ ] **Phase 4: Admin UI** — Not started
- [ ] **Phase 5: Advanced Capabilities** — Future
**Estimated Timeline:** 2-3 weeks (Phases 1-4)
**Dependencies:**
- Telemetry module (must be collecting errors)
- Diagnostics module (optional, for rich context)
- Azure OpenAI deployment (embedding + GPT-4o access)
---
_Last Updated: 2026-03-03_


@@ -0,0 +1,719 @@
# Intelligent A/B Testing — Implementation Roadmap
> **Module:** `platform-service/src/modules/ab-testing/`
> **Admin UI:** `/ops/experiments/`
> **Target:** AI-powered experiment management with auto-allocation, early stopping, and hypothesis generation
> **Estimated Effort:** 2.5-3 weeks
> **Status:** 🟡 Planning
---
## Executive Summary
This roadmap delivers an **intelligent A/B testing platform** that goes beyond traditional feature flags. Unlike manual percentage rollouts, this system uses **Thompson sampling** for automatic traffic allocation, **Bayesian early stopping** when a variant clearly wins or loses, and **LLM-powered hypothesis generation** from feature flag usage patterns.
### Key Differentiators vs. Static Feature Flags
| Capability | Static Flags (Current) | Intelligent A/B Testing |
| ------------------ | ---------------------- | ----------------------------------------- |
| Traffic Allocation | Manual percentage | **Multi-armed bandit optimization** |
| Stopping Decision | Manual monitoring | **Auto-stop at statistical significance** |
| Winner Selection | Human judgment | **Bayesian probability of superiority** |
| Test Duration | Fixed (often wrong) | **Dynamic based on effect size** |
| Hypothesis | Human-written | **AI-generated from usage patterns** |
| Sample Size | Guesswork | **Power analysis + sequential testing** |
---
## Phase 1: Core Experiment Engine (Week 1)
### 1.1 Data Model & Schemas
- [ ] **1.1.1** Create `modules/ab-testing/types.ts`
- [ ] `ExperimentDoc` — experiment definition and config
- [ ] `VariantDoc` — variant metadata + metrics
- [ ] `AssignmentDoc` — user → variant assignments
- [ ] `MetricDoc` — event types being tracked
- [ ] `ExperimentResult` — statistical analysis results
- [ ] Zod schemas for all inputs
- [ ] **1.1.2** Add Cosmos containers to `cosmos-init.ts`
- [ ] `experiments` (pk: `/productId`, TTL: 2 years for completed)
- [ ] `experiment_variants` (pk: `/experimentId`)
- [ ] `experiment_assignments` (pk: `/userId`, query by experiment)
- [ ] `experiment_events` (pk: `/experimentId` + `/timestamp` for time-series)
- [ ] `experiment_metrics` (pk: `/experimentId`, computed aggregates)
### 1.2 Assignment & Bucketing
- [ ] **1.2.1** Create deterministic bucketing
- [ ] Consistent hashing (userId + experimentId → variant)
- [ ] FNV-1a hash algorithm (same as feature flags)
- [ ] Sticky assignments (user always sees same variant)
- [ ] Override capability (force specific variant for QA)
- [ ] **1.2.2** Assignment strategies
- [ ] `random` — Simple randomization (control vs static)
- [ ] `thompson` — Thompson sampling (multi-armed bandit)
- [ ] `epsilon_greedy` — Epsilon-greedy exploration
- [ ] `ucb` — Upper Confidence Bound algorithm
- [ ] **1.2.3** Audience targeting
- [ ] User property filters (platform, version, region, subscription tier)
- [ ] Percentage rollout within target segment
- [ ] Exclusion lists (beta users, internal accounts)
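The deterministic bucketing in 1.2.1 could be sketched as follows, using 32-bit FNV-1a as the roadmap specifies. The key format (`experimentId:userId`), the 10,000-bucket resolution, and the `VariantWeight` shape are assumptions; the hash below mixes UTF-16 code units, which matches byte-wise FNV-1a only for ASCII identifiers.

```typescript
/** 32-bit FNV-1a over the string's UTF-16 code units (byte-exact for ASCII input). */
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force unsigned
}

// Hypothetical allocation entry; percents are expected to sum to 100.
interface VariantWeight {
  id: string;
  percent: number;
}

/** Map a user to a variant; the same inputs always yield the same variant. */
function assignVariant(userId: string, experimentId: string, variants: VariantWeight[]): string {
  const bucket = fnv1a32(`${experimentId}:${userId}`) % 10000; // 0..9999, 0.01% resolution
  let cumulative = 0;
  for (const variant of variants) {
    cumulative += variant.percent * 100;
    if (bucket < cumulative) return variant.id;
  }
  return variants[variants.length - 1].id; // guard against rounding gaps
}
```

Including the experiment ID in the hash key means a user's bucket in one experiment is independent of their bucket in another. Note that with bandit strategies the allocation changes over time, so stickiness would need stored assignments rather than re-hashing alone.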
### 1.3 Event Tracking Pipeline
- [ ] **1.3.1** Metric definitions
- [ ] `conversion` — Binary (did/didn't convert)
- [ ] `count` — Integer events (sessions, messages)
- [ ] `duration` — Time-based (session length, task time)
- [ ] `revenue` — Monetary (purchase amount, LTV)
- [ ] `custom` — Arbitrary numeric values
- [ ] **1.3.2** Event ingestion
- [ ] `POST /ab-testing/events` batch endpoint
- [ ] Client SDK: `track(event, value, metadata)`
- [ ] Automatic attribution (which variant caused this event)
- [ ] Deduplication (eventId + userId uniqueness)
**Phase 1 Exit Criteria:**
- [ ] Experiments created with multiple variants
- [ ] Users consistently assigned to variants
- [ ] Events tracked and attributed correctly
- [ ] 20+ tests for assignment and ingestion
---
## Phase 2: Statistical Analysis Engine (Week 1-2)
### 2.1 Bayesian Inference
- [ ] **2.1.1** Create `modules/ab-testing/statistics.ts`
- [ ] `BetaDistribution` for conversion rates
- [ ] `GammaDistribution` for count/duration metrics
- [ ] `NormalDistribution` for continuous metrics
- [ ] Monte Carlo simulation (10,000 samples)
- [ ] **2.1.2** Probability calculations
- [ ] `probabilityVariantBeatsControl(variant, control)`
- [ ] `expectedLossIfChosen(variant)`
- [ ] `probabilityBeatAllVariants(variant)`
- [ ] **2.1.3** Credible intervals
- [ ] 95% credible interval for each variant's true metric
- [ ] Visualization-ready (lower, mean, upper bounds)
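For conversion metrics, `probabilityVariantBeatsControl` can be estimated by Monte Carlo sampling from Beta posteriors, as sketched below. Assumptions in this sketch: a uniform Beta(1, 1) prior, a small seeded generator so results are reproducible, and Gamma sampling by the Marsaglia-Tsang method; none of these choices come from the roadmap.

```typescript
/** Small seeded PRNG (mulberry32-style) returning values in [0, 1). */
function seededRng(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

/** Standard normal draw via Box-Muller. */
function sampleNormal(rng: () => number): number {
  const u = 1 - rng(); // in (0, 1], avoids log(0)
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * rng());
}

/** Gamma(shape, 1) draw via Marsaglia-Tsang; shapes below 1 use the boosting trick. */
function sampleGamma(shape: number, rng: () => number): number {
  if (shape < 1) return sampleGamma(shape + 1, rng) * Math.pow(1 - rng(), 1 / shape);
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    let x = 0;
    let v = 0;
    do {
      x = sampleNormal(rng);
      v = 1 + c * x;
    } while (v <= 0);
    v = v * v * v;
    const u = 1 - rng();
    if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) return d * v;
  }
}

/** Beta(alpha, beta) draw as a ratio of two Gamma draws. */
function sampleBeta(alpha: number, beta: number, rng: () => number): number {
  const x = sampleGamma(alpha, rng);
  return x / (x + sampleGamma(beta, rng));
}

interface ArmCounts {
  conversions: number;
  participants: number;
}

/** Fraction of posterior draws in which the variant's rate exceeds the control's. */
function probabilityVariantBeatsControl(
  variant: ArmCounts,
  control: ArmCounts,
  samples = 10000,
  rng: () => number = Math.random,
): number {
  let wins = 0;
  for (let i = 0; i < samples; i++) {
    const v = sampleBeta(1 + variant.conversions, 1 + variant.participants - variant.conversions, rng);
    const c = sampleBeta(1 + control.conversions, 1 + control.participants - control.conversions, rng);
    if (v > c) wins++;
  }
  return wins / samples;
}
```

The posterior parameters used here (1 + conversions, 1 + non-conversions) correspond to the `betaAlpha` and `betaBeta` fields on `VariantDoc`. The same draws can be sorted to read off a 95% credible interval from the 2.5th and 97.5th percentiles.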
### 2.2 Early Stopping Rules
- [ ] **2.2.1** Stopping criteria
- [ ] **Winner found:** Variant has > 95% probability of beating control
- [ ] **Loser clear:** Control has > 95% probability of beating variant
- [ ] **Practical significance:** Minimum detectable effect not reached
- [ ] **Time bound:** Max duration reached (safety limit)
- [ ] **2.2.2** Auto-promotion
- [ ] Auto-rollout winner to 100% when threshold hit
- [ ] Notify admins via Slack/email
- [ ] Create audit log entry
- [ ] **2.2.3** Guardrails
- [ ] Minimum sample size before early stopping (100 users/variant)
- [ ] Business hours only for auto-actions
- [ ] Require approval for revenue-impacting experiments
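The stopping criteria and guardrails above might combine into a single decision function along the lines of the sketch below. The order of checks is an assumption, and the practical-significance, business-hours, and approval rules are deliberately left out; the default values are the ones the roadmap states (100 users per variant, 95% threshold) plus the 30-day limit from the `ExperimentDoc` guardrails.

```typescript
// Hypothetical inputs for one variant compared against control.
interface StoppingInput {
  probabilityBeatsControl: number; // 0-1, from the Bayesian analysis
  smallestArmParticipants: number; // participants in the smaller of the two arms
  daysRunning: number;
}

interface StoppingGuardrails {
  minSampleSizePerVariant: number;
  maxDurationDays: number;
  winnerThreshold: number; // percent, e.g. 95
}

type StoppingDecision = "continue" | "stop_winner" | "stop_loser" | "stop_time_limit";

const DEFAULT_GUARDRAILS: StoppingGuardrails = {
  minSampleSizePerVariant: 100,
  maxDurationDays: 30,
  winnerThreshold: 95,
};

/** Decide whether an experiment arm should stop, applying guardrails first. */
function evaluateStopping(
  input: StoppingInput,
  guardrails: StoppingGuardrails = DEFAULT_GUARDRAILS,
): StoppingDecision {
  if (input.daysRunning >= guardrails.maxDurationDays) return "stop_time_limit";
  if (input.smallestArmParticipants < guardrails.minSampleSizePerVariant) return "continue";
  const threshold = guardrails.winnerThreshold / 100;
  if (input.probabilityBeatsControl > threshold) return "stop_winner";
  if (1 - input.probabilityBeatsControl > threshold) return "stop_loser";
  return "continue";
}
```

Checking the minimum sample size before the probability thresholds is what prevents an early lucky streak from triggering auto-promotion.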
### 2.3 Thompson Sampling
- [ ] **2.3.1** Multi-armed bandit implementation
- [ ] Sample from posterior distributions
- [ ] Assign user to variant with highest sample
- [ ] Re-balance traffic every hour based on performance
- [ ] **2.3.2** Exploration vs exploitation
- [ ] Exploration rate decays over time
- [ ] High uncertainty = more exploration
- [ ] Clear winner = more traffic to winner
- [ ] **2.3.3** Regret minimization
- [ ] Track cumulative regret vs optimal variant
- [ ] Regret bounds reporting
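The core Thompson sampling step in 2.3.1 (draw from each posterior, pick the highest draw) can be sketched as below for conversion metrics. To keep it short, the Beta draw uses the order-statistic identity, which is exact only for integer parameters and slow for large counts; a production sampler would differ. The uniform prior and all names are assumptions.

```typescript
/**
 * Beta(alpha, beta) draw for positive integers: the alpha-th smallest of
 * (alpha + beta - 1) uniform draws has exactly this distribution.
 */
function sampleBetaInt(alpha: number, beta: number, rng: () => number): number {
  const n = alpha + beta - 1;
  const draws = Array.from({ length: n }, () => rng()).sort((x, y) => x - y);
  return draws[alpha - 1];
}

interface BanditArm {
  id: string;
  conversions: number;
  participants: number;
}

/** One Thompson step: sample every arm's posterior and return the best arm's id. */
function chooseArm(arms: BanditArm[], rng: () => number = Math.random): string {
  let bestId = arms[0].id;
  let bestSample = -1;
  for (const arm of arms) {
    const sample = sampleBetaInt(1 + arm.conversions, 1 + arm.participants - arm.conversions, rng);
    if (sample > bestSample) {
      bestSample = sample;
      bestId = arm.id;
    }
  }
  return bestId;
}

/** Estimate traffic shares by repeating the Thompson step many times. */
function allocationShares(
  arms: BanditArm[],
  draws = 1000,
  rng: () => number = Math.random,
): Record<string, number> {
  const shares: Record<string, number> = {};
  for (const arm of arms) shares[arm.id] = 0;
  for (let i = 0; i < draws; i++) shares[chooseArm(arms, rng)] += 1 / draws;
  return shares;
}
```

An hourly job could write the output of `allocationShares` into each variant's `currentAllocationPercent`. Arms with little data have wide posteriors and so keep winning some draws, which is how exploration falls out of the method without a separate exploration rate.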
**Phase 2 Exit Criteria:**
- [ ] Bayesian probabilities calculated correctly
- [ ] Early stopping triggers at appropriate thresholds
- [ ] Thompson sampling re-allocates traffic dynamically
- [ ] Statistical tests validate correctness
---
## Phase 3: AI-Powered Hypothesis Generation (Week 2)
### 3.1 Pattern Detection
- [ ] **3.1.1** Usage pattern analysis
- [ ] Analyze feature flag usage telemetry
- [ ] Segment analysis (iOS vs Android, free vs pro)
- [ ] Temporal patterns (day of week, time of day)
- [ ] User behavior sequences (funnel analysis)
- [ ] **3.1.2** Anomaly detection
- [ ] Unexpected drop in feature adoption
- [ ] Performance regression signals
- [ ] User segment showing different behavior
- [ ] **3.1.3** Opportunity identification
- [ ] Underperforming features (low adoption)
- [ ] High-dropoff flows
- [ ] Competitor feature gaps
### 3.2 Hypothesis Generation
- [ ] **3.2.1** LLM hypothesis prompts
```
Given this feature usage data:
- Feature: {featureName}
- Current adoption: {adoptionRate}% (baseline: {baseline}%)
- Segment performance: {segmentData}
- User feedback: {feedbackSamples}
- Competitor analysis: {competitorFeatures}
Generate experiment hypotheses:
1. Primary hypothesis: "Changing X will improve Y because..."
2. Secondary hypotheses (2-3 alternatives)
3. Expected effect size (conservative estimate)
4. Success metric recommendation
5. Risk assessment
```
- [ ] **3.2.2** Hypothesis ranking
- [ ] Expected impact scoring
- [ ] Implementation difficulty estimate
- [ ] Statistical power prediction
- [ ] Risk-adjusted expected value
- [ ] **3.2.3** Suggested experiment design
- [ ] Variant count recommendation
- [ ] Traffic allocation suggestion
- [ ] Duration estimate
- [ ] Required sample size calculation
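For the required sample size calculation, a common planning estimate is the normal-approximation formula for comparing two proportions, sketched below. This is a frequentist power-analysis formula used here as a rough planning aid; the roadmap does not say which method will be used, and the defaults (two-sided 5% significance, 80% power) are assumptions.

```typescript
/**
 * Participants needed per variant to detect a relative lift in a conversion rate.
 * n = (zAlpha + zBeta)^2 * [p1(1 - p1) + p2(1 - p2)] / (p2 - p1)^2
 * Defaults: zAlpha = 1.96 (two-sided 5%), zBeta = 0.8416 (80% power).
 */
function sampleSizePerVariant(
  baselineRate: number,
  relativeLift: number,
  zAlpha = 1.96,
  zBeta = 0.8416,
): number {
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + relativeLift);
  if (p1 <= 0 || p1 >= 1 || p2 <= 0 || p2 >= 1 || p1 === p2) {
    throw new Error("Rates must lie strictly between 0 and 1 and differ");
  }
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / (p2 - p1) ** 2);
}
```

For example, detecting a 20% relative lift on a 10% baseline (10% to 12%) needs roughly 3,800 to 3,900 participants per variant under these defaults. Dividing that by expected daily eligible traffic gives the duration estimate listed above.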
### 3.3 Auto-Experiment Suggestions
- [ ] **3.3.1** Weekly AI reports
- [ ] Top 5 experiment opportunities
- [ ] Hypotheses with supporting evidence
- [ ] Prioritized by expected impact
- [ ] **3.3.2** One-click experiment creation
- [ ] Pre-fill experiment from hypothesis
- [ ] Suggested variants with descriptions
- [ ] Pre-configured metrics
**Phase 3 Exit Criteria:**
- [ ] AI generates meaningful hypotheses from usage data
- [ ] Hypothesis quality rated by product team (80%+ useful)
- [ ] Auto-suggested experiments created in 1 click
- [ ] Weekly reports generated automatically
---
## Phase 4: Admin Dashboard UI (Week 2-3)
### 4.1 Experiments List Page
- [ ] **4.1.1** Create `/ops/experiments/page.tsx`
- [ ] Experiment cards (status, duration, sample size)
- [ ] Quick filters (running, completed, draft)
- [ ] AI-generated hypothesis badge
- [ ] Health indicators (traffic balance, event flow)
- [ ] **4.1.2** Experiment creation wizard
- [ ] Step 1: Define hypothesis (AI suggestions available)
- [ ] Step 2: Create variants (name, description, config)
- [ ] Step 3: Select metrics (primary + secondary)
- [ ] Step 4: Audience targeting
- [ ] Step 5: Traffic allocation (manual or Thompson)
- [ ] Step 6: Review and launch
### 4.2 Live Experiment Dashboard
- [ ] **4.2.1** Create `/ops/experiments/[id]/page.tsx`
- [ ] Real-time metrics comparison
- [ ] Variant performance table (conversions, counts, durations)
- [ ] Bayesian probability visualization
- [ ] Credible interval charts
- [ ] **4.2.2** Statistical summary card
- [ ] Probability of beating control (per variant)
- [ ] Expected lift if implemented
- [ ] Sample size progress bar
- [ ] Days to significance estimate
- [ ] **4.2.3** Action buttons
- [ ] Adjust traffic allocation
- [ ] Pause/resume experiment
- [ ] Stop and declare winner
- [ ] Rollout winner to 100%
- [ ] Archive experiment
### 4.3 Results & Reporting
- [ ] **4.3.1** Results page
- [ ] Final statistical summary
- [ ] Variant comparison visualization
- [ ] Segment breakdown (iOS vs Android, etc.)
- [ ] Credible intervals over time
- [ ] **4.3.2** AI insights panel
- [ ] Why this result occurred (LLM summary)
- [ ] Unexpected findings
- [ ] Follow-up experiment suggestions
- [ ] **4.3.3** Export capabilities
- [ ] CSV export of raw data
- [ ] PDF report generation
- [ ] API endpoint for data warehouse sync
**Phase 4 Exit Criteria:**
- [ ] Full experiment lifecycle manageable in UI
- [ ] Real-time stats visible and accurate
- [ ] Bayesian visualizations clear to non-statisticians
- [ ] Export and reporting functional
---
## Phase 5: Advanced Capabilities (Future)
### 5.1 Multi-Variate Testing
- [ ] Test multiple variables simultaneously
- [ ] Full factorial and fractional factorial designs
- [ ] Interaction effect detection
### 5.2 Sequential Experimentation
- [ ] Multi-phase experiments (qualification → main → validation)
- [ ] Holdout groups for long-term validation
- [ ] Global holdout (never-exposed users)
### 5.3 Personalization Layer
- [ ] Contextual bandits (different variants for different users)
- [ ] ML model for variant selection
- [ ] Automatic personalization optimization
### 5.4 Experiment Coordination
- [ ] Mutually exclusive experiments
- [ ] Experiment priority rules
- [ ] Layered experimentation (orthogonal tests)
---
## Appendix A: Data Models
### ExperimentDoc
```typescript
interface ExperimentDoc {
  id: string; // exp_<uuid>
  productId: string; // partition key
  // Experiment definition
  name: string;
  description: string;
  hypothesis: string;
  aiGeneratedHypothesis?: boolean; // Flag for AI-suggested
  // Status lifecycle: draft → running → paused | stopped | completed
  status: 'draft' | 'running' | 'paused' | 'stopped' | 'completed';
  // Variants
  controlVariantId: string; // Baseline variant
  variantIds: string[]; // All variant IDs
  // Configuration
  allocationStrategy: 'random' | 'thompson' | 'epsilon_greedy' | 'ucb';
  targetPercent: number; // % of eligible traffic
  // Audience targeting
  targeting: {
    platforms?: string[]; // ios, android, web
    appVersions?: { min: string; max?: string };
    regions?: string[];
    userSegments?: string[]; // pro, free, enterprise
    userProperties?: Record<string, string | number | boolean>;
  };
  // Metrics
  primaryMetric: {
    name: string;
    type: 'conversion' | 'count' | 'duration' | 'revenue' | 'custom';
    eventName: string; // Telemetry event to track
    aggregation: 'sum' | 'mean' | 'count' | 'unique';
    direction: 'increase' | 'decrease'; // Is higher better?
    minimumDetectableEffect: number; // % change we want to detect
  };
  secondaryMetrics: Array<{
    name: string;
    type: 'conversion' | 'count' | 'duration' | 'revenue' | 'custom';
    eventName: string;
  }>;
  // Guardrails
  guardrails: {
    minSampleSizePerVariant: number; // Default: 100
    maxDurationDays: number; // Safety limit, default: 30
    autoStopEnabled: boolean;
    winnerThreshold: number; // % probability to auto-stop, default: 95
    requireApprovalFor: 'none' | 'revenue' | 'all';
  };
  // Scheduling
  startAt?: string; // Scheduled start (ISO 8601)
  endAt?: string; // Scheduled end or actual stop
  // Stats (denormalized for fast reads)
  totalParticipants: number;
  totalEvents: number;
  // Timestamps
  createdAt: string;
  updatedAt: string;
  startedAt?: string;
  completedAt?: string;
  ttl: number; // 2 years for completed
}
```
### VariantDoc
```typescript
interface VariantDoc {
  id: string; // var_<uuid>
  experimentId: string; // partition key
  // Variant definition
  name: string; // "Control", "New Button Color", etc.
  description?: string;
  isControl: boolean;
  // Feature flag configuration
  flagConfig: Record<string, unknown>; // Arbitrary config payload
  // Traffic allocation (dynamic for bandit strategies)
  currentAllocationPercent: number; // 0-100%
  // Statistics (real-time computed)
  stats: {
    participants: number;
    events: number;
    // Primary metric
    primaryMetricValue: number; // Mean or conversion rate
    primaryMetricStdDev?: number;
    // For conversion metrics
    conversions?: number;
    conversionRate?: number; // 0-1
    // Bayesian posterior parameters
    betaAlpha?: number; // For Beta distribution
    betaBeta?: number;
    gammaShape?: number; // For Gamma distribution
    gammaScale?: number;
  };
  // Bayesian results
  bayesianResults?: {
    probabilityBeatsControl: number; // 0-1
    probabilityBeatsAll: number; // 0-1
    expectedLiftPercent: number; // Relative to control
    expectedLoss: number; // Risk of choosing this variant
    credibleInterval: {
      lower: number;
      mean: number;
      upper: number;
    };
  };
  createdAt: string;
  updatedAt: string;
}
```
### ExperimentAssignmentDoc
```typescript
interface ExperimentAssignmentDoc {
  id: string; // ea_<uuid>
  userId: string; // partition key (for user lookups)
  experimentId: string;
  variantId: string;
  // Assignment metadata
  assignedAt: string; // First assignment
  firstExposedAt?: string; // First actual exposure (feature use)
  // Context at assignment
  assignmentContext: {
    platform: string;
    appVersion: string;
    osVersion: string;
    deviceModel?: string;
    region?: string;
  };
  // Events attributed to this assignment
  eventCount: number;
  lastEventAt?: string;
  // TTL: Remove after experiment completes + analysis period
  ttl: number; // experimentEnd + 90 days
}
```
### ExperimentEventDoc
```typescript
interface ExperimentEventDoc {
  id: string; // ee_<uuid>
  experimentId: string; // partition key
  timestamp: string; // Sort key for time-series queries
  // Attribution
  userId: string;
  variantId: string;
  assignmentId: string;
  // Event details
  metricName: string;
  metricType: 'conversion' | 'count' | 'duration' | 'revenue' | 'custom';
  value: number; // Numeric value
  // Conversion tracking (for binary metrics)
  converted: boolean; // For conversion metrics
  // Context
  eventMetadata?: Record<string, unknown>;
  // Denormalized for filtering
  platform: string;
  appVersion: string;
  // TTL: Shorter for raw events
  ttl: number; // 90 days
}
```
---
## Implementation Tracking
| Phase | Task | Status | Commit |
| ----- | ----------------------------- | ------ | ------ |
| 1.1 | Experiment types & schemas | ⬜ | — |
| 1.1 | Cosmos containers | ⬜ | — |
| 1.2 | Deterministic bucketing | ⬜ | — |
| 1.2 | Assignment strategies | ⬜ | — |
| 1.2 | Audience targeting | ⬜ | — |
| 1.3 | Metric definitions | ⬜ | — |
| 1.3 | Event ingestion | ⬜ | — |
| 2.1 | Bayesian inference engine | ⬜ | — |
| 2.1 | Probability calculations | ⬜ | — |
| 2.1 | Credible intervals | ⬜ | — |
| 2.2 | Early stopping rules | ⬜ | — |
| 2.2 | Auto-promotion | ⬜ | — |
| 2.2 | Guardrails | ⬜ | — |
| 2.3 | Thompson sampling | ⬜ | — |
| 2.3 | Exploration vs exploitation | ⬜ | — |
| 2.3 | Regret minimization | ⬜ | — |
| 3.1 | Pattern detection | ⬜ | — |
| 3.1 | Anomaly detection | ⬜ | — |
| 3.2 | Hypothesis generation prompts | ⬜ | — |
| 3.2 | Hypothesis ranking | ⬜ | — |
| 3.3 | Auto-experiment suggestions | ⬜ | — |
| 4.1 | Experiments list page | ⬜ | — |
| 4.1 | Creation wizard | ⬜ | — |
| 4.2 | Live dashboard | ⬜ | — |
| 4.2 | Statistical summary | ⬜ | — |
| 4.3 | Results & reporting | ⬜ | — |
| 4.3 | AI insights panel | ⬜ | — |
**Legend:** ⬜ Not started | 🟡 In progress | ✅ Complete | ⏸️ Deferred
---
## Quick Reference for Implementing Agent
**📋 Full Roadmap:** `/Users/sd9235/code/mygh/learning_ai_common_plat/docs/roadmaps/INTELLIGENT_AB_TESTING_ROADMAP.md`
**Key Files to Modify/Create:**
```
services/platform-service/
├── src/
│   ├── modules/ab-testing/
│   │   ├── types.ts                  # [1.1] Experiment, Variant, Assignment types
│   │   ├── repository.ts             # [1.2] Data access layer
│   │   ├── bucketing.ts              # [1.2] FNV-1a hash, sticky assignments
│   │   ├── statistics.ts             # [2.1] Bayesian inference, Beta/Normal distributions
│   │   ├── allocation.ts             # [2.3] Thompson sampling, bandit strategies
│   │   ├── hypothesis-generator.ts   # [3.2] LLM pattern analysis
│   │   ├── routes.ts                 # [4] REST API
│   │   └── ab-testing.test.ts        # Tests
│   ├── lib/
│   │   └── cosmos-init.ts            # [1.1] Add containers
│   └── server.ts                     # Register routes
dashboards/admin-web/
├── src/
│   ├── app/(dashboard)/
│   │   ├── experiments/
│   │   │   ├── page.tsx              # [4.1] Experiments list
│   │   │   ├── new/page.tsx          # [4.1] Creation wizard
│   │   │   └── [id]/
│   │   │       └── page.tsx          # [4.2] Live dashboard
│   ├── lib/
│   │   └── experiments-client.ts     # API client
│   └── components/
│       └── experiments/              # Bayesian charts, variant cards
```
**Commit Message Format:**
```
feat(ab-testing): <description> [<task.code>]
```
**Example:**
```bash
git add services/platform-service/src/modules/ab-testing/
git commit -m "feat(ab-testing): add experiment types and cosmos containers [1.1]"
```
**Testing Requirements:**
- Unit tests: 25+ Vitest tests for bucketing, statistics, bandit algorithms
- Statistical validation: A/A tests, known distribution tests
- Integration: End-to-end experiment lifecycle
**Dependencies:**
- Feature flags module (reuse bucketing logic)
- Telemetry module (event tracking)
- Azure OpenAI (hypothesis generation)
---
## Appendix B: Statistical Methods
### Bayesian A/B Testing
**Conversion Metrics (Beta-Binomial):**
```
Posterior: Beta(α + conversions, β + non-conversions)
  where α = β = 1 (uniform prior)

Probability variant beats control:
  P(variant > control) = ∫₀¹ BetaCDF_control(x) · BetaPDF_variant(x) dx
  (evaluated numerically, in closed form for integer parameters, or by Monte Carlo sampling)
```
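The Beta-Binomial comparison above can be sketched in TypeScript as follows. This is a minimal illustration, not the planned `statistics.ts`: it assumes the uniform Beta(1, 1) prior stated above, so all posterior parameters are positive integers, and uses the known finite-sum closed form for P(variant > control) in that case. The names `ArmCounts`, `logGamma`, `logBeta`, and `probVariantBeatsControl` are hypothetical.

```typescript
// Lanczos approximation of ln Γ(x) (g = 7, 9 coefficients).
// Only used here for x >= 1, so no reflection formula is needed.
const LANCZOS = [
  0.99999999999980993, 676.5203681218851, -1259.1392167224028,
  771.32342877765313, -176.61502916214059, 12.507343278686905,
  -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7,
];

function logGamma(x: number): number {
  const z = x - 1;
  let sum = LANCZOS[0];
  for (let i = 1; i < LANCZOS.length; i++) sum += LANCZOS[i] / (z + i);
  const t = z + 7.5;
  return 0.5 * Math.log(2 * Math.PI) + (z + 0.5) * Math.log(t) - t + Math.log(sum);
}

// ln B(a, b) = ln Γ(a) + ln Γ(b) - ln Γ(a + b)
function logBeta(a: number, b: number): number {
  return logGamma(a) + logGamma(b) - logGamma(a + b);
}

// Hypothetical input shape: raw counts for one arm.
interface ArmCounts {
  conversions: number;
  participants: number;
}

// P(variant rate > control rate) under Beta(1, 1) priors.
// Closed form (valid for integer variant alpha):
//   sum_{i=0}^{aV-1} B(aC + i, bC + bV) / ((bV + i) * B(1 + i, bV) * B(aC, bC))
function probVariantBeatsControl(control: ArmCounts, variant: ArmCounts): number {
  const aC = 1 + control.conversions;
  const bC = 1 + control.participants - control.conversions;
  const aV = 1 + variant.conversions;
  const bV = 1 + variant.participants - variant.conversions;
  let total = 0;
  for (let i = 0; i < aV; i++) {
    total += Math.exp(
      logBeta(aC + i, bC + bV) - Math.log(bV + i) - logBeta(1 + i, bV) - logBeta(aC, bC),
    );
  }
  return total;
}
```

Working in log space keeps the Beta functions from underflowing at realistic sample sizes. For non-integer priors or continuous metrics, Monte Carlo sampling from the posteriors is the more general route.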
**Continuous Metrics (Normal):**
```
Posterior: Normal(μ_n, σ_n²)
Where μ_n, σ_n updated via conjugate prior
Probability variant beats control via Monte Carlo sampling
```
### Thompson Sampling
```
For each incoming user:
  For each variant i:
    Sample θ_i from variant's posterior distribution
  Assign user to variant with max(θ_i)
  Update variant's posterior after observing outcome
```
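A minimal TypeScript sketch of the Thompson sampling loop above for conversion metrics. It assumes Beta posteriors stored as `betaAlpha`/`betaBeta` (the field names from `VariantDoc.stats`) and draws Beta samples via two Gamma draws using the Marsaglia-Tsang method. The helper names (`mulberry32`, `sampleGamma`, `thompsonSelect`, `updateArm`) and the `BetaArm` shape are hypothetical, and the seedable generator is only there to make runs reproducible.

```typescript
type Rng = () => number; // uniform in [0, 1)

// Small seedable PRNG (mulberry32) so simulations and tests are reproducible.
function mulberry32(seed: number): Rng {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal draw via Box-Muller; 1 - rng() lies in (0, 1], avoiding log(0).
function sampleNormal(rng: Rng): number {
  const u1 = 1 - rng();
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Gamma(shape, 1) draw using Marsaglia-Tsang; shape < 1 uses the usual boost.
function sampleGamma(shape: number, rng: Rng): number {
  if (shape < 1) {
    return sampleGamma(shape + 1, rng) * Math.pow(1 - rng(), 1 / shape);
  }
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    const x = sampleNormal(rng);
    const v = Math.pow(1 + c * x, 3);
    if (v <= 0) continue;
    const u = 1 - rng();
    if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) return d * v;
  }
}

// Beta(alpha, beta) draw as X / (X + Y) with X ~ Gamma(alpha), Y ~ Gamma(beta).
function sampleBeta(alpha: number, beta: number, rng: Rng): number {
  const x = sampleGamma(alpha, rng);
  const y = sampleGamma(beta, rng);
  return x / (x + y);
}

// Hypothetical arm shape mirroring the posterior fields in VariantDoc.stats.
interface BetaArm {
  variantId: string;
  betaAlpha: number;
  betaBeta: number;
}

// Sample one θ per arm and return the arm with the largest draw.
function thompsonSelect(arms: BetaArm[], rng: Rng = Math.random): string {
  if (arms.length === 0) throw new Error("at least one arm is required");
  let best = arms[0];
  let bestDraw = -Infinity;
  for (const arm of arms) {
    const draw = sampleBeta(arm.betaAlpha, arm.betaBeta, rng);
    if (draw > bestDraw) {
      bestDraw = draw;
      best = arm;
    }
  }
  return best.variantId;
}

// Posterior update after observing one outcome for the assigned arm.
function updateArm(arm: BetaArm, converted: boolean): BetaArm {
  return converted
    ? { ...arm, betaAlpha: arm.betaAlpha + 1 }
    : { ...arm, betaBeta: arm.betaBeta + 1 };
}
```

Because assignment is random per request, a production version would likely need to persist the chosen variant so that a returning user keeps a sticky assignment.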
### Early Stopping
```
Stop experiment when:
  days_running > max_duration                        → Safety limit reached
  OR (samples_per_variant > min_sample_size AND
        (max_variant P(beats control) > 0.95         → Winner found
         OR max_variant P(beats control) < 0.05))    → No winner
```
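The stopping rule can be expressed as a small pure function. This sketch assumes one reading of the rule: the duration limit always applies, while the probability thresholds are only checked once every variant has passed the minimum sample size. The `StopInput` and `StopDecision` shapes and the function name are hypothetical; `winnerThreshold` is a percentage, as in `ExperimentDoc.guardrails`.

```typescript
type StopDecision =
  | { stop: false }
  | { stop: true; reason: "winner" | "no_winner" | "max_duration" };

interface StopInput {
  probBeatsControl: number[]; // P(beats control) per non-control variant, 0-1
  minSamplesAcrossVariants: number; // smallest participant count of any variant
  daysRunning: number;
  minSampleSizePerVariant: number; // guardrail, e.g. 100
  maxDurationDays: number; // guardrail, e.g. 30
  winnerThreshold: number; // percent, e.g. 95
}

function evaluateEarlyStop(input: StopInput): StopDecision {
  // Safety limit: stop regardless of how much evidence has accumulated.
  if (input.daysRunning > input.maxDurationDays) {
    return { stop: true, reason: "max_duration" };
  }
  // Not enough data (or nothing to compare) yet: keep running.
  if (
    input.probBeatsControl.length === 0 ||
    input.minSamplesAcrossVariants <= input.minSampleSizePerVariant
  ) {
    return { stop: false };
  }
  const best = Math.max(...input.probBeatsControl);
  const upper = input.winnerThreshold / 100;
  if (best > upper) return { stop: true, reason: "winner" };
  // Even the best variant is very unlikely to beat control.
  if (best < 1 - upper) return { stop: true, reason: "no_winner" };
  return { stop: false };
}
```

Checking a threshold repeatedly as data arrives raises the chance of stopping on noise, which is one reason the minimum sample size guardrail sits in front of the probability checks.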
---
## Appendix C: API Reference
| Method | Endpoint | Auth | Description |
| ------ | -------------------------------------- | -------- | -------------------------------- |
| POST | `/ab-testing/experiments` | Admin | Create experiment |
| GET | `/ab-testing/experiments` | Admin | List experiments |
| GET | `/ab-testing/experiments/:id` | Admin | Get experiment details |
| PATCH | `/ab-testing/experiments/:id` | Admin | Update experiment |
| DELETE | `/ab-testing/experiments/:id` | Admin | Stop/archive experiment |
| POST | `/ab-testing/experiments/:id/start` | Admin | Start experiment |
| POST | `/ab-testing/experiments/:id/pause` | Admin | Pause experiment |
| POST | `/ab-testing/experiments/:id/complete` | Admin | Complete with winner |
| POST | `/ab-testing/assign` | Any auth | Get variant assignment for user |
| POST | `/ab-testing/events` | Any auth | Track experiment event |
| GET | `/ab-testing/experiments/:id/results` | Admin | Get statistical results |
| GET | `/ab-testing/suggestions` | Admin | AI-generated experiment ideas |
| POST | `/ab-testing/hypotheses` | Admin | Generate hypothesis from pattern |
---
## Appendix D: Integration Points
### With Feature Flags Module
- Experiments build on feature flag infrastructure
- Flag state = variant assignment
- Consistent bucketing with existing flags
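Deterministic bucketing of the kind referenced above can be sketched with a 32-bit FNV-1a hash. This is an illustration only: the key format (`experimentId:userId`), the 0.01% bucket resolution, and the names `fnv1a32`, `bucketPercent`, `isInExperiment`, `pickVariant`, and `Allocation` are assumptions, not the existing feature-flag implementation.

```typescript
// 32-bit FNV-1a over the UTF-8 bytes of the input.
function fnv1a32(input: string): number {
  const bytes = new TextEncoder().encode(input);
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i];
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit wraparound
  }
  return hash >>> 0; // as unsigned 32-bit integer
}

// Maps a (salt, user) pair to a stable point in [0, 100) with 0.01 resolution.
function bucketPercent(salt: string, userId: string): number {
  return (fnv1a32(`${salt}:${userId}`) % 10000) / 100;
}

// Traffic gate: is this user inside the experiment's targetPercent?
function isInExperiment(experimentId: string, userId: string, targetPercent: number): boolean {
  return bucketPercent(experimentId, userId) < targetPercent;
}

// Hypothetical allocation entry; percents are expected to sum to 100.
interface Allocation {
  variantId: string;
  percent: number;
}

// Picks a variant by walking cumulative allocation with a second, differently
// salted hash so the variant choice is independent of the traffic gate.
function pickVariant(
  experimentId: string,
  userId: string,
  allocations: Allocation[],
): string | undefined {
  const point = bucketPercent(`${experimentId}:variant`, userId);
  let cumulative = 0;
  for (const allocation of allocations) {
    cumulative += allocation.percent;
    if (point < cumulative) return allocation.variantId;
  }
  return undefined; // allocations summed to less than the user's point
}
```

Since the same inputs always hash to the same bucket, a user keeps the same assignment across sessions without a lookup, as long as the allocation percentages do not change.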
### With Telemetry Module
- Experiment events enriched with telemetry context
- Automatic metric tracking from existing events
- Funnel analysis using telemetry breadcrumbs
### With Event Bus
| Event | Action |
| ----------------------------- | ----------------------------------- |
| `ab.experiment.started` | Notify stakeholders, log audit |
| `ab.experiment.completed` | Generate report, suggest follow-ups |
| `ab.variant.declared_winner` | Trigger auto-rollout if enabled |
| `ab.early_stopping.triggered` | Alert experiment owner |
---
## Appendix E: Cost Estimation
| Component | Monthly Cost (est.) |
| ---------------------------- | ------------------------ |
| Cosmos DB (experiment data) | $100-200 |
| LLM hypothesis generation | $50-100 (weekly reports) |
| Compute (statistical engine) | $50 (negligible) |
| **Total** | **$200-350/month** |
---
## Current Status
- [ ] **Design complete** — Target: 2026-03-10
- [ ] **Phase 1: Core Engine** — Not started
- [ ] **Phase 2: Statistics** — Not started
- [ ] **Phase 3: AI Hypotheses** — Not started
- [ ] **Phase 4: Admin UI** — Not started
- [ ] **Phase 5: Advanced** — Future
**Estimated Timeline:** 2.5-3 weeks (Phases 1-4)
**Dependencies:**
- Feature flags module (for assignment infrastructure)
- Telemetry module (for event tracking)
- Azure OpenAI (for hypothesis generation)
---
_Last Updated: 2026-03-03_


@ -0,0 +1,848 @@
# Predictive Churn & Health Scoring — Implementation Roadmap
> **Module:** `platform-service/src/modules/predictive-analytics/`
> **Admin UI:** `/ops/health-dashboard/`
> **Target:** ML-powered churn prediction, health scoring, and proactive retention
> **Estimated Effort:** 3 weeks
> **Status:** 🟡 Planning
---
## Executive Summary
This roadmap delivers a **predictive analytics platform** that forecasts user churn 7-30 days in advance and computes product health scores from telemetry. Unlike reactive dashboards that show what happened, this system **predicts what will happen**, enabling proactive retention campaigns, resource allocation, and product improvements before users leave.
### Key Differentiators vs. Reactive Analytics
| Capability | Traditional Analytics | Predictive Churn & Health |
| ------------------- | -------------------------- | --------------------------------- |
| Insight Type | Historical (what happened) | **Predictive (what will happen)** |
| Churn Detection | After user leaves | **7-30 days before churn** |
| Health View | Current snapshot only | **Trending + forecasted** |
| Interventions | Reactive recovery | **Proactive prevention** |
| Product Insights | Manual pattern search | **Auto-detected risk signals** |
| Resource Allocation | Guesswork | **Risk-weighted prioritization** |
---
## Phase 1: Feature Engineering Pipeline (Week 1)
### 1.1 Telemetry Feature Extraction
- [ ] **1.1.1** Create `modules/predictive-analytics/feature-extractor.ts`
- [ ] User behavior features (session frequency, depth, recency)
- [ ] Engagement features (feature usage diversity, core action completion)
- [ ] Performance features (error rate, latency exposure, crash frequency)
- [ ] Social features (sharing, collaboration, network effects)
- [ ] Revenue features (payment history, plan changes, support tickets)
- [ ] **1.1.2** Time-window aggregations
- [ ] Last 24 hours (recent behavior)
- [ ] Last 7 days (weekly patterns)
- [ ] Last 30 days (monthly trends)
- [ ] Life-to-date (all-time totals)
- [ ] **1.1.3** Rolling window features
- [ ] 7-day rolling average (trend smoothing)
- [ ] Week-over-week change (acceleration)
- [ ] Cohort-normalized scores (vs. similar users)
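The rolling-window features in 1.1.3 can be sketched as pure functions over a per-user daily series. This is a minimal illustration that assumes daily counts ordered oldest first; the function names are hypothetical, and the cohort normalization shown is a plain z-score, which is only one possible choice.

```typescript
// Rolling mean over `window` days; output starts once a full window exists.
function rollingMean(dailyCounts: number[], window: number): number[] {
  const out: number[] = [];
  let sum = 0;
  for (let i = 0; i < dailyCounts.length; i++) {
    sum += dailyCounts[i];
    if (i >= window) sum -= dailyCounts[i - window];
    if (i >= window - 1) out.push(sum / window);
  }
  return out;
}

// Relative change of the last 7 days vs. the 7 days before them.
// Returns null when there is under 14 days of data or no prior activity.
function weekOverWeekChange(dailyCounts: number[]): number | null {
  if (dailyCounts.length < 14) return null;
  const sumRange = (from: number, to: number): number =>
    dailyCounts.slice(from, to).reduce((acc, v) => acc + v, 0);
  const n = dailyCounts.length;
  const lastWeek = sumRange(n - 7, n);
  const previousWeek = sumRange(n - 14, n - 7);
  if (previousWeek === 0) return null;
  return (lastWeek - previousWeek) / previousWeek;
}

// Cohort-normalized score: how many sample standard deviations a user's
// value sits from the mean of similar users.
function cohortZScore(value: number, cohortValues: number[]): number | null {
  if (cohortValues.length < 2) return null;
  const mean = cohortValues.reduce((acc, v) => acc + v, 0) / cohortValues.length;
  const variance =
    cohortValues.reduce((acc, v) => acc + (v - mean) ** 2, 0) / (cohortValues.length - 1);
  if (variance === 0) return null;
  return (value - mean) / Math.sqrt(variance);
}
```

Returning `null` rather than zero for undefined cases lets the feature store distinguish "no signal" from "no change".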
### 1.2 Feature Store
- [ ] **1.2.1** Create `modules/predictive-analytics/feature-store.ts`
- [ ] `UserFeatureVector` — normalized feature values per user
- [ ] `ProductHealthMetrics` — aggregated product-level scores
- [ ] Feature versioning (track feature schema changes)
- [ ] **1.2.2** Add Cosmos containers to `cosmos-init.ts`
- [ ] `user_features` (pk: `/userId`, TTL: 90 days)
- [ ] `product_health` (pk: `/productId` + `/date`, time-series)
- [ ] `feature_definitions` (pk: `/productId`, feature metadata)
- [ ] **1.2.3** Feature computation jobs
- [ ] Daily feature computation (nightly batch)
- [ ] Real-time feature updates (on key events)
- [ ] Feature backfill (compute historical features)
### 1.3 Product-Specific Feature Catalog
- [ ] **1.3.1** Define features per product
- [ ] **NomGap:** Fast completion rate, protocol adherence, streak length, autophagy engagement
- [ ] **JarvisJr:** Session frequency, agent diversity, voice/text ratio, skill progression
- [ ] **ChronoMind:** Timer completion rate, cascade effectiveness, routine adherence, urgency response
- [ ] **MindLyst:** Brain usage diversity, triage accuracy, memory capture frequency, reflection completion
- [ ] **PeakPulse:** Session frequency, goal completion, streak maintenance, social sharing
- [ ] **LysnrAI:** Dictation frequency, accuracy rate, hotkey usage, vocabulary growth
- [ ] **1.3.2** Feature importance tracking
- [ ] Which features correlate with churn/retention
- [ ] Feature drift detection (behavior changes over time)
- [ ] Auto-suggest new features based on patterns
**Phase 1 Exit Criteria:**
- [ ] 50+ features extracted per product
- [ ] Feature store populated for all active users
- [ ] Daily feature computation job running
- [ ] Feature importance analysis completed
---
## Phase 2: Churn Prediction Model (Week 1-2)
### 2.1 Model Architecture
- [ ] **2.1.1** Create `modules/predictive-analytics/churn-model.ts`
- [ ] Binary classification (will churn in 7 days? 30 days?)
- [ ] Gradient Boosted Trees (XGBoost/LightGBM) baseline
- [ ] Neural network ensemble (for comparison)
- [ ] **2.1.2** Training pipeline
- [ ] Label definition: No activity for N days = churned
- [ ] Train/validation/test split (time-based, not random)
- [ ] Cross-validation with temporal folds
- [ ] Hyperparameter tuning (optuna/ray tune)
- [ ] **2.1.3** Model evaluation
- [ ] ROC-AUC (discrimination ability)
- [ ] Precision/Recall at different thresholds
- [ ] Calibration (predicted prob vs. actual rate)
- [ ] Per-product performance breakdown
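The label definition and time-based split from 2.1.2 can be sketched as below. This assumes snapshot and activity dates are `YYYY-MM-DD` strings (so they parse as UTC and compare correctly as strings) and that "churned" means no activity in the horizon after the snapshot. The function names and the `snapshotDate` field are hypothetical.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// A user is labeled churned at `snapshotDate` if no activity falls in the
// following `horizonDays` days (snapshot day excluded, last day included).
function labelChurn(snapshotDate: string, activityDates: string[], horizonDays: number): boolean {
  const start = Date.parse(snapshotDate);
  const end = start + horizonDays * DAY_MS;
  return !activityDates.some((d) => {
    const t = Date.parse(d);
    return t > start && t <= end;
  });
}

interface Split<T> {
  train: T[];
  validation: T[];
  test: T[];
}

// Time-based split: everything before `validationStart` trains the model,
// later snapshots are held out, so no future data leaks into training.
function temporalSplit<T extends { snapshotDate: string }>(
  rows: T[],
  validationStart: string,
  testStart: string,
): Split<T> {
  const split: Split<T> = { train: [], validation: [], test: [] };
  for (const row of rows) {
    if (row.snapshotDate < validationStart) split.train.push(row);
    else if (row.snapshotDate < testStart) split.validation.push(row);
    else split.test.push(row);
  }
  return split;
}
```

A random split would let the model see behavior from after the period it is asked to predict, which tends to inflate offline metrics; splitting on time avoids that.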
### 2.2 Prediction Service
- [ ] **2.2.1** Real-time scoring API
- [ ] `POST /predictive/churn-score` — single user prediction
- [ ] `POST /predictive/churn-batch` — batch scoring
- [ ] Latency < 100ms for single prediction
- [ ] **2.2.2** Risk segmentation
- [ ] Risk buckets: Critical (>80%), High (60-80%), Medium (30-60%), Low (<30%)
- [ ] Risk score components (which features drive the score)
- [ ] Confidence intervals on predictions
- [ ] **2.2.3** Model versioning
- [ ] A/B test model versions
- [ ] Shadow mode (predict without acting)
- [ ] Rollback capability
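The risk buckets in 2.2.2 map directly to a small classifier. The listed ranges share their boundary values, so this sketch assumes a boundary belongs to the higher-risk bucket except at 80%, which stays in High because Critical is defined as strictly above 80%. The function name is hypothetical; the segment names match `riskSegment` in `UserChurnPredictionDoc`.

```typescript
type RiskSegment = "critical" | "high" | "medium" | "low";

// Maps a churn probability in [0, 1] to a risk bucket.
function toRiskSegment(churnProbability: number): RiskSegment {
  if (!(churnProbability >= 0 && churnProbability <= 1)) {
    throw new RangeError("churnProbability must be between 0 and 1");
  }
  if (churnProbability > 0.8) return "critical";
  if (churnProbability >= 0.6) return "high";
  if (churnProbability >= 0.3) return "medium";
  return "low";
}
```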
### 2.3 Explanation Engine
- [ ] **2.3.1** SHAP value computation
- [ ] Feature contributions to each prediction
- [ ] Global feature importance (what drives churn overall)
- [ ] Local explanations (why this specific user is at risk)
- [ ] **2.3.2** Natural language explanations
```
"This user shows 78% churn risk because:
- Session frequency dropped 60% in the last week
- No core feature usage in 5 days
- Error rate increased 3x vs. their baseline
- Similar users who showed these patterns had 85% churn rate"
```
- [ ] **2.3.3** Actionable insight extraction
- [ ] Top 3 risk factors per user
- [ ] Suggested intervention based on risk profile
- [ ] Priority ranking (who to contact first)
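Given per-feature SHAP contributions for one prediction, the top risk factors and a templated explanation could be derived as sketched below. This assumes SHAP values are computed elsewhere; ranking by absolute contribution, the label lookup, and the function names are illustrative choices. The `direction` field follows `explanation.topRiskFactors` in `UserChurnPredictionDoc`.

```typescript
interface Contribution {
  feature: string;
  contribution: number; // SHAP value; positive pushes churn risk up
}

interface RiskFactor extends Contribution {
  direction: "positive" | "negative";
}

// Top-k factors by absolute contribution, without mutating the input.
function topRiskFactors(shapValues: Contribution[], k = 3): RiskFactor[] {
  return [...shapValues]
    .sort((a, b) => Math.abs(b.contribution) - Math.abs(a.contribution))
    .slice(0, k)
    .map((c): RiskFactor => ({
      ...c,
      direction: c.contribution >= 0 ? "positive" : "negative",
    }));
}

// Builds a short templated summary from the risk-increasing factors.
// `labels` maps feature names to human-readable phrases (hypothetical).
function explainPrediction(
  churnProbability: number,
  factors: RiskFactor[],
  labels: Record<string, string>,
): string {
  const percent = Math.round(churnProbability * 100);
  const lines = factors
    .filter((f) => f.direction === "positive")
    .map((f) => `- ${labels[f.feature] ?? f.feature}`);
  return [`This user shows ${percent}% churn risk because:`, ...lines].join("\n");
}
```

A template like this keeps explanations deterministic and cheap; an LLM-written summary could replace it later without changing the factor selection.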
**Phase 2 Exit Criteria:**
- [ ] Model achieves ROC-AUC > 0.75 on test set
- [ ] Real-time scoring API < 100ms latency
- [ ] Explanations generated for all predictions
- [ ] Risk segmentation validated against historical churn
---
## Phase 3: Product Health Scoring (Week 2)
### 3.1 Health Metric Framework
- [ ] **3.1.1** Create `modules/predictive-analytics/health-scoring.ts`
- [ ] Health dimensions: Acquisition, Activation, Retention, Revenue, Engagement, Stability
- [ ] Composite health score (weighted average)
- [ ] Per-dimension scores with drill-down
- [ ] **3.1.2** Product health indicators
- [ ] Daily Active Users (DAU) trend
- [ ] New user activation rate (Day-1, Day-7)
- [ ] Cohort retention curves (Day-1, Day-7, Day-30)
- [ ] Feature adoption rates (new feature uptake)
- [ ] Error rates and stability scores
- [ ] Support ticket volume and sentiment
- [ ] Revenue metrics (MRR, ARPU, LTV)
### 3.2 Health Score Computation
- [ ] **3.2.1** Baseline establishment
- [ ] Historical 90-day baseline for each metric
- [ ] Peer product comparison (ChronoMind vs. JarvisJr benchmarks)
- [ ] Industry benchmarks (if available)
- [ ] **3.2.2** Scoring algorithm
- [ ] Z-score normalization (how many std devs from baseline)
- [ ] Trend direction (improving vs. declining)
- [ ] Volatility adjustment (consistent vs. erratic)
- [ ] 0-100 health score scale
- [ ] **3.2.3** Alert thresholds
- [ ] Critical: Score < 60 or 20% drop from baseline
- [ ] Warning: Score 60-75 or 10% drop
- [ ] Healthy: Score > 75 and stable
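The scoring steps in 3.2 can be sketched as follows. The status thresholds come from the alert rules above; the z-score to 0-100 mapping (baseline scores 80, each standard deviation shifts the score by 10) is purely an assumed placeholder, since the roadmap does not fix one. All function names are hypothetical.

```typescript
type HealthStatus = "critical" | "warning" | "healthy";

// Standard deviations from the baseline mean; 0 when the baseline is flat.
function zScore(value: number, baselineMean: number, baselineStdDev: number): number {
  if (baselineStdDev === 0) return 0;
  return (value - baselineMean) / baselineStdDev;
}

// ASSUMED mapping: z = 0 scores 80, each std dev moves the score by 10,
// clamped to 0-100. For metrics where lower is better (error rate), flip the sign.
function zToScore(z: number, higherIsBetter = true): number {
  const signed = higherIsBetter ? z : -z;
  return Math.min(100, Math.max(0, 80 + 10 * signed));
}

// Weighted average of per-dimension scores; dimensions without a weight are ignored.
function compositeScore(
  dimensionScores: Record<string, number>,
  weights: Record<string, number>,
): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const [dimension, score] of Object.entries(dimensionScores)) {
    const weight = weights[dimension] ?? 0;
    weighted += weight * score;
    totalWeight += weight;
  }
  if (totalWeight === 0) throw new Error("weights must not sum to zero");
  return weighted / totalWeight;
}

// Alert thresholds: critical below 60 or a 20% drop from baseline,
// warning for 60-75 or a 10% drop, healthy above 75 and stable.
function healthStatus(score: number, dropFromBaselinePercent: number): HealthStatus {
  if (score < 60 || dropFromBaselinePercent >= 20) return "critical";
  if (score <= 75 || dropFromBaselinePercent >= 10) return "warning";
  return "healthy";
}
```

Trend direction and volatility adjustment are left out here; they would most likely enter as extra terms before clamping.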
### 3.3 Anomaly Detection
- [ ] **3.3.1** Statistical anomaly detection
- [ ] Prophet/ARIMA for time-series forecasting
- [ ] Forecast vs. actual deviation detection
- [ ] Seasonal pattern recognition (day-of-week, monthly)
- [ ] **3.3.2** Multi-dimensional anomaly detection
- [ ] Correlation breakdown detection (metrics usually correlated diverging)
- [ ] Cohort-specific anomalies (specific region, platform, segment)
- [ ] **3.3.3** Root cause suggestion
- [ ] Correlation with deployments/releases
- [ ] Error spike correlation
- [ ] External factor detection (holidays, events)
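Forecast-versus-actual deviation detection can be illustrated without a forecasting library. The sketch below deliberately swaps in a seasonal-naive forecast (the mean of the same weekday over recent weeks) in place of Prophet/ARIMA, purely to show the deviation check. The 10% and 20% severity cut-offs are assumed defaults, and the function names are hypothetical; the output shape follows the `anomalies` entries in `ProductHealthScoreDoc`.

```typescript
interface Anomaly {
  metric: string;
  expectedValue: number;
  actualValue: number;
  deviationPercent: number;
  severity: "critical" | "warning";
}

// Seasonal-naive forecast for the next day: mean of the values observed
// 1..cycles periods earlier. `history` is daily values, oldest first.
function seasonalNaiveForecast(history: number[], period = 7, cycles = 4): number | null {
  const samples: number[] = [];
  for (let k = 1; k <= cycles; k++) {
    const index = history.length - k * period;
    if (index < 0) break;
    samples.push(history[index]);
  }
  if (samples.length === 0) return null;
  return samples.reduce((acc, v) => acc + v, 0) / samples.length;
}

// Flags the new observation when it deviates from the forecast by at least
// warnPercent (warning) or criticalPercent (critical). Thresholds are assumptions.
function detectAnomaly(
  metric: string,
  history: number[],
  actualValue: number,
  warnPercent = 10,
  criticalPercent = 20,
): Anomaly | null {
  const expectedValue = seasonalNaiveForecast(history);
  if (expectedValue === null || expectedValue === 0) return null;
  const deviationPercent = ((actualValue - expectedValue) / expectedValue) * 100;
  const magnitude = Math.abs(deviationPercent);
  if (magnitude < warnPercent) return null;
  return {
    metric,
    expectedValue,
    actualValue,
    deviationPercent,
    severity: magnitude >= criticalPercent ? "critical" : "warning",
  };
}
```

Comparing against the same weekday keeps a normal weekend dip from being flagged, which is the day-of-week seasonality the checklist calls out.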
**Phase 3 Exit Criteria:**
- [ ] Health scores computed daily for all products
- [ ] Anomaly detection with < 5% false positive rate
- [ ] Historical baseline established for all metrics
---
## Phase 4: Proactive Intervention System (Week 2-3)
### 4.1 Retention Campaign Automation
- [ ] **4.1.1** Campaign trigger rules
- [ ] High-risk user enters segment → trigger email
- [ ] Medium-risk + specific behavior → trigger in-app message
- [ ] Critical risk → trigger personal outreach task
- [ ] **4.1.2** Personalized messaging
- [ ] Message variant based on risk factors
- [ ] Feature recommendations based on unused capabilities
- [ ] Success stories from similar users
- [ ] **4.1.3** Campaign effectiveness tracking
- [ ] Control group vs. treatment
- [ ] Churn rate comparison
- [ ] Revenue impact measurement
### 4.2 Auto-Trigger Flows
- [ ] **4.2.1** Platform integrations
- [ ] Email delivery via existing `modules/delivery/`
- [ ] Push notifications via `modules/notifications/`
- [ ] Slack notifications for CS team
- [ ] CRM integration (create outreach tasks)
- [ ] **4.2.2** Smart scheduling
- [ ] Optimal contact time prediction
- [ ] Frequency capping (don't spam)
- [ ] Multi-channel orchestration
- [ ] **4.2.3** Feedback loop
- [ ] Track intervention outcomes
- [ ] Retrain model with intervention effectiveness
- [ ] A/B test intervention strategies
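Frequency capping from 4.2.2 reduces to a lookback check before any send. This sketch assumes contact history is available as a list of timestamped records and that the cap is expressed in hours, as in `audience.excludeRecentContact` in `RetentionCampaignDoc`. The `ContactRecord` shape and function name are hypothetical.

```typescript
// Hypothetical record of a message previously sent to a user.
interface ContactRecord {
  userId: string;
  sentAt: string; // ISO 8601 timestamp
}

// Returns false when the user was contacted within the last
// `excludeRecentContactHours` hours; an undefined cap disables the check.
function canContact(
  userId: string,
  history: ContactRecord[],
  excludeRecentContactHours: number | undefined,
  now: Date,
): boolean {
  if (excludeRecentContactHours === undefined) return true;
  const cutoff = now.getTime() - excludeRecentContactHours * 60 * 60 * 1000;
  return !history.some(
    (record) => record.userId === userId && Date.parse(record.sentAt) > cutoff,
  );
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check deterministic under test.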
### 4.3 Risk Dashboard for CS Team
- [ ] **4.3.1** At-risk user list
- [ ] Sortable by churn probability
- [ ] Filter by product, segment, risk factors
- [ ] Last activity preview
- [ ] **4.3.2** User risk profile
- [ ] Churn probability trend over time
- [ ] Key risk factors highlighted
- [ ] Recommended actions
- [ ] User activity timeline
- [ ] **4.3.3** Intervention tracking
- [ ] Contact history
- [ ] Response tracking
- [ ] Outcome recording (retained/churned)
**Phase 4 Exit Criteria:**
- [ ] Automated campaigns triggered for high-risk users
- [ ] CS team dashboard with at-risk user queue
- [ ] Intervention effectiveness measurement in place
- [ ] Feedback loop improving model accuracy
---
## Phase 5: Admin Dashboard UI (Week 3)
### 5.1 Product Health Overview
- [ ] **5.1.1** Create `/ops/health-dashboard/page.tsx`
- [ ] Health score cards for each product
- [ ] Trend sparklines (7-day, 30-day)
- [ ] Alert summary (critical issues count)
- [ ] Product comparison table
- [ ] **5.1.2** Health detail view
- [ ] Dimension breakdown (acquisition, activation, retention, etc.)
- [ ] Metric time-series charts
- [ ] Anomaly markers on charts
- [ ] Cohort retention curves
### 5.2 Churn Prediction Dashboard
- [ ] **5.2.1** Churn risk overview
- [ ] Risk distribution pie chart
- [ ] At-risk user count by product
- [ ] Predicted churn impact (revenue at risk)
- [ ] Model performance metrics (AUC, calibration)
- [ ] **5.2.2** User risk explorer
- [ ] Search/filter at-risk users
- [ ] Risk score with explanation
- [ ] Top risk factors
- [ ] Recommended interventions
- [ ] **5.2.3** Model insights
- [ ] Global feature importance chart
- [ ] Model performance over time
- [ ] Feature drift alerts
### 5.3 Campaign Management
- [ ] **5.3.1** Campaign list
- [ ] Active/paused/completed campaigns
- [ ] Trigger rules summary
- [ ] Performance stats (sent, opened, converted)
- [ ] **5.3.2** Campaign editor
- [ ] Trigger condition builder
- [ ] Message template editor
- [ ] Audience targeting
- [ ] A/B test configuration
- [ ] **5.3.3** Campaign analytics
- [ ] Funnel: triggered → sent → opened → retained
- [ ] Revenue impact
- [ ] Comparison to control group
**Phase 5 Exit Criteria:**
- [ ] Health dashboard shows all products with trends
- [ ] Churn predictions visible with explanations
- [ ] Campaign creation and management functional
- [ ] Full test coverage
---
## Phase 6: Advanced Capabilities (Future)
### 6.1 Cohort-Specific Models
- [ ] Segment-specific churn models (iOS vs. Android, free vs. pro)
- [ ] Regional models (different behaviors by geography)
- [ ] Temporal models (seasonal churn patterns)
### 6.2 LTV Prediction
- [ ] Predict lifetime value at signup
- [ ] Predict upgrade probability (free → pro)
- [ ] Optimize acquisition channels by predicted LTV
### 6.3 Product Recommendations
- [ ] Suggest features to at-risk users based on successful cohorts
- [ ] Personalized onboarding based on predicted needs
- [ ] Next-best-action recommendations
---
## Appendix A: Data Models
### UserChurnPredictionDoc
```typescript
interface UserChurnPredictionDoc {
  id: string; // cp_<uuid>
  userId: string; // partition key
  productId: string;
  // Prediction
  predictionHorizon: 7 | 14 | 30; // Days
  churnProbability: number; // 0-1
  riskSegment: 'critical' | 'high' | 'medium' | 'low';
  // Feature vector snapshot
  features: Record<string, number>; // Normalized feature values
  featureVersion: string; // Schema version
  // Model info
  modelVersion: string;
  modelType: 'xgboost' | 'neural';
  predictionTimestamp: string;
  // Explanation (SHAP values)
  explanation: {
    topRiskFactors: Array<{
      feature: string;
      contribution: number; // SHAP value
      direction: 'positive' | 'negative'; // Increases or decreases churn risk
    }>;
    globalFeatureImportance: Array<{
      feature: string;
      importance: number;
    }>;
  };
  // Natural language summary
  nlExplanation: string; // Auto-generated explanation
  // Intervention
  suggestedActions: string[];
  interventionHistory: Array<{
    action: string;
    timestamp: string;
    outcome?: 'responded' | 'ignored' | 'churned' | 'retained';
  }>;
  // Validation (ground truth)
  actualChurned?: boolean;
  validationDate?: string;
  createdAt: string;
  ttl: number; // predictionHorizon + 90 days
}
```
### ProductHealthScoreDoc
```typescript
interface ProductHealthScoreDoc {
  id: string; // ph_<uuid>
  productId: string; // partition key
  date: string; // Sort key (YYYY-MM-DD)
  // Composite score
  overallHealthScore: number; // 0-100
  healthStatus: 'critical' | 'warning' | 'healthy';
  // Dimension scores
  dimensions: {
    acquisition: {
      score: number; // 0-100
      metrics: {
        newUsers: number;
        activationRateDay1: number;
        activationRateDay7: number;
        cac: number;
      };
      trend: 'improving' | 'stable' | 'declining';
    };
    activation: {
      score: number;
      metrics: {
        firstValueMomentRate: number;
        timeToFirstAction: number;
        onboardingCompletionRate: number;
      };
      trend: 'improving' | 'stable' | 'declining';
    };
    retention: {
      score: number;
      metrics: {
        dau: number;
        mau: number;
        dauMauRatio: number;
        day7Retention: number;
        day30Retention: number;
      };
      trend: 'improving' | 'stable' | 'declining';
    };
    engagement: {
      score: number;
      metrics: {
        avgSessionLength: number;
        sessionsPerUser: number;
        featureAdoption: Record<string, number>;
      };
      trend: 'improving' | 'stable' | 'declining';
    };
    revenue: {
      score: number;
      metrics: {
        mrr: number;
        arpu: number;
        churnRate: number;
        upgradeRate: number;
      };
      trend: 'improving' | 'stable' | 'declining';
    };
    stability: {
      score: number;
      metrics: {
        crashFreeRate: number;
        errorRate: number;
        avgLatency: number;
        uptimePercent: number;
      };
      trend: 'improving' | 'stable' | 'declining';
    };
  };
  // Anomalies detected
  anomalies: Array<{
    metric: string;
    expectedValue: number;
    actualValue: number;
    deviationPercent: number;
    severity: 'critical' | 'warning';
    suggestedCause?: string;
  }>;
  // Forecasts
  forecasts: {
    next7Days: {
      expectedHealthScore: number;
      confidenceInterval: [number, number];
    };
    next30Days: {
      expectedHealthScore: number;
      confidenceInterval: [number, number];
    };
  };
  // Benchmarks
  vsBaseline7Day: number; // % change vs. 7-day baseline
  vsBaseline30Day: number; // % change vs. 30-day baseline
  createdAt: string;
  ttl: number; // 2 years
}
```
### RetentionCampaignDoc
```typescript
interface RetentionCampaignDoc {
  id: string; // rc_<uuid>
  productId: string; // partition key
  // Campaign definition
  name: string;
  description: string;
  status: 'draft' | 'active' | 'paused' | 'completed';
  // Trigger conditions
  trigger: {
    type: 'churn_risk' | 'health_score_drop' | 'behavioral' | 'scheduled';
    conditions: Array<{
      field: string;
      operator: 'gt' | 'lt' | 'eq' | 'in';
      value: unknown;
    }>;
  };
  // Audience
  audience: {
    riskSegments?: string[]; // 'critical', 'high', etc.
    products?: string[];
    userSegments?: string[];
    excludeRecentContact?: number; // Hours (frequency capping)
  };
  // Message content
  messages: Array<{
    channel: 'email' | 'push' | 'in_app' | 'slack_cs';
    templateId: string;
    variant?: string; // A/B test variant
    delayHours?: number; // Delay after trigger
    conditions?: Array<{
      field: string;
      operator: string;
      value: unknown;
    }>;
  }>;
  // Performance tracking
  stats: {
    triggered: number;
    sent: number;
    opened: number;
    clicked: number;
    converted: number;
    controlGroupSize: number;
    controlChurnRate: number;
    treatmentChurnRate: number;
  };
  createdAt: string;
  updatedAt: string;
  ttl: number; // 1 year after completion
}
```
### UserFeatureVectorDoc
```typescript
interface UserFeatureVectorDoc {
  id: string; // fv_<uuid>
  userId: string; // partition key
  productId: string;
  // Computed features
  features: {
    // Recency features
    daysSinceLastSession: number;
    daysSinceLastCoreAction: number;
    // Frequency features
    sessionsLast7Days: number;
    sessionsLast30Days: number;
    avgSessionsPerWeek: number;
    // Engagement depth
    avgSessionDuration: number;
    actionsPerSession: number;
    uniqueFeaturesUsed: number;
    // Product-specific (examples)
    // NomGap
    fastCompletionRate?: number;
    streakLength?: number;
    // JarvisJr
    agentDiversityScore?: number;
    voiceSessionRatio?: number;
    // ChronoMind
    timerCompletionRate?: number;
    routineAdherenceScore?: number;
    // Error/stability
    errorRateLast7Days: number;
    crashCountLast30Days: number;
    // Revenue
    planTier: number; // 0=free, 1=pro, 2=enterprise
    lifetimeValue: number;
    daysSinceLastPayment?: number;
  };
  // Normalized (0-1) for model input
  normalizedFeatures: Record<string, number>;
  // Metadata
  featureSchemaVersion: string;
  computedAt: string;
  // Time windows
  observationWindow: {
    start: string;
    end: string;
  };
  ttl: number; // 90 days
}
```
---
## Implementation Tracking
| Phase | Task | Status | Commit |
| ----- | ----------------------------- | ------ | ------ |
| 1.1 | Telemetry feature extraction | ⬜ | — |
| 1.1 | Time-window aggregations | ⬜ | — |
| 1.1 | Rolling window features | ⬜ | — |
| 1.2 | Feature store | ⬜ | — |
| 1.2 | Cosmos containers | ⬜ | — |
| 1.2 | Feature computation jobs | ⬜ | — |
| 1.3 | Product-specific features | ⬜ | — |
| 1.3 | Feature importance tracking | ⬜ | — |
| 2.1 | XGBoost model architecture | ⬜ | — |
| 2.1 | Training pipeline | ⬜ | — |
| 2.1 | Model evaluation | ⬜ | — |
| 2.2 | Real-time scoring API | ⬜ | — |
| 2.2 | Risk segmentation | ⬜ | — |
| 2.2 | Model versioning | ⬜ | — |
| 2.3 | SHAP explanations | ⬜ | — |
| 2.3 | Natural language explanations | ⬜ | — |
| 2.3 | Actionable insights | ⬜ | — |
| 3.1 | Health metric framework | ⬜ | — |
| 3.1 | Health indicators | ⬜ | — |
| 3.2 | Baseline establishment | ⬜ | — |
| 3.2 | Scoring algorithm | ⬜ | — |
| 3.2 | Alert thresholds | ⬜ | — |
| 3.3 | Anomaly detection | ⬜ | — |
| 4.1 | Campaign trigger rules | ⬜ | — |
| 4.1 | Personalized messaging | ⬜ | — |
| 4.2 | Platform integrations | ⬜ | — |
| 4.3 | CS team dashboard | ⬜ | — |
| 5.1 | Health overview UI | ⬜ | — |
| 5.2 | Churn prediction dashboard | ⬜ | — |
| 5.3 | Campaign management | ⬜ | — |
**Legend:** ⬜ Not started | 🟡 In progress | ✅ Complete | ⏸️ Deferred
---
## Quick Reference for Implementing Agent
**📋 Full Roadmap:** `/Users/sd9235/code/mygh/learning_ai_common_plat/docs/roadmaps/PREDICTIVE_CHURN_HEALTH_SCORING_ROADMAP.md`
**Key Files to Modify/Create:**
```
services/platform-service/
├── src/
│   ├── modules/predictive-analytics/
│   │   ├── types.ts                      # [1.2] Feature, HealthScore, Prediction types
│   │   ├── repository.ts                 # Data access layer
│   │   ├── feature-extractor.ts          # [1.1] Telemetry → features
│   │   ├── feature-store.ts              # [1.2] Feature vector storage
│   │   ├── churn-model.ts                # [2.1] XGBoost training & inference
│   │   ├── scoring-api.ts                # [2.2] Real-time prediction endpoint
│   │   ├── explanation-engine.ts         # [2.3] SHAP + NL explanations
│   │   ├── health-scoring.ts             # [3] Health dimension calculation
│   │   ├── anomaly-detection.ts          # [3.3] Prophet/ARIMA forecasting
│   │   ├── campaign-engine.ts            # [4] Retention automation
│   │   ├── routes.ts                     # [5] REST API
│   │   └── predictive-analytics.test.ts  # Tests
│   ├── lib/
│   │   └── cosmos-init.ts                # [1.2] Add containers
│   └── server.ts                         # Register routes
dashboards/admin-web/
├── src/
│   ├── app/(dashboard)/
│   │   ├── health-dashboard/
│   │   │   └── page.tsx                  # [5.1] Product health overview
│   │   └── predictive/
│   │       ├── at-risk/
│   │       │   └── page.tsx              # [4.3] At-risk user list
│   │       └── campaigns/
│   │           └── page.tsx              # [5.3] Campaign management
│   ├── lib/
│   │   └── predictive-client.ts          # API client
│   └── components/
│       └── predictive/                   # Risk cards, health charts
```
**Commit Message Format:**
```
feat(predictive-analytics): <description> [<task.code>]
```
**Example:**
```bash
git add services/platform-service/src/modules/predictive-analytics/
git commit -m "feat(predictive-analytics): add feature extraction and store [1.1-1.2]"
```
**Testing Requirements:**
- Unit tests: 20+ Vitest tests for feature extraction, model inference
- Model validation: AUC, calibration, precision@k metrics
- Integration: End-to-end prediction pipeline
**Dependencies:**
- Telemetry module (feature extraction)
- Delivery module (retention campaigns)
- Azure ML or scikit-learn (model training)
---
## Appendix B: API Reference
| Method | Endpoint | Auth | Description |
| ------ | -------------------------------------- | ------------- | ----------------------------- |
| GET | `/predictive/health` | Admin | Get all product health scores |
| GET | `/predictive/health/:productId` | Admin | Get product health detail |
| GET | `/predictive/health/:productId/trends` | Admin | Historical health trends |
| POST | `/predictive/churn-score` | Admin/Service | Get churn prediction for user |
| POST | `/predictive/churn-batch` | Admin | Batch churn scoring |
| GET | `/predictive/at-risk-users` | Admin/CS | List users by risk segment |
| GET | `/predictive/users/:id/risk-profile` | Admin/CS | User churn risk details |
| GET | `/predictive/model/performance` | Admin | Model accuracy metrics |
| GET | `/predictive/model/features` | Admin | Feature importance ranking |
| GET | `/predictive/campaigns` | Admin | List retention campaigns |
| POST | `/predictive/campaigns` | Admin | Create campaign |
| PATCH | `/predictive/campaigns/:id` | Admin | Update campaign |
| GET | `/predictive/campaigns/:id/stats` | Admin | Campaign performance |
| POST | `/predictive/campaigns/:id/trigger` | Admin | Manual trigger for testing |
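
As an illustration of how `predictive-client.ts` might call one of these endpoints, the sketch below builds a request for `POST /predictive/churn-score`. The roadmap does not define request schemas or the auth scheme, so the `userId` and `productId` fields, the bearer token header, and the `buildChurnScoreRequest` name are all assumptions.

```typescript
// Hypothetical request body; the roadmap does not specify the schema.
interface ChurnScoreRequest {
  userId: string;
  productId?: string;
}

// Plain description of an HTTP call, kept separate from the transport for testability.
interface HttpRequestSpec {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildChurnScoreRequest(
  baseUrl: string,
  token: string,
  payload: ChurnScoreRequest,
): HttpRequestSpec {
  return {
    // Trim trailing slashes so the path is joined with exactly one "/".
    url: `${baseUrl.replace(/\/+$/, "")}/predictive/churn-score`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`, // assumed auth scheme
    },
    body: JSON.stringify(payload),
  };
}
```

The resulting spec can be passed to `fetch(spec.url, { method: spec.method, headers: spec.headers, body: spec.body })`. Separating request construction from the network call keeps the client easy to unit test.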
---
## Appendix C: Integration Points
### With Telemetry Module
- Raw events feed feature extraction
- Error rates flow into health scores
- Correlation IDs link behaviors to predictions
### With Diagnostics Module
- Debug sessions enrich feature vectors
- Error clusters correlate with churn risk
- Screenshot patterns analyzed for UX issues
### With Event Bus
| Event | Action |
| ----------------------------------- | ---------------------------------------- |
| `predictive.churn.risk_detected` | Trigger retention campaign |
| `predictive.health.critical` | Alert leadership, suggest debug sessions |
| `predictive.anomaly.detected` | Create incident, notify on-call |
| `user.retention.campaign_responded` | Update model with outcome |
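
The event-to-action mapping in this table could be modeled as a discriminated union with an exhaustive switch. This is a sketch under assumptions: the event names come from the table, but the payload fields, the `planActions` function, and the action strings are hypothetical placeholders for whatever the event bus and downstream modules actually expose.

```typescript
// Event names are from the table above; payload fields are hypothetical.
type PredictiveEvent =
  | { type: "predictive.churn.risk_detected"; userId: string; riskScore: number }
  | { type: "predictive.health.critical"; productId: string }
  | { type: "predictive.anomaly.detected"; productId: string; metric: string }
  | { type: "user.retention.campaign_responded"; userId: string; campaignId: string };

// Returns the actions to take for an event as plain strings (placeholder for real handlers).
function planActions(event: PredictiveEvent): string[] {
  switch (event.type) {
    case "predictive.churn.risk_detected":
      return [`trigger-retention-campaign:${event.userId}`];
    case "predictive.health.critical":
      return [
        `alert-leadership:${event.productId}`,
        `suggest-debug-sessions:${event.productId}`,
      ];
    case "predictive.anomaly.detected":
      return [`create-incident:${event.productId}:${event.metric}`, "notify-on-call"];
    case "user.retention.campaign_responded":
      return [`record-outcome:${event.campaignId}:${event.userId}`];
    default: {
      // Compile-time exhaustiveness check: adding a new event type fails to type-check here.
      const unreachable: never = event;
      throw new Error(`Unhandled event: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

The `never` check in the default branch means a new row in the table cannot be added to the union without also deciding what action it triggers.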
### With Delivery Module
- Retention campaigns use email templates
- Push notifications for urgent interventions
- A/B test message variants
---
## Appendix D: Cost Estimation
| Component | Monthly Cost (est.) |
| ------------------------------------ | -------------------------- |
| Cosmos DB (features + predictions)   | $150–300                   |
| Model training (Azure ML)            | $100–200                   |
| Inference compute                    | $50–100                    |
| Email delivery (retention campaigns) | $50–200 (volume-dependent) |
| **Total**                            | **$350–800/month**         |
ROI: If the system prevents 5% of predicted churn among 10K at-risk users/month at $50 LTV:
- 500 users retained × $50 = $25K/month value
- 10:1+ ROI
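
The ROI arithmetic can be made explicit with a short calculation. This sketch assumes the monthly total in the cost table is a range of $350 to $800, and that ROI here means retained value divided by monthly cost. The variable names are illustrative only.

```typescript
// Inputs taken from the ROI scenario above.
const atRiskUsersPerMonth = 10_000;
const preventedChurnPercent = 5;
const lifetimeValueUsd = 50;

// Assumed reading of the cost table total: $350 to $800 per month.
const monthlyCostUsd = { low: 350, high: 800 };

const retainedUsers = (atRiskUsersPerMonth * preventedChurnPercent) / 100; // 500
const retainedValueUsd = retainedUsers * lifetimeValueUsd; // 25,000

// Value-to-cost ratio at both ends of the cost range.
const roiAtHighCost = retainedValueUsd / monthlyCostUsd.high; // 31.25
const roiAtLowCost = retainedValueUsd / monthlyCostUsd.low; // about 71.4

console.log({ retainedUsers, retainedValueUsd, roiAtHighCost, roiAtLowCost });
```

Under these assumptions the ratio stays above 10:1 even at the top of the cost range. It drops below 10:1 only if retained value falls under $8K/month, for example if fewer than 160 users are retained at $50 LTV.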
---
## Appendix E: Success Metrics
### Model Performance
- [ ] AUC > 0.75 (discrimination)
- [ ] Calibration slope 0.9–1.1 (well-calibrated probabilities)
- [ ] Precision@10% > 60% (high-risk predictions are accurate)
### Business Impact
- [ ] 10%+ reduction in churn rate for targeted cohorts
- [ ] 5%+ increase in re-engagement campaign response
- [ ] CS team satisfaction with at-risk user visibility
### Operational
- [ ] Prediction latency < 100ms
- [ ] Feature freshness < 24 hours
- [ ] Model retraining automation
---
## Current Status
- [ ] **Design complete** — Target: 2026-03-10
- [ ] **Phase 1: Feature Pipeline** — Not started
- [ ] **Phase 2: Churn Model** — Not started
- [ ] **Phase 3: Health Scoring** — Not started
- [ ] **Phase 4: Interventions** — Not started
- [ ] **Phase 5: Admin UI** — Not started
- [ ] **Phase 6: Advanced** — Future
**Estimated Timeline:** 3 weeks (Phases 1–5)
**Dependencies:**
- Telemetry module (for feature extraction)
- Azure ML or similar (for model training)
- Delivery module (for retention campaigns)
---
_Last Updated: 2026-03-03_