docs(roadmaps): add AI diagnostic assistant, A/B testing, and churn prediction roadmaps

- AI Diagnostic Assistant: LLM-powered root cause analysis, error clustering, natural language queries
- Intelligent A/B Testing: Thompson sampling, Bayesian early stopping, AI hypothesis generation
- Predictive Churn & Health: XGBoost models, health scoring, automated retention campaigns

All roadmaps include:
- Implementation tracking tables with status/commit columns
- Quick reference sections with file structures
- Phase-by-phase task breakdowns with [X.Y.Z] codes

Parent: d510867b87 · Commit: e98380003b

docs/roadmaps/AI_DIAGNOSTIC_ASSISTANT_ROADMAP.md (new file, 597 lines)

# AI Diagnostic Assistant — Implementation Roadmap

> **Module:** `platform-service/src/modules/ai-diagnostics/`
> **Admin UI:** `/ops/ai-diagnostics/`
> **Target:** LLM-powered root cause analysis from telemetry + debug sessions
> **Estimated Effort:** 2–3 weeks
> **Status:** 🟡 Planning

---

## Executive Summary

This roadmap delivers an **AI-powered diagnostic assistant** that analyzes error patterns, debug session data, and telemetry to suggest likely root causes automatically, much like having a senior engineer on call 24/7. Engineers can ask natural language questions such as _"Why did the iOS keyboard crash yesterday?"_ and receive AI-generated hypotheses with supporting evidence.

### Key Differentiators vs. Manual Debugging

| Feature           | Manual Debugging            | AI Diagnostic Assistant             |
| ----------------- | --------------------------- | ----------------------------------- |
| Query             | SQL + log grep              | **Natural language**                |
| Pattern Detection | Hours of manual correlation | **AI finds hidden patterns**        |
| Context Assembly  | Check 5+ systems manually   | **Auto-assembles timeline**         |
| Hypothesis        | Engineer intuition          | **LLM-generated + evidence**        |
| Learning          | Per-engineer experience     | **Accumulates across all sessions** |

---

## Phase 1: Data Pipeline & Embeddings (Week 1)

**Goal:** Extract, normalize, and embed error data for semantic search and clustering.

### 1.1 Error Fingerprinting & Clustering

- [ ] **1.1.1** Create `modules/ai-diagnostics/types.ts`
  - [ ] `ErrorClusterDoc` — grouped similar errors with signature
  - [ ] `ErrorFingerprint` — normalized stack trace hash
  - [ ] `ClusterAnalysis` — AI-generated pattern description
  - [ ] Zod schemas for all inputs

_Commit format:_ `git commit -m "feat(ai-diagnostics): add error clustering types [1.1.1]"` → `https://github.com/saravanakumardb1/learning_ai_common_plat/commit/<hash>`

- [ ] **1.1.2** Add Cosmos containers to `cosmos-init.ts`
  - [ ] `error_clusters` (pk: `/productId`, TTL: 90 days)
  - [ ] `error_fingerprints` (pk: `/fingerprintHash`, unique index)
  - [ ] `diagnostic_insights` (pk: `/clusterId`, AI-generated analyses)

_Commit format:_ `git commit -m "feat(ai-diagnostics): add cosmos containers for error clustering [1.1.2]"`

- [ ] **1.1.3** Implement error normalization
  - [ ] Stack trace parsing (remove line numbers, file paths)
  - [ ] Message templating (replace UUIDs, timestamps, user IDs with placeholders)
  - [ ] Fingerprint generation (SHA-256 of normalized error)
  - [ ] Similarity scoring (Levenshtein for near-matches)

_Commit format:_ `git commit -m "feat(ai-diagnostics): implement error normalization and fingerprinting [1.1.3]"`
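
The normalization steps in 1.1.3 could look roughly like the sketch below. It is illustrative only and assumes Node.js (`node:crypto`) and V8-style stack frames (`at fn (path:line:col)`); the placeholder tokens (`<uuid>`, `<ts>`, `<user>`, `<num>`), the `usr_` user-ID format, and the function names are hypothetical rather than the module's planned API.

```typescript
import { createHash } from "node:crypto";

// Hypothetical placeholder rules, applied in order. UUIDs and timestamps run
// before the generic number rule so their digits are not replaced piecemeal.
const MESSAGE_RULES: Array<[RegExp, string]> = [
  [/\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/gi, "<uuid>"],
  [/\b\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?/g, "<ts>"],
  [/\busr_[A-Za-z0-9]+\b/g, "<user>"], // assumed user ID format: usr_<id>
  [/\b\d+\b/g, "<num>"],
];

// Replace volatile values in an error message with stable placeholders.
export function templateMessage(message: string): string {
  return MESSAGE_RULES.reduce((acc, [pattern, token]) => acc.replace(pattern, token), message).trim();
}

// Keep only the function name of each "at fn (path:line:col)" frame, dropping
// file paths and line/column numbers so different builds share a signature.
export function normalizeStack(stack: string): string {
  return stack
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("at "))
    .map((line) => {
      const match = /^at (?:async )?(?:new )?([^\s(]+)\s*\(/.exec(line);
      return match ? match[1] : "<anonymous>";
    })
    .join("\n");
}

// SHA-256 over error type + templated message + normalized stack.
export function fingerprint(errorType: string, message: string, stack: string): string {
  const normalized = [errorType, templateMessage(message), normalizeStack(stack)].join("\n");
  return createHash("sha256").update(normalized).digest("hex");
}

// Two-row dynamic-programming Levenshtein distance.
export function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1] for near-match scoring; 1 means identical strings.
export function similarity(a: string, b: string): number {
  const maxLen = Math.max(a.length, b.length);
  return maxLen === 0 ? 1 : 1 - levenshtein(a, b) / maxLen;
}
```

Normalizing the stack before hashing is what lets the same crash from two builds share one fingerprint; `similarity` is then only needed for near-duplicates whose templates differ slightly.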

### 1.2 Vector Embeddings for Semantic Search

- [ ] **1.2.1** Create embedding pipeline
  - [ ] Azure OpenAI `text-embedding-3-small` integration
  - [ ] Error message + stack trace → 1536-dim vector
  - [ ] Batch embedding job (100 errors at a time)
- [ ] **1.2.2** Cosmos DB vector search setup
  - [ ] Store embeddings in `error_clusters` documents
  - [ ] Cosine similarity query function
  - [ ] Similar error lookup by vector distance
- [ ] **1.2.3** Clustering algorithm
  - [ ] HDBSCAN for density-based clustering
  - [ ] DBSCAN fallback for smaller datasets
  - [ ] Auto-determine cluster count (no manual k)
  - [ ] Re-cluster nightly as new errors arrive
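
For the cosine similarity lookup (1.2.2) and the DBSCAN fallback (1.2.3), a minimal in-memory sketch follows. It assumes embeddings are plain `number[]` vectors of equal length. In production the nearest-neighbour query would run in Cosmos DB vector search rather than in application code, HDBSCAN would come from a dedicated library, and the `eps`/`minPts` values are illustrative, not tuned.

```typescript
// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
export function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

export interface Candidate {
  id: string;
  embedding: number[];
}

// Brute-force nearest neighbours, most similar first.
export function topKSimilar(query: number[], candidates: Candidate[], k: number): Array<{ id: string; score: number }> {
  return candidates
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// DBSCAN with cosine distance (1 - similarity). Returns one label per point:
// a cluster index >= 0, or -1 for noise. The cluster count is not fixed up front.
export function dbscan(points: number[][], eps: number, minPts: number): number[] {
  const NOISE = -1;
  const UNVISITED = -2;
  const labels = new Array<number>(points.length).fill(UNVISITED);
  // O(n^2) neighbourhood query; acceptable for a small fallback dataset.
  const neighbours = (i: number): number[] =>
    points.flatMap((p, j) => (1 - cosineSimilarity(points[i], p) <= eps ? [j] : []));

  let cluster = 0;
  for (let i = 0; i < points.length; i++) {
    if (labels[i] !== UNVISITED) continue;
    const seeds = neighbours(i);
    if (seeds.length < minPts) {
      labels[i] = NOISE;
      continue;
    }
    labels[i] = cluster;
    const queue = [...seeds];
    while (queue.length > 0) {
      const j = queue.shift()!;
      if (labels[j] === NOISE) labels[j] = cluster; // border point: join, do not expand
      if (labels[j] !== UNVISITED) continue;
      labels[j] = cluster;
      const more = neighbours(j);
      if (more.length >= minPts) queue.push(...more); // core point: keep expanding
    }
    cluster++;
  }
  return labels;
}
```

With `eps = 0.05` and `minPts = 2`, two tight groups of 2-D vectors come back as clusters `0` and `1`, and an isolated outlier is labelled `-1`.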

### 1.3 Telemetry Ingestion for Context

- [ ] **1.3.1** Link telemetry to errors
  - [ ] `correlationId` propagation across services
  - [ ] 5-minute window: error → preceding telemetry events
  - [ ] Session state reconstruction (what user was doing)
- [ ] **1.3.2** Enrich error context
  - [ ] Device info (OS version, model, memory)
  - [ ] App state (screen, feature flags, config)
  - [ ] Recent API calls (network trace from diagnostics)
  - [ ] Recent user actions (breadcrumb trail)
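
The 5-minute lookback in 1.3.1 reduces to a filter on `correlationId` plus a time window. A minimal sketch, assuming each telemetry event carries an ISO 8601 `timestamp` and a `correlationId`; the `TelemetryEvent` shape and the `precedingEvents` name are hypothetical.

```typescript
export interface TelemetryEvent {
  correlationId: string;
  timestamp: string; // ISO 8601
  name: string; // e.g. "screen_view", "api_call"
}

const WINDOW_MS = 5 * 60 * 1000; // 5-minute lookback window

// Events that share the error's correlationId and occurred within the window
// ending at the error (inclusive), oldest first, forming a breadcrumb trail.
export function precedingEvents(
  error: { correlationId: string; timestamp: string },
  events: TelemetryEvent[],
  windowMs: number = WINDOW_MS,
): TelemetryEvent[] {
  const errorTime = Date.parse(error.timestamp);
  return events
    .filter((e) => e.correlationId === error.correlationId)
    .filter((e) => {
      const t = Date.parse(e.timestamp);
      return t <= errorTime && t >= errorTime - windowMs;
    })
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
}
```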

**Phase 1 Exit Criteria:**

- [ ] Errors auto-clustered with 90%+ accuracy
- [ ] Vector search returns semantically similar errors
- [ ] 10,000+ historical errors embedded and clustered
- [ ] Correlation pipeline links errors to telemetry context

---

## Phase 2: LLM Analysis Engine (Week 1–2)

### 2.1 Prompt Engineering & Analysis Pipeline

- [ ] **2.1.1** Create analysis prompts
  - [ ] `ROOT_CAUSE_ANALYSIS` prompt template

    ```
    Given this error cluster:
    - Error signature: {fingerprint}
    - Sample stack traces: {samples}
    - Common context: {deviceStats}, {appState}
    - Preceding events: {breadcrumbSummary}
    - Similar resolved issues: {relatedClusters}

    Analyze and provide:
    1. Likely root cause category (config, dependency, logic, resource, external)
    2. Specific hypothesis with reasoning
    3. Evidence confidence (high/medium/low)
    4. Suggested investigation steps
    5. Potential fix direction
    ```

  - [ ] `PATTERN_SUMMARY` prompt for cluster descriptions
  - [ ] `COMPARATIVE_ANALYSIS` for error vs. baseline

- [ ] **2.1.2** LLM integration
  - [ ] Azure OpenAI GPT-4o-mini for analysis (cost-effective)
  - [ ] GPT-4o for complex multi-factor analysis
  - [ ] Response JSON schema enforcement
  - [ ] Retry logic with exponential backoff
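
For the retry item in 2.1.2, one common approach is capped exponential backoff with "full jitter". The sketch below is generic and assumes nothing about the Azure OpenAI SDK: the wrapped `fn` stands in for whatever client call the module ends up using, and the option values are placeholders.

```typescript
export interface RetryOptions {
  maxAttempts: number; // total attempts, including the first call
  baseMs: number; // delay scale for the first retry
  maxMs: number; // upper bound on any single delay
  random?: () => number; // injectable for deterministic tests
  sleep?: (ms: number) => Promise<void>; // injectable to avoid real waits in tests
}

// Full jitter: a delay drawn uniformly from [0, min(maxMs, baseMs * 2^attempt)).
export function backoffDelayMs(attempt: number, opts: RetryOptions): number {
  const cap = Math.min(opts.maxMs, opts.baseMs * 2 ** attempt);
  return Math.floor((opts.random ?? Math.random)() * cap);
}

const defaultSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry `fn` until it succeeds or attempts run out, then rethrow the last error.
export async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < opts.maxAttempts - 1) {
        await (opts.sleep ?? defaultSleep)(backoffDelayMs(attempt, opts));
      }
    }
  }
  throw lastError;
}
```

In practice you would retry only transient failures (for example HTTP 429 or 5xx responses) and honour a `Retry-After` header if the service sends one; that filtering is omitted to keep the sketch short.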

### 2.2 Insight Generation Service

- [ ] **2.2.1** Create `modules/ai-diagnostics/analyzer.ts`
  - [ ] `analyzeCluster(clusterId)` — full analysis workflow
  - [ ] `generateInsight(errorContext)` — single error analysis
  - [ ] `compareClusters(clusterA, clusterB)` — diff analysis
- [ ] **2.2.2** Analysis workflow
  - [ ] Fetch cluster data + related telemetry
  - [ ] Build LLM context (respect token limits)
  - [ ] Call LLM with structured prompt
  - [ ] Parse and validate response
  - [ ] Store insight in `diagnostic_insights`
- [ ] **2.2.3** Confidence scoring
  - [ ] Evidence count weighting
  - [ ] Similar resolved issue bonus
  - [ ] Recency decay (older patterns = lower confidence)
  - [ ] Multi-model consensus (if available)
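
One way to combine the 2.2.3 signals into the `confidenceScore` (0.0–1.0) stored on `DiagnosticInsightDoc` is a weighted sum with exponential recency decay, and the "respect token limits" step of 2.2.2 can start as a simple budget check. Everything here is an assumption for illustration: the weights, the 30-day half-life, the saturation constants, the label thresholds, the 4-characters-per-token heuristic, and the `ConfidenceInputs` shape are hypothetical and would need calibrating against the feedback data from 2.3.

```typescript
export interface ConfidenceInputs {
  evidenceCount: number; // supporting evidence items in the LLM response
  similarResolvedCount: number; // resolved clusters with a similar signature
  daysSinceLastSeen: number; // age of the pattern, in days
  modelAgreement?: number; // 0-1 share of models agreeing, if several were run
}

// Hypothetical weights; they sum to 1 so the raw score stays in [0, 1].
const WEIGHTS = { evidence: 0.5, resolved: 0.3, consensus: 0.2 };
const HALF_LIFE_DAYS = 30;

// Diminishing returns: saturate(n, h) is 0 at n = 0 and 0.5 at n = h.
const saturate = (n: number, halfPoint: number): number => n / (n + halfPoint);

export function confidenceScore(x: ConfidenceInputs): number {
  const evidence = saturate(x.evidenceCount, 3);
  const resolved = saturate(x.similarResolvedCount, 1);
  const consensus = x.modelAgreement ?? 0.5; // neutral when only one model ran
  const raw = WEIGHTS.evidence * evidence + WEIGHTS.resolved * resolved + WEIGHTS.consensus * consensus;
  const decay = 0.5 ** (x.daysSinceLastSeen / HALF_LIFE_DAYS); // halves every 30 days
  return Math.min(1, Math.max(0, raw * decay));
}

// Assumed thresholds for the high/medium/low label stored alongside the score.
export function confidenceLabel(score: number): "high" | "medium" | "low" {
  return score >= 0.7 ? "high" : score >= 0.4 ? "medium" : "low";
}

// Keep the most relevant samples (input sorted best-first) within a token budget.
// Uses the rough ~4 characters/token heuristic, not a real tokenizer.
export function fitSamplesToBudget(samples: string[], maxTokens: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const sample of samples) {
    const cost = Math.ceil(sample.length / 4);
    if (used + cost > maxTokens) break;
    kept.push(sample);
    used += cost;
  }
  return kept;
}
```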

### 2.3 Continuous Learning

- [ ] **2.3.1** Feedback loop
  - [ ] Engineer feedback: "Was this insight helpful? 👍/👎"
  - [ ] Resolution tracking (link commits to clusters)
  - [ ] Confidence recalibration based on outcomes
- [ ] **2.3.2** Pattern accumulation
  - [ ] "Known issues" database (manually curated)
  - [ ] Historical fix patterns (what solved similar issues)
  - [ ] Regression detection (old issue reappearing)
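
Confidence recalibration (2.3.1) starts with measuring calibration: bucket insights by predicted `confidenceScore` and compare each bucket's mean prediction with the observed rate of positive outcomes (for example a 👍 rating or a linked fix). This sketch assumes one boolean `outcome` per insight; the bucket count, the `Rated` shape, and the function names are hypothetical.

```typescript
export interface Rated {
  confidenceScore: number; // predicted, 0-1
  outcome: boolean; // true if the insight was rated helpful or led to a fix
}

export interface CalibrationBucket {
  lower: number;
  upper: number;
  count: number;
  meanPredicted: number;
  observedRate: number;
}

// Split [0, 1] into equal-width buckets; empty buckets are dropped.
export function calibrationTable(items: Rated[], buckets = 5): CalibrationBucket[] {
  const sums = Array.from({ length: buckets }, () => ({ count: 0, predicted: 0, positive: 0 }));
  for (const item of items) {
    // A score of exactly 1.0 falls into the top bucket.
    const idx = Math.min(buckets - 1, Math.floor(item.confidenceScore * buckets));
    sums[idx].count++;
    sums[idx].predicted += item.confidenceScore;
    if (item.outcome) sums[idx].positive++;
  }
  return sums
    .map((s, i) => ({
      lower: i / buckets,
      upper: (i + 1) / buckets,
      count: s.count,
      meanPredicted: s.count ? s.predicted / s.count : 0,
      observedRate: s.count ? s.positive / s.count : 0,
    }))
    .filter((b) => b.count > 0);
}

// Expected calibration error: count-weighted mean |predicted - observed|.
export function expectedCalibrationError(table: CalibrationBucket[]): number {
  const total = table.reduce((n, b) => n + b.count, 0);
  if (total === 0) return 0;
  return table.reduce((acc, b) => acc + (b.count / total) * Math.abs(b.meanPredicted - b.observedRate), 0);
}
```

A falling calibration error over time is one concrete way to check the Phase 2 exit criterion that confidence scores align with actual resolution rates.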

**Phase 2 Exit Criteria:**

- [ ] LLM generates root cause hypotheses with evidence
- [ ] Confidence scores align with actual resolution rates
- [ ] Analysis completes in < 5 seconds for typical clusters
- [ ] Feedback loop capturing engineer ratings

---

## Phase 3: Natural Language Query Interface (Week 2)

### 3.1 Query Understanding

- [ ] **3.1.1** Create `modules/ai-diagnostics/query-parser.ts`
  - [ ] Intent classification (root cause, pattern search, comparison, trend, impact)
  - [ ] Entity extraction (product, time range, error type, user segment)
  - [ ] Temporal parsing ("yesterday", "last week", "since v2.1")
  - [ ] Constraint identification ("only iOS", "excluding beta users")
- [ ] **3.1.2** Query patterns
  - [ ] Root cause: _"Why did X happen?"_ → analyze cluster
  - [ ] Pattern search: _"Show me similar crashes"_ → vector search
  - [ ] Comparison: _"Did error rate increase after release?"_ → trend analysis
  - [ ] User impact: _"How many users were affected by Y?"_ → aggregation query
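
Before (or alongside) an LLM-based parser, a small rule-based pass can resolve the common intents and temporal phrases listed in 3.1. The sketch below is a hypothetical baseline: the keyword rules, the `classifyIntent`/`parseTimeRange` names, the default intent, and the use of UTC day boundaries are assumptions, and release-relative phrases such as "since v2.1" are left to a release-lookup step that is not shown.

```typescript
export type Intent = "root_cause" | "pattern_search" | "comparison" | "trend" | "impact";

// First matching rule wins, so more specific phrasings come first.
const INTENT_RULES: Array<[RegExp, Intent]> = [
  [/\bhow many\b|\busers affected\b/i, "impact"],
  [/\bwhy\b|\broot cause\b/i, "root_cause"],
  [/\bsimilar\b|\blike this\b/i, "pattern_search"],
  [/\bafter\b.*\brelease\b|\bcompared? to\b|\bvs\.?\b/i, "comparison"],
  [/\btrend\b|\bover time\b|\bincreas|\bdecreas/i, "trend"],
];

export function classifyIntent(query: string): Intent {
  for (const [pattern, intent] of INTENT_RULES) if (pattern.test(query)) return intent;
  return "root_cause"; // assumed default for unmatched questions
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Resolve relative time phrases against `now`, using UTC day boundaries.
export function parseTimeRange(query: string, now: Date): { start: string; end: string } | undefined {
  const startOfToday = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  const range = (start: number, end: number) => ({
    start: new Date(start).toISOString(),
    end: new Date(end).toISOString(),
  });
  if (/\byesterday\b/i.test(query)) return range(startOfToday - DAY_MS, startOfToday);
  if (/\btoday\b/i.test(query)) return range(startOfToday, now.getTime());
  if (/\b(last|past) week\b/i.test(query)) return range(now.getTime() - 7 * DAY_MS, now.getTime());
  const lastN = /\blast (\d+) (hour|day)s?\b/i.exec(query);
  if (lastN) {
    const unitMs = lastN[2].toLowerCase() === "hour" ? 60 * 60 * 1000 : DAY_MS;
    return range(now.getTime() - Number(lastN[1]) * unitMs, now.getTime());
  }
  return undefined; // no recognized temporal phrase
}
```

Questions the rules cannot resolve confidently would fall through to the LLM parser, which also fills the `extractedEntities` fields on `NaturalLanguageQueryDoc`.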

### 3.2 Query Execution Engine

- [ ] **3.2.1** Query → data pipeline
  - [ ] Map entities to Cosmos queries
  - [ ] Fetch relevant clusters, telemetry, sessions
  - [ ] Assemble context for response generation
- [ ] **3.2.2** Response generation
  - [ ] Direct answers for simple queries
  - [ ] AI-generated summaries for complex analysis
  - [ ] Data + visualization suggestions
  - [ ] Drill-down links for exploration

### 3.3 REST API Routes

- [ ] **3.3.1** Create `modules/ai-diagnostics/routes.ts`
  - [ ] `POST /ai-diagnostics/query` — natural language question
  - [ ] `GET /ai-diagnostics/clusters/:id/analysis` — pre-computed insight
  - [ ] `POST /ai-diagnostics/clusters/:id/analyze` — trigger fresh analysis
  - [ ] `GET /ai-diagnostics/suggestions` — auto-suggested investigations
  - [ ] `POST /ai-diagnostics/feedback` — submit insight rating

**Phase 3 Exit Criteria:**

- [ ] Natural language queries parse correctly (90%+ intent accuracy)
- [ ] Query → response pipeline < 3 seconds
- [ ] Complex queries return structured answers with evidence
- [ ] API routes tested and documented

---

## Phase 4: Admin Dashboard UI (Week 2–3)

### 4.1 AI Insights Page

- [ ] **4.1.1** Create `/ops/ai-diagnostics/page.tsx`
  - [ ] Smart search bar (natural language input)
  - [ ] Suggested queries based on recent errors
  - [ ] Recent AI-generated insights list
  - [ ] Trending clusters (auto-detected anomalies)
- [ ] **4.1.2** Query results view
  - [ ] AI-generated answer with confidence badge
  - [ ] Supporting evidence cards (cluster stats, sample errors)
  - [ ] Related debug sessions (linked traces)
  - [ ] Timeline visualization of error pattern
  - [ ] "Investigate further" actions

### 4.2 Cluster Detail with AI Analysis

- [ ] **4.2.1** Enhance error cluster detail
  - [ ] AI-generated summary card ("This appears to be...")
  - [ ] Root cause hypothesis with confidence
  - [ ] Evidence breakdown (stack samples, device patterns, API failures)
  - [ ] Suggested fixes from similar resolved issues
  - [ ] "Request deeper analysis" button (GPT-4o)
- [ ] **4.2.2** Interactive investigation
  - [ ] Compare with other clusters ("Show me similar issues")
  - [ ] Filter by context (OS version, app version, feature flags)
  - [ ] View affected user journeys (breadcrumb trails)

### 4.3 Proactive Alerts

- [ ] **4.3.1** Anomaly detection
  - [ ] Auto-detect emerging error clusters
  - [ ] Spike in existing cluster frequency
  - [ ] New error types after releases
- [ ] **4.3.2** AI-generated alerts
  - [ ] Slack/Teams notification with summary
  - [ ] "Investigate in AI Diagnostics" deep link
  - [ ] Auto-started debug session recommendations
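
For "spike in existing cluster frequency" (4.3.1), a simple starting point is a z-score of the latest hourly count against a trailing baseline. A minimal sketch, assuming hourly occurrence counts per cluster are already aggregated; the `z >= 3` threshold, the minimum-count floor, and the `detectSpike` name are hypothetical defaults, not tuned values.

```typescript
export interface SpikeResult {
  isSpike: boolean;
  zScore: number;
  baselineMean: number;
}

// Compare the latest count with the mean and standard deviation of the preceding window.
export function detectSpike(
  hourlyCounts: number[], // oldest first; the last entry is the current hour
  zThreshold = 3,
  minCount = 10, // ignore tiny absolute counts to avoid noisy alerts
): SpikeResult {
  if (hourlyCounts.length < 2) return { isSpike: false, zScore: 0, baselineMean: 0 };
  const baseline = hourlyCounts.slice(0, -1);
  const current = hourlyCounts[hourlyCounts.length - 1];
  const mean = baseline.reduce((a, b) => a + b, 0) / baseline.length;
  const variance = baseline.reduce((a, b) => a + (b - mean) ** 2, 0) / baseline.length;
  // Floor the standard deviation at 1 so a flat baseline does not divide by zero.
  const std = Math.max(Math.sqrt(variance), 1);
  const zScore = (current - mean) / std;
  return { isSpike: current >= minCount && zScore >= zThreshold, zScore, baselineMean: mean };
}
```

Seasonal traffic (daily or weekly cycles) would inflate false positives with a flat baseline, so comparing against the same hour on previous days is a natural next refinement.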

**Phase 4 Exit Criteria:**

- [ ] Admin can ask questions and get AI-generated answers
- [ ] Cluster detail shows AI analysis with evidence
- [ ] Proactive alerts for emerging issues
- [ ] Full test coverage (UI + API)

---

## Phase 5: Advanced Capabilities (Future)

### 5.1 Multi-Modal Analysis

- [ ] Analyze screenshots from debug sessions for UI issues
- [ ] Voice transcription analysis (for voice app errors)
- [ ] Performance trace visualization with AI annotations

### 5.2 Predictive Diagnostics

- [ ] Pre-crash pattern detection (warn before crash happens)
- [ ] Resource exhaustion prediction (memory, disk, API quotas)
- [ ] Config drift detection ("this setting combination often fails")

### 5.3 Self-Healing Suggestions

- [ ] Auto-generated config recommendations
- [ ] Feature flag rollback suggestions
- [ ] Circuit breaker threshold recommendations

---

## Implementation Tracking

| Phase | Task                       | Status | Commit |
| ----- | -------------------------- | ------ | ------ |
| 1.1   | Error clustering types     | ⬜     | —      |
| 1.1   | Cosmos containers          | ⬜     | —      |
| 1.1   | Error normalization        | ⬜     | —      |
| 1.2   | Embedding pipeline         | ⬜     | —      |
| 1.2   | Vector search setup        | ⬜     | —      |
| 1.2   | Clustering algorithm       | ⬜     | —      |
| 1.3   | Telemetry linking          | ⬜     | —      |
| 1.3   | Error context enrichment   | ⬜     | —      |
| 2.1   | Analysis prompts           | ⬜     | —      |
| 2.1   | LLM integration            | ⬜     | —      |
| 2.2   | Insight generation service | ⬜     | —      |
| 2.2   | Analysis workflow          | ⬜     | —      |
| 2.2   | Confidence scoring         | ⬜     | —      |
| 2.3   | Feedback loop              | ⬜     | —      |
| 2.3   | Pattern accumulation       | ⬜     | —      |
| 3.1   | Query parser               | ⬜     | —      |
| 3.1   | Query patterns             | ⬜     | —      |
| 3.2   | Query execution            | ⬜     | —      |
| 3.2   | Response generation        | ⬜     | —      |
| 3.3   | REST API routes            | ⬜     | —      |
| 4.1   | AI insights page           | ⬜     | —      |
| 4.1   | Query results view         | ⬜     | —      |
| 4.2   | Cluster detail             | ⬜     | —      |
| 4.2   | Interactive investigation  | ⬜     | —      |
| 4.3   | Proactive alerts           | ⬜     | —      |

**Legend:** ⬜ Not started | 🟡 In progress | ✅ Complete | ⏸️ Deferred

---

## Quick Reference for Implementing Agent

**📋 Full Roadmap:** `docs/roadmaps/AI_DIAGNOSTIC_ASSISTANT_ROADMAP.md`

**Key Files to Modify/Create:**

```
services/platform-service/
├── src/
│   ├── modules/ai-diagnostics/
│   │   ├── types.ts                    # [1.1.1] Error clustering types
│   │   ├── repository.ts               # [1.2] Data access layer
│   │   ├── analyzer.ts                 # [2.2] LLM analysis engine
│   │   ├── query-parser.ts             # [3.1] NL query understanding
│   │   ├── query-executor.ts           # [3.2] Query execution
│   │   ├── routes.ts                   # [3.3] REST API
│   │   └── ai-diagnostics.test.ts      # Tests
│   ├── lib/
│   │   ├── cosmos-init.ts              # [1.1.2] Add containers
│   │   ├── embedding-client.ts         # [1.2.1] Azure OpenAI embeddings
│   │   └── pii-redaction.ts            # Reuse existing
│   └── server.ts                       # [3.3] Register routes
dashboards/admin-web/
├── src/
│   ├── app/(dashboard)/
│   │   ├── ai-diagnostics/
│   │   │   ├── page.tsx                # [4.1] Main insights page
│   │   │   └── [id]/
│   │   │       └── page.tsx            # [4.2] Cluster detail
│   ├── lib/
│   │   └── ai-diagnostics-client.ts    # API client
│   └── components/
│       └── ai-diagnostics/             # Reusable components
```

**Commit Message Format:**

```
feat(ai-diagnostics): <description> [<task.code>]
```

**Example:**

```bash
git add services/platform-service/src/modules/ai-diagnostics/ services/platform-service/src/lib/cosmos-init.ts
git commit -m "feat(ai-diagnostics): add error clustering types and cosmos containers [1.1.1-1.1.2]"
```

**Testing Requirements:**

- Unit tests: 20+ Vitest tests for clustering, embeddings, LLM responses
- Integration tests: End-to-end query → analysis pipeline

**Dependencies:**

- Telemetry module (error events)
- Azure OpenAI (embeddings + GPT-4o)
- Existing diagnostics module (optional linking)

---

## Appendix A: Data Models

### ErrorClusterDoc

```typescript
interface ErrorClusterDoc {
  id: string; // ec_<uuid>
  productId: string; // partition key
  fingerprintHash: string; // SHA-256 of normalized error

  // Cluster metadata
  firstSeenAt: string; // ISO 8601
  lastSeenAt: string;
  occurrenceCount: number; // Total occurrences
  uniqueUsers: number; // Affected user count

  // Error signature
  errorType: string; // Exception class/name
  messageTemplate: string; // Normalized message with placeholders
  stackSignature: string; // Normalized stack frames

  // Vector embedding for semantic search
  embedding: number[]; // 1536-dim from text-embedding-3-small
  embeddingVersion: string; // Model version for re-embedding

  // Context patterns (auto-extracted)
  commonContext: {
    osVersions: Array<{ version: string; count: number }>;
    appVersions: Array<{ version: string; count: number }>;
    deviceModels: Array<{ model: string; count: number }>;
    screenContexts: Array<{ screen: string; count: number }>;
  };

  // Related data
  relatedClusterIds: string[]; // Similar clusters (vector similarity)
  mergedIntoClusterId?: string; // If deduplicated

  // Resolution tracking
  status: 'active' | 'investigating' | 'resolved' | 'ignored';
  resolvedAt?: string;
  resolutionCommit?: string; // Link to fix

  // Timestamps
  createdAt: string;
  updatedAt: string;
  ttl: number; // Cosmos DB TTL in seconds: 90 days = 7_776_000
}
```

### DiagnosticInsightDoc

```typescript
interface DiagnosticInsightDoc {
  id: string; // di_<uuid>
  clusterId: string; // partition key (with productId)
  productId: string;

  // AI-generated analysis
  analysisType: 'root_cause' | 'pattern' | 'comparison' | 'trend';
  generatedAt: string;

  // LLM output
  rootCauseCategory: 'config' | 'dependency' | 'logic' | 'resource' | 'external' | 'unknown';
  hypothesis: string; // Natural language explanation
  reasoning: string; // Why the LLM proposed this hypothesis
  confidence: 'high' | 'medium' | 'low';
  confidenceScore: number; // 0.0–1.0

  // Evidence
  evidence: Array<{
    type:
      | 'stack_trace'
      | 'telemetry_pattern'
      | 'device_correlation'
      | 'api_failure'
      | 'similar_issue';
    description: string;
    strength: 'strong' | 'moderate' | 'weak';
    data: Record<string, unknown>;
  }>;

  // Suggested actions
  suggestedInvestigation: string[];
  potentialFixDirection?: string;
  similarResolvedIssues?: Array<{
    clusterId: string;
    resolution: string;
    confidence: number;
  }>;

  // Feedback
  feedbackStats: {
    helpful: number;
    notHelpful: number;
    engineerNotes: string[];
  };

  // LLM metadata
  modelUsed: string; // gpt-4o, gpt-4o-mini
  promptTokens: number;
  completionTokens: number;

  createdAt: string;
  ttl: number; // Cosmos DB TTL in seconds: 90 days = 7_776_000
}
```

### NaturalLanguageQueryDoc

```typescript
interface NaturalLanguageQueryDoc {
  id: string; // nq_<uuid>
  userId: string; // Admin who asked
  productId?: string; // Optional filter

  // Query
  rawQuery: string; // "Why did iOS keyboard crash yesterday?"
  parsedIntent: 'root_cause' | 'pattern_search' | 'comparison' | 'trend' | 'impact';
  extractedEntities: {
    products?: string[];
    timeRange?: { start: string; end: string };
    errorTypes?: string[];
    platforms?: string[];
    userSegments?: string[];
  };

  // Execution
  executedQuery: string; // Translated Cosmos query
  dataSources: string[]; // Clusters, telemetry, sessions accessed
  executionTimeMs: number;

  // Response
  aiResponse: string; // Generated answer
  confidence: number; // Overall confidence
  supportingData: Array<{
    type: 'cluster' | 'telemetry' | 'session';
    id: string;
    relevanceScore: number;
  }>;

  // Feedback
  userRating?: 'helpful' | 'not_helpful';
  userComment?: string;

  createdAt: string;
  ttl: number; // Cosmos DB TTL in seconds: 30 days = 2_592_000
}
```

---

## Appendix B: API Reference

| Method | Endpoint                                | Auth  | Description                             |
| ------ | --------------------------------------- | ----- | --------------------------------------- |
| POST   | `/ai-diagnostics/query`                 | Admin | Natural language diagnostic query       |
| GET    | `/ai-diagnostics/clusters`              | Admin | List error clusters (with AI summaries) |
| GET    | `/ai-diagnostics/clusters/:id`          | Admin | Cluster detail with AI analysis         |
| POST   | `/ai-diagnostics/clusters/:id/analyze`  | Admin | Trigger fresh LLM analysis              |
| GET    | `/ai-diagnostics/clusters/:id/analysis` | Admin | Get pre-computed insight                |
| GET    | `/ai-diagnostics/suggestions`           | Admin | AI-suggested investigations             |
| POST   | `/ai-diagnostics/feedback`              | Admin | Rate insight helpfulness                |
| POST   | `/ai-diagnostics/search`                | Admin | Semantic search across errors           |

---

## Appendix C: Integration Points

### With Telemetry Module

- Error events auto-create/update clusters
- Telemetry context enriches error analysis
- Correlation IDs link errors to user journeys

### With Diagnostics Module

- Debug sessions linked to error clusters
- Screenshots from sessions aid visual analysis
- Network traces provide API failure context

### With Event Bus

| Event                           | Action                                                    |
| ------------------------------- | --------------------------------------------------------- |
| `telemetry.error.ingested`      | Update/create cluster, trigger re-analysis if new pattern |
| `diagnostics.session.completed` | Link session to related clusters, analyze captured logs   |
| `diagnostics.ingest.fatal`      | High-priority cluster analysis, alert if novel pattern    |

---

## Appendix D: Cost Estimation

| Component                | Monthly Cost (est.)             |
| ------------------------ | ------------------------------- |
| Azure OpenAI embeddings  | $50–100 (10K errors/day)        |
| GPT-4o-mini analysis     | $100–200 (1K analyses/day)      |
| GPT-4o deep analysis     | $50–100 (100 deep analyses/day) |
| Cosmos DB vector storage | $20–50                          |
| **Total**                | **$220–450/month**              |
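
The LLM rows above can be rechecked with simple per-token arithmetic. The sketch makes no claim about current Azure OpenAI prices: the two price fields and the per-call token counts are inputs to fill in from the pricing page for your region and deployment, and the 30-day month is a simplifying assumption.

```typescript
export interface UsageAssumption {
  callsPerDay: number;
  promptTokensPerCall: number;
  completionTokensPerCall: number; // 0 for embeddings
  pricePerMillionInput: number; // USD per 1M input tokens (from current pricing)
  pricePerMillionOutput: number; // USD per 1M output tokens
}

const DAYS_PER_MONTH = 30; // simplifying assumption

// Monthly cost = tokens per month / 1M * price per 1M, for input and output separately.
export function monthlyCostUsd(u: UsageAssumption): number {
  const inputTokens = u.callsPerDay * u.promptTokensPerCall * DAYS_PER_MONTH;
  const outputTokens = u.callsPerDay * u.completionTokensPerCall * DAYS_PER_MONTH;
  return (inputTokens / 1e6) * u.pricePerMillionInput + (outputTokens / 1e6) * u.pricePerMillionOutput;
}
```

For example, with placeholder prices of $1 per 1M input tokens and $4 per 1M output tokens, 1,000 analyses/day at 3,000 prompt and 500 completion tokens each gives 90M input and 15M output tokens per month, i.e. $90 + $60 = $150.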

Optimization:

- Cache frequent cluster analyses (24hr TTL)
- Use GPT-4o-mini for 90% of queries
- Batch embedding jobs during off-peak

---

## Current Status

- [ ] **Design complete** — Target: 2026-03-10
- [ ] **Phase 1: Data Pipeline** — Not started
- [ ] **Phase 2: LLM Engine** — Not started
- [ ] **Phase 3: Query Interface** — Not started
- [ ] **Phase 4: Admin UI** — Not started
- [ ] **Phase 5: Advanced Capabilities** — Future

**Estimated Timeline:** 2–3 weeks (Phases 1–4)

**Dependencies:**

- Telemetry module (must be collecting errors)
- Diagnostics module (optional, for rich context)
- Azure OpenAI deployment (embedding + GPT-4o access)

---

_Last Updated: 2026-03-03_

docs/roadmaps/INTELLIGENT_AB_TESTING_ROADMAP.md (new file, 719 lines)

# Intelligent A/B Testing — Implementation Roadmap

> **Module:** `platform-service/src/modules/ab-testing/`
> **Admin UI:** `/ops/experiments/`
> **Target:** AI-powered experiment management with auto-allocation, early stopping, and hypothesis generation
> **Estimated Effort:** 2.5–3 weeks
> **Status:** 🟡 Planning

---

## Executive Summary

This roadmap delivers an **intelligent A/B testing platform** that goes beyond traditional feature flags. Instead of manual percentage rollouts, the system uses **Thompson sampling** for automatic traffic allocation, **Bayesian early stopping** once a variant clearly wins or loses, and **LLM-powered hypothesis generation** from feature flag usage patterns.

### Key Differentiators vs. Static Feature Flags

| Capability         | Static Flags (Current) | Intelligent A/B Testing                   |
| ------------------ | ---------------------- | ----------------------------------------- |
| Traffic Allocation | Manual percentage      | **Multi-armed bandit optimization**       |
| Stopping Decision  | Manual monitoring      | **Auto-stop at statistical significance** |
| Winner Selection   | Human judgment         | **Bayesian probability of superiority**   |
| Test Duration      | Fixed (often wrong)    | **Dynamic based on effect size**          |
| Hypothesis         | Human-written          | **AI-generated from usage patterns**      |
| Sample Size        | Guesswork              | **Power analysis + sequential testing**   |

---

## Phase 1: Core Experiment Engine (Week 1)

### 1.1 Data Model & Schemas

- [ ] **1.1.1** Create `modules/ab-testing/types.ts`
  - [ ] `ExperimentDoc` — experiment definition and config
  - [ ] `VariantDoc` — variant metadata + metrics
  - [ ] `AssignmentDoc` — user → variant assignments
  - [ ] `MetricDoc` — event types being tracked
  - [ ] `ExperimentResult` — statistical analysis results
  - [ ] Zod schemas for all inputs
- [ ] **1.1.2** Add Cosmos containers to `cosmos-init.ts`
  - [ ] `experiments` (pk: `/productId`, TTL: 2 years for completed)
  - [ ] `experiment_variants` (pk: `/experimentId`)
  - [ ] `experiment_assignments` (pk: `/userId`, query by experiment)
  - [ ] `experiment_events` (pk: `/experimentId` + `/timestamp` for time-series)
  - [ ] `experiment_metrics` (pk: `/experimentId`, computed aggregates)

### 1.2 Assignment & Bucketing

- [ ] **1.2.1** Create deterministic bucketing
  - [ ] Deterministic hash (userId + experimentId → variant)
  - [ ] FNV-1a hash algorithm (same as feature flags)
  - [ ] Sticky assignments (user always sees same variant)
  - [ ] Override capability (force specific variant for QA)
- [ ] **1.2.2** Assignment strategies
  - [ ] `random` — Simple randomization (control vs static)
  - [ ] `thompson` — Thompson sampling (multi-armed bandit)
  - [ ] `epsilon_greedy` — Epsilon-greedy exploration
  - [ ] `ucb` — Upper Confidence Bound algorithm
- [ ] **1.2.3** Audience targeting
  - [ ] User property filters (platform, version, region, subscription tier)
  - [ ] Percentage rollout within target segment
  - [ ] Exclusion lists (beta users, internal accounts)
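
The deterministic bucketing in 1.2.1 can be sketched with 32-bit FNV-1a. This assumes the hash input is the string `${experimentId}:${userId}` mapped to 10,000 buckets; the key format, the separator, the bucket granularity, and the `assignVariant` signature are assumptions, so they should be matched to the existing feature-flag code before the two systems share buckets.

```typescript
// 32-bit FNV-1a (offset basis 0x811c9dc5, prime 0x01000193). Hashes UTF-16 code
// units, which matches byte-wise FNV-1a for ASCII keys such as IDs.
export function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply modulo 2^32
  }
  return hash >>> 0; // unsigned 32-bit result
}

export interface Variant {
  key: string;
  weight: number; // relative weight; weights need not sum to 1
}

// Sticky by construction: the same (experimentId, userId) pair always lands in
// the same bucket, so no stored lookup is needed to keep a user on one variant.
export function assignVariant(
  experimentId: string,
  userId: string,
  variants: Variant[],
  overrides: Map<string, string> = new Map(), // userId -> forced variant key (QA)
): string {
  const forced = overrides.get(userId);
  if (forced !== undefined) return forced;
  const total = variants.reduce((sum, v) => sum + v.weight, 0);
  if (variants.length === 0 || total <= 0) throw new Error("no assignable variants");
  const bucket = (fnv1a32(`${experimentId}:${userId}`) % 10000) / 10000; // in [0, 1)
  let cumulative = 0;
  for (const v of variants) {
    cumulative += v.weight / total;
    if (bucket < cumulative) return v.key;
  }
  return variants[variants.length - 1].key; // guard against floating-point rounding
}
```

Including `experimentId` in the key keeps a user's buckets independent across experiments; without it, the same users would land in the treatment group of every 50/50 test.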
|
||||||
|
|
||||||
|
### 1.3 Event Tracking Pipeline
|
||||||
|
|
||||||
|
- [ ] **1.3.1** Metric definitions
|
||||||
|
- [ ] `conversion` — Binary (did/didn't convert)
|
||||||
|
- [ ] `count` — Integer events (sessions, messages)
|
||||||
|
- [ ] `duration` — Time-based (session length, task time)
|
||||||
|
- [ ] `revenue` — Monetary (purchase amount, LTV)
|
||||||
|
- [ ] `custom` — Arbitrary numeric values
|
||||||
|
- [ ] **1.3.2** Event ingestion
|
||||||
|
- [ ] `POST /ab-testing/events` batch endpoint
|
||||||
|
- [ ] Client SDK: `track(event, value, metadata)`
|
||||||
|
- [ ] Automatic attribution (which variant caused this event)
|
||||||
|
- [ ] Deduplication (eventId + userId uniqueness)
|
||||||
|
|
||||||
|
**Phase 1 Exit Criteria:**
|
||||||
|
|
||||||
|
- [ ] Experiments created with multiple variants
|
||||||
|
- [ ] Users consistently assigned to variants
|
||||||
|
- [ ] Events tracked and attributed correctly
|
||||||
|
- [ ] 20+ tests for assignment and ingestion
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 2: Statistical Analysis Engine (Week 1–2)
|
||||||
|
|
||||||
|
### 2.1 Bayesian Inference
|
||||||
|
|
||||||
|
- [ ] **2.1.1** Create `modules/ab-testing/statistics.ts`
|
||||||
|
- [ ] `BetaDistribution` for conversion rates
|
||||||
|
- [ ] `GammaDistribution` for count/duration metrics
|
||||||
|
- [ ] `NormalDistribution` for continuous metrics
|
||||||
|
- [ ] Monte Carlo simulation (10,000 samples)
|
||||||
|
- [ ] **2.1.2** Probability calculations
|
||||||
|
- [ ] `probabilityVariantBeatsControl(variant, control)`
|
||||||
|
- [ ] `expectedLossIfChosen(variant)`
|
||||||
|
- [ ] `probabilityBeatsAllVariants(variant)`
|
||||||
|
- [ ] **2.1.3** Credible intervals
|
||||||
|
- [ ] 95% credible interval for each variant's true metric
|
||||||
|
- [ ] Visualization-ready (lower, mean, upper bounds)
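
The following is a self-contained sketch of the Beta-posterior Monte Carlo in 2.1.1–2.1.3. It assumes a uniform Beta(1, 1) prior and a seeded PRNG so that results are reproducible in tests. Each Beta draw is built from two Gamma draws using the Marsaglia–Tsang method. `mulberry32`, `summarize`, and the field names are illustrative, not the planned `statistics.ts` API.

```typescript
type Rng = () => number;

/** mulberry32: small seeded PRNG returning floats in [0, 1). */
export function mulberry32(seed: number): Rng {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

/** Standard normal via Box–Muller; 1 - rng() keeps the log argument in (0, 1]. */
function normal(rng: Rng): number {
  const u1 = 1 - rng();
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

/** Gamma(shape, 1) via Marsaglia–Tsang; shapes below 1 use the u^(1/shape) boost. */
function gamma(shape: number, rng: Rng): number {
  if (shape < 1) return gamma(shape + 1, rng) * Math.pow(rng(), 1 / shape);
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  while (true) {
    const x = normal(rng);
    const v = Math.pow(1 + c * x, 3);
    if (v <= 0) continue;
    if (Math.log(rng()) < 0.5 * x * x + d - d * v + d * Math.log(v)) return d * v;
  }
}

export function betaSample(alpha: number, beta: number, rng: Rng): number {
  const x = gamma(alpha, rng);
  return x / (x + gamma(beta, rng));
}

export interface ConversionStats {
  conversions: number;
  participants: number;
}

export interface VariantSummary {
  probabilityBeatsControl: number;
  expectedLoss: number; // E[max(p_control - p_variant, 0)]
  credibleInterval: { lower: number; mean: number; upper: number }; // 95%
}

export function summarize(
  control: ConversionStats,
  variant: ConversionStats,
  samples = 10_000,
  rng: Rng = mulberry32(42),
): VariantSummary {
  // Posterior Beta(1 + conversions, 1 + non-conversions) under a uniform prior.
  const post = (s: ConversionStats): [number, number] => [1 + s.conversions, 1 + s.participants - s.conversions];
  const [ac, bc] = post(control);
  const [av, bv] = post(variant);
  const draws: number[] = [];
  let wins = 0;
  let loss = 0;
  for (let i = 0; i < samples; i++) {
    const pc = betaSample(ac, bc, rng);
    const pv = betaSample(av, bv, rng);
    if (pv > pc) wins++;
    loss += Math.max(pc - pv, 0);
    draws.push(pv);
  }
  draws.sort((a, b) => a - b);
  const mean = draws.reduce((s, x) => s + x, 0) / samples;
  return {
    probabilityBeatsControl: wins / samples,
    expectedLoss: loss / samples,
    credibleInterval: {
      lower: draws[Math.floor(0.025 * samples)]!,
      mean,
      upper: draws[Math.ceil(0.975 * samples) - 1]!,
    },
  };
}
```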
|
||||||
|
|
||||||
|
### 2.2 Early Stopping Rules
|
||||||
|
|
||||||
|
- [ ] **2.2.1** Stopping criteria
|
||||||
|
- [ ] **Winner found:** Variant has > 95% probability of beating control
|
||||||
|
- [ ] **Loser clear:** Control has > 95% probability of beating variant
|
||||||
|
- [ ] **Practical significance (futility):** Variant is very unlikely to reach the minimum detectable effect
|
||||||
|
- [ ] **Time bound:** Max duration reached (safety limit)
|
||||||
|
- [ ] **2.2.2** Auto-promotion
|
||||||
|
- [ ] Auto-rollout winner to 100% when threshold hit
|
||||||
|
- [ ] Notify admins via Slack/email
|
||||||
|
- [ ] Create audit log entry
|
||||||
|
- [ ] **2.2.3** Guardrails
|
||||||
|
- [ ] Minimum sample size before early stopping (100 users/variant)
|
||||||
|
- [ ] Business hours only for auto-actions
|
||||||
|
- [ ] Require approval for revenue-impacting experiments
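
The stopping criteria and guardrails in 2.2 reduce to a pure decision function. In this minimal sketch, thresholds mirror this section and `ExperimentDoc.guardrails`. The futility rule is read as "the best variant is very unlikely to reach the MDE", which is an assumption, and `probabilityLiftAboveMde` is a hypothetical input the statistics engine would supply. Business hours are taken as UTC 09:00–17:00 Monday to Friday purely for illustration.

```typescript
export type StopDecision =
  | { stop: false }
  | { stop: true; reason: "winner" | "control_wins" | "futility" | "max_duration"; autoPromote: boolean };

export interface StoppingInput {
  minParticipantsPerVariant: number; // smallest variant's participant count
  daysRunning: number;
  bestProbabilityBeatsControl: number; // max over variants
  probabilityLiftAboveMde: number; // hypothetical: P(best variant's lift >= MDE)
  revenueImpacting: boolean;
  now: Date;
}

export interface StoppingConfig {
  minSampleSizePerVariant: number; // default 100
  maxDurationDays: number; // default 30
  winnerThreshold: number; // default 0.95
  futilityThreshold: number; // hypothetical, e.g. 0.05
  requireApprovalFor: "none" | "revenue" | "all";
}

function isBusinessHoursUtc(d: Date): boolean {
  const day = d.getUTCDay(); // 0 = Sunday
  const hour = d.getUTCHours();
  return day >= 1 && day <= 5 && hour >= 9 && hour < 17;
}

export function evaluateStopping(input: StoppingInput, cfg: StoppingConfig): StopDecision {
  const needsApproval =
    cfg.requireApprovalFor === "all" || (cfg.requireApprovalFor === "revenue" && input.revenueImpacting);
  const autoPromote = !needsApproval && isBusinessHoursUtc(input.now);

  // The time bound is a safety limit and applies regardless of sample size.
  if (input.daysRunning >= cfg.maxDurationDays) return { stop: true, reason: "max_duration", autoPromote: false };
  // Early stopping is allowed only once every variant has enough participants.
  if (input.minParticipantsPerVariant < cfg.minSampleSizePerVariant) return { stop: false };
  if (input.bestProbabilityBeatsControl > cfg.winnerThreshold) return { stop: true, reason: "winner", autoPromote };
  if (input.bestProbabilityBeatsControl < 1 - cfg.winnerThreshold) {
    return { stop: true, reason: "control_wins", autoPromote: false };
  }
  if (input.probabilityLiftAboveMde < cfg.futilityThreshold) {
    return { stop: true, reason: "futility", autoPromote: false };
  }
  return { stop: false };
}
```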
|
||||||
|
|
||||||
|
### 2.3 Thompson Sampling
|
||||||
|
|
||||||
|
- [ ] **2.3.1** Multi-armed bandit implementation
|
||||||
|
- [ ] Sample from posterior distributions
|
||||||
|
- [ ] Assign user to variant with highest sample
|
||||||
|
- [ ] Re-balance traffic every hour based on performance
|
||||||
|
- [ ] **2.3.2** Exploration vs exploitation
|
||||||
|
- [ ] Exploration shrinks automatically as posteriors narrow (no manual decay schedule)
|
||||||
|
- [ ] High uncertainty = more exploration
|
||||||
|
- [ ] Clear winner = more traffic to winner
|
||||||
|
- [ ] **2.3.3** Regret minimization
|
||||||
|
- [ ] Track cumulative regret vs optimal variant
|
||||||
|
- [ ] Regret bounds reporting
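
For comparison with Thompson sampling, the `epsilon_greedy` and `ucb` strategies listed in 1.2.2 and the cumulative-regret metric from 2.3.3 each fit in a few lines. This sketch assumes per-variant running means of the primary metric. The UCB rule shown is the classic UCB1 bound. `Arm`, `chooseEpsilonGreedy`, `chooseUcb1`, and `cumulativeRegret` are hypothetical names.

```typescript
export interface Arm {
  id: string;
  pulls: number; // participants assigned so far
  meanReward: number; // e.g. observed conversion rate
}

/** Epsilon-greedy: explore uniformly with probability epsilon, otherwise exploit the best mean. */
export function chooseEpsilonGreedy(arms: readonly Arm[], epsilon: number, rng: () => number = Math.random): string {
  if (arms.length === 0) throw new Error("no arms");
  if (rng() < epsilon) return arms[Math.floor(rng() * arms.length)]!.id;
  return arms.reduce((best, a) => (a.meanReward > best.meanReward ? a : best)).id;
}

/** UCB1: argmax of mean + sqrt(2 ln N / n_i); untried arms are chosen first. */
export function chooseUcb1(arms: readonly Arm[]): string {
  if (arms.length === 0) throw new Error("no arms");
  const untried = arms.find((a) => a.pulls === 0);
  if (untried) return untried.id;
  const total = arms.reduce((s, a) => s + a.pulls, 0);
  let bestId = arms[0]!.id;
  let bestScore = -Infinity;
  for (const a of arms) {
    const score = a.meanReward + Math.sqrt((2 * Math.log(total)) / a.pulls);
    if (score > bestScore) {
      bestScore = score;
      bestId = a.id;
    }
  }
  return bestId;
}

/** Cumulative regret vs. the best true rate; only computable in simulation, where true rates are known. */
export function cumulativeRegret(trueRates: Readonly<Record<string, number>>, chosen: readonly string[]): number {
  const best = Math.max(...Object.values(trueRates));
  return chosen.reduce((sum, id) => sum + (best - (trueRates[id] ?? 0)), 0);
}
```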
|
||||||
|
|
||||||
|
**Phase 2 Exit Criteria:**
|
||||||
|
|
||||||
|
- [ ] Bayesian probabilities calculated correctly
|
||||||
|
- [ ] Early stopping triggers at appropriate thresholds
|
||||||
|
- [ ] Thompson sampling re-allocates traffic dynamically
|
||||||
|
- [ ] Statistical tests validate correctness
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 3: AI-Powered Hypothesis Generation (Week 2)
|
||||||
|
|
||||||
|
### 3.1 Pattern Detection
|
||||||
|
|
||||||
|
- [ ] **3.1.1** Usage pattern analysis
|
||||||
|
- [ ] Analyze feature flag usage telemetry
|
||||||
|
- [ ] Segment analysis (iOS vs Android, free vs pro)
|
||||||
|
- [ ] Temporal patterns (day of week, time of day)
|
||||||
|
- [ ] User behavior sequences (funnel analysis)
|
||||||
|
- [ ] **3.1.2** Anomaly detection
|
||||||
|
- [ ] Unexpected drop in feature adoption
|
||||||
|
- [ ] Performance regression signals
|
||||||
|
- [ ] User segment showing different behavior
|
||||||
|
- [ ] **3.1.3** Opportunity identification
|
||||||
|
- [ ] Underperforming features (low adoption)
|
||||||
|
- [ ] High-dropoff flows
|
||||||
|
- [ ] Competitor feature gaps
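
One simple way to flag the "unexpected drop in feature adoption" from 3.1.2, before handing evidence to the LLM, is a z-score against a trailing baseline (for example, the last 28 days). This is a hedged sketch, not the planned detector. The window length and the −3σ default threshold are assumptions.

```typescript
export interface AnomalyResult {
  zScore: number;
  isDrop: boolean;
}

/** Compares today's value with the mean and sample standard deviation of a trailing baseline. */
export function detectAdoptionDrop(baseline: readonly number[], today: number, threshold = -3): AnomalyResult {
  if (baseline.length < 2) throw new Error("need at least two baseline points");
  const mean = baseline.reduce((s, x) => s + x, 0) / baseline.length;
  const variance = baseline.reduce((s, x) => s + (x - mean) ** 2, 0) / (baseline.length - 1);
  const std = Math.sqrt(variance);
  // A perfectly flat baseline has no spread; treat any decrease as a drop.
  const zScore = std === 0 ? (today < mean ? -Infinity : 0) : (today - mean) / std;
  return { zScore, isDrop: zScore <= threshold };
}
```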
|
||||||
|
|
||||||
|
### 3.2 Hypothesis Generation
|
||||||
|
|
||||||
|
- [ ] **3.2.1** LLM hypothesis prompts
|
||||||
|
|
||||||
|
```
|
||||||
|
Given this feature usage data:
|
||||||
|
- Feature: {featureName}
|
||||||
|
- Current adoption: {adoptionRate}% (baseline: {baseline}%)
|
||||||
|
- Segment performance: {segmentData}
|
||||||
|
- User feedback: {feedbackSamples}
|
||||||
|
- Competitor analysis: {competitorFeatures}
|
||||||
|
|
||||||
|
Generate experiment hypotheses:
|
||||||
|
1. Primary hypothesis: "Changing X will improve Y because..."
|
||||||
|
2. Secondary hypotheses (2-3 alternatives)
|
||||||
|
3. Expected effect size (conservative estimate)
|
||||||
|
4. Success metric recommendation
|
||||||
|
5. Risk assessment
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **3.2.2** Hypothesis ranking
|
||||||
|
- [ ] Expected impact scoring
|
||||||
|
- [ ] Implementation difficulty estimate
|
||||||
|
- [ ] Statistical power prediction
|
||||||
|
- [ ] Risk-adjusted expected value
|
||||||
|
- [ ] **3.2.3** Suggested experiment design
|
||||||
|
- [ ] Variant count recommendation
|
||||||
|
- [ ] Traffic allocation suggestion
|
||||||
|
- [ ] Duration estimate
|
||||||
|
- [ ] Required sample size calculation
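
The "required sample size" and "duration estimate" in 3.2.3 can use the standard two-proportion normal approximation, n = (z₁₋α/₂ + z₁₋β)² · (p₁(1 − p₁) + p₂(1 − p₂)) / (p₂ − p₁)². This is a frequentist power calculation used only for planning. The running experiment is still evaluated with the Bayesian engine from Phase 2. The defaults (two-sided α = 0.05, 80% power), the equal traffic split, and the function names are assumptions.

```typescript
const Z_ALPHA_HALF_005 = 1.959964; // z for two-sided alpha = 0.05
const Z_POWER_080 = 0.841621; // z for 80% power

/**
 * Participants needed per variant to detect a relative lift `mde`
 * (e.g. 0.2 = +20%) on a baseline conversion rate.
 */
export function sampleSizePerVariant(
  baselineRate: number,
  mde: number,
  zAlpha = Z_ALPHA_HALF_005,
  zPower = Z_POWER_080,
): number {
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + mde);
  if (p1 <= 0 || p1 >= 1 || p2 <= 0 || p2 >= 1 || p1 === p2) {
    throw new Error("rates must lie in (0, 1) and differ");
  }
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil(((zAlpha + zPower) ** 2 * variance) / (p2 - p1) ** 2);
}

/** Days until every variant reaches its sample size, assuming an equal split of enrolled traffic. */
export function estimateDurationDays(
  perVariant: number,
  variantCount: number,
  dailyEligibleUsers: number,
  targetPercent: number, // 0–100, as in ExperimentDoc.targetPercent
): number {
  const dailyEnrolled = dailyEligibleUsers * (targetPercent / 100);
  if (dailyEnrolled <= 0) return Infinity;
  return Math.ceil((perVariant * variantCount) / dailyEnrolled);
}
```

For a 10% baseline and a +20% relative MDE, this gives roughly 3,840 users per variant. With two variants, 1,000 eligible users per day, and 50% targeting, that comes to about 16 days.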
|
||||||
|
|
||||||
|
### 3.3 Auto-Experiment Suggestions
|
||||||
|
|
||||||
|
- [ ] **3.3.1** Weekly AI reports
|
||||||
|
- [ ] Top 5 experiment opportunities
|
||||||
|
- [ ] Hypotheses with supporting evidence
|
||||||
|
- [ ] Prioritized by expected impact
|
||||||
|
- [ ] **3.3.2** One-click experiment creation
|
||||||
|
- [ ] Pre-fill experiment from hypothesis
|
||||||
|
- [ ] Suggested variants with descriptions
|
||||||
|
- [ ] Pre-configured metrics
|
||||||
|
|
||||||
|
**Phase 3 Exit Criteria:**
|
||||||
|
|
||||||
|
- [ ] AI generates meaningful hypotheses from usage data
|
||||||
|
- [ ] Hypothesis quality rated by product team (80%+ useful)
|
||||||
|
- [ ] Auto-suggested experiments created in 1 click
|
||||||
|
- [ ] Weekly reports generated automatically
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 4: Admin Dashboard UI (Week 2–3)
|
||||||
|
|
||||||
|
### 4.1 Experiments List Page
|
||||||
|
|
||||||
|
- [ ] **4.1.1** Create `/ops/experiments/page.tsx`
|
||||||
|
- [ ] Experiment cards (status, duration, sample size)
|
||||||
|
- [ ] Quick filters (running, completed, draft)
|
||||||
|
- [ ] AI-generated hypothesis badge
|
||||||
|
- [ ] Health indicators (traffic balance, event flow)
|
||||||
|
- [ ] **4.1.2** Experiment creation wizard
|
||||||
|
- [ ] Step 1: Define hypothesis (AI suggestions available)
|
||||||
|
- [ ] Step 2: Create variants (name, description, config)
|
||||||
|
- [ ] Step 3: Select metrics (primary + secondary)
|
||||||
|
- [ ] Step 4: Audience targeting
|
||||||
|
- [ ] Step 5: Traffic allocation (manual or Thompson)
|
||||||
|
- [ ] Step 6: Review and launch
|
||||||
|
|
||||||
|
### 4.2 Live Experiment Dashboard
|
||||||
|
|
||||||
|
- [ ] **4.2.1** Create `/ops/experiments/[id]/page.tsx`
|
||||||
|
- [ ] Real-time metrics comparison
|
||||||
|
- [ ] Variant performance table (conversions, counts, durations)
|
||||||
|
- [ ] Bayesian probability visualization
|
||||||
|
- [ ] Credible interval charts
|
||||||
|
- [ ] **4.2.2** Statistical summary card
|
||||||
|
- [ ] Probability of beating control (per variant)
|
||||||
|
- [ ] Expected lift if implemented
|
||||||
|
- [ ] Sample size progress bar
|
||||||
|
- [ ] Days to significance estimate
|
||||||
|
- [ ] **4.2.3** Action buttons
|
||||||
|
- [ ] Adjust traffic allocation
|
||||||
|
- [ ] Pause/resume experiment
|
||||||
|
- [ ] Stop and declare winner
|
||||||
|
- [ ] Roll out winner to 100%
|
||||||
|
- [ ] Archive experiment
|
||||||
|
|
||||||
|
### 4.3 Results & Reporting
|
||||||
|
|
||||||
|
- [ ] **4.3.1** Results page
|
||||||
|
- [ ] Final statistical summary
|
||||||
|
- [ ] Variant comparison visualization
|
||||||
|
- [ ] Segment breakdown (iOS vs Android, etc.)
|
||||||
|
- [ ] Credible intervals over time
|
||||||
|
- [ ] **4.3.2** AI insights panel
|
||||||
|
- [ ] Why this result occurred (LLM summary)
|
||||||
|
- [ ] Unexpected findings
|
||||||
|
- [ ] Follow-up experiment suggestions
|
||||||
|
- [ ] **4.3.3** Export capabilities
|
||||||
|
- [ ] CSV export of raw data
|
||||||
|
- [ ] PDF report generation
|
||||||
|
- [ ] API endpoint for data warehouse sync
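
For the CSV export in 4.3.3, fields need RFC 4180 quoting, because variant names and flattened `eventMetadata` values can contain commas, quotes, or newlines. This is a minimal sketch. The column selection is left to the caller and is an assumption.

```typescript
type Cell = string | number | boolean | null | undefined;

/** Quote a field per RFC 4180 when it contains a comma, double quote, CR, or LF. */
function csvField(value: Cell): string {
  if (value === null || value === undefined) return "";
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

/** Serialize rows in the given column order; lines end with CRLF as RFC 4180 specifies. */
export function toCsv<T extends Record<string, Cell>>(rows: readonly T[], columns: readonly (keyof T & string)[]): string {
  const lines = [columns.map((c) => csvField(c)).join(",")];
  for (const row of rows) lines.push(columns.map((c) => csvField(row[c])).join(","));
  return lines.join("\r\n") + "\r\n";
}
```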
|
||||||
|
|
||||||
|
**Phase 4 Exit Criteria:**
|
||||||
|
|
||||||
|
- [ ] Full experiment lifecycle manageable in UI
|
||||||
|
- [ ] Real-time stats visible and accurate
|
||||||
|
- [ ] Bayesian visualizations clear to non-statisticians
|
||||||
|
- [ ] Export and reporting functional
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 5: Advanced Capabilities (Future)
|
||||||
|
|
||||||
|
### 5.1 Multi-Variate Testing
|
||||||
|
|
||||||
|
- [ ] Test multiple variables simultaneously
|
||||||
|
- [ ] Full factorial and fractional factorial designs
|
||||||
|
- [ ] Interaction effect detection
|
||||||
|
|
||||||
|
### 5.2 Sequential Experimentation
|
||||||
|
|
||||||
|
- [ ] Multi-phase experiments (qualification → main → validation)
|
||||||
|
- [ ] Holdout groups for long-term validation
|
||||||
|
- [ ] Global holdout (never-exposed users)
|
||||||
|
|
||||||
|
### 5.3 Personalization Layer
|
||||||
|
|
||||||
|
- [ ] Contextual bandits (different variants for different users)
|
||||||
|
- [ ] ML model for variant selection
|
||||||
|
- [ ] Automatic personalization optimization
|
||||||
|
|
||||||
|
### 5.4 Experiment Coordination
|
||||||
|
|
||||||
|
- [ ] Mutually exclusive experiments
|
||||||
|
- [ ] Experiment priority rules
|
||||||
|
- [ ] Layered experimentation (orthogonal tests)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Appendix A: Data Models
|
||||||
|
|
||||||
|
### ExperimentDoc
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
interface ExperimentDoc {
|
||||||
|
id: string; // exp_<uuid>
|
||||||
|
productId: string; // partition key
|
||||||
|
|
||||||
|
// Experiment definition
|
||||||
|
name: string;
|
||||||
|
description: string;
|
||||||
|
hypothesis: string;
|
||||||
|
aiGeneratedHypothesis?: boolean; // Flag for AI-suggested
|
||||||
|
|
||||||
|
// Status lifecycle: draft → running → paused | stopped | completed
|
||||||
|
status: 'draft' | 'running' | 'paused' | 'stopped' | 'completed';
|
||||||
|
|
||||||
|
// Variants
|
||||||
|
controlVariantId: string; // Baseline variant
|
||||||
|
variantIds: string[]; // All variant IDs
|
||||||
|
|
||||||
|
// Configuration
|
||||||
|
allocationStrategy: 'random' | 'thompson' | 'epsilon_greedy' | 'ucb';
|
||||||
|
targetPercent: number; // % of eligible traffic
|
||||||
|
|
||||||
|
// Audience targeting
|
||||||
|
targeting: {
|
||||||
|
platforms?: string[]; // ios, android, web
|
||||||
|
appVersions?: { min: string; max?: string };
|
||||||
|
regions?: string[];
|
||||||
|
userSegments?: string[]; // pro, free, enterprise
|
||||||
|
userProperties?: Record<string, string | number | boolean>;
|
||||||
|
};
|
||||||
|
|
||||||
|
// Metrics
|
||||||
|
primaryMetric: {
|
||||||
|
name: string;
|
||||||
|
type: 'conversion' | 'count' | 'duration' | 'revenue' | 'custom';
|
||||||
|
eventName: string; // Telemetry event to track
|
||||||
|
aggregation: 'sum' | 'mean' | 'count' | 'unique';
|
||||||
|
direction: 'increase' | 'decrease'; // Is higher better?
|
||||||
|
minimumDetectableEffect: number; // % change we want to detect
|
||||||
|
};
|
||||||
|
secondaryMetrics: Array<{
|
||||||
|
name: string;
|
||||||
|
type: 'conversion' | 'count' | 'duration' | 'revenue' | 'custom';
|
||||||
|
eventName: string;
|
||||||
|
}>;
|
||||||
|
|
||||||
|
// Guardrails
|
||||||
|
guardrails: {
|
||||||
|
minSampleSizePerVariant: number; // Default: 100
|
||||||
|
maxDurationDays: number; // Safety limit, default: 30
|
||||||
|
autoStopEnabled: boolean;
|
||||||
|
winnerThreshold: number; // % probability to auto-stop, default: 95
|
||||||
|
requireApprovalFor: 'none' | 'revenue' | 'all';
|
||||||
|
};
|
||||||
|
|
||||||
|
// Scheduling
|
||||||
|
startAt?: string; // Scheduled start (ISO 8601)
|
||||||
|
endAt?: string; // Scheduled end or actual stop
|
||||||
|
|
||||||
|
// Stats (denormalized for fast reads)
|
||||||
|
totalParticipants: number;
|
||||||
|
totalEvents: number;
|
||||||
|
|
||||||
|
// Timestamps
|
||||||
|
createdAt: string;
|
||||||
|
updatedAt: string;
|
||||||
|
startedAt?: string;
|
||||||
|
completedAt?: string;
|
||||||
|
ttl: number; // 2 years for completed
|
||||||
|
}
|
||||||
|
```
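
The lifecycle comment on `status` implies a small state machine. In this hedged sketch, the allowed transitions are an interpretation rather than a confirmed spec. Paused experiments can resume (per the Pause/resume action in 4.2.3), and `stopped` and `completed` are treated as terminal.

```typescript
export type ExperimentStatus = "draft" | "running" | "paused" | "stopped" | "completed";

// Assumed transition table; adjust if the product spec differs.
const TRANSITIONS: Readonly<Record<ExperimentStatus, readonly ExperimentStatus[]>> = {
  draft: ["running"],
  running: ["paused", "stopped", "completed"],
  paused: ["running", "stopped", "completed"],
  stopped: [],
  completed: [],
};

export function canTransition(from: ExperimentStatus, to: ExperimentStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

/** Throws on an illegal transition; a route handler could map this to HTTP 409. */
export function assertTransition(from: ExperimentStatus, to: ExperimentStatus): void {
  if (!canTransition(from, to)) throw new Error(`Illegal status transition: ${from} -> ${to}`);
}
```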
|
||||||
|
|
||||||
|
### VariantDoc
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
interface VariantDoc {
|
||||||
|
id: string; // var_<uuid>
|
||||||
|
experimentId: string; // partition key
|
||||||
|
|
||||||
|
// Variant definition
|
||||||
|
name: string; // "Control", "New Button Color", etc.
|
||||||
|
description?: string;
|
||||||
|
isControl: boolean;
|
||||||
|
|
||||||
|
// Feature flag configuration
|
||||||
|
flagConfig: Record<string, unknown>; // Arbitrary config payload
|
||||||
|
|
||||||
|
// Traffic allocation (dynamic for bandit strategies)
|
||||||
|
currentAllocationPercent: number; // 0–100%
|
||||||
|
|
||||||
|
// Statistics (real-time computed)
|
||||||
|
stats: {
|
||||||
|
participants: number;
|
||||||
|
events: number;
|
||||||
|
|
||||||
|
// Primary metric
|
||||||
|
primaryMetricValue: number; // Mean or conversion rate
|
||||||
|
primaryMetricStdDev?: number;
|
||||||
|
|
||||||
|
// For conversion metrics
|
||||||
|
conversions?: number;
|
||||||
|
conversionRate?: number; // 0–1
|
||||||
|
|
||||||
|
// Bayesian posterior parameters
|
||||||
|
betaAlpha?: number; // For Beta distribution
|
||||||
|
betaBeta?: number;
|
||||||
|
|
||||||
|
gammaShape?: number; // For Gamma distribution
|
||||||
|
gammaScale?: number;
|
||||||
|
};
|
||||||
|
|
||||||
|
// Bayesian results
|
||||||
|
bayesianResults?: {
|
||||||
|
probabilityBeatsControl: number; // 0–1
|
||||||
|
probabilityBeatsAll: number; // 0–1
|
||||||
|
expectedLiftPercent: number; // Relative to control
|
||||||
|
expectedLoss: number; // Risk of choosing this variant
|
||||||
|
credibleInterval: {
|
||||||
|
lower: number;
|
||||||
|
mean: number;
|
||||||
|
upper: number;
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
||||||
|
createdAt: string;
|
||||||
|
updatedAt: string;
|
||||||
|
}
|
||||||
|
```
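
The posterior fields in `VariantDoc.stats` follow from conjugate updates. This sketch assumes a Beta(1, 1) prior for conversion metrics. For count metrics, it assumes a Gamma–Poisson model with a weak Gamma(shape 1, rate 1) prior on the per-participant event rate, so `gammaScale` = 1 / rate. Both priors are assumptions, and duration metrics would need a different likelihood that is not shown here.

```typescript
export interface ConversionPosterior {
  betaAlpha: number;
  betaBeta: number;
  conversionRate: number; // observed rate, not the posterior mean
}

export interface CountPosterior {
  gammaShape: number;
  gammaScale: number;
  posteriorMeanRate: number; // expected events per participant
}

export function conversionPosterior(
  participants: number,
  conversions: number,
  priorAlpha = 1,
  priorBeta = 1,
): ConversionPosterior {
  if (conversions < 0 || conversions > participants) throw new Error("conversions must be in [0, participants]");
  return {
    betaAlpha: priorAlpha + conversions,
    betaBeta: priorBeta + (participants - conversions),
    conversionRate: participants === 0 ? 0 : conversions / participants,
  };
}

/** Gamma–Poisson update: shape += total events, rate += participants. */
export function countPosterior(participants: number, totalEvents: number, priorShape = 1, priorRate = 1): CountPosterior {
  const shape = priorShape + totalEvents;
  const rate = priorRate + participants;
  return { gammaShape: shape, gammaScale: 1 / rate, posteriorMeanRate: shape / rate };
}
```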
|
||||||
|
|
||||||
|
### ExperimentAssignmentDoc
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
interface ExperimentAssignmentDoc {
|
||||||
|
id: string; // ea_<uuid>
|
||||||
|
userId: string; // partition key (for user lookups)
|
||||||
|
|
||||||
|
experimentId: string;
|
||||||
|
variantId: string;
|
||||||
|
|
||||||
|
// Assignment metadata
|
||||||
|
assignedAt: string; // First assignment
|
||||||
|
firstExposedAt?: string; // First actual exposure (feature use)
|
||||||
|
|
||||||
|
// Context at assignment
|
||||||
|
assignmentContext: {
|
||||||
|
platform: string;
|
||||||
|
appVersion: string;
|
||||||
|
osVersion: string;
|
||||||
|
deviceModel?: string;
|
||||||
|
region?: string;
|
||||||
|
};
|
||||||
|
|
||||||
|
// Events attributed to this assignment
|
||||||
|
eventCount: number;
|
||||||
|
lastEventAt?: string;
|
||||||
|
|
||||||
|
// TTL: Remove after experiment completes + analysis period
|
||||||
|
ttl: number; // seconds until experimentEnd + 90 days (Cosmos TTL counts from last write)
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### ExperimentEventDoc
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
interface ExperimentEventDoc {
|
||||||
|
id: string; // ee_<uuid>
|
||||||
|
experimentId: string; // partition key
|
||||||
|
timestamp: string; // Sort key for time-series queries
|
||||||
|
|
||||||
|
// Attribution
|
||||||
|
userId: string;
|
||||||
|
variantId: string;
|
||||||
|
assignmentId: string;
|
||||||
|
|
||||||
|
// Event details
|
||||||
|
metricName: string;
|
||||||
|
metricType: 'conversion' | 'count' | 'duration' | 'revenue' | 'custom';
|
||||||
|
value: number; // Numeric value
|
||||||
|
|
||||||
|
// Conversion tracking (for binary metrics)
|
||||||
|
converted: boolean; // For conversion metrics
|
||||||
|
|
||||||
|
// Context
|
||||||
|
eventMetadata?: Record<string, unknown>;
|
||||||
|
|
||||||
|
// Denormalized for filtering
|
||||||
|
platform: string;
|
||||||
|
appVersion: string;
|
||||||
|
|
||||||
|
// TTL: Shorter for raw events
|
||||||
|
ttl: number; // 90 days
|
||||||
|
}
|
||||||
|
```
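
Cosmos DB item TTL is expressed in seconds and counted from the item's last modification (`_ts`), not as an absolute expiry. Item-level `ttl` also takes effect only when time-to-live is enabled on the container. The TTL comments above therefore translate to relative values. An "experimentEnd + 90 days" TTL has to be recomputed on every write, because updates such as incrementing `eventCount` restart the clock. This is a sketch, and the helper names are hypothetical.

```typescript
const SECONDS_PER_DAY = 86_400;

export const TTL_RAW_EVENTS = 90 * SECONDS_PER_DAY; // ExperimentEventDoc: 90 days
export const TTL_COMPLETED_EXPERIMENT = 2 * 365 * SECONDS_PER_DAY; // ExperimentDoc: ~2 years

/** Seconds from `now` until experimentEnd + retentionDays, floored at 1 so the value stays a positive integer. */
export function assignmentTtlSeconds(experimentEnd: Date, now: Date, retentionDays = 90): number {
  const expiresAtMs = experimentEnd.getTime() + retentionDays * SECONDS_PER_DAY * 1000;
  return Math.max(1, Math.ceil((expiresAtMs - now.getTime()) / 1000));
}
```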
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Implementation Tracking
|
||||||
|
|
||||||
|
| Phase | Task | Status | Commit |
|
||||||
|
| ----- | ----------------------------- | ------ | ------ |
|
||||||
|
| 1.1 | Experiment types & schemas | ⬜ | — |
|
||||||
|
| 1.1 | Cosmos containers | ⬜ | — |
|
||||||
|
| 1.2 | Deterministic bucketing | ⬜ | — |
|
||||||
|
| 1.2 | Assignment strategies | ⬜ | — |
|
||||||
|
| 1.2 | Audience targeting | ⬜ | — |
|
||||||
|
| 1.3 | Metric definitions | ⬜ | — |
|
||||||
|
| 1.3 | Event ingestion | ⬜ | — |
|
||||||
|
| 2.1 | Bayesian inference engine | ⬜ | — |
|
||||||
|
| 2.1 | Probability calculations | ⬜ | — |
|
||||||
|
| 2.1 | Credible intervals | ⬜ | — |
|
||||||
|
| 2.2 | Early stopping rules | ⬜ | — |
|
||||||
|
| 2.2 | Auto-promotion | ⬜ | — |
|
||||||
|
| 2.2 | Guardrails | ⬜ | — |
|
||||||
|
| 2.3 | Thompson sampling | ⬜ | — |
|
||||||
|
| 2.3 | Exploration vs exploitation | ⬜ | — |
|
||||||
|
| 2.3 | Regret minimization | ⬜ | — |
|
||||||
|
| 3.1 | Pattern detection | ⬜ | — |
|
||||||
|
| 3.1 | Anomaly detection | ⬜ | — |
|
||||||
|
| 3.2 | Hypothesis generation prompts | ⬜ | — |
|
||||||
|
| 3.2 | Hypothesis ranking | ⬜ | — |
|
||||||
|
| 3.3 | Auto-experiment suggestions | ⬜ | — |
|
||||||
|
| 4.1 | Experiments list page | ⬜ | — |
|
||||||
|
| 4.1 | Creation wizard | ⬜ | — |
|
||||||
|
| 4.2 | Live dashboard | ⬜ | — |
|
||||||
|
| 4.2 | Statistical summary | ⬜ | — |
|
||||||
|
| 4.3 | Results & reporting | ⬜ | — |
|
||||||
|
| 4.3 | AI insights panel | ⬜ | — |
|
||||||
|
|
||||||
|
**Legend:** ⬜ Not started | 🟡 In progress | ✅ Complete | ⏸️ Deferred
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Quick Reference for Implementing Agent
|
||||||
|
|
||||||
|
**📋 Full Roadmap:** `docs/roadmaps/INTELLIGENT_AB_TESTING_ROADMAP.md`
|
||||||
|
|
||||||
|
**Key Files to Modify/Create:**
|
||||||
|
|
||||||
|
```
|
||||||
|
services/platform-service/
|
||||||
|
├── src/
|
||||||
|
│ ├── modules/ab-testing/
|
||||||
|
│ │ ├── types.ts # [1.1] Experiment, Variant, Assignment types
|
||||||
|
│ │ ├── repository.ts # [1.2] Data access layer
|
||||||
|
│ │ ├── bucketing.ts # [1.2] FNV-1a hash, sticky assignments
|
||||||
|
│ │ ├── statistics.ts # [2.1] Bayesian inference, Beta/Gamma/Normal distributions
|
||||||
|
│ │ ├── allocation.ts # [2.3] Thompson sampling, bandit strategies
|
||||||
|
│ │ ├── hypothesis-generator.ts # [3.2] LLM pattern analysis
|
||||||
|
│ │ ├── routes.ts # [4] REST API
|
||||||
|
│ │ └── ab-testing.test.ts # Tests
|
||||||
|
│ ├── lib/
|
||||||
|
│ │ └── cosmos-init.ts # [1.1] Add containers
|
||||||
|
│ └── server.ts # Register routes
|
||||||
|
dashboards/admin-web/
|
||||||
|
├── src/
|
||||||
|
│ ├── app/(dashboard)/
|
||||||
|
│ │ ├── experiments/
|
||||||
|
│ │ │ ├── page.tsx # [4.1] Experiments list
|
||||||
|
│ │ │ ├── new/page.tsx # [4.1] Creation wizard
|
||||||
|
│ │ │ └── [id]/
|
||||||
|
│ │ │ └── page.tsx # [4.2] Live dashboard
|
||||||
|
│ ├── lib/
|
||||||
|
│ │ └── experiments-client.ts # API client
|
||||||
|
│ └── components/
|
||||||
|
│ └── experiments/ # Bayesian charts, variant cards
|
||||||
|
```
|
||||||
|
|
||||||
|
**Commit Message Format:**
|
||||||
|
|
||||||
|
```
|
||||||
|
feat(ab-testing): <description> [<task.code>]
|
||||||
|
```
|
||||||
|
|
||||||
|
**Example:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git add services/platform-service/src/modules/ab-testing/
|
||||||
|
git commit -m "feat(ab-testing): add experiment types and cosmos containers [1.1]"
|
||||||
|
```
|
||||||
|
|
||||||
|
**Testing Requirements:**
|
||||||
|
|
||||||
|
- Unit tests: 25+ Vitest tests for bucketing, statistics, bandit algorithms
|
||||||
|
- Statistical validation: A/A tests, known distribution tests
|
||||||
|
- Integration: End-to-end experiment lifecycle
|
||||||
|
|
||||||
|
**Dependencies:**
|
||||||
|
|
||||||
|
- Feature flags module (reuse bucketing logic)
|
||||||
|
- Telemetry module (event tracking)
|
||||||
|
- Azure OpenAI (hypothesis generation)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Appendix B: Statistical Methods
|
||||||
|
|
||||||
|
### Bayesian A/B Testing
|
||||||
|
|
||||||
|
**Conversion Metrics (Beta-Binomial):**
|
||||||
|
|
||||||
|
```
|
||||||
|
Posterior: Beta(α + conversions, β + non-conversions)
|
||||||
|
Where α = β = 1 (uniform prior)
|
||||||
|
|
||||||
|
Probability variant beats control:
|
||||||
|
P(p_variant > p_control) = ∫₀¹ BetaPDF_variant(x) · BetaCDF_control(x) dx
|
||||||
|
```
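
Because α_variant = 1 + conversions is an integer, the probability that the variant beats the control also has an exact finite sum, which is handy for checking the Monte Carlo estimate in tests:

P(p_B > p_A) = Σ_{i=0}^{α_B − 1} B(α_A + i, β_A + β_B) / ((β_B + i) · B(1 + i, β_B) · B(α_A, β_A))

The sketch below works in log space to avoid overflow. `logGamma` uses the standard Lanczos approximation (g = 7), since JavaScript has no built-in log-gamma. The function names are hypothetical.

```typescript
const LANCZOS = [
  0.99999999999980993, 676.5203681218851, -1259.1392167224028, 771.32342877765313,
  -176.61502916214059, 12.507343278686905, -0.13857109526572012, 9.9843695780195716e-6,
  1.5056327351493116e-7,
];

/** ln Γ(x) for x > 0 via the Lanczos approximation (g = 7, 9 coefficients). */
export function logGamma(x: number): number {
  if (x < 0.5) return Math.log(Math.PI / Math.sin(Math.PI * x)) - logGamma(1 - x); // reflection formula
  const z = x - 1;
  let a = LANCZOS[0]!;
  const t = z + 7.5;
  for (let i = 1; i < LANCZOS.length; i++) a += LANCZOS[i]! / (z + i);
  return 0.5 * Math.log(2 * Math.PI) + (z + 0.5) * Math.log(t) - t + Math.log(a);
}

const logBeta = (a: number, b: number): number => logGamma(a) + logGamma(b) - logGamma(a + b);

/** Exact P(p_B > p_A) for independent Beta(aA, bA) and Beta(aB, bB); aB must be a positive integer. */
export function probBBeatsA(aA: number, bA: number, aB: number, bB: number): number {
  if (!Number.isInteger(aB) || aB < 1) throw new Error("aB must be a positive integer");
  let total = 0;
  for (let i = 0; i < aB; i++) {
    total += Math.exp(logBeta(aA + i, bA + bB) - Math.log(bB + i) - logBeta(1 + i, bB) - logBeta(aA, bA));
  }
  return total;
}
```

The loop runs α_B times, so the cost grows linearly with the variant's conversion count.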
|
||||||
|
|
||||||
|
**Continuous Metrics (Normal):**
|
||||||
|
|
||||||
|
```
|
||||||
|
Posterior: Normal(μ_n, σ_n²)
|
||||||
|
Where μ_n and σ_n² come from a conjugate Normal update (observed variance treated as known)
|
||||||
|
|
||||||
|
Probability variant beats control via Monte Carlo sampling
|
||||||
|
```
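
For continuous metrics, a common simplification is to treat the observed variance as known. The Normal prior is then conjugate, and because the difference of two independent Normals is Normal, P(variant > control) has a closed form that can replace or cross-check Monte Carlo. This sketch relies on that known-variance assumption and on a weak default prior. `normalCdf` uses the Abramowitz–Stegun 7.1.26 erf approximation (absolute error below about 1.5e-7).

```typescript
export interface NormalPosterior {
  mean: number;
  variance: number;
}

/** Known-variance Normal update: prior N(priorMean, priorVariance) combined with n observations. */
export function normalPosterior(
  n: number,
  sampleMean: number,
  sampleVariance: number,
  priorMean = 0,
  priorVariance = 1e6, // weak prior (assumption)
): NormalPosterior {
  if (n < 2 || sampleVariance <= 0) throw new Error("need n >= 2 and positive variance");
  const precision = 1 / priorVariance + n / sampleVariance;
  const mean = (priorMean / priorVariance + (n * sampleMean) / sampleVariance) / precision;
  return { mean, variance: 1 / precision };
}

/** erf via Abramowitz & Stegun 7.1.26. */
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  return sign * (1 - poly * Math.exp(-ax * ax));
}

export const normalCdf = (z: number): number => 0.5 * (1 + erf(z / Math.SQRT2));

/** P(variant > control) for independent Normal posteriors. */
export function probVariantBeatsControl(control: NormalPosterior, variant: NormalPosterior): number {
  return normalCdf((variant.mean - control.mean) / Math.sqrt(control.variance + variant.variance));
}
```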
|
||||||
|
|
||||||
|
### Thompson Sampling
|
||||||
|
|
||||||
|
```
|
||||||
|
For each incoming user:
|
||||||
|
For each variant:
|
||||||
|
Sample θ_i from variant's posterior distribution
|
||||||
|
Assign user to variant with max(θ_i)
|
||||||
|
|
||||||
|
Update variant's posterior after observing outcome
|
||||||
|
```
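
Below is a runnable version of the loop above for conversion metrics, assuming Beta(1, 1) priors and a seeded PRNG. It also derives the hourly re-balance from 2.3.1 by estimating each variant's probability of being best from repeated posterior draws. That is one common way to turn Thompson sampling into `currentAllocationPercent` values. The Beta sampler uses the standard Gamma-ratio construction with Marsaglia–Tsang Gamma draws, and all names are hypothetical.

```typescript
type Rng = () => number;

export function mulberry32(seed: number): Rng {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

/** Marsaglia–Tsang Gamma(shape, 1), valid for shape >= 1 (always true with a Beta(1, 1) prior). */
function gammaSample(shape: number, rng: Rng): number {
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  while (true) {
    const x = Math.sqrt(-2 * Math.log(1 - rng())) * Math.cos(2 * Math.PI * rng()); // Box–Muller normal
    const v = (1 + c * x) ** 3;
    if (v > 0 && Math.log(rng()) < 0.5 * x * x + d - d * v + d * Math.log(v)) return d * v;
  }
}

export interface BetaArm {
  id: string;
  alpha: number; // 1 + conversions
  beta: number; // 1 + non-conversions
}

function betaDraw(arm: BetaArm, rng: Rng): number {
  const x = gammaSample(arm.alpha, rng);
  return x / (x + gammaSample(arm.beta, rng));
}

/** One Thompson step: draw θ_i from each posterior and return the arm with the largest draw. */
export function thompsonChoose(arms: readonly BetaArm[], rng: Rng): string {
  if (arms.length === 0) throw new Error("no arms");
  let bestId = arms[0]!.id;
  let bestTheta = -1;
  for (const arm of arms) {
    const theta = betaDraw(arm, rng);
    if (theta > bestTheta) {
      bestTheta = theta;
      bestId = arm.id;
    }
  }
  return bestId;
}

/** Posterior update after observing one binary outcome. */
export function observe(arm: BetaArm, converted: boolean): BetaArm {
  return converted ? { ...arm, alpha: arm.alpha + 1 } : { ...arm, beta: arm.beta + 1 };
}

/** Hourly re-balance: allocation % = share of posterior draws in which each arm is best. */
export function allocationPercents(arms: readonly BetaArm[], rng: Rng, draws = 5_000): Record<string, number> {
  const wins: Record<string, number> = {};
  for (const a of arms) wins[a.id] = 0;
  for (let i = 0; i < draws; i++) {
    const id = thompsonChoose(arms, rng);
    wins[id] = (wins[id] ?? 0) + 1;
  }
  const pct: Record<string, number> = {};
  for (const a of arms) pct[a.id] = (100 * (wins[a.id] ?? 0)) / draws;
  return pct;
}
```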
|
||||||
|
|
||||||
|
### Early Stopping
|
||||||
|
|
||||||
|
```
|
||||||
|
Stop experiment when:
    days_running ≥ max_duration                        → Safety limit (always applies)
 OR samples_per_variant ≥ min_sample_size AND (
        max_variant P(beats control) > 0.95            → Winner found
     OR max_variant P(beats control) < 0.05            → Control wins (no variant beats it)
    )
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Appendix C: API Reference
|
||||||
|
|
||||||
|
| Method | Endpoint | Auth | Description |
|
||||||
|
| ------ | -------------------------------------- | -------- | -------------------------------- |
|
||||||
|
| POST | `/ab-testing/experiments` | Admin | Create experiment |
|
||||||
|
| GET | `/ab-testing/experiments` | Admin | List experiments |
|
||||||
|
| GET | `/ab-testing/experiments/:id` | Admin | Get experiment details |
|
||||||
|
| PATCH | `/ab-testing/experiments/:id` | Admin | Update experiment |
|
||||||
|
| DELETE | `/ab-testing/experiments/:id` | Admin | Stop/archive experiment |
|
||||||
|
| POST | `/ab-testing/experiments/:id/start` | Admin | Start experiment |
|
||||||
|
| POST | `/ab-testing/experiments/:id/pause` | Admin | Pause experiment |
|
||||||
|
| POST | `/ab-testing/experiments/:id/complete` | Admin | Complete with winner |
|
||||||
|
| POST | `/ab-testing/assign` | Any auth | Get variant assignment for user |
|
||||||
|
| POST | `/ab-testing/events` | Any auth | Track experiment event |
|
||||||
|
| GET | `/ab-testing/experiments/:id/results` | Admin | Get statistical results |
|
||||||
|
| GET | `/ab-testing/suggestions` | Admin | AI-generated experiment ideas |
|
||||||
|
| POST | `/ab-testing/hypotheses` | Admin | Generate hypothesis from pattern |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Appendix D: Integration Points
|
||||||
|
|
||||||
|
### With Feature Flags Module
|
||||||
|
|
||||||
|
- Experiments build on feature flag infrastructure
|
||||||
|
- Flag state = variant assignment
|
||||||
|
- Consistent bucketing with existing flags
|
||||||
|
|
||||||
|
### With Telemetry Module
|
||||||
|
|
||||||
|
- Experiment events enriched with telemetry context
|
||||||
|
- Automatic metric tracking from existing events
|
||||||
|
- Funnel analysis using telemetry breadcrumbs
|
||||||
|
|
||||||
|
### With Event Bus
|
||||||
|
|
||||||
|
| Event | Action |
|
||||||
|
| ----------------------------- | ----------------------------------- |
|
||||||
|
| `ab.experiment.started` | Notify stakeholders, log audit |
|
||||||
|
| `ab.experiment.completed` | Generate report, suggest follow-ups |
|
||||||
|
| `ab.variant.declared_winner` | Trigger auto-rollout if enabled |
|
||||||
|
| `ab.early_stopping.triggered` | Alert experiment owner |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Appendix E: Cost Estimation
|
||||||
|
|
||||||
|
| Component | Monthly Cost (est.) |
|
||||||
|
| ---------------------------- | ------------------------ |
|
||||||
|
| Cosmos DB (experiment data) | $100–200 |
|
||||||
|
| LLM hypothesis generation | $50–100 (weekly reports) |
|
||||||
|
| Compute (statistical engine) | ~$50 (small)             |
|
||||||
|
| **Total** | **$200–350/month** |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Current Status
|
||||||
|
|
||||||
|
- [ ] **Design complete** — Target: 2026-03-10
|
||||||
|
- [ ] **Phase 1: Core Engine** — Not started
|
||||||
|
- [ ] **Phase 2: Statistics** — Not started
|
||||||
|
- [ ] **Phase 3: AI Hypotheses** — Not started
|
||||||
|
- [ ] **Phase 4: Admin UI** — Not started
|
||||||
|
- [ ] **Phase 5: Advanced** — Future
|
||||||
|
|
||||||
|
**Estimated Timeline:** 2.5–3 weeks (Phases 1–4)
|
||||||
|
|
||||||
|
**Dependencies:**
|
||||||
|
|
||||||
|
- Feature flags module (for assignment infrastructure)
|
||||||
|
- Telemetry module (for event tracking)
|
||||||
|
- Azure OpenAI (for hypothesis generation)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
_Last Updated: 2026-03-03_
|
||||||
848
docs/roadmaps/PREDICTIVE_CHURN_HEALTH_SCORING_ROADMAP.md
Normal file
@ -0,0 +1,848 @@
|
|||||||
|
# Predictive Churn & Health Scoring — Implementation Roadmap
|
||||||
|
|
||||||
|
> **Module:** `platform-service/src/modules/predictive-analytics/`
|
||||||
|
> **Admin UI:** `/ops/health-dashboard/`
|
||||||
|
> **Target:** ML-powered churn prediction, health scoring, and proactive retention
|
||||||
|
> **Estimated Effort:** 3 weeks
|
||||||
|
> **Status:** 🟡 Planning
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Executive Summary
|
||||||
|
|
||||||
|
This roadmap delivers a **predictive analytics platform** that forecasts user churn 7–30 days in advance and computes product health scores from telemetry. Unlike reactive dashboards that show what happened, this system **predicts what will happen**—enabling proactive retention campaigns, resource allocation, and product improvements before users leave.
|
||||||
|
|
||||||
|
### Key Differentiators vs. Reactive Analytics
|
||||||
|
|
||||||
|
| Capability | Traditional Analytics | Predictive Churn & Health |
|
||||||
|
| ------------------- | -------------------------- | --------------------------------- |
|
||||||
|
| Insight Type | Historical (what happened) | **Predictive (what will happen)** |
|
||||||
|
| Churn Detection | After user leaves | **7–30 days before churn** |
|
||||||
|
| Health View | Current snapshot only | **Trending + forecasted** |
|
||||||
|
| Interventions | Reactive recovery | **Proactive prevention** |
|
||||||
|
| Product Insights | Manual pattern search | **Auto-detected risk signals** |
|
||||||
|
| Resource Allocation | Guesswork | **Risk-weighted prioritization** |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 1: Feature Engineering Pipeline (Week 1)
|
||||||
|
|
||||||
|
### 1.1 Telemetry Feature Extraction
|
||||||
|
|
||||||
|
- [ ] **1.1.1** Create `modules/predictive-analytics/feature-extractor.ts`
|
||||||
|
- [ ] User behavior features (session frequency, depth, recency)
|
||||||
|
- [ ] Engagement features (feature usage diversity, core action completion)
|
||||||
|
- [ ] Performance features (error rate, latency exposure, crash frequency)
|
||||||
|
- [ ] Social features (sharing, collaboration, network effects)
|
||||||
|
- [ ] Revenue features (payment history, plan changes, support tickets)
|
||||||
|
- [ ] **1.1.2** Time-window aggregations
|
||||||
|
- [ ] Last 24 hours (recent behavior)
|
||||||
|
- [ ] Last 7 days (weekly patterns)
|
||||||
|
- [ ] Last 30 days (monthly trends)
|
||||||
|
- [ ] Life-to-date (all-time totals)
|
||||||
|
- [ ] **1.1.3** Rolling window features
|
||||||
|
- [ ] 7-day rolling average (trend smoothing)
|
||||||
|
- [ ] Week-over-week change (acceleration)
|
||||||
|
- [ ] Cohort-normalized scores (vs. similar users)
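
The rolling-window features in 1.1.3 are straightforward to compute over a dense array of daily values. This sketch assumes one entry per day, oldest first, with zero-fill for inactive days. That input shape and the function names are assumptions.

```typescript
/** Trailing mean over the last `window` days (fewer if the history is shorter). */
export function rollingMean(daily: readonly number[], window = 7): number {
  if (daily.length === 0) return 0;
  const slice = daily.slice(-window);
  return slice.reduce((s, x) => s + x, 0) / slice.length;
}

/** Relative week-over-week change: (last 7 days - previous 7) / previous 7; null when undefined. */
export function weekOverWeekChange(daily: readonly number[]): number | null {
  if (daily.length < 14) return null;
  const sum = (xs: readonly number[]): number => xs.reduce((s, x) => s + x, 0);
  const last = sum(daily.slice(-7));
  const prev = sum(daily.slice(-14, -7));
  return prev === 0 ? null : (last - prev) / prev;
}

/** Cohort-normalized score: z-score of a user's value within their cohort. */
export function cohortZScore(value: number, cohort: readonly number[]): number {
  const n = cohort.length;
  if (n < 2) return 0;
  const mean = cohort.reduce((s, x) => s + x, 0) / n;
  const sd = Math.sqrt(cohort.reduce((s, x) => s + (x - mean) ** 2, 0) / (n - 1));
  return sd === 0 ? 0 : (value - mean) / sd;
}
```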
|
||||||
|
|
||||||
|
### 1.2 Feature Store
|
||||||
|
|
||||||
|
- [ ] **1.2.1** Create `modules/predictive-analytics/feature-store.ts`
|
||||||
|
- [ ] `UserFeatureVector` — normalized feature values per user
|
||||||
|
- [ ] `ProductHealthMetrics` — aggregated product-level scores
|
||||||
|
- [ ] Feature versioning (track feature schema changes)
|
||||||
|
- [ ] **1.2.2** Add Cosmos containers to `cosmos-init.ts`
|
||||||
|
- [ ] `user_features` (pk: `/userId`, TTL: 90 days)
|
||||||
|
- [ ] `product_health` (hierarchical pk: `/productId` → `/date`, time-series)
|
||||||
|
- [ ] `feature_definitions` (pk: `/productId`, feature metadata)
|
||||||
|
- [ ] **1.2.3** Feature computation jobs
|
||||||
|
- [ ] Daily feature computation (nightly batch)
|
||||||
|
- [ ] Real-time feature updates (on key events)
|
||||||
|
- [ ] Feature backfill (compute historical features)
|
||||||
|
|
||||||
|
### 1.3 Product-Specific Feature Catalog
|
||||||
|
|
||||||
|
- [ ] **1.3.1** Define features per product
|
||||||
|
- [ ] **NomGap:** Fast completion rate, protocol adherence, streak length, autophagy engagement
|
||||||
|
- [ ] **JarvisJr:** Session frequency, agent diversity, voice/text ratio, skill progression
|
||||||
|
- [ ] **ChronoMind:** Timer completion rate, cascade effectiveness, routine adherence, urgency response
|
||||||
|
- [ ] **MindLyst:** Brain usage diversity, triage accuracy, memory capture frequency, reflection completion
|
||||||
|
- [ ] **PeakPulse:** Session frequency, goal completion, streak maintenance, social sharing
|
||||||
|
- [ ] **LysnrAI:** Dictation frequency, accuracy rate, hotkey usage, vocabulary growth
|
||||||
|
- [ ] **1.3.2** Feature importance tracking
|
||||||
|
- [ ] Which features correlate with churn/retention
|
||||||
|
- [ ] Feature drift detection (behavior changes over time)
|
||||||
|
- [ ] Auto-suggest new features based on patterns
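
For the feature drift detection in 1.3.2, one widely used option is the Population Stability Index (PSI) between a reference window and the current window of a feature. PSI is swapped in here as an example and is not specified by the roadmap. The sketch assumes both inputs are already bucketed into the same bins. The epsilon for empty bins and the common 0.1 / 0.25 rule-of-thumb thresholds are conventions, not requirements.

```typescript
/** PSI = Σ (actual_i - expected_i) · ln(actual_i / expected_i) over bin proportions. */
export function populationStabilityIndex(
  expectedCounts: readonly number[],
  actualCounts: readonly number[],
  epsilon = 1e-4,
): number {
  if (expectedCounts.length === 0 || expectedCounts.length !== actualCounts.length) {
    throw new Error("bins must be non-empty and aligned");
  }
  const eTotal = expectedCounts.reduce((s, x) => s + x, 0);
  const aTotal = actualCounts.reduce((s, x) => s + x, 0);
  if (eTotal <= 0 || aTotal <= 0) throw new Error("empty distribution");
  let psi = 0;
  for (let i = 0; i < expectedCounts.length; i++) {
    // Clamp empty bins so the log term stays finite.
    const e = Math.max(expectedCounts[i]! / eTotal, epsilon);
    const a = Math.max(actualCounts[i]! / aTotal, epsilon);
    psi += (a - e) * Math.log(a / e);
  }
  return psi;
}

export function driftLevel(psi: number): "stable" | "moderate" | "significant" {
  return psi < 0.1 ? "stable" : psi < 0.25 ? "moderate" : "significant";
}
```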
|
||||||
|
|
||||||
|
**Phase 1 Exit Criteria:**
|
||||||
|
|
||||||
|
- [ ] 50+ features extracted per product
|
||||||
|
- [ ] Feature store populated for all active users
|
||||||
|
- [ ] Daily feature computation job running
|
||||||
|
- [ ] Feature importance analysis completed
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 2: Churn Prediction Model (Week 1–2)
|
||||||
|
|
||||||
|
### 2.1 Model Architecture
|
||||||
|
|
||||||
|
- [ ] **2.1.1** Create `modules/predictive-analytics/churn-model.ts`
|
||||||
|
- [ ] Binary classification per horizon (churn within 7 days; churn within 30 days)
|
||||||
|
- [ ] Gradient Boosted Trees (XGBoost/LightGBM) baseline
|
||||||
|
- [ ] Neural network ensemble (for comparison)
|
||||||
|
- [ ] **2.1.2** Training pipeline
|
||||||
|
- [ ] Label definition: No activity for N days = churned
|
||||||
|
- [ ] Train/validation/test split (time-based, not random)
|
||||||
|
- [ ] Cross-validation with temporal folds
|
||||||
|
- [ ] Hyperparameter tuning (Optuna / Ray Tune)
|
||||||
|
- [ ] **2.1.3** Model evaluation
|
||||||
|
- [ ] ROC-AUC (discrimination ability)
|
||||||
|
- [ ] Precision/Recall at different thresholds
|
||||||
|
- [ ] Calibration (predicted prob vs. actual rate)
|
||||||
|
- [ ] Per-product performance breakdown
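
Three pieces of 2.1.2–2.1.3 are easy to get subtly wrong: the churn label, the time-based split (a random split leaks future behavior into training), and ROC-AUC. Here is a sketch of all three. `inactivityDays` stands in for the roadmap's "N days". Snapshots should only be labeled once N days of follow-up data exist. The pairwise AUC is O(positives × negatives), which is fine for evaluation sets, while a rank-based formula scales better.

```typescript
const MS_PER_DAY = 86_400_000;

/** Churn label: no activity within `inactivityDays` after the snapshot. Assumes the follow-up window is fully observed. */
export function churnLabel(snapshot: Date, nextActivity: Date | null, inactivityDays: number): boolean {
  if (nextActivity === null) return true;
  return nextActivity.getTime() - snapshot.getTime() > inactivityDays * MS_PER_DAY;
}

/** Time-based split on ISO dates (YYYY-MM-DD compares correctly as strings). */
export function timeSplit<T extends { snapshotDate: string }>(rows: readonly T[], cutoff: string): { train: T[]; test: T[] } {
  const train: T[] = [];
  const test: T[] = [];
  for (const r of rows) {
    if (r.snapshotDate < cutoff) train.push(r);
    else test.push(r);
  }
  return { train, test };
}

/** ROC-AUC = P(score of a random churner > score of a random retained user); ties count 0.5. */
export function rocAuc(scores: readonly number[], labels: readonly boolean[]): number {
  const pos: number[] = [];
  const neg: number[] = [];
  scores.forEach((s, i) => {
    if (labels[i]) pos.push(s);
    else neg.push(s);
  });
  if (pos.length === 0 || neg.length === 0) throw new Error("need both classes");
  let wins = 0;
  for (const p of pos) for (const n of neg) wins += p > n ? 1 : p === n ? 0.5 : 0;
  return wins / (pos.length * neg.length);
}
```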
|
||||||
|
|
||||||
|
### 2.2 Prediction Service
|
||||||
|
|
||||||
|
- [ ] **2.2.1** Real-time scoring API
|
||||||
|
- [ ] `POST /predictive/churn-score` — single user prediction
|
||||||
|
- [ ] `POST /predictive/churn-batch` — batch scoring
|
||||||
|
- [ ] Latency < 100ms for single prediction
|
||||||
|
- [ ] **2.2.2** Risk segmentation
|
||||||
|
- [ ] Risk buckets: Critical (>80%), High (60–80%), Medium (30–60%), Low (<30%)
|
||||||
|
- [ ] Risk score components (which features drive the score)
|
||||||
|
- [ ] Confidence intervals on predictions
|
||||||
|
- [ ] **2.2.3** Model versioning
|
||||||
|
- [ ] A/B test model versions
|
||||||
|
- [ ] Shadow mode (predict without acting)
|
||||||
|
- [ ] Rollback capability
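
The risk buckets in 2.2.2 leave the boundary values open. This sketch assigns each boundary to the higher bucket, except 80%, which stays High because Critical is written as ">80%". That reading is an assumption. The `prioritize` helper is a hypothetical take on the "who to contact first" ranking in 2.3.3.

```typescript
export type RiskBucket = "critical" | "high" | "medium" | "low";

export function riskBucket(churnProbability: number): RiskBucket {
  if (Number.isNaN(churnProbability) || churnProbability < 0 || churnProbability > 1) {
    throw new RangeError("probability must be in [0, 1]");
  }
  if (churnProbability > 0.8) return "critical";
  if (churnProbability >= 0.6) return "high";
  if (churnProbability >= 0.3) return "medium";
  return "low";
}

/** Outreach order: highest risk first, ties broken by userId for stable output. */
export function prioritize<T extends { userId: string; churnProbability: number }>(users: readonly T[]): T[] {
  return [...users].sort((a, b) => b.churnProbability - a.churnProbability || a.userId.localeCompare(b.userId));
}
```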
|
||||||
|
|
||||||
|
### 2.3 Explanation Engine

- [ ] **2.3.1** SHAP value computation
  - [ ] Feature contributions to each prediction
  - [ ] Global feature importance (what drives churn overall)
  - [ ] Local explanations (why this specific user is at risk)
- [ ] **2.3.2** Natural language explanations

  ```
  "This user shows 78% churn risk because:
  - Session frequency dropped 60% in the last week
  - No core feature usage in 5 days
  - Error rate increased 3x vs. their baseline
  - Similar users who showed these patterns had 85% churn rate"
  ```

- [ ] **2.3.3** Actionable insight extraction
  - [ ] Top 3 risk factors per user
  - [ ] Suggested intervention based on risk profile
  - [ ] Priority ranking (who to contact first)

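If per-feature SHAP values are already computed (for example, by the training job), the 2.3.2 and 2.3.3 steps reduce to ranking contributions and filling a template. This sketch is template-based rather than LLM-generated. `FEATURE_LABELS`, `topRiskFactors` and `explain` are illustrative names only.

```typescript
interface FeatureContribution {
  feature: string;
  contribution: number; // SHAP value; positive pushes churn risk up
}

// Hypothetical display names for feature keys from UserFeatureVectorDoc.
const FEATURE_LABELS: Record<string, string> = {
  sessionsLast7Days: 'Sessions in the last 7 days',
  daysSinceLastCoreAction: 'Days since last core action',
  errorRateLast7Days: 'Error rate in the last 7 days',
};

// Top-k factors by absolute SHAP value, tagged with their direction.
function topRiskFactors(
  contributions: FeatureContribution[],
  k = 3,
): Array<FeatureContribution & { direction: 'positive' | 'negative' }> {
  return [...contributions]
    .sort((a, b) => Math.abs(b.contribution) - Math.abs(a.contribution))
    .slice(0, k)
    .map((c) => {
      const direction: 'positive' | 'negative' = c.contribution >= 0 ? 'positive' : 'negative';
      return { ...c, direction };
    });
}

// Template explanation listing only the top factors that increase risk.
function explain(churnProbability: number, contributions: FeatureContribution[]): string {
  const pct = Math.round(churnProbability * 100);
  const drivers = topRiskFactors(contributions).filter((f) => f.direction === 'positive');
  if (drivers.length === 0) {
    return `This user shows ${pct}% churn risk; no single factor raises it notably.`;
  }
  const bullets = drivers.map(
    (f) => `- ${FEATURE_LABELS[f.feature] ?? f.feature} (SHAP +${f.contribution.toFixed(2)})`,
  );
  return [`This user shows ${pct}% churn risk because:`, ...bullets].join('\n');
}
```

A template keeps explanations deterministic and cheap to generate for every prediction. An LLM could rephrase the same facts later without being trusted to produce them.
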
**Phase 2 Exit Criteria:**

- [ ] Model achieves ROC-AUC > 0.75 on the time-based test set
- [ ] Real-time scoring API < 100ms latency
- [ ] Explanations generated for all predictions
- [ ] Risk segmentation validated against historical churn

---

## Phase 3: Product Health Scoring (Week 2)

### 3.1 Health Metric Framework

- [ ] **3.1.1** Create `modules/predictive-analytics/health-scoring.ts`
  - [ ] Health dimensions: Acquisition, Activation, Retention, Engagement, Revenue, Stability
  - [ ] Composite health score (weighted average of dimension scores)
  - [ ] Per-dimension scores with drill-down
- [ ] **3.1.2** Product health indicators
  - [ ] Daily Active Users (DAU) trend
  - [ ] New user activation rate (Day-1, Day-7)
  - [ ] Cohort retention curves (Day-1, Day-7, Day-30)
  - [ ] Feature adoption rates (new feature uptake)
  - [ ] Error rates and stability scores
  - [ ] Support ticket volume and sentiment
  - [ ] Revenue metrics (MRR, ARPU, LTV)

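The roadmap describes the 3.1.1 composite score only as a weighted average. Here is a minimal sketch over the six dimensions in `ProductHealthScoreDoc`. The weights are placeholders, not product decisions. Missing dimensions are skipped and the remaining weights are renormalized.

```typescript
type HealthDimension = 'acquisition' | 'activation' | 'retention' | 'engagement' | 'revenue' | 'stability';

// Placeholder weights (assumption); they sum to 1.
const DEFAULT_WEIGHTS: Record<HealthDimension, number> = {
  acquisition: 0.15,
  activation: 0.15,
  retention: 0.25,
  engagement: 0.2,
  revenue: 0.15,
  stability: 0.1,
};

// Weighted average of 0–100 dimension scores; absent dimensions are skipped
// and the remaining weights renormalized so the result stays on 0–100.
function compositeHealthScore(
  scores: Partial<Record<HealthDimension, number>>,
  weights: Record<HealthDimension, number> = DEFAULT_WEIGHTS,
): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const dim of Object.keys(weights) as HealthDimension[]) {
    const score = scores[dim];
    if (score === undefined) continue;
    weighted += weights[dim] * score;
    totalWeight += weights[dim];
  }
  if (totalWeight === 0) throw new Error('no dimension scores provided');
  return weighted / totalWeight;
}
```
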
### 3.2 Health Score Computation

- [ ] **3.2.1** Baseline establishment
  - [ ] Historical 90-day baseline for each metric
  - [ ] Peer product comparison (ChronoMind vs. JarvisJr benchmarks)
  - [ ] Industry benchmarks (if available)
- [ ] **3.2.2** Scoring algorithm
  - [ ] Z-score normalization (how many standard deviations from baseline)
  - [ ] Trend direction (improving vs. declining)
  - [ ] Volatility adjustment (consistent vs. erratic)
  - [ ] 0–100 health score scale
- [ ] **3.2.3** Alert thresholds
  - [ ] Critical: Score < 60 or 20% drop from baseline
  - [ ] Warning: Score 60–75 or 10% drop
  - [ ] Healthy: Score > 75 and stable

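One way to realize 3.2.2 and 3.2.3 is sketched below, assuming a 90-day daily baseline per metric. The steps are: compute a z-score against the baseline, turn declines into a 0–100 score, then apply the alert thresholds. The penalty of 20 points per standard deviation below baseline is a hypothetical choice, picked so that a decline of more than two standard deviations lands below the critical boundary of 60. Trend and volatility adjustments are omitted.

```typescript
type HealthStatus = 'critical' | 'warning' | 'healthy';

// Z-score of a value against a baseline window (sample standard deviation).
function zScore(value: number, baseline: number[]): number {
  const n = baseline.length;
  if (n < 2) throw new Error('need at least 2 baseline points');
  const mean = baseline.reduce((a, b) => a + b, 0) / n;
  const variance = baseline.reduce((acc, x) => acc + (x - mean) ** 2, 0) / (n - 1);
  const std = Math.sqrt(variance);
  if (std === 0) {
    // Flat baseline: any change is an unbounded deviation.
    return value === mean ? 0 : value > mean ? Infinity : -Infinity;
  }
  return (value - mean) / std;
}

// 0–100 metric score: 100 at or above baseline, minus 20 points per standard
// deviation of decline (hypothetical mapping). Pass higherIsBetter = false for
// metrics such as error rate.
function metricScore(value: number, baseline: number[], higherIsBetter = true): number {
  const z = zScore(value, baseline);
  const signed = higherIsBetter ? z : -z; // negative = worse than baseline
  const penalty = 20 * Math.max(0, -signed);
  return Math.max(0, 100 - penalty);
}

// Alert thresholds from 3.2.3; `changeVsBaseline` is a fraction (-0.2 = 20% drop).
function healthStatus(score: number, changeVsBaseline: number): HealthStatus {
  if (score < 60 || changeVsBaseline <= -0.2) return 'critical';
  if (score <= 75 || changeVsBaseline <= -0.1) return 'warning';
  return 'healthy';
}
```
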
### 3.3 Anomaly Detection

- [ ] **3.3.1** Statistical anomaly detection
  - [ ] Prophet/ARIMA for time-series forecasting
  - [ ] Forecast vs. actual deviation detection
  - [ ] Seasonal pattern recognition (day-of-week, monthly)
- [ ] **3.3.2** Multi-dimensional anomaly detection
  - [ ] Correlation breakdown detection (normally correlated metrics start to diverge)
  - [ ] Cohort-specific anomalies (specific region, platform, segment)
- [ ] **3.3.3** Root cause suggestion
  - [ ] Correlation with deployments/releases
  - [ ] Error spike correlation
  - [ ] External factor detection (holidays, events)

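The sketch below stands in for Prophet/ARIMA with a seasonal-naive forecast: the mean of the same weekday over the previous four weeks. It shows only the forecast-vs-actual step from 3.3.1 and is deliberately simpler than the planned models. The 20% and 40% severity thresholds are assumptions. The output shape mirrors the `anomalies` entries in `ProductHealthScoreDoc`.

```typescript
interface Anomaly {
  metric: string;
  expectedValue: number;
  actualValue: number;
  deviationPercent: number;
  severity: 'critical' | 'warning';
}

// Seasonal-naive forecast: mean of the same weekday over the last `weeks` weeks.
// `history` is a daily series whose last element is the day before the target day.
function seasonalNaiveForecast(history: number[], weeks = 4): number {
  const period = 7;
  const samples: number[] = [];
  for (let k = 1; k <= weeks; k++) {
    const idx = history.length - k * period;
    if (idx < 0) break;
    samples.push(history[idx]);
  }
  if (samples.length === 0) throw new Error('need at least 7 days of history');
  return samples.reduce((a, b) => a + b, 0) / samples.length;
}

// Flag the target day if it deviates from the forecast by >= warnPct percent.
function detectAnomaly(
  metric: string,
  history: number[],
  actual: number,
  warnPct = 20,
  criticalPct = 40,
): Anomaly | null {
  const expected = seasonalNaiveForecast(history);
  if (expected === 0) return null; // percent deviation undefined; skipped here
  const deviationPercent = ((actual - expected) / Math.abs(expected)) * 100;
  const magnitude = Math.abs(deviationPercent);
  if (magnitude < warnPct) return null;
  return {
    metric,
    expectedValue: expected,
    actualValue: actual,
    deviationPercent,
    severity: magnitude >= criticalPct ? 'critical' : 'warning',
  };
}
```

Whatever forecaster is used, the thresholds would need tuning against historical data to meet the < 5% false positive exit criterion.
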
**Phase 3 Exit Criteria:**

- [ ] Health scores computed daily for all products
- [ ] Anomaly detection with < 5% false positive rate
- [ ] Historical baseline established for all metrics

---

## Phase 4: Proactive Intervention System (Week 2–3)

### 4.1 Retention Campaign Automation

- [ ] **4.1.1** Campaign trigger rules
  - [ ] High-risk user enters segment → trigger email
  - [ ] Medium-risk + specific behavior → trigger in-app message
  - [ ] Critical risk → trigger personal outreach task
- [ ] **4.1.2** Personalized messaging
  - [ ] Message variant based on risk factors
  - [ ] Feature recommendations based on unused capabilities
  - [ ] Success stories from similar users
- [ ] **4.1.3** Campaign effectiveness tracking
  - [ ] Control group vs. treatment
  - [ ] Churn rate comparison
  - [ ] Revenue impact measurement

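The next sketch covers 4.1.1 rule evaluation and the 4.1.3 control group. It assumes the `conditions` shape from `RetentionCampaignDoc` and treats the conditions as an AND. Holdout assignment hashes `campaignId:userId` with 32-bit FNV-1a (over UTF-16 code units), so a user stays in the same group every time the campaign is evaluated. The function names and the 10% default holdout are hypothetical.

```typescript
type Operator = 'gt' | 'lt' | 'eq' | 'in';

interface Condition {
  field: string;
  operator: Operator;
  value: unknown;
}

// Evaluate trigger conditions against a user's current attributes.
// AND semantics across conditions is an assumption.
function matchesConditions(user: Record<string, unknown>, conditions: Condition[]): boolean {
  return conditions.every(({ field, operator, value }) => {
    const actual = user[field];
    switch (operator) {
      case 'gt':
        return typeof actual === 'number' && typeof value === 'number' && actual > value;
      case 'lt':
        return typeof actual === 'number' && typeof value === 'number' && actual < value;
      case 'eq':
        return actual === value;
      case 'in':
        return Array.isArray(value) && value.includes(actual);
      default:
        return false;
    }
  });
}

// 32-bit FNV-1a over UTF-16 code units.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Deterministic holdout: the same user always lands in the same group for a
// given campaign, keeping control vs. treatment churn rates comparable.
function isControlGroup(campaignId: string, userId: string, controlFraction = 0.1): boolean {
  return fnv1a32(`${campaignId}:${userId}`) / 2 ** 32 < controlFraction;
}
```
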
### 4.2 Auto-Trigger Flows

- [ ] **4.2.1** Platform integrations
  - [ ] Email delivery via existing `modules/delivery/`
  - [ ] Push notifications via `modules/notifications/`
  - [ ] Slack notifications for CS team
  - [ ] CRM integration (create outreach tasks)
- [ ] **4.2.2** Smart scheduling
  - [ ] Optimal contact time prediction
  - [ ] Frequency capping (don't spam)
  - [ ] Multi-channel orchestration
- [ ] **4.2.3** Feedback loop
  - [ ] Track intervention outcomes
  - [ ] Retrain model with intervention effectiveness
  - [ ] A/B test intervention strategies

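Frequency capping (4.2.2) can be sketched as a check against a user's recent contact history. The check uses the `excludeRecentContact` hours from `RetentionCampaignDoc` plus a hypothetical weekly cap of three contacts. `slack_cs` messages go to the CS team rather than the user, so this sketch does not count them.

```typescript
type Channel = 'email' | 'push' | 'in_app' | 'slack_cs';

interface ContactRecord {
  userId: string;
  channel: Channel;
  sentAt: Date;
}

const HOUR_MS = 60 * 60 * 1000;

// Returns false if the user was contacted within `excludeRecentContactHours`,
// or already received `maxPerWeek` contacts in the trailing 7 days.
function canContact(
  history: ContactRecord[],
  userId: string,
  now: Date,
  excludeRecentContactHours: number,
  maxPerWeek = 3,
): boolean {
  const nowMs = now.getTime();
  const recentCutoff = nowMs - excludeRecentContactHours * HOUR_MS;
  const weekCutoff = nowMs - 7 * 24 * HOUR_MS;
  let contactsThisWeek = 0;
  for (const c of history) {
    if (c.userId !== userId || c.channel === 'slack_cs') continue;
    const t = c.sentAt.getTime();
    if (t > nowMs) continue; // scheduled, not yet sent
    if (t >= recentCutoff) return false;
    if (t >= weekCutoff) contactsThisWeek += 1;
  }
  return contactsThisWeek < maxPerWeek;
}
```
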
### 4.3 Risk Dashboard for CS Team

- [ ] **4.3.1** At-risk user list
  - [ ] Sortable by churn probability
  - [ ] Filter by product, segment, risk factors
  - [ ] Last activity preview
- [ ] **4.3.2** User risk profile
  - [ ] Churn probability trend over time
  - [ ] Key risk factors highlighted
  - [ ] Recommended actions
  - [ ] User activity timeline
- [ ] **4.3.3** Intervention tracking
  - [ ] Contact history
  - [ ] Response tracking
  - [ ] Outcome recording (retained/churned)

**Phase 4 Exit Criteria:**

- [ ] Automated campaigns triggered for high-risk users
- [ ] CS team dashboard with at-risk user queue
- [ ] Intervention effectiveness measurement in place
- [ ] Feedback loop improving model accuracy

---

## Phase 5: Admin Dashboard UI (Week 3)

### 5.1 Product Health Overview

- [ ] **5.1.1** Create `/ops/health-dashboard/page.tsx`
  - [ ] Health score cards for each product
  - [ ] Trend sparklines (7-day, 30-day)
  - [ ] Alert summary (critical issues count)
  - [ ] Product comparison table
- [ ] **5.1.2** Health detail view
  - [ ] Dimension breakdown (acquisition, activation, retention, etc.)
  - [ ] Metric time-series charts
  - [ ] Anomaly markers on charts
  - [ ] Cohort retention curves

### 5.2 Churn Prediction Dashboard

- [ ] **5.2.1** Churn risk overview
  - [ ] Risk distribution pie chart
  - [ ] At-risk user count by product
  - [ ] Predicted churn impact (revenue at risk)
  - [ ] Model performance metrics (AUC, calibration)
- [ ] **5.2.2** User risk explorer
  - [ ] Search/filter at-risk users
  - [ ] Risk score with explanation
  - [ ] Top risk factors
  - [ ] Recommended interventions
- [ ] **5.2.3** Model insights
  - [ ] Global feature importance chart
  - [ ] Model performance over time
  - [ ] Feature drift alerts

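The roadmap does not name a drift metric for the 5.2.3 feature drift alerts. One common choice, swapped in here as an assumption, is the population stability index (PSI). PSI compares a feature's current distribution with its training-time distribution, using bins cut at the baseline's quantiles. A frequently cited rule of thumb reads PSI below 0.1 as stable, 0.1–0.25 as a moderate shift, and above 0.25 as a significant shift.

```typescript
// Interior bin edges at the baseline's quantiles (bins - 1 edges).
function quantileEdges(values: number[], bins: number): number[] {
  const sorted = [...values].sort((a, b) => a - b);
  const edges: number[] = [];
  for (let i = 1; i < bins; i++) {
    edges.push(sorted[Math.floor((i * sorted.length) / bins)]);
  }
  return edges;
}

// Fraction of values falling in each bin defined by `edges`.
function binProportions(values: number[], edges: number[]): number[] {
  const counts = new Array<number>(edges.length + 1).fill(0);
  for (const v of values) {
    let b = 0;
    while (b < edges.length && v >= edges[b]) b++;
    counts[b]++;
  }
  return counts.map((c) => c / values.length);
}

// PSI = sum over bins of (actual - expected) * ln(actual / expected).
function populationStabilityIndex(baseline: number[], current: number[], bins = 10): number {
  if (baseline.length === 0 || current.length === 0) throw new Error('empty sample');
  const edges = quantileEdges(baseline, bins);
  const expected = binProportions(baseline, edges);
  const actual = binProportions(current, edges);
  const eps = 1e-6; // avoids log(0) for empty bins
  let psi = 0;
  for (let i = 0; i < expected.length; i++) {
    const e = Math.max(expected[i], eps);
    const a = Math.max(actual[i], eps);
    psi += (a - e) * Math.log(a / e);
  }
  return psi;
}
```
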
### 5.3 Campaign Management

- [ ] **5.3.1** Campaign list
  - [ ] Active/paused/completed campaigns
  - [ ] Trigger rules summary
  - [ ] Performance stats (sent, opened, converted)
- [ ] **5.3.2** Campaign editor
  - [ ] Trigger condition builder
  - [ ] Message template editor
  - [ ] Audience targeting
  - [ ] A/B test configuration
- [ ] **5.3.3** Campaign analytics
  - [ ] Funnel: triggered → sent → opened → retained
  - [ ] Revenue impact
  - [ ] Comparison to control group

**Phase 5 Exit Criteria:**

- [ ] Health dashboard shows all products with trends
- [ ] Churn predictions visible with explanations
- [ ] Campaign creation and management functional
- [ ] Full test coverage

---

## Phase 6: Advanced Capabilities (Future)

### 6.1 Cohort-Specific Models

- [ ] Segment-specific churn models (iOS vs. Android, free vs. pro)
- [ ] Regional models (different behaviors by geography)
- [ ] Temporal models (seasonal churn patterns)

### 6.2 LTV Prediction

- [ ] Predict lifetime value at signup
- [ ] Predict upgrade probability (free → pro)
- [ ] Optimize acquisition channels by predicted LTV

### 6.3 Product Recommendations

- [ ] Suggest features to at-risk users based on successful cohorts
- [ ] Personalized onboarding based on predicted needs
- [ ] Next-best-action recommendations

---

## Appendix A: Data Models

### UserChurnPredictionDoc

```typescript
interface UserChurnPredictionDoc {
  id: string; // cp_<uuid>
  userId: string; // partition key
  productId: string;

  // Prediction
  predictionHorizon: 7 | 14 | 30; // Days
  churnProbability: number; // 0–1
  riskSegment: 'critical' | 'high' | 'medium' | 'low';

  // Feature vector snapshot
  features: Record<string, number>; // Normalized feature values
  featureVersion: string; // Schema version

  // Model info
  modelVersion: string;
  modelType: 'xgboost' | 'neural';
  predictionTimestamp: string;

  // Explanation (SHAP values)
  explanation: {
    topRiskFactors: Array<{
      feature: string;
      contribution: number; // SHAP value
      direction: 'positive' | 'negative'; // 'positive' increases churn risk, 'negative' decreases it
    }>;
    globalFeatureImportance: Array<{
      feature: string;
      importance: number;
    }>;
  };

  // Natural language summary
  nlExplanation: string; // Auto-generated explanation

  // Intervention
  suggestedActions: string[];
  interventionHistory: Array<{
    action: string;
    timestamp: string;
    outcome?: 'responded' | 'ignored' | 'churned' | 'retained';
  }>;

  // Validation (ground truth)
  actualChurned?: boolean;
  validationDate?: string;

  createdAt: string;
  ttl: number; // Cosmos DB TTL in seconds: predictionHorizon + 90 days
}
```

### ProductHealthScoreDoc

```typescript
interface ProductHealthScoreDoc {
  id: string; // ph_<uuid>
  productId: string; // partition key
  date: string; // YYYY-MM-DD (one score document per product per day)

  // Composite score
  overallHealthScore: number; // 0–100
  healthStatus: 'critical' | 'warning' | 'healthy';

  // Dimension scores
  dimensions: {
    acquisition: {
      score: number; // 0–100
      metrics: {
        newUsers: number;
        activationRateDay1: number;
        activationRateDay7: number;
        cac: number;
      };
      trend: 'improving' | 'stable' | 'declining';
    };
    activation: {
      score: number;
      metrics: {
        firstValueMomentRate: number;
        timeToFirstAction: number;
        onboardingCompletionRate: number;
      };
      trend: 'improving' | 'stable' | 'declining';
    };
    retention: {
      score: number;
      metrics: {
        dau: number;
        mau: number;
        dauMauRatio: number;
        day7Retention: number;
        day30Retention: number;
      };
      trend: 'improving' | 'stable' | 'declining';
    };
    engagement: {
      score: number;
      metrics: {
        avgSessionLength: number;
        sessionsPerUser: number;
        featureAdoption: Record<string, number>;
      };
      trend: 'improving' | 'stable' | 'declining';
    };
    revenue: {
      score: number;
      metrics: {
        mrr: number;
        arpu: number;
        churnRate: number;
        upgradeRate: number;
      };
      trend: 'improving' | 'stable' | 'declining';
    };
    stability: {
      score: number;
      metrics: {
        crashFreeRate: number;
        errorRate: number;
        avgLatency: number;
        uptimePercent: number;
      };
      trend: 'improving' | 'stable' | 'declining';
    };
  };

  // Anomalies detected
  anomalies: Array<{
    metric: string;
    expectedValue: number;
    actualValue: number;
    deviationPercent: number;
    severity: 'critical' | 'warning';
    suggestedCause?: string;
  }>;

  // Forecasts
  forecasts: {
    next7Days: {
      expectedHealthScore: number;
      confidenceInterval: [number, number];
    };
    next30Days: {
      expectedHealthScore: number;
      confidenceInterval: [number, number];
    };
  };

  // Benchmarks
  vsBaseline7Day: number; // % change vs. 7-day baseline
  vsBaseline30Day: number; // % change vs. 30-day baseline

  createdAt: string;
  ttl: number; // Cosmos DB TTL in seconds (2 years)
}
```

### RetentionCampaignDoc

```typescript
interface RetentionCampaignDoc {
  id: string; // rc_<uuid>
  productId: string; // partition key

  // Campaign definition
  name: string;
  description: string;
  status: 'draft' | 'active' | 'paused' | 'completed';

  // Trigger conditions
  trigger: {
    type: 'churn_risk' | 'health_score_drop' | 'behavioral' | 'scheduled';
    conditions: Array<{
      field: string;
      operator: 'gt' | 'lt' | 'eq' | 'in';
      value: unknown;
    }>;
  };

  // Audience
  audience: {
    riskSegments?: string[]; // 'critical', 'high', etc.
    products?: string[];
    userSegments?: string[];
    excludeRecentContact?: number; // Hours (frequency capping)
  };

  // Message content
  messages: Array<{
    channel: 'email' | 'push' | 'in_app' | 'slack_cs';
    templateId: string;
    variant?: string; // A/B test variant
    delayHours?: number; // Delay after trigger
    conditions?: Array<{
      field: string;
      operator: 'gt' | 'lt' | 'eq' | 'in'; // Same operators as trigger conditions
      value: unknown;
    }>;
  }>;

  // Performance tracking
  stats: {
    triggered: number;
    sent: number;
    opened: number;
    clicked: number;
    converted: number;
    controlGroupSize: number;
    controlChurnRate: number;
    treatmentChurnRate: number;
  };

  createdAt: string;
  updatedAt: string;
  ttl: number; // Cosmos DB TTL in seconds (1 year after completion)
}
```

### UserFeatureVectorDoc

```typescript
interface UserFeatureVectorDoc {
  id: string; // fv_<uuid>
  userId: string; // partition key
  productId: string;

  // Computed features
  features: {
    // Recency features
    daysSinceLastSession: number;
    daysSinceLastCoreAction: number;

    // Frequency features
    sessionsLast7Days: number;
    sessionsLast30Days: number;
    avgSessionsPerWeek: number;

    // Engagement depth
    avgSessionDuration: number;
    actionsPerSession: number;
    uniqueFeaturesUsed: number;

    // Product-specific (examples)
    // NomGap
    fastCompletionRate?: number;
    streakLength?: number;

    // JarvisJr
    agentDiversityScore?: number;
    voiceSessionRatio?: number;

    // ChronoMind
    timerCompletionRate?: number;
    routineAdherenceScore?: number;

    // Error/stability
    errorRateLast7Days: number;
    crashCountLast30Days: number;

    // Revenue
    planTier: number; // 0=free, 1=pro, 2=enterprise
    lifetimeValue: number;
    daysSinceLastPayment?: number;
  };

  // Normalized (0–1) for model input
  normalizedFeatures: Record<string, number>;

  // Metadata
  featureSchemaVersion: string;
  computedAt: string;

  // Time windows
  observationWindow: {
    start: string;
    end: string;
  };

  ttl: number; // Cosmos DB TTL in seconds (90 days)
}
```

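The data model describes `normalizedFeatures` only as 0–1 values for model input. A sketch follows, assuming min–max scaling with per-feature ranges captured at training time and versioned alongside `featureSchemaVersion`. `FeatureRange` and `normalizeFeatures` are hypothetical names. Values outside the training range are clamped, and absent product-specific features are dropped.

```typescript
interface FeatureRange {
  min: number;
  max: number;
}

// Min–max normalization into [0, 1], clamped so out-of-range values stay in bounds.
// Features without a known range, or with no value (e.g. another product's
// optional features), are omitted from the output.
function normalizeFeatures(
  features: Record<string, number | undefined>,
  ranges: Record<string, FeatureRange>,
): Record<string, number> {
  const out: Record<string, number> = {};
  for (const [name, value] of Object.entries(features)) {
    const range = ranges[name];
    if (value === undefined || range === undefined) continue;
    const span = range.max - range.min;
    const scaled = span === 0 ? 0 : (value - range.min) / span;
    out[name] = Math.min(1, Math.max(0, scaled));
  }
  return out;
}
```
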
---

## Implementation Tracking

| Phase | Task | Status | Commit |
| ----- | ----------------------------- | ------ | ------ |
| 1.1 | Telemetry feature extraction | ⬜ | — |
| 1.1 | Time-window aggregations | ⬜ | — |
| 1.1 | Rolling window features | ⬜ | — |
| 1.2 | Feature store | ⬜ | — |
| 1.2 | Cosmos containers | ⬜ | — |
| 1.2 | Feature computation jobs | ⬜ | — |
| 1.3 | Product-specific features | ⬜ | — |
| 1.3 | Feature importance tracking | ⬜ | — |
| 2.1 | XGBoost model architecture | ⬜ | — |
| 2.1 | Training pipeline | ⬜ | — |
| 2.1 | Model evaluation | ⬜ | — |
| 2.2 | Real-time scoring API | ⬜ | — |
| 2.2 | Risk segmentation | ⬜ | — |
| 2.2 | Model versioning | ⬜ | — |
| 2.3 | SHAP explanations | ⬜ | — |
| 2.3 | Natural language explanations | ⬜ | — |
| 2.3 | Actionable insights | ⬜ | — |
| 3.1 | Health metric framework | ⬜ | — |
| 3.1 | Health indicators | ⬜ | — |
| 3.2 | Baseline establishment | ⬜ | — |
| 3.2 | Scoring algorithm | ⬜ | — |
| 3.2 | Alert thresholds | ⬜ | — |
| 3.3 | Anomaly detection | ⬜ | — |
| 4.1 | Campaign trigger rules | ⬜ | — |
| 4.1 | Personalized messaging | ⬜ | — |
| 4.2 | Platform integrations | ⬜ | — |
| 4.3 | CS team dashboard | ⬜ | — |
| 5.1 | Health overview UI | ⬜ | — |
| 5.2 | Churn prediction dashboard | ⬜ | — |
| 5.3 | Campaign management | ⬜ | — |

**Legend:** ⬜ Not started | 🟡 In progress | ✅ Complete | ⏸️ Deferred

---

## Quick Reference for Implementing Agent

**📋 Full Roadmap:** `/Users/sd9235/code/mygh/learning_ai_common_plat/docs/roadmaps/PREDICTIVE_CHURN_HEALTH_SCORING_ROADMAP.md`

**Key Files to Modify/Create:**

```
services/platform-service/
├── src/
│   ├── modules/predictive-analytics/
│   │   ├── types.ts                      # [1.2] Feature, HealthScore, Prediction types
│   │   ├── repository.ts                 # Data access layer
│   │   ├── feature-extractor.ts          # [1.1] Telemetry → features
│   │   ├── feature-store.ts              # [1.2] Feature vector storage
│   │   ├── churn-model.ts                # [2.1] XGBoost training & inference
│   │   ├── scoring-api.ts                # [2.2] Real-time prediction endpoint
│   │   ├── explanation-engine.ts         # [2.3] SHAP + NL explanations
│   │   ├── health-scoring.ts             # [3] Health dimension calculation
│   │   ├── anomaly-detection.ts          # [3.3] Prophet/ARIMA forecasting
│   │   ├── campaign-engine.ts            # [4] Retention automation
│   │   ├── routes.ts                     # [5] REST API
│   │   └── predictive-analytics.test.ts  # Tests
│   ├── lib/
│   │   └── cosmos-init.ts                # [1.2] Add containers
│   └── server.ts                         # Register routes
dashboards/admin-web/
├── src/
│   ├── app/(dashboard)/
│   │   ├── health-dashboard/
│   │   │   └── page.tsx                  # [5.1] Product health overview
│   │   └── predictive/
│   │       ├── at-risk/
│   │       │   └── page.tsx              # [4.3] At-risk user list
│   │       └── campaigns/
│   │           └── page.tsx              # [5.3] Campaign management
│   ├── lib/
│   │   └── predictive-client.ts          # API client
│   └── components/
│       └── predictive/                   # Risk cards, health charts
```

**Commit Message Format:**

```
feat(predictive-analytics): <description> [<task.code>]
```

**Example:**

```bash
git add services/platform-service/src/modules/predictive-analytics/
git commit -m "feat(predictive-analytics): add feature extraction and store [1.1-1.2]"
```

**Testing Requirements:**

- Unit tests: 20+ Vitest tests for feature extraction, model inference
- Model validation: AUC, calibration, precision@k metrics
- Integration: End-to-end prediction pipeline

**Dependencies:**

- Telemetry module (feature extraction)
- Delivery module (retention campaigns)
- Azure ML or scikit-learn (model training)

---

## Appendix B: API Reference

| Method | Endpoint | Auth | Description |
| ------ | -------------------------------------- | ------------- | ----------------------------- |
| GET | `/predictive/health` | Admin | Get all product health scores |
| GET | `/predictive/health/:productId` | Admin | Get product health detail |
| GET | `/predictive/health/:productId/trends` | Admin | Historical health trends |
| POST | `/predictive/churn-score` | Admin/Service | Get churn prediction for user |
| POST | `/predictive/churn-batch` | Admin | Batch churn scoring |
| GET | `/predictive/at-risk-users` | Admin/CS | List users by risk segment |
| GET | `/predictive/users/:id/risk-profile` | Admin/CS | User churn risk details |
| GET | `/predictive/model/performance` | Admin | Model accuracy metrics |
| GET | `/predictive/model/features` | Admin | Feature importance ranking |
| GET | `/predictive/campaigns` | Admin | List retention campaigns |
| POST | `/predictive/campaigns` | Admin | Create campaign |
| PATCH | `/predictive/campaigns/:id` | Admin | Update campaign |
| GET | `/predictive/campaigns/:id/stats` | Admin | Campaign performance |
| POST | `/predictive/campaigns/:id/trigger` | Admin | Manual trigger for testing |

---

## Appendix C: Integration Points

### With Telemetry Module

- Raw events feed feature extraction
- Error rates flow into health scores
- Correlation IDs link behaviors to predictions

### With Diagnostics Module

- Debug sessions enrich feature vectors
- Error clusters correlate with churn risk
- Screenshot patterns analyzed for UX issues

### With Event Bus

| Event | Action |
| ----------------------------------- | ---------------------------------------- |
| `predictive.churn.risk_detected` | Trigger retention campaign |
| `predictive.health.critical` | Alert leadership, suggest debug sessions |
| `predictive.anomaly.detected` | Create incident, notify on-call |
| `user.retention.campaign_responded` | Update model with outcome |

### With Delivery Module

- Retention campaigns use email templates
- Push notifications for urgent interventions
- A/B test message variants

---

## Appendix D: Cost Estimation

| Component | Monthly Cost (est.) |
| ------------------------------------ | -------------------------- |
| Cosmos DB (features + predictions) | $150–300 |
| Model training (Azure ML) | $100–200 |
| Inference compute | $50–100 |
| Email delivery (retention campaigns) | $50–200 (volume-dependent) |
| **Total** | **$350–800/month** |

ROI: suppose the system retains 5% of the 10K users flagged as at-risk each month, at $50 LTV per user:

- 500 users retained × $50 = $25K/month in retained value
- Against the $350–800/month cost estimate, that is roughly 31:1 to 71:1, comfortably above 10:1

---

## Appendix E: Success Metrics

### Model Performance

- [ ] ROC-AUC > 0.75 (discrimination)
- [ ] Calibration slope 0.9–1.1 (well-calibrated probabilities)
- [ ] Precision@10% > 60% (at least 60% of the top 10% highest-risk users actually churn)

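The first and third model-performance targets can be checked with two small metrics. ROC-AUC is computed from the Mann–Whitney rank statistic, with average ranks for ties. Precision is measured among the top fraction of users ranked by predicted risk. Calibration slope needs a logistic fit and is not shown. The function names are illustrative.

```typescript
// ROC-AUC via the Mann–Whitney U statistic; tied scores get their average rank.
function rocAuc(scores: number[], labels: boolean[]): number {
  if (scores.length !== labels.length) throw new Error('length mismatch');
  const order = scores.map((_, i) => i).sort((a, b) => scores[a] - scores[b]);
  const ranks = new Array<number>(scores.length);
  for (let i = 0; i < order.length; ) {
    let j = i;
    while (j + 1 < order.length && scores[order[j + 1]] === scores[order[i]]) j++;
    const avgRank = (i + j) / 2 + 1; // 1-based average rank of the tie group
    for (let k = i; k <= j; k++) ranks[order[k]] = avgRank;
    i = j + 1;
  }
  let nPos = 0;
  let rankSumPos = 0;
  labels.forEach((churned, i) => {
    if (churned) {
      nPos++;
      rankSumPos += ranks[i];
    }
  });
  const nNeg = labels.length - nPos;
  if (nPos === 0 || nNeg === 0) throw new Error('need both churned and retained users');
  return (rankSumPos - (nPos * (nPos + 1)) / 2) / (nPos * nNeg);
}

// Precision among the top `fraction` of users ranked by predicted churn risk.
function precisionAtTopFraction(scores: number[], labels: boolean[], fraction = 0.1): number {
  const k = Math.max(1, Math.floor(scores.length * fraction));
  const top = scores
    .map((score, i) => ({ score, churned: labels[i] }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
  return top.filter((t) => t.churned).length / k;
}
```

Against the targets above, a model would pass when `rocAuc(...) > 0.75` and `precisionAtTopFraction(...) > 0.6` on the time-based test set.
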
### Business Impact

- [ ] 10%+ reduction in churn rate for targeted cohorts
- [ ] 5%+ increase in re-engagement campaign response
- [ ] CS team satisfaction with at-risk user visibility

### Operational

- [ ] Prediction latency < 100ms
- [ ] Feature freshness < 24 hours
- [ ] Model retraining automation

---

## Current Status

- [ ] **Design complete** — Target: 2026-03-10
- [ ] **Phase 1: Feature Pipeline** — Not started
- [ ] **Phase 2: Churn Model** — Not started
- [ ] **Phase 3: Health Scoring** — Not started
- [ ] **Phase 4: Interventions** — Not started
- [ ] **Phase 5: Admin UI** — Not started
- [ ] **Phase 6: Advanced** — Future

**Estimated Timeline:** 3 weeks (Phases 1–5)

**Dependencies:**

- Telemetry module (for feature extraction)
- Azure ML or similar (for model training)
- Delivery module (for retention campaigns)

---

_Last Updated: 2026-03-03_