saravanakumardb1 a31fdfe55a feat(predictive-analytics): complete admin UI for churn prediction and health scoring

- Add health-dashboard page with 6-dimension health cards and anomaly detection
- Add predictive/at-risk page with user risk profiles and segmentation
- Add predictive/campaigns page with campaign management and stats
- Add predictive-client.ts API client with full type coverage
- Update all 3 roadmaps to reflect complete implementation status

2026-03-03 13:48:37 -08:00

Intelligent A/B Testing — Implementation Roadmap

Module: platform-service/src/modules/ab-testing/
Admin UI: /ops/experiments/
Target: AI-powered experiment management with auto-allocation, early stopping, and hypothesis generation
Estimated Effort: 2.5–3 weeks
Status: ✅ Complete (Phases 1–4); Phase 5 future

Executive Summary

This roadmap delivers an intelligent A/B testing platform that goes beyond traditional feature flags. Instead of manual percentage rollouts, it uses Thompson sampling to auto-allocate traffic, Bayesian early stopping to end an experiment once a variant clearly wins or loses, and LLM-powered hypothesis generation from feature flag usage patterns.

Key Differentiators vs. Static Feature Flags

Capability	Static Flags (Current)	Intelligent A/B Testing
Traffic Allocation	Manual percentage	Multi-armed bandit optimization
Stopping Decision	Manual monitoring	Auto-stop at statistical significance
Winner Selection	Human judgment	Bayesian probability of superiority
Test Duration	Fixed (often wrong)	Dynamic based on effect size
Hypothesis	Human-written	AI-generated from usage patterns
Sample Size	Guesswork	Power analysis + sequential testing

Phase 1: Core Experiment Engine (Week 1)

1.1 Data Model & Schemas

1.1.1 Create modules/ab-testing/types.ts
- ExperimentDoc — experiment definition and config
- VariantDoc — variant metadata + metrics
- AssignmentDoc — user → variant assignments
- MetricDoc — event types being tracked
- ExperimentResult — statistical analysis results
- Zod schemas for all inputs
1.1.2 Add Cosmos containers to cosmos-init.ts
- experiments (pk: /productId, TTL: 2 years for completed)
- experiment_variants (pk: /experimentId)
- experiment_assignments (pk: /userId, query by experiment)
- experiment_events (pk: /experimentId; time-series queries filter and sort on /timestamp)
- experiment_metrics (pk: /experimentId, computed aggregates)

1.2 Assignment & Bucketing

1.2.1 Create deterministic bucketing
- Consistent hashing (userId + experimentId → variant)
- FNV-1a hash algorithm (same as feature flags)
- Sticky assignments (user always sees same variant)
- Override capability (force specific variant for QA)
1.2.2 Assignment strategies
- random — Simple randomization with fixed (static) allocation percentages
- thompson — Thompson sampling (multi-armed bandit)
- epsilon_greedy — Epsilon-greedy exploration
- ucb — Upper Confidence Bound algorithm
1.2.3 Audience targeting
- User property filters (platform, version, region, subscription tier)
- Percentage rollout within target segment
- Exclusion lists (beta users, internal accounts)
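The bucketing and targeting rules above can be sketched as follows. This is a minimal illustration under several assumptions. It uses a 32-bit FNV-1a over the key `userId:experimentId` with 10,000 buckets, and the helper names (`fnv1a32`, `matchesTargeting`, `assignVariant`) are hypothetical. To stay consistent with existing flags, the hash input and encoding would have to match the feature flags module exactly, which this sketch does not verify. App-version ranges and custom user properties are omitted.

```typescript
// Sketch only: names, key format, and bucket count are assumptions, not the module's API.
interface Targeting {
  platforms?: string[];
  regions?: string[];
  userSegments?: string[];
}

interface UserContext {
  userId: string;
  platform: string;
  region?: string;
  segment?: string;
}

interface VariantAllocation {
  id: string;
  allocationPercent: number; // 0–100; all variants together sum to 100
}

const BUCKETS = 10000;

// 32-bit FNV-1a over UTF-16 code units (identical to byte-wise FNV-1a for ASCII keys).
function fnv1a32(key: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  return hash >>> 0; // reinterpret as unsigned
}

function matchesTargeting(user: UserContext, t: Targeting): boolean {
  if (t.platforms && !t.platforms.includes(user.platform)) return false;
  if (t.regions && !(user.region && t.regions.includes(user.region))) return false;
  if (t.userSegments && !(user.segment && t.userSegments.includes(user.segment))) return false;
  return true;
}

// Returns a variant id, or null when the user is not part of the experiment.
function assignVariant(
  user: UserContext,
  experimentId: string,
  targeting: Targeting,
  targetPercent: number, // % of eligible traffic enrolled
  variants: VariantAllocation[],
  overrides: Map<string, string> = new Map(), // userId -> forced variant id (QA)
  excluded: Set<string> = new Set(), // beta users, internal accounts
): string | null {
  const forced = overrides.get(user.userId);
  if (forced !== undefined) return forced;
  if (excluded.has(user.userId) || !matchesTargeting(user, targeting)) return null;

  // Rollout gate uses a separately salted hash so it is independent of the variant bucket.
  const rolloutBucket = fnv1a32(`rollout:${user.userId}:${experimentId}`) % BUCKETS;
  if (rolloutBucket >= (targetPercent / 100) * BUCKETS) return null;

  // Map the user's bucket onto cumulative allocation ranges.
  const bucket = fnv1a32(`${user.userId}:${experimentId}`) % BUCKETS;
  let upper = 0;
  for (const v of variants) {
    upper += (v.allocationPercent / 100) * BUCKETS;
    if (bucket < upper) return v.id;
  }
  return variants.length > 0 ? variants[variants.length - 1].id : null; // rounding guard
}
```

Hashing alone keeps assignments sticky only while allocations stay fixed. Under a bandit strategy that re-balances traffic, the first assignment would need to be persisted (the experiment_assignments container) and reused.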

1.3 Event Tracking Pipeline

1.3.1 Metric definitions
- conversion — Binary (did/didn't convert)
- count — Integer events (sessions, messages)
- duration — Time-based (session length, task time)
- revenue — Monetary (purchase amount, LTV)
- custom — Arbitrary numeric values
1.3.2 Event ingestion
- POST /ab-testing/events batch endpoint
- Client SDK: track(event, value, metadata)
- Automatic attribution (which variant caused this event)
- Deduplication (eventId + userId uniqueness)
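Batch ingestion with deduplication and attribution might look like the sketch below. The `IncomingEvent` and `Assignment` shapes are hypothetical, and the in-memory `seen` set stands in for whatever store actually enforces eventId + userId uniqueness. Events from users who have no assignment for the experiment are counted and dropped, because they cannot be attributed to a variant.

```typescript
// Hypothetical shapes for illustration; the module's real types may differ.
interface IncomingEvent {
  eventId: string;
  userId: string;
  experimentId: string;
  metricName: string;
  value: number;
  timestamp: string; // ISO 8601
}

interface Assignment {
  id: string;
  userId: string;
  experimentId: string;
  variantId: string;
}

interface AttributedEvent extends IncomingEvent {
  variantId: string;
  assignmentId: string;
}

function ingestBatch(
  batch: IncomingEvent[],
  assignments: Assignment[],
  seen: Set<string>, // dedupe keys already stored
): { accepted: AttributedEvent[]; duplicates: number; unattributed: number } {
  // Index assignments by (userId, experimentId) for O(1) attribution lookups.
  const byUserExperiment = new Map<string, Assignment>();
  for (const a of assignments) byUserExperiment.set(`${a.userId}|${a.experimentId}`, a);

  const accepted: AttributedEvent[] = [];
  let duplicates = 0;
  let unattributed = 0;
  for (const e of batch) {
    const dedupeKey = `${e.eventId}|${e.userId}`;
    if (seen.has(dedupeKey)) {
      duplicates++; // already ingested (retry or double-send)
      continue;
    }
    const a = byUserExperiment.get(`${e.userId}|${e.experimentId}`);
    if (!a) {
      unattributed++; // user never assigned to this experiment
      continue;
    }
    seen.add(dedupeKey);
    accepted.push({ ...e, variantId: a.variantId, assignmentId: a.id });
  }
  return { accepted, duplicates, unattributed };
}
```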

Phase 1 Exit Criteria:

Experiments created with multiple variants
Users consistently assigned to variants
Events tracked and attributed correctly
20+ tests for assignment and ingestion

Phase 2: Statistical Analysis Engine (Week 1–2)

2.1 Bayesian Inference

2.1.1 Create modules/ab-testing/statistics.ts
- BetaDistribution for conversion rates
- GammaDistribution for count/duration metrics
- NormalDistribution for continuous metrics
- Monte Carlo simulation (10,000 samples)
2.1.2 Probability calculations
- probabilityVariantBeatsControl(variant, control)
- expectedLossIfChosen(variant)
- probabilityBeatAllVariants(variant)
2.1.3 Credible intervals
- 95% credible interval for each variant's true metric
- Visualization-ready (lower, mean, upper bounds)
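The inference steps above can be sketched with Beta-Binomial posteriors and Monte Carlo draws. This sketch rests on several assumptions:

- a uniform Beta(1, 1) prior, as in Appendix B
- 10,000 draws
- hypothetical function names
- a seeded PRNG (mulberry32) so that runs are reproducible

Beta draws are formed as the ratio of two Gamma draws, which are generated with Marsaglia–Tsang. Expected loss is taken as E[max(p_control − p_variant, 0)], the conversion rate given up by choosing the variant if control is actually better.

```typescript
// Sketch: names and the seeded PRNG are illustrative choices, not the module's API.
type Rng = () => number;

// mulberry32: small seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): Rng {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal via Box–Muller.
function standardNormal(rng: Rng): number {
  const u1 = 1 - rng(); // in (0, 1], avoids log(0)
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * rng());
}

// Gamma(shape, 1) via Marsaglia–Tsang; shape < 1 uses the boost Gamma(k) = Gamma(k + 1) * U^(1/k).
function sampleGamma(shape: number, rng: Rng): number {
  if (shape < 1) return sampleGamma(shape + 1, rng) * Math.pow(1 - rng(), 1 / shape);
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    let x = 0;
    let v = 0;
    do {
      x = standardNormal(rng);
      v = 1 + c * x;
    } while (v <= 0);
    v = v * v * v;
    const u = 1 - rng();
    if (u < 1 - 0.0331 * x ** 4) return d * v;
    if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v;
  }
}

function sampleBeta(alpha: number, beta: number, rng: Rng): number {
  const x = sampleGamma(alpha, rng);
  const y = sampleGamma(beta, rng);
  return x / (x + y);
}

interface ConversionStats {
  participants: number;
  conversions: number;
}

// Posterior Beta(1 + conversions, 1 + non-conversions) under a uniform prior.
function posteriorParams(s: ConversionStats): [number, number] {
  return [1 + s.conversions, 1 + s.participants - s.conversions];
}

function compareToControl(variant: ConversionStats, control: ConversionStats, samples = 10000, seed = 42) {
  const rng = mulberry32(seed);
  const [av, bv] = posteriorParams(variant);
  const [ac, bc] = posteriorParams(control);
  const draws = new Float64Array(samples);
  let wins = 0;
  let lossSum = 0;
  let meanSum = 0;
  for (let i = 0; i < samples; i++) {
    const pv = sampleBeta(av, bv, rng);
    const pc = sampleBeta(ac, bc, rng);
    draws[i] = pv;
    meanSum += pv;
    if (pv > pc) wins++;
    lossSum += Math.max(pc - pv, 0);
  }
  draws.sort(); // typed arrays sort numerically
  return {
    probabilityBeatsControl: wins / samples,
    expectedLoss: lossSum / samples,
    credibleInterval: {
      lower: draws[Math.floor(0.025 * samples)],
      mean: meanSum / samples,
      upper: draws[Math.min(samples - 1, Math.floor(0.975 * samples))],
    },
  };
}
```

probabilityBeatAllVariants follows the same pattern. Each iteration draws once from every variant's posterior and counts how often this variant's draw is the largest.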

2.2 Early Stopping Rules

2.2.1 Stopping criteria
- Winner found: Variant has > 95% probability of beating control
- Loser clear: Control has > 95% probability of beating variant
- Practical significance: Stop for futility when the minimum detectable effect is unlikely to be reached
- Time bound: Max duration reached (safety limit)
2.2.2 Auto-promotion
- Auto-rollout winner to 100% when threshold hit
- Notify admins via Slack/email
- Create audit log entry
2.2.3 Guardrails
- Minimum sample size before early stopping (100 users/variant)
- Business hours only for auto-actions
- Require approval for revenue-impacting experiments
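The stopping criteria and guardrails can be combined into one decision function, sketched below. Field names mirror `ExperimentDoc.guardrails`, and the winner threshold is in percent as it is there. The function names are hypothetical, and so is the business-hours window (09:00–17:00 UTC, Monday to Friday). The time bound is checked first because it acts as a safety limit independent of sample size.

```typescript
// Sketch of 2.2.1 stopping rules and 2.2.3 guardrails; names are hypothetical.
type StopDecision = 'continue' | 'winner' | 'no_winner' | 'max_duration';

interface StoppingInput {
  probabilitiesBeatControl: number[]; // one entry per non-control variant, 0–1
  samplesPerVariant: number[]; // every variant, including control
  daysRunning: number;
  minSampleSizePerVariant: number; // default 100
  maxDurationDays: number; // default 30
  winnerThreshold: number; // percent, default 95
}

function evaluateStopping(s: StoppingInput): StopDecision {
  // Safety limit applies even if the sample-size guardrail has not been met.
  if (s.daysRunning >= s.maxDurationDays) return 'max_duration';
  if (s.samplesPerVariant.some((n) => n < s.minSampleSizePerVariant)) return 'continue';
  const threshold = s.winnerThreshold / 100;
  const best = Math.max(...s.probabilitiesBeatControl);
  if (best > threshold) return 'winner';
  // Even the best variant rarely beats control: control is clearly better than all of them.
  if (best < 1 - threshold) return 'no_winner';
  return 'continue';
}

// Auto-actions (e.g. rollout to 100%) run only in business hours and when no approval is needed.
function canAutoAct(
  now: Date,
  revenueImpacting: boolean,
  requireApprovalFor: 'none' | 'revenue' | 'all',
  startHourUtc = 9, // assumed window start
  endHourUtc = 17, // assumed window end (exclusive)
): boolean {
  if (requireApprovalFor === 'all') return false;
  if (requireApprovalFor === 'revenue' && revenueImpacting) return false;
  const day = now.getUTCDay(); // 0 = Sunday, 6 = Saturday
  const hour = now.getUTCHours();
  return day >= 1 && day <= 5 && hour >= startHourUtc && hour < endHourUtc;
}
```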

2.3 Thompson Sampling

2.3.1 Multi-armed bandit implementation
- Sample from posterior distributions
- Assign user to variant with highest sample
- Re-balance traffic every hour based on performance
2.3.2 Exploration vs exploitation
- Exploration rate decays over time
- High uncertainty = more exploration
- Clear winner = more traffic to winner
2.3.3 Regret minimization
- Track cumulative regret vs optimal variant
- Regret bounds reporting
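A minimal sketch of Thompson sampling for conversion metrics follows. It assumes Beta(1, 1) priors and uses hypothetical names:

- `chooseVariant` is the per-user rule: take one sample from each posterior and pick the largest.
- `allocationPercents` estimates the hourly re-balance as the share of simulated users each variant wins.
- `estimatedRegret` approximates cumulative regret, substituting posterior means for the unknown true rates.

The block includes its own seeded PRNG and Gamma-based Beta sampler so that it runs standalone.

```typescript
// Sketch only; names, priors, and the PRNG are assumptions.
type Rng = () => number;

interface ArmStats {
  id: string;
  participants: number;
  conversions: number;
}

function mulberry32(seed: number): Rng {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function standardNormal(rng: Rng): number {
  const u1 = 1 - rng(); // (0, 1]
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * rng());
}

// Marsaglia–Tsang Gamma sampler (shape >= 1 here because of the +1 prior).
function sampleGamma(shape: number, rng: Rng): number {
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    let x = 0;
    let v = 0;
    do {
      x = standardNormal(rng);
      v = 1 + c * x;
    } while (v <= 0);
    v = v * v * v;
    const u = 1 - rng();
    if (u < 1 - 0.0331 * x ** 4) return d * v;
    if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v;
  }
}

function sampleBeta(alpha: number, beta: number, rng: Rng): number {
  const x = sampleGamma(alpha, rng);
  return x / (x + sampleGamma(beta, rng));
}

// Per-user step: one posterior draw per arm, assign to the arm with the largest draw.
function chooseVariant(arms: ArmStats[], rng: Rng): string {
  let bestId = arms[0].id;
  let bestDraw = -Infinity;
  for (const arm of arms) {
    const draw = sampleBeta(1 + arm.conversions, 1 + arm.participants - arm.conversions, rng);
    if (draw > bestDraw) {
      bestDraw = draw;
      bestId = arm.id;
    }
  }
  return bestId;
}

// Hourly re-balance: traffic share = fraction of simulated users each arm would win.
function allocationPercents(arms: ArmStats[], draws = 10000, seed = 7): Record<string, number> {
  const rng = mulberry32(seed);
  const wins: Record<string, number> = {};
  for (const arm of arms) wins[arm.id] = 0;
  for (let i = 0; i < draws; i++) wins[chooseVariant(arms, rng)]++;
  const out: Record<string, number> = {};
  for (const arm of arms) out[arm.id] = (100 * wins[arm.id]) / draws;
  return out;
}

// Regret estimate: users sent to each arm times its gap to the best posterior mean.
function estimatedRegret(arms: ArmStats[]): number {
  const mean = (a: ArmStats) => (1 + a.conversions) / (2 + a.participants);
  const best = Math.max(...arms.map(mean));
  return arms.reduce((sum, a) => sum + a.participants * (best - mean(a)), 0);
}
```

No explicit decay schedule is needed with this rule. As evidence accumulates the posteriors narrow, so exploration shrinks on its own and traffic concentrates on the leading arm.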

Phase 2 Exit Criteria:

Bayesian probabilities calculated correctly
Early stopping triggers at appropriate thresholds
Thompson sampling re-allocates traffic dynamically
Statistical tests validate correctness

Phase 3: AI-Powered Hypothesis Generation (Week 2)

3.1 Pattern Detection

3.1.1 Usage pattern analysis
- Analyze feature flag usage telemetry
- Segment analysis (iOS vs Android, free vs pro)
- Temporal patterns (day of week, time of day)
- User behavior sequences (funnel analysis)
3.1.2 Anomaly detection
- Unexpected drop in feature adoption
- Performance regression signals
- User segment showing different behavior
3.1.3 Opportunity identification
- Underperforming features (low adoption)
- High-dropoff flows
- Competitor feature gaps

3.2 Hypothesis Generation

3.2.1 LLM hypothesis prompts

Given this feature usage data:
- Feature: {featureName}
- Current adoption: {adoptionRate}% (baseline: {baseline}%)
- Segment performance: {segmentData}
- User feedback: {feedbackSamples}
- Competitor analysis: {competitorFeatures}

Generate experiment hypotheses:
1. Primary hypothesis: "Changing X will improve Y because..."
2. Secondary hypotheses (2-3 alternatives)
3. Expected effect size (conservative estimate)
4. Success metric recommendation
5. Risk assessment

3.2.2 Hypothesis ranking
- Expected impact scoring
- Implementation difficulty estimate
- Statistical power prediction
- Risk-adjusted expected value
3.2.3 Suggested experiment design
- Variant count recommendation
- Traffic allocation suggestion
- Duration estimate
- Required sample size calculation
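The sample size and duration estimates can be sketched with the standard two-proportion power formula. Per variant, n = (z₁₋α/₂ + z₁₋β)² · [p₁(1 − p₁) + p₂(1 − p₂)] / (p₂ − p₁)². This is an assumption about how the estimate is produced. It is a fixed-horizon frequentist approximation used only for planning, it assumes equal traffic per variant, and the function names are hypothetical. The Bayesian engine may stop earlier or later than this estimate.

```typescript
// Planning sketch; z defaults correspond to alpha = 0.05 (two-sided) and 80% power.
function requiredSampleSizePerVariant(
  baselineRate: number, // control conversion rate, 0–1
  relativeMde: number, // minimum detectable effect as relative lift, e.g. 0.1 for +10%
  zAlpha = 1.959964, // z for 1 - alpha/2
  zPower = 0.841621, // z for 1 - beta
): number {
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + relativeMde);
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil(((zAlpha + zPower) ** 2 * variance) / (p2 - p1) ** 2);
}

// Days until every variant reaches the required sample size, assuming equal allocation.
function estimatedDays(
  perVariant: number,
  variantCount: number,
  dailyEligibleUsers: number,
  targetPercent: number, // % of eligible traffic enrolled
): number {
  const enrolledPerDay = dailyEligibleUsers * (targetPercent / 100);
  return Math.ceil((perVariant * variantCount) / enrolledPerDay);
}
```

For a 10% baseline and a +10% relative MDE this gives about 14,750 users per variant. The same calculation can also feed the "days to significance" estimate on the live dashboard.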

3.3 Auto-Experiment Suggestions

3.3.1 Weekly AI reports
- Top 5 experiment opportunities
- Hypotheses with supporting evidence
- Prioritized by expected impact
3.3.2 One-click experiment creation
- Pre-fill experiment from hypothesis
- Suggested variants with descriptions
- Pre-configured metrics

Phase 3 Exit Criteria:

AI generates meaningful hypotheses from usage data
Hypothesis quality rated by product team (80%+ useful)
Auto-suggested experiments created in 1 click
Weekly reports generated automatically

Phase 4: Admin Dashboard UI (Week 2–3)

4.1 Experiments List Page

4.1.1 Create /ops/experiments/page.tsx
- Experiment cards (status, duration, sample size)
- Quick filters (running, completed, draft)
- AI-generated hypothesis badge
- Health indicators (traffic balance, event flow)
4.1.2 Experiment creation wizard
- Step 1: Define hypothesis (AI suggestions available)
- Step 2: Create variants (name, description, config)
- Step 3: Select metrics (primary + secondary)
- Step 4: Audience targeting
- Step 5: Traffic allocation (manual or Thompson)
- Step 6: Review and launch

4.2 Live Experiment Dashboard

4.2.1 Create /ops/experiments/[id]/page.tsx
- Real-time metrics comparison
- Variant performance table (conversions, counts, durations)
- Bayesian probability visualization
- Credible interval charts
4.2.2 Statistical summary card
- Probability of beating control (per variant)
- Expected lift if implemented
- Sample size progress bar
- Days to significance estimate
4.2.3 Action buttons
- Adjust traffic allocation
- Pause/resume experiment
- Stop and declare winner
- Rollout winner to 100%
- Archive experiment

4.3 Results & Reporting

4.3.1 Results page
- Final statistical summary
- Variant comparison visualization
- Segment breakdown (iOS vs Android, etc.)
- Confidence intervals over time
4.3.2 AI insights panel
- Why this result occurred (LLM summary)
- Unexpected findings
- Follow-up experiment suggestions
4.3.3 Export capabilities
- CSV export of raw data
- PDF report generation
- API endpoint for data warehouse sync

Phase 4 Exit Criteria:

Full experiment lifecycle manageable in UI
Real-time stats visible and accurate
Bayesian visualizations clear to non-statisticians
Export and reporting functional

Phase 5: Advanced Capabilities (Future)

5.1 Multi-Variate Testing

Test multiple variables simultaneously
Full factorial and fractional factorial designs
Interaction effect detection

5.2 Sequential Experimentation

Multi-phase experiments (qualification → main → validation)
Holdout groups for long-term validation
Global holdout (never-exposed users)

5.3 Personalization Layer

Contextual bandits (different variants for different users)
ML model for variant selection
Automatic personalization optimization

5.4 Experiment Coordination

Mutually exclusive experiments
Experiment priority rules
Layered experimentation (orthogonal tests)

Appendix A: Data Models

ExperimentDoc

interface ExperimentDoc {
  id: string; // exp_<uuid>
  productId: string; // partition key

  // Experiment definition
  name: string;
  description: string;
  hypothesis: string;
  aiGeneratedHypothesis?: boolean; // Flag for AI-suggested

  // Status lifecycle: draft → running → paused | stopped | completed
  status: 'draft' | 'running' | 'paused' | 'stopped' | 'completed';

  // Variants
  controlVariantId: string; // Baseline variant
  variantIds: string[]; // All variant IDs

  // Configuration
  allocationStrategy: 'random' | 'thompson' | 'epsilon_greedy' | 'ucb';
  targetPercent: number; // % of eligible traffic

  // Audience targeting
  targeting: {
    platforms?: string[]; // ios, android, web
    appVersions?: { min: string; max?: string };
    regions?: string[];
    userSegments?: string[]; // pro, free, enterprise
    userProperties?: Record<string, string | number | boolean>;
  };

  // Metrics
  primaryMetric: {
    name: string;
    type: 'conversion' | 'count' | 'duration' | 'revenue' | 'custom';
    eventName: string; // Telemetry event to track
    aggregation: 'sum' | 'mean' | 'count' | 'unique';
    direction: 'increase' | 'decrease'; // Is higher better?
    minimumDetectableEffect: number; // % change we want to detect
  };
  secondaryMetrics: Array<{
    name: string;
    type: 'conversion' | 'count' | 'duration' | 'revenue' | 'custom';
    eventName: string;
  }>;

  // Guardrails
  guardrails: {
    minSampleSizePerVariant: number; // Default: 100
    maxDurationDays: number; // Safety limit, default: 30
    autoStopEnabled: boolean;
    winnerThreshold: number; // % probability to auto-stop, default: 95
    requireApprovalFor: 'none' | 'revenue' | 'all';
  };

  // Scheduling
  startAt?: string; // Scheduled start (ISO 8601)
  endAt?: string; // Scheduled end or actual stop

  // Stats (denormalized for fast reads)
  totalParticipants: number;
  totalEvents: number;

  // Timestamps
  createdAt: string;
  updatedAt: string;
  startedAt?: string;
  completedAt?: string;
  ttl: number; // seconds (Cosmos DB); 2 years for completed experiments
}

VariantDoc

interface VariantDoc {
  id: string; // var_<uuid>
  experimentId: string; // partition key

  // Variant definition
  name: string; // "Control", "New Button Color", etc.
  description?: string;
  isControl: boolean;

  // Feature flag configuration
  flagConfig: Record<string, unknown>; // Arbitrary config payload

  // Traffic allocation (dynamic for bandit strategies)
  currentAllocationPercent: number; // 0–100%

  // Statistics (real-time computed)
  stats: {
    participants: number;
    events: number;

    // Primary metric
    primaryMetricValue: number; // Mean or conversion rate
    primaryMetricStdDev?: number;

    // For conversion metrics
    conversions?: number;
    conversionRate?: number; // 0–1

    // Bayesian posterior parameters
    betaAlpha?: number; // For Beta distribution
    betaBeta?: number;

    gammaShape?: number; // For Gamma distribution
    gammaScale?: number;
  };

  // Bayesian results
  bayesianResults?: {
    probabilityBeatsControl: number; // 0–1
    probabilityBeatsAll: number; // 0–1
    expectedLiftPercent: number; // Relative to control
    expectedLoss: number; // Risk of choosing this variant
    credibleInterval: {
      lower: number;
      mean: number;
      upper: number;
    };
  };

  createdAt: string;
  updatedAt: string;
}

ExperimentAssignmentDoc

interface ExperimentAssignmentDoc {
  id: string; // ea_<uuid>
  userId: string; // partition key (for user lookups)

  experimentId: string;
  variantId: string;

  // Assignment metadata
  assignedAt: string; // First assignment
  firstExposedAt?: string; // First actual exposure (feature use)

  // Context at assignment
  assignmentContext: {
    platform: string;
    appVersion: string;
    osVersion: string;
    deviceModel?: string;
    region?: string;
  };

  // Events attributed to this assignment
  eventCount: number;
  lastEventAt?: string;

  // TTL (Cosmos DB, seconds from last write): expire ~90 days after the experiment ends
  ttl: number; // recomputed on each write from experimentEnd + 90 days
}
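The `ttl` comments in these models state expiry in days, sometimes relative to the experiment's end, so a small conversion helper may be useful. This sketch assumes standard Azure Cosmos DB TTL semantics: a per-item `ttl` is in seconds, is measured from the item's last modification, and takes effect only when TTL is enabled on the container. The helper and constant names are hypothetical. Every update to an assignment (for example incrementing `eventCount`) restarts the TTL clock, so the value should be recomputed on every write.

```typescript
// Sketch assuming Cosmos DB TTL semantics; names are hypothetical.
const SECONDS_PER_DAY = 86400;
const COMPLETED_EXPERIMENT_TTL = 2 * 365 * SECONDS_PER_DAY; // 63,072,000 s (about 2 years)
const RAW_EVENT_TTL = 90 * SECONDS_PER_DAY; // 7,776,000 s

// Seconds from `now` until `retentionDays` after the experiment ends.
function assignmentTtlSeconds(experimentEndIso: string, now: Date = new Date(), retentionDays = 90): number {
  const expiresAtMs = Date.parse(experimentEndIso) + retentionDays * SECONDS_PER_DAY * 1000;
  const remaining = Math.ceil((expiresAtMs - now.getTime()) / 1000);
  return Math.max(remaining, 1); // item ttl must be a positive integer (or -1 for "never")
}
```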

ExperimentEventDoc

interface ExperimentEventDoc {
  id: string; // ee_<uuid>
  experimentId: string; // partition key
  timestamp: string; // ISO 8601; filter/ORDER BY field for time-series queries (not part of the partition key)

  // Attribution
  userId: string;
  variantId: string;
  assignmentId: string;

  // Event details
  metricName: string;
  metricType: 'conversion' | 'count' | 'duration' | 'revenue' | 'custom';
  value: number; // Numeric value

  // Conversion tracking (for binary metrics)
  converted: boolean; // For conversion metrics

  // Context
  eventMetadata?: Record<string, unknown>;

  // Denormalized for filtering
  platform: string;
  appVersion: string;

  // TTL: Shorter for raw events
  ttl: number; // seconds (Cosmos DB); 90 days
}

Implementation Tracking

Phase	Task	Status	Commit
1.1	Experiment types & schemas	✅	`a9b2247`
1.1	Cosmos containers	✅	`a9b2247`
1.2	Deterministic bucketing	✅	`783067e`
1.2	Assignment strategies	✅	`783067e`
1.2	Audience targeting	✅	`783067e`
1.3	Metric definitions	✅	`783067e`
1.3	Event ingestion	✅	`783067e`
2.1	Bayesian inference engine	✅	`783067e`
2.1	Probability calculations	✅	`783067e`
2.1	Credible intervals	✅	`783067e`
2.2	Early stopping rules	✅	`783067e`
2.2	Auto-promotion	✅	`783067e`
2.2	Guardrails	✅	`783067e`
2.3	Thompson sampling	✅	`783067e`
2.3	Exploration vs exploitation	✅	`783067e`
2.3	Regret minimization	✅	`783067e`
3.1	Pattern detection	✅	`44fa045`
3.1	Anomaly detection	✅	`44fa045`
3.2	Hypothesis generation prompts	✅	`44fa045`
3.2	Hypothesis ranking	✅	`44fa045`
3.3	Auto-experiment suggestions	✅	`44fa045`
4.1	Experiments list page	✅	`44fa045`
4.1	Creation wizard	✅	`44fa045`
4.2	Live dashboard	✅	`44fa045`
4.2	Statistical summary	✅	`44fa045`
4.3	Results & reporting	✅	`44fa045`
4.3	AI insights panel	✅	`44fa045`

Legend: ⬜ Not started | 🟡 In progress | ✅ Complete | ⏸️ Deferred

Quick Reference for Implementing Agent

📋 Full Roadmap: docs/roadmaps/INTELLIGENT_AB_TESTING_ROADMAP.md (in the learning_ai_common_plat repository)

Key Files to Modify/Create:

services/platform-service/
├── src/
│   ├── modules/ab-testing/
│   │   ├── types.ts              # [1.1] Experiment, Variant, Assignment types
│   │   ├── repository.ts         # [1.2] Data access layer
│   │   ├── bucketing.ts          # [1.2] FNV-1a hash, sticky assignments
│   │   ├── statistics.ts         # [2.1] Bayesian inference, Beta/Gamma/Normal distributions
│   │   ├── allocation.ts         # [2.3] Thompson sampling, bandit strategies
│   │   ├── hypothesis-generator.ts # [3.2] LLM pattern analysis
│   │   ├── routes.ts             # [4] REST API
│   │   └── ab-testing.test.ts    # Tests
│   ├── lib/
│   │   └── cosmos-init.ts        # [1.1] Add containers
│   └── server.ts                 # Register routes
dashboards/admin-web/
├── src/
│   ├── app/(dashboard)/
│   │   ├── experiments/
│   │   │   ├── page.tsx          # [4.1] Experiments list
│   │   │   ├── new/page.tsx      # [4.1] Creation wizard
│   │   │   └── [id]/
│   │   │       └── page.tsx      # [4.2] Live dashboard
│   ├── lib/
│   │   └── experiments-client.ts # API client
│   └── components/
│       └── experiments/          # Bayesian charts, variant cards

Commit Message Format:

feat(ab-testing): <description> [<task.code>]

Example:

git add services/platform-service/src/modules/ab-testing/
git commit -m "feat(ab-testing): add experiment types and cosmos containers [1.1]"

Testing Requirements:

Unit tests: 25+ Vitest tests for bucketing, statistics, bandit algorithms
Statistical validation: A/A tests, known distribution tests
Integration: End-to-end experiment lifecycle

Dependencies:

Feature flags module (reuse bucketing logic)
Telemetry module (event tracking)
Azure OpenAI (hypothesis generation)

Appendix B: Statistical Methods

Bayesian A/B Testing

Conversion Metrics (Beta-Binomial):

Posterior: Beta(α + conversions, β + non-conversions)
Where α = β = 1 (uniform prior)

Probability variant beats control (evaluate the integral numerically or estimate it by Monte Carlo):
P(variant > control) = ∫₀¹ BetaPDF_variant(x) · BetaCDF_control(x) dx

Continuous Metrics (Normal):

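Posterior of the mean: Normal(μ_n, σ_n²)
Where, for n observations with sample mean x̄ and variance σ² (estimated from the sample) under a Normal(μ₀, σ₀²) prior: 1/σ_n² = 1/σ₀² + n/σ² and μ_n = σ_n² · (μ₀/σ₀² + n·x̄/σ²)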

Probability variant beats control via Monte Carlo sampling
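The continuous-metric path can be sketched as a Normal–Normal conjugate update of each variant's mean, followed by Monte Carlo draws from the two posteriors. The sketch treats the sample variance as known, which is a plug-in approximation. The function names and the seeded PRNG (mulberry32) are illustrative choices, not the module's API.

```typescript
// Sketch only; assumes known (plug-in) observation variance.
type Rng = () => number;

interface NormalPrior { mu0: number; sigma0Sq: number }
interface SampleSummary { n: number; mean: number; variance: number }
interface NormalPosterior { mu: number; sigmaSq: number }

function mulberry32(seed: number): Rng {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal via Box–Muller.
function standardNormal(rng: Rng): number {
  const u1 = 1 - rng(); // (0, 1], avoids log(0)
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * rng());
}

// Conjugate update: posterior precision = prior precision + n / variance.
function posteriorOfMean(prior: NormalPrior, s: SampleSummary): NormalPosterior {
  const precision = 1 / prior.sigma0Sq + s.n / s.variance;
  const sigmaSq = 1 / precision;
  const mu = sigmaSq * (prior.mu0 / prior.sigma0Sq + (s.n * s.mean) / s.variance);
  return { mu, sigmaSq };
}

// Fraction of paired posterior draws in which the variant's mean exceeds control's.
function probabilityBeatsControlMC(variant: NormalPosterior, control: NormalPosterior, samples = 10000, seed = 1): number {
  const rng = mulberry32(seed);
  let wins = 0;
  for (let i = 0; i < samples; i++) {
    const v = variant.mu + Math.sqrt(variant.sigmaSq) * standardNormal(rng);
    const c = control.mu + Math.sqrt(control.sigmaSq) * standardNormal(rng);
    if (v > c) wins++;
  }
  return wins / samples;
}
```

For metrics with `direction: 'decrease'`, the comparison would be reversed (count draws where the variant's mean is lower).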

Thompson Sampling

For each incoming user:
  For each variant:
    Sample θ_i from variant's posterior distribution
  Assign user to variant with max(θ_i)

Update variant's posterior after observing outcome

Early Stopping

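Stop experiment when:
  days_running ≥ max_duration                     → Time bound (applies regardless of sample size)
  OR (samples_per_variant ≥ min_sample_size
      AND max_variant P(beats control) > 0.95)    → Winner found
  OR (samples_per_variant ≥ min_sample_size
      AND max_variant P(beats control) < 0.05)    → No winner (control beats every variant)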

Appendix C: API Reference

Method	Endpoint	Auth	Description
POST	`/ab-testing/experiments`	Admin	Create experiment
GET	`/ab-testing/experiments`	Admin	List experiments
GET	`/ab-testing/experiments/:id`	Admin	Get experiment details
PATCH	`/ab-testing/experiments/:id`	Admin	Update experiment
DELETE	`/ab-testing/experiments/:id`	Admin	Stop/archive experiment
POST	`/ab-testing/experiments/:id/start`	Admin	Start experiment
POST	`/ab-testing/experiments/:id/pause`	Admin	Pause experiment
POST	`/ab-testing/experiments/:id/complete`	Admin	Complete with winner
POST	`/ab-testing/assign`	Any auth	Get variant assignment for user
POST	`/ab-testing/events`	Any auth	Track experiment event
GET	`/ab-testing/experiments/:id/results`	Admin	Get statistical results
GET	`/ab-testing/suggestions`	Admin	AI-generated experiment ideas
POST	`/ab-testing/hypotheses`	Admin	Generate hypothesis from pattern

Appendix D: Integration Points

With Feature Flags Module

Experiments build on feature flag infrastructure
Flag state = variant assignment
Consistent bucketing with existing flags

With Telemetry Module

Experiment events enriched with telemetry context
Automatic metric tracking from existing events
Funnel analysis using telemetry breadcrumbs

With Event Bus

Event	Action
`ab.experiment.started`	Notify stakeholders, log audit
`ab.experiment.completed`	Generate report, suggest follow-ups
`ab.variant.declared_winner`	Trigger auto-rollout if enabled
`ab.early_stopping.triggered`	Alert experiment owner

Appendix E: Cost Estimation

Component	Monthly Cost (est.)
Cosmos DB (experiment data)	$100–200
LLM hypothesis generation	$50–100 (weekly reports)
Compute (statistical engine)	$50 (negligible)
Total	$200–350/month

Current Status

Design complete — 2026-03-03
Phase 1: Core Engine — Complete
Phase 2: Statistics — Complete
Phase 3: AI Hypotheses — Complete
Phase 4: Admin UI — Complete
Phase 5: Advanced — Future

Estimated Timeline: COMPLETE (Phases 1–4)