# Intelligent A/B Testing — Implementation Roadmap

> **Module:** `platform-service/src/modules/ab-testing/`  
> **Admin UI:** `/ops/experiments/`  
> **Target:** AI-powered experiment management with auto-allocation, early stopping, and hypothesis generation  
> **Estimated Effort:** 2.5–3 weeks  
> **Status:** ✅ Phases 1–4 complete (Phase 5 planned)

---

## Executive Summary

This roadmap delivers an **intelligent A/B testing platform** that goes beyond traditional feature flags. Instead of manual percentage rollouts, the system uses **Thompson sampling** for auto-allocation, **Bayesian early stopping** when a variant clearly wins or loses, and **LLM-powered hypothesis generation** from feature flag usage patterns.

### Key Differentiators vs. Static Feature Flags

| Capability         | Static Flags (Current) | Intelligent A/B Testing                   |
| ------------------ | ---------------------- | ----------------------------------------- |
| Traffic Allocation | Manual percentage      | **Multi-armed bandit optimization**       |
| Stopping Decision  | Manual monitoring      | **Auto-stop at statistical significance** |
| Winner Selection   | Human judgment         | **Bayesian probability of superiority**   |
| Test Duration      | Fixed (often wrong)    | **Dynamic based on effect size**          |
| Hypothesis         | Human-written          | **AI-generated from usage patterns**      |
| Sample Size        | Guesswork              | **Power analysis + sequential testing**   |

---

## Phase 1: Core Experiment Engine (Week 1)

### 1.1 Data Model & Schemas

- [ ] **1.1.1** Create `modules/ab-testing/types.ts`
  - [ ] `ExperimentDoc` — experiment definition and config
  - [ ] `VariantDoc` — variant metadata + metrics
  - [ ] `AssignmentDoc` — user → variant assignments
  - [ ] `MetricDoc` — event types being tracked
  - [ ] `ExperimentResult` — statistical analysis results
  - [ ] Zod schemas for all inputs
- [ ] **1.1.2** Add Cosmos containers to `cosmos-init.ts`
  - [ ] `experiments` (pk: `/productId`, TTL: 2 years for completed)
  - [ ] `experiment_variants` (pk: `/experimentId`)
  - [ ] `experiment_assignments` (pk: `/userId`, query by experiment)
  - [ ] `experiment_events` (pk: `/experimentId` + `/timestamp` for time-series)
  - [ ] `experiment_metrics` (pk: `/experimentId`, computed aggregates)

### 1.2 Assignment & Bucketing

- [ ] **1.2.1** Create deterministic bucketing
  - [ ] Consistent hashing (userId + experimentId → variant)
  - [ ] FNV-1a hash algorithm (same as feature flags)
  - [ ] Sticky assignments (user always sees same variant)
  - [ ] Override capability (force specific variant for QA)
- [ ] **1.2.2** Assignment strategies
  - [ ] `random` — Simple randomization with a fixed (static) split
  - [ ] `thompson` — Thompson sampling (multi-armed bandit)
  - [ ] `epsilon_greedy` — Epsilon-greedy exploration
  - [ ] `ucb` — Upper Confidence Bound algorithm
- [ ] **1.2.3** Audience targeting
  - [ ] User property filters (platform, version, region, subscription tier)
  - [ ] Percentage rollout within target segment
  - [ ] Exclusion lists (beta users, internal accounts)
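
The deterministic bucketing in 1.2.1 could look roughly like the sketch below. It assumes 32-bit FNV-1a over UTF-16 code units (equivalent to byte-wise hashing for ASCII IDs), a hypothetical `VariantWeight` shape, and 10,000 buckets; none of these details are confirmed by the feature flags module, so treat the names as placeholders.

```typescript
// 32-bit FNV-1a. Assumption: IDs are ASCII, so code units equal bytes.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, keeping 32 bits
  }
  return hash >>> 0; // convert to an unsigned 32-bit integer
}

// Hypothetical shape: each variant's share of traffic, assumed to sum to 100.
interface VariantWeight {
  variantId: string;
  percent: number;
}

// Maps (experimentId, userId) to a stable bucket in [0, 10000) and walks
// the cumulative weights. The same inputs always yield the same variant.
function assignVariant(
  userId: string,
  experimentId: string,
  variants: VariantWeight[],
): string {
  const bucket = fnv1a32(`${experimentId}:${userId}`) % 10000;
  let cumulative = 0;
  for (const v of variants) {
    cumulative += v.percent * 100;
    if (bucket < cumulative) return v.variantId;
  }
  // Fallback in case the weights sum to slightly less than 100.
  return variants[variants.length - 1].variantId;
}
```

Including `experimentId` in the hash input keeps a user's bucket independent across experiments, so the same users do not always land in the first variant. A QA override would be checked before this function is called.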

### 1.3 Event Tracking Pipeline

- [ ] **1.3.1** Metric definitions
  - [ ] `conversion` — Binary (did/didn't convert)
  - [ ] `count` — Integer events (sessions, messages)
  - [ ] `duration` — Time-based (session length, task time)
  - [ ] `revenue` — Monetary (purchase amount, LTV)
  - [ ] `custom` — Arbitrary numeric values
- [ ] **1.3.2** Event ingestion
  - [ ] `POST /ab-testing/events` batch endpoint
  - [ ] Client SDK: `track(event, value, metadata)`
  - [ ] Automatic attribution (which variant caused this event)
  - [ ] Deduplication (eventId + userId uniqueness)
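
As an illustration of the deduplication rule in 1.3.2, the sketch below drops repeated `(userId, eventId)` pairs within a batch. The `IncomingEvent` shape and the in-memory `Set` are assumptions made for the example; uniqueness across batches would still need a persistent check.

```typescript
// Hypothetical payload shape for one tracked event.
interface IncomingEvent {
  eventId: string;
  userId: string;
  metricName: string;
  value: number;
}

// Keeps the first occurrence of each (userId, eventId) pair and preserves order.
// `seen` can be passed in to carry state across batches within one process.
function dedupeEvents(
  events: IncomingEvent[],
  seen: Set<string> = new Set<string>(),
): IncomingEvent[] {
  const unique: IncomingEvent[] = [];
  for (const event of events) {
    // JSON encoding avoids collisions such as ("a:b", "c") vs ("a", "b:c").
    const key = JSON.stringify([event.userId, event.eventId]);
    if (seen.has(key)) continue;
    seen.add(key);
    unique.push(event);
  }
  return unique;
}
```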

**Phase 1 Exit Criteria:**

- [ ] Experiments created with multiple variants
- [ ] Users consistently assigned to variants
- [ ] Events tracked and attributed correctly
- [ ] 20+ tests for assignment and ingestion

---

## Phase 2: Statistical Analysis Engine (Week 1–2)

### 2.1 Bayesian Inference

- [ ] **2.1.1** Create `modules/ab-testing/statistics.ts`
  - [ ] `BetaDistribution` for conversion rates
  - [ ] `GammaDistribution` for count/duration metrics
  - [ ] `NormalDistribution` for continuous metrics
  - [ ] Monte Carlo simulation (10,000 samples)
- [ ] **2.1.2** Probability calculations
  - [ ] `probabilityVariantBeatsControl(variant, control)`
  - [ ] `expectedLossIfChosen(variant)`
  - [ ] `probabilityBeatAllVariants(variant)`
- [ ] **2.1.3** Credible intervals
  - [ ] 95% credible interval for each variant's true metric
  - [ ] Visualization-ready (lower, mean, upper bounds)
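
A minimal sketch of the conversion-rate path through 2.1, assuming a uniform Beta(1, 1) prior and Monte Carlo estimation. The seeded `mulberry32` generator, the `ConversionStats` shape, and the Marsaglia-Tsang gamma sampler are choices made for this example only; the roadmap does not specify how `BetaDistribution` is implemented.

```typescript
type Rng = () => number;

// Small seeded PRNG (mulberry32) so runs are reproducible in tests.
function mulberry32(seed: number): Rng {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal draw via the Box-Muller transform.
function sampleNormal(rng: Rng): number {
  const u1 = 1 - rng(); // in (0, 1], avoids log(0)
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * rng());
}

// Gamma(shape, 1) draw using the Marsaglia-Tsang method.
function sampleGamma(shape: number, rng: Rng): number {
  if (shape < 1) {
    return sampleGamma(shape + 1, rng) * Math.pow(1 - rng(), 1 / shape);
  }
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    const x = sampleNormal(rng);
    const v = Math.pow(1 + c * x, 3);
    if (v <= 0) continue;
    const u = 1 - rng();
    if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) return d * v;
  }
}

// Beta(alpha, beta) draw as a ratio of two gamma draws.
function sampleBeta(alpha: number, beta: number, rng: Rng): number {
  const x = sampleGamma(alpha, rng);
  const y = sampleGamma(beta, rng);
  return x / (x + y);
}

// Hypothetical input shape: raw counts for one variant.
interface ConversionStats {
  conversions: number;
  participants: number;
}

// Draws from the posterior Beta(1 + conversions, 1 + non-conversions).
function posteriorSamples(s: ConversionStats, n: number, rng: Rng): number[] {
  const out: number[] = [];
  for (let i = 0; i < n; i++) {
    out.push(sampleBeta(1 + s.conversions, 1 + s.participants - s.conversions, rng));
  }
  return out;
}

// Fraction of paired draws in which the variant's rate exceeds the control's.
function probabilityVariantBeatsControl(
  variant: ConversionStats,
  control: ConversionStats,
  rng: Rng,
  samples = 10000,
): number {
  const v = posteriorSamples(variant, samples, rng);
  const c = posteriorSamples(control, samples, rng);
  let wins = 0;
  for (let i = 0; i < samples; i++) if (v[i] > c[i]) wins++;
  return wins / samples;
}

// 95% equal-tailed credible interval from sorted posterior draws.
function credibleInterval(s: ConversionStats, rng: Rng, samples = 10000) {
  const draws = posteriorSamples(s, samples, rng).sort((a, b) => a - b);
  const mean = draws.reduce((acc, x) => acc + x, 0) / samples;
  return {
    lower: draws[Math.floor(0.025 * samples)],
    mean,
    upper: draws[Math.floor(0.975 * samples)],
  };
}
```

The returned `{ lower, mean, upper }` object lines up with the `credibleInterval` field in `VariantDoc`. In production code `Math.random` could replace the seeded generator; the seed is only there to make statistical tests repeatable.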

### 2.2 Early Stopping Rules

- [ ] **2.2.1** Stopping criteria
  - [ ] **Winner found:** Variant has > 95% probability of beating control
  - [ ] **Loser clear:** Control has > 95% probability of beating variant
  - [ ] **Practical significance:** Stop for futility when the effect is unlikely to reach the minimum detectable effect
  - [ ] **Time bound:** Max duration reached (safety limit)
- [ ] **2.2.2** Auto-promotion
  - [ ] Auto-rollout winner to 100% when threshold hit
  - [ ] Notify admins via Slack/email
  - [ ] Create audit log entry
- [ ] **2.2.3** Guardrails
  - [ ] Minimum sample size before early stopping (100 users/variant)
  - [ ] Business hours only for auto-actions
  - [ ] Require approval for revenue-impacting experiments

### 2.3 Thompson Sampling

- [ ] **2.3.1** Multi-armed bandit implementation
  - [ ] Sample from posterior distributions
  - [ ] Assign user to variant with highest sample
  - [ ] Re-balance traffic every hour based on performance
- [ ] **2.3.2** Exploration vs exploitation
  - [ ] Exploration rate decays over time
  - [ ] High uncertainty = more exploration
  - [ ] Clear winner = more traffic to winner
- [ ] **2.3.3** Regret minimization
  - [ ] Track cumulative regret vs optimal variant
  - [ ] Regret bounds reporting
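
The regret tracking in 2.3.3 can be illustrated with a short sketch. The true conversion rates are unknown in a live experiment, so this example assumes an estimate per variant (for instance the posterior mean) is supplied; `RateByVariant` and `cumulativeRegret` are hypothetical names.

```typescript
// Estimated success rate per variant ID, e.g. posterior means (assumed input).
type RateByVariant = Record<string, number>;

// Returns the running total of (best rate - assigned variant's rate),
// one entry per assignment in the order given.
function cumulativeRegret(
  rates: RateByVariant,
  assignedVariantIds: string[],
): number[] {
  const best = Math.max(...Object.keys(rates).map((id) => rates[id]));
  const curve: number[] = [];
  let total = 0;
  for (const id of assignedVariantIds) {
    total += best - rates[id];
    curve.push(total);
  }
  return curve;
}
```

A curve that flattens over time suggests the allocation strategy is converging on the best variant; a curve that keeps climbing linearly suggests it is still splitting traffic evenly.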

**Phase 2 Exit Criteria:**

- [ ] Bayesian probabilities calculated correctly
- [ ] Early stopping triggers at appropriate thresholds
- [ ] Thompson sampling re-allocates traffic dynamically
- [ ] Statistical tests validate correctness

---

## Phase 3: AI-Powered Hypothesis Generation (Week 2)

### 3.1 Pattern Detection

- [ ] **3.1.1** Usage pattern analysis
  - [ ] Analyze feature flag usage telemetry
  - [ ] Segment analysis (iOS vs Android, free vs pro)
  - [ ] Temporal patterns (day of week, time of day)
  - [ ] User behavior sequences (funnel analysis)
- [ ] **3.1.2** Anomaly detection
  - [ ] Unexpected drop in feature adoption
  - [ ] Performance regression signals
  - [ ] User segment showing different behavior
- [ ] **3.1.3** Opportunity identification
  - [ ] Underperforming features (low adoption)
  - [ ] High-dropoff flows
  - [ ] Competitor feature gaps

### 3.2 Hypothesis Generation

- [ ] **3.2.1** LLM hypothesis prompts

  ```
  Given this feature usage data:
  - Feature: {featureName}
  - Current adoption: {adoptionRate}% (baseline: {baseline}%)
  - Segment performance: {segmentData}
  - User feedback: {feedbackSamples}
  - Competitor analysis: {competitorFeatures}

  Generate experiment hypotheses:
  1. Primary hypothesis: "Changing X will improve Y because..."
  2. Secondary hypotheses (2-3 alternatives)
  3. Expected effect size (conservative estimate)
  4. Success metric recommendation
  5. Risk assessment
  ```

- [ ] **3.2.2** Hypothesis ranking
  - [ ] Expected impact scoring
  - [ ] Implementation difficulty estimate
  - [ ] Statistical power prediction
  - [ ] Risk-adjusted expected value
- [ ] **3.2.3** Suggested experiment design
  - [ ] Variant count recommendation
  - [ ] Traffic allocation suggestion
  - [ ] Duration estimate
  - [ ] Required sample size calculation
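
For the sample size and duration items in 3.2.3, a common starting point is the two-proportion power calculation, sketched here. It assumes a two-sided significance level of 0.05 and 80% power (the z-values are hard-coded), a conversion-type metric, and a minimum detectable effect expressed as a relative fraction. The function names are illustrative, and a Bayesian or sequential design may need a different calculation.

```typescript
const Z_ALPHA = 1.959964; // z-quantile for two-sided alpha = 0.05
const Z_POWER = 0.841621; // z-quantile for power = 0.80

// Users needed in each variant to detect a relative lift of `relativeMde`
// (e.g. 0.1 for +10%) over `baselineRate` (e.g. 0.1 for 10% conversion).
function requiredSampleSizePerVariant(
  baselineRate: number,
  relativeMde: number,
): number {
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + relativeMde);
  const delta = p2 - p1;
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil((Math.pow(Z_ALPHA + Z_POWER, 2) * variance) / (delta * delta));
}

// Days needed to fill every variant, given eligible users per day.
function estimateDurationDays(
  perVariant: number,
  variantCount: number,
  dailyEligibleUsers: number,
): number {
  return Math.ceil((perVariant * variantCount) / dailyEligibleUsers);
}
```

With a 10% baseline and a 10% relative lift, this comes out at roughly 14,700 users per variant. Halving the detectable effect roughly quadruples the requirement, which is why the effect size estimate from the hypothesis step matters so much for the duration estimate.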

### 3.3 Auto-Experiment Suggestions

- [ ] **3.3.1** Weekly AI reports
  - [ ] Top 5 experiment opportunities
  - [ ] Hypotheses with supporting evidence
  - [ ] Prioritized by expected impact
- [ ] **3.3.2** One-click experiment creation
  - [ ] Pre-fill experiment from hypothesis
  - [ ] Suggested variants with descriptions
  - [ ] Pre-configured metrics

**Phase 3 Exit Criteria:**

- [ ] AI generates meaningful hypotheses from usage data
- [ ] Hypothesis quality rated by product team (80%+ useful)
- [ ] Auto-suggested experiments created in 1 click
- [ ] Weekly reports generated automatically

---

## Phase 4: Admin Dashboard UI (Week 2–3)

### 4.1 Experiments List Page

- [ ] **4.1.1** Create `/ops/experiments/page.tsx`
  - [ ] Experiment cards (status, duration, sample size)
  - [ ] Quick filters (running, completed, draft)
  - [ ] AI-generated hypothesis badge
  - [ ] Health indicators (traffic balance, event flow)
- [ ] **4.1.2** Experiment creation wizard
  - [ ] Step 1: Define hypothesis (AI suggestions available)
  - [ ] Step 2: Create variants (name, description, config)
  - [ ] Step 3: Select metrics (primary + secondary)
  - [ ] Step 4: Audience targeting
  - [ ] Step 5: Traffic allocation (manual or Thompson)
  - [ ] Step 6: Review and launch

### 4.2 Live Experiment Dashboard

- [ ] **4.2.1** Create `/ops/experiments/[id]/page.tsx`
  - [ ] Real-time metrics comparison
  - [ ] Variant performance table (conversions, counts, durations)
  - [ ] Bayesian probability visualization
  - [ ] Credible interval charts
- [ ] **4.2.2** Statistical summary card
  - [ ] Probability of beating control (per variant)
  - [ ] Expected lift if implemented
  - [ ] Sample size progress bar
  - [ ] Days to significance estimate
- [ ] **4.2.3** Action buttons
  - [ ] Adjust traffic allocation
  - [ ] Pause/resume experiment
  - [ ] Stop and declare winner
  - [ ] Rollout winner to 100%
  - [ ] Archive experiment

### 4.3 Results & Reporting

- [ ] **4.3.1** Results page
  - [ ] Final statistical summary
  - [ ] Variant comparison visualization
  - [ ] Segment breakdown (iOS vs Android, etc.)
  - [ ] Credible intervals over time
- [ ] **4.3.2** AI insights panel
  - [ ] Why this result occurred (LLM summary)
  - [ ] Unexpected findings
  - [ ] Follow-up experiment suggestions
- [ ] **4.3.3** Export capabilities
  - [ ] CSV export of raw data
  - [ ] PDF report generation
  - [ ] API endpoint for data warehouse sync

**Phase 4 Exit Criteria:**

- [ ] Full experiment lifecycle manageable in UI
- [ ] Real-time stats visible and accurate
- [ ] Bayesian visualizations clear to non-statisticians
- [ ] Export and reporting functional

---

## Phase 5: Advanced Capabilities (Future)

### 5.1 Multi-Variate Testing

- [ ] Test multiple variables simultaneously
- [ ] Full factorial and fractional factorial designs
- [ ] Interaction effect detection

### 5.2 Sequential Experimentation

- [ ] Multi-phase experiments (qualification → main → validation)
- [ ] Holdout groups for long-term validation
- [ ] Global holdout (never-exposed users)

### 5.3 Personalization Layer

- [ ] Contextual bandits (different variants for different users)
- [ ] ML model for variant selection
- [ ] Automatic personalization optimization

### 5.4 Experiment Coordination

- [ ] Mutually exclusive experiments
- [ ] Experiment priority rules
- [ ] Layered experimentation (orthogonal tests)

---

## Appendix A: Data Models

### ExperimentDoc

```typescript
interface ExperimentDoc {
  id: string; // exp_<uuid>
  productId: string; // partition key

  // Experiment definition
  name: string;
  description: string;
  hypothesis: string;
  aiGeneratedHypothesis?: boolean; // Flag for AI-suggested

  // Status lifecycle: draft → running → paused | stopped | completed
  status: 'draft' | 'running' | 'paused' | 'stopped' | 'completed';

  // Variants
  controlVariantId: string; // Baseline variant
  variantIds: string[]; // All variant IDs

  // Configuration
  allocationStrategy: 'random' | 'thompson' | 'epsilon_greedy' | 'ucb';
  targetPercent: number; // % of eligible traffic

  // Audience targeting
  targeting: {
    platforms?: string[]; // ios, android, web
    appVersions?: { min: string; max?: string };
    regions?: string[];
    userSegments?: string[]; // pro, free, enterprise
    userProperties?: Record<string, string | number | boolean>;
  };

  // Metrics
  primaryMetric: {
    name: string;
    type: 'conversion' | 'count' | 'duration' | 'revenue' | 'custom';
    eventName: string; // Telemetry event to track
    aggregation: 'sum' | 'mean' | 'count' | 'unique';
    direction: 'increase' | 'decrease'; // Is higher better?
    minimumDetectableEffect: number; // % change we want to detect
  };
  secondaryMetrics: Array<{
    name: string;
    type: 'conversion' | 'count' | 'duration' | 'revenue' | 'custom';
    eventName: string;
  }>;

  // Guardrails
  guardrails: {
    minSampleSizePerVariant: number; // Default: 100
    maxDurationDays: number; // Safety limit, default: 30
    autoStopEnabled: boolean;
    winnerThreshold: number; // % probability to auto-stop, default: 95
    requireApprovalFor: 'none' | 'revenue' | 'all';
  };

  // Scheduling
  startAt?: string; // Scheduled start (ISO 8601)
  endAt?: string; // Scheduled end or actual stop

  // Stats (denormalized for fast reads)
  totalParticipants: number;
  totalEvents: number;

  // Timestamps
  createdAt: string;
  updatedAt: string;
  startedAt?: string;
  completedAt?: string;
  ttl: number; // 2 years for completed
}
```

### VariantDoc

```typescript
interface VariantDoc {
  id: string; // var_<uuid>
  experimentId: string; // partition key

  // Variant definition
  name: string; // "Control", "New Button Color", etc.
  description?: string;
  isControl: boolean;

  // Feature flag configuration
  flagConfig: Record<string, unknown>; // Arbitrary config payload

  // Traffic allocation (dynamic for bandit strategies)
  currentAllocationPercent: number; // 0–100%

  // Statistics (real-time computed)
  stats: {
    participants: number;
    events: number;

    // Primary metric
    primaryMetricValue: number; // Mean or conversion rate
    primaryMetricStdDev?: number;

    // For conversion metrics
    conversions?: number;
    conversionRate?: number; // 0–1

    // Bayesian posterior parameters
    betaAlpha?: number; // For Beta distribution
    betaBeta?: number;

    gammaShape?: number; // For Gamma distribution
    gammaScale?: number;
  };

  // Bayesian results
  bayesianResults?: {
    probabilityBeatsControl: number; // 0–1
    probabilityBeatsAll: number; // 0–1
    expectedLiftPercent: number; // Relative to control
    expectedLoss: number; // Risk of choosing this variant
    credibleInterval: {
      lower: number;
      mean: number;
      upper: number;
    };
  };

  createdAt: string;
  updatedAt: string;
}
```

### ExperimentAssignmentDoc

```typescript
interface ExperimentAssignmentDoc {
  id: string; // ea_<uuid>
  userId: string; // partition key (for user lookups)

  experimentId: string;
  variantId: string;

  // Assignment metadata
  assignedAt: string; // First assignment
  firstExposedAt?: string; // First actual exposure (feature use)

  // Context at assignment
  assignmentContext: {
    platform: string;
    appVersion: string;
    osVersion: string;
    deviceModel?: string;
    region?: string;
  };

  // Events attributed to this assignment
  eventCount: number;
  lastEventAt?: string;

  // TTL: Remove after experiment completes + analysis period
  ttl: number; // experimentEnd + 90 days
}
```

### ExperimentEventDoc

```typescript
interface ExperimentEventDoc {
  id: string; // ee_<uuid>
  experimentId: string; // partition key
  timestamp: string; // ISO 8601; used to order and range-filter time-series queries

  // Attribution
  userId: string;
  variantId: string;
  assignmentId: string;

  // Event details
  metricName: string;
  metricType: 'conversion' | 'count' | 'duration' | 'revenue' | 'custom';
  value: number; // Numeric value

  // Conversion tracking (for binary metrics)
  converted: boolean; // For conversion metrics

  // Context
  eventMetadata?: Record<string, unknown>;

  // Denormalized for filtering
  platform: string;
  appVersion: string;

  // TTL: Shorter for raw events
  ttl: number; // 90 days
}
```

---

## Implementation Tracking

| Phase | Task                          | Status | Commit |
| ----- | ----------------------------- | ------ | ------ |
| 1.1   | Experiment types & schemas    | ✅     | a9b2247 |
| 1.1   | Cosmos containers             | ✅     | a9b2247 |
| 1.2   | Deterministic bucketing       | ✅     | 783067e |
| 1.2   | Assignment strategies         | ✅     | 783067e |
| 1.2   | Audience targeting            | ✅     | 783067e |
| 1.3   | Metric definitions            | ✅     | 783067e |
| 1.3   | Event ingestion               | ✅     | 783067e |
| 2.1   | Bayesian inference engine     | ✅     | 783067e |
| 2.1   | Probability calculations      | ✅     | 783067e |
| 2.1   | Credible intervals            | ✅     | 783067e |
| 2.2   | Early stopping rules          | ✅     | 783067e |
| 2.2   | Auto-promotion                | ✅     | 783067e |
| 2.2   | Guardrails                    | ✅     | 783067e |
| 2.3   | Thompson sampling             | ✅     | 783067e |
| 2.3   | Exploration vs exploitation   | ✅     | 783067e |
| 2.3   | Regret minimization           | ✅     | 783067e |
| 3.1   | Pattern detection             | ✅     | 44fa045 |
| 3.1   | Anomaly detection             | ✅     | 44fa045 |
| 3.2   | Hypothesis generation prompts | ✅     | 44fa045 |
| 3.2   | Hypothesis ranking            | ✅     | 44fa045 |
| 3.3   | Auto-experiment suggestions   | ✅     | 44fa045 |
| 4.1   | Experiments list page         | ✅     | 44fa045 |
| 4.1   | Creation wizard               | ✅     | 44fa045 |
| 4.2   | Live dashboard                | ✅     | 44fa045 |
| 4.2   | Statistical summary           | ✅     | 44fa045 |
| 4.3   | Results & reporting           | ✅     | 44fa045 |
| 4.3   | AI insights panel             | ✅     | 44fa045 |

**Legend:** ⬜ Not started | 🟡 In progress | ✅ Complete | ⏸️ Deferred

---

## Quick Reference for Implementing Agent

**📋 Full Roadmap:** `/Users/sd9235/code/mygh/learning_ai_common_plat/docs/roadmaps/INTELLIGENT_AB_TESTING_ROADMAP.md`

**Key Files to Modify/Create:**

```
services/platform-service/
├── src/
│   ├── modules/ab-testing/
│   │   ├── types.ts              # [1.1] Experiment, Variant, Assignment types
│   │   ├── repository.ts         # [1.2] Data access layer
│   │   ├── bucketing.ts          # [1.2] FNV-1a hash, sticky assignments
│   │   ├── statistics.ts         # [2.1] Bayesian inference, Beta/Normal distributions
│   │   ├── allocation.ts         # [2.3] Thompson sampling, bandit strategies
│   │   ├── hypothesis-generator.ts # [3.2] LLM pattern analysis
│   │   ├── routes.ts             # [4] REST API
│   │   └── ab-testing.test.ts    # Tests
│   ├── lib/
│   │   └── cosmos-init.ts        # [1.1] Add containers
│   └── server.ts                 # Register routes
dashboards/admin-web/
├── src/
│   ├── app/(dashboard)/
│   │   ├── experiments/
│   │   │   ├── page.tsx          # [4.1] Experiments list
│   │   │   ├── new/page.tsx      # [4.1] Creation wizard
│   │   │   └── [id]/
│   │   │       └── page.tsx      # [4.2] Live dashboard
│   ├── lib/
│   │   └── experiments-client.ts # API client
│   └── components/
│       └── experiments/          # Bayesian charts, variant cards
```

**Commit Message Format:**

```
feat(ab-testing): <description> [<task.code>]
```

**Example:**

```bash
git add services/platform-service/src/modules/ab-testing/
git commit -m "feat(ab-testing): add experiment types and cosmos containers [1.1]"
```

**Testing Requirements:**

- Unit tests: 25+ Vitest tests for bucketing, statistics, bandit algorithms
- Statistical validation: A/A tests, known distribution tests
- Integration: End-to-end experiment lifecycle

**Dependencies:**

- Feature flags module (reuse bucketing logic)
- Telemetry module (event tracking)
- Azure OpenAI (hypothesis generation)

---

## Appendix B: Statistical Methods

### Bayesian A/B Testing

**Conversion Metrics (Beta-Binomial):**

```
Posterior: Beta(α + conversions, β + non-conversions)
Where α = β = 1 (uniform prior)

Probability variant beats control:
P(variant > control) = ∫[0,1] BetaCDF_control(x) * BetaPDF_variant(x) dx
(evaluated numerically or estimated by Monte Carlo sampling)
```

**Continuous Metrics (Normal):**

```
Posterior: Normal(μ_n, σ_n²)
Where μ_n, σ_n updated via conjugate prior

Probability variant beats control via Monte Carlo sampling
```

### Thompson Sampling

```
For each incoming user:
  For each variant:
    Sample θ_i from variant's posterior distribution
  Assign user to variant with max(θ_i)

Update variant's posterior after observing outcome
```
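
The loop above might be sketched as follows for conversion metrics. The example assumes a uniform Beta(1, 1) prior and integer counts, which allows a simple (if slow for large counts) gamma sampler built from summed exponential draws. `Arm`, `thompsonAssign`, and `recordOutcome` are hypothetical names, not part of `allocation.ts` as described in this roadmap.

```typescript
type Rng = () => number;

// Small seeded PRNG (mulberry32) so runs are reproducible in tests.
function mulberry32(seed: number): Rng {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Gamma(k, 1) for a positive integer k as a sum of k Exponential(1) draws.
// Cost is O(k), which is acceptable for a sketch but not for large counts.
function sampleGammaInt(k: number, rng: Rng): number {
  let sum = 0;
  for (let i = 0; i < k; i++) sum -= Math.log(1 - rng());
  return sum;
}

// Beta(alpha, beta) with integer parameters as a ratio of gamma draws.
function sampleBetaInt(alpha: number, beta: number, rng: Rng): number {
  const x = sampleGammaInt(alpha, rng);
  const y = sampleGammaInt(beta, rng);
  return x / (x + y);
}

// Hypothetical per-variant counters.
interface Arm {
  variantId: string;
  conversions: number;
  participants: number;
}

// Samples theta_i from each arm's posterior and returns the arm with the largest draw.
function thompsonAssign(arms: Arm[], rng: Rng): string {
  let bestId = arms[0].variantId;
  let bestTheta = -1;
  for (const arm of arms) {
    const theta = sampleBetaInt(
      1 + arm.conversions,
      1 + arm.participants - arm.conversions,
      rng,
    );
    if (theta > bestTheta) {
      bestTheta = theta;
      bestId = arm.variantId;
    }
  }
  return bestId;
}

// Posterior update: the counts are the sufficient statistics for the Beta posterior.
function recordOutcome(arm: Arm, converted: boolean): void {
  arm.participants += 1;
  if (converted) arm.conversions += 1;
}
```

Sampling per user conflicts with sticky assignments, so one option is to run many draws on a schedule, convert the win frequencies into allocation percentages, and let deterministic bucketing apply those percentages between updates.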

### Early Stopping

```
Stop experiment when:
  samples_per_variant >= min_sample_size AND (
    max_variant P(beats control) > 0.95      → Winner found
    OR max_variant P(beats control) < 0.05   → No winner
  )
  OR days_running > max_duration             → Time bound reached
```

---

## Appendix C: API Reference

| Method | Endpoint                               | Auth     | Description                      |
| ------ | -------------------------------------- | -------- | -------------------------------- |
| POST   | `/ab-testing/experiments`              | Admin    | Create experiment                |
| GET    | `/ab-testing/experiments`              | Admin    | List experiments                 |
| GET    | `/ab-testing/experiments/:id`          | Admin    | Get experiment details           |
| PATCH  | `/ab-testing/experiments/:id`          | Admin    | Update experiment                |
| DELETE | `/ab-testing/experiments/:id`          | Admin    | Stop/archive experiment          |
| POST   | `/ab-testing/experiments/:id/start`    | Admin    | Start experiment                 |
| POST   | `/ab-testing/experiments/:id/pause`    | Admin    | Pause experiment                 |
| POST   | `/ab-testing/experiments/:id/complete` | Admin    | Complete with winner             |
| POST   | `/ab-testing/assign`                   | Any auth | Get variant assignment for user  |
| POST   | `/ab-testing/events`                   | Any auth | Track experiment event           |
| GET    | `/ab-testing/experiments/:id/results`  | Admin    | Get statistical results          |
| GET    | `/ab-testing/suggestions`              | Admin    | AI-generated experiment ideas    |
| POST   | `/ab-testing/hypotheses`               | Admin    | Generate hypothesis from pattern |

---

## Appendix D: Integration Points

### With Feature Flags Module

- Experiments build on feature flag infrastructure
- Flag state = variant assignment
- Consistent bucketing with existing flags

### With Telemetry Module

- Experiment events enriched with telemetry context
- Automatic metric tracking from existing events
- Funnel analysis using telemetry breadcrumbs

### With Event Bus

| Event                         | Action                              |
| ----------------------------- | ----------------------------------- |
| `ab.experiment.started`       | Notify stakeholders, log audit      |
| `ab.experiment.completed`     | Generate report, suggest follow-ups |
| `ab.variant.declared_winner`  | Trigger auto-rollout if enabled     |
| `ab.early_stopping.triggered` | Alert experiment owner              |

---

## Appendix E: Cost Estimation

| Component                    | Monthly Cost (est.)      |
| ---------------------------- | ------------------------ |
| Cosmos DB (experiment data)  | $100–200                 |
| LLM hypothesis generation    | $50–100 (weekly reports) |
| Compute (statistical engine) | $50 (negligible)         |
| **Total**                    | **$200–350/month**       |

---

## Current Status

- [x] **Design complete** — 2026-03-03
- [x] **Phase 1: Core Engine** — Complete
- [x] **Phase 2: Statistics** — Complete
- [x] **Phase 3: AI Hypotheses** — Complete
- [x] **Phase 4: Admin UI** — Complete
- [ ] **Phase 5: Advanced** — Future

**Estimated Timeline:** COMPLETE (Phases 1–4)

**Dependencies:**

- Feature flags module (for assignment infrastructure)
- Telemetry module (for event tracking)
- Azure OpenAI (for hypothesis generation)

---

_Last Updated: 2026-03-03_