# Cloud-Agnostic Refactor Roadmap — ByteLyst Ecosystem

> **Author:** AI Analysis (Cascade)
> **Date:** 2026-03-01
> **Companion doc:** [`CLOUD_PROVIDER_MIGRATION_ANALYSIS.md`](./CLOUD_PROVIDER_MIGRATION_ANALYSIS.md)
> **Goal:** Refactor the codebase so it continues to work on Azure today, but switching to any other cloud provider requires **minimum effort** (days, not weeks).

---

## Table of Contents

1. [Philosophy](#1-philosophy)
2. [Current State vs Target State](#2-current-state-vs-target-state)
3. [Sprint Plan Overview](#3-sprint-plan-overview)
4. [Sprint 1: Database Abstraction Layer](#4-sprint-1-database-abstraction-layer)
5. [Sprint 2: Storage Abstraction Layer](#5-sprint-2-storage-abstraction-layer)
6. [Sprint 3: LLM Provider Abstraction](#6-sprint-3-llm-provider-abstraction)
7. [Sprint 4: Secrets Manager Abstraction](#7-sprint-4-secrets-manager-abstraction)
8. [Sprint 5: Speech Provider Abstraction](#8-sprint-5-speech-provider-abstraction)
9. [Sprint 6: Push Notification Abstraction](#9-sprint-6-push-notification-abstraction)
10. [Sprint 7: Monitoring & Telemetry Abstraction](#10-sprint-7-monitoring--telemetry-abstraction)
11. [Migration Effort After Refactor](#11-migration-effort-after-refactor)
12. [Testing Strategy](#12-testing-strategy)
13. [Env Var Naming Convention](#13-env-var-naming-convention)
14. [Risk Mitigation](#14-risk-mitigation)
15. [Appendix: Interface Specifications](#appendix-interface-specifications)

---

## 1. Philosophy

### Core Principle: Provider-Agnostic Interfaces, Provider-Specific Implementations

```
Application Code (routes, business logic)
        │
        ▼
  @bytelyst/* interfaces  ◄── Cloud-agnostic contracts
        │
        ▼
  Provider implementations  ◄── Azure today, swap tomorrow
  ├── cosmos-provider/       (Azure Cosmos DB)
  ├── mongo-provider/        (MongoDB Atlas — future)
  ├── s3-provider/           (AWS S3 — future)
  └── ...
```

### Design Rules

1. **Application code NEVER imports cloud SDKs** — only `@bytelyst/*` interfaces
2. **Provider chosen at startup via env var** — `DB_PROVIDER=cosmos`, `STORAGE_PROVIDER=azure`, etc.
3. **All interfaces have an in-memory mock** — for testing without any cloud dependency
4. **Zero breaking changes** — every sprint keeps all existing tests passing
5. **Incremental adoption** — modules migrate one at a time, old and new patterns coexist

### What This Is NOT

- This is **not** a migration to another cloud — Azure continues to be the production provider
- This is **not** a rewrite — it's a series of refactors that insert interfaces between app code and cloud SDKs
- This is **not** over-engineering — each interface is thin (30–60 lines) and directly maps to patterns already in the codebase

---

## 2. Current State vs Target State

### Current: Direct Azure SDK Usage

```
                    38 repository.ts files
                    ┌──────────────────────┐
routes.ts ────────► │ container()           │
                    │   .items.query(SQL)   │ ◄── @azure/cosmos types leak everywhere
                    │   .items.create(doc)  │
                    │   .item(id,pk).read() │
                    └──────────────────────┘
                              │
                              ▼
                    @bytelyst/cosmos (client.ts)
                              │
                              ▼
                       @azure/cosmos SDK
```

**Problems:**
- 38 platform-service repository files write raw Cosmos SQL queries
- 6 additional repository files in dashboards + MindLyst web
- Blob, Speech, OpenAI all have direct Azure SDK imports
- Switching DB means rewriting 44+ files

### Target: Provider-Agnostic Interfaces

```
                    38 repository.ts files
                    ┌──────────────────────────┐
routes.ts ────────► │ collection.findMany({     │
                    │   filter: {productId},    │ ◄── Cloud-agnostic API
                    │   sort: {createdAt: -1},  │
                    │   limit: 20,              │
                    │ })                         │
                    └──────────────────────────┘
                              │
                              ▼
                    @bytelyst/datastore (interface)
                              │
                    ┌─────────┼─────────┐
                    ▼         ▼         ▼
               CosmosAdapter MongoAdapter MemoryAdapter
               (Azure)      (MongoDB)    (Testing)
                    │
                    ▼
               @azure/cosmos SDK
```

**Benefits:**
- Repositories use a generic query API — no SQL strings, no Azure types
- Switching provider = implement a new adapter (~200 lines) + change env var
- In-memory adapter makes tests fast and cloud-free
- Azure continues to work exactly as before

---

## 3. Sprint Plan Overview

| Sprint | Package / Scope | Effort | Files Changed | Risk |
|--------|----------------|--------|---------------|------|
| **1** | `@bytelyst/datastore` — DB abstraction | 5–7 days | 44 repository files + 1 new package | Medium |
| **2** | `@bytelyst/storage` — Blob/Object abstraction | 2 days | 3 files + 1 new package | Low |
| **3** | `@bytelyst/llm` — LLM provider abstraction | 2 days | 4 files + 1 new package | Low |
| **4** | `@bytelyst/secrets` — Secrets manager abstraction | 1 day | 2 files (refactor existing) | Very Low |
| **5** | `@bytelyst/speech` — Speech STT abstraction | 3–4 days | 3 files + 1 new package | Medium |
| **6** | `@bytelyst/push` — Push notification abstraction | 1 day | 1 file + 1 new package | Very Low |
| **7** | Monitoring/Telemetry cleanup | 0.5 days | Already done (custom telemetry) | None |
| **Total** | | **~15–17 days** | ~55 files | |

### Priority Order

```
Sprint 1 (DB) ──► Sprint 2 (Storage) ──► Sprint 3 (LLM) ──► Sprint 4 (Secrets)
    ▲ HIGHEST ROI         EASY               EASY              TRIVIAL
    │
    └── 80% of migration effort lives here. Do this first.

Sprint 5 (Speech) ──► Sprint 6 (Push) ──► Sprint 7 (Monitoring)
    MEDIUM               LOW PRIORITY        ALREADY DONE
```

---

## 4. Sprint 1: Database Abstraction Layer

**Package:** `@bytelyst/datastore`
**Effort:** 5–7 days
**This is the most important sprint: the database layer accounts for roughly 80% of the migration effort.**

### 4.1 Interface Design

```typescript
// packages/datastore/src/types.ts

/** A cloud-agnostic document collection (like a Cosmos container or Mongo collection). */
export interface DocumentCollection<T extends BaseDocument = BaseDocument> {
  /** Find a single document by ID + partition key. */
  findById(id: string, partitionKey: string): Promise<T | null>;

  /** Find multiple documents matching a filter. */
  findMany(opts: FindManyOptions): Promise<T[]>;

  /** Find one document matching a filter. */
  findOne(opts: FindOneOptions): Promise<T | null>;

  /** Count documents matching a filter. */
  count(filter: FilterMap): Promise<number>;

  /** Insert a new document. */
  create(doc: T): Promise<T>;

  /** Replace an entire document (full overwrite). */
  replace(id: string, partitionKey: string, doc: T): Promise<T>;

  /** Upsert: create if not exists, replace if exists. */
  upsert(doc: T): Promise<T>;

  /** Delete a document by ID + partition key. */
  delete(id: string, partitionKey: string): Promise<boolean>;

  /** Run an aggregation (COUNT, SUM, GROUP BY). */
  aggregate<R = unknown>(opts: AggregateOptions): Promise<R[]>;
}

export interface BaseDocument {
  id: string;
  [key: string]: unknown;
}

export interface FindManyOptions {
  filter: FilterMap;
  sort?: SortMap;
  limit?: number;
  offset?: number;
  partitionKey?: string;
}

export interface FindOneOptions {
  filter: FilterMap;
  partitionKey?: string;
}

export type FilterMap = Record<string, unknown>;
export type SortMap = Record<string, 1 | -1>; // 1 = ASC, -1 = DESC

export interface AggregateOptions {
  filter: FilterMap;
  groupBy?: string[];
  count?: string;     // alias for COUNT(1)
  sum?: string;       // field to SUM
}

/** Factory that creates collections — one per provider. */
export interface DatastoreProvider {
  collection<T extends BaseDocument>(name: string): DocumentCollection<T>;
  initialize?(configs: Record<string, CollectionConfig>): Promise<void>;
  close?(): Promise<void>;
}

export interface CollectionConfig {
  partitionKeyPath: string;
  defaultTtl?: number | null;
}
```

### 4.2 Cosmos Adapter (keeps everything working today)

```typescript
// packages/datastore/src/providers/cosmos.ts

import type { Container, SqlParameter as SqlParam } from '@azure/cosmos';
// Add further option types here as the elided methods below are implemented.
import type { BaseDocument, DocumentCollection, FindManyOptions, FilterMap } from '../types.js';

export class CosmosCollection<T extends BaseDocument> implements DocumentCollection<T> {
  constructor(private container: Container) {}

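  async findById(id: string, partitionKey: string): Promise<T | null> {
    try {
      const { resource } = await this.container.item(id, partitionKey).read<T>();
      return resource ?? null;
    } catch (err) {
      // Only "not found" maps to null; auth, throttling, and network errors must surface.
      if ((err as { code?: number | string }).code === 404) return null;
      throw err;
    }
  }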

  async findMany(opts: FindManyOptions): Promise<T[]> {
    const { sql, params } = buildSqlQuery(opts);         // ◄── converts FilterMap → Cosmos SQL
    const { resources } = await this.container
      .items.query<T>({ query: sql, parameters: params })
      .fetchAll();
    return resources;
  }

  async create(doc: T): Promise<T> {
    const { resource } = await this.container.items.create(doc);
    return resource as T;
  }

  async replace(id: string, partitionKey: string, doc: T): Promise<T> {
    const { resource } = await this.container.item(id, partitionKey).replace<T>(doc);
    return resource as T;
  }

  async upsert(doc: T): Promise<T> {
    const { resource } = await this.container.items.upsert<T>(doc);
    return resource as T;
  }

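  async delete(id: string, partitionKey: string): Promise<boolean> {
    try {
      await this.container.item(id, partitionKey).delete();
      return true;
    } catch (err) {
      // false means "nothing to delete"; any other failure is rethrown.
      if ((err as { code?: number | string }).code === 404) return false;
      throw err;
    }
  }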

  // ... count(), findOne(), aggregate()
}

/** Convert a FilterMap to Cosmos SQL. */
function buildSqlQuery(opts: FindManyOptions): { sql: string; params: SqlParam[] } {
  // { filter: { productId: 'x', userId: 'y' }, sort: { createdAt: -1 }, limit: 20 }
  // → "SELECT * FROM c WHERE c.productId = @p0 AND c.userId = @p1 ORDER BY c.createdAt DESC OFFSET 0 LIMIT 20"
  // This is a mechanical translation: no query language is exposed to application code.
}
```
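The body of `buildSqlQuery` is elided above. A minimal sketch of the translation could look like the following. It assumes the operator convention from section 4.8 and redeclares the option types locally so it stands alone. The `FIELD_NAME` check, the `fieldRef` and `bind` helpers, and the decision to reject `offset` without `limit` are illustrative assumptions, not existing code.

```typescript
// Local copies of the option types from section 4.1 so the sketch is self-contained.
type FilterMap = Record<string, unknown>;
type SortMap = Record<string, 1 | -1>;
interface FindManyOptions {
  filter: FilterMap;
  sort?: SortMap;
  limit?: number;
  offset?: number;
}
// Same shape as SqlParameter in @azure/cosmos.
interface SqlParam { name: string; value: unknown }

// Field names are interpolated into the SQL text, so restrict them to plain identifiers.
const FIELD_NAME = /^[A-Za-z_][A-Za-z0-9_]*$/;

function fieldRef(field: string): string {
  if (!FIELD_NAME.test(field)) throw new Error(`Invalid field name: ${field}`);
  return `c.${field}`;
}

function buildSqlQuery(opts: FindManyOptions): { sql: string; params: SqlParam[] } {
  const params: SqlParam[] = [];
  // Values never appear in the SQL text; each one becomes a numbered parameter.
  const bind = (value: unknown): string => {
    const name = `@p${params.length}`;
    params.push({ name, value });
    return name;
  };

  const clauses: string[] = [];
  for (const [field, condition] of Object.entries(opts.filter)) {
    const ref = fieldRef(field);
    // Plain objects are treated as operator maps; everything else is an exact match.
    const isOperatorObject =
      typeof condition === 'object' && condition !== null && !Array.isArray(condition);
    if (!isOperatorObject) {
      clauses.push(`${ref} = ${bind(condition)}`);
      continue;
    }
    for (const [op, operand] of Object.entries(condition as Record<string, unknown>)) {
      switch (op) {
        case '$gte': clauses.push(`${ref} >= ${bind(operand)}`); break;
        case '$lt': clauses.push(`${ref} < ${bind(operand)}`); break;
        case '$exists': clauses.push(operand ? `IS_DEFINED(${ref})` : `NOT IS_DEFINED(${ref})`); break;
        case '$contains': clauses.push(`ARRAY_CONTAINS(${ref}, ${bind(operand)})`); break;
        default: throw new Error(`Unsupported filter operator: ${op}`);
      }
    }
  }

  let sql = 'SELECT * FROM c';
  if (clauses.length > 0) sql += ` WHERE ${clauses.join(' AND ')}`;

  const sortEntries = Object.entries(opts.sort ?? {});
  if (sortEntries.length > 0) {
    sql += ' ORDER BY ' + sortEntries
      .map(([field, dir]) => `${fieldRef(field)} ${dir === 1 ? 'ASC' : 'DESC'}`)
      .join(', ');
  }

  // Cosmos SQL requires OFFSET and LIMIT to appear together.
  if (opts.limit !== undefined) {
    sql += ` OFFSET ${bind(opts.offset ?? 0)} LIMIT ${bind(opts.limit)}`;
  } else if (opts.offset !== undefined) {
    throw new Error('offset requires limit in Cosmos SQL');
  }
  return { sql, params };
}
```

For `{ filter: { productId: 'x', userId: 'y' }, sort: { createdAt: -1 }, limit: 20 }` this sketch produces `SELECT * FROM c WHERE c.productId = @p0 AND c.userId = @p1 ORDER BY c.createdAt DESC OFFSET @p2 LIMIT @p3`, with the offset and limit bound as parameters alongside the filter values.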

### 4.3 In-Memory Adapter (for testing)

```typescript
// packages/datastore/src/providers/memory.ts

export class MemoryCollection<T extends BaseDocument> implements DocumentCollection<T> {
  private docs: Map<string, T> = new Map();

  async findById(id: string): Promise<T | null> {
    return this.docs.get(id) ?? null;
  }

  async findMany(opts: FindManyOptions): Promise<T[]> {
    let results = [...this.docs.values()].filter(doc => matchesFilter(doc, opts.filter));
    if (opts.sort) results = sortDocs(results, opts.sort);
    if (opts.offset) results = results.slice(opts.offset);
    if (opts.limit) results = results.slice(0, opts.limit);
    return results;
  }

  async create(doc: T): Promise<T> {
    this.docs.set(doc.id, doc);
    return doc;
  }
  // ... etc
}
```
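`matchesFilter` and `sortDocs` are used above but not shown. One way they might be written, assuming the filter operators described in section 4.8, is sketched below. The `Doc` alias and the `compareValues` and `matchesCondition` helpers are hypothetical names introduced for this example.

```typescript
type FilterMap = Record<string, unknown>;
type SortMap = Record<string, 1 | -1>;
type Doc = { id: string; [key: string]: unknown };

// Numbers compare numerically; everything else falls back to string comparison,
// which is adequate for ISO-8601 timestamps and plain strings.
function compareValues(a: unknown, b: unknown): number {
  if (typeof a === 'number' && typeof b === 'number') return a - b;
  const sa = String(a);
  const sb = String(b);
  return sa < sb ? -1 : sa > sb ? 1 : 0;
}

function matchesCondition(value: unknown, condition: unknown): boolean {
  const isOperatorObject =
    typeof condition === 'object' && condition !== null && !Array.isArray(condition);
  if (!isOperatorObject) return value === condition; // exact match
  // Every operator on the field must hold, e.g. { $gte: a, $lt: b }.
  return Object.entries(condition as Record<string, unknown>).every(([op, operand]) => {
    switch (op) {
      case '$gte': return value !== undefined && compareValues(value, operand) >= 0;
      case '$lt': return value !== undefined && compareValues(value, operand) < 0;
      case '$exists': return (value !== undefined) === Boolean(operand);
      case '$contains': return Array.isArray(value) && value.includes(operand);
      default: throw new Error(`Unsupported filter operator: ${op}`);
    }
  });
}

function matchesFilter(doc: Doc, filter: FilterMap): boolean {
  return Object.entries(filter).every(([field, condition]) =>
    matchesCondition(doc[field], condition),
  );
}

function sortDocs<T extends Doc>(docs: T[], sort: SortMap): T[] {
  const keys = Object.entries(sort);
  // Copy first so the caller's array is not reordered in place.
  return [...docs].sort((a, b) => {
    for (const [field, dir] of keys) {
      const cmp = compareValues(a[field], b[field]);
      if (cmp !== 0) return cmp * dir;
    }
    return 0;
  });
}
```

Throwing on unknown operators, rather than ignoring them, keeps a typo in a filter from silently matching every document in tests.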

### 4.4 MongoDB Adapter (future — ready to implement when needed)

```typescript
// packages/datastore/src/providers/mongo.ts (STUB — implement when migrating)

import type { Collection as MongoCollection } from 'mongodb';
import type { BaseDocument, DocumentCollection, FindManyOptions } from '../types.js';

export class MongoDocumentCollection<T extends BaseDocument> implements DocumentCollection<T> {
  constructor(private collection: MongoCollection<T>) {}

  async findById(id: string): Promise<T | null> {
    // Assumes the adapter stores each document's `id` as Mongo's `_id` on write.
    return this.collection.findOne({ _id: id } as any) as Promise<T | null>;
  }

  async findMany(opts: FindManyOptions): Promise<T[]> {
    let cursor = this.collection.find(opts.filter);
    if (opts.sort) cursor = cursor.sort(opts.sort);
    if (opts.offset) cursor = cursor.skip(opts.offset);
    if (opts.limit) cursor = cursor.limit(opts.limit);
    return cursor.toArray() as Promise<T[]>;
  }
  // ... etc
}
```

### 4.5 How Repository Files Change

**Before (Cosmos SQL in every file):**

```typescript
// services/platform-service/src/modules/flags/repository.ts
import { getContainer } from '../../lib/cosmos.js';

function container() {
  return getContainer('feature_flags');
}

export async function list(productId: string): Promise<FeatureFlagDoc[]> {
  const { resources } = await container()
    .items.query<FeatureFlagDoc>({
      query: 'SELECT * FROM c WHERE c.productId = @productId ORDER BY c.key ASC',
      parameters: [{ name: '@productId', value: productId }],
    })
    .fetchAll();
  return resources;
}

export async function getByKey(key: string, productId: string): Promise<FeatureFlagDoc | null> {
  const { resources } = await container()
    .items.query<FeatureFlagDoc>({
      query: 'SELECT * FROM c WHERE c.productId = @productId AND c.key = @key',
      parameters: [
        { name: '@productId', value: productId },
        { name: '@key', value: key },
      ],
    })
    .fetchAll();
  return resources[0] ?? null;
}

export async function create(doc: FeatureFlagDoc): Promise<FeatureFlagDoc> {
  const { resource } = await container().items.create(doc);
  return resource as FeatureFlagDoc;
}
```

**After (cloud-agnostic):**

```typescript
// services/platform-service/src/modules/flags/repository.ts
import { getCollection } from '../../lib/datastore.js';
import type { FeatureFlagDoc } from './types.js';

function collection() {
  return getCollection<FeatureFlagDoc>('feature_flags');
}

export async function list(productId: string): Promise<FeatureFlagDoc[]> {
  return collection().findMany({
    filter: { productId },
    sort: { key: 1 },
  });
}

export async function getByKey(key: string, productId: string): Promise<FeatureFlagDoc | null> {
  return collection().findOne({
    filter: { productId, key },
  });
}

export async function create(doc: FeatureFlagDoc): Promise<FeatureFlagDoc> {
  return collection().create(doc);
}
```

**Key observations:**
- No SQL strings
- No `@azure/cosmos` types
- No `.items.query().fetchAll()` chaining
- The `getCollection()` function returns the right provider based on `DB_PROVIDER` env var
- **All existing behavior is preserved** — the Cosmos adapter generates the same SQL under the hood

### 4.6 Service Wiring

```typescript
// services/platform-service/src/lib/datastore.ts (replaces lib/cosmos.ts)
import { createDatastoreProvider } from '@bytelyst/datastore';
import type { DocumentCollection, BaseDocument } from '@bytelyst/datastore';

let _provider: ReturnType<typeof createDatastoreProvider> | null = null;

export function getProvider() {
  if (!_provider) {
    _provider = createDatastoreProvider(); // reads DB_PROVIDER env var
  }
  return _provider;
}

export function getCollection<T extends BaseDocument>(name: string): DocumentCollection<T> {
  return getProvider().collection<T>(name);
}
```

```typescript
// packages/datastore/src/factory.ts
export function createDatastoreProvider(): DatastoreProvider {
  const provider = process.env.DB_PROVIDER || 'cosmos';
  switch (provider) {
    case 'cosmos':
      return new CosmosDatastoreProvider();  // uses existing COSMOS_ENDPOINT, COSMOS_KEY
    case 'mongo':
      return new MongoDatastoreProvider();   // uses MONGO_URI
    case 'memory':
      return new MemoryDatastoreProvider();  // no config needed
    default:
      throw new Error(`Unknown DB_PROVIDER: ${provider}`);
  }
}
```

### 4.7 Migration Plan for 38 Repository Files

Migrate in batches, one module per commit. Each commit:
1. Update the repository file to use `getCollection()` instead of `getContainer()`
2. Replace SQL queries with `findMany()` / `findOne()` / `count()` / `aggregate()`
3. Run the module's test file — must pass
4. Commit: `refactor(module-name): migrate to datastore abstraction`

**Batch order** (simplest first, complex last):

| Batch | Modules | Complexity | Notes |
|-------|---------|-----------|-------|
| 1 | flags, plans, settings, changelog, products | Simple CRUD | 5 files, warmup |
| 2 | licenses, sessions, ip-rules, maintenance, feedback | Simple CRUD + filters | 5 files |
| 3 | items, comments, votes, brains, reflections | CRUD + filter combos | 5 files |
| 4 | audit, delivery, notifications, exports, jobs | CRUD + time queries | 5 files |
| 5 | tokens, usage, invitations, referrals, webhooks | More complex queries | 5 files |
| 6 | auth, subscriptions, telemetry, experiments | Complex (GROUP BY, aggregates) | 4 files |
| 7 | timers, shared-timers, routines, households | Sync logic, batch ops | 4 files |
| 8 | fasting-sessions, fasting-protocols, meal-log, social-fasting, daily-briefs, streaks, push-triggers, impersonation, status, memory, analytics, waitlist | Product-specific + remaining | 12 files |
| 9 | Dashboard cosmos clients (admin-web, MindLyst web) | Direct `@azure/cosmos` | 6 files |
| 10 | Python clients (desktop cosmos, backend cosmos) | `azure.cosmos` → abstracted | 2 files |

### 4.8 Handling Complex Queries

Some repository files use advanced Cosmos SQL features. Here's how the interface handles them:

| Cosmos SQL Pattern | Datastore Interface Equivalent |
|--------------------|-------------------------------|
| `SELECT * FROM c WHERE c.x = @v` | `findMany({ filter: { x: v } })` |
| `SELECT * FROM c WHERE c.x = @v AND c.y = @w` | `findMany({ filter: { x: v, y: w } })` |
| `ORDER BY c.x ASC` | `findMany({ sort: { x: 1 } })` |
| `ORDER BY c.x DESC` | `findMany({ sort: { x: -1 } })` |
| `OFFSET @o LIMIT @l` | `findMany({ offset: o, limit: l })` |
| `SELECT VALUE COUNT(1) FROM c WHERE ...` | `count(filter)` |
| `SELECT c.plan, COUNT(1) AS cnt ... GROUP BY c.plan` | `aggregate({ filter, groupBy: ['plan'], count: 'cnt' })` |
| `NOT IS_DEFINED(c.usedAt)` | `findMany({ filter: { usedAt: { $exists: false } } })` |
| `c.x >= @v` | `findMany({ filter: { x: { $gte: v } } })` |
| `ARRAY_CONTAINS(c.tags, @tag)` | `findMany({ filter: { tags: { $contains: tag } } })` |
| `container().item(id, pk).read()` | `findById(id, pk)` |
| `container().items.create(doc)` | `create(doc)` |
| `container().item(id, pk).replace(doc)` | `replace(id, pk, doc)` |
| `container().items.upsert(doc)` | `upsert(doc)` |
| `container().item(id, pk).delete()` | `delete(id, pk)` |

Filter values follow a small, Mongo-style operator convention:

```typescript
// Exact match
{ productId: 'lysnrai' }

// Comparison operators
{ syncVersion: { $gte: 5 } }
{ createdAt: { $gte: '2026-01-01', $lt: '2026-02-01' } }

// Exists check (replaces NOT IS_DEFINED)
{ usedAt: { $exists: false } }

// Array contains
{ tags: { $contains: 'important' } }
```

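The Cosmos adapter translates these to SQL. The MongoDB adapter can pass `$gte`, `$lt`, and `$exists` through unchanged because they are native MQL, but it must translate `$contains`, which is not a MongoDB operator (MongoDB matches array membership with a plain equality condition). The memory adapter does in-memory filtering.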
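On the MongoDB side, `$contains` has no MQL operator of the same name, so a Mongo adapter would likely need a small translation step before calling `find()`. The sketch below shows one possible shape. `toMongoFilter` is a hypothetical helper, and mapping `$contains` to `$eq` relies on MongoDB matching an array field when any element equals the operand.

```typescript
type FilterMap = Record<string, unknown>;

function toMongoFilter(filter: FilterMap): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [field, condition] of Object.entries(filter)) {
    const isOperatorObject =
      typeof condition === 'object' && condition !== null && !Array.isArray(condition);
    if (!isOperatorObject) {
      out[field] = condition; // exact match needs no translation
      continue;
    }
    const translated: Record<string, unknown> = {};
    for (const [op, operand] of Object.entries(condition as Record<string, unknown>)) {
      if (op === '$contains') {
        // Equality against an array field matches when any element equals the operand.
        translated['$eq'] = operand;
      } else {
        translated[op] = operand; // $gte, $lt, $exists are already valid MQL
      }
    }
    out[field] = translated;
  }
  return out;
}
```

With this in place, `{ tags: { $contains: 'important' } }` becomes `{ tags: { $eq: 'important' } }`, while comparison and existence operators pass through untouched.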

---

## 5. Sprint 2: Storage Abstraction Layer

**Package:** `@bytelyst/storage`
**Effort:** 2 days
**Files changed:** `packages/blob/src/blob.ts`, `src/cloud/blob_client.py`, `services/platform-service/src/modules/blob/`

### 5.1 Interface Design

```typescript
// packages/storage/src/types.ts

export interface StorageProvider {
  /** Get or create a bucket/container. */
  getBucket(name: string): StorageBucket;

  /** Check if storage is configured. */
  isConfigured(): boolean;
}

export interface StorageBucket {
  /** Upload a blob/object. */
  upload(path: string, data: Buffer | ReadableStream, contentType?: string): Promise<void>;

  /** Download a blob/object. */
  download(path: string): Promise<Buffer>;

  /** Delete a blob/object. */
  delete(path: string): Promise<boolean>;

  /** Check if a blob/object exists. */
  exists(path: string): Promise<boolean>;

  /** List blobs/objects with optional prefix. */
  list(prefix?: string): Promise<StorageObjectInfo[]>;

  /** Generate a time-limited signed URL for direct access. */
  getSignedUrl(path: string, opts: SignedUrlOptions): Promise<string>;
}

export interface SignedUrlOptions {
  permissions: 'read' | 'write' | 'readwrite';
  expiresInMinutes?: number; // default: 60
}

export interface StorageObjectInfo {
  name: string;
  size: number;
  lastModified: Date;
  contentType?: string;
}
```

### 5.2 Provider Implementations

```typescript
// packages/storage/src/providers/azure-blob.ts
// Wraps existing @bytelyst/blob code — nearly 1:1 mapping

// packages/storage/src/providers/s3.ts (future)
// Uses @aws-sdk/client-s3 + @aws-sdk/s3-request-presigner

// packages/storage/src/providers/r2.ts (future)
// S3-compatible — extends S3 provider with Cloudflare-specific config

// packages/storage/src/providers/memory.ts
// In-memory Map<string, Buffer> for testing
```
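The in-memory provider is small enough to sketch in full. The version below is an illustration under a few assumptions: it uses `Uint8Array` in place of `Buffer` to avoid Node-specific types (a `Buffer` is a `Uint8Array` subclass), it accepts only in-memory data rather than streams, and its `memory://` signed URL format is invented purely for tests.

```typescript
interface StorageObjectInfo {
  name: string;
  size: number;
  lastModified: Date;
  contentType?: string;
}
interface SignedUrlOptions {
  permissions: 'read' | 'write' | 'readwrite';
  expiresInMinutes?: number;
}

class MemoryBucket {
  private objects = new Map<string, { data: Uint8Array; info: StorageObjectInfo }>();

  constructor(private readonly name: string) {}

  async upload(path: string, data: Uint8Array, contentType?: string): Promise<void> {
    const copy = data.slice(); // defensive copy so later caller mutations do not leak in
    const info: StorageObjectInfo = {
      name: path,
      size: copy.byteLength,
      lastModified: new Date(),
      ...(contentType !== undefined ? { contentType } : {}),
    };
    this.objects.set(path, { data: copy, info });
  }

  async download(path: string): Promise<Uint8Array> {
    const entry = this.objects.get(path);
    if (!entry) throw new Error(`Object not found: ${path}`);
    return entry.data.slice();
  }

  async delete(path: string): Promise<boolean> {
    return this.objects.delete(path); // false when nothing was stored at this path
  }

  async exists(path: string): Promise<boolean> {
    return this.objects.has(path);
  }

  async list(prefix = ''): Promise<StorageObjectInfo[]> {
    return Array.from(this.objects.values())
      .map(entry => entry.info)
      .filter(info => info.name.startsWith(prefix));
  }

  async getSignedUrl(path: string, opts: SignedUrlOptions): Promise<string> {
    // Not a real signature: just enough structure for tests to assert on.
    const minutes = opts.expiresInMinutes ?? 60;
    return `memory://${this.name}/${path}?permissions=${opts.permissions}&expiresInMinutes=${minutes}`;
  }
}
```

A real adapter would also need to accept streams in `upload`, as the interface in 5.1 allows.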

### 5.3 Migration

The existing `@bytelyst/blob` package (162 lines) becomes the Azure Blob provider inside `@bytelyst/storage`. Consumers switch from:

```typescript
// Before
import { generateSasUrl, getContainerClient } from '@bytelyst/blob';
```

```typescript
// After
import { getStorage } from '@bytelyst/storage';
const bucket = getStorage().getBucket('audio');
const url = await bucket.getSignedUrl('user123/recording.wav', { permissions: 'read' });
```

**Python equivalent:** Refactor `src/cloud/blob_client.py` to use a `StorageProvider` ABC with `AzureBlobProvider` implementation.

---

## 6. Sprint 3: LLM Provider Abstraction

**Package:** `@bytelyst/llm`
**Effort:** 2 days
**Files changed:** `src/llm/text_cleaner.py`, `backend/src/clients/openai_client.py`, MindLyst `web/src/lib/llm.ts`, extraction-service config

### 6.1 Interface Design

```typescript
// packages/llm/src/types.ts

export interface LLMProvider {
  chatCompletion(req: ChatCompletionRequest): Promise<ChatCompletionResponse>;
  chatCompletionStream?(req: ChatCompletionRequest): AsyncIterable<string>;
  isConfigured(): boolean;
}

export interface ChatCompletionRequest {
  messages: Array<{ role: 'system' | 'user' | 'assistant'; content: string }>;
  temperature?: number;
  maxTokens?: number;
  model?: string; // override default model
}

export interface ChatCompletionResponse {
  content: string;
  usage?: { promptTokens: number; completionTokens: number };
}
```

### 6.2 Key Insight: MindLyst Already Has This Pattern

MindLyst `web/src/lib/llm.ts` already auto-detects Azure vs OpenAI based on env vars. This pattern should be promoted to a shared package.

**Provider implementations:**
- `AzureOpenAIProvider` — uses `api-key` header + deployment-scoped URL
- `OpenAIProvider` — uses `Authorization: Bearer` header + model param
- `GeminiProvider` — uses Google Generative AI SDK (future)
- `OllamaProvider` — for local development (future)
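
The difference between the first two providers comes down to the URL and the auth header. A rough sketch of that split, written as a pure function so it can be unit-tested without network access, might look like this. `LLMConfig`, `HttpRequest`, and `buildChatRequest` are hypothetical names, and treating `req.model` as a deployment override on Azure is an assumption.

```typescript
interface ChatMessage { role: 'system' | 'user' | 'assistant'; content: string }
interface ChatCompletionRequest {
  messages: ChatMessage[];
  temperature?: number;
  maxTokens?: number;
  model?: string;
}

type LLMConfig =
  | { provider: 'azure'; endpoint: string; apiKey: string; deployment: string; apiVersion: string }
  | { provider: 'openai'; apiKey: string; defaultModel: string; baseUrl?: string };

interface HttpRequest { url: string; headers: Record<string, string>; body: string }

function buildChatRequest(config: LLMConfig, req: ChatCompletionRequest): HttpRequest {
  // Fields shared by both wire formats; optional ones are omitted when unset.
  const common = {
    messages: req.messages,
    ...(req.temperature !== undefined ? { temperature: req.temperature } : {}),
    ...(req.maxTokens !== undefined ? { max_tokens: req.maxTokens } : {}),
  };

  if (config.provider === 'azure') {
    // Azure OpenAI: deployment-scoped URL, `api-key` header, no model in the body.
    const base = config.endpoint.replace(/\/+$/, '');
    const deployment = req.model ?? config.deployment;
    return {
      url: `${base}/openai/deployments/${encodeURIComponent(deployment)}/chat/completions` +
        `?api-version=${encodeURIComponent(config.apiVersion)}`,
      headers: { 'Content-Type': 'application/json', 'api-key': config.apiKey },
      body: JSON.stringify(common),
    };
  }

  // OpenAI direct: fixed path, bearer token, model passed as a body parameter.
  const base = (config.baseUrl ?? 'https://api.openai.com/v1').replace(/\/+$/, '');
  return {
    url: `${base}/chat/completions`,
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${config.apiKey}` },
    body: JSON.stringify({ model: req.model ?? config.defaultModel, ...common }),
  };
}
```

Each provider class could then wrap this function with a `fetch` call and map the JSON response onto `ChatCompletionResponse`.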

### 6.3 Python Migration

```python
# Before (text_cleaner.py)
from openai import AzureOpenAI
self._client = AzureOpenAI(azure_endpoint=endpoint, api_key=api_key, api_version="2024-10-21")

# After
from bytelyst.llm import create_llm_client
self._client = create_llm_client()  # reads LLM_PROVIDER, OPENAI_API_KEY, etc.
# Returns OpenAI() or AzureOpenAI() based on config — same API surface
```

The `openai` Python SDK already has a common interface between `OpenAI` and `AzureOpenAI`. The abstraction is just a factory function that picks the right class.

---

## 7. Sprint 4: Secrets Manager Abstraction

**Package:** Refactor existing `@bytelyst/config`
**Effort:** 1 day
**Files changed:** `packages/config/src/keyvault.ts`, `src/secrets/keyvault.py`

### 7.1 Key Insight: Already 90% Done

The current `resolveKeyVaultSecrets()` already:
- Skips if `AZURE_KEYVAULT_URL` is not set
- Falls back to env vars for each secret
- Logs warnings but doesn't throw

**Refactor:** Rename to `resolveSecrets()` with provider dispatch:

```typescript
// packages/config/src/secrets.ts

export interface SecretsProvider {
  getSecret(name: string): Promise<string | null>;
}

export async function resolveSecrets(
  secrets: SecretMapping[],
  opts?: { provider?: string },
): Promise<void> {
  const provider = opts?.provider || process.env.SECRETS_PROVIDER || 'env';

  switch (provider) {
    case 'azure-keyvault':
      return resolveFromAzureKeyVault(secrets);   // existing code
    case 'aws-secrets-manager':
      return resolveFromAWSSecretsManager(secrets); // future
    case 'gcp-secret-manager':
      return resolveFromGCPSecretManager(secrets);  // future
    case 'doppler':
      return resolveFromDoppler(secrets);            // future
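    case 'env':
      return; // All secrets already in env, nothing to resolve
    default:
      // Fail fast on a typo rather than silently skipping secret resolution.
      throw new Error(`Unknown SECRETS_PROVIDER: ${provider}`);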
  }
}
```

### 7.2 Rename Azure-Prefixed Env Vars

The current env vars have Azure-specific names. Add **generic aliases** that fall back to the Azure names:

```typescript
// packages/config/src/env-aliases.ts

export const ENV_ALIASES: Record<string, string[]> = {
  // Generic name → fallback names (checked in order)
  'BLOB_CONNECTION_STRING': ['AZURE_BLOB_CONNECTION_STRING'],
  'BLOB_ACCOUNT_NAME': ['AZURE_BLOB_ACCOUNT_NAME'],
  'BLOB_ACCOUNT_KEY': ['AZURE_BLOB_ACCOUNT_KEY'],
  'SPEECH_KEY': ['AZURE_SPEECH_KEY'],
  'SPEECH_REGION': ['AZURE_SPEECH_REGION'],
  'LLM_API_KEY': ['AZURE_OPENAI_KEY', 'OPENAI_API_KEY'],
  'LLM_ENDPOINT': ['AZURE_OPENAI_ENDPOINT', 'OPENAI_BASE_URL'],
  'LLM_MODEL': ['AZURE_OPENAI_DEPLOYMENT', 'OPENAI_MODEL'],
};

export function getEnv(name: string): string | undefined {
  if (process.env[name]) return process.env[name];
  const aliases = ENV_ALIASES[name];
  if (aliases) {
    for (const alias of aliases) {
      if (process.env[alias]) return process.env[alias];
    }
  }
  return undefined;
}
```

This means existing `.env` files with `AZURE_*` names continue to work. New deployments can use generic names.

---

## 8. Sprint 5: Speech Provider Abstraction

**Package:** `@bytelyst/speech`
**Effort:** 3–4 days
**Files changed:** `src/audio/azure_stt.py`, `iosApp/Services/AzureSpeechTranscriber.swift`

### 8.1 Interface Design (Python)

```python
# bytelyst/speech/types.py

from abc import ABC, abstractmethod
from typing import Callable, Optional

class SpeechTranscriber(ABC):
    """Cloud-agnostic streaming speech-to-text interface."""

    @abstractmethod
    def start(self, language: str = "en-US", languages: list[str] | None = None) -> None:
        """Start continuous recognition."""

    @abstractmethod
    def stop(self) -> None:
        """Stop recognition."""

    @abstractmethod
    def push_audio(self, data: bytes) -> None:
        """Push raw audio data (PCM 16-bit, 16kHz, mono)."""

    @abstractmethod
    def on_partial(self, callback: Callable[[str], None]) -> None:
        """Register callback for partial (interim) results."""

    @abstractmethod
    def on_final(self, callback: Callable[[str], None]) -> None:
        """Register callback for final (committed) results."""

    @abstractmethod
    def on_error(self, callback: Callable[[Exception], None]) -> None:
        """Register callback for errors."""

    @abstractmethod
    def set_vocabulary(self, phrases: list[str]) -> None:
        """Set custom vocabulary / phrase hints."""
```

### 8.2 Provider Implementations

```python
# bytelyst/speech/azure_provider.py
# Wraps existing azure_stt.py code — PushAudioInputStream, SpeechRecognizer, events

# bytelyst/speech/google_provider.py (future)
# Uses google-cloud-speech streaming_recognize

# bytelyst/speech/deepgram_provider.py (future)
# Uses Deepgram WebSocket API

# bytelyst/speech/whisper_provider.py (future)
# Uses faster-whisper for local transcription (already in requirements.txt!)
```

### 8.3 Swift Protocol (iOS)

```swift
// Shared/Speech/SpeechTranscriberProtocol.swift

protocol SpeechTranscriber {
    func start(language: String, languages: [String]?) async throws
    func stop() async
    func onPartial(_ handler: @escaping (String) -> Void)
    func onFinal(_ handler: @escaping (String) -> Void)
    func onError(_ handler: @escaping (Error) -> Void)
    func setVocabulary(_ phrases: [String])
}

// Shared/Speech/AzureSpeechTranscriber.swift — existing code, implements protocol
// Shared/Speech/AppleSpeechTranscriber.swift — future, uses Apple's SFSpeechRecognizer
```

### 8.4 Note on Complexity

Speech is the hardest abstraction because:
- Azure Speech SDK has a unique push-stream architecture
- Google Cloud Speech uses gRPC streaming
- Deepgram uses WebSockets
- Each has different audio format requirements and event models

The abstraction hides these differences behind a unified push-audio + callback interface. The Azure implementation wraps existing code with zero behavior changes.

---

## 9. Sprint 6: Push Notification Abstraction

**Package:** `@bytelyst/push`
**Effort:** 1 day
**Files changed:** Platform-service push-triggers module

### 9.1 Interface Design

```typescript
export interface PushProvider {
  send(notification: PushNotification): Promise<PushResult>;
  sendBatch(notifications: PushNotification[]): Promise<PushResult[]>;
}

export interface PushNotification {
  deviceToken: string;
  platform: 'ios' | 'android' | 'web';
  title: string;
  body: string;
  data?: Record<string, string>;
  badge?: number;
}
```

Implementations: `AzureNotificationHubProvider`, `FirebaseProvider` (future), `ExpoProvider` (for NomGap), `OneSignalProvider` (future).
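
Following design rule 3, an in-memory provider for tests could be as small as the sketch below. `PushResult` is not defined in this roadmap, so the shape used here (`deviceToken`, `success`, optional `error`) is a placeholder assumption, as is the `MemoryPushProvider` name.

```typescript
interface PushNotification {
  deviceToken: string;
  platform: 'ios' | 'android' | 'web';
  title: string;
  body: string;
  data?: Record<string, string>;
  badge?: number;
}

// Hypothetical result shape; replace with the real PushResult once it is specified.
interface PushResult {
  deviceToken: string;
  success: boolean;
  error?: string;
}

interface PushProvider {
  send(notification: PushNotification): Promise<PushResult>;
  sendBatch(notifications: PushNotification[]): Promise<PushResult[]>;
}

class MemoryPushProvider implements PushProvider {
  /** Notifications accepted so far, in send order, for assertions in tests. */
  readonly sent: PushNotification[] = [];

  async send(notification: PushNotification): Promise<PushResult> {
    if (notification.deviceToken.length === 0) {
      return { deviceToken: notification.deviceToken, success: false, error: 'empty device token' };
    }
    this.sent.push(notification);
    return { deviceToken: notification.deviceToken, success: true };
  }

  async sendBatch(notifications: PushNotification[]): Promise<PushResult[]> {
    // Failures are reported per notification, so one bad token does not reject the batch.
    return Promise.all(notifications.map(n => this.send(n)));
  }
}
```

Reporting failures per notification rather than rejecting the whole batch is a design choice worth settling before the real providers are written, since the underlying services differ in how they surface partial failures.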

---

## 10. Sprint 7: Monitoring & Telemetry Abstraction

**Effort:** 0.5 days (mostly done already)

The ecosystem already has cloud-agnostic monitoring:
- **Custom telemetry** via `@bytelyst/telemetry-client` → platform-service → Cosmos
- **Loki + Grafana** in `services/monitoring/`
- **Health checks** via `/health` endpoints on all services

**Remaining work:**
- Remove `opencensus-ext-azure` from Python requirements (optional, only used for App Insights)
- Ensure all structured logging uses `pino` (TS) or `structlog` (Python) — no Azure-specific loggers

---

## 11. Migration Effort After Refactor

Once all sprints are complete, here's how much work each cloud migration scenario requires:

### Scenario: Switch DB from Cosmos to MongoDB Atlas

| Step | Effort | Description |
|------|--------|-------------|
| Implement `MongoDatastoreProvider` | 1 day | ~200 lines — translate FilterMap to MongoDB find() |
| Set `DB_PROVIDER=mongo` + `MONGO_URI=...` | 5 minutes | Config change |
| Run data migration script | 2–4 hours | Export Cosmos JSON → import to MongoDB |
| Run full test suite | 30 minutes | Verify all 1,029+ tests pass |
| **Total** | **~1.5 days** | vs 3–5 weeks without abstraction |

### Scenario: Switch Storage from Azure Blob to S3

| Step | Effort | Description |
|------|--------|-------------|
| Implement `S3StorageProvider` | 0.5 day | ~100 lines |
| Set `STORAGE_PROVIDER=s3` + `AWS_*` env vars | 5 minutes | Config change |
| Migrate blobs | 1–2 hours | azcopy or rclone |
| **Total** | **~0.5 days** | vs 2–3 days without abstraction |

### Scenario: Switch LLM from Azure OpenAI to OpenAI Direct

| Step | Effort | Description |
|------|--------|-------------|
| Set `LLM_PROVIDER=openai` + `OPENAI_API_KEY=...` | 5 minutes | Config change only |
| Remove `AZURE_OPENAI_*` env vars | 5 minutes | Cleanup |
| **Total** | **10 minutes** | Already near-zero today |

### Scenario: Full Cloud Migration (Azure → AWS)

| Step | Effort | Description |
|------|--------|-------------|
| Implement MongoDB/DynamoDB provider | 1–2 days | |
| Implement S3 storage provider | 0.5 days | |
| Implement AWS Secrets Manager provider | 0.5 days | |
| Switch LLM to OpenAI direct | 10 minutes | |
| Implement Google STT or AWS Transcribe | 2–3 days | Speech is still the hardest |
| Implement SNS push provider | 0.5 days | |
| Data migration + testing | 2–3 days | |
| **Total** | **~7–10 days** | vs 4–8 weeks without abstraction |

---

## 12. Testing Strategy

### 12.1 Provider-Agnostic Tests

Every repository test should work against **any** provider. The test setup picks the provider:

```typescript
// Test setup: use in-memory provider
import { setTestProvider } from '@bytelyst/datastore/testing';

beforeAll(() => {
  setTestProvider('memory');  // Fast, no network, deterministic
});
```
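
One way to keep repository tests provider-agnostic is to phrase them as a contract that takes a factory and runs the same assertions against whatever it returns. The sketch below assumes a minimal `DocumentCollection` shape with `upsert`, `findById`, and `delete`; those method names, `MemoryCollection`, and `runCollectionContract` are illustrative and may not match the real `@bytelyst/datastore` API.

```typescript
// Assumed minimal contract for this sketch; the real DocumentCollection
// interface in @bytelyst/datastore may expose different methods.
interface DocumentCollection<T extends { id: string }> {
  upsert(doc: T): Promise<void>;
  findById(id: string): Promise<T | undefined>;
  delete(id: string): Promise<void>;
}

type Note = { id: string; text: string };

// Hypothetical Map-backed implementation, used as the fast default in tests.
class MemoryCollection<T extends { id: string }> implements DocumentCollection<T> {
  private readonly docs = new Map<string, T>();
  async upsert(doc: T): Promise<void> { this.docs.set(doc.id, { ...doc }); }
  async findById(id: string): Promise<T | undefined> { return this.docs.get(id); }
  async delete(id: string): Promise<void> { this.docs.delete(id); }
}

// The same assertions run against whichever provider the factory returns.
async function runCollectionContract(
  providerName: string,
  create: () => DocumentCollection<Note>,
): Promise<void> {
  const notes = create();
  const expect = (ok: boolean, what: string): void => {
    if (!ok) throw new Error(`[${providerName}] ${what}`);
  };

  await notes.upsert({ id: "n1", text: "first" });
  await notes.upsert({ id: "n1", text: "second" }); // second upsert overwrites
  expect((await notes.findById("n1"))?.text === "second", "upsert should overwrite");

  await notes.delete("n1");
  expect((await notes.findById("n1")) === undefined, "delete should remove the document");
}
```

Calling `runCollectionContract("memory", () => new MemoryCollection<Note>())` exercises the in-memory provider; an integration suite could pass a Cosmos-backed factory to the same function.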

### 12.2 Provider Integration Tests

Separate test suites verify each provider works correctly:

```
__tests__/
  datastore/
    cosmos.integration.test.ts    # Runs against real Cosmos (CI only)
    mongo.integration.test.ts     # Runs against real MongoDB (CI only)
    memory.test.ts                # Always runs — verifies memory provider
```

### 12.3 Migration Verification Checklist

For each sprint, before merging:
1. All existing tests pass (no regressions)
2. New interface tests pass with all implemented providers
3. Manual smoke test against Azure (dev environment)
4. No new `@azure/*` imports in application code (only in provider files)

### 12.4 CI Gate

Add a lint rule to prevent direct Azure SDK imports outside of provider directories:

```bash
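# scripts/check-cloud-agnostic.sh
# Fail if any TypeScript file under services/ or dashboards/ imports @azure/*
# outside a providers/ directory.
# rg exits 0 on a match, 1 on no match, and 2 on error; only 1 means PASS.
rg '@azure/' services/ dashboards/ --type ts \
  --glob='!**/providers/**' --glob='!**/node_modules/**'
case $? in
  0) echo "FAIL: Direct Azure SDK import found outside provider layer"; exit 1 ;;
  1) echo "PASS: No direct Azure imports in application code" ;;
  *) echo "ERROR: ripgrep failed to run"; exit 2 ;;
esac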
```

---

## 13. Env Var Naming Convention

### Current (Azure-specific)

```bash
COSMOS_ENDPOINT=https://cosmos-mywisprai.documents.azure.com:443/
COSMOS_KEY=...
COSMOS_DATABASE=lysnrai
AZURE_BLOB_CONNECTION_STRING=...
AZURE_BLOB_ACCOUNT_NAME=bytelystblobs
AZURE_BLOB_ACCOUNT_KEY=...
AZURE_OPENAI_ENDPOINT=...
AZURE_OPENAI_KEY=...
AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini
AZURE_SPEECH_KEY=...
AZURE_SPEECH_REGION=eastus
AZURE_KEYVAULT_URL=...
```

### Target (generic with Azure fallbacks)

```bash
# ── Provider Selection ────────────────────────────
DB_PROVIDER=cosmos              # cosmos | mongo | memory
STORAGE_PROVIDER=azure          # azure | s3 | r2 | memory
LLM_PROVIDER=azure              # azure | openai | gemini
SECRETS_PROVIDER=azure-keyvault # azure-keyvault | aws | doppler | env
SPEECH_PROVIDER=azure           # azure | google | deepgram | whisper
PUSH_PROVIDER=azure-nh          # azure-nh | firebase | expo

# ── Database (provider-specific) ──────────────────
# Cosmos (when DB_PROVIDER=cosmos):
COSMOS_ENDPOINT=...
COSMOS_KEY=...
COSMOS_DATABASE=lysnrai
# MongoDB (when DB_PROVIDER=mongo):
# MONGO_URI=mongodb+srv://...

# ── Storage (provider-specific) ───────────────────
# Azure (when STORAGE_PROVIDER=azure):
AZURE_BLOB_CONNECTION_STRING=...
# S3 (when STORAGE_PROVIDER=s3):
# AWS_ACCESS_KEY_ID=...
# AWS_SECRET_ACCESS_KEY=...
# S3_BUCKET_PREFIX=bytelyst-

# ── LLM (provider-specific) ──────────────────────
# Azure OpenAI:
AZURE_OPENAI_ENDPOINT=...
AZURE_OPENAI_KEY=...
AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini
# OpenAI direct:
# OPENAI_API_KEY=...
# OPENAI_MODEL=gpt-4o-mini

# ── Secrets (optional) ───────────────────────────
AZURE_KEYVAULT_URL=...   # only if SECRETS_PROVIDER=azure-keyvault

# ── Speech ────────────────────────────────────────
AZURE_SPEECH_KEY=...
AZURE_SPEECH_REGION=eastus
```

**Backward compatibility:** All existing `COSMOS_*` and `AZURE_*` env vars continue to work. The generic `*_PROVIDER` vars are additive; when one is unset, the corresponding Azure provider remains the default.
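
The selection logic implied by this convention could look roughly like the sketch below. `resolveProviders` and `PROVIDER_OPTIONS` are hypothetical names, and treating the first listed value as the fallback when a variable is unset is an assumption drawn from the "Azure fallbacks" wording, not a confirmed implementation detail.

```typescript
// Hypothetical helper; these names are illustrative, not an existing @bytelyst API.
type ProviderKey =
  | "DB_PROVIDER" | "STORAGE_PROVIDER" | "LLM_PROVIDER"
  | "SECRETS_PROVIDER" | "SPEECH_PROVIDER" | "PUSH_PROVIDER";

// Allowed values per selector, copied from the target env listing.
// Assumption: the first entry (the Azure option) is used when the variable is unset.
const PROVIDER_OPTIONS: Record<ProviderKey, readonly [string, ...string[]]> = {
  DB_PROVIDER: ["cosmos", "mongo", "memory"],
  STORAGE_PROVIDER: ["azure", "s3", "r2", "memory"],
  LLM_PROVIDER: ["azure", "openai", "gemini"],
  SECRETS_PROVIDER: ["azure-keyvault", "aws", "doppler", "env"],
  SPEECH_PROVIDER: ["azure", "google", "deepgram", "whisper"],
  PUSH_PROVIDER: ["azure-nh", "firebase", "expo"],
};

type Env = Record<string, string | undefined>;

function resolveProviders(env: Env): Record<ProviderKey, string> {
  const resolved = {} as Record<ProviderKey, string>;
  for (const key of Object.keys(PROVIDER_OPTIONS) as ProviderKey[]) {
    const options = PROVIDER_OPTIONS[key];
    // Unset or empty falls back to the Azure default.
    const value = env[key] || options[0];
    if (!options.includes(value)) {
      throw new Error(`${key}=${value} is not one of: ${options.join(" | ")}`);
    }
    resolved[key] = value;
  }
  return resolved;
}
```

Validating the value at startup means a typo such as `DB_PROVIDER=mongodb` fails fast rather than silently selecting the fallback.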

---

## 14. Risk Mitigation

| Risk | Mitigation |
|------|-----------|
| **FilterMap can't express complex Cosmos SQL** | Add `rawQuery()` escape hatch for edge cases. Track usage — if >5% of queries need it, expand FilterMap operators |
| **Performance regression from abstraction layer** | Benchmark critical queries before/after. The abstraction adds one function call — negligible |
| **Team unfamiliar with new patterns** | Each sprint includes updating AGENTS.md with new conventions. Old pattern (direct Cosmos) still works during migration |
| **In-memory provider behaves differently** | Integration test suite runs against real Cosmos in CI. Memory provider is for unit tests only |
| **Stale data during DB migration** | Use dual-write pattern: write to both old and new provider during transition. Read from new, fall back to old |
| **Sprint 1 takes too long** | The 38 repository files can be migrated incrementally — even 5 files at a time is progress. Old and new patterns coexist |
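
The dual-write mitigation in the table above can be sketched as a thin wrapper around two stores. The `KeyValueStore` contract, `MemoryStore`, and `DualWriteStore` are assumptions made for illustration; the real datastore interface will likely have a richer surface, and a production version would also need a policy for partial write failures.

```typescript
// Minimal store contract assumed for this sketch; the real datastore
// interface in @bytelyst/datastore may differ.
interface KeyValueStore<T> {
  get(id: string): Promise<T | undefined>;
  put(id: string, doc: T): Promise<void>;
}

// Map-backed store, standing in for the old and new providers.
class MemoryStore<T> implements KeyValueStore<T> {
  private readonly docs = new Map<string, T>();
  async get(id: string): Promise<T | undefined> { return this.docs.get(id); }
  async put(id: string, doc: T): Promise<void> { this.docs.set(id, doc); }
}

// Writes go to both stores; reads prefer the new store and fall back to the old.
class DualWriteStore<T> implements KeyValueStore<T> {
  private readonly oldStore: KeyValueStore<T>;
  private readonly newStore: KeyValueStore<T>;

  constructor(oldStore: KeyValueStore<T>, newStore: KeyValueStore<T>) {
    this.oldStore = oldStore;
    this.newStore = newStore;
  }

  async put(id: string, doc: T): Promise<void> {
    // Promise.all rejects if either write fails, so the caller can retry.
    await Promise.all([this.oldStore.put(id, doc), this.newStore.put(id, doc)]);
  }

  async get(id: string): Promise<T | undefined> {
    const fresh = await this.newStore.get(id);
    return fresh !== undefined ? fresh : this.oldStore.get(id);
  }
}
```

Documents that exist only in the old store stay readable through the fallback path until the data migration script has copied them across.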

---

## Appendix: Interface Specifications

### A.1 `@bytelyst/datastore` — Package Structure

```
packages/datastore/
├── src/
│   ├── index.ts               # Public exports
│   ├── types.ts               # All interfaces (DocumentCollection, DatastoreProvider, etc.)
│   ├── factory.ts             # createDatastoreProvider() factory
│   ├── filter.ts              # FilterMap → provider-specific query translation
│   ├── providers/
│   │   ├── cosmos.ts          # CosmosDatastoreProvider + CosmosCollection
│   │   ├── mongo.ts           # MongoDatastoreProvider + MongoCollection (stub)
│   │   └── memory.ts          # MemoryDatastoreProvider + MemoryCollection
│   └── testing.ts             # Test helpers (setTestProvider, seedCollection, etc.)
├── package.json               # peer deps: @azure/cosmos (optional), mongodb (optional)
├── tsconfig.json
└── vitest.config.ts
```

### A.2 `@bytelyst/storage` — Package Structure

```
packages/storage/
├── src/
│   ├── index.ts
│   ├── types.ts               # StorageProvider, StorageBucket, SignedUrlOptions
│   ├── factory.ts             # createStorageProvider()
│   ├── providers/
│   │   ├── azure-blob.ts      # Wraps existing @bytelyst/blob code
│   │   ├── s3.ts              # AWS S3 (stub)
│   │   └── memory.ts          # In-memory for testing
│   └── testing.ts
├── package.json
└── tsconfig.json
```

### A.3 `@bytelyst/llm` — Package Structure

```
packages/llm/
├── src/
│   ├── index.ts
│   ├── types.ts               # LLMProvider, ChatCompletionRequest/Response
│   ├── factory.ts             # createLLMProvider()
│   ├── providers/
│   │   ├── azure-openai.ts    # AzureOpenAI endpoint + api-key auth
│   │   ├── openai.ts          # OpenAI direct + Bearer auth
│   │   └── gemini.ts          # Google Generative AI (stub)
│   └── testing.ts             # MockLLMProvider for tests
├── package.json
└── tsconfig.json
```

### A.4 Complete Interface: FilterMap Operators

```typescript
// Exact match
{ field: value }

// Comparison
{ field: { $gt: value } }       // >
{ field: { $gte: value } }      // >=
{ field: { $lt: value } }       // <
{ field: { $lte: value } }      // <=
{ field: { $ne: value } }       // !=

// Existence
{ field: { $exists: true } }    // IS_DEFINED(c.field)
{ field: { $exists: false } }   // NOT IS_DEFINED(c.field)

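// String
{ field: { $startsWith: 'prefix' } }  // STARTSWITH(c.field, 'prefix')
{ field: { $contains: 'substr' } }    // CONTAINS(c.field, 'substr')

// Array ($contains is overloaded: its meaning depends on the field type)
{ field: { $contains: value } }    // ARRAY_CONTAINS
{ field: { $in: [v1, v2, v3] } }   // IN operator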

// Logical (for complex queries)
{ $or: [{ field1: v1 }, { field2: v2 }] }
```

**Cosmos adapter** translates each operator to SQL:
- `{ $gte: v }` → `c.field >= @pN`
- `{ $exists: false }` → `NOT IS_DEFINED(c.field)`
- `{ $contains: v }` on array → `ARRAY_CONTAINS(c.field, @pN)`
- `{ $in: [...] }` → `c.field IN (@pN, @pM, ...)`
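
A minimal sketch of that translation follows, covering exact match, the comparison operators, `$exists`, and `$in` only. The type names and `toCosmosQuery` are hypothetical; the returned `{ query, parameters }` object is shaped like the query spec that `@azure/cosmos` accepts, but the real adapter may be structured differently.

```typescript
// Hypothetical types; the real FilterMap in @bytelyst/datastore may differ.
type Scalar = string | number | boolean | null;
type Operators = {
  $gt?: Scalar; $gte?: Scalar; $lt?: Scalar; $lte?: Scalar; $ne?: Scalar;
  $exists?: boolean; $in?: Scalar[];
};
type FilterMap = Record<string, Scalar | Operators>;
type SqlParameter = { name: string; value: Scalar };

const COMPARATORS = new Map<string, string>([
  ["$gt", ">"], ["$gte", ">="], ["$lt", "<"], ["$lte", "<="], ["$ne", "!="],
]);

function toCosmosQuery(filter: FilterMap): { query: string; parameters: SqlParameter[] } {
  const clauses: string[] = [];
  const parameters: SqlParameter[] = [];
  // Bind a value as the next numbered parameter and return its placeholder.
  const bind = (value: Scalar): string => {
    const name = `@p${parameters.length}`;
    parameters.push({ name, value });
    return name;
  };

  for (const [field, cond] of Object.entries(filter)) {
    const path = `c.${field}`; // field names are assumed to be trusted identifiers
    if (cond === null || typeof cond !== "object") {
      clauses.push(`${path} = ${bind(cond)}`); // exact match
      continue;
    }
    for (const [op, operand] of Object.entries(cond)) {
      const sqlOp = COMPARATORS.get(op);
      if (sqlOp) {
        clauses.push(`${path} ${sqlOp} ${bind(operand as Scalar)}`);
      } else if (op === "$exists") {
        clauses.push(`${operand ? "" : "NOT "}IS_DEFINED(${path})`);
      } else if (op === "$in") {
        const names = (operand as Scalar[]).map((v) => bind(v));
        clauses.push(`${path} IN (${names.join(", ")})`);
      } else {
        throw new Error(`Unsupported operator: ${op}`);
      }
    }
  }
  const where = clauses.length > 0 ? ` WHERE ${clauses.join(" AND ")}` : "";
  return { query: `SELECT * FROM c${where}`, parameters };
}
```

Values always travel as bound parameters rather than being concatenated into the SQL text, which keeps user input out of the query string.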

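**MongoDB adapter** passes the comparison operators, `$exists`, `$in`, and `$or` through unchanged, since MQL uses the same syntax. `$startsWith` and `$contains` are not MQL operators, so the adapter must translate them (for example to `$regex` for string matches and to an array-membership match for arrays).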
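
As a rough illustration of the MongoDB side, the sketch below passes MQL-compatible operators through and rewrites `$startsWith` and `$contains`, which MQL does not define. `toMongoFilter` and `escapeRegex` are hypothetical names. Treating a string operand of `$contains` as a substring match is an assumption; an array of strings would need schema knowledge to disambiguate.

```typescript
// Hypothetical helper; operator names follow the FilterMap listing above.
type MongoFilter = Record<string, unknown>;

// Escape regex metacharacters so the text is matched literally.
const escapeRegex = (text: string): string =>
  text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

function toMongoFilter(filter: Record<string, unknown>): MongoFilter {
  const out: MongoFilter = {};
  for (const [field, cond] of Object.entries(filter)) {
    if (field === "$or" && Array.isArray(cond)) {
      // Translate each branch recursively.
      out["$or"] = cond.map((branch) => toMongoFilter(branch as Record<string, unknown>));
    } else if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
      const translated: Record<string, unknown> = {};
      for (const [op, operand] of Object.entries(cond)) {
        if (op === "$startsWith") {
          translated["$regex"] = `^${escapeRegex(String(operand))}`;
        } else if (op === "$contains" && typeof operand === "string") {
          translated["$regex"] = escapeRegex(operand); // assumed substring match
        } else if (op === "$contains") {
          translated["$elemMatch"] = { $eq: operand }; // array membership
        } else {
          translated[op] = operand; // $gt, $gte, $lt, $lte, $ne, $exists, $in
        }
      }
      out[field] = translated;
    } else {
      out[field] = cond; // exact match
    }
  }
  return out;
}
```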

**Memory adapter** evaluates operators with simple JS comparisons.
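
A sketch of what "simple JS comparisons" might look like is given below. `matchesFilter` and `matchesCondition` are hypothetical names, not the actual `MemoryCollection` code. One semantic choice is an assumption worth flagging: here `$ne` matches documents where the field is missing (as MongoDB does), whereas Cosmos SQL excludes them, so the real memory provider would need to pick one behaviour deliberately.

```typescript
// Hypothetical sketch of operator evaluation for an in-memory provider.
type Doc = Record<string, unknown>;
type Filter = Record<string, unknown>;

function matchesCondition(actual: unknown, cond: unknown): boolean {
  if (cond === null || typeof cond !== "object" || Array.isArray(cond)) {
    return actual === cond; // exact match
  }
  return Object.entries(cond).every(([op, operand]) => {
    switch (op) {
      // Relational operators work for numbers and for strings such as ISO dates.
      case "$gt": return (actual as number) > (operand as number);
      case "$gte": return (actual as number) >= (operand as number);
      case "$lt": return (actual as number) < (operand as number);
      case "$lte": return (actual as number) <= (operand as number);
      case "$ne": return actual !== operand; // also true when the field is missing
      case "$exists": return (actual !== undefined) === operand;
      case "$startsWith":
        return typeof actual === "string" && actual.startsWith(String(operand));
      case "$contains":
        if (typeof actual === "string") return actual.includes(String(operand));
        return Array.isArray(actual) && actual.includes(operand);
      case "$in": return Array.isArray(operand) && operand.includes(actual);
      default: throw new Error(`Unsupported operator: ${op}`);
    }
  });
}

function matchesFilter(doc: Doc, filter: Filter): boolean {
  // Top-level fields are ANDed together; $or succeeds if any branch matches.
  return Object.entries(filter).every(([field, cond]) =>
    field === "$or" && Array.isArray(cond)
      ? cond.some((branch) => matchesFilter(doc, branch as Filter))
      : matchesCondition(doc[field], cond),
  );
}
```

A memory collection could then implement queries as `docs.filter((d) => matchesFilter(d, filter))`.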

---

## Summary

| Sprint | What | Days | After This Sprint... |
|--------|------|------|---------------------|
| 1 | Database abstraction | 5–7 | DB swap = implement 1 adapter (~200 LOC) + config change |
| 2 | Storage abstraction | 2 | Blob swap = implement 1 adapter (~100 LOC) + config change |
| 3 | LLM abstraction | 2 | LLM swap = config change only (10 minutes) |
| 4 | Secrets abstraction | 1 | Secrets swap = config change only |
| 5 | Speech abstraction | 3–4 | Speech swap = implement 1 adapter (~300 LOC) |
| 6 | Push abstraction | 1 | Push swap = implement 1 adapter (~50 LOC) |
| 7 | Monitoring cleanup | 0.5 | Already cloud-agnostic |
| **Total** | | **~15–17 days** | **Full cloud migration = ~7–10 days instead of 4–8 weeks** |

The key insight: **~80% of the migration effort sits in the database layer, which Sprint 1 addresses**. If you only do one sprint, do that one. Everything else is comparatively easy.

---

*Document generated by automated codebase analysis. Companion to `CLOUD_PROVIDER_MIGRATION_ANALYSIS.md`. Review as the codebase evolves.*