Cloud-Agnostic Refactor Roadmap — ByteLyst Ecosystem

Author: AI Analysis (Cascade)
Date: 2026-03-01
Companion doc: CLOUD_PROVIDER_MIGRATION_ANALYSIS.md
Goal: Refactor the codebase so it continues to work on Azure today, while switching to any other cloud provider takes minimal effort (days, not weeks).


Table of Contents

  1. Philosophy
  2. Current State vs Target State
  3. Sprint Plan Overview
  4. Sprint 1: Database Abstraction Layer
  5. Sprint 2: Storage Abstraction Layer
  6. Sprint 3: LLM Provider Abstraction
  7. Sprint 4: Secrets Manager Abstraction
  8. Sprint 5: Speech Provider Abstraction
  9. Sprint 6: Push Notification Abstraction
  10. Sprint 7: Monitoring & Telemetry Abstraction
  11. Migration Effort After Refactor
  12. Testing Strategy
  13. Env Var Naming Convention
  14. Risk Mitigation
  15. Appendix: Interface Specifications

1. Philosophy

Core Principle: Provider-Agnostic Interfaces, Provider-Specific Implementations

Application Code (routes, business logic)
        │
        ▼
  @bytelyst/* interfaces  ◄── Cloud-agnostic contracts
        │
        ▼
  Provider implementations  ◄── Azure today, swap tomorrow
  ├── cosmos-provider/       (Azure Cosmos DB)
  ├── mongo-provider/        (MongoDB Atlas — future)
  ├── s3-provider/           (AWS S3 — future)
  └── ...

Design Rules

  1. Application code NEVER imports cloud SDKs — only @bytelyst/* interfaces
  2. Provider chosen at startup via env var: DB_PROVIDER=cosmos, STORAGE_PROVIDER=azure, etc.
  3. All interfaces have an in-memory mock — for testing without any cloud dependency
  4. Zero breaking changes — every sprint keeps all existing tests passing
  5. Incremental adoption — modules migrate one at a time, old and new patterns coexist

What This Is NOT

  • This is not a migration to another cloud — Azure continues to be the production provider
  • This is not a rewrite — it's a series of refactors that insert interfaces between app code and cloud SDKs
  • This is not over-engineering: each interface is thin (30-60 lines) and maps directly to patterns already in the codebase

2. Current State vs Target State

Current: Direct Azure SDK Usage

                    38 repository.ts files
                    ┌──────────────────────┐
routes.ts ────────► │ container()           │
                    │   .items.query(SQL)   │ ◄── @azure/cosmos types leak everywhere
                    │   .items.create(doc)  │
                    │   .item(id,pk).read() │
                    └──────────────────────┘
                              │
                              ▼
                    @bytelyst/cosmos (client.ts)
                              │
                              ▼
                       @azure/cosmos SDK

Problems:

  • 38 platform-service repository files write raw Cosmos SQL queries
  • 6 additional repository files in dashboards + MindLyst web
  • Blob, Speech, OpenAI all have direct Azure SDK imports
  • Switching DB means rewriting 44+ files

Target: Provider-Agnostic Interfaces

                    38 repository.ts files
                    ┌──────────────────────────┐
routes.ts ────────► │ collection.findMany({     │
                    │   filter: {productId},    │ ◄── Cloud-agnostic API
                    │   sort: {createdAt: -1},  │
                    │   limit: 20,              │
                    │ })                         │
                    └──────────────────────────┘
                              │
                              ▼
                    @bytelyst/datastore (interface)
                              │
                    ┌─────────┼─────────┐
                    ▼         ▼         ▼
               CosmosAdapter MongoAdapter MemoryAdapter
               (Azure)      (MongoDB)    (Testing)
                    │
                    ▼
               @azure/cosmos SDK

Benefits:

  • Repositories use a generic query API — no SQL strings, no Azure types
  • Switching provider = implement a new adapter (~200 lines) + change env var
  • In-memory adapter makes tests fast and cloud-free
  • Azure continues to work exactly as before

3. Sprint Plan Overview

| Sprint | Package / Scope | Effort | Files Changed | Risk |
| --- | --- | --- | --- | --- |
| 1 | @bytelyst/datastore: DB abstraction | 5-7 days | 44 repository files + 1 new package | Medium |
| 2 | @bytelyst/storage: Blob/Object abstraction | 2 days | 3 files + 1 new package | Low |
| 3 | @bytelyst/llm: LLM provider abstraction | 2 days | 4 files + 1 new package | Low |
| 4 | @bytelyst/secrets: Secrets manager abstraction | 1 day | 2 files (refactor existing) | Very Low |
| 5 | @bytelyst/speech: Speech STT abstraction | 3-4 days | 3 files + 1 new package | Medium |
| 6 | @bytelyst/push: Push notification abstraction | 1 day | 1 file + 1 new package | Very Low |
| 7 | Monitoring/Telemetry cleanup | 0.5 days | Already done (custom telemetry) | None |
| Total | | ~15-17 days | ~55 files | |

Priority Order

Sprint 1 (DB) ──► Sprint 2 (Storage) ──► Sprint 3 (LLM) ──► Sprint 4 (Secrets)
    ▲ HIGHEST ROI         EASY               EASY              TRIVIAL
    │
    └── 80% of migration effort lives here. Do this first.

Sprint 5 (Speech) ──► Sprint 6 (Push) ──► Sprint 7 (Monitoring)
    MEDIUM               LOW PRIORITY        ALREADY DONE

4. Sprint 1: Database Abstraction Layer

Package: @bytelyst/datastore Effort: 5-7 days

This is the most important sprint: it eliminates 80% of cloud lock-in.

4.1 Interface Design

// packages/datastore/src/types.ts

/** A cloud-agnostic document collection (like a Cosmos container or Mongo collection). */
export interface DocumentCollection<T extends BaseDocument = BaseDocument> {
  /** Find a single document by ID + partition key. */
  findById(id: string, partitionKey: string): Promise<T | null>;

  /** Find multiple documents matching a filter. */
  findMany(opts: FindManyOptions): Promise<T[]>;

  /** Find one document matching a filter. */
  findOne(opts: FindOneOptions): Promise<T | null>;

  /** Count documents matching a filter. */
  count(filter: FilterMap): Promise<number>;

  /** Insert a new document. */
  create(doc: T): Promise<T>;

  /** Replace an entire document (full overwrite). */
  replace(id: string, partitionKey: string, doc: T): Promise<T>;

  /** Upsert: create if not exists, replace if exists. */
  upsert(doc: T): Promise<T>;

  /** Delete a document by ID + partition key. */
  delete(id: string, partitionKey: string): Promise<boolean>;

  /** Run an aggregation (COUNT, SUM, GROUP BY). */
  aggregate<R = unknown>(opts: AggregateOptions): Promise<R[]>;
}

export interface BaseDocument {
  id: string;
  [key: string]: unknown;
}

export interface FindManyOptions {
  filter: FilterMap;
  sort?: SortMap;
  limit?: number;
  offset?: number;
  partitionKey?: string;
}

export interface FindOneOptions {
  filter: FilterMap;
  partitionKey?: string;
}

export type FilterMap = Record<string, unknown>;
export type SortMap = Record<string, 1 | -1>; // 1 = ASC, -1 = DESC

export interface AggregateOptions {
  filter: FilterMap;
  groupBy?: string[];
  count?: string;     // alias for COUNT(1)
  sum?: string;       // field to SUM
}

/** Factory that creates collections — one per provider. */
export interface DatastoreProvider {
  collection<T extends BaseDocument>(name: string): DocumentCollection<T>;
  initialize?(configs: Record<string, CollectionConfig>): Promise<void>;
  close?(): Promise<void>;
}

export interface CollectionConfig {
  partitionKeyPath: string;
  defaultTtl?: number | null;
}

4.2 Cosmos Adapter (keeps everything working today)

// packages/datastore/src/providers/cosmos.ts

import type { Container } from '@azure/cosmos';
import type { BaseDocument, DocumentCollection, FindManyOptions, FilterMap /* , ... */ } from '../types.js';

export class CosmosCollection<T extends BaseDocument> implements DocumentCollection<T> {
  constructor(private container: Container) {}

  async findById(id: string, partitionKey: string): Promise<T | null> {
    try {
      const { resource } = await this.container.item(id, partitionKey).read<T>();
      return resource ?? null;
    } catch (err) {
      // Only "not found" maps to null; auth, throttling, and network errors must surface.
      if ((err as { code?: number | string }).code === 404) return null;
      throw err;
    }
  }

  async findMany(opts: FindManyOptions): Promise<T[]> {
    const { sql, params } = buildSqlQuery(opts);         // ◄── converts FilterMap → Cosmos SQL
    const { resources } = await this.container
      .items.query<T>({ query: sql, parameters: params })
      .fetchAll();
    return resources;
  }

  async create(doc: T): Promise<T> {
    const { resource } = await this.container.items.create(doc);
    return resource as T;
  }

  async replace(id: string, partitionKey: string, doc: T): Promise<T> {
    const { resource } = await this.container.item(id, partitionKey).replace<T>(doc);
    return resource as T;
  }

  async upsert(doc: T): Promise<T> {
    const { resource } = await this.container.items.upsert<T>(doc);
    return resource as T;
  }

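  async delete(id: string, partitionKey: string): Promise<boolean> {
    try {
      await this.container.item(id, partitionKey).delete();
      return true;
    } catch (err) {
      // A missing item is reported as false; any other failure is rethrown.
      if ((err as { code?: number | string }).code === 404) return false;
      throw err;
    }
  }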

  // ... count(), findOne(), aggregate()
}

/** Convert a FilterMap to Cosmos SQL. */
function buildSqlQuery(opts: FindManyOptions): { sql: string; params: SqlParam[] } {
  // { filter: { productId: 'x', userId: 'y' }, sort: { createdAt: -1 }, limit: 20 }
  // → "SELECT * FROM c WHERE c.productId = @p0 AND c.userId = @p1 ORDER BY c.createdAt DESC OFFSET 0 LIMIT 20"
  // This is a mechanical translation; no query language is exposed to application code.
  throw new Error('buildSqlQuery: translation not shown in this outline');
}
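One way the FilterMap-to-SQL translation could look is sketched below. It is an illustration under stated assumptions, not the package's actual code: the types are local copies of those in 4.1, the `$gt` and `$lte` operators are assumed additions to the `$gte`, `$lt`, `$exists`, and `$contains` operators the roadmap names, and binding `OFFSET`/`LIMIT` as parameters follows the `OFFSET @o LIMIT @l` pattern from section 4.8. Field names are validated because they are interpolated into the query text.

```typescript
// Local copies of the roadmap's types so the sketch stands alone.
type FilterMap = Record<string, unknown>;
type SortMap = Record<string, 1 | -1>;
interface FindManyOptions {
  filter: FilterMap;
  sort?: SortMap;
  limit?: number;
  offset?: number;
}
interface SqlParam {
  name: string;
  value: unknown;
}

// Comparison operators mapped to SQL symbols ($gt and $lte are assumed additions).
const COMPARISONS = new Map<string, string>([
  ['$gt', '>'], ['$gte', '>='], ['$lt', '<'], ['$lte', '<='],
]);

// Field names are interpolated into the SQL text, so restrict them to identifiers.
const FIELD_NAME = /^[A-Za-z_][A-Za-z0-9_]*$/;

function isOperatorObject(v: unknown): v is Record<string, unknown> {
  return typeof v === 'object' && v !== null && !Array.isArray(v) && !(v instanceof Date);
}

function buildSqlQuery(opts: FindManyOptions): { sql: string; params: SqlParam[] } {
  const params: SqlParam[] = [];
  // Every value becomes a numbered parameter; nothing user-supplied is inlined.
  const bind = (value: unknown): string => {
    const name = `@p${params.length}`;
    params.push({ name, value });
    return name;
  };
  const column = (field: string): string => {
    if (!FIELD_NAME.test(field)) throw new Error(`Invalid field name: ${field}`);
    return `c.${field}`;
  };

  const clauses: string[] = [];
  for (const [field, cond] of Object.entries(opts.filter)) {
    const col = column(field);
    if (!isOperatorObject(cond)) {
      clauses.push(`${col} = ${bind(cond)}`); // exact match
      continue;
    }
    for (const [op, operand] of Object.entries(cond)) {
      const symbol = COMPARISONS.get(op);
      if (symbol) clauses.push(`${col} ${symbol} ${bind(operand)}`);
      else if (op === '$exists') clauses.push(`${operand ? '' : 'NOT '}IS_DEFINED(${col})`);
      else if (op === '$contains') clauses.push(`ARRAY_CONTAINS(${col}, ${bind(operand)})`);
      else throw new Error(`Unsupported operator: ${op}`);
    }
  }

  let sql = 'SELECT * FROM c';
  if (clauses.length > 0) sql += ` WHERE ${clauses.join(' AND ')}`;

  const sort: SortMap = opts.sort ?? {};
  const order = Object.entries(sort).map(([f, dir]) => `${column(f)} ${dir === 1 ? 'ASC' : 'DESC'}`);
  if (order.length > 0) sql += ` ORDER BY ${order.join(', ')}`;

  if (opts.limit !== undefined) {
    // Cosmos SQL needs OFFSET and LIMIT together, so offset defaults to 0.
    sql += ` OFFSET ${bind(opts.offset ?? 0)} LIMIT ${bind(opts.limit)}`;
  } else if (opts.offset !== undefined) {
    throw new Error('offset requires limit in this sketch');
  }
  return { sql, params };
}
```

Rejecting unknown operators, rather than ignoring them, keeps a typo such as `$gte` written as `$get` from silently widening a query.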

4.3 In-Memory Adapter (for testing)

// packages/datastore/src/providers/memory.ts

export class MemoryCollection<T extends BaseDocument> implements DocumentCollection<T> {
  private docs: Map<string, T> = new Map();

  async findById(id: string): Promise<T | null> {
    return this.docs.get(id) ?? null;
  }

  async findMany(opts: FindManyOptions): Promise<T[]> {
    let results = [...this.docs.values()].filter(doc => matchesFilter(doc, opts.filter));
    if (opts.sort) results = sortDocs(results, opts.sort);
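    if (opts.offset) results = results.slice(opts.offset);
    if (opts.limit !== undefined) results = results.slice(0, opts.limit);
    return results;
  }

  async create(doc: T): Promise<T> {
    // Mirror Cosmos, which rejects a create when the id already exists (409 Conflict).
    if (this.docs.has(doc.id)) throw new Error(`Conflict: document ${doc.id} already exists`);
    this.docs.set(doc.id, doc);
    return doc;
  }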
  // ... etc
}
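The memory adapter above relies on two helpers, `matchesFilter` and `sortDocs`, that the outline does not show. A minimal sketch follows, assuming the operator set from section 4.8 (`$gte`, `$lt`, `$exists`, `$contains`) and assuming values are compared as numbers when both sides are numbers and as strings otherwise, which covers the ISO date strings used in the roadmap's examples. The `Doc` type and the `compare` helper are hypothetical names introduced here.

```typescript
type FilterMap = Record<string, unknown>;
type SortMap = Record<string, 1 | -1>;
type Doc = { id: string; [key: string]: unknown };

// An operator object is a non-empty plain object whose keys all start with "$".
function isOperatorObject(v: unknown): v is Record<string, unknown> {
  if (typeof v !== 'object' || v === null || Array.isArray(v)) return false;
  const keys = Object.keys(v);
  return keys.length > 0 && keys.every(k => k.startsWith('$'));
}

// Numeric comparison for numbers, lexicographic comparison for everything else.
function compare(a: unknown, b: unknown): number {
  if (typeof a === 'number' && typeof b === 'number') return a - b;
  const sa = String(a);
  const sb = String(b);
  return sa < sb ? -1 : sa > sb ? 1 : 0;
}

function matchesFilter(doc: Doc, filter: FilterMap): boolean {
  return Object.entries(filter).every(([field, cond]) => {
    const value = doc[field];
    if (!isOperatorObject(cond)) return value === cond; // exact match
    return Object.entries(cond).every(([op, operand]) => {
      switch (op) {
        case '$gte': return value !== undefined && compare(value, operand) >= 0;
        case '$lt': return value !== undefined && compare(value, operand) < 0;
        case '$exists': return (value !== undefined) === Boolean(operand);
        case '$contains': return Array.isArray(value) && value.includes(operand);
        default: throw new Error(`Unsupported operator: ${op}`);
      }
    });
  });
}

function sortDocs<T extends Doc>(docs: T[], sort: SortMap): T[] {
  const keys = Object.entries(sort);
  // Copy before sorting so the caller's array is left untouched.
  return [...docs].sort((a, b) => {
    for (const [field, dir] of keys) {
      const c = compare(a[field], b[field]);
      if (c !== 0) return c * dir;
    }
    return 0;
  });
}
```

Throwing on unknown operators keeps the memory adapter from passing a test that the Cosmos adapter would fail.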

4.4 MongoDB Adapter (future — ready to implement when needed)

// packages/datastore/src/providers/mongo.ts (STUB — implement when migrating)

import type { Collection as MongoCollection } from 'mongodb';
import type { BaseDocument, DocumentCollection, FindManyOptions } from '../types.js';

export class MongoDocumentCollection<T extends BaseDocument> implements DocumentCollection<T> {
  constructor(private collection: MongoCollection<T>) {}

  async findById(id: string): Promise<T | null> {
    return this.collection.findOne({ _id: id } as any) as Promise<T | null>;
  }

  async findMany(opts: FindManyOptions): Promise<T[]> {
    let cursor = this.collection.find(opts.filter);
    if (opts.sort) cursor = cursor.sort(opts.sort);
    if (opts.offset) cursor = cursor.skip(opts.offset);
    if (opts.limit) cursor = cursor.limit(opts.limit);
    return cursor.toArray() as Promise<T[]>;
  }
  // ... etc
}

4.5 How Repository Files Change

Before (Cosmos SQL in every file):

// services/platform-service/src/modules/flags/repository.ts
import { getContainer } from '../../lib/cosmos.js';

function container() {
  return getContainer('feature_flags');
}

export async function list(productId: string): Promise<FeatureFlagDoc[]> {
  const { resources } = await container()
    .items.query<FeatureFlagDoc>({
      query: 'SELECT * FROM c WHERE c.productId = @productId ORDER BY c.key ASC',
      parameters: [{ name: '@productId', value: productId }],
    })
    .fetchAll();
  return resources;
}

export async function getByKey(key: string, productId: string): Promise<FeatureFlagDoc | null> {
  const { resources } = await container()
    .items.query<FeatureFlagDoc>({
      query: 'SELECT * FROM c WHERE c.productId = @productId AND c.key = @key',
      parameters: [
        { name: '@productId', value: productId },
        { name: '@key', value: key },
      ],
    })
    .fetchAll();
  return resources[0] ?? null;
}

export async function create(doc: FeatureFlagDoc): Promise<FeatureFlagDoc> {
  const { resource } = await container().items.create(doc);
  return resource as FeatureFlagDoc;
}

After (cloud-agnostic):

// services/platform-service/src/modules/flags/repository.ts
import { getCollection } from '../../lib/datastore.js';
import type { FeatureFlagDoc } from './types.js';

function collection() {
  return getCollection<FeatureFlagDoc>('feature_flags');
}

export async function list(productId: string): Promise<FeatureFlagDoc[]> {
  return collection().findMany({
    filter: { productId },
    sort: { key: 1 },
  });
}

export async function getByKey(key: string, productId: string): Promise<FeatureFlagDoc | null> {
  return collection().findOne({
    filter: { productId, key },
  });
}

export async function create(doc: FeatureFlagDoc): Promise<FeatureFlagDoc> {
  return collection().create(doc);
}

Key observations:

  • No SQL strings
  • No @azure/cosmos types
  • No .items.query().fetchAll() chaining
  • The getCollection() function returns a collection from whichever provider the DB_PROVIDER env var selects
  • All existing behavior is preserved — the Cosmos adapter generates the same SQL under the hood

4.6 Service Wiring

// services/platform-service/src/lib/datastore.ts (replaces lib/cosmos.ts)
import { createDatastoreProvider } from '@bytelyst/datastore';
import type { DocumentCollection, BaseDocument } from '@bytelyst/datastore';

let _provider: ReturnType<typeof createDatastoreProvider> | null = null;

export function getProvider() {
  if (!_provider) {
    _provider = createDatastoreProvider(); // reads DB_PROVIDER env var
  }
  return _provider;
}

export function getCollection<T extends BaseDocument>(name: string): DocumentCollection<T> {
  return getProvider().collection<T>(name);
}

// packages/datastore/src/factory.ts
export function createDatastoreProvider(): DatastoreProvider {
  const provider = process.env.DB_PROVIDER || 'cosmos';
  switch (provider) {
    case 'cosmos':
      return new CosmosDatastoreProvider();  // uses existing COSMOS_ENDPOINT, COSMOS_KEY
    case 'mongo':
      return new MongoDatastoreProvider();   // uses MONGO_URI
    case 'memory':
      return new MemoryDatastoreProvider();  // no config needed
    default:
      throw new Error(`Unknown DB_PROVIDER: ${provider}`);
  }
}

4.7 Migration Plan for 38 Repository Files

Migrate in batches, one module per commit. Each commit:

  1. Update the repository file to use getCollection() instead of getContainer()
  2. Replace SQL queries with findMany() / findOne() / count() / aggregate()
  3. Run the module's test file — must pass
  4. Commit: refactor(module-name): migrate to datastore abstraction

Batch order (simplest first, complex last):

| Batch | Modules | Complexity | Notes |
| --- | --- | --- | --- |
| 1 | flags, plans, settings, changelog, products | Simple CRUD | 5 files, warmup |
| 2 | licenses, sessions, ip-rules, maintenance, feedback | Simple CRUD + filters | 5 files |
| 3 | items, comments, votes, brains, reflections | CRUD + filter combos | 5 files |
| 4 | audit, delivery, notifications, exports, jobs | CRUD + time queries | 5 files |
| 5 | tokens, usage, invitations, referrals, webhooks | More complex queries | 5 files |
| 6 | auth, subscriptions, telemetry, experiments | Complex (GROUP BY, aggregates) | 4 files |
| 7 | timers, shared-timers, routines, households | Sync logic, batch ops | 4 files |
| 8 | fasting-sessions, fasting-protocols, meal-log, social-fasting, daily-briefs, streaks, push-triggers, impersonation, status, memory, analytics, waitlist | Product-specific + remaining | 12 files |
| 9 | Dashboard cosmos clients (admin-web, MindLyst web) | Direct @azure/cosmos | 6 files |
| 10 | Python clients (desktop cosmos, backend cosmos) | azure.cosmos → abstracted | 2 files |

4.8 Handling Complex Queries

Some repository files use advanced Cosmos SQL features. Here's how the interface handles them:

| Cosmos SQL Pattern | Datastore Interface Equivalent |
| --- | --- |
| SELECT * FROM c WHERE c.x = @v | findMany({ filter: { x: v } }) |
| SELECT * FROM c WHERE c.x = @v AND c.y = @w | findMany({ filter: { x: v, y: w } }) |
| ORDER BY c.x ASC | findMany({ filter, sort: { x: 1 } }) |
| ORDER BY c.x DESC | findMany({ filter, sort: { x: -1 } }) |
| OFFSET @o LIMIT @l | findMany({ filter, offset: o, limit: l }) |
| SELECT VALUE COUNT(1) FROM c WHERE ... | count(filter) |
| SELECT c.plan, COUNT(1) AS cnt ... GROUP BY c.plan | aggregate({ filter, groupBy: ['plan'], count: 'cnt' }) |
| NOT IS_DEFINED(c.usedAt) | findMany({ filter: { usedAt: { $exists: false } } }) |
| c.x >= @v | findMany({ filter: { x: { $gte: v } } }) |
| ARRAY_CONTAINS(c.tags, @tag) | findMany({ filter: { tags: { $contains: tag } } }) |
| container().item(id, pk).read() | findById(id, pk) |
| container().items.create(doc) | create(doc) |
| container().item(id, pk).replace(doc) | replace(id, pk, doc) |
| container().items.upsert(doc) | upsert(doc) |
| container().item(id, pk).delete() | delete(id, pk) |

For the filter operators, use a simple operator convention:

// Exact match
{ productId: 'lysnrai' }

// Comparison operators
{ syncVersion: { $gte: 5 } }
{ createdAt: { $gte: '2026-01-01', $lt: '2026-02-01' } }

// Exists check (replaces NOT IS_DEFINED)
{ usedAt: { $exists: false } }

// Array contains
{ tags: { $contains: 'important' } }

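The Cosmos adapter translates these to SQL. The MongoDB adapter can pass $gte, $lt, and $exists through unchanged because they are native MQL operators, but it must rewrite $contains, which MongoDB does not define (a plain { tags: 'important' } already matches array membership there). The memory adapter filters in memory.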
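For the MongoDB side, a small translation step is still needed even though most operators are native. The sketch below is one possible shape, with two assumptions flagged: the helper name `toMongoFilter` is hypothetical, and mapping `id` to `_id` assumes documents keep their `id` in Mongo's `_id` field, as the stub's `findById` in 4.4 suggests. `$contains` is rewritten to `$elemMatch` with `$eq`, which matches only array fields and so stays close to `ARRAY_CONTAINS`.

```typescript
type FilterMap = Record<string, unknown>;

function toMongoFilter(filter: FilterMap): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [field, cond] of Object.entries(filter)) {
    // Assumption: the document id is stored as Mongo's _id.
    const key = field === 'id' ? '_id' : field;

    const isOperatorObject =
      typeof cond === 'object' && cond !== null && !Array.isArray(cond) && !(cond instanceof Date);
    if (!isOperatorObject) {
      out[key] = cond; // exact match passes through unchanged
      continue;
    }

    const ops: Record<string, unknown> = {};
    for (const [op, operand] of Object.entries(cond as Record<string, unknown>)) {
      if (op === '$contains') {
        // Not a MongoDB operator: match arrays that hold an element equal to the operand.
        ops.$elemMatch = { $eq: operand };
      } else {
        ops[op] = operand; // $gte, $lt, $exists are native MQL
      }
    }
    out[key] = ops;
  }
  return out;
}
```

The result can be handed to the driver's `find()` in place of the raw `opts.filter` used in the stub.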


5. Sprint 2: Storage Abstraction Layer

Package: @bytelyst/storage Effort: 2 days Files changed: packages/blob/src/blob.ts, src/cloud/blob_client.py, services/platform-service/src/modules/blob/

5.1 Interface Design

// packages/storage/src/types.ts

export interface StorageProvider {
  /** Get or create a bucket/container. */
  getBucket(name: string): StorageBucket;

  /** Check if storage is configured. */
  isConfigured(): boolean;
}

export interface StorageBucket {
  /** Upload a blob/object. */
  upload(path: string, data: Buffer | ReadableStream, contentType?: string): Promise<void>;

  /** Download a blob/object. */
  download(path: string): Promise<Buffer>;

  /** Delete a blob/object. */
  delete(path: string): Promise<boolean>;

  /** Check if a blob/object exists. */
  exists(path: string): Promise<boolean>;

  /** List blobs/objects with optional prefix. */
  list(prefix?: string): Promise<StorageObjectInfo[]>;

  /** Generate a time-limited signed URL for direct access. */
  getSignedUrl(path: string, opts: SignedUrlOptions): Promise<string>;
}

export interface SignedUrlOptions {
  permissions: 'read' | 'write' | 'readwrite';
  expiresInMinutes?: number; // default: 60
}

export interface StorageObjectInfo {
  name: string;
  size: number;
  lastModified: Date;
  contentType?: string;
}

5.2 Provider Implementations

// packages/storage/src/providers/azure-blob.ts
// Wraps existing @bytelyst/blob code — nearly 1:1 mapping

// packages/storage/src/providers/s3.ts (future)
// Uses @aws-sdk/client-s3 + @aws-sdk/s3-request-presigner

// packages/storage/src/providers/r2.ts (future)
// S3-compatible — extends S3 provider with Cloudflare-specific config

// packages/storage/src/providers/memory.ts
// In-memory Map<string, Buffer> for testing
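The in-memory provider is the simplest of the four to picture. Below is a rough sketch of a bucket that satisfies the `StorageBucket` shape from 5.1, with some deliberate simplifications that are assumptions of this example: it accepts `Uint8Array` only (Node's `Buffer` is a `Uint8Array`, but streams are not handled), and the `memory://` URL format returned by `getSignedUrl` is invented for testing and is not a real signed URL.

```typescript
interface StorageObjectInfo {
  name: string;
  size: number;
  lastModified: Date;
  contentType?: string;
}
interface SignedUrlOptions {
  permissions: 'read' | 'write' | 'readwrite';
  expiresInMinutes?: number; // default: 60
}

class MemoryBucket {
  private objects = new Map<string, { data: Uint8Array; info: StorageObjectInfo }>();

  constructor(private readonly name: string) {}

  async upload(path: string, data: Uint8Array, contentType?: string): Promise<void> {
    // Store a copy so later mutation of the caller's array cannot change stored content.
    const info: StorageObjectInfo = { name: path, size: data.byteLength, lastModified: new Date(), contentType };
    this.objects.set(path, { data: data.slice(), info });
  }

  async download(path: string): Promise<Uint8Array> {
    const entry = this.objects.get(path);
    if (!entry) throw new Error(`Object not found: ${this.name}/${path}`);
    return entry.data.slice();
  }

  async delete(path: string): Promise<boolean> {
    return this.objects.delete(path); // true only if something was removed
  }

  async exists(path: string): Promise<boolean> {
    return this.objects.has(path);
  }

  async list(prefix = ''): Promise<StorageObjectInfo[]> {
    return Array.from(this.objects.values())
      .map(entry => entry.info)
      .filter(info => info.name.startsWith(prefix));
  }

  async getSignedUrl(path: string, opts: SignedUrlOptions): Promise<string> {
    // Hypothetical URL scheme: enough for tests to assert on, never fetched.
    const expires = Date.now() + (opts.expiresInMinutes ?? 60) * 60000;
    return `memory://${this.name}/${encodeURI(path)}?perm=${opts.permissions}&expires=${expires}`;
  }
}
```

A memory `StorageProvider` would then only need a `Map` from bucket name to `MemoryBucket` behind `getBucket()`.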

5.3 Migration

The existing @bytelyst/blob package (162 lines) becomes the Azure Blob provider inside @bytelyst/storage. Consumers switch from:

// Before
import { generateSasUrl, getContainerClient } from '@bytelyst/blob';
// After
import { getStorage } from '@bytelyst/storage';
const bucket = getStorage().getBucket('audio');
const url = await bucket.getSignedUrl('user123/recording.wav', { permissions: 'read' });

Python equivalent: Refactor src/cloud/blob_client.py to use a StorageProvider ABC with AzureBlobProvider implementation.


6. Sprint 3: LLM Provider Abstraction

Package: @bytelyst/llm Effort: 2 days Files changed: src/llm/text_cleaner.py, backend/src/clients/openai_client.py, MindLyst web/src/lib/llm.ts, extraction-service config

6.1 Interface Design

// packages/llm/src/types.ts

export interface LLMProvider {
  chatCompletion(req: ChatCompletionRequest): Promise<ChatCompletionResponse>;
  chatCompletionStream?(req: ChatCompletionRequest): AsyncIterable<string>;
  isConfigured(): boolean;
}

export interface ChatCompletionRequest {
  messages: Array<{ role: 'system' | 'user' | 'assistant'; content: string }>;
  temperature?: number;
  maxTokens?: number;
  model?: string; // override default model
}

export interface ChatCompletionResponse {
  content: string;
  usage?: { promptTokens: number; completionTokens: number };
}

6.2 Key Insight: MindLyst Already Has This Pattern

MindLyst web/src/lib/llm.ts already auto-detects Azure vs OpenAI based on env vars. This pattern should be promoted to a shared package.

Provider implementations:

  • AzureOpenAIProvider — uses api-key header + deployment-scoped URL
  • OpenAIProvider — uses Authorization: Bearer header + model param
  • GeminiProvider — uses Google Generative AI SDK (future)
  • OllamaProvider — for local development (future)
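The difference between the first two providers comes down to how the HTTP request is addressed and authenticated. The sketch below builds that request for either provider. The `LLMConfig` and `HttpRequest` shapes and the `buildChatRequest` name are hypothetical; the Azure URL layout (`/openai/deployments/{deployment}/chat/completions?api-version=...`) and the `api-key` versus `Authorization: Bearer` headers follow the public REST conventions of the two services.

```typescript
interface ChatCompletionRequest {
  messages: Array<{ role: 'system' | 'user' | 'assistant'; content: string }>;
  temperature?: number;
  maxTokens?: number;
  model?: string; // override default model
}

// Hypothetical config shape; a real package would read this from env vars.
type LLMConfig =
  | { provider: 'azure'; endpoint: string; apiKey: string; deployment: string; apiVersion: string }
  | { provider: 'openai'; apiKey: string; model: string; baseUrl?: string };

interface HttpRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildChatRequest(cfg: LLMConfig, req: ChatCompletionRequest): HttpRequest {
  const payload: Record<string, unknown> = { messages: req.messages };
  if (req.temperature !== undefined) payload.temperature = req.temperature;
  if (req.maxTokens !== undefined) payload.max_tokens = req.maxTokens;

  if (cfg.provider === 'azure') {
    // Azure selects the model through the deployment segment of the URL.
    const deployment = req.model ?? cfg.deployment;
    const base = cfg.endpoint.replace(/\/+$/, '');
    return {
      url: `${base}/openai/deployments/${encodeURIComponent(deployment)}/chat/completions?api-version=${cfg.apiVersion}`,
      headers: { 'Content-Type': 'application/json', 'api-key': cfg.apiKey },
      body: JSON.stringify(payload),
    };
  }

  // OpenAI selects the model through a field in the request body.
  const base = (cfg.baseUrl ?? 'https://api.openai.com/v1').replace(/\/+$/, '');
  return {
    url: `${base}/chat/completions`,
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${cfg.apiKey}` },
    body: JSON.stringify({ ...payload, model: req.model ?? cfg.model }),
  };
}
```

Keeping request construction separate from the network call makes each provider testable without any HTTP traffic.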

6.3 Python Migration

# Before (text_cleaner.py)
from openai import AzureOpenAI
self._client = AzureOpenAI(azure_endpoint=endpoint, api_key=api_key, api_version="2024-10-21")

# After
from bytelyst.llm import create_llm_client
self._client = create_llm_client()  # reads LLM_PROVIDER, OPENAI_API_KEY, etc.
# Returns OpenAI() or AzureOpenAI() based on config — same API surface

The openai Python SDK already has a common interface between OpenAI and AzureOpenAI. The abstraction is just a factory function that picks the right class.


7. Sprint 4: Secrets Manager Abstraction

Package: Refactor existing @bytelyst/config Effort: 1 day Files changed: packages/config/src/keyvault.ts, src/secrets/keyvault.py

7.1 Key Insight: Already 90% Done

The current resolveKeyVaultSecrets() already:

  • Skips if AZURE_KEYVAULT_URL is not set
  • Falls back to env vars for each secret
  • Logs warnings but doesn't throw

Refactor: Rename to resolveSecrets() with provider dispatch:

// packages/config/src/secrets.ts

export interface SecretsProvider {
  getSecret(name: string): Promise<string | null>;
}

export async function resolveSecrets(
  secrets: SecretMapping[],
  opts?: { provider?: string },
): Promise<void> {
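  // Default to Key Vault when its URL is set so existing deployments keep resolving secrets.
  const provider = opts?.provider || process.env.SECRETS_PROVIDER ||
    (process.env.AZURE_KEYVAULT_URL ? 'azure-keyvault' : 'env');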

  switch (provider) {
    case 'azure-keyvault':
      return resolveFromAzureKeyVault(secrets);   // existing code
    case 'aws-secrets-manager':
      return resolveFromAWSSecretsManager(secrets); // future
    case 'gcp-secret-manager':
      return resolveFromGCPSecretManager(secrets);  // future
    case 'doppler':
      return resolveFromDoppler(secrets);            // future
    case 'env':
    default:
      return; // All secrets already in env — nothing to resolve
  }
}

7.2 Rename Azure-Prefixed Env Vars

The current env vars have Azure-specific names. Add generic aliases that fall back to the Azure names:

// packages/config/src/env-aliases.ts

export const ENV_ALIASES: Record<string, string[]> = {
  // Generic name → fallback names (checked in order)
  'BLOB_CONNECTION_STRING': ['AZURE_BLOB_CONNECTION_STRING'],
  'BLOB_ACCOUNT_NAME': ['AZURE_BLOB_ACCOUNT_NAME'],
  'BLOB_ACCOUNT_KEY': ['AZURE_BLOB_ACCOUNT_KEY'],
  'SPEECH_KEY': ['AZURE_SPEECH_KEY'],
  'SPEECH_REGION': ['AZURE_SPEECH_REGION'],
  'LLM_API_KEY': ['AZURE_OPENAI_KEY', 'OPENAI_API_KEY'],
  'LLM_ENDPOINT': ['AZURE_OPENAI_ENDPOINT', 'OPENAI_BASE_URL'],
  'LLM_MODEL': ['AZURE_OPENAI_DEPLOYMENT', 'OPENAI_MODEL'],
};

export function getEnv(name: string): string | undefined {
  if (process.env[name]) return process.env[name];
  const aliases = ENV_ALIASES[name];
  if (aliases) {
    for (const alias of aliases) {
      if (process.env[alias]) return process.env[alias];
    }
  }
  return undefined;
}

This means existing .env files with AZURE_* names continue to work. New deployments can use generic names.


8. Sprint 5: Speech Provider Abstraction

Package: @bytelyst/speech Effort: 3-4 days Files changed: src/audio/azure_stt.py, iosApp/Services/AzureSpeechTranscriber.swift

8.1 Interface Design (Python)

# bytelyst/speech/types.py

from abc import ABC, abstractmethod
from typing import Callable

class SpeechTranscriber(ABC):
    """Cloud-agnostic streaming speech-to-text interface."""

    @abstractmethod
    def start(self, language: str = "en-US", languages: list[str] | None = None) -> None:
        """Start continuous recognition."""

    @abstractmethod
    def stop(self) -> None:
        """Stop recognition."""

    @abstractmethod
    def push_audio(self, data: bytes) -> None:
        """Push raw audio data (PCM 16-bit, 16kHz, mono)."""

    @abstractmethod
    def on_partial(self, callback: Callable[[str], None]) -> None:
        """Register callback for partial (interim) results."""

    @abstractmethod
    def on_final(self, callback: Callable[[str], None]) -> None:
        """Register callback for final (committed) results."""

    @abstractmethod
    def on_error(self, callback: Callable[[Exception], None]) -> None:
        """Register callback for errors."""

    @abstractmethod
    def set_vocabulary(self, phrases: list[str]) -> None:
        """Set custom vocabulary / phrase hints."""

8.2 Provider Implementations

# bytelyst/speech/azure_provider.py
# Wraps existing azure_stt.py code — PushAudioInputStream, SpeechRecognizer, events

# bytelyst/speech/google_provider.py (future)
# Uses google-cloud-speech streaming_recognize

# bytelyst/speech/deepgram_provider.py (future)
# Uses Deepgram WebSocket API

# bytelyst/speech/whisper_provider.py (future)
# Uses faster-whisper for local transcription (already in requirements.txt!)

8.3 Swift Protocol (iOS)

// Shared/Speech/SpeechTranscriberProtocol.swift

protocol SpeechTranscriber {
    func start(language: String, languages: [String]?) async throws
    func stop() async
    func onPartial(_ handler: @escaping (String) -> Void)
    func onFinal(_ handler: @escaping (String) -> Void)
    func onError(_ handler: @escaping (Error) -> Void)
    func setVocabulary(_ phrases: [String])
}

// Shared/Speech/AzureSpeechTranscriber.swift — existing code, implements protocol
// Shared/Speech/AppleSpeechTranscriber.swift — future, uses Apple's SFSpeechRecognizer

8.4 Note on Complexity

Speech is the hardest abstraction because:

  • Azure Speech SDK has a unique push-stream architecture
  • Google Cloud Speech uses gRPC streaming
  • Deepgram uses WebSockets
  • Each has different audio format requirements and event models

The abstraction hides these differences behind a unified push-audio + callback interface. The Azure implementation wraps existing code with zero behavior changes.
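To make the push-audio + callback contract concrete, here is a rough TypeScript mirror of the interface together with a scripted test double. Everything in it is illustrative: the roadmap defines the interface in Python and Swift only, `ScriptedTranscriber` is a hypothetical name, and its rule of "one scripted word per non-empty audio chunk" is invented so that callers can be exercised without any speech service.

```typescript
type TextHandler = (text: string) => void;
type ErrorHandler = (error: Error) => void;

// TypeScript mirror of the Python ABC (vocabulary hints omitted for brevity).
interface SpeechTranscriber {
  start(language?: string): void;
  stop(): void;
  pushAudio(data: Uint8Array): void;
  onPartial(callback: TextHandler): void;
  onFinal(callback: TextHandler): void;
  onError(callback: ErrorHandler): void;
}

// Hypothetical test double: each non-empty chunk "recognizes" the next scripted word.
class ScriptedTranscriber implements SpeechTranscriber {
  private partialHandlers: TextHandler[] = [];
  private finalHandlers: TextHandler[] = [];
  private errorHandlers: ErrorHandler[] = [];
  private heard: string[] = [];
  private running = false;

  constructor(private readonly script: string[]) {}

  onPartial(callback: TextHandler): void { this.partialHandlers.push(callback); }
  onFinal(callback: TextHandler): void { this.finalHandlers.push(callback); }
  onError(callback: ErrorHandler): void { this.errorHandlers.push(callback); }

  start(): void {
    this.heard = [];
    this.running = true;
  }

  pushAudio(data: Uint8Array): void {
    if (!this.running) {
      this.errorHandlers.forEach(cb => cb(new Error('pushAudio called before start')));
      return;
    }
    if (data.length === 0 || this.heard.length >= this.script.length) return;
    this.heard.push(this.script[this.heard.length]);
    const text = this.heard.join(' ');
    this.partialHandlers.forEach(cb => cb(text)); // interim result so far
  }

  stop(): void {
    if (!this.running) return;
    this.running = false;
    if (this.heard.length > 0) {
      const text = this.heard.join(' ');
      this.finalHandlers.forEach(cb => cb(text)); // committed result
    }
  }
}
```

Application code written against `SpeechTranscriber` sees the same sequence of partial and final callbacks whichever provider sits behind it.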


9. Sprint 6: Push Notification Abstraction

Package: @bytelyst/push Effort: 1 day Files changed: Platform-service push-triggers module

9.1 Interface Design

export interface PushProvider {
  send(notification: PushNotification): Promise<PushResult>;
  sendBatch(notifications: PushNotification[]): Promise<PushResult[]>;
}

export interface PushNotification {
  deviceToken: string;
  platform: 'ios' | 'android' | 'web';
  title: string;
  body: string;
  data?: Record<string, string>;
  badge?: number;
}

Implementations: AzureNotificationHubProvider, FirebaseProvider (future), ExpoProvider (for NomGap), OneSignalProvider (future).


10. Sprint 7: Monitoring & Telemetry Abstraction

Effort: 0.5 days (mostly done already)

The ecosystem already has cloud-agnostic monitoring:

  • Custom telemetry via @bytelyst/telemetry-client → platform-service → Cosmos
  • Loki + Grafana in services/monitoring/
  • Health checks via /health endpoints on all services

Remaining work:

  • Remove opencensus-ext-azure from Python requirements (optional, only used for App Insights)
  • Ensure all structured logging uses pino (TS) or structlog (Python) — no Azure-specific loggers

11. Migration Effort After Refactor

Once all sprints are complete, here's how much work each cloud migration scenario requires:

Scenario: Switch DB from Cosmos to MongoDB Atlas

| Step | Effort | Description |
| --- | --- | --- |
| Implement MongoDatastoreProvider | 1 day | ~200 lines; translate FilterMap to MongoDB find() |
| Set DB_PROVIDER=mongo + MONGO_URI=... | 5 minutes | Config change |
| Run data migration script | 2-4 hours | Export Cosmos JSON → import to MongoDB |
| Run full test suite | 30 minutes | Verify all 1,029+ tests pass |
| Total | ~1.5 days | vs 3-5 weeks without abstraction |

Scenario: Switch Storage from Azure Blob to S3

| Step | Effort | Description |
| --- | --- | --- |
| Implement S3StorageProvider | 0.5 day | ~100 lines |
| Set STORAGE_PROVIDER=s3 + AWS_* env vars | 5 minutes | Config change |
| Migrate blobs | 1-2 hours | azcopy or rclone |
| Total | ~0.5 days | vs 2-3 days without abstraction |

Scenario: Switch LLM from Azure OpenAI to OpenAI Direct

| Step | Effort | Description |
| --- | --- | --- |
| Set LLM_PROVIDER=openai + OPENAI_API_KEY=... | 5 minutes | Config change only |
| Remove AZURE_OPENAI_* env vars | 5 minutes | Cleanup |
| Total | 10 minutes | Already near-zero today |

Scenario: Full Cloud Migration (Azure → AWS)

| Step | Effort | Description |
| --- | --- | --- |
| Implement MongoDB/DynamoDB provider | 1-2 days | |
| Implement S3 storage provider | 0.5 days | |
| Implement AWS Secrets Manager provider | 0.5 days | |
| Switch LLM to OpenAI direct | 10 minutes | |
| Implement Google STT or AWS Transcribe | 2-3 days | Speech is still the hardest |
| Implement SNS push provider | 0.5 days | |
| Data migration + testing | 2-3 days | |
| Total | ~7-10 days | vs 4-8 weeks without abstraction |

12. Testing Strategy

12.1 Provider-Agnostic Tests

Every repository test should work against any provider. The test setup picks the provider:

// Test setup: use in-memory provider
import { setTestProvider } from '@bytelyst/datastore/testing';

beforeAll(() => {
  setTestProvider('memory');  // Fast, no network, deterministic
});

12.2 Provider Integration Tests

Separate test suites verify each provider works correctly:

__tests__/
  datastore/
    cosmos.integration.test.ts    # Runs against real Cosmos (CI only)
    mongo.integration.test.ts     # Runs against real MongoDB (CI only)
    memory.test.ts                # Always runs — verifies memory provider

12.3 Migration Verification Checklist

For each sprint, before merging:

  1. All existing tests pass (no regressions)
  2. New interface tests pass with all implemented providers
  3. Manual smoke test against Azure (dev environment)
  4. No new @azure/* imports in application code (only in provider files)

12.4 CI Gate

Add a lint rule to prevent direct Azure SDK imports outside of provider directories:

# scripts/check-cloud-agnostic.sh
# Fail if any .ts file outside a providers/ directory imports @azure/*
rg '@azure/' services/ dashboards/ --glob='*.ts' \
  --glob='!**/providers/**' --glob='!**/node_modules/**'
status=$?
if [ "$status" -eq 0 ]; then
  echo "FAIL: Direct Azure SDK import found outside provider layer"; exit 1
elif [ "$status" -eq 1 ]; then
  echo "PASS: No direct Azure imports in application code"
else
  echo "ERROR: rg exited with status $status"; exit "$status"
fi

13. Env Var Naming Convention

Current (Azure-specific)

COSMOS_ENDPOINT=https://cosmos-mywisprai.documents.azure.com:443/
COSMOS_KEY=...
COSMOS_DATABASE=lysnrai
AZURE_BLOB_CONNECTION_STRING=...
AZURE_BLOB_ACCOUNT_NAME=bytelystblobs
AZURE_BLOB_ACCOUNT_KEY=...
AZURE_OPENAI_ENDPOINT=...
AZURE_OPENAI_KEY=...
AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini
AZURE_SPEECH_KEY=...
AZURE_SPEECH_REGION=eastus
AZURE_KEYVAULT_URL=...

Target (generic with Azure fallbacks)

# ── Provider Selection ────────────────────────────
DB_PROVIDER=cosmos              # cosmos | mongo | memory
STORAGE_PROVIDER=azure          # azure | s3 | r2 | memory
LLM_PROVIDER=azure              # azure | openai | gemini
SECRETS_PROVIDER=azure-keyvault # azure-keyvault | aws-secrets-manager | gcp-secret-manager | doppler | env
SPEECH_PROVIDER=azure           # azure | google | deepgram | whisper
PUSH_PROVIDER=azure-nh          # azure-nh | firebase | expo

# ── Database (provider-specific) ──────────────────
# Cosmos (when DB_PROVIDER=cosmos):
COSMOS_ENDPOINT=...
COSMOS_KEY=...
COSMOS_DATABASE=lysnrai
# MongoDB (when DB_PROVIDER=mongo):
# MONGO_URI=mongodb+srv://...

# ── Storage (provider-specific) ───────────────────
# Azure (when STORAGE_PROVIDER=azure):
AZURE_BLOB_CONNECTION_STRING=...
# S3 (when STORAGE_PROVIDER=s3):
# AWS_ACCESS_KEY_ID=...
# AWS_SECRET_ACCESS_KEY=...
# S3_BUCKET_PREFIX=bytelyst-

# ── LLM (provider-specific) ──────────────────────
# Azure OpenAI:
AZURE_OPENAI_ENDPOINT=...
AZURE_OPENAI_KEY=...
AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini
# OpenAI direct:
# OPENAI_API_KEY=...
# OPENAI_MODEL=gpt-4o-mini

# ── Secrets (optional) ───────────────────────────
AZURE_KEYVAULT_URL=...   # only if SECRETS_PROVIDER=azure-keyvault

# ── Speech ────────────────────────────────────────
AZURE_SPEECH_KEY=...
AZURE_SPEECH_REGION=eastus

Backward compatibility: All existing AZURE_* env vars continue to work. The generic *_PROVIDER vars are additive.
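
One way to keep the `*_PROVIDER` vars additive is to resolve them at startup and fall back to the Azure option when a variable is unset. The sketch below is illustrative only: `resolveProviders`, `PROVIDER_OPTIONS`, and `AZURE_DEFAULTS` are hypothetical names, and the Azure-by-default behavior is an assumption drawn from the backward-compatibility note above.

```typescript
// Hypothetical helper: names and defaults are illustrative, not the project's actual code.
type Env = Record<string, string | undefined>;

// Allowed values mirror the "Provider Selection" block above.
const PROVIDER_OPTIONS = {
  DB_PROVIDER: ["cosmos", "mongo", "memory"],
  STORAGE_PROVIDER: ["azure", "s3", "r2", "memory"],
  LLM_PROVIDER: ["azure", "openai", "gemini"],
  SECRETS_PROVIDER: ["azure-keyvault", "aws", "doppler", "env"],
  SPEECH_PROVIDER: ["azure", "google", "deepgram", "whisper"],
  PUSH_PROVIDER: ["azure-nh", "firebase", "expo"],
} as const;

type ProviderVar = keyof typeof PROVIDER_OPTIONS;
type ProviderSelection = { [K in ProviderVar]: (typeof PROVIDER_OPTIONS)[K][number] };

// Assumption: an unset variable means "Azure", so existing deployments keep working.
const AZURE_DEFAULTS: ProviderSelection = {
  DB_PROVIDER: "cosmos",
  STORAGE_PROVIDER: "azure",
  LLM_PROVIDER: "azure",
  SECRETS_PROVIDER: "azure-keyvault",
  SPEECH_PROVIDER: "azure",
  PUSH_PROVIDER: "azure-nh",
};

function resolveProviders(env: Env): ProviderSelection {
  const selection: Record<string, string> = { ...AZURE_DEFAULTS };
  for (const name of Object.keys(PROVIDER_OPTIONS) as ProviderVar[]) {
    const value = env[name];
    if (value === undefined || value === "") continue; // keep the Azure default
    const allowed: readonly string[] = PROVIDER_OPTIONS[name];
    if (allowed.indexOf(value) === -1) {
      // Fail fast at startup instead of silently falling back on a typo.
      throw new Error(`${name}=${value} is not one of: ${allowed.join(", ")}`);
    }
    selection[name] = value;
  }
  return selection as ProviderSelection;
}
```

The environment is passed in as an argument (for example `resolveProviders(process.env)` in a Node service) so the function can be unit-tested without touching real process state.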


14. Risk Mitigation

| Risk | Mitigation |
| --- | --- |
| FilterMap can't express complex Cosmos SQL | Add rawQuery() escape hatch for edge cases. Track usage: if >5% of queries need it, expand FilterMap operators |
| Performance regression from abstraction layer | Benchmark critical queries before/after. The abstraction adds one function call, which is negligible |
| Team unfamiliar with new patterns | Each sprint includes updating AGENTS.md with new conventions. Old pattern (direct Cosmos) still works during migration |
| In-memory provider behaves differently | Integration test suite runs against real Cosmos in CI. Memory provider is for unit tests only |
| Stale data during DB migration | Use dual-write pattern: write to both old and new provider during transition. Read from new, fall back to old |
| Sprint 1 takes too long | The 38 repository files can be migrated incrementally; even 5 files at a time is progress. Old and new patterns coexist |
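
The dual-write mitigation for stale data can be sketched as a thin wrapper around two stores. This is a minimal illustration under stated assumptions: `KeyedStore`, `DualWriteStore`, and `MapStore` are hypothetical names, and the real DocumentCollection interface may expose a different shape.

```typescript
// Hypothetical minimal contract for this sketch; the real DocumentCollection interface may differ.
interface KeyedStore<T> {
  get(id: string): Promise<T | undefined>;
  put(id: string, doc: T): Promise<void>;
}

// Wraps the old and new providers during the transition window.
class DualWriteStore<T> implements KeyedStore<T> {
  private readonly oldStore: KeyedStore<T>;
  private readonly newStore: KeyedStore<T>;

  constructor(oldStore: KeyedStore<T>, newStore: KeyedStore<T>) {
    this.oldStore = oldStore;
    this.newStore = newStore;
  }

  // Write to both providers; reject if either write fails so drift is not silent.
  async put(id: string, doc: T): Promise<void> {
    await Promise.all([this.oldStore.put(id, doc), this.newStore.put(id, doc)]);
  }

  // Read from the new provider, falling back to the old one for documents not yet copied.
  async get(id: string): Promise<T | undefined> {
    const fromNew = await this.newStore.get(id);
    return fromNew !== undefined ? fromNew : this.oldStore.get(id);
  }
}

// In-memory stand-in used only to exercise the sketch.
class MapStore<T> implements KeyedStore<T> {
  private readonly docs = new Map<string, T>();

  async get(id: string): Promise<T | undefined> {
    return this.docs.get(id);
  }

  async put(id: string, doc: T): Promise<void> {
    this.docs.set(id, doc);
  }
}
```

Because the wrapper satisfies the same contract as the stores it wraps, application code would not need to know a migration is in progress; once the new provider holds all data, the wrapper can be replaced by the new store directly.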

Appendix: Interface Specifications

A.1 @bytelyst/datastore — Package Structure

packages/datastore/
├── src/
│   ├── index.ts               # Public exports
│   ├── types.ts               # All interfaces (DocumentCollection, DatastoreProvider, etc.)
│   ├── factory.ts             # createDatastoreProvider() factory
│   ├── filter.ts              # FilterMap → provider-specific query translation
│   ├── providers/
│   │   ├── cosmos.ts          # CosmosDatastoreProvider + CosmosCollection
│   │   ├── mongo.ts           # MongoDatastoreProvider + MongoCollection (stub)
│   │   └── memory.ts          # MemoryDatastoreProvider + MemoryCollection
│   └── testing.ts             # Test helpers (setTestProvider, seedCollection, etc.)
├── package.json               # peer deps: @azure/cosmos (optional), mongodb (optional)
├── tsconfig.json
└── vitest.config.ts

A.2 @bytelyst/storage — Package Structure

packages/storage/
├── src/
│   ├── index.ts
│   ├── types.ts               # StorageProvider, StorageBucket, SignedUrlOptions
│   ├── factory.ts             # createStorageProvider()
│   ├── providers/
│   │   ├── azure-blob.ts      # Wraps existing @bytelyst/blob code
│   │   ├── s3.ts              # AWS S3 (stub)
│   │   └── memory.ts          # In-memory for testing
│   └── testing.ts
├── package.json
└── tsconfig.json

A.3 @bytelyst/llm — Package Structure

packages/llm/
├── src/
│   ├── index.ts
│   ├── types.ts               # LLMProvider, ChatCompletionRequest/Response
│   ├── factory.ts             # createLLMProvider()
│   ├── providers/
│   │   ├── azure-openai.ts    # AzureOpenAI endpoint + api-key auth
│   │   ├── openai.ts          # OpenAI direct + Bearer auth
│   │   └── gemini.ts          # Google Generative AI (stub)
│   └── testing.ts             # MockLLMProvider for tests
├── package.json
└── tsconfig.json
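
The auth difference noted for `azure-openai.ts` and `openai.ts` comes down to the URL shape and one header. The sketch below shows only that difference; `LlmConfig`, `LlmHttpRequest`, and `buildChatRequest` are hypothetical names, and the API version is left as a caller-supplied value rather than assumed.

```typescript
// Hypothetical types for illustration; the real LLMProvider interface may differ.
interface LlmHttpRequest {
  url: string;
  headers: Record<string, string>;
}

type LlmConfig =
  | { provider: "azure"; endpoint: string; apiKey: string; deployment: string; apiVersion: string }
  | { provider: "openai"; apiKey: string };

function buildChatRequest(config: LlmConfig): LlmHttpRequest {
  if (config.provider === "azure") {
    // Azure OpenAI: the deployment name is part of the URL and the key goes in an api-key header.
    const base = config.endpoint.replace(/\/+$/, "");
    return {
      url: `${base}/openai/deployments/${config.deployment}/chat/completions?api-version=${config.apiVersion}`,
      headers: { "api-key": config.apiKey, "Content-Type": "application/json" },
    };
  }
  // OpenAI direct: fixed URL, Bearer token; the model name travels in the request body instead.
  return {
    url: "https://api.openai.com/v1/chat/completions",
    headers: { Authorization: `Bearer ${config.apiKey}`, "Content-Type": "application/json" },
  };
}
```

Since the chat request and response bodies are largely shared between the two services, a provider swap at this layer is mostly a matter of configuration.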

A.4 Complete Interface: FilterMap Operators

// Exact match
{ field: value }

// Comparison
{ field: { $gt: value } }       // >
{ field: { $gte: value } }      // >=
{ field: { $lt: value } }       // <
{ field: { $lte: value } }      // <=
{ field: { $ne: value } }       // !=

// Existence
{ field: { $exists: true } }    // IS_DEFINED(c.field)
{ field: { $exists: false } }   // NOT IS_DEFINED(c.field)

// String
{ field: { $startsWith: 'prefix' } }
{ field: { $contains: 'substr' } }

// Array
{ field: { $contains: value } }    // ARRAY_CONTAINS
{ field: { $in: [v1, v2, v3] } }   // IN operator

// Logical (for complex queries)
{ $or: [{ field1: v1 }, { field2: v2 }] }

Cosmos adapter translates each operator to SQL:

  • { $gte: v } → c.field >= @pN
  • { $exists: false } → NOT IS_DEFINED(c.field)
  • { $contains: v } on array → ARRAY_CONTAINS(c.field, @pN)
  • { $in: [...] } → c.field IN (@pN, @pM, ...)
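
The translation rules above could be sketched roughly as follows. This is not the actual contents of `filter.ts`: `toCosmosWhere` and the type names are hypothetical, only a subset of operators is covered, and `$contains` is treated as the array case only.

```typescript
// Hypothetical sketch of a FilterMap-to-Cosmos-SQL translation; not the actual filter.ts.
type Scalar = string | number | boolean | null;
type Operators = {
  $gt?: Scalar; $gte?: Scalar; $lt?: Scalar; $lte?: Scalar; $ne?: Scalar;
  $exists?: boolean; $in?: Scalar[]; $contains?: Scalar;
};
type FilterMap = Record<string, Scalar | Operators>;

interface SqlFragment {
  where: string;
  parameters: { name: string; value: Scalar }[];
}

const COMPARISON = new Map<string, string>([
  ["$gt", ">"], ["$gte", ">="], ["$lt", "<"], ["$lte", "<="], ["$ne", "!="],
]);

function toCosmosWhere(filter: FilterMap): SqlFragment {
  const clauses: string[] = [];
  const parameters: { name: string; value: Scalar }[] = [];
  // Register a value and return its placeholder (@p0, @p1, ...), so values are never inlined.
  const param = (value: Scalar): string => {
    const name = `@p${parameters.length}`;
    parameters.push({ name, value });
    return name;
  };

  for (const [field, condition] of Object.entries(filter)) {
    // Field names are interpolated directly, so they must come from trusted code, not user input.
    const path = `c.${field}`;
    if (condition === null || typeof condition !== "object") {
      clauses.push(`${path} = ${param(condition)}`); // exact match
      continue;
    }
    for (const [op, operand] of Object.entries(condition)) {
      const sqlOp = COMPARISON.get(op);
      if (sqlOp !== undefined) {
        clauses.push(`${path} ${sqlOp} ${param(operand as Scalar)}`);
      } else if (op === "$exists") {
        clauses.push(operand ? `IS_DEFINED(${path})` : `NOT IS_DEFINED(${path})`);
      } else if (op === "$contains") {
        clauses.push(`ARRAY_CONTAINS(${path}, ${param(operand as Scalar)})`);
      } else if (op === "$in") {
        const names = (operand as Scalar[]).map((v) => param(v));
        clauses.push(`${path} IN (${names.join(", ")})`);
      } else {
        throw new Error(`Unsupported operator: ${op}`);
      }
    }
  }
  return { where: clauses.join(" AND "), parameters };
}
```

The returned `where` string and `parameters` array are shaped to slot into a parameterized query spec (a query string plus `{ name, value }` pairs), which keeps user-supplied values out of the SQL text.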

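MongoDB adapter passes most operators through unchanged (MQL uses the same $gt, $gte, $lt, $lte, $ne, $exists, $in, and $or syntax). $startsWith and $contains are not MQL operators, so the adapter has to translate them (for example, to $regex for strings).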

Memory adapter evaluates operators with simple JS comparisons.
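
A rough idea of what "simple JS comparisons" could look like for the memory adapter is shown below. `matchesFilter` and the type names are hypothetical, and the edge-case behavior (for example, how a missing field interacts with `$ne`) is an assumption that may not match Cosmos exactly.

```typescript
// Hypothetical sketch of in-memory FilterMap evaluation; not the actual memory.ts.
type Scalar = string | number | boolean | null;
type Operators = {
  $gt?: Scalar; $gte?: Scalar; $lt?: Scalar; $lte?: Scalar; $ne?: Scalar;
  $exists?: boolean; $in?: Scalar[]; $startsWith?: string; $contains?: Scalar;
};
type Condition = Scalar | Operators;
interface Filter {
  $or?: Filter[];
  [field: string]: Condition | Filter[] | undefined;
}

// Order two values of the same primitive type; undefined means "not comparable".
function compare(a: unknown, b: unknown): number | undefined {
  if (typeof a === "number" && typeof b === "number") return a - b;
  if (typeof a === "string" && typeof b === "string") return a < b ? -1 : a > b ? 1 : 0;
  return undefined;
}

function matchesCondition(actual: unknown, condition: Condition): boolean {
  if (condition === null || typeof condition !== "object") return actual === condition;
  return Object.entries(condition).every(([op, operand]) => {
    const order = compare(actual, operand);
    switch (op) {
      case "$gt": return order !== undefined && order > 0;
      case "$gte": return order !== undefined && order >= 0;
      case "$lt": return order !== undefined && order < 0;
      case "$lte": return order !== undefined && order <= 0;
      case "$ne": return actual !== operand;
      case "$exists": return (actual !== undefined) === operand;
      case "$in": return Array.isArray(operand) && operand.some((v) => v === actual);
      case "$startsWith":
        return typeof actual === "string" && typeof operand === "string" && actual.startsWith(operand);
      case "$contains":
        // Substring test for strings, membership test for arrays.
        if (typeof actual === "string") return typeof operand === "string" && actual.indexOf(operand) !== -1;
        return Array.isArray(actual) && actual.indexOf(operand) !== -1;
      default:
        throw new Error(`Unsupported operator: ${op}`);
    }
  });
}

// Every top-level key must match (implicit AND); $or matches if any branch matches.
function matchesFilter(doc: Record<string, unknown>, filter: Filter): boolean {
  return Object.entries(filter).every(([key, condition]) => {
    if (key === "$or") return (condition as Filter[]).some((branch) => matchesFilter(doc, branch));
    return matchesCondition(doc[key], condition as Condition);
  });
}
```

A memory collection could then implement queries as `docs.filter((d) => matchesFilter(d, filter))`, which keeps unit tests free of any cloud dependency.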


Summary

| Sprint | What | Days | After This Sprint... |
| --- | --- | --- | --- |
| 1 | Database abstraction | 5-7 | DB swap = implement 1 adapter (~200 LOC) + config change |
| 2 | Storage abstraction | 2 | Blob swap = implement 1 adapter (~100 LOC) + config change |
| 3 | LLM abstraction | 2 | LLM swap = config change only (10 minutes) |
| 4 | Secrets abstraction | 1 | Secrets swap = config change only |
| 5 | Speech abstraction | 3-4 | Speech swap = implement 1 adapter (~300 LOC) |
| 6 | Push abstraction | 1 | Push swap = implement 1 adapter (~50 LOC) |
| 7 | Monitoring cleanup | 0.5 | Already cloud-agnostic |
| Total | | ~15-17 days | Full cloud migration = ~7-10 days instead of 4-8 weeks |

The key insight: ~80% of migration effort is in Sprint 1 (database). If you only do one sprint, do that one. Everything else is comparatively easy.


Document generated by automated codebase analysis. Companion to CLOUD_PROVIDER_MIGRATION_ANALYSIS.md. Review as the codebase evolves.