Cloud-Agnostic Refactor Roadmap — ByteLyst Ecosystem

Author: AI Analysis (Cascade)
Date: 2026-03-01
Companion doc: CLOUD_PROVIDER_MIGRATION_ANALYSIS.md
Goal: Refactor the codebase so it continues to work on Azure today, but switching to any other cloud provider requires minimal effort (days, not weeks).

Philosophy
Current State vs Target State
Sprint Plan Overview
Sprint 1: Database Abstraction Layer
Sprint 2: Storage Abstraction Layer
Sprint 3: LLM Provider Abstraction
Sprint 4: Secrets Manager Abstraction
Sprint 5: Speech Provider Abstraction
Sprint 6: Push Notification Abstraction
Sprint 7: Monitoring & Telemetry Abstraction
Migration Effort After Refactor
Testing Strategy
Env Var Naming Convention
Risk Mitigation
Appendix: Interface Specifications

1. Philosophy

Core Principle: Provider-Agnostic Interfaces, Provider-Specific Implementations

Application Code (routes, business logic)
        │
        ▼
  @bytelyst/* interfaces  ◄── Cloud-agnostic contracts
        │
        ▼
  Provider implementations  ◄── Azure today, swap tomorrow
  ├── cosmos-provider/       (Azure Cosmos DB)
  ├── mongo-provider/        (MongoDB Atlas — future)
  ├── s3-provider/           (AWS S3 — future)
  └── ...

Design Rules

Application code NEVER imports cloud SDKs — only @bytelyst/* interfaces
Provider chosen at startup via env var — DB_PROVIDER=cosmos, STORAGE_PROVIDER=azure, etc.
All interfaces have an in-memory mock — for testing without any cloud dependency
Zero breaking changes — every sprint keeps all existing tests passing
Incremental adoption — modules migrate one at a time, old and new patterns coexist

What This Is NOT

This is not a migration to another cloud — Azure continues to be the production provider
This is not a rewrite — it's a series of refactors that insert interfaces between app code and cloud SDKs
This is not over-engineering — each interface is thin (30–60 lines) and directly maps to patterns already in the codebase

2. Current State vs Target State

Current: Direct Azure SDK Usage

                    38 repository.ts files
                    ┌──────────────────────┐
routes.ts ────────► │ container()           │
                    │   .items.query(SQL)   │ ◄── @azure/cosmos types leak everywhere
                    │   .items.create(doc)  │
                    │   .item(id,pk).read() │
                    └──────────────────────┘
                              │
                              ▼
                    @bytelyst/cosmos (client.ts)
                              │
                              ▼
                       @azure/cosmos SDK

Problems:

38 platform-service repository files write raw Cosmos SQL queries
6 additional repository files in dashboards + MindLyst web
Blob, Speech, OpenAI all have direct Azure SDK imports
Switching DB means rewriting 44+ files

Target: Provider-Agnostic Interfaces

                    38 repository.ts files
                    ┌──────────────────────────┐
routes.ts ────────► │ collection.findMany({     │
                    │   filter: {productId},    │ ◄── Cloud-agnostic API
                    │   sort: {createdAt: -1},  │
                    │   limit: 20,              │
                    │ })                         │
                    └──────────────────────────┘
                              │
                              ▼
                    @bytelyst/datastore (interface)
                              │
                    ┌─────────┼─────────┐
                    ▼         ▼         ▼
               CosmosAdapter MongoAdapter MemoryAdapter
               (Azure)      (MongoDB)    (Testing)
                    │
                    ▼
               @azure/cosmos SDK

Benefits:

Repositories use a generic query API — no SQL strings, no Azure types
Switching provider = implement a new adapter (~200 lines) + change env var
In-memory adapter makes tests fast and cloud-free
Azure continues to work exactly as before

3. Sprint Plan Overview

Sprint	Package / Scope	Effort	Files Changed	Risk
1	`@bytelyst/datastore` — DB abstraction	5–7 days	44 repository files + 1 new package	Medium
2	`@bytelyst/storage` — Blob/Object abstraction	2 days	3 files + 1 new package	Low
3	`@bytelyst/llm` — LLM provider abstraction	2 days	4 files + 1 new package	Low
4	`@bytelyst/secrets` — Secrets manager abstraction	1 day	2 files (refactor existing)	Very Low
5	`@bytelyst/speech` — Speech STT abstraction	3–4 days	3 files + 1 new package	Medium
6	`@bytelyst/push` — Push notification abstraction	1 day	1 file + 1 new package	Very Low
7	Monitoring/Telemetry cleanup	0.5 days	Already done (custom telemetry)	None
Total		~15–17 days	~55 files

Priority Order

Sprint 1 (DB) ──► Sprint 2 (Storage) ──► Sprint 3 (LLM) ──► Sprint 4 (Secrets)
    ▲ HIGHEST ROI         EASY               EASY              TRIVIAL
    │
    └── 80% of migration effort lives here. Do this first.

Sprint 5 (Speech) ──► Sprint 6 (Push) ──► Sprint 7 (Monitoring)
    MEDIUM               LOW PRIORITY        ALREADY DONE

4. Sprint 1: Database Abstraction Layer

Package: @bytelyst/datastore
Effort: 5–7 days

This is the most important sprint: the database layer accounts for roughly 80% of the migration effort, so abstracting it removes most of the cloud lock-in.

4.1 Interface Design

// packages/datastore/src/types.ts

/** A cloud-agnostic document collection (like a Cosmos container or Mongo collection). */
export interface DocumentCollection<T extends BaseDocument = BaseDocument> {
  /** Find a single document by ID + partition key. */
  findById(id: string, partitionKey: string): Promise<T | null>;

  /** Find multiple documents matching a filter. */
  findMany(opts: FindManyOptions): Promise<T[]>;

  /** Find one document matching a filter. */
  findOne(opts: FindOneOptions): Promise<T | null>;

  /** Count documents matching a filter. */
  count(filter: FilterMap): Promise<number>;

  /** Insert a new document. */
  create(doc: T): Promise<T>;

  /** Replace an entire document (full overwrite). */
  replace(id: string, partitionKey: string, doc: T): Promise<T>;

  /** Upsert: create if not exists, replace if exists. */
  upsert(doc: T): Promise<T>;

  /** Delete a document by ID + partition key. */
  delete(id: string, partitionKey: string): Promise<boolean>;

  /** Run an aggregation (COUNT, SUM, GROUP BY). */
  aggregate<R = unknown>(opts: AggregateOptions): Promise<R[]>;
}

export interface BaseDocument {
  id: string;
  [key: string]: unknown;
}

export interface FindManyOptions {
  filter: FilterMap;
  sort?: SortMap;
  limit?: number;
  offset?: number;
  partitionKey?: string;
}

export interface FindOneOptions {
  filter: FilterMap;
  partitionKey?: string;
}

export type FilterMap = Record<string, unknown>;
export type SortMap = Record<string, 1 | -1>; // 1 = ASC, -1 = DESC

export interface AggregateOptions {
  filter: FilterMap;
  groupBy?: string[];
  count?: string;     // alias for COUNT(1)
  sum?: string;       // field to SUM
}

/** Factory that creates collections — one per provider. */
export interface DatastoreProvider {
  collection<T extends BaseDocument>(name: string): DocumentCollection<T>;
  initialize?(configs: Record<string, CollectionConfig>): Promise<void>;
  close?(): Promise<void>;
}

export interface CollectionConfig {
  partitionKeyPath: string;
  defaultTtl?: number | null;
}

4.2 Cosmos Adapter (keeps everything working today)

// packages/datastore/src/providers/cosmos.ts

import type { Container, SqlParameter as SqlParam } from '@azure/cosmos';
import type {
  AggregateOptions, BaseDocument, DocumentCollection, FilterMap, FindManyOptions, FindOneOptions,
} from '../types.js';

export class CosmosCollection<T extends BaseDocument> implements DocumentCollection<T> {
  constructor(private container: Container) {}

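  async findById(id: string, partitionKey: string): Promise<T | null> {
    try {
      const { resource } = await this.container.item(id, partitionKey).read<T>();
      return resource ?? null;
    } catch (err) {
      // Only "not found" maps to null; auth, throttling, and network errors must surface.
      if ((err as { code?: number | string }).code === 404) return null;
      throw err;
    }
  }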

  async findMany(opts: FindManyOptions): Promise<T[]> {
    const { sql, params } = buildSqlQuery(opts);         // ◄── converts FilterMap → Cosmos SQL
    const { resources } = await this.container
      .items.query<T>({ query: sql, parameters: params })
      .fetchAll();
    return resources;
  }

  async create(doc: T): Promise<T> {
    const { resource } = await this.container.items.create(doc);
    return resource as T;
  }

  async replace(id: string, partitionKey: string, doc: T): Promise<T> {
    const { resource } = await this.container.item(id, partitionKey).replace<T>(doc);
    return resource as T;
  }

  async upsert(doc: T): Promise<T> {
    const { resource } = await this.container.items.upsert<T>(doc);
    return resource as T;
  }

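  async delete(id: string, partitionKey: string): Promise<boolean> {
    try {
      await this.container.item(id, partitionKey).delete();
      return true;
    } catch (err) {
      // A missing item means nothing was deleted; any other failure is rethrown.
      if ((err as { code?: number | string }).code === 404) return false;
      throw err;
    }
  }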

  // ... count(), findOne(), aggregate()
}

/** Convert a FilterMap to Cosmos SQL. */
function buildSqlQuery(opts: FindManyOptions): { sql: string; params: SqlParam[] } {
  // { productId: 'x', userId: 'y' }
  // → "SELECT * FROM c WHERE c.productId = @p0 AND c.userId = @p1 ORDER BY c.createdAt DESC OFFSET 0 LIMIT 20"
  // This is a mechanical translation — no query language exposed to application code.
}
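The stub above leaves the translation step empty. The following is a minimal sketch of what it could look like for equality filters, sorting, and paging only. The identifier check (`FIELD_NAME`, `assertFieldName`) and the rule that `offset` needs a `limit` are assumptions added for illustration, and the types are redeclared locally so the block stands alone. Operator objects such as `{ $gte: 5 }` are deliberately out of scope.

```typescript
type FilterMap = Record<string, unknown>;
type SortMap = Record<string, 1 | -1>; // 1 = ASC, -1 = DESC

interface FindManyOptions {
  filter: FilterMap;
  sort?: SortMap;
  limit?: number;
  offset?: number;
}

// Same { name, value } shape as the query parameters in the existing repositories.
interface SqlParam {
  name: string;
  value: unknown;
}

// Field names end up in the SQL text, so only plain identifiers are accepted
// (hypothetical rule; nested paths would need separate handling).
const FIELD_NAME = /^[A-Za-z_][A-Za-z0-9_]*$/;

function assertFieldName(field: string): string {
  if (!FIELD_NAME.test(field)) throw new Error(`Unsupported field name: ${field}`);
  return field;
}

function buildSqlQuery(opts: FindManyOptions): { sql: string; params: SqlParam[] } {
  const params: SqlParam[] = [];

  // Each filter entry becomes "c.<field> = @pN" with a matching parameter.
  const clauses = Object.entries(opts.filter).map(([field, value], i) => {
    params.push({ name: `@p${i}`, value });
    return `c.${assertFieldName(field)} = @p${i}`;
  });

  let sql = 'SELECT * FROM c';
  if (clauses.length > 0) sql += ` WHERE ${clauses.join(' AND ')}`;

  const sort: SortMap = opts.sort ?? {};
  const order = Object.entries(sort).map(
    ([field, dir]) => `c.${assertFieldName(field)} ${dir === 1 ? 'ASC' : 'DESC'}`,
  );
  if (order.length > 0) sql += ` ORDER BY ${order.join(', ')}`;

  // Cosmos SQL needs OFFSET and LIMIT together, so an offset alone is rejected.
  if (opts.limit !== undefined) {
    params.push({ name: '@offset', value: opts.offset ?? 0 });
    params.push({ name: '@limit', value: opts.limit });
    sql += ' OFFSET @offset LIMIT @limit';
  } else if (opts.offset !== undefined) {
    throw new Error('offset requires limit');
  }

  return { sql, params };
}
```

Values always travel as parameters, so only field names need validating. Sorting on more than one field typically requires a composite index on the Cosmos container, which this sketch does not check.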

4.3 In-Memory Adapter (for testing)

// packages/datastore/src/providers/memory.ts

export class MemoryCollection<T extends BaseDocument> implements DocumentCollection<T> {
  private docs: Map<string, T> = new Map();

  async findById(id: string): Promise<T | null> {
    return this.docs.get(id) ?? null;
  }

  async findMany(opts: FindManyOptions): Promise<T[]> {
    let results = [...this.docs.values()].filter(doc => matchesFilter(doc, opts.filter));
    if (opts.sort) results = sortDocs(results, opts.sort);
    if (opts.offset) results = results.slice(opts.offset);
    if (opts.limit !== undefined) results = results.slice(0, opts.limit); // limit: 0 returns nothing
    return results;
  }

  async create(doc: T): Promise<T> {
    this.docs.set(doc.id, doc);
    return doc;
  }
  // ... etc
}
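The memory adapter calls a `sortDocs` helper that is not shown. A minimal sketch follows, assuming sort fields hold numbers or strings and that missing values sort first; both rules are assumptions, and real Cosmos ordering across mixed types may differ.

```typescript
type SortMap = Record<string, 1 | -1>; // 1 = ASC, -1 = DESC

// Order two field values. Numbers compare numerically, everything else by its
// string form; undefined and null sort before any defined value.
function compareValues(a: unknown, b: unknown): number {
  if (a === b) return 0;
  if (a === undefined || a === null) return -1;
  if (b === undefined || b === null) return 1;
  if (typeof a === 'number' && typeof b === 'number') return a - b;
  const sa = String(a);
  const sb = String(b);
  return sa < sb ? -1 : sa > sb ? 1 : 0;
}

function sortDocs<T extends Record<string, unknown>>(docs: T[], sort: SortMap): T[] {
  const keys = Object.entries(sort);
  // Copy first so the caller's array is left untouched.
  return [...docs].sort((x, y) => {
    // Earlier sort keys win; later keys only break ties.
    for (const [field, dir] of keys) {
      const cmp = compareValues(x[field], y[field]);
      if (cmp !== 0) return cmp * dir;
    }
    return 0;
  });
}
```

For example, `sortDocs(docs, { createdAt: -1, key: 1 })` orders newest first and breaks ties alphabetically by `key`.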

4.4 MongoDB Adapter (future — ready to implement when needed)

// packages/datastore/src/providers/mongo.ts (STUB — implement when migrating)

import type { Collection as MongoCollection } from 'mongodb';
import type { BaseDocument, DocumentCollection, FindManyOptions } from '../types.js';

export class MongoDocumentCollection<T extends BaseDocument> implements DocumentCollection<T> {
  constructor(private collection: MongoCollection<T>) {}

  async findById(id: string): Promise<T | null> {
    return this.collection.findOne({ _id: id } as any) as Promise<T | null>;
  }

  async findMany(opts: FindManyOptions): Promise<T[]> {
    let cursor = this.collection.find(opts.filter);
    if (opts.sort) cursor = cursor.sort(opts.sort);
    if (opts.offset) cursor = cursor.skip(opts.offset);
    if (opts.limit) cursor = cursor.limit(opts.limit);
    return cursor.toArray() as Promise<T[]>;
  }
  // ... etc
}
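One detail the stub glosses over is that MongoDB keys documents by `_id`, while `BaseDocument` uses `id`. A minimal sketch of the mapping is below, assuming the adapter stores the application `id` in `_id`, as the stub's `findOne({ _id: id })` suggests. The helper names `toStored`, `fromStored`, and `toStoredFilter` are hypothetical.

```typescript
interface BaseDocument {
  id: string;
  [key: string]: unknown;
}

// Shape of a document as it would sit in MongoDB (assumed: _id holds the app id).
interface StoredDocument {
  _id: string;
  [key: string]: unknown;
}

// Application document -> stored document: move `id` into `_id`.
function toStored(doc: BaseDocument): StoredDocument {
  const { id, ...rest } = doc;
  return { _id: id, ...rest };
}

// Stored document -> application document: move `_id` back into `id`.
function fromStored(stored: StoredDocument): BaseDocument {
  const { _id, ...rest } = stored;
  return { id: _id, ...rest };
}

// Filters written against `id` need the same rename before they reach MongoDB.
function toStoredFilter(filter: Record<string, unknown>): Record<string, unknown> {
  const { id, ...rest } = filter;
  return id === undefined ? rest : { _id: id, ...rest };
}
```

With these in place, `findById` would return `fromStored(...)` rather than the raw driver result, so callers never see `_id`.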

4.5 How Repository Files Change

Before (Cosmos SQL in every file):

// services/platform-service/src/modules/flags/repository.ts
import { getContainer } from '../../lib/cosmos.js';

function container() {
  return getContainer('feature_flags');
}

export async function list(productId: string): Promise<FeatureFlagDoc[]> {
  const { resources } = await container()
    .items.query<FeatureFlagDoc>({
      query: 'SELECT * FROM c WHERE c.productId = @productId ORDER BY c.key ASC',
      parameters: [{ name: '@productId', value: productId }],
    })
    .fetchAll();
  return resources;
}

export async function getByKey(key: string, productId: string): Promise<FeatureFlagDoc | null> {
  const { resources } = await container()
    .items.query<FeatureFlagDoc>({
      query: 'SELECT * FROM c WHERE c.productId = @productId AND c.key = @key',
      parameters: [
        { name: '@productId', value: productId },
        { name: '@key', value: key },
      ],
    })
    .fetchAll();
  return resources[0] ?? null;
}

export async function create(doc: FeatureFlagDoc): Promise<FeatureFlagDoc> {
  const { resource } = await container().items.create(doc);
  return resource as FeatureFlagDoc;
}

After (cloud-agnostic):

// services/platform-service/src/modules/flags/repository.ts
import { getCollection } from '../../lib/datastore.js';
import type { FeatureFlagDoc } from './types.js';

function collection() {
  return getCollection<FeatureFlagDoc>('feature_flags');
}

export async function list(productId: string): Promise<FeatureFlagDoc[]> {
  return collection().findMany({
    filter: { productId },
    sort: { key: 1 },
  });
}

export async function getByKey(key: string, productId: string): Promise<FeatureFlagDoc | null> {
  return collection().findOne({
    filter: { productId, key },
  });
}

export async function create(doc: FeatureFlagDoc): Promise<FeatureFlagDoc> {
  return collection().create(doc);
}

Key observations:

No SQL strings
No @azure/cosmos types
No .items.query().fetchAll() chaining
The getCollection() function returns the right provider based on DB_PROVIDER env var
All existing behavior is preserved: the Cosmos adapter generates equivalent parameterized SQL under the hood

4.6 Service Wiring

// services/platform-service/src/lib/datastore.ts (replaces lib/cosmos.ts)
import { createDatastoreProvider } from '@bytelyst/datastore';
import type { DocumentCollection, BaseDocument } from '@bytelyst/datastore';

let _provider: ReturnType<typeof createDatastoreProvider> | null = null;

export function getProvider() {
  if (!_provider) {
    _provider = createDatastoreProvider(); // reads DB_PROVIDER env var
  }
  return _provider;
}

export function getCollection<T extends BaseDocument>(name: string): DocumentCollection<T> {
  return getProvider().collection<T>(name);
}

// packages/datastore/src/factory.ts
export function createDatastoreProvider(): DatastoreProvider {
  const provider = process.env.DB_PROVIDER || 'cosmos';
  switch (provider) {
    case 'cosmos':
      return new CosmosDatastoreProvider();  // uses existing COSMOS_ENDPOINT, COSMOS_KEY
    case 'mongo':
      return new MongoDatastoreProvider();   // uses MONGO_URI
    case 'memory':
      return new MemoryDatastoreProvider();  // no config needed
    default:
      throw new Error(`Unknown DB_PROVIDER: ${provider}`);
  }
}

4.7 Migration Plan for 38 Repository Files

Migrate in batches, one module per commit. Each commit:

Update the repository file to use getCollection() instead of getContainer()
Replace SQL queries with findMany() / findOne() / count() / aggregate()
Run the module's test file — must pass
Commit: refactor(module-name): migrate to datastore abstraction

Batch order (simplest first, complex last):

Batch	Modules	Complexity	Notes
1	flags, plans, settings, changelog, products	Simple CRUD	5 files, warmup
2	licenses, sessions, ip-rules, maintenance, feedback	Simple CRUD + filters	5 files
3	items, comments, votes, brains, reflections	CRUD + filter combos	5 files
4	audit, delivery, notifications, exports, jobs	CRUD + time queries	5 files
5	tokens, usage, invitations, referrals, webhooks	More complex queries	5 files
6	auth, subscriptions, telemetry, experiments	Complex (GROUP BY, aggregates)	4 files
7	timers, shared-timers, routines, households	Sync logic, batch ops	4 files
8	fasting-sessions, fasting-protocols, meal-log, social-fasting, daily-briefs, streaks, push-triggers, impersonation, status, memory, analytics, waitlist	Product-specific + remaining	12 files
9	Dashboard cosmos clients (admin-web, MindLyst web)	Direct `@azure/cosmos`	6 files
10	Python clients (desktop cosmos, backend cosmos)	`azure.cosmos` → abstracted	2 files

4.8 Handling Complex Queries

Some repository files use advanced Cosmos SQL features. Here's how the interface handles them:

Cosmos SQL Pattern	Datastore Interface Equivalent
`SELECT * FROM c WHERE c.x = @v`	`findMany({ filter: { x: v } })`
`SELECT * FROM c WHERE c.x = @v AND c.y = @w`	`findMany({ filter: { x: v, y: w } })`
`ORDER BY c.x ASC`	`findMany({ sort: { x: 1 } })`
`ORDER BY c.x DESC`	`findMany({ sort: { x: -1 } })`
`OFFSET @o LIMIT @l`	`findMany({ offset: o, limit: l })`
`SELECT VALUE COUNT(1) FROM c WHERE ...`	`count(filter)`
`SELECT c.plan, COUNT(1) AS cnt ... GROUP BY c.plan`	`aggregate({ filter, groupBy: ['plan'], count: 'cnt' })`
`NOT IS_DEFINED(c.usedAt)`	`findMany({ filter: { usedAt: { $exists: false } } })`
`c.x >= @v`	`findMany({ filter: { x: { $gte: v } } })`
`ARRAY_CONTAINS(c.tags, @tag)`	`findMany({ filter: { tags: { $contains: tag } } })`
`container().item(id, pk).read()`	`findById(id, pk)`
`container().items.create(doc)`	`create(doc)`
`container().item(id, pk).replace(doc)`	`replace(id, pk, doc)`
`container().items.upsert(doc)`	`upsert(doc)`
`container().item(id, pk).delete()`	`delete(id, pk)`

For the filter operators, use a simple operator convention:

// Exact match
{ productId: 'lysnrai' }

// Comparison operators
{ syncVersion: { $gte: 5 } }
{ createdAt: { $gte: '2026-01-01', $lt: '2026-02-01' } }

// Exists check (replaces NOT IS_DEFINED)
{ usedAt: { $exists: false } }

// Array contains
{ tags: { $contains: 'important' } }

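The Cosmos adapter translates these to SQL. The MongoDB adapter can pass `$gte`, `$lt`, and `$exists` through unchanged because they are native MQL operators, but `$contains` is not a MongoDB operator and must be translated (for example, to an equality match on the array field). The memory adapter does in-memory filtering.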
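The operator convention is small enough that the in-memory evaluation fits in one helper. The sketch below is a possible `matchesFilter` covering exactly the four operators listed in this section. Treating a condition as an operator object only when all of its keys start with `$`, and throwing on unknown operators, are assumptions.

```typescript
type FilterMap = Record<string, unknown>;

// A condition counts as an operator object when every key starts with "$".
function isOperatorObject(v: unknown): v is Record<string, unknown> {
  if (typeof v !== 'object' || v === null || Array.isArray(v)) return false;
  const keys = Object.keys(v);
  return keys.length > 0 && keys.every((k) => k.startsWith('$'));
}

// Returns a negative, zero, or positive number, or null when the two values
// are not comparable (different types, or neither number nor string).
function compare(a: unknown, b: unknown): number | null {
  if (typeof a === 'number' && typeof b === 'number') return a - b;
  if (typeof a === 'string' && typeof b === 'string') return a < b ? -1 : a > b ? 1 : 0;
  return null;
}

function applyOperator(op: string, actual: unknown, expected: unknown): boolean {
  switch (op) {
    case '$gte': {
      const c = compare(actual, expected);
      return c !== null && c >= 0;
    }
    case '$lt': {
      const c = compare(actual, expected);
      return c !== null && c < 0;
    }
    case '$exists':
      return (actual !== undefined) === Boolean(expected);
    case '$contains':
      return Array.isArray(actual) && actual.includes(expected);
    default:
      throw new Error(`Unsupported filter operator: ${op}`);
  }
}

function matchesFilter(doc: Record<string, unknown>, filter: FilterMap): boolean {
  return Object.entries(filter).every(([field, condition]) => {
    const actual = doc[field];
    if (isOperatorObject(condition)) {
      // All operators on one field must hold, e.g. { $gte: a, $lt: b }.
      return Object.entries(condition).every(([op, expected]) =>
        applyOperator(op, actual, expected),
      );
    }
    return actual === condition; // exact match
  });
}
```

ISO-8601 date strings compare correctly as plain strings, which is why the `createdAt` range example works without date parsing.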

5. Sprint 2: Storage Abstraction Layer

Package: @bytelyst/storage
Effort: 2 days
Files changed: packages/blob/src/blob.ts, src/cloud/blob_client.py, services/platform-service/src/modules/blob/

5.1 Interface Design

// packages/storage/src/types.ts

export interface StorageProvider {
  /** Get or create a bucket/container. */
  getBucket(name: string): StorageBucket;

  /** Check if storage is configured. */
  isConfigured(): boolean;
}

export interface StorageBucket {
  /** Upload a blob/object. */
  upload(path: string, data: Buffer | ReadableStream, contentType?: string): Promise<void>;

  /** Download a blob/object. */
  download(path: string): Promise<Buffer>;

  /** Delete a blob/object. */
  delete(path: string): Promise<boolean>;

  /** Check if a blob/object exists. */
  exists(path: string): Promise<boolean>;

  /** List blobs/objects with optional prefix. */
  list(prefix?: string): Promise<StorageObjectInfo[]>;

  /** Generate a time-limited signed URL for direct access. */
  getSignedUrl(path: string, opts: SignedUrlOptions): Promise<string>;
}

export interface SignedUrlOptions {
  permissions: 'read' | 'write' | 'readwrite';
  expiresInMinutes?: number; // default: 60
}

export interface StorageObjectInfo {
  name: string;
  size: number;
  lastModified: Date;
  contentType?: string;
}

5.2 Provider Implementations

// packages/storage/src/providers/azure-blob.ts
// Wraps existing @bytelyst/blob code — nearly 1:1 mapping

// packages/storage/src/providers/s3.ts (future)
// Uses @aws-sdk/client-s3 + @aws-sdk/s3-request-presigner

// packages/storage/src/providers/r2.ts (future)
// S3-compatible — extends S3 provider with Cloudflare-specific config

// packages/storage/src/providers/memory.ts
// In-memory Map<string, Buffer> for testing
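As an illustration of how thin the memory provider can be, here is a minimal sketch of an in-memory bucket with the same method set as `StorageBucket`. It uses `Uint8Array` instead of `Buffer` so the block has no Node type dependency, and the `memory://` URL format is invented for tests. Both choices are assumptions, as is the class name `MemoryBucket`.

```typescript
interface StorageObjectInfo {
  name: string;
  size: number;
  lastModified: Date;
  contentType?: string;
}

interface SignedUrlOptions {
  permissions: 'read' | 'write' | 'readwrite';
  expiresInMinutes?: number; // default: 60
}

interface StoredObject {
  data: Uint8Array;
  info: StorageObjectInfo;
}

class MemoryBucket {
  private readonly name: string;
  private readonly objects = new Map<string, StoredObject>();

  constructor(name: string) {
    this.name = name;
  }

  async upload(path: string, data: Uint8Array, contentType?: string): Promise<void> {
    // Copy the bytes so later mutation by the caller does not change the stored object.
    const info: StorageObjectInfo = {
      name: path,
      size: data.byteLength,
      lastModified: new Date(),
      contentType,
    };
    this.objects.set(path, { data: data.slice(), info });
  }

  async download(path: string): Promise<Uint8Array> {
    const entry = this.objects.get(path);
    if (!entry) throw new Error(`Object not found: ${path}`);
    return entry.data.slice();
  }

  async delete(path: string): Promise<boolean> {
    return this.objects.delete(path); // true only if something was removed
  }

  async exists(path: string): Promise<boolean> {
    return this.objects.has(path);
  }

  async list(prefix = ''): Promise<StorageObjectInfo[]> {
    return Array.from(this.objects.values())
      .map((entry) => entry.info)
      .filter((info) => info.name.startsWith(prefix));
  }

  async getSignedUrl(path: string, opts: SignedUrlOptions): Promise<string> {
    // Not a real signature: a recognisable placeholder for assertions in tests.
    const minutes = opts.expiresInMinutes ?? 60;
    return `memory://${this.name}/${path}?perm=${opts.permissions}&expires=${minutes}`;
  }
}
```

A test can then exercise upload and signed-URL code paths without any storage account or network access.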

5.3 Migration

The existing @bytelyst/blob package (162 lines) becomes the Azure Blob provider inside @bytelyst/storage. Consumers switch from:

// Before
import { generateSasUrl, getContainerClient } from '@bytelyst/blob';

// After
import { getStorage } from '@bytelyst/storage';
const bucket = getStorage().getBucket('audio');
const url = await bucket.getSignedUrl('user123/recording.wav', { permissions: 'read' });

Python equivalent: Refactor src/cloud/blob_client.py to use a StorageProvider ABC with AzureBlobProvider implementation.

6. Sprint 3: LLM Provider Abstraction

Package: @bytelyst/llm
Effort: 2 days
Files changed: src/llm/text_cleaner.py, backend/src/clients/openai_client.py, MindLyst web/src/lib/llm.ts, extraction-service config

6.1 Interface Design

// packages/llm/src/types.ts

export interface LLMProvider {
  chatCompletion(req: ChatCompletionRequest): Promise<ChatCompletionResponse>;
  chatCompletionStream?(req: ChatCompletionRequest): AsyncIterable<string>;
  isConfigured(): boolean;
}

export interface ChatCompletionRequest {
  messages: Array<{ role: 'system' | 'user' | 'assistant'; content: string }>;
  temperature?: number;
  maxTokens?: number;
  model?: string; // override default model
}

export interface ChatCompletionResponse {
  content: string;
  usage?: { promptTokens: number; completionTokens: number };
}

6.2 Key Insight: MindLyst Already Has This Pattern

MindLyst web/src/lib/llm.ts already auto-detects Azure vs OpenAI based on env vars. This pattern should be promoted to a shared package.

Provider implementations:

AzureOpenAIProvider — uses api-key header + deployment-scoped URL
OpenAIProvider — uses Authorization: Bearer header + model param
GeminiProvider — uses Google Generative AI SDK (future)
OllamaProvider — for local development (future)
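The difference between the Azure and OpenAI providers comes down to how the HTTP request is built. The sketch below shows that difference as a pure function, using the public chat-completions endpoints of both services. The `LLMConfig` shape and the function name `buildChatRequest` are assumptions, not the existing `llm.ts` code.

```typescript
interface ChatCompletionRequest {
  messages: Array<{ role: 'system' | 'user' | 'assistant'; content: string }>;
  temperature?: number;
  maxTokens?: number;
  model?: string; // override default model
}

// Hypothetical config shape; real values would come from env vars.
type LLMConfig =
  | { provider: 'azure'; endpoint: string; apiKey: string; deployment: string; apiVersion: string }
  | { provider: 'openai'; apiKey: string; model: string; baseUrl?: string };

interface HttpRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function stripTrailingSlash(url: string): string {
  return url.replace(/\/+$/, '');
}

function buildChatRequest(config: LLMConfig, req: ChatCompletionRequest): HttpRequest {
  // Fields left undefined are dropped by JSON.stringify.
  const payload = {
    messages: req.messages,
    temperature: req.temperature,
    max_tokens: req.maxTokens,
  };

  if (config.provider === 'azure') {
    // Azure selects the model through the deployment in the URL and
    // authenticates with an api-key header. req.model is ignored in this sketch.
    const base = stripTrailingSlash(config.endpoint);
    return {
      url: `${base}/openai/deployments/${config.deployment}/chat/completions?api-version=${config.apiVersion}`,
      headers: { 'Content-Type': 'application/json', 'api-key': config.apiKey },
      body: JSON.stringify(payload),
    };
  }

  // OpenAI takes the model in the body and authenticates with a Bearer token.
  const base = stripTrailingSlash(config.baseUrl ?? 'https://api.openai.com/v1');
  return {
    url: `${base}/chat/completions`,
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${config.apiKey}` },
    body: JSON.stringify({ ...payload, model: req.model ?? config.model }),
  };
}
```

Keeping request construction free of I/O means both providers can be unit tested without network calls; the provider classes would only add the `fetch` call and response parsing.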

6.3 Python Migration

# Before (text_cleaner.py)
from openai import AzureOpenAI
self._client = AzureOpenAI(azure_endpoint=endpoint, api_key=api_key, api_version="2024-10-21")

# After
from bytelyst.llm import create_llm_client
self._client = create_llm_client()  # reads LLM_PROVIDER, OPENAI_API_KEY, etc.
# Returns OpenAI() or AzureOpenAI() based on config — same API surface

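The openai Python SDK already has a common interface between OpenAI and AzureOpenAI. The abstraction is just a factory function that picks the right class. One difference remains for callers: with AzureOpenAI, the `model` argument names a deployment rather than a model, so the factory's config must supply the right value for each provider.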

7. Sprint 4: Secrets Manager Abstraction

Package: Refactor existing @bytelyst/config
Effort: 1 day
Files changed: packages/config/src/keyvault.ts, src/secrets/keyvault.py

7.1 Key Insight: Already 90% Done

The current resolveKeyVaultSecrets() already:

Skips if AZURE_KEYVAULT_URL is not set
Falls back to env vars for each secret
Logs warnings but doesn't throw

Refactor: Rename to resolveSecrets() with provider dispatch:

// packages/config/src/secrets.ts

export interface SecretsProvider {
  getSecret(name: string): Promise<string | null>;
}

export async function resolveSecrets(
  secrets: SecretMapping[],
  opts?: { provider?: string },
): Promise<void> {
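  // Default to Key Vault when AZURE_KEYVAULT_URL is set so existing deployments keep working.
  const provider = opts?.provider || process.env.SECRETS_PROVIDER
    || (process.env.AZURE_KEYVAULT_URL ? 'azure-keyvault' : 'env');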

  switch (provider) {
    case 'azure-keyvault':
      return resolveFromAzureKeyVault(secrets);   // existing code
    case 'aws-secrets-manager':
      return resolveFromAWSSecretsManager(secrets); // future
    case 'gcp-secret-manager':
      return resolveFromGCPSecretManager(secrets);  // future
    case 'doppler':
      return resolveFromDoppler(secrets);            // future
    case 'env':
    default:
      return; // All secrets already in env — nothing to resolve
  }
}

7.2 Rename Azure-Prefixed Env Vars

The current env vars have Azure-specific names. Add generic aliases that fall back to the Azure names:

// packages/config/src/env-aliases.ts

export const ENV_ALIASES: Record<string, string[]> = {
  // Generic name → fallback names (checked in order)
  'BLOB_CONNECTION_STRING': ['AZURE_BLOB_CONNECTION_STRING'],
  'BLOB_ACCOUNT_NAME': ['AZURE_BLOB_ACCOUNT_NAME'],
  'BLOB_ACCOUNT_KEY': ['AZURE_BLOB_ACCOUNT_KEY'],
  'SPEECH_KEY': ['AZURE_SPEECH_KEY'],
  'SPEECH_REGION': ['AZURE_SPEECH_REGION'],
  'LLM_API_KEY': ['AZURE_OPENAI_KEY', 'OPENAI_API_KEY'],
  'LLM_ENDPOINT': ['AZURE_OPENAI_ENDPOINT', 'OPENAI_BASE_URL'],
  'LLM_MODEL': ['AZURE_OPENAI_DEPLOYMENT', 'OPENAI_MODEL'],
};

export function getEnv(name: string): string | undefined {
  if (process.env[name]) return process.env[name];
  const aliases = ENV_ALIASES[name];
  if (aliases) {
    for (const alias of aliases) {
      if (process.env[alias]) return process.env[alias];
    }
  }
  return undefined;
}

This means existing .env files with AZURE_* names continue to work. New deployments can use generic names.

8. Sprint 5: Speech Provider Abstraction

Package: @bytelyst/speech
Effort: 3–4 days
Files changed: src/audio/azure_stt.py, iosApp/Services/AzureSpeechTranscriber.swift

8.1 Interface Design (Python)

# bytelyst/speech/types.py

from abc import ABC, abstractmethod
from typing import Callable

class SpeechTranscriber(ABC):
    """Cloud-agnostic streaming speech-to-text interface."""

    @abstractmethod
    def start(self, language: str = "en-US", languages: list[str] | None = None) -> None:
        """Start continuous recognition."""

    @abstractmethod
    def stop(self) -> None:
        """Stop recognition."""

    @abstractmethod
    def push_audio(self, data: bytes) -> None:
        """Push raw audio data (PCM 16-bit, 16kHz, mono)."""

    @abstractmethod
    def on_partial(self, callback: Callable[[str], None]) -> None:
        """Register callback for partial (interim) results."""

    @abstractmethod
    def on_final(self, callback: Callable[[str], None]) -> None:
        """Register callback for final (committed) results."""

    @abstractmethod
    def on_error(self, callback: Callable[[Exception], None]) -> None:
        """Register callback for errors."""

    @abstractmethod
    def set_vocabulary(self, phrases: list[str]) -> None:
        """Set custom vocabulary / phrase hints."""

8.2 Provider Implementations

# bytelyst/speech/azure_provider.py
# Wraps existing azure_stt.py code — PushAudioInputStream, SpeechRecognizer, events

# bytelyst/speech/google_provider.py (future)
# Uses google-cloud-speech streaming_recognize

# bytelyst/speech/deepgram_provider.py (future)
# Uses Deepgram WebSocket API

# bytelyst/speech/whisper_provider.py (future)
# Uses faster-whisper for local transcription (already in requirements.txt!)

8.3 Swift Protocol (iOS)

// Shared/Speech/SpeechTranscriberProtocol.swift

protocol SpeechTranscriber {
    func start(language: String, languages: [String]?) async throws
    func stop() async
    func onPartial(_ handler: @escaping (String) -> Void)
    func onFinal(_ handler: @escaping (String) -> Void)
    func onError(_ handler: @escaping (Error) -> Void)
    func setVocabulary(_ phrases: [String])
}

// Shared/Speech/AzureSpeechTranscriber.swift — existing code, implements protocol
// Shared/Speech/AppleSpeechTranscriber.swift — future, uses Apple's SFSpeechRecognizer

8.4 Note on Complexity

Speech is the hardest abstraction because:

Azure Speech SDK has a unique push-stream architecture
Google Cloud Speech uses gRPC streaming
Deepgram uses WebSockets
Each has different audio format requirements and event models

The abstraction hides these differences behind a unified push-audio + callback interface. The Azure implementation wraps existing code with zero behavior changes.

9. Sprint 6: Push Notification Abstraction

Package: @bytelyst/push
Effort: 1 day
Files changed: Platform-service push-triggers module

9.1 Interface Design

export interface PushProvider {
  send(notification: PushNotification): Promise<PushResult>;
  sendBatch(notifications: PushNotification[]): Promise<PushResult[]>;
}

export interface PushNotification {
  deviceToken: string;
  platform: 'ios' | 'android' | 'web';
  title: string;
  body: string;
  data?: Record<string, string>;
  badge?: number;
}

Implementations: AzureNotificationHubProvider, FirebaseProvider (future), ExpoProvider (for NomGap), OneSignalProvider (future).
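The interface references `PushResult` without defining it. The sketch below proposes one possible shape and pairs it with an in-memory provider, in line with the design rule that every interface has a mock. The `PushResult` fields and the `MemoryPushProvider` class are assumptions for illustration.

```typescript
interface PushNotification {
  deviceToken: string;
  platform: 'ios' | 'android' | 'web';
  title: string;
  body: string;
  data?: Record<string, string>;
  badge?: number;
}

// Hypothetical result shape: one entry per notification, with an error message on failure.
interface PushResult {
  deviceToken: string;
  success: boolean;
  error?: string;
}

interface PushProvider {
  send(notification: PushNotification): Promise<PushResult>;
  sendBatch(notifications: PushNotification[]): Promise<PushResult[]>;
}

// Records notifications instead of delivering them, so tests can assert on `sent`.
class MemoryPushProvider implements PushProvider {
  readonly sent: PushNotification[] = [];

  async send(notification: PushNotification): Promise<PushResult> {
    if (notification.deviceToken.trim() === '') {
      return { deviceToken: notification.deviceToken, success: false, error: 'empty device token' };
    }
    this.sent.push(notification);
    return { deviceToken: notification.deviceToken, success: true };
  }

  async sendBatch(notifications: PushNotification[]): Promise<PushResult[]> {
    // Results come back in input order; one failure does not abort the batch.
    return Promise.all(notifications.map((n) => this.send(n)));
  }
}
```

Returning per-notification results rather than throwing lets callers prune invalid device tokens after a batch send.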

10. Sprint 7: Monitoring & Telemetry Abstraction

Effort: 0.5 days (mostly done already)

The ecosystem already has cloud-agnostic monitoring:

Custom telemetry via @bytelyst/telemetry-client → platform-service → Cosmos
Loki + Grafana in services/monitoring/
Health checks via /health endpoints on all services

Remaining work:

Remove opencensus-ext-azure from Python requirements (optional, only used for App Insights)
Ensure all structured logging uses pino (TS) or structlog (Python) — no Azure-specific loggers

11. Migration Effort After Refactor

Once all sprints are complete, here's how much work each cloud migration scenario requires:

Scenario: Switch DB from Cosmos to MongoDB Atlas

Step	Effort	Description
Implement `MongoDatastoreProvider`	1 day	~200 lines — translate FilterMap to MongoDB find()
Set `DB_PROVIDER=mongo` + `MONGO_URI=...`	5 minutes	Config change
Run data migration script	2–4 hours	Export Cosmos JSON → import to MongoDB
Run full test suite	30 minutes	Verify all 1,029+ tests pass
Total	~1.5 days	vs 3–5 weeks without abstraction

Scenario: Switch Storage from Azure Blob to S3

Step	Effort	Description
Implement `S3StorageProvider`	0.5 day	~100 lines
Set `STORAGE_PROVIDER=s3` + `AWS_*` env vars	5 minutes	Config change
Migrate blobs	1–2 hours	azcopy or rclone
Total	~0.5 days	vs 2–3 days without abstraction

Scenario: Switch LLM from Azure OpenAI to OpenAI Direct

Step	Effort	Description
Set `LLM_PROVIDER=openai` + `OPENAI_API_KEY=...`	5 minutes	Config change only
Remove `AZURE_OPENAI_*` env vars	5 minutes	Cleanup
Total	10 minutes	Already near-zero today

Scenario: Full Cloud Migration (Azure → AWS)

Step	Effort	Description
Implement MongoDB/DynamoDB provider	1–2 days
Implement S3 storage provider	0.5 days
Implement AWS Secrets Manager provider	0.5 days
Switch LLM to OpenAI direct	10 minutes
Implement Google STT or AWS Transcribe	2–3 days	Speech is still the hardest
Implement SNS push provider	0.5 days
Data migration + testing	2–3 days
Total	~7–10 days	vs 4–8 weeks without abstraction

12. Testing Strategy

12.1 Provider-Agnostic Tests

Every repository test should work against any provider. The test setup picks the provider:

// Test setup: use in-memory provider
import { setTestProvider } from '@bytelyst/datastore/testing';

beforeAll(() => {
  setTestProvider('memory');  // Fast, no network, deterministic
});

12.2 Provider Integration Tests

Separate test suites verify each provider works correctly:

__tests__/
  datastore/
    cosmos.integration.test.ts    # Runs against real Cosmos (CI only)
    mongo.integration.test.ts     # Runs against real MongoDB (CI only)
    memory.test.ts                # Always runs — verifies memory provider

12.3 Migration Verification Checklist

For each sprint, before merging:

All existing tests pass (no regressions)
New interface tests pass with all implemented providers
Manual smoke test against Azure (dev environment)
No new @azure/* imports in application code (only in provider files)

12.4 CI Gate

Add a lint rule to prevent direct Azure SDK imports outside of provider directories:

# scripts/check-cloud-agnostic.sh
# Fail if any TypeScript file outside a providers/ directory imports @azure/*.
# rg exits 0 on a match, 1 on no match, and 2 on error; only 1 counts as a pass.
rg '@azure/' services/ dashboards/ --glob '*.ts' \
  --glob '!**/providers/**' --glob '!**/node_modules/**'
case $? in
  0) echo "FAIL: Direct Azure SDK import found outside provider layer"; exit 1 ;;
  1) echo "PASS: No direct Azure imports in application code" ;;
  *) echo "ERROR: ripgrep failed to run"; exit 2 ;;
esac

13. Env Var Naming Convention

Current (Azure-specific)

COSMOS_ENDPOINT=https://cosmos-mywisprai.documents.azure.com:443/
COSMOS_KEY=...
COSMOS_DATABASE=lysnrai
AZURE_BLOB_CONNECTION_STRING=...
AZURE_BLOB_ACCOUNT_NAME=bytelystblobs
AZURE_BLOB_ACCOUNT_KEY=...
AZURE_OPENAI_ENDPOINT=...
AZURE_OPENAI_KEY=...
AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini
AZURE_SPEECH_KEY=...
AZURE_SPEECH_REGION=eastus
AZURE_KEYVAULT_URL=...

Target (generic with Azure fallbacks)

# ── Provider Selection ────────────────────────────
DB_PROVIDER=cosmos              # cosmos | mongo | memory
STORAGE_PROVIDER=azure          # azure | s3 | r2 | memory
LLM_PROVIDER=azure              # azure | openai | gemini
SECRETS_PROVIDER=azure-keyvault # azure-keyvault | aws-secrets-manager | gcp-secret-manager | doppler | env
SPEECH_PROVIDER=azure           # azure | google | deepgram | whisper
PUSH_PROVIDER=azure-nh          # azure-nh | firebase | expo

# ── Database (provider-specific) ──────────────────
# Cosmos (when DB_PROVIDER=cosmos):
COSMOS_ENDPOINT=...
COSMOS_KEY=...
COSMOS_DATABASE=lysnrai
# MongoDB (when DB_PROVIDER=mongo):
# MONGO_URI=mongodb+srv://...

# ── Storage (provider-specific) ───────────────────
# Azure (when STORAGE_PROVIDER=azure):
AZURE_BLOB_CONNECTION_STRING=...
# S3 (when STORAGE_PROVIDER=s3):
# AWS_ACCESS_KEY_ID=...
# AWS_SECRET_ACCESS_KEY=...
# S3_BUCKET_PREFIX=bytelyst-

# ── LLM (provider-specific) ──────────────────────
# Azure OpenAI:
AZURE_OPENAI_ENDPOINT=...
AZURE_OPENAI_KEY=...
AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini
# OpenAI direct:
# OPENAI_API_KEY=...
# OPENAI_MODEL=gpt-4o-mini

# ── Secrets (optional) ───────────────────────────
AZURE_KEYVAULT_URL=...   # only if SECRETS_PROVIDER=azure-keyvault

# ── Speech ────────────────────────────────────────
AZURE_SPEECH_KEY=...
AZURE_SPEECH_REGION=eastus

Backward compatibility: All existing COSMOS_* and AZURE_* env vars continue to work. The generic *_PROVIDER vars are additive: when one is unset, the corresponding Azure provider is used.
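The selection variables above could be resolved once at startup. The following is a minimal sketch, assuming that an unset variable falls back to the Azure option (the first value listed for each variable). `resolveProviders` and `PROVIDER_OPTIONS` are hypothetical names for illustration, not part of the existing codebase.

```typescript
// Hypothetical helper: names and defaults are assumptions, not the project's actual API.
type Env = Record<string, string | undefined>;

// Allowed values per selection variable, copied from the list above.
const PROVIDER_OPTIONS = {
  DB_PROVIDER: ["cosmos", "mongo", "memory"],
  STORAGE_PROVIDER: ["azure", "s3", "r2", "memory"],
  LLM_PROVIDER: ["azure", "openai", "gemini"],
  SECRETS_PROVIDER: ["azure-keyvault", "aws", "doppler", "env"],
  SPEECH_PROVIDER: ["azure", "google", "deepgram", "whisper"],
  PUSH_PROVIDER: ["azure-nh", "firebase", "expo"],
} as const;

type ProviderVar = keyof typeof PROVIDER_OPTIONS;
type ProviderSelection = { [K in ProviderVar]: (typeof PROVIDER_OPTIONS)[K][number] };

// Assumed rule: the first option listed for each variable (the Azure one) is the default.
function resolveProviders(env: Env): ProviderSelection {
  const selection: Record<string, string> = {};
  for (const name of Object.keys(PROVIDER_OPTIONS) as ProviderVar[]) {
    const allowed: readonly string[] = PROVIDER_OPTIONS[name];
    const value = env[name] ?? allowed[0];
    // Reject typos early instead of failing later inside a factory.
    if (!allowed.includes(value)) {
      throw new Error(`${name}=${value} is not one of: ${allowed.join(", ")}`);
    }
    selection[name] = value;
  }
  return selection as unknown as ProviderSelection;
}
```

In a Node service this would typically be called as `resolveProviders(process.env)`. Taking the environment as a parameter keeps the function testable without mutating global state.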

14. Risk Mitigation

| Risk | Mitigation |
|---|---|
| FilterMap can't express complex Cosmos SQL | Add `rawQuery()` escape hatch for edge cases. Track usage: if >5% of queries need it, expand FilterMap operators |
| Performance regression from abstraction layer | Benchmark critical queries before/after. The abstraction adds one function call, which is negligible |
| Team unfamiliar with new patterns | Each sprint includes updating AGENTS.md with new conventions. Old pattern (direct Cosmos) still works during migration |
| In-memory provider behaves differently | Integration test suite runs against real Cosmos in CI. Memory provider is for unit tests only |
| Stale data during DB migration | Use dual-write pattern: write to both old and new provider during transition. Read from new, fall back to old |
| Sprint 1 takes too long | The 38 repository files can be migrated incrementally; even 5 files at a time is progress. Old and new patterns coexist |
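The dual-write mitigation for stale data can be sketched as a thin wrapper around two collections. This is an illustration only: `KeyValueCollection`, `DualWriteCollection`, and `MapCollection` are hypothetical names, and the real `DocumentCollection` contract may expose a different surface.

```typescript
// Hypothetical minimal interface; the real DocumentCollection contract may differ.
interface KeyValueCollection<T> {
  get(id: string): Promise<T | undefined>;
  put(id: string, doc: T): Promise<void>;
}

class DualWriteCollection<T> implements KeyValueCollection<T> {
  private readonly oldStore: KeyValueCollection<T>;
  private readonly newStore: KeyValueCollection<T>;

  constructor(oldStore: KeyValueCollection<T>, newStore: KeyValueCollection<T>) {
    this.oldStore = oldStore;
    this.newStore = newStore;
  }

  // Write to both providers; a failure in either one rejects the write.
  async put(id: string, doc: T): Promise<void> {
    await Promise.all([this.oldStore.put(id, doc), this.newStore.put(id, doc)]);
  }

  // Read from the new provider first, fall back to the old one on a miss.
  async get(id: string): Promise<T | undefined> {
    const fresh = await this.newStore.get(id);
    return fresh !== undefined ? fresh : this.oldStore.get(id);
  }
}

// Map-backed store used only to exercise the wrapper.
class MapCollection<T> implements KeyValueCollection<T> {
  private readonly docs = new Map<string, T>();

  async get(id: string): Promise<T | undefined> {
    return this.docs.get(id);
  }

  async put(id: string, doc: T): Promise<void> {
    this.docs.set(id, doc);
  }
}
```

A document that exists only in the old store is still readable through the wrapper, which is what allows a backfill to run in the background while traffic continues.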

Appendix: Interface Specifications

A.1 `@bytelyst/datastore` — Package Structure

packages/datastore/
├── src/
│   ├── index.ts               # Public exports
│   ├── types.ts               # All interfaces (DocumentCollection, DatastoreProvider, etc.)
│   ├── factory.ts             # createDatastoreProvider() factory
│   ├── filter.ts              # FilterMap → provider-specific query translation
│   ├── providers/
│   │   ├── cosmos.ts          # CosmosDatastoreProvider + CosmosCollection
│   │   ├── mongo.ts           # MongoDatastoreProvider + MongoCollection (stub)
│   │   └── memory.ts          # MemoryDatastoreProvider + MemoryCollection
│   └── testing.ts             # Test helpers (setTestProvider, seedCollection, etc.)
├── package.json               # peer deps: @azure/cosmos (optional), mongodb (optional)
├── tsconfig.json
└── vitest.config.ts

A.2 `@bytelyst/storage` — Package Structure

packages/storage/
├── src/
│   ├── index.ts
│   ├── types.ts               # StorageProvider, StorageBucket, SignedUrlOptions
│   ├── factory.ts             # createStorageProvider()
│   ├── providers/
│   │   ├── azure-blob.ts      # Wraps existing @bytelyst/blob code
│   │   ├── s3.ts              # AWS S3 (stub)
│   │   └── memory.ts          # In-memory for testing
│   └── testing.ts
├── package.json
└── tsconfig.json

A.3 `@bytelyst/llm` — Package Structure

packages/llm/
├── src/
│   ├── index.ts
│   ├── types.ts               # LLMProvider, ChatCompletionRequest/Response
│   ├── factory.ts             # createLLMProvider()
│   ├── providers/
│   │   ├── azure-openai.ts    # AzureOpenAI endpoint + api-key auth
│   │   ├── openai.ts          # OpenAI direct + Bearer auth
│   │   └── gemini.ts          # Google Generative AI (stub)
│   └── testing.ts             # MockLLMProvider for tests
├── package.json
└── tsconfig.json
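The difference between the `azure-openai.ts` and `openai.ts` providers noted in the tree (endpoint shape and auth header) can be sketched as a pure request builder. The config field names, `buildChatRequest`, and the `apiVersion` value passed by the caller are assumptions for illustration. The URL layouts and header names follow the public Azure OpenAI and OpenAI REST conventions.

```typescript
// Hypothetical config shapes; field names are assumptions for illustration.
type LLMConfig =
  | { provider: "azure"; endpoint: string; apiKey: string; deployment: string; apiVersion: string }
  | { provider: "openai"; apiKey: string; model: string };

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface HttpRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildChatRequest(config: LLMConfig, messages: ChatMessage[]): HttpRequest {
  if (config.provider === "azure") {
    // Azure OpenAI: the deployment is part of the URL and the key goes in the `api-key` header.
    const base = config.endpoint.replace(/\/+$/, "");
    return {
      url: `${base}/openai/deployments/${config.deployment}/chat/completions?api-version=${config.apiVersion}`,
      headers: { "Content-Type": "application/json", "api-key": config.apiKey },
      body: JSON.stringify({ messages }),
    };
  }
  // OpenAI direct: the model is part of the body and the key is sent as a Bearer token.
  return {
    url: "https://api.openai.com/v1/chat/completions",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${config.apiKey}` },
    body: JSON.stringify({ model: config.model, messages }),
  };
}
```

Because the request and response bodies are otherwise nearly identical, a provider switch between these two reduces to configuration, which matches the "config change only" estimate for Sprint 3.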

A.4 Complete Interface: FilterMap Operators

// Exact match
{ field: value }

// Comparison
{ field: { $gt: value } }       // >
{ field: { $gte: value } }      // >=
{ field: { $lt: value } }       // <
{ field: { $lte: value } }      // <=
{ field: { $ne: value } }       // !=

// Existence
{ field: { $exists: true } }    // IS_DEFINED(c.field)
{ field: { $exists: false } }   // NOT IS_DEFINED(c.field)

// String
{ field: { $startsWith: 'prefix' } }
{ field: { $contains: 'substr' } }

// Array
{ field: { $contains: value } }    // ARRAY_CONTAINS (same operator as the string form; applies when the field is an array)
{ field: { $in: [v1, v2, v3] } }   // IN operator

// Logical (for complex queries)
{ $or: [{ field1: v1 }, { field2: v2 }] }

Cosmos adapter translates each operator to SQL:

{ $gte: v } → c.field >= @pN
{ $exists: false } → NOT IS_DEFINED(c.field)
{ $contains: v } on array → ARRAY_CONTAINS(c.field, @pN)
{ $in: [...] } → c.field IN (@pN, @pM, ...)
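The translation rules above can be sketched as a small function that builds a parameterized query. This is a simplified illustration, not the project's `filter.ts`: it covers only flat filters (no `$or`, no `$contains`), and `toCosmosSql`, `SimpleFilter`, and `SqlQuery` are hypothetical names. The output shape mirrors the `{ query, parameters }` form accepted by `@azure/cosmos`.

```typescript
type Scalar = string | number | boolean | null;
type OperatorExpr = {
  $gt?: Scalar; $gte?: Scalar; $lt?: Scalar; $lte?: Scalar; $ne?: Scalar;
  $exists?: boolean; $startsWith?: string; $in?: Scalar[];
};
type SimpleFilter = Record<string, Scalar | OperatorExpr>;

interface SqlQuery {
  query: string;
  parameters: { name: string; value: Scalar }[];
}

const COMPARISONS = new Map<string, string>([
  ["$gt", ">"], ["$gte", ">="], ["$lt", "<"], ["$lte", "<="], ["$ne", "!="],
]);

function toCosmosSql(filter: SimpleFilter): SqlQuery {
  const parameters: { name: string; value: Scalar }[] = [];
  const clauses: string[] = [];
  // Register a value and return its placeholder (@p0, @p1, ...).
  const param = (value: Scalar): string => {
    const name = `@p${parameters.length}`;
    parameters.push({ name, value });
    return name;
  };

  for (const [field, cond] of Object.entries(filter)) {
    // Field names are interpolated into SQL, so only plain identifiers are accepted.
    if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(field)) throw new Error(`Unsupported field name: ${field}`);
    const path = `c.${field}`;
    if (cond === null || typeof cond !== "object") {
      clauses.push(`${path} = ${param(cond)}`); // exact match
      continue;
    }
    for (const [op, operand] of Object.entries(cond)) {
      const sqlOp = COMPARISONS.get(op);
      if (sqlOp !== undefined) clauses.push(`${path} ${sqlOp} ${param(operand as Scalar)}`);
      else if (op === "$exists") clauses.push(operand ? `IS_DEFINED(${path})` : `NOT IS_DEFINED(${path})`);
      else if (op === "$startsWith") clauses.push(`STARTSWITH(${path}, ${param(operand as string)})`);
      else if (op === "$in") clauses.push(`${path} IN (${(operand as Scalar[]).map((v) => param(v)).join(", ")})`);
      else throw new Error(`Unsupported operator: ${op}`);
    }
  }
  const where = clauses.length > 0 ? ` WHERE ${clauses.join(" AND ")}` : "";
  return { query: `SELECT * FROM c${where}`, parameters };
}
```

Values always travel as parameters and never as string fragments, so user input cannot alter the query text. Only the field path is interpolated, which is why it is validated first.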

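MongoDB adapter passes the comparison, $exists, $in, and $or operators through unchanged (MQL uses the same syntax). $startsWith and $contains are not MQL operators, so the adapter must translate them, for example to an anchored $regex for strings or a plain equality match for array membership.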

Memory adapter evaluates operators with simple JS comparisons.
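A minimal sketch of how the memory adapter might evaluate the operators listed above with plain JS comparisons. `matchesFilter` and its helpers are hypothetical names, and the handling of missing fields and mixed types here is an assumption that may not match Cosmos or MongoDB exactly.

```typescript
type Doc = Record<string, unknown>;
type Filter = Record<string, unknown>;

// An operator object is a plain object with at least one $-prefixed key.
function isOperatorObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value)
    && Object.keys(value).some((k) => k.startsWith("$"));
}

// Order two values of the same primitive type; undefined means "not comparable".
function compare(a: unknown, b: unknown): number | undefined {
  if (typeof a === "number" && typeof b === "number") return a - b;
  if (typeof a === "string" && typeof b === "string") return a < b ? -1 : a > b ? 1 : 0;
  return undefined; // mixed or unsupported types never satisfy a range operator
}

function matchesOperator(op: string, actual: unknown, operand: unknown): boolean {
  const order = compare(actual, operand);
  switch (op) {
    case "$gt": return order !== undefined && order > 0;
    case "$gte": return order !== undefined && order >= 0;
    case "$lt": return order !== undefined && order < 0;
    case "$lte": return order !== undefined && order <= 0;
    case "$ne": return actual !== operand;
    case "$exists": return (actual !== undefined) === operand;
    case "$startsWith":
      return typeof actual === "string" && typeof operand === "string" && actual.startsWith(operand);
    case "$contains": // array membership or substring, depending on the field's runtime type
      if (Array.isArray(actual)) return actual.includes(operand);
      return typeof actual === "string" && typeof operand === "string" && actual.includes(operand);
    case "$in": return Array.isArray(operand) && operand.includes(actual);
    default: throw new Error(`Unsupported operator: ${op}`);
  }
}

// Top-level keys are ANDed together; $or succeeds if any branch matches.
function matchesFilter(doc: Doc, filter: Filter): boolean {
  return Object.entries(filter).every(([key, cond]) => {
    if (key === "$or") {
      return Array.isArray(cond) && cond.some((branch) => matchesFilter(doc, branch as Filter));
    }
    if (isOperatorObject(cond)) {
      return Object.entries(cond).every(([op, operand]) => matchesOperator(op, doc[key], operand));
    }
    return doc[key] === cond; // exact match
  });
}
```

In this sketch `$ne` matches documents where the field is missing, which is how MongoDB behaves, whereas a Cosmos `!=` comparison against an undefined property does not match. Differences like this are why the integration suite still needs to run against the real backend.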

Summary

| Sprint | What | Days | After This Sprint... |
|---|---|---|---|
| 1 | Database abstraction | 5–7 | DB swap = implement 1 adapter (~200 LOC) + config change |
| 2 | Storage abstraction | 2 | Blob swap = implement 1 adapter (~100 LOC) + config change |
| 3 | LLM abstraction | 2 | LLM swap = config change only (10 minutes) |
| 4 | Secrets abstraction | 1 | Secrets swap = config change only |
| 5 | Speech abstraction | 3–4 | Speech swap = implement 1 adapter (~300 LOC) |
| 6 | Push abstraction | 1 | Push swap = implement 1 adapter (~50 LOC) |
| 7 | Monitoring cleanup | 0.5 | Already cloud-agnostic |
| Total | | ~15–17 days | Full cloud migration = ~7–10 days instead of 4–8 weeks |

The key insight: ~80% of migration effort is in Sprint 1 (database). If you only do one sprint, do that one. Everything else is comparatively easy.

Document generated by automated codebase analysis. Companion to CLOUD_PROVIDER_MIGRATION_ANALYSIS.md. Review as the codebase evolves.
