diff --git a/docs/CLOUD/CLOUD_PROVIDER_MIGRATION_ANALYSIS.md b/docs/CLOUD/CLOUD_PROVIDER_MIGRATION_ANALYSIS.md
deleted file mode 100644
index 666b184e..00000000
--- a/docs/CLOUD/CLOUD_PROVIDER_MIGRATION_ANALYSIS.md
+++ /dev/null
@@ -1,726 +0,0 @@
-# Cloud Provider Migration Analysis — ByteLyst Ecosystem
-
-> **Author:** AI Analysis (Cascade)
-> **Date:** 2026-03-01
-> **Scope:** All 7 repos — LysnrAI, MindLyst, ChronoMind, NomGap, PeakPulse, Common Platform, JarvisJr
-> **Purpose:** Evaluate current Azure investment, assess migration feasibility to AWS / GCP / MongoDB Atlas / multi-cloud, and provide actionable recommendations.
-
----
-
-## Table of Contents
-
-1. [Executive Summary](#1-executive-summary)
-2. [Current Azure Investment Inventory](#2-current-azure-investment-inventory)
-3. [Dependency Depth Analysis](#3-dependency-depth-analysis)
-4. [Migration Target Comparison](#4-migration-target-comparison)
-5. [Per-Service Migration Analysis](#5-per-service-migration-analysis)
-6. [Migration Scenario Scoring](#6-migration-scenario-scoring)
-7. [Cost Comparison](#7-cost-comparison)
-8. [Abstraction Layer Assessment](#8-abstraction-layer-assessment)
-9. [Risk Analysis](#9-risk-analysis)
-10. [Recommendations](#10-recommendations)
-11. [Migration Playbook (If Chosen)](#11-migration-playbook-if-chosen)
-12. [Appendix A: File-Level Azure Dependency Map](#appendix-a-file-level-azure-dependency-map)
-13. [Appendix B: SDK & Package Inventory](#appendix-b-sdk--package-inventory)
-
----
-
-## 1. Executive Summary
-
-The ByteLyst ecosystem is **moderately coupled** to Azure. The coupling is concentrated in **3 packages** (`@bytelyst/cosmos`, `@bytelyst/blob`, `@bytelyst/config`) and **2 Python modules** (`azure_stt.py`, `cosmos_client.py`). The architecture already uses an internal abstraction layer — most application code never imports Azure SDKs directly.
-
-### Key Findings
-
-| Dimension | Assessment |
-|-----------|-----------|
-| **Overall Azure lock-in** | **Medium** — direct Azure SDK usage is concentrated in ~15 files, but those files are foundational |
-| **Easiest to migrate** | Blob Storage, Key Vault, OpenAI, Application Insights |
-| **Hardest to migrate** | Cosmos DB (SQL API queries in 56+ repository files), Azure Speech SDK |
-| **Best alternative DB** | MongoDB Atlas (closest query model to Cosmos SQL API) |
-| **Best alternative cloud** | AWS (broadest service parity, mature SDK ecosystem) |
-| **Estimated migration effort** | 4–8 weeks for full cloud swap (Cosmos DB is the long pole) |
-| **Recommendation** | **Stay on Azure** for now, but invest in abstraction layers to reduce future switching cost |
-
-### Azure Services Used (8 total)
-
-| # | Azure Service | Monthly Cost | Lock-in Risk | Files Affected |
-|---|--------------|-------------|-------------|----------------|
-| 1 | **Cosmos DB** (SQL/NoSQL API) | ~$4–10 | **HIGH** | 56+ repository files, 3 databases, ~45 containers |
-| 2 | **Blob Storage** | ~$0.20 | LOW | 2 packages + 1 Python module |
-| 3 | **Azure OpenAI** | ~$5–10 | LOW | 3 files (already supports OpenAI fallback) |
-| 4 | **Speech Services** | $0 (F0) | **HIGH** | 2 files (deep SDK integration, streaming) |
-| 5 | **Key Vault** | ~$0.06 | LOW | 2 files (1 TS, 1 Python) |
-| 6 | **Notification Hubs** | $0 (Free) | MEDIUM | Planned, not yet deeply integrated |
-| 7 | **Application Insights** | $0 (5GB free) | LOW | 1 file (custom telemetry already built) |
-| 8 | **Azure Identity** (DefaultAzureCredential) | $0 | LOW | Used by Key Vault + Secrets Manager |
-
----
-
-## 2. Current Azure Investment Inventory
-
-### 2.1 Azure Resources (from Azure Portal)
-
-| Resource | Azure Name | Region | SKU | Status |
-|----------|-----------|--------|-----|--------|
-| Resource Group | `rg-mywisprai` | East US | — | Active |
-| Cosmos DB | `cosmos-mywisprai` | West US 2 | Serverless | Active — 3 DBs, ~45 containers |
-| Blob Storage | `bytelystblobs` | West US 2 | StorageV2, RAGRS | Active — 9+ containers |
-| Azure OpenAI | `mywisprai-openai-sweden` | Sweden Central | S0 | Active — gpt-4o-mini deployment |
-| Speech Service | `mywisprai-speech` | East US | F0 (Free) | Active |
-| Key Vault | `kv-mywisprai` | East US | Standard | Active — ~25 secrets |
-| Notification Hubs | `lysnnai` namespace | East US | Free | Active — 2 hubs |
-| App Insights | `bytelyst-appinsights` | East US | Classic | Active |
-
-### 2.2 Cosmos DB Databases & Containers
-
-| Database | Containers | Products Using |
-|----------|-----------|----------------|
-| `lysnrai` | ~27 containers (users, subscriptions, feature_flags, audit_log, tracker_items, telemetry_events, etc.) | LysnrAI, platform-service (all products) |
-| `mindlyst` | ~20 containers (brains, memory_items, streaks, reflections, etc.) | MindLyst |
-| `mywisprai` | 10 containers (legacy, pre-rebrand) | Legacy / migration target |
-
-**Total: ~57 containers across 3 databases** (~47 excluding the 10 legacy `mywisprai` containers), all using Cosmos SQL (NoSQL) API with SQL-like queries (`SELECT`, `WHERE`, `ORDER BY`, `OFFSET/LIMIT`, aggregate functions).
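To make the SQL-versus-MQL gap concrete, here is a minimal TypeScript sketch of one query shape the repositories use (equality filters, `ORDER BY ... DESC`, `OFFSET/LIMIT`), rendered both as a Cosmos parameterized query (the `{ query, parameters }` shape that `container.items.query()` accepts) and as the `filter`/`options` pair that MongoDB's `collection.find()` takes. The types are declared locally so the block runs without either SDK; `ListQuery`, `toCosmosSpec`, and `toMongoFind` are hypothetical names, not existing `@bytelyst/cosmos` APIs.

```typescript
// Hypothetical description of one common repository query shape:
// equality filters, a single sort field, and offset/limit pagination.
interface ListQuery {
  filters: Record<string, string | number | boolean>;
  orderBy: { field: string; direction: "ASC" | "DESC" };
  offset: number;
  limit: number;
}

// Local mirrors of the { query, parameters } shape accepted by @azure/cosmos.
interface SqlParameter {
  name: string;
  value: string | number | boolean;
}
interface SqlQuerySpec {
  query: string;
  parameters: SqlParameter[];
}

// Local mirror of the (filter, options) pair passed to MongoDB's collection.find().
interface MongoFindArgs {
  filter: Record<string, string | number | boolean>;
  options: { sort: Record<string, 1 | -1>; skip: number; limit: number };
}

// Field names are spliced into the SQL text, so restrict them to plain identifiers.
function assertIdentifier(field: string): void {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(field)) {
    throw new Error(`Unsafe field name: ${field}`);
  }
}

// Cosmos SQL: filter values travel as @parameters; offset/limit are validated integers.
function toCosmosSpec(q: ListQuery): SqlQuerySpec {
  if (!Number.isInteger(q.offset) || !Number.isInteger(q.limit) || q.offset < 0 || q.limit < 1) {
    throw new Error("offset must be an integer >= 0 and limit an integer >= 1");
  }
  assertIdentifier(q.orderBy.field);
  const parameters: SqlParameter[] = [];
  const clauses = Object.entries(q.filters).map(([field, value]) => {
    assertIdentifier(field);
    parameters.push({ name: `@${field}`, value });
    return `c.${field} = @${field}`;
  });
  const where = clauses.length > 0 ? ` WHERE ${clauses.join(" AND ")}` : "";
  return {
    query: `SELECT * FROM c${where} ORDER BY c.${q.orderBy.field} ${q.orderBy.direction} OFFSET ${q.offset} LIMIT ${q.limit}`,
    parameters,
  };
}

// MongoDB: the same filters become a plain filter document; sort/skip/limit are options.
function toMongoFind(q: ListQuery): MongoFindArgs {
  return {
    filter: { ...q.filters },
    options: {
      sort: { [q.orderBy.field]: q.orderBy.direction === "DESC" ? -1 : 1 },
      skip: q.offset,
      limit: q.limit,
    },
  };
}

const example: ListQuery = {
  filters: { productId: "mindlyst", userId: "u-123" },
  orderBy: { field: "createdAt", direction: "DESC" },
  offset: 0,
  limit: 20,
};
console.log(toCosmosSpec(example).query);
// SELECT * FROM c WHERE c.productId = @productId AND c.userId = @userId ORDER BY c.createdAt DESC OFFSET 0 LIMIT 20
console.log(JSON.stringify(toMongoFind(example)));
```

The remaining patterns the repositories use (`ARRAY_CONTAINS()`, `SELECT VALUE COUNT(1)`) need their own mappings, for example a plain `{ tags: value }` filter (MongoDB matches it against array elements) and `countDocuments()`.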
-
-### 2.3 Code Investment by Language
-
-| Language | Azure SDK Packages | Files Using Azure | Lines of Azure-Specific Code |
-|----------|-------------------|-------------------|------------------------------|
-| **TypeScript** | `@azure/cosmos`, `@azure/storage-blob`, `@azure/identity`, `@azure/keyvault-secrets` | ~65 files | ~500 lines |
-| **Python** | `azure-cognitiveservices-speech`, `azure-cosmos`, `azure-storage-blob`, `azure-identity`, `azure-keyvault-secrets`, `openai` (AzureOpenAI) | ~8 files | ~400 lines |
-| **Swift** | `MicrosoftCognitiveServicesSpeech` (SPX framework) | ~3 files | ~150 lines |
-| **Kotlin** | None directly (uses platform-service REST API) | 0 files | 0 lines |
-
----
-
-## 3. Dependency Depth Analysis
-
-### 3.1 Cosmos DB — DEEP (56+ files)
-
-This is the **most deeply embedded** Azure dependency. Every repository module follows the pattern:
-
-```
-types.ts → repository.ts → routes.ts
-               ↑
-       Uses @azure/cosmos SDK
-       SQL queries: SELECT c.id, c.name FROM c WHERE c.productId = @pid
-```
-
-**Touchpoints:**
-- `packages/cosmos/` — shared client singleton (`@azure/cosmos` peer dep)
-- `services/platform-service/src/modules/*/repository.ts` — **56 repository files** with Cosmos SQL queries
-- `services/extraction-service/src/modules/*/repository.ts` — 2 repository files
-- `dashboards/admin-web/src/lib/cosmos.ts` — direct `@azure/cosmos` import
-- `dashboards/admin-web/src/lib/repositories/*.ts` — 4 repository files
-- `mindlyst-native/web/src/lib/cosmos.ts` — direct `@azure/cosmos` import
-- `learning_voice_ai_agent/src/cloud/cosmos_client.py` — Python Cosmos client
-- `learning_voice_ai_agent/backend/src/cloud/cosmos.py` — Python backend Cosmos client
-
-**Query patterns used:**
-- `container.items.query()` with parameterized SQL
-- `container.items.create()`, plus `container.item(id, pk).replace()`, `.delete()`, `.read()`
-- `container.items.upsert()`
-- Partition key routing (`/userId`, `/productId`, `/id`)
-- Cross-partition queries (admin/analytics)
-- `SELECT VALUE COUNT(1)` aggregates
-- `OFFSET ... LIMIT` pagination
-- `ORDER BY` sorting
-- `ARRAY_CONTAINS()` for array queries
-
-### 3.2 Azure Speech SDK — DEEP (3 files, streaming integration)
-
-The Speech SDK is used for **real-time streaming speech-to-text** with features that are tightly coupled to the Azure SDK's event-driven architecture:
-
-- `src/audio/azure_stt.py` — 248 lines. Uses `PushAudioInputStream`, `SpeechRecognizer`, continuous recognition with `recognizing`/`recognized`/`canceled`/`session_stopped` event callbacks, `PhraseListGrammar`, auto-language detection (10 languages), auto-reconnect
-- `src/ui/settings.py` + `src/ui/unified_window.py` — connection testing
-- `mindlyst-native/iosApp/Services/AzureSpeechTranscriber.swift` — iOS Swift SPX framework
-- `mobile_app/ios/LysnrAI/` — iOS keyboard extension uses SPX framework
-
-### 3.3 Blob Storage — SHALLOW (3 files)
-
-- `packages/blob/src/blob.ts` — 162 lines, singleton client, SAS URL generation
-- `src/cloud/blob_client.py` — 190 lines, Python equivalent
-- `services/platform-service/src/modules/blob/` — REST API wrapper
-
-### 3.4 Azure OpenAI — SHALLOW (3 files, already abstracted)
-
-- `src/llm/text_cleaner.py` — uses `openai.AzureOpenAI` (OpenAI SDK with Azure endpoint)
-- `backend/src/clients/openai_client.py` — uses `openai.AsyncAzureOpenAI`
-- `mindlyst-native/web/src/lib/llm.ts` — **already has OpenAI fallback** (resolves provider dynamically)
-
-The `openai` Python/JS SDK supports both Azure and OpenAI endpoints with minimal config change. MindLyst web already handles this automatically.
-
-### 3.5 Key Vault — SHALLOW (2 files)
-
-- `packages/config/src/keyvault.ts` — 90 lines, `resolveKeyVaultSecrets()` with graceful fallback
-- `src/secrets/keyvault.py` — 69 lines, `SecretResolver` class with env var fallback
-
-Both implementations already fall back to environment variables when Key Vault is unavailable. Migration therefore amounts to no longer configuring Key Vault and relying on the env var path.
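The fallback behaviour described for `keyvault.ts` and `keyvault.py` can be illustrated with a small TypeScript sketch. It is not the actual `resolveKeyVaultSecrets()` implementation: the vault lookup is injected as a plain async function (`VaultLookup`, a hypothetical type) so the block runs without `@azure/keyvault-secrets`, and the env-name-to-secret-name mapping is an assumed convention. What it does reflect is the stated rule: no vault URL, an unreachable vault, or a missing secret all fall back to the environment.

```typescript
// Hypothetical lookup signature. In production this could wrap
// SecretClient.getSecret(name) from @azure/keyvault-secrets, or any other manager.
type VaultLookup = (secretName: string) => Promise<string | undefined>;

interface ResolveOptions {
  env: Record<string, string | undefined>; // pass process.env in Node
  vaultUrl?: string; // e.g. the AZURE_KEYVAULT_URL value; unset means env-only
  lookup?: VaultLookup;
}

// Key Vault secret names allow only letters, digits, and dashes, so map
// OPENAI_API_KEY -> OPENAI-API-KEY (an assumed naming convention).
function toVaultName(envName: string): string {
  return envName.replace(/_/g, "-");
}

// Resolve each name from the vault when one is configured; fall back to the
// environment when no vault URL is set, the lookup fails, or the secret is missing.
async function resolveSecrets(
  names: string[],
  opts: ResolveOptions,
): Promise<Record<string, string | undefined>> {
  const resolved: Record<string, string | undefined> = {};
  for (const name of names) {
    let value: string | undefined;
    if (opts.vaultUrl && opts.lookup) {
      try {
        value = await opts.lookup(toVaultName(name));
      } catch {
        value = undefined; // vault unreachable or access denied: use env instead
      }
    }
    resolved[name] = value ?? opts.env[name];
  }
  return resolved;
}

// Usage: a fake vault that knows one secret and fails for another.
const fakeVault: VaultLookup = async (secretName) => {
  if (secretName === "OPENAI-API-KEY") throw new Error("vault unreachable");
  return secretName === "COSMOS-KEY" ? "vault-cosmos" : undefined;
};
resolveSecrets(["OPENAI_API_KEY", "COSMOS_KEY"], {
  env: { OPENAI_API_KEY: "env-openai", COSMOS_KEY: "env-cosmos" },
  vaultUrl: "https://example.vault.azure.net",
  lookup: fakeVault,
}).then((secrets) => console.log(secrets)); // OPENAI_API_KEY from env, COSMOS_KEY from the vault
```

Injecting the lookup keeps the fallback logic identical whichever secrets manager sits behind it, which is why dropping Key Vault needs no application changes.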
-
-### 3.6 Notification Hubs — NOT YET INTEGRATED
-
-Planned but not deeply wired. Only the namespace and its hubs exist in Azure. Mobile apps use `BLPlatformClient` (REST) to talk to platform-service, which would route push notifications.
-
-### 3.7 Application Insights — SHALLOW (1 file)
-
-- `opencensus-ext-azure` in Python requirements (optional telemetry)
-- Custom telemetry system already built (`@bytelyst/telemetry-client`, platform-service telemetry module with Cosmos storage)
-
-The custom telemetry system means App Insights is supplementary, not critical.
-
----
-
-## 4. Migration Target Comparison
-
-### 4.1 Database: Cosmos DB → Alternatives
-
-| Feature | Azure Cosmos DB (current) | MongoDB Atlas | AWS DynamoDB | Google Firestore | PostgreSQL (Supabase/Neon) |
-|---------|--------------------------|---------------|-------------|-----------------|---------------------------|
-| **Data model** | Document (JSON) | Document (JSON) | Key-Value + Document | Document (JSON) | Relational + JSONB |
-| **Query language** | SQL-like | MQL (MongoDB Query) | PartiQL / API | GQL-like API | SQL |
-| **Partition keys** | Required | Shard keys (optional) | Required | Collection groups | Not applicable |
-| **Serverless** | Yes | Yes (Atlas Serverless) | Yes | Yes | Yes (Neon) |
-| **SQL queries** | `SELECT c.id FROM c WHERE c.x = @y` | `db.collection.find({x: y})` | `SELECT id FROM table WHERE x = ?` | Client SDK queries | Standard SQL |
-| **Aggregates** | Basic (`COUNT`, `SUM`, `AVG`) | Full (`$group`, `$match`, `$lookup`) | Limited | Limited | Full SQL |
-| **Cross-partition** | Yes (expensive) | Yes (scatter-gather) | Scan (expensive) | Yes | N/A |
-| **Change feed** | Yes | Change Streams | DynamoDB Streams | Real-time listeners | Logical replication |
-| **Global distribution** | Built-in multi-region | Atlas Global Clusters | Global Tables | Multi-region | Manual / Citus |
-| **Max doc size** | 2 MB | 16 MB | 400 KB | 1 MB | ~255 MB per JSONB value |
-| **Free tier** | 1000 RU/s + 25 GB (provisioned throughput only; not available for serverless accounts) | 512 MB | 25 GB + 25 WCU/RCU | 1 GiB + 50K reads/day | 0.5 GB (Neon) |
-| **Migration effort** | — | **Medium** (query rewrite) | **Hard** (paradigm shift) | **Hard** (no SQL) | **Hard** (schema design) |
-
-### 4.2 Object Storage: Blob → Alternatives
-
-| Feature | Azure Blob (current) | AWS S3 | GCP Cloud Storage | Cloudflare R2 | MinIO (self-hosted) |
-|---------|---------------------|--------|-------------------|---------------|---------------------|
-| **API compatibility** | Azure Blob API | S3 API | GCS API / S3-compat | S3-compatible | S3-compatible |
-| **SAS tokens** | Yes (Azure SAS) | Pre-signed URLs | Signed URLs | Pre-signed URLs | Pre-signed URLs |
-| **CDN integration** | Azure CDN | CloudFront | Cloud CDN | Built-in | Manual |
-| **Storage cost (per GB-month)** | $0.018 (Cool) | $0.023 (Standard) | $0.020 | $0.015 (no egress) | Self-hosted |
-| **Migration effort** | — | **Easy** | **Easy** | **Easy** | **Easy** |
-
-### 4.3 Speech-to-Text: Azure Speech → Alternatives
-
-| Feature | Azure Speech (current) | AWS Transcribe | Google Speech-to-Text | Deepgram | Whisper (local) |
-|---------|----------------------|----------------|----------------------|----------|-----------------|
-| **Streaming STT** | Yes (push stream) | Yes (WebSocket) | Yes (streaming) | Yes (WebSocket) | No (batch only) |
-| **Languages** | 100+ | 100+ | 125+ | 36+ | 99+ |
-| **Auto-detect lang** | Up to 10 candidates at once | Yes | Yes | Yes | Yes |
-| **Custom vocabulary** | PhraseListGrammar | Custom vocabulary | Speech adaptation | Keywords | No |
-| **Native SDK** | Python, Swift (SPX), JS | Python, no iOS SDK | Python, iOS, JS | REST/WebSocket | Python only |
-| **iOS native SDK** | SPX framework (ObjC) | No native SDK | Yes (gRPC) | No native SDK | No |
-| **Free tier** | 5 hrs/month (F0) | 60 min/month | 60 min/month | None | Free (local GPU) |
-| **Latency** | ~200ms | ~300ms | ~200ms | ~100ms | ~500ms+ (local) |
-| **Migration effort** | — | **Hard** (no iOS SDK) | **Medium** (has iOS SDK) | **Medium** (REST only) | **Hard** (no streaming) |
-
-### 4.4 LLM / AI: Azure OpenAI → Alternatives
-
-| Feature | Azure OpenAI (current) | OpenAI API (direct) | Google Gemini | AWS Bedrock | Anthropic Claude |
-|---------|----------------------|--------------------|--------------|-------------|-----------------|
-| **Models** | GPT-4o, GPT-4o-mini | Same models | Gemini 2.5 | Claude, Llama, Titan | Claude 3.5/4 |
-| **API compatibility** | OpenAI SDK (azure mode) | OpenAI SDK (native) | Google SDK | AWS SDK | Anthropic SDK |
-| **Data residency** | Azure regions | US only | Google regions | AWS regions | US/EU |
-| **Cost (small model, input/output per M tokens)** | $0.15/$0.60 (GPT-4o-mini) | $0.15/$0.60 (GPT-4o-mini) | ~$0.10/$0.40 (Flash) | Varies | ~$0.25/$1.25 (Haiku) |
-| **Migration effort** | — | **Trivial** (change endpoint) | **Easy** (SDK swap) | **Medium** | **Easy** (SDK swap) |
-
-### 4.5 Secrets Management: Key Vault → Alternatives
-
-| Feature | Azure Key Vault (current) | AWS Secrets Manager | GCP Secret Manager | HashiCorp Vault | Doppler / Infisical |
-|---------|--------------------------|--------------------|--------------------|-----------------|---------------------|
-| **Cost** | $0.03/10K ops | $0.40/secret/month + $0.05/10K API calls | $0.06/active secret version/month + $0.03/10K access ops | Free (OSS) | Free tier |
-| **SDK** | `@azure/keyvault-secrets` | `@aws-sdk/client-secrets-manager` | `@google-cloud/secret-manager` | HTTP API | SDK / CLI |
-| **Migration effort** | — | **Easy** | **Easy** | **Medium** | **Easy** |
-
-**Note:** The codebase already falls back to env vars when Key Vault is unavailable. This means Key Vault can be replaced by **any** secrets manager, or simply by `.env` files, without changes to application logic.
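The "Trivial (change endpoint)" rating for the OpenAI column in 4.4 rests on the two services differing mainly in URL layout, auth header, and how the model is chosen (Azure routes by deployment name in the URL, OpenAI takes a `model` field). The following minimal TypeScript sketch expresses that difference as a pure function, in the spirit of the provider auto-detection attributed to `mindlyst-native/web/src/lib/llm.ts` but not its actual code. The environment variable names (`AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_DEPLOYMENT`, `OPENAI_MODEL`, and so on), the defaults, and the `2024-06-01` API version are assumptions; the URL patterns and headers follow each service's public REST layout.

```typescript
// Hypothetical provider config; the variable names read in resolveLlmConfig are assumptions.
type LlmConfig =
  | { provider: "azure"; endpoint: string; deployment: string; apiKey: string; apiVersion: string }
  | { provider: "openai"; apiKey: string; model: string };

interface ChatTarget {
  url: string;
  headers: Record<string, string>;
  model?: string; // body `model` field; Azure selects the model via the deployment in the URL
}

// Build the chat-completions endpoint and auth header for either provider.
function chatTarget(cfg: LlmConfig): ChatTarget {
  if (cfg.provider === "azure") {
    const base = cfg.endpoint.replace(/\/+$/, ""); // drop trailing slashes
    return {
      url:
        `${base}/openai/deployments/${encodeURIComponent(cfg.deployment)}` +
        `/chat/completions?api-version=${encodeURIComponent(cfg.apiVersion)}`,
      headers: { "api-key": cfg.apiKey, "Content-Type": "application/json" },
    };
  }
  return {
    url: "https://api.openai.com/v1/chat/completions",
    headers: { Authorization: `Bearer ${cfg.apiKey}`, "Content-Type": "application/json" },
    model: cfg.model,
  };
}

// Prefer Azure when an Azure endpoint is configured, otherwise use OpenAI directly.
function resolveLlmConfig(env: Record<string, string | undefined>): LlmConfig {
  const endpoint = env["AZURE_OPENAI_ENDPOINT"];
  if (endpoint) {
    const azureKey = env["AZURE_OPENAI_API_KEY"];
    if (!azureKey) throw new Error("AZURE_OPENAI_ENDPOINT is set without AZURE_OPENAI_API_KEY");
    return {
      provider: "azure",
      endpoint,
      apiKey: azureKey,
      deployment: env["AZURE_OPENAI_DEPLOYMENT"] ?? "gpt-4o-mini",
      apiVersion: env["AZURE_OPENAI_API_VERSION"] ?? "2024-06-01",
    };
  }
  const apiKey = env["OPENAI_API_KEY"];
  if (!apiKey) throw new Error("No LLM provider configured");
  return { provider: "openai", apiKey, model: env["OPENAI_MODEL"] ?? "gpt-4o-mini" };
}

// Switching providers is a matter of which variables are set.
const azure = chatTarget(
  resolveLlmConfig({
    AZURE_OPENAI_ENDPOINT: "https://example.openai.azure.com/",
    AZURE_OPENAI_API_KEY: "azure-key",
  }),
);
const direct = chatTarget(resolveLlmConfig({ OPENAI_API_KEY: "sk-test" }));
console.log(azure.url);
// https://example.openai.azure.com/openai/deployments/gpt-4o-mini/chat/completions?api-version=2024-06-01
console.log(direct.url, direct.model);
```

Keeping the choice in one function is what lets the Phase 2 recommendation (`OPENAI_PROVIDER=azure|openai|gemini`) become a configuration change rather than a code change.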
-
-### 4.6 Push Notifications: Notification Hubs → Alternatives
-
-| Feature | Azure NH (current) | AWS SNS | Firebase Cloud Messaging | OneSignal | Expo Push |
-|---------|-------------------|---------|--------------------------|-----------|-----------|
-| **APNs + FCM** | Yes | Yes | FCM only (APNs via FCM) | Yes | Yes |
-| **Free tier** | 1M pushes/month | 1M publishes | Unlimited | 10K subscribers | Unlimited |
-| **Migration effort** | — | **Easy** | **Easy** | **Easy** | **Easy** (NomGap uses Expo) |
-
----
-
-## 5. Per-Service Migration Analysis
-
-### 5.1 Cosmos DB → MongoDB Atlas
-
-**Difficulty: MEDIUM-HIGH** | **Effort: 3–5 weeks** | **Risk: MEDIUM**
-
-This is the **single largest migration task**. Here's why:
-
-#### What needs to change
-
-| Layer | Current (Cosmos SQL API) | Target (MongoDB) | Files |
-|-------|--------------------------|-------------------|-------|
-| Client package | `@azure/cosmos` → `CosmosClient` | `mongodb` → `MongoClient` | `packages/cosmos/src/client.ts` |
-| Container registry | `getContainer(name)` | `db.collection(name)` | `packages/cosmos/src/containers.ts` |
-| All repository files | `container.items.query('SELECT...')` | `collection.find({...})` | **56+ files** in platform-service |
-| Dashboard Cosmos clients | `@azure/cosmos` direct | `mongodb` direct | 2 files (admin, MindLyst) |
-| Python clients | `azure.cosmos.CosmosClient` | `pymongo.MongoClient` | 2 files |
-| Query syntax | SQL-like (`SELECT c.id FROM c WHERE c.productId = @pid AND c.userId = @uid ORDER BY c.createdAt DESC OFFSET 0 LIMIT 20`) | MQL (`collection.find({productId: pid, userId: uid}).sort({createdAt: -1}).skip(0).limit(20)`) | All repository files |
-| Partition keys | Explicit partition key in every query | Shard key (auto-routed) | All repository files |
-| Upsert | `container.items.upsert(doc)` | `collection.replaceOne({_id: id}, doc, {upsert: true})` (whole-document replace, matching Cosmos upsert semantics) | ~20 files |
-| Read by ID | `container.item(id, partitionKey).read()` | `collection.findOne({_id: id})` | All repository files |
-
-#### What stays the same
-- Document structure (JSON documents with `id`, `productId`, partition keys); note that MongoDB's primary key is `_id`, so `id` must either be copied into `_id` during migration or kept as a separately indexed unique field
-- Data model (no schema changes needed — MongoDB is also schemaless)
-- Partition key concept maps to shard key
-- Serverless pricing model available on both
-
-#### Key migration steps
-1. Update `@bytelyst/cosmos` package to export MongoDB-compatible API
-2. Rewrite all SQL queries to MQL (56+ files)
-3. Replace `container.items.query()` → `collection.find()`
-4. Replace `container.item(id, pk).read()` → `collection.findOne({_id: id})`
-5. Replace `container.items.create()` → `collection.insertOne()`
-6. Replace `container.item(id, pk).replace()` → `collection.replaceOne()`
-7. Replace `container.items.upsert()` → `collection.replaceOne({_id: id}, doc, {upsert: true})`
-8. Update Python clients similarly
-9. Migrate data (use Azure Data Factory or custom script)
-10. Update all test mocks
-
-#### Why MongoDB Atlas is the best DB alternative
-- **Closest query model** to Cosmos SQL API (both are document DBs)
-- **Cosmos DB offers a MongoDB-compatible API** (Azure Cosmos DB for MongoDB), but moving to a native MongoDB driver on Atlas is the cleaner target
-- Both store schemaless JSON documents, so existing documents migrate largely unchanged
-- MongoDB's `find()` queries map closely to Cosmos SQL `SELECT` queries
-- Both support partition/shard keys, TTL indexes, change streams
-- MongoDB Atlas Serverless pricing is competitive
-- MongoDB has excellent TypeScript and Python SDKs
-
-### 5.2 Azure Speech → Google Cloud Speech-to-Text
-
-**Difficulty: HIGH** | **Effort: 2–3 weeks** | **Risk: HIGH**
-
-#### Why this is hard
-- The Azure Speech SDK uses a **push-stream architecture** (`PushAudioInputStream`) that is deeply integrated into the audio pipeline
-- The `SpeechRecognizer` has event-driven callbacks (`recognizing`, `recognized`, `canceled`, `session_stopped`) that the code relies on for real-time partial/final transcript delivery
-- Custom vocabulary via `PhraseListGrammar` is Azure-specific
-- Auto-language detection config is Azure-specific
-- The **iOS SPX framework** (Objective-C) is used in LysnrAI keyboard extension and MindLyst — there's no direct equivalent for most alternatives
-
-#### Best alternative: Google Cloud Speech-to-Text
-- Has streaming recognition with similar event model
-- Has an iOS SDK (gRPC-based)
-- Supports custom vocabulary (speech adaptation)
-- Supports auto-language detection
-- Similar pricing and free tier
-
-#### What needs to change
-- `src/audio/azure_stt.py` — complete rewrite (~248 lines)
-- `iosApp/Services/AzureSpeechTranscriber.swift` — complete rewrite
-- `LysnrAI/LysnrKeyboard/` — keyboard extension STT integration
-- Audio format handling (may differ between providers)
-- Connection test code in settings UI
-
-### 5.3 Blob Storage → AWS S3 or Cloudflare R2
-
-**Difficulty: LOW** | **Effort: 2–3 days** | **Risk: LOW**
-
-#### Why this is easy
-- `@bytelyst/blob` package is a thin wrapper (162 lines)
-- Only 3 files need changes
-- S3 API is the de facto standard — R2, MinIO, GCS all support S3-compatible API
-- SAS tokens → Pre-signed URLs (same concept, different implementation)
-
-#### What needs to change
-- `packages/blob/src/blob.ts` — swap `@azure/storage-blob` → `@aws-sdk/client-s3` + `@aws-sdk/s3-request-presigner`
-- `src/cloud/blob_client.py` — swap `azure.storage.blob` → `boto3`
-- `services/platform-service/src/modules/blob/` — update routes for pre-signed URL format
-- Environment variables: `AZURE_BLOB_*` → `AWS_S3_*` or `S3_*`
-
-### 5.4 Azure OpenAI → OpenAI API (direct) or Gemini
-
-**Difficulty: TRIVIAL** | **Effort: < 1 day** | **Risk: VERY LOW**
-
-#### Why this is trivial
-- The `openai` Python SDK supports both Azure and OpenAI endpoints — just change config
-- MindLyst web `llm.ts` **already auto-detects** Azure vs OpenAI and builds the correct URL
-- LysnrAI desktop uses `AzureOpenAI` class from `openai` SDK — switch to `OpenAI` class
-- Same models, same API shape, same pricing
-
-#### What needs to change
-- Set `OPENAI_API_KEY` instead of `AZURE_OPENAI_*` env vars
-- Change `AzureOpenAI(azure_endpoint=..., api_key=..., api_version=...)` → `OpenAI(api_key=...)`
-- Change `AsyncAzureOpenAI(...)` → `AsyncOpenAI(...)`
-- Remove `api_version` parameter
-- Pass the OpenAI model name (e.g. `gpt-4o-mini`) as `model` instead of the Azure deployment name
-- Beyond these changes, the `openai` SDK handles the rest.
-
-### 5.5 Key Vault → Environment Variables / Any Secrets Manager
-
-**Difficulty: TRIVIAL** | **Effort: < 1 day** | **Risk: VERY LOW**
-
-Both `keyvault.ts` and `keyvault.py` already implement graceful fallback:
-- If `AZURE_KEYVAULT_URL` is not set → uses env vars directly
-- If Key Vault is unreachable → falls back to env vars
-
-**To migrate:** Simply stop setting `AZURE_KEYVAULT_URL`. Everything works via env vars. Then optionally adopt any other secrets manager (AWS Secrets Manager, Doppler, Infisical, etc.).
-
-### 5.6 Notification Hubs → Firebase Cloud Messaging
-
-**Difficulty: LOW** | **Effort: 1–2 days** | **Risk: LOW**
-
-Not yet deeply integrated. The platform-service notification module sends via REST API. Swap the push provider client.
-
-### 5.7 Application Insights → Self-hosted / Grafana
-
-**Difficulty: TRIVIAL** | **Effort: Already done** | **Risk: NONE**
-
-The ecosystem already has:
-- Custom telemetry system (`@bytelyst/telemetry-client` → platform-service → Cosmos)
-- Loki + Grafana in `services/monitoring/`
-
-App Insights is therefore supplementary and can be dropped without application code changes.
-
----
-
-## 6. Migration Scenario Scoring
-
-### Scenario A: Stay on Azure (Status Quo)
-
-| Dimension | Score (1–5, higher is better) | Notes |
-|-----------|-------------|-------|
-| Migration effort | **5** (none) | No work needed |
-| Cost | **4** | ~$15/month at current scale, competitive |
-| Vendor diversity | **1** | Single cloud vendor |
-| Feature parity | **5** | Everything works today |
-| **Total** | **15/20** | |
-
-### Scenario B: Full Migration to AWS
-
-| Dimension | Score (1–5, higher is better) | Notes |
-|-----------|-------------|-------|
-| Migration effort | **2** | 6–8 weeks, Cosmos→DynamoDB is painful |
-| Cost | **3** | Roughly 2× current cost at small scale (see Section 7) |
-| Vendor diversity | **1** | Still single cloud, just different |
-| Feature parity | **3** | No native iOS Speech SDK, DynamoDB query model is very different |
-| **Total** | **9/20** | |
-
-### Scenario C: Multi-Cloud (MongoDB Atlas + OpenAI + R2 + Google STT)
-
-| Dimension | Score (1–5, higher is better) | Notes |
-|-----------|-------------|-------|
-| Migration effort | **2** | 5–7 weeks, Cosmos→MongoDB is medium |
-| Cost | **4** | MongoDB Atlas free tier, R2 no egress fees |
-| Vendor diversity | **5** | No single-vendor dependency |
-| Feature parity | **4** | MongoDB is a better document DB than Cosmos in many ways |
-| **Total** | **15/20** | |
-
-### Scenario D: Stay Azure + Add Abstraction Layers
-
-| Dimension | Score (1–5, higher is better) | Notes |
-|-----------|-------------|-------|
-| Migration effort | **4** | 1–2 weeks to add repository interface pattern |
-| Cost | **4** | No change |
-| Vendor diversity | **3** | Ready to switch, but still on Azure |
-| Feature parity | **5** | Everything works today |
-| **Total** | **16/20** | **Winner** |
-
-### Scenario E: Migrate DB Only (Cosmos → MongoDB Atlas, keep rest on Azure)
-
-| Dimension | Score (1–5, higher is better) | Notes |
-|-----------|-------------|-------|
-| Migration effort | **3** | 3–5 weeks for DB migration |
-| Cost | **4** | MongoDB Atlas Serverless may be cheaper |
-| Vendor diversity | **3** | DB is independent, other services still Azure |
-| Feature parity | **5** | MongoDB is very capable |
-| **Total** | **15/20** | |
-
----
-
-## 7. Cost Comparison
-
-### Current Azure Costs (MVP / Low Usage)
-
-| Service | Monthly Cost | Notes |
-|---------|-------------|-------|
-| Cosmos DB (Serverless) | ~$4–10 | 3 databases, ~45 containers |
-| Blob Storage (Cool, RAGRS) | ~$0.20 | 9+ containers |
-| Azure OpenAI (GPT-4o-mini) | ~$5–10 | Pay per token |
-| Speech (F0) | $0 | 5 hrs/month free |
-| Key Vault | ~$0.06 | ~25 secrets |
-| Notification Hubs (Free) | $0 | 1M pushes/month |
-| App Insights | $0 | 5 GB/month free |
-| **Total** | **~$10–20/month** | |
-
-### Equivalent AWS Costs
-
-| Service | AWS Equivalent | Monthly Cost |
-|---------|---------------|-------------|
-| Cosmos DB → DynamoDB (On-Demand) | DynamoDB | ~$5–15 |
-| Blob → S3 Standard | S3 | ~$0.25 |
-| Azure OpenAI → OpenAI API | Same pricing | ~$5–10 |
-| Speech → Transcribe | Transcribe | ~$1–3 |
-| Key Vault → Secrets Manager | Secrets Manager | ~$10 (~25 secrets × $0.40/secret) |
-| Notification Hubs → SNS | SNS | ~$0.50 |
-| App Insights → CloudWatch | CloudWatch | ~$3 |
-| **Total** | | **~$25–42/month** |
-
-### Equivalent Multi-Cloud Costs
-
-| Service | Provider | Monthly Cost |
-|---------|---------|-------------|
-| Cosmos DB → MongoDB Atlas Serverless | MongoDB | ~$3–8 |
-| Blob → Cloudflare R2 | Cloudflare | ~$0.15 (no egress) |
-| Azure OpenAI → OpenAI API (direct) | OpenAI | ~$5–10 |
-| Speech → Google STT | Google Cloud | ~$1–3 |
-| Key Vault → Doppler (free tier) | Doppler | $0 |
-| Push → Firebase FCM | Google | $0 |
-| Monitoring → Grafana Cloud (free) | Grafana | $0 |
-| **Total** | | **~$10–22/month** |
-
-### Cost Summary
-
-| Scenario | Monthly Cost | vs Current |
-|----------|-------------|-----------|
-| **Azure (current)** | ~$10–20 | Baseline |
-| **Full AWS** | ~$25–42 | +110–150% (roughly 2×) |
-| **Multi-cloud** | ~$10–22 | ~Same |
-| **MongoDB Atlas + Azure rest** | ~$10–18 | ~Same |
-
-**Verdict:** At current scale, cost is not a compelling reason to migrate. All options are under $50/month. Cost becomes more significant at scale (10K+ users), where R2 would likely be cheaper because it charges no egress fees, and MongoDB Atlas may be cheaper thanks to its serverless pricing.
-
----
-
-## 8. Abstraction Layer Assessment
-
-### Current State: Partially Abstracted
-
-The codebase already has meaningful abstraction:
-
-| Layer | Abstraction Level | Notes |
-|-------|-------------------|-------|
-| **Cosmos DB** | **Partial** — `@bytelyst/cosmos` package | Application code still writes raw SQL queries and uses `@azure/cosmos` types |
-| **Blob Storage** | **Good** — `@bytelyst/blob` package | Thin wrapper, easy to swap internals |
-| **OpenAI/LLM** | **Good** — MindLyst has provider auto-detection | LysnrAI desktop/backend hardcodes `AzureOpenAI` |
-| **Key Vault** | **Excellent** — graceful fallback to env vars | Already cloud-agnostic in practice |
-| **Speech** | **None** — raw SDK usage | Deep Azure SDK coupling in 3 files |
-| **Auth (JWT)** | **Excellent** — uses `jose` library | No cloud dependency |
-| **Push notifications** | **Good** — platform-service abstraction | Swap provider client only |
-
-### What's Missing: Repository Interface Pattern
-
-The biggest gap is that repository files directly use `@azure/cosmos` types and SQL query syntax. To make the DB layer swappable, you'd need:
-
-```typescript
-// Proposed: packages/cosmos/src/repository.ts
-
-// Hypothetical QueryOptions shape (sorting + offset/limit pagination).
-export interface QueryOptions {
-  orderBy?: { field: string; direction: 'asc' | 'desc' };
-  offset?: number;
-  limit?: number;
-}
-
-// Generic over the document type; every stored document carries a string `id`.
-export interface DocumentRepository<T extends { id: string }> {
-  findById(id: string, partitionKey: string): Promise<T | null>;
-  findMany(filter: Record<string, unknown>, opts?: QueryOptions): Promise<T[]>;
-  create(doc: T): Promise<T>;
-  replace(id: string, doc: T, partitionKey: string): Promise<T>;
-  upsert(doc: T): Promise<T>;
-  delete(id: string, partitionKey: string): Promise<void>;
-  count(filter: Record<string, unknown>): Promise<number>;
-}
-```
-
-Once the 56+ repository files depend on this interface, Cosmos → MongoDB → PostgreSQL swaps happen behind it without touching those files again.
-
-**Effort to add:** 1–2 weeks. This is the **highest-ROI investment** regardless of migration decision.
-
----
-
-## 9. Risk Analysis
-
-### 9.1 Risks of Staying on Azure
-
-| Risk | Likelihood | Impact | Mitigation |
-|------|-----------|--------|-----------|
-| Azure pricing increases | Low | Medium | Add abstraction layer for future portability |
-| Azure outage | Low | High | Multi-region possible via Cosmos global distribution, but the current serverless account is single-region, so this would require a provisioned-throughput account |
-| Feature stagnation | Very Low | Low | Azure is investing heavily in AI services |
-| Vendor lock-in deepens over time | Medium | Medium | Add abstraction layers proactively |
-
-### 9.2 Risks of Migrating
-
-| Risk | Likelihood | Impact | Mitigation |
-|------|-----------|--------|-----------|
-| Data loss during migration | Low | Critical | Test migration on staging first, keep Azure as backup |
-| Query performance differences | Medium | Medium | Benchmark before committing |
-| Feature gaps in new provider | Medium | Medium | Prototype critical features first |
-| Wasted engineering time | Medium | High | Only migrate if there's a clear business driver |
-| Regression bugs in 56+ repository files | High | Medium | Comprehensive test suite (1,029 tests) catches most issues |
-| Speech quality degradation | Medium | High | A/B test both providers before committing |
-
-### 9.3 Azure-Specific Lock-in Risks (ranked)
-
-| # | Component | Lock-in Level | Escape Hatch |
-|---|-----------|--------------|-------------|
-| 1 | **Cosmos DB SQL API** | High | Rewrite queries to MongoDB MQL or add repository interface |
-| 2 | **Azure Speech SDK (streaming)** | High | Google STT has comparable streaming API |
-| 3 | **Azure Identity (DefaultAzureCredential)** | Medium | Only used by Key Vault, which is already optional |
-| 4 | **Blob Storage SAS tokens** | Low | Pre-signed URLs are equivalent across all providers |
-| 5 | **Azure OpenAI** | Very Low | OpenAI SDK works with both; a few config and client-class changes (see 5.4) |
-| 6 | **Key Vault** | Very Low | Already has env var fallback |
-| 7 | **Notification Hubs** | Very Low | Not deeply integrated yet |
-| 8 | **Application Insights** | None | Custom telemetry already built |
-
----
-
-## 10. Recommendations
-
-### Recommended Strategy: **Stay on Azure + Invest in Abstraction** (Scenario D)
-
-This is the highest-scoring approach. Here's the prioritized action plan:
-
-#### Phase 1: Add Repository Interface (1–2 weeks)
-- Create `DocumentRepository` interface in `@bytelyst/cosmos`
-- Implement `CosmosDocumentRepository` that wraps current `@azure/cosmos` calls
-- Gradually migrate the 56 repository files to use the interface
-- This makes future DB migration a matter of implementing `MongoDocumentRepository` — no application code changes needed
-
-#### Phase 2: Normalize LLM Abstraction (2–3 days)
-- Move LysnrAI desktop/backend from `AzureOpenAI` → auto-detecting provider pattern (like MindLyst web already does)
-- Support `OPENAI_PROVIDER=azure|openai|gemini` across all repos
-- This makes LLM provider swappable via config
-
-#### Phase 3: Speech Abstraction Layer (1 week, optional)
-- Create `SpeechTranscriber` protocol/interface
-- Implement `AzureSpeechTranscriber` (current code, extracted)
-- Prepare `GoogleSpeechTranscriber` stub for future use
-- This is lower priority since Azure Speech F0 tier is free
-
-#### Phase 4: Document Decision Criteria for Future Migration
-- Define triggers that would justify migration (e.g., cost > $X/month, Azure outage > Y hours, need for feature Z)
-- Review annually
-
-### Why NOT Migrate Now
-
-1. **Cost is negligible** — ~$10–20/month doesn't justify weeks of engineering
-2. **No business driver** — Azure isn't blocking any feature development
-3. **Risk/reward is unfavorable** — 4–8 weeks of migration work for ~$0 cost savings
-4. **Test coverage is good but not perfect** — 1,029 tests cover most paths, but query-level changes in 56 files still risk regressions
-5. **Azure free tiers are generous** — Speech F0, Notification Hubs Free, App Insights free tier
-
-### When Migration WOULD Make Sense
-
-- **Cosmos DB costs exceed $100/month** → Consider MongoDB Atlas Serverless
-- **Azure Speech quality is insufficient** → Evaluate Google STT or Deepgram
-- **Enterprise customer requires specific cloud** → Build the repository interface, then implement their cloud backend
-- **Azure has extended outage affecting your region** → Multi-region or multi-cloud
-- **You want to go fully open-source** → PostgreSQL (Supabase) + Whisper + MinIO (significant rewrite)
-
----
-
-## 11. Migration Playbook (If Chosen)
-
-If you decide to migrate in the future, here's the execution order (shortest critical path):
-
-### Week 1–2: Database Abstraction
-1. Create `DocumentRepository` interface
-2. Implement `CosmosDocumentRepository` (wraps current code)
-3. Migrate all 56 repository files to use interface
-4. Verify all 1,029 tests pass
-
-### Week 3–4: Database Migration (Cosmos → MongoDB)
-1. Implement `MongoDocumentRepository`
-2. Set up MongoDB Atlas Serverless cluster
-3. Write data migration script (Cosmos → MongoDB)
-4. Run migration on staging, verify data integrity
-5. Switch repository implementation via config flag
-6. Run full test suite against MongoDB
-
-### Week 5: Storage + Secrets
-1. Swap `@bytelyst/blob` internals to S3-compatible client
-2. Migrate blobs (azcopy → aws s3 sync or similar)
-3. Replace Key Vault with new secrets manager (or just env vars)
-4. Update all environment variable names
-
-### Week 6: LLM + Speech (if needed)
-1. Switch OpenAI from Azure endpoint to direct (config change only)
-2. If migrating Speech: rewrite `azure_stt.py` and Swift `AzureSpeechTranscriber`
-3. A/B test new speech provider against Azure
-
-### Week 7–8: Cleanup + Verification
-1. Remove all `@azure/*` npm packages
-2. Remove all `azure-*` pip packages
-3. Update Docker configs, CI/CD
-4. Update documentation
-5. Monitor production for 2 weeks
-
----
-
-## Appendix A: File-Level Azure Dependency Map
-
-### TypeScript — `@azure/cosmos` (CRITICAL)
-
-| File | Repo | Direct Import |
-|------|------|---------------|
-| `packages/cosmos/src/client.ts` | common-plat | `@azure/cosmos` |
-| `packages/cosmos/src/containers.ts` | common-plat | `@azure/cosmos` |
-| `services/platform-service/src/modules/*/repository.ts` (56 files) | common-plat | Via `@bytelyst/cosmos` |
-| `services/extraction-service/src/modules/*/repository.ts` (2 files) | common-plat | Via `@bytelyst/cosmos` |
-| `dashboards/admin-web/src/lib/cosmos.ts` | common-plat | `@azure/cosmos` |
-| `dashboards/admin-web/src/lib/repositories/*.ts` (4 files) | common-plat | Via cosmos.ts |
-| `mindlyst-native/web/src/lib/cosmos.ts` | MindLyst | `@azure/cosmos` |
-
-### TypeScript — `@azure/storage-blob`
-
-| File | Repo | Direct Import |
-|------|------|---------------|
-| `packages/blob/src/blob.ts` | common-plat | `@azure/storage-blob` |
-
-### TypeScript — `@azure/identity` + `@azure/keyvault-secrets`
-
-| File | Repo | Direct Import |
-|------|------|---------------|
-| `packages/config/src/keyvault.ts` | common-plat | Dynamic import (both) |
-| `dashboards/admin-web/src/app/api/ops/secrets/route.ts` | common-plat | Both (Secrets Manager UI) |
-
-### Python — Azure SDKs
-
-| File | Repo | SDK |
-|------|------|-----|
-| `src/audio/azure_stt.py` | LysnrAI | `azure.cognitiveservices.speech` |
-| `src/cloud/cosmos_client.py` | LysnrAI | `azure.cosmos` |
-| `src/cloud/blob_client.py` | LysnrAI | `azure.storage.blob` |
-| `src/secrets/keyvault.py` | LysnrAI | `azure.identity`, `azure.keyvault.secrets` |
-| `backend/src/secrets/keyvault.py` | LysnrAI | `azure.identity`, `azure.keyvault.secrets` |
-| `backend/src/cloud/cosmos.py` | LysnrAI | `azure.cosmos` |
-| `src/llm/text_cleaner.py` | LysnrAI | `openai.AzureOpenAI` |
-| `backend/src/clients/openai_client.py` | LysnrAI | `openai.AsyncAzureOpenAI` |
-
-### Swift — Azure
Speech SDK - -| File | Repo | SDK | -|------|------|-----| -| `iosApp/Services/AzureSpeechTranscriber.swift` | MindLyst | `MicrosoftCognitiveServicesSpeech` | -| `LysnrAI/LysnrKeyboard/KeyboardViewController.swift` | LysnrAI | SPX framework (via CocoaPods) | - ---- - -## Appendix B: SDK & Package Inventory - -### npm packages (TypeScript) - -| Package | Version | Used By | Swappable | -|---------|---------|---------|-----------| -| `@azure/cosmos` | ≥4.0.0 | `@bytelyst/cosmos`, admin-web, MindLyst web | Medium (query rewrite) | -| `@azure/storage-blob` | ≥12.0.0 | `@bytelyst/blob` | Easy (S3 compat) | -| `@azure/identity` | latest | `@bytelyst/config`, admin-web secrets | Easy (remove) | -| `@azure/keyvault-secrets` | latest | `@bytelyst/config`, admin-web secrets | Easy (remove) | - -### pip packages (Python) - -| Package | Version | Used By | Swappable | -|---------|---------|---------|-----------| -| `azure-cognitiveservices-speech` | ≥1.42.0 | Desktop STT | Hard (deep SDK integration) | -| `azure-cosmos` | latest | Desktop + backend Cosmos client | Medium (pymongo swap) | -| `azure-storage-blob` | ≥12.24.0 | Desktop blob client | Easy (boto3 swap) | -| `azure-identity` | ≥1.19.0 | Key Vault auth | Easy (remove) | -| `azure-keyvault-secrets` | ≥4.9.0 | Secrets resolver | Easy (remove) | -| `openai` | ≥1.60.0 | `AzureOpenAI` / `AsyncAzureOpenAI` | Trivial (change class name) | -| `opencensus-ext-azure` | ≥1.1.0 | Optional telemetry | Trivial (remove) | - -### Swift packages / CocoaPods - -| Package | Used By | Swappable | -|---------|---------|-----------| -| `MicrosoftCognitiveServicesSpeech` (SPX) | LysnrAI iOS, MindLyst iOS | Hard (need alternative streaming STT) | - ---- - -*Document generated by automated codebase analysis. Numbers are accurate as of 2026-03-01. 
Update as the codebase evolves.* diff --git a/docs/WINDSURF/AZURE_CONNECTION_AUDIT.md b/docs/WINDSURF/AZURE_CONNECTION_AUDIT.md deleted file mode 100644 index a8cb24f7..00000000 --- a/docs/WINDSURF/AZURE_CONNECTION_AUDIT.md +++ /dev/null @@ -1,181 +0,0 @@ -# Azure Connection Audit — Full Workspace Report - -> **Date:** 2026-02-22 -> **Scope:** `learning_ai_common_plat`, `learning_voice_ai_agent`, `learning_multimodal_memory_agents`, `learning_ai_clock`, `learning_ai_fastgap` -> **Auditor:** Cascade (AI) - ---- - -## Executive Summary - -| Category | Issues Found | Fixed (session 1) | Fixed (session 2) | Remaining | -|----------|-------------|-------------------|-------------------|-----------| -| `x-request-id` missing | 12 clients | 2 (MindLyst) | **9** (root cause + feature-flags) | 0 ✅ | -| `x-product-id` missing | 6 clients | 0 | **6** (admin + user dashboards + Python) | 0 ✅ | -| Cosmos PK mismatch | 1 container | 0 (flagged) | 0 | 1 (needs migration) | -| `.env.example` gaps | 4 files | 1 (MindLyst) | **3** (ChronoMind, user-dash, admin-dash) | 0 ✅ | -| Hardcoded productId | 2 instances | 0 | **2** (telemetry.ts, platform_client.py) | 0 ✅ | -| Python client gaps | 1 file | 0 | **1** (headers + config) | 0 ✅ | - ---- - -## 1. `x-request-id` Header — Root Cause - -### Finding - -**`@bytelyst/api-client` does NOT auto-inject `x-request-id`.** - -The `createApiClient()` factory in `packages/api-client/src/client.ts` only sets `Content-Type`, auth token (via `getToken`), and caller-supplied `defaultHeaders`. No `x-request-id` is generated. This means **every consumer** that relies on `@bytelyst/api-client` without explicitly adding the header is missing request tracing. - -### Root Cause Fix - -Add `x-request-id: crypto.randomUUID()` to `buildHeaders()` in `packages/api-client/src/client.ts`. This single change propagates to all consumers automatically. 
- -### Affected Clients (missing `x-request-id`) - -| Repo | File | Client Pattern | -|------|------|---------------| -| `common_plat` | `dashboards/admin-web/src/lib/billing-client.ts` | `createApiClient` — no `x-request-id` | -| `common_plat` | `dashboards/admin-web/src/lib/growth-client.ts` | `createApiClient` — no `x-request-id` | -| `common_plat` | `dashboards/admin-web/src/lib/platform-client.ts` | `createApiClient` — no `x-request-id` | -| `common_plat` | `dashboards/tracker-web/src/lib/tracker-client.ts` | `createApiClient` — no `x-request-id` | -| `common_plat` | `packages/extraction/src/client.ts` | `createApiClient` — no `x-request-id` | -| `voice_ai_agent` | `user-dashboard-web/src/lib/billing-client.ts` | `createApiClient` — no `x-request-id` | -| `voice_ai_agent` | `user-dashboard-web/src/lib/growth-client.ts` | `createApiClient` — no `x-request-id` | -| `voice_ai_agent` | `user-dashboard-web/src/lib/platform-client.ts` | `createApiClient` — no `x-request-id` | -| `voice_ai_agent` | `user-dashboard-web/src/lib/feature-flags.ts` | Custom `fetch` — no `x-request-id` | -| `voice_ai_agent` | `backend/src/clients/platform_client.py` | `httpx` — no `x-request-id` | - -### Already Fixed (previous session) - -| Repo | File | Status | -|------|------|--------| -| `multimodal_memory` | `web/src/lib/billing-client.ts` | ✅ Added via `defaultHeaders` | -| `multimodal_memory` | `web/src/lib/feature-flags.ts` | ✅ Added manually | - -### Already Correct - -| Repo | File | Status | -|------|------|--------| -| `ai_fastgap` (NomGap) | `src/api/client.ts` | ✅ Custom client with `crypto.randomUUID()` | -| `ai_clock` (ChronoMind) | `web/src/lib/platform-sync.ts` | ✅ Custom client with `crypto.randomUUID()` | -| `voice_ai_agent` | `backend/src/main.py` | ✅ Middleware propagates/generates | -| `voice_ai_agent` | `backend/src/clients/extraction_client.py` | ✅ Passes `request_id` param | - ---- - -## 2. 
`x-product-id` Header Gaps - -### Clients Missing `x-product-id` - -| Repo | File | Impact | -|------|------|--------| -| `common_plat` | `admin-web/src/lib/billing-client.ts` | Server can't filter by product | -| `common_plat` | `admin-web/src/lib/growth-client.ts` | Server can't filter by product | -| `voice_ai_agent` | `user-dashboard-web/src/lib/billing-client.ts` | Server can't filter by product | -| `voice_ai_agent` | `user-dashboard-web/src/lib/growth-client.ts` | Server can't filter by product | -| `voice_ai_agent` | `user-dashboard-web/src/lib/platform-client.ts` | Passes in body, not header | -| `voice_ai_agent` | `backend/src/clients/platform_client.py` | Passes in body/params, not header | - -### Already Correct - -| Repo | File | -|------|------| -| `ai_fastgap` (NomGap) | `src/api/client.ts` — `x-product-id: API_CONFIG.productId` | -| `ai_clock` (ChronoMind) | `web/src/lib/platform-sync.ts` — `x-product-id` header | -| `multimodal_memory` (MindLyst) | `web/src/lib/billing-client.ts` — via `defaultHeaders` | -| `multimodal_memory` (MindLyst) | `web/src/lib/feature-flags.ts` — explicit header | -| `common_plat` | `tracker-web/src/lib/tracker-client.ts` — from `localStorage` | - ---- - -## 3. Cosmos DB Partition Key Mismatch - -### `referrals` Container — 3-way Mismatch - -| Location | Partition Key | -|----------|--------------| -| `platform-service/src/lib/cosmos-init.ts` | `/id` | -| MindLyst `web/src/lib/cosmos.ts` | `/userId` | -| Admin dashboard `admin-web/src/lib/cosmos.ts` | `/referrerId` | -| User dashboard `user-dashboard-web/src/lib/cosmos.ts` | `/referrerId` | - -**Status:** Flagged in previous session. Cannot be fixed without data migration. Comment added to `cosmos-init.ts`. - -**Risk:** Cross-partition queries will silently succeed but may return incomplete results or fail on point reads if the wrong partition key is specified. - ---- - -## 4. 
Missing Environment Variables in `.env.example` Files - -### ChronoMind `web/.env.example` - -Currently only has: -``` -NEXT_PUBLIC_PLATFORM_SERVICE_URL=http://localhost:4003/api -``` - -**Missing:** -- `NEXT_PUBLIC_PRODUCT_ID=chronomind` — used implicitly by `platform-sync.ts` (hardcoded there, but should be env-driven for consistency) - -### LysnrAI `user-dashboard-web/.env.example` - -**Missing:** -- `NEXT_PUBLIC_PRODUCT_ID=lysnrai` — referenced by `feature-flags.ts` line 10 -- `NEXT_PUBLIC_PLATFORM_SERVICE_URL=http://localhost:4003` — referenced by `feature-flags.ts` line 11 - -Has `PLATFORM_SERVICE_URL` (server-side) but not the `NEXT_PUBLIC_` variant (client-side). - -### LysnrAI root `.env.example` - -**Missing:** -- `NEXT_PUBLIC_PRODUCT_ID` — not needed at root level (desktop app), so this is informational only. - -### Admin dashboard `.env.example` - -**Missing:** -- `AZURE_KEYVAULT_URL` — referenced by `instrumentation.ts` but not in `.env.example` - ---- - -## 5. Hardcoded `productId` Values - -| Repo | File | Line | Value | Should Use | -|------|------|------|-------|-----------| -| `multimodal_memory` | `web/src/lib/telemetry.ts` | 19 | `productId: 'mindlyst'` | `process.env.NEXT_PUBLIC_PRODUCT_ID` | -| `voice_ai_agent` | `backend/src/clients/platform_client.py` | 86, 101 | `product_id: str = "lysnrai"` | `settings.PRODUCT_ID` or config | - ---- - -## 6. Python Backend Client Gaps (`platform_client.py`) - -The `PlatformClient` class in `backend/src/clients/platform_client.py` has several issues: - -1. **No `x-request-id` header** on any request -2. **No `x-product-id` header** on any request -3. **Creates new `httpx.AsyncClient` per request** — no connection pooling -4. **Hardcoded `product_id="lysnrai"` defaults** — should use config - ---- - -## 7. 
Previously Fixed (Session 1) - -| Fix | Repo | File | -|-----|------|------| -| Added `x-request-id` to billing client | `multimodal_memory` | `web/src/lib/billing-client.ts` | -| Added `x-request-id` to feature flags | `multimodal_memory` | `web/src/lib/feature-flags.ts` | -| Added 13 MindLyst containers to cosmos-init | `common_plat` | `services/platform-service/src/lib/cosmos-init.ts` | -| Added Blob Storage creds to Python config | `voice_ai_agent` | `backend/src/config.py` | -| Added missing env vars to MindLyst | `multimodal_memory` | `web/.env.example` | - ---- - -## 8. Recommended Fix Order - -1. **P0 — Root cause:** Add `x-request-id` auto-generation to `@bytelyst/api-client` `buildHeaders()` → fixes 9 TS clients at once -2. **P0 — LysnrAI feature-flags:** Add `x-request-id` to the custom `fetch` call in `user-dashboard-web/src/lib/feature-flags.ts` -3. **P1 — Python backend:** Add `x-request-id` and `x-product-id` headers to `platform_client.py` -4. **P1 — Env vars:** Add missing `NEXT_PUBLIC_*` vars to ChronoMind, LysnrAI user-dashboard, admin-dashboard `.env.example` files -5. **P2 — `x-product-id`:** Add to admin/user dashboard clients via `defaultHeaders` in `createApiClient` config -6. **P2 — Hardcoded productId:** Replace in `telemetry.ts` and `platform_client.py` -7. 
**P3 — Referrals PK mismatch:** Requires data migration strategy (separate task) diff --git a/docs/architecture/CLOUD_PROVIDER_MIGRATION_ANALYSIS.md b/docs/architecture/CLOUD_PROVIDER_MIGRATION_ANALYSIS.md new file mode 100644 index 00000000..c2c58fe3 --- /dev/null +++ b/docs/architecture/CLOUD_PROVIDER_MIGRATION_ANALYSIS.md @@ -0,0 +1,749 @@ +# Cloud Provider Migration Analysis — ByteLyst Ecosystem + +> **Author:** AI Analysis (Cascade) +> **Date:** 2026-03-01 +> **Scope:** All 7 repos — LysnrAI, MindLyst, ChronoMind, NomGap, PeakPulse, Common Platform, JarvisJr +> **Purpose:** Evaluate current Azure investment, assess migration feasibility to AWS / GCP / MongoDB Atlas / multi-cloud, and provide actionable recommendations. + +--- + +## Table of Contents + +1. [Executive Summary](#1-executive-summary) +2. [Current Azure Investment Inventory](#2-current-azure-investment-inventory) +3. [Dependency Depth Analysis](#3-dependency-depth-analysis) +4. [Migration Target Comparison](#4-migration-target-comparison) +5. [Per-Service Migration Analysis](#5-per-service-migration-analysis) +6. [Migration Scenario Scoring](#6-migration-scenario-scoring) +7. [Cost Comparison](#7-cost-comparison) +8. [Abstraction Layer Assessment](#8-abstraction-layer-assessment) +9. [Risk Analysis](#9-risk-analysis) +10. [Recommendations](#10-recommendations) +11. [Migration Playbook (If Chosen)](#11-migration-playbook-if-chosen) +12. [Appendix A: File-Level Azure Dependency Map](#appendix-a-file-level-azure-dependency-map) +13. [Appendix B: SDK & Package Inventory](#appendix-b-sdk--package-inventory) + +--- + +## 1. Executive Summary + +The ByteLyst ecosystem is **moderately coupled** to Azure. The coupling is concentrated in **3 packages** (`@bytelyst/cosmos`, `@bytelyst/blob`, `@bytelyst/config`) and **2 Python modules** (`azure_stt.py`, `cosmos_client.py`). The architecture already uses an internal abstraction layer — most application code never imports Azure SDKs directly. 
+ +### Key Findings + +| Dimension | Assessment | +| ------------------------------ | ------------------------------------------------------------------------------------------- | +| **Overall Azure lock-in** | **Medium** — concentrated in ~15 files, but those files are foundational | +| **Easiest to migrate** | Blob Storage, Key Vault, OpenAI, Application Insights | +| **Hardest to migrate** | Cosmos DB (SQL API queries in 56+ repository files), Azure Speech SDK | +| **Best alternative DB** | MongoDB Atlas (closest query model to Cosmos SQL API) | +| **Best alternative cloud** | AWS (broadest service parity, mature SDK ecosystem) | +| **Estimated migration effort** | 4–8 weeks for full cloud swap (Cosmos DB is the long pole) | +| **Recommendation** | **Stay on Azure** for now, but invest in abstraction layers to reduce future switching cost | + +### Azure Services Used (8 total) + +| # | Azure Service | Monthly Cost | Lock-in Risk | Files Affected | +| --- | ------------------------------------------- | ------------- | ------------ | ------------------------------------------------- | +| 1 | **Cosmos DB** (SQL/NoSQL API) | ~$4–10 | **HIGH** | 56+ repository files, 3 databases, ~45 containers | +| 2 | **Blob Storage** | ~$0.20 | LOW | 2 packages + 1 Python module | +| 3 | **Azure OpenAI** | ~$5–10 | LOW | 3 files (already supports OpenAI fallback) | +| 4 | **Speech Services** | $0 (F0) | **HIGH** | 2 files (deep SDK integration, streaming) | +| 5 | **Key Vault** | ~$0.06 | LOW | 2 files (1 TS, 1 Python) | +| 6 | **Notification Hubs** | $0 (Free) | MEDIUM | Planned, not yet deeply integrated | +| 7 | **Application Insights** | $0 (5GB free) | LOW | 1 file (custom telemetry already built) | +| 8 | **Azure Identity** (DefaultAzureCredential) | $0 | LOW | Used by Key Vault + Secrets Manager | + +--- + +## 2. 
Current Azure Investment Inventory + +### 2.1 Azure Resources (from Azure Portal) + +| Resource | Azure Name | Region | SKU | Status | +| ----------------- | ------------------------- | -------------- | ---------------- | ------------------------------- | +| Resource Group | `rg-mywisprai` | East US | — | Active | +| Cosmos DB | `cosmos-mywisprai` | West US 2 | Serverless | Active — 3 DBs, ~45 containers | +| Blob Storage | `bytelystblobs` | West US 2 | StorageV2, RAGRS | Active — 9+ containers | +| Azure OpenAI | `mywisprai-openai-sweden` | Sweden Central | S0 | Active — gpt-4o-mini deployment | +| Speech Service | `mywisprai-speech` | East US | F0 (Free) | Active | +| Key Vault | `kv-mywisprai` | East US | Standard | Active — ~25 secrets | +| Notification Hubs | `lysnnai` namespace | East US | Free | Active — 2 hubs | +| App Insights | `bytelyst-appinsights` | East US | Classic | Active | + +### 2.2 Cosmos DB Databases & Containers + +| Database | Containers | Products Using | +| ----------- | ------------------------------------------------------------------------------------------------------ | ---------------------------------------- | +| `lysnrai` | ~27 containers (users, subscriptions, feature_flags, audit_log, tracker_items, telemetry_events, etc.) | LysnrAI, platform-service (all products) | +| `mindlyst` | ~20 containers (brains, memory_items, streaks, reflections, etc.) | MindLyst | +| `mywisprai` | 10 containers (legacy, pre-rebrand) | Legacy / migration target | + +**Total: ~57 containers across 3 databases**, all using Cosmos SQL (NoSQL) API with SQL-like queries (`SELECT`, `WHERE`, `ORDER BY`, `OFFSET/LIMIT`, aggregate functions). 
+ +### 2.3 Code Investment by Language + +| Language | Azure SDK Packages | Files Using Azure | Lines of Azure-Specific Code | +| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------ | ----------------- | ---------------------------- | +| **TypeScript** | `@azure/cosmos`, `@azure/storage-blob`, `@azure/identity`, `@azure/keyvault-secrets` | ~65 files | ~500 lines | +| **Python** | `azure-cognitiveservices-speech`, `azure-cosmos`, `azure-storage-blob`, `azure-identity`, `azure-keyvault-secrets`, `openai` (AzureOpenAI) | ~8 files | ~400 lines | +| **Swift** | `MicrosoftCognitiveServicesSpeech` (SPX framework) | ~3 files | ~150 lines | +| **Kotlin** | None directly (uses platform-service REST API) | 0 files | 0 lines | + +--- + +## 3. Dependency Depth Analysis + +### 3.1 Cosmos DB — DEEP (56+ files) + +This is the **most deeply embedded** Azure dependency. Every repository module follows the pattern: + +``` +types.ts → repository.ts → routes.ts + ↑ + Uses @azure/cosmos SDK + SQL queries: SELECT c.id, c.name FROM c WHERE c.productId = @pid +``` + +**Touchpoints:** + +- `packages/cosmos/` — shared client singleton (`@azure/cosmos` peer dep) +- `services/platform-service/src/modules/*/repository.ts` — **56 repository files** with Cosmos SQL queries +- `services/extraction-service/src/modules/*/repository.ts` — 2 repository files +- `dashboards/admin-web/src/lib/cosmos.ts` — direct `@azure/cosmos` import +- `dashboards/admin-web/src/lib/repositories/*.ts` — 4 repository files +- `mindlyst-native/web/src/lib/cosmos.ts` — direct `@azure/cosmos` import +- `learning_voice_ai_agent/src/cloud/cosmos_client.py` — Python Cosmos client +- `learning_voice_ai_agent/backend/src/cloud/cosmos.py` — Python backend Cosmos client + +**Query patterns used:** + +- `container.items.query()` with parameterized SQL +- `container.items.create()`, `.replace()`, `.delete()`, `.read()` +- 
`container.items.upsert()` +- Partition key routing (`/userId`, `/productId`, `/id`) +- Cross-partition queries (admin/analytics) +- `SELECT VALUE COUNT(1)` aggregates +- `OFFSET ... LIMIT` pagination +- `ORDER BY` sorting +- `ARRAY_CONTAINS()` for array queries + +### 3.2 Azure Speech SDK — DEEP (3 files, streaming integration) + +The Speech SDK is used for **real-time streaming speech-to-text** with features that are tightly coupled to the Azure SDK's event-driven architecture: + +- `src/audio/azure_stt.py` — 248 lines. Uses `PushAudioInputStream`, `SpeechRecognizer`, continuous recognition with `recognizing`/`recognized`/`canceled`/`session_stopped` event callbacks, `PhraseListGrammar`, auto-language detection (10 languages), auto-reconnect +- `src/ui/settings.py` + `src/ui/unified_window.py` — connection testing +- `mindlyst-native/iosApp/Services/AzureSpeechTranscriber.swift` — iOS Swift SPX framework +- `mobile_app/ios/LysnrAI/` — iOS keyboard extension uses SPX framework + +### 3.3 Blob Storage — SHALLOW (3 files) + +- `packages/blob/src/blob.ts` — 162 lines, singleton client, SAS URL generation +- `src/cloud/blob_client.py` — 190 lines, Python equivalent +- `services/platform-service/src/modules/blob/` — REST API wrapper + +### 3.4 Azure OpenAI — SHALLOW (3 files, already abstracted) + +- `src/llm/text_cleaner.py` — uses `openai.AzureOpenAI` (OpenAI SDK with Azure endpoint) +- `backend/src/clients/openai_client.py` — uses `openai.AsyncAzureOpenAI` +- `mindlyst-native/web/src/lib/llm.ts` — **already has OpenAI fallback** (resolves provider dynamically) + +The `openai` Python/JS SDK supports both Azure and OpenAI endpoints with minimal config change. MindLyst web already handles this automatically. 
+ +### 3.5 Key Vault — SHALLOW (2 files) + +- `packages/config/src/keyvault.ts` — 90 lines, `resolveKeyVaultSecrets()` with graceful fallback +- `src/secrets/keyvault.py` — 69 lines, `SecretResolver` class with env var fallback + +Both implementations already fall back to environment variables when Key Vault is unavailable. Migration = just stop using Key Vault and use the env var path. + +### 3.6 Notification Hubs — NOT YET INTEGRATED + +Planned but not deeply wired. Only namespace/hub exists in Azure. Mobile apps use `BLPlatformClient` (REST) to talk to platform-service, which would route push notifications. + +### 3.7 Application Insights — SHALLOW (1 file) + +- `opencensus-ext-azure` in Python requirements (optional telemetry) +- Custom telemetry system already built (`@bytelyst/telemetry-client`, platform-service telemetry module with Cosmos storage) + +The custom telemetry system means App Insights is supplementary, not critical. + +--- + +## 4. Migration Target Comparison + +### 4.1 Database: Cosmos DB → Alternatives + +| Feature | Azure Cosmos DB (current) | MongoDB Atlas | AWS DynamoDB | Google Firestore | PostgreSQL (Supabase/Neon) | +| ----------------------- | ----------------------------------- | ------------------------------------ | ---------------------------------- | --------------------- | -------------------------- | +| **Data model** | Document (JSON) | Document (JSON) | Key-Value + Document | Document (JSON) | Relational + JSONB | +| **Query language** | SQL-like | MQL (MongoDB Query) | PartiQL / API | GQL-like API | SQL | +| **Partition keys** | Required | Shard keys (optional) | Required | Collection groups | Not applicable | +| **Serverless** | Yes | Yes (Atlas Serverless) | Yes | Yes | Yes (Neon) | +| **SQL queries** | `SELECT c.id FROM c WHERE c.x = @y` | `db.collection.find({x: y})` | `SELECT id FROM table WHERE x = ?` | Client SDK queries | Standard SQL | +| **Aggregates** | Basic (`COUNT`, `SUM`, `AVG`) | Full (`$group`, `$match`, 
`$lookup`) | Limited | Limited | Full SQL | +| **Cross-partition** | Yes (expensive) | Yes (scatter-gather) | Scan (expensive) | Yes | N/A | +| **Change feed** | Yes | Change Streams | DynamoDB Streams | Real-time listeners | Logical replication | +| **Global distribution** | Built-in multi-region | Atlas Global Clusters | Global Tables | Multi-region | Manual / Citus | +| **Max doc size** | 2 MB | 16 MB | 400 KB | 1 MB | Unlimited (JSONB) | +| **Free tier** | 1000 RU/s + 25 GB | 512 MB | 25 GB + 25 WCU/RCU | 1 GiB + 50K reads/day | 0.5 GB (Neon) | +| **Migration effort** | — | **Medium** (query rewrite) | **Hard** (paradigm shift) | **Hard** (no SQL) | **Hard** (schema design) | + +### 4.2 Object Storage: Blob → Alternatives + +| Feature | Azure Blob (current) | AWS S3 | GCP Cloud Storage | Cloudflare R2 | MinIO (self-hosted) | +| --------------------- | -------------------- | ----------------- | ------------------- | ------------------ | ------------------- | +| **API compatibility** | Azure Blob API | S3 API | GCS API / S3-compat | S3-compatible | S3-compatible | +| **SAS tokens** | Yes (Azure SAS) | Pre-signed URLs | Signed URLs | Pre-signed URLs | Pre-signed URLs | +| **CDN integration** | Azure CDN | CloudFront | Cloud CDN | Built-in | Manual | +| **Cost (per GB)** | $0.018 (Cool) | $0.023 (Standard) | $0.020 | $0.015 (no egress) | Self-hosted | +| **Migration effort** | — | **Easy** | **Easy** | **Easy** | **Easy** | + +### 4.3 Speech-to-Text: Azure Speech → Alternatives + +| Feature | Azure Speech (current) | AWS Transcribe | Google Speech-to-Text | Deepgram | Whisper (local) | +| --------------------- | ----------------------- | --------------------- | ------------------------ | ---------------------- | ----------------------- | +| **Streaming STT** | Yes (push stream) | Yes (WebSocket) | Yes (streaming) | Yes (WebSocket) | No (batch only) | +| **Languages** | 100+ | 100+ | 125+ | 36+ | 99+ | +| **Auto-detect lang** | Up to 10 at-once | Yes | Yes | Yes | 
Yes | +| **Custom vocabulary** | PhraseListGrammar | Custom vocabulary | Speech adaptation | Keywords | No | +| **Native SDK** | Python, Swift (SPX), JS | Python, no iOS SDK | Python, iOS, JS | REST/WebSocket | Python only | +| **iOS native SDK** | SPX framework (ObjC) | No native SDK | Yes (gRPC) | No native SDK | No | +| **Free tier** | 5 hrs/month (F0) | 60 min/month | 60 min/month | None | Free (local GPU) | +| **Latency** | ~200ms | ~300ms | ~200ms | ~100ms | ~500ms+ (local) | +| **Migration effort** | — | **Hard** (no iOS SDK) | **Medium** (has iOS SDK) | **Medium** (REST only) | **Hard** (no streaming) | + +### 4.4 LLM / AI: Azure OpenAI → Alternatives + +| Feature | Azure OpenAI (current) | OpenAI API (direct) | Google Gemini | AWS Bedrock | Anthropic Claude | +| ---------------------- | ------------------------ | ----------------------------- | -------------------- | -------------------- | -------------------- | +| **Models** | GPT-4o, GPT-4o-mini | Same models | Gemini 2.5 | Claude, Llama, Titan | Claude 3.5/4 | +| **API compatibility** | OpenAI SDK (azure mode) | OpenAI SDK (native) | Google SDK | AWS SDK | Anthropic SDK | +| **Data residency** | Azure regions | US only | Google regions | AWS regions | US/EU | +| **Cost (GPT-4o-mini)** | $0.15/$0.60 per M tokens | $0.15/$0.60 per M tokens | ~$0.10/$0.40 (Flash) | Varies | ~$0.25/$1.25 (Haiku) | +| **Migration effort** | — | **Trivial** (change endpoint) | **Easy** (SDK swap) | **Medium** | **Easy** (SDK swap) | + +### 4.5 Secrets Management: Key Vault → Alternatives + +| Feature | Azure Key Vault (current) | AWS Secrets Manager | GCP Secret Manager | HashiCorp Vault | Doppler / Infisical | +| -------------------- | ------------------------- | --------------------------------- | ------------------------------ | --------------- | ------------------- | +| **Cost** | $0.03/10K ops | $0.40/secret/month | $0.06/10K ops | Free (OSS) | Free tier | +| **SDK** | `@azure/keyvault-secrets` | 
`@aws-sdk/client-secrets-manager` | `@google-cloud/secret-manager` | HTTP API | SDK / CLI |
+| **Migration effort** | — | **Easy** | **Easy** | **Medium** | **Easy** |
+
+**Note:** The codebase already falls back to env vars when Key Vault is unavailable. This means Key Vault can be replaced by **any** secrets manager or simply .env files without code changes to application logic.
+
+### 4.6 Push Notifications: Notification Hubs → Alternatives
+
+| Feature | Azure NH (current) | AWS SNS | Firebase Cloud Messaging | OneSignal | Expo Push |
+| --- | --- | --- | --- | --- | --- |
+| **APNs + FCM** | Yes | Yes | FCM only (APNs via FCM) | Yes | Yes |
+| **Free tier** | 1M pushes/month | 1M publishes | Unlimited | 10K subscribers | Unlimited |
+| **Migration effort** | — | **Easy** | **Easy** | **Easy** | **Easy** (NomGap uses Expo) |
+
+---
+
+## 5. Per-Service Migration Analysis
+
+### 5.1 Cosmos DB → MongoDB Atlas
+
+**Difficulty: MEDIUM-HIGH** | **Effort: 3–5 weeks** | **Risk: MEDIUM**
+
+This is the **single largest migration task**. Here's why:
+
+#### What needs to change
+
+| Layer | Current (Cosmos SQL API) | Target (MongoDB) | Files |
+| --- | --- | --- | --- |
+| Client package | `@azure/cosmos` → `CosmosClient` | `mongodb` → `MongoClient` | `packages/cosmos/src/client.ts` |
+| Container registry | `getContainer(name)` | `db.collection(name)` | `packages/cosmos/src/containers.ts` |
+| All repository files | `container.items.query('SELECT...')` | `collection.find({...})` | **56+ files** in platform-service |
+| Dashboard Cosmos clients | `@azure/cosmos` direct | `mongodb` direct | 2 files (admin, MindLyst) |
+| Python clients | `azure.cosmos.CosmosClient` | `pymongo.MongoClient` | 2 files |
+| Query syntax | SQL-like (`SELECT * FROM c WHERE c.productId = @pid AND c.userId = @uid ORDER BY c.createdAt DESC OFFSET 0 LIMIT 20`) | MQL (`collection.find({productId: pid, userId: uid}).sort({createdAt: -1}).skip(0).limit(20)`) | All repository files |
+| Partition keys | Explicit partition key in every query | Shard key (auto-routed) | All repository files |
+| Upsert | `container.items.upsert(doc)` | `collection.replaceOne({_id: id}, doc, {upsert: true})` | ~20 files |
+| Read by ID | `container.item(id, partitionKey).read()` | `collection.findOne({_id: id})` | All repository files |
+
+#### What stays the same
+
+- Document structure (JSON documents with `id`, `productId`, partition keys)
+- Data model (no schema changes needed — MongoDB is also schemaless)
+- Partition key concept maps to shard key
+- Serverless pricing model available on both
+
+#### Key migration steps
+
+1. Update the `@bytelyst/cosmos` package to export a MongoDB-compatible API
+2. Rewrite all SQL queries to MQL (56+ files)
+3. Replace `container.items.query()` → `collection.find()`
+4. Replace `container.item(id, pk).read()` → `collection.findOne({_id: id})`
+5. Replace `container.items.create()` → `collection.insertOne()`
+6. Replace `container.items.replace()` → `collection.replaceOne()`
+7. Replace `container.items.upsert()` → `collection.replaceOne(filter, doc, {upsert: true})` (a full-document replace, matching Cosmos upsert semantics)
+8. Update Python clients similarly
+9. Migrate data (use Azure Data Factory or a custom script), mapping each Cosmos `id` to MongoDB `_id` so the `_id` lookups above resolve
+10. Update all test mocks
+
+#### Why MongoDB Atlas is the best DB alternative
+
+- **Closest query model** to Cosmos SQL API (both are document DBs)
+- **Cosmos DB also offers an API for MongoDB** (wire-protocol compatible), but going native on Atlas is better than switching Cosmos APIs
+- MongoDB's `find()` queries map closely to Cosmos SQL `SELECT` queries
+- Both support partition/shard keys, TTL-based expiry, and change notifications (Cosmos change feed, MongoDB change streams)
+- MongoDB Atlas Serverless pricing is competitive
+- MongoDB has excellent TypeScript and Python SDKs
+
+### 5.2 Azure Speech → Google Cloud Speech-to-Text
+
+**Difficulty: HIGH** | **Effort: 2–3 weeks** | **Risk: HIGH**
+
+#### Why this is hard
+
+- The Azure Speech SDK uses a **push-stream architecture** (`PushAudioInputStream`) that is deeply integrated into the audio pipeline
+- The `SpeechRecognizer` has event-driven callbacks (`recognizing`, `recognized`, `canceled`, `session_stopped`) that the code relies on for real-time partial/final transcript delivery
+- Custom vocabulary via `PhraseListGrammar` is Azure-specific
+- Auto-language detection config is Azure-specific
+- The **iOS SPX framework** (Objective-C) is used in the LysnrAI keyboard extension and MindLyst — there's no direct equivalent for most alternatives
+
+#### Best alternative: Google Cloud Speech-to-Text
+
+- Has streaming recognition that returns interim and final results, comparable to `recognizing`/`recognized`
+- Reachable from iOS through its gRPC API (Google publishes gRPC sample code rather than a dedicated iOS SDK)
+- Supports custom vocabulary (speech adaptation)
+- Supports auto-language detection
+- Comparable per-minute pricing, though its free tier (60 minutes/month) is smaller than Azure F0's 5 hours/month
+
+#### What needs to change
+
+- `src/audio/azure_stt.py` — complete rewrite (~248 lines)
+- `iosApp/Services/AzureSpeechTranscriber.swift` — complete rewrite
+- `LysnrAI/LysnrKeyboard/` — keyboard extension STT integration
+- Audio format handling (may differ between providers)
+- Connection test code in settings UI
+
+### 5.3 Blob Storage → AWS S3 or Cloudflare R2
+
+**Difficulty: LOW** | **Effort: 2–3 days** | **Risk: LOW**
+
+#### Why this is easy
+
+- `@bytelyst/blob` package is a thin wrapper (162 lines)
+- Only 3 files need changes
+- The S3 API is the de facto standard — R2 and MinIO are S3-compatible, and GCS offers an S3-interoperable XML API
+- SAS tokens → Pre-signed URLs (same concept, different implementation)
+
+#### What needs to change
+
+- `packages/blob/src/blob.ts` — swap `@azure/storage-blob` → `@aws-sdk/client-s3` + `@aws-sdk/s3-request-presigner`
+- `src/cloud/blob_client.py` — swap `azure.storage.blob` → `boto3`
+- `services/platform-service/src/modules/blob/` — update routes for pre-signed URL format
+- Environment variables: `AZURE_BLOB_*` → `AWS_S3_*` or `S3_*`
+
+### 5.4 Azure OpenAI → OpenAI API (direct) or Gemini
+
+**Difficulty: TRIVIAL** | **Effort: < 1 day** | **Risk: VERY LOW**
+
+#### Why this is trivial
+
+- The `openai` Python SDK supports both Azure and OpenAI endpoints — just change config
+- MindLyst web `llm.ts` **already auto-detects** Azure vs OpenAI and builds the correct URL
+- LysnrAI desktop uses the `AzureOpenAI` class from the `openai` SDK — switch to the `OpenAI` class
+- Same model families and API shape, with broadly similar per-token pricing
+
+#### What needs to change
+
+- Set `OPENAI_API_KEY` instead of `AZURE_OPENAI_*` env vars
+- Change `AzureOpenAI(azure_endpoint=..., api_key=..., api_version=...)` → `OpenAI(api_key=...)`, and pass the OpenAI model name (e.g. `gpt-4o-mini`) as `model` wherever the Azure deployment name is passed today
+- Change `AsyncAzureOpenAI(...)` → `AsyncOpenAI(...)`
+- Remove the `api_version` parameter
+- That's it. The `openai` SDK handles the rest.
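
MindLyst's `llm.ts` is described above as auto-detecting Azure vs OpenAI, but its code is not reproduced in this analysis. The following is therefore only a hypothetical TypeScript sketch of that kind of detection, not the actual implementation. It uses Azure OpenAI when Azure credentials are fully configured and otherwise falls back to the public OpenAI API. Several details are assumptions: the env var names `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_DEPLOYMENT`, `AZURE_OPENAI_API_VERSION` and `OPENAI_MODEL`, the Azure-first precedence, and the default API version. The URL and auth-header shapes follow the two services' REST conventions: Azure puts the deployment name in the path and uses an `api-key` header, while OpenAI takes a `model` field in the body and a bearer token.

```typescript
// Hypothetical sketch of provider auto-detection; env var names and defaults are assumptions.
type Env = Record<string, string | undefined>;

export interface LlmRequestConfig {
  provider: "azure" | "openai";
  url: string;
  headers: Record<string, string>;
  // OpenAI selects the model from the request body; Azure encodes the deployment in the URL.
  model?: string;
}

const DEFAULT_AZURE_API_VERSION = "2024-06-01"; // assumed; pin to the version the app already uses
const DEFAULT_OPENAI_MODEL = "gpt-4o-mini"; // the model listed in the cost table

export function resolveLlmRequest(env: Env): LlmRequestConfig {
  const endpoint = env.AZURE_OPENAI_ENDPOINT;
  const azureKey = env.AZURE_OPENAI_API_KEY;
  const deployment = env.AZURE_OPENAI_DEPLOYMENT;

  // Prefer Azure only when it is fully configured (assumed precedence).
  if (endpoint && azureKey && deployment) {
    const base = endpoint.replace(/\/+$/, ""); // tolerate a trailing slash
    const apiVersion = env.AZURE_OPENAI_API_VERSION ?? DEFAULT_AZURE_API_VERSION;
    return {
      provider: "azure",
      url: `${base}/openai/deployments/${encodeURIComponent(deployment)}/chat/completions?api-version=${apiVersion}`,
      headers: { "Content-Type": "application/json", "api-key": azureKey },
    };
  }

  const openaiKey = env.OPENAI_API_KEY;
  if (openaiKey) {
    return {
      provider: "openai",
      url: "https://api.openai.com/v1/chat/completions",
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${openaiKey}` },
      model: env.OPENAI_MODEL ?? DEFAULT_OPENAI_MODEL,
    };
  }

  throw new Error("No LLM credentials: set the AZURE_OPENAI_* variables or OPENAI_API_KEY");
}
```

If only `OPENAI_API_KEY` is set, the same call sites switch to the public API. In that case the request body must carry `model`, because the OpenAI URL has no deployment segment.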
+
+### 5.5 Key Vault → Environment Variables / Any Secrets Manager
+
+**Difficulty: TRIVIAL** | **Effort: < 1 day** | **Risk: VERY LOW**
+
+Both `keyvault.ts` and `keyvault.py` already implement graceful fallback:
+
+- If `AZURE_KEYVAULT_URL` is not set → uses env vars directly
+- If Key Vault is unreachable → falls back to env vars
+
+**To migrate:** Simply stop setting `AZURE_KEYVAULT_URL`. Every secret is then read from env vars, so each one must be set there. Then optionally adopt any other secrets manager (AWS Secrets Manager, Doppler, Infisical, etc.).
+
+### 5.6 Notification Hubs → Firebase Cloud Messaging
+
+**Difficulty: LOW** | **Effort: 1–2 days** | **Risk: LOW**
+
+Not yet deeply integrated. The platform-service notification module sends via REST API. Swap the push provider client.
+
+### 5.7 Application Insights → Self-hosted / Grafana
+
+**Difficulty: TRIVIAL** | **Effort: Already done** | **Risk: NONE**
+
+The ecosystem already has:
+
+- Custom telemetry system (`@bytelyst/telemetry-client` → platform-service → Cosmos)
+- Loki + Grafana in `services/monitoring/`
+- App Insights is supplementary and can be dropped with zero code changes
+
+---
+
+## 6. Migration Scenario Scoring
+
+### Scenario A: Stay on Azure (Status Quo)
+
+| Dimension | Score (1-5) | Notes |
+| --- | --- | --- |
+| Migration effort | **5** (none) | No work needed |
+| Cost | **4** | ~$15/month at current scale, competitive |
+| Vendor diversity | **1** | Single cloud vendor |
+| Feature parity | **5** | Everything works today |
+| **Total** | **15/20** | |
+
+### Scenario B: Full Migration to AWS
+
+| Dimension | Score (1-5) | Notes |
+| --- | --- | --- |
+| Migration effort | **2** | 6–8 weeks, Cosmos→DynamoDB is painful |
+| Cost | **3** | Roughly 2× current spend at small scale |
+| Vendor diversity | **1** | Still single cloud, just different |
+| Feature parity | **3** | No native iOS Speech SDK, DynamoDB query model is very different |
+| **Total** | **9/20** | |
+
+### Scenario C: Multi-Cloud (MongoDB Atlas + OpenAI + R2 + Google STT)
+
+| Dimension | Score (1-5) | Notes |
+| --- | --- | --- |
+| Migration effort | **2** | 5–7 weeks, Cosmos→MongoDB is medium |
+| Cost | **4** | MongoDB Atlas free tier, R2 no egress fees |
+| Vendor diversity | **5** | No single-vendor dependency |
+| Feature parity | **4** | MongoDB is a better document DB than Cosmos in many ways |
+| **Total** | **15/20** | |
+
+### Scenario D: Stay Azure + Add Abstraction Layers
+
+| Dimension | Score (1-5) | Notes |
+| --- | --- | --- |
+| Migration effort | **4** | 1–2 weeks to add repository interface pattern |
+| Cost | **4** | No change |
+| Vendor diversity | **3** | Ready to switch, but still on Azure |
+| Feature parity | **5** | Everything works today |
+| **Total** | **16/20** | **Winner** |
+
+### Scenario E: Migrate DB Only (Cosmos → MongoDB Atlas, keep rest on Azure)
+
+| Dimension | Score (1-5) | Notes |
+| --- | --- | --- |
+| Migration effort | **3** | 3–5 weeks for DB migration |
+| Cost | **4** | MongoDB Atlas Serverless may be cheaper |
+| Vendor diversity | **3** | DB is independent, other services still Azure |
+| Feature parity | **5** | MongoDB is very capable |
+| **Total** | **15/20** | |
+
+---
+
+## 7. Cost Comparison
+
+### Current Azure Costs (MVP / Low Usage)
+
+| Service | Monthly Cost | Notes |
+| --- | --- | --- |
+| Cosmos DB (Serverless) | ~$4–10 | 3 databases, ~45 containers |
+| Blob Storage (Cool, RAGRS) | ~$0.20 | 9+ containers |
+| Azure OpenAI (GPT-4o-mini) | ~$5–10 | Pay per token |
+| Speech (F0) | $0 | 5 hrs/month free |
+| Key Vault | ~$0.06 | ~25 secrets |
+| Notification Hubs (Free) | $0 | 1M pushes/month |
+| App Insights | $0 | 5 GB/month free |
+| **Total** | **~$10–20/month** | |
+
+### Equivalent AWS Costs
+
+| Service | AWS Equivalent | Monthly Cost |
+| --- | --- | --- |
+| Cosmos DB → DynamoDB (On-Demand) | DynamoDB | ~$5–15 |
+| Blob → S3 Standard | S3 | ~$0.25 |
+| Azure OpenAI → OpenAI API | OpenAI direct (same pricing) | ~$5–10 |
+| Speech → Transcribe | Transcribe | ~$1–3 |
+| Key Vault → Secrets Manager | Secrets Manager | ~$10 (~25 secrets × $0.40/secret/month) |
+| Notification Hubs → SNS | SNS | ~$0.50 |
+| App Insights → CloudWatch | CloudWatch | ~$3 |
+| **Total** | | **~$25–42/month** |
+
+### Equivalent Multi-Cloud Costs
+
+| Service | Provider | Monthly Cost |
+| --- | --- | --- |
+| Cosmos DB → MongoDB Atlas Serverless | MongoDB | ~$3–8 |
+| Blob → Cloudflare R2 | Cloudflare | ~$0.15 (no egress) |
+| Azure OpenAI → OpenAI API (direct) | OpenAI | ~$5–10 |
+| Speech → Google STT | Google Cloud | ~$1–3 |
+| Key Vault → Doppler (free tier) | Doppler | $0 |
+| Push → Firebase FCM | Google | $0 |
+| Monitoring → Grafana Cloud (free) | Grafana | $0 |
+| **Total** | | **~$10–22/month** |
+
+### Cost Summary
+
+| Scenario | Monthly Cost | vs Current |
+| --- | --- | --- |
+| **Azure (current)** | ~$10–20 | Baseline |
+| **Full AWS** | ~$25–42 | ~+110–150% |
+| **Multi-cloud** | ~$10–22 | ~Same |
+| **MongoDB Atlas + Azure rest** | ~$10–18 | ~Same |
+
+**Verdict:** At current scale, cost is not a compelling reason to migrate. All options are under $50/month. Cost becomes more significant at scale (10K+ users), where R2 (no egress fees) and MongoDB Atlas serverless pricing would likely come out cheaper.
+
+---
+
+## 8. Abstraction Layer Assessment
+
+### Current State: Partially Abstracted
+
+The codebase already has meaningful abstraction:
+
+| Layer | Abstraction Level | Notes |
+| --- | --- | --- |
+| **Cosmos DB** | **Partial** — `@bytelyst/cosmos` package | Application code still writes raw SQL queries and uses `@azure/cosmos` types |
+| **Blob Storage** | **Good** — `@bytelyst/blob` package | Thin wrapper, easy to swap internals |
+| **OpenAI/LLM** | **Good** — MindLyst has provider auto-detection | LysnrAI desktop/backend hardcodes `AzureOpenAI` |
+| **Key Vault** | **Excellent** — graceful fallback to env vars | Already cloud-agnostic in practice |
+| **Speech** | **None** — raw SDK usage | Deep Azure SDK coupling in 3 files |
+| **Auth (JWT)** | **Excellent** — uses `jose` library | No cloud dependency |
+| **Push notifications** | **Good** — platform-service abstraction | Swap provider client only |
+
+### What's Missing: Repository Interface Pattern
+
+The biggest gap is that repository files directly use `@azure/cosmos` types and SQL query syntax. To make the DB layer swappable, you'd need:
+
+```typescript
+// Proposed: packages/cosmos/src/repository.ts
+export interface QueryOptions { limit?: number; offset?: number; orderBy?: string } // illustrative fields
+
+export interface DocumentRepository<T> {
+  findById(id: string, partitionKey: string): Promise<T | null>;
+  findMany(filter: Record<string, unknown>, opts?: QueryOptions): Promise<T[]>;
+  create(doc: T): Promise<T>;
+  replace(id: string, doc: T, partitionKey: string): Promise<T>;
+  upsert(doc: T): Promise<T>;
+  delete(id: string, partitionKey: string): Promise<void>;
+  count(filter: Record<string, unknown>): Promise<number>;
+}
+```
+
+Once the repository files call this interface instead of `@azure/cosmos`, a later swap (Cosmos → MongoDB → PostgreSQL) only needs a new implementation behind it, not further edits to the 56+ repository files.
+
+**Effort to add:** 1–2 weeks. This is the **highest-ROI investment** regardless of migration decision.
+
+---
+
+## 9. Risk Analysis
+
+### 9.1 Risks of Staying on Azure
+
+| Risk | Likelihood | Impact | Mitigation |
+| --- | --- | --- | --- |
+| Azure pricing increases | Low | Medium | Add abstraction layer for future portability |
+| Azure outage | Low | High | Multi-region already possible (Cosmos global distribution) |
+| Feature stagnation | Very Low | Low | Azure is investing heavily in AI services |
+| Vendor lock-in deepens over time | Medium | Medium | Add abstraction layers proactively |
+
+### 9.2 Risks of Migrating
+
+| Risk | Likelihood | Impact | Mitigation |
+| --- | --- | --- | --- |
+| Data loss during migration | Low | Critical | Test migration on staging first, keep Azure as backup |
+| Query performance differences | Medium | Medium | Benchmark before committing |
+| Feature gaps in new provider | Medium | Medium | Prototype critical features first |
+| Wasted engineering time | Medium | High | Only migrate if there's a clear business driver |
+| Regression bugs in 56+ repository files | High | Medium | Comprehensive test suite (1,029 tests) catches most issues |
+| Speech quality degradation | Medium | High | A/B test both providers before committing |
+
+### 9.3 Azure-Specific Lock-in Risks (ranked)
+
+| # | Component | Lock-in Level | Escape Hatch |
+| --- | --- | --- | --- |
+| 1 | **Cosmos DB SQL API** | High | Rewrite queries to MongoDB MQL or add repository interface |
+| 2 | **Azure Speech SDK (streaming)** | High | Google STT has comparable streaming API |
+| 3 | **Azure Identity (DefaultAzureCredential)** | Medium | Only used by Key Vault, which is already optional |
+| 4 | **Blob Storage SAS tokens** | Low | Pre-signed URLs are equivalent across all providers |
+| 5 | **Azure OpenAI** | Very Low | OpenAI SDK works with both — a small config and client-class change |
+| 6 | **Key Vault** | Very Low | Already has env var fallback |
+| 7 | **Notification Hubs** | Very Low | Not deeply integrated yet |
+| 8 | **Application Insights** | None | Custom telemetry already built |
+
+---
+
+## 10. Recommendations
+
+### Recommended Strategy: **Stay on Azure + Invest in Abstraction** (Scenario D)
+
+This is the highest-scoring approach. Here's the prioritized action plan:
+
+#### Phase 1: Add Repository Interface (1–2 weeks)
+
+- Create `DocumentRepository` interface in `@bytelyst/cosmos`
+- Implement `CosmosDocumentRepository` that wraps current `@azure/cosmos` calls
+- Gradually migrate the 56 repository files to use the interface
+- This makes future DB migration a matter of implementing `MongoDocumentRepository` — no application code changes needed
+
+#### Phase 2: Normalize LLM Abstraction (2–3 days)
+
+- Move LysnrAI desktop/backend from `AzureOpenAI` → auto-detecting provider pattern (like MindLyst web already does)
+- Support `OPENAI_PROVIDER=azure|openai|gemini` across all repos
+- This makes the LLM provider swappable via config
+
+#### Phase 3: Speech Abstraction Layer (1 week, optional)
+
+- Create `SpeechTranscriber` protocol/interface
+- Implement `AzureSpeechTranscriber` (current code, extracted)
+- Prepare `GoogleSpeechTranscriber` stub for future use
+- This is lower priority since the Azure Speech F0 tier is free
+
+#### Phase 4: Document Decision Criteria for Future Migration
+
+- Define triggers that would justify migration (e.g., cost > $X/month, Azure outage > Y hours, need for feature Z)
+- Review annually
+
+### Why NOT Migrate Now
+
+1. **Cost is negligible** — ~$10–20/month doesn't justify weeks of engineering
+2. **No business driver** — Azure isn't blocking any feature development
+3. **Risk/reward is unfavorable** — 4–8 weeks of migration work for ~$0 cost savings
+4. **Test coverage is good but not perfect** — 1,029 tests cover most paths, but query-level changes in 56 files still risk regressions
+5. **Azure free tiers are generous** — Speech F0, Notification Hubs Free, App Insights free tier
+
+### When Migration WOULD Make Sense
+
+- **Cosmos DB costs exceed $100/month** → Consider MongoDB Atlas Serverless
+- **Azure Speech quality is insufficient** → Evaluate Google STT or Deepgram
+- **Enterprise customer requires specific cloud** → Build the repository interface, then implement their cloud backend
+- **Azure has extended outage affecting your region** → Multi-region or multi-cloud
+- **You want to go fully open-source** → PostgreSQL (Supabase) + Whisper + MinIO (significant rewrite)
+
+---
+
+## 11. Migration Playbook (If Chosen)
+
+If you decide to migrate in the future, here's the execution order (shortest critical path):
+
+### Week 1–2: Database Abstraction
+
+1. Create `DocumentRepository` interface
+2. Implement `CosmosDocumentRepository` (wraps current code)
+3. Migrate all 56 repository files to use interface
+4. Verify all 1,029 tests pass
+
+### Week 3–4: Database Migration (Cosmos → MongoDB)
+
+1. Implement `MongoDocumentRepository`
+2. Set up MongoDB Atlas Serverless cluster
+3. Write data migration script (Cosmos → MongoDB)
+4. Run migration on staging, verify data integrity
+5. Switch repository implementation via config flag
+6. Run full test suite against MongoDB
+
+### Week 5: Storage + Secrets
+
+1. Swap `@bytelyst/blob` internals to S3-compatible client
+2. Migrate blobs (e.g. download with `azcopy` and upload with `aws s3 sync`, or copy directly with a tool such as `rclone`)
+3. Replace Key Vault with new secrets manager (or just env vars)
+4. Update all environment variable names
+
+### Week 6: LLM + Speech (if needed)
+
+1. Switch OpenAI from Azure endpoint to direct (config, client class, and model names; no logic changes)
+2. If migrating Speech: rewrite `azure_stt.py` and Swift `AzureSpeechTranscriber`
+3. A/B test new speech provider against Azure
+
+### Week 7–8: Cleanup + Verification
+
+1. Remove all `@azure/*` npm packages
+2. Remove all `azure-*` pip packages
+3. Update Docker configs, CI/CD
+4. Update documentation
+5. Monitor production for 2 weeks
+
+---
+
+## Appendix A: File-Level Azure Dependency Map
+
+### TypeScript — `@azure/cosmos` (CRITICAL)
+
+| File | Repo | Direct Import |
+| --- | --- | --- |
+| `packages/cosmos/src/client.ts` | common-plat | `@azure/cosmos` |
+| `packages/cosmos/src/containers.ts` | common-plat | `@azure/cosmos` |
+| `services/platform-service/src/modules/*/repository.ts` (56 files) | common-plat | Via `@bytelyst/cosmos` |
+| `services/extraction-service/src/modules/*/repository.ts` (2 files) | common-plat | Via `@bytelyst/cosmos` |
+| `dashboards/admin-web/src/lib/cosmos.ts` | common-plat | `@azure/cosmos` |
+| `dashboards/admin-web/src/lib/repositories/*.ts` (4 files) | common-plat | Via cosmos.ts |
+| `mindlyst-native/web/src/lib/cosmos.ts` | MindLyst | `@azure/cosmos` |
+
+### TypeScript — `@azure/storage-blob`
+
+| File | Repo | Direct Import |
+| --- | --- | --- |
+| `packages/blob/src/blob.ts` | common-plat | `@azure/storage-blob` |
+
+### TypeScript — `@azure/identity` + `@azure/keyvault-secrets`
+
+| File | Repo | Direct Import |
+| --- | --- | --- |
+| `packages/config/src/keyvault.ts` | common-plat | Dynamic import (both) |
+| `dashboards/admin-web/src/app/api/ops/secrets/route.ts` | common-plat | Both (Secrets Manager UI) |
+
+### Python — Azure SDKs
+
+| File | Repo | SDK |
+| --- | --- | --- |
+| `src/audio/azure_stt.py` | LysnrAI | `azure.cognitiveservices.speech` |
+| `src/cloud/cosmos_client.py` | LysnrAI | `azure.cosmos` |
+| `src/cloud/blob_client.py` | LysnrAI | `azure.storage.blob` |
+| `src/secrets/keyvault.py` | LysnrAI | `azure.identity`, `azure.keyvault.secrets` |
+| `backend/src/secrets/keyvault.py` | LysnrAI | `azure.identity`, `azure.keyvault.secrets` |
+| `backend/src/cloud/cosmos.py` | LysnrAI | `azure.cosmos` |
+| `src/llm/text_cleaner.py` | LysnrAI | `openai.AzureOpenAI` |
+| `backend/src/clients/openai_client.py` | LysnrAI | `openai.AsyncAzureOpenAI` |
+
+### Swift — Azure Speech SDK
+
+| File | Repo | SDK |
+| --- | --- | --- |
+| `iosApp/Services/AzureSpeechTranscriber.swift` | MindLyst | `MicrosoftCognitiveServicesSpeech` |
+| `LysnrAI/LysnrKeyboard/KeyboardViewController.swift` | LysnrAI | SPX framework (via CocoaPods) |
+
+---
+
+## Appendix B: SDK & Package Inventory
+
+### npm packages (TypeScript)
+
+| Package | Version | Used By | Swappable |
+| --- | --- | --- | --- |
+| `@azure/cosmos` | ≥4.0.0 | `@bytelyst/cosmos`, admin-web, MindLyst web | Medium (query rewrite) |
+| `@azure/storage-blob` | ≥12.0.0 | `@bytelyst/blob` | Easy (S3 compat) |
+| `@azure/identity` | latest | `@bytelyst/config`, admin-web secrets | Easy (remove) |
+| `@azure/keyvault-secrets` | latest | `@bytelyst/config`, admin-web secrets | Easy (remove) |
+
+### pip packages (Python)
+
+| Package | Version | Used By | Swappable |
+| --- | --- | --- | --- |
+| `azure-cognitiveservices-speech` | ≥1.42.0 | Desktop STT | Hard (deep SDK integration) |
+| `azure-cosmos` | latest | Desktop + backend Cosmos client | Medium (pymongo swap) |
+| `azure-storage-blob` | ≥12.24.0 | Desktop blob client | Easy (boto3 swap) |
+| `azure-identity` | ≥1.19.0 | Key Vault auth | Easy (remove) |
+| `azure-keyvault-secrets` | ≥4.9.0 | Secrets resolver | Easy (remove) |
+| `openai` | ≥1.60.0 | `AzureOpenAI` / `AsyncAzureOpenAI` | Trivial (change class name) |
+| `opencensus-ext-azure` | ≥1.1.0 | Optional telemetry | Trivial (remove) |
+
+### Swift packages / CocoaPods
+
+| Package | Used By | Swappable |
+| --- | --- | --- |
+| `MicrosoftCognitiveServicesSpeech` (SPX) | LysnrAI iOS, MindLyst iOS | Hard (need alternative streaming STT) |
+
+---
+
+_Document generated by automated codebase analysis. Numbers are accurate as of 2026-03-01. Update as the codebase evolves._
diff --git a/docs/COMMON_PLATFORM_ANALYSIS.md b/docs/architecture/COMMON_PLATFORM_ANALYSIS.md
similarity index 100%
rename from docs/COMMON_PLATFORM_ANALYSIS.md
rename to docs/architecture/COMMON_PLATFORM_ANALYSIS.md
diff --git a/docs/ECOSYSTEM_ARCHITECTURE.md b/docs/architecture/ECOSYSTEM_ARCHITECTURE.md
similarity index 100%
rename from docs/ECOSYSTEM_ARCHITECTURE.md
rename to docs/architecture/ECOSYSTEM_ARCHITECTURE.md
diff --git a/docs/ecosystem-after-refactor.drawio b/docs/architecture/ecosystem-after-refactor.drawio
similarity index 100%
rename from docs/ecosystem-after-refactor.drawio
rename to docs/architecture/ecosystem-after-refactor.drawio
diff --git a/docs/WINDSURF/AI_SECURITY_AUDIT_REPORT.md b/docs/audits/AI_SECURITY_AUDIT_REPORT.md
similarity index 100%
rename from docs/WINDSURF/AI_SECURITY_AUDIT_REPORT.md
rename to docs/audits/AI_SECURITY_AUDIT_REPORT.md
diff --git a/docs/WINDSURF/AUTH_CROSS_PRODUCT_ANALYSIS.md b/docs/audits/AUTH_CROSS_PRODUCT_ANALYSIS.md
similarity index 100%
rename from docs/WINDSURF/AUTH_CROSS_PRODUCT_ANALYSIS.md
rename to docs/audits/AUTH_CROSS_PRODUCT_ANALYSIS.md
diff --git a/docs/audits/AZURE_CONNECTION_AUDIT.md b/docs/audits/AZURE_CONNECTION_AUDIT.md
new file mode 100644
index 00000000..344e3a15
--- /dev/null
+++ b/docs/audits/AZURE_CONNECTION_AUDIT.md
@@ -0,0 +1,186 @@
+# Azure Connection Audit — Full Workspace Report
+
+> **Date:** 2026-02-22
+> **Scope:** `learning_ai_common_plat`, `learning_voice_ai_agent`, `learning_multimodal_memory_agents`, `learning_ai_clock`, `learning_ai_fastgap`
+> **Auditor:** Cascade (AI)
+
+---
+
+
+## Executive Summary
+
+| Category | Issues Found | Fixed (session 1) | Fixed (session 2) | Remaining |
+| --- | --- | --- | --- | --- |
+| `x-request-id` missing | 12 clients | 2 (MindLyst) | **9** (root cause + feature-flags) | 0 ✅ |
+| `x-product-id` missing | 6 clients | 0 | **6** (admin + user dashboards + Python) | 0 ✅ |
+| Cosmos PK mismatch | 1 container | 0 (flagged) | 0 | 1 (needs migration) |
+| `.env.example` gaps | 4 files | 1 (MindLyst) | **3** (ChronoMind, user-dash, admin-dash) | 0 ✅ |
+| Hardcoded productId | 2 instances | 0 | **2** (telemetry.ts, platform_client.py) | 0 ✅ |
+| Python client gaps | 1 file | 0 | **1** (headers + config) | 0 ✅ |
+
+---
+
+## 1. `x-request-id` Header — Root Cause
+
+### Finding
+
+**`@bytelyst/api-client` does NOT auto-inject `x-request-id`.**
+
+The `createApiClient()` factory in `packages/api-client/src/client.ts` only sets `Content-Type`, the auth token (via `getToken`), and caller-supplied `defaultHeaders`. No `x-request-id` is generated. This means **every consumer** that relies on `@bytelyst/api-client` without explicitly adding the header is missing request tracing.
+
+### Root Cause Fix
+
+Add `x-request-id: crypto.randomUUID()` to `buildHeaders()` in `packages/api-client/src/client.ts`. This single change propagates to every `createApiClient()` consumer automatically; clients that call `fetch` directly (such as `feature-flags.ts`) still need the header added separately.
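
The audit does not show the body of `packages/api-client/src/client.ts`, so what follows is a minimal sketch of how the proposed `buildHeaders()` change could look. It is not the package's actual code. The `HeaderOptions` shape is an assumption: a token already resolved from `getToken`, the caller's `defaultHeaders`, and an optional per-call `requestId` for propagating an inbound id. The Bearer scheme is also an assumption. The sketch mints a fresh id on every call and deliberately ignores any `x-request-id` placed in `defaultHeaders`, because a value fixed once at client creation would be reused across requests. It relies on the global `crypto.randomUUID()` (modern browsers and recent Node.js).

```typescript
// Hypothetical sketch; the real buildHeaders() in @bytelyst/api-client may have a different signature.
export interface HeaderOptions {
  token?: string | null; // value resolved from getToken(), if any
  defaultHeaders?: Record<string, string>; // caller-supplied, e.g. { "x-product-id": "lysnrai" }
  requestId?: string; // optional per-call id, e.g. propagated from an inbound request
}

export function buildHeaders(opts: HeaderOptions = {}): Record<string, string> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };

  // Copy caller defaults, but skip any static request id (any casing) so it is never reused.
  for (const [name, value] of Object.entries(opts.defaultHeaders ?? {})) {
    if (name.toLowerCase() !== "x-request-id") {
      headers[name] = value;
    }
  }

  // One id per call: propagate the caller's id if given, otherwise mint a UUID (Web Crypto global).
  headers["x-request-id"] = opts.requestId ?? crypto.randomUUID();

  if (opts.token) {
    headers.Authorization = `Bearer ${opts.token}`; // Bearer scheme is an assumption
  }
  return headers;
}
```

A client that needs a specific id, such as a server route forwarding the id it received, can pass it as `requestId`. Every other call gets its own UUID without any consumer changes.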
+
+### Affected Clients (missing `x-request-id`)
+
+| Repo | File | Client Pattern |
+| --- | --- | --- |
+| `common_plat` | `dashboards/admin-web/src/lib/billing-client.ts` | `createApiClient` — no `x-request-id` |
+| `common_plat` | `dashboards/admin-web/src/lib/growth-client.ts` | `createApiClient` — no `x-request-id` |
+| `common_plat` | `dashboards/admin-web/src/lib/platform-client.ts` | `createApiClient` — no `x-request-id` |
+| `common_plat` | `dashboards/tracker-web/src/lib/tracker-client.ts` | `createApiClient` — no `x-request-id` |
+| `common_plat` | `packages/extraction/src/client.ts` | `createApiClient` — no `x-request-id` |
+| `voice_ai_agent` | `user-dashboard-web/src/lib/billing-client.ts` | `createApiClient` — no `x-request-id` |
+| `voice_ai_agent` | `user-dashboard-web/src/lib/growth-client.ts` | `createApiClient` — no `x-request-id` |
+| `voice_ai_agent` | `user-dashboard-web/src/lib/platform-client.ts` | `createApiClient` — no `x-request-id` |
+| `voice_ai_agent` | `user-dashboard-web/src/lib/feature-flags.ts` | Custom `fetch` — no `x-request-id` |
+| `voice_ai_agent` | `backend/src/clients/platform_client.py` | `httpx` — no `x-request-id` |
+
+### Already Fixed (previous session)
+
+| Repo | File | Status |
+| --- | --- | --- |
+| `multimodal_memory` | `web/src/lib/billing-client.ts` | ✅ Added via `defaultHeaders` |
+| `multimodal_memory` | `web/src/lib/feature-flags.ts` | ✅ Added manually |
+
+### Already Correct
+
+| Repo | File | Status |
+| --- | --- | --- |
+| `ai_fastgap` (NomGap) | `src/api/client.ts` | ✅ Custom client with `crypto.randomUUID()` |
+| `ai_clock` (ChronoMind) | `web/src/lib/platform-sync.ts` | ✅ Custom client with `crypto.randomUUID()` |
+| `voice_ai_agent` | `backend/src/main.py` | ✅ Middleware propagates/generates |
+| `voice_ai_agent` | `backend/src/clients/extraction_client.py` | ✅ Passes `request_id` param |
+
+---
+
+## 2. `x-product-id` Header Gaps
+
+### Clients Missing `x-product-id`
+
+| Repo | File | Impact |
+| --- | --- | --- |
+| `common_plat` | `admin-web/src/lib/billing-client.ts` | Server can't filter by product |
+| `common_plat` | `admin-web/src/lib/growth-client.ts` | Server can't filter by product |
+| `voice_ai_agent` | `user-dashboard-web/src/lib/billing-client.ts` | Server can't filter by product |
+| `voice_ai_agent` | `user-dashboard-web/src/lib/growth-client.ts` | Server can't filter by product |
+| `voice_ai_agent` | `user-dashboard-web/src/lib/platform-client.ts` | Passes in body, not header |
+| `voice_ai_agent` | `backend/src/clients/platform_client.py` | Passes in body/params, not header |
+
+### Already Correct
+
+| Repo | File |
+| --- | --- |
+| `ai_fastgap` (NomGap) | `src/api/client.ts` — `x-product-id: API_CONFIG.productId` |
+| `ai_clock` (ChronoMind) | `web/src/lib/platform-sync.ts` — `x-product-id` header |
+| `multimodal_memory` (MindLyst) | `web/src/lib/billing-client.ts` — via `defaultHeaders` |
+| `multimodal_memory` (MindLyst) | `web/src/lib/feature-flags.ts` — explicit header |
+| `common_plat` | `tracker-web/src/lib/tracker-client.ts` — from `localStorage` |
+
+---
+
+## 3. Cosmos DB Partition Key Mismatch
+
+### `referrals` Container — 3-way Mismatch
+
+| Location | Partition Key |
+| --- | --- |
+| `platform-service/src/lib/cosmos-init.ts` | `/id` |
+| MindLyst `web/src/lib/cosmos.ts` | `/userId` |
+| Admin dashboard `admin-web/src/lib/cosmos.ts` | `/referrerId` |
+| User dashboard `user-dashboard-web/src/lib/cosmos.ts` | `/referrerId` |
+
+**Status:** Flagged in previous session. Cannot be fixed without data migration. Comment added to `cosmos-init.ts`.
+
+**Risk:** A container's partition key is fixed when the container is created, so if each of these clients creates `referrals` on first use, whichever runs first sets the key for all of them. Clients that assume a different key will see point reads fail (404) and may silently get incomplete results from queries scoped to that key.
+
+---
+
+## 4. Missing Environment Variables in `.env.example` Files
+
+### ChronoMind `web/.env.example`
+
+Currently only has:
+
+```
+NEXT_PUBLIC_PLATFORM_SERVICE_URL=http://localhost:4003/api
+```
+
+**Missing:**
+
+- `NEXT_PUBLIC_PRODUCT_ID=chronomind` — used implicitly by `platform-sync.ts` (hardcoded there, but should be env-driven for consistency)
+
+### LysnrAI `user-dashboard-web/.env.example`
+
+**Missing:**
+
+- `NEXT_PUBLIC_PRODUCT_ID=lysnrai` — referenced by `feature-flags.ts` line 10
+- `NEXT_PUBLIC_PLATFORM_SERVICE_URL=http://localhost:4003` — referenced by `feature-flags.ts` line 11
+
+Has `PLATFORM_SERVICE_URL` (server-side) but not the `NEXT_PUBLIC_` variant (client-side).
+
+### LysnrAI root `.env.example`
+
+**Missing:**
+
+- `NEXT_PUBLIC_PRODUCT_ID` — not needed at root level (desktop app), so this is informational only.
+
+### Admin dashboard `.env.example`
+
+**Missing:**
+
+- `AZURE_KEYVAULT_URL` — referenced by `instrumentation.ts` but not in `.env.example`
+
+---
+
+## 5. Hardcoded `productId` Values
+
+| Repo | File | Line | Value | Should Use |
+| --- | --- | --- | --- | --- |
+| `multimodal_memory` | `web/src/lib/telemetry.ts` | 19 | `productId: 'mindlyst'` | `process.env.NEXT_PUBLIC_PRODUCT_ID` |
+| `voice_ai_agent` | `backend/src/clients/platform_client.py` | 86, 101 | `product_id: str = "lysnrai"` | `settings.PRODUCT_ID` or config |
+
+---
+
+## 6. Python Backend Client Gaps (`platform_client.py`)
+
+The `PlatformClient` class in `backend/src/clients/platform_client.py` has several issues:
+
+1. **No `x-request-id` header** on any request
+2. **No `x-product-id` header** on any request
+3. **Creates new `httpx.AsyncClient` per request** — no connection pooling
+4. **Hardcoded `product_id="lysnrai"` defaults** — should use config
+
+---
+
+## 7. Previously Fixed (Session 1)
+
+| Fix | Repo | File |
+| --- | --- | --- |
+| Added `x-request-id` to billing client | `multimodal_memory` | `web/src/lib/billing-client.ts` |
+| Added `x-request-id` to feature flags | `multimodal_memory` | `web/src/lib/feature-flags.ts` |
+| Added 13 MindLyst containers to cosmos-init | `common_plat` | `services/platform-service/src/lib/cosmos-init.ts` |
+| Added Blob Storage creds to Python config | `voice_ai_agent` | `backend/src/config.py` |
+| Added missing env vars to MindLyst | `multimodal_memory` | `web/.env.example` |
+
+---
+
+## 8. Recommended Fix Order
+
+1. **P0 — Root cause:** Add `x-request-id` auto-generation to `@bytelyst/api-client` `buildHeaders()` → fixes the 8 `createApiClient`-based TS clients at once
+2. **P0 — LysnrAI feature-flags:** Add `x-request-id` to the custom `fetch` call in `user-dashboard-web/src/lib/feature-flags.ts`
+3. **P1 — Python backend:** Add `x-request-id` and `x-product-id` headers to `platform_client.py`
+4. **P1 — Env vars:** Add missing `NEXT_PUBLIC_*` vars to ChronoMind, LysnrAI user-dashboard, admin-dashboard `.env.example` files
+5. **P2 — `x-product-id`:** Add to admin/user dashboard clients via `defaultHeaders` in `createApiClient` config
+6. **P2 — Hardcoded productId:** Replace in `telemetry.ts` and `platform_client.py`
+7. **P3 — Referrals PK mismatch:** Requires data migration strategy (separate task)
diff --git a/docs/AGENT_PROMPTS_SYNC_AND_COSMOS.md b/docs/design/AGENT_PROMPTS_SYNC_AND_COSMOS.md
similarity index 100%
rename from docs/AGENT_PROMPTS_SYNC_AND_COSMOS.md
rename to docs/design/AGENT_PROMPTS_SYNC_AND_COSMOS.md
diff --git a/docs/WINDSURF/CLIENT_TELEMETRY_DESIGN.md b/docs/design/CLIENT_TELEMETRY_DESIGN.md
similarity index 100%
rename from docs/WINDSURF/CLIENT_TELEMETRY_DESIGN.md
rename to docs/design/CLIENT_TELEMETRY_DESIGN.md
diff --git a/docs/MIGRATION_GUIDE.md b/docs/guides/MIGRATION_GUIDE.md
similarity index 100%
rename from docs/MIGRATION_GUIDE.md
rename to docs/guides/MIGRATION_GUIDE.md
diff --git a/docs/BEST_PRACTICES/PLATFORM_ACCELERATION_IDEAS.md b/docs/guides/PLATFORM_ACCELERATION_IDEAS.md
similarity index 100%
rename from docs/BEST_PRACTICES/PLATFORM_ACCELERATION_IDEAS.md
rename to docs/guides/PLATFORM_ACCELERATION_IDEAS.md
diff --git a/docs/BEST_PRACTICES/PLATFORM_PLAYBOOK.md b/docs/guides/PLATFORM_PLAYBOOK.md
similarity index 100%
rename from docs/BEST_PRACTICES/PLATFORM_PLAYBOOK.md
rename to docs/guides/PLATFORM_PLAYBOOK.md
diff --git a/docs/WORKSPACE_ANTI_PATTERNS.md b/docs/guides/WORKSPACE_ANTI_PATTERNS.md
similarity index 100%
rename from docs/WORKSPACE_ANTI_PATTERNS.md
rename to docs/guides/WORKSPACE_ANTI_PATTERNS.md
diff --git a/docs/BEST_PRACTICES/MOBILE_APPS/ANDROID_PLATFORM_SDK.md b/docs/roadmaps/completed/mobile_ANDROID_PLATFORM_SDK.md
similarity index 100%
rename from docs/BEST_PRACTICES/MOBILE_APPS/ANDROID_PLATFORM_SDK.md
rename to docs/roadmaps/completed/mobile_ANDROID_PLATFORM_SDK.md
diff --git a/docs/BEST_PRACTICES/MOBILE_APPS/IOS_PLATFORM_SDK.md b/docs/roadmaps/completed/mobile_IOS_PLATFORM_SDK.md
similarity index 100%
rename from docs/BEST_PRACTICES/MOBILE_APPS/IOS_PLATFORM_SDK.md
rename to docs/roadmaps/completed/mobile_IOS_PLATFORM_SDK.md
diff --git a/docs/BACKEND_TO_PLATFORM_SERVICE_MIGRATION.md b/docs/roadmaps/completed/platform_BACKEND_MIGRATION.md
similarity index 100%
rename from docs/BACKEND_TO_PLATFORM_SERVICE_MIGRATION.md
rename to docs/roadmaps/completed/platform_BACKEND_MIGRATION.md
diff --git a/docs/ROADMAP.md b/docs/roadmaps/completed/platform_COMMON_EXTRACTION_ROADMAP.md
similarity index 98%
rename from docs/ROADMAP.md
rename to docs/roadmaps/completed/platform_COMMON_EXTRACTION_ROADMAP.md
index edc1da75..bdba195e 100644
--- a/docs/ROADMAP.md
+++ b/docs/roadmaps/completed/platform_COMMON_EXTRACTION_ROADMAP.md
@@ -544,21 +544,21 @@ The following gaps were identified by scanning every import in the actual codeba
 
 ## Summary
 
-| Phase | Packages | Tasks | Done | Status |
-| --------- | ------------------------------------------------ | ------- | ------- | --------------------------------------------------------------------------------- |
-| **0** | Repo scaffolding + branching + rollback strategy | 14 | 14 | ✅ Complete |
-| **1A** | `@bytelyst/errors` | 23 | 22 | ✅ Complete (Docker verify pending) |
-| **1B** | `@bytelyst/cosmos` | 33 | 32 | ✅ Complete (Docker verify pending) |
-| **2A** | `@bytelyst/config` (34 files to rewire) | 25 | 24 | ✅ Complete (Docker verify pending) |
-| **2B** | `@bytelyst/auth` (20+ admin routes affected) | 29 | 29 | ✅ Complete (25 tests, tracker migrated) |
-| **2C** | `@bytelyst/fastify-core` | 24 | 22 | ✅ Services refactored, health-check verified (Docker pending) |
-| **3A** | `@bytelyst/api-client` | 17 | 17 | ✅ Complete |
-| **3B** | `@bytelyst/react-auth` (24 consumer files) | 28 | 25 | ✅ Admin uses factory; user/tracker keep custom |
-| **4** | `@bytelyst/design-tokens` (4 platforms) | 24 | 23 | ✅ CSS
synced to MindLyst; CONTRIBUTING updated; visual verify pending | -| **5** | CI/CD + Docker (pre-copy strategy) | 23 | 23 | ✅ Docker build + compose up verified on home network | -| **6** | Verification + docs + cleanup | 28 | 25 | ⚠️ Remaining E2E: admin + user portal flows | -| **7** | Future enhancements (+testing pkg) | 10 | 3 | 🔲 @bytelyst/testing (10 tests) + token pre-commit hook + AGENTS updated | -| **Total** | **10 packages (+1 bonus: logger)** | **278** | **257** | **~92% complete** | +| Phase | Packages | Tasks | Done | Status | +| --------- | ------------------------------------------------ | ------- | ------- | ------------------------------------------------------------------------ | +| **0** | Repo scaffolding + branching + rollback strategy | 14 | 14 | ✅ Complete | +| **1A** | `@bytelyst/errors` | 23 | 22 | ✅ Complete (Docker verify pending) | +| **1B** | `@bytelyst/cosmos` | 33 | 32 | ✅ Complete (Docker verify pending) | +| **2A** | `@bytelyst/config` (34 files to rewire) | 25 | 24 | ✅ Complete (Docker verify pending) | +| **2B** | `@bytelyst/auth` (20+ admin routes affected) | 29 | 29 | ✅ Complete (25 tests, tracker migrated) | +| **2C** | `@bytelyst/fastify-core` | 24 | 22 | ✅ Services refactored, health-check verified (Docker pending) | +| **3A** | `@bytelyst/api-client` | 17 | 17 | ✅ Complete | +| **3B** | `@bytelyst/react-auth` (24 consumer files) | 28 | 25 | ✅ Admin uses factory; user/tracker keep custom | +| **4** | `@bytelyst/design-tokens` (4 platforms) | 24 | 23 | ✅ CSS synced to MindLyst; CONTRIBUTING updated; visual verify pending | +| **5** | CI/CD + Docker (pre-copy strategy) | 23 | 23 | ✅ Docker build + compose up verified on home network | +| **6** | Verification + docs + cleanup | 28 | 25 | ⚠️ Remaining E2E: admin + user portal flows | +| **7** | Future enhancements (+testing pkg) | 10 | 3 | 🔲 @bytelyst/testing (10 tests) + token pre-commit hook + AGENTS updated | +| **Total** | **10 packages (+1 bonus: logger)** | **278** | 
**257** | **~92% complete** | ### Bonus Package (not in original roadmap) diff --git a/docs/WINDSURF/SERVICE_CONSOLIDATION_ROADMAP.md b/docs/roadmaps/completed/platform_SERVICE_CONSOLIDATION_ROADMAP.md similarity index 77% rename from docs/WINDSURF/SERVICE_CONSOLIDATION_ROADMAP.md rename to docs/roadmaps/completed/platform_SERVICE_CONSOLIDATION_ROADMAP.md index df77a458..c87f9a6f 100644 --- a/docs/WINDSURF/SERVICE_CONSOLIDATION_ROADMAP.md +++ b/docs/roadmaps/completed/platform_SERVICE_CONSOLIDATION_ROADMAP.md @@ -11,13 +11,13 @@ ## Why Consolidate -| Problem | Impact | -|---------|--------| -| 5 separate Node processes for 2 products | Unnecessary operational overhead | -| 5 ports to manage (4001–4005) | Complex docker-compose, run scripts, env files | -| 5 separate Cosmos connections | Wasted connection pool resources | -| 5 CI pipelines | Slow feedback, more config to maintain | -| 5 config schemas with duplicate env vars | Inconsistent config, easy to miss vars | +| Problem | Impact | +| ---------------------------------------- | ---------------------------------------------- | +| 5 separate Node processes for 2 products | Unnecessary operational overhead | +| 5 ports to manage (4001–4005) | Complex docker-compose, run scripts, env files | +| 5 separate Cosmos connections | Wasted connection pool resources | +| 5 CI pipelines | Slow feedback, more config to maintain | +| 5 config schemas with duplicate env vars | Inconsistent config, easy to miss vars | **After consolidation:** 2 services — `platform-service` (port 4003) + `extraction-service` (port 4005) @@ -31,12 +31,12 @@ Services export product ID differently — modules reference different names: -| Service | Export Name | Source | -|---------|-----------|--------| -| **platform-service** | `PRODUCT_ID` | `loadProductIdentity().productId` from `@bytelyst/config` | -| **growth-service** | `PRODUCT_ID` | same as platform ✅ | -| **billing-service** | `PRODUCT_ID` | same as platform ✅ | -| **tracker-service** | 
`DEFAULT_PRODUCT_ID` | `process.env.DEFAULT_PRODUCT_ID \|\| getProductId()` — **different name** ⚠️ | +| Service | Export Name | Source | +| -------------------- | -------------------- | ---------------------------------------------------------------------------- | +| **platform-service** | `PRODUCT_ID` | `loadProductIdentity().productId` from `@bytelyst/config` | +| **growth-service** | `PRODUCT_ID` | same as platform ✅ | +| **billing-service** | `PRODUCT_ID` | same as platform ✅ | +| **tracker-service** | `DEFAULT_PRODUCT_ID` | `process.env.DEFAULT_PRODUCT_ID \|\| getProductId()` — **different name** ⚠️ | **Fix:** When merging tracker modules, change all `DEFAULT_PRODUCT_ID` imports to `PRODUCT_ID` in the copied module files, and add `DEFAULT_PRODUCT_ID` env var support to platform-service's `product-config.ts` for backward compat. @@ -44,15 +44,16 @@ Services export product ID differently — modules reference different names: Platform-service `package.json` is **missing** these deps needed by merged modules: -| Dep | Needed By | Currently In | -|-----|-----------|-------------| -| `stripe` (^17.5.0) | billing modules (stripe webhooks, checkout) | billing-service, growth-service | -| `@bytelyst/auth` (workspace:*) | tracker modules (`extractAuth`) | tracker-service | -| `@fastify/rate-limit` (^10.3.0) | tracker rate limiting | tracker-service | +| Dep | Needed By | Currently In | +| ------------------------------- | ------------------------------------------- | ------------------------------- | +| `stripe` (^17.5.0) | billing modules (stripe webhooks, checkout) | billing-service, growth-service | +| `@bytelyst/auth` (workspace:\*) | tracker modules (`extractAuth`) | tracker-service | +| `@fastify/rate-limit` (^10.3.0) | tracker rate limiting | tracker-service | ### Gap 3: Billing Internal Key Auth (Global Hook) `billing-service/src/server.ts` has a **global** `onRequest` hook: + ```typescript app.addHook('onRequest', async (req, reply) => { if (path === '/health' 
|| path.includes('/stripe/webhook')) return; @@ -60,6 +61,7 @@ app.addHook('onRequest', async (req, reply) => { if (key !== INTERNAL_KEY) reply.code(401).send(...) }); ``` + This **cannot** be a global hook after merge — it would block auth, audit, tracker, etc. routes. **Fix:** Convert to a Fastify plugin registered only on billing route prefixes, or add `x-internal-key` check inside each billing route handler. @@ -67,6 +69,7 @@ This **cannot** be a global hook after merge — it would block auth, audit, tra ### Gap 4: Growth Webhooks Library `growth-service/src/lib/webhooks.ts` dispatches fire-and-forget HTTP callbacks on invitation redeem. References env vars: + - `WEBHOOK_INVITATION_REDEEMED_URL` - `WEBHOOK_REFERRAL_STATUS_URL` @@ -82,26 +85,26 @@ Growth-service config requires `STRIPE_SECRET_KEY` as **required** (not optional **Dashboard API clients (TypeScript):** -| File | Current Env Var | Current Default | -|------|----------------|-----------------| -| `admin-dashboard-web/src/lib/billing-client.ts` | `BILLING_SERVICE_URL` | `http://localhost:4002` | -| `admin-dashboard-web/src/lib/growth-client.ts` | `GROWTH_SERVICE_URL` | `http://localhost:4001` | -| `user-dashboard-web/src/lib/billing-client.ts` | `BILLING_SERVICE_URL` | `http://localhost:4002` | -| `user-dashboard-web/src/lib/growth-client.ts` | `GROWTH_SERVICE_URL` | `http://localhost:4001` | -| `user-dashboard-web/src/app/api/stripe/webhook/route.ts` | `BILLING_SERVICE_URL` | `http://localhost:4002` | -| `admin-dashboard-web/src/app/api/stripe/config/route.ts` | — | `http://localhost:4002` inline | -| `admin-dashboard-web/src/lib/stripe-context.tsx` | — | `http://localhost:4002` (3 places) | -| `tracker-dashboard-web/src/app/api/tracker/[...path]/route.ts` | `TRACKER_API_URL` | `http://localhost:4004` | -| `tracker-dashboard-web/src/app/api/auth/login/route.ts` | `PLATFORM_API_URL` | `http://localhost:4003` ✅ | -| `tracker-dashboard-web/src/app/api/auth/me/route.ts` | `PLATFORM_API_URL` | 
`http://localhost:4003` ✅ | +| File | Current Env Var | Current Default | +| -------------------------------------------------------------- | --------------------- | ---------------------------------- | +| `admin-dashboard-web/src/lib/billing-client.ts` | `BILLING_SERVICE_URL` | `http://localhost:4002` | +| `admin-dashboard-web/src/lib/growth-client.ts` | `GROWTH_SERVICE_URL` | `http://localhost:4001` | +| `user-dashboard-web/src/lib/billing-client.ts` | `BILLING_SERVICE_URL` | `http://localhost:4002` | +| `user-dashboard-web/src/lib/growth-client.ts` | `GROWTH_SERVICE_URL` | `http://localhost:4001` | +| `user-dashboard-web/src/app/api/stripe/webhook/route.ts` | `BILLING_SERVICE_URL` | `http://localhost:4002` | +| `admin-dashboard-web/src/app/api/stripe/config/route.ts` | — | `http://localhost:4002` inline | +| `admin-dashboard-web/src/lib/stripe-context.tsx` | — | `http://localhost:4002` (3 places) | +| `tracker-dashboard-web/src/app/api/tracker/[...path]/route.ts` | `TRACKER_API_URL` | `http://localhost:4004` | +| `tracker-dashboard-web/src/app/api/auth/login/route.ts` | `PLATFORM_API_URL` | `http://localhost:4003` ✅ | +| `tracker-dashboard-web/src/app/api/auth/me/route.ts` | `PLATFORM_API_URL` | `http://localhost:4003` ✅ | **Python clients (desktop + backend):** -| File | Current Env Var | Current Default | -|------|----------------|-----------------| +| File | Current Env Var | Current Default | +| --------------------------------------- | --------------------- | ----------------------- | | `backend/src/clients/billing_client.py` | `BILLING_SERVICE_URL` | `http://localhost:4002` | -| `src/cloud/api_sync.py` | `BILLING_SERVICE_URL` | `http://localhost:4002` | -| `src/cloud/plan_resolver.py` | `BILLING_SERVICE_URL` | `http://localhost:4002` | +| `src/cloud/api_sync.py` | `BILLING_SERVICE_URL` | `http://localhost:4002` | +| `src/cloud/plan_resolver.py` | `BILLING_SERVICE_URL` | `http://localhost:4002` | All these must change to `PLATFORM_SERVICE_URL` / 
`http://localhost:4003`. @@ -112,10 +115,12 @@ All these must change to `PLATFORM_SERVICE_URL` / `http://localhost:4003`. ### Gap 8: Stripe Webhook Test Hardcodes Port `user-dashboard-web/src/__tests__/stripe-webhook.test.ts` sets: + ```typescript process.env.BILLING_SERVICE_URL = 'http://localhost:4002'; expect(url).toBe('http://localhost:4002/api/stripe/webhook'); ``` + Must update to port 4003. ### Gap 9: Load Test Scripts @@ -133,6 +138,7 @@ Must update defaults to port 4003. ### Gap 11: LysnrAI Services Stubs `learning_voice_ai_agent/services/` contains `.env.example` stubs for each service: + - `services/billing-service/.env.example` - `services/growth-service/.env.example` - `services/tracker-service/.env.example` @@ -154,6 +160,7 @@ Mobile apps call the Python backend (`localhost:8000`), which calls billing-serv ### Gap 14: Docker Compose `depends_on` for Tracker Dashboard `learning_voice_ai_agent/docker-compose.yml` has: + ```yaml tracker-dashboard: depends_on: @@ -162,17 +169,23 @@ tracker-dashboard: platform-service: condition: service_started ``` + After merge, `tracker-service` container no longer exists. Must change `depends_on` to only `platform-service`. ### Gap 15: Admin Dashboard `docs.ts` Service Directory List `admin-dashboard-web/src/lib/docs.ts` has a hardcoded list of service directories: + ```typescript const serviceDirs = [ - 'admin-dashboard-web', 'user-dashboard-web', 'mobile_app', - 'services/billing-service', 'services/growth-service', + 'admin-dashboard-web', + 'user-dashboard-web', + 'mobile_app', + 'services/billing-service', + 'services/growth-service', ]; ``` + Must update to remove old service names or replace with `services/platform-service`. 
### Gap 16: MindLyst Docs Reference Old Services @@ -195,6 +208,7 @@ Platform-service's Dockerfile only copies `services/platform-service/` — it do ### Route Path Collision Check ✅ All services use unique route prefixes — **no collisions**: + - platform: `/auth/*`, `/audit/*`, `/notifications/*`, `/flags/*`, `/ratelimit/*`, `/blob/*`, `/devices/*` - billing: `/subscriptions/*`, `/usage/*`, `/plans/*`, `/licenses/*`, `/payments/*`, `/stripe/*` - growth: `/invitations/*`, `/referrals/*`, `/promos/*` @@ -244,12 +258,12 @@ services/ All containers served by one Cosmos client in platform-service: -| Origin | Containers | -|--------|-----------| +| Origin | Containers | +| ----------------------- | ----------------------------------------------------------------------------------- | | **platform** (existing) | `users`, `audit_log`, `feature_flags`, `notification_devices`, `notification_prefs` | -| **billing** → platform | `subscriptions`, `payments`, `plans`, `licenses`, `usage_daily` | -| **growth** → platform | `invitation_codes`, `referrals`, `promo_codes` | -| **tracker** → platform | `tracker_items`, `tracker_comments`, `tracker_votes` | +| **billing** → platform | `subscriptions`, `payments`, `plans`, `licenses`, `usage_daily` | +| **growth** → platform | `invitation_codes`, `referrals`, `promo_codes` | +| **tracker** → platform | `tracker_items`, `tracker_comments`, `tracker_votes` | --- @@ -390,7 +404,7 @@ All containers served by one Cosmos client in platform-service: - [x] **3.3.1** Created `platform-service/src/lib/auth.ts` re-exporting from `@bytelyst/auth` - [x] **3.3.2** Copied from tracker-service (identical content) -- [x] **3.3.3** Added `@bytelyst/auth` (workspace:*) to package.json +- [x] **3.3.3** Added `@bytelyst/auth` (workspace:\*) to package.json - [x] **3.3.4** Added `@fastify/rate-limit` (^10.3.0) to package.json - [x] **3.3.5** `jose` already in platform ✅ @@ -560,29 +574,30 @@ Also fixed: monitoring/health.ts, AI.dev/SKILLS docs, 
MIGRATION_GUIDE.md [`81609 ## Summary -| Phase | What | Effort | Tests Moved | Critical Gaps Addressed | -|-------|------|--------|-------------|------------------------| -| **0** | Preparation & backup | 30 min | — | — | -| **1** | Merge growth-service (3 modules) | 2–3 hrs | ~14 | Gap 4 (webhooks), Gap 5 (Stripe key) | -| **2** | Merge billing-service (5 modules) | 4–5 hrs | ~11 | Gap 3 (internal key auth) | -| **3** | Merge tracker-service (4 modules) | 3–4 hrs | ~45 | Gap 1 (product ID), Gap 2 (deps) | -| **4** | Update consumers (20+ files across 3 repos) | 4–5 hrs | — | Gaps 6–11, 13–17 | -| **5** | Documentation & final verification | 2–3 hrs | — | — | -| **Total** | **5 services → 2** | **~4–5 days** | **~125+ tests** | **17 gaps addressed** | +| Phase | What | Effort | Tests Moved | Critical Gaps Addressed | +| --------- | ------------------------------------------- | ------------- | --------------- | ------------------------------------ | +| **0** | Preparation & backup | 30 min | — | — | +| **1** | Merge growth-service (3 modules) | 2–3 hrs | ~14 | Gap 4 (webhooks), Gap 5 (Stripe key) | +| **2** | Merge billing-service (5 modules) | 4–5 hrs | ~11 | Gap 3 (internal key auth) | +| **3** | Merge tracker-service (4 modules) | 3–4 hrs | ~45 | Gap 1 (product ID), Gap 2 (deps) | +| **4** | Update consumers (20+ files across 3 repos) | 4–5 hrs | — | Gaps 6–11, 13–17 | +| **5** | Documentation & final verification | 2–3 hrs | — | — | +| **Total** | **5 services → 2** | **~4–5 days** | **~125+ tests** | **17 gaps addressed** | ## Port Allocation (After) -| Service | Port | -|---------|------| -| **platform-service** | **4003** | -| **extraction-service** | **4005** | -| extraction-service python sidecar (internal) | 4006 | +| Service | Port | +| -------------------------------------------- | -------- | +| **platform-service** | **4003** | +| **extraction-service** | **4005** | +| extraction-service python sidecar (internal) | 4006 | Ports 4001, 4002, 4004 freed 
up. ## Rollback Strategy Each phase has its own commit. If a phase breaks something: + 1. `git revert ` to undo that phase 2. The old service code is in git history 3. Backup branches created in Phase 0 @@ -590,13 +605,13 @@ Each phase has its own commit. If a phase breaks something: ## Risks & Mitigations -| Risk | Mitigation | -|------|-----------| -| Route path collisions | Verified ✅ — all services use unique prefixes | -| Config schema gets large | Group env vars by domain with clear section comments | -| Stripe webhook raw body | Fastify handles this — verify after move | +| Risk | Mitigation | +| ---------------------------------------- | ----------------------------------------------------------------------------- | +| Route path collisions | Verified ✅ — all services use unique prefixes | +| Config schema gets large | Group env vars by domain with clear section comments | +| Stripe webhook raw body | Fastify handles this — verify after move | | Billing internal key blocks other routes | Scoped Fastify plugin (Phase 2.2) isolates key check to billing prefixes only | -| Public tracker routes skip auth | Register outside scoped plugins — verify in Phase 3.5.3 | -| Python billing client breaks | Change env var name, keep same API paths — transparent to Python code | -| Stripe webhook test fails | Explicit port update in Phase 4.4 | -| Product ID mismatch | Alias `DEFAULT_PRODUCT_ID = PRODUCT_ID` in Phase 3.2.4 | +| Public tracker routes skip auth | Register outside scoped plugins — verify in Phase 3.5.3 | +| Python billing client breaks | Change env var name, keep same API paths — transparent to Python code | +| Stripe webhook test fails | Explicit port update in Phase 4.4 | +| Product ID mismatch | Alias `DEFAULT_PRODUCT_ID = PRODUCT_ID` in Phase 3.2.4 | diff --git a/docs/CLOUD/CLOUD_AGNOSTIC_REFACTOR_ROADMAP.md b/docs/roadmaps/not-started/cloud_AGNOSTIC_REFACTOR_ROADMAP.md similarity index 72% rename from docs/CLOUD/CLOUD_AGNOSTIC_REFACTOR_ROADMAP.md rename 
to docs/roadmaps/not-started/cloud_AGNOSTIC_REFACTOR_ROADMAP.md index 598a2065..43c12bc0 100644 --- a/docs/CLOUD/CLOUD_AGNOSTIC_REFACTOR_ROADMAP.md +++ b/docs/roadmaps/not-started/cloud_AGNOSTIC_REFACTOR_ROADMAP.md @@ -82,6 +82,7 @@ routes.ts ────────► │ container() │ ``` **Problems:** + - 38 platform-service repository files write raw Cosmos SQL queries - 6 additional repository files in dashboards + MindLyst web - Blob, Speech, OpenAI all have direct Azure SDK imports @@ -112,6 +113,7 @@ routes.ts ────────► │ collection.findMany({ │ ``` **Benefits:** + - Repositories use a generic query API — no SQL strings, no Azure types - Switching provider = implement a new adapter (~200 lines) + change env var - In-memory adapter makes tests fast and cloud-free @@ -121,16 +123,16 @@ routes.ts ────────► │ collection.findMany({ │ ## 3. Sprint Plan Overview -| Sprint | Package / Scope | Effort | Files Changed | Risk | -|--------|----------------|--------|---------------|------| -| **1** | `@bytelyst/datastore` — DB abstraction | 5–7 days | 44 repository files + 1 new package | Medium | -| **2** | `@bytelyst/storage` — Blob/Object abstraction | 2 days | 3 files + 1 new package | Low | -| **3** | `@bytelyst/llm` — LLM provider abstraction | 2 days | 4 files + 1 new package | Low | -| **4** | `@bytelyst/secrets` — Secrets manager abstraction | 1 day | 2 files (refactor existing) | Very Low | -| **5** | `@bytelyst/speech` — Speech STT abstraction | 3–4 days | 3 files + 1 new package | Medium | -| **6** | `@bytelyst/push` — Push notification abstraction | 1 day | 1 file + 1 new package | Very Low | -| **7** | Monitoring/Telemetry cleanup | 0.5 days | Already done (custom telemetry) | None | -| **Total** | | **~15–17 days** | ~55 files | | +| Sprint | Package / Scope | Effort | Files Changed | Risk | +| --------- | ------------------------------------------------- | --------------- | ----------------------------------- | -------- | +| **1** | `@bytelyst/datastore` — DB abstraction | 
5–7 days | 44 repository files + 1 new package | Medium | +| **2** | `@bytelyst/storage` — Blob/Object abstraction | 2 days | 3 files + 1 new package | Low | +| **3** | `@bytelyst/llm` — LLM provider abstraction | 2 days | 4 files + 1 new package | Low | +| **4** | `@bytelyst/secrets` — Secrets manager abstraction | 1 day | 2 files (refactor existing) | Very Low | +| **5** | `@bytelyst/speech` — Speech STT abstraction | 3–4 days | 3 files + 1 new package | Medium | +| **6** | `@bytelyst/push` — Push notification abstraction | 1 day | 1 file + 1 new package | Very Low | +| **7** | Monitoring/Telemetry cleanup | 0.5 days | Already done (custom telemetry) | None | +| **Total** | | **~15–17 days** | ~55 files | | ### Priority Order @@ -211,8 +213,8 @@ export type SortMap = Record; // 1 = ASC, -1 = DESC export interface AggregateOptions { filter: FilterMap; groupBy?: string[]; - count?: string; // alias for COUNT(1) - sum?: string; // field to SUM + count?: string; // alias for COUNT(1) + sum?: string; // field to SUM } /** Factory that creates collections — one per provider. 
*/ @@ -412,6 +414,7 @@ export async function create(doc: FeatureFlagDoc): Promise { ``` **Key observations:** + - No SQL strings - No `@azure/cosmos` types - No `.items.query().fetchAll()` chaining @@ -445,11 +448,11 @@ export function createDatastoreProvider(): DatastoreProvider { const provider = process.env.DB_PROVIDER || 'cosmos'; switch (provider) { case 'cosmos': - return new CosmosDatastoreProvider(); // uses existing COSMOS_ENDPOINT, COSMOS_KEY + return new CosmosDatastoreProvider(); // uses existing COSMOS_ENDPOINT, COSMOS_KEY case 'mongo': - return new MongoDatastoreProvider(); // uses MONGO_URI + return new MongoDatastoreProvider(); // uses MONGO_URI case 'memory': - return new MemoryDatastoreProvider(); // no config needed + return new MemoryDatastoreProvider(); // no config needed default: throw new Error(`Unknown DB_PROVIDER: ${provider}`); } @@ -459,6 +462,7 @@ export function createDatastoreProvider(): DatastoreProvider { ### 4.7 Migration Plan for 38 Repository Files Migrate in batches, one module per commit. Each commit: + 1. Update the repository file to use `getCollection()` instead of `getContainer()` 2. Replace SQL queries with `findMany()` / `findOne()` / `count()` / `aggregate()` 3. Run the module's test file — must pass @@ -466,40 +470,40 @@ Migrate in batches, one module per commit. 
Each commit: **Batch order** (simplest first, complex last): -| Batch | Modules | Complexity | Notes | -|-------|---------|-----------|-------| -| 1 | flags, plans, settings, changelog, products | Simple CRUD | 5 files, warmup | -| 2 | licenses, sessions, ip-rules, maintenance, feedback | Simple CRUD + filters | 5 files | -| 3 | items, comments, votes, brains, reflections | CRUD + filter combos | 5 files | -| 4 | audit, delivery, notifications, exports, jobs | CRUD + time queries | 5 files | -| 5 | tokens, usage, invitations, referrals, webhooks | More complex queries | 5 files | -| 6 | auth, subscriptions, telemetry, experiments | Complex (GROUP BY, aggregates) | 4 files | -| 7 | timers, shared-timers, routines, households | Sync logic, batch ops | 4 files | -| 8 | fasting-sessions, fasting-protocols, meal-log, social-fasting, daily-briefs, streaks, push-triggers, impersonation, status, memory, analytics, waitlist | Product-specific + remaining | 12 files | -| 9 | Dashboard cosmos clients (admin-web, MindLyst web) | Direct `@azure/cosmos` | 6 files | -| 10 | Python clients (desktop cosmos, backend cosmos) | `azure.cosmos` → abstracted | 2 files | +| Batch | Modules | Complexity | Notes | +| ----- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------ | --------------- | +| 1 | flags, plans, settings, changelog, products | Simple CRUD | 5 files, warmup | +| 2 | licenses, sessions, ip-rules, maintenance, feedback | Simple CRUD + filters | 5 files | +| 3 | items, comments, votes, brains, reflections | CRUD + filter combos | 5 files | +| 4 | audit, delivery, notifications, exports, jobs | CRUD + time queries | 5 files | +| 5 | tokens, usage, invitations, referrals, webhooks | More complex queries | 5 files | +| 6 | auth, subscriptions, telemetry, experiments | Complex (GROUP BY, aggregates) | 4 files | +| 7 | timers, shared-timers, 
routines, households | Sync logic, batch ops | 4 files | +| 8 | fasting-sessions, fasting-protocols, meal-log, social-fasting, daily-briefs, streaks, push-triggers, impersonation, status, memory, analytics, waitlist | Product-specific + remaining | 12 files | +| 9 | Dashboard cosmos clients (admin-web, MindLyst web) | Direct `@azure/cosmos` | 6 files | +| 10 | Python clients (desktop cosmos, backend cosmos) | `azure.cosmos` → abstracted | 2 files | ### 4.8 Handling Complex Queries Some repository files use advanced Cosmos SQL features. Here's how the interface handles them: -| Cosmos SQL Pattern | Datastore Interface Equivalent | -|--------------------|-------------------------------| -| `SELECT * FROM c WHERE c.x = @v` | `findMany({ filter: { x: v } })` | -| `SELECT * FROM c WHERE c.x = @v AND c.y = @w` | `findMany({ filter: { x: v, y: w } })` | -| `ORDER BY c.x ASC` | `findMany({ sort: { x: 1 } })` | -| `ORDER BY c.x DESC` | `findMany({ sort: { x: -1 } })` | -| `OFFSET @o LIMIT @l` | `findMany({ offset: o, limit: l })` | -| `SELECT VALUE COUNT(1) FROM c WHERE ...` | `count({ filter })` | +| Cosmos SQL Pattern | Datastore Interface Equivalent | +| ---------------------------------------------------- | -------------------------------------------------------- | +| `SELECT * FROM c WHERE c.x = @v` | `findMany({ filter: { x: v } })` | +| `SELECT * FROM c WHERE c.x = @v AND c.y = @w` | `findMany({ filter: { x: v, y: w } })` | +| `ORDER BY c.x ASC` | `findMany({ sort: { x: 1 } })` | +| `ORDER BY c.x DESC` | `findMany({ sort: { x: -1 } })` | +| `OFFSET @o LIMIT @l` | `findMany({ offset: o, limit: l })` | +| `SELECT VALUE COUNT(1) FROM c WHERE ...` | `count({ filter })` | | `SELECT c.plan, COUNT(1) AS cnt ... 
GROUP BY c.plan` | `aggregate({ filter, groupBy: ['plan'], count: 'cnt' })` | -| `NOT IS_DEFINED(c.usedAt)` | `findMany({ filter: { usedAt: { $exists: false } } })` | -| `c.x >= @v` | `findMany({ filter: { x: { $gte: v } } })` | -| `ARRAY_CONTAINS(c.tags, @tag)` | `findMany({ filter: { tags: { $contains: tag } } })` | -| `container().item(id, pk).read()` | `findById(id, pk)` | -| `container().items.create(doc)` | `create(doc)` | -| `container().item(id, pk).replace(doc)` | `replace(id, pk, doc)` | -| `container().items.upsert(doc)` | `upsert(doc)` | -| `container().item(id, pk).delete()` | `delete(id, pk)` | +| `NOT IS_DEFINED(c.usedAt)` | `findMany({ filter: { usedAt: { $exists: false } } })` | +| `c.x >= @v` | `findMany({ filter: { x: { $gte: v } } })` | +| `ARRAY_CONTAINS(c.tags, @tag)` | `findMany({ filter: { tags: { $contains: tag } } })` | +| `container().item(id, pk).read()` | `findById(id, pk)` | +| `container().items.create(doc)` | `create(doc)` | +| `container().item(id, pk).replace(doc)` | `replace(id, pk, doc)` | +| `container().items.upsert(doc)` | `upsert(doc)` | +| `container().item(id, pk).delete()` | `delete(id, pk)` | For the filter operators, use a simple operator convention: @@ -645,6 +649,7 @@ export interface ChatCompletionResponse { MindLyst `web/src/lib/llm.ts` already auto-detects Azure vs OpenAI based on env vars. This pattern should be promoted to a shared package. 
**Provider implementations:** + - `AzureOpenAIProvider` — uses `api-key` header + deployment-scoped URL - `OpenAIProvider` — uses `Authorization: Bearer` header + model param - `GeminiProvider` — uses Google Generative AI SDK (future) @@ -676,6 +681,7 @@ The `openai` Python SDK already has a common interface between `OpenAI` and `Azu ### 7.1 Key Insight: Already 90% Done The current `resolveKeyVaultSecrets()` already: + - Skips if `AZURE_KEYVAULT_URL` is not set - Falls back to env vars for each secret - Logs warnings but doesn't throw @@ -691,19 +697,19 @@ export interface SecretsProvider { export async function resolveSecrets( secrets: SecretMapping[], - opts?: { provider?: string }, + opts?: { provider?: string } ): Promise<void> { const provider = opts?.provider || process.env.SECRETS_PROVIDER || 'env'; switch (provider) { case 'azure-keyvault': - return resolveFromAzureKeyVault(secrets); // existing code + return resolveFromAzureKeyVault(secrets); // existing code case 'aws-secrets-manager': return resolveFromAWSSecretsManager(secrets); // future case 'gcp-secret-manager': - return resolveFromGCPSecretManager(secrets); // future + return resolveFromGCPSecretManager(secrets); // future case 'doppler': - return resolveFromDoppler(secrets); // future + return resolveFromDoppler(secrets); // future case 'env': default: return; // All secrets already in env — nothing to resolve @@ -720,14 +726,14 @@ The current env vars have Azure-specific names. Add **generic aliases** that fal
Add **generic aliases** that fal export const ENV_ALIASES: Record = { // Generic name → fallback names (checked in order) - 'BLOB_CONNECTION_STRING': ['AZURE_BLOB_CONNECTION_STRING'], - 'BLOB_ACCOUNT_NAME': ['AZURE_BLOB_ACCOUNT_NAME'], - 'BLOB_ACCOUNT_KEY': ['AZURE_BLOB_ACCOUNT_KEY'], - 'SPEECH_KEY': ['AZURE_SPEECH_KEY'], - 'SPEECH_REGION': ['AZURE_SPEECH_REGION'], - 'LLM_API_KEY': ['AZURE_OPENAI_KEY', 'OPENAI_API_KEY'], - 'LLM_ENDPOINT': ['AZURE_OPENAI_ENDPOINT', 'OPENAI_BASE_URL'], - 'LLM_MODEL': ['AZURE_OPENAI_DEPLOYMENT', 'OPENAI_MODEL'], + BLOB_CONNECTION_STRING: ['AZURE_BLOB_CONNECTION_STRING'], + BLOB_ACCOUNT_NAME: ['AZURE_BLOB_ACCOUNT_NAME'], + BLOB_ACCOUNT_KEY: ['AZURE_BLOB_ACCOUNT_KEY'], + SPEECH_KEY: ['AZURE_SPEECH_KEY'], + SPEECH_REGION: ['AZURE_SPEECH_REGION'], + LLM_API_KEY: ['AZURE_OPENAI_KEY', 'OPENAI_API_KEY'], + LLM_ENDPOINT: ['AZURE_OPENAI_ENDPOINT', 'OPENAI_BASE_URL'], + LLM_MODEL: ['AZURE_OPENAI_DEPLOYMENT', 'OPENAI_MODEL'], }; export function getEnv(name: string): string | undefined { @@ -829,6 +835,7 @@ protocol SpeechTranscriber { ### 8.4 Note on Complexity Speech is the hardest abstraction because: + - Azure Speech SDK has a unique push-stream architecture - Google Cloud Speech uses gRPC streaming - Deepgram uses WebSockets @@ -871,11 +878,13 @@ Implementations: `AzureNotificationHubProvider`, `FirebaseProvider` (future), `E **Effort:** 0.5 days (mostly done already) The ecosystem already has cloud-agnostic monitoring: + - **Custom telemetry** via `@bytelyst/telemetry-client` → platform-service → Cosmos - **Loki + Grafana** in `services/monitoring/` - **Health checks** via `/health` endpoints on all services **Remaining work:** + - Remove `opencensus-ext-azure` from Python requirements (optional, only used for App Insights) - Ensure all structured logging uses `pino` (TS) or `structlog` (Python) — no Azure-specific loggers @@ -887,43 +896,43 @@ Once all sprints are complete, here's how much work each cloud migration scenari ### Scenario: 
Switch DB from Cosmos to MongoDB Atlas -| Step | Effort | Description | -|------|--------|-------------| -| Implement `MongoDatastoreProvider` | 1 day | ~200 lines — translate FilterMap to MongoDB find() | -| Set `DB_PROVIDER=mongo` + `MONGO_URI=...` | 5 minutes | Config change | -| Run data migration script | 2–4 hours | Export Cosmos JSON → import to MongoDB | -| Run full test suite | 30 minutes | Verify all 1,029+ tests pass | -| **Total** | **~1.5 days** | vs 3–5 weeks without abstraction | +| Step | Effort | Description | +| ----------------------------------------- | ------------- | -------------------------------------------------- | +| Implement `MongoDatastoreProvider` | 1 day | ~200 lines — translate FilterMap to MongoDB find() | +| Set `DB_PROVIDER=mongo` + `MONGO_URI=...` | 5 minutes | Config change | +| Run data migration script | 2–4 hours | Export Cosmos JSON → import to MongoDB | +| Run full test suite | 30 minutes | Verify all 1,029+ tests pass | +| **Total** | **~1.5 days** | vs 3–5 weeks without abstraction | ### Scenario: Switch Storage from Azure Blob to S3 -| Step | Effort | Description | -|------|--------|-------------| -| Implement `S3StorageProvider` | 0.5 day | ~100 lines | -| Set `STORAGE_PROVIDER=s3` + `AWS_*` env vars | 5 minutes | Config change | -| Migrate blobs | 1–2 hours | azcopy or rclone | -| **Total** | **~0.5 days** | vs 2–3 days without abstraction | +| Step | Effort | Description | +| -------------------------------------------- | ------------- | ------------------------------- | +| Implement `S3StorageProvider` | 0.5 day | ~100 lines | +| Set `STORAGE_PROVIDER=s3` + `AWS_*` env vars | 5 minutes | Config change | +| Migrate blobs | 1–2 hours | azcopy or rclone | +| **Total** | **~0.5 days** | vs 2–3 days without abstraction | ### Scenario: Switch LLM from Azure OpenAI to OpenAI Direct -| Step | Effort | Description | -|------|--------|-------------| -| Set `LLM_PROVIDER=openai` + `OPENAI_API_KEY=...` | 5 minutes | Config 
change only | -| Remove `AZURE_OPENAI_*` env vars | 5 minutes | Cleanup | -| **Total** | **10 minutes** | Already near-zero today | +| Step | Effort | Description | +| ------------------------------------------------ | -------------- | ----------------------- | +| Set `LLM_PROVIDER=openai` + `OPENAI_API_KEY=...` | 5 minutes | Config change only | +| Remove `AZURE_OPENAI_*` env vars | 5 minutes | Cleanup | +| **Total** | **10 minutes** | Already near-zero today | ### Scenario: Full Cloud Migration (Azure → AWS) -| Step | Effort | Description | -|------|--------|-------------| -| Implement MongoDB/DynamoDB provider | 1–2 days | | -| Implement S3 storage provider | 0.5 days | | -| Implement AWS Secrets Manager provider | 0.5 days | | -| Switch LLM to OpenAI direct | 10 minutes | | -| Implement Google STT or AWS Transcribe | 2–3 days | Speech is still the hardest | -| Implement SNS push provider | 0.5 days | | -| Data migration + testing | 2–3 days | | -| **Total** | **~7–10 days** | vs 4–8 weeks without abstraction | +| Step | Effort | Description | +| -------------------------------------- | -------------- | -------------------------------- | +| Implement MongoDB/DynamoDB provider | 1–2 days | | +| Implement S3 storage provider | 0.5 days | | +| Implement AWS Secrets Manager provider | 0.5 days | | +| Switch LLM to OpenAI direct | 10 minutes | | +| Implement Google STT or AWS Transcribe | 2–3 days | Speech is still the hardest | +| Implement SNS push provider | 0.5 days | | +| Data migration + testing | 2–3 days | | +| **Total** | **~7–10 days** | vs 4–8 weeks without abstraction | --- @@ -938,7 +947,7 @@ Every repository test should work against **any** provider. 
The test setup picks import { setTestProvider } from '@bytelyst/datastore/testing'; beforeAll(() => { - setTestProvider('memory'); // Fast, no network, deterministic + setTestProvider('memory'); // Fast, no network, deterministic }); ``` @@ -957,6 +966,7 @@ __tests__/ ### 12.3 Migration Verification Checklist For each sprint, before merging: + 1. All existing tests pass (no regressions) 2. New interface tests pass with all implemented providers 3. Manual smoke test against Azure (dev environment) @@ -1046,14 +1056,14 @@ AZURE_SPEECH_REGION=eastus ## 14. Risk Mitigation -| Risk | Mitigation | -|------|-----------| -| **FilterMap can't express complex Cosmos SQL** | Add `rawQuery()` escape hatch for edge cases. Track usage — if >5% of queries need it, expand FilterMap operators | -| **Performance regression from abstraction layer** | Benchmark critical queries before/after. The abstraction adds one function call — negligible | -| **Team unfamiliar with new patterns** | Each sprint includes updating AGENTS.md with new conventions. Old pattern (direct Cosmos) still works during migration | -| **In-memory provider behaves differently** | Integration test suite runs against real Cosmos in CI. Memory provider is for unit tests only | -| **Stale data during DB migration** | Use dual-write pattern: write to both old and new provider during transition. Read from new, fall back to old | -| **Sprint 1 takes too long** | The 38 repository files can be migrated incrementally — even 5 files at a time is progress. Old and new patterns coexist | +| Risk | Mitigation | +| ------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------ | +| **FilterMap can't express complex Cosmos SQL** | Add `rawQuery()` escape hatch for edge cases. 
Track usage — if >5% of queries need it, expand FilterMap operators | +| **Performance regression from abstraction layer** | Benchmark critical queries before/after. The abstraction adds one function call — negligible | +| **Team unfamiliar with new patterns** | Each sprint includes updating AGENTS.md with new conventions. Old pattern (direct Cosmos) still works during migration | +| **In-memory provider behaves differently** | Integration test suite runs against real Cosmos in CI. Memory provider is for unit tests only | +| **Stale data during DB migration** | Use dual-write pattern: write to both old and new provider during transition. Read from new, fall back to old | +| **Sprint 1 takes too long** | The 38 repository files can be migrated incrementally — even 5 files at a time is progress. Old and new patterns coexist | --- @@ -1116,32 +1126,81 @@ packages/llm/ ```typescript // Exact match -{ field: value } +{ + field: value; +} // Comparison -{ field: { $gt: value } } // > -{ field: { $gte: value } } // >= -{ field: { $lt: value } } // < -{ field: { $lte: value } } // <= -{ field: { $ne: value } } // != +{ + field: { + $gt: value; + } +} // > +{ + field: { + $gte: value; + } +} // >= +{ + field: { + $lt: value; + } +} // < +{ + field: { + $lte: value; + } +} // <= +{ + field: { + $ne: value; + } +} // != // Existence -{ field: { $exists: true } } // IS_DEFINED(c.field) -{ field: { $exists: false } } // NOT IS_DEFINED(c.field) +{ + field: { + $exists: true; + } +} // IS_DEFINED(c.field) +{ + field: { + $exists: false; + } +} // NOT IS_DEFINED(c.field) // String -{ field: { $startsWith: 'prefix' } } -{ field: { $contains: 'substr' } } +{ + field: { + $startsWith: 'prefix'; + } +} +{ + field: { + $contains: 'substr'; + } +} // Array -{ field: { $contains: value } } // ARRAY_CONTAINS -{ field: { $in: [v1, v2, v3] } } // IN operator +{ + field: { + $contains: value; + } +} // ARRAY_CONTAINS +{ + field: { + $in: [v1, v2, v3]; + } +} // IN operator // Logical (for 
complex queries) -{ $or: [{ field1: v1 }, { field2: v2 }] } +{ + $or: [{ field1: v1 }, { field2: v2 }]; +} ``` **Cosmos adapter** translates each operator to SQL: + - `{ $gte: v }` → `c.field >= @pN` - `{ $exists: false }` → `NOT IS_DEFINED(c.field)` - `{ $contains: v }` on array → `ARRAY_CONTAINS(c.field, @pN)` @@ -1155,19 +1214,19 @@ packages/llm/ ## Summary -| Sprint | What | Days | After This Sprint... | -|--------|------|------|---------------------| -| 1 | Database abstraction | 5–7 | DB swap = implement 1 adapter (~200 LOC) + config change | -| 2 | Storage abstraction | 2 | Blob swap = implement 1 adapter (~100 LOC) + config change | -| 3 | LLM abstraction | 2 | LLM swap = config change only (10 minutes) | -| 4 | Secrets abstraction | 1 | Secrets swap = config change only | -| 5 | Speech abstraction | 3–4 | Speech swap = implement 1 adapter (~300 LOC) | -| 6 | Push abstraction | 1 | Push swap = implement 1 adapter (~50 LOC) | -| 7 | Monitoring cleanup | 0.5 | Already cloud-agnostic | -| **Total** | | **~15–17 days** | **Full cloud migration = ~7–10 days instead of 4–8 weeks** | +| Sprint | What | Days | After This Sprint... | +| --------- | -------------------- | --------------- | ---------------------------------------------------------- | +| 1 | Database abstraction | 5–7 | DB swap = implement 1 adapter (~200 LOC) + config change | +| 2 | Storage abstraction | 2 | Blob swap = implement 1 adapter (~100 LOC) + config change | +| 3 | LLM abstraction | 2 | LLM swap = config change only (10 minutes) | +| 4 | Secrets abstraction | 1 | Secrets swap = config change only | +| 5 | Speech abstraction | 3–4 | Speech swap = implement 1 adapter (~300 LOC) | +| 6 | Push abstraction | 1 | Push swap = implement 1 adapter (~50 LOC) | +| 7 | Monitoring cleanup | 0.5 | Already cloud-agnostic | +| **Total** | | **~15–17 days** | **Full cloud migration = ~7–10 days instead of 4–8 weeks** | The key insight: **~80% of migration effort is in Sprint 1 (database)**. 
If you only do one sprint, do that one. Everything else is comparatively easy. --- -*Document generated by automated codebase analysis. Companion to `CLOUD_PROVIDER_MIGRATION_ANALYSIS.md`. Review as the codebase evolves.* +_Document generated by automated codebase analysis. Companion to `CLOUD_PROVIDER_MIGRATION_ANALYSIS.md`. Review as the codebase evolves._ diff --git a/docs/CLOUD/REFERRALS_PARTITION_KEY_MIGRATION.md b/docs/roadmaps/not-started/cloud_REFERRALS_PARTITION_KEY_MIGRATION.md similarity index 100% rename from docs/CLOUD/REFERRALS_PARTITION_KEY_MIGRATION.md rename to docs/roadmaps/not-started/cloud_REFERRALS_PARTITION_KEY_MIGRATION.md diff --git a/docs/BEST_PRACTICES/PLATFORM_ACCELERATION_ROADMAP.md b/docs/roadmaps/not-started/platform_ACCELERATION_ROADMAP.md similarity index 100% rename from docs/BEST_PRACTICES/PLATFORM_ACCELERATION_ROADMAP.md rename to docs/roadmaps/not-started/platform_ACCELERATION_ROADMAP.md diff --git a/docs/MARKETPLACE_MODULE_DESIGN.md b/docs/roadmaps/not-started/product_MARKETPLACE_MODULE_DESIGN.md similarity index 100% rename from docs/MARKETPLACE_MODULE_DESIGN.md rename to docs/roadmaps/not-started/product_MARKETPLACE_MODULE_DESIGN.md diff --git a/docs/PRE_LAUNCH_SIGNUP_SYSTEM.md b/docs/roadmaps/not-started/product_PRE_LAUNCH_SIGNUP_SYSTEM.md similarity index 100% rename from docs/PRE_LAUNCH_SIGNUP_SYSTEM.md rename to docs/roadmaps/not-started/product_PRE_LAUNCH_SIGNUP_SYSTEM.md diff --git a/docs/EXTRACTION_SERVICE_ROADMAP.md b/docs/roadmaps/partial/extraction_SERVICE_ROADMAP.md similarity index 100% rename from docs/EXTRACTION_SERVICE_ROADMAP.md rename to docs/roadmaps/partial/extraction_SERVICE_ROADMAP.md diff --git a/docs/BEST_PRACTICES/MOBILE_APPS/REACT_NATIVE_PLATFORM_SDK.md b/docs/roadmaps/partial/mobile_REACT_NATIVE_PLATFORM_SDK.md similarity index 100% rename from docs/BEST_PRACTICES/MOBILE_APPS/REACT_NATIVE_PLATFORM_SDK.md rename to docs/roadmaps/partial/mobile_REACT_NATIVE_PLATFORM_SDK.md diff --git 
a/docs/WINDSURF/PLATFORM_COMPONENTS_ROADMAP.md b/docs/roadmaps/partial/platform_COMPONENTS_ROADMAP.md similarity index 100% rename from docs/WINDSURF/PLATFORM_COMPONENTS_ROADMAP.md rename to docs/roadmaps/partial/platform_COMPONENTS_ROADMAP.md diff --git a/docs/WINDSURF/TELEMETRY_ROADMAP.md b/docs/roadmaps/partial/telemetry_IMPLEMENTATION_ROADMAP.md similarity index 100% rename from docs/WINDSURF/TELEMETRY_ROADMAP.md rename to docs/roadmaps/partial/telemetry_IMPLEMENTATION_ROADMAP.md diff --git a/docs/WINDSURF/CODEX_SESSION_SUMMARY_AND_PLAYBOOK.md b/docs/session-logs/CODEX_SESSION_SUMMARY_AND_PLAYBOOK.md similarity index 100% rename from docs/WINDSURF/CODEX_SESSION_SUMMARY_AND_PLAYBOOK.md rename to docs/session-logs/CODEX_SESSION_SUMMARY_AND_PLAYBOOK.md diff --git a/docs/codex/platform_and_products_refactor_2026-02-14.md b/docs/session-logs/platform_and_products_refactor_2026-02-14.md similarity index 99% rename from docs/codex/platform_and_products_refactor_2026-02-14.md rename to docs/session-logs/platform_and_products_refactor_2026-02-14.md index eee9a6c5..255721ca 100644 --- a/docs/codex/platform_and_products_refactor_2026-02-14.md +++ b/docs/session-logs/platform_and_products_refactor_2026-02-14.md @@ -10,7 +10,7 @@ You currently have 3 repos checked out side-by-side: The goal is to **systematically refactor into a “platform repo”** (common libraries + common services) while keeping **product-specific code in product repos**, with a workflow that feels like how high-performing AI companies build: small PRs, strong automation, stable internal interfaces, and “golden paths” for shipping. -Important constraint: we cannot know exactly how OpenAI/Anthropic run their internal engineering, but we *can* adopt the common patterns used by top-tier product+platform orgs: platform teams, strong CI gates, typed service contracts, SDK generation, trunk-based integration, feature flags, and opinionated templates. 
+Important constraint: we cannot know exactly how OpenAI/Anthropic run their internal engineering, but we _can_ adopt the common patterns used by top-tier product+platform orgs: platform teams, strong CI gates, typed service contracts, SDK generation, trunk-based integration, feature flags, and opinionated templates.
 
 ---
 
@@ -283,4 +283,3 @@ Pick one of these patterns:
 3. Decide whether LysnrAI FastAPI backend is:
    - product-only (dictation/transcripts), or
    - a transitional legacy backend to be decomposed into platform services.
-