docs: reorganize docs/ into category folders with roadmaps/{completed,partial,not-started}

saravanakumardb1 2026-03-01 23:17:15 -08:00
parent 7742ebd58f
commit dd4410548e
31 changed files with 1206 additions and 1105 deletions


@@ -1,726 +0,0 @@
# Cloud Provider Migration Analysis — ByteLyst Ecosystem
> **Author:** AI Analysis (Cascade)
> **Date:** 2026-03-01
> **Scope:** All 7 repos — LysnrAI, MindLyst, ChronoMind, NomGap, PeakPulse, Common Platform, JarvisJr
> **Purpose:** Evaluate current Azure investment, assess migration feasibility to AWS / GCP / MongoDB Atlas / multi-cloud, and provide actionable recommendations.
---
## Table of Contents
1. [Executive Summary](#1-executive-summary)
2. [Current Azure Investment Inventory](#2-current-azure-investment-inventory)
3. [Dependency Depth Analysis](#3-dependency-depth-analysis)
4. [Migration Target Comparison](#4-migration-target-comparison)
5. [Per-Service Migration Analysis](#5-per-service-migration-analysis)
6. [Migration Scenario Scoring](#6-migration-scenario-scoring)
7. [Cost Comparison](#7-cost-comparison)
8. [Abstraction Layer Assessment](#8-abstraction-layer-assessment)
9. [Risk Analysis](#9-risk-analysis)
10. [Recommendations](#10-recommendations)
11. [Migration Playbook (If Chosen)](#11-migration-playbook-if-chosen)
12. [Appendix A: File-Level Azure Dependency Map](#appendix-a-file-level-azure-dependency-map)
13. [Appendix B: SDK & Package Inventory](#appendix-b-sdk--package-inventory)
---
## 1. Executive Summary
The ByteLyst ecosystem is **moderately coupled** to Azure. The coupling is concentrated in **3 packages** (`@bytelyst/cosmos`, `@bytelyst/blob`, `@bytelyst/config`) and **2 Python modules** (`azure_stt.py`, `cosmos_client.py`). The architecture already uses an internal abstraction layer — most application code never imports Azure SDKs directly.
### Key Findings
| Dimension | Assessment |
|-----------|-----------|
| **Overall Azure lock-in** | **Medium**: concentrated in ~15 foundational files, plus Cosmos SQL query strings spread across 56+ repository files |
| **Easiest to migrate** | Blob Storage, Key Vault, OpenAI, Application Insights |
| **Hardest to migrate** | Cosmos DB (SQL API queries in 56+ repository files), Azure Speech SDK |
| **Best alternative DB** | MongoDB Atlas (closest query model to Cosmos SQL API) |
| **Best alternative cloud** | AWS (broadest service parity, mature SDK ecosystem) |
| **Estimated migration effort** | 4-8 weeks for full cloud swap (Cosmos DB is the long pole) |
| **Recommendation** | **Stay on Azure** for now, but invest in abstraction layers to reduce future switching cost |
### Azure Services Used (8 total)
| # | Azure Service | Monthly Cost | Lock-in Risk | Files Affected |
|---|--------------|-------------|-------------|----------------|
| 1 | **Cosmos DB** (SQL/NoSQL API) | ~$4-10 | **HIGH** | 56+ repository files, 3 databases, ~45 containers |
| 2 | **Blob Storage** | ~$0.20 | LOW | 2 packages + 1 Python module |
| 3 | **Azure OpenAI** | ~$5-10 | LOW | 3 files (already supports OpenAI fallback) |
| 4 | **Speech Services** | $0 (F0) | **HIGH** | 2 files (deep SDK integration, streaming) |
| 5 | **Key Vault** | ~$0.06 | LOW | 2 files (1 TS, 1 Python) |
| 6 | **Notification Hubs** | $0 (Free) | MEDIUM | Planned, not yet deeply integrated |
| 7 | **Application Insights** | $0 (5GB free) | LOW | 1 file (custom telemetry already built) |
| 8 | **Azure Identity** (DefaultAzureCredential) | $0 | LOW | Used by Key Vault + Secrets Manager |
---
## 2. Current Azure Investment Inventory
### 2.1 Azure Resources (from Azure Portal)
| Resource | Azure Name | Region | SKU | Status |
|----------|-----------|--------|-----|--------|
| Resource Group | `rg-mywisprai` | East US | — | Active |
| Cosmos DB | `cosmos-mywisprai` | West US 2 | Serverless | Active — 3 DBs, ~45 containers |
| Blob Storage | `bytelystblobs` | West US 2 | StorageV2, RAGRS | Active — 9+ containers |
| Azure OpenAI | `mywisprai-openai-sweden` | Sweden Central | S0 | Active — gpt-4o-mini deployment |
| Speech Service | `mywisprai-speech` | East US | F0 (Free) | Active |
| Key Vault | `kv-mywisprai` | East US | Standard | Active — ~25 secrets |
| Notification Hubs | `lysnnai` namespace | East US | Free | Active — 2 hubs |
| App Insights | `bytelyst-appinsights` | East US | Classic | Active |
### 2.2 Cosmos DB Databases & Containers
| Database | Containers | Products Using |
|----------|-----------|----------------|
| `lysnrai` | ~27 containers (users, subscriptions, feature_flags, audit_log, tracker_items, telemetry_events, etc.) | LysnrAI, platform-service (all products) |
| `mindlyst` | ~20 containers (brains, memory_items, streaks, reflections, etc.) | MindLyst |
| `mywisprai` | 10 containers (legacy, pre-rebrand) | Legacy / migration target |
**Total: ~57 containers across 3 databases**, all using Cosmos SQL (NoSQL) API with SQL-like queries (`SELECT`, `WHERE`, `ORDER BY`, `OFFSET/LIMIT`, aggregate functions).
### 2.3 Code Investment by Language
| Language | Azure SDK Packages | Files Using Azure | Lines of Azure-Specific Code |
|----------|-------------------|-------------------|------------------------------|
| **TypeScript** | `@azure/cosmos`, `@azure/storage-blob`, `@azure/identity`, `@azure/keyvault-secrets` | ~65 files | ~500 lines |
| **Python** | `azure-cognitiveservices-speech`, `azure-cosmos`, `azure-storage-blob`, `azure-identity`, `azure-keyvault-secrets`, `openai` (AzureOpenAI) | ~8 files | ~400 lines |
| **Swift** | `MicrosoftCognitiveServicesSpeech` (SPX framework) | ~3 files | ~150 lines |
| **Kotlin** | None directly (uses platform-service REST API) | 0 files | 0 lines |
---
## 3. Dependency Depth Analysis
### 3.1 Cosmos DB — DEEP (56+ files)
This is the **most deeply embedded** Azure dependency. Every repository module follows the pattern:
```
types.ts → repository.ts → routes.ts
               │
               ├── uses @azure/cosmos SDK
               └── SQL queries: SELECT c.id, c.name FROM c WHERE c.productId = @pid
```
**Touchpoints:**
- `packages/cosmos/` — shared client singleton (`@azure/cosmos` peer dep)
- `services/platform-service/src/modules/*/repository.ts`: **56 repository files** with Cosmos SQL queries
- `services/extraction-service/src/modules/*/repository.ts` — 2 repository files
- `dashboards/admin-web/src/lib/cosmos.ts` — direct `@azure/cosmos` import
- `dashboards/admin-web/src/lib/repositories/*.ts` — 4 repository files
- `mindlyst-native/web/src/lib/cosmos.ts` — direct `@azure/cosmos` import
- `learning_voice_ai_agent/src/cloud/cosmos_client.py` — Python Cosmos client
- `learning_voice_ai_agent/backend/src/cloud/cosmos.py` — Python backend Cosmos client
**Query patterns used:**
- `container.items.query()` with parameterized SQL
- `container.items.create()`, plus `container.item(id, pk).replace()`, `.delete()`, and `.read()`
- `container.items.upsert()`
- Partition key routing (`/userId`, `/productId`, `/id`)
- Cross-partition queries (admin/analytics)
- `SELECT VALUE COUNT(1)` aggregates
- `OFFSET ... LIMIT` pagination
- `ORDER BY` sorting
- `ARRAY_CONTAINS()` for array queries
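Most of the patterns above reduce to one parameterized SQL string plus a parameter list, which is the `{ query, parameters }` shape that `container.items.query()` accepts. The sketch below shows a builder for the equality-filter, `ORDER BY`, `OFFSET ... LIMIT`, and `SELECT VALUE COUNT(1)` cases. It makes a few assumptions. The `buildCosmosQuery` name and the option fields are hypothetical. The types are declared locally rather than imported as `SqlQuerySpec` from `@azure/cosmos`, so the block runs without the SDK.

```typescript
// Local mirror of the { query, parameters } shape that container.items.query() accepts.
interface SqlParameter {
  name: string;
  value: string | number | boolean;
}
interface SqlQuerySpec {
  query: string;
  parameters: SqlParameter[];
}

// Hypothetical options; the field names are illustrative.
interface CosmosQueryOptions {
  orderBy?: { field: string; direction: "ASC" | "DESC" };
  offset?: number;
  limit?: number;
  countOnly?: boolean;
}

// Property names are interpolated into the SQL text, so allow only plain identifiers.
function assertIdentifier(field: string): void {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(field)) throw new Error(`Unsafe field name: ${field}`);
}

function buildCosmosQuery(
  filter: Record<string, string | number | boolean>,
  opts: CosmosQueryOptions = {},
): SqlQuerySpec {
  const parameters: SqlParameter[] = [];
  // Values always travel as @p0, @p1, ... parameters, never as SQL text.
  const clauses = Object.entries(filter).map(([field, value], i) => {
    assertIdentifier(field);
    parameters.push({ name: `@p${i}`, value });
    return `c.${field} = @p${i}`;
  });

  let query = opts.countOnly ? "SELECT VALUE COUNT(1) FROM c" : "SELECT * FROM c";
  if (clauses.length > 0) query += ` WHERE ${clauses.join(" AND ")}`;
  if (opts.countOnly) return { query, parameters };

  if (opts.orderBy) {
    assertIdentifier(opts.orderBy.field);
    query += ` ORDER BY c.${opts.orderBy.field} ${opts.orderBy.direction}`;
  }
  if (opts.limit !== undefined) {
    const offset = opts.offset ?? 0;
    if (!Number.isInteger(offset) || !Number.isInteger(opts.limit)) {
      throw new Error("offset/limit must be integers");
    }
    // Cosmos SQL requires OFFSET and LIMIT to appear together.
    query += ` OFFSET ${offset} LIMIT ${opts.limit}`;
  }
  return { query, parameters };
}
```

Here is an example call: `buildCosmosQuery({ productId: "p1", userId: "u1" }, { orderBy: { field: "createdAt", direction: "DESC" }, offset: 0, limit: 20 })`. It produces `SELECT * FROM c WHERE c.productId = @p0 AND c.userId = @p1 ORDER BY c.createdAt DESC OFFSET 0 LIMIT 20`. The repository files write this query shape out by hand.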
### 3.2 Azure Speech SDK — DEEP (3 files, streaming integration)
The Speech SDK is used for **real-time streaming speech-to-text** with features that are tightly coupled to the Azure SDK's event-driven architecture:
- `src/audio/azure_stt.py` — 248 lines. Uses `PushAudioInputStream`, `SpeechRecognizer`, continuous recognition with `recognizing`/`recognized`/`canceled`/`session_stopped` event callbacks, `PhraseListGrammar`, auto-language detection (10 languages), auto-reconnect
- `src/ui/settings.py` + `src/ui/unified_window.py` — connection testing
- `mindlyst-native/iosApp/Services/AzureSpeechTranscriber.swift` — iOS Swift SPX framework
- `mobile_app/ios/LysnrAI/` — iOS keyboard extension uses SPX framework
### 3.3 Blob Storage — SHALLOW (3 files)
- `packages/blob/src/blob.ts` — 162 lines, singleton client, SAS URL generation
- `src/cloud/blob_client.py` — 190 lines, Python equivalent
- `services/platform-service/src/modules/blob/` — REST API wrapper
### 3.4 Azure OpenAI — SHALLOW (3 files, already abstracted)
- `src/llm/text_cleaner.py` — uses `openai.AzureOpenAI` (OpenAI SDK with Azure endpoint)
- `backend/src/clients/openai_client.py` — uses `openai.AsyncAzureOpenAI`
- `mindlyst-native/web/src/lib/llm.ts`: **already has OpenAI fallback** (resolves provider dynamically)
The `openai` Python/JS SDK supports both Azure and OpenAI endpoints with minimal config change. MindLyst web already handles this automatically.
### 3.5 Key Vault — SHALLOW (2 files)
- `packages/config/src/keyvault.ts` — 90 lines, `resolveKeyVaultSecrets()` with graceful fallback
- `src/secrets/keyvault.py` — 69 lines, `SecretResolver` class with env var fallback
Both implementations already fall back to environment variables when Key Vault is unavailable, so migration amounts to no longer configuring Key Vault and relying on the env var path.
### 3.6 Notification Hubs — NOT YET INTEGRATED
Planned but not deeply wired. Only namespace/hub exists in Azure. Mobile apps use `BLPlatformClient` (REST) to talk to platform-service, which would route push notifications.
### 3.7 Application Insights — SHALLOW (1 file)
- `opencensus-ext-azure` in Python requirements (optional telemetry)
- Custom telemetry system already built (`@bytelyst/telemetry-client`, platform-service telemetry module with Cosmos storage)
The custom telemetry system means App Insights is supplementary, not critical.
---
## 4. Migration Target Comparison
### 4.1 Database: Cosmos DB → Alternatives
| Feature | Azure Cosmos DB (current) | MongoDB Atlas | AWS DynamoDB | Google Firestore | PostgreSQL (Supabase/Neon) |
|---------|--------------------------|---------------|-------------|-----------------|---------------------------|
| **Data model** | Document (JSON) | Document (JSON) | Key-Value + Document | Document (JSON) | Relational + JSONB |
| **Query language** | SQL-like | MQL (MongoDB Query) | PartiQL / API | GQL-like API | SQL |
| **Partition keys** | Required | Shard keys (optional) | Required | Collection groups | Not applicable |
| **Serverless** | Yes | Yes (Atlas Serverless) | Yes | Yes | Yes (Neon) |
| **SQL queries** | `SELECT c.id FROM c WHERE c.x = @y` | `db.collection.find({x: y})` | `SELECT id FROM table WHERE x = ?` | Client SDK queries | Standard SQL |
| **Aggregates** | Basic (`COUNT`, `SUM`, `AVG`) | Full (`$group`, `$match`, `$lookup`) | Limited | Limited | Full SQL |
| **Cross-partition** | Yes (expensive) | Yes (scatter-gather) | Scan (expensive) | Yes | N/A |
| **Change feed** | Yes | Change Streams | DynamoDB Streams | Real-time listeners | Logical replication |
| **Global distribution** | Built-in multi-region | Atlas Global Clusters | Global Tables | Multi-region | Manual / Citus |
| **Max doc size** | 2 MB | 16 MB | 400 KB | 1 MB | ~256 MB per JSONB value |
| **Free tier** | 1000 RU/s + 25 GB (provisioned throughput only; not available on serverless accounts) | 512 MB | 25 GB + 25 WCU/RCU | 1 GiB + 50K reads/day | 0.5 GB (Neon) |
| **Migration effort** | — | **Medium** (query rewrite) | **Hard** (paradigm shift) | **Hard** (no SQL) | **Hard** (schema design) |
### 4.2 Object Storage: Blob → Alternatives
| Feature | Azure Blob (current) | AWS S3 | GCP Cloud Storage | Cloudflare R2 | MinIO (self-hosted) |
|---------|---------------------|--------|-------------------|---------------|---------------------|
| **API compatibility** | Azure Blob API | S3 API | GCS API / S3-compat | S3-compatible | S3-compatible |
| **SAS tokens** | Yes (Azure SAS) | Pre-signed URLs | Signed URLs | Pre-signed URLs | Pre-signed URLs |
| **CDN integration** | Azure CDN | CloudFront | Cloud CDN | Built-in | Manual |
| **Cost (per GB)** | $0.018 (Cool) | $0.023 (Standard) | $0.020 | $0.015 (no egress) | Self-hosted |
| **Migration effort** | — | **Easy** | **Easy** | **Easy** | **Easy** |
### 4.3 Speech-to-Text: Azure Speech → Alternatives
| Feature | Azure Speech (current) | AWS Transcribe | Google Speech-to-Text | Deepgram | Whisper (local) |
|---------|----------------------|----------------|----------------------|----------|-----------------|
| **Streaming STT** | Yes (push stream) | Yes (WebSocket) | Yes (streaming) | Yes (WebSocket) | No (batch only) |
| **Languages** | 100+ | 100+ | 125+ | 36+ | 99+ |
| **Auto-detect lang** | Up to 10 at-once | Yes | Yes | Yes | Yes |
| **Custom vocabulary** | PhraseListGrammar | Custom vocabulary | Speech adaptation | Keywords | No |
| **Native SDK** | Python, Swift (SPX), JS | Python, no iOS SDK | Python, iOS, JS | REST/WebSocket | Python only |
| **iOS native SDK** | SPX framework (ObjC) | No native SDK | Yes (gRPC) | No native SDK | No |
| **Free tier** | 5 hrs/month (F0) | 60 min/month | 60 min/month | None | Free (local GPU) |
| **Latency** | ~200ms | ~300ms | ~200ms | ~100ms | ~500ms+ (local) |
| **Migration effort** | — | **Hard** (no iOS SDK) | **Medium** (has iOS SDK) | **Medium** (REST only) | **Hard** (no streaming) |
### 4.4 LLM / AI: Azure OpenAI → Alternatives
| Feature | Azure OpenAI (current) | OpenAI API (direct) | Google Gemini | AWS Bedrock | Anthropic Claude |
|---------|----------------------|--------------------|--------------|-----------| -----------------|
| **Models** | GPT-4o, GPT-4o-mini | Same models | Gemini 2.5 | Claude, Llama, Titan | Claude 3.5/4 |
| **API compatibility** | OpenAI SDK (azure mode) | OpenAI SDK (native) | Google SDK | AWS SDK | Anthropic SDK |
| **Data residency** | Azure regions | US only | Google regions | AWS regions | US/EU |
| **Cost (GPT-4o-mini)** | $0.15/$0.60 per M tokens | $0.15/$0.60 per M tokens | ~$0.10/$0.40 (Flash) | Varies | ~$0.25/$1.25 (Haiku) |
| **Migration effort** | — | **Trivial** (change endpoint) | **Easy** (SDK swap) | **Medium** | **Easy** (SDK swap) |
### 4.5 Secrets Management: Key Vault → Alternatives
| Feature | Azure Key Vault (current) | AWS Secrets Manager | GCP Secret Manager | HashiCorp Vault | Doppler / Infisical |
|---------|--------------------------|--------------------|--------------------|-----------------|---------------------|
| **Cost** | $0.03/10K ops | $0.40/secret/month | $0.06/secret version/month + $0.03/10K ops | Free (OSS) | Free tier |
| **SDK** | `@azure/keyvault-secrets` | `@aws-sdk/client-secrets-manager` | `@google-cloud/secret-manager` | HTTP API | SDK / CLI |
| **Migration effort** | — | **Easy** | **Easy** | **Medium** | **Easy** |
**Note:** The codebase already falls back to env vars when Key Vault is unavailable. This means Key Vault can be replaced by **any** secrets manager or simply .env files without code changes to application logic.
### 4.6 Push Notifications: Notification Hubs → Alternatives
| Feature | Azure NH (current) | AWS SNS | Firebase Cloud Messaging | OneSignal | Expo Push |
|---------|-------------------|---------|--------------------------|-----------|-----------|
| **APNs + FCM** | Yes | Yes | FCM only (APNs via FCM) | Yes | Yes |
| **Free tier** | 1M pushes/month | 1M publishes | Unlimited | 10K subscribers | Unlimited |
| **Migration effort** | — | **Easy** | **Easy** | **Easy** | **Easy** (NomGap uses Expo) |
---
## 5. Per-Service Migration Analysis
### 5.1 Cosmos DB → MongoDB Atlas
**Difficulty: MEDIUM-HIGH** | **Effort: 3-5 weeks** | **Risk: MEDIUM**
This is the **single largest migration task**. Here's why:
#### What needs to change
| Layer | Current (Cosmos SQL API) | Target (MongoDB) | Files |
|-------|--------------------------|-------------------|-------|
| Client package | `CosmosClient` (`@azure/cosmos`) | `MongoClient` (`mongodb`) | `packages/cosmos/src/client.ts` |
| Container registry | `getContainer(name)` | `db.collection(name)` | `packages/cosmos/src/containers.ts` |
| All repository files | `container.items.query('SELECT...')` | `collection.find({...})` | **56+ files** in platform-service |
| Dashboard Cosmos clients | `@azure/cosmos` direct | `mongodb` direct | 2 files (admin, MindLyst) |
| Python clients | `azure.cosmos.CosmosClient` | `pymongo.MongoClient` | 2 files |
| Query syntax | SQL-like (`SELECT c.id FROM c WHERE c.productId = @pid AND c.userId = @uid ORDER BY c.createdAt DESC OFFSET 0 LIMIT 20`) | MQL (`collection.find({productId: pid, userId: uid}).sort({createdAt: -1}).skip(0).limit(20)`) | All repository files |
| Partition keys | Explicit partition key in every query | Shard key (auto-routed) | All repository files |
| Upsert | `container.items.upsert(doc)` | `collection.replaceOne({_id: id}, doc, {upsert: true})` (full replace, like Cosmos; `updateOne` with `$set` would merge fields instead) | ~20 files |
| Read by ID | `container.item(id, partitionKey).read()` | `collection.findOne({_id: id})` | All repository files |
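For the simple cases, translating the rows above is mechanical. The sketch below turns an equality filter and paging options into the `filter`, `sort`, `skip`, and `limit` values that a MongoDB `find()` chain takes. It makes a few assumptions. The `toMongoFind` helper and option names are hypothetical. The migration is assumed to store each Cosmos `id` as MongoDB `_id`, which only works if `id` values are unique across partitions. The helper returns plain objects, so it runs without the `mongodb` driver.

```typescript
type Scalar = string | number | boolean;

// Hypothetical paging/sorting options, mirroring the Cosmos ORDER BY / OFFSET / LIMIT clauses.
interface FindOptions {
  orderBy?: { field: string; direction: "ASC" | "DESC" };
  offset?: number;
  limit?: number;
}

// Plain-object form of collection.find(filter).sort(sort).skip(skip).limit(limit).
interface MongoFindSpec {
  filter: Record<string, Scalar>;
  sort: Record<string, 1 | -1>;
  skip: number;
  limit?: number;
}

// Cosmos documents key on `id`; this sketch assumes the migration stores it as `_id`.
function mapField(field: string): string {
  return field === "id" ? "_id" : field;
}

function toMongoFind(filter: Record<string, Scalar>, opts: FindOptions = {}): MongoFindSpec {
  const mapped: Record<string, Scalar> = {};
  for (const [field, value] of Object.entries(filter)) {
    // `c.field = @value` becomes { field: value } in MQL.
    mapped[mapField(field)] = value;
  }
  const sort: Record<string, 1 | -1> = {};
  if (opts.orderBy) sort[mapField(opts.orderBy.field)] = opts.orderBy.direction === "DESC" ? -1 : 1;

  const spec: MongoFindSpec = { filter: mapped, sort, skip: opts.offset ?? 0 };
  if (opts.limit !== undefined) spec.limit = opts.limit;
  return spec;
}
```

Applied to the `productId`/`userId` query in the table, the helper yields `{ filter: { productId, userId }, sort: { createdAt: -1 }, skip: 0, limit: 20 }`. Each field maps directly onto the chained `find().sort().skip().limit()` call.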
#### What stays the same
- Document structure (JSON documents with `id`, `productId`, partition keys)
- Data model (no schema changes needed — MongoDB is also schemaless)
- Partition key concept maps to shard key
- Serverless pricing model available on both
#### Key migration steps
1. Update `@bytelyst/cosmos` package to export MongoDB-compatible API
2. Rewrite all SQL queries to MQL (56+ files)
3. Replace `container.items.query()` with `collection.find()`
4. Replace `container.item(id, pk).read()` with `collection.findOne({_id: id})`
5. Replace `container.items.create()` with `collection.insertOne()`
6. Replace `container.item(id, pk).replace()` with `collection.replaceOne()`
7. Replace `container.items.upsert()` with `collection.replaceOne({_id: id}, doc, {upsert: true})` (Cosmos upsert replaces the whole document)
8. Update Python clients similarly
9. Migrate data (use Azure Data Factory or custom script)
10. Update all test mocks
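Step 9 needs the same per-document transformation regardless of which tool moves the data. The sketch below makes several assumptions:

- Cosmos `id` values are unique across partitions. Cosmos only guarantees uniqueness within a logical partition, so otherwise a composite key would be needed.
- `id` becomes MongoDB `_id`.
- The Cosmos system properties (`_rid`, `_self`, `_etag`, `_attachments`, `_ts`) are not carried over.

`toMongoDocument` and `toReplaceUpsertArgs` are hypothetical helpers. Cosmos `upsert()` replaces the entire stored item, so the sketch targets `replaceOne(..., { upsert: true })`. Using `updateOne` with `$set` instead would keep fields that the new version dropped.

```typescript
// System properties that Cosmos DB adds to every stored item.
const COSMOS_SYSTEM_PROPS = new Set(["_rid", "_self", "_etag", "_attachments", "_ts"]);

type CosmosItem = { id: string } & Record<string, unknown>;
type MongoDocument = { _id: string } & Record<string, unknown>;

// Converts one exported Cosmos item into a MongoDB document: `id` becomes `_id`,
// system properties are dropped, and every other field is copied as-is.
function toMongoDocument(item: CosmosItem): MongoDocument {
  const fields: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(item)) {
    if (key === "id" || COSMOS_SYSTEM_PROPS.has(key)) continue;
    fields[key] = value;
  }
  return { ...fields, _id: item.id };
}

// Arguments for collection.replaceOne(filter, replacement, options).
// Cosmos upsert replaces the whole item, so replaceOne (not updateOne + $set) matches it.
function toReplaceUpsertArgs(
  item: CosmosItem,
): [{ _id: string }, MongoDocument, { upsert: true }] {
  const doc = toMongoDocument(item);
  return [{ _id: doc._id }, doc, { upsert: true }];
}
```

A migration script would read each container page by page, map every item through `toMongoDocument`, and write the results in batches. The per-item upsert form is useful for re-running the migration idempotently.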
#### Why MongoDB Atlas is the best DB alternative
- **Closest query model** to Cosmos SQL API (both are document DBs)
- **Azure also offers Cosmos DB for MongoDB** (a MongoDB-compatible API), but going native is better
- Both store schemaless JSON documents, so the data model carries over unchanged
- MongoDB's `find()` queries map closely to Cosmos SQL `SELECT` queries
- Both support partition/shard keys, TTL-based expiry, and change notifications (Cosmos change feed, MongoDB change streams)
- MongoDB Atlas Serverless pricing is competitive
- MongoDB has excellent TypeScript and Python SDKs
### 5.2 Azure Speech → Google Cloud Speech-to-Text
**Difficulty: HIGH** | **Effort: 2-3 weeks** | **Risk: HIGH**
#### Why this is hard
- The Azure Speech SDK uses a **push-stream architecture** (`PushAudioInputStream`) that is deeply integrated into the audio pipeline
- The `SpeechRecognizer` has event-driven callbacks (`recognizing`, `recognized`, `canceled`, `session_stopped`) that the code relies on for real-time partial/final transcript delivery
- Custom vocabulary via `PhraseListGrammar` is Azure-specific
- Auto-language detection config is Azure-specific
- The **iOS SPX framework** (Objective-C) is used in LysnrAI keyboard extension and MindLyst — there's no direct equivalent for most alternatives
#### Best alternative: Google Cloud Speech-to-Text
- Has streaming recognition with similar event model
- Has an iOS SDK (gRPC-based)
- Supports custom vocabulary (speech adaptation)
- Supports auto-language detection
- Similar pricing and free tier
#### What needs to change
- `src/audio/azure_stt.py` — complete rewrite (~248 lines)
- `iosApp/Services/AzureSpeechTranscriber.swift` — complete rewrite
- `LysnrAI/LysnrKeyboard/` — keyboard extension STT integration
- Audio format handling (may differ between providers)
- Connection test code in settings UI
### 5.3 Blob Storage → AWS S3 or Cloudflare R2
**Difficulty: LOW** | **Effort: 2-3 days** | **Risk: LOW**
#### Why this is easy
- `@bytelyst/blob` package is a thin wrapper (162 lines)
- Only 3 files need changes
- S3 API is the de facto standard — R2, MinIO, GCS all support S3-compatible API
- SAS tokens → Pre-signed URLs (same concept, different implementation)
#### What needs to change
- `packages/blob/src/blob.ts`: swap `@azure/storage-blob` for `@aws-sdk/client-s3` + `@aws-sdk/s3-request-presigner`
- `src/cloud/blob_client.py`: swap `azure.storage.blob` for `boto3`
- `services/platform-service/src/modules/blob/`: update routes for pre-signed URL format
- Environment variables: `AZURE_BLOB_*` → `AWS_S3_*` or `S3_*`
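The env var rename is easier to stage if the blob package picks its provider from whichever variable set is present. That way, Azure and S3-compatible deployments can coexist during cutover. The sketch below is a minimal version of that idea. The concrete names (`AZURE_BLOB_CONNECTION_STRING`, `AZURE_BLOB_CONTAINER`, `S3_BUCKET`, `S3_REGION`, `S3_ENDPOINT`) and the `resolveBlobConfig` helper are hypothetical stand-ins for the `AZURE_BLOB_*` and `S3_*` families above. The `us-east-1` default region is likewise an assumption.

```typescript
type Env = Record<string, string | undefined>;

interface AzureBlobConfig {
  provider: "azure";
  connectionString: string;
  container: string;
}
interface S3Config {
  provider: "s3";
  bucket: string;
  region: string;
  endpoint?: string; // set for S3-compatible services such as R2 or MinIO
}
type BlobConfig = AzureBlobConfig | S3Config;

// Hypothetical variable names; S3-compatible settings win so cutover is a config change.
function resolveBlobConfig(env: Env): BlobConfig {
  const bucket = env["S3_BUCKET"];
  if (bucket) {
    const cfg: S3Config = { provider: "s3", bucket, region: env["S3_REGION"] ?? "us-east-1" };
    const endpoint = env["S3_ENDPOINT"];
    if (endpoint) cfg.endpoint = endpoint;
    return cfg;
  }
  const connectionString = env["AZURE_BLOB_CONNECTION_STRING"];
  const container = env["AZURE_BLOB_CONTAINER"];
  if (connectionString && container) {
    return { provider: "azure", connectionString, container };
  }
  throw new Error("No blob storage configuration found (expected S3_* or AZURE_BLOB_* variables)");
}
```

A deployment can carry both variable sets during the switch. Unsetting `S3_BUCKET` then falls back to Azure without a code change.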
### 5.4 Azure OpenAI → OpenAI API (direct) or Gemini
**Difficulty: TRIVIAL** | **Effort: < 1 day** | **Risk: VERY LOW**
#### Why this is trivial
- The `openai` Python SDK supports both Azure and OpenAI endpoints — just change config
- MindLyst web `llm.ts` **already auto-detects** Azure vs OpenAI and builds the correct URL
- LysnrAI desktop uses `AzureOpenAI` class from `openai` SDK — switch to `OpenAI` class
- Same models, same API shape, same pricing
#### What needs to change
- Set `OPENAI_API_KEY` instead of `AZURE_OPENAI_*` env vars
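- Change `AzureOpenAI(azure_endpoint=..., api_key=..., api_version=...)` → `OpenAI(api_key=...)`
- Change `AsyncAzureOpenAI(...)` → `AsyncOpenAI(...)`
- Remove the `api_version` parameter
- Pass the OpenAI model name (e.g. `gpt-4o-mini`) as `model`; with Azure, `model` is the deployment name, which may differ
- That's it. The `openai` SDK handles the rest.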
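The same difference shows up at the HTTP level, which is what provider auto-detection in a file like `llm.ts` has to handle. Azure OpenAI routes requests by deployment name plus a mandatory `api-version` query parameter and authenticates with an `api-key` header. OpenAI uses a fixed `/v1` path, selects the model in the request body, and authenticates with a bearer token. The sketch below shows that difference. `buildChatRequest` and `LlmConfig` are hypothetical, and this is not a reconstruction of MindLyst's actual `llm.ts`.

```typescript
type LlmConfig =
  | { provider: "azure"; endpoint: string; deployment: string; apiVersion: string; apiKey: string }
  | { provider: "openai"; apiKey: string; baseUrl?: string };

interface ChatRequestTarget {
  url: string;
  headers: Record<string, string>;
}

function buildChatRequest(cfg: LlmConfig): ChatRequestTarget {
  if (cfg.provider === "azure") {
    // Azure: the deployment name is part of the path, and api-version is mandatory.
    const base = cfg.endpoint.replace(/\/+$/, "");
    const deployment = encodeURIComponent(cfg.deployment);
    const version = encodeURIComponent(cfg.apiVersion);
    return {
      url: `${base}/openai/deployments/${deployment}/chat/completions?api-version=${version}`,
      headers: { "api-key": cfg.apiKey, "Content-Type": "application/json" },
    };
  }
  // OpenAI: fixed path; the model is chosen in the JSON body, not the URL.
  const base = (cfg.baseUrl ?? "https://api.openai.com/v1").replace(/\/+$/, "");
  return {
    url: `${base}/chat/completions`,
    headers: { Authorization: `Bearer ${cfg.apiKey}`, "Content-Type": "application/json" },
  };
}
```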
### 5.5 Key Vault → Environment Variables / Any Secrets Manager
**Difficulty: TRIVIAL** | **Effort: < 1 day** | **Risk: VERY LOW**
Both `keyvault.ts` and `keyvault.py` already implement graceful fallback:
- If `AZURE_KEYVAULT_URL` is not set → uses env vars directly
- If Key Vault is unreachable → falls back to env vars
**To migrate:** Simply stop setting `AZURE_KEYVAULT_URL`. Everything works via env vars. Then optionally adopt any other secrets manager (AWS Secrets Manager, Doppler, Infisical, etc.).
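The fallback order described above can be sketched as follows. This is a minimal illustration, not the code in `keyvault.ts` or `keyvault.py`, and it makes several assumptions:

- The `resolveSecret` name is hypothetical.
- The vault lookup is injected and synchronous for brevity; the real `@azure/keyvault-secrets` client is async.
- A value found in the vault takes precedence over the env var.
- Secrets are stored under the env var name with underscores turned into dashes, because Key Vault secret names may contain only letters, digits, and dashes.

```typescript
type Env = Record<string, string | undefined>;

// Injected vault lookup: returns undefined if the secret is absent, throws if the vault is unreachable.
type VaultLookup = (secretName: string) => string | undefined;

// Key Vault secret names allow only letters, digits, and dashes, so OPENAI_API_KEY -> OPENAI-API-KEY.
function toVaultName(envName: string): string {
  return envName.replace(/_/g, "-");
}

function resolveSecret(envName: string, env: Env, vault?: VaultLookup): string | undefined {
  // No vault configured: use env vars only.
  if (!env["AZURE_KEYVAULT_URL"] || !vault) return env[envName];
  try {
    // Vault value wins; a secret missing from the vault falls back to the env var.
    return vault(toVaultName(envName)) ?? env[envName];
  } catch {
    // Vault unreachable: degrade to the env var path instead of failing startup.
    return env[envName];
  }
}
```

Removing `AZURE_KEYVAULT_URL` takes the first branch for every secret. This is why the migration needs no application-code changes.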
### 5.6 Notification Hubs → Firebase Cloud Messaging
**Difficulty: LOW** | **Effort: 1-2 days** | **Risk: LOW**
Not yet deeply integrated. The platform-service notification module sends via REST API. Swap the push provider client.
### 5.7 Application Insights → Self-hosted / Grafana
**Difficulty: TRIVIAL** | **Effort: Already done** | **Risk: NONE**
The ecosystem already has:
- Custom telemetry system (`@bytelyst/telemetry-client` → platform-service → Cosmos)
- Loki + Grafana in `services/monitoring/`
- App Insights is supplementary, can be dropped with zero code changes
---
## 6. Migration Scenario Scoring
### Scenario A: Stay on Azure (Status Quo)
| Dimension | Score (1-5) | Notes |
|-----------|-------------|-------|
| Migration effort | **5** (none) | No work needed |
| Cost | **4** | ~$15/month at current scale, competitive |
| Vendor diversity | **1** | Single cloud vendor |
| Feature parity | **5** | Everything works today |
| **Total** | **15/20** | |
### Scenario B: Full Migration to AWS
| Dimension | Score (1-5) | Notes |
|-----------|-------------|-------|
| Migration effort | **2** | 6-8 weeks, Cosmos→DynamoDB is painful |
| Cost | **3** | Similar or slightly higher at small scale |
| Vendor diversity | **1** | Still single cloud, just different |
| Feature parity | **3** | No native iOS Speech SDK, DynamoDB query model is very different |
| **Total** | **9/20** | |
### Scenario C: Multi-Cloud (MongoDB Atlas + OpenAI + R2 + Google STT)
| Dimension | Score (1-5) | Notes |
|-----------|-------------|-------|
| Migration effort | **2** | 5-7 weeks, Cosmos→MongoDB is medium |
| Cost | **4** | MongoDB Atlas free tier, R2 no egress fees |
| Vendor diversity | **5** | No single-vendor dependency |
| Feature parity | **4** | MongoDB's aggregation pipeline is richer than Cosmos SQL aggregates (see 4.1) |
| **Total** | **15/20** | |
### Scenario D: Stay Azure + Add Abstraction Layers
| Dimension | Score (1-5) | Notes |
|-----------|-------------|-------|
| Migration effort | **4** | 1-2 weeks to add repository interface pattern |
| Cost | **4** | No change |
| Vendor diversity | **3** | Ready to switch, but still on Azure |
| Feature parity | **5** | Everything works today |
| **Total** | **16/20** | **Winner** |
### Scenario E: Migrate DB Only (Cosmos → MongoDB Atlas, keep rest on Azure)
| Dimension | Score (1-5) | Notes |
|-----------|-------------|-------|
| Migration effort | **3** | 3-5 weeks for DB migration |
| Cost | **4** | MongoDB Atlas Serverless may be cheaper |
| Vendor diversity | **3** | DB is independent, other services still Azure |
| Feature parity | **5** | MongoDB is very capable |
| **Total** | **15/20** | |
---
## 7. Cost Comparison
### Current Azure Costs (MVP / Low Usage)
| Service | Monthly Cost | Notes |
|---------|-------------|-------|
| Cosmos DB (Serverless) | ~$4-10 | 3 databases, ~45 containers |
| Blob Storage (Cool, RAGRS) | ~$0.20 | 9+ containers |
| Azure OpenAI (GPT-4o-mini) | ~$5-10 | Pay per token |
| Speech (F0) | $0 | 5 hrs/month free |
| Key Vault | ~$0.06 | ~25 secrets |
| Notification Hubs (Free) | $0 | 1M pushes/month |
| App Insights | $0 | 5 GB/month free |
| **Total** | **~$10-20/month** | |
### Equivalent AWS Costs
| Service | AWS Equivalent | Monthly Cost |
|---------|---------------|-------------|
| Cosmos DB → DynamoDB (On-Demand) | DynamoDB | ~$5-15 |
| Blob → S3 Standard | S3 | ~$0.25 |
| Azure OpenAI → OpenAI API | Same pricing | ~$5-10 |
| Speech → Transcribe | Transcribe | ~$1-3 |
| Key Vault → Secrets Manager | Secrets Manager | ~$10 (per-secret pricing) |
| Notification Hubs → SNS | SNS | ~$0.50 |
| App Insights → CloudWatch | CloudWatch | ~$3 |
| **Total** | | **~$25-42/month** |
### Equivalent Multi-Cloud Costs
| Service | Provider | Monthly Cost |
|---------|---------|-------------|
| Cosmos DB → MongoDB Atlas Serverless | MongoDB | ~$3-8 |
| Blob → Cloudflare R2 | Cloudflare | ~$0.15 (no egress) |
| Azure OpenAI → OpenAI API (direct) | OpenAI | ~$5-10 |
| Speech → Google STT | Google Cloud | ~$1-3 |
| Key Vault → Doppler (free tier) | Doppler | $0 |
| Push → Firebase FCM | Google | $0 |
| Monitoring → Grafana Cloud (free) | Grafana | $0 |
| **Total** | | **~$10-22/month** |
### Cost Summary
| Scenario | Monthly Cost | vs Current |
|----------|-------------|-----------|
| **Azure (current)** | ~$10-20 | Baseline |
| **Full AWS** | ~$25-42 | ~+110-150% |
| **Multi-cloud** | ~$10-22 | ~Same |
| **MongoDB Atlas + Azure rest** | ~$10-18 | ~Same |
**Verdict:** At current scale, cost is not a compelling reason to migrate, because all options are under $50/month. Cost becomes more significant at scale (10K+ users). At that point, Cloudflare R2 (no egress fees) and MongoDB Atlas serverless pricing would likely be cheaper.
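The monthly totals in the cost tables are sums of the per-service low/high estimates. The sketch below reproduces those sums. Every figure is copied from the tables, and items quoted as a single value are treated as fixed costs.

```typescript
// [low, high] monthly USD per line item, copied from the cost tables.
type Range = [number, number];

function total(items: Record<string, Range>): Range {
  let low = 0;
  let high = 0;
  for (const [lo, hi] of Object.values(items)) {
    low += lo;
    high += hi;
  }
  // Round to cents to hide floating-point noise.
  return [Math.round(low * 100) / 100, Math.round(high * 100) / 100];
}

const azure = total({
  cosmos: [4, 10], blob: [0.2, 0.2], openai: [5, 10], speech: [0, 0],
  keyVault: [0.06, 0.06], notificationHubs: [0, 0], appInsights: [0, 0],
});
const aws = total({
  dynamoDb: [5, 15], s3: [0.25, 0.25], openai: [5, 10], transcribe: [1, 3],
  secretsManager: [10, 10], // ~25 secrets x $0.40/secret/month
  sns: [0.5, 0.5], cloudWatch: [3, 3],
});
const multiCloud = total({
  mongoAtlas: [3, 8], r2: [0.15, 0.15], openai: [5, 10], googleStt: [1, 3],
  doppler: [0, 0], fcm: [0, 0], grafana: [0, 0],
});

console.log({ azure, aws, multiCloud });
```

This prints roughly $9.26-20.26 for Azure, $24.75-41.75 for AWS, and $9.15-21.15 for the multi-cloud mix. Those figures are the basis for the rounded totals above. The $10 Secrets Manager line alone accounts for much of the AWS premium.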
---
## 8. Abstraction Layer Assessment
### Current State: Partially Abstracted
The codebase already has meaningful abstraction:
| Layer | Abstraction Level | Notes |
|-------|-------------------|-------|
| **Cosmos DB** | **Partial**: `@bytelyst/cosmos` package | Application code still writes raw SQL queries and uses `@azure/cosmos` types |
| **Blob Storage** | **Good**: `@bytelyst/blob` package | Thin wrapper, easy to swap internals |
| **OpenAI/LLM** | **Good**: MindLyst has provider auto-detection | LysnrAI desktop/backend hardcodes `AzureOpenAI` |
| **Key Vault** | **Excellent**: graceful fallback to env vars | Already cloud-agnostic in practice |
| **Speech** | **None**: raw SDK usage | Deep Azure SDK coupling in 3 files |
| **Auth (JWT)** | **Excellent**: uses `jose` library | No cloud dependency |
| **Push notifications** | **Good**: platform-service abstraction | Swap provider client only |
### What's Missing: Repository Interface Pattern
The biggest gap is that repository files directly use `@azure/cosmos` types and SQL query syntax. To make the DB layer swappable, you'd need:
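```typescript
// Proposed: packages/cosmos/src/repository.ts
// Options for findMany(); the field names are illustrative.
export interface QueryOptions {
  orderBy?: { field: string; direction: "asc" | "desc" };
  offset?: number;
  limit?: number;
}

export interface DocumentRepository<T> {
  findById(id: string, partitionKey: string): Promise<T | null>;
  findMany(filter: Record<string, unknown>, opts?: QueryOptions): Promise<T[]>;
  create(doc: T): Promise<T>;
  replace(id: string, doc: T, partitionKey: string): Promise<T>;
  // Insert or fully replace, matching Cosmos upsert semantics.
  upsert(doc: T): Promise<T>;
  delete(id: string, partitionKey: string): Promise<void>;
  count(filter: Record<string, unknown>): Promise<number>;
}
```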
This would allow swapping Cosmos → MongoDB → PostgreSQL behind the interface without touching 56+ repository files.
**Effort to add:** 1-2 weeks. This is the **highest-ROI investment** regardless of migration decision.
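Behind such an interface, test suites can replace Cosmos-specific mocks with one shared in-memory implementation. The sketch below shows such an implementation. It assumes equality-only filters and a caller-supplied function that extracts a document's partition key. `InMemoryDocumentRepository`, `partitionKeyOf`, and the `QueryOptions` fields are hypothetical. The interface is restated so the block runs on its own.

```typescript
interface QueryOptions {
  orderBy?: { field: string; direction: "asc" | "desc" };
  offset?: number;
  limit?: number;
}

interface DocumentRepository<T> {
  findById(id: string, partitionKey: string): Promise<T | null>;
  findMany(filter: Record<string, unknown>, opts?: QueryOptions): Promise<T[]>;
  create(doc: T): Promise<T>;
  replace(id: string, doc: T, partitionKey: string): Promise<T>;
  upsert(doc: T): Promise<T>;
  delete(id: string, partitionKey: string): Promise<void>;
  count(filter: Record<string, unknown>): Promise<number>;
}

// Reads a top-level property from a document of any shape.
function fieldOf(doc: object, name: string): unknown {
  return (doc as Record<string, unknown>)[name];
}

class InMemoryDocumentRepository<T extends { id: string }> implements DocumentRepository<T> {
  private readonly docs = new Map<string, T>();

  // partitionKeyOf plays the role of the container's partition key path (e.g. /userId).
  constructor(private readonly partitionKeyOf: (doc: T) => string) {}

  // Like Cosmos, an id only has to be unique within its partition.
  private key(id: string, partitionKey: string): string {
    return JSON.stringify([partitionKey, id]);
  }

  private matches(doc: T, filter: Record<string, unknown>): boolean {
    return Object.entries(filter).every(([name, value]) => fieldOf(doc, name) === value);
  }

  async findById(id: string, partitionKey: string): Promise<T | null> {
    return this.docs.get(this.key(id, partitionKey)) ?? null;
  }

  async findMany(filter: Record<string, unknown>, opts: QueryOptions = {}): Promise<T[]> {
    const results = Array.from(this.docs.values()).filter((d) => this.matches(d, filter));
    const orderBy = opts.orderBy;
    if (orderBy) {
      const sign = orderBy.direction === "desc" ? -1 : 1;
      results.sort((a, b) => {
        const av = fieldOf(a, orderBy.field) as string | number;
        const bv = fieldOf(b, orderBy.field) as string | number;
        return av < bv ? -sign : av > bv ? sign : 0;
      });
    }
    const start = opts.offset ?? 0;
    return results.slice(start, opts.limit === undefined ? undefined : start + opts.limit);
  }

  async create(doc: T): Promise<T> {
    const k = this.key(doc.id, this.partitionKeyOf(doc));
    if (this.docs.has(k)) throw new Error(`Conflict: ${doc.id} already exists`);
    this.docs.set(k, doc);
    return doc;
  }

  async replace(id: string, doc: T, partitionKey: string): Promise<T> {
    const k = this.key(id, partitionKey);
    if (!this.docs.has(k)) throw new Error(`Not found: ${id}`);
    this.docs.set(k, doc);
    return doc;
  }

  async upsert(doc: T): Promise<T> {
    this.docs.set(this.key(doc.id, this.partitionKeyOf(doc)), doc);
    return doc;
  }

  async delete(id: string, partitionKey: string): Promise<void> {
    this.docs.delete(this.key(id, partitionKey));
  }

  async count(filter: Record<string, unknown>): Promise<number> {
    return (await this.findMany(filter)).length;
  }
}
```

The `create()` conflict error and the (partition key, id) uniqueness mimic Cosmos behavior closely enough for unit tests. Query-level behavior, such as cross-partition cost and index requirements, still needs integration tests against a real backend.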
---
## 9. Risk Analysis
### 9.1 Risks of Staying on Azure
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|-----------|
| Azure pricing increases | Low | Medium | Add abstraction layer for future portability |
| Azure outage | Low | High | Multi-region already possible (Cosmos global distribution) |
| Feature stagnation | Very Low | Low | Azure is investing heavily in AI services |
| Vendor lock-in deepens over time | Medium | Medium | Add abstraction layers proactively |
### 9.2 Risks of Migrating
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|-----------|
| Data loss during migration | Low | Critical | Test migration on staging first, keep Azure as backup |
| Query performance differences | Medium | Medium | Benchmark before committing |
| Feature gaps in new provider | Medium | Medium | Prototype critical features first |
| Wasted engineering time | Medium | High | Only migrate if there's a clear business driver |
| Regression bugs in 56+ repository files | High | Medium | Comprehensive test suite (1,029 tests) catches most issues |
| Speech quality degradation | Medium | High | A/B test both providers before committing |
### 9.3 Azure-Specific Lock-in Risks (ranked)
| # | Component | Lock-in Level | Escape Hatch |
|---|-----------|--------------|-------------|
| 1 | **Cosmos DB SQL API** | High | Rewrite queries to MongoDB MQL or add repository interface |
| 2 | **Azure Speech SDK (streaming)** | High | Google STT has comparable streaming API |
| 3 | **Azure Identity (DefaultAzureCredential)** | Medium | Only used by Key Vault, which is already optional |
| 4 | **Blob Storage SAS tokens** | Low | Pre-signed URLs are equivalent across all providers |
| 5 | **Azure OpenAI** | Very Low | OpenAI SDK works with both; switching is a small config and client-class change |
| 6 | **Key Vault** | Very Low | Already has env var fallback |
| 7 | **Notification Hubs** | Very Low | Not deeply integrated yet |
| 8 | **Application Insights** | None | Custom telemetry already built |
---
## 10. Recommendations
### Recommended Strategy: **Stay on Azure + Invest in Abstraction** (Scenario D)
This is the highest-scoring approach. Here's the prioritized action plan:
#### Phase 1: Add Repository Interface (1-2 weeks)
- Create `DocumentRepository<T>` interface in `@bytelyst/cosmos`
- Implement `CosmosDocumentRepository<T>` that wraps current `@azure/cosmos` calls
- Gradually migrate the 56 repository files to use the interface
- This makes future DB migration a matter of implementing `MongoDocumentRepository<T>` — no application code changes needed
#### Phase 2: Normalize LLM Abstraction (2-3 days)
- Move LysnrAI desktop/backend from `AzureOpenAI` → auto-detecting provider pattern (like MindLyst web already does)
- Support `OPENAI_PROVIDER=azure|openai|gemini` across all repos
- This makes LLM provider swappable via config
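The sketch below shows one way the config-driven selection could work. It assumes that an explicit `OPENAI_PROVIDER` always wins. When that variable is unset, the presence of Azure-specific credentials implies Azure. The precedence order, the `resolveLlmProvider` name, and the `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`, and `GEMINI_API_KEY` variable names are illustrative assumptions, not the MindLyst implementation.

```typescript
type LlmProvider = "azure" | "openai" | "gemini";
type Env = Record<string, string | undefined>;

const PROVIDERS: readonly LlmProvider[] = ["azure", "openai", "gemini"];

function resolveLlmProvider(env: Env): LlmProvider {
  // An explicit OPENAI_PROVIDER setting wins (case-insensitive).
  const explicit = env["OPENAI_PROVIDER"]?.trim().toLowerCase();
  if (explicit) {
    if ((PROVIDERS as readonly string[]).includes(explicit)) return explicit as LlmProvider;
    throw new Error(`Unsupported OPENAI_PROVIDER: ${explicit}`);
  }
  // No explicit choice: infer from which credentials are present.
  if (env["AZURE_OPENAI_ENDPOINT"] && env["AZURE_OPENAI_API_KEY"]) return "azure";
  if (env["OPENAI_API_KEY"]) return "openai";
  if (env["GEMINI_API_KEY"]) return "gemini";
  throw new Error("No LLM provider configured");
}
```

Failing loudly on an unknown provider value is deliberate. A typo in config should not silently fall through to a different vendor.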
#### Phase 3: Speech Abstraction Layer (1 week, optional)
- Create `SpeechTranscriber` protocol/interface
- Implement `AzureSpeechTranscriber` (current code, extracted)
- Prepare `GoogleSpeechTranscriber` stub for future use
- This is lower priority since Azure Speech F0 tier is free
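The interface can mirror the event model the Azure code already relies on:

- partial results (`recognizing`)
- final results (`recognized`)
- errors (`canceled`)
- end of session (`session_stopped`)

The sketch below is a TypeScript version of that contract plus a scripted fake for tests; the Python and Swift versions would follow the same shape. All names (`SpeechTranscriber`, `TranscriberEvents`, `ScriptedTranscriber`) are hypothetical, and no Azure or Google SDK calls are shown.

```typescript
interface TranscriberEvents {
  onPartial(text: string): void;  // Azure: recognizing
  onFinal(text: string): void;    // Azure: recognized
  onError(message: string): void; // Azure: canceled (with error details)
  onStopped(): void;              // Azure: session_stopped
}

interface SpeechTranscriber {
  start(events: TranscriberEvents, opts?: { languages?: string[]; phrases?: string[] }): void;
  pushAudio(chunk: Uint8Array): void; // Azure: PushAudioInputStream.write
  stop(): void;
}

// Test double: emits a scripted transcript once audio arrives; no network involved.
class ScriptedTranscriber implements SpeechTranscriber {
  private events: TranscriberEvents | undefined = undefined;
  private emitted = false;

  constructor(private readonly words: string[]) {}

  start(events: TranscriberEvents): void {
    this.events = events;
    this.emitted = false;
  }

  pushAudio(chunk: Uint8Array): void {
    if (!this.events || this.emitted || chunk.length === 0) return;
    this.emitted = true;
    // Partial hypotheses grow word by word, then one final result.
    for (let i = 1; i <= this.words.length; i++) {
      this.events.onPartial(this.words.slice(0, i).join(" "));
    }
    this.events.onFinal(this.words.join(" "));
  }

  stop(): void {
    this.events?.onStopped();
    this.events = undefined;
  }
}
```

`AzureSpeechTranscriber` would wrap the existing push-stream and callback wiring behind `start`/`pushAudio`/`stop`. A future Google implementation would only need to raise the same four events.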
#### Phase 4: Document Decision Criteria for Future Migration
- Define triggers that would justify migration (e.g., cost > $X/month, Azure outage > Y hours, need for feature Z)
- Review annually
### Why NOT Migrate Now
1. **Cost is negligible**: ~$10-20/month doesn't justify weeks of engineering
2. **No business driver**: Azure isn't blocking any feature development
3. **Risk/reward is unfavorable**: 4-8 weeks of migration work for ~$0 cost savings
4. **Test coverage is good but not perfect**: 1,029 tests cover most paths, but query-level changes in 56 files still risk regressions
5. **Azure free tiers are generous**: Speech F0, Notification Hubs Free, App Insights free tier
### When Migration WOULD Make Sense
- **Cosmos DB costs exceed $100/month** → Consider MongoDB Atlas Serverless
- **Azure Speech quality is insufficient** → Evaluate Google STT or Deepgram
- **Enterprise customer requires specific cloud** → Build the repository interface, then implement their cloud backend
- **Azure has extended outage affecting your region** → Multi-region or multi-cloud
- **You want to go fully open-source** → PostgreSQL (Supabase) + Whisper + MinIO (significant rewrite)
---
## 11. Migration Playbook (If Chosen)
If you decide to migrate in the future, here's the execution order (shortest critical path):
### Week 1–2: Database Abstraction
1. Create `DocumentRepository<T>` interface
2. Implement `CosmosDocumentRepository<T>` (wraps current code)
3. Migrate all 56 repository files to use interface
4. Verify all 1,029 tests pass
### Week 3–4: Database Migration (Cosmos → MongoDB)
1. Implement `MongoDocumentRepository<T>`
2. Set up MongoDB Atlas Serverless cluster
3. Write data migration script (Cosmos → MongoDB)
4. Run migration on staging, verify data integrity
5. Switch repository implementation via config flag
6. Run full test suite against MongoDB
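Step 5 could look roughly like this: a factory that picks the repository implementation from a config flag. The flag name `DB_BACKEND` and the factory shape are hypothetical; real factories would construct the Cosmos- or Mongo-backed repositories behind the shared interface.

```typescript
// Hypothetical config switch between repository implementations.
export type DbBackend = "cosmos" | "mongo";

export function parseDbBackend(value: string | undefined): DbBackend {
  const v = (value ?? "cosmos").trim().toLowerCase(); // default keeps today's Cosmos path
  if (v === "cosmos" || v === "mongo") return v;
  throw new Error(`Unknown DB_BACKEND "${value}"`);
}

// One factory per backend; each returns an object implementing the shared repository interface.
export type RepoFactories<R> = Record<DbBackend, (collection: string) => R>;

export function createRepository<R>(
  collection: string,
  factories: RepoFactories<R>,
  env: Record<string, string | undefined>,
): R {
  return factories[parseDbBackend(env.DB_BACKEND)](collection);
}
```

Defaulting to `cosmos` when the flag is absent means the switch can be flipped per environment, staging first (step 4), without touching production config.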
### Week 5: Storage + Secrets
1. Swap `@bytelyst/blob` internals to S3-compatible client
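2. Migrate blobs (for example, download with `azcopy` and upload with `aws s3 sync`, or use a tool such as rclone that can copy between Azure Blob and S3 directly)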
3. Replace Key Vault with new secrets manager (or just env vars)
4. Update all environment variable names
### Week 6: LLM + Speech (if needed)
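1. Switch OpenAI from the Azure endpoint to direct OpenAI (a config change only once the Phase 2 provider abstraction is in place; otherwise the `AzureOpenAI` client classes must also be swapped)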
2. If migrating Speech: rewrite `azure_stt.py` and Swift `AzureSpeechTranscriber`
3. A/B test new speech provider against Azure
### Week 7–8: Cleanup + Verification
1. Remove all `@azure/*` npm packages
2. Remove all `azure-*` pip packages
3. Update Docker configs, CI/CD
4. Update documentation
5. Monitor production for 2 weeks
---
## Appendix A: File-Level Azure Dependency Map
### TypeScript — `@azure/cosmos` (CRITICAL)
| File | Repo | Direct Import |
|------|------|---------------|
| `packages/cosmos/src/client.ts` | common-plat | `@azure/cosmos` |
| `packages/cosmos/src/containers.ts` | common-plat | `@azure/cosmos` |
| `services/platform-service/src/modules/*/repository.ts` (56 files) | common-plat | Via `@bytelyst/cosmos` |
| `services/extraction-service/src/modules/*/repository.ts` (2 files) | common-plat | Via `@bytelyst/cosmos` |
| `dashboards/admin-web/src/lib/cosmos.ts` | common-plat | `@azure/cosmos` |
| `dashboards/admin-web/src/lib/repositories/*.ts` (4 files) | common-plat | Via cosmos.ts |
| `mindlyst-native/web/src/lib/cosmos.ts` | MindLyst | `@azure/cosmos` |
### TypeScript — `@azure/storage-blob`
| File | Repo | Direct Import |
|------|------|---------------|
| `packages/blob/src/blob.ts` | common-plat | `@azure/storage-blob` |
### TypeScript — `@azure/identity` + `@azure/keyvault-secrets`
| File | Repo | Direct Import |
|------|------|---------------|
| `packages/config/src/keyvault.ts` | common-plat | Dynamic import (both) |
| `dashboards/admin-web/src/app/api/ops/secrets/route.ts` | common-plat | Both (Secrets Manager UI) |
### Python — Azure SDKs
| File | Repo | SDK |
|------|------|-----|
| `src/audio/azure_stt.py` | LysnrAI | `azure.cognitiveservices.speech` |
| `src/cloud/cosmos_client.py` | LysnrAI | `azure.cosmos` |
| `src/cloud/blob_client.py` | LysnrAI | `azure.storage.blob` |
| `src/secrets/keyvault.py` | LysnrAI | `azure.identity`, `azure.keyvault.secrets` |
| `backend/src/secrets/keyvault.py` | LysnrAI | `azure.identity`, `azure.keyvault.secrets` |
| `backend/src/cloud/cosmos.py` | LysnrAI | `azure.cosmos` |
| `src/llm/text_cleaner.py` | LysnrAI | `openai.AzureOpenAI` |
| `backend/src/clients/openai_client.py` | LysnrAI | `openai.AsyncAzureOpenAI` |
### Swift — Azure Speech SDK
| File | Repo | SDK |
|------|------|-----|
| `iosApp/Services/AzureSpeechTranscriber.swift` | MindLyst | `MicrosoftCognitiveServicesSpeech` |
| `LysnrAI/LysnrKeyboard/KeyboardViewController.swift` | LysnrAI | SPX framework (via CocoaPods) |
---
## Appendix B: SDK & Package Inventory
### npm packages (TypeScript)
| Package | Version | Used By | Swappable |
|---------|---------|---------|-----------|
| `@azure/cosmos` | ≥4.0.0 | `@bytelyst/cosmos`, admin-web, MindLyst web | Medium (query rewrite) |
| `@azure/storage-blob` | ≥12.0.0 | `@bytelyst/blob` | Easy (S3 compat) |
| `@azure/identity` | latest | `@bytelyst/config`, admin-web secrets | Easy (remove) |
| `@azure/keyvault-secrets` | latest | `@bytelyst/config`, admin-web secrets | Easy (remove) |
### pip packages (Python)
| Package | Version | Used By | Swappable |
|---------|---------|---------|-----------|
| `azure-cognitiveservices-speech` | ≥1.42.0 | Desktop STT | Hard (deep SDK integration) |
| `azure-cosmos` | latest | Desktop + backend Cosmos client | Medium (pymongo swap) |
| `azure-storage-blob` | ≥12.24.0 | Desktop blob client | Easy (boto3 swap) |
| `azure-identity` | ≥1.19.0 | Key Vault auth | Easy (remove) |
| `azure-keyvault-secrets` | ≥4.9.0 | Secrets resolver | Easy (remove) |
| `openai` | ≥1.60.0 | `AzureOpenAI` / `AsyncAzureOpenAI` | Trivial (change class name) |
| `opencensus-ext-azure` | ≥1.1.0 | Optional telemetry | Trivial (remove) |
### Swift packages / CocoaPods
| Package | Used By | Swappable |
|---------|---------|-----------|
| `MicrosoftCognitiveServicesSpeech` (SPX) | LysnrAI iOS, MindLyst iOS | Hard (need alternative streaming STT) |
---
*Document generated by automated codebase analysis. Numbers are accurate as of 2026-03-01. Update as the codebase evolves.*


@ -1,181 +0,0 @@
# Azure Connection Audit — Full Workspace Report
> **Date:** 2026-02-22
> **Scope:** `learning_ai_common_plat`, `learning_voice_ai_agent`, `learning_multimodal_memory_agents`, `learning_ai_clock`, `learning_ai_fastgap`
> **Auditor:** Cascade (AI)
---
## Executive Summary
| Category | Issues Found | Fixed (session 1) | Fixed (session 2) | Remaining |
|----------|-------------|-------------------|-------------------|-----------|
| `x-request-id` missing | 12 clients | 2 (MindLyst) | **9** (root cause + feature-flags) | 0 ✅ |
| `x-product-id` missing | 6 clients | 0 | **6** (admin + user dashboards + Python) | 0 ✅ |
| Cosmos PK mismatch | 1 container | 0 (flagged) | 0 | 1 (needs migration) |
| `.env.example` gaps | 4 files | 1 (MindLyst) | **3** (ChronoMind, user-dash, admin-dash) | 0 ✅ |
| Hardcoded productId | 2 instances | 0 | **2** (telemetry.ts, platform_client.py) | 0 ✅ |
| Python client gaps | 1 file | 0 | **1** (headers + config) | 0 ✅ |
---
## 1. `x-request-id` Header — Root Cause
### Finding
**`@bytelyst/api-client` does NOT auto-inject `x-request-id`.**
The `createApiClient()` factory in `packages/api-client/src/client.ts` only sets `Content-Type`, auth token (via `getToken`), and caller-supplied `defaultHeaders`. No `x-request-id` is generated. This means **every consumer** that relies on `@bytelyst/api-client` without explicitly adding the header is missing request tracing.
### Root Cause Fix
Add `x-request-id: crypto.randomUUID()` to `buildHeaders()` in `packages/api-client/src/client.ts`. This single change propagates to all consumers automatically.
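A sketch of that fix, assuming `buildHeaders()` receives the caller's `defaultHeaders` and an optional token. The real signature in `packages/api-client/src/client.ts` may differ, and the `Content-Type` value and `Bearer` scheme here are assumptions. Two design points: generate a fresh ID on every call (one per request, not one per client), and keep a caller-supplied `x-request-id` so upstream IDs can propagate.

```typescript
// Hypothetical shape of buildHeaders(); the x-request-id handling is the point of the sketch.
export type IdGenerator = () => string;

// Prefer Web Crypto's randomUUID (browsers, recent Node); fall back to a non-cryptographic
// v4-style ID, which is adequate for request tracing.
const defaultIdGenerator: IdGenerator = () => {
  const c = (globalThis as unknown as { crypto?: { randomUUID?: () => string } }).crypto;
  if (c?.randomUUID) return c.randomUUID();
  return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, (ch) => {
    const r = Math.floor(Math.random() * 16);
    return (ch === "x" ? r : (r & 0x3) | 0x8).toString(16);
  });
};

export function buildHeaders(
  defaultHeaders: Record<string, string> = {},
  token?: string,
  newId: IdGenerator = defaultIdGenerator,
): Record<string, string> {
  const headers: Record<string, string> = { "Content-Type": "application/json", ...defaultHeaders };
  if (token) headers.Authorization = `Bearer ${token}`;
  // Header names are case-insensitive, so look for any existing spelling before generating one.
  const hasRequestId = Object.keys(headers).some((k) => k.toLowerCase() === "x-request-id");
  if (!hasRequestId) headers["x-request-id"] = newId();
  return headers;
}
```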
### Affected Clients (missing `x-request-id`)
| Repo | File | Client Pattern |
|------|------|---------------|
| `common_plat` | `dashboards/admin-web/src/lib/billing-client.ts` | `createApiClient` — no `x-request-id` |
| `common_plat` | `dashboards/admin-web/src/lib/growth-client.ts` | `createApiClient` — no `x-request-id` |
| `common_plat` | `dashboards/admin-web/src/lib/platform-client.ts` | `createApiClient` — no `x-request-id` |
| `common_plat` | `dashboards/tracker-web/src/lib/tracker-client.ts` | `createApiClient` — no `x-request-id` |
| `common_plat` | `packages/extraction/src/client.ts` | `createApiClient` — no `x-request-id` |
| `voice_ai_agent` | `user-dashboard-web/src/lib/billing-client.ts` | `createApiClient` — no `x-request-id` |
| `voice_ai_agent` | `user-dashboard-web/src/lib/growth-client.ts` | `createApiClient` — no `x-request-id` |
| `voice_ai_agent` | `user-dashboard-web/src/lib/platform-client.ts` | `createApiClient` — no `x-request-id` |
| `voice_ai_agent` | `user-dashboard-web/src/lib/feature-flags.ts` | Custom `fetch` — no `x-request-id` |
| `voice_ai_agent` | `backend/src/clients/platform_client.py` | `httpx` — no `x-request-id` |
### Already Fixed (previous session)
| Repo | File | Status |
|------|------|--------|
| `multimodal_memory` | `web/src/lib/billing-client.ts` | ✅ Added via `defaultHeaders` |
| `multimodal_memory` | `web/src/lib/feature-flags.ts` | ✅ Added manually |
### Already Correct
| Repo | File | Status |
|------|------|--------|
| `ai_fastgap` (NomGap) | `src/api/client.ts` | ✅ Custom client with `crypto.randomUUID()` |
| `ai_clock` (ChronoMind) | `web/src/lib/platform-sync.ts` | ✅ Custom client with `crypto.randomUUID()` |
| `voice_ai_agent` | `backend/src/main.py` | ✅ Middleware propagates/generates |
| `voice_ai_agent` | `backend/src/clients/extraction_client.py` | ✅ Passes `request_id` param |
---
## 2. `x-product-id` Header Gaps
### Clients Missing `x-product-id`
| Repo | File | Impact |
|------|------|--------|
| `common_plat` | `admin-web/src/lib/billing-client.ts` | Server can't filter by product |
| `common_plat` | `admin-web/src/lib/growth-client.ts` | Server can't filter by product |
| `voice_ai_agent` | `user-dashboard-web/src/lib/billing-client.ts` | Server can't filter by product |
| `voice_ai_agent` | `user-dashboard-web/src/lib/growth-client.ts` | Server can't filter by product |
| `voice_ai_agent` | `user-dashboard-web/src/lib/platform-client.ts` | Passes in body, not header |
| `voice_ai_agent` | `backend/src/clients/platform_client.py` | Passes in body/params, not header |
### Already Correct
| Repo | File |
|------|------|
| `ai_fastgap` (NomGap) | `src/api/client.ts`: `x-product-id: API_CONFIG.productId` |
| `ai_clock` (ChronoMind) | `web/src/lib/platform-sync.ts`: `x-product-id` header |
| `multimodal_memory` (MindLyst) | `web/src/lib/billing-client.ts` — via `defaultHeaders` |
| `multimodal_memory` (MindLyst) | `web/src/lib/feature-flags.ts` — explicit header |
| `common_plat` | `tracker-web/src/lib/tracker-client.ts` — from `localStorage` |
---
## 3. Cosmos DB Partition Key Mismatch
### `referrals` Container — 3-way Mismatch
| Location | Partition Key |
|----------|--------------|
| `platform-service/src/lib/cosmos-init.ts` | `/id` |
| MindLyst `web/src/lib/cosmos.ts` | `/userId` |
| Admin dashboard `admin-web/src/lib/cosmos.ts` | `/referrerId` |
| User dashboard `user-dashboard-web/src/lib/cosmos.ts` | `/referrerId` |
**Status:** Flagged in previous session. Cannot be fixed without data migration. Comment added to `cosmos-init.ts`.
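**Risk:** Cosmos does not reject requests that assume the wrong partition key field. A point read with the wrong key value returns 404 (not found), and a query scoped to one partition key value only sees that partition's items, so any consumer that assumes a different key than the container was created with sees silently missing data instead of an error. Cross-partition queries still return results, but at higher RU cost.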
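Because a Cosmos container's partition key is fixed at creation, whichever definition created `referrals` first is the one that actually applies. A hypothetical startup guard can surface the mismatch instead of letting reads silently miss. In a real service, `actualPaths` would come from `(await container.read()).resource?.partitionKey?.paths` in `@azure/cosmos`; the helper names are illustrative.

```typescript
// Returns a warning when the container's real partition key differs from what this code assumes.
export function checkPartitionKey(
  containerId: string,
  actualPaths: readonly string[] | undefined,
  expectedPath: string,
): string | null {
  const actual = actualPaths?.[0];
  if (actual === expectedPath) return null; // consistent: nothing to report
  return `Container "${containerId}" is partitioned on ${actual ?? "<unknown>"}, but this code assumes ${expectedPath}`;
}

// Lists every consumer whose assumed key disagrees with the container's actual key.
export function findMismatches(actualPath: string, assumed: Record<string, string>): string[] {
  return Object.entries(assumed)
    .filter(([, path]) => path !== actualPath)
    .map(([location, path]) => `${location} assumes ${path}`);
}
```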
---
## 4. Missing Environment Variables in `.env.example` Files
### ChronoMind `web/.env.example`
Currently only has:
```
NEXT_PUBLIC_PLATFORM_SERVICE_URL=http://localhost:4003/api
```
**Missing:**
- `NEXT_PUBLIC_PRODUCT_ID=chronomind` — used implicitly by `platform-sync.ts` (hardcoded there, but should be env-driven for consistency)
### LysnrAI `user-dashboard-web/.env.example`
**Missing:**
- `NEXT_PUBLIC_PRODUCT_ID=lysnrai` — referenced by `feature-flags.ts` line 10
- `NEXT_PUBLIC_PLATFORM_SERVICE_URL=http://localhost:4003` — referenced by `feature-flags.ts` line 11
Has `PLATFORM_SERVICE_URL` (server-side) but not the `NEXT_PUBLIC_` variant (client-side).
### LysnrAI root `.env.example`
**Missing:**
- `NEXT_PUBLIC_PRODUCT_ID` — not needed at root level (desktop app), so this is informational only.
### Admin dashboard `.env.example`
**Missing:**
- `AZURE_KEYVAULT_URL` — referenced by `instrumentation.ts` but not in `.env.example`
---
## 5. Hardcoded `productId` Values
| Repo | File | Line | Value | Should Use |
|------|------|------|-------|-----------|
| `multimodal_memory` | `web/src/lib/telemetry.ts` | 19 | `productId: 'mindlyst'` | `process.env.NEXT_PUBLIC_PRODUCT_ID` |
| `voice_ai_agent` | `backend/src/clients/platform_client.py` | 86, 101 | `product_id: str = "lysnrai"` | `settings.PRODUCT_ID` or config |
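For the `telemetry.ts` case, a minimal sketch assuming a Next.js build (implied by the `NEXT_PUBLIC_` prefix). Next.js inlines `NEXT_PUBLIC_*` values into client bundles only for literal `process.env.NEXT_PUBLIC_PRODUCT_ID` references, so the lookup stays literal rather than using `process.env[name]`. Throwing on a missing value is a design choice that avoids silently tagging events with the wrong product; the function names are hypothetical.

```typescript
// Literal access so Next.js can inline the value at build time.
const PRODUCT_ID_FROM_ENV = process.env.NEXT_PUBLIC_PRODUCT_ID;

export function resolveProductId(value: string | undefined = PRODUCT_ID_FROM_ENV): string {
  const id = value?.trim();
  if (!id) {
    throw new Error("NEXT_PUBLIC_PRODUCT_ID is not set; add it to .env.example and the deployment environment");
  }
  return id;
}

// Would replace the hardcoded `productId: 'mindlyst'` (hypothetical call site).
export function telemetryContext(): { productId: string } {
  return { productId: resolveProductId() };
}
```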
---
## 6. Python Backend Client Gaps (`platform_client.py`)
The `PlatformClient` class in `backend/src/clients/platform_client.py` has several issues:
1. **No `x-request-id` header** on any request
2. **No `x-product-id` header** on any request
3. **Creates new `httpx.AsyncClient` per request** — no connection pooling
4. **Hardcoded `product_id="lysnrai"` defaults** — should use config
---
## 7. Previously Fixed (Session 1)
| Fix | Repo | File |
|-----|------|------|
| Added `x-request-id` to billing client | `multimodal_memory` | `web/src/lib/billing-client.ts` |
| Added `x-request-id` to feature flags | `multimodal_memory` | `web/src/lib/feature-flags.ts` |
| Added 13 MindLyst containers to cosmos-init | `common_plat` | `services/platform-service/src/lib/cosmos-init.ts` |
| Added Blob Storage creds to Python config | `voice_ai_agent` | `backend/src/config.py` |
| Added missing env vars to MindLyst | `multimodal_memory` | `web/.env.example` |
---
## 8. Recommended Fix Order
1. **P0 — Root cause:** Add `x-request-id` auto-generation to `@bytelyst/api-client` `buildHeaders()` → fixes all 8 `createApiClient` consumers listed in Section 1 at once
2. **P0 — LysnrAI feature-flags:** Add `x-request-id` to the custom `fetch` call in `user-dashboard-web/src/lib/feature-flags.ts`
3. **P1 — Python backend:** Add `x-request-id` and `x-product-id` headers to `platform_client.py`
4. **P1 — Env vars:** Add missing `NEXT_PUBLIC_*` vars to ChronoMind, LysnrAI user-dashboard, admin-dashboard `.env.example` files
5. **P2 — `x-product-id`:** Add to admin/user dashboard clients via `defaultHeaders` in `createApiClient` config
6. **P2 — Hardcoded productId:** Replace in `telemetry.ts` and `platform_client.py`
7. **P3 — Referrals PK mismatch:** Requires data migration strategy (separate task)


@ -0,0 +1,749 @@
# Cloud Provider Migration Analysis — ByteLyst Ecosystem
> **Author:** AI Analysis (Cascade)
> **Date:** 2026-03-01
> **Scope:** All 7 repos — LysnrAI, MindLyst, ChronoMind, NomGap, PeakPulse, Common Platform, JarvisJr
> **Purpose:** Evaluate current Azure investment, assess migration feasibility to AWS / GCP / MongoDB Atlas / multi-cloud, and provide actionable recommendations.
---
## Table of Contents
1. [Executive Summary](#1-executive-summary)
2. [Current Azure Investment Inventory](#2-current-azure-investment-inventory)
3. [Dependency Depth Analysis](#3-dependency-depth-analysis)
4. [Migration Target Comparison](#4-migration-target-comparison)
5. [Per-Service Migration Analysis](#5-per-service-migration-analysis)
6. [Migration Scenario Scoring](#6-migration-scenario-scoring)
7. [Cost Comparison](#7-cost-comparison)
8. [Abstraction Layer Assessment](#8-abstraction-layer-assessment)
9. [Risk Analysis](#9-risk-analysis)
10. [Recommendations](#10-recommendations)
11. [Migration Playbook (If Chosen)](#11-migration-playbook-if-chosen)
12. [Appendix A: File-Level Azure Dependency Map](#appendix-a-file-level-azure-dependency-map)
13. [Appendix B: SDK & Package Inventory](#appendix-b-sdk--package-inventory)
---
## 1. Executive Summary
The ByteLyst ecosystem is **moderately coupled** to Azure. The coupling is concentrated in **3 packages** (`@bytelyst/cosmos`, `@bytelyst/blob`, `@bytelyst/config`) and **2 Python modules** (`azure_stt.py`, `cosmos_client.py`). The architecture already uses an internal abstraction layer — most application code never imports Azure SDKs directly.
### Key Findings
| Dimension | Assessment |
| ------------------------------ | ------------------------------------------------------------------------------------------- |
| **Overall Azure lock-in** | **Medium** — concentrated in ~15 files, but those files are foundational |
| **Easiest to migrate** | Blob Storage, Key Vault, OpenAI, Application Insights |
| **Hardest to migrate** | Cosmos DB (SQL API queries in 56+ repository files), Azure Speech SDK |
| **Best alternative DB** | MongoDB Atlas (closest query model to Cosmos SQL API) |
| **Best alternative cloud** | AWS (broadest service parity, mature SDK ecosystem) |
| **Estimated migration effort** | 4–8 weeks for full cloud swap (Cosmos DB is the long pole)                                  |
| **Recommendation** | **Stay on Azure** for now, but invest in abstraction layers to reduce future switching cost |
### Azure Services Used (8 total)
| # | Azure Service | Monthly Cost | Lock-in Risk | Files Affected |
| --- | ------------------------------------------- | ------------- | ------------ | ------------------------------------------------- |
| 1   | **Cosmos DB** (SQL/NoSQL API)               | ~$4–10        | **HIGH**     | 56+ repository files, 3 databases, ~45 containers |
| 2   | **Blob Storage**                            | ~$0.20        | LOW          | 2 packages + 1 Python module                      |
| 3   | **Azure OpenAI**                            | ~$5–10        | LOW          | 3 files (already supports OpenAI fallback)        |
| 4 | **Speech Services** | $0 (F0) | **HIGH** | 2 files (deep SDK integration, streaming) |
| 5 | **Key Vault** | ~$0.06 | LOW | 2 files (1 TS, 1 Python) |
| 6 | **Notification Hubs** | $0 (Free) | MEDIUM | Planned, not yet deeply integrated |
| 7 | **Application Insights** | $0 (5GB free) | LOW | 1 file (custom telemetry already built) |
| 8 | **Azure Identity** (DefaultAzureCredential) | $0 | LOW | Used by Key Vault + Secrets Manager |
---
## 2. Current Azure Investment Inventory
### 2.1 Azure Resources (from Azure Portal)
| Resource | Azure Name | Region | SKU | Status |
| ----------------- | ------------------------- | -------------- | ---------------- | ------------------------------- |
| Resource Group | `rg-mywisprai` | East US | — | Active |
| Cosmos DB | `cosmos-mywisprai` | West US 2 | Serverless | Active — 3 DBs, ~45 containers |
| Blob Storage | `bytelystblobs` | West US 2 | StorageV2, RAGRS | Active — 9+ containers |
| Azure OpenAI | `mywisprai-openai-sweden` | Sweden Central | S0 | Active — gpt-4o-mini deployment |
| Speech Service | `mywisprai-speech` | East US | F0 (Free) | Active |
| Key Vault | `kv-mywisprai` | East US | Standard | Active — ~25 secrets |
| Notification Hubs | `lysnnai` namespace | East US | Free | Active — 2 hubs |
| App Insights | `bytelyst-appinsights` | East US | Classic | Active |
### 2.2 Cosmos DB Databases & Containers
| Database | Containers | Products Using |
| ----------- | ------------------------------------------------------------------------------------------------------ | ---------------------------------------- |
| `lysnrai` | ~27 containers (users, subscriptions, feature_flags, audit_log, tracker_items, telemetry_events, etc.) | LysnrAI, platform-service (all products) |
| `mindlyst` | ~20 containers (brains, memory_items, streaks, reflections, etc.) | MindLyst |
| `mywisprai` | 10 containers (legacy, pre-rebrand) | Legacy / migration target |
**Total: ~57 containers across 3 databases** (about 47 outside the legacy `mywisprai` database), all using the Cosmos SQL (NoSQL) API with SQL-like queries (`SELECT`, `WHERE`, `ORDER BY`, `OFFSET/LIMIT`, aggregate functions).
### 2.3 Code Investment by Language
| Language | Azure SDK Packages | Files Using Azure | Lines of Azure-Specific Code |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------ | ----------------- | ---------------------------- |
| **TypeScript** | `@azure/cosmos`, `@azure/storage-blob`, `@azure/identity`, `@azure/keyvault-secrets` | ~65 files | ~500 lines |
| **Python** | `azure-cognitiveservices-speech`, `azure-cosmos`, `azure-storage-blob`, `azure-identity`, `azure-keyvault-secrets`, `openai` (AzureOpenAI) | ~8 files | ~400 lines |
| **Swift** | `MicrosoftCognitiveServicesSpeech` (SPX framework) | ~3 files | ~150 lines |
| **Kotlin** | None directly (uses platform-service REST API) | 0 files | 0 lines |
---
## 3. Dependency Depth Analysis
### 3.1 Cosmos DB — DEEP (56+ files)
This is the **most deeply embedded** Azure dependency. Every repository module follows the pattern:
```
types.ts → repository.ts → routes.ts
           │
           ├─ uses @azure/cosmos SDK
           └─ SQL queries: SELECT c.id, c.name FROM c WHERE c.productId = @pid
```
**Touchpoints:**
- `packages/cosmos/` — shared client singleton (`@azure/cosmos` peer dep)
- `services/platform-service/src/modules/*/repository.ts`: **56 repository files** with Cosmos SQL queries
- `services/extraction-service/src/modules/*/repository.ts` — 2 repository files
- `dashboards/admin-web/src/lib/cosmos.ts` — direct `@azure/cosmos` import
- `dashboards/admin-web/src/lib/repositories/*.ts` — 4 repository files
- `mindlyst-native/web/src/lib/cosmos.ts` — direct `@azure/cosmos` import
- `learning_voice_ai_agent/src/cloud/cosmos_client.py` — Python Cosmos client
- `learning_voice_ai_agent/backend/src/cloud/cosmos.py` — Python backend Cosmos client
**Query patterns used:**
- `container.items.query()` with parameterized SQL
- `container.items.create()`, `.replace()`, `.delete()`, `.read()`
- `container.items.upsert()`
- Partition key routing (`/userId`, `/productId`, `/id`)
- Cross-partition queries (admin/analytics)
- `SELECT VALUE COUNT(1)` aggregates
- `OFFSET ... LIMIT` pagination
- `ORDER BY` sorting
- `ARRAY_CONTAINS()` for array queries
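For reference during a rewrite, this is roughly what one of those parameterized repository queries looks like. The `{ query, parameters }` object is the `SqlQuerySpec` shape that `container.items.query()` accepts in `@azure/cosmos`; the field names follow this document's examples, and the helper functions are hypothetical.

```typescript
// Local copy of the SqlQuerySpec shape, kept dependency-free for the sketch.
interface SqlParameter {
  name: string;
  value: string | number | boolean | null;
}
interface SqlQuerySpec {
  query: string;
  parameters: SqlParameter[];
}

function assertPaging(offset: number, limit: number): void {
  if (!Number.isInteger(offset) || offset < 0) throw new Error("offset must be a non-negative integer");
  if (!Number.isInteger(limit) || limit <= 0) throw new Error("limit must be a positive integer");
}

// Paged, newest-first list scoped to one product and user. OFFSET/LIMIT are inlined only after
// integer validation; caller-supplied strings always travel as @parameters.
export function buildListQuery(productId: string, userId: string, offset = 0, limit = 20): SqlQuerySpec {
  assertPaging(offset, limit);
  return {
    query:
      "SELECT * FROM c WHERE c.productId = @pid AND c.userId = @uid " +
      `ORDER BY c.createdAt DESC OFFSET ${offset} LIMIT ${limit}`,
    parameters: [
      { name: "@pid", value: productId },
      { name: "@uid", value: userId },
    ],
  };
}

// Companion count using the SELECT VALUE COUNT(1) aggregate pattern.
export function buildCountQuery(productId: string, userId: string): SqlQuerySpec {
  return {
    query: "SELECT VALUE COUNT(1) FROM c WHERE c.productId = @pid AND c.userId = @uid",
    parameters: [
      { name: "@pid", value: productId },
      { name: "@uid", value: userId },
    ],
  };
}
```

Executed via `container.items.query(spec).fetchAll()`, this filter, sort, and page pattern is what each repository file repeats, and what a database migration would need to translate.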
### 3.2 Azure Speech SDK — DEEP (3 files, streaming integration)
The Speech SDK is used for **real-time streaming speech-to-text** with features that are tightly coupled to the Azure SDK's event-driven architecture:
- `src/audio/azure_stt.py` — 248 lines. Uses `PushAudioInputStream`, `SpeechRecognizer`, continuous recognition with `recognizing`/`recognized`/`canceled`/`session_stopped` event callbacks, `PhraseListGrammar`, auto-language detection (10 languages), auto-reconnect
- `src/ui/settings.py` + `src/ui/unified_window.py` — connection testing
- `mindlyst-native/iosApp/Services/AzureSpeechTranscriber.swift` — iOS Swift SPX framework
- `mobile_app/ios/LysnrAI/` — iOS keyboard extension uses SPX framework
### 3.3 Blob Storage — SHALLOW (3 files)
- `packages/blob/src/blob.ts` — 162 lines, singleton client, SAS URL generation
- `src/cloud/blob_client.py` — 190 lines, Python equivalent
- `services/platform-service/src/modules/blob/` — REST API wrapper
### 3.4 Azure OpenAI — SHALLOW (3 files, already abstracted)
- `src/llm/text_cleaner.py` — uses `openai.AzureOpenAI` (OpenAI SDK with Azure endpoint)
- `backend/src/clients/openai_client.py` — uses `openai.AsyncAzureOpenAI`
- `mindlyst-native/web/src/lib/llm.ts`: **already has OpenAI fallback** (resolves provider dynamically)
The `openai` Python/JS SDK supports both Azure and OpenAI endpoints with minimal config change. MindLyst web already handles this automatically.
### 3.5 Key Vault — SHALLOW (2 files)
- `packages/config/src/keyvault.ts` — 90 lines, `resolveKeyVaultSecrets()` with graceful fallback
- `src/secrets/keyvault.py` — 69 lines, `SecretResolver` class with env var fallback
Both implementations already fall back to environment variables when Key Vault is unavailable, so migrating away amounts to no longer configuring Key Vault and relying on the env var path.
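A sketch of that fallback order, assuming the vault lookup is injected as a function (in practice it would wrap `SecretClient.getSecret()` from `@azure/keyvault-secrets`). This is not the actual `resolveKeyVaultSecrets()` implementation, and all names here are hypothetical. Once the vault lookup is optional, removing Key Vault means passing no lookup at all.

```typescript
export type VaultLookup = (secretName: string) => Promise<string | undefined>;

// Key Vault secret names allow only letters, digits, and dashes, so env-style names are
// mapped "_" -> "-" here (the mapping convention itself is an assumption).
export const toVaultName = (envName: string): string => envName.replace(/_/g, "-");

// Vault first when configured; any vault error or missing secret falls back to the env var.
export async function resolveSecrets(
  names: string[],
  env: Record<string, string | undefined>,
  vault?: VaultLookup,
): Promise<Record<string, string | undefined>> {
  const out: Record<string, string | undefined> = {};
  for (const name of names) {
    let value: string | undefined;
    if (vault) {
      try {
        value = await vault(toVaultName(name));
      } catch {
        value = undefined; // vault unreachable: fall through to env
      }
    }
    out[name] = value ?? env[name];
  }
  return out;
}
```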
### 3.6 Notification Hubs — NOT YET INTEGRATED
Planned but not deeply wired. Only namespace/hub exists in Azure. Mobile apps use `BLPlatformClient` (REST) to talk to platform-service, which would route push notifications.
### 3.7 Application Insights — SHALLOW (1 file)
- `opencensus-ext-azure` in Python requirements (optional telemetry)
- Custom telemetry system already built (`@bytelyst/telemetry-client`, platform-service telemetry module with Cosmos storage)
The custom telemetry system means App Insights is supplementary, not critical.
---
## 4. Migration Target Comparison
### 4.1 Database: Cosmos DB → Alternatives
| Feature | Azure Cosmos DB (current) | MongoDB Atlas | AWS DynamoDB | Google Firestore | PostgreSQL (Supabase/Neon) |
| ----------------------- | ----------------------------------- | ------------------------------------ | ---------------------------------- | --------------------- | -------------------------- |
| **Data model** | Document (JSON) | Document (JSON) | Key-Value + Document | Document (JSON) | Relational + JSONB |
| **Query language** | SQL-like | MQL (MongoDB Query) | PartiQL / API | GQL-like API | SQL |
| **Partition keys** | Required | Shard keys (optional) | Required | Collection groups | Not applicable |
| **Serverless** | Yes | Yes (Atlas Serverless) | Yes | Yes | Yes (Neon) |
| **SQL queries** | `SELECT c.id FROM c WHERE c.x = @y` | `db.collection.find({x: y})` | `SELECT id FROM table WHERE x = ?` | Client SDK queries | Standard SQL |
| **Aggregates** | Basic (`COUNT`, `SUM`, `AVG`) | Full (`$group`, `$match`, `$lookup`) | Limited | Limited | Full SQL |
| **Cross-partition** | Yes (expensive) | Yes (scatter-gather) | Scan (expensive) | Yes | N/A |
| **Change feed** | Yes | Change Streams | DynamoDB Streams | Real-time listeners | Logical replication |
| **Global distribution** | Built-in multi-region | Atlas Global Clusters | Global Tables | Multi-region | Manual / Citus |
| **Max doc size** | 2 MB | 16 MB | 400 KB | 1 MB | Unlimited (JSONB) |
| **Free tier** | 1000 RU/s + 25 GB | 512 MB | 25 GB + 25 WCU/RCU | 1 GiB + 50K reads/day | 0.5 GB (Neon) |
| **Migration effort** | — | **Medium** (query rewrite) | **Hard** (paradigm shift) | **Hard** (no SQL) | **Hard** (schema design) |
### 4.2 Object Storage: Blob → Alternatives
| Feature | Azure Blob (current) | AWS S3 | GCP Cloud Storage | Cloudflare R2 | MinIO (self-hosted) |
| --------------------- | -------------------- | ----------------- | ------------------- | ------------------ | ------------------- |
| **API compatibility** | Azure Blob API | S3 API | GCS API / S3-compat | S3-compatible | S3-compatible |
| **SAS tokens** | Yes (Azure SAS) | Pre-signed URLs | Signed URLs | Pre-signed URLs | Pre-signed URLs |
| **CDN integration** | Azure CDN | CloudFront | Cloud CDN | Built-in | Manual |
| **Cost (per GB)** | $0.018 (Cool) | $0.023 (Standard) | $0.020 | $0.015 (no egress) | Self-hosted |
| **Migration effort** | — | **Easy** | **Easy** | **Easy** | **Easy** |
### 4.3 Speech-to-Text: Azure Speech → Alternatives
| Feature | Azure Speech (current) | AWS Transcribe | Google Speech-to-Text | Deepgram | Whisper (local) |
| --------------------- | ----------------------- | --------------------- | ------------------------ | ---------------------- | ----------------------- |
| **Streaming STT** | Yes (push stream) | Yes (WebSocket) | Yes (streaming) | Yes (WebSocket) | No (batch only) |
| **Languages** | 100+ | 100+ | 125+ | 36+ | 99+ |
| **Auto-detect lang** | Up to 10 at-once | Yes | Yes | Yes | Yes |
| **Custom vocabulary** | PhraseListGrammar | Custom vocabulary | Speech adaptation | Keywords | No |
| **Native SDK** | Python, Swift (SPX), JS | Python, no iOS SDK | Python, iOS, JS | REST/WebSocket | Python only |
| **iOS native SDK** | SPX framework (ObjC) | No native SDK | Yes (gRPC) | No native SDK | No |
| **Free tier** | 5 hrs/month (F0) | 60 min/month | 60 min/month | None | Free (local GPU) |
| **Latency** | ~200ms | ~300ms | ~200ms | ~100ms | ~500ms+ (local) |
| **Migration effort** | — | **Hard** (no iOS SDK) | **Medium** (has iOS SDK) | **Medium** (REST only) | **Hard** (no streaming) |
### 4.4 LLM / AI: Azure OpenAI → Alternatives
| Feature | Azure OpenAI (current) | OpenAI API (direct) | Google Gemini | AWS Bedrock | Anthropic Claude |
| ---------------------- | ------------------------ | ----------------------------- | -------------------- | -------------------- | -------------------- |
| **Models** | GPT-4o, GPT-4o-mini | Same models | Gemini 2.5 | Claude, Llama, Titan | Claude 3.5/4 |
| **API compatibility** | OpenAI SDK (azure mode) | OpenAI SDK (native) | Google SDK | AWS SDK | Anthropic SDK |
| **Data residency** | Azure regions | US only | Google regions | AWS regions | US/EU |
| **Cost (GPT-4o-mini)** | $0.15/$0.60 per M tokens | $0.15/$0.60 per M tokens | ~$0.10/$0.40 (Flash) | Varies | ~$0.25/$1.25 (Haiku) |
| **Migration effort** | — | **Trivial** (change endpoint) | **Easy** (SDK swap) | **Medium** | **Easy** (SDK swap) |
### 4.5 Secrets Management: Key Vault → Alternatives
| Feature | Azure Key Vault (current) | AWS Secrets Manager | GCP Secret Manager | HashiCorp Vault | Doppler / Infisical |
| -------------------- | ------------------------- | --------------------------------- | ------------------------------ | --------------- | ------------------- |
| **Cost** | $0.03/10K ops | $0.40/secret/month | $0.06/10K ops | Free (OSS) | Free tier |
| **SDK** | `@azure/keyvault-secrets` | `@aws-sdk/client-secrets-manager` | `@google-cloud/secret-manager` | HTTP API | SDK / CLI |
| **Migration effort** | — | **Easy** | **Easy** | **Medium** | **Easy** |
**Note:** The codebase already falls back to env vars when Key Vault is unavailable. This means Key Vault can be replaced by **any** secrets manager or simply .env files without code changes to application logic.
### 4.6 Push Notifications: Notification Hubs → Alternatives
| Feature | Azure NH (current) | AWS SNS | Firebase Cloud Messaging | OneSignal | Expo Push |
| -------------------- | ------------------ | ------------ | ------------------------ | --------------- | --------------------------- |
| **APNs + FCM** | Yes | Yes | FCM only (APNs via FCM) | Yes | Yes |
| **Free tier** | 1M pushes/month | 1M publishes | Unlimited | 10K subscribers | Unlimited |
| **Migration effort** | — | **Easy** | **Easy** | **Easy** | **Easy** (NomGap uses Expo) |
---
## 5. Per-Service Migration Analysis
### 5.1 Cosmos DB → MongoDB Atlas
**Difficulty: MEDIUM-HIGH** | **Effort: 3–5 weeks** | **Risk: MEDIUM**
This is the **single largest migration task**. Here's why:
#### What needs to change
| Layer | Current (Cosmos SQL API) | Target (MongoDB) | Files |
| ------------------------ | ------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------- | ----------------------------------- |
| Client package           | `CosmosClient` from `@azure/cosmos` | `MongoClient` from `mongodb` | `packages/cosmos/src/client.ts` |
| Container registry | `getContainer(name)` | `db.collection(name)` | `packages/cosmos/src/containers.ts` |
| All repository files | `container.items.query('SELECT...')` | `collection.find({...})` | **56+ files** in platform-service |
| Dashboard Cosmos clients | `@azure/cosmos` direct | `mongodb` direct | 2 files (admin, MindLyst) |
| Python clients | `azure.cosmos.CosmosClient` | `pymongo.MongoClient` | 2 files |
| Query syntax | SQL-like (`SELECT c.id FROM c WHERE c.productId = @pid AND c.userId = @uid ORDER BY c.createdAt DESC OFFSET 0 LIMIT 20`) | MQL (`collection.find({productId: pid, userId: uid}).sort({createdAt: -1}).skip(0).limit(20)`) | All repository files |
| Partition keys | Explicit partition key in every query | Shard key (auto-routed) | All repository files |
| Upsert                   | `container.items.upsert(doc)` | `collection.replaceOne({_id: id}, doc, {upsert: true})` (full replace, matching Cosmos upsert semantics) | ~20 files |
| Read by ID | `container.item(id, partitionKey).read()` | `collection.findOne({_id: id})` | All repository files |
#### What stays the same
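- Document structure (JSON documents with `id`, `productId`, partition keys), though the Cosmos `id` must either be copied into MongoDB's `_id` or kept as a separately indexed unique field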
- Data model (no schema changes needed — MongoDB is also schemaless)
- Partition key concept maps to shard key
- Serverless pricing model available on both
#### Key migration steps
1. Update `@bytelyst/cosmos` package to export MongoDB-compatible API
2. Rewrite all SQL queries to MQL (56+ files)
3. Replace `container.items.query()` → `collection.find()`
4. Replace `container.item(id, pk).read()` → `collection.findOne({_id: id})`
5. Replace `container.items.create()` → `collection.insertOne()`
6. Replace `container.items.replace()` → `collection.replaceOne()`
7. Replace `container.items.upsert()` → `collection.replaceOne(filter, doc, {upsert: true})`; Cosmos upsert replaces the whole document, whereas `updateOne` with `$set` would keep fields the new version dropped
8. Update Python clients similarly
9. Migrate data (use Azure Data Factory or custom script)
10. Update all test mocks
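Steps 3 and 7 are mostly mechanical. A sketch of both translations, assuming documents are migrated with their Cosmos `id` stored as MongoDB `_id` (an assumption, as are the helper names). The list query becomes a filter/sort/skip/limit tuple, and upsert becomes a full-document `replaceOne`.

```typescript
// Plain-object equivalents of a Cosmos list query, for the MongoDB driver's
// collection.find(filter).sort(sort).skip(skip).limit(limit) chain.
export interface MongoListQuery {
  filter: Record<string, unknown>;
  sort: Record<string, 1 | -1>;
  skip: number;
  limit: number;
}

export function toMongoListQuery(productId: string, userId: string, offset = 0, limit = 20): MongoListQuery {
  return {
    filter: { productId, userId }, // WHERE c.productId = @pid AND c.userId = @uid
    sort: { createdAt: -1 }, // ORDER BY c.createdAt DESC
    skip: offset, // OFFSET
    limit, // LIMIT
  };
}

// Cosmos upsert replaces the whole item, so the closest driver call is replaceOne with upsert.
// On insert, MongoDB copies _id from the equality filter into the new document.
export function toMongoUpsert<T extends { id: string }>(doc: T) {
  const { id, ...rest } = doc;
  return {
    filter: { _id: id },
    replacement: rest,
    options: { upsert: true } as const,
  };
}

// Usage with the official driver (not executed here):
//   const q = toMongoListQuery(pid, uid, 0, 20);
//   const page = await col.find(q.filter).sort(q.sort).skip(q.skip).limit(q.limit).toArray();
//   const u = toMongoUpsert(doc);
//   await col.replaceOne(u.filter, u.replacement, u.options);
```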
#### Why MongoDB Atlas is the best DB alternative
- **Closest query model** to Cosmos SQL API (both are document DBs)
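- **Cosmos DB offers a MongoDB-compatible API** (Azure Cosmos DB for MongoDB), which could serve as an intermediate step, although moving to native MongoDB is cleaner
- Both are JSON document databases (Cosmos DB began as Azure DocumentDB), so the data model carries over unchanged
- MongoDB's `find()` queries map closely to Cosmos SQL `SELECT` queries
- Both support partition/shard keys, TTL-based expiry, and change feeds (Cosmos change feed, MongoDB change streams)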
- MongoDB Atlas Serverless pricing is competitive
- MongoDB has excellent TypeScript and Python SDKs
### 5.2 Azure Speech → Google Cloud Speech-to-Text
**Difficulty: HIGH** | **Effort: 2–3 weeks** | **Risk: HIGH**
#### Why this is hard
- The Azure Speech SDK uses a **push-stream architecture** (`PushAudioInputStream`) that is deeply integrated into the audio pipeline
- The `SpeechRecognizer` has event-driven callbacks (`recognizing`, `recognized`, `canceled`, `session_stopped`) that the code relies on for real-time partial/final transcript delivery
- Custom vocabulary via `PhraseListGrammar` is Azure-specific
- Auto-language detection config is Azure-specific
- The **iOS SPX framework** (Objective-C) is used in LysnrAI keyboard extension and MindLyst — there's no direct equivalent for most alternatives
#### Best alternative: Google Cloud Speech-to-Text
- Has streaming recognition with similar event model
- Has an iOS SDK (gRPC-based)
- Supports custom vocabulary (speech adaptation)
- Supports auto-language detection
- Similar pricing and free tier
#### What needs to change
- `src/audio/azure_stt.py` — complete rewrite (~248 lines)
- `iosApp/Services/AzureSpeechTranscriber.swift` — complete rewrite
- `LysnrAI/LysnrKeyboard/` — keyboard extension STT integration
- Audio format handling (may differ between providers)
- Connection test code in settings UI
### 5.3 Blob Storage → AWS S3 or Cloudflare R2
**Difficulty: LOW** | **Effort: 2-3 days** | **Risk: LOW**
#### Why this is easy
- `@bytelyst/blob` package is a thin wrapper (162 lines)
- Only 3 files need changes
- S3 API is the de facto standard — R2, MinIO, GCS all support S3-compatible API
- SAS tokens → Pre-signed URLs (same concept, different implementation)
#### What needs to change
- `packages/blob/src/blob.ts` — swap `@azure/storage-blob` → `@aws-sdk/client-s3` + `@aws-sdk/s3-request-presigner`
- `src/cloud/blob_client.py` — swap `azure.storage.blob` → `boto3`
- `services/platform-service/src/modules/blob/` — update routes for pre-signed URL format
- Environment variables: `AZURE_BLOB_*` → `AWS_S3_*` or `S3_*`
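The swap stays inside `@bytelyst/blob` only if callers depend on an interface rather than on SAS-specific details. Below is a minimal sketch built on a hypothetical `BlobStore` interface, which is not the package's current API. Its `getUploadUrl` returns a time-limited URL: an Azure backend would return a SAS URL, and an S3/R2 backend a pre-signed URL. Here `container` maps to an S3 bucket or key prefix. The in-memory double is what unit tests and the rewritten mocks could use.

```typescript
// Hypothetical provider-neutral interface; not the current @bytelyst/blob API.
interface BlobStore {
  put(container: string, key: string, data: Uint8Array, contentType?: string): Promise<void>;
  get(container: string, key: string): Promise<Uint8Array | null>;
  delete(container: string, key: string): Promise<void>;
  // Time-limited URL for direct client uploads: a SAS URL on Azure,
  // a pre-signed PUT URL on S3/R2.
  getUploadUrl(container: string, key: string, expiresInSeconds: number): Promise<string>;
}

// In-memory test double: no network, deterministic fake URLs.
class InMemoryBlobStore implements BlobStore {
  private readonly blobs = new Map<string, Uint8Array>();

  private path(container: string, key: string): string {
    return `${container}/${key}`;
  }

  async put(container: string, key: string, data: Uint8Array): Promise<void> {
    // Copy so later mutation of the caller's buffer doesn't change the stored blob.
    this.blobs.set(this.path(container, key), data.slice());
  }

  async get(container: string, key: string): Promise<Uint8Array | null> {
    return this.blobs.get(this.path(container, key)) ?? null;
  }

  async delete(container: string, key: string): Promise<void> {
    this.blobs.delete(this.path(container, key));
  }

  async getUploadUrl(container: string, key: string, expiresInSeconds: number): Promise<string> {
    return `memory://${this.path(container, key)}?expiresIn=${expiresInSeconds}`;
  }
}
```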
### 5.4 Azure OpenAI → OpenAI API (direct) or Gemini
**Difficulty: TRIVIAL** | **Effort: < 1 day** | **Risk: VERY LOW**
#### Why this is trivial
- The `openai` Python SDK supports both Azure and OpenAI endpoints — just change config
- MindLyst web `llm.ts` **already auto-detects** Azure vs OpenAI and builds the correct URL
- LysnrAI desktop uses `AzureOpenAI` class from `openai` SDK — switch to `OpenAI` class
- Same models, same API shape, same pricing
#### What needs to change
- Set `OPENAI_API_KEY` instead of `AZURE_OPENAI_*` env vars
- Change `AzureOpenAI(azure_endpoint=..., api_key=..., api_version=...)` → `OpenAI(api_key=...)`
- Change `AsyncAzureOpenAI(...)` → `AsyncOpenAI(...)`
- Remove `api_version` parameter
- That's it. The `openai` SDK handles the rest.
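MindLyst web's `llm.ts` already auto-detects the provider. The sketch below shows that idea for the REST API; it is not MindLyst's actual code. The env var names (`AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_DEPLOYMENT`, `AZURE_OPENAI_API_VERSION`) are assumptions.

The URL patterns and auth headers are the documented ones:

- Azure routes by deployment, using `/openai/deployments/{deployment}/chat/completions?api-version=...` with an `api-key` header.
- OpenAI uses `https://api.openai.com/v1/chat/completions` with a bearer token.

```typescript
type Env = Record<string, string | undefined>;

interface LlmEndpoint {
  provider: "azure" | "openai";
  url: string;
  headers: Record<string, string>;
}

// Hypothetical env var names; adjust to the repo's actual AZURE_OPENAI_* settings.
function resolveChatEndpoint(env: Env): LlmEndpoint {
  const azureEndpoint = env.AZURE_OPENAI_ENDPOINT;
  const azureKey = env.AZURE_OPENAI_API_KEY;
  if (azureEndpoint && azureKey) {
    const deployment = env.AZURE_OPENAI_DEPLOYMENT;
    const apiVersion = env.AZURE_OPENAI_API_VERSION;
    if (!deployment || !apiVersion) {
      throw new Error("Azure OpenAI needs AZURE_OPENAI_DEPLOYMENT and AZURE_OPENAI_API_VERSION");
    }
    const base = azureEndpoint.replace(/\/+$/, ""); // drop trailing slashes
    return {
      provider: "azure",
      url: `${base}/openai/deployments/${encodeURIComponent(deployment)}/chat/completions?api-version=${encodeURIComponent(apiVersion)}`,
      headers: { "api-key": azureKey, "Content-Type": "application/json" },
    };
  }
  const openaiKey = env.OPENAI_API_KEY;
  if (openaiKey) {
    return {
      provider: "openai",
      url: "https://api.openai.com/v1/chat/completions",
      headers: { Authorization: `Bearer ${openaiKey}`, "Content-Type": "application/json" },
    };
  }
  throw new Error("No LLM credentials configured");
}
```

With this shape, migrating means unsetting the Azure variables and setting `OPENAI_API_KEY`. The one body-level difference is that OpenAI requires a `model` field, whereas Azure selects the model through the deployment name in the URL.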
### 5.5 Key Vault → Environment Variables / Any Secrets Manager
**Difficulty: TRIVIAL** | **Effort: < 1 day** | **Risk: VERY LOW**
Both `keyvault.ts` and `keyvault.py` already implement graceful fallback:
- If `AZURE_KEYVAULT_URL` is not set → uses env vars directly
- If Key Vault is unreachable → falls back to env vars
**To migrate:** Simply stop setting `AZURE_KEYVAULT_URL`. Everything works via env vars. Then optionally adopt any other secrets manager (AWS Secrets Manager, Doppler, Infisical, etc.).
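The fallback order described above can be sketched as follows. This is not the actual `keyvault.ts`. The `fetchFromVault` callback is a hypothetical stand-in for the Key Vault client call (in real code, `SecretClient.getSecret` from `@azure/keyvault-secrets`). It is injected so the logic can be tested without Azure.

```typescript
// Hypothetical stand-in for a Key Vault lookup.
type VaultFetcher = (secretName: string) => Promise<string | undefined>;

// Resolve a secret: Key Vault first when configured, env var otherwise.
async function resolveSecret(
  name: string,
  env: Record<string, string | undefined>,
  fetchFromVault?: VaultFetcher,
): Promise<string | undefined> {
  // No AZURE_KEYVAULT_URL (or no client): env vars only.
  if (!env.AZURE_KEYVAULT_URL || !fetchFromVault) {
    return env[name];
  }
  try {
    const value = await fetchFromVault(name);
    return value ?? env[name]; // secret absent in vault: use env var
  } catch {
    // Vault unreachable: fall back to env var.
    return env[name];
  }
}
```

Leaving `AZURE_KEYVAULT_URL` unset takes the first branch, which is why the migration needs no code change.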
### 5.6 Notification Hubs → Firebase Cloud Messaging
**Difficulty: LOW** | **Effort: 1-2 days** | **Risk: LOW**
Not yet deeply integrated. The platform-service notification module sends via REST API. Swap the push provider client.
### 5.7 Application Insights → Self-hosted / Grafana
**Difficulty: TRIVIAL** | **Effort: Already done** | **Risk: NONE**
The ecosystem already has:
- Custom telemetry system (`@bytelyst/telemetry-client` → platform-service → Cosmos)
- Loki + Grafana in `services/monitoring/`
- App Insights is supplementary, can be dropped with zero code changes
---
## 6. Migration Scenario Scoring
### Scenario A: Stay on Azure (Status Quo)
| Dimension | Score (1-5) | Notes |
| ---------------- | ------------ | ---------------------------------------- |
| Migration effort | **5** (none) | No work needed |
| Cost | **4** | ~$15/month at current scale, competitive |
| Vendor diversity | **1** | Single cloud vendor |
| Feature parity | **5** | Everything works today |
| **Total** | **15/20** | |
### Scenario B: Full Migration to AWS
| Dimension | Score (1-5) | Notes |
| ---------------- | ----------- | ---------------------------------------------------------------- |
| Migration effort | **2**       | 6-8 weeks, Cosmos→DynamoDB is painful                            |
| Cost | **3** | Similar or slightly higher at small scale |
| Vendor diversity | **1** | Still single cloud, just different |
| Feature parity | **3** | No native iOS Speech SDK, DynamoDB query model is very different |
| **Total** | **9/20** | |
### Scenario C: Multi-Cloud (MongoDB Atlas + OpenAI + R2 + Google STT)
| Dimension | Score (1-5) | Notes |
| ---------------- | ----------- | -------------------------------------------------------- |
| Migration effort | **2**       | 5-7 weeks, Cosmos→MongoDB is medium                      |
| Cost | **4** | MongoDB Atlas free tier, R2 no egress fees |
| Vendor diversity | **5** | No single-vendor dependency |
| Feature parity | **4** | MongoDB is a better document DB than Cosmos in many ways |
| **Total** | **15/20** | |
### Scenario D: Stay Azure + Add Abstraction Layers
| Dimension | Score (1-5) | Notes |
| ---------------- | ----------- | --------------------------------------------- |
| Migration effort | **4**       | 1-2 weeks to add repository interface pattern |
| Cost | **4** | No change |
| Vendor diversity | **3** | Ready to switch, but still on Azure |
| Feature parity | **5** | Everything works today |
| **Total** | **16/20** | **Winner** |
### Scenario E: Migrate DB Only (Cosmos → MongoDB Atlas, keep rest on Azure)
| Dimension | Score (1-5) | Notes |
| ---------------- | ----------- | --------------------------------------------- |
| Migration effort | **3**       | 3-5 weeks for DB migration                    |
| Cost | **4** | MongoDB Atlas Serverless may be cheaper |
| Vendor diversity | **3** | DB is independent, other services still Azure |
| Feature parity | **5** | MongoDB is very capable |
| **Total** | **15/20** | |
---
## 7. Cost Comparison
### Current Azure Costs (MVP / Low Usage)
| Service | Monthly Cost | Notes |
| -------------------------- | ----------------- | --------------------------- |
| Cosmos DB (Serverless)      | ~$4-10            | 3 databases, ~45 containers |
| Blob Storage (Cool, RA-GRS) | ~$0.20            | 9+ containers               |
| Azure OpenAI (GPT-4o-mini)  | ~$5-10            | Pay per token               |
| Speech (F0)                 | $0                | 5 hrs/month free            |
| Key Vault                   | ~$0.06            | ~25 secrets                 |
| Notification Hubs (Free)    | $0                | 1M pushes/month             |
| App Insights                | $0                | 5 GB/month free             |
| **Total**                   | **~$10-20/month** |                             |
### Equivalent AWS Costs
| Service | AWS Equivalent | Monthly Cost |
| -------------------------------- | --------------- | ------------------------- |
| Cosmos DB → DynamoDB (On-Demand) | DynamoDB        | ~$5-15                    |
| Blob → S3 Standard               | S3              | ~$0.25                    |
| Azure OpenAI → OpenAI API        | Same pricing    | ~$5-10                    |
| Speech → Transcribe              | Transcribe      | ~$1-3                     |
| Key Vault → Secrets Manager      | Secrets Manager | ~$10 (per-secret pricing) |
| Notification Hubs → SNS          | SNS             | ~$0.50                    |
| App Insights → CloudWatch        | CloudWatch      | ~$3                       |
| **Total**                        |                 | **~$25-42/month**         |
### Equivalent Multi-Cloud Costs
| Service | Provider | Monthly Cost |
| ------------------------------------ | ------------ | ------------------ |
| Cosmos DB → MongoDB Atlas Serverless | MongoDB      | ~$3-8              |
| Blob → Cloudflare R2                 | Cloudflare   | ~$0.15 (no egress) |
| Azure OpenAI → OpenAI API (direct)   | OpenAI       | ~$5-10             |
| Speech → Google STT                  | Google Cloud | ~$1-3              |
| Key Vault → Doppler (free tier)      | Doppler      | $0                 |
| Push → Firebase FCM                  | Google       | $0                 |
| Monitoring → Grafana Cloud (free)    | Grafana      | $0                 |
| **Total**                            |              | **~$10-22/month**  |
### Cost Summary
| Scenario | Monthly Cost | vs Current |
| ------------------------------ | ------------ | ---------- |
| **Azure (current)**            | ~$10-20      | Baseline   |
| **Full AWS**                   | ~$25-42      | +110-150%  |
| **Multi-cloud**                | ~$10-22      | ~Same      |
| **MongoDB Atlas + Azure rest** | ~$10-18      | ~Same      |
**Verdict:** At current scale, cost is not a compelling reason to migrate. All options are under $50/month. Cost becomes more significant at scale (10K+ users), where MongoDB Atlas and R2 would likely be cheaper due to no egress fees and better serverless pricing.
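The totals above are sums of per-service low/high estimates. Below is a minimal sketch of that arithmetic using the current-Azure column, with ranges entered as `[low, high]` in USD/month. It assumes the table rounds the summed bounds to whole dollars. `pctChange` compares the rounded totals of two scenarios.

```typescript
type CostRange = [low: number, high: number];

// Add up low and high bounds independently.
function sumRanges(ranges: CostRange[]): CostRange {
  let low = 0;
  let high = 0;
  for (const [l, h] of ranges) {
    low += l;
    high += h;
  }
  return [low, high];
}

// Percentage change from a baseline, rounded to a whole percent.
function pctChange(baseline: number, value: number): number {
  return Math.round(((value - baseline) / baseline) * 100);
}

// Current Azure estimates from the first cost table.
const azure: CostRange[] = [
  [4, 10],      // Cosmos DB (Serverless)
  [0.2, 0.2],   // Blob Storage
  [5, 10],      // Azure OpenAI
  [0, 0],       // Speech (F0)
  [0.06, 0.06], // Key Vault
  [0, 0],       // Notification Hubs
  [0, 0],       // App Insights
];

const [azureLow, azureHigh] = sumRanges(azure);
console.log(azureLow.toFixed(2), azureHigh.toFixed(2)); // prints: 9.26 20.26
console.log(pctChange(10, 25), pctChange(20, 42));      // AWS vs Azure, low and high ends
```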
---
## 8. Abstraction Layer Assessment
### Current State: Partially Abstracted
The codebase already has meaningful abstraction:
| Layer | Abstraction Level | Notes |
| ---------------------- | ----------------------------------------------- | ---------------------------------------------------------------------------- |
| **Cosmos DB** | **Partial**`@bytelyst/cosmos` package | Application code still writes raw SQL queries and uses `@azure/cosmos` types |
| **Blob Storage** | **Good**`@bytelyst/blob` package | Thin wrapper, easy to swap internals |
| **OpenAI/LLM** | **Good** — MindLyst has provider auto-detection | LysnrAI desktop/backend hardcodes `AzureOpenAI` |
| **Key Vault** | **Excellent** — graceful fallback to env vars | Already cloud-agnostic in practice |
| **Speech** | **None** — raw SDK usage | Deep Azure SDK coupling in 3 files |
| **Auth (JWT)** | **Excellent** — uses `jose` library | No cloud dependency |
| **Push notifications** | **Good** — platform-service abstraction | Swap provider client only |
### What's Missing: Repository Interface Pattern
The biggest gap is that repository files directly use `@azure/cosmos` types and SQL query syntax. To make the DB layer swappable, you'd need:
```typescript
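// Proposed: packages/cosmos/src/repository.ts

// Provider-neutral query options (assumed shape; the proposal does not define it yet).
export interface QueryOptions {
  limit?: number;
  offset?: number;
  orderBy?: { field: string; direction: "asc" | "desc" };
}

// Every backend (Cosmos today, MongoDB or PostgreSQL later) implements this contract.
export interface DocumentRepository<T> {
  findById(id: string, partitionKey: string): Promise<T | null>;
  findMany(filter: Record<string, unknown>, opts?: QueryOptions): Promise<T[]>;
  create(doc: T): Promise<T>;
  replace(id: string, doc: T, partitionKey: string): Promise<T>;
  upsert(doc: T): Promise<T>;
  delete(id: string, partitionKey: string): Promise<void>;
  count(filter: Record<string, unknown>): Promise<number>;
}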
```
This would allow swapping Cosmos → MongoDB → PostgreSQL behind the interface without touching 56+ repository files.
**Effort to add:** 1-2 weeks. This is the **highest-ROI investment** regardless of migration decision.
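Here is a minimal in-memory implementation of the interface, which also serves as a test double. It redeclares the proposed types so the block is self-contained. The `QueryOptions` fields are an assumed shape, and restricting filters to equality matches is a simplifying assumption. A `CosmosDocumentRepository` or future `MongoDocumentRepository` would implement the same methods against the real SDKs. As in Cosmos, lookups are keyed by partition key plus `id`, so a point read with the wrong partition key returns nothing.

```typescript
interface QueryOptions {
  limit?: number;
  offset?: number;
  orderBy?: { field: string; direction: "asc" | "desc" };
}

interface DocumentRepository<T> {
  findById(id: string, partitionKey: string): Promise<T | null>;
  findMany(filter: Record<string, unknown>, opts?: QueryOptions): Promise<T[]>;
  create(doc: T): Promise<T>;
  replace(id: string, doc: T, partitionKey: string): Promise<T>;
  upsert(doc: T): Promise<T>;
  delete(id: string, partitionKey: string): Promise<void>;
  count(filter: Record<string, unknown>): Promise<number>;
}

type Doc = { id: string } & Record<string, unknown>;

class InMemoryDocumentRepository<T extends Doc> implements DocumentRepository<T> {
  private readonly items = new Map<string, T>();

  // pkField names the partition key property, mirroring a container's PK path (e.g. "userId").
  constructor(private readonly pkField: keyof T & string) {}

  private key(id: string, partitionKey: string): string {
    return `${partitionKey}\u0000${id}`;
  }

  private keyOf(doc: T): string {
    return this.key(doc.id, String(doc[this.pkField]));
  }

  // Equality-only filter matching (simplifying assumption).
  private matches(doc: T, filter: Record<string, unknown>): boolean {
    return Object.entries(filter).every(([k, v]) => (doc as Doc)[k] === v);
  }

  async findById(id: string, partitionKey: string): Promise<T | null> {
    return this.items.get(this.key(id, partitionKey)) ?? null;
  }

  async findMany(filter: Record<string, unknown>, opts: QueryOptions = {}): Promise<T[]> {
    const results = [...this.items.values()].filter((d) => this.matches(d, filter));
    const order = opts.orderBy;
    if (order) {
      const sign = order.direction === "desc" ? -1 : 1;
      results.sort((a, b) => {
        const av = (a as Doc)[order.field] as string | number;
        const bv = (b as Doc)[order.field] as string | number;
        return av < bv ? -sign : av > bv ? sign : 0;
      });
    }
    const offset = opts.offset ?? 0;
    return results.slice(offset, opts.limit === undefined ? undefined : offset + opts.limit);
  }

  async create(doc: T): Promise<T> {
    const k = this.keyOf(doc);
    if (this.items.has(k)) throw new Error(`Conflict: ${doc.id} already exists`);
    this.items.set(k, doc);
    return doc;
  }

  async replace(id: string, doc: T, partitionKey: string): Promise<T> {
    const k = this.key(id, partitionKey);
    if (!this.items.has(k)) throw new Error(`Not found: ${id}`);
    this.items.set(k, doc);
    return doc;
  }

  async upsert(doc: T): Promise<T> {
    this.items.set(this.keyOf(doc), doc);
    return doc;
  }

  async delete(id: string, partitionKey: string): Promise<void> {
    this.items.delete(this.key(id, partitionKey));
  }

  async count(filter: Record<string, unknown>): Promise<number> {
    return (await this.findMany(filter)).length;
  }
}
```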
---
## 9. Risk Analysis
### 9.1 Risks of Staying on Azure
| Risk | Likelihood | Impact | Mitigation |
| -------------------------------- | ---------- | ------ | ---------------------------------------------------------- |
| Azure pricing increases | Low | Medium | Add abstraction layer for future portability |
| Azure outage | Low | High | Multi-region already possible (Cosmos global distribution) |
| Feature stagnation | Very Low | Low | Azure is investing heavily in AI services |
| Vendor lock-in deepens over time | Medium | Medium | Add abstraction layers proactively |
### 9.2 Risks of Migrating
| Risk | Likelihood | Impact | Mitigation |
| --------------------------------------- | ---------- | -------- | ---------------------------------------------------------- |
| Data loss during migration | Low | Critical | Test migration on staging first, keep Azure as backup |
| Query performance differences | Medium | Medium | Benchmark before committing |
| Feature gaps in new provider | Medium | Medium | Prototype critical features first |
| Wasted engineering time | Medium | High | Only migrate if there's a clear business driver |
| Regression bugs in 56+ repository files | High | Medium | Comprehensive test suite (1,029 tests) catches most issues |
| Speech quality degradation | Medium | High | A/B test both providers before committing |
### 9.3 Azure-Specific Lock-in Risks (ranked)
| # | Component | Lock-in Level | Escape Hatch |
| --- | ------------------------------------------- | ------------- | ---------------------------------------------------------- |
| 1 | **Cosmos DB SQL API** | High | Rewrite queries to MongoDB MQL or add repository interface |
| 2 | **Azure Speech SDK (streaming)** | High | Google STT has comparable streaming API |
| 3 | **Azure Identity (DefaultAzureCredential)** | Medium | Only used by Key Vault, which is already optional |
| 4 | **Blob Storage SAS tokens** | Low | Pre-signed URLs are equivalent across all providers |
| 5 | **Azure OpenAI** | Very Low | OpenAI SDK works with both — 1-line config change |
| 6 | **Key Vault** | Very Low | Already has env var fallback |
| 7 | **Notification Hubs** | Very Low | Not deeply integrated yet |
| 8 | **Application Insights** | None | Custom telemetry already built |
---
## 10. Recommendations
### Recommended Strategy: **Stay on Azure + Invest in Abstraction** (Scenario D)
This is the highest-scoring approach. Here's the prioritized action plan:
#### Phase 1: Add Repository Interface (1-2 weeks)
- Create `DocumentRepository<T>` interface in `@bytelyst/cosmos`
- Implement `CosmosDocumentRepository<T>` that wraps current `@azure/cosmos` calls
- Gradually migrate the 56 repository files to use the interface
- This makes future DB migration a matter of implementing `MongoDocumentRepository<T>` — no application code changes needed
#### Phase 2: Normalize LLM Abstraction (2-3 days)
- Move LysnrAI desktop/backend from `AzureOpenAI` → auto-detecting provider pattern (like MindLyst web already does)
- Support `OPENAI_PROVIDER=azure|openai|gemini` across all repos
- This makes LLM provider swappable via config
#### Phase 3: Speech Abstraction Layer (1 week, optional)
- Create `SpeechTranscriber` protocol/interface
- Implement `AzureSpeechTranscriber` (current code, extracted)
- Prepare `GoogleSpeechTranscriber` stub for future use
- This is lower priority since Azure Speech F0 tier is free
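The Phase 3 interface is sketched below in TypeScript for illustration; the real implementations would be Python (desktop) and Swift (iOS). The event names are hypothetical. They map onto the Azure callbacks listed in 5.2:

- `recognizing` → partial
- `recognized` → final
- `canceled` → error
- `session_stopped` → stopped

A scripted fake is included so the transcript pipeline can be tested without any cloud SDK.

```typescript
// Provider-neutral events (hypothetical names), annotated with the Azure callback each replaces.
type TranscriberEvent =
  | { kind: "partial"; text: string }  // Azure: recognizing
  | { kind: "final"; text: string }    // Azure: recognized
  | { kind: "error"; message: string } // Azure: canceled
  | { kind: "stopped" };               // Azure: session_stopped

interface SpeechTranscriber {
  // Start a streaming session; phrases and languages cover custom vocabulary
  // and auto-language detection.
  start(onEvent: (event: TranscriberEvent) => void, opts?: { phrases?: string[]; languages?: string[] }): Promise<void>;
  // Push raw audio (for example 16 kHz, 16-bit mono PCM) into the stream.
  pushAudio(chunk: Uint8Array): void;
  stop(): Promise<void>;
}

// Scripted fake: each pushed chunk yields one partial and one final result.
class ScriptedTranscriber implements SpeechTranscriber {
  private listener: ((e: TranscriberEvent) => void) | null = null;
  private index = 0;

  constructor(private readonly script: string[]) {}

  async start(onEvent: (e: TranscriberEvent) => void): Promise<void> {
    this.listener = onEvent;
    this.index = 0;
  }

  pushAudio(_chunk: Uint8Array): void {
    if (!this.listener || this.index >= this.script.length) return;
    const text = this.script[this.index++];
    this.listener({ kind: "partial", text: text.slice(0, Math.ceil(text.length / 2)) });
    this.listener({ kind: "final", text });
  }

  async stop(): Promise<void> {
    this.listener?.({ kind: "stopped" });
    this.listener = null;
  }
}

// Consumer code depends only on the interface, not on a provider SDK.
async function transcribe(t: SpeechTranscriber, chunks: Uint8Array[]): Promise<string> {
  const finals: string[] = [];
  await t.start((e) => {
    if (e.kind === "final") finals.push(e.text);
  });
  for (const c of chunks) t.pushAudio(c);
  await t.stop();
  return finals.join(" ");
}
```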
#### Phase 4: Document Decision Criteria for Future Migration
- Define triggers that would justify migration (e.g., cost > $X/month, Azure outage > Y hours, need for feature Z)
- Review annually
### Why NOT Migrate Now
1. **Cost is negligible** — ~$10-20/month doesn't justify weeks of engineering
2. **No business driver** — Azure isn't blocking any feature development
3. **Risk/reward is unfavorable** — 4-8 weeks of migration work for ~$0 cost savings
4. **Test coverage is good but not perfect** — 1,029 tests cover most paths, but query-level changes in 56 files still risk regressions
5. **Azure free tiers are generous** — Speech F0, Notification Hubs Free, App Insights free tier
### When Migration WOULD Make Sense
- **Cosmos DB costs exceed $100/month** → Consider MongoDB Atlas Serverless
- **Azure Speech quality is insufficient** → Evaluate Google STT or Deepgram
- **Enterprise customer requires specific cloud** → Build the repository interface, then implement their cloud backend
- **Azure has extended outage affecting your region** → Multi-region or multi-cloud
- **You want to go fully open-source** → PostgreSQL (Supabase) + Whisper + MinIO (significant rewrite)
---
## 11. Migration Playbook (If Chosen)
If you decide to migrate in the future, here's the execution order (shortest critical path):
### Week 1-2: Database Abstraction
1. Create `DocumentRepository<T>` interface
2. Implement `CosmosDocumentRepository<T>` (wraps current code)
3. Migrate all 56 repository files to use interface
4. Verify all 1,029 tests pass
### Week 3-4: Database Migration (Cosmos → MongoDB)
1. Implement `MongoDocumentRepository<T>`
2. Set up MongoDB Atlas Serverless cluster
3. Write data migration script (Cosmos → MongoDB)
4. Run migration on staging, verify data integrity
5. Switch repository implementation via config flag
6. Run full test suite against MongoDB
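Step 5 amounts to a factory that picks the implementation at startup. Below is a minimal sketch built on two assumptions: a hypothetical `DB_BACKEND` environment variable (`cosmos` or `mongo`), and injected factory functions that stand in for `CosmosDocumentRepository<T>` and `MongoDocumentRepository<T>`. An unknown value fails fast instead of silently running against the wrong database.

```typescript
type Backend = "cosmos" | "mongo";

// Hypothetical factories standing in for CosmosDocumentRepository / MongoDocumentRepository.
interface RepositoryFactories<R> {
  cosmos: (container: string) => R;
  mongo: (collection: string) => R;
}

function parseBackend(value: string | undefined): Backend {
  const v = (value ?? "cosmos").trim().toLowerCase(); // unset: keep current behaviour
  if (v === "cosmos" || v === "mongo") return v;
  throw new Error(`Unsupported DB_BACKEND: ${value}`);
}

function createRepository<R>(
  name: string,
  env: Record<string, string | undefined>,
  factories: RepositoryFactories<R>,
): R {
  return factories[parseBackend(env.DB_BACKEND)](name);
}
```

Defaulting to `cosmos` keeps today's behaviour when the variable is unset, and rolling back is a config change rather than a deploy.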
### Week 5: Storage + Secrets
1. Swap `@bytelyst/blob` internals to S3-compatible client
2. Migrate blobs (azcopy → aws s3 sync or similar)
3. Replace Key Vault with new secrets manager (or just env vars)
4. Update all environment variable names
### Week 6: LLM + Speech (if needed)
1. Switch OpenAI from Azure endpoint to direct (config change only)
2. If migrating Speech: rewrite `azure_stt.py` and Swift `AzureSpeechTranscriber`
3. A/B test new speech provider against Azure
### Week 7-8: Cleanup + Verification
1. Remove all `@azure/*` npm packages
2. Remove all `azure-*` pip packages
3. Update Docker configs, CI/CD
4. Update documentation
5. Monitor production for 2 weeks
---
## Appendix A: File-Level Azure Dependency Map
### TypeScript — `@azure/cosmos` (CRITICAL)
| File | Repo | Direct Import |
| ------------------------------------------------------------------- | ----------- | ---------------------- |
| `packages/cosmos/src/client.ts` | common-plat | `@azure/cosmos` |
| `packages/cosmos/src/containers.ts` | common-plat | `@azure/cosmos` |
| `services/platform-service/src/modules/*/repository.ts` (56 files) | common-plat | Via `@bytelyst/cosmos` |
| `services/extraction-service/src/modules/*/repository.ts` (2 files) | common-plat | Via `@bytelyst/cosmos` |
| `dashboards/admin-web/src/lib/cosmos.ts` | common-plat | `@azure/cosmos` |
| `dashboards/admin-web/src/lib/repositories/*.ts` (4 files) | common-plat | Via cosmos.ts |
| `mindlyst-native/web/src/lib/cosmos.ts` | MindLyst | `@azure/cosmos` |
### TypeScript — `@azure/storage-blob`
| File | Repo | Direct Import |
| --------------------------- | ----------- | --------------------- |
| `packages/blob/src/blob.ts` | common-plat | `@azure/storage-blob` |
### TypeScript — `@azure/identity` + `@azure/keyvault-secrets`
| File | Repo | Direct Import |
| ------------------------------------------------------- | ----------- | ------------------------- |
| `packages/config/src/keyvault.ts` | common-plat | Dynamic import (both) |
| `dashboards/admin-web/src/app/api/ops/secrets/route.ts` | common-plat | Both (Secrets Manager UI) |
### Python — Azure SDKs
| File | Repo | SDK |
| -------------------------------------- | ------- | ------------------------------------------ |
| `src/audio/azure_stt.py` | LysnrAI | `azure.cognitiveservices.speech` |
| `src/cloud/cosmos_client.py` | LysnrAI | `azure.cosmos` |
| `src/cloud/blob_client.py` | LysnrAI | `azure.storage.blob` |
| `src/secrets/keyvault.py` | LysnrAI | `azure.identity`, `azure.keyvault.secrets` |
| `backend/src/secrets/keyvault.py` | LysnrAI | `azure.identity`, `azure.keyvault.secrets` |
| `backend/src/cloud/cosmos.py` | LysnrAI | `azure.cosmos` |
| `src/llm/text_cleaner.py` | LysnrAI | `openai.AzureOpenAI` |
| `backend/src/clients/openai_client.py` | LysnrAI | `openai.AsyncAzureOpenAI` |
### Swift — Azure Speech SDK
| File | Repo | SDK |
| ---------------------------------------------------- | -------- | ---------------------------------- |
| `iosApp/Services/AzureSpeechTranscriber.swift` | MindLyst | `MicrosoftCognitiveServicesSpeech` |
| `LysnrAI/LysnrKeyboard/KeyboardViewController.swift` | LysnrAI | SPX framework (via CocoaPods) |
---
## Appendix B: SDK & Package Inventory
### npm packages (TypeScript)
| Package | Version | Used By | Swappable |
| ------------------------- | ------- | ------------------------------------------- | ---------------------- |
| `@azure/cosmos` | ≥4.0.0 | `@bytelyst/cosmos`, admin-web, MindLyst web | Medium (query rewrite) |
| `@azure/storage-blob` | ≥12.0.0 | `@bytelyst/blob` | Easy (S3 compat) |
| `@azure/identity` | latest | `@bytelyst/config`, admin-web secrets | Easy (remove) |
| `@azure/keyvault-secrets` | latest | `@bytelyst/config`, admin-web secrets | Easy (remove) |
### pip packages (Python)
| Package | Version | Used By | Swappable |
| -------------------------------- | -------- | ---------------------------------- | --------------------------- |
| `azure-cognitiveservices-speech` | ≥1.42.0 | Desktop STT | Hard (deep SDK integration) |
| `azure-cosmos` | latest | Desktop + backend Cosmos client | Medium (pymongo swap) |
| `azure-storage-blob` | ≥12.24.0 | Desktop blob client | Easy (boto3 swap) |
| `azure-identity` | ≥1.19.0 | Key Vault auth | Easy (remove) |
| `azure-keyvault-secrets` | ≥4.9.0 | Secrets resolver | Easy (remove) |
| `openai` | ≥1.60.0 | `AzureOpenAI` / `AsyncAzureOpenAI` | Trivial (change class name) |
| `opencensus-ext-azure` | ≥1.1.0 | Optional telemetry | Trivial (remove) |
### Swift packages / CocoaPods
| Package | Used By | Swappable |
| ---------------------------------------- | ------------------------- | ------------------------------------- |
| `MicrosoftCognitiveServicesSpeech` (SPX) | LysnrAI iOS, MindLyst iOS | Hard (need alternative streaming STT) |
---
_Document generated by automated codebase analysis. Numbers are accurate as of 2026-03-01. Update as the codebase evolves._


@ -0,0 +1,186 @@
# Azure Connection Audit — Full Workspace Report
> **Date:** 2026-02-22
> **Scope:** `learning_ai_common_plat`, `learning_voice_ai_agent`, `learning_multimodal_memory_agents`, `learning_ai_clock`, `learning_ai_fastgap`
> **Auditor:** Cascade (AI)
---
## Executive Summary
| Category | Issues Found | Fixed (session 1) | Fixed (session 2) | Remaining |
| ---------------------- | ------------ | ----------------- | ----------------------------------------- | ------------------- |
| `x-request-id` missing | 12 clients | 2 (MindLyst) | **9** (root cause + feature-flags) | 0 ✅ |
| `x-product-id` missing | 6 clients | 0 | **6** (admin + user dashboards + Python) | 0 ✅ |
| Cosmos PK mismatch | 1 container | 0 (flagged) | 0 | 1 (needs migration) |
| `.env.example` gaps | 4 files | 1 (MindLyst) | **3** (ChronoMind, user-dash, admin-dash) | 0 ✅ |
| Hardcoded productId | 2 instances | 0 | **2** (telemetry.ts, platform_client.py) | 0 ✅ |
| Python client gaps | 1 file | 0 | **1** (headers + config) | 0 ✅ |
---
## 1. `x-request-id` Header — Root Cause
### Finding
**`@bytelyst/api-client` does NOT auto-inject `x-request-id`.**
The `createApiClient()` factory in `packages/api-client/src/client.ts` only sets `Content-Type`, auth token (via `getToken`), and caller-supplied `defaultHeaders`. No `x-request-id` is generated. This means **every consumer** that relies on `@bytelyst/api-client` without explicitly adding the header is missing request tracing.
### Root Cause Fix
Add `x-request-id: crypto.randomUUID()` to `buildHeaders()` in `packages/api-client/src/client.ts`. This single change propagates to all consumers automatically.
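Below is a minimal sketch of the root-cause fix. The real `buildHeaders()` signature in `packages/api-client/src/client.ts` isn't shown here, so the parameters are assumptions. What matters is the ordering: caller-supplied `defaultHeaders` and per-request headers are merged first, and an ID is generated only if none was supplied. That way a service propagating an inbound request ID keeps it.

The sketch imports Node's `randomUUID`; in browser bundles the global `crypto.randomUUID()` is the equivalent.

```typescript
import { randomUUID } from "node:crypto";

interface BuildHeadersOptions {
  token?: string | null;                   // from getToken()
  defaultHeaders?: Record<string, string>; // createApiClient config, e.g. { "x-product-id": "lysnrai" }
  requestHeaders?: Record<string, string>; // per-call overrides
}

// Header names are case-insensitive, so compare lowercased.
function hasHeader(headers: Record<string, string>, name: string): boolean {
  const target = name.toLowerCase();
  return Object.keys(headers).some((k) => k.toLowerCase() === target);
}

function buildHeaders(opts: BuildHeadersOptions = {}): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    ...opts.defaultHeaders,
    ...opts.requestHeaders,
  };
  if (opts.token) headers.Authorization = `Bearer ${opts.token}`;
  // Generate a request ID only when the caller didn't supply one.
  if (!hasHeader(headers, "x-request-id")) {
    headers["x-request-id"] = randomUUID();
  }
  return headers;
}
```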
### Affected Clients (missing `x-request-id`)
| Repo | File | Client Pattern |
| ---------------- | -------------------------------------------------- | ------------------------------------- |
| `common_plat` | `dashboards/admin-web/src/lib/billing-client.ts` | `createApiClient` — no `x-request-id` |
| `common_plat` | `dashboards/admin-web/src/lib/growth-client.ts` | `createApiClient` — no `x-request-id` |
| `common_plat` | `dashboards/admin-web/src/lib/platform-client.ts` | `createApiClient` — no `x-request-id` |
| `common_plat` | `dashboards/tracker-web/src/lib/tracker-client.ts` | `createApiClient` — no `x-request-id` |
| `common_plat` | `packages/extraction/src/client.ts` | `createApiClient` — no `x-request-id` |
| `voice_ai_agent` | `user-dashboard-web/src/lib/billing-client.ts` | `createApiClient` — no `x-request-id` |
| `voice_ai_agent` | `user-dashboard-web/src/lib/growth-client.ts` | `createApiClient` — no `x-request-id` |
| `voice_ai_agent` | `user-dashboard-web/src/lib/platform-client.ts` | `createApiClient` — no `x-request-id` |
| `voice_ai_agent` | `user-dashboard-web/src/lib/feature-flags.ts` | Custom `fetch` — no `x-request-id` |
| `voice_ai_agent` | `backend/src/clients/platform_client.py` | `httpx` — no `x-request-id` |
### Already Fixed (previous session)
| Repo | File | Status |
| ------------------- | ------------------------------- | ----------------------------- |
| `multimodal_memory` | `web/src/lib/billing-client.ts` | ✅ Added via `defaultHeaders` |
| `multimodal_memory` | `web/src/lib/feature-flags.ts` | ✅ Added manually |
### Already Correct
| Repo | File | Status |
| ----------------------- | ------------------------------------------ | ------------------------------------------- |
| `ai_fastgap` (NomGap) | `src/api/client.ts` | ✅ Custom client with `crypto.randomUUID()` |
| `ai_clock` (ChronoMind) | `web/src/lib/platform-sync.ts` | ✅ Custom client with `crypto.randomUUID()` |
| `voice_ai_agent` | `backend/src/main.py` | ✅ Middleware propagates/generates |
| `voice_ai_agent` | `backend/src/clients/extraction_client.py` | ✅ Passes `request_id` param |
---
## 2. `x-product-id` Header Gaps
### Clients Missing `x-product-id`
| Repo | File | Impact |
| ---------------- | ----------------------------------------------- | --------------------------------- |
| `common_plat` | `admin-web/src/lib/billing-client.ts` | Server can't filter by product |
| `common_plat` | `admin-web/src/lib/growth-client.ts` | Server can't filter by product |
| `voice_ai_agent` | `user-dashboard-web/src/lib/billing-client.ts` | Server can't filter by product |
| `voice_ai_agent` | `user-dashboard-web/src/lib/growth-client.ts` | Server can't filter by product |
| `voice_ai_agent` | `user-dashboard-web/src/lib/platform-client.ts` | Passes in body, not header |
| `voice_ai_agent` | `backend/src/clients/platform_client.py` | Passes in body/params, not header |
### Already Correct
| Repo | File |
| ------------------------------ | ------------------------------------------------------------- |
| `ai_fastgap` (NomGap)          | `src/api/client.ts` (`x-product-id: API_CONFIG.productId`)    |
| `ai_clock` (ChronoMind)        | `web/src/lib/platform-sync.ts` (`x-product-id` header)        |
| `multimodal_memory` (MindLyst) | `web/src/lib/billing-client.ts` — via `defaultHeaders` |
| `multimodal_memory` (MindLyst) | `web/src/lib/feature-flags.ts` — explicit header |
| `common_plat` | `tracker-web/src/lib/tracker-client.ts` — from `localStorage` |
---
## 3. Cosmos DB Partition Key Mismatch
### `referrals` Container — 3-way Mismatch
| Location | Partition Key |
| ----------------------------------------------------- | ------------- |
| `platform-service/src/lib/cosmos-init.ts` | `/id` |
| MindLyst `web/src/lib/cosmos.ts` | `/userId` |
| Admin dashboard `admin-web/src/lib/cosmos.ts` | `/referrerId` |
| User dashboard `user-dashboard-web/src/lib/cosmos.ts` | `/referrerId` |
**Status:** Flagged in previous session. Cannot be fixed without data migration. Comment added to `cosmos-init.ts`.
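**Risk:** The container has only one real partition key path, so clients that assume a different path send the wrong partition key value. Point reads (`item(id, pk).read()`) with a wrong value return 404 Not Found. Queries scoped to a wrong partition key value silently return incomplete or empty results. Queries without a partition key fan out across all partitions and stay correct, just more expensive.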
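Until the migration happens, a cheap guard is to check that every place declaring a container agrees on its partition key path, so a further divergent definition can't slip in. Below is a minimal sketch. It assumes the definitions have already been collected into `{ source, container, partitionKey }` records; extracting them from each repo is left out.

```typescript
interface ContainerDef {
  source: string;       // file that declares the container
  container: string;    // e.g. "referrals"
  partitionKey: string; // e.g. "/referrerId"
}

// For each container declared with more than one partition key path, map each
// path to the files that declare it. Consistent containers are omitted.
function findPartitionKeyMismatches(defs: ContainerDef[]): Map<string, Map<string, string[]>> {
  const byContainer = new Map<string, Map<string, string[]>>();
  for (const d of defs) {
    const paths = byContainer.get(d.container) ?? new Map<string, string[]>();
    paths.set(d.partitionKey, [...(paths.get(d.partitionKey) ?? []), d.source]);
    byContainer.set(d.container, paths);
  }
  const mismatches = new Map<string, Map<string, string[]>>();
  for (const [container, paths] of byContainer) {
    if (paths.size > 1) mismatches.set(container, paths);
  }
  return mismatches;
}
```

Fed the four `referrals` definitions from the table, this reports one container with three distinct paths. After the migration it should report nothing.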
---
## 4. Missing Environment Variables in `.env.example` Files
### ChronoMind `web/.env.example`
Currently only has:
```
NEXT_PUBLIC_PLATFORM_SERVICE_URL=http://localhost:4003/api
```
**Missing:**
- `NEXT_PUBLIC_PRODUCT_ID=chronomind` — used implicitly by `platform-sync.ts` (hardcoded there, but should be env-driven for consistency)
### LysnrAI `user-dashboard-web/.env.example`
**Missing:**
- `NEXT_PUBLIC_PRODUCT_ID=lysnrai` — referenced by `feature-flags.ts` line 10
- `NEXT_PUBLIC_PLATFORM_SERVICE_URL=http://localhost:4003` — referenced by `feature-flags.ts` line 11
Has `PLATFORM_SERVICE_URL` (server-side) but not the `NEXT_PUBLIC_` variant (client-side).
### LysnrAI root `.env.example`
**Missing:**
- `NEXT_PUBLIC_PRODUCT_ID` — not needed at root level (desktop app), so this is informational only.
### Admin dashboard `.env.example`
**Missing:**
- `AZURE_KEYVAULT_URL` — referenced by `instrumentation.ts` but not in `.env.example`
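Gaps like these can be caught in CI by comparing the keys in each `.env.example` against the variables the code reads. Below is a minimal sketch of the comparison step. It assumes the list of required names is maintained by hand (for example `NEXT_PUBLIC_PRODUCT_ID` and `NEXT_PUBLIC_PLATFORM_SERVICE_URL`). It only understands simple `KEY=value` lines, `#` comments, and an optional `export` prefix.

```typescript
// Collect the variable names declared in a .env-style file.
function envKeys(text: string): Set<string> {
  const keys = new Set<string>();
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const match = /^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=/.exec(line);
    if (match) keys.add(match[1]);
  }
  return keys;
}

// Required names that the example file does not declare.
function missingEnvKeys(exampleText: string, required: string[]): string[] {
  const declared = envKeys(exampleText);
  return required.filter((k) => !declared.has(k));
}
```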
---
## 5. Hardcoded `productId` Values
| Repo | File | Line | Value | Should Use |
| ------------------- | ---------------------------------------- | ------- | ----------------------------- | ------------------------------------ |
| `multimodal_memory` | `web/src/lib/telemetry.ts` | 19 | `productId: 'mindlyst'` | `process.env.NEXT_PUBLIC_PRODUCT_ID` |
| `voice_ai_agent` | `backend/src/clients/platform_client.py` | 86, 101 | `product_id: str = "lysnrai"` | `settings.PRODUCT_ID` or config |
---
## 6. Python Backend Client Gaps (`platform_client.py`)
The `PlatformClient` class in `backend/src/clients/platform_client.py` has several issues:
1. **No `x-request-id` header** on any request
2. **No `x-product-id` header** on any request
3. **Creates new `httpx.AsyncClient` per request** — no connection pooling
4. **Hardcoded `product_id="lysnrai"` defaults** — should use config
---
## 7. Previously Fixed (Session 1)
| Fix | Repo | File |
| ------------------------------------------- | ------------------- | -------------------------------------------------- |
| Added `x-request-id` to billing client | `multimodal_memory` | `web/src/lib/billing-client.ts` |
| Added `x-request-id` to feature flags | `multimodal_memory` | `web/src/lib/feature-flags.ts` |
| Added 13 MindLyst containers to cosmos-init | `common_plat` | `services/platform-service/src/lib/cosmos-init.ts` |
| Added Blob Storage creds to Python config | `voice_ai_agent` | `backend/src/config.py` |
| Added missing env vars to MindLyst | `multimodal_memory` | `web/.env.example` |
---
## 8. Recommended Fix Order
1. **P0 — Root cause:** Add `x-request-id` auto-generation to `@bytelyst/api-client` `buildHeaders()` → fixes 9 TS clients at once
2. **P0 — LysnrAI feature-flags:** Add `x-request-id` to the custom `fetch` call in `user-dashboard-web/src/lib/feature-flags.ts`
3. **P1 — Python backend:** Add `x-request-id` and `x-product-id` headers to `platform_client.py`
4. **P1 — Env vars:** Add missing `NEXT_PUBLIC_*` vars to ChronoMind, LysnrAI user-dashboard, admin-dashboard `.env.example` files
5. **P2 — `x-product-id`:** Add to admin/user dashboard clients via `defaultHeaders` in `createApiClient` config
6. **P2 — Hardcoded productId:** Replace in `telemetry.ts` and `platform_client.py`
7. **P3 — Referrals PK mismatch:** Requires data migration strategy (separate task)


@ -544,21 +544,21 @@ The following gaps were identified by scanning every import in the actual codeba
## Summary
| Phase     | Packages                                         | Tasks   | Done    | Status                                                                   |
| --------- | ------------------------------------------------ | ------- | ------- | ------------------------------------------------------------------------ |
| **0**     | Repo scaffolding + branching + rollback strategy | 14      | 14      | ✅ Complete                                                              |
| **1A**    | `@bytelyst/errors`                               | 23      | 22      | ✅ Complete (Docker verify pending)                                      |
| **1B**    | `@bytelyst/cosmos`                               | 33      | 32      | ✅ Complete (Docker verify pending)                                      |
| **2A**    | `@bytelyst/config` (34 files to rewire)          | 25      | 24      | ✅ Complete (Docker verify pending)                                      |
| **2B**    | `@bytelyst/auth` (20+ admin routes affected)     | 29      | 29      | ✅ Complete (25 tests, tracker migrated)                                 |
| **2C**    | `@bytelyst/fastify-core`                         | 24      | 22      | ✅ Services refactored, health-check verified (Docker pending)           |
| **3A**    | `@bytelyst/api-client`                           | 17      | 17      | ✅ Complete                                                              |
| **3B**    | `@bytelyst/react-auth` (24 consumer files)       | 28      | 25      | ✅ Admin uses factory; user/tracker keep custom                          |
| **4**     | `@bytelyst/design-tokens` (4 platforms)          | 24      | 23      | ✅ CSS synced to MindLyst; CONTRIBUTING updated; visual verify pending   |
| **5**     | CI/CD + Docker (pre-copy strategy)               | 23      | 23      | ✅ Docker build + compose up verified on home network                    |
| **6**     | Verification + docs + cleanup                    | 28      | 25      | ⚠️ Remaining E2E: admin + user portal flows                              |
| **7**     | Future enhancements (+testing pkg)               | 10      | 3       | 🔲 @bytelyst/testing (10 tests) + token pre-commit hook + AGENTS updated |
| **Total** | **10 packages (+1 bonus: logger)** | **278** | **259** | **~93% complete** |

### Bonus Package (not in original roadmap)

@@ -11,13 +11,13 @@

## Why Consolidate

| Problem | Impact |
| ---------------------------------------- | ---------------------------------------------- |
| 5 separate Node processes for 2 products | Unnecessary operational overhead |
| 5 ports to manage (4001-4005) | Complex docker-compose, run scripts, env files |
| 5 separate Cosmos connections | Wasted connection pool resources |
| 5 CI pipelines | Slow feedback, more config to maintain |
| 5 config schemas with duplicate env vars | Inconsistent config, easy to miss vars |

**After consolidation:** 2 services — `platform-service` (port 4003) + `extraction-service` (port 4005)

@@ -31,12 +31,12 @@

Services export product ID differently — modules reference different names:

| Service | Export Name | Source |
| -------------------- | -------------------- | ---------------------------------------------------------------------------- |
| **platform-service** | `PRODUCT_ID` | `loadProductIdentity().productId` from `@bytelyst/config` |
| **growth-service** | `PRODUCT_ID` | same as platform ✅ |
| **billing-service** | `PRODUCT_ID` | same as platform ✅ |
| **tracker-service** | `DEFAULT_PRODUCT_ID` | `process.env.DEFAULT_PRODUCT_ID \|\| getProductId()`; **different name** ⚠️ |

**Fix:** When merging tracker modules, change all `DEFAULT_PRODUCT_ID` imports to `PRODUCT_ID` in the copied module files, and add `DEFAULT_PRODUCT_ID` env var support to platform-service's `product-config.ts` for backward compatibility.
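
A minimal sketch of the backward-compatible product ID resolution for `product-config.ts`. It assumes the legacy `DEFAULT_PRODUCT_ID` env var should keep overriding the product identity, as it does in tracker-service today. `resolveProductId`, `buildProductExports`, and the `identityProductId` parameter (a stand-in for `loadProductIdentity().productId`) are hypothetical names.

```typescript
// Hypothetical sketch for platform-service's product-config.ts.
// `identityProductId` stands in for loadProductIdentity().productId.
type Env = Record<string, string | undefined>;

/** Legacy DEFAULT_PRODUCT_ID override first, then the shared product identity. */
export function resolveProductId(env: Env, identityProductId: string): string {
  const legacy = env.DEFAULT_PRODUCT_ID?.trim();
  return legacy ? legacy : identityProductId;
}

/** Both export names resolve to the same value, so copied tracker modules keep compiling. */
export function buildProductExports(env: Env, identityProductId: string) {
  const PRODUCT_ID = resolveProductId(env, identityProductId);
  return { PRODUCT_ID, DEFAULT_PRODUCT_ID: PRODUCT_ID };
}
```

In the real module this would run once at startup with `process.env`. After every copied tracker module imports `PRODUCT_ID`, the `DEFAULT_PRODUCT_ID` alias can be dropped.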

@@ -44,15 +44,16 @@ Services export product ID differently — modules reference different names:

Platform-service `package.json` is **missing** these deps needed by merged modules:

| Dep | Needed By | Currently In |
| ------------------------------- | ------------------------------------------- | ------------------------------- |
| `stripe` (^17.5.0) | billing modules (stripe webhooks, checkout) | billing-service, growth-service |
| `@bytelyst/auth` (workspace:\*) | tracker modules (`extractAuth`) | tracker-service |
| `@fastify/rate-limit` (^10.3.0) | tracker rate limiting | tracker-service |

### Gap 3: Billing Internal Key Auth (Global Hook)

`billing-service/src/server.ts` has a **global** `onRequest` hook:

```typescript
app.addHook('onRequest', async (req, reply) => {
  if (path === '/health' || path.includes('/stripe/webhook')) return;
@@ -60,6 +61,7 @@ app.addHook('onRequest', async (req, reply) => {
  if (key !== INTERNAL_KEY) reply.code(401).send(...)
});
```

This **cannot** remain a global hook after the merge: it would also block the auth, audit, tracker, and other non-billing routes.

**Fix:** Convert it to a Fastify plugin registered only on the billing route prefixes, or add the `x-internal-key` check inside each billing route handler.
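
This sketch keeps the key check away from non-billing routes. It has no framework dependency, so the scoping logic is easy to test. The prefix list comes from the route-prefix inventory in this plan. The optional `/api` base path, the `checkInternalKey` name, and failing closed when no key is configured are assumptions. In Fastify, the body of this function would sit in an `onRequest` hook added inside an encapsulated plugin that registers only the billing routes.

```typescript
// Hypothetical, framework-agnostic sketch of the billing internal-key check.
// Prefixes come from this plan's route inventory; the optional "/api" base
// path and failing closed on an empty configured key are assumptions.
type HeaderMap = Record<string, string | string[] | undefined>;

const BILLING_PREFIXES = ["/subscriptions", "/usage", "/plans", "/licenses", "/payments", "/stripe"];

function normalise(path: string): string {
  const noQuery = path.split("?")[0];
  return noQuery.startsWith("/api/") ? noQuery.slice("/api".length) : noQuery;
}

function isBillingPath(path: string): boolean {
  // "/plans" and "/plans/..." match; "/plansx" does not.
  return BILLING_PREFIXES.some((p) => path === p || path.startsWith(p + "/"));
}

/** Returns 401 to reject the request, or null to let it through. */
export function checkInternalKey(rawPath: string, headers: HeaderMap, internalKey: string): 401 | null {
  const path = normalise(rawPath);
  if (!isBillingPath(path)) return null; // auth, audit, tracker, ... are untouched
  if (path.startsWith("/stripe/webhook")) return null; // Stripe calls carry no internal key
  const header = headers["x-internal-key"];
  const key = Array.isArray(header) ? header[0] : header;
  return internalKey !== "" && key === internalKey ? null : 401;
}
```

Registering the real check inside an encapsulated plugin achieves the same scoping structurally. Routes outside the plugin never see the hook, so the path test above becomes unnecessary.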

@@ -67,6 +69,7 @@ This **cannot** be a global hook after merge — it would block auth, audit, tra

### Gap 4: Growth Webhooks Library

`growth-service/src/lib/webhooks.ts` dispatches fire-and-forget HTTP callbacks when an invitation is redeemed. It references these env vars:

- `WEBHOOK_INVITATION_REDEEMED_URL`
- `WEBHOOK_REFERRAL_STATUS_URL`
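
Below is a minimal sketch of a fire-and-forget dispatcher like the one described above. It assumes a JSON POST body. The payload shape, the event name, and the injected `fetchImpl` (injected so the sketch runs without network access) are hypothetical. A caller would pass, for example, `process.env.WEBHOOK_INVITATION_REDEEMED_URL` as `url`.

```typescript
// Hypothetical fire-and-forget webhook dispatcher. The payload shape is an
// assumption; fetchImpl is injected (Node 18+'s global fetch should fit it).
export type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<unknown>;

/** Starts the POST and returns immediately; true if a request was started. */
export function dispatchWebhook(
  url: string | undefined,
  event: string,
  payload: unknown,
  fetchImpl: FetchLike,
  onError: (err: unknown) => void = () => {},
): boolean {
  if (!url) return false; // env var unset: this webhook is disabled
  const init = {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ event, payload, sentAt: new Date().toISOString() }),
  };
  try {
    // Deliberately not awaited: the caller's response must not wait on the receiver.
    fetchImpl(url, init).catch(onError);
  } catch (err) {
    onError(err); // a synchronous throw must not escape either
  }
  return true;
}
```

Because the function returns before the request settles, a slow or failing receiver cannot delay or fail the redeem request. The trade-off is that delivery is not guaranteed.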

@@ -82,26 +85,26 @@ Growth-service config requires `STRIPE_SECRET_KEY` as **required** (not optional

**Dashboard API clients (TypeScript):**

| File | Current Env Var | Current Default |
| -------------------------------------------------------------- | --------------------- | ---------------------------------- |
| `admin-dashboard-web/src/lib/billing-client.ts` | `BILLING_SERVICE_URL` | `http://localhost:4002` |
| `admin-dashboard-web/src/lib/growth-client.ts` | `GROWTH_SERVICE_URL` | `http://localhost:4001` |
| `user-dashboard-web/src/lib/billing-client.ts` | `BILLING_SERVICE_URL` | `http://localhost:4002` |
| `user-dashboard-web/src/lib/growth-client.ts` | `GROWTH_SERVICE_URL` | `http://localhost:4001` |
| `user-dashboard-web/src/app/api/stripe/webhook/route.ts` | `BILLING_SERVICE_URL` | `http://localhost:4002` |
| `admin-dashboard-web/src/app/api/stripe/config/route.ts` | — | `http://localhost:4002` inline |
| `admin-dashboard-web/src/lib/stripe-context.tsx` | — | `http://localhost:4002` (3 places) |
| `tracker-dashboard-web/src/app/api/tracker/[...path]/route.ts` | `TRACKER_API_URL` | `http://localhost:4004` |
| `tracker-dashboard-web/src/app/api/auth/login/route.ts` | `PLATFORM_API_URL` | `http://localhost:4003` ✅ |
| `tracker-dashboard-web/src/app/api/auth/me/route.ts` | `PLATFORM_API_URL` | `http://localhost:4003` ✅ |

**Python clients (desktop + backend):**

| File | Current Env Var | Current Default |
| --------------------------------------- | --------------------- | ----------------------- |
| `backend/src/clients/billing_client.py` | `BILLING_SERVICE_URL` | `http://localhost:4002` |
| `src/cloud/api_sync.py` | `BILLING_SERVICE_URL` | `http://localhost:4002` |
| `src/cloud/plan_resolver.py` | `BILLING_SERVICE_URL` | `http://localhost:4002` |

All these must change to `PLATFORM_SERVICE_URL` / `http://localhost:4003`.
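
This sketch shows a shared helper the TypeScript clients could use after the rename. The default matches the target port above. It warns, rather than silently falling back, when a legacy variable is still set. That choice is an assumption, made because a stale `BILLING_SERVICE_URL` would still point at the retired port 4002. The function name is hypothetical.

```typescript
// Hypothetical helper for the consolidated platform URL (name is illustrative).
type Env = Record<string, string | undefined>;

const DEFAULT_PLATFORM_URL = "http://localhost:4003";
// Pre-merge variables that should no longer appear in any env file.
const RETIRED_VARS = ["BILLING_SERVICE_URL", "GROWTH_SERVICE_URL", "TRACKER_API_URL"];

export function resolvePlatformServiceUrl(env: Env, warn: (msg: string) => void = () => {}): string {
  for (const name of RETIRED_VARS) {
    // Warn instead of falling back: a stale value would still target a retired port.
    if (env[name]) warn(`${name} is no longer read; set PLATFORM_SERVICE_URL instead`);
  }
  const url = env.PLATFORM_SERVICE_URL || DEFAULT_PLATFORM_URL;
  return url.replace(/\/+$/, ""); // trim trailing slashes so `${base}/path` joins cleanly
}
```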

@@ -112,10 +115,12 @@ All these must change to `PLATFORM_SERVICE_URL` / `http://localhost:4003`.

### Gap 8: Stripe Webhook Test Hardcodes Port

`user-dashboard-web/src/__tests__/stripe-webhook.test.ts` sets:

```typescript
process.env.BILLING_SERVICE_URL = 'http://localhost:4002';
expect(url).toBe('http://localhost:4002/api/stripe/webhook');
```

Both values must be updated to port 4003.

### Gap 9: Load Test Scripts

@@ -133,6 +138,7 @@ Must update defaults to port 4003.

### Gap 11: LysnrAI Services Stubs

`learning_voice_ai_agent/services/` contains `.env.example` stubs for each service:

- `services/billing-service/.env.example`
- `services/growth-service/.env.example`
- `services/tracker-service/.env.example`

@@ -154,6 +160,7 @@ Mobile apps call the Python backend (`localhost:8000`), which calls billing-serv

### Gap 14: Docker Compose `depends_on` for Tracker Dashboard

`learning_voice_ai_agent/docker-compose.yml` has:

```yaml
tracker-dashboard:
  depends_on:
@@ -162,17 +169,23 @@ tracker-dashboard:
    platform-service:
      condition: service_started
```

After the merge, the `tracker-service` container no longer exists, so `depends_on` must reference only `platform-service`.

### Gap 15: Admin Dashboard `docs.ts` Service Directory List

`admin-dashboard-web/src/lib/docs.ts` has a hardcoded list of service directories:

```typescript
const serviceDirs = [
  'admin-dashboard-web',
  'user-dashboard-web',
  'mobile_app',
  'services/billing-service',
  'services/growth-service',
];
```

The old service names must be removed from this list or replaced with `services/platform-service`.

### Gap 16: MindLyst Docs Reference Old Services

@@ -195,6 +208,7 @@ Platform-service's Dockerfile only copies `services/platform-service/` — it do

### Route Path Collision Check ✅

All services use unique route prefixes — **no collisions**:

- platform: `/auth/*`, `/audit/*`, `/notifications/*`, `/flags/*`, `/ratelimit/*`, `/blob/*`, `/devices/*`
- billing: `/subscriptions/*`, `/usage/*`, `/plans/*`, `/licenses/*`, `/payments/*`, `/stripe/*`
- growth: `/invitations/*`, `/referrals/*`, `/promos/*`
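
The collision check can be automated as a guard in a test. This sketch compares prefixes at the top level only, so nested overlaps such as `/auth` vs `/auth/admin` are not detected. It uses the three prefix lists shown above and leaves out tracker's prefixes, which are not visible in this excerpt. `findPrefixCollisions` is a hypothetical name.

```typescript
// Hypothetical guard: report any top-level route prefix claimed by more than one module.
type PrefixMap = Record<string, string[]>;

export function findPrefixCollisions(routes: PrefixMap): Map<string, string[]> {
  const owners = new Map<string, string[]>();
  for (const service of Object.keys(routes)) {
    for (const raw of routes[service]) {
      const prefix = raw.replace(/\/\*$/, ""); // "/auth/*" -> "/auth"
      owners.set(prefix, (owners.get(prefix) || []).concat(service));
    }
  }
  const collisions = new Map<string, string[]>();
  owners.forEach((services, prefix) => {
    if (services.length > 1) collisions.set(prefix, services);
  });
  return collisions;
}

// Prefix lists as given above (tracker omitted: not shown in this excerpt).
export const MERGED_ROUTES: PrefixMap = {
  platform: ["/auth/*", "/audit/*", "/notifications/*", "/flags/*", "/ratelimit/*", "/blob/*", "/devices/*"],
  billing: ["/subscriptions/*", "/usage/*", "/plans/*", "/licenses/*", "/payments/*", "/stripe/*"],
  growth: ["/invitations/*", "/referrals/*", "/promos/*"],
};
```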

@@ -244,12 +258,12 @@ services/

All containers are served by one Cosmos client in platform-service:

| Origin | Containers |
| ----------------------- | ----------------------------------------------------------------------------------- |
| **platform** (existing) | `users`, `audit_log`, `feature_flags`, `notification_devices`, `notification_prefs` |
| **billing** → platform | `subscriptions`, `payments`, `plans`, `licenses`, `usage_daily` |
| **growth** → platform | `invitation_codes`, `referrals`, `promo_codes` |
| **tracker** → platform | `tracker_items`, `tracker_comments`, `tracker_votes` |

---

@@ -390,7 +404,7 @@ All containers served by one Cosmos client in platform-service:

- [x] **3.3.1** Created `platform-service/src/lib/auth.ts` re-exporting from `@bytelyst/auth`
- [x] **3.3.2** Copied from tracker-service (identical content)
- [x] **3.3.3** Added `@bytelyst/auth` (workspace:\*) to package.json
- [x] **3.3.4** Added `@fastify/rate-limit` (^10.3.0) to package.json
- [x] **3.3.5** `jose` already in platform ✅

@@ -560,29 +574,30 @@ Also fixed: monitoring/health.ts, AI.dev/SKILLS docs, MIGRATION_GUIDE.md [`81609

## Summary

| Phase | What | Effort | Tests Moved | Critical Gaps Addressed |
| --------- | ------------------------------------------- | ------------- | --------------- | ------------------------------------ |
| **0** | Preparation & backup | 30 min | — | — |
| **1** | Merge growth-service (3 modules) | 2-3 hrs | ~14 | Gap 4 (webhooks), Gap 5 (Stripe key) |
| **2** | Merge billing-service (5 modules) | 4-5 hrs | ~11 | Gap 3 (internal key auth) |
| **3** | Merge tracker-service (4 modules) | 3-4 hrs | ~45 | Gap 1 (product ID), Gap 2 (deps) |
| **4** | Update consumers (20+ files across 3 repos) | 4-5 hrs | — | Gaps 6-11, 13-17 |
| **5** | Documentation & final verification | 2-3 hrs | — | — |
| **Total** | **5 services → 2** | **~4-5 days** | **~125+ tests** | **17 gaps addressed** |

## Port Allocation (After)

| Service | Port |
| -------------------------------------------- | -------- |
| **platform-service** | **4003** |
| **extraction-service** | **4005** |
| extraction-service python sidecar (internal) | 4006 |

Ports 4001, 4002, and 4004 are freed up.

## Rollback Strategy

Each phase has its own commit. If a phase breaks something:

1. `git revert <commit>` to undo that phase
2. The old service code is in git history
3. Backup branches created in Phase 0

@@ -590,13 +605,13 @@ Each phase has its own commit. If a phase breaks something:

## Risks & Mitigations

| Risk | Mitigation |
| ---------------------------------------- | ----------------------------------------------------------------------------- |
| Route path collisions | Verified ✅ — all services use unique prefixes |
| Config schema gets large | Group env vars by domain with clear section comments |
| Stripe webhook raw body | Fastify handles this — verify after move |
| Billing internal key blocks other routes | Scoped Fastify plugin (Phase 2.2) isolates key check to billing prefixes only |
| Public tracker routes skip auth | Register outside scoped plugins — verify in Phase 3.5.3 |
| Python billing client breaks | Change env var name, keep same API paths — transparent to Python code |
| Stripe webhook test fails | Explicit port update in Phase 4.4 |
| Product ID mismatch | Alias `DEFAULT_PRODUCT_ID = PRODUCT_ID` in Phase 3.2.4 |

@@ -82,6 +82,7 @@ routes.ts ────────► │ container() │
```

**Problems:**

- 38 platform-service repository files write raw Cosmos SQL queries
- 6 additional repository files in dashboards + MindLyst web
- Blob, Speech, OpenAI all have direct Azure SDK imports

@@ -112,6 +113,7 @@ routes.ts ────────► │ collection.findMany({ │
```

**Benefits:**

- Repositories use a generic query API — no SQL strings, no Azure types
- Switching provider = implement a new adapter (~200 lines) + change env var
- In-memory adapter makes tests fast and cloud-free
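
The following is a minimal sketch of an in-memory collection, illustrating the in-memory adapter benefit. It supports the operator subset from the query-mapping table in section 4.8 (`$exists`, `$gte`, `$contains`, and plain equality), plus sorting, offset/limit, `count`, and a `GROUP BY`-style `aggregate` with a count alias. The class name and method signatures are hypothetical and simpler than the planned `@bytelyst/datastore` interface: there are no partition keys and no `sum`.

```typescript
// Hypothetical in-memory collection; a simplified stand-in for the planned
// memory adapter. Supports equality, $exists, $gte, $contains, sort, offset/limit.
type Doc = { id: string } & Record<string, unknown>;
type FilterMap = Record<string, unknown>;
type SortMap = Record<string, 1 | -1>;
type Ops = { $exists?: boolean; $gte?: unknown; $contains?: unknown };

const isOps = (c: unknown): c is Ops =>
  typeof c === "object" && c !== null && !Array.isArray(c) && Object.keys(c).some((k) => k.startsWith("$"));

// Numbers compare numerically; everything else (e.g. ISO timestamps) as strings.
function cmp(a: unknown, b: unknown): number {
  if (typeof a === "number" && typeof b === "number") return a - b;
  const x = String(a), y = String(b);
  return x < y ? -1 : x > y ? 1 : 0;
}

function matches(doc: Doc, filter: FilterMap): boolean {
  return Object.keys(filter).every((field) => {
    const cond = filter[field];
    const value = doc[field];
    if (!isOps(cond)) return value === cond;
    if (cond.$exists !== undefined && (value !== undefined) !== cond.$exists) return false;
    if (cond.$gte !== undefined && (value === undefined || cmp(value, cond.$gte) < 0)) return false;
    if (cond.$contains !== undefined && !(Array.isArray(value) && value.indexOf(cond.$contains) >= 0)) return false;
    return true;
  });
}

export class MemoryCollection<T extends Doc> {
  private docs: T[] = [];

  create(doc: T): T {
    if (this.docs.some((d) => d.id === doc.id)) throw new Error(`Duplicate id: ${doc.id}`);
    this.docs.push({ ...doc });
    return doc;
  }

  findMany(q: { filter?: FilterMap; sort?: SortMap; offset?: number; limit?: number } = {}): T[] {
    const rows = this.docs.filter((d) => matches(d, q.filter || {}));
    const sort = q.sort || {};
    const keys = Object.keys(sort);
    rows.sort((a, b) => {
      for (const k of keys) {
        const c = cmp(a[k], b[k]);
        if (c !== 0) return c * sort[k];
      }
      return 0; // stable sort keeps insertion order for ties
    });
    const start = q.offset || 0;
    return rows.slice(start, q.limit === undefined ? undefined : start + q.limit);
  }

  count(q: { filter?: FilterMap } = {}): number {
    return this.findMany({ filter: q.filter }).length;
  }

  /** GROUP BY-style count; `count` names the output field (like `COUNT(1) AS cnt`). */
  aggregate(opts: { filter: FilterMap; groupBy?: string[]; count?: string }): Record<string, unknown>[] {
    const groupBy = opts.groupBy || [];
    const alias = opts.count || "count";
    const groups = new Map<string, Record<string, unknown>>();
    for (const doc of this.findMany({ filter: opts.filter })) {
      const key = JSON.stringify(groupBy.map((f) => doc[f]));
      let row = groups.get(key);
      if (!row) {
        row = { [alias]: 0 };
        for (const f of groupBy) row[f] = doc[f];
        groups.set(key, row);
      }
      row[alias] = (row[alias] as number) + 1;
    }
    return Array.from(groups.values());
  }
}
```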

@@ -121,16 +123,16 @@ routes.ts ────────► │ collection.findMany({ │

## 3. Sprint Plan Overview

| Sprint | Package / Scope | Effort | Files Changed | Risk |
| --------- | ------------------------------------------------- | --------------- | ----------------------------------- | -------- |
| **1** | `@bytelyst/datastore` — DB abstraction | 5-7 days | 44 repository files + 1 new package | Medium |
| **2** | `@bytelyst/storage` — Blob/Object abstraction | 2 days | 3 files + 1 new package | Low |
| **3** | `@bytelyst/llm` — LLM provider abstraction | 2 days | 4 files + 1 new package | Low |
| **4** | `@bytelyst/secrets` — Secrets manager abstraction | 1 day | 2 files (refactor existing) | Very Low |
| **5** | `@bytelyst/speech` — Speech STT abstraction | 3-4 days | 3 files + 1 new package | Medium |
| **6** | `@bytelyst/push` — Push notification abstraction | 1 day | 1 file + 1 new package | Very Low |
| **7** | Monitoring/Telemetry cleanup | 0.5 days | Already done (custom telemetry) | None |
| **Total** | | **~15-17 days** | ~55 files | |

### Priority Order

@@ -211,8 +213,8 @@ export type SortMap = Record<string, 1 | -1>; // 1 = ASC, -1 = DESC
export interface AggregateOptions {
  filter: FilterMap;
  groupBy?: string[];
  count?: string; // alias for COUNT(1)
  sum?: string; // field to SUM
}

/** Factory that creates collections — one per provider. */
@@ -412,6 +414,7 @@ export async function create(doc: FeatureFlagDoc): Promise<FeatureFlagDoc> {
```

**Key observations:**

- No SQL strings
- No `@azure/cosmos` types
- No `.items.query().fetchAll()` chaining

@@ -445,11 +448,11 @@ export function createDatastoreProvider(): DatastoreProvider {
  const provider = process.env.DB_PROVIDER || 'cosmos';
  switch (provider) {
    case 'cosmos':
      return new CosmosDatastoreProvider(); // uses existing COSMOS_ENDPOINT, COSMOS_KEY
    case 'mongo':
      return new MongoDatastoreProvider(); // uses MONGO_URI
    case 'memory':
      return new MemoryDatastoreProvider(); // no config needed
    default:
      throw new Error(`Unknown DB_PROVIDER: ${provider}`);
  }
@@ -459,6 +462,7 @@ export function createDatastoreProvider(): DatastoreProvider {

### 4.7 Migration Plan for 38 Repository Files

Migrate in batches, one module per commit. Each commit:

1. Update the repository file to use `getCollection()` instead of `getContainer()`
2. Replace SQL queries with `findMany()` / `findOne()` / `count()` / `aggregate()`
3. Run the module's test file — must pass

@@ -466,40 +470,40 @@ Migrate in batches, one module per commit. Each commit:

**Batch order** (simplest first, complex last):

| Batch | Modules | Complexity | Notes |
| ----- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------ | --------------- |
| 1 | flags, plans, settings, changelog, products | Simple CRUD | 5 files, warmup |
| 2 | licenses, sessions, ip-rules, maintenance, feedback | Simple CRUD + filters | 5 files |
| 3 | items, comments, votes, brains, reflections | CRUD + filter combos | 5 files |
| 4 | audit, delivery, notifications, exports, jobs | CRUD + time queries | 5 files |
| 5 | tokens, usage, invitations, referrals, webhooks | More complex queries | 5 files |
| 6 | auth, subscriptions, telemetry, experiments | Complex (GROUP BY, aggregates) | 4 files |
| 7 | timers, shared-timers, routines, households | Sync logic, batch ops | 4 files |
| 8 | fasting-sessions, fasting-protocols, meal-log, social-fasting, daily-briefs, streaks, push-triggers, impersonation, status, memory, analytics, waitlist | Product-specific + remaining | 12 files |
| 9 | Dashboard cosmos clients (admin-web, MindLyst web) | Direct `@azure/cosmos` | 6 files |
| 10 | Python clients (desktop cosmos, backend cosmos) | `azure.cosmos` → abstracted | 2 files |

### 4.8 Handling Complex Queries

Some repository files use advanced Cosmos SQL features. Here's how the interface handles them:

| Cosmos SQL Pattern | Datastore Interface Equivalent |
| ---------------------------------------------------- | -------------------------------------------------------- |
| `SELECT * FROM c WHERE c.x = @v` | `findMany({ filter: { x: v } })` |
| `SELECT * FROM c WHERE c.x = @v AND c.y = @w` | `findMany({ filter: { x: v, y: w } })` |
| `ORDER BY c.x ASC` | `findMany({ sort: { x: 1 } })` |
| `ORDER BY c.x DESC` | `findMany({ sort: { x: -1 } })` |
| `OFFSET @o LIMIT @l` | `findMany({ offset: o, limit: l })` |
| `SELECT VALUE COUNT(1) FROM c WHERE ...` | `count({ filter })` |
| `SELECT c.plan, COUNT(1) AS cnt ... GROUP BY c.plan` | `aggregate({ filter, groupBy: ['plan'], count: 'cnt' })` |
| `NOT IS_DEFINED(c.usedAt)` | `findMany({ filter: { usedAt: { $exists: false } } })` |
| `c.x >= @v` | `findMany({ filter: { x: { $gte: v } } })` |
| `ARRAY_CONTAINS(c.tags, @tag)` | `findMany({ filter: { tags: { $contains: tag } } })` |
| `container().item(id, pk).read()` | `findById(id, pk)` |
| `container().items.create(doc)` | `create(doc)` |
| `container().item(id, pk).replace(doc)` | `replace(id, pk, doc)` |
| `container().items.upsert(doc)` | `upsert(doc)` |
| `container().item(id, pk).delete()` | `delete(id, pk)` |
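
On the Cosmos side, the adapter has to run this mapping in reverse. Here is a minimal sketch, assuming the Cosmos adapter builds parameterised queries. The returned `{ query, parameters: [{ name, value }] }` object mirrors the shape of the Cosmos SDK's `SqlQuerySpec`. `toCosmosSql` and the field-name whitelist are hypothetical. Field names are interpolated into the SQL text, so they are validated; values always travel as parameters.

```typescript
// Hypothetical sketch: compile a FilterMap (operators as in the table above) into a
// parameterised Cosmos SQL query. The return shape mirrors the SDK's SqlQuerySpec.
type FilterMap = Record<string, unknown>;
type SortMap = Record<string, 1 | -1>;
type Ops = { $exists?: boolean; $gte?: unknown; $contains?: unknown };
interface QuerySpec { query: string; parameters: { name: string; value: unknown }[] }

const SAFE_FIELD = /^[A-Za-z_][A-Za-z0-9_]*$/; // field names end up in the SQL text

function field(name: string): string {
  if (!SAFE_FIELD.test(name)) throw new Error(`Unsafe field name: ${name}`);
  return `c.${name}`;
}

const isOps = (c: unknown): c is Ops =>
  typeof c === "object" && c !== null && !Array.isArray(c) && Object.keys(c).some((k) => k.startsWith("$"));

export function toCosmosSql(q: { filter?: FilterMap; sort?: SortMap; offset?: number; limit?: number }): QuerySpec {
  const where: string[] = [];
  const parameters: QuerySpec["parameters"] = [];
  // Values are never inlined: each one becomes @p0, @p1, ...
  const param = (value: unknown): string => {
    const name = `@p${parameters.length}`;
    parameters.push({ name, value });
    return name;
  };
  const filter = q.filter || {};
  for (const key of Object.keys(filter)) {
    const ref = field(key);
    const cond = filter[key];
    if (!isOps(cond)) {
      where.push(`${ref} = ${param(cond)}`);
      continue;
    }
    if (cond.$exists === false) where.push(`NOT IS_DEFINED(${ref})`);
    if (cond.$exists === true) where.push(`IS_DEFINED(${ref})`);
    if (cond.$gte !== undefined) where.push(`${ref} >= ${param(cond.$gte)}`);
    if (cond.$contains !== undefined) where.push(`ARRAY_CONTAINS(${ref}, ${param(cond.$contains)})`);
  }
  let query = "SELECT * FROM c";
  if (where.length > 0) query += ` WHERE ${where.join(" AND ")}`;
  const sort = q.sort || {};
  const order = Object.keys(sort).map((k) => `${field(k)} ${sort[k] === 1 ? "ASC" : "DESC"}`);
  if (order.length > 0) query += ` ORDER BY ${order.join(", ")}`;
  // Cosmos requires OFFSET and LIMIT together.
  if (q.limit !== undefined) query += ` OFFSET ${param(q.offset || 0)} LIMIT ${param(q.limit)}`;
  return { query, parameters };
}
```

Sorting on more than one field in Cosmos DB needs a matching composite index. The real adapter would therefore likely restrict multi-key `sort` or document that requirement.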

For the filter operators, use a simple operator convention:

@@ -645,6 +649,7 @@ export interface ChatCompletionResponse {

MindLyst `web/src/lib/llm.ts` already auto-detects Azure vs OpenAI based on env vars. This pattern should be promoted to a shared package.

**Provider implementations:**

- `AzureOpenAIProvider` — uses `api-key` header + deployment-scoped URL
- `OpenAIProvider` — uses `Authorization: Bearer` header + model param
- `GeminiProvider` — uses Google Generative AI SDK (future)
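
The following sketches the two request shapes listed above, assuming raw REST calls rather than an SDK. The env var names are those in the `ENV_ALIASES` table in this plan. `detectLlmConfig` and `buildChatRequest` are illustrative choices, not the MindLyst implementation. So is passing the Azure `api-version` in explicitly, which the caller should pin to a version it has verified.

```typescript
// Hypothetical sketch of the two REST request shapes (no SDK). Names are illustrative.
type Env = Record<string, string | undefined>;
type Message = { role: "system" | "user" | "assistant"; content: string };
type LlmConfig =
  | { kind: "azure"; endpoint: string; apiKey: string; deployment: string; apiVersion: string }
  | { kind: "openai"; baseUrl: string; apiKey: string; model: string };
interface HttpRequest { url: string; headers: Record<string, string>; body: string }

/** Azure wins when its three variables are set; otherwise plain OpenAI. */
export function detectLlmConfig(env: Env, azureApiVersion: string): LlmConfig {
  const { AZURE_OPENAI_ENDPOINT: endpoint, AZURE_OPENAI_KEY: azureKey, AZURE_OPENAI_DEPLOYMENT: deployment } = env;
  if (endpoint && azureKey && deployment) {
    return { kind: "azure", endpoint, apiKey: azureKey, deployment, apiVersion: azureApiVersion };
  }
  const { OPENAI_API_KEY: apiKey, OPENAI_BASE_URL: baseUrl, OPENAI_MODEL: model } = env;
  if (apiKey && model) {
    return { kind: "openai", baseUrl: baseUrl || "https://api.openai.com/v1", apiKey, model };
  }
  throw new Error("No LLM provider configured");
}

export function buildChatRequest(cfg: LlmConfig, messages: Message[]): HttpRequest {
  const trim = (u: string) => u.replace(/\/+$/, "");
  if (cfg.kind === "azure") {
    // Azure: deployment in the path, api-version query param, `api-key` header.
    return {
      url: `${trim(cfg.endpoint)}/openai/deployments/${encodeURIComponent(cfg.deployment)}/chat/completions?api-version=${encodeURIComponent(cfg.apiVersion)}`,
      headers: { "content-type": "application/json", "api-key": cfg.apiKey },
      body: JSON.stringify({ messages }),
    };
  }
  // OpenAI: fixed path, bearer token, model named in the body.
  return {
    url: `${trim(cfg.baseUrl)}/chat/completions`,
    headers: { "content-type": "application/json", authorization: `Bearer ${cfg.apiKey}` },
    body: JSON.stringify({ model: cfg.model, messages }),
  };
}
```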

@@ -676,6 +681,7 @@ The `openai` Python SDK already has a common interface between `OpenAI` and `Azu

### 7.1 Key Insight: Already 90% Done

The current `resolveKeyVaultSecrets()` already:

- Skips if `AZURE_KEYVAULT_URL` is not set
- Falls back to env vars for each secret
- Logs warnings but doesn't throw
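
The three behaviours above generalise to any provider. Here is a minimal sketch using hypothetical names (`SecretSource`, `SecretBinding`, `resolveInto`) rather than the existing `resolveKeyVaultSecrets()` API. An unconfigured source is a no-op, a found secret overrides the env value, and a missing secret or a provider error only produces a warning.

```typescript
// Hypothetical sketch; not the existing resolveKeyVaultSecrets() signature.
type Env = Record<string, string | undefined>;
export interface SecretSource { get(name: string): Promise<string | undefined> }
export interface SecretBinding { envVar: string; secretName: string }

export async function resolveInto(
  env: Env,
  bindings: SecretBinding[],
  source: SecretSource | undefined, // undefined = no provider configured (e.g. no AZURE_KEYVAULT_URL)
  warn: (msg: string) => void = () => {},
): Promise<void> {
  if (!source) return; // skip entirely; env vars are used as-is
  for (const b of bindings) {
    try {
      const value = await source.get(b.secretName);
      if (value !== undefined) env[b.envVar] = value; // provider value takes precedence
      else warn(`${b.secretName} not found; keeping ${b.envVar} from env`);
    } catch (err) {
      // Never throw: an unreachable provider must not stop the service from booting.
      warn(`${b.secretName} unreadable (${String(err)}); keeping ${b.envVar} from env`);
    }
  }
}
```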

@@ -691,19 +697,19 @@ export interface SecretsProvider {
export async function resolveSecrets(
  secrets: SecretMapping[],
  opts?: { provider?: string }
): Promise<void> {
  const provider = opts?.provider || process.env.SECRETS_PROVIDER || 'env';
  switch (provider) {
    case 'azure-keyvault':
      return resolveFromAzureKeyVault(secrets); // existing code
    case 'aws-secrets-manager':
      return resolveFromAWSSecretsManager(secrets); // future
    case 'gcp-secret-manager':
      return resolveFromGCPSecretManager(secrets); // future
    case 'doppler':
      return resolveFromDoppler(secrets); // future
    case 'env':
    default:
      return; // All secrets already in env — nothing to resolve
@@ -720,14 +726,14 @@ The current env vars have Azure-specific names. Add **generic aliases** that fal

export const ENV_ALIASES: Record<string, string[]> = {
  // Generic name → fallback names (checked in order)
  BLOB_CONNECTION_STRING: ['AZURE_BLOB_CONNECTION_STRING'],
  BLOB_ACCOUNT_NAME: ['AZURE_BLOB_ACCOUNT_NAME'],
  BLOB_ACCOUNT_KEY: ['AZURE_BLOB_ACCOUNT_KEY'],
  SPEECH_KEY: ['AZURE_SPEECH_KEY'],
  SPEECH_REGION: ['AZURE_SPEECH_REGION'],
  LLM_API_KEY: ['AZURE_OPENAI_KEY', 'OPENAI_API_KEY'],
  LLM_ENDPOINT: ['AZURE_OPENAI_ENDPOINT', 'OPENAI_BASE_URL'],
  LLM_MODEL: ['AZURE_OPENAI_DEPLOYMENT', 'OPENAI_MODEL'],
};

export function getEnv(name: string): string | undefined {
@@ -829,6 +835,7 @@ protocol SpeechTranscriber {

### 8.4 Note on Complexity

Speech is the hardest abstraction because:

- Azure Speech SDK has a unique push-stream architecture
- Google Cloud Speech uses gRPC streaming
- Deepgram uses WebSockets
@@ -871,11 +878,13 @@ Implementations: `AzureNotificationHubProvider`, `FirebaseProvider` (future), `E

**Effort:** 0.5 days (mostly done already)

The ecosystem already has cloud-agnostic monitoring:

- **Custom telemetry** via `@bytelyst/telemetry-client` → platform-service → Cosmos
- **Loki + Grafana** in `services/monitoring/`
- **Health checks** via `/health` endpoints on all services

**Remaining work:**

- Remove `opencensus-ext-azure` from Python requirements (optional, only used for App Insights)
- Ensure all structured logging uses `pino` (TS) or `structlog` (Python) — no Azure-specific loggers
@@ -887,43 +896,43 @@ Once all sprints are complete, here's how much work each cloud migration scenari

### Scenario: Switch DB from Cosmos to MongoDB Atlas

| Step | Effort | Description |
| ----------------------------------------- | ------------- | -------------------------------------------------- |
| Implement `MongoDatastoreProvider` | 1 day | ~200 lines — translate FilterMap to MongoDB find() |
| Set `DB_PROVIDER=mongo` + `MONGO_URI=...` | 5 minutes | Config change |
| Run data migration script | 2–4 hours | Export Cosmos JSON → import to MongoDB |
| Run full test suite | 30 minutes | Verify all 1,029+ tests pass |
| **Total** | **~1.5 days** | vs 3–5 weeks without abstraction |
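Most of the `MongoDatastoreProvider` translation is pass-through, because MongoDB's query language already spells `$gt`, `$gte`, `$lt`, `$lte`, `$ne`, `$exists` and `$in` the same way. A minimal sketch of the FilterMap translation, assuming the operator convention described in this plan; `toMongoFilter` is a hypothetical name, and resolving the ambiguous `$contains` by value type (string needle → `$regex` substring match, anything else → `$elemMatch` array membership) is an assumption that a real adapter might replace with schema hints:

```typescript
// Sketch: FilterMap types mirror the operator convention in this plan; toMongoFilter is hypothetical.
type Scalar = string | number | boolean | null;

interface Operators {
  $gt?: Scalar;
  $gte?: Scalar;
  $lt?: Scalar;
  $lte?: Scalar;
  $ne?: Scalar;
  $exists?: boolean;
  $startsWith?: string;
  $contains?: Scalar;
  $in?: Scalar[];
}

interface FilterMap {
  $or?: FilterMap[];
  [field: string]: Scalar | Operators | FilterMap[] | undefined;
}

function isOperators(v: unknown): v is Operators {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

// Escape regex metacharacters so user input is matched literally.
function escapeRegex(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function toMongoFilter(filter: FilterMap): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [field, cond] of Object.entries(filter)) {
    if (cond === undefined) continue;
    if (field === '$or') {
      out.$or = (cond as FilterMap[]).map(toMongoFilter);
      continue;
    }
    if (!isOperators(cond)) {
      out[field] = cond; // exact match is already valid MongoDB
      continue;
    }
    const ops: Record<string, unknown> = {};
    for (const [op, v] of Object.entries(cond)) {
      switch (op) {
        case '$gt':
        case '$gte':
        case '$lt':
        case '$lte':
        case '$ne':
        case '$exists':
        case '$in':
          ops[op] = v; // same spelling and meaning in MongoDB
          break;
        case '$startsWith':
          ops.$regex = '^' + escapeRegex(String(v));
          break;
        case '$contains':
          if (typeof v === 'string') ops.$regex = escapeRegex(v); // substring match
          else ops.$elemMatch = { $eq: v }; // array membership
          break;
        default:
          throw new Error(`Unsupported operator: ${op}`);
      }
    }
    out[field] = ops;
  }
  return out;
}
```

The returned object is a plain MongoDB filter document, so it can be passed directly to `collection.find(...)` in the official Node.js driver.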
### Scenario: Switch Storage from Azure Blob to S3

| Step | Effort | Description |
| -------------------------------------------- | ------------- | ------------------------------- |
| Implement `S3StorageProvider` | 0.5 day | ~100 lines |
| Set `STORAGE_PROVIDER=s3` + `AWS_*` env vars | 5 minutes | Config change |
| Migrate blobs | 1–2 hours | azcopy or rclone |
| **Total** | **~0.5 days** | vs 2–3 days without abstraction |

### Scenario: Switch LLM from Azure OpenAI to OpenAI Direct

| Step | Effort | Description |
| ------------------------------------------------ | -------------- | ----------------------- |
| Set `LLM_PROVIDER=openai` + `OPENAI_API_KEY=...` | 5 minutes | Config change only |
| Remove `AZURE_OPENAI_*` env vars | 5 minutes | Cleanup |
| **Total** | **10 minutes** | Already near-zero today |

### Scenario: Full Cloud Migration (Azure → AWS)

| Step | Effort | Description |
| -------------------------------------- | -------------- | -------------------------------- |
| Implement MongoDB/DynamoDB provider | 1–2 days | |
| Implement S3 storage provider | 0.5 days | |
| Implement AWS Secrets Manager provider | 0.5 days | |
| Switch LLM to OpenAI direct | 10 minutes | |
| Implement Google STT or AWS Transcribe | 2–3 days | Speech is still the hardest |
| Implement SNS push provider | 0.5 days | |
| Data migration + testing | 2–3 days | |
| **Total** | **~7–10 days** | vs 4–8 weeks without abstraction |
---

@@ -938,7 +947,7 @@ Every repository test should work against **any** provider. The test setup picks

import { setTestProvider } from '@bytelyst/datastore/testing';

beforeAll(() => {
  setTestProvider('memory'); // Fast, no network, deterministic
});
```
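For `setTestProvider('memory')` to be a faithful stand-in, the in-memory provider has to evaluate the same FilterMap queries that the Cosmos adapter translates to SQL. A minimal sketch of such an evaluator, assuming the FilterMap operator convention used in this plan; all names are hypothetical, and choices such as letting a missing field satisfy `$ne` (MongoDB's behaviour) are assumptions that would need to match whichever production adapter the tests guard:

```typescript
// Sketch: a hypothetical in-memory FilterMap evaluator for a 'memory' test provider.
type Scalar = string | number | boolean | null;

interface Operators {
  $gt?: Scalar;
  $gte?: Scalar;
  $lt?: Scalar;
  $lte?: Scalar;
  $ne?: Scalar;
  $exists?: boolean;
  $startsWith?: string;
  $contains?: Scalar;
  $in?: Scalar[];
}

interface FilterMap {
  $or?: FilterMap[];
  [field: string]: Scalar | Operators | FilterMap[] | undefined;
}

type Doc = Record<string, unknown>;

function isOperators(v: unknown): v is Operators {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

// Range operators only compare number-to-number or string-to-string; any other pairing fails.
function compare(a: unknown, b: unknown): number | undefined {
  if (typeof a === 'number' && typeof b === 'number') return a - b;
  if (typeof a === 'string' && typeof b === 'string') return a < b ? -1 : a > b ? 1 : 0;
  return undefined;
}

function matchesOperators(actual: unknown, ops: Operators): boolean {
  for (const [op, expected] of Object.entries(ops)) {
    const c = compare(actual, expected);
    let ok: boolean;
    switch (op) {
      case '$gt': ok = c !== undefined && c > 0; break;
      case '$gte': ok = c !== undefined && c >= 0; break;
      case '$lt': ok = c !== undefined && c < 0; break;
      case '$lte': ok = c !== undefined && c <= 0; break;
      case '$ne': ok = actual !== expected; break; // a missing field counts as "not equal"
      case '$exists': ok = (actual !== undefined) === expected; break;
      case '$startsWith': ok = typeof actual === 'string' && actual.startsWith(expected); break;
      case '$contains':
        ok = Array.isArray(actual)
          ? actual.includes(expected)
          : typeof actual === 'string' && typeof expected === 'string' && actual.includes(expected);
        break;
      case '$in': ok = (expected as unknown[]).includes(actual); break;
      default: throw new Error(`Unsupported operator: ${op}`);
    }
    if (!ok) return false;
  }
  return true;
}

// Top-level keys are ANDed together; $or needs at least one matching branch.
function matchesFilter(doc: Doc, filter: FilterMap): boolean {
  return Object.entries(filter).every(([key, cond]) => {
    if (cond === undefined) return true;
    if (key === '$or') return (cond as FilterMap[]).some((f) => matchesFilter(doc, f));
    const actual = doc[key];
    return isOperators(cond) ? matchesOperators(actual, cond) : actual === cond;
  });
}
```

A memory provider's query method then reduces to `docs.filter((d) => matchesFilter(d, filter))`.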
@@ -957,6 +966,7 @@ __tests__/

### 12.3 Migration Verification Checklist

For each sprint, before merging:

1. All existing tests pass (no regressions)
2. New interface tests pass with all implemented providers
3. Manual smoke test against Azure (dev environment)

@@ -1046,14 +1056,14 @@ AZURE_SPEECH_REGION=eastus

## 14. Risk Mitigation

| Risk | Mitigation |
| ------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------ |
| **FilterMap can't express complex Cosmos SQL** | Add `rawQuery()` escape hatch for edge cases. Track usage — if >5% of queries need it, expand FilterMap operators |
| **Performance regression from abstraction layer** | Benchmark critical queries before/after. The abstraction adds one function call — negligible |
| **Team unfamiliar with new patterns** | Each sprint includes updating AGENTS.md with new conventions. Old pattern (direct Cosmos) still works during migration |
| **In-memory provider behaves differently** | Integration test suite runs against real Cosmos in CI. Memory provider is for unit tests only |
| **Stale data during DB migration** | Use dual-write pattern: write to both old and new provider during transition. Read from new, fall back to old |
| **Sprint 1 takes too long** | The 38 repository files can be migrated incrementally — even 5 files at a time is progress. Old and new patterns coexist |
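The dual-write mitigation in the table can be sketched as a wrapper over two providers: writes go to both, reads prefer the new store and fall back to the old one. `KeyValueProvider`, `DualWriteProvider` and `MapProvider` are hypothetical names chosen for illustration, and the interface is deliberately far smaller than a real datastore API:

```typescript
// Sketch of the dual-write pattern; KeyValueProvider is a deliberately tiny,
// hypothetical interface, not the @bytelyst/datastore API.
interface KeyValueProvider<T> {
  get(id: string): Promise<T | undefined>;
  put(id: string, doc: T): Promise<void>;
}

class DualWriteProvider<T> implements KeyValueProvider<T> {
  private readonly oldStore: KeyValueProvider<T>;
  private readonly newStore: KeyValueProvider<T>;

  constructor(oldStore: KeyValueProvider<T>, newStore: KeyValueProvider<T>) {
    this.oldStore = oldStore;
    this.newStore = newStore;
  }

  async put(id: string, doc: T): Promise<void> {
    // Write the old store first: it remains the source of truth until cutover.
    await this.oldStore.put(id, doc);
    await this.newStore.put(id, doc);
  }

  async get(id: string): Promise<T | undefined> {
    // Read from the new store; fall back to the old one for records not yet migrated.
    const fromNew = await this.newStore.get(id);
    if (fromNew !== undefined) return fromNew;
    return this.oldStore.get(id);
  }
}

// In-memory provider, handy for unit-testing the wrapper itself.
class MapProvider<T> implements KeyValueProvider<T> {
  private readonly data = new Map<string, T>();

  async get(id: string): Promise<T | undefined> {
    return this.data.get(id);
  }

  async put(id: string, doc: T): Promise<void> {
    this.data.set(id, doc);
  }
}
```

One gap this sketch leaves open: if the second write fails, the old store has already committed, so a real rollout would pair it with retries or a backfill/reconciliation job before cutover.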
---

@@ -1116,32 +1126,81 @@ packages/llm/
```typescript
// Placeholder values, only so the examples below type-check
const value: unknown = 42;
const [v1, v2, v3] = ['a', 'b', 'c'];

const filterExamples = [
  // Exact match
  { field: value },

  // Comparison
  { field: { $gt: value } }, // >
  { field: { $gte: value } }, // >=
  { field: { $lt: value } }, // <
  { field: { $lte: value } }, // <=
  { field: { $ne: value } }, // !=

  // Existence
  { field: { $exists: true } }, // IS_DEFINED(c.field)
  { field: { $exists: false } }, // NOT IS_DEFINED(c.field)

  // String
  { field: { $startsWith: 'prefix' } },
  { field: { $contains: 'substr' } },

  // Array
  { field: { $contains: value } }, // ARRAY_CONTAINS
  { field: { $in: [v1, v2, v3] } }, // IN operator

  // Logical (for complex queries)
  { $or: [{ field1: v1 }, { field2: v2 }] },
];
```
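The operator shapes above map onto Cosmos SQL mechanically, with every literal bound as a parameter. A minimal sketch of such an adapter, assuming the FilterMap convention used in this plan; `toCosmosQuery` is a hypothetical name, its output has the `{ query, parameters: [{ name, value }] }` shape that the `@azure/cosmos` SDK accepts as a query spec, only top-level identifier fields are supported, and `$contains` is resolved by value type as an assumption:

```typescript
// Sketch: toCosmosQuery is a hypothetical helper, not an existing @bytelyst API.
type Scalar = string | number | boolean | null;

interface Operators {
  $gt?: Scalar;
  $gte?: Scalar;
  $lt?: Scalar;
  $lte?: Scalar;
  $ne?: Scalar;
  $exists?: boolean;
  $startsWith?: string;
  $contains?: Scalar;
  $in?: Scalar[];
}

interface FilterMap {
  $or?: FilterMap[];
  [field: string]: Scalar | Operators | FilterMap[] | undefined;
}

interface SqlQuerySpec {
  query: string;
  parameters: { name: string; value: unknown }[];
}

const COMPARISON = new Map<string, string>([
  ['$gt', '>'], ['$gte', '>='], ['$lt', '<'], ['$lte', '<='], ['$ne', '!='],
]);
const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/; // top-level fields only

function isOperators(v: unknown): v is Operators {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

function toCosmosQuery(filter: FilterMap): SqlQuerySpec {
  const parameters: SqlQuerySpec['parameters'] = [];
  // Each literal becomes @p0, @p1, ... so values never appear in the SQL text.
  const param = (value: unknown): string => {
    const name = `@p${parameters.length}`;
    parameters.push({ name, value });
    return name;
  };

  const translate = (f: FilterMap): string => {
    const clauses: string[] = [];
    for (const [key, cond] of Object.entries(f)) {
      if (cond === undefined) continue;
      if (key === '$or') {
        const parts = (cond as FilterMap[]).map((sub) => translate(sub) || 'true');
        clauses.push(parts.length > 0 ? `(${parts.join(' OR ')})` : 'false');
        continue;
      }
      // Field names are spliced into the SQL, so only plain identifiers are allowed.
      if (!IDENTIFIER.test(key)) throw new Error(`Invalid field name: ${key}`);
      const col = `c.${key}`;
      if (!isOperators(cond)) {
        clauses.push(`${col} = ${param(cond)}`);
        continue;
      }
      for (const [op, v] of Object.entries(cond)) {
        const cmp = COMPARISON.get(op);
        if (cmp !== undefined) {
          clauses.push(`${col} ${cmp} ${param(v)}`);
        } else if (op === '$exists') {
          clauses.push(v ? `IS_DEFINED(${col})` : `NOT IS_DEFINED(${col})`);
        } else if (op === '$startsWith') {
          clauses.push(`STARTSWITH(${col}, ${param(v)})`);
        } else if (op === '$contains') {
          // String needle: substring match; anything else: array membership.
          const fn = typeof v === 'string' ? 'CONTAINS' : 'ARRAY_CONTAINS';
          clauses.push(`${fn}(${col}, ${param(v)})`);
        } else if (op === '$in') {
          const names = (v as unknown[]).map((x) => param(x));
          clauses.push(names.length > 0 ? `${col} IN (${names.join(', ')})` : 'false');
        } else {
          throw new Error(`Unsupported operator: ${op}`);
        }
      }
    }
    return clauses.join(' AND ');
  };

  const where = translate(filter);
  return { query: where ? `SELECT * FROM c WHERE ${where}` : 'SELECT * FROM c', parameters };
}
```

Binding every value as `@pN` keeps user input out of the SQL text; the identifier check covers the one place (field names) where text is still spliced in.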
**Cosmos adapter** translates each operator to SQL:

- `{ $gte: v }` → `c.field >= @pN`
- `{ $exists: false }` → `NOT IS_DEFINED(c.field)`
- `{ $contains: v }` on array → `ARRAY_CONTAINS(c.field, @pN)`

@@ -1155,19 +1214,19 @@ packages/llm/

## Summary

| Sprint | What | Days | After This Sprint... |
| --------- | -------------------- | --------------- | ---------------------------------------------------------- |
| 1 | Database abstraction | 5–7 | DB swap = implement 1 adapter (~200 LOC) + config change |
| 2 | Storage abstraction | 2 | Blob swap = implement 1 adapter (~100 LOC) + config change |
| 3 | LLM abstraction | 2 | LLM swap = config change only (10 minutes) |
| 4 | Secrets abstraction | 1 | Secrets swap = config change only |
| 5 | Speech abstraction | 3–4 | Speech swap = implement 1 adapter (~300 LOC) |
| 6 | Push abstraction | 1 | Push swap = implement 1 adapter (~50 LOC) |
| 7 | Monitoring cleanup | 0.5 | Already cloud-agnostic |
| **Total** | | **~15–17 days** | **Full cloud migration = ~7–10 days instead of 4–8 weeks** |

The key insight: **~80% of migration effort is in Sprint 1 (database)**. If you only do one sprint, do that one. Everything else is comparatively easy.

---

_Document generated by automated codebase analysis. Companion to `CLOUD_PROVIDER_MIGRATION_ANALYSIS.md`. Review as the codebase evolves._


@@ -10,7 +10,7 @@ You currently have 3 repos checked out side-by-side:

The goal is to **systematically refactor into a “platform repo”** (common libraries + common services) while keeping **product-specific code in product repos**, with a workflow that feels like how high-performing AI companies build: small PRs, strong automation, stable internal interfaces, and “golden paths” for shipping.

Important constraint: we cannot know exactly how OpenAI/Anthropic run their internal engineering, but we _can_ adopt the common patterns used by top-tier product+platform orgs: platform teams, strong CI gates, typed service contracts, SDK generation, trunk-based integration, feature flags, and opinionated templates.

---

@@ -283,4 +283,3 @@ Pick one of these patterns:

3. Decide whether the LysnrAI FastAPI backend is:
   - product-only (dictation/transcripts), or
   - a transitional legacy backend to be decomposed into platform services.