docs(roadmap): review cloud-agnostic refactor roadmap — add status markers, correct file counts, fix companion doc link

saravanakumardb1 2026-03-01 23:48:46 -08:00
parent b33d3cf2b3
commit 4fe0c034c2


@ -1,9 +1,12 @@
# Cloud-Agnostic Refactor Roadmap — ByteLyst Ecosystem
> **Author:** AI Analysis (Cascade)
> **Date:** 2026-03-01
> **Companion doc:** [`CLOUD_PROVIDER_MIGRATION_ANALYSIS.md`](./CLOUD_PROVIDER_MIGRATION_ANALYSIS.md)
> **Date:** 2026-03-01 · **Last reviewed:** 2026-03-02 (comprehensive workspace scan)
> **Companion doc:** [`CLOUD_PROVIDER_MIGRATION_ANALYSIS.md`](../../architecture/CLOUD_PROVIDER_MIGRATION_ANALYSIS.md)
> **Repos scanned:** `learning_ai_common_plat` (platform-service, 23 packages) · `learning_voice_ai_agent` (LysnrAI) · `learning_multimodal_memory_agents` (MindLyst) · `learning_ai_clock` (ChronoMind) · `learning_ai_jarvis_jr` (JarvisJr) · `learning_ai_fastgap` (NomGap) · `learning_ai_peakpulse` (PeakPulse)
> **Goal:** Refactor the codebase so it continues to work on Azure today, but switching to any other cloud provider requires **minimum effort** (days, not weeks).
>
> **Status as of 2026-03-02:** Sprints 1–6 have not been started, and all Azure SDK usage remains direct. 3 partial precursors exist (MindLyst LLM provider auto-detect, `resolveKeyVaultSecrets()` env-var fallback, LysnrAI STT router). Monitoring (Sprint 7) is already cloud-agnostic.
---
@ -81,12 +84,16 @@ routes.ts ────────► │ container() │
@azure/cosmos SDK
```
**Problems:**
**Problems (verified 2026-03-02):**
- 38 platform-service repository files write raw Cosmos SQL queries
- 6 additional repository files in dashboards + MindLyst web
- Blob, Speech, OpenAI all have direct Azure SDK imports
- Switching DB means rewriting 44+ files
- **47** platform-service repository files use `getContainer()` → raw Cosmos SQL queries
- **21** product-backend repository files across 6 repos (ChronoMind 5, JarvisJr 3, NomGap 5, PeakPulse 2, LysnrAI 5, MindLyst 5 + 1 test)
- **8** dashboard/web cosmos clients (LysnrAI user-dashboard 5, MindLyst web 1, admin-web 1, tracker-web 1)
- **2** Python cosmos clients (`src/cloud/cosmos_client.py`, `src/cloud/api_sync.py`)
- `@bytelyst/blob` (162 LOC) wraps `@azure/storage-blob` directly
- `@bytelyst/config` `resolveKeyVaultSecrets()` imports `@azure/keyvault-secrets` directly
- LysnrAI desktop: `azure_stt.py` uses Azure Speech SDK, `text_cleaner.py` uses `AzureOpenAI`
- **Total: ~78 files need migration** (was estimated 44+ — significantly more after product backend split)
### Target: Provider-Agnostic Interfaces
@ -123,16 +130,16 @@ routes.ts ────────► │ collection.findMany({ │
## 3. Sprint Plan Overview
| Sprint | Package / Scope | Effort | Files Changed | Risk |
| --------- | ------------------------------------------------- | --------------- | ----------------------------------- | -------- |
| **1**     | `@bytelyst/datastore` — DB abstraction            | 5–7 days        | 44 repository files + 1 new package | Medium   |
| **2** | `@bytelyst/storage` — Blob/Object abstraction | 2 days | 3 files + 1 new package | Low |
| **3** | `@bytelyst/llm` — LLM provider abstraction | 2 days | 4 files + 1 new package | Low |
| **4** | `@bytelyst/secrets` — Secrets manager abstraction | 1 day | 2 files (refactor existing) | Very Low |
| **5**     | `@bytelyst/speech` — Speech STT abstraction       | 3–4 days        | 3 files + 1 new package             | Medium   |
| **6** | `@bytelyst/push` — Push notification abstraction | 1 day | 1 file + 1 new package | Very Low |
| **7** | Monitoring/Telemetry cleanup | 0.5 days | Already done (custom telemetry) | None |
| **Total** |                                                   | **~15–17 days** | ~55 files                           |          |
| Sprint | Package / Scope | Status | Effort | Files Changed (updated) | Risk |
| --------- | ------------------------------------------------- | ------------------- | --------------- | ----------------------------------------------------------------------------------------- | -------- |
| **1**     | `@bytelyst/datastore` — DB abstraction            | ❌ NOT STARTED      | 7–10 days       | **78** repository files + 1 new package (was 44 — now includes product backends)          | Medium   |
| **2** | `@bytelyst/storage` — Blob/Object abstraction | ❌ NOT STARTED | 2 days | 3 files + 1 new package | Low |
| **3** | `@bytelyst/llm` — LLM provider abstraction | ⚠️ PRECURSOR EXISTS | 2 days | 4 files + 1 new package. MindLyst `llm.ts` already auto-detects Azure vs OpenAI | Low |
| **4** | `@bytelyst/secrets` — Secrets manager abstraction | ⚠️ PRECURSOR EXISTS | 1 day | 2 files (refactor existing `resolveKeyVaultSecrets()`) | Very Low |
| **5**     | `@bytelyst/speech` — Speech STT abstraction       | ⚠️ PRECURSOR EXISTS | 3–4 days        | 3 files + 1 new package. LysnrAI `stt_router.py` already routes Azure↔Whisper             | Medium   |
| **6** | `@bytelyst/push` — Push notification abstraction | ❌ NOT STARTED | 1 day | 1 file + 1 new package. No push infra exists yet | Very Low |
| **7** | Monitoring/Telemetry cleanup | ✅ ALREADY DONE | 0 days | Custom telemetry via `@bytelyst/telemetry-client`, Loki+Grafana in `services/monitoring/` | None |
| **Total** |                                                   |                     | **~16–20 days** | ~90 files                                                                                 |          |
### Priority Order
@ -148,12 +155,14 @@ Sprint 5 (Speech) ──► Sprint 6 (Push) ──► Sprint 7 (Monitoring)
---
## 4. Sprint 1: Database Abstraction Layer
## 4. Sprint 1: Database Abstraction Layer ❌ NOT STARTED
**Package:** `@bytelyst/datastore`
**Effort:** 5–7 days
**Effort:** 7–10 days (revised up from 5–7 — now 78 files vs original 44)
**This is the most important sprint — it eliminates 80% of cloud lock-in.**
> **Current state (2026-03-02):** No `@bytelyst/datastore` package exists. All 78 repository files use `getContainer()` from `@bytelyst/cosmos` → raw Cosmos SQL. The `@bytelyst/cosmos` package itself wraps `@azure/cosmos` directly. No in-memory adapter exists for testing.
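The note above points out that no in-memory adapter exists for testing. As a rough sketch of what one might look like, the block below implements a tiny equality-only store. The `Doc`, `Collection`, and `InMemoryCollection` names and method signatures are assumptions made for illustration; only `findMany` appears in this roadmap, and the real `@bytelyst/datastore` interface would likely be asynchronous and richer.

```typescript
// Hypothetical document and collection shapes, for illustration only.
type Doc = { id: string; [field: string]: unknown };
type EqualityFilter = Record<string, unknown>;

interface Collection {
  upsert(doc: Doc): void;
  findById(id: string): Doc | undefined;
  findMany(filter: EqualityFilter): Doc[];
  remove(id: string): boolean;
}

// In-memory adapter: no cloud SDK involved, so repository tests can run offline.
class InMemoryCollection implements Collection {
  private readonly docs = new Map<string, Doc>();

  upsert(doc: Doc): void {
    // Store a shallow copy so later caller mutations do not leak into the store.
    this.docs.set(doc.id, { ...doc });
  }

  findById(id: string): Doc | undefined {
    const doc = this.docs.get(id);
    return doc ? { ...doc } : undefined;
  }

  findMany(filter: EqualityFilter): Doc[] {
    const keys = Object.keys(filter);
    // A document matches when every filter field is strictly equal.
    return Array.from(this.docs.values())
      .filter((doc) => keys.every((key) => doc[key] === filter[key]))
      .map((doc) => ({ ...doc }));
  }

  remove(id: string): boolean {
    return this.docs.delete(id);
  }
}
```

A repository written against `Collection` rather than `getContainer()` could then be unit-tested with `new InMemoryCollection()` and wired to a Cosmos-backed adapter in production.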
### 4.1 Interface Design
```typescript
@ -459,7 +468,7 @@ export function createDatastoreProvider(): DatastoreProvider {
}
```
### 4.7 Migration Plan for 38 Repository Files
### 4.7 Migration Plan for 78 Repository Files (revised count)
Migrate in batches, one module per commit. Each commit:
@ -470,18 +479,21 @@ Migrate in batches, one module per commit. Each commit:
**Batch order** (simplest first, complex last):
| Batch | Modules | Complexity | Notes |
| ----- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------ | --------------- |
| 1 | flags, plans, settings, changelog, products | Simple CRUD | 5 files, warmup |
| 2 | licenses, sessions, ip-rules, maintenance, feedback | Simple CRUD + filters | 5 files |
| 3 | items, comments, votes, brains, reflections | CRUD + filter combos | 5 files |
| 4 | audit, delivery, notifications, exports, jobs | CRUD + time queries | 5 files |
| 5 | tokens, usage, invitations, referrals, webhooks | More complex queries | 5 files |
| 6 | auth, subscriptions, telemetry, experiments | Complex (GROUP BY, aggregates) | 4 files |
| 7 | timers, shared-timers, routines, households | Sync logic, batch ops | 4 files |
| 8 | fasting-sessions, fasting-protocols, meal-log, social-fasting, daily-briefs, streaks, push-triggers, impersonation, status, memory, analytics, waitlist | Product-specific + remaining | 12 files |
| 9 | Dashboard cosmos clients (admin-web, MindLyst web) | Direct `@azure/cosmos` | 6 files |
| 10 | Python clients (desktop cosmos, backend cosmos) | `azure.cosmos` → abstracted | 2 files |
| Batch | Modules | Complexity | Notes |
| ----- | -------------------------------------------------------------------------------------------------------------------------------- | ------------------------------ | --------------- |
| 1 | flags, plans, settings, changelog, products | Simple CRUD | 5 files, warmup |
| 2 | licenses, sessions, ip-rules, maintenance, feedback | Simple CRUD + filters | 5 files |
| 3 | items, comments, votes, brains, reflections | CRUD + filter combos | 5 files |
| 4 | audit, delivery, notifications, exports, jobs | CRUD + time queries | 5 files |
| 5 | tokens, usage, invitations, referrals, webhooks | More complex queries | 5 files |
| 6 | auth, subscriptions, telemetry, experiments | Complex (GROUP BY, aggregates) | 4 files |
| 7 | ChronoMind backend: timers, routines, households, shared-timers, webhooks | Sync logic, batch ops | 5 files |
| 8 | JarvisJr backend: jarvis-agents, jarvis-sessions, jarvis-memory | Agent memory queries | 3 files |
| 9 | NomGap backend: fasting-sessions, fasting-protocols, meal-log, social-fasting, push-triggers | Product-specific | 5 files |
| 10 | PeakPulse + MindLyst backends: peak-sessions, peak-routes, brains, memory, reflections, daily-briefs, streaks | Product-specific | 7 files |
| 11 | LysnrAI backend + user-dashboard: transcripts, sessions, organizations, api-tokens, webhooks, themes, export + 5 dashboard repos | Product-specific + dashboard | 10 files |
| 12 | Dashboard cosmos clients (admin-web, tracker-web, MindLyst web) | Direct `@azure/cosmos` | 3 files |
| 13 | Python clients (desktop cosmos, backend cosmos) | `azure.cosmos` → abstracted | 2 files |
### 4.8 Handling Complex Queries
@ -526,12 +538,14 @@ The Cosmos adapter translates these to SQL. The MongoDB adapter passes them dire
---
## 5. Sprint 2: Storage Abstraction Layer
## 5. Sprint 2: Storage Abstraction Layer ❌ NOT STARTED
**Package:** `@bytelyst/storage`
**Effort:** 2 days
**Files changed:** `packages/blob/src/blob.ts`, `src/cloud/blob_client.py`, `services/platform-service/src/modules/blob/`
> **Current state (2026-03-02):** `@bytelyst/blob` (162 LOC) wraps `@azure/storage-blob` directly with `BlobServiceClient` and `generateBlobSASQueryParameters`. `@bytelyst/blob-client` exists as a client-side package. No storage abstraction interface exists.
### 5.1 Interface Design
```typescript
@ -614,12 +628,14 @@ const url = await bucket.getSignedUrl('user123/recording.wav', { permissions: 'r
---
## 6. Sprint 3: LLM Provider Abstraction
## 6. Sprint 3: LLM Provider Abstraction ⚠️ PRECURSOR EXISTS
**Package:** `@bytelyst/llm`
**Effort:** 2 days
**Files changed:** `src/llm/text_cleaner.py`, `backend/src/clients/openai_client.py`, MindLyst `web/src/lib/llm.ts`, extraction-service config
> **Current state (2026-03-02):** MindLyst `web/src/lib/llm.ts` already implements a dual-provider pattern with `resolveProvider()` that reads `OPENAI_PROVIDER` / `LLM_PROVIDER` env var and auto-detects Azure vs OpenAI from endpoint URLs. This is exactly the pattern this sprint proposes to extract into a shared `@bytelyst/llm` package. LysnrAI `text_cleaner.py` still imports `AzureOpenAI` directly. Extraction-service uses Azure OpenAI via `server.ts` config.
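To make the dual-provider pattern described above concrete, here is a minimal sketch of provider resolution. It is not the actual MindLyst `llm.ts` code: the precedence between `LLM_PROVIDER` and `OPENAI_PROVIDER`, the `LLM_ENDPOINT` variable name, and the default of `"openai"` are all assumptions. The only outside fact relied on is that Azure OpenAI endpoints normally live under `*.openai.azure.com`.

```typescript
type LlmProvider = "azure" | "openai";
type Env = Record<string, string | undefined>;

// Sketch only: the variable precedence and the LLM_ENDPOINT name are assumed.
function resolveProvider(env: Env): LlmProvider {
  // 1. An explicit setting wins when it names a known provider.
  const explicit = (env["LLM_PROVIDER"] ?? env["OPENAI_PROVIDER"] ?? "")
    .trim()
    .toLowerCase();
  if (explicit === "azure" || explicit === "openai") {
    return explicit;
  }

  // 2. Otherwise infer the provider from the endpoint host name.
  const endpoint = (env["LLM_ENDPOINT"] ?? "").toLowerCase();
  if (endpoint.indexOf(".openai.azure.com") !== -1) {
    return "azure";
  }

  // 3. Fall back to the public OpenAI API (assumed default).
  return "openai";
}
```

Taking the environment as a parameter rather than reading a global keeps the function pure, which makes it straightforward to test and to move into a shared package.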
### 6.1 Interface Design
```typescript
@ -672,12 +688,14 @@ The `openai` Python SDK already has a common interface between `OpenAI` and `Azu
---
## 7. Sprint 4: Secrets Manager Abstraction
## 7. Sprint 4: Secrets Manager Abstraction ⚠️ PRECURSOR EXISTS
**Package:** Refactor existing `@bytelyst/config`
**Effort:** 1 day
**Files changed:** `packages/config/src/keyvault.ts`, `src/secrets/keyvault.py`
> **Current state (2026-03-02):** `@bytelyst/config` exports `resolveKeyVaultSecrets()` which already skips if `AZURE_KEYVAULT_URL` is unset and falls back to env vars. It has 9 unit tests. The function name and imports are Azure-specific (`@azure/identity`, `@azure/keyvault-secrets`) but the fallback-to-env behavior means services already work without Azure Key Vault. This sprint is essentially a rename + provider dispatch — lowest effort of all sprints.
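The skip-and-fall-back behavior described above can be sketched as follows. This is a simplified, synchronous illustration rather than the real `resolveKeyVaultSecrets()`: the `resolveSecrets` name, the `VaultLookup` callback, and the rule that a vault value takes priority over an environment value are assumptions, and a real implementation would call the vault asynchronously.

```typescript
type Env = Record<string, string | undefined>;
// Hypothetical callback standing in for a vault client lookup.
type VaultLookup = (key: string) => string | undefined;

function resolveSecrets(
  keys: string[],
  env: Env,
  vaultLookup?: VaultLookup
): Record<string, string> {
  // Skip the vault entirely when AZURE_KEYVAULT_URL is unset.
  const lookup = env["AZURE_KEYVAULT_URL"] ? vaultLookup : undefined;
  const resolved: Record<string, string> = {};

  for (const key of keys) {
    const fromVault = lookup ? lookup(key) : undefined;
    // Fall back to the plain environment variable of the same name.
    const value = fromVault ?? env[key];
    if (value !== undefined) {
      resolved[key] = value;
    }
  }
  return resolved;
}
```

Because the vault is injected as a callback, swapping Azure Key Vault for another secrets manager would only change what is passed as `vaultLookup`, not the callers.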
### 7.1 Key Insight: Already 90% Done
The current `resolveKeyVaultSecrets()` already:
@ -752,12 +770,14 @@ This means existing `.env` files with `AZURE_*` names continue to work. New depl
---
## 8. Sprint 5: Speech Provider Abstraction
## 8. Sprint 5: Speech Provider Abstraction ⚠️ PRECURSOR EXISTS
**Package:** `@bytelyst/speech`
**Effort:** 3–4 days
**Files changed:** `src/audio/azure_stt.py`, `iosApp/Services/AzureSpeechTranscriber.swift`
> **Current state (2026-03-02):** LysnrAI already has a **`SttRouter`** class in `src/audio/stt_router.py` that routes between `AzureSpeechToText` (online) and `WhisperSpeechToText` (offline/local) based on connectivity. Both engines share the same interface: `start()`, `push_audio()`, `stop()`. This is exactly the provider pattern this sprint proposes. The refactor would extract the protocol/ABC and add the factory function. iOS apps still use Azure Speech SDK directly via `AzureSpeechTranscriber.swift`.
### 8.1 Interface Design (Python)
```python
@ -845,12 +865,14 @@ The abstraction hides these differences behind a unified push-audio + callback i
---
## 9. Sprint 6: Push Notification Abstraction
## 9. Sprint 6: Push Notification Abstraction ❌ NOT STARTED
**Package:** `@bytelyst/push`
**Effort:** 1 day
**Files changed:** Platform-service push-triggers module
> **Current state (2026-03-02):** No push notification infrastructure exists in platform-service. NomGap has `push-triggers` module in its product backend that stores push trigger rules in Cosmos, but no actual delivery mechanism (APNS/FCM) is implemented. The `@bytelyst/delivery` module handles email but not push. No Azure Notification Hub, Firebase, or Expo push integration exists.
### 9.1 Interface Design
```typescript
@ -873,9 +895,9 @@ Implementations: `AzureNotificationHubProvider`, `FirebaseProvider` (future), `E
---
## 10. Sprint 7: Monitoring & Telemetry Abstraction
## 10. Sprint 7: Monitoring & Telemetry Abstraction ✅ ALREADY DONE
**Effort:** 0.5 days (mostly done already)
**Effort:** 0 days (complete)
The ecosystem already has cloud-agnostic monitoring:
@ -883,10 +905,11 @@ The ecosystem already has cloud-agnostic monitoring:
- **Loki + Grafana** in `services/monitoring/`
- **Health checks** via `/health` endpoints on all services
**Remaining work:**
**Remaining work (minor):**
- Remove `opencensus-ext-azure` from Python requirements (optional, only used for App Insights)
- Ensure all structured logging uses `pino` (TS) or `structlog` (Python) — no Azure-specific loggers
- All TS services already use `pino` via Fastify built-in logger. Python desktop uses `structlog`. ✅
---
@ -901,7 +924,7 @@ Once all sprints are complete, here's how much work each cloud migration scenari
| Implement `MongoDatastoreProvider` | 1 day | ~200 lines — translate FilterMap to MongoDB find() |
| Set `DB_PROVIDER=mongo` + `MONGO_URI=...` | 5 minutes | Config change |
| Run data migration script                 | 2–4 hours     | Export Cosmos JSON → import to MongoDB             |
| Run full test suite | 30 minutes | Verify all 1,029+ tests pass |
| Run full test suite | 30 minutes | Verify all ~1,713 service tests pass |
| **Total**                                 | **~1.5 days** | vs 3–5 weeks without abstraction                   |
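The "translate FilterMap to MongoDB find()" step in the table above might look roughly like the sketch below. The `FilterMap` shape shown here (plain values for equality, plus `gt`/`gte`/`lt`/`lte`/`in`/`ne` operator objects) is an assumption for illustration and may not match the interface defined earlier in the roadmap; the `$`-prefixed names are standard MongoDB query operators.

```typescript
// Hypothetical filter shape; the roadmap's real FilterMap may differ.
type Scalar = string | number | boolean | null;
type FieldFilter =
  | Scalar
  | { gt?: number; gte?: number; lt?: number; lte?: number; in?: Scalar[]; ne?: Scalar };
type FilterMap = Record<string, FieldFilter>;
type MongoFilter = Record<string, unknown>;

// Mapping from the sketch's operator names to MongoDB query operators.
const OPERATORS: Record<string, string> = {
  gt: "$gt",
  gte: "$gte",
  lt: "$lt",
  lte: "$lte",
  in: "$in",
  ne: "$ne",
};

function toMongoFilter(filter: FilterMap): MongoFilter {
  const out: MongoFilter = {};
  for (const field of Object.keys(filter)) {
    const condition = filter[field];
    if (condition === null || typeof condition !== "object") {
      // A plain value means an equality match on that field.
      out[field] = condition;
      continue;
    }
    const translated: Record<string, unknown> = {};
    for (const op of Object.keys(condition)) {
      const mongoOp = OPERATORS[op];
      if (mongoOp === undefined) {
        throw new Error(`Unsupported operator: ${op}`);
      }
      translated[mongoOp] = (condition as Record<string, unknown>)[op];
    }
    out[field] = translated;
  }
  return out;
}
```

The resulting object could be passed to a MongoDB driver's `find()` call, while a Cosmos adapter would render the same `FilterMap` as a parameterized SQL `WHERE` clause instead.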
### Scenario: Switch Storage from Azure Blob to S3
@ -1214,19 +1237,28 @@ packages/llm/
## Summary
| Sprint | What | Days | After This Sprint... |
| --------- | -------------------- | --------------- | ---------------------------------------------------------- |
| 1         | Database abstraction | 5–7             | DB swap = implement 1 adapter (~200 LOC) + config change   |
| 2 | Storage abstraction | 2 | Blob swap = implement 1 adapter (~100 LOC) + config change |
| 3 | LLM abstraction | 2 | LLM swap = config change only (10 minutes) |
| 4 | Secrets abstraction | 1 | Secrets swap = config change only |
| 5         | Speech abstraction   | 3–4             | Speech swap = implement 1 adapter (~300 LOC)               |
| 6 | Push abstraction | 1 | Push swap = implement 1 adapter (~50 LOC) |
| 7 | Monitoring cleanup | 0.5 | Already cloud-agnostic |
| **Total** |                      | **~15–17 days** | **Full cloud migration = ~7–10 days instead of 4–8 weeks** |
| Sprint | What | Status | Days | After This Sprint... |
| --------- | -------------------- | ------------------- | --------------- | ---------------------------------------------------------- |
| 1         | Database abstraction | ❌ NOT STARTED      | 7–10            | DB swap = implement 1 adapter (~200 LOC) + config change   |
| 2 | Storage abstraction | ❌ NOT STARTED | 2 | Blob swap = implement 1 adapter (~100 LOC) + config change |
| 3 | LLM abstraction | ⚠️ PRECURSOR EXISTS | 2 | LLM swap = config change only (10 minutes) |
| 4 | Secrets abstraction | ⚠️ PRECURSOR EXISTS | 1 | Secrets swap = config change only |
| 5         | Speech abstraction   | ⚠️ PRECURSOR EXISTS | 3–4             | Speech swap = implement 1 adapter (~300 LOC)               |
| 6 | Push abstraction | ❌ NOT STARTED | 1 | Push swap = implement 1 adapter (~50 LOC) |
| 7 | Monitoring cleanup | ✅ ALREADY DONE | 0 | Already cloud-agnostic |
| **Total** |                      | **1/7 done**        | **~16–20 days** | **Full cloud migration = ~7–10 days instead of 4–8 weeks** |
The key insight: **~80% of migration effort is in Sprint 1 (database)**. If you only do one sprint, do that one. Everything else is comparatively easy.
### Key Changes Since Original Document (2026-03-02 Review)
1. **File count increased from 44 → 78** — product-specific backend modules were split into 6 product repos after this doc was written. Each repo has its own `backend/` with repository files using `getContainer()`.
2. **Sprint 1 effort revised from 5–7 → 7–10 days** — more repository files to migrate.
3. **Batch plan expanded from 10 → 13 batches** — added batches for 6 product backends.
4. **3 precursors identified** — MindLyst LLM auto-detect, LysnrAI STT router, `resolveKeyVaultSecrets()` env-var fallback.
5. **Sprint 7 confirmed complete** — custom telemetry, Loki+Grafana, pino/structlog logging all in place.
6. **Migration effort table updated** — test count changed from 1,029 → ~1,713 (includes product backend tests).
---
_Document generated by automated codebase analysis. Companion to `CLOUD_PROVIDER_MIGRATION_ANALYSIS.md`. Review as the codebase evolves._
_Document generated by automated codebase analysis. Last reviewed 2026-03-02 (comprehensive workspace scan). Companion to `CLOUD_PROVIDER_MIGRATION_ANALYSIS.md`._