diff --git a/docs/roadmaps/not-started/cloud_AGNOSTIC_REFACTOR_ROADMAP.md b/docs/roadmaps/not-started/cloud_AGNOSTIC_REFACTOR_ROADMAP.md index 43c12bc0..64ba1a4e 100644 --- a/docs/roadmaps/not-started/cloud_AGNOSTIC_REFACTOR_ROADMAP.md +++ b/docs/roadmaps/not-started/cloud_AGNOSTIC_REFACTOR_ROADMAP.md @@ -1,9 +1,12 @@ # Cloud-Agnostic Refactor Roadmap — ByteLyst Ecosystem > **Author:** AI Analysis (Cascade) -> **Date:** 2026-03-01 -> **Companion doc:** [`CLOUD_PROVIDER_MIGRATION_ANALYSIS.md`](./CLOUD_PROVIDER_MIGRATION_ANALYSIS.md) +> **Date:** 2026-03-01 · **Last reviewed:** 2026-03-02 (comprehensive workspace scan) +> **Companion doc:** [`CLOUD_PROVIDER_MIGRATION_ANALYSIS.md`](../../architecture/CLOUD_PROVIDER_MIGRATION_ANALYSIS.md) +> **Repos scanned:** `learning_ai_common_plat` (platform-service, 23 packages) · `learning_voice_ai_agent` (LysnrAI) · `learning_multimodal_memory_agents` (MindLyst) · `learning_ai_clock` (ChronoMind) · `learning_ai_jarvis_jr` (JarvisJr) · `learning_ai_fastgap` (NomGap) · `learning_ai_peakpulse` (PeakPulse) > **Goal:** Refactor the codebase so it continues to work on Azure today, but switching to any other cloud provider requires **minimum effort** (days, not weeks). +> +> **Status as of 2026-03-02:** Sprints 1–6 have not been started, and all Azure SDK usage remains direct. 3 partial precursors exist (LysnrAI STT router, MindLyst LLM provider auto-detect, `resolveKeyVaultSecrets()` env-var fallback). Monitoring (Sprint 7) is already cloud-agnostic. 
--- @@ -81,12 +84,16 @@ routes.ts ────────► │ container() │ @azure/cosmos SDK ``` -**Problems:** +**Problems (verified 2026-03-02):** -- 38 platform-service repository files write raw Cosmos SQL queries -- 6 additional repository files in dashboards + MindLyst web -- Blob, Speech, OpenAI all have direct Azure SDK imports -- Switching DB means rewriting 44+ files +- **47** platform-service repository files use `getContainer()` → raw Cosmos SQL queries +- **21** product-backend repository files across 6 repos (ChronoMind 5, JarvisJr 3, NomGap 5, PeakPulse 2, LysnrAI 5, MindLyst 5 + 1 test) +- **8** dashboard/web cosmos clients (LysnrAI user-dashboard 5, MindLyst web 1, admin-web 1, tracker-web 1) +- **2** Python cosmos clients (`src/cloud/cosmos_client.py`, `src/cloud/api_sync.py`) +- `@bytelyst/blob` (162 LOC) wraps `@azure/storage-blob` directly +- `@bytelyst/config` `resolveKeyVaultSecrets()` imports `@azure/keyvault-secrets` directly +- LysnrAI desktop: `azure_stt.py` uses Azure Speech SDK, `text_cleaner.py` uses `AzureOpenAI` +- **Total: ~78 files need migration** (was estimated 44+ — significantly more after product backend split) ### Target: Provider-Agnostic Interfaces @@ -123,16 +130,16 @@ routes.ts ────────► │ collection.findMany({ │ ## 3. 
Sprint Plan Overview -| Sprint | Package / Scope | Effort | Files Changed | Risk | -| --------- | ------------------------------------------------- | --------------- | ----------------------------------- | -------- | -| **1** | `@bytelyst/datastore` — DB abstraction | 5–7 days | 44 repository files + 1 new package | Medium | -| **2** | `@bytelyst/storage` — Blob/Object abstraction | 2 days | 3 files + 1 new package | Low | -| **3** | `@bytelyst/llm` — LLM provider abstraction | 2 days | 4 files + 1 new package | Low | -| **4** | `@bytelyst/secrets` — Secrets manager abstraction | 1 day | 2 files (refactor existing) | Very Low | -| **5** | `@bytelyst/speech` — Speech STT abstraction | 3–4 days | 3 files + 1 new package | Medium | -| **6** | `@bytelyst/push` — Push notification abstraction | 1 day | 1 file + 1 new package | Very Low | -| **7** | Monitoring/Telemetry cleanup | 0.5 days | Already done (custom telemetry) | None | -| **Total** | | **~15–17 days** | ~55 files | | +| Sprint | Package / Scope | Status | Effort | Files Changed (updated) | Risk | +| --------- | ------------------------------------------------- | ------------------- | --------------- | ----------------------------------------------------------------------------------------- | -------- | +| **1** | `@bytelyst/datastore` — DB abstraction | ❌ NOT STARTED | 7–10 days | **78** repository files + 1 new package (was 44 — now includes product backends) | Medium | +| **2** | `@bytelyst/storage` — Blob/Object abstraction | ❌ NOT STARTED | 2 days | 3 files + 1 new package | Low | +| **3** | `@bytelyst/llm` — LLM provider abstraction | ⚠️ PRECURSOR EXISTS | 2 days | 4 files + 1 new package. 
MindLyst `llm.ts` already auto-detects Azure vs OpenAI | Low | +| **4** | `@bytelyst/secrets` — Secrets manager abstraction | ⚠️ PRECURSOR EXISTS | 1 day | 2 files (refactor existing `resolveKeyVaultSecrets()`) | Very Low | +| **5** | `@bytelyst/speech` — Speech STT abstraction | ⚠️ PRECURSOR EXISTS | 3–4 days | 3 files + 1 new package. LysnrAI `stt_router.py` already routes Azure↔Whisper | Medium | +| **6** | `@bytelyst/push` — Push notification abstraction | ❌ NOT STARTED | 1 day | 1 file + 1 new package. No push infra exists yet | Very Low | +| **7** | Monitoring/Telemetry cleanup | ✅ ALREADY DONE | 0 days | Custom telemetry via `@bytelyst/telemetry-client`, Loki+Grafana in `services/monitoring/` | None | +| **Total** | | | **~16–20 days** | ~90 files | | ### Priority Order @@ -148,12 +155,14 @@ Sprint 5 (Speech) ──► Sprint 6 (Push) ──► Sprint 7 (Monitoring) --- -## 4. Sprint 1: Database Abstraction Layer +## 4. Sprint 1: Database Abstraction Layer ❌ NOT STARTED **Package:** `@bytelyst/datastore` -**Effort:** 5–7 days +**Effort:** 7–10 days (revised up from 5–7 — now 78 files vs original 44) **This is the most important sprint — it eliminates 80% of cloud lock-in.** +> **Current state (2026-03-02):** No `@bytelyst/datastore` package exists. All 78 repository files use `getContainer()` from `@bytelyst/cosmos` → raw Cosmos SQL. The `@bytelyst/cosmos` package itself wraps `@azure/cosmos` directly. No in-memory adapter exists for testing. + ### 4.1 Interface Design ```typescript @@ -459,7 +468,7 @@ export function createDatastoreProvider(): DatastoreProvider { } ``` -### 4.7 Migration Plan for 38 Repository Files +### 4.7 Migration Plan for 78 Repository Files (revised count) Migrate in batches, one module per commit. Each commit: @@ -470,18 +479,21 @@ Migrate in batches, one module per commit. 
Each commit: **Batch order** (simplest first, complex last): -| Batch | Modules | Complexity | Notes | -| ----- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------ | --------------- | -| 1 | flags, plans, settings, changelog, products | Simple CRUD | 5 files, warmup | -| 2 | licenses, sessions, ip-rules, maintenance, feedback | Simple CRUD + filters | 5 files | -| 3 | items, comments, votes, brains, reflections | CRUD + filter combos | 5 files | -| 4 | audit, delivery, notifications, exports, jobs | CRUD + time queries | 5 files | -| 5 | tokens, usage, invitations, referrals, webhooks | More complex queries | 5 files | -| 6 | auth, subscriptions, telemetry, experiments | Complex (GROUP BY, aggregates) | 4 files | -| 7 | timers, shared-timers, routines, households | Sync logic, batch ops | 4 files | -| 8 | fasting-sessions, fasting-protocols, meal-log, social-fasting, daily-briefs, streaks, push-triggers, impersonation, status, memory, analytics, waitlist | Product-specific + remaining | 12 files | -| 9 | Dashboard cosmos clients (admin-web, MindLyst web) | Direct `@azure/cosmos` | 6 files | -| 10 | Python clients (desktop cosmos, backend cosmos) | `azure.cosmos` → abstracted | 2 files | +| Batch | Modules | Complexity | Notes | +| ----- | -------------------------------------------------------------------------------------------------------------------------------- | ------------------------------ | --------------- | +| 1 | flags, plans, settings, changelog, products | Simple CRUD | 5 files, warmup | +| 2 | licenses, sessions, ip-rules, maintenance, feedback | Simple CRUD + filters | 5 files | +| 3 | items, comments, votes, brains, reflections | CRUD + filter combos | 5 files | +| 4 | audit, delivery, notifications, exports, jobs | CRUD + time queries | 5 files | +| 5 | tokens, usage, invitations, referrals, webhooks | More complex 
queries | 5 files | +| 6 | auth, subscriptions, telemetry, experiments | Complex (GROUP BY, aggregates) | 4 files | +| 7 | ChronoMind backend: timers, routines, households, shared-timers, webhooks | Sync logic, batch ops | 5 files | +| 8 | JarvisJr backend: jarvis-agents, jarvis-sessions, jarvis-memory | Agent memory queries | 3 files | +| 9 | NomGap backend: fasting-sessions, fasting-protocols, meal-log, social-fasting, push-triggers | Product-specific | 5 files | +| 10 | PeakPulse + MindLyst backends: peak-sessions, peak-routes, brains, memory, reflections, daily-briefs, streaks | Product-specific | 7 files | +| 11 | LysnrAI backend + user-dashboard: transcripts, sessions, organizations, api-tokens, webhooks, themes, export + 5 dashboard repos | Product-specific + dashboard | 10 files | +| 12 | Dashboard cosmos clients (admin-web, tracker-web, MindLyst web) | Direct `@azure/cosmos` | 3 files | +| 13 | Python clients (desktop cosmos, backend cosmos) | `azure.cosmos` → abstracted | 2 files | ### 4.8 Handling Complex Queries @@ -526,12 +538,14 @@ The Cosmos adapter translates these to SQL. The MongoDB adapter passes them dire --- -## 5. Sprint 2: Storage Abstraction Layer +## 5. Sprint 2: Storage Abstraction Layer ❌ NOT STARTED **Package:** `@bytelyst/storage` **Effort:** 2 days **Files changed:** `packages/blob/src/blob.ts`, `src/cloud/blob_client.py`, `services/platform-service/src/modules/blob/` +> **Current state (2026-03-02):** `@bytelyst/blob` (162 LOC) wraps `@azure/storage-blob` directly with `BlobServiceClient` and `generateBlobSASQueryParameters`. `@bytelyst/blob-client` exists as a client-side package. No storage abstraction interface exists. + ### 5.1 Interface Design ```typescript @@ -614,12 +628,14 @@ const url = await bucket.getSignedUrl('user123/recording.wav', { permissions: 'r --- -## 6. Sprint 3: LLM Provider Abstraction +## 6. 
Sprint 3: LLM Provider Abstraction ⚠️ PRECURSOR EXISTS **Package:** `@bytelyst/llm` **Effort:** 2 days **Files changed:** `src/llm/text_cleaner.py`, `backend/src/clients/openai_client.py`, MindLyst `web/src/lib/llm.ts`, extraction-service config +> **Current state (2026-03-02):** MindLyst `web/src/lib/llm.ts` already implements a dual-provider pattern with `resolveProvider()` that reads `OPENAI_PROVIDER` / `LLM_PROVIDER` env var and auto-detects Azure vs OpenAI from endpoint URLs. This is exactly the pattern this sprint proposes to extract into a shared `@bytelyst/llm` package. LysnrAI `text_cleaner.py` still imports `AzureOpenAI` directly. Extraction-service uses Azure OpenAI via `server.ts` config. + ### 6.1 Interface Design ```typescript @@ -672,12 +688,14 @@ The `openai` Python SDK already has a common interface between `OpenAI` and `Azu --- -## 7. Sprint 4: Secrets Manager Abstraction +## 7. Sprint 4: Secrets Manager Abstraction ⚠️ PRECURSOR EXISTS **Package:** Refactor existing `@bytelyst/config` **Effort:** 1 day **Files changed:** `packages/config/src/keyvault.ts`, `src/secrets/keyvault.py` +> **Current state (2026-03-02):** `@bytelyst/config` exports `resolveKeyVaultSecrets()` which already skips if `AZURE_KEYVAULT_URL` is unset and falls back to env vars. It has 9 unit tests. The function name and imports are Azure-specific (`@azure/identity`, `@azure/keyvault-secrets`) but the fallback-to-env behavior means services already work without Azure Key Vault. This sprint is essentially a rename + provider dispatch — lowest effort of all sprints. + ### 7.1 Key Insight: Already 90% Done The current `resolveKeyVaultSecrets()` already: @@ -752,12 +770,14 @@ This means existing `.env` files with `AZURE_*` names continue to work. New depl --- -## 8. Sprint 5: Speech Provider Abstraction +## 8. 
Sprint 5: Speech Provider Abstraction ⚠️ PRECURSOR EXISTS **Package:** `@bytelyst/speech` **Effort:** 3–4 days **Files changed:** `src/audio/azure_stt.py`, `iosApp/Services/AzureSpeechTranscriber.swift` +> **Current state (2026-03-02):** LysnrAI already has a **`SttRouter`** class in `src/audio/stt_router.py` that routes between `AzureSpeechToText` (online) and `WhisperSpeechToText` (offline/local) based on connectivity. Both engines share the same interface: `start()`, `push_audio()`, `stop()`. This is exactly the provider pattern this sprint proposes. The refactor would extract the protocol/ABC and add the factory function. iOS apps still use Azure Speech SDK directly via `AzureSpeechTranscriber.swift`. + ### 8.1 Interface Design (Python) ```python @@ -845,12 +865,14 @@ The abstraction hides these differences behind a unified push-audio + callback i --- -## 9. Sprint 6: Push Notification Abstraction +## 9. Sprint 6: Push Notification Abstraction ❌ NOT STARTED **Package:** `@bytelyst/push` **Effort:** 1 day **Files changed:** Platform-service push-triggers module +> **Current state (2026-03-02):** No push notification infrastructure exists in platform-service. NomGap has `push-triggers` module in its product backend that stores push trigger rules in Cosmos, but no actual delivery mechanism (APNS/FCM) is implemented. The `@bytelyst/delivery` module handles email but not push. No Azure Notification Hub, Firebase, or Expo push integration exists. + ### 9.1 Interface Design ```typescript @@ -873,9 +895,9 @@ Implementations: `AzureNotificationHubProvider`, `FirebaseProvider` (future), `E --- -## 10. Sprint 7: Monitoring & Telemetry Abstraction +## 10. 
Sprint 7: Monitoring & Telemetry Abstraction ✅ ALREADY DONE -**Effort:** 0.5 days (mostly done already) +**Effort:** 0 days (complete) The ecosystem already has cloud-agnostic monitoring: @@ -883,10 +905,11 @@ The ecosystem already has cloud-agnostic monitoring: - **Loki + Grafana** in `services/monitoring/` - **Health checks** via `/health` endpoints on all services -**Remaining work:** +**Remaining work (minor):** - Remove `opencensus-ext-azure` from Python requirements (optional, only used for App Insights) - Ensure all structured logging uses `pino` (TS) or `structlog` (Python) — no Azure-specific loggers +- All TS services already use `pino` via Fastify built-in logger. Python desktop uses `structlog`. ✅ --- @@ -901,7 +924,7 @@ Once all sprints are complete, here's how much work each cloud migration scenari | Implement `MongoDatastoreProvider` | 1 day | ~200 lines — translate FilterMap to MongoDB find() | | Set `DB_PROVIDER=mongo` + `MONGO_URI=...` | 5 minutes | Config change | | Run data migration script | 2–4 hours | Export Cosmos JSON → import to MongoDB | -| Run full test suite | 30 minutes | Verify all 1,029+ tests pass | +| Run full test suite | 30 minutes | Verify all ~1,713 service tests pass | | **Total** | **~1.5 days** | vs 3–5 weeks without abstraction | ### Scenario: Switch Storage from Azure Blob to S3 @@ -1214,19 +1237,28 @@ packages/llm/ ## Summary -| Sprint | What | Days | After This Sprint... 
| -| --------- | -------------------- | --------------- | ---------------------------------------------------------- | -| 1 | Database abstraction | 5–7 | DB swap = implement 1 adapter (~200 LOC) + config change | -| 2 | Storage abstraction | 2 | Blob swap = implement 1 adapter (~100 LOC) + config change | -| 3 | LLM abstraction | 2 | LLM swap = config change only (10 minutes) | -| 4 | Secrets abstraction | 1 | Secrets swap = config change only | -| 5 | Speech abstraction | 3–4 | Speech swap = implement 1 adapter (~300 LOC) | -| 6 | Push abstraction | 1 | Push swap = implement 1 adapter (~50 LOC) | -| 7 | Monitoring cleanup | 0.5 | Already cloud-agnostic | -| **Total** | | **~15–17 days** | **Full cloud migration = ~7–10 days instead of 4–8 weeks** | +| Sprint | What | Status | Days | After This Sprint... | +| --------- | -------------------- | ------------------- | --------------- | ---------------------------------------------------------- | +| 1 | Database abstraction | ❌ NOT STARTED | 7–10 | DB swap = implement 1 adapter (~200 LOC) + config change | +| 2 | Storage abstraction | ❌ NOT STARTED | 2 | Blob swap = implement 1 adapter (~100 LOC) + config change | +| 3 | LLM abstraction | ⚠️ PRECURSOR EXISTS | 2 | LLM swap = config change only (10 minutes) | +| 4 | Secrets abstraction | ⚠️ PRECURSOR EXISTS | 1 | Secrets swap = config change only | +| 5 | Speech abstraction | ⚠️ PRECURSOR EXISTS | 3–4 | Speech swap = implement 1 adapter (~300 LOC) | +| 6 | Push abstraction | ❌ NOT STARTED | 1 | Push swap = implement 1 adapter (~50 LOC) | +| 7 | Monitoring cleanup | ✅ ALREADY DONE | 0 | Already cloud-agnostic | +| **Total** | | **1/7 done** | **~16–20 days** | **Full cloud migration = ~7–10 days instead of 4–8 weeks** | The key insight: **~80% of migration effort is in Sprint 1 (database)**. If you only do one sprint, do that one. Everything else is comparatively easy. +### Key Changes Since Original Document (2026-03-02 Review) + +1. 
**File count increased from 44 → 78** — product-specific backend modules were split into 6 product repos after this doc was written. Each repo has its own `backend/` with repository files using `getContainer()`. +2. **Sprint 1 effort revised from 5–7 → 7–10 days** — more repository files to migrate. +3. **Batch plan expanded from 10 → 13 batches** — added batches for 6 product backends. +4. **3 precursors identified** — MindLyst LLM auto-detect, LysnrAI STT router, `resolveKeyVaultSecrets()` env-var fallback. +5. **Sprint 7 confirmed complete** — custom telemetry, Loki+Grafana, pino/structlog logging all in place. +6. **Migration effort table updated** — test count changed from 1,029 → ~1,713 (includes product backend tests). + --- -_Document generated by automated codebase analysis. Companion to `CLOUD_PROVIDER_MIGRATION_ANALYSIS.md`. Review as the codebase evolves._ +_Document generated by automated codebase analysis. Last reviewed 2026-03-02 (comprehensive workspace scan). Companion to `CLOUD_PROVIDER_MIGRATION_ANALYSIS.md`._
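
The dual-provider pattern noted under Sprint 3 (an explicit `LLM_PROVIDER` / `OPENAI_PROVIDER` override, otherwise auto-detection from the endpoint URL) could look roughly like the sketch below. This is an illustration under stated assumptions, not the actual MindLyst `llm.ts` code: the `OPENAI_ENDPOINT` variable name, the precedence of `LLM_PROVIDER` over `OPENAI_PROVIDER`, and the fallback to `openai` are all hypothetical.

```typescript
// Hypothetical sketch of provider resolution; not the real MindLyst implementation.
type LlmProvider = "azure" | "openai";

// `env` is passed in explicitly so the function stays pure and easy to unit-test.
// In a service it would typically be called with `process.env`.
function resolveProvider(env: Record<string, string | undefined>): LlmProvider {
  // Assumption: LLM_PROVIDER takes precedence over OPENAI_PROVIDER.
  const explicit = (env.LLM_PROVIDER ?? env.OPENAI_PROVIDER ?? "").trim().toLowerCase();
  if (explicit === "azure") return "azure";
  if (explicit === "openai") return "openai";

  // Assumption: the endpoint lives in OPENAI_ENDPOINT (illustrative name only).
  const endpoint = (env.OPENAI_ENDPOINT ?? "").toLowerCase();

  // Azure OpenAI resource endpoints are served under *.openai.azure.com.
  if (endpoint.includes(".openai.azure.com")) return "azure";

  // Assumed default when nothing indicates Azure.
  return "openai";
}
```

Keeping the resolution logic in one pure function means a shared `@bytelyst/llm` package could test it without touching real environment variables, and callers only ever see the resolved provider name.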