From af5684587dccd7711a9db8cb3e790021578ef232 Mon Sep 17 00:00:00 2001 From: saravanakumardb1 Date: Sat, 21 Mar 2026 09:01:35 -0700 Subject: [PATCH] docs(devops): fix 14 bugs/gaps in E2EE design + roadmap after codebase review --- docs/devops/END_TO_END_ENCRYPTION_DESIGN.md | 106 ++++++++++++------- docs/devops/END_TO_END_ENCRYPTION_ROADMAP.md | 60 ++++++----- 2 files changed, 100 insertions(+), 66 deletions(-) diff --git a/docs/devops/END_TO_END_ENCRYPTION_DESIGN.md b/docs/devops/END_TO_END_ENCRYPTION_DESIGN.md index f9a0bd01..a84f99ed 100644 --- a/docs/devops/END_TO_END_ENCRYPTION_DESIGN.md +++ b/docs/devops/END_TO_END_ENCRYPTION_DESIGN.md @@ -252,27 +252,33 @@ After encryption: - `dekId` identifies which DEK to unwrap — enables per-user key isolation - Non-sensitive fields (`userId`, `productId`, `createdAt`) remain in plaintext for querying - Cosmos DB queries on encrypted fields are impossible by design — use indexed plaintext metadata fields for filtering +- For array fields (e.g., JarvisJr `transcript[]`, MindLyst `repeatedThemes[]`), JSON-serialize the entire array before encryption, producing a single `EncryptedField` object. On decrypt, JSON-parse back to the original array. + +> **¹ Array field encryption:** Fields marked with ¹ in §4.6 are arrays or structs. The pattern is: `JSON.stringify(array)` → encrypt → store as `EncryptedField`. On read: decrypt → `JSON.parse()` → typed array. This keeps the schema simple (one encrypted blob per field) at the cost of losing per-element access. 
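The array pattern in the footnote above (`JSON.stringify` → encrypt → single `EncryptedField`, then decrypt → `JSON.parse`) can be sketched with `node:crypto`. This is a minimal illustration, not the `@bytelyst/field-encrypt` implementation: the `EncryptedField` shape and the helper names `encryptArrayField` / `decryptArrayField` are assumptions, and the DEK is passed in directly rather than unwrapped through the MEK.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical shape for illustration; the real EncryptedField type may differ.
interface EncryptedField {
  __encrypted: true;
  dekId: string;
  iv: string; // base64, 12-byte GCM nonce
  authTag: string; // base64, 16-byte GCM authentication tag
  ciphertext: string; // base64
}

// Serialize the whole array to JSON, then encrypt it as one AES-256-GCM blob.
function encryptArrayField<T>(value: T[], dek: Buffer, dekId: string): EncryptedField {
  const iv = randomBytes(12); // fresh nonce for every encryption
  const cipher = createCipheriv("aes-256-gcm", dek, iv);
  const ciphertext = Buffer.concat([
    cipher.update(JSON.stringify(value), "utf8"),
    cipher.final(),
  ]);
  return {
    __encrypted: true,
    dekId,
    iv: iv.toString("base64"),
    authTag: cipher.getAuthTag().toString("base64"),
    ciphertext: ciphertext.toString("base64"),
  };
}

// Decrypt the blob (throws if the tag does not verify), then parse back to an array.
function decryptArrayField<T>(field: EncryptedField, dek: Buffer): T[] {
  const decipher = createDecipheriv("aes-256-gcm", dek, Buffer.from(field.iv, "base64"));
  decipher.setAuthTag(Buffer.from(field.authTag, "base64"));
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(field.ciphertext, "base64")),
    decipher.final(),
  ]);
  return JSON.parse(plaintext.toString("utf8")) as T[];
}

// Usage with a JarvisJr-style transcript array and a throwaway 32-byte DEK.
const dek = randomBytes(32);
const transcript = [{ role: "user", content: "hello", ts: 1 }];
const stored = encryptArrayField(transcript, dek, "dek-demo");
const restored = decryptArrayField<(typeof transcript)[number]>(stored, dek);
```

Because the array is encrypted as one blob, any change to a single element means decrypting, modifying, and re-encrypting the whole field, which is the trade-off the footnote names.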
### 4.6 Fields to Encrypt by Product -| Product | Container | Field(s) to Encrypt | Sensitivity | Priority | -| --------------- | ----------------- | --------------------------------------- | ----------- | --------------- | -| **LysnrAI** | `transcripts` | `transcriptText`, `rawAudio` (blob ref) | Critical | P0 | -| **LysnrAI** | `sessions` | `notes` | High | P1 | -| **JarvisJr** | `jarvis_sessions` | `transcript`, `coachingNotes` | Critical | P0 | -| **JarvisJr** | `jarvis_memory` | `content` | Critical | P0 | -| **NoteLett** | `notes` | `body` | Critical | P0 | -| **NoteLett** | `note_artifacts` | `content` | High | P1 | -| **MindLyst** | `memory_items` | `content`, `voiceTranscriptText` | Critical | P0 | -| **MindLyst** | `brains` | `description` | Medium | P2 | -| **MindLyst** | `reflections` | `content` | High | P1 | -| **NomGap** | `meal_logs` | `notes` | Low | P3 | -| **ChronoMind** | `timers` | (none — low sensitivity) | Low | Defer | -| **PeakPulse** | `peak_sessions` | (none — GPS/stats not sensitive) | Low | Defer | -| **FlowMonk** | `tasks` | `notes` | Low | P3 | -| **ActionTrail** | `trail_actions` | `beforeSnapshot`, `afterSnapshot` | Medium | P2 | -| **LocalMemGPT** | SQLite `messages` | `content` | High | P1 | -| **SmartAuth** | `auth_mfa` | `encryptedSecret` | Critical | ✅ Already done | +| Product | Container | Field(s) to Encrypt | Sensitivity | Priority | +| --------------- | ------------------ | ------------------------------------------------------------------------------- | ----------- | --------------- | +| **LysnrAI** | `transcripts` | `rawText`, `cleanedText` | Critical | P0 | +| **LysnrAI** | `sessions` | (no sensitive free-text fields) | — | Skip | +| **JarvisJr** | `jarvis_sessions` | `transcript` (array¹), `coachingNotes` (array¹) | Critical | P0 | +| **JarvisJr** | `jarvis_memory` | `content` | Critical | P0 | +| **NoteLett** | `notes` | `body` | Critical | P0 | +| **NoteLett** | `note_artifacts` | `content` | High | P1 | +| 
**MindLyst** | `memory_items` | `content`, `voiceTranscriptText` | Critical | P0 | +| **MindLyst** | `brains` | `description` | Medium | P2 | +| **MindLyst** | `reflections` | `repeatedThemes`, `suggestedAdjustments`, `vsLastWeek.summary` (arrays/struct¹) | High | P1 | +| **NomGap** | `meal_logs` | `notes` | Low | P3 | +| **NomGap** | `fasting_sessions` | `notes`, `moodCheckins[].notes` | Low | P3 | +| **NomGap** | `weight_logs` | `notes` | Low | P3 | +| **ChronoMind** | `timers` | (none — low sensitivity) | Low | Defer | +| **PeakPulse** | `peak_sessions` | (none — GPS/stats not sensitive) | Low | Defer | +| **FlowMonk** | `tasks` | `description` | Low | P3 | +| **ActionTrail** | `trail_actions` | `before`, `after` | Medium | P2 | +| **ActionTrail** | `trail_reverts` | `beforeState`, `afterState` | Medium | P2 | +| **LocalMemGPT** | SQLite `messages` | `content` | High | P1 | +| **SmartAuth** | `auth_mfa` | `encryptedSecret` | Critical | ✅ Already done | ### 4.7 Query Impact Analysis @@ -333,11 +339,11 @@ All platforms support AES-256-GCM natively — no external dependencies needed. 
### 5.4 New SDK Components -| SDK | Component | Purpose | -| ------------------------- | ------------------------- | ----------------------------------------- | -| `@bytelyst/field-encrypt` | `createClientEncryptor()` | Web + React Native client-side encryption | -| `swift-platform-sdk` | `BLFieldEncrypt` | iOS/macOS/watchOS client-side encryption | -| `kotlin-platform-sdk` | `BLFieldEncrypt` | Android/Wear OS client-side encryption | +| SDK | Component | Purpose | +| -------------------------- | ------------------------- | ----------------------------------------------------------------------------------------------------- | +| `@bytelyst/client-encrypt` | `createClientEncryptor()` | Web + React Native client-side encryption (separate package — uses Web Crypto API, not `node:crypto`) | +| `swift-platform-sdk` | `BLFieldEncrypt` | iOS/macOS/watchOS client-side encryption | +| `kotlin-platform-sdk` | `BLFieldEncrypt` | Android/Wear OS client-side encryption | --- @@ -540,6 +546,7 @@ For Level 2, the existing `BLBlobClient` (Swift/Kotlin) and `@bytelyst/blob-clie | Azure Key Vault RBAC mode | ⚠️ Planned (currently access policies) | Tier 1 MEK management | | AKV `encrypt`/`decrypt`/`wrapKey`/`unwrapKey` permissions | Not configured | Tier 1 envelope encryption | | `@azure/keyvault-keys` npm package | Available (not installed) | Tier 1 AKV key provider | +| `_encryption_keys` Cosmos container (per product) | Not created | Tier 1 DEK storage | | RS256 JWT migration (SmartAuth Phase 4) | Planned | Tier 2 key distribution | | iCloud Keychain sharing entitlements | Not configured | Tier 3 iOS key sync | | QR code library (iOS/Android) | Not installed | Tier 3 cross-platform key transfer | @@ -565,13 +572,13 @@ For Level 2, the existing `BLBlobClient` (Swift/Kotlin) and `@bytelyst/blob-clie ## Appendix A: Algorithm Selection Rationale -| Algorithm | Use Case | Why | -| ----------------- | ---------------------------------------- | 
------------------------------------------------------------------------------------------------------------- | -| **AES-256-GCM** | Field encryption (all tiers) | AEAD — provides confidentiality + integrity + authentication. Native support on all platforms. NIST approved. | -| **RSA-OAEP-256** | MEK wrapping in AKV | Azure Key Vault natively supports RSA key wrapping. 4096-bit RSA for quantum-resistant margin. | -| **PBKDF2-SHA256** | Web passphrase → key derivation | Web Crypto API native. 600,000 iterations (OWASP 2024 recommendation). | -| **X25519** | Tier 3 key exchange | Modern elliptic-curve DH. Used by Signal, WireGuard. Compact (32-byte keys). | -| **HKDF-SHA256** | Tier 3 key derivation from shared secret | Standard KDF for deriving symmetric keys from DH output. | +| Algorithm | Use Case | Why | +| ----------------- | ---------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **AES-256-GCM** | Field encryption (all tiers) | AEAD — provides confidentiality + integrity + authentication. Native support on all platforms. NIST approved. | +| **RSA-OAEP-256** | MEK wrapping in AKV | Azure Key Vault natively supports RSA key wrapping. 4096-bit RSA for maximum classical security margin. **Note:** RSA is NOT quantum-resistant (Shor's algorithm). Elliptic-curve keys are equally vulnerable, so the post-quantum migration path is to adopt NIST-standardized PQC algorithms once AKV supports them. | +| **PBKDF2-SHA256** | Web passphrase → key derivation | Web Crypto API native. 600,000 iterations (OWASP 2024 recommendation). | +| **X25519** | Tier 3 key exchange | Modern elliptic-curve DH. Used by Signal, WireGuard. Compact (32-byte keys). | +| **HKDF-SHA256** | Tier 3 key derivation from shared secret | Standard KDF for deriving symmetric keys from DH output. 
| ## Appendix B: Encrypted Field Detection Pattern @@ -599,14 +606,35 @@ async function getNote(id: string): Promise { } ``` -## Appendix C: Cost Impact +## Appendix C: Review Findings (2026-03-21) -| Resource | Current | After Tier 1 | Notes | -| -------------- | ----------------------- | ------------------------------- | ------------------------------------------------------------------- | -| AKV operations | ~100/day (secret reads) | ~500/day (+DEK unwraps) | Well within free tier (10K ops/month free) | -| AKV keys | 0 RSA keys | 10 RSA keys (1 MEK per product) | $1/key/month for software-protected; $5 for HSM | -| Cosmos RU | Baseline | +5-10% (larger encrypted docs) | Encrypted fields are ~30% larger than plaintext (base64 + metadata) | -| Compute | Baseline | +1-2% CPU | AES-256-GCM is hardware-accelerated (AES-NI) on all modern CPUs | -| Storage | Baseline | +15-20% for encrypted fields | Base64 encoding + IV + authTag + dekId overhead | +Systematic review against actual codebase. **14 issues found and fixed:** -**Estimated monthly cost increase: < $5** (dominated by AKV key charges). 
+| # | Type | Issue | Resolution | +| --- | ---- | ----------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------- | +| F1 | BUG | LysnrAI field name: doc said `transcriptText` but actual fields are `rawText` + `cleanedText` | Fixed §4.6 field table | +| F2 | BUG | LysnrAI `sessions.notes` doesn't exist — `SessionDoc` has no `notes` field | Removed from field table | +| F3 | BUG | ActionTrail field names: doc said `beforeSnapshot`/`afterSnapshot` but actual are `before`/`after` (actions) and `beforeState`/`afterState` (reverts) | Fixed §4.6 + split into 2 rows | +| F4 | BUG | JarvisJr `transcript` is `Array<{role,content,ts}>` not string | Added array encryption pattern (JSON-serialize → encrypt) | +| F5 | BUG | JarvisJr `coachingNotes` is `string[]` not string | Marked as array¹ in field table | +| F6 | BUG | FlowMonk tasks has no `notes` field — actual field is `description` | Fixed §4.6 | +| F7 | BUG | MindLyst reflections has no single `content` field — has array fields (`repeatedThemes`, `suggestedAdjustments`, etc.) 
| Fixed §4.6 with actual field names | +| F8 | BUG | RSA-4096 "quantum-resistant" claim is wrong — Shor's algorithm breaks all RSA | Fixed Appendix A + Decision Log | +| F9 | BUG | Cost section said "10K ops/month free" for AKV — AKV Standard charges per operation | Fixed Appendix D cost table | +| F10 | GAP | NomGap has 3 modules with `notes` (meal-log, fasting-sessions, weight-log) — doc only listed meal_logs | Added all 3 containers | +| F11 | GAP | Array field encryption pattern not addressed | Added array encryption note in §4.5 + footnote ¹ | +| F12 | GAP | `@bytelyst/client-encrypt` vs `@bytelyst/field-encrypt` naming inconsistency | Clarified: `field-encrypt` = server (node:crypto), `client-encrypt` = browser (SubtleCrypto) | +| F13 | GAP | DEK storage container `_encryption_keys` not in prerequisites | Added to §12 prerequisites table | +| F14 | GAP | ActionTrail reverts container missing from field table | Added `trail_reverts` row | + +## Appendix D: Cost Impact + +| Resource | Current | After Tier 1 | Notes | +| -------------- | ----------------------- | ------------------------------- | ------------------------------------------------------------------------------ | +| AKV operations | ~100/day (secret reads) | ~500/day (+DEK unwraps) | AKV Standard: $0.03/10K transactions. ~15K ops/month ≈ $0.05/month | +| AKV keys | 0 RSA keys | 10 RSA keys (1 MEK per product) | Software-protected: $0.03/10K ops; HSM-protected: $1/key/month + $0.03/10K ops | +| Cosmos RU | Baseline | +5-10% (larger encrypted docs) | Encrypted fields are ~30% larger than plaintext (base64 + metadata) | +| Compute | Baseline | +1-2% CPU | AES-256-GCM is hardware-accelerated (AES-NI) on all modern CPUs | +| Storage | Baseline | +15-20% for encrypted fields | Base64 encoding + IV + authTag + dekId overhead | + +**Estimated monthly cost increase: < $10** (dominated by Cosmos storage overhead for encrypted fields). 
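The AKV operations figure in the cost table follows from simple arithmetic, sketched here using only the numbers the table states (~500 operations/day at $0.03 per 10K transactions). The 30-day month and the helper name `monthlyAkvOpsCost` are assumptions for illustration; actual pricing depends on key type and region.

```typescript
// Hypothetical helper: estimates the monthly AKV transaction charge in USD.
function monthlyAkvOpsCost(
  opsPerDay: number,
  pricePer10kOps: number,
  daysPerMonth = 30, // assumed month length
): number {
  const opsPerMonth = opsPerDay * daysPerMonth;
  return (opsPerMonth / 10_000) * pricePer10kOps;
}

// 500 ops/day is 15,000 ops/month, i.e. 1.5 billing units of 10K transactions.
const akvOpsCost = monthlyAkvOpsCost(500, 0.03);
console.log(`Estimated AKV transaction cost: $${akvOpsCost.toFixed(3)}/month`);
```

The result is about $0.045, which the table rounds to $0.05/month. Transaction charges are therefore negligible next to the storage overhead that the summary line identifies as the dominant cost.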
diff --git a/docs/devops/END_TO_END_ENCRYPTION_ROADMAP.md b/docs/devops/END_TO_END_ENCRYPTION_ROADMAP.md index 2d7bfb05..bee6f89a 100644 --- a/docs/devops/END_TO_END_ENCRYPTION_ROADMAP.md +++ b/docs/devops/END_TO_END_ENCRYPTION_ROADMAP.md @@ -62,6 +62,7 @@ Week 1-2 Week 3-4 Week 5-6 Week 7-8 Week 9-10 Week 11-14 - **Script:** `scripts/create-encryption-keys.sh` - [ ] **1.1.3** Add new env vars to `.env.example` and AKV secrets doc + ``` AZURE_KEYVAULT_URL=https://kv-mywisprai.vault.azure.net FIELD_ENCRYPT_KEY_PROVIDER=akv # 'akv' | 'env' | 'memory' @@ -165,9 +166,9 @@ Week 1-2 Week 3-4 Week 5-6 Week 7-8 Week 9-10 Week 11-14 }); ``` -- [ ] **2.1.3** Encrypt `transcriptText` in transcripts module +- [ ] **2.1.3** Encrypt `rawText` and `cleanedText` in transcripts module - `repository.ts`: encrypt on `create()`, decrypt on `findById()` and `findByUserId()` - - Keep `transcriptText` field name but value changes from string to EncryptedField + - Keep field names but value changes from string to EncryptedField - **Migration:** Add `POST /api/admin/migrate-encryption` endpoint (admin-only) that batch-encrypts existing plaintext - [ ] **2.1.4** Update config schema with encryption env vars @@ -180,7 +181,7 @@ Week 1-2 Week 3-4 Week 5-6 Week 7-8 Week 9-10 Week 11-14 - [ ] **2.1.5** Update tests — ensure existing tests pass with memory provider (no env vars needed) -**Commit:** `feat(lysnrai): encrypt transcriptText field with @bytelyst/field-encrypt` +**Commit:** `feat(lysnrai): encrypt rawText and cleanedText fields with @bytelyst/field-encrypt` #### 2.2 JarvisJr Backend (port 4012) @@ -189,8 +190,9 @@ Week 1-2 Week 3-4 Week 5-6 Week 7-8 Week 9-10 Week 11-14 - [ ] **2.2.2** Create `backend/src/lib/field-encrypt.ts` encryptor singleton - [ ] **2.2.3** Encrypt fields in jarvis-sessions module - - `transcript` field — session voice transcripts - - `coachingNotes` field — extracted coaching insights + - `transcript` field — `Array<{role,content,ts}>`, JSON-serialize before 
encrypting as single EncryptedField + - `coachingNotes` field — `string[]`, JSON-serialize before encrypting as single EncryptedField + - **Note:** Array fields use the JSON-serialize → encrypt → store pattern (see design doc §4.5) - [ ] **2.2.4** Encrypt fields in jarvis-memory module - `content` field — per-agent persistent memory entries @@ -233,7 +235,9 @@ Week 1-2 Week 3-4 Week 5-6 Week 7-8 Week 9-10 Week 11-14 - [ ] **3.2.1** Add dependency + create encryptor singleton - [ ] **3.2.2** Encrypt `content` and `voiceTranscriptText` in memory_items module -- [ ] **3.2.3** Encrypt `content` in reflections module +- [ ] **3.2.3** Encrypt array fields in reflections module + - `repeatedThemes` (string[]), `suggestedAdjustments` (string[]), `vsLastWeek.summary` (string) + - Array fields: JSON-serialize → encrypt → store as single EncryptedField each - [ ] **3.2.4** Update tests **Commit:** `feat(mindlyst): encrypt memory content, voice transcripts, and reflections` @@ -241,28 +245,30 @@ Week 1-2 Week 3-4 Week 5-6 Week 7-8 Week 9-10 Week 11-14 #### 3.3 NomGap Backend (port 4013) - [ ] **3.3.1** Add dependency + create encryptor singleton -- [ ] **3.3.2** Encrypt `notes` field in meal-logs module (low sensitivity, but consistent) -- [ ] **3.3.3** Update tests +- [ ] **3.3.2** Encrypt `notes` field in meal-logs module +- [ ] **3.3.3** Encrypt `notes` field in fasting-sessions module (including `moodCheckins[].notes`) +- [ ] **3.3.4** Encrypt `notes` field in weight-log module +- [ ] **3.3.5** Update tests -**Commit:** `feat(nomgap): encrypt meal log notes` +**Commit:** `feat(nomgap): encrypt notes fields across meal-log, fasting-sessions, and weight-log` #### 3.4 ActionTrail Backend (port 4018) - [ ] **3.4.1** Add dependency + create encryptor singleton -- [ ] **3.4.2** Encrypt `beforeSnapshot` and `afterSnapshot` in actions module -- [ ] **3.4.3** Encrypt `beforeSnapshot` and `afterSnapshot` in reverts module +- [ ] **3.4.2** Encrypt `before` and `after` fields in 
actions module +- [ ] **3.4.3** Encrypt `beforeState` and `afterState` fields in reverts module - [ ] **3.4.4** Update tests **Commit:** `feat(actiontrail): encrypt action and revert snapshots` #### 3.5 Remaining Backends (Low Priority — Defer or Skip) -| Product | Backend Port | Encrypted Fields | Decision | -| ---------------------- | ------------------ | ----------------------------------- | -------- | -| **FlowMonk** (4017) | `tasks.notes` | Defer to Sprint 4 — low sensitivity | -| **ChronoMind** (4011) | None | Skip — timer configs not sensitive | -| **PeakPulse** (4010) | None | Skip — GPS/stats not sensitive | -| **LocalMemGPT** (4019) | `messages.content` | Sprint 5 (SQLite-specific approach) | +| Product (port) | Encrypted Fields | Decision | +| ---------------------- | ------------------- | ----------------------------------- | +| **FlowMonk** (4017) | `tasks.description` | Defer to Sprint 4 — low sensitivity | +| **ChronoMind** (4011) | None | Skip — timer configs not sensitive | +| **PeakPulse** (4010) | None | Skip — GPS/stats not sensitive | +| **LocalMemGPT** (4019) | `messages.content` | Sprint 5 (SQLite-specific approach) | **Sprint 3 deliverable:** 6 product backends encrypted (LysnrAI, JarvisJr, NoteLett, MindLyst, NomGap, ActionTrail). 
@@ -349,7 +355,7 @@ Week 1-2 Week 3-4 Week 5-6 Week 7-8 Week 9-10 Week 11-14 #### 4.5 FlowMonk Backend (port 4017) -- [ ] **4.5.1** Add dependency + encrypt `tasks.notes` field +- [ ] **4.5.1** Add dependency + encrypt `tasks.description` field - [ ] **4.5.2** Update tests **Commit:** `feat(flowmonk): encrypt task notes` @@ -651,15 +657,15 @@ Each product backend gets 3-5 new tests: ## Decision Log -| Date | Decision | Rationale | -| ---------- | ---------------------------------------- | -------------------------------------------------------------------------- | -| 2026-03-21 | Use AES-256-GCM for all field encryption | AEAD, native on all platforms, NIST approved, matches existing MFA pattern | -| 2026-03-21 | Envelope encryption (MEK → DEK) | Enables key rotation without re-encrypting all data; per-user isolation | -| 2026-03-21 | Tier 3 E2EE only for JarvisJr + NoteLett | Server-side AI in other products requires plaintext access | -| 2026-03-21 | Skip ChronoMind + PeakPulse encryption | Timer configs and GPS stats are not sensitive data | -| 2026-03-21 | RSA-4096 for MEKs | Quantum-resistant margin; AKV native support | -| 2026-03-21 | 15-min DEK cache TTL | Balances AKV rate limits vs key exposure window | -| 2026-03-21 | `__encrypted` sentinel in documents | Enables gradual migration without schema changes | +| Date | Decision | Rationale | +| ---------- | ---------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- | +| 2026-03-21 | Use AES-256-GCM for all field encryption | AEAD, native on all platforms, NIST approved, matches existing MFA pattern | +| 2026-03-21 | Envelope encryption (MEK → DEK) | Enables key rotation without re-encrypting all data; per-user isolation | +| 2026-03-21 | Tier 3 E2EE only for JarvisJr + NoteLett | Server-side AI in other products requires plaintext access | +| 2026-03-21 | Skip ChronoMind + PeakPulse encryption | Timer 
configs and GPS stats are not sensitive data | +| 2026-03-21 | RSA-4096 for MEKs | Maximum classical security margin; AKV native support. Not quantum-resistant — plan PQC migration when AKV supports it. | +| 2026-03-21 | 15-min DEK cache TTL | Balances AKV rate limits vs key exposure window | +| 2026-03-21 | `__encrypted` sentinel in documents | Enables gradual migration without schema changes | ---
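The `__encrypted` sentinel decision can be illustrated with a small type guard that lets a repository read both legacy plaintext values and migrated values during a gradual rollout. This is a sketch under stated assumptions: the `EncryptedField` shape, `isEncryptedField`, and `readField` are hypothetical names, and the decrypt function is injected rather than taken from any real SDK.

```typescript
// Hypothetical shape for illustration; the real EncryptedField type may differ.
interface EncryptedField {
  __encrypted: true;
  dekId: string;
  iv: string;
  authTag: string;
  ciphertext: string;
}

// True only for objects carrying the `__encrypted: true` sentinel.
function isEncryptedField(value: unknown): value is EncryptedField {
  return (
    typeof value === "object" &&
    value !== null &&
    (value as { __encrypted?: unknown }).__encrypted === true
  );
}

// Returns plaintext whether or not the stored value has been migrated yet.
async function readField(
  value: string | EncryptedField,
  decrypt: (field: EncryptedField) => Promise<string>, // injected, hypothetical
): Promise<string> {
  return isEncryptedField(value) ? decrypt(value) : value;
}
```

Because unmigrated documents still hold plain strings, the same read path serves both states, so a batch migration can run without changing the document schema.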