docs(telemetry): update roadmap — correct test counts (158), add Phase 4 operational wiring gaps

saravanakumardb1 2026-02-17 18:41:38 -08:00
parent 3c5b50ac86
commit 4f905f1231


@ -1,7 +1,7 @@
# Client Telemetry — Implementation Roadmap
> **Status:** Phases 0–3 code complete ✅ · Phase 4 (Operational Wiring) **NOT STARTED** 🔴
> **Last updated:** 2026-02-17 (reviewed for accuracy against running code)
> **Design doc:** [`CLIENT_TELEMETRY_DESIGN.md`](./CLIENT_TELEMETRY_DESIGN.md)
> **Repos:** `learning_ai_common_plat` (platform-service) · `learning_voice_ai_agent` (all clients + dashboards)
@ -157,6 +157,87 @@
---
## Phase 4 — Operational Wiring (NOT STARTED 🔴)
> **This phase bridges "code exists" → "telemetry actually flows."**
> All Phases 0–3 are code-complete, but **no telemetry data has ever reached the server** from any real client.
> The items below are required before the telemetry system can be called "done."
### 4.1 — Platform-Service Deployment
- [ ] Deploy platform-service to a **publicly reachable URL** (Azure Container Apps, Azure App Service, or VM)
- [ ] Configure DNS / reverse proxy so clients can reach `https://api.lysnrai.com` (or similar)
- [ ] Set env vars: `COSMOS_ENDPOINT`, `COSMOS_KEY`, `TELEMETRY_ENABLED=true`
- [ ] Run `scripts/cosmos-telemetry-indexes.sh` against live Cosmos DB to create containers + indexes
- [ ] Verify `POST /api/telemetry/events` accepts a test payload from `curl`
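The 4.1 checks can be scripted as a pre-flight step. This is a minimal sketch. It assumes the service reads exactly the three variables listed above and that a base URL such as `https://api.lysnrai.com` is passed in. The helper names are illustrative. The JSON body for `POST /api/telemetry/events` uses **hypothetical** field names, so replace it with a payload that satisfies the real ingest Zod schema before trusting the result.

```bash
# Pre-flight checks for the 4.1 deployment (sketch; helper names are illustrative).

# Fail if a required variable is unset/empty, or if telemetry is not switched on.
check_telemetry_env() {
  local status=0 var value
  for var in COSMOS_ENDPOINT COSMOS_KEY TELEMETRY_ENABLED; do
    eval "value=\${$var:-}"   # indirect lookup of the variable named in $var
    if [ -z "$value" ]; then
      echo "missing: $var" >&2
      status=1
    fi
  done
  if [ "${TELEMETRY_ENABLED:-}" != "true" ]; then
    echo "TELEMETRY_ENABLED must be 'true'" >&2
    status=1
  fi
  return "$status"
}

# POST one test event to the deployed service; $1 is the base URL.
# HYPOTHETICAL payload: field names are placeholders, not the real ingest schema.
smoke_post_event() {
  curl --fail --silent --show-error \
    -X POST "$1/api/telemetry/events" \
    -H 'Content-Type: application/json' \
    -d '{"events":[{"eventType":"smoke_test"}]}'
}

# Usage:
#   check_telemetry_env && smoke_post_event "https://api.lysnrai.com"
```

Running the environment check first means a misconfigured container is reported as missing configuration, not mistaken for a broken ingest route. `--fail` makes `curl` exit non-zero on HTTP 4xx/5xx responses.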
### 4.2 — iOS Keyboard Extension Wiring
- [ ] **Register App Groups capability** in Apple Developer portal for both `com.bytelyst.LysnrAI` and `com.bytelyst.LysnrAI.keyboard`
- [ ] **Restore entitlements** in TestFlight builds (currently cleared because provisioning profile lacks App Groups)
- `LysnrAI.entitlements`: `aps-environment` + `com.apple.security.application-groups`
- `LysnrKeyboard.entitlements`: `com.apple.security.application-groups`
- [ ] **Write `platform_service_url`** to App Group UserDefaults — currently `LysnrTelemetry.swift` reads `platform_service_url` from App Group (line 80) but **nothing writes it**
- Option A: Main app writes URL on launch from env/config
- Option B: Hardcode URL in `LysnrTelemetry.swift` init
- Option C: Bundle in `env.dev` and read from shared config
- [ ] **Verify mic permission flow on physical device** — keyboard extensions may not be able to show permission prompts, so the main app must request mic permission first. The current "Mic error" on device is likely caused by this.
- [ ] Test Full Access ON vs OFF paths on physical device
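The TestFlight workaround cleared the entitlements files, so a cheap guard before archiving is to confirm that the keys listed above are back. This is a plain-text `grep` sketch over the `.entitlements` plists. The file paths are assumptions, so adjust them to the repo layout. The sketch checks only the source files, not what was signed into the build. On macOS, `codesign -d --entitlements :- <path-to-.app>` typically dumps the signed entitlements.

```bash
# Succeeds if the entitlements plist at $1 declares the key $2 (fixed-string match).
has_entitlement() {
  grep -qF "<key>$2</key>" "$1"
}

# Check the keys 4.2 expects for each target.
# $1 = LysnrAI.entitlements, $2 = LysnrKeyboard.entitlements (paths are assumptions).
check_lysnr_entitlements() {
  local app="$1" keyboard="$2" status=0 key
  for key in aps-environment com.apple.security.application-groups; do
    if ! has_entitlement "$app" "$key"; then
      echo "$app: missing $key" >&2
      status=1
    fi
  done
  if ! has_entitlement "$keyboard" com.apple.security.application-groups; then
    echo "$keyboard: missing com.apple.security.application-groups" >&2
    status=1
  fi
  return "$status"
}
```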
### 4.3 — iOS Main App TelemetryService Integration
- [ ] Verify `TelemetryService.swift` reads `platform_service_url` from config/env and writes to App Group
- [ ] Verify keyboard queue drain works: main app foreground → reads App Group `telemetry_event_queue` → POSTs to server
- [ ] Test lifecycle: app backgrounded → keyboard generates events → app foregrounded → events flushed
### 4.4 — Desktop App Wiring
- [ ] Set `PLATFORM_SERVICE_URL` env var in `~/.LysnrAI/.env` pointing to deployed service
- [ ] Verify `platform_telemetry.py` sends events on dictation start/stop
- [ ] Test offline → online queue drain
### 4.5 — Web Dashboard Wiring
- [ ] Set `PLATFORM_SERVICE_URL` in dashboard `.env.local` files
- [ ] Verify `/api/telemetry/ingest` proxy routes forward to deployed platform-service
- [ ] Verify admin dashboard `/ops/client-logs` page loads real data from platform-service
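Sections 4.4 and 4.5 both come down to writing `PLATFORM_SERVICE_URL` into a dotenv file. An idempotent helper keeps reruns from stacking duplicate lines. This is a sketch that assumes plain `KEY=VALUE` lines (no `export` prefix, quoting, or multi-line values). `https://api.lysnrai.com` is only a placeholder for the deployed URL.

```bash
# Set KEY=VALUE in a dotenv file: drop any existing KEY= line, then append the new one.
# Creates the file (and its directory) if needed. Assumes simple KEY=VALUE lines.
set_env_var() {
  local file="$1" key="$2" value="$3" tmpfile
  mkdir -p "$(dirname "$file")"
  touch "$file"
  tmpfile="$file.tmp.$$"
  grep -v "^$key=" "$file" > "$tmpfile" || true   # grep exits 1 when no lines are kept
  printf '%s=%s\n' "$key" "$value" >> "$tmpfile"
  cat "$tmpfile" > "$file"   # rewrite in place so the .env keeps its permissions
  rm -f "$tmpfile"
}

# Usage (placeholder URL):
#   url="https://api.lysnrai.com"
#   set_env_var "$HOME/.LysnrAI/.env" PLATFORM_SERVICE_URL "$url"
#   for app in user-dashboard-web tracker-dashboard-web admin-dashboard-web; do
#     set_env_var "$app/.env.local" PLATFORM_SERVICE_URL "$url"
#   done
```

The existing assignment is removed and the new one appended, so the key moves to the end of the file. Writing back with `cat` and not `mv` preserves restrictive permissions on files that also hold secrets.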
### 4.6 — Android Wiring
- [ ] Set platform service URL in Android app config
- [ ] Test SharedPreferences offline queue + foreground flush
- [ ] Verify keyboard instrumentation events reach server
### 4.7 — Webhook / Alert Configuration
- [ ] Set `TELEMETRY_ALERT_WEBHOOK_URL` env var (Slack webhook or equivalent)
- [ ] Test cluster severity escalation triggers webhook
- [ ] Set `TELEMETRY_GEO_API_URL` env var (ip-api.com or similar) for geo enrichment
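Before testing severity escalation end to end, confirm that the webhook URL itself accepts a message. Slack incoming webhooks take a JSON body with a `text` field. This section does not say whether the platform-service sends exactly that shape, so treat the sketch as a check of the URL only. The helper names are illustrative. The escaping covers only backslashes and double quotes; messages with newlines or control characters need a real JSON encoder such as `jq`.

```bash
# Escape backslashes and double quotes for use inside a JSON string literal.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build a Slack-style incoming-webhook body: {"text":"..."}.
alert_payload() {
  printf '{"text":"%s"}' "$(json_escape "$1")"
}

# Post a test message to the configured webhook (requires network access).
test_alert_webhook() {
  curl --fail --silent --show-error -X POST \
    -H 'Content-Type: application/json' \
    -d "$(alert_payload "$1")" \
    "$TELEMETRY_ALERT_WEBHOOK_URL"
}

# Usage:
#   test_alert_webhook "telemetry webhook check"
```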
### 4.8 — End-to-End Smoke Test
- [ ] iOS keyboard → platform-service → Cosmos → admin dashboard query — **full round-trip**
- [ ] Desktop → platform-service → Cosmos → admin dashboard query
- [ ] Web dashboard → platform-service ingest → admin dashboard query
- [ ] Trigger error cluster creation → verify cluster appears in admin UI
- [ ] Trigger rate limit → verify rejection in metrics tab
- [ ] GDPR erasure → verify events deleted from Cosmos
### Summary: What Blocks "100% Done"
| Blocker | Severity | Effort |
| --------------------------------------------------- | ----------- | ----------------------------------------------- |
| **Platform-service not deployed** | 🔴 Critical | Medium — needs Azure infra |
| **App Group entitlements not registered** | 🔴 Critical | Low — Apple Developer portal config |
| **`platform_service_url` not written to App Group** | 🔴 Critical | Low — one-line code change |
| **Cosmos containers not created in prod** | 🟡 High | Low — run indexing script |
| **Mic permission flow on device** | 🟡 High | Medium — needs device testing + possible UX fix |
| **Webhook URL not configured** | 🟢 Low | Trivial — env var |
| **Geo API URL not configured** | 🟢 Low | Trivial — env var |
| **Remaining test gaps (5 items)** | 🟢 Low | Medium — integration/e2e tests |
---
## Architecture Summary
```
@ -206,37 +287,51 @@
## Test Coverage
| Component | Test File | Tests | Coverage |
| --------------------------------- | ------------------------------------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Platform-service telemetry** | `telemetry.test.ts` | 89 | Zod schemas (34), `containsPII` (6), `computePk` (4), `normalizeMessage` (7), `generateFingerprint` (8), `policyMatchesContext` (13), `mergePolicies` (5), `checkRateLimit` (3), plus 9 additional route-logic tests |
| **iOS LysnrTelemetry (keyboard)** | `LysnrAITests/LysnrTelemetryTests.swift` | 18 | Identity (5), session management (2), event types (1), DictationContext (3), track (3), flush (2), queue (1), crash-safety (1) |
| **Desktop Python client** | `tests/cloud/test_platform_telemetry.py` | 19 | Event format (6), queue behavior (2), session mgmt (2), flush/HTTP (5), install ID (2), singleton (2) |
| **Web dashboard client** | `user-dashboard-web/src/__tests__/telemetry.test.ts` | 12 | `trackEvent` (3), `trackPageView` (1), `flush` (4), install ID (2), `initTelemetry` (2) |
| **Tracker dashboard client** | `tracker-dashboard-web/src/__tests__/telemetry.test.ts` | 10 | `trackEvent` (3), `trackPageView` (1), `flush` (4), `initTelemetry` (2) |
| **Admin dashboard client** | `admin-dashboard-web/src/__tests__/telemetry.test.ts` | 10 | `trackEvent` (3), `trackPageView` (1), `flush` (4), `initTelemetry` (2) |
| **Total** | | **158** | |
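The **Total** row can be re-derived from the per-component counts. The same arithmetic shows how the 89 platform-service tests split between the named helpers and the rest. A quick shell check:

```bash
# Sum integer arguments.
sum() {
  local total=0 n
  for n in "$@"; do total=$((total + n)); done
  echo "$total"
}

# Platform-service breakdown listed in the table: Zod schemas plus seven named helpers.
listed=$(sum 34 6 4 7 8 13 5 3)
echo "platform-service: $listed listed, $((89 - listed)) other"   # 80 listed, 9 other
echo "total: $(sum 89 18 19 12 10 10)"                            # 158
```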
### Verification commands
```bash
# Run from the learning_voice_ai_agent repo root. Suites that change directory
# run in a subshell so each command starts from the same place.

# Platform-service (89 telemetry tests within 624 total)
(cd ../learning_ai_common_plat && pnpm --filter @lysnrai/platform-service test)

# iOS keyboard telemetry (18 tests); test-without-building reuses products from a
# prior `xcodebuild build-for-testing` run (use `test` to build and run in one step)
xcodebuild test-without-building \
  -workspace mobile_app/ios/LysnrAI.xcworkspace \
  -scheme LysnrAITests \
  -destination 'platform=iOS Simulator,name=iPhone 17 Pro' \
  -only-testing:LysnrAITests/LysnrTelemetryTests

# Desktop Python (19 tests)
python -m pytest tests/cloud/test_platform_telemetry.py -v

# Web user-dashboard (12 tests)
(cd user-dashboard-web && npx vitest run src/__tests__/telemetry.test.ts)
# Tracker dashboard (10 tests)
(cd tracker-dashboard-web && npx vitest run src/__tests__/telemetry.test.ts)
# Admin dashboard (10 tests)
(cd admin-dashboard-web && npx vitest run src/__tests__/telemetry.test.ts)
```
### Not yet tested
- [x] iOS `LysnrTelemetry.swift` — ✅ 18 XCTest unit tests (`LysnrTelemetryTests.swift`, build 28)
- [ ] iOS `TelemetryService.swift` (main app) — needs an XCTest target for the main app
- [ ] Android `TelemetryClient.kt` — needs Android instrumented tests or Robolectric
- [ ] Admin dashboard `/api/telemetry/route.ts` — API route integration test
- [ ] Platform-service HTTP integration tests (Fastify inject for telemetry routes)
- [ ] End-to-end: client → platform-service → Cosmos read-back → admin dashboard query
---