# Client Telemetry — Implementation Roadmap
> **Status:** Phases 0–3 code complete ✅ · Phase 4 (Operational Wiring) **NOT STARTED** 🔴
> **Last updated:** 2026-02-17 (reviewed for accuracy against running code)
> **Design doc:** [`CLIENT_TELEMETRY_DESIGN.md`](./CLIENT_TELEMETRY_DESIGN.md)
> **Repos:** `learning_ai_common_plat` (platform-service) · `learning_voice_ai_agent` (all clients + dashboards)
---
## Phase 0 — Design & Review
- [x] Write comprehensive telemetry design doc — schema, APIs, admin UX, privacy guardrails ([`c59049e`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c59049e))
- [x] Systematic review: identify and fix 18 bugs/gaps in the design doc ([`083cf02`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/083cf02))
  - TTL format (ISO → seconds), `regionCode` prefix format, missing `pk` field
  - Auth model for keyboard extension (`X-Install-Token`)
  - Config endpoint query params (`userId`/`anonymousInstallId`)
  - Error clustering made version-agnostic (`affectedVersions` array)
  - GDPR erasure endpoint added
  - iOS offline queue strategy (App Group UserDefaults, FIFO eviction)
  - Global defaults for `batchSize`/`flushInterval`/`maxQueueSize`
---
## Phase 1 — MVP (iOS Keyboard + Backend + Admin UI)
### Platform-Service Telemetry Module
- [x] `types.ts` — Zod schemas for events, policies, clusters, queries ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] `repository.ts` — Cosmos DB CRUD for events, policies, clusters ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] `routes.ts` — Fastify routes: ingestion, config, admin query, clusters, policy CRUD, GDPR erasure ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] `telemetry.test.ts` — 34 Vitest tests for schemas + policy evaluation ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] Register telemetry routes in `server.ts` ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] Add Cosmos containers (`telemetry_events`, `telemetry_error_clusters`, `telemetry_collection_policies`) to `cosmos-init.ts` ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
### iOS Keyboard Telemetry Client
- [x] `LysnrTelemetry.swift` — Singleton client with App Group offline queue, `X-Install-Token` auth, 200-event cap ([`e546475`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/e546475))
- [x] Instrument `KeyboardViewController.swift` — 10+ telemetry points ([`e546475`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/e546475))
  - [x] `session_started` / `session_ended` (with full `DictationContext`)
  - [x] `backend_selected` (azure / local + reason)
  - [x] `recognition_started` / `recognition_failed`
  - [x] `mic_permission_denied`
  - [x] `insert_noop` detection
  - [x] `error_recovery_attempted` (local→azure, azure→local)
  - [x] Session summary metrics (duration, segments, words, transcript length)
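
The keyboard client's offline queue (200-event cap, FIFO eviction) can be pictured with the short sketch below. It is written in Python purely for illustration and is not the Swift implementation; the class name `BoundedEventQueue` and its methods are hypothetical, and an in-memory deque stands in for App Group UserDefaults.

```python
from collections import deque
from typing import Any, Deque, Dict, List

MAX_QUEUE_SIZE = 200  # cap quoted in the roadmap; the real client may read it from config


class BoundedEventQueue:
    """Hypothetical in-memory stand-in for the App Group offline queue."""

    def __init__(self, max_size: int = MAX_QUEUE_SIZE) -> None:
        # deque(maxlen=...) drops the oldest entry when full, which is FIFO eviction.
        self._events: Deque[Dict[str, Any]] = deque(maxlen=max_size)

    def enqueue(self, event: Dict[str, Any]) -> None:
        self._events.append(event)

    def drain(self) -> List[Dict[str, Any]]:
        """Return all queued events oldest-first and clear the queue."""
        batch = list(self._events)
        self._events.clear()
        return batch

    def __len__(self) -> int:
        return len(self._events)
```

With a cap of 3 and five enqueued events, only the three newest survive, and `drain()` returns them oldest-first.
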
### Admin Dashboard — Client Logs Page
- [x] `/ops/client-logs/page.tsx` — Events table + Error Clusters tab ([`d202f94`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/d202f94))
  - [x] Stat cards (total events, errors, warnings, keyboard events)
  - [x] Filters (platform, channel, level, module, free-text search)
  - [x] Expandable event detail rows (device, tags, metrics, dictation context)
  - [x] Error Clusters tab with severity, affected versions, user count
- [x] `/api/telemetry/route.ts` — API route proxying to platform-service ([`d202f94`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/d202f94))
- [x] `platform-client.ts` — `queryTelemetryEvents` + `queryTelemetryClusters` ([`d202f94`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/d202f94))
- [x] `sidebar-nav.tsx` — "Client Logs" nav item with `FileText` icon ([`d202f94`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/d202f94))
---
## Phase 2 — Full Platform Coverage
### iOS Main App
- [x] `TelemetryService.swift` — Main app telemetry service with App Group queue drain on foreground ([`a173baa`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a173baa))
- [x] `LysnrAIApp.swift` — `scenePhase` integration for activate/deactivate lifecycle ([`a173baa`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a173baa))
  - [x] `app_foregrounded` / `app_backgrounded` events
  - [x] Keyboard queue flush on every foreground transition
  - [x] 60-second periodic flush timer
### Desktop App (Python)
- [x] `platform_telemetry.py` — `PlatformTelemetry` singleton with `urllib.request` POST, threaded flush timer, persistent `install_id` in `~/.LysnrAI/install_id` ([`a173baa`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a173baa))
- [x] `main.py` instrumentation ([`a173baa`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a173baa))
  - [x] `app_started` / `app_stopped` lifecycle events
  - [x] `dictation_started` (with backend tag)
  - [x] `dictation_completed` (with duration_ms, word_count, transcript_length metrics)
  - [x] `mic_permission_denied` / `recording_start_failed` error events
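
As a rough illustration of the persistent `install_id` mentioned above, the sketch below reads the ID from a file and creates it on first run. The function name `load_or_create_install_id` is hypothetical, and the actual `platform_telemetry.py` may handle errors and file permissions differently.

```python
import uuid
from pathlib import Path


def load_or_create_install_id(path: Path) -> str:
    """Return the persisted install ID, generating and saving one on first run."""
    if path.exists():
        existing = path.read_text(encoding="utf-8").strip()
        if existing:
            return existing
    # Dashed UUID string (8-4-4-4-12), the form a UUID validator expects.
    install_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(install_id, encoding="utf-8")
    return install_id
```

A caller would typically pass `Path.home() / ".LysnrAI" / "install_id"`, matching the location named in the roadmap.
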
### Web User Dashboard
- [x] `telemetry.ts` — Browser client with `sendBeacon`, `localStorage` install ID, auto-flush on visibility change ([`130e1d6`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/130e1d6))
- [x] `/api/telemetry/ingest/route.ts` — Server-side proxy to platform-service ([`130e1d6`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/130e1d6))
- [x] `providers.tsx` — `initTelemetry()` called on app mount ([`130e1d6`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/130e1d6))
### Tracker Dashboard
- [x] `telemetry.ts` — Browser client (same pattern as user dashboard) ([`a102609`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a102609))
- [x] `/api/telemetry/ingest/route.ts` — Server-side proxy to platform-service ([`a102609`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a102609))
- [x] `providers.tsx` — `initTelemetry()` called on app mount ([`a102609`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a102609))
### Admin Dashboard Self-Telemetry
- [x] `telemetry.ts` — Browser client tracking admin page views, filter usage, policy changes ([`a102609`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a102609))
- [x] `/api/telemetry/admin-ingest/route.ts` — Separate proxy from admin query route ([`a102609`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a102609))
- [x] `providers.tsx` — `initTelemetry()` called on app mount ([`a102609`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a102609))
### Android
- [x] `TelemetryClient.kt` — Kotlin singleton with OkHttp POST, SharedPreferences offline queue, persistent install ID ([`9196f48`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/9196f48))
- [x] Instrument `LysnrInputMethodService.kt` — 10 telemetry points ([`9196f48`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/9196f48))
  - [x] `session_started` / `session_ended` (with words_inserted metric)
  - [x] `dictation_started` (with backend + reason tags)
  - [x] `dictation_completed` (with duration_ms, word_count, segment_count, transcript_length)
  - [x] `mic_permission_denied`
  - [x] `recognition_failed` (with errorCode + errorDomain)
  - [x] `error_recovery_attempted` (azure→local fallback)
- [x] Offline queue using SharedPreferences with FIFO eviction ([`9196f48`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/9196f48))
- [x] Flush on app foreground via `ProcessLifecycleOwner` + 60s periodic flush timer ([`9196f48`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/9196f48))
---
## Phase 3 — Intelligence & Admin Tooling
### Error Clustering & Alerting
- [x] Automated error fingerprinting (hash of `platform + channel + module + eventName + errorDomain + errorCode`) — Phase 1 ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] Cluster severity escalation (`warn` → `error` → `fatal` based on count + affected users) — Phase 1 ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] Webhook alerting when cluster severity escalates (Slack-compatible, env `TELEMETRY_ALERT_WEBHOOK_URL`) ([`056f323`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/056f323))
- [x] Dashboard: cluster timeline chart (Recharts stacked bar, last 14 days, severity breakdown) ([`dc49073`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/dc49073))
- [x] Dashboard: "Resolve" / "Ignore" / "Reopen" actions on clusters ([`6d7b1d3`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/6d7b1d3))
- [x] Cluster status field (`open`/`resolved`/`ignored`) + `PATCH /telemetry/clusters/:id` endpoint ([`056f323`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/056f323))
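
The fingerprinting item above hashes six fields into a cluster key. A minimal Python sketch of that idea follows; the `|` separator, the SHA-256 digest, and the 16-character truncation are assumptions made for illustration, and the TypeScript `generateFingerprint` in platform-service may differ on all three.

```python
import hashlib
from typing import Optional


def generate_fingerprint(
    platform: str,
    channel: str,
    module: str,
    event_name: str,
    error_domain: Optional[str] = None,
    error_code: Optional[str] = None,
) -> str:
    """Hash the six fields named in the roadmap into a short cluster key."""
    parts = [platform, channel, module, event_name, error_domain or "", error_code or ""]
    # Assumption: "|" separator and truncated SHA-256; the real module may differ.
    canonical = "|".join(parts)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

Because the app version is not one of the hashed fields, the same error reported by different versions lands in one cluster, which is consistent with the version-agnostic `affectedVersions` design noted in Phase 0.
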
### Geo Enrichment
- [x] Server-side IP → country/region lookup on ingestion (configurable via `TELEMETRY_GEO_API_URL`, 24h in-memory cache, 2s timeout) ([`2f61ea5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/2f61ea5))
- [x] Populate `countryCode` + `regionCode` fields (e.g., `US:WA`) on events from server-side IP lookup ([`2f61ea5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/2f61ea5))
- [x] Admin UI: geographic distribution chart (horizontal bar chart + country table, Geo tab on client-logs page) ([`0bfd4bd`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/0bfd4bd), [`82a25c0`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/82a25c0))
- [x] Policy targeting by `regionCode`/`countryCodes` ranges (schema already supports it in `TelemetryTargetingSchema`)
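
The 24h in-memory cache in front of the geo lookup can be sketched as below. This is an illustrative Python sketch, not the platform-service code: `GeoCache`, `make_region_code`, and the injected `lookup` callable are hypothetical names, and the 2s HTTP timeout is left to whatever `lookup` wraps.

```python
import time
from typing import Callable, Dict, Optional, Tuple

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24h, per the roadmap

GeoResult = Tuple[str, str]  # (countryCode, regionCode), e.g. ("US", "US:WA")


def make_region_code(country_code: str, region: str) -> str:
    """Build the prefixed regionCode format shown in the roadmap (US:WA)."""
    return f"{country_code}:{region}"


class GeoCache:
    """Cache lookup results per IP so repeat events skip the external call."""

    def __init__(
        self,
        lookup: Callable[[str], Optional[GeoResult]],
        ttl_seconds: float = CACHE_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Optional[GeoResult]]] = {}

    def resolve(self, ip: str) -> Optional[GeoResult]:
        now = self._clock()
        cached = self._entries.get(ip)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        result = self._lookup(ip)  # may return None if the lookup failed
        self._entries[ip] = (now, result)
        return result
```

Injecting the clock keeps the expiry logic testable without waiting a real 24 hours.
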
### Collection Policy Builder UI
- [x] Admin page: `/ops/telemetry-policies` ([`c7732c9`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/c7732c9))
- [x] CRUD UI for collection policies (name, enabled, targeting rules, sampling rates) ([`c7732c9`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/c7732c9))
- [x] Targeting builder: platform checkboxes, channel badges, release channel selection, percentage slider ([`c7732c9`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/c7732c9))
- [x] Live preview: "N / M clients would match this policy" — `POST /telemetry/policies/preview` + UI button ([`61c919a`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/61c919a), [`da9031b`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/da9031b))
- [x] Policy activation/deactivation toggle ([`c7732c9`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/c7732c9))
- [x] Scheduling: `startsAt` / `expiresAt` date pickers ([`c7732c9`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/c7732c9))
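
Two of the policy controls above, the percentage slider and the `startsAt` / `expiresAt` window, lend themselves to a small sketch. The roadmap does not say how the percentage is evaluated, so the hash-based bucketing below is an assumption, and the function names `in_rollout` and `is_within_schedule` are hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def in_rollout(install_id: str, percentage: int) -> bool:
    """Assumed approach: map each install ID to a stable bucket in 0..99."""
    digest = hashlib.sha256(install_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100
    return bucket < percentage


def is_within_schedule(
    now: datetime,
    starts_at: Optional[datetime] = None,
    expires_at: Optional[datetime] = None,
) -> bool:
    """A policy is active from starts_at (inclusive) until expires_at (exclusive)."""
    if starts_at is not None and now < starts_at:
        return False
    if expires_at is not None and now >= expires_at:
        return False
    return True
```

Hashing the install ID, rather than drawing a random number per request, means a given client stays in or out of a policy for as long as the percentage is unchanged.
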
### Privacy & Compliance
- [x] PII regex scanner on ingestion (email, phone, SSN, credit card patterns → reject before storage) — Phase 1 ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] Admin API: GDPR erasure endpoint `DELETE /telemetry/user/:userId` — Phase 1 ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] Admin UI: GDPR erasure proxy route `/api/telemetry/erasure` ([`c7732c9`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/c7732c9))
- [x] Retention policy enforcement (TTL-based auto-expiry, `TELEMETRY_EVENT_TTL_DAYS` env var) — Phase 1 ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] Audit log entries for policy CRUD + GDPR erasure (`telemetry.policy.created/updated/deleted`, `telemetry.gdpr.erasure`, `telemetry.cluster.resolved/ignored`) ([`056f323`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/056f323))
- [x] Admin UI: GDPR erasure tab on Client Logs page ([`6d7b1d3`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/6d7b1d3))
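
The PII scanner and the TTL conversion above can be illustrated together. The regular expressions below are simplified assumptions and not the patterns used by `containsPII` in platform-service; `ttl_seconds_from_days` is a hypothetical helper showing the days-to-seconds conversion implied by `TELEMETRY_EVENT_TTL_DAYS`.

```python
import re

# Assumption: deliberately simple patterns; production rules would need tuning
# to balance missed PII against false positives on numeric telemetry values.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "phone": re.compile(
        r"(?<!\d)(?:\+?\d{1,3}[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}(?!\d)"
    ),
}

SECONDS_PER_DAY = 24 * 60 * 60


def contains_pii(text: str) -> bool:
    """Return True if any pattern matches, so the event can be rejected before storage."""
    return any(pattern.search(text) for pattern in PII_PATTERNS.values())


def ttl_seconds_from_days(days: int) -> int:
    """Convert a retention period in days to a TTL expressed in seconds."""
    return days * SECONDS_PER_DAY
```
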
### Performance & Scale
- [x] ETag caching on `GET /telemetry/config` (`If-None-Match` → 304, `Cache-Control: private, max-age=60`) ([`2fb3410`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/2fb3410))
- [x] Server-side rate limiting per `installId` (100 events/min, in-memory sliding window) ([`2fb3410`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/2fb3410))
- [x] Cosmos DB indexing policy tuning — `scripts/cosmos-telemetry-indexes.sh` with composite indexes for all 3 containers ([`056f323`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/056f323))
- [x] Batch ingestion deduplication by `event.id` ([`2fb3410`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/2fb3410))
- [x] In-memory ingestion metrics counters + `GET /telemetry/metrics` admin endpoint ([`056f323`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/056f323))
- [x] Admin UI: Metrics tab on Client Logs page (ingested, rejected, PII blocked, rate limited, duplicates) ([`6d7b1d3`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/6d7b1d3))
- [x] Prometheus OpenMetrics export endpoint `GET /telemetry/metrics/prometheus` ([`2f61ea5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/2f61ea5))
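
Two of the items above, the per-`installId` sliding-window rate limit (100 events/min) and batch deduplication by `event.id`, are sketched below in Python. The names `SlidingWindowRateLimiter` and `dedupe_by_id` are hypothetical, and the real TypeScript `checkRateLimit` may count or evict differently.

```python
import time
from collections import defaultdict, deque
from typing import Any, Callable, Deque, Dict, List


class SlidingWindowRateLimiter:
    """In-memory limiter: at most `limit` events per install ID per window."""

    def __init__(
        self,
        limit: int = 100,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, install_id: str, event_count: int = 1) -> bool:
        now = self._clock()
        hits = self._hits[install_id]
        cutoff = now - self._window
        # Forget timestamps that have slid out of the window.
        while hits and hits[0] <= cutoff:
            hits.popleft()
        if len(hits) + event_count > self._limit:
            return False
        hits.extend([now] * event_count)
        return True


def dedupe_by_id(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first occurrence of each event id, preserving batch order."""
    seen = set()
    unique = []
    for event in events:
        if event["id"] in seen:
            continue
        seen.add(event["id"])
        unique.append(event)
    return unique
```

Being in-memory, a limiter like this resets on restart and is not shared across instances, which is worth keeping in mind once the service is deployed with more than one replica.
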
---
## Phase 4 — Operational Wiring (NOT STARTED 🔴)
> **This phase bridges "code exists" → "telemetry actually flows."**
> Phases 0–3 are all code-complete, but **no telemetry data has ever reached the server** from any real client.
> The items below are required before the telemetry system can be called "done."
### 4.1 — Platform-Service Deployment
- [ ] Deploy platform-service to a **publicly reachable URL** (Azure Container Apps, Azure App Service, or VM)
- [ ] Configure DNS / reverse proxy so clients can reach `https://api.lysnrai.com` (or similar)
- [ ] Set env vars: `COSMOS_ENDPOINT`, `COSMOS_KEY`, `TELEMETRY_ENABLED=true`
- [ ] Run `scripts/cosmos-telemetry-indexes.sh` against live Cosmos DB to create containers + indexes
- [ ] Verify `POST /api/telemetry/events` accepts a test payload from `curl`
### 4.2 — iOS Keyboard Extension Wiring
- [ ] **Register App Groups capability** in Apple Developer portal for both `com.bytelyst.LysnrAI` and `com.bytelyst.LysnrAI.keyboard`
- [ ] **Restore entitlements** in TestFlight builds (currently cleared because provisioning profile lacks App Groups)
  - `LysnrAI.entitlements`: `aps-environment` + `com.apple.security.application-groups`
  - `LysnrKeyboard.entitlements`: `com.apple.security.application-groups`
- [ ] **Write `platform_service_url`** to App Group UserDefaults — currently `LysnrTelemetry.swift` reads `platform_service_url` from App Group (line 80) but **nothing writes it**
  - Option A: Main app writes URL on launch from env/config
  - Option B: Hardcode URL in `LysnrTelemetry.swift` init
  - Option C: Bundle in `env.dev` and read from shared config
- [ ] **Verify mic permission flow on physical device** — keyboard extensions may not show permission prompts; main app must request mic permission first. Current "Mic error" on device likely caused by this.
- [ ] Test Full Access ON vs OFF paths on physical device
### 4.3 — iOS Main App TelemetryService Integration
- [ ] Verify `TelemetryService.swift` reads `platform_service_url` from config/env and writes to App Group
- [ ] Verify keyboard queue drain works: main app foreground → reads App Group `telemetry_event_queue` → POSTs to server
- [ ] Test lifecycle: app backgrounded → keyboard generates events → app foregrounded → events flushed
### 4.4 — Desktop App Wiring
- [ ] Set `PLATFORM_SERVICE_URL` env var in `~/.LysnrAI/.env` pointing to deployed service
- [ ] Verify `platform_telemetry.py` sends events on dictation start/stop
- [ ] Test offline → online queue drain
### 4.5 — Web Dashboard Wiring
- [ ] Set `PLATFORM_SERVICE_URL` in dashboard `.env.local` files
- [ ] Verify `/api/telemetry/ingest` proxy routes forward to deployed platform-service
- [ ] Verify admin dashboard `/ops/client-logs` page loads real data from platform-service
### 4.6 — Android Wiring
- [ ] Set platform service URL in Android app config
- [ ] Test SharedPreferences offline queue + foreground flush
- [ ] Verify keyboard instrumentation events reach server
### 4.7 — Webhook / Alert Configuration
- [ ] Set `TELEMETRY_ALERT_WEBHOOK_URL` env var (Slack webhook or equivalent)
- [ ] Test cluster severity escalation triggers webhook
- [ ] Set `TELEMETRY_GEO_API_URL` env var (ip-api.com or similar) for geo enrichment
### 4.8 — End-to-End Smoke Test
- [ ] iOS keyboard → platform-service → Cosmos → admin dashboard query — **full round-trip**
- [ ] Desktop → platform-service → Cosmos → admin dashboard query
- [ ] Web dashboard → platform-service ingest → admin dashboard query
- [ ] Trigger error cluster creation → verify cluster appears in admin UI
- [ ] Trigger rate limit → verify rejection in metrics tab
- [ ] GDPR erasure → verify events deleted from Cosmos
### Summary: What Blocks "100% Done"
| Blocker | Severity | Effort |
| --------------------------------------------------- | ----------- | ----------------------------------------------- |
| **Platform-service not deployed** | 🔴 Critical | Medium — needs Azure infra |
| **App Group entitlements not registered** | 🔴 Critical | Low — Apple Developer portal config |
| **`platform_service_url` not written to App Group** | 🔴 Critical | Low — one-line code change |
| **Cosmos containers not created in prod** | 🟡 High | Low — run indexing script |
| **Mic permission flow on device** | 🟡 High | Medium — needs device testing + possible UX fix |
| **Webhook URL not configured** | 🟢 Low | Trivial — env var |
| **Geo API URL not configured** | 🟢 Low | Trivial — env var |
| **Remaining test gaps (5 items)** | 🟢 Low | Medium — integration/e2e tests |
---
## Architecture Summary
```
┌─────────────────────┐    ┌──────────────────────┐    ┌───────────────────┐
│  iOS Keyboard Ext   │    │  iOS Main App        │    │  Desktop (Python) │
│  LysnrTelemetry     │───▶│  TelemetryService    │    │  PlatformTelemetry│
│  (App Group queue)  │    │  (drains queue)      │    │  (urllib POST)    │
└─────────────────────┘    └──────────┬───────────┘    └────────┬──────────┘
    Full Access ON ──┐                │                         │
    direct POST      │                │                         │
                     ▼                ▼                         ▼
┌──────────────────────────────────────────────────────────────────────────┐
│  Platform Service (Fastify, port 4003)                                   │
│    POST   /api/telemetry/events       — batch ingestion                  │
│    GET    /api/telemetry/config       — client collection config         │
│    GET    /api/telemetry/query        — admin event search               │
│    GET    /api/telemetry/clusters     — admin error clusters             │
│    CRUD   /api/telemetry/policies     — collection policy management     │
│    DELETE /api/telemetry/user/:userId — GDPR erasure                     │
└─────────────────────────────────────┬────────────────────────────────────┘
                                      ▼
┌──────────────────────────────────────────────────────────────────────────┐
│  Azure Cosmos DB                                                         │
│    telemetry_events               partitionKeyPath: /pk                  │
│      pk value = productId:yyyyMM:platform (e.g. lysnrai:202602:ios)      │
│    telemetry_error_clusters       partitionKeyPath: /pk                  │
│      pk value = productId:platform:module (e.g. lysnrai:ios:dictation)   │
│    telemetry_collection_policies  partitionKeyPath: /productId           │
└──────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────┐          ┌──────────────────────┐
│  Admin Dashboard        │   GET    │  User Dashboard      │  POST
│  /ops/client-logs       │─────────▶│  /api/telemetry/     │─────────▶ platform
│  (queries via           │  query/  │  ingest              │  /events  -service
│  platform-service API)  │ clusters │  (browser → proxy)   │
└─────────────────────────┘          └──────────────────────┘

┌───────────────────────┐
│  Android              │
│  TelemetryClient.kt   │──▶ POST /api/telemetry/events ──▶ platform-service
│  (SharedPreferences)  │
└───────────────────────┘
```
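
The partition key formats shown in the diagram can be expressed as two small helpers. This is a Python sketch of the format only; the function names are hypothetical, and deriving the `yyyyMM` segment from the UTC timestamp is an assumption (the TypeScript `computePk` may use a different time basis).

```python
from datetime import datetime, timezone


def compute_event_pk(product_id: str, platform: str, timestamp: datetime) -> str:
    """telemetry_events pk: productId:yyyyMM:platform."""
    # Assumption: the month bucket is taken in UTC.
    month_bucket = f"{timestamp.astimezone(timezone.utc):%Y%m}"
    return f"{product_id}:{month_bucket}:{platform}"


def compute_cluster_pk(product_id: str, platform: str, module: str) -> str:
    """telemetry_error_clusters pk: productId:platform:module."""
    return f"{product_id}:{platform}:{module}"
```

Bucketing events by month and platform keeps each logical partition bounded in size, while clusters are keyed without a time component so a long-running error stays in one partition.
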
---
## Test Coverage
| Component | Test File | Tests | Coverage |
| --------------------------------- | ------------------------------------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Platform-service telemetry** | `telemetry.test.ts` | 89 | Zod schemas (34), `containsPII` (6), `computePk` (4), `normalizeMessage` (7), `generateFingerprint` (8), `policyMatchesContext` (13), `mergePolicies` (5), `checkRateLimit` (3), plus additional route-logic tests |
| **iOS LysnrTelemetry (keyboard)** | `LysnrAITests/LysnrTelemetryTests.swift` | 18 | Identity (5), session management (2), event types (1), DictationContext (3), track (3), flush (2), queue (1), crash-safety (1) |
| **Desktop Python client** | `tests/cloud/test_platform_telemetry.py` | 19 | Event format (6), queue behavior (2), session mgmt (2), flush/HTTP (5), install ID (2), singleton (2) |
| **Web dashboard client** | `user-dashboard-web/src/__tests__/telemetry.test.ts` | 12 | `trackEvent` (3), `trackPageView` (1), `flush` (4), install ID (2), `initTelemetry` (2) |
| **Tracker dashboard client** | `tracker-dashboard-web/src/__tests__/telemetry.test.ts` | 10 | `trackEvent` (3), `trackPageView` (1), `flush` (4), `initTelemetry` (2) |
| **Admin dashboard client** | `admin-dashboard-web/src/__tests__/telemetry.test.ts` | 10 | `trackEvent` (3), `trackPageView` (1), `flush` (4), `initTelemetry` (2) |
| **Total** | | **158** | |
### Verification commands
```bash
# Platform-service (89 telemetry tests within 624 total)
cd ../learning_ai_common_plat && pnpm --filter @lysnrai/platform-service test
# iOS keyboard telemetry (18 tests)
cd learning_voice_ai_agent
xcodebuild test-without-building \
-workspace mobile_app/ios/LysnrAI.xcworkspace \
-scheme LysnrAITests \
-destination 'platform=iOS Simulator,name=iPhone 17 Pro' \
-only-testing:LysnrAITests/LysnrTelemetryTests
# Desktop Python (19 tests)
python -m pytest tests/cloud/test_platform_telemetry.py -v
# Web user-dashboard (12 tests)
cd user-dashboard-web && npx vitest run src/__tests__/telemetry.test.ts
# Tracker dashboard (10 tests)
cd tracker-dashboard-web && npx vitest run src/__tests__/telemetry.test.ts
# Admin dashboard (10 tests)
cd admin-dashboard-web && npx vitest run src/__tests__/telemetry.test.ts
```
### Not yet tested
- [x] iOS `LysnrTelemetry.swift` — ✅ 18 XCTest unit tests (`LysnrTelemetryTests.swift`, build 28)
- [ ] iOS `TelemetryService.swift` (main app) — needs XCTest target for main app
- [ ] Android `TelemetryClient.kt` — needs Android instrumented tests or Robolectric
- [ ] Admin dashboard `/api/telemetry/route.ts` — API route integration test
- [ ] Platform-service HTTP integration tests (Fastify inject for telemetry routes)
- [ ] End-to-end: client → platform-service → Cosmos read-back → admin dashboard query
---
## Bugs Found During Review
The following bugs were discovered during a systematic review of the roadmap against the actual code, and all have been fixed:
| # | Severity | Issue | Fix |
| --- | ---------- | ------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------- |
| 1 | **High** | Desktop Python `id` used `uuid.uuid4().hex` (32 hex, no dashes) — fails Zod `.uuid()` server validation | Changed to `str(uuid.uuid4())` |
| 2 | **High** | Web telemetry `osFamily='web'` not in Zod `OsFamilyEnum` — fails server validation | Changed to `'other'` |
| 3 | **Medium** | Status said "Phase 2 complete" but Android is all unchecked | Fixed status line |
| 4 | **Medium** | Architecture diagram showed wrong pk for `telemetry_error_clusters` (`/productId` → actual `/pk` = `productId:platform:module`) | Fixed diagram |
| 5 | **Medium** | Tracker dashboard telemetry missing from roadmap entirely | Added as Phase 2 pending |
| 6 | **Medium** | Admin dashboard self-telemetry (page views) not mentioned | Added as Phase 2 pending |
| 7 | **Low** | Architecture diagram missing Android client box | Added with "not yet implemented" note |
| 8 | **Low** | Architecture diagram implied Admin reads Cosmos directly (it queries Platform Service) | Fixed data flow arrows |
| 9 | **Low** | Web `telemetry.ts` JSDoc said "via the admin dashboard proxy" (wrong dashboard) | Fixed to "user dashboard's /api/telemetry/ingest proxy" |
| 10 | **Low** | Commit log missing roadmap doc commit | Added |
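
Bug 1 in the table comes down to the two string forms Python offers for a UUID. The snippet below shows the difference; the regular expression is a simplified stand-in for a dashed-UUID check and is not Zod's actual `.uuid()` rule.

```python
import re
import uuid

# Simplified stand-in for a dashed-UUID validator (8-4-4-4-12 hex groups).
DASHED_UUID = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)

event_id = uuid.uuid4()
hex_form = event_id.hex      # 32 hex characters, no dashes (the buggy form)
dashed_form = str(event_id)  # 36 characters with dashes (the fixed form)

hex_is_valid = DASHED_UUID.match(hex_form) is not None
dashed_is_valid = DASHED_UUID.match(dashed_form) is not None
```

Here `hex_is_valid` is `False` and `dashed_is_valid` is `True`, which mirrors why events carrying `uuid.uuid4().hex` were rejected by server-side validation.
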
---
## Commit Log
| Date | Repo | Commit | Description |
| ---------- | ----------- | --------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------- |
| 2026-02-16 | common-plat | [`c59049e`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c59049e) | Design doc: client telemetry & log insights |
| 2026-02-16 | common-plat | [`083cf02`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/083cf02) | Fix 18 gaps in telemetry design doc (rev 2) |
| 2026-02-16 | common-plat | [`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff) | Telemetry module — ingest, config, query, clusters, policies (34 tests) |
| 2026-02-17 | voice-agent | [`e546475`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/e546475) | iOS keyboard telemetry client + KeyboardViewController instrumentation |
| 2026-02-17 | voice-agent | [`d202f94`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/d202f94) | Admin dashboard Client Logs page + sidebar nav |
| 2026-02-17 | voice-agent | [`a173baa`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a173baa) | iOS main app TelemetryService + Desktop Python platform_telemetry |
| 2026-02-17 | voice-agent | [`130e1d6`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/130e1d6) | Web user-dashboard telemetry client + ingest proxy |
| 2026-02-17 | common-plat | [`c3d6977`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c3d6977) | Telemetry roadmap doc (this file) |
| 2026-02-17 | voice-agent | [`ae77438`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/ae77438) | Fix: desktop uuid format + web osFamily — pass Zod validation |
| 2026-02-17 | common-plat | [`20f77d5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/20f77d5) | Tests: route-logic tests — PII, pk, fingerprint, policy matching (34→77) |
| 2026-02-17 | voice-agent | [`08efdb6`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/08efdb6) | Tests: Python client (19) + web dashboard (12) telemetry tests |
| 2026-02-17 | voice-agent | [`a102609`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a102609) | Tracker + admin self-telemetry clients + tests (20 tests) |
| 2026-02-17 | voice-agent | [`9196f48`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/9196f48) | Android TelemetryClient + keyboard instrumentation + ProcessLifecycleOwner |
| 2026-02-17 | voice-agent | [`c7732c9`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/c7732c9) | Phase 3: Policy Builder UI + GDPR erasure proxy + sidebar nav |
| 2026-02-17 | common-plat | [`2fb3410`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/2fb3410) | Phase 3: Rate limiting, batch dedup, ETag config caching (614 tests) |
| 2026-02-17 | common-plat | [`056f323`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/056f323) | Phase 3: Cluster resolve/ignore, audit logging, webhook alerts, metrics, Cosmos indexes |
| 2026-02-17 | voice-agent | [`6d7b1d3`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/6d7b1d3) | Phase 3: Cluster actions UI, metrics tab, GDPR erasure UI |
| 2026-02-17 | common-plat | [`2f61ea5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/2f61ea5) | Phase 3: Geo enrichment, Prometheus metrics export |
| 2026-02-17 | voice-agent | [`dc49073`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/dc49073) | Phase 3: Cluster timeline chart (Recharts) |
| 2026-02-17 | common-plat | [`61c919a`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/61c919a) | Phase 3: Policy preview endpoint (count matching clients) |
| 2026-02-17 | voice-agent | [`da9031b`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/da9031b) | Phase 3: Policy builder live preview UI + API proxy |
| 2026-02-17 | common-plat | [`0bfd4bd`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/0bfd4bd) | Phase 3: Geo distribution endpoint (GET /telemetry/geo, Cosmos GROUP BY) |
| 2026-02-17 | voice-agent | [`82a25c0`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/82a25c0) | Phase 3: Geo distribution UI — bar chart + country table on client-logs Geo tab |