# Client Telemetry — Implementation Roadmap

> **Status:** Phases 0–3 code complete ✅ · Phase 4 (Operational Wiring) **NOT STARTED** 🔴
> **Last updated:** 2026-02-17 (reviewed for accuracy against running code)
> **Design doc:** [`CLIENT_TELEMETRY_DESIGN.md`](./CLIENT_TELEMETRY_DESIGN.md)
> **Repos:** `learning_ai_common_plat` (platform-service) · `learning_voice_ai_agent` (all clients + dashboards)

---

## Phase 0 — Design & Review

- [x] Write comprehensive telemetry design doc — schema, APIs, admin UX, privacy guardrails ([`c59049e`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c59049e))
- [x] Systematic review: identify and fix 18 bugs/gaps in the design doc ([`083cf02`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/083cf02))
  - TTL format (ISO → seconds), `regionCode` prefix format, missing `pk` field
  - Auth model for keyboard extension (`X-Install-Token`)
  - Config endpoint query params (`userId`/`anonymousInstallId`)
  - Error clustering made version-agnostic (`affectedVersions` array)
  - GDPR erasure endpoint added
  - iOS offline queue strategy (App Group UserDefaults, FIFO eviction)
  - Global defaults for `batchSize`/`flushInterval`/`maxQueueSize`

---

## Phase 1 — MVP (iOS Keyboard + Backend + Admin UI)

### Platform-Service Telemetry Module

- [x] `types.ts` — Zod schemas for events, policies, clusters, queries ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] `repository.ts` — Cosmos DB CRUD for events, policies, clusters ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] `routes.ts` — Fastify routes: ingestion, config, admin query, clusters, policy CRUD, GDPR erasure ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] `telemetry.test.ts` — 34 Vitest tests for schemas + policy evaluation ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] Register telemetry routes in `server.ts` ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] Add Cosmos containers (`telemetry_events`, `telemetry_error_clusters`, `telemetry_collection_policies`) to `cosmos-init.ts` ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))

### iOS Keyboard Telemetry Client

- [x] `LysnrTelemetry.swift` — Singleton client with App Group offline queue, `X-Install-Token` auth, 200-event cap ([`e546475`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/e546475))
- [x] Instrument `KeyboardViewController.swift` — 10+ telemetry points ([`e546475`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/e546475))
  - [x] `session_started` / `session_ended` (with full `DictationContext`)
  - [x] `backend_selected` (azure / local + reason)
  - [x] `recognition_started` / `recognition_failed`
  - [x] `mic_permission_denied`
  - [x] `insert_noop` detection
  - [x] `error_recovery_attempted` (local→azure, azure→local)
  - [x] Session summary metrics (duration, segments, words, transcript length)

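
The capped offline queue with FIFO eviction described above can be sketched in a few lines. This is a language-neutral illustration written in TypeScript rather than the Swift used by `LysnrTelemetry.swift`; the `QueuedEvent` shape and the `BoundedEventQueue` class are hypothetical names, and only the 200-event cap and the oldest-first eviction come from this roadmap.

```typescript
// Hypothetical minimal event shape; the real schema is defined in the design doc.
interface QueuedEvent {
  id: string;
  eventName: string;
}

// Bounded queue that drops the oldest events once the cap is exceeded.
class BoundedEventQueue {
  private events: QueuedEvent[] = [];
  private maxSize: number;

  constructor(maxSize: number = 200) {
    this.maxSize = maxSize;
  }

  // Append an event, then evict from the front (oldest first) while over the cap.
  enqueue(event: QueuedEvent): void {
    this.events.push(event);
    while (this.events.length > this.maxSize) {
      this.events.shift();
    }
  }

  // Remove and return up to batchSize of the oldest events for a flush.
  drain(batchSize: number): QueuedEvent[] {
    return this.events.splice(0, batchSize);
  }

  get size(): number {
    return this.events.length;
  }
}
```

Evicting the oldest events first keeps the most recent session data when the device stays offline for a long time, at the cost of losing early context.
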
### Admin Dashboard — Client Logs Page

- [x] `/ops/client-logs/page.tsx` — Events table + Error Clusters tab ([`d202f94`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/d202f94))
  - [x] Stat cards (total events, errors, warnings, keyboard events)
  - [x] Filters (platform, channel, level, module, free-text search)
  - [x] Expandable event detail rows (device, tags, metrics, dictation context)
  - [x] Error Clusters tab with severity, affected versions, user count
- [x] `/api/telemetry/route.ts` — API route proxying to platform-service ([`d202f94`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/d202f94))
- [x] `platform-client.ts` — `queryTelemetryEvents` + `queryTelemetryClusters` ([`d202f94`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/d202f94))
- [x] `sidebar-nav.tsx` — "Client Logs" nav item with `FileText` icon ([`d202f94`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/d202f94))

---

## Phase 2 — Full Platform Coverage

### iOS Main App

- [x] `TelemetryService.swift` — Main app telemetry service with App Group queue drain on foreground ([`a173baa`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a173baa))
- [x] `LysnrAIApp.swift` — `scenePhase` integration for activate/deactivate lifecycle ([`a173baa`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a173baa))
  - [x] `app_foregrounded` / `app_backgrounded` events
  - [x] Keyboard queue flush on every foreground transition
  - [x] 60-second periodic flush timer

### Desktop App (Python)

- [x] `platform_telemetry.py` — `PlatformTelemetry` singleton with `urllib.request` POST, threaded flush timer, persistent `install_id` in `~/.LysnrAI/install_id` ([`a173baa`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a173baa))
- [x] `main.py` instrumentation ([`a173baa`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a173baa))
  - [x] `app_started` / `app_stopped` lifecycle events
  - [x] `dictation_started` (with backend tag)
  - [x] `dictation_completed` (with duration_ms, word_count, transcript_length metrics)
  - [x] `mic_permission_denied` / `recording_start_failed` error events

### Web User Dashboard

- [x] `telemetry.ts` — Browser client with `sendBeacon`, `localStorage` install ID, auto-flush on visibility change ([`130e1d6`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/130e1d6))
- [x] `/api/telemetry/ingest/route.ts` — Server-side proxy to platform-service ([`130e1d6`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/130e1d6))
- [x] `providers.tsx` — `initTelemetry()` called on app mount ([`130e1d6`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/130e1d6))

### Tracker Dashboard

- [x] `telemetry.ts` — Browser client (same pattern as user dashboard) ([`a102609`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a102609))
- [x] `/api/telemetry/ingest/route.ts` — Server-side proxy to platform-service ([`a102609`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a102609))
- [x] `providers.tsx` — `initTelemetry()` called on app mount ([`a102609`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a102609))

### Admin Dashboard Self-Telemetry

- [x] `telemetry.ts` — Browser client tracking admin page views, filter usage, policy changes ([`a102609`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a102609))
- [x] `/api/telemetry/admin-ingest/route.ts` — Separate proxy from admin query route ([`a102609`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a102609))
- [x] `providers.tsx` — `initTelemetry()` called on app mount ([`a102609`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a102609))

### Android

- [x] `TelemetryClient.kt` — Kotlin singleton with OkHttp POST, SharedPreferences offline queue, persistent install ID ([`9196f48`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/9196f48))
- [x] Instrument `LysnrInputMethodService.kt` — 10 telemetry points ([`9196f48`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/9196f48))
  - [x] `session_started` / `session_ended` (with words_inserted metric)
  - [x] `dictation_started` (with backend + reason tags)
  - [x] `dictation_completed` (with duration_ms, word_count, segment_count, transcript_length)
  - [x] `mic_permission_denied`
  - [x] `recognition_failed` (with errorCode + errorDomain)
  - [x] `error_recovery_attempted` (azure→local fallback)
- [x] Offline queue using SharedPreferences with FIFO eviction ([`9196f48`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/9196f48))
- [x] Flush on app foreground via `ProcessLifecycleOwner` + 60s periodic flush timer ([`9196f48`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/9196f48))

---

## Phase 3 — Intelligence & Admin Tooling

### Error Clustering & Alerting

- [x] Automated error fingerprinting (hash of `platform + channel + module + eventName + errorDomain + errorCode`) — Phase 1 ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] Cluster severity escalation (`warn` → `error` → `fatal` based on count + affected users) — Phase 1 ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] Webhook alerting when cluster severity escalates (Slack-compatible, env `TELEMETRY_ALERT_WEBHOOK_URL`) ([`056f323`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/056f323))
- [x] Dashboard: cluster timeline chart (Recharts stacked bar, last 14 days, severity breakdown) ([`dc49073`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/dc49073))
- [x] Dashboard: "Resolve" / "Ignore" / "Reopen" actions on clusters ([`6d7b1d3`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/6d7b1d3))
- [x] Cluster status field (`open`/`resolved`/`ignored`) + `PATCH /telemetry/clusters/:id` endpoint ([`056f323`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/056f323))

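
To make the fingerprinting idea concrete, here is a minimal sketch that hashes the six fields named above. The roadmap does not say which hash function the platform-service uses, so the FNV-1a style hash, the `|` separator, and the names `FingerprintInput` and `fingerprintOf` are all assumptions chosen to keep the example dependency-free.

```typescript
// Fields that feed the fingerprint, as listed in the roadmap.
// App version is deliberately absent so clusters stay version-agnostic.
interface FingerprintInput {
  platform: string;
  channel: string;
  module: string;
  eventName: string;
  errorDomain?: string;
  errorCode?: string;
}

// 32-bit FNV-1a style hash over UTF-16 code units, returned as 8 hex chars.
// Assumption: the real service may use a different (e.g. cryptographic) hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Join the fields in a fixed order so equal errors always hash to the same value.
function fingerprintOf(e: FingerprintInput): string {
  const parts = [
    e.platform,
    e.channel,
    e.module,
    e.eventName,
    e.errorDomain ?? "",
    e.errorCode ?? "",
  ];
  return fnv1a(parts.join("|"));
}
```

Because the app version is not part of the input, the same failure reported by two releases lands in one cluster, which is what allows a cluster to carry an `affectedVersions` array.
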
### Geo Enrichment

- [x] Server-side IP → country/region lookup on ingestion (configurable via `TELEMETRY_GEO_API_URL`, 24h in-memory cache, 2s timeout) ([`2f61ea5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/2f61ea5))
- [x] Populate `countryCode` + `regionCode` fields (e.g., `US:WA`) on events from server-side IP lookup ([`2f61ea5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/2f61ea5))
- [x] Admin UI: geographic distribution chart (horizontal bar chart + country table, Geo tab on client-logs page) ([`0bfd4bd`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/0bfd4bd), [`82a25c0`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/82a25c0))
- [x] Policy targeting by `regionCode`/`countryCodes` ranges (schema already supports it in `TelemetryTargetingSchema`)

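
The 24-hour in-memory cache for geo lookups could look roughly like the sketch below. `GeoInfo` and `GeoTtlCache` are hypothetical names, the actual HTTP lookup and its 2-second timeout are left out, and the clock is passed in explicitly only to make expiry easy to test.

```typescript
// Result of an IP lookup; regionCode uses the prefixed form from the roadmap, e.g. "US:WA".
interface GeoInfo {
  countryCode: string;
  regionCode: string;
}

// In-memory cache keyed by IP address; entries expire after ttlMs (24h by default).
class GeoTtlCache {
  private entries = new Map<string, { value: GeoInfo; expiresAt: number }>();
  private ttlMs: number;

  constructor(ttlMs: number = 24 * 60 * 60 * 1000) {
    this.ttlMs = ttlMs;
  }

  // Return the cached value, or undefined when missing or expired.
  get(ip: string, nowMs: number = Date.now()): GeoInfo | undefined {
    const entry = this.entries.get(ip);
    if (!entry) return undefined;
    if (entry.expiresAt <= nowMs) {
      this.entries.delete(ip); // drop stale entries lazily on read
      return undefined;
    }
    return entry.value;
  }

  // Store a lookup result together with its expiry time.
  set(ip: string, value: GeoInfo, nowMs: number = Date.now()): void {
    this.entries.set(ip, { value, expiresAt: nowMs + this.ttlMs });
  }
}
```

An ingestion handler would call `get` first and fall back to the external geo API only on a miss, so repeated batches from one client trigger at most one lookup per day.
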
### Collection Policy Builder UI

- [x] Admin page: `/ops/telemetry-policies` ([`c7732c9`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/c7732c9))
- [x] CRUD UI for collection policies (name, enabled, targeting rules, sampling rates) ([`c7732c9`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/c7732c9))
- [x] Targeting builder: platform checkboxes, channel badges, release channel selection, percentage slider ([`c7732c9`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/c7732c9))
- [x] Live preview: "N / M clients would match this policy" — `POST /telemetry/policies/preview` + UI button ([`61c919a`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/61c919a), [`da9031b`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/da9031b))
- [x] Policy activation/deactivation toggle ([`c7732c9`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/c7732c9))
- [x] Scheduling: `startsAt` / `expiresAt` date pickers ([`c7732c9`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/c7732c9))

### Privacy & Compliance

- [x] PII regex scanner on ingestion (email, phone, SSN, credit card patterns → reject before storage) — Phase 1 ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] Admin API: GDPR erasure endpoint `DELETE /telemetry/user/:userId` — Phase 1 ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] Admin UI: GDPR erasure proxy route `/api/telemetry/erasure` ([`c7732c9`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/c7732c9))
- [x] Retention policy enforcement (TTL-based auto-expiry, `TELEMETRY_EVENT_TTL_DAYS` env var) — Phase 1 ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] Audit log entries for policy CRUD + GDPR erasure (`telemetry.policy.created/updated/deleted`, `telemetry.gdpr.erasure`, `telemetry.cluster.resolved/ignored`) ([`056f323`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/056f323))
- [x] Admin UI: GDPR erasure tab on Client Logs page ([`6d7b1d3`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/6d7b1d3))

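
A regex-based PII scan of the kind listed above might be structured as follows. The patterns are illustrative assumptions, not the ones shipped in the platform-service (whose `containsPII` helper is only referenced by name in the test table), and `PII_PATTERNS`, `findPII`, and `hasPII` are hypothetical names.

```typescript
// Illustrative patterns only; production patterns would likely be stricter
// and tuned against false positives.
const PII_PATTERNS: Record<string, RegExp> = {
  email: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/,
  phone: /(?:\+?\d{1,3}[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b/,
  ssn: /\b\d{3}-\d{2}-\d{4}\b/,
  creditCard: /\b(?:\d[ -]?){13,16}\b/,
};

// Return the names of every pattern that matches the given text.
function findPII(text: string): string[] {
  return Object.keys(PII_PATTERNS).filter((name) => PII_PATTERNS[name].test(text));
}

// Convenience wrapper: true when any pattern matches.
function hasPII(text: string): boolean {
  return findPII(text).length > 0;
}
```

On ingestion, a check like `hasPII(event.message)` would run before anything is written to Cosmos, so a matching event is rejected rather than stored and scrubbed later.
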
### Performance & Scale

- [x] ETag caching on `GET /telemetry/config` (`If-None-Match` → 304, `Cache-Control: private, max-age=60`) ([`2fb3410`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/2fb3410))
- [x] Server-side rate limiting per `installId` (100 events/min, in-memory sliding window) ([`2fb3410`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/2fb3410))
- [x] Cosmos DB indexing policy tuning — `scripts/cosmos-telemetry-indexes.sh` with composite indexes for all 3 containers ([`056f323`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/056f323))
- [x] Batch ingestion deduplication by `event.id` ([`2fb3410`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/2fb3410))
- [x] In-memory ingestion metrics counters + `GET /telemetry/metrics` admin endpoint ([`056f323`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/056f323))
- [x] Admin UI: Metrics tab on Client Logs page (ingested, rejected, PII blocked, rate limited, duplicates) ([`6d7b1d3`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/6d7b1d3))
- [x] Prometheus OpenMetrics export endpoint `GET /telemetry/metrics/prometheus` ([`2f61ea5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/2f61ea5))

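
Two of the items above, the per-`installId` sliding window and batch deduplication by `event.id`, are small enough to sketch together. `SlidingWindowLimiter` and `dedupeById` are hypothetical names; the 100-per-minute default comes from this roadmap, while storing one timestamp per accepted event is an assumption about how the window is tracked.

```typescript
// In-memory sliding-window limiter keyed by installId.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number = 100, windowMs: number = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the event when installId is under its limit at nowMs.
  allow(installId: string, nowMs: number = Date.now()): boolean {
    const cutoff = nowMs - this.windowMs;
    // Keep only timestamps that still fall inside the window.
    const recent = (this.hits.get(installId) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.hits.set(installId, recent);
    return allowed;
  }
}

// Keep the first occurrence of each event id within a batch.
function dedupeById<T extends { id: string }>(events: T[]): T[] {
  const seen = new Set<string>();
  return events.filter((e) => {
    if (seen.has(e.id)) return false;
    seen.add(e.id);
    return true;
  });
}
```
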
---

## Phase 4 — Operational Wiring (NOT STARTED 🔴)

> **This phase bridges "code exists" → "telemetry actually flows."**
> All Phases 0–3 are code-complete, but **no telemetry data has ever reached the server** from any real client.
> The items below are required before the telemetry system can be called "done."

### 4.1 — Platform-Service Deployment

- [ ] Deploy platform-service to a **publicly reachable URL** (Azure Container Apps, Azure App Service, or VM)
- [ ] Configure DNS / reverse proxy so clients can reach `https://api.lysnrai.com` (or similar)
- [ ] Set env vars: `COSMOS_ENDPOINT`, `COSMOS_KEY`, `TELEMETRY_ENABLED=true`
- [ ] Run `scripts/cosmos-telemetry-indexes.sh` against live Cosmos DB to create containers + indexes
- [ ] Verify `POST /api/telemetry/events` accepts a test payload from `curl`

### 4.2 — iOS Keyboard Extension Wiring

- [ ] **Register App Groups capability** in Apple Developer portal for both `com.bytelyst.LysnrAI` and `com.bytelyst.LysnrAI.keyboard`
- [ ] **Restore entitlements** in TestFlight builds (currently cleared because provisioning profile lacks App Groups)
  - `LysnrAI.entitlements`: `aps-environment` + `com.apple.security.application-groups`
  - `LysnrKeyboard.entitlements`: `com.apple.security.application-groups`
- [ ] **Write `platform_service_url`** to App Group UserDefaults — currently `LysnrTelemetry.swift` reads `platform_service_url` from App Group (line 80) but **nothing writes it**
  - Option A: Main app writes URL on launch from env/config
  - Option B: Hardcode URL in `LysnrTelemetry.swift` init
  - Option C: Bundle in `env.dev` and read from shared config
- [ ] **Verify mic permission flow on physical device** — keyboard extensions may not show permission prompts; main app must request mic permission first. Current "Mic error" on device likely caused by this.
- [ ] Test Full Access ON vs OFF paths on physical device

### 4.3 — iOS Main App TelemetryService Integration

- [ ] Verify `TelemetryService.swift` reads `platform_service_url` from config/env and writes to App Group
- [ ] Verify keyboard queue drain works: main app foreground → reads App Group `telemetry_event_queue` → POSTs to server
- [ ] Test lifecycle: app backgrounded → keyboard generates events → app foregrounded → events flushed

### 4.4 — Desktop App Wiring

- [ ] Set `PLATFORM_SERVICE_URL` env var in `~/.LysnrAI/.env` pointing to deployed service
- [ ] Verify `platform_telemetry.py` sends events on dictation start/stop
- [ ] Test offline → online queue drain

### 4.5 — Web Dashboard Wiring

- [ ] Set `PLATFORM_SERVICE_URL` in dashboard `.env.local` files
- [ ] Verify `/api/telemetry/ingest` proxy routes forward to deployed platform-service
- [ ] Verify admin dashboard `/ops/client-logs` page loads real data from platform-service

### 4.6 — Android Wiring

- [ ] Set platform service URL in Android app config
- [ ] Test SharedPreferences offline queue + foreground flush
- [ ] Verify keyboard instrumentation events reach server

### 4.7 — Webhook / Alert Configuration

- [ ] Set `TELEMETRY_ALERT_WEBHOOK_URL` env var (Slack webhook or equivalent)
- [ ] Test cluster severity escalation triggers webhook
- [ ] Set `TELEMETRY_GEO_API_URL` env var (ip-api.com or similar) for geo enrichment

### 4.8 — End-to-End Smoke Test

- [ ] iOS keyboard → platform-service → Cosmos → admin dashboard query — **full round-trip**
- [ ] Desktop → platform-service → Cosmos → admin dashboard query
- [ ] Web dashboard → platform-service ingest → admin dashboard query
- [ ] Trigger error cluster creation → verify cluster appears in admin UI
- [ ] Trigger rate limit → verify rejection in metrics tab
- [ ] GDPR erasure → verify events deleted from Cosmos

### Summary: What Blocks "100% Done"

| Blocker | Severity | Effort |
| --- | --- | --- |
| **Platform-service not deployed** | 🔴 Critical | Medium — needs Azure infra |
| **App Group entitlements not registered** | 🔴 Critical | Low — Apple Developer portal config |
| **`platform_service_url` not written to App Group** | 🔴 Critical | Low — one-line code change |
| **Cosmos containers not created in prod** | 🟡 High | Low — run indexing script |
| **Mic permission flow on device** | 🟡 High | Medium — needs device testing + possible UX fix |
| **Webhook URL not configured** | 🟢 Low | Trivial — env var |
| **Geo API URL not configured** | 🟢 Low | Trivial — env var |
| **Remaining test gaps (5 items)** | 🟢 Low | Medium — integration/e2e tests |

---

## Architecture Summary

```
┌─────────────────────┐    ┌──────────────────────┐    ┌───────────────────┐
│ iOS Keyboard Ext    │    │ iOS Main App         │    │ Desktop (Python)  │
│ LysnrTelemetry      │───▶│ TelemetryService     │    │ PlatformTelemetry │
│ (App Group queue)   │    │ (drains queue)       │    │ (urllib POST)     │
└─────────────────────┘    └──────────┬───────────┘    └────────┬──────────┘
     Full Access ON ──┐               │                         │
       direct POST    │               │                         │
                      ▼               ▼                         ▼
┌──────────────────────────────────────────────────────────────────────────┐
│  Platform Service (Fastify, port 4003)                                   │
│  POST   /api/telemetry/events         — batch ingestion                  │
│  GET    /api/telemetry/config         — client collection config         │
│  GET    /api/telemetry/query          — admin event search               │
│  GET    /api/telemetry/clusters       — admin error clusters             │
│  CRUD   /api/telemetry/policies       — collection policy management     │
│  DELETE /api/telemetry/user/:userId   — GDPR erasure                     │
└─────────────────────────────────────┬────────────────────────────────────┘
                                      │
                                      ▼
┌──────────────────────────────────────────────────────────────────────────┐
│  Azure Cosmos DB                                                         │
│  telemetry_events              partitionKeyPath: /pk                     │
│    pk value = productId:yyyyMM:platform (e.g. lysnrai:202602:ios)        │
│  telemetry_error_clusters      partitionKeyPath: /pk                     │
│    pk value = productId:platform:module (e.g. lysnrai:ios:dictation)     │
│  telemetry_collection_policies partitionKeyPath: /productId              │
└──────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────┐          ┌──────────────────────┐
│  Admin Dashboard        │  GET     │  User Dashboard      │  POST
│  /ops/client-logs       │─────────▶│  /api/telemetry/     │─────────▶ platform
│  (queries via           │  query/  │  ingest              │  /events  -service
│  platform-service API)  │  clusters│  (browser → proxy)   │
└─────────────────────────┘          └──────────────────────┘

┌───────────────────────┐
│  Android              │
│  TelemetryClient.kt   │──▶ POST /api/telemetry/events ──▶ platform-service
│  (SharedPreferences)  │
└───────────────────────┘
```

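
The two synthetic partition-key formats shown in the Cosmos DB box can be expressed as small helpers. This is a sketch under assumptions: the function names are hypothetical (the real helper is referred to as `computePk` in the test table), and deriving `yyyyMM` from the UTC date of the event timestamp is a guess, since the roadmap does not state which clock or time zone is used.

```typescript
// Partition key for telemetry_events: productId:yyyyMM:platform
// Assumption: yyyyMM is taken from the event timestamp in UTC.
function eventPartitionKey(productId: string, platform: string, timestamp: Date): string {
  const year = timestamp.getUTCFullYear();
  const month = String(timestamp.getUTCMonth() + 1).padStart(2, "0");
  return `${productId}:${year}${month}:${platform}`;
}

// Partition key for telemetry_error_clusters: productId:platform:module
function clusterPartitionKey(productId: string, platform: string, module: string): string {
  return `${productId}:${platform}:${module}`;
}

// Matches the examples in the diagram above.
const examplePk = eventPartitionKey("lysnrai", "ios", new Date("2026-02-17T12:00:00Z"));
const exampleClusterPk = clusterPartitionKey("lysnrai", "ios", "dictation");
```
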
---

## Test Coverage

| Component | Test File | Tests | Coverage |
| --- | --- | --- | --- |
| **Platform-service telemetry** | `telemetry.test.ts` | 89 | Zod schemas (34), `containsPII` (6), `computePk` (4), `normalizeMessage` (7), `generateFingerprint` (8), `policyMatchesContext` (13), `mergePolicies` (5), `checkRateLimit` (3), plus additional route-logic tests |
| **iOS LysnrTelemetry (keyboard)** | `LysnrAITests/LysnrTelemetryTests.swift` | 18 | Identity (5), session management (2), event types (1), DictationContext (3), track (3), flush (2), queue (1), crash-safety (1) |
| **Desktop Python client** | `tests/cloud/test_platform_telemetry.py` | 19 | Event format (6), queue behavior (2), session mgmt (2), flush/HTTP (5), install ID (2), singleton (2) |
| **Web dashboard client** | `user-dashboard-web/src/__tests__/telemetry.test.ts` | 12 | `trackEvent` (3), `trackPageView` (1), `flush` (4), install ID (2), `initTelemetry` (2) |
| **Tracker dashboard client** | `tracker-dashboard-web/src/__tests__/telemetry.test.ts` | 10 | `trackEvent` (3), `trackPageView` (1), `flush` (4), `initTelemetry` (2) |
| **Admin dashboard client** | `admin-dashboard-web/src/__tests__/telemetry.test.ts` | 10 | `trackEvent` (3), `trackPageView` (1), `flush` (4), `initTelemetry` (2) |
| **Total** | | **158** | |

### Verification commands

```bash
# Run from the learning_voice_ai_agent repo root, with learning_ai_common_plat
# checked out as a sibling directory. Each `cd` runs in a subshell so the
# working directory is unchanged for the next command.

# Platform-service (89 telemetry tests within 624 total)
(cd ../learning_ai_common_plat && pnpm --filter @lysnrai/platform-service test)

# iOS keyboard telemetry (18 tests)
xcodebuild test-without-building \
  -workspace mobile_app/ios/LysnrAI.xcworkspace \
  -scheme LysnrAITests \
  -destination 'platform=iOS Simulator,name=iPhone 17 Pro' \
  -only-testing:LysnrAITests/LysnrTelemetryTests

# Desktop Python (19 tests)
python -m pytest tests/cloud/test_platform_telemetry.py -v

# Web user-dashboard (12 tests)
(cd user-dashboard-web && npx vitest run src/__tests__/telemetry.test.ts)

# Tracker dashboard (10 tests)
(cd tracker-dashboard-web && npx vitest run src/__tests__/telemetry.test.ts)

# Admin dashboard (10 tests)
(cd admin-dashboard-web && npx vitest run src/__tests__/telemetry.test.ts)
```

### Not yet tested

- [x] iOS `LysnrTelemetry.swift` — ✅ 18 XCTest unit tests (`LysnrTelemetryTests.swift`, build 28)
- [ ] iOS `TelemetryService.swift` (main app) — needs XCTest target for main app
- [ ] Android `TelemetryClient.kt` — needs Android instrumented tests or Robolectric
- [ ] Admin dashboard `/api/telemetry/route.ts` — API route integration test
- [ ] Platform-service HTTP integration tests (Fastify inject for telemetry routes)
- [ ] End-to-end: client → platform-service → Cosmos read-back → admin dashboard query

---

## Bugs Found During Review

The following bugs were discovered during systematic review of the roadmap against actual code and fixed:

| # | Severity | Issue | Fix |
| --- | --- | --- | --- |
| 1 | **High** | Desktop Python `id` used `uuid.uuid4().hex` (32 hex, no dashes) — fails Zod `.uuid()` server validation | Changed to `str(uuid.uuid4())` |
| 2 | **High** | Web telemetry `osFamily='web'` not in Zod `OsFamilyEnum` — fails server validation | Changed to `'other'` |
| 3 | **Medium** | Status said "Phase 2 complete" but Android is all unchecked | Fixed status line |
| 4 | **Medium** | Architecture diagram showed wrong pk for `telemetry_error_clusters` (`/productId` → actual `/pk` = `productId:platform:module`) | Fixed diagram |
| 5 | **Medium** | Tracker dashboard telemetry missing from roadmap entirely | Added as Phase 2 pending |
| 6 | **Medium** | Admin dashboard self-telemetry (page views) not mentioned | Added as Phase 2 pending |
| 7 | **Low** | Architecture diagram missing Android client box | Added with "not yet implemented" note |
| 8 | **Low** | Architecture diagram implied Admin reads Cosmos directly (it queries Platform Service) | Fixed data flow arrows |
| 9 | **Low** | Web `telemetry.ts` JSDoc said "via the admin dashboard proxy" (wrong dashboard) | Fixed to "user dashboard's /api/telemetry/ingest proxy" |
| 10 | **Low** | Commit log missing roadmap doc commit | Added |

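
Bug 1 comes down to the difference between the 32-character hex form of a UUID and the dashed 8-4-4-4-12 form. The sketch below illustrates that difference with a plain regular expression; it only approximates what Zod's `.uuid()` check accepts (Zod may also constrain version and variant bits), and `isCanonicalUuid` and `hexToCanonicalUuid` are hypothetical helper names.

```typescript
// Dashed 8-4-4-4-12 hex layout. Assumption: this approximates, but is not
// identical to, the pattern Zod applies for `.uuid()`.
const CANONICAL_UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isCanonicalUuid(value: string): boolean {
  return CANONICAL_UUID.test(value);
}

// Convert a 32-char hex string (what Python's uuid4().hex produces) to the dashed form.
// Returns null when the input is not exactly 32 hex characters.
function hexToCanonicalUuid(hex: string): string | null {
  if (!/^[0-9a-f]{32}$/i.test(hex)) return null;
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-").toLowerCase();
}
```
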
---

## Commit Log

| Date | Repo | Commit | Description |
| --- | --- | --- | --- |
| 2026-02-16 | common-plat | [`c59049e`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c59049e) | Design doc: client telemetry & log insights |
| 2026-02-16 | common-plat | [`083cf02`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/083cf02) | Fix 18 gaps in telemetry design doc (rev 2) |
| 2026-02-16 | common-plat | [`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff) | Telemetry module — ingest, config, query, clusters, policies (34 tests) |
| 2026-02-17 | voice-agent | [`e546475`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/e546475) | iOS keyboard telemetry client + KeyboardViewController instrumentation |
| 2026-02-17 | voice-agent | [`d202f94`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/d202f94) | Admin dashboard Client Logs page + sidebar nav |
| 2026-02-17 | voice-agent | [`a173baa`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a173baa) | iOS main app TelemetryService + Desktop Python platform_telemetry |
| 2026-02-17 | voice-agent | [`130e1d6`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/130e1d6) | Web user-dashboard telemetry client + ingest proxy |
| 2026-02-17 | common-plat | [`c3d6977`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c3d6977) | Telemetry roadmap doc (this file) |
| 2026-02-17 | voice-agent | [`ae77438`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/ae77438) | Fix: desktop uuid format + web osFamily — pass Zod validation |
| 2026-02-17 | common-plat | [`20f77d5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/20f77d5) | Tests: route-logic tests — PII, pk, fingerprint, policy matching (34→77) |
| 2026-02-17 | voice-agent | [`08efdb6`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/08efdb6) | Tests: Python client (19) + web dashboard (12) telemetry tests |
| 2026-02-17 | voice-agent | [`a102609`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a102609) | Tracker + admin self-telemetry clients + tests (20 tests) |
| 2026-02-17 | voice-agent | [`9196f48`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/9196f48) | Android TelemetryClient + keyboard instrumentation + ProcessLifecycleOwner |
| 2026-02-17 | voice-agent | [`c7732c9`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/c7732c9) | Phase 3: Policy Builder UI + GDPR erasure proxy + sidebar nav |
| 2026-02-17 | common-plat | [`2fb3410`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/2fb3410) | Phase 3: Rate limiting, batch dedup, ETag config caching (614 tests) |
| 2026-02-17 | common-plat | [`056f323`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/056f323) | Phase 3: Cluster resolve/ignore, audit logging, webhook alerts, metrics, Cosmos indexes |
| 2026-02-17 | voice-agent | [`6d7b1d3`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/6d7b1d3) | Phase 3: Cluster actions UI, metrics tab, GDPR erasure UI |
| 2026-02-17 | common-plat | [`2f61ea5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/2f61ea5) | Phase 3: Geo enrichment, Prometheus metrics export |
| 2026-02-17 | voice-agent | [`dc49073`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/dc49073) | Phase 3: Cluster timeline chart (Recharts) |
| 2026-02-17 | common-plat | [`61c919a`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/61c919a) | Phase 3: Policy preview endpoint (count matching clients) |
| 2026-02-17 | voice-agent | [`da9031b`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/da9031b) | Phase 3: Policy builder live preview UI + API proxy |
| 2026-02-17 | common-plat | [`0bfd4bd`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/0bfd4bd) | Phase 3: Geo distribution endpoint (GET /telemetry/geo, Cosmos GROUP BY) |
| 2026-02-17 | voice-agent | [`82a25c0`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/82a25c0) | Phase 3: Geo distribution UI — bar chart + country table on client-logs Geo tab |