# Client Telemetry — Implementation Roadmap

> **Status:** Phase 2 in progress (Tracker dashboard, admin self-telemetry, and Android pending), Phase 3 pending
> **Last updated:** 2026-02-17
> **Design doc:** [`CLIENT_TELEMETRY_DESIGN.md`](./CLIENT_TELEMETRY_DESIGN.md)
> **Repos:** `learning_ai_common_plat` (platform-service) · `learning_voice_ai_agent` (all clients + dashboards)

---

## Phase 0 — Design & Review

- [x] Write comprehensive telemetry design doc — schema, APIs, admin UX, privacy guardrails ([`c59049e`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c59049e))
- [x] Systematic review: identify and fix 18 bugs/gaps in the design doc ([`083cf02`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/083cf02))
  - TTL format (ISO → seconds), `regionCode` prefix format, missing `pk` field
  - Auth model for keyboard extension (`X-Install-Token`)
  - Config endpoint query params (`userId`/`anonymousInstallId`)
  - Error clustering made version-agnostic (`affectedVersions` array)
  - GDPR erasure endpoint added
  - iOS offline queue strategy (App Group UserDefaults, FIFO eviction)
  - Global defaults for `batchSize`/`flushInterval`/`maxQueueSize`

---

## Phase 1 — MVP (iOS Keyboard + Backend + Admin UI)

### Platform-Service Telemetry Module

- [x] `types.ts` — Zod schemas for events, policies, clusters, queries ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] `repository.ts` — Cosmos DB CRUD for events, policies, clusters ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] `routes.ts` — Fastify routes: ingestion, config, admin query, clusters, policy CRUD, GDPR erasure ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] `telemetry.test.ts` — 34 Vitest tests for schemas + policy evaluation ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] Register telemetry routes in `server.ts` ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))
- [x] Add Cosmos containers (`telemetry_events`, `telemetry_error_clusters`, `telemetry_collection_policies`) to `cosmos-init.ts` ([`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff))

### iOS Keyboard Telemetry Client

- [x] `LysnrTelemetry.swift` — Singleton client with App Group offline queue, `X-Install-Token` auth, 200-event cap ([`e546475`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/e546475))
- [x] Instrument `KeyboardViewController.swift` — 10+ telemetry points ([`e546475`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/e546475))
  - [x] `session_started` / `session_ended` (with full `DictationContext`)
  - [x] `backend_selected` (azure / local + reason)
  - [x] `recognition_started` / `recognition_failed`
  - [x] `mic_permission_denied`
  - [x] `insert_noop` detection
  - [x] `error_recovery_attempted` (local→azure, azure→local)
  - [x] Session summary metrics (duration, segments, words, transcript length)

### Admin Dashboard — Client Logs Page

- [x] `/ops/client-logs/page.tsx` — Events table + Error Clusters tab ([`d202f94`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/d202f94))
  - [x] Stat cards (total events, errors, warnings, keyboard events)
  - [x] Filters (platform, channel, level, module, free-text search)
  - [x] Expandable event detail rows (device, tags, metrics, dictation context)
  - [x] Error Clusters tab with severity, affected versions, user count
- [x] `/api/telemetry/route.ts` — API route proxying to platform-service ([`d202f94`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/d202f94))
- [x] `platform-client.ts` — `queryTelemetryEvents` + `queryTelemetryClusters` ([`d202f94`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/d202f94))
- [x] `sidebar-nav.tsx` — "Client Logs" nav item with `FileText` icon ([`d202f94`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/d202f94))

---

## Phase 2 — Full Platform Coverage

### iOS Main App

- [x] `TelemetryService.swift` — Main app telemetry service with App Group queue drain on foreground ([`a173baa`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a173baa))
- [x] `LysnrAIApp.swift` — `scenePhase` integration for activate/deactivate lifecycle ([`a173baa`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a173baa))
  - [x] `app_foregrounded` / `app_backgrounded` events
  - [x] Keyboard queue flush on every foreground transition
  - [x] 60-second periodic flush timer

### Desktop App (Python)

- [x] `platform_telemetry.py` — `PlatformTelemetry` singleton with `urllib.request` POST, threaded flush timer, persistent `install_id` in `~/.LysnrAI/install_id` ([`a173baa`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a173baa))
- [x] `main.py` instrumentation ([`a173baa`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a173baa))
  - [x] `app_started` / `app_stopped` lifecycle events
  - [x] `dictation_started` (with backend tag)
  - [x] `dictation_completed` (with `duration_ms`, `word_count`, `transcript_length` metrics)
  - [x] `mic_permission_denied` / `recording_start_failed` error events

### Web User Dashboard

- [x] `telemetry.ts` — Browser client with `sendBeacon`, `localStorage` install ID, auto-flush on visibility change ([`130e1d6`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/130e1d6))
- [x] `/api/telemetry/ingest/route.ts` — Server-side proxy to platform-service ([`130e1d6`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/130e1d6))
- [x] `providers.tsx` — `initTelemetry()` called on app mount ([`130e1d6`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/130e1d6))

### Tracker Dashboard (Not Started)

- [ ] Add `telemetry.ts` client module (same pattern as user dashboard)
- [ ] Add `/api/telemetry/ingest/route.ts` proxy
- [ ] `initTelemetry()` in providers or layout

### Admin Dashboard Self-Telemetry (Not Started)

- [ ] Add `telemetry.ts` client module to admin-dashboard-web
- [ ] Add `/api/telemetry/admin-ingest/route.ts` proxy (separate from admin query route)
- [ ] Track admin page views, filter usage, policy changes

### Android (Not Started)

- [ ] `TelemetryClient.kt` — Kotlin telemetry client for Android keyboard + main app
- [ ] Instrument `LysnrInputMethodService.kt` — dictation lifecycle events
- [ ] Offline queue using SharedPreferences or Room
- [ ] Flush on app foreground via `ProcessLifecycleOwner`

---

## Phase 3 — Intelligence & Admin Tooling

### Error Clustering & Alerting

- [ ] Automated error fingerprinting (hash of `platform + channel + module + eventName + errorDomain + errorCode`)
- [ ] Cluster severity escalation (`warn` → `error` → `fatal` based on count + affected users)
- [ ] Slack/email alerting when cluster severity escalates
- [ ] Dashboard: cluster timeline chart showing occurrence rate over time
- [ ] Dashboard: "Resolve" / "Ignore" actions on clusters

### Geo Enrichment

- [ ] Server-side IP → country/region lookup on ingestion (GeoLite2 or Azure Maps)
- [ ] Populate `regionCode` field (e.g., `US:WA`) for events without client-provided region
- [ ] Admin UI: geographic heatmap of error distribution
- [ ] Policy targeting by `regionCode` ranges

### Collection Policy Builder UI

- [ ] Admin page: `/ops/telemetry-policies`
- [ ] CRUD UI for collection policies (name, enabled, targeting rules, sampling rates)
- [ ] Targeting builder: platform checkboxes, version range inputs, percentage slider
- [ ] Live preview: "N clients would match this policy"
- [ ] Policy activation/deactivation toggle
- [ ] Scheduling: `startsAt` / `expiresAt` date pickers

### Privacy & Compliance

- [ ] PII regex scanner on ingestion (email, phone, SSN patterns → redact before storage)
- [ ] Admin UI: GDPR erasure tool (search by userId → delete all events)
- [ ] Retention policy enforcement (TTL-based auto-expiry per container)
- [ ] Audit log entries for policy changes and data deletions

### Performance & Scale

- [ ] Client-side config caching (poll `/api/telemetry/config` with `If-None-Match` ETag)
- [ ] Server-side rate limiting per `installId` (100 events/min default)
- [ ] Cosmos DB indexing policy tuning for `telemetry_events` (composite indexes on query patterns)
- [ ] Batch ingestion deduplication by `event.id`
- [ ] Prometheus metrics for ingestion throughput and error rates

---

## Architecture Summary

```
┌─────────────────────┐    ┌──────────────────────┐    ┌───────────────────┐
│  iOS Keyboard Ext   │    │  iOS Main App        │    │  Desktop (Python) │
│  LysnrTelemetry     │───▶│  TelemetryService    │    │  PlatformTelemetry│
│  (App Group queue)  │    │  (drains queue)      │    │  (urllib POST)    │
└─────────────────────┘    └──────────┬───────────┘    └────────┬──────────┘
  Full Access ON ──┐                  │                         │
  direct POST      │                  │                         │
                   ▼                  ▼                         ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                  Platform Service (Fastify, port 4003)                  │
│  POST   /api/telemetry/events       — batch ingestion                   │
│  GET    /api/telemetry/config       — client collection config          │
│  GET    /api/telemetry/query        — admin event search                │
│  GET    /api/telemetry/clusters     — admin error clusters              │
│  CRUD   /api/telemetry/policies     — collection policy management      │
│  DELETE /api/telemetry/user/:userId — GDPR erasure                      │
└────────────────────────────┬────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                             Azure Cosmos DB                             │
│  telemetry_events                partitionKeyPath: /pk                  │
│    pk value = productId:yyyyMM:platform (e.g. lysnrai:202602:ios)       │
│  telemetry_error_clusters        partitionKeyPath: /pk                  │
│    pk value = productId:platform:module (e.g. lysnrai:ios:dictation)    │
│  telemetry_collection_policies   partitionKeyPath: /productId           │
└─────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────┐
│  Admin Dashboard        │
│  /ops/client-logs       │──▶ GET /api/telemetry/query, /clusters ──▶ platform-service
│  (queries via           │
│   platform-service API) │
└─────────────────────────┘

┌─────────────────────────┐
│  User Dashboard         │
│  /api/telemetry/ingest  │──▶ POST /api/telemetry/events ──▶ platform-service
│  (browser → proxy)      │
└─────────────────────────┘

┌───────────────────────┐
│  Android (Phase 2)    │  ← Not yet implemented
│  TelemetryClient.kt   │──▶ POST /api/telemetry/events ──▶ platform-service
│  (SharedPreferences)  │
└───────────────────────┘
```

---

## Bugs Found During Review

The following bugs were found during a systematic review of the roadmap against the actual code, and have been fixed:

| # | Severity | Issue | Fix |
| --- | --- | --- | --- |
| 1 | **High** | Desktop Python `id` used `uuid.uuid4().hex` (32 hex, no dashes) — fails Zod `.uuid()` server validation | Changed to `str(uuid.uuid4())` |
| 2 | **High** | Web telemetry `osFamily='web'` not in Zod `OsFamilyEnum` — fails server validation | Changed to `'other'` |
| 3 | **Medium** | Status said "Phase 2 complete" but Android is all unchecked | Fixed status line |
| 4 | **Medium** | Architecture diagram showed wrong pk for `telemetry_error_clusters` (`/productId` → actual `/pk` = `productId:platform:module`) | Fixed diagram |
| 5 | **Medium** | Tracker dashboard telemetry missing from roadmap entirely | Added as Phase 2 pending |
| 6 | **Medium** | Admin dashboard self-telemetry (page views) not mentioned | Added as Phase 2 pending |
| 7 | **Low** | Architecture diagram missing Android client box | Added with "not yet implemented" note |
| 8 | **Low** | Architecture diagram implied Admin reads Cosmos directly (it queries Platform Service) | Fixed data flow arrows |
| 9 | **Low** | Web `telemetry.ts` JSDoc said "via the admin dashboard proxy" (wrong dashboard) | Fixed to "user dashboard's /api/telemetry/ingest proxy" |
| 10 | **Low** | Commit log missing roadmap doc commit | Added |

---

## Commit Log

| Date | Repo | Commit | Description |
| --- | --- | --- | --- |
| 2026-02-16 | common-plat | [`c59049e`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c59049e) | Design doc: client telemetry & log insights |
| 2026-02-16 | common-plat | [`083cf02`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/083cf02) | Fix 18 gaps in telemetry design doc (rev 2) |
| 2026-02-16 | common-plat | [`ce4c4ff`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/ce4c4ff) | Telemetry module — ingest, config, query, clusters, policies (34 tests) |
| 2026-02-17 | voice-agent | [`e546475`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/e546475) | iOS keyboard telemetry client + KeyboardViewController instrumentation |
| 2026-02-17 | voice-agent | [`d202f94`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/d202f94) | Admin dashboard Client Logs page + sidebar nav |
| 2026-02-17 | voice-agent | [`a173baa`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/a173baa) | iOS main app TelemetryService + Desktop Python platform_telemetry |
| 2026-02-17 | voice-agent | [`130e1d6`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/130e1d6) | Web user-dashboard telemetry client + ingest proxy |
| 2026-02-17 | common-plat | [`c3d6977`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c3d6977) | Telemetry roadmap doc (this file) |
| 2026-02-17 | voice-agent | [`ae77438`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/ae77438) | Fix: desktop uuid format + web osFamily — pass Zod validation |
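
Bug 1 in the review table comes down to two string forms of the same UUID. The sketch below shows the difference in plain Python. The regular expression is only an illustrative stand-in for a canonical-form check and is an assumption here; it is not the pattern the server-side Zod `.uuid()` validator uses, and `looks_like_canonical_uuid` is a hypothetical helper name.

```python
import re
import uuid

# Hypothetical stand-in for a canonical UUID check (8-4-4-4-12, hyphenated).
# Assumption: the server expects this hyphenated form; this is NOT Zod's regex.
CANONICAL_UUID = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


def looks_like_canonical_uuid(value: str) -> bool:
    """Return True if value is a lowercase, hyphenated 36-character UUID."""
    return CANONICAL_UUID.match(value) is not None


event_uuid = uuid.uuid4()

# Form used before the fix: 32 hex characters, no dashes.
hex_form = event_uuid.hex

# Form used after the fix: 36 characters with dashes.
canonical_form = str(event_uuid)

print(hex_form, looks_like_canonical_uuid(hex_form))
print(canonical_form, looks_like_canonical_uuid(canonical_form))
```

Both strings encode the same 128-bit value, so the fix changes only the serialization of `id`, not how IDs are generated.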
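
The two `pk` value formats listed in the Architecture Summary (`productId:yyyyMM:platform` for `telemetry_events`, `productId:platform:module` for `telemetry_error_clusters`) can be illustrated with a short sketch. The real logic lives in the TypeScript `repository.ts`; the Python function names below are hypothetical, and taking the month in UTC is an assumption the roadmap does not state.

```python
from datetime import datetime, timezone


def events_pk(product_id: str, timestamp: datetime, platform: str) -> str:
    """Build a telemetry_events pk value: productId:yyyyMM:platform.

    Assumption: the month bucket is computed in UTC, so the timestamp
    should be timezone-aware.
    """
    month = timestamp.astimezone(timezone.utc).strftime("%Y%m")
    return f"{product_id}:{month}:{platform}"


def cluster_pk(product_id: str, platform: str, module: str) -> str:
    """Build a telemetry_error_clusters pk value: productId:platform:module."""
    return f"{product_id}:{platform}:{module}"


ts = datetime(2026, 2, 17, 12, 0, tzinfo=timezone.utc)
print(events_pk("lysnrai", ts, "ios"))            # lysnrai:202602:ios
print(cluster_pk("lysnrai", "ios", "dictation"))  # lysnrai:ios:dictation
```

The month component means a query scoped to one product, platform, and month touches a single logical partition, which fits the admin event search described above.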
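
The Phase 3 fingerprinting item (hash of `platform + channel + module + eventName + errorDomain + errorCode`) is not implemented yet. One possible shape is sketched below. The choice of SHA-256, the `|` separator, the flat dictionary layout, and the name `error_fingerprint` are all assumptions made for illustration; the roadmap specifies only which fields feed the hash.

```python
import hashlib

# Fields named in the Phase 3 roadmap item, in the listed order.
FINGERPRINT_FIELDS = (
    "platform",
    "channel",
    "module",
    "eventName",
    "errorDomain",
    "errorCode",
)


def error_fingerprint(event: dict) -> str:
    """Return a stable hex digest for grouping similar error events.

    Assumptions: SHA-256 as the hash, "|" as the separator, and missing
    fields treated as empty strings.
    """
    parts = [str(event.get(field, "")) for field in FINGERPRINT_FIELDS]
    # The separator keeps ("ab", "c") distinct from ("a", "bc").
    joined = "|".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


base = {
    "platform": "ios",
    "channel": "keyboard",
    "module": "dictation",
    "eventName": "recognition_failed",
    "errorDomain": "speech",
    "errorCode": 203,
    "appVersion": "1.4.0",
}
other_version = {**base, "appVersion": "1.5.0"}
other_code = {**base, "errorCode": 204}

print(error_fingerprint(base) == error_fingerprint(other_version))
print(error_fingerprint(base) == error_fingerprint(other_code))
```

Because the app version is not part of the hashed fields, the same failure on two versions lands in one cluster, which matches the version-agnostic clustering (`affectedVersions` array) decided in Phase 0.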