From 40b62bf3a6ab7202ce692899b3c102c6950bdac7 Mon Sep 17 00:00:00 2001
From: Saravana Achu Mac <saravanakumardb1@gmail.com>
Date: Tue, 5 May 2026 13:51:28 -0700
Subject: [PATCH] docs(ops): define telemetry taxonomy

---
 README.md                                  |   1 +
 docs/RELEASE_CHECKLIST.md                  |   1 +
 docs/TELEMETRY_AND_DIAGNOSTICS_TAXONOMY.md | 145 +++++++++++++++++++++
 3 files changed, 147 insertions(+)
 create mode 100644 docs/TELEMETRY_AND_DIAGNOSTICS_TAXONOMY.md
diff --git a/README.md b/README.md
index b4a404a..a4513e3 100644
--- a/README.md
+++ b/README.md
@@ -138,3 +138,4 @@ Current baseline note: after common-platform workspace alignment, `pnpm install
 - [`docs/COSMOS_DATA_OPERATIONS.md`](docs/COSMOS_DATA_OPERATIONS.md) — Cosmos containers, indexes, retention, and backup/restore approach
 - [`docs/SEED_BOOTSTRAP_STRATEGY.md`](docs/SEED_BOOTSTRAP_STRATEGY.md) — Built-in prompt, intake rule, onboarding workspace, and feature-flag bootstrap strategy
 - [`docs/DATA_MIGRATION_AND_BACKFILL_PLAN.md`](docs/DATA_MIGRATION_AND_BACKFILL_PLAN.md) — Encrypted-field, schema-change, and backfill migration plan
+- [`docs/TELEMETRY_AND_DIAGNOSTICS_TAXONOMY.md`](docs/TELEMETRY_AND_DIAGNOSTICS_TAXONOMY.md) — Event taxonomy and diagnostic breadcrumb contract
diff --git a/docs/RELEASE_CHECKLIST.md b/docs/RELEASE_CHECKLIST.md
index 31ed08e..d5608cf 100644
--- a/docs/RELEASE_CHECKLIST.md
+++ b/docs/RELEASE_CHECKLIST.md
@@ -101,6 +101,7 @@ Do not place secrets in `NEXT_PUBLIC_*` or `EXPO_PUBLIC_*` variables.
 - Confirm backend, web, and mobile tests from `docs/PRODUCTION_READINESS_HANDOFF_ROADMAP.md` P10 have passed or have explicit release-owner signoff.
 - Confirm Docker images build in CI.
 - Confirm common-platform services are deployed and reachable: platform-service, extraction-service, mcp-server, telemetry, diagnostics, flags, kill switch, blob.
+- Confirm telemetry events and diagnostic breadcrumbs follow `docs/TELEMETRY_AND_DIAGNOSTICS_TAXONOMY.md`.
 - Confirm Cosmos database, containers, partition keys, backups, and retention policy are ready.
 - Confirm field encryption provider and key material are ready.
 - Confirm feature flags and kill switch defaults are safe for release.
diff --git a/docs/TELEMETRY_AND_DIAGNOSTICS_TAXONOMY.md b/docs/TELEMETRY_AND_DIAGNOSTICS_TAXONOMY.md
new file mode 100644
index 0000000..30bcdbc
--- /dev/null
+++ b/docs/TELEMETRY_AND_DIAGNOSTICS_TAXONOMY.md
@@ -0,0 +1,145 @@
+# NoteLett Telemetry And Diagnostics Taxonomy
+
+Date: May 5, 2026
+Product ID: `notelett`
+Common platform sources:
+
+- `../learning_ai/learning_ai_common_plat/docs/ecosystem/ECOSYSTEM_EVENT_TAXONOMY.md`
+- `../learning_ai/learning_ai_common_plat/docs/design/CLIENT_TELEMETRY_DESIGN.md`
+- `@bytelyst/backend-telemetry`
+- `@bytelyst/telemetry-client`
+- `@bytelyst/diagnostics-client`
+
+## Purpose
+
+This document defines the production event vocabulary and diagnostic breadcrumb contract for NoteLett. It is intentionally product-specific while staying aligned with common-platform telemetry and ecosystem event naming.
+
+Telemetry events answer "what happened?" Breadcrumbs answer "what led up to the failure report?"
+
+## Event Naming Rules
+
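+- Use dot-separated lowercase names: `<domain>.<entity>.<action>`, or `<entity>.<action>` when the entity names the domain (for example `note.created`). Several existing events, such as `smart_action_run` and `intake_submitted`, predate this rule and are listed below under their current names.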
+- Prefer stable facts over UI labels.
+- Keep `productId`, user/install identity, surface, channel, request id, correlation id, and release metadata in the event envelope or metadata.
+- Never include note body text, prompt text, extracted URL text, blob paths, share tokens, JWTs, API keys, email addresses, or raw LLM responses.
+- Use counts, booleans, durations, ids, content type, status, feature flag key, model/provider name, and error class instead of sensitive payloads.
+
+The backend currently buffers events through `backend/src/lib/telemetry.ts`; web/mobile use common-platform telemetry clients configured in `web/src/lib/telemetry.ts` and `mobile/src/lib/platform.ts`.
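+
+The naming and redaction rules above can be checked mechanically before an event is sent. The sketch below is a hypothetical lint-style guard and is not part of `@bytelyst/backend-telemetry` or `backend/src/lib/telemetry.ts`. Its regular expression accepts two or more lowercase, dot-separated segments. The forbidden key names are illustrative stand-ins for the sensitive fields listed above. Key-name matching cannot catch sensitive values stored under innocuous keys, so this guard complements review rather than replacing it.
+
+```typescript
+// Hypothetical guard: not an API of any common-platform package.
+// Two or more lowercase segments joined by dots, e.g. `note.created`
+// or `mobile.capture.saved`; underscores are allowed inside a segment.
+const EVENT_NAME_PATTERN = /^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+$/;
+
+// Illustrative metadata keys that would carry sensitive payloads.
+const FORBIDDEN_METADATA_KEYS = new Set<string>([
+  "noteBody",
+  "promptText",
+  "url",
+  "blobPath",
+  "shareToken",
+  "jwt",
+  "apiKey",
+  "email",
+  "llmResponse",
+]);
+
+function isValidEventName(name: string): boolean {
+  return EVENT_NAME_PATTERN.test(name);
+}
+
+// Returns the offending keys so callers can drop them or fail a test.
+function findForbiddenKeys(metadata: Record<string, unknown>): string[] {
+  return Object.keys(metadata).filter((key) => FORBIDDEN_METADATA_KEYS.has(key));
+}
+```
+
+For example, `isValidEventName("note.created")` returns `true`. The legacy name `smart_action_run` fails the check because it has no dot, so a real guard would need an allowlist for existing names.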
+
+## Required Metadata
+
+Every new event should include the fields below when available:
+
+| Metadata | Source |
+| --- | --- |
+| `productId` | common client/backend config |
+| `surface` or `channel` | `backend`, `notes_web`, `notelett_mobile`, `mcp` |
+| `userId` or `anonymousInstallId` | auth/session layer |
+| `requestId` | `x-request-id` / generated outbound request id |
+| `correlationId` | MCP/tool/action/workflow id when available |
+| `workspaceId` | workspace-scoped product flows |
+| `noteId` | note-scoped product flows |
+| `durationMs` | LLM, extraction, prompt, intake, upload, and scheduler work |
+| `status` | lifecycle state such as `submitted`, `completed`, `failed`, `approved` |
+| `errorType` | sanitized class/code, not raw secret-bearing messages |
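+
+One way to keep this metadata consistent is to model it as a type. The sketch below is hypothetical. The field names mirror the table, but `EventMetadata` and `buildMetadata` are not exported by the common-platform clients. Only `surface` is modeled; a client that reports `channel` instead would rename that field. Optional fields are left out of the payload when unset, which matches the "when available" wording above.
+
+```typescript
+// Hypothetical types: field names follow the Required Metadata table.
+type Surface = "backend" | "notes_web" | "notelett_mobile" | "mcp";
+
+interface EventMetadata {
+  productId: "notelett";
+  surface: Surface;
+  userId?: string;
+  anonymousInstallId?: string;
+  requestId?: string;
+  correlationId?: string;
+  workspaceId?: string;
+  noteId?: string;
+  durationMs?: number;
+  status?: string;
+  errorType?: string;
+}
+
+// Drops undefined fields so metadata that is not available is simply omitted.
+function buildMetadata(fields: EventMetadata): Record<string, unknown> {
+  return Object.fromEntries(
+    Object.entries(fields).filter(([, value]) => value !== undefined),
+  );
+}
+```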
+
+## Backend Event Taxonomy
+
+| Domain | Event | Required metadata | Notes |
+| --- | --- | --- | --- |
+| Notes | `note.created` | `noteId`, `workspaceId`, `sourceType?` | Existing event; use for manual, import, intake, and voice-created notes. |
+| Notes | `note.updated` | `noteId`, `workspaceId`, `changedFields?` | Avoid logging field values. |
+| Notes | `note.archived` | `noteId`, `workspaceId` | Existing event. |
+| Notes | `note.restored` | `noteId`, `workspaceId` | Add when restore telemetry is implemented. |
+| Notes | `note.searched` | `workspaceId?`, `mode`, `resultCount?` | Existing event for lexical/hybrid search. |
+| Notes | `note.exported_text` | `noteId`, `format?` | Existing text export/share-safe event. |
+| Sharing | `note.share_created` | `noteId`, `workspaceId`, `expiresInHours?` | Never log share token. |
+| Sharing | `note.share_revoked` | `noteId`, `workspaceId` | Existing event. |
+| Collaboration | `note.shared_with_user` | `noteId`, `permission` | Existing event; do not log email. |
+| Collaboration | `note.collaborator_removed` | `noteId`, `removedUserId` | Existing event. |
+| Workspaces | `workspace.created` | `workspaceId` | Existing event. |
+| Workspaces | `workspace.onboarding_seeded` | `workspaceId`, `noteCount`, `agentActionCount` | Existing event; add the counts the next time this code path changes. |
+
+## Prompt And AI Event Taxonomy
+
+| Domain | Event | Required metadata | Notes |
+| --- | --- | --- | --- |
+| Prompt templates | `smart_action_template_created` | `category`, `inputType` | Existing event. |
+| Prompt runs | `smart_action_run` | `templateSlug`, `noteId?`, `workspaceId?`, `model?`, `durationMs?` | Existing event in runner. |
+| Prompt runs | `smart_action_result_saved` | `templateSlug`, `resultType`, `durationMs?` | Existing event. |
+| Prompt runs | `smart_action_error` | `templateSlug`, `errorType`, `durationMs?` | Existing event; sanitize error. |
+| Prompt schedules | `scheduled_action_fired` | `scheduleId`, `templateSlug` | Existing event. |
+| Prompt webhooks | `webhook_triggered` | `webhookId`, `triggerEvent` | Existing event; do not log webhook secret. |
+| Copilot | `note.copilot` | `noteId`, `action` | Existing route event. |
+| Copilot | `copilot_transform` | `action`, `durationMs` | Existing lower-level event. Prefer joining to `note.copilot` through request id. |
+| Suggestions | `duplicate_detected` | `noteId`, `similarityScore` | Existing event; avoid logging titles or bodies. |
+| Suggestions | `auto_summarize_triggered` | `noteId`, `wordCount` | Existing event. |
+| URL extraction | `url_extract_completed` | `domain`, `wordCount` | Existing event; log domain only, not full URL. |
+| Palace | `palace.memories_extracted` | `workspaceId?`, `noteId?`, `memoryCount` | Existing event; no raw memory content. |
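+
+Several events in this document, such as `url_extract_completed`, `intake_submitted`, and `mobile.intake.shared_url_received`, carry `domain` instead of a full URL. The minimal sketch below reduces a URL to its hostname. It assumes the WHATWG `URL` class is available; that class is global in Node.js and browsers, but React Native may need a polyfill. The helper name `domainOnly` is hypothetical.
+
+```typescript
+// Hypothetical helper: keeps only the hostname so paths, query strings
+// (which may hold tokens), and credentials never reach telemetry.
+function domainOnly(rawUrl: string): string | undefined {
+  try {
+    // The WHATWG URL parser lowercases hostnames for http(s) URLs.
+    return new URL(rawUrl).hostname;
+  } catch {
+    // Unparseable input is dropped rather than echoed into the event.
+    return undefined;
+  }
+}
+```
+
+A caller would then send `{ domain: domainOnly(url) }` and omit the field entirely when the result is `undefined`.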
+
+## Intake Event Taxonomy
+
+| Event | Required metadata | Notes |
+| --- | --- | --- |
+| `intake_submitted` | `contentType`, `templateSlug`, `domain`, `workspaceId?` | Existing event; domain only. |
+| `intake_job_completed` | `contentType`, `templateSlug`, `domain`, `durationMs?` | Existing event; add `durationMs` the next time this code path changes. |
+| `intake_job_failed` | `contentType`, `errorType`, `stage?` | Existing event currently logs a raw error string; the next change to this code should normalize it to `errorType`. |
+| `intake_rule_created` | `ruleId`, `contentType`, `priority` | Add when rule telemetry is implemented. |
+| `intake_rule_updated` | `ruleId`, `changedFields` | Add when rule telemetry is implemented. |
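+
+The `errorType` normalization called for on `intake_job_failed`, and on every error event in this document, could look like the hypothetical helper below. It prefers a string `code` property, which Node.js system errors provide (for example `ENOENT`). Otherwise it falls back to the error's class name. It never reads `message`, which may contain URLs, tokens, or note text. The 64-character cap and the character allowlist are assumptions.
+
+```typescript
+// Hypothetical helper: maps an unknown thrown value to a short, bounded
+// errorType built from `code` or the class name, never from `message`.
+function toErrorType(err: unknown): string {
+  if (err instanceof Error) {
+    const code = (err as Error & { code?: unknown }).code;
+    const raw = typeof code === "string" && code.length > 0 ? code : err.name;
+    // Replace unexpected characters and bound the length.
+    return raw.replace(/[^A-Za-z0-9_.-]/g, "_").slice(0, 64);
+  }
+  // Non-Error throwables (strings, objects) may carry raw text; do not echo them.
+  return "UnknownError";
+}
+```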
+
+## Reviews And MCP Event Taxonomy
+
+| Domain | Event | Required metadata | Notes |
+| --- | --- | --- | --- |
+| Reviews | `agent_action.created` | `actionId`, `workspaceId`, `noteId?`, `toolName`, `actionType`, `actorType` | Add for direct API-created actions. |
+| Reviews | `agent_action.approved` | `actionId`, `workspaceId`, `noteId?`, `reviewerId?` | Add for single or batch review. |
+| Reviews | `agent_action.rejected` | `actionId`, `workspaceId`, `noteId?`, `reviewerId?` | Add for single or batch review. |
+| Reviews | `agent_action.batch_reviewed` | `approvedCount`, `rejectedCount`, `total` | Add for batch endpoint. |
+| MCP | `mcp.tool.called` | `toolName`, `correlationId`, `dryRun`, `idempotencyKey?` | Add for read/write tools if telemetry volume is acceptable. |
+| MCP | `mcp.tool.applied` | `toolName`, `actionId?`, `workspaceId`, `noteId?` | Write tools should connect to agent action audit rows. |
+| MCP | `mcp.tool.failed` | `toolName`, `errorType`, `correlationId?` | Sanitize error details. |
+
+MCP events should align with common-platform agent runtime names where possible. `agent.run.started`, `agent.run.completed`, and `audit.action.logged` remain the ecosystem-level names used for cross-product replay; NoteLett may additionally emit product-local `mcp.*` events for operational dashboards.
+
+## Mobile Capture Event Taxonomy
+
+Mobile telemetry uses `@bytelyst/telemetry-client` with channel `notelett_mobile`.
+
+| Event | Required metadata | Notes |
+| --- | --- | --- |
+| `mobile_app_initialized` | `appVersion`, `buildNumber`, `osFamily` | Existing event; category `app_shell`. |
+| `mobile.capture.started` | `captureMode`, `workspaceId?` | Add when capture flow is instrumented. |
+| `mobile.capture.saved` | `captureMode`, `noteId`, `workspaceId`, `hasBlob` | Do not log raw captured text. |
+| `mobile.capture.failed` | `captureMode`, `errorType`, `offline` | Sanitize error details. |
+| `mobile.intake.shared_url_received` | `contentType?`, `domain?` | Domain only; do not log full URL. |
+| `mobile.offline_queue.flushed` | `queuedCount`, `successCount`, `failureCount` | Useful for offline reliability. |
+| `mobile.telemetry.flushed` | `reason`, `queuedCount?` | Pair with app background lifecycle. |
+
+## Diagnostic Breadcrumbs
+
+Use common-platform `@bytelyst/diagnostics-client` breadcrumbs for client-side failure reports. Breadcrumbs should be terse, bounded, and free of sensitive text.
+
+Recommended categories and messages:
+
+| Category | Message | Data |
+| --- | --- | --- |
+| `navigation` | `opened_dashboard`, `opened_note_detail`, `opened_capture`, `opened_reviews`, `opened_settings` | route id, note/workspace id where already visible |
+| `note` | `note_create_started`, `note_save_completed`, `note_archive_failed` | noteId, workspaceId, status, errorType |
+| `prompt` | `prompt_run_started`, `prompt_run_completed`, `prompt_run_failed` | templateSlug, noteId, workspaceId, durationMs, errorType |
+| `intake` | `intake_submitted`, `intake_polling_started`, `intake_failed` | contentType, domain, jobId, errorType |
+| `review` | `review_decision_started`, `review_decision_completed`, `review_decision_failed` | actionId, decision, errorType |
+| `capture` | `capture_mode_selected`, `capture_saved`, `capture_failed` | captureMode, workspaceId, noteId, errorType |
+| `offline` | `offline_queue_enqueued`, `offline_queue_flushed`, `offline_queue_failed` | operation, queuedCount, errorType |
+| `platform` | `feature_flags_unavailable`, `kill_switch_checked`, `telemetry_flush_failed` | dependency, status, errorType |
+| `mcp` | `mcp_settings_updated`, `mcp_connection_failed` | serverHost, errorType |
+
+Breadcrumb data must not include note bodies, prompt bodies, token values, blob paths, share tokens, full URLs, headers, or response bodies.
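+
+Because breadcrumb data is attached to failure reports, a client can filter it before handing it to the diagnostics client. The sketch below is hypothetical and does not call `@bytelyst/diagnostics-client`. It keeps only keys drawn from the Data column above, with `routeId` standing in for "route id". Numbers and booleans pass through unchanged. Strings are truncated to an assumed 80 characters. Nested objects, such as headers or response bodies, are dropped.
+
+```typescript
+// Hypothetical allowlist built from the Data column of the breadcrumb table.
+const ALLOWED_BREADCRUMB_KEYS = new Set<string>([
+  "routeId", "noteId", "workspaceId", "status", "errorType", "templateSlug",
+  "durationMs", "contentType", "domain", "jobId", "actionId", "decision",
+  "captureMode", "operation", "queuedCount", "dependency", "serverHost",
+]);
+const MAX_STRING_LENGTH = 80; // assumed bound, not a platform limit
+
+type BreadcrumbValue = string | number | boolean;
+
+function sanitizeBreadcrumbData(
+  data: Record<string, unknown>,
+): Record<string, BreadcrumbValue> {
+  const out: Record<string, BreadcrumbValue> = {};
+  for (const [key, value] of Object.entries(data)) {
+    if (!ALLOWED_BREADCRUMB_KEYS.has(key)) continue; // drop unknown keys
+    if (typeof value === "string") {
+      out[key] = value.slice(0, MAX_STRING_LENGTH);
+    } else if (typeof value === "number" || typeof value === "boolean") {
+      out[key] = value;
+    } // objects, arrays, null, and undefined are dropped
+  }
+  return out;
+}
+```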
+
+## Adoption Checklist
+
+- New backend route handlers call `trackEvent()` for release-critical facts.
+- New web/mobile workflows add breadcrumbs around failure-prone actions.
+- Error events use `errorType` or a short code, not raw exception messages, unless the message is guaranteed sanitized.
+- Event names are documented here before they are emitted from production code.
+- High-volume events have sampling or an explicit volume review.
+- P10 final verification samples `/api/diagnostics/telemetry` in non-production and confirms platform telemetry ingest in production smoke tests.
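+
+The rule that event names must be documented here before they are emitted could be enforced with a small registry checked in development builds or tests. The sketch below is hypothetical. `DOCUMENTED_EVENTS` lists only a sample of the names above; a real registry would need to cover every documented event, including the legacy snake_case ones.
+
+```typescript
+// Hypothetical registry: a sample of event names documented in this file.
+const DOCUMENTED_EVENTS: ReadonlySet<string> = new Set([
+  "note.created",
+  "note.archived",
+  "smart_action_run",
+  "intake_submitted",
+  "mcp.tool.called",
+  "mobile.capture.saved",
+]);
+
+// Throws for undocumented names so a test or dev build fails loudly
+// instead of production silently emitting an unreviewed event.
+function assertDocumentedEvent(name: string): void {
+  if (!DOCUMENTED_EVENTS.has(name)) {
+    throw new Error(`Undocumented telemetry event: ${name}`);
+  }
+}
+```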