docs(ops): define telemetry taxonomy
parent 067e32a2aa
commit 40b62bf3a6
@@ -138,3 +138,4 @@ Current baseline note: after common-platform workspace alignment, `pnpm install
 - [`docs/COSMOS_DATA_OPERATIONS.md`](docs/COSMOS_DATA_OPERATIONS.md) — Cosmos containers, indexes, retention, and backup/restore approach
 - [`docs/SEED_BOOTSTRAP_STRATEGY.md`](docs/SEED_BOOTSTRAP_STRATEGY.md) — Built-in prompt, intake rule, onboarding workspace, and feature-flag bootstrap strategy
 - [`docs/DATA_MIGRATION_AND_BACKFILL_PLAN.md`](docs/DATA_MIGRATION_AND_BACKFILL_PLAN.md) — Encrypted-field, schema-change, and backfill migration plan
+- [`docs/TELEMETRY_AND_DIAGNOSTICS_TAXONOMY.md`](docs/TELEMETRY_AND_DIAGNOSTICS_TAXONOMY.md) — Event taxonomy and diagnostic breadcrumb contract
@@ -101,6 +101,7 @@ Do not place secrets in `NEXT_PUBLIC_*` or `EXPO_PUBLIC_*` variables.
 - Confirm backend, web, and mobile tests from `docs/PRODUCTION_READINESS_HANDOFF_ROADMAP.md` P10 have passed or have explicit release-owner signoff.
 - Confirm Docker images build in CI.
 - Confirm common-platform services are deployed and reachable: platform-service, extraction-service, mcp-server, telemetry, diagnostics, flags, kill switch, blob.
+- Confirm telemetry events and diagnostic breadcrumbs follow `docs/TELEMETRY_AND_DIAGNOSTICS_TAXONOMY.md`.
 - Confirm Cosmos database, containers, partition keys, backups, and retention policy are ready.
 - Confirm field encryption provider and key material are ready.
 - Confirm feature flags and kill switch defaults are safe for release.
145
docs/TELEMETRY_AND_DIAGNOSTICS_TAXONOMY.md
Normal file
@@ -0,0 +1,145 @@
# NoteLett Telemetry And Diagnostics Taxonomy

Date: May 5, 2026
Product ID: `notelett`
Common platform sources:

- `../learning_ai/learning_ai_common_plat/docs/ecosystem/ECOSYSTEM_EVENT_TAXONOMY.md`
- `../learning_ai/learning_ai_common_plat/docs/design/CLIENT_TELEMETRY_DESIGN.md`
- `@bytelyst/backend-telemetry`
- `@bytelyst/telemetry-client`
- `@bytelyst/diagnostics-client`

## Purpose

This document defines the production event vocabulary and diagnostic breadcrumb contract for NoteLett. It is intentionally product-specific while staying aligned with common-platform telemetry and ecosystem event naming.

Telemetry events answer "what happened?" Breadcrumbs answer "what led up to the failure report?"

## Event Naming Rules

- Use dot-separated lowercase names: `<domain>.<entity>.<action>`.
- Prefer stable facts over UI labels.
- Keep `productId`, user/install identity, surface, channel, request id, correlation id, and release metadata in the event envelope or metadata.
- Never include note body text, prompt text, extracted URL text, blob paths, share tokens, JWTs, API keys, email addresses, or raw LLM responses.
- Use counts, booleans, durations, ids, content type, status, feature flag key, model/provider name, and error class instead of sensitive payloads.

The backend currently buffers events through `backend/src/lib/telemetry.ts`; web/mobile use common-platform telemetry clients configured in `web/src/lib/telemetry.ts` and `mobile/src/lib/platform.ts`.

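As a rough illustration of the rules above, the sketch below checks a candidate name against the `<domain>.<entity>.<action>` shape and flags metadata keys that look like sensitive payloads. The helper names, the exact regular expression, and the deny-list are assumptions made for this example; they are not taken from the NoteLett codebase or the common-platform packages. Existing names listed later in this taxonomy, such as `note.created` or `smart_action_run`, would not pass this strict three-segment check, so it is best read as a lint for new names only.

```typescript
// Hypothetical helpers for illustration; not part of any NoteLett module.

// Three lowercase segments separated by dots. Allowing digits and underscores
// inside a segment is an assumption of this sketch.
const EVENT_NAME_PATTERN = /^[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*$/;

function isValidEventName(name: string): boolean {
  return EVENT_NAME_PATTERN.test(name);
}

// Illustrative deny-list of metadata keys that would carry sensitive payloads.
const FORBIDDEN_KEYS = new Set<string>([
  "body",
  "prompt",
  "url",
  "blobPath",
  "shareToken",
  "jwt",
  "apiKey",
  "email",
  "llmResponse",
]);

// Returns the metadata keys that should be removed or replaced before emitting.
function findForbiddenKeys(metadata: Record<string, unknown>): string[] {
  return Object.keys(metadata).filter((key) => FORBIDDEN_KEYS.has(key));
}
```

A check like this could run in a unit test over the list of event names rather than on every emit, which keeps it off the hot path.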
## Required Metadata

Every new event should include the fields below when available:

| Metadata | Source |
| --- | --- |
| `productId` | common client/backend config |
| `surface` or `channel` | `backend`, `notes_web`, `notelett_mobile`, `mcp` |
| `userId` or `anonymousInstallId` | auth/session layer |
| `requestId` | `x-request-id` / generated outbound request id |
| `correlationId` | MCP/tool/action/workflow id when available |
| `workspaceId` | workspace-scoped product flows |
| `noteId` | note-scoped product flows |
| `durationMs` | LLM, extraction, prompt, intake, upload, and scheduler work |
| `status` | lifecycle state such as `submitted`, `completed`, `failed`, `approved` |
| `errorType` | sanitized class/code, not raw secret-bearing messages |

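One way to keep these fields consistent across surfaces is a shared type. The sketch below mirrors the table above; the type name, the choice of which fields are required, and the `toPayload` helper are assumptions for illustration and do not describe the actual `@bytelyst/*` client APIs.

```typescript
// Assumed shape mirroring the metadata table; the type itself is hypothetical.
type Surface = "backend" | "notes_web" | "notelett_mobile" | "mcp";

interface EventMetadata {
  productId: string;
  surface: Surface;
  userId?: string;
  anonymousInstallId?: string;
  requestId?: string;
  correlationId?: string;
  workspaceId?: string;
  noteId?: string;
  durationMs?: number;
  status?: string;
  errorType?: string;
}

// Copies only the fields that are actually set, so "when available" fields
// are absent from the payload instead of being sent as undefined.
function toPayload(metadata: EventMetadata): Record<string, unknown> {
  const payload: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(metadata)) {
    if (value !== undefined) {
      payload[key] = value;
    }
  }
  return payload;
}
```

Typing `surface` as a closed union means a misspelled channel name fails at compile time rather than showing up as a new dashboard dimension.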
## Backend Event Taxonomy

| Domain | Event | Required metadata | Notes |
| --- | --- | --- | --- |
| Notes | `note.created` | `noteId`, `workspaceId`, `sourceType?` | Existing event; use for manual, import, intake, and voice-created notes. |
| Notes | `note.updated` | `noteId`, `workspaceId`, `changedFields?` | Avoid logging field values. |
| Notes | `note.archived` | `noteId`, `workspaceId` | Existing event. |
| Notes | `note.restored` | `noteId`, `workspaceId` | Add when restore telemetry is implemented. |
| Notes | `note.searched` | `workspaceId?`, `mode`, `resultCount?` | Existing event for lexical/hybrid search. |
| Notes | `note.exported_text` | `noteId`, `format?` | Existing text export/share-safe event. |
| Sharing | `note.share_created` | `noteId`, `workspaceId`, `expiresInHours?` | Never log share token. |
| Sharing | `note.share_revoked` | `noteId`, `workspaceId` | Existing event. |
| Collaboration | `note.shared_with_user` | `noteId`, `permission` | Existing event; do not log email. |
| Collaboration | `note.collaborator_removed` | `noteId`, `removedUserId` | Existing event. |
| Workspaces | `workspace.created` | `workspaceId` | Existing event. |
| Workspaces | `workspace.onboarding_seeded` | `workspaceId`, `noteCount`, `agentActionCount` | Existing event; add counts when touched. |

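For `note.updated`, `changedFields` is meant to carry field names only. A minimal sketch of deriving that list is shown below; the function name and the shallow, reference-based comparison are assumptions of this example, and a real note model may need a field allow-list or deeper comparison.

```typescript
// Hypothetical helper: returns the names of top-level fields whose values differ.
// Values are compared with !== and are never included in the result.
function changedFieldNames(
  before: Record<string, unknown>,
  after: Record<string, unknown>,
): string[] {
  const names = Object.keys(before).concat(Object.keys(after));
  const changed = names.filter(
    (name, index) => names.indexOf(name) === index && before[name] !== after[name],
  );
  return changed.sort();
}
```

Sorting the result keeps the emitted metadata stable for the same set of changes, which makes events easier to group.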
## Prompt And AI Event Taxonomy

| Domain | Event | Required metadata | Notes |
| --- | --- | --- | --- |
| Prompt templates | `smart_action_template_created` | `category`, `inputType` | Existing event. |
| Prompt runs | `smart_action_run` | `templateSlug`, `noteId?`, `workspaceId?`, `model?`, `durationMs?` | Existing event in runner. |
| Prompt runs | `smart_action_result_saved` | `templateSlug`, `resultType`, `durationMs?` | Existing event. |
| Prompt runs | `smart_action_error` | `templateSlug`, `errorType`, `durationMs?` | Existing event; sanitize error. |
| Prompt schedules | `scheduled_action_fired` | `scheduleId`, `templateSlug` | Existing event. |
| Prompt webhooks | `webhook_triggered` | `webhookId`, `triggerEvent` | Existing event; do not log webhook secret. |
| Copilot | `note.copilot` | `noteId`, `action` | Existing route event. |
| Copilot | `copilot_transform` | `action`, `durationMs` | Existing lower-level event. Prefer joining to `note.copilot` through request id. |
| Suggestions | `duplicate_detected` | `noteId`, `similarityScore` | Existing event; avoid logging titles/body. |
| Suggestions | `auto_summarize_triggered` | `noteId`, `wordCount` | Existing event. |
| URL extraction | `url_extract_completed` | `domain`, `wordCount` | Existing event; log domain only, not full URL. |
| Palace | `palace.memories_extracted` | `workspaceId?`, `noteId?`, `memoryCount` | Existing event; no raw memory content. |

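The "domain only" rule for `url_extract_completed` can be sketched with the standard `URL` class. The helper name `toDomain` is hypothetical; the assumption here is that an unparseable input should result in the field being omitted rather than the raw string being logged.

```typescript
// Hypothetical helper: reduces a URL to its hostname for the `domain` field.
// Path, query string, fragment, and any embedded credentials are discarded.
function toDomain(rawUrl: string): string | undefined {
  try {
    return new URL(rawUrl).hostname.toLowerCase();
  } catch {
    // Unparseable input: return undefined so the caller omits the field.
    return undefined;
  }
}
```

For example, `toDomain("https://example.com/a?token=abc")` yields `example.com`, so a token in the query string never reaches the event.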
## Intake Event Taxonomy

| Event | Required metadata | Notes |
| --- | --- | --- |
| `intake_submitted` | `contentType`, `templateSlug`, `domain`, `workspaceId?` | Existing event; domain only. |
| `intake_job_completed` | `contentType`, `templateSlug`, `domain`, `durationMs?` | Existing event; add duration when touched. |
| `intake_job_failed` | `contentType`, `errorType`, `stage?` | Existing event currently logs a raw error string; next code touch should normalize to `errorType`. |
| `intake_rule_created` | `ruleId`, `contentType`, `priority` | Add when rule telemetry is implemented. |
| `intake_rule_updated` | `ruleId`, `changedFields` | Add when rule telemetry is implemented. |

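The normalization called for on `intake_job_failed` might look like the sketch below. The function name, the preference for a string `code` property over the error class name, and the `UnknownError` fallback are all assumptions for illustration; the key point is that the exception message is never read.

```typescript
// Hypothetical normalizer: maps any thrown value to a short errorType string.
// Only the error's code or class name is used; the message is ignored because
// it may contain URLs, tokens, or user content.
function toErrorType(error: unknown): string {
  if (error instanceof Error) {
    const code: unknown = (error as Error & { code?: unknown }).code;
    if (typeof code === "string" && /^[A-Za-z0-9_]{1,64}$/.test(code)) {
      return code;
    }
    return error.name || "Error";
  }
  return "UnknownError";
}
```

Restricting `code` to a short alphanumeric pattern guards against libraries that put free-form text in that property.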
## Reviews And MCP Event Taxonomy

| Domain | Event | Required metadata | Notes |
| --- | --- | --- | --- |
| Reviews | `agent_action.created` | `actionId`, `workspaceId`, `noteId?`, `toolName`, `actionType`, `actorType` | Add for direct API-created actions. |
| Reviews | `agent_action.approved` | `actionId`, `workspaceId`, `noteId?`, `reviewerId?` | Add for single or batch review. |
| Reviews | `agent_action.rejected` | `actionId`, `workspaceId`, `noteId?`, `reviewerId?` | Add for single or batch review. |
| Reviews | `agent_action.batch_reviewed` | `approvedCount`, `rejectedCount`, `total` | Add for batch endpoint. |
| MCP | `mcp.tool.called` | `toolName`, `correlationId`, `dryRun`, `idempotencyKey?` | Add for read/write tools if telemetry volume is acceptable. |
| MCP | `mcp.tool.applied` | `toolName`, `actionId?`, `workspaceId`, `noteId?` | Write tools should connect to agent action audit rows. |
| MCP | `mcp.tool.failed` | `toolName`, `errorType`, `correlationId?` | Sanitize error details. |

MCP events should align with common-platform agent runtime names where possible: `agent.run.started`, `agent.run.completed`, and `audit.action.logged` remain ecosystem-level names for cross-product replay. NoteLett can emit product-local `mcp.*` events for operational dashboards.

## Mobile Capture Event Taxonomy

Mobile telemetry uses `@bytelyst/telemetry-client` with channel `notelett_mobile`.

| Event | Required metadata | Notes |
| --- | --- | --- |
| `mobile_app_initialized` | `appVersion`, `buildNumber`, `osFamily` | Existing event category `app_shell`. |
| `mobile.capture.started` | `captureMode`, `workspaceId?` | Add when capture flow is instrumented. |
| `mobile.capture.saved` | `captureMode`, `noteId`, `workspaceId`, `hasBlob` | Do not log raw captured text. |
| `mobile.capture.failed` | `captureMode`, `errorType`, `offline` | Sanitize error details. |
| `mobile.intake.shared_url_received` | `contentType?`, `domain?` | Domain only; do not log full URL. |
| `mobile.offline_queue.flushed` | `queuedCount`, `successCount`, `failureCount` | Useful for offline reliability. |
| `mobile.telemetry.flushed` | `reason`, `queuedCount?` | Pair with app background lifecycle. |

## Diagnostic Breadcrumbs

Use common-platform `@bytelyst/diagnostics-client` breadcrumbs for client-side failure reports. Breadcrumbs should be terse, bounded, and free of sensitive text.

Recommended categories and messages:

| Category | Message | Data |
| --- | --- | --- |
| `navigation` | `opened_dashboard`, `opened_note_detail`, `opened_capture`, `opened_reviews`, `opened_settings` | route id, note/workspace id where already visible |
| `note` | `note_create_started`, `note_save_completed`, `note_archive_failed` | noteId, workspaceId, status, errorType |
| `prompt` | `prompt_run_started`, `prompt_run_completed`, `prompt_run_failed` | templateSlug, noteId, workspaceId, durationMs, errorType |
| `intake` | `intake_submitted`, `intake_polling_started`, `intake_failed` | contentType, domain, jobId, errorType |
| `review` | `review_decision_started`, `review_decision_completed`, `review_decision_failed` | actionId, decision, errorType |
| `capture` | `capture_mode_selected`, `capture_saved`, `capture_failed` | captureMode, workspaceId, noteId, errorType |
| `offline` | `offline_queue_enqueued`, `offline_queue_flushed`, `offline_queue_failed` | operation, queuedCount, errorType |
| `platform` | `feature_flags_unavailable`, `kill_switch_checked`, `telemetry_flush_failed` | dependency, status, errorType |
| `mcp` | `mcp_settings_updated`, `mcp_connection_failed` | serverHost, errorType |

Breadcrumb data must not include note bodies, prompt bodies, token values, blob paths, share tokens, full URLs, headers, or response bodies.

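To make "terse and bounded" concrete, the sketch below keeps only the most recent breadcrumbs in memory. It is a standalone illustration and does not reflect how `@bytelyst/diagnostics-client` stores breadcrumbs; the class name, the default limit of 50, and the restriction of data values to primitives are assumptions.

```typescript
// Hypothetical in-memory buffer; not the @bytelyst/diagnostics-client API.
type BreadcrumbData = Record<string, string | number | boolean>;

interface Breadcrumb {
  category: string;
  message: string;
  data?: BreadcrumbData;
  at: number; // epoch milliseconds
}

class BreadcrumbBuffer {
  private readonly maxItems: number;
  private items: Breadcrumb[] = [];

  constructor(maxItems: number = 50) {
    this.maxItems = Math.max(1, maxItems);
  }

  // Appends a breadcrumb and evicts the oldest entries beyond the limit.
  add(category: string, message: string, data?: BreadcrumbData): void {
    const crumb: Breadcrumb = { category, message, at: Date.now() };
    if (data !== undefined) {
      crumb.data = data;
    }
    this.items.push(crumb);
    while (this.items.length > this.maxItems) {
      this.items.shift();
    }
  }

  // Returns a copy so callers cannot mutate the buffer.
  snapshot(): Breadcrumb[] {
    return this.items.slice();
  }
}
```

Limiting data values to strings, numbers, and booleans makes it harder to attach a whole note or response object by accident.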
## Adoption Checklist

- New backend route handlers call `trackEvent()` for release-critical facts.
- New web/mobile workflows add breadcrumbs around failure-prone actions.
- Error events use `errorType` or a short code, not raw exception messages, unless the message is guaranteed sanitized.
- Event names are documented here before they are emitted from production code.
- High-volume events have sampling or an explicit volume review.
- P10 final verification samples `/api/diagnostics/telemetry` in non-production and confirms platform telemetry ingest in production smoke.
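For the sampling item above, one possible approach is deterministic sampling keyed on an id such as `requestId` or `correlationId`, so that all events for the same request are either kept or dropped together. The sketch below uses an FNV-1a hash over UTF-16 code units; the function names and the choice of hash are assumptions, not something the common platform prescribes.

```typescript
// 32-bit FNV-1a over UTF-16 code units; identical to byte-wise FNV-1a for ASCII ids.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Hypothetical sampler: returns true for roughly `rate` of all keys, and always
// returns the same answer for the same key.
function shouldSample(key: string, rate: number): boolean {
  if (rate >= 1) return true;
  if (rate <= 0) return false;
  return fnv1a(key) / 0x100000000 < rate;
}
```

Compared with `Math.random()`, a keyed decision keeps related events such as `mcp.tool.called` and `mcp.tool.failed` together when they share a correlation id.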