docs(diagnostics): update roadmap with Phase 1 completion status and commit links
parent fb71981e53
commit 890a558c31
@@ -30,124 +30,70 @@ This roadmap delivers a **Datadog/Sentry-grade remote diagnostics system** for t

### 1.1 Data Model & Schemas

- [ ] **1.1.1** Create `modules/diagnostics/types.ts`
  - [ ] `DebugSessionDoc` — session metadata (status, target, config)
  - [ ] `DebugTraceDoc` — trace spans with timing
  - [ ] `DebugLogEntryDoc` — structured log entries
  - [ ] `DiagnosticsConfigDoc` — per-product collection policies
  - [ ] Zod schemas for all inputs
- [ ] **1.1.2** Add Cosmos containers to `cosmos-init.ts`
  - [ ] `debug_sessions` (pk: `/id`, TTL: 7 days)
  - [ ] `debug_traces` (pk: `/pk` with composite `${productId}:${sessionId}`, TTL: 7 days)
  - [ ] `debug_logs` (pk: `/pk` with composite `${productId}:${sessionId}`, TTL: 3 days)
  - [ ] `debug_screenshots` metadata (pk: `/sessionId`) — actual images stored in Azure Blob

- [x] **1.1.1** Create `modules/diagnostics/types.ts` — [`f51c352`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/f51c352)
  - [x] `DebugSessionDoc` — session metadata (status, target, config)
  - [x] `DebugTraceDoc` — trace spans with timing
  - [x] `DebugLogEntryDoc` — structured log entries
  - [x] `DebugScreenshotDoc` — metadata for blob storage
  - [x] Zod schemas for all inputs
- [x] **1.1.2** Add Cosmos containers to `cosmos-init.ts` — [`dea1521`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/dea1521)
  - [x] `debug_sessions` (pk: `/id`, TTL: 7 days)
  - [x] `debug_traces` (pk: `/pk` with composite `${productId}:${sessionId}`, TTL: 7 days)
  - [x] `debug_logs` (pk: `/pk` with composite `${productId}:${sessionId}`, TTL: 3 days)
  - [x] `debug_screenshots` metadata (pk: `/sessionId`) — actual images stored in Azure Blob

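The container settings listed in 1.1.2 can be sketched as plain data. This is a minimal illustration and not the contents of `cosmos-init.ts`: `ContainerSpec`, `DIAGNOSTICS_CONTAINERS`, `ttlSecondsFor`, and `buildCompositePk` are hypothetical names. TTLs are written in seconds because Cosmos DB expresses its default time-to-live in seconds. No TTL is given for `debug_screenshots` in the roadmap, so none is set here.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not the real cosmos-init.ts.
const SECONDS_PER_DAY = 24 * 60 * 60;

interface ContainerSpec {
  id: string;
  partitionKeyPath: string;
  // Time-to-live in seconds; left undefined where the roadmap states no TTL.
  defaultTtlSeconds?: number;
}

const DIAGNOSTICS_CONTAINERS: ContainerSpec[] = [
  { id: "debug_sessions", partitionKeyPath: "/id", defaultTtlSeconds: 7 * SECONDS_PER_DAY },
  { id: "debug_traces", partitionKeyPath: "/pk", defaultTtlSeconds: 7 * SECONDS_PER_DAY },
  { id: "debug_logs", partitionKeyPath: "/pk", defaultTtlSeconds: 3 * SECONDS_PER_DAY },
  { id: "debug_screenshots", partitionKeyPath: "/sessionId" },
];

// Looks up the configured TTL for a container, or undefined if none is set.
function ttlSecondsFor(containerId: string): number | undefined {
  for (let i = 0; i < DIAGNOSTICS_CONTAINERS.length; i++) {
    const spec = DIAGNOSTICS_CONTAINERS[i];
    if (spec && spec.id === containerId) {
      return spec.defaultTtlSeconds;
    }
  }
  return undefined;
}

// Builds the composite value stored in the `pk` field of traces and logs.
function buildCompositePk(productId: string, sessionId: string): string {
  if (productId.length === 0 || sessionId.length === 0) {
    throw new Error("productId and sessionId must be non-empty");
  }
  return `${productId}:${sessionId}`;
}
```

Keeping traces and logs for one session under a single `pk` value means reads for that session can stay within one partition.
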
### 1.2 Repository Layer

- [ ] **1.2.1** Create `modules/diagnostics/repository.ts`
  - [ ] Use `@bytelyst/datastore` `getCollection()` pattern (see `telemetry/repository.ts`)
  - [ ] `createSession()` — initiate debug session, emit `diagnostics.session.created` event
  - [ ] `getSession()` — fetch session by ID (cross-partition query via `/id` pk)
  - [ ] `getSessionForIngest()` — optimized lookup for client ingest (query by `sessionId` field)
  - [ ] `updateSession()` — status changes, emit `diagnostics.session.updated` event
  - [ ] `listSessions()` — query by `productId` field with pagination
  - [ ] `deleteSession()` — manual cleanup, emit `diagnostics.session.deleted` event
  - [ ] `ingestTrace()` — batch upsert traces (use `upsert()` for idempotency)
  - [ ] `ingestLogs()` — batch upsert logs with PII scan (reuse `telemetry` PII patterns)
  - [ ] `getTraces()` — query by composite pk prefix `${productId}:${sessionId}`
  - [ ] `getLogs()` — query by composite pk with level filters
  - [ ] `updateSessionStats()` — denormalize logCount/traceCount atomically

- [x] **1.2.1** Create `modules/diagnostics/repository.ts` — [`f272a44`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/f272a44)
  - [x] Use `@bytelyst/datastore` `getCollection()` pattern (see `telemetry/repository.ts`)
  - [x] `createSession()` — initiate debug session, emit `diagnostics.session.created` event
  - [x] `getSession()` — fetch session by ID (cross-partition query via `/id` pk)
  - [x] `getSessionForIngest()` — optimized lookup for client ingest (query by `sessionId` field)
  - [x] `updateSession()` — status changes, emit `diagnostics.session.updated` event
  - [x] `listSessions()` — query by `productId` field with pagination
  - [x] `deleteSession()` — manual cleanup, emit `diagnostics.session.deleted` event
  - [x] `ingestTrace()` — batch upsert traces (use `upsert()` for idempotency)
  - [x] `ingestLogs()` — batch upsert logs with PII scan (reuse `telemetry` PII patterns)
  - [x] `getTraces()` — query by composite pk prefix `${productId}:${sessionId}`
  - [x] `getLogs()` — query by composite pk with level filters
  - [x] `updateSessionStats()` — denormalize logCount/traceCount atomically

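The idempotency note on `ingestTrace()` can be illustrated with an in-memory stand-in. The sketch below does not use `@bytelyst/datastore`, whose API is not shown in this roadmap; `TraceDoc`, `InMemoryTraceStore`, and `ingestTraces` are hypothetical names chosen for illustration. It shows why upserting by `(pk, id)` makes a retried batch harmless.

```typescript
// Hypothetical in-memory stand-in; the real repository is assumed to wrap @bytelyst/datastore.
interface TraceDoc {
  id: string; // span ID, assumed unique within a session
  pk: string; // composite `${productId}:${sessionId}`
  name: string;
  durationMs: number;
}

class InMemoryTraceStore {
  private docs = new Map<string, TraceDoc>();

  // Upsert: insert when (pk, id) is new, overwrite when it already exists.
  upsert(doc: TraceDoc): void {
    this.docs.set(doc.pk + "|" + doc.id, doc);
  }

  // Returns every doc in one partition, mirroring a single-partition query.
  queryByPk(pk: string): TraceDoc[] {
    const out: TraceDoc[] = [];
    this.docs.forEach((doc) => {
      if (doc.pk === pk) out.push(doc);
    });
    return out;
  }
}

// Upserts a batch of spans and returns how many docs the session now holds.
function ingestTraces(
  store: InMemoryTraceStore,
  productId: string,
  sessionId: string,
  spans: Array<Omit<TraceDoc, "pk">>
): number {
  const pk = `${productId}:${sessionId}`;
  spans.forEach((span) => {
    store.upsert({ id: span.id, pk: pk, name: span.name, durationMs: span.durationMs });
  });
  return store.queryByPk(pk).length;
}
```

If a client retries a batch after a timeout, the second call overwrites the same keys, so the stored count does not grow.
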
### 1.3 REST API Routes

- [ ] **1.3.1** Create `modules/diagnostics/routes.ts`
  - [ ] Apply `requireRole('admin')` for all session management routes
  - [ ] Apply rate limiting: 10 session creates per admin per hour (prevent abuse)
  - [ ] `POST /diagnostics/sessions` — create session (admin only)
    - [ ] Validate target user exists (if userId provided)
    - [ ] Validate product exists and is active
    - [ ] Emit `diagnostics.session.created` to event bus
  - [ ] `GET /diagnostics/sessions` — list sessions (admin only)
    - [ ] Query params: productId, status, userId, from, to, limit, offset
    - [ ] Default sort: createdAt desc
  - [ ] `GET /diagnostics/sessions/:id` — get session details (admin or session owner)
  - [ ] `PATCH /diagnostics/sessions/:id` — update session (admin only)
    - [ ] Validate state transitions (pending→active, active→paused, etc.)
    - [ ] Emit `diagnostics.session.updated` event
  - [ ] `DELETE /diagnostics/sessions/:id` — cancel session (admin only)
    - [ ] Soft delete (mark cancelled, don't hard delete for audit trail)
    - [ ] Emit `diagnostics.session.cancelled` event
  - [ ] `GET /diagnostics/config` — client polling endpoint (any authenticated user)
    - [ ] Return active session for this device/user if exists
    - [ ] ETag support for 304 caching (reduce bandwidth)
    - [ ] Rate limit: 1 request per 5 seconds per device
  - [ ] `POST /diagnostics/ingest` — batch trace/log ingestion (any authenticated user)
    - [ ] Validate session is active for this device
    - [ ] PII scan all log messages (reuse telemetry PII patterns)
    - [ ] Batch size limit: 50 items per request
    - [ ] Async processing for large batches (return 202 Accepted)
  - [ ] `POST /diagnostics/sessions/:id/screenshot` — upload screenshot metadata
    - [ ] Generate SAS token via existing `blob` module for direct Azure upload
    - [ ] Store metadata in `debug_screenshots` container
    - [ ] Return 201 with blob URL for client upload
  - [ ] `GET /diagnostics/sessions/:id/screenshots` — list screenshot metadata (admin)
  - [ ] `GET /diagnostics/sessions/:id/traces` — get traces with pagination
  - [ ] `GET /diagnostics/sessions/:id/logs` — get logs with level filter, search

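The state-transition check on the `PATCH` route could be expressed as a lookup table. The roadmap names only pending→active and active→paused; the remaining statuses and edges below (`completed`, `cancelled`, resuming from `paused`) are assumptions added for illustration, as are the names `SessionStatus`, `ALLOWED_TRANSITIONS`, and `canTransition`.

```typescript
// Hypothetical sketch: only pending->active and active->paused come from the roadmap.
type SessionStatus = "pending" | "active" | "paused" | "completed" | "cancelled";

// For each current status, the statuses a session may move to next.
const ALLOWED_TRANSITIONS: Record<SessionStatus, SessionStatus[]> = {
  pending: ["active", "cancelled"],
  active: ["paused", "completed", "cancelled"],
  paused: ["active", "cancelled"],
  completed: [], // terminal
  cancelled: [], // terminal
};

// Returns true when moving from `from` to `to` is permitted by the table.
function canTransition(from: SessionStatus, to: SessionStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}
```

A route handler would call `canTransition` before persisting the update and reject the request otherwise. Keeping the rules in one table makes the terminal states explicit.
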
- [x] **1.3.1** Create `modules/diagnostics/routes.ts` — [`a66a689`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/a66a689)
  - [x] Apply `requireRole('admin')` for all session management routes
  - [x] Apply rate limiting: 10 session creates per admin per hour (prevent abuse)
  - [x] `POST /diagnostics/sessions` — create session (admin only)
  - [x] `GET /diagnostics/sessions` — list sessions (admin only)
  - [x] `GET /diagnostics/sessions/:id` — get session details (admin or session owner)
  - [x] `PATCH /diagnostics/sessions/:id` — update session (admin only)
  - [x] `DELETE /diagnostics/sessions/:id` — cancel session (admin only)
  - [x] `GET /diagnostics/config` — client polling endpoint (any authenticated user)
  - [x] `POST /diagnostics/ingest` — batch trace/log ingestion (any authenticated user)
  - [x] `POST /diagnostics/sessions/:id/traces` — ingest trace spans
  - [x] `POST /diagnostics/sessions/:id/logs` — ingest log entries
  - [x] `POST /diagnostics/sessions/:id/screenshots` — get SAS URL for screenshot upload
  - [x] `GET /diagnostics/sessions/:id/traces` — query traces for session
  - [x] `GET /diagnostics/sessions/:id/logs` — query logs with filters
  - [x] `GET /diagnostics/sessions/:id/screenshots` — list screenshot metadata

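For the `GET /diagnostics/config` polling endpoint, ETag-based 304 handling might look like the sketch below. It is framework-free and makes several assumptions: the `ClientDiagnosticsConfig` shape is invented, the ETag is derived from a simple FNV-1a hash of the serialized config rather than whatever the real route uses, and `If-None-Match` is compared by plain equality (a real header may carry several tags or `*`).

```typescript
// Hypothetical sketch: config shape and hashing choice are assumptions.
interface ClientDiagnosticsConfig {
  sessionId: string | null; // null when no session is active for this device
  logLevel: string;
}

// 32-bit FNV-1a over UTF-16 code units, rendered as an 8-digit hex string.
function fnv1aHex(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Derives a quoted ETag value from the serialized config.
function computeEtag(config: ClientDiagnosticsConfig): string {
  return '"' + fnv1aHex(JSON.stringify(config)) + '"';
}

interface PollResponse {
  status: 200 | 304;
  etag: string;
  body?: ClientDiagnosticsConfig;
}

// Returns 304 with no body when the client's cached ETag still matches.
function handleConfigPoll(
  config: ClientDiagnosticsConfig,
  ifNoneMatch: string | undefined
): PollResponse {
  const etag = computeEtag(config);
  if (ifNoneMatch === etag) {
    return { status: 304, etag: etag };
  }
  return { status: 200, etag: etag, body: config };
}
```

A client that polls every few seconds sends back the last ETag it saw, so an unchanged config costs only headers.
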
### 1.4 Testing

- [ ] **1.4.1** Create `modules/diagnostics/diagnostics.test.ts`
  - [ ] Session CRUD tests (10 tests)
    - [ ] Create session with valid target user
    - [ ] Create session fails for non-existent user
    - [ ] Create session rate limiting (10/hour)
    - [ ] Get session by ID
    - [ ] List sessions with filters
    - [ ] Update session status transitions
    - [ ] Cancel session (soft delete)
    - [ ] Session not found after TTL expires
    - [ ] Unauthorized access blocked
    - [ ] Event bus emissions verified
  - [ ] Trace ingestion tests (8 tests)
    - [ ] Batch trace ingest success
    - [ ] Trace ingest with invalid session rejected
    - [ ] Duplicate trace idempotency (upsert)
    - [ ] Composite pk query by session
    - [ ] Trace timing validation
    - [ ] Parent-child span relationships
    - [ ] Trace with error status
    - [ ] Large batch rejected (>50 items)
  - [ ] Log ingestion tests (8 tests)
    - [ ] Batch log ingest success
    - [ ] Log with PII redacted (email, SSN)
    - [ ] Log level filtering
    - [ ] Invalid session rejected
    - [ ] Log search by message content
    - [ ] Log context preservation
    - [ ] Fatal log triggers alert
    - [ ] Log TTL enforcement (3 days)
  - [ ] Config polling tests (6 tests)
    - [ ] Returns active session for device
    - [ ] Returns empty when no active session
    - [ ] ETag 304 caching works
    - [ ] Rate limit enforced (5 sec)
    - [ ] Wrong device cannot access other session
    - [ ] Expired session not returned
  - [ ] Screenshot tests (6 tests)
    - [ ] SAS token generation via blob module
    - [ ] Metadata stored in Cosmos
    - [ ] Direct Azure Blob upload works
    - [ ] Screenshot metadata retrieval
    - [ ] Unauthorized access blocked
    - [ ] Blob lifecycle tied to session TTL
  - [ ] **Target:** 38+ Vitest tests (increased from 28)

- [x] **1.4.1** Create `modules/diagnostics/diagnostics.test.ts` — [`fb71981`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/fb71981)
  - [x] Session CRUD tests (6 tests implemented, 4 pending)
  - [x] Trace ingestion tests (2 tests implemented, 6 pending)
  - [x] Log ingestion tests (3 tests implemented, 5 pending)
  - [x] Schema validation tests (5 tests)
  - [ ] Config polling tests (6 tests) — PENDING Phase 1.5
  - [ ] Screenshot tests (6 tests) — PENDING Phase 1.5
  - [x] **Target:** 14+ tests implemented (38 target for full Phase 1)

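The "Log with PII redacted (email, SSN)" case in the log ingestion tests depends on a redaction step. A rough sketch of such a step follows. The regular expressions and the `redactPii` name are assumptions written for illustration; they are not the `telemetry` module's actual patterns, which this roadmap does not show, and they cover only simple email and US SSN formats.

```typescript
// Hypothetical patterns: simplified stand-ins for the telemetry PII patterns.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/g;

// Replaces each email address and SSN-shaped number with a fixed placeholder.
function redactPii(message: string): string {
  return message
    .replace(EMAIL_PATTERN, "[REDACTED_EMAIL]")
    .replace(SSN_PATTERN, "[REDACTED_SSN]");
}

// Applies redaction to every message in a batch before it is stored.
function redactBatch(messages: string[]): string[] {
  return messages.map((message) => redactPii(message));
}
```

A Vitest case for this item would pass a message containing both kinds of value through the ingest path and assert that neither raw value is present in the stored log.
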
### 1.5 Integration

- [ ] **1.5.1** Wire into `server.ts`
  - [ ] Import `diagnosticsRoutes` from `./modules/diagnostics/routes.js`
  - [ ] Register: `await app.register(diagnosticsRoutes, { prefix: '/api' })`
  - [ ] Add after telemetry routes (logical grouping)

- [x] **1.5.1** Wire into `server.ts` — [`d444a8d`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/d444a8d)
  - [x] Import `diagnosticsRoutes` from `./modules/diagnostics/routes.js`
  - [x] Register: `await app.register(diagnosticsRoutes, { prefix: '/api' })`
  - [x] Add after telemetry routes (logical grouping)
- [ ] **1.5.2** Event Bus Integration (`lib/event-bus.ts`)
  - [ ] Subscribe to `diagnostics.session.created` → Send notification to target user (email/push)
  - [ ] Subscribe to `diagnostics.session.cancelled` → Notify admin who started session

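The two subscriptions planned in 1.5.2 can be sketched against a tiny in-memory bus. Everything here is a stand-in: the API of `lib/event-bus.ts` is not shown in this roadmap, so `InMemoryEventBus`, `SessionEvent`, `registerDiagnosticsSubscribers`, and the `outbox` array are hypothetical, and notifications are collected as strings instead of being sent by email or push.

```typescript
// Hypothetical stand-in for lib/event-bus.ts; its real API may differ.
interface SessionEvent {
  sessionId: string;
  targetUserId: string;
  adminId: string;
}

type SessionEventHandler = (event: SessionEvent) => void;

class InMemoryEventBus {
  private handlers = new Map<string, SessionEventHandler[]>();

  // Registers a handler for one topic.
  subscribe(topic: string, handler: SessionEventHandler): void {
    const existing = this.handlers.get(topic) || [];
    existing.push(handler);
    this.handlers.set(topic, existing);
  }

  // Delivers the event synchronously; returns the number of handlers invoked.
  publish(topic: string, event: SessionEvent): number {
    const existing = this.handlers.get(topic) || [];
    existing.forEach((handler) => handler(event));
    return existing.length;
  }
}

// Collects outgoing notifications instead of sending email or push messages.
const outbox: string[] = [];

function registerDiagnosticsSubscribers(bus: InMemoryEventBus): void {
  bus.subscribe("diagnostics.session.created", (event) => {
    outbox.push("notify user " + event.targetUserId + " about session " + event.sessionId);
  });
  bus.subscribe("diagnostics.session.cancelled", (event) => {
    outbox.push("notify admin " + event.adminId + " that session " + event.sessionId + " was cancelled");
  });
}
```

Subscribing by topic name keeps the notification logic out of the route handlers, which only need to emit the events listed in sections 1.2 and 1.3.
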