docs(roadmap): record P9.5 operator runbook

This commit is contained in:
Saravana Achu Mac 2026-05-05 13:54:27 -07:00
parent 57a7e10bc9
commit e9f045571b

View File

@ -238,7 +238,7 @@ Goal: production data can be created, migrated, backed up, and debugged safely.
- [x] **P9.2** Add seed/bootstrap strategy for built-in prompt templates, intake rules, default workspace, and feature flags. Commit: `efcc5fa`; Verified: `zsh -lc 'source ~/.zshrc; export GITEA_NPM_TOKEN; pnpm --filter @notelett/backend run typecheck'`; `zsh -lc 'source ~/.zshrc; export GITEA_NPM_TOKEN; pnpm --filter @notelett/backend exec vitest run src/lib/product-config.test.ts src/modules/note-prompts/note-prompts.test.ts src/modules/intake/routes.test.ts src/modules/workspaces/routes.integration.test.ts --reporter=verbose'`; `zsh -lc 'source ~/.zshrc; export GITEA_NPM_TOKEN; DB_PROVIDER=memory pnpm run seed:bootstrap'`; `git diff --check`. Added `pnpm run seed:bootstrap`, a backend bootstrap command that persists deterministic built-in prompt templates and intake rules through shared datastore/Cosmos initialization, documented production seed ordering, per-user onboarding workspace behavior, and feature-flag ownership in `docs/SEED_BOOTSTRAP_STRATEGY.md`, linked the strategy from README/release docs, and fixed backend product identity loading so sourced local env still seeds under `productId: "notelett"`.
- [x] **P9.3** Add migration plan for encrypted fields, schema changes, and backfills; reuse common platform migration scripts where practical. Commit: `3b8fa29`; Verified: `git diff --check`; `rg -n "DATA_MIGRATION_AND_BACKFILL_PLAN|migrateDocuments|encrypt-migrate|Encrypted Field Backfill Map|FIELD_ENCRYPT_ENABLED|FIELD_ENCRYPT_KEY_PROVIDER=akv" docs README.md`. Added `docs/DATA_MIGRATION_AND_BACKFILL_PLAN.md` with the NoteLett encrypted-field backfill map, common-platform `@bytelyst/field-encrypt` `migrateDocuments()` reuse guidance, explicit note that the shared `encrypt-migrate.ts` NoteLett config currently covers only `notes.body`, dry-run/live command shape, schema-change sequencing, idempotent backfill operating rules, and rollback guidance; linked it from README, Cosmos operations, field-encryption coverage, and release docs, and corrected seed bootstrap env names to the actual `FIELD_ENCRYPT_*` settings.
- [x] **P9.4** Define telemetry event taxonomy and diagnostic breadcrumbs for notes, prompts, intake, reviews, mobile capture, and MCP actions. Commit: `40b62bf`; Verified: `git diff --check`; `rg -n "TELEMETRY_AND_DIAGNOSTICS_TAXONOMY|Backend Event Taxonomy|Prompt And AI Event Taxonomy|Intake Event Taxonomy|Reviews And MCP Event Taxonomy|Mobile Capture Event Taxonomy|Diagnostic Breadcrumbs|@bytelyst/diagnostics-client|ECOSYSTEM_EVENT_TAXONOMY" docs README.md`. Added `docs/TELEMETRY_AND_DIAGNOSTICS_TAXONOMY.md` covering event naming rules, required metadata, backend note/workspace/sharing/collaboration events, prompt/AI events, intake lifecycle events, review and MCP action events, mobile capture/offline telemetry, and safe diagnostic breadcrumb categories aligned with common-platform telemetry and diagnostics docs; linked it from README and the release checklist.
- [ ] **P9.5** Add operator runbook: incident triage, dependency outage behavior, stuck scheduler/webhook recovery, failed blob upload recovery, failed LLM/extraction recovery. Commit:
- [x] **P9.5** Add operator runbook: incident triage, dependency outage behavior, stuck scheduler/webhook recovery, failed blob upload recovery, failed LLM/extraction recovery. Commit: `57a7e10`; Verified: `git diff --check`; `rg -n "OPERATOR_RUNBOOK|Stuck Scheduler Recovery|Failed Blob Upload Recovery|Failed LLM Or Extraction Recovery|MCP Action Recovery|incident triage|scheduler/webhooks" docs README.md`. Added `docs/OPERATOR_RUNBOOK.md` with first-five-minute triage, dependency outage behavior, feature-flag mitigations, stuck scheduler recovery, webhook recovery, failed blob upload recovery, failed LLM/extraction recovery, review queue recovery, MCP action recovery, communication/closeout rules, and verification commands; linked it from README and the release checklist.
Acceptance criteria:
- Operators know what data exists, how it is protected, and how to recover.