docs(data): add cosmos operations runbook

Saravana Achu Mac 2026-05-05 13:38:58 -07:00
parent 33ea533d8b
commit d897702d9e
3 changed files with 119 additions and 0 deletions

@@ -135,3 +135,4 @@ Current baseline note: after common-platform workspace alignment, `pnpm install
- [`docs/PLATFORM_SMOKE_CHECKS.md`](docs/PLATFORM_SMOKE_CHECKS.md) — Shared platform and NoteLett smoke commands
- [`docs/MOBILE_PRODUCTION_BUILD_AND_SMOKE.md`](docs/MOBILE_PRODUCTION_BUILD_AND_SMOKE.md) — Expo build notes and iOS/Android smoke checklist
- [`docs/RELEASE_CHECKLIST.md`](docs/RELEASE_CHECKLIST.md) — Release notes template, deploy checklist, rollback, migrations, and monitoring placeholders
- [`docs/COSMOS_DATA_OPERATIONS.md`](docs/COSMOS_DATA_OPERATIONS.md) — Cosmos containers, indexes, retention, and backup/restore approach

@@ -0,0 +1,116 @@
# NoteLett Cosmos Data Operations
Date: May 5, 2026
Product ID: `notelett`
Source of truth: `backend/src/lib/cosmos-init.ts`
## Scope
This document records the production data layout, index expectations, retention policy, and backup/restore approach for NoteLett Cosmos DB. It complements `docs/COSMOS_QUERY_REVIEW.md`, which focuses on query shape and scope isolation.
## Container Inventory
Every production document must include `productId: "notelett"` and the appropriate user/workspace ownership field. The backend registers these containers through the common-platform `@bytelyst/cosmos` helpers.
| Container | Partition key | Primary owner/scope | Main documents |
| --- | --- | --- | --- |
| `notes` | `/workspaceId` | user + workspace | Note body/status/tags/links/embedding metadata |
| `workspaces` | `/userId` | user | Workspace metadata |
| `note_relationships` | `/workspaceId` | user + workspace | Typed note links |
| `note_tasks` | `/workspaceId` | user + workspace | Extracted or user-created tasks |
| `note_artifacts` | `/workspaceId` | user + workspace | Blob/artifact metadata |
| `note_agent_actions` | `/workspaceId` | user + workspace | MCP/agent audit and review records |
| `saved_views` | `/userId` | user | Saved search/filter views |
| `note_prompts` | `/userId` | user or `__builtin__` | Prompt templates |
| `note_prompt_schedules` | `/userId` | user | Scheduled prompt actions |
| `note_prompt_webhooks` | `/userId` | user | Prompt webhook triggers |
| `note_shares` | `/workspaceId` | user + workspace | Expiring public share links |
| `note_versions` | `/workspaceId` | user + workspace | Note version history |
| `note_intake_rules` | `/userId` | user | URL intake routing rules |
| `note_intake_jobs` | `/userId` | user | Intake job lifecycle state |
| `note_collaborators` | `/sharedWithUserId` | shared-with user | Direct note collaborators |
| `palace_wings` | `/userId` | user | Palace wing taxonomy |
| `palace_rooms` | `/userId` | user | Palace rooms |
| `palace_memories` | `/userId` | user | Palace memories |
| `palace_tunnels` | `/userId` | user | Palace tunnels |
| `palace_kg` | `/userId` | user | Knowledge graph triples |
| `palace_diaries` | `/userId` | user | Palace diary entries |
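
The inventory above can be kept honest with a small drift check. The following is a minimal sketch, assuming a hypothetical `ContainerSpec` shape and a hypothetical `diffContainerSpecs` helper; the real registration in `backend/src/lib/cosmos-init.ts` goes through the `@bytelyst/cosmos` helpers and may be structured differently.

```typescript
// Hypothetical shape for illustration; not the actual cosmos-init.ts types.
interface ContainerSpec {
  id: string;
  partitionKey: string;
}

// A few rows copied from the inventory table (not the full list).
const documentedContainers: ContainerSpec[] = [
  { id: "notes", partitionKey: "/workspaceId" },
  { id: "workspaces", partitionKey: "/userId" },
  { id: "note_collaborators", partitionKey: "/sharedWithUserId" },
];

// Returns one human-readable line per difference between the two lists.
function diffContainerSpecs(
  documented: ContainerSpec[],
  registered: ContainerSpec[],
): string[] {
  const problems: string[] = [];
  const registeredById = new Map<string, string>();
  registered.forEach((c) => registeredById.set(c.id, c.partitionKey));

  for (const doc of documented) {
    const actual = registeredById.get(doc.id);
    if (actual === undefined) {
      problems.push(`${doc.id}: documented but not registered`);
    } else if (actual !== doc.partitionKey) {
      problems.push(`${doc.id}: documented ${doc.partitionKey}, registered ${actual}`);
    }
    registeredById.delete(doc.id);
  }

  // Whatever is left was registered in code but never documented.
  registeredById.forEach((_partitionKey, id) => {
    problems.push(`${id}: registered but not documented`);
  });
  return problems;
}
```

An empty result means the two lists agree on both container ids and partition keys.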
## Index Expectations
The current registration code provides partition keys only. Unless the production account overrides indexing policies out-of-band, Cosmos default automatic indexing applies.
Keep these expectations:
- Point reads and workspace- or user-scoped list reads should rely on the partition key plus `id`, `workspaceId`, `userId`, and `productId`.
- Common sort and filter fields such as `updatedAt`, `createdAt`, `state`, `status`, `priority`, and `expiresAt` should remain indexed.
- Do not exclude encrypted text fields from indexing until query coverage proves they are never filtered or sorted. Encrypted values are not useful for semantic search, but repository code still needs predictable reads during migration periods.
- Public share token lookup is intentionally bounded but cross-partition; if volume grows, add a user- or token-partitioned projection before relaxing query limits.
- Global dashboard/search reads over `notes` are acceptable for release 1 but should move to a user-partitioned search projection or external search service if high traffic appears.
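
For reference, the default policy that Cosmos applies to a new container indexes every path and excludes only `_etag`. The sketch below writes that policy as a TypeScript object and adds a hypothetical `findExcludedFields` lint that flags any expected field a custom policy excludes. It only matches top-level exclusions of the form `/field/?` or `/field/*`, so treat it as a simplified check and not a full implementation of Cosmos path precedence.

```typescript
// Local type for illustration; it mirrors the JSON shape of a Cosmos indexing policy.
interface IndexingPolicyLike {
  indexingMode: "consistent" | "none";
  automatic: boolean;
  includedPaths: { path: string }[];
  excludedPaths: { path: string }[];
}

// The Cosmos default: index everything except the system _etag property.
const defaultPolicy: IndexingPolicyLike = {
  indexingMode: "consistent",
  automatic: true,
  includedPaths: [{ path: "/*" }],
  excludedPaths: [{ path: '/"_etag"/?' }],
};

// Fields this document expects to stay indexed.
const expectedIndexedFields: string[] = [
  "workspaceId", "userId", "productId",
  "updatedAt", "createdAt", "state", "status", "priority", "expiresAt",
];

// Simplified lint: reports expected fields that a policy explicitly excludes.
function findExcludedFields(policy: IndexingPolicyLike, fields: string[]): string[] {
  if (policy.indexingMode === "none") {
    return fields.slice(); // nothing is indexed at all
  }
  const excluded = new Set(policy.excludedPaths.map((p) => p.path));
  return fields.filter((f) => excluded.has(`/${f}/?`) || excluded.has(`/${f}/*`));
}
```

Running the lint against the policy exported from the production account would surface any out-of-band override that conflicts with the expectations listed above.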
Recommended future explicit composite indexes, if query diagnostics show RU pressure:
| Container | Candidate composite index | Reason |
| --- | --- | --- |
| `notes` | `/productId`, `/userId`, `/workspaceId`, `/updatedAt DESC` | Workspace note lists and exports |
| `note_agent_actions` | `/productId`, `/userId`, `/state`, `/updatedAt DESC` | Approval queue |
| `note_intake_jobs` | `/productId`, `/userId`, `/status`, `/startedAt DESC` | Active intake polling |
| `note_versions` | `/productId`, `/workspaceId`, `/noteId`, `/createdAt DESC` | Version history |
| `note_shares` | `/productId`, `/shareToken` | Public share token lookup |
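
If one of these candidates is adopted, it has to be expressed in the container's indexing policy as an ordered list of paths, each with an explicit sort order. The sketch below converts a table row into that shape; `toCompositeIndex` is a hypothetical helper, and the `CompositePath` type is declared locally to mirror the `path`/`order` fields of the Cosmos indexing policy JSON.

```typescript
// Mirrors one entry of a Cosmos composite index definition.
type CompositePath = { path: string; order: "ascending" | "descending" };

// Converts table-style columns such as "/updatedAt DESC" into composite paths.
// A column without a direction is treated as ascending.
function toCompositeIndex(columns: string[]): CompositePath[] {
  return columns.map((column): CompositePath => {
    const parts = column.trim().split(/\s+/);
    const path = parts[0] ?? "";
    const direction = (parts[1] ?? "ASC").toUpperCase();
    return { path, order: direction === "DESC" ? "descending" : "ascending" };
  });
}

// The `notes` candidate from the table: filter paths first, sort path last.
const notesCompositeIndex = toCompositeIndex([
  "/productId",
  "/userId",
  "/workspaceId",
  "/updatedAt DESC",
]);

// Fragment that would be merged into the container's indexing policy.
const notesPolicyFragment = { compositeIndexes: [notesCompositeIndex] };
```

Path order matters: the sequence here keeps the equality-filtered fields ahead of the sort field, as in the table.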
## Retention Expectations
No container TTL is currently configured in code. Until TTL is explicitly added, retention is application-managed.
| Data class | Current release expectation | Future retention candidate |
| --- | --- | --- |
| Notes, workspaces, relationships, tasks, artifacts | Retain until user delete/archive behavior says otherwise | User export/delete policy |
| Agent actions | Retain for audit trail through release 1 | 180-365 day TTL after compliance review |
| Note versions | Retain while version history is user-visible | Cap by count or age per note |
| Public shares | App-level expiry via `expiresAt`; revoked/expired shares should not grant access | TTL after expiry + grace period |
| Intake jobs | Retain while user can inspect processing state | 30-90 day TTL |
| Prompt schedules/webhooks/templates | Retain until user deletion | None by default |
| Palace data | Retain as user-owned knowledge data | User delete/export policy |
Before enabling TTL, confirm export, audit, and rollback requirements for that data class.
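
When TTL is eventually introduced, note that Cosmos expresses it in seconds counted from an item's last-modified time, and that a per-item `ttl` only takes effect once the container has a `defaultTtl` set (`-1` enables per-item TTL without expiring items by default). The sketch below shows the arithmetic for two candidates from the table; `shareTtlSeconds` and the 7-day grace period are illustrative assumptions, not values from the codebase.

```typescript
const SECONDS_PER_DAY = 86400;

// Agent actions: lower bound of the 180-365 day candidate, as a container defaultTtl.
const agentActionTtlSeconds = 180 * SECONDS_PER_DAY;

// Public shares: per-item ttl so the item is removed `graceDays` after `expiresAt`.
// Cosmos counts ttl from the item's last-modified time, so that is the baseline.
function shareTtlSeconds(expiresAt: Date, lastModified: Date, graceDays: number): number {
  const deleteAtMs = expiresAt.getTime() + graceDays * SECONDS_PER_DAY * 1000;
  const seconds = Math.ceil((deleteAtMs - lastModified.getTime()) / 1000);
  // ttl must be a positive integer; a share already past its grace period expires as soon as possible.
  return Math.max(1, seconds);
}

// Example: share last written on 25 May, expiring 1 June, with a hypothetical 7-day grace.
const exampleTtl = shareTtlSeconds(
  new Date("2026-06-01T00:00:00Z"),
  new Date("2026-05-25T00:00:00Z"),
  7,
);
```

Because any write resets the last-modified time, a share that is updated after creation would need its `ttl` recomputed on that write.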
## Backup Approach
Production Cosmos accounts should use Azure-native backup configured outside this repo:
- Prefer continuous backup with point-in-time restore for production if cost and region support allow it.
- At minimum, enable periodic backup with a retention window that satisfies product and compliance needs.
- Record account name, database name, backup mode, retention window, and restore permissions in the environment release record.
- Keep Cosmos keys in Key Vault or the deployment secret manager; never commit keys.
Application-level export is not a database backup. `GET /api/notes/export` is user-scoped JSON/Markdown portability and does not include all operational records, versions, shares, collaborators, intake jobs, or Palace data.
## Restore Approach
Use this restore sequence for incidents:
1. Identify affected account, database, containers, product id, user ids, workspace ids, and time window.
2. Prefer point-in-time restore into a new Cosmos account/database, not over the active database.
3. Validate restored data with read-only queries scoped by `productId`, `userId`, and `workspaceId`.
4. Decide whether to cut traffic over to the restored database or run a scoped backfill into production.
5. For scoped backfills, preserve original ids, partition keys, `productId`, ownership fields, timestamps, and encryption state.
6. Run backend readiness and a representative authenticated note/workspace flow after restore.
7. Record restored containers, restore timestamp, operator, validation queries, and follow-up migrations.
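
Steps 3 and 5 can be sketched as follows. This is a minimal illustration, assuming a hypothetical `ScopedDoc` shape with ISO-string timestamps; real documents carry more fields, and encryption state is not modeled here. The query object uses the `query`/`parameters` shape of a parameterized Cosmos SQL query.

```typescript
// Hypothetical minimal document shape for illustration.
interface ScopedDoc {
  id: string;
  productId: string;
  userId: string;
  workspaceId: string;
  createdAt: string;
  updatedAt: string;
}

// Step 3: a read-only, parameterized query scoped by product, user, and workspace.
function scopedValidationQuery(productId: string, userId: string, workspaceId: string) {
  return {
    query:
      "SELECT c.id, c.updatedAt FROM c " +
      "WHERE c.productId = @productId AND c.userId = @userId AND c.workspaceId = @workspaceId",
    parameters: [
      { name: "@productId", value: productId },
      { name: "@userId", value: userId },
      { name: "@workspaceId", value: workspaceId },
    ],
  };
}

// Step 5: fields a scoped backfill must carry over unchanged.
const PRESERVED_FIELDS = [
  "id", "productId", "userId", "workspaceId", "createdAt", "updatedAt",
] as const;

// Returns the preserved fields whose values differ between the restored
// document and the document about to be written back to production.
function changedPreservedFields(restored: ScopedDoc, candidate: ScopedDoc): string[] {
  return PRESERVED_FIELDS.filter((field) => restored[field] !== candidate[field]);
}
```

A non-empty result from `changedPreservedFields` would be a reason to stop the backfill before writing that document.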
## Operational Checks
Before release:
- `backend/src/lib/cosmos-init.ts` container list matches this document.
- Production config rejects `DB_PROVIDER=memory`.
- Field encryption is enabled with a production key provider.
- Cosmos backup mode and retention window are recorded in release notes.
- Query-risk follow-ups from `docs/COSMOS_QUERY_REVIEW.md` have owners.
- Any migration/backfill has a dry-run, stop criteria, and rollback note.
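
The `DB_PROVIDER=memory` check can be expressed as a startup guard. The sketch below is an assumption about how such a guard might look: `assertProductionDataConfig` is a hypothetical name, and using `NODE_ENV=production` to detect production is also an assumption; the backend's actual config validation may differ.

```typescript
type Env = Record<string, string | undefined>;

// Throws if a production environment is configured with the in-memory provider.
// Non-production environments are left alone so local development can use it.
function assertProductionDataConfig(env: Env): void {
  if (env["NODE_ENV"] !== "production") {
    return;
  }
  const provider = (env["DB_PROVIDER"] ?? "").toLowerCase();
  if (provider === "memory") {
    throw new Error("DB_PROVIDER=memory is not allowed in production");
  }
}
```

In a Node backend this would typically be called once at startup with `process.env`, before any container registration runs.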
## Related Docs
- `docs/COSMOS_QUERY_REVIEW.md`
- `docs/FIELD_ENCRYPTION_COVERAGE.md`
- `docs/RELEASE_CHECKLIST.md`
- `docs/IMPORT_EXPORT_READINESS.md`

@@ -4,6 +4,8 @@ Date: 2026-05-05
This review covers current NoteLett repository access patterns against the Cosmos container registrations in `backend/src/lib/cosmos-init.ts`. All documents retain `productId: "notelett"` and all list routes include product/user/workspace scope filters before returning user data.
For the broader container inventory, index expectations, retention policy, and backup/restore approach, see `docs/COSMOS_DATA_OPERATIONS.md`.
## Container Partitions
| Container | Partition key | Primary access pattern |