docs(data): add migration backfill plan

Saravana Achu Mac 2026-05-05 13:48:52 -07:00
parent 5ce97f8615
commit 3b8fa29c2a
6 changed files with 114 additions and 4 deletions

@@ -137,3 +137,4 @@ Current baseline note: after common-platform workspace alignment, `pnpm install
- [`docs/RELEASE_CHECKLIST.md`](docs/RELEASE_CHECKLIST.md) — Release notes template, deploy checklist, rollback, migrations, and monitoring placeholders
- [`docs/COSMOS_DATA_OPERATIONS.md`](docs/COSMOS_DATA_OPERATIONS.md) — Cosmos containers, indexes, retention, and backup/restore approach
- [`docs/SEED_BOOTSTRAP_STRATEGY.md`](docs/SEED_BOOTSTRAP_STRATEGY.md) — Built-in prompt, intake rule, onboarding workspace, and feature-flag bootstrap strategy
- [`docs/DATA_MIGRATION_AND_BACKFILL_PLAN.md`](docs/DATA_MIGRATION_AND_BACKFILL_PLAN.md) — Encrypted-field, schema-change, and backfill migration plan

@@ -111,6 +111,7 @@ Before release:
## Related Docs
- `docs/COSMOS_QUERY_REVIEW.md`
- `docs/DATA_MIGRATION_AND_BACKFILL_PLAN.md`
- `docs/FIELD_ENCRYPTION_COVERAGE.md`
- `docs/RELEASE_CHECKLIST.md`
- `docs/IMPORT_EXPORT_READINESS.md`

@@ -0,0 +1,108 @@
# NoteLett Data Migration And Backfill Plan
Date: May 5, 2026
Product ID: `notelett`
Common platform source: `../learning_ai/learning_ai_common_plat`
## Purpose
This document defines how NoteLett runs production data migrations safely, with special attention to encrypted fields, schema changes, and long-running backfills. It complements `docs/COSMOS_DATA_OPERATIONS.md` for container layout and `docs/FIELD_ENCRYPTION_COVERAGE.md` for the encrypted-field inventory.
## Common Platform Reuse
Use the common-platform field-encryption migration helper before writing product-local migration machinery:
- Package helper: `../learning_ai/learning_ai_common_plat/packages/field-encrypt/src/migration.ts`
- Cross-product CLI: `../learning_ai/learning_ai_common_plat/scripts/encrypt-migrate.ts`
The helper `migrateDocuments()` is idempotent: it skips values already shaped like an encrypted field envelope and records scanned, encrypted, skipped, and error counts. That behavior is the baseline for NoteLett encrypted-field backfills.
The common-platform CLI currently covers only `notes.body` for NoteLett. Before any live production encryption backfill, either extend the common-platform NoteLett config to include the full field map below, or run a NoteLett-owned wrapper that calls `migrateDocuments()` from `@bytelyst/field-encrypt` with the same map.
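To make the idempotency contract concrete, here is a minimal sketch of an envelope-aware backfill loop. It is not the `migrateDocuments()` implementation and does not assume its signature: `backfillFields`, `EncryptedEnvelope`, the `ciphertext` property, and the injected `encrypt` callback are all hypothetical names used for illustration. Only the `__encrypted: true` marker and the four counters come from this plan.

```typescript
// Hypothetical envelope shape. Only the `__encrypted: true` marker is taken from this plan;
// the `ciphertext` property is an assumption for illustration.
interface EncryptedEnvelope {
  __encrypted: true;
  ciphertext: string;
}

interface BackfillCounts {
  scanned: number;
  encrypted: number;
  skipped: number;
  errors: number;
}

type Doc = Record<string, unknown>;

function isEncryptedEnvelope(value: unknown): value is EncryptedEnvelope {
  return (
    typeof value === "object" &&
    value !== null &&
    (value as { __encrypted?: unknown }).__encrypted === true
  );
}

// Counts are per field value, not per document.
// `encrypt` is injected so the sketch stays independent of any real key provider.
function backfillFields(
  docs: Doc[],
  fields: string[],
  encrypt: (plaintext: string) => EncryptedEnvelope,
  dryRun: boolean,
): BackfillCounts {
  const counts: BackfillCounts = { scanned: 0, encrypted: 0, skipped: 0, errors: 0 };
  for (const doc of docs) {
    for (const field of fields) {
      const value = doc[field];
      counts.scanned += 1;
      // Absent values and existing envelopes are skipped, which makes reruns safe.
      if (value === null || value === undefined || isEncryptedEnvelope(value)) {
        counts.skipped += 1;
        continue;
      }
      if (typeof value !== "string") {
        counts.errors += 1;
        continue;
      }
      try {
        const envelope = encrypt(value);
        // A dry run counts what would be encrypted without writing anything.
        if (!dryRun) doc[field] = envelope;
        counts.encrypted += 1;
      } catch {
        counts.errors += 1;
      }
    }
  }
  return counts;
}
```

Running the same call twice in live mode should report zero newly encrypted values on the second pass. That property is what makes an interrupted backfill safe to restart.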
## Encrypted Field Backfill Map
These are the fields currently encrypted by NoteLett repositories on create/update:
| Container | Partition key | Fields | Encryption context | User id field | Notes |
| --- | --- | --- | --- | --- | --- |
| `notes` | `/workspaceId` | `body` | `notes` | `userId` | Common CLI currently uses `createdBy`; use `userId` for consistency with repository encryption. |
| `palace_memories` | `/userId` | `content` | `palace-memory` | `userId` | Preserve embeddings and Palace metadata as plaintext query fields. |
| `note_artifacts` | `/workspaceId` | `title`, `description`, `blobPath` | `note_artifacts` | `userId` | `description` and `blobPath` can be absent; skip null/undefined. |
| `note_prompts` | `/userId` | `description`, `systemPrompt`, `userPromptTemplate` | `note_prompts` | `userId` | User templates are sensitive; built-in `__builtin__` templates are system content and may be reseeded. |
| `note_agent_actions` | `/workspaceId` | `reason`, `beforeSummary`, `afterSummary`, `reviewNote` | `note_agent_actions` | `userId` | Preserve state, actor, workflow, idempotency, and correlation fields as plaintext query metadata. |
Do not encrypt fields used for partitioning, ownership checks, filtering, sorting, idempotency, correlation, or operational routing.
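If NoteLett ends up owning a wrapper, the table above could be expressed as typed configuration with a guard for the plaintext rule. This is a sketch under assumptions: `FieldMapEntry`, `NOTELETT_FIELD_MAP`, and `findUnsafeFields` are hypothetical names, and the real common-platform config format may differ.

```typescript
// Hypothetical config shape mirroring the columns of the backfill map table.
interface FieldMapEntry {
  container: string;
  partitionKey: string; // Cosmos partition key path, e.g. "/workspaceId"
  fields: string[];
  encryptionContext: string;
  userIdField: string;
}

const NOTELETT_FIELD_MAP: FieldMapEntry[] = [
  { container: "notes", partitionKey: "/workspaceId", fields: ["body"], encryptionContext: "notes", userIdField: "userId" },
  { container: "palace_memories", partitionKey: "/userId", fields: ["content"], encryptionContext: "palace-memory", userIdField: "userId" },
  { container: "note_artifacts", partitionKey: "/workspaceId", fields: ["title", "description", "blobPath"], encryptionContext: "note_artifacts", userIdField: "userId" },
  { container: "note_prompts", partitionKey: "/userId", fields: ["description", "systemPrompt", "userPromptTemplate"], encryptionContext: "note_prompts", userIdField: "userId" },
  { container: "note_agent_actions", partitionKey: "/workspaceId", fields: ["reason", "beforeSummary", "afterSummary", "reviewNote"], encryptionContext: "note_agent_actions", userIdField: "userId" },
];

// Returns "container.field" for every mapped field that collides with a partition key,
// the owner field, or a caller-supplied list of fields that must stay plaintext.
function findUnsafeFields(entries: FieldMapEntry[], plaintextOnly: string[]): string[] {
  const problems: string[] = [];
  for (const entry of entries) {
    const reserved = new Set<string>([
      entry.partitionKey.replace(/^\//, ""),
      entry.userIdField,
      ...plaintextOnly,
    ]);
    for (const field of entry.fields) {
      if (reserved.has(field)) problems.push(`${entry.container}.${field}`);
    }
  }
  return problems;
}
```

A check like `findUnsafeFields(NOTELETT_FIELD_MAP, ["id", "productId", "createdAt"])` could run in a unit test so that a future map change cannot quietly encrypt a routing or ownership field.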
## Encryption Migration Procedure
1. Confirm `FIELD_ENCRYPT_ENABLED=true` and production key configuration are valid.
2. Run a dry run per container and field, recording scanned/encrypted/skipped/error counts.
3. Confirm a recent Cosmos backup or restore point exists.
4. Pause or feature-flag high-write workflows for the target container if the backfill will take longer than a short maintenance window.
5. Run one container at a time with bounded batch size, starting with the lowest-risk container.
6. Stop immediately if any field reports unexpected errors, owner mismatches, or a large count of documents missing required partition keys.
7. Sample raw Cosmos documents for `__encrypted: true` and then read the same records through NoteLett APIs to verify plaintext responses.
8. Record migration id, commit hash, environment, key provider, dry-run/live counts, operator, start/end time, and rollback decision in the release record.
Suggested dry-run command when using the common-platform CLI:
```bash
cd ../learning_ai/learning_ai_common_plat
zsh -lc 'source ~/.zshrc; export GITEA_NPM_TOKEN; COSMOS_DATABASE=notelett-prod FIELD_ENCRYPT_KEY_PROVIDER=akv npx tsx scripts/encrypt-migrate.ts --product notelett --dry-run --verbose'
```
Only run the live command after the NoteLett field map has full coverage:
```bash
cd ../learning_ai/learning_ai_common_plat
zsh -lc 'source ~/.zshrc; export GITEA_NPM_TOKEN; COSMOS_DATABASE=notelett-prod FIELD_ENCRYPT_KEY_PROVIDER=akv npx tsx scripts/encrypt-migrate.ts --product notelett --batch-size 50 --verbose'
```
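Steps 6 and 8 of the procedure can be made mechanical. The sketch below shows one possible stop check and release-record shape. The thresholds in `StopPolicy` are assumptions (the procedure only says "unexpected" and "large"), and `RunCounts`, `stopReason`, and `MigrationRecord` are hypothetical names rather than anything exported by the common platform.

```typescript
interface RunCounts {
  scanned: number;
  encrypted: number;
  skipped: number;
  errors: number;
  ownerMismatches: number;
  missingPartitionKey: number;
}

// Illustrative thresholds; tune per container before a live run.
interface StopPolicy {
  maxErrors: number;
  maxMissingPartitionKeyRatio: number;
}

// Returns a human-readable reason to halt, or null when the run may continue.
function stopReason(counts: RunCounts, policy: StopPolicy): string | null {
  if (counts.errors > policy.maxErrors) return "unexpected errors";
  if (counts.ownerMismatches > 0) return "owner mismatches";
  const missingRatio = counts.scanned === 0 ? 0 : counts.missingPartitionKey / counts.scanned;
  if (missingRatio > policy.maxMissingPartitionKeyRatio) return "documents missing partition keys";
  return null;
}

// Fields listed in step 8, captured as one record per migration.
interface MigrationRecord {
  migrationId: string;
  commitHash: string;
  environment: string;
  keyProvider: string;
  dryRunCounts: RunCounts;
  liveCounts: RunCounts | null; // null until the live run completes
  operator: string;
  startedAt: string; // ISO 8601 timestamp
  endedAt: string | null;
  rollbackDecision: string;
}
```

Evaluating `stopReason` after every batch keeps the halt decision out of operator judgment during a long run, and the record type gives the release notes a fixed shape to fill in.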
## Schema Change Plan
Every schema change needs a migration note before deployment:
- additive optional field: deploy code first, backfill only if UX or query behavior requires it
- additive required field: deploy code that writes and tolerates missing field, backfill, then tighten validation in a later release
- enum expansion: deploy readers before writers; keep unknown-safe UI fallbacks
- field rename: dual-read and dual-write, backfill, verify, then remove old field in a later release
- partition key change: create a new container/projection and copy data; do not mutate an existing Cosmos container partition key in place
- ownership/scope change: require explicit auth tests and product/user/workspace sampling
- encrypted-field addition: use the encryption procedure above and preserve plaintext reads until backfill is complete
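As an illustration of the field-rename rule, the following sketch shows dual-read, dual-write, and a per-document backfill step. The document type and the field names `label` (old) and `displayName` (new) are hypothetical; no such rename is planned in this document.

```typescript
// Hypothetical document that is mid-rename from `label` to `displayName`.
interface RenamingDoc {
  id: string;
  label?: string; // old field, removed in a later release
  displayName?: string; // new field
}

// Dual-read: prefer the new field, fall back to the old one.
function readDisplayName(doc: RenamingDoc): string | undefined {
  return doc.displayName ?? doc.label;
}

// Dual-write: keep both fields in sync so older readers keep working.
function writeDisplayName(doc: RenamingDoc, value: string): RenamingDoc {
  return { ...doc, label: value, displayName: value };
}

// Backfill step: copy the old value only when the new field is still missing,
// so rerunning it never overwrites a value written by new code.
function backfillDisplayName(doc: RenamingDoc): RenamingDoc {
  if (doc.displayName !== undefined || doc.label === undefined) return doc;
  return { ...doc, displayName: doc.label };
}
```

Once a validation query shows no documents with `label` set and `displayName` missing, a later release can drop the fallback in `readDisplayName` and stop writing `label`.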
## Backfill Operating Rules
- Backfills must be idempotent and restartable.
- Backfills must filter by `productId: "notelett"` wherever the query shape allows.
- Backfills must preserve `id`, partition key, `productId`, owner fields, `createdAt`, and original source timestamps.
- Backfills should write `updatedAt` only when the application semantics require it; operational encryption backfills should avoid changing user-visible timestamps.
- Long-running jobs need progress logs at least every batch and a clear stop criterion.
- Cross-container backfills need a manifest of source container, target container, count, and validation query.
- Any migration that could affect user-visible reads needs a rollback note before live execution.
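The rules above can be sketched as a restartable batch runner. This is an in-memory illustration, not a Cosmos client: `StoredDoc`, `Checkpoint`, and `runBatch` are hypothetical names, and a real job would page through a container query and persist the checkpoint between runs.

```typescript
// Hypothetical document shape; only the protected field names come from the rules above.
interface StoredDoc {
  id: string;
  productId: string;
  createdAt: string;
  updatedAt: string;
  [key: string]: unknown;
}

interface Checkpoint {
  lastId: string | null; // id of the last processed document, null before the first batch
  processed: number;
}

const PROTECTED_FIELDS = ["id", "productId", "createdAt", "updatedAt"] as const;

// Processes at most `batchSize` NoteLett documents after the checkpoint, in id order.
function runBatch(
  docs: StoredDoc[],
  checkpoint: Checkpoint,
  batchSize: number,
  transform: (doc: StoredDoc) => void,
  log: (message: string) => void,
): Checkpoint {
  const lastId = checkpoint.lastId;
  const pending = docs
    .filter((doc) => doc.productId === "notelett")
    .filter((doc) => lastId === null || doc.id > lastId)
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0))
    .slice(0, batchSize);

  let newLastId = lastId;
  for (const doc of pending) {
    const before = PROTECTED_FIELDS.map((field) => doc[field]);
    transform(doc);
    // Fail fast if the transform touched identity, ownership, or timestamp fields.
    PROTECTED_FIELDS.forEach((field, index) => {
      if (doc[field] !== before[index]) {
        throw new Error(`backfill changed protected field ${field}`);
      }
    });
    newLastId = doc.id;
  }

  const next: Checkpoint = { lastId: newLastId, processed: checkpoint.processed + pending.length };
  log(`batch complete: processed=${next.processed} lastId=${String(next.lastId)}`);
  return next;
}
```

The stop criterion in this sketch is a batch that leaves `processed` unchanged. Resuming after a crash only needs the last logged checkpoint, provided the transform itself is idempotent.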
## Rollback And Recovery
Encrypted-field backfills are usually forward-only: reads handle both encrypted and plaintext values, and the production key should remain available after deployment, so encrypted data rarely needs to be reverted. If rollback is required:
1. Keep code that can read encrypted and plaintext values deployed.
2. Prefer rolling back the application image while leaving encrypted data intact.
3. If data restore is required, restore into a new Cosmos account/database first and validate scoped records before copying back.
4. Do not decrypt and rewrite sensitive fields to plaintext unless explicitly approved as an incident recovery action.
Schema/backfill rollback should name the previous image tag, affected containers, restore point, validation queries, and any feature flags to disable while recovery runs.
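Rollback step 1 depends on a read path that accepts both shapes. A minimal sketch of that behavior follows, assuming an envelope with a `ciphertext` property and an injected `decrypt` callback. Both are hypothetical stand-ins for whatever `@bytelyst/field-encrypt` actually provides; only the `__encrypted: true` marker is taken from this plan.

```typescript
// Hypothetical stored shapes: legacy plaintext, an encrypted envelope, or an absent value.
type StoredValue = string | { __encrypted: true; ciphertext: string } | null | undefined;

// Returns plaintext for either shape so the same code works before, during,
// and after a backfill.
function readField(
  value: StoredValue,
  decrypt: (ciphertext: string) => string,
): string | undefined {
  if (value === null || value === undefined) return undefined;
  if (typeof value === "string") return value; // legacy plaintext passes through unchanged
  return decrypt(value.ciphertext);
}
```

Because this reader never requires a particular shape, rolling back the application image to any version that contains it leaves partially backfilled containers readable.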
## Verification Checklist
For migration-plan changes:
```bash
git diff --check
rg -n "DATA_MIGRATION_AND_BACKFILL_PLAN|migrateDocuments|encrypt-migrate|Encrypted Field Backfill Map" docs README.md
```
For live migration readiness:
```bash
zsh -lc 'source ~/.zshrc; export GITEA_NPM_TOKEN; pnpm --filter @notelett/backend run typecheck'
zsh -lc 'source ~/.zshrc; export GITEA_NPM_TOKEN; pnpm --filter @notelett/backend exec vitest run src/lib/encryption-coverage.test.ts src/modules/note-artifacts/routes.integration.test.ts src/modules/note-prompts/note-prompts.test.ts src/modules/note-agent-actions/routes.integration.test.ts'
```

@@ -18,7 +18,7 @@ NoteLett uses `@bytelyst/field-encrypt` through `backend/src/lib/field-encrypt.t
This change is backward-compatible for existing plaintext documents: repository reads decrypt only values that match the shared `EncryptedField` envelope and pass plaintext through unchanged. Newly created and updated documents write encrypted values for the fields above.
-Recommended production migration before launch:
+Recommended production migration before launch is defined in `docs/DATA_MIGRATION_AND_BACKFILL_PLAN.md`. At a high level:
1. Run a one-time backfill using `@bytelyst/field-encrypt` `migrateDocuments` or an equivalent repository-level script for the fields listed above.
2. Process one container at a time, partition-aware: `notes` and artifact/action containers by `workspaceId`, prompt templates by `userId`.

@@ -111,7 +111,7 @@ Do not place secrets in `NEXT_PUBLIC_*` or `EXPO_PUBLIC_*` variables.
- Run `pnpm run seed:bootstrap` with production backend environment; see `docs/SEED_BOOTSTRAP_STRATEGY.md`.
- Built-in prompt templates and intake rules are seeded idempotently.
- Default workspace/bootstrap behavior is per-user and deterministic.
-- Cosmos schema changes are backward compatible or have a documented backfill.
+- Cosmos schema changes are backward compatible or have a documented backfill in `docs/DATA_MIGRATION_AND_BACKFILL_PLAN.md`.
- Encrypted-field migrations have a dry-run and rollback note.
- Long-running backfills have owner, estimated duration, progress logs, and stop criteria.
- No migration relies on client-side public environment variables for secrets.

@@ -27,8 +27,8 @@ COSMOS_ENDPOINT=... \
COSMOS_KEY=... \
COSMOS_DATABASE=notelett-prod \
JWT_SECRET=... \
-FIELD_ENCRYPTION_ENABLED=true \
-FIELD_ENCRYPTION_KEY_PROVIDER=azure-key-vault \
+FIELD_ENCRYPT_ENABLED=true \
+FIELD_ENCRYPT_KEY_PROVIDER=akv \
pnpm run seed:bootstrap
``` ```