# NoteLett Data Migration And Backfill Plan

Date: May 5, 2026

Product ID: `notelett`

Common platform source: `../learning_ai/learning_ai_common_plat`

## Purpose

This document defines how NoteLett runs production data migrations safely, with special attention to encrypted fields, schema changes, and long-running backfills. It complements `docs/COSMOS_DATA_OPERATIONS.md` for container layout and `docs/FIELD_ENCRYPTION_COVERAGE.md` for the encrypted-field inventory.

## Common Platform Reuse

Use the common-platform field-encryption migration helper before writing product-local migration machinery:

- Package helper: `../learning_ai/learning_ai_common_plat/packages/field-encrypt/src/migration.ts`
- Cross-product CLI: `../learning_ai/learning_ai_common_plat/scripts/encrypt-migrate.ts`

The helper `migrateDocuments()` is idempotent: it skips values already shaped like an encrypted field envelope and records scanned, encrypted, skipped, and error counts. That behavior is the baseline for NoteLett encrypted-field backfills.

Current common-platform CLI coverage for NoteLett includes only `notes.body`. Before any live production encryption backfill, either extend the common-platform NoteLett config to include the full field map below or run a NoteLett-owned wrapper that calls `@bytelyst/field-encrypt` `migrateDocuments()` with the same map.

## Encrypted Field Backfill Map

These are the fields currently encrypted by NoteLett repositories on create/update:

| Container | Partition key | Fields | Encryption context | User id field | Notes |
| --- | --- | --- | --- | --- | --- |
| `notes` | `/workspaceId` | `body` | `notes` | `userId` | Common CLI currently uses `createdBy`; use `userId` for consistency with repository encryption. |
| `palace_memories` | `/userId` | `content` | `palace-memory` | `userId` | Preserve embeddings and Palace metadata as plaintext query fields. |
| `note_artifacts` | `/workspaceId` | `title`, `description`, `blobPath` | `note_artifacts` | `userId` | `description` and `blobPath` can be absent; skip null/undefined. |
| `note_prompts` | `/userId` | `description`, `systemPrompt`, `userPromptTemplate` | `note_prompts` | `userId` | User templates are sensitive; built-in `__builtin__` templates are system content and may be reseeded. |
| `note_agent_actions` | `/workspaceId` | `reason`, `beforeSummary`, `afterSummary`, `reviewNote` | `note_agent_actions` | `userId` | Preserve state, actor, workflow, idempotency, and correlation fields as plaintext query metadata. |

Do not encrypt fields used for partitioning, ownership checks, filtering, sorting, idempotency, correlation, or operational routing.

## Encryption Migration Procedure

1. Confirm that `FIELD_ENCRYPT_ENABLED=true` is set and that the production key configuration is valid.
2. Run a dry run per container and field, recording scanned/encrypted/skipped/error counts.
3. Confirm a recent Cosmos backup or restore point exists.
4. Pause or feature-flag high-write workflows for the target container if the backfill will take longer than a short maintenance window.
5. Run one container at a time with bounded batch size, starting with the lowest-risk container.
6. Stop immediately if any field reports unexpected errors, owner mismatches, or a large count of documents missing required partition keys.
7. Sample raw Cosmos documents for `__encrypted: true` and then read the same records through NoteLett APIs to verify plaintext responses.
8. Record migration id, commit hash, environment, key provider, dry-run/live counts, operator, start/end time, and rollback decision in the release record.

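The idempotency, dry-run, batching, and stop rules in the procedure above can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the `migrateDocuments()` implementation from `@bytelyst/field-encrypt`: `backfillDocuments`, `BackfillOptions`, `isEncryptedEnvelope`, `maxErrors`, and the `encrypt`/`write` callbacks are hypothetical names, and the envelope check assumes only the `__encrypted: true` marker mentioned in step 7.

```typescript
// Hypothetical sketch; not the real @bytelyst/field-encrypt API.
type Doc = Record<string, unknown>;

interface BackfillCounts {
  scanned: number;   // documents visited
  encrypted: number; // field values that were (or, in a dry run, would be) encrypted
  skipped: number;   // field values left alone: absent or already encrypted
  errors: number;    // documents that failed
}

interface BackfillOptions {
  fields: string[];
  dryRun: boolean;
  batchSize: number;
  maxErrors: number; // stop once more than this many documents have failed
  encrypt: (plaintext: string, doc: Doc) => unknown; // stand-in for real encryption
  write: (doc: Doc) => void;                         // stand-in for a Cosmos replace
}

// Assumption: an encrypted value is an object carrying `__encrypted: true`.
function isEncryptedEnvelope(value: unknown): boolean {
  return (
    typeof value === "object" &&
    value !== null &&
    (value as { __encrypted?: unknown }).__encrypted === true
  );
}

function backfillDocuments(docs: Doc[], opts: BackfillOptions): BackfillCounts {
  const counts: BackfillCounts = { scanned: 0, encrypted: 0, skipped: 0, errors: 0 };
  const size = Math.max(1, opts.batchSize); // guard against a zero batch size
  for (let start = 0; start < docs.length; start += size) {
    for (const doc of docs.slice(start, start + size)) {
      counts.scanned += 1;
      try {
        // Copy the document so id, partition key, and timestamps carry over unchanged.
        const updated: Doc = { ...doc };
        let encryptedHere = 0;
        let skippedHere = 0;
        for (const field of opts.fields) {
          const value = doc[field];
          if (value === null || value === undefined || isEncryptedEnvelope(value)) {
            skippedHere += 1; // idempotent: a rerun never double-encrypts
            continue;
          }
          if (typeof value !== "string") {
            throw new Error(`unexpected type for field ${field}`);
          }
          if (!opts.dryRun) updated[field] = opts.encrypt(value, doc);
          encryptedHere += 1;
        }
        if (encryptedHere > 0 && !opts.dryRun) opts.write(updated);
        counts.encrypted += encryptedHere;
        counts.skipped += skippedHere;
      } catch {
        counts.errors += 1;
      }
    }
    console.log("batch complete", JSON.stringify(counts)); // progress log per batch
    if (counts.errors > opts.maxErrors) break; // stop criterion
  }
  return counts;
}
```

In this sketch a dry run reports the same counts as a live run but never calls `encrypt` or `write`, so dry-run and live counts can be compared directly in the release record.
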
Suggested dry-run command when using the common-platform CLI:

```bash
cd ../learning_ai/learning_ai_common_plat
zsh -lc 'source ~/.zshrc; export GITEA_NPM_TOKEN; COSMOS_DATABASE=notelett-prod FIELD_ENCRYPT_KEY_PROVIDER=akv npx tsx scripts/encrypt-migrate.ts --product notelett --dry-run --verbose'
```

Only run the live command after the NoteLett field map has full coverage:

```bash
cd ../learning_ai/learning_ai_common_plat
zsh -lc 'source ~/.zshrc; export GITEA_NPM_TOKEN; COSMOS_DATABASE=notelett-prod FIELD_ENCRYPT_KEY_PROVIDER=akv npx tsx scripts/encrypt-migrate.ts --product notelett --batch-size 50 --verbose'
```

## Schema Change Plan

Every schema change needs a migration note before deployment:

- additive optional field: deploy code first, backfill only if UX or query behavior requires it
- additive required field: deploy code that writes the field and tolerates its absence, backfill, then tighten validation in a later release
- enum expansion: deploy readers before writers; keep unknown-safe UI fallbacks
- field rename: dual-read and dual-write, backfill, verify, then remove old field in a later release
- partition key change: create a new container/projection and copy data; do not mutate an existing Cosmos container partition key in place
- ownership/scope change: require explicit auth tests and product/user/workspace sampling
- encrypted-field addition: use the encryption procedure above and preserve plaintext reads until backfill is complete

## Backfill Operating Rules

- Backfills must be idempotent and restartable.
- Backfills must filter by `productId: "notelett"` wherever the query shape allows.
- Backfills must preserve `id`, partition key, `productId`, owner fields, `createdAt`, and original source timestamps.
- Backfills should write `updatedAt` only when the application semantics require it; operational encryption backfills should avoid changing user-visible timestamps.
- Long-running jobs need progress logs at least once per batch and a clear stop criterion.
- Cross-container backfills need a manifest of source container, target container, count, and validation query.
- Any migration that could affect user-visible reads needs a rollback note before live execution.

## Rollback And Recovery

Encrypted-field backfills are usually forward-only because the production key should remain available for reads after deployment. If rollback is required:

1. Keep code that can read encrypted and plaintext values deployed.
2. Prefer rolling back the application image while leaving encrypted data intact.
3. If data restore is required, restore into a new Cosmos account/database first and validate scoped records before copying back.
4. Do not decrypt and rewrite sensitive fields to plaintext unless explicitly approved as an incident recovery action.

Schema/backfill rollback should name the previous image tag, affected containers, restore point, validation queries, and any feature flags to disable while recovery runs.

## Verification Checklist

For migration-plan changes:

```bash
git diff --check
rg -n "DATA_MIGRATION_AND_BACKFILL_PLAN|migrateDocuments|encrypt-migrate|Encrypted Field Backfill Map" docs README.md
```

For live migration readiness:

```bash
zsh -lc 'source ~/.zshrc; export GITEA_NPM_TOKEN; pnpm --filter @notelett/backend run typecheck'
zsh -lc 'source ~/.zshrc; export GITEA_NPM_TOKEN; pnpm --filter @notelett/backend exec vitest run src/lib/encryption-coverage.test.ts src/modules/note-artifacts/routes.integration.test.ts src/modules/note-prompts/note-prompts.test.ts src/modules/note-agent-actions/routes.integration.test.ts'
```
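
For post-backfill verification, the raw-versus-API sampling described in the encryption procedure could be automated along these lines. This is a hedged sketch rather than existing NoteLett tooling: `verifySample`, `SampleFinding`, and `hasEncryptedMarker` are hypothetical names, the raw documents and API responses are assumed to have been fetched already, and the only envelope property relied on is the `__encrypted: true` marker.

```typescript
// Hypothetical sketch; fetching from Cosmos and the NoteLett API is out of scope here.
type Doc = Record<string, unknown>;

interface SampleFinding {
  id: string;
  field: string;
  problem: string;
}

// Assumption: an encrypted value is an object carrying `__encrypted: true`.
function hasEncryptedMarker(value: unknown): boolean {
  return (
    typeof value === "object" &&
    value !== null &&
    (value as { __encrypted?: unknown }).__encrypted === true
  );
}

// rawDocs: sampled documents exactly as stored in the container.
// apiDocsById: the same records as returned through the application API.
function verifySample(
  rawDocs: Doc[],
  apiDocsById: Map<string, Doc>,
  fields: string[],
): SampleFinding[] {
  const findings: SampleFinding[] = [];
  for (const raw of rawDocs) {
    const id = String(raw.id);
    const served = apiDocsById.get(id);
    if (served === undefined) {
      findings.push({ id, field: "*", problem: "record missing from API sample" });
      continue;
    }
    for (const field of fields) {
      const stored = raw[field];
      if (stored === null || stored === undefined) continue; // optional field, nothing to check
      if (!hasEncryptedMarker(stored)) {
        findings.push({ id, field, problem: "stored value is not encrypted" });
      }
      if (typeof served[field] !== "string") {
        findings.push({ id, field, problem: "API did not return a plaintext string" });
      }
    }
  }
  return findings;
}
```

An empty result means every sampled field was stored as an envelope and served as plaintext; any finding is a reason to stop and investigate before moving to the next container.
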