NoteLett Data Migration And Backfill Plan
Date: May 5, 2026
Product ID: notelett
Common platform source: ../learning_ai/learning_ai_common_plat
Purpose
This document defines how NoteLett runs production data migrations safely, with special attention to encrypted fields, schema changes, and long-running backfills. It complements docs/COSMOS_DATA_OPERATIONS.md for container layout and docs/FIELD_ENCRYPTION_COVERAGE.md for the encrypted-field inventory.
Common Platform Reuse
Use the common-platform field-encryption migration helper before writing product-local migration machinery:
- Package helper: ../learning_ai/learning_ai_common_plat/packages/field-encrypt/src/migration.ts
- Cross-product CLI: ../learning_ai/learning_ai_common_plat/scripts/encrypt-migrate.ts
The helper migrateDocuments() is idempotent: it skips values already shaped like an encrypted field envelope and records scanned, encrypted, skipped, and error counts. That behavior is the baseline for NoteLett encrypted-field backfills.
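The skip-and-count behavior described above can be illustrated with a minimal in-memory sketch. It is not the common-platform implementation: `backfillField`, `isEnvelope`, `fakeEncrypt`, and the `ciphertext` property are hypothetical names, and the only detail taken from this plan is the `__encrypted: true` marker and the four counters.

```typescript
// Hypothetical envelope shape: only the __encrypted marker comes from this plan.
type Envelope = { __encrypted: true; ciphertext: string };
type PlainDoc = Record<string, unknown>;
type Counts = { scanned: number; encrypted: number; skipped: number; errors: number };

function isEnvelope(value: unknown): value is Envelope {
  return (
    typeof value === "object" &&
    value !== null &&
    (value as { __encrypted?: unknown }).__encrypted === true
  );
}

// Encrypts `field` in place on each document. Values that are absent or
// already wrapped in an envelope are skipped, so a second run changes nothing.
function backfillField(
  docs: PlainDoc[],
  field: string,
  encrypt: (plaintext: string) => Envelope,
): Counts {
  const counts: Counts = { scanned: 0, encrypted: 0, skipped: 0, errors: 0 };
  for (const doc of docs) {
    counts.scanned += 1;
    const value = doc[field];
    if (value === null || value === undefined || isEnvelope(value)) {
      counts.skipped += 1;
      continue;
    }
    try {
      if (typeof value !== "string") throw new Error("unexpected field type");
      doc[field] = encrypt(value);
      counts.encrypted += 1;
    } catch {
      counts.errors += 1;
    }
  }
  return counts;
}

// Stand-in transform for illustration only; this is NOT real encryption.
const fakeEncrypt = (plaintext: string): Envelope => ({
  __encrypted: true,
  ciphertext: plaintext.split("").reverse().join(""),
});

const sampleDocs: PlainDoc[] = [
  { id: "1", body: "hello" },
  { id: "2", body: null },
];
const firstRun = backfillField(sampleDocs, "body", fakeEncrypt);
const secondRun = backfillField(sampleDocs, "body", fakeEncrypt);
```

Running the same backfill twice is the property to check: the second pass should report every document as skipped and none as encrypted.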
The common-platform CLI currently covers only notes.body for NoteLett. Before any live production encryption backfill, either extend the common-platform NoteLett config to include the full field map below or run a NoteLett-owned wrapper that calls migrateDocuments() from @bytelyst/field-encrypt with the same map.
Encrypted Field Backfill Map
These are the fields currently encrypted by NoteLett repositories on create/update:
| Container | Partition key | Fields | Encryption context | User id field | Notes |
|---|---|---|---|---|---|
| notes | /workspaceId | body | notes | userId | Common CLI currently uses createdBy; use userId for consistency with repository encryption. |
| palace_memories | /userId | content | palace-memory | userId | Preserve embeddings and Palace metadata as plaintext query fields. |
| note_artifacts | /workspaceId | title, description, blobPath | note_artifacts | userId | description and blobPath can be absent; skip null/undefined. |
| note_prompts | /userId | description, systemPrompt, userPromptTemplate | note_prompts | userId | User templates are sensitive; built-in __builtin__ templates are system content and may be reseeded. |
| note_agent_actions | /workspaceId | reason, beforeSummary, afterSummary, reviewNote | note_agent_actions | userId | Preserve state, actor, workflow, idempotency, and correlation fields as plaintext query metadata. |
Do not encrypt fields used for partitioning, ownership checks, filtering, sorting, idempotency, correlation, or operational routing.
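One way to keep a NoteLett-owned wrapper aligned with the map above is to hold it as typed data and check it against the rule in the previous sentence. The sketch below is an assumption about shape only: `FieldMapEntry`, `NOTELETT_FIELD_MAP`, and `findUnsafeFields` are hypothetical names, and the real common-platform config type may differ. The check covers only `id`, the partition key, and the user id field, not every query or routing field.

```typescript
// Hypothetical config shape; the real common-platform config type may differ.
interface FieldMapEntry {
  container: string;
  partitionKey: string;
  fields: readonly string[];
  context: string;
  userIdField: string;
}

// Values copied from the Encrypted Field Backfill Map in this plan.
const NOTELETT_FIELD_MAP: readonly FieldMapEntry[] = [
  { container: "notes", partitionKey: "/workspaceId", fields: ["body"], context: "notes", userIdField: "userId" },
  { container: "palace_memories", partitionKey: "/userId", fields: ["content"], context: "palace-memory", userIdField: "userId" },
  { container: "note_artifacts", partitionKey: "/workspaceId", fields: ["title", "description", "blobPath"], context: "note_artifacts", userIdField: "userId" },
  { container: "note_prompts", partitionKey: "/userId", fields: ["description", "systemPrompt", "userPromptTemplate"], context: "note_prompts", userIdField: "userId" },
  { container: "note_agent_actions", partitionKey: "/workspaceId", fields: ["reason", "beforeSummary", "afterSummary", "reviewNote"], context: "note_agent_actions", userIdField: "userId" },
];

// Returns "container.field" for every configured field that collides with
// the document id, the partition key, or the user id field.
function findUnsafeFields(map: readonly FieldMapEntry[]): string[] {
  const unsafe: string[] = [];
  for (const entry of map) {
    const reserved = new Set<string>([
      "id",
      entry.partitionKey.replace(/^\//, ""),
      entry.userIdField,
    ]);
    for (const field of entry.fields) {
      if (reserved.has(field)) unsafe.push(`${entry.container}.${field}`);
    }
  }
  return unsafe;
}

const unsafeFields = findUnsafeFields(NOTELETT_FIELD_MAP);
```

A check like this could run in a unit test so that a later edit to the map cannot silently add a partition or ownership field.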
Encryption Migration Procedure
- Confirm FIELD_ENCRYPT_ENABLED=true and production key configuration are valid.
- Run a dry run per container and field, recording scanned/encrypted/skipped/error counts.
- Confirm a recent Cosmos backup or restore point exists.
- Pause or feature-flag high-write workflows for the target container if the backfill will take longer than a short maintenance window.
- Run one container at a time with bounded batch size, starting with the lowest-risk container.
- Stop immediately if any field reports unexpected errors, owner mismatches, or a large count of documents missing required partition keys.
- Sample raw Cosmos documents for __encrypted: true and then read the same records through NoteLett APIs to verify plaintext responses.
- Record migration id, commit hash, environment, key provider, dry-run/live counts, operator, start/end time, and rollback decision in the release record.
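The stop rule in the procedure above can be made mechanical so an operator does not have to judge it mid-run. The following is a sketch under stated assumptions: `FieldReport` and `stopReasons` are hypothetical, the `ownerMismatches` and `missingPartitionKey` counters are not known outputs of the common-platform helper, every error is treated as unexpected, and the 1% threshold for "a large count" is a placeholder to be agreed per migration.

```typescript
// Hypothetical per-field report. Only scanned/encrypted/skipped/errors mirror
// the counts named in this plan; the last two counters are assumptions.
interface FieldReport {
  container: string;
  field: string;
  scanned: number;
  encrypted: number;
  skipped: number;
  errors: number;
  ownerMismatches: number;
  missingPartitionKey: number;
}

// Returns the reasons to halt, or an empty list when the run may continue.
function stopReasons(report: FieldReport, maxMissingPartitionKeyRatio = 0.01): string[] {
  const reasons: string[] = [];
  const label = `${report.container}.${report.field}`;
  if (report.errors > 0) {
    reasons.push(`${label}: ${report.errors} errors`);
  }
  if (report.ownerMismatches > 0) {
    reasons.push(`${label}: ${report.ownerMismatches} owner mismatches`);
  }
  const ratio = report.scanned === 0 ? 0 : report.missingPartitionKey / report.scanned;
  if (ratio > maxMissingPartitionKeyRatio) {
    reasons.push(`${label}: ${report.missingPartitionKey} documents missing partition key`);
  }
  return reasons;
}

const cleanReport: FieldReport = {
  container: "notes", field: "body",
  scanned: 100, encrypted: 90, skipped: 10, errors: 0,
  ownerMismatches: 0, missingPartitionKey: 0,
};
const cleanReasons = stopReasons(cleanReport);
```

Evaluating the same function against the dry-run report first gives an early signal before any live write happens.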
Suggested dry-run command when using the common-platform CLI:
cd ../learning_ai/learning_ai_common_plat
zsh -lc 'source ~/.zshrc; export GITEA_NPM_TOKEN; COSMOS_DATABASE=notelett-prod FIELD_ENCRYPT_KEY_PROVIDER=akv npx tsx scripts/encrypt-migrate.ts --product notelett --dry-run --verbose'
Only run the live command after the NoteLett field map has full coverage:
cd ../learning_ai/learning_ai_common_plat
zsh -lc 'source ~/.zshrc; export GITEA_NPM_TOKEN; COSMOS_DATABASE=notelett-prod FIELD_ENCRYPT_KEY_PROVIDER=akv npx tsx scripts/encrypt-migrate.ts --product notelett --batch-size 50 --verbose'
Schema Change Plan
Every schema change needs a migration note before deployment:
- additive optional field: deploy code first, backfill only if UX or query behavior requires it
- additive required field: deploy code that writes and tolerates missing field, backfill, then tighten validation in a later release
- enum expansion: deploy readers before writers; keep unknown-safe UI fallbacks
- field rename: dual-read and dual-write, backfill, verify, then remove old field in a later release
- partition key change: create a new container/projection and copy data; do not mutate an existing Cosmos container partition key in place
- ownership/scope change: require explicit auth tests and product/user/workspace sampling
- encrypted-field addition: use the encryption procedure above and preserve plaintext reads until backfill is complete
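The field-rename rule above (dual-read and dual-write, backfill, verify, then remove) can be sketched as three small functions. The rename of `label` to `displayName` is purely hypothetical and not a NoteLett field; `RenameDoc` and the function names are illustrative.

```typescript
// Hypothetical rename of `label` to `displayName`; neither is a NoteLett field.
interface RenameDoc {
  id: string;
  label?: string; // old field, removed in a later release
  displayName?: string; // new field
}

// Dual-read: prefer the new field and fall back to the old one.
function readDisplayName(doc: RenameDoc): string | undefined {
  return doc.displayName ?? doc.label;
}

// Dual-write: keep both fields in sync until the old field is removed.
function writeDisplayName(doc: RenameDoc, value: string): RenameDoc {
  return { ...doc, displayName: value, label: value };
}

// Backfill: copy old into new only when new is missing, so reruns are no-ops.
function backfillDisplayName(doc: RenameDoc): RenameDoc {
  if (doc.displayName !== undefined || doc.label === undefined) return doc;
  return { ...doc, displayName: doc.label };
}

const legacyDoc: RenameDoc = { id: "1", label: "Inbox" };
const backfilledOnce = backfillDisplayName(legacyDoc);
const backfilledTwice = backfillDisplayName(backfilledOnce);
```

Because the backfill never overwrites an existing new-field value, it stays safe to run while dual-writing code is already live.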
Backfill Operating Rules
- Backfills must be idempotent and restartable.
- Backfills must filter by productId: "notelett" wherever the query shape allows.
- Backfills must preserve id, partition key, productId, owner fields, createdAt, and original source timestamps.
- Backfills should write updatedAt only when the application semantics require it; operational encryption backfills should avoid changing user-visible timestamps.
- Long-running jobs need progress logs at least every batch and a clear stop criterion.
- Cross-container backfills need a manifest of source container, target container, count, and validation query.
- Any migration that could affect user-visible reads needs a rollback note before live execution.
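Several of the rules above (restartable, product-filtered, timestamp-preserving, logged per batch, with a clear stop criterion) can be combined in one batch loop. This is an in-memory sketch, not a Cosmos client: `StoredDoc`, `Checkpoint`, and `runBatch` are hypothetical, the array stands in for an id-ordered query, and a real job would persist the checkpoint between runs.

```typescript
interface StoredDoc {
  id: string;
  productId: string;
  createdAt: string;
  updatedAt: string;
  [key: string]: unknown;
}

interface Checkpoint {
  lastId: string | null; // last id fully processed; null before the first batch
  processed: number;
}

// Processes one batch after the checkpoint and returns the next checkpoint.
// Assumes `docs` is sorted by id, as an ordered query would return them.
function runBatch(
  docs: StoredDoc[],
  checkpoint: Checkpoint,
  batchSize: number,
  transform: (doc: StoredDoc) => StoredDoc,
): Checkpoint {
  const pending = docs
    .filter((d) => d.productId === "notelett")
    .filter((d) => checkpoint.lastId === null || d.id > checkpoint.lastId)
    .slice(0, batchSize);

  for (const doc of pending) {
    const { id, productId, createdAt, updatedAt } = doc;
    // Re-apply identity and timestamp fields so the transform cannot change them.
    Object.assign(doc, transform(doc), { id, productId, createdAt, updatedAt });
  }

  const last = pending[pending.length - 1];
  const next: Checkpoint = {
    lastId: last ? last.id : checkpoint.lastId,
    processed: checkpoint.processed + pending.length,
  };
  console.log(`batch done: processed=${next.processed} lastId=${next.lastId}`);
  return next;
}

const store: StoredDoc[] = [
  { id: "a", productId: "notelett", createdAt: "2026-01-01T00:00:00Z", updatedAt: "2026-01-02T00:00:00Z", migrated: false },
  { id: "b", productId: "other", createdAt: "2026-01-01T00:00:00Z", updatedAt: "2026-01-02T00:00:00Z", migrated: false },
  { id: "c", productId: "notelett", createdAt: "2026-01-01T00:00:00Z", updatedAt: "2026-01-02T00:00:00Z", migrated: false },
];

let checkpoint: Checkpoint = { lastId: null, processed: 0 };
let processedBefore = -1;
// Stop criterion: a batch that processes nothing means the scan is complete.
while (checkpoint.processed > processedBefore) {
  processedBefore = checkpoint.processed;
  checkpoint = runBatch(store, checkpoint, 1, (d) => ({ ...d, migrated: true, updatedAt: "changed" }));
}
```

Resuming after an interruption only requires passing the last saved checkpoint back in, since documents at or before `lastId` are never revisited.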
Rollback And Recovery
Encrypted-field backfills are usually forward-only because the production key should remain available for reads after deployment. If rollback is required:
- Keep code that can read encrypted and plaintext values deployed.
- Prefer rolling back the application image while leaving encrypted data intact.
- If data restore is required, restore into a new Cosmos account/database first and validate scoped records before copying back.
- Do not decrypt and rewrite sensitive fields to plaintext unless explicitly approved as an incident recovery action.
Schema/backfill rollback should name the previous image tag, affected containers, restore point, validation queries, and any feature flags to disable while recovery runs.
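The first rollback rule, keeping code deployed that reads both encrypted and plaintext values, amounts to a tolerant reader. A minimal sketch follows, assuming an envelope marked with `__encrypted: true` as sampled in the migration procedure; `EncryptedValue`, its `ciphertext` property, `readSensitiveField`, and `fakeDecrypt` are hypothetical and do not describe the @bytelyst/field-encrypt API.

```typescript
// Hypothetical envelope shape; only the __encrypted marker appears in this plan.
interface EncryptedValue {
  __encrypted: true;
  ciphertext: string;
}

function isEncryptedValue(value: unknown): value is EncryptedValue {
  return (
    typeof value === "object" &&
    value !== null &&
    (value as { __encrypted?: unknown }).__encrypted === true
  );
}

// Returns plaintext whether the stored value is still plaintext or already
// encrypted, so reads keep working before, during, and after a backfill.
function readSensitiveField(
  value: unknown,
  decrypt: (encrypted: EncryptedValue) => string,
): string | undefined {
  if (value === null || value === undefined) return undefined;
  if (isEncryptedValue(value)) return decrypt(value);
  if (typeof value === "string") return value;
  throw new Error("unsupported stored value");
}

// Stand-in for a real decrypt call; it only reverses the string.
const fakeDecrypt = (encrypted: EncryptedValue): string =>
  encrypted.ciphertext.split("").reverse().join("");

const fromPlaintext = readSensitiveField("hello", fakeDecrypt);
const fromEnvelope = readSensitiveField({ __encrypted: true, ciphertext: "olleh" }, fakeDecrypt);
```

With a reader like this in place, rolling back the application image does not require touching data that has already been encrypted.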
Verification Checklist
For migration-plan changes:
git diff --check
rg -n "DATA_MIGRATION_AND_BACKFILL_PLAN|migrateDocuments|encrypt-migrate|Encrypted Field Backfill Map" docs README.md
For live migration readiness:
zsh -lc 'source ~/.zshrc; export GITEA_NPM_TOKEN; pnpm --filter @notelett/backend run typecheck'
zsh -lc 'source ~/.zshrc; export GITEA_NPM_TOKEN; pnpm --filter @notelett/backend exec vitest run src/lib/encryption-coverage.test.ts src/modules/note-artifacts/routes.integration.test.ts src/modules/note-prompts/note-prompts.test.ts src/modules/note-agent-actions/routes.integration.test.ts'