Runbook — Secret Management for NoteLett
Owner: Platform / Security
Touches: backend container (port 4016), web container (port 3000), Azure Key Vault, deployment platform
Audience: anyone deploying NoteLett to a non-development environment
Principles
- Never commit a secret to git, never bake one into a Docker image.
- Every secret has exactly one source of truth — Azure Key Vault (AKV) in production.
- The container reads secrets at process start, never from disk on the runtime host.
- Rotation is non-disruptive: rolling a deployment after rotating the secret is enough.
Secret Inventory
| Variable | Required when | Source of truth (prod) | Source (dev) |
|---|---|---|---|
| JWT_SECRET | always (validated ≥ 32 chars in prod) | AKV secret notelett-jwt-secret | .env (dev default rejected in prod) |
| COSMOS_ENDPOINT | DB_PROVIDER=cosmos | AKV secret bytelyst-cosmos-endpoint | .env |
| COSMOS_KEY | DB_PROVIDER=cosmos | AKV secret bytelyst-cosmos-key | .env |
| AZURE_KEYVAULT_URL | FIELD_ENCRYPT_KEY_PROVIDER=akv | Static config (URL, not a secret) | .env |
| FIELD_ENCRYPT_KEY | FIELD_ENCRYPT_KEY_PROVIDER=env (non-prod only) | n/a (prod uses AKV) | .env |
| OPENAI_API_KEY | LLM_PROVIDER=openai | AKV secret notelett-openai-api-key | .env |
| OPENAI_BASE_URL | optional override | Static config (URL, not a secret) | .env |
| AZURE_OPENAI_API_KEY | LLM_PROVIDER=azure | AKV secret notelett-azure-openai-key | .env |
| AZURE_OPENAI_ENDPOINT | LLM_PROVIDER=azure | Static config (URL, not a secret) | .env |
| GITEA_NPM_TOKEN | Docker build only (when not using docker-prep.sh tarballs) | CI secret | ~/.npmrc |

backend/src/lib/config.ts enforces production assertions for the four hardest constraints: JWT_SECRET must not be the dev default and must be ≥ 32 chars, DB_PROVIDER must be cosmos, Cosmos endpoint/key/database must be set, and field encryption must be enabled with akv or env provider (never memory).
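
The four assertions above can be sketched as a pure function over the environment. This is an illustration only, not the contents of backend/src/lib/config.ts: the function name, the dev-default placeholder string, and the COSMOS_DATABASE variable name are assumptions made for the example.

```typescript
// Sketch of the four production assertions described above.
// Not the real config.ts; names marked "assumed" are hypothetical.
type Env = Record<string, string | undefined>;

// Assumed placeholder; the actual dev default lives in the codebase.
const DEV_DEFAULT_JWT_SECRET = "dev-secret-change-me";

function validateProductionConfig(env: Env): string[] {
  const errors: string[] = [];

  // 1. JWT_SECRET must not be the dev default and must be >= 32 chars.
  const jwt = env["JWT_SECRET"] ?? "";
  if (jwt === DEV_DEFAULT_JWT_SECRET || jwt.length < 32) {
    errors.push("JWT_SECRET must be a non-default value of at least 32 characters");
  }

  // 2. DB_PROVIDER must be cosmos.
  if (env["DB_PROVIDER"] !== "cosmos") {
    errors.push("DB_PROVIDER must be cosmos");
  }

  // 3. Cosmos endpoint, key, and database must be set.
  //    COSMOS_DATABASE is an assumed variable name.
  for (const name of ["COSMOS_ENDPOINT", "COSMOS_KEY", "COSMOS_DATABASE"]) {
    if (!env[name]) errors.push(`${name} must be set`);
  }

  // 4. Field encryption must use the akv or env provider, never memory.
  const provider = env["FIELD_ENCRYPT_KEY_PROVIDER"];
  if (provider !== "akv" && provider !== "env") {
    errors.push("FIELD_ENCRYPT_KEY_PROVIDER must be akv or env");
  }

  return errors;
}
```

Returning a list of errors rather than throwing on the first one lets a deployer see every misconfiguration in a single failed start.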
Production Pattern — Azure Key Vault
Two supported flows depending on the deployment target:
Flow A — Workload Identity (preferred)
- The backend container runs under a Managed Identity (Azure Container Apps, AKS, or App Service).
- The Managed Identity has secrets/get and keys/{wrapKey, unwrapKey} permissions on the NoteLett key vault.
- At process start, an init step resolves secrets from AKV and exports them as env vars in the process scope only:

      # entrypoint snippet (illustrative)
      # Quoted heredoc: the shell must not expand the JS template literal.
      eval "$(node --input-type=module <<'EOF'
      import { DefaultAzureCredential } from '@azure/identity';
      import { SecretClient } from '@azure/keyvault-secrets';
      const c = new SecretClient(process.env.AZURE_KEYVAULT_URL, new DefaultAzureCredential());
      // AKV secret name -> env var the backend reads (see Secret Inventory).
      const map = {
        'notelett-jwt-secret': 'JWT_SECRET',
        'bytelyst-cosmos-key': 'COSMOS_KEY',
        'notelett-openai-api-key': 'OPENAI_API_KEY',
      };
      for (const [name, envName] of Object.entries(map)) {
        const v = (await c.getSecret(name)).value;
        // Escape single quotes so the value survives shell quoting.
        process.stdout.write(`export ${envName}='${v.replace(/'/g, "'\\''")}'\n`);
      }
      EOF
      )"
      exec node dist/server.js

  In @bytelyst/config this is encapsulated by resolveKeyVaultSecrets(...) (see common-plat). Use that helper instead of writing inline glue.
- Secrets never touch the container filesystem and never appear in logs (they live in process env only).
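
As an alternative to shell glue, the same resolution step could run inside the Node process before the config module loads. The sketch below is hypothetical and is not the real resolveKeyVaultSecrets(...) helper: the function name and the SecretGetter interface are assumptions, and the name mapping is taken from the Secret Inventory table. The client is injected so the logic can be exercised without Azure; the SecretClient class from @azure/keyvault-secrets has a compatible getSecret(name) method.

```typescript
// Minimal shape of the client call used here (assumed interface for the sketch).
interface SecretGetter {
  getSecret(name: string): Promise<{ value?: string }>;
}

// AKV secret name -> env var, taken from the Secret Inventory table.
const SECRET_TO_ENV: Record<string, string> = {
  "notelett-jwt-secret": "JWT_SECRET",
  "bytelyst-cosmos-key": "COSMOS_KEY",
  "notelett-openai-api-key": "OPENAI_API_KEY",
};

// Hypothetical helper: copies each secret value into the given env object.
async function resolveSecretsIntoEnv(
  client: SecretGetter,
  env: Record<string, string | undefined>,
  mapping: Record<string, string> = SECRET_TO_ENV,
): Promise<void> {
  for (const [secretName, envName] of Object.entries(mapping)) {
    const { value } = await client.getSecret(secretName);
    if (value === undefined || value === "") {
      // Fail fast and name the secret, but never log its value.
      throw new Error(`AKV secret ${secretName} is empty or missing`);
    }
    env[envName] = value;
  }
}
```

In a real entrypoint this would be called with a SecretClient built from AZURE_KEYVAULT_URL and DefaultAzureCredential, passing process.env as the target, so values exist only in process memory and no eval is involved.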
Flow B — Kubernetes Secret synced from AKV
- Use the AKV CSI driver or secrets-store.csi.k8s.io to project AKV secrets into a Kubernetes Secret.
- Reference the K8s Secret in the Deployment via envFrom so values land in the container env.
- Rotate by recreating the Pod after the secret syncs.
Deployment Pattern — docker-compose.yml
The committed docker-compose.yml reads from the host shell env (${OPENAI_API_KEY:-} etc.) and from a local .env. For production-like single-host deploys:
- Place secrets in a file owned by the deployer with chmod 600, never in git.
- Source it before docker compose up:

      set -a
      source /etc/notelett/secrets.env
      set +a
      docker compose up -d

- Avoid --env-file on the docker compose command line: it persists the path in process listings and is harder to rotate.
- After deploy, scrub /etc/notelett/secrets.env from any shell history (history -d) and confirm docker compose config does not leak the secret values to logs.
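
A deploy script could check the chmod 600 requirement before sourcing the file. The helper below is a hypothetical pre-flight check, not part of the repository; it takes the numeric mode (for example the mode field returned by fs.statSync in Node) and reports whether group and other have no access.

```typescript
// Hypothetical pre-flight helper: true when only the owner has any access
// (for example 0o600 or 0o400). File-type bits in the mode are masked off.
function isOwnerOnly(mode: number): boolean {
  const permissionBits = mode & 0o777;
  return (permissionBits & 0o077) === 0;
}
```

Refusing to continue when this returns false catches a secrets file that was copied or recreated with default permissions.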
Rotation
Rotation pattern for any AKV-backed secret:
- Update the AKV secret with a new version (az keyvault secret set ...).
- Roll the backend deployment (rolling restart picks up the new value at process start).
- For JWT_SECRET: rotation invalidates all outstanding access tokens. Plan for forced re-auth or implement dual-secret support before rotating in production.
- For OPENAI_API_KEY / AZURE_OPENAI_API_KEY: rotation is hot: in-flight LLM calls complete with the old key; new calls use the new key after restart.
- For COSMOS_KEY: prefer rotating the secondary key first, swap the deployment to use it, then rotate the primary.
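
Dual-secret support for JWT_SECRET, which the backend does not have today, could take roughly the following shape. This is a sketch under assumptions: verifyWithFallback and the VerifyFn type are hypothetical, and the verify callback stands in for whatever JWT library call the backend uses, provided it throws on a bad signature.

```typescript
// Assumed signature of the underlying JWT verification call; it must throw
// when the token was not signed with the given secret.
type VerifyFn<T> = (token: string, secret: string) => T;

// Try each configured secret in order, so tokens signed before the rotation
// stay valid until they expire.
function verifyWithFallback<T>(
  token: string,
  secrets: readonly string[],
  verify: VerifyFn<T>,
): T {
  let lastError: unknown = new Error("no secrets configured");
  for (const secret of secrets) {
    try {
      return verify(token, secret);
    } catch (err) {
      lastError = err; // remember the failure and try the next secret
    }
  }
  throw lastError;
}
```

New tokens would always be signed with the first secret in the list; the older secret is dropped from the list once the longest-lived token issued under it has expired.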
MEK rotation has its own runbook: MEK_ROTATION.md.
Verification
After any rotation or initial deploy:
# 1. Service health.
curl https://<backend-host>/health
# 2. Dependency readiness (datastore + encryption + platform/extraction/MCP if configured).
curl https://<backend-host>/api/diagnostics/readiness
# 3. Authenticated note read (proves JWT_SECRET and Cosmos creds are wired).
curl -H "Authorization: Bearer <token>" "https://<backend-host>/api/notes?workspaceId=<ws>"
# 4. LLM smoke (proves OPENAI_API_KEY or AZURE_OPENAI_API_KEY are wired, if LLM_PROVIDER != mock).
curl -X POST -H "Authorization: Bearer <token>" -H "Content-Type: application/json" \
-d '{"workspaceId":"<ws>","noteId":"<noteId>","transform":"shorten"}' \
https://<backend-host>/api/notes/copilot/transform
If any returns 5xx, check the structured log line for a missing-secret error before re-rotating.
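
The two unauthenticated checks can also be scripted. The sketch below is illustrative: findServerErrors and the FetchLike type are hypothetical names, and the fetch function is injected so the logic can be tried without a live backend. It covers only /health and /api/diagnostics/readiness; the authenticated checks need a token and are left out.

```typescript
// Minimal fetch-like signature (assumed) so the sketch runs without a network.
type FetchLike = (url: string) => Promise<{ status: number }>;

// Unauthenticated endpoints from the verification steps above.
const SMOKE_PATHS = ["/health", "/api/diagnostics/readiness"];

// Returns the paths that answered with a 5xx status.
async function findServerErrors(baseUrl: string, fetchFn: FetchLike): Promise<string[]> {
  const failing: string[] = [];
  for (const path of SMOKE_PATHS) {
    const res = await fetchFn(baseUrl + path);
    if (res.status >= 500) failing.push(path);
  }
  return failing;
}
```

On Node 18 or later the global fetch can be passed as fetchFn; a non-empty result is the cue to read the structured logs before touching the secret again.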
Red Flags
- A secret value appearing in req.log or app.log output. Stop, rotate, and audit.
- A secret committed to git. Use git filter-repo to scrub, force-push (coordinate with the team), and rotate the secret immediately.
- Two pods seeing different secret values. Indicates a partial K8s rollout; finish the rollout before traffic is sent to the new version.
- FIELD_ENCRYPT_KEY_PROVIDER=memory in production. The backend will refuse to start, but if it slips through (e.g. with NODE_ENV set to something other than production), all encrypted documents are unrecoverable on restart.
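
The first red flag can be tested for mechanically. The function below is a hypothetical sketch, not an existing NoteLett utility: given a log line and the current secret values, it returns the names of any secrets whose value appears in the line, and never the values themselves.

```typescript
// Hypothetical scanner: returns the names (never the values) of secrets
// whose value occurs verbatim in a log line.
function findLeakedSecrets(
  logLine: string,
  secrets: Record<string, string | undefined>,
): string[] {
  const leaked: string[] = [];
  for (const [name, value] of Object.entries(secrets)) {
    // Skip unset or very short values to avoid false positives.
    if (value && value.length >= 8 && logLine.includes(value)) {
      leaked.push(name);
    }
  }
  return leaked;
}
```

A check like this only catches verbatim leaks; encoded or truncated values would need a separate rule.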
Open Items
- Centralized rotation calendar. Tracked in production-hardening backlog: schedule per-secret cadence (90 days for OPENAI_API_KEY, 365 days for JWT_SECRET, etc.).
- Audit log integration. Emit a secret.rotated event to actiontrail after each rotation. Currently rotation is logged only in AKV's own audit feed.
- Dual-JWT support. Today JWT_SECRET rotation invalidates outstanding tokens; planned: support JWT_SECRET_NEXT for graceful transitions.
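
Until a rotation calendar exists, the cadence check itself is simple date arithmetic. The sketch below is hypothetical and uses only the two cadences quoted in the first open item; the function and table names are assumptions, and the last-rotated date is supplied by the caller (for example from the secret's metadata in AKV).

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Cadences quoted in the open item above; other secrets need their own entries.
const ROTATION_CADENCE_DAYS: Record<string, number> = {
  OPENAI_API_KEY: 90,
  JWT_SECRET: 365,
};

// True when the secret is at least as old as its configured cadence.
function isRotationDue(name: string, lastRotated: Date, now: Date): boolean {
  const cadence = ROTATION_CADENCE_DAYS[name];
  if (cadence === undefined) {
    throw new Error(`no cadence configured for ${name}`);
  }
  const ageDays = (now.getTime() - lastRotated.getTime()) / MS_PER_DAY;
  return ageDays >= cadence;
}
```

Throwing for an unknown secret name is deliberate: a secret with no configured cadence should surface as a gap rather than silently count as up to date.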
References
- Config validation: backend/src/lib/config.ts
- AKV-backed encryption provider: ../learning_ai_common_plat/packages/field-encrypt/src/key-provider-akv.ts
- Shared secret resolver: ../learning_ai_common_plat/packages/config/src/akv.ts
- Related: MEK_ROTATION.md