learning_ai_common_plat/dashboards/tracker-web/docs/runbooks/MEK_ROTATION.md
root 59c4638f85
docs(tracker): close review-pass gaps in tracker docs
1) Dual-numbering reconciliation
   - ROADMAP groups Phase 1 by topic (1.1-1.8); PRH groups by execution
     day (1.A-1.F). Added bidirectional mapping table to both docs so
     agents can cross-reference any phase reference unambiguously.

2) Fresh-agent quick pointer at top of ROADMAP
   - New section tells a new agent exactly which 4 docs to read, in
     what order, and which task to pick up first (1.A from the tracker).

3) Broken sub-roadmap links neutralised
   - 03_RICH_ITEMS_ROADMAP.md, 04_AGENT_API_ROADMAP.md,
     05_INTAKE_ROADMAP.md were linked but did not exist. Replaced with
     plain text + 'create when Phase N begins' note so the link doesn't
     404. Matches the pattern already used in IMPLEMENTATION_TRACKER.

4) Runbook stubs created (Phase 1.F.11/1.F.12 placeholders)
   - docs/runbooks/MEK_ROTATION.md — adapted from NoteLett bcad7d3
   - docs/runbooks/SECRET_MANAGEMENT.md — secret inventory + resolution
     path + compromise procedure + PII scrubbing rule
   Each is a stub now; full content lands when Phase 1.F executes.

5) Stale 'today' wording removed from PRH baseline table
   - Replaced 'after fix today' with 'as of 2026-05-25' so the doc
     ages cleanly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 11:48:33 +00:00


MEK Rotation Runbook

Status: Stub, to be populated as part of Phase 1.F.11
Owner: Platform team
Source pattern: learning_ai_notes/docs/runbooks/MEK_ROTATION.md (commit bcad7d3)


Purpose

This runbook describes how to rotate the Master Encryption Key (MEK) used for field-level encryption of sensitive tracker data (PII fields on items, comments, attachment URLs, agent API key seeds).

Tracker MEK rotation follows the same envelope-encryption pattern as NoteLett:

  1. Tracker holds a per-product MEK reference in env (TRACKER_MEK_ID_<PRODUCTID>).
  2. The MEK itself is stored in Azure KeyVault, never in process memory beyond a single request lifecycle.
  3. Each encrypted field has a keyId envelope marking which MEK version encrypted it.
  4. Rotation creates a new MEK version; new writes use the new version; reads support both the old and new versions until the re-encryption sweep completes.
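
The four points above can be sketched roughly as follows. This is an illustration only, not the tracker-service implementation: the `EncryptedField` shape and the helpers `mekEnvVarName`, `currentKeyId`, and `resolveKeyIdForRead` are hypothetical names, and uppercasing the product ID in the env variable name is an assumption.

```typescript
// Hypothetical envelope shape: ciphertext plus the MEK version that produced it.
interface EncryptedField {
  keyId: string;      // MEK version identifier recorded at write time
  ciphertext: string; // opaque payload (the encoding is an assumption)
}

// Builds the env variable name for a product, e.g. TRACKER_MEK_ID_ACME.
// Uppercasing the product ID is an assumption made for this sketch.
function mekEnvVarName(productId: string): string {
  return `TRACKER_MEK_ID_${productId.toUpperCase()}`;
}

// Returns the keyId that new writes should use, read from an env-like map.
function currentKeyId(
  env: Record<string, string | undefined>,
  productId: string,
): string {
  const name = mekEnvVarName(productId);
  const keyId = env[name];
  if (!keyId) {
    throw new Error(`${name} is not set`);
  }
  return keyId;
}

// Read path during rotation: accept any keyId in the allowed set (old and new)
// and reject envelopes written with an unknown key version.
function resolveKeyIdForRead(
  field: EncryptedField,
  allowedKeyIds: ReadonlySet<string>,
): string {
  if (!allowedKeyIds.has(field.keyId)) {
    throw new Error(`unknown MEK version: ${field.keyId}`);
  }
  return field.keyId;
}
```

While the sweep is running, the allowed set would hold both versions; once the sweep is verified, it can shrink to the new version alone.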

Pre-rotation Checklist

  • Confirm Azure KeyVault access from tracker-service host
  • Confirm latest backup of Cosmos tracker container exists (RPO < 1 h)
  • Notify on-call: rotation window expected ~30 min for active read-path verification
  • Capture baseline metrics — read/write latency on encrypted fields
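
One way to make the baseline comparable before and after rotation is to reduce the latency samples to a few percentiles. The helper below is a minimal sketch using the nearest-rank method; the function name and the choice of milliseconds are assumptions, and the runbook does not prescribe how metrics are collected.

```typescript
// Nearest-rank percentile over latency samples (assumed to be in milliseconds).
// p is a percentage in the range (0, 100].
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("no samples");
  }
  if (p <= 0 || p > 100) {
    throw new Error("p must be in (0, 100]");
  }
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}
```

Recording, say, p50 and p95 for encrypted-field reads and writes before the rotation gives a concrete number to compare against afterwards.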

Rotation Procedure

TODO — adapt full procedure from learning_ai_notes/docs/runbooks/MEK_ROTATION.md once tracker-service field encryption ships in Phase 1.F. Sketch only below.

  1. Create new MEK version in KeyVault
    • az keyvault key create --vault-name <vault> --name tracker-mek-<productId> --kty RSA
    • Running the command against an existing key name creates a new version of that key
    • Record the new keyId (the versioned key identifier returned by the command)
  2. Update tracker-service env with new TRACKER_MEK_ID_<PRODUCTID>
  3. Rolling restart tracker-service: new writes encrypt with the new key
  4. Re-encryption sweep: a background job re-reads and re-writes all encrypted fields with the new key
  5. Verify: zero encrypted fields remain on the old key version
  6. Revoke old key: disable the old KeyVault version, and only after step 5 has passed
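
Steps 4 and 5 might look roughly like the sketch below. It is not the tracker-service job: the in-memory `Map` stands in for the Cosmos container, and `FieldCrypto`, `reencryptAll`, and `countOnOtherKeys` are hypothetical names. The real decrypt and encrypt hooks would go through KeyVault.

```typescript
interface EncryptedField {
  keyId: string;
  ciphertext: string;
}

// Hypothetical crypto hooks, injected so the sweep logic stays testable.
interface FieldCrypto {
  decrypt(field: EncryptedField): string;
  encrypt(plaintext: string, keyId: string): EncryptedField;
}

interface SweepResult {
  rewritten: number; // fields moved to the new key
  skipped: number;   // fields already on the new key
  failed: string[];  // record IDs that could not be decrypted or re-encrypted
}

// Step 4: re-read and re-write every field that is not yet on the new key.
// Failures are collected rather than thrown so one bad record does not stop the sweep.
function reencryptAll(
  records: Map<string, EncryptedField>,
  fieldCrypto: FieldCrypto,
  newKeyId: string,
): SweepResult {
  const result: SweepResult = { rewritten: 0, skipped: 0, failed: [] };
  for (const [id, field] of records) {
    if (field.keyId === newKeyId) {
      result.skipped += 1;
      continue;
    }
    try {
      const plaintext = fieldCrypto.decrypt(field);
      records.set(id, fieldCrypto.encrypt(plaintext, newKeyId));
      result.rewritten += 1;
    } catch {
      result.failed.push(id);
    }
  }
  return result;
}

// Step 5: the old key may be disabled only when this returns 0.
function countOnOtherKeys(
  records: Map<string, EncryptedField>,
  newKeyId: string,
): number {
  let count = 0;
  for (const field of records.values()) {
    if (field.keyId !== newKeyId) {
      count += 1;
    }
  }
  return count;
}
```

Because fields already on the new key are skipped, the sweep can be re-run safely after a partial failure.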

Rollback

If decryption fails after rotation:

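  1. Re-enable the old KeyVault version if it has already been disabled
  2. Revert env to previous TRACKER_MEK_ID_<PRODUCTID>
  3. Restart tracker-service
  4. Investigate which fields failed before retrying
    • Keep the new KeyVault version enabled, so fields written with it during the rotation window still decrypt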
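
To investigate which fields failed, it can help to group the decryption failures by field name and key version. The sketch below assumes a hypothetical `DecryptFailure` record; the fields tracker-service actually logs on failure are not defined in this stub.

```typescript
// Hypothetical shape of one logged decryption failure.
interface DecryptFailure {
  recordId: string;
  fieldName: string;
  keyId: string;
}

// Counts failures per "fieldName@keyId" bucket so that patterns stand out,
// for example every failure pointing at one key version.
function summarizeFailures(
  failures: readonly DecryptFailure[],
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const failure of failures) {
    const bucket = `${failure.fieldName}@${failure.keyId}`;
    counts.set(bucket, (counts.get(bucket) ?? 0) + 1);
  }
  return counts;
}
```

If every bucket names the same key version, the problem is more likely key access than corrupted data in individual fields.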

Verification

  • pnpm run smoke:local passes end-to-end after rotation
  • All encrypted fields on items / comments / attachments decrypt correctly via API
  • Audit log entry recorded for the rotation event

See SECRET_MANAGEMENT.md for the broader env / KeyVault secret workflow.