learning_ai_common_plat/dashboards/tracker-web/docs/runbooks/SECRET_MANAGEMENT.md
root 59c4638f85
docs(tracker): close review-pass gaps in tracker docs
1) Dual-numbering reconciliation
   - ROADMAP groups Phase 1 by topic (1.1-1.8); PRH groups by execution
     day (1.A-1.F). Added bidirectional mapping table to both docs so
     agents can cross-reference any phase reference unambiguously.

2) Fresh-agent quick pointer at top of ROADMAP
   - New section tells a new agent exactly which 4 docs to read, in
     what order, and which task to pick up first (1.A from the tracker).

3) Broken sub-roadmap links neutralised
   - 03_RICH_ITEMS_ROADMAP.md, 04_AGENT_API_ROADMAP.md,
     05_INTAKE_ROADMAP.md were linked but did not exist. Replaced with
     plain text + 'create when Phase N begins' note so the link doesn't
     404. Matches the pattern already used in IMPLEMENTATION_TRACKER.

4) Runbook stubs created (Phase 1.F.11/1.F.12 placeholders)
   - docs/runbooks/MEK_ROTATION.md — adapted from NoteLett bcad7d3
   - docs/runbooks/SECRET_MANAGEMENT.md — secret inventory + resolution
     path + compromise procedure + PII scrubbing rule
   Each is a stub now; full content lands when Phase 1.F executes.

5) Stale 'today' wording removed from PRH baseline table
   - Replaced 'after fix today' with 'as of 2026-05-25' so the doc
     ages cleanly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 11:48:33 +00:00


Secret Management Runbook

Status: Stub (populate as part of Phase 1.F.12)
Owner: Platform team
Source pattern: learning_ai_notes/docs/runbooks/SECRET_MANAGEMENT.md (commit bcad7d3)


Purpose

This runbook documents how secrets flow from Azure Key Vault → environment → process for tracker-web (Next.js, client and server) and for the platform-service backend it proxies. It also covers how to rotate them safely.


Secret Inventory

| Secret | Used by | Storage | Rotation cadence |
|---|---|---|---|
| `JWT_SECRET` | platform-service + tracker-web API routes | KeyVault `bytelyst-jwt-secret` | Quarterly |
| `TRACKER_MEK_ID_<PRODUCTID>` | tracker-service field encryption | KeyVault per-product MEK keys | Quarterly (see MEK_ROTATION.md) |
| `POSTHOG_KEY` | tracker-web client-side telemetry | KeyVault `tracker-posthog-key` | On compromise only |
| `COSMOS_CONNECTION_STRING` | platform-service Cosmos client | KeyVault `bytelyst-cosmos-conn` | On compromise only |
| `VALKEY_PASSWORD` | platform-service session/cache | KeyVault `bytelyst-valkey-password` | Quarterly |
| `TURNSTILE_SECRET` (Phase 1.4) | tracker-web public submission CAPTCHA | KeyVault `tracker-turnstile-secret` | On compromise only |
| `GITHUB_WEBHOOK_SECRET` (Phase 3.3) | tracker-service inbound webhook HMAC | KeyVault `tracker-gh-webhook-secret` | On compromise only |
| `GITEA_WEBHOOK_SECRET` (Phase 3.3) | tracker-service inbound webhook HMAC | KeyVault `tracker-gitea-webhook-secret` | On compromise only |
| Agent API keys (Phase 3.1) | End-user-managed; stored hashed in Cosmos | Cosmos `apikeys` collection | User-managed |
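The inventory lists the names each service expects, so a startup check can fail fast when one is missing. Below is a minimal sketch of such a check. It is a hypothetical helper, not existing tracker-web code. The REQUIRED_RUNTIME_SECRETS list is an assumption: it picks three runtime secrets from the table and would need to match each service's real requirements.

```typescript
// Hypothetical fail-fast check. The names come from the secret inventory;
// which ones are "required" per service is an assumption for illustration.
const REQUIRED_RUNTIME_SECRETS = [
  "JWT_SECRET",
  "COSMOS_CONNECTION_STRING",
  "VALKEY_PASSWORD",
] as const;

type Env = Record<string, string | undefined>;

// Returns the names of required secrets that are missing or blank.
// Only names are returned, never values, so the result is safe to log.
function missingSecrets(
  env: Env,
  required: readonly string[] = REQUIRED_RUNTIME_SECRETS,
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Throws at startup instead of failing later on the first request.
function assertSecretsPresent(env: Env): void {
  const missing = missingSecrets(env);
  if (missing.length > 0) {
    throw new Error(`Missing required secrets: ${missing.join(", ")}`);
  }
}
```

Calling assertSecretsPresent(process.env) once at boot surfaces a broken env_file or EnvironmentFile immediately. Returning names rather than values keeps the error message itself free of secrets.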

Resolution Path

    KeyVault ─┬─→ Docker build args (NEXT_PUBLIC_*) ─→ baked into the Next.js build
              ├─→ docker-compose env_file           ─→ process.env at runtime
              └─→ systemd EnvironmentFile           ─→ process.env at runtime

  • Build-time values (NEXT_PUBLIC_*) are baked into the Next.js standalone build via Dockerfile ARG + ENV (Phase 1.C.3 / 1.2). Next.js inlines them into the JavaScript shipped to the browser, so anyone can read them. They are public configuration, not secrets: only put values here that are safe to publish.
  • Runtime secrets flow via docker-compose env_file or systemd EnvironmentFile so they are not visible in client bundles.
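  • No secret ever appears in git, .env.example, or log lines. Values injected through docker-compose env_file do show up in docker inspect output (under Config.Env). Anyone with Docker daemon access can therefore read them, so restrict that access rather than relying on inspect output being clean.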
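Server-side code can enforce the build-time/runtime split by reading secrets through one accessor that rejects NEXT_PUBLIC_* names. The getServerSecret helper below is a hypothetical sketch, not an existing tracker-web API. It takes the environment as a parameter (typically process.env), so it can be tested without touching the real environment.

```typescript
// Hypothetical accessor for server-only code (API routes, server components).
// NEXT_PUBLIC_* names are rejected because Next.js inlines them into the
// browser bundle at build time, so they can never hold a real secret.
function getServerSecret(
  name: string,
  env: Record<string, string | undefined>,
): string {
  if (name.startsWith("NEXT_PUBLIC_")) {
    throw new Error(`${name} is inlined into client JS at build time; not a secret`);
  }
  const value = env[name];
  if (value === undefined || value === "") {
    // Report only the name; never echo a value into an error or log line.
    throw new Error(`Runtime secret ${name} is not set`);
  }
  return value;
}
```

For example, getServerSecret("JWT_SECRET", process.env) returns the runtime value. Asking for a NEXT_PUBLIC_ name fails loudly instead of quietly treating a public value as a secret.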

Rotation Procedure

TODO — adapt full procedure from learning_ai_notes/docs/runbooks/SECRET_MANAGEMENT.md once tracker-service ships secret-aware deployment in Phase 1.F.

  1. Create new secret version in KeyVault
  2. Update env source (docker-compose .env, systemd unit, or CI/CD secret store)
  3. Rolling restart affected services
  4. Verify pnpm run smoke:local passes against the rotated stack
  5. Revoke previous secret version in KeyVault after 24 h soak
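Step 5 has two preconditions: the 24 h soak has elapsed, and the smoke check passed on the rotated stack. Here is a minimal sketch of that gate. It assumes a hypothetical RotationState record that an operator or CI job fills in, measures the soak from the end of the rolling restart (also an assumption), and does not call KeyVault itself.

```typescript
// 24 h soak window from the rotation procedure, in milliseconds.
const SOAK_MS = 24 * 60 * 60 * 1000;

// Hypothetical record of one rotation; field names are illustrative.
interface RotationState {
  rotatedAt: Date;      // when the rolling restart (step 3) finished
  smokePassed: boolean; // result of `pnpm run smoke:local` (step 4)
}

// True only when the smoke check passed AND the full soak has elapsed.
function canRevokePrevious(state: RotationState, now: Date = new Date()): boolean {
  const elapsed = now.getTime() - state.rotatedAt.getTime();
  return state.smokePassed && elapsed >= SOAK_MS;
}
```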

On Suspected Compromise

  1. Immediately revoke the suspected secret in KeyVault
  2. Rotate all dependent secrets in the same blast radius
  3. Force-revoke all JWTs by bumping JWT_SECRET: tokens signed with the old secret no longer verify, so all sessions are invalidated
  4. Audit access logs since suspected compromise window
  5. File a tracker bug: type chore, label security, priority critical
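Bumping JWT_SECRET invalidates sessions because the secret is the HMAC key behind every token signature. Once it changes, no previously issued token verifies. The sketch below assumes HS256 (HMAC-SHA256) signing, which this runbook does not state, and uses only node:crypto to make the mechanism visible. It skips exp, nbf, and audience checks; real services should use a vetted JWT library.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Base64url-encode a JSON value (JWT header or payload).
function encodeJson(value: object): string {
  return Buffer.from(JSON.stringify(value), "utf8").toString("base64url");
}

// Sign a payload as an HS256 JWT: header.payload.signature
function signHs256(payload: object, secret: string): string {
  const header = encodeJson({ alg: "HS256", typ: "JWT" });
  const body = encodeJson(payload);
  const sig = createHmac("sha256", secret).update(`${header}.${body}`).digest();
  return `${header}.${body}.${sig.toString("base64url")}`;
}

// Recompute the HMAC with the current secret and compare in constant time.
function verifyHs256(token: string, secret: string): boolean {
  const parts = token.split(".");
  if (parts.length !== 3) return false;
  const [header, body, sig] = parts as [string, string, string];
  const expected = createHmac("sha256", secret).update(`${header}.${body}`).digest();
  const actual = Buffer.from(sig, "base64url");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return actual.length === expected.length && timingSafeEqual(actual, expected);
}
```

A token minted with the old secret verifies against the old secret but fails against the new one. That failure is the "all sessions invalidated" effect in step 3.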

PII Scrubbing Rule

Per Phase 1.F.10: emails, names, and any field marked pii: true in the schema must NEVER appear as plaintext in:

  • Log lines (use @bytelyst/logger redaction map)
  • Telemetry events sent to PostHog
  • Error messages bubbled to clients
  • Webhook delivery logs (Phase 3.4 / 3.5)
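Here is a minimal sketch of the redaction rule, assuming PII is identified by field name. redactPii and PII_FIELDS are hypothetical and are not the @bytelyst/logger redaction map API. In practice the field set would be derived from the schema's pii: true markers. The helper assumes JSON-like, acyclic input.

```typescript
// Hypothetical stand-in for "fields marked pii: true in the schema".
const PII_FIELDS: ReadonlySet<string> = new Set(["email", "name"]);
const REDACTED = "[REDACTED]";

// Returns a deep copy with PII-keyed values replaced; the input is not mutated.
// Arrays and plain objects are walked recursively; primitives pass through.
function redactPii(value: unknown, piiFields: ReadonlySet<string> = PII_FIELDS): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => redactPii(item, piiFields));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = piiFields.has(key) ? REDACTED : redactPii(child, piiFields);
    }
    return out;
  }
  return value;
}
```

Telemetry payloads, client-facing error bodies, and webhook delivery log entries can all go through the same function before they leave the process. That keeps one definition of "PII" for all four targets.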

See MEK_ROTATION.md for field-level encryption key rotation specifically.