docs(tracker): close review-pass gaps in tracker docs

1) Dual-numbering reconciliation
   - ROADMAP groups Phase 1 by topic (1.1-1.8); PRH groups by execution day (1.A-1.F).
   - Added bidirectional mapping table to both docs so agents can cross-reference any
     phase reference unambiguously.

2) Fresh-agent quick pointer at top of ROADMAP
   - New section tells a new agent exactly which 4 docs to read, in what order, and
     which task to pick up first (1.A from the tracker).

3) Broken sub-roadmap links neutralised
   - 03_RICH_ITEMS_ROADMAP.md, 04_AGENT_API_ROADMAP.md, 05_INTAKE_ROADMAP.md were
     linked but did not exist.
   - Replaced with plain text + 'create when Phase N begins' note so the link doesn't
     404. Matches the pattern already used in IMPLEMENTATION_TRACKER.

4) Runbook stubs created (Phase 1.F.11/1.F.12 placeholders)
   - docs/runbooks/MEK_ROTATION.md: adapted from NoteLett bcad7d3
   - docs/runbooks/SECRET_MANAGEMENT.md: secret inventory + resolution path +
     compromise procedure + PII scrubbing rule
   Each is a stub now; full content lands when Phase 1.F executes.

5) Stale 'today' wording removed from PRH baseline table
   - Replaced 'after fix today' with 'as of 2026-05-25' so the doc ages cleanly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 11:48:33 +00:00 · 2026-05-25 11:48:33 +00:00 · 59c4638f85
commit 59c4638f85
parent 930f97ff63
5 changed files with 215 additions and 5 deletions
--- a/dashboards/tracker-web/docs/PRODUCTION_READINESS_HANDOFF_ROADMAP.md
+++ b/dashboards/tracker-web/docs/PRODUCTION_READINESS_HANDOFF_ROADMAP.md
@@ -30,7 +30,7 @@ stated verification, commit with the repo convention, push, and record the commi
 | **Tests present**                 | Vitest scaffolding + ~6 unit tests · Playwright scaffold · no `@axe-core/playwright` yet                                                       |
 | **DevOps present**                | `Dockerfile` (standalone Next.js) · `docker-compose.yml` · `vercel.json` · no `scripts/docker-prep.sh` yet (must adopt from NoteLett/FlowMonk) |
 | **Common platform packages used** | `@bytelyst/api-client` · `@bytelyst/dashboard-components` · `@bytelyst/react-auth` · `@bytelyst/logger` · `@bytelyst/telemetry-client`         |
-| **Deployed status**               | `chronomind-web` healthy after fix today · `tracker-web` container `unhealthy` (B-001/B-002 root cause: valkey)                                |
+| **Deployed status**               | `tracker-web` container `unhealthy` as of 2026-05-25 (B-001/B-002 root cause: valkey cascade — see Phase 1.A)                                  |
 ---
@@ -46,6 +46,26 @@ stated verification, commit with the repo convention, push, and record the commi
 ---
 ## Mapping to ROADMAP.md Topic Sections
 This playbook groups work by **execution day**. The master roadmap groups the same work by
 **topic**. Cross-reference when assigning tickets or marking progress:
 | Day milestone (this doc)         | Topic (ROADMAP.md §)                                                  |
 | -------------------------------- | --------------------------------------------------------------------- |
 | 1.A Container Health Restoration | 1.1 Infrastructure Health                                             |
 | 1.B Workspace Health             | 1.8 Workspace Health                                                  |
 | 1.C Docker Hardening             | 1.2 Docker Hardening                                                  |
 | 1.D UI Drift Ratchet             | 1.3 UI Drift Ratchet                                                  |
 | 1.E Test Hardening               | 1.5 Test Coverage                                                     |
 | 1.F Observability + Security     | 1.6 Error Handling & Observability + 1.7 Security + 1.4 Rate Limiting |
 Mark progress under **both** numbering schemes when checking off in
 [`IMPLEMENTATION_TRACKER.md`](./IMPLEMENTATION_TRACKER.md) — agents reading either doc
 must be able to find the same checkboxes.
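For agent scripts that need to resolve a phase reference in either scheme, the table above can be carried as data. A minimal TypeScript sketch; `DAY_TO_TOPICS` and `topicToDay` are hypothetical names, not existing repo code. Inverting the map also catches a topic accidentally listed under two days:

```typescript
// Day milestone (this playbook) -> topic sections (ROADMAP.md), transcribed from the table above.
const DAY_TO_TOPICS: Record<string, readonly string[]> = {
  "1.A": ["1.1"],
  "1.B": ["1.8"],
  "1.C": ["1.2"],
  "1.D": ["1.3"],
  "1.E": ["1.5"],
  "1.F": ["1.6", "1.7", "1.4"],
};

// Build the reverse lookup (topic -> day) and fail loudly on duplicates,
// so the two numbering schemes cannot silently disagree.
function topicToDay(dayToTopics: Record<string, readonly string[]>): Map<string, string> {
  const inverse = new Map<string, string>();
  for (const [day, topics] of Object.entries(dayToTopics)) {
    for (const topic of topics) {
      const existing = inverse.get(topic);
      if (existing !== undefined) {
        throw new Error(`topic ${topic} is mapped to both ${existing} and ${day}`);
      }
      inverse.set(topic, day);
    }
  }
  return inverse;
}

// Example: a ticket tagged "1.4 Rate Limiting" is executed on day 1.F.
const TOPIC_TO_DAY = topicToDay(DAY_TO_TOPICS);
console.log(TOPIC_TO_DAY.get("1.4")); // prints 1.F
```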
 ---
 ## Phase 1.A — Container Health Restoration (Day 1)
 **Goal:** Get all tracker-related containers healthy. Block on this — everything else cascades.
--- a/dashboards/tracker-web/docs/ROADMAP.md
+++ b/dashboards/tracker-web/docs/ROADMAP.md
@@ -12,6 +12,21 @@
 ---
 ## ⚡ Where to Start (Fresh-Agent Quick Pointer)
 You just opened this repo and want to make progress. Do this in order:
 1. **Read** [`PRD.md`](./PRD.md) (5 min) — what tracker is and why
 2. **Read** [`roadmaps/01_FOUNDATIONS_AND_DECISIONS.md`](./roadmaps/01_FOUNDATIONS_AND_DECISIONS.md) (5 min) — locked decisions you must not silently change
3. **Read** [`PRODUCTION_READINESS_HANDOFF_ROADMAP.md`](./PRODUCTION_READINESS_HANDOFF_ROADMAP.md) (10 min) — day-by-day Phase 1 playbook
 4. **Pick the first unchecked task** in [`IMPLEMENTATION_TRACKER.md`](./IMPLEMENTATION_TRACKER.md) following the dependency chain (1.A → 1.B → 1.C → 1.D → 1.E → 1.F)
 5. **Do the task**, run its verification command, **commit with format** `<type>(<scope>): <subject>`, **push to origin**, then **paste the commit SHA** into the corresponding `(______)` placeholder in `IMPLEMENTATION_TRACKER.md`
 6. **Repeat** — one slice, one commit, one tracker update at a time
 > **Do not skip ahead** to Phase 2+ until Phase 1 exit criteria all pass (see [`PRODUCTION_READINESS_HANDOFF_ROADMAP.md` § Exit Criteria](./PRODUCTION_READINESS_HANDOFF_ROADMAP.md#exit-criteria-for-production-readiness-end-of-phase-1)).
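The subject format in step 5 can be checked mechanically before pushing. A minimal TypeScript sketch; `parseCommitSubject` is a hypothetical helper, and because the repo's list of allowed `<type>` values is not given here, it accepts any lowercase word rather than guessing a fixed set:

```typescript
// Matches `<type>(<scope>): <subject>` as described in step 5.
// Assumptions: type is lowercase letters, scope has no spaces or parentheses,
// and the subject is non-empty after a single space.
const COMMIT_SUBJECT = /^([a-z]+)\(([^()\s]+)\): (\S.*)$/;

interface CommitSubject {
  type: string;
  scope: string;
  subject: string;
}

function parseCommitSubject(line: string): CommitSubject | null {
  const match = COMMIT_SUBJECT.exec(line);
  if (match === null) return null;
  return { type: match[1], scope: match[2], subject: match[3] };
}

// This commit's own subject line passes; a subject without a scope does not.
console.log(parseCommitSubject("docs(tracker): close review-pass gaps in tracker docs"));
console.log(parseCommitSubject("docs: close gaps")); // null
```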
 ---
 ## 1. Purpose
 This is the **master execution tracker** for the tracker-web dashboard. Detailed implementation
@@ -147,6 +162,23 @@ Everything checked here is already shipped and running at **https://tracker.byte
 > **Target:** Sprint ending 2026-06-14
 > **Detailed plan:** [`docs/PRODUCTION_READINESS_HANDOFF_ROADMAP.md`](./PRODUCTION_READINESS_HANDOFF_ROADMAP.md)
 > **Two numbering schemes — same content.** This file groups Phase 1 by **topic** (1.1–1.8).
 > The handoff playbook groups the same work by **execution day** (1.A–1.F). Mapping:
 | Topic (this doc)                   | Day milestone (handoff)                         |
 | ---------------------------------- | ----------------------------------------------- |
 | 1.1 Infrastructure Health          | 1.A Container Health Restoration                |
 | 1.2 Docker Hardening               | 1.C Docker Hardening                            |
 | 1.3 UI Drift Ratchet               | 1.D UI Drift Ratchet                            |
 | 1.4 Rate Limiting & Spam           | _(in 1.F security cluster — execute alongside)_ |
 | 1.5 Test Coverage                  | 1.E Test Hardening                              |
 | 1.6 Error Handling & Observability | 1.F Observability + Security (obs half)         |
 | 1.7 Security                       | 1.F Observability + Security (sec half)         |
 | 1.8 Workspace Health               | 1.B Workspace Health                            |
 > **Execution order** (sequential): 1.A → 1.B → 1.C → 1.D → 1.E → 1.F. Use this doc for
 > _scope and acceptance_; use the handoff playbook for _step-by-step day-by-day execution_.
 ### 1.1 Infrastructure Health ⚠️
 - [ ] **Fix valkey (Redis) container health** — currently `unhealthy`; root cause of most downstream container failures
@@ -228,7 +260,7 @@ Everything checked here is already shipped and running at **https://tracker.byte
 > **Goal:** Items rich enough for developers, PMs, and agents to fully spec, reproduce, and track
 > work without leaving the tool.
 > **Target:** Sprint ending 2026-07-12
-> **Detailed plan:** [`docs/roadmaps/03_RICH_ITEMS_ROADMAP.md`](./roadmaps/03_RICH_ITEMS_ROADMAP.md)
+> **Detailed plan:** `docs/roadmaps/03_RICH_ITEMS_ROADMAP.md` _(to be created when Phase 2 begins)_
 ### 2.1 UI Primitives Migration to `@bytelyst/ui` 🆕
@@ -323,7 +355,7 @@ Everything checked here is already shipped and running at **https://tracker.byte
 > consume, update, and create tracker items — closing the loop between AI-assisted development
 > and project management.
 > **Target:** Sprint ending 2026-07-26
-> **Detailed plan:** [`docs/roadmaps/04_AGENT_API_ROADMAP.md`](./roadmaps/04_AGENT_API_ROADMAP.md)
+> **Detailed plan:** `docs/roadmaps/04_AGENT_API_ROADMAP.md` _(to be created when Phase 3 begins)_
 > **Dependency:** Phase 2 acceptance-criteria checklist + PR link fields must ship first.
 ### 3.1 Agent Authentication
@@ -409,7 +441,7 @@ All routes: `Authorization: Bearer <agent-key>` + `X-Product-Id: {productId}`.
 > **Goal:** Every stakeholder — public users, internal team, developers, agents — has a
 > frictionless native path to submit and track items.
 > **Target:** Sprint ending 2026-08-09
-> **Detailed plan:** [`docs/roadmaps/05_INTAKE_ROADMAP.md`](./roadmaps/05_INTAKE_ROADMAP.md)
+> **Detailed plan:** `docs/roadmaps/05_INTAKE_ROADMAP.md` _(to be created when Phase 4 begins)_
 ### 4.1 Public Submission Enhancements 🌐
--- a/dashboards/tracker-web/docs/roadmaps/00_MASTER_EXECUTION_PLAN.md
+++ b/dashboards/tracker-web/docs/roadmaps/00_MASTER_EXECUTION_PLAN.md
@@ -1,7 +1,7 @@
 # 00 — Master Execution Plan
 **Parent:** [`docs/ROADMAP.md`](../ROADMAP.md)
-**Related sub-plans:** [`01_FOUNDATIONS_AND_DECISIONS.md`](./01_FOUNDATIONS_AND_DECISIONS.md) · `03_RICH_ITEMS_ROADMAP.md` _(Phase 2)_ · `04_AGENT_API_ROADMAP.md` _(Phase 3)_ · `05_INTAKE_ROADMAP.md` _(Phase 4)_
+**Related sub-plans:** [`01_FOUNDATIONS_AND_DECISIONS.md`](./01_FOUNDATIONS_AND_DECISIONS.md) · `03_RICH_ITEMS_ROADMAP.md` _(create when Phase 2 begins)_ · `04_AGENT_API_ROADMAP.md` _(create when Phase 3 begins)_ · `05_INTAKE_ROADMAP.md` _(create when Phase 4 begins)_
 ---
--- a/dashboards/tracker-web/docs/runbooks/MEK_ROTATION.md
+++ b/dashboards/tracker-web/docs/runbooks/MEK_ROTATION.md
@@ -0,0 +1,70 @@
 # MEK Rotation Runbook
 **Status:** Stub — populate as part of Phase 1.F.11
 **Owner:** Platform team
 **Source pattern:** [`learning_ai_notes/docs/runbooks/MEK_ROTATION.md`](../../../../../learning_ai_notes/docs/runbooks/MEK_ROTATION.md) (commit `bcad7d3`)
 ---
 ## Purpose
 This runbook describes how to **rotate the Master Encryption Key (MEK)** used for
 field-level encryption of sensitive tracker data (PII fields on items, comments, attachment
 URLs, agent API key seeds).
 Tracker MEK rotation follows the same envelope-encryption pattern as NoteLett:
 1. Tracker holds a **per-product MEK reference** in env (`TRACKER_MEK_ID_<PRODUCTID>`).
 2. The MEK itself is stored in **Azure KeyVault**, never in process memory beyond
   a single request lifecycle.
 3. Each encrypted field has a `keyId` envelope marking which MEK version encrypted it.
4. Rotation creates a new MEK version; new writes use the new version; reads support
   both old and new until the re-encryption sweep completes.
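The `keyId` envelope and the dual-read window in points 3 and 4 can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the tracker-service implementation: an in-memory `Map` stands in for Azure KeyVault, and a 32-byte AES-256-GCM key stands in for the MEK itself, whereas a full envelope scheme would have the KeyVault MEK wrap per-field data keys. `EncryptedField`, `KeyRing`, `encryptField`, and `decryptField` are hypothetical names.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical envelope: keyId records which MEK version encrypted the field (point 3).
interface EncryptedField {
  keyId: string;
  iv: string; // base64, 12-byte GCM nonce
  tag: string; // base64, GCM authentication tag
  data: string; // base64 ciphertext
}

// Stand-in for Azure KeyVault: MEK version id -> 32-byte AES key (assumption).
type KeyRing = Map<string, Buffer>;

// New writes always use the key version passed in (the current one after rotation).
function encryptField(plaintext: string, keyId: string, ring: KeyRing): EncryptedField {
  const key = ring.get(keyId);
  if (key === undefined) throw new Error(`unknown keyId ${keyId}`);
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    keyId,
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

// Reads pick the key named in the envelope, so old and new versions both work
// for as long as both remain in the ring (point 4).
function decryptField(field: EncryptedField, ring: KeyRing): string {
  const key = ring.get(field.keyId);
  if (key === undefined) throw new Error(`keyId ${field.keyId} is not available`);
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(field.iv, "base64"));
  decipher.setAuthTag(Buffer.from(field.tag, "base64"));
  const plain = Buffer.concat([decipher.update(Buffer.from(field.data, "base64")), decipher.final()]);
  return plain.toString("utf8");
}
```

While both versions sit in the key ring, rows written before and after the rotation decrypt. Deleting the old entry is the in-memory analogue of disabling the old KeyVault version: any row the sweep missed then fails loudly instead of returning garbage.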
 ---
 ## Pre-rotation Checklist
 - [ ] Confirm Azure KeyVault access from tracker-service host
 - [ ] Confirm latest backup of Cosmos `tracker` container exists (RPO < 1 h)
 - [ ] Notify on-call: rotation window expected ~30 min for active read-path verification
 - [ ] Capture baseline metrics — read/write latency on encrypted fields
 ---
 ## Rotation Procedure
 > **TODO** — adapt full procedure from `learning_ai_notes/docs/runbooks/MEK_ROTATION.md`
 > once tracker-service field encryption ships in Phase 1.F. Sketch only below.
 1. **Create new MEK version in KeyVault**
   - `az keyvault key create --vault-name <vault> --name tracker-mek-<productId> --kty RSA`
   - Record new `keyId`
 2. **Update tracker-service env** with new `TRACKER_MEK_ID_<PRODUCTID>`
 3. **Rolling restart tracker-service** — new writes encrypt with new key
4. **Re-encryption sweep** — background job re-reads + re-writes all encrypted fields with the new key
 5. **Verify** — zero encrypted fields still on old key version
 6. **Revoke old key** — disable old KeyVault version
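Steps 4 and 5 reduce to an idempotent loop followed by a count. A minimal TypeScript sketch, assuming every encrypted value carries a `keyId` and that records fit in memory (a real sweep over the Cosmos `tracker` container would page through results and write back each record); `Envelope`, `reencryptSweep`, and the injected `encrypt`/`decrypt` functions are hypothetical:

```typescript
// Hypothetical stored value tagged with the MEK version that produced it.
interface Envelope {
  keyId: string;
  ciphertext: string;
}

type Decrypt = (field: Envelope) => string;
type Encrypt = (plaintext: string, keyId: string) => Envelope;

interface SweepResult {
  rewritten: number; // fields moved to the new key in this pass (step 4)
  remainingOld: number; // fields still on any other version (step 5 expects 0)
}

function reencryptSweep(
  records: Envelope[],
  newKeyId: string,
  decrypt: Decrypt,
  encrypt: Encrypt,
): SweepResult {
  let rewritten = 0;
  records.forEach((record, i) => {
    // Skipping already-migrated fields makes the sweep safe to re-run after a crash.
    if (record.keyId === newKeyId) return;
    records[i] = encrypt(decrypt(record), newKeyId);
    rewritten++;
  });
  const remainingOld = records.filter((r) => r.keyId !== newKeyId).length;
  return { rewritten, remainingOld };
}
```

Step 6 (revoking the old version) is only safe once `remainingOld` is 0, and a second run should report `rewritten: 0`.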
 ---
 ## Rollback
 If decryption fails after rotation:
 1. Revert env to previous `TRACKER_MEK_ID_<PRODUCTID>`
 2. Restart tracker-service
 3. Re-enable old KeyVault version
 4. Investigate which fields failed before retrying
 ---
 ## Verification
 - [ ] `pnpm run smoke:local` passes end-to-end after rotation
 - [ ] All encrypted fields on items / comments / attachments decrypt correctly via API
 - [ ] Audit log entry recorded for the rotation event
 ---
 _See [`SECRET_MANAGEMENT.md`](./SECRET_MANAGEMENT.md) for the broader env / KeyVault secret workflow._
--- a/dashboards/tracker-web/docs/runbooks/SECRET_MANAGEMENT.md
+++ b/dashboards/tracker-web/docs/runbooks/SECRET_MANAGEMENT.md
@@ -0,0 +1,88 @@
 # Secret Management Runbook
 **Status:** Stub — populate as part of Phase 1.F.12
 **Owner:** Platform team
 **Source pattern:** [`learning_ai_notes/docs/runbooks/SECRET_MANAGEMENT.md`](../../../../../learning_ai_notes/docs/runbooks/SECRET_MANAGEMENT.md) (commit `bcad7d3`)
 ---
 ## Purpose
 This runbook documents how secrets flow from **Azure KeyVault → env → process** for
 tracker-web (Next.js client/server) and the platform-service backend it proxies, and
 how to rotate them safely.
 ---
 ## Secret Inventory
 | Secret                              | Used by                                   | Storage                                 | Rotation cadence                                     |
 | ----------------------------------- | ----------------------------------------- | --------------------------------------- | ---------------------------------------------------- |
 | `JWT_SECRET`                        | platform-service + tracker-web API routes | KeyVault `bytelyst-jwt-secret`          | Quarterly                                            |
 | `TRACKER_MEK_ID_<PRODUCTID>`        | tracker-service field encryption          | KeyVault per-product MEK keys           | Quarterly (see [MEK_ROTATION.md](./MEK_ROTATION.md)) |
 | `POSTHOG_KEY`                       | tracker-web client-side telemetry         | KeyVault `tracker-posthog-key`          | On compromise only                                   |
 | `COSMOS_CONNECTION_STRING`          | platform-service Cosmos client            | KeyVault `bytelyst-cosmos-conn`         | On compromise only                                   |
 | `VALKEY_PASSWORD`                   | platform-service session/cache            | KeyVault `bytelyst-valkey-password`     | Quarterly                                            |
 | `TURNSTILE_SECRET` (Phase 1.4)      | tracker-web public submission CAPTCHA     | KeyVault `tracker-turnstile-secret`     | On compromise only                                   |
 | `GITHUB_WEBHOOK_SECRET` (Phase 3.3) | tracker-service inbound webhook HMAC      | KeyVault `tracker-gh-webhook-secret`    | On compromise only                                   |
 | `GITEA_WEBHOOK_SECRET` (Phase 3.3)  | tracker-service inbound webhook HMAC      | KeyVault `tracker-gitea-webhook-secret` | On compromise only                                   |
 | Agent API keys (Phase 3.1)          | end-user-managed; stored hashed in Cosmos | Cosmos `apikeys` collection             | User-managed                                         |
 ---
 ## Resolution Path
 ```
KeyVault ─┬─→  Docker build args (NEXT_PUBLIC_*)  →  baked into next-build
          ├─→  docker-compose env_file            →  process.env at runtime
          └─→  systemd EnvironmentFile            →  process.env at runtime
 ```
 - **Build-time secrets** (`NEXT_PUBLIC_*`) are baked into the Next.js standalone build via
  Dockerfile `ARG` + `ENV` (Phase 1.C.3 / 1.2). Once baked they are visible in client JS;
  only put truly public-safe values here.
 - **Runtime secrets** flow via docker-compose `env_file` or systemd `EnvironmentFile` so
  they are not visible in client bundles.
 - **No secret ever** appears in `git`, `.env.example`, log lines, or container `inspect`
  output.
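A boot-time guard can enforce both rules for runtime secrets: each must be present, and none may also exist under a `NEXT_PUBLIC_` name that would inline it into client JS. A minimal TypeScript sketch; `checkSecretPlacement` is hypothetical, and which inventory entries a given service must check is an assumption left to the caller. Messages name variables but never include their values:

```typescript
// Hypothetical boot-time check. `runtimeSecrets` lists the inventory entries a
// service reads from process.env; returned messages never contain secret values.
function checkSecretPlacement(
  env: Record<string, string | undefined>,
  runtimeSecrets: readonly string[],
): string[] {
  const problems: string[] = [];
  for (const name of runtimeSecrets) {
    if (!env[name]) {
      problems.push(`missing runtime secret: ${name}`);
    }
    // A NEXT_PUBLIC_ copy would be inlined into client bundles at build time.
    if (env[`NEXT_PUBLIC_${name}`] !== undefined) {
      problems.push(`runtime secret exposed as NEXT_PUBLIC_${name}`);
    }
  }
  return problems;
}
```

For example, a tracker-web server entry point might call `checkSecretPlacement(process.env, ["JWT_SECRET"])` and refuse to start if the returned list is non-empty.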
 ---
 ## Rotation Procedure
 > **TODO** — adapt full procedure from `learning_ai_notes/docs/runbooks/SECRET_MANAGEMENT.md`
 > once tracker-service ships secret-aware deployment in Phase 1.F.
 1. **Create new secret version in KeyVault**
 2. **Update env source** (docker-compose `.env`, systemd unit, or CI/CD secret store)
 3. **Rolling restart** affected services
 4. **Verify** — `pnpm run smoke:local` passes against rotated stack
 5. **Revoke previous secret version** in KeyVault after 24 h soak
 ---
 ## On Suspected Compromise
 1. **Immediately revoke** the suspected secret in KeyVault
 2. **Rotate** all dependent secrets in the same blast radius
 3. **Force-revoke** all JWT tokens (bump `JWT_SECRET` → all sessions invalidated)
 4. **Audit** access logs since suspected compromise window
5. **File a tracker item** of type `chore`, label `security`, priority `critical`
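Step 3 works because a token's signature is a keyed MAC over its header and payload: once `JWT_SECRET` changes, no previously issued token verifies. A minimal TypeScript sketch, assuming HS256 (HMAC-SHA256) signing with `JWT_SECRET`, which the variable name suggests but this runbook does not state; `sign` and `verify` are hypothetical helpers, and production code should keep using its JWT library rather than hand-rolled verification:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// HS256-style signature: base64url(HMAC-SHA256(secret, "<header>.<payload>")).
function sign(signingInput: string, secret: string): string {
  return createHmac("sha256", secret).update(signingInput).digest("base64url");
}

// Checks only the signature (no exp/nbf claims), which is enough to show the effect of rotation.
function verify(token: string, secret: string): boolean {
  const lastDot = token.lastIndexOf(".");
  if (lastDot <= 0) return false;
  const expected = Buffer.from(sign(token.slice(0, lastDot), secret));
  const actual = Buffer.from(token.slice(lastDot + 1));
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```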
 ---
 ## PII Scrubbing Rule
 Per Phase 1.F.10: emails, names, and any field marked `pii: true` in the schema must NEVER
 appear as plaintext in:
 - Log lines (use `@bytelyst/logger` redaction map)
 - Telemetry events sent to PostHog
 - Error messages bubbled to clients
 - Webhook delivery logs (Phase 3.4 / 3.5)
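The log-line case of this rule can be sketched as a recursive redaction pass over structured payloads. A minimal TypeScript sketch, assuming PII is identified by field name (in practice the set would come from fields marked `pii: true` in the schema); `redact` and `Json` are hypothetical and do not describe the `@bytelyst/logger` redaction-map API:

```typescript
// Hypothetical helper, not the @bytelyst/logger API: replaces the value of any
// key in `piiKeys`, at any nesting depth, before a payload is logged or sent.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function redact(value: Json, piiKeys: ReadonlySet<string>): Json {
  if (Array.isArray(value)) {
    return value.map((item) => redact(item, piiKeys));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = piiKeys.has(key) ? "[REDACTED]" : redact(inner, piiKeys);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}

const payload = { id: 7, email: "ann@example.com", comments: [{ name: "Ann", body: "Repro steps attached" }] };
console.log(JSON.stringify(redact(payload, new Set(["email", "name"]))));
// {"id":7,"email":"[REDACTED]","comments":[{"name":"[REDACTED]","body":"Repro steps attached"}]}
```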
 ---
 _See [`MEK_ROTATION.md`](./MEK_ROTATION.md) for field-level encryption key rotation specifically._