docs(tracker): close review-pass gaps in tracker docs
Some checks failed
CI — Common Platform / Build, Test & Typecheck (push) Failing after 12m58s
Publish @bytelyst/* packages / publish (push) Failing after 12m26s

1) Dual-numbering reconciliation
   - ROADMAP groups Phase 1 by topic (1.1-1.8); PRH groups by execution
     day (1.A-1.F). Added bidirectional mapping table to both docs so
     agents can cross-reference any phase reference unambiguously.

2) Fresh-agent quick pointer at top of ROADMAP
   - New section tells a new agent exactly which 4 docs to read, in
     what order, and which task to pick up first (1.A from the tracker).

3) Broken sub-roadmap links neutralised
   - 03_RICH_ITEMS_ROADMAP.md, 04_AGENT_API_ROADMAP.md,
     05_INTAKE_ROADMAP.md were linked but did not exist. Replaced with
     plain text + 'create when Phase N begins' note so the link doesn't
     404. Matches the pattern already used in IMPLEMENTATION_TRACKER.

4) Runbook stubs created (Phase 1.F.11/1.F.12 placeholders)
   - docs/runbooks/MEK_ROTATION.md — adapted from NoteLett bcad7d3
   - docs/runbooks/SECRET_MANAGEMENT.md — secret inventory + resolution
     path + compromise procedure + PII scrubbing rule
   Each is a stub now; full content lands when Phase 1.F executes.

5) Stale 'today' wording removed from PRH baseline table
   - Replaced 'after fix today' with 'as of 2026-05-25' so the doc
     ages cleanly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
root 2026-05-25 11:48:33 +00:00
parent 930f97ff63
commit 59c4638f85
5 changed files with 215 additions and 5 deletions


@@ -30,7 +30,7 @@ stated verification, commit with the repo convention, push, and record the commi
| **Tests present** | Vitest scaffolding + ~6 unit tests · Playwright scaffold · no `@axe-core/playwright` yet |
| **DevOps present** | `Dockerfile` (standalone Next.js) · `docker-compose.yml` · `vercel.json` · no `scripts/docker-prep.sh` yet (must adopt from NoteLett/FlowMonk) |
| **Common platform packages used** | `@bytelyst/api-client` · `@bytelyst/dashboard-components` · `@bytelyst/react-auth` · `@bytelyst/logger` · `@bytelyst/telemetry-client` |
-| **Deployed status** | `chronomind-web` healthy after fix today · `tracker-web` container `unhealthy` (B-001/B-002 root cause: valkey) |
+| **Deployed status** | `tracker-web` container `unhealthy` as of 2026-05-25 (B-001/B-002 root cause: valkey cascade — see Phase 1.A) |
---
@@ -46,6 +46,26 @@ stated verification, commit with the repo convention, push, and record the commi
---
## Mapping to ROADMAP.md Topic Sections
This playbook groups work by **execution day**. The master roadmap groups the same work by
**topic**. Cross-reference when assigning tickets or marking progress:
| Day milestone (this doc) | Topic (ROADMAP.md §) |
| -------------------------------- | --------------------------------------------------------------------- |
| 1.A Container Health Restoration | 1.1 Infrastructure Health |
| 1.B Workspace Health | 1.8 Workspace Health |
| 1.C Docker Hardening | 1.2 Docker Hardening |
| 1.D UI Drift Ratchet | 1.3 UI Drift Ratchet |
| 1.E Test Hardening | 1.5 Test Coverage |
| 1.F Observability + Security | 1.6 Error Handling & Observability + 1.7 Security + 1.4 Rate Limiting |
Mark progress under **both** numbering schemes when checking off in
[`IMPLEMENTATION_TRACKER.md`](./IMPLEMENTATION_TRACKER.md) — agents reading either doc
must be able to find the same checkboxes.
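To keep the two mapping tables from drifting, the mapping can also be held as data and inverted mechanically. The sketch below is a hypothetical TypeScript helper (`DAY_TO_TOPICS` and `topicToDay` are illustrative names, not existing repo code) that copies the table above and fails loudly if any topic section is claimed by two day milestones.

```typescript
// Day milestone (handoff playbook) -> topic sections (ROADMAP.md), copied from the table above.
const DAY_TO_TOPICS: Record<string, string[]> = {
  "1.A": ["1.1"],
  "1.B": ["1.8"],
  "1.C": ["1.2"],
  "1.D": ["1.3"],
  "1.E": ["1.5"],
  "1.F": ["1.6", "1.7", "1.4"],
};

// Reverse lookup: topic section -> day milestone.
// Throws if a topic appears under more than one day, i.e. the mapping is ambiguous.
function topicToDay(map: Record<string, string[]>): Map<string, string> {
  const out = new Map<string, string>();
  for (const [day, topics] of Object.entries(map)) {
    for (const topic of topics) {
      if (out.has(topic)) {
        throw new Error(`topic ${topic} mapped to both ${out.get(topic)} and ${day}`);
      }
      out.set(topic, day);
    }
  }
  return out;
}
```

An agent holding a topic reference such as `1.4` can then resolve it with `topicToDay(DAY_TO_TOPICS).get("1.4")`, which yields `"1.F"`.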
---
## Phase 1.A — Container Health Restoration (Day 1)
**Goal:** Get all tracker-related containers healthy. Block on this — everything else cascades.


@@ -12,6 +12,21 @@
---
## ⚡ Where to Start (Fresh-Agent Quick Pointer)
You just opened this repo and want to make progress. Do this in order:
1. **Read** [`PRD.md`](./PRD.md) (5 min) — what tracker is and why
2. **Read** [`roadmaps/01_FOUNDATIONS_AND_DECISIONS.md`](./roadmaps/01_FOUNDATIONS_AND_DECISIONS.md) (5 min) — locked decisions you must not silently change
3. **Read** [`PRODUCTION_READINESS_HANDOFF_ROADMAP.md`](./PRODUCTION_READINESS_HANDOFF_ROADMAP.md) (10 min) — Day-by-day Phase 1 playbook
4. **Pick the first unchecked task** in [`IMPLEMENTATION_TRACKER.md`](./IMPLEMENTATION_TRACKER.md) following the dependency chain (1.A → 1.B → 1.C → 1.D → 1.E → 1.F)
5. **Do the task**, run its verification command, **commit with format** `<type>(<scope>): <subject>`, **push to origin**, then **paste the commit SHA** into the corresponding `(______)` placeholder in `IMPLEMENTATION_TRACKER.md`
6. **Repeat** — one slice, one commit, one tracker update at a time
> **Do not skip ahead** to Phase 2+ until Phase 1 exit criteria all pass (see [`PRODUCTION_READINESS_HANDOFF_ROADMAP.md` § Exit Criteria](./PRODUCTION_READINESS_HANDOFF_ROADMAP.md#exit-criteria-for-production-readiness-end-of-phase-1)).
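The commit format in step 5 can be checked before pushing. A minimal sketch, assuming a Conventional-Commits-style type list and a required lowercase scope; the type list and `isValidCommitSubject` are assumptions, not the repo's actual commit-lint configuration.

```typescript
// Assumed type list; adjust to the repo convention if it differs.
const COMMIT_TYPES = ["feat", "fix", "docs", "chore", "refactor", "test", "ci", "build", "perf", "style"];

// <type>(<scope>): <subject>, with a non-empty scope and a subject that does not start with whitespace.
const SUBJECT_RE = new RegExp(`^(${COMMIT_TYPES.join("|")})\\(([a-z0-9-]+)\\): (\\S.*)$`);

function isValidCommitSubject(line: string): boolean {
  return SUBJECT_RE.test(line);
}
```

This commit's own subject, `docs(tracker): close review-pass gaps in tracker docs`, passes the check; `docs: missing scope` does not.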
---
## 1. Purpose
This is the **master execution tracker** for the tracker-web dashboard. Detailed implementation
@@ -147,6 +162,23 @@ Everything checked here is already shipped and running at **https://tracker.byte
> **Target:** Sprint ending 2026-06-14
> **Detailed plan:** [`docs/PRODUCTION_READINESS_HANDOFF_ROADMAP.md`](./PRODUCTION_READINESS_HANDOFF_ROADMAP.md)
> **Two numbering schemes — same content.** This file groups Phase 1 by **topic** (1.1-1.8).
> The handoff playbook groups the same work by **execution day** (1.A-1.F). Mapping:
| Topic (this doc) | Day milestone (handoff) |
| ---------------------------------- | ----------------------------------------------- |
| 1.1 Infrastructure Health | 1.A Container Health Restoration |
| 1.2 Docker Hardening | 1.C Docker Hardening |
| 1.3 UI Drift Ratchet | 1.D UI Drift Ratchet |
| 1.4 Rate Limiting & Spam | _(in 1.F security cluster — execute alongside)_ |
| 1.5 Test Coverage | 1.E Test Hardening |
| 1.6 Error Handling & Observability | 1.F Observability + Security (obs half) |
| 1.7 Security | 1.F Observability + Security (sec half) |
| 1.8 Workspace Health | 1.B Workspace Health |
> **Execution order** (sequential): 1.A → 1.B → 1.C → 1.D → 1.E → 1.F. Use this doc for
> _scope and acceptance_; use the handoff playbook for _step-by-step day-by-day execution_.
### 1.1 Infrastructure Health ⚠️
- [ ] **Fix valkey (Redis) container health** — currently `unhealthy`; root cause of most downstream container failures
@@ -228,7 +260,7 @@ Everything checked here is already shipped and running at **https://tracker.byte
> **Goal:** Items rich enough for developers, PMs, and agents to fully spec, reproduce, and track
> work without leaving the tool.
> **Target:** Sprint ending 2026-07-12
-> **Detailed plan:** [`docs/roadmaps/03_RICH_ITEMS_ROADMAP.md`](./roadmaps/03_RICH_ITEMS_ROADMAP.md)
+> **Detailed plan:** `docs/roadmaps/03_RICH_ITEMS_ROADMAP.md` _(to be created when Phase 2 begins)_
### 2.1 UI Primitives Migration to `@bytelyst/ui` 🆕
@@ -323,7 +355,7 @@ Everything checked here is already shipped and running at **https://tracker.byte
> consume, update, and create tracker items — closing the loop between AI-assisted development
> and project management.
> **Target:** Sprint ending 2026-07-26
-> **Detailed plan:** [`docs/roadmaps/04_AGENT_API_ROADMAP.md`](./roadmaps/04_AGENT_API_ROADMAP.md)
+> **Detailed plan:** `docs/roadmaps/04_AGENT_API_ROADMAP.md` _(to be created when Phase 3 begins)_
> **Dependency:** Phase 2 acceptance-criteria checklist + PR link fields must ship first.
### 3.1 Agent Authentication
@@ -409,7 +441,7 @@ All routes: `Authorization: Bearer <agent-key>` + `X-Product-Id: {productId}`.
> **Goal:** Every stakeholder — public users, internal team, developers, agents — has a
> frictionless native path to submit and track items.
> **Target:** Sprint ending 2026-08-09
-> **Detailed plan:** [`docs/roadmaps/05_INTAKE_ROADMAP.md`](./roadmaps/05_INTAKE_ROADMAP.md)
+> **Detailed plan:** `docs/roadmaps/05_INTAKE_ROADMAP.md` _(to be created when Phase 4 begins)_
### 4.1 Public Submission Enhancements 🌐


@@ -1,7 +1,7 @@
# 00 — Master Execution Plan
**Parent:** [`docs/ROADMAP.md`](../ROADMAP.md)
-**Related sub-plans:** [`01_FOUNDATIONS_AND_DECISIONS.md`](./01_FOUNDATIONS_AND_DECISIONS.md) · `03_RICH_ITEMS_ROADMAP.md` _(Phase 2)_ · `04_AGENT_API_ROADMAP.md` _(Phase 3)_ · `05_INTAKE_ROADMAP.md` _(Phase 4)_
+**Related sub-plans:** [`01_FOUNDATIONS_AND_DECISIONS.md`](./01_FOUNDATIONS_AND_DECISIONS.md) · `03_RICH_ITEMS_ROADMAP.md` _(create when Phase 2 begins)_ · `04_AGENT_API_ROADMAP.md` _(create when Phase 3 begins)_ · `05_INTAKE_ROADMAP.md` _(create when Phase 4 begins)_
---


@@ -0,0 +1,70 @@
# MEK Rotation Runbook
**Status:** Stub — populate as part of Phase 1.F.11
**Owner:** Platform team
**Source pattern:** [`learning_ai_notes/docs/runbooks/MEK_ROTATION.md`](../../../../../learning_ai_notes/docs/runbooks/MEK_ROTATION.md) (commit `bcad7d3`)
---
## Purpose
This runbook describes how to **rotate the Master Encryption Key (MEK)** used for
field-level encryption of sensitive tracker data (PII fields on items, comments, attachment
URLs, agent API key seeds).
Tracker MEK rotation follows the same envelope-encryption pattern as NoteLett:
1. Tracker holds a **per-product MEK reference** in env (`TRACKER_MEK_ID_<PRODUCTID>`).
2. The MEK itself is stored in **Azure KeyVault**, never in process memory beyond
a single request lifecycle.
3. Each encrypted field has a `keyId` envelope marking which MEK version encrypted it.
4. Rotation creates a new MEK version; new writes use the new version; reads support
both old and new until reencryption sweep completes.
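Points 3 and 4 above (a per-field `keyId`, and reads that accept both versions) can be sketched as follows. This is a minimal illustration only: it swaps in local AES-256-GCM keys held in a `Map` in place of Azure KeyVault, whereas the procedure below creates the MEK as an RSA key in KeyVault, which would wrap data keys rather than encrypt fields directly. `Envelope`, `KeyRing`, `encryptField`, and `decryptField` are hypothetical names.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical envelope: `keyId` records which MEK version encrypted the field.
interface Envelope {
  keyId: string;
  iv: string;   // base64, 12-byte GCM nonce
  tag: string;  // base64, 16-byte GCM auth tag
  data: string; // base64 ciphertext
}

// Stand-in for KeyVault: MEK version id -> 32-byte AES key (illustration only).
type KeyRing = Map<string, Buffer>;

// New writes always use the current version.
function encryptField(plaintext: string, keys: KeyRing, currentKeyId: string): Envelope {
  const key = keys.get(currentKeyId);
  if (!key) throw new Error(`unknown MEK version: ${currentKeyId}`);
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    keyId: currentKeyId,
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

// Reads look up whichever version the envelope names, so old and new both decrypt
// until the old version is removed from the ring.
function decryptField(env: Envelope, keys: KeyRing): string {
  const key = keys.get(env.keyId);
  if (!key) throw new Error(`MEK version not available: ${env.keyId}`);
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(env.iv, "base64"));
  decipher.setAuthTag(Buffer.from(env.tag, "base64"));
  return Buffer.concat([
    decipher.update(Buffer.from(env.data, "base64")),
    decipher.final(), // throws if the tag does not verify (wrong key or tampering)
  ]).toString("utf8");
}
```

Keeping `keyId` beside the ciphertext is what lets the read path stay version-agnostic during the sweep.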
---
## Pre-rotation Checklist
- [ ] Confirm Azure KeyVault access from tracker-service host
- [ ] Confirm latest backup of Cosmos `tracker` container exists (RPO < 1 h)
- [ ] Notify on-call: rotation window expected ~30 min for active read-path verification
- [ ] Capture baseline metrics — read/write latency on encrypted fields
---
## Rotation Procedure
> **TODO** — adapt full procedure from `learning_ai_notes/docs/runbooks/MEK_ROTATION.md`
> once tracker-service field encryption ships in Phase 1.F. Sketch only below.
1. **Create new MEK version in KeyVault**
- `az keyvault key create --vault-name <vault> --name tracker-mek-<productId> --kty RSA`
- Record new `keyId`
2. **Update tracker-service env** with new `TRACKER_MEK_ID_<PRODUCTID>`
3. **Rolling restart tracker-service** — new writes encrypt with new key
4. **Reencryption sweep** — background job re-reads + re-writes all encrypted fields with new key
5. **Verify** — zero encrypted fields still on old key version
6. **Revoke old key** — disable old KeyVault version
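Steps 4 and 5 amount to a loop that rewrites every field not yet on the current version, then counts what is left. The sketch below assumes a hypothetical `EncryptedRow` shape and injects `encrypt`/`decrypt` so it is independent of the key backend; a real sweep over the Cosmos `tracker` container would page through items and upsert them, which is omitted here.

```typescript
// Hypothetical row: one encrypted field, tagged with the MEK version that wrote it.
interface EncryptedRow {
  id: string;
  keyId: string;
  ciphertext: string;
}

interface SweepDeps {
  currentKeyId: string;
  decrypt(row: EncryptedRow): string;                // must accept old and new versions
  encrypt(plaintext: string, keyId: string): string; // encrypts under the given version
}

interface SweepResult {
  reencrypted: number;
  alreadyCurrent: number;
  remainingOnOldKey: number;
}

// Step 4: re-read and re-write every row not yet on the current version (mutates rows in place).
function reencryptionSweep(rows: EncryptedRow[], deps: SweepDeps): SweepResult {
  let reencrypted = 0;
  let alreadyCurrent = 0;
  for (const row of rows) {
    if (row.keyId === deps.currentKeyId) {
      alreadyCurrent++; // written after the rolling restart; nothing to do
      continue;
    }
    const plaintext = deps.decrypt(row);
    row.ciphertext = deps.encrypt(plaintext, deps.currentKeyId);
    row.keyId = deps.currentKeyId;
    reencrypted++;
  }
  // Step 5: this must be zero before the old version is disabled in step 6.
  const remainingOnOldKey = rows.filter((r) => r.keyId !== deps.currentKeyId).length;
  return { reencrypted, alreadyCurrent, remainingOnOldKey };
}
```

Skipping rows already on the current version makes the sweep safe to re-run after a partial failure.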
---
## Rollback
If decryption fails after rotation:
1. Re-enable the old KeyVault version if step 6 has already disabled it (otherwise the restart below still cannot decrypt)
2. Revert env to previous `TRACKER_MEK_ID_<PRODUCTID>`
3. Restart tracker-service
4. Investigate which fields failed before retrying
---
## Verification
- [ ] `pnpm run smoke:local` passes end-to-end after rotation
- [ ] All encrypted fields on items / comments / attachments decrypt correctly via API
- [ ] Audit log entry recorded for the rotation event
---
_See [`SECRET_MANAGEMENT.md`](./SECRET_MANAGEMENT.md) for the broader env / KeyVault secret workflow._


@@ -0,0 +1,88 @@
# Secret Management Runbook
**Status:** Stub — populate as part of Phase 1.F.12
**Owner:** Platform team
**Source pattern:** [`learning_ai_notes/docs/runbooks/SECRET_MANAGEMENT.md`](../../../../../learning_ai_notes/docs/runbooks/SECRET_MANAGEMENT.md) (commit `bcad7d3`)
---
## Purpose
This runbook documents how secrets flow from **Azure KeyVault → env → process** for
tracker-web (Next.js client/server) and the platform-service backend it proxies, and
how to rotate them safely.
---
## Secret Inventory
| Secret | Used by | Storage | Rotation cadence |
| ----------------------------------- | ----------------------------------------- | --------------------------------------- | ---------------------------------------------------- |
| `JWT_SECRET` | platform-service + tracker-web API routes | KeyVault `bytelyst-jwt-secret` | Quarterly |
| `TRACKER_MEK_ID_<PRODUCTID>` | tracker-service field encryption | KeyVault per-product MEK keys | Quarterly (see [MEK_ROTATION.md](./MEK_ROTATION.md)) |
| `POSTHOG_KEY` | tracker-web client-side telemetry | KeyVault `tracker-posthog-key` | On compromise only |
| `COSMOS_CONNECTION_STRING` | platform-service Cosmos client | KeyVault `bytelyst-cosmos-conn` | On compromise only |
| `VALKEY_PASSWORD` | platform-service session/cache | KeyVault `bytelyst-valkey-password` | Quarterly |
| `TURNSTILE_SECRET` (Phase 1.4) | tracker-web public submission CAPTCHA | KeyVault `tracker-turnstile-secret` | On compromise only |
| `GITHUB_WEBHOOK_SECRET` (Phase 3.3) | tracker-service inbound webhook HMAC | KeyVault `tracker-gh-webhook-secret` | On compromise only |
| `GITEA_WEBHOOK_SECRET` (Phase 3.3) | tracker-service inbound webhook HMAC | KeyVault `tracker-gitea-webhook-secret` | On compromise only |
| Agent API keys (Phase 3.1) | end-user-managed; stored hashed in Cosmos | Cosmos `apikeys` collection | User-managed |
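The cadence column can drive a simple overdue check. A sketch, assuming "Quarterly" means a maximum age of 90 days and that rotation dates are recorded somewhere the check can read (both are assumptions; the runbook does not say where). Only quarterly entries are checked, since on-compromise and user-managed secrets have no fixed schedule; `SecretEntry` and `overdueSecrets` are hypothetical names.

```typescript
type Cadence = "quarterly" | "on-compromise" | "user-managed";

interface SecretEntry {
  name: string;
  cadence: Cadence;
  lastRotated?: Date; // hypothetical: wherever the team records rotation dates
}

// Assumption: "Quarterly" is treated as a 90-day maximum age.
const QUARTER_MS = 90 * 24 * 60 * 60 * 1000;

// Returns quarterly secrets that have never been rotated or are older than 90 days,
// in inventory order.
function overdueSecrets(inventory: SecretEntry[], now: Date): string[] {
  return inventory
    .filter((s) => s.cadence === "quarterly")
    .filter((s) => !s.lastRotated || now.getTime() - s.lastRotated.getTime() > QUARTER_MS)
    .map((s) => s.name);
}
```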
---
## Resolution Path
```
KeyVault → Docker build args (NEXT_PUBLIC_*) → baked into next-build
         → docker-compose env_file           → process.env at runtime
         → systemd EnvironmentFile           → process.env at runtime
```
- **Build-time values** (`NEXT_PUBLIC_*`) are baked into the Next.js standalone build via
  Dockerfile `ARG` + `ENV` (Phase 1.C.3 / 1.2). Once baked, they are visible in client JS, so
  they are not secrets: only put truly public-safe values here.
- **Runtime secrets** flow via docker-compose `env_file` or systemd `EnvironmentFile` so
they are not visible in client bundles.
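- **No secret ever** appears in `git`, `.env.example`, log lines, or container `inspect`
  output. Note that variables injected via `env_file` do appear under `Config.Env` in
  `docker inspect`, so meeting the last condition requires file-mounted secrets rather than
  environment variables.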
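A startup guard can enforce the runtime/build-time split above. This is a hypothetical helper (`checkSecretEnv` is not existing repo code): it reports required runtime secrets that are missing, and any secret that has also been defined under a `NEXT_PUBLIC_` name, since Next.js inlines `NEXT_PUBLIC_*` values into client bundles.

```typescript
// Returns a list of problems; an empty list means the environment looks safe.
function checkSecretEnv(
  env: Record<string, string | undefined>,
  requiredRuntimeSecrets: readonly string[],
): string[] {
  const problems: string[] = [];
  for (const name of requiredRuntimeSecrets) {
    if (!env[name]) problems.push(`missing runtime secret: ${name}`);
    // A secret copied under a NEXT_PUBLIC_ name would end up in client JS.
    if (env[`NEXT_PUBLIC_${name}`] !== undefined) {
      problems.push(`secret exposed via NEXT_PUBLIC_${name}`);
    }
  }
  return problems;
}
```

For example, a tracker-web server entry point could call `checkSecretEnv(process.env, ["JWT_SECRET"])` and refuse to start if the result is non-empty.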
---
## Rotation Procedure
> **TODO** — adapt full procedure from `learning_ai_notes/docs/runbooks/SECRET_MANAGEMENT.md`
> once tracker-service ships secret-aware deployment in Phase 1.F.
1. **Create new secret version in KeyVault**
2. **Update env source** (docker-compose `.env`, systemd unit, or CI/CD secret store)
3. **Rolling restart** affected services
4. **Verify**: `pnpm run smoke:local` passes against the rotated stack
5. **Revoke previous secret version** in KeyVault after 24 h soak
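The 24 h soak before step 5 implies a window in which senders may still sign with the previous secret. For the inbound webhook HMAC secrets in the inventory, one way to handle that is to accept either version until the old one is revoked. A sketch, assuming GitHub's `X-Hub-Signature-256: sha256=<hex>` header format and that the raw request body is available as a string; `verifyWebhook` and `hmacHex` are hypothetical names.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

function hmacHex(secret: string, body: string): string {
  return createHmac("sha256", secret).update(body, "utf8").digest("hex");
}

// Accepts a "sha256=<hex>" signature made with the current secret or, during the
// soak window only, the previous one. Pass `previous` as undefined once it is revoked.
function verifyWebhook(
  body: string,
  signatureHeader: string,
  current: string,
  previous?: string,
): boolean {
  const prefix = "sha256=";
  if (!signatureHeader.startsWith(prefix)) return false;
  const received = Buffer.from(signatureHeader.slice(prefix.length), "hex");
  const candidates = previous ? [current, previous] : [current];
  return candidates.some((secret) => {
    const expected = Buffer.from(hmacHex(secret, body), "hex");
    // timingSafeEqual throws on length mismatch, so compare lengths first.
    return expected.length === received.length && timingSafeEqual(expected, received);
  });
}
```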
---
## On Suspected Compromise
1. **Immediately revoke** the suspected secret in KeyVault
2. **Rotate** all dependent secrets in the same blast radius
3. **Force-revoke** all JWT tokens (bump `JWT_SECRET` → all sessions invalidated)
4. **Audit** access logs since suspected compromise window
5. **File a tracker item** of type `chore`, label `security`, priority `critical`
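Step 3 works because every session token's signature depends on the secret. The stripped-down sketch below assumes tracker sessions are HS256 JWTs signed with `JWT_SECRET` (the runbook implies a shared secret but does not name the algorithm). It checks the signature only, with no expiry or claim validation; production code would use a maintained JWT library rather than this. `signHs256` and `verifyHs256` are hypothetical names.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const b64url = (buf: Buffer): string => buf.toString("base64url");

function signHs256(payload: object, secret: string): string {
  const header = b64url(Buffer.from(JSON.stringify({ alg: "HS256", typ: "JWT" })));
  const body = b64url(Buffer.from(JSON.stringify(payload)));
  const sig = createHmac("sha256", secret).update(`${header}.${body}`).digest();
  return `${header}.${body}.${b64url(sig)}`;
}

// Signature check only: a token signed under the old secret fails once the secret changes.
function verifyHs256(token: string, secret: string): boolean {
  const parts = token.split(".");
  if (parts.length !== 3) return false;
  const [header, body, sig] = parts;
  const expected = createHmac("sha256", secret).update(`${header}.${body}`).digest();
  const received = Buffer.from(sig, "base64url");
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```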
---
## PII Scrubbing Rule
Per Phase 1.F.10: emails, names, and any field marked `pii: true` in the schema must NEVER
appear as plaintext in:
- Log lines (use `@bytelyst/logger` redaction map)
- Telemetry events sent to PostHog
- Error messages bubbled to clients
- Webhook delivery logs (Phase 3.4 / 3.5)
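A minimal redaction pass in the spirit of this rule could run on an object before it reaches a log line or telemetry event. The field names and the email pattern below are assumptions, and the sketch does not use the `@bytelyst/logger` redaction map, whose API is not shown here. Names inside free text cannot be caught this way, which is why schema-level `pii: true` flags matter.

```typescript
// Hypothetical: field names flagged `pii: true` in the schema.
const PII_FIELDS = new Set(["email", "name"]);
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Recursively replaces PII-flagged fields and masks email addresses inside strings.
function scrubPii(value: unknown): unknown {
  if (typeof value === "string") return value.replace(EMAIL_RE, "[redacted-email]");
  if (Array.isArray(value)) return value.map(scrubPii);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value)) {
      out[k] = PII_FIELDS.has(k) ? "[redacted]" : scrubPii(v);
    }
    return out;
  }
  return value; // numbers, booleans, null pass through unchanged
}
```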
---
_See [`MEK_ROTATION.md`](./MEK_ROTATION.md) for field-level encryption key rotation specifically._