diff --git a/dashboards/tracker-web/docs/PRODUCTION_READINESS_HANDOFF_ROADMAP.md b/dashboards/tracker-web/docs/PRODUCTION_READINESS_HANDOFF_ROADMAP.md
index f0e40edb..a2311b07 100644
--- a/dashboards/tracker-web/docs/PRODUCTION_READINESS_HANDOFF_ROADMAP.md
+++ b/dashboards/tracker-web/docs/PRODUCTION_READINESS_HANDOFF_ROADMAP.md
@@ -30,7 +30,7 @@ stated verification, commit with the repo convention, push, and record the commi
 | **Tests present** | Vitest scaffolding + ~6 unit tests · Playwright scaffold · no `@axe-core/playwright` yet |
 | **DevOps present** | `Dockerfile` (standalone Next.js) · `docker-compose.yml` · `vercel.json` · no `scripts/docker-prep.sh` yet (must adopt from NoteLett/FlowMonk) |
 | **Common platform packages used** | `@bytelyst/api-client` · `@bytelyst/dashboard-components` · `@bytelyst/react-auth` · `@bytelyst/logger` · `@bytelyst/telemetry-client` |
-| **Deployed status** | `chronomind-web` healthy after fix today · `tracker-web` container `unhealthy` (B-001/B-002 root cause: valkey) |
+| **Deployed status** | `tracker-web` container `unhealthy` as of 2026-05-25 (B-001/B-002 root cause: valkey cascade — see Phase 1.A) |

 ---

@@ -46,6 +46,26 @@ stated verification, commit with the repo convention, push, and record the commi

 ---

+## Mapping to ROADMAP.md Topic Sections
+
+This playbook groups work by **execution day**. The master roadmap groups the same work by
+**topic**.
Cross-reference when assigning tickets or marking progress:
+
+| Day milestone (this doc)         | Topic (ROADMAP.md §)                                                  |
+| -------------------------------- | --------------------------------------------------------------------- |
+| 1.A Container Health Restoration | 1.1 Infrastructure Health                                             |
+| 1.B Workspace Health             | 1.8 Workspace Health                                                  |
+| 1.C Docker Hardening             | 1.2 Docker Hardening                                                  |
+| 1.D UI Drift Ratchet             | 1.3 UI Drift Ratchet                                                  |
+| 1.E Test Hardening               | 1.5 Test Coverage                                                     |
+| 1.F Observability + Security     | 1.6 Error Handling & Observability + 1.7 Security + 1.4 Rate Limiting |
+
+Mark progress under **both** numbering schemes when checking off in
+[`IMPLEMENTATION_TRACKER.md`](./IMPLEMENTATION_TRACKER.md) — agents reading either doc
+must be able to find the same checkboxes.
+
+---
+
 ## Phase 1.A — Container Health Restoration (Day 1)

 **Goal:** Get all tracker-related containers healthy. Block on this — everything else cascades.
diff --git a/dashboards/tracker-web/docs/ROADMAP.md b/dashboards/tracker-web/docs/ROADMAP.md
index 1adb7171..2b92f229 100644
--- a/dashboards/tracker-web/docs/ROADMAP.md
+++ b/dashboards/tracker-web/docs/ROADMAP.md
@@ -12,6 +12,21 @@

 ---

+## ⚡ Where to Start (Fresh-Agent Quick Pointer)
+
+You just opened this repo and want to make progress. Do this in order:
+
+1. **Read** [`PRD.md`](./PRD.md) (5 min) — what tracker is and why
+2. **Read** [`roadmaps/01_FOUNDATIONS_AND_DECISIONS.md`](./roadmaps/01_FOUNDATIONS_AND_DECISIONS.md) (5 min) — locked decisions you must not silently change
+3. **Read** [`PRODUCTION_READINESS_HANDOFF_ROADMAP.md`](./PRODUCTION_READINESS_HANDOFF_ROADMAP.md) (10 min) — Day-by-day Phase 1 playbook
+4. **Pick the first unchecked task** in [`IMPLEMENTATION_TRACKER.md`](./IMPLEMENTATION_TRACKER.md) following the dependency chain (1.A → 1.B → 1.C → 1.D → 1.E → 1.F)
+5.
**Do the task**, run its verification command, **commit with format** `(): `, **push to origin**, then **paste the commit SHA** into the corresponding `(______)` placeholder in `IMPLEMENTATION_TRACKER.md`
+6. **Repeat** — one slice, one commit, one tracker update at a time
+
+> **Do not skip ahead** to Phase 2+ until Phase 1 exit criteria all pass (see [`PRODUCTION_READINESS_HANDOFF_ROADMAP.md` § Exit Criteria](./PRODUCTION_READINESS_HANDOFF_ROADMAP.md#exit-criteria-for-production-readiness-end-of-phase-1)).
+
+---
+
 ## 1. Purpose

 This is the **master execution tracker** for the tracker-web dashboard. Detailed implementation
@@ -147,6 +162,23 @@ Everything checked here is already shipped and running at **https://tracker.byte
 > **Target:** Sprint ending 2026-06-14
 > **Detailed plan:** [`docs/PRODUCTION_READINESS_HANDOFF_ROADMAP.md`](./PRODUCTION_READINESS_HANDOFF_ROADMAP.md)

+> **Two numbering schemes — same content.** This file groups Phase 1 by **topic** (1.1–1.8).
+> The handoff playbook groups the same work by **execution day** (1.A–1.F). Mapping:
+
+| Topic (this doc)                   | Day milestone (handoff)                         |
+| ---------------------------------- | ----------------------------------------------- |
+| 1.1 Infrastructure Health          | 1.A Container Health Restoration                |
+| 1.2 Docker Hardening               | 1.C Docker Hardening                            |
+| 1.3 UI Drift Ratchet               | 1.D UI Drift Ratchet                            |
+| 1.4 Rate Limiting & Spam           | _(in 1.F security cluster — execute alongside)_ |
+| 1.5 Test Coverage                  | 1.E Test Hardening                              |
+| 1.6 Error Handling & Observability | 1.F Observability + Security (obs half)         |
+| 1.7 Security                       | 1.F Observability + Security (sec half)         |
+| 1.8 Workspace Health               | 1.B Workspace Health                            |
+
+> **Execution order** (sequential): 1.A → 1.B → 1.C → 1.D → 1.E → 1.F. Use this doc for
+> _scope and acceptance_; use the handoff playbook for _step-by-step day-by-day execution_.
+
 ### 1.1 Infrastructure Health ⚠️

 - [ ] **Fix valkey (Redis) container health** — currently `unhealthy`; root cause of most downstream container failures
@@ -228,7 +260,7 @@ Everything checked here is already shipped and running at **https://tracker.byte
 > **Goal:** Items rich enough for developers, PMs, and agents to fully spec, reproduce, and track
 > work without leaving the tool.
 > **Target:** Sprint ending 2026-07-12
-> **Detailed plan:** [`docs/roadmaps/03_RICH_ITEMS_ROADMAP.md`](./roadmaps/03_RICH_ITEMS_ROADMAP.md)
+> **Detailed plan:** `docs/roadmaps/03_RICH_ITEMS_ROADMAP.md` _(to be created when Phase 2 begins)_

 ### 2.1 UI Primitives Migration to `@bytelyst/ui` 🆕

@@ -323,7 +355,7 @@ Everything checked here is already shipped and running at **https://tracker.byte
 > consume, update, and create tracker items — closing the loop between AI-assisted development
 > and project management.
 > **Target:** Sprint ending 2026-07-26
-> **Detailed plan:** [`docs/roadmaps/04_AGENT_API_ROADMAP.md`](./roadmaps/04_AGENT_API_ROADMAP.md)
+> **Detailed plan:** `docs/roadmaps/04_AGENT_API_ROADMAP.md` _(to be created when Phase 3 begins)_
 > **Dependency:** Phase 2 acceptance-criteria checklist + PR link fields must ship first.

 ### 3.1 Agent Authentication
@@ -409,7 +441,7 @@ All routes: `Authorization: Bearer ` + `X-Product-Id: {productId}`.
 > **Goal:** Every stakeholder — public users, internal team, developers, agents — has a
 > frictionless native path to submit and track items.
 > **Target:** Sprint ending 2026-08-09
-> **Detailed plan:** [`docs/roadmaps/05_INTAKE_ROADMAP.md`](./roadmaps/05_INTAKE_ROADMAP.md)
+> **Detailed plan:** `docs/roadmaps/05_INTAKE_ROADMAP.md` _(to be created when Phase 4 begins)_

 ### 4.1 Public Submission Enhancements 🌐

diff --git a/dashboards/tracker-web/docs/roadmaps/00_MASTER_EXECUTION_PLAN.md b/dashboards/tracker-web/docs/roadmaps/00_MASTER_EXECUTION_PLAN.md
index 6916032d..94e6918a 100644
--- a/dashboards/tracker-web/docs/roadmaps/00_MASTER_EXECUTION_PLAN.md
+++ b/dashboards/tracker-web/docs/roadmaps/00_MASTER_EXECUTION_PLAN.md
@@ -1,7 +1,7 @@
 # 00 — Master Execution Plan

 **Parent:** [`docs/ROADMAP.md`](../ROADMAP.md)
-**Related sub-plans:** [`01_FOUNDATIONS_AND_DECISIONS.md`](./01_FOUNDATIONS_AND_DECISIONS.md) · `03_RICH_ITEMS_ROADMAP.md` _(Phase 2)_ · `04_AGENT_API_ROADMAP.md` _(Phase 3)_ · `05_INTAKE_ROADMAP.md` _(Phase 4)_
+**Related sub-plans:** [`01_FOUNDATIONS_AND_DECISIONS.md`](./01_FOUNDATIONS_AND_DECISIONS.md) · `03_RICH_ITEMS_ROADMAP.md` _(create when Phase 2 begins)_ · `04_AGENT_API_ROADMAP.md` _(create when Phase 3 begins)_ · `05_INTAKE_ROADMAP.md` _(create when Phase 4 begins)_

 ---

diff --git a/dashboards/tracker-web/docs/runbooks/MEK_ROTATION.md b/dashboards/tracker-web/docs/runbooks/MEK_ROTATION.md
new file mode 100644
index 00000000..7cfed7e1
--- /dev/null
+++ b/dashboards/tracker-web/docs/runbooks/MEK_ROTATION.md
@@ -0,0 +1,70 @@
+# MEK Rotation Runbook
+
+**Status:** Stub — populate as part of Phase 1.F.11
+**Owner:** Platform team
+**Source pattern:** [`learning_ai_notes/docs/runbooks/MEK_ROTATION.md`](../../../../../learning_ai_notes/docs/runbooks/MEK_ROTATION.md) (commit `bcad7d3`)
+
+---
+
+## Purpose
+
+This runbook describes how to **rotate the Master Encryption Key (MEK)** used for
+field-level encryption of sensitive tracker data (PII fields on items, comments, attachment
+URLs, agent API key seeds).
+
+Tracker MEK rotation follows the same envelope-encryption pattern as NoteLett:
+
+1. Tracker holds a **per-product MEK reference** in env (`TRACKER_MEK_ID_`).
+2. The MEK itself is stored in **Azure KeyVault**, never in process memory beyond
+   a single request lifecycle.
+3. Each encrypted field has a `keyId` envelope marking which MEK version encrypted it.
+4. Rotation creates a new MEK version; new writes use the new version; reads support
+   both old and new until reencryption sweep completes.
+
+---
+
+## Pre-rotation Checklist
+
+- [ ] Confirm Azure KeyVault access from tracker-service host
+- [ ] Confirm latest backup of Cosmos `tracker` container exists (RPO < 1 h)
+- [ ] Notify on-call: rotation window expected ~30 min for active read-path verification
+- [ ] Capture baseline metrics — read/write latency on encrypted fields
+
+---
+
+## Rotation Procedure
+
+> **TODO** — adapt full procedure from `learning_ai_notes/docs/runbooks/MEK_ROTATION.md`
+> once tracker-service field encryption ships in Phase 1.F. Sketch only below.
+
+1. **Create new MEK version in KeyVault**
+   - `az keyvault key create --vault-name --name tracker-mek- --kty RSA`
+   - Record new `keyId`
+2. **Update tracker-service env** with new `TRACKER_MEK_ID_`
+3. **Rolling restart tracker-service** — new writes encrypt with new key
+4. **Reencryption sweep** — background job re-reads + re-writes all encrypted fields with new key
+5. **Verify** — zero encrypted fields still on old key version
+6. **Revoke old key** — disable old KeyVault version
+
+---
+
+## Rollback
+
+If decryption fails after rotation:
+
+1. Revert env to previous `TRACKER_MEK_ID_`
+2. Restart tracker-service
+3. Re-enable old KeyVault version
+4.
Investigate which fields failed before retrying
+
+---
+
+## Verification
+
+- [ ] `pnpm run smoke:local` passes end-to-end after rotation
+- [ ] All encrypted fields on items / comments / attachments decrypt correctly via API
+- [ ] Audit log entry recorded for the rotation event
+
+---
+
+_See [`SECRET_MANAGEMENT.md`](./SECRET_MANAGEMENT.md) for the broader env / KeyVault secret workflow._
diff --git a/dashboards/tracker-web/docs/runbooks/SECRET_MANAGEMENT.md b/dashboards/tracker-web/docs/runbooks/SECRET_MANAGEMENT.md
new file mode 100644
index 00000000..a4720d98
--- /dev/null
+++ b/dashboards/tracker-web/docs/runbooks/SECRET_MANAGEMENT.md
@@ -0,0 +1,88 @@
+# Secret Management Runbook
+
+**Status:** Stub — populate as part of Phase 1.F.12
+**Owner:** Platform team
+**Source pattern:** [`learning_ai_notes/docs/runbooks/SECRET_MANAGEMENT.md`](../../../../../learning_ai_notes/docs/runbooks/SECRET_MANAGEMENT.md) (commit `bcad7d3`)
+
+---
+
+## Purpose
+
+This runbook documents how secrets flow from **Azure KeyVault → env → process** for
+tracker-web (Next.js client/server) and the platform-service backend it proxies, and
+how to rotate them safely.
+
+---
+
+## Secret Inventory
+
+| Secret                              | Used by                                   | Storage                                 | Rotation cadence                                     |
+| ----------------------------------- | ----------------------------------------- | --------------------------------------- | ---------------------------------------------------- |
+| `JWT_SECRET`                        | platform-service + tracker-web API routes | KeyVault `bytelyst-jwt-secret`          | Quarterly                                            |
+| `TRACKER_MEK_ID_`                   | tracker-service field encryption          | KeyVault per-product MEK keys           | Quarterly (see [MEK_ROTATION.md](./MEK_ROTATION.md)) |
+| `POSTHOG_KEY`                       | tracker-web client-side telemetry         | KeyVault `tracker-posthog-key`          | On compromise only                                   |
+| `COSMOS_CONNECTION_STRING`          | platform-service Cosmos client            | KeyVault `bytelyst-cosmos-conn`         | On compromise only                                   |
+| `VALKEY_PASSWORD`                   | platform-service session/cache            | KeyVault `bytelyst-valkey-password`     | Quarterly                                            |
+| `TURNSTILE_SECRET` (Phase 1.4)      | tracker-web public submission CAPTCHA     | KeyVault `tracker-turnstile-secret`     | On compromise only                                   |
+| `GITHUB_WEBHOOK_SECRET` (Phase 3.3) | tracker-service inbound webhook HMAC      | KeyVault `tracker-gh-webhook-secret`    | On compromise only                                   |
+| `GITEA_WEBHOOK_SECRET` (Phase 3.3)  | tracker-service inbound webhook HMAC      | KeyVault `tracker-gitea-webhook-secret` | On compromise only                                   |
+| Agent API keys (Phase 3.1)          | end-user-managed; stored hashed in Cosmos | Cosmos `apikeys` collection             | User-managed                                         |
+
+---
+
+## Resolution Path
+
+```
+KeyVault → Docker build args (NEXT_PUBLIC_*) → baked into next-build
+        ╲
+         → docker-compose env_file → process.env at runtime
+           ↘
+             systemd EnvironmentFile → process.env at runtime
+```
+
+- **Build-time secrets** (`NEXT_PUBLIC_*`) are baked into the Next.js standalone build via
+  Dockerfile `ARG` + `ENV` (Phase 1.C.3 / 1.2). Once baked they are visible in client JS;
+  only put truly public-safe values here.
+- **Runtime secrets** flow via docker-compose `env_file` or systemd `EnvironmentFile` so
+  they are not visible in client bundles.
+- **No secret ever** appears in `git`, `.env.example`, log lines, or container `inspect`
+  output.
+
+---
+
+## Rotation Procedure
+
+> **TODO** — adapt full procedure from `learning_ai_notes/docs/runbooks/SECRET_MANAGEMENT.md`
+> once tracker-service ships secret-aware deployment in Phase 1.F.
+
+1. **Create new secret version in KeyVault**
+2. **Update env source** (docker-compose `.env`, systemd unit, or CI/CD secret store)
+3. **Rolling restart** affected services
+4. **Verify** — `pnpm run smoke:local` passes against rotated stack
+5. **Revoke previous secret version** in KeyVault after 24 h soak
+
+---
+
+## On Suspected Compromise
+
+1. **Immediately revoke** the suspected secret in KeyVault
+2. **Rotate** all dependent secrets in the same blast radius
+3. **Force-revoke** all JWT tokens (bump `JWT_SECRET` → all sessions invalidated)
+4. **Audit** access logs since suspected compromise window
+5. **File a tracker bug** type `chore`, label `security`, priority `critical`
+
+---
+
+## PII Scrubbing Rule
+
+Per Phase 1.F.10: emails, names, and any field marked `pii: true` in the schema must NEVER
+appear as plaintext in:
+
+- Log lines (use `@bytelyst/logger` redaction map)
+- Telemetry events sent to PostHog
+- Error messages bubbled to clients
+- Webhook delivery logs (Phase 3.4 / 3.5)
+
+---
+
+_See [`MEK_ROTATION.md`](./MEK_ROTATION.md) for field-level encryption key rotation specifically._
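
Note on the PII Scrubbing Rule in `SECRET_MANAGEMENT.md`: the rule could be enforced with a small scrubbing helper applied before a record reaches a log line, telemetry event, or client-facing error. The TypeScript below is a minimal sketch under stated assumptions. It is not the `@bytelyst/logger` redaction map, whose API this diff does not show. `FieldSchema`, `itemSchema`, `scrubPii`, `scrubEmails`, and the `[REDACTED]` placeholder are hypothetical names chosen for illustration, and the schema shape (a `pii` flag per field) is inferred only from the phrase "marked `pii: true` in the schema".

```typescript
// Hypothetical schema shape: field name -> metadata. Not the real tracker schema.
type FieldSchema = Record<string, { pii?: boolean }>;

const REDACTED = "[REDACTED]";

// Illustrative field list (assumption); the real one lives in the tracker schema.
const itemSchema: FieldSchema = {
  title: {},
  reporterEmail: { pii: true },
  reporterName: { pii: true },
};

// Returns a shallow copy in which every field marked `pii: true` is replaced
// by a placeholder. Fields unknown to the schema pass through unchanged.
function scrubPii(
  record: Record<string, unknown>,
  schema: FieldSchema,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(record)) {
    out[key] = schema[key]?.pii === true ? REDACTED : value;
  }
  return out;
}

// Masks email-shaped substrings in free text such as error messages.
// The pattern is deliberately loose; it is a safety net, not a validator.
function scrubEmails(text: string): string {
  return text.replace(/[^\s@<>]+@[^\s@<>]+\.[^\s@<>]+/g, REDACTED);
}

const safeRecord = scrubPii(
  { title: "Login fails", reporterEmail: "a@example.com", reporterName: "Ada" },
  itemSchema,
);
console.log(JSON.stringify(safeRecord));
console.log(scrubEmails("lookup failed for a@example.com"));
```

Passing unknown fields through keeps the helper simple, but it means a new PII field is only protected once the schema marks it. A stricter variant could redact anything the schema does not list, at the cost of noisier logs.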