docs(tracker): close review-pass gaps in tracker docs

1) Dual-numbering reconciliation
   - ROADMAP groups Phase 1 by topic (1.1-1.8); PRH groups by execution
     day (1.A-1.F). Added bidirectional mapping table to both docs so
     agents can cross-reference any phase reference unambiguously.

2) Fresh-agent quick pointer at top of ROADMAP
   - New section tells a new agent exactly which 4 docs to read, in
     what order, and which task to pick up first (1.A from the tracker).

3) Broken sub-roadmap links neutralised
   - 03_RICH_ITEMS_ROADMAP.md, 04_AGENT_API_ROADMAP.md,
     05_INTAKE_ROADMAP.md were linked but did not exist. Replaced with
     plain text + 'create when Phase N begins' note so the link doesn't
     404. Matches the pattern already used in IMPLEMENTATION_TRACKER.

4) Runbook stubs created (Phase 1.F.11/1.F.12 placeholders)
   - docs/runbooks/MEK_ROTATION.md — adapted from NoteLett bcad7d3
   - docs/runbooks/SECRET_MANAGEMENT.md — secret inventory + resolution
     path + compromise procedure + PII scrubbing rule
   Each is a stub now; full content lands when Phase 1.F executes.

5) Stale 'today' wording removed from PRH baseline table
   - Replaced 'after fix today' with 'as of 2026-05-25' so the doc
     ages cleanly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parent 930f97ff63
commit 59c4638f85
@@ -30,7 +30,7 @@ stated verification, commit with the repo convention, push, and record the commi
 | **Tests present** | Vitest scaffolding + ~6 unit tests · Playwright scaffold · no `@axe-core/playwright` yet |
 | **DevOps present** | `Dockerfile` (standalone Next.js) · `docker-compose.yml` · `vercel.json` · no `scripts/docker-prep.sh` yet (must adopt from NoteLett/FlowMonk) |
 | **Common platform packages used** | `@bytelyst/api-client` · `@bytelyst/dashboard-components` · `@bytelyst/react-auth` · `@bytelyst/logger` · `@bytelyst/telemetry-client` |
-| **Deployed status** | `chronomind-web` healthy after fix today · `tracker-web` container `unhealthy` (B-001/B-002 root cause: valkey) |
+| **Deployed status** | `tracker-web` container `unhealthy` as of 2026-05-25 (B-001/B-002 root cause: valkey cascade — see Phase 1.A) |
 
 ---
 
@@ -46,6 +46,26 @@ stated verification, commit with the repo convention, push, and record the commi
 
 ---
 
+## Mapping to ROADMAP.md Topic Sections
+
+This playbook groups work by **execution day**. The master roadmap groups the same work by
+**topic**. Cross-reference when assigning tickets or marking progress:
+
+| Day milestone (this doc)         | Topic (ROADMAP.md §)                                                  |
+| -------------------------------- | --------------------------------------------------------------------- |
+| 1.A Container Health Restoration | 1.1 Infrastructure Health                                             |
+| 1.B Workspace Health             | 1.8 Workspace Health                                                  |
+| 1.C Docker Hardening             | 1.2 Docker Hardening                                                  |
+| 1.D UI Drift Ratchet             | 1.3 UI Drift Ratchet                                                  |
+| 1.E Test Hardening               | 1.5 Test Coverage                                                     |
+| 1.F Observability + Security     | 1.6 Error Handling & Observability + 1.7 Security + 1.4 Rate Limiting |
+
+Mark progress under **both** numbering schemes when checking off in
+[`IMPLEMENTATION_TRACKER.md`](./IMPLEMENTATION_TRACKER.md) — agents reading either doc
+must be able to find the same checkboxes.
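
Editorial sketch (not part of the commit): the day-to-topic table above could be encoded as a small lookup so that tooling can translate a reference in one scheme into the other. The names `DAY_TO_TOPICS`, `topicsForDay`, and `dayForTopic` are hypothetical and do not come from the repo; only the mapping values are taken from the table.

```typescript
// Hypothetical encoding of the day-milestone -> topic mapping shown in the table.
const DAY_TO_TOPICS: Record<string, string[]> = {
  "1.A": ["1.1"],
  "1.B": ["1.8"],
  "1.C": ["1.2"],
  "1.D": ["1.3"],
  "1.E": ["1.5"],
  "1.F": ["1.6", "1.7", "1.4"],
};

// Topics covered by a given day milestone (empty list if the day is unknown).
function topicsForDay(day: string): string[] {
  return DAY_TO_TOPICS[day] ?? [];
}

// Reverse lookup: the day milestone under which a topic section is executed.
function dayForTopic(topic: string): string | undefined {
  for (const [day, topics] of Object.entries(DAY_TO_TOPICS)) {
    if (topics.includes(topic)) return day;
  }
  return undefined;
}
```

A checkbox filed under topic 1.4, for example, resolves to day milestone 1.F, which matches the "execute alongside" note in the ROADMAP version of the table.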
+
+---
+
 ## Phase 1.A — Container Health Restoration (Day 1)
 
 **Goal:** Get all tracker-related containers healthy. Block on this — everything else cascades.
@@ -12,6 +12,21 @@
 
 ---
 
+## ⚡ Where to Start (Fresh-Agent Quick Pointer)
+
+You just opened this repo and want to make progress. Do this in order:
+
+1. **Read** [`PRD.md`](./PRD.md) (5 min) — what tracker is and why
+2. **Read** [`roadmaps/01_FOUNDATIONS_AND_DECISIONS.md`](./roadmaps/01_FOUNDATIONS_AND_DECISIONS.md) (5 min) — locked decisions you must not silently change
+3. **Read** [`PRODUCTION_READINESS_HANDOFF_ROADMAP.md`](./PRODUCTION_READINESS_HANDOFF_ROADMAP.md) (10 min) — Day-by-day Phase 1 playbook
+4. **Pick the first unchecked task** in [`IMPLEMENTATION_TRACKER.md`](./IMPLEMENTATION_TRACKER.md) following the dependency chain (1.A → 1.B → 1.C → 1.D → 1.E → 1.F)
+5. **Do the task**, run its verification command, **commit with format** `<type>(<scope>): <subject>`, **push to origin**, then **paste the commit SHA** into the corresponding `(______)` placeholder in `IMPLEMENTATION_TRACKER.md`
+6. **Repeat** — one slice, one commit, one tracker update at a time
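
Editorial sketch (not part of the commit): step 5 asks for commit subjects in the form `<type>(<scope>): <subject>`. A check along the following lines could catch malformed subjects before pushing. The exact character classes allowed for type and scope are an assumption, and `parseCommitSubject` is a hypothetical helper, not something the repo is known to ship.

```typescript
// Assumed pattern for `<type>(<scope>): <subject>`; allowed characters are a guess.
const COMMIT_SUBJECT = /^([a-z]+)\(([a-z0-9-]+)\): (\S.*)$/;

interface ParsedSubject {
  type: string;
  scope: string;
  subject: string;
}

// Returns the parsed parts, or null when the line does not follow the format.
function parseCommitSubject(line: string): ParsedSubject | null {
  const match = COMMIT_SUBJECT.exec(line);
  if (!match) return null;
  const [, type = "", scope = "", subject = ""] = match;
  return { type, scope, subject };
}
```

The subject line of this very commit, `docs(tracker): close review-pass gaps in tracker docs`, parses with type `docs` and scope `tracker`, while a bare `update docs` is rejected.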
+
+> **Do not skip ahead** to Phase 2+ until Phase 1 exit criteria all pass (see [`PRODUCTION_READINESS_HANDOFF_ROADMAP.md` § Exit Criteria](./PRODUCTION_READINESS_HANDOFF_ROADMAP.md#exit-criteria-for-production-readiness-end-of-phase-1)).
+
+---
+
 ## 1. Purpose
 
 This is the **master execution tracker** for the tracker-web dashboard. Detailed implementation
@@ -147,6 +162,23 @@ Everything checked here is already shipped and running at **https://tracker.byte
 > **Target:** Sprint ending 2026-06-14
 > **Detailed plan:** [`docs/PRODUCTION_READINESS_HANDOFF_ROADMAP.md`](./PRODUCTION_READINESS_HANDOFF_ROADMAP.md)
 
+> **Two numbering schemes — same content.** This file groups Phase 1 by **topic** (1.1–1.8).
+> The handoff playbook groups the same work by **execution day** (1.A–1.F). Mapping:
+
+| Topic (this doc)                   | Day milestone (handoff)                         |
+| ---------------------------------- | ----------------------------------------------- |
+| 1.1 Infrastructure Health          | 1.A Container Health Restoration                |
+| 1.2 Docker Hardening               | 1.C Docker Hardening                            |
+| 1.3 UI Drift Ratchet               | 1.D UI Drift Ratchet                            |
+| 1.4 Rate Limiting & Spam           | _(in 1.F security cluster — execute alongside)_ |
+| 1.5 Test Coverage                  | 1.E Test Hardening                              |
+| 1.6 Error Handling & Observability | 1.F Observability + Security (obs half)         |
+| 1.7 Security                       | 1.F Observability + Security (sec half)         |
+| 1.8 Workspace Health               | 1.B Workspace Health                            |
+
+> **Execution order** (sequential): 1.A → 1.B → 1.C → 1.D → 1.E → 1.F. Use this doc for
+> _scope and acceptance_; use the handoff playbook for _step-by-step day-by-day execution_.
+
 ### 1.1 Infrastructure Health ⚠️
 
 - [ ] **Fix valkey (Redis) container health** — currently `unhealthy`; root cause of most downstream container failures
@@ -228,7 +260,7 @@ Everything checked here is already shipped and running at **https://tracker.byte
 > **Goal:** Items rich enough for developers, PMs, and agents to fully spec, reproduce, and track
 > work without leaving the tool.
 > **Target:** Sprint ending 2026-07-12
-> **Detailed plan:** [`docs/roadmaps/03_RICH_ITEMS_ROADMAP.md`](./roadmaps/03_RICH_ITEMS_ROADMAP.md)
+> **Detailed plan:** `docs/roadmaps/03_RICH_ITEMS_ROADMAP.md` _(to be created when Phase 2 begins)_
 
 ### 2.1 UI Primitives Migration to `@bytelyst/ui` 🆕
 
@@ -323,7 +355,7 @@ Everything checked here is already shipped and running at **https://tracker.byte
 > consume, update, and create tracker items — closing the loop between AI-assisted development
 > and project management.
 > **Target:** Sprint ending 2026-07-26
-> **Detailed plan:** [`docs/roadmaps/04_AGENT_API_ROADMAP.md`](./roadmaps/04_AGENT_API_ROADMAP.md)
+> **Detailed plan:** `docs/roadmaps/04_AGENT_API_ROADMAP.md` _(to be created when Phase 3 begins)_
 > **Dependency:** Phase 2 acceptance-criteria checklist + PR link fields must ship first.
 
 ### 3.1 Agent Authentication
@@ -409,7 +441,7 @@ All routes: `Authorization: Bearer <agent-key>` + `X-Product-Id: {productId}`.
 > **Goal:** Every stakeholder — public users, internal team, developers, agents — has a
 > frictionless native path to submit and track items.
 > **Target:** Sprint ending 2026-08-09
-> **Detailed plan:** [`docs/roadmaps/05_INTAKE_ROADMAP.md`](./roadmaps/05_INTAKE_ROADMAP.md)
+> **Detailed plan:** `docs/roadmaps/05_INTAKE_ROADMAP.md` _(to be created when Phase 4 begins)_
 
 ### 4.1 Public Submission Enhancements 🌐
 
@@ -1,7 +1,7 @@
 # 00 — Master Execution Plan
 
 **Parent:** [`docs/ROADMAP.md`](../ROADMAP.md)
-**Related sub-plans:** [`01_FOUNDATIONS_AND_DECISIONS.md`](./01_FOUNDATIONS_AND_DECISIONS.md) · `03_RICH_ITEMS_ROADMAP.md` _(Phase 2)_ · `04_AGENT_API_ROADMAP.md` _(Phase 3)_ · `05_INTAKE_ROADMAP.md` _(Phase 4)_
+**Related sub-plans:** [`01_FOUNDATIONS_AND_DECISIONS.md`](./01_FOUNDATIONS_AND_DECISIONS.md) · `03_RICH_ITEMS_ROADMAP.md` _(create when Phase 2 begins)_ · `04_AGENT_API_ROADMAP.md` _(create when Phase 3 begins)_ · `05_INTAKE_ROADMAP.md` _(create when Phase 4 begins)_
 
 ---
 
70
dashboards/tracker-web/docs/runbooks/MEK_ROTATION.md
Normal file
@@ -0,0 +1,70 @@
+# MEK Rotation Runbook
+
+**Status:** Stub — populate as part of Phase 1.F.11
+**Owner:** Platform team
+**Source pattern:** [`learning_ai_notes/docs/runbooks/MEK_ROTATION.md`](../../../../../learning_ai_notes/docs/runbooks/MEK_ROTATION.md) (commit `bcad7d3`)
+
+---
+
+## Purpose
+
+This runbook describes how to **rotate the Master Encryption Key (MEK)** used for
+field-level encryption of sensitive tracker data (PII fields on items, comments, attachment
+URLs, agent API key seeds).
+
+Tracker MEK rotation follows the same envelope-encryption pattern as NoteLett:
+
+1. Tracker holds a **per-product MEK reference** in env (`TRACKER_MEK_ID_<PRODUCTID>`).
+2. The MEK itself is stored in **Azure KeyVault**, never in process memory beyond
+   a single request lifecycle.
+3. Each encrypted field has a `keyId` envelope marking which MEK version encrypted it.
+4. Rotation creates a new MEK version; new writes use the new version; reads support
+   both old and new until reencryption sweep completes.
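
Editorial sketch (not part of the commit): points 3 and 4 above describe an envelope that records which key version encrypted a field, so reads keep working for both versions during a rotation. The snippet below illustrates that idea with Node's built-in `crypto` module and AES-256-GCM. It is a simplification under loud assumptions: the `Envelope` shape, the `Keyring` type, and both function names are hypothetical, and an in-memory map of symmetric keys stands in for Azure KeyVault, which the real service would call instead.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical envelope: ciphertext plus the id of the key version that produced it.
interface Envelope {
  keyId: string;
  iv: string; // base64
  tag: string; // base64 GCM authentication tag
  data: string; // base64 ciphertext
}

// Stand-in for KeyVault: key version id -> 32-byte AES key held in memory.
type Keyring = Map<string, Buffer>;

function encryptField(plaintext: string, keyId: string, keys: Keyring): Envelope {
  const key = keys.get(keyId);
  if (!key) throw new Error(`unknown keyId: ${keyId}`);
  const iv = randomBytes(12); // 96-bit nonce, the usual size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    keyId,
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

function decryptField(envelope: Envelope, keys: Keyring): string {
  // Select the key version named in the envelope, so old and new both decrypt.
  const key = keys.get(envelope.keyId);
  if (!key) throw new Error(`unknown keyId: ${envelope.keyId}`);
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(envelope.iv, "base64"));
  decipher.setAuthTag(Buffer.from(envelope.tag, "base64"));
  const plain = Buffer.concat([
    decipher.update(Buffer.from(envelope.data, "base64")),
    decipher.final(),
  ]);
  return plain.toString("utf8");
}
```

Because the reader picks the key from `envelope.keyId` rather than from the current env value, a field written before the rotation stays readable until the sweep rewrites it.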
+
+---
+
+## Pre-rotation Checklist
+
+- [ ] Confirm Azure KeyVault access from tracker-service host
+- [ ] Confirm latest backup of Cosmos `tracker` container exists (RPO < 1 h)
+- [ ] Notify on-call: rotation window expected ~30 min for active read-path verification
+- [ ] Capture baseline metrics — read/write latency on encrypted fields
+
+---
+
+## Rotation Procedure
+
+> **TODO** — adapt full procedure from `learning_ai_notes/docs/runbooks/MEK_ROTATION.md`
+> once tracker-service field encryption ships in Phase 1.F. Sketch only below.
+
+1. **Create new MEK version in KeyVault**
+   - `az keyvault key create --vault-name <vault> --name tracker-mek-<productId> --kty RSA`
+   - Record new `keyId`
+2. **Update tracker-service env** with new `TRACKER_MEK_ID_<PRODUCTID>`
+3. **Rolling restart tracker-service** — new writes encrypt with new key
+4. **Reencryption sweep** — background job re-reads + re-writes all encrypted fields with new key
+5. **Verify** — zero encrypted fields still on old key version
+6. **Revoke old key** — disable old KeyVault version
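
Editorial sketch (not part of the commit): steps 4 and 5 amount to rewriting every field whose envelope names an older key version, then confirming none remain before the old key is disabled. The following shows only that control flow. `EncryptedField`, `Reencrypt`, `sweep`, and `countOnOldKeys` are hypothetical names, the actual decrypt-and-re-encrypt work is left to a caller-supplied function, and a real job would page through Cosmos rather than hold an array in memory.

```typescript
// Hypothetical record shape: each encrypted field carries the keyId that encrypted it.
interface EncryptedField {
  id: string;
  keyId: string;
  ciphertext: string;
}

// Caller-supplied step that decrypts with the field's current key and
// re-encrypts with the new one; how it does so is out of scope here.
type Reencrypt = (field: EncryptedField, newKeyId: string) => EncryptedField;

// Step 4: rewrite every field that is not yet on the new key version.
function sweep(fields: EncryptedField[], newKeyId: string, reencrypt: Reencrypt): EncryptedField[] {
  return fields.map((field) => (field.keyId === newKeyId ? field : reencrypt(field, newKeyId)));
}

// Step 5: the old key may only be revoked once this returns zero.
function countOnOldKeys(fields: EncryptedField[], newKeyId: string): number {
  return fields.filter((field) => field.keyId !== newKeyId).length;
}
```

Keeping the verification as a separate count, rather than trusting the sweep's return value, mirrors the runbook's ordering: revoke only after an independent check reports zero fields on the old version.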
+
+---
+
+## Rollback
+
+If decryption fails after rotation:
+
+1. Revert env to previous `TRACKER_MEK_ID_<PRODUCTID>`
+2. Restart tracker-service
+3. Re-enable old KeyVault version
+4. Investigate which fields failed before retrying
+
+---
+
+## Verification
+
+- [ ] `pnpm run smoke:local` passes end-to-end after rotation
+- [ ] All encrypted fields on items / comments / attachments decrypt correctly via API
+- [ ] Audit log entry recorded for the rotation event
+
+---
+
+_See [`SECRET_MANAGEMENT.md`](./SECRET_MANAGEMENT.md) for the broader env / KeyVault secret workflow._
88
dashboards/tracker-web/docs/runbooks/SECRET_MANAGEMENT.md
Normal file
@@ -0,0 +1,88 @@
+# Secret Management Runbook
+
+**Status:** Stub — populate as part of Phase 1.F.12
+**Owner:** Platform team
+**Source pattern:** [`learning_ai_notes/docs/runbooks/SECRET_MANAGEMENT.md`](../../../../../learning_ai_notes/docs/runbooks/SECRET_MANAGEMENT.md) (commit `bcad7d3`)
+
+---
+
+## Purpose
+
+This runbook documents how secrets flow from **Azure KeyVault → env → process** for
+tracker-web (Next.js client/server) and the platform-service backend it proxies, and
+how to rotate them safely.
+
+---
+
+## Secret Inventory
+
+| Secret                              | Used by                                   | Storage                                 | Rotation cadence                                     |
+| ----------------------------------- | ----------------------------------------- | --------------------------------------- | ---------------------------------------------------- |
+| `JWT_SECRET`                        | platform-service + tracker-web API routes | KeyVault `bytelyst-jwt-secret`          | Quarterly                                            |
+| `TRACKER_MEK_ID_<PRODUCTID>`        | tracker-service field encryption          | KeyVault per-product MEK keys           | Quarterly (see [MEK_ROTATION.md](./MEK_ROTATION.md)) |
+| `POSTHOG_KEY`                       | tracker-web client-side telemetry         | KeyVault `tracker-posthog-key`          | On compromise only                                   |
+| `COSMOS_CONNECTION_STRING`          | platform-service Cosmos client            | KeyVault `bytelyst-cosmos-conn`         | On compromise only                                   |
+| `VALKEY_PASSWORD`                   | platform-service session/cache            | KeyVault `bytelyst-valkey-password`     | Quarterly                                            |
+| `TURNSTILE_SECRET` (Phase 1.4)      | tracker-web public submission CAPTCHA     | KeyVault `tracker-turnstile-secret`     | On compromise only                                   |
+| `GITHUB_WEBHOOK_SECRET` (Phase 3.3) | tracker-service inbound webhook HMAC      | KeyVault `tracker-gh-webhook-secret`    | On compromise only                                   |
+| `GITEA_WEBHOOK_SECRET` (Phase 3.3)  | tracker-service inbound webhook HMAC      | KeyVault `tracker-gitea-webhook-secret` | On compromise only                                   |
+| Agent API keys (Phase 3.1)          | end-user-managed; stored hashed in Cosmos | Cosmos `apikeys` collection             | User-managed                                         |
+
+---
+
+## Resolution Path
+
+```
+KeyVault → Docker build args (NEXT_PUBLIC_*) → baked into next-build
+        ╲
+         → docker-compose env_file → process.env at runtime
+          ↘
+            systemd EnvironmentFile → process.env at runtime
+```
+
+- **Build-time secrets** (`NEXT_PUBLIC_*`) are baked into the Next.js standalone build via
+  Dockerfile `ARG` + `ENV` (Phase 1.C.3 / 1.2). Once baked they are visible in client JS;
+  only put truly public-safe values here.
+- **Runtime secrets** flow via docker-compose `env_file` or systemd `EnvironmentFile` so
+  they are not visible in client bundles.
+- **No secret ever** appears in `git`, `.env.example`, log lines, or container `inspect`
+  output.
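
Editorial sketch (not part of the commit): the split between build-time `NEXT_PUBLIC_*` values and runtime secrets can be guarded with a startup check that fails fast when a runtime secret is missing, or when a secret has been duplicated under a `NEXT_PUBLIC_` name and would therefore be inlined into client JS. The secret names come from the inventory table; which of them a given process actually requires is an assumption, and `checkEnv` is a hypothetical helper.

```typescript
// Names taken from the Secret Inventory; the choice of "required" ones is an assumption.
const REQUIRED_RUNTIME_SECRETS = ["JWT_SECRET", "COSMOS_CONNECTION_STRING", "VALKEY_PASSWORD"];

// Returns a list of human-readable problems; an empty list means the env looks sane.
function checkEnv(
  env: Record<string, string | undefined>,
  secretNames: string[] = REQUIRED_RUNTIME_SECRETS,
): string[] {
  const problems: string[] = [];
  for (const name of secretNames) {
    if (!env[name]) {
      problems.push(`missing runtime secret: ${name}`);
    }
    // A NEXT_PUBLIC_ copy would be inlined into the client bundle at build time.
    if (env[`NEXT_PUBLIC_${name}`] !== undefined) {
      problems.push(`secret exposed to client bundle: NEXT_PUBLIC_${name}`);
    }
  }
  return problems;
}
```

In a Node process this could be called as `checkEnv(process.env)` during boot, logging only the variable names (never the values) and exiting when the list is non-empty.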
+
+---
+
+## Rotation Procedure
+
+> **TODO** — adapt full procedure from `learning_ai_notes/docs/runbooks/SECRET_MANAGEMENT.md`
+> once tracker-service ships secret-aware deployment in Phase 1.F.
+
+1. **Create new secret version in KeyVault**
+2. **Update env source** (docker-compose `.env`, systemd unit, or CI/CD secret store)
+3. **Rolling restart** affected services
+4. **Verify** — `pnpm run smoke:local` passes against rotated stack
+5. **Revoke previous secret version** in KeyVault after 24 h soak
+
+---
+
+## On Suspected Compromise
+
+1. **Immediately revoke** the suspected secret in KeyVault
+2. **Rotate** all dependent secrets in the same blast radius
+3. **Force-revoke** all JWT tokens (bump `JWT_SECRET` → all sessions invalidated)
+4. **Audit** access logs since suspected compromise window
+5. **File a tracker bug** type `chore`, label `security`, priority `critical`
+
+---
+
+## PII Scrubbing Rule
+
+Per Phase 1.F.10: emails, names, and any field marked `pii: true` in the schema must NEVER
+appear as plaintext in:
+
+- Log lines (use `@bytelyst/logger` redaction map)
+- Telemetry events sent to PostHog
+- Error messages bubbled to clients
+- Webhook delivery logs (Phase 3.4 / 3.5)
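
Editorial sketch (not part of the commit): the rule above can be pictured as a scrubber applied to any payload before it reaches a log line, telemetry event, or error response. This is not the `@bytelyst/logger` redaction map, whose API is not shown in this diff; `scrub` and the `piiKeys` set are hypothetical, and the email pattern is a deliberately simple approximation rather than a full address grammar.

```typescript
// Simple approximation of an email address; not a complete RFC 5322 matcher.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Recursively copies a value, masking fields named in piiKeys and any
// email-looking substrings found in remaining string values.
function scrub(value: unknown, piiKeys: ReadonlySet<string>): unknown {
  if (typeof value === "string") return value.replace(EMAIL, "[REDACTED]");
  if (Array.isArray(value)) return value.map((item) => scrub(item, piiKeys));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = piiKeys.has(key) ? "[REDACTED]" : scrub(inner, piiKeys);
    }
    return out;
  }
  return value; // numbers, booleans, null, undefined pass through unchanged
}
```

The key set would presumably be derived from the schema fields marked `pii: true`, so that adding a new PII field to the schema automatically extends the redaction.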
+
+---
+
+_See [`MEK_ROTATION.md`](./MEK_ROTATION.md) for field-level encryption key rotation specifically._