docs: roadmap v3 — incorporate review feedback (F11-F13, Phase E)
Review-driven additions:
- F11 added (silent UI breakage from missing/un-COPY'd postcss.config.mjs;
  4 repos hit this tonight: notes dff459e, jarvis_jr 36f6bc1,
  clock a308c6444, local_memory_gpt 07cdf6b)
- F12 added (healthcheck localhost → IPv6 false-fail; jarvis_jr incident)
- F13 added (enumerated COPY drift from filesystem; root cause of F11b)

Structural changes:
- New A8 (config-file COPY audit + glob pattern decision)
- New A9 (healthcheck IPv4 canonicalization)
- New A0-V verification gate (build Gitea path before optimizing)
- New § 2.5 canonical decisions (Alpine + ARG BASE_IMAGE override,
  127.0.0.1, --lockfile=false pending ADR)
- New § 7.5 canonical web Dockerfile (was missing, where F11 lives)
- New § 7.6 docker-doctor.sh skeleton
- New Phase E (docker-doctor.sh CI lint as durable insurance)
- B7 promoted from Phase D to Phase B proper (drift compounds)
- B4 husky hook extended to also block .tgz and .bak
- A0-1 env-var expansion chain explicitly documented
- A2-3 verification command corrected (docker buildx du, not docker history)
- Pilot order inverted: clock first (web + backend), then peakpulse
- C9 smoke test added (CSS bundle > 50 KB, F11 guard)
- 4 new risk-register rows for F11/F12/F13/BASE_IMAGE drift
parent 529d4f37f5
commit 1a638a84e1
@@ -1,34 +1,46 @@
|
||||
# Docker Build Optimization Roadmap
|
||||
|
||||
> **Status:** Draft v2 (post-audit) · **Owner:** Platform DevOps · **Created:** 2026-05-27 · **Revised:** 2026-05-27
|
||||
> **Status:** Draft v3 (post-review) · **Owner:** Platform DevOps · **Created:** 2026-05-27 · **Revised:** 2026-05-27
|
||||
>
|
||||
> Pilot Docker-build speed-ups + hermetic-fallback hardening on `learning_ai_peakpulse`
|
||||
> and `learning_ai_clock`, then capture the playbook here for ecosystem-wide rollout.
|
||||
> Pilot Docker-build correctness + speed fixes on `learning_ai_clock` (web + backend)
|
||||
> and `learning_ai_peakpulse` (backend), then capture the playbook here for
|
||||
> ecosystem-wide rollout.
|
||||
|
||||
---
|
||||
|
||||
## 0. Pre-flight audit findings (2026-05-27)
|
||||
|
||||
A read-only audit of the two pilot repos surfaced **10 concrete bugs/gaps**
|
||||
that contradict the casual narrative that "Gitea-registry is the default and
|
||||
`docker-prep.sh` is the fallback." The actual state is closer to the inverse:
|
||||
A read-only audit of pilot repos + lessons from recent live incidents surfaced
|
||||
**13 concrete bugs/gaps**. The actual state of the ecosystem is closer to the
|
||||
inverse of the casual narrative: tarballs are the de facto default, the
|
||||
Gitea-registry path is partially wired, and there is a separate class of
|
||||
"build green, app broken" silent failures (F11–F13) that the speed-focused
|
||||
plan needs to address first.
|
||||
|
||||
| # | Finding | Location | Severity |
|
||||
|---|---|---|---|
|
||||
| F1 | `pnpm-lock.yaml` is in `.dockerignore` — any lockfile-based optimization is blocked until removed | `peakpulse/.dockerignore`, `clock/.dockerignore` | **High** |
|
||||
| F2 | `pnpm-workspace.yaml` references sibling `../learning_ai_common_plat/packages/*` — `--frozen-lockfile` inside Docker will fail unless workspace is flattened or sibling tree is copied | `peakpulse/pnpm-workspace.yaml`, `clock/pnpm-workspace.yaml` | **High** |
|
||||
| F3 | `peakpulse/.npmrc.docker` is tarball-only (no `@bytelyst:registry=…` line) — the "Gitea-registry" path doesn't actually work in this repo today | `peakpulse/.npmrc.docker` | **High** |
|
||||
| F4 | `clock/.npmrc.docker` hardcodes `http://localhost:3300` — from inside a Docker container `localhost` is the container itself, not the host registry | `clock/.npmrc.docker` | **High** |
|
||||
| F5 | `clock/backend/Dockerfile` has neither `ARG GITEA_NPM_HOST` nor a BuildKit secret mount — it is wholly dependent on `.docker-deps/` having been pre-populated | `clock/backend/Dockerfile` | High |
|
||||
| F6 | `clock/web/Dockerfile` accepts `ARG GITEA_NPM_HOST` but never uses it and has no `--mount=type=secret` — passing the arg is a no-op | `clock/web/Dockerfile` | Medium |
|
||||
| F7 | `peakpulse/docker-compose.yml` does not pass `GITEA_NPM_HOST` build arg or declare `secrets:` block, so `docker compose build` cannot use the Gitea path | `peakpulse/docker-compose.yml` | Medium |
|
||||
| F8 | `COPY .docker-deps/` is unconditional in every backend Dockerfile — every build requires either `docker-prep.sh` to have run OR an empty `.docker-deps/` dir to pre-exist | both repos | Medium |
|
||||
| F1 | `pnpm-lock.yaml` is in `.dockerignore` — any lockfile-based optimization is blocked until removed | `peakpulse/.dockerignore`, `clock/.dockerignore` | High |
|
||||
| F2 | `pnpm-workspace.yaml` references sibling `../learning_ai_common_plat/packages/*` — `--frozen-lockfile` inside Docker will fail unless workspace is flattened or sibling tree is copied | both pilots | High |
|
||||
| F3 | `peakpulse/.npmrc.docker` is tarball-only (no `@bytelyst:registry=…` line) — the "Gitea-registry" path doesn't work in this repo today | `peakpulse/.npmrc.docker` | High |
|
||||
| F4 | `clock/.npmrc.docker` hardcodes `http://localhost:3300` — from inside Docker, `localhost` is the container, not the host registry | `clock/.npmrc.docker` | High |
|
||||
| F5 | `clock/backend/Dockerfile` has neither `ARG GITEA_NPM_HOST` nor a BuildKit secret mount — wholly dependent on pre-populated `.docker-deps/` | `clock/backend/Dockerfile` | High |
|
||||
| F6 | `clock/web/Dockerfile` accepts `ARG GITEA_NPM_HOST` but never uses it; no `--mount=type=secret` either | `clock/web/Dockerfile` | Medium |
|
||||
| F7 | `peakpulse/docker-compose.yml` does not pass `GITEA_NPM_HOST` build arg or declare `secrets:` block | `peakpulse/docker-compose.yml` | Medium |
|
||||
| F8 | `COPY .docker-deps/` is unconditional in every backend Dockerfile — every build requires `docker-prep.sh` to have run OR an empty `.docker-deps/` dir to pre-exist | both repos | Medium |
|
||||
| F9 | `npm install -g pnpm@10.6.5` runs on every build (no `corepack`) — 5–10 s overhead, no pinning to `packageManager` field | all four Dockerfiles | Low |
|
||||
| F10 | No BuildKit `--mount=type=cache` for pnpm store — cold install on every rebuild even when deps unchanged | all four Dockerfiles | High (the main speed win) |
|
||||
| F10 | No BuildKit `--mount=type=cache` for pnpm store — cold install on every rebuild even when deps unchanged | all four Dockerfiles | High (main speed win) |
|
||||
| **F11** | **Build-time config file missing from repo or not COPY'd in Dockerfile causes silent UI breakage. Symptom: `next build` succeeds, container is "healthy", but CSS bundle is ~33 KB (only `@font-face`) and all Tailwind classes are absent → UI renders unstyled.** Two sub-bugs: (a) `postcss.config.mjs` missing entirely while `@tailwindcss/postcss` is in `package.json` (NoteLett, JarvisJr fixes `dff459e`, `36f6bc1`); (b) file exists but Dockerfile never COPYs it (Clock, LocalMemGPT fixes `a308c6444`, `07cdf6b`). | `*/web/Dockerfile`, `*/web/postcss.config.*` | **High** |
|
||||
| **F12** | **Healthcheck uses `localhost`, resolves to IPv6 `::1`, false-fails.** Backend listens on `0.0.0.0` (IPv4 only). `wget --spider http://localhost:.../health` hits `::1`, connection refused, container marked "unhealthy", `web` service won't start due to `depends_on: condition: service_healthy`. Incident: `learning_ai_jarvis_jr/docker-compose.yml`. | every `docker-compose*.yml` healthcheck | **Medium** |
|
||||
| **F13** | **Enumerated `COPY web/foo ./foo` pattern drifts from filesystem.** New config file added to repo but Dockerfile's enumerated COPY list isn't updated. Build succeeds silently with the file absent; behavior diverges from local dev. Root cause of F11(b). | every Dockerfile using enumerated COPY | **Medium** |
|
||||
|
||||
**Implication:** the original plan to "switch to `--frozen-lockfile` + Gitea
|
||||
registry" requires two upstream fixes first (F1, F2). The roadmap below
|
||||
accounts for that.
|
||||
**Implications:**
|
||||
|
||||
- The original "switch to `--frozen-lockfile` + Gitea registry" plan requires
|
||||
two upstream fixes first (F1, F2).
|
||||
- F11–F13 mean **correctness fixes must precede speed fixes**, otherwise we
|
||||
ship faster builds of broken apps.
|
||||
- A linter (Phase E `docker-doctor.sh`) is the durable insurance against
|
||||
F11/F13 recurrence — they are silent in CI today.
|
||||
|
||||
---
|
||||
|
||||
@@ -36,8 +48,8 @@ accounts for that.
|
||||
|
||||
| Path | Status today | Trigger | Notes |
|
||||
|---|---|---|---|
|
||||
| **`docker-prep.sh` tarballs** | **De facto default** in peakpulse + flowmonk; also works in clock | Run `docker-prep.sh` then `docker compose build` | Hermetic; mutates `package.json`; slow to repack |
|
||||
| **Gitea NPM registry** | Partially wired in clock + notes; broken in peakpulse | `docker compose build` with `GITEA_NPM_HOST` arg + secret | Needs `.npmrc.docker` standardization to actually be default |
|
||||
| **`docker-prep.sh` tarballs** | **De facto default** in peakpulse + flowmonk; also works in clock/notes | Run `docker-prep.sh` then `docker compose build` | Hermetic; mutates `package.json`; slow to repack |
|
||||
| **Gitea NPM registry** | Partially wired in clock + notes; broken in peakpulse | `docker compose build` with `GITEA_NPM_HOST` arg + secret | Needs `.npmrc.docker` standardization to be the default |
|
||||
| **Legacy `file:` refs** | Deprecated | — | Removed during pnpm/Gitea migration |
|
||||
|
||||
### Measurement targets
|
||||
@@ -56,11 +68,12 @@ accounts for that.
|
||||
|
||||
**Goals**
|
||||
|
||||
- ✅ Cut warm rebuild time via BuildKit pnpm-store cache mount (the single biggest win)
|
||||
- ✅ Make `docker-prep.sh` idempotent, safe to re-run, gitignore-clean
|
||||
- ✅ Eliminate F11–F13 class of silent "build green, app broken" failures
|
||||
- ✅ Cut warm rebuild time via BuildKit pnpm-store cache mount (single biggest speed win)
|
||||
- ✅ Make `docker-prep.sh` idempotent, safe to re-run, gitignore-clean, and canonical (no per-repo drift)
|
||||
- ✅ Standardize `.npmrc.docker` across the ecosystem so the Gitea path actually works
|
||||
- ✅ Fix `docker-compose.yml` to pass `GITEA_NPM_HOST` + secrets so the registry path is usable without manual flags
|
||||
- ✅ Document which path to use when, and the trade-offs
|
||||
- ✅ Ship `docker-doctor.sh` CI lint as the durable insurance layer
|
||||
|
||||
**Non-goals**
|
||||
|
||||
@@ -71,19 +84,49 @@ accounts for that.
|
||||
|
||||
---
|
||||
|
||||
## 3. Phase A — Build speed + path correctness
|
||||
## 2.5 Canonical decisions
|
||||
|
||||
Order matters: A0 must precede A1–A5 (you can't enable a path that doesn't work).
|
||||
Decisions taken now to avoid contradictions later in the doc:
|
||||
|
||||
### A0. Make the Gitea-registry path actually work (peakpulse + clock)
|
||||
- **Base image:** `node:22-alpine` is canonical. For repos blocked by the
|
||||
corporate proxy's Alpine SSL interception (currently only
|
||||
`learning_ai_notes`), the Dockerfile MUST expose:
|
||||
```dockerfile
|
||||
ARG BASE_IMAGE=node:22-alpine
|
||||
FROM ${BASE_IMAGE} AS builder
|
||||
```
|
||||
Override per-repo via `--build-arg BASE_IMAGE=node:22-slim`. Document the
|
||||
override in the repo's `AGENTS.md`.
|
||||
- **Healthcheck host:** `127.0.0.1` (NOT `localhost`) in every
|
||||
`docker-compose*.yml` `test:` block. See F12.
|
||||
- **Lockfile mode in Docker:** `--lockfile=false` for now. `--frozen-lockfile`
|
||||
is blocked on the A3 ADR (F2).
|
||||
|
||||
---
|
||||
|
||||
## 3. Phase A — Correctness + build speed + path correctness
|
||||
|
||||
Order matters: A0 must precede A1+ (you can't optimize a path that doesn't
|
||||
work), and A8+A9 (correctness) must land before measuring speed wins.
|
||||
|
||||
### A0. Make the Gitea-registry path actually work (clock + peakpulse)
|
||||
|
||||
- [ ] **A0-1.** Standardize `.npmrc.docker` to use a templated host so it works on host (`localhost`) and inside Docker (`host.docker.internal`):
|
||||
```
|
||||
@bytelyst:registry=http://${GITEA_NPM_HOST}:3300/api/packages/learning_ai_user/npm/
|
||||
//${GITEA_NPM_HOST}:3300/api/packages/learning_ai_user/npm/:_authToken=${GITEA_NPM_TOKEN}
|
||||
strict-ssl=false
|
||||
auto-install-peers=true
|
||||
```
|
||||
- [ ] **A0-2.** Remove `pnpm-lock.yaml` from `.dockerignore` in both repos (fixes F1)
|
||||
> **⚠️ Env-var expansion chain:** pnpm expands `${VAR}` in `.npmrc` at read
|
||||
> time using the current process environment (see [pnpm npmrc docs][pnpm-npmrc]).
|
||||
> That means the Dockerfile MUST do `ARG GITEA_NPM_HOST` → `ENV GITEA_NPM_HOST=$GITEA_NPM_HOST`
|
||||
> **before** the `pnpm install` RUN line, AND the `GITEA_NPM_TOKEN` must be
|
||||
> exported from the BuildKit secret mount inside the same `RUN` (since secrets
|
||||
> don't persist as env across layers).
|
||||
|
||||
[pnpm-npmrc]: https://pnpm.io/npmrc
|
||||
- [ ] **A0-2.** Remove `pnpm-lock.yaml` from `.dockerignore` in both repos (fixes F1; harmless under `--lockfile=false` since we don't COPY it, but unblocks future A3)
|
||||
- [ ] **A0-3.** Add `GITEA_NPM_HOST` build arg + `secrets:` block to every service in `docker-compose.yml`:
|
||||
```yaml
|
||||
build:
|
||||
@@ -98,27 +141,28 @@ Order matters: A0 must precede A1–A5 (you can't enable a path that doesn't wor
|
||||
environment: GITEA_NPM_TOKEN
|
||||
```
|
||||
- [ ] **A0-4.** Add `extra_hosts: ["host.docker.internal:host-gateway"]` to each service so Linux Docker can resolve the host
|
||||
- [ ] **A0-5.** Document required env: `GITEA_NPM_TOKEN` must be exported in the shell that runs `docker compose build`
|
||||
- [ ] **A0-5.** Document required env: `GITEA_NPM_TOKEN` must be exported in the shell that runs `docker compose build` (add to repo `README.md` quickstart)
|
||||
- [ ] **A0-V.** **Verification gate (between A0 and A1):** build the registry path **without** any cache-mount or layer optimizations. Confirm `docker compose build --no-cache` succeeds end-to-end pulling from Gitea. Only proceed to A1 once this is green. Don't conflate "make it work" with "make it fast" in one commit.
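The env-var expansion chain described under A0-1 fails quietly when a variable is missing: the placeholder expands to nothing and the registry URL is malformed. A minimal bash sketch of a pre-flight check, assuming the `.npmrc.docker` template uses only `${NAME}`-style placeholders; the function name `npmrc_unset_vars` is hypothetical and not part of the roadmap's scripts:

```bash
# Hypothetical pre-flight helper (not part of the roadmap's scripts).
# Prints the name of every ${VAR} placeholder in an .npmrc-style file that is
# unset or empty in the exported environment; returns 1 if any were printed.
npmrc_unset_vars() {
  local file="$1" name missing=0
  while IFS= read -r name; do
    # printenv only sees exported variables, which is also all pnpm can see.
    if [[ -z "$(printenv "$name" || true)" ]]; then
      echo "$name"
      missing=1
    fi
  # grep -o emits each placeholder on its own line; tr strips the "${" and "}".
  done < <(grep -oE '\$\{[A-Za-z_][A-Za-z0-9_]*\}' "$file" | tr -d '${}' | sort -u)
  return "$missing"
}
```

Running `npmrc_unset_vars .npmrc.docker` in the shell that will invoke `docker compose build` covers the host side only; inside the image the same variables still have to arrive through `ARG`/`ENV` and the secret mount.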
|
||||
|
||||
### A1. Replace `npm install -g pnpm@X` with corepack
|
||||
|
||||
- [ ] **A1-1.** Replace lines `RUN npm install -g pnpm@10.6.5` with:
|
||||
- [ ] **A1-1.** Replace `RUN npm install -g pnpm@10.6.5` with:
|
||||
```dockerfile
|
||||
RUN corepack enable && corepack prepare pnpm@10.6.5 --activate
|
||||
```
|
||||
- [ ] **A1-2.** Verify `packageManager` field in `backend/package.json` matches (already `pnpm@10.6.5` in peakpulse)
|
||||
- [ ] **A1-2.** Verify `packageManager` field in `backend/package.json` and `web/package.json` matches (already `pnpm@10.6.5` in peakpulse backend)
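The A1-2 check can be scripted so it runs the same way in every repo. The following is a sketch under assumptions: it parses `package.json` with `sed` instead of `jq` (so it only handles the field written on a single line), and the helper names are illustrative.

```bash
# Hypothetical helpers; names are illustrative, not from the roadmap.

# Print the "packageManager" value of a package.json (e.g. pnpm@10.6.5).
# Uses sed rather than jq so it runs on a bare CI image.
package_manager_of() {
  sed -nE 's/.*"packageManager"[[:space:]]*:[[:space:]]*"([^"]+)".*/\1/p' "$1" | head -n 1
}

# Usage: check_package_manager pnpm@10.6.5 backend/package.json web/package.json
check_package_manager() {
  local expected="$1" file actual rc=0
  shift
  for file in "$@"; do
    actual="$(package_manager_of "$file")"
    # Corepack allows a "+<hash>" suffix after the version; ignore it here.
    if [[ "${actual%%+*}" != "$expected" ]]; then
      echo "✗ $file: packageManager is '${actual:-<missing>}', expected '$expected'"
      rc=1
    fi
  done
  return "$rc"
}
```

A missing field is reported as a failure, since `corepack prepare` in the Dockerfile and the repo's declared version would then have nothing to agree on.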
|
||||
|
||||
### A2. Add BuildKit pnpm-store cache mount
|
||||
|
||||
- [ ] **A2-1.** Set `# syntax=docker/dockerfile:1.7` directive at top of every Dockerfile
|
||||
- [ ] **A2-2.** Wrap install step with cache mount:
|
||||
- [ ] **A2-2.** Wrap install step with cache + secret mount:
|
||||
```dockerfile
|
||||
RUN --mount=type=cache,id=pnpm,target=/root/.local/share/pnpm/store \
|
||||
--mount=type=secret,id=gitea_npm_token \
|
||||
export GITEA_NPM_TOKEN="$(cat /run/secrets/gitea_npm_token 2>/dev/null || echo '')" && \
|
||||
pnpm install --ignore-scripts
|
||||
pnpm install --ignore-scripts --lockfile=false
|
||||
```
|
||||
- [ ] **A2-3.** Verify cache hit on second build via `docker buildx du` or `docker history`
|
||||
- [ ] **A2-3.** Verify cache mount is active: `docker buildx du --filter type=exec.cachemount` shows non-zero size after a build. **Real success metric** is wall-clock: warm rebuild (touching one source file) drops to < 30 s.
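Since the wall-clock number is the real success metric, it helps to make the 30 s budget a pass/fail check rather than something read off a terminal. A minimal bash sketch; `within_budget` is a hypothetical helper, and whole-second resolution from `date +%s` is assumed to be precise enough for this threshold:

```bash
# Hypothetical helper: run a command and fail if it is not faster than a budget.
# Usage: within_budget <max_seconds> <command> [args...]
within_budget() {
  local budget="$1" start end elapsed
  shift
  start=$(date +%s)
  # Propagate the command's own failure before judging its duration.
  "$@" || return $?
  end=$(date +%s)
  elapsed=$((end - start))
  echo "elapsed: ${elapsed}s (budget: < ${budget}s)"
  (( elapsed < budget ))
}
```

For the warm-rebuild case this would be invoked as `touch backend/src/server.ts && within_budget 30 docker compose build backend`.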
|
||||
|
||||
### A3. Decide lockfile policy (BLOCKED on F2 resolution)
|
||||
|
||||
@@ -139,20 +183,18 @@ Two options — pick one in a short ADR before implementing:
|
||||
|
||||
### A4. Restructure layer order
|
||||
|
||||
- [ ] **A4-1.** Reorder COPY/RUN so deps install layer is `package.json` + `.npmrc` ONLY, then a separate layer for `src/`, `tsconfig.json`, `shared/`
|
||||
- [ ] **A4-2.** Move all `ARG` lines that affect deps install **before** the install step; move `NEXT_PUBLIC_*` ARGs (clock web) closer to the build step
|
||||
- [ ] **A4-1.** Reorder COPY/RUN so deps-install layer is `package.json` + `.npmrc.docker` ONLY, then a separate layer for `src/`, config files, `shared/`
|
||||
- [ ] **A4-2.** Move all `ARG` lines that affect deps install **before** the install step; move `NEXT_PUBLIC_*` ARGs (web) closer to the build step (they invalidate the build layer, not the deps layer)
|
||||
|
||||
### A5. Gate `.docker-deps/` behind a build arg
|
||||
|
||||
- [ ] **A5-1.** Add `ARG USE_TARBALLS=false` to Dockerfile
|
||||
- [ ] **A5-2.** Conditionally copy:
|
||||
- [ ] **A5-2.** Use wildcard COPY so missing dir doesn't break the build:
|
||||
```dockerfile
|
||||
# Always-empty placeholder so COPY doesn't fail in registry mode
|
||||
RUN mkdir -p /app/.docker-deps
|
||||
COPY .docker-deps* /app/.docker-deps/
|
||||
```
|
||||
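(The wildcard tolerates a missing `.docker-deps/` dir: BuildKit accepts a wildcard source that matches nothing, whereas the legacy builder fails with "no source files were specified". No `COPY --from` workaround is needed.)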
|
||||
- [ ] **A5-3.** Verify `.docker-deps/` is in `.gitignore` and `.dockerignore` is NOT excluding it when tarball mode is in use
|
||||
- [ ] **A5-3.** Verify `.docker-deps/` is in `.gitignore` and `.dockerignore` does NOT exclude it when tarball mode is in use
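The two ignore-file conditions in A5-3 pull in opposite directions, which makes them easy to get wrong by hand. A rough bash sketch of the check, assuming the entry is written as a plain path line (`.docker-deps`, with or without a leading or trailing slash); negation rules and broader glob patterns are not handled, and both function names are hypothetical:

```bash
# Hypothetical helpers; they only recognize plain path entries.

# Return 0 if ignore file "$2" lists path "$1" as an exact line,
# with or without a leading or trailing slash.
has_ignore_entry() {
  local entry="$1" file="$2"
  [[ -f "$file" ]] || return 1
  grep -qxF -e "$entry" -e "$entry/" -e "/$entry" -e "/$entry/" "$file"
}

# Usage: check_docker_deps_ignores <repo_dir> [registry|tarball]
check_docker_deps_ignores() {
  local repo="$1" mode="${2:-registry}" rc=0
  if ! has_ignore_entry ".docker-deps" "$repo/.gitignore"; then
    echo "✗ .docker-deps/ is missing from .gitignore"
    rc=1
  fi
  # In tarball mode the build context must still contain the tarballs.
  if [[ "$mode" == "tarball" ]] && has_ignore_entry ".docker-deps" "$repo/.dockerignore"; then
    echo "✗ .dockerignore excludes .docker-deps/ (tarball mode needs it)"
    rc=1
  fi
  return "$rc"
}
```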
|
||||
|
||||
### A6. `.dockerignore` audit
|
||||
|
||||
@@ -164,37 +206,93 @@ Two options — pick one in a short ADR before implementing:
|
||||
|
||||
| Repo | Surface | Cold before | Cold after | Warm before | Warm after | Notes |
|
||||
|---|---|---|---|---|---|---|
|
||||
| peakpulse | backend | — | — | — | — | |
|
||||
| clock | backend | — | — | — | — | |
|
||||
| clock | web | — | — | — | — | |
|
||||
| clock | backend | — | — | — | — | |
|
||||
| peakpulse | backend | — | — | — | — | |
|
||||
|
||||
Use:
|
||||
```
|
||||
time DOCKER_BUILDKIT=1 docker compose build --no-cache backend # cold
|
||||
time DOCKER_BUILDKIT=1 docker compose build --no-cache backend # cold
|
||||
touch backend/src/server.ts && time docker compose build backend # warm
|
||||
```
|
||||
|
||||
### A8. Config-file COPY audit & canonical pattern (addresses F11, F13)
|
||||
|
||||
- [ ] **A8-1.** For every Dockerfile in scope, list all build-time files present in the surface directory (`web/` or `backend/`) that affect the build:
|
||||
- `postcss.config.{js,mjs,cjs,ts}`
|
||||
- `tailwind.config.{js,mjs,cjs,ts}`
|
||||
- `next.config.{js,mjs,ts}`
|
||||
- `tsconfig*.json`
|
||||
- `package.json`
|
||||
- `.npmrc.docker`, `.npmrc`
|
||||
- `babel.config.*` (if present)
|
||||
- `drizzle.config.*` (if present)
|
||||
- `vitest.config.*` (only if the build needs it)
|
||||
Verify each is COPY'd in the Dockerfile.
|
||||
- [ ] **A8-2.** Choose canonical COPY pattern. **Decision: middle-ground glob** for web surfaces:
|
||||
```dockerfile
|
||||
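# Dockerfile COPY globs follow Go's filepath.Match, which has no {a,b} brace
# expansion, so each extension needs its own pattern.
COPY web/*.json web/*.ts web/*.mjs web/*.js web/*.cjs ./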
|
||||
COPY web/public/ ./public/
|
||||
COPY web/src/ ./src/
|
||||
```
|
||||
Trade-off: glob picks up unintended root-level files if any are added later, but **dramatically reduces F11/F13 risk**. Backend surfaces with few root config files can keep enumerated COPY (lower risk surface).
|
||||
- [ ] **A8-3.** Repo-by-repo migration: replace enumerated `COPY web/foo ./foo` with the glob pattern; verify the resulting image has all expected files via `docker run --rm <img> ls -la`.
|
||||
|
||||
### A9. Healthcheck canonicalization (addresses F12)
|
||||
|
||||
- [ ] **A9-1.** Replace `localhost` with `127.0.0.1` in every `docker-compose*.yml` healthcheck `test:` block. Sweep with:
|
||||
```
|
||||
rg -l 'http://localhost' --glob 'docker-compose*.yml'
|
||||
```
|
||||
- [ ] **A9-2.** Standardize healthcheck shape:
|
||||
- **Alpine-based images:**
|
||||
```yaml
|
||||
healthcheck:
|
||||
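# $$ defers expansion to the container shell; a bare ${PORT} is interpolated by Compose on the host
test: ["CMD-SHELL", "wget -q --spider http://127.0.0.1:$${PORT}/health || exit 1"]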
|
||||
interval: 30s
|
||||
timeout: 5s
|
||||
retries: 3
|
||||
start_period: 10s
|
||||
```
|
||||
- **Slim/Debian images** (`wget` not always present, but `node` is):
|
||||
```yaml
|
||||
healthcheck:
|
||||
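# $$ defers expansion to the container shell; a bare ${PORT} is interpolated by Compose on the host
test: ["CMD-SHELL", "node -e \"fetch('http://127.0.0.1:$${PORT}/health').then(r=>process.exit(r.ok?0:1)).catch(()=>process.exit(1))\""]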
|
||||
```
|
||||
- [ ] **A9-3.** Add `start_period` (10s minimum) — prevents flaky "container started but app not yet listening" false-negatives.
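The A9-1 sweep finds offending files; the rewrite itself can also be mechanical. A small bash sketch, assuming each healthcheck keeps its URL on the same line as `test:` (the inline-array form used throughout this roadmap); multi-line `test:` lists would need manual review. The function name is hypothetical:

```bash
# Hypothetical filter: reads compose YAML on stdin, writes it to stdout with
# http://localhost rewritten to http://127.0.0.1 on "test:" lines only.
# Other occurrences (e.g. NEXT_PUBLIC_* URLs meant for a browser) are untouched.
canonicalize_healthcheck_host() {
  sed -E '/test:/ s#http://localhost([:/])#http://127.0.0.1\1#g'
}
```

A cautious way to apply it is to write to a temporary file and review the change, for example `canonicalize_healthcheck_host < docker-compose.yml > docker-compose.yml.new` followed by `diff -u docker-compose.yml docker-compose.yml.new`.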
|
||||
|
||||
---
|
||||
|
||||
## 4. Phase B — Hermetic-fallback polish (`docker-prep.sh`)
|
||||
|
||||
The script is **duplicated with minor variations** across product repos. Pilot
|
||||
in peakpulse + clock, then propose a canonical home.
|
||||
`docker-prep.sh` is duplicated with minor variations across product repos.
|
||||
**Promotion to canonical home is now in Phase B, not Phase D** — drift
|
||||
accumulates over time, and the `.npmrc` template precedent shows the
|
||||
pattern is cheap.
|
||||
|
||||
- [ ] **B1.** Add `--dry-run` flag — list packs/rewrites, no side effects
|
||||
- [ ] **B2.** Idempotency guard — refuse to run if any `*.bak` exists unless `--force`
|
||||
- [ ] **B3.** Ensure `.docker-deps/` and `*.bak` are in `.gitignore` of every pilot repo
|
||||
- [ ] **B4.** Pre-commit hook (husky) — block commits containing `"file:../.docker-deps/"` inside any `package.json`. Add to `.husky/pre-commit`:
|
||||
- [ ] **B4.** Pre-commit hook (husky) — block commits containing rewritten `package.json`, staged tarballs, OR `.bak` files:
|
||||
```bash
|
||||
# .husky/pre-commit
|
||||
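# Only inspect staged package.json files that still exist (added/copied/modified).
if git diff --cached --name-only --diff-filter=ACM -- '*package.json' | xargs grep -l '"file:\.\./\.docker-deps/' 2>/dev/null; then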
|
||||
echo "ERROR: rewritten package.json detected. Run scripts/docker-prep.sh --restore first."
|
||||
exit 1
|
||||
fi
|
||||
if git diff --cached --name-only | grep -qE '(\.docker-deps/.*\.tgz|package\.json\.bak)$'; then
|
||||
echo "ERROR: docker-prep.sh artifacts staged. Run --restore first."
|
||||
exit 1
|
||||
fi
|
||||
```
|
||||
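- [ ] **B5.** Auto-restore on script error via `trap restore_on_error EXIT` (unless `--keep` passed). An `EXIT` trap also fires on success, so the handler must check `$?` and restore only on a non-zero status.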
|
||||
- [ ] **B6.** Update script header comment with explicit "use only when Gitea unreachable OR you need uncommitted common-plat changes"
|
||||
- [ ] **B7.** Propose canonical home: `learning_ai_common_plat/scripts/docker-prep.template.sh` + `sync-docker-prep.sh` (mirrors `.npmrc` template pattern). Defer execution to Phase D.
|
||||
- [ ] **B8.** Add a `--strip-overrides` option that removes `pnpm.overrides` block after build, in case `--restore` is forgotten (additional safety net)
|
||||
- [ ] **B6.** Update script header comment per § 7.4 template
|
||||
- [ ] **B7. CANONICAL HOME (was deferred — now in Phase B proper).**
|
||||
- [ ] **B7-1.** Move script to `learning_ai_common_plat/scripts/docker-prep.template.sh`
|
||||
- [ ] **B7-2.** Add `learning_ai_common_plat/scripts/sync-docker-prep.sh` to copy template into all product repos (mirrors `sync-npmrc.sh`)
|
||||
- [ ] **B7-3.** Add `learning_ai_common_plat/scripts/check-docker-prep-drift.sh` for CI (mirrors `check-npmrc-drift.sh`)
|
||||
- [ ] **B7-4.** Update every repo's `AGENTS.md` with the "NEVER edit `docker-prep.sh` directly" warning + template link
|
||||
- [ ] **B8.** Add `--strip-overrides` option that removes `pnpm.overrides` block after build — safety net in case `--restore` is forgotten
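B2 and B5 interact: the guard has to run before the trap is installed, otherwise a refused run would "restore" a leftover backup as a side effect. The following bash sketch shows one way to order them for a single `package.json`. It is an illustration under assumptions, not the actual `docker-prep.sh`; the function names and the positional `force`/`keep` arguments are hypothetical stand-ins for `--force` and `--keep`.

```bash
# Hypothetical sketch of B2 + B5 for one package.json; function names are
# illustrative and the real docker-prep.sh may be structured differently.

restore_backup() {
  local pkg="$1"
  if [[ -f "$pkg.bak" ]]; then mv -f "$pkg.bak" "$pkg"; fi
}

# Usage: prep_package_json <package.json> <force:0|1> <keep:0|1> <rewrite cmd...>
# The body runs in a subshell ( ... ) so the trap stays local to this call.
prep_package_json() (
  pkg="$1"; force="$2"; keep="$3"
  shift 3
  # B2: refuse to run over a leftover backup unless forced.
  if [[ -f "$pkg.bak" && "$force" != 1 ]]; then
    echo "ERROR: $pkg.bak exists; run --restore first or pass --force" >&2
    exit 2
  fi
  # Never overwrite an existing backup: it holds the only pristine copy.
  [[ -f "$pkg.bak" ]] || cp "$pkg" "$pkg.bak"
  # B5: an EXIT trap also fires on success, so restore only on failure.
  trap 'status=$?
        if (( status != 0 )) && [[ "$keep" != 1 ]]; then restore_backup "$pkg"; fi
        exit "$status"' EXIT
  "$@"
)
```

On success the backup is deliberately left in place so that a later `--restore` can undo the rewrite after `docker compose build` has finished.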
|
||||
|
||||
---
|
||||
|
||||
@@ -205,30 +303,32 @@ Pilot exit criteria (must all pass before Phase D):
|
||||
- [ ] **C1.** Cold Docker build succeeds on both pilots via Gitea-registry path (no `docker-prep.sh` invocation)
|
||||
- [ ] **C2.** Warm rebuild (single source file touched) < 30 s on both pilots
|
||||
- [ ] **C3.** `docker-prep.sh` → `docker compose build` → `--restore` leaves `git status` clean
|
||||
- [ ] **C4.** Pre-commit hook blocks a deliberately-staged rewritten `package.json`
|
||||
- [ ] **C4.** Pre-commit hook blocks: (a) rewritten `package.json`, (b) staged `.tgz`, (c) staged `.bak`
|
||||
- [ ] **C5.** Gitea Actions CI green on both pilots (verify CI uses the same Dockerfile path)
|
||||
- [ ] **C6.** Build-time metrics filled into the table in § 3.A7
|
||||
- [ ] **C7.** Decision recorded in ADR for A3 (lockfile policy)
|
||||
- [ ] **C7.** ADR recorded for A3 (lockfile policy)
|
||||
- [ ] **C8.** `docker-doctor.sh` (Phase E) runs clean against both pilots
|
||||
- [ ] **C9.** Smoke test: render the web app, inspect `<head>` for non-trivial CSS bundle (> 50 KB), confirm Tailwind classes apply. Guard against F11 regression.
|
||||
|
||||
---
|
||||
|
||||
## 6. Phase D — Ecosystem rollout (deferred until § 5 passes)
|
||||
|
||||
Apply Phase A0 → A2 + A4 → A6 + B to remaining repos. **Pilots excluded.**
|
||||
Apply Phase A + B + E to remaining repos. **Pilots excluded.**
|
||||
|
||||
| Repo | Backend | Web | docker-prep | Notes |
|
||||
|---|---|---|---|---|
|
||||
| `learning_ai_notes` | ☐ | ☐ | ☐ | Uses `node:22-slim` (corp proxy / Alpine SSL issue) |
|
||||
| `learning_ai_fastgap` | ☐ | ☐ | ☐ | Mobile + web + backend |
|
||||
| `learning_ai_jarvis_jr` | ☐ | ☐ | ☐ | |
|
||||
| `learning_ai_flowmonk` | ☐ | ☐ | ☐ | `.npmrc.docker` is tarball-only — needs A0-1 |
|
||||
| `learning_ai_trails` | ☐ | ☐ | ☐ | |
|
||||
| `learning_ai_local_memory_gpt` | ☐ | ☐ | ☐ | SQLite-based, no Cosmos |
|
||||
| `learning_multimodal_memory_agents` (MindLyst) | ☐ | ☐ | ☐ | KMP repo, different layout |
|
||||
| `learning_voice_ai_agent` (LysnrAI) | ☐ | ☐ | ☐ | Python desktop + TS dashboards |
|
||||
| `learning_ai_efforise` | ☐ | ☐ | ☐ | |
|
||||
| `learning_ai_auth_app` | ☐ | n/a | ☐ | iOS/Android — no backend Dockerfile |
|
||||
| `learning_ai_talk2obsidian` | ☐ | ☐ | ☐ | Single-container app |
|
||||
| Repo | Backend | Web | docker-prep | Healthcheck | Notes |
|
||||
|---|---|---|---|---|---|
|
||||
| `learning_ai_notes` | ☐ | ☐ | ☐ | ☐ | `BASE_IMAGE=node:22-slim` override (corp proxy Alpine SSL) |
|
||||
| `learning_ai_fastgap` | ☐ | ☐ | ☐ | ☐ | Mobile + web + backend |
|
||||
| `learning_ai_jarvis_jr` | ☐ | ☐ | ☐ | ☐ | F12 incident already fixed; verify regression-proof |
|
||||
| `learning_ai_flowmonk` | ☐ | ☐ | ☐ | ☐ | `.npmrc.docker` is tarball-only — needs A0-1 |
|
||||
| `learning_ai_trails` | ☐ | ☐ | ☐ | ☐ | |
|
||||
| `learning_ai_local_memory_gpt` | ☐ | ☐ | ☐ | ☐ | SQLite-based; F11(b) already fixed `07cdf6b` — verify regression-proof |
|
||||
| `learning_multimodal_memory_agents` (MindLyst) | ☐ | ☐ | ☐ | ☐ | KMP repo, different layout |
|
||||
| `learning_voice_ai_agent` (LysnrAI) | ☐ | ☐ | ☐ | ☐ | Python desktop + TS dashboards |
|
||||
| `learning_ai_efforise` | ☐ | ☐ | ☐ | ☐ | |
|
||||
| `learning_ai_auth_app` | ☐ | n/a | ☐ | n/a | iOS/Android — no Docker surfaces |
|
||||
| `learning_ai_talk2obsidian` | ☐ | ☐ | ☐ | ☐ | Single-container app |
|
||||
|
||||
---
|
||||
|
||||
@@ -243,11 +343,12 @@ strict-ssl=false
|
||||
auto-install-peers=true
|
||||
```
|
||||
|
||||
### 7.2 Canonical backend Dockerfile (post Phase A)
|
||||
### 7.2 Canonical backend Dockerfile
|
||||
|
||||
```dockerfile
|
||||
# syntax=docker/dockerfile:1.7
|
||||
FROM node:22-alpine AS builder
|
||||
ARG BASE_IMAGE=node:22-alpine
|
||||
FROM ${BASE_IMAGE} AS builder
|
||||
WORKDIR /app/backend
|
||||
|
||||
ARG GITEA_NPM_HOST=host.docker.internal
|
||||
@@ -261,7 +362,7 @@ RUN corepack enable && corepack prepare pnpm@10.6.5 --activate
|
||||
# ── Deps layer (cacheable) ─────────────────────────────────────────
|
||||
COPY .npmrc.docker ./.npmrc
|
||||
COPY backend/package.json ./package.json
|
||||
# Tolerate missing .docker-deps/ when in registry mode (wildcard match)
|
||||
# Tolerate missing .docker-deps/ when in registry mode
|
||||
RUN mkdir -p /app/.docker-deps
|
||||
COPY .docker-deps* /app/.docker-deps/
|
||||
|
||||
@@ -277,7 +378,7 @@ COPY shared/ ../shared/
|
||||
RUN pnpm run build
|
||||
|
||||
# ── Runtime ────────────────────────────────────────────────────────
|
||||
FROM node:22-alpine
|
||||
FROM ${BASE_IMAGE}
|
||||
WORKDIR /app/backend
|
||||
ENV NODE_ENV=production
|
||||
COPY --from=builder /app/backend/node_modules ./node_modules
|
||||
@@ -289,7 +390,7 @@ CMD ["node", "dist/server.js"]
|
||||
```
|
||||
|
||||
> `--lockfile=false` is intentional pending the A3 ADR. Switch to
|
||||
> `--frozen-lockfile` once the sibling-workspace problem (F2) is resolved.
|
||||
> `--frozen-lockfile` only once the sibling-workspace problem (F2) is resolved.
|
||||
|
||||
### 7.3 Canonical `docker-compose.yml` service block
|
||||
|
||||
@@ -309,8 +410,16 @@ services:
|
||||
- "4010:4010"
|
||||
environment:
|
||||
- NODE_ENV=production
|
||||
- PORT=4010
|
||||
# ...
|
||||
restart: unless-stopped
|
||||
healthcheck:
|
||||
# F12: use 127.0.0.1 NOT localhost (IPv6 resolution false-fails)
|
||||
test: ["CMD-SHELL", "wget -q --spider http://127.0.0.1:4010/health || exit 1"]
|
||||
interval: 30s
|
||||
timeout: 5s
|
||||
retries: 3
|
||||
start_period: 10s
|
||||
|
||||
secrets:
|
||||
gitea_npm_token:
|
||||
@@ -337,6 +446,7 @@ secrets:
|
||||
# ./scripts/docker-prep.sh --force # override idempotency guard
|
||||
# ./scripts/docker-prep.sh --restore # undo rewrite
|
||||
# ./scripts/docker-prep.sh --keep # skip auto-restore on error
|
||||
# ./scripts/docker-prep.sh --strip-overrides # remove pnpm.overrides block
|
||||
#
|
||||
# Side effects:
|
||||
# - Creates .docker-deps/ (gitignored)
|
||||
@@ -347,58 +457,210 @@ secrets:
|
||||
# Safety:
|
||||
# - Refuses to run if .bak files already exist (unless --force)
|
||||
# - Auto-restores on error (trap EXIT) unless --keep passed
|
||||
# - Pre-commit hook blocks committing rewritten package.json
|
||||
# - Pre-commit hook blocks committing rewritten package.json, .tgz, .bak
|
||||
```
|
||||
|
||||
### 7.5 Canonical Next.js web Dockerfile (addresses F11, F13)

```dockerfile
# syntax=docker/dockerfile:1.7
ARG BASE_IMAGE=node:22-alpine
FROM ${BASE_IMAGE} AS deps
WORKDIR /app/web

ARG GITEA_NPM_HOST=host.docker.internal
ENV NODE_TLS_REJECT_UNAUTHORIZED=0
ENV NPM_CONFIG_STRICT_SSL=false
ENV GITEA_NPM_HOST=$GITEA_NPM_HOST

RUN corepack enable && corepack prepare pnpm@10.6.5 --activate

COPY .npmrc.docker ./.npmrc
COPY web/package.json ./package.json
RUN mkdir -p /app/.docker-deps
COPY .docker-deps* /app/.docker-deps/

RUN --mount=type=cache,id=pnpm,target=/root/.local/share/pnpm/store \
    --mount=type=secret,id=gitea_npm_token \
    export GITEA_NPM_TOKEN="$(cat /run/secrets/gitea_npm_token 2>/dev/null || echo '')" && \
    pnpm install --ignore-scripts --lockfile=false

# ── Builder ────────────────────────────────────────────────────────
FROM ${BASE_IMAGE} AS builder
WORKDIR /app/web
COPY --from=deps /app/web/node_modules ./node_modules
COPY --from=deps /app/web/package.json ./package.json

# F11/F13 fix: glob ALL root-level config files instead of enumerating.
# Picks up postcss.config.*, tailwind.config.*, next.config.*, tsconfig*,
# any future *.config.* additions without Dockerfile changes.
COPY web/*.json web/*.ts web/*.mjs web/*.js web/*.cjs ./
COPY web/public/ ./public/
COPY web/src/ ./src/
COPY shared/ ../shared/

ARG NEXT_PUBLIC_BACKEND_URL
ARG NEXT_PUBLIC_PLATFORM_SERVICE_URL
ENV NEXT_PUBLIC_BACKEND_URL=$NEXT_PUBLIC_BACKEND_URL
ENV NEXT_PUBLIC_PLATFORM_SERVICE_URL=$NEXT_PUBLIC_PLATFORM_SERVICE_URL
ENV NEXT_TELEMETRY_DISABLED=1

RUN corepack enable && pnpm run build

# ── Runtime (Next.js standalone) ───────────────────────────────────
FROM ${BASE_IMAGE} AS runner
WORKDIR /app/web
ENV NODE_ENV=production
ENV NEXT_TELEMETRY_DISABLED=1

COPY --from=builder /app/web/.next/standalone ./
# Next 16 standalone server runs as `node web/server.js` from /app/web,
# so static assets live at /app/web/web/.next/static (NOT ./.next/static).
COPY --from=builder /app/web/.next/static ./web/.next/static
COPY --from=builder /app/web/public ./web/public

EXPOSE 3000
ENV PORT=3000
ENV HOSTNAME=0.0.0.0
CMD ["node", "web/server.js"]
```

> **Verification step after every web Dockerfile change:** smoke-test the
> built image by running it and curling the rendered HTML. Confirm that the
> CSS bundle referenced by the `<link>` tags is larger than 50 KB. A bundle
> of ~33 KB is the F11 signature (only `@font-face`, no Tailwind utilities).

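
The verification step above could be scripted roughly as follows. This is a minimal sketch, not part of the roadmap: the function names `css_hrefs` and `css_size_ok`, the `MIN_CSS_BYTES` variable, and reading "50 KB" as 50 × 1024 bytes are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch of the CSS-size smoke check (F11 guard). All names here are
# illustrative; the threshold treats "50 KB" as 50 * 1024 bytes.

MIN_CSS_BYTES=$((50 * 1024))

# Print every stylesheet href found in HTML read from stdin, one per line.
css_hrefs() {
  grep -oE 'href="[^"]+\.css[^"]*"' | sed -E 's/^href="//; s/"$//' | sort -u
}

# Succeed only when the given byte count exceeds the threshold.
css_size_ok() {
  local bytes="$1"
  [ "$bytes" -gt "$MIN_CSS_BYTES" ]
}

# Possible usage against a running container (assumes it is published on
# 127.0.0.1:3000):
#   base=http://127.0.0.1:3000
#   total=0
#   for href in $(curl -fsS "$base/" | css_hrefs); do
#     total=$(( total + $(curl -fsS "$base$href" | wc -c) ))
#   done
#   css_size_ok "$total" || { echo "✗ F11: CSS is only $total bytes"; exit 1; }
```

Summing all stylesheets rather than checking a single file is a design choice of this sketch; a build that splits CSS across several chunks would otherwise trip the check spuriously.
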
### 7.6 `docker-doctor.sh` skeleton (Phase E)

```bash
#!/usr/bin/env bash
# docker-doctor.sh — pre-flight Dockerfile + docker-compose health checks.
# Run on PRs touching Dockerfile, docker-compose*.yml, .dockerignore.
set -euo pipefail

# Callers (e.g. a per-repo wrapper) may export REPO_DIR; otherwise default
# to the parent of this script's directory.
REPO_DIR="${REPO_DIR:-$(cd "$(dirname "$0")/.." && pwd)}"
FAILED=0

# Check 1 (A8/F11/F13): every config file in web/ is COPY'd in web/Dockerfile
for cfg in postcss.config tailwind.config next.config; do
  for f in "$REPO_DIR"/web/${cfg}.{js,mjs,cjs,ts}; do
    [[ -f "$f" ]] || continue
    base=$(basename "$f")
    # Fixed-string match: an explicit COPY of the file, or a glob COPY.
    if ! grep -qF -e "COPY web/${base}" -e 'COPY web/*' "$REPO_DIR/web/Dockerfile" 2>/dev/null; then
      echo "✗ F11/F13: $base exists but not COPY'd in web/Dockerfile"
      FAILED=1
    fi
  done
done

# Check 2 (A9/F12): healthchecks use 127.0.0.1
if grep -rE 'test:.*http://localhost' "$REPO_DIR"/docker-compose*.yml 2>/dev/null; then
  echo "✗ F12: healthcheck uses localhost (should be 127.0.0.1)"
  FAILED=1
fi

# Check 3: .npmrc.docker matches canonical template
if [[ -f "$REPO_DIR/.npmrc.docker" ]]; then
  if ! grep -qF '${GITEA_NPM_HOST}' "$REPO_DIR/.npmrc.docker"; then
    echo "✗ F4: .npmrc.docker doesn't use \${GITEA_NPM_HOST} placeholder"
    FAILED=1
  fi
fi

# Check 4: .dockerignore doesn't exclude pnpm-lock.yaml (warning only)
if grep -q '^pnpm-lock\.yaml$' "$REPO_DIR/.dockerignore" 2>/dev/null; then
  echo "⚠ F1: .dockerignore excludes pnpm-lock.yaml (blocks lockfile optimization)"
fi

# Check 5: base image is on approved list
# Note: passes as soon as one FROM line matches; it does not inspect every stage.
for df in "$REPO_DIR"/{backend,web}/Dockerfile; do
  [[ -f "$df" ]] || continue
  if ! grep -qE 'FROM (\$\{BASE_IMAGE\}|node:22-(alpine|slim))' "$df"; then
    echo "✗ Unapproved base image in $df"
    FAILED=1
  fi
done

exit "$FAILED"
```

---

## 8. Open questions (numbered TODOs, not blockers)

## 8. Phase E — Observability / lint (NEW)

1. **Shared pnpm cache volume?** Should the BuildKit pnpm store cache be shared
   across all 13 repos via a named Docker volume (`pnpm-store`) instead of
   per-repo BuildKit caches keyed by `id=pnpm`? (BuildKit caches are already
   shared by `id=` — verify before adding volume complexity.)
2. **Custom base image?** Publish `bytelyst/node-pnpm:22` with pnpm
   pre-installed to skip the corepack step entirely. Cost: maintenance of a
   base image; benefit: ~5 s/build × 13 repos × N builds/day.
3. **CI hostname?** Gitea Actions runs builds with `--add-host` to reach the
   registry. Is `host.docker.internal:host-gateway` portable to Linux CI
   runners, or do we need a CI-specific Dockerfile variant?
4. **Canonical script home?** `docker-prep.sh` is currently per-repo with
   drift. Move to `learning_ai_common_plat/scripts/docker-prep.template.sh`
   with a `sync-docker-prep.sh` (mirrors `.npmrc` template pattern)?
5. **Multi-platform builds?** Any need for `linux/amd64` + `linux/arm64`
   images? If yes, BuildKit cache mounts interact awkwardly with `buildx`
   `--platform`. Defer to separate roadmap.
6. **Workspace flattening?** Could we eliminate the
   `../learning_ai_common_plat/packages/*` workspace entry inside Docker by
   building with a flattened `pnpm-workspace.yaml` (only local `backend/`)?
   This unlocks `--frozen-lockfile`. Requires lockfile regeneration step.

New phase: `docker-doctor.sh` (see § 7.6 skeleton) as durable insurance
against the class of silent bugs hit tonight (F11, F12, F13).


- [ ] **E1.** Land `docker-doctor.sh` in `learning_ai_common_plat/scripts/` (canonical)
- [ ] **E2.** Provide a thin per-repo wrapper at `scripts/docker-doctor.sh` that calls the canonical script
- [ ] **E3.** Wire into CI: run on PRs touching `Dockerfile`, `docker-compose*.yml`, `.dockerignore`, `.npmrc.docker`
- [ ] **E4.** Wire into pre-commit hook (warning-only at first, error after 2 weeks)
- [ ] **E5.** Document checks in `learning_ai_common_plat/AI.dev/SKILLS/docker-doctor.md`
- [ ] **E6.** Add `make docker-doctor` target to each pilot repo

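
The thin wrapper from E2 might look something like the sketch below. It rests on several assumptions that are not in the roadmap: that `learning_ai_common_plat` is checked out as a sibling directory, that a `COMMON_PLAT_DIR` environment variable (hypothetical) can override that location, and that the canonical script honours an exported `REPO_DIR` instead of deriving the repo root from its own path.

```bash
#!/usr/bin/env bash
# scripts/docker-doctor.sh: hypothetical thin per-repo wrapper (E2).
# COMMON_PLAT_DIR and the helper names are illustrative assumptions.

# Print the path of the canonical script, or fail if it is missing.
resolve_canonical_doctor() {
  local repo_dir="$1"
  local plat_dir="${COMMON_PLAT_DIR:-$repo_dir/../learning_ai_common_plat}"
  local script="$plat_dir/scripts/docker-doctor.sh"
  if [ -f "$script" ]; then
    printf '%s\n' "$script"
  else
    echo "docker-doctor: canonical script not found at $script" >&2
    return 1
  fi
}

# Run the canonical checks against the repo that contains this wrapper.
run_doctor() {
  local repo_dir canonical
  repo_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
  canonical="$(resolve_canonical_doctor "$repo_dir")" || return 1
  # Assumes the canonical script reads REPO_DIR from the environment.
  REPO_DIR="$repo_dir" bash "$canonical" "$@"
}

# A real wrapper would end with:
#   run_doctor "$@"
```

Failing loudly when the canonical script is absent, rather than silently skipping the checks, keeps the wrapper from turning into another source of green-but-broken builds.
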

Checks implemented:

| Check | Addresses | Action |
|---|---|---|
| Every `web/*.config.*` file is COPY'd | F11, F13 | Error |
| `docker-compose.yml` healthcheck uses `127.0.0.1` | F12 | Error |
| `.npmrc.docker` uses `${GITEA_NPM_HOST}` placeholder | F4 | Error |
| `.dockerignore` doesn't exclude `pnpm-lock.yaml` | F1 | Warn (until A3 ADR lands) |
| Base image is on approved list | Canonical decision | Error |
| `.docker-deps/` and `*.bak` in `.gitignore` | B3 | Error |

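
The last row of the table (B3) has no counterpart in the § 7.6 skeleton. One way it might be written is sketched below; the function name `check_gitignore` is hypothetical, and the sketch only matches literal `.gitignore` lines, so an equivalent but differently spelled pattern (for example `/.docker-deps`) would be reported as missing.

```bash
# Hypothetical check for the B3 row: .docker-deps/ and *.bak must be ignored.
# Prints one line per missing pattern and returns non-zero if any is missing.
check_gitignore() {
  local gitignore="$1" missing=0 pattern
  for pattern in '.docker-deps/' '*.bak'; do
    # -x: whole-line match, -F: treat the pattern as a fixed string.
    if ! grep -qxF -- "$pattern" "$gitignore" 2>/dev/null; then
      echo "✗ B3: $pattern missing from $gitignore"
      missing=1
    fi
  done
  return "$missing"
}
```

Inside `docker-doctor.sh` this could be called as `check_gitignore "$REPO_DIR/.gitignore" || FAILED=1`, mirroring how the other error-level checks set `FAILED`.
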
---
## 9. Execution order

## 9. Open questions (numbered TODOs, not blockers)

1. **Now (this commit):** roadmap doc lands here; sign-off requested.
2. **A0 first** — fix `.npmrc.docker`, `docker-compose.yml`, `.dockerignore` on both pilots. Without this, the Gitea path doesn't work and no measurement is possible.
3. **A1 + A2** on peakpulse backend. Measure. Commit.
4. **A1 + A2** on clock backend, then clock web. Measure. Commit.
5. **A4 + A5 + A6** on all three surfaces. Commit.
6. **A3 ADR** — decide lockfile policy (defer implementation).
7. **A7** — fill in metrics table.
8. **Phase B** — harden `docker-prep.sh` on peakpulse, then mirror to clock.
9. **Phase C** — verification gates C1–C7.
10. **Phase D** — scheduled separately, only after § 5 passes.

1. **Shared pnpm cache volume?** BuildKit caches are already shared across
   builds by `id=pnpm`. Test whether a named Docker volume adds anything
   before adding complexity.
2. **Custom base image?** Publish `bytelyst/node-pnpm:22{alpine,slim}` with
   pnpm pre-installed to skip corepack. Cost: image maintenance; benefit: ~5 s/build.
3. **CI hostname?** Verify that `host.docker.internal:host-gateway` works on
   Gitea Actions Linux runners, or whether a CI-specific Dockerfile variant is needed.
4. **Multi-platform builds?** `linux/amd64` + `linux/arm64` interact awkwardly
   with cache mounts under `buildx`. Defer to separate roadmap.
5. **Workspace flattening?** Eliminate the `../learning_ai_common_plat/packages/*`
   workspace entry inside Docker via a flattened `pnpm-workspace.yaml`.
   Unlocks `--frozen-lockfile`. Requires lockfile regeneration step.

---

## 10. Risk register

## 10. Execution order

1. **Now (this commit):** roadmap doc v3 lands here; sign-off requested.
2. **Phase A0 on `learning_ai_clock`** (web + backend) — pilot order
   intentionally inverted vs. v2: web is where F11/F13 incidents lived, and
   clock exercises both surface types in one repo. Fix `.npmrc.docker`,
   `docker-compose.yml`, `.dockerignore`. Verify **A0-V** (Gitea path works
   end-to-end) before any speed work.
3. **A8 + A9 + A1** on clock (correctness before speed). Commit.
4. **A2 + A4 + A5 + A6** on clock. Measure. Commit.
5. **Phase A0 → A6** on `learning_ai_peakpulse` (backend only) as a second
   validation pass on the simpler case.
6. **A7** — fill in metrics table.
7. **A3 ADR** — decide lockfile policy (defer implementation).
8. **Phase B** — harden `docker-prep.sh` on clock, then promote to canonical
   home in common-plat (B7) and sync to peakpulse.
9. **Phase E** — land `docker-doctor.sh`, wire into CI as warning, then error.
10. **Phase C** — verification gates C1–C9.
11. **Phase D** — scheduled separately, only after § 5 passes.

---

## 11. Risk register

| Risk | Mitigation |
|---|---|
| Removing `pnpm-lock.yaml` from `.dockerignore` exposes a stale or sibling-aware lockfile that breaks Docker installs | Keep `--lockfile=false` for now (A3 ADR); revisit after F2 resolution |
| BuildKit cache mount on shared CI runners causes cross-build interference | Use distinct `id=` per repo (`id=pnpm-${repo}`) if observed |
| `host.docker.internal` doesn't resolve in Linux Docker | `extra_hosts: ["host.docker.internal:host-gateway"]` (added in A0-4) |
| `host.docker.internal` doesn't resolve in Linux Docker | `extra_hosts: ["host.docker.internal:host-gateway"]` (A0-4) |
| Removing `.docker-deps/` from default builds breaks repos that haven't done A0 yet | Wildcard `COPY .docker-deps*` keeps both paths working during migration |
| `docker-prep.sh` `--force` is misused and `.bak` files get committed | Pre-commit hook (B4) blocks this regardless |
| Corp network blocks `host.docker.internal:3300` | Verify SSH tunnel (`localhost:3300` from host) reaches Gitea; document in operations.md |
| `docker-prep.sh` `--force` is misused and `.bak` files get committed | Pre-commit hook (B4) blocks `.bak`, `.tgz`, rewritten `package.json` |
| Corp network blocks `host.docker.internal:3300` | Verify SSH tunnel reaches Gitea; document in operations.md |
| **F11 regression: build green, app ships with no CSS** | C9 smoke test + Phase E `docker-doctor.sh` check on `web/*.config.*` COPY coverage |
| **F12 regression: healthcheck false-fails on IPv6** | Phase E `docker-doctor.sh` grep for `localhost` in compose files |
| **F13 regression: new config file added, Dockerfile forgotten** | A8-2 glob COPY pattern (root cause fix) + Phase E lint (defense in depth) |
| `BASE_IMAGE` override in `notes` diverges silently from canonical | Phase E check against the approved list; document override in repo `AGENTS.md` |

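
The `host.docker.internal` row lends itself to a lint as well. The helper below is a sketch only: `has_host_gateway` is a hypothetical name, it is not one of the checks listed for Phase E, and it does a plain text match rather than parsing the compose YAML.

```bash
# Hypothetical helper: succeed when the compose file maps
# host.docker.internal to the host gateway (the A0-4 mitigation).
has_host_gateway() {
  local compose_file="$1"
  # Fixed-string match; a missing file counts as "not configured".
  grep -qF 'host.docker.internal:host-gateway' "$compose_file" 2>/dev/null
}
```

A caller might use it as `has_host_gateway docker-compose.yml || echo "⚠ extra_hosts mapping missing"`, treating the result as a warning until the CI-hostname question is settled.
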