# Docker Build Optimization Roadmap
> **Status:** Draft v7 (Phase A complete on clock; warm rebuilds 2.9 s backend / 5.4 s web) · **Owner:** Platform DevOps · **Created:** 2026-05-27 · **Revised:** 2026-05-27
>
> Pilot Docker-build correctness + speed fixes on `learning_ai_clock` (web + backend)
> and `learning_ai_peakpulse` (backend), then capture the playbook here for
> ecosystem-wide rollout.
>
> **Upstream prerequisite shipped (commit `610a59fd` in `learning_ai_common_plat`):**
> Gitea owner parameterization + helper scripts (`scripts/gitea/doctor.sh`,
> `scripts/gitea/token.sh`). The `.npmrc` template now resolves owner from
> `${GITEA_NPM_OWNER:-learning_ai_user}`. **All A0-1 work in this roadmap
> inherits this — Dockerfile/.npmrc.docker must use the same `${GITEA_NPM_OWNER}`
> placeholder, not a hardcoded literal.**
---
## 0. Pre-flight audit findings (2026-05-27)
A read-only audit of pilot repos + lessons from recent live incidents +
the A0-V execution iterations on clock surfaced **18 concrete bugs/gaps**
(F14–F15 added after the Gitea-hardening commit; F16–F18 added during the
A0-V execution sweep on clock, 2026-05-27). The actual state of the ecosystem is closer to the
inverse of the casual narrative: tarballs are the de facto default, the
Gitea-registry path is partially wired, and there is a separate class of
"build green, app broken" silent failures (F11–F13) that the speed-focused
plan needs to address first.
| # | Finding | Location | Severity |
|---|---|---|---|
| F1 | `pnpm-lock.yaml` is in `.dockerignore` — any lockfile-based optimization is blocked until removed | `peakpulse/.dockerignore`, `clock/.dockerignore` | High |
| F2 | `pnpm-workspace.yaml` references sibling `../learning_ai_common_plat/packages/*` — `--frozen-lockfile` inside Docker will fail unless workspace is flattened or sibling tree is copied | both pilots | High |
| F3 | `peakpulse/.npmrc.docker` is tarball-only (no `@bytelyst:registry=…` line) — the "Gitea-registry" path doesn't work in this repo today | `peakpulse/.npmrc.docker` | High |
| F4 | `clock/.npmrc.docker` hardcodes `http://localhost:3300` — from inside Docker, `localhost` is the container, not the host registry | `clock/.npmrc.docker` | High |
| F5 | `clock/backend/Dockerfile` has neither `ARG GITEA_NPM_HOST` nor a BuildKit secret mount — wholly dependent on pre-populated `.docker-deps/` | `clock/backend/Dockerfile` | High |
| F6 | `clock/web/Dockerfile` accepts `ARG GITEA_NPM_HOST` but never uses it; no `--mount=type=secret` either | `clock/web/Dockerfile` | Medium |
| F7 | `peakpulse/docker-compose.yml` does not pass `GITEA_NPM_HOST` build arg or declare `secrets:` block | `peakpulse/docker-compose.yml` | Medium |
| F8 | `COPY .docker-deps/` is unconditional in every backend Dockerfile — every build requires `docker-prep.sh` to have run OR an empty `.docker-deps/` dir to pre-exist | both repos | Medium |
| F9 | `npm install -g pnpm@10.6.5` runs on every build (no `corepack`) — 5–10 s overhead, no pinning to `packageManager` field | all four Dockerfiles | Low |
| F10 | No BuildKit `--mount=type=cache` for pnpm store — cold install on every rebuild even when deps unchanged | all four Dockerfiles | High (main speed win) |
| **F11** | **Build-time config file missing from repo or not COPY'd in Dockerfile causes silent UI breakage. Symptom: `next build` succeeds, container is "healthy", but CSS bundle is ~33 KB (only `@font-face`) and all Tailwind classes are absent → UI renders unstyled.** Two sub-bugs: (a) `postcss.config.mjs` missing entirely while `@tailwindcss/postcss` is in `package.json` (NoteLett, JarvisJr fixes `dff459e`, `36f6bc1`); (b) file exists but Dockerfile never COPYs it (Clock, LocalMemGPT fixes `a308c6444`, `07cdf6b`). | `*/web/Dockerfile`, `*/web/postcss.config.*` | **High** |
| **F12** | **Healthcheck uses `localhost`, resolves to IPv6 `::1`, false-fails.** Backend listens on `0.0.0.0` (IPv4 only). `wget --spider http://localhost:.../health` hits `::1`, connection refused, container marked "unhealthy", `web` service won't start due to `depends_on: condition: service_healthy`. Incident: `learning_ai_jarvis_jr/docker-compose.yml`. | every `docker-compose*.yml` healthcheck | **Medium** |
| **F13** | **Enumerated `COPY web/foo ./foo` pattern drifts from filesystem.** New config file added to repo but Dockerfile's enumerated COPY list isn't updated. Build succeeds silently with the file absent; behavior diverges from local dev. Root cause of F11(b). | every Dockerfile using enumerated COPY | **Medium** |
| **F14** | **Hardcoded Gitea owner (`learning_ai_user`) literally embedded in `.npmrc.docker` + CI workflows + publish scripts across 14 repos.** When the org was renamed from `bytelyst` → `learning_ai_user`, every repo needed a manual commit. **Resolved upstream in `common-plat` (`610a59fd`):** owner now resolves from `${GITEA_NPM_OWNER:-learning_ai_user}`; `scripts/gitea/{doctor,token}.sh` ship as pre-flight/rotation helpers. Docker work in this roadmap MUST consume the env var, not the literal. | `.npmrc.docker`, Dockerfile `ARG`/`ENV`, CI workflows | **Medium** |
| **F15** | **Stale shell-env tokens.** `~/.gitea_npm_token` rotated on disk; long-lived shells still exported the old value. Caused 401s during `docker compose build` until `source ~/.zshrc`. **Mitigation shipped:** `bash scripts/gitea/doctor.sh` detects env-vs-file drift and refuses to proceed. **Action required in this roadmap:** wire doctor as a pre-build CI gate. | dev workstation + CI runners | Low (now caught) |
| **F16** | **At least 10 published `@bytelyst/*` packages had unrewritten `workspace:*` refs in their `package.json` dependencies.** Root cause: `publish-outdated-packages.sh` extracts a pnpm-packed tarball then **re-packs with `npm pack`** (workaround for a historical Gitea-compat issue with pnpm's tarball format), and `npm pack` doesn't recognize the pnpm-specific `workspace:` protocol — it passes it through literally. **Fixed in `common-plat@cfcfc7bb`** (`fix(gitea): rewrite workspace:* in published tarballs (F16)`) — inserted a workspace:* rewriter between extract and npm-repack + a defense-in-depth grep guard. Republished 10 affected packages. | `common-plat` publish flow + Gitea registry | **Critical (FIXED)** |
| **F17** | **Gitea bakes `localhost:3300` into the `dist.tarball` field of every published package's metadata.** Inside Docker, `localhost` is the container itself, not the host — so even after a successful registry-metadata fetch via `host.docker.internal`, pnpm follows the tarball URL to `localhost:3300` and ECONNREFUSEs. Root cause: Gitea `app.ini`'s `ROOT_URL=http://localhost:3300/` was baked at publish time. **Fixed** by setting `ROOT_URL=http://host.docker.internal:3300/`, restarting Gitea, adding `127.0.0.1 host.docker.internal` to `/etc/hosts`, adding `host.docker.internal` to `NO_PROXY` (corp proxy was hijacking DNS), and republishing all 64 packages (`common-plat@dd90f709`). | Gitea `app.ini` + host `/etc/hosts` + every dev machine's `switch-network.sh` | **Critical (FIXED)** |
| **F18** | **`clock/web/package.json` had 4 `@bytelyst/*` deps declared as `file:` refs to sibling `../../learning_ai_common_plat/packages/*`** — a legacy pre-Gitea pattern. Inside Docker those paths don't exist, so `pnpm install` fails with `ERR_PNPM_LINKED_PKG_DIR_NOT_FOUND`. Discovered during clock web A0-V on 2026-05-27. **Fixed in `learning_ai_clock@8b5c767a3`** by rewriting to `*` semver. Same pattern likely lives in other product repos (especially anything that consumes `@bytelyst/ui`, `@bytelyst/design-tokens`, `@bytelyst/use-theme`) — audit needed in Phase D rollout. | `*/web/package.json` (and likely others) | **High** |
**Implications:**
- The original "switch to `--frozen-lockfile` + Gitea registry" plan requires
two upstream fixes first (F1, F2).
- F11–F13 mean **correctness fixes must precede speed fixes**, otherwise we
ship faster builds of broken apps.
- F16 + F17 are **both fixed** as of 2026-05-27. Gitea path now works
end-to-end on clock. A-pre is largely complete; remaining items (A-pre-4,
A-pre-5) become Phase E checks.
- F18 (sibling `file:` refs in product repo manifests) is the same family as
F2 but separately tractable — fixed in clock, audit needed across other
repos as part of Phase D rollout.
- A linter (Phase E `docker-doctor.sh`) is the durable insurance against
F11/F13/F18 recurrence, none of which CI catches today. The registry-side guard
(publish-time check for `workspace:*` leaks) shipped in `common-plat@cfcfc7bb`
as part of the F16 fix.
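
The F18 audit deferred to Phase D could start from a scan like the one below. This is a minimal sketch, not the planned `docker-doctor.sh`: the function name `find_sibling_file_refs` is hypothetical, and it assumes sibling refs always appear as a `"file:../…"` value inside `package.json`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of docker-doctor.sh): print every package.json
# under a repo root that still declares a dependency as a file: ref pointing
# at a parent directory (the F18 pattern).
find_sibling_file_refs() {
  local root="${1:-.}" manifest
  # Prune node_modules so installed packages are not reported.
  find "$root" -name node_modules -prune -o -type f -name package.json -print |
    while IFS= read -r manifest; do
      if grep -qE '"file:\.\./' "$manifest"; then
        printf '%s\n' "$manifest"
      fi
    done
}
```

Running `find_sibling_file_refs ../learning_ai_clock` from a sibling checkout would list the manifests that still need rewriting to semver ranges; an empty result means none were found.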
---
## 1. Context: three build paths
| Path | Status today | Trigger | Notes |
|---|---|---|---|
| **`docker-prep.sh` tarballs** | **De facto default** in peakpulse + flowmonk; also works in clock/notes | Run `docker-prep.sh` then `docker compose build` | Hermetic; mutates `package.json`; slow to repack |
| **Gitea NPM registry** | Partially wired in clock + notes; broken in peakpulse | `docker compose build` with `GITEA_NPM_HOST` arg + secret | Needs `.npmrc.docker` standardization to be the default |
| **Legacy `file:` refs** | Deprecated | — | Removed during pnpm/Gitea migration |
### Measurement targets
| Build | Baseline (observed) | Target after Phase A |
|---|---|---|
| Cold (no cache) | ~2–3 min | ≤ 2 min |
| Warm (one source file changed) | ~2–3 min | **< 30 s** |
| `docker-prep.sh` pack step alone | ~60–90 s | < 30 s (pnpm pack cache) |
> Fill in actuals during Phase C.
---
## 2. Goals & non-goals
**Goals**
- ✅ Eliminate F11–F13 class of silent "build green, app broken" failures
- ✅ Cut warm rebuild time via BuildKit pnpm-store cache mount (single biggest speed win)
- ✅ Make `docker-prep.sh` idempotent, safe to re-run, gitignore-clean, and canonical (no per-repo drift)
- ✅ Standardize `.npmrc.docker` across the ecosystem so the Gitea path actually works
- ✅ Fix `docker-compose.yml` to pass `GITEA_NPM_HOST` + secrets so the registry path is usable without manual flags
- ✅ Ship `docker-doctor.sh` CI lint as the durable insurance layer
**Non-goals**
- ❌ Migrating off pnpm or off the Gitea registry
- ❌ Adopting `--frozen-lockfile` until F2 is resolved (sibling-workspace problem)
- ❌ Publishing `@bytelyst/*` to the public npm registry
- ❌ Multi-platform builds (separate roadmap)
---
## 2.5 Canonical decisions
Decisions taken now to avoid contradictions later in the doc:
- **Base image:** `node:22-alpine` is canonical. For repos blocked by the
corporate proxy's Alpine SSL interception (currently only
`learning_ai_notes`), the Dockerfile MUST expose:
```dockerfile
ARG BASE_IMAGE=node:22-alpine
FROM ${BASE_IMAGE} AS builder
```
Override per-repo via `--build-arg BASE_IMAGE=node:22-slim`. Document the
override in the repo's `AGENTS.md`.
- **Healthcheck host:** `127.0.0.1` (NOT `localhost`) in every
`docker-compose*.yml` `test:` block. See F12.
- **Lockfile mode in Docker:** `--lockfile=false` for now. `--frozen-lockfile`
is blocked on the A3 ADR (F2).
---
## 3. Phase A — Correctness + build speed + path correctness
Order matters: **A-pre must precede A0** (you can't build via a registry that
serves broken metadata); A0 must precede A1+ (you can't optimize a path that
doesn't work), and A8+A9 (correctness) must land before measuring speed wins.
### A-pre. Make the Gitea registry actually usable from Docker (F16 + F17 + F18)
**Owner:** `learning_ai_common_plat` + per-product repo · **Status:** ✅ done for clock + global config.
Three distinct bugs surfaced during clock A0-V on 2026-05-27:
- **F16:** Publish flow leaked `workspace:*` into published metadata.
- **F17:** Gitea baked `localhost:3300` into tarball URLs.
- **F18:** Product repos had legacy `file:` refs to sibling packages.
- [x] **A-pre-1.** Audit `publish-outdated-packages.sh` — confirmed it uses
`pnpm pack` then re-tars with `npm pack`, which loses `workspace:` rewriting.
- [x] **A-pre-2.** Patch publish script with a workspace:* rewriter + a
post-rewrite grep guard. Shipped in `common-plat@cfcfc7bb`.
- [x] **A-pre-3.** Verify all packages publish with `0` workspace:* refs.
Confirmed via curl scan across all 64 packages.
- [x] **A-pre-4.** F17 fix: set Gitea `ROOT_URL=http://host.docker.internal:3300/`,
restart Gitea, add `127.0.0.1 host.docker.internal` to `/etc/hosts`, add
`host.docker.internal` to `NO_PROXY` in `switch-network.sh`, bulk republish
all 64 packages. Shipped in `common-plat@dd90f709`.
- [x] **A-pre-5.** F18 fix: rewrite `file:../../learning_ai_common_plat/packages/*`
refs in `clock/web/package.json` to `*` semver. Shipped in `clock@8b5c767a3`.
Audit needed in Phase D for other product repos.
- [x] **A-pre-6.** Document Gitea config requirements (below).
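
The A-pre-3 scan and the F17 fix can both be re-checked against a package's registry metadata. The sketch below is an assumption about how such a check might look, not the script that was actually run: `check_package_metadata` is a hypothetical name, and it greps raw JSON on stdin rather than parsing it.

```bash
#!/usr/bin/env bash
# Hypothetical check: read one package's registry metadata (JSON) on stdin and
# flag the two publish-side bugs described above.
check_package_metadata() {
  local json status=0
  json="$(cat)"
  # F16: the pnpm-only workspace: protocol must never reach published metadata.
  if printf '%s' "$json" | grep -q '"workspace:'; then
    echo "FAIL: workspace: protocol leaked into published metadata (F16)"
    status=1
  fi
  # F17: tarball URLs must not point at localhost, which is the container itself.
  if printf '%s' "$json" | grep -qE '"tarball" *: *"https?://(localhost|127\.0\.0\.1)[:/]'; then
    echo "FAIL: dist.tarball points at localhost (F17)"
    status=1
  fi
  [ "$status" -eq 0 ] && echo "OK"
  return "$status"
}
```

The metadata would typically be piped in from `curl -sS` against the package URL under the registry base shown in A0-1; the exact path and auth header depend on the Gitea setup and are not specified here.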
### A-pre-6. Gitea configuration prerequisites (one-time per dev machine)
The Gitea registry MUST be configured with `ROOT_URL=http://host.docker.internal:3300/`
so published tarball URLs are reachable from inside Docker containers. The
host `/etc/hosts` MUST resolve `host.docker.internal` to `127.0.0.1` so the
same URLs work from the host shell.
On macOS (Homebrew Gitea):
```bash
# 1. Edit Gitea's app.ini
sudo -e /opt/homebrew/var/gitea/custom/conf/app.ini
# change: ROOT_URL = http://localhost:3300/
# to: ROOT_URL = http://host.docker.internal:3300/
# 2. Restart Gitea
brew services restart gitea
# 3. Add /etc/hosts entry so host.docker.internal resolves on the host too
sudo sh -c 'grep -q host.docker.internal /etc/hosts || \
echo "127.0.0.1 host.docker.internal" >> /etc/hosts'
# 4. Ensure host.docker.internal is in NO_PROXY for corp shells
# (already done in switch-network.sh as of common-plat@dd90f709)
source ~/.zshrc # reload
# 5. Verify
curl -sS http://host.docker.internal:3300/api/v1/version
# expected: {"version":"1.25.5"} or similar
```
### A0. Make the Gitea-registry path actually work (clock + peakpulse)
- [ ] **A0-1.** Standardize `.npmrc.docker` to use templated host AND owner so it works on host (`localhost`) and inside Docker (`host.docker.internal`), and so future owner renames are a one-line env change:
```
@bytelyst:registry=http://${GITEA_NPM_HOST}:3300/api/packages/${GITEA_NPM_OWNER:-learning_ai_user}/npm/
//${GITEA_NPM_HOST}:3300/api/packages/${GITEA_NPM_OWNER:-learning_ai_user}/npm/:_authToken=${GITEA_NPM_TOKEN}
strict-ssl=false
auto-install-peers=true
```
> **⚠️ Env-var expansion chain:** pnpm expands `${VAR}` in `.npmrc` at read
> time using the current process environment (see [pnpm npmrc docs][pnpm-npmrc]).
> That means the Dockerfile MUST do `ARG GITEA_NPM_HOST` + `ARG GITEA_NPM_OWNER`
> → `ENV GITEA_NPM_HOST=$GITEA_NPM_HOST` / `ENV GITEA_NPM_OWNER=$GITEA_NPM_OWNER`
> **before** the `pnpm install` RUN line, AND the `GITEA_NPM_TOKEN` must be
> exported from the BuildKit secret mount inside the same `RUN` (since secrets
> don't persist as env across layers).
>
> **Note on F14:** The canonical `.npmrc` (host-side) template already uses
> `${GITEA_NPM_OWNER}` (shipped in common-plat commit `610a59fd`).
> `.npmrc.docker` lagged behind because Docker builds have a separate file —
> A0-1 brings them into parity.
[pnpm-npmrc]: https://pnpm.io/npmrc
- [ ] **A0-2.** Remove `pnpm-lock.yaml` from `.dockerignore` in both repos (fixes F1; harmless under `--lockfile=false` since we don't COPY it, but unblocks future A3)
- [ ] **A0-3.** Add `GITEA_NPM_HOST` + `GITEA_NPM_OWNER` build args + `secrets:` block to every service in `docker-compose.yml`:
```yaml
services:
  backend:                      # repeat for every service that builds an image
    build:
      context: .
      dockerfile: backend/Dockerfile
      args:
        GITEA_NPM_HOST: ${GITEA_NPM_HOST:-host.docker.internal}
        GITEA_NPM_OWNER: ${GITEA_NPM_OWNER:-learning_ai_user}
      secrets:
        - gitea_npm_token
secrets:                        # top-level: value is read from the shell environment
  gitea_npm_token:
    environment: GITEA_NPM_TOKEN
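- [ ] **A0-4.** Add `extra_hosts: ["host.docker.internal:host-gateway"]` to each service so Linux Docker can resolve the host. Service-level `extra_hosts` applies only to the running container, so repeat the entry under the service's `build:` block so that the build-time `pnpm install` can resolve it too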
- [ ] **A0-5.** Document required env: `GITEA_NPM_TOKEN` must be exported in the shell that runs `docker compose build` (add to repo `README.md` quickstart). Reference `bash ../learning_ai_common_plat/scripts/gitea/token.sh status` for verification.
- [ ] **A0-D.** **Run `gitea-doctor` before any Docker build** (addresses F15). Inline into deploy/CI workflows:
```bash
bash ../learning_ai_common_plat/scripts/gitea/doctor.sh --quiet || exit 1
docker compose build
```
- Locally: shell alias or `Makefile` target `make build` that runs doctor then `docker compose build`.
- In Gitea Actions CI: a pre-job step. If `doctor` exits non-zero, the build is skipped with a clear error rather than failing 4 minutes in with `ERR_PNPM_AUTHENTICATION`.
- [ ] **A0-V.** **Verification gate (between A0 and A1):** build the registry path **without** any cache-mount or layer optimizations. Confirm `docker compose build --no-cache` succeeds end-to-end pulling from Gitea. Only proceed to A1 once this is green. Don't conflate "make it work" with "make it fast" in one commit.
> **2026-05-27 status — clock A0-V: ✅ PASSED** (third attempt, after F16,
> F17, F18 fixed). Cold-build wall-clock:
> - backend: **59.2 s** (commits: `clock@0be887288` + `common-plat@cfcfc7bb` + `common-plat@dd90f709`)
> - web: **3:13 (193 s)** (commits: above + `clock@8b5c767a3`)
>
> Both surfaces resolve `@bytelyst/*` from the Gitea registry end-to-end —
> no `docker-prep.sh` tarballs, no sibling `file:` refs, no proxy interference.
> See §3.A7 metrics table.
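
The environment contract from A0-1, A0-3 and A0-5 can be sanity-checked before a build with something like the following. It is a minimal sketch and not a replacement for `scripts/gitea/doctor.sh`; the function names `registry_url` and `preflight_env` are hypothetical, and the defaults simply mirror the compose `args` shown in A0-3.

```bash
#!/usr/bin/env bash
# Hypothetical pre-build check; complements scripts/gitea/doctor.sh.

# Print the registry URL that .npmrc.docker would resolve to, using the same
# defaults as the docker-compose build args.
registry_url() {
  local host="${GITEA_NPM_HOST:-host.docker.internal}"
  local owner="${GITEA_NPM_OWNER:-learning_ai_user}"
  printf 'http://%s:3300/api/packages/%s/npm/\n' "$host" "$owner"
}

# Fail early if the token that feeds the BuildKit secret is not exported.
preflight_env() {
  if [ -z "${GITEA_NPM_TOKEN:-}" ]; then
    echo "GITEA_NPM_TOKEN is not exported in this shell" >&2
    return 1
  fi
  echo "registry: $(registry_url)"
}
```

Calling `preflight_env && docker compose build` keeps the missing-token case from surfacing minutes into the install step.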
### A1. Replace `npm install -g pnpm@X` with corepack
- [ ] **A1-1.** Replace `RUN npm install -g pnpm@10.6.5` with:
```dockerfile
RUN corepack enable && corepack prepare pnpm@10.6.5 --activate
```
- [ ] **A1-2.** Verify `packageManager` field in `backend/package.json` and `web/package.json` matches (already `pnpm@10.6.5` in peakpulse backend)
### A2. Add BuildKit pnpm-store cache mount
- [ ] **A2-1.** Set `# syntax=docker/dockerfile:1.7` directive at top of every Dockerfile
- [ ] **A2-2.** Wrap install step with cache + secret mount:
```dockerfile
RUN --mount=type=cache,id=pnpm,target=/root/.local/share/pnpm/store \
--mount=type=secret,id=gitea_npm_token \
export GITEA_NPM_TOKEN="$(cat /run/secrets/gitea_npm_token 2>/dev/null || echo '')" && \
pnpm install --ignore-scripts --lockfile=false
```
- [ ] **A2-3.** Verify cache mount is active: `docker buildx du --filter type=exec.cachemount` shows non-zero size after a build. **Real success metric** is wall-clock: warm rebuild (touching one source file) drops to < 30 s.
### A3. Decide lockfile policy (BLOCKED on F2 resolution)
Two options — pick one in a short ADR before implementing:
- **Option 1: Keep `--lockfile=false`** (current pragmatic approach)
- ✅ No sibling-workspace complications
- ❌ No reproducibility guarantee inside Docker
- ❌ Slower installs (full resolution every build)
- **Option 2: Generate a Docker-only lockfile** via `pnpm install --lockfile-only` against a flattened `package.json` that resolves `@bytelyst/*` to semver
- ✅ Reproducibility
- ✅ Faster installs
- ❌ New build step + tooling
- ❌ Drift risk between dev lockfile and Docker lockfile
- [ ] **A3-1.** Write 1-page ADR (`docs/decisions/0001-docker-lockfile-policy.md`) and pick Option 1 or 2
- [ ] **A3-2.** Defer `--frozen-lockfile` adoption until ADR lands
### A4. Restructure layer order
- [ ] **A4-1.** Reorder COPY/RUN so deps-install layer is `package.json` + `.npmrc.docker` ONLY, then a separate layer for `src/`, config files, `shared/`
- [ ] **A4-2.** Move all `ARG` lines that affect deps install **before** the install step; move `NEXT_PUBLIC_*` ARGs (web) closer to the build step (they invalidate the build layer, not the deps layer)
### A5. Gate `.docker-deps/` behind a build arg
- [ ] **A5-1.** Add `ARG USE_TARBALLS=false` to Dockerfile
- [ ] **A5-2.** Use wildcard COPY so missing dir doesn't break the build:
```dockerfile
RUN mkdir -p /app/.docker-deps
COPY .docker-deps* /app/.docker-deps/
```
- [ ] **A5-3.** Verify that `.docker-deps/` is in `.gitignore` and that `.dockerignore` does NOT exclude it when tarball mode is in use
### A6. `.dockerignore` audit
- [ ] **A6-1.** Confirm exclusions: `node_modules`, `**/node_modules`, `dist`, `.next`, `*.log`, `.env`, `.env.*`, `.git`, `*.bak`
- [ ] **A6-2.** Remove: `pnpm-lock.yaml` exclusion (was correct under `--lockfile=false`, blocks future optimization)
- [ ] **A6-3.** Confirm `.docker-deps/` is NOT excluded when tarball path is active
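
The A6 checks lend themselves to a small script. The sketch below is illustrative only: `audit_dockerignore` is a hypothetical name, it checks a subset of the A6-1 entries, and it matches whole lines literally, so glob variants such as `**/node_modules` or negated `!` patterns are not understood. The `.docker-deps` check applies only while the tarball path is active.

```bash
#!/usr/bin/env bash
# Hypothetical audit of a .dockerignore file against A6-1..A6-3.
# Matching is exact-line and literal; it does not evaluate ignore globs.
audit_dockerignore() {
  local file="${1:-.dockerignore}" status=0 pattern
  # Entries A6-1 expects to be excluded from the build context.
  for pattern in node_modules dist .next .env .git; do
    if ! grep -qxF "$pattern" "$file"; then
      echo "MISSING: $pattern"
      status=1
    fi
  done
  # Entries that must not be excluded (A6-2, A6-3).
  for pattern in pnpm-lock.yaml .docker-deps .docker-deps/; do
    if grep -qxF "$pattern" "$file"; then
      echo "FORBIDDEN: $pattern"
      status=1
    fi
  done
  return "$status"
}
```

A non-zero return with one line per problem makes it straightforward to call from a CI step or from the Phase E linter.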
### A7. Measure & record
| Repo | Surface | Cold (A0-V) | Cold (post-A2) | Warm (post-A2) | Notes |
|---|---|---|---|---|---|
| clock | backend | **59.2 s** | **64.7 s** | **2.9 s** | Cold essentially flat (corepack adds ~1 s; cache mount empty on first run). Warm → 95.1% reduction. Commits: `clock@8b5c767a3` (A0-V), `clock@f6a806ff3` (A1+A8+A9), `clock@55e8d22d3` (A2+A5+A6) |
| clock | web | **193 s (3:13)** | **291 s (4:51) †** | **5.4 s** | Warm → 97.2% reduction. † Cold variance — see footer |
| peakpulse | backend | — | — | — | Pending §10 step 9 |
**Footer note on cold-build variance.** Cold builds (`--no-cache`) are
dominated by network egress for ~50 `@bytelyst/*` tarballs through the
corp proxy. A second measurement of clock web cold-build came in at
291 s vs 193 s in the A0-V run (same Dockerfile path, different
network-side latency). Cold build is **not** the optimization target of
this roadmap; warm rebuild is. Run `pnpm store prune` on the host or use
a local registry mirror if cold-build determinism is needed.
Measurement commands:
```bash
# Cold (clear all layer cache; cache mounts may still persist)
time DOCKER_BUILDKIT=1 docker compose build --no-cache backend
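# Warm (one source file changed; deps unchanged).
# COPY cache keys are content-based, so a bare `touch` (mtime only) does not
# invalidate the layer; change the file's contents and revert afterwards.
echo "// cache-bust $(date +%s)" >> backend/src/server.ts
time DOCKER_BUILDKIT=1 docker compose build backend
git checkout -- backend/src/server.ts
# Deps-changed (edit backend/package.json contents first, e.g. bump one
# dependency; the pnpm store cache helps here)
time DOCKER_BUILDKIT=1 docker compose build backend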
```
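
The reduction percentages in the table appear to compare each warm rebuild against the A0-V cold build of the same surface. A small helper (hypothetical name `pct_reduction`) makes that arithmetic reproducible when filling in the remaining rows:

```bash
#!/usr/bin/env bash
# Hypothetical helper: percentage reduction from a baseline time to a new time.
# usage: pct_reduction <baseline_seconds> <new_seconds>
pct_reduction() {
  awk -v base="$1" -v new="$2" \
    'BEGIN { printf "%.1f\n", (base - new) / base * 100 }'
}
```

For example, `pct_reduction 59.2 2.9` prints `95.1` and `pct_reduction 193 5.4` prints `97.2`, matching the two clock rows.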
### A8. Config-file COPY audit & canonical pattern (addresses F11, F13)
- [ ] **A8-1.** For every Dockerfile in scope, list all build-time files present in the surface directory (`web/` or `backend/`) that affect the build:
- `postcss.config.{js,mjs,cjs,ts}`
- `tailwind.config.{js,mjs,cjs,ts}`
- `next.config.{js,mjs,ts}`
- `tsconfig*.json`
- `package.json`
- `.npmrc.docker`, `.npmrc`
- `babel.config.*` (if present)
- `drizzle.config.*` (if present)
- `vitest.config.*` (only if the build needs it)
Verify each is COPY'd in the Dockerfile.
- [ ] **A8-2.** Choose canonical COPY pattern. **Decision: middle-ground glob** for web surfaces:
```dockerfile
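# COPY wildcards follow Go's filepath.Match rules, which have no {a,b} brace
# expansion, so list one pattern per extension.
COPY web/*.json web/*.ts web/*.mjs web/*.js web/*.cjs ./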
COPY web/public/ ./public/
COPY web/src/ ./src/
```
Trade-off: glob picks up unintended root-level files if any are added later, but **dramatically reduces F11/F13 risk**. Backend surfaces with few root config files can keep enumerated COPY (lower risk surface).
- [ ] **A8-3.** Repo-by-repo migration: replace enumerated `COPY web/foo ./foo` with the glob pattern; verify the resulting image has all expected files via `docker run --rm <image> ls -la`.
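
For Dockerfiles that keep enumerated COPY lines, the A8-1 audit can be approximated mechanically. The following is a sketch under simplifying assumptions: `find_uncopied_configs` is a hypothetical name, it covers only some of the A8-1 file patterns, and it treats a config file as "copied" if its basename appears anywhere in the Dockerfile, so it will report false positives for Dockerfiles that already use glob COPY.

```bash
#!/usr/bin/env bash
# Hypothetical F11/F13 check for enumerated-COPY Dockerfiles.
# usage: find_uncopied_configs <surface_dir> <dockerfile>
find_uncopied_configs() {
  local dir="$1" dockerfile="$2" path name status=0
  for path in "$dir"/postcss.config.* "$dir"/tailwind.config.* \
              "$dir"/next.config.* "$dir"/tsconfig*.json "$dir"/package.json; do
    [ -e "$path" ] || continue   # an unmatched glob stays literal; skip it
    name="$(basename "$path")"
    # Literal substring match: crude, but enough to catch a forgotten file.
    if ! grep -qF "$name" "$dockerfile"; then
      echo "NOT COPIED: $name"
      status=1
    fi
  done
  return "$status"
}
```

Invoked as `find_uncopied_configs web web/Dockerfile`, it returns non-zero when a config file exists on disk but is never mentioned in the Dockerfile, which is the F11(b) failure mode.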
### A9. Healthcheck canonicalization (addresses F12)
- [ ] **A9-1.** Replace `localhost` with `127.0.0.1` in every `docker-compose*.yml` healthcheck `test:` block. Sweep with:
```
rg -l 'http://localhost' --glob 'docker-compose*.yml'
```
- [ ] **A9-2.** Standardize healthcheck shape:
- **Alpine-based images:**
```yaml
healthcheck:
test: ["CMD-SHELL", "wget -q --spider http://127.0.0.1:${PORT}/health || exit 1"]
interval: 30s
timeout: 5s
retries: 3
start_period: 10s
```
- **Slim/Debian images** (`wget` not always present, but `node` is):
```yaml
healthcheck:
test: ["CMD-SHELL", "node -e \"fetch('http://127.0.0.1:${PORT}/health').then(r=>process.exit(r.ok?0:1)).catch(()=>process.exit(1))\""]
```
- [ ] **A9-3.** Add `start_period` (10s minimum) — prevents flaky "container started but app not yet listening" false-negatives.
---
## 4. Phase B — Hermetic-fallback polish (`docker-prep.sh`)
`docker-prep.sh` is duplicated with minor variations across product repos.
**Promotion to canonical home is now in Phase B, not Phase D**: drift
grows the longer the copies stay separate, and the `.npmrc` template precedent
shows the pattern is cheap.
- [ ] **B1.** Add `--dry-run` flag — list packs/rewrites, no side effects
- [ ] **B2.** Idempotency guard — refuse to run if any `*.bak` exists unless `--force`
- [ ] **B3.** Ensure `.docker-deps/` and `*.bak` are in `.gitignore` of every pilot repo
- [ ] **B4.** Pre-commit hook (husky) — block commits containing rewritten `package.json`, staged tarballs, OR `.bak` files:
```bash
# .husky/pre-commit
if git diff --cached --name-only | xargs grep -l '"file:\.\./\.docker-deps/' 2>/dev/null; then
echo "ERROR: rewritten package.json detected. Run scripts/docker-prep.sh --restore first."
exit 1
fi
if git diff --cached --name-only | grep -qE '(\.docker-deps/.*\.tgz|package\.json\.bak)$'; then
echo "ERROR: docker-prep.sh artifacts staged. Run --restore first."
exit 1
fi
```
- [ ] **B5.** Auto-restore on script error via `trap restore_on_error EXIT` (unless `--keep` passed). Because `EXIT` also fires on success, the handler must check the exit status before restoring
- [ ] **B6.** Update script header comment per § 7.4 template
- [ ] **B7. CANONICAL HOME (was deferred — now in Phase B proper).**
- [ ] **B7-1.** Move script to `learning_ai_common_plat/scripts/docker-prep.template.sh`
- [ ] **B7-2.** Add `learning_ai_common_plat/scripts/sync-docker-prep.sh` to copy template into all product repos (mirrors `sync-npmrc.sh`)
- [ ] **B7-3.** Add `learning_ai_common_plat/scripts/check-docker-prep-drift.sh` for CI (mirrors `check-npmrc-drift.sh`)
- [ ] **B7-4.** Update every repo's `AGENTS.md` with the "NEVER edit `docker-prep.sh` directly" warning + template link
- [ ] **B8.** Add `--strip-overrides` option that removes `pnpm.overrides` block after build — safety net in case `--restore` is forgotten
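
B2 and B5 together amount to a guard-then-trap pattern. The skeleton below is a sketch of that pattern only; the real `docker-prep.sh` is not shown in this roadmap, `prep_with_restore` is a hypothetical name, and the rewrite step is passed in as an arbitrary command.

```bash
#!/usr/bin/env bash
# Hypothetical skeleton of the B2 (idempotency guard) + B5 (auto-restore) logic.
# usage: prep_with_restore <manifest> <command> [args...]
prep_with_restore() {
  local manifest="$1"
  shift
  (
    # B2: refuse to run if a previous backup is still present.
    if [ -e "$manifest.bak" ]; then
      echo "ERROR: $manifest.bak exists; restore first or pass --force" >&2
      exit 2
    fi
    cp "$manifest" "$manifest.bak"
    # B5: EXIT fires on success too, so restore only on a non-zero status.
    trap 'rc=$?; if [ "$rc" -ne 0 ]; then mv "$manifest.bak" "$manifest"; fi; exit "$rc"' EXIT
    "$@"   # the rewrite/pack step, supplied by the caller
  )
}
```

On success the `.bak` file is left in place for a later explicit `--restore`; on failure the manifest is put back and the backup disappears, so a re-run is not blocked by the B2 guard.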
---
## 5. Phase C — Verification gates
Pilot exit criteria (must all pass before Phase D):
- [ ] **C1.** Cold Docker build succeeds on both pilots via Gitea-registry path (no `docker-prep.sh` invocation)
- [ ] **C2.** Warm rebuild (single source file touched) < 30 s on both pilots
- [ ] **C3.** `docker-prep.sh` → `docker compose build` → `--restore` leaves `git status` clean
- [ ] **C4.** Pre-commit hook blocks: (a) rewritten `package.json`, (b) staged `.tgz`, (c) staged `.bak`
- [ ] **C5.** Gitea Actions CI green on both pilots (verify CI uses the same Dockerfile path)
- [ ] **C6.** Build-time metrics filled into the table in § 3.A7
- [ ] **C7.** ADR recorded for A3 (lockfile policy)
- [ ] **C8.** `docker-doctor.sh` (Phase E) runs clean against both pilots
- [ ] **C9.** Smoke test: render the web app, inspect `