bytelyst-devops-tools/docs/docker-build-optimization-roadmap.md
saravanakumardb1 6a4e289edc docs(roadmap): v11 — Phases B4/E3/E4/E6 + C (7/9 gates) + D.1 (artifacts rolled out)
- B4: pre-commit guard + husky wiring landed
- E3/E4/E6: CI job + pre-commit warn-only + make doctor target
- C1–C4, C6–C8: verified on pilots; C5 pending CI, C9 deferred
- D.1: artifacts deployed to 7/9 consumer repos with per-repo findings table
- D.2: per-repo Dockerfile fixes captured as a fix matrix (follow-up work)
- All commit refs documented in §10 execution order
2026-05-27 04:07:27 -07:00


Docker Build Optimization Roadmap

Status: Draft v11 (Phases A, B, C, E complete on pilots; Phase D artifacts rolled out to 7 of 9 consumer repos; per-repo Dockerfile fixes pending) · Owner: Platform DevOps · Created: 2026-05-27 · Revised: 2026-05-27

Pilot Docker-build correctness + speed fixes on learning_ai_clock (web + backend) and learning_ai_peakpulse (backend), then capture the playbook here for ecosystem-wide rollout.

Upstream prerequisite shipped (commit 610a59fd in learning_ai_common_plat): Gitea owner parameterization + helper scripts (scripts/gitea/doctor.sh, scripts/gitea/token.sh). The .npmrc template now resolves owner from ${GITEA_NPM_OWNER:-learning_ai_user}. All A0-1 work in this roadmap inherits this — Dockerfile/.npmrc.docker must use the same ${GITEA_NPM_OWNER} placeholder, not a hardcoded literal.


0. Pre-flight audit findings (2026-05-27)

A read-only audit of pilot repos, lessons from recent live incidents, and the A0-V execution iterations on clock surfaced 18 concrete bugs/gaps (F14–F15 added after the Gitea-hardening commit; F16–F18 added during the A0-V execution sweep on clock, 2026-05-27). The actual state of the ecosystem is closer to the inverse of the casual narrative: tarballs are the de facto default, the Gitea-registry path is partially wired, and there is a separate class of "build green, app broken" silent failures (F11–F13) that the speed-focused plan needs to address first.

| # | Finding | Location | Severity |
|---|---|---|---|
| F1 | pnpm-lock.yaml is in .dockerignore — any lockfile-based optimization is blocked until removed | peakpulse/.dockerignore, clock/.dockerignore | High |
| F2 | pnpm-workspace.yaml references sibling ../learning_ai_common_plat/packages/*, so --frozen-lockfile inside Docker will fail unless the workspace is flattened or the sibling tree is copied | both pilots | High |
| F3 | peakpulse/.npmrc.docker is tarball-only (no @bytelyst:registry=… line) — the "Gitea-registry" path doesn't work in this repo today | peakpulse/.npmrc.docker | High |
| F4 | clock/.npmrc.docker hardcodes http://localhost:3300 — from inside Docker, localhost is the container, not the host registry | clock/.npmrc.docker | High |
| F5 | clock/backend/Dockerfile has neither ARG GITEA_NPM_HOST nor a BuildKit secret mount — wholly dependent on pre-populated .docker-deps/ | clock/backend/Dockerfile | High |
| F6 | clock/web/Dockerfile accepts ARG GITEA_NPM_HOST but never uses it; no --mount=type=secret either | clock/web/Dockerfile | Medium |
| F7 | peakpulse/docker-compose.yml does not pass the GITEA_NPM_HOST build arg or declare a secrets: block | peakpulse/docker-compose.yml | Medium |
| F8 | COPY .docker-deps/ is unconditional in every backend Dockerfile — every build requires docker-prep.sh to have run OR an empty .docker-deps/ dir to pre-exist | both repos | Medium |
| F9 | npm install -g pnpm@10.6.5 runs on every build (no corepack) — 5–10 s overhead, no pinning to the packageManager field | all four Dockerfiles | Low |
| F10 | No BuildKit --mount=type=cache for the pnpm store — cold install on every rebuild even when deps are unchanged | all four Dockerfiles | High (main speed win) |
| F11 | A build-time config file that is missing from the repo, or not COPY'd in the Dockerfile, causes silent UI breakage. Symptom: next build succeeds, container is "healthy", but the CSS bundle is ~33 KB (only @font-face) and all Tailwind classes are absent → UI renders unstyled. Two sub-bugs: (a) postcss.config.mjs missing entirely while @tailwindcss/postcss is in package.json (NoteLett, JarvisJr fixes dff459e, 36f6bc1); (b) file exists but the Dockerfile never COPYs it (Clock, LocalMemGPT fixes a308c6444, 07cdf6b). | */web/Dockerfile, */web/postcss.config.* | High |
| F12 | Healthcheck uses localhost, which resolves to IPv6 ::1 and false-fails. Backend listens on 0.0.0.0 (IPv4 only). wget --spider http://localhost:.../health hits ::1, connection refused, container marked "unhealthy", web service won't start due to depends_on: condition: service_healthy. Incident: learning_ai_jarvis_jr/docker-compose.yml. | every docker-compose*.yml healthcheck | Medium |
| F13 | Enumerated COPY web/foo ./foo pattern drifts from the filesystem. A new config file is added to the repo but the Dockerfile's enumerated COPY list isn't updated. Build succeeds silently with the file absent; behavior diverges from local dev. Root cause of F11(b). | every Dockerfile using enumerated COPY | Medium |
| F14 | Hardcoded Gitea owner (learning_ai_user) literally embedded in .npmrc.docker + CI workflows + publish scripts across 14 repos. When the org was renamed from bytelyst to learning_ai_user, every repo needed a manual commit. Resolved upstream in common-plat (610a59fd): owner now resolves from ${GITEA_NPM_OWNER:-learning_ai_user}; scripts/gitea/{doctor,token}.sh ship as pre-flight/rotation helpers. Docker work in this roadmap MUST consume the env var, not the literal. | .npmrc.docker, Dockerfile ARG/ENV, CI workflows | Medium |
| F15 | Stale shell-env tokens. ~/.gitea_npm_token rotated on disk; long-lived shells still exported the old value. Caused 401s during docker compose build until the shell ran source ~/.zshrc. Mitigation shipped: bash scripts/gitea/doctor.sh detects env-vs-file drift and refuses to proceed. Action required in this roadmap: wire doctor as a pre-build CI gate. | dev workstation + CI runners | Low (now caught) |
| F16 | At least 10 published @bytelyst/* packages had unrewritten workspace:* refs in their package.json dependencies. Root cause: publish-outdated-packages.sh extracts a pnpm-packed tarball then re-packs with npm pack (workaround for a historical Gitea-compat issue with pnpm's tarball format), and npm pack doesn't recognize the pnpm-specific workspace: protocol — it passes it through literally. Fixed in common-plat@cfcfc7bb (fix(gitea): rewrite workspace:* in published tarballs (F16)) — inserted a workspace:* rewriter between extract and npm-repack + a defense-in-depth grep guard. Republished 10 affected packages. | common-plat publish flow + Gitea registry | Critical (FIXED) |
| F17 | Gitea bakes localhost:3300 into the dist.tarball field of every published package's metadata. Inside Docker, localhost is the container itself, not the host — so even after a successful registry-metadata fetch via host.docker.internal, pnpm follows the tarball URL to localhost:3300 and gets ECONNREFUSED. Root cause: Gitea app.ini's ROOT_URL=http://localhost:3300/ was baked in at publish time. Fixed by setting ROOT_URL=http://host.docker.internal:3300/, restarting Gitea, adding 127.0.0.1 host.docker.internal to /etc/hosts, adding host.docker.internal to NO_PROXY (corp proxy was hijacking DNS), and republishing all 64 packages (common-plat@dd90f709). | Gitea app.ini + host /etc/hosts + every dev machine's switch-network.sh | Critical (FIXED) |
| F18 | clock/web/package.json had 4 @bytelyst/* deps declared as file: refs to sibling ../../learning_ai_common_plat/packages/* — a legacy pre-Gitea pattern. Inside Docker those paths don't exist, so pnpm install fails with ERR_PNPM_LINKED_PKG_DIR_NOT_FOUND. Discovered during clock web A0-V on 2026-05-27. Fixed in learning_ai_clock@8b5c767a3 by rewriting to * semver. The same pattern likely lives in other product repos (especially anything that consumes @bytelyst/ui, @bytelyst/design-tokens, @bytelyst/use-theme) — audit needed in Phase D rollout. | */web/package.json (and likely others) | High |
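Here is a minimal TypeScript sketch of the env-vs-file comparison behind F15's mitigation. It assumes the check reduces to three conditions: the token file is present, an exported token is present, and the two are equal after trimming. The function name and status values are hypothetical and are not doctor.sh's actual interface. The sketch compares tokens but never echoes them.

```typescript
// Hypothetical sketch of the F15 drift check: compare the token exported in the
// current shell with the one stored in ~/.gitea_npm_token, without printing either.
type DriftStatus = "ok" | "missing-file" | "missing-env" | "drift";

function checkTokenDrift(
  envToken: string | undefined,
  fileContents: string | undefined,
): DriftStatus {
  const fromFile = (fileContents ?? "").trim(); // token files usually end with "\n"
  const fromEnv = (envToken ?? "").trim();
  if (fromFile === "") return "missing-file";
  if (fromEnv === "") return "missing-env"; // e.g. a shell that never re-sourced ~/.zshrc
  return fromEnv === fromFile ? "ok" : "drift"; // drift: rotated on disk, stale in env
}
```

Any status other than "ok" should stop the pipeline before docker compose build starts. That is the pre-build gate the F15 row asks for.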

Implications:

  • The original "switch to --frozen-lockfile + Gitea registry" plan requires two upstream fixes first (F1, F2).
  • F11–F13 mean correctness fixes must precede speed fixes; otherwise we ship faster builds of broken apps.
  • F16 + F17 are both fixed as of 2026-05-27. Gitea path now works end-to-end on clock. A-pre is largely complete; remaining items (A-pre-4, A-pre-5) become Phase E checks.
  • F18 (sibling file: refs in product repo manifests) is the same family as F2 but separately tractable — fixed in clock, audit needed across other repos as part of Phase D rollout.
  • A linter (Phase E docker-doctor.sh) is the durable insurance against F11/F13/F18 recurrence — silent in CI today. The registry-side guard (publish-time check for workspace:* leaks) shipped in common-plat@cfcfc7bb as part of the F16 fix.
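For the F18 audit across product repos, here is a sketch of one way to flag file: and link: dependency specs that escape the Docker build context. It makes two assumptions: the build context is the repo root, and manifestDir is the manifest's repo-relative directory (e.g. web). The helper names are hypothetical. Only relative specs are handled.

```typescript
// Hypothetical F18 audit: flag file:/link: specs that resolve outside the repo root.
type Manifest = {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
  optionalDependencies?: Record<string, string>;
};

interface Finding {
  dep: string;
  spec: string;
}

// Normalise a relative POSIX path against manifestDir; null means it climbed above the repo root.
function resolveInRepo(manifestDir: string, relative: string): string | null {
  const parts = manifestDir.split("/").filter((p) => p !== "" && p !== ".");
  for (const seg of relative.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      if (parts.length === 0) return null;
      parts.pop();
    } else {
      parts.push(seg);
    }
  }
  return parts.join("/");
}

function findEscapingLocalRefs(manifest: Manifest, manifestDir: string): Finding[] {
  const findings: Finding[] = [];
  const sections = [manifest.dependencies, manifest.devDependencies, manifest.optionalDependencies];
  for (const deps of sections) {
    for (const [dep, spec] of Object.entries(deps ?? {})) {
      const m = /^(?:file|link):(.+)$/.exec(spec);
      if (m && resolveInRepo(manifestDir, m[1]) === null) findings.push({ dep, spec });
    }
  }
  return findings;
}
```

A spec such as file:../.docker-deps/<tarball>, written by docker-prep.sh, stays inside the repo and is not flagged. The legacy file:../../learning_ai_common_plat/... refs are flagged.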

1. Context: three build paths

| Path | Status today | Trigger | Notes |
|---|---|---|---|
| docker-prep.sh tarballs | De facto default in peakpulse + flowmonk; also works in clock/notes | Run docker-prep.sh then docker compose build | Hermetic; mutates package.json; slow to repack |
| Gitea NPM registry | Partially wired in clock + notes; broken in peakpulse | docker compose build with GITEA_NPM_HOST arg + secret | Needs .npmrc.docker standardization to be the default |
| Legacy file: refs | Deprecated | n/a | Removed during pnpm/Gitea migration |

Measurement targets

| Build | Baseline (observed) | Target after Phase A |
|---|---|---|
| Cold (no cache) | ~2–3 min | ≤ 2 min |
| Warm (one source file changed) | ~2–3 min | < 30 s |
| docker-prep.sh pack step alone | ~60–90 s | < 30 s (pnpm pack cache) |

Fill in actuals during Phase C.


2. Goals & non-goals

Goals

  • Eliminate the F11–F13 class of silent "build green, app broken" failures
  • Cut warm rebuild time via BuildKit pnpm-store cache mount (single biggest speed win)
  • Make docker-prep.sh idempotent, safe to re-run, gitignore-clean, and canonical (no per-repo drift)
  • Standardize .npmrc.docker across the ecosystem so the Gitea path actually works
  • Fix docker-compose.yml to pass GITEA_NPM_HOST + secrets so the registry path is usable without manual flags
  • Ship docker-doctor.sh CI lint as the durable insurance layer

Non-goals

  • Migrating off pnpm or off the Gitea registry
  • Adopting --frozen-lockfile until F2 is resolved (sibling-workspace problem)
  • Publishing @bytelyst/* to the public npm registry
  • Multi-platform builds (separate roadmap)

2.5 Canonical decisions

Decisions taken now to avoid contradictions later in the doc:

  • Base image: node:22-alpine is canonical. For repos blocked by the corporate proxy's Alpine SSL interception (currently only learning_ai_notes), the Dockerfile MUST expose:
    ARG BASE_IMAGE=node:22-alpine
    FROM ${BASE_IMAGE} AS builder
    
    Override per-repo via --build-arg BASE_IMAGE=node:22-slim. Document the override in the repo's AGENTS.md.
  • Healthcheck host: 127.0.0.1 (NOT localhost) in every docker-compose*.yml test: block. See F12.
  • Lockfile mode in Docker: --lockfile=false for now. --frozen-lockfile is deferred per ADR-0001 (A3) until the sibling-workspace problem (F2) is resolved.

3. Phase A — Correctness + build speed + path correctness

Order matters: A-pre must precede A0 (you can't build via a registry that serves broken metadata); A0 must precede A1+ (you can't optimize a path that doesn't work), and A8+A9 (correctness) must land before measuring speed wins.

A-pre. Make the Gitea registry actually usable from Docker (F16 + F17 + F18)

Owner: learning_ai_common_plat + per-product repo · Status: done for clock + global config.

Three distinct bugs surfaced during clock A0-V on 2026-05-27:

  • F16: Publish flow leaked workspace:* into published metadata.

  • F17: Gitea baked localhost:3300 into tarball URLs.

  • F18: Product repos had legacy file: refs to sibling packages.

  • A-pre-1. Audit publish-outdated-packages.sh — confirmed it uses pnpm pack then re-tars with npm pack, which loses workspace: rewriting.

  • A-pre-2. Patch publish script with a workspace:* rewriter + a post-rewrite grep guard. Shipped in common-plat@cfcfc7bb.

  • A-pre-3. Verify all packages publish with 0 workspace:* refs. Confirmed via curl scan across all 64 packages.

  • A-pre-4. F17 fix: set Gitea ROOT_URL=http://host.docker.internal:3300/, restart Gitea, add 127.0.0.1 host.docker.internal to /etc/hosts, add host.docker.internal to NO_PROXY in switch-network.sh, bulk republish all 64 packages. Shipped in common-plat@dd90f709.

  • A-pre-5. F18 fix: rewrite file:../../learning_ai_common_plat/packages/* refs in clock/web/package.json to * semver. Shipped in clock@8b5c767a3. Audit needed in Phase D for other product repos.

  • A-pre-6. Document Gitea config requirements (below).
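The A-pre-2 idea has two parts: rewrite workspace: specs before npm pack, then refuse to publish if any survive. Here is a sketch of it. The mapping follows pnpm's documented publish behaviour for the workspace protocol:

- workspace:* becomes the exact version.
- workspace:^ and workspace:~ become ^version and ~version.
- An explicit range after workspace: is kept.

The function names and the versions map are hypothetical. This is not the shipped fix in common-plat@cfcfc7bb. Aliased forms such as workspace:foo@* are out of scope.

```typescript
// Hypothetical sketch: rewrite workspace: specs in an extracted package.json
// before `npm pack`, then guard against any leftovers (defense in depth).
type Deps = Record<string, string>;
interface PackageJson {
  name: string;
  version: string;
  dependencies?: Deps;
  peerDependencies?: Deps;
  optionalDependencies?: Deps;
  devDependencies?: Deps;
}

const DEP_FIELDS = ["dependencies", "peerDependencies", "optionalDependencies", "devDependencies"] as const;

function rewriteSpec(spec: string, version: string): string {
  if (!spec.startsWith("workspace:")) return spec;
  const range = spec.slice("workspace:".length);
  if (range === "*") return version; // workspace:* -> 1.5.0
  if (range === "^" || range === "~") return range + version; // workspace:^ -> ^1.5.0
  return range; // workspace:^1.2.0 -> ^1.2.0
}

function rewriteWorkspaceRefs(pkg: PackageJson, versions: Map<string, string>): PackageJson {
  const out: PackageJson = { ...pkg };
  for (const field of DEP_FIELDS) {
    const deps = pkg[field];
    if (!deps) continue;
    const rewritten: Deps = {};
    for (const [name, spec] of Object.entries(deps)) {
      const v = versions.get(name);
      if (spec.startsWith("workspace:") && v === undefined) {
        throw new Error(`${pkg.name}: no workspace version known for ${name}`);
      }
      rewritten[name] = v === undefined ? spec : rewriteSpec(spec, v);
    }
    out[field] = rewritten;
  }
  // Crude guard, analogous to a grep over the final manifest.
  if (JSON.stringify(out).includes("workspace:")) {
    throw new Error(`${pkg.name}: workspace: protocol still present`);
  }
  return out;
}
```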

A-pre-6. Gitea configuration prerequisites (one-time per dev machine)

The Gitea registry MUST be configured with ROOT_URL=http://host.docker.internal:3300/ so published tarball URLs are reachable from inside Docker containers. The host /etc/hosts MUST resolve host.docker.internal to 127.0.0.1 so the same URLs work from the host shell.

On macOS (Homebrew Gitea):

```bash
# 1. Edit Gitea's app.ini
sudo -e /opt/homebrew/var/gitea/custom/conf/app.ini
#   change:   ROOT_URL = http://localhost:3300/
#   to:       ROOT_URL = http://host.docker.internal:3300/

# 2. Restart Gitea
brew services restart gitea

# 3. Add /etc/hosts entry so host.docker.internal resolves on the host too
sudo sh -c 'grep -q host.docker.internal /etc/hosts || \
  echo "127.0.0.1       host.docker.internal" >> /etc/hosts'

# 4. Ensure host.docker.internal is in NO_PROXY for corp shells
# (already done in switch-network.sh as of common-plat@dd90f709)
source ~/.zshrc   # reload

# 5. Verify
curl -sS http://host.docker.internal:3300/api/v1/version
# expected: {"version":"1.25.5"} or similar
```
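The curl scan from A-pre-3 and the ROOT_URL fix from A-pre-4 can be checked together against each package's registry metadata. Here is a sketch. It assumes the input is the packument JSON a registry returns for <registry-url>/<package-name>, that is, a versions map whose entries carry dist.tarball and dependencies. The function and its messages are hypothetical, and the tarball URLs in the test are simplified placeholders.

```typescript
// Hypothetical scan of one package's registry metadata for F17 (tarball URLs that
// point at the container itself) and F16 (workspace: specs that leaked into publish).
interface VersionMeta {
  dist?: { tarball?: string };
  dependencies?: Record<string, string>;
}
interface Packument {
  name: string;
  versions: Record<string, VersionMeta>;
}

// Hostnames that mean "this container" when dereferenced inside Docker.
const CONTAINER_LOCAL_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]"]);

function scanPackument(doc: Packument): string[] {
  const problems: string[] = [];
  for (const [version, meta] of Object.entries(doc.versions)) {
    const tarball = meta.dist?.tarball;
    if (tarball) {
      const url = new URL(tarball);
      if (CONTAINER_LOCAL_HOSTS.has(url.hostname)) {
        problems.push(`${doc.name}@${version}: tarball served from ${url.host} (F17)`);
      }
    }
    for (const [dep, spec] of Object.entries(meta.dependencies ?? {})) {
      if (spec.startsWith("workspace:")) {
        problems.push(`${doc.name}@${version}: ${dep} -> ${spec} (F16)`);
      }
    }
  }
  return problems;
}
```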

A0. Make the Gitea-registry path actually work (clock + peakpulse)

  • A0-1. Standardize .npmrc.docker to use templated host AND owner so it works on host (localhost) and inside Docker (host.docker.internal), and so future owner renames are a one-line env change:

    @bytelyst:registry=http://${GITEA_NPM_HOST}:3300/api/packages/${GITEA_NPM_OWNER:-learning_ai_user}/npm/
    //${GITEA_NPM_HOST}:3300/api/packages/${GITEA_NPM_OWNER:-learning_ai_user}/npm/:_authToken=${GITEA_NPM_TOKEN}
    strict-ssl=false
    auto-install-peers=true
    

    ⚠️ Env-var expansion chain: pnpm expands ${VAR} in .npmrc at read time using the current process environment (see pnpm npmrc docs). That means the Dockerfile MUST declare ARG GITEA_NPM_HOST + ARG GITEA_NPM_OWNER and then ENV GITEA_NPM_HOST=$GITEA_NPM_HOST / ENV GITEA_NPM_OWNER=$GITEA_NPM_OWNER before the pnpm install RUN line. It MUST also export GITEA_NPM_TOKEN from the BuildKit secret mount inside that same RUN, since secrets don't persist as env across layers.

    Note on F14: The canonical .npmrc (host-side) template already uses ${GITEA_NPM_OWNER} (shipped in common-plat commit 610a59fd). .npmrc.docker lagged behind because Docker builds have a separate file — A0-1 brings them into parity.

  • A0-2. Remove pnpm-lock.yaml from .dockerignore in both repos (fixes F1; harmless under --lockfile=false since we don't COPY it, but unblocks future A3)

  • A0-3. Add GITEA_NPM_HOST + GITEA_NPM_OWNER build args + secrets: block to every service in docker-compose.yml:

    build:
      context: .
      dockerfile: backend/Dockerfile
      args:
        GITEA_NPM_HOST: ${GITEA_NPM_HOST:-host.docker.internal}
        GITEA_NPM_OWNER: ${GITEA_NPM_OWNER:-learning_ai_user}
      secrets:
        - gitea_npm_token
    # Top-level key (sibling of services:, not inside a service); declared once per file
    secrets:
      gitea_npm_token:
        environment: GITEA_NPM_TOKEN
    
  • A0-4. Add extra_hosts: ["host.docker.internal:host-gateway"] to each service so Linux Docker can resolve the host

  • A0-5. Document required env: GITEA_NPM_TOKEN must be exported in the shell that runs docker compose build (add to repo README.md quickstart). Reference bash ../learning_ai_common_plat/scripts/gitea/token.sh status for verification.

  • A0-D. Run gitea-doctor before any Docker build (addresses F15). Inline into deploy/CI workflows:

    bash ../learning_ai_common_plat/scripts/gitea/doctor.sh --quiet || exit 1
    docker compose build
    
    • Locally: shell alias or Makefile target make build that runs doctor then docker compose build.
    • In Gitea Actions CI: a pre-job step. If doctor exits non-zero, the build is skipped with a clear error rather than failing 4 minutes in with ERR_PNPM_AUTHENTICATION.
  • A0-V. Verification gate (between A0 and A1): build the registry path without any cache-mount or layer optimizations. Confirm docker compose build --no-cache succeeds end-to-end pulling from Gitea. Only proceed to A1 once this is green. Don't conflate "make it work" with "make it fast" in one commit.

    2026-05-27 status — clock A0-V: PASSED (third attempt, after F16, F17, F18 fixed). Cold-build wall-clock:

    • backend: 59.2 s (commits: clock@0be887288 + common-plat@cfcfc7bb + common-plat@dd90f709)
    • web: 3:13 (193 s) (commits: above + clock@8b5c767a3)

    Both surfaces resolve @bytelyst/* from the Gitea registry end-to-end — no docker-prep.sh tarballs, no sibling file: refs, no proxy interference. See §3.A7 metrics table.
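To preview what .npmrc.docker resolves to for a given build environment, here is a small sketch that applies shell-style ${VAR} and ${VAR:-default} substitution. It is illustrative only. It mirrors the placeholder syntax used in the A0-1 template rather than pnpm's own config loader. The "broken line" heuristic (an empty host before :3300, or an empty _authToken) is an assumption.

```typescript
// Illustrative only: shell-style ${VAR} / ${VAR:-default} substitution used to
// preview the rendered .npmrc.docker; not pnpm's implementation.
type Env = Record<string, string | undefined>;

function expandNpmrc(template: string, env: Env): string {
  return template.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}/g,
    (_match: string, name: string, fallback: string | undefined) => {
      const value = env[name];
      // ":-" semantics: use the fallback when the variable is unset OR empty.
      return value !== undefined && value !== "" ? value : fallback ?? "";
    },
  );
}

// Lines where an unset variable left an obviously broken host or an empty token.
function brokenLines(rendered: string): string[] {
  return rendered
    .split("\n")
    .filter((line) => /\/\/:\d+/.test(line) || /_authToken=$/.test(line));
}
```

Rendering with an empty environment yields http://:3300/... on both lines. The ARG/ENV-before-install rule in A0-1 exists to prevent exactly that failure mode.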

A1. Replace npm install -g pnpm@X with corepack

  • A1-1. Replace RUN npm install -g pnpm@10.6.5 with:
    RUN corepack enable && corepack prepare pnpm@10.6.5 --activate
    
  • A1-2. Verify packageManager field in backend/package.json and web/package.json matches (already pnpm@10.6.5 in peakpulse backend)
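A1-2 asks that the version corepack activates matches the packageManager field. Here is a sketch of that comparison. It assumes the Dockerfile uses the exact corepack prepare pnpm@<version> form from A1-1, and that packageManager may carry a +sha512.<hash> integrity suffix. The function names are hypothetical.

```typescript
// Hypothetical A1-2 check: Dockerfile's corepack pnpm version vs package.json packageManager.
function corepackPnpmVersion(dockerfile: string): string | null {
  const m = /corepack prepare pnpm@([0-9][^\s]*)/.exec(dockerfile);
  return m ? m[1] : null;
}

function packageManagerPnpmVersion(packageJson: string): string | null {
  const pm: unknown = (JSON.parse(packageJson) as { packageManager?: unknown }).packageManager;
  if (typeof pm !== "string") return null;
  const m = /^pnpm@([^+\s]+)/.exec(pm); // drop an optional "+sha512..." suffix
  return m ? m[1] : null;
}

function checkPnpmPin(dockerfile: string, packageJson: string): string[] {
  const docker = corepackPnpmVersion(dockerfile);
  const manifest = packageManagerPnpmVersion(packageJson);
  const issues: string[] = [];
  if (docker === null) issues.push("Dockerfile does not run `corepack prepare pnpm@<version>`");
  if (manifest === null) issues.push("package.json has no pnpm packageManager field");
  if (docker !== null && manifest !== null && docker !== manifest) {
    issues.push(`pnpm version mismatch: Dockerfile ${docker} vs packageManager ${manifest}`);
  }
  return issues;
}
```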

A2. Add BuildKit pnpm-store cache mount

  • A2-1. Set # syntax=docker/dockerfile:1.7 directive at top of every Dockerfile
  • A2-2. Wrap install step with cache + secret mount:
    RUN --mount=type=cache,id=pnpm,target=/root/.local/share/pnpm/store \
        --mount=type=secret,id=gitea_npm_token \
        export GITEA_NPM_TOKEN="$(cat /run/secrets/gitea_npm_token 2>/dev/null || echo '')" && \
        pnpm install --ignore-scripts --lockfile=false
    
  • A2-3. Verify cache mount is active: docker buildx du --filter type=exec.cachemount shows non-zero size after a build. Real success metric is wall-clock: warm rebuild (touching one source file) drops to < 30 s.

A3. Decide lockfile policy (DONE, ADR-0001)

Two options — pick one in a short ADR before implementing:

  • Option 1: Keep --lockfile=false (current pragmatic approach)

    • Pro: no sibling-workspace complications
    • Con: no reproducibility guarantee inside Docker
    • Con: slower installs (full resolution every build)
  • Option 2: Generate a Docker-only lockfile via pnpm install --lockfile-only against a flattened package.json that resolves @bytelyst/* to semver

    • Pro: reproducibility
    • Pro: faster installs
    • Con: new build step + tooling
    • Con: drift risk between dev lockfile and Docker lockfile
  • A3-1. ADR written: docs/adr/0001-docker-build-lockfile-policy.md. Option 1 accepted (keep --lockfile=false short-term; revisit after Phase D).

  • A3-2. --frozen-lockfile adoption deferred per ADR; tracked as future work in §11.

A4. Restructure layer order

  • A4-1. Reorder COPY/RUN so deps-install layer is package.json + .npmrc.docker ONLY, then a separate layer for src/, config files, shared/
  • A4-2. Move all ARG lines that affect deps install before the install step; move NEXT_PUBLIC_* ARGs (web) closer to the build step (they invalidate the build layer, not the deps layer)

A5. Gate .docker-deps/ behind a build arg

  • A5-1. Add ARG USE_TARBALLS=false to Dockerfile
  • A5-2. Use wildcard COPY so missing dir doesn't break the build:
    RUN mkdir -p /app/.docker-deps
    COPY .docker-deps* /app/.docker-deps/
    
  • A5-3. Verify .docker-deps/ is in .gitignore and .dockerignore does NOT exclude it when tarball mode is in use

A6. .dockerignore audit

  • A6-1. Confirm exclusions: node_modules, **/node_modules, dist, .next, *.log, .env, .env.*, .git, *.bak
  • A6-2. Remove: pnpm-lock.yaml exclusion (was correct under --lockfile=false, blocks future optimization)
  • A6-3. Confirm .docker-deps/ is NOT excluded when tarball path is active
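The A6 checklist can be expressed as a lint over .dockerignore. Here is a sketch that compares literal pattern lines only, after stripping a leading ./ and trailing /. Negations and real glob semantics are ignored. The REQUIRED and MUST_NOT_EXCLUDE lists are taken from A6-1 to A6-3 above, not from docker-doctor.sh.

```typescript
// Hypothetical A6 lint: required exclusions present; lockfile and .docker-deps not excluded.
const REQUIRED = ["node_modules", "**/node_modules", "dist", ".next", "*.log", ".env", ".env.*", ".git", "*.bak"];
const MUST_NOT_EXCLUDE = ["pnpm-lock.yaml", ".docker-deps"];

function parseDockerignore(text: string): Set<string> {
  return new Set(
    text
      .split("\n")
      .map((l) => l.trim())
      .filter((l) => l !== "" && !l.startsWith("#"))
      .map((l) => l.replace(/^\.\//, "").replace(/\/+$/, "")), // "./dist/" -> "dist"
  );
}

function auditDockerignore(text: string): string[] {
  const patterns = parseDockerignore(text);
  const issues: string[] = [];
  for (const p of REQUIRED) if (!patterns.has(p)) issues.push(`missing exclusion: ${p}`);
  for (const p of MUST_NOT_EXCLUDE) if (patterns.has(p)) issues.push(`should not exclude: ${p}`);
  return issues;
}
```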

A7. Measure & record

| Repo | Surface | Cold (A0-V) | Cold (post-A2) | Warm (post-A2) | Notes |
|---|---|---|---|---|---|
| clock | backend | 59.2 s | 64.7 s | 2.9 s | Cold essentially flat (corepack adds ~1 s; cache mount empty on first run). Warm → 95.1% reduction. Commits: clock@8b5c767a3 (A0-V), clock@f6a806ff3 (A1+A8+A9), clock@55e8d22d3 (A2+A5+A6) |
| clock | web | 193 s (3:13) | 291 s (4:51) † | 5.4 s | Warm → 97.2% reduction. † Cold variance, see footer |
| peakpulse | backend | n/a (was tarball-only path) | 72.2 s | 2.7 s | Warm → 96.3% reduction. Commits: peakpulse@11a6bc5 (Phase A), peakpulse@6523a1a (.gitkeep fix), clock@1465e06b1+d69003c1f (mirror .gitkeep fix) |

Warm-rebuild reductions are computed against the A0-V cold build where one exists; for peakpulse, which has no A0-V run, they are computed against the post-A2 cold build.

Footer note on cold-build variance. Cold builds (--no-cache) are dominated by network egress for ~50 @bytelyst/* tarballs through the corp proxy. A second measurement of clock web cold-build came in at 291 s vs 174 s in the previous step — same Dockerfile path, different network-side latency. Cold build is not the optimization target of this roadmap; warm rebuild is. Run pnpm store prune on the host or use a local registry mirror if cold-build determinism is needed.
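The percentages in the A7 table can be reproduced as reduction = (1 - warm / cold) × 100, rounded to one decimal. The cold baseline is the A0-V build where one exists (59.2 s and 193 s) and the post-A2 cold build for peakpulse (72.2 s). Measured against clock backend's post-A2 cold build (64.7 s) instead, the same 2.9 s warm time would read 95.5%.

```typescript
// Warm-rebuild reduction as quoted in the A7 table, rounded to one decimal place.
function reductionPct(coldSec: number, warmSec: number): number {
  return Math.round((1 - warmSec / coldSec) * 1000) / 10;
}

const rows: Array<[string, number, number]> = [
  ["clock backend, vs A0-V cold", 59.2, 2.9], // 95.1
  ["clock web, vs A0-V cold", 193, 5.4], // 97.2
  ["peakpulse backend, vs post-A2 cold", 72.2, 2.7], // 96.3
];

for (const [label, coldSec, warmSec] of rows) {
  console.log(`${label}: ${reductionPct(coldSec, warmSec)}% reduction`);
}
```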

Measurement commands:

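```bash
# Cold (clear all layer cache; cache mounts may still persist)
time DOCKER_BUILDKIT=1 docker compose build --no-cache backend

# Warm (one source file changed; deps unchanged). COPY cache keys are content
# checksums (mtime is ignored), so a bare `touch` would be a full cache hit.
echo "// cache-bust $(date +%s)" >> backend/src/server.ts
time DOCKER_BUILDKIT=1 docker compose build backend
git checkout -- backend/src/server.ts

# Deps-changed (package.json content changes; pnpm store cache helps here)
node -e 'const fs=require("fs"),f="backend/package.json",p=JSON.parse(fs.readFileSync(f,"utf8"));p.description="cache-bust-"+Date.now();fs.writeFileSync(f,JSON.stringify(p,null,2)+"\n")'
time DOCKER_BUILDKIT=1 docker compose build backend
git checkout -- backend/package.json
```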

A8. Config-file COPY audit & canonical pattern (addresses F11, F13)

  • A8-1. For every Dockerfile in scope, list all build-time files present in the surface directory (web/ or backend/) that affect the build:
    • postcss.config.{js,mjs,cjs,ts}
    • tailwind.config.{js,mjs,cjs,ts}
    • next.config.{js,mjs,ts}
    • tsconfig*.json
    • package.json
    • .npmrc.docker, .npmrc
    • babel.config.* (if present)
    • drizzle.config.* (if present)
    • vitest.config.* (only if the build needs it)
    Verify each is COPY'd in the Dockerfile.
  • A8-2. Choose canonical COPY pattern. Decision: middle-ground glob for web surfaces:
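    # Dockerfile COPY globs follow Go filepath.Match rules: there is no {a,b}
    # brace expansion, so list one pattern per extension.
    COPY web/*.json web/*.ts web/*.mjs web/*.js web/*.cjs ./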
    COPY web/public/ ./public/
    COPY web/src/ ./src/
    
    Trade-off: glob picks up unintended root-level files if any are added later, but dramatically reduces F11/F13 risk. Backend surfaces with few root config files can keep enumerated COPY (lower risk surface).
  • A8-3. Repo-by-repo migration: replace enumerated COPY web/foo ./foo with the glob pattern; verify the resulting image has all expected files via docker run --rm <img> ls -la.
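Here is a sketch of the A8-1 "verify each is COPY'd" step. Given the build-affecting files in a surface directory and the Dockerfile text, it lists the files that no COPY source covers. Dockerfile COPY globs follow Go's filepath.Match rules: * and ? never cross /, and there is no {a,b} brace alternation. The sketch approximates those rules. It does not handle character classes, JSON-form COPY, or line continuations, and all names are hypothetical.

```typescript
// Hypothetical F11/F13 check: which build-affecting files are not covered by any COPY source?
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (const ch of glob.split("")) {
    if (ch === "*") re += "[^/]*";
    else if (ch === "?") re += "[^/]";
    else re += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // braces are matched literally
  }
  return new RegExp(`^${re}$`);
}

// Sources copied from the build context (skips COPY --from=<stage>; drops flags and destination).
function copySources(dockerfile: string): string[] {
  const sources: string[] = [];
  for (const line of dockerfile.split("\n")) {
    const tokens = line.trim().split(/\s+/);
    if (tokens[0]?.toUpperCase() !== "COPY") continue;
    if (tokens.some((t) => t.startsWith("--from="))) continue;
    const args = tokens.slice(1).filter((t) => !t.startsWith("--"));
    sources.push(...args.slice(0, -1)); // last argument is the destination
  }
  return sources;
}

function isCovered(file: string, src: string): boolean {
  const dir = src.replace(/\/+$/, "");
  return file.startsWith(`${dir}/`) || globToRegExp(dir).test(file);
}

function uncoveredFiles(files: string[], dockerfile: string): string[] {
  const sources = copySources(dockerfile);
  return files.filter((file) => !sources.some((src) => isCovered(file, src)));
}
```

With a brace pattern such as COPY web/*.{json,ts,mjs,js,cjs} ./, every root config file is reported as uncovered, because the braces are matched literally.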

A9. Healthcheck canonicalization (addresses F12)

  • A9-1. Replace localhost with 127.0.0.1 in every docker-compose*.yml healthcheck test: block. Sweep with:
    rg -l 'http://localhost' --glob 'docker-compose*.yml'
    
  • A9-2. Standardize healthcheck shape:
    • Alpine-based images:
      healthcheck:
        # "$$" escapes Compose interpolation so the container's shell expands PORT
        test: ["CMD-SHELL", "wget -q --spider http://127.0.0.1:$${PORT}/health || exit 1"]
        interval: 30s
        timeout: 5s
        retries: 3
        start_period: 10s
      
    • Slim/Debian images (wget not always present, but node is):
      healthcheck:
        test: ["CMD-SHELL", "node -e \"fetch('http://127.0.0.1:$${PORT}/health').then(r=>process.exit(r.ok?0:1)).catch(()=>process.exit(1))\""]
      
  • A9-3. Add start_period (10s minimum) — prevents flaky "container started but app not yet listening" false-negatives.
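Here is a sketch of the A9 sweep as a line-oriented lint over a compose file. It assumes each healthcheck test: sits on one line; multi-line arrays are not handled. It flags two things:

- localhost in the test (F12).
- A PORT reference without the $$ escape. Compose interpolates ${PORT} from the host environment or .env at parse time, not from the container.

Names are hypothetical.

```typescript
// Hypothetical A9 lint over docker-compose text: localhost in healthchecks, unescaped PORT.
interface HealthcheckIssue {
  line: number;
  message: string;
}

function lintHealthchecks(composeYaml: string): HealthcheckIssue[] {
  const issues: HealthcheckIssue[] = [];
  composeYaml.split("\n").forEach((text, i) => {
    if (!/^\s*test:/.test(text)) return;
    if (/\blocalhost\b/.test(text)) {
      issues.push({ line: i + 1, message: "use 127.0.0.1 instead of localhost (F12)" });
    }
    // A "$" not preceded by another "$" means Compose, not the container, expands PORT.
    if (/(^|[^$])\$\{?PORT\b/.test(text)) {
      issues.push({ line: i + 1, message: "escape as $${PORT} so the container's shell expands it" });
    }
  });
  return issues;
}
```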

4. Phase B — Hermetic-fallback polish (docker-prep.sh)

docker-prep.sh is duplicated with minor variations across product repos. Promotion to canonical home is now in Phase B, not Phase D — drift compounds linearly with time and the .npmrc template precedent proves the pattern is cheap.

  • B1. --dry-run flag (common-plat@a418a23e).

  • B2. Idempotency guard via *.bak detection + --force override (common-plat@a418a23e).

  • B3. .docker-deps/ and *.bak in .gitignore on both pilots (clock + peakpulse). Verified by docker-doctor.sh.

  • B4. Pre-commit hook landed. Canonical guard script check-docker-prep-staged.sh (common-plat@c908c6d7) blocks rewritten package.json, staged .tgz tarballs, and .bak files. Wired into both pilot .husky/pre-commit (clock@4f8086bfa, peakpulse@c3195c8). Verified with simulated staged tarballs → commit blocked.

    Original spec:

    # .husky/pre-commit
    if git diff --cached --name-only | xargs grep -l '"file:\.\./\.docker-deps/' 2>/dev/null; then
      echo "ERROR: rewritten package.json detected. Run scripts/docker-prep.sh --restore first."
      exit 1
    fi
    if git diff --cached --name-only | grep -qE '(\.docker-deps/.*\.tgz|package\.json\.bak)$'; then
      echo "ERROR: docker-prep.sh artifacts staged. Run --restore first."
      exit 1
    fi
    
  • B5. Auto-restore on script error via trap cleanup_on_error EXIT + --keep opt-out (common-plat@a418a23e).

  • B6. Standardized header + usage block per § 7.4 template (common-plat@a418a23e).

  • B7. CANONICAL HOME landed.

    • B7-1. Canonical at learning_ai_common_plat/scripts/docker-prep.template.sh + 2 helpers _docker-prep-inject.js, _docker-prep-strip.js (common-plat@a418a23e).
    • B7-2. learning_ai_common_plat/scripts/sync-docker-prep.sh syncs all 3 files (mirrors sync-npmrc.sh).
    • B7-3. learning_ai_common_plat/scripts/check-docker-prep-drift.sh for CI (mirrors check-npmrc-drift.sh).
    • B7-4. Update every repo's AGENTS.md with "NEVER edit docker-prep.sh directly" warning + template link — follow-up batch with other AGENTS.md updates.
  • B8. --strip-overrides option removes pnpm.overrides block as a safety net (common-plat@a418a23e).

  • B+. --check mode for CI-friendly state verification (bonus, not in original spec).

  • B+. Portable sed -i (BSD on macOS, GNU on Linux).

  • B+. Preserve .docker-deps/.gitkeep on clear (fixes earlier regression where --restore deleted the tracked file).
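For reference, here are the B4 rules restated as a pure function. This is a hypothetical TypeScript port, not the shipped check-docker-prep-staged.sh. The caller supplies two things:

- the staged paths, e.g. from git diff --cached --name-only;
- a way to read each file's staged contents, e.g. git show :<path>.

The check therefore inspects what will be committed rather than the working tree.

```typescript
// Hypothetical port of the B4 pre-commit rules: staged tarballs, .bak files,
// and package.json files rewritten by docker-prep.sh are all rejected.
function stagedDockerPrepViolations(
  stagedPaths: string[],
  stagedContent: (path: string) => string | undefined,
): string[] {
  const violations: string[] = [];
  for (const path of stagedPaths) {
    if (/(^|\/)\.docker-deps\/.*\.tgz$/.test(path)) {
      violations.push(`${path}: tarball from docker-prep.sh is staged`);
    } else if (/\.bak$/.test(path)) {
      violations.push(`${path}: backup file is staged`);
    } else if (
      /(^|\/)package\.json$/.test(path) &&
      (stagedContent(path) ?? "").includes('"file:../.docker-deps/')
    ) {
      violations.push(`${path}: rewritten by docker-prep.sh; run --restore first`);
    }
  }
  return violations;
}
```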


5. Phase C — Verification gates

Pilot exit criteria (must all pass before Phase D):

  • C1. Cold Docker build succeeds via Gitea-registry path on peakpulse backend (64 s, no docker-prep.sh invocation).
  • C2. Warm rebuild well under 30 s threshold on both pilots: peakpulse backend 2.6 s, clock backend 3.3 s.
  • C3. Running docker-prep.sh, then --check, then --restore leaves git status clean on both pilots (verified end-to-end during Phase B testing).
  • C4. Pre-commit hook blocks staged tarballs + .bak files (verified by simulating staged artifacts on clock).
  • C5. Gitea Actions CI green — docker-lint job added to both pilot ci.yml (clock@4f8086bfa, peakpulse@c3195c8); needs next CI run to confirm.
  • C6. Build-time metrics already populated in § 3.A7 from earlier Phase A work.
  • C7. ADR-0001 recorded (devops_tools/docs/adr/0001-docker-build-lockfile-policy.md).
  • C8. docker-doctor.sh PASS on both pilots (only the 1 expected pnpm-lock.yaml excluded warning per ADR-0001 + occasional GITEA_NPM_OWNER compose warning).
  • C9. Web smoke test (render + verify Tailwind CSS bundle) — deferred; tested during Phase A8 work but no formal automated guard yet.

6. Phase D — Ecosystem rollout

Status: Artifacts deployed to 7 of 9 consumer repos; the remaining repos are pending or out of scope (see the D.1 table). Per-repo Dockerfile/compose fixes pending.

D.1 — Tooling rollout (DONE)

Seven consumer repos have received the canonical infrastructure via sync-docker-prep.sh; the rest are pending or out of scope (see table):

  • scripts/docker-prep.sh + _docker-prep-inject.js + _docker-prep-strip.js (canonical sync)
  • scripts/docker-doctor.sh (thin wrapper to canonical linter)
  • Makefile with make doctor target
| Repo | Commit | Findings (docker-doctor warn-only) |
|---|---|---|
| learning_ai_notes | 216ebb8 | 6 warnings + errors: F12 localhost, F14 ARG missing (×2), A5-2 wildcard (×2), F11/F13 web glob, A2 syntax directive |
| learning_ai_fastgap | 36b67a2 | 4: F4/F14 .npmrc.docker hardcoded, F14 ARG missing, A5-2 wildcard, A2 syntax |
| learning_ai_jarvis_jr | 523dc08 | 5: F14 ARG missing (×2), A5-2 wildcard (×2), F11/F13 web glob, A2 syntax (×2) |
| learning_ai_flowmonk | 65628f3 | 4: F14 ARG missing (×2), A5-2 wildcard (×2), F11/F13 web glob, A2 syntax |
| learning_ai_trails | 8aef82c | 6: F12 localhost, F14 ARG missing (×2), A5-2 wildcard (×2), A2 syntax (×2) |
| learning_ai_local_memory_gpt | d17689a | 5: F14 ARG missing (×2), A5-2 wildcard (×2), F11/F13 web glob, A2 syntax (×2) |
| learning_ai_efforise | b9fbbc3 | 5: F12 localhost, F14 ARG missing (×2), A5-2 wildcard (×2), A2 syntax (×2) |
| learning_multimodal_memory_agents (MindLyst) | pending | not in sync-docker-prep.sh consumer list — KMP repo, no docker-prep.sh currently |
| learning_voice_ai_agent (LysnrAI) | pending | not in consumer list — Python desktop + TS dashboards; needs separate scope |
| learning_ai_auth_app | n/a | iOS/Android — no Docker surfaces |
| learning_ai_talk2obsidian | pending | single-container app — follow-up |

D.2 — Per-repo Dockerfile/compose fixes (PENDING)

The findings table above is the authoritative work list. Each repo needs:

| Finding | Fix |
|---|---|
| F12 healthcheck localhost | Replace with 127.0.0.1 in docker-compose.yml |
| F14 missing ARG GITEA_NPM_OWNER | Add ARG GITEA_NPM_OWNER alongside the existing ARG GITEA_NPM_HOST, plus the matching ENV GITEA_NPM_OWNER=$GITEA_NPM_OWNER before the install step (see A0-1) |
| A5-2 rigid COPY .docker-deps/ | Change to wildcard COPY .docker-deps* /app/.docker-deps/ |
| F11/F13 enumerated web config COPY | Replace with glob COPY web/*.json web/*.ts web/*.mjs web/*.js web/*.cjs ./ (one pattern per extension; COPY globs do not expand {a,b} braces) |
| A2 missing syntax directive | Add # syntax=docker/dockerfile:1.7 as the first line |
| F4/F14 hardcoded .npmrc.docker | Replace literal owner/host with ${GITEA_NPM_OWNER} and ${GITEA_NPM_HOST} |

Follow-up work: triage per repo, apply fixes, re-run docker-doctor (must exit 0), then run cold + warm Docker builds to verify.
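Three rows of the D.2 matrix are mechanical enough to script: the A2 syntax directive, the A5-2 wildcard COPY, and the F14 owner ARG/ENV. Here is a hypothetical, idempotent fixer sketch for those three. It assumes each ARG GITEA_NPM_HOST line sits after a FROM, since ENV is not valid before the first FROM. The other rows are left to manual triage, and docker-doctor still has to be re-run afterwards.

```typescript
// Hypothetical idempotent fixer for three D.2 rows (A2, A5-2, F14).
const SYNTAX = "# syntax=docker/dockerfile:1.7";

function applyDockerfileFixes(dockerfile: string): string {
  let lines = dockerfile.split("\n");

  // A2: the syntax directive must be the very first line.
  if (!lines[0]?.startsWith("# syntax=")) lines = [SYNTAX, ...lines];

  // A5-2: rigid "COPY .docker-deps/ <dest>" -> wildcard form tolerant of a missing dir.
  lines = lines.map((l) => l.replace(/^COPY \.docker-deps\/? /, "COPY .docker-deps* "));

  // F14: add owner ARG + ENV right after each GITEA_NPM_HOST ARG, unless already present.
  if (!lines.some((l) => /^ARG GITEA_NPM_OWNER\b/.test(l))) {
    lines = lines.flatMap((l) =>
      /^ARG GITEA_NPM_HOST\b/.test(l)
        ? [l, "ARG GITEA_NPM_OWNER=learning_ai_user", "ENV GITEA_NPM_OWNER=$GITEA_NPM_OWNER"]
        : [l],
    );
  }
  return lines.join("\n");
}
```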


7. Reference snippets

7.1 Canonical .npmrc.docker

Matches the host-side .npmrc template shipped in common-plat 610a59fd.

```ini
@bytelyst:registry=http://${GITEA_NPM_HOST}:3300/api/packages/${GITEA_NPM_OWNER:-learning_ai_user}/npm/
//${GITEA_NPM_HOST}:3300/api/packages/${GITEA_NPM_OWNER:-learning_ai_user}/npm/:_authToken=${GITEA_NPM_TOKEN}
strict-ssl=false
auto-install-peers=true
```

7.2 Canonical backend Dockerfile

```dockerfile
# syntax=docker/dockerfile:1.7
ARG BASE_IMAGE=node:22-alpine
FROM ${BASE_IMAGE} AS builder
WORKDIR /app/backend

ARG GITEA_NPM_HOST=host.docker.internal
ARG GITEA_NPM_OWNER=learning_ai_user
ARG USE_TARBALLS=false
ENV NODE_TLS_REJECT_UNAUTHORIZED=0
ENV NPM_CONFIG_STRICT_SSL=false
ENV GITEA_NPM_HOST=$GITEA_NPM_HOST
ENV GITEA_NPM_OWNER=$GITEA_NPM_OWNER

RUN corepack enable && corepack prepare pnpm@10.6.5 --activate

# ── Deps layer (cacheable) ─────────────────────────────────────────
COPY .npmrc.docker ./.npmrc
COPY backend/package.json ./package.json
# Tolerate missing .docker-deps/ when in registry mode
RUN mkdir -p /app/.docker-deps
COPY .docker-deps* /app/.docker-deps/

RUN --mount=type=cache,id=pnpm,target=/root/.local/share/pnpm/store \
    --mount=type=secret,id=gitea_npm_token \
    export GITEA_NPM_TOKEN="$(cat /run/secrets/gitea_npm_token 2>/dev/null || echo '')" && \
    pnpm install --ignore-scripts --lockfile=false

# ── Source layer (changes most often) ──────────────────────────────
COPY backend/tsconfig.json ./tsconfig.json
COPY backend/src/ ./src/
COPY shared/ ../shared/
RUN pnpm run build

# ── Runtime ────────────────────────────────────────────────────────
FROM ${BASE_IMAGE}
WORKDIR /app/backend
ENV NODE_ENV=production
COPY --from=builder /app/backend/node_modules ./node_modules
COPY --from=builder /app/backend/package.json ./package.json
COPY --from=builder /app/backend/dist ./dist
COPY shared/ ../shared/
EXPOSE 4010
CMD ["node", "dist/server.js"]
```

--lockfile=false is intentional per ADR-0001 (A3). Switch to --frozen-lockfile only once the sibling-workspace problem (F2) is resolved.

7.3 Canonical docker-compose.yml service block

```yaml
services:
  backend:
    build:
      context: .
      dockerfile: backend/Dockerfile
      args:
        GITEA_NPM_HOST: host.docker.internal
        # F14: forward the owner; same default as the .npmrc template
        GITEA_NPM_OWNER: ${GITEA_NPM_OWNER:-learning_ai_user}
      # pnpm install runs at build time, so Linux builders need the mapping too
      extra_hosts:
        - "host.docker.internal:host-gateway"
      secrets:
        - gitea_npm_token
    extra_hosts:
      - "host.docker.internal:host-gateway"
    ports:
      - "4010:4010"
    environment:
      - NODE_ENV=production
      - PORT=4010
      # ...
    restart: unless-stopped
    healthcheck:
      # F12: use 127.0.0.1 NOT localhost (IPv6 resolution false-fails)
      test: ["CMD-SHELL", "wget -q --spider http://127.0.0.1:4010/health || exit 1"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s

secrets:
  gitea_npm_token:
    environment: GITEA_NPM_TOKEN
```
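The same build can be reproduced without Compose, for example in a CI step that calls BuildKit directly. The sketch below assumes docker buildx is installed and `GITEA_NPM_TOKEN` is exported. `backend_build_args` and the `backend:local` tag are hypothetical. `--secret id=<name>,env=<VAR>` and `--add-host` are standard buildx flags, but whether `host-gateway` resolves at build time depends on the Engine/BuildKit version, so verify it on your runners (open question 3).

```bash
# Hypothetical helper: emit docker CLI arguments (one per line) equivalent to
# the compose `backend` build above. Printing instead of executing keeps it
# testable and lets callers inspect the command first.
backend_build_args() {
  local tag="${1:-backend:local}"                      # placeholder tag
  local owner="${GITEA_NPM_OWNER:-learning_ai_user}"   # same default as .npmrc
  printf '%s\n' \
    buildx build \
    --file backend/Dockerfile \
    --build-arg GITEA_NPM_HOST=host.docker.internal \
    --build-arg "GITEA_NPM_OWNER=${owner}" \
    --add-host host.docker.internal:host-gateway \
    --secret id=gitea_npm_token,env=GITEA_NPM_TOKEN \
    --tag "$tag" \
    --load \
    .
}
```

Usage (bash 4+ for `mapfile`): `mapfile -t args < <(backend_build_args); docker "${args[@]}"`.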

7.4 Hardened docker-prep.sh header

```bash
#!/usr/bin/env bash
# Hermetic Docker-build helper. Packs @bytelyst/* tarballs from the sibling
# common-plat repo when the Gitea npm registry is unreachable.
#
# Use this ONLY when:
#   - Local Gitea registry (:3300) is down or unreachable, OR
#   - You need a Docker build that includes uncommitted common-plat changes.
#
# For normal builds (Gitea up + clean common-plat), use:
#   docker compose build
#
# Usage:
#   ./scripts/docker-prep.sh             # pack tarballs + rewrite package.json
#   ./scripts/docker-prep.sh --dry-run   # show what would change (no side effects)
#   ./scripts/docker-prep.sh --force     # override idempotency guard
#   ./scripts/docker-prep.sh --restore   # undo rewrite
#   ./scripts/docker-prep.sh --keep      # skip auto-restore on error
#   ./scripts/docker-prep.sh --strip-overrides  # remove pnpm.overrides block
#
# Side effects:
#   - Creates .docker-deps/ (gitignored)
#   - Backs up package.json → package.json.bak
#   - Rewrites @bytelyst/* deps to file:../.docker-deps/<tarball>
#   - Injects pnpm.overrides for transitive @bytelyst/* deps
#
# Safety:
#   - Refuses to run if .bak files already exist (unless --force)
#   - Auto-restores on error (trap EXIT) unless --keep passed
#   - Pre-commit hook blocks committing rewritten package.json, .tgz, .bak
```
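The Safety contract above can be sketched as follows: refuse to run on existing `.bak` files, auto-restore via `trap EXIT`, and honor the `--keep` opt-out. This illustrates the pattern only. It is not the canonical script in common-plat. The function names are hypothetical, and the sketch tracks only package.json.bak, whereas the real script also handles tarballs.

```bash
# Hypothetical sketch of docker-prep.sh's safety rails (package.json only).

# Refuse to start if an earlier run left package.json.bak behind,
# unless force=true (the --force flag).
guard_no_backups() {
  local dir="$1" force="${2:-false}" leftover
  leftover="$(find "$dir" -name 'package.json.bak' -not -path '*/node_modules/*' -print -quit)"
  if [[ -n "$leftover" && "$force" != true ]]; then
    echo "refusing to run: found $leftover (run --restore, or pass --force)" >&2
    return 1
  fi
}

# Keep a pristine copy before rewriting @bytelyst/* deps.
backup_manifest() { cp "$1/package.json" "$1/package.json.bak"; }

# Undo the rewrite: move the backup back over package.json.
restore_manifest() {
  [[ -f "$1/package.json.bak" ]] || return 0
  mv "$1/package.json.bak" "$1/package.json"
}

# Arm an EXIT trap that restores on a non-zero exit, unless keep=true
# (the --keep flag). A successful run leaves the rewrite in place for the
# Docker build; --restore undoes it afterwards.
arm_auto_restore() {
  local dir="$1" keep="${2:-false}"
  if [[ "$keep" == true ]]; then return 0; fi
  # %q quoting keeps paths with spaces or shell metacharacters safe in the trap.
  trap "rc=\$?; (( rc == 0 )) || restore_manifest $(printf '%q' "$dir")" EXIT
}
```

In a full script these would run in order: `guard_no_backups`, `backup_manifest`, `arm_auto_restore`, then the pack-and-rewrite step.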

7.5 Canonical Next.js web Dockerfile (addresses F11, F13)

```dockerfile
# syntax=docker/dockerfile:1.7
ARG BASE_IMAGE=node:22-alpine
FROM ${BASE_IMAGE} AS deps
WORKDIR /app/web

ARG GITEA_NPM_HOST=host.docker.internal
ARG GITEA_NPM_OWNER=learning_ai_user
ENV NODE_TLS_REJECT_UNAUTHORIZED=0
ENV NPM_CONFIG_STRICT_SSL=false
ENV GITEA_NPM_HOST=$GITEA_NPM_HOST
ENV GITEA_NPM_OWNER=$GITEA_NPM_OWNER

RUN corepack enable && corepack prepare pnpm@10.6.5 --activate

COPY .npmrc.docker ./.npmrc
COPY web/package.json ./package.json
RUN mkdir -p /app/.docker-deps
COPY .docker-deps* /app/.docker-deps/

RUN --mount=type=cache,id=pnpm,target=/root/.local/share/pnpm/store \
    --mount=type=secret,id=gitea_npm_token \
    export GITEA_NPM_TOKEN="$(cat /run/secrets/gitea_npm_token 2>/dev/null || echo '')" && \
    pnpm install --ignore-scripts --lockfile=false

# ── Builder ────────────────────────────────────────────────────────
FROM ${BASE_IMAGE} AS builder
WORKDIR /app/web
COPY --from=deps /app/web/node_modules ./node_modules
COPY --from=deps /app/web/package.json ./package.json

# F11/F13 fix: glob ALL root-level config files instead of enumerating.
# Picks up postcss.config.*, tailwind.config.*, next.config.*, tsconfig*,
# any future *.config.* additions without Dockerfile changes.
COPY web/*.json web/*.ts web/*.mjs web/*.js web/*.cjs ./
COPY web/public/ ./public/
COPY web/src/ ./src/
COPY shared/ ../shared/

ARG NEXT_PUBLIC_BACKEND_URL
ARG NEXT_PUBLIC_PLATFORM_SERVICE_URL
ENV NEXT_PUBLIC_BACKEND_URL=$NEXT_PUBLIC_BACKEND_URL
ENV NEXT_PUBLIC_PLATFORM_SERVICE_URL=$NEXT_PUBLIC_PLATFORM_SERVICE_URL
ENV NEXT_TELEMETRY_DISABLED=1

# Fresh stage: corepack state from `deps` is not inherited, so re-activate
# the same pinned pnpm instead of letting corepack pick a version.
RUN corepack enable && corepack prepare pnpm@10.6.5 --activate && pnpm run build

# ── Runtime (Next.js standalone) ───────────────────────────────────
FROM ${BASE_IMAGE} AS runner
WORKDIR /app/web
ENV NODE_ENV=production
ENV NEXT_TELEMETRY_DISABLED=1

COPY --from=builder /app/web/.next/standalone ./
# Next 16 standalone server runs as `node web/server.js` from /app/web,
# so static assets live at /app/web/web/.next/static (NOT ./.next/static).
COPY --from=builder /app/web/.next/static ./web/.next/static
COPY --from=builder /app/web/public ./web/public

EXPOSE 3000
ENV PORT=3000
ENV HOSTNAME=0.0.0.0
CMD ["node", "web/server.js"]
```

Verification step after every web Dockerfile change: run the built image, curl the rendered HTML, and confirm that the CSS bundle referenced by its `<link>` tags is larger than 50 KB. A bundle of ~33 KB is the F11 signature (only `@font-face` rules, no Tailwind utilities).
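The smoke test can be scripted. The following is a minimal sketch. It assumes `curl` is available and that the page links its CSS with `<link rel="stylesheet" href="...">` tags whose hrefs are root-relative, as Next.js emits for its static CSS. The 50 KB threshold is the one stated above, taken here as 50 KiB. `fetch`, `css_links` and `css_bundle_ok` are hypothetical names.

```bash
# Hypothetical F11 smoke-test helpers.

# Fetch a URL to stdout; kept separate so it can be stubbed in tests.
fetch() { curl -fsSL "$1"; }

# Read HTML on stdin and print the href of every stylesheet <link>.
css_links() {
  grep -oE '<link[^>]*rel="stylesheet"[^>]*>' \
    | grep -oE 'href="[^"]+"' \
    | sed -E 's/^href="(.*)"$/\1/'
}

# Succeed when the largest linked stylesheet is at least min_bytes
# (default 50 KiB). ~33 KB, i.e. @font-face rules only, is the F11 signature.
css_bundle_ok() {
  local base_url="$1" min_bytes="${2:-51200}"
  local href size largest=0
  while IFS= read -r href; do
    # $(( )) normalises the leading spaces some `wc` implementations print.
    size=$(( $(fetch "${base_url}${href}" | wc -c) ))
    if (( size > largest )); then largest=$size; fi
  done < <(fetch "$base_url/" | css_links)
  echo "largest stylesheet: ${largest} bytes (need >= ${min_bytes})" >&2
  (( largest >= min_bytes ))
}
```

Start the image with `docker run -d --rm -p 3000:3000 <image>`. Once it is listening, run `css_bundle_ok http://127.0.0.1:3000`. Use 127.0.0.1 rather than localhost, for the same reason as F12.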

7.6 docker-doctor.sh skeleton (Phase E)

```bash
#!/usr/bin/env bash
# docker-doctor.sh: pre-flight Dockerfile + docker-compose health checks.
# Run on PRs touching Dockerfile, docker-compose*.yml, .dockerignore.
set -euo pipefail

REPO_DIR="$(cd "$(dirname "$0")/.." && pwd)"
FAILED=0

# Check 1 (A8/F11/F13): every config file in web/ is COPY'd in web/Dockerfile,
# either by name or via the canonical extension glob (e.g. web/*.mjs).
# -w stops web/*.js from matching inside web/*.json.
for cfg in postcss.config tailwind.config next.config; do
  for f in "$REPO_DIR"/web/${cfg}.{js,mjs,cjs,ts}; do
    [[ -f "$f" ]] || continue
    base=$(basename "$f")
    ext="${base##*.}"
    if ! grep -qwF "web/${base}" "$REPO_DIR/web/Dockerfile" 2>/dev/null &&
       ! grep -qwF "web/*.${ext}" "$REPO_DIR/web/Dockerfile" 2>/dev/null; then
      echo "✗ F11/F13: $base exists but not COPY'd in web/Dockerfile"
      FAILED=1
    fi
  done
done

# Check 2 (A9/F12): healthchecks use 127.0.0.1 (prints offending lines)
if grep -nE 'test:.*http://localhost' "$REPO_DIR"/docker-compose*.yml 2>/dev/null; then
  echo "✗ F12: healthcheck uses localhost (should be 127.0.0.1)"
  FAILED=1
fi

# Check 3 (F4/F14): .npmrc.docker uses both HOST and OWNER placeholders
if [[ -f "$REPO_DIR/.npmrc.docker" ]]; then
  for var in GITEA_NPM_HOST GITEA_NPM_OWNER; do
    if ! grep -qF "\${${var}}" "$REPO_DIR/.npmrc.docker"; then
      echo "✗ F4/F14: .npmrc.docker doesn't use \${${var}} placeholder"
      FAILED=1
    fi
  done
fi

# Check 4: .dockerignore doesn't exclude pnpm-lock.yaml (warning only)
if grep -q '^pnpm-lock\.yaml$' "$REPO_DIR/.dockerignore" 2>/dev/null; then
  echo "⚠ F1: .dockerignore excludes pnpm-lock.yaml (blocks lockfile optimization)"
fi

# Check 5: every FROM line, and the BASE_IMAGE default, is on the approved list
# (a single matching line is not enough; one stray FROM must fail the check)
for df in "$REPO_DIR"/{backend,web}/Dockerfile; do
  [[ -f "$df" ]] || continue
  if awk '
    /^FROM / && $2 !~ /^([$][{]BASE_IMAGE[}]|node:22-(alpine|slim))$/ { bad = 1 }
    /^ARG BASE_IMAGE=/ && $2 !~ /^BASE_IMAGE=node:22-(alpine|slim)$/ { bad = 1 }
    END { exit !bad }' "$df"; then
    echo "✗ Unapproved base image in $df"
    FAILED=1
  fi
done

exit $FAILED
```

8. Phase E — Observability / lint (NEW)

Two complementary linters:

  1. gitea-doctor — Gitea registry pre-flight (env + token + connectivity). Already shipped in common-plat commit 610a59fd at scripts/gitea/doctor.sh. This roadmap only wires it into CI/build flows (A0-D + E0 below).
  2. docker-doctor — Dockerfile + compose-file static linter (see § 7.6 skeleton). To be built as part of this roadmap.

The two are intentionally separate concerns:

| Linter | Scope | When to run |
|---|---|---|
| gitea-doctor | runtime env, token, registry HTTP 200 | Before every build / deploy |
| docker-doctor | static analysis of Dockerfile + compose YAML | On every PR touching those files |

Phase E checklist

  • E0. Wire bash scripts/gitea/doctor.sh --quiet into every Gitea Actions CI workflow as a pre-build job (addresses F15). Pattern shipped in common-plat; replicate via a reusable actions/gitea-preflight@main composite if Gitea Actions supports it, otherwise inline.
  • E1. Canonical docker-doctor.sh landed in learning_ai_common_plat/scripts/docker-doctor.sh (common-plat@130883a7). 15 checks codified from F1–F18; verified PASS on both pilots and FAIL on un-migrated control (learning_ai_notes).
  • E2. Per-repo wrappers landed: clock@aa5202fe7, peakpulse@af207b7.
  • E3. Wire into CI: run on PRs touching Dockerfile, docker-compose*.yml, .dockerignore, .npmrc.docker
  • E4. Wire into pre-commit hook (warning-only at first, error after 2 weeks)
  • E5. Checks documented in learning_ai_common_plat/AI.dev/SKILLS/docker-doctor.md (common-plat@130883a7).
  • E6. Add make doctor target to each pilot repo that runs both gitea-doctor AND docker-doctor
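E4 and E6 both come down to running the two linters and deciding whether a failure blocks. The following is a minimal sketch. It assumes a hypothetical `DOCTOR_MODE` variable, set to `warn` by default and flipped to `error` once the two-week warning window ends, and it takes each linter as a single command string. `doctor_gate` is a hypothetical name.

```bash
# Hypothetical gate shared by `make doctor` and .husky/pre-commit.
# Runs every command passed in (each as one string), reports all failures,
# and only blocks (non-zero exit) when DOCTOR_MODE=error.
doctor_gate() {
  local mode="${DOCTOR_MODE:-warn}" cmd failed=0
  for cmd in "$@"; do
    if ! bash -c "$cmd"; then
      echo "doctor: '$cmd' reported problems" >&2
      failed=1
    fi
  done
  if (( failed )) && [[ "$mode" == warn ]]; then
    echo "doctor: warn-only mode, not blocking (set DOCTOR_MODE=error to enforce)" >&2
    return 0
  fi
  return "$failed"
}
```

An illustrative call is `doctor_gate "bash scripts/gitea/doctor.sh --quiet" "bash scripts/docker-doctor.sh"`. The script paths in that call are assumptions, since per-repo wrapper locations vary.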

Checks implemented by docker-doctor.sh:

| Check | Addresses | Action |
|---|---|---|
| Every `web/*.config.*` file is COPY'd | F11, F13 | Error |
| docker-compose.yml healthcheck uses 127.0.0.1 | F12 | Error |
| .npmrc.docker uses `${GITEA_NPM_HOST}` AND `${GITEA_NPM_OWNER}` placeholders | F4, F14 | Error |
| Dockerfile declares `ARG GITEA_NPM_OWNER` if it COPYs .npmrc.docker | F14 | Error |
| .dockerignore doesn't exclude pnpm-lock.yaml | F1 | Warn (until A3 ADR lands) |
| Base image is on approved list (node:22-alpine or node:22-slim via BASE_IMAGE ARG) | Canonical decision | Error |
| `.docker-deps/` and `*.bak` in .gitignore | B3 | Error |
| docker-compose.yml passes GITEA_NPM_OWNER build arg | F14 | Warn |
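Three rows of the table are not covered by the § 7.6 skeleton: the `ARG GITEA_NPM_OWNER` declaration, the .gitignore entries, and the compose build arg. The following sketches them as standalone functions. It assumes exact-line matching in .gitignore and uses a coarse whole-file grep for the compose key. The function names are hypothetical and are not taken from the shipped common-plat script.

```bash
# Hypothetical extra checks mirroring rows of the table above.
# Each prints a finding and returns 1 on failure.

# F14: a Dockerfile that COPYs .npmrc.docker must declare ARG GITEA_NPM_OWNER.
check_owner_arg() {
  local df="$1"
  grep -q '^COPY .*\.npmrc\.docker' "$df" 2>/dev/null || return 0
  if ! grep -qE '^ARG GITEA_NPM_OWNER(=|$)' "$df"; then
    echo "✗ F14: $df copies .npmrc.docker but never declares ARG GITEA_NPM_OWNER"
    return 1
  fi
}

# B3: .docker-deps/ and *.bak are gitignored (exact-line match, stricter
# than git's own pattern matching).
check_gitignore() {
  local gi="$1/.gitignore" pattern rc=0
  for pattern in '.docker-deps/' '*.bak'; do
    if ! grep -qxF "$pattern" "$gi" 2>/dev/null; then
      echo "✗ B3: $gi is missing '$pattern'"
      rc=1
    fi
  done
  return "$rc"
}

# F14 (warn): compose passes GITEA_NPM_OWNER. Coarse: matches the key in map
# or list form anywhere in the file, not only under build.args.
check_compose_owner_arg() {
  local compose="$1"
  if ! grep -qE '^[[:space:]]*-?[[:space:]]*GITEA_NPM_OWNER[:=]' "$compose"; then
    echo "⚠ F14: $compose does not pass GITEA_NPM_OWNER build arg"
    return 1
  fi
}
```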

9. Open questions (numbered TODOs, not blockers)

  1. Shared pnpm cache volume? BuildKit caches are already shared across builds by id=pnpm. Test whether a named Docker volume adds anything before adding complexity.
  2. Custom base image? Publish bytelyst/node-pnpm:22-{alpine,slim} with pnpm pre-installed to skip corepack. Cost: image maintenance; benefit: ~5 s/build.
  3. CI hostname? Verify whether host.docker.internal:host-gateway works in Gitea Actions Linux runners, or whether a CI-specific Dockerfile variant is needed.
  4. Multi-platform builds? linux/amd64 + linux/arm64 interact awkwardly with cache mounts under buildx. Defer to a separate roadmap.
  5. Workspace flattening? Eliminate the ../learning_ai_common_plat/packages/* workspace entry inside Docker via a flattened pnpm-workspace.yaml. Unlocks --frozen-lockfile. Requires a lockfile regeneration step.

10. Execution order

  1. v5 commit: roadmap doc v5 lands; F16 documented (devops_tools@ba8b4d1).
  2. Phase A0 on learning_ai_clock — Dockerfile + compose changes landed in clock@0be887288. Initial A0-V blocked on F16/F17/F18.
  3. F16 fix in common-plat — workspace:* rewriter + defense-in-depth guard + republish of 10 affected packages (common-plat@cfcfc7bb).
  4. F17 fix in common-plat + Gitea config — ROOT_URL=host.docker.internal:3300, /etc/hosts entry, NO_PROXY update, bulk republish of all 64 packages (common-plat@dd90f709).
  5. F18 fix in clock — 4 file: refs in web/package.json rewritten to * (clock@8b5c767a3).
  6. A0-V on clock PASSED. v6 commit lands (devops_tools@7627d55).
  7. A8 + A9 + A1 on clock (correctness + corepack) — clock@f6a806ff3. Web cold dropped to 174 s; backend essentially flat at 60 s. F11 guard verified (Tailwind utilities present in CSS bundle).
  8. A2 + A4 + A5 + A6 on clock (cache mount + dockerignore): clock@55e8d22d3. Warm rebuilds: backend 2.9 s, web 5.4 s (95–97% reduction). A7 metrics table populated in this commit.
  9. Phase A0 → A6 on learning_ai_peakpulse backend (peakpulse@11a6bc5). Cold 72.2 s, warm 2.7 s. Pattern from clock applied verbatim, plus a side fix for .docker-deps/.gitkeep discoverability that was also ported back to clock (peakpulse@6523a1a, clock@1465e06b1, clock@d69003c1f).
  10. A3 ADR: docs/adr/0001-docker-build-lockfile-policy.md. Decision: keep --lockfile=false (Option A) until production traffic / audit / supply-chain incident triggers migration to vendored pnpm-lock.docker.yaml (Option C). Implementation deferred.
  11. Phase E1/E2/E5: docker-doctor.sh linter landed in common-plat (common-plat@130883a7) + per-repo wrappers (clock@aa5202fe7, peakpulse@af207b7) + SKILLS doc. Verified PASS on both pilots, FAIL with 6 specific findings on un-migrated control (learning_ai_notes).
  12. Phase B: docker-prep.sh hardened + promoted to canonical home in common-plat (common-plat@a418a23e). Synced to both pilots (clock@27034d90f, peakpulse@563a45e). All Phase B checklist items landed except B4 (husky pre-commit hook) and B7-4 (per-repo AGENTS.md warnings, deferred to Phase D rollout). Verified end-to-end on both pilots: dry-run → pack → check (fail) → idempotency guard → restore → git status clean.
  13. Phase B4 + E3/E4/E6 — pre-commit guard (common-plat@c908c6d7) + .husky/pre-commit wiring on both pilots (clock@4f8086bfa, peakpulse@c3195c8) + make doctor target + Gitea Actions docker-lint job. Verified guard blocks simulated staged tarballs.
  14. Phase C — 7/9 gates pass; C5 (CI green) awaits next CI run; C9 (web smoke test) deferred. Cold build 64 s, warm 2.6 s / 3.3 s.
  15. Phase D.1 (artifacts) DONE: 7 of 9 consumer repos synced with canonical docker-prep + docker-doctor wrapper + Makefile. Baseline findings documented per repo. See §6 for the table. Remaining: MindLyst, LysnrAI, talk2obsidian (different layouts).
  16. Phase D.2 (per-repo Dockerfile fixes): pending. See §6.D.2 for the fix matrix. Each repo gets a small follow-up PR.

11. Risk register

| Risk | Mitigation |
|---|---|
| Removing pnpm-lock.yaml from .dockerignore exposes a stale or sibling-aware lockfile that breaks Docker installs | Keep `--lockfile=false` for now (A3 ADR); revisit after F2 resolution |
| BuildKit cache mount on shared CI runners causes cross-build interference | Use distinct `id=` per repo (`id=pnpm-${repo}`) if observed |
| host.docker.internal doesn't resolve in Linux Docker | `extra_hosts: ["host.docker.internal:host-gateway"]` (A0-4) |
| Removing .docker-deps/ from default builds breaks repos that haven't done A0 yet | Wildcard `COPY .docker-deps*` keeps both paths working during migration |
| `docker-prep.sh --force` is misused and .bak files get committed | Pre-commit hook (B4) blocks .bak, .tgz, rewritten package.json |
| Corp network blocks host.docker.internal:3300 | Verify SSH tunnel reaches Gitea; document in operations.md |
| F11 regression: build green, app ships with no CSS | C9 smoke test + Phase E docker-doctor.sh check on `web/*.config.*` COPY coverage |
| F12 regression: healthcheck false-fails on IPv6 | Phase E docker-doctor.sh grep for localhost in compose files |
| F13 regression: new config file added, Dockerfile forgotten | A8-2 glob COPY pattern (root cause fix) + Phase E lint (defense in depth) |
| BASE_IMAGE override in notes diverges silently from canonical | Phase E check approved list; document override in repo AGENTS.md |
| F14 regression: future Gitea owner rename re-introduces literal in some Dockerfile | Phase E docker-doctor.sh checks .npmrc.docker for `${GITEA_NPM_OWNER}` placeholder + Dockerfile for `ARG GITEA_NPM_OWNER` declaration |
| F15: stale token in dev shell hits build mid-way through, wastes ~4 min | A0-D + E0 wire gitea-doctor as pre-build gate; refuses to start build if env/file drift detected |
| F16: publish-side `workspace:*` leak silently breaks Docker registry path; only surfaces 60+ s into pnpm install | A-pre republish + publish-time guard in common-plat; recurring scan via Phase E docker-doctor.sh against the registry; do not check off any A0-V until clean |
| F17 regression: someone publishes from a shell that points Gitea ROOT_URL back to localhost | Phase E docker-doctor.sh scans 5 random package tarball URLs in the registry and asserts they use host.docker.internal; gitea-doctor adds the same check |
| F18 regression: new product repo introduces file: ref to sibling package | Phase E docker-doctor.sh greps `**/package.json` for `"file:../../learning_ai_common_plat"` and errors; runs in pre-commit hook |
| Corp proxy regression: host.docker.internal falls out of NO_PROXY on a dev machine | switch-network.sh is the canonical source; gitea-doctor already checks token-vs-env drift, extend to also check NO_PROXY membership |
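The F17, F18 and corp-proxy mitigations are simple enough to sketch as shell functions. The sketch assumes the packument JSON for each sampled package (the document the npm registry returns for one package) has already been fetched and saved to a file. Picking the five random packages is left to the caller. It also assumes host.docker.internal is the expected host after the F17 fix. All function names are hypothetical.

```bash
# Hypothetical regression checks for the F17, F18 and corp-proxy rows.

# F17: every dist.tarball URL in a saved packument must point at
# host.docker.internal. A packument with no tarball URLs is also a failure.
check_tarball_hosts() {
  local packument="$1" url seen=0 bad=0
  while IFS= read -r url; do
    seen=$((seen + 1))
    case "$url" in
      *://host.docker.internal:*|*://host.docker.internal/*) ;;
      *) echo "✗ F17: tarball URL not on host.docker.internal: $url"; bad=1 ;;
    esac
  done < <(grep -oE '"tarball"[[:space:]]*:[[:space:]]*"[^"]+"' "$packument" \
             | sed -E 's/.*"([^"]+)"$/\1/')
  if (( seen == 0 )); then
    echo "✗ F17: no tarball URLs found in $packument"
    return 1
  fi
  return "$bad"
}

# F18: no package.json under root may point at the sibling repo via file:.
check_no_sibling_file_refs() {
  local root="$1" hits
  hits="$(grep -rlF --include=package.json --exclude-dir=node_modules \
            'file:../../learning_ai_common_plat' "$root" || true)"
  if [[ -n "$hits" ]]; then
    echo "✗ F18: file: refs to learning_ai_common_plat in:"
    echo "$hits"
    return 1
  fi
}

# Corp proxy: host.docker.internal must be listed in NO_PROXY (or no_proxy).
check_no_proxy() {
  local list="${NO_PROXY:-${no_proxy:-}}"
  case ",${list// /}," in
    *,host.docker.internal,*) return 0 ;;
  esac
  echo "✗ proxy: host.docker.internal missing from NO_PROXY"
  return 1
}
```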