bytelyst-devops-tools/docs/docker-build-optimization-roadmap.md
saravanakumardb1 484c82c4b1 docs(roadmap): repair v13 §10 corruption + finalize C5 partial-validation note
A prior rebase merged the v13/v13.1 edits into §10 with mangled text
(steps 11–20 out of order; step 10 garbled). Rebuilt the section
cleanly from v12 base + appended the new v13/v13.1 steps:

  11. Phase E1/E2/E5
  12. Phase B
  13. Phase B4 + E3/E4/E6
  14. Phase C (8/9; C5 partial)
  15. Phase D.1
  16. Phase D.2
  17. B7-4 AGENTS.md warnings
  18. Phase D extension (MindLyst, LysnrAI, talk2obsidian)
  19. Phase D.3 advisory cleanup
  20. C5 partial validation (this session)

Restored the lost "ported back to clock" trailing line for step 9.
No content changes beyond what was already documented in v13/v13.1.
2026-05-27 04:34:53 -07:00


Docker Build Optimization Roadmap

Status: Draft v13 (Phases A, B, C, D, E complete across all 12 consumer repos; docker-doctor PASS everywhere; only advisory warnings remain) · Owner: Platform DevOps · Created: 2026-05-27 · Revised: 2026-05-27

Pilot Docker-build correctness + speed fixes on learning_ai_clock (web + backend) and learning_ai_peakpulse (backend), then capture the playbook here for ecosystem-wide rollout.

Upstream prerequisite shipped (commit 610a59fd in learning_ai_common_plat): Gitea owner parameterization + helper scripts (scripts/gitea/doctor.sh, scripts/gitea/token.sh). The .npmrc template now resolves owner from ${GITEA_NPM_OWNER:-learning_ai_user}. All A0-1 work in this roadmap inherits this — Dockerfile/.npmrc.docker must use the same ${GITEA_NPM_OWNER} placeholder, not a hardcoded literal.


0. Pre-flight audit findings (2026-05-27)

A read-only audit of pilot repos + lessons from recent live incidents + the A0-V execution iterations on clock surfaced 18 concrete bugs/gaps (F14–F15 added after the Gitea-hardening commit; F16–F18 added during the A0-V execution sweep on clock, 2026-05-27). The actual state of the ecosystem is closer to the inverse of the casual narrative: tarballs are the de facto default, the Gitea-registry path is partially wired, and there is a separate class of "build green, app broken" silent failures (F11–F13) that the speed-focused plan needs to address first.

| # | Finding | Location | Severity |
|---|---|---|---|
| F1 | pnpm-lock.yaml is in .dockerignore: any lockfile-based optimization is blocked until removed | peakpulse/.dockerignore, clock/.dockerignore | High |
| F2 | pnpm-workspace.yaml references sibling ../learning_ai_common_plat/packages/*; --frozen-lockfile inside Docker will fail unless workspace is flattened or sibling tree is copied | both pilots | High |
| F3 | peakpulse/.npmrc.docker is tarball-only (no @bytelyst:registry=… line), so the "Gitea-registry" path doesn't work in this repo today | peakpulse/.npmrc.docker | High |
| F4 | clock/.npmrc.docker hardcodes http://localhost:3300; from inside Docker, localhost is the container, not the host registry | clock/.npmrc.docker | High |
| F5 | clock/backend/Dockerfile has neither ARG GITEA_NPM_HOST nor a BuildKit secret mount, so it is wholly dependent on pre-populated .docker-deps/ | clock/backend/Dockerfile | High |
| F6 | clock/web/Dockerfile accepts ARG GITEA_NPM_HOST but never uses it; no --mount=type=secret either | clock/web/Dockerfile | Medium |
| F7 | peakpulse/docker-compose.yml does not pass GITEA_NPM_HOST build arg or declare secrets: block | peakpulse/docker-compose.yml | Medium |
| F8 | COPY .docker-deps/ is unconditional in every backend Dockerfile: every build requires docker-prep.sh to have run OR an empty .docker-deps/ dir to pre-exist | both repos | Medium |
| F9 | npm install -g pnpm@10.6.5 runs on every build (no corepack): 5–10 s overhead, no pinning to packageManager field | all four Dockerfiles | Low |
| F10 | No BuildKit --mount=type=cache for pnpm store: cold install on every rebuild even when deps unchanged | all four Dockerfiles | High (main speed win) |
| F11 | Build-time config file missing from repo or not COPY'd in Dockerfile causes silent UI breakage. Symptom: next build succeeds, container is "healthy", but CSS bundle is ~33 KB (only @font-face) and all Tailwind classes are absent → UI renders unstyled. Two sub-bugs: (a) postcss.config.mjs missing entirely while @tailwindcss/postcss is in package.json (NoteLett, JarvisJr fixes dff459e, 36f6bc1); (b) file exists but Dockerfile never COPYs it (Clock, LocalMemGPT fixes a308c6444, 07cdf6b). | */web/Dockerfile, */web/postcss.config.* | High |
| F12 | Healthcheck uses localhost, resolves to IPv6 ::1, false-fails. Backend listens on 0.0.0.0 (IPv4 only). wget --spider http://localhost:.../health hits ::1, connection refused, container marked "unhealthy", web service won't start due to depends_on: condition: service_healthy. Incident: learning_ai_jarvis_jr/docker-compose.yml. | every docker-compose*.yml healthcheck | Medium |
| F13 | Enumerated COPY web/foo ./foo pattern drifts from filesystem. New config file added to repo but Dockerfile's enumerated COPY list isn't updated. Build succeeds silently with the file absent; behavior diverges from local dev. Root cause of F11(b). | every Dockerfile using enumerated COPY | Medium |
| F14 | Hardcoded Gitea owner (learning_ai_user) literally embedded in .npmrc.docker + CI workflows + publish scripts across 14 repos. When the org was renamed from bytelyst → learning_ai_user, every repo needed a manual commit. Resolved upstream in common-plat (610a59fd): owner now resolves from ${GITEA_NPM_OWNER:-learning_ai_user}; scripts/gitea/{doctor,token}.sh ship as pre-flight/rotation helpers. Docker work in this roadmap MUST consume the env var, not the literal. | .npmrc.docker, Dockerfile ARG/ENV, CI workflows | Medium |
| F15 | Stale shell-env tokens. ~/.gitea_npm_token rotated on disk; long-lived shells still exported the old value. Caused 401s during docker compose build until source ~/.zshrc. Mitigation shipped: bash scripts/gitea/doctor.sh detects env-vs-file drift and refuses to proceed. Action required in this roadmap: wire doctor as a pre-build CI gate. | dev workstation + CI runners | Low (now caught) |
| F16 | At least 10 published @bytelyst/* packages had unrewritten workspace:* refs in their package.json dependencies. Root cause: publish-outdated-packages.sh extracts a pnpm-packed tarball then re-packs with npm pack (workaround for a historical Gitea-compat issue with pnpm's tarball format), and npm pack doesn't recognize the pnpm-specific workspace: protocol, so it passes it through literally. Fixed in common-plat@cfcfc7bb (fix(gitea): rewrite workspace:* in published tarballs (F16)): inserted a workspace:* rewriter between extract and npm-repack + a defense-in-depth grep guard. Republished 10 affected packages. | common-plat publish flow + Gitea registry | Critical (FIXED) |
| F17 | Gitea bakes localhost:3300 into the dist.tarball field of every published package's metadata. Inside Docker, localhost is the container itself, not the host, so even after a successful registry-metadata fetch via host.docker.internal, pnpm follows the tarball URL to localhost:3300 and ECONNREFUSEs. Root cause: Gitea app.ini's ROOT_URL=http://localhost:3300/ was baked at publish time. Fixed by setting ROOT_URL=http://host.docker.internal:3300/, restarting Gitea, adding 127.0.0.1 host.docker.internal to /etc/hosts, adding host.docker.internal to NO_PROXY (corp proxy was hijacking DNS), and republishing all 64 packages (common-plat@dd90f709). | Gitea app.ini + host /etc/hosts + every dev machine's switch-network.sh | Critical (FIXED) |
| F18 | clock/web/package.json had 4 @bytelyst/* deps declared as file: refs to sibling ../../learning_ai_common_plat/packages/*, a legacy pre-Gitea pattern. Inside Docker those paths don't exist, so pnpm install fails with ERR_PNPM_LINKED_PKG_DIR_NOT_FOUND. Discovered during clock web A0-V on 2026-05-27. Fixed in learning_ai_clock@8b5c767a3 by rewriting to * semver. Same pattern likely lives in other product repos (especially anything that consumes @bytelyst/ui, @bytelyst/design-tokens, @bytelyst/use-theme); audit needed in Phase D rollout. | */web/package.json (and likely others) | High |

Implications:

  • The original "switch to --frozen-lockfile + Gitea registry" plan requires two upstream fixes first (F1, F2).
  • F11–F13 mean correctness fixes must precede speed fixes, otherwise we ship faster builds of broken apps.
  • F16 + F17 are both fixed as of 2026-05-27. Gitea path now works end-to-end on clock. A-pre is largely complete; remaining items (A-pre-4, A-pre-5) become Phase E checks.
  • F18 (sibling file: refs in product repo manifests) is the same family as F2 but separately tractable — fixed in clock, audit needed across other repos as part of Phase D rollout.
  • A linter (Phase E docker-doctor.sh) is the durable insurance against F11/F13/F18 recurrence — silent in CI today. The registry-side guard (publish-time check for workspace:* leaks) shipped in common-plat@cfcfc7bb as part of the F16 fix.
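The F18 audit mentioned above can start as a plain grep over package manifests. The following is a minimal sketch under stated assumptions: the function name `find_sibling_file_refs` is hypothetical and is not part of docker-doctor.sh, and the pattern only covers `@bytelyst/*` entries written as `"file:../…"` on a single line.

```shell
#!/usr/bin/env bash
# Sketch: list package.json files that still declare @bytelyst/* deps as
# sibling file: refs (the F18 pattern). Illustrative only; the real linter
# may check more cases.
find_sibling_file_refs() {
  local root="${1:-.}"
  # -l prints matching file names only; node_modules is skipped so that
  # installed packages do not produce noise.
  grep -rlE --include=package.json --exclude-dir=node_modules \
    '"@bytelyst/[^"]+"[[:space:]]*:[[:space:]]*"file:\.\./' "$root" || true
}

# Usage (run from a product repo root):
#   find_sibling_file_refs .
```

An empty result means no manifest under the given root matches the pattern; it does not prove that other legacy reference styles are absent.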

1. Context: three build paths

| Path | Status today | Trigger | Notes |
|---|---|---|---|
| docker-prep.sh tarballs | De facto default in peakpulse + flowmonk; also works in clock/notes | Run docker-prep.sh then docker compose build | Hermetic; mutates package.json; slow to repack |
| Gitea NPM registry | Partially wired in clock + notes; broken in peakpulse | docker compose build with GITEA_NPM_HOST arg + secret | Needs .npmrc.docker standardization to be the default |
| Legacy file: refs | Deprecated | | Removed during pnpm/Gitea migration |

Measurement targets

| Build | Baseline (observed) | Target after Phase A |
|---|---|---|
| Cold (no cache) | ~2–3 min | ≤ 2 min |
| Warm (one source file changed) | ~2–3 min | < 30 s |
| docker-prep.sh pack step alone | ~60–90 s | < 30 s (pnpm pack cache) |

Fill in actuals during Phase C.


2. Goals & non-goals

Goals

  • Eliminate the F11–F13 class of silent "build green, app broken" failures
  • Cut warm rebuild time via BuildKit pnpm-store cache mount (single biggest speed win)
  • Make docker-prep.sh idempotent, safe to re-run, gitignore-clean, and canonical (no per-repo drift)
  • Standardize .npmrc.docker across the ecosystem so the Gitea path actually works
  • Fix docker-compose.yml to pass GITEA_NPM_HOST + secrets so the registry path is usable without manual flags
  • Ship docker-doctor.sh CI lint as the durable insurance layer

Non-goals

  • Migrating off pnpm or off the Gitea registry
  • Adopting --frozen-lockfile until F2 is resolved (sibling-workspace problem)
  • Publishing @bytelyst/* to the public npm registry
  • Multi-platform builds (separate roadmap)

2.5 Canonical decisions

Decisions taken now to avoid contradictions later in the doc:

  • Base image: node:22-alpine is canonical. For repos blocked by the corporate proxy's Alpine SSL interception (currently only learning_ai_notes), the Dockerfile MUST expose:
    ARG BASE_IMAGE=node:22-alpine
    FROM ${BASE_IMAGE} AS builder
    
    Override per-repo via --build-arg BASE_IMAGE=node:22-slim. Document the override in the repo's AGENTS.md.
  • Healthcheck host: 127.0.0.1 (NOT localhost) in every docker-compose*.yml test: block. See F12.
  • Lockfile mode in Docker: --lockfile=false for now. --frozen-lockfile is deferred per the A3 ADR (ADR-0001) until F2 is resolved.

3. Phase A — Correctness + build speed + path correctness

Order matters: A-pre must precede A0 (you can't build via a registry that serves broken metadata); A0 must precede A1+ (you can't optimize a path that doesn't work), and A8+A9 (correctness) must land before measuring speed wins.

A-pre. Make the Gitea registry actually usable from Docker (F16 + F17 + F18)

Owner: learning_ai_common_plat + per-product repo · Status: done for clock + global config.

Three distinct bugs surfaced during clock A0-V on 2026-05-27:

  • F16: Publish flow leaked workspace:* into published metadata.

  • F17: Gitea baked localhost:3300 into tarball URLs.

  • F18: Product repos had legacy file: refs to sibling packages.

  • A-pre-1. Audit publish-outdated-packages.sh — confirmed it uses pnpm pack then re-tars with npm pack, which loses workspace: rewriting.

  • A-pre-2. Patch publish script with a workspace:* rewriter + a post-rewrite grep guard. Shipped in common-plat@cfcfc7bb.

  • A-pre-3. Verify all packages publish with 0 workspace:* refs. Confirmed via curl scan across all 64 packages.

  • A-pre-4. F17 fix: set Gitea ROOT_URL=http://host.docker.internal:3300/, restart Gitea, add 127.0.0.1 host.docker.internal to /etc/hosts, add host.docker.internal to NO_PROXY in switch-network.sh, bulk republish all 64 packages. Shipped in common-plat@dd90f709.

  • A-pre-5. F18 fix: rewrite file:../../learning_ai_common_plat/packages/* refs in clock/web/package.json to * semver. Shipped in clock@8b5c767a3. Audit needed in Phase D for other product repos.

  • A-pre-6. Document Gitea config requirements (below).

A-pre-6. Gitea configuration prerequisites (one-time per dev machine)

The Gitea registry MUST be configured with ROOT_URL=http://host.docker.internal:3300/ so published tarball URLs are reachable from inside Docker containers. The host /etc/hosts MUST resolve host.docker.internal to 127.0.0.1 so the same URLs work from the host shell.

On macOS (Homebrew Gitea):

# 1. Edit Gitea's app.ini
sudo -e /opt/homebrew/var/gitea/custom/conf/app.ini
#   change:   ROOT_URL = http://localhost:3300/
#   to:       ROOT_URL = http://host.docker.internal:3300/

# 2. Restart Gitea
brew services restart gitea

# 3. Add /etc/hosts entry so host.docker.internal resolves on the host too
sudo sh -c 'grep -q host.docker.internal /etc/hosts || \
  echo "127.0.0.1       host.docker.internal" >> /etc/hosts'

# 4. Ensure host.docker.internal is in NO_PROXY for corp shells
# (already done in switch-network.sh as of common-plat@dd90f709)
source ~/.zshrc   # reload

# 5. Verify
curl -sS http://host.docker.internal:3300/api/v1/version
# expected: {"version":"1.25.5"} or similar
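Step 5 confirms that Gitea answers on the new host name, but F17 was about the tarball URLs inside package metadata. A minimal sketch of a follow-up check, assuming the registry returns standard npm metadata with a `dist.tarball` field; the function name `tarball_urls_reachable_from_docker` is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: read npm package metadata (JSON) on stdin and fail if any tarball
# URL still points at localhost, which is unreachable from inside a container.
tarball_urls_reachable_from_docker() {
  if grep -Eq '"tarball"[[:space:]]*:[[:space:]]*"https?://(localhost|127\.0\.0\.1)[:/]'; then
    echo "FAIL: tarball URL points at localhost" >&2
    return 1
  fi
  echo "OK: no localhost tarball URLs"
}

# Usage (the package name is a placeholder; add auth if the registry needs it):
#   curl -sS "http://host.docker.internal:3300/api/packages/$GITEA_NPM_OWNER/npm/<package>" |
#     tarball_urls_reachable_from_docker
```

Packages published before the ROOT_URL change keep their old URLs until they are republished, so this is worth running against at least one package per publish batch.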

A0. Make the Gitea-registry path actually work (clock + peakpulse)

  • A0-1. Standardize .npmrc.docker to use templated host AND owner so it works on host (localhost) and inside Docker (host.docker.internal), and so future owner renames are a one-line env change:

    @bytelyst:registry=http://${GITEA_NPM_HOST}:3300/api/packages/${GITEA_NPM_OWNER:-learning_ai_user}/npm/
    //${GITEA_NPM_HOST}:3300/api/packages/${GITEA_NPM_OWNER:-learning_ai_user}/npm/:_authToken=${GITEA_NPM_TOKEN}
    strict-ssl=false
    auto-install-peers=true
    

    ⚠️ Env-var expansion chain: pnpm expands ${VAR} in .npmrc at read time using the current process environment (see pnpm npmrc docs). That means the Dockerfile MUST do ARG GITEA_NPM_HOST + ARG GITEA_NPM_OWNER → ENV GITEA_NPM_HOST=$GITEA_NPM_HOST / ENV GITEA_NPM_OWNER=$GITEA_NPM_OWNER before the pnpm install RUN line, AND the GITEA_NPM_TOKEN must be exported from the BuildKit secret mount inside the same RUN (since secrets don't persist as env across layers).

    Note on F14: The canonical .npmrc (host-side) template already uses ${GITEA_NPM_OWNER} (shipped in common-plat commit 610a59fd). .npmrc.docker lagged behind because Docker builds have a separate file — A0-1 brings them into parity.

  • A0-2. Remove pnpm-lock.yaml from .dockerignore in both repos (fixes F1; harmless under --lockfile=false since we don't COPY it, but unblocks future A3)

  • A0-3. Add GITEA_NPM_HOST + GITEA_NPM_OWNER build args + secrets: block to every service in docker-compose.yml:

    build:
      context: .
      dockerfile: backend/Dockerfile
      args:
        GITEA_NPM_HOST: ${GITEA_NPM_HOST:-host.docker.internal}
        GITEA_NPM_OWNER: ${GITEA_NPM_OWNER:-learning_ai_user}
      secrets:
        - gitea_npm_token
    secrets:
      gitea_npm_token:
        environment: GITEA_NPM_TOKEN
    
  • A0-4. Add extra_hosts: ["host.docker.internal:host-gateway"] to each service so Linux Docker can resolve the host

  • A0-5. Document required env: GITEA_NPM_TOKEN must be exported in the shell that runs docker compose build (add to repo README.md quickstart). Reference bash ../learning_ai_common_plat/scripts/gitea/token.sh status for verification.

  • A0-D. Run gitea-doctor before any Docker build (addresses F15). Inline into deploy/CI workflows:

    bash ../learning_ai_common_plat/scripts/gitea/doctor.sh --quiet || exit 1
    docker compose build
    
    • Locally: shell alias or Makefile target make build that runs doctor then docker compose build.
    • In Gitea Actions CI: a pre-job step. If doctor exits non-zero, the build is skipped with a clear error rather than failing 4 minutes in with ERR_PNPM_AUTHENTICATION.
  • A0-V. Verification gate (between A0 and A1): build the registry path without any cache-mount or layer optimizations. Confirm docker compose build --no-cache succeeds end-to-end pulling from Gitea. Only proceed to A1 once this is green. Don't conflate "make it work" with "make it fast" in one commit.

    2026-05-27 status — clock A0-V: PASSED (third attempt, after F16, F17, F18 fixed). Cold-build wall-clock:

    • backend: 59.2 s (commits: clock@0be887288 + common-plat@cfcfc7bb + common-plat@dd90f709)
    • web: 3:13 (193 s) (commits: above + clock@8b5c767a3)

    Both surfaces resolve @bytelyst/* from the Gitea registry end-to-end — no docker-prep.sh tarballs, no sibling file: refs, no proxy interference. See §3.A7 metrics table.
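For A0-1, it can help to print the registry URL that the template should resolve to before starting a build. The sketch below assumes pnpm's `${VAR:-default}` handling matches the shell's, as the template in A0-1 implies; the function name `resolved_registry_url` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the registry URL that the .npmrc.docker template is expected
# to resolve to in the current environment. The shell applies the same
# ${VAR:-default} fallback that the template relies on.
resolved_registry_url() {
  if [ -z "${GITEA_NPM_HOST:-}" ]; then
    echo "GITEA_NPM_HOST is not set; the template has no default for it" >&2
    return 1
  fi
  printf 'http://%s:3300/api/packages/%s/npm/\n' \
    "$GITEA_NPM_HOST" "${GITEA_NPM_OWNER:-learning_ai_user}"
}

# Usage:
#   GITEA_NPM_HOST=host.docker.internal resolved_registry_url
```

The host has no fallback in the template, which is why the sketch treats a missing GITEA_NPM_HOST as an error instead of guessing.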

A1. Replace npm install -g pnpm@X with corepack

  • A1-1. Replace RUN npm install -g pnpm@10.6.5 with:
    RUN corepack enable && corepack prepare pnpm@10.6.5 --activate
    
  • A1-2. Verify packageManager field in backend/package.json and web/package.json matches (already pnpm@10.6.5 in peakpulse backend)

A2. Add BuildKit pnpm-store cache mount

  • A2-1. Set # syntax=docker/dockerfile:1.7 directive at top of every Dockerfile
  • A2-2. Wrap install step with cache + secret mount:
    RUN --mount=type=cache,id=pnpm,target=/root/.local/share/pnpm/store \
        --mount=type=secret,id=gitea_npm_token \
        export GITEA_NPM_TOKEN="$(cat /run/secrets/gitea_npm_token 2>/dev/null || echo '')" && \
        pnpm install --ignore-scripts --lockfile=false
    
  • A2-3. Verify cache mount is active: docker buildx du --filter type=exec.cachemount shows non-zero size after a build. Real success metric is wall-clock: warm rebuild (touching one source file) drops to < 30 s.

A3. Decide lockfile policy (DONE, ADR-0001)

Two options — pick one in a short ADR before implementing:

  • Option 1: Keep --lockfile=false (current pragmatic approach)

    • No sibling-workspace complications
    • No reproducibility guarantee inside Docker
    • Slower installs (full resolution every build)
  • Option 2: Generate a Docker-only lockfile via pnpm install --lockfile-only against a flattened package.json that resolves @bytelyst/* to semver

    • Reproducibility
    • Faster installs
    • New build step + tooling
    • Drift risk between dev lockfile and Docker lockfile
  • A3-1. ADR written: docs/adr/0001-docker-build-lockfile-policy.md. Option 1 accepted (keep --lockfile=false short-term; revisit after Phase D).

  • A3-2. --frozen-lockfile adoption deferred per ADR; tracked as future work in §11.

A4. Restructure layer order

  • A4-1. Reorder COPY/RUN so deps-install layer is package.json + .npmrc.docker ONLY, then a separate layer for src/, config files, shared/
  • A4-2. Move all ARG lines that affect deps install before the install step; move NEXT_PUBLIC_* ARGs (web) closer to the build step (they invalidate the build layer, not the deps layer)

A5. Gate .docker-deps/ behind a build arg

  • A5-1. Add ARG USE_TARBALLS=false to Dockerfile
  • A5-2. Use wildcard COPY so missing dir doesn't break the build:
    RUN mkdir -p /app/.docker-deps
    COPY .docker-deps* /app/.docker-deps/
    
  • A5-3. Verify .docker-deps/ is in .gitignore and .dockerignore does NOT exclude it when tarball mode is in use

A6. .dockerignore audit

  • A6-1. Confirm exclusions: node_modules, **/node_modules, dist, .next, *.log, .env, .env.*, .git, *.bak
  • A6-2. Remove: pnpm-lock.yaml exclusion (was correct under --lockfile=false, blocks future optimization)
  • A6-3. Confirm .docker-deps/ is NOT excluded when tarball path is active
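The A6 checks are mechanical enough to script. This is a rough sketch, not the docker-doctor.sh implementation: the function name `audit_dockerignore` is hypothetical, it only recognizes exact-line entries (no comments or trailing whitespace), and it treats a `.docker-deps` exclusion as unwanted, which only applies when the tarball path is active.

```shell
#!/usr/bin/env bash
# Sketch: check a .dockerignore against the A6-1 list and flag the entries
# that A6-2 and A6-3 say should not be excluded.
audit_dockerignore() {
  local file="${1:-.dockerignore}" status=0 pattern
  for pattern in node_modules '**/node_modules' dist .next '*.log' .env '.env.*' .git '*.bak'; do
    # -F: literal match, -x: whole line, so '*.log' is not treated as a regex.
    if ! grep -Fxq -- "$pattern" "$file"; then
      echo "MISSING exclusion: $pattern"
      status=1
    fi
  done
  for pattern in pnpm-lock.yaml .docker-deps .docker-deps/; do
    if grep -Fxq -- "$pattern" "$file"; then
      echo "UNWANTED exclusion: $pattern"
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   audit_dockerignore .dockerignore
```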

A7. Measure & record

| Repo | Surface | Cold (A0-V) | Cold (post-A2) | Warm (post-A2) | Notes |
|---|---|---|---|---|---|
| clock | backend | 59.2 s | 64.7 s | 2.9 s | Cold essentially flat (corepack adds ~1 s; cache mount empty on first run). Warm → 95.1% reduction. Commits: clock@8b5c767a3 (A0-V), clock@f6a806ff3 (A1+A8+A9), clock@55e8d22d3 (A2+A5+A6) |
| clock | web | 193 s (3:13) | 291 s (4:51) † | 5.4 s | Warm → 97.2% reduction. † Cold variance: see footer |
| peakpulse | backend | n/a (was tarball-only path) | 72.2 s | 2.7 s | Warm → 96.3% reduction. Commits: peakpulse@11a6bc5 (Phase A), peakpulse@6523a1a (.gitkeep fix), clock@1465e06b1+d69003c1f (mirror .gitkeep fix) |

Footer note on cold-build variance. Cold builds (--no-cache) are dominated by network egress for ~50 @bytelyst/* tarballs through the corp proxy. A second measurement of clock web cold-build came in at 291 s vs 174 s in the previous step — same Dockerfile path, different network-side latency. Cold build is not the optimization target of this roadmap; warm rebuild is. Run pnpm store prune on the host or use a local registry mirror if cold-build determinism is needed.

Measurement commands:

# Cold (clear all layer cache; cache mounts may still persist)
time DOCKER_BUILDKIT=1 docker compose build --no-cache backend

# Warm (one source file changed; deps unchanged)
touch backend/src/server.ts
time DOCKER_BUILDKIT=1 docker compose build backend

# Deps-changed (touch package.json; pnpm store cache helps here)
touch backend/package.json
time DOCKER_BUILDKIT=1 docker compose build backend
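The "Warm → N% reduction" figures in the A7 table appear to be computed against the first cold column of each row (for example 59.2 s → 2.9 s). A small helper for reproducing them from new timings, offered as a sketch; the function name `pct_reduction` is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: percentage reduction from a baseline time to a new time, rounded
# to one decimal place. Both arguments are seconds.
pct_reduction() {
  awk -v base="$1" -v new="$2" 'BEGIN {
    if (base <= 0) { exit 1 }          # avoid dividing by zero
    printf "%.1f\n", (1 - new / base) * 100
  }'
}

# Usage:
#   pct_reduction 59.2 2.9    # clock backend
#   pct_reduction 193 5.4     # clock web
```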

A8. Config-file COPY audit & canonical pattern (addresses F11, F13)

  • A8-1. For every Dockerfile in scope, list all build-time files present in the surface directory (web/ or backend/) that affect the build:
    • postcss.config.{js,mjs,cjs,ts}
    • tailwind.config.{js,mjs,cjs,ts}
    • next.config.{js,mjs,ts}
    • tsconfig*.json
    • package.json
    • .npmrc.docker, .npmrc
    • babel.config.* (if present)
    • drizzle.config.* (if present)
    • vitest.config.* (only if the build needs it)
    Verify each is COPY'd in the Dockerfile.
  • A8-2. Choose canonical COPY pattern. Decision: middle-ground glob for web surfaces:
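    # Dockerfile COPY globs follow Go's filepath.Match, which has no {a,b} brace expansion, so each pattern is listed
    COPY web/*.json web/*.ts web/*.mjs web/*.js web/*.cjs ./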
    COPY web/public/ ./public/
    COPY web/src/ ./src/
    
    Trade-off: glob picks up unintended root-level files if any are added later, but dramatically reduces F11/F13 risk. Backend surfaces with few root config files can keep enumerated COPY (lower risk surface).
  • A8-3. Repo-by-repo migration: replace enumerated COPY web/foo ./foo with the glob pattern; verify the resulting image has all expected files via docker run --rm <img> ls -la.
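For Dockerfiles that keep the enumerated COPY style, the A8-1 audit can be approximated with a filename check. This is a sketch under assumptions: the function name `uncopied_config_files` is hypothetical, it covers only a subset of the A8-1 list, and it reports false positives for Dockerfiles that already use the glob pattern (the file names never appear literally there).

```shell
#!/usr/bin/env bash
# Sketch: print build-time config files present in a surface directory whose
# names never appear in the Dockerfile (the F13 drift case).
uncopied_config_files() {
  local surface="$1" dockerfile="$2" path name
  for path in "$surface"/postcss.config.* "$surface"/tailwind.config.* \
              "$surface"/next.config.* "$surface"/tsconfig*.json; do
    [ -e "$path" ] || continue      # an unmatched glob stays literal; skip it
    name="${path##*/}"
    grep -Fq -- "$name" "$dockerfile" || echo "$name"
  done
}

# Usage:
#   uncopied_config_files web web/Dockerfile
```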

A9. Healthcheck canonicalization (addresses F12)

  • A9-1. Replace localhost with 127.0.0.1 in every docker-compose*.yml healthcheck test: block. Sweep with:
    rg -l 'http://localhost' --glob 'docker-compose*.yml'
    
  • A9-2. Standardize healthcheck shape:
    • Alpine-based images:
      healthcheck:
        test: ["CMD-SHELL", "wget -q --spider http://127.0.0.1:${PORT}/health || exit 1"]
        interval: 30s
        timeout: 5s
        retries: 3
        start_period: 10s
      
    • Slim/Debian images (wget not always present, but node is):
      healthcheck:
        test: ["CMD-SHELL", "node -e \"fetch('http://127.0.0.1:${PORT}/health').then(r=>process.exit(r.ok?0:1)).catch(()=>process.exit(1))\""]
      
  • A9-3. Add start_period (10s minimum) — prevents flaky "container started but app not yet listening" false-negatives.
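The A9-1 replacement can be scripted once the sweep has listed the affected files. A minimal sketch, assuming each healthcheck keeps its URL on the same line as `test:`; the function name `fix_healthcheck_host` is hypothetical, and multi-line `test:` lists would need manual edits.

```shell
#!/usr/bin/env bash
# Sketch: rewrite localhost to 127.0.0.1, but only on lines that carry a
# healthcheck test:, so service URLs elsewhere in the file are left alone.
fix_healthcheck_host() {
  local file="$1" tmp
  tmp="$(mktemp)"
  sed -E '/test:/ s#(https?://)localhost([:/])#\1127.0.0.1\2#g' "$file" > "$tmp"
  cat "$tmp" > "$file"     # write back in place, keeping the file's permissions
  rm -f "$tmp"
}

# Usage:
#   fix_healthcheck_host docker-compose.yml
```

Running it a second time changes nothing, since no `localhost` remains on the `test:` lines after the first pass.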

4. Phase B — Hermetic-fallback polish (docker-prep.sh)

docker-prep.sh is duplicated with minor variations across product repos. Promotion to canonical home is now in Phase B, not Phase D — drift compounds linearly with time and the .npmrc template precedent proves the pattern is cheap.

  • B1. --dry-run flag (common-plat@a418a23e).

  • B2. Idempotency guard via *.bak detection + --force override (common-plat@a418a23e).

  • B3. .docker-deps/ and *.bak in .gitignore on both pilots (clock + peakpulse). Verified by docker-doctor.sh.

  • B4. Pre-commit hook landed. Canonical guard script check-docker-prep-staged.sh (common-plat@c908c6d7) blocks rewritten package.json, staged .tgz tarballs, and .bak files. Wired into both pilot .husky/pre-commit (clock@4f8086bfa, peakpulse@c3195c8). Verified with simulated staged tarballs → commit blocked.

    Original spec:

    # .husky/pre-commit
    if git diff --cached --name-only | xargs grep -l '"file:\.\./\.docker-deps/' 2>/dev/null; then
      echo "ERROR: rewritten package.json detected. Run scripts/docker-prep.sh --restore first."
      exit 1
    fi
    if git diff --cached --name-only | grep -qE '(\.docker-deps/.*\.tgz|package\.json\.bak)$'; then
      echo "ERROR: docker-prep.sh artifacts staged. Run --restore first."
      exit 1
    fi
    
  • B5. Auto-restore on script error via trap cleanup_on_error EXIT + --keep opt-out (common-plat@a418a23e).

  • B6. Standardized header + usage block per § 7.4 template (common-plat@a418a23e).

  • B7. CANONICAL HOME landed.

    • B7-1. Canonical at learning_ai_common_plat/scripts/docker-prep.template.sh + 2 helpers _docker-prep-inject.js, _docker-prep-strip.js (common-plat@a418a23e).
    • B7-2. learning_ai_common_plat/scripts/sync-docker-prep.sh syncs all 3 files (mirrors sync-npmrc.sh).
    • B7-3. learning_ai_common_plat/scripts/check-docker-prep-drift.sh for CI (mirrors check-npmrc-drift.sh).
    • B7-4. AGENTS.md "NEVER edit docker-prep.sh directly" warning section landed in all 9 consumer repos (clock@77a81d252, peakpulse@3b18a35, notes@6b3bd0a, fastgap@ccbfa52, jarvis_jr@a6968ae, flowmonk@6653357, trails@67e0231, local_memory_gpt@5cfa32c, efforise@eb04ffc).
  • B8. --strip-overrides option removes pnpm.overrides block as a safety net (common-plat@a418a23e).

  • B+. --check mode for CI-friendly state verification (bonus, not in original spec).

  • B+. Portable sed -i (BSD on macOS, GNU on Linux).

  • B+. Preserve .docker-deps/.gitkeep on clear (fixes earlier regression where --restore deleted the tracked file).
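The B5 auto-restore behaviour can be illustrated with a small trap pattern. This is a sketch of the general technique, not the code in docker-prep.template.sh: the function name `prep_with_auto_restore` is hypothetical, a `KEEP=1` environment variable stands in for the `--keep` flag, and the command passed in stands in for the real rewrite steps.

```shell
#!/usr/bin/env bash
# Sketch: back up a manifest, run a rewrite command, and restore the backup
# if the command fails. The function body is a subshell so the EXIT trap
# fires when the function ends, not when the calling script ends.
prep_with_auto_restore() (
  manifest="$1"; shift
  cp "$manifest" "$manifest.bak"
  cleanup_on_error() {
    status=$?
    if [ "$status" -ne 0 ] && [ "${KEEP:-0}" != "1" ]; then
      mv "$manifest.bak" "$manifest"      # put the original manifest back
      echo "restored $manifest after failure (exit $status)" >&2
    fi
    exit "$status"
  }
  trap cleanup_on_error EXIT
  "$@"                                    # the rewrite command
)

# Usage (my-rewrite-step is a placeholder command):
#   prep_with_auto_restore backend/package.json my-rewrite-step
```

On success the `.bak` file is left in place so a later restore step can still undo the rewrite.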


5. Phase C — Verification gates

Pilot exit criteria (must all pass before Phase D):

  • C1. Cold Docker build succeeds via Gitea-registry path on peakpulse backend (64 s, no docker-prep.sh invocation).
  • C2. Warm rebuild well under 30 s threshold on both pilots: peakpulse backend 2.6 s, clock backend 3.3 s.
  • C3. docker-prep.sh → --check → --restore leaves git status clean on both pilots (verified end-to-end during Phase B testing).
  • C4. Pre-commit hook blocks staged tarballs + .bak files (verified by simulating staged artifacts on clock).
  • [~] C5. Gitea Actions CI green — partially validated. Workflow YAML is well-formed in both pilots (clock@4f8086bfa, peakpulse@c3195c8); local simulation of the docker-lint job (bash scripts/gitea/doctor.sh --quiet && bash scripts/docker-doctor.sh --quiet) exits 0 on both pilots. Gap: the pilot repos are not currently hosted on Gitea (http://localhost:3300/learning_ai_user/ has only learning_ai_uxui_web), so the workflow file ships but the runner never fires. A dummy git push gitea returns 404. C5 will fully close when the pilot repos are mirrored to Gitea (see learning_ai_common_plat/docs/runbooks/GITEA_VM_SETUP.md for the hosting setup).
  • C6. Build-time metrics already populated in § 3.A7 from earlier Phase A work.
  • C7. ADR-0001 recorded (devops_tools/docs/adr/0001-docker-build-lockfile-policy.md).
  • C8. docker-doctor.sh PASS on both pilots (only the 1 expected pnpm-lock.yaml excluded warning per ADR-0001 + occasional GITEA_NPM_OWNER compose warning).
  • C9. Web smoke test landed as Playwright spec web/e2e/css-bundle-smoke.spec.ts (clock@b8440bfea). Asserts title sanity + largest CSS bundle > 20 KB. Catches F11 regression at PR time.
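The size assertion in C9 can also be run against build output without a browser. The sketch below is not the Playwright spec; it assumes the CSS files sit somewhere under a directory you pass in (for a Next.js build that is typically under `.next/static`), and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: find the largest *.css file under a directory and fail if it is
# not above the 20 KB threshold used by the C9 smoke test.
largest_css_bytes() {
  # wc -c prints "<bytes> <file>" per file plus a "total" line for batches.
  find "$1" -type f -name '*.css' -exec wc -c {} + |
    awk '$2 != "total" && $1 > max { max = $1 } END { print max + 0 }'
}

assert_css_bundle_size() {
  local bytes
  bytes="$(largest_css_bytes "$1")"
  if [ "$bytes" -le 20480 ]; then
    echo "FAIL: largest CSS bundle is $bytes bytes (expected > 20 KB)" >&2
    return 1
  fi
  echo "OK: largest CSS bundle is $bytes bytes"
}

# Usage:
#   assert_css_bundle_size web/.next/static
```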

6. Phase D — Ecosystem rollout

Status: DONE for all 12 consumer repos. D.1 artifacts + D.2 Dockerfile/compose fixes + D.3 advisory-warning cleanup + B7-4 AGENTS.md notes. docker-doctor exits PASS in every repo. Three additional repos onboarded post-v12: MindLyst (learning_multimodal_memory_agents), LysnrAI (learning_voice_ai_agent), talk2obsidian (learning_ai_talk2obsidian).

D.1 — Tooling rollout (DONE)

All 9 consumer repos received the canonical infrastructure via sync-docker-prep.sh:

  • scripts/docker-prep.sh + _docker-prep-inject.js + _docker-prep-strip.js (canonical sync)
  • scripts/docker-doctor.sh (thin wrapper to canonical linter)
  • Makefile with make doctor target

| Repo | Commit | Findings (docker-doctor warn-only) |
|---|---|---|
| learning_ai_notes | 216ebb8 | 6 warnings + errors: F12 localhost, F14 ARG missing (×2), A5-2 wildcard (×2), F11/F13 web glob, A2 syntax directive |
| learning_ai_fastgap | 36b67a2 | 4: F4/F14 .npmrc.docker hardcoded, F14 ARG missing, A5-2 wildcard, A2 syntax |
| learning_ai_jarvis_jr | 523dc08 | 5: F14 ARG missing (×2), A5-2 wildcard (×2), F11/F13 web glob, A2 syntax (×2) |
| learning_ai_flowmonk | 65628f3 | 4: F14 ARG missing (×2), A5-2 wildcard (×2), F11/F13 web glob, A2 syntax |
| learning_ai_trails | 8aef82c | 6: F12 localhost, F14 ARG missing (×2), A5-2 wildcard (×2), A2 syntax (×2) |
| learning_ai_local_memory_gpt | d17689a | 5: F14 ARG missing (×2), A5-2 wildcard (×2), F11/F13 web glob, A2 syntax (×2) |
| learning_ai_efforise | b9fbbc3 | 5: F12 localhost, F14 ARG missing (×2), A5-2 wildcard (×2), A2 syntax (×2) |
| learning_multimodal_memory_agents (MindLyst) | 84a5d10 | full playbook applied (mindlyst-native/web/Dockerfile + backend/Dockerfile) |
| learning_voice_ai_agent (LysnrAI) | 0f1fa64 | full playbook applied (backend + user-dashboard-web + backend-python; Python Dockerfile correctly skips Node checks) |
| learning_ai_auth_app | n/a | iOS/Android, no Docker surfaces |
| learning_ai_talk2obsidian | 793089e | lighter rollout: single-stage Dockerfile, no .docker-deps/ pattern; docker-doctor + Makefile + AGENTS.md note + syntax directive + .gitignore rules |

D.2 — Per-repo Dockerfile/compose fixes (DONE)

All 7 consumer repos received mechanical Phase D.2 fixes via an idempotent fixer script. Each repo's docker-doctor.sh now exits PASS (warnings only).

| Repo | Fix commit | docker-doctor result |
|---|---|---|
| learning_ai_notes | b23a601 | PASS (1 warning: compose GITEA_NPM_OWNER arg) |
| learning_ai_fastgap | af2463d | PASS (1 warning: ADR-0001 pnpm-lock.yaml) |
| learning_ai_jarvis_jr | 1a97a3f | PASS (1 warning: ADR-0001 pnpm-lock.yaml) |
| learning_ai_flowmonk | 412a657 | PASS (1 warning: compose GITEA_NPM_OWNER arg) |
| learning_ai_trails | 733477a | PASS (1 warning: compose GITEA_NPM_OWNER arg) |
| learning_ai_local_memory_gpt | 8c68595 | PASS (1 warning: compose GITEA_NPM_OWNER arg) |
| learning_ai_efforise | 06ea0d0 | PASS (1 warning: healthcheck start_period) |

Applied fixes (each fix is idempotent):

| Finding | Fix |
|---|---|
| F12 healthcheck localhost | Replaced with 127.0.0.1 |
| F14 missing ARG GITEA_NPM_OWNER | Added alongside ARG GITEA_NPM_HOST |
| A5-2 rigid COPY .docker-deps/ | Changed to wildcard COPY .docker-deps* ... |
| F11/F13 enumerated web config COPY | Replaced with glob COPY web/*.json web/*.ts web/*.mjs ./ |
| A2 missing syntax directive | Added # syntax=docker/dockerfile:1.7 |
| F4/F14 hardcoded .npmrc.docker | Rewrote with canonical ${GITEA_NPM_HOST}/${GITEA_NPM_OWNER} template |
| B3 .gitignore missing *.bak | Added rule |
| B3 missing .docker-deps/.gitkeep | Created |
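"Idempotent" here means each fix checks for its own result before changing anything, so the fixer can be re-run safely. A sketch of one such fix (the A2 syntax directive), written as an assumption about the approach and not a copy of the actual fixer script; the function name `ensure_syntax_directive` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: add the BuildKit syntax directive to a Dockerfile only if the first
# line does not already carry one. Parser directives only take effect at the
# very top of the file, so only line 1 is inspected.
ensure_syntax_directive() {
  local file="$1" tmp
  if head -n 1 "$file" | grep -Eq '^#[[:space:]]*syntax='; then
    return 0                 # already present: nothing to do
  fi
  tmp="$(mktemp)"
  { echo '# syntax=docker/dockerfile:1.7'; cat "$file"; } > "$tmp"
  cat "$tmp" > "$file"       # write back in place, keeping permissions
  rm -f "$tmp"
}

# Usage:
#   ensure_syntax_directive backend/Dockerfile
```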

D.3 — Advisory-warning cleanup (DONE)

Mechanical follow-up pass via /tmp/fix-compose-warnings.sh + /tmp/add-build-args.py (commits below) eliminated most advisory warnings across 10 repos:

| Repo | Cleanup commit |
|---|---|
| learning_ai_clock | 3de867a80 |
| learning_ai_notes | 5687e5a |
| learning_ai_fastgap | 94a81ac |
| learning_ai_jarvis_jr | ed1cb88 |
| learning_ai_flowmonk | 938717f |
| learning_ai_trails | 8837216 |
| learning_ai_local_memory_gpt | 0a486ac |
| learning_ai_efforise | ff517f4 |
| learning_multimodal_memory_agents | 7304ca1 |
| learning_voice_ai_agent | 13291b9 |

Each repo got:

  • docker-compose.yml: full build.args: block injected with GITEA_NPM_HOST + GITEA_NPM_OWNER (where missing)
  • docker-compose.yml: start_period: 30s added to healthcheck blocks (where missing) to prevent false cold-start failures

D.4 — Final status

All 12 consumer repos now report docker-doctor: PASS with zero errors and at most a handful of expected advisory warnings (pnpm-lock.yaml excluded per ADR-0001; talk2obsidian's short-form build: . which would need yaml conversion to declare args).


7. Reference snippets

7.1 Canonical .npmrc.docker

Matches the host-side .npmrc template shipped in common-plat 610a59fd.

@bytelyst:registry=http://${GITEA_NPM_HOST}:3300/api/packages/${GITEA_NPM_OWNER:-learning_ai_user}/npm/
//${GITEA_NPM_HOST}:3300/api/packages/${GITEA_NPM_OWNER:-learning_ai_user}/npm/:_authToken=${GITEA_NPM_TOKEN}
strict-ssl=false
auto-install-peers=true

7.2 Canonical backend Dockerfile

# syntax=docker/dockerfile:1.7
ARG BASE_IMAGE=node:22-alpine
FROM ${BASE_IMAGE} AS builder
WORKDIR /app/backend

ARG GITEA_NPM_HOST=host.docker.internal
ARG GITEA_NPM_OWNER=learning_ai_user
ARG USE_TARBALLS=false
ENV NODE_TLS_REJECT_UNAUTHORIZED=0
ENV NPM_CONFIG_STRICT_SSL=false
ENV GITEA_NPM_HOST=$GITEA_NPM_HOST
ENV GITEA_NPM_OWNER=$GITEA_NPM_OWNER

RUN corepack enable && corepack prepare pnpm@10.6.5 --activate

# ── Deps layer (cacheable) ─────────────────────────────────────────
COPY .npmrc.docker ./.npmrc
COPY backend/package.json ./package.json
# Tolerate missing .docker-deps/ when in registry mode
RUN mkdir -p /app/.docker-deps
COPY .docker-deps* /app/.docker-deps/

RUN --mount=type=cache,id=pnpm,target=/root/.local/share/pnpm/store \
    --mount=type=secret,id=gitea_npm_token \
    export GITEA_NPM_TOKEN="$(cat /run/secrets/gitea_npm_token 2>/dev/null || echo '')" && \
    pnpm install --ignore-scripts --lockfile=false

# ── Source layer (changes most often) ──────────────────────────────
COPY backend/tsconfig.json ./tsconfig.json
COPY backend/src/ ./src/
COPY shared/ ../shared/
RUN pnpm run build

# ── Runtime ────────────────────────────────────────────────────────
FROM ${BASE_IMAGE}
WORKDIR /app/backend
ENV NODE_ENV=production
COPY --from=builder /app/backend/node_modules ./node_modules
COPY --from=builder /app/backend/package.json ./package.json
COPY --from=builder /app/backend/dist ./dist
COPY shared/ ../shared/
EXPOSE 4010
CMD ["node", "dist/server.js"]

--lockfile=false is intentional pending the A3 ADR. Switch to --frozen-lockfile only once the sibling-workspace problem (F2) is resolved.

7.3 Canonical docker-compose.yml service block

services:
  backend:
    build:
      context: .
      dockerfile: backend/Dockerfile
      args:
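        GITEA_NPM_HOST: host.docker.internal
        # F14: pass the owner through so docker-doctor does not warn
        GITEA_NPM_OWNER: ${GITEA_NPM_OWNER:-learning_ai_user}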
      secrets:
        - gitea_npm_token
    extra_hosts:
      - "host.docker.internal:host-gateway"
    ports:
      - "4010:4010"
    environment:
      - NODE_ENV=production
      - PORT=4010
      # ...
    restart: unless-stopped
    healthcheck:
      # F12: use 127.0.0.1 NOT localhost (IPv6 resolution false-fails)
      test: ["CMD-SHELL", "wget -q --spider http://127.0.0.1:4010/health || exit 1"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s

secrets:
  gitea_npm_token:
    environment: GITEA_NPM_TOKEN

7.4 Hardened docker-prep.sh header

#!/usr/bin/env bash
# Hermetic Docker-build helper. Packs @bytelyst/* tarballs from the sibling
# common-plat repo when the Gitea npm registry is unreachable.
#
# Use this ONLY when:
#   - Local Gitea registry (:3300) is down or unreachable, OR
#   - You need a Docker build that includes uncommitted common-plat changes.
#
# For normal builds (Gitea up + clean common-plat), use:
#   docker compose build
#
# Usage:
#   ./scripts/docker-prep.sh             # pack tarballs + rewrite package.json
#   ./scripts/docker-prep.sh --dry-run   # show what would change (no side effects)
#   ./scripts/docker-prep.sh --force     # override idempotency guard
#   ./scripts/docker-prep.sh --restore   # undo rewrite
#   ./scripts/docker-prep.sh --keep      # skip auto-restore on error
#   ./scripts/docker-prep.sh --strip-overrides  # remove pnpm.overrides block
#
# Side effects:
#   - Creates .docker-deps/ (gitignored)
#   - Backs up package.json → package.json.bak
#   - Rewrites @bytelyst/* deps to file:../.docker-deps/<tarball>
#   - Injects pnpm.overrides for transitive @bytelyst/* deps
#
# Safety:
#   - Refuses to run if .bak files already exist (unless --force)
#   - Auto-restores on error (trap EXIT) unless --keep passed
#   - Pre-commit hook blocks committing rewritten package.json, .tgz, .bak
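
The safety rules in this header could be implemented roughly as follows. This is a minimal sketch, not the shipped docker-prep.sh: the function names `guard_no_backups` and `restore_backups` are hypothetical, and it only looks for `package.json.bak` directly under the directories it is given.

```bash
#!/usr/bin/env bash
# Sketch of the idempotency guard and restore step described in the header.
# guard_no_backups and restore_backups are hypothetical names (assumption).
set -euo pipefail

# Refuse to continue if a previous run left package.json.bak behind.
# Usage: guard_no_backups <force:0|1> <dir>...
guard_no_backups() {
  local force="$1"; shift
  local dir
  for dir in "$@"; do
    if [[ -f "$dir/package.json.bak" && "$force" != "1" ]]; then
      echo "refusing to run: $dir/package.json.bak exists (use --force)" >&2
      return 1
    fi
  done
}

# Put every backed-up package.json back in place.
# Usage: restore_backups <dir>...
restore_backups() {
  local dir
  for dir in "$@"; do
    if [[ -f "$dir/package.json.bak" ]]; then
      mv -f "$dir/package.json.bak" "$dir/package.json"
    fi
  done
}

# One possible wiring for "auto-restore on error" (assumption, not the real script):
#   trap 'rc=$?; [[ $rc -eq 0 ]] || restore_backups backend web' EXIT
```

Keeping the guard and the restore as separate functions means `--restore` can call `restore_backups` directly, while `--keep` only needs to skip registering the trap.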

7.5 Canonical Next.js web Dockerfile (addresses F11, F13)

# syntax=docker/dockerfile:1.7
ARG BASE_IMAGE=node:22-alpine
FROM ${BASE_IMAGE} AS deps
WORKDIR /app/web

ARG GITEA_NPM_HOST=host.docker.internal
ARG GITEA_NPM_OWNER=learning_ai_user
ENV NODE_TLS_REJECT_UNAUTHORIZED=0
ENV NPM_CONFIG_STRICT_SSL=false
ENV GITEA_NPM_HOST=$GITEA_NPM_HOST
ENV GITEA_NPM_OWNER=$GITEA_NPM_OWNER

RUN corepack enable && corepack prepare pnpm@10.6.5 --activate

COPY .npmrc.docker ./.npmrc
COPY web/package.json ./package.json
RUN mkdir -p /app/.docker-deps
COPY .docker-deps* /app/.docker-deps/

RUN --mount=type=cache,id=pnpm,target=/root/.local/share/pnpm/store \
    --mount=type=secret,id=gitea_npm_token \
    export GITEA_NPM_TOKEN="$(cat /run/secrets/gitea_npm_token 2>/dev/null || echo '')" && \
    pnpm install --ignore-scripts --lockfile=false

# ── Builder ────────────────────────────────────────────────────────
FROM ${BASE_IMAGE} AS builder
WORKDIR /app/web
COPY --from=deps /app/web/node_modules ./node_modules
COPY --from=deps /app/web/package.json ./package.json

# F11/F13 fix: glob ALL root-level config files instead of enumerating.
# Picks up postcss.config.*, tailwind.config.*, next.config.*, tsconfig*,
# any future *.config.* additions without Dockerfile changes.
COPY web/*.json web/*.ts web/*.mjs web/*.js web/*.cjs ./
COPY web/public/ ./public/
COPY web/src/ ./src/
COPY shared/ ../shared/

ARG NEXT_PUBLIC_BACKEND_URL
ARG NEXT_PUBLIC_PLATFORM_SERVICE_URL
ENV NEXT_PUBLIC_BACKEND_URL=$NEXT_PUBLIC_BACKEND_URL
ENV NEXT_PUBLIC_PLATFORM_SERVICE_URL=$NEXT_PUBLIC_PLATFORM_SERVICE_URL
ENV NEXT_TELEMETRY_DISABLED=1

RUN corepack enable && pnpm run build

# ── Runtime (Next.js standalone) ───────────────────────────────────
FROM ${BASE_IMAGE} AS runner
WORKDIR /app/web
ENV NODE_ENV=production
ENV NEXT_TELEMETRY_DISABLED=1

COPY --from=builder /app/web/.next/standalone ./
# Next 16 standalone server runs as `node web/server.js` from /app/web,
# so static assets live at /app/web/web/.next/static (NOT ./.next/static).
COPY --from=builder /app/web/.next/static ./web/.next/static
COPY --from=builder /app/web/public ./web/public

EXPOSE 3000
ENV PORT=3000
ENV HOSTNAME=0.0.0.0
CMD ["node", "web/server.js"]

Verification step after every web Dockerfile change: smoke-test the built image by running it and curling the rendered HTML. Confirm the CSS bundle in <link> references is > 50 KB. A bundle of ~33 KB is the F11 signature (only @font-face, no Tailwind utilities).
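
The size check in that verification step can be scripted. The sketch below is illustrative only: `css_bundle_ok` is a hypothetical helper, 51200 bytes is one reading of the "> 50 KB" rule, and the commented curl wiring assumes the container is already listening on port 3000 and that the CSS path in the HTML starts with `/_next/static/`.

```bash
#!/usr/bin/env bash
# Sketch of the F11 smoke check: fail when the CSS bundle is suspiciously small.
# css_bundle_ok is a hypothetical helper name (assumption).
set -euo pipefail

# Usage: css_bundle_ok <css-file> [min-bytes]
css_bundle_ok() {
  local file="$1" min_bytes="${2:-51200}" size
  # wc -c pads its output on some platforms, so strip whitespace.
  size="$(wc -c < "$file" | tr -d '[:space:]')"
  if (( size > min_bytes )); then
    echo "ok: CSS bundle is ${size} bytes"
  else
    echo "F11 suspect: CSS bundle is only ${size} bytes (need > ${min_bytes})" >&2
    return 1
  fi
}

# Example wiring (assumptions: port 3000, CSS served under /_next/static/):
#   css_path="$(curl -fsS http://127.0.0.1:3000/ | grep -oE '/_next/static/[^"]+\.css' | head -n 1)"
#   curl -fsS "http://127.0.0.1:3000${css_path}" -o /tmp/bundle.css
#   css_bundle_ok /tmp/bundle.css
```

A ~33 KB bundle fails this check, which is the F11 signature the paragraph describes.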

7.6 docker-doctor.sh skeleton (Phase E)

#!/usr/bin/env bash
# docker-doctor.sh — pre-flight Dockerfile + docker-compose health checks.
# Run on PRs touching Dockerfile, docker-compose*.yml, .dockerignore.
set -euo pipefail

REPO_DIR="$(cd "$(dirname "$0")/.." && pwd)"
FAILED=0

# Check 1 (A8/F11/F13): every config file in web/ is COPY'd in web/Dockerfile
for cfg in postcss.config tailwind.config next.config; do
  for f in "$REPO_DIR"/web/${cfg}.{js,mjs,cjs,ts}; do
    [[ -f "$f" ]] || continue
    base=$(basename "$f")
    if ! grep -q "COPY web/${base}\\|COPY web/\\*" "$REPO_DIR/web/Dockerfile" 2>/dev/null; then
      echo "✗ F11/F13: $base exists but not COPY'd in web/Dockerfile"
      FAILED=1
    fi
  done
done

# Check 2 (A9/F12): healthchecks use 127.0.0.1
if grep -rE 'test:.*http://localhost' "$REPO_DIR"/docker-compose*.yml 2>/dev/null; then
  echo "✗ F12: healthcheck uses localhost (should be 127.0.0.1)"
  FAILED=1
fi

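# Check 3 (F4/F14): .npmrc.docker uses both canonical placeholders
if [[ -f "$REPO_DIR/.npmrc.docker" ]]; then
  for var in GITEA_NPM_HOST GITEA_NPM_OWNER; do
    # Fixed-string match on the prefix so ${VAR} and ${VAR:-default} both pass
    if ! grep -qF "\${${var}" "$REPO_DIR/.npmrc.docker"; then
      echo "✗ F4/F14: .npmrc.docker doesn't use \${${var}} placeholder"
      FAILED=1
    fi
  done
fi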

# Check 4: .dockerignore doesn't exclude pnpm-lock.yaml
if grep -q '^pnpm-lock\.yaml$' "$REPO_DIR/.dockerignore" 2>/dev/null; then
  echo "⚠ F1: .dockerignore excludes pnpm-lock.yaml (blocks lockfile optimization)"
fi

# Check 5: base image is on approved list
for df in "$REPO_DIR"/{backend,web}/Dockerfile; do
  [[ -f "$df" ]] || continue
  if ! grep -qE 'FROM (\$\{BASE_IMAGE\}|node:22-(alpine|slim))' "$df"; then
    echo "✗ Unapproved base image in $df"
    FAILED=1
  fi
done

exit $FAILED

8. Phase E — Observability / lint (NEW)

Two complementary linters:

  1. gitea-doctor — Gitea registry pre-flight (env + token + connectivity). Already shipped in common-plat commit 610a59fd at scripts/gitea/doctor.sh. This roadmap only wires it into CI/build flows (A0-D + E0 below).
  2. docker-doctor — Dockerfile + compose-file static linter (see § 7.6 skeleton). To be built as part of this roadmap.

The two are intentionally separate concerns:

| Linter | Scope | When to run |
| --- | --- | --- |
| gitea-doctor | runtime env, token, registry HTTP 200 | Before every build / deploy |
| docker-doctor | static analysis of Dockerfile + compose YAML | On every PR touching those files |

Phase E checklist

  • E0. Wire bash scripts/gitea/doctor.sh --quiet into every Gitea Actions CI workflow as a pre-build job (addresses F15). Pattern shipped in common-plat; replicate via a reusable actions/gitea-preflight@main composite if Gitea Actions supports it, otherwise inline.
  • E1. Canonical docker-doctor.sh landed in learning_ai_common_plat/scripts/docker-doctor.sh (common-plat@130883a7). 15 checks codified from F1–F18; verified PASS on both pilots and FAIL on un-migrated control (learning_ai_notes).
  • E2. Per-repo wrappers landed: clock@aa5202fe7, peakpulse@af207b7.
  • E3. Wire into CI: run on PRs touching Dockerfile, docker-compose*.yml, .dockerignore, .npmrc.docker
  • E4. Wire into pre-commit hook (warning-only at first, error after 2 weeks)
  • E5. Checks documented in learning_ai_common_plat/AI.dev/SKILLS/docker-doctor.md (common-plat@130883a7).
  • E6. Add make doctor target to each pilot repo that runs both gitea-doctor AND docker-doctor

Checks implemented by docker-doctor.sh:

| Check | Addresses | Action |
| --- | --- | --- |
| Every `web/*.config.*` file is COPY'd | F11, F13 | Error |
| docker-compose.yml healthcheck uses 127.0.0.1 | F12 | Error |
| .npmrc.docker uses `${GITEA_NPM_HOST}` AND `${GITEA_NPM_OWNER}` placeholders | F4, F14 | Error |
| Dockerfile declares `ARG GITEA_NPM_OWNER` if it COPYs .npmrc.docker | F14 | Error |
| .dockerignore doesn't exclude pnpm-lock.yaml | F1 | Warn (until A3 ADR lands) |
| Base image is on approved list (node:22-alpine or node:22-slim via BASE_IMAGE ARG) | Canonical decision | Error |
| `.docker-deps/` and `*.bak` in .gitignore | B3 | Error |
| docker-compose.yml passes GITEA_NPM_OWNER build arg | F14 | Warn |
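
Two of the listed checks (the `ARG GITEA_NPM_OWNER` declaration and the .gitignore entries) are not in the § 7.6 skeleton. A minimal sketch of how they might look follows. The function names are hypothetical, the patterns are assumptions about how each repo writes these lines, and the shipped docker-doctor.sh may differ.

```bash
#!/usr/bin/env bash
# Sketch of two docker-doctor checks: F14 (ARG declaration) and B3 (.gitignore).
# check_owner_arg and check_gitignore are hypothetical names (assumption).
set -euo pipefail

# F14: a Dockerfile that COPYs .npmrc.docker must declare ARG GITEA_NPM_OWNER.
# Dockerfiles that never COPY .npmrc.docker pass trivially.
check_owner_arg() {
  local dockerfile="$1"
  grep -qE '^COPY[[:space:]].*\.npmrc\.docker' "$dockerfile" || return 0
  grep -qE '^ARG[[:space:]]+GITEA_NPM_OWNER' "$dockerfile"
}

# B3: .gitignore must cover .docker-deps (with or without a trailing path)
# and contain a literal *.bak line.
check_gitignore() {
  local gitignore="$1"
  grep -qE '^\.docker-deps(/|$)' "$gitignore" && grep -qxF '*.bak' "$gitignore"
}
```

Each function returns non-zero on a finding, so a caller can wrap it the same way the skeleton does: `check_owner_arg "$df" || { echo "✗ F14: ..."; FAILED=1; }`.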

9. Open questions (numbered TODOs, not blockers)

  1. Shared pnpm cache volume? BuildKit caches are already shared across builds by id=pnpm. Test whether a named Docker volume adds anything before adding complexity.
  2. Custom base image? Publish bytelyst/node-pnpm:22{alpine,slim} with pnpm pre-installed to skip corepack. Cost: image maintenance; benefit: ~5 s/build.
  3. CI hostname? Verify host.docker.internal:host-gateway works in Gitea Actions Linux runners, or if a CI-specific Dockerfile variant is needed.
  4. Multi-platform builds? linux/amd64 + linux/arm64 interact awkwardly with cache mounts under buildx. Defer to separate roadmap.
  5. Workspace flattening? Eliminate the ../learning_ai_common_plat/packages/* workspace entry inside Docker via a flattened pnpm-workspace.yaml. Unlocks --frozen-lockfile. Requires lockfile regeneration step.

10. Execution order

  1. v5 commit: roadmap doc v5 lands; F16 documented (devops_tools@ba8b4d1).
  2. Phase A0 on learning_ai_clock — Dockerfile + compose changes landed in clock@0be887288. Initial A0-V blocked on F16/F17/F18.
  3. F16 fix in common-plat — workspace:* rewriter + defense-in-depth guard + republish of 10 affected packages (common-plat@cfcfc7bb).
  4. F17 fix in common-plat + Gitea config — ROOT_URL=host.docker.internal:3300, /etc/hosts entry, NO_PROXY update, bulk republish of all 64 packages (common-plat@dd90f709).
  5. F18 fix in clock — 4 file: refs in web/package.json rewritten to * (clock@8b5c767a3).
  6. A0-V on clock PASSED. v6 commit lands (devops_tools@7627d55).
  7. A8 + A9 + A1 on clock (correctness + corepack) — clock@f6a806ff3. Web cold dropped to 174 s; backend essentially flat at 60 s. F11 guard verified (Tailwind utilities present in CSS bundle).
  8. A2 + A4 + A5 + A6 on clock (cache mount + dockerignore) — clock@55e8d22d3. Warm rebuilds: backend 2.9 s, web 5.4 s (95–97% reduction). A7 metrics table populated this commit.
  9. Phase A0 → A6 on learning_ai_peakpulse backend (peakpulse@11a6bc5). Cold 72.2 s, warm 2.7 s. Pattern from clock applied verbatim, plus a side fix for .docker-deps/.gitkeep discoverability that was also ported back to clock.
  10. A3 ADR: docs/adr/0001-docker-build-lockfile-policy.md. Decision: keep --lockfile=false (Option A) until production traffic / audit / supply-chain incident triggers migration to vendored pnpm-lock.docker.yaml (Option C). Implementation deferred.
  11. Phase E1/E2/E5: docker-doctor.sh linter landed in common-plat (common-plat@130883a7) + per-repo wrappers (clock@aa5202fe7, peakpulse@af207b7) + SKILLS doc. Verified PASS on both pilots, FAIL with 6 specific findings on un-migrated control (learning_ai_notes).
  12. Phase B: docker-prep.sh hardened + promoted to canonical home in common-plat (common-plat@a418a23e). Synced to both pilots (clock@27034d90f, peakpulse@563a45e). Verified end-to-end on both pilots: dry-run → pack → check (fail) → idempotency guard → restore → git status clean.
  13. Phase B4 + E3/E4/E6 — pre-commit guard (common-plat@c908c6d7) + .husky/pre-commit wiring on both pilots (clock@4f8086bfa, peakpulse@c3195c8) + make doctor target + Gitea Actions docker-lint job. Verified guard blocks simulated staged tarballs.
  14. Phase C — 8/9 gates pass; C5 partially validated (workflow YAML well-formed; local docker-lint simulation exits 0; pilots not yet Gitea-hosted so runner does not fire). Cold build 64 s, warm 2.6 s / 3.3 s.
  15. Phase D.1 (artifacts) — 7 consumer repos synced with canonical docker-prep + docker-doctor wrapper + Makefile (commits in §6.D.1).
  16. Phase D.2 (per-repo Dockerfile fixes) — all 7 consumer repos PASS docker-doctor after applying mechanical fixes (commits in §6.D.2). Web smoke test (C9) landed on clock to guard F11 regression.
  17. B7-4 AGENTS.md "do not edit" warnings — landed in all 12 consumer repos.
  18. Phase D extension — MindLyst (84a5d10), LysnrAI (0f1fa64), talk2obsidian (793089e) brought into the consumer list. sync-docker-prep.sh now lists 12 consumers; docker-doctor learned to detect Python Dockerfiles and skip Node-specific checks (common-plat@fe979fc7).
  19. Phase D.3 advisory-warning cleanup — 10 repos received mechanical build.args injection + healthcheck.start_period additions. All 12 repos now docker-doctor: PASS with zero errors.
  20. C5 partial validation (this session) — dummy commit pushed to clock (682f9629b/2f9c8c39a), confirmed git push gitea returns 404 (pilot repos not hosted on Gitea — only learning_ai_uxui_web exists there). Workflow YAML validates; local docker-lint simulation exits 0. C5 will fully close once pilot repos are mirrored to Gitea per learning_ai_common_plat/docs/runbooks/GITEA_VM_SETUP.md.

11. Risk register

| Risk | Mitigation |
| --- | --- |
| Removing pnpm-lock.yaml from .dockerignore exposes a stale or sibling-aware lockfile that breaks Docker installs | Keep --lockfile=false for now (A3 ADR); revisit after F2 resolution |
| BuildKit cache mount on shared CI runners causes cross-build interference | Use distinct id= per repo (`id=pnpm-${repo}`) if observed |
| host.docker.internal doesn't resolve in Linux Docker | `extra_hosts: ["host.docker.internal:host-gateway"]` (A0-4) |
| Removing .docker-deps/ from default builds breaks repos that haven't done A0 yet | Wildcard `COPY .docker-deps*` keeps both paths working during migration |
| docker-prep.sh --force is misused and .bak files get committed | Pre-commit hook (B4) blocks .bak, .tgz, rewritten package.json |
| Corp network blocks host.docker.internal:3300 | Verify SSH tunnel reaches Gitea; document in operations.md |
| F11 regression: build green, app ships with no CSS | C9 smoke test + Phase E docker-doctor.sh check on `web/*.config.*` COPY coverage |
| F12 regression: healthcheck false-fails on IPv6 | Phase E docker-doctor.sh grep for localhost in compose files |
| F13 regression: new config file added, Dockerfile forgotten | A8-2 glob COPY pattern (root cause fix) + Phase E lint (defense in depth) |
| BASE_IMAGE override in notes diverges silently from canonical | Phase E check approved list; document override in repo AGENTS.md |
| F14 regression: future Gitea owner rename re-introduces literal in some Dockerfile | Phase E docker-doctor.sh checks .npmrc.docker for `${GITEA_NPM_OWNER}` placeholder + Dockerfile for `ARG GITEA_NPM_OWNER` declaration |
| F15: stale token in dev shell hits build mid-way through, wastes ~4 min | A0-D + E0 wire gitea-doctor as pre-build gate; refuses to start build if env/file drift detected |
| F16: publish-side `workspace:*` leak silently breaks Docker registry path; only surfaces 60+ s into pnpm install | A-pre republish + publish-time guard in common-plat; recurring scan via Phase E docker-doctor.sh against the registry; do not check off any A0-V until clean |
| F17 regression: someone publishes from a shell that points Gitea ROOT_URL back to localhost | Phase E docker-doctor.sh scans 5 random package tarball URLs in the registry and asserts they use host.docker.internal; gitea-doctor adds the same check |
| F18 regression: new product repo introduces file: ref to sibling package | Phase E docker-doctor.sh greps `**/package.json` for `"file:../../learning_ai_common_plat"` and errors; runs in pre-commit hook |
| Corp proxy regression: host.docker.internal falls out of NO_PROXY on a dev machine | switch-network.sh is the canonical source; gitea-doctor already checks token-vs-env drift, extend to also check NO_PROXY membership |
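
The NO_PROXY membership check proposed in the last row could look something like the sketch below. `no_proxy_contains` is a hypothetical helper, not something gitea-doctor ships today, and it assumes exact-match semantics on comma-separated entries (no wildcard or suffix matching).

```bash
#!/usr/bin/env bash
# Sketch of a NO_PROXY membership check for gitea-doctor.
# no_proxy_contains is a hypothetical helper name (assumption).
set -euo pipefail

# Usage: no_proxy_contains <host> [no_proxy-value]
# Succeeds when <host> appears as an exact comma-separated entry.
# Falls back to $NO_PROXY, then $no_proxy, when no list is passed.
no_proxy_contains() {
  local host="$1" list="${2:-${NO_PROXY:-${no_proxy:-}}}" entry
  local -a entries
  [[ -n "$list" ]] || return 1
  # read -a splits without glob expansion, so entries like *.corp stay literal.
  IFS=',' read -r -a entries <<< "$list"
  for entry in "${entries[@]}"; do
    # Trim leading and trailing whitespace before comparing.
    entry="${entry#"${entry%%[![:space:]]*}"}"
    entry="${entry%"${entry##*[![:space:]]}"}"
    [[ "$entry" == "$host" ]] && return 0
  done
  return 1
}
```

A pre-build gate could then call `no_proxy_contains host.docker.internal || echo "✗ host.docker.internal missing from NO_PROXY"` alongside the existing token-vs-env drift check.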