A prior rebase merged the v13/v13.1 edits into §10 with mangled text (steps 11–20 out of order; step 10 garbled). Rebuilt the section cleanly from v12 base + appended the new v13/v13.1 steps: 11. Phase E1/E2/E5 12. Phase B 13. Phase B4 + E3/E4/E6 14. Phase C (8/9; C5 partial) 15. Phase D.1 16. Phase D.2 17. B7-4 AGENTS.md warnings 18. Phase D extension (MindLyst, LysnrAI, talk2obsidian) 19. Phase D.3 advisory cleanup 20. C5 partial validation (this session) Restored the lost "ported back to clock" trailing line for step 9. No content changes beyond what was already documented in v13/v13.1.
Docker Build Optimization Roadmap
Status: Draft v13 (Phases A, B, C, D, E complete across all 12 consumer repos; docker-doctor PASS everywhere; only advisory warnings remain) · Owner: Platform DevOps · Created: 2026-05-27 · Revised: 2026-05-27
Pilot Docker-build correctness + speed fixes on `learning_ai_clock` (web + backend) and `learning_ai_peakpulse` (backend), then capture the playbook here for ecosystem-wide rollout.

Upstream prerequisite shipped (commit `610a59fd` in `learning_ai_common_plat`): Gitea owner parameterization + helper scripts (`scripts/gitea/doctor.sh`, `scripts/gitea/token.sh`). The `.npmrc` template now resolves owner from `${GITEA_NPM_OWNER:-learning_ai_user}`. All A0-1 work in this roadmap inherits this: Dockerfile/`.npmrc.docker` must use the same `${GITEA_NPM_OWNER}` placeholder, not a hardcoded literal.
0. Pre-flight audit findings (2026-05-27)
A read-only audit of pilot repos + lessons from recent live incidents + the A0-V execution iterations on clock surfaced 18 concrete bugs/gaps (F14–F15 added after the Gitea-hardening commit; F16–F18 added during the A0-V execution sweep on clock, 2026-05-27). The actual state of the ecosystem is closer to the inverse of the casual narrative: tarballs are the de facto default, the Gitea-registry path is partially wired, and there is a separate class of "build green, app broken" silent failures (F11–F13) that the speed-focused plan needs to address first.
| # | Finding | Location | Severity |
|---|---|---|---|
| F1 | `pnpm-lock.yaml` is in `.dockerignore`; any lockfile-based optimization is blocked until removed | `peakpulse/.dockerignore`, `clock/.dockerignore` | High |
| F2 | `pnpm-workspace.yaml` references sibling `../learning_ai_common_plat/packages/*`; `--frozen-lockfile` inside Docker will fail unless workspace is flattened or sibling tree is copied | both pilots | High |
| F3 | `peakpulse/.npmrc.docker` is tarball-only (no `@bytelyst:registry=…` line); the "Gitea-registry" path doesn't work in this repo today | `peakpulse/.npmrc.docker` | High |
| F4 | `clock/.npmrc.docker` hardcodes `http://localhost:3300`; from inside Docker, `localhost` is the container, not the host registry | `clock/.npmrc.docker` | High |
| F5 | `clock/backend/Dockerfile` has neither `ARG GITEA_NPM_HOST` nor a BuildKit secret mount; wholly dependent on pre-populated `.docker-deps/` | `clock/backend/Dockerfile` | High |
| F6 | `clock/web/Dockerfile` accepts `ARG GITEA_NPM_HOST` but never uses it; no `--mount=type=secret` either | `clock/web/Dockerfile` | Medium |
| F7 | `peakpulse/docker-compose.yml` does not pass `GITEA_NPM_HOST` build arg or declare `secrets:` block | `peakpulse/docker-compose.yml` | Medium |
| F8 | `COPY .docker-deps/` is unconditional in every backend Dockerfile; every build requires `docker-prep.sh` to have run OR an empty `.docker-deps/` dir to pre-exist | both repos | Medium |
| F9 | `npm install -g pnpm@10.6.5` runs on every build (no corepack): 5–10 s overhead, no pinning to `packageManager` field | all four Dockerfiles | Low |
| F10 | No BuildKit `--mount=type=cache` for pnpm store; cold install on every rebuild even when deps unchanged | all four Dockerfiles | High (main speed win) |
| F11 | Build-time config file missing from repo or not COPY'd in Dockerfile causes silent UI breakage. Symptom: `next build` succeeds, container is "healthy", but CSS bundle is ~33 KB (only `@font-face`) and all Tailwind classes are absent → UI renders unstyled. Two sub-bugs: (a) `postcss.config.mjs` missing entirely while `@tailwindcss/postcss` is in `package.json` (NoteLett, JarvisJr fixes `dff459e`, `36f6bc1`); (b) file exists but Dockerfile never COPYs it (Clock, LocalMemGPT fixes `a308c6444`, `07cdf6b`). | `*/web/Dockerfile`, `*/web/postcss.config.*` | High |
| F12 | Healthcheck uses `localhost`, resolves to IPv6 `::1`, false-fails. Backend listens on `0.0.0.0` (IPv4 only). `wget --spider http://localhost:.../health` hits `::1`, connection refused, container marked "unhealthy", web service won't start due to `depends_on: condition: service_healthy`. Incident: `learning_ai_jarvis_jr/docker-compose.yml`. | every `docker-compose*.yml` healthcheck | Medium |
| F13 | Enumerated `COPY web/foo ./foo` pattern drifts from filesystem. New config file added to repo but Dockerfile's enumerated COPY list isn't updated. Build succeeds silently with the file absent; behavior diverges from local dev. Root cause of F11(b). | every Dockerfile using enumerated COPY | Medium |
| F14 | Hardcoded Gitea owner (`learning_ai_user`) literally embedded in `.npmrc.docker` + CI workflows + publish scripts across 14 repos. When the org was renamed from `bytelyst` → `learning_ai_user`, every repo needed a manual commit. Resolved upstream in common-plat (`610a59fd`): owner now resolves from `${GITEA_NPM_OWNER:-learning_ai_user}`; `scripts/gitea/{doctor,token}.sh` ship as pre-flight/rotation helpers. Docker work in this roadmap MUST consume the env var, not the literal. | `.npmrc.docker`, Dockerfile ARG/ENV, CI workflows | Medium |
| F15 | Stale shell-env tokens. `~/.gitea_npm_token` rotated on disk; long-lived shells still exported the old value. Caused 401s during `docker compose build` until `source ~/.zshrc`. Mitigation shipped: `bash scripts/gitea/doctor.sh` detects env-vs-file drift and refuses to proceed. Action required in this roadmap: wire doctor as a pre-build CI gate. | dev workstation + CI runners | Low (now caught) |
| F16 | At least 10 published `@bytelyst/*` packages had unrewritten `workspace:*` refs in their `package.json` dependencies. Root cause: `publish-outdated-packages.sh` extracts a pnpm-packed tarball then re-packs with `npm pack` (workaround for a historical Gitea-compat issue with pnpm's tarball format), and `npm pack` doesn't recognize the pnpm-specific `workspace:` protocol, so it passes it through literally. Fixed in `common-plat@cfcfc7bb` (`fix(gitea): rewrite workspace:* in published tarballs (F16)`): inserted a `workspace:*` rewriter between extract and npm-repack + a defense-in-depth grep guard. Republished 10 affected packages. | common-plat publish flow + Gitea registry | Critical (FIXED) |
| F17 | Gitea bakes `localhost:3300` into the `dist.tarball` field of every published package's metadata. Inside Docker, `localhost` is the container itself, not the host, so even after a successful registry-metadata fetch via `host.docker.internal`, pnpm follows the tarball URL to `localhost:3300` and fails with `ECONNREFUSED`. Root cause: Gitea `app.ini`'s `ROOT_URL=http://localhost:3300/` was baked at publish time. Fixed by setting `ROOT_URL=http://host.docker.internal:3300/`, restarting Gitea, adding `127.0.0.1 host.docker.internal` to `/etc/hosts`, adding `host.docker.internal` to `NO_PROXY` (corp proxy was hijacking DNS), and republishing all 64 packages (`common-plat@dd90f709`). | Gitea `app.ini` + host `/etc/hosts` + every dev machine's `switch-network.sh` | Critical (FIXED) |
| F18 | `clock/web/package.json` had 4 `@bytelyst/*` deps declared as `file:` refs to sibling `../../learning_ai_common_plat/packages/*`, a legacy pre-Gitea pattern. Inside Docker those paths don't exist, so `pnpm install` fails with `ERR_PNPM_LINKED_PKG_DIR_NOT_FOUND`. Discovered during clock web A0-V on 2026-05-27. Fixed in `learning_ai_clock@8b5c767a3` by rewriting to `*` semver. Same pattern likely lives in other product repos (especially anything that consumes `@bytelyst/ui`, `@bytelyst/design-tokens`, `@bytelyst/use-theme`); audit needed in Phase D rollout. | `*/web/package.json` (and likely others) | High |
Implications:
- The original "switch to `--frozen-lockfile` + Gitea registry" plan requires two upstream fixes first (F1, F2).
- F11–F13 mean correctness fixes must precede speed fixes, otherwise we ship faster builds of broken apps.
- F16 + F17 are both fixed as of 2026-05-27. Gitea path now works end-to-end on clock. A-pre is largely complete; remaining items (A-pre-4, A-pre-5) become Phase E checks.
- F18 (sibling `file:` refs in product repo manifests) is the same family as F2 but separately tractable: fixed in clock, audit needed across other repos as part of Phase D rollout.
- A linter (Phase E `docker-doctor.sh`) is the durable insurance against F11/F13/F18 recurrence, which is silent in CI today. The registry-side guard (publish-time check for `workspace:*` leaks) shipped in `common-plat@cfcfc7bb` as part of the F16 fix.
1. Context: three build paths
| Path | Status today | Trigger | Notes |
|---|---|---|---|
| `docker-prep.sh` tarballs | De facto default in peakpulse + flowmonk; also works in clock/notes | Run `docker-prep.sh` then `docker compose build` | Hermetic; mutates `package.json`; slow to repack |
| Gitea NPM registry | Partially wired in clock + notes; broken in peakpulse | `docker compose build` with `GITEA_NPM_HOST` arg + secret | Needs `.npmrc.docker` standardization to be the default |
| Legacy `file:` refs | Deprecated | n/a | Removed during pnpm/Gitea migration |
Measurement targets
| Build | Baseline (observed) | Target after Phase A |
|---|---|---|
| Cold (no cache) | ~2–3 min | ≤ 2 min |
| Warm (one source file changed) | ~2–3 min | < 30 s |
| `docker-prep.sh` pack step alone | ~60–90 s | < 30 s (pnpm pack cache) |
Fill in actuals during Phase C.
2. Goals & non-goals
Goals
- ✅ Eliminate F11–F13 class of silent "build green, app broken" failures
- ✅ Cut warm rebuild time via BuildKit pnpm-store cache mount (single biggest speed win)
- ✅ Make `docker-prep.sh` idempotent, safe to re-run, gitignore-clean, and canonical (no per-repo drift)
- ✅ Standardize `.npmrc.docker` across the ecosystem so the Gitea path actually works
- ✅ Fix `docker-compose.yml` to pass `GITEA_NPM_HOST` + secrets so the registry path is usable without manual flags
- ✅ Ship `docker-doctor.sh` CI lint as the durable insurance layer
Non-goals
- ❌ Migrating off pnpm or off the Gitea registry
- ❌ Adopting `--frozen-lockfile` until F2 is resolved (sibling-workspace problem)
- ❌ Publishing `@bytelyst/*` to the public npm registry
- ❌ Multi-platform builds (separate roadmap)
2.5 Canonical decisions
Decisions taken now to avoid contradictions later in the doc:
- Base image: `node:22-alpine` is canonical. For repos blocked by the corporate proxy's Alpine SSL interception (currently only `learning_ai_notes`), the Dockerfile MUST expose:

      ARG BASE_IMAGE=node:22-alpine
      FROM ${BASE_IMAGE} AS builder

  Override per-repo via `--build-arg BASE_IMAGE=node:22-slim`. Document the override in the repo's `AGENTS.md`.
- Healthcheck host: `127.0.0.1` (NOT `localhost`) in every `docker-compose*.yml` `test:` block. See F12.
- Lockfile mode in Docker: `--lockfile=false` for now. `--frozen-lockfile` is blocked on the A3 ADR (F2).
3. Phase A — Correctness + build speed + path correctness
Order matters: A-pre must precede A0 (you can't build via a registry that serves broken metadata); A0 must precede A1+ (you can't optimize a path that doesn't work), and A8+A9 (correctness) must land before measuring speed wins.
A-pre. Make the Gitea registry actually usable from Docker (F16 + F17 + F18)
Owner: learning_ai_common_plat + per-product repo · Status: ✅ done for clock + global config.
Three distinct bugs surfaced during clock A0-V on 2026-05-27:
- F16: Publish flow leaked `workspace:*` into published metadata.
- F17: Gitea baked `localhost:3300` into tarball URLs.
- F18: Product repos had legacy `file:` refs to sibling packages.

- A-pre-1. Audit `publish-outdated-packages.sh`: confirmed it uses `pnpm pack` then re-tars with `npm pack`, which loses `workspace:` rewriting.
- A-pre-2. Patch publish script with a `workspace:*` rewriter + a post-rewrite grep guard. Shipped in `common-plat@cfcfc7bb`.
- A-pre-3. Verify all packages publish with 0 `workspace:*` refs. Confirmed via curl scan across all 64 packages.
- A-pre-4. F17 fix: set Gitea `ROOT_URL=http://host.docker.internal:3300/`, restart Gitea, add `127.0.0.1 host.docker.internal` to `/etc/hosts`, add `host.docker.internal` to `NO_PROXY` in `switch-network.sh`, bulk republish all 64 packages. Shipped in `common-plat@dd90f709`.
- A-pre-5. F18 fix: rewrite `file:../../learning_ai_common_plat/packages/*` refs in `clock/web/package.json` to `*` semver. Shipped in `clock@8b5c767a3`. Audit needed in Phase D for other product repos.
- A-pre-6. Document Gitea config requirements (below).
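
The grep guard from A-pre-2 can be approximated as follows. This is a minimal sketch, not the script shipped in `common-plat@cfcfc7bb`; the function names `has_workspace_refs` and `guard_manifest` are hypothetical, and it assumes the manifest has already been extracted from the tarball.

```shell
#!/bin/sh
# Sketch of a publish-time guard against leaked "workspace:" refs (F16).
# Function names are illustrative, not taken from the real publish script.

# Succeeds (exit 0) when the manifest still contains a dependency whose
# version string starts with the pnpm-only "workspace:" protocol.
has_workspace_refs() {
  # $1: path to an extracted package.json
  grep -Eq '"[^"]+"[[:space:]]*:[[:space:]]*"workspace:' "$1"
}

# Prints OK and succeeds for a clean manifest; prints an error to stderr
# and fails for one that would publish a broken dependency.
guard_manifest() {
  if has_workspace_refs "$1"; then
    echo "ERROR: unrewritten workspace: ref in $1" >&2
    return 1
  fi
  echo "OK: $1"
}
```

Calling `guard_manifest package/package.json || exit 1` between the rewrite and the re-pack step would stop the publish before a bad manifest reaches the registry.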
A-pre-6. Gitea configuration prerequisites (one-time per dev machine)
The Gitea registry MUST be configured with ROOT_URL=http://host.docker.internal:3300/
so published tarball URLs are reachable from inside Docker containers. The
host /etc/hosts MUST resolve host.docker.internal to 127.0.0.1 so the
same URLs work from the host shell.
On macOS (Homebrew Gitea):
# 1. Edit Gitea's app.ini
sudo -e /opt/homebrew/var/gitea/custom/conf/app.ini
# change: ROOT_URL = http://localhost:3300/
# to: ROOT_URL = http://host.docker.internal:3300/
# 2. Restart Gitea
brew services restart gitea
# 3. Add /etc/hosts entry so host.docker.internal resolves on the host too
sudo sh -c 'grep -q host.docker.internal /etc/hosts || \
echo "127.0.0.1 host.docker.internal" >> /etc/hosts'
# 4. Ensure host.docker.internal is in NO_PROXY for corp shells
# (already done in switch-network.sh as of common-plat@dd90f709)
source ~/.zshrc # reload
# 5. Verify
curl -sS http://host.docker.internal:3300/api/v1/version
# expected: {"version":"1.25.5"} or similar
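
After the `ROOT_URL` change and republish, it can be useful to confirm that no package metadata still advertises a `localhost` tarball URL (F17). The sketch below assumes the registry metadata JSON is piped in on stdin; `count_localhost_tarballs` is a hypothetical helper and the sample URLs in the usage note are illustrative only.

```shell
#!/bin/sh
# Sketch: count dist.tarball URLs in npm registry metadata (read from stdin)
# that still point at localhost and would be unreachable from a container.
count_localhost_tarballs() {
  # grep -o prints one line per match; wc -l counts them; tr strips the
  # padding that BSD wc adds.
  grep -Eo '"tarball"[[:space:]]*:[[:space:]]*"https?://localhost[:/][^"]*"' \
    | wc -l | tr -d ' '
}
```

A result of `0` for every package is the expected post-fix state; any other value means that package was published before the `ROOT_URL` change and needs a republish.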
A0. Make the Gitea-registry path actually work (clock + peakpulse)
- A0-1. Standardize `.npmrc.docker` to use templated host AND owner so it works on host (`localhost`) and inside Docker (`host.docker.internal`), and so future owner renames are a one-line env change:

      @bytelyst:registry=http://${GITEA_NPM_HOST}:3300/api/packages/${GITEA_NPM_OWNER:-learning_ai_user}/npm/
      //${GITEA_NPM_HOST}:3300/api/packages/${GITEA_NPM_OWNER:-learning_ai_user}/npm/:_authToken=${GITEA_NPM_TOKEN}
      strict-ssl=false
      auto-install-peers=true

  ⚠️ Env-var expansion chain: pnpm expands `${VAR}` in `.npmrc` at read time using the current process environment (see pnpm npmrc docs). That means the Dockerfile MUST do `ARG GITEA_NPM_HOST` + `ARG GITEA_NPM_OWNER` → `ENV GITEA_NPM_HOST=$GITEA_NPM_HOST` / `ENV GITEA_NPM_OWNER=$GITEA_NPM_OWNER` before the `pnpm install` RUN line, AND the `GITEA_NPM_TOKEN` must be exported from the BuildKit secret mount inside the same `RUN` (since secrets don't persist as env across layers).

  Note on F14: The canonical `.npmrc` (host-side) template already uses `${GITEA_NPM_OWNER}` (shipped in common-plat commit `610a59fd`). `.npmrc.docker` lagged behind because Docker builds have a separate file; A0-1 brings them into parity.
- A0-2. Remove `pnpm-lock.yaml` from `.dockerignore` in both repos (fixes F1; harmless under `--lockfile=false` since we don't COPY it, but unblocks future A3)
- A0-3. Add `GITEA_NPM_HOST` + `GITEA_NPM_OWNER` build args + `secrets:` block to every service in `docker-compose.yml`:

      # under each service
      build:
        context: .
        dockerfile: backend/Dockerfile
        args:
          GITEA_NPM_HOST: ${GITEA_NPM_HOST:-host.docker.internal}
          GITEA_NPM_OWNER: ${GITEA_NPM_OWNER:-learning_ai_user}
        secrets:
          - gitea_npm_token

      # top level of the compose file
      secrets:
        gitea_npm_token:
          environment: GITEA_NPM_TOKEN
- A0-4. Add `extra_hosts: ["host.docker.internal:host-gateway"]` to each service so Linux Docker can resolve the host
- A0-5. Document required env: `GITEA_NPM_TOKEN` must be exported in the shell that runs `docker compose build` (add to repo `README.md` quickstart). Reference `bash ../learning_ai_common_plat/scripts/gitea/token.sh status` for verification.
- A0-D. Run `gitea-doctor` before any Docker build (addresses F15). Inline into deploy/CI workflows:

      bash ../learning_ai_common_plat/scripts/gitea/doctor.sh --quiet || exit 1
      docker compose build

  - Locally: shell alias or `Makefile` target `make build` that runs doctor then `docker compose build`.
  - In Gitea Actions CI: a pre-job step. If `doctor` exits non-zero, the build is skipped with a clear error rather than failing 4 minutes in with `ERR_PNPM_AUTHENTICATION`.
- A0-V. Verification gate (between A0 and A1): build the registry path without any cache-mount or layer optimizations. Confirm `docker compose build --no-cache` succeeds end-to-end pulling from Gitea. Only proceed to A1 once this is green. Don't conflate "make it work" with "make it fast" in one commit.

  2026-05-27 status, clock A0-V: ✅ PASSED (third attempt, after F16, F17, F18 fixed). Cold-build wall-clock:
  - backend: 59.2 s (commits: `clock@0be887288` + `common-plat@cfcfc7bb` + `common-plat@dd90f709`)
  - web: 3:13 (193 s) (commits: above + `clock@8b5c767a3`)

  Both surfaces resolve `@bytelyst/*` from the Gitea registry end-to-end: no `docker-prep.sh` tarballs, no sibling `file:` refs, no proxy interference. See §3.A7 metrics table.
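
To see what the templated registry line from A0-1 resolves to in a given environment, the same `${VAR:-default}` expansion can be reproduced in the shell. This is only an illustration of the expansion chain; `render_registry_url` is a hypothetical helper and is not part of the shipped tooling.

```shell
#!/bin/sh
# Sketch: print the registry URL that the .npmrc.docker template would
# resolve to with the current environment. Mirrors the template's
# ${GITEA_NPM_OWNER:-learning_ai_user} default.
render_registry_url() {
  # Abort with a message if the host is unset, since the template has
  # no default for it.
  host="${GITEA_NPM_HOST:?GITEA_NPM_HOST must be set}"
  owner="${GITEA_NPM_OWNER:-learning_ai_user}"
  printf 'http://%s:3300/api/packages/%s/npm/\n' "$host" "$owner"
}
```

Running it once on the host and once inside a build container (with the `ARG` → `ENV` wiring in place) is a quick way to confirm both contexts agree on host and owner.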
A1. Replace npm install -g pnpm@X with corepack
- A1-1. Replace `RUN npm install -g pnpm@10.6.5` with:

      RUN corepack enable && corepack prepare pnpm@10.6.5 --activate
- A1-2. Verify `packageManager` field in `backend/package.json` and `web/package.json` matches (already `pnpm@10.6.5` in peakpulse backend)
A2. Add BuildKit pnpm-store cache mount
- A2-1. Set `# syntax=docker/dockerfile:1.7` directive at top of every Dockerfile
- A2-2. Wrap install step with cache + secret mount:

      RUN --mount=type=cache,id=pnpm,target=/root/.local/share/pnpm/store \
          --mount=type=secret,id=gitea_npm_token \
          export GITEA_NPM_TOKEN="$(cat /run/secrets/gitea_npm_token 2>/dev/null || echo '')" && \
          pnpm install --ignore-scripts --lockfile=false
- A2-3. Verify cache mount is active: `docker buildx du --filter type=exec.cachemount` shows non-zero size after a build. Real success metric is wall-clock: warm rebuild (touching one source file) drops to < 30 s.
A3. Decide lockfile policy ✅ DONE (ADR-0001)
Two options — pick one in a short ADR before implementing:
- Option 1: Keep `--lockfile=false` (current pragmatic approach)
  - ✅ No sibling-workspace complications
  - ❌ No reproducibility guarantee inside Docker
  - ❌ Slower installs (full resolution every build)
- Option 2: Generate a Docker-only lockfile via `pnpm install --lockfile-only` against a flattened `package.json` that resolves `@bytelyst/*` to semver
  - ✅ Reproducibility
  - ✅ Faster installs
  - ❌ New build step + tooling
  - ❌ Drift risk between dev lockfile and Docker lockfile

- A3-1. ADR written: `docs/adr/0001-docker-build-lockfile-policy.md`. Option 1 accepted (keep `--lockfile=false` short-term; revisit after Phase D).
- A3-2. `--frozen-lockfile` adoption deferred per ADR; tracked as future work in §11.
A4. Restructure layer order
- A4-1. Reorder COPY/RUN so deps-install layer is `package.json` + `.npmrc.docker` ONLY, then a separate layer for `src/`, config files, `shared/`
- A4-2. Move all `ARG` lines that affect deps install before the install step; move `NEXT_PUBLIC_*` ARGs (web) closer to the build step (they invalidate the build layer, not the deps layer)

A5. Gate .docker-deps/ behind a build arg
- A5-1. Add `ARG USE_TARBALLS=false` to Dockerfile
- A5-2. Use wildcard COPY so missing dir doesn't break the build:

      RUN mkdir -p /app/.docker-deps
      COPY .docker-deps* /app/.docker-deps/
- A5-3. Verify `.docker-deps/` is in `.gitignore` and `.dockerignore` does NOT exclude it when tarball mode is in use

A6. .dockerignore audit
- A6-1. Confirm exclusions: `node_modules`, `**/node_modules`, `dist`, `.next`, `*.log`, `.env`, `.env.*`, `.git`, `*.bak`
- A6-2. Remove: `pnpm-lock.yaml` exclusion (was correct under `--lockfile=false`, blocks future optimization)
- A6-3. Confirm `.docker-deps/` is NOT excluded when tarball path is active
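
The A6 audit can be scripted roughly as below. This is a simplified sketch: `check_dockerignore` is a hypothetical name, it only matches exact lines (so `node_modules/` with a trailing slash would count as missing), and it treats a `pnpm-lock.yaml` exclusion as a warning rather than an error, in line with the advisory status described under ADR-0001.

```shell
#!/bin/sh
# Sketch: verify a .dockerignore against the A6-1 exclusion list.
# Exact-line matching only; comments and negations are not interpreted.
check_dockerignore() {
  # $1: path to a .dockerignore file
  di_file="$1"
  di_status=0
  for di_pattern in node_modules '**/node_modules' dist .next '*.log' \
                    .env '.env.*' .git '*.bak'; do
    if ! grep -Fxq -- "$di_pattern" "$di_file"; then
      echo "MISSING: $di_pattern"
      di_status=1
    fi
  done
  # Advisory only: excluding the lockfile blocks future lockfile work (A6-2).
  if grep -Fxq -- 'pnpm-lock.yaml' "$di_file"; then
    echo "WARN: pnpm-lock.yaml is excluded"
  fi
  return "$di_status"
}
```

For example, `check_dockerignore .dockerignore || exit 1` could run as a CI step before the build.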
A7. Measure & record
| Repo | Surface | Cold (A0-V) | Cold (post-A2) | Warm (post-A2) | Notes |
|---|---|---|---|---|---|
| clock | backend | 59.2 s | 64.7 s | 2.9 s | Cold essentially flat (corepack adds ~1 s; cache mount empty on first run). Warm → 95.1% reduction. Commits: clock@8b5c767a3 (A0-V), clock@f6a806ff3 (A1+A8+A9), clock@55e8d22d3 (A2+A5+A6) |
| clock | web | 193 s (3:13) | 291 s (4:51) † | 5.4 s | Warm → 97.2% reduction. † Cold variance — see footer |
| peakpulse | backend | — (was tarball-only path) | 72.2 s | 2.7 s | Warm → 96.3% reduction. Commits: peakpulse@11a6bc5 (Phase A), peakpulse@6523a1a (.gitkeep fix), clock@1465e06b1+d69003c1f (mirror .gitkeep fix) |
Footer note on cold-build variance. Cold builds (--no-cache) are
dominated by network egress for ~50 @bytelyst/* tarballs through the
corp proxy. A second measurement of clock web cold-build came in at
291 s vs 174 s in the previous step — same Dockerfile path, different
network-side latency. Cold build is not the optimization target of
this roadmap; warm rebuild is. Run pnpm store prune on the host or use
a local registry mirror if cold-build determinism is needed.
Measurement commands:
# Cold (clear all layer cache; cache mounts may still persist)
time DOCKER_BUILDKIT=1 docker compose build --no-cache backend
# Warm (one source file changed; deps unchanged)
touch backend/src/server.ts
time DOCKER_BUILDKIT=1 docker compose build backend
# Deps-changed (touch package.json; pnpm store cache helps here)
touch backend/package.json
time DOCKER_BUILDKIT=1 docker compose build backend
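
The "Warm → N% reduction" figures in the table are consistent with `(1 - warm / cold) × 100`, using the A0-V cold time for clock and the post-A2 cold time for peakpulse. A small helper for filling in future rows, assuming that formula (`reduction_pct` is a hypothetical name):

```shell
#!/bin/sh
# Sketch: percentage reduction from a cold build time to a warm one.
reduction_pct() {
  # $1: cold build seconds, $2: warm build seconds
  awk -v cold="$1" -v warm="$2" \
    'BEGIN { printf "%.1f\n", (1 - warm / cold) * 100 }'
}
```

For instance, `reduction_pct 59.2 2.9` reproduces the clock backend row.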
A8. Config-file COPY audit & canonical pattern (addresses F11, F13)
- A8-1. For every Dockerfile in scope, list all build-time files present in the surface directory (`web/` or `backend/`) that affect the build:
  - `postcss.config.{js,mjs,cjs,ts}`
  - `tailwind.config.{js,mjs,cjs,ts}`
  - `next.config.{js,mjs,ts}`
  - `tsconfig*.json`
  - `package.json`
  - `.npmrc.docker`, `.npmrc`
  - `babel.config.*` (if present)
  - `drizzle.config.*` (if present)
  - `vitest.config.*` (only if the build needs it)

  Verify each is COPY'd in the Dockerfile.
- A8-2. Choose canonical COPY pattern. Decision: middle-ground glob for web surfaces:

      COPY web/*.json web/*.ts web/*.mjs web/*.js web/*.cjs ./
      COPY web/public/ ./public/
      COPY web/src/ ./src/

  Dockerfile `COPY` wildcards follow Go's `filepath.Match` rules, which have no shell-style brace expansion, so each extension is listed as its own pattern (the same form used in the Phase D.2 fix). Trade-off: glob picks up unintended root-level files if any are added later, but dramatically reduces F11/F13 risk. Backend surfaces with few root config files can keep enumerated COPY (lower risk surface).
- A8-3. Repo-by-repo migration: replace enumerated `COPY web/foo ./foo` with the glob pattern; verify the resulting image has all expected files via `docker run --rm <img> ls -la`.
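
The A8-1 audit lends itself to a small drift check. The sketch below is an approximation under stated assumptions: `uncopied_configs` is a hypothetical helper, it only looks at root-level files with the extensions from the glob decision, and it treats a file as covered when a `COPY` line mentions either its name or a `*.ext` glob for its extension (a plain substring match, so it can produce false negatives).

```shell
#!/bin/sh
# Sketch: list root-level config files in a surface directory that no COPY
# line in the Dockerfile appears to cover (F11/F13 drift).
uncopied_configs() {
  # $1: surface directory (e.g. web), $2: Dockerfile path
  uc_dir="$1"
  uc_dockerfile="$2"
  for uc_file in "$uc_dir"/*.json "$uc_dir"/*.ts "$uc_dir"/*.mjs \
                 "$uc_dir"/*.js "$uc_dir"/*.cjs; do
    # Unmatched globs stay literal; skip them.
    [ -e "$uc_file" ] || continue
    uc_name=$(basename "$uc_file")
    uc_ext="${uc_name##*.}"
    # Covered if a COPY line names the file or globs its extension.
    if grep -E '^[[:space:]]*COPY[[:space:]]' "$uc_dockerfile" \
        | grep -Fq -e "$uc_name" -e "*.$uc_ext"; then
      continue
    fi
    echo "$uc_name"
  done
}
```

An empty result means every candidate file is at least mentioned; any printed name is a file present on disk that the image would silently lack.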
A9. Healthcheck canonicalization (addresses F12)
- A9-1. Replace `localhost` with `127.0.0.1` in every `docker-compose*.yml` healthcheck `test:` block. Sweep with:

      rg -l 'http://localhost' --glob 'docker-compose*.yml'
- A9-2. Standardize healthcheck shape:
  - Alpine-based images:

        healthcheck:
          test: ["CMD-SHELL", "wget -q --spider http://127.0.0.1:${PORT}/health || exit 1"]
          interval: 30s
          timeout: 5s
          retries: 3
          start_period: 10s
  - Slim/Debian images (`wget` not always present, but `node` is):

        healthcheck:
          test: ["CMD-SHELL", "node -e \"fetch('http://127.0.0.1:${PORT}/health').then(r=>process.exit(r.ok?0:1)).catch(()=>process.exit(1))\""]
- A9-3. Add `start_period` (10s minimum), which prevents flaky "container started but app not yet listening" false-negatives.
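
Once the sweep in A9-1 has found the affected files, the rewrite itself can be mechanical. The following is a sketch rather than the fixer script used in Phase D: `fix_healthcheck_host` is a hypothetical name, and it assumes single-line `test:` entries, leaving every other `localhost` URL in the file untouched.

```shell
#!/bin/sh
# Sketch: rewrite localhost to 127.0.0.1 on healthcheck "test:" lines only.
# Idempotent: a second run finds nothing left to change.
fix_healthcheck_host() {
  # $1: path to a docker-compose file
  hc_tmp=$(mktemp)
  # Write via a temp file instead of "sed -i", whose syntax differs
  # between BSD and GNU sed; cat back to keep the file's permissions.
  sed -E '/test:/ s#(https?://)localhost([:/])#\1127.0.0.1\2#g' "$1" > "$hc_tmp" \
    && cat "$hc_tmp" > "$1"
  rm -f "$hc_tmp"
}
```

Multi-line `test:` arrays would need a YAML-aware tool; this sketch does not handle them.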
4. Phase B — Hermetic-fallback polish (docker-prep.sh)
docker-prep.sh is duplicated with minor variations across product repos.
Promotion to canonical home is now in Phase B, not Phase D — drift
compounds linearly with time and the .npmrc template precedent proves the
pattern is cheap.
- B1. `--dry-run` flag (`common-plat@a418a23e`).
- B2. Idempotency guard via `*.bak` detection + `--force` override (`common-plat@a418a23e`).
- B3. `.docker-deps/` and `*.bak` in `.gitignore` on both pilots (clock + peakpulse). Verified by `docker-doctor.sh`.
- B4. Pre-commit hook landed. Canonical guard script `check-docker-prep-staged.sh` (`common-plat@c908c6d7`) blocks rewritten `package.json`, staged `.tgz` tarballs, and `.bak` files. Wired into both pilot `.husky/pre-commit` (`clock@4f8086bfa`, `peakpulse@c3195c8`). Verified with simulated staged tarballs → commit blocked.

  Original spec:

      # .husky/pre-commit
      if git diff --cached --name-only | xargs grep -l '"file:\.\./\.docker-deps/' 2>/dev/null; then
        echo "ERROR: rewritten package.json detected. Run scripts/docker-prep.sh --restore first."
        exit 1
      fi
      if git diff --cached --name-only | grep -qE '(\.docker-deps/.*\.tgz|package\.json\.bak)$'; then
        echo "ERROR: docker-prep.sh artifacts staged. Run --restore first."
        exit 1
      fi
- B5. Auto-restore on script error via `trap cleanup_on_error EXIT` + `--keep` opt-out (`common-plat@a418a23e`).
- B6. Standardized header + usage block per § 7.4 template (`common-plat@a418a23e`).
- B7. CANONICAL HOME landed.
  - B7-1. Canonical at `learning_ai_common_plat/scripts/docker-prep.template.sh` + 2 helpers `_docker-prep-inject.js`, `_docker-prep-strip.js` (`common-plat@a418a23e`).
  - B7-2. `learning_ai_common_plat/scripts/sync-docker-prep.sh` syncs all 3 files (mirrors `sync-npmrc.sh`).
  - B7-3. `learning_ai_common_plat/scripts/check-docker-prep-drift.sh` for CI (mirrors `check-npmrc-drift.sh`).
  - B7-4. AGENTS.md "NEVER edit `docker-prep.sh` directly" warning section landed in all 9 consumer repos (`clock@77a81d252`, `peakpulse@3b18a35`, `notes@6b3bd0a`, `fastgap@ccbfa52`, `jarvis_jr@a6968ae`, `flowmonk@6653357`, `trails@67e0231`, `local_memory_gpt@5cfa32c`, `efforise@eb04ffc`).
- B8. `--strip-overrides` option removes `pnpm.overrides` block as a safety net (`common-plat@a418a23e`).
- B+. `--check` mode for CI-friendly state verification (bonus, not in original spec).
- B+. Portable `sed -i` (BSD on macOS, GNU on Linux).
- B+. Preserve `.docker-deps/.gitkeep` on clear (fixes earlier regression where `--restore` deleted the tracked file).
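
The B2 idempotency guard can be pictured as follows. This is a minimal sketch and not the code in `docker-prep.template.sh`: the function name `prep_guard` is hypothetical, and it assumes the backup file is named `package.json.bak`, as in the original pre-commit spec.

```shell
#!/bin/sh
# Sketch: refuse to re-run the prep step when a previous run's backup is
# still on disk, unless the caller passes --force.
prep_guard() {
  # $1: repo directory, $2: optional "--force"
  pg_found=$(find "$1" -name 'package.json.bak' ! -path '*/node_modules/*' \
               -print 2>/dev/null | head -n 1)
  if [ -n "$pg_found" ] && [ "${2:-}" != "--force" ]; then
    echo "already prepped: $pg_found exists (use --force or --restore)" >&2
    return 1
  fi
  return 0
}
```

A script would call `prep_guard "$REPO_ROOT" "$1" || exit 1` before touching any manifest, so that a second run cannot overwrite the pristine backup with an already-rewritten `package.json`.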
5. Phase C — Verification gates
Pilot exit criteria (must all pass before Phase D):
- C1. Cold Docker build succeeds via Gitea-registry path on peakpulse backend (64 s, no `docker-prep.sh` invocation).
- C2. Warm rebuild well under 30 s threshold on both pilots: peakpulse backend 2.6 s, clock backend 3.3 s.
- C3. `docker-prep.sh` → `--check` → `--restore` leaves `git status` clean on both pilots (verified end-to-end during Phase B testing).
- C4. Pre-commit hook blocks staged tarballs + `.bak` files (verified by simulating staged artifacts on clock).
- [~] C5. Gitea Actions CI green: partially validated. Workflow YAML is well-formed in both pilots (`clock@4f8086bfa`, `peakpulse@c3195c8`); local simulation of the `docker-lint` job (`bash scripts/gitea/doctor.sh --quiet && bash scripts/docker-doctor.sh --quiet`) exits 0 on both pilots. Gap: the pilot repos are not currently hosted on Gitea (`http://localhost:3300/learning_ai_user/` has only `learning_ai_uxui_web`), so the workflow file ships but the runner never fires. A dummy `git push gitea` returns 404. C5 will fully close when the pilot repos are mirrored to Gitea (see `learning_ai_common_plat/docs/runbooks/GITEA_VM_SETUP.md` for the hosting setup).
- C6. Build-time metrics already populated in § 3.A7 from earlier Phase A work.
- C7. ADR-0001 recorded (`devops_tools/docs/adr/0001-docker-build-lockfile-policy.md`).
- C8. `docker-doctor.sh` PASS on both pilots (only the 1 expected `pnpm-lock.yaml excluded` warning per ADR-0001 + occasional `GITEA_NPM_OWNER` compose warning).
- C9. Web smoke test landed as Playwright spec `web/e2e/css-bundle-smoke.spec.ts` (`clock@b8440bfea`). Asserts title sanity + largest CSS bundle > 20 KB. Catches F11 regression at PR time.
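
The CSS-size half of the C9 assertion can also be approximated outside Playwright, for example against build output on disk. This is a rough shell analogue, not the contents of `css-bundle-smoke.spec.ts`; the helper names are hypothetical and the 20 KB threshold is interpreted here as 20480 bytes, which is an assumption.

```shell
#!/bin/sh
# Sketch: find the largest .css file under a directory and compare it to a
# minimum size, as a cheap signal that Tailwind output was generated (F11).
largest_css_bytes() {
  # $1: directory holding built CSS
  find "$1" -type f -name '*.css' -exec wc -c {} + 2>/dev/null \
    | awk '$2 != "total" && $1 > max { max = $1 } END { print max + 0 }'
}

css_bundle_ok() {
  # $1: directory, $2: optional minimum size in bytes (default 20480)
  [ "$(largest_css_bytes "$1")" -gt "${2:-20480}" ]
}
```

Something like `css_bundle_ok .next/static/css || exit 1` could run after `next build`, assuming that is where the project's CSS chunks are emitted.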
6. Phase D — Ecosystem rollout
Status: DONE for all 12 consumer repos. D.1 artifacts + D.2 Dockerfile/compose fixes + D.3 advisory-warning cleanup + B7-4 AGENTS.md notes. docker-doctor exits PASS in every repo. Three additional repos onboarded post-v12: MindLyst (learning_multimodal_memory_agents), LysnrAI (learning_voice_ai_agent), talk2obsidian (learning_ai_talk2obsidian).
D.1 — Tooling rollout (DONE)
All 9 consumer repos received the canonical infrastructure via sync-docker-prep.sh:
- `scripts/docker-prep.sh` + `_docker-prep-inject.js` + `_docker-prep-strip.js` (canonical sync)
- `scripts/docker-doctor.sh` (thin wrapper to canonical linter)
- `Makefile` with `make doctor` target

| Repo | Commit | Findings (docker-doctor warn-only) |
|---|---|---|
| `learning_ai_notes` | `216ebb8` | 6 warnings + errors: F12 localhost, F14 ARG missing (×2), A5-2 wildcard (×2), F11/F13 web glob, A2 syntax directive |
| `learning_ai_fastgap` | `36b67a2` | 4: F4/F14 `.npmrc.docker` hardcoded, F14 ARG missing, A5-2 wildcard, A2 syntax |
| `learning_ai_jarvis_jr` | `523dc08` | 5: F14 ARG missing (×2), A5-2 wildcard (×2), F11/F13 web glob, A2 syntax (×2) |
| `learning_ai_flowmonk` | `65628f3` | 4: F14 ARG missing (×2), A5-2 wildcard (×2), F11/F13 web glob, A2 syntax |
| `learning_ai_trails` | `8aef82c` | 6: F12 localhost, F14 ARG missing (×2), A5-2 wildcard (×2), A2 syntax (×2) |
| `learning_ai_local_memory_gpt` | `d17689a` | 5: F14 ARG missing (×2), A5-2 wildcard (×2), F11/F13 web glob, A2 syntax (×2) |
| `learning_ai_efforise` | `b9fbbc3` | 5: F12 localhost, F14 ARG missing (×2), A5-2 wildcard (×2), A2 syntax (×2) |
| `learning_multimodal_memory_agents` (MindLyst) | `84a5d10` | full playbook applied (`mindlyst-native/web/Dockerfile` + `backend/Dockerfile`) |
| `learning_voice_ai_agent` (LysnrAI) | `0f1fa64` | full playbook applied (backend + user-dashboard-web + backend-python; Python Dockerfile correctly skips Node checks) |
| `learning_ai_auth_app` | n/a | iOS/Android, no Docker surfaces |
| `learning_ai_talk2obsidian` | `793089e` | lighter rollout: single-stage Dockerfile, no `.docker-deps/` pattern; docker-doctor + Makefile + AGENTS.md note + syntax directive + `.gitignore` rules |
D.2 — Per-repo Dockerfile/compose fixes (DONE)
All 7 consumer repos received mechanical Phase D.2 fixes via an idempotent
fixer script. Each repo's docker-doctor.sh now exits PASS (warnings only).
| Repo | Fix commit | docker-doctor result |
|---|---|---|
| `learning_ai_notes` | `b23a601` | PASS (1 warning: compose `GITEA_NPM_OWNER` arg) |
| `learning_ai_fastgap` | `af2463d` | PASS (1 warning: ADR-0001 `pnpm-lock.yaml`) |
| `learning_ai_jarvis_jr` | `1a97a3f` | PASS (1 warning: ADR-0001 `pnpm-lock.yaml`) |
| `learning_ai_flowmonk` | `412a657` | PASS (1 warning: compose `GITEA_NPM_OWNER` arg) |
| `learning_ai_trails` | `733477a` | PASS (1 warning: compose `GITEA_NPM_OWNER` arg) |
| `learning_ai_local_memory_gpt` | `8c68595` | PASS (1 warning: compose `GITEA_NPM_OWNER` arg) |
| `learning_ai_efforise` | `06ea0d0` | PASS (1 warning: healthcheck `start_period`) |

Applied fixes (each fix is idempotent):
| Finding | Fix |
|---|---|
| F12 healthcheck `localhost` | Replaced with `127.0.0.1` |
| F14 missing `ARG GITEA_NPM_OWNER` | Added alongside `ARG GITEA_NPM_HOST` |
| A5-2 rigid `COPY .docker-deps/` | Changed to wildcard `COPY .docker-deps* ...` |
| F11/F13 enumerated web config COPY | Replaced with glob `COPY web/*.json web/*.ts web/*.mjs ./` |
| A2 missing syntax directive | Added `# syntax=docker/dockerfile:1.7` |
| F4/F14 hardcoded `.npmrc.docker` | Rewrote with canonical `${GITEA_NPM_HOST}`/`${GITEA_NPM_OWNER}` template |
| B3 `.gitignore` missing `*.bak` | Added rule |
| B3 missing `.docker-deps/.gitkeep` | Created |
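
"Idempotent" here means each fix checks whether it has already been applied before changing anything, so the fixer can be re-run safely. As an illustration of that shape for the A2 syntax-directive fix (a sketch only; `ensure_syntax_directive` is a hypothetical name and this is not the fixer script itself):

```shell
#!/bin/sh
# Sketch: prepend the BuildKit syntax directive to a Dockerfile unless its
# first line already is a syntax directive. Safe to run repeatedly.
ensure_syntax_directive() {
  # $1: Dockerfile path
  if head -n 1 "$1" | grep -q '^# syntax='; then
    return 0  # already present; nothing to do
  fi
  sd_tmp=$(mktemp)
  # Parser directives only take effect at the very top of the file.
  { echo '# syntax=docker/dockerfile:1.7'; cat "$1"; } > "$sd_tmp" \
    && cat "$sd_tmp" > "$1"
  rm -f "$sd_tmp"
}
```

Running it twice on the same file leaves exactly one directive line, which is the property the D.2 fixer relies on for each of its rewrites.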
D.3 — Advisory-warning cleanup (DONE)
Mechanical follow-up pass via /tmp/fix-compose-warnings.sh +
/tmp/add-build-args.py (commits below) eliminated most advisory
warnings across 10 repos:
| Repo | Cleanup commit |
|---|---|
| `learning_ai_clock` | `3de867a80` |
| `learning_ai_notes` | `5687e5a` |
| `learning_ai_fastgap` | `94a81ac` |
| `learning_ai_jarvis_jr` | `ed1cb88` |
| `learning_ai_flowmonk` | `938717f` |
| `learning_ai_trails` | `8837216` |
| `learning_ai_local_memory_gpt` | `0a486ac` |
| `learning_ai_efforise` | `ff517f4` |
| `learning_multimodal_memory_agents` | `7304ca1` |
| `learning_voice_ai_agent` | `13291b9` |

Each repo got:
- `docker-compose.yml`: full `build.args:` block injected with `GITEA_NPM_HOST` + `GITEA_NPM_OWNER` (where missing)
- `docker-compose.yml`: `start_period: 30s` added to healthcheck blocks (where missing) to prevent false cold-start failures

D.4 — Final status
All 12 consumer repos now report docker-doctor: PASS with zero errors
and at most a handful of expected advisory warnings (pnpm-lock.yaml
excluded per ADR-0001; talk2obsidian's short-form build: . which would
need yaml conversion to declare args).
7. Reference snippets
7.1 Canonical .npmrc.docker
Matches the host-side .npmrc template shipped in common-plat 610a59fd.
@bytelyst:registry=http://${GITEA_NPM_HOST}:3300/api/packages/${GITEA_NPM_OWNER:-learning_ai_user}/npm/
//${GITEA_NPM_HOST}:3300/api/packages/${GITEA_NPM_OWNER:-learning_ai_user}/npm/:_authToken=${GITEA_NPM_TOKEN}
strict-ssl=false
auto-install-peers=true
7.2 Canonical backend Dockerfile
# syntax=docker/dockerfile:1.7
ARG BASE_IMAGE=node:22-alpine
FROM ${BASE_IMAGE} AS builder
WORKDIR /app/backend
ARG GITEA_NPM_HOST=host.docker.internal
ARG GITEA_NPM_OWNER=learning_ai_user
ARG USE_TARBALLS=false
ENV NODE_TLS_REJECT_UNAUTHORIZED=0
ENV NPM_CONFIG_STRICT_SSL=false
ENV GITEA_NPM_HOST=$GITEA_NPM_HOST
ENV GITEA_NPM_OWNER=$GITEA_NPM_OWNER
RUN corepack enable && corepack prepare pnpm@10.6.5 --activate
# ── Deps layer (cacheable) ─────────────────────────────────────────
COPY .npmrc.docker ./.npmrc
COPY backend/package.json ./package.json
# Tolerate missing .docker-deps/ when in registry mode
RUN mkdir -p /app/.docker-deps
COPY .docker-deps* /app/.docker-deps/
RUN --mount=type=cache,id=pnpm,target=/root/.local/share/pnpm/store \
    --mount=type=secret,id=gitea_npm_token \
    export GITEA_NPM_TOKEN="$(cat /run/secrets/gitea_npm_token 2>/dev/null || echo '')" && \
    pnpm install --ignore-scripts --lockfile=false
# ── Source layer (changes most often) ──────────────────────────────
COPY backend/tsconfig.json ./tsconfig.json
COPY backend/src/ ./src/
COPY shared/ ../shared/
RUN pnpm run build
# ── Runtime ────────────────────────────────────────────────────────
FROM ${BASE_IMAGE}
WORKDIR /app/backend
ENV NODE_ENV=production
COPY --from=builder /app/backend/node_modules ./node_modules
COPY --from=builder /app/backend/package.json ./package.json
COPY --from=builder /app/backend/dist ./dist
COPY shared/ ../shared/
EXPOSE 4010
CMD ["node", "dist/server.js"]
`--lockfile=false` is intentional pending the A3 ADR. Switch to `--frozen-lockfile` only once the sibling-workspace problem (F2) is resolved.
7.3 Canonical docker-compose.yml service block
services:
  backend:
    build:
      context: .
      dockerfile: backend/Dockerfile
      args:
        GITEA_NPM_HOST: host.docker.internal
      secrets:
        - gitea_npm_token
      extra_hosts:
        - "host.docker.internal:host-gateway"
    ports:
      - "4010:4010"
    environment:
      - NODE_ENV=production
      - PORT=4010
      # ...
    restart: unless-stopped
    healthcheck:
      # F12: use 127.0.0.1 NOT localhost (IPv6 resolution false-fails)
      test: ["CMD-SHELL", "wget -q --spider http://127.0.0.1:4010/health || exit 1"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s

secrets:
  gitea_npm_token:
    environment: GITEA_NPM_TOKEN
7.4 Hardened docker-prep.sh header
#!/usr/bin/env bash
# Hermetic Docker-build helper. Packs @bytelyst/* tarballs from the sibling
# common-plat repo when the Gitea npm registry is unreachable.
#
# Use this ONLY when:
# - Local Gitea registry (:3300) is down or unreachable, OR
# - You need a Docker build that includes uncommitted common-plat changes.
#
# For normal builds (Gitea up + clean common-plat), use:
# docker compose build
#
# Usage:
# ./scripts/docker-prep.sh # pack tarballs + rewrite package.json
# ./scripts/docker-prep.sh --dry-run # show what would change (no side effects)
# ./scripts/docker-prep.sh --force # override idempotency guard
# ./scripts/docker-prep.sh --restore # undo rewrite
# ./scripts/docker-prep.sh --keep # skip auto-restore on error
# ./scripts/docker-prep.sh --strip-overrides # remove pnpm.overrides block
#
# Side effects:
# - Creates .docker-deps/ (gitignored)
# - Backs up package.json → package.json.bak
# - Rewrites @bytelyst/* deps to file:../.docker-deps/<tarball>
# - Injects pnpm.overrides for transitive @bytelyst/* deps
#
# Safety:
# - Refuses to run if .bak files already exist (unless --force)
# - Auto-restores on error (trap EXIT) unless --keep passed
# - Pre-commit hook blocks committing rewritten package.json, .tgz, .bak
7.5 Canonical Next.js web Dockerfile (addresses F11, F13)
# syntax=docker/dockerfile:1.7
ARG BASE_IMAGE=node:22-alpine
FROM ${BASE_IMAGE} AS deps
WORKDIR /app/web
ARG GITEA_NPM_HOST=host.docker.internal
ARG GITEA_NPM_OWNER=learning_ai_user
ENV NODE_TLS_REJECT_UNAUTHORIZED=0
ENV NPM_CONFIG_STRICT_SSL=false
ENV GITEA_NPM_HOST=$GITEA_NPM_HOST
ENV GITEA_NPM_OWNER=$GITEA_NPM_OWNER
RUN corepack enable && corepack prepare pnpm@10.6.5 --activate
COPY .npmrc.docker ./.npmrc
COPY web/package.json ./package.json
RUN mkdir -p /app/.docker-deps
COPY .docker-deps* /app/.docker-deps/
RUN --mount=type=cache,id=pnpm,target=/root/.local/share/pnpm/store \
    --mount=type=secret,id=gitea_npm_token \
    export GITEA_NPM_TOKEN="$(cat /run/secrets/gitea_npm_token 2>/dev/null || echo '')" && \
    pnpm install --ignore-scripts --lockfile=false
# ── Builder ────────────────────────────────────────────────────────
FROM ${BASE_IMAGE} AS builder
WORKDIR /app/web
COPY --from=deps /app/web/node_modules ./node_modules
COPY --from=deps /app/web/package.json ./package.json
# F11/F13 fix: glob ALL root-level config files instead of enumerating.
# Picks up postcss.config.*, tailwind.config.*, next.config.*, tsconfig*,
# any future *.config.* additions without Dockerfile changes.
COPY web/*.json web/*.ts web/*.mjs web/*.js web/*.cjs ./
COPY web/public/ ./public/
COPY web/src/ ./src/
COPY shared/ ../shared/
ARG NEXT_PUBLIC_BACKEND_URL
ARG NEXT_PUBLIC_PLATFORM_SERVICE_URL
ENV NEXT_PUBLIC_BACKEND_URL=$NEXT_PUBLIC_BACKEND_URL
ENV NEXT_PUBLIC_PLATFORM_SERVICE_URL=$NEXT_PUBLIC_PLATFORM_SERVICE_URL
ENV NEXT_TELEMETRY_DISABLED=1
RUN corepack enable && pnpm run build
# ── Runtime (Next.js standalone) ───────────────────────────────────
FROM ${BASE_IMAGE} AS runner
WORKDIR /app/web
ENV NODE_ENV=production
ENV NEXT_TELEMETRY_DISABLED=1
COPY --from=builder /app/web/.next/standalone ./
# Next 16 standalone server runs as `node web/server.js` from /app/web,
# so static assets live at /app/web/web/.next/static (NOT ./.next/static).
COPY --from=builder /app/web/.next/static ./web/.next/static
COPY --from=builder /app/web/public ./web/public
EXPOSE 3000
ENV PORT=3000
ENV HOSTNAME=0.0.0.0
CMD ["node", "web/server.js"]
Verification step after every web Dockerfile change: smoke-test the built image by running it and curling the rendered HTML. Confirm the CSS bundle referenced by the `<link>` tags is > 50 KB. A bundle of ~33 KB is the F11 signature (only `@font-face`, no Tailwind utilities).
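
The size check in that verification step can be scripted. The sketch below is an assumption-laden illustration, not the C9 smoke test itself: `css_bundle_ok` and `MIN_CSS_BYTES` are hypothetical names, and the helper expects the stylesheet to have been downloaded to a local file first.

```shell
#!/usr/bin/env bash
# Sketch of the F11 size check. css_bundle_ok is a hypothetical helper.
# The 50 KB threshold comes from the verification note above.
MIN_CSS_BYTES=$((50 * 1024))

css_bundle_ok() {
  local css_file="$1"
  local size
  # wc -c pads its output on some platforms, so strip the spaces.
  size=$(wc -c < "$css_file" | tr -d ' ')
  if [ "$size" -gt "$MIN_CSS_BYTES" ]; then
    echo "ok: CSS bundle is ${size} bytes"
  else
    echo "F11 suspect: CSS bundle is only ${size} bytes" >&2
    return 1
  fi
}
```

One possible flow: start the image, fetch the page from `http://127.0.0.1:3000/` with `curl`, download the stylesheet that the `<link>` tag points at, and pass the saved file to `css_bundle_ok`. How the stylesheet URL is extracted depends on the Next.js output and is left out here.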
7.6 docker-doctor.sh skeleton (Phase E)
#!/usr/bin/env bash
# docker-doctor.sh: pre-flight Dockerfile + docker-compose health checks.
# Run on PRs touching Dockerfile, docker-compose*.yml, .dockerignore.
set -euo pipefail

REPO_DIR="$(cd "$(dirname "$0")/.." && pwd)"
FAILED=0

# Check 1 (A8/F11/F13): every config file in web/ is COPY'd in web/Dockerfile
for cfg in postcss.config tailwind.config next.config; do
  for f in "$REPO_DIR"/web/${cfg}.{js,mjs,cjs,ts}; do
    [[ -f "$f" ]] || continue
    base=$(basename "$f")
    if ! grep -q "COPY web/${base}\\|COPY web/\\*" "$REPO_DIR/web/Dockerfile" 2>/dev/null; then
      echo "✗ F11/F13: $base exists but not COPY'd in web/Dockerfile"
      FAILED=1
    fi
  done
done

# Check 2 (A9/F12): healthchecks use 127.0.0.1
if grep -rE 'test:.*http://localhost' "$REPO_DIR"/docker-compose*.yml 2>/dev/null; then
  echo "✗ F12: healthcheck uses localhost (should be 127.0.0.1)"
  FAILED=1
fi

# Check 3: .npmrc.docker matches canonical template
if [[ -f "$REPO_DIR/.npmrc.docker" ]]; then
  if ! grep -q '\${GITEA_NPM_HOST}' "$REPO_DIR/.npmrc.docker"; then
    echo "✗ F4: .npmrc.docker doesn't use \${GITEA_NPM_HOST} placeholder"
    FAILED=1
  fi
fi

# Check 4: .dockerignore doesn't exclude pnpm-lock.yaml (warning only)
if grep -q '^pnpm-lock\.yaml$' "$REPO_DIR/.dockerignore" 2>/dev/null; then
  echo "⚠ F1: .dockerignore excludes pnpm-lock.yaml (blocks lockfile optimization)"
fi

# Check 5: base image is on approved list
for df in "$REPO_DIR"/{backend,web}/Dockerfile; do
  [[ -f "$df" ]] || continue
  if ! grep -qE 'FROM (\$\{BASE_IMAGE\}|node:22-(alpine|slim))' "$df"; then
    echo "✗ Unapproved base image in $df"
    FAILED=1
  fi
done

exit "$FAILED"
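
The skeleton does not yet cover the F14 rule that a Dockerfile copying `.npmrc.docker` must also declare `ARG GITEA_NPM_OWNER`. One way such a check might look is sketched below; `check_owner_arg` is a hypothetical function name, and the shipped linter in common-plat may implement the rule differently.

```shell
#!/usr/bin/env bash
# Sketch of an F14 check: a Dockerfile that COPYs .npmrc.docker should also
# declare ARG GITEA_NPM_OWNER. check_owner_arg is a hypothetical name.
check_owner_arg() {
  local dockerfile="$1"
  # Nothing to verify if the Dockerfile never copies the registry config.
  grep -q 'COPY .*\.npmrc\.docker' "$dockerfile" || return 0
  if ! grep -qE '^ARG[[:space:]]+GITEA_NPM_OWNER' "$dockerfile"; then
    echo "✗ F14: $dockerfile copies .npmrc.docker but lacks ARG GITEA_NPM_OWNER"
    return 1
  fi
}
```

Inside the skeleton's Dockerfile loop it could be called as `check_owner_arg "$df" || FAILED=1`, which keeps `set -e` from aborting on the first failure.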
8. Phase E — Observability / lint (NEW)
Two complementary linters:
- `gitea-doctor`: Gitea registry pre-flight (env + token + connectivity). Already shipped in `common-plat` commit `610a59fd` at `scripts/gitea/doctor.sh`. This roadmap only wires it into CI/build flows (A0-D + E0 below).
- `docker-doctor`: Dockerfile + compose-file static linter (see § 7.6 skeleton). Built as part of this roadmap (E1 below).
The two are intentionally separate concerns:
| Linter | Scope | When to run |
|---|---|---|
| `gitea-doctor` | runtime env, token, registry HTTP 200 | Before every build / deploy |
| `docker-doctor` | static analysis of Dockerfile + compose YAML | On every PR touching those files |
Phase E checklist
- E0. Wire `bash scripts/gitea/doctor.sh --quiet` into every Gitea Actions CI workflow as a pre-build job (addresses F15). Pattern shipped in `common-plat`; replicate via a reusable `actions/gitea-preflight@main` composite if Gitea Actions supports it, otherwise inline.
- E1. Canonical `docker-doctor.sh` landed in `learning_ai_common_plat/scripts/docker-doctor.sh` (`common-plat@130883a7`). 15 checks codified from F1–F18; verified PASS on both pilots and FAIL on un-migrated control (`learning_ai_notes`).
- E2. Per-repo wrappers landed: `clock@aa5202fe7`, `peakpulse@af207b7`.
- E3. Wire into CI: run on PRs touching `Dockerfile`, `docker-compose*.yml`, `.dockerignore`, `.npmrc.docker`.
- E4. Wire into pre-commit hook (warning-only at first, error after 2 weeks).
- E5. Checks documented in `learning_ai_common_plat/AI.dev/SKILLS/docker-doctor.md` (`common-plat@130883a7`).
- E6. Add `make doctor` target to each pilot repo that runs both `gitea-doctor` AND `docker-doctor`.
Checks implemented by docker-doctor.sh:
| Check | Addresses | Action |
|---|---|---|
| Every `web/*.config.*` file is COPY'd | F11, F13 | Error |
| `docker-compose.yml` healthcheck uses `127.0.0.1` | F12 | Error |
| `.npmrc.docker` uses `${GITEA_NPM_HOST}` AND `${GITEA_NPM_OWNER}` placeholders | F4, F14 | Error |
| Dockerfile declares `ARG GITEA_NPM_OWNER` if it COPYs `.npmrc.docker` | F14 | Error |
| `.dockerignore` doesn't exclude `pnpm-lock.yaml` | F1 | Warn (until A3 ADR lands) |
| Base image is on approved list (`node:22-alpine` or `node:22-slim` via `BASE_IMAGE` ARG) | Canonical decision | Error |
| `.docker-deps/` and `*.bak` in `.gitignore` | B3 | Error |
| `docker-compose.yml` passes `GITEA_NPM_OWNER` build arg | F14 | Warn |
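
As an illustration of how one row of this table could translate into code, here is a rough sketch of the B3 `.gitignore` check. `check_gitignore` is a hypothetical name, and the exact rule spellings it accepts (any line starting with `.docker-deps`, plus a literal `*.bak` line) are assumptions rather than the shipped linter's logic.

```shell
#!/usr/bin/env bash
# Sketch of the B3 check: .gitignore should cover .docker-deps/ and *.bak.
# check_gitignore is a hypothetical helper; accepted rule forms are assumed.
check_gitignore() {
  local gitignore="$1" missing=0
  # Accept ".docker-deps", ".docker-deps/" or ".docker-deps/*", optionally rooted.
  if ! grep -qE '^/?\.docker-deps(/|$)' "$gitignore" 2>/dev/null; then
    echo "✗ B3: .gitignore has no .docker-deps/ rule"
    missing=1
  fi
  # -x matches the whole line, -F treats "*.bak" as a literal string.
  if ! grep -qxF '*.bak' "$gitignore" 2>/dev/null; then
    echo "✗ B3: .gitignore has no *.bak rule"
    missing=1
  fi
  return "$missing"
}
```

Because both rules are reported before returning, a single run shows every missing entry instead of stopping at the first.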
9. Open questions (numbered TODOs, not blockers)
1. Shared pnpm cache volume? BuildKit caches are already shared across builds by `id=pnpm`. Test whether a named Docker volume adds anything before adding complexity.
2. Custom base image? Publish `bytelyst/node-pnpm:22{alpine,slim}` with pnpm pre-installed to skip corepack. Cost: image maintenance; benefit: ~5 s/build.
3. CI hostname? Verify `host.docker.internal:host-gateway` works in Gitea Actions Linux runners, or if a CI-specific Dockerfile variant is needed.
4. Multi-platform builds? `linux/amd64` + `linux/arm64` interact awkwardly with cache mounts under `buildx`. Defer to separate roadmap.
5. Workspace flattening? Eliminate the `../learning_ai_common_plat/packages/*` workspace entry inside Docker via a flattened `pnpm-workspace.yaml`. Unlocks `--frozen-lockfile`. Requires lockfile regeneration step.
10. Execution order
1. ✅ v5 commit: roadmap doc v5 lands; F16 documented (`devops_tools@ba8b4d1`).
2. ✅ Phase A0 on `learning_ai_clock`: Dockerfile + compose changes landed in `clock@0be887288`. Initial A0-V blocked on F16/F17/F18.
3. ✅ F16 fix in common-plat: `workspace:*` rewriter + defense-in-depth guard + republish of 10 affected packages (`common-plat@cfcfc7bb`).
4. ✅ F17 fix in common-plat + Gitea config: `ROOT_URL=host.docker.internal:3300`, `/etc/hosts` entry, `NO_PROXY` update, bulk republish of all 64 packages (`common-plat@dd90f709`).
5. ✅ F18 fix in clock: 4 `file:` refs in `web/package.json` rewritten to `*` (`clock@8b5c767a3`).
6. ✅ A0-V on clock PASSED. v6 commit lands (`devops_tools@7627d55`).
7. ✅ A8 + A9 + A1 on clock (correctness + corepack): `clock@f6a806ff3`. Web cold dropped to 174 s; backend essentially flat at 60 s. F11 guard verified (Tailwind utilities present in CSS bundle).
8. ✅ A2 + A4 + A5 + A6 on clock (cache mount + dockerignore): `clock@55e8d22d3`. Warm rebuilds: backend 2.9 s, web 5.4 s (95–97% reduction). A7 metrics table populated in this commit.
9. ✅ Phase A0 → A6 on `learning_ai_peakpulse` backend (`peakpulse@11a6bc5`). Cold 72.2 s, warm 2.7 s. Pattern from clock applied verbatim, plus a side fix for `.docker-deps/.gitkeep` discoverability that was also ported back to clock.
10. ✅ A3 ADR: `docs/adr/0001-docker-build-lockfile-policy.md`. Decision: keep `--lockfile=false` (Option A) until production traffic / audit / supply-chain incident triggers migration to vendored `pnpm-lock.docker.yaml` (Option C). Implementation deferred.
11. ✅ Phase E1/E2/E5: `docker-doctor.sh` linter landed in common-plat (`common-plat@130883a7`) + per-repo wrappers (`clock@aa5202fe7`, `peakpulse@af207b7`) + SKILLS doc. Verified PASS on both pilots, FAIL with 6 specific findings on un-migrated control (`learning_ai_notes`).
12. ✅ Phase B: `docker-prep.sh` hardened + promoted to canonical home in common-plat (`common-plat@a418a23e`). Synced to both pilots (`clock@27034d90f`, `peakpulse@563a45e`). Verified end-to-end on both pilots: dry-run → pack → check (fail) → idempotency guard → restore → `git status` clean.
13. ✅ Phase B4 + E3/E4/E6: pre-commit guard (`common-plat@c908c6d7`) + `.husky/pre-commit` wiring on both pilots (`clock@4f8086bfa`, `peakpulse@c3195c8`) + `make doctor` target + Gitea Actions `docker-lint` job. Verified guard blocks simulated staged tarballs.
14. ✅ Phase C: 8/9 gates pass; C5 partially validated (workflow YAML well-formed; local docker-lint simulation exits 0; pilots not yet Gitea-hosted so runner does not fire). Cold build 64 s, warm 2.6 s / 3.3 s.
15. ✅ Phase D.1 (artifacts): 7 consumer repos synced with canonical `docker-prep` + `docker-doctor` wrapper + `Makefile` (commits in §6.D.1).
16. ✅ Phase D.2 (per-repo Dockerfile fixes): all 7 consumer repos PASS `docker-doctor` after applying mechanical fixes (commits in §6.D.2). Web smoke test (C9) landed on clock to guard F11 regression.
17. ✅ B7-4 AGENTS.md "do not edit" warnings: landed in all 12 consumer repos.
18. ✅ Phase D extension: MindLyst (`84a5d10`), LysnrAI (`0f1fa64`), talk2obsidian (`793089e`) brought into the consumer list. `sync-docker-prep.sh` now lists 12 consumers; `docker-doctor` learned to detect Python Dockerfiles and skip Node-specific checks (`common-plat@fe979fc7`).
19. ✅ Phase D.3 advisory-warning cleanup: 10 repos received mechanical `build.args` injection + `healthcheck.start_period` additions. All 12 repos now `docker-doctor: PASS` with zero errors.
20. ~ C5 partial validation (this session): dummy commit pushed to clock (`682f9629b` / `2f9c8c39a`), confirmed `git push gitea` returns 404 (pilot repos not hosted on Gitea; only `learning_ai_uxui_web` exists there). Workflow YAML validates; local docker-lint simulation exits 0. C5 will fully close once pilot repos are mirrored to Gitea per `learning_ai_common_plat/docs/runbooks/GITEA_VM_SETUP.md`.
11. Risk register
| Risk | Mitigation |
|---|---|
| Removing `pnpm-lock.yaml` from `.dockerignore` exposes a stale or sibling-aware lockfile that breaks Docker installs | Keep `--lockfile=false` for now (A3 ADR); revisit after F2 resolution |
| BuildKit cache mount on shared CI runners causes cross-build interference | Use distinct `id=` per repo (`id=pnpm-${repo}`) if observed |
| `host.docker.internal` doesn't resolve in Linux Docker | `extra_hosts: ["host.docker.internal:host-gateway"]` (A0-4) |
| Removing `.docker-deps/` from default builds breaks repos that haven't done A0 yet | Wildcard `COPY .docker-deps*` keeps both paths working during migration |
| `docker-prep.sh --force` is misused and `.bak` files get committed | Pre-commit hook (B4) blocks `.bak`, `.tgz`, rewritten `package.json` |
| Corp network blocks `host.docker.internal:3300` | Verify SSH tunnel reaches Gitea; document in `operations.md` |
| F11 regression: build green, app ships with no CSS | C9 smoke test + Phase E `docker-doctor.sh` check on `web/*.config.*` COPY coverage |
| F12 regression: healthcheck false-fails on IPv6 | Phase E `docker-doctor.sh` grep for `localhost` in compose files |
| F13 regression: new config file added, Dockerfile forgotten | A8-2 glob COPY pattern (root cause fix) + Phase E lint (defense in depth) |
| `BASE_IMAGE` override in notes diverges silently from canonical | Phase E check against the approved list; document override in repo `AGENTS.md` |
| F14 regression: future Gitea owner rename re-introduces literal in some Dockerfile | Phase E `docker-doctor.sh` checks `.npmrc.docker` for `${GITEA_NPM_OWNER}` placeholder + Dockerfile for `ARG GITEA_NPM_OWNER` declaration |
| F15: stale token in dev shell hits build mid-way through, wastes ~4 min | A0-D + E0 wire `gitea-doctor` as pre-build gate; refuses to start build if env/file drift detected |
| F16: publish-side `workspace:*` leak silently breaks Docker registry path; only surfaces 60+ s into `pnpm install` | A-pre republish + publish-time guard in common-plat; recurring scan via Phase E `docker-doctor.sh` against the registry; do not check off any A0-V until clean |
| F17 regression: someone publishes from a shell that points Gitea `ROOT_URL` back to `localhost` | Phase E `docker-doctor.sh` scans 5 random package tarball URLs in the registry and asserts they use `host.docker.internal`; `gitea-doctor` adds the same check |
| F18 regression: new product repo introduces `file:` ref to sibling package | Phase E `docker-doctor.sh` greps `**/package.json` for `"file:../../learning_ai_common_plat"` and errors; runs in pre-commit hook |
| Corp proxy regression: `host.docker.internal` falls out of `NO_PROXY` on a dev machine | `switch-network.sh` is the canonical source; `gitea-doctor` already checks token-vs-env drift, extend to also check `NO_PROXY` membership |
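
The `NO_PROXY` membership check proposed in the last row could look roughly like the sketch below. `no_proxy_contains` is a hypothetical helper, not something `gitea-doctor` is known to ship; it assumes a comma-separated list, matches whole entries only, and does not interpret wildcard or suffix entries such as `.example.com`.

```shell
#!/usr/bin/env bash
# Sketch: report whether a host appears as an exact entry in NO_PROXY.
# no_proxy_contains is a hypothetical name used for illustration only.
no_proxy_contains() {
  local wanted="$1"
  local list
  # Fall back to the lowercase variable and strip whitespace around entries.
  list="$(printf '%s' "${NO_PROXY:-${no_proxy:-}}" | tr -d '[:space:]')"
  # Wrapping both sides in commas makes the match whole-entry only.
  case ",${list}," in
    *",${wanted},"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A pre-build gate might call `no_proxy_contains host.docker.internal || echo "host.docker.internal missing from NO_PROXY"` and leave the fix itself to `switch-network.sh`.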