Docker Build Optimization Roadmap
Status: Draft v12 (Phases A, B, C, D, E all complete on the 9 consumer repos; 3 non-consumer repos (MindLyst, LysnrAI, talk2obsidian) remain out of scope) · Owner: Platform DevOps · Created: 2026-05-27 · Revised: 2026-05-27
Pilot Docker-build correctness + speed fixes on `learning_ai_clock` (web + backend) and `learning_ai_peakpulse` (backend), then capture the playbook here for ecosystem-wide rollout.

Upstream prerequisite shipped (commit `610a59fd` in `learning_ai_common_plat`): Gitea owner parameterization + helper scripts (`scripts/gitea/doctor.sh`, `scripts/gitea/token.sh`). The `.npmrc` template now resolves owner from `${GITEA_NPM_OWNER:-learning_ai_user}`. All A0-1 work in this roadmap inherits this: Dockerfile/`.npmrc.docker` must use the same `${GITEA_NPM_OWNER}` placeholder, not a hardcoded literal.
0. Pre-flight audit findings (2026-05-27)
A read-only audit of pilot repos + lessons from recent live incidents + the A0-V execution iterations on clock surfaced 18 concrete bugs/gaps (F14–F15 added after the Gitea-hardening commit; F16–F18 added during the A0-V execution sweep on clock, 2026-05-27). The actual state of the ecosystem is closer to the inverse of the casual narrative: tarballs are the de facto default, the Gitea-registry path is partially wired, and there is a separate class of "build green, app broken" silent failures (F11–F13) that the speed-focused plan needs to address first.
| # | Finding | Location | Severity |
|---|---|---|---|
| F1 | `pnpm-lock.yaml` is in `.dockerignore`; any lockfile-based optimization is blocked until removed | `peakpulse/.dockerignore`, `clock/.dockerignore` | High |
| F2 | `pnpm-workspace.yaml` references sibling `../learning_ai_common_plat/packages/*`; `--frozen-lockfile` inside Docker will fail unless workspace is flattened or sibling tree is copied | both pilots | High |
| F3 | `peakpulse/.npmrc.docker` is tarball-only (no `@bytelyst:registry=…` line); the "Gitea-registry" path doesn't work in this repo today | `peakpulse/.npmrc.docker` | High |
| F4 | `clock/.npmrc.docker` hardcodes `http://localhost:3300`; from inside Docker, `localhost` is the container, not the host registry | `clock/.npmrc.docker` | High |
| F5 | `clock/backend/Dockerfile` has neither `ARG GITEA_NPM_HOST` nor a BuildKit secret mount; wholly dependent on pre-populated `.docker-deps/` | `clock/backend/Dockerfile` | High |
| F6 | `clock/web/Dockerfile` accepts `ARG GITEA_NPM_HOST` but never uses it; no `--mount=type=secret` either | `clock/web/Dockerfile` | Medium |
| F7 | `peakpulse/docker-compose.yml` does not pass `GITEA_NPM_HOST` build arg or declare `secrets:` block | `peakpulse/docker-compose.yml` | Medium |
| F8 | `COPY .docker-deps/` is unconditional in every backend Dockerfile; every build requires `docker-prep.sh` to have run OR an empty `.docker-deps/` dir to pre-exist | both repos | Medium |
| F9 | `npm install -g pnpm@10.6.5` runs on every build (no corepack): 5–10 s overhead, no pinning to `packageManager` field | all four Dockerfiles | Low |
| F10 | No BuildKit `--mount=type=cache` for pnpm store; cold install on every rebuild even when deps unchanged | all four Dockerfiles | High (main speed win) |
| F11 | Build-time config file missing from repo or not COPY'd in Dockerfile causes silent UI breakage. Symptom: `next build` succeeds, container is "healthy", but CSS bundle is ~33 KB (only `@font-face`) and all Tailwind classes are absent → UI renders unstyled. Two sub-bugs: (a) `postcss.config.mjs` missing entirely while `@tailwindcss/postcss` is in `package.json` (NoteLett, JarvisJr fixes `dff459e`, `36f6bc1`); (b) file exists but Dockerfile never COPYs it (Clock, LocalMemGPT fixes `a308c6444`, `07cdf6b`). | `*/web/Dockerfile`, `*/web/postcss.config.*` | High |
| F12 | Healthcheck uses `localhost`, resolves to IPv6 `::1`, false-fails. Backend listens on `0.0.0.0` (IPv4 only). `wget --spider http://localhost:.../health` hits `::1`, connection refused, container marked "unhealthy", web service won't start due to `depends_on: condition: service_healthy`. Incident: `learning_ai_jarvis_jr/docker-compose.yml`. | every `docker-compose*.yml` healthcheck | Medium |
| F13 | Enumerated `COPY web/foo ./foo` pattern drifts from filesystem. New config file added to repo but Dockerfile's enumerated COPY list isn't updated. Build succeeds silently with the file absent; behavior diverges from local dev. Root cause of F11(b). | every Dockerfile using enumerated COPY | Medium |
| F14 | Hardcoded Gitea owner (`learning_ai_user`) literally embedded in `.npmrc.docker` + CI workflows + publish scripts across 14 repos. When the org was renamed from `bytelyst` → `learning_ai_user`, every repo needed a manual commit. Resolved upstream in common-plat (`610a59fd`): owner now resolves from `${GITEA_NPM_OWNER:-learning_ai_user}`; `scripts/gitea/{doctor,token}.sh` ship as pre-flight/rotation helpers. Docker work in this roadmap MUST consume the env var, not the literal. | `.npmrc.docker`, Dockerfile ARG/ENV, CI workflows | Medium |
| F15 | Stale shell-env tokens. `~/.gitea_npm_token` rotated on disk; long-lived shells still exported the old value. Caused 401s during `docker compose build` until `source ~/.zshrc`. Mitigation shipped: `bash scripts/gitea/doctor.sh` detects env-vs-file drift and refuses to proceed. Action required in this roadmap: wire doctor as a pre-build CI gate. | dev workstation + CI runners | Low (now caught) |
| F16 | At least 10 published `@bytelyst/*` packages had unrewritten `workspace:*` refs in their `package.json` dependencies. Root cause: `publish-outdated-packages.sh` extracts a pnpm-packed tarball then re-packs with `npm pack` (workaround for a historical Gitea-compat issue with pnpm's tarball format), and `npm pack` doesn't recognize the pnpm-specific `workspace:` protocol, so it passes it through literally. Fixed in `common-plat@cfcfc7bb` (`fix(gitea): rewrite workspace:* in published tarballs (F16)`): inserted a `workspace:*` rewriter between extract and npm-repack + a defense-in-depth grep guard. Republished 10 affected packages. | common-plat publish flow + Gitea registry | Critical (FIXED) |
| F17 | Gitea bakes `localhost:3300` into the `dist.tarball` field of every published package's metadata. Inside Docker, `localhost` is the container itself, not the host, so even after a successful registry-metadata fetch via `host.docker.internal`, pnpm follows the tarball URL to `localhost:3300` and fails with `ECONNREFUSED`. Root cause: Gitea `app.ini`'s `ROOT_URL=http://localhost:3300/` was baked at publish time. Fixed by setting `ROOT_URL=http://host.docker.internal:3300/`, restarting Gitea, adding `127.0.0.1 host.docker.internal` to `/etc/hosts`, adding `host.docker.internal` to `NO_PROXY` (corp proxy was hijacking DNS), and republishing all 64 packages (`common-plat@dd90f709`). | Gitea `app.ini` + host `/etc/hosts` + every dev machine's `switch-network.sh` | Critical (FIXED) |
| F18 | `clock/web/package.json` had 4 `@bytelyst/*` deps declared as `file:` refs to sibling `../../learning_ai_common_plat/packages/*`, a legacy pre-Gitea pattern. Inside Docker those paths don't exist, so `pnpm install` fails with `ERR_PNPM_LINKED_PKG_DIR_NOT_FOUND`. Discovered during clock web A0-V on 2026-05-27. Fixed in `learning_ai_clock@8b5c767a3` by rewriting to `*` semver. Same pattern likely lives in other product repos (especially anything that consumes `@bytelyst/ui`, `@bytelyst/design-tokens`, `@bytelyst/use-theme`); audit needed in Phase D rollout. | `*/web/package.json` (and likely others) | High |
Implications:
- The original "switch to `--frozen-lockfile` + Gitea registry" plan requires two upstream fixes first (F1, F2).
- F11–F13 mean correctness fixes must precede speed fixes, otherwise we ship faster builds of broken apps.
- F16 + F17 are both fixed as of 2026-05-27. Gitea path now works end-to-end on clock. A-pre is largely complete; remaining items (A-pre-4, A-pre-5) become Phase E checks.
- F18 (sibling `file:` refs in product repo manifests) is the same family as F2 but separately tractable: fixed in clock, audit needed across other repos as part of Phase D rollout.
- A linter (Phase E `docker-doctor.sh`) is the durable insurance against F11/F13/F18 recurrence, which is silent in CI today. The registry-side guard (publish-time check for `workspace:*` leaks) shipped in `common-plat@cfcfc7bb` as part of the F16 fix.
1. Context: three build paths
| Path | Status today | Trigger | Notes |
|---|---|---|---|
| `docker-prep.sh` tarballs | De facto default in peakpulse + flowmonk; also works in clock/notes | Run `docker-prep.sh` then `docker compose build` | Hermetic; mutates `package.json`; slow to repack |
| Gitea NPM registry | Partially wired in clock + notes; broken in peakpulse | `docker compose build` with `GITEA_NPM_HOST` arg + secret | Needs `.npmrc.docker` standardization to be the default |
| Legacy `file:` refs | Deprecated | n/a | Removed during pnpm/Gitea migration |
Measurement targets
| Build | Baseline (observed) | Target after Phase A |
|---|---|---|
| Cold (no cache) | ~2–3 min | ≤ 2 min |
| Warm (one source file changed) | ~2–3 min | < 30 s |
| `docker-prep.sh` pack step alone | ~60–90 s | < 30 s (pnpm pack cache) |
Fill in actuals during Phase C.
2. Goals & non-goals
Goals
- ✅ Eliminate F11–F13 class of silent "build green, app broken" failures
- ✅ Cut warm rebuild time via BuildKit pnpm-store cache mount (single biggest speed win)
- ✅ Make `docker-prep.sh` idempotent, safe to re-run, gitignore-clean, and canonical (no per-repo drift)
- ✅ Standardize `.npmrc.docker` across the ecosystem so the Gitea path actually works
- ✅ Fix `docker-compose.yml` to pass `GITEA_NPM_HOST` + secrets so the registry path is usable without manual flags
- ✅ Ship `docker-doctor.sh` CI lint as the durable insurance layer
Non-goals
- ❌ Migrating off pnpm or off the Gitea registry
- ❌ Adopting `--frozen-lockfile` until F2 is resolved (sibling-workspace problem)
- ❌ Publishing `@bytelyst/*` to the public npm registry
- ❌ Multi-platform builds (separate roadmap)
2.5 Canonical decisions
Decisions taken now to avoid contradictions later in the doc:
- Base image: `node:22-alpine` is canonical. For repos blocked by the corporate proxy's Alpine SSL interception (currently only `learning_ai_notes`), the Dockerfile MUST expose:

      ARG BASE_IMAGE=node:22-alpine
      FROM ${BASE_IMAGE} AS builder

  Override per-repo via `--build-arg BASE_IMAGE=node:22-slim`. Document the override in the repo's `AGENTS.md`.
- Healthcheck host: `127.0.0.1` (NOT `localhost`) in every `docker-compose*.yml` `test:` block. See F12.
- Lockfile mode in Docker: `--lockfile=false` for now. `--frozen-lockfile` is blocked on the A3 ADR (F2).
3. Phase A — Correctness + build speed + path correctness
Order matters: A-pre must precede A0 (you can't build via a registry that serves broken metadata); A0 must precede A1+ (you can't optimize a path that doesn't work), and A8+A9 (correctness) must land before measuring speed wins.
A-pre. Make the Gitea registry actually usable from Docker (F16 + F17 + F18)
Owner: learning_ai_common_plat + per-product repo · Status: ✅ done for clock + global config.
Three distinct bugs surfaced during clock A0-V on 2026-05-27:
- F16: Publish flow leaked `workspace:*` into published metadata.
- F17: Gitea baked `localhost:3300` into tarball URLs.
- F18: Product repos had legacy `file:` refs to sibling packages.

- A-pre-1. Audit `publish-outdated-packages.sh`: confirmed it uses `pnpm pack` then re-tars with `npm pack`, which loses `workspace:` rewriting.
- A-pre-2. Patch publish script with a `workspace:*` rewriter + a post-rewrite grep guard. Shipped in `common-plat@cfcfc7bb`.
- A-pre-3. Verify all packages publish with 0 `workspace:*` refs. Confirmed via curl scan across all 64 packages.
- A-pre-4. F17 fix: set Gitea `ROOT_URL=http://host.docker.internal:3300/`, restart Gitea, add `127.0.0.1 host.docker.internal` to `/etc/hosts`, add `host.docker.internal` to `NO_PROXY` in `switch-network.sh`, bulk republish all 64 packages. Shipped in `common-plat@dd90f709`.
- A-pre-5. F18 fix: rewrite `file:../../learning_ai_common_plat/packages/*` refs in `clock/web/package.json` to `*` semver. Shipped in `clock@8b5c767a3`. Audit needed in Phase D for other product repos.
- A-pre-6. Document Gitea config requirements (below).
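The grep guard from A-pre-2 and the F18 audit from A-pre-5 can be approximated with one manifest check. This is a minimal sketch and not the shipped script: the function names `find_local_protocol_refs` and `check_manifest` are hypothetical, and the match is a line-based heuristic that assumes one dependency per line (the usual `package.json` layout) rather than a real JSON parse.

```shell
#!/usr/bin/env bash
# Hypothetical guard: report @bytelyst/* dependency specifiers that only resolve
# inside a pnpm workspace or next to a sibling checkout
# (workspace: from F16, file: from F18).
# Line-based heuristic; assumes one "name": "specifier" pair per line.
find_local_protocol_refs() {
  local manifest="$1"
  grep -nE '"@bytelyst/[^"]+"[[:space:]]*:[[:space:]]*"(workspace|file):' "$manifest"
}

# Returns 0 when no such refs are found, 1 otherwise.
check_manifest() {
  local manifest="$1"
  if find_local_protocol_refs "$manifest"; then
    echo "FAIL: $manifest still has workspace:/file: refs" >&2
    return 1
  fi
  echo "OK: $manifest"
}
```

Under these assumptions, `check_manifest web/package.json` could run in a product repo before a Docker build, and the same function could run against the extracted `package.json` of a tarball before publishing.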
A-pre-6. Gitea configuration prerequisites (one-time per dev machine)
The Gitea registry MUST be configured with ROOT_URL=http://host.docker.internal:3300/
so published tarball URLs are reachable from inside Docker containers. The
host /etc/hosts MUST resolve host.docker.internal to 127.0.0.1 so the
same URLs work from the host shell.
On macOS (Homebrew Gitea):
# 1. Edit Gitea's app.ini
sudo -e /opt/homebrew/var/gitea/custom/conf/app.ini
# change: ROOT_URL = http://localhost:3300/
# to: ROOT_URL = http://host.docker.internal:3300/
# 2. Restart Gitea
brew services restart gitea
# 3. Add /etc/hosts entry so host.docker.internal resolves on the host too
sudo sh -c 'grep -q host.docker.internal /etc/hosts || \
echo "127.0.0.1 host.docker.internal" >> /etc/hosts'
# 4. Ensure host.docker.internal is in NO_PROXY for corp shells
# (already done in switch-network.sh as of common-plat@dd90f709)
source ~/.zshrc # reload
# 5. Verify
curl -sS http://host.docker.internal:3300/api/v1/version
# expected: {"version":"1.25.5"} or similar
A0. Make the Gitea-registry path actually work (clock + peakpulse)
- A0-1. Standardize `.npmrc.docker` to use templated host AND owner so it works on host (`localhost`) and inside Docker (`host.docker.internal`), and so future owner renames are a one-line env change:

      @bytelyst:registry=http://${GITEA_NPM_HOST}:3300/api/packages/${GITEA_NPM_OWNER:-learning_ai_user}/npm/
      //${GITEA_NPM_HOST}:3300/api/packages/${GITEA_NPM_OWNER:-learning_ai_user}/npm/:_authToken=${GITEA_NPM_TOKEN}
      strict-ssl=false
      auto-install-peers=true

  ⚠️ Env-var expansion chain: pnpm expands `${VAR}` in `.npmrc` at read time using the current process environment (see pnpm npmrc docs). That means the Dockerfile MUST do `ARG GITEA_NPM_HOST` + `ARG GITEA_NPM_OWNER` → `ENV GITEA_NPM_HOST=$GITEA_NPM_HOST` / `ENV GITEA_NPM_OWNER=$GITEA_NPM_OWNER` before the `pnpm install` RUN line, AND the `GITEA_NPM_TOKEN` must be exported from the BuildKit secret mount inside the same `RUN` (since secrets don't persist as env across layers).

  Note on F14: The canonical `.npmrc` (host-side) template already uses `${GITEA_NPM_OWNER}` (shipped in common-plat commit `610a59fd`). `.npmrc.docker` lagged behind because Docker builds have a separate file; A0-1 brings them into parity.
- A0-2. Remove `pnpm-lock.yaml` from `.dockerignore` in both repos (fixes F1; harmless under `--lockfile=false` since we don't COPY it, but unblocks future A3)
- A0-3. Add `GITEA_NPM_HOST` + `GITEA_NPM_OWNER` build args + `secrets:` block to every service in `docker-compose.yml`:

      build:
        context: .
        dockerfile: backend/Dockerfile
        args:
          GITEA_NPM_HOST: ${GITEA_NPM_HOST:-host.docker.internal}
          GITEA_NPM_OWNER: ${GITEA_NPM_OWNER:-learning_ai_user}
        secrets:
          - gitea_npm_token

      secrets:
        gitea_npm_token:
          environment: GITEA_NPM_TOKEN
- A0-4. Add `extra_hosts: ["host.docker.internal:host-gateway"]` to each service so Linux Docker can resolve the host
- A0-5. Document required env: `GITEA_NPM_TOKEN` must be exported in the shell that runs `docker compose build` (add to repo `README.md` quickstart). Reference `bash ../learning_ai_common_plat/scripts/gitea/token.sh status` for verification.
- A0-D. Run `gitea-doctor` before any Docker build (addresses F15). Inline into deploy/CI workflows:

      bash ../learning_ai_common_plat/scripts/gitea/doctor.sh --quiet || exit 1
      docker compose build

  - Locally: shell alias or `Makefile` target `make build` that runs doctor then `docker compose build`.
  - In Gitea Actions CI: a pre-job step. If `doctor` exits non-zero, the build is skipped with a clear error rather than failing 4 minutes in with `ERR_PNPM_AUTHENTICATION`.
- A0-V. Verification gate (between A0 and A1): build the registry path without any cache-mount or layer optimizations. Confirm `docker compose build --no-cache` succeeds end-to-end pulling from Gitea. Only proceed to A1 once this is green. Don't conflate "make it work" with "make it fast" in one commit.

  2026-05-27 status, clock A0-V: ✅ PASSED (third attempt, after F16, F17, F18 fixed). Cold-build wall-clock:
  - backend: 59.2 s (commits: `clock@0be887288` + `common-plat@cfcfc7bb` + `common-plat@dd90f709`)
  - web: 3:13 (193 s) (commits: above + `clock@8b5c767a3`)

  Both surfaces resolve `@bytelyst/*` from the Gitea registry end-to-end: no `docker-prep.sh` tarballs, no sibling `file:` refs, no proxy interference. See §3.A7 metrics table.
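The env-vs-file drift that A0-D guards against (F15) comes down to comparing the exported token with the one on disk. The sketch below illustrates that comparison only. It is not the real `doctor.sh`, the function name `check_token_drift` is hypothetical, and the token file path is passed in by the caller so that nothing is assumed about where it lives.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an env-vs-file token drift check (F15).
# Exit codes are illustrative: 0 = match, 1 = drift, 2 = no file, 3 = no env var.
check_token_drift() {
  local token_file="$1"
  local env_token="${GITEA_NPM_TOKEN:-}"
  local file_token

  if [ ! -r "$token_file" ]; then
    echo "FAIL: token file not readable: $token_file"
    return 2
  fi
  if [ -z "$env_token" ]; then
    echo "FAIL: GITEA_NPM_TOKEN is not exported in this shell"
    return 3
  fi
  # Strip whitespace so a trailing newline in the file is not reported as drift.
  file_token="$(tr -d '[:space:]' < "$token_file")"
  if [ "$env_token" != "$file_token" ]; then
    echo "FAIL: exported token differs from $token_file (stale shell?)"
    return 1
  fi
  echo "OK: token in env matches file"
}
```

A wrapper in the spirit of the `make build` target could then run `check_token_drift ~/.gitea_npm_token && docker compose build`, so a stale shell fails immediately instead of partway through `pnpm install`.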
A1. Replace npm install -g pnpm@X with corepack
- A1-1. Replace `RUN npm install -g pnpm@10.6.5` with:

      RUN corepack enable && corepack prepare pnpm@10.6.5 --activate
- A1-2. Verify `packageManager` field in `backend/package.json` and `web/package.json` matches (already `pnpm@10.6.5` in peakpulse backend)
A2. Add BuildKit pnpm-store cache mount
- A2-1. Set `# syntax=docker/dockerfile:1.7` directive at top of every Dockerfile
- A2-2. Wrap install step with cache + secret mount:

      RUN --mount=type=cache,id=pnpm,target=/root/.local/share/pnpm/store \
          --mount=type=secret,id=gitea_npm_token \
          export GITEA_NPM_TOKEN="$(cat /run/secrets/gitea_npm_token 2>/dev/null || echo '')" && \
          pnpm install --ignore-scripts --lockfile=false
- A2-3. Verify cache mount is active: `docker buildx du --filter type=exec.cachemount` shows non-zero size after a build. Real success metric is wall-clock: warm rebuild (touching one source file) drops to < 30 s.
A3. Decide lockfile policy ✅ DONE (ADR-0001)
Two options — pick one in a short ADR before implementing:
- Option 1: Keep `--lockfile=false` (current pragmatic approach)
  - ✅ No sibling-workspace complications
  - ❌ No reproducibility guarantee inside Docker
  - ❌ Slower installs (full resolution every build)
- Option 2: Generate a Docker-only lockfile via `pnpm install --lockfile-only` against a flattened `package.json` that resolves `@bytelyst/*` to semver
  - ✅ Reproducibility
  - ✅ Faster installs
  - ❌ New build step + tooling
  - ❌ Drift risk between dev lockfile and Docker lockfile

- A3-1. ADR written: `docs/adr/0001-docker-build-lockfile-policy.md`. Option 1 accepted (keep `--lockfile=false` short-term; revisit after Phase D).
- A3-2. `--frozen-lockfile` adoption deferred per ADR; tracked as future work in §11.
A4. Restructure layer order
- A4-1. Reorder COPY/RUN so deps-install layer is `package.json` + `.npmrc.docker` ONLY, then a separate layer for `src/`, config files, `shared/`
- A4-2. Move all `ARG` lines that affect deps install before the install step; move `NEXT_PUBLIC_*` ARGs (web) closer to the build step (they invalidate the build layer, not the deps layer)
A5. Gate .docker-deps/ behind a build arg
- A5-1. Add `ARG USE_TARBALLS=false` to Dockerfile
- A5-2. Use wildcard COPY so missing dir doesn't break the build:

      RUN mkdir -p /app/.docker-deps
      COPY .docker-deps* /app/.docker-deps/
- A5-3. Verify `.docker-deps/` is in `.gitignore` and `.dockerignore` does NOT exclude it when tarball mode is in use
A6. .dockerignore audit
- A6-1. Confirm exclusions: `node_modules`, `**/node_modules`, `dist`, `.next`, `*.log`, `.env`, `.env.*`, `.git`, `*.bak`
- A6-2. Remove: `pnpm-lock.yaml` exclusion (was correct under `--lockfile=false`, blocks future optimization)
- A6-3. Confirm `.docker-deps/` is NOT excluded when tarball path is active
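The A6 checks are mechanical enough to script. The following is a rough sketch under simplifying assumptions: patterns are compared as exact lines, negations (`!pattern`) and comments are not interpreted, and the function name `audit_dockerignore` is hypothetical rather than part of `docker-doctor.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical .dockerignore audit for A6-1 and A6-2.
# Prints one line per problem and returns 1 if any were found.
audit_dockerignore() {
  local file="$1" status=0 pattern
  # A6-1: exclusions that should be present (matched as whole literal lines).
  for pattern in node_modules '**/node_modules' dist .next '*.log' .env '.env.*' .git '*.bak'; do
    if ! grep -qxF -- "$pattern" "$file"; then
      echo "MISSING: $pattern"
      status=1
    fi
  done
  # A6-2: the lockfile exclusion should be gone.
  if grep -qxF -- 'pnpm-lock.yaml' "$file"; then
    echo "REMOVE: pnpm-lock.yaml"
    status=1
  fi
  return "$status"
}
```

Running `audit_dockerignore .dockerignore` from a repo root prints nothing and returns 0 when the file matches the list above.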
A7. Measure & record
| Repo | Surface | Cold (A0-V) | Cold (post-A2) | Warm (post-A2) | Notes |
|---|---|---|---|---|---|
| clock | backend | 59.2 s | 64.7 s | 2.9 s | Cold essentially flat (corepack adds ~1 s; cache mount empty on first run). Warm → 95.1% reduction. Commits: clock@8b5c767a3 (A0-V), clock@f6a806ff3 (A1+A8+A9), clock@55e8d22d3 (A2+A5+A6) |
| clock | web | 193 s (3:13) | 291 s (4:51) † | 5.4 s | Warm → 97.2% reduction. † Cold variance — see footer |
| peakpulse | backend | — (was tarball-only path) | 72.2 s | 2.7 s | Warm → 96.3% reduction. Commits: peakpulse@11a6bc5 (Phase A), peakpulse@6523a1a (.gitkeep fix), clock@1465e06b1+d69003c1f (mirror .gitkeep fix) |
Footer note on cold-build variance. Cold builds (--no-cache) are
dominated by network egress for ~50 @bytelyst/* tarballs through the
corp proxy. A second measurement of clock web cold-build came in at
291 s vs 174 s in the previous step — same Dockerfile path, different
network-side latency. Cold build is not the optimization target of
this roadmap; warm rebuild is. Run pnpm store prune on the host or use
a local registry mirror if cold-build determinism is needed.
Measurement commands:
# Cold (clear all layer cache; cache mounts may still persist)
time DOCKER_BUILDKIT=1 docker compose build --no-cache backend
# Warm (one source file changed; deps unchanged)
touch backend/src/server.ts
time DOCKER_BUILDKIT=1 docker compose build backend
# Deps-changed (touch package.json; pnpm store cache helps here)
touch backend/package.json
time DOCKER_BUILDKIT=1 docker compose build backend
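The "Warm → N% reduction" figures in the table can be reproduced from the timings. The percentages appear to be computed against the A0-V cold time where one exists and against the post-A2 cold time otherwise (peakpulse); that reading is an inference from the numbers, not something the table states. The helper name `pct_reduction` is hypothetical.

```shell
#!/usr/bin/env bash
# Percentage reduction from a cold build time to a warm build time (seconds),
# printed with one decimal place to match the table.
pct_reduction() {
  local cold="$1" warm="$2"
  awk -v c="$cold" -v w="$warm" 'BEGIN { printf "%.1f\n", (c - w) / c * 100 }'
}

pct_reduction 59.2 2.9   # clock backend      → 95.1
pct_reduction 193 5.4    # clock web          → 97.2
pct_reduction 72.2 2.7   # peakpulse backend  → 96.3
```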
A8. Config-file COPY audit & canonical pattern (addresses F11, F13)
- A8-1. For every Dockerfile in scope, list all build-time files present in the surface directory (`web/` or `backend/`) that affect the build:
  - `postcss.config.{js,mjs,cjs,ts}`
  - `tailwind.config.{js,mjs,cjs,ts}`
  - `next.config.{js,mjs,ts}`
  - `tsconfig*.json`
  - `package.json`
  - `.npmrc.docker`, `.npmrc`
  - `babel.config.*` (if present)
  - `drizzle.config.*` (if present)
  - `vitest.config.*` (only if the build needs it)

  Verify each is COPY'd in the Dockerfile.
- A8-2. Choose canonical COPY pattern. Decision: middle-ground glob for web surfaces:

      COPY web/*.json web/*.ts web/*.mjs web/*.js web/*.cjs ./
      COPY web/public/ ./public/
      COPY web/src/ ./src/

  Each extension is its own source pattern because Dockerfile `COPY` wildcards follow Go's `filepath.Match` rules, which have no shell-style brace expansion (`*.{json,ts}` would not match). Trade-off: glob picks up unintended root-level files if any are added later, but dramatically reduces F11/F13 risk. Backend surfaces with few root config files can keep enumerated COPY (lower risk surface).
- A8-3. Repo-by-repo migration: replace enumerated `COPY web/foo ./foo` with the glob pattern; verify the resulting image has all expected files via `docker run --rm <img> ls -la`.
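The A8-1 audit can be sketched as a script that compares the config files on disk with what the Dockerfile's `COPY` lines mention. This is an illustrative heuristic, not the `docker-doctor.sh` implementation: the function name `audit_config_copies` is hypothetical, only a subset of the file patterns is covered, and matching is by substring (the file's name, or a `*.<ext>` glob for its extension, on any `COPY` line).

```shell
#!/usr/bin/env bash
# Hypothetical A8-1 audit: report build-time config files that exist in the
# surface directory but are not referenced by any COPY line in the Dockerfile.
audit_config_copies() {
  local surface_dir="$1" dockerfile="$2" status=0 path name ext copy_lines
  copy_lines="$(grep -E '^[[:space:]]*COPY[[:space:]]' "$dockerfile" || true)"
  for path in "$surface_dir"/postcss.config.* "$surface_dir"/tailwind.config.* \
              "$surface_dir"/next.config.* "$surface_dir"/tsconfig*.json; do
    [ -e "$path" ] || continue          # an unmatched glob stays literal; skip it
    name="$(basename "$path")"
    ext="${name##*.}"
    # Covered if a COPY line names the file or globs its extension (e.g. web/*.mjs).
    if ! printf '%s\n' "$copy_lines" | grep -qF -e "$name" -e "*.${ext}"; then
      echo "NOT COPIED: $name"
      status=1
    fi
  done
  return "$status"
}
```

For example, `audit_config_copies web web/Dockerfile` would flag a `postcss.config.mjs` that exists in `web/` but is neither named nor covered by a `*.mjs` glob, which is the F11(b) situation.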
A9. Healthcheck canonicalization (addresses F12)
- A9-1. Replace `localhost` with `127.0.0.1` in every `docker-compose*.yml` healthcheck `test:` block. Sweep with:

      rg -l 'http://localhost' --glob 'docker-compose*.yml'
- A9-2. Standardize healthcheck shape:
  - Alpine-based images:

        healthcheck:
          test: ["CMD-SHELL", "wget -q --spider http://127.0.0.1:${PORT}/health || exit 1"]
          interval: 30s
          timeout: 5s
          retries: 3
          start_period: 10s
  - Slim/Debian images (`wget` not always present, but `node` is):

        healthcheck:
          test: ["CMD-SHELL", "node -e \"fetch('http://127.0.0.1:${PORT}/health').then(r=>process.exit(r.ok?0:1)).catch(()=>process.exit(1))\""]

  `${PORT}` stands for the service port. Compose interpolates `${PORT}` from the host environment when it parses the file, so either write the literal port (as in § 7.3) or use `$${PORT}` to defer expansion to the container's shell.
- A9-3. Add `start_period` (10s minimum): prevents flaky "container started but app not yet listening" false-negatives.
4. Phase B — Hermetic-fallback polish (docker-prep.sh)
docker-prep.sh is duplicated with minor variations across product repos.
Promotion to canonical home is now in Phase B, not Phase D — drift
compounds linearly with time and the .npmrc template precedent proves the
pattern is cheap.
- B1. `--dry-run` flag (`common-plat@a418a23e`).
- B2. Idempotency guard via `*.bak` detection + `--force` override (`common-plat@a418a23e`).
- B3. `.docker-deps/` and `*.bak` in `.gitignore` on both pilots (clock + peakpulse). Verified by `docker-doctor.sh`.
- B4. Pre-commit hook landed. Canonical guard script `check-docker-prep-staged.sh` (`common-plat@c908c6d7`) blocks rewritten `package.json`, staged `.tgz` tarballs, and `.bak` files. Wired into both pilot `.husky/pre-commit` (`clock@4f8086bfa`, `peakpulse@c3195c8`). Verified with simulated staged tarballs → commit blocked.

  Original spec:

      # .husky/pre-commit
      if git diff --cached --name-only | xargs grep -l '"file:\.\./\.docker-deps/' 2>/dev/null; then
        echo "ERROR: rewritten package.json detected. Run scripts/docker-prep.sh --restore first."
        exit 1
      fi
      if git diff --cached --name-only | grep -qE '(\.docker-deps/.*\.tgz|package\.json\.bak)$'; then
        echo "ERROR: docker-prep.sh artifacts staged. Run --restore first."
        exit 1
      fi
- B5. Auto-restore on script error via `trap cleanup_on_error EXIT` + `--keep` opt-out (`common-plat@a418a23e`).
- B6. Standardized header + usage block per § 7.4 template (`common-plat@a418a23e`).
- B7. CANONICAL HOME landed.
  - B7-1. Canonical at `learning_ai_common_plat/scripts/docker-prep.template.sh` + 2 helpers `_docker-prep-inject.js`, `_docker-prep-strip.js` (`common-plat@a418a23e`).
  - B7-2. `learning_ai_common_plat/scripts/sync-docker-prep.sh` syncs all 3 files (mirrors `sync-npmrc.sh`).
  - B7-3. `learning_ai_common_plat/scripts/check-docker-prep-drift.sh` for CI (mirrors `check-npmrc-drift.sh`).
  - B7-4. AGENTS.md "NEVER edit `docker-prep.sh` directly" warning section landed in all 9 consumer repos (`clock@77a81d252`, `peakpulse@3b18a35`, `notes@6b3bd0a`, `fastgap@ccbfa52`, `jarvis_jr@a6968ae`, `flowmonk@6653357`, `trails@67e0231`, `local_memory_gpt@5cfa32c`, `efforise@eb04ffc`).
- B8. `--strip-overrides` option removes `pnpm.overrides` block as a safety net (`common-plat@a418a23e`).
- B+. `--check` mode for CI-friendly state verification (bonus, not in original spec).
- B+. Portable `sed -i` (BSD on macOS, GNU on Linux).
- B+. Preserve `.docker-deps/.gitkeep` on clear (fixes earlier regression where `--restore` deleted the tracked file).
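The B2 idempotency guard amounts to "refuse to start if a previous run left backups behind". The sketch below shows that idea under assumptions: it looks only for `package.json.bak` (the backup named in § 7.4), and the function name `guard_against_rerun` is hypothetical rather than taken from the canonical script.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the B2 guard: refuse to proceed when a package.json.bak
# from an earlier run is still present, unless --force is given.
guard_against_rerun() {
  local repo_root="$1" force="${2:-}" leftovers
  # Skip node_modules so the scan stays fast and ignores vendored files.
  leftovers="$(find "$repo_root" -name 'package.json.bak' -not -path '*/node_modules/*' 2>/dev/null)"
  if [ -n "$leftovers" ] && [ "$force" != "--force" ]; then
    echo "ERROR: previous run not restored (found .bak files). Use --restore or --force." >&2
    return 1
  fi
  return 0
}
```

Checking for leftovers before touching anything is what makes a second invocation safe: the rewrite never runs on an already-rewritten `package.json`.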
5. Phase C — Verification gates
Pilot exit criteria (must all pass before Phase D):
- C1. Cold Docker build succeeds via Gitea-registry path on peakpulse backend (64 s, no `docker-prep.sh` invocation).
- C2. Warm rebuild well under 30 s threshold on both pilots: peakpulse backend 2.6 s, clock backend 3.3 s.
- C3. `docker-prep.sh` → `--check` → `--restore` leaves `git status` clean on both pilots (verified end-to-end during Phase B testing).
- C4. Pre-commit hook blocks staged tarballs + `.bak` files (verified by simulating staged artifacts on clock).
- C5. Gitea Actions CI green: docker-lint job added to both pilot `ci.yml` (`clock@4f8086bfa`, `peakpulse@c3195c8`); needs next CI run to confirm.
- C6. Build-time metrics already populated in § 3.A7 from earlier Phase A work.
- C7. ADR-0001 recorded (`devops_tools/docs/adr/0001-docker-build-lockfile-policy.md`).
- C8. `docker-doctor.sh` PASS on both pilots (only the 1 expected `pnpm-lock.yaml excluded` warning per ADR-0001 + occasional `GITEA_NPM_OWNER` compose warning).
- C9. Web smoke test landed as Playwright spec `web/e2e/css-bundle-smoke.spec.ts` (`clock@b8440bfea`). Asserts title sanity + largest CSS bundle > 20 KB. Catches F11 regression at PR time.
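The bundle-size half of the C9 assertion can also be approximated on the filesystem, without a browser. The sketch below is not the Playwright spec: the function names are hypothetical, the 20480-byte default only mirrors the "> 20 KB" threshold stated above, and the build output directory is supplied by the caller because its layout depends on the bundler.

```shell
#!/usr/bin/env bash
# Hypothetical filesystem-level variant of the C9 size check.
# Prints the size in bytes of the largest *.css file under a directory (0 if none).
largest_css_bytes() {
  local dir="$1"
  find "$dir" -type f -name '*.css' -exec wc -c {} \; |
    awk '{ if ($1 + 0 > max) max = $1 + 0 } END { print max + 0 }'
}

# Fails when the largest CSS file is not larger than the threshold.
check_css_bundle() {
  local dir="$1" min_bytes="${2:-20480}" largest
  largest="$(largest_css_bytes "$dir")"
  if [ "$largest" -le "$min_bytes" ]; then
    echo "FAIL: largest CSS bundle is ${largest} bytes (need > ${min_bytes})"
    return 1
  fi
  echo "OK: largest CSS bundle is ${largest} bytes"
}
```

A check like this could run right after the web build step in CI, before any container is started; it complements the browser test rather than replacing it, since it says nothing about whether the page actually loads the stylesheet.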
6. Phase D — Ecosystem rollout
Status: DONE for the 9 consumer repos (D.1 artifacts + D.2 Dockerfile/compose fixes + B7-4 AGENTS.md notes). 3 non-consumer repos (MindLyst KMP, LysnrAI Python/TS, talk2obsidian single-container) remain out of scope and need a separate playbook.
D.1 — Tooling rollout (DONE)
All 9 consumer repos received the canonical infrastructure via sync-docker-prep.sh:
- `scripts/docker-prep.sh` + `_docker-prep-inject.js` + `_docker-prep-strip.js` (canonical sync)
- `scripts/docker-doctor.sh` (thin wrapper to canonical linter)
- `Makefile` with `make doctor` target
| Repo | Commit | Findings (docker-doctor warn-only) |
|---|---|---|
| `learning_ai_notes` | `216ebb8` | 6 warnings + errors: F12 localhost, F14 ARG missing (×2), A5-2 wildcard (×2), F11/F13 web glob, A2 syntax directive |
| `learning_ai_fastgap` | `36b67a2` | 4: F4/F14 .npmrc.docker hardcoded, F14 ARG missing, A5-2 wildcard, A2 syntax |
| `learning_ai_jarvis_jr` | `523dc08` | 5: F14 ARG missing (×2), A5-2 wildcard (×2), F11/F13 web glob, A2 syntax (×2) |
| `learning_ai_flowmonk` | `65628f3` | 4: F14 ARG missing (×2), A5-2 wildcard (×2), F11/F13 web glob, A2 syntax |
| `learning_ai_trails` | `8aef82c` | 6: F12 localhost, F14 ARG missing (×2), A5-2 wildcard (×2), A2 syntax (×2) |
| `learning_ai_local_memory_gpt` | `d17689a` | 5: F14 ARG missing (×2), A5-2 wildcard (×2), F11/F13 web glob, A2 syntax (×2) |
| `learning_ai_efforise` | `b9fbbc3` | 5: F12 localhost, F14 ARG missing (×2), A5-2 wildcard (×2), A2 syntax (×2) |
| `learning_multimodal_memory_agents` (MindLyst) | pending | not in `sync-docker-prep.sh` consumer list; KMP repo, no `docker-prep.sh` currently |
| `learning_voice_ai_agent` (LysnrAI) | pending | not in consumer list; Python desktop + TS dashboards; needs separate scope |
| `learning_ai_auth_app` | n/a | iOS/Android; no Docker surfaces |
| `learning_ai_talk2obsidian` | pending | single-container app; follow-up |
D.2 — Per-repo Dockerfile/compose fixes (DONE)
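All 7 non-pilot consumer repos (the 9 consumer repos minus the clock and peakpulse
pilots, which were fixed in Phases A–C) received mechanical Phase D.2 fixes via an
idempotent fixer script. Each repo's docker-doctor.sh now exits PASS (warnings only).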
| Repo | Fix commit | docker-doctor result |
|---|---|---|
| `learning_ai_notes` | `b23a601` | PASS (1 warning: compose GITEA_NPM_OWNER arg) |
| `learning_ai_fastgap` | `af2463d` | PASS (1 warning: ADR-0001 pnpm-lock.yaml) |
| `learning_ai_jarvis_jr` | `1a97a3f` | PASS (1 warning: ADR-0001 pnpm-lock.yaml) |
| `learning_ai_flowmonk` | `412a657` | PASS (1 warning: compose GITEA_NPM_OWNER arg) |
| `learning_ai_trails` | `733477a` | PASS (1 warning: compose GITEA_NPM_OWNER arg) |
| `learning_ai_local_memory_gpt` | `8c68595` | PASS (1 warning: compose GITEA_NPM_OWNER arg) |
| `learning_ai_efforise` | `06ea0d0` | PASS (1 warning: healthcheck start_period) |
Applied fixes (each fix is idempotent):
| Finding | Fix |
|---|---|
| F12 healthcheck `localhost` | Replaced with `127.0.0.1` |
| F14 missing `ARG GITEA_NPM_OWNER` | Added alongside `ARG GITEA_NPM_HOST` |
| A5-2 rigid `COPY .docker-deps/` | Changed to wildcard `COPY .docker-deps* ...` |
| F11/F13 enumerated web config COPY | Replaced with glob `COPY web/*.json web/*.ts web/*.mjs ./` |
| A2 missing syntax directive | Added `# syntax=docker/dockerfile:1.7` |
| F4/F14 hardcoded `.npmrc.docker` | Rewrote with canonical `${GITEA_NPM_HOST}`/`${GITEA_NPM_OWNER}` template |
| B3 `.gitignore` missing `*.bak` | Added rule |
| B3 missing `.docker-deps/.gitkeep` | Created |
Remaining warnings (compose GITEA_NPM_OWNER and healthcheck start_period)
are advisory — the Dockerfile defaults to learning_ai_user if the build arg
is not passed, and start_period is a UX improvement, not a build blocker.
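What "idempotent" means for a fix like "A2 missing syntax directive" can be shown with a small example: check first, change only if needed, so a second run is a no-op. This is a sketch and not the fixer script that was actually used; the function name `ensure_syntax_directive` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical idempotent fix: prepend the BuildKit syntax directive only when
# the first line of the Dockerfile is not already a syntax directive.
ensure_syntax_directive() {
  local dockerfile="$1" directive='# syntax=docker/dockerfile:1.7' tmp
  if head -n 1 "$dockerfile" | grep -q '^# syntax='; then
    echo "unchanged: $dockerfile"
    return 0
  fi
  tmp="$(mktemp)"
  { printf '%s\n' "$directive"; cat "$dockerfile"; } > "$tmp"
  # Write back through redirection (not mv) so the original file mode is kept.
  cat "$tmp" > "$dockerfile"
  rm -f "$tmp"
  echo "fixed: $dockerfile"
}
```

Running it twice on the same file reports `fixed` and then `unchanged`, which is the property that lets a fixer be re-run across all repos without piling up duplicate edits.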
7. Reference snippets
7.1 Canonical .npmrc.docker
Matches the host-side .npmrc template shipped in common-plat 610a59fd.
@bytelyst:registry=http://${GITEA_NPM_HOST}:3300/api/packages/${GITEA_NPM_OWNER:-learning_ai_user}/npm/
//${GITEA_NPM_HOST}:3300/api/packages/${GITEA_NPM_OWNER:-learning_ai_user}/npm/:_authToken=${GITEA_NPM_TOKEN}
strict-ssl=false
auto-install-peers=true
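To see which registry URL the template should resolve to for a given environment, the placeholders can be mimicked in plain shell. The sketch uses shell parameter expansion purely as a stand-in for pnpm's own `.npmrc` expansion, and the function name `render_registry_url` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the registry URL the template is expected to resolve to.
# GITEA_NPM_HOST has no default in the template, so it is required here;
# GITEA_NPM_OWNER falls back to learning_ai_user, as in the template.
render_registry_url() {
  local host="${GITEA_NPM_HOST:-}" owner="${GITEA_NPM_OWNER:-learning_ai_user}"
  if [ -z "$host" ]; then
    echo "GITEA_NPM_HOST is not set" >&2
    return 1
  fi
  printf 'http://%s:3300/api/packages/%s/npm/\n' "$host" "$owner"
}
```

With `GITEA_NPM_HOST=host.docker.internal` and no owner set, this prints `http://host.docker.internal:3300/api/packages/learning_ai_user/npm/`, which can be pasted into `curl` to confirm the registry answers before starting a build.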
7.2 Canonical backend Dockerfile
# syntax=docker/dockerfile:1.7
ARG BASE_IMAGE=node:22-alpine
FROM ${BASE_IMAGE} AS builder
WORKDIR /app/backend
ARG GITEA_NPM_HOST=host.docker.internal
ARG GITEA_NPM_OWNER=learning_ai_user
ARG USE_TARBALLS=false
ENV NODE_TLS_REJECT_UNAUTHORIZED=0
ENV NPM_CONFIG_STRICT_SSL=false
ENV GITEA_NPM_HOST=$GITEA_NPM_HOST
ENV GITEA_NPM_OWNER=$GITEA_NPM_OWNER
RUN corepack enable && corepack prepare pnpm@10.6.5 --activate
# ── Deps layer (cacheable) ─────────────────────────────────────────
COPY .npmrc.docker ./.npmrc
COPY backend/package.json ./package.json
# Tolerate missing .docker-deps/ when in registry mode
RUN mkdir -p /app/.docker-deps
COPY .docker-deps* /app/.docker-deps/
RUN --mount=type=cache,id=pnpm,target=/root/.local/share/pnpm/store \
--mount=type=secret,id=gitea_npm_token \
export GITEA_NPM_TOKEN="$(cat /run/secrets/gitea_npm_token 2>/dev/null || echo '')" && \
pnpm install --ignore-scripts --lockfile=false
# ── Source layer (changes most often) ──────────────────────────────
COPY backend/tsconfig.json ./tsconfig.json
COPY backend/src/ ./src/
COPY shared/ ../shared/
RUN pnpm run build
# ── Runtime ────────────────────────────────────────────────────────
FROM ${BASE_IMAGE}
WORKDIR /app/backend
ENV NODE_ENV=production
COPY --from=builder /app/backend/node_modules ./node_modules
COPY --from=builder /app/backend/package.json ./package.json
COPY --from=builder /app/backend/dist ./dist
COPY shared/ ../shared/
EXPOSE 4010
CMD ["node", "dist/server.js"]
`--lockfile=false` is intentional pending the A3 ADR. Switch to `--frozen-lockfile` only once the sibling-workspace problem (F2) is resolved.
7.3 Canonical docker-compose.yml service block
services:
  backend:
    build:
      context: .
      dockerfile: backend/Dockerfile
      args:
        GITEA_NPM_HOST: host.docker.internal
        # A0-3: pass the owner too so renames stay a one-line env change
        GITEA_NPM_OWNER: ${GITEA_NPM_OWNER:-learning_ai_user}
      secrets:
        - gitea_npm_token
    extra_hosts:
      - "host.docker.internal:host-gateway"
    ports:
      - "4010:4010"
    environment:
      - NODE_ENV=production
      - PORT=4010
      # ...
    restart: unless-stopped
    healthcheck:
      # F12: use 127.0.0.1 NOT localhost (IPv6 resolution false-fails)
      test: ["CMD-SHELL", "wget -q --spider http://127.0.0.1:4010/health || exit 1"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
secrets:
  gitea_npm_token:
    environment: GITEA_NPM_TOKEN
7.4 Hardened docker-prep.sh header
#!/usr/bin/env bash
# Hermetic Docker-build helper. Packs @bytelyst/* tarballs from the sibling
# common-plat repo when the Gitea npm registry is unreachable.
#
# Use this ONLY when:
# - Local Gitea registry (:3300) is down or unreachable, OR
# - You need a Docker build that includes uncommitted common-plat changes.
#
# For normal builds (Gitea up + clean common-plat), use:
# docker compose build
#
# Usage:
# ./scripts/docker-prep.sh # pack tarballs + rewrite package.json
# ./scripts/docker-prep.sh --dry-run # show what would change (no side effects)
# ./scripts/docker-prep.sh --force # override idempotency guard
# ./scripts/docker-prep.sh --restore # undo rewrite
# ./scripts/docker-prep.sh --keep # skip auto-restore on error
# ./scripts/docker-prep.sh --strip-overrides # remove pnpm.overrides block
#
# Side effects:
# - Creates .docker-deps/ (gitignored)
# - Backs up package.json → package.json.bak
# - Rewrites @bytelyst/* deps to file:../.docker-deps/<tarball>
# - Injects pnpm.overrides for transitive @bytelyst/* deps
#
# Safety:
# - Refuses to run if .bak files already exist (unless --force)
# - Auto-restores on error (trap EXIT) unless --keep passed
# - Pre-commit hook blocks committing rewritten package.json, .tgz, .bak
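The safety rules in the header (idempotency guard, backup, auto-restore on error) can be sketched as follows. This is an illustration under stated assumptions, not the canonical `docker-prep.sh`: the function names and the `PKG`, `FORCE`, and `KEEP` variables are hypothetical stand-ins for the script's real flag handling, and keeping an existing backup under `--force` is a guess at the intended behavior.

```shell
# Sketch of the guard / backup / auto-restore flow described in the header.
# PKG, FORCE, KEEP and all function names are illustrative assumptions.
PKG="${PKG:-package.json}"   # file that gets rewritten
FORCE="${FORCE:-0}"          # 1 behaves like --force
KEEP="${KEEP:-0}"            # 1 behaves like --keep

guard_no_backup() {
  # Idempotency guard: a leftover .bak means an earlier run was never restored.
  if [ -e "$PKG.bak" ] && [ "$FORCE" != "1" ]; then
    echo "refusing to run: $PKG.bak already exists (use --force)" >&2
    return 1
  fi
}

restore_pkg() {
  # Undo the rewrite by moving the backup over the rewritten file.
  if [ -e "$PKG.bak" ]; then mv -f "$PKG.bak" "$PKG"; fi
}

prep() {
  # Runs in a subshell so the EXIT trap fires when this one call ends.
  (
    guard_no_backup || exit 1
    # Keep an existing backup when forced, so the true original survives.
    [ -e "$PKG.bak" ] || cp -p "$PKG" "$PKG.bak"
    # Auto-restore on a non-zero exit unless --keep was requested.
    trap 'rc=$?; if [ "$rc" -ne 0 ] && [ "$KEEP" != "1" ]; then restore_pkg; fi' EXIT
    "$@"   # the rewrite step, supplied by the caller
  )
}
```

The caller passes the rewrite step as a command, for example `prep node scripts/rewrite-deps.js`, where the script name is hypothetical. A failed rewrite leaves `package.json` as it was; a successful one leaves `package.json.bak` in place for `--restore`.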
7.5 Canonical Next.js web Dockerfile (addresses F11, F13)
# syntax=docker/dockerfile:1.7
ARG BASE_IMAGE=node:22-alpine
FROM ${BASE_IMAGE} AS deps
WORKDIR /app/web
ARG GITEA_NPM_HOST=host.docker.internal
ARG GITEA_NPM_OWNER=learning_ai_user
ENV NODE_TLS_REJECT_UNAUTHORIZED=0
ENV NPM_CONFIG_STRICT_SSL=false
ENV GITEA_NPM_HOST=$GITEA_NPM_HOST
ENV GITEA_NPM_OWNER=$GITEA_NPM_OWNER
RUN corepack enable && corepack prepare pnpm@10.6.5 --activate
COPY .npmrc.docker ./.npmrc
COPY web/package.json ./package.json
RUN mkdir -p /app/.docker-deps
COPY .docker-deps* /app/.docker-deps/
RUN --mount=type=cache,id=pnpm,target=/root/.local/share/pnpm/store \
    --mount=type=secret,id=gitea_npm_token \
    export GITEA_NPM_TOKEN="$(cat /run/secrets/gitea_npm_token 2>/dev/null || echo '')" && \
    pnpm install --ignore-scripts --lockfile=false
# ── Builder ────────────────────────────────────────────────────────
FROM ${BASE_IMAGE} AS builder
WORKDIR /app/web
COPY --from=deps /app/web/node_modules ./node_modules
COPY --from=deps /app/web/package.json ./package.json
# F11/F13 fix: glob ALL root-level config files instead of enumerating.
# Picks up postcss.config.*, tailwind.config.*, next.config.*, tsconfig*,
# any future *.config.* additions without Dockerfile changes.
COPY web/*.json web/*.ts web/*.mjs web/*.js web/*.cjs ./
COPY web/public/ ./public/
COPY web/src/ ./src/
COPY shared/ ../shared/
ARG NEXT_PUBLIC_BACKEND_URL
ARG NEXT_PUBLIC_PLATFORM_SERVICE_URL
ENV NEXT_PUBLIC_BACKEND_URL=$NEXT_PUBLIC_BACKEND_URL
ENV NEXT_PUBLIC_PLATFORM_SERVICE_URL=$NEXT_PUBLIC_PLATFORM_SERVICE_URL
ENV NEXT_TELEMETRY_DISABLED=1
RUN corepack enable && pnpm run build
# ── Runtime (Next.js standalone) ───────────────────────────────────
FROM ${BASE_IMAGE} AS runner
WORKDIR /app/web
ENV NODE_ENV=production
ENV NEXT_TELEMETRY_DISABLED=1
COPY --from=builder /app/web/.next/standalone ./
# Next 16 standalone server runs as `node web/server.js` from /app/web,
# so static assets live at /app/web/web/.next/static (NOT ./.next/static).
COPY --from=builder /app/web/.next/static ./web/.next/static
COPY --from=builder /app/web/public ./web/public
EXPOSE 3000
ENV PORT=3000
ENV HOSTNAME=0.0.0.0
CMD ["node", "web/server.js"]
Verification step after every web Dockerfile change: smoke-test the built image by running it and curling the rendered HTML. Confirm the CSS bundle referenced by the `<link>` tags is > 50 KB. A bundle of ~33 KB is the F11 signature (only `@font-face`, no Tailwind utilities).
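The size check in that verification step could be scripted roughly as below. This is a sketch, not the C9 smoke test itself: `css_bundle_ok` is a hypothetical name, the threshold treats 50 KB as 50 × 1024 bytes (an assumption), and the CSS file is expected to have been downloaded from the running container beforehand.

```shell
# Sketch of the F11 size check. The 50 KB threshold comes from the text above;
# the function name and the 1024-byte KB are assumptions.
MIN_CSS_BYTES=$((50 * 1024))

css_bundle_ok() {
  # $1: path to a CSS file already fetched from the running container.
  size=$(wc -c < "$1" | tr -d ' ')
  if [ "$size" -gt "$MIN_CSS_BYTES" ]; then
    echo "ok: $1 is $size bytes"
  else
    echo "F11 suspect: $1 is only $size bytes (want > $MIN_CSS_BYTES)" >&2
    return 1
  fi
}
```

One way to obtain the file is to `curl` the page on `http://127.0.0.1:3000/`, pull the first stylesheet path out of the HTML, and `curl` that path into a temp file before calling `css_bundle_ok`. The exact path pattern depends on the Next.js version, so it is left out here.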
7.6 docker-doctor.sh skeleton (Phase E)
#!/usr/bin/env bash
# docker-doctor.sh: pre-flight Dockerfile + docker-compose health checks.
# Run on PRs touching Dockerfile, docker-compose*.yml, .dockerignore.
set -euo pipefail
REPO_DIR="$(cd "$(dirname "$0")/.." && pwd)"
FAILED=0
# Check 1 (A8/F11/F13): every config file in web/ is COPY'd in web/Dockerfile
for cfg in postcss.config tailwind.config next.config; do
  for f in "$REPO_DIR"/web/${cfg}.{js,mjs,cjs,ts}; do
    [[ -f "$f" ]] || continue
    base=$(basename "$f")
    if ! grep -q "COPY web/${base}\\|COPY web/\\*" "$REPO_DIR/web/Dockerfile" 2>/dev/null; then
      echo "✗ F11/F13: $base exists but not COPY'd in web/Dockerfile"
      FAILED=1
    fi
  done
done
# Check 2 (A9/F12): healthchecks use 127.0.0.1
if grep -rE 'test:.*http://localhost' "$REPO_DIR"/docker-compose*.yml 2>/dev/null; then
  echo "✗ F12: healthcheck uses localhost (should be 127.0.0.1)"
  FAILED=1
fi
# Check 3: .npmrc.docker matches canonical template
if [[ -f "$REPO_DIR/.npmrc.docker" ]]; then
  if ! grep -q '\${GITEA_NPM_HOST}' "$REPO_DIR/.npmrc.docker"; then
    echo "✗ F4: .npmrc.docker doesn't use \${GITEA_NPM_HOST} placeholder"
    FAILED=1
  fi
fi
# Check 4: .dockerignore doesn't exclude pnpm-lock.yaml
if grep -q '^pnpm-lock\.yaml$' "$REPO_DIR/.dockerignore" 2>/dev/null; then
  echo "⚠ F1: .dockerignore excludes pnpm-lock.yaml (blocks lockfile optimization)"
fi
# Check 5: base image is on approved list
for df in "$REPO_DIR"/{backend,web}/Dockerfile; do
  [[ -f "$df" ]] || continue
  if ! grep -qE 'FROM (\$\{BASE_IMAGE\}|node:22-(alpine|slim))' "$df"; then
    echo "✗ Unapproved base image in $df"
    FAILED=1
  fi
done
exit $FAILED
8. Phase E — Observability / lint (NEW)
Two complementary linters:
- `gitea-doctor`: Gitea registry pre-flight (env + token + connectivity). Already shipped in `common-plat` commit `610a59fd` at `scripts/gitea/doctor.sh`. This roadmap only wires it into CI/build flows (A0-D + E0 below).
- `docker-doctor`: Dockerfile + compose-file static linter (see § 7.6 skeleton). To be built as part of this roadmap.
The two are intentionally separate concerns:
| Linter | Scope | When to run |
|---|---|---|
| `gitea-doctor` | runtime env, token, registry HTTP 200 | Before every build / deploy |
| `docker-doctor` | static analysis of Dockerfile + compose YAML | On every PR touching those files |
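Keeping the two linters separate does not rule out a single entry point that runs both. A minimal sketch of such a runner, assuming the script paths are passed in by the caller (`run_doctors` is a hypothetical name, not the shipped `make doctor` recipe):

```shell
# Sketch of a combined entry point, e.g. behind a `make doctor` target.
# run_doctors is an illustrative name; script paths come from the caller.
run_doctors() {
  # Run every linter even if an earlier one fails, so one pass reports all findings.
  rc=0
  for script in "$@"; do
    if ! bash "$script"; then
      echo "doctor failed: $script" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

A call such as `run_doctors scripts/gitea/doctor.sh scripts/docker-doctor.sh` returns non-zero if either linter fails, while each script stays independently runnable.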
Phase E checklist
- E0. Wire `bash scripts/gitea/doctor.sh --quiet` into every Gitea Actions CI workflow as a pre-build job (addresses F15). Pattern shipped in `common-plat`; replicate via a reusable `actions/gitea-preflight@main` composite if Gitea Actions supports it, otherwise inline.
- E1. Canonical `docker-doctor.sh` landed in `learning_ai_common_plat/scripts/docker-doctor.sh` (`common-plat@130883a7`). 15 checks codified from F1–F18; verified PASS on both pilots and FAIL on un-migrated control (`learning_ai_notes`).
- E2. Per-repo wrappers landed: `clock@aa5202fe7`, `peakpulse@af207b7`.
- E3. Wire into CI: run on PRs touching `Dockerfile`, `docker-compose*.yml`, `.dockerignore`, `.npmrc.docker`
- E4. Wire into pre-commit hook (warning-only at first, error after 2 weeks)
- E5. Checks documented in `learning_ai_common_plat/AI.dev/SKILLS/docker-doctor.md` (`common-plat@130883a7`).
- E6. Add `make doctor` target to each pilot repo that runs both `gitea-doctor` AND `docker-doctor`
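The E3 trigger (run only on PRs touching the listed files) can be approximated with a small path filter where the CI system has no native path matching. A sketch, assuming changed paths arrive one per line on stdin; `needs_docker_doctor` is a hypothetical name:

```shell
# Sketch: decide whether a change set needs docker-doctor.
# Reads changed paths on stdin; the file list mirrors E3.
needs_docker_doctor() {
  while IFS= read -r path; do
    # Match on the basename so nested files (web/Dockerfile) count too.
    case "${path##*/}" in
      Dockerfile|docker-compose*.yml|.dockerignore|.npmrc.docker) return 0 ;;
    esac
  done
  return 1
}
```

In CI this could be fed from `git diff --name-only` against the PR base branch, with the repo's docker-doctor wrapper run only when the function returns 0.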
Checks implemented by docker-doctor.sh:
| Check | Addresses | Action |
|---|---|---|
| Every `web/*.config.*` file is COPY'd | F11, F13 | Error |
| `docker-compose.yml` healthcheck uses `127.0.0.1` | F12 | Error |
| `.npmrc.docker` uses `${GITEA_NPM_HOST}` AND `${GITEA_NPM_OWNER}` placeholders | F4, F14 | Error |
| Dockerfile declares `ARG GITEA_NPM_OWNER` if it COPYs `.npmrc.docker` | F14 | Error |
| `.dockerignore` doesn't exclude `pnpm-lock.yaml` | F1 | Warn (until A3 ADR lands) |
| Base image is on approved list (`node:22-alpine` or `node:22-slim` via `BASE_IMAGE` ARG) | Canonical decision | Error |
| `.docker-deps/` and `*.bak` in `.gitignore` | B3 | Error |
| `docker-compose.yml` passes `GITEA_NPM_OWNER` build arg | F14 | Warn |
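Two rows in this table (the F14 `ARG` declaration and the B3 `.gitignore` entries) have no counterpart in the § 7.6 skeleton. A minimal sketch of how they might look, with hypothetical function names; the canonical `docker-doctor.sh` may implement them differently, and the exact-line `.gitignore` match is a deliberate simplification:

```shell
# Sketch of the F14 and B3 rows. Function names are illustrative assumptions.

check_owner_arg() {
  # F14: a Dockerfile that COPYs .npmrc.docker must declare ARG GITEA_NPM_OWNER.
  df="$1"
  if grep -q 'COPY .*\.npmrc\.docker' "$df" &&
     ! grep -Eq '^[[:space:]]*ARG[[:space:]]+GITEA_NPM_OWNER' "$df"; then
    echo "✗ F14: $df copies .npmrc.docker but lacks ARG GITEA_NPM_OWNER"
    return 1
  fi
}

check_gitignore() {
  # B3: .docker-deps/ and *.bak must both be listed (exact-line match only).
  gi="$1"
  rc=0
  for pat in '.docker-deps/' '*.bak'; do
    if ! grep -Fxq -- "$pat" "$gi" 2>/dev/null; then
      echo "✗ B3: $gi does not list $pat"
      rc=1
    fi
  done
  return "$rc"
}
```

Both functions return non-zero on a finding, so they slot into the skeleton's `FAILED=1` pattern, for example `check_owner_arg "$df" || FAILED=1`.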
9. Open questions (numbered TODOs, not blockers)
1. Shared pnpm cache volume? BuildKit caches are already shared across builds by `id=pnpm`. Test whether a named Docker volume adds anything before adding complexity.
2. Custom base image? Publish `bytelyst/node-pnpm:22{alpine,slim}` with pnpm pre-installed to skip corepack. Cost: image maintenance; benefit: ~5 s/build.
3. CI hostname? Verify `host.docker.internal:host-gateway` works in Gitea Actions Linux runners, or if a CI-specific Dockerfile variant is needed.
4. Multi-platform builds? `linux/amd64` + `linux/arm64` interact awkwardly with cache mounts under `buildx`. Defer to separate roadmap.
5. Workspace flattening? Eliminate the `../learning_ai_common_plat/packages/*` workspace entry inside Docker via a flattened `pnpm-workspace.yaml`. Unlocks `--frozen-lockfile`. Requires lockfile regeneration step.
10. Execution order
- ✅ v5 commit: roadmap doc v5 lands; F16 documented (`devops_tools@ba8b4d1`).
- ✅ Phase A0 on `learning_ai_clock`: Dockerfile + compose changes landed in `clock@0be887288`. Initial A0-V blocked on F16/F17/F18.
- ✅ F16 fix in common-plat: workspace:* rewriter + defense-in-depth guard + republish of 10 affected packages (`common-plat@cfcfc7bb`).
- ✅ F17 fix in common-plat + Gitea config: `ROOT_URL=host.docker.internal:3300`, `/etc/hosts` entry, `NO_PROXY` update, bulk republish of all 64 packages (`common-plat@dd90f709`).
- ✅ F18 fix in clock: 4 `file:` refs in `web/package.json` rewritten to `*` (`clock@8b5c767a3`).
- ✅ A0-V on clock PASSED. v6 commit lands (`devops_tools@7627d55`).
- ✅ A8 + A9 + A1 on clock (correctness + corepack): `clock@f6a806ff3`. Web cold dropped to 174 s; backend essentially flat at 60 s. F11 guard verified (Tailwind utilities present in CSS bundle).
- ✅ A2 + A4 + A5 + A6 on clock (cache mount + dockerignore): `clock@55e8d22d3`. Warm rebuilds: backend 2.9 s, web 5.4 s (95–97% reduction). A7 metrics table populated this commit.
- ✅ Phase A0 → A6 on `learning_ai_peakpulse` backend (`peakpulse@11a6bc5`). Cold 72.2 s, warm 2.7 s. Pattern from clock applied verbatim, plus a side fix for `.docker-deps/.gitkeep` discoverability that was also ported back to clock (`peakpulse@6523a1a`, `clock@1465e06b1`, `clock@d69003c1f`).
- ✅ A3 ADR: `docs/adr/0001-docker-build-lockfile-policy.md`. Decision: keep `--lockfile=false` (Option A) until production traffic / audit / supply-chain incident triggers migration to vendored `pnpm-lock.docker.yaml` (Option C). Implementation deferred.
- ✅ Phase E1/E2/E5: `docker-doctor.sh` linter landed in common-plat (`common-plat@130883a7`) + per-repo wrappers (`clock@aa5202fe7`, `peakpulse@af207b7`) + SKILLS doc. Verified PASS on both pilots, FAIL with 6 specific findings on un-migrated control (`learning_ai_notes`).
- ✅ Phase B: `docker-prep.sh` hardened + promoted to canonical home in common-plat (`common-plat@a418a23e`). Synced to both pilots (`clock@27034d90f`, `peakpulse@563a45e`). All Phase B checklist items landed except B4 (husky pre-commit hook) and B7-4 (per-repo AGENTS.md warnings, deferred to Phase D rollout). Verified end-to-end on both pilots: dry-run → pack → check (fail) → idempotency guard → restore → `git status` clean.
- ✅ Phase B4 + E3/E4/E6: pre-commit guard (`common-plat@c908c6d7`) + `.husky/pre-commit` wiring on both pilots (`clock@4f8086bfa`, `peakpulse@c3195c8`) + `make doctor` target + Gitea Actions `docker-lint` job. Verified guard blocks simulated staged tarballs.
- ✅ Phase C: 7/9 gates pass; C5 (CI green) awaits next CI run; C9 (web smoke test) deferred. Cold build 64 s, warm 2.6 s / 3.3 s.
- ✅ Phase D.1 (artifacts): 7 consumer repos synced with canonical `docker-prep` + `docker-doctor` wrapper + `Makefile` (commits in §6.D.1).
- ✅ Phase D.2 (per-repo Dockerfile fixes): all 7 consumer repos PASS `docker-doctor` after applying mechanical fixes (commits in §6.D.2). Web smoke test (C9) landed on clock to guard F11 regression.
- ✅ B7-4 AGENTS.md "do not edit" warnings: landed in all 9 consumer repos.
- ⏸ Follow-ups: (a) C5 confirmation after next Gitea CI run; (b) MindLyst / LysnrAI / talk2obsidian: separate scoping; (c) optional: add `compose: GITEA_NPM_OWNER` arg + healthcheck `start_period` to repos still warning on those checks.
11. Risk register
| Risk | Mitigation |
|---|---|
| Removing `pnpm-lock.yaml` from `.dockerignore` exposes a stale or sibling-aware lockfile that breaks Docker installs | Keep `--lockfile=false` for now (A3 ADR); revisit after F2 resolution |
| BuildKit cache mount on shared CI runners causes cross-build interference | Use distinct `id=` per repo (`id=pnpm-${repo}`) if observed |
| `host.docker.internal` doesn't resolve in Linux Docker | `extra_hosts: ["host.docker.internal:host-gateway"]` (A0-4) |
| Removing `.docker-deps/` from default builds breaks repos that haven't done A0 yet | Wildcard `COPY .docker-deps*` keeps both paths working during migration |
| `docker-prep.sh --force` is misused and `.bak` files get committed | Pre-commit hook (B4) blocks `.bak`, `.tgz`, rewritten `package.json` |
| Corp network blocks `host.docker.internal:3300` | Verify SSH tunnel reaches Gitea; document in `operations.md` |
| F11 regression: build green, app ships with no CSS | C9 smoke test + Phase E `docker-doctor.sh` check on `web/*.config.*` COPY coverage |
| F12 regression: healthcheck false-fails on IPv6 | Phase E `docker-doctor.sh` grep for `localhost` in compose files |
| F13 regression: new config file added, Dockerfile forgotten | A8-2 glob COPY pattern (root cause fix) + Phase E lint (defense in depth) |
| `BASE_IMAGE` override in notes diverges silently from canonical | Phase E check approved list; document override in repo `AGENTS.md` |
| F14 regression: future Gitea owner rename re-introduces literal in some Dockerfile | Phase E `docker-doctor.sh` checks `.npmrc.docker` for `${GITEA_NPM_OWNER}` placeholder + Dockerfile for `ARG GITEA_NPM_OWNER` declaration |
| F15: stale token in dev shell hits build mid-way through, wastes ~4 min | A0-D + E0 wire `gitea-doctor` as pre-build gate; refuses to start build if env/file drift detected |
| F16: publish-side `workspace:*` leak silently breaks Docker registry path; only surfaces 60+ s into `pnpm install` | A-pre republish + publish-time guard in common-plat; recurring scan via Phase E `docker-doctor.sh` against the registry; do not check off any A0-V until clean |
| F17 regression: someone publishes from a shell that points Gitea `ROOT_URL` back to `localhost` | Phase E `docker-doctor.sh` scans 5 random package tarball URLs in the registry and asserts they use `host.docker.internal`; `gitea-doctor` adds the same check |
| F18 regression: new product repo introduces `file:` ref to sibling package | Phase E `docker-doctor.sh` greps `**/package.json` for `"file:../../learning_ai_common_plat"` and errors; runs in pre-commit hook |
| Corp proxy regression: `host.docker.internal` falls out of `NO_PROXY` on a dev machine | `switch-network.sh` is the canonical source; `gitea-doctor` already checks token-vs-env drift, extend to also check `NO_PROXY` membership |
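The `NO_PROXY` membership check proposed in the last row might look like the sketch below. It is an assumption about how `gitea-doctor` could be extended, not existing behavior: `in_no_proxy` is a hypothetical name, and it matches exact entries only, whereas real `NO_PROXY` handling (suffix and wildcard rules) varies by tool.

```shell
# Sketch: is a host listed in NO_PROXY? Exact-entry match only (assumption).
# in_no_proxy is an illustrative name, not part of gitea-doctor today.
in_no_proxy() {
  host="$1"
  # Strip spaces, then wrap in commas so every entry is delimited on both sides.
  list=",$(printf '%s' "${NO_PROXY:-${no_proxy:-}}" | tr -d ' '),"
  case "$list" in
    *",$host,"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A pre-build gate could then call `in_no_proxy host.docker.internal` and print a warning pointing at `switch-network.sh` when it returns non-zero.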