Docker Build Optimization Roadmap
Status: Draft v2 (post-audit) · Owner: Platform DevOps · Created: 2026-05-27 · Revised: 2026-05-27
Pilot Docker-build speed-ups + hermetic-fallback hardening on
learning_ai_peakpulse and learning_ai_clock, then capture the playbook here for ecosystem-wide rollout.
0. Pre-flight audit findings (2026-05-27)
A read-only audit of the two pilot repos surfaced 10 concrete bugs/gaps
that contradict the casual narrative that "Gitea-registry is the default and
docker-prep.sh is the fallback." The actual state is closer to the inverse:
| # | Finding | Location | Severity |
|---|---|---|---|
| F1 | pnpm-lock.yaml is in .dockerignore; any lockfile-based optimization is blocked until it is removed | peakpulse/.dockerignore, clock/.dockerignore | High |
| F2 | pnpm-workspace.yaml references sibling ../learning_ai_common_plat/packages/*; --frozen-lockfile inside Docker will fail unless the workspace is flattened or the sibling tree is copied | peakpulse/pnpm-workspace.yaml, clock/pnpm-workspace.yaml | High |
| F3 | peakpulse/.npmrc.docker is tarball-only (no @bytelyst:registry=… line); the "Gitea-registry" path doesn't actually work in this repo today | peakpulse/.npmrc.docker | High |
| F4 | clock/.npmrc.docker hardcodes http://localhost:3300; from inside a Docker container, localhost is the container itself, not the host registry | clock/.npmrc.docker | High |
| F5 | clock/backend/Dockerfile has neither ARG GITEA_NPM_HOST nor a BuildKit secret mount; it is wholly dependent on .docker-deps/ having been pre-populated | clock/backend/Dockerfile | High |
| F6 | clock/web/Dockerfile accepts ARG GITEA_NPM_HOST but never uses it and has no --mount=type=secret; passing the arg is a no-op | clock/web/Dockerfile | Medium |
| F7 | peakpulse/docker-compose.yml does not pass the GITEA_NPM_HOST build arg or declare a secrets: block, so docker compose build cannot use the Gitea path | peakpulse/docker-compose.yml | Medium |
| F8 | COPY .docker-deps/ is unconditional in every backend Dockerfile; every build requires either docker-prep.sh to have run or an empty .docker-deps/ dir to pre-exist | both repos | Medium |
| F9 | npm install -g pnpm@10.6.5 runs on every build (no corepack): 5–10 s overhead, no pinning to the packageManager field | all four Dockerfiles | Low |
| F10 | No BuildKit --mount=type=cache for the pnpm store; cold install on every rebuild even when deps are unchanged | all four Dockerfiles | High (the main speed win) |
Implication: the original plan to "switch to --frozen-lockfile + Gitea
registry" requires upstream fixes first: F1 and F2 before any lockfile can be
used, and F3–F7 before the registry path works at all. The roadmap below
accounts for that.
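Findings F3 and F4 can be detected mechanically. The following is a minimal sketch, assuming a hypothetical helper named lint_npmrc_docker (it does not exist in either pilot repo) that flags a .npmrc.docker with no @bytelyst:registry= line or with a hardcoded localhost registry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lint a .npmrc.docker for findings F3 and F4.
# Prints one line per problem; returns 0 when clean, 1 otherwise.
lint_npmrc_docker() {
  local file="$1" status=0
  if [ ! -f "$file" ]; then
    echo "missing: $file"
    return 1
  fi
  # F3: without a scoped registry line, @bytelyst/* cannot resolve via Gitea.
  if ! grep -q '^@bytelyst:registry=' "$file"; then
    echo "F3: no @bytelyst:registry= line in $file"
    status=1
  fi
  # F4: inside a container, localhost is the container, not the host registry.
  # Lines whose match sits behind a # or ; comment marker are ignored.
  if grep -Eq '^[^#;]*(localhost|127\.0\.0\.1)' "$file"; then
    echo "F4: hardcoded localhost in $file"
    status=1
  fi
  return "$status"
}
```

Running it as `lint_npmrc_docker .npmrc.docker` in each repo root would be one way to confirm that A0-1 has landed before attempting a registry-mode build.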
1. Context: three build paths
| Path | Status today | Trigger | Notes |
|---|---|---|---|
| docker-prep.sh tarballs | De facto default in peakpulse + flowmonk; also works in clock | Run docker-prep.sh then docker compose build | Hermetic; mutates package.json; slow to repack |
| Gitea NPM registry | Partially wired in clock + notes; broken in peakpulse | docker compose build with GITEA_NPM_HOST arg + secret | Needs .npmrc.docker standardization to actually be default |
| Legacy file: refs | Deprecated | n/a | Removed during pnpm/Gitea migration |
Measurement targets
| Build | Baseline (observed) | Target after Phase A |
|---|---|---|
| Cold (no cache) | ~2–3 min | ≤ 2 min |
| Warm (one source file changed) | ~2–3 min | < 30 s |
| docker-prep.sh pack step alone | ~60–90 s | < 30 s (pnpm pack cache) |
Fill in actuals during Phase C.
2. Goals & non-goals
Goals
- ✅ Cut warm rebuild time via BuildKit pnpm-store cache mount (the single biggest win)
- ✅ Make docker-prep.sh idempotent, safe to re-run, gitignore-clean
- ✅ Standardize .npmrc.docker across the ecosystem so the Gitea path actually works
- ✅ Fix docker-compose.yml to pass GITEA_NPM_HOST + secrets so the registry path is usable without manual flags
- ✅ Document which path to use when, and the trade-offs
Non-goals
- ❌ Migrating off pnpm or off the Gitea registry
- ❌ Adopting --frozen-lockfile until F2 is resolved (sibling-workspace problem)
- ❌ Publishing @bytelyst/* to the public npm registry
- ❌ Multi-platform builds (separate roadmap)
3. Phase A — Build speed + path correctness
Order matters: A0 must precede A1–A5 (you can't enable a path that doesn't work).
A0. Make the Gitea-registry path actually work (peakpulse + clock)
- A0-1. Standardize .npmrc.docker to use a templated host so it works on the host (localhost) and inside Docker (host.docker.internal):

      @bytelyst:registry=http://${GITEA_NPM_HOST}:3300/api/packages/learning_ai_user/npm/
      //${GITEA_NPM_HOST}:3300/api/packages/learning_ai_user/npm/:_authToken=${GITEA_NPM_TOKEN}
      strict-ssl=false

- A0-2. Remove pnpm-lock.yaml from .dockerignore in both repos (fixes F1)
- A0-3. Add GITEA_NPM_HOST build arg + secrets: block to every service in docker-compose.yml:

      build:
        context: .
        dockerfile: backend/Dockerfile
        args:
          GITEA_NPM_HOST: host.docker.internal
        secrets:
          - gitea_npm_token

      # top level of the file, alongside services:
      secrets:
        gitea_npm_token:
          environment: GITEA_NPM_TOKEN

- A0-4. Add extra_hosts: ["host.docker.internal:host-gateway"] under each service's build: block so that builds on Linux Docker can resolve the host. A service-level extra_hosts entry only affects the running container, not the build. Confirm that the Docker/BuildKit version in use accepts host-gateway at build time.
- A0-5. Document required env: GITEA_NPM_TOKEN must be exported in the shell that runs docker compose build
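A0-5 can be enforced as well as documented. The following is a minimal sketch of a preflight check, assuming a hypothetical function named require_gitea_token. It probes a child shell, which sees only exported variables, so a token that is set but not exported is also caught.

```shell
#!/usr/bin/env bash
# Hypothetical preflight for A0-5: fail fast when the variable that feeds the
# gitea_npm_token build secret is not exported in the current shell.
require_gitea_token() {
  # A child process inherits exported variables only.
  if ! sh -c 'test -n "${GITEA_NPM_TOKEN:-}"'; then
    echo "GITEA_NPM_TOKEN is not exported; export it before docker compose build" >&2
    return 1
  fi
}

# Possible usage (not executed here):
#   require_gitea_token && docker compose build backend
```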
A1. Replace npm install -g pnpm@X with corepack
- A1-1. Replace each RUN npm install -g pnpm@10.6.5 line with:

      RUN corepack enable && corepack prepare pnpm@10.6.5 --activate

- A1-2. Verify the packageManager field in backend/package.json matches (already pnpm@10.6.5 in peakpulse)
A2. Add BuildKit pnpm-store cache mount
- A2-1. Set the # syntax=docker/dockerfile:1.7 directive at the top of every Dockerfile
- A2-2. Wrap the install step with a cache mount:

      RUN --mount=type=cache,id=pnpm,target=/root/.local/share/pnpm/store \
          --mount=type=secret,id=gitea_npm_token \
          export GITEA_NPM_TOKEN="$(cat /run/secrets/gitea_npm_token 2>/dev/null || echo '')" && \
          pnpm install --ignore-scripts

- A2-3. Verify the cache mount is populated after the second build via docker buildx du --verbose (look for a record of type exec.cachemount). docker history lists image layers only, so it can confirm layer order but not the cache mount.
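Whether A1 and A2 have been applied to a given Dockerfile can be checked statically. The following is a minimal sketch, assuming a hypothetical helper named lint_dockerfile. Parser directives must precede any instruction or ordinary comment, and this sketch simplifies that rule to "the directive is on line 1".

```shell
#!/usr/bin/env bash
# Hypothetical lint for A1/A2. Prints one line per problem and returns 0
# only when the Dockerfile passes every check.
lint_dockerfile() {
  local file="$1" status=0
  # A2-1: simplified check that the syntax directive is the first line.
  if ! head -n 1 "$file" | grep -q '^# syntax=docker/dockerfile:'; then
    echo "A2-1: missing '# syntax=docker/dockerfile:' directive on line 1"
    status=1
  fi
  # A2-2: at least one step should mount a BuildKit cache.
  if ! grep -q -e '--mount=type=cache' "$file"; then
    echo "A2-2: no --mount=type=cache found"
    status=1
  fi
  # A1-1: a global npm install of pnpm means corepack is not in use yet.
  if grep -q 'npm install -g pnpm' "$file"; then
    echo "A1-1: still installs pnpm via npm install -g"
    status=1
  fi
  return "$status"
}
```

This only inspects text. It does not prove that the cache is actually hit, which still needs the A2-3 check against a real build.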
A3. Decide lockfile policy (BLOCKED on F2 resolution)
Two options — pick one in a short ADR before implementing:
- Option 1: Keep --lockfile=false (current pragmatic approach)
  - ✅ No sibling-workspace complications
  - ❌ No reproducibility guarantee inside Docker
  - ❌ Slower installs (full resolution every build)
- Option 2: Generate a Docker-only lockfile via pnpm install --lockfile-only against a flattened package.json that resolves @bytelyst/* to semver
  - ✅ Reproducibility
  - ✅ Faster installs
  - ❌ New build step + tooling
  - ❌ Drift risk between dev lockfile and Docker lockfile

- A3-1. Write 1-page ADR (docs/decisions/0001-docker-lockfile-policy.md) and pick Option 1 or 2
- A3-2. Defer --frozen-lockfile adoption until ADR lands
A4. Restructure layer order
- A4-1. Reorder COPY/RUN so the deps install layer is package.json + .npmrc ONLY, then a separate layer for src/, tsconfig.json, shared/
- A4-2. Move all ARG lines that affect deps install before the install step; move NEXT_PUBLIC_* ARGs (clock web) closer to the build step
A5. Gate .docker-deps/ behind a build arg
- A5-1. Add ARG USE_TARBALLS=false to Dockerfile
- A5-2. Conditionally copy:

      # Always-empty placeholder so COPY doesn't fail in registry mode
      RUN mkdir -p /app/.docker-deps
      COPY .docker-deps* /app/.docker-deps/

  (The wildcard tolerates a missing .docker-deps/ dir; works without enabling BuildKit COPY's --from tricks.)
- A5-3. Verify .docker-deps/ is in .gitignore and .dockerignore is NOT excluding it when tarball mode is in use
A6. .dockerignore audit
- A6-1. Confirm exclusions: node_modules, **/node_modules, dist, .next, *.log, .env, .env.*, .git, *.bak
- A6-2. Remove: pnpm-lock.yaml exclusion (was correct under --lockfile=false, blocks future optimization)
- A6-3. Confirm .docker-deps/ is NOT excluded when tarball path is active
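The A6-1 and A6-2 checks lend themselves to a small script. The following is a minimal sketch, assuming a hypothetical helper named audit_dockerignore. It compares exact whole-line patterns only, so negations (!pattern) and equivalent spellings such as .git/ are out of scope, and A6-3 is not covered.

```shell
#!/usr/bin/env bash
# Hypothetical audit for A6. Prints one line per problem and returns 0 only
# when every expected exclusion is present and the lockfile is not excluded.
audit_dockerignore() {
  local file="$1" status=0 pattern
  # A6-1: exclusions that should be present, matched as literal whole lines.
  for pattern in node_modules '**/node_modules' dist .next '*.log' .env '.env.*' .git '*.bak'; do
    if ! grep -Fxq -- "$pattern" "$file"; then
      echo "A6-1: missing exclusion: $pattern"
      status=1
    fi
  done
  # A6-2: the lockfile must be allowed into the build context.
  if grep -Fxq -- 'pnpm-lock.yaml' "$file"; then
    echo "A6-2: pnpm-lock.yaml is still excluded"
    status=1
  fi
  return "$status"
}
```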
A7. Measure & record
| Repo | Surface | Cold before | Cold after | Warm before | Warm after | Notes |
|---|---|---|---|---|---|---|
| peakpulse | backend | — | — | — | — | |
| clock | backend | — | — | — | — | |
| clock | web | — | — | — | — |
Use:
time DOCKER_BUILDKIT=1 docker compose build --no-cache backend # cold
touch backend/src/server.ts && time docker compose build backend # warm
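For filling in the table, the two commands above can be wrapped so that the elapsed time comes back as a number. The following is a minimal sketch, assuming hypothetical helpers named elapsed_seconds and under_limit. It reports whole seconds only, which should be sufficient against a 30 s target.

```shell
#!/usr/bin/env bash
# Hypothetical helper for A7: run a command, discard its output, and print the
# elapsed wall-clock time in whole seconds. Returns the command's exit status.
elapsed_seconds() {
  local start end status=0
  start="$(date +%s)"
  "$@" >/dev/null 2>&1 || status=$?
  end="$(date +%s)"
  echo "$((end - start))"
  return "$status"
}

# Gate C2 as a predicate: succeed only when a measured time is under the limit.
under_limit() {
  [ "$1" -lt "$2" ]
}

# Possible usage (not executed here):
#   cold="$(elapsed_seconds env DOCKER_BUILDKIT=1 docker compose build --no-cache backend)"
#   touch backend/src/server.ts
#   warm="$(elapsed_seconds docker compose build backend)"
#   under_limit "$warm" 30 && echo "warm rebuild within target"
```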
4. Phase B — Hermetic-fallback polish (docker-prep.sh)
The script is duplicated with minor variations across product repos. Pilot in peakpulse + clock, then propose a canonical home.
- B1. Add --dry-run flag: list packs/rewrites, no side effects
- B2. Idempotency guard: refuse to run if any *.bak exists unless --force
- B3. Ensure .docker-deps/ and *.bak are in .gitignore of every pilot repo
- B4. Pre-commit hook (husky): block commits containing "file:../.docker-deps/" inside any package.json. Add to .husky/pre-commit:

      # Search the staged (index) content of every package.json, not the working tree
      if git grep --cached -l -e '"file:\.\./\.docker-deps/' -- '*package.json'; then
        echo "ERROR: rewritten package.json detected. Run scripts/docker-prep.sh --restore first."
        exit 1
      fi

- B5. Auto-restore on script error via trap restore_on_error EXIT (unless --keep passed). EXIT also fires on a successful run, so the handler must check the exit status and restore only when it is non-zero; otherwise the rewrite would be undone before docker compose build can use it.
- B6. Update script header comment with explicit "use only when Gitea unreachable OR you need uncommitted common-plat changes"
- B7. Propose canonical home: learning_ai_common_plat/scripts/docker-prep.template.sh + sync-docker-prep.sh (mirrors .npmrc template pattern). Defer execution to Phase D.
- B8. Add a --strip-overrides option that removes the pnpm.overrides block after build, in case --restore is forgotten (additional safety net)
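B2 and B5 interact, because the guard decides whether backups may exist and the trap decides when they are restored. The following is a minimal sketch of both, assuming hypothetical function names (guard_no_backups, restore_backups, restore_on_error) and hypothetical REPO_ROOT and KEEP variables. It is not the current docker-prep.sh.

```shell
#!/usr/bin/env bash
# Sketch of B2 (idempotency guard) and B5 (restore on error).
# All function and variable names here are hypothetical.

# B2: succeed when no *.bak exists under $1, or when force ($2) is "1".
guard_no_backups() {
  local root="$1" force="${2:-0}" found
  found="$(find "$root" -name node_modules -prune -o -name '*.bak' -print | sed -n '1p')"
  if [ -n "$found" ] && [ "$force" != "1" ]; then
    echo "refusing to run: found $found (use --force to override)" >&2
    return 1
  fi
}

# Move every package.json.bak under $1 back over its package.json.
restore_backups() {
  local root="$1" bak
  find "$root" -name node_modules -prune -o -name 'package.json.bak' -print |
    while IFS= read -r bak; do
      mv -f "$bak" "${bak%.bak}"
    done
}

# B5: EXIT-trap handler. EXIT also fires on success, so check the status and
# restore only after a failure, unless KEEP=1 (the --keep flag).
restore_on_error() {
  local status=$?
  if [ "$status" -ne 0 ] && [ "${KEEP:-0}" != "1" ]; then
    restore_backups "${REPO_ROOT:-.}"
  fi
  return "$status"
}

# In the script body, after the guard has passed:
#   trap restore_on_error EXIT
```

Checking the status inside the handler, rather than trapping ERR, keeps the behaviour the same whether or not the script runs under set -e.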
5. Phase C — Verification gates
Pilot exit criteria (must all pass before Phase D):
- C1. Cold Docker build succeeds on both pilots via Gitea-registry path (no docker-prep.sh invocation)
- C2. Warm rebuild (single source file touched) < 30 s on both pilots
- C3. docker-prep.sh → docker compose build → --restore leaves git status clean
- C4. Pre-commit hook blocks a deliberately-staged rewritten package.json
- C5. Gitea Actions CI green on both pilots (verify CI uses the same Dockerfile path)
- C6. Build-time metrics filled into the table in § 3.A7
- C7. Decision recorded in ADR for A3 (lockfile policy)
6. Phase D — Ecosystem rollout (deferred until § 5 passes)
Apply Phases A0–A2, A4–A6, and B to the remaining repos. Pilots excluded.
| Repo | Backend | Web | docker-prep | Notes |
|---|---|---|---|---|
| learning_ai_notes | ☐ | ☐ | ☐ | Uses node:22-slim (corp proxy / Alpine SSL issue) |
| learning_ai_fastgap | ☐ | ☐ | ☐ | Mobile + web + backend |
| learning_ai_jarvis_jr | ☐ | ☐ | ☐ | |
| learning_ai_flowmonk | ☐ | ☐ | ☐ | .npmrc.docker is tarball-only; needs A0-1 |
| learning_ai_trails | ☐ | ☐ | ☐ | |
| learning_ai_local_memory_gpt | ☐ | ☐ | ☐ | SQLite-based, no Cosmos |
| learning_multimodal_memory_agents (MindLyst) | ☐ | ☐ | ☐ | KMP repo, different layout |
| learning_voice_ai_agent (LysnrAI) | ☐ | ☐ | ☐ | Python desktop + TS dashboards |
| learning_ai_efforise | ☐ | ☐ | ☐ | |
| learning_ai_auth_app | ☐ | n/a | ☐ | iOS/Android; no backend Dockerfile |
| learning_ai_talk2obsidian | ☐ | ☐ | ☐ | Single-container app |
7. Reference snippets
7.1 Canonical .npmrc.docker
@bytelyst:registry=http://${GITEA_NPM_HOST}:3300/api/packages/learning_ai_user/npm/
//${GITEA_NPM_HOST}:3300/api/packages/learning_ai_user/npm/:_authToken=${GITEA_NPM_TOKEN}
strict-ssl=false
auto-install-peers=true
7.2 Canonical backend Dockerfile (post Phase A)
# syntax=docker/dockerfile:1.7
FROM node:22-alpine AS builder
WORKDIR /app/backend
ARG GITEA_NPM_HOST=host.docker.internal
ARG USE_TARBALLS=false
ENV NODE_TLS_REJECT_UNAUTHORIZED=0
ENV NPM_CONFIG_STRICT_SSL=false
ENV GITEA_NPM_HOST=$GITEA_NPM_HOST
RUN corepack enable && corepack prepare pnpm@10.6.5 --activate
# ── Deps layer (cacheable) ─────────────────────────────────────────
COPY .npmrc.docker ./.npmrc
COPY backend/package.json ./package.json
# Tolerate missing .docker-deps/ when in registry mode (wildcard match)
RUN mkdir -p /app/.docker-deps
COPY .docker-deps* /app/.docker-deps/
RUN --mount=type=cache,id=pnpm,target=/root/.local/share/pnpm/store \
--mount=type=secret,id=gitea_npm_token \
export GITEA_NPM_TOKEN="$(cat /run/secrets/gitea_npm_token 2>/dev/null || echo '')" && \
pnpm install --ignore-scripts --lockfile=false
# ── Source layer (changes most often) ──────────────────────────────
COPY backend/tsconfig.json ./tsconfig.json
COPY backend/src/ ./src/
COPY shared/ ../shared/
RUN pnpm run build
# ── Runtime ────────────────────────────────────────────────────────
FROM node:22-alpine
WORKDIR /app/backend
ENV NODE_ENV=production
COPY --from=builder /app/backend/node_modules ./node_modules
COPY --from=builder /app/backend/package.json ./package.json
COPY --from=builder /app/backend/dist ./dist
COPY shared/ ../shared/
EXPOSE 4010
CMD ["node", "dist/server.js"]
--lockfile=false is intentional pending the A3 ADR. Switch to --frozen-lockfile once the sibling-workspace problem (F2) is resolved.
7.3 Canonical docker-compose.yml service block
services:
backend:
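    build:
      context: .
      dockerfile: backend/Dockerfile
      args:
        GITEA_NPM_HOST: host.docker.internal
      secrets:
        - gitea_npm_token
      # Build-time host mapping; the service-level entry below applies only at runtime
      extra_hosts:
        - "host.docker.internal:host-gateway"
    extra_hosts:
      - "host.docker.internal:host-gateway"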
ports:
- "4010:4010"
environment:
- NODE_ENV=production
# ...
restart: unless-stopped
secrets:
gitea_npm_token:
environment: GITEA_NPM_TOKEN
7.4 Hardened docker-prep.sh header
#!/usr/bin/env bash
# Hermetic Docker-build helper. Packs @bytelyst/* tarballs from the sibling
# common-plat repo when the Gitea npm registry is unreachable.
#
# Use this ONLY when:
# - Local Gitea registry (:3300) is down or unreachable, OR
# - You need a Docker build that includes uncommitted common-plat changes.
#
# For normal builds (Gitea up + clean common-plat), use:
# docker compose build
#
# Usage:
# ./scripts/docker-prep.sh # pack tarballs + rewrite package.json
# ./scripts/docker-prep.sh --dry-run # show what would change (no side effects)
# ./scripts/docker-prep.sh --force # override idempotency guard
# ./scripts/docker-prep.sh --restore # undo rewrite
# ./scripts/docker-prep.sh --keep # skip auto-restore on error
#
# Side effects:
# - Creates .docker-deps/ (gitignored)
# - Backs up package.json → package.json.bak
# - Rewrites @bytelyst/* deps to file:../.docker-deps/<tarball>
# - Injects pnpm.overrides for transitive @bytelyst/* deps
#
# Safety:
# - Refuses to run if .bak files already exist (unless --force)
# - Auto-restores on error (trap EXIT) unless --keep passed
# - Pre-commit hook blocks committing rewritten package.json
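The usage modes listed in the header imply a small flag parser. The following is a minimal sketch, assuming a hypothetical function named parse_prep_args and hypothetical MODE, DRY_RUN, FORCE, and KEEP variables. The existing script may structure this differently.

```shell
#!/usr/bin/env bash
# Hypothetical flag parser for the modes listed in the header.
# Sets MODE (prep|restore) plus DRY_RUN, FORCE, and KEEP as 0/1 flags.
parse_prep_args() {
  MODE=prep
  DRY_RUN=0
  FORCE=0
  KEEP=0
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --dry-run) DRY_RUN=1 ;;
      --force)   FORCE=1 ;;
      --keep)    KEEP=1 ;;
      --restore) MODE=restore ;;
      *)
        # Reject anything unrecognized rather than silently ignoring it.
        echo "unknown option: $1" >&2
        return 2
        ;;
    esac
    shift
  done
}

# Possible usage at the top of the script (not executed here):
#   parse_prep_args "$@" || exit $?
```

Rejecting unknown flags is a deliberate choice in this sketch: a mistyped --restore that fell through to the default prep mode would rewrite package.json a second time.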
8. Open questions (numbered TODOs, not blockers)
1. Shared pnpm cache volume? Should the BuildKit pnpm store cache be shared across all 13 repos via a named Docker volume (pnpm-store) instead of per-repo BuildKit caches keyed by id=pnpm? (BuildKit caches are already shared by id=; verify before adding volume complexity.)
2. Custom base image? Publish bytelyst/node-pnpm:22 with pnpm pre-installed to skip the corepack step entirely. Cost: maintenance of a base image; benefit: ~5 s/build × 13 repos × N builds/day.
3. CI hostname? Gitea Actions runs builds with --add-host to reach the registry. Is host.docker.internal:host-gateway portable to Linux CI runners, or do we need a CI-specific Dockerfile variant?
4. Canonical script home? docker-prep.sh is currently per-repo with drift. Move to learning_ai_common_plat/scripts/docker-prep.template.sh with a sync-docker-prep.sh (mirrors .npmrc template pattern)?
5. Multi-platform builds? Any need for linux/amd64 + linux/arm64 images? If yes, BuildKit cache mounts interact awkwardly with buildx --platform. Defer to separate roadmap.
6. Workspace flattening? Could we eliminate the ../learning_ai_common_plat/packages/* workspace entry inside Docker by building with a flattened pnpm-workspace.yaml (only local backend/)? This unlocks --frozen-lockfile. Requires lockfile regeneration step.
9. Execution order
1. Now (this commit): roadmap doc lands here; sign-off requested.
2. A0 first: fix .npmrc.docker, docker-compose.yml, .dockerignore on both pilots. Without this, the Gitea path doesn't work and no measurement is possible.
3. A1 + A2 on peakpulse backend. Measure. Commit.
4. A1 + A2 on clock backend, then clock web. Measure. Commit.
5. A4 + A5 + A6 on all three surfaces. Commit.
6. A3 ADR: decide lockfile policy (defer implementation).
7. A7: fill in metrics table.
8. Phase B: harden docker-prep.sh on peakpulse, then mirror to clock.
9. Phase C: verification gates C1–C7.
10. Phase D: scheduled separately, only after § 5 passes.
10. Risk register
| Risk | Mitigation |
|---|---|
| Removing pnpm-lock.yaml from .dockerignore exposes a stale or sibling-aware lockfile that breaks Docker installs | Keep --lockfile=false for now (A3 ADR); revisit after F2 resolution |
| BuildKit cache mount on shared CI runners causes cross-build interference | Use distinct id= per repo (id=pnpm-${repo}) if observed |
| host.docker.internal doesn't resolve in Linux Docker | extra_hosts: ["host.docker.internal:host-gateway"] (added in A0-4) |
| Removing .docker-deps/ from default builds breaks repos that haven't done A0 yet | Wildcard COPY .docker-deps* keeps both paths working during migration |
| docker-prep.sh --force is misused and .bak files get committed | *.bak is gitignored (B3), and the pre-commit hook (B4) blocks the rewritten package.json regardless |
| Corp network blocks host.docker.internal:3300 | Verify SSH tunnel (localhost:3300 from host) reaches Gitea; document in operations.md |