Per § 10 steps 9 + 10.
Step 9: Peakpulse backend Phase A complete.
cold 72.2 s, warm 2.7 s (96.3% reduction). Pattern from clock applied
verbatim plus .docker-deps/.gitkeep discoverability fix back-ported
to clock. Commits:
peakpulse@11a6bc5 feat(docker): Phase A on peakpulse backend
peakpulse@6523a1a fix(docker): track .docker-deps/.gitkeep
clock@1465e06b1 fix(docker): track .docker-deps/.gitkeep
clock@d69003c1f chore: dedupe .docker-deps in .gitignore
Step 10: A3 ADR accepted.
New file: docs/adr/0001-docker-build-lockfile-policy.md
Decision: short-term Option A (--lockfile=false) — already shipped in
Phase A; long-term Option C (vendored pnpm-lock.docker.yaml). Migration
triggered by production deployment, audit requirement, supply-chain
incident, or loss of BuildKit cache. Implementation sketch in ADR § 4.
Roadmap doc updates:
- § A7 metrics table: peakpulse row populated (72.2 s / 2.7 s).
- § A3: collapsed bullet list into decision-record summary linking ADR.
- § 10: steps 9 + 10 marked ✅; status banner v7 → v8.
Next per § 10: step 11 (Phase B docker-prep hardening) or step 12
(Phase E docker-doctor.sh linter). Phase E is higher-value as durable
insurance against F11/F13/F16/F17/F18 regressions across the ecosystem.

# ADR-0001: Docker build lockfile policy

> **Status:** Accepted (decision); Deferred (implementation) · **Date:** 2026-05-27
> **Context:** docker-build-optimization-roadmap §A3 · **Supersedes:** None
> **Authors:** Platform DevOps

---

## 1. Context

The pilot Phase A work in `docker-build-optimization-roadmap` standardized
on `pnpm install --lockfile=false` inside Docker for both
`learning_ai_clock` (web + backend) and `learning_ai_peakpulse` (backend).
That choice unblocked Phase A by sidestepping a structural mismatch:

- `pnpm-lock.yaml` is generated against the **outer pnpm workspace**, which
  includes `../learning_ai_common_plat/packages/*` as workspace members
  (sibling-repo path).
- Inside the Docker build context, the sibling repo doesn't exist
  (a single-repo build context is intentionally used for hermeticity).
- `--frozen-lockfile` therefore fails immediately with workspace
  resolution errors (finding F2 in the roadmap audit).

`--lockfile=false` skips lockfile validation entirely and re-resolves all
dependencies against the registry on every `pnpm install`. This is
correct for the workspace-mismatch problem but introduces non-determinism:
the **same Dockerfile + same source tree can produce a different lockset**
across two builds if upstream `@bytelyst/*` versions move between them.

Phase A2's BuildKit cache mount mitigates the *speed* cost of
re-resolution but not the *determinism* cost.

This ADR records the decision on which long-term policy to adopt for
Docker builds. Implementation is deferred to a future Phase A3 sprint.

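
One way to make the determinism cost observable is to fingerprint the resolved dependency set of each build and compare fingerprints for the same commit. The sketch below is illustrative only: `lockset_fingerprint` and `same_lockset` are hypothetical helper names, and the input is assumed to be a text file with one resolved package per line (for example `name@version`). It also assumes `sha256sum` is available, as on most Linux CI runners.

```shell
#!/usr/bin/env sh
# Hypothetical helpers (not part of the roadmap): fingerprint a resolved
# dependency listing so two builds of the same commit can be compared.
# Assumed input: a text file with one resolved package per line.

lockset_fingerprint() {
  # Sort and de-duplicate so line ordering does not affect the hash.
  LC_ALL=C sort -u "$1" | sha256sum | cut -d ' ' -f 1
}

same_lockset() {
  # Exit status 0 when both listings hash to the same fingerprint.
  [ "$(lockset_fingerprint "$1")" = "$(lockset_fingerprint "$2")" ]
}
```

One possible source for such a listing is the output of `pnpm list --recursive --depth Infinity --parseable` captured inside each built image. Any stable one-line-per-package listing would work equally well.
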

---

## 2. Options considered

### Option A: Keep `--lockfile=false` (status quo)

**How it works.** Docker `pnpm install` re-resolves on every cold build.
The cache mount preserves the pnpm content-addressed store across builds, so
warm rebuilds don't pay the re-resolution cost.

**Pros:**
- Zero churn: already shipped in Phase A.
- Tolerates sibling-repo workspace mismatch for free.
- Tolerates `*` semver across all `@bytelyst/*` deps without rework.
- Compatible with the F17 fix (Gitea `host.docker.internal` URLs).

**Cons:**
- **Non-deterministic builds.** Same Dockerfile + same source can produce
  different `node_modules` if a dependency was published between two
  cold builds. CI runs days apart can ship divergent images for the same
  commit.
- No supply-chain pinning. Any compromised upstream auto-rolls forward.
- `pnpm audit` on the host can disagree with what's actually inside
  the image.

### Option B: Generate a Docker-only flat lockfile during build

**How it works.** Add a build step that runs `pnpm install --lockfile-only`
in a temp dir against a flattened `pnpm-workspace.yaml` that excludes
sibling-repo paths, then `--frozen-lockfile` against that generated lock.

**Pros:**
- Deterministic *within a single build*: same registry state at the
  moment of the build always produces the same lockset.
- Doesn't require changes to the source tree's `pnpm-workspace.yaml`.

**Cons:**
- Still non-deterministic across builds (the lock is regenerated each time
  unless cached separately).
- Adds Dockerfile complexity and a non-trivial new failure mode
  (workspace-flattening logic).
- Marginal value over Option A given the cache mount.

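
The workspace-flattening step could look roughly like the sketch below. It is a minimal illustration under stated assumptions: `flatten_workspace` is a hypothetical name, and the workspace file is assumed to list one package glob per line under `packages:`, with sibling-repo entries starting with `../`. A real implementation would probably use a YAML parser rather than line filtering.

```shell
#!/usr/bin/env sh
# Hypothetical helper: print a pnpm-workspace.yaml with sibling-repo
# entries removed. Assumes one list entry per line, e.g.
#   packages:
#     - 'apps/*'
#     - '../learning_ai_common_plat/packages/*'

flatten_workspace() {
  # Drop list entries whose (optionally quoted) glob starts with "../".
  grep -v -E "^[[:space:]]*-[[:space:]]*[\"']?\.\./" "$1"
}
```

The flattened output would be written to a temp dir and used as the workspace file for the `pnpm install --lockfile-only` step described above.
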
### Option C: Vendor a Docker-flattened lockfile in the repo

**How it works.** Commit a `pnpm-lock.docker.yaml` (or similar) per repo
that's generated against a flattened workspace. The Dockerfile copies it
into place as `pnpm-lock.yaml` and runs `pnpm install --frozen-lockfile`
(see the implementation sketch in §4).

**Pros:**
- Fully deterministic. Same commit → same lockset → same image.
- Supply chain pins enforced.
- `pnpm audit` matches image contents.

**Cons:**
- Two lockfiles to maintain (the workspace one + the Docker one).
- Drift risk between the two, solved only by a CI gate that regenerates
  the Docker lockfile on every PR that touches `package.json`.
- Requires a tested regenerate-on-CI workflow per repo.
- Workspace flattening logic must be encoded somewhere (script in
  `common-plat/scripts/regen-docker-lockfile.sh`).

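
The "PR that touches `package.json`" condition for the CI gate can be sketched as a small filter over the list of changed paths. This is an assumption-laden illustration: `touches_package_json` is a hypothetical name, and the changed paths are assumed to arrive on stdin, one per line.

```shell
#!/usr/bin/env sh
# Hypothetical helper: succeed if any changed path (one per line on
# stdin) is a package.json at the repo root or in any subdirectory.

touches_package_json() {
  # Match "package.json" only as a complete final path component.
  grep -q -E '(^|/)package\.json$'
}
```

In a workflow this might be fed from `git diff --name-only "$BASE" "$HEAD"`, where `BASE` and `HEAD` are placeholders for whatever refs the CI system provides.
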
### Option D: Restructure to single-repo workspace (eliminate sibling)

**How it works.** Inline the consumed `@bytelyst/*` packages into each
product repo (vendor them) so there is no sibling-workspace dependency.
Then `--frozen-lockfile` works trivially.

**Pros:**
- Cleanest from a Docker-build-determinism standpoint.

**Cons:**
- Massive churn across 14+ product repos.
- Defeats the entire `learning_ai_common_plat` shared-package model.
- Multiplies maintenance cost of `@bytelyst/*` updates by the number of
  consumers.
- Out of scope; would supersede the entire ecosystem architecture.

---

## 3. Decision

**Adopt Option A (`--lockfile=false`) as the official short-term policy.**
**Plan to migrate to Option C (`pnpm-lock.docker.yaml`) when supply-chain
determinism becomes a hard requirement** (e.g., before any production
deployment of a Docker-built image, or before SOC2-style attestation).

**Reasoning:**

1. **Phase A is already shipped on Option A** with verified speed wins
   (warm rebuilds 2.7–5.4 s across all surfaces). Switching policies
   mid-rollout would invalidate the metrics and add risk.
2. **The cache mount (Phase A2) addresses the speed concern** that
   Option A creates. The remaining concern is determinism, which is a
   correctness concern, but the actual blast radius is limited because:
   - All `@bytelyst/*` deps are first-party and pinned in source repos.
   - Third-party deps already have fixed semver in `package.json` (no
     loose `*` ranges to public registries).
   - The Gitea registry is the only `@bytelyst/*` source, so there is no
     public supply-chain risk for the in-house deps.
3. **Option C is the right end state** but requires CI infrastructure
   that doesn't exist yet (auto-regen-on-PR). Building it inside this
   roadmap is scope creep.
4. **Option B is dominated by Option C**: same complexity, weaker
   guarantees.
5. **Option D is a non-starter**: it would require redesigning the
   ByteLyst shared-package model.

---

## 4. Consequences

### Positive

- Phase A speed wins are preserved with zero policy churn.
- `pnpm-lock.yaml` continues to live in source repos for host development;
  it stays in `.dockerignore` for Docker builds.
- The decision is reversible: switching to Option C in the future is
  additive (add a Docker lockfile + change one Dockerfile line).

### Negative

- Same commit can produce different Docker images on different days. CI
  must not assume image hash stability for a given commit.
- `pnpm audit` results from the host don't match Docker image contents.
  Workaround: run `pnpm audit` inside the built container as a separate
  CI job (cheap; no rebuild needed).
- Supply-chain attestation (SOC2, SLSA) cannot be produced for these
  images today. Acceptable while there is no production traffic.

### Migration trigger

Switch to Option C when **any** of the following becomes true:

1. A production environment (paid customers, real PII) deploys a
   Docker-built image from this codebase.
2. A regulatory/audit requirement demands reproducible builds.
3. A supply-chain incident occurs (compromised upstream package) and
   we need rollback granularity finer than "rebuild from current `*`".
4. The cache-mount speed win disappears (e.g., CI runner switch removes
   BuildKit cache persistence).

### Implementation sketch (when triggered)

1. In `learning_ai_common_plat`, add `scripts/regen-docker-lockfile.sh`:
   - Reads each product repo's `package.json`.
   - Generates a flattened `pnpm-workspace.yaml` (no sibling paths).
   - Runs `pnpm install --lockfile-only` against the Gitea registry.
   - Writes `pnpm-lock.docker.yaml` back to the product repo.
2. Each product repo gets a `.gitea/workflows/regen-docker-lockfile.yml`
   that runs the script on PR-touch of `package.json` and either:
   - commits the regenerated lockfile (auto-PR), or
   - fails the PR with a "run regen-docker-lockfile.sh and commit" message.
3. Each product Dockerfile swaps its install step:
   ```dockerfile
   # before: re-resolve against the registry on every cold build
   RUN pnpm install --ignore-scripts --lockfile=false
   # after: install exactly what the vendored Docker lockfile pins
   COPY pnpm-lock.docker.yaml ./pnpm-lock.yaml
   RUN pnpm install --ignore-scripts --frozen-lockfile
   ```
4. `.dockerignore` drops its `pnpm-lock.yaml` exclusion (or adds an explicit
   include for `pnpm-lock.docker.yaml`).

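
The "fail the PR" branch of step 2 can be sketched as a comparison between the committed Docker lockfile and a freshly regenerated one. This is only an illustration: `check_docker_lockfile` is a hypothetical name, and both file paths are assumed to be supplied by the workflow after it has run the regen script into a scratch location.

```shell
#!/usr/bin/env sh
# Hypothetical CI gate: compare the committed pnpm-lock.docker.yaml with a
# freshly regenerated copy and fail with an actionable message on drift.
# $1 = committed lockfile path, $2 = regenerated lockfile path.

check_docker_lockfile() {
  if cmp -s "$1" "$2"; then
    echo "pnpm-lock.docker.yaml is up to date"
    return 0
  fi
  # Message goes to stderr so it shows up as a failure in CI logs.
  echo "pnpm-lock.docker.yaml is stale: run regen-docker-lockfile.sh and commit" >&2
  return 1
}
```

A byte-for-byte comparison keeps the gate simple. It relies on the regen script producing stable output for the same inputs, which would need to be verified before adopting this approach.
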
This work is **not scoped** in the current roadmap and should be its own
small ADR-driven sprint.

---

## 5. Status tracking

| Phase | State | Notes |
|---|---|---|
| Decision | ✅ Accepted | This ADR |
| Implementation | ⏸ Deferred | Triggered by §4 conditions |
| Trigger monitor | ⚳ Open | Re-evaluate when Phase D rollout begins |

---

## 6. References

- `docker-build-optimization-roadmap.md` §0 F1, F2 (lockfile findings)
- `docker-build-optimization-roadmap.md` §A3 (deferred phase)
- `docker-build-optimization-roadmap.md` §A2 (BuildKit cache mount that
  mitigates the speed concern of Option A)
- `learning_ai_common_plat/AGENTS.md` (canonical pnpm workspace config)