docs(docker): roadmap v8 — peakpulse Phase A done + A3 ADR-0001 accepted
Per § 10 steps 9 + 10.
Step 9: Peakpulse backend Phase A complete.
cold 72.2 s, warm 2.7 s (96.3% reduction). Pattern from clock applied
verbatim plus .docker-deps/.gitkeep discoverability fix back-ported
to clock. Commits:
peakpulse@11a6bc5 feat(docker): Phase A on peakpulse backend
peakpulse@6523a1a fix(docker): track .docker-deps/.gitkeep
clock@1465e06b1 fix(docker): track .docker-deps/.gitkeep
clock@d69003c1f chore: dedupe .docker-deps in .gitignore
Step 10: A3 ADR accepted.
New file: docs/adr/0001-docker-build-lockfile-policy.md
Decision: short-term Option A (--lockfile=false) — already shipped in
Phase A; long-term Option C (vendored pnpm-lock.docker.yaml). Migration
triggered by production deployment, audit requirement, supply-chain
incident, or loss of BuildKit cache. Implementation sketch in ADR § 4.
Roadmap doc updates:
- § A7 metrics table: peakpulse row populated (72.2 s / 2.7 s).
- § A3: collapsed bullet list into decision-record summary linking ADR.
- § 10: steps 9 + 10 marked ✅; status banner v7 → v8.
Next per § 10: step 11 (Phase B docker-prep hardening) or step 12
(Phase E docker-doctor.sh linter). Phase E is higher-value as durable
insurance against F11/F13/F16/F17/F18 regressions across the ecosystem.
This commit is contained in:
parent
062155b81e
commit
b00af09942
221
docs/adr/0001-docker-build-lockfile-policy.md
Normal file
221
docs/adr/0001-docker-build-lockfile-policy.md
Normal file
@ -0,0 +1,221 @@
|
||||
# ADR-0001: Docker build lockfile policy
|
||||
|
||||
> **Status:** Accepted (decision); Deferred (implementation) · **Date:** 2026-05-27
|
||||
> **Context:** docker-build-optimization-roadmap §A3 · **Supersedes:** None
|
||||
> **Authors:** Platform DevOps
|
||||
|
||||
---
|
||||
|
||||
## 1. Context
|
||||
|
||||
The pilot Phase A work in `docker-build-optimization-roadmap` standardized
|
||||
on `pnpm install --lockfile=false` inside Docker for both
|
||||
`learning_ai_clock` (web + backend) and `learning_ai_peakpulse` (backend).
|
||||
That choice unblocked Phase A by sidestepping a structural mismatch:
|
||||
|
||||
- `pnpm-lock.yaml` is generated against the **outer pnpm workspace**, which
|
||||
includes `../learning_ai_common_plat/packages/*` as workspace members
|
||||
(sibling-repo path).
|
||||
- Inside the Docker build context, the sibling repo doesn't exist
|
||||
(a single-repo build context is intentionally used for hermeticity).
|
||||
- `--frozen-lockfile` therefore fails immediately with workspace
|
||||
resolution errors (finding F2 in the roadmap audit).
|
||||
|
||||
`--lockfile=false` skips lockfile validation entirely and re-resolves all
|
||||
dependencies against the registry on every `pnpm install`. This is
|
||||
correct for the workspace-mismatch problem but introduces non-determinism:
|
||||
the **same Dockerfile + same source tree can produce a different lockset**
|
||||
across two builds if upstream `@bytelyst/*` versions move between them.
|
||||
|
||||
Phase A2's BuildKit cache mount mitigates the *speed* cost of
|
||||
re-resolution but not the *determinism* cost.
|
||||
|
||||
This ADR records the decision on which long-term policy to adopt for
|
||||
Docker builds. Implementation is deferred to a future Phase A3 sprint.
|
||||
|
||||
---
|
||||
|
||||
## 2. Options considered
|
||||
|
||||
### Option A — Keep `--lockfile=false` (status quo)
|
||||
|
||||
**How it works.** Docker `pnpm install` re-resolves on every cold build.
|
||||
Cache mount preserves the pnpm content-addressed store across builds, so
|
||||
warm rebuilds don't pay re-resolution cost.
|
||||
|
||||
**Pros:**
|
||||
- Zero churn — already shipped in Phase A.
|
||||
- Tolerates sibling-repo workspace mismatch for free.
|
||||
- Tolerates `*` semver across all `@bytelyst/*` deps without rework.
|
||||
- Compatible with the F17 fix (Gitea `host.docker.internal` URLs).
|
||||
|
||||
**Cons:**
|
||||
- **Non-deterministic builds.** Same Dockerfile + same source can produce
|
||||
different `node_modules` if a dependency was published between two
|
||||
cold builds. CI runs days apart can ship divergent images for the same
|
||||
commit.
|
||||
- No supply-chain pinning. Any compromised upstream auto-rolls forward.
|
||||
- `pnpm audit` on the host can disagree with what's actually inside
|
||||
the image.
|
||||
|
||||
### Option B — Generate a Docker-only flat lockfile during build
|
||||
|
||||
**How it works.** Add a build step that runs `pnpm install --lockfile-only`
|
||||
in a temp dir against a flattened `pnpm-workspace.yaml` that excludes
|
||||
sibling-repo paths, then `--frozen-lockfile` against that generated lock.
|
||||
|
||||
**Pros:**
|
||||
- Deterministic *within a single build* — same registry state at the
|
||||
moment of the build always produces the same lockset.
|
||||
- Doesn't require changes to the source tree's `pnpm-workspace.yaml`.
|
||||
|
||||
**Cons:**
|
||||
- Still non-deterministic across builds (the lock is regenerated each time
|
||||
unless cached separately).
|
||||
- Adds Dockerfile complexity and a non-trivial new failure mode
|
||||
(workspace-flattening logic).
|
||||
- Marginal value over Option A given the cache mount.
|
||||
|
||||
### Option C — Vendor a Docker-flattened lockfile in the repo
|
||||
|
||||
**How it works.** Commit a `pnpm-lock.docker.yaml` (or similar) per repo
|
||||
that's generated against a flattened workspace. Dockerfile uses
|
||||
`pnpm install --frozen-lockfile --lockfile=pnpm-lock.docker.yaml`.
|
||||
|
||||
**Pros:**
|
||||
- Fully deterministic. Same commit → same lockset → same image.
|
||||
- Supply chain pins enforced.
|
||||
- `pnpm audit` matches image contents.
|
||||
|
||||
**Cons:**
|
||||
- Two lockfiles to maintain (the workspace one + the Docker one).
|
||||
- Drift risk between the two — solved only by a CI gate that regenerates
|
||||
the Docker lockfile on every PR that touches `package.json`.
|
||||
- Requires a tested regenerate-on-CI workflow per repo.
|
||||
- Workspace flattening logic must be encoded somewhere (script in
|
||||
`common-plat/scripts/regen-docker-lockfile.sh`).
|
||||
|
||||
### Option D — Restructure to single-repo workspace (eliminate sibling)
|
||||
|
||||
**How it works.** Inline the consumed `@bytelyst/*` packages into each
|
||||
product repo (vendor them) so there is no sibling-workspace dependency.
|
||||
Then `--frozen-lockfile` works trivially.
|
||||
|
||||
**Pros:**
|
||||
- Cleanest from a Docker-build-determinism standpoint.
|
||||
|
||||
**Cons:**
|
||||
- Massive churn across 14+ product repos.
|
||||
- Defeats the entire `learning_ai_common_plat` shared-package model.
|
||||
- Multiplies maintenance cost of `@bytelyst/*` updates by the number of
|
||||
consumers.
|
||||
- Out of scope; would supersede the entire ecosystem architecture.
|
||||
|
||||
---
|
||||
|
||||
## 3. Decision
|
||||
|
||||
**Adopt Option A (`--lockfile=false`) as the official short-term policy.**
|
||||
**Plan to migrate to Option C (`pnpm-lock.docker.yaml`) when supply-chain
|
||||
determinism becomes a hard requirement** (e.g., before any production
|
||||
deployment of a Docker-built image, or before SOC2-style attestation).
|
||||
|
||||
**Reasoning:**
|
||||
|
||||
1. **Phase A is already shipped on Option A** with verified speed wins
|
||||
(warm rebuilds 2.7–5.4 s across all surfaces). Switching policies
|
||||
mid-rollout would invalidate metrics + add risk.
|
||||
2. **The cache mount (Phase A2) addresses the speed concern** that
|
||||
Option A creates. The remaining concern is determinism, which is a
|
||||
correctness concern — but the actual blast radius is limited because:
|
||||
- All `@bytelyst/*` deps are first-party and pinned in source repos.
|
||||
- Third-party deps already have fixed semver in `package.json` (no
|
||||
loose `*` ranges to public registries).
|
||||
- The Gitea registry is the only `@bytelyst/*` source — no public
|
||||
supply-chain risk for the in-house deps.
|
||||
3. **Option C is the right end state** but requires CI infrastructure
|
||||
that doesn't exist yet (auto-regen-on-PR). Building it inside this
|
||||
roadmap is scope creep.
|
||||
4. **Option B is dominated by Option C** — same complexity, weaker
|
||||
guarantees.
|
||||
5. **Option D is non-starter** — it would require redesigning the
|
||||
ByteLyst shared-package model.
|
||||
|
||||
---
|
||||
|
||||
## 4. Consequences
|
||||
|
||||
### Positive
|
||||
|
||||
- Phase A speed wins are preserved with zero policy churn.
|
||||
- `pnpm-lock.yaml` continues to live in source repos for host development;
|
||||
it stays in `.dockerignore` for Docker builds.
|
||||
- The decision is reversible: switching to Option C in the future is
|
||||
additive (add a Docker lockfile + change one Dockerfile line).
|
||||
|
||||
### Negative
|
||||
|
||||
- Same commit can produce different Docker images on different days. CI
|
||||
must not assume image hash stability for a given commit.
|
||||
- `pnpm audit` results from the host don't match Docker image contents.
|
||||
Workaround: run `pnpm audit` inside the built container as a separate
|
||||
CI job (cheap; no rebuild needed).
|
||||
- Supply-chain attestation (SOC2, SLSA) cannot be produced for these
|
||||
images today. Acceptable while there is no production traffic.
|
||||
|
||||
### Migration trigger
|
||||
|
||||
Switch to Option C when **any** of the following becomes true:
|
||||
|
||||
1. A production environment (paid customers, real PII) deploys a
|
||||
Docker-built image from this codebase.
|
||||
2. A regulatory/audit requirement demands reproducible builds.
|
||||
3. A supply-chain incident occurs (compromised upstream package) and
|
||||
we need rollback granularity finer than "rebuild from current `*`".
|
||||
4. The cache-mount speed win disappears (e.g., CI runner switch removes
|
||||
BuildKit cache persistence).
|
||||
|
||||
### Implementation sketch (when triggered)
|
||||
|
||||
1. In `learning_ai_common_plat`, add `scripts/regen-docker-lockfile.sh`:
|
||||
- Reads each product repo's `package.json`.
|
||||
- Generates a flattened `pnpm-workspace.yaml` (no sibling paths).
|
||||
- Runs `pnpm install --lockfile-only` against the Gitea registry.
|
||||
- Writes `pnpm-lock.docker.yaml` back to the product repo.
|
||||
2. Each product repo gets a `.gitea/workflows/regen-docker-lockfile.yml`
|
||||
that runs the script on PR-touch of `package.json` and either:
|
||||
- commits the regenerated lockfile (auto-PR), or
|
||||
- fails the PR with a "run regen-docker-lockfile.sh and commit" message.
|
||||
3. Each product Dockerfile changes one line:
|
||||
```dockerfile
|
||||
# before
|
||||
RUN pnpm install --ignore-scripts --lockfile=false
|
||||
# after
|
||||
COPY pnpm-lock.docker.yaml ./pnpm-lock.yaml
|
||||
RUN pnpm install --ignore-scripts --frozen-lockfile
|
||||
```
|
||||
4. `.dockerignore` removes `pnpm-lock.yaml` exclusion (or adds explicit
|
||||
include for `pnpm-lock.docker.yaml`).
|
||||
|
||||
This work is **not scoped** in the current roadmap and should be its own
|
||||
small ADR-driven sprint.
|
||||
|
||||
---
|
||||
|
||||
## 5. Status tracking
|
||||
|
||||
| Phase | State | Notes |
|
||||
|---|---|---|
|
||||
| Decision | ✅ Accepted | This ADR |
|
||||
| Implementation | ⏸ Deferred | Triggered by §4 conditions |
|
||||
| Trigger monitor | ⚳ Open | Re-evaluate when Phase D rollout begins |
|
||||
|
||||
---
|
||||
|
||||
## 6. References
|
||||
|
||||
- `docker-build-optimization-roadmap.md` §0 F1, F2 (lockfile findings)
|
||||
- `docker-build-optimization-roadmap.md` §A3 (deferred phase)
|
||||
- `docker-build-optimization-roadmap.md` §A2 (BuildKit cache mount that
|
||||
mitigates the speed concern of Option A)
|
||||
- `learning_ai_common_plat/AGENTS.md` (canonical pnpm workspace config)
|
||||
@ -1,6 +1,6 @@
|
||||
# Docker Build Optimization Roadmap
|
||||
|
||||
> **Status:** Draft v7 (Phase A complete on clock; warm rebuilds 2.9 s backend / 5.4 s web) · **Owner:** Platform DevOps · **Created:** 2026-05-27 · **Revised:** 2026-05-27
|
||||
> **Status:** Draft v8 (Phase A complete on clock + peakpulse; A3 ADR accepted; warm rebuilds 2.7–5.4 s) · **Owner:** Platform DevOps · **Created:** 2026-05-27 · **Revised:** 2026-05-27
|
||||
>
|
||||
> Pilot Docker-build correctness + speed fixes on `learning_ai_clock` (web + backend)
|
||||
> and `learning_ai_peakpulse` (backend), then capture the playbook here for
|
||||
@ -310,7 +310,7 @@ Two options — pick one in a short ADR before implementing:
|
||||
|---|---|---|---|---|---|
|
||||
| clock | backend | **59.2 s** | **64.7 s** | **2.9 s** | Cold essentially flat (corepack adds ~1 s; cache mount empty on first run). Warm → 95.1% reduction. Commits: `clock@8b5c767a3` (A0-V), `clock@f6a806ff3` (A1+A8+A9), `clock@55e8d22d3` (A2+A5+A6) |
|
||||
| clock | web | **193 s (3:13)** | **291 s (4:51) †** | **5.4 s** | Warm → 97.2% reduction. † Cold variance — see footer |
|
||||
| peakpulse | backend | — | — | — | Pending §10 step 9 |
|
||||
| peakpulse | backend | — (was tarball-only path) | **72.2 s** | **2.7 s** | Warm → 96.3% reduction. Commits: `peakpulse@11a6bc5` (Phase A), `peakpulse@6523a1a` (.gitkeep fix), `clock@1465e06b1`+`d69003c1f` (mirror .gitkeep fix) |
|
||||
|
||||
**Footer note on cold-build variance.** Cold builds (`--no-cache`) are
|
||||
dominated by network egress for ~50 `@bytelyst/*` tarballs through the
|
||||
@ -790,9 +790,15 @@ Checks implemented by `docker-doctor.sh`:
|
||||
8. **✅ A2 + A4 + A5 + A6** on clock (cache mount + dockerignore) — `clock@55e8d22d3`.
|
||||
Warm rebuilds: **backend 2.9 s, web 5.4 s** (95–97% reduction).
|
||||
A7 metrics table populated this commit.
|
||||
9. **⏸ Phase A0 → A6** on `learning_ai_peakpulse` (backend only) as validation
|
||||
second pass. **STOP and request approval — different repo, separate audit.**
|
||||
10. **⚳ A3 ADR** — decide lockfile policy (defer implementation).
|
||||
9. **✅ Phase A0 → A6** on `learning_ai_peakpulse` backend (`peakpulse@11a6bc5`).
|
||||
Cold 72.2 s, warm 2.7 s. Pattern from clock applied verbatim, plus a
|
||||
side fix for `.docker-deps/.gitkeep` discoverability that was also
|
||||
ported back to clock (`peakpulse@6523a1a`, `clock@1465e06b1`,
|
||||
`clock@d69003c1f`).
|
||||
10. **✅ A3 ADR** — [`docs/adr/0001-docker-build-lockfile-policy.md`](adr/0001-docker-build-lockfile-policy.md).
|
||||
Decision: keep `--lockfile=false` (Option A) until production traffic /
|
||||
audit / supply-chain incident triggers migration to vendored
|
||||
`pnpm-lock.docker.yaml` (Option C). Implementation deferred.
|
||||
11. **⚳ Phase B** — harden `docker-prep.sh` on clock, then promote to canonical
|
||||
home in common-plat (B7) and sync to peakpulse.
|
||||
12. **⚳ Phase E** — land `docker-doctor.sh`, wire into CI as warning, then error.
|
||||
|
||||
Loading…
Reference in New Issue
Block a user