bytelyst-devops-tools/docs/adr/0001-docker-build-lockfile-policy.md

# ADR-0001: Docker build lockfile policy

> **Status:** Accepted (decision); Deferred (implementation) · **Date:** 2026-05-27
> **Context:** docker-build-optimization-roadmap §A3 · **Supersedes:** None
> **Authors:** Platform DevOps

---

## 1. Context

The pilot Phase A work in `docker-build-optimization-roadmap` standardized
on `pnpm install --lockfile=false` inside Docker for both
`learning_ai_clock` (web + backend) and `learning_ai_peakpulse` (backend).
That choice unblocked Phase A by sidestepping a structural mismatch:

- `pnpm-lock.yaml` is generated against the **outer pnpm workspace**, which
  includes `../learning_ai_common_plat/packages/*` as workspace members
  (sibling-repo path).
- Inside the Docker build context, the sibling repo doesn't exist
  (a single-repo build context is intentionally used for hermeticity).
- `--frozen-lockfile` therefore fails immediately with workspace
  resolution errors (finding F2 in the roadmap audit).

`--lockfile=false` skips lockfile validation entirely and re-resolves all
dependencies against the registry on every `pnpm install`. This is
correct for the workspace-mismatch problem but introduces non-determinism:
the **same Dockerfile + same source tree can produce a different lockset**
across two builds if upstream `@bytelyst/*` versions move between them.

Phase A2's BuildKit cache mount mitigates the *speed* cost of
re-resolution but not the *determinism* cost.

This ADR records the decision on which long-term policy to adopt for
Docker builds. Implementation is deferred to a future Phase A3 sprint.

---

## 2. Options considered

### Option A — Keep `--lockfile=false` (status quo)

**How it works.** Docker `pnpm install` re-resolves on every cold build.
Cache mount preserves the pnpm content-addressed store across builds, so
warm rebuilds don't pay re-resolution cost.

**Pros:**
- Zero churn — already shipped in Phase A.
- Tolerates sibling-repo workspace mismatch for free.
- Tolerates `*` semver across all `@bytelyst/*` deps without rework.
- Compatible with the F17 fix (Gitea `host.docker.internal` URLs).

**Cons:**
- **Non-deterministic builds.** Same Dockerfile + same source can produce
  different `node_modules` if a dependency was published between two
  cold builds. CI runs days apart can ship divergent images for the same
  commit.
- No supply-chain pinning. Any compromised upstream auto-rolls forward.
- `pnpm audit` on the host can disagree with what's actually inside
  the image.

### Option B — Generate a Docker-only flat lockfile during build

**How it works.** Add a build step that runs `pnpm install --lockfile-only`
in a temp dir against a flattened `pnpm-workspace.yaml` that excludes
sibling-repo paths, then `--frozen-lockfile` against that generated lock.

**Pros:**
- Deterministic *within a single build* — same registry state at the
  moment of the build always produces the same lockset.
- Doesn't require changes to the source tree's `pnpm-workspace.yaml`.

**Cons:**
- Still non-deterministic across builds (the lock is regenerated each time
  unless cached separately).
- Adds Dockerfile complexity and a non-trivial new failure mode
  (workspace-flattening logic).
- Marginal value over Option A given the cache mount.

### Option C — Vendor a Docker-flattened lockfile in the repo

**How it works.** Commit a `pnpm-lock.docker.yaml` (or similar) per repo
that's generated against a flattened workspace. Dockerfile uses
`pnpm install --frozen-lockfile --lockfile=pnpm-lock.docker.yaml`.

**Pros:**
- Fully deterministic. Same commit → same lockset → same image.
- Supply chain pins enforced.
- `pnpm audit` matches image contents.

**Cons:**
- Two lockfiles to maintain (the workspace one + the Docker one).
- Drift risk between the two — solved only by a CI gate that regenerates
  the Docker lockfile on every PR that touches `package.json`.
- Requires a tested regenerate-on-CI workflow per repo.
- Workspace flattening logic must be encoded somewhere (script in
  `common-plat/scripts/regen-docker-lockfile.sh`).

### Option D — Restructure to single-repo workspace (eliminate sibling)

**How it works.** Inline the consumed `@bytelyst/*` packages into each
product repo (vendor them) so there is no sibling-workspace dependency.
Then `--frozen-lockfile` works trivially.

**Pros:**
- Cleanest from a Docker-build-determinism standpoint.

**Cons:**
- Massive churn across 14+ product repos.
- Defeats the entire `learning_ai_common_plat` shared-package model.
- Multiplies maintenance cost of `@bytelyst/*` updates by the number of
  consumers.
- Out of scope; would supersede the entire ecosystem architecture.

---

## 3. Decision

**Adopt Option A (`--lockfile=false`) as the official short-term policy.**
**Plan to migrate to Option C (`pnpm-lock.docker.yaml`) when supply-chain
determinism becomes a hard requirement** (e.g., before any production
deployment of a Docker-built image, or before SOC2-style attestation).

**Reasoning:**

1. **Phase A is already shipped on Option A** with verified speed wins
   (warm rebuilds 2.7–5.4 s across all surfaces). Switching policies
   mid-rollout would invalidate metrics + add risk.
2. **The cache mount (Phase A2) addresses the speed concern** that
   Option A creates. The remaining concern is determinism, which is a
   correctness concern — but the actual blast radius is limited because:
   - All `@bytelyst/*` deps are first-party and pinned in source repos.
   - Third-party deps already have fixed semver in `package.json` (no
     loose `*` ranges to public registries).
   - The Gitea registry is the only `@bytelyst/*` source — no public
     supply-chain risk for the in-house deps.
3. **Option C is the right end state** but requires CI infrastructure
   that doesn't exist yet (auto-regen-on-PR). Building it inside this
   roadmap is scope creep.
4. **Option B is dominated by Option C** — same complexity, weaker
   guarantees.
5. **Option D is non-starter** — it would require redesigning the
   ByteLyst shared-package model.

---

## 4. Consequences

### Positive

- Phase A speed wins are preserved with zero policy churn.
- `pnpm-lock.yaml` continues to live in source repos for host development;
  it stays in `.dockerignore` for Docker builds.
- The decision is reversible: switching to Option C in the future is
  additive (add a Docker lockfile + change one Dockerfile line).

### Negative

- Same commit can produce different Docker images on different days. CI
  must not assume image hash stability for a given commit.
- `pnpm audit` results from the host don't match Docker image contents.
  Workaround: run `pnpm audit` inside the built container as a separate
  CI job (cheap; no rebuild needed).
- Supply-chain attestation (SOC2, SLSA) cannot be produced for these
  images today. Acceptable while there is no production traffic.

### Migration trigger

Switch to Option C when **any** of the following becomes true:

1. A production environment (paid customers, real PII) deploys a
   Docker-built image from this codebase.
2. A regulatory/audit requirement demands reproducible builds.
3. A supply-chain incident occurs (compromised upstream package) and
   we need rollback granularity finer than "rebuild from current `*`".
4. The cache-mount speed win disappears (e.g., CI runner switch removes
   BuildKit cache persistence).

### Implementation sketch (when triggered)

1. In `learning_ai_common_plat`, add `scripts/regen-docker-lockfile.sh`:
   - Reads each product repo's `package.json`.
   - Generates a flattened `pnpm-workspace.yaml` (no sibling paths).
   - Runs `pnpm install --lockfile-only` against the Gitea registry.
   - Writes `pnpm-lock.docker.yaml` back to the product repo.
2. Each product repo gets a `.gitea/workflows/regen-docker-lockfile.yml`
   that runs the script on PR-touch of `package.json` and either:
   - commits the regenerated lockfile (auto-PR), or
   - fails the PR with a "run regen-docker-lockfile.sh and commit" message.
3. Each product Dockerfile changes one line:
   ```dockerfile
   # before
   RUN pnpm install --ignore-scripts --lockfile=false
   # after
   COPY pnpm-lock.docker.yaml ./pnpm-lock.yaml
   RUN pnpm install --ignore-scripts --frozen-lockfile
   ```
4. `.dockerignore` removes `pnpm-lock.yaml` exclusion (or adds explicit
   include for `pnpm-lock.docker.yaml`).

This work is **not scoped** in the current roadmap and should be its own
small ADR-driven sprint.

---

## 5. Status tracking

| Phase | State | Notes |
|---|---|---|
| Decision | ✅ Accepted | This ADR |
| Implementation | ⏸ Deferred | Triggered by §4 conditions |
| Trigger monitor | ⚳ Open | Re-evaluate when Phase D rollout begins |

---

## 6. References

- `docker-build-optimization-roadmap.md` §0 F1, F2 (lockfile findings)
- `docker-build-optimization-roadmap.md` §A3 (deferred phase)
- `docker-build-optimization-roadmap.md` §A2 (BuildKit cache mount that
  mitigates the speed concern of Option A)
- `learning_ai_common_plat/AGENTS.md` (canonical pnpm workspace config)