docs(docker): rewrite prompt.md as execution guide for Codex agent on fresh VM

- Reframed from 'review and fix' to 'execute, monitor, fix failures, validate'
- 4 clear tasks: run script, handle failures, validate deployment, report results
- Moved bug history and development context to background reference
- Added copy-pastable validation commands for all 31 services
- Simplified constraints: don't modify unless actual runtime failure
2026-03-28 02:06:52 -07:00 · 2026-03-28 02:06:52 -07:00 · 21ff1058a4
commit 21ff1058a4
parent 7c4f0bc3d9
1 changed file with 162 additions and 269 deletions
--- a/docs/devops/single_azure_vm/docker/prompt.md
+++ b/docs/devops/single_azure_vm/docker/prompt.md
@@ -1,49 +1,139 @@
-# Codex Agent Prompt: ByteLyst Single-VM E2E Deployment
+# Codex Agent Prompt: Deploy ByteLyst on this Azure VM

-> **Goal:** Review, harden, test, and complete `setup.sh` so it works flawlessly on a raw Ubuntu 24.04 Azure VM — zero manual intervention, 100% completion, all 30 services healthy.
+> **Goal:** Execute `setup.sh` on this fresh Azure VM to deploy the entire ByteLyst ecosystem (31 services, 11 products). Then validate the deployment and fix any issues that arise.
 >
-> **IMPORTANT:** Read the "Current State" section below FIRST. Many tasks in this prompt are already completed. Do NOT re-implement them.
+> **You are running on:** A fresh Ubuntu 24.04 LTS Azure VM (Standard_D8s_v5: 8 vCPU, 32 GB RAM, 128 GB disk).
+> **Nothing is pre-installed** except SSH. The script handles everything.

 ---

-## Context
+## Your Tasks (execute in order)

-This folder contains three files you must work with:
+### Task 1: Run the bootstrap script

-- **`setup.sh`** — 8-phase bash script (~990 lines) that bootstraps the entire ByteLyst ecosystem on a blank Ubuntu VM
-- **`README.md`** — Deployment guide documenting what the script does, ports, troubleshooting
-- **`prompt.md`** — This file (agent instructions)
+```bash
+chmod +x setup.sh
+sudo ./setup.sh
+```

-The script installs everything from scratch (Docker, Node.js, pnpm, Gitea, act_runner, Ollama) then clones 12 repos, builds + publishes ~57 `@bytelyst/*` npm packages to a local Gitea registry, generates environment config, and deploys 31 Docker Compose services (6 infra + 3 platform + 2 dashboards + 10 backends + 9 webs + 1 standalone LLM Lab dashboard).
+This will take ~15-25 minutes. It runs 8 phases:

-### Current State (ALREADY IMPLEMENTED — do NOT redo)
+1. Install system dependencies (Docker, Node.js 22, pnpm 10.6.5, Ollama, git, jq)
+2. Start Gitea (local npm registry on :3300) + `act_runner` CI runner
+3. Clone 12 repos from GitHub (`saravanakumardb1` org, public repos)
+4. Build all `@bytelyst/*` packages (`pnpm install && pnpm -r build`)
+5. Publish packages to Gitea npm registry
+6. Generate `.env.ecosystem` with Cosmos emulator keys, JWT secret, etc.
+7. Build + deploy 31 Docker Compose services (per-service, with fallback)
+8. Health-check all endpoints + create `/opt/bytelyst/check-health.sh`

-The following features are already built and tested in `setup.sh`:
+**Monitor the output.** The script logs everything to `/opt/bytelyst/setup.log`; to follow it from a second SSH session, run `tail -f /opt/bytelyst/setup.log`.

-- **Resume/retry support:** `--resume`, `--resume-from=N`, `--phase=N`, `--reset`, `--status`, `--help` CLI flags
-- **Phase completion markers:** Stored in `/opt/bytelyst/.setup-state/phaseN.done`
-- **GITEA_NPM_TOKEN auto-restore:** Token saved to `/opt/bytelyst/.gitea_token`, restored on resume
-- **Per-service Docker build:** Phase 7 builds each of 31 services individually with `[N/31]` progress
-- **Per-service fallback:** Failed builds are skipped, remaining services still start
-- **Build logs:** Saved per-service to `/opt/bytelyst/.setup-state/builds/<service>.log`
-- **Phase 7 partial failure handling:** Phase 7 NOT marked done if builds fail, so `--resume` retries it
-- **set -euo pipefail safety:** All pipelines in fallback paths use `|| true` to prevent premature abort
-- **Ollama model pull non-fatal:** Model download failure doesn't abort the entire setup
-- **SSH disconnect protection:** All output tee'd to `/opt/bytelyst/setup.log`
-- **Idempotent:** Every phase handles re-runs gracefully
+### Task 2: Handle failures (if any)

-### Key files outside this folder that the script depends on
+If the script fails mid-way:

-| File                           | Repo                             | Purpose                                     |
-| ------------------------------ | -------------------------------- | ------------------------------------------- |
-| `docker-compose.ecosystem.yml` | `learning_ai_common_plat` (root) | Defines all 30 services                     |
-| `.env.ecosystem.example`       | `learning_ai_common_plat` (root) | Template for env vars                       |
-| `packages/*/package.json`      | `learning_ai_common_plat`        | ~49 `@bytelyst/*` packages to publish       |
-| `backend/Dockerfile`           | Each of the 10 product repos     | Product backend Docker builds               |
-| `web/Dockerfile`               | Each of the 10 product repos     | Product web Docker builds                   |
-| `.npmrc.docker`                | Each of the 10 product repos     | Gitea npm registry config for Docker builds |
+- **Check which phase failed:** `sudo ./setup.sh --status`
+- **Resume from where it stopped:** `sudo ./setup.sh --resume`
+- **Retry a single phase:** `sudo ./setup.sh --phase=N` (e.g., `--phase=7` for Docker builds)
+- **Check build logs:** `ls /opt/bytelyst/.setup-state/builds/` (per-service logs)
+- **Check the main log:** `tail -100 /opt/bytelyst/setup.log`

-### Repo list (all 12, cloned to `/opt/bytelyst/`)
+If a Docker build fails for a specific service:
+
+1. Read the build log: `cat /opt/bytelyst/.setup-state/builds/<service>.log`
+2. Diagnose the issue (missing dependency, Dockerfile error, OOM, etc.)
+3. If the fix is in a Dockerfile or source file, edit it in `/opt/bytelyst/<repo>/`
+4. Retry: `sudo ./setup.sh --phase=7`
+
+**Common issues and fixes:**
+
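+- **OOM during Docker builds:** Phase 7 already stops Ollama to free ~3 GB of RAM. If a build is still killed (`sudo dmesg | grep -i oom` shows it), check headroom with `free -h` and retry with `--phase=7`. Note that `docker system prune -f` frees disk space, not memory.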
+- **Gitea package not found:** Run `sudo ./setup.sh --phase=5` to re-publish, then `--phase=7`.
+- **Cosmos emulator slow to start:** The emulator can take a while to become healthy. If Phase 8 health checks time out before it is ready, wait a few minutes and retry with `--phase=8`.
+- **Build cache fills disk:** Phase 7 prunes after builds. If disk is full: `docker builder prune -af`.
+
+### Task 3: Validate the deployment
+
+After the script completes successfully, run the validation:
+
+```bash
+# Quick health check (generated by Phase 8)
+/opt/bytelyst/check-health.sh
+
+# Dry-run validation (checks all prerequisites)
+sudo ./setup.sh --dry-run
+```
+
+Then verify these specific endpoints:
+
+```bash
+# Platform service
+curl -sf http://localhost:4003/health | jq .
+# Expect: {"status":"ok","service":"platform-service",...}
+
+# All 10 product backends (jq -e exits non-zero on empty input, so a failed curl prints FAIL)
+for port in 4010 4011 4012 4013 4014 4015 4016 4017 4018 4019; do
+  echo -n "  :${port} -> "
+  curl -sf "http://localhost:${port}/health" | jq -er .status 2>/dev/null || echo "FAIL"
+done
+# Expect: all "ok"
+
+# All 10 product web apps (9 product + 1 LLM Lab)
+for port in 3002 3030 3035 3040 3045 3050 3055 3060 3070 3075; do
+  echo -n "  :${port} -> HTTP "
+  curl -so /dev/null -w '%{http_code}' http://localhost:${port}/
+  echo ""
+done
+# Expect: all HTTP 200
+
+# LocalMemGPT can see Ollama models
+curl -sf http://localhost:4019/api/models | jq '.[0].name'
+# Expect: model name (e.g., "llama3.2:3b")
+
+# LLM Lab dashboard proxies to Ollama
+curl -sf http://localhost:3075/api/ollama/tags | jq '.models[0].name'
+# Expect: model name
+
+# Gitea has packages (lists npm packages owned by bytelyst via the Gitea package API)
+curl -sf "http://localhost:3300/api/v1/packages/bytelyst?type=npm" | jq '.[0].name'
+# Expect: @bytelyst package names
+
+# Grafana
+curl -sf -u admin:bytelyst http://localhost:3000/api/org | jq .name
+# Expect: "Main Org."
+
+# Mailpit
+curl -sf http://localhost:8025/api/v1/messages | jq .total
+# Expect: 0
+```
+
+### Task 4: Report results
+
+After validation, produce a summary:
+
+1. **Phase completion status** — output of `sudo ./setup.sh --status`
+2. **Health check results** — output of `/opt/bytelyst/check-health.sh`
+3. **Any services that failed** — which ones and why
+4. **Any manual fixes applied** — what you changed and where
+5. **Total deployment time** — from the script's completion banner
+
+---
+
+## Important Context
+
+### What the script deploys (31 services)
+
+| Category          | Count | Services                                                                        |
+| ----------------- | ----- | ------------------------------------------------------------------------------- |
+| Infrastructure    | 6     | cosmos-emulator, azurite, mailpit, loki, grafana, traefik                       |
+| Platform Services | 3     | platform-service (:4003), extraction-service (:4005), mcp-server (:4007)        |
+| Dashboards        | 2     | admin-web (:3001), tracker-web (:3003)                                          |
+| Product Backends  | 10    | peakpulse-backend through localmemgpt-backend (:4010-:4019)                     |
+| Product Web Apps  | 9     | lysnrai-dashboard (:3002), chronomind-web through localmemgpt-web (:3030-:3070) |
+| Standalone        | 1     | llmlab-dashboard (:3075)                                                        |
+
+### Repo list (cloned to `/opt/bytelyst/`)

 ```
 learning_ai_common_plat          # Shared platform: packages, services, dashboards, compose
@@ -62,238 +152,7 @@ learning_ai_local_llms           # Local LLM Lab (dashboard only, no backend)

 GitHub org: `saravanakumardb1` (repos are public).

----
-
-## Bugs Already Fixed (do NOT re-fix these)
-
-The following issues have already been identified and fixed in the current `setup.sh`:
-
-| Bug                                                                         | Fix                                                             | Commit     |
-| --------------------------------------------------------------------------- | --------------------------------------------------------------- | ---------- | ----- | ---------- |
-| Docker apt source had extra whitespace from `\` continuation                | Single-line echo                                                | `ddd2db84` |
-| Gitea 1.22 returns token in `.sha1`, newer versions use `.token`            | `jq -r '.sha1 // .token'` fallback                              | `ddd2db84` |
-| jfrog registry sed didn't handle multi-line `\` continuation                | Added `/jfrog-pkg-proxy.*\\$/d` pattern                         | `ddd2db84` |
-| `detect_docker_host_ip()` uses `ip` command not in minimal installs         | Added `iproute2` to apt deps                                    | `ddd2db84` |
-| SSH disconnect loses all output                                             | `exec > >(tee -a setup.log) 2>&1`                               | `ddd2db84` |
-| `localmemgpt-backend` can't reach Ollama on Linux                           | `extra_hosts: ['host.docker.internal:host-gateway']` in compose | `3b31709b` |
-| `llmlab-dashboard` missing from setup.sh service arrays                     | Added to WEB_SERVICES + check-health.sh                         | `d8908093` |
-| Service count inconsistent (30 vs 31 across files)                          | Fixed all comments/docs to 31                                   | `d8908093` |
-| Phase 3 `cd` side effect leaves CWD in last repo dir                        | Added `cd "$INSTALL_DIR"` after loop                            | `d8908093` |
-| No `CORS_ORIGIN` in .env.ecosystem (remote browser CORS errors)             | Added `CORS_ORIGIN=*` to phase6_env                             | `d8908093` |
-| `NODE_ENV` not set for backends (run in dev mode)                           | Added `NODE_ENV=production` to phase6_env                       | `d8908093` |
-| 9 product web services missing healthchecks in compose                      | Added `healthcheck:` to all 9 web services                      | `f9a20e46` |
-| Dead `NEXT_PUBLIC_*` runtime env vars in compose (no effect on client code) | Replaced with non-prefixed server-side vars                     | `f9a20e46` |
-| Dashboard Dockerfiles had hardcoded corporate proxy                         | Converted to `ARG`-based proxy with empty defaults              | `2b9fd717` |
-| `pnpm install --frozen-lockfile` fails on shallow clones                    | Removed `--frozen-lockfile`                                     | `3b31709b` |
-| 3 service Dockerfiles had stale package.json COPY lists                     | Updated to all 57 packages + workspace members                  | `85aca553` |
-| Phase 5 publish counted 409 conflicts as failures                           | Distinguish real failures from expected conflicts               | `c0bc13e1` |
-| `set -e` + `pipefail` aborted script on `docker compose up` partial failure | Added `\|\| true`                                               | `a9414218` |
-| Phase 7 marked done even with partial build failures                        | Only mark done when all builds succeed                          | `a9414218` |
-| `docker compose config --format json` called 30x in loop                    | Cached once                                                     | `a9414218` |
-| `--phase=7` printed success even with failures                              | Now exits 1 with build log path                                 | `a9414218` |
-| `last_completed_phase` didn't enforce sequential order                      | Stops at first gap                                              | `a3f4c6fa` |
-| Phase 7 missing `.env.ecosystem` guard                                      | Fail early with helpful message                                 | `a3f4c6fa` |
-| `ollama pull \| tail` aborted entire setup on slow network                  | Made non-fatal                                                  | `b634708d` |
-| NodeSource `curl\|bash` deprecated install method                           | Modern GPG key + apt source method                              | `c2ca7f53` |
-| Missing `build-essential python3` for native addons                         | Added to apt deps                                               | `c2ca7f53` |
-| `pnpm -r build` fails on workspace members without build script             | Added `--if-present` flag                                       | `c2ca7f53` |
-| `gpg --dearmor` prompts on re-run if keyring exists                         | Added `--batch --yes`                                           | `1a1f7dd5` |
-| `jq` aborts script on malformed Gitea token response                        | Added `2>/dev/null \|\| echo ""` guard                          | `1a1f7dd5` |
-| `pnpm install`/`build` failures show no useful message                      | Wrapped in `if ! ...; then fail("...")`                         | `1a1f7dd5` |
-| Docker builds OOM with Ollama + Cosmos (~7 GB combined)                     | Stop Ollama during Phase 7, restart after                       | `1a1f7dd5` |
-| Pre-flight: script runs on tiny VMs with no warning                         | Added disk (≥40 GB) and RAM (≥16 GB) checks                     | `1a1f7dd5` |
-| Azurite + Loki missing from Phase 8 health checks                           | Added both to check-health.sh                                   | `f78d382d` |
-| GITEA_NPM_TOKEN silently empty on resume                                    | Added `require_gitea_token()` guard in Phase 4 + 7              | `e928ec60` |
-| Dashboard Dockerfiles `--frozen-lockfile` fails (incomplete workspace)      | Removed from admin-web + tracker-web                            | `e928ec60` |
-| Docker build cache exhausts disk (~20-40 GB)                                | Added `docker builder prune` after Phase 7                      | `e928ec60` |
-| Compose `NEXT_PUBLIC_*` env vars wrong for 8/9 web services                 | Fixed per-service to match product code                         | `01f2276a` |
-| MindLyst web 3 files fallback to production URLs                            | Changed to `http://localhost:4003`                              | `09bdda8`  |
-
----
-
-## Your Tasks (in priority order)
-
-> **All 7 tasks are DONE.** See "Current State" above and "Bugs Already Fixed" above.
-
-### ~~1. Audit `setup.sh` for correctness~~ ✅ DONE
-
-The script has been audited and all identified bugs fixed (see table above). Phases 1-8 are tested. Key things already verified:
-
-- Docker CE install, Node.js 22 (NodeSource), pnpm 10.6.5, Ollama — all idempotent
-- Gitea token: `.sha1 // .token` fallback in place
-- Corporate proxy: removed at source in all repos, no runtime `sed` needed
-- `pnpm install` runs without `--frozen-lockfile`
-- Phase 5 publish: tolerates 409 conflicts
-- Phase 6 env: heredoc with Cosmos/Azurite emulator keys, semicolons handled
-- Phase 7: per-service build with fallback, BuildKit secrets via `GITEA_NPM_TOKEN` env export
-- Phase 8: health check covers all 31 services + Gitea + Ollama
-
-### ~~2. Fix every bug you find~~ ✅ DONE
-
-All bugs fixed — see the 16-item table in "Bugs Already Fixed" above.
-
-### ~~3. Add error recovery and logging~~ ✅ DONE
-
-Already implemented:
-
-- **Phase completion markers:** `/opt/bytelyst/.setup-state/phaseN.done`
-- **Resume:** `--resume` (auto-detect), `--resume-from=N`, `--phase=N` (single), `--reset`, `--status`
-- **Logging:** `exec > >(tee -a setup.log) 2>&1`
-- **Per-service fallback:** Failed Docker builds are skipped, remaining services start
-- **Build logs:** Per-service to `/opt/bytelyst/.setup-state/builds/<service>.log`
-
-### ~~4. Add a dry-run / validation mode~~ ✅ DONE
-
-Added `--dry-run` flag that validates:
-
-- System: root, disk >= 40 GB, RAM >= 16 GB, Ubuntu
-- Docker: installed, daemon running, Compose available
-- Node.js + pnpm installed
-- Ollama: installed, service running
-- Gitea: reachable, npm token saved
-- Repos: all 12 cloned
-- GitHub: reachable for cloning
-- Compose file + .env.ecosystem exist
-- Phase completion state
-- Prints pass/fail summary with guidance
-
-### ~~5. Validate the `docker-compose.ecosystem.yml` integration~~ ✅ DONE
-
-Validated and fixed:
-
-- All 31 services verified: build contexts, Dockerfile paths, port mappings
-- `x-product-build` anchor correctly provides `GITEA_NPM_HOST` and `gitea_npm_token` secret
-- All `depends_on` conditions reference services that exist
-- `localmemgpt-backend` has `extra_hosts: ['host.docker.internal:host-gateway']`
-- Added healthchecks to all 9 product web services (were missing)
-- Removed dead `NEXT_PUBLIC_*` runtime env vars (Next.js bakes at build time only)
-- Replaced with non-prefixed server-side vars (`PLATFORM_SERVICE_URL`, `BACKEND_URL`, etc.)
-- **31 total services:** 6 infra (pre-built images) + 25 built from Dockerfiles
-
-### ~~6. Update `README.md`~~ ✅ DONE
-
-Updated:
-
-- Service count: 31 (was 30 in some places)
-- NSG port list added inline in prerequisites (includes 3075 for llmlab-dashboard)
-- Phase 7 description: 31 services
-- Troubleshooting: added CORS and NODE_ENV entries
-- Known Limitations: expanded remote browser access with SSH port-forwarding command
-
-### ~~7. Create a test plan~~ ✅ DONE
-
-Created `test-plan.md` with end-to-end validation steps:
-
-- Quick validation (check-health.sh + dry-run)
-- Phase-by-phase verification (all 8 phases)
-- Functional smoke tests (LocalMemGPT+Ollama, LLM Lab, auth, Mailpit, Grafana)
-- Idempotency + resume tests
-- Remote port connectivity via SSH forwarding
-- Service count summary table
-
-Previous inline test plan from prompt.md (kept for reference):
-
-```
-1. SSH into VM
-2. Run: /opt/bytelyst/check-health.sh
-   Expected: All 31 checks green
-3. Run: curl http://localhost:4003/health
-   Expected: {"status":"ok","service":"platform-service",...}
-4. Run: curl http://localhost:4003/api/auth/register -X POST -H 'Content-Type: application/json' -d '{"email":"test@test.com","password":"Test1234!","displayName":"Test"}'
-   Expected: 201 with user object
-5. Open browser: http://<vm-ip>:3001
-   Expected: Admin dashboard login page
-6. Open browser: http://<vm-ip>:3040
-   Expected: FlowMonk web app
-7. Run: curl http://localhost:4019/api/models
-   Expected: List of Ollama models including llama3.2:3b
-8. Open browser: http://<vm-ip>:8025
-   Expected: Mailpit inbox (empty)
-9. Open browser: http://<vm-ip>:3000
-   Expected: Grafana login (admin / bytelyst)
-```
-
----
-
-## Constraints
-
-- **DO NOT** change any files outside `docs/devops/single_azure_vm/` without asking
-- **DO NOT** modify `docker-compose.ecosystem.yml` or any Dockerfile without verifying the change is correct across all affected services
-- **DO NOT** hardcode secrets or API keys (Cosmos emulator and Azurite keys are well-known public keys, those are OK)
-- **DO NOT** add emojis to code
-- **DO NOT** use `console.log` or `print` — use the existing `log()`, `ok()`, `warn()`, `fail()` helpers
-- The script MUST work on a completely fresh Ubuntu 24.04 LTS VM with NOTHING pre-installed except SSH
-- The script MUST be idempotent — running it twice should not break anything
-- The script MUST complete in under 30 minutes on a Standard_D8s_v5 (8 vCPU, 32 GB)
-
-## Definition of Done
-
-- [ ] `setup.sh` runs flawlessly from `sudo ./setup.sh` on a raw Ubuntu 24.04 VM
-- [ ] All 8 phases complete without manual intervention
-- [ ] `/opt/bytelyst/check-health.sh` shows ALL 31 services green (including llmlab-dashboard :3075)
-- [ ] All 10 product backends respond to `/health` with `{"status":"ok",...}`
-- [ ] All 10 product web apps serve their landing page (9 product + 1 LLM Lab)
-- [ ] Admin dashboard (`http://<vm-ip>:3001`) loads
-- [ ] Tracker dashboard (`http://<vm-ip>:3003`) loads
-- [ ] LocalMemGPT can reach Ollama (`curl http://localhost:4019/api/models` returns models)
-- [ ] LLM Lab dashboard (`http://<vm-ip>:3075`) loads and connects to Ollama
-- [ ] Gitea UI accessible at `http://<vm-ip>:3300` with all `@bytelyst/*` packages visible
-- [ ] Grafana accessible at `http://<vm-ip>:3000` (admin / bytelyst)
-- [ ] Mailpit accessible at `http://<vm-ip>:8025`
-- [ ] `README.md` is accurate and complete
-- [ ] Script is idempotent (second run succeeds without errors)
-- [ ] Resume works: `sudo ./setup.sh --resume` after interrupted run
-- [ ] Single-phase retry works: `sudo ./setup.sh --phase=7` after build failure
-- [ ] Setup log saved to `/opt/bytelyst/setup.log`
-- [ ] Build logs saved per-service to `/opt/bytelyst/.setup-state/builds/`
-
----
-
-## Architecture Reference
-
-```
-Raw Ubuntu 24.04 VM
-├── Ollama (systemd, :11434) ─── local LLM inference
-├── Gitea (Docker, :3300) ────── npm package registry
-└── Docker Compose Ecosystem (30 services)
-    ├── Infrastructure
-    │   ├── cosmos-emulator (:8081, :1234)
-    │   ├── azurite (:10000)
-    │   ├── mailpit (:1025, :8025)
-    │   ├── loki (:3100)
-    │   ├── grafana (:3000)
-    │   └── gateway/traefik (:80, :8080)
-    ├── Platform Services
-    │   ├── platform-service (:4003) ── auth, billing, flags, audit
-    │   ├── extraction-service (:4005) ── AI text extraction
-    │   └── mcp-server (:4007) ── MCP tool server
-    ├── Dashboards
-    │   ├── admin-web (:3001) ── platform admin console
-    │   └── tracker-web (:3003) ── issue tracker
-    ├── Product Backends (Fastify 5 + TypeScript)
-    │   ├── peakpulse-backend (:4010)
-    │   ├── chronomind-backend (:4011)
-    │   ├── jarvisjr-backend (:4012)
-    │   ├── nomgap-backend (:4013)
-    │   ├── mindlyst-backend (:4014)
-    │   ├── lysnrai-backend (:4015)
-    │   ├── notelett-backend (:4016)
-    │   ├── flowmonk-backend (:4017)
-    │   ├── actiontrail-backend (:4018)
-    │   └── localmemgpt-backend (:4019) ── connects to Ollama
-    └── Product Web Apps (Next.js 16)
-        ├── lysnrai-dashboard (:3002)
-        ├── chronomind-web (:3030)
-        ├── jarvisjr-web (:3035)
-        ├── flowmonk-web (:3040)
-        ├── notelett-web (:3045)
-        ├── mindlyst-web (:3050)
-        ├── nomgap-web (:3055)
-        ├── actiontrail-web (:3060)
-        └── localmemgpt-web (:3070)
-```
-
-## How Docker Builds Reach Gitea
+### How Docker builds reach the Gitea npm registry

 Product Dockerfiles use a BuildKit secret mount for the npm token:

@@ -304,12 +163,45 @@ RUN --mount=type=secret,id=gitea_npm_token \
    pnpm install
 ```

-The `.npmrc.docker` in each product repo uses `${GITEA_NPM_HOST}:3300` as the registry host.
-During `docker compose build`, the host's `GITEA_NPM_TOKEN` env var is passed as a BuildKit secret,
-and `GITEA_NPM_HOST` is passed as a build arg (defaults to `host.docker.internal`, overridden to
-`172.17.0.1` on Linux VMs by the setup script).
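+The host's `GITEA_NPM_TOKEN` env var is passed as the `gitea_npm_token` BuildKit secret, and `GITEA_NPM_HOST` is passed as a build arg (default `host.docker.internal`, overridden by the script to `172.17.0.1`, the default Docker bridge gateway, on Linux VMs).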

-## CLI Reference
+### Script features (already built in)
+
+- **Resume/retry:** `--resume`, `--resume-from=N`, `--phase=N`, `--reset`, `--status`, `--dry-run`
+- **Phase markers:** `/opt/bytelyst/.setup-state/phaseN.done`
+- **Per-service fallback:** Failed Docker builds are skipped; the remaining services still start
+- **Build logs:** Per-service at `/opt/bytelyst/.setup-state/builds/<service>.log`
+- **Idempotent:** Safe to run twice
+- **SSH protection:** All output tee'd to `/opt/bytelyst/setup.log`
+
+### 30+ bugs already fixed in the script
+
+The script has been extensively tested and hardened. 30+ bugs were found and fixed during development, including: Docker apt issues, Gitea token format changes, corporate proxy cleanup, OOM handling, build cache management, CORS and NODE_ENV configuration, healthchecks for all web services, and more. See the git log for details. **Do NOT modify `setup.sh` unless you encounter an actual runtime failure on this VM.**
+
+---
+
+## Constraints
+
+- **DO NOT** modify `setup.sh` unless you encounter an actual failure that requires a fix
+- **DO NOT** modify `docker-compose.ecosystem.yml` or any Dockerfile unless you are fixing a real build failure
+- **DO NOT** hardcode secrets or API keys (the well-known Cosmos emulator and Azurite development keys are the exception)
+- If you must edit a file to fix a build failure, **document what you changed and why** in your report
+- The script is designed for Ubuntu 24.04 LTS on x86_64 (Cosmos emulator requires x86)
+
+## Definition of Done
+
+- [ ] `setup.sh` completed all 8 phases (or you resolved failures and re-ran successfully)
+- [ ] `/opt/bytelyst/check-health.sh` shows ALL 31 services green
+- [ ] All 10 product backends respond to `/health` with `{"status":"ok",...}`
+- [ ] All 10 product web apps return HTTP 200 (9 product + 1 LLM Lab)
+- [ ] LocalMemGPT can reach Ollama (`curl http://localhost:4019/api/models` returns models)
+- [ ] LLM Lab dashboard (:3075) loads and proxies to Ollama
+- [ ] Gitea UI at :3300 shows all `@bytelyst/*` packages
+- [ ] Grafana at :3000 is accessible (admin / bytelyst)
+- [ ] Mailpit at :8025 is accessible
+- [ ] You produced a results summary (Task 4)
+
+## CLI Quick Reference

 ```bash
 sudo ./setup.sh                    # Fresh install (all 8 phases)
@@ -317,6 +209,7 @@ sudo ./setup.sh --phase=7          # Retry just the deploy phase
 sudo ./setup.sh --resume           # Auto-resume after SSH disconnect
 sudo ./setup.sh --resume-from=7    # Jump to deploy after manual fix
 sudo ./setup.sh --status           # Check what's done
+sudo ./setup.sh --dry-run          # Validate prerequisites (no changes)
 sudo ./setup.sh --reset            # Start completely over
 sudo ./setup.sh --help             # Show usage
 ```
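
The `--status` and `--resume` flags in the quick reference depend on the phase markers in `/opt/bytelyst/.setup-state/`, and the bug table in the removed section notes that `last_completed_phase` stops at the first missing marker. The sketch below reproduces that rule so the state can be inspected by hand. The helper names and the `next_phase` function are hypothetical, it assumes markers are files named `phase1.done` through `phase8.done`, and the real implementation in `setup.sh` may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: a sketch of the "stop at the first gap" rule described
# for last_completed_phase in setup.sh, not the script's actual code.
# Assumes markers are files named phase1.done .. phase8.done.

# Print the highest N such that phase1.done .. phaseN.done all exist (0 if none).
last_completed_phase() {
  local state_dir="${1:-/opt/bytelyst/.setup-state}"
  local last=0 n
  for n in 1 2 3 4 5 6 7 8; do
    if [[ -f "${state_dir}/phase${n}.done" ]]; then
      last=$n
    else
      break # a missing marker ends the contiguous run
    fi
  done
  echo "$last"
}

# Print the phase a resume would start from (9 means all 8 phases are done).
next_phase() {
  echo $(( $(last_completed_phase "$@") + 1 ))
}
```

For a state directory that holds `phase1.done`, `phase2.done`, and `phase4.done`, `last_completed_phase` prints `2` and `next_phase` prints `3`: the stray phase 4 marker is ignored, so a resume restarts at phase 3 instead of skipping it.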