diff --git a/.env.ecosystem.example b/.env.ecosystem.example
index 4ab9dbb2..4ebfcc75 100644
--- a/.env.ecosystem.example
+++ b/.env.ecosystem.example
@@ -70,6 +70,12 @@ FIELD_ENCRYPT_KEY_PROVIDER=memory
 # ── Product Identity ─────────────────────────────────────────────
 DEFAULT_PRODUCT_ID=lysnrai
 
+# ── Runtime environment ─────────────────────────────────────────
+NODE_ENV=production
+
+# ── CORS (allow all origins for dev/test — restrict in production) ──
+CORS_ORIGIN=*
+
 # ── Webhooks (optional) ─────────────────────────────────────────
 WEBHOOK_INVITATION_REDEEMED_URL=
 WEBHOOK_REFERRAL_STATUS_URL=
diff --git a/docs/devops/single_azure_vm/docker/README.md b/docs/devops/single_azure_vm/docker/README.md
index 1ef26ef2..ab9d6f5b 100644
--- a/docs/devops/single_azure_vm/docker/README.md
+++ b/docs/devops/single_azure_vm/docker/README.md
@@ -10,7 +10,7 @@
 
 - **Azure VM:** Ubuntu 24.04 LTS (or 22.04), **Standard_D8s_v5** (8 vCPU, 32 GB RAM) recommended
 - **Disk:** 128 GB+ (Docker images, Cosmos emulator, Ollama models, build artifacts)
-- **Network:** NSG allowing inbound on ports listed in the Port Map below
+- **Network:** NSG allowing inbound on ports: `22, 80, 1025, 1234, 3000-3003, 3030, 3035, 3040, 3045, 3050, 3055, 3060, 3070, 3075, 3100, 3300, 4003, 4005, 4007, 4010-4019, 8025, 8080, 10000, 11434`
 - **GitHub access:** Repos must be accessible (public or `GITHUB_TOKEN` for private)
 - **Nothing else needed** — the script installs Docker, Node.js, pnpm, Gitea, Ollama, and everything
 
@@ -69,7 +69,7 @@ sudo ./setup.sh --help # Show full usage
 | 4. Build | ~5 min | `pnpm install && pnpm -r build` all `@bytelyst/*` packages |
 | 5. Publish | ~3 min | Publish all packages to local Gitea npm registry |
 | 6. Env | instant | Generate `.env.ecosystem` with Cosmos emulator key, Azurite key, JWT secret |
-| 7. Deploy | ~10 min | Stop Ollama (free RAM), per-service Docker build + deploy (30 services, with fallback), prune build cache, restart Ollama |
+| 7. Deploy | ~10 min | Stop Ollama (free RAM), per-service Docker build + deploy (31 services, with fallback), prune build cache, restart Ollama |
 | 8. Verify | ~1 min | Health-check all 31+ endpoints + create `/opt/bytelyst/check-health.sh` |
 
 ## Port Map (after deployment)
@@ -185,10 +185,25 @@ All optional — defaults work for most setups:
 - **Build failures:** Check Gitea is running (`docker ps | grep gitea`) and packages published (`curl http://localhost:3300/api/packages/bytelyst/npm/`). Per-service build logs: `/opt/bytelyst/.setup-state/builds/.log`. Retry: `sudo ./setup.sh --phase=7`.
 - **Ollama not responding:** Check `systemctl status ollama` or `curl http://localhost:11434/api/version`.
 - **Port conflicts:** Ensure nothing else runs on the listed ports before deploying.
+- **CORS errors in browser:** The generated `.env.ecosystem` sets `CORS_ORIGIN=*` for dev/test. If you restrict it, update the value to match your access URL.
+- **Services in development mode:** `.env.ecosystem` now sets `NODE_ENV=production` for all services. If you need debug logging, remove or change this value.
 
 ## Known Limitations
 
-- **Remote browser access:** Product web apps fall back to `http://localhost:` for API calls. This works when browsing from the VM itself but **not from a remote browser** (e.g., laptop accessing `http://:3060`). For remote access, set up a reverse proxy (Traefik rules) or SSH port-forwarding. Health checks and server-side rendering still work regardless.
+- **Remote browser access:** Product web apps use `http://localhost:` for browser-side API calls (baked at Next.js build time via `NEXT_PUBLIC_*` args). This works when browsing from the VM itself but **not from a remote browser** (e.g., laptop accessing `http://:3060`). For remote access, use SSH port-forwarding:
+  ```bash
+  # Forward all product ports to your laptop (run from your laptop)
+  ssh -N -L 3001:localhost:3001 -L 3002:localhost:3002 -L 3030:localhost:3030 \
+    -L 3035:localhost:3035 -L 3040:localhost:3040 -L 3045:localhost:3045 \
+    -L 3050:localhost:3050 -L 3055:localhost:3055 -L 3060:localhost:3060 \
+    -L 3070:localhost:3070 -L 3075:localhost:3075 \
+    -L 4003:localhost:4003 -L 4010:localhost:4010 -L 4011:localhost:4011 \
+    -L 4012:localhost:4012 -L 4013:localhost:4013 -L 4014:localhost:4014 \
+    -L 4015:localhost:4015 -L 4016:localhost:4016 -L 4017:localhost:4017 \
+    -L 4018:localhost:4018 -L 4019:localhost:4019 \
+    azureuser@
+  ```
+  Then open `http://localhost:3060` etc. on your laptop. Server-side code (API routes, SSR) uses Docker service names and works regardless.
 - **Cosmos emulator is x86-only:** Do not use ARM-based VMs (e.g., Dpsv6). Stick with `Standard_D8s_v5` or similar Intel/AMD instances.
 - **Memory pressure:** Phase 7 automatically stops Ollama (~3 GB) during Docker builds and restarts it after. If builds still OOM on 32 GB, retry with `sudo ./setup.sh --phase=7` (per-service fallback skips what already built).
 - **Corporate proxy in Dockerfiles:** Already removed at source across all repos. No runtime stripping needed.
diff --git a/docs/devops/single_azure_vm/docker/prompt.md b/docs/devops/single_azure_vm/docker/prompt.md
index 0b0512e1..1fe25bc7 100644
--- a/docs/devops/single_azure_vm/docker/prompt.md
+++ b/docs/devops/single_azure_vm/docker/prompt.md
@@ -14,7 +14,7 @@ This folder contains three files you must work with:
 - **`README.md`** — Deployment guide documenting what the script does, ports, troubleshooting
 - **`prompt.md`** — This file (agent instructions)
 
-The script installs everything from scratch (Docker, Node.js, pnpm, Gitea, act_runner, Ollama) then clones 12 repos, builds + publishes ~57 `@bytelyst/*` npm packages to a local Gitea registry, generates environment config, and deploys 31 Docker Compose services (6 infra + 3 platform + 2 dashboards + 10 backends + 9 webs + 1 standalone).
+The script installs everything from scratch (Docker, Node.js, pnpm, Gitea, act_runner, Ollama) then clones 12 repos, builds + publishes ~57 `@bytelyst/*` npm packages to a local Gitea registry, generates environment config, and deploys 31 Docker Compose services (6 infra + 3 platform + 2 dashboards + 10 backends + 9 webs + 1 standalone LLM Lab dashboard).
 
 ### Current State (ALREADY IMPLEMENTED — do NOT redo)
 
@@ -23,7 +23,7 @@ The following features are already built and tested in `setup.sh`:
 - **Resume/retry support:** `--resume`, `--resume-from=N`, `--phase=N`, `--reset`, `--status`, `--help` CLI flags
 - **Phase completion markers:** Stored in `/opt/bytelyst/.setup-state/phaseN.done`
 - **GITEA_NPM_TOKEN auto-restore:** Token saved to `/opt/bytelyst/.gitea_token`, restored on resume
-- **Per-service Docker build:** Phase 7 builds each of 30 services individually with `[N/30]` progress
+- **Per-service Docker build:** Phase 7 builds each of 31 services individually with `[N/31]` progress
 - **Per-service fallback:** Failed builds are skipped, remaining services still start
 - **Build logs:** Saved per-service to `/opt/bytelyst/.setup-state/builds/.log`
 - **Phase 7 partial failure handling:** Phase 7 NOT marked done if builds fail, so `--resume` retries it
@@ -76,6 +76,13 @@ The following issues have already been identified and fixed in the current `setu
 | `detect_docker_host_ip()` uses `ip` command not in minimal installs | Added `iproute2` to apt deps | `ddd2db84` |
 | SSH disconnect loses all output | `exec > >(tee -a setup.log) 2>&1` | `ddd2db84` |
 | `localmemgpt-backend` can't reach Ollama on Linux | `extra_hosts: ['host.docker.internal:host-gateway']` in compose | `3b31709b` |
+| `llmlab-dashboard` missing from setup.sh service arrays | Added to WEB_SERVICES + check-health.sh | `d8908093` |
+| Service count inconsistent (30 vs 31 across files) | Fixed all comments/docs to 31 | `d8908093` |
+| Phase 3 `cd` side effect leaves CWD in last repo dir | Added `cd "$INSTALL_DIR"` after loop | `d8908093` |
+| No `CORS_ORIGIN` in .env.ecosystem (remote browser CORS errors) | Added `CORS_ORIGIN=*` to phase6_env | `d8908093` |
+| `NODE_ENV` not set for backends (run in dev mode) | Added `NODE_ENV=production` to phase6_env | `d8908093` |
+| 9 product web services missing healthchecks in compose | Added `healthcheck:` to all 9 web services | `f9a20e46` |
+| Dead `NEXT_PUBLIC_*` runtime env vars in compose (no effect on client code) | Replaced with non-prefixed server-side vars | `f9a20e46` |
 | Dashboard Dockerfiles had hardcoded corporate proxy | Converted to `ARG`-based proxy with empty defaults | `2b9fd717` |
 | `pnpm install --frozen-lockfile` fails on shallow clones | Removed `--frozen-lockfile` | `3b31709b` |
 | 3 service Dockerfiles had stale package.json COPY lists | Updated to all 57 packages + workspace members | `85aca553` |
@@ -106,8 +113,8 @@ The following issues have already been identified and fixed in the current `setu
 
 ## Your Tasks (in priority order)
 
-> **Tasks 1-3 are ALREADY DONE.** See "Current State" above and "Bugs Already Fixed" above.
-> Focus on Tasks 4-7 which are the remaining work.
+> **Tasks 1-3, 5, and 6 are DONE.** See "Current State" above and "Bugs Already Fixed" above.
+> Only Task 4 (dry-run, low priority) and Task 7 (test plan) remain.
 
 ### ~~1. Audit `setup.sh` for correctness~~ ✅ DONE
 
@@ -120,7 +127,7 @@ The script has been audited and all identified bugs fixed (see table above). Pha
 - Phase 5 publish: tolerates 409 conflicts
 - Phase 6 env: heredoc with Cosmos/Azurite emulator keys, semicolons handled
 - Phase 7: per-service build with fallback, BuildKit secrets via `GITEA_NPM_TOKEN` env export
-- Phase 8: health check covers all 30 services + Gitea + Ollama
+- Phase 8: health check covers all 31 services + Gitea + Ollama
 
 ### ~~2. Fix every bug you find~~ ✅ DONE
 
@@ -136,7 +143,7 @@ Already implemented:
 - **Per-service fallback:** Failed Docker builds are skipped, remaining services start
 - **Build logs:** Per-service to `/opt/bytelyst/.setup-state/builds/.log`
 
-### 4. Add a dry-run / validation mode (TODO)
+### 4. Add a dry-run / validation mode (TODO — low priority)
 
 Add `--dry-run` support that:
 
@@ -147,26 +154,28 @@ Add `--dry-run` support that:
 - Does NOT build, publish, or deploy
 - Prints a summary of what WOULD happen
 
-### 5. Validate the `docker-compose.ecosystem.yml` integration
+### ~~5. Validate the `docker-compose.ecosystem.yml` integration~~ ✅ DONE
 
-Read `docker-compose.ecosystem.yml` (in the repo root) and verify:
+Validated and fixed:
 
-- Every service's `build.context` and `build.dockerfile` paths are correct relative to the compose file location
-- Every service's port mapping matches the backend's `PORT` env var
-- The `x-product-build` anchor correctly provides `GITEA_NPM_HOST` and `gitea_npm_token` secret
-- All `depends_on` conditions reference services that actually exist
-- The `localmemgpt-backend` service has `extra_hosts: ['host.docker.internal:host-gateway']` for Ollama access
-- **30 total services:** 6 infra (pre-built images) + 24 built from Dockerfiles
+- All 31 services verified: build contexts, Dockerfile paths, port mappings
+- `x-product-build` anchor correctly provides `GITEA_NPM_HOST` and `gitea_npm_token` secret
+- All `depends_on` conditions reference services that exist
+- `localmemgpt-backend` has `extra_hosts: ['host.docker.internal:host-gateway']`
+- Added healthchecks to all 9 product web services (were missing)
+- Removed dead `NEXT_PUBLIC_*` runtime env vars (Next.js bakes at build time only)
+- Replaced with non-prefixed server-side vars (`PLATFORM_SERVICE_URL`, `BACKEND_URL`, etc.)
+- **31 total services:** 6 infra (pre-built images) + 25 built from Dockerfiles
 
-### 6. Update `README.md`
+### ~~6. Update `README.md`~~ ✅ DONE
 
-After all fixes, update `README.md` to reflect:
+Updated:
 
-- CLI flags: `--resume`, `--resume-from=N`, `--phase=N`, `--reset`, `--status`, `--help`
-- Correct service count: 30 (not 27)
-- Updated duration estimates if phases changed
-- Any new troubleshooting entries
-- NSG port list: `22, 80, 1025, 1234, 3000-3003, 3030, 3035, 3040, 3045, 3050, 3055, 3060, 3070, 3100, 3300, 4003, 4005, 4007, 4010-4019, 8025, 8080, 8081, 10000, 11434`
+- Service count: 31 (was 30 in some places)
+- NSG port list added inline in prerequisites (includes 3075 for llmlab-dashboard)
+- Phase 7 description: 31 services
+- Troubleshooting: added CORS and NODE_ENV entries
+- Known Limitations: expanded remote browser access with SSH port-forwarding command
 
 ### 7. Create a test plan
 
@@ -209,12 +218,13 @@ Add a section to `README.md` (or a separate `test-plan.md`) that describes how t
 
 - [ ] `setup.sh` runs flawlessly from `sudo ./setup.sh` on a raw Ubuntu 24.04 VM
 - [ ] All 8 phases complete without manual intervention
-- [ ] `/opt/bytelyst/check-health.sh` shows ALL 30+ services green
+- [ ] `/opt/bytelyst/check-health.sh` shows ALL 31 services green (including llmlab-dashboard :3075)
 - [ ] All 10 product backends respond to `/health` with `{"status":"ok",...}`
-- [ ] All 9 product web apps serve their landing page
+- [ ] All 10 web apps serve their landing page (9 product + 1 LLM Lab)
 - [ ] Admin dashboard (`http://:3001`) loads
 - [ ] Tracker dashboard (`http://:3003`) loads
 - [ ] LocalMemGPT can reach Ollama (`curl http://localhost:4019/api/models` returns models)
+- [ ] LLM Lab dashboard (`http://:3075`) loads and connects to Ollama
 - [ ] Gitea UI accessible at `http://:3300` with all `@bytelyst/*` packages visible
 - [ ] Grafana accessible at `http://:3000` (admin / bytelyst)
 - [ ] Mailpit accessible at `http://:8025`
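
Reviewer note: the SSH port-forwarding command added to the README's Known Limitations section spells out 22 `-L` flags by hand, which could drift from the port map over time. The flags could instead be generated from a single port list. The following is a minimal sketch, not part of this change: the function name `build_forward_args` and the `VM_HOST` variable are hypothetical, and the ports are copied from the command in the README hunk.

```shell
#!/bin/sh
# Hypothetical helper (not in setup.sh): prints one
# "-L <port>:localhost:<port>" flag per argument, space-separated.
# The function body is a subshell, so its variables stay local.
build_forward_args() (
  args=""
  for port in "$@"; do
    args="${args:+$args }-L ${port}:localhost:${port}"
  done
  printf '%s\n' "$args"
)

# Ports copied from the README command (web apps, dashboards, backends).
PORTS="3001 3002 3030 3035 3040 3045 3050 3055 3060 3070 3075 \
4003 4010 4011 4012 4013 4014 4015 4016 4017 4018 4019"

# Print the command instead of running it, so it can be reviewed first.
# VM_HOST is a placeholder (assumption); set it to the VM's address.
# $PORTS is intentionally unquoted so it splits into one argument per port.
echo "ssh -N $(build_forward_args $PORTS) azureuser@${VM_HOST:-VM_HOST}"
```

Printing the command rather than executing it keeps the sketch side-effect free. Run it with `VM_HOST` set and paste the output into a terminal on the laptop.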