docs(docker): rewrite prompt.md as execution guide for Codex agent on fresh VM
- Reframed from 'review and fix' to 'execute, monitor, fix failures, validate'
- 4 clear tasks: run script, handle failures, validate deployment, report results
- Moved bug history and development context to background reference
- Added copy-pastable validation commands for all 31 services
- Simplified constraints: don't modify unless actual runtime failure
This commit is contained in:
parent
7c4f0bc3d9
commit
21ff1058a4
@ -1,49 +1,139 @@
|
||||
# Codex Agent Prompt: ByteLyst Single-VM E2E Deployment
|
||||
# Codex Agent Prompt: Deploy ByteLyst on this Azure VM
|
||||
|
||||
> **Goal:** Review, harden, test, and complete `setup.sh` so it works flawlessly on a raw Ubuntu 24.04 Azure VM — zero manual intervention, 100% completion, all 30 services healthy.
|
||||
> **Goal:** Execute `setup.sh` on this fresh Azure VM to deploy the entire ByteLyst ecosystem (31 services, 11 products). Then validate the deployment and fix any issues that arise.
|
||||
>
|
||||
> **IMPORTANT:** Read the "Current State" section below FIRST. Many tasks in this prompt are already completed. Do NOT re-implement them.
|
||||
> **You are running on:** A fresh Ubuntu 24.04 LTS Azure VM (Standard_D8s_v5: 8 vCPU, 32 GB RAM, 128 GB disk).
|
||||
> **Nothing is pre-installed** except SSH. The script handles everything.
|
||||
|
||||
---
|
||||
|
||||
## Context
|
||||
## Your Tasks (execute in order)
|
||||
|
||||
This folder contains three files you must work with:
|
||||
### Task 1: Run the bootstrap script
|
||||
|
||||
- **`setup.sh`** — 8-phase bash script (~990 lines) that bootstraps the entire ByteLyst ecosystem on a blank Ubuntu VM
|
||||
- **`README.md`** — Deployment guide documenting what the script does, ports, troubleshooting
|
||||
- **`prompt.md`** — This file (agent instructions)
|
||||
```bash
chmod +x setup.sh
sudo ./setup.sh
```
|
||||
|
||||
The script installs everything from scratch (Docker, Node.js, pnpm, Gitea, act_runner, Ollama) then clones 12 repos, builds + publishes ~57 `@bytelyst/*` npm packages to a local Gitea registry, generates environment config, and deploys 31 Docker Compose services (6 infra + 3 platform + 2 dashboards + 10 backends + 9 webs + 1 standalone LLM Lab dashboard).
|
||||
This will take ~15-25 minutes. It runs 8 phases:
|
||||
|
||||
### Current State (ALREADY IMPLEMENTED — do NOT redo)
|
||||
1. Install system dependencies (Docker, Node.js 22, pnpm 10.6.5, Ollama, git, jq)
|
||||
2. Start Gitea (local npm registry on :3300) + CI runner
|
||||
3. Clone 12 repos from GitHub (`saravanakumardb1` org, public repos)
|
||||
4. Build all `@bytelyst/*` packages (`pnpm install && pnpm -r build`)
|
||||
5. Publish packages to Gitea npm registry
|
||||
6. Generate `.env.ecosystem` with Cosmos emulator keys, JWT secret, etc.
|
||||
7. Build + deploy 31 Docker Compose services (per-service, with fallback)
|
||||
8. Health-check all endpoints + create `/opt/bytelyst/check-health.sh`
|
||||
|
||||
The following features are already built and tested in `setup.sh`:
|
||||
**Monitor the output.** The script logs everything to `/opt/bytelyst/setup.log`.
|
||||
|
||||
- **Resume/retry support:** `--resume`, `--resume-from=N`, `--phase=N`, `--reset`, `--status`, `--help` CLI flags
|
||||
- **Phase completion markers:** Stored in `/opt/bytelyst/.setup-state/phaseN.done`
|
||||
- **GITEA_NPM_TOKEN auto-restore:** Token saved to `/opt/bytelyst/.gitea_token`, restored on resume
|
||||
- **Per-service Docker build:** Phase 7 builds each of 31 services individually with `[N/31]` progress
|
||||
- **Per-service fallback:** Failed builds are skipped, remaining services still start
|
||||
- **Build logs:** Saved per-service to `/opt/bytelyst/.setup-state/builds/<service>.log`
|
||||
- **Phase 7 partial failure handling:** Phase 7 NOT marked done if builds fail, so `--resume` retries it
|
||||
- **set -euo pipefail safety:** All pipelines in fallback paths use `|| true` to prevent premature abort
|
||||
- **Ollama model pull non-fatal:** Model download failure doesn't abort the entire setup
|
||||
- **SSH disconnect protection:** All output tee'd to `/opt/bytelyst/setup.log`
|
||||
- **Idempotent:** Every phase handles re-runs gracefully
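The phase markers listed above allow a quick progress check while the script runs. The following is a minimal sketch, not code taken from `setup.sh`: the function name `last_sequential_phase` is hypothetical, and only the marker path `/opt/bytelyst/.setup-state/phaseN.done` comes from this prompt. It counts markers in order and stops at the first gap.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of setup.sh.
# Only the marker naming (phaseN.done) is taken from this prompt.

# Print the highest N such that phase1.done .. phaseN.done all exist in dir $1.
last_sequential_phase() {
  local dir="$1" n=0
  # Stop at the first missing marker so a later stray marker is ignored.
  while [ -f "${dir}/phase$((n + 1)).done" ]; do
    n=$((n + 1))
  done
  echo "$n"
}
```

Calling `last_sequential_phase /opt/bytelyst/.setup-state` prints `0` when no marker exists yet. `sudo ./setup.sh --status` remains the authoritative source; this sketch only illustrates the marker layout.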
|
||||
### Task 2: Handle failures (if any)
|
||||
|
||||
### Key files outside this folder that the script depends on
|
||||
If the script fails mid-way:
|
||||
|
||||
| File | Repo | Purpose |
| --- | --- | --- |
| `docker-compose.ecosystem.yml` | `learning_ai_common_plat` (root) | Defines all 30 services |
| `.env.ecosystem.example` | `learning_ai_common_plat` (root) | Template for env vars |
| `packages/*/package.json` | `learning_ai_common_plat` | ~49 `@bytelyst/*` packages to publish |
| `backend/Dockerfile` | Each of the 10 product repos | Product backend Docker builds |
| `web/Dockerfile` | Each of the 10 product repos | Product web Docker builds |
| `.npmrc.docker` | Each of the 10 product repos | Gitea npm registry config for Docker builds |
|
||||
- **Check which phase failed:** `sudo ./setup.sh --status`
|
||||
- **Resume from where it stopped:** `sudo ./setup.sh --resume`
|
||||
- **Retry a single phase:** `sudo ./setup.sh --phase=N` (e.g., `--phase=7` for Docker builds)
|
||||
- **Check build logs:** `ls /opt/bytelyst/.setup-state/builds/` (per-service logs)
|
||||
- **Check the main log:** `tail -100 /opt/bytelyst/setup.log`
|
||||
|
||||
### Repo list (all 12, cloned to `/opt/bytelyst/`)
|
||||
If a Docker build fails for a specific service:
|
||||
|
||||
1. Read the build log: `cat /opt/bytelyst/.setup-state/builds/<service>.log`
|
||||
2. Diagnose the issue (missing dependency, Dockerfile error, OOM, etc.)
|
||||
3. If the fix is in a Dockerfile or source file, edit it in `/opt/bytelyst/<repo>/`
|
||||
4. Retry: `sudo ./setup.sh --phase=7`
|
||||
|
||||
**Common issues and fixes:**
|
||||
|
||||
- **OOM during Docker builds:** Phase 7 already stops Ollama to free ~3 GB. If still OOM, run `docker system prune -f` and retry.
|
||||
- **Gitea package not found:** Run `sudo ./setup.sh --phase=5` to re-publish, then `--phase=7`.
|
||||
- **Cosmos emulator slow to start:** Just wait. Health checks have timeouts. Retry with `--phase=8`.
|
||||
- **Build cache fills disk:** Phase 7 prunes after builds. If disk is full: `docker builder prune -af`.
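The disk-related fixes above can be guarded by a free-space check before retrying Phase 7. This is a minimal sketch under stated assumptions: the helper names (`avail_disk_gb`, `enough_disk`, `prune_if_low`) are hypothetical and not part of `setup.sh`, and the 40 GB default is borrowed from the script's pre-flight disk check as an assumed threshold.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for deciding whether to prune before retrying Phase 7.
# Not part of setup.sh; names and the default threshold are assumptions.

# Print the free space (whole GB) on the filesystem holding path $1.
avail_disk_gb() {
  # df -P guarantees one output line per filesystem; -k reports 1 KiB blocks.
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Succeed when path $1 has at least $2 GB free.
enough_disk() {
  [ "$(avail_disk_gb "$1")" -ge "$2" ]
}

# Prune the Docker build cache only when free space is below the threshold.
prune_if_low() {
  local path="$1" min_gb="${2:-40}"
  if enough_disk "$path" "$min_gb"; then
    echo "disk ok"
  else
    echo "disk low, pruning build cache"
    docker builder prune -af
  fi
}
```

A possible sequence is `prune_if_low /var/lib/docker` followed by `sudo ./setup.sh --phase=7`. The path `/var/lib/docker` is Docker's default data directory and is assumed here, not confirmed by the script.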
|
||||
|
||||
### Task 3: Validate the deployment
|
||||
|
||||
After the script completes successfully, run the validation:
|
||||
|
||||
```bash
# Quick health check (generated by Phase 8)
/opt/bytelyst/check-health.sh

# Dry-run validation (checks all prerequisites)
sudo ./setup.sh --dry-run
```
|
||||
|
||||
Then verify these specific endpoints:
|
||||
|
||||
```bash
# Platform service
curl -sf http://localhost:4003/health | jq .
# Expect: {"status":"ok","service":"platform-service",...}

# All 10 product backends
for port in 4010 4011 4012 4013 4014 4015 4016 4017 4018 4019; do
  echo -n "  :${port} -> "
  # Capture the status first: without pipefail, "|| echo FAIL" would only see jq's exit code
  status="$(curl -sf "http://localhost:${port}/health" | jq -r .status 2>/dev/null)"
  echo "${status:-FAIL}"
done
# Expect: all "ok"

# All 10 product web apps (9 product + 1 LLM Lab)
for port in 3002 3030 3035 3040 3045 3050 3055 3060 3070 3075; do
  echo -n "  :${port} -> HTTP "
  curl -so /dev/null -w '%{http_code}' "http://localhost:${port}/"
  echo ""
done
# Expect: all HTTP 200

# LocalMemGPT can see Ollama models
curl -sf http://localhost:4019/api/models | jq '.[0].name'
# Expect: model name (e.g., "llama3.2:3b")

# LLM Lab dashboard proxies to Ollama
curl -sf http://localhost:3075/api/ollama/tags | jq '.models[0].name'
# Expect: model name

# Gitea has packages
curl -sf http://localhost:3300/api/packages/bytelyst/npm/ | jq '.[0].name'
# Expect: @bytelyst package names

# Grafana
curl -sf -u admin:bytelyst http://localhost:3000/api/org | jq .name
# Expect: "Main Org."

# Mailpit
curl -sf http://localhost:8025/api/v1/messages | jq .total
# Expect: 0
```
|
||||
|
||||
### Task 4: Report results
|
||||
|
||||
After validation, produce a summary:
|
||||
|
||||
1. **Phase completion status** — output of `sudo ./setup.sh --status`
|
||||
2. **Health check results** — output of `/opt/bytelyst/check-health.sh`
|
||||
3. **Any services that failed** — which ones and why
|
||||
4. **Any manual fixes applied** — what you changed and where
|
||||
5. **Total deployment time** — from the script's completion banner
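The first two items of the summary can be collected mechanically. Below is a minimal sketch, assuming hypothetical helper names (`section`, `build_report`) that are not part of `setup.sh`. It prints each command's output under a heading and records a failing command instead of aborting, so a partially failed deployment still yields a report.

```bash
#!/usr/bin/env bash
# Hypothetical report helper; not part of setup.sh.

# Print a heading followed by the output of the given command.
# A failing command is noted in the report rather than stopping it.
section() {
  local title="$1"
  shift
  echo "## ${title}"
  "$@" 2>&1 || echo "(command exited with status $?)"
}

# Assemble the sections that map to the two commands named in Task 4.
build_report() {
  section "Phase completion status" sudo ./setup.sh --status
  section "Health check results" /opt/bytelyst/check-health.sh
}
```

For example, `build_report > deploy-report.md` (an assumed file name) captures both outputs. Failed services, manual fixes, and total deployment time still need to be added by hand.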
|
||||
|
||||
---
|
||||
|
||||
## Important Context
|
||||
|
||||
### What the script deploys (31 services)
|
||||
|
||||
| Category | Count | Services |
| --- | --- | --- |
| Infrastructure | 6 | cosmos-emulator, azurite, mailpit, loki, grafana, traefik |
| Platform Services | 3 | platform-service (:4003), extraction-service (:4005), mcp-server (:4007) |
| Dashboards | 2 | admin-web (:3001), tracker-web (:3003) |
| Product Backends | 10 | peakpulse through localmemgpt (:4010-:4019) |
| Product Web Apps | 9 | lysnrai-dashboard (:3002), chronomind through localmemgpt-web |
| Standalone | 1 | llmlab-dashboard (:3075) |
|
||||
|
||||
### Repo list (cloned to `/opt/bytelyst/`)
|
||||
|
||||
```
|
||||
learning_ai_common_plat # Shared platform: packages, services, dashboards, compose
|
||||
@ -62,238 +152,7 @@ learning_ai_local_llms # Local LLM Lab (dashboard only, no backend)
|
||||
|
||||
GitHub org: `saravanakumardb1` (repos are public).
|
||||
|
||||
---
|
||||
|
||||
## Bugs Already Fixed (do NOT re-fix these)
|
||||
|
||||
The following issues have already been identified and fixed in the current `setup.sh`:
|
||||
|
||||
| Bug | Fix | Commit |
| --- | --- | --- |
| Docker apt source had extra whitespace from `\` continuation | Single-line echo | `ddd2db84` |
| Gitea 1.22 returns token in `.sha1`, newer versions use `.token` | `jq -r '.sha1 // .token'` fallback | `ddd2db84` |
| jfrog registry sed didn't handle multi-line `\` continuation | Added `/jfrog-pkg-proxy.*\\$/d` pattern | `ddd2db84` |
| `detect_docker_host_ip()` uses `ip` command not in minimal installs | Added `iproute2` to apt deps | `ddd2db84` |
| SSH disconnect loses all output | `exec > >(tee -a setup.log) 2>&1` | `ddd2db84` |
| `localmemgpt-backend` can't reach Ollama on Linux | `extra_hosts: ['host.docker.internal:host-gateway']` in compose | `3b31709b` |
| `llmlab-dashboard` missing from setup.sh service arrays | Added to WEB_SERVICES + check-health.sh | `d8908093` |
| Service count inconsistent (30 vs 31 across files) | Fixed all comments/docs to 31 | `d8908093` |
| Phase 3 `cd` side effect leaves CWD in last repo dir | Added `cd "$INSTALL_DIR"` after loop | `d8908093` |
| No `CORS_ORIGIN` in .env.ecosystem (remote browser CORS errors) | Added `CORS_ORIGIN=*` to phase6_env | `d8908093` |
| `NODE_ENV` not set for backends (run in dev mode) | Added `NODE_ENV=production` to phase6_env | `d8908093` |
| 9 product web services missing healthchecks in compose | Added `healthcheck:` to all 9 web services | `f9a20e46` |
| Dead `NEXT_PUBLIC_*` runtime env vars in compose (no effect on client code) | Replaced with non-prefixed server-side vars | `f9a20e46` |
| Dashboard Dockerfiles had hardcoded corporate proxy | Converted to `ARG`-based proxy with empty defaults | `2b9fd717` |
| `pnpm install --frozen-lockfile` fails on shallow clones | Removed `--frozen-lockfile` | `3b31709b` |
| 3 service Dockerfiles had stale package.json COPY lists | Updated to all 57 packages + workspace members | `85aca553` |
| Phase 5 publish counted 409 conflicts as failures | Distinguish real failures from expected conflicts | `c0bc13e1` |
| `set -e` + `pipefail` aborted script on `docker compose up` partial failure | Added `\|\| true` | `a9414218` |
| Phase 7 marked done even with partial build failures | Only mark done when all builds succeed | `a9414218` |
| `docker compose config --format json` called 30x in loop | Cached once | `a9414218` |
| `--phase=7` printed success even with failures | Now exits 1 with build log path | `a9414218` |
| `last_completed_phase` didn't enforce sequential order | Stops at first gap | `a3f4c6fa` |
| Phase 7 missing `.env.ecosystem` guard | Fail early with helpful message | `a3f4c6fa` |
| `ollama pull \| tail` aborted entire setup on slow network | Made non-fatal | `b634708d` |
| NodeSource `curl\|bash` deprecated install method | Modern GPG key + apt source method | `c2ca7f53` |
| Missing `build-essential python3` for native addons | Added to apt deps | `c2ca7f53` |
| `pnpm -r build` fails on workspace members without build script | Added `--if-present` flag | `c2ca7f53` |
| `gpg --dearmor` prompts on re-run if keyring exists | Added `--batch --yes` | `1a1f7dd5` |
| `jq` aborts script on malformed Gitea token response | Added `2>/dev/null \|\| echo ""` guard | `1a1f7dd5` |
| `pnpm install`/`build` failures show no useful message | Wrapped in `if ! ...; then fail("...")` | `1a1f7dd5` |
| Docker builds OOM with Ollama + Cosmos (~7 GB combined) | Stop Ollama during Phase 7, restart after | `1a1f7dd5` |
| Pre-flight: script runs on tiny VMs with no warning | Added disk (≥40 GB) and RAM (≥16 GB) checks | `1a1f7dd5` |
| Azurite + Loki missing from Phase 8 health checks | Added both to check-health.sh | `f78d382d` |
| GITEA_NPM_TOKEN silently empty on resume | Added `require_gitea_token()` guard in Phase 4 + 7 | `e928ec60` |
| Dashboard Dockerfiles `--frozen-lockfile` fails (incomplete workspace) | Removed from admin-web + tracker-web | `e928ec60` |
| Docker build cache exhausts disk (~20-40 GB) | Added `docker builder prune` after Phase 7 | `e928ec60` |
| Compose `NEXT_PUBLIC_*` env vars wrong for 8/9 web services | Fixed per-service to match product code | `01f2276a` |
| MindLyst web 3 files fallback to production URLs | Changed to `http://localhost:4003` | `09bdda8` |
|
||||
|
||||
---
|
||||
|
||||
## Your Tasks (in priority order)
|
||||
|
||||
> **All 7 tasks are DONE.** See "Current State" above and "Bugs Already Fixed" above.
|
||||
|
||||
### ~~1. Audit `setup.sh` for correctness~~ ✅ DONE
|
||||
|
||||
The script has been audited and all identified bugs fixed (see table above). Phases 1-8 are tested. Key things already verified:
|
||||
|
||||
- Docker CE install, Node.js 22 (NodeSource), pnpm 10.6.5, Ollama — all idempotent
|
||||
- Gitea token: `.sha1 // .token` fallback in place
|
||||
- Corporate proxy: removed at source in all repos, no runtime `sed` needed
|
||||
- `pnpm install` runs without `--frozen-lockfile`
|
||||
- Phase 5 publish: tolerates 409 conflicts
|
||||
- Phase 6 env: heredoc with Cosmos/Azurite emulator keys, semicolons handled
|
||||
- Phase 7: per-service build with fallback, BuildKit secrets via `GITEA_NPM_TOKEN` env export
|
||||
- Phase 8: health check covers all 31 services + Gitea + Ollama
|
||||
|
||||
### ~~2. Fix every bug you find~~ ✅ DONE
|
||||
|
||||
All bugs fixed — see the 16-item table in "Bugs Already Fixed" above.
|
||||
|
||||
### ~~3. Add error recovery and logging~~ ✅ DONE
|
||||
|
||||
Already implemented:
|
||||
|
||||
- **Phase completion markers:** `/opt/bytelyst/.setup-state/phaseN.done`
|
||||
- **Resume:** `--resume` (auto-detect), `--resume-from=N`, `--phase=N` (single), `--reset`, `--status`
|
||||
- **Logging:** `exec > >(tee -a setup.log) 2>&1`
|
||||
- **Per-service fallback:** Failed Docker builds are skipped, remaining services start
|
||||
- **Build logs:** Per-service to `/opt/bytelyst/.setup-state/builds/<service>.log`
|
||||
|
||||
### ~~4. Add a dry-run / validation mode~~ ✅ DONE
|
||||
|
||||
Added `--dry-run` flag that validates:
|
||||
|
||||
- System: root, disk >= 40 GB, RAM >= 16 GB, Ubuntu
|
||||
- Docker: installed, daemon running, Compose available
|
||||
- Node.js + pnpm installed
|
||||
- Ollama: installed, service running
|
||||
- Gitea: reachable, npm token saved
|
||||
- Repos: all 12 cloned
|
||||
- GitHub: reachable for cloning
|
||||
- Compose file + .env.ecosystem exist
|
||||
- Phase completion state
|
||||
- Prints pass/fail summary with guidance
|
||||
|
||||
### ~~5. Validate the `docker-compose.ecosystem.yml` integration~~ ✅ DONE
|
||||
|
||||
Validated and fixed:
|
||||
|
||||
- All 31 services verified: build contexts, Dockerfile paths, port mappings
|
||||
- `x-product-build` anchor correctly provides `GITEA_NPM_HOST` and `gitea_npm_token` secret
|
||||
- All `depends_on` conditions reference services that exist
|
||||
- `localmemgpt-backend` has `extra_hosts: ['host.docker.internal:host-gateway']`
|
||||
- Added healthchecks to all 9 product web services (were missing)
|
||||
- Removed dead `NEXT_PUBLIC_*` runtime env vars (Next.js bakes at build time only)
|
||||
- Replaced with non-prefixed server-side vars (`PLATFORM_SERVICE_URL`, `BACKEND_URL`, etc.)
|
||||
- **31 total services:** 6 infra (pre-built images) + 25 built from Dockerfiles
|
||||
|
||||
### ~~6. Update `README.md`~~ ✅ DONE
|
||||
|
||||
Updated:
|
||||
|
||||
- Service count: 31 (was 30 in some places)
|
||||
- NSG port list added inline in prerequisites (includes 3075 for llmlab-dashboard)
|
||||
- Phase 7 description: 31 services
|
||||
- Troubleshooting: added CORS and NODE_ENV entries
|
||||
- Known Limitations: expanded remote browser access with SSH port-forwarding command
|
||||
|
||||
### ~~7. Create a test plan~~ ✅ DONE
|
||||
|
||||
Created `test-plan.md` with end-to-end validation steps:
|
||||
|
||||
- Quick validation (check-health.sh + dry-run)
|
||||
- Phase-by-phase verification (all 8 phases)
|
||||
- Functional smoke tests (LocalMemGPT+Ollama, LLM Lab, auth, Mailpit, Grafana)
|
||||
- Idempotency + resume tests
|
||||
- Remote port connectivity via SSH forwarding
|
||||
- Service count summary table
|
||||
|
||||
Previous inline test plan from prompt.md (kept for reference):
|
||||
|
||||
```
|
||||
1. SSH into VM
|
||||
2. Run: /opt/bytelyst/check-health.sh
|
||||
Expected: All 31 checks green
|
||||
3. Run: curl http://localhost:4003/health
|
||||
Expected: {"status":"ok","service":"platform-service",...}
|
||||
4. Run: curl http://localhost:4003/api/auth/register -X POST -H 'Content-Type: application/json' -d '{"email":"test@test.com","password":"Test1234!","displayName":"Test"}'
|
||||
Expected: 201 with user object
|
||||
5. Open browser: http://<vm-ip>:3001
|
||||
Expected: Admin dashboard login page
|
||||
6. Open browser: http://<vm-ip>:3040
|
||||
Expected: FlowMonk web app
|
||||
7. Run: curl http://localhost:4019/api/models
|
||||
Expected: List of Ollama models including llama3.2:3b
|
||||
8. Open browser: http://<vm-ip>:8025
|
||||
Expected: Mailpit inbox (empty)
|
||||
9. Open browser: http://<vm-ip>:3000
|
||||
Expected: Grafana login (admin / bytelyst)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Constraints
|
||||
|
||||
- **DO NOT** change any files outside `docs/devops/single_azure_vm/` without asking
|
||||
- **DO NOT** modify `docker-compose.ecosystem.yml` or any Dockerfile without verifying the change is correct across all affected services
|
||||
- **DO NOT** hardcode secrets or API keys (Cosmos emulator and Azurite keys are well-known public keys, those are OK)
|
||||
- **DO NOT** add emojis to code
|
||||
- **DO NOT** use `console.log` or `print` — use the existing `log()`, `ok()`, `warn()`, `fail()` helpers
|
||||
- The script MUST work on a completely fresh Ubuntu 24.04 LTS VM with NOTHING pre-installed except SSH
|
||||
- The script MUST be idempotent — running it twice should not break anything
|
||||
- The script MUST complete in under 30 minutes on a Standard_D8s_v5 (8 vCPU, 32 GB)
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- [ ] `setup.sh` runs flawlessly from `sudo ./setup.sh` on a raw Ubuntu 24.04 VM
|
||||
- [ ] All 8 phases complete without manual intervention
|
||||
- [ ] `/opt/bytelyst/check-health.sh` shows ALL 31 services green (including llmlab-dashboard :3075)
|
||||
- [ ] All 10 product backends respond to `/health` with `{"status":"ok",...}`
|
||||
- [ ] All 10 product web apps serve their landing page (9 product + 1 LLM Lab)
|
||||
- [ ] Admin dashboard (`http://<vm-ip>:3001`) loads
|
||||
- [ ] Tracker dashboard (`http://<vm-ip>:3003`) loads
|
||||
- [ ] LocalMemGPT can reach Ollama (`curl http://localhost:4019/api/models` returns models)
|
||||
- [ ] LLM Lab dashboard (`http://<vm-ip>:3075`) loads and connects to Ollama
|
||||
- [ ] Gitea UI accessible at `http://<vm-ip>:3300` with all `@bytelyst/*` packages visible
|
||||
- [ ] Grafana accessible at `http://<vm-ip>:3000` (admin / bytelyst)
|
||||
- [ ] Mailpit accessible at `http://<vm-ip>:8025`
|
||||
- [ ] `README.md` is accurate and complete
|
||||
- [ ] Script is idempotent (second run succeeds without errors)
|
||||
- [ ] Resume works: `sudo ./setup.sh --resume` after interrupted run
|
||||
- [ ] Single-phase retry works: `sudo ./setup.sh --phase=7` after build failure
|
||||
- [ ] Setup log saved to `/opt/bytelyst/setup.log`
|
||||
- [ ] Build logs saved per-service to `/opt/bytelyst/.setup-state/builds/`
|
||||
|
||||
---
|
||||
|
||||
## Architecture Reference
|
||||
|
||||
```
|
||||
Raw Ubuntu 24.04 VM
|
||||
├── Ollama (systemd, :11434) ─── local LLM inference
|
||||
├── Gitea (Docker, :3300) ────── npm package registry
|
||||
└── Docker Compose Ecosystem (30 services)
|
||||
├── Infrastructure
|
||||
│ ├── cosmos-emulator (:8081, :1234)
|
||||
│ ├── azurite (:10000)
|
||||
│ ├── mailpit (:1025, :8025)
|
||||
│ ├── loki (:3100)
|
||||
│ ├── grafana (:3000)
|
||||
│ └── gateway/traefik (:80, :8080)
|
||||
├── Platform Services
|
||||
│ ├── platform-service (:4003) ── auth, billing, flags, audit
|
||||
│ ├── extraction-service (:4005) ── AI text extraction
|
||||
│ └── mcp-server (:4007) ── MCP tool server
|
||||
├── Dashboards
|
||||
│ ├── admin-web (:3001) ── platform admin console
|
||||
│ └── tracker-web (:3003) ── issue tracker
|
||||
├── Product Backends (Fastify 5 + TypeScript)
|
||||
│ ├── peakpulse-backend (:4010)
|
||||
│ ├── chronomind-backend (:4011)
|
||||
│ ├── jarvisjr-backend (:4012)
|
||||
│ ├── nomgap-backend (:4013)
|
||||
│ ├── mindlyst-backend (:4014)
|
||||
│ ├── lysnrai-backend (:4015)
|
||||
│ ├── notelett-backend (:4016)
|
||||
│ ├── flowmonk-backend (:4017)
|
||||
│ ├── actiontrail-backend (:4018)
|
||||
│ └── localmemgpt-backend (:4019) ── connects to Ollama
|
||||
└── Product Web Apps (Next.js 16)
|
||||
├── lysnrai-dashboard (:3002)
|
||||
├── chronomind-web (:3030)
|
||||
├── jarvisjr-web (:3035)
|
||||
├── flowmonk-web (:3040)
|
||||
├── notelett-web (:3045)
|
||||
├── mindlyst-web (:3050)
|
||||
├── nomgap-web (:3055)
|
||||
├── actiontrail-web (:3060)
|
||||
└── localmemgpt-web (:3070)
|
||||
```
|
||||
|
||||
## How Docker Builds Reach Gitea
|
||||
### How Docker builds reach the Gitea npm registry
|
||||
|
||||
Product Dockerfiles use BuildKit secret mount for the npm token:
|
||||
|
||||
@ -304,12 +163,45 @@ RUN --mount=type=secret,id=gitea_npm_token \
|
||||
pnpm install
|
||||
```
|
||||
|
||||
The `.npmrc.docker` in each product repo uses `${GITEA_NPM_HOST}:3300` as the registry host.
|
||||
During `docker compose build`, the host's `GITEA_NPM_TOKEN` env var is passed as a BuildKit secret,
|
||||
and `GITEA_NPM_HOST` is passed as a build arg (defaults to `host.docker.internal`, overridden to
|
||||
`172.17.0.1` on Linux VMs by the setup script).
|
||||
`GITEA_NPM_HOST` is passed as a build arg (overridden to `172.17.0.1` on Linux VMs by the script).
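To illustrate where an address like `172.17.0.1` comes from, here is a minimal sketch of reading the Docker bridge address with the `ip` command. It is an assumption about the general approach, not the actual `detect_docker_host_ip()` from `setup.sh`; the names `parse_ipv4` and `docker_host_ip` are hypothetical, and `docker0` is assumed to be the default bridge interface.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; the real detect_docker_host_ip() in setup.sh may differ.

# Read `ip -4 -o addr show <iface>` output on stdin and print the bare IPv4 address.
parse_ipv4() {
  # One-line format: "<idx>: <iface> inet <addr>/<prefix> ..."
  awk '$3 == "inet" { split($4, a, "/"); print a[1]; exit }'
}

# Print the docker0 address, falling back to Docker's default bridge gateway.
docker_host_ip() {
  local ip_addr
  ip_addr="$(ip -4 -o addr show docker0 2>/dev/null | parse_ipv4)"
  echo "${ip_addr:-172.17.0.1}"
}
```

The value printed by `docker_host_ip` is the kind of address the script assigns to `GITEA_NPM_HOST` on Linux, where `host.docker.internal` does not resolve by default during builds.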
|
||||
|
||||
## CLI Reference
|
||||
### Script features (already built in)
|
||||
|
||||
- **Resume/retry:** `--resume`, `--resume-from=N`, `--phase=N`, `--reset`, `--status`, `--dry-run`
|
||||
- **Phase markers:** `/opt/bytelyst/.setup-state/phaseN.done`
|
||||
- **Per-service fallback:** Failed Docker builds skip, remaining services still start
|
||||
- **Build logs:** Per-service at `/opt/bytelyst/.setup-state/builds/<service>.log`
|
||||
- **Idempotent:** Safe to run twice
|
||||
- **SSH protection:** All output tee'd to `/opt/bytelyst/setup.log`
|
||||
|
||||
### 30+ bugs already fixed in the script
|
||||
|
||||
The script has been extensively tested and hardened. 30+ bugs were found and fixed during development, including: Docker apt issues, Gitea token format changes, corporate proxy cleanup, OOM handling, build cache management, CORS and NODE_ENV configuration, healthchecks for all web services, and more. See the git log for details. **Do NOT modify `setup.sh` unless you encounter an actual runtime failure on this VM.**
|
||||
|
||||
---
|
||||
|
||||
## Constraints
|
||||
|
||||
- **DO NOT** modify `setup.sh` unless you encounter an actual failure that requires a fix
|
||||
- **DO NOT** modify `docker-compose.ecosystem.yml` or any Dockerfile unless you are fixing a real build failure
|
||||
- **DO NOT** hardcode secrets or API keys
|
||||
- If you must edit a file to fix a build failure, **document what you changed and why** in your report
|
||||
- The script is designed for Ubuntu 24.04 LTS on x86_64 (Cosmos emulator requires x86)
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- [ ] `setup.sh` completed all 8 phases (or you resolved failures and re-ran successfully)
|
||||
- [ ] `/opt/bytelyst/check-health.sh` shows ALL 31 services green
|
||||
- [ ] All 10 product backends respond to `/health` with `{"status":"ok",...}`
|
||||
- [ ] All 10 product web apps return HTTP 200 (9 product + 1 LLM Lab)
|
||||
- [ ] LocalMemGPT can reach Ollama (`curl http://localhost:4019/api/models` returns models)
|
||||
- [ ] LLM Lab dashboard (:3075) loads and proxies to Ollama
|
||||
- [ ] Gitea UI at :3300 shows all `@bytelyst/*` packages
|
||||
- [ ] Grafana at :3000 is accessible (admin / bytelyst)
|
||||
- [ ] Mailpit at :8025 is accessible
|
||||
- [ ] You produced a results summary (Task 4)
|
||||
|
||||
## CLI Quick Reference
|
||||
|
||||
```bash
|
||||
sudo ./setup.sh # Fresh install (all 8 phases)
|
||||
@ -317,6 +209,7 @@ sudo ./setup.sh --phase=7 # Retry just the deploy phase
|
||||
sudo ./setup.sh --resume # Auto-resume after SSH disconnect
|
||||
sudo ./setup.sh --resume-from=7 # Jump to deploy after manual fix
|
||||
sudo ./setup.sh --status # Check what's done
|
||||
sudo ./setup.sh --dry-run # Validate prerequisites (no changes)
|
||||
sudo ./setup.sh --reset # Start completely over
|
||||
sudo ./setup.sh --help # Show usage
|
||||
```
|
||||
|
||||