docs(infra): add Codex agent handoff prompt for VM setup
parent 7c34cee0ab
commit 6abf13d983
265
docs/devops/single_azure_vm/prompt.md
Normal file
@@ -0,0 +1,265 @@
# Codex Agent Prompt: ByteLyst Single-VM E2E Deployment

> **Goal:** Review, harden, test, and complete `setup.sh` so it works flawlessly on a raw Ubuntu 24.04 Azure VM: zero manual intervention, 100% completion, all 27 services healthy.

---

## Context

This folder contains two files you must work with:

- **`setup.sh`**: 8-phase bash script that bootstraps the entire ByteLyst ecosystem on a blank Ubuntu VM
- **`README.md`**: Deployment guide documenting what the script does, ports, troubleshooting

The script installs everything from scratch (Docker, Node.js, pnpm, Gitea, Ollama), then clones 11 repos, builds and publishes ~49 `@bytelyst/*` npm packages to a local Gitea registry, generates environment config, and deploys 27 Docker Compose services.

### Key files outside this folder that the script depends on

| File | Repo | Purpose |
|------|------|---------|
| `docker-compose.ecosystem.yml` | `learning_ai_common_plat` (root) | Defines all 27 services |
| `.env.ecosystem.example` | `learning_ai_common_plat` (root) | Template for env vars |
| `packages/*/package.json` | `learning_ai_common_plat` | ~49 `@bytelyst/*` packages to publish |
| `backend/Dockerfile` | Each of the 10 product repos | Product backend Docker builds |
| `web/Dockerfile` | Each of the 10 product repos | Product web Docker builds |
| `.npmrc.docker` | Each of the 10 product repos | Gitea npm registry config for Docker builds |

### Repo list (all 11, cloned to `/opt/bytelyst/`)

```
learning_ai_common_plat              # Shared platform: packages, services, dashboards, compose
learning_voice_ai_agent              # LysnrAI
learning_multimodal_memory_agents    # MindLyst (web is at mindlyst-native/web/)
learning_ai_clock                    # ChronoMind
learning_ai_jarvis_jr                # JarvisJr
learning_ai_fastgap                  # NomGap
learning_ai_peakpulse                # PeakPulse
learning_ai_flowmonk                 # FlowMonk
learning_ai_notes                    # NoteLett
learning_ai_trails                   # ActionTrail
learning_ai_local_memory_gpt         # LocalMemGPT
```

GitHub org: `saravanakumardb1` (repos are public).

---

## Your Tasks (in priority order)

### 1. Audit `setup.sh` for correctness and completeness

Read the entire script and identify every potential failure point. Specifically check:

- **Phase 1 (System):**
  - Docker CE install via official apt repo: verify the GPG key + sources.list format works on Ubuntu 24.04
  - Node.js 22 via NodeSource: verify the `setup_22.x` URL is current
  - pnpm 10.6.5 via `npm install -g`: correct
  - Ollama install via `https://ollama.com/install.sh`: verify it starts as a systemd service and has a fallback
  - All commands must be non-interactive (`DEBIAN_FRONTEND=noninteractive`)

- **Phase 2 (Gitea):**
  - Gitea Docker container `gitea/gitea:1.22` on port 3300
  - Admin user creation via `gitea admin user create` inside the container
  - Organization creation via REST API (`POST /api/v1/orgs`)
  - API token creation with `write:package` + `read:package` scopes
  - Token extracted via `jq -r '.sha1'`: verify Gitea 1.22 returns `.sha1` (not `.token`)

- **Phase 3 (Clone):**
  - Shallow clone (`--depth 1`) all 11 repos
  - Corporate proxy stripping: `sed` removes `HTTP_PROXY`, `HTTPS_PROXY`, `NO_PROXY` ENV lines and `jfrog-pkg-proxy` registry references from ALL Dockerfiles
  - **CRITICAL:** Verify the glob patterns catch ALL Dockerfiles including special paths:
    - `learning_multimodal_memory_agents/mindlyst-native/web/Dockerfile`
    - `learning_voice_ai_agent/user-dashboard-web/Dockerfile`
    - `learning_voice_ai_agent/backend/Dockerfile` (has backend-python too)
  - Verify `sed -i` works on Ubuntu (GNU sed) and Alpine (BusyBox sed); both accept `-i` without a backup suffix, unlike BSD sed

- **Phase 4 (Build):**
  - `.npmrc` written to common-plat root with Gitea registry URL + token
  - `pnpm install` (no `--frozen-lockfile`, because shallow clones may have lockfile drift)
  - `pnpm -r build` builds ALL packages in dependency order
  - Verify this works when run as root (pnpm may have permission issues)

- **Phase 5 (Publish):**
  - Iterates `packages/*/`, skips non-`@bytelyst/*`, skips `private: true`, skips packages without `dist/`
  - `pnpm publish --registry <url> --no-git-checks`
  - Must tolerate "already exists" 409 errors gracefully

- **Phase 6 (Env):**
  - Generates `.env.ecosystem` with well-known Cosmos emulator key and Azurite key
  - Verify the heredoc correctly expands `${COSMOS_EMULATOR_KEY}` and `${AZURITE_KEY}`
  - Verify the `AZURE_BLOB_CONNECTION_STRING` semicolons don't break the env file
  - JWT secret generated via `openssl rand -base64 32`

- **Phase 7 (Deploy):**
  - `detect_docker_host_ip()` returns docker0 bridge IP (usually `172.17.0.1`)
  - `GITEA_NPM_HOST` set to this IP so Docker builds can reach Gitea on the host
  - `docker compose up --build -d` with BuildKit secrets for `GITEA_NPM_TOKEN`
  - Verify the `x-product-build` YAML anchor in `docker-compose.ecosystem.yml` correctly passes `GITEA_NPM_HOST` as build arg and `gitea_npm_token` as secret

- **Phase 8 (Verify):**
  - Waits for `platform-service` health (120s timeout)
  - Creates `/opt/bytelyst/check-health.sh` with all 27+ service URLs
  - Sleeps 30s then runs health check

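
For the Phase 3 check, one way to make Dockerfile discovery independent of hand-written glob patterns is to walk the tree with `find`. The following is a minimal sketch, not the logic in `setup.sh`: the function name `strip_proxy_from_dockerfiles` is hypothetical, and it assumes every proxy `ENV` and every `jfrog-pkg-proxy` reference sits on its own line (multi-line `RUN` continuations are not handled).

```bash
#!/usr/bin/env bash
# Hypothetical helper, not taken from setup.sh.
# Assumption: each proxy ENV and each jfrog-pkg-proxy reference is on its own line.
strip_proxy_from_dockerfiles() {
  local root="$1"
  # -name 'Dockerfile*' matches at any depth, so nested paths such as
  # mindlyst-native/web/Dockerfile are found; node_modules and .git are skipped.
  find "$root" -type f -name 'Dockerfile*' \
    -not -path '*/node_modules/*' -not -path '*/.git/*' -print0 |
    while IFS= read -r -d '' file; do
      # -i edits in place without a backup suffix; -E enables extended regex.
      sed -i -E \
        -e '/^[[:space:]]*ENV[[:space:]]+(HTTP_PROXY|HTTPS_PROXY|NO_PROXY)([[:space:]=]|$)/d' \
        -e '/jfrog-pkg-proxy/d' \
        "$file"
      echo "patched: $file"
    done
}
```

Calling `strip_proxy_from_dockerfiles /opt/bytelyst` would print one `patched:` line per Dockerfile, which doubles as a list to compare against the special paths above.
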
### 2. Fix every bug you find

Do not just report issues; fix them directly in `setup.sh`. Common pitfalls to watch for:

- **Gitea API token field:** Gitea 1.22+ may return the token in `.token` instead of `.sha1`. Add fallback: `jq -r '.sha1 // .token'`
- **pnpm as root:** May need `--unsafe-perm` or `pnpm config set unsafe-perm true`
- **Docker BuildKit secrets:** The `secrets.gitea_npm_token.environment` directive requires the env var to be set in the shell running `docker compose`. Verify `export GITEA_NPM_TOKEN` is in scope.
- **Cosmos emulator on Linux:** The `vnext-preview` image requires `PROTOCOL=http`. Verify `cosmos-emulator` healthcheck works (it checks port 8080 for `/ready`, not 8081).
- **Product Dockerfiles:** Each uses `--mount=type=secret,id=gitea_npm_token` during `pnpm install`. Verify the secret ID matches what's in the compose file.
- **MindLyst special path:** Its web Dockerfile is at `mindlyst-native/web/Dockerfile` (not `web/Dockerfile`). The compose file references `../learning_multimodal_memory_agents` with `dockerfile: mindlyst-native/web/Dockerfile`. Verify this context + dockerfile path is correct.
- **LysnrAI extra dashboards:** Has `user-dashboard-web/Dockerfile` in addition to `backend/Dockerfile`. Verify the compose references the correct paths.

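
The first and third pitfalls above can be guarded with two small helpers. This is a sketch under assumptions: `extract_gitea_token` and `require_exported` are hypothetical names, the first requires `jq` on the host, and the trailing `// empty` is an addition to the fallback quoted above so that a missing field yields an empty string rather than the literal text `null`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative, not taken from setup.sh.

# Read a Gitea token-creation JSON response on stdin and print the token,
# accepting either field name. "// empty" prints nothing instead of "null".
extract_gitea_token() {
  jq -r '.sha1 // .token // empty'
}

# Succeed only if the named variable is exported and non-empty, i.e. visible
# to child processes such as `docker compose`.
require_exported() {
  local name="$1" value
  # printenv reads the environment that child processes inherit, so a
  # shell-local (unexported) variable is reported as missing.
  if ! value="$(printenv "$name")" || [ -z "$value" ]; then
    echo "ERROR: $name must be exported and non-empty" >&2
    return 1
  fi
}
```

A call such as `require_exported GITEA_NPM_TOKEN` placed just before the `docker compose up --build -d` step would fail early instead of producing a build with an empty secret.
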
### 3. Add error recovery and logging

The script uses `set -euo pipefail`, which exits on any error. This is too aggressive for a 25-minute deployment. Add:

- **Per-phase error trapping:** Wrap each phase in a function that catches errors and prints a clear message about which phase failed and what to check
- **Log file:** Tee all output to `/opt/bytelyst/setup.log` so the user can review after SSH disconnection
- **Resume support:** Save phase completion markers to `/opt/bytelyst/.phase_complete_N`. On re-run, skip already-completed phases (unless the user passes `FORCE_RERUN=1`)

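
The three requirements above could fit together roughly as follows. This is a sketch only: `STATE_DIR`, `start_logging`, and `run_phase` are hypothetical names, and it assumes each phase is already a shell function that returns non-zero when one of its steps fails.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; STATE_DIR, start_logging, and run_phase are illustrative names.
STATE_DIR="${STATE_DIR:-/opt/bytelyst}"

# Mirror all further stdout/stderr into the log file (call once, early).
start_logging() {
  mkdir -p "$STATE_DIR"
  exec > >(tee -a "$STATE_DIR/setup.log") 2>&1
}

# run_phase <number> <label> <function>
run_phase() {
  local num="$1" label="$2" fn="$3"
  local marker="$STATE_DIR/.phase_complete_$num"
  if [ -f "$marker" ] && [ "${FORCE_RERUN:-0}" != "1" ]; then
    echo "Phase $num ($label): already complete, skipping"
    return 0
  fi
  echo "Phase $num ($label): starting"
  local rc=0
  # errexit is suppressed inside a "||" list, so the phase function
  # must return non-zero itself when a step fails.
  "$fn" || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "Phase $num ($label) FAILED with exit code $rc; see $STATE_DIR/setup.log" >&2
    return "$rc"
  fi
  touch "$marker"
  echo "Phase $num ($label): done"
}
```

The marker is written only after the phase function succeeds, so an interrupted phase is retried on the next run while completed phases are skipped.
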
### 4. Add a dry-run / validation mode

Add `DRY_RUN=1` support that:

- Checks all prerequisites (disk space, memory, network access to GitHub)
- Validates Docker is installed and running
- Validates Gitea is reachable
- Validates all repos can be cloned (HEAD request to GitHub)
- Does NOT build, publish, or deploy
- Prints a summary of what WOULD happen

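
A dry-run mode of this kind usually reduces to one command wrapper plus a handful of read-only probes. The sketch below illustrates that shape; every function name is hypothetical, the memory probe is Linux-only, and no minimum thresholds are implied by the source, so callers supply their own.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; function names are illustrative, not taken from setup.sh.

# run <command...>: execute the command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "[dry-run] $*"
  else
    "$@"
  fi
}

# Free space, in whole GiB, on the filesystem holding the given path.
free_disk_gib() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Total RAM in whole GiB (Linux only: reads /proc/meminfo).
total_mem_gib() {
  awk '/^MemTotal:/ { print int($2 / 1048576) }' /proc/meminfo
}

# HEAD request against a URL; succeeds if the server answers without an HTTP error.
repo_reachable() {
  curl -fsSI --max-time 10 "$1" >/dev/null
}

# check_minimum <label> <actual> <required>
check_minimum() {
  if [ "$2" -ge "$3" ]; then
    echo "OK   $1: $2 (need >= $3)"
  else
    echo "FAIL $1: $2 (need >= $3)" >&2
    return 1
  fi
}
```

With `DRY_RUN=1`, a step written as `run docker compose up --build -d` prints `[dry-run] docker compose up --build -d` instead of deploying, which also produces the summary of what would happen.
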
### 5. Validate the `docker-compose.ecosystem.yml` integration

Read `docker-compose.ecosystem.yml` (in the repo root) and verify:

- Every service's `build.context` and `build.dockerfile` paths are correct relative to the compose file location
- Every service's port mapping matches the backend's `PORT` env var
- The `x-product-build` anchor correctly provides `GITEA_NPM_HOST` and `gitea_npm_token` secret
- All `depends_on` conditions reference services that actually exist
- The `localmemgpt-backend` service has `extra_hosts: ['host.docker.internal:host-gateway']` for Ollama access

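
The first check in the list above (build context and Dockerfile paths) can be automated once the paths are available as rows of text. A minimal sketch, assuming tab-separated rows of service name, build context, and Dockerfile path on stdin; `check_build_paths` is a hypothetical name, and how the rows are produced is left to the caller.

```bash
#!/usr/bin/env bash
# Hypothetical sketch. Reads "service<TAB>context<TAB>dockerfile" rows on stdin
# and reports every Dockerfile that does not exist on disk.
# Relative contexts are resolved against base_dir (the compose file's directory).
check_build_paths() {
  local base_dir="$1" missing=0 service context dockerfile dir
  while IFS=$'\t' read -r service context dockerfile; do
    case "$context" in
      /*) dir="$context" ;;            # absolute context: use as-is
      *)  dir="$base_dir/$context" ;;  # relative context: resolve from base_dir
    esac
    if [ ! -f "$dir/$dockerfile" ]; then
      echo "MISSING $service: $dir/$dockerfile" >&2
      missing=$((missing + 1))
    fi
  done
  # Exit status reflects whether every referenced Dockerfile was found.
  [ "$missing" -eq 0 ]
}
```

One possible source for the rows, assuming `jq` is installed, is `docker compose -f docker-compose.ecosystem.yml config --format json` piped through a `jq` filter that emits `@tsv` lines. The exact filter depends on the normalized output and is not shown here.
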
### 6. Update `README.md`

After all fixes, update `README.md` to reflect:

- Any new env vars you added (e.g., `DRY_RUN`, `FORCE_RERUN`)
- Updated duration estimates if phases changed
- Any new troubleshooting entries
- NSG port list: `22, 80, 1025, 1234, 3000-3003, 3030, 3035, 3040, 3045, 3050, 3055, 3060, 3070, 3100, 3300, 4003, 4005, 4007, 4010-4019, 8025, 8080, 8081, 10000, 11434`

### 7. Create a test plan

Add a section to `README.md` (or a separate `test-plan.md`) that describes how to validate the deployment end-to-end:

```
1. SSH into VM
2. Run: /opt/bytelyst/check-health.sh
   Expected: All 27+ checks green
3. Run: curl http://localhost:4003/health
   Expected: {"status":"ok","service":"platform-service",...}
4. Run: curl http://localhost:4003/api/auth/register -X POST -H 'Content-Type: application/json' -d '{"email":"test@test.com","password":"Test1234!","displayName":"Test"}'
   Expected: 201 with user object
5. Open browser: http://<vm-ip>:3001
   Expected: Admin dashboard login page
6. Open browser: http://<vm-ip>:3040
   Expected: FlowMonk web app
7. Run: curl http://localhost:4019/api/models
   Expected: List of Ollama models including llama3.2:3b
8. Open browser: http://<vm-ip>:8025
   Expected: Mailpit inbox (empty)
9. Open browser: http://<vm-ip>:3000
   Expected: Grafana login (admin / bytelyst)
```

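
Step 3 of the plan, and the 120s wait in Phase 8, both amount to polling a `/health` URL until the body reports `"status":"ok"`. The following sketch shows one way to script that. The helper names `wait_until`, `body_is_ok`, and `probe_health` are hypothetical, and the body check is a plain text match rather than a JSON parse. A call such as `wait_until 120 3 probe_health http://localhost:4003/health` would retry every 3 seconds for up to 120 seconds.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; helper names are illustrative, not taken from setup.sh.

# wait_until <timeout_s> <interval_s> <command...>
# Re-runs the command until it succeeds or the timeout elapses.
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  # SECONDS is a bash built-in counting seconds since the shell started.
  local deadline=$((SECONDS + timeout))
  until "$@"; do
    if [ "$SECONDS" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval"
  done
}

# Succeeds if a /health response body on stdin reports status "ok".
body_is_ok() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Probe one health URL: the request must succeed and the body must say ok.
# If curl fails, the body is empty and the match fails as well.
probe_health() {
  curl -fsS --max-time 5 "$1" | body_is_ok
}
```
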
---

## Constraints

- **DO NOT** change any files outside `docs/devops/single_azure_vm/` without asking
- **DO NOT** modify `docker-compose.ecosystem.yml` or any Dockerfile; the script must work with the repos as-is (it patches Dockerfiles after cloning)
- **DO NOT** hardcode secrets or API keys (Cosmos emulator and Azurite keys are well-known public keys, those are OK)
- **DO NOT** add emojis to code
- **DO NOT** use `console.log` or `print`; use the existing `log()`, `ok()`, `warn()`, `fail()` helpers
- The script MUST work on a completely fresh Ubuntu 24.04 LTS VM with NOTHING pre-installed except SSH
- The script MUST be idempotent: running it twice should not break anything
- The script MUST complete in under 30 minutes on a Standard_D8s_v5 (8 vCPU, 32 GB)

## Definition of Done

- [ ] `setup.sh` runs flawlessly from `sudo ./setup.sh` on a raw Ubuntu 24.04 VM
- [ ] All 8 phases complete without manual intervention
- [ ] `/opt/bytelyst/check-health.sh` shows ALL services green
- [ ] All 10 product backends respond to `/health` with `{"status":"ok",...}`
- [ ] All 9 product web apps serve their landing page
- [ ] Admin dashboard (`http://<vm-ip>:3001`) loads
- [ ] Tracker dashboard (`http://<vm-ip>:3003`) loads
- [ ] LocalMemGPT can reach Ollama (`curl http://localhost:4019/api/models` returns models)
- [ ] Gitea UI accessible at `http://<vm-ip>:3300` with all `@bytelyst/*` packages visible
- [ ] Grafana accessible at `http://<vm-ip>:3000` (admin / bytelyst)
- [ ] Mailpit accessible at `http://<vm-ip>:8025`
- [ ] `README.md` is accurate and complete
- [ ] Script is idempotent (second run succeeds without errors)
- [ ] Setup log saved to `/opt/bytelyst/setup.log`

---

## Architecture Reference

```
Raw Ubuntu 24.04 VM
├── Ollama (systemd, :11434) ─── local LLM inference
├── Gitea (Docker, :3300) ────── npm package registry
└── Docker Compose Ecosystem (27 services)
    ├── Infrastructure
    │   ├── cosmos-emulator (:8081, :1234)
    │   ├── azurite (:10000)
    │   ├── mailpit (:1025, :8025)
    │   ├── loki (:3100)
    │   ├── grafana (:3000)
    │   └── gateway/traefik (:80, :8080)
    ├── Platform Services
    │   ├── platform-service (:4003) ── auth, billing, flags, audit
    │   ├── extraction-service (:4005) ── AI text extraction
    │   └── mcp-server (:4007) ── MCP tool server
    ├── Dashboards
    │   ├── admin-web (:3001) ── platform admin console
    │   └── tracker-web (:3003) ── issue tracker
    ├── Product Backends (Fastify 5 + TypeScript)
    │   ├── peakpulse-backend (:4010)
    │   ├── chronomind-backend (:4011)
    │   ├── jarvisjr-backend (:4012)
    │   ├── nomgap-backend (:4013)
    │   ├── mindlyst-backend (:4014)
    │   ├── lysnrai-backend (:4015)
    │   ├── notelett-backend (:4016)
    │   ├── flowmonk-backend (:4017)
    │   ├── actiontrail-backend (:4018)
    │   └── localmemgpt-backend (:4019) ── connects to Ollama
    └── Product Web Apps (Next.js 16)
        ├── lysnrai-web (:3002)
        ├── chronomind-web (:3030)
        ├── jarvisjr-web (:3035)
        ├── flowmonk-web (:3040)
        ├── notelett-web (:3045)
        ├── mindlyst-web (:3050)
        ├── nomgap-web (:3055)
        ├── actiontrail-web (:3060)
        └── localmemgpt-web (:3070)
```

## How Docker Builds Reach Gitea

Product Dockerfiles use a BuildKit secret mount for the npm token:

```dockerfile
RUN --mount=type=secret,id=gitea_npm_token \
    cp .npmrc.docker .npmrc && \
    GITEA_NPM_TOKEN=$(cat /run/secrets/gitea_npm_token) \
    pnpm install --frozen-lockfile
```

The `.npmrc.docker` in each product repo uses `${GITEA_NPM_HOST}:3300` as the registry host.
During `docker compose build`, the host's `GITEA_NPM_TOKEN` env var is passed as a BuildKit secret,
and `GITEA_NPM_HOST` is passed as a build arg (defaults to `host.docker.internal`, overridden to
`172.17.0.1` on Linux VMs by the setup script).

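
To illustrate the override described above, the bridge address can be read from the `docker0` interface, with Docker's default bridge IP as a fallback. This is a sketch and not the `detect_docker_host_ip()` in `setup.sh`: `first_ipv4` and `docker_bridge_ip` are hypothetical names, and it assumes the `ip` tool from iproute2 is available on the host.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the detect_docker_host_ip() from setup.sh.

# Print the first IPv4 address found in `ip -4 addr show` output on stdin.
first_ipv4() {
  awk '$1 == "inet" { split($2, parts, "/"); print parts[1]; exit }'
}

# Print the docker0 bridge address, falling back to Docker's default bridge IP
# when the interface is missing or has no IPv4 address.
docker_bridge_ip() {
  local ip_addr
  ip_addr="$(ip -4 addr show docker0 2>/dev/null | first_ipv4)"
  echo "${ip_addr:-172.17.0.1}"
}
```

The result would then be exported before the build, for example `export GITEA_NPM_HOST="$(docker_bridge_ip)"`, so that the compose build arg picks it up.
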