# Codex Agent Prompt: ByteLyst Single-VM E2E Deployment

> **Goal:** Review, harden, test, and complete `setup.sh` so it works flawlessly on a raw Ubuntu 24.04 Azure VM — zero manual intervention, 100% completion, all 30 services healthy.
>
> **IMPORTANT:** Read the "Current State" section below FIRST. Many tasks in this prompt are already completed. Do NOT re-implement them.

---

## Context

This folder contains three files you must work with:

- **`setup.sh`** — 8-phase bash script (~940 lines) that bootstraps the entire ByteLyst ecosystem on a blank Ubuntu VM
- **`README.md`** — Deployment guide documenting what the script does, ports, troubleshooting
- **`prompt.md`** — This file (agent instructions)

The script installs everything from scratch (Docker, Node.js, pnpm, Gitea, Ollama) then clones 11 repos, builds + publishes ~49 `@bytelyst/*` npm packages to a local Gitea registry, generates environment config, and deploys 30 Docker Compose services (6 infra + 3 platform + 2 dashboards + 10 backends + 9 webs).

### Current State (ALREADY IMPLEMENTED — do NOT redo)

The following features are already built and tested in `setup.sh`:

- **Resume/retry support:** `--resume`, `--resume-from=N`, `--phase=N`, `--reset`, `--status`, `--help` CLI flags
- **Phase completion markers:** Stored in `/opt/bytelyst/.setup-state/phaseN.done`
- **GITEA_NPM_TOKEN auto-restore:** Token saved to `/opt/bytelyst/.gitea_token`, restored on resume
- **Per-service Docker build:** Phase 7 builds each of 30 services individually with `[N/30]` progress
- **Per-service fallback:** Failed builds are skipped, remaining services still start
- **Build logs:** Saved per-service to `/opt/bytelyst/.setup-state/builds/<service>.log`
- **Phase 7 partial failure handling:** Phase 7 NOT marked done if builds fail, so `--resume` retries it
- **set -euo pipefail safety:** All pipelines in fallback paths use `|| true` to prevent premature abort
- **Ollama model pull non-fatal:** Model download failure doesn't abort the entire setup
- **SSH disconnect protection:** All output tee'd to `/opt/bytelyst/setup.log`
- **Idempotent:** Every phase handles re-runs gracefully

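The marker and resume behaviour listed above can be pictured with a minimal sketch. The function names and `TOTAL_PHASES` are hypothetical and may not match what `setup.sh` actually defines; only the marker path comes from the list above. Resume is taken to mean "start at the phase after the last unbroken run of markers", which matches the sequential-order fix noted in the bug table.

```bash
# Sketch only: helper names are illustrative, not copied from setup.sh.
STATE_DIR="${STATE_DIR:-/opt/bytelyst/.setup-state}"
TOTAL_PHASES=8

# Record that phase N finished (the timestamp is just for humans reading the file).
mark_phase_done() {
  mkdir -p "$STATE_DIR"
  date -u +%Y-%m-%dT%H:%M:%SZ > "$STATE_DIR/phase$1.done"
}

# Print the highest N such that phases 1..N all have markers.
# A gap (for example phase3.done missing) stops the count at the gap.
last_completed_phase() {
  local n=0 i=1
  while [ "$i" -le "$TOTAL_PHASES" ]; do
    [ -f "$STATE_DIR/phase$i.done" ] || break
    n=$i
    i=$((i + 1))
  done
  echo "$n"
}

# Print the phase that a --resume run would start from.
next_phase() {
  echo $(( $(last_completed_phase) + 1 ))
}
```
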
### Key files outside this folder that the script depends on

| File | Repo | Purpose |
|------|------|---------|
| `docker-compose.ecosystem.yml` | `learning_ai_common_plat` (root) | Defines all 30 services |
| `.env.ecosystem.example` | `learning_ai_common_plat` (root) | Template for env vars |
| `packages/*/package.json` | `learning_ai_common_plat` | ~49 `@bytelyst/*` packages to publish |
| `backend/Dockerfile` | Each of the 10 product repos | Product backend Docker builds |
| `web/Dockerfile` | Each of the 9 product repos with a web app (MindLyst's is under `mindlyst-native/web/`) | Product web Docker builds |
| `.npmrc.docker` | Each of the 10 product repos | Gitea npm registry config for Docker builds |

### Repo list (all 11, cloned to `/opt/bytelyst/`)

```
learning_ai_common_plat          # Shared platform: packages, services, dashboards, compose
learning_voice_ai_agent          # LysnrAI
learning_multimodal_memory_agents # MindLyst (web is at mindlyst-native/web/)
learning_ai_clock                # ChronoMind
learning_ai_jarvis_jr            # JarvisJr
learning_ai_fastgap              # NomGap
learning_ai_peakpulse            # PeakPulse
learning_ai_flowmonk             # FlowMonk
learning_ai_notes                # NoteLett
learning_ai_trails               # ActionTrail
learning_ai_local_memory_gpt     # LocalMemGPT
```

GitHub org: `saravanakumardb1` (repos are public).

---

## Bugs Already Fixed (do NOT re-fix these)

The following issues have already been identified and fixed in the current `setup.sh`:

| Bug | Fix | Commit |
|-----|-----|--------|
| Docker apt source had extra whitespace from `\` continuation | Single-line echo | `ddd2db84` |
| Gitea 1.22 returns token in `.sha1`, newer versions use `.token` | `jq -r '.sha1 // .token'` fallback | `ddd2db84` |
| jfrog registry sed didn't handle multi-line `\` continuation | Added `/jfrog-pkg-proxy.*\\$/d` pattern | `ddd2db84` |
| `detect_docker_host_ip()` uses `ip` command not in minimal installs | Added `iproute2` to apt deps | `ddd2db84` |
| SSH disconnect loses all output | `exec > >(tee -a setup.log) 2>&1` | `ddd2db84` |
| `localmemgpt-backend` can't reach Ollama on Linux | `extra_hosts: ['host.docker.internal:host-gateway']` in compose | `3b31709b` |
| Dashboard Dockerfiles had hardcoded corporate proxy | Converted to `ARG`-based proxy with empty defaults | `2b9fd717` |
| `pnpm install --frozen-lockfile` fails on shallow clones | Removed `--frozen-lockfile` | `3b31709b` |
| 3 service Dockerfiles had stale package.json COPY lists | Updated to all 57 packages + workspace members | `85aca553` |
| Phase 5 publish counted 409 conflicts as failures | Distinguish real failures from expected conflicts | `c0bc13e1` |
| `set -e` + `pipefail` aborted script on `docker compose up` partial failure | Added `|| true` | `a9414218` |
| Phase 7 marked done even with partial build failures | Only mark done when all builds succeed | `a9414218` |
| `docker compose config --format json` called 30x in loop | Cached once | `a9414218` |
| `--phase=7` printed success even with failures | Now exits 1 with build log path | `a9414218` |
| `last_completed_phase` didn't enforce sequential order | Stops at first gap | `a3f4c6fa` |
| Phase 7 missing `.env.ecosystem` guard | Fail early with helpful message | `a3f4c6fa` |
| `ollama pull \| tail` aborted entire setup on slow network | Made non-fatal | `b634708d` |

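As an illustration of the Phase 5 fix in the table (409 conflicts are expected on re-runs and should not count as failures), a classifier could look roughly like this. It is a sketch, not the code in `setup.sh`: `classify_publish` and `publish_all` are hypothetical names, and it assumes the registry's rejection of an already-published version appears as `E409` or `409 Conflict` in the captured publish output.

```bash
# Classify one publish attempt from its exit code and captured output.
classify_publish() {  # classify_publish <exit_code> <log_file>
  if [ "$1" -eq 0 ]; then
    echo published
  elif grep -Eq 'E409|409 Conflict' "$2"; then
    echo conflict      # version already in the registry: expected on re-run
  else
    echo failed        # anything else is a real failure
  fi
}

# Publish every package directory given as an argument and summarise.
publish_all() {  # publish_all <package_dir>...
  local dir log rc published=0 conflicts=0 failures=0
  log=$(mktemp)
  for dir in "$@"; do
    rc=0
    (cd "$dir" && pnpm publish --no-git-checks) > "$log" 2>&1 || rc=$?
    case "$(classify_publish "$rc" "$log")" in
      published) published=$((published + 1)) ;;
      conflict)  conflicts=$((conflicts + 1)) ;;
      failed)    failures=$((failures + 1)); cat "$log" >&2 ;;
    esac
  done
  rm -f "$log"
  echo "published=$published already-present=$conflicts failed=$failures"
  [ "$failures" -eq 0 ]   # only real failures make the phase fail
}
```
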
---

## Your Tasks (in priority order)

> **Tasks 1-3 are ALREADY DONE.** See "Current State" and "Bugs Already Fixed" above.
> Focus on Tasks 4-7 which are the remaining work.

### ~~1. Audit `setup.sh` for correctness~~ ✅ DONE

The script has been audited and all identified bugs fixed (see table above). Phases 1-8 are tested. Key things already verified:
- Docker CE install, Node.js 22 (NodeSource), pnpm 10.6.5, Ollama — all idempotent
- Gitea token: `.sha1 // .token` fallback in place
- Corporate proxy: removed at source in all repos, no runtime `sed` needed
- `pnpm install` runs without `--frozen-lockfile`
- Phase 5 publish: tolerates 409 conflicts
- Phase 6 env: heredoc with Cosmos/Azurite emulator keys, semicolons handled
- Phase 7: per-service build with fallback, BuildKit secrets via `GITEA_NPM_TOKEN` env export
- Phase 8: health check covers all 30 services + Gitea + Ollama

### ~~2. Fix every bug you find~~ ✅ DONE

All bugs fixed — see the 17-item table in "Bugs Already Fixed" above.

### ~~3. Add error recovery and logging~~ ✅ DONE

Already implemented:
- **Phase completion markers:** `/opt/bytelyst/.setup-state/phaseN.done`
- **Resume:** `--resume` (auto-detect), `--resume-from=N`, `--phase=N` (single), `--reset`, `--status`
- **Logging:** `exec > >(tee -a setup.log) 2>&1`
- **Per-service fallback:** Failed Docker builds are skipped, remaining services start
- **Build logs:** Per-service to `/opt/bytelyst/.setup-state/builds/<service>.log`

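The per-service fallback and build-log behaviour described above might be structured like the following sketch. `build_one`, `build_all`, and `FAILED_SERVICES` are hypothetical names, and the compose invocation is an assumption about how the real script calls Docker. The build runs inside an `if`, so a failing build does not trip `set -e`, and the function's return status tells the caller whether Phase 7 may be marked done.

```bash
BUILD_LOG_DIR="${BUILD_LOG_DIR:-/opt/bytelyst/.setup-state/builds}"

# Hypothetical wrapper around the real build command.
build_one() {
  docker compose -f docker-compose.ecosystem.yml build "$1"
}

# Build each named service in turn, logging per service and skipping failures.
build_all() {  # build_all <service>...
  local total=$# n=0 failed="" svc
  mkdir -p "$BUILD_LOG_DIR"
  for svc in "$@"; do
    n=$((n + 1))
    echo "[$n/$total] building $svc"
    if ! build_one "$svc" > "$BUILD_LOG_DIR/$svc.log" 2>&1; then
      echo "build failed for $svc, see $BUILD_LOG_DIR/$svc.log" >&2
      failed="$failed $svc"
    fi
  done
  FAILED_SERVICES="${failed# }"
  [ -z "$failed" ]   # non-zero when any build failed, so the phase stays unmarked
}
```
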
### 4. Add a dry-run / validation mode (TODO)

Add `--dry-run` support that:
- Checks all prerequisites (disk space, memory, network access to GitHub)
- Validates Docker is installed and running
- Validates Gitea is reachable
- Validates all repos can be cloned (HEAD request to GitHub)
- Does NOT build, publish, or deploy
- Prints a summary of what WOULD happen

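One possible shape for the `--dry-run` checks is sketched below. Everything here is an assumption to adapt: the thresholds (`MIN_DISK_GB`, `MIN_MEM_GB`) are placeholder values rather than measured requirements, the helper names are hypothetical, and the real implementation should report through the existing `log()`, `ok()`, `warn()`, and `fail()` helpers instead of `echo`. The Gitea probe uses Gitea's `/api/v1/version` endpoint on the port documented in this file.

```bash
MIN_DISK_GB="${MIN_DISK_GB:-60}"   # placeholder threshold
MIN_MEM_GB="${MIN_MEM_GB:-16}"     # placeholder threshold
GITHUB_ORG="saravanakumardb1"
PREFLIGHT_FAILURES=0

# Run a command quietly and record whether it succeeded.
check() {  # check "<description>" <command> [args...]
  local desc=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "[ok]   $desc"
  else
    echo "[fail] $desc"
    PREFLIGHT_FAILURES=$((PREFLIGHT_FAILURES + 1))
  fi
}

free_disk_gb()   { df -Pk "${1:-/}" | awk 'NR==2 {print int($4 / 1048576)}'; }
total_mem_gb()   { awk '/^MemTotal:/ {print int($2 / 1048576)}' /proc/meminfo; }
at_least()       { [ "$1" -ge "$2" ]; }
repo_reachable() { curl -fsSIL --max-time 10 "https://github.com/$GITHUB_ORG/$1"; }

# Read-only validation: nothing is built, published, or deployed.
dry_run() {  # dry_run <repo_name>...
  local repo
  check "disk: at least ${MIN_DISK_GB} GB free on /" at_least "$(free_disk_gb /)" "$MIN_DISK_GB"
  check "memory: at least ${MIN_MEM_GB} GB total" at_least "$(total_mem_gb)" "$MIN_MEM_GB"
  check "docker daemon is running" docker info
  check "gitea answers on :3300" curl -fsS --max-time 5 http://localhost:3300/api/v1/version
  for repo in "$@"; do
    check "github repo $repo is reachable" repo_reachable "$repo"
  done
  echo "dry run finished: $PREFLIGHT_FAILURES check(s) failed"
  [ "$PREFLIGHT_FAILURES" -eq 0 ]
}
```
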
### 5. Validate the `docker-compose.ecosystem.yml` integration

Read `docker-compose.ecosystem.yml` (in the repo root) and verify:

- Every service's `build.context` and `build.dockerfile` paths are correct relative to the compose file location
- Every service's port mapping matches the backend's `PORT` env var
- The `x-product-build` anchor correctly provides `GITEA_NPM_HOST` and `gitea_npm_token` secret
- All `depends_on` conditions reference services that actually exist
- The `localmemgpt-backend` service has `extra_hosts: ['host.docker.internal:host-gateway']` for Ollama access
- **30 total services:** 6 infra (pre-built images) + 24 built from Dockerfiles

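Some of these checks can be automated against the normalized compose output. The sketch below is one way to do it and rests on several assumptions: `jq` is available, the compose file lives at the path shown (inferred from the clone location, so adjust if it differs), and `docker compose config --format json` emits `depends_on` in its long (object) form. The function names are hypothetical.

```bash
COMPOSE_FILE="${COMPOSE_FILE:-/opt/bytelyst/learning_ai_common_plat/docker-compose.ecosystem.yml}"

# Render the fully resolved compose model once, as JSON.
compose_json() { docker compose -f "$COMPOSE_FILE" config --format json; }

# Each helper reads that JSON on stdin.
service_count() { jq '.services | length'; }
built_count()   { jq '[.services[] | select(.build != null)] | length'; }
missing_deps() {
  jq -r '.services as $s
    | $s | to_entries[]
    | .key as $svc
    | (.value.depends_on // {} | keys[]) as $dep
    | select($s | has($dep) | not)
    | "\($svc) depends on missing service \($dep)"'
}

validate_compose() {
  local json missing
  json=$(compose_json) || return 1
  echo "total services: $(printf '%s\n' "$json" | service_count) (expected 30)"
  echo "built from Dockerfiles: $(printf '%s\n' "$json" | built_count) (expected 24)"
  missing=$(printf '%s\n' "$json" | missing_deps)
  [ -z "$missing" ] || { echo "$missing" >&2; return 1; }
}
```
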
### 6. Update `README.md`

After all fixes, update `README.md` to reflect:
- CLI flags: `--resume`, `--resume-from=N`, `--phase=N`, `--reset`, `--status`, `--help`
- Correct service count: 30 (not 27)
- Updated duration estimates if phases changed
- Any new troubleshooting entries
- NSG port list: `22, 80, 1025, 1234, 3000-3003, 3030, 3035, 3040, 3045, 3050, 3055, 3060, 3070, 3100, 3300, 4003, 4005, 4007, 4010-4019, 8025, 8080, 8081, 10000, 11434`

### 7. Create a test plan

Add a section to `README.md` (or a separate `test-plan.md`) that describes how to validate the deployment end-to-end:

```
1. SSH into VM
2. Run: /opt/bytelyst/check-health.sh
   Expected: All 30+ checks green
3. Run: curl http://localhost:4003/health
   Expected: {"status":"ok","service":"platform-service",...}
4. Run: curl http://localhost:4003/api/auth/register -X POST -H 'Content-Type: application/json' -d '{"email":"test@test.com","password":"Test1234!","displayName":"Test"}'
   Expected: 201 with user object
5. Open browser: http://<vm-ip>:3001
   Expected: Admin dashboard login page
6. Open browser: http://<vm-ip>:3040
   Expected: FlowMonk web app
7. Run: curl http://localhost:4019/api/models
   Expected: List of Ollama models including llama3.2:3b
8. Open browser: http://<vm-ip>:8025
   Expected: Mailpit inbox (empty)
9. Open browser: http://<vm-ip>:3000
   Expected: Grafana login (admin / bytelyst)
```
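
The command-line steps of that plan could be scripted along these lines. This is a sketch under assumptions: the helper names are hypothetical, only endpoints named in this document are probed, and a page check passes on any successful HTTP response after redirects, which says nothing about what the page actually renders. The browser steps still deserve a manual look.

```bash
BASE="${BASE:-http://localhost}"
SMOKE_FAILURES=0

report() {  # report <ok|fail> <label>
  echo "[$1] $2"
  [ "$1" = ok ] || SMOKE_FAILURES=$((SMOKE_FAILURES + 1))
}

# Reads a response body on stdin; succeeds if it contains "status":"ok".
health_ok() { grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'; }

smoke_test() {
  local port
  # platform-service plus the ten product backends
  for port in 4003 $(seq 4010 4019); do
    if curl -fsS --max-time 10 "$BASE:$port/health" 2>/dev/null | health_ok; then
      report ok "GET :$port/health"
    else
      report fail "GET :$port/health"
    fi
  done
  # LocalMemGPT should list the Ollama model
  if curl -fsS --max-time 10 "$BASE:4019/api/models" 2>/dev/null | grep -Fq 'llama3.2:3b'; then
    report ok "llama3.2:3b listed by :4019/api/models"
  else
    report fail "llama3.2:3b listed by :4019/api/models"
  fi
  # admin dashboard, FlowMonk web, Mailpit, Grafana
  for port in 3001 3040 8025 3000; do
    if curl -fsSL -o /dev/null --max-time 10 "$BASE:$port/" 2>/dev/null; then
      report ok "GET :$port/"
    else
      report fail "GET :$port/"
    fi
  done
  echo "$SMOKE_FAILURES smoke check(s) failed"
  [ "$SMOKE_FAILURES" -eq 0 ]
}
```

Calling `smoke_test` on the VM after `check-health.sh` gives a single pass/fail exit status that a CI job or the agent can act on.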

---

## Constraints

- **DO NOT** change any files outside `docs/devops/single_azure_vm/` without asking
- **DO NOT** modify `docker-compose.ecosystem.yml` or any Dockerfile — the script must work with the repos as-is (it patches Dockerfiles after cloning)
- **DO NOT** hardcode secrets or API keys (Cosmos emulator and Azurite keys are well-known public keys, those are OK)
- **DO NOT** add emojis to code
- **DO NOT** use `console.log` or `print` — use the existing `log()`, `ok()`, `warn()`, `fail()` helpers
- The script MUST work on a completely fresh Ubuntu 24.04 LTS VM with NOTHING pre-installed except SSH
- The script MUST be idempotent — running it twice should not break anything
- The script MUST complete in under 30 minutes on a Standard_D8s_v5 (8 vCPU, 32 GB)

## Definition of Done

- [ ] `setup.sh` runs flawlessly from `sudo ./setup.sh` on a raw Ubuntu 24.04 VM
- [ ] All 8 phases complete without manual intervention
- [ ] `/opt/bytelyst/check-health.sh` shows ALL 30+ services green
- [ ] All 10 product backends respond to `/health` with `{"status":"ok",...}`
- [ ] All 9 product web apps serve their landing page
- [ ] Admin dashboard (`http://<vm-ip>:3001`) loads
- [ ] Tracker dashboard (`http://<vm-ip>:3003`) loads
- [ ] LocalMemGPT can reach Ollama (`curl http://localhost:4019/api/models` returns models)
- [ ] Gitea UI accessible at `http://<vm-ip>:3300` with all `@bytelyst/*` packages visible
- [ ] Grafana accessible at `http://<vm-ip>:3000` (admin / bytelyst)
- [ ] Mailpit accessible at `http://<vm-ip>:8025`
- [ ] `README.md` is accurate and complete
- [ ] Script is idempotent (second run succeeds without errors)
- [ ] Resume works: `sudo ./setup.sh --resume` after interrupted run
- [ ] Single-phase retry works: `sudo ./setup.sh --phase=7` after build failure
- [ ] Setup log saved to `/opt/bytelyst/setup.log`
- [ ] Build logs saved per-service to `/opt/bytelyst/.setup-state/builds/`

---

## Architecture Reference

```
Raw Ubuntu 24.04 VM
├── Ollama (systemd, :11434) ─── local LLM inference
├── Gitea (Docker, :3300) ────── npm package registry
└── Docker Compose Ecosystem (30 services)
    ├── Infrastructure
    │   ├── cosmos-emulator (:8081, :1234)
    │   ├── azurite (:10000)
    │   ├── mailpit (:1025, :8025)
    │   ├── loki (:3100)
    │   ├── grafana (:3000)
    │   └── gateway/traefik (:80, :8080)
    ├── Platform Services
    │   ├── platform-service (:4003) ── auth, billing, flags, audit
    │   ├── extraction-service (:4005) ── AI text extraction
    │   └── mcp-server (:4007) ── MCP tool server
    ├── Dashboards
    │   ├── admin-web (:3001) ── platform admin console
    │   └── tracker-web (:3003) ── issue tracker
    ├── Product Backends (Fastify 5 + TypeScript)
    │   ├── peakpulse-backend (:4010)
    │   ├── chronomind-backend (:4011)
    │   ├── jarvisjr-backend (:4012)
    │   ├── nomgap-backend (:4013)
    │   ├── mindlyst-backend (:4014)
    │   ├── lysnrai-backend (:4015)
    │   ├── notelett-backend (:4016)
    │   ├── flowmonk-backend (:4017)
    │   ├── actiontrail-backend (:4018)
    │   └── localmemgpt-backend (:4019) ── connects to Ollama
    └── Product Web Apps (Next.js 16)
        ├── lysnrai-web (:3002)
        ├── chronomind-web (:3030)
        ├── jarvisjr-web (:3035)
        ├── flowmonk-web (:3040)
        ├── notelett-web (:3045)
        ├── mindlyst-web (:3050)
        ├── nomgap-web (:3055)
        ├── actiontrail-web (:3060)
        └── localmemgpt-web (:3070)
```

## How Docker Builds Reach Gitea

Product Dockerfiles use a BuildKit secret mount for the npm token:
```dockerfile
RUN --mount=type=secret,id=gitea_npm_token \
    cp .npmrc.docker .npmrc && \
    GITEA_NPM_TOKEN=$(cat /run/secrets/gitea_npm_token) \
    pnpm install
```

The `.npmrc.docker` in each product repo uses `${GITEA_NPM_HOST}:3300` as the registry host.
During `docker compose build`, the host's `GITEA_NPM_TOKEN` env var is passed as a BuildKit secret,
and `GITEA_NPM_HOST` is passed as a build arg (defaults to `host.docker.internal`, overridden to
`172.17.0.1` on Linux VMs by the setup script).

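On the host side, the flow described above might look like the following sketch. It is a hypothetical re-implementation: the real `detect_docker_host_ip()` in `setup.sh` may work differently, `build_service` is an invented name, and the sketch assumes the compose file reads both `GITEA_NPM_TOKEN` and `GITEA_NPM_HOST` from the environment. The bridge lookup falls back to `172.17.0.1` when `docker0` cannot be inspected.

```bash
# Print the IPv4 address of the docker0 bridge, or the usual default.
detect_docker_host_ip() {
  local ip_addr
  ip_addr=$(ip -4 -o addr show docker0 2>/dev/null \
    | awk '{split($4, a, "/"); print a[1]; exit}') || true
  echo "${ip_addr:-172.17.0.1}"
}

# Export what the build needs, then build one compose service.
build_service() {  # build_service <compose_service>
  GITEA_NPM_TOKEN=$(cat /opt/bytelyst/.gitea_token) || return 1
  GITEA_NPM_HOST=$(detect_docker_host_ip)
  export GITEA_NPM_TOKEN GITEA_NPM_HOST
  docker compose -f docker-compose.ecosystem.yml build "$1"
}
```
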
## CLI Reference

```bash
sudo ./setup.sh                    # Fresh install (all 8 phases)
sudo ./setup.sh --phase=7          # Retry just the deploy phase
sudo ./setup.sh --resume           # Auto-resume after SSH disconnect
sudo ./setup.sh --resume-from=7    # Jump to deploy after manual fix
sudo ./setup.sh --status           # Check what's done
sudo ./setup.sh --reset            # Start completely over
sudo ./setup.sh --help             # Show usage
```