learning_ai_common_plat/docs/devops/single_azure_vm/prompt.md
saravanakumardb1 40731e06f4 docs(infra): update prompt.md with 15 new bug fixes and stale corrections
- Added 15 recent fixes to the Bugs Already Fixed table
- Fixed line count (~940 → ~990)
- Fixed stale lysnrai-web → lysnrai-dashboard in architecture diagram
- Fixed test plan service count (27+ → 30+)
- Updated constraint: compose/Dockerfile changes allowed with verification
2026-03-24 13:49:17 -07:00


Codex Agent Prompt: ByteLyst Single-VM E2E Deployment

Goal: Review, harden, test, and complete setup.sh so it works flawlessly on a raw Ubuntu 24.04 Azure VM — zero manual intervention, 100% completion, all 30 services healthy.

IMPORTANT: Read the "Current State" section below FIRST. Many tasks in this prompt are already completed. Do NOT re-implement them.


Context

This folder contains three files you must work with:

  • setup.sh — 8-phase bash script (~990 lines) that bootstraps the entire ByteLyst ecosystem on a blank Ubuntu VM
  • README.md — Deployment guide documenting what the script does, ports, troubleshooting
  • prompt.md — This file (agent instructions)

The script installs everything from scratch (Docker, Node.js, pnpm, Gitea, Ollama) then clones 11 repos, builds + publishes ~49 @bytelyst/* npm packages to a local Gitea registry, generates environment config, and deploys 30 Docker Compose services (6 infra + 3 platform + 2 dashboards + 10 backends + 9 webs).

Current State (ALREADY IMPLEMENTED — do NOT redo)

The following features are already built and tested in setup.sh:

  • Resume/retry support: --resume, --resume-from=N, --phase=N, --reset, --status, --help CLI flags
  • Phase completion markers: Stored in /opt/bytelyst/.setup-state/phaseN.done
  • GITEA_NPM_TOKEN auto-restore: Token saved to /opt/bytelyst/.gitea_token, restored on resume
  • Per-service Docker build: Phase 7 builds each of 30 services individually with [N/30] progress
  • Per-service fallback: Failed builds are skipped, remaining services still start
  • Build logs: Saved per-service to /opt/bytelyst/.setup-state/builds/<service>.log
  • Phase 7 partial failure handling: Phase 7 NOT marked done if builds fail, so --resume retries it
  • set -euo pipefail safety: All pipelines in fallback paths use || true to prevent premature abort
  • Ollama model pull non-fatal: Model download failure doesn't abort the entire setup
  • SSH disconnect protection: All output tee'd to /opt/bytelyst/setup.log
  • Idempotent: Every phase handles re-runs gracefully
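The marker and resume behavior listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in setup.sh: `mark_phase_done`, `next_phase`, and the `STATE_DIR` override are hypothetical names, and the body of `last_completed_phase` is a guess at the "stops at first gap" rule mentioned in the bug table.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumption: STATE_DIR mirrors the marker directory described above.
# It is overridable so the sketch can be tried without root.
STATE_DIR="${STATE_DIR:-/opt/bytelyst/.setup-state}"
TOTAL_PHASES=8

# Record that phase N finished successfully (hypothetical helper).
mark_phase_done() {
  mkdir -p "$STATE_DIR"
  date -u +%Y-%m-%dT%H:%M:%SZ > "$STATE_DIR/phase$1.done"
}

# Print the highest N such that phases 1..N all have markers.
# Stops at the first gap, so a stray later marker is ignored.
last_completed_phase() {
  local n=0 i
  for ((i = 1; i <= TOTAL_PHASES; i++)); do
    [[ -f "$STATE_DIR/phase$i.done" ]] || break
    n=$i
  done
  echo "$n"
}

# Print the phase a --resume run would start from (hypothetical helper).
next_phase() {
  echo $(( $(last_completed_phase) + 1 ))
}
```

With markers for phases 1, 2, and 4 present, `last_completed_phase` prints 2 and a resume would restart at phase 3, which is why a partially failed Phase 7 must not write its marker.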

Key files outside this folder that the script depends on

| File | Repo | Purpose |
|---|---|---|
| docker-compose.ecosystem.yml | learning_ai_common_plat (root) | Defines all 30 services |
| .env.ecosystem.example | learning_ai_common_plat (root) | Template for env vars |
| packages/*/package.json | learning_ai_common_plat | ~49 @bytelyst/* packages to publish |
| backend/Dockerfile | Each of the 10 product repos | Product backend Docker builds |
| web/Dockerfile | Each of the 10 product repos | Product web Docker builds |
| .npmrc.docker | Each of the 10 product repos | Gitea npm registry config for Docker builds |

Repo list (all 11, cloned to /opt/bytelyst/)

learning_ai_common_plat          # Shared platform: packages, services, dashboards, compose
learning_voice_ai_agent          # LysnrAI
learning_multimodal_memory_agents # MindLyst (web is at mindlyst-native/web/)
learning_ai_clock                # ChronoMind
learning_ai_jarvis_jr            # JarvisJr
learning_ai_fastgap              # NomGap
learning_ai_peakpulse            # PeakPulse
learning_ai_flowmonk             # FlowMonk
learning_ai_notes                # NoteLett
learning_ai_trails               # ActionTrail
learning_ai_local_memory_gpt     # LocalMemGPT

GitHub org: saravanakumardb1 (repos are public).


Bugs Already Fixed (do NOT re-fix these)

The following issues have already been identified and fixed in the current setup.sh:

| Bug | Fix | Commit |
|---|---|---|
| Docker apt source had extra whitespace from \ continuation | Single-line echo | ddd2db84 |
| Gitea 1.22 returns token in .sha1, newer versions use .token | jq -r '.sha1 // .token' fallback | ddd2db84 |
| jfrog registry sed didn't handle multi-line \ continuation | Added /jfrog-pkg-proxy.*\\$/d pattern | ddd2db84 |
| detect_docker_host_ip() uses ip command not in minimal installs | Added iproute2 to apt deps | ddd2db84 |
| SSH disconnect loses all output | exec > >(tee -a setup.log) 2>&1 | ddd2db84 |
| localmemgpt-backend can't reach Ollama on Linux | extra_hosts: ['host.docker.internal:host-gateway'] in compose | 3b31709b |
| Dashboard Dockerfiles had hardcoded corporate proxy | Converted to ARG-based proxy with empty defaults | 2b9fd717 |
| pnpm install --frozen-lockfile fails on shallow clones | Removed --frozen-lockfile | 3b31709b |
| 3 service Dockerfiles had stale package.json COPY lists | Updated to all 57 packages + workspace members | 85aca553 |
| Phase 5 publish counted 409 conflicts as failures | Distinguish real failures from expected conflicts | c0bc13e1 |
| set -e + pipefail aborted script on docker compose up partial failure | Added ` | |
| Phase 7 marked done even with partial build failures | Only mark done when all builds succeed | a9414218 |
| docker compose config --format json called 30x in loop | Cached once | a9414218 |
| --phase=7 printed success even with failures | Now exits 1 with build log path | a9414218 |
| last_completed_phase didn't enforce sequential order | Stops at first gap | a3f4c6fa |
| Phase 7 missing .env.ecosystem guard | Fail early with helpful message | a3f4c6fa |
| ollama pull \| tail aborted entire setup on slow network | Made non-fatal | b634708d |
| NodeSource curl\|bash deprecated install method | Modern GPG key + apt source method | c2ca7f53 |
| Missing build-essential python3 for native addons | Added to apt deps | c2ca7f53 |
| pnpm -r build fails on workspace members without build script | Added --if-present flag | c2ca7f53 |
| gpg --dearmor prompts on re-run if keyring exists | Added --batch --yes | 1a1f7dd5 |
| jq aborts script on malformed Gitea token response | Added 2>/dev/null \|\| echo "" guard | 1a1f7dd5 |
| pnpm install/build failures show no useful message | Wrapped in if ! ...; then fail("...") | 1a1f7dd5 |
| Docker builds OOM with Ollama + Cosmos (~7 GB combined) | Stop Ollama during Phase 7, restart after | 1a1f7dd5 |
| Pre-flight: script runs on tiny VMs with no warning | Added disk (≥40 GB) and RAM (≥16 GB) checks | 1a1f7dd5 |
| Azurite + Loki missing from Phase 8 health checks | Added both to check-health.sh | f78d382d |
| GITEA_NPM_TOKEN silently empty on resume | Added require_gitea_token() guard in Phase 4 + 7 | e928ec60 |
| Dashboard Dockerfiles --frozen-lockfile fails (incomplete workspace) | Removed from admin-web + tracker-web | e928ec60 |
| Docker build cache exhausts disk (~20-40 GB) | Added docker builder prune after Phase 7 | e928ec60 |
| Compose NEXT_PUBLIC_* env vars wrong for 8/9 web services | Fixed per-service to match product code | 01f2276a |
| MindLyst web 3 files fallback to production URLs | Changed to http://localhost:4003 | 09bdda8 |
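Several of the fixes above share one pattern: under `set -euo pipefail`, a step that is allowed to fail must have its exit status absorbed explicitly. A minimal sketch of that pattern follows. `run_nonfatal` is a hypothetical helper, and the `warn` defined here is only a stand-in for the script's existing `warn()`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for the script's existing warn() helper (assumption).
warn() { echo "WARN: $*" >&2; }

# Run a command whose failure should not abort the script.
# Always returns 0; on failure, prints a warning with the real exit status.
run_nonfatal() {
  local status=0
  "$@" || status=$?
  if (( status != 0 )); then
    warn "non-fatal step failed (exit $status): $*"
  fi
  return 0
}
```

For a pipeline such as the Ollama model pull, the whole pipeline would need to run inside one command, for example `run_nonfatal bash -c 'set -o pipefail; ollama pull llama3.2:3b | tail -1'`, so that a failure on either side is reported but does not end the setup.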

Your Tasks (in priority order)

Tasks 1-3 are ALREADY DONE. See "Current State" above and "Bugs Already Fixed" above. Focus on Tasks 4-7 which are the remaining work.

1. Audit setup.sh for correctness (DONE)

The script has been audited and all identified bugs fixed (see table above). Phases 1-8 are tested. Key things already verified:

  • Docker CE install, Node.js 22 (NodeSource), pnpm 10.6.5, Ollama — all idempotent
  • Gitea token: .sha1 // .token fallback in place
  • Corporate proxy: removed at source in all repos, no runtime sed needed
  • pnpm install runs without --frozen-lockfile
  • Phase 5 publish: tolerates 409 conflicts
  • Phase 6 env: heredoc with Cosmos/Azurite emulator keys, semicolons handled
  • Phase 7: per-service build with fallback, BuildKit secrets via GITEA_NPM_TOKEN env export
  • Phase 8: health check covers all 30 services + Gitea + Ollama

2. Fix every bug you find (DONE)

All bugs fixed; see the 31-item table in "Bugs Already Fixed" above.

3. Add error recovery and logging (DONE)

Already implemented:

  • Phase completion markers: /opt/bytelyst/.setup-state/phaseN.done
  • Resume: --resume (auto-detect), --resume-from=N, --phase=N (single), --reset, --status
  • Logging: exec > >(tee -a setup.log) 2>&1
  • Per-service fallback: Failed Docker builds are skipped, remaining services start
  • Build logs: Per-service to /opt/bytelyst/.setup-state/builds/<service>.log

4. Add a dry-run / validation mode (TODO)

Add --dry-run support that:

  • Checks all prerequisites (disk space, memory, network access to GitHub)
  • Validates Docker is installed and running
  • Validates Gitea is reachable
  • Validates all repos can be cloned (HEAD request to GitHub)
  • Does NOT build, publish, or deploy
  • Prints a summary of what WOULD happen
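One possible shape for the dry-run mode is sketched below. It is a starting point under stated assumptions, not the required design: `DRY_RUN`, `run`, and `check_min` are hypothetical names, and the 40 GB and 16 GB thresholds are taken from the pre-flight checks listed in the bug table.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumption: --dry-run would set DRY_RUN=1 during argument parsing.
DRY_RUN="${DRY_RUN:-0}"

# In dry-run mode, print the command instead of executing it.
run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "[dry-run] would run: $*"
  else
    "$@"
  fi
}

# Compare a measured value against a minimum.
# Prints PASS or FAIL and returns non-zero on FAIL.
check_min() {
  local label=$1 actual=$2 required=$3 unit=$4
  if (( actual >= required )); then
    echo "PASS $label: ${actual} ${unit} (need >= ${required})"
  else
    echo "FAIL $label: ${actual} ${unit} (need >= ${required})"
    return 1
  fi
}
```

A caller might feed it measured values, for example `check_min disk "$(df -BG --output=avail / | tail -1 | tr -dc '0-9')" 40 GB`, and probe each repository with a HEAD request such as `curl -fsSI "https://github.com/saravanakumardb1/learning_ai_common_plat" >/dev/null`. Wrapping the build, publish, and deploy commands in `run` keeps the summary of what would happen in step with what the real run executes.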

5. Validate the docker-compose.ecosystem.yml integration

Read docker-compose.ecosystem.yml (in the repo root) and verify:

  • Every service's build.context and build.dockerfile paths are correct relative to the compose file location
  • Every service's port mapping matches the backend's PORT env var
  • The x-product-build anchor correctly provides GITEA_NPM_HOST and gitea_npm_token secret
  • All depends_on conditions reference services that actually exist
  • The localmemgpt-backend service has extra_hosts: ['host.docker.internal:host-gateway'] for Ollama access
  • 30 total services: 6 infra (pre-built images) + 24 built from Dockerfiles
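Two of the checks above (dangling `depends_on` references and the total service count) reduce to simple list comparisons. The sketch below assumes the service names are already available as newline-separated lists; `missing_services` and `expect_count` are hypothetical helpers, not part of setup.sh.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print every name in the second list that is absent from the first.
# Both arguments are newline-separated service names.
missing_services() {
  local defined=$1 referenced=$2 name
  while IFS= read -r name; do
    if [[ -n "$name" ]] && ! grep -qxF -- "$name" <<<"$defined"; then
      echo "$name"
    fi
  done <<<"$referenced"
}

# Return 0 when the number of defined services equals the expected total.
expect_count() {
  local defined=$1 expected=$2 actual
  actual=$(grep -c . <<<"$defined" || true)
  [[ "$actual" -eq "$expected" ]]
}
```

The defined list can come from `docker compose -f docker-compose.ecosystem.yml config --services`. How the referenced names are extracted is left open here; the JSON output of `docker compose config --format json` is one possible source. A call such as `expect_count "$defined" 30` then covers the last bullet.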

6. Update README.md

After all fixes, update README.md to reflect:

  • CLI flags: --resume, --resume-from=N, --phase=N, --reset, --status, --help
  • Correct service count: 30 (not 27)
  • Updated duration estimates if phases changed
  • Any new troubleshooting entries
  • NSG port list: 22, 80, 1025, 1234, 3000-3003, 3030, 3035, 3040, 3045, 3050, 3055, 3060, 3070, 3100, 3300, 4003, 4005, 4007, 4010-4019, 8025, 8080, 8081, 10000, 11434

7. Create a test plan

Add a section to README.md (or a separate test-plan.md) that describes how to validate the deployment end-to-end:

1. SSH into VM
2. Run: /opt/bytelyst/check-health.sh
   Expected: All 30+ checks green
3. Run: curl http://localhost:4003/health
   Expected: {"status":"ok","service":"platform-service",...}
4. Run: curl http://localhost:4003/api/auth/register -X POST -H 'Content-Type: application/json' -d '{"email":"test@test.com","password":"Test1234!","displayName":"Test"}'
   Expected: 201 with user object
5. Open browser: http://<vm-ip>:3001
   Expected: Admin dashboard login page
6. Open browser: http://<vm-ip>:3040
   Expected: FlowMonk web app
7. Run: curl http://localhost:4019/api/models
   Expected: List of Ollama models including llama3.2:3b
8. Open browser: http://<vm-ip>:8025
   Expected: Mailpit inbox (empty)
9. Open browser: http://<vm-ip>:3000
   Expected: Grafana login (admin / bytelyst)
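The health steps in this plan could be scripted along the following lines. This is a sketch, assuming `curl` is installed and that each backend answers on localhost with a JSON body like the one in step 3; `health_body_ok` and `check_health_url` are hypothetical names and do not describe the contents of check-health.sh.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Return 0 when a /health response body reports status "ok".
# Tolerates optional whitespace around the colon.
health_body_ok() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"' <<<"$1"
}

# Fetch a URL and check its body; prints PASS or FAIL for the test log.
# A connection error is treated the same as an unhealthy body.
check_health_url() {
  local url=$1 body
  body=$(curl -fsS --max-time 5 "$url" 2>/dev/null || true)
  if health_body_ok "$body"; then
    echo "PASS $url"
  else
    echo "FAIL $url"
    return 1
  fi
}
```

A loop such as `for port in 4003 {4010..4019}; do check_health_url "http://localhost:$port/health" || true; done` would cover the platform service and the ten product backends from the architecture reference.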

Constraints

  • DO NOT change any files outside docs/devops/single_azure_vm/ without asking
  • DO NOT modify docker-compose.ecosystem.yml or any Dockerfile without verifying the change is correct across all affected services
  • DO NOT hardcode secrets or API keys (Cosmos emulator and Azurite keys are well-known public keys, those are OK)
  • DO NOT add emojis to code
  • DO NOT use console.log or print — use the existing log(), ok(), warn(), fail() helpers
  • The script MUST work on a completely fresh Ubuntu 24.04 LTS VM with NOTHING pre-installed except SSH
  • The script MUST be idempotent — running it twice should not break anything
  • The script MUST complete in under 30 minutes on a Standard_D8s_v5 (8 vCPU, 32 GB)

Definition of Done

  • setup.sh runs flawlessly from sudo ./setup.sh on a raw Ubuntu 24.04 VM
  • All 8 phases complete without manual intervention
  • /opt/bytelyst/check-health.sh shows ALL 30+ services green
  • All 10 product backends respond to /health with {"status":"ok",...}
  • All 9 product web apps serve their landing page
  • Admin dashboard (http://<vm-ip>:3001) loads
  • Tracker dashboard (http://<vm-ip>:3003) loads
  • LocalMemGPT can reach Ollama (curl http://localhost:4019/api/models returns models)
  • Gitea UI accessible at http://<vm-ip>:3300 with all @bytelyst/* packages visible
  • Grafana accessible at http://<vm-ip>:3000 (admin / bytelyst)
  • Mailpit accessible at http://<vm-ip>:8025
  • README.md is accurate and complete
  • Script is idempotent (second run succeeds without errors)
  • Resume works: sudo ./setup.sh --resume after interrupted run
  • Single-phase retry works: sudo ./setup.sh --phase=7 after build failure
  • Setup log saved to /opt/bytelyst/setup.log
  • Build logs saved per-service to /opt/bytelyst/.setup-state/builds/

Architecture Reference

Raw Ubuntu 24.04 VM
├── Ollama (systemd, :11434) ─── local LLM inference
├── Gitea (Docker, :3300) ────── npm package registry
└── Docker Compose Ecosystem (30 services)
    ├── Infrastructure
    │   ├── cosmos-emulator (:8081, :1234)
    │   ├── azurite (:10000)
    │   ├── mailpit (:1025, :8025)
    │   ├── loki (:3100)
    │   ├── grafana (:3000)
    │   └── gateway/traefik (:80, :8080)
    ├── Platform Services
    │   ├── platform-service (:4003) ── auth, billing, flags, audit
    │   ├── extraction-service (:4005) ── AI text extraction
    │   └── mcp-server (:4007) ── MCP tool server
    ├── Dashboards
    │   ├── admin-web (:3001) ── platform admin console
    │   └── tracker-web (:3003) ── issue tracker
    ├── Product Backends (Fastify 5 + TypeScript)
    │   ├── peakpulse-backend (:4010)
    │   ├── chronomind-backend (:4011)
    │   ├── jarvisjr-backend (:4012)
    │   ├── nomgap-backend (:4013)
    │   ├── mindlyst-backend (:4014)
    │   ├── lysnrai-backend (:4015)
    │   ├── notelett-backend (:4016)
    │   ├── flowmonk-backend (:4017)
    │   ├── actiontrail-backend (:4018)
    │   └── localmemgpt-backend (:4019) ── connects to Ollama
    └── Product Web Apps (Next.js 16)
        ├── lysnrai-dashboard (:3002)
        ├── chronomind-web (:3030)
        ├── jarvisjr-web (:3035)
        ├── flowmonk-web (:3040)
        ├── notelett-web (:3045)
        ├── mindlyst-web (:3050)
        ├── nomgap-web (:3055)
        ├── actiontrail-web (:3060)
        └── localmemgpt-web (:3070)

How Docker Builds Reach Gitea

Product Dockerfiles use BuildKit secret mount for the npm token:

RUN --mount=type=secret,id=gitea_npm_token \
    cp .npmrc.docker .npmrc && \
    GITEA_NPM_TOKEN=$(cat /run/secrets/gitea_npm_token) \
    pnpm install

The .npmrc.docker in each product repo uses ${GITEA_NPM_HOST}:3300 as the registry host. During docker compose build, the host's GITEA_NPM_TOKEN env var is passed as a BuildKit secret, and GITEA_NPM_HOST is passed as a build arg (defaults to host.docker.internal, overridden to 172.17.0.1 on Linux VMs by the setup script).
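The host-side preparation described above might look like the sketch below. It is illustrative only: `gitea_npm_host_for`, `restore_gitea_token`, and the `TOKEN_FILE` override are hypothetical names, and the Linux address is hard-coded to the value quoted in the paragraph, whereas the real script detects it via `detect_docker_host_ip()`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumption: token file path mirrors the one listed under "Current State".
TOKEN_FILE="${TOKEN_FILE:-/opt/bytelyst/.gitea_token}"

# Pick the registry host that build containers should use.
# Argument: kernel name as printed by `uname -s`.
gitea_npm_host_for() {
  case "$1" in
    Linux) echo "172.17.0.1" ;;            # value quoted above for Linux VMs
    *)     echo "host.docker.internal" ;;  # build-arg default
  esac
}

# Export GITEA_NPM_TOKEN, restoring it from TOKEN_FILE when unset or empty.
# Returns 1 without exporting if no token can be found.
restore_gitea_token() {
  if [[ -z "${GITEA_NPM_TOKEN:-}" && -r "$TOKEN_FILE" ]]; then
    GITEA_NPM_TOKEN=$(<"$TOKEN_FILE")
  fi
  [[ -n "${GITEA_NPM_TOKEN:-}" ]] || return 1
  export GITEA_NPM_TOKEN
}
```

Before a build, the caller would run something like `restore_gitea_token || fail "GITEA_NPM_TOKEN missing"` and `export GITEA_NPM_HOST="$(gitea_npm_host_for "$(uname -s)")"`, so that an empty token stops Phase 7 early instead of surfacing later as a failed `pnpm install` inside the image build.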

CLI Reference

sudo ./setup.sh                    # Fresh install (all 8 phases)
sudo ./setup.sh --phase=7          # Retry just the deploy phase
sudo ./setup.sh --resume           # Auto-resume after SSH disconnect
sudo ./setup.sh --resume-from=7    # Jump to deploy after manual fix
sudo ./setup.sh --status           # Check what's done
sudo ./setup.sh --reset            # Start completely over
sudo ./setup.sh --help             # Show usage
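
The flags above could be parsed with a loop like the following. This is a sketch under assumptions, not the parser in setup.sh: `MODE`, `PHASE`, and `parse_args` are hypothetical names, and the 1-8 range check simply reflects the eight phases.

```bash
#!/usr/bin/env bash
set -euo pipefail

MODE="fresh"   # fresh | resume | resume-from | phase | status | reset | help
PHASE=""       # phase number for the resume-from and phase modes

# Parse the documented flags into MODE and PHASE.
# Returns 2 on an unknown flag or an out-of-range phase number.
parse_args() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      --resume)        MODE="resume" ;;
      --resume-from=*) MODE="resume-from"; PHASE="${arg#*=}" ;;
      --phase=*)       MODE="phase";       PHASE="${arg#*=}" ;;
      --status)        MODE="status" ;;
      --reset)         MODE="reset" ;;
      --help)          MODE="help" ;;
      *) echo "unknown flag: $arg" >&2; return 2 ;;
    esac
  done
  if [[ -n "$PHASE" && ! "$PHASE" =~ ^[1-8]$ ]]; then
    echo "phase must be 1-8, got: $PHASE" >&2
    return 2
  fi
}
```

Rejecting unknown flags and out-of-range phases up front means a typo such as `--phase=9` fails before any phase runs, rather than silently starting a fresh install.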