learning_ai_common_plat/docs/devops/single_azure_vm/prompt.md

Codex Agent Prompt: ByteLyst Single-VM E2E Deployment

Goal: Review, harden, test, and complete setup.sh so it works flawlessly on a raw Ubuntu 24.04 Azure VM — zero manual intervention, 100% completion, all 27 services healthy.


Context

This folder contains two files you must work with:

  • setup.sh — 8-phase bash script that bootstraps the entire ByteLyst ecosystem on a blank Ubuntu VM
  • README.md — Deployment guide documenting what the script does, ports, troubleshooting

The script installs everything from scratch (Docker, Node.js, pnpm, Gitea, Ollama), then clones 11 repos, builds and publishes ~49 @bytelyst/* npm packages to a local Gitea registry, generates environment config, and deploys 27 Docker Compose services.

Key files outside this folder that the script depends on

| File | Repo | Purpose |
| --- | --- | --- |
| docker-compose.ecosystem.yml | learning_ai_common_plat (root) | Defines all 27 services |
| .env.ecosystem.example | learning_ai_common_plat (root) | Template for env vars |
| packages/*/package.json | learning_ai_common_plat | ~49 @bytelyst/* packages to publish |
| backend/Dockerfile | Each of the 10 product repos | Product backend Docker builds |
| web/Dockerfile | Each of the 10 product repos | Product web Docker builds |
| .npmrc.docker | Each of the 10 product repos | Gitea npm registry config for Docker builds |

Repo list (all 11, cloned to /opt/bytelyst/)

learning_ai_common_plat          # Shared platform: packages, services, dashboards, compose
learning_voice_ai_agent          # LysnrAI
learning_multimodal_memory_agents # MindLyst (web is at mindlyst-native/web/)
learning_ai_clock                # ChronoMind
learning_ai_jarvis_jr            # JarvisJr
learning_ai_fastgap              # NomGap
learning_ai_peakpulse            # PeakPulse
learning_ai_flowmonk             # FlowMonk
learning_ai_notes                # NoteLett
learning_ai_trails               # ActionTrail
learning_ai_local_memory_gpt     # LocalMemGPT

GitHub org: saravanakumardb1 (repos are public).
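
For orientation, the clone step for this repo list can be sketched as a small idempotent helper. This is a minimal sketch, not the code in setup.sh: the names repo_url and clone_repo and the CLONE_ROOT variable are hypothetical, and the URL shape assumes the standard https://github.com/<org>/<repo>.git form.

```shell
#!/usr/bin/env bash
# Sketch only: repo_url, clone_repo and CLONE_ROOT are hypothetical names, not from setup.sh.
GITHUB_ORG="saravanakumardb1"
CLONE_ROOT="${CLONE_ROOT:-/opt/bytelyst}"

# Build the HTTPS clone URL for a repo in the org (repos are public, so no auth is needed).
repo_url() {
  printf 'https://github.com/%s/%s.git\n' "$GITHUB_ORG" "$1"
}

# Shallow-clone one repo unless a checkout already exists, so a second run is harmless.
clone_repo() {
  local name="$1" dest="${CLONE_ROOT}/$1"
  if [ -d "${dest}/.git" ]; then
    printf 'skip %s (already cloned)\n' "$name"
    return 0
  fi
  git clone --depth 1 "$(repo_url "$name")" "$dest"
}
```

Calling clone_repo learning_ai_common_plat twice clones on the first call and prints a skip message on the second, which is one way to satisfy the idempotency constraint for this phase.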


Bugs Already Fixed (do NOT re-fix these)

The following issues have already been identified and fixed in the current setup.sh:

| Bug | Fix | Commit |
| --- | --- | --- |
| Docker apt source had extra whitespace from \ continuation | Single-line echo | ddd2db84 |
| Gitea 1.22 returns token in .sha1, newer versions use .token | jq -r '.sha1 // .token' fallback | ddd2db84 |
| jfrog registry sed didn't handle multi-line \ continuation | Added /jfrog-pkg-proxy.*\\$/d pattern | ddd2db84 |
| detect_docker_host_ip() uses ip command not in minimal installs | Added iproute2 to apt deps | ddd2db84 |
| SSH disconnect loses all output | exec > >(tee -a setup.log) 2>&1 | ddd2db84 |
| localmemgpt-backend can't reach Ollama on Linux | extra_hosts: ['host.docker.internal:host-gateway'] in compose | 3b31709b |
| Dashboard Dockerfiles had hardcoded corporate proxy | Converted to ARG-based proxy with empty defaults | 2b9fd717 |
| pnpm install --frozen-lockfile fails on shallow clones | Removed --frozen-lockfile | 3b31709b |

Your Tasks (in priority order)

1. Audit setup.sh for correctness and completeness

Read the entire script and identify every potential failure point. Specifically check:

  • Phase 1 (System):

    • Docker CE install via official apt repo — verify the GPG key + sources.list format works on Ubuntu 24.04
    • Node.js 22 via NodeSource — verify setup_22.x URL is current
    • pnpm 10.6.5 via npm install -g — correct
    • Ollama install via https://ollama.com/install.sh: verify it starts as a systemd service and that the script has a fallback if it does not
    • All commands must be non-interactive (DEBIAN_FRONTEND=noninteractive)
  • Phase 2 (Gitea):

    • Gitea Docker container gitea/gitea:1.22 on port 3300
    • Admin user creation via gitea admin user create inside the container
    • Organization creation via REST API (POST /api/v1/orgs)
    • API token creation with write:package + read:package scopes
    • Token extracted via jq -r '.sha1 // .token': Gitea 1.22 returns the token in .sha1 and newer versions use .token, so verify the fallback handles both
  • Phase 3 (Clone):

    • Shallow clone (--depth 1) all 11 repos
    • Corporate proxy stripping: sed removes HTTP_PROXY, HTTPS_PROXY, NO_PROXY ENV lines and jfrog-pkg-proxy registry references from ALL Dockerfiles
    • CRITICAL: Verify the glob patterns catch ALL Dockerfiles including special paths:
      • learning_multimodal_memory_agents/mindlyst-native/web/Dockerfile
      • learning_voice_ai_agent/user-dashboard-web/Dockerfile
      • learning_voice_ai_agent/backend/Dockerfile (has backend-python too)
    • Verify sed -i works with GNU sed on the Ubuntu host (BSD sed needs a different -i syntax, and Alpine ships BusyBox sed rather than GNU sed)
  • Phase 4 (Build):

    • .npmrc written to common-plat root with Gitea registry URL + token
    • pnpm install (no --frozen-lockfile — shallow clones may have lockfile drift)
    • pnpm -r build — builds ALL packages in dependency order
    • Verify this works when run as root (pnpm may have permission issues)
  • Phase 5 (Publish):

    • Iterates packages/*/, skips non-@bytelyst/*, skips private: true, skips packages without dist/
    • pnpm publish --registry <url> --no-git-checks
    • Must tolerate "already exists" 409 errors gracefully
  • Phase 6 (Env):

    • Generates .env.ecosystem with well-known Cosmos emulator key and Azurite key
    • Verify the heredoc correctly expands ${COSMOS_EMULATOR_KEY} and ${AZURITE_KEY}
    • Verify the AZURE_BLOB_CONNECTION_STRING semicolons don't break the env file
    • JWT secret generated via openssl rand -base64 32
  • Phase 7 (Deploy):

    • detect_docker_host_ip() returns docker0 bridge IP (usually 172.17.0.1)
    • GITEA_NPM_HOST set to this IP so Docker builds can reach Gitea on the host
    • docker compose up --build -d with BuildKit secrets for GITEA_NPM_TOKEN
    • Verify the x-product-build YAML anchor in docker-compose.ecosystem.yml correctly passes GITEA_NPM_HOST as build arg and gitea_npm_token as secret
  • Phase 8 (Verify):

    • Waits for platform-service health (120s timeout)
    • Creates /opt/bytelyst/check-health.sh with all 27+ service URLs
    • Sleeps 30s then runs health check
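
To make the Phase 3 Dockerfile check concrete, here is a minimal sketch of one way to discover every Dockerfile under the clone root and strip the proxy lines. The function names are hypothetical and the sed rules are a simplification: they assume GNU sed and single-line ENV statements, and they do not reproduce the backslash-continuation pattern that setup.sh already contains.

```shell
#!/usr/bin/env bash
# Sketch only: list_dockerfiles and strip_proxy_from_dockerfiles are hypothetical names.

# List every Dockerfile at any depth, so special paths such as
# mindlyst-native/web/Dockerfile are not missed by a fixed glob.
list_dockerfiles() {
  find "$1" -type f -name 'Dockerfile*' ! -path '*/node_modules/*' | sort
}

# Delete proxy ENV lines and any line that mentions the jfrog proxy registry.
# Blunt by design: a jfrog reference inside a multi-line RUN needs extra care.
strip_proxy_from_dockerfiles() {
  find "$1" -type f -name 'Dockerfile*' ! -path '*/node_modules/*' \
    -exec sed -i -E \
      -e '/^[[:space:]]*ENV[[:space:]]+(HTTP_PROXY|HTTPS_PROXY|NO_PROXY)[=[:space:]]/d' \
      -e '/jfrog-pkg-proxy/d' \
      {} +
}
```

Running list_dockerfiles /opt/bytelyst before and after patching gives a quick way to confirm that the three special paths listed above are covered.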

2. Fix every bug you find

Do not just report issues — fix them directly in setup.sh. Common pitfalls to watch for:

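  • Gitea API token field: Gitea 1.22 returns the token in .sha1, while newer versions use .token. The jq -r '.sha1 // .token' fallback is already in place (see Bugs Already Fixed), so confirm it is still present instead of re-adding it.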
  • pnpm as root: May need --unsafe-perm or setting pnpm config set unsafe-perm true
  • Docker BuildKit secrets: The secrets.gitea_npm_token.environment directive requires the env var to be set in the shell running docker compose. Verify export GITEA_NPM_TOKEN is in scope.
  • Cosmos emulator on Linux: The vnext-preview image requires PROTOCOL=http. Verify cosmos-emulator healthcheck works (it checks port 8080 for /ready, not 8081).
  • Product Dockerfiles: Each uses --mount=type=secret,id=gitea_npm_token during pnpm install. Verify the secret ID matches what's in the compose file.
  • MindLyst special path: Its web Dockerfile is at mindlyst-native/web/Dockerfile (not web/Dockerfile). The compose file references ../learning_multimodal_memory_agents with dockerfile: mindlyst-native/web/Dockerfile. Verify this context + dockerfile path is correct.
  • LysnrAI extra dashboards: Has user-dashboard-web/Dockerfile in addition to backend/Dockerfile. Verify the compose references the correct paths.
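
The secret-ID pitfall above can be checked mechanically. The following is a minimal sketch with hypothetical function names; it assumes the mount options appear in the order type=secret,id=... as in the product Dockerfiles described in this prompt, and would miss a Dockerfile that writes them in a different order.

```shell
#!/usr/bin/env bash
# Sketch only: both function names are hypothetical, not from setup.sh.

# Print the unique BuildKit secret IDs referenced by a Dockerfile, one per line.
secret_ids_in_dockerfile() {
  grep -oE -e '--mount=type=secret,id=[A-Za-z0-9_.-]+' "$1" | sed 's/.*id=//' | sort -u
}

# Succeed only if the Dockerfile references exactly the expected secret ID.
dockerfile_uses_secret() {
  local dockerfile="$1" expected="$2"
  [ "$(secret_ids_in_dockerfile "$dockerfile")" = "$expected" ]
}
```

For example, dockerfile_uses_secret backend/Dockerfile gitea_npm_token returns non-zero when a Dockerfile uses a different ID than the one the compose file declares.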

3. Add error recovery and logging

The script uses set -euo pipefail, which exits on the first error. That is too aggressive for a 25-minute deployment. Add:

  • Per-phase error trapping: Wrap each phase in a function that catches errors and prints a clear message about which phase failed and what to check
  • Log file: Tee all output to /opt/bytelyst/setup.log so the user can review after SSH disconnection
  • Resume support: Save phase completion markers to /opt/bytelyst/.phase_complete_N. On re-run, skip already-completed phases (unless the user passes FORCE_RERUN=1)
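
One possible shape for the per-phase wrapper and resume markers is sketched below. It is an illustration under assumptions: run_phase and STATE_DIR are hypothetical names, and log() and warn() here are stand-ins for the script's existing helpers, whose real bodies are not shown in this prompt.

```shell
#!/usr/bin/env bash
# Sketch only: run_phase and STATE_DIR are hypothetical; log/warn are stand-ins.
STATE_DIR="${STATE_DIR:-/opt/bytelyst}"

log()  { printf '[setup] %s\n' "$*"; }
warn() { printf '[setup] WARN: %s\n' "$*" >&2; }

# run_phase <number> <name> <function>
# Skips a phase whose marker exists (unless FORCE_RERUN=1), runs it in a
# subshell, and writes the marker only on success.
run_phase() {
  local num="$1" name="$2" fn="$3"
  local marker="${STATE_DIR}/.phase_complete_${num}"
  local rc=0 had_errexit=0

  if [ -f "$marker" ] && [ "${FORCE_RERUN:-0}" != "1" ]; then
    log "phase ${num} (${name}): already complete, skipping"
    return 0
  fi

  log "phase ${num} (${name}): starting"
  # Remember whether errexit is on, then disable it so a failing phase
  # returns control here instead of terminating the whole script.
  case "$-" in *e*) had_errexit=1 ;; esac
  set +e
  ( set -euo pipefail; "$fn" )
  rc=$?
  if [ "$had_errexit" -eq 1 ]; then set -e; fi

  if [ "$rc" -ne 0 ]; then
    warn "phase ${num} (${name}) failed with exit code ${rc}; see ${STATE_DIR}/setup.log"
    return "$rc"
  fi
  touch "$marker"
  log "phase ${num} (${name}): complete"
}
```

One caveat worth keeping in mind: bash ignores errexit inside a function or subshell that runs as part of an if, && or || expression, so run_phase should be called as a plain statement if mid-phase failures are meant to abort the phase.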

4. Add a dry-run / validation mode

Add DRY_RUN=1 support that:

  • Checks all prerequisites (disk space, memory, network access to GitHub)
  • Validates Docker is installed and running
  • Validates Gitea is reachable
  • Validates all repos can be cloned (HEAD request to GitHub)
  • Does NOT build, publish, or deploy
  • Prints a summary of what WOULD happen
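
A dry-run mode along these lines could be built from two small pieces: a wrapper that prints a command instead of running it, and a threshold check for prerequisites. This is a minimal sketch with hypothetical helper names; no minimum disk or memory figure is specified in this prompt, so any threshold passed in is the caller's assumption.

```shell
#!/usr/bin/env bash
# Sketch only: maybe_run, available_disk_kb and meets_minimum are hypothetical helpers.

# Execute a command normally, or only describe it when DRY_RUN=1.
maybe_run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '[dry-run] would run: %s\n' "$*"
    return 0
  fi
  "$@"
}

# Free space in KiB on the filesystem holding the given path (POSIX df output).
available_disk_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed when an integer measurement reaches the required minimum.
meets_minimum() {
  local actual="$1" required="$2"
  [ "$actual" -ge "$required" ]
}
```

With DRY_RUN=1, a call such as maybe_run docker compose up --build -d only prints what would happen, while a check such as meets_minimum "$(available_disk_kb /)" "$required_kb" still runs for real, which matches the split between validating and not deploying described above.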

5. Validate the docker-compose.ecosystem.yml integration

Read docker-compose.ecosystem.yml (in the repo root) and verify:

  • Every service's build.context and build.dockerfile paths are correct relative to the compose file location
  • Every service's port mapping matches the backend's PORT env var
  • The x-product-build anchor correctly provides GITEA_NPM_HOST and gitea_npm_token secret
  • All depends_on conditions reference services that actually exist
  • The localmemgpt-backend service has extra_hosts: ['host.docker.internal:host-gateway'] for Ollama access
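
The path check in the first bullet relies on how Compose resolves paths: build.context is relative to the directory holding the compose file, and build.dockerfile is relative to the context. A minimal sketch of that resolution follows; build_path_exists is a hypothetical helper, and the caller is assumed to supply the context and dockerfile values read from the compose file.

```shell
#!/usr/bin/env bash
# Sketch only: build_path_exists is a hypothetical helper, not from setup.sh.

# Succeed when the Dockerfile for a service exists, resolving build.context
# against the compose file's directory and build.dockerfile against the context.
build_path_exists() {
  local compose_dir="$1" context="$2" dockerfile="${3:-Dockerfile}"
  [ -f "${compose_dir}/${context}/${dockerfile}" ]
}
```

For the MindLyst case this would be called as build_path_exists /opt/bytelyst/learning_ai_common_plat ../learning_multimodal_memory_agents mindlyst-native/web/Dockerfile. As a broader syntax check, docker compose -f docker-compose.ecosystem.yml config --quiet validates the file without starting anything.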

6. Update README.md

After all fixes, update README.md to reflect:

  • Any new env vars you added (e.g., DRY_RUN, FORCE_RERUN)
  • Updated duration estimates if phases changed
  • Any new troubleshooting entries
  • NSG port list: 22, 80, 1025, 1234, 3000-3003, 3030, 3035, 3040, 3045, 3050, 3055, 3060, 3070, 3100, 3300, 4003, 4005, 4007, 4010-4019, 8025, 8080, 8081, 10000, 11434

7. Create a test plan

Add a section to README.md (or a separate test-plan.md) that describes how to validate the deployment end-to-end:

1. SSH into VM
2. Run: /opt/bytelyst/check-health.sh
   Expected: All 27+ checks green
3. Run: curl http://localhost:4003/health
   Expected: {"status":"ok","service":"platform-service",...}
4. Run: curl http://localhost:4003/api/auth/register -X POST -H 'Content-Type: application/json' -d '{"email":"test@test.com","password":"Test1234!","displayName":"Test"}'
   Expected: 201 with user object
5. Open browser: http://<vm-ip>:3001
   Expected: Admin dashboard login page
6. Open browser: http://<vm-ip>:3040
   Expected: FlowMonk web app
7. Run: curl http://localhost:4019/api/models
   Expected: List of Ollama models including llama3.2:3b
8. Open browser: http://<vm-ip>:8025
   Expected: Mailpit inbox (empty)
9. Open browser: http://<vm-ip>:3000
   Expected: Grafana login (admin / bytelyst)
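
Steps 2 and 3 of the plan above can also be scripted. The sketch below is an assumption about what a single check might look like, not the contents of check-health.sh: check_health is a hypothetical name, and it treats a response as healthy only if the body contains "status":"ok".

```shell
#!/usr/bin/env bash
# Sketch only: check_health is a hypothetical helper, not the generated check-health.sh.

# Fetch a health URL and report OK only when the JSON body says "status":"ok".
check_health() {
  local name="$1" url="$2" body
  if body=$(curl -fsS --max-time 5 "$url" 2>/dev/null) &&
     printf '%s' "$body" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'; then
    printf 'OK    %s\n' "$name"
  else
    printf 'FAIL  %s (%s)\n' "$name" "$url"
    return 1
  fi
}
```

For example, check_health platform-service http://localhost:4003/health prints an OK line when the service answers as expected in step 3, and returns non-zero otherwise so a wrapper can count failures.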

Constraints

  • DO NOT change any files outside docs/devops/single_azure_vm/ without asking
  • DO NOT modify docker-compose.ecosystem.yml or any Dockerfile — the script must work with the repos as-is (it patches Dockerfiles after cloning)
  • DO NOT hardcode secrets or API keys (Cosmos emulator and Azurite keys are well-known public keys, those are OK)
  • DO NOT add emojis to code
  • DO NOT use console.log or print — use the existing log(), ok(), warn(), fail() helpers
  • The script MUST work on a completely fresh Ubuntu 24.04 LTS VM with NOTHING pre-installed except SSH
  • The script MUST be idempotent — running it twice should not break anything
  • The script MUST complete in under 30 minutes on a Standard_D8s_v5 (8 vCPU, 32 GB)

Definition of Done

  • setup.sh runs flawlessly from sudo ./setup.sh on a raw Ubuntu 24.04 VM
  • All 8 phases complete without manual intervention
  • /opt/bytelyst/check-health.sh shows ALL services green
  • All 10 product backends respond to /health with {"status":"ok",...}
  • All 9 product web apps serve their landing page
  • Admin dashboard (http://<vm-ip>:3001) loads
  • Tracker dashboard (http://<vm-ip>:3003) loads
  • LocalMemGPT can reach Ollama (curl http://localhost:4019/api/models returns models)
  • Gitea UI accessible at http://<vm-ip>:3300 with all @bytelyst/* packages visible
  • Grafana accessible at http://<vm-ip>:3000 (admin / bytelyst)
  • Mailpit accessible at http://<vm-ip>:8025
  • README.md is accurate and complete
  • Script is idempotent (second run succeeds without errors)
  • Setup log saved to /opt/bytelyst/setup.log

Architecture Reference

Raw Ubuntu 24.04 VM
├── Ollama (systemd, :11434) ─── local LLM inference
├── Gitea (Docker, :3300) ────── npm package registry
└── Docker Compose Ecosystem (27 services)
    ├── Infrastructure
    │   ├── cosmos-emulator (:8081, :1234)
    │   ├── azurite (:10000)
    │   ├── mailpit (:1025, :8025)
    │   ├── loki (:3100)
    │   ├── grafana (:3000)
    │   └── gateway/traefik (:80, :8080)
    ├── Platform Services
    │   ├── platform-service (:4003) ── auth, billing, flags, audit
    │   ├── extraction-service (:4005) ── AI text extraction
    │   └── mcp-server (:4007) ── MCP tool server
    ├── Dashboards
    │   ├── admin-web (:3001) ── platform admin console
    │   └── tracker-web (:3003) ── issue tracker
    ├── Product Backends (Fastify 5 + TypeScript)
    │   ├── peakpulse-backend (:4010)
    │   ├── chronomind-backend (:4011)
    │   ├── jarvisjr-backend (:4012)
    │   ├── nomgap-backend (:4013)
    │   ├── mindlyst-backend (:4014)
    │   ├── lysnrai-backend (:4015)
    │   ├── notelett-backend (:4016)
    │   ├── flowmonk-backend (:4017)
    │   ├── actiontrail-backend (:4018)
    │   └── localmemgpt-backend (:4019) ── connects to Ollama
    └── Product Web Apps (Next.js 16)
        ├── lysnrai-web (:3002)
        ├── chronomind-web (:3030)
        ├── jarvisjr-web (:3035)
        ├── flowmonk-web (:3040)
        ├── notelett-web (:3045)
        ├── mindlyst-web (:3050)
        ├── nomgap-web (:3055)
        ├── actiontrail-web (:3060)
        └── localmemgpt-web (:3070)

How Docker Builds Reach Gitea

Product Dockerfiles use BuildKit secret mount for the npm token:

RUN --mount=type=secret,id=gitea_npm_token \
    cp .npmrc.docker .npmrc && \
    GITEA_NPM_TOKEN=$(cat /run/secrets/gitea_npm_token) \
    pnpm install --frozen-lockfile

The .npmrc.docker in each product repo uses ${GITEA_NPM_HOST}:3300 as the registry host. During docker compose build, the host's GITEA_NPM_TOKEN env var is passed as a BuildKit secret, and GITEA_NPM_HOST is passed as a build arg (defaults to host.docker.internal, overridden to 172.17.0.1 on Linux VMs by the setup script).
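
To illustrate the host override described above, here is a minimal sketch of how the docker0 bridge address could be detected. It is not the detect_docker_host_ip() in setup.sh: the function names are hypothetical, and the 172.17.0.1 fallback is simply the usual default mentioned in this prompt.

```shell
#!/usr/bin/env bash
# Sketch only: not the script's detect_docker_host_ip(); names here are hypothetical.

# Read `ip addr` output on stdin and print the first IPv4 address without its prefix length.
first_inet_addr() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "inet") { split($(i + 1), a, "/"); print a[1]; exit } }'
}

# Print the docker0 bridge address, falling back to the common default
# when the ip command or the docker0 interface is unavailable.
detect_docker_host_ip_sketch() {
  local addr=""
  if command -v ip >/dev/null 2>&1; then
    addr=$(ip -4 addr show docker0 2>/dev/null | first_inet_addr) || true
  fi
  printf '%s\n' "${addr:-172.17.0.1}"
}
```

The result would then be exported before the build, for example export GITEA_NPM_HOST="$(detect_docker_host_ip_sketch)", alongside export GITEA_NPM_TOKEN so that both are in scope for docker compose.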