diff --git a/docs/devops/single_azure_vm/docker/README.md b/docs/devops/single_azure_vm/docker/README.md
index ab9d6f5b..bc65db78 100644
--- a/docs/devops/single_azure_vm/docker/README.md
+++ b/docs/devops/single_azure_vm/docker/README.md
@@ -169,14 +169,15 @@ All optional — defaults work for most setups:
 
 ## CLI Flags
 
-| Flag              | Description                            |
-| ----------------- | -------------------------------------- |
-| `--resume`        | Auto-resume from last completed phase  |
-| `--resume-from=N` | Resume from phase N (1-8)              |
-| `--phase=N`       | Run ONLY phase N (useful for retrying) |
-| `--reset`         | Clear phase markers and start fresh    |
-| `--status`        | Show completed phases and exit         |
-| `-h`, `--help`    | Show usage help                        |
+| Flag              | Description                                          |
+| ----------------- | ---------------------------------------------------- |
+| `--resume`        | Auto-resume from last completed phase                |
+| `--resume-from=N` | Resume from phase N (1-8)                            |
+| `--phase=N`       | Run ONLY phase N (useful for retrying)               |
+| `--dry-run`       | Validate prerequisites without building or deploying |
+| `--reset`         | Clear phase markers and start fresh                  |
+| `--status`        | Show completed phases and exit                       |
+| `-h`, `--help`    | Show usage help                                      |
 
 ## Troubleshooting
 
diff --git a/docs/devops/single_azure_vm/docker/prompt.md b/docs/devops/single_azure_vm/docker/prompt.md
index 1fe25bc7..297d60ca 100644
--- a/docs/devops/single_azure_vm/docker/prompt.md
+++ b/docs/devops/single_azure_vm/docker/prompt.md
@@ -113,8 +113,7 @@ The following issues have already been identified and fixed in the current `setu
 
 ## Your Tasks (in priority order)
 
-> **Tasks 1-6 are DONE.** See "Current State" above and "Bugs Already Fixed" above.
-> Only Task 4 (dry-run, low priority) and Task 7 (test plan) remain.
+> **All 7 tasks are DONE.** See "Current State" above and "Bugs Already Fixed" above.
 
 ### ~~1. Audit `setup.sh` for correctness~~ ✅ DONE
 
@@ -143,16 +142,20 @@ Already implemented:
 - **Per-service fallback:** Failed Docker builds are skipped, remaining services start
 - **Build logs:** Per-service to `/opt/bytelyst/.setup-state/builds/.log`
 
-### 4. Add a dry-run / validation mode (TODO — low priority)
+### ~~4. Add a dry-run / validation mode~~ ✅ DONE
 
-Add `--dry-run` support that:
+Added `--dry-run` flag that validates:
 
-- Checks all prerequisites (disk space, memory, network access to GitHub)
-- Validates Docker is installed and running
-- Validates Gitea is reachable
-- Validates all repos can be cloned (HEAD request to GitHub)
-- Does NOT build, publish, or deploy
-- Prints a summary of what WOULD happen
+- System: root, disk >= 40 GB, RAM >= 16 GB, Ubuntu
+- Docker: installed, daemon running, Compose available
+- Node.js + pnpm installed
+- Ollama: installed, service running
+- Gitea: reachable, npm token saved
+- Repos: all 12 cloned
+- GitHub: reachable for cloning
+- Compose file + .env.ecosystem exist
+- Phase completion state
+- Prints pass/fail summary with guidance
 
 ### ~~5. Validate the `docker-compose.ecosystem.yml` integration~~ ✅ DONE
 
@@ -177,14 +180,23 @@ Updated:
 - Troubleshooting: added CORS and NODE_ENV entries
 - Known Limitations: expanded remote browser access with SSH port-forwarding command
 
-### 7. Create a test plan
+### ~~7. Create a test plan~~ ✅ DONE
 
-Add a section to `README.md` (or a separate `test-plan.md`) that describes how to validate the deployment end-to-end:
+Created `test-plan.md` with end-to-end validation steps:
+
+- Quick validation (check-health.sh + dry-run)
+- Phase-by-phase verification (all 8 phases)
+- Functional smoke tests (LocalMemGPT+Ollama, LLM Lab, auth, Mailpit, Grafana)
+- Idempotency + resume tests
+- Remote port connectivity via SSH forwarding
+- Service count summary table
+
+Previous inline test plan from prompt.md (kept for reference):
 
 ```
 1. SSH into VM
 2. Run: /opt/bytelyst/check-health.sh
-   Expected: All 30+ checks green
+   Expected: All 31 checks green
 3. Run: curl http://localhost:4003/health
    Expected: {"status":"ok","service":"platform-service",...}
 4. Run: curl http://localhost:4003/api/auth/register -X POST -H 'Content-Type: application/json' -d '{"email":"test@test.com","password":"Test1234!","displayName":"Test"}'
diff --git a/docs/devops/single_azure_vm/docker/setup.sh b/docs/devops/single_azure_vm/docker/setup.sh
index a8432b79..540b2f68 100755
--- a/docs/devops/single_azure_vm/docker/setup.sh
+++ b/docs/devops/single_azure_vm/docker/setup.sh
@@ -22,6 +22,7 @@
 #   --resume          Auto-resume from last completed phase
 #   --resume-from=N   Resume from phase N (1-8)
 #   --phase=N         Run ONLY phase N (useful for retrying a single phase)
+#   --dry-run         Validate prerequisites without building or deploying
 #   --reset           Clear phase markers and start fresh
 #   --status          Show completed phases and exit
 #   -h, --help        Show usage help
@@ -99,6 +100,101 @@ ok() { echo -e "${GREEN}[$(date +%H:%M:%S)] ✓${NC} $*"; }
 warn() { echo -e "${YELLOW}[$(date +%H:%M:%S)] ⚠${NC} $*"; }
 fail() { echo -e "${RED}[$(date +%H:%M:%S)] ✗${NC} $*"; exit 1; }
 
+# ── Dry-run / validation mode ────────────────────────────────────
+dry_run() {
+  log "DRY RUN: Validating prerequisites (no changes will be made)..."
+  echo ""
+  local pass=0 total=0
+
+  check_item() {
+    local label="$1" cmd="$2"
+    total=$((total + 1))
+    if eval "$cmd" > /dev/null 2>&1; then
+      ok "  $label"
+      pass=$((pass + 1))
+    else
+      warn "  FAIL: $label"
+    fi
+  }
+
+  log "=== System ==="
+  check_item "Running as root" "[ \"$(id -u)\" -eq 0 ]"
+
+  local disk_gb mem_gb
+  disk_gb=$(df -BG / 2>/dev/null | awk 'NR==2 {gsub(/G/,"",$4); print $4}') || disk_gb=0
+  mem_gb=$(free -g 2>/dev/null | awk '/^Mem:/ {print $2}') || mem_gb=0
+  check_item "Disk >= 40 GB (have ${disk_gb} GB)" "[ \"${disk_gb:-0}\" -ge 40 ]"
+  check_item "RAM >= 16 GB (have ${mem_gb} GB)" "[ \"${mem_gb:-0}\" -ge 16 ]"
+  check_item "OS is Ubuntu" "grep -qi ubuntu /etc/os-release 2>/dev/null"
+
+  log "=== Docker ==="
+  check_item "Docker installed" "command -v docker"
+  check_item "Docker daemon running" "docker info"
+  check_item "Docker Compose available" "docker compose version"
+
+  log "=== Node / pnpm ==="
+  check_item "Node.js installed" "command -v node"
+  check_item "pnpm installed" "command -v pnpm"
+
+  log "=== Ollama ==="
+  check_item "Ollama installed" "command -v ollama"
+  check_item "Ollama service running" "curl -sf http://localhost:11434/api/version"
+
+  log "=== Gitea ==="
+  check_item "Gitea reachable on :${GITEA_PORT}" "curl -sf http://localhost:${GITEA_PORT}/api/v1/version"
+  if [ -f "${INSTALL_DIR}/.gitea_token" ]; then
+    check_item "Gitea npm token saved" "true"
+  else
+    check_item "Gitea npm token saved" "false"
+  fi
+
+  log "=== Repositories ==="
+  local repo_count=0
+  for repo in "${REPOS[@]}"; do
+    if [ -d "${INSTALL_DIR}/${repo}/.git" ]; then
+      repo_count=$((repo_count + 1))
+    fi
+  done
+  check_item "Repos cloned: ${repo_count}/${#REPOS[@]}" "[ \"$repo_count\" -eq \"${#REPOS[@]}\" ]"
+
+  log "=== GitHub Access ==="
+  local gh_url="https://github.com/${GITHUB_USER}/learning_ai_common_plat"
+  check_item "GitHub reachable (${GITHUB_USER})" "curl -sfI \"${gh_url}\" | head -1 | grep -q '200\|301\|302'"
+
+  log "=== Compose File ==="
+  local compose_path="${INSTALL_DIR}/learning_ai_common_plat/${COMPOSE_FILE}"
+  check_item "docker-compose.ecosystem.yml exists" "[ -f \"${compose_path}\" ]"
+
+  log "=== .env.ecosystem ==="
+  local env_path="${INSTALL_DIR}/learning_ai_common_plat/.env.ecosystem"
+  check_item ".env.ecosystem exists" "[ -f \"${env_path}\" ]"
+
+  log "=== Phase State ==="
+  for i in 1 2 3 4 5 6 7 8; do
+    if is_phase_done "$i"; then
+      ok "  Phase $i: DONE"
+    else
+      log "  Phase $i: pending"
+    fi
+  done
+
+  echo ""
+  echo "======================================="
+  echo " Dry-run summary: ${pass}/${total} checks passed"
+  echo "======================================="
+  echo ""
+
+  if [ "$pass" -eq "$total" ]; then
+    ok "All checks passed. System is ready for deployment."
+  elif [ "$pass" -ge 5 ]; then
+    warn "Some checks failed. The system is partially configured."
+    log "Run 'sudo ./setup.sh' to complete setup."
+  else
+    warn "Many checks failed. This looks like a fresh VM."
+    log "Run 'sudo ./setup.sh' to bootstrap from scratch."
+  fi
+}
+
 wait_for_url() {
   local url="$1" max="${2:-60}" i=0
   while ! curl -sf "$url" > /dev/null 2>&1; do
@@ -1002,6 +1098,7 @@ usage() {
   echo "  --resume            Auto-resume from last completed phase"
   echo "  --resume-from=N     Resume starting at phase N (1-8)"
   echo "  --phase=N           Run ONLY phase N"
+  echo "  --dry-run           Validate prerequisites without building or deploying"
   echo "  --reset             Clear phase markers and start fresh"
   echo "  --status            Show completed phases and exit"
   echo "  -h, --help          Show this help"
@@ -1031,6 +1128,10 @@ main() {
       --phase=*)
         mode="single"
         only_phase="${arg#*=}" ;;
+      --dry-run)
+        mkdir -p "$INSTALL_DIR"
+        dry_run
+        exit 0 ;;
       --reset)
         mkdir -p "$INSTALL_DIR"
         reset_phase_markers
diff --git a/docs/devops/single_azure_vm/docker/test-plan.md b/docs/devops/single_azure_vm/docker/test-plan.md
new file mode 100644
index 00000000..62344705
--- /dev/null
+++ b/docs/devops/single_azure_vm/docker/test-plan.md
@@ -0,0 +1,268 @@
+# ByteLyst Single-VM Deployment — Test Plan
+
+> End-to-end validation steps for verifying a successful deployment.
+> Run these after `setup.sh` completes all 8 phases.
+
+---
+
+## Quick Validation (2 minutes)
+
+```bash
+# 1. Run the generated health check script
+/opt/bytelyst/check-health.sh
+
+# 2. Quick dry-run to verify all prerequisites are satisfied
+sudo ./setup.sh --dry-run
+```
+
+If all checks pass, the deployment is healthy. For deeper validation, continue below.
+
+---
+
+## Phase-by-Phase Verification
+
+### Phase 1: System Dependencies
+
+```bash
+# Docker
+docker --version                     # Expect: Docker version 2x.x+
+docker compose version               # Expect: Docker Compose version v2.x+
+docker info | grep "Server Version"  # Daemon running
+
+# Node.js + pnpm
+node --version                       # Expect: v22.x
+pnpm --version                       # Expect: 10.6.5
+
+# Ollama
+ollama --version                     # Expect: ollama version x.x.x
+curl -s http://localhost:11434/api/version | jq .  # API responding
+systemctl is-active ollama           # Expect: active
+
+# System tools
+git --version && jq --version && curl --version | head -1
+```
+
+### Phase 2: Gitea + CI Runner
+
+```bash
+# Gitea API
+curl -s http://localhost:3300/api/v1/version | jq .
+# Expect: {"version":"1.22.x"}
+
+# Gitea admin auth
+curl -s -u 'bytelyst-admin:ByteLyst2026!' \
+  http://localhost:3300/api/v1/user | jq .login
+# Expect: "bytelyst-admin"
+
+# Gitea org exists
+curl -s http://localhost:3300/api/v1/orgs/bytelyst | jq .username
+# Expect: "bytelyst"
+
+# Gitea npm token saved
+cat /opt/bytelyst/.gitea_token
+# Expect: non-empty token string
+
+# act_runner service
+systemctl is-active act_runner       # Expect: active
+```
+
+### Phase 3: Repositories
+
+```bash
+# All 12 repos cloned
+ls -1d /opt/bytelyst/learning_ai_* | wc -l
+# Expect: 12
+
+# Each repo has .git
+for repo in /opt/bytelyst/learning_ai_*; do
+  echo "$(basename "$repo"): $([ -d "$repo/.git" ] && echo OK || echo MISSING)"
+done
+```
+
+### Phase 4-5: Packages Built + Published
+
+```bash
+# Packages built (dist/ exists)
+ls /opt/bytelyst/learning_ai_common_plat/packages/*/dist/ 2>/dev/null | head -5
+# Expect: files present
+
+# Packages in Gitea registry (Gitea package API, filtered to npm)
+curl -s "http://localhost:3300/api/v1/packages/bytelyst?type=npm" | jq '.[].name' | head -10
+# Expect: @bytelyst/errors, @bytelyst/config, etc.
+```
+
+### Phase 6: Environment Config
+
+```bash
+# .env.ecosystem generated
+head -5 /opt/bytelyst/learning_ai_common_plat/.env.ecosystem
+# Expect: COSMOS_ENDPOINT, COSMOS_KEY, etc.
+
+# Key values present
+grep NODE_ENV /opt/bytelyst/learning_ai_common_plat/.env.ecosystem
+# Expect: NODE_ENV=production
+
+grep CORS_ORIGIN /opt/bytelyst/learning_ai_common_plat/.env.ecosystem
+# Expect: CORS_ORIGIN=*
+
+grep JWT_SECRET /opt/bytelyst/learning_ai_common_plat/.env.ecosystem
+# Expect: non-empty random value
+```
+
+### Phase 7: Docker Services Running
+
+```bash
+# All 31 services running
+cd /opt/bytelyst/learning_ai_common_plat
+docker compose -f docker-compose.ecosystem.yml ps --format "table {{.Name}}\t{{.Status}}\t{{.Ports}}" | head -35
+
+# Count running containers
+docker compose -f docker-compose.ecosystem.yml ps -q | wc -l
+# Expect: 31
+```
+
+### Phase 8: Health Checks
+
+Run each category. Every endpoint should return HTTP 200; the two loops print a per-port result so a single failure does not hide the rest.
+
+```bash
+# ── Infrastructure ──
+curl -sf http://localhost:3300/api/v1/version && echo "  Gitea OK"
+curl -sf http://localhost:11434/api/version && echo "  Ollama OK"
+curl -sf http://localhost:1234 && echo "  Cosmos Explorer OK"
+curl -sf http://localhost:10000 && echo "  Azurite OK"
+curl -sf http://localhost:8025 && echo "  Mailpit OK"
+curl -sf http://localhost:3100/ready && echo "  Loki OK"
+curl -sf http://localhost:3000/api/health && echo "  Grafana OK"
+curl -sf http://localhost:8080/api/overview && echo "  Traefik OK"
+
+# ── Platform Services ──
+curl -sf http://localhost:4003/health | jq .status   # platform-service
+curl -sf http://localhost:4005/health | jq .status   # extraction-service
+curl -sf http://localhost:4007/health | jq .status   # mcp-server
+
+# ── Dashboards ──
+curl -sf http://localhost:3001 | head -1   # admin-web
+curl -sf http://localhost:3003 | head -1   # tracker-web
+
+# ── Product Backends ──
+for port in 4010 4011 4012 4013 4014 4015 4016 4017 4018 4019; do
+  status=$(curl -sf http://localhost:${port}/health | jq -r .status 2>/dev/null)
+  echo "  :${port} -> ${status:-FAIL}"
+done
+
+# ── Product Web Apps ──
+for port in 3002 3030 3035 3040 3045 3050 3055 3060 3070 3075; do
+  code=$(curl -so /dev/null -w '%{http_code}' http://localhost:${port}/)
+  echo "  :${port} -> HTTP ${code}"
+done
+```
+
+---
+
+## Functional Smoke Tests
+
+### LocalMemGPT + Ollama Integration
+
+```bash
+# Verify LocalMemGPT can see Ollama models
+curl -sf http://localhost:4019/api/models | jq '.[0].name'
+# Expect: model name (e.g., "llama3.2:3b")
+```
+
+### LLM Lab Dashboard + Ollama
+
+```bash
+# Verify LLM Lab dashboard serves and can proxy to Ollama
+curl -sf http://localhost:3075 | head -1
+# Expect: HTML content
+
+curl -sf http://localhost:3075/api/ollama/tags | jq '.models[0].name'
+# Expect: model name
+```
+
+### Platform Service Auth
+
+```bash
+# Health with request ID
+curl -sf -H "x-request-id: test-123" http://localhost:4003/health | jq .
+# Expect: {"status":"ok","service":"platform-service","requestId":"test-123"}
+```
+
+### Mailpit (Email)
+
+```bash
+# Mailpit inbox (should be empty initially)
+curl -sf http://localhost:8025/api/v1/messages | jq .total
+# Expect: 0
+```
+
+### Grafana
+
+```bash
+# Grafana login (default credentials)
+curl -sf -u admin:bytelyst http://localhost:3000/api/org | jq .name
+# Expect: "Main Org."
+```
+
+---
+
+## Idempotency Test
+
+```bash
+# Run setup again — should complete without errors
+sudo ./setup.sh --resume
+# Expect: "All phases already completed. Use --reset to start over."
+
+# Run single phase — should be safe
+sudo ./setup.sh --phase=8
+# Expect: health check passes again
+```
+
+## Resume Test
+
+```bash
+# Check status
+sudo ./setup.sh --status
+# Expect: all 8 phases DONE
+
+# Reset and verify (clears phase markers; the next setup.sh run starts again from phase 1)
+sudo ./setup.sh --reset
+sudo ./setup.sh --status
+# Expect: all 8 phases pending
+```
+
+---
+
+## Port Connectivity (from external machine)
+
+If testing remote access via SSH port-forwarding:
+
+```bash
+# From your laptop (not the VM)
+ssh -N -L 3001:localhost:3001 -L 3060:localhost:3060 -L 4003:localhost:4003 azureuser@
+
+# Then in another terminal on your laptop:
+curl -sf http://localhost:4003/health | jq .
+# Expect: {"status":"ok",...}
+
+# Open in browser:
+#   http://localhost:3001 -> Admin Console
+#   http://localhost:3060 -> ActionTrail Web
+```
+
+---
+
+## Expected Service Count Summary
+
+| Category          | Count  | Ports                                                |
+| ----------------- | ------ | ---------------------------------------------------- |
+| Infrastructure    | 6      | 1234, 3000, 3100, 8025, 8080, 10000                  |
+| Platform Services | 3      | 4003, 4005, 4007                                     |
+| Dashboards        | 2      | 3001, 3003                                           |
+| Product Backends  | 10     | 4010-4019                                            |
+| Product Web Apps  | 9      | 3002, 3030, 3035, 3040, 3045, 3050, 3055, 3060, 3070 |
+| LLM Lab Dashboard | 1      | 3075                                                 |
+| **Total**         | **31** |                                                      |
+
+Plus external to Docker: Gitea (:3300), Ollama (:11434).
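
Reviewer note on the `dry_run` disk and RAM probes. Inside single quotes the shell passes `$4` to awk untouched, so escaping it as `\$4` hands awk a literal backslash instead of a field reference. gawk, for one, reports a syntax error. The assignment then falls through to the `0` fallback, and the 40 GB and 16 GB checks fail on any machine. The sketch below isolates the two parsers so they can be tested against captured `df -BG` and `free -g` output without a live VM. `avail_gb_from_df` and `total_gb_from_free` are hypothetical helpers, not functions in `setup.sh`. The column positions assume the default GNU coreutils `df` and procps `free` layouts.

```bash
#!/bin/sh
# Hypothetical helpers (not part of setup.sh) that mirror the dry_run probes.
# Each reads captured command output on stdin, so it can be checked offline.

# Available space in whole GB from `df -BG <mount>` output.
# Row 2 is the data row; column 4 is "Available" (e.g. "94G"), so strip the "G".
# $4 sits inside single quotes, so the shell hands it to awk unchanged.
avail_gb_from_df() {
  awk 'NR==2 {gsub(/G/, "", $4); print $4}'
}

# Total RAM in whole GB from `free -g` output (column 2 of the "Mem:" row).
total_gb_from_free() {
  awk '/^Mem:/ {print $2}'
}

# Live usage, falling back to 0 when parsing yields nothing
# (2>/dev/null here applies to df/free themselves):
#   disk_gb=$(df -BG / 2>/dev/null | avail_gb_from_df); disk_gb=${disk_gb:-0}
#   mem_gb=$(free -g 2>/dev/null | total_gb_from_free); mem_gb=${mem_gb:-0}
```

To check the 40 GB threshold against a real number rather than the fallback, pipe a saved `df -BG /` capture from the VM through `avail_gb_from_df`.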