refactor(infra): reorganize single_azure_vm into docker/ and k8s/ subfolders
- Move setup.sh, README.md, prompt.md into docker/ subfolder
- Create top-level README.md comparing both approaches
- Create k8s/README.md with full design doc: k3s architecture, namespace strategy, manifest structure, ConfigMap/Secret design, Cosmos emulator StatefulSet, Ollama host service, resource limits, 5-phase implementation plan, and kubectl cheat sheet
parent 3b2d6391b9
commit 7d0c469858
@ -1,188 +1,60 @@
# ByteLyst Single-VM Deployment

> Deploy the **entire ByteLyst ecosystem** (30 services, 10 products) on a single Azure VM.
> Two orchestration approaches: pick one or learn both side by side.

---

## Approaches

### [`docker/`](docker/) - Docker Compose (Production-ready)

Proven, battle-tested deployment using `docker-compose.ecosystem.yml`.
Installs everything from scratch on a raw Ubuntu VM in ~20 minutes.

```bash
sudo ./docker/setup.sh            # Full install
sudo ./docker/setup.sh --resume   # Resume after disconnect
/opt/bytelyst/check-health.sh     # Verify all 30 services
```

**Use this if:** You want reliable deployment now.

### [`k8s/`](k8s/) - Kubernetes via k3s (Learning / Future-ready)

Same 30 services orchestrated by Kubernetes on a single VM using k3s.
Builds on the same Docker images; no Dockerfile changes needed.

**Use this if:** You want to learn K8s with real services, practice `kubectl`,
and prepare for multi-node scaling later.

---

## Architecture (shared by both approaches)

```
Raw Ubuntu 24.04 VM (Standard_D8s_v5: 8 vCPU, 32 GB RAM)
├── Ollama (systemd, :11434) ─── local LLM inference
├── Gitea (Docker/:3300) ──────── npm package registry
└── 30 Services
    ├── Infrastructure (6): cosmos-emulator, azurite, mailpit, loki, grafana, traefik
    ├── Platform (3): platform-service, extraction-service, mcp-server
    ├── Dashboards (2): admin-web, tracker-web
    ├── Backends (10): peakpulse, chronomind, jarvisjr, nomgap, mindlyst,
    │                  lysnrai, notelett, flowmonk, actiontrail, localmemgpt
    └── Web Apps (9): lysnrai-dashboard, chronomind-web, jarvisjr-web, flowmonk-web,
                      notelett-web, mindlyst-web, nomgap-web, actiontrail-web, localmemgpt-web
```

## Comparison
| | Docker Compose | K8s (k3s) |
|--|----------------|-----------|
| **Setup time** | ~20 min | ~30 min |
| **RAM overhead** | ~100 MB | ~600 MB |
| **Config files** | 1 compose + 1 .env | ~30 manifests (or Helm) |
| **Scaling** | Manual | `kubectl scale` / HPA |
| **Rolling updates** | Restart-based | Rolling (zero-downtime with 2+ replicas and readiness probes) |
| **Resource limits** | Basic | Fine-grained per pod |
| **Multi-VM ready** | Requires Docker Swarm | Native (additional nodes join via `k3s agent`) |
| **Learning value** | Low | High (transferable to AKS/EKS/GKE) |

188
docs/devops/single_azure_vm/docker/README.md
Normal file
@ -0,0 +1,188 @@
# ByteLyst Single-VM Deployment

> Deploy the **entire ByteLyst ecosystem** (30 services, 10 products) on a single **raw** Azure VM.
> Nothing needs to be pre-installed: the script handles everything from a blank Ubuntu machine.
> Two files: this README and `setup.sh`. Copy both to the VM and run the script.

---

## Prerequisites

- **Azure VM:** Ubuntu 24.04 LTS (or 22.04), **Standard_D8s_v5** (8 vCPU, 32 GB RAM) recommended
- **Disk:** 128 GB+ (Docker images, Cosmos emulator, Ollama models, build artifacts)
- **Network:** NSG allowing inbound on ports listed in the Port Map below
- **GitHub access:** Repos must be accessible (public or `GITHUB_TOKEN` for private)
- **Nothing else needed:** the script installs Docker, Node.js, pnpm, Gitea, Ollama, and everything else

## Quick Start

```bash
# 1. From your local machine, copy setup.sh to the VM
scp setup.sh azureuser@<vm-ip>:~

# 2. SSH into your Azure VM and make the script executable
ssh azureuser@<vm-ip>
chmod +x setup.sh

# 3. Run. Repos are cloned via HTTPS from GITHUB_USER (see Environment Variables).
#    sudo resets the environment by default, so pass variables on the command line,
#    e.g. `sudo GITHUB_TOKEN=<token> ./setup.sh` for private repos.
sudo ./setup.sh

# 4. Wait ~15-25 minutes for full build + deploy

# 5. Verify
/opt/bytelyst/check-health.sh
```

### Resume & Retry

Phase completion is tracked. If anything fails, you don't have to start over:

```bash
sudo ./setup.sh --phase=7         # Retry just the deploy phase
sudo ./setup.sh --resume          # Auto-resume after SSH disconnect
sudo ./setup.sh --resume-from=7   # Jump to deploy after manual fix
sudo ./setup.sh --status          # Check what's done
sudo ./setup.sh --reset           # Start completely over
sudo ./setup.sh --help            # Show full usage
```

## What the Script Installs & Does

### Software installed on the VM (from scratch)

| Software | Version | Purpose |
|----------|---------|---------|
| **Docker CE** | latest | Container runtime + Compose + BuildKit |
| **Node.js** | 22 LTS | Build toolchain for TypeScript packages |
| **pnpm** | 10.6.5 | Package manager (workspace-aware) |
| **Gitea** | 1.22 (Docker) | Local npm package registry on `:3300` |
| **Ollama** | latest | Local LLM inference for LocalMemGPT on `:11434` |
| **git, jq, curl** | latest | System utilities |

### Execution phases

| Phase | Duration | Description |
|-------|----------|-------------|
| 1. System | ~3 min | Pre-flight checks (disk ≥40 GB, RAM ≥16 GB), install Docker, Node.js 22, pnpm 10.6.5, Ollama, git, jq, build-essential |
| 2. Gitea | ~1 min | Start Gitea Docker container, create admin + org + API token |
| 3. Clone | ~3 min | Clone all 11 repos to `/opt/bytelyst/` |
| 4. Build | ~5 min | `pnpm install && pnpm -r build` all `@bytelyst/*` packages |
| 5. Publish | ~3 min | Publish all packages to local Gitea npm registry |
| 6. Env | instant | Generate `.env.ecosystem` with Cosmos emulator key, Azurite key, JWT secret |
| 7. Deploy | ~10 min | Stop Ollama (free RAM), per-service Docker build + deploy (30 services, with fallback), prune build cache, restart Ollama |
| 8. Verify | ~1 min | Health-check all 30+ endpoints + create `/opt/bytelyst/check-health.sh` |

## Port Map (after deployment)

### Infrastructure (installed by setup.sh)

| Service | Port | URL |
|---------|------|-----|
| Gitea (npm registry) | 3300 | `http://<vm-ip>:3300` |
| Ollama (LLM API) | 11434 | `http://<vm-ip>:11434` |
| Cosmos Data Explorer | 1234 | `http://<vm-ip>:1234` |
| Azurite (Blob) | 10000 | `http://<vm-ip>:10000` |
| Mailpit UI | 8025 | `http://<vm-ip>:8025` |
| Loki (Logs) | 3100 | `http://<vm-ip>:3100/ready` |
| Grafana | 3000 | `http://<vm-ip>:3000` |
| Traefik Dashboard | 8080 | `http://<vm-ip>:8080` |

### Platform Services

| Service | Port | URL |
|---------|------|-----|
| platform-service | 4003 | `http://<vm-ip>:4003/health` |
| extraction-service | 4005 | `http://<vm-ip>:4005/health` |
| mcp-server | 4007 | `http://<vm-ip>:4007/health` |

### Platform Dashboards

| Dashboard | Port | URL |
|-----------|------|-----|
| Admin Console | 3001 | `http://<vm-ip>:3001` |
| Issue Tracker | 3003 | `http://<vm-ip>:3003` |

### Product Backends

| Product | Port | Health |
|---------|------|--------|
| PeakPulse | 4010 | `http://<vm-ip>:4010/health` |
| ChronoMind | 4011 | `http://<vm-ip>:4011/health` |
| JarvisJr | 4012 | `http://<vm-ip>:4012/health` |
| NomGap | 4013 | `http://<vm-ip>:4013/health` |
| MindLyst | 4014 | `http://<vm-ip>:4014/health` |
| LysnrAI | 4015 | `http://<vm-ip>:4015/health` |
| NoteLett | 4016 | `http://<vm-ip>:4016/health` |
| FlowMonk | 4017 | `http://<vm-ip>:4017/health` |
| ActionTrail | 4018 | `http://<vm-ip>:4018/health` |
| LocalMemGPT | 4019 | `http://<vm-ip>:4019/health` |

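Because the ten product backends occupy the contiguous ports 4010-4019 and all expose `/health`, they can be probed in a loop. The following is a minimal sketch of that idea, not the contents of the generated `check-health.sh` (which is not shown in this README); the function names are hypothetical.

```bash
# Sketch only: function names are illustrative, not part of setup.sh.

# Print one health URL per backend port (4010-4019, per the table above).
backend_health_urls() {
  local host="${1:-localhost}" port
  for port in $(seq 4010 4019); do
    echo "http://${host}:${port}/health"
  done
}

# Probe every URL; print OK/FAIL per endpoint, return non-zero if any failed.
check_backends() {
  local url failed=0
  for url in $(backend_health_urls "$1"); do
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      echo "OK   $url"
    else
      echo "FAIL $url"
      failed=1
    fi
  done
  return "$failed"
}
```

Calling `check_backends <vm-ip>` from a laptop also exercises the NSG rules, while `check_backends` with no argument checks from the VM itself.
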
### Product Web Apps

| Product | Port | URL |
|---------|------|-----|
| LysnrAI Dashboard | 3002 | `http://<vm-ip>:3002` |
| ChronoMind | 3030 | `http://<vm-ip>:3030` |
| JarvisJr | 3035 | `http://<vm-ip>:3035` |
| FlowMonk | 3040 | `http://<vm-ip>:3040` |
| NoteLett | 3045 | `http://<vm-ip>:3045` |
| MindLyst | 3050 | `http://<vm-ip>:3050` |
| NomGap | 3055 | `http://<vm-ip>:3055` |
| ActionTrail | 3060 | `http://<vm-ip>:3060` |
| LocalMemGPT | 3070 | `http://<vm-ip>:3070` |

## Post-Deployment Commands

```bash
# Check all service health
/opt/bytelyst/check-health.sh

# View logs for a specific service
docker compose -f /opt/bytelyst/learning_ai_common_plat/docker-compose.ecosystem.yml \
  logs -f platform-service

# Restart a specific service
docker compose -f /opt/bytelyst/learning_ai_common_plat/docker-compose.ecosystem.yml \
  restart flowmonk-backend

# Stop everything
docker compose -f /opt/bytelyst/learning_ai_common_plat/docker-compose.ecosystem.yml down

# Stop and wipe all data (removes named volumes)
docker compose -f /opt/bytelyst/learning_ai_common_plat/docker-compose.ecosystem.yml down -v
```

## Environment Variables

All optional; defaults work for most setups:

| Variable | Default | Description |
|----------|---------|-------------|
| `GITHUB_USER` | `saravanakumardb1` | GitHub org/user to clone repos from |
| `GITHUB_TOKEN` | (empty) | Set for private repos (HTTPS auth) |
| `GITEA_ADMIN` | `bytelyst-admin` | Gitea admin username |
| `GITEA_PASS` | `ByteLyst2026!` | Gitea admin password (override it if port 3300 is reachable from outside) |
| `OLLAMA_MODEL` | `llama3.2:3b` | Default LLM model to pull |
| `SKIP_CLONE` | `0` | Set `1` to skip cloning (re-runs) |
| `SKIP_BUILD` | `0` | Set `1` to skip package build+publish (re-runs) |

## CLI Flags

| Flag | Description |
|------|-------------|
| `--resume` | Auto-resume after the last completed phase |
| `--resume-from=N` | Resume from phase N (1-8) |
| `--phase=N` | Run ONLY phase N (useful for retrying) |
| `--reset` | Clear phase markers and start fresh |
| `--status` | Show completed phases and exit |
| `-h`, `--help` | Show usage help |

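The flags above rely on per-phase completion markers. The README does not show how `setup.sh` stores them, so the following is only a sketch of one common approach: a marker file per phase. The `phase-N.done` file names and the function names are assumptions; only the `/opt/bytelyst/.setup-state` directory appears elsewhere in this README.

```bash
# Hypothetical marker layout: $STATE_DIR/phase-N.done (not confirmed by setup.sh).
STATE_DIR="${STATE_DIR:-/opt/bytelyst/.setup-state}"

# Record that phase N finished.
mark_done() { mkdir -p "$STATE_DIR" && touch "$STATE_DIR/phase-$1.done"; }

# Succeed if phase N has a marker.
is_done() { [ -f "$STATE_DIR/phase-$1.done" ]; }

# Print the first phase (1-8) without a marker; prints 9 when all are done.
next_phase() {
  local n=1
  while [ "$n" -le 8 ] && is_done "$n"; do
    n=$((n + 1))
  done
  echo "$n"
}

# Remove every marker, as --reset is described as doing.
reset_phases() { rm -f "$STATE_DIR"/phase-*.done; }
```

With this layout, `--resume` would start at `next_phase`, `--status` would list the markers, and `--phase=N` would run a phase regardless of its marker.
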
## Troubleshooting

- **Cosmos emulator slow:** It needs 20-30s on first boot. Services wait via health checks.
- **Out of memory:** Use at least 32 GB RAM. Cosmos emulator needs ~4 GB, Ollama needs ~4 GB for 3B models.
- **Build failures:** Check Gitea is running (`docker ps | grep gitea`) and packages published (`curl http://localhost:3300/api/packages/bytelyst/npm/`). Per-service build logs: `/opt/bytelyst/.setup-state/builds/<service>.log`. Retry: `sudo ./setup.sh --phase=7`.
- **Ollama not responding:** Check `systemctl status ollama` or `curl http://localhost:11434/api/version`.
- **Port conflicts:** Ensure nothing else runs on the listed ports before deploying.

## Known Limitations

- **Remote browser access:** Product web apps fall back to `http://localhost:<port>` for API calls. This works when browsing from the VM itself but **not from a remote browser** (e.g., laptop accessing `http://<vm-ip>:3060`). For remote access, set up a reverse proxy (Traefik rules) or SSH port-forwarding. Health checks and server-side rendering still work regardless.
- **Cosmos emulator is x86-only:** Do not use ARM-based VMs (e.g., Dpsv6). Stick with `Standard_D8s_v5` or similar Intel/AMD instances.
- **Memory pressure:** Phase 7 automatically stops Ollama (~3 GB) during Docker builds and restarts it after. If builds still OOM on 32 GB, retry with `sudo ./setup.sh --phase=7` (per-service fallback skips what already built).
- **Corporate proxy in Dockerfiles:** Already removed at source across all repos. No runtime stripping needed.

310
docs/devops/single_azure_vm/k8s/README.md
Normal file
@ -0,0 +1,310 @@
# ByteLyst Single-VM Kubernetes Deployment (k3s)

> Deploy the ByteLyst ecosystem on Kubernetes using **k3s**, a lightweight, certified K8s distribution
> that runs on a single VM with ~512 MB overhead.

**Status:** Planning (see design decisions below).

---

## Prerequisites

Same VM as the Docker Compose approach:
- **Azure VM:** Ubuntu 24.04 LTS, **Standard_D8s_v5** (8 vCPU, 32 GB RAM)
- **Disk:** 128 GB+
- **Docker images:** Built by `docker/setup.sh` (packages in phases 1-5, service images in phase 7) and reused, not rebuilt

## Why k3s?

| Feature | k3s | minikube | kind | microk8s |
|---------|-----|----------|------|----------|
| RAM overhead | ~512 MB | ~2 GB | ~1 GB | ~800 MB |
| Production-grade | Yes (CNCF certified) | No | No | Yes |
| Built-in Traefik | Yes | No | No | Optional |
| Single binary | Yes | No | No | No (snap) |
| SQLite backend | Yes (no etcd needed) | N/A | N/A | Dqlite |

## Architecture

```
Ubuntu 24.04 VM
├── k3s (single-node cluster)
│   ├── kube-system namespace
│   │   ├── CoreDNS
│   │   ├── Traefik Ingress Controller
│   │   ├── Local Path Provisioner
│   │   └── Metrics Server
│   │
│   ├── bytelyst-infra namespace
│   │   ├── cosmos-emulator (StatefulSet + PVC)
│   │   ├── azurite (StatefulSet + PVC)
│   │   ├── mailpit (Deployment)
│   │   ├── loki (StatefulSet + PVC)
│   │   └── grafana (Deployment + PVC)
│   │
│   ├── bytelyst-platform namespace
│   │   ├── platform-service (Deployment, replicas: 1)
│   │   ├── extraction-service (Deployment, replicas: 1)
│   │   └── mcp-server (Deployment, replicas: 1)
│   │
│   ├── bytelyst-dashboards namespace
│   │   ├── admin-web (Deployment, replicas: 1)
│   │   └── tracker-web (Deployment, replicas: 1)
│   │
│   └── bytelyst-products namespace
│       ├── *-backend (10 Deployments)
│       └── *-web (9 Deployments)
│
├── Ollama (systemd, host network, :11434)
└── Gitea (Docker container, :3300, used at build time only)
```

## Manifest Structure (planned)

```
k8s/
├── README.md                     # This file
├── setup-k8s.sh                  # Bootstrap script (installs k3s, applies manifests)
├── namespaces.yaml               # 4 namespaces
├── config/
│   ├── configmap.yaml            # Shared env vars (replaces .env.ecosystem)
│   └── secrets.yaml              # JWT_SECRET, COSMOS_KEY, etc.
├── infra/
│   ├── cosmos-emulator.yaml      # StatefulSet + Service + PVC
│   ├── azurite.yaml              # StatefulSet + Service + PVC
│   ├── mailpit.yaml              # Deployment + Service
│   ├── loki.yaml                 # StatefulSet + Service + PVC
│   └── grafana.yaml              # Deployment + Service + PVC
├── platform/
│   ├── platform-service.yaml     # Deployment + Service
│   ├── extraction-service.yaml   # Deployment + Service
│   └── mcp-server.yaml           # Deployment + Service
├── dashboards/
│   ├── admin-web.yaml            # Deployment + Service
│   └── tracker-web.yaml          # Deployment + Service
├── products/
│   ├── _backend-template.yaml    # Helm-like template (for reference)
│   ├── peakpulse-backend.yaml
│   ├── chronomind-backend.yaml
│   ├── ... (8 more backends)
│   ├── lysnrai-dashboard.yaml
│   ├── chronomind-web.yaml
│   └── ... (7 more web apps)
└── ingress/
    └── ingress.yaml              # Traefik IngressRoute rules
```

## Key Design Decisions

### 1. Image Source: Import from Docker

k3s uses containerd, not Docker. We import the Docker-built images:

```bash
# Build images with Docker (phases 1-7 from docker/setup.sh), then import into k3s
docker save platform-service:latest | sudo k3s ctr images import -

# Or build directly with nerdctl (installed separately; not bundled with k3s)
nerdctl build -t platform-service:latest -f services/platform-service/Dockerfile .
```

**Decision:** Import from Docker first (simpler), migrate to nerdctl later.

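Importing one image at a time gets tedious with 30 services, so the import step lends itself to a loop. This is a rough sketch rather than the planned `setup-k8s.sh`: the function names and the `DRY_RUN` switch are hypothetical, and the image names passed in are placeholders.

```bash
# Sketch only: DRY_RUN=1 (the default here) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Copy one locally built Docker image into k3s's containerd image store.
import_image() {
  local image="$1"
  if [ "$DRY_RUN" = "1" ]; then
    echo "docker save $image | k3s ctr images import -"
  else
    # Run as root; without pipefail, a failing `docker save` can go unnoticed.
    docker save "$image" | k3s ctr images import -
  fi
}

# Import every image given as an argument, stopping at the first failure.
import_images() {
  local image
  for image in "$@"; do
    import_image "$image" || return 1
  done
}
```

Example: `import_images platform-service:latest flowmonk-backend:latest`. Note that Kubernetes defaults `imagePullPolicy` to `Always` for `:latest` tags, so manifests that use imported images need `imagePullPolicy: IfNotPresent` (or `Never`) to avoid a registry pull.
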
### 2. Cosmos Emulator: StatefulSet with PVC

The Cosmos emulator needs persistent storage and specific env vars.
Use a `StatefulSet` (not Deployment) for stable network identity:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cosmos-emulator
  namespace: bytelyst-infra
spec:
  replicas: 1
  serviceName: cosmos-emulator
  selector:                        # required by apps/v1; must match the template labels
    matchLabels:
      app: cosmos-emulator         # label key/value chosen for illustration
  template:
    metadata:
      labels:
        app: cosmos-emulator
    spec:
      containers:
        - name: cosmos
          image: mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:latest
          ports:
            - containerPort: 8081
            - containerPort: 1234
          env:
            - name: AZURE_COSMOS_EMULATOR_ENABLE_DATA_PERSISTENCE
              value: "true"
            - name: ENABLE_EXPLORER
              value: "true"
          resources:
            limits:
              memory: "3Gi"
              cpu: "2"
          # Still needed: volumeMounts that mount cosmos-data at the emulator's data directory
  volumeClaimTemplates:
    - metadata:
        name: cosmos-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

### 3. Ollama: Host Network

Ollama stays as a systemd service on the host. Pods reach it either by running with `hostNetwork: true`
or through a selector-less Service backed by a manually created Endpoints object that points to the node IP.
Ollama must listen on an address that pods can reach, not only `127.0.0.1`.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: bytelyst-products
spec:
  ports:
    - port: 11434
---
apiVersion: v1
kind: Endpoints
metadata:
  name: ollama
  namespace: bytelyst-products
subsets:
  - addresses:
      - ip: 172.17.0.1   # Placeholder (Docker bridge address); replace with the node's internal IP
    ports:
      - port: 11434
```

### 4. ConfigMap replaces .env.ecosystem

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: bytelyst-config
  namespace: bytelyst-platform
data:
  COSMOS_ENDPOINT: "http://cosmos-emulator.bytelyst-infra.svc:8081"
  COSMOS_DATABASE: "bytelyst"
  DB_PROVIDER: "cosmos"
  PLATFORM_SERVICE_URL: "http://platform-service.bytelyst-platform.svc:4003"
  EXTRACTION_SERVICE_URL: "http://extraction-service.bytelyst-platform.svc:4005"
```

Note: K8s DNS uses `<service>.<namespace>.svc` format for cross-namespace access. ConfigMaps are namespace-scoped, so each namespace whose pods consume `bytelyst-config` needs its own copy.

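To make the naming rule concrete, here is a tiny helper that assembles an in-cluster URL from a service name, namespace, and port. The function name `svc_url` is made up for this illustration; the `<service>.<namespace>.svc` form comes from the note above.

```bash
# Build an in-cluster URL: http://<service>.<namespace>.svc:<port>
svc_url() {
  echo "http://$1.$2.svc:$3"
}

svc_url platform-service bytelyst-platform 4003
svc_url cosmos-emulator bytelyst-infra 8081
```

Within the same namespace the short name (for example `http://platform-service:4003`) also resolves; the namespaced form is what pods in `bytelyst-products` need when calling services in `bytelyst-platform`.
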
### 5. Secrets for sensitive values

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: bytelyst-secrets
  # No namespace is set here; Secrets are namespace-scoped, so create one per consuming namespace.
type: Opaque
stringData:
  JWT_SECRET: "<generated>"
  # The two keys below are the publicly documented emulator defaults, not real credentials.
  COSMOS_KEY: "C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw=="
  AZURE_BLOB_ACCOUNT_KEY: "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw=="
```

### 6. Health Checks → Readiness/Liveness Probes

Every backend gets K8s-native probes (shown here with platform-service's port 4003):

```yaml
readinessProbe:
  httpGet:
    path: /health
    port: 4003
  initialDelaySeconds: 15
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /health
    port: 4003
  initialDelaySeconds: 30
  periodSeconds: 30
```

### 7. Resource Limits

| Service type | CPU request | CPU limit | Memory request | Memory limit |
|--------------|-------------|-----------|----------------|--------------|
| Backend | 100m | 500m | 256Mi | 512Mi |
| Web app | 100m | 500m | 256Mi | 512Mi |
| Platform service | 200m | 1000m | 384Mi | 768Mi |
| Cosmos emulator | 1000m | 2000m | 2Gi | 3Gi |
| Ollama | (host) | (host) | (host) | (host) |

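A quick way to sanity-check these numbers against the 32 GB VM is to total the memory requests and limits. The arithmetic below is a back-of-the-envelope sketch under two assumptions: the 2 dashboards use the "Web app" row, and azurite, mailpit, loki, and grafana are left out because the table does not list them.

```bash
# Total memory in Mi for: 10 backends, 9 web apps + 2 dashboards,
# 3 platform services, and 1 Cosmos emulator.
# Arguments: backend_mi web_mi platform_mi cosmos_mi
total_mi() {
  echo $(( 10 * $1 + (9 + 2) * $2 + 3 * $3 + $4 ))
}

requests_mi=$(total_mi 256 256 384 2048)
limits_mi=$(total_mi 512 512 768 3072)
echo "requests: ${requests_mi}Mi, limits: ${limits_mi}Mi"
# prints "requests: 8576Mi, limits: 16128Mi"
```

That is roughly 8.4 GiB requested and 15.75 GiB at the limits, which leaves headroom on a 32 GB VM for Ollama, k3s itself, and the infrastructure pods not counted here.
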
## Implementation Phases

### Phase A: Foundation (Day 1)
- [ ] Install k3s on VM
- [ ] Create 4 namespaces
- [ ] Deploy ConfigMap + Secrets
- [ ] Deploy cosmos-emulator + azurite (StatefulSets)
- [ ] Verify: `kubectl get pods -A` shows infra running

### Phase B: Platform (Day 1-2)
- [ ] Import platform-service Docker image
- [ ] Deploy platform-service (Deployment + Service)
- [ ] Verify: `kubectl exec` + `curl http://platform-service:4003/health`
- [ ] Deploy extraction-service + mcp-server
- [ ] Deploy admin-web + tracker-web

### Phase C: Products (Day 2-3)
- [ ] Template: create one backend manifest, verify it works
- [ ] Replicate for all 10 backends
- [ ] Create web app manifests (9 services)
- [ ] Verify: all 30 services running

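The "template, then replicate" step in Phase C could be done with a small shell function that renders one Deployment per backend. This is a sketch under assumptions, not the planned `_backend-template.yaml`: the `app` label, the `<name>:latest` image naming, and the function name are illustrative, while the namespace, probe path, and resource values are taken from the sections above.

```bash
# Render a Deployment manifest for one backend (name and port as arguments).
render_backend() {
  local name="$1" port="$2"
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${name}
  namespace: bytelyst-products
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ${name}
  template:
    metadata:
      labels:
        app: ${name}
    spec:
      containers:
        - name: ${name}
          image: ${name}:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: ${port}
          readinessProbe:
            httpGet:
              path: /health
              port: ${port}
          resources:
            requests: { cpu: 100m, memory: 256Mi }
            limits: { cpu: 500m, memory: 512Mi }
EOF
}
```

For example, `render_backend flowmonk-backend 4017 > k8s/products/flowmonk-backend.yaml` writes one manifest, and `render_backend flowmonk-backend 4017 | kubectl apply -f -` applies it directly. A matching Service and the ConfigMap/Secret references would still need to be added.
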
### Phase D: Networking (Day 3)
- [ ] Set up Traefik IngressRoute for external access
- [ ] Configure NodePort services for direct port access
- [ ] Create Ollama external service endpoint
- [ ] Verify: health check script works against K8s services

### Phase E: Operations (Day 4+)
- [ ] Test scaling: `kubectl scale deployment/flowmonk-backend --replicas=2 -n bytelyst-products`
- [ ] Test a rolling update: `kubectl rollout restart deployment/platform-service -n bytelyst-platform`
- [ ] Monitor resource usage: `kubectl top pods -A`
- [ ] Set up HorizontalPodAutoscaler for one service
- [ ] Practice: `kubectl logs`, `kubectl exec`, `kubectl describe`

## Useful Commands (cheat sheet)

```bash
# Cluster status
kubectl get nodes
kubectl get pods -A                      # All namespaces
kubectl get pods -n bytelyst-products    # Product namespace

# Deploy / update
kubectl apply -f k8s/namespaces.yaml     # Namespaces must exist first
kubectl apply -R -f k8s/                 # Apply all manifests (-R recurses into subfolders)
kubectl apply -f k8s/products/           # Apply product manifests
kubectl rollout restart deployment/flowmonk-backend -n bytelyst-products

# Debugging
kubectl logs deployment/platform-service -n bytelyst-platform -f
kubectl describe pod <pod-name> -n bytelyst-platform
kubectl exec -it deployment/platform-service -n bytelyst-platform -- sh

# Scaling
kubectl scale deployment/flowmonk-backend --replicas=2 -n bytelyst-products
kubectl autoscale deployment/flowmonk-backend --min=1 --max=3 --cpu-percent=70 -n bytelyst-products

# Resource monitoring
kubectl top pods -n bytelyst-products
kubectl top nodes
```

## Migration from Docker Compose

Both approaches can be installed on the same VM, but only one stack should run at a time:
1. `docker/setup.sh` builds and publishes packages (phases 1-5) and builds the service images (phase 7)
2. `docker compose down` stops the compose stack
3. `setup-k8s.sh` imports images into k3s and applies manifests
4. Both share the same Gitea registry and Ollama instance