docs(devops): add single-VM deployment guide with audit findings
parent
2f06aacc27
commit
ae2af43d71
956
docs/devops/SINGLE_VM_DEPLOYMENT.md
Normal file
@ -0,0 +1,956 @@
# ByteLyst Ecosystem — Single-VM Deployment Guide

> Deploy the **entire** ByteLyst ecosystem on one VM, fully Dockerized, with a K3s Kubernetes layer for production-readiness practice.

---

## 1. Service Inventory

### Shared Infrastructure (common-plat)

| Service                  | Port       | Image / Stack                                                          | RAM Est. |
| ------------------------ | ---------- | ---------------------------------------------------------------------- | -------- |
| **platform-service**     | 4003       | Fastify 5 + TS                                                         | ~200 MB  |
| **extraction-service**   | 4005       | Fastify 5 + Python sidecar                                             | ~350 MB  |
| **mcp-server**           | 4007       | Fastify 5 + TS                                                         | ~150 MB  |
| **Cosmos DB Emulator**   | 8081, 1234 | `mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:vnext-preview` | ~2 GB    |
| **Azurite** (blob)       | 10000      | `mcr.microsoft.com/azure-storage/azurite`                              | ~100 MB  |
| **Mailpit** (SMTP)       | 1025, 8025 | `axllent/mailpit`                                                      | ~50 MB   |
| **Traefik** (gateway)    | 80, 8080   | `traefik:v3.3`                                                         | ~100 MB  |
| **Loki** (logs)          | 3100       | `grafana/loki`                                                         | ~200 MB  |
| **Grafana** (dashboards) | 3000       | `grafana/grafana`                                                      | ~200 MB  |

### Product Backends (Fastify 5 + TypeScript)

| Product                 | Port | RAM Est. |
| ----------------------- | ---- | -------- |
| **LysnrAI** backend     | 4015 | ~150 MB  |
| **MindLyst** backend    | 4014 | ~150 MB  |
| **ChronoMind** backend  | 4011 | ~150 MB  |
| **JarvisJr** backend    | 4012 | ~150 MB  |
| **NomGap** backend      | 4013 | ~150 MB  |
| **PeakPulse** backend   | 4010 | ~150 MB  |
| **FlowMonk** backend    | 4017 | ~150 MB  |
| **NoteLett** backend    | 4016 | ~150 MB  |
| **ActionTrail** backend | 4018 | ~150 MB  |
| **LocalMemGPT** backend | 4019 | ~150 MB  |

### Web Dashboards (Next.js 16)

| Dashboard              | Default Port | Compose Port | RAM Est. | Notes                                             |
| ---------------------- | ------------ | ------------ | -------- | ------------------------------------------------- |
| **admin-web**          | 3000         | **3001**     | ~250 MB  | No port in package.json; must set `PORT=3001` env |
| **user-dashboard-web** | 3002         | 3002         | ~250 MB  | Port set in package.json                          |
| **tracker-web**        | 3003         | 3003         | ~200 MB  | Port set in package.json                          |
| **NomGap** web         | 3040         | 3040         | ~200 MB  | Port set in Dockerfile                            |
| **ChronoMind** web     | 3000         | **3051**     | ~200 MB  | No port override; must set `PORT` env             |
| **JarvisJr** web       | 3000         | **3052**     | ~200 MB  | No port override; must set `PORT` env             |
| **FlowMonk** web       | 3000         | **3053**     | ~200 MB  | No port override; must set `PORT` env             |
| **NoteLett** web       | 3000         | **3054**     | ~200 MB  | Dockerfile EXPOSE 3000; remap in compose          |
| **ActionTrail** web    | 3000         | **3060**     | ~200 MB  | Dockerfile EXPOSE 3000; remap in compose          |
| **LocalMemGPT** web    | 3070         | 3070         | ~200 MB  | Port set in package.json + Dockerfile             |
| **MindLyst** web       | 3050         | 3050         | ~200 MB  | Port set in package.json (`-p 3050`)              |

> **Port conflict warning:** Grafana uses port 3000, and the admin-web, ChronoMind, JarvisJr, FlowMonk, NoteLett, and ActionTrail webs all default to 3000. Each container has its own network namespace, so only the host-side port has to be unique: the compose file **must** either set the `PORT` env var or remap the port in the `ports:` mapping.

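A quick way to catch host-port collisions before writing the compose file is to list each service with the host port it would claim and look for duplicates. The sketch below is only an illustration: `find_port_conflicts` is a hypothetical helper, and the input lines are the default ports from the tables above.

```bash
# Read "<service> <host-port>" lines on stdin and print every port that is
# claimed by more than one service (hypothetical helper, not part of the repo).
find_port_conflicts() {
  awk '{
         names[$2] = (names[$2] == "" ? $1 : names[$2] "," $1)
         count[$2]++
       }
       END {
         for (p in count) if (count[p] > 1) print p ": " names[p]
       }' | sort -n
}

# Example input: Grafana plus a few dashboards at their default ports.
printf '%s\n' \
  'grafana 3000' \
  'admin-web 3000' \
  'chronomind-web 3000' \
  'tracker-web 3003' \
  'nomgap-web 3040' | find_port_conflicts
```

Any line it prints is a port that needs a `PORT` override or a host-side remap.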
### Optional / AI

| Service          | Port  | RAM Est.                  |
| ---------------- | ----- | ------------------------- |
| **Ollama** (LLM) | 11434 | 4–16 GB (model-dependent) |

---

## 2. VM Sizing

### Minimum (dev/staging, no Ollama)

| Spec      | Value            |
| --------- | ---------------- |
| **vCPUs** | 8                |
| **RAM**   | 32 GB            |
| **Disk**  | 100 GB SSD       |
| **OS**    | Ubuntu 24.04 LTS |

**Breakdown:**

- Cosmos Emulator: ~2 GB
- 10 Fastify backends × 150 MB = ~1.5 GB
- 3 shared services (200 + 350 + 150 MB) = ~0.7 GB
- 11 Next.js webs (2 × 250 MB + 9 × 200 MB) = ~2.3 GB
- Infra (Traefik, Loki, Grafana, Azurite, Mailpit) = ~0.65 GB
- K3s overhead = ~0.5 GB
- **Subtotal: ~7.7 GB** → 32 GB leaves headroom for spikes and build cache

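The breakdown is plain addition, so it is easy to re-check against the per-service estimates in the §1 tables. The sketch below is a minimal illustration, assuming Cosmos at a round 2000 MB and the K3s overhead at 500 MB; `ram_total_mb` is a hypothetical helper name. Because §1 lists eleven dashboards (two of them at 250 MB), an exact sum lands a little above a rounded "10 × 200 MB" estimate.

```bash
# Sum a list of RAM estimates given in MB (hypothetical helper).
ram_total_mb() {
  total=0
  for mb in "$@"; do
    total=$((total + mb))
  done
  echo "$total"
}

shared=$(ram_total_mb 200 350 150)             # platform, extraction, mcp
infra=$(ram_total_mb 2000 100 50 100 200 200)  # Cosmos, Azurite, Mailpit, Traefik, Loki, Grafana
backends=$((10 * 150))                         # 10 Fastify product backends
webs=$((2 * 250 + 9 * 200))                    # 11 Next.js dashboards
k3s=500                                        # K3s overhead

subtotal=$(ram_total_mb "$shared" "$infra" "$backends" "$webs" "$k3s")
echo "Subtotal: ${subtotal} MB"
```

Either way the steady-state footprint is well under 10 GB; the 32 GB figure is driven by build-time spikes, not by idle services.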
### Recommended (with Ollama, small models)

| Spec      | Value                                         |
| --------- | --------------------------------------------- |
| **vCPUs** | 16                                            |
| **RAM**   | 64 GB                                         |
| **Disk**  | 200 GB NVMe SSD                               |
| **GPU**   | Optional NVIDIA T4/A10 for fast LLM inference |
| **OS**    | Ubuntu 24.04 LTS                              |

### Cloud Equivalents

| Provider    | Instance         | vCPU | RAM    | Price (approx)   |
| ----------- | ---------------- | ---- | ------ | ---------------- |
| **Azure**   | Standard_D8s_v5  | 8    | 32 GB  | ~$280/mo         |
| **Azure**   | Standard_D16s_v5 | 16   | 64 GB  | ~$560/mo         |
| **AWS**     | m6i.2xlarge      | 8    | 32 GB  | ~$280/mo         |
| **AWS**     | m6i.4xlarge      | 16   | 64 GB  | ~$560/mo         |
| **Hetzner** | CPX51            | 16   | 32 GB  | ~$45/mo          |
| **Hetzner** | CCX63            | 48   | 192 GB | ~$230/mo         |
| **Home**    | Mac Mini M4 Pro  | 12   | 48 GB  | One-time ~$1,600 |

> **Cost tip:** At the 32 GB tier, Hetzner (~$45/mo) is roughly 6× cheaper than Azure/AWS (~$280/mo) for dev/staging.

---

## 3. Architecture: Docker Compose → K3s Migration Path

### Phase 1: Docker Compose (after prerequisite work)

> **⚠️ Prerequisite:** 6 repos need Dockerfiles created, 3 webs need `output: 'standalone'` in next.config.ts, and ALL product repos must run `docker-prep.sh` before building (see §10 and §12 Audit Findings).

Create a **unified** `docker-compose.ecosystem.yml` that brings everything up.

### Phase 2: K3s (single-node Kubernetes)

[K3s](https://k3s.io/) is a lightweight, certified Kubernetes distro that runs on a single node. It gives you **real** `kubectl`, Helm, Ingress, and CRDs, with the same APIs as production EKS/AKS/GKE.

**Why K3s over minikube/kind?**

- Production-grade (CNCF-certified distribution, originally built by Rancher)
- Single binary, ~70 MB, installs in 30 seconds
- Built-in Traefik Ingress (you already use Traefik!)
- Built-in local-path StorageClass
- Runs as a systemd service (survives reboot)
- Can scale to multi-node later by joining worker nodes

---

## 4. Implementation Plan

### 4.1 Phase 1: Unified Docker Compose

Create `docker-compose.ecosystem.yml` at the workspace root (`~/code/mygh/`) to compose all services.

**⚠️ Critical prerequisite: run this BEFORE `docker compose build`:**

```bash
# Pack @bytelyst/* file: dependencies into tarballs for each product repo.
# Every product repo has file: refs to ../learning_ai_common_plat/packages/*
# which don't resolve inside the Docker build context. docker-prep.sh packs them.
for repo in learning_ai_trails learning_ai_local_memory_gpt learning_ai_notes learning_ai_fastgap; do
  (cd "$repo" && ./scripts/docker-prep.sh)
done
# Repos without docker-prep.sh yet need it created (see §12 Audit Findings)
```

```yaml
# ~/code/mygh/docker-compose.ecosystem.yml
# NOTE: All product backends/webs have file: deps to @bytelyst/* packages.
# You MUST run docker-prep.sh for each repo first (see above).

services:
  # ══════════════════════════════════════════════════════
  # INFRASTRUCTURE
  # ══════════════════════════════════════════════════════
  cosmos-emulator:
    image: mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:vnext-preview
    ports: ['8081:8081', '1234:1234']
    environment:
      PROTOCOL: http
      ENABLE_EXPLORER: 'true'
    restart: unless-stopped

  azurite:
    image: mcr.microsoft.com/azure-storage/azurite:3.35.0
    command: azurite-blob --blobHost 0.0.0.0 --blobPort 10000 --skipApiVersionCheck
    ports: ['10000:10000']
    volumes: [azurite-data:/data]
    restart: unless-stopped

  mailpit:
    image: axllent/mailpit:v1.27.5
    ports: ['1025:1025', '8025:8025']
    restart: unless-stopped

  traefik:
    image: traefik:v3.3
    command:
      - '--api.insecure=true'
      - '--providers.docker=true'
      - '--providers.docker.exposedbydefault=false'
      - '--entrypoints.web.address=:80'
    ports: ['80:80', '8080:8080']
    volumes: ['/var/run/docker.sock:/var/run/docker.sock:ro']
    restart: unless-stopped

  loki:
    image: grafana/loki:3.3.2
    ports: ['3100:3100']
    volumes: [loki-data:/loki]
    restart: unless-stopped

  grafana:
    image: grafana/grafana:11.4.0
    ports: ['3000:3000'] # NOTE: many Next.js webs also default to 3000; keep host ports unique
    environment:
      GF_SECURITY_ADMIN_USER: admin
      GF_SECURITY_ADMIN_PASSWORD: lysnrai
    volumes: [grafana-data:/var/lib/grafana]
    restart: unless-stopped

  # ══════════════════════════════════════════════════════
  # SHARED SERVICES (common-plat: no file: deps; the pnpm workspace handles it)
  # ══════════════════════════════════════════════════════
  platform-service:
    build:
      context: ./learning_ai_common_plat
      dockerfile: services/platform-service/Dockerfile
    ports: ['4003:4003']
    env_file: [.env.ecosystem]
    environment:
      PORT: 4003
      COSMOS_AUTO_INIT: 'true'
    depends_on: [cosmos-emulator, azurite, mailpit]
    labels:
      - 'traefik.enable=true'
      - 'traefik.http.routers.platform.rule=Host(`platform.local`)'
      - 'traefik.http.services.platform.loadbalancer.server.port=4003'
    restart: unless-stopped

  extraction-service:
    build:
      context: ./learning_ai_common_plat
      dockerfile: services/extraction-service/Dockerfile
    ports: ['4005:4005']
    env_file: [.env.ecosystem]
    environment:
      PORT: 4005
    depends_on: [cosmos-emulator]
    restart: unless-stopped

  mcp-server:
    build:
      context: ./learning_ai_common_plat
      dockerfile: services/mcp-server/Dockerfile
    ports: ['4007:4007']
    env_file: [.env.ecosystem]
    environment:
      PORT: 4007
      PLATFORM_SERVICE_URL: http://platform-service:4003
      EXTRACTION_SERVICE_URL: http://extraction-service:4005
    depends_on: [platform-service, extraction-service]
    restart: unless-stopped

  # ══════════════════════════════════════════════════════
  # PRODUCT BACKENDS
  # All have file: deps → must run docker-prep.sh first.
  # ActionTrail + LocalMemGPT Dockerfiles use repo-root context.
  # Others use backend/ subdir context.
  # ══════════════════════════════════════════════════════
  lysnrai-backend:
    build: ./learning_voice_ai_agent/backend # Needs Dockerfile (missing)
    ports: ['4015:4015']
    env_file: [.env.ecosystem]
    environment: { PORT: '4015', SERVICE_NAME: lysnrai-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  mindlyst-backend:
    build: ./learning_multimodal_memory_agents/backend # Needs Dockerfile (missing)
    ports: ['4014:4014']
    env_file: [.env.ecosystem]
    environment: { PORT: '4014', SERVICE_NAME: mindlyst-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  chronomind-backend:
    build: ./learning_ai_clock/backend # Needs Dockerfile (missing)
    ports: ['4011:4011']
    env_file: [.env.ecosystem]
    environment: { PORT: '4011', SERVICE_NAME: chronomind-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  jarvisjr-backend:
    build: ./learning_ai_jarvis_jr/backend # Needs Dockerfile (missing)
    ports: ['4012:4012']
    env_file: [.env.ecosystem]
    environment: { PORT: '4012', SERVICE_NAME: jarvisjr-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  nomgap-backend:
    build: ./learning_ai_fastgap/backend
    ports: ['4013:4013']
    env_file: [.env.ecosystem]
    environment: { PORT: '4013', SERVICE_NAME: nomgap-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  peakpulse-backend:
    build: ./learning_ai_peakpulse/backend # Needs Dockerfile (missing)
    ports: ['4010:4010']
    env_file: [.env.ecosystem]
    environment: { PORT: '4010', SERVICE_NAME: peakpulse-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  flowmonk-backend:
    build: ./learning_ai_flowmonk/backend # Needs Dockerfile (missing)
    ports: ['4017:4017']
    env_file: [.env.ecosystem]
    environment: { PORT: '4017', SERVICE_NAME: flowmonk-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  notelett-backend:
    build: ./learning_ai_notes/backend
    ports: ['4016:4016']
    env_file: [.env.ecosystem]
    environment: { PORT: '4016', SERVICE_NAME: notelett-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  actiontrail-backend:
    build:
      context: ./learning_ai_trails # Dockerfile expects repo-root context
      dockerfile: backend/Dockerfile
    ports: ['4018:4018']
    env_file: [.env.ecosystem]
    environment: { PORT: '4018', SERVICE_NAME: actiontrail-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  localmemgpt-backend:
    build:
      context: ./learning_ai_local_memory_gpt # Dockerfile expects repo-root context
      dockerfile: backend/Dockerfile
    ports: ['4019:4019']
    env_file: [.env.ecosystem]
    environment: { PORT: '4019', OLLAMA_URL: 'http://host.docker.internal:11434' }
    # On Linux, host.docker.internal only resolves if it is mapped explicitly:
    extra_hosts: ['host.docker.internal:host-gateway']
    volumes: [localmemgpt-data:/app/db]
    restart: unless-stopped

  # ══════════════════════════════════════════════════════
  # WEB DASHBOARDS
  # IMPORTANT: Most webs default to port 3000 internally.
  # Use PORT env var to override, or remap via host:container ports.
  # NOTE: NEXT_PUBLIC_* values are inlined into the client bundle at build time
  # and are read by the browser, so the service-name URLs below only work for
  # server-side calls. Browser calls need a URL reachable from outside Docker.
  # ══════════════════════════════════════════════════════
  admin-web:
    build: ./learning_ai_common_plat/dashboards/admin-web
    ports: ['3001:3001']
    env_file: [.env.ecosystem]
    environment:
      PORT: 3001 # admin-web has NO port override; it defaults to 3000 without this!
    depends_on: [platform-service]
    restart: unless-stopped

  user-dashboard:
    build: ./learning_voice_ai_agent/user-dashboard-web
    ports: ['3002:3002']
    env_file: [.env.ecosystem]
    depends_on: [lysnrai-backend]
    restart: unless-stopped

  tracker-web:
    build: ./learning_ai_common_plat/dashboards/tracker-web
    ports: ['3003:3003']
    env_file: [.env.ecosystem]
    depends_on: [platform-service]
    restart: unless-stopped

  nomgap-web:
    build: ./learning_ai_fastgap/web
    ports: ['3040:3040']
    environment:
      PORT: 3040
      NEXT_PUBLIC_NOMGAP_API_URL: http://nomgap-backend:4013/api
      NEXT_PUBLIC_PLATFORM_SERVICE_URL: http://platform-service:4003/api
    depends_on: [nomgap-backend]
    restart: unless-stopped

  actiontrail-web:
    build: ./learning_ai_trails/web
    ports: ['3060:3000'] # Internal 3000 → external 3060
    environment:
      NEXT_PUBLIC_API_URL: http://actiontrail-backend:4018
    depends_on: [actiontrail-backend]
    restart: unless-stopped

  localmemgpt-web:
    build:
      context: ./learning_ai_local_memory_gpt # Dockerfile expects repo-root context
      dockerfile: web/Dockerfile
    ports: ['3070:3070']
    environment:
      NEXT_PUBLIC_BACKEND_URL: http://localmemgpt-backend:4019
    depends_on: [localmemgpt-backend]
    restart: unless-stopped

  notelett-web:
    build: ./learning_ai_notes/web
    ports: ['3054:3000'] # Internal 3000 → external 3054
    environment:
      NEXT_PUBLIC_BACKEND_URL: http://notelett-backend:4016
    depends_on: [notelett-backend]
    restart: unless-stopped

  # Remaining webs need Dockerfiles + output:'standalone' in next.config.ts:
  # chronomind-web (3051), jarvisjr-web (3052), flowmonk-web (3053), mindlyst-web (3050)

volumes:
  azurite-data:
  loki-data:
  grafana-data:
  localmemgpt-data:
```

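The short-form `depends_on: [...]` used in this compose file only orders container start-up; it does not wait for a dependency to be ready to serve requests. Compose's long-form `depends_on` with `condition: service_healthy` is the in-file alternative. From the outside, a small retry loop does the same job. The sketch below is a generic helper (`wait_until` is a hypothetical name), and the `/health` path in the example is an assumption borrowed from the K8s probes later in this guide.

```bash
# Retry a command until it succeeds or the attempts run out.
# usage: wait_until <attempts> <delay-seconds> <command...>
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  while [ "$attempts" -gt 0 ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  return 1
}

# Example (assumes platform-service answers GET /health on its published port):
#   docker compose -f docker-compose.ecosystem.yml up -d
#   wait_until 30 2 curl -fsS http://localhost:4003/health \
#     || echo "platform-service did not become ready" >&2
```

Passing the probe as a command keeps the helper reusable for TCP checks or `docker compose exec` probes.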
### 4.2 Phase 2: K3s (Single-Node Kubernetes)

#### Install K3s on the VM

```bash
# Stop the Compose stack first: K3s's bundled Traefik also binds host port 80.
# Install K3s (30 seconds, includes kubectl + containerd)
curl -sfL https://get.k3s.io | sh -

# Verify
sudo kubectl get nodes
# NAME   STATUS   ROLES                  AGE   VERSION
# myvm   Ready    control-plane,master   30s   v1.30.x+k3s1

# Copy kubeconfig for non-root usage
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown "$(id -u):$(id -g)" ~/.kube/config
export KUBECONFIG=~/.kube/config # K3s's kubectl otherwise reads /etc/rancher/k3s/k3s.yaml

# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
```

#### Namespace Layout

```bash
kubectl create namespace bytelyst-infra     # Cosmos, Azurite, Mailpit, Loki, Grafana
kubectl create namespace bytelyst-platform  # platform-service, extraction, mcp
kubectl create namespace bytelyst-products  # 10 product backends
kubectl create namespace bytelyst-web       # All Next.js dashboards
```

#### Example K8s Manifest (one backend)

```yaml
# k8s/products/lysnrai-backend.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lysnrai-backend
  namespace: bytelyst-products
  labels:
    app: lysnrai-backend
    product: lysnrai
spec:
  replicas: 1 # Scale to 2+ when ready
  selector:
    matchLabels:
      app: lysnrai-backend
  template:
    metadata:
      labels:
        app: lysnrai-backend
    spec:
      containers:
        - name: lysnrai-backend
          image: bytelyst/lysnrai-backend:latest
          # :latest defaults to imagePullPolicy Always, which would try to pull
          # from a registry instead of using the locally imported image.
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 4015
          envFrom:
            - configMapRef:
                name: bytelyst-common-config
            - secretRef:
                name: bytelyst-secrets
          env:
            - name: PORT
              value: '4015'
            - name: SERVICE_NAME
              value: lysnrai-backend
          resources:
            requests:
              memory: '128Mi'
              cpu: '100m'
            limits:
              memory: '256Mi'
              cpu: '500m'
          livenessProbe:
            httpGet:
              path: /health
              port: 4015
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 4015
            initialDelaySeconds: 5
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: lysnrai-backend
  namespace: bytelyst-products
spec:
  selector:
    app: lysnrai-backend
  ports:
    - port: 4015
      targetPort: 4015
```

#### Ingress (Traefik, built into K3s)

An Ingress can only route to Services in its own namespace, so each namespace gets its own Ingress resource.

```yaml
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: bytelyst-ingress
  namespace: bytelyst-products
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
  rules:
    - host: lysnrai.local
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: lysnrai-backend
                port:
                  number: 4015
    # ... repeat per product
---
# platform-service lives in bytelyst-platform, so it needs a separate Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: bytelyst-platform-ingress
  namespace: bytelyst-platform
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
  rules:
    - host: platform.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: platform-service
                port:
                  number: 4003
```

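The `lysnrai.local` and `platform.local` hosts in the rules above are not real DNS names, so a workstation needs hosts-file entries (or an explicit `Host:` header) to reach them through Traefik. A minimal sketch follows; `hosts_entries` is a hypothetical helper and `192.0.2.10` is a placeholder for the VM's IP address.

```bash
# Print /etc/hosts lines mapping each hostname to the VM's IP.
# usage: hosts_entries <vm-ip> <hostname>...
hosts_entries() {
  ip=$1
  shift
  for host in "$@"; do
    printf '%s\t%s\n' "$ip" "$host"
  done
}

# Example: append the entries on your workstation (placeholder IP):
#   hosts_entries 192.0.2.10 lysnrai.local platform.local | sudo tee -a /etc/hosts
# Or skip the hosts file and set the header directly:
#   curl -H 'Host: platform.local' http://192.0.2.10/
```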
---

## 5. Docker Compose → K3s Migration Cheat Sheet

| Docker Compose                     | K3s Equivalent                                          |
| ---------------------------------- | ------------------------------------------------------- |
| `services:`                        | `Deployment` + `Service`                                |
| `ports:`                           | `Service` (ClusterIP/NodePort)                          |
| `env_file:`                        | `ConfigMap` + `Secret`                                  |
| `depends_on:`                      | `initContainers` or readiness probes                    |
| `volumes:`                         | `PersistentVolumeClaim` (local-path)                    |
| `restart: unless-stopped`          | Built-in (Deployment pods use `restartPolicy: Always`)  |
| `labels: traefik.*`                | `Ingress` resource                                      |
| `docker compose up`                | `kubectl apply -k k8s/`                                 |
| `docker compose logs`              | `kubectl logs -f deploy/X` or Loki/Grafana              |
| `docker compose ps`                | `kubectl get pods -A`                                   |
| `docker compose up -d --scale X=3` | `kubectl scale deploy/X --replicas=3`                   |

Note that `--scale` fails in Compose for any service with a fixed host port, because every replica tries to bind the same port.

---

## 6. K3s Practice Exercises (on single VM)

These exercises simulate real production scenarios:

### Exercise 1: Rolling Update

```bash
# Build new image, import it into K3s containerd, deploy with zero downtime
docker build -t bytelyst/lysnrai-backend:v2 ./learning_voice_ai_agent/backend
docker save bytelyst/lysnrai-backend:v2 | sudo k3s ctr images import -
kubectl set image deploy/lysnrai-backend lysnrai-backend=bytelyst/lysnrai-backend:v2 -n bytelyst-products
kubectl rollout status deploy/lysnrai-backend -n bytelyst-products
```

### Exercise 2: Scale Horizontally

```bash
kubectl scale deploy/platform-service --replicas=3 -n bytelyst-platform
# Requests through the Service (and Traefik in front of it) are balanced across all 3 pods
```

### Exercise 3: ConfigMap / Secret Rotation

```bash
kubectl create secret generic bytelyst-secrets \
  --from-literal=JWT_SECRET=new-secret \
  --from-literal=COSMOS_KEY=new-key \
  -n bytelyst-platform --dry-run=client -o yaml | kubectl apply -f -
# envFrom values are read at pod start, so restart to pick up the new secret
kubectl rollout restart deploy -n bytelyst-platform
```

### Exercise 4: Resource Limits + HPA

```yaml
# Auto-scale platform-service 1→5 pods based on CPU.
# Requires metrics-server (bundled with K3s) and CPU requests on the target Deployment.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: platform-service-hpa
  namespace: bytelyst-platform
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: platform-service
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

### Exercise 5: Helm Chart (packaged deploy)

```bash
# Create chart scaffold
helm create bytelyst-ecosystem
# Templatize all 25+ services into one chart
# Deploy: helm install bytelyst ./bytelyst-ecosystem -n bytelyst --create-namespace
```

---

## 7. Scaling Path: Single VM → Multi-Node

```
Phase 1: Docker Compose           Phase 2: K3s (1 node)
┌──────────────────────┐          ┌──────────────────────┐
│ Single VM            │    →     │ Single VM + K3s      │
│ docker compose up    │          │ kubectl apply -k     │
│ ~25 containers       │          │ ~25 pods             │
└──────────────────────┘          └──────────────────────┘
                                             │
                                             ▼
Phase 3: K3s (3 nodes)            Phase 4: Managed K8s
┌──────────────────────┐          ┌──────────────────────┐
│ 1 server + 2 agents  │    →     │ AKS / EKS / GKE      │
│ Same manifests!      │          │ Same manifests!      │
│ Workload failover    │          │ Auto-scaling nodes   │
└──────────────────────┘          └──────────────────────┘
```

With a single server node, Phase 3 gives workload failover across agents; the control plane itself is not highly available until more server nodes are added.

**Adding a worker node to K3s is one command:**

```bash
# On the server: read the join token
sudo cat /var/lib/rancher/k3s/server/node-token

# On the worker VM:
curl -sfL https://get.k3s.io | K3S_URL=https://<server-ip>:6443 K3S_TOKEN=<token> sh -
```

---

## 8. Recommended Directory Structure

```
~/code/mygh/
├── docker-compose.ecosystem.yml     # Phase 1: all-in-one compose
├── .env.ecosystem                   # Shared env vars
├── k8s/                             # Phase 2: K3s manifests
│   ├── kustomization.yaml           # Kustomize root
│   ├── infra/                       # Cosmos emulator, Azurite, Mailpit, Loki, Grafana
│   ├── platform/                    # platform-service, extraction, mcp
│   ├── products/                    # 10 product backends
│   ├── web/                         # 10+ Next.js dashboards
│   ├── config/                      # ConfigMaps
│   └── secrets/                     # Secrets (gitignored)
├── helm/                            # Helm chart (Exercise 5)
│   └── bytelyst-ecosystem/
│       ├── Chart.yaml
│       ├── values.yaml
│       └── templates/
└── scripts/
    ├── ecosystem-up.sh              # docker compose -f docker-compose.ecosystem.yml up -d
    ├── ecosystem-k3s-deploy.sh      # kubectl apply -k k8s/
    └── ecosystem-build-all.sh       # Build all Docker images
```

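`kubectl apply -k k8s/` only works if `k8s/kustomization.yaml` exists and lists the manifests to apply. The sketch below writes a minimal one. It is an illustration under assumptions: `write_kustomization` is a hypothetical helper, `namespaces.yaml` is a hypothetical file holding the four `Namespace` objects, and the other two paths are the example manifests shown earlier in this guide. Kustomize fails if a listed file is missing, so extend the list as manifests are added.

```bash
# Write a minimal kustomization.yaml into the given directory.
# usage: write_kustomization <k8s-dir>
write_kustomization() {
  dir=$1
  mkdir -p "$dir"
  cat > "$dir/kustomization.yaml" <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespaces.yaml                  # hypothetical: the four bytelyst-* namespaces
  - products/lysnrai-backend.yaml
  - ingress.yaml
EOF
}

# Example:
#   write_kustomization k8s
#   kubectl apply -k k8s/
```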
---

## 9. Quick Start Commands

```bash
# ── Phase 1: Docker Compose ───────────────────────────
cd ~/code/mygh

# Build all images (first time, ~15-20 min)
docker compose -f docker-compose.ecosystem.yml build

# Start everything
docker compose -f docker-compose.ecosystem.yml up -d

# Check status
docker compose -f docker-compose.ecosystem.yml ps

# View logs
docker compose -f docker-compose.ecosystem.yml logs -f platform-service

# Tear down
docker compose -f docker-compose.ecosystem.yml down

# ── Phase 2: K3s ──────────────────────────────────────
# Build + load images into K3s containerd
# (same build context and Dockerfile path as in the compose file)
docker build -t bytelyst/platform-service:latest \
  -f learning_ai_common_plat/services/platform-service/Dockerfile \
  ./learning_ai_common_plat
# Pipe rather than <(...): sudo does not pass the process-substitution fd through
docker save bytelyst/platform-service:latest | sudo k3s ctr images import -

# Deploy all
kubectl apply -k k8s/

# Check pods
kubectl get pods -A

# Port-forward for local access
kubectl port-forward svc/platform-service 4003:4003 -n bytelyst-platform
```

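The build-and-import pair has to be repeated for every image, which is what `scripts/ecosystem-build-all.sh` is meant to cover. One possible shape for that loop is sketched below. `k3s_load_image` is a hypothetical helper, and the `DRY_RUN=1` switch (also an assumption) prints the commands instead of running them, which is useful for checking contexts and Dockerfile paths before a long build.

```bash
# Build one image with Docker and import it into K3s's containerd.
# usage: k3s_load_image <image-tag> <dockerfile> <build-context>
# Set DRY_RUN=1 to print the commands without executing them.
k3s_load_image() {
  tag=$1
  dockerfile=$2
  context=$3
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "docker build -t $tag -f $dockerfile $context"
    echo "docker save $tag | sudo k3s ctr images import -"
    return 0
  fi
  docker build -t "$tag" -f "$dockerfile" "$context" &&
    docker save "$tag" | sudo k3s ctr images import -
}

# Example, using the three common-plat services from the compose file:
#   for svc in platform-service extraction-service mcp-server; do
#     k3s_load_image "bytelyst/$svc:latest" \
#       "learning_ai_common_plat/services/$svc/Dockerfile" \
#       ./learning_ai_common_plat
#   done
```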
---

## 10. What's NOT Dockerized Yet (gaps)

| Repo            | Backend Dockerfile | Web Dockerfile      | `docker-prep.sh` | `output:'standalone'` | Status                                                         |
| --------------- | ------------------ | ------------------- | ---------------- | --------------------- | -------------------------------------------------------------- |
| **LysnrAI**     | ❌                 | ✅ user-dashboard   | ❌               | ✅ (conditional)      | Need backend Dockerfile + docker-prep.sh                       |
| **MindLyst**    | ❌                 | ❌                  | ❌               | ❌                    | Need all 4                                                     |
| **ChronoMind**  | ❌                 | ❌                  | ❌               | ❌                    | Need all 4                                                     |
| **JarvisJr**    | ❌                 | ❌                  | ❌               | ❌                    | Need all 4                                                     |
| **PeakPulse**   | ❌                 | ❌                  | ❌               | ❌                    | Need all 4                                                     |
| **FlowMonk**    | ❌                 | ❌                  | ❌               | ❌                    | Need all 4                                                     |
| **NomGap**      | ✅ ⚠️              | ✅                  | ✅               | ✅                    | Backend Dockerfile ignores `file:` deps; see §12.F3            |
| **NoteLett**    | ✅ ⚠️              | ✅                  | ✅               | ✅                    | Backend Dockerfile `COPY .` pulls broken symlinks; see §12.F4  |
| **ActionTrail** | ✅                 | ✅                  | ✅               | ✅                    | Ready (uses `.tarballs/` pattern)                              |
| **LocalMemGPT** | ✅                 | ✅                  | ✅               | ✅                    | Ready (repo-root build context)                                |
| **admin-web**   | N/A                | ✅ (in common-plat) | N/A (pnpm)       | ✅ (conditional)      | Ready                                                          |
| **tracker-web** | N/A                | ✅ (in common-plat) | N/A (pnpm)       | ✅ (conditional)      | Ready                                                          |

**6 repos need backend Dockerfiles** and `docker-prep.sh`; all of them except LysnrAI also need a web Dockerfile and `output:'standalone'`. 2 existing Dockerfiles have issues.

---

## 11. Dockerfile Template (for missing repos)

> **Critical:** These templates assume you run `docker-prep.sh` first to pack `@bytelyst/*` file: deps into `.tarballs/`. Without this, `npm ci` will fail because `file:../../learning_ai_common_plat/packages/*` doesn't exist inside the Docker build context.

### Backend (Fastify 5 + TypeScript)

```dockerfile
# Pre-requisite: run ./scripts/docker-prep.sh to pack @bytelyst/* tarballs
FROM node:22-alpine AS builder
WORKDIR /app

COPY package.json package-lock.json ./
COPY .tarballs/ ./.tarballs/
RUN npm ci --ignore-scripts

COPY tsconfig.json ./
COPY src/ ./src/
RUN npx tsc

# Production stage
FROM node:22-alpine
WORKDIR /app
ENV NODE_ENV=production

COPY package.json package-lock.json ./
COPY .tarballs/ ./.tarballs/
RUN npm ci --omit=dev --ignore-scripts

COPY --from=builder /app/dist ./dist
# If the backend reads shared/product.json at runtime, uncomment the COPY below.
# COPY is not a shell command, so `2>/dev/null || true` cannot make it optional;
# shared/ must exist inside the build context.
# COPY shared/ ./shared/

# EXPOSE is documentation only; the server listens on the PORT env var set in compose.
EXPOSE 4010
CMD ["node", "dist/server.js"]
```

### Web (Next.js 16)

> **Prerequisite:** `next.config.ts` MUST have `output: 'standalone'` for the standalone Dockerfile pattern to work. Without it, `.next/standalone/` won't be generated and the COPY will fail.

```dockerfile
# Pre-requisite: run ./scripts/docker-prep.sh to pack @bytelyst/* tarballs
FROM node:22-alpine AS builder
WORKDIR /app

COPY package.json package-lock.json ./
COPY .tarballs/ ./.tarballs/
RUN npm ci

COPY . .

# Dummy env vars for Next.js build-time static page collection
ENV NEXT_PUBLIC_BACKEND_URL=http://localhost:4010
ENV NEXT_PUBLIC_PLATFORM_SERVICE_URL=http://localhost:4003

RUN npm run build

FROM node:22-alpine
WORKDIR /app
ENV NODE_ENV=production

COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
# COPY cannot be made optional with shell syntax, so public/ must exist
# (add an empty public/.gitkeep if the app has no static assets).
COPY --from=builder /app/public ./public

# The standalone server reads PORT and HOSTNAME; bind to all interfaces.
ENV PORT=3000 HOSTNAME=0.0.0.0
EXPOSE 3000
CMD ["node", "server.js"]
```

### docker-prep.sh (for repos that don't have one yet)

Copy from `learning_ai_trails/scripts/docker-prep.sh`. It handles both `backend/` and `web/` targets, packs all `file:` refs into `.tarballs/`, and rewrites `package.json` to point at them.

```bash
cp learning_ai_trails/scripts/docker-prep.sh <target-repo>/scripts/docker-prep.sh
chmod +x <target-repo>/scripts/docker-prep.sh
```

---

## 12. Audit Findings (Review 2026-03-22)

Systematic code review of all claims in this document against the actual codebase.

### F1. Port Conflicts (CRITICAL)

**Grafana** uses port 3000. The following webs also default to 3000:

- admin-web (no port in package.json)
- ChronoMind web (no port override)
- JarvisJr web (no port override)
- FlowMonk web (no port override)
- NoteLett web (Dockerfile EXPOSE 3000)
- ActionTrail web (Dockerfile EXPOSE 3000)

**Fix:** Set the `PORT` env var in compose for each, or use host:container port remapping. Only the host-side port has to be unique; several containers can listen on 3000 internally.

### F2. `file:` Dependencies Break Docker Builds (CRITICAL)

**Every** product backend and web has `file:../../learning_ai_common_plat/packages/*` dependencies in package.json. These resolve locally via symlinks but **fail inside Docker** because the sibling repo isn't in the build context.

**Pattern:** Each repo needs a `docker-prep.sh` that:

1. Runs `pnpm build` in common-plat
2. Packs each `@bytelyst/*` package into a `.tarballs/*.tgz`
3. Rewrites package.json `file:` refs → `file:.tarballs/bytelyst-*.tgz`

- **Repos with `docker-prep.sh`:** ActionTrail ✅, LocalMemGPT ✅, NoteLett ✅, NomGap ✅
- **Repos missing `docker-prep.sh`:** LysnrAI, MindLyst, ChronoMind, JarvisJr, PeakPulse, FlowMonk

### F3. NomGap Backend Dockerfile Ignores `file:` Deps (BUG)
|
||||
|
||||
`@/learning_ai_fastgap/backend/Dockerfile` copies `package.json` and runs `npm ci`, but never copies `.tarballs/` into the image. The rewritten `file:` refs therefore cannot resolve and the install fails. Add a `COPY` step for `.tarballs/` before `npm ci`.
|
||||
|
||||
### F4. NoteLett Backend Dockerfile Copies Everything (BUG)
|
||||
|
||||
`@/learning_ai_notes/backend/Dockerfile` does `COPY . .` in the build stage, which includes broken `node_modules` symlinks from `file:` deps. Should use explicit `COPY` of `src/`, `tsconfig.json`, and `.tarballs/` instead.
|
||||
|
||||
### F5. Missing `output: 'standalone'` in next.config.ts (CRITICAL)
|
||||
|
||||
The Dockerfile template copies from `.next/standalone/` — this directory only exists when `output: 'standalone'` is set in `next.config.ts`.
|
||||
|
||||
| Web | Has `output: 'standalone'`? | Notes |
|
||||
| -------------- | --------------------------- | ----------------------------------------------------------------- |
|
||||
| NomGap | ✅ | Set directly |
|
||||
| NoteLett | ✅ | Set directly |
|
||||
| ActionTrail | ✅ | Set directly |
|
||||
| LocalMemGPT | ✅ | Set directly |
|
||||
| admin-web | ✅ | Conditional: `process.env.VERCEL ? {} : { output: 'standalone' }` |
|
||||
| tracker-web | ✅ | Conditional (same) |
|
||||
| user-dashboard | ✅ | Conditional (same) |
|
||||
| ChronoMind | ❌ | **Must add** |
|
||||
| JarvisJr | ❌ | **Must add** |
|
||||
| FlowMonk | ❌ | **Must add** |
|
||||
| MindLyst       | ❓                          | Not yet verified; check `next.config.ts`                          |
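The table can be re-verified with a simple text search. The sketch below is an assumption-laden convenience, not an existing script: `has_standalone` is a hypothetical name, and because it only greps for the literal setting it will also match a commented-out line, so treat a hit as a hint rather than proof.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeeds if a Next.js config in the given web directory
# mentions output: 'standalone' (either quote style, direct or conditional).
has_standalone() {
  local web_dir="$1" cfg
  for cfg in "$web_dir"/next.config.ts "$web_dir"/next.config.mjs "$web_dir"/next.config.js; do
    if [ -f "$cfg" ] && grep -Eq "output:[[:space:]]*['\"]standalone['\"]" "$cfg"; then
      return 0
    fi
  done
  return 1
}
```

Run it per web directory, for example `has_standalone ./web || echo "web: standalone output not set"`.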
|
||||
|
||||
### F6. Build Context Mismatch for ActionTrail + LocalMemGPT
|
||||
|
||||
Their Dockerfiles expect repo-root as build context (they `COPY backend/...` and `COPY shared/...`). The compose `build:` must use `context: ./repo-name` + `dockerfile: backend/Dockerfile`, not `build: ./repo-name/backend`.
|
||||
|
||||
**Already correct in the compose above.** Calling it out so future editors don't "simplify" it.
|
||||
|
||||
### F7. Node.js Version Inconsistency
|
||||
|
||||
Existing Dockerfiles use mixed Node versions:
|
||||
|
||||
- NomGap, NoteLett: `node:20-alpine`
|
||||
- ActionTrail, LocalMemGPT: `node:22-alpine` / `node:22-slim`
|
||||
|
||||
**Recommendation:** Standardize on `node:22-alpine` for all new Dockerfiles. Existing ones work but should be updated for consistency.
|
||||
|
||||
### F8. Missing `--webpack` Flag for Next.js Builds
|
||||
|
||||
Several web apps must build with the `--webpack` flag, either because Serwist PWA is incompatible with Turbopack or because the `@bytelyst/*` `file:` refs need transpilation. The Dockerfile template runs `npm run build`, so each repo's `build` script in package.json must resolve to `next build --webpack`. Verify this per repo.
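A quick way to do that verification is to grep each `package.json` for the flag inside the `build` script. This is a minimal sketch; `build_uses_webpack` is a hypothetical name, and the pattern assumes the `build` script is written on a single line.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeeds if the "build" script in the given package.json
# contains the --webpack flag.
build_uses_webpack() {
  local package_json="$1"
  grep -Eq '"build"[[:space:]]*:[[:space:]]*"[^"]*--webpack' "$package_json"
}
```

For instance, `build_uses_webpack web/package.json || echo "build script lacks --webpack"`.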
|
||||
|
||||
### F9. Missing `.env.ecosystem` Template
|
||||
|
||||
The compose references `.env.ecosystem` but the doc doesn't define its contents. Key vars needed:
|
||||
|
||||
```env
|
||||
# .env.ecosystem — shared env for all services
|
||||
COSMOS_ENDPOINT=https://cosmos-emulator:8081
|
||||
COSMOS_KEY=<emulator-key>
|
||||
COSMOS_DATABASE=bytelyst
|
||||
JWT_SECRET=dev-ecosystem-secret-change-me
|
||||
AZURE_BLOB_CONNECTION_STRING=DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=...;BlobEndpoint=http://azurite:10000/devstoreaccount1;
|
||||
PLATFORM_SERVICE_URL=http://platform-service:4003
|
||||
EXTRACTION_SERVICE_URL=http://extraction-service:4005
|
||||
DB_PROVIDER=memory
|
||||
NODE_ENV=production
|
||||
CORS_ORIGIN=*
|
||||
SMTP_HOST=mailpit
|
||||
SMTP_PORT=1025
|
||||
```
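Once the template exists, a pre-flight check can confirm that no required key was left out or empty before starting the stack. The helper below is a sketch under assumptions: `missing_env_keys` is a hypothetical name, and which keys count as required is up to each service.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints each listed key that is absent from, or empty in,
# the given env file. Prints nothing when every key has a value.
missing_env_keys() {
  local env_file="$1" key
  shift
  for key in "$@"; do
    grep -Eq "^${key}=.+" "$env_file" || echo "$key"
  done
}
```

For example, `missing_env_keys .env.ecosystem COSMOS_ENDPOINT COSMOS_KEY JWT_SECRET` lists whichever of those three still need a value.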
|
||||
|
||||
### F10. `host.docker.internal` Only Works on Docker Desktop (Mac/Windows)
|
||||
|
||||
LocalMemGPT uses `OLLAMA_URL: 'http://host.docker.internal:11434'` — this works on Docker Desktop but **not on Linux VMs** (which is the likely deployment target).
|
||||
|
||||
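**Fix on Linux:** Add `extra_hosts: ['host.docker.internal:host-gateway']` to the service (requires Docker Engine 20.10 or newer), or use `network_mode: host` and point `OLLAMA_URL` at `http://localhost:11434`, since `host.docker.internal` still does not resolve in host mode.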
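To keep the main compose file unchanged for Docker Desktop users, the `extra_hosts` entry can live in a Linux-only override file. The sketch below generates one; the function name, the output filename, and the service name in the usage line are hypothetical and should be replaced with the real ones from the compose file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: writes a compose override that maps
# host.docker.internal to the host gateway for one service.
write_linux_override() {
  local service="$1" out="$2"
  cat > "$out" <<EOF
services:
  ${service}:
    extra_hosts:
      - "host.docker.internal:host-gateway"
EOF
}
```

After `write_linux_override localmemgpt-backend docker-compose.linux.yml`, the override is merged by passing both files: `docker compose -f docker-compose.yml -f docker-compose.linux.yml up -d`. Ollama on the host must also listen on an address reachable from the Docker bridge, not only on `127.0.0.1`.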
|
||||
|
||||
### Summary of Required Work Before Compose Works
|
||||
|
||||
| Priority | Item | Count |
|
||||
| -------- | -------------------------------------------------------- | ------------- |
|
||||
| **P0** | Create missing `docker-prep.sh` | 6 repos |
|
||||
| **P0** | Create missing backend Dockerfiles | 6 repos |
|
||||
| **P0** | Create missing web Dockerfiles | 5 repos |
|
||||
| **P0**   | Add `output: 'standalone'` to next.config.ts             | 3 webs (plus MindLyst if unset) |
|
||||
| **P1** | Fix NomGap backend Dockerfile (add `.tarballs/` COPY) | 1 file |
|
||||
| **P1** | Fix NoteLett backend Dockerfile (explicit COPY, not `.`) | 1 file |
|
||||
| **P1** | Create `.env.ecosystem` template | 1 file |
|
||||
| **P2** | Standardize Node.js version to 22-alpine | 4 Dockerfiles |
|
||||
| **P2** | Add `extra_hosts` for Linux VM Ollama access | 1 service |
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
| Question | Answer |
|
||||
| ------------------------------ | -------------------------------------------------------------------------------------------------------------- |
|
||||
| **Can deploy on single VM?** | **Yes.** All ~25 services fit in 32 GB RAM. |
|
||||
| **All Dockerized?** | 4/10 product repos fully Dockerized. 6 need Dockerfiles (copy-paste template). |
|
||||
| **K8s practice on single VM?** | **K3s** — certified K8s, single binary, same manifests scale to multi-node or AKS/EKS/GKE. |
|
||||
| **Recommended VM?** | 8 vCPU / 32 GB (min) or 16 vCPU / 64 GB (with Ollama). Hetzner ~$45/mo for dev. |
|
||||
| **Path to production K8s?**    | Phase 1 (compose) → Phase 2 (K3s single) → Phase 3 (K3s multi) → Phase 4 (managed). Same manifests throughout. |
|
||||