ByteLyst Single-VM Kubernetes Deployment (k3s)

Deploy the entire ByteLyst ecosystem (30 services, 10 products) on Kubernetes using k3s, a lightweight, CNCF-certified Kubernetes distribution. The setup is production-grade and sized for about 50 beta users on a single Azure VM.

Quick Start

# Step 1: Run Docker setup phases 1-5 (system deps, Gitea, repos, packages)
cd /opt/bytelyst/learning_ai_common_plat/docs/devops/single_azure_vm
sudo ./docker/setup.sh --resume      # Runs phases 1-5 (skip 6-8)

# Step 2: Deploy to Kubernetes
sudo ./k8s/setup-k8s.sh              # 6 phases: preflight → k3s → images → config → deploy → health

# Step 3: Verify
/opt/bytelyst/check-health-k8s.sh    # 32 health checks
kubectl get pods -A                   # All pods

Prerequisites

Azure VM: Ubuntu 24.04 LTS, Standard_D8s_v5 (8 vCPU, 32 GB RAM, 128 GB disk)
Docker setup phases 1-5 completed (system deps, Gitea, repos, packages built + published)
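
The hardware prerequisites above could be verified before running anything else. The following is a minimal sketch, not the logic used by setup-k8s.sh: the helper names and the thresholds in the usage note are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; names and thresholds are illustrative
# assumptions, not values taken from setup-k8s.sh.

# Succeeds if $1 (actual) >= $2 (required); prints a one-line verdict.
require_at_least() {
  actual=$1; required=$2; label=$3
  if [ "$actual" -ge "$required" ]; then
    echo "OK   $label: ${actual} >= ${required}"
  else
    echo "FAIL $label: ${actual} < ${required}"
    return 1
  fi
}

# Total RAM in GiB (rounded down), read from /proc/meminfo (Linux only).
total_ram_gib() {
  awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo
}

# Free disk space in GiB on the filesystem that holds path $1.
free_disk_gib() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}
```

A possible invocation is `require_at_least "$(total_ram_gib)" 28 "RAM (GiB)" && require_at_least "$(free_disk_gib /)" 20 "free disk (GiB)"`. The RAM threshold sits below 32 because a 32 GB VM reports somewhat less than 32 GiB as MemTotal.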

Why k3s?

Feature	k3s	minikube	kind	microk8s
RAM overhead	~512 MB	~2 GB	~1 GB	~800 MB
Production-grade	Yes (CNCF certified)	No	No	Yes
Built-in Traefik	Yes	No	No	Optional
Single binary	Yes	No	No	No (snap)
SQLite backend	Yes (no etcd needed)	N/A	N/A	Dqlite

Architecture

Ubuntu 24.04 VM
├── k3s (single-node cluster)
│   ├── kube-system namespace
│   │   ├── CoreDNS
│   │   ├── Traefik Ingress Controller
│   │   ├── Local Path Provisioner
│   │   └── Metrics Server
│   │
│   ├── bytelyst-infra namespace
│   │   ├── cosmos-emulator (StatefulSet + PVC)
│   │   ├── azurite (StatefulSet + PVC)
│   │   ├── mailpit (Deployment)
│   │   ├── loki (StatefulSet + PVC)
│   │   └── grafana (Deployment + PVC)
│   │
│   ├── bytelyst-platform namespace
│   │   ├── platform-service (Deployment, replicas: 1)
│   │   ├── extraction-service (Deployment, replicas: 1)
│   │   └── mcp-server (Deployment, replicas: 1)
│   │
│   ├── bytelyst-dashboards namespace
│   │   ├── admin-web (Deployment, replicas: 1)
│   │   └── tracker-web (Deployment, replicas: 1)
│   │
│   └── bytelyst-products namespace
│       ├── *-backend (10 Deployments)
│       └── *-web (9 Deployments)
│
├── Ollama (systemd, host network — :11434)
└── Gitea (Docker container — :3300, used for build-time only)

File Structure

k8s/
├── README.md                    # This file
├── setup-k8s.sh                 # Bootstrap script (6 phases)
├── namespaces.yaml              # 4 namespaces
├── config/
│   ├── configmap.yaml           # Shared env vars (replaces .env.ecosystem)
│   └── secrets.yaml             # JWT_SECRET template (generated at deploy)
├── infra/
│   ├── cosmos-emulator.yaml     # StatefulSet + Service + PVC + NodePort
│   ├── azurite.yaml             # StatefulSet + Service + PVC + NodePort
│   ├── mailpit.yaml             # Deployment + Service + NodePort
│   ├── loki.yaml                # StatefulSet + Service + PVC + NodePort
│   ├── grafana.yaml             # Deployment + Service + PVC + NodePort
│   └── ollama-external.yaml     # Service + Endpoints → host Ollama
├── platform/
│   ├── platform-service.yaml    # Deployment + Service + NodePort (:4003)
│   ├── extraction-service.yaml  # Deployment + Service + NodePort (:4005)
│   └── mcp-server.yaml          # Deployment + Service + NodePort (:4007)
├── dashboards/
│   ├── admin-web.yaml           # Deployment + Service + NodePort (:3001)
│   └── tracker-web.yaml         # Deployment + Service + NodePort (:3003)
└── products/
    ├── backends.yaml            # 10 backend Deployments + Services + NodePorts
    └── webs.yaml                # 9 web Deployments + Services + NodePorts

Setup Phases

Phase	Duration	What happens
1. Pre-flight	~10s	Verify Docker phases 1-5 completed, check disk/RAM
2. Install k3s	~2 min	k3s with Docker runtime, NodePort range 1024-32767
3. Build images	~15 min	Docker compose build + tag as `bytelyst/<service>:latest`
4. Generate config	~30s	Namespaces, ConfigMap (3 copies), Secrets (JWT), Ollama endpoint
5. Deploy	~5 min	Apply manifests: infra → platform → dashboards → products
6. Health check	~1 min	32 endpoint checks + kubectl pod status
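
The ordering in phase 5 matters because later groups depend on earlier ones (products call platform, everything calls infra). A rough sketch of that ordering, assuming the directory layout from File Structure; `apply_in_order` is a hypothetical name and this is not the code in setup-k8s.sh:

```shell
# Apply manifest directories in dependency order: infra first, products last.
# apply_in_order is an illustrative helper; $1 is the path to the k8s/ folder.
apply_in_order() {
  k8s_dir=${1:-k8s}
  for group in infra platform dashboards products; do
    echo "Applying $group manifests"
    # Stop at the first failure so later groups never start on a broken base.
    kubectl apply -f "$k8s_dir/$group/" || return 1
  done
}
```

Calling `apply_in_order k8s` from the directory that contains `k8s/` would apply the four groups one after another.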

Key Design Decisions

k3s with Docker Runtime

k3s is installed with the --docker flag, so it reuses the existing Docker daemon and its images. No containerd import step is needed, and the images built for Docker Compose run unchanged.

4-Namespace Isolation

bytelyst-infra — Cosmos emulator, Azurite, Mailpit, Loki, Grafana
bytelyst-platform — platform-service, extraction-service, mcp-server
bytelyst-dashboards — admin-web, tracker-web
bytelyst-products — 10 backends + 9 web apps

The setup script copies the ConfigMap and Secrets into each of the 3 app namespaces (bytelyst-platform, bytelyst-dashboards, bytelyst-products).

Cross-Namespace DNS

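K8s DNS: <service>.<namespace>.svc.cluster.local (the examples below use the shorter <service>.<namespace>.svc form, which resolves from inside pods via the cluster search domains)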

Backends reach Cosmos: cosmos-emulator.bytelyst-infra.svc:8081
Webs reach backends: flowmonk-backend.bytelyst-products.svc:4017
Everything reaches platform: platform-service.bytelyst-platform.svc:4003
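
The naming pattern is mechanical, so the addresses above can be derived from three inputs. A small sketch, where `svc_host` is a hypothetical helper name:

```shell
# Build the fully qualified in-cluster address for a Service.
# svc_host is an illustrative helper, not part of the setup script.
svc_host() {
  service=$1; namespace=$2; port=$3
  echo "${service}.${namespace}.svc.cluster.local:${port}"
}

svc_host cosmos-emulator bytelyst-infra 8081
svc_host flowmonk-backend bytelyst-products 4017
svc_host platform-service bytelyst-platform 4003
```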

Ollama as External Service

Ollama stays on the host (systemd). A headless Service + Endpoints in bytelyst-infra points to the node's internal IP. Pods reach it as ollama.bytelyst-infra.svc:11434. The setup script auto-detects the node IP.
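
To make the Service + Endpoints pairing concrete, here is a sketch that renders such a manifest for a given node IP. It is an approximation written for this explanation; the real manifest is infra/ollama-external.yaml, and `render_ollama_external` and the port name `http` are assumptions.

```shell
# Print a selector-less headless Service plus a matching Endpoints object
# that forwards to Ollama on the host. render_ollama_external is a
# hypothetical helper; field values other than names and ports mentioned in
# this README are illustrative.
render_ollama_external() {
  node_ip=$1
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: bytelyst-infra
spec:
  clusterIP: None          # headless: DNS resolves straight to the endpoint IP
  ports:
    - name: http
      port: 11434
      targetPort: 11434
---
apiVersion: v1
kind: Endpoints
metadata:
  name: ollama             # must match the Service name
  namespace: bytelyst-infra
subsets:
  - addresses:
      - ip: ${node_ip}
    ports:
      - name: http
        port: 11434
EOF
}
```

One way to use it would be `render_ollama_external "$(hostname -I | awk '{print $1}')" | kubectl apply -f -`. Taking the first address from `hostname -I` is only one possible way to find the node IP and may differ from what the setup script does.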

NodePort for External Access

All services use the same ports as Docker Compose (e.g., :4003, :3002, :3030). Because the default Kubernetes NodePort range (30000-32767) would reject these ports, k3s is configured with --kube-apiserver-arg=service-node-port-range=1024-32767.
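
A quick way to see which ports the widened range admits is a plain bounds check. This is a sketch: `K3S_ARGS` and `in_nodeport_range` are illustrative names, and only the flags themselves come from this README.

```shell
# Flags described above, collected in one place (variable name is illustrative).
K3S_ARGS="--docker --kube-apiserver-arg=service-node-port-range=1024-32767"

# Returns 0 if port $1 falls inside the configured NodePort range.
in_nodeport_range() {
  [ "$1" -ge 1024 ] && [ "$1" -le 32767 ]
}

for port in 3001 4003 3030 80; do
  if in_nodeport_range "$port"; then
    echo "$port: allowed"
  else
    echo "$port: outside 1024-32767"
  fi
done

# One common way to hand the flags to the upstream installer (not run here):
#   curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="$K3S_ARGS" sh -
```

Privileged ports such as 80 stay outside the range, which is why the check reports them separately.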

Resource Limits (tuned for 32 GB VM, 50 beta users)

Service type	CPU request	CPU limit	Memory request	Memory limit
Backend (×10)	100m	500m	256Mi	512Mi
Web app (×9)	100m	500m	256Mi	512Mi
Platform (×3)	200m	1000m	384Mi	768Mi
Cosmos emulator	500m	2000m	2Gi	3Gi
Grafana	100m	500m	128Mi	256Mi
Mailpit / Loki	50-100m	500m	64-128Mi	512Mi
k3s overhead	—	—	—	~512Mi
Ollama (host)	—	—	—	~3Gi
Total			~10 Gi	~19 Gi

Even if every service reached its memory limit at once, the total fits in 32 GB with ~13 GB of headroom.
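
The memory-limit total can be reproduced with a few lines of arithmetic. This sketch makes three assumptions that the table does not state: Mailpit and Loki are each counted at 512Mi, Azurite is left out because it has no row, and 32 GB is treated as 32768 Mi.

```shell
# Sum the memory limits from the table above, in Mi.
backends=$((10 * 512))       # 10 backends at 512Mi
webs=$((9 * 512))            # 9 web apps at 512Mi
platform=$((3 * 768))        # 3 platform services at 768Mi
cosmos=3072                  # Cosmos emulator, 3Gi
grafana=256
mailpit_loki=$((2 * 512))    # assumption: 512Mi each
k3s=512                      # approximate k3s overhead
ollama=3072                  # approximate host-side Ollama usage

total_mi=$((backends + webs + platform + cosmos + grafana + mailpit_loki + k3s + ollama))
headroom_mi=$((32 * 1024 - total_mi))

echo "limits:   ${total_mi} Mi"
echo "headroom: ${headroom_mi} Mi"
```

Under those assumptions the script prints 19968 Mi for the limits (about 19.5 Gi) and 12800 Mi of headroom (about 12.5 Gi), in line with the rounded figures in the table.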

Readiness + Liveness Probes

Every service gets both:

Readiness: GET /health every 10s (traffic only when ready)
Liveness: GET /health every 30s (auto-restart on failure)
Backends: initialDelaySeconds: 15, Web apps: initialDelaySeconds: 15
Cosmos emulator: initialDelaySeconds: 60 (slow startup)
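
The probe timing can be imitated from the command line when debugging a slow service. The sketch below waits an initial delay, then polls at a fixed period, roughly as a readiness probe would; `wait_until_ready` and its argument order are hypothetical, and the kubelet's actual probe logic has more knobs (failure thresholds, timeouts) than shown here.

```shell
# Poll a health URL: wait an initial delay, then retry at a fixed period
# until the endpoint answers successfully or the attempts run out.
# Arguments: url [initial_delay_s] [period_s] [attempts]
wait_until_ready() {
  url=$1; initial_delay=${2:-15}; period=${3:-10}; attempts=${4:-6}
  sleep "$initial_delay"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors (4xx/5xx).
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "ready after $i attempt(s): $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$period"
  done
  echo "not ready after $attempts attempts: $url"
  return 1
}
```

For example, `wait_until_ready http://localhost:4003/health` uses the backend defaults, while `wait_until_ready <url> 60` would mirror the longer Cosmos emulator delay.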

Operations Cheat Sheet

# ── Cluster status ─────────────────────────────────
kubectl get nodes                              # Node health
kubectl get pods -A                            # All pods
kubectl top pods -A                            # Resource usage (CPU/memory)

# ── Deploy / update ────────────────────────────────
kubectl apply -f k8s/products/                 # Re-apply product manifests
kubectl rollout restart deploy/flowmonk-backend -n bytelyst-products  # Rolling restart

# ── Scaling (for load testing) ─────────────────────
kubectl scale deploy/platform-service --replicas=2 -n bytelyst-platform
kubectl autoscale deploy/flowmonk-backend --min=1 --max=3 --cpu-percent=70 -n bytelyst-products

# ── Debugging ──────────────────────────────────────
kubectl logs deploy/platform-service -n bytelyst-platform -f        # Stream logs
kubectl describe pod <name> -n bytelyst-platform                    # Pod events
kubectl exec -it deploy/platform-service -n bytelyst-platform -- sh # Shell into pod

# ── Teardown ───────────────────────────────────────
sudo ./k8s/setup-k8s.sh --teardown             # Delete all namespaces (keep k3s)
/usr/local/bin/k3s-uninstall.sh                # Uninstall k3s completely

Port Map (same as Docker Compose)

Service	Port	Health check
Gitea (npm)	3300	`http://localhost:3300/api/v1/version`
Ollama (LLM)	11434	`http://localhost:11434/api/version`
Cosmos Explorer	1234	`http://localhost:1234`
Azurite (Blob)	10000	`http://localhost:10000/devstoreaccount1?comp=list`
Mailpit UI	8025	`http://localhost:8025`
Loki	3100	`http://localhost:3100/ready`
Grafana	3000	`http://localhost:3000/api/health`
platform-service	4003	`/health`
extraction-service	4005	`/health`
mcp-server	4007	`/health`
admin-web	3001	`/`
tracker-web	3003	`/`
Backends	4010-4019	`/health`
Web apps	3002, 3030, 3035, 3040, 3045, 3050, 3055, 3060, 3070	`/`
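
The rows with a fixed URL or a single port lend themselves to a table-driven check. The following is a sketch only and is not the check-health-k8s.sh script: `check_all` is a hypothetical helper, and it covers seven of the endpoints rather than all 32 checks.

```shell
# Check a handful of endpoints from the port map and report a pass/fail count.
check_all() {
  pass=0; fail=0
  # Each input line is "<name> <url>"; the here-document keeps the loop in the
  # current shell so the counters survive after it ends.
  while read -r name url; do
    [ -n "$name" ] || continue
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      pass=$((pass + 1)); echo "OK   $name"
    else
      fail=$((fail + 1)); echo "FAIL $name ($url)"
    fi
  done <<EOF
gitea http://localhost:3300/api/v1/version
ollama http://localhost:11434/api/version
loki http://localhost:3100/ready
grafana http://localhost:3000/api/health
platform-service http://localhost:4003/health
extraction-service http://localhost:4005/health
mcp-server http://localhost:4007/health
EOF
  echo "passed=$pass failed=$fail"
  [ "$fail" -eq 0 ]
}
```

Running `check_all` on the VM exits non-zero if any endpoint fails, which makes it usable in scripts as well as interactively.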

Switching Between Docker Compose and K8s

Both approaches can be installed on the same VM, but they bind the same ports, so stop one before starting the other:

# Docker → K8s
cd /opt/bytelyst/learning_ai_common_plat
docker compose -f docker-compose.ecosystem.yml down   # Stop compose stack
sudo ./docs/devops/single_azure_vm/k8s/setup-k8s.sh   # Deploy to k3s

# K8s → Docker
cd /opt/bytelyst/learning_ai_common_plat/docs/devops/single_azure_vm/k8s
sudo ./setup-k8s.sh --teardown                        # Remove k8s resources
sudo ../docker/setup.sh --phase=7                      # Re-deploy via compose

Both share: Gitea registry (Docker container), Ollama (systemd), and built Docker images.
