learning_ai_common_plat/docs/devops/SINGLE_VM_DEPLOYMENT.md
2026-03-22 14:06:44 -07:00


ByteLyst Ecosystem — Single-VM Deployment Guide

Deploy the entire ByteLyst ecosystem on one VM, fully Dockerized, with a local Kubernetes layer (Docker Desktop or K3s) for production-readiness practice.


Package-Manager Strategy (current transition plan)

  • learning_ai_common_plat is already the canonical pnpm workspace monorepo for shared packages, services, and dashboards.
  • Node/TypeScript product repos are moving toward pnpm as the long-term standard, but that migration is still repo-by-repo and incremental.
  • During the transition, each repo's Docker/build flow must follow the repo's own:
    • packageManager field
    • lockfile
    • Dockerfile
    • docker-prep.sh behavior
  • This plan does not merge all repos into one mega-monorepo. Product repos remain independent repositories.
  • Once a repo migrates to pnpm, it must be fully aligned in the same change set:
    • no pnpm-lock.yaml with npm ci
    • no stale package-lock.json
    • no mixed package-manager assumptions in CI, Docker, or docs
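The alignment rules above can be checked mechanically. The following is a minimal sketch, assuming the lockfile and a `Dockerfile` sit in the directory being checked; `check_pm_alignment` is a hypothetical helper, not an existing script in these repos.

```shell
#!/bin/sh
# Sketch: report mixed package-manager state in one directory.
# Returns 0 when aligned, 1 when a mixed state is found.
check_pm_alignment() {
  pm_dir=$1
  pm_status=0
  # Both lockfiles present means one of them is stale.
  if [ -f "$pm_dir/pnpm-lock.yaml" ] && [ -f "$pm_dir/package-lock.json" ]; then
    echo "mixed lockfiles: pnpm-lock.yaml and package-lock.json"
    pm_status=1
  fi
  # A pnpm lockfile must not be paired with an npm install step.
  if [ -f "$pm_dir/pnpm-lock.yaml" ] && [ -f "$pm_dir/Dockerfile" ] \
     && grep -Eq '(^|[^p])npm ci' "$pm_dir/Dockerfile"; then
    echo "Dockerfile runs npm ci but the repo has pnpm-lock.yaml"
    pm_status=1
  fi
  return "$pm_status"
}
```

Running `check_pm_alignment ./backend` in CI after a migration would fail the job if either condition holds; the directory argument is an assumption about where each repo keeps its lockfile.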

Migration-impact note: The deployment architecture in this guide stays the same during the pnpm migration (Compose, K3s, ingress, namespaces, VM sizing). The main maintenance surface is Docker/build instructions and dependency-prep flow. The biggest operational risk is stale templates or stale docs after an individual repo migrates.


1. Service Inventory

Shared Infrastructure (common-plat)

| Service | Port | Image | RAM Est. |
|---|---|---|---|
| platform-service | 4003 | Fastify 5 + TS | ~200 MB |
| extraction-service | 4005 | Fastify 5 + Python sidecar | ~350 MB |
| mcp-server | 4007 | Fastify 5 + TS | ~150 MB |
| Cosmos DB Emulator | 8081, 1234 | mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:vnext-preview | ~2 GB |
| Azurite (blob) | 10000 | mcr.microsoft.com/azure-storage/azurite | ~100 MB |
| Mailpit (SMTP) | 1025, 8025 | axllent/mailpit | ~50 MB |
| Traefik (gateway) | 80, 8080 | traefik:v3.3 | ~100 MB |
| Loki (logs) | 3100 | grafana/loki | ~200 MB |
| Grafana (dashboards) | 3000 | grafana/grafana | ~200 MB |

Product Backends (Fastify 5 + TypeScript)

| Product | Port | RAM Est. |
|---|---|---|
| LysnrAI backend | 4015 | ~150 MB |
| MindLyst backend | 4014 | ~150 MB |
| ChronoMind backend | 4011 | ~150 MB |
| JarvisJr backend | 4012 | ~150 MB |
| NomGap backend | 4013 | ~150 MB |
| PeakPulse backend | 4010 | ~150 MB |
| FlowMonk backend | 4017 | ~150 MB |
| NoteLett backend | 4016 | ~150 MB |
| ActionTrail backend | 4018 | ~150 MB |
| LocalMemGPT backend | 4019 | ~150 MB |

Web Dashboards (Next.js 16)

| Dashboard | Default Port | Compose Port | RAM Est. | Notes |
|---|---|---|---|---|
| admin-web | 3000 | 3001 | ~250 MB | No port in package.json; must set PORT=3001 env |
| user-dashboard-web | 3002 | 3002 | ~250 MB | Port set in package.json |
| tracker-web | 3003 | 3003 | ~200 MB | Port set in package.json |
| NomGap web | 3040 | 3040 | ~200 MB | Port set in Dockerfile |
| ChronoMind web | 3000 | 3051 | ~200 MB | No port override; must set PORT env |
| JarvisJr web | 3000 | 3052 | ~200 MB | No port override; must set PORT env |
| FlowMonk web | 3000 | 3053 | ~200 MB | No port override; must set PORT env |
| NoteLett web | 3000 | 3054 | ~200 MB | Dockerfile EXPOSE 3000; remap in compose |
| ActionTrail web | 3000 | 3060 | ~200 MB | Dockerfile EXPOSE 3000; remap in compose |
| LocalMemGPT web | 3070 | 3070 | ~200 MB | Port set in package.json + Dockerfile |
| MindLyst web | 3050 | 3050 | ~200 MB | Port set in package.json (-p 3050) |

Port conflict warning: Grafana uses port 3000. admin-web, ChronoMind, JarvisJr, FlowMonk, NoteLett, and ActionTrail webs all default to 3000. The compose file must either set PORT env var or remap via ports: mapping.
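A quick way to spot such collisions before writing the compose file is to list every service with the host port it would claim and look for duplicates. This is only a sketch: `find_port_conflicts` is a hypothetical helper, and the sample input uses the default ports from the tables above.

```shell
#!/bin/sh
# Sketch: list host ports claimed by more than one service.
# Input: "service host_port" pairs on stdin, one per line.
find_port_conflicts() {
  awk '{ count[$2]++; names[$2] = names[$2] " " $1 }
       END { for (p in count) if (count[p] > 1) print p ":" names[p] }' | sort
}

# Default (un-remapped) ports for a few services from the tables above.
find_port_conflicts <<'EOF'
grafana 3000
admin-web 3000
chronomind-web 3000
user-dashboard-web 3002
tracker-web 3003
EOF
# prints: 3000: grafana admin-web chronomind-web
```

Feeding it the Compose Port column instead should print nothing, which confirms the remapping resolves every conflict.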

Optional / AI

| Service | Port | RAM Est. |
|---|---|---|
| Ollama (LLM) | 11434 | 4-16 GB (model-dependent) |

2. VM Sizing

Minimum (dev/staging, no Ollama)

| Spec | Value |
|---|---|
| vCPUs | 8 |
| RAM | 32 GB |
| Disk | 100 GB SSD |
| OS | Ubuntu 24.04 LTS |

Breakdown:

  • Cosmos Emulator: ~2 GB
  • 10 Fastify backends × 150 MB = ~1.5 GB
  • 3 shared services × 250 MB = ~0.75 GB
  • 11 Next.js webs × 200 MB = ~2.2 GB
  • Infra (Traefik, Loki, Grafana, Azurite, Mailpit) = ~0.65 GB
  • K3s overhead = ~0.5 GB
  • Subtotal: ~7.6 GB. The 32 GB spec leaves headroom for spikes and build cache.
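As a cross-check, the line items in the breakdown can be added up directly. A minimal sketch using only the estimates listed above (all values in GB):

```shell
#!/bin/sh
# Sketch: sum the per-component RAM estimates from the breakdown, in GB.
ram_subtotal_gb() {
  awk 'BEGIN {
    cosmos   = 2.0          # Cosmos Emulator
    backends = 10 * 0.150   # 10 Fastify backends
    shared   = 3 * 0.250    # 3 shared services
    webs     = 11 * 0.200   # 11 Next.js webs
    infra    = 0.65         # Traefik, Loki, Grafana, Azurite, Mailpit
    k3s      = 0.5          # K3s overhead
    printf "%.1f\n", cosmos + backends + shared + webs + infra + k3s
  }'
}

ram_subtotal_gb   # prints 7.6
```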

Recommended (with Ollama)

| Spec | Value |
|---|---|
| vCPUs | 16 |
| RAM | 64 GB |
| Disk | 200 GB NVMe SSD |
| GPU | Optional NVIDIA T4/A10 for fast LLM inference |
| OS | Ubuntu 24.04 LTS |

Cloud Equivalents

| Provider | Instance | vCPU | RAM | Price (approx) |
|---|---|---|---|---|
| Azure | Standard_D8s_v5 | 8 | 32 GB | ~$280/mo |
| Azure | Standard_D16s_v5 | 16 | 64 GB | ~$560/mo |
| AWS | m6i.2xlarge | 8 | 32 GB | ~$280/mo |
| AWS | m6i.4xlarge | 16 | 64 GB | ~$560/mo |
| Hetzner | CPX51 | 16 | 32 GB | ~$45/mo |
| Hetzner | CCX63 | 48 | 192 GB | ~$230/mo |
| Home | Mac Mini M4 Pro | 12 | 48 GB | One-time ~$1,600 |

Cost tip: Hetzner is roughly 5-10× cheaper than Azure/AWS for dev/staging.
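One way to compare the rows is monthly price per GB of RAM. The sketch below uses only the approximate prices from the table; `price_per_gb` is a hypothetical helper. On this measure the Hetzner rows come out roughly six to seven times cheaper than the Azure/AWS rows.

```shell
#!/bin/sh
# Sketch: monthly price per GB of RAM, from the approximate table prices.
# Args: monthly price in USD, RAM in GB.
price_per_gb() {
  awk -v price="$1" -v ram="$2" 'BEGIN { printf "%.2f\n", price / ram }'
}

price_per_gb 280 32    # Azure Standard_D8s_v5 / AWS m6i.2xlarge
price_per_gb 45 32     # Hetzner CPX51
price_per_gb 230 192   # Hetzner CCX63
```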


3. Architecture: Docker Compose → K3s Migration Path

Phase 1: Docker Compose (after prerequisite work)

⚠️ Prerequisite: ALL product repos must run docker-prep.sh before building Docker images (see §12 Audit Findings). All Dockerfiles and output: 'standalone' configs are now in place (completed 2026-03-22). During the package-manager transition, each repo's Docker build must follow that repo's declared package manager and lockfile semantics rather than assuming npm or pnpm globally.

Create a unified docker-compose.ecosystem.yml that brings everything up.

Phase 2: Local Kubernetes (Docker Desktop or K3s)

Two options for single-node K8s — both give you real kubectl, Helm, Ingress, and CRDs identical to production AKS/EKS/GKE.

Docker Desktop includes a built-in kind (Kubernetes IN Docker) cluster. Enable it in Docker Desktop → Settings → Kubernetes → Enable Kubernetes.

  • Zero install — checkbox in Docker Desktop, K8s v1.31+ included
  • Images shared: images built with docker build are immediately available to K8s (no import step!)
  • GUI dashboard — Docker Desktop shows Deployments, Pods, Services, Ingresses, ConfigMaps, Secrets
  • kubectl pre-configured — context docker-desktop auto-created
  • Helm works — install via brew install helm
  • Best for: Mac/Windows local development, quick iteration, visual debugging
  • Limitation: Single-node only, can't add workers (use K3s for multi-node practice)

K3s is a lightweight, certified Kubernetes distro.

  • Production-grade (CNCF certified, used by Rancher)
  • Single binary, ~70 MB, installs in 30 seconds
  • Built-in Traefik Ingress (you already use Traefik!)
  • Built-in local-path StorageClass
  • Runs as systemd service (survives reboot)
  • Can scale to multi-node later by just joining worker nodes
  • Best for: Linux VMs, Hetzner/cloud deployment, multi-node scaling practice

4. Implementation Plan

4.1 Phase 1 — Unified Docker Compose

Create docker-compose.ecosystem.yml at workspace root (~/code/mygh/) that composes all services:

⚠️ Critical prerequisite — run BEFORE docker compose build:

# Pack @bytelyst/* file: dependencies into tarballs for each product repo.
# Every product repo has file: refs to ../learning_ai_common_plat/packages/*
# which don't resolve inside Docker build context. docker-prep.sh packs them.
# The prep flow must preserve each repo's package-manager semantics while rewriting
# file: refs for Docker contexts.
for repo in learning_voice_ai_agent learning_multimodal_memory_agents learning_ai_clock \
            learning_ai_jarvis_jr learning_ai_peakpulse learning_ai_flowmonk \
            learning_ai_fastgap learning_ai_notes learning_ai_trails learning_ai_local_memory_gpt; do
  (cd "$repo" && ./scripts/docker-prep.sh)
done
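Before building, it can help to confirm that the prep step actually produced tarballs in every repo. This is a sketch only: it assumes docker-prep.sh writes `.tarballs/` under the subdirectory passed as the first argument (for example `backend`, or `.` for repo-root builds), and `missing_tarballs` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print each repo that has no .tarballs/ directory after prep.
# Args: subdirectory to inspect, then one or more repo directories.
missing_tarballs() {
  tb_sub=$1; shift
  for tb_repo in "$@"; do
    [ -d "$tb_repo/$tb_sub/.tarballs" ] || echo "$tb_repo"
  done
}
```

For example, `missing_tarballs backend learning_ai_clock learning_ai_notes` prints nothing when both repos are prepared.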

# ~/code/mygh/docker-compose.ecosystem.yml
# NOTE: All product backends/webs have file: deps to @bytelyst/* packages.
# You MUST run docker-prep.sh for each repo first (see above).

services:
  # ══════════════════════════════════════════════════════
  # INFRASTRUCTURE
  # ══════════════════════════════════════════════════════
  cosmos-emulator:
    image: mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:vnext-preview
    ports: ['8081:8081', '1234:1234']
    environment:
      PROTOCOL: http
      ENABLE_EXPLORER: 'true'
    restart: unless-stopped

  azurite:
    image: mcr.microsoft.com/azure-storage/azurite:3.35.0
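    # --location /data keeps blobs on the azurite-data volume; without it Azurite writes to its working directory
    command: azurite-blob --blobHost 0.0.0.0 --blobPort 10000 --location /data --skipApiVersionCheck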
    ports: ['10000:10000']
    volumes: [azurite-data:/data]
    restart: unless-stopped

  mailpit:
    image: axllent/mailpit:v1.27.5
    ports: ['1025:1025', '8025:8025']
    restart: unless-stopped

  traefik:
    image: traefik:v3.3
    command:
      - '--api.insecure=true'
      - '--providers.docker=true'
      - '--providers.docker.exposedbydefault=false'
      - '--entrypoints.web.address=:80'
    ports: ['80:80', '8080:8080']
    volumes: ['/var/run/docker.sock:/var/run/docker.sock:ro']
    restart: unless-stopped

  loki:
    image: grafana/loki:3.3.2
    ports: ['3100:3100']
    volumes: [loki-data:/loki]
    restart: unless-stopped

  grafana:
    image: grafana/grafana:11.4.0
    ports: ['3000:3000'] # NOTE: many Next.js webs also default to 3000 — avoid conflicts
    environment:
      GF_SECURITY_ADMIN_USER: admin
      GF_SECURITY_ADMIN_PASSWORD: lysnrai
    volumes: [grafana-data:/var/lib/grafana]
    restart: unless-stopped

  # ══════════════════════════════════════════════════════
  # SHARED SERVICES (common-plat — no file: deps, pnpm workspace handles it)
  # ══════════════════════════════════════════════════════
  platform-service:
    build:
      context: ./learning_ai_common_plat
      dockerfile: services/platform-service/Dockerfile
    ports: ['4003:4003']
    env_file: [.env.ecosystem]
    environment:
      PORT: 4003
      COSMOS_AUTO_INIT: 'true'
    depends_on: [cosmos-emulator, azurite, mailpit]
    labels:
      - 'traefik.enable=true'
      - 'traefik.http.routers.platform.rule=Host(`platform.local`)'
      - 'traefik.http.services.platform.loadbalancer.server.port=4003'
    restart: unless-stopped

  extraction-service:
    build:
      context: ./learning_ai_common_plat
      dockerfile: services/extraction-service/Dockerfile
    ports: ['4005:4005']
    env_file: [.env.ecosystem]
    environment:
      PORT: 4005
    depends_on: [cosmos-emulator]
    restart: unless-stopped

  mcp-server:
    build:
      context: ./learning_ai_common_plat
      dockerfile: services/mcp-server/Dockerfile
    ports: ['4007:4007']
    env_file: [.env.ecosystem]
    environment:
      PORT: 4007
      PLATFORM_SERVICE_URL: http://platform-service:4003
      EXTRACTION_SERVICE_URL: http://extraction-service:4005
    depends_on: [platform-service, extraction-service]
    restart: unless-stopped

  # ══════════════════════════════════════════════════════
  # PRODUCT BACKENDS
  # All have file: deps → must run docker-prep.sh first.
  # ActionTrail + LocalMemGPT Dockerfiles use repo-root context.
  # Others use backend/ subdir context.
  # ══════════════════════════════════════════════════════
  lysnrai-backend:
    build: ./learning_voice_ai_agent/backend
    ports: ['4015:4015']
    env_file: [.env.ecosystem]
    environment: { PORT: '4015', SERVICE_NAME: lysnrai-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  mindlyst-backend:
    build: ./learning_multimodal_memory_agents/backend
    ports: ['4014:4014']
    env_file: [.env.ecosystem]
    environment: { PORT: '4014', SERVICE_NAME: mindlyst-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  chronomind-backend:
    build: ./learning_ai_clock/backend
    ports: ['4011:4011']
    env_file: [.env.ecosystem]
    environment: { PORT: '4011', SERVICE_NAME: chronomind-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  jarvisjr-backend:
    build: ./learning_ai_jarvis_jr/backend
    ports: ['4012:4012']
    env_file: [.env.ecosystem]
    environment: { PORT: '4012', SERVICE_NAME: jarvisjr-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  nomgap-backend:
    build: ./learning_ai_fastgap/backend
    ports: ['4013:4013']
    env_file: [.env.ecosystem]
    environment: { PORT: '4013', SERVICE_NAME: nomgap-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  peakpulse-backend:
    build: ./learning_ai_peakpulse/backend
    ports: ['4010:4010']
    env_file: [.env.ecosystem]
    environment: { PORT: '4010', SERVICE_NAME: peakpulse-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  flowmonk-backend:
    build: ./learning_ai_flowmonk/backend
    ports: ['4017:4017']
    env_file: [.env.ecosystem]
    environment: { PORT: '4017', SERVICE_NAME: flowmonk-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  notelett-backend:
    build: ./learning_ai_notes/backend
    ports: ['4016:4016']
    env_file: [.env.ecosystem]
    environment: { PORT: '4016', SERVICE_NAME: notelett-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  actiontrail-backend:
    build:
      context: ./learning_ai_trails # Dockerfile expects repo-root context
      dockerfile: backend/Dockerfile
    ports: ['4018:4018']
    env_file: [.env.ecosystem]
    environment: { PORT: '4018', SERVICE_NAME: actiontrail-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  localmemgpt-backend:
    build:
      context: ./learning_ai_local_memory_gpt # Dockerfile expects repo-root context
      dockerfile: backend/Dockerfile
    ports: ['4019:4019']
    env_file: [.env.ecosystem]
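    environment: { PORT: '4019', OLLAMA_URL: 'http://host.docker.internal:11434' }
    extra_hosts: ['host.docker.internal:host-gateway'] # required on Linux; Docker Desktop resolves this name on its own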
    volumes: [localmemgpt-data:/app/db]
    restart: unless-stopped

  # ══════════════════════════════════════════════════════
  # WEB DASHBOARDS
  # IMPORTANT: Most webs default to port 3000 internally.
  # Use PORT env var to override, or remap via host:container ports.
  # ══════════════════════════════════════════════════════
  admin-web:
    build: ./learning_ai_common_plat/dashboards/admin-web
    ports: ['3001:3001']
    env_file: [.env.ecosystem]
    environment:
      PORT: 3001 # admin-web has NO port override — defaults to 3000 without this!
    depends_on: [platform-service]
    restart: unless-stopped

  user-dashboard:
    build: ./learning_voice_ai_agent/user-dashboard-web
    ports: ['3002:3002']
    env_file: [.env.ecosystem]
    depends_on: [lysnrai-backend]
    restart: unless-stopped

  tracker-web:
    build: ./learning_ai_common_plat/dashboards/tracker-web
    ports: ['3003:3003']
    env_file: [.env.ecosystem]
    depends_on: [platform-service]
    restart: unless-stopped

  nomgap-web:
    build: ./learning_ai_fastgap/web
    ports: ['3040:3040']
    environment:
      PORT: 3040
      NEXT_PUBLIC_NOMGAP_API_URL: http://nomgap-backend:4013/api
      NEXT_PUBLIC_PLATFORM_SERVICE_URL: http://platform-service:4003/api
    depends_on: [nomgap-backend]
    restart: unless-stopped

  actiontrail-web:
    build: ./learning_ai_trails/web
    ports: ['3060:3000'] # Internal 3000 → external 3060
    environment:
      NEXT_PUBLIC_API_URL: http://actiontrail-backend:4018
    depends_on: [actiontrail-backend]
    restart: unless-stopped

  localmemgpt-web:
    build:
      context: ./learning_ai_local_memory_gpt # Dockerfile expects repo-root context
      dockerfile: web/Dockerfile
    ports: ['3070:3070']
    environment:
      NEXT_PUBLIC_BACKEND_URL: http://localmemgpt-backend:4019
    depends_on: [localmemgpt-backend]
    restart: unless-stopped

  notelett-web:
    build: ./learning_ai_notes/web
    ports: ['3054:3000'] # Internal 3000 → external 3054
    environment:
      NEXT_PUBLIC_BACKEND_URL: http://notelett-backend:4016
    depends_on: [notelett-backend]
    restart: unless-stopped

  chronomind-web:
    build: ./learning_ai_clock/web
    ports: ['3051:3000'] # Internal 3000 → external 3051
    environment:
      NEXT_PUBLIC_BACKEND_URL: http://chronomind-backend:4011
      NEXT_PUBLIC_PLATFORM_SERVICE_URL: http://platform-service:4003
    depends_on: [chronomind-backend]
    restart: unless-stopped

  jarvisjr-web:
    build: ./learning_ai_jarvis_jr/web
    ports: ['3052:3000'] # Internal 3000 → external 3052
    environment:
      NEXT_PUBLIC_BACKEND_URL: http://jarvisjr-backend:4012
      NEXT_PUBLIC_PLATFORM_SERVICE_URL: http://platform-service:4003
    depends_on: [jarvisjr-backend]
    restart: unless-stopped

  flowmonk-web:
    build: ./learning_ai_flowmonk/web
    ports: ['3053:3000'] # Internal 3000 → external 3053
    environment:
      NEXT_PUBLIC_BACKEND_URL: http://flowmonk-backend:4017
      NEXT_PUBLIC_PLATFORM_SERVICE_URL: http://platform-service:4003
    depends_on: [flowmonk-backend]
    restart: unless-stopped

  mindlyst-web:
    build: ./learning_multimodal_memory_agents/mindlyst-native/web
    ports: ['3050:3050']
    environment:
      PORT: 3050 # package.json sets -p 3050
      NEXT_PUBLIC_BACKEND_URL: http://mindlyst-backend:4014
      NEXT_PUBLIC_PLATFORM_SERVICE_URL: http://platform-service:4003
    depends_on: [mindlyst-backend]
    restart: unless-stopped

volumes:
  azurite-data:
  loki-data:
  grafana-data:
  localmemgpt-data:
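
Most backend stanzas above differ only in service name, repo directory, and port, so they can be generated rather than hand-edited. A minimal sketch, assuming the `backend/` build context; ActionTrail and LocalMemGPT use repo-root contexts and would still need their explicit `context:`/`dockerfile:` form. `backend_stanza` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: emit one backend service stanza in the shape used above.
# Args: service name, repo directory, port.
backend_stanza() {
  printf '  %s:\n' "$1"
  printf '    build: ./%s/backend\n' "$2"
  printf "    ports: ['%s:%s']\n" "$3" "$3"
  printf '    env_file: [.env.ecosystem]\n'
  printf "    environment: { PORT: '%s', SERVICE_NAME: %s }\n" "$3" "$1"
  printf '    depends_on: [platform-service]\n'
  printf '    restart: unless-stopped\n'
}

backend_stanza peakpulse-backend learning_ai_peakpulse 4010
```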

4.2 Phase 2 — Local Kubernetes (Docker Desktop or K3s)

Install K3s on the VM

# Install K3s (30 seconds, includes kubectl + containerd)
curl -sfL https://get.k3s.io | sh -

# Verify
sudo kubectl get nodes
# NAME       STATUS   ROLES                  AGE   VERSION
# myvm       Ready    control-plane,master   30s   v1.30.x+k3s1

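# Copy kubeconfig for non-root usage
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown "$(id -u):$(id -g)" ~/.kube/config
# K3s's bundled kubectl reads /etc/rancher/k3s/k3s.yaml unless KUBECONFIG is set
export KUBECONFIG=~/.kube/config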

# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

Namespace Layout

kubectl create namespace bytelyst-infra      # Cosmos, Azurite, Mailpit, Loki, Grafana
kubectl create namespace bytelyst-platform   # platform-service, extraction, mcp
kubectl create namespace bytelyst-products   # 10 product backends
kubectl create namespace bytelyst-web        # All Next.js dashboards

Example K8s Manifest (one backend)

# k8s/products/lysnrai-backend.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lysnrai-backend
  namespace: bytelyst-products
  labels:
    app: lysnrai-backend
    product: lysnrai
spec:
  replicas: 1 # Scale to 2+ when ready
  selector:
    matchLabels:
      app: lysnrai-backend
  template:
    metadata:
      labels:
        app: lysnrai-backend
    spec:
      containers:
        - name: lysnrai-backend
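          image: bytelyst/lysnrai-backend:latest
          imagePullPolicy: IfNotPresent # locally built image; a :latest tag otherwise defaults to Always and tries a registry pull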
          ports:
            - containerPort: 4015
          envFrom:
            - configMapRef:
                name: bytelyst-common-config
            - secretRef:
                name: bytelyst-secrets
          env:
            - name: PORT
              value: '4015'
            - name: SERVICE_NAME
              value: lysnrai-backend
          resources:
            requests:
              memory: '128Mi'
              cpu: '100m'
            limits:
              memory: '256Mi'
              cpu: '500m'
          livenessProbe:
            httpGet:
              path: /health
              port: 4015
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 4015
            initialDelaySeconds: 5
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: lysnrai-backend
  namespace: bytelyst-products
spec:
  selector:
    app: lysnrai-backend
  ports:
    - port: 4015
      targetPort: 4015

Ingress (Traefik, built into K3s)

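# k8s/ingress.yaml
# An Ingress can only route to Services in its own namespace, so each
# namespace that exposes a host needs its own Ingress resource.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: bytelyst-ingress
  namespace: bytelyst-products
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
  rules:
    - host: lysnrai.local
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: lysnrai-backend
                port:
                  number: 4015
    # ... repeat per product
---
# platform-service lives in bytelyst-platform, so it gets a separate Ingress there.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: bytelyst-platform-ingress
  namespace: bytelyst-platform
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
  rules:
    - host: platform.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: platform-service
                port:
                  number: 4003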
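
The `.local` hostnames used in the Ingress rules only work if they resolve to the node. One simple option is `/etc/hosts` entries on the machine you browse from. The sketch below just prints the lines; the IP address is a placeholder, and `hosts_entries` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print /etc/hosts lines that point ingress hostnames at one IP.
# Args: IP address, then one or more hostnames.
hosts_entries() {
  he_ip=$1; shift
  for he_host in "$@"; do
    printf '%s %s\n' "$he_ip" "$he_host"
  done
}

# 127.0.0.1 suits a local cluster; use the VM's address for a remote one.
hosts_entries 127.0.0.1 lysnrai.local platform.local
```

To apply the output, pipe it through `sudo tee -a /etc/hosts` after reviewing it.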

5. Docker Compose → K3s Migration Cheat Sheet

| Docker Compose | K3s Equivalent |
|---|---|
| services: | Deployment + Service |
| ports: | Service (ClusterIP/NodePort) |
| env_file: | ConfigMap + Secret |
| depends_on: | initContainers or readiness probes |
| volumes: | PersistentVolumeClaim (local-path) |
| restart: unless-stopped | Built-in (K8s always restarts pods) |
| labels: traefik.* | Ingress resource |
| docker compose up | kubectl apply -k k8s/ |
| docker compose logs | kubectl logs -f deploy/X or Loki/Grafana |
| docker compose ps | kubectl get pods -A |
| Scale: change nothing | kubectl scale deploy/X --replicas=3 |

6. K3s Practice Exercises (on single VM)

These exercises simulate real production scenarios:

Exercise 1: Rolling Update

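# Build new image, deploy with zero downtime
docker build -t bytelyst/lysnrai-backend:v2 ./learning_voice_ai_agent/backend
# K3s only: load the image into containerd (Docker Desktop shares images with K8s automatically)
docker save bytelyst/lysnrai-backend:v2 | sudo k3s ctr images import -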
kubectl set image deploy/lysnrai-backend lysnrai-backend=bytelyst/lysnrai-backend:v2 -n bytelyst-products
kubectl rollout status deploy/lysnrai-backend -n bytelyst-products

Exercise 2: Scale Horizontally

kubectl scale deploy/platform-service --replicas=3 -n bytelyst-platform
# Traefik auto-balances across all 3 pods

Exercise 3: ConfigMap / Secret Rotation

kubectl create secret generic bytelyst-secrets \
  --from-literal=JWT_SECRET=new-secret \
  --from-literal=COSMOS_KEY=new-key \
  -n bytelyst-platform --dry-run=client -o yaml | kubectl apply -f -
kubectl rollout restart deploy -n bytelyst-platform
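
The literals `new-secret` and `new-key` above are placeholders. For practice runs, a random value can be generated with standard tools. This is a sketch; `gen_secret` is a hypothetical helper, and a value such as COSMOS_KEY must of course match whatever the target service expects.

```shell
#!/bin/sh
# Sketch: print 32 random bytes as 64 lowercase hex characters.
gen_secret() {
  dd if=/dev/urandom bs=32 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n'
  echo
}
```

It slots into the rotation command as `--from-literal=JWT_SECRET="$(gen_secret)"`.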

Exercise 4: Resource Limits + HPA

# Auto-scale platform-service 1→5 pods based on CPU
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: platform-service-hpa
  namespace: bytelyst-platform
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: platform-service
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Exercise 5: Helm Chart (packaged deploy)

# Create chart scaffold
helm create bytelyst-ecosystem
# Templatize all 25+ services into one chart
# Deploy: helm install bytelyst ./bytelyst-ecosystem -n bytelyst

7. Scaling Path: Single VM → Multi-Node

Phase 1: Docker Compose          Phase 2: Local K8s (1 node)
┌─────────────────────┐          ┌──────────────────────────────┐
│  Single VM / Mac     │    →     │  Docker Desktop K8s (kind)   │
│  docker compose up   │          │  or K3s on Linux VM          │
│  ~25 containers      │          │  kubectl apply -k · ~25 pods │
└─────────────────────┘          └──────────────────────────────┘
                                          │
                                          ▼
Phase 3: K3s (3 nodes)           Phase 4: Managed K8s
┌──────────────────────┐         ┌──────────────────────┐
│  1 server + 2 agents │    →    │  AKS / EKS / GKE     │
│  Same manifests!     │         │  Same manifests!      │
│  Real HA             │         │  Auto-scaling nodes   │
└──────────────────────┘         └──────────────────────┘

Docker Desktop K8s → K3s migration: Same manifests, just change kubectl context.

Adding a worker node to K3s (Phase 3) is one command:

# On the worker VM:
curl -sfL https://get.k3s.io | K3S_URL=https://server-ip:6443 K3S_TOKEN=<token> sh -
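
The `<token>` value is stored on the server node at `/var/lib/rancher/k3s/server/node-token` (readable with `sudo cat`). The sketch below only assembles the join command so it can be reviewed before running it on the worker; `k3s_join_command` is a hypothetical helper, and the address and token in the example are made up.

```shell
#!/bin/sh
# Sketch: build the agent join command from a server address and token.
# Args: server IP or hostname, node token.
k3s_join_command() {
  printf 'curl -sfL https://get.k3s.io | K3S_URL=https://%s:6443 K3S_TOKEN=%s sh -\n' "$1" "$2"
}

# Example with placeholder values:
k3s_join_command 10.0.0.5 EXAMPLE_TOKEN
```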

8. Repository Layout

~/code/mygh/
├── docker-compose.ecosystem.yml     # Phase 1: all-in-one compose
├── .env.ecosystem                   # Shared env vars
├── k8s/                             # Phase 2: K3s manifests
│   ├── kustomization.yaml           # Kustomize root
│   ├── infra/                       # Cosmos emulator, Azurite, Mailpit, Loki, Grafana
│   ├── platform/                    # platform-service, extraction, mcp
│   ├── products/                    # 10 product backends
│   ├── web/                         # 10+ Next.js dashboards
│   ├── config/                      # ConfigMaps
│   └── secrets/                     # Secrets (gitignored)
├── helm/                            # Phase 3: Helm chart
│   └── bytelyst-ecosystem/
│       ├── Chart.yaml
│       ├── values.yaml
│       └── templates/
└── scripts/
    ├── ecosystem-up.sh              # docker compose -f docker-compose.ecosystem.yml up -d
    ├── ecosystem-k3s-deploy.sh      # kubectl apply -k k8s/
    └── ecosystem-build-all.sh       # Build all Docker images

9. Quick Start Commands

# ── Phase 1: Docker Compose ───────────────────────────
cd ~/code/mygh

# Build all images (first time, ~15-20 min)
docker compose -f docker-compose.ecosystem.yml build

# Start everything
docker compose -f docker-compose.ecosystem.yml up -d

# Check status
docker compose -f docker-compose.ecosystem.yml ps

# View logs
docker compose -f docker-compose.ecosystem.yml logs -f platform-service

# Tear down
docker compose -f docker-compose.ecosystem.yml down

# ── Phase 2a: Docker Desktop Kubernetes (Mac) ────────
# Enable K8s: Docker Desktop → Settings → Kubernetes → Enable
# Verify:
kubectl config use-context docker-desktop
kubectl get nodes    # Should show: docker-desktop   Ready   control-plane

# Build images (Docker Desktop shares images with K8s — no import needed!)
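# common-plat images build from the workspace root, matching the compose build context
docker build -t bytelyst/platform-service:latest \
  -f learning_ai_common_plat/services/platform-service/Dockerfile ./learning_ai_common_plat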

# Deploy all
kubectl apply -k k8s/

# Check pods
kubectl get pods -A

# Port-forward for local access
kubectl port-forward svc/platform-service 4003:4003 -n bytelyst-platform

# Or view everything in Docker Desktop GUI → Kubernetes tab

# ── Phase 2b: K3s (Linux VM) ─────────────────────────
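# Build + load images into K3s containerd
# (common-plat images build from the workspace root, matching the compose build context)
docker build -t bytelyst/platform-service:latest \
  -f learning_ai_common_plat/services/platform-service/Dockerfile ./learning_ai_common_plat
# Pipe into the import; process substitution under sudo can fail because sudo closes inherited file descriptors
docker save bytelyst/platform-service:latest | sudo k3s ctr images import -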

# Deploy (same manifests as Docker Desktop!)
kubectl apply -k k8s/
kubectl get pods -A

10. Dockerization Status (all complete)

Repo Backend Dockerfile Web Dockerfile docker-prep.sh output:'standalone' Package manager state Lockfile state Docker template type Status
LysnrAI user-dashboard (conditional) Transitioning toward pnpm target Follow repo-local current lockfile Repo-specific during transition Ready
MindLyst (conditional) Transitioning toward pnpm target Follow repo-local current lockfile Repo-specific during transition Ready
ChronoMind (conditional) Transitioning toward pnpm target Follow repo-local current lockfile Repo-specific during transition Ready
JarvisJr (conditional) Transitioning toward pnpm target Follow repo-local current lockfile Repo-specific during transition Ready
PeakPulse — (no web) No Node web surface in this repo Follow repo-local current lockfile Repo-specific during transition Ready
FlowMonk (conditional) Pilot candidate for pnpm migration Follow repo-local current lockfile Repo-specific during transition Ready
NomGap Transitioning toward pnpm target Follow repo-local current lockfile Repo-specific during transition Fixed (added .tarballs/ COPY)
NoteLett Transitioning toward pnpm target Follow repo-local current lockfile Repo-specific during transition Fixed (explicit COPY, not .)
ActionTrail Transitioning toward pnpm target Follow repo-local current lockfile Repo-specific during transition Ready (uses .tarballs/ pattern)
LocalMemGPT Transitioning toward pnpm target Follow repo-local current lockfile Repo-specific during transition Ready (repo-root build context)
admin-web (in common-plat) N/A (pnpm) (conditional) pnpm workspace today pnpm-lock.yaml via common-plat pnpm workspace template Ready
tracker-web (in common-plat) N/A (pnpm) (conditional) pnpm workspace today pnpm-lock.yaml via common-plat pnpm workspace template Ready

All 10 product repos now have Dockerfiles, docker-prep.sh, and output:'standalone'. Created 2026-03-22.

Note: The table above tracks Docker readiness, not completed package-manager migration. For product repos, use each repo's actual packageManager field and lockfile until that repo is explicitly migrated to pnpm.


11. Dockerfile Template (reference)

Critical: Run docker-prep.sh first for product repos that use @bytelyst/* file: dependencies. The prep step packs those dependencies into .tarballs/ so Docker builds can resolve them inside the repo's own build context. During the migration window, Dockerfiles must match the repo's package manager and lockfile instead of assuming a single global install command.

Backend / service template — npm repo variant

# Pre-requisite: run ./scripts/docker-prep.sh to pack @bytelyst/* tarballs
FROM node:22-alpine AS builder
WORKDIR /app

COPY package.json package-lock.json ./
COPY .tarballs/ ./.tarballs/
RUN npm ci --ignore-scripts

COPY tsconfig.json ./
COPY src/ ./src/
RUN npx tsc

# Production stage
FROM node:22-alpine
WORKDIR /app
ENV NODE_ENV=production

COPY package.json package-lock.json ./
COPY .tarballs/ ./.tarballs/
RUN npm ci --omit=dev --ignore-scripts

COPY --from=builder /app/dist ./dist
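# Copy shared/product.json if the backend reads it at runtime.
# COPY has no optional mode and does not accept shell syntax such as `2>/dev/null || true`,
# so keep this line only in repos that actually have a shared/ directory.
COPY shared/ ./shared/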

EXPOSE ${PORT:-4010}
CMD ["node", "dist/server.js"]

Backend / service template — pnpm repo variant

# Pre-requisite: run ./scripts/docker-prep.sh if this repo rewrites @bytelyst/* file: deps
FROM node:22-alpine AS builder
WORKDIR /app

RUN corepack enable && corepack prepare pnpm@10 --activate

COPY package.json pnpm-lock.yaml ./
COPY .tarballs/ ./.tarballs/
RUN pnpm install --frozen-lockfile --ignore-scripts

COPY tsconfig.json ./
COPY src/ ./src/
RUN pnpm run build

FROM node:22-alpine
WORKDIR /app
ENV NODE_ENV=production

RUN corepack enable && corepack prepare pnpm@10 --activate

COPY package.json pnpm-lock.yaml ./
COPY .tarballs/ ./.tarballs/
RUN pnpm install --frozen-lockfile --prod --ignore-scripts

COPY --from=builder /app/dist ./dist
# COPY has no optional mode; keep this line only in repos that have a shared/ directory.
COPY shared/ ./shared/

EXPOSE ${PORT:-4010}
CMD ["node", "dist/server.js"]

Web (Next.js 16) — npm repo variant

Prerequisite: next.config.ts MUST have output: 'standalone' for the standalone Dockerfile pattern to work. Without it, .next/standalone/ won't be generated and the COPY will fail.

# Pre-requisite: run ./scripts/docker-prep.sh to pack @bytelyst/* tarballs
FROM node:22-alpine AS builder
WORKDIR /app

COPY package.json package-lock.json ./
COPY .tarballs/ ./.tarballs/
RUN npm ci

COPY . .

# Dummy env vars for Next.js build-time static page collection
ENV NEXT_PUBLIC_BACKEND_URL=http://localhost:4010
ENV NEXT_PUBLIC_PLATFORM_SERVICE_URL=http://localhost:4003

RUN npm run build

FROM node:22-alpine
WORKDIR /app
ENV NODE_ENV=production

COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
# COPY has no optional mode; drop this line if the repo has no public/ directory.
COPY --from=builder /app/public ./public

EXPOSE 3000
CMD ["node", "server.js"]

Web (Next.js 16) — pnpm repo variant

Prerequisite: next.config.ts MUST have output: 'standalone' for the standalone Dockerfile pattern to work. Keep the repo's build script authoritative, including --webpack where required.

# Pre-requisite: run ./scripts/docker-prep.sh to pack @bytelyst/* tarballs when applicable
FROM node:22-alpine AS builder
WORKDIR /app

RUN corepack enable && corepack prepare pnpm@10 --activate

COPY package.json pnpm-lock.yaml ./
COPY .tarballs/ ./.tarballs/
RUN pnpm install --frozen-lockfile

COPY . .

# Dummy env vars for Next.js build-time static page collection
ENV NEXT_PUBLIC_BACKEND_URL=http://localhost:4010
ENV NEXT_PUBLIC_PLATFORM_SERVICE_URL=http://localhost:4003

RUN pnpm run build

FROM node:22-alpine
WORKDIR /app
ENV NODE_ENV=production

COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
# COPY has no optional mode; drop this line if the repo has no public/ directory.
COPY --from=builder /app/public ./public

EXPOSE 3000
CMD ["node", "server.js"]

Template selection rule:

  • Use the npm variant only for repos that are still on npm with package-lock.json and matching Docker/CI scripts.
  • Use the pnpm variant for repos that have migrated to pnpm and carry pnpm-lock.yaml plus aligned CI/Docker/docs.
  • Do not leave a repo in mixed state after migration.
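
The selection rule can be sketched as a small check that reads the repo's `packageManager` field and falls back to the lockfile when the field is absent. `template_variant` is a hypothetical helper, and the fallback order is an assumption rather than documented policy.

```shell
#!/bin/sh
# Sketch: print "npm" or "pnpm" for the directory holding package.json.
template_variant() {
  tv_dir=$1
  # Extract the tool name from e.g. "packageManager": "pnpm@10.4.1"
  tv_pm=$(sed -n 's/.*"packageManager"[[:space:]]*:[[:space:]]*"\([a-z]*\)@.*/\1/p' "$tv_dir/package.json")
  if [ -z "$tv_pm" ]; then
    # No field declared: fall back to whichever lockfile is present.
    if [ -f "$tv_dir/pnpm-lock.yaml" ]; then
      tv_pm=pnpm
    elif [ -f "$tv_dir/package-lock.json" ]; then
      tv_pm=npm
    fi
  fi
  case "$tv_pm" in
    npm|pnpm) echo "$tv_pm" ;;
    *) echo "cannot determine package manager for $tv_dir" >&2; return 1 ;;
  esac
}
```

A build script could then pick the matching Dockerfile variant with `variant=$(template_variant ./backend)`.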

docker-prep.sh (for repos that don't have one yet)

Copy from learning_ai_trails/scripts/docker-prep.sh — it handles both backend/ and web/ targets, packs all file: refs into .tarballs/, and rewrites package.json to point at them.

The important rule is behavior, not shell-script ancestry:

  • docker-prep.sh must support both legacy npm repos and migrated pnpm repos.
  • It must not hardcode npm assumptions into tarball rewrite flow.
  • It must preserve the repo's package-manager semantics after prep:
    • keep the correct lockfile
    • keep the correct install command in Docker/CI
    • keep .tarballs/ handling compatible with the repo's active package manager
cp learning_ai_trails/scripts/docker-prep.sh <target-repo>/scripts/docker-prep.sh
chmod +x <target-repo>/scripts/docker-prep.sh

11.1 Long-Term Package-Manager Migration Roadmap

End-state

  • learning_ai_common_plat remains the canonical pnpm workspace monorepo.
  • Node-based product repos migrate to pnpm over time.
  • Product repos remain independent repositories, not one combined workspace.
  • Current .tarballs/ handling for @bytelyst/* remains supported unless it is explicitly simplified later.

Migration principles

  • No big-bang migration.
  • One repo at a time.
  • Fully green before moving to the next repo.
  • Do not combine package-manager migration with unrelated dependency upgrades.
  • Migrate CI, Docker, and docs together in the same repo migration.
  • No mixed lockfile/package-manager state after migration.

Phase 0 — policy and checklist

  • Define package-manager policy.
  • Define migration checklist.
  • Define validation gates.

Pilot

  • learning_ai_flowmonk

Wave 1

  • learning_ai_trails
  • learning_ai_local_memory_gpt

Wave 2

  • learning_ai_notes
  • learning_ai_fastgap
  • learning_ai_clock

Wave 3

  • learning_ai_jarvis_jr
  • learning_voice_ai_agent

Validation gates per migrated repo

A repo is only considered migrated when all of the following are aligned and passing:

  • install
  • test
  • typecheck
  • build
  • Docker build
  • local shared package resolution
  • docs/CI updated

12. Audit Findings (Review 2026-03-22)

Systematic code review of all claims in this document against the actual codebase.

F1. Port Conflicts (CRITICAL)

Grafana uses port 3000. The following webs also default to 3000:

  • admin-web (no port in package.json)
  • ChronoMind web (no port override)
  • JarvisJr web (no port override)
  • FlowMonk web (no port override)
  • NoteLett web (Dockerfile EXPOSE 3000)
  • ActionTrail web (Dockerfile EXPOSE 3000)

Fix: Set PORT env var in compose for each, or use host:container port remapping.
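
Before editing the compose file, it can help to list which host ports are claimed more than once. The sketch below assumes the mappings have already been extracted into "service host_port" pairs; find_port_conflicts and the flowmonk-web service name are illustrative, not names from the compose file.

```shell
# Hypothetical helper: read "service host_port" pairs on stdin and print each
# host port claimed by more than one service, with the services that claim it.
find_port_conflicts() {
  awk '
    NF >= 2 {
      count[$2]++
      owners[$2] = (($2 in owners) ? owners[$2] "," : "") $1
    }
    END {
      for (port in count)
        if (count[port] > 1) print port ": " owners[port]
    }
  ' | sort
}

# Example input: Grafana plus two webs left on the Next.js default port.
printf '%s\n' \
  "grafana 3000" \
  "admin-web 3000" \
  "flowmonk-web 3000" \
  "platform-service 4003" | find_port_conflicts
```

For the sample input this prints "3000: grafana,admin-web,flowmonk-web". An empty result means every host port is unique.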

F2. file: Dependencies Break Docker Builds (CRITICAL)

Every product backend and web has file:../../learning_ai_common_plat/packages/* dependencies in package.json. These resolve locally via symlinks but fail inside Docker because the sibling repo isn't in the build context.

Pattern: Each repo needs a docker-prep.sh that:

  1. Runs pnpm build in common-plat
  2. Packs each @bytelyst/* package into a .tarballs/*.tgz
  3. Rewrites package.json file: refs → file:.tarballs/bytelyst-*.tgz
  4. Preserves the product repo's active package-manager semantics during the rewrite
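
A cheap guard against skipping the prep step is to refuse a Docker build while package.json still points outside the build context. This is a sketch under the assumption that un-rewritten refs always start with file:../ (as in the pattern above); both function names are hypothetical.

```shell
# Hypothetical guard: true if package.json still has a file: dependency that
# points outside the build context (any value starting with "file:../").
has_external_file_refs() {
  grep -Eq '"file:\.\./' "$1"
}

check_docker_ready() {
  pkg="${1:-package.json}"
  if has_external_file_refs "$pkg"; then
    echo "NOT READY: $pkg still has file:../ refs; run ./scripts/docker-prep.sh first" >&2
    return 1
  fi
  echo "OK: $pkg has no file: refs outside the build context"
}
```

Refs already rewritten to file:.tarballs/... pass the check, because they resolve inside the build context.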

All 10 repos now have docker-prep.sh (created 2026-03-22). Previously only ActionTrail, LocalMemGPT, NoteLett, NomGap had them.

Long-term note: As product repos migrate to pnpm, this pattern remains valid. What changes is the repo-local install/runtime contract (pnpm install --frozen-lockfile instead of npm ci), not the deployment architecture or the need to package @bytelyst/* dependencies for isolated Docker contexts.

F3. NomGap Backend Dockerfile Ignores file: Deps (BUG)

learning_ai_fastgap/backend/Dockerfile runs COPY package.json and then npm ci but never copies .tarballs/, so the rewritten file: refs cannot resolve. It needs the .tarballs/ COPY step added before npm ci.

F4. NoteLett Backend Dockerfile Copies Everything (BUG)

learning_ai_notes/backend/Dockerfile does COPY . . in the build stage, which pulls in the broken node_modules symlinks left by file: deps. It should COPY src/, tsconfig.json, and .tarballs/ explicitly instead.

F5. Missing output: 'standalone' in next.config.ts (CRITICAL)

The Dockerfile template copies from .next/standalone/ — this directory only exists when output: 'standalone' is set in next.config.ts.

| Web | Has output: 'standalone'? | Notes |
| --- | --- | --- |
| NomGap | Yes | Set directly |
| NoteLett | Yes | Set directly |
| ActionTrail | Yes | Set directly |
| LocalMemGPT | Yes | Set directly |
| admin-web | Yes | Conditional: process.env.VERCEL ? {} : { output: 'standalone' } |
| tracker-web | Yes | Conditional (same) |
| user-dashboard | Yes | Conditional (same) |
| ChronoMind | Yes | Added 2026-03-22 (conditional) |
| JarvisJr | Yes | Added 2026-03-22 (conditional) |
| FlowMonk | Yes | Added 2026-03-22 (conditional) |
| MindLyst | Yes | Added 2026-03-22 (conditional) |

F6. Build Context Mismatch for ActionTrail + LocalMemGPT

Their Dockerfiles expect repo-root as build context (they COPY backend/... and COPY shared/...). The compose build: must use context: ./repo-name + dockerfile: backend/Dockerfile, not build: ./repo-name/backend.

Already correct in the compose above. Calling it out so future editors don't "simplify" it.

F7. Node.js Version Inconsistency

Existing Dockerfiles use mixed Node versions:

  • NomGap, NoteLett: node:20-alpine
  • ActionTrail, LocalMemGPT: node:22-alpine / node:22-slim

Recommendation: Standardize on node:22-alpine for all new Dockerfiles. Existing ones work but should be updated for consistency.

F8. Missing --webpack Flag for Next.js Builds

Several web apps require --webpack flag for builds (Serwist PWA incompatible with Turbopack, or @bytelyst/* file: ref transpilation). The Dockerfile template should call the repo's package-manager-appropriate build command (npm run build or pnpm run build) and that script should map to next build --webpack where required.

F9. Missing .env.ecosystem Template

The compose references .env.ecosystem but the doc doesn't define its contents. Key vars needed:

# .env.ecosystem — shared env for all services
COSMOS_ENDPOINT=https://cosmos-emulator:8081
COSMOS_KEY=<emulator-key>
COSMOS_DATABASE=bytelyst
JWT_SECRET=dev-ecosystem-secret-change-me
AZURE_BLOB_CONNECTION_STRING=DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=...;BlobEndpoint=http://azurite:10000/devstoreaccount1;
PLATFORM_SERVICE_URL=http://platform-service:4003
EXTRACTION_SERVICE_URL=http://extraction-service:4005
DB_PROVIDER=memory
NODE_ENV=production
CORS_ORIGIN=*
SMTP_HOST=mailpit
SMTP_PORT=1025
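
Since every service reads this file, a missing key tends to surface late as a runtime error in one container. The following sketch checks the file up front; check_env_file is a hypothetical helper, and the key list simply mirrors the variables listed above.

```shell
# Hypothetical check: verify that an env file defines every key listed in F9.
# Prints each missing key and returns non-zero if any is absent or empty.
check_env_file() {
  env_file="$1"
  missing=0
  for key in COSMOS_ENDPOINT COSMOS_KEY COSMOS_DATABASE JWT_SECRET \
             AZURE_BLOB_CONNECTION_STRING PLATFORM_SERVICE_URL \
             EXTRACTION_SERVICE_URL DB_PROVIDER NODE_ENV CORS_ORIGIN \
             SMTP_HOST SMTP_PORT; do
    # A key counts as set only if "KEY=" is followed by at least one character.
    if ! grep -Eq "^${key}=.+" "$env_file"; then
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_env_file .env.ecosystem before docker compose up makes an incomplete file fail fast instead of at first request.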

F10. host.docker.internal Only Works on Docker Desktop (Mac/Windows)

LocalMemGPT uses OLLAMA_URL: 'http://host.docker.internal:11434' — this works on Docker Desktop but not on Linux VMs (which is the likely deployment target).

Fix on Linux: Add extra_hosts: ['host.docker.internal:host-gateway'] to the service, or use network_mode: host.
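
The same mapping exists as a docker run flag, which is convenient for testing one container outside compose. The sketch below is an assumption-laden illustration: host_gateway_args is a hypothetical helper, it treats every Linux host as a plain Docker Engine install, and the host-gateway value requires Docker 20.10 or newer.

```shell
# Hypothetical helper: print the docker run flag needed for host.docker.internal.
# Docker Desktop (macOS/Windows) resolves the name already; a Linux engine needs
# an explicit mapping to the special host-gateway value.
host_gateway_args() {
  os="${1:-$(uname -s)}"
  if [ "$os" = "Linux" ]; then
    echo "--add-host=host.docker.internal:host-gateway"
  fi
}

# Example (not executed here): pass the flag through when starting the service.
# docker run $(host_gateway_args) -e OLLAMA_URL=http://host.docker.internal:11434 <image>
```

Ollama on the host must also listen on an address the container can reach; if it is bound to 127.0.0.1 only, the mapping resolves but the connection is refused.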

Summary of Required Work Before Compose Works

| Priority | Item | Count | Status |
| --- | --- | --- | --- |
| P0 | Create missing docker-prep.sh | 6 repos | Done (3 created, 3 already existed) |
| P0 | Create missing backend Dockerfiles | 6 repos | Done |
| P0 | Create missing web Dockerfiles | 5 repos | Done (4 created, PeakPulse has no web) |
| P0 | Add output: 'standalone' to next.config.ts | 4 webs | Done (ChronoMind, JarvisJr, FlowMonk, MindLyst) |
| P1 | Fix NomGap backend Dockerfile (add .tarballs/ COPY) | 1 file | Done |
| P1 | Fix NoteLett backend Dockerfile (explicit COPY, not .) | 1 file | Done |
| P1 | Create .env.ecosystem template | 1 file | Pending |
| P2 | Standardize Node.js version to 22-alpine | 4 Dockerfiles | Done (all new Dockerfiles use 22-alpine) |
| P2 | Add extra_hosts for Linux VM Ollama access | 1 service | Pending |

13. K8s & Docker Best Practices (from Production Comparisons)

Derived from comparing three production K8s deployments: a Go-based Call Controller (Paladin), a Python/FastAPI streaming agent platform (NetBond), and a Python/FastAPI voice agent (Welcome Agent). These patterns should be adopted when ByteLyst moves from Docker Compose → K3s → managed K8s.

13.1 Deployment — Zero-Downtime Rolling Updates

Do this (NetBond pattern):

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0 # Never kill a pod before its replacement is ready
      maxSurge: 1 # Only 1 extra pod during rollout
  template:
    spec:
      terminationGracePeriodSeconds: 45 # Match your app's drain timeout
      containers:
        - lifecycle:
            preStop:
              exec:
                command: ['sleep', '5'] # Let load balancer deregister before SIGTERM

Don't do this (Paladin anti-pattern):

maxUnavailable: 50% # Half your pods die instantly — users get errors
maxSurge: 50% # Allows up to 50% extra pods at once, a large resource spike during rollout

ByteLyst action: Every deployment template should use maxUnavailable: 0 + preStop sleep + explicit terminationGracePeriodSeconds matching the Fastify graceful shutdown timeout.

13.2 Pod Security Context

Always set (NetBond pattern):

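# Set on each container: allowPrivilegeEscalation and readOnlyRootFilesystem are not valid in the pod-level securityContext
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  runAsGroup: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true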

If the app needs writable paths (e.g., /tmp, cache dirs), use emptyDir volumes:

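# Under the pod spec (spec.template.spec):
volumes:
  - name: tmp
    emptyDir: {}
  - name: cache
    emptyDir: {}
# Under the container entry (spec.template.spec.containers[]):
volumeMounts:
  - name: tmp
    mountPath: /tmp
  - name: cache
    mountPath: /home/node/.cache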

ByteLyst action: All Fastify backends are stateless — readOnlyRootFilesystem: true works. Next.js standalone servers may need /tmp writable.

13.3 Health Probes — Dedicated Endpoints

Do this:

livenessProbe:
  httpGet:
    path: /health # Dedicated lightweight endpoint
    port: 4003
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5 # Fast fail — 5s max
readinessProbe:
  httpGet:
    path: /health
    port: 4003
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 5

Don't do this (Welcome Agent anti-pattern):

livenessProbe:
  httpGet:
    path: /openapi.json # Heavy endpoint, not a health check
  timeoutSeconds: 60 # Masks real failures for a full minute

ByteLyst action: All backends already expose GET /health → { status: "ok" }. Use it for both probes and set timeoutSeconds to 5.

13.4 Ingress — WebSocket Support

If any service uses WebSocket or SSE (FlowMonk SSE, LocalMemGPT streaming, future real-time features):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: '1800'
    nginx.ingress.kubernetes.io/proxy-send-timeout: '1800'
    nginx.ingress.kubernetes.io/proxy-buffering: 'off'
    nginx.ingress.kubernetes.io/proxy-http-version: '1.1'
    nginx.ingress.kubernetes.io/configuration-snippet: |
      proxy_set_header Upgrade $http_upgrade;
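      proxy_set_header Connection "upgrade";

Missing timeouts or upgrade headers fail silently: with the default 60s proxy read timeout, idle connections drop with no error. These annotations are specific to ingress-nginx; K3s ships Traefik as its default ingress controller, so they apply only when ingress-nginx is installed instead.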

13.5 HPA — Use autoscaling/v2

Do this:

apiVersion: autoscaling/v2 # Current API, supports multiple metrics

Don't do this:

apiVersion: autoscaling/v1 # CPU-only; cannot express memory, custom, or multiple metrics

13.6 Dockerfile Best Practices

| Practice | Do | Don't |
| --- | --- | --- |
| ENTRYPOINT form | ENTRYPOINT ["node", "dist/server.js"] (exec form) | ENTRYPOINT node dist/server.js (shell form: PID 1 is /bin/sh, signals broken) |
| COPY scope | COPY package.json ./ then COPY src/ ./src/ (selective) | COPY . . (copies node_modules, .git, tests, everything) |
| Layer count | Combine related RUN steps | 3 separate RUN pip install / RUN npm install steps |
| Non-root | USER node (Node.js images have a node user) | Running as root in production |
| Local variant | Provide local.Dockerfile without corp proxy/JFrog deps | Single Dockerfile that only works behind corporate proxy |
| Build args | ARG NODE_ENV=production for conditional behavior | Hardcoded env in Dockerfile |

13.7 Helm Values Layering

Use 3 layers for environment management:

values.yaml          # Base defaults (image, port, probes, resources)
├── env/local.yaml   # Local K3s overrides (lower resources, NodePort, no TLS)
├── env/dev.yaml     # Dev cluster overrides (replicas, hostnames, secrets)
└── env/prod.yaml    # Prod overrides (more replicas, real TLS, HPA limits)

Deploy with layered -f flags:

# Local
helm upgrade --install myapp ./charts -f charts/values.yaml -f charts/env/local.yaml

# Dev
helm upgrade --install myapp ./charts -f charts/values.yaml -f charts/env/dev.yaml

# Prod
helm upgrade --install myapp ./charts -f charts/values.yaml -f charts/env/prod.yaml

13.8 Namespace Strategy

Use Helm _helpers.tpl for namespace — never hardcode:

# ✅ Standard pattern — respects --namespace flag
{{ include "myapp.namespace" . }}

# ❌ Anti-pattern — ignores helm --namespace, causes confusion
{{ .Values.namespace }}

13.9 Secrets Management Progression

| Phase | Strategy | Complexity |
| --- | --- | --- |
| Phase 1 (Compose) | .env.ecosystem file (gitignored) | Trivial |
| Phase 2 (K3s) | Native K8s Secret objects + kubectl create secret | Low |
| Phase 3 (Production) | Azure Key Vault via SecretProviderClass CSI driver | Medium |
| Phase 4 (Enterprise) | AKV + AzureKeyVaultSecret CRD with auto-sync | High |

ByteLyst already uses AKV in production (platform-service) — the CSI driver pattern is the natural next step.

13.10 CI/CD Best Practices (Lessons from Production Pipelines)

| Practice | Description |
| --- | --- |
| Semantic release | Auto-version from commit messages (feat: → minor, fix: → patch). ByteLyst already uses this convention. |
| Image promotion | Build once → push to staging repo → promote to gold/prod repo (never rebuild for prod). |
| Branch pipelines | Different CI stages per branch: feature (lint+test), develop (build+deploy-dev), main (promote+deploy-prod). |
| Security gates | SAST + SCA scans on every build. Block merges on critical findings. |
| Quality gates | Unit tests + coverage + SonarQube. Fail pipeline if coverage drops. |
| Auto-deploy to dev | Pipeline trigger: when build completes → auto-deploy to dev. Manual gate for prod. |
| Chart versioning | Publish Helm chart to OCI registry (ACR) with semantic version. Pull by version during deploy. |
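
The semantic-release mapping in the table can be illustrated with a few lines of shell. This is only a sketch of the feat/fix rule stated there, not the logic of any release tool: bump_for_commit is a hypothetical name, and breaking-change markers (which would normally mean a major bump) are deliberately left out.

```shell
# Hypothetical helper: map a commit subject to the release bump it implies.
# Handles both "feat: ..." and scoped "feat(scope): ..." subjects.
bump_for_commit() {
  case "$1" in
    feat:*|feat\(*\):*) echo "minor" ;;
    fix:*|fix\(*\):*)   echo "patch" ;;
    *)                  echo "none" ;;   # docs:, chore:, etc. do not release
  esac
}

bump_for_commit "feat(auth): add magic-link login"
bump_for_commit "fix: handle empty extraction payload"
bump_for_commit "docs: update deployment guide"
```

The three calls print minor, patch, and none in that order.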

13.11 Local K8s Development Script Template

A good local K8s deploy script should handle both Docker Desktop Kubernetes (kubeadm by default, kind as an optional provisioner) and K3s:

#!/usr/bin/env bash
# deploy-local-k8s.sh — Full local K8s deployment for ByteLyst ecosystem
# Works with both Docker Desktop Kubernetes and K3s.

set -euo pipefail

NAMESPACE="bytelyst"
ACTION="${1:-deploy}"  # deploy | teardown

# Detect K8s runtime
detect_runtime() {
  local ctx
  ctx=$(kubectl config current-context 2>/dev/null || echo "")
  if [[ "$ctx" == "docker-desktop" ]]; then
    echo "docker-desktop"  # kind cluster inside Docker Desktop
  elif command -v k3s &>/dev/null; then
    echo "k3s"
  else
    echo "unknown"
  fi
}

case "$ACTION" in
  deploy)
    RUNTIME=$(detect_runtime)
    echo "Detected K8s runtime: $RUNTIME"

    # 1. Build all Docker images
    echo "Building images..."
    for svc in platform-service extraction-service mcp-server; do
      docker build -t bytelyst/$svc:local ./learning_ai_common_plat/services/$svc
    done

    # 2. Load images into K8s runtime
    if [[ "$RUNTIME" == "docker-desktop" ]]; then
      echo "Docker Desktop: images are already available to K8s (shared daemon)."
    elif [[ "$RUNTIME" == "k3s" ]]; then
      echo "K3s: importing images into containerd..."
      # Stream each image over a pipe; a <(...) path may not be readable by the command sudo starts
      for img in $(docker images --format '{{.Repository}}:{{.Tag}}' | grep '^bytelyst/'); do
        docker save "$img" | sudo k3s ctr images import -
      done
    else
      echo "WARNING: Unknown K8s runtime. You may need to load images manually."
    fi

    # 3. Create namespace + secrets
    kubectl create namespace "$NAMESPACE" --dry-run=client -o yaml | kubectl apply -f -
    kubectl create secret generic bytelyst-secrets \
      --from-env-file=.env.ecosystem \
      -n "$NAMESPACE" --dry-run=client -o yaml | kubectl apply -f -

    # 4. Deploy via Helm with local overlay
    helm upgrade --install bytelyst ./helm/bytelyst-ecosystem \
      -f helm/bytelyst-ecosystem/values.yaml \
      -f helm/bytelyst-ecosystem/env/local.yaml \
      -n "$NAMESPACE"

    # 5. Wait + verify
    kubectl rollout status deploy -n "$NAMESPACE" --timeout=120s
    echo ""
    echo "All pods:"
    kubectl get pods -n "$NAMESPACE"
    echo ""
    if [[ "$RUNTIME" == "docker-desktop" ]]; then
      echo "View in Docker Desktop: Kubernetes tab → namespace: $NAMESPACE"
    fi
    echo "Port-forward: kubectl port-forward svc/platform-service 4003:4003 -n $NAMESPACE"
    ;;

  teardown)
    helm uninstall bytelyst -n "$NAMESPACE" 2>/dev/null || true
    kubectl delete namespace "$NAMESPACE" 2>/dev/null || true
    echo "Teardown complete."
    ;;
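
  *)
    # Reject unknown actions instead of exiting silently
    echo "Usage: $0 [deploy|teardown]" >&2
    exit 1
    ;;
esac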

13.12 Quick Reference — What to Apply at Each Phase

Best Practice Phase 1 (Compose) Phase 2 (K3s) Phase 3 (Prod K8s)
Zero-downtime rolling update N/A Apply Apply
Pod security context N/A Apply Apply
Health probes N/A (use healthcheck:) Apply Apply
WebSocket ingress headers N/A If using SSE/WS Apply
HPA v2 N/A Optional Apply
Exec-form ENTRYPOINT Apply now
Selective COPY Apply now
Non-root user Apply now
Values layering N/A Apply Apply
Secrets via AKV CSI N/A N/A Apply
Semantic release Apply now
Image promotion N/A N/A Apply
Local deploy script N/A Apply Adapt

Summary

| Question | Answer |
| --- | --- |
| Can deploy on single VM? | Yes. All ~25 services fit in 32 GB RAM. |
| All Dockerized? | Yes. All 10 product repos now have Dockerfiles + docker-prep.sh. |
| Package-manager direction? | pnpm is the long-term standard for Node/TS repos, but migration is phased repo-by-repo, not big-bang. |
| K8s practice on single VM? | Docker Desktop K8s (Mac/Windows) or K3s (Linux). Same manifests scale to AKS/EKS/GKE. |
| Recommended VM? | 8 vCPU / 32 GB (min) or 16 vCPU / 64 GB (with Ollama). Hetzner ~$45/mo for dev. |
| Time to production K8s? | Phase 1 (compose) → Phase 2 (Docker Desktop / K3s) → Phase 3 (multi-node) → Phase 4 (managed). Same manifests. |