ByteLyst Ecosystem — Single-VM Deployment Guide

Deploy the entire ByteLyst ecosystem on one VM, fully Dockerized, with a K3s Kubernetes layer for production-readiness practice.


1. Service Inventory

Shared Infrastructure (common-plat)

| Service | Port | Image | RAM Est. |
| --- | --- | --- | --- |
| platform-service | 4003 | Fastify 5 + TS | ~200 MB |
| extraction-service | 4005 | Fastify 5 + Python sidecar | ~350 MB |
| mcp-server | 4007 | Fastify 5 + TS | ~150 MB |
| Cosmos DB Emulator | 8081, 1234 | mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:vnext-preview | ~2 GB |
| Azurite (blob) | 10000 | mcr.microsoft.com/azure-storage/azurite | ~100 MB |
| Mailpit (SMTP) | 1025, 8025 | axllent/mailpit | ~50 MB |
| Traefik (gateway) | 80, 8080 | traefik:v3.3 | ~100 MB |
| Loki (logs) | 3100 | grafana/loki | ~200 MB |
| Grafana (dashboards) | 3000 | grafana/grafana | ~200 MB |

Product Backends (Fastify 5 + TypeScript)

| Product | Port | RAM Est. |
| --- | --- | --- |
| LysnrAI backend | 4015 | ~150 MB |
| MindLyst backend | 4014 | ~150 MB |
| ChronoMind backend | 4011 | ~150 MB |
| JarvisJr backend | 4012 | ~150 MB |
| NomGap backend | 4013 | ~150 MB |
| PeakPulse backend | 4010 | ~150 MB |
| FlowMonk backend | 4017 | ~150 MB |
| NoteLett backend | 4016 | ~150 MB |
| ActionTrail backend | 4018 | ~150 MB |
| LocalMemGPT backend | 4019 | ~150 MB |

Web Dashboards (Next.js 16)

| Dashboard | Default Port | Compose Port | RAM Est. | Notes |
| --- | --- | --- | --- | --- |
| admin-web | 3000 | 3001 | ~250 MB | No port in package.json; must set PORT=3001 env |
| user-dashboard-web | 3002 | 3002 | ~250 MB | Port set in package.json |
| tracker-web | 3003 | 3003 | ~200 MB | Port set in package.json |
| NomGap web | 3040 | 3040 | ~200 MB | Port set in Dockerfile |
| ChronoMind web | 3000 | 3051 | ~200 MB | No port override; must set PORT env |
| JarvisJr web | 3000 | 3052 | ~200 MB | No port override; must set PORT env |
| FlowMonk web | 3000 | 3053 | ~200 MB | No port override; must set PORT env |
| NoteLett web | 3000 | 3054 | ~200 MB | Dockerfile EXPOSE 3000; remap in compose |
| ActionTrail web | 3000 | 3060 | ~200 MB | Dockerfile EXPOSE 3000; remap in compose |
| LocalMemGPT web | 3070 | 3070 | ~200 MB | Port set in package.json + Dockerfile |
| MindLyst web | 3050 | 3050 | ~200 MB | Port set in package.json (-p 3050) |

Port conflict warning: Grafana uses port 3000. admin-web, ChronoMind, JarvisJr, FlowMonk, NoteLett, and ActionTrail webs all default to 3000. The compose file must either set PORT env var or remap via ports: mapping.
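
One way to catch a clash early is to scan the compose file for host ports that are mapped more than once. The sketch below assumes the single-quoted 'host:container' style used in this guide; `find_duplicate_host_ports` is a hypothetical helper name, not an existing script in any repo.

```shell
# find_duplicate_host_ports: read a compose file on stdin and print every
# host port (left side of 'host:container') that is mapped more than once.
# Assumption: ports are written as single-quoted 'host:container' pairs.
find_duplicate_host_ports() {
  grep -oE "'[0-9]+:[0-9]+'" | tr -d "'" | cut -d: -f1 | sort -n | uniq -d
}

# Usage (not run here):
#   find_duplicate_host_ports < docker-compose.ecosystem.yml
```

An empty result means no two services claim the same host port. It does not check that each app actually listens on the container port it is mapped to, so the PORT env settings still need a manual look.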

Optional / AI

| Service | Port | RAM Est. |
| --- | --- | --- |
| Ollama (LLM) | 11434 | 4-16 GB (model-dependent) |

2. VM Sizing

Minimum (dev/staging, no Ollama)

| Spec | Value |
| --- | --- |
| vCPUs | 8 |
| RAM | 32 GB |
| Disk | 100 GB SSD |
| OS | Ubuntu 24.04 LTS |

Breakdown:

  • Cosmos Emulator: ~2 GB
  • 10 Fastify backends × 150 MB = ~1.5 GB
  • 3 shared services × 250 MB = ~0.75 GB
  • 10 Next.js webs × 200 MB = ~2 GB
  • Infra (Traefik, Loki, Grafana, Azurite, Mailpit) = ~0.65 GB
  • K3s overhead = ~0.5 GB
  • Subtotal: ~7.4 GB → headroom for spikes + build cache = 32 GB
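
The subtotal can be re-derived with plain shell arithmetic, which makes it easy to re-run when a service is added or an estimate changes. The figures are the per-item estimates from the breakdown above, in MB; the variable names are only illustrative.

```shell
# RAM estimates in MB, copied from the breakdown above
cosmos=2000
backends=$((10 * 150))   # 10 Fastify backends
shared=$((3 * 250))      # 3 shared services
webs=$((10 * 200))       # 10 Next.js webs
infra=650                # Traefik, Loki, Grafana, Azurite, Mailpit
k3s=500                  # K3s overhead

total=$((cosmos + backends + shared + webs + infra + k3s))
echo "Estimated steady-state RAM: ${total} MB"
# prints: Estimated steady-state RAM: 7400 MB
```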

Recommended (with Ollama)

| Spec | Value |
| --- | --- |
| vCPUs | 16 |
| RAM | 64 GB |
| Disk | 200 GB NVMe SSD |
| GPU | Optional NVIDIA T4/A10 for fast LLM inference |
| OS | Ubuntu 24.04 LTS |

Cloud Equivalents

| Provider | Instance | vCPU | RAM | Price (approx) |
| --- | --- | --- | --- | --- |
| Azure | Standard_D8s_v5 | 8 | 32 GB | ~$280/mo |
| Azure | Standard_D16s_v5 | 16 | 64 GB | ~$560/mo |
| AWS | m6i.2xlarge | 8 | 32 GB | ~$280/mo |
| AWS | m6i.4xlarge | 16 | 64 GB | ~$560/mo |
| Hetzner | CPX51 | 16 | 32 GB | ~$45/mo |
| Hetzner | CCX63 | 48 | 192 GB | ~$230/mo |
| Home | Mac Mini M4 Pro | 12 | 48 GB | One-time ~$1,600 |

Cost tip: Hetzner is 5-10× cheaper than Azure/AWS for dev/staging.


3. Architecture: Docker Compose → K3s Migration Path

Phase 1: Docker Compose (after prerequisite work)

⚠️ Prerequisite: 6 repos need Dockerfiles created, 3 webs need output: 'standalone' in next.config.ts, and ALL product repos must run docker-prep.sh before building (see §12 Audit Findings).

Create a unified docker-compose.ecosystem.yml that brings everything up.

Phase 2: K3s (single-node Kubernetes)

K3s is a lightweight, certified Kubernetes distro that runs on a single node. It gives you real kubectl, Helm, Ingress, and CRDs — identical APIs to production EKS/AKS/GKE.

Why K3s over minikube/kind?

  • Production-grade (CNCF-certified Kubernetes distribution, originally developed by Rancher)
  • Single binary, ~70 MB, installs in 30 seconds
  • Built-in Traefik Ingress (you already use Traefik!)
  • Built-in local-path StorageClass
  • Runs as systemd service (survives reboot)
  • Can scale to multi-node later by just joining worker nodes

4. Implementation Plan

4.1 Phase 1 — Unified Docker Compose

Create docker-compose.ecosystem.yml at workspace root (~/code/mygh/) that composes all services:

⚠️ Critical prerequisite — run BEFORE docker compose build:

# Pack @bytelyst/* file: dependencies into tarballs for each product repo.
# Every product repo has file: refs to ../learning_ai_common_plat/packages/*
# which don't resolve inside Docker build context. docker-prep.sh packs them.
for repo in learning_ai_trails learning_ai_local_memory_gpt learning_ai_notes learning_ai_fastgap; do
  (cd "$repo" && ./scripts/docker-prep.sh)
done
# Repos without docker-prep.sh yet need it created (see §12 Audit Findings)

# ~/code/mygh/docker-compose.ecosystem.yml
# NOTE: All product backends/webs have file: deps to @bytelyst/* packages.
# You MUST run docker-prep.sh for each repo first (see above).

services:
  # ══════════════════════════════════════════════════════
  # INFRASTRUCTURE
  # ══════════════════════════════════════════════════════
  cosmos-emulator:
    image: mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:vnext-preview
    ports: ['8081:8081', '1234:1234']
    environment:
      PROTOCOL: http
      ENABLE_EXPLORER: 'true'
    restart: unless-stopped

  azurite:
    image: mcr.microsoft.com/azure-storage/azurite:3.35.0
    command: azurite-blob --blobHost 0.0.0.0 --blobPort 10000 --skipApiVersionCheck
    ports: ['10000:10000']
    volumes: [azurite-data:/data]
    restart: unless-stopped

  mailpit:
    image: axllent/mailpit:v1.27.5
    ports: ['1025:1025', '8025:8025']
    restart: unless-stopped

  traefik:
    image: traefik:v3.3
    command:
      - '--api.insecure=true'
      - '--providers.docker=true'
      - '--providers.docker.exposedbydefault=false'
      - '--entrypoints.web.address=:80'
    ports: ['80:80', '8080:8080']
    volumes: ['/var/run/docker.sock:/var/run/docker.sock:ro']
    restart: unless-stopped

  loki:
    image: grafana/loki:3.3.2
    ports: ['3100:3100']
    volumes: [loki-data:/loki]
    restart: unless-stopped

  grafana:
    image: grafana/grafana:11.4.0
    ports: ['3000:3000'] # NOTE: many Next.js webs also default to 3000 — avoid conflicts
    environment:
      GF_SECURITY_ADMIN_USER: admin
      GF_SECURITY_ADMIN_PASSWORD: lysnrai
    volumes: [grafana-data:/var/lib/grafana]
    restart: unless-stopped

  # ══════════════════════════════════════════════════════
  # SHARED SERVICES (common-plat — no file: deps, pnpm workspace handles it)
  # ══════════════════════════════════════════════════════
  platform-service:
    build:
      context: ./learning_ai_common_plat
      dockerfile: services/platform-service/Dockerfile
    ports: ['4003:4003']
    env_file: [.env.ecosystem]
    environment:
      PORT: 4003
      COSMOS_AUTO_INIT: 'true'
    depends_on: [cosmos-emulator, azurite, mailpit]
    labels:
      - 'traefik.enable=true'
      - 'traefik.http.routers.platform.rule=Host(`platform.local`)'
      - 'traefik.http.services.platform.loadbalancer.server.port=4003'
    restart: unless-stopped

  extraction-service:
    build:
      context: ./learning_ai_common_plat
      dockerfile: services/extraction-service/Dockerfile
    ports: ['4005:4005']
    env_file: [.env.ecosystem]
    environment:
      PORT: 4005
    depends_on: [cosmos-emulator]
    restart: unless-stopped

  mcp-server:
    build:
      context: ./learning_ai_common_plat
      dockerfile: services/mcp-server/Dockerfile
    ports: ['4007:4007']
    env_file: [.env.ecosystem]
    environment:
      PORT: 4007
      PLATFORM_SERVICE_URL: http://platform-service:4003
      EXTRACTION_SERVICE_URL: http://extraction-service:4005
    depends_on: [platform-service, extraction-service]
    restart: unless-stopped

  # ══════════════════════════════════════════════════════
  # PRODUCT BACKENDS
  # All have file: deps → must run docker-prep.sh first.
  # ActionTrail + LocalMemGPT Dockerfiles use repo-root context.
  # Others use backend/ subdir context.
  # ══════════════════════════════════════════════════════
  lysnrai-backend:
    build: ./learning_voice_ai_agent/backend # Needs Dockerfile (missing)
    ports: ['4015:4015']
    env_file: [.env.ecosystem]
    environment: { PORT: '4015', SERVICE_NAME: lysnrai-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  mindlyst-backend:
    build: ./learning_multimodal_memory_agents/backend # Needs Dockerfile (missing)
    ports: ['4014:4014']
    env_file: [.env.ecosystem]
    environment: { PORT: '4014', SERVICE_NAME: mindlyst-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  chronomind-backend:
    build: ./learning_ai_clock/backend # Needs Dockerfile (missing)
    ports: ['4011:4011']
    env_file: [.env.ecosystem]
    environment: { PORT: '4011', SERVICE_NAME: chronomind-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  jarvisjr-backend:
    build: ./learning_ai_jarvis_jr/backend # Needs Dockerfile (missing)
    ports: ['4012:4012']
    env_file: [.env.ecosystem]
    environment: { PORT: '4012', SERVICE_NAME: jarvisjr-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  nomgap-backend:
    build: ./learning_ai_fastgap/backend
    ports: ['4013:4013']
    env_file: [.env.ecosystem]
    environment: { PORT: '4013', SERVICE_NAME: nomgap-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  peakpulse-backend:
    build: ./learning_ai_peakpulse/backend # Needs Dockerfile (missing)
    ports: ['4010:4010']
    env_file: [.env.ecosystem]
    environment: { PORT: '4010', SERVICE_NAME: peakpulse-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  flowmonk-backend:
    build: ./learning_ai_flowmonk/backend # Needs Dockerfile (missing)
    ports: ['4017:4017']
    env_file: [.env.ecosystem]
    environment: { PORT: '4017', SERVICE_NAME: flowmonk-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  notelett-backend:
    build: ./learning_ai_notes/backend
    ports: ['4016:4016']
    env_file: [.env.ecosystem]
    environment: { PORT: '4016', SERVICE_NAME: notelett-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  actiontrail-backend:
    build:
      context: ./learning_ai_trails # Dockerfile expects repo-root context
      dockerfile: backend/Dockerfile
    ports: ['4018:4018']
    env_file: [.env.ecosystem]
    environment: { PORT: '4018', SERVICE_NAME: actiontrail-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  localmemgpt-backend:
    build:
      context: ./learning_ai_local_memory_gpt # Dockerfile expects repo-root context
      dockerfile: backend/Dockerfile
    ports: ['4019:4019']
    env_file: [.env.ecosystem]
    environment: { PORT: '4019', OLLAMA_URL: 'http://host.docker.internal:11434' }
    volumes: [localmemgpt-data:/app/db]
    restart: unless-stopped

  # ══════════════════════════════════════════════════════
  # WEB DASHBOARDS
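  # IMPORTANT: Most webs default to port 3000 internally.
  # Use PORT env var to override, or remap via host:container ports.
  # NOTE: NEXT_PUBLIC_* values are inlined into the client bundle at build time
  # and are read by the browser, so the Docker-internal hostnames below only
  # help server-side code. Browser calls need a host-reachable URL at build time.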
  # ══════════════════════════════════════════════════════
  admin-web:
    build: ./learning_ai_common_plat/dashboards/admin-web
    ports: ['3001:3001']
    env_file: [.env.ecosystem]
    environment:
      PORT: 3001 # admin-web has NO port override — defaults to 3000 without this!
    depends_on: [platform-service]
    restart: unless-stopped

  user-dashboard:
    build: ./learning_voice_ai_agent/user-dashboard-web
    ports: ['3002:3002']
    env_file: [.env.ecosystem]
    depends_on: [lysnrai-backend]
    restart: unless-stopped

  tracker-web:
    build: ./learning_ai_common_plat/dashboards/tracker-web
    ports: ['3003:3003']
    env_file: [.env.ecosystem]
    depends_on: [platform-service]
    restart: unless-stopped

  nomgap-web:
    build: ./learning_ai_fastgap/web
    ports: ['3040:3040']
    environment:
      PORT: 3040
      NEXT_PUBLIC_NOMGAP_API_URL: http://nomgap-backend:4013/api
      NEXT_PUBLIC_PLATFORM_SERVICE_URL: http://platform-service:4003/api
    depends_on: [nomgap-backend]
    restart: unless-stopped

  actiontrail-web:
    build: ./learning_ai_trails/web
    ports: ['3060:3000'] # Internal 3000 → external 3060
    environment:
      NEXT_PUBLIC_API_URL: http://actiontrail-backend:4018
    depends_on: [actiontrail-backend]
    restart: unless-stopped

  localmemgpt-web:
    build:
      context: ./learning_ai_local_memory_gpt # Dockerfile expects repo-root context
      dockerfile: web/Dockerfile
    ports: ['3070:3070']
    environment:
      NEXT_PUBLIC_BACKEND_URL: http://localmemgpt-backend:4019
    depends_on: [localmemgpt-backend]
    restart: unless-stopped

  notelett-web:
    build: ./learning_ai_notes/web
    ports: ['3054:3000'] # Internal 3000 → external 3054
    environment:
      NEXT_PUBLIC_BACKEND_URL: http://notelett-backend:4016
    depends_on: [notelett-backend]
    restart: unless-stopped

  # Remaining webs need Dockerfiles + output:'standalone' in next.config.ts:
  # chronomind-web (3051), jarvisjr-web (3052), flowmonk-web (3053), mindlyst-web (3050)

volumes:
  azurite-data:
  loki-data:
  grafana-data:
  localmemgpt-data:

4.2 Phase 2 — K3s (Single-Node Kubernetes)

Install K3s on the VM

# Install K3s (30 seconds, includes kubectl + containerd)
curl -sfL https://get.k3s.io | sh -

# Verify
sudo kubectl get nodes
# NAME       STATUS   ROLES                  AGE   VERSION
# myvm       Ready    control-plane,master   30s   v1.30.x+k3s1

# Copy kubeconfig for non-root usage
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $(id -u):$(id -g) ~/.kube/config

# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

Namespace Layout

kubectl create namespace bytelyst-infra      # Cosmos, Azurite, Mailpit, Loki, Grafana
kubectl create namespace bytelyst-platform   # platform-service, extraction, mcp
kubectl create namespace bytelyst-products   # 10 product backends
kubectl create namespace bytelyst-web        # All Next.js dashboards
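
`kubectl create namespace` fails if the namespace already exists, which gets in the way when a deploy script is re-run. A minimal idempotent variant, using the same `--dry-run=client -o yaml | kubectl apply -f -` pattern that Exercise 3 uses for secrets, could look like this. `ensure_namespaces` is a hypothetical helper name.

```shell
# Namespaces from the layout above
namespaces="bytelyst-infra bytelyst-platform bytelyst-products bytelyst-web"

# ensure_namespaces: create each namespace if missing, do nothing if present.
ensure_namespaces() {
  for ns in $namespaces; do
    # --dry-run=client renders the manifest locally; apply makes the call idempotent
    kubectl create namespace "$ns" --dry-run=client -o yaml | kubectl apply -f -
  done
}

# Usage (not run here):
#   ensure_namespaces
```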

Example K8s Manifest (one backend)

# k8s/products/lysnrai-backend.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lysnrai-backend
  namespace: bytelyst-products
  labels:
    app: lysnrai-backend
    product: lysnrai
spec:
  replicas: 1 # Scale to 2+ when ready
  selector:
    matchLabels:
      app: lysnrai-backend
  template:
    metadata:
      labels:
        app: lysnrai-backend
    spec:
      containers:
        - name: lysnrai-backend
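          image: bytelyst/lysnrai-backend:latest
          # Image is imported locally, not pushed to a registry. Without this,
          # the :latest tag defaults to imagePullPolicy Always and the pull fails.
          imagePullPolicy: IfNotPresent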
          ports:
            - containerPort: 4015
          envFrom:
            - configMapRef:
                name: bytelyst-common-config
            - secretRef:
                name: bytelyst-secrets
          env:
            - name: PORT
              value: '4015'
            - name: SERVICE_NAME
              value: lysnrai-backend
          resources:
            requests:
              memory: '128Mi'
              cpu: '100m'
            limits:
              memory: '256Mi'
              cpu: '500m'
          livenessProbe:
            httpGet:
              path: /health
              port: 4015
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 4015
            initialDelaySeconds: 5
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: lysnrai-backend
  namespace: bytelyst-products
spec:
  selector:
    app: lysnrai-backend
  ports:
    - port: 4015
      targetPort: 4015

Ingress (Traefik, built into K3s)

# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: bytelyst-ingress
  namespace: bytelyst-products
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
  rules:
    - host: lysnrai.local
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: lysnrai-backend
                port:
                  number: 4015
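    # ... repeat per product (all product Services live in bytelyst-products)
---
# An Ingress can only route to Services in its own namespace, so
# platform-service gets a separate Ingress in bytelyst-platform.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: bytelyst-platform-ingress
  namespace: bytelyst-platform
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
  rules:
    - host: platform.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: platform-service
                port:
                  number: 4003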

5. Docker Compose → K3s Migration Cheat Sheet

| Docker Compose | K3s Equivalent |
| --- | --- |
| services: | Deployment + Service |
| ports: | Service (ClusterIP/NodePort) |
| env_file: | ConfigMap + Secret |
| depends_on: | initContainers or readiness probes |
| volumes: | PersistentVolumeClaim (local-path) |
| restart: unless-stopped | Built-in (K8s always restarts pods) |
| labels: traefik.* | Ingress resource |
| docker compose up | kubectl apply -k k8s/ |
| docker compose logs | kubectl logs -f deploy/X or Loki/Grafana |
| docker compose ps | kubectl get pods -A |
| Scale: change nothing | kubectl scale deploy/X --replicas=3 |
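
For the `env_file:` row, the single `.env.ecosystem` file has to be split, because secrets and plain config go into different Kubernetes objects. The sketch below is one possible way to do that. The rule that any key containing KEY, SECRET, PASSWORD, or CONNECTION_STRING is a secret is an assumption made for illustration, and `split_env` is a hypothetical helper.

```shell
# split_env FILE OUTDIR: write secret-looking keys to OUTDIR/secret.env
# and everything else to OUTDIR/config.env.
split_env() {
  src="$1"
  out="$2"
  # Assumed naming rule for secrets (adjust to the real variable names)
  pattern='^[A-Z0-9_]*(KEY|SECRET|PASSWORD|CONNECTION_STRING)[A-Z0-9_]*='
  mkdir -p "$out"
  # Drop comments and blank lines first
  grep -vE '^[[:space:]]*(#|$)' "$src" > "$out/all.env" || true
  # grep exits 1 when nothing matches, which is fine here
  grep -E  "$pattern" "$out/all.env" > "$out/secret.env" || true
  grep -vE "$pattern" "$out/all.env" > "$out/config.env" || true
  rm -f "$out/all.env"
}

# Usage (not run here):
#   split_env .env.ecosystem /tmp/bytelyst-env
#   kubectl create configmap bytelyst-common-config \
#     --from-env-file=/tmp/bytelyst-env/config.env -n bytelyst-products
#   kubectl create secret generic bytelyst-secrets \
#     --from-env-file=/tmp/bytelyst-env/secret.env -n bytelyst-products
```

ConfigMaps and Secrets are namespaced, so the two `kubectl create` calls would need repeating for every namespace whose Deployments reference them.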

6. K3s Practice Exercises (on single VM)

These exercises simulate real production scenarios:

Exercise 1: Rolling Update

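# Build new image, import it into K3s containerd, then deploy with zero downtime
docker build -t bytelyst/lysnrai-backend:v2 ./learning_voice_ai_agent/backend
docker save bytelyst/lysnrai-backend:v2 | sudo k3s ctr images import -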
kubectl set image deploy/lysnrai-backend lysnrai-backend=bytelyst/lysnrai-backend:v2 -n bytelyst-products
kubectl rollout status deploy/lysnrai-backend -n bytelyst-products
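
A natural follow-up to this exercise is practicing the failure path. The sketch below wraps the same commands so that a rollout that does not become ready within a timeout is rolled back automatically. `deploy_or_rollback` and the 120-second timeout are illustrative choices, not part of the existing scripts.

```shell
# deploy_or_rollback NAME IMAGE NAMESPACE
# Set a new image, wait for the rollout, and undo it if it does not complete.
# Assumes the container has the same name as the Deployment, as in the
# example manifest above.
deploy_or_rollback() {
  name="$1"
  image="$2"
  ns="$3"
  kubectl set image "deploy/$name" "$name=$image" -n "$ns" || return 1
  if ! kubectl rollout status "deploy/$name" -n "$ns" --timeout=120s; then
    echo "rollout failed, reverting $name" >&2
    kubectl rollout undo "deploy/$name" -n "$ns"
    return 1
  fi
}

# Usage (not run here):
#   deploy_or_rollback lysnrai-backend bytelyst/lysnrai-backend:v2 bytelyst-products
```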

Exercise 2: Scale Horizontally

kubectl scale deploy/platform-service --replicas=3 -n bytelyst-platform
# Traefik auto-balances across all 3 pods

Exercise 3: ConfigMap / Secret Rotation

kubectl create secret generic bytelyst-secrets \
  --from-literal=JWT_SECRET=new-secret \
  --from-literal=COSMOS_KEY=new-key \
  -n bytelyst-platform --dry-run=client -o yaml | kubectl apply -f -
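kubectl rollout restart deploy -n bytelyst-platform
# Secrets are namespaced: repeat both commands for bytelyst-products,
# whose Deployments also reference bytelyst-secrets.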

Exercise 4: Resource Limits + HPA

# Auto-scale platform-service 1→5 pods based on CPU
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: platform-service-hpa
  namespace: bytelyst-platform
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: platform-service
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Exercise 5: Helm Chart (packaged deploy)

# Create chart scaffold
helm create bytelyst-ecosystem
# Templatize all 25+ services into one chart
# Deploy: helm install bytelyst ./bytelyst-ecosystem -n bytelyst --create-namespace

7. Scaling Path: Single VM → Multi-Node

Phase 1: Docker Compose          Phase 2: K3s (1 node)
┌─────────────────────┐          ┌──────────────────────┐
│  Single VM           │    →     │  Single VM + K3s     │
│  docker compose up   │          │  kubectl apply -k    │
│  ~25 containers      │          │  ~25 pods            │
└─────────────────────┘          └──────────────────────┘
                                          │
                                          ▼
Phase 3: K3s (3 nodes)           Phase 4: Managed K8s
┌──────────────────────┐         ┌──────────────────────┐
│  1 server + 2 agents │    →    │  AKS / EKS / GKE     │
│  Same manifests!     │         │  Same manifests!      │
│  Real HA             │         │  Auto-scaling nodes   │
└──────────────────────┘         └──────────────────────┘

Adding a worker node to K3s is one command:

# On the server, read the join token:
#   sudo cat /var/lib/rancher/k3s/server/node-token
# On the worker VM:
curl -sfL https://get.k3s.io | K3S_URL=https://server-ip:6443 K3S_TOKEN=<token> sh -

8. Directory Layout

~/code/mygh/
├── docker-compose.ecosystem.yml     # Phase 1: all-in-one compose
├── .env.ecosystem                   # Shared env vars
├── k8s/                             # Phase 2: K3s manifests
│   ├── kustomization.yaml           # Kustomize root
│   ├── infra/                       # Cosmos emulator, Azurite, Mailpit, Loki, Grafana
│   ├── platform/                    # platform-service, extraction, mcp
│   ├── products/                    # 10 product backends
│   ├── web/                         # 10+ Next.js dashboards
│   ├── config/                      # ConfigMaps
│   └── secrets/                     # Secrets (gitignored)
├── helm/                            # Phase 3: Helm chart
│   └── bytelyst-ecosystem/
│       ├── Chart.yaml
│       ├── values.yaml
│       └── templates/
└── scripts/
    ├── ecosystem-up.sh              # docker compose -f docker-compose.ecosystem.yml up -d
    ├── ecosystem-k3s-deploy.sh      # kubectl apply -k k8s/
    └── ecosystem-build-all.sh       # Build all Docker images
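
`kubectl apply -k k8s/` only works if `k8s/kustomization.yaml` exists, and every subdirectory it lists needs its own `kustomization.yaml` as well. The sketch below generates both levels from whatever manifests are present. It is an assumption about how the tree could be wired, not the existing setup; `write_kustomizations` is a hypothetical helper, and `secrets/` is left out on purpose because it is gitignored.

```shell
# write_kustomizations ROOT: create ROOT/kustomization.yaml listing the
# subdirectories, plus one kustomization.yaml per subdirectory listing its
# *.yaml manifests.
write_kustomizations() {
  root="$1"
  dirs="infra platform products web config"
  mkdir -p "$root"
  {
    echo "apiVersion: kustomize.config.k8s.io/v1beta1"
    echo "kind: Kustomization"
    echo "resources:"
    for d in $dirs; do echo "  - $d"; done
  } > "$root/kustomization.yaml"
  for d in $dirs; do
    mkdir -p "$root/$d"
    {
      echo "apiVersion: kustomize.config.k8s.io/v1beta1"
      echo "kind: Kustomization"
      echo "resources:"
      for f in "$root/$d"/*.yaml; do
        [ -e "$f" ] || continue
        # Skip the file being generated
        [ "$(basename "$f")" = "kustomization.yaml" ] && continue
        echo "  - $(basename "$f")"
      done
    } > "$root/$d/kustomization.yaml"
  done
}

# Usage (not run here):
#   write_kustomizations k8s && kubectl apply -k k8s/
```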

9. Quick Start Commands

# ── Phase 1: Docker Compose ───────────────────────────
cd ~/code/mygh

# Build all images (first time, ~15-20 min)
docker compose -f docker-compose.ecosystem.yml build

# Start everything
docker compose -f docker-compose.ecosystem.yml up -d

# Check status
docker compose -f docker-compose.ecosystem.yml ps

# View logs
docker compose -f docker-compose.ecosystem.yml logs -f platform-service

# Tear down
docker compose -f docker-compose.ecosystem.yml down

# ── Phase 2: K3s ──────────────────────────────────────
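# Build + load images into K3s containerd
# (same context and Dockerfile as the compose file: the common-plat repo root)
docker build -t bytelyst/platform-service:latest \
  -f learning_ai_common_plat/services/platform-service/Dockerfile ./learning_ai_common_plat
# Pipe instead of <(...): sudo may close the file descriptor that process substitution relies on
docker save bytelyst/platform-service:latest | sudo k3s ctr images import -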

# Deploy all
kubectl apply -k k8s/

# Check pods
kubectl get pods -A

# Port-forward for local access
kubectl port-forward svc/platform-service 4003:4003 -n bytelyst-platform

10. What's NOT Dockerized Yet (gaps)

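| Repo | Backend Dockerfile | Web Dockerfile | docker-prep.sh | output:'standalone' | Status |
| --- | --- | --- | --- | --- | --- |
| LysnrAI | No | Yes (user-dashboard) | No | Yes (conditional) | Need backend Dockerfile + docker-prep.sh |
| MindLyst | No | No | No | No | Need all 4 |
| ChronoMind | No | No | No | No | Need all 4 |
| JarvisJr | No | No | No | No | Need all 4 |
| PeakPulse | No | No | No | No | Need all 4 |
| FlowMonk | No | No | No | No | Need all 4 |
| NomGap | Yes | Yes | Yes | Yes | ⚠️ Backend Dockerfile ignores file: deps (see §12.F3) |
| NoteLett | Yes | Yes | Yes | Yes | ⚠️ Backend Dockerfile COPY . pulls broken symlinks (see §12.F4) |
| ActionTrail | Yes | Yes | Yes | Yes | Ready (uses .tarballs/ pattern) |
| LocalMemGPT | Yes | Yes | Yes | Yes | Ready (repo-root build context) |
| admin-web | N/A | Yes (in common-plat) | N/A (pnpm) | Yes (conditional) | Ready |
| tracker-web | N/A | Yes (in common-plat) | N/A (pnpm) | Yes (conditional) | Ready |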
6 repos need Dockerfiles + docker-prep.sh + output:'standalone'. 2 existing Dockerfiles have issues.


11. Dockerfile Template (for missing repos)

Critical: These templates assume you run docker-prep.sh first to pack @bytelyst/* file: deps into .tarballs/. Without this, npm ci will fail because file:../../learning_ai_common_plat/packages/* doesn't exist inside the Docker build context.

Backend (Fastify 5 + TypeScript)

# Pre-requisite: run ./scripts/docker-prep.sh to pack @bytelyst/* tarballs
FROM node:22-alpine AS builder
WORKDIR /app

COPY package.json package-lock.json ./
COPY .tarballs/ ./.tarballs/
RUN npm ci --ignore-scripts

COPY tsconfig.json ./
COPY src/ ./src/
RUN npx tsc

# Production stage
FROM node:22-alpine
WORKDIR /app
ENV NODE_ENV=production

COPY package.json package-lock.json ./
COPY .tarballs/ ./.tarballs/
RUN npm ci --omit=dev --ignore-scripts

COPY --from=builder /app/dist ./dist
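# Uncomment if the backend reads shared/product.json at runtime.
# COPY has no optional mode: shell syntax such as "2>/dev/null || true" is not
# valid in a Dockerfile, and shared/ must exist inside the build context.
# COPY shared/ ./shared/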

EXPOSE ${PORT:-4010}
CMD ["node", "dist/server.js"]

Web (Next.js 16)

Prerequisite: next.config.ts MUST have output: 'standalone' for the standalone Dockerfile pattern to work. Without it, .next/standalone/ won't be generated and the COPY will fail.

# Pre-requisite: run ./scripts/docker-prep.sh to pack @bytelyst/* tarballs
FROM node:22-alpine AS builder
WORKDIR /app

COPY package.json package-lock.json ./
COPY .tarballs/ ./.tarballs/
RUN npm ci

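COPY . .
# Ensure public/ exists so the COPY in the runtime stage cannot fail
RUN mkdir -p public

# Dummy env vars for Next.js build-time static page collection.
# NEXT_PUBLIC_* values are inlined into the client bundle at build time,
# so pass real values here if the browser must reach a different URL.
ENV NEXT_PUBLIC_BACKEND_URL=http://localhost:4010
ENV NEXT_PUBLIC_PLATFORM_SERVICE_URL=http://localhost:4003

RUN npm run build

FROM node:22-alpine
WORKDIR /app
ENV NODE_ENV=production

COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
# COPY has no optional mode; "2>/dev/null || true" is not valid Dockerfile syntax
COPY --from=builder /app/public ./public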

EXPOSE 3000
CMD ["node", "server.js"]

docker-prep.sh (for repos that don't have one yet)

Copy from learning_ai_trails/scripts/docker-prep.sh — it handles both backend/ and web/ targets, packs all file: refs into .tarballs/, and rewrites package.json to point at them.

cp learning_ai_trails/scripts/docker-prep.sh <target-repo>/scripts/docker-prep.sh
chmod +x <target-repo>/scripts/docker-prep.sh

12. Audit Findings (Review 2026-03-22)

Systematic code review of all claims in this document against the actual codebase.

F1. Port Conflicts (CRITICAL)

Grafana uses port 3000. The following webs also default to 3000:

  • admin-web (no port in package.json)
  • ChronoMind web (no port override)
  • JarvisJr web (no port override)
  • FlowMonk web (no port override)
  • NoteLett web (Dockerfile EXPOSE 3000)
  • ActionTrail web (Dockerfile EXPOSE 3000)

Fix: Set PORT env var in compose for each, or use host:container port remapping.

F2. file: Dependencies Break Docker Builds (CRITICAL)

Every product backend and web has file:../../learning_ai_common_plat/packages/* dependencies in package.json. These resolve locally via symlinks but fail inside Docker because the sibling repo isn't in the build context.

Pattern: Each repo needs a docker-prep.sh that:

  1. Runs pnpm build in common-plat
  2. Packs each @bytelyst/* package into a .tarballs/*.tgz
  3. Rewrites package.json file: refs → file:.tarballs/bytelyst-*.tgz

Repos with docker-prep.sh: ActionTrail, LocalMemGPT, NoteLett, NomGap

Repos missing docker-prep.sh: LysnrAI, MindLyst, ChronoMind, JarvisJr, PeakPulse, FlowMonk
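
To make step 3 of the pattern concrete, here is a minimal sketch of the package.json rewrite only. It is not the contents of the real docker-prep.sh. It assumes tarballs named `bytelyst-<name>-<version>.tgz` (the naming `npm pack` produces for `@bytelyst/<name>`), versions without a hyphen, and one dependency per line in package.json. `rewrite_file_refs` and the `errors` package in the usage example are hypothetical.

```shell
# rewrite_file_refs PACKAGE_JSON TARBALL_DIR
# For every TARBALL_DIR/bytelyst-<name>-<version>.tgz, point the
# "@bytelyst/<name>" dependency in PACKAGE_JSON at that tarball.
rewrite_file_refs() {
  pkg="$1"
  dir="$2"
  for tgz in "$dir"/bytelyst-*.tgz; do
    [ -e "$tgz" ] || continue
    base=$(basename "$tgz")   # e.g. bytelyst-errors-1.2.3.tgz
    name=${base#bytelyst-}    # errors-1.2.3.tgz
    name=${name%-*}           # errors (assumes no hyphen in the version)
    # Replace only the "file:..." value that belongs to this package
    sed -i.bak -E \
      "s#(\"@bytelyst/${name}\"[[:space:]]*:[[:space:]]*)\"file:[^\"]*\"#\1\"file:${dir}/${base}\"#" \
      "$pkg"
  done
  rm -f "$pkg.bak"
}

# Usage, run from the directory that holds package.json (not run here):
#   rewrite_file_refs package.json .tarballs
```

Because `npm ci` requires package-lock.json to match package.json, a real script also has to bring the lockfile in line after the rewrite; that part is left out here.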

F3. NomGap Backend Dockerfile Ignores file: Deps (BUG)

learning_ai_fastgap/backend/Dockerfile does COPY package.json → npm ci but doesn't copy .tarballs/. The file: refs will fail. Needs the .tarballs/ COPY step added.

F4. NoteLett Backend Dockerfile Copies Everything (BUG)

learning_ai_notes/backend/Dockerfile does COPY . . in the build stage, which includes broken node_modules symlinks from file: deps. Should use explicit COPY of src/, tsconfig.json, and .tarballs/ instead.

F5. Missing output: 'standalone' in next.config.ts (CRITICAL)

The Dockerfile template copies from .next/standalone/ — this directory only exists when output: 'standalone' is set in next.config.ts.

| Web | Has output: 'standalone'? | Notes |
| --- | --- | --- |
| NomGap | Yes | Set directly |
| NoteLett | Yes | Set directly |
| ActionTrail | Yes | Set directly |
| LocalMemGPT | Yes | Set directly |
| admin-web | Yes | Conditional: process.env.VERCEL ? {} : { output: 'standalone' } |
| tracker-web | Yes | Conditional (same) |
| user-dashboard | Yes | Conditional (same) |
| ChronoMind | No | Must add |
| JarvisJr | No | Must add |
| FlowMonk | No | Must add |
| MindLyst | Unknown | Needs check |
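
A quick text check can settle the "unknown" rows and catch regressions. The sketch below only greps for the literal setting, so it also accepts the conditional form used by admin-web. It does not evaluate the config, and `has_standalone_output` is a hypothetical helper.

```shell
# has_standalone_output FILE: succeed if FILE contains output: 'standalone'
# (single or double quotes). A textual check only; it does not run the config.
has_standalone_output() {
  grep -qE "output:[[:space:]]*['\"]standalone['\"]" "$1"
}

# Usage (not run here; the */web/ layout is an assumption):
#   for f in */web/next.config.ts; do
#     has_standalone_output "$f" || echo "missing standalone output: $f"
#   done
```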

F6. Build Context Mismatch for ActionTrail + LocalMemGPT

Their Dockerfiles expect repo-root as build context (they COPY backend/... and COPY shared/...). The compose build: must use context: ./repo-name + dockerfile: backend/Dockerfile, not build: ./repo-name/backend.

Already correct in the compose above. Calling it out so future editors don't "simplify" it.

F7. Node.js Version Inconsistency

Existing Dockerfiles use mixed Node versions:

  • NomGap, NoteLett: node:20-alpine
  • ActionTrail, LocalMemGPT: node:22-alpine / node:22-slim

Recommendation: Standardize on node:22-alpine for all new Dockerfiles. Existing ones work but should be updated for consistency.

F8. Missing --webpack Flag for Next.js Builds

Several web apps require --webpack flag for builds (Serwist PWA incompatible with Turbopack, or @bytelyst/* file: ref transpilation). The Dockerfile template uses npm run build which should map to next build --webpack in package.json — verify each repo's build script.

F9. Missing .env.ecosystem Template

The compose references .env.ecosystem but the doc doesn't define its contents. Key vars needed:

# .env.ecosystem — shared env for all services
COSMOS_ENDPOINT=http://cosmos-emulator:8081
COSMOS_KEY=<emulator-key>
COSMOS_DATABASE=bytelyst
JWT_SECRET=dev-ecosystem-secret-change-me
AZURE_BLOB_CONNECTION_STRING=DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=...;BlobEndpoint=http://azurite:10000/devstoreaccount1;
PLATFORM_SERVICE_URL=http://platform-service:4003
EXTRACTION_SERVICE_URL=http://extraction-service:4005
DB_PROVIDER=memory
NODE_ENV=production
CORS_ORIGIN=*
SMTP_HOST=mailpit
SMTP_PORT=1025
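
Since every service loads this file, a missing or empty key tends to surface as a confusing runtime error in one of roughly 25 containers. A small pre-flight check, sketched below, fails fast instead. `check_env_file` is a hypothetical helper, and the list of required keys in the usage line is just a selection from the template above.

```shell
# check_env_file FILE KEY...: return non-zero if any KEY is absent or has an
# empty value in FILE, and name each offender on stderr.
check_env_file() {
  file="$1"
  shift
  missing=0
  for key in "$@"; do
    if ! grep -qE "^${key}=.+" "$file"; then
      echo "missing or empty: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (not run here):
#   check_env_file .env.ecosystem COSMOS_ENDPOINT COSMOS_KEY JWT_SECRET \
#     PLATFORM_SERVICE_URL EXTRACTION_SERVICE_URL && docker compose \
#     -f docker-compose.ecosystem.yml up -d
```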

F10. host.docker.internal Only Works on Docker Desktop (Mac/Windows)

LocalMemGPT uses OLLAMA_URL: 'http://host.docker.internal:11434' — this works on Docker Desktop but not on Linux VMs (which is the likely deployment target).

Fix on Linux: Add extra_hosts: ['host.docker.internal:host-gateway'] to the service, or use network_mode: host.
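
The `extra_hosts` fix can live in a small override file so the main compose file stays unchanged for Docker Desktop users. The sketch below writes such a file; the name `docker-compose.linux.yml` is an assumption. Ollama listens on 127.0.0.1 by default, so the host may also need `OLLAMA_HOST` set to an address the containers can reach.

```shell
# Write a Linux-only override that maps host.docker.internal to the host gateway.
# The file name is illustrative; any name works as long as it is passed with -f.
cat > docker-compose.linux.yml <<'EOF'
services:
  localmemgpt-backend:
    extra_hosts:
      - 'host.docker.internal:host-gateway'
EOF

# Usage (not run here): later -f files are merged over earlier ones
#   docker compose -f docker-compose.ecosystem.yml -f docker-compose.linux.yml up -d
```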

Summary of Required Work Before Compose Works

| Priority | Item | Count |
| --- | --- | --- |
| P0 | Create missing docker-prep.sh | 6 repos |
| P0 | Create missing backend Dockerfiles | 6 repos |
| P0 | Create missing web Dockerfiles | 5 repos |
| P0 | Add output: 'standalone' to next.config.ts | 3 webs |
| P1 | Fix NomGap backend Dockerfile (add .tarballs/ COPY) | 1 file |
| P1 | Fix NoteLett backend Dockerfile (explicit COPY, not .) | 1 file |
| P1 | Create .env.ecosystem template | 1 file |
| P2 | Standardize Node.js version to 22-alpine | 4 Dockerfiles |
| P2 | Add extra_hosts for Linux VM Ollama access | 1 service |

Summary

| Question | Answer |
| --- | --- |
| Can deploy on single VM? | Yes. All ~25 services fit in 32 GB RAM. |
| All Dockerized? | 4/10 product repos fully Dockerized. 6 need Dockerfiles (copy-paste template). |
| K8s practice on single VM? | K3s: certified K8s, single binary, same manifests scale to multi-node or AKS/EKS/GKE. |
| Recommended VM? | 8 vCPU / 32 GB (min) or 16 vCPU / 64 GB (with Ollama). Hetzner ~$45/mo for dev. |
| Time to production K8s? | Phase 1 (compose) → Phase 2 (K3s single) → Phase 3 (K3s multi) → Phase 4 (managed). Same manifests from Phase 2 onward. |