saravanakumardb1 828d31b63d docs: update documentation

2026-03-22 14:06:44 -07:00

ByteLyst Ecosystem — Single-VM Deployment Guide

Deploy the entire ByteLyst ecosystem on one VM, fully Dockerized, with a local Kubernetes layer (Docker Desktop or K3s) for production-readiness practice.

Package-Manager Strategy (current transition plan)

learning_ai_common_plat is already the canonical pnpm workspace monorepo for shared packages, services, and dashboards.
Node/TypeScript product repos are moving toward pnpm as the long-term standard, but that migration is still repo-by-repo and incremental.
During the transition, each repo's Docker/build flow must follow the repo's own:
- packageManager field
- lockfile
- Dockerfile
- docker-prep.sh behavior
This plan does not merge all repos into one mega-monorepo. Product repos remain independent repositories.
Once a repo migrates to pnpm, it must be fully aligned in the same change set:
- no pnpm-lock.yaml with npm ci
- no stale package-lock.json
- no mixed package-manager assumptions in CI, Docker, or docs
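The "fully aligned" rule above can be spot-checked mechanically. The following is a minimal sketch, not an existing script in any repo: the function name `check_no_mixed_state` is hypothetical, and it assumes the lockfiles and the `Dockerfile` sit in the same directory (for example a repo's `backend/` folder).

```shell
# Hypothetical helper: flag the two mixed states named in the list above.
# Assumes lockfiles and Dockerfile live side by side in "$1".
check_no_mixed_state() {
  repo_dir="$1"
  problems=0
  if [ -f "$repo_dir/pnpm-lock.yaml" ]; then
    # A leftover npm lockfile means the migration was not finished.
    if [ -f "$repo_dir/package-lock.json" ]; then
      echo "stale package-lock.json next to pnpm-lock.yaml"
      problems=$((problems + 1))
    fi
    # Match "npm ci" as its own word, so a longer word ending in "npm" is ignored.
    if grep -Eqs '(^|[^[:alnum:]])npm ci' "$repo_dir/Dockerfile"; then
      echo "Dockerfile runs npm ci but the repo has pnpm-lock.yaml"
      problems=$((problems + 1))
    fi
  fi
  if [ "$problems" -eq 0 ]; then
    echo "ok"
    return 0
  fi
  return 1
}
```

Calling `check_no_mixed_state ./learning_ai_flowmonk/backend` in CI would fail the job as soon as a migrated repo drifts back into a mixed state. CI workflow files and docs are not inspected by this sketch.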

Migration-impact note: The deployment architecture in this guide stays the same during the pnpm migration (Compose, K3s, ingress, namespaces, VM sizing). The main maintenance surface is Docker/build instructions and dependency-prep flow. The biggest operational risk is stale templates or stale docs after an individual repo migrates.

1. Service Inventory

Shared Infrastructure (common-plat)

Service	Port	Image	RAM Est.
platform-service	4003	Fastify 5 + TS	~200 MB
extraction-service	4005	Fastify 5 + Python sidecar	~350 MB
mcp-server	4007	Fastify 5 + TS	~150 MB
Cosmos DB Emulator	8081, 1234	`mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:vnext-preview`	~2 GB
Azurite (blob)	10000	`mcr.microsoft.com/azure-storage/azurite`	~100 MB
Mailpit (SMTP)	1025, 8025	`axllent/mailpit`	~50 MB
Traefik (gateway)	80, 8080	`traefik:v3.3`	~100 MB
Loki (logs)	3100	`grafana/loki`	~200 MB
Grafana (dashboards)	3000	`grafana/grafana`	~200 MB

Product Backends (Fastify 5 + TypeScript)

Product	Port	RAM Est.
LysnrAI backend	4015	~150 MB
MindLyst backend	4014	~150 MB
ChronoMind backend	4011	~150 MB
JarvisJr backend	4012	~150 MB
NomGap backend	4013	~150 MB
PeakPulse backend	4010	~150 MB
FlowMonk backend	4017	~150 MB
NoteLett backend	4016	~150 MB
ActionTrail backend	4018	~150 MB
LocalMemGPT backend	4019	~150 MB

Web Dashboards (Next.js 16)

Dashboard	Default Port	Compose Port	RAM Est.	Notes
admin-web	3000	3001	~250 MB	No port in package.json; must set `PORT=3001` env
user-dashboard-web	3002	3002	~250 MB	Port set in package.json
tracker-web	3003	3003	~200 MB	Port set in package.json
NomGap web	3040	3040	~200 MB	Port set in Dockerfile
ChronoMind web	3000	3051	~200 MB	No port override; remap in compose (or set `PORT` env)
JarvisJr web	3000	3052	~200 MB	No port override; remap in compose (or set `PORT` env)
FlowMonk web	3000	3053	~200 MB	No port override; remap in compose (or set `PORT` env)
NoteLett web	3000	3054	~200 MB	Dockerfile EXPOSE 3000; remap in compose
ActionTrail web	3000	3060	~200 MB	Dockerfile EXPOSE 3000; remap in compose
LocalMemGPT web	3070	3070	~200 MB	Port set in package.json + Dockerfile
MindLyst web	3050	3050	~200 MB	Port set in package.json (`-p 3050`)

Port conflict warning: Grafana publishes host port 3000, and the admin-web, ChronoMind, JarvisJr, FlowMonk, NoteLett, and ActionTrail webs all default to 3000 as well. For each of these, the compose file must either set the PORT env var or remap the host port in its ports: mapping.
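A quick way to catch such clashes before `docker compose up` is to list every planned host port and look for duplicates. This is an illustrative sketch; the function name `find_port_conflicts` and the `<service> <host-port>` input format are assumptions, not part of the repos.

```shell
# Print every host port that is claimed by more than one service.
# Input on stdin: one "<service> <host-port>" pair per line.
find_port_conflicts() {
  awk '{ owners[$2] = owners[$2] " " $1; count[$2]++ }
       END { for (p in count) if (count[p] > 1) print p ":" owners[p] }' | sort -n
}
```

Feeding it the default ports, for example `printf 'grafana 3000\nadmin-web 3000\ntracker-web 3003\n' | find_port_conflicts`, prints `3000: grafana admin-web`. With the Compose ports from the table (3001 for admin-web) it prints nothing.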

Optional / AI

Service	Port	RAM Est.
Ollama (LLM)	11434	4–16 GB (model-dependent)

2. VM Sizing

Minimum (dev/staging, no Ollama)

Spec	Value
vCPUs	8
RAM	32 GB
Disk	100 GB SSD
OS	Ubuntu 24.04 LTS

Breakdown:

Cosmos Emulator: ~2 GB
10 Fastify backends × 150 MB = ~1.5 GB
3 shared services × 250 MB = ~0.75 GB
11 Next.js webs × 200 MB = ~2.2 GB
Infra (Traefik, Loki, Grafana, Azurite, Mailpit) = ~0.65 GB
K3s overhead = ~0.5 GB
Subtotal: ~7.6 GB → 32 GB leaves headroom for spikes and the build cache
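As a sanity check, the line items in the breakdown can be added up directly. The helper name `sum_mb_as_gb` is made up for this sketch; the numbers are the estimates listed above, in MB.

```shell
# Add per-group RAM estimates given in MB and print the total in GB.
sum_mb_as_gb() {
  total=0
  for mb in "$@"; do
    total=$((total + mb))
  done
  awk -v t="$total" 'BEGIN { printf "%.2f\n", t / 1000 }'
}

# Cosmos, 10 backends, 3 shared services, 11 webs, infra, K3s overhead
sum_mb_as_gb 2000 $((10 * 150)) $((3 * 250)) $((11 * 200)) 650 500
```

The call prints `7.60`, so the steady-state footprint is well under a quarter of the 32 GB minimum; the rest is margin for image builds and load spikes.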

Recommended (with Ollama, small models)

Spec	Value
vCPUs	16
RAM	64 GB
Disk	200 GB NVMe SSD
GPU	Optional NVIDIA T4/A10 for fast LLM inference
OS	Ubuntu 24.04 LTS

Cloud Equivalents

Provider	Instance	vCPU	RAM	Price (approx)
Azure	Standard_D8s_v5	8	32 GB	~$280/mo
Azure	Standard_D16s_v5	16	64 GB	~$560/mo
AWS	m6i.2xlarge	8	32 GB	~$280/mo
AWS	m6i.4xlarge	16	64 GB	~$560/mo
Hetzner	CPX51	16	32 GB	~$45/mo
Hetzner	CCX63	48	192 GB	~$230/mo
Home	Mac Mini M4 Pro	12	48 GB	One-time ~$1,600

Cost tip: Hetzner is 5–10× cheaper than Azure/AWS for dev/staging.

3. Architecture: Docker Compose → K3s Migration Path

Phase 1: Docker Compose (after prerequisite work)

⚠️ Prerequisite: ALL product repos must run docker-prep.sh before building Docker images (see §12 Audit Findings). All Dockerfiles and output: 'standalone' configs are now in place (completed 2026-03-22). During the package-manager transition, each repo's Docker build must follow that repo's declared package manager and lockfile semantics rather than assuming npm or pnpm globally.

Create a unified docker-compose.ecosystem.yml that brings everything up.

Phase 2: Local Kubernetes (Docker Desktop or K3s)

Two options for single-node K8s — both give you real kubectl, Helm, Ingress, and CRDs identical to production AKS/EKS/GKE.

Option A: Docker Desktop Kubernetes (recommended for Mac/Windows dev)

Docker Desktop includes a built-in kind (Kubernetes IN Docker) cluster. Enable it in Docker Desktop → Settings → Kubernetes → Enable Kubernetes.

Zero install — checkbox in Docker Desktop, K8s v1.31+ included
Images shared — docker build images are immediately available to K8s (no import step!)
GUI dashboard — Docker Desktop shows Deployments, Pods, Services, Ingresses, ConfigMaps, Secrets
kubectl pre-configured — context docker-desktop auto-created
Helm works — install via brew install helm
Best for: Mac/Windows local development, quick iteration, visual debugging
Limitation: Single-node only, can't add workers (use K3s for multi-node practice)

Option B: K3s (recommended for Linux VMs / multi-node practice)

K3s is a lightweight, certified Kubernetes distro.

Production-grade (CNCF certified, used by Rancher)
Single binary, ~70 MB, installs in 30 seconds
Built-in Traefik Ingress (you already use Traefik!)
Built-in local-path StorageClass
Runs as systemd service (survives reboot)
Can scale to multi-node later by just joining worker nodes
Best for: Linux VMs, Hetzner/cloud deployment, multi-node scaling practice

4. Implementation Plan

4.1 Phase 1 — Unified Docker Compose

Create docker-compose.ecosystem.yml at workspace root (~/code/mygh/) that composes all services:

⚠️ Critical prerequisite — run BEFORE docker compose build:

# Pack @bytelyst/* file: dependencies into tarballs for each product repo.
# Every product repo has file: refs to ../learning_ai_common_plat/packages/*
# which don't resolve inside Docker build context. docker-prep.sh packs them.
# The prep flow must preserve each repo's package-manager semantics while rewriting
# file: refs for Docker contexts.
for repo in learning_voice_ai_agent learning_multimodal_memory_agents learning_ai_clock \
            learning_ai_jarvis_jr learning_ai_peakpulse learning_ai_flowmonk \
            learning_ai_fastgap learning_ai_notes learning_ai_trails learning_ai_local_memory_gpt; do
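  # Quote the path and stop at the first failure so a broken prep step is not buried in later output
  (cd "$repo" && ./scripts/docker-prep.sh) || { echo "docker-prep.sh failed in $repo" >&2; break; }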
done
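To see which dependencies the prep step has to pack for a given repo, the `file:` references can be listed straight from `package.json`. This is a rough sketch: `list_file_deps` is a hypothetical helper, and it assumes each dependency sits on one line in the usual `"name": "file:path"` form.

```shell
# List @bytelyst/* dependencies that still point at file: paths.
# Usage: list_file_deps path/to/package.json
list_file_deps() {
  grep -oE '"@bytelyst/[^"]+"[[:space:]]*:[[:space:]]*"file:[^"]+"' "$1" \
    | sed -E 's/^"([^"]+)"[[:space:]]*:[[:space:]]*"(file:[^"]+)"$/\1 -> \2/'
}
```

Running it before and after `docker-prep.sh` shows whether the references were rewritten; any entry that still points at `../learning_ai_common_plat/packages/` will not resolve inside the Docker build context.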

# ~/code/mygh/docker-compose.ecosystem.yml
# NOTE: All product backends/webs have file: deps to @bytelyst/* packages.
# You MUST run docker-prep.sh for each repo first (see above).

services:
  # ══════════════════════════════════════════════════════
  # INFRASTRUCTURE
  # ══════════════════════════════════════════════════════
  cosmos-emulator:
    image: mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:vnext-preview
    ports: ['8081:8081', '1234:1234']
    environment:
      PROTOCOL: http
      ENABLE_EXPLORER: 'true'
    restart: unless-stopped

  azurite:
    image: mcr.microsoft.com/azure-storage/azurite:3.35.0
    command: azurite-blob --blobHost 0.0.0.0 --blobPort 10000 --skipApiVersionCheck
    ports: ['10000:10000']
    volumes: [azurite-data:/data]
    restart: unless-stopped

  mailpit:
    image: axllent/mailpit:v1.27.5
    ports: ['1025:1025', '8025:8025']
    restart: unless-stopped

  traefik:
    image: traefik:v3.3
    command:
      - '--api.insecure=true'
      - '--providers.docker=true'
      - '--providers.docker.exposedbydefault=false'
      - '--entrypoints.web.address=:80'
    ports: ['80:80', '8080:8080']
    volumes: ['/var/run/docker.sock:/var/run/docker.sock:ro']
    restart: unless-stopped

  loki:
    image: grafana/loki:3.3.2
    ports: ['3100:3100']
    volumes: [loki-data:/loki]
    restart: unless-stopped

  grafana:
    image: grafana/grafana:11.4.0
    ports: ['3000:3000'] # NOTE: many Next.js webs also default to 3000 — avoid conflicts
    environment:
      GF_SECURITY_ADMIN_USER: admin
      GF_SECURITY_ADMIN_PASSWORD: lysnrai
    volumes: [grafana-data:/var/lib/grafana]
    restart: unless-stopped

  # ══════════════════════════════════════════════════════
  # SHARED SERVICES (common-plat — no file: deps, pnpm workspace handles it)
  # ══════════════════════════════════════════════════════
  platform-service:
    build:
      context: ./learning_ai_common_plat
      dockerfile: services/platform-service/Dockerfile
    ports: ['4003:4003']
    env_file: [.env.ecosystem]
    environment:
      PORT: 4003
      COSMOS_AUTO_INIT: 'true'
    depends_on: [cosmos-emulator, azurite, mailpit]
    labels:
      - 'traefik.enable=true'
      - 'traefik.http.routers.platform.rule=Host(`platform.local`)'
      - 'traefik.http.services.platform.loadbalancer.server.port=4003'
    restart: unless-stopped

  extraction-service:
    build:
      context: ./learning_ai_common_plat
      dockerfile: services/extraction-service/Dockerfile
    ports: ['4005:4005']
    env_file: [.env.ecosystem]
    environment:
      PORT: 4005
    depends_on: [cosmos-emulator]
    restart: unless-stopped

  mcp-server:
    build:
      context: ./learning_ai_common_plat
      dockerfile: services/mcp-server/Dockerfile
    ports: ['4007:4007']
    env_file: [.env.ecosystem]
    environment:
      PORT: 4007
      PLATFORM_SERVICE_URL: http://platform-service:4003
      EXTRACTION_SERVICE_URL: http://extraction-service:4005
    depends_on: [platform-service, extraction-service]
    restart: unless-stopped

  # ══════════════════════════════════════════════════════
  # PRODUCT BACKENDS
  # All have file: deps → must run docker-prep.sh first.
  # ActionTrail + LocalMemGPT Dockerfiles use repo-root context.
  # Others use backend/ subdir context.
  # ══════════════════════════════════════════════════════
  lysnrai-backend:
    build: ./learning_voice_ai_agent/backend
    ports: ['4015:4015']
    env_file: [.env.ecosystem]
    environment: { PORT: '4015', SERVICE_NAME: lysnrai-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  mindlyst-backend:
    build: ./learning_multimodal_memory_agents/backend
    ports: ['4014:4014']
    env_file: [.env.ecosystem]
    environment: { PORT: '4014', SERVICE_NAME: mindlyst-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  chronomind-backend:
    build: ./learning_ai_clock/backend
    ports: ['4011:4011']
    env_file: [.env.ecosystem]
    environment: { PORT: '4011', SERVICE_NAME: chronomind-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  jarvisjr-backend:
    build: ./learning_ai_jarvis_jr/backend
    ports: ['4012:4012']
    env_file: [.env.ecosystem]
    environment: { PORT: '4012', SERVICE_NAME: jarvisjr-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  nomgap-backend:
    build: ./learning_ai_fastgap/backend
    ports: ['4013:4013']
    env_file: [.env.ecosystem]
    environment: { PORT: '4013', SERVICE_NAME: nomgap-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  peakpulse-backend:
    build: ./learning_ai_peakpulse/backend
    ports: ['4010:4010']
    env_file: [.env.ecosystem]
    environment: { PORT: '4010', SERVICE_NAME: peakpulse-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  flowmonk-backend:
    build: ./learning_ai_flowmonk/backend
    ports: ['4017:4017']
    env_file: [.env.ecosystem]
    environment: { PORT: '4017', SERVICE_NAME: flowmonk-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  notelett-backend:
    build: ./learning_ai_notes/backend
    ports: ['4016:4016']
    env_file: [.env.ecosystem]
    environment: { PORT: '4016', SERVICE_NAME: notelett-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  actiontrail-backend:
    build:
      context: ./learning_ai_trails # Dockerfile expects repo-root context
      dockerfile: backend/Dockerfile
    ports: ['4018:4018']
    env_file: [.env.ecosystem]
    environment: { PORT: '4018', SERVICE_NAME: actiontrail-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  localmemgpt-backend:
    build:
      context: ./learning_ai_local_memory_gpt # Dockerfile expects repo-root context
      dockerfile: backend/Dockerfile
    ports: ['4019:4019']
    env_file: [.env.ecosystem]
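    environment: { PORT: '4019', OLLAMA_URL: 'http://host.docker.internal:11434' }
    extra_hosts: ['host.docker.internal:host-gateway'] # required on Linux hosts; Docker Desktop resolves this name on its own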
    volumes: [localmemgpt-data:/app/db]
    restart: unless-stopped

  # ══════════════════════════════════════════════════════
  # WEB DASHBOARDS
  # IMPORTANT: Most webs default to port 3000 internally.
  # Use PORT env var to override, or remap via host:container ports.
  # ══════════════════════════════════════════════════════
  admin-web:
    build: ./learning_ai_common_plat/dashboards/admin-web
    ports: ['3001:3001']
    env_file: [.env.ecosystem]
    environment:
      PORT: 3001 # admin-web has NO port override — defaults to 3000 without this!
    depends_on: [platform-service]
    restart: unless-stopped

  user-dashboard:
    build: ./learning_voice_ai_agent/user-dashboard-web
    ports: ['3002:3002']
    env_file: [.env.ecosystem]
    depends_on: [lysnrai-backend]
    restart: unless-stopped

  tracker-web:
    build: ./learning_ai_common_plat/dashboards/tracker-web
    ports: ['3003:3003']
    env_file: [.env.ecosystem]
    depends_on: [platform-service]
    restart: unless-stopped

  nomgap-web:
    build: ./learning_ai_fastgap/web
    ports: ['3040:3040']
    environment:
      PORT: 3040
      NEXT_PUBLIC_NOMGAP_API_URL: http://nomgap-backend:4013/api
      NEXT_PUBLIC_PLATFORM_SERVICE_URL: http://platform-service:4003/api
    depends_on: [nomgap-backend]
    restart: unless-stopped

  actiontrail-web:
    build: ./learning_ai_trails/web
    ports: ['3060:3000'] # Internal 3000 → external 3060
    environment:
      NEXT_PUBLIC_API_URL: http://actiontrail-backend:4018
    depends_on: [actiontrail-backend]
    restart: unless-stopped

  localmemgpt-web:
    build:
      context: ./learning_ai_local_memory_gpt # Dockerfile expects repo-root context
      dockerfile: web/Dockerfile
    ports: ['3070:3070']
    environment:
      NEXT_PUBLIC_BACKEND_URL: http://localmemgpt-backend:4019
    depends_on: [localmemgpt-backend]
    restart: unless-stopped

  notelett-web:
    build: ./learning_ai_notes/web
    ports: ['3054:3000'] # Internal 3000 → external 3054
    environment:
      NEXT_PUBLIC_BACKEND_URL: http://notelett-backend:4016
    depends_on: [notelett-backend]
    restart: unless-stopped

  chronomind-web:
    build: ./learning_ai_clock/web
    ports: ['3051:3000'] # Internal 3000 → external 3051
    environment:
      NEXT_PUBLIC_BACKEND_URL: http://chronomind-backend:4011
      NEXT_PUBLIC_PLATFORM_SERVICE_URL: http://platform-service:4003
    depends_on: [chronomind-backend]
    restart: unless-stopped

  jarvisjr-web:
    build: ./learning_ai_jarvis_jr/web
    ports: ['3052:3000'] # Internal 3000 → external 3052
    environment:
      NEXT_PUBLIC_BACKEND_URL: http://jarvisjr-backend:4012
      NEXT_PUBLIC_PLATFORM_SERVICE_URL: http://platform-service:4003
    depends_on: [jarvisjr-backend]
    restart: unless-stopped

  flowmonk-web:
    build: ./learning_ai_flowmonk/web
    ports: ['3053:3000'] # Internal 3000 → external 3053
    environment:
      NEXT_PUBLIC_BACKEND_URL: http://flowmonk-backend:4017
      NEXT_PUBLIC_PLATFORM_SERVICE_URL: http://platform-service:4003
    depends_on: [flowmonk-backend]
    restart: unless-stopped

  mindlyst-web:
    build: ./learning_multimodal_memory_agents/mindlyst-native/web
    ports: ['3050:3050']
    environment:
      PORT: 3050 # package.json sets -p 3050
      NEXT_PUBLIC_BACKEND_URL: http://mindlyst-backend:4014
      NEXT_PUBLIC_PLATFORM_SERVICE_URL: http://platform-service:4003
    depends_on: [mindlyst-backend]
    restart: unless-stopped

volumes:
  azurite-data:
  loki-data:
  grafana-data:
  localmemgpt-data:
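Note that `depends_on` in this file only orders container start; it does not wait for a service to accept requests. A small retry loop can bridge that gap in scripts. The sketch below is generic: `wait_until` is a hypothetical helper, and the `/health` path in the usage line is an assumption borrowed from the probe path used later for lysnrai-backend.

```shell
# Retry a probe command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() {
  attempts="$1"
  delay="$2"
  shift 2
  try=1
  while :; do
    # The probe's own output is discarded; only its exit status matters.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    if [ "$try" -ge "$attempts" ]; then
      return 1
    fi
    try=$((try + 1))
    sleep "$delay"
  done
}
```

For example, `wait_until 30 2 curl -fsS http://localhost:4003/health` blocks for up to about a minute until platform-service answers, and returns non-zero if it never does.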

4.2 Phase 2 — Local Kubernetes (Docker Desktop or K3s)

Install K3s on the VM

# Install K3s (30 seconds, includes kubectl + containerd)
curl -sfL https://get.k3s.io | sh -

# Verify
sudo kubectl get nodes
# NAME       STATUS   ROLES                  AGE   VERSION
# myvm       Ready    control-plane,master   30s   v1.30.x+k3s1

# Copy kubeconfig for non-root usage
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $(id -u):$(id -g) ~/.kube/config

# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

Namespace Layout

kubectl create namespace bytelyst-infra      # Cosmos, Azurite, Mailpit, Loki, Grafana
kubectl create namespace bytelyst-platform   # platform-service, extraction, mcp
kubectl create namespace bytelyst-products   # 10 product backends
kubectl create namespace bytelyst-web        # All Next.js dashboards

Example K8s Manifest (one backend)

# k8s/products/lysnrai-backend.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lysnrai-backend
  namespace: bytelyst-products
  labels:
    app: lysnrai-backend
    product: lysnrai
spec:
  replicas: 1 # Scale to 2+ when ready
  selector:
    matchLabels:
      app: lysnrai-backend
  template:
    metadata:
      labels:
        app: lysnrai-backend
    spec:
      containers:
        - name: lysnrai-backend
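          image: bytelyst/lysnrai-backend:latest
          imagePullPolicy: IfNotPresent # locally built image; a :latest tag otherwise defaults to Always and tries to pull from a registry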
          ports:
            - containerPort: 4015
          envFrom:
            - configMapRef:
                name: bytelyst-common-config
            - secretRef:
                name: bytelyst-secrets
          env:
            - name: PORT
              value: '4015'
            - name: SERVICE_NAME
              value: lysnrai-backend
          resources:
            requests:
              memory: '128Mi'
              cpu: '100m'
            limits:
              memory: '256Mi'
              cpu: '500m'
          livenessProbe:
            httpGet:
              path: /health
              port: 4015
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 4015
            initialDelaySeconds: 5
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: lysnrai-backend
  namespace: bytelyst-products
spec:
  selector:
    app: lysnrai-backend
  ports:
    - port: 4015
      targetPort: 4015

Ingress (Traefik, built into K3s)

# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: bytelyst-ingress
  namespace: bytelyst-products
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
  rules:
    - host: lysnrai.local
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: lysnrai-backend
                port:
                  number: 4015
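    # ... repeat per product in bytelyst-products
---
# Ingress backends must live in the Ingress's own namespace, so
# platform-service (bytelyst-platform) gets a separate Ingress.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: bytelyst-platform-ingress
  namespace: bytelyst-platform
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
  rules:
    - host: platform.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: platform-service
                port:
                  number: 4003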
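The `lysnrai.local` and `platform.local` hosts only route if the client machine resolves them to the VM. One simple option is static hosts entries, sketched below. `hosts_entries` is a hypothetical helper and the IP address is whatever your VM (or `127.0.0.1` for a local cluster) uses.

```shell
# Print /etc/hosts lines mapping <name>.local to one IP address.
# Usage: hosts_entries <ip> <name> [<name>...]
hosts_entries() {
  ip="$1"
  shift
  for name in "$@"; do
    printf '%s\t%s.local\n' "$ip" "$name"
  done
}
```

For example, `hosts_entries 127.0.0.1 lysnrai platform | sudo tee -a /etc/hosts` appends both names; afterwards `curl http://platform.local/` should reach Traefik on port 80.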

5. Docker Compose → K3s Migration Cheat Sheet

Docker Compose	K3s Equivalent
`services:`	`Deployment` + `Service`
`ports:`	`Service` (ClusterIP/NodePort)
`env_file:`	`ConfigMap` + `Secret`
`depends_on:`	`initContainers` or readiness probes
`volumes:`	`PersistentVolumeClaim` (local-path)
`restart: unless-stopped`	Built-in (K8s always restarts pods)
`labels: traefik.*`	`Ingress` resource
`docker compose up`	`kubectl apply -k k8s/`
`docker compose logs`	`kubectl logs -f deploy/X` or Loki/Grafana
`docker compose ps`	`kubectl get pods -A`
Scaling (fixed host `ports:` block `--scale`)	`kubectl scale deploy/X --replicas=3`

6. K3s Practice Exercises (on single VM)

These exercises simulate real production scenarios:

Exercise 1: Rolling Update

# Build new image, deploy with zero downtime
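docker build -t bytelyst/lysnrai-backend:v2 ./learning_voice_ai_agent/backend
# K3s only: load the image into K3s containerd first (Docker Desktop shares images already)
docker save bytelyst/lysnrai-backend:v2 | sudo k3s ctr images import -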
kubectl set image deploy/lysnrai-backend lysnrai-backend=bytelyst/lysnrai-backend:v2 -n bytelyst-products
kubectl rollout status deploy/lysnrai-backend -n bytelyst-products

Exercise 2: Scale Horizontally

kubectl scale deploy/platform-service --replicas=3 -n bytelyst-platform
# Traefik auto-balances across all 3 pods

Exercise 3: ConfigMap / Secret Rotation

kubectl create secret generic bytelyst-secrets \
  --from-literal=JWT_SECRET=new-secret \
  --from-literal=COSMOS_KEY=new-key \
  -n bytelyst-platform --dry-run=client -o yaml | kubectl apply -f -
kubectl rollout restart deploy -n bytelyst-platform
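The literal values `new-secret` and `new-key` are placeholders. For the rotation exercise a random value is more realistic; the sketch below generates one with standard tools. `gen_secret` is a hypothetical helper, and 32 random bytes is an arbitrary choice for illustration, not a requirement of the services.

```shell
# Print 32 random bytes as 64 lowercase hex characters (no trailing newline).
gen_secret() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}
```

It slots into the command above as `--from-literal=JWT_SECRET="$(gen_secret)"`. Keep in mind that this create-and-apply pattern replaces the whole Secret, so every key the services read must be listed each time.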

Exercise 4: Resource Limits + HPA

# Auto-scale platform-service 1→5 pods based on CPU
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: platform-service-hpa
  namespace: bytelyst-platform
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: platform-service
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
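To predict what this HPA will do, it helps to apply the scaling rule from the Kubernetes documentation by hand: desired replicas = ceil(current replicas × current utilization / target utilization), clamped to the min/max bounds. The sketch below implements that rule in integer arithmetic. `hpa_desired_replicas` is a hypothetical helper, and it ignores the controller's tolerance band and stabilization windows.

```shell
# Usage: hpa_desired_replicas <current> <cpu-now-%> <cpu-target-%> <min> <max>
hpa_desired_replicas() {
  current="$1"
  cpu_now="$2"
  cpu_target="$3"
  min="$4"
  max="$5"
  # Integer ceiling of current * cpu_now / cpu_target
  desired=$(( (current * cpu_now + cpu_target - 1) / cpu_target ))
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}
```

With the manifest's values, two pods averaging 90% CPU against the 70% target give `hpa_desired_replicas 2 90 70 1 5`, which prints `3`. Utilization is measured relative to each pod's CPU request, so the Deployment needs `resources.requests.cpu` set for the HPA to work.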

Exercise 5: Helm Chart (packaged deploy)

# Create chart scaffold
helm create bytelyst-ecosystem
# Templatize all 25+ services into one chart
# Deploy: helm install bytelyst ./bytelyst-ecosystem -n bytelyst

7. Scaling Path: Single VM → Multi-Node

Phase 1: Docker Compose          Phase 2: Local K8s (1 node)
┌─────────────────────┐          ┌──────────────────────────────┐
│  Single VM / Mac     │    →     │  Docker Desktop K8s (kind)   │
│  docker compose up   │          │  or K3s on Linux VM          │
│  ~25 containers      │          │  kubectl apply -k · ~25 pods │
└─────────────────────┘          └──────────────────────────────┘
                                          │
                                          ▼
Phase 3: K3s (3 nodes)           Phase 4: Managed K8s
┌──────────────────────┐         ┌──────────────────────┐
│  1 server + 2 agents │    →    │  AKS / EKS / GKE     │
│  Same manifests!     │         │  Same manifests!      │
│  Real HA             │         │  Auto-scaling nodes   │
└──────────────────────┘         └──────────────────────┘

Docker Desktop K8s → K3s migration: Same manifests, just change kubectl context.

Adding a worker node to K3s (Phase 3) is one command:

# On the worker VM:
curl -sfL https://get.k3s.io | K3S_URL=https://server-ip:6443 K3S_TOKEN=<token> sh -

8. Recommended Directory Structure

~/code/mygh/
├── docker-compose.ecosystem.yml     # Phase 1: all-in-one compose
├── .env.ecosystem                   # Shared env vars
├── k8s/                             # Phase 2: K3s manifests
│   ├── kustomization.yaml           # Kustomize root
│   ├── infra/                       # Cosmos emulator, Azurite, Mailpit, Loki, Grafana
│   ├── platform/                    # platform-service, extraction, mcp
│   ├── products/                    # 10 product backends
│   ├── web/                         # 10+ Next.js dashboards
│   ├── config/                      # ConfigMaps
│   └── secrets/                     # Secrets (gitignored)
├── helm/                            # Phase 3: Helm chart
│   └── bytelyst-ecosystem/
│       ├── Chart.yaml
│       ├── values.yaml
│       └── templates/
└── scripts/
    ├── ecosystem-up.sh              # docker compose -f docker-compose.ecosystem.yml up -d
    ├── ecosystem-k3s-deploy.sh      # kubectl apply -k k8s/
    └── ecosystem-build-all.sh       # Build all Docker images

9. Quick Start Commands

# ── Phase 1: Docker Compose ───────────────────────────
cd ~/code/mygh

# Build all images (first time, ~15-20 min)
docker compose -f docker-compose.ecosystem.yml build

# Start everything
docker compose -f docker-compose.ecosystem.yml up -d

# Check status
docker compose -f docker-compose.ecosystem.yml ps

# View logs
docker compose -f docker-compose.ecosystem.yml logs -f platform-service

# Tear down
docker compose -f docker-compose.ecosystem.yml down

# ── Phase 2a: Docker Desktop Kubernetes (Mac) ────────
# Enable K8s: Docker Desktop → Settings → Kubernetes → Enable
# Verify:
kubectl config use-context docker-desktop
kubectl get nodes    # Should show: docker-desktop   Ready   control-plane

# Build images (Docker Desktop shares images with K8s — no import needed!)
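# common-plat Dockerfiles build from the workspace root, matching the compose file's build context
docker build -t bytelyst/platform-service:latest -f learning_ai_common_plat/services/platform-service/Dockerfile ./learning_ai_common_plat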

# Deploy all
kubectl apply -k k8s/

# Check pods
kubectl get pods -A

# Port-forward for local access
kubectl port-forward svc/platform-service 4003:4003 -n bytelyst-platform

# Or view everything in Docker Desktop GUI → Kubernetes tab

# ── Phase 2b: K3s (Linux VM) ─────────────────────────
# Build + load images into K3s containerd
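docker build -t bytelyst/platform-service:latest -f learning_ai_common_plat/services/platform-service/Dockerfile ./learning_ai_common_plat
# Pipe the image in: sudo does not pass a <(...) process-substitution descriptor through
docker save bytelyst/platform-service:latest | sudo k3s ctr images import -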

# Deploy (same manifests as Docker Desktop!)
kubectl apply -k k8s/
kubectl get pods -A

10. Dockerization Status (all complete)

Repo	Backend Dockerfile	Web Dockerfile	`docker-prep.sh`	`output:'standalone'`	Package manager state	Lockfile state	Docker template type	Status
LysnrAI	✅	✅ user-dashboard	✅	✅ (conditional)	Transitioning toward `pnpm` target	Follow repo-local current lockfile	Repo-specific during transition	✅ Ready
MindLyst	✅	✅	✅	✅ (conditional)	Transitioning toward `pnpm` target	Follow repo-local current lockfile	Repo-specific during transition	✅ Ready
ChronoMind	✅	✅	✅	✅ (conditional)	Transitioning toward `pnpm` target	Follow repo-local current lockfile	Repo-specific during transition	✅ Ready
JarvisJr	✅	✅	✅	✅ (conditional)	Transitioning toward `pnpm` target	Follow repo-local current lockfile	Repo-specific during transition	✅ Ready
PeakPulse	✅	— (no web)	✅	—	No Node web surface in this repo	Follow repo-local current lockfile	Repo-specific during transition	✅ Ready
FlowMonk	✅	✅	✅	✅ (conditional)	Pilot candidate for `pnpm` migration	Follow repo-local current lockfile	Repo-specific during transition	✅ Ready
NomGap	✅	✅	✅	✅	Transitioning toward `pnpm` target	Follow repo-local current lockfile	Repo-specific during transition	✅ Fixed (added `.tarballs/` COPY)
NoteLett	✅	✅	✅	✅	Transitioning toward `pnpm` target	Follow repo-local current lockfile	Repo-specific during transition	✅ Fixed (explicit COPY, not `.`)
ActionTrail	✅	✅	✅	✅	Transitioning toward `pnpm` target	Follow repo-local current lockfile	Repo-specific during transition	✅ Ready (uses `.tarballs/` pattern)
LocalMemGPT	✅	✅	✅	✅	Transitioning toward `pnpm` target	Follow repo-local current lockfile	Repo-specific during transition	✅ Ready (repo-root build context)
admin-web	—	✅ (in common-plat)	N/A (`pnpm`)	✅ (conditional)	`pnpm` workspace today	`pnpm-lock.yaml` via common-plat	`pnpm` workspace template	✅ Ready
tracker-web	—	✅ (in common-plat)	N/A (`pnpm`)	✅ (conditional)	`pnpm` workspace today	`pnpm-lock.yaml` via common-plat	`pnpm` workspace template	✅ Ready

All 10 product repos now have Dockerfiles and docker-prep.sh, plus output: 'standalone' wherever a web app exists (PeakPulse has none). Completed 2026-03-22.

Note: The table above tracks Docker readiness, not completed package-manager migration. For product repos, use each repo's actual packageManager field and lockfile until that repo is explicitly migrated to pnpm.

11. Dockerfile Template (reference)

Critical: Run docker-prep.sh first for product repos that use @bytelyst/* file: dependencies. The prep step packs those dependencies into .tarballs/ so Docker builds can resolve them inside the repo's own build context. During the migration window, Dockerfiles must match the repo's package manager and lockfile instead of assuming a single global install command.

Backend / service template — `npm` repo variant

# Pre-requisite: run ./scripts/docker-prep.sh to pack @bytelyst/* tarballs
FROM node:22-alpine AS builder
WORKDIR /app

COPY package.json package-lock.json ./
COPY .tarballs/ ./.tarballs/
RUN npm ci --ignore-scripts

COPY tsconfig.json ./
COPY src/ ./src/
RUN npx tsc

# Production stage
FROM node:22-alpine
WORKDIR /app
ENV NODE_ENV=production

COPY package.json package-lock.json ./
COPY .tarballs/ ./.tarballs/
RUN npm ci --omit=dev --ignore-scripts

COPY --from=builder /app/dist ./dist
# Copy shared/product.json if the backend reads it at runtime.
# COPY does not accept shell redirection or "|| true"; delete this line if the repo has no shared/ directory.
COPY shared/ ./shared/

EXPOSE ${PORT:-4010}
CMD ["node", "dist/server.js"]

Backend / service template — `pnpm` repo variant

# Pre-requisite: run ./scripts/docker-prep.sh if this repo rewrites @bytelyst/* file: deps
FROM node:22-alpine AS builder
WORKDIR /app

RUN corepack enable && corepack prepare pnpm@10 --activate

COPY package.json pnpm-lock.yaml ./
COPY .tarballs/ ./.tarballs/
RUN pnpm install --frozen-lockfile --ignore-scripts

COPY tsconfig.json ./
COPY src/ ./src/
RUN pnpm run build

FROM node:22-alpine
WORKDIR /app
ENV NODE_ENV=production

RUN corepack enable && corepack prepare pnpm@10 --activate

COPY package.json pnpm-lock.yaml ./
COPY .tarballs/ ./.tarballs/
RUN pnpm install --frozen-lockfile --prod --ignore-scripts

COPY --from=builder /app/dist ./dist
# COPY does not accept shell redirection or "|| true"; delete this line if the repo has no shared/ directory.
COPY shared/ ./shared/

EXPOSE ${PORT:-4010}
CMD ["node", "dist/server.js"]

Web (Next.js 16) — `npm` repo variant

Prerequisite: next.config.ts MUST have output: 'standalone' for the standalone Dockerfile pattern to work. Without it, .next/standalone/ won't be generated and the COPY will fail.

# Pre-requisite: run ./scripts/docker-prep.sh to pack @bytelyst/* tarballs
FROM node:22-alpine AS builder
WORKDIR /app

COPY package.json package-lock.json ./
COPY .tarballs/ ./.tarballs/
RUN npm ci

COPY . .

# Dummy env vars for Next.js build-time static page collection
ENV NEXT_PUBLIC_BACKEND_URL=http://localhost:4010
ENV NEXT_PUBLIC_PLATFORM_SERVICE_URL=http://localhost:4003

RUN npm run build

FROM node:22-alpine
WORKDIR /app
ENV NODE_ENV=production

COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
# COPY has no optional form; keep a public/ directory in the repo (an empty one is enough)
COPY --from=builder /app/public ./public

EXPOSE 3000
CMD ["node", "server.js"]

Web (Next.js 16) — `pnpm` repo variant

Prerequisite: next.config.ts MUST have output: 'standalone' for the standalone Dockerfile pattern to work. Keep the repo's build script authoritative, including --webpack where required.

# Pre-requisite: run ./scripts/docker-prep.sh to pack @bytelyst/* tarballs when applicable
FROM node:22-alpine AS builder
WORKDIR /app

RUN corepack enable && corepack prepare pnpm@10 --activate

COPY package.json pnpm-lock.yaml ./
COPY .tarballs/ ./.tarballs/
RUN pnpm install --frozen-lockfile

COPY . .

# Dummy env vars for Next.js build-time static page collection
ENV NEXT_PUBLIC_BACKEND_URL=http://localhost:4010
ENV NEXT_PUBLIC_PLATFORM_SERVICE_URL=http://localhost:4003

RUN pnpm run build

FROM node:22-alpine
WORKDIR /app
ENV NODE_ENV=production

COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
# COPY has no optional form; keep a public/ directory in the repo (an empty one is enough)
COPY --from=builder /app/public ./public

EXPOSE 3000
CMD ["node", "server.js"]

Template selection rule:

Use the npm variant only for repos that are still on npm with package-lock.json and matching Docker/CI scripts.

Use the pnpm variant for repos that have migrated to pnpm and carry pnpm-lock.yaml plus aligned CI/Docker/docs.

Do not leave a repo in mixed state after migration.
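The selection rule can be checked mechanically from the lockfiles a repo carries. The following is a minimal sketch, not an existing ByteLyst script: `detect_pm` is a hypothetical helper, and it assumes lockfiles live in the directory passed to it.

```bash
#!/usr/bin/env bash
# Sketch: decide which Dockerfile template applies, based on lockfiles.
# detect_pm is a hypothetical helper name used only for illustration.
detect_pm() {
  local dir="${1:-.}"
  local has_npm=0 has_pnpm=0
  [ -f "$dir/package-lock.json" ] && has_npm=1
  [ -f "$dir/pnpm-lock.yaml" ] && has_pnpm=1

  if [ "$has_npm" -eq 1 ] && [ "$has_pnpm" -eq 1 ]; then
    echo "mixed"      # both lockfiles present: the state the rule forbids
    return 1
  elif [ "$has_pnpm" -eq 1 ]; then
    echo "pnpm"
  elif [ "$has_npm" -eq 1 ]; then
    echo "npm"
  else
    echo "unknown"    # no lockfile found
    return 1
  fi
}
```

Running `detect_pm ./web` in CI before the Docker build would fail fast on a repo that still carries both lockfiles.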

docker-prep.sh (for repos that don't have one yet)

Copy from learning_ai_trails/scripts/docker-prep.sh — it handles both backend/ and web/ targets, packs all file: refs into .tarballs/, and rewrites package.json to point at them.

The important rule is behavior, not shell-script ancestry:

docker-prep.sh must support both legacy npm repos and migrated pnpm repos.
It must not hardcode npm assumptions into tarball rewrite flow.
It must preserve the repo's package-manager semantics after prep:
- keep the correct lockfile
- keep the correct install command in Docker/CI
- keep .tarballs/ handling compatible with the repo's active package manager

cp learning_ai_trails/scripts/docker-prep.sh <target-repo>/scripts/docker-prep.sh
chmod +x <target-repo>/scripts/docker-prep.sh

11.1 Long-Term Package-Manager Migration Roadmap

End-state

learning_ai_common_plat remains the canonical pnpm workspace monorepo.
Node-based product repos migrate to pnpm over time.
Product repos remain independent repositories, not one combined workspace.
Current .tarballs/ handling for @bytelyst/* remains supported unless it is explicitly simplified later.

Migration principles

No big-bang migration.
One repo at a time.
Fully green before moving to the next repo.
Do not combine package-manager migration with unrelated dependency upgrades.
Migrate CI, Docker, and docs together in the same repo migration.
No mixed lockfile/package-manager state after migration.

Phase 0 — policy and checklist

Define package-manager policy.
Define migration checklist.
Define validation gates.

Pilot

learning_ai_flowmonk

Wave 1

learning_ai_trails
learning_ai_local_memory_gpt

Wave 2

learning_ai_notes
learning_ai_fastgap
learning_ai_clock

Wave 3

learning_ai_jarvis_jr
learning_voice_ai_agent

Validation gates per migrated repo

A repo is only considered migrated when all of the following are aligned and passing:

install
test
typecheck
build
Docker build
local shared package resolution
docs/CI updated
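The gates above can be run in a fixed order so that a migration fails at the first broken step. This is a sketch under assumptions: the script names `test`, `typecheck`, and `build`, and the image tag `migration-check`, are illustrative and may not match a given repo's package.json. The "local shared package resolution" and "docs/CI updated" gates still need a manual check.

```bash
#!/usr/bin/env bash
# gate_commands prints one command per gate for the given package manager.
# Script names and the image tag are assumptions, not taken from any repo.
gate_commands() {
  local pm="$1"
  case "$pm" in
    npm)  echo "npm ci" ;;
    pnpm) echo "pnpm install --frozen-lockfile" ;;
    *)    echo "unsupported package manager: $pm" >&2; return 1 ;;
  esac
  echo "$pm run test"
  echo "$pm run typecheck"
  echo "$pm run build"
  echo "./scripts/docker-prep.sh"
  echo "docker build -t migration-check ."
}

# run_gates executes each gate in order and stops at the first failure.
run_gates() {
  local pm="$1" cmds cmd
  cmds="$(gate_commands "$pm")" || return 1
  while IFS= read -r cmd; do
    echo "==> $cmd"
    # </dev/null keeps the gate from consuming the remaining command list.
    sh -c "$cmd" </dev/null || { echo "gate failed: $cmd" >&2; return 1; }
  done <<< "$cmds"
}
```

`gate_commands pnpm` can be used on its own as a dry run to review what `run_gates pnpm` would execute.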

12. Audit Findings (Review 2026-03-22)

Systematic code review of all claims in this document against the actual codebase.

F1. Port Conflicts (CRITICAL)

Grafana uses port 3000. The following webs also default to 3000:

admin-web (no port in package.json)
ChronoMind web (no port override)
JarvisJr web (no port override)
FlowMonk web (no port override)
NoteLett web (Dockerfile EXPOSE 3000)
ActionTrail web (Dockerfile EXPOSE 3000)

Fix: Set PORT env var in compose for each, or use host:container port remapping.
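A quick way to catch collisions like these before starting the stack is to list each service's intended host port and look for duplicates. The sketch below assumes a hand-maintained list of `service port` pairs on stdin; `find_port_conflicts` is a hypothetical helper, not an existing script.

```bash
#!/usr/bin/env bash
# find_port_conflicts reads "service port" pairs from stdin and prints
# every port claimed by more than one service, with the claimants.
find_port_conflicts() {
  awk '
    {
      if (count[$2]++) owners[$2] = owners[$2] "," $1
      else             owners[$2] = $1
    }
    END {
      for (p in count) if (count[p] > 1) print p ": " owners[p]
    }
  ' | sort
}
```

For example, `printf 'grafana 3000\nadmin-web 3000\nplatform-service 4003\n' | find_port_conflicts` prints `3000: grafana,admin-web`.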

F2. `file:` Dependencies Break Docker Builds (CRITICAL)

Every product backend and web has file:../../learning_ai_common_plat/packages/* dependencies in package.json. These resolve locally via symlinks but fail inside Docker because the sibling repo isn't in the build context.

Pattern: Each repo needs a docker-prep.sh that:

Runs pnpm build in common-plat
Packs each @bytelyst/* package into a .tarballs/*.tgz
Rewrites package.json file: refs → file:.tarballs/bytelyst-*.tgz
Preserves the product repo's active package-manager semantics during the rewrite
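The rewrite step depends on knowing the tarball filename in advance. `npm pack` names a scoped package by dropping the `@` and replacing `/` with `-`; the sketch below assumes `pnpm pack` follows the same convention. `tarball_name` and `rewritten_ref` are hypothetical helpers, and `@bytelyst/auth` in the usage note is an illustrative package name.

```bash
#!/usr/bin/env bash
# tarball_name prints the filename that packing <pkg>@<version> produces,
# e.g. "@scope/name" at 1.2.3 becomes "scope-name-1.2.3.tgz".
tarball_name() {
  local pkg="$1" version="$2"
  local flat="${pkg#@}"     # drop the leading "@" of a scoped name
  flat="${flat//\//-}"      # replace "/" with "-"
  echo "${flat}-${version}.tgz"
}

# rewritten_ref prints the file: ref that replaces the sibling-repo path.
rewritten_ref() {
  echo "file:.tarballs/$(tarball_name "$1" "$2")"
}
```

`rewritten_ref "@bytelyst/auth" "1.2.3"` prints `file:.tarballs/bytelyst-auth-1.2.3.tgz`, which is the value a prep script would write into package.json.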

All 10 repos now have docker-prep.sh (created 2026-03-22). Previously only ActionTrail, LocalMemGPT, NoteLett, NomGap had them.

Long-term note: As product repos migrate to pnpm, this pattern remains valid. What changes is the repo-local install/runtime contract (pnpm install --frozen-lockfile instead of npm ci), not the deployment architecture or the need to package @bytelyst/* dependencies for isolated Docker contexts.

F3. NomGap Backend Dockerfile Ignores `file:` Deps (BUG)

learning_ai_fastgap/backend/Dockerfile copies package.json and runs npm ci, but never copies .tarballs/ into the image, so the rewritten file: refs cannot resolve. It needs a COPY step for .tarballs/ before npm ci.

F4. NoteLett Backend Dockerfile Copies Everything (BUG)

learning_ai_notes/backend/Dockerfile does COPY . . in the build stage, which pulls in broken node_modules symlinks left by file: deps. It should COPY src/, tsconfig.json, and .tarballs/ explicitly instead.

F5. Missing `output: 'standalone'` in next.config.ts (CRITICAL)

The Dockerfile template copies from .next/standalone/ — this directory only exists when output: 'standalone' is set in next.config.ts.

Web	Has `output: 'standalone'`?	Notes
NomGap	✅	Set directly
NoteLett	✅	Set directly
ActionTrail	✅	Set directly
LocalMemGPT	✅	Set directly
admin-web	✅	Conditional: `process.env.VERCEL ? {} : { output: 'standalone' }`
tracker-web	✅	Conditional (same)
user-dashboard	✅	Conditional (same)
ChronoMind	✅	Added 2026-03-22 (conditional)
JarvisJr	✅	Added 2026-03-22 (conditional)
FlowMonk	✅	Added 2026-03-22 (conditional)
MindLyst	✅	Added 2026-03-22 (conditional)

F6. Build Context Mismatch for ActionTrail + LocalMemGPT

Their Dockerfiles expect repo-root as build context (they COPY backend/... and COPY shared/...). The compose build: must use context: ./repo-name + dockerfile: backend/Dockerfile, not build: ./repo-name/backend.

Already correct in the compose above. Calling it out so future editors don't "simplify" it.

F7. Node.js Version Inconsistency

Existing Dockerfiles use mixed Node versions:

NomGap, NoteLett: node:20-alpine
ActionTrail, LocalMemGPT: node:22-alpine / node:22-slim

Recommendation: Standardize on node:22-alpine for all new Dockerfiles. Existing ones work but should be updated for consistency.

F8. Missing `--webpack` Flag for Next.js Builds

Several web apps require the --webpack flag for builds (Serwist PWA is incompatible with Turbopack, or @bytelyst/* file: refs need transpilation). The Dockerfile template should call the repo's package-manager-appropriate build command (npm run build or pnpm run build), and that script should map to next build --webpack where required.

F9. Missing `.env.ecosystem` Template

The compose references .env.ecosystem but the doc doesn't define its contents. Key vars needed:

# .env.ecosystem — shared env for all services
COSMOS_ENDPOINT=https://cosmos-emulator:8081
COSMOS_KEY=<emulator-key>
COSMOS_DATABASE=bytelyst
JWT_SECRET=dev-ecosystem-secret-change-me
AZURE_BLOB_CONNECTION_STRING=DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=...;BlobEndpoint=http://azurite:10000/devstoreaccount1;
PLATFORM_SERVICE_URL=http://platform-service:4003
EXTRACTION_SERVICE_URL=http://extraction-service:4005
DB_PROVIDER=memory
NODE_ENV=production
CORS_ORIGIN=*
SMTP_HOST=mailpit
SMTP_PORT=1025
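Because the template is still pending, a small pre-flight check helps catch an incomplete `.env.ecosystem` before Compose starts. This is a sketch: `missing_env_keys` is a hypothetical helper, and the required-key list is an assumption drawn from the variables listed above, not a confirmed contract of the services.

```bash
#!/usr/bin/env bash
# Keys assumed to be mandatory; adjust to what the services actually read.
REQUIRED_ENV_KEYS="COSMOS_ENDPOINT COSMOS_KEY COSMOS_DATABASE JWT_SECRET PLATFORM_SERVICE_URL EXTRACTION_SERVICE_URL"

# missing_env_keys prints each required key that has no KEY= line in the file.
missing_env_keys() {
  local file="$1" key
  for key in $REQUIRED_ENV_KEYS; do
    grep -q "^${key}=" "$file" || echo "$key"
  done
}
```

An empty result from `missing_env_keys .env.ecosystem` means every assumed key is present; any printed name is a variable to add.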

F10. `host.docker.internal` Only Works on Docker Desktop (Mac/Windows)

LocalMemGPT uses OLLAMA_URL: 'http://host.docker.internal:11434' — this works on Docker Desktop but not on Linux VMs (which is the likely deployment target).

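Fix on Linux: Add extra_hosts: ['host.docker.internal:host-gateway'] to the service (requires Docker Engine 20.10 or newer). Alternatively, use network_mode: host and point OLLAMA_URL at http://localhost:11434, since the container then shares the host's network stack.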

Summary of Required Work Before Compose Works

Priority	Item	Count	Status
P0	Create missing `docker-prep.sh`	6 repos	✅ Done (3 created, 3 already existed)
P0	Create missing backend Dockerfiles	6 repos	✅ Done
P0	Create missing web Dockerfiles	5 repos	✅ Done (4 created, PeakPulse has no web)
P0	Add `output: 'standalone'` to next.config.ts	4 webs	✅ Done (ChronoMind, JarvisJr, FlowMonk, MindLyst)
P1	Fix NomGap backend Dockerfile (add `.tarballs/` COPY)	1 file	✅ Done
P1	Fix NoteLett backend Dockerfile (explicit COPY, not `.`)	1 file	✅ Done
P1	Create `.env.ecosystem` template	1 file	Pending
P2	Standardize Node.js version to 22-alpine	4 Dockerfiles	✅ Done (all new Dockerfiles use 22-alpine)
P2	Add `extra_hosts` for Linux VM Ollama access	1 service	Pending

13. K8s & Docker Best Practices (from Production Comparisons)

Derived from comparing three production K8s deployments: a Go-based Call Controller (Paladin), a Python/FastAPI streaming agent platform (NetBond), and a Python/FastAPI voice agent (Welcome Agent). These patterns should be adopted when ByteLyst moves from Docker Compose → K3s → managed K8s.

13.1 Deployment — Zero-Downtime Rolling Updates

Do this (NetBond pattern):

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0 # Never kill a pod before its replacement is ready
      maxSurge: 1 # Only 1 extra pod during rollout
  template:
    spec:
      terminationGracePeriodSeconds: 45 # Must cover the preStop sleep plus the app's drain timeout
      containers:
        - lifecycle:
            preStop:
              exec:
                command: ['sleep', '5'] # Let load balancer deregister before SIGTERM

Don't do this (Paladin anti-pattern):

maxUnavailable: 50% # Up to half the pods can be taken down at once, so capacity halves mid-rollout
maxSurge: 50% # Up to 50% extra pods are scheduled at once, spiking resource usage

ByteLyst action: Every deployment template should use maxUnavailable: 0 + preStop sleep + explicit terminationGracePeriodSeconds matching the Fastify graceful shutdown timeout.
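Kubernetes starts the grace-period countdown when termination begins, so the preStop sleep counts against terminationGracePeriodSeconds. A minimal sketch of the arithmetic follows; `min_grace_period` is a hypothetical helper, and the 30-second drain timeout and 10-second buffer in the usage note are assumptions, not values taken from the ByteLyst services.

```bash
#!/usr/bin/env bash
# min_grace_period prints the smallest terminationGracePeriodSeconds that
# still leaves room for the preStop sleep, the app's drain timeout, and a
# safety buffer (default 10 seconds) before the kubelet sends SIGKILL.
min_grace_period() {
  local prestop_sleep="$1" drain_timeout="$2" buffer="${3:-10}"
  echo $(( prestop_sleep + drain_timeout + buffer ))
}
```

With a 5-second preStop sleep and an assumed 30-second drain timeout, `min_grace_period 5 30` prints `45`.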

13.2 Pod Security Context

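Always set on each container (NetBond pattern). runAsNonRoot, runAsUser, and runAsGroup can also be set once at pod level, but allowPrivilegeEscalation and readOnlyRootFilesystem exist only in the container-level securityContext: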

securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  runAsGroup: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true

If the app needs writable paths (e.g., /tmp, cache dirs), mount emptyDir volumes over them. volumes belongs in the pod spec and volumeMounts in the container spec:

volumes:
  - name: tmp
    emptyDir: {}
  - name: cache
    emptyDir: {}
volumeMounts:
  - name: tmp
    mountPath: /tmp
  - name: cache
    mountPath: /home/node/.cache

ByteLyst action: All Fastify backends are stateless — readOnlyRootFilesystem: true works. Next.js standalone servers may need /tmp writable.

13.3 Health Probes — Dedicated Endpoints

Do this:

livenessProbe:
  httpGet:
    path: /health # Dedicated lightweight endpoint
    port: 4003
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5 # Fast fail — 5s max
readinessProbe:
  httpGet:
    path: /health
    port: 4003
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 5

Don't do this (Welcome Agent anti-pattern):

livenessProbe:
  httpGet:
    path: /openapi.json # Heavy endpoint, not a health check
  timeoutSeconds: 60 # Masks real failures for a full minute

ByteLyst action: All backends already expose GET /health → { status: "ok" }. Use it. Set timeout to 5s.

13.4 Ingress — WebSocket Support

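If any service uses WebSocket or SSE (FlowMonk SSE, LocalMemGPT streaming, future real-time features), raise the proxy timeouts and turn off buffering. The annotations below are specific to the ingress-nginx controller; K3s ships Traefik by default, which does not read them: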

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: '1800'
    nginx.ingress.kubernetes.io/proxy-send-timeout: '1800'
    nginx.ingress.kubernetes.io/proxy-buffering: 'off'
    nginx.ingress.kubernetes.io/proxy-http-version: '1.1'
    nginx.ingress.kubernetes.io/configuration-snippet: |
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection "upgrade";

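Leaving the timeouts at their defaults is a silent failure: ingress-nginx closes a proxied connection after 60 seconds without traffic, so long-lived WebSocket and SSE streams drop with no application error. ingress-nginx already handles the WebSocket Upgrade and Connection headers itself, and recent releases disable snippet annotations by default, so the configuration-snippet is usually unnecessary.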

13.5 HPA — Use `autoscaling/v2`