# ByteLyst Ecosystem — Single-VM Deployment Guide

> Deploy the **entire** ByteLyst ecosystem on one VM, fully Dockerized, with a local Kubernetes layer (Docker Desktop or K3s) for production-readiness practice.

---

## Package-Manager Strategy (current transition plan)

- `learning_ai_common_plat` is already the canonical **`pnpm` workspace** monorepo for shared packages, services, and dashboards.
- Node/TypeScript product repos are moving toward **`pnpm` as the long-term standard**, but that migration is still **repo-by-repo** and **incremental**.
- During the transition, each repo's Docker/build flow must follow the repo's own:
  - `packageManager` field
  - lockfile
  - Dockerfile
  - `docker-prep.sh` behavior
- This plan does **not** merge all repos into one mega-monorepo. Product repos remain independent repositories.
- Once a repo migrates to `pnpm`, it must be fully aligned in the same change set:
  - no `pnpm-lock.yaml` with `npm ci`
  - no stale `package-lock.json`
  - no mixed package-manager assumptions in CI, Docker, or docs
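
A quick way to spot the mixed states listed above is to look at which lockfiles a directory actually contains. The helper below is a minimal sketch, not part of any repo's tooling: `detect_pm` is a hypothetical name, and it assumes the lockfile sits directly in the directory you pass (repo root or a `backend/` / `web/` subdirectory, depending on the repo).

```bash
# Report which package manager a directory is set up for, judged by lockfiles.
# Prints one of: pnpm, npm, mixed, unknown.
# ASSUMPTION: lockfiles live directly in the given directory.
detect_pm() {
  dir=$1
  has_pnpm=no
  has_npm=no
  if [ -f "$dir/pnpm-lock.yaml" ]; then has_pnpm=yes; fi
  if [ -f "$dir/package-lock.json" ]; then has_npm=yes; fi
  case "$has_pnpm/$has_npm" in
    yes/yes) echo mixed ;;    # stale lockfile left behind after a migration
    yes/no)  echo pnpm ;;
    no/yes)  echo npm ;;
    *)       echo unknown ;;
  esac
}
```

Running `detect_pm <dir>` in CI and failing on `mixed` is one way to enforce the "same change set" rule; the result should also agree with the repo's `packageManager` field.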

> **Migration-impact note:** The deployment architecture in this guide stays the same during the `pnpm` migration (Compose, K3s, ingress, namespaces, VM sizing). The main maintenance surface is Docker/build instructions and dependency-prep flow. The biggest operational risk is stale templates or stale docs after an individual repo migrates.

---

## 1. Service Inventory

### Shared Infrastructure (common-plat)

| Service                  | Port       | Image / Stack                                                          | RAM Est. |
| ------------------------ | ---------- | ---------------------------------------------------------------------- | -------- |
| **platform-service**     | 4003       | Fastify 5 + TS                                                         | ~200 MB  |
| **extraction-service**   | 4005       | Fastify 5 + Python sidecar                                             | ~350 MB  |
| **mcp-server**           | 4007       | Fastify 5 + TS                                                         | ~150 MB  |
| **Cosmos DB Emulator**   | 8081, 1234 | `mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:vnext-preview` | ~2 GB    |
| **Azurite** (blob)       | 10000      | `mcr.microsoft.com/azure-storage/azurite`                              | ~100 MB  |
| **Mailpit** (SMTP)       | 1025, 8025 | `axllent/mailpit`                                                      | ~50 MB   |
| **Traefik** (gateway)    | 80, 8080   | `traefik:v3.3`                                                         | ~100 MB  |
| **Loki** (logs)          | 3100       | `grafana/loki`                                                         | ~200 MB  |
| **Grafana** (dashboards) | 3000       | `grafana/grafana`                                                      | ~200 MB  |

### Product Backends (Fastify 5 + TypeScript)

| Product                 | Port | RAM Est. |
| ----------------------- | ---- | -------- |
| **LysnrAI** backend     | 4015 | ~150 MB  |
| **MindLyst** backend    | 4014 | ~150 MB  |
| **ChronoMind** backend  | 4011 | ~150 MB  |
| **JarvisJr** backend    | 4012 | ~150 MB  |
| **NomGap** backend      | 4013 | ~150 MB  |
| **PeakPulse** backend   | 4010 | ~150 MB  |
| **FlowMonk** backend    | 4017 | ~150 MB  |
| **NoteLett** backend    | 4016 | ~150 MB  |
| **ActionTrail** backend | 4018 | ~150 MB  |
| **LocalMemGPT** backend | 4019 | ~150 MB  |

### Web Dashboards (Next.js 16)

| Dashboard              | Default Port | Compose Port | RAM Est. | Notes                                             |
| ---------------------- | ------------ | ------------ | -------- | ------------------------------------------------- |
| **admin-web**          | 3000         | **3001**     | ~250 MB  | No port in package.json; must set `PORT=3001` env |
| **user-dashboard-web** | 3002         | 3002         | ~250 MB  | Port set in package.json                          |
| **tracker-web**        | 3003         | 3003         | ~200 MB  | Port set in package.json                          |
| **NomGap** web         | 3040         | 3040         | ~200 MB  | Port set in Dockerfile                            |
| **ChronoMind** web     | 3000         | **3051**     | ~200 MB  | No port override; remap in compose or set `PORT`  |
| **JarvisJr** web       | 3000         | **3052**     | ~200 MB  | No port override; remap in compose or set `PORT`  |
| **FlowMonk** web       | 3000         | **3053**     | ~200 MB  | No port override; remap in compose or set `PORT`  |
| **NoteLett** web       | 3000         | **3054**     | ~200 MB  | Dockerfile EXPOSE 3000; remap in compose          |
| **ActionTrail** web    | 3000         | **3060**     | ~200 MB  | Dockerfile EXPOSE 3000; remap in compose          |
| **LocalMemGPT** web    | 3070         | 3070         | ~200 MB  | Port set in package.json + Dockerfile             |
| **MindLyst** web       | 3050         | 3050         | ~200 MB  | Port set in package.json (`-p 3050`)              |

> **Port conflict warning:** Grafana uses port 3000, and the admin-web, ChronoMind, JarvisJr, FlowMonk, NoteLett, and ActionTrail webs all default to 3000 as well. For each of these, the compose file **must** either set the `PORT` env var or remap the host port via the `ports:` mapping so that no two services publish the same host port.
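
Host-port collisions are easy to check mechanically. The sketch below assumes you can produce a list of `service:hostport` pairs (by hand or from the compose file); `dup_host_ports` is a hypothetical helper name, and the sample input includes a deliberately misconfigured `notelett-web:3000` entry for illustration.

```bash
# Print every host port that appears more than once in a list of
# "service:hostport" pairs read from stdin (one pair per line).
dup_host_ports() {
  cut -d: -f2 | sort -n | uniq -d
}

# Illustrative input: notelett-web is (wrongly) published on 3000 here.
printf '%s\n' grafana:3000 admin-web:3001 chronomind-web:3051 notelett-web:3000 \
  | dup_host_ports
# prints 3000
```

An empty result means every published host port is unique.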

### Optional / AI

| Service          | Port  | RAM Est.                  |
| ---------------- | ----- | ------------------------- |
| **Ollama** (LLM) | 11434 | 4–16 GB (model-dependent) |

---

## 2. VM Sizing

### Minimum (dev/staging, no Ollama)

| Spec      | Value            |
| --------- | ---------------- |
| **vCPUs** | 8                |
| **RAM**   | 32 GB            |
| **Disk**  | 100 GB SSD       |
| **OS**    | Ubuntu 24.04 LTS |

**Breakdown:**

- Cosmos Emulator: ~2 GB
- 10 Fastify backends × 150 MB = ~1.5 GB
- 3 shared services × 250 MB = ~0.75 GB
- 11 Next.js webs × 200 MB = ~2.2 GB
- Infra (Traefik, Loki, Grafana, Azurite, Mailpit) = ~0.65 GB
- K3s overhead = ~0.5 GB
- **Subtotal: ~7.6 GB** → headroom for spikes + build cache = **32 GB**
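
The line items in this breakdown can be re-added with plain shell arithmetic whenever a service is added or an estimate changes. This is only a sketch; the six values are the per-group estimates listed above, converted to MB.

```bash
# Sum the per-group RAM estimates (MB): Cosmos, backends, shared services,
# Next.js webs, infra, K3s overhead.
total_mb=0
for mb in 2000 1500 750 2200 650 500; do
  total_mb=$((total_mb + mb))
done
# Integer arithmetic only: print GB with one decimal place.
echo "$((total_mb / 1000)).$((total_mb % 1000 / 100)) GB"
# prints 7.6 GB
```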

### Recommended (with Ollama, small models)

| Spec      | Value                                         |
| --------- | --------------------------------------------- |
| **vCPUs** | 16                                            |
| **RAM**   | 64 GB                                         |
| **Disk**  | 200 GB NVMe SSD                               |
| **GPU**   | Optional NVIDIA T4/A10 for fast LLM inference |
| **OS**    | Ubuntu 24.04 LTS                              |

### Cloud Equivalents

| Provider    | Instance         | vCPU | RAM    | Price (approx)   |
| ----------- | ---------------- | ---- | ------ | ---------------- |
| **Azure**   | Standard_D8s_v5  | 8    | 32 GB  | ~$280/mo         |
| **Azure**   | Standard_D16s_v5 | 16   | 64 GB  | ~$560/mo         |
| **AWS**     | m6i.2xlarge      | 8    | 32 GB  | ~$280/mo         |
| **AWS**     | m6i.4xlarge      | 16   | 64 GB  | ~$560/mo         |
| **Hetzner** | CPX51            | 16   | 32 GB  | ~$45/mo          |
| **Hetzner** | CCX63            | 48   | 192 GB | ~$230/mo         |
| **Home**    | Mac Mini M4 Pro  | 12   | 48 GB  | One-time ~$1,600 |

> **Cost tip:** Hetzner is 5–10× cheaper than Azure/AWS for dev/staging.

---

## 3. Architecture: Docker Compose → K3s Migration Path

### Phase 1: Docker Compose (after prerequisite work)

> **⚠️ Prerequisite:** ALL product repos must run `docker-prep.sh` before building Docker images (see §12 Audit Findings). All Dockerfiles and `output: 'standalone'` configs are now in place (completed 2026-03-22). During the package-manager transition, each repo's Docker build must follow that repo's declared package manager and lockfile semantics rather than assuming `npm` or `pnpm` globally.

Create a **unified** `docker-compose.ecosystem.yml` that brings everything up.

### Phase 2: Local Kubernetes (Docker Desktop or K3s)

Two options for single-node K8s — both give you **real** `kubectl`, Helm, Ingress, and CRDs identical to production AKS/EKS/GKE.

#### Option A: Docker Desktop Kubernetes (recommended for Mac/Windows dev)

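Docker Desktop includes a built-in single-node Kubernetes cluster (provisioned with kubeadm by default; recent releases can also provision it with **kind**, Kubernetes IN Docker). Enable it in Docker Desktop → Settings → Kubernetes → Enable Kubernetes.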

- **Zero install** — checkbox in Docker Desktop, K8s v1.31+ included
- **Images shared** — `docker build` images are immediately available to K8s (no import step!)
- **GUI dashboard** — Docker Desktop shows Deployments, Pods, Services, Ingresses, ConfigMaps, Secrets
- **kubectl pre-configured** — context `docker-desktop` auto-created
- **Helm works** — install via `brew install helm`
- **Best for:** Mac/Windows local development, quick iteration, visual debugging
- **Limitation:** Single-node only, can't add workers (use K3s for multi-node practice)

#### Option B: K3s (recommended for Linux VMs / multi-node practice)

[K3s](https://k3s.io/) is a lightweight, certified Kubernetes distro.

- Production-grade (CNCF-certified Kubernetes distribution, originally developed by Rancher)
- Single binary, ~70 MB, installs in 30 seconds
- Built-in Traefik Ingress (you already use Traefik!)
- Built-in local-path StorageClass
- Runs as systemd service (survives reboot)
- Can scale to multi-node later by just joining worker nodes
- **Best for:** Linux VMs, Hetzner/cloud deployment, multi-node scaling practice

---

## 4. Implementation Plan

### 4.1 Phase 1 — Unified Docker Compose

Create `docker-compose.ecosystem.yml` at workspace root (`~/code/mygh/`) that composes all services:

**⚠️ Critical prerequisite — run BEFORE `docker compose build`:**

```bash
# Pack @bytelyst/* file: dependencies into tarballs for each product repo.
# Every product repo has file: refs to ../learning_ai_common_plat/packages/*
# which don't resolve inside Docker build context. docker-prep.sh packs them.
# The prep flow must preserve each repo's package-manager semantics while rewriting
# file: refs for Docker contexts.
for repo in learning_voice_ai_agent learning_multimodal_memory_agents learning_ai_clock \
            learning_ai_jarvis_jr learning_ai_peakpulse learning_ai_flowmonk \
            learning_ai_fastgap learning_ai_notes learning_ai_trails learning_ai_local_memory_gpt; do
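  # Quote the path and stop at the first failure instead of silently continuing.
  (cd "$repo" && ./scripts/docker-prep.sh) || { echo "docker-prep failed in $repo" >&2; break; }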
done
```
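
Before starting a long `docker compose build`, it can help to confirm that the prep step really ran everywhere. The check below is a sketch under one assumption: that `docker-prep.sh` leaves a `.tarballs/` directory at the path you pass for each repo (some repos may place it under `backend/` or `web/` instead). `repos_missing_tarballs` is a hypothetical helper name.

```bash
# Print each repo under <root> that has no .tarballs/ directory yet,
# i.e. where docker-prep.sh has probably not been run.
# ASSUMPTION: docker-prep.sh writes to <root>/<repo>/.tarballs.
# Usage: repos_missing_tarballs <root> <repo>...
repos_missing_tarballs() {
  root=$1
  shift
  for repo in "$@"; do
    [ -d "$root/$repo/.tarballs" ] || echo "$repo"
  done
}

# Example (paths from this guide):
# repos_missing_tarballs ~/code/mygh learning_ai_clock learning_ai_notes
```

No output means every listed repo has been prepped.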

```yaml
# ~/code/mygh/docker-compose.ecosystem.yml
# NOTE: All product backends/webs have file: deps to @bytelyst/* packages.
# You MUST run docker-prep.sh for each repo first (see above).

services:
  # ══════════════════════════════════════════════════════
  # INFRASTRUCTURE
  # ══════════════════════════════════════════════════════
  cosmos-emulator:
    image: mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:vnext-preview
    ports: ['8081:8081', '1234:1234']
    environment:
      PROTOCOL: http
      ENABLE_EXPLORER: 'true'
    restart: unless-stopped

  azurite:
    image: mcr.microsoft.com/azure-storage/azurite:3.35.0
    # --location points Azurite at the mounted volume so blobs survive restarts
    command: azurite-blob --blobHost 0.0.0.0 --blobPort 10000 --location /data --skipApiVersionCheck
    ports: ['10000:10000']
    volumes: [azurite-data:/data]
    restart: unless-stopped

  mailpit:
    image: axllent/mailpit:v1.27.5
    ports: ['1025:1025', '8025:8025']
    restart: unless-stopped

  traefik:
    image: traefik:v3.3
    command:
      - '--api.insecure=true'
      - '--providers.docker=true'
      - '--providers.docker.exposedbydefault=false'
      - '--entrypoints.web.address=:80'
    ports: ['80:80', '8080:8080']
    volumes: ['/var/run/docker.sock:/var/run/docker.sock:ro']
    restart: unless-stopped

  loki:
    image: grafana/loki:3.3.2
    ports: ['3100:3100']
    volumes: [loki-data:/loki]
    restart: unless-stopped

  grafana:
    image: grafana/grafana:11.4.0
    ports: ['3000:3000'] # NOTE: many Next.js webs also default to 3000 — avoid conflicts
    environment:
      GF_SECURITY_ADMIN_USER: admin
      GF_SECURITY_ADMIN_PASSWORD: lysnrai
    volumes: [grafana-data:/var/lib/grafana]
    restart: unless-stopped

  # ══════════════════════════════════════════════════════
  # SHARED SERVICES (common-plat — no file: deps, pnpm workspace handles it)
  # ══════════════════════════════════════════════════════
  platform-service:
    build:
      context: ./learning_ai_common_plat
      dockerfile: services/platform-service/Dockerfile
    ports: ['4003:4003']
    env_file: [.env.ecosystem]
    environment:
      PORT: 4003
      COSMOS_AUTO_INIT: 'true'
    depends_on: [cosmos-emulator, azurite, mailpit]
    labels:
      - 'traefik.enable=true'
      - 'traefik.http.routers.platform.rule=Host(`platform.local`)'
      - 'traefik.http.services.platform.loadbalancer.server.port=4003'
    restart: unless-stopped

  extraction-service:
    build:
      context: ./learning_ai_common_plat
      dockerfile: services/extraction-service/Dockerfile
    ports: ['4005:4005']
    env_file: [.env.ecosystem]
    environment:
      PORT: 4005
    depends_on: [cosmos-emulator]
    restart: unless-stopped

  mcp-server:
    build:
      context: ./learning_ai_common_plat
      dockerfile: services/mcp-server/Dockerfile
    ports: ['4007:4007']
    env_file: [.env.ecosystem]
    environment:
      PORT: 4007
      PLATFORM_SERVICE_URL: http://platform-service:4003
      EXTRACTION_SERVICE_URL: http://extraction-service:4005
    depends_on: [platform-service, extraction-service]
    restart: unless-stopped

  # ══════════════════════════════════════════════════════
  # PRODUCT BACKENDS
  # All have file: deps → must run docker-prep.sh first.
  # ActionTrail + LocalMemGPT Dockerfiles use repo-root context.
  # Others use backend/ subdir context.
  # ══════════════════════════════════════════════════════
  lysnrai-backend:
    build: ./learning_voice_ai_agent/backend
    ports: ['4015:4015']
    env_file: [.env.ecosystem]
    environment: { PORT: '4015', SERVICE_NAME: lysnrai-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  mindlyst-backend:
    build: ./learning_multimodal_memory_agents/backend
    ports: ['4014:4014']
    env_file: [.env.ecosystem]
    environment: { PORT: '4014', SERVICE_NAME: mindlyst-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  chronomind-backend:
    build: ./learning_ai_clock/backend
    ports: ['4011:4011']
    env_file: [.env.ecosystem]
    environment: { PORT: '4011', SERVICE_NAME: chronomind-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  jarvisjr-backend:
    build: ./learning_ai_jarvis_jr/backend
    ports: ['4012:4012']
    env_file: [.env.ecosystem]
    environment: { PORT: '4012', SERVICE_NAME: jarvisjr-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  nomgap-backend:
    build: ./learning_ai_fastgap/backend
    ports: ['4013:4013']
    env_file: [.env.ecosystem]
    environment: { PORT: '4013', SERVICE_NAME: nomgap-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  peakpulse-backend:
    build: ./learning_ai_peakpulse/backend
    ports: ['4010:4010']
    env_file: [.env.ecosystem]
    environment: { PORT: '4010', SERVICE_NAME: peakpulse-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  flowmonk-backend:
    build: ./learning_ai_flowmonk/backend
    ports: ['4017:4017']
    env_file: [.env.ecosystem]
    environment: { PORT: '4017', SERVICE_NAME: flowmonk-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  notelett-backend:
    build: ./learning_ai_notes/backend
    ports: ['4016:4016']
    env_file: [.env.ecosystem]
    environment: { PORT: '4016', SERVICE_NAME: notelett-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  actiontrail-backend:
    build:
      context: ./learning_ai_trails # Dockerfile expects repo-root context
      dockerfile: backend/Dockerfile
    ports: ['4018:4018']
    env_file: [.env.ecosystem]
    environment: { PORT: '4018', SERVICE_NAME: actiontrail-backend }
    depends_on: [platform-service]
    restart: unless-stopped

  localmemgpt-backend:
    build:
      context: ./learning_ai_local_memory_gpt # Dockerfile expects repo-root context
      dockerfile: backend/Dockerfile
    ports: ['4019:4019']
    env_file: [.env.ecosystem]
    environment: { PORT: '4019', OLLAMA_URL: 'http://host.docker.internal:11434' }
    extra_hosts: ['host.docker.internal:host-gateway'] # needed on Linux hosts
    volumes: [localmemgpt-data:/app/db]
    restart: unless-stopped

  # ══════════════════════════════════════════════════════
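  # WEB DASHBOARDS
  # IMPORTANT: Most webs default to port 3000 internally.
  # Use PORT env var to override, or remap via host:container ports.
  # NOTE: Next.js inlines NEXT_PUBLIC_* values at build time, and browser code
  # cannot resolve Compose service names. If the client calls these APIs
  # directly, pass browser-reachable URLs at image build time instead.
  # ══════════════════════════════════════════════════════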
  admin-web:
    build: ./learning_ai_common_plat/dashboards/admin-web
    ports: ['3001:3001']
    env_file: [.env.ecosystem]
    environment:
      PORT: 3001 # admin-web has NO port override — defaults to 3000 without this!
    depends_on: [platform-service]
    restart: unless-stopped

  user-dashboard:
    build: ./learning_voice_ai_agent/user-dashboard-web
    ports: ['3002:3002']
    env_file: [.env.ecosystem]
    depends_on: [lysnrai-backend]
    restart: unless-stopped

  tracker-web:
    build: ./learning_ai_common_plat/dashboards/tracker-web
    ports: ['3003:3003']
    env_file: [.env.ecosystem]
    depends_on: [platform-service]
    restart: unless-stopped

  nomgap-web:
    build: ./learning_ai_fastgap/web
    ports: ['3040:3040']
    environment:
      PORT: 3040
      NEXT_PUBLIC_NOMGAP_API_URL: http://nomgap-backend:4013/api
      NEXT_PUBLIC_PLATFORM_SERVICE_URL: http://platform-service:4003/api
    depends_on: [nomgap-backend]
    restart: unless-stopped

  actiontrail-web:
    build: ./learning_ai_trails/web
    ports: ['3060:3000'] # Internal 3000 → external 3060
    environment:
      NEXT_PUBLIC_API_URL: http://actiontrail-backend:4018
    depends_on: [actiontrail-backend]
    restart: unless-stopped

  localmemgpt-web:
    build:
      context: ./learning_ai_local_memory_gpt # Dockerfile expects repo-root context
      dockerfile: web/Dockerfile
    ports: ['3070:3070']
    environment:
      NEXT_PUBLIC_BACKEND_URL: http://localmemgpt-backend:4019
    depends_on: [localmemgpt-backend]
    restart: unless-stopped

  notelett-web:
    build: ./learning_ai_notes/web
    ports: ['3054:3000'] # Internal 3000 → external 3054
    environment:
      NEXT_PUBLIC_BACKEND_URL: http://notelett-backend:4016
    depends_on: [notelett-backend]
    restart: unless-stopped

  chronomind-web:
    build: ./learning_ai_clock/web
    ports: ['3051:3000'] # Internal 3000 → external 3051
    environment:
      NEXT_PUBLIC_BACKEND_URL: http://chronomind-backend:4011
      NEXT_PUBLIC_PLATFORM_SERVICE_URL: http://platform-service:4003
    depends_on: [chronomind-backend]
    restart: unless-stopped

  jarvisjr-web:
    build: ./learning_ai_jarvis_jr/web
    ports: ['3052:3000'] # Internal 3000 → external 3052
    environment:
      NEXT_PUBLIC_BACKEND_URL: http://jarvisjr-backend:4012
      NEXT_PUBLIC_PLATFORM_SERVICE_URL: http://platform-service:4003
    depends_on: [jarvisjr-backend]
    restart: unless-stopped

  flowmonk-web:
    build: ./learning_ai_flowmonk/web
    ports: ['3053:3000'] # Internal 3000 → external 3053
    environment:
      NEXT_PUBLIC_BACKEND_URL: http://flowmonk-backend:4017
      NEXT_PUBLIC_PLATFORM_SERVICE_URL: http://platform-service:4003
    depends_on: [flowmonk-backend]
    restart: unless-stopped

  mindlyst-web:
    build: ./learning_multimodal_memory_agents/mindlyst-native/web
    ports: ['3050:3050']
    environment:
      PORT: 3050 # package.json sets -p 3050
      NEXT_PUBLIC_BACKEND_URL: http://mindlyst-backend:4014
      NEXT_PUBLIC_PLATFORM_SERVICE_URL: http://platform-service:4003
    depends_on: [mindlyst-backend]
    restart: unless-stopped

volumes:
  azurite-data:
  loki-data:
  grafana-data:
  localmemgpt-data:
```
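
Note that the short-form `depends_on` used in this file only orders container startup; it does not wait for a service to be ready. A small retry wrapper is one way to block until a health endpoint answers. This is a minimal sketch: `retry` is a hypothetical helper, and the example URL assumes the backend exposes `/health` on its published port, as the Kubernetes probes later in this guide do for `lysnrai-backend`.

```bash
# Retry a command until it succeeds or the attempt budget is used up.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "gave up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example: wait up to about a minute for the LysnrAI backend.
# retry 30 2 curl -fsS -o /dev/null http://localhost:4015/health
```

The same wrapper works for any command that exits non-zero while a dependency is still starting.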

### 4.2 Phase 2 — Local Kubernetes (Docker Desktop or K3s)

#### Install K3s on the VM

```bash
# Install K3s (30 seconds, includes kubectl + containerd)
curl -sfL https://get.k3s.io | sh -

# Verify
sudo kubectl get nodes
# NAME       STATUS   ROLES                  AGE   VERSION
# myvm       Ready    control-plane,master   30s   v1.30.x+k3s1

# Copy kubeconfig for non-root usage
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $(id -u):$(id -g) ~/.kube/config

# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
```

#### Namespace Layout

```bash
kubectl create namespace bytelyst-infra      # Cosmos, Azurite, Mailpit, Loki, Grafana
kubectl create namespace bytelyst-platform   # platform-service, extraction, mcp
kubectl create namespace bytelyst-products   # 10 product backends
kubectl create namespace bytelyst-web        # All Next.js dashboards
```
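
`kubectl create namespace` returns an error if the namespace already exists, which gets in the way of re-runnable deploy scripts. One hedged alternative is to generate Namespace manifests and apply them; `ns_manifests` below is a hypothetical helper that only prints YAML, so it can be inspected before anything touches the cluster.

```bash
# Emit one Namespace manifest per name given on the command line.
ns_manifests() {
  for ns in "$@"; do
    printf '%s\n' '---' 'apiVersion: v1' 'kind: Namespace' 'metadata:' "  name: $ns"
  done
}

ns_manifests bytelyst-infra bytelyst-platform bytelyst-products bytelyst-web
```

Piping the output to `kubectl apply -f -` creates any missing namespaces and leaves existing ones unchanged.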

#### Example K8s Manifest (one backend)

```yaml
# k8s/products/lysnrai-backend.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lysnrai-backend
  namespace: bytelyst-products
  labels:
    app: lysnrai-backend
    product: lysnrai
spec:
  replicas: 1 # Scale to 2+ when ready
  selector:
    matchLabels:
      app: lysnrai-backend
  template:
    metadata:
      labels:
        app: lysnrai-backend
    spec:
      containers:
        - name: lysnrai-backend
          image: bytelyst/lysnrai-backend:latest
          imagePullPolicy: IfNotPresent # ':latest' defaults to Always, which fails for locally built images
          ports:
            - containerPort: 4015
          envFrom:
            - configMapRef:
                name: bytelyst-common-config
            - secretRef:
                name: bytelyst-secrets
          env:
            - name: PORT
              value: '4015'
            - name: SERVICE_NAME
              value: lysnrai-backend
          resources:
            requests:
              memory: '128Mi'
              cpu: '100m'
            limits:
              memory: '256Mi'
              cpu: '500m'
          livenessProbe:
            httpGet:
              path: /health
              port: 4015
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 4015
            initialDelaySeconds: 5
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: lysnrai-backend
  namespace: bytelyst-products
spec:
  selector:
    app: lysnrai-backend
  ports:
    - port: 4015
      targetPort: 4015
```

#### Ingress (Traefik, built into K3s)

```yaml
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: bytelyst-ingress
  namespace: bytelyst-products
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
  rules:
    - host: lysnrai.local
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: lysnrai-backend
                port:
                  number: 4015
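    # ... repeat per product (Services in bytelyst-products only)
---
# An Ingress can only route to Services in its own namespace, so
# platform-service (namespace bytelyst-platform) gets a separate Ingress.
# The resource name below is illustrative.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: bytelyst-platform-ingress
  namespace: bytelyst-platform
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
  rules:
    - host: platform.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: platform-service
                port:
                  number: 4003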
```
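
The `*.local` hostnames in these rules do not resolve by themselves; on a single VM the simplest option is usually a few `/etc/hosts` entries on the machine running the browser. The helper below is a sketch (`hosts_lines` is a hypothetical name) that prints the lines rather than editing the file, with the IP address as a parameter.

```bash
# Print /etc/hosts-style lines mapping each hostname to one IP address.
# Usage: hosts_lines <ip> <hostname>...
hosts_lines() {
  ip=$1
  shift
  for host in "$@"; do
    printf '%s\t%s\n' "$ip" "$host"
  done
}

hosts_lines 127.0.0.1 lysnrai.local platform.local
```

To apply them, append the output with `hosts_lines ... | sudo tee -a /etc/hosts`, substituting the VM's address for `127.0.0.1` when browsing from another machine.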

---

## 5. Docker Compose → K3s Migration Cheat Sheet

| Docker Compose            | K3s Equivalent                             |
| ------------------------- | ------------------------------------------ |
| `services:`               | `Deployment` + `Service`                   |
| `ports:`                  | `Service` (ClusterIP/NodePort)             |
| `env_file:`               | `ConfigMap` + `Secret`                     |
| `depends_on:`             | `initContainers` or readiness probes       |
| `volumes:`                | `PersistentVolumeClaim` (local-path)       |
| `restart: unless-stopped` | Built-in (K8s always restarts pods)        |
| `labels: traefik.*`       | `Ingress` resource                         |
| `docker compose up`       | `kubectl apply -k k8s/`                    |
| `docker compose logs`     | `kubectl logs -f deploy/X` or Loki/Grafana |
| `docker compose ps`       | `kubectl get pods -A`                      |
| Scale: `--scale X=3`      | `kubectl scale deploy/X --replicas=3`      |
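
For the `env_file:` row, one practical step is splitting `.env.ecosystem` into a non-sensitive part (ConfigMap) and a sensitive part (Secret). The function below is a rough sketch: `split_env` is a hypothetical helper, and the name pattern it uses to decide what counts as sensitive is an assumption that should be reviewed by hand.

```bash
# Split KEY=VALUE lines from stdin into two files: keys that look sensitive
# go to <secret_out>, everything else to <config_out>.
# ASSUMPTION: the name pattern below is illustrative, not exhaustive.
# Usage: split_env <config_out> <secret_out> < .env.ecosystem
split_env() {
  config_out=$1
  secret_out=$2
  : > "$config_out"
  : > "$secret_out"
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      '' | '#'*) continue ;; # skip blank lines and comments
    esac
    key=${line%%=*}
    case "$key" in
      *SECRET* | *PASSWORD* | *TOKEN* | *KEY*) printf '%s\n' "$line" >> "$secret_out" ;;
      *) printf '%s\n' "$line" >> "$config_out" ;;
    esac
  done
}
```

The two files can then be loaded with `kubectl create configmap bytelyst-common-config --from-env-file=<config_out>` and `kubectl create secret generic bytelyst-secrets --from-env-file=<secret_out>`, once per namespace that references them.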

---

## 6. K3s Practice Exercises (on single VM)

These exercises simulate real production scenarios:

### Exercise 1: Rolling Update

```bash
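# Build new image, deploy with zero downtime
docker build -t bytelyst/lysnrai-backend:v2 ./learning_voice_ai_agent/backend
# K3s only: import the image into containerd (Docker Desktop shares images already)
docker save bytelyst/lysnrai-backend:v2 | sudo k3s ctr images import -
kubectl set image deploy/lysnrai-backend lysnrai-backend=bytelyst/lysnrai-backend:v2 -n bytelyst-products
kubectl rollout status deploy/lysnrai-backend -n bytelyst-products
# Roll back if the new version misbehaves:
# kubectl rollout undo deploy/lysnrai-backend -n bytelyst-products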
```

### Exercise 2: Scale Horizontally

```bash
kubectl scale deploy/platform-service --replicas=3 -n bytelyst-platform
# Traefik auto-balances across all 3 pods
```

### Exercise 3: ConfigMap / Secret Rotation

```bash
kubectl create secret generic bytelyst-secrets \
  --from-literal=JWT_SECRET=new-secret \
  --from-literal=COSMOS_KEY=new-key \
  -n bytelyst-platform --dry-run=client -o yaml | kubectl apply -f -
kubectl rollout restart deploy -n bytelyst-platform
```

### Exercise 4: Resource Limits + HPA

```yaml
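# Auto-scale platform-service 1→5 pods based on CPU
# Requires metrics-server (bundled with K3s; install it separately on Docker Desktop)
# and CPU requests set on the target Deployment's containers.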
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: platform-service-hpa
  namespace: bytelyst-platform
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: platform-service
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

### Exercise 5: Helm Chart (packaged deploy)

```bash
# Create chart scaffold
helm create bytelyst-ecosystem
# Templatize all 25+ services into one chart
# Deploy: helm install bytelyst ./bytelyst-ecosystem -n bytelyst --create-namespace
```

---

## 7. Scaling Path: Single VM → Multi-Node

```
Phase 1: Docker Compose          Phase 2: Local K8s (1 node)
┌─────────────────────┐          ┌──────────────────────────────┐
│  Single VM / Mac     │    →     │  Docker Desktop K8s (kind)   │
│  docker compose up   │          │  or K3s on Linux VM          │
│  ~25 containers      │          │  kubectl apply -k · ~25 pods │
└─────────────────────┘          └──────────────────────────────┘
                                          │
                                          ▼
Phase 3: K3s (3 nodes)           Phase 4: Managed K8s
┌──────────────────────┐         ┌──────────────────────┐
│  1 server + 2 agents │    →    │  AKS / EKS / GKE     │
│  Same manifests!     │         │  Same manifests!      │
│  Real HA             │         │  Auto-scaling nodes   │
└──────────────────────┘         └──────────────────────┘
```

**Docker Desktop K8s → K3s migration:** Same manifests, just change `kubectl` context.

**Adding a worker node to K3s (Phase 3) is one command:**

```bash
# On the server, read the join token:
#   sudo cat /var/lib/rancher/k3s/server/node-token
# On the worker VM:
curl -sfL https://get.k3s.io | K3S_URL=https://server-ip:6443 K3S_TOKEN=<token> sh -
```

---

## 8. Recommended Directory Structure

```
~/code/mygh/
├── docker-compose.ecosystem.yml     # Phase 1: all-in-one compose
├── .env.ecosystem                   # Shared env vars
├── k8s/                             # Phase 2: K3s manifests
│   ├── kustomization.yaml           # Kustomize root
│   ├── infra/                       # Cosmos emulator, Azurite, Mailpit, Loki, Grafana
│   ├── platform/                    # platform-service, extraction, mcp
│   ├── products/                    # 10 product backends
│   ├── web/                         # 10+ Next.js dashboards
│   ├── config/                      # ConfigMaps
│   └── secrets/                     # Secrets (gitignored)
├── helm/                            # Phase 3: Helm chart
│   └── bytelyst-ecosystem/
│       ├── Chart.yaml
│       ├── values.yaml
│       └── templates/
└── scripts/
    ├── ecosystem-up.sh              # docker compose -f docker-compose.ecosystem.yml up -d
    ├── ecosystem-k3s-deploy.sh      # kubectl apply -k k8s/
    └── ecosystem-build-all.sh       # Build all Docker images
```
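
The `kustomization.yaml` at the root of `k8s/` is what makes `kubectl apply -k k8s/` pick up every subdirectory. As a sketch of what that file might contain, the hypothetical helper below writes a root kustomization that lists subdirectories as resources; each listed directory then needs its own `kustomization.yaml` naming its manifests.

```bash
# Write a root kustomization.yaml that pulls in each listed subdirectory.
# Usage: write_kustomization <output_file> <subdir>...
write_kustomization() {
  out=$1
  shift
  {
    printf '%s\n' 'apiVersion: kustomize.config.k8s.io/v1beta1' 'kind: Kustomization' 'resources:'
    for dir in "$@"; do
      printf '  - %s\n' "$dir"
    done
  } > "$out"
}

# Example, matching the layout above:
# write_kustomization k8s/kustomization.yaml infra platform products web config
```

Whether `secrets/` belongs in that list depends on how the gitignored secrets are provisioned on the target machine.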

---

## 9. Quick Start Commands

```bash
# ── Phase 1: Docker Compose ───────────────────────────
cd ~/code/mygh

# Build all images (first time, ~15-20 min)
docker compose -f docker-compose.ecosystem.yml build

# Start everything
docker compose -f docker-compose.ecosystem.yml up -d

# Check status
docker compose -f docker-compose.ecosystem.yml ps

# View logs
docker compose -f docker-compose.ecosystem.yml logs -f platform-service

# Tear down
docker compose -f docker-compose.ecosystem.yml down

# ── Phase 2a: Docker Desktop Kubernetes (Mac) ────────
# Enable K8s: Docker Desktop → Settings → Kubernetes → Enable
# Verify:
kubectl config use-context docker-desktop
kubectl get nodes    # Should show: docker-desktop   Ready   control-plane

# Build images (Docker Desktop shares images with K8s — no import needed!)
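# Use the common-plat repo root as build context, matching the compose file
docker build -t bytelyst/platform-service:latest \
  -f learning_ai_common_plat/services/platform-service/Dockerfile ./learning_ai_common_plat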

# Deploy all
kubectl apply -k k8s/

# Check pods
kubectl get pods -A

# Port-forward for local access
kubectl port-forward svc/platform-service 4003:4003 -n bytelyst-platform

# Or view everything in Docker Desktop GUI → Kubernetes tab

# ── Phase 2b: K3s (Linux VM) ─────────────────────────
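# Build + load images into K3s containerd
docker build -t bytelyst/platform-service:latest \
  -f learning_ai_common_plat/services/platform-service/Dockerfile ./learning_ai_common_plat
# Pipe through stdin: process substitution <(...) does not survive sudo
docker save bytelyst/platform-service:latest | sudo k3s ctr images import -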

# Deploy (same manifests as Docker Desktop!)
kubectl apply -k k8s/
kubectl get pods -A
```

---

## 10. Dockerization Status (all complete)

| Repo            | Backend Dockerfile | Web Dockerfile      | `docker-prep.sh` | `output:'standalone'` | Package manager state                    | Lockfile state                     | Docker template type            | Status                               |
| --------------- | ------------------ | ------------------- | ---------------- | --------------------- | ---------------------------------------- | ---------------------------------- | ------------------------------- | ------------------------------------ |
| **LysnrAI**     | ✅                 | ✅ user-dashboard   | ✅               | ✅ (conditional)      | Transitioning toward `pnpm` target       | Follow repo-local current lockfile | Repo-specific during transition | ✅ Ready                             |
| **MindLyst**    | ✅                 | ✅                  | ✅               | ✅ (conditional)      | Transitioning toward `pnpm` target       | Follow repo-local current lockfile | Repo-specific during transition | ✅ Ready                             |
| **ChronoMind**  | ✅                 | ✅                  | ✅               | ✅ (conditional)      | Transitioning toward `pnpm` target       | Follow repo-local current lockfile | Repo-specific during transition | ✅ Ready                             |
| **JarvisJr**    | ✅                 | ✅                  | ✅               | ✅ (conditional)      | Transitioning toward `pnpm` target       | Follow repo-local current lockfile | Repo-specific during transition | ✅ Ready                             |
| **PeakPulse**   | ✅                 | — (no web)          | ✅               | —                     | No Node web surface in this repo         | Follow repo-local current lockfile | Repo-specific during transition | ✅ Ready                             |
| **FlowMonk**    | ✅                 | ✅                  | ✅               | ✅ (conditional)      | **Pilot candidate** for `pnpm` migration | Follow repo-local current lockfile | Repo-specific during transition | ✅ Ready                             |
| **NomGap**      | ✅                 | ✅                  | ✅               | ✅                    | Transitioning toward `pnpm` target       | Follow repo-local current lockfile | Repo-specific during transition | ✅ Fixed (added `.tarballs/` COPY)   |
| **NoteLett**    | ✅                 | ✅                  | ✅               | ✅                    | Transitioning toward `pnpm` target       | Follow repo-local current lockfile | Repo-specific during transition | ✅ Fixed (explicit COPY, not `.`)    |
| **ActionTrail** | ✅                 | ✅                  | ✅               | ✅                    | Transitioning toward `pnpm` target       | Follow repo-local current lockfile | Repo-specific during transition | ✅ Ready (uses `.tarballs/` pattern) |
| **LocalMemGPT** | ✅                 | ✅                  | ✅               | ✅                    | Transitioning toward `pnpm` target       | Follow repo-local current lockfile | Repo-specific during transition | ✅ Ready (repo-root build context)   |
| **admin-web**   | —                  | ✅ (in common-plat) | N/A (`pnpm`)     | ✅ (conditional)      | `pnpm` workspace today                   | `pnpm-lock.yaml` via common-plat   | `pnpm` workspace template       | ✅ Ready                             |
| **tracker-web** | —                  | ✅ (in common-plat) | N/A (`pnpm`)     | ✅ (conditional)      | `pnpm` workspace today                   | `pnpm-lock.yaml` via common-plat   | `pnpm` workspace template       | ✅ Ready                             |

**All 10 product repos now have Dockerfiles and `docker-prep.sh`, and every repo with a web app sets `output: 'standalone'`.** Created 2026-03-22.

> **Note:** The table above tracks Docker readiness, not completed package-manager migration. For product repos, use each repo's actual `packageManager` field and lockfile until that repo is explicitly migrated to `pnpm`.

---

## 11. Dockerfile Template (reference)

> **Critical:** Run `docker-prep.sh` first for product repos that use `@bytelyst/*` `file:` dependencies. The prep step packs those dependencies into `.tarballs/` so Docker builds can resolve them inside the repo's own build context. During the migration window, Dockerfiles must match the repo's package manager and lockfile instead of assuming a single global install command.

### Backend / service template — `npm` repo variant

```dockerfile
# Pre-requisite: run ./scripts/docker-prep.sh to pack @bytelyst/* tarballs
FROM node:22-alpine AS builder
WORKDIR /app

COPY package.json package-lock.json ./
COPY .tarballs/ ./.tarballs/
RUN npm ci --ignore-scripts

COPY tsconfig.json ./
COPY src/ ./src/
RUN npx tsc

# Production stage
FROM node:22-alpine
WORKDIR /app
ENV NODE_ENV=production

COPY package.json package-lock.json ./
COPY .tarballs/ ./.tarballs/
RUN npm ci --omit=dev --ignore-scripts

COPY --from=builder /app/dist ./dist
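# Copy shared/ only if the backend reads shared/product.json at runtime.
# COPY does not run in a shell, so `2>/dev/null || true` cannot make it optional;
# delete the next line in repos that have no shared/ directory.
COPY shared/ ./shared/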

EXPOSE ${PORT:-4010}
CMD ["node", "dist/server.js"]
```

### Backend / service template — `pnpm` repo variant

```dockerfile
# Pre-requisite: run ./scripts/docker-prep.sh if this repo rewrites @bytelyst/* file: deps
FROM node:22-alpine AS builder
WORKDIR /app

RUN corepack enable && corepack prepare pnpm@10 --activate

COPY package.json pnpm-lock.yaml ./
COPY .tarballs/ ./.tarballs/
RUN pnpm install --frozen-lockfile --ignore-scripts

COPY tsconfig.json ./
COPY src/ ./src/
RUN pnpm run build

FROM node:22-alpine
WORKDIR /app
ENV NODE_ENV=production

RUN corepack enable && corepack prepare pnpm@10 --activate

COPY package.json pnpm-lock.yaml ./
COPY .tarballs/ ./.tarballs/
RUN pnpm install --frozen-lockfile --prod --ignore-scripts

COPY --from=builder /app/dist ./dist
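# COPY does not run in a shell, so `2>/dev/null || true` cannot make it optional;
# delete the next line in repos that have no shared/ directory.
COPY shared/ ./shared/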

EXPOSE ${PORT:-4010}
CMD ["node", "dist/server.js"]
```

### Web (Next.js 16) — `npm` repo variant

> **Prerequisite:** `next.config.ts` MUST have `output: 'standalone'` for the standalone Dockerfile pattern to work. Without it, `.next/standalone/` won't be generated and the COPY will fail.

```dockerfile
# Pre-requisite: run ./scripts/docker-prep.sh to pack @bytelyst/* tarballs
FROM node:22-alpine AS builder
WORKDIR /app

COPY package.json package-lock.json ./
COPY .tarballs/ ./.tarballs/
RUN npm ci

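# Needs a .dockerignore that excludes node_modules, .next, and .git;
# otherwise host artifacts overwrite the clean install above.
COPY . .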

# Dummy env vars for Next.js build-time static page collection
ENV NEXT_PUBLIC_BACKEND_URL=http://localhost:4010
ENV NEXT_PUBLIC_PLATFORM_SERVICE_URL=http://localhost:4003

RUN npm run build

FROM node:22-alpine
WORKDIR /app
ENV NODE_ENV=production

COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
# COPY cannot be made optional with shell redirection; delete this line if the app has no public/ directory.
COPY --from=builder /app/public ./public

EXPOSE 3000
CMD ["node", "server.js"]
```

### Web (Next.js 16) — `pnpm` repo variant

> **Prerequisite:** `next.config.ts` MUST have `output: 'standalone'` for the standalone Dockerfile pattern to work. Keep the repo's `build` script authoritative, including `--webpack` where required.

```dockerfile
# Pre-requisite: run ./scripts/docker-prep.sh to pack @bytelyst/* tarballs when applicable
FROM node:22-alpine AS builder
WORKDIR /app

RUN corepack enable && corepack prepare pnpm@10 --activate

COPY package.json pnpm-lock.yaml ./
COPY .tarballs/ ./.tarballs/
RUN pnpm install --frozen-lockfile

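# Needs a .dockerignore that excludes node_modules, .next, and .git;
# otherwise host artifacts overwrite the clean install above.
COPY . .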

# Dummy env vars for Next.js build-time static page collection
ENV NEXT_PUBLIC_BACKEND_URL=http://localhost:4010
ENV NEXT_PUBLIC_PLATFORM_SERVICE_URL=http://localhost:4003

RUN pnpm run build

FROM node:22-alpine
WORKDIR /app
ENV NODE_ENV=production

COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
# COPY cannot be made optional with shell redirection; delete this line if the app has no public/ directory.
COPY --from=builder /app/public ./public

EXPOSE 3000
CMD ["node", "server.js"]
```

> **Template selection rule:**
>
> - Use the `npm` variant only for repos that are still on `npm` with `package-lock.json` and matching Docker/CI scripts.
> - Use the `pnpm` variant for repos that have migrated to `pnpm` and carry `pnpm-lock.yaml` plus aligned CI/Docker/docs.
> - Do **not** leave a repo in mixed state after migration.
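
The selection rule can be checked mechanically. The sketch below is a hypothetical helper (`detect_template_variant` is not part of any ByteLyst repo) that looks only at which lockfiles exist in a directory. It does not read the `packageManager` field, which a fuller check would also compare against the lockfile.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of any ByteLyst repo.
# Prints the template variant implied by the lockfiles in a directory:
#   npm | pnpm | mixed | unknown
# Returns non-zero for "mixed" and "unknown" so a CI step can fail on them.
detect_template_variant() {
  local dir="${1:-.}"
  local has_npm=no has_pnpm=no

  if [ -f "$dir/package-lock.json" ]; then has_npm=yes; fi
  if [ -f "$dir/pnpm-lock.yaml" ]; then has_pnpm=yes; fi

  case "$has_npm:$has_pnpm" in
    yes:yes) echo "mixed"; return 1 ;;   # both lockfiles present: the forbidden mixed state
    no:yes)  echo "pnpm" ;;
    yes:no)  echo "npm" ;;
    *)       echo "unknown"; return 1 ;; # no lockfile found
  esac
}
```

Calling `detect_template_variant ./backend` before a Docker build gives an early, explicit failure when a migrated repo still carries a stale `package-lock.json`.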

### docker-prep.sh (for repos that don't have one yet)

Copy from `learning_ai_trails/scripts/docker-prep.sh` — it handles both `backend/` and `web/` targets, packs all `file:` refs into `.tarballs/`, and rewrites `package.json` to point at them.

The important rule is **behavior**, not shell-script ancestry:

- `docker-prep.sh` must support both legacy `npm` repos and migrated `pnpm` repos.
- It must **not** hardcode `npm` assumptions into tarball rewrite flow.
- It must preserve the repo's package-manager semantics after prep:
  - keep the correct lockfile
  - keep the correct install command in Docker/CI
  - keep `.tarballs/` handling compatible with the repo's active package manager

```bash
cp learning_ai_trails/scripts/docker-prep.sh <target-repo>/scripts/docker-prep.sh
chmod +x <target-repo>/scripts/docker-prep.sh
```
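
The rewrite step depends on a predictable tarball name. `npm pack` names a scoped package's tarball `<scope>-<name>-<version>.tgz`, so the rewritten `file:` reference can be computed from the package name and version. A minimal sketch, assuming a hypothetical package `@bytelyst/logger` at version `1.2.3`; the actual script in `learning_ai_trails` may derive the name differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compute the package.json reference that replaces a
# file:../../learning_ai_common_plat/packages/* dependency after packing.
# Usage: tarball_ref <package-name> <version>
tarball_ref() {
  local name="$1" version="$2"
  local flat
  # "@bytelyst/logger" -> "bytelyst-logger" (drop the "@", turn "/" into "-")
  flat=$(printf '%s' "${name#@}" | tr '/' '-')
  echo "file:.tarballs/${flat}-${version}.tgz"
}
```

Because the result is a plain `file:` path inside the repo, it resolves the same way under `npm ci` and `pnpm install --frozen-lockfile`, provided the lockfile was generated against the rewritten `package.json`.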

## 11.1 Long-Term Package-Manager Migration Roadmap

### End-state

- `learning_ai_common_plat` remains the canonical **`pnpm` workspace** monorepo.
- Node-based product repos migrate to **`pnpm` over time**.
- Product repos remain **independent repositories**, not one combined workspace.
- Current `.tarballs/` handling for `@bytelyst/*` remains supported unless it is explicitly simplified later.

### Migration principles

- No big-bang migration.
- One repo at a time.
- Fully green before moving to the next repo.
- Do not combine package-manager migration with unrelated dependency upgrades.
- Migrate CI, Docker, and docs together in the same repo migration.
- No mixed lockfile/package-manager state after migration.

### Phase 0 — policy and checklist

- Define package-manager policy.
- Define migration checklist.
- Define validation gates.

### Pilot

- `learning_ai_flowmonk`

### Wave 1

- `learning_ai_trails`
- `learning_ai_local_memory_gpt`

### Wave 2

- `learning_ai_notes`
- `learning_ai_fastgap`
- `learning_ai_clock`

### Wave 3

- `learning_ai_jarvis_jr`
- `learning_voice_ai_agent`

### Validation gates per migrated repo

A repo is only considered migrated when all of the following are aligned and passing:

- install
- test
- typecheck
- build
- Docker build
- local shared package resolution
- docs/CI updated
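
The gates above can be run as one ordered check that stops at the first failure. This is a sketch only: `run_gates` is a hypothetical helper, and the script names in the usage note (`test`, `typecheck`, `build`) are assumptions about each repo's `package.json`.

```bash
#!/usr/bin/env bash
# Hypothetical gate runner. Each argument has the form "name=command".
# Runs the commands in order, prints PASS/FAIL per gate, and returns 1
# at the first failing gate without running the remaining ones.
run_gates() {
  local gate name cmd
  for gate in "$@"; do
    name="${gate%%=*}"   # text before the first "="
    cmd="${gate#*=}"     # text after the first "="
    if sh -c "$cmd" >/dev/null 2>&1; then
      echo "PASS $name"
    else
      echo "FAIL $name"
      return 1
    fi
  done
}
```

For a migrated repo this might be invoked as `run_gates "install=pnpm install --frozen-lockfile" "test=pnpm test" "typecheck=pnpm run typecheck" "build=pnpm run build" "docker=docker build -t gate-check ."`. The docs/CI gate stays a manual review item.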

---

## 12. Audit Findings (Review 2026-03-22)

Systematic code review of all claims in this document against the actual codebase.

### F1. Port Conflicts (CRITICAL)

**Grafana** uses port 3000. The following webs also default to 3000:

- admin-web (no port in package.json)
- ChronoMind web (no port override)
- JarvisJr web (no port override)
- FlowMonk web (no port override)
- NoteLett web (Dockerfile EXPOSE 3000)
- ActionTrail web (Dockerfile EXPOSE 3000)

**Fix:** Set `PORT` env var in compose for each, or use host:container port remapping.
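
A sketch of the remapping approach. Each container has its own network namespace, so several containers can listen on 3000 internally; the conflict only arises for host-published ports. The service names and host ports below are hypothetical and should be aligned with the compose file in this guide.

```yaml
# docker-compose fragment (sketch): service names and host ports are assumptions
services:
  grafana:
    ports:
      - '3000:3000' # Grafana keeps host port 3000
  chronomind-web:
    ports:
      - '3101:3000' # host 3101 -> container 3000
  jarvisjr-web:
    ports:
      - '3102:3000' # host 3102 -> container 3000
```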

### F2. `file:` Dependencies Break Docker Builds (CRITICAL)

**Every** product backend and web has `file:../../learning_ai_common_plat/packages/*` dependencies in package.json. These resolve locally via symlinks but **fail inside Docker** because the sibling repo isn't in the build context.

**Pattern:** Each repo needs a `docker-prep.sh` that:

1. Runs `pnpm build` in common-plat
2. Packs each `@bytelyst/*` package into a `.tarballs/*.tgz`
3. Rewrites package.json `file:` refs → `file:.tarballs/bytelyst-*.tgz`
4. Preserves the product repo's active package-manager semantics during the rewrite

**All 10 repos now have `docker-prep.sh`** (created 2026-03-22). Previously only ActionTrail, LocalMemGPT, NoteLett, NomGap had them.

> **Long-term note:** As product repos migrate to `pnpm`, this pattern remains valid. What changes is the repo-local install/runtime contract (`pnpm install --frozen-lockfile` instead of `npm ci`), not the deployment architecture or the need to package `@bytelyst/*` dependencies for isolated Docker contexts.

### F3. NomGap Backend Dockerfile Ignores `file:` Deps (BUG)

`learning_ai_fastgap/backend/Dockerfile` does `COPY package.json → npm ci` but doesn't copy `.tarballs/`. The `file:` refs will fail. Needs the `.tarballs/` COPY step added.

### F4. NoteLett Backend Dockerfile Copies Everything (BUG)

`learning_ai_notes/backend/Dockerfile` does `COPY . .` in the build stage, which includes broken `node_modules` symlinks from `file:` deps. Should use explicit `COPY` of `src/`, `tsconfig.json`, and `.tarballs/` instead.

### F5. Missing `output: 'standalone'` in next.config.ts (CRITICAL)

The Dockerfile template copies from `.next/standalone/` — this directory only exists when `output: 'standalone'` is set in `next.config.ts`.

| Web            | Has `output: 'standalone'`? | Notes                                                             |
| -------------- | --------------------------- | ----------------------------------------------------------------- |
| NomGap         | ✅                          | Set directly                                                      |
| NoteLett       | ✅                          | Set directly                                                      |
| ActionTrail    | ✅                          | Set directly                                                      |
| LocalMemGPT    | ✅                          | Set directly                                                      |
| admin-web      | ✅                          | Conditional: `process.env.VERCEL ? {} : { output: 'standalone' }` |
| tracker-web    | ✅                          | Conditional (same)                                                |
| user-dashboard | ✅                          | Conditional (same)                                                |
| ChronoMind     | ✅                          | Added 2026-03-22 (conditional)                                    |
| JarvisJr       | ✅                          | Added 2026-03-22 (conditional)                                    |
| FlowMonk       | ✅                          | Added 2026-03-22 (conditional)                                    |
| MindLyst       | ✅                          | Added 2026-03-22 (conditional)                                    |

### F6. Build Context Mismatch for ActionTrail + LocalMemGPT

Their Dockerfiles expect repo-root as build context (they `COPY backend/...` and `COPY shared/...`). The compose `build:` must use `context: ./repo-name` + `dockerfile: backend/Dockerfile`, not `build: ./repo-name/backend`.

**Already correct in the compose above.** Calling it out so future editors don't "simplify" it.

### F7. Node.js Version Inconsistency

Existing Dockerfiles use mixed Node versions:

- NomGap, NoteLett: `node:20-alpine`
- ActionTrail, LocalMemGPT: `node:22-alpine` / `node:22-slim`

**Recommendation:** Standardize on `node:22-alpine` for all new Dockerfiles. Existing ones work but should be updated for consistency.
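
To find the Dockerfiles that still need updating, a small scan is enough. The helper below is a hypothetical sketch: it matches only plain `FROM node:<tag>` lines, so a `FROM --platform=... node:` line would be missed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list FROM lines that use a node image other than
# node:22-alpine. Output format is grep's "path:line-number:content".
find_nonstandard_node_images() {
  local root="${1:-.}"
  find "$root" -type f -name 'Dockerfile*' \
    -exec grep -HnE '^FROM[[:space:]]+node:' {} + \
    | grep -v 'node:22-alpine' || true   # an empty result is not an error
}
```

Running `find_nonstandard_node_images ~/repos` across the sibling checkouts (the path is an example) prints one line per stage that still pins `node:20-alpine` or `node:22-slim`.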

### F8. Missing `--webpack` Flag for Next.js Builds

Several web apps require the `--webpack` flag for builds (Serwist PWA is incompatible with Turbopack, or `@bytelyst/*` `file:` ref transpilation requires it). The Dockerfile template should call the repo's package-manager-appropriate build command (`npm run build` or `pnpm run build`), and that script should map to `next build --webpack` where required.

### F9. Missing `.env.ecosystem` Template

The compose file references `.env.ecosystem`, but this document does not define its contents. Key variables needed:

```env
# .env.ecosystem — shared env for all services
COSMOS_ENDPOINT=https://cosmos-emulator:8081
COSMOS_KEY=<emulator-key>
COSMOS_DATABASE=bytelyst
JWT_SECRET=dev-ecosystem-secret-change-me
AZURE_BLOB_CONNECTION_STRING=DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=...;BlobEndpoint=http://azurite:10000/devstoreaccount1;
PLATFORM_SERVICE_URL=http://platform-service:4003
EXTRACTION_SERVICE_URL=http://extraction-service:4005
DB_PROVIDER=memory
NODE_ENV=production
CORS_ORIGIN=*
SMTP_HOST=mailpit
SMTP_PORT=1025
```

### F10. `host.docker.internal` Only Works on Docker Desktop (Mac/Windows)

LocalMemGPT uses `OLLAMA_URL: 'http://host.docker.internal:11434'` — this works on Docker Desktop but **not on Linux VMs** (which is the likely deployment target).

**Fix on Linux:** Add `extra_hosts: ['host.docker.internal:host-gateway']` to the service, or use `network_mode: host`.
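
A sketch of the `extra_hosts` option in compose form. The service name is hypothetical, and `host-gateway` requires Docker Engine 20.10 or newer. Ollama on the host may also need to listen on an address reachable from the Docker bridge network rather than only on loopback.

```yaml
# docker-compose fragment (sketch): service name is an assumption
services:
  localmemgpt-backend:
    environment:
      OLLAMA_URL: 'http://host.docker.internal:11434'
    extra_hosts:
      - 'host.docker.internal:host-gateway' # maps the name to the host's gateway IP on Linux
```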

### Summary of Required Work Before Compose Works

| Priority | Item                                                     | Count         | Status                                                     |
| -------- | -------------------------------------------------------- | ------------- | ---------------------------------------------------------- |
| **P0**   | Create missing `docker-prep.sh`                          | 6 repos       | ✅ Done (3 created, 3 already existed)                     |
| **P0**   | Create missing backend Dockerfiles                       | 6 repos       | ✅ Done                                                    |
| **P0**   | Create missing web Dockerfiles                           | 5 repos       | ✅ Done (4 created, PeakPulse has no web)                  |
| **P0**   | Add `output: 'standalone'` to next.config.ts             | 4 webs        | ✅ Done (ChronoMind, JarvisJr, FlowMonk, MindLyst)         |
| **P1**   | Fix NomGap backend Dockerfile (add `.tarballs/` COPY)    | 1 file        | ✅ Done                                                    |
| **P1**   | Fix NoteLett backend Dockerfile (explicit COPY, not `.`) | 1 file        | ✅ Done                                                    |
| **P1**   | Create `.env.ecosystem` template                         | 1 file        | Pending                                                    |
| **P2**   | Standardize Node.js version to 22-alpine                 | 4 Dockerfiles | ✅ Done (all new Dockerfiles use 22-alpine)                |
| **P2**   | Add `extra_hosts` for Linux VM Ollama access             | 1 service     | Pending                                                    |

---

## 13. K8s & Docker Best Practices (from Production Comparisons)

> Derived from comparing three production K8s deployments: a Go-based Call Controller (Paladin), a Python/FastAPI streaming agent platform (NetBond), and a Python/FastAPI voice agent (Welcome Agent). These patterns should be adopted when ByteLyst moves from Docker Compose → K3s → managed K8s.

### 13.1 Deployment — Zero-Downtime Rolling Updates

**Do this (NetBond pattern):**

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0 # Never kill a pod before its replacement is ready
      maxSurge: 1 # Only 1 extra pod during rollout
  template:
    spec:
      terminationGracePeriodSeconds: 45 # Match your app's drain timeout
      containers:
        - lifecycle:
            preStop:
              exec:
                command: ['sleep', '5'] # Let load balancer deregister before SIGTERM
```

**Don't do this (Paladin anti-pattern):**

```yaml
maxUnavailable: 50% # Up to half the pods are taken down at once; users can see errors
maxSurge: 50% # Up to 50% extra pods during rollout; wastes resources
```

**ByteLyst action:** Every deployment template should use `maxUnavailable: 0` + preStop sleep + explicit `terminationGracePeriodSeconds` matching the Fastify graceful shutdown timeout.

### 13.2 Pod Security Context

**Always set (NetBond pattern):**

```yaml
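# Container-level securityContext (under spec.template.spec.containers[]).
# allowPrivilegeEscalation and readOnlyRootFilesystem are container-only fields.
securityContext: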
  runAsNonRoot: true
  runAsUser: 1000
  runAsGroup: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
```

If the app needs writable paths (e.g., `/tmp`, cache dirs), use `emptyDir` volumes:

```yaml
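# Pod level: spec.template.spec.volumes
volumes:
  - name: tmp
    emptyDir: {}
  - name: cache
    emptyDir: {}
# Container level: spec.template.spec.containers[].volumeMounts
volumeMounts:
  - name: tmp
    mountPath: /tmp
  - name: cache
    mountPath: /home/node/.cache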
```

**ByteLyst action:** All Fastify backends are stateless — `readOnlyRootFilesystem: true` works. Next.js standalone servers may need `/tmp` writable.

### 13.3 Health Probes — Dedicated Endpoints

**Do this:**

```yaml
livenessProbe:
  httpGet:
    path: /health # Dedicated lightweight endpoint
    port: 4003
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5 # Fast fail — 5s max
readinessProbe:
  httpGet:
    path: /health
    port: 4003
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 5
```

**Don't do this (Welcome Agent anti-pattern):**

```yaml
livenessProbe:
  httpGet:
    path: /openapi.json # Heavy endpoint, not a health check
  timeoutSeconds: 60 # Masks real failures for a full minute
```

**ByteLyst action:** All backends already expose `GET /health` → `{ status: "ok" }`. Use it. Set timeout to 5s.
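
For the Compose phase, where Kubernetes probes do not apply, the same endpoint can back a Compose `healthcheck:`. A minimal sketch, assuming the image is Alpine-based (so BusyBox `wget` is available) and that the service is named `platform-service` in the compose file:

```yaml
# docker-compose fragment (sketch): intervals mirror the probe values above
services:
  platform-service:
    healthcheck:
      test: ['CMD', 'wget', '-qO-', 'http://127.0.0.1:4003/health']
      interval: 10s
      timeout: 5s # same 5s ceiling as the K8s probes
      retries: 3
      start_period: 10s
```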

### 13.4 Ingress — WebSocket Support

If any service uses WebSocket or SSE (FlowMonk SSE, LocalMemGPT streaming, future real-time features):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: '1800'
    nginx.ingress.kubernetes.io/proxy-send-timeout: '1800'
    nginx.ingress.kubernetes.io/proxy-buffering: 'off'
    nginx.ingress.kubernetes.io/proxy-http-version: '1.1'
    nginx.ingress.kubernetes.io/configuration-snippet: |
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection "upgrade";
```

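A misconfigured WebSocket/SSE ingress fails silently: with ingress-nginx's default 60s proxy read/send timeouts, idle connections are dropped after 60s with no error. The `configuration-snippet` annotation only takes effect when snippet annotations are enabled on the ingress controller, which recent ingress-nginx releases disable by default.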

### 13.5 HPA — Use `autoscaling/v2`

**Do this:**

```yaml
apiVersion: autoscaling/v2 # Current API, supports multiple metrics
```

**Don't do this:**

```yaml
apiVersion: autoscaling/v1 # CPU-only; memory, custom, and multiple metrics require autoscaling/v2
```
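
For context, a complete minimal `autoscaling/v2` manifest might look like the sketch below. The resource name, replica bounds, and CPU target are illustrative assumptions. CPU utilization scaling also needs a metrics source such as metrics-server and a CPU request on the target container.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: platform-service # hypothetical; match your Deployment name
  namespace: bytelyst
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: platform-service
  minReplicas: 2 # illustrative bounds, not a sizing recommendation
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # percent of the container's CPU request
```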

### 13.6 Dockerfile Best Practices

| Practice            | Do                                                         | Don't                                                                              |
| ------------------- | ---------------------------------------------------------- | ---------------------------------------------------------------------------------- |
| **ENTRYPOINT form** | `ENTRYPOINT ["node", "dist/server.js"]` (exec form)        | `ENTRYPOINT node dist/server.js` (shell form — PID 1 is `/bin/sh`, signals broken) |
| **COPY scope**      | `COPY package.json ./` then `COPY src/ ./src/` (selective) | `COPY . .` (copies node_modules, .git, tests, everything)                          |
| **Layer count**     | Combine related `RUN` steps                                | 3 separate `RUN pip install` / `RUN npm install` steps                             |
| **Non-root**        | `USER node` (Node.js images have a `node` user)            | Running as root in production                                                      |
| **Local variant**   | Provide `local.Dockerfile` without corp proxy/JFrog deps   | Single Dockerfile that only works behind corporate proxy                           |
| **Build args**      | `ARG NODE_ENV=production` for conditional behavior         | Hardcoded env in Dockerfile                                                        |

### 13.7 Helm Values Layering

Use 3 layers for environment management:

```
values.yaml          # Base defaults (image, port, probes, resources)
├── env/local.yaml   # Local K3s overrides (lower resources, NodePort, no TLS)
├── env/dev.yaml     # Dev cluster overrides (replicas, hostnames, secrets)
└── env/prod.yaml    # Prod overrides (more replicas, real TLS, HPA limits)
```

Deploy with layered `-f` flags:

```bash
# Local
helm upgrade --install myapp ./charts -f charts/values.yaml -f charts/env/local.yaml

# Dev
helm upgrade --install myapp ./charts -f charts/values.yaml -f charts/env/dev.yaml

# Prod
helm upgrade --install myapp ./charts -f charts/values.yaml -f charts/env/prod.yaml
```

### 13.8 Namespace Strategy

Use Helm `_helpers.tpl` for namespace — never hardcode:

```yaml
# ✅ Standard pattern — respects --namespace flag
{{ include "myapp.namespace" . }}

# ❌ Anti-pattern — ignores helm --namespace, causes confusion
{{ .Values.namespace }}
```
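
One way to define such a helper, as a sketch: `myapp.namespace` resolves to the release namespace (what `helm --namespace` sets) and only falls back to an override when one is provided. `namespaceOverride` is an assumed values key, not something the ByteLyst charts are known to define.

```yaml
# templates/_helpers.tpl (sketch)
{{- define "myapp.namespace" -}}
{{- default .Release.Namespace .Values.namespaceOverride -}}
{{- end -}}

# templates/deployment.yaml (usage)
metadata:
  namespace: {{ include "myapp.namespace" . }}
```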

### 13.9 Secrets Management Progression

| Phase                    | Strategy                                              | Complexity |
| ------------------------ | ----------------------------------------------------- | ---------- |
| **Phase 1** (Compose)    | `.env.ecosystem` file (gitignored)                    | Trivial    |
| **Phase 2** (K3s)        | Native K8s `Secret` objects + `kubectl create secret` | Low        |
| **Phase 3** (Production) | Azure Key Vault via `SecretProviderClass` CSI driver  | Medium     |
| **Phase 4** (Enterprise) | AKV + `AzureKeyVaultSecret` CRD with auto-sync        | High       |

ByteLyst already uses AKV in production (platform-service) — the CSI driver pattern is the natural next step.

### 13.10 CI/CD Best Practices (Lessons from Production Pipelines)

| Practice               | Description                                                                                                  |
| ---------------------- | ------------------------------------------------------------------------------------------------------------ |
| **Semantic release**   | Auto-version from commit messages (`feat:` → minor, `fix:` → patch). ByteLyst already uses this convention.  |
| **Image promotion**    | Build once → push to staging repo → promote to gold/prod repo (never rebuild for prod).                      |
| **Branch pipelines**   | Different CI stages per branch: feature (lint+test), develop (build+deploy-dev), main (promote+deploy-prod). |
| **Security gates**     | SAST + SCA scans on every build. Block merges on critical findings.                                          |
| **Quality gates**      | Unit tests + coverage + SonarQube. Fail pipeline if coverage drops.                                          |
| **Auto-deploy to dev** | Pipeline trigger: when build completes → auto-deploy to dev. Manual gate for prod.                           |
| **Chart versioning**   | Publish Helm chart to OCI registry (ACR) with semantic version. Pull by version during deploy.               |

### 13.11 Local K8s Development Script Template

A good local K8s deploy script should handle both Docker Desktop Kubernetes and K3s:

```bash
#!/usr/bin/env bash
# deploy-local-k8s.sh — Full local K8s deployment for ByteLyst ecosystem
# Works with both Docker Desktop Kubernetes and K3s.

set -euo pipefail

NAMESPACE="bytelyst"
ACTION="${1:-deploy}"  # deploy | teardown

# Detect K8s runtime
detect_runtime() {
  local ctx
  ctx=$(kubectl config current-context 2>/dev/null || echo "")
  if [[ "$ctx" == "docker-desktop" ]]; then
    echo "docker-desktop"  # Docker Desktop's built-in Kubernetes cluster
  elif command -v k3s &>/dev/null; then
    echo "k3s"
  else
    echo "unknown"
  fi
}

case "$ACTION" in
  deploy)
    RUNTIME=$(detect_runtime)
    echo "Detected K8s runtime: $RUNTIME"

    # 1. Build all Docker images
    echo "Building images..."
    for svc in platform-service extraction-service mcp-server; do
      docker build -t bytelyst/$svc:local ./learning_ai_common_plat/services/$svc
    done

    # 2. Load images into K8s runtime
    if [[ "$RUNTIME" == "docker-desktop" ]]; then
      echo "Docker Desktop: images are already available to K8s (shared daemon)."
    elif [[ "$RUNTIME" == "k3s" ]]; then
      echo "K3s: importing images into containerd..."
      for img in $(docker images --format '{{.Repository}}:{{.Tag}}' | grep bytelyst); do
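        # Pipe instead of <(...): sudo closes the extra file descriptor that process substitution relies on
        docker save "$img" | sudo k3s ctr images import -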
      done
    else
      echo "WARNING: Unknown K8s runtime. You may need to load images manually."
    fi

    # 3. Create namespace + secrets
    kubectl create namespace "$NAMESPACE" --dry-run=client -o yaml | kubectl apply -f -
    kubectl create secret generic bytelyst-secrets \
      --from-env-file=.env.ecosystem \
      -n "$NAMESPACE" --dry-run=client -o yaml | kubectl apply -f -

    # 4. Deploy via Helm with local overlay
    helm upgrade --install bytelyst ./helm/bytelyst-ecosystem \
      -f helm/bytelyst-ecosystem/values.yaml \
      -f helm/bytelyst-ecosystem/env/local.yaml \
      -n "$NAMESPACE"

    # 5. Wait + verify
    kubectl rollout status deploy -n "$NAMESPACE" --timeout=120s
    echo ""
    echo "All pods:"
    kubectl get pods -n "$NAMESPACE"
    echo ""
    if [[ "$RUNTIME" == "docker-desktop" ]]; then
      echo "View in Docker Desktop: Kubernetes tab → namespace: $NAMESPACE"
    fi
    echo "Port-forward: kubectl port-forward svc/platform-service 4003:4003 -n $NAMESPACE"
    ;;

  teardown)
    helm uninstall bytelyst -n "$NAMESPACE" 2>/dev/null || true
    kubectl delete namespace "$NAMESPACE" 2>/dev/null || true
    echo "Teardown complete."
    ;;

  *)
    echo "Usage: $0 [deploy|teardown]" >&2
    exit 1
    ;;
esac
```

### 13.12 Quick Reference — What to Apply at Each Phase

| Best Practice                | Phase 1 (Compose)        | Phase 2 (K3s)      | Phase 3 (Prod K8s) |
| ---------------------------- | ------------------------ | ------------------ | ------------------ |
| Zero-downtime rolling update | N/A                      | ✅ Apply           | ✅ Apply           |
| Pod security context         | N/A                      | ✅ Apply           | ✅ Apply           |
| Health probes                | N/A (use `healthcheck:`) | ✅ Apply           | ✅ Apply           |
| WebSocket ingress headers    | N/A                      | ✅ If using SSE/WS | ✅ Apply           |
| HPA v2                       | N/A                      | Optional           | ✅ Apply           |
| Exec-form ENTRYPOINT         | ✅ Apply now             | ✅                 | ✅                 |
| Selective COPY               | ✅ Apply now             | ✅                 | ✅                 |
| Non-root user                | ✅ Apply now             | ✅                 | ✅                 |
| Values layering              | N/A                      | ✅ Apply           | ✅ Apply           |
| Secrets via AKV CSI          | N/A                      | N/A                | ✅ Apply           |
| Semantic release             | ✅ Apply now             | ✅                 | ✅                 |
| Image promotion              | N/A                      | N/A                | ✅ Apply           |
| Local deploy script          | N/A                      | ✅ Apply           | ✅ Adapt           |

---

## Summary

| Question                       | Answer                                                                                                         |
| ------------------------------ | -------------------------------------------------------------------------------------------------------------- |
| **Can deploy on single VM?**   | **Yes.** All ~25 services fit in 32 GB RAM.                                                                    |
| **All Dockerized?**            | **Yes.** All 10 product repos now have Dockerfiles + docker-prep.sh.                                           |
| **Package-manager direction?** | **`pnpm` is the long-term standard** for Node/TS repos, but migration is phased repo-by-repo, not big-bang.    |
| **K8s practice on single VM?** | **Docker Desktop K8s** (Mac/Windows) or **K3s** (Linux). Same manifests scale to AKS/EKS/GKE.                  |
| **Recommended VM?**            | 8 vCPU / 32 GB (min) or 16 vCPU / 64 GB (with Ollama). Hetzner ~$45/mo for dev.                                |
| **Time to production K8s?**    | Phase 1 (compose) → Phase 2 (Docker Desktop / K3s) → Phase 3 (multi-node) → Phase 4 (managed). Same manifests. |