learning_ai_common_plat/docs/devops/SINGLE_VM_ENHANCED_PLAN.md


ByteLyst Ecosystem — Enhanced Single-VM Deployment Plan

Deploy the entire ByteLyst ecosystem on a single VM using modern open-source tooling. All 10 product repos consume @bytelyst/* packages (49 in total) from a local Gitea npm registry. Every Dockerfile installs with pnpm and reads the registry token through a BuildKit secret mount. Current baseline: 1,591 backend tests passing, 9/9 web typechecks clean.


Tier 1 — Game Changers (add these first)

| Tool | Replaces | Why |
|---|---|---|
| Coolify | Manual compose orchestration + Traefik config + SSL + deploy scripts | Self-hosted PaaS. Git-push deploys, automatic SSL (Let's Encrypt), env var management UI, Docker Compose support, real-time logs, one-click rollbacks. Estimated to eliminate ~60% of manual deployment work. |
| Uptime Kuma | Custom health-check scripts + prototype-self-test.sh | Status page and monitoring for all 25+ endpoints. Slack/Discord/email alerts. Multi-protocol (HTTP, TCP, DNS, Docker). Setup takes about 2 minutes. |
| Prometheus + node-exporter + cAdvisor | Missing metrics stack next to Grafana/Loki | Adds host metrics, container metrics, and alertable service metrics, closing the main observability gap in the current VM stack. |
| Valkey (Redis fork, BSD licensed) | In-memory caches scattered across services | Centralized session store, rate-limit counters, pub/sub for SSE fan-out, feature flag cache, job queue backend. Eliminates per-service in-memory state that is lost on restart. |

Tier 2 — Operational Excellence

| Tool | Replaces | Why |
|---|---|---|
| Dozzle | Loki+Grafana (for quick log viewing on a single VM) | Lightweight real-time Docker log viewer. Zero config, ~8 MB image, web UI. Keep Loki+Grafana for structured queries and use Dozzle for quick debugging. |
| Portainer CE | CLI-only Docker management | Visual container management, resource monitoring, compose stack deployment, volume management. Useful when the AI agent isn't available. |
| Restic + cron | No backup strategy | Encrypted, deduplicated backups of Docker volumes (Cosmos data, Gitea repos, Grafana dashboards) to Azure Blob or S3. Scheduled via cron or the platform-service jobs module. |
| SOPS + age | Plain .env files (secrets in cleartext) | Encrypts secrets so they can be committed to git (`sops -e .env.production > .env.production.enc`) and decrypted at deploy time. No Key Vault dependency for a single VM. |
| PostgreSQL + pgvector | Ad hoc relational/vector persistence plans | Add only when a concrete service needs relational data plus embedding/vector search. Not a day-one requirement for the current VM. |

Tier 3 — Developer Experience

| Tool | Replaces | Why |
|---|---|---|
| Ollama | External LLM API calls | Local LLM inference for LocalMemGPT, extraction-service, and AI coaching features. Already referenced in compose. GPU optional (CPU works for small models). |
| Windmill | Custom bash/cron scripts | Open-source, code-first workflow engine (similar in scope to n8n). Schedules package publishes, backup jobs, health sweeps, and dependency updates as TypeScript/Python scripts with a UI. |
| Caddy | Traefik (if Coolify isn't used) | Automatic HTTPS with zero config. Simpler than Traefik for single-domain setups. If you use Coolify, it handles this internally. |

Decision Matrix: Coolify vs. Raw Docker Compose

| Factor | Coolify | Raw Compose |
|---|---|---|
| Setup time | ~15 min (one script) | ~6-7 hours |
| SSL/HTTPS | Automatic (Let's Encrypt) | Manual (Caddy/Traefik + certs) |
| Git push deploy | Built-in | Custom webhook + script |
| Env var management | Web UI per service | .env files |
| Rollback | One-click | git revert + rebuild |
| Log viewer | Built-in | Dozzle or Loki |
| Resource monitoring | Built-in dashboard | Portainer or Grafana |
| Learning curve | Low (GUI-driven) | Medium (YAML wrangling) |
| Flexibility | High (supports compose, Dockerfile, Nixpacks) | Maximum |
| K8s migration path | Export to compose, then convert | Direct conversion |

Recommendation: Use Coolify for the VM deployment. It is a mature, actively maintained project (36K+ GitHub stars) that handles the boring plumbing. Reserve raw compose/K3s for cases that need multi-node or fine-grained control.

Suggested order for growing the observability and data layers:

  1. Keep Grafana and Loki internal on the VM.
  2. Add Prometheus + node-exporter + cadvisor next.
  3. Add Valkey after metrics are in place.
  4. Add PostgreSQL + pgvector only when a concrete product or platform service requires it.

This keeps the stack incremental and avoids carrying the operational weight of PostgreSQL before there is a real consumer.


2. Enhanced Architecture (Single VM)

```text
┌─────────────────────────────────────────────────────────────┐
│                     Azure VM (32 GB)                        │
│                                                             │
│  ┌─── Coolify (PaaS layer) ──────────────────────────────┐ │
│  │  • Git-push deploy for all repos                      │ │
│  │  • Automatic SSL via Let's Encrypt                    │ │
│  │  • Reverse proxy (built-in Traefik)                   │ │
│  │  • Environment variable management                    │ │
│  │  • Container lifecycle management                     │ │
│  └───────────────────────────────────────────────────────┘ │
│                                                             │
│  ┌─── Infrastructure ────────────────────────────────────┐ │
│  │  Gitea (npm registry + git + CI)          port 3300   │ │
│  │  Cosmos DB Emulator                       port 8081   │ │
│  │  Valkey (Redis-compatible)                port 6379   │ │
│  │  Azurite (blob storage)                   port 10000  │ │
│  │  Mailpit (SMTP sandbox)                   port 1025   │ │
│  │  Ollama (LLM inference)                   port 11434  │ │
│  └───────────────────────────────────────────────────────┘ │
│                                                             │
│  ┌─── Shared Services ───────────────────────────────────┐ │
│  │  platform-service          port 4003  (37 modules)    │ │
│  │  extraction-service        port 4005  (+ Python)      │ │
│  │  mcp-server                port 4007  (tool hub)      │ │
│  └───────────────────────────────────────────────────────┘ │
│                                                             │
│  ┌─── Product Backends (10) ─────────────────────────────┐ │
│  │  PeakPulse 4010 │ ChronoMind 4011 │ JarvisJr 4012    │ │
│  │  NomGap 4013    │ MindLyst 4014   │ LysnrAI 4015     │ │
│  │  NoteLett 4016  │ FlowMonk 4017   │ ActionTrail 4018 │ │
│  │  LocalMemGPT 4019                                     │ │
│  └───────────────────────────────────────────────────────┘ │
│                                                             │
│  ┌─── Web Dashboards (11) ───────────────────────────────┐ │
│  │  admin 3001 │ user 3002 │ tracker 3003                │ │
│  │  NomGap 3040 │ MindLyst 3050 │ ChronoMind 3051       │ │
│  │  JarvisJr 3052 │ FlowMonk 3053 │ NoteLett 3054       │ │
│  │  ActionTrail 3060 │ LocalMemGPT 3070                  │ │
│  └───────────────────────────────────────────────────────┘ │
│                                                             │
│  ┌─── Observability ─────────────────────────────────────┐ │
│  │  Uptime Kuma (status page)            port 3333       │ │
│  │  Dozzle (live log viewer)             port 9999       │ │
│  │  Grafana + Loki (structured logs)     port 3000/3100  │ │
│  │  Portainer CE (container mgmt)        port 9443       │ │
│  └───────────────────────────────────────────────────────┘ │
│                                                             │
│  ┌─── Automation ────────────────────────────────────────┐ │
│  │  Gitea Actions (CI runner)                            │ │
│  │  Restic (scheduled volume backups)                    │ │
│  └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```

Total containers: ~35 | Estimated RAM: ~10 GB idle, ~18 GB under load | VM: 32 GB recommended


3. Docker Build Optimizations

3.1 BuildKit Cache Mounts (already partially used)

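Add pnpm store caching to all Dockerfiles so dependency installs reuse already-downloaded packages across builds. The estimated gain is 3-5× faster rebuilds when the install layer is invalidated.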

```dockerfile
# Current (good but slow on cache miss):
RUN --mount=type=secret,id=gitea_npm_token \
    export GITEA_NPM_TOKEN="$(cat /run/secrets/gitea_npm_token)" && \
    pnpm install --ignore-scripts --lockfile=false

# Enhanced (cache the pnpm store across builds). --store-dir pins the store to
# the cache mount even if PNPM_HOME or another store-dir is configured.
RUN --mount=type=secret,id=gitea_npm_token \
    --mount=type=cache,id=pnpm-store,target=/root/.local/share/pnpm/store \
    export GITEA_NPM_TOKEN="$(cat /run/secrets/gitea_npm_token)" && \
    pnpm install --ignore-scripts --lockfile=false --store-dir=/root/.local/share/pnpm/store
```

3.2 Multi-Repo Parallel Build Script

```bash
#!/usr/bin/env bash
# scripts/build-all-images.sh: parallel Docker builds for all services.
# Run from the directory that contains the learning_ai_* repo checkouts.
set -uo pipefail
export DOCKER_BUILDKIT=1
[[ -s ~/.gitea-npm-token ]] || { echo "missing ~/.gitea-npm-token"; exit 1; }
export TOKEN="$(cat ~/.gitea-npm-token)"
LOG_DIR="${LOG_DIR:-./build-logs}"
mkdir -p "$LOG_DIR"
failures=0
pids=()

build_image() {
  local repo=$1 dockerfile=$2 tag=$3 context=$4
  local log="$LOG_DIR/$(echo "$tag" | tr '/:' '__').log"
  echo "Building $tag ($repo)..."
  # -f is resolved relative to the current directory, not the build context,
  # so prefix it with the context. On Linux, host.docker.internal only resolves
  # during the build when it is mapped to the host gateway explicitly.
  if ! docker build \
    --add-host host.docker.internal:host-gateway \
    --build-arg GITEA_NPM_HOST=host.docker.internal \
    --secret id=gitea_npm_token,env=TOKEN \
    -f "$context/$dockerfile" -t "$tag" "$context" >"$log" 2>&1; then
    echo "FAILED: $tag (see $log)"
    return 1
  fi
  echo "Built $tag"
}

# Start one build in the background and remember its PID.
spawn() { build_image "$@" & pids+=("$!"); }

# Wait for the current batch; a failed build increments the failure count.
wait_batch() {
  local pid
  for pid in "${pids[@]}"; do
    wait "$pid" || failures=$((failures + 1))
  done
  pids=()
}

# Infrastructure builds (parallel)
spawn common-plat services/platform-service/Dockerfile bytelyst/platform-service:latest ./learning_ai_common_plat
spawn common-plat services/extraction-service/Dockerfile bytelyst/extraction-service:latest ./learning_ai_common_plat
spawn common-plat services/mcp-server/Dockerfile bytelyst/mcp-server:latest ./learning_ai_common_plat
wait_batch

# Product backends (parallel batches of 5)
spawn flowmonk backend/Dockerfile bytelyst/flowmonk-backend:latest ./learning_ai_flowmonk
spawn notelett backend/Dockerfile bytelyst/notelett-backend:latest ./learning_ai_notes
spawn actiontrail backend/Dockerfile bytelyst/actiontrail-backend:latest ./learning_ai_trails
spawn localmemgpt backend/Dockerfile bytelyst/localmemgpt-backend:latest ./learning_ai_local_memory_gpt
spawn nomgap backend/Dockerfile bytelyst/nomgap-backend:latest ./learning_ai_fastgap
wait_batch

spawn chronomind backend/Dockerfile bytelyst/chronomind-backend:latest ./learning_ai_clock
spawn jarvisjr backend/Dockerfile bytelyst/jarvisjr-backend:latest ./learning_ai_jarvis_jr
spawn peakpulse backend/Dockerfile bytelyst/peakpulse-backend:latest ./learning_ai_peakpulse
spawn mindlyst backend/Dockerfile bytelyst/mindlyst-backend:latest ./learning_multimodal_memory_agents
spawn lysnrai backend/Dockerfile bytelyst/lysnrai-backend:latest ./learning_voice_ai_agent
wait_batch

if (( failures > 0 )); then
  echo "$failures image build(s) failed."
  exit 1
fi
echo "All images built."
```

3.3 Docker Compose Profiles (selective startup)

```yaml
# In docker-compose.ecosystem.yml, add profiles (each service keeps its other keys):
services:
  cosmos-emulator:
    profiles: [infra, full]
  platform-service:
    profiles: [platform, full]
  flowmonk-backend:
    profiles: [products, full]
  admin-web:
    profiles: [web, full]
  uptime-kuma:
    profiles: [observability, full]
```

```bash
# The file is not a default Compose filename, so pass it with -f.
# Start only infra:
docker compose -f docker-compose.ecosystem.yml --profile infra up -d

# Start infra + platform:
docker compose -f docker-compose.ecosystem.yml --profile infra --profile platform up -d

# Start everything:
docker compose -f docker-compose.ecosystem.yml --profile full up -d
```
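The selection rule behind these commands is easy to misread, so here is a small model of it. This sketch assumes Compose's documented behaviour and is not a Compose API: a service with no `profiles` key (or an empty list) always starts, and a service with profiles starts only when at least one of them is enabled. Compose also starts services that are targeted explicitly on the command line; the sketch ignores that case. `enabledServices` is a hypothetical helper name.

```typescript
// Hypothetical model of Compose profile selection for the layout above.
// Modeled rule: no profiles => always enabled; otherwise enabled if any
// listed profile is active. Explicitly targeted services are not modeled.
type ServiceProfiles = Record<string, string[] | undefined>;

export function enabledServices(services: ServiceProfiles, active: string[]): string[] {
  const enabled = new Set(active);
  return Object.entries(services)
    .filter(([, profiles]) => !profiles || profiles.length === 0 || profiles.some((p) => enabled.has(p)))
    .map(([name]) => name);
}

// Profiles copied from the docker-compose.ecosystem.yml snippet.
const ecosystem: ServiceProfiles = {
  'cosmos-emulator': ['infra', 'full'],
  'platform-service': ['platform', 'full'],
  'flowmonk-backend': ['products', 'full'],
  'admin-web': ['web', 'full'],
  'uptime-kuma': ['observability', 'full'],
};

console.log(enabledServices(ecosystem, ['infra'])); // [ 'cosmos-emulator' ]
console.log(enabledServices(ecosystem, ['infra', 'platform'])); // [ 'cosmos-emulator', 'platform-service' ]
console.log(enabledServices(ecosystem, [])); // []
```

The last call shows the practical consequence. Once every service in the file declares a profile, a plain `docker compose up -d` with no `--profile` flag starts none of them.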

4. Valkey Integration Plan

Why Valkey over raw Redis

  • BSD-licensed fork (Redis 7.4 and later moved to the source-available RSALv2/SSPLv1 dual license)
  • Drop-in Redis compatibility (same protocol, same clients)
  • Developed under the Linux Foundation with an active maintainer community

What moves to Valkey

| Current | Moves to Valkey | Benefit |
|---|---|---|
| In-memory rate limit counters (lib/rate-limiter.ts) | INCR + EXPIRE | Survives restarts, shared across replicas |
| In-memory feature flag cache (lib/feature-flags.ts) | GET/SET with TTL | Instant cross-service flag propagation |
| In-memory TTL cache (lib/cache.ts in ActionTrail) | Redis GET/SET | Shared cache across service replicas |
| In-process event bus (@bytelyst/events) | Redis Pub/Sub | Cross-service event propagation |
| SSE hub connections | Redis Pub/Sub fan-out | Multi-replica SSE without sticky sessions |
| Session tokens (Cosmos queries) | Redis session store | Sub-ms session lookups |
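The "INCR + EXPIRE" row describes a fixed-window rate limiter. The sketch below shows one way it could look, assuming a single counter per key per time window. `CounterStore`, `hitRateLimit`, and `MemoryCounterStore` are hypothetical names and not the existing `lib/rate-limiter.ts` API. An ioredis `Redis` client should satisfy `CounterStore` structurally, because its `incr` and `expire` methods resolve to numbers.

```typescript
// Minimal slice of a Redis/Valkey client used by the limiter (hypothetical
// interface; ioredis's Redis class has incr/expire with these shapes).
export interface CounterStore {
  incr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<number>;
}

export interface RateLimitResult {
  allowed: boolean;
  count: number;
  remaining: number;
}

// Fixed-window limiter: one counter per (key, window). The first hit in a
// window sets the TTL so old counters expire on their own.
export async function hitRateLimit(
  store: CounterStore,
  key: string,
  limit: number,
  windowSec: number,
  nowMs: number = Date.now(),
): Promise<RateLimitResult> {
  const windowId = Math.floor(nowMs / 1000 / windowSec);
  const counterKey = `ratelimit:${key}:${windowId}`;
  const count = await store.incr(counterKey);
  if (count === 1) {
    await store.expire(counterKey, windowSec);
  }
  return { allowed: count <= limit, count, remaining: Math.max(0, limit - count) };
}

// In-memory stand-in for local tests (not shared across processes).
export class MemoryCounterStore implements CounterStore {
  private counts = new Map<string, number>();
  readonly ttls = new Map<string, number>();

  async incr(key: string): Promise<number> {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }

  async expire(key: string, seconds: number): Promise<number> {
    this.ttls.set(key, seconds);
    return 1;
  }
}
```

INCR and EXPIRE are two separate round trips. If a process dies between them, the counter outlives its window. The window id in the key keeps that stale counter from affecting later windows. A MULTI block or a small Lua script would make the pair atomic if that matters.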

Compose entry

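```yaml
services:
  valkey:
    image: valkey/valkey:8-alpine
    # Bind to loopback only: this instance has no password, so it must not be
    # reachable from the internet. Containers on the same Docker network use valkey:6379.
    ports: ['127.0.0.1:6379:6379']
    volumes: [valkey-data:/data]
    # RDB snapshot every 60 s if at least one key changed
    command: valkey-server --save 60 1 --loglevel warning
    healthcheck:
      test: ['CMD', 'valkey-cli', 'ping']
      interval: 10s
      timeout: 3s
      retries: 5
    restart: unless-stopped

volumes:
  valkey-data:
```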

Package: @bytelyst/cache

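New shared package wrapping ioredis for Valkey connections:

```typescript
// packages/cache/src/index.ts
import Redis from 'ioredis';

// General-purpose client. lazyConnect defers the connection until the first
// command; maxRetriesPerRequest caps how often a pending command is retried.
export function createCacheClient(url = process.env.VALKEY_URL ?? 'redis://localhost:6379') {
  return new Redis(url, { lazyConnect: true, maxRetriesPerRequest: 3 });
}

// Pub/Sub needs two connections: a connection in subscriber mode can only run
// subscribe-family commands, so publishing goes through its own client.
export function createPubSub(url = process.env.VALKEY_URL ?? 'redis://localhost:6379') {
  return { publisher: new Redis(url), subscriber: new Redis(url) };
}
```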
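The feature-flag and TTL-cache rows in the migration table both reduce to a cache-aside read on top of `createCacheClient`. Below is a sketch under these assumptions: values are JSON-serializable, and the store is a slice of the client exposing `get` and `set(key, value, 'EX', seconds)`, both of which ioredis provides. `getOrSet`, `KvStore`, and `MemoryKvStore` are hypothetical names, not existing `@bytelyst/cache` exports.

```typescript
// Slice of a Redis/Valkey client: GET plus SET with an EX (seconds) expiry.
export interface KvStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, mode: 'EX', seconds: number): Promise<unknown>;
}

// Cache-aside: return the cached JSON value if present; otherwise load it,
// store it with a TTL, and return it. Values must be JSON-serializable.
export async function getOrSet<T>(
  store: KvStore,
  key: string,
  ttlSec: number,
  load: () => Promise<T>,
): Promise<T> {
  const cached = await store.get(key);
  if (cached !== null) return JSON.parse(cached) as T;
  const value = await load();
  await store.set(key, JSON.stringify(value), 'EX', ttlSec);
  return value;
}

// In-memory KvStore that honours the TTL against an injectable clock (tests only).
export class MemoryKvStore implements KvStore {
  private data = new Map<string, { value: string; expiresAt: number }>();
  constructor(private now: () => number = Date.now) {}

  async get(key: string): Promise<string | null> {
    const entry = this.data.get(key);
    if (!entry || entry.expiresAt <= this.now()) return null;
    return entry.value;
  }

  async set(key: string, value: string, _mode: 'EX', seconds: number): Promise<unknown> {
    this.data.set(key, { value, expiresAt: this.now() + seconds * 1000 });
    return 'OK';
  }
}
```

In production the store argument would be the client returned by `createCacheClient()`. Two replicas that miss at the same moment will both call `load`. For flag lookups that is usually acceptable; hot keys may need a lock or request coalescing.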

5. Coolify Setup (15-minute path)

Prerequisites

  • Ubuntu 24.04 VM (the install script installs Docker if it is missing)
  • Domain pointing to VM IP (e.g., *.bytelyst.dev)

Install

```bash
# One command: installs Coolify and its dependencies (needs root)
curl -fsSL https://cdn.coollabs.io/coolify/install.sh | sudo bash
```

Configure for ByteLyst

  1. Add Gitea as a git source: Coolify connects to Gitea via an API token.
  2. Add each repo as a "service": Coolify auto-detects the Dockerfile, then builds and deploys.
  3. Set env vars per service: managed in the web UI, encrypted at rest.
  4. Enable auto-deploy: push to main → build → deploy (via Gitea webhook).
  5. Configure domains: platform.bytelyst.dev, flowmonk.bytelyst.dev, etc.

What Coolify handles automatically

  • Traefik reverse proxy with automatic SSL
  • Docker image builds with BuildKit
  • Container health checks and auto-restart
  • Rolling, zero-downtime deploys (for single-container apps with a health check; not for Compose-based stacks)
  • Resource monitoring dashboard
  • Persistent volume management
  • Webhook-triggered deployments from Gitea

What still needs manual compose

Coolify supports Docker Compose files natively, so the infra services (Cosmos emulator, Valkey, Ollama, etc.) can be deployed as a single compose stack through the Coolify UI.


6. Observability Stack

Uptime Kuma — Status Page + Alerting

```yaml
uptime-kuma:
  image: louislam/uptime-kuma:1
  ports: ['3333:3001']
  volumes: [uptime-kuma-data:/app/data]
  restart: unless-stopped
```

Configure 25+ monitors:

  • All /health endpoints (backends + services)
  • Web dashboard reachability
  • Cosmos emulator connectivity
  • Gitea API availability
  • Valkey ping
  • Ollama model availability
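Before Uptime Kuma is configured, and for the step 9 smoke test, a quick scripted sweep of the backend and service health endpoints can help. This is a hypothetical sketch. The ports come from the architecture diagram, while the `/health` path, plain HTTP, and the 3 s timeout are assumptions. It needs Node 18+ for the global `fetch` and `AbortSignal.timeout`.

```typescript
// Ports copied from the architecture diagram; /health and plain HTTP are assumptions.
const SERVICE_PORTS: Record<string, number> = {
  'platform-service': 4003,
  'extraction-service': 4005,
  'mcp-server': 4007,
  peakpulse: 4010,
  chronomind: 4011,
  jarvisjr: 4012,
  nomgap: 4013,
  mindlyst: 4014,
  lysnrai: 4015,
  notelett: 4016,
  flowmonk: 4017,
  actiontrail: 4018,
  localmemgpt: 4019,
};

export interface HealthResult {
  name: string;
  url: string;
  ok: boolean;
  status?: number;
  error?: string;
}

export function healthTargets(host = 'localhost'): { name: string; url: string }[] {
  return Object.entries(SERVICE_PORTS).map(([name, port]) => ({
    name,
    url: `http://${host}:${port}/health`,
  }));
}

// Probe every target in parallel; a timeout, network error, or non-2xx status counts as down.
export async function sweep(host = 'localhost', timeoutMs = 3000): Promise<HealthResult[]> {
  return Promise.all(
    healthTargets(host).map(async ({ name, url }): Promise<HealthResult> => {
      try {
        const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
        return { name, url, ok: res.ok, status: res.status };
      } catch (err) {
        return { name, url, ok: false, error: err instanceof Error ? err.message : String(err) };
      }
    }),
  );
}
```

For example, `sweep().then((rs) => rs.filter((r) => !r.ok).forEach((r) => console.error('DOWN', r.name, r.error ?? r.status)))` prints only the failing services. The same target list can be pasted into Uptime Kuma as HTTP monitors.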

Dozzle — Live Container Logs

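```yaml
dozzle:
  image: amir20/dozzle:latest
  # Dozzle has no login by default and shows every container's logs;
  # keep port 9999 off the public interface or put it behind the proxy with auth.
  ports: ['9999:8080']
  volumes: ['/var/run/docker.sock:/var/run/docker.sock:ro']
  restart: unless-stopped
```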

Keep Loki + Grafana for

  • Structured log queries across time ranges
  • Custom dashboards (request latency, error rates, Cosmos RU consumption)
  • Alert rules based on log patterns

7. Backup Strategy (Restic)

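```bash
# Install restic
sudo apt install restic

# Run the remaining commands as root (/var/lib/docker is root-only).
# Repository and password; the Azure backend also reads AZURE_ACCOUNT_NAME and AZURE_ACCOUNT_KEY.
export RESTIC_REPOSITORY="azure:bytelyst-backups:/"
export RESTIC_PASSWORD_FILE=/root/.restic-password   # example path

# Init backup repo, one-time (Azure Blob, S3, local, or SFTP)
restic init

# Backup all Docker volumes. Compose usually prefixes volume names with the
# project name, so confirm the real paths with `docker volume ls`.
restic backup \
  /var/lib/docker/volumes/gitea-data/ \
  /var/lib/docker/volumes/cosmos-data/ \
  /var/lib/docker/volumes/valkey-data/ \
  /var/lib/docker/volumes/grafana-data/ \
  /var/lib/docker/volumes/uptime-kuma-data/ \
  --tag daily

# Cron: daily at 2 AM. Cron does not inherit the exports above, so put them, the
# backup command, and `restic forget --keep-daily 7 --keep-weekly 4 --prune` in a
# script (example path), then append it to the existing crontab instead of replacing it.
(crontab -l 2>/dev/null; echo "0 2 * * * /usr/local/bin/bytelyst-backup.sh") | crontab -
```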

8. Secret Management (SOPS + age)

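```bash
# Generate age key pair (one-time); sops reads ~/.config/sops/age/keys.txt by default on Linux
mkdir -p ~/.config/sops/age
age-keygen -o ~/.config/sops/age/keys.txt

# Create .sops.yaml in repo root. Creation rules are matched against the file
# being encrypted (.env.production), not the output name, so the regex must
# not require an .enc suffix.
cat > .sops.yaml << 'EOF'
creation_rules:
  - path_regex: '\.env\.[^.]+$'
    age: age1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
EOF

# Encrypt secrets. sops only auto-detects dotenv for names ending in .env, so set
# the types explicitly to get per-key encryption instead of one opaque blob.
sops -e --input-type dotenv --output-type dotenv .env.production > .env.production.enc

# Decrypt at deploy time (on VM)
sops -d --input-type dotenv --output-type dotenv .env.production.enc > .env.production

# Git-safe: .env.production.enc is committed, .env.production is gitignored
```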

9. Implementation Order

| Step | Time | What | Tools |
|---|---|---|---|
| 1 | 15 min | Install Coolify on VM | curl one-liner |
| 2 | 30 min | Deploy infra compose stack (Cosmos, Valkey, Gitea, Azurite, Mailpit, Ollama) via Coolify | Coolify UI |
| 3 | 30 min | Publish 49 @bytelyst/* packages to the VM's Gitea | scripts/publish-all.sh |
| 4 | 1 hr | Add all 13 service repos to Coolify (auto-detect Dockerfile, set env vars) | Coolify UI |
| 5 | 30 min | Deploy Uptime Kuma + configure 25+ health monitors | Coolify + Uptime Kuma UI |
| 6 | 15 min | Deploy Dozzle for live log viewing | Coolify |
| 7 | 30 min | Configure SOPS + age for secrets, encrypt .env.production | CLI |
| 8 | 30 min | Configure Restic backups for all stateful volumes | CLI + cron |
| 9 | 30 min | Smoke test: hit all /health endpoints, verify Uptime Kuma is green | Browser + curl |

Total: ~4.5 hours to go from bare VM to fully running ecosystem


10. VM Sizing

Minimum (32 GB) — dev/staging, no Ollama

| Component | Count | RAM |
|---|---|---|
| Cosmos DB Emulator | 1 | ~2 GB |
| Fastify backends | 13 | ~2 GB |
| Next.js web apps | 11 | ~2.2 GB |
| Valkey | 1 | ~100 MB |
| Infra (Traefik, Loki, Grafana, Azurite, Mailpit, Gitea) | 7 | ~1 GB |
| Observability (Uptime Kuma, Dozzle, Portainer) | 3 | ~300 MB |
| Coolify overhead | 1 | ~500 MB |
| Subtotal | ~37 | ~8.1 GB |
| Headroom for builds + spikes | | ~24 GB |

Recommended (64 GB) with Ollama

Same as above plus Ollama (~8 GB for llama3:8b) = ~16 GB active, ~48 GB headroom.
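The sizing arithmetic can be checked, and re-run whenever a component is added, with a few lines of code. The figures below are the plan's own estimates copied from the table, not measurements. The 64 GB VM size for the Ollama case is inferred from ~16 GB active + ~48 GB headroom. `budget` is a hypothetical helper name.

```typescript
// RAM estimates (GB) copied from the sizing table above.
const components: { name: string; count: number; ramGb: number }[] = [
  { name: 'Cosmos DB Emulator', count: 1, ramGb: 2 },
  { name: 'Fastify backends', count: 13, ramGb: 2 },
  { name: 'Next.js web apps', count: 11, ramGb: 2.2 },
  { name: 'Valkey', count: 1, ramGb: 0.1 },
  { name: 'Infra (Traefik, Loki, Grafana, Azurite, Mailpit, Gitea)', count: 7, ramGb: 1 },
  { name: 'Observability (Uptime Kuma, Dozzle, Portainer)', count: 3, ramGb: 0.3 },
  { name: 'Coolify overhead', count: 1, ramGb: 0.5 },
];

// Round to one decimal to hide floating-point noise (e.g. 8.100000000000001).
const round1 = (x: number): number => Math.round(x * 10) / 10;

// Container count, active RAM (optionally plus extra workloads such as Ollama),
// and remaining headroom on a VM of the given size.
export function budget(vmGb: number, extraGb = 0) {
  const containers = components.reduce((n, c) => n + c.count, 0);
  const activeGb = round1(components.reduce((s, c) => s + c.ramGb, 0) + extraGb);
  return { containers, activeGb, headroomGb: round1(vmGb - activeGb) };
}

console.log(budget(32)); // { containers: 37, activeGb: 8.1, headroomGb: 23.9 }
console.log(budget(64, 8)); // { containers: 37, activeGb: 16.1, headroomGb: 47.9 }
```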

Cloud Pricing

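| Provider | Instance | vCPU | RAM | Price | Notes |
|---|---|---|---|---|---|
| Hetzner | CPX51 | 16 | 32 GB | ~€45/mo | Best value |
| Hetzner | CCX33 | 8 | 32 GB | ~€55/mo | Dedicated vCPUs |
| Azure | Standard_D8s_v5 | 8 | 32 GB | ~$280/mo | |
| Home | Mac mini M4 Pro | 12 | 48 GB | ~$1,600 one-time | |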

11. What This Plan Does NOT Cover (Future Work)

  • Multi-node K3s — Phase 2, same manifests, add workers with k3s agent
  • Managed Kubernetes (AKS/EKS) — Phase 3, same manifests + Helm chart
  • CI/CD pipeline for automated package publish → image build → deploy
  • Custom domain + DNS — depends on registrar choice
  • WAF / DDoS protection — Cloudflare free tier in front of Coolify
  • Mobile app distribution — TestFlight / Play Console (separate from VM)