docs: add enhanced single-VM deployment plan with Coolify, Valkey, Uptime Kuma, SOPS

New SINGLE_VM_ENHANCED_PLAN.md covers:
- Coolify as self-hosted PaaS (15-min setup vs 6-7hr manual)
- Valkey (Redis fork) for sessions, cache, pub/sub, rate limits
- Uptime Kuma for status page + alerting
- Dozzle for live container logs
- SOPS + age for git-safe encrypted secrets
- Restic for volume backups
- BuildKit cache mounts for faster Docker builds
- Docker Compose profiles for selective startup
- Revised 4.5-hour implementation timeline

Also updates SINGLE_VM_DEPLOYMENT.md §3 prerequisite to reference
resolved Gitea registry migration and new enhanced plan.
saravanakumardb1 2026-03-24 07:49:50 -07:00
parent b0a4b2d9c3
commit baf47ac56b
2 changed files with 454 additions and 1 deletions


@@ -134,7 +134,9 @@
### Phase 1: Docker Compose (after prerequisite work)
> **⚠️ Prerequisite:** product repos that still rely on `file:`-based `@bytelyst/*` Docker consumption must run `docker-prep.sh` before building images (see §12 Audit Findings). FlowMonk's current backend/web Docker path is the registry-backed exception and uses repo-root build context instead. All Dockerfiles and `output: 'standalone'` configs are now in place (completed 2026-03-22). During the package-manager transition, each repo's Docker build must follow that repo's declared package manager and lockfile semantics rather than assuming `npm` or `pnpm` globally.
> **✅ Prerequisite RESOLVED (2026-03-24):** All 10 repos now consume `@bytelyst/*` packages from the Gitea npm registry. `docker-prep.sh` has been deleted from all repos. Docker builds use pnpm + BuildKit secret mount pattern. See [`GITEA_NPM_REGISTRY_MIGRATION.md`](GITEA_NPM_REGISTRY_MIGRATION.md) §14-17 for details.
>
> **📋 Enhanced plan:** See [`SINGLE_VM_ENHANCED_PLAN.md`](SINGLE_VM_ENHANCED_PLAN.md) for the updated deployment plan with Coolify, Valkey, Uptime Kuma, and other open-source tooling additions.
Create a **unified** `docker-compose.ecosystem.yml` that brings everything up.


@@ -0,0 +1,451 @@
# ByteLyst Ecosystem — Enhanced Single-VM Deployment Plan
> Supersedes the stale sections of `SINGLE_VM_DEPLOYMENT.md`. Incorporates lessons from the Gitea registry migration (2026-03-24) and introduces open-source tooling to minimize setup time while maximizing robustness.
---
## 0. What Changed Since the Original Plan
The original `SINGLE_VM_DEPLOYMENT.md` was written during the `file:` → registry transition. These items are now **resolved**:
- ✅ All 10 repos consume `@bytelyst/*` from Gitea npm registry (`^0.1.0`)
- ✅ `docker-prep.sh` deleted from all repos — no more tarball prep step
- ✅ All Dockerfiles use pnpm + BuildKit secret mount pattern
- ✅ 49 packages published, 1,591 backend tests green, 9/9 web typechecks clean
- ✅ Docker builds verified for MindLyst + LysnrAI (the two non-standard repos)
**The prerequisite blocker in §4.1 of the original plan is gone.** We can now build any image with just `docker build` + registry auth.
---
## 1. Recommended Open-Source Tooling Additions
### Tier 1 — Game Changers (add these first)
| Tool | Replaces | Why |
|------|----------|-----|
| **[Coolify](https://coolify.io)** | Manual compose orchestration + Traefik config + SSL + deploy scripts | Self-hosted PaaS. Git-push deploys, automatic SSL (Let's Encrypt), env var management UI, Docker Compose support, real-time logs, one-click rollbacks. **Eliminates ~60% of manual deployment work.** |
| **[Uptime Kuma](https://github.com/louislam/uptime-kuma)** | Custom health-check scripts + `prototype-self-test.sh` | Beautiful status page + monitoring for all 25+ endpoints. Slack/Discord/email alerts. Multi-protocol (HTTP, TCP, DNS, Docker). Setup: 2 minutes. |
| **[Valkey](https://valkey.io)** (Redis fork, BSD licensed) | In-memory caches scattered across services | Centralized session store, rate-limit counters, pub/sub for SSE fan-out, feature flag cache, job queue backend. Eliminates per-service in-memory state that dies on restart. |
### Tier 2 — Operational Excellence
| Tool | Replaces | Why |
|------|----------|-----|
| **[Dozzle](https://dozzle.dev)** | Loki+Grafana (for single-VM log viewing) | Lightweight real-time Docker log viewer. Zero config, 8MB image, web UI. Keep Loki+Grafana for structured queries but use Dozzle for quick debugging. |
| **[Portainer CE](https://portainer.io)** | CLI-only Docker management | Visual container management, resource monitoring, compose stack deployment, volume management. Good for when the AI agent isn't available. |
| **[Restic](https://restic.net)** + cron | No backup strategy | Encrypted, deduplicated backups of Docker volumes (Cosmos data, Gitea repos, Grafana dashboards) to Azure Blob or S3. Scheduled via the platform-service jobs module. |
| **[SOPS](https://github.com/getsops/sops)** + [age](https://github.com/FiloSottile/age) | Plain `.env` files (secrets in cleartext) | Encrypt secrets in git. `sops -e .env.production > .env.production.enc`. Decrypt at deploy time. No Key Vault dependency for single-VM. |
### Tier 3 — Developer Experience
| Tool | Replaces | Why |
|------|----------|-----|
| **[Ollama](https://ollama.ai)** | External LLM API calls | Local LLM inference for LocalMemGPT, extraction-service, and AI coaching features. Already referenced in compose. GPU optional (CPU works for small models). |
| **[Windmill](https://windmill.dev)** | Custom bash/cron scripts | Open-source workflow engine (like n8n but code-first). Schedule package publishes, backup jobs, health sweeps, dependency updates. TypeScript/Python scripts with UI. |
| **[Caddy](https://caddyserver.com)** | Traefik (if Coolify isn't used) | Automatic HTTPS with zero config. Simpler than Traefik for single-domain setups. If using Coolify, Coolify handles this internally. |
### Decision Matrix: Coolify vs. Raw Docker Compose
| Factor | Coolify | Raw Compose |
|--------|---------|-------------|
| **Setup time** | ~15 min (one script) | ~6-7 hours |
| **SSL/HTTPS** | Automatic (Let's Encrypt) | Manual (Caddy/Traefik + certs) |
| **Git push deploy** | Built-in | Custom webhook + script |
| **Env var management** | Web UI per service | `.env` files |
| **Rollback** | One-click | `git revert` + rebuild |
| **Log viewer** | Built-in | Dozzle or Loki |
| **Resource monitoring** | Built-in dashboard | Portainer or Grafana |
| **Learning curve** | Low (GUI-driven) | Medium (YAML wrangling) |
| **Flexibility** | High (supports compose, Dockerfile, Nixpacks) | Maximum |
| **K8s migration path** | Export to compose, then convert | Direct conversion |
**Recommendation: Use Coolify for the VM deployment.** It's a mature, actively maintained project (36K+ GitHub stars) that handles the boring plumbing. Reserve raw compose/K3s for when you need multi-node or fine-grained control.
---
## 2. Enhanced Architecture (Single VM)
```
┌─────────────────────────────────────────────────────────────┐
│ Azure VM (32 GB) │
│ │
│ ┌─── Coolify (PaaS layer) ──────────────────────────────┐ │
│ │ • Git-push deploy for all repos │ │
│ │ • Automatic SSL via Let's Encrypt │ │
│ │ • Reverse proxy (built-in Traefik) │ │
│ │ • Environment variable management │ │
│ │ • Container lifecycle management │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
│ ┌─── Infrastructure ────────────────────────────────────┐ │
│ │ Gitea (npm registry + git + CI) port 3300 │ │
│ │ Cosmos DB Emulator port 8081 │ │
│ │ Valkey (Redis-compatible) port 6379 │ │
│ │ Azurite (blob storage) port 10000 │ │
│ │ Mailpit (SMTP sandbox) port 1025 │ │
│ │ Ollama (LLM inference) port 11434 │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
│ ┌─── Shared Services ───────────────────────────────────┐ │
│ │ platform-service port 4003 (37 modules) │ │
│ │ extraction-service port 4005 (+ Python) │ │
│ │ mcp-server port 4007 (tool hub) │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
│ ┌─── Product Backends (10) ─────────────────────────────┐ │
│ │ PeakPulse 4010 │ ChronoMind 4011 │ JarvisJr 4012 │ │
│ │ NomGap 4013 │ MindLyst 4014 │ LysnrAI 4015 │ │
│ │ NoteLett 4016 │ FlowMonk 4017 │ ActionTrail 4018 │ │
│ │ LocalMemGPT 4019 │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
│ ┌─── Web Dashboards (11) ───────────────────────────────┐ │
│ │ admin 3001 │ user 3002 │ tracker 3003 │ │
│ │ NomGap 3040 │ MindLyst 3050 │ ChronoMind 3051 │ │
│ │ JarvisJr 3052 │ FlowMonk 3053 │ NoteLett 3054 │ │
│ │ ActionTrail 3060 │ LocalMemGPT 3070 │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
│ ┌─── Observability ─────────────────────────────────────┐ │
│ │ Uptime Kuma (status page) port 3333 │ │
│ │ Dozzle (live log viewer) port 9999 │ │
│ │ Grafana + Loki (structured logs) port 3000/3100 │ │
│ │ Portainer CE (container mgmt) port 9443 │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
│ ┌─── Automation ────────────────────────────────────────┐ │
│ │ Gitea Actions (CI runner) │ │
│ │ Restic (scheduled volume backups) │ │
│ └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
**Total containers: ~35** | **Estimated RAM: ~10 GB idle, ~18 GB under load** | **VM: 32 GB minimum, 64 GB with Ollama (see §10)**
---
## 3. Docker Build Optimizations
### 3.1 BuildKit Cache Mounts (already partially used)
Add pnpm store caching to all Dockerfiles for 3-5× faster rebuilds:
```dockerfile
# Current (good but slow on cache miss):
RUN --mount=type=secret,id=gitea_npm_token \
    export GITEA_NPM_TOKEN="$(cat /run/secrets/gitea_npm_token)" && \
    pnpm install --ignore-scripts --lockfile=false

# Enhanced (cache pnpm store across builds):
RUN --mount=type=secret,id=gitea_npm_token \
    --mount=type=cache,id=pnpm-store,target=/root/.local/share/pnpm/store \
    export GITEA_NPM_TOKEN="$(cat /run/secrets/gitea_npm_token)" && \
    pnpm install --ignore-scripts --lockfile=false
```
### 3.2 Multi-Repo Parallel Build Script
```bash
#!/bin/bash
# scripts/build-all-images.sh — parallel Docker builds for all services
export DOCKER_BUILDKIT=1
export TOKEN="$(cat ~/.gitea-npm-token)"
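# Build one image. $dockerfile is given relative to $context, but `docker build -f`
# resolves against the current directory, so the two are joined below.
build_image() {
  local repo=$1 dockerfile=$2 tag=$3 context=$4
  echo "Building $tag..."
  docker build \
    --add-host localhost:host-gateway \
    --build-arg GITEA_NPM_HOST=host.docker.internal \
    --secret id=gitea_npm_token,env=TOKEN \
    -f "$context/$dockerfile" -t "$tag" "$context" 2>&1 | tail -1
}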
# Infrastructure builds (parallel)
build_image common-plat services/platform-service/Dockerfile bytelyst/platform-service:latest ./learning_ai_common_plat &
build_image common-plat services/extraction-service/Dockerfile bytelyst/extraction-service:latest ./learning_ai_common_plat &
build_image common-plat services/mcp-server/Dockerfile bytelyst/mcp-server:latest ./learning_ai_common_plat &
wait
# Product backends (parallel batches of 5)
build_image flowmonk backend/Dockerfile bytelyst/flowmonk-backend:latest ./learning_ai_flowmonk &
build_image notelett backend/Dockerfile bytelyst/notelett-backend:latest ./learning_ai_notes &
build_image actiontrail backend/Dockerfile bytelyst/actiontrail-backend:latest ./learning_ai_trails &
build_image localmemgpt backend/Dockerfile bytelyst/localmemgpt-backend:latest ./learning_ai_local_memory_gpt &
build_image nomgap backend/Dockerfile bytelyst/nomgap-backend:latest ./learning_ai_fastgap &
wait
build_image chronomind backend/Dockerfile bytelyst/chronomind-backend:latest ./learning_ai_clock &
build_image jarvisjr backend/Dockerfile bytelyst/jarvisjr-backend:latest ./learning_ai_jarvis_jr &
build_image peakpulse backend/Dockerfile bytelyst/peakpulse-backend:latest ./learning_ai_peakpulse &
build_image mindlyst backend/Dockerfile bytelyst/mindlyst-backend:latest ./learning_multimodal_memory_agents &
build_image lysnrai backend/Dockerfile bytelyst/lysnrai-backend:latest ./learning_voice_ai_agent &
wait
echo "All images built."
```
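The script above backgrounds each build and calls `wait` once per batch, which discards the individual exit codes. The same batching idea can be sketched in TypeScript so that failed images are reported. This is a minimal sketch, not part of the repo: `BuildTarget`, `BuildFn`, and `runInBatches` are hypothetical names, and `BuildFn` stands in for whatever actually invokes `docker build`.

```typescript
// Hypothetical sketch: run builds in fixed-size parallel batches and collect failures.
// `BuildFn` is an assumption; it would wrap `docker build` and resolve to its exit code.
type BuildTarget = { tag: string; dockerfile: string; context: string };
type BuildFn = (target: BuildTarget) => Promise<number>;

async function runInBatches(
  targets: BuildTarget[],
  batchSize: number,
  build: BuildFn,
): Promise<string[]> {
  if (batchSize < 1) throw new Error('batchSize must be >= 1');
  const failed: string[] = [];
  for (let i = 0; i < targets.length; i += batchSize) {
    const batch = targets.slice(i, i + batchSize);
    // Equivalent of `cmd & cmd & wait`, except each exit code is kept.
    const codes = await Promise.all(
      batch.map((t) => build(t).catch(() => 1)), // a thrown error counts as a failure
    );
    codes.forEach((code, j) => {
      if (code !== 0) failed.push(batch[j].tag);
    });
  }
  return failed; // tags of images whose build did not exit with 0
}
```

A caller could exit non-zero when the returned list is non-empty, so a broken image stops the deploy instead of being hidden behind `tail -1`.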
### 3.3 Docker Compose Profiles (selective startup)
```yaml
# In docker-compose.ecosystem.yml, add profiles:
services:
  cosmos-emulator:
    profiles: [infra, full]
  platform-service:
    profiles: [platform, full]
  flowmonk-backend:
    profiles: [products, full]
  admin-web:
    profiles: [web, full]
  uptime-kuma:
    profiles: [observability, full]
```
```bash
# Start only infra:
docker compose --profile infra up -d
# Start infra + platform:
docker compose --profile infra --profile platform up -d
# Start everything:
docker compose --profile full up -d
```
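The selection rule Compose applies to profiles can be sketched as a small function: a service with no `profiles` key always starts, and a service with profiles starts only when at least one of them is active. The sketch below mirrors the five services from the snippet above; `ServiceProfiles` and `enabledServices` are illustrative names, not Compose APIs.

```typescript
// Sketch of how a set of --profile flags selects services.
type ServiceProfiles = Record<string, string[] | undefined>;

// Service names and profiles copied from the compose snippet above.
const ecosystemProfiles: ServiceProfiles = {
  'cosmos-emulator': ['infra', 'full'],
  'platform-service': ['platform', 'full'],
  'flowmonk-backend': ['products', 'full'],
  'admin-web': ['web', 'full'],
  'uptime-kuma': ['observability', 'full'],
};

function enabledServices(services: ServiceProfiles, active: string[]): string[] {
  const activeSet = new Set(active);
  return Object.keys(services).filter((svc) => {
    const profiles = services[svc];
    // A service without profiles is always started.
    if (!profiles || profiles.length === 0) return true;
    // Otherwise at least one of its profiles must be active.
    return profiles.some((p) => activeSet.has(p));
  });
}

console.log(enabledServices(ecosystemProfiles, ['infra', 'platform']));
```

One consequence worth noting: once every service carries a profile, a plain `docker compose up -d` with no `--profile` flag selects nothing, which is why the `full` profile is listed on every entry.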
---
## 4. Valkey Integration Plan
### Why Valkey over raw Redis
- BSD-licensed fork, created after Redis moved to source-available licensing (RSALv2/SSPLv1) in March 2024
- Drop-in Redis-compatible (same protocol, same clients)
- Actively maintained under the Linux Foundation
### What moves to Valkey
| Current | Moves to Valkey | Benefit |
|---------|----------------|---------|
| In-memory rate limit counters (`lib/rate-limiter.ts`) | `INCR` + `EXPIRE` | Survives restarts, shared across replicas |
| In-memory feature flag cache (`lib/feature-flags.ts`) | `GET`/`SET` with TTL | Instant cross-service flag propagation |
| In-memory TTL cache (`lib/cache.ts` in ActionTrail) | Redis `GET`/`SET` | Shared cache across service replicas |
| In-process event bus (`@bytelyst/events`) | Redis Pub/Sub | Cross-service event propagation |
| SSE hub connections | Redis Pub/Sub fan-out | Multi-replica SSE without sticky sessions |
| Session tokens (Cosmos queries) | Redis session store | Sub-ms session lookups |
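As an illustration of the first row, a fixed-window rate limiter built on `INCR` + `EXPIRE` might look like the sketch below. It is not the contents of `lib/rate-limiter.ts`: `CounterStore`, `allowRequest`, and the `ratelimit:` key layout are assumptions made for the example. An `ioredis` client exposes `incr` and `expire` with compatible shapes and should satisfy the interface; the in-memory class exists only so the sketch runs without a server.

```typescript
// Minimal subset of a Redis/Valkey client needed here (an assumed interface).
interface CounterStore {
  incr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<number>;
}

// Fixed-window limiter: one counter per (client, window), expiring with the window.
async function allowRequest(
  store: CounterStore,
  clientId: string,
  limit: number,
  windowSeconds: number,
  nowMs: number = Date.now(),
): Promise<boolean> {
  const windowIndex = Math.floor(nowMs / (windowSeconds * 1000));
  const key = `ratelimit:${clientId}:${windowIndex}`; // hypothetical key layout
  const count = await store.incr(key);
  if (count === 1) {
    // First hit in this window: set a TTL so the key cleans itself up.
    await store.expire(key, windowSeconds);
  }
  return count <= limit;
}

// In-memory stand-in for local runs; it ignores TTLs, so keys simply accumulate.
class MemoryCounterStore implements CounterStore {
  private counts = new Map<string, number>();
  async incr(key: string): Promise<number> {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
  async expire(_key: string, _seconds: number): Promise<number> {
    return 1;
  }
}
```

`INCR` followed by `EXPIRE` is two round trips and not atomic; if a process dies between them the key keeps no TTL. A Lua script or a transaction would close that gap if it matters.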
### Compose entry
```yaml
valkey:
  image: valkey/valkey:8-alpine
  ports: ['6379:6379']
  volumes: [valkey-data:/data]
  command: valkey-server --save 60 1 --loglevel warning
  healthcheck:
    test: ['CMD', 'valkey-cli', 'ping']
    interval: 10s
  restart: unless-stopped
```
### Package: `@bytelyst/cache`
New shared package wrapping `ioredis` with Valkey connection:
```typescript
// packages/cache/src/index.ts
import Redis from 'ioredis';

export function createCacheClient(url = process.env.VALKEY_URL ?? 'redis://localhost:6379') {
  return new Redis(url, { lazyConnect: true, maxRetriesPerRequest: 3 });
}

export function createPubSub(url = process.env.VALKEY_URL ?? 'redis://localhost:6379') {
  return { publisher: new Redis(url), subscriber: new Redis(url) };
}
```
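On top of such a client, the `GET`/`SET` with TTL rows in the table could be served by a cache-aside helper. The following is a sketch under assumptions: `KeyValueStore` and `getOrLoad` are hypothetical and not part of `@bytelyst/cache`. The `set(key, value, 'EX', seconds)` call shape matches what `ioredis` accepts, and the in-memory store is a stand-in so the example runs without Valkey.

```typescript
// Assumed minimal interface; an ioredis client offers `get` and `set(key, value, 'EX', seconds)`.
interface KeyValueStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, mode: 'EX', seconds: number): Promise<unknown>;
}

// Cache-aside: return the cached value if present, otherwise load, store with a TTL, and return.
async function getOrLoad<T>(
  store: KeyValueStore,
  key: string,
  ttlSeconds: number,
  load: () => Promise<T>,
): Promise<T> {
  const cached = await store.get(key);
  if (cached !== null) return JSON.parse(cached) as T;
  const fresh = await load();
  await store.set(key, JSON.stringify(fresh), 'EX', ttlSeconds);
  return fresh;
}

// In-memory stand-in with expiry, driven by an injectable clock.
class MemoryKeyValueStore implements KeyValueStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();
  private now: () => number;
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }
  async get(key: string): Promise<string | null> {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt <= this.now()) return null;
    return entry.value;
  }
  async set(key: string, value: string, _mode: 'EX', seconds: number): Promise<unknown> {
    this.entries.set(key, { value, expiresAt: this.now() + seconds * 1000 });
    return 'OK';
  }
}
```

A feature-flag lookup would then be something like `getOrLoad(store, 'flags:checkout', 30, loadFlagsFromCosmos)`, where the key and the loader are placeholders.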
---
## 5. Coolify Setup (15-minute path)
### Prerequisites
- Ubuntu 24.04 VM with Docker installed
- Domain pointing to VM IP (e.g., `*.bytelyst.dev`)
### Install
```bash
# One command — installs Coolify + all dependencies
curl -fsSL https://cdn.coollabs.io/coolify/install.sh | bash
```
### Configure for ByteLyst
1. **Add Gitea as git source** — Coolify connects to Gitea via API token
2. **Add each repo as a "service"** — Coolify auto-detects Dockerfile, builds, deploys
3. **Set env vars per service** — Web UI, encrypted at rest
4. **Enable auto-deploy** — Push to main → build → deploy (via Gitea webhook)
5. **Configure domains**: `platform.bytelyst.dev`, `flowmonk.bytelyst.dev`, etc.
### What Coolify handles automatically
- Traefik reverse proxy with automatic SSL
- Docker image builds with BuildKit
- Container health checks and auto-restart
- Rolling deploys with zero-downtime
- Resource monitoring dashboard
- Persistent volume management
- Webhook-triggered deployments from Gitea
### What still needs manual compose
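The infra services (Cosmos emulator, Valkey, Ollama, etc.) run stock images rather than a per-repo Dockerfile, so they still need a hand-written compose file. Coolify supports Docker Compose files natively, so that file can be deployed as a single compose stack through the Coolify UI.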
---
## 6. Observability Stack
### Uptime Kuma — Status Page + Alerting
```yaml
uptime-kuma:
  image: louislam/uptime-kuma:1
  ports: ['3333:3001']
  volumes: [uptime-kuma-data:/app/data]
  restart: unless-stopped
```
**Configure 25+ monitors:**
- All `/health` endpoints (backends + services)
- Web dashboard reachability
- Cosmos emulator connectivity
- Gitea API availability
- Valkey ping
- Ollama model availability
### Dozzle — Live Container Logs
```yaml
dozzle:
  image: amir20/dozzle:latest
  ports: ['9999:8080']
  volumes: ['/var/run/docker.sock:/var/run/docker.sock:ro']
  restart: unless-stopped
```
### Keep Loki + Grafana for
- Structured log queries across time ranges
- Custom dashboards (request latency, error rates, Cosmos RU consumption)
- Alert rules based on log patterns
---
## 7. Backup Strategy (Restic)
```bash
# Install restic
apt install restic
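# Init backup repo (Azure Blob, S3, local, or SFTP).
# The Azure backend reads AZURE_ACCOUNT_NAME and AZURE_ACCOUNT_KEY from the environment;
# exporting RESTIC_REPOSITORY lets the later commands omit -r.
export RESTIC_REPOSITORY=azure:bytelyst-backups:/
restic init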
# Backup all Docker volumes
restic backup \
/var/lib/docker/volumes/gitea-data/ \
/var/lib/docker/volumes/cosmos-data/ \
/var/lib/docker/volumes/valkey-data/ \
/var/lib/docker/volumes/grafana-data/ \
/var/lib/docker/volumes/uptime-kuma-data/ \
--tag daily
# Cron: daily at 2 AM (appends to the existing crontab instead of replacing it)
(crontab -l 2>/dev/null; echo "0 2 * * * restic backup ... --tag daily && restic forget --keep-daily 7 --keep-weekly 4 --prune") | crontab -
```
---
## 8. Secret Management (SOPS + age)
```bash
# Generate age key pair (one-time); age-keygen does not create the parent directory
mkdir -p ~/.config/sops/age
age-keygen -o ~/.config/sops/age/keys.txt
# Create .sops.yaml in repo root
cat > .sops.yaml << 'EOF'
creation_rules:
  # Rules are matched against the plaintext input path, so this must match .env.production
  - path_regex: \.env\.[^/]*$
    age: age1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
EOF
# Encrypt secrets
sops -e .env.production > .env.production.enc
# Decrypt at deploy time (on VM)
sops -d .env.production.enc > .env.production
# Git-safe: .env.production.enc is committed, .env.production is gitignored
```
---
## 9. Revised Implementation Order
| Step | Time | What | Tools |
|------|------|------|-------|
| **1** | 15 min | Install Coolify on VM | `curl` one-liner |
| **2** | 30 min | Deploy infra compose stack (Cosmos, Valkey, Gitea, Azurite, Mailpit, Ollama) via Coolify | Coolify UI |
| **3** | 30 min | Publish 49 `@bytelyst/*` packages to VM's Gitea | `scripts/publish-all.sh` |
| **4** | 1 hr | Add all 13 service repos to Coolify (auto-detect Dockerfile, set env vars) | Coolify UI |
| **5** | 30 min | Deploy Uptime Kuma + configure 25+ health monitors | Coolify + Uptime Kuma UI |
| **6** | 15 min | Deploy Dozzle for live log viewing | Coolify |
| **7** | 30 min | Configure SOPS + age for secrets, encrypt `.env.production` | CLI |
| **8** | 30 min | Configure Restic backups for all stateful volumes | CLI + cron |
| **9** | 30 min | Smoke test: hit all `/health` endpoints, verify Uptime Kuma green | Browser + curl |
**Total: ~4.5 hours** (down from 6-7 hours without Coolify)
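The smoke test in step 9 can be scripted rather than clicked through. The sketch below takes its ports from the architecture diagram in §2 and assumes every shared service and backend answers on `/health`; `sweepHealth` and `Probe` are hypothetical names, and the probe is injected so the function can be exercised without a network.

```typescript
// Ports copied from the architecture diagram; /health on each is an assumption.
const servicePorts: Record<string, number> = {
  'platform-service': 4003,
  'extraction-service': 4005,
  'mcp-server': 4007,
  peakpulse: 4010,
  chronomind: 4011,
  jarvisjr: 4012,
  nomgap: 4013,
  mindlyst: 4014,
  lysnrai: 4015,
  notelett: 4016,
  flowmonk: 4017,
  actiontrail: 4018,
  localmemgpt: 4019,
};

// Anything that resolves to an object with an `ok` flag, e.g. the global fetch in Node 18+.
type Probe = (url: string) => Promise<{ ok: boolean }>;

// Probe every service in parallel and return the names that did not respond OK.
async function sweepHealth(
  host: string,
  ports: Record<string, number>,
  probe: Probe,
): Promise<string[]> {
  const names = Object.keys(ports);
  const healthy = await Promise.all(
    names.map(async (name) => {
      const url = `http://${host}:${ports[name]}/health`;
      try {
        return (await probe(url)).ok;
      } catch {
        return false; // connection refused, DNS failure, timeout, ...
      }
    }),
  );
  return names.filter((_, i) => !healthy[i]);
}
```

On the VM this could be called as `sweepHealth('localhost', servicePorts, (url) => fetch(url))`; an empty result means every endpoint answered with a 2xx status.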
---
## 10. VM Sizing (Updated)
### Minimum (32 GB) — dev/staging, no Ollama
| Component | Count | RAM |
|-----------|-------|-----|
| Cosmos DB Emulator | 1 | ~2 GB |
| Fastify backends | 13 | ~2 GB |
| Next.js web apps | 11 | ~2.2 GB |
| Valkey | 1 | ~100 MB |
| Infra (Traefik, Loki, Grafana, Azurite, Mailpit, Gitea) | 7 | ~1 GB |
| Observability (Uptime Kuma, Dozzle, Portainer) | 3 | ~300 MB |
| Coolify overhead | 1 | ~500 MB |
| **Subtotal** | **~37** | **~8.1 GB** |
| Headroom for builds + spikes | — | ~24 GB |
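The subtotal and headroom figures follow directly from the rows above. A short sketch of the arithmetic, using only the RAM estimates from the table (the function names are illustrative):

```typescript
// RAM estimates in GB, copied from the table above.
const ramEstimatesGb: Record<string, number> = {
  cosmosEmulator: 2,
  fastifyBackends: 2,
  nextWebApps: 2.2,
  valkey: 0.1,
  infra: 1,
  observability: 0.3,
  coolify: 0.5,
};

// Round to one decimal to hide floating-point noise.
function roundTo1(x: number): number {
  return Math.round(x * 10) / 10;
}

function subtotalGb(estimates: Record<string, number>): number {
  return roundTo1(Object.values(estimates).reduce((a, b) => a + b, 0));
}

// Headroom = VM size minus the subtotal and any extra workload (e.g. an Ollama model).
function headroomGb(vmGb: number, estimates: Record<string, number>, extraGb: number = 0): number {
  return roundTo1(vmGb - subtotalGb(estimates) - extraGb);
}

console.log(subtotalGb(ramEstimatesGb));        // 8.1
console.log(headroomGb(32, ramEstimatesGb));    // 23.9
console.log(headroomGb(64, ramEstimatesGb, 8)); // 47.9
```

The last line corresponds to the 64 GB option with an ~8 GB Ollama model loaded.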
### Recommended (64 GB) — with Ollama (7B models)
Same as above + Ollama (~8 GB for llama3:8b) = ~16 GB active, ~48 GB headroom.
### Cloud Pricing (updated)
| Provider | Instance | vCPU | RAM | Price |
|----------|----------|------|-----|-------|
| **Hetzner** | CPX51 | 16 | 32 GB | **~€45/mo** ← best value |
| **Hetzner** | CCX33 | 8 | 32 GB | **~€55/mo** (dedicated) |
| **Azure** | Standard_D8s_v5 | 8 | 32 GB | ~$280/mo |
| **Home** | Mac Mini M4 Pro | 12 | 48 GB | One-time ~$1,600 |
---
## 11. What This Plan Does NOT Cover (Future Work)
- **Multi-node K3s** — Phase 2, same manifests, add workers with `k3s agent`
- **Managed Kubernetes** (AKS/EKS) — Phase 3, same manifests + Helm chart
- **CI/CD pipeline** for automated package publish → image build → deploy
- **Custom domain + DNS** — depends on registrar choice
- **WAF / DDoS protection** — Cloudflare free tier in front of Coolify
- **Mobile app distribution** — TestFlight / Play Console (separate from VM)