Apply deploy.resources.limits.memory to 45 services across 5 compose files.

Limits take effect on next docker compose up (no running containers affected).

Limits derived from 2-day Prometheus RSS baseline (avg of 2026-05-27-29):

common_plat ecosystem (37 services):
  cosmos-emulator: 1g (319 MiB baseline, can spike on writes)
  loki: 384m (75 MiB)
  prometheus: 384m (91 MiB, grows with series cardinality)
  node-exporter: 128m (21 MiB, very stable)
  cadvisor: 256m (38 MiB)
  valkey: 128m (tiny)
  caddy: 256m (35 MiB)
  platform-service: 512m (61 MiB)
  extraction-service: 512m (99 MiB, Python sidecar)
  mcp-server: 384m (21 MiB)
  product backends: 512m (30-65 MiB each)
  product webs: 512m (35-93 MiB each)
  llmlab-dashboard: 512m (Ollama proxy, larger cache budget)
dashboard (2 services): backend 512m, web 512m
invttrdg (2 services): backend 768m (159 MiB + heavy state writes),
  web 256m (nginx SPA)
clock/chronomind (2 services): backend 512m, web 512m
notes/notelett (2 services): backend 512m, web 512m

Ollama host process has NO limit (model load unpredictable, up to 8 GB).
trading-backend compose file not on disk; limit not applied.
gitea-npm-registry started manually; limit not applied.

Monitor OOMKill for 48h after next stack restart:
  dmesg | grep -i oom

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
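The kernel-log check in the commit message can be complemented from the Docker side. The following is a minimal sketch, not part of the commit: the function name `oom_killed_containers` is hypothetical, and it relies on the `State.OOMKilled` field reported by `docker inspect`. That flag may no longer be set once the restart policy has brought a container back up, so `dmesg` likely remains the primary signal.

```shell
# Print the names of containers whose recorded state shows an OOM kill.
# Assumes the docker CLI is on PATH and the caller may query the daemon.
oom_killed_containers() {
  for id in $(docker ps -aq); do
    # Emits "/<name> true|false" for each container.
    docker inspect --format '{{.Name}} {{.State.OOMKilled}}' "$id"
  done | awk '$2 == "true" { sub(/^\//, "", $1); print $1 }'
}
```

Running `oom_killed_containers` periodically during the 48h window would list any affected container by name; empty output means no container currently reports an OOM kill.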
92 lines
2.9 KiB
YAML
# Production-mode compose for DevOps Dashboard
# Usage:
#   docker compose up --build
#
# Requires:
#   - backend/.env populated (copy from backend/.env.example)
#   - web/.env.local populated (copy from web/.env.local.example)
#
# For hot-reload dev mode use:
#   docker compose -f docker-compose.yml -f docker-compose.dev.yml up

services:
  # ---------------------------------------------------------------------------
  # Backend — DevOps API service
  # ---------------------------------------------------------------------------
  backend:
    build:
      context: .
      dockerfile: backend/Dockerfile
      args:
        BYTELYST_PACKAGE_SOURCE: ${BYTELYST_PACKAGE_SOURCE:-vendor}
    container_name: devops-backend
    env_file:
      - backend/.env
    environment:
      - VM_SCRIPTS_PATH=/vm-scripts/VMs/HostingerVM
      - VM_LOG_DIR=/host-logs
    ports:
      - '127.0.0.1:4004:4004'
    networks:
      - default
      - platform_net
    volumes:
      # Read-only access to VM management scripts
      - /opt/bytelyst/learning_ai_devops_tools/scripts:/vm-scripts:ro
      # Read-write access to VM log files (cleanup + health-check write here)
      - /var/log/vm-cleanup.log:/host-logs/vm-cleanup.log
      - /var/log/vm-health-check.log:/host-logs/vm-health-check.log
      - /var/log/docker-watchdog.log:/host-logs/docker-watchdog.log
      # Docker socket — allows running docker commands against the host daemon
      # (same pattern as Portainer/cAdvisor; container already runs as root)
      - /var/run/docker.sock:/var/run/docker.sock
    extra_hosts:
      # Reach the host for Ollama API (port 11434) and host-only services
      - "host-gateway:host-gateway"
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 512m
    healthcheck:
      test: ['CMD', 'curl', '-f', 'http://localhost:4004/health']
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 15s

  # ---------------------------------------------------------------------------
  # Web — Next.js dashboard
  # ---------------------------------------------------------------------------
  web:
    build:
      context: .
      dockerfile: web/Dockerfile
      args:
        BYTELYST_PACKAGE_SOURCE: ${BYTELYST_PACKAGE_SOURCE:-vendor}
        NEXT_PUBLIC_PRODUCT_ID: ${NEXT_PUBLIC_PRODUCT_ID:-devops}
        NEXT_PUBLIC_PLATFORM_URL: https://api.bytelyst.com/platform/api
        NEXT_PUBLIC_DEVOPS_API_URL: https://api.bytelyst.com/devops
    container_name: devops-web
    ports:
      - '127.0.0.1:3049:3000'
    networks:
      - default
      - platform_net
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 512m
    depends_on:
      backend:
        condition: service_healthy
    environment:
      - NODE_ENV=production

networks:
  default: {}
  platform_net:
    external: true
    name: learning_ai_common_plat_default
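
Since the limits only apply once the containers are recreated, it may help to confirm that Docker actually picked them up after the next `docker compose up`. The sketch below is an illustration and not part of this repository: `mem_to_bytes` and `check_memory_limit` are hypothetical helper names, only single-letter size suffixes (`b`, `k`, `m`, `g`) are handled, and the check reads the `HostConfig.Memory` field from `docker inspect`, where 0 means no limit.

```shell
# Convert a Compose memory string such as 512m or 1g to bytes.
# Only single-letter suffixes are handled; plain numbers pass through.
mem_to_bytes() {
  value=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
  number=${value%[bkmg]}
  case "$value" in
    *g) echo $((number * 1024 * 1024 * 1024)) ;;
    *m) echo $((number * 1024 * 1024)) ;;
    *k) echo $((number * 1024)) ;;
    *b) echo "$number" ;;
    *)  echo "$value" ;;
  esac
}

# Compare the limit Docker applied to a container with the expected value.
check_memory_limit() {
  container=$1
  expected=$(mem_to_bytes "$2")
  actual=$(docker inspect --format '{{.HostConfig.Memory}}' "$container")
  if [ "$actual" = "$expected" ]; then
    echo "$container: ok ($actual bytes)"
  else
    echo "$container: expected $expected bytes, got $actual"
    return 1
  fi
}
```

For the two services in this file, the calls would be `check_memory_limit devops-backend 512m` and `check_memory_limit devops-web 512m`; a reported value of 0 would suggest the container predates the change and has not been recreated yet.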