feat(infra): Phase 2.3 — memory limits across all active Docker stacks
Apply deploy.resources.limits.memory to 45 services across 5 compose files.
Limits take effect on next docker compose up (no running containers affected).
Limits derived from a 2-day Prometheus RSS baseline (average over 2026-05-27 to 2026-05-29):
common_plat ecosystem (37 services):
  cosmos-emulator: 1g (319 MiB baseline, can spike on writes)
  loki: 384m (75 MiB)
  prometheus: 384m (91 MiB, grows with series cardinality)
  node-exporter: 128m (21 MiB, very stable)
  cadvisor: 256m (38 MiB)
  valkey: 128m (tiny)
  caddy: 256m (35 MiB)
  platform-service: 512m (61 MiB)
  extraction-service: 512m (99 MiB, Python sidecar)
  mcp-server: 384m (21 MiB)
  product backends: 512m (30-65 MiB each)
  product webs: 512m (35-93 MiB each)
  llmlab-dashboard: 512m (Ollama proxy, larger cache budget)
dashboard (2 services): backend 512m, web 512m
invttrdg (2 services): backend 768m (159 MiB + heavy state writes),
  web 256m (nginx SPA)
clock/chronomind (2 services): backend 512m, web 512m
notes/notelett (2 services): backend 512m, web 512m
Ollama host process has NO limit (model load unpredictable, up to 8 GB).
trading-backend compose file not on disk — limit not applied.
gitea-npm-registry started manually — limit not applied.
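Because some services are deliberately left without a limit, it can help to confirm which containers actually picked one up after the next restart. The following is a minimal sketch, assuming the Docker CLI is available on the host; the function name `show_memory_limits` is illustrative and not part of this commit. It reads `HostConfig.Memory` (bytes, 0 meaning no limit) from `docker inspect` and prints it in MiB.

```shell
#!/bin/sh
# Hypothetical helper (not part of this commit): print each running
# container's name and its memory limit in MiB. A value of 0 means
# that no limit is set on the container.
show_memory_limits() {
    docker ps -q | while read -r id; do
        # HostConfig.Memory is the limit in bytes (0 = unlimited).
        docker inspect --format '{{.Name}} {{.HostConfig.Memory}}' "$id"
    done | awk '{ printf "%s %d MiB\n", $1, $2 / 1048576 }'
}

# Usage, after the stacks have been restarted:
#   show_memory_limits
```

Containers reported with 0 MiB here should match the exceptions listed above; any other service showing 0 has probably not been recreated yet.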
Monitor OOMKill for 48h after next stack restart:
dmesg | grep -i oom
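The kernel log check above can be complemented by asking Docker directly which containers were OOM-killed. This is a rough sketch under the assumption that the Docker CLI is on the host; `oom_lines` and `oom_killed_containers` are illustrative names, not part of this commit. The first function filters log text on stdin, the second reads the `State.OOMKilled` flag from `docker inspect`.

```shell
#!/bin/sh
# Hypothetical helpers (not part of this commit).

# Filter kernel log text on stdin for OOM events.
# "|| true" keeps the exit status 0 when nothing matches.
oom_lines() {
    grep -iE 'oom|out of memory' || true
}

# List containers (running or exited) whose State.OOMKilled flag is true.
oom_killed_containers() {
    docker ps -aq | while read -r id; do
        docker inspect --format '{{.Name}} {{.State.OOMKilled}}' "$id"
    done | awk '$2 == "true" { print $1 }'
}

# Usage during the 48h monitoring window:
#   dmesg | oom_lines
#   oom_killed_containers
```

The `State.OOMKilled` flag is reset when a container is recreated, so the kernel log remains the more durable record of the two.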
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parent 42c3b9cdd5
commit 253e888a24
@@ -44,6 +44,10 @@ services:
       # Reach the host for Ollama API (port 11434) and host-only services
       - "host-gateway:host-gateway"
     restart: unless-stopped
+    deploy:
+      resources:
+        limits:
+          memory: 512m
     healthcheck:
       test: ['CMD', 'curl', '-f', 'http://localhost:4004/health']
       interval: 30s
@@ -70,6 +74,10 @@ services:
       - default
       - platform_net
     restart: unless-stopped
+    deploy:
+      resources:
+        limits:
+          memory: 512m
     depends_on:
       backend:
         condition: service_healthy