bytelyst-devops-tools

Author	SHA1	Message	Date
Hermes VM	74a8ee0993	feat(dashboard): close 3 of 5 Phase 5 P2 mitigation items (allow-list, projectPath, audit-log) Closes the three Phase 5 P2 follow-ups from the DEPLOYMENT.md mitigation roadmap that don't need infra changes. Two P2 items remain (non-root container, docker-proxy daemon) — both genuinely need container/orchestration work and stay queued. 1. Allow-list shell wrapper (P1) New `lib/shell.ts`: - `execAllowed(cmd, args, opts)` — `execFile`-only, no shell, no interpolation. Single escape hatch for ad-hoc invocations. - `dockerRestart(name)` — name validated against `[a-zA-Z0-9][a-zA-Z0-9._-]{0,127}`; throws InvalidShellArgError on anything else (including non-strings, shell metacharacters, command-substitution attempts). Tests cover all of these. - `dockerPrune(kind, {all?})` — kind constrained to {container,image,volume,builder}; `--all` only valid for image. - `runBashScript(path, args, {allowedRoots})` — script path AND cwd both checked against allowed roots; rejects `..` escapes and prefix-matching siblings (`/opt/projects-evil` vs `/opt/projects`). - `runNpmScript(script, {cwd, allowedRoots})` — script ∈ {typecheck,lint,build,test,test:run,start}; cwd inside roots. 17 unit tests cover every rejection path. Module added to the coverage gate (≥95% lines). Migrated highest-risk callers off template-literal `exec`: - `vm/repository.ts:restartContainer` → `dockerRestart`. Was previously `await execAsync(\`docker restart "${name}"\`)` with only a regex check; now goes through the wrapper. - `system/repository.ts:dockerCleanup` → `dockerPrune` per kind + `execAllowed` for `docker system df`. Drops the array of template-literal command strings entirely. - `code-quality/repository.ts` → `runNpmScript` for every lifecycle invocation. cwd is now the resolved (normalised, `..`-collapsed) path, not the raw input. 2. 
projectPath validation for /code-quality/check (P1) `runCodeQualityCheck` now calls `assertPathInAllowedRoots(projectPath, getAllowedRoots())` before any subprocess spawns. `getAllowedRoots()` reads `CODE_QUALITY_ALLOWED_ROOTS` (colon-separated env, defaults to `/opt/bytelyst`). Rejection happens with a clear error message listing the configured roots so operators know what to allow. 3. Audit-log every privileged shell-out (P2) `audit/types.ts` extended: `action` now includes `'shell-exec'`, `entityType` includes `'host'`. The migration is additive — old audit rows still validate. Three privileged routes now write a `shell-exec` audit row with actor (authUserId / authRole), entity id, and a sanitized details payload before responding: - `POST /docker/cleanup` — `entityId: docker-cleanup:<type>`, details include {type, force, freedSpace}. - `POST /vm/cleanup` — `entityId: vm-cleanup:<mode>`. - `POST /vm/containers/:name/restart` — `entityId: container-restart:<name>`, details include {success, message}. Audited even on failure so attempted privileged actions are still recorded. Audit writes are best-effort — a Cosmos hiccup logs a warn but never fails the request the operator was running. Verified: backend typecheck ✅, 74/74 unit tests ✅ (17 new for shell.ts + audit changes), 7/7 E2E ✅, lint 0 errors, coverage gate ≥95% lines on every gated file (which now includes shell.ts). Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-30 08:18:50 +00:00
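The two validation rules described in commit 74a8ee0993 (the container-name pattern, and the allowed-roots check that rejects `..` escapes and prefix-matching siblings) can be sketched roughly as follows. This is a minimal illustration, not the contents of `lib/shell.ts`: the helper names `isValidContainerName` and `isInsideAllowedRoots` are hypothetical, and only the regular expression is taken from the commit message.

```typescript
import * as path from "node:path";

// Pattern quoted in the commit message for container names, anchored at both ends.
const CONTAINER_NAME = /^[a-zA-Z0-9][a-zA-Z0-9._-]{0,127}$/;

// Hypothetical helper: accepts only strings that match the pattern.
function isValidContainerName(name: unknown): name is string {
  return typeof name === "string" && CONTAINER_NAME.test(name);
}

// Hypothetical helper: true when `candidate` resolves to an allowed root
// or to a path strictly beneath one.
function isInsideAllowedRoots(candidate: string, allowedRoots: string[]): boolean {
  const resolved = path.resolve(candidate); // collapses `..` segments
  return allowedRoots.some((root) => {
    const rel = path.relative(path.resolve(root), resolved);
    // A relative path that climbs out (`..` or `../x`) or is absolute lies outside
    // the root. This also rejects siblings such as `/opt/projects-evil`, which
    // relative to `/opt/projects` is `../projects-evil`.
    const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
    return !escapes;
  });
}
```

Comparing via `path.relative` instead of a plain `startsWith` on the raw strings is what keeps a sibling directory with a shared prefix from passing the check.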
Hermes VM	1e64d75fd4	feat(dashboard): Phase 5 P2 — structured pino logging with redaction First half of Phase 5 P2 (the "structured backend logging" piece; E2E-in-CI lands separately so the diff stays reviewable). Adds `lib/logger.ts` exporting a singleton pino instance shared between Fastify (via `loggerInstance`) and any non-request code path. One configured logger across the backend means uniform formatting, redaction, and log-level control: - LOG_LEVEL env knob (defaults: debug in non-prod, info in prod when NODE_ENV=production). Documented in `.env.example`. - Built-in redaction for Authorization / Cookie headers and the common secret-shaped field names (password, token, refreshToken, accessToken, csrfToken, JWT_SECRET, CSRF_SECRET, ENCRYPTION_KEY, COSMOS_KEY, AZURE_CLIENT_SECRET) so an accidental `req.log.info(req.body)` or `logger.error({ err, config }, …)` won't dump credentials. This is a backstop, not the primary defense — call sites should still avoid logging raw config/req. - JSON to stdout in every environment. Pipe through `pino-pretty` locally if you want pretty output; we deliberately don't bundle pino-pretty as a runtime dep. - `childLogger(module)` helper tags log lines with their origin so repositories/background workers don't have to repeat the module name on every line. Sweeps the runtime `console.error` sites that lose request context (deployment orchestrator background fire-and-forget, system docker stats/cleanup, backup CRUD, vm getAllContainers) onto the structured logger. CLI-only modules (`scripts/run-migrations.ts`, `migrations/index.ts`, `cosmos-init.ts` startup, `azure-keyvault.ts`, `config.ts` env warnings, `lib/migrations.ts` no-op message) keep `console.*` for now — they run before Fastify is up and are queued for a separate cleanup pass. Tests, typecheck, lint (0 errors), build green. Coverage gate still passing (≥95% lines on every gated file). 
Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-30 07:18:44 +00:00
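The log-level default and redaction list described in commit 1e64d75fd4 could be assembled along these lines. This is a sketch under assumptions: `resolveLogLevel` and `buildLoggerOptions` are hypothetical names, the path layout (`req.headers.*`, top-level and one-level-nested field names) is a guess at what `lib/logger.ts` configures, and the returned object merely follows the shape of pino's `level` and `redact` options so it could be passed to `pino(...)`.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: an explicit LOG_LEVEL wins; otherwise fall back by NODE_ENV.
function resolveLogLevel(env: Env): string {
  if (env.LOG_LEVEL) return env.LOG_LEVEL;
  return env.NODE_ENV === "production" ? "info" : "debug";
}

// Field names as listed in the commit message.
const SECRET_FIELDS = [
  "password", "token", "refreshToken", "accessToken", "csrfToken",
  "JWT_SECRET", "CSRF_SECRET", "ENCRYPTION_KEY", "COSMOS_KEY", "AZURE_CLIENT_SECRET",
];

// Hypothetical helper: builds an options object in the shape pino accepts.
function buildLoggerOptions(env: Env) {
  return {
    level: resolveLogLevel(env),
    redact: {
      paths: [
        "req.headers.authorization",
        "req.headers.cookie",
        // Assumption: cover each field at the top level and one level down.
        ...SECRET_FIELDS.flatMap((field) => [field, `*.${field}`]),
      ],
      censor: "[Redacted]",
    },
  };
}
```

Keeping the options in a pure function makes the defaults easy to unit-test without constructing a logger.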
Hermes VM	42c3b9cdd5	feat(dashboard/vm): Phase 3.3 — All Containers panel with CPU/RAM, logs, bulk restart - repository.ts: getAllContainers() — batch docker inspect + docker stats --no-stream merged by container name; returns state, health, uptime, CPU%, RAM, memLimitMiB (0=no limit), restart count, stack from compose label; getContainerLogs() — docker logs --tail --timestamps - routes.ts: GET /api/vm/containers (all, with stats; ~3s for 38 containers), GET /api/vm/containers/:name/logs?lines=N - api.ts: ContainerInfo interface; vmApi.getAllContainers(), vmApi.getContainerLogs() - vm/page.tsx: ContainersPanel — collapsible (lazy-loads on first open); filter chips (All/Running/Unhealthy/No Limit) + stack dropdown; per-row log viewer (inline pre, dark bg, 50-line tail); per-row restart button; bulk "Restart N unhealthy" with confirmation modal; Fragment key pattern for row+log-row pairs I/O anomaly (Phase 0.3) root cause identified: invttrdg-backend and trading-backend write bot_state.json + .bak on every market tick (5×/min and 2×/min respectively) into container overlay layer → ~6 GB/day — intentional bot behaviour, no fix needed, trend chart already in place to monitor. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-30 05:26:49 +00:00
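The "merged by container name" step in commit 42c3b9cdd5 amounts to a keyed join between the inspect output and the stats output. A minimal sketch, assuming simplified row shapes: the interfaces and `mergeByName` below are hypothetical and carry only a few of the fields the commit lists.

```typescript
// Hypothetical, reduced row shapes; the actual fields in repository.ts are not shown.
interface InspectRow { name: string; state: string; restartCount: number; }
interface StatsRow { name: string; cpuPercent: number; memUsedMiB: number; }
interface ContainerRow extends InspectRow {
  cpuPercent: number | null;
  memUsedMiB: number | null;
}

// Join inspect rows with stats rows on the container name.
// Containers without a stats row (for example stopped ones) keep null metrics.
function mergeByName(inspect: InspectRow[], stats: StatsRow[]): ContainerRow[] {
  const statsByName = new Map(stats.map((s) => [s.name, s] as const));
  return inspect.map((row) => {
    const s = statsByName.get(row.name);
    return {
      ...row,
      cpuPercent: s?.cpuPercent ?? null,
      memUsedMiB: s?.memUsedMiB ?? null,
    };
  });
}
```

Indexing the stats rows in a `Map` first keeps the join linear in the number of containers instead of scanning the stats list once per row.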
Hermes VM	8d32cb7980	feat(dashboard/vm): Phases 4.1-4.3 — Prometheus trends, sparklines, weekly digest - prometheus.ts: new Prometheus client with 7d/30d range queries for disk, memory, swap, CPU steal, and disk I/O (GB/hr); getWeeklyDigestData() aggregates all metrics for digest and API endpoint - routes.ts: GET /api/vm/metrics/trend?metric=…&range=… and GET /api/vm/weekly-digest endpoints - api.ts: TrendPoint/TrendSeries types; getTrend() and getMemoryTrend() added to vmApi - vm/page.tsx: Sparkline (pure SVG polyline+fill), TrendCard with latest/avg/peak and threshold colouring, TrendsPanel with lazy load on first open; Promise.allSettled() isolation for all 5 data panels - vm-weekly-digest.sh: weekly Telegram digest via docker exec into devops-backend to reach Prometheus; emoji severity indicators; cron summary from /var/log/vm-cleanup.log - systemd timer: Mon 08:00 UTC, Persistent=true (fires on next boot if missed); first trigger 2026-06-02 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-30 05:26:49 +00:00
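The TrendCard figures (latest/avg/peak) and the pure-SVG sparkline from commit 8d32cb7980 reduce to two small pure functions. The sketch below is an assumption about how such a component might compute them; `summarize` and `sparklinePoints` are hypothetical names, and the min-max scaling is one common choice, not necessarily the one used in `vm/page.tsx`.

```typescript
// Hypothetical helper: latest, average, and peak of a series; null when empty.
function summarize(values: number[]): { latest: number; avg: number; peak: number } | null {
  if (values.length === 0) return null;
  const sum = values.reduce((acc, v) => acc + v, 0);
  return {
    latest: values[values.length - 1],
    avg: sum / values.length,
    peak: Math.max(...values),
  };
}

// Hypothetical helper: builds the `points` attribute for an SVG <polyline>.
// Values are min-max scaled into the box; SVG y grows downward, so it is inverted.
function sparklinePoints(values: number[], width: number, height: number): string {
  if (values.length === 0) return "";
  const min = Math.min(...values);
  const span = Math.max(...values) - min || 1; // avoid dividing by zero on flat series
  const step = values.length > 1 ? width / (values.length - 1) : 0;
  return values
    .map((v, i) => {
      const x = i * step;
      const y = height - ((v - min) / span) * height;
      return `${x.toFixed(1)},${y.toFixed(1)}`;
    })
    .join(" ");
}
```

For example, `sparklinePoints([0, 5, 10], 100, 20)` yields `"0.0,20.0 50.0,10.0 100.0,0.0"`, which can be dropped straight into `<polyline points="...">`.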
Hermes VM	2fc23d6baa	feat(vm): fix devops-backend VM module — Phase 0.1 complete - Switch backend runner from node:20-alpine to node:20-slim so GNU df flags (--output=pcent/avail) work inside the container - Add volume mounts to docker-compose.yml: scripts (ro), VM logs (rw), docker.sock; set VM_SCRIPTS_PATH + VM_LOG_DIR env vars - Rebuild repository.ts: env-configurable paths, cron history parser, unhealthy-container inspector, Ollama model endpoints - Add routes: GET /api/vm/cron-status, unhealthy containers, Ollama models, container restart, model unload - vm-cleanup.sh: add step_cosmos_pglog, step_docker_aged_images; fix (( count++ )) → count=$(( count + 1 )) for set -e compatibility (the post-increment form exits non-zero when count is 0, which aborts the script under set -e) - Add docs/VM_OBSERVABILITY_ROADMAP.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 21:13:45 +00:00
Hermes VM	d0b8ce2c74	feat: add VM Health page to devops dashboard Backend (Fastify): - New module: modules/vm/ (types, repository, routes) - GET /api/vm/health — runs vm-health-check.sh --json, returns structured result - GET /api/vm/cleanup-log — tails /var/log/vm-cleanup.log - POST /api/vm/cleanup — triggers vm-cleanup.sh (weekly / monthly / dry-run) - Registered vmRoutes in server.ts Frontend (Next.js): - New page: /vm — VM Health - Overall status banner (OK/WARN/CRIT) with issue summary - Per-check cards: disk, load, RAM, swap, crash loops, container health, build cache, docker images, journal, syslog — color-coded by level - Cleanup trigger buttons (dry-run, weekly, monthly) with output viewer - Collapsible cleanup log viewer (last 40 lines) - Auto-refresh every 60s - sidebar-nav.tsx: added 'VM Health' entry with Server icon Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 18:53:20 +00:00
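The overall status banner (OK/WARN/CRIT) in commit d0b8ce2c74 sits above per-check cards that each carry their own level. One plausible way to derive the banner from the cards is "worst level wins", sketched below. This rule is an assumption, as are the `CheckResult` shape and the `overallStatus` name; the commit does not say whether the aggregation happens in `vm-health-check.sh` or in the dashboard.

```typescript
type Level = "OK" | "WARN" | "CRIT";

// Hypothetical shape for one per-check card.
interface CheckResult { name: string; level: Level; }

const SEVERITY: Record<Level, number> = { OK: 0, WARN: 1, CRIT: 2 };

// Assumed rule: the banner shows the most severe level among all checks,
// and the issue summary lists every check that is not OK.
function overallStatus(checks: CheckResult[]): { level: Level; issues: string[] } {
  let level: Level = "OK";
  const issues: string[] = [];
  for (const check of checks) {
    if (check.level !== "OK") issues.push(`${check.name}: ${check.level}`);
    if (SEVERITY[check.level] > SEVERITY[level]) level = check.level;
  }
  return { level, issues };
}
```

With this rule, a single CRIT check turns the whole banner red while the summary still lists the WARN-level checks alongside it.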

6 Commits