- Short-TTL (30s) snapshot cache + in-flight coalescing so the panel poll and
concurrent refreshes don't fan out ~20 systemctl/git/ps/du subprocesses each
time; snapshot carries a `cached` flag and `getHermesOpsSnapshot({force})`.
- Distinguish "unit inactive" (down) from "probe couldn't run" (unknown): a new
exec() wrapper reports whether the command actually ran (ENOENT/timeout =
unknown) vs exited non-zero with output (e.g. systemctl is-active -> inactive).
Per-field ProbeStatus on gateway/dashboard/timer/repo; warnings differentiate
"is not active" from "status could not be determined".
- Robust Bheem/Uma checks: `runuser -u uma -- systemctl --user is-active/
is-enabled` with a ps / existsSync fallback so a failed probe degrades to the
legacy check instead of a false "down".
- Zod schema (HermesOpsSnapshotSchema) as the stable typed contract; the route
validates output before sending. New status fields are additive (active/
enabled/url/etc. preserved) so the existing web client is unaffected.
- Unit tests (mock execFile/fs): healthy snapshot, down vs unknown mapping,
runuser->ps fallback, unreadable repo, cache hit + force bypass, request
coalescing. Backend: 16 tests green.
Roadmap: check off Phase 1 items and Phase 5 P0 in hermes_dashboard_v2_roadmap.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- ci.yml: actions/checkout into the runner workspace instead of cd-ing into a
hard-coded host path and `git reset --hard origin/main` on the live checkout;
install via `pnpm install:gitea` (self-contained, no sibling common-plat
checkout); E2E step left as a TODO pointer (ci-e2e-hardening, Phase 5 P2).
- Fix the same stale /opt/bytelyst/bytelyst-devops-tools path in deploy.sh,
scripts/deploy-hotcopy.sh, DEPLOYMENT.md, DEPLOYMENT_GUIDE.md.
- Replace the no-op `lint` echoes with real ESLint 9 flat configs (js +
typescript-eslint recommended) for backend and web; add a root `pnpm lint`.
- Fix the 10 errors lint surfaced, incl. require('os') in an ESM backend
(system/repository.ts -> import * as os), prefer-const x4, and a ternary
expression-statement in web vm/page.tsx.
Verified locally: secret-scan, lint (0 errors; correctly fails on bad code),
typecheck, unit tests (backend 9 / web 11), and build all green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- prometheus.ts: new Prometheus client with 7d/30d range queries for disk,
memory, swap, CPU steal, and disk I/O (GB/hr); getWeeklyDigestData()
aggregates all metrics for digest and API endpoint
- routes.ts: GET /api/vm/metrics/trend?metric=…&range=… and
GET /api/vm/weekly-digest endpoints
- api.ts: TrendPoint/TrendSeries types; getTrend() and getMemoryTrend()
added to vmApi
- vm/page.tsx: Sparkline (pure SVG polyline+fill), TrendCard with
latest/avg/peak and threshold colouring, TrendsPanel with lazy load
on first open; Promise.allSettled() isolation for all 5 data panels
- vm-weekly-digest.sh: weekly Telegram digest via docker exec into
devops-backend to reach Prometheus; emoji severity indicators; cron
summary from /var/log/vm-cleanup.log
- systemd timer: Mon 08:00 UTC, Persistent=true (fires on next boot
if missed); first trigger 2026-06-02
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- code-quality/repository.ts: fix tsErrorMatch[3] → [4] for type field (group 3 is column, 4 is error|warning)
- code-quality/repository.ts: fix ESLint regex to make rule brackets optional (not all formatters include them)
- code-quality/repository.ts: fix Vitest test count — parse 'Tests' line (individual tests) instead of 'Test Files' (file count); improve Jest regex to capture pass/fail independently
- env/repository.ts: replace raw process.env.ENCRYPTION_KEY with config.ENCRYPTION_KEY so the validated default flows through a single source of truth
- config.ts: add startup console.warn when CSRF_SECRET or ENCRYPTION_KEY are using insecure defaults
- deployments/orchestrator.ts: refactor runDeploymentScript to use try/catch/finally — deployment record is now always written in the finally block, preventing zombie 'running' states if updateDeployment itself throws
- auth.tsx: remove dead 'user &&' guard (user is always truthy after the !user check above); remove debug console.log calls, keep console.error
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Security (backend):
- env/routes: add requireAdmin to all 6 env endpoints — GET /env was
fully open, exposing all secret values to unauthenticated requests
- deployments/routes: add requireAdmin to all 4 GET endpoints (deployment
history and logs were publicly readable)
- health/routes: remove duplicate requireAdmin call from DELETE /health/cache
handler body (was already enforced via preHandler)
Frontend — auth/api:
- system/page: replace raw fetch + localStorage token with apiRequest
(mutations now go through CSRF flow)
- vm/page: same — replace raw fetch with vmApi from api.ts
- api.ts: add vmApi (getHealth, getCleanupLog, runCleanup) + shared
VmHealthResult / VmCheck / VmCheckLevel types
Shared utilities:
- utils.ts: add formatBytes() and getStatusColor() shared helpers
- system/page: remove duplicate formatBytes, import from utils
- health/page: remove duplicate getStatusColor, import from utils
- page.tsx (home): remove duplicate getStatusColor, import from utils
UX improvements:
- page.tsx: remove Seed Services button from normal header (debug tool)
- page.tsx: deploy button now always enabled; shows inline warning banner
when service is not 'up' instead of silently disabling the button
- metrics: fix bar chart — bars now grow from bottom (flex-col-reverse),
add empty state, fix date parsing timezone edge case
- sidebar-nav: theme toggle now functional — persists to localStorage and
toggles document.documentElement class 'dark'
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add docker-compose.yml following trading web pattern
- Update web Dockerfile to use multi-stage build with metadata
- Add build metadata (commit SHA, branch, timestamp, author, message)
- Rewrite deploy.sh to use docker compose with build metadata
- Add hotcopy deployment script for quick updates
- Add comprehensive backend API with deployment orchestration
- Add health checks, service management, and monitoring endpoints
- Add CI/CD workflow configuration
- Add deployment documentation and guides
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>