Commit Graph

13 Commits

Author SHA1 Message Date
Hermes VM
a05702c315 Add Hermes token analytics dashboard 2026-06-06 03:34:49 +00:00
Hermes VM
02b362399b feat: complete hermes telemetry dashboard wiring 2026-05-31 08:28:26 +00:00
Hermes VM
ad16b1308e feat(dashboard): Phase 3 slice 1 — hermes telemetry contract + backend endpoint
First slice of Phase 3 ("real per-instance telemetry"). Defines the
read-only artifact contract from Decision #1 (sessions, cron, memory,
skills, watchdog alerts, backup history) and ships an admin-gated
backend endpoint that probes the live Hermes instance, gracefully
degrading to status:'unknown' wherever the source isn't readable.

What's new
  - `backend/src/modules/hermes-telemetry/types.ts` — Zod schemas for
    every section of the snapshot, plus a `HermesProbeStatus` reused
    from hermes-ops so the UI can distinguish "definitely empty" from
    "couldn't read the source" for each section independently.
  - `backend/src/modules/hermes-telemetry/repository.ts` — implementation
    that:
      * shells out via `runuser -u <user> --` for cross-user instances
        (Bheem/uma) the same way `hermes-ops/repository.ts` does;
      * parses `hermes sessions stats / cron list / memory list /
        skills list --json` when the CLI is present, otherwise
        reports status:'unknown';
      * tails the watchdog log and buckets each line by severity
        (critical / warn / info);
      * pulls `git -C <repo> log` against the instance's backup repo
        for backup history;
      * caches per-instance with a 30s TTL + in-flight coalescing,
        same pattern as hermes-ops.
  - `backend/src/modules/hermes-telemetry/routes.ts` — admin-only GET
    `/api/hermes/telemetry/:instance` (the `instance` path param is
    Zod-validated; the response is validated against
    `HermesTelemetrySnapshotSchema` before send so a shape regression
    surfaces here, not in the UI).
  - `backend/src/modules/hermes-telemetry/hermes-telemetry.test.ts` —
    6 unit tests: ENOENT-on-everything case validates against the
    schema, JSON-parse path for sessions/cron/memory/skills, watchdog
    log severity bucketing, backup-history `git log` parsing, cache
    hit, per-instance cache isolation. Coverage: 95.17% lines on the
    new repository module.
  - `backend/vitest.config.ts` — telemetry repository added to the
    coverage gate's `include` list (ratchet).
  - `web/src/lib/api.ts` — typed surface for the new endpoint:
    `HermesTelemetrySnapshot` + sub-types + `api.getHermesTelemetry`.

What's NOT in this slice
  - UI consumption. The Task Ledger / Agents / History panes still
    render mock data; converting them is queued for the next slices.
    This slice ships the contract + the backend so those slices can
    build on a stable shape.
  - Backward-compat replacement of `/api/hermes/ops` (which is
    unauthenticated today). That comes with the Phase 7 auth pass.

Verified: backend typecheck , 57/57 unit tests , web typecheck ,
lint 0 errors, coverage gate ≥95% lines on every gated file.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-30 07:53:37 +00:00
Hermes VM
3ee4e7104e fix(dashboard): Phase 5 P0 — correct CI workspace path + real ESLint
- ci.yml: actions/checkout into the runner workspace instead of cd-ing into a
  hard-coded host path and `git reset --hard origin/main` on the live checkout;
  install via `pnpm install:gitea` (self-contained, no sibling common-plat
  checkout); E2E step left as a TODO pointer (ci-e2e-hardening, Phase 5 P2).
- Fix the same stale /opt/bytelyst/bytelyst-devops-tools path in deploy.sh,
  scripts/deploy-hotcopy.sh, DEPLOYMENT.md, DEPLOYMENT_GUIDE.md.
- Replace the no-op `lint` echoes with real ESLint 9 flat configs (js +
  typescript-eslint recommended) for backend and web; add a root `pnpm lint`.
- Fix the 10 errors lint surfaced, incl. require('os') in an ESM backend
  (system/repository.ts -> import * as os), prefer-const x4, and a ternary
  expression-statement in web vm/page.tsx.

Verified locally: secret-scan, lint (0 errors; correctly fails on bad code),
typecheck, unit tests (backend 9 / web 11), and build all green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 06:50:32 +00:00
Hermes VM
42c3b9cdd5 feat(dashboard/vm): Phase 3.3 — All Containers panel with CPU/RAM, logs, bulk restart
- repository.ts: getAllContainers() — batch docker inspect + docker stats
  --no-stream merged by container name; returns state, health, uptime,
  CPU%, RAM, memLimitMiB (0=no limit), restart count, stack from compose
  label; getContainerLogs() — docker logs --tail --timestamps
- routes.ts: GET /api/vm/containers (all, with stats; ~3s for 38
  containers), GET /api/vm/containers/:name/logs?lines=N
- api.ts: ContainerInfo interface; vmApi.getAllContainers(),
  vmApi.getContainerLogs()
- vm/page.tsx: ContainersPanel — collapsible (lazy-loads on first open);
  filter chips (All/Running/Unhealthy/No Limit) + stack dropdown;
  per-row log viewer (inline pre, dark bg, 50-line tail); per-row
  restart button; bulk "Restart N unhealthy" with confirmation modal;
  Fragment key pattern for row+log-row pairs

I/O anomaly (Phase 0.3) root cause identified: invttrdg-backend and
trading-backend write bot_state.json + .bak on every market tick
(5×/min and 2×/min respectively) into container overlay layer →
~6 GB/day — intentional bot behaviour, no fix needed, trend chart
already in place to monitor.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 05:26:49 +00:00
Hermes VM
8d32cb7980 feat(dashboard/vm): Phases 4.1-4.3 — Prometheus trends, sparklines, weekly digest
- prometheus.ts: new Prometheus client with 7d/30d range queries for disk,
  memory, swap, CPU steal, and disk I/O (GB/hr); getWeeklyDigestData()
  aggregates all metrics for digest and API endpoint
- routes.ts: GET /api/vm/metrics/trend?metric=…&range=… and
  GET /api/vm/weekly-digest endpoints
- api.ts: TrendPoint/TrendSeries types; getTrend() and getMemoryTrend()
  added to vmApi
- vm/page.tsx: Sparkline (pure SVG polyline+fill), TrendCard with
  latest/avg/peak and threshold colouring, TrendsPanel with lazy load
  on first open; Promise.allSettled() isolation for all 5 data panels
- vm-weekly-digest.sh: weekly Telegram digest via docker exec into
  devops-backend to reach Prometheus; emoji severity indicators; cron
  summary from /var/log/vm-cleanup.log
- systemd timer: Mon 08:00 UTC, Persistent=true (fires on next boot
  if missed); first trigger 2026-06-02

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 05:26:49 +00:00
Hermes VM
7047d625ef feat(dashboard/vm): Phases 1.1, 1.3, 3.1, 3.4 — VM page panels
Phase 3.1 — VM Score Card (0–100):
- 6 weighted dimensions: steal time, RAM, disk, service health,
  maintenance hygiene, LLM readiness (matching roadmap scoring)
- Color-coded gauge + per-dimension progress bars with detail text
- Computed from health + cron + unhealthy data; degrades gracefully
  when any source is unavailable

Phase 1.3 — Unhealthy Container Detail Panel:
- Loads independently from GET /api/vm/containers/unhealthy
- Per-container: name, unhealthy since, restart count, last health logs
- Expandable row for health check output
- One-click restart with spinner, feedback toast, auto-refresh after 3s

Phase 1.1 — Cron Status Panel:
- Loads from GET /api/vm/cron-status
- Table: 4 managed jobs × schedule | last run | freed MB | status | next
- Collapsible run history (last 10) with step-by-step log expansion

Phase 3.4 — Ollama/LLM Panel:
- Loads from GET /api/vm/ollama/models
- Currently-loaded section with RAM pressure warning (<4 GB free)
- RAM bar visualisation showing model footprint
- Model list with size + last-used time
- One-click unload button

Other improvements:
- All data fetched in parallel (Promise.allSettled) — any panel failure
  does not block the rest of the page
- Add steal, failed_units, cron_missing_paths to CHECK_META/CHECK_ORDER
- Refresh now updates all 5 data sources atomically
- web/package-lock.json regenerated (was stale, caused build failure)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 21:49:23 +00:00
90f6db2014 Complete Hermes ops dashboard and roadmap 2026-05-27 20:53:58 +00:00
Hermes VM
1099d518ef improve: dashboard security, code quality, and UX fixes
Security (backend):
- env/routes: add requireAdmin to all 6 env endpoints — GET /env was
  fully open, exposing all secret values to unauthenticated requests
- deployments/routes: add requireAdmin to all 4 GET endpoints (deployment
  history and logs were publicly readable)
- health/routes: remove duplicate requireAdmin call from DELETE /health/cache
  handler body (was already enforced via preHandler)

Frontend — auth/api:
- system/page: replace raw fetch + localStorage token with apiRequest
  (mutations now go through CSRF flow)
- vm/page: same — replace raw fetch with vmApi from api.ts
- api.ts: add vmApi (getHealth, getCleanupLog, runCleanup) + shared
  VmHealthResult / VmCheck / VmCheckLevel types

Shared utilities:
- utils.ts: add formatBytes() and getStatusColor() shared helpers
- system/page: remove duplicate formatBytes, import from utils
- health/page: remove duplicate getStatusColor, import from utils
- page.tsx (home): remove duplicate getStatusColor, import from utils

UX improvements:
- page.tsx: remove Seed Services button from normal header (debug tool)
- page.tsx: deploy button now always enabled; shows inline warning banner
  when service is not 'up' instead of silently disabling the button
- metrics: fix bar chart — bars now grow from bottom (flex-col-reverse),
  add empty state, fix date parsing timezone edge case
- sidebar-nav: theme toggle now functional — persists to localStorage and
  toggles document.documentElement class 'dark'

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 18:53:20 +00:00
root
0e6528b366 Add live Hermes operations dashboard 2026-05-27 13:04:36 +00:00
root
dea1546d9f feat: add Hermes mission control dashboard 2026-05-26 08:27:59 +00:00
root
85f21ae9f6 feat(devops): restore dashboard build and log polling 2026-05-18 09:00:56 +00:00
root
b35de88b08 feat(devops-web): fix responsive layout and add comprehensive dashboard pages
- Fix sidebar layout: use flexbox instead of margin-left approach
- Update sidebar to use responsive display (hidden on mobile, static on desktop)
- Fix mobile overlay z-index and positioning issues
- Add proper flex container structure to all pages
- Add new dashboard pages: health, metrics, system, env, code-quality, settings/cosmos
- Add comprehensive API client and type definitions
- Add error boundary and log viewer components
- Add test infrastructure with Vitest and Playwright
- Add Docker configuration and deployment scripts

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-11 03:10:31 +00:00