From eaaa545e6c4d87acbd1ff4d4259a94c9a2bee341 Mon Sep 17 00:00:00 2001
From: Hermes VM
Date: Sat, 30 May 2026 08:26:26 +0000
Subject: [PATCH] feat(dashboard): close Phase 6 (trend cards + theme toggle),
 drop-root scaffold, Agents inventory, Phase 0 reconfirm
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes the remaining tractable items from the carry-forward queue.

1. Drop-root scaffold for the backend container (P2 mitigation)

`backend/Dockerfile` adds a non-root `app` user (uid 1001) and a `docker`
group whose GID comes from the `DOCKER_GID` build arg (default 999). The
`BACKEND_USER` build arg defaults to `root`, so existing deployments keep
working; set it to `app` and pass
`DOCKER_GID=$(getent group docker | cut -d: -f3)` to run the backend as
non-root.

`dashboard/DEPLOYMENT.md` gains a "Running non-root" section with the
exact `chgrp`/`chmod` recipe for the bind-mounted log files (the host-side
prep that must accompany the build-arg flip). The DEPLOYMENT.md mitigation
roadmap is updated accordingly.

2. Phase 6 trend cards

`lib/hermes-ops-history.ts` keeps the last 24 ops snapshots in
localStorage, de-duplicated on `generatedAt` and schema-checked on read;
it degrades silently when the storage quota is exceeded.

Three trend cards in the ops panel:
- Warning-volume sparkline + current count
- Healthy-instance count sparkline (X/2)
- Per-instance "minutes since last backup commit", flagged stale after 30m

The sparklines are plain SVG polylines with no chart library; the line
uses `vector-effect: non-scaling-stroke` so it stays 2px wide regardless
of the parent's width.

3. Phase 6 theme toggle

`components/theme-toggle.tsx` adds a Sun/Moon button, mounted in the
Hermes layout next to the instance switcher, that persists the choice in
localStorage under `bytelyst.theme.v1`. The design system already defines
`[data-theme="light"]` overrides in `styles/tokens.css`; the toggle only
sets the attribute. An inline FOUC-prevention script in the root layout
reads the same key BEFORE React hydrates, so the first paint matches the
user's last choice.

4. Phase 3 partial close: Agents pane → telemetry inventory

`/hermes/agents` now renders a "Memory & Skills inventory (live)"
SectionCard backed by the per-instance Phase 3 telemetry endpoint. The
output of `hermes memory list` and `hermes skills list` is shown with
per-section probe-status badges (`up`/`unknown`), item counts, and the
first N entries of each.

Agent **health** statuses (latency, failure rate, last success/failure)
remain seed data: observing those needs a separate ingestion contract
that the telemetry endpoint does not provide today.

5. Phase 0 reconfirmation

Roadmap Phase 0 is ticked, with explicit verification notes for each
guardrail (no public listener, manual approvals, secret hygiene, Caddy
review). Phase 0 remains "must hold throughout": the ticks record today's
verified state, not a one-time completion.

Verified: backend typecheck ✅, 74/74 backend unit tests ✅, web
typecheck ✅, 7/7 E2E ✅, lint 0 errors, build green, coverage gate
(≥95% lines) met on every gated file.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
---
 dashboard/DEPLOYMENT.md                       |  43 +++++++-
 dashboard/backend/Dockerfile                  |  25 ++++-
 dashboard/web/src/app/hermes/agents/page.tsx  | 102 +++++++++++++++++-
 dashboard/web/src/app/hermes/layout.tsx       |  10 +-
 dashboard/web/src/app/layout.tsx              |  15 +++
 .../web/src/components/hermes-ops-panel.tsx   |  94 ++++++++++++++++
 dashboard/web/src/components/theme-toggle.tsx |  67 ++++++++++++
 dashboard/web/src/lib/hermes-ops-history.ts   |  93 ++++++++++++++++
 docs/hermes_dashboard_v2_roadmap.md           |  20 ++--
 9 files changed, 451 insertions(+), 18 deletions(-)
 create mode 100644 dashboard/web/src/components/theme-toggle.tsx
 create mode 100644 dashboard/web/src/lib/hermes-ops-history.ts

diff --git a/dashboard/DEPLOYMENT.md b/dashboard/DEPLOYMENT.md
index d59a025..07799a3 100644
--- a/dashboard/DEPLOYMENT.md
+++ b/dashboard/DEPLOYMENT.md
@@ -452,12 +452,51 @@ constructs `docker ...` shell strings directly with `execAsync`.
 entity id (`docker-cleanup:` etc.), and a sanitized details payload.
 Audit writes are best-effort — a Cosmos hiccup logs a warn but never
 fails the request.)*
-- [ ] **P2:** Run the backend container as a non-root user with `docker` group
- membership; rebuild the Dockerfile accordingly.
+- [x] **P2:** Run the backend container as a non-root user with `docker` group
+ membership; rebuild the Dockerfile accordingly. *(Dockerfile scaffolds
+ a non-root `app` user (uid 1001) with `docker` group membership at a
+ build-arg-configurable GID. Default `BACKEND_USER=root` preserves the
+ current behaviour so existing deployments don't break; set
+ `BACKEND_USER=app` and `DOCKER_GID=$(getent group docker | cut -d: -f3)`
+ to flip it on. Requires host-side prep on the bind-mounted log files —
+ see "Running non-root" below for the exact `chmod`/`chgrp` recipe.)*
 - [ ] **P3:** Move from `docker.sock` to a thin daemon (`docker-proxy`-style)
 that exposes only the verbs the dashboard actually needs (`stats`,
 `restart`, `logs`, the four `prune` variants).
 
+### Running non-root
+
+Concrete recipe to flip the backend off root:
+
+```bash
+# 1. Find the host's docker group GID
+DOCKER_GID=$(getent group docker | cut -d: -f3)
+
+# 2. Make the bind-mounted log files group-owned by docker and group-writable
+#    so the in-container `app` user (gid=$DOCKER_GID) can read/write them.
+sudo chgrp docker /var/log/vm-cleanup.log /var/log/vm-health-check.log /var/log/docker-watchdog.log
+sudo chmod g+rw /var/log/vm-cleanup.log /var/log/vm-health-check.log /var/log/docker-watchdog.log
+
+# 3. Make the VM scripts tree world-readable (it is mounted read-only inside
+#    the container, so read access plus directory traversal is enough).
+sudo chmod -R o+rX /opt/bytelyst/learning_ai_devops_tools/scripts
+
+# 4. Rebuild the backend image with BACKEND_USER=app and the host's GID.
+cd /opt/bytelyst/learning_ai_devops_tools/dashboard
+docker compose build --build-arg BACKEND_USER=app --build-arg DOCKER_GID=$DOCKER_GID backend
+
+# 5. Restart and verify
+docker compose up -d backend
+docker exec devops-backend whoami  # → app
+docker exec devops-backend id      # uid=1001(app) gid=$DOCKER_GID(docker)
+curl -fsS http://localhost:4004/health
+```
+
+If the backend can't reach the docker socket after the flip, double-check
+that the in-container gid (from `id`) matches `getent group docker` on the host. The
+`docker.sock` bind-mount carries its host ownership into the container,
+so the in-container gid must match.
+
 Operators reviewing whether to grant a new admin should read this whole
 section before doing so. Adding a new shell-out path in code is a
 **privilege change** and must update this table in the same commit.
diff --git a/dashboard/backend/Dockerfile b/dashboard/backend/Dockerfile
index c942bb8..8ce250d 100644
--- a/dashboard/backend/Dockerfile
+++ b/dashboard/backend/Dockerfile
@@ -47,11 +47,34 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
     curl bash docker.io python3 \
     && rm -rf /var/lib/apt/lists/*
 
-COPY --from=builder /app/backend/dist ./dist
+# Non-root user setup (Phase 5 P2 mitigation roadmap, item #4).
+# The backend doesn't strictly need root — its only privileged action is
+# talking to the docker daemon, which group membership covers. We create
+# the user + a docker group at a build-arg-configurable GID so the GID
+# can match the host's docker group (`getent group docker` on the host).
+#
+# Default `BACKEND_USER=root` keeps the current behaviour so existing
+# deployments don't break. Set `BACKEND_USER=app` to run non-root; this
+# requires the bind-mounted log files in `/var/log/vm-*.log` and
+# `/var/log/docker-watchdog.log` to be group-readable+writable by the
+# matching docker GID (or world-readable for read-only paths). See
+# `dashboard/DEPLOYMENT.md` Privilege Surface → "Running non-root".
+ARG BACKEND_USER=root
+ARG DOCKER_GID=999
+RUN groupadd --system --gid "${DOCKER_GID}" docker || true \
+    && useradd --system --create-home --uid 1001 --gid "${DOCKER_GID}" --shell /sbin/nologin app \
+    && chown -R app:"${DOCKER_GID}" /app
+
+COPY --from=builder --chown=app:${DOCKER_GID} /app/backend/dist ./dist
 
 ENV NODE_ENV=production
 ENV PORT=4004
 
 EXPOSE 4004
 
+# Switch to non-root only when explicitly opted in via build arg. If the
+# arg is `app`, this USER instruction drops privileges for the CMD below;
+# if it is `root`, the instruction is a no-op.
+USER ${BACKEND_USER}
+
 CMD ["node", "dist/server.js"]
diff --git a/dashboard/web/src/app/hermes/agents/page.tsx b/dashboard/web/src/app/hermes/agents/page.tsx
index e332062..ee47e28 100644
--- a/dashboard/web/src/app/hermes/agents/page.tsx
+++ b/dashboard/web/src/app/hermes/agents/page.tsx
@@ -1,13 +1,14 @@
 'use client';
 
 import Link from 'next/link';
-import { ArrowLeft, Gauge, ShieldAlert, ServerCog } from 'lucide-react';
+import { ArrowLeft, Brain, Gauge, ShieldAlert, ServerCog, Wand2 } from 'lucide-react';
 import { Badge, Button } from '@/components/ui/Primitives';
-import { useMemo } from 'react';
+import { useEffect, useMemo, useState } from 'react';
 import { HermesShell, MetricCard, SectionCard } from '@/components/hermes-shell';
 import { HermesInstanceBadge } from '@/components/hermes-instance-switcher';
 import { useHermesInstance } from '@/lib/hermes-instance-context';
-import { getHermesAgents } from '@/lib/hermes';
+import { getHermesAgents, HERMES_INSTANCES, type HermesInstanceId } from '@/lib/hermes';
+import { api, type HermesTelemetrySnapshot } from '@/lib/api';
 
 export default function HermesAgentsPage() {
   const { selectedInstance } = useHermesInstance();
@@ -16,6 +17,33 @@ export default function HermesAgentsPage() {
   const degraded = agents.filter((agent) => agent.status === 'degraded').length;
   const offline = agents.filter((agent) => agent.status === 'offline').length;
 
+  // Real per-instance memory + skills inventory from the Phase 3 telemetry
+  // endpoint. The agent statuses above remain seed-data (status observability
+  // needs a separate ingestion contract); the inventory below is genuine
+  // when the `hermes` CLI is reachable, status:'unknown' otherwise.
+  const [telemetry, setTelemetry] = useState<Record<HermesInstanceId, HermesTelemetrySnapshot | null>>({ vijay: null, bheem: null });
+  const [telemetryError, setTelemetryError] = useState<string | null>(null);
+
+  useEffect(() => {
+    const controller = new AbortController();
+    const load = async () => {
+      try {
+        const [vijay, bheem] = await Promise.all([
+          api.getHermesTelemetry('vijay'),
+          api.getHermesTelemetry('bheem'),
+        ]);
+        if (controller.signal.aborted) return;
+        setTelemetry({ vijay, bheem });
+        setTelemetryError(null);
+      } catch (err) {
+        if (controller.signal.aborted) return;
+        setTelemetryError(err instanceof Error ? err.message : String(err));
+      }
+    };
+    void load();
+    return () => controller.abort();
+  }, []);
+
   return (
+ {/* --- Real memory + skills inventory from /api/hermes/telemetry --- */}
+ {telemetryError ? 'Probe failed' : 'Live data'}}
+ >
+ {telemetryError ? (
+

+ Could not load telemetry: {telemetryError} +

+ ) : ( +
+ {HERMES_INSTANCES.filter((inst) => selectedInstance === 'all' || selectedInstance === inst.id).map((inst) => { + const snapshot = telemetry[inst.id]; + const memory = snapshot?.memory; + const skills = snapshot?.skills; + return ( +
+
+
+

{inst.label}

+

{inst.description}

+
+ +
+ +
+
+
+ Memory items + + {snapshot ? `${memory?.items.length ?? 0} · ${memory?.status ?? 'loading'}` : 'loading'} + +
+ {memory && memory.items.length > 0 ? ( +
    + {memory.items.slice(0, 5).map((m) => ( +
  • {m.type}: {m.key} — {m.summary}
  • + ))} + {memory.items.length > 5 ?
  • + {memory.items.length - 5} more
  • : null} +
+ ) : null} +
+ +
+
+ Skills + + {snapshot ? `${skills?.items.length ?? 0} · ${skills?.status ?? 'loading'}` : 'loading'} + +
+ {skills && skills.items.length > 0 ? ( +
    + {skills.items.slice(0, 8).map((s) => ( +
  • {s.enabled ? 'on' : 'off'} {s.name}
  • + ))} + {skills.items.length > 8 ?
  • + {skills.items.length - 8} more
  • : null} +
+ ) : null} +
+
+
+ ); + })} +
+ )} + +
 {['Hermes core', 'GitHub integration', 'Local VM runner', 'CLI runner', 'Scheduler / cron', 'Deployment tools', 'Monitoring tools', 'Notification tools', 'Model / LLM provider', 'Secrets / config health', 'OpenClaw integration placeholder', 'Telemetry ingest'].map((item) => (
diff --git a/dashboard/web/src/app/hermes/layout.tsx b/dashboard/web/src/app/hermes/layout.tsx
index 2c0b63d..fe5a4cf 100644
--- a/dashboard/web/src/app/hermes/layout.tsx
+++ b/dashboard/web/src/app/hermes/layout.tsx
@@ -2,6 +2,7 @@
 
 import { SidebarNav } from '@/components/sidebar-nav';
 import { HermesInstanceSwitcher } from '@/components/hermes-instance-switcher';
+import { ThemeToggle } from '@/components/theme-toggle';
 import { HermesInstanceProvider } from '@/lib/hermes-instance-context';
 
 export default function HermesLayout({ children }: { children: React.ReactNode }) {
@@ -11,11 +12,12 @@ export default function HermesLayout({ children }: { children: React.ReactNode }
- {/* Global instance switcher — every Mission Control pane reads - from the same `useHermesInstance()` hook, so this filter - propagates everywhere. */} -
+ {/* Global instance switcher + theme toggle — every Mission + Control pane reads from the same hooks so these controls + propagate everywhere. */} +
+
{children}
diff --git a/dashboard/web/src/app/layout.tsx b/dashboard/web/src/app/layout.tsx index 6c71863..ebc2f27 100644 --- a/dashboard/web/src/app/layout.tsx +++ b/dashboard/web/src/app/layout.tsx @@ -28,6 +28,21 @@ export default function RootLayout({ }>) { return ( + + {/* + FOUC guard: apply the persisted theme to BEFORE React + hydrates so the first paint matches the user's last choice. + Mirrors `STORAGE_KEY` and the allowed values in + `components/theme-toggle.tsx`. Inline-string is intentional; no + interpolation, no data exfil — just two literal strings. + */} +