- Short-TTL (30s) snapshot cache + in-flight coalescing so the panel poll and
concurrent refreshes don't fan out ~20 systemctl/git/ps/du subprocesses each
time; snapshot carries a `cached` flag and `getHermesOpsSnapshot({force})`.
- Distinguish "unit inactive" (down) from "probe couldn't run" (unknown): a new
exec() wrapper reports whether the command actually ran (ENOENT/timeout =
unknown) vs exited non-zero with output (e.g. systemctl is-active -> inactive).
Per-field ProbeStatus on gateway/dashboard/timer/repo; warnings differentiate
"is not active" from "status could not be determined".
- Robust Bheem/Uma checks: `runuser -u uma -- systemctl --user is-active/
is-enabled` with a ps / existsSync fallback so a failed probe degrades to the
legacy check instead of a false "down".
- Zod schema (HermesOpsSnapshotSchema) as the stable typed contract; the route
validates output before sending. New status fields are additive (active/
enabled/url/etc. preserved) so the existing web client is unaffected.
- Unit tests (mock execFile/fs): healthy snapshot, down vs unknown mapping,
runuser->ps fallback, unreadable repo, cache hit + force bypass, request
coalescing. Backend: 16 tests green.
Roadmap: check off Phase 1 items and Phase 5 P0 in hermes_dashboard_v2_roadmap.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>