MVP: when AQ_FLEET_REPO_BASE/<repo> is an existing local checkout, use it as the
clone source (fast, no network) and push/PR to its GitHub origin — embedded creds
in the local origin URL are stripped (gh credential helper handles auth). Selftest
PASS (full-path bare-repo fallback unchanged).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
When AQ_FLEET_PR=1 and a claimed fleet job carries a `repo`, run the agent in an
isolated checkout on branch aq/job/<fleetJobId> (off baseBranch), then on a passing
verify commit/push and `gh pr create`. The PR URL + branch are recorded in the meta
and reported on lease release (-> the coordinator stores them on the run).
- fleet-client: parse repo/baseBranch from the claim, carry them in frontmatter;
fleet_report_insights now sends prUrl/branch.
- _fleet_pr_prepare (clone/fetch + branch, local-path aware, identity fallback) and
_fleet_pr_open (commit/push/gh pr create). WIP checkpointing is skipped for PR jobs
(the pushed branch is the durable artifact).
- New flags: AQ_FLEET_PR, AQ_FLEET_REPOS_DIR, GH_BIN. README documented.
- selftest: +1 case (bare-repo origin + gh stub) — branch pushed, PR opened, prUrl
reported on release. Full self-test PASS.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Previously the factory reported up to `review` and "shipping is always manual",
so a coordinator job never reached a terminal stage autonomously.
- On a passing local verify, always report `testing` to the coordinator so its
stage reflects that QA passed (was stuck at `review`).
- New AQ_FLEET_AUTOSHIP=1: the factory's verify gate IS the test phase, so advance
the coordinator job testing -> shipped and land it in shipped/ locally. This
closes the testing->shipped gap for an autonomous submit -> shipped pipeline.
Default off keeps the human review gate authoritative (job rests at testing).
selftest: +2 cases (autoship reports testing+shipped + lands in shipped/; autoship
OFF reports testing but withholds shipped). Full self-test PASS.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
7-doc kit mapping the JD competency matrix to the ByteLyst ecosystem:
ecosystem-as-RAG-fabric architecture, competency deep-dives, STAR bank,
enhancement roadmap, banking blueprints, and a glossary quick-ref.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Devin does not surface token/cost in its stdout or local log, so parse_usage
previously emitted nothing for the devin engine (runs showed no metrics). Devin
DOES expose per-step usage in its ATIF conversation export.
- build_agent_cmd: pass `--export <path>` for the devin engine (path derived from
the job log path so parse_usage can find it; harmless 4th arg for other engines).
- parse_usage devin: read the export and sum per-step metadata.metrics
input_tokens / output_tokens / cache_read_tokens; take model from agent.model_name.
Pure grep/awk, no new dependency. USD cost is left unset (the export carries token
counts but not cost) — the dashboard shows tokens + model, cost stays blank.
These feed fleet_report_insights, so live devin fleet runs now report tokens +
model to the coordinator (verified live: model "Claude Opus 4.8", tokensIn/out +
cache populated on a real run).
selftest: +1 case (parse_usage devin sums per-step tokens + model from --export).
Full self-test PASS.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
#1 fleet_report_insights: on a successful fleet run the factory now reports the
parsed cost/token/effort metrics (model, tokensIn/Out/cached, costUsd, turns,
toolCalls) plus the run result onto the coordinator run via POST
.../lease/release (which also frees the lease). parse_usage already extracted
these into the job meta; they were never sent. Engines that do not expose usage
locally (devin) still land result + endedAt.
#2 normalize AQ_FLEET_API: platform-service mounts fleet under /api, so a base
without it silently returned 404 on every call. Strip a trailing slash and
append /api unless already present, so AQ_FLEET_API=http://host:4003 works too.
selftest: +2 cases (insights reported via lease/release; API-base normalization).
Full self-test PASS.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Document a phased roadmap for the single-VM deployment layer (build-off-VM,
recreate-in-place to cut downtime, change-detection + BuildKit guarantee,
image slimming + resource caps, artifact-based rollback). Scoped to deploy
orchestration; defers image-build internals to docker-build-optimization-roadmap.
Register the doc in repo-map.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Document the two ways @bytelyst/* packages resolve (local workspace links
vs Gitea npm registry for Docker/CI), the common 'registry offline' local-dev
failure and its fix (sibling directory layout, not a token), and the
deploy-side 'package not published' / token issues with remediation.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Document how the daemon + agents must run after a review found jobs executing
in --yolo/dangerous mode directly against live working trees (the root cause of
repo dirtiness + duplicate commits). Policy: per-job worktree off origin/main,
branch-per-task + PR, yolo:false by default (dangerous only in disposable
sandboxes), clean-tree contract, one writer per repo. Linked from the README.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Two bugs caused duplicate items on re-run: the dedupe list used limit=500
(server caps at 100 -> 400 -> silent empty set -> dupes), and meta productIds
weren't registered so GET /items 400'd ("Unknown product"). Now registers every
referenced product first (idempotent) and lists with limit=100; dedupe failures
are logged loudly. Verified idempotent: re-run skips all 16.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
deploy-gigafactory.sh loads platform-service/.env, starts the fleet backend,
waits for /health, and registers the ecosystem products (idempotent) so live
/api/fleet/* calls resolve. Supports --stop / --register-only / --no-register.
Registered the 11 ecosystem products against the configured Cosmos during a
live run; note fleet metrics needs a composite index on real Azure Cosmos.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Add a Work Tracking entry to README Primary Entry Points and a short pointer
in CLAUDE.md, both routing to scripts/tracker-seed/ and the AGENTS.md section.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Add a "Cutting Tracker Items" section to AGENTS.md and register
scripts/tracker-seed/ in docs/repo-map.md so future "cut items to track"
requests route to the seed tooling instead of ad-hoc API calls.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Rename agent-queue/docs/gigafactory/ to docs/GIGAFACTORY/ and update every
reference (README, system-overview code-map, and all phase job specs). Add an
index README that lists the docs and points to the companion docs in
learning_ai_common_plat. Docs-only; no behavior change.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Files the ENGINEERING_REVIEW_SCORECARD.md P0-P3 action plan as tracker items
(one per affected product) via the platform-service POST /api/items API.
Dependency-free Node seeder mints an HS256 token from $JWT_SECRET, dedupes by
title, and supports --dry-run. No live writes performed (stack is down); run
the script once the platform stack is up.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Move GIGAFACTORY_ROADMAP.md and GIGAFACTORY_SYSTEM_OVERVIEW.md under
agent-queue/docs/gigafactory/ so the scattered top-level docs are easy to
discover. Update the README links, the overview code-map, and all phase
job-spec source-of-truth paths to the new location. Pure docs move; no
behavior change.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Jobs move through .state/inbox/building/testing/review/failed/shipped/logs at
runtime, which constantly dirtied the repo and blocked clean rebases. Ignore
the per-job lifecycle files (keeping each dir via .gitkeep) and stop tracking
the consumed inbox job instances.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Reusable evidence-based review prompt covering repos, code, architecture,
DevOps, testing, security, product-readiness, and AI-agent practices, with
a 1-10 scorecard and prioritized action plan output.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Add GIGAFACTORY_SYSTEM_OVERVIEW.md — a current-state companion to the roadmap
spec covering: what the Agent Gigafactory is, a completion snapshot, three
Mermaid diagrams (component architecture, job-lifecycle state machine, atomic
claim + lease-fencing sequence), the Cosmos data model, the scoring router,
subsystem map, full /fleet REST surface, feature flags, the two control planes,
a cross-repo code map, test coverage, next steps (Phase 4/5), and an honest
bugs/gaps/risks section. All three Mermaid blocks validated with mermaid.parse.
Also correct documentation drift in GIGAFACTORY_ROADMAP.md found during the
review:
- §0 progress table showed Phase 3 as "0% not started" while every Phase-3 box
is ticked; updated phases 1-3 to done with realistic percentages.
- Phase-2 boxes "scheduler/router wired into assignment", "tracker adapter
direct call", and "factory enrollment + scoped tokens" are implemented in
common-plat (coordinator.ts uses selectJob; routes.ts enforces
enrollment.enforceFactoryToken; tracker-bridge.ts) but were left unticked —
ticked with evidence and refreshed the stale "remaining for 100%" notes.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add an opt-in fleet mode to the dashboard so an operator can drive the
coordinator fleet from the same TUI used for the local folder queue.
- lib/fleet-dash.mjs: dependency-injectable read/act adapter over the
platform-service /fleet REST surface (jobs, metrics, factories, events,
ship/requeue/reject). Pure-ish + fully unit-testable without a live service.
- dashboard.mjs: render + act in fleet mode when AQ_FLEET_DASH=1 — board with
counts, factories (per-factory rows or metrics aggregate), alerts, running
(by lease/factory), actionable JOBS with manifest tags, recent, and a
per-job events log. Single-flight async refresh keeps the last good board on
failure; ship re-GETs a fresh leaseEpoch before PATCH; run/stop/promote are
disabled (no safe server contract). Local mode is byte-for-byte unchanged.
- lib/fleet-dash.test.mjs: 22 node:assert assertions (config, stage mapping,
toBoard, fetch headers/timeout/errors, board assembly + graceful degradation,
events, job actions) wired into selftest.sh.
- docs: tick the Phase 3 "TUI re-pointed at /fleet" roadmap boxes.
Verified: selftest.sh green (incl. new fleet-dash checks); live non-TTY render
smoke against a stub /fleet server (both factories and metrics-aggregate paths);
local mode unchanged.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Render a per-job tags line on the RUNNING workers and JOBS lists showing the
routing inputs operators care about: priority, profile, capabilities, and the
tracker-item reference. Tags come from the launched meta, falling back to the
job's .md frontmatter for never-launched inbox jobs (new readManifest parser).
The tracker-item becomes a clickable terminal hyperlink when AQ_TRACKER_WEB is
set. Also renders the new budget_exceeded result as a failed RECENT row.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Parse the wall ceiling from the budget manifest map (budget: { wall: <dur> })
and arm it alongside the per-run timeout. Whichever ceiling fires first binds;
the kill is recorded as result=timeout or result=budget_exceeded accordingly.
budget.wall extends timeout: a job with only a budget.wall (no timeout) is now
hard-killed at the ceiling. budget_exceeded is a terminal, non-retryable class
by default and maps to the failed tracker status.
Adds _budget_wall_secs + _effective_kill helpers (pure, unit-tested) and live
selftest coverage; usd/tokens remain best-effort and are not enforced here.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Backend + web rebuilt from the previous session's Dockerfile fixes.
- Phase 1-7 dashboard UI now actually live; the dist/server.js CORS
hot-patch is retired (CORS is env-driven via EXTRA_CORS_ORIGINS).
- Tailscale serve restored: caddy was bound to 0.0.0.0:443 and blocked
tailscaled from claiming 100.87.53.10:443. Fixed via a one-line
compose change in learning_ai_common_plat (commit c0db2901).
- End-to-end login through real Cosmos verified at
https://srv1491630.tailf85608.ts.net/login.
Active-repo sweep results (clock, notes, flowmonk, invt_trdg) and HOLD
repo triage are documented in the checkpoint.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The image rebuilds were broken because `backend/package-lock.json` and
`web/package-lock.json` had been regenerated inside the pnpm workspace
and contained pnpm-store symlinks (e.g. `node_modules/typescript` →
`../node_modules/.pnpm/typescript@5.9.3/...` with `link: true`). When
`npm ci` ran in Docker outside the pnpm workspace, those link targets
didn't exist, so devDeps including TypeScript were silently not
installed — leaving `tsc: not found` at build time.
Fix aligns Docker builds with the declared `packageManager: pnpm@10.6.5`
field:
- Both Dockerfiles now use corepack + pnpm with the workspace
`pnpm-lock.yaml` and `--filter ... --frozen-lockfile`
- Production stage uses `pnpm deploy --prod --legacy` to carve out a
devDep-free node_modules
- Drop the stale `backend/package-lock.json` and
`web/package-lock.json` (they're regenerated wrong every time anyone
runs npm in here)
- Add `pino` + `pino-pretty` to backend deps (used by
`src/lib/logger.ts` from the Phase 5 P1 structured-logging work but
never declared)
- Fix pre-existing bug in backend runtime stage: `docker.io` package
in debian:bookworm-slim pre-creates a `docker` group at GID ~101,
so `groupadd --gid 999` then `useradd --gid 999` failed. Use
`groupmod` when the group already exists.
After this commit:
- 87/87 tests pass (74 backend + 13 web)
- typecheck clean
- lint: 0 errors (only pre-existing unused-var warnings)
- `docker compose build && up` succeeds end-to-end
- Tailscale URL serves the new dashboard with all Phase 1-7 work live
- CORS allow-list driven by `EXTRA_CORS_ORIGINS` env var (no hot-patch
needed in the running container)
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Captures the in-progress state of the long-running v2 dashboard session
so the next session (post `--permission-mode dangerous` relaunch) can
pick up without losing context. The full handoff narrative lives in
`docs/SESSION_CHECKPOINT_2026-05-30.md` — read it first.
Code change:
- `backend/src/server.ts` CORS allow-list is now env-driven via
`EXTRA_CORS_ORIGINS` (comma-separated). Originally added because
the user's browser is hitting the deployed dashboard via a
Tailscale-served hostname (`srv1491630.tailf85608.ts.net`), and
the static built-in list only knew `localhost` + `devops.bytelyst.com`.
Honours `*` as a wildcard for trusted-network deployments. Adds
`Vary: Origin` so caches behave.
- `backend/package-lock.json` regenerated to match `package.json`
(was missing the Phase 5 ESLint deps added earlier this session).
Note: the Dockerfile build is STILL broken with `tsc: not found`
despite typescript being in devDeps — this is a separate
dual-lockfile issue documented in the checkpoint. Untangle on
resume.
Live infra carry-forward summarised in the checkpoint doc:
- Real Azure Cosmos DB (`cosmos-mywisprai` / new `bytelyst` db)
replaces the crash-looping local emulator.
- `learning_ai_common_plat/docker-compose.yml` has uncommitted
changes mirroring this; that repo is 15 commits behind origin/main
and needs a rebase+commit pass separately.
- Hot-patched the running `devops-backend` container's `dist/server.js`
to allow the Tailscale origin (ephemeral; lost on next image build,
superseded by the code change above once rebuild works).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Close the final Phase-2 exit-criteria box: >=2 factories executing jobs in parallel
through one coordinator, proving the concurrency guarantees end-to-end. This is a
DEMO HARNESS over the existing runtime — agent-queue.sh and lib/fleet-client.sh are
unchanged (read + called, not modified).
demo/two-factory-demo.sh: starts two real `agent-queue.sh run` daemons (mac-1 +
ubuntu-1, separate queues/cwds) that compete ONLY through the coordinator, then
asserts: (a) no double-assign — each of 3 jobs executed by exactly one factory;
(b) fencing + reclaim — kill a factory mid-job, the reaper returns its job, the
survivor reclaims + completes it, and the dead worker's late/zombie report (stale
leaseEpoch) is FENCED (HTTP 409, never shipped); (c) parallelism — both factories
hold active jobs concurrently. Dual-mode: CI-safe stateful stub by default; live
platform-service when AQ_FLEET_API/AQ_FLEET_TOKEN set.
demo/coordinator-stub.sh: stateful, mkdir-lock-guarded, file-backed coordinator
implementing claim/lease/fence/renew/release + reaper-reclaim via the existing
AQ_FLEET_API_CMD seam — the selftest stub pattern extended with shared state so
>=2 processes coordinate through one coordinator.
demo/README.md: stub + real invocations, env knobs, what each guarantee proves,
what-to-watch guide.
selftest.sh: +3 headless stub-mode checks (existing 68 unchanged byte-for-byte ->
71 total green).
docs/GIGAFACTORY_ROADMAP.md: tick the §14 two-factory-demo box; annotate Phase-2
exit criteria; bump §0 Phase 2 to 80% (remaining: scheduler-core wiring [common-plat
PR #31], tracker-direct call, factory enrollment).
bash 3.2 + awk/sed/grep/pgrep only; mac+linux safe; no new runtime deps.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Closes the remaining tractable items from the carry-forward queue.
1. Drop-root scaffold for the backend container (P2 mitigation)
`backend/Dockerfile` adds non-root `app` user (uid 1001) + `docker`
group (gid via `DOCKER_GID` build arg, default 999). `BACKEND_USER`
build arg defaults to `root` so existing deployments keep working;
set it to `app` plus `DOCKER_GID=$(getent group docker | cut -d: -f3)`
to flip the runtime non-root. `dashboard/DEPLOYMENT.md` gets a new
"Running non-root" section with the exact `chgrp`/`chmod` recipe
for the bind-mounted log files (the host-side prep that pairs with
the build flip). DEPLOYMENT.md mitigation roadmap updated.
2. Phase 6 trend cards
`lib/hermes-ops-history.ts` keeps the last 24 ops snapshots in
localStorage (de-duped on `generatedAt`, schema-guarded on read,
degrades silently on quota exceeded). Three trend cards in the
ops panel:
- Warning-volume sparkline + current count
- Healthy-instance count sparkline (X/2)
- Per-instance "minutes since last backup commit" with a 30m
stale threshold
SVG polyline sparklines, no chart library — `<svg viewBox="0 0
100 100" preserveAspectRatio="none">` with `vector-effect:
non-scaling-stroke` so the line stays 2px regardless of the
parent's width.
3. Phase 6 theme toggle
`components/theme-toggle.tsx` Sun/Moon button mounted in the
Hermes layout next to the instance switcher. Persists in
localStorage `bytelyst.theme.v1`. The design system already
defined `[data-theme="light"]` overrides in `styles/tokens.css`;
the toggle just sets the attribute. FOUC-prevention inline script
in the root layout reads the same key BEFORE React hydrates so
the first paint matches the user's last choice.
4. Phase 3 partial close: Agents pane → telemetry inventory
`/hermes/agents` now renders a "Memory & Skills inventory (live)"
SectionCard backed by the Phase 3 telemetry endpoint per instance
— `hermes memory list` and `hermes skills list` rendered with
per-section probe-status badges (`up`/`unknown`), item counts,
and the first N entries each. Agent **health** statuses (latency,
failure rate, last-success/failure) stay seed-data — observability
for those needs a separate ingestion contract that the telemetry
endpoint doesn't provide today.
5. Phase 0 reconfirmation
Roadmap Phase 0 ticked with explicit verification notes for each
guardrail (no public listener, manual approvals, secret hygiene,
Caddy review). Remains "must hold throughout" — the ticks reflect
today's verified state, not single-checkbox completion.
Verified: backend typecheck ✅, 74/74 backend unit tests ✅, web
typecheck ✅, 7/7 E2E ✅, lint 0 errors, build green, coverage gate
≥95% lines on every gated file.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Closes the three Phase 5 P2 follow-ups from the DEPLOYMENT.md
mitigation roadmap that don't need infra changes. Two P2 items remain
(non-root container, docker-proxy daemon) — both genuinely need
container/orchestration work and stay queued.
1. Allow-list shell wrapper (P1)
New `lib/shell.ts`:
- `execAllowed(cmd, args, opts)` — `execFile`-only, no shell, no
interpolation. Single escape hatch for ad-hoc invocations.
- `dockerRestart(name)` — name validated against
`[a-zA-Z0-9][a-zA-Z0-9._-]{0,127}`; throws InvalidShellArgError
on anything else (including non-strings, shell metacharacters,
command-substitution attempts). Tests cover all of these.
- `dockerPrune(kind, {all?})` — kind constrained to
{container,image,volume,builder}; `--all` only valid for image.
- `runBashScript(path, args, {allowedRoots})` — script path AND
cwd both checked against allowed roots; rejects `..` escapes
and prefix-matching siblings (`/opt/projects-evil` vs
`/opt/projects`).
- `runNpmScript(script, {cwd, allowedRoots})` — script ∈
{typecheck,lint,build,test,test:run,start}; cwd inside roots.
17 unit tests cover every rejection path. Module added to the
coverage gate (≥95% lines).
Migrated highest-risk callers off template-literal `exec`:
- `vm/repository.ts:restartContainer` → `dockerRestart`. Was
previously `await execAsync(\`docker restart "${name}"\`)`
with only a regex check; now goes through the wrapper.
- `system/repository.ts:dockerCleanup` → `dockerPrune` per kind
+ `execAllowed` for `docker system df`. Drops the array of
template-literal command strings entirely.
- `code-quality/repository.ts` → `runNpmScript` for every
lifecycle invocation. cwd is now the resolved (normalised,
`..`-collapsed) path, not the raw input.
2. projectPath validation for /code-quality/check (P1)
`runCodeQualityCheck` now calls
`assertPathInAllowedRoots(projectPath, getAllowedRoots())` before
any subprocess spawns. `getAllowedRoots()` reads
`CODE_QUALITY_ALLOWED_ROOTS` (colon-separated env, defaults to
`/opt/bytelyst`). Rejection happens with a clear error message
listing the configured roots so operators know what to allow.
3. Audit-log every privileged shell-out (P2)
`audit/types.ts` extended: `action` now includes `'shell-exec'`,
`entityType` includes `'host'`. The migration is additive — old
audit rows still validate.
Three privileged routes now write a `shell-exec` audit row with
actor (authUserId / authRole), entity id, and a sanitized details
payload before responding:
- `POST /docker/cleanup` — `entityId: docker-cleanup:<type>`,
details include {type, force, freedSpace}.
- `POST /vm/cleanup` — `entityId: vm-cleanup:<mode>`.
- `POST /vm/containers/:name/restart` — `entityId:
container-restart:<name>`, details include {success, message}.
Audited even on failure so attempted privileged actions are
still recorded.
Audit writes are best-effort — a Cosmos hiccup logs a warn but
never fails the request the operator was running.
Verified: backend typecheck ✅, 74/74 unit tests ✅ (17 new for
shell.ts + audit changes), 7/7 E2E ✅, lint 0 errors, coverage gate
≥95% lines on every gated file (which now includes shell.ts).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Closes the Phase 8 line that's actually a docs/codebase change. The
other two Phase 8 items are VM-ops work (bot tokens + watchdog
extensions) and live as a delegation brief.
What's in this repo
- `docs/hermes-operations.md` gains a "Telegram Notification
Convention" section codifying:
* routing per instance (Vijay → root chat, Bheem → Uma chat,
cross-cutting → root)
* silent-on-healthy + post-on-recovery
* the numbered-emoji progress convention (`1️⃣`, `2️⃣`, …) and
why it survives Telegram client rendering
* approval-prompt UI expectation
* "don't paste secrets" pointer back to `lib/logger.ts`'s
redaction path-list
- `docs/prompts/phase8-telegram-loop.md` — full delegation brief
for the VM-side implementation. Design: dashboard backend writes
new warnings (with `instance=<id>` tag, deduped over 1h) to an
append-only log; both watchdogs tail it and route through the
existing Telegram delivery path. Avoids splitting the delivery
code into two places that would each need rate-limit + token-
rotation handling. Brief is gated on Phase 4 — Uma's watchdog
must exist first.
- Roadmap Phase 8 ticked for "preserve numbered-emoji convention"
(codified in operations doc); the other two items have notes
pointing at the brief.
Phase 8 doesn't fully close in this repo because the delivery loop
needs real bot tokens and the Phase 4 Uma watchdog before it can be
end-to-end validated. The codebase's contribution is everything that
doesn't need a token: the convention, the design, and the delegation
brief.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Closes Phase 6 (the items that don't need a backend change). Three
threads, all on the Hermes Mission Control overview:
1. Severity-tagged alerts on the ops panel
New `RecentAlerts` component classifies each `recentAlerts` string
into critical / warn / info by leading token (CRITICAL/ERROR/FATAL
→ critical; INFO/OK → info; default → warn — most ops alerts are
warnings) and renders a colour-coded badge per alert. A
per-severity radiogroup filter sits in the panel header with live
counts. Pure UI — no backend contract change. The watchdog log
tailer in `hermes-telemetry/repository.ts` already emits structured
severities for the future migration off of leading-token parsing.
2. Per-instance action row on each `InstanceCard`
Adds three buttons next to "Open dashboard" / "Copy URL":
- "Copy SSH command": Tailscale-scoped only — never raw `ssh` —
and per-instance user (`tailscale ssh root@<ts-ip>` for Vijay,
`tailscale ssh uma@<ts-ip>` for Bheem). Disabled when the
snapshot has no Tailscale IP.
- "View tasks": deep link into the Task Ledger pre-filtered by
instance via `/hermes/tasks?instance=<id>`.
- "Open runbook": link to `docs/hermes-operations.md`.
"How to restart this gateway" is intentionally a runbook link, not
a button — restarting is privileged and should go through the
documented procedure, not the dashboard UI.
3. URL-param hydration of the instance switcher
`HermesInstanceProvider` now reads `?instance=` from the URL on
mount (and on subsequent navigations to a different value). The
URL value wins over the persisted localStorage selection so deep
links from the ops panel land on a pre-filtered pane. The param
is intentionally not auto-stripped — back/forward and copy-paste
stay meaningful.
Roadmap status: Phase 6 ticked except trend cards (deferred — needs
client-side history persistence) and theme toggle (deferred — shell
doesn't expose a switch primitive yet). Unified-alerts-feed bullet
partially achieved by the new severity filter; the per-instance roll-up
will land when a UI consumer is built for the Phase 3 telemetry
endpoint.
Verified: typecheck ✅, build ✅, 7/7 E2E ✅ (the existing switcher
test exercises the new context code path; URL hydration is covered
indirectly by the deep-link button → Task Ledger pre-filter).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Two threads, one commit because they're both about closing dashboard-
side roadmap items that don't need their own slice.
Phase 7 — auth coverage on hermes routes:
- `/api/hermes/ops` was the last unauthenticated Hermes endpoint —
despite revealing instance / gateway / Tailscale-IP / backup-repo /
warnings state. Now gated on `requireAdmin`, matching the new
`/api/hermes/telemetry/:instance` from the previous slice and
every other privileged route in this backend.
- Privilege-surface table in `dashboard/DEPLOYMENT.md` updated to
show `requireAdmin` for both Hermes routes; the previous
"no auth, read-only ops snapshot" carve-out is gone.
- Roadmap Phase 7 ticks for "require auth on hermes routes" + "keep
hermes data private-only" with verification notes.
Phase 4 — Bheem/Uma parity (delegation brief):
- Phase 4 is **VM ops, not codebase work** — it requires sudo on the
Hostinger VM, Uma-owned GitHub credentials, and Telegram bot
tokens. None of it is editable in this repo. Wrote
`docs/prompts/phase4-bheem-uma-parity.md` as a self-contained
delegation brief covering: Uma persistent-backup repo + timer,
Uma health watchdog, first restore rehearsal, quarterly drill
reminder, and the dashboard-side verification (the /hermes/ops +
/hermes/telemetry/bheem outputs that confirm the gap is closed).
- Phase 4 section header in the roadmap now points at the brief
and explains why the checkboxes stay open in this repo.
Verified: backend 57/57 unit tests ✅, web 7/7 E2E ✅ (Playwright
mocks bypass requireAdmin since they fulfill before the request
reaches Fastify; real auth'd users get the same flow as every other
admin route). Lint 0 errors, build green.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Closes the "drop the fabricated 50-item mock" Phase 3 line. The Mission
Control Products pane now renders the **real** deployment registry as
its primary view, sourced from `backend/src/modules/services` (the
Cosmos-backed service registry) joined with the health module.
Page layout:
- Top "Live services" SectionCard: real services from
`api.getServices()` joined with `api.getHealth()`. Per-card: status
(up / degraded / down derived from the most recent health probe),
version, health URL, repo path, last deploy, last health check,
response time. Refresh button (busts the 30s health cache via
`clearHealthCache`). Loading / empty / error states. Health-check
poll loop is intentionally not added on this page — the home
dashboard already runs one and our cache layer dedupes.
- Bottom "Planned products (seed data)" SectionCard: the previous
50-item seed view, now clearly labelled `Seed` and demoted below
the live data. Kept until manual entries for not-yet-deployed
products are wired in (per the Phase 3 roadmap note).
E2E:
- `hermes.spec.ts` `beforeEach` now mocks `/api/services`,
`/api/health`, `/api/health/cache` so the products page renders
deterministically without a live backend (the dashboard spec
already does the same for the home page).
Verified: typecheck ✅, 13/13 web unit tests ✅, 7/7 E2E ✅, lint 0
errors, build green.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
First slice of Phase 3 ("real per-instance telemetry"). Defines the
read-only artifact contract from Decision #1 (sessions, cron, memory,
skills, watchdog alerts, backup history) and ships an admin-gated
backend endpoint that probes the live Hermes instance, gracefully
degrading to status:'unknown' wherever the source isn't readable.
What's new
- `backend/src/modules/hermes-telemetry/types.ts` — Zod schemas for
every section of the snapshot, plus a `HermesProbeStatus` reused
from hermes-ops so the UI can distinguish "definitely empty" from
"couldn't read the source" for each section independently.
- `backend/src/modules/hermes-telemetry/repository.ts` — implementation
that:
* shells out via `runuser -u <user> --` for cross-user instances
(Bheem/uma) the same way `hermes-ops/repository.ts` does;
* parses `hermes sessions stats / cron list / memory list /
skills list --json` when the CLI is present, otherwise
reports status:'unknown';
* tails the watchdog log and buckets each line by severity
(critical / warn / info);
* pulls `git -C <repo> log` against the instance's backup repo
for backup history;
* caches per-instance with a 30s TTL + in-flight coalescing,
same pattern as hermes-ops.
- `backend/src/modules/hermes-telemetry/routes.ts` — admin-only GET
`/api/hermes/telemetry/:instance` (the `instance` path param is
Zod-validated; the response is validated against
`HermesTelemetrySnapshotSchema` before send so a shape regression
surfaces here, not in the UI).
- `backend/src/modules/hermes-telemetry/hermes-telemetry.test.ts` —
6 unit tests: ENOENT-on-everything case validates against the
schema, JSON-parse path for sessions/cron/memory/skills, watchdog
log severity bucketing, backup-history `git log` parsing, cache
hit, per-instance cache isolation. Coverage: 95.17% lines on the
new repository module.
- `backend/vitest.config.ts` — telemetry repository added to the
coverage gate's `include` list (ratchet).
- `web/src/lib/api.ts` — typed surface for the new endpoint:
`HermesTelemetrySnapshot` + sub-types + `api.getHermesTelemetry`.
What's NOT in this slice
- UI consumption. The Task Ledger / Agents / History panes still
render mock data; converting them is queued for the next slices.
This slice ships the contract + the backend so those slices can
build on a stable shape.
- Backward-compat replacement of `/api/hermes/ops` (which is
unauthenticated today). That comes with the Phase 7 auth pass.
Verified: backend typecheck ✅, 57/57 unit tests ✅, web typecheck ✅,
lint 0 errors, coverage gate ≥95% lines on every gated file.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Closes Phase 2. Every entity in `web/src/lib/hermes` now carries an
`instanceId: 'vijay' | 'bheem'` (with `'all'` allowed for cross-cutting
agents like Hermes Core / GitHub link), and a global instance switcher
above every Mission Control pane filters them.
Library changes (`web/src/lib/hermes.ts`):
- New `HermesInstanceId` / `HermesInstanceFilter` types + `HERMES_INSTANCES`
metadata array.
- `instanceId` added to `HermesProduct`, `HermesTask`, `HermesEvent`,
`HermesRun`, `HermesAgentStatus`. Seed data deterministically split
~50/50 across instances; agents tagged per-scope (Local VM runner →
bheem, CLI runner / Scheduler → vijay, Hermes Core / GitHub /
OpenClaw / deployment / notifications → all).
- `getHermesTasks({instance})`, `getHermesProducts(view, instance)`,
`getHermesAgents(instance)`, `getHermesHistory(instance)`,
`getHermesOverview(instance)` all accept the filter; helper
`instanceMatches(scope, filter)` keeps the semantics consistent
(always-match for `'all'` on either side).
UI changes:
- New `HermesInstanceProvider` (React context, localStorage-backed
under `hermes.instanceFilter.v1`, SSR-safe default to avoid
hydration mismatch) mounted in `app/hermes/layout.tsx`.
- New `HermesInstanceSwitcher` segmented control (radiogroup with
aria-checked) rendered in the layout header above every pane.
- New `HermesInstanceBadge` shown on task rows (Active Missions +
Task Ledger), product cards (overview minicards + portfolio
cards), and agent cards.
- `/hermes` overview gains a "Per-instance roll-up" section that
always shows Vijay vs Bheem side-by-side regardless of the active
filter — that's the always-cross-instance comparison view, while
the eight metric cards above it are filtered by the switcher.
Tests:
- 2 new unit tests in `lib/hermes.test.ts` (instance tagging on seed
data + filter semantics across tasks/products/agents/overview).
- 1 new E2E test asserting the switcher's radiogroup, default
selection, and persistence-friendly state change.
- All green: 13/13 web unit tests, 7/7 E2E.
`web/test-results/` and `web/playwright-report/` added to `.gitignore`
since they're regenerated per run.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>