Add a "Cutting Tracker Items" section to AGENTS.md and register
scripts/tracker-seed/ in docs/repo-map.md so future "cut items to track"
requests route to the seed tooling instead of ad-hoc API calls.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Rename agent-queue/docs/gigafactory/ to docs/GIGAFACTORY/ and update every
reference (README, system-overview code-map, and all phase job specs). Add an
index README that lists the docs and points to the companion docs in
learning_ai_common_plat. Docs-only; no behavior change.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Files the ENGINEERING_REVIEW_SCORECARD.md P0-P3 action plan as tracker items
(one per affected product) via the platform-service POST /api/items API.
Dependency-free Node seeder mints an HS256 token from $JWT_SECRET, dedupes by
title, and supports --dry-run. No live writes performed (stack is down); run
the script once the platform stack is up.
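For illustration, a minimal sketch of the two mechanical pieces of the seeder described above: minting an HS256 token and skipping titles that already exist. The function names (`mintHs256`, `dedupeByTitle`), the claim shape, and the exact-match title comparison are assumptions for this sketch, not the script's actual code.

```typescript
import { createHmac } from "node:crypto";

// base64url-encode a UTF-8 string without padding, as compact JWTs require.
function b64url(text: string): string {
  return Buffer.from(text, "utf8").toString("base64url");
}

// Mint a compact HS256 JWT from a claims object and a shared secret.
// Which claims the platform service expects is not stated in the source.
function mintHs256(claims: Record<string, unknown>, secret: string): string {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const payload = b64url(JSON.stringify(claims));
  const signature = createHmac("sha256", secret)
    .update(`${header}.${payload}`)
    .digest("base64url");
  return `${header}.${payload}.${signature}`;
}

// Keep only candidates whose title is not already present.
// Exact, case-sensitive matching is an assumption.
function dedupeByTitle<T extends { title: string }>(
  candidates: readonly T[],
  existingTitles: Iterable<string>,
): T[] {
  const seen = new Set(existingTitles);
  const fresh: T[] = [];
  for (const item of candidates) {
    if (seen.has(item.title)) continue;
    seen.add(item.title);
    fresh.push(item);
  }
  return fresh;
}
```

In a dry run, the result of `dedupeByTitle` would simply be printed instead of posted.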
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Move GIGAFACTORY_ROADMAP.md and GIGAFACTORY_SYSTEM_OVERVIEW.md under
agent-queue/docs/gigafactory/ so the scattered top-level docs are easy to
discover. Update the README links, the overview code-map, and all phase
job-spec source-of-truth paths to the new location. Pure docs move; no
behavior change.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Jobs move through .state/inbox/building/testing/review/failed/shipped/logs at
runtime, which constantly dirtied the repo and blocked clean rebases. Ignore
the per-job lifecycle files (keeping each dir via .gitkeep) and stop tracking
the consumed inbox job instances.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Reusable evidence-based review prompt covering repos, code, architecture,
DevOps, testing, security, product-readiness, and AI-agent practices, with
a 1-10 scorecard and prioritized action plan output.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Add GIGAFACTORY_SYSTEM_OVERVIEW.md — a current-state companion to the roadmap
spec covering: what the Agent Gigafactory is, a completion snapshot, three
Mermaid diagrams (component architecture, job-lifecycle state machine, atomic
claim + lease-fencing sequence), the Cosmos data model, the scoring router,
subsystem map, full /fleet REST surface, feature flags, the two control planes,
a cross-repo code map, test coverage, next steps (Phase 4/5), and an honest
bugs/gaps/risks section. All three Mermaid blocks validated with mermaid.parse.
Also correct documentation drift in GIGAFACTORY_ROADMAP.md found during the
review:
- §0 progress table showed Phase 3 as "0% not started" while every Phase-3 box
is ticked; updated phases 1-3 to done with realistic percentages.
- Phase-2 boxes "scheduler/router wired into assignment", "tracker adapter
direct call", and "factory enrollment + scoped tokens" are implemented in
common-plat (coordinator.ts uses selectJob; routes.ts enforces
enrollment.enforceFactoryToken; tracker-bridge.ts) but were left unticked —
ticked with evidence and refreshed the stale "remaining for 100%" notes.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add an opt-in fleet mode to the dashboard so an operator can drive the
coordinator fleet from the same TUI used for the local folder queue.
- lib/fleet-dash.mjs: dependency-injectable read/act adapter over the
platform-service /fleet REST surface (jobs, metrics, factories, events,
ship/requeue/reject). Pure-ish + fully unit-testable without a live service.
- dashboard.mjs: render + act in fleet mode when AQ_FLEET_DASH=1 — board with
counts, factories (per-factory rows or metrics aggregate), alerts, running
(by lease/factory), actionable JOBS with manifest tags, recent, and a
per-job events log. Single-flight async refresh keeps the last good board on
failure; ship re-GETs a fresh leaseEpoch before PATCH; run/stop/promote are
disabled (no safe server contract). Local mode is byte-for-byte unchanged.
- lib/fleet-dash.test.mjs: 22 node:assert assertions (config, stage mapping,
toBoard, fetch headers/timeout/errors, board assembly + graceful degradation,
events, job actions) wired into selftest.sh.
- docs: tick the Phase 3 "TUI re-pointed at /fleet" roadmap boxes.
Verified: selftest.sh green (incl. new fleet-dash checks); live non-TTY render
smoke against a stub /fleet server (both factories and metrics-aggregate paths);
local mode unchanged.
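The single-flight refresh that keeps the last good board on failure can be sketched roughly as follows. `BoardRefresher`, the `Board` shape, and the `load` callback are hypothetical names for this sketch; the real dashboard.mjs logic may differ.

```typescript
interface Board {
  fetchedAt: number;
  counts: Record<string, number>;
}

class BoardRefresher {
  private inFlight: Promise<Board | null> | null = null;
  private lastGood: Board | null = null;
  private readonly load: () => Promise<Board>;
  lastError: string | null = null;

  constructor(load: () => Promise<Board>) {
    this.load = load;
  }

  // At most one fetch runs at a time; concurrent callers share its promise.
  // On failure the previous good board is returned and the error is recorded.
  refresh(): Promise<Board | null> {
    if (this.inFlight) return this.inFlight;
    this.inFlight = this.load()
      .then((board): Board | null => {
        this.lastGood = board;
        this.lastError = null;
        return board;
      })
      .catch((err: unknown): Board | null => {
        this.lastError = err instanceof Error ? err.message : String(err);
        return this.lastGood;
      })
      .finally(() => {
        this.inFlight = null;
      });
    return this.inFlight;
  }
}
```

Injecting `load` mirrors the dependency-injectable adapter idea: a test can pass a stub instead of a live /fleet call.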
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Render a per-job tags line on the RUNNING workers and JOBS lists showing the
routing inputs operators care about: priority, profile, capabilities, and the
tracker-item reference. Tags come from the launched meta, falling back to the
job's .md frontmatter for never-launched inbox jobs (new readManifest parser).
The tracker-item becomes a clickable terminal hyperlink when AQ_TRACKER_WEB is
set. Also renders the new budget_exceeded result as a failed RECENT row.
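A clickable terminal hyperlink is typically emitted with the OSC 8 escape sequence. The sketch below assumes the link target is the AQ_TRACKER_WEB base followed by the item id; that URL shape and the `trackerLink` name are assumptions, not taken from the dashboard code.

```typescript
// Wrap a tracker-item id in an OSC 8 hyperlink when a web base URL is configured.
// Sequence: ESC ] 8 ; ; URL ESC \  text  ESC ] 8 ; ; ESC \
function trackerLink(itemId: string, webBase: string | undefined): string {
  if (!webBase) return itemId; // no base configured: plain text
  const url = `${webBase.replace(/\/+$/, "")}/${encodeURIComponent(itemId)}`;
  return `\u001b]8;;${url}\u001b\\${itemId}\u001b]8;;\u001b\\`;
}
```

Terminals without OSC 8 support generally ignore the sequence and show only the visible text.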
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Parse the wall ceiling from the budget manifest map (budget: { wall: <dur> })
and arm it alongside the per-run timeout. Whichever ceiling fires first binds;
the kill is recorded as result=timeout or result=budget_exceeded accordingly.
budget.wall extends timeout: a job with only a budget.wall (no timeout) is now
hard-killed at the ceiling. budget_exceeded is a terminal, non-retryable class
by default and maps to the failed tracker status.
Adds _budget_wall_secs + _effective_kill helpers (pure, unit-tested) and live
selftest coverage; usd/tokens remain best-effort and are not enforced here.
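The "whichever ceiling fires first binds" rule can be sketched as a pure function. The real helpers are shell functions (`_budget_wall_secs`, `_effective_kill`); this TypeScript version, the duration grammar (`90s`, `15m`, `2h`, or bare seconds), and the tie-break in favour of `timeout` are assumptions made for illustration.

```typescript
type KillResult = "timeout" | "budget_exceeded";

interface EffectiveKill {
  seconds: number;
  result: KillResult;
}

// Parse "90s", "15m", "2h" or a bare number of seconds; null if unparseable.
function parseDurationSecs(raw: string): number | null {
  const m = /^(\d+)([smh]?)$/.exec(raw.trim());
  if (!m) return null;
  const n = Number(m[1]);
  const unit = m[2];
  return unit === "h" ? n * 3600 : unit === "m" ? n * 60 : n;
}

// Pick the ceiling that fires first and the result class it should record.
function effectiveKill(
  timeoutSecs: number | null,
  budgetWallSecs: number | null,
): EffectiveKill | null {
  if (timeoutSecs === null) {
    return budgetWallSecs === null
      ? null
      : { seconds: budgetWallSecs, result: "budget_exceeded" };
  }
  if (budgetWallSecs === null || timeoutSecs <= budgetWallSecs) {
    return { seconds: timeoutSecs, result: "timeout" }; // ties go to timeout (assumed)
  }
  return { seconds: budgetWallSecs, result: "budget_exceeded" };
}
```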
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Backend + web rebuilt from the previous session's Dockerfile fixes.
- Phase 1-7 dashboard UI now actually live; the dist/server.js CORS
hot-patch is retired (CORS is env-driven via EXTRA_CORS_ORIGINS).
- Tailscale serve restored: caddy was bound to 0.0.0.0:443 and blocked
tailscaled from claiming 100.87.53.10:443. Fixed via a one-line
compose change in learning_ai_common_plat (commit c0db2901).
- End-to-end login through real Cosmos verified at
https://srv1491630.tailf85608.ts.net/login.
Active-repo sweep results (clock, notes, flowmonk, invt_trdg) and HOLD
repo triage are documented in the checkpoint.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The image rebuilds were broken because `backend/package-lock.json` and
`web/package-lock.json` had been regenerated inside the pnpm workspace
and contained pnpm-store symlinks (e.g. `node_modules/typescript` →
`../node_modules/.pnpm/typescript@5.9.3/...` with `link: true`). When
`npm ci` ran in Docker outside the pnpm workspace, those link targets
didn't exist, so devDeps including TypeScript were silently not
installed — leaving `tsc: not found` at build time.
Fix aligns Docker builds with the declared `packageManager: pnpm@10.6.5`
field:
- Both Dockerfiles now use corepack + pnpm with the workspace
`pnpm-lock.yaml` and `--filter ... --frozen-lockfile`
- Production stage uses `pnpm deploy --prod --legacy` to carve out a
devDep-free node_modules
- Drop the stale `backend/package-lock.json` and
`web/package-lock.json` (they are regenerated incorrectly whenever anyone
runs npm inside the pnpm workspace)
- Add `pino` + `pino-pretty` to backend deps (used by
`src/lib/logger.ts` from the Phase 5 P1 structured-logging work but
never declared)
- Fix pre-existing bug in backend runtime stage: `docker.io` package
in debian:bookworm-slim pre-creates a `docker` group at GID ~101,
so `groupadd --gid 999` then `useradd --gid 999` failed. Use
`groupmod` when the group already exists.
After this commit:
- 87/87 tests pass (74 backend + 13 web)
- typecheck clean
- lint: 0 errors (only pre-existing unused-var warnings)
- `docker compose build && docker compose up` succeeds end-to-end
- Tailscale URL serves the new dashboard with all Phase 1-7 work live
- CORS allow-list driven by `EXTRA_CORS_ORIGINS` env var (no hot-patch
needed in the running container)
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Captures the in-progress state of the long-running v2 dashboard session
so the next session (post `--permission-mode dangerous` relaunch) can
pick up without losing context. The full handoff narrative lives in
`docs/SESSION_CHECKPOINT_2026-05-30.md` — read it first.
Code change:
- `backend/src/server.ts` CORS allow-list is now env-driven via
`EXTRA_CORS_ORIGINS` (comma-separated). Originally added because
the user's browser is hitting the deployed dashboard via a
Tailscale-served hostname (`srv1491630.tailf85608.ts.net`), and
the static built-in list only knew `localhost` + `devops.bytelyst.com`.
Honours `*` as a wildcard for trusted-network deployments. Adds
`Vary: Origin` so caches behave.
- `backend/package-lock.json` regenerated to match `package.json`
(was missing the Phase 5 ESLint deps added earlier this session).
Note: the Dockerfile build is STILL broken with `tsc: not found`
despite typescript being in devDeps — this is a separate
dual-lockfile issue documented in the checkpoint. Untangle on
resume.
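A rough sketch of the env-driven allow-list logic, independent of Fastify. The helper names and the choice to echo the request origin back when `*` is configured are assumptions for this sketch; the actual `server.ts` wiring is not shown in this message.

```typescript
// Split the comma-separated EXTRA_CORS_ORIGINS value into clean entries.
function parseExtraOrigins(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
}

// An origin is allowed if it is built in, listed in the extras, or the extras contain "*".
function isOriginAllowed(
  requestOrigin: string,
  builtIn: readonly string[],
  extra: readonly string[],
): boolean {
  if (extra.includes("*")) return true;
  return builtIn.includes(requestOrigin) || extra.includes(requestOrigin);
}

// Always send Vary: Origin so shared caches key responses per origin.
function corsHeaders(
  requestOrigin: string,
  builtIn: readonly string[],
  extra: readonly string[],
): Record<string, string> {
  const headers: Record<string, string> = { Vary: "Origin" };
  if (isOriginAllowed(requestOrigin, builtIn, extra)) {
    headers["Access-Control-Allow-Origin"] = requestOrigin;
  }
  return headers;
}
```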
Live infra carry-forward summarised in the checkpoint doc:
- Real Azure Cosmos DB (`cosmos-mywisprai` / new `bytelyst` db)
replaces the crash-looping local emulator.
- `learning_ai_common_plat/docker-compose.yml` has uncommitted
changes mirroring this; that repo is 15 commits behind origin/main
and needs a rebase+commit pass separately.
- Hot-patched the running `devops-backend` container's `dist/server.js`
to allow the Tailscale origin (ephemeral; lost on next image build,
superseded by the code change above once rebuild works).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Close the final Phase-2 exit-criteria box: >=2 factories executing jobs in parallel
through one coordinator, proving the concurrency guarantees end-to-end. This is a
DEMO HARNESS over the existing runtime — agent-queue.sh and lib/fleet-client.sh are
unchanged (read + called, not modified).
demo/two-factory-demo.sh: starts two real `agent-queue.sh run` daemons (mac-1 +
ubuntu-1, separate queues/cwds) that compete ONLY through the coordinator, then
asserts: (a) no double-assign — each of 3 jobs executed by exactly one factory;
(b) fencing + reclaim — kill a factory mid-job, the reaper returns its job, the
survivor reclaims + completes it, and the dead worker's late/zombie report (stale
leaseEpoch) is FENCED (HTTP 409, never shipped); (c) parallelism — both factories
hold active jobs concurrently. Dual-mode: CI-safe stateful stub by default; live
platform-service when AQ_FLEET_API/AQ_FLEET_TOKEN set.
demo/coordinator-stub.sh: stateful, mkdir-lock-guarded, file-backed coordinator
implementing claim/lease/fence/renew/release + reaper-reclaim via the existing
AQ_FLEET_API_CMD seam — the selftest stub pattern extended with shared state so
>=2 processes coordinate through one coordinator.
demo/README.md: stub + real invocations, env knobs, what each guarantee proves,
what-to-watch guide.
selftest.sh: +3 headless stub-mode checks (existing 68 unchanged byte-for-byte ->
71 total green).
docs/GIGAFACTORY_ROADMAP.md: tick the §14 two-factory-demo box; annotate Phase-2
exit criteria; bump §0 Phase 2 to 80% (remaining: scheduler-core wiring [common-plat
PR #31], tracker-direct call, factory enrollment).
bash 3.2 + awk/sed/grep/pgrep only; mac+linux safe; no new runtime deps.
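The three guarantees can be illustrated with a tiny in-memory model of claim, reap, and fenced report. This is a sketch only: the real stub is a file-backed shell script, and `StubCoordinator` and its method names are hypothetical.

```typescript
type ReportStatus = 200 | 409;

interface JobRecord {
  leaseEpoch: number;
  holder: string | null;
  shipped: boolean;
}

class StubCoordinator {
  private jobs = new Map<string, JobRecord>();

  addJob(jobId: string): void {
    this.jobs.set(jobId, { leaseEpoch: 0, holder: null, shipped: false });
  }

  // Claim succeeds only when nobody holds the job; each claim bumps the epoch.
  claim(jobId: string, factoryId: string): number | null {
    const job = this.jobs.get(jobId);
    if (!job || job.holder !== null || job.shipped) return null;
    job.holder = factoryId;
    job.leaseEpoch += 1;
    return job.leaseEpoch;
  }

  // Reaper: hand an abandoned lease back so a survivor can reclaim it.
  reap(jobId: string): void {
    const job = this.jobs.get(jobId);
    if (job && !job.shipped) job.holder = null;
  }

  // A report carrying a stale epoch or a non-holder id is fenced with 409.
  report(jobId: string, factoryId: string, leaseEpoch: number): ReportStatus {
    const job = this.jobs.get(jobId);
    if (!job || job.holder !== factoryId || job.leaseEpoch !== leaseEpoch) return 409;
    job.shipped = true;
    job.holder = null;
    return 200;
  }
}
```

Because every reclaim increments the epoch, a zombie worker's late report can never match the current lease.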
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Closes the remaining tractable items from the carry-forward queue.
1. Drop-root scaffold for the backend container (P2 mitigation)
`backend/Dockerfile` adds non-root `app` user (uid 1001) + `docker`
group (gid via `DOCKER_GID` build arg, default 999). `BACKEND_USER`
build arg defaults to `root` so existing deployments keep working;
set it to `app` plus `DOCKER_GID=$(getent group docker | cut -d: -f3)`
to flip the runtime non-root. `dashboard/DEPLOYMENT.md` gets a new
"Running non-root" section with the exact `chgrp`/`chmod` recipe
for the bind-mounted log files (the host-side prep that pairs with
the build flip). DEPLOYMENT.md mitigation roadmap updated.
2. Phase 6 trend cards
`lib/hermes-ops-history.ts` keeps the last 24 ops snapshots in
localStorage (de-duped on `generatedAt`, schema-guarded on read,
degrades silently on quota exceeded). Three trend cards in the
ops panel:
- Warning-volume sparkline + current count
- Healthy-instance count sparkline (X/2)
- Per-instance "minutes since last backup commit" with a 30m
stale threshold
SVG polyline sparklines, no chart library — `<svg viewBox="0 0
100 100" preserveAspectRatio="none">` with `vector-effect:
non-scaling-stroke` so the line stays 2px regardless of the
parent's width.
3. Phase 6 theme toggle
`components/theme-toggle.tsx` Sun/Moon button mounted in the
Hermes layout next to the instance switcher. Persists in
localStorage `bytelyst.theme.v1`. The design system already
defined `[data-theme="light"]` overrides in `styles/tokens.css`;
the toggle just sets the attribute. FOUC-prevention inline script
in the root layout reads the same key BEFORE React hydrates so
the first paint matches the user's last choice.
4. Phase 3 partial close: Agents pane → telemetry inventory
`/hermes/agents` now renders a "Memory & Skills inventory (live)"
SectionCard backed by the Phase 3 telemetry endpoint per instance
— `hermes memory list` and `hermes skills list` rendered with
per-section probe-status badges (`up`/`unknown`), item counts,
and the first N entries each. Agent **health** statuses (latency,
failure rate, last-success/failure) stay seed-data — observability
for those needs a separate ingestion contract that the telemetry
endpoint doesn't provide today.
5. Phase 0 reconfirmation
Roadmap Phase 0 ticked with explicit verification notes for each
guardrail (no public listener, manual approvals, secret hygiene,
Caddy review). Remains "must hold throughout" — the ticks reflect
today's verified state, not single-checkbox completion.
Verified: backend typecheck ✅, 74/74 backend unit tests ✅, web
typecheck ✅, 7/7 E2E ✅, lint 0 errors, build green, coverage gate
≥95% lines on every gated file.
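For the trend cards, a minimal sketch of the two pure pieces: a capped, de-duplicated history and the polyline points for a `0 0 100 100` viewBox. The `OpsSnapshot` fields other than `generatedAt`, and the function names, are assumptions; localStorage persistence and schema guarding are left out.

```typescript
interface OpsSnapshot {
  generatedAt: string;
  warningCount: number;
}

const MAX_SNAPSHOTS = 24;

// Append a snapshot unless one with the same generatedAt exists; keep the newest 24.
function appendSnapshot(
  history: readonly OpsSnapshot[],
  next: OpsSnapshot,
): OpsSnapshot[] {
  if (history.some((s) => s.generatedAt === next.generatedAt)) return [...history];
  return [...history, next].slice(-MAX_SNAPSHOTS);
}

// Map values onto "x,y" pairs for an SVG polyline; y is flipped because SVG y grows downward.
function sparklinePoints(values: readonly number[]): string {
  if (values.length === 0) return "";
  const min = Math.min(...values);
  const span = Math.max(...values) - min || 1; // avoid dividing by zero on flat series
  return values
    .map((v, i) => {
      const x = values.length === 1 ? 0 : (i / (values.length - 1)) * 100;
      const y = 100 - ((v - min) / span) * 100;
      return `${x.toFixed(1)},${y.toFixed(1)}`;
    })
    .join(" ");
}
```

The returned string would go straight into a `<polyline points="...">` attribute.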
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Closes the three Phase 5 P2 follow-ups from the DEPLOYMENT.md
mitigation roadmap that don't need infra changes. Two P2 items remain
(non-root container, docker-proxy daemon) — both genuinely need
container/orchestration work and stay queued.
1. Allow-list shell wrapper (P1)
New `lib/shell.ts`:
- `execAllowed(cmd, args, opts)` — `execFile`-only, no shell, no
interpolation. Single escape hatch for ad-hoc invocations.
- `dockerRestart(name)` — name validated against
`[a-zA-Z0-9][a-zA-Z0-9._-]{0,127}`; throws InvalidShellArgError
on anything else (including non-strings, shell metacharacters,
command-substitution attempts). Tests cover all of these.
- `dockerPrune(kind, {all?})` — kind constrained to
{container,image,volume,builder}; `--all` only valid for image.
- `runBashScript(path, args, {allowedRoots})` — script path AND
cwd both checked against allowed roots; rejects `..` escapes
and prefix-matching siblings (`/opt/projects-evil` vs
`/opt/projects`).
- `runNpmScript(script, {cwd, allowedRoots})` — script ∈
{typecheck,lint,build,test,test:run,start}; cwd inside roots.
17 unit tests cover every rejection path. Module added to the
coverage gate (≥95% lines).
Migrated highest-risk callers off template-literal `exec`:
- `vm/repository.ts:restartContainer` → `dockerRestart`. Was
previously `await execAsync(\`docker restart "${name}"\`)`
with only a regex check; now goes through the wrapper.
- `system/repository.ts:dockerCleanup` → `dockerPrune` per kind
+ `execAllowed` for `docker system df`. Drops the array of
template-literal command strings entirely.
- `code-quality/repository.ts` → `runNpmScript` for every
lifecycle invocation. cwd is now the resolved (normalised,
`..`-collapsed) path, not the raw input.
2. projectPath validation for /code-quality/check (P1)
`runCodeQualityCheck` now calls
`assertPathInAllowedRoots(projectPath, getAllowedRoots())` before
any subprocess spawns. `getAllowedRoots()` reads
`CODE_QUALITY_ALLOWED_ROOTS` (colon-separated env, defaults to
`/opt/bytelyst`). Rejection happens with a clear error message
listing the configured roots so operators know what to allow.
3. Audit-log every privileged shell-out (P2)
`audit/types.ts` extended: `action` now includes `'shell-exec'`,
`entityType` includes `'host'`. The migration is additive — old
audit rows still validate.
Three privileged routes now write a `shell-exec` audit row with
actor (authUserId / authRole), entity id, and a sanitized details
payload before responding:
- `POST /docker/cleanup` — `entityId: docker-cleanup:<type>`,
details include {type, force, freedSpace}.
- `POST /vm/cleanup` — `entityId: vm-cleanup:<mode>`.
- `POST /vm/containers/:name/restart` — `entityId:
container-restart:<name>`, details include {success, message}.
Audited even on failure so attempted privileged actions are
still recorded.
Audit writes are best-effort — a Cosmos hiccup logs a warn but
never fails the request the operator was running.
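To make the rejection rules concrete, here is a hedged sketch of the container-name check and the allowed-roots check. The names `InvalidShellArgError` and `assertPathInAllowedRoots` come from this message, but the bodies below (and `assertContainerName`) are guesses at one reasonable implementation, using POSIX path semantics.

```typescript
import * as path from "node:path";

class InvalidShellArgError extends Error {}

const CONTAINER_NAME = /^[a-zA-Z0-9][a-zA-Z0-9._-]{0,127}$/;

// Reject non-strings and anything outside the documented name pattern.
function assertContainerName(value: unknown): string {
  if (typeof value !== "string" || !CONTAINER_NAME.test(value)) {
    throw new InvalidShellArgError("invalid container name");
  }
  return value;
}

// Resolve the path (collapsing ".."), then require it to equal a root or sit under
// "root/". The trailing separator is what rejects siblings like /opt/projects-evil.
function assertPathInAllowedRoots(
  candidate: string,
  allowedRoots: readonly string[],
): string {
  const resolved = path.posix.resolve(candidate);
  for (const root of allowedRoots) {
    const normRoot = path.posix.resolve(root);
    const prefix = normRoot.endsWith("/") ? normRoot : `${normRoot}/`;
    if (resolved === normRoot || resolved.startsWith(prefix)) return resolved;
  }
  throw new InvalidShellArgError(
    `path ${resolved} is outside allowed roots: ${allowedRoots.join(", ")}`,
  );
}
```

Returning the resolved path lets callers pass the normalised value, not the raw input, on to `execFile`.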
Verified: backend typecheck ✅, 74/74 unit tests ✅ (17 new for
shell.ts + audit changes), 7/7 E2E ✅, lint 0 errors, coverage gate
≥95% lines on every gated file (which now includes shell.ts).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Closes the Phase 8 line that's actually a docs/codebase change. The
other two Phase 8 items are VM-ops work (bot tokens + watchdog
extensions) and live as a delegation brief.
What's in this repo
- `docs/hermes-operations.md` gains a "Telegram Notification
Convention" section codifying:
* routing per instance (Vijay → root chat, Bheem → Uma chat,
cross-cutting → root)
* silent-on-healthy + post-on-recovery
* the numbered-emoji progress convention (`1️⃣`, `2️⃣`, …) and
why it survives Telegram client rendering
* approval-prompt UI expectation
* "don't paste secrets" pointer back to `lib/logger.ts`'s
redaction path-list
- `docs/prompts/phase8-telegram-loop.md` — full delegation brief
for the VM-side implementation. Design: dashboard backend writes
new warnings (with `instance=<id>` tag, deduped over 1h) to an
append-only log; both watchdogs tail it and route through the
existing Telegram delivery path. Avoids splitting the delivery
code into two places that would each need rate-limit + token-
rotation handling. Brief is gated on Phase 4 — Uma's watchdog
must exist first.
- Roadmap Phase 8 ticked for "preserve numbered-emoji convention"
(codified in operations doc); the other two items have notes
pointing at the brief.
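The "deduped over 1h" rule for the append-only warning log might look like the sketch below. The log-line layout, the `WarningDeduper` name, and measuring the window from the last emitted occurrence are all assumptions; the brief itself defines the real format.

```typescript
class WarningDeduper {
  private lastEmitted = new Map<string, number>();
  private readonly windowMs: number;

  constructor(windowMs: number = 60 * 60 * 1000) {
    this.windowMs = windowMs;
  }

  // Returns the line to append, or null when the same warning for the same
  // instance was already emitted within the window.
  record(instance: string, message: string, nowMs: number): string | null {
    const key = `${instance}\u0000${message}`;
    const prev = this.lastEmitted.get(key);
    if (prev !== undefined && nowMs - prev < this.windowMs) return null;
    this.lastEmitted.set(key, nowMs);
    return `${new Date(nowMs).toISOString()} instance=${instance} ${message}`;
  }
}
```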
Phase 8 doesn't fully close in this repo because the delivery loop
needs real bot tokens and the Phase 4 Uma watchdog before it can be
end-to-end validated. The codebase's contribution is everything that
doesn't need a token: the convention, the design, and the delegation
brief.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Closes Phase 6 (the items that don't need a backend change). Three
threads, all on the Hermes Mission Control overview:
1. Severity-tagged alerts on the ops panel
New `RecentAlerts` component classifies each `recentAlerts` string
into critical / warn / info by leading token (CRITICAL/ERROR/FATAL
→ critical; INFO/OK → info; default → warn — most ops alerts are
warnings) and renders a colour-coded badge per alert. A
per-severity radiogroup filter sits in the panel header with live
counts. Pure UI — no backend contract change. The watchdog log
tailer in `hermes-telemetry/repository.ts` already emits structured
severities that a future migration away from leading-token parsing can use.
2. Per-instance action row on each `InstanceCard`
Adds three buttons next to "Open dashboard" / "Copy URL":
- "Copy SSH command": Tailscale-scoped only — never raw `ssh` —
and per-instance user (`tailscale ssh root@<ts-ip>` for Vijay,
`tailscale ssh uma@<ts-ip>` for Bheem). Disabled when the
snapshot has no Tailscale IP.
- "View tasks": deep link into the Task Ledger pre-filtered by
instance via `/hermes/tasks?instance=<id>`.
- "Open runbook": link to `docs/hermes-operations.md`.
"How to restart this gateway" is intentionally a runbook link, not
a button — restarting is privileged and should go through the
documented procedure, not the dashboard UI.
3. URL-param hydration of the instance switcher
`HermesInstanceProvider` now reads `?instance=` from the URL on
mount (and on subsequent navigations to a different value). The
URL value wins over the persisted localStorage selection so deep
links from the ops panel land on a pre-filtered pane. The param
is intentionally not auto-stripped — back/forward and copy-paste
stay meaningful.
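The leading-token classification from item 1 can be sketched as below. How the real `RecentAlerts` component tokenises a line is not shown here, so stripping punctuation from the first word and matching case-insensitively are assumptions, as are the function names.

```typescript
type AlertSeverity = "critical" | "warn" | "info";

// Classify an alert string by its first whitespace-delimited word.
function classifyAlert(line: string): AlertSeverity {
  const first = line.trim().split(/\s+/)[0] ?? "";
  const token = first.replace(/[^A-Za-z]/g, "").toUpperCase();
  if (token === "CRITICAL" || token === "ERROR" || token === "FATAL") return "critical";
  if (token === "INFO" || token === "OK") return "info";
  return "warn"; // default: most ops alerts are warnings
}

// Live counts for the per-severity filter in the panel header.
function countBySeverity(lines: readonly string[]): Record<AlertSeverity, number> {
  const counts: Record<AlertSeverity, number> = { critical: 0, warn: 0, info: 0 };
  for (const line of lines) counts[classifyAlert(line)] += 1;
  return counts;
}
```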
Roadmap status: Phase 6 ticked except trend cards (deferred — needs
client-side history persistence) and theme toggle (deferred — shell
doesn't expose a switch primitive yet). Unified-alerts-feed bullet
partially achieved by the new severity filter; the per-instance roll-up
will land when a UI consumer is built for the Phase 3 telemetry
endpoint.
Verified: typecheck ✅, build ✅, 7/7 E2E ✅ (the existing switcher
test exercises the new context code path; URL hydration is covered
indirectly by the deep-link button → Task Ledger pre-filter).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Two threads, one commit because they're both about closing dashboard-
side roadmap items that don't need their own slice.
Phase 7 — auth coverage on hermes routes:
- `/api/hermes/ops` was the last unauthenticated Hermes endpoint —
despite revealing instance / gateway / Tailscale-IP / backup-repo /
warnings state. Now gated on `requireAdmin`, matching the new
`/api/hermes/telemetry/:instance` from the previous slice and
every other privileged route in this backend.
- Privilege-surface table in `dashboard/DEPLOYMENT.md` updated to
show `requireAdmin` for both Hermes routes; the previous
"no auth, read-only ops snapshot" carve-out is gone.
- Roadmap Phase 7 ticks for "require auth on hermes routes" + "keep
hermes data private-only" with verification notes.
Phase 4 — Bheem/Uma parity (delegation brief):
- Phase 4 is **VM ops, not codebase work** — it requires sudo on the
Hostinger VM, Uma-owned GitHub credentials, and Telegram bot
tokens. None of it is editable in this repo. Wrote
`docs/prompts/phase4-bheem-uma-parity.md` as a self-contained
delegation brief covering: Uma persistent-backup repo + timer,
Uma health watchdog, first restore rehearsal, quarterly drill
reminder, and the dashboard-side verification (the /hermes/ops +
/hermes/telemetry/bheem outputs that confirm the gap is closed).
- Phase 4 section header in the roadmap now points at the brief
and explains why the checkboxes stay open in this repo.
Verified: backend 57/57 unit tests ✅, web 7/7 E2E ✅ (Playwright
mocks bypass requireAdmin since they fulfill before the request
reaches Fastify; real auth'd users get the same flow as every other
admin route). Lint 0 errors, build green.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Closes the "drop the fabricated 50-item mock" Phase 3 line. The Mission
Control Products pane now renders the **real** deployment registry as
its primary view, sourced from `backend/src/modules/services` (the
Cosmos-backed service registry) joined with the health module.
Page layout:
- Top "Live services" SectionCard: real services from
`api.getServices()` joined with `api.getHealth()`. Per-card: status
(up / degraded / down derived from the most recent health probe),
version, health URL, repo path, last deploy, last health check,
response time. Refresh button (busts the 30s health cache via
`clearHealthCache`). Loading / empty / error states. Health-check
poll loop is intentionally not added on this page — the home
dashboard already runs one and our cache layer dedupes.
- Bottom "Planned products (seed data)" SectionCard: the previous
50-item seed view, now clearly labelled `Seed` and demoted below
the live data. Kept until manual entries for not-yet-deployed
products are wired in (per the Phase 3 roadmap note).
E2E:
- `hermes.spec.ts` `beforeEach` now mocks `/api/services`,
`/api/health`, `/api/health/cache` so the products page renders
deterministically without a live backend (the dashboard spec
already does the same for the home page).
Verified: typecheck ✅, 13/13 web unit tests ✅, 7/7 E2E ✅, lint 0
errors, build green.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
First slice of Phase 3 ("real per-instance telemetry"). Defines the
read-only artifact contract from Decision #1 (sessions, cron, memory,
skills, watchdog alerts, backup history) and ships an admin-gated
backend endpoint that probes the live Hermes instance, gracefully
degrading to status:'unknown' wherever the source isn't readable.
What's new
- `backend/src/modules/hermes-telemetry/types.ts` — Zod schemas for
every section of the snapshot, plus a `HermesProbeStatus` reused
from hermes-ops so the UI can distinguish "definitely empty" from
"couldn't read the source" for each section independently.
- `backend/src/modules/hermes-telemetry/repository.ts` — implementation
that:
* shells out via `runuser -u <user> --` for cross-user instances
(Bheem/uma) the same way `hermes-ops/repository.ts` does;
* parses `hermes sessions stats / cron list / memory list /
skills list --json` when the CLI is present, otherwise
reports status:'unknown';
* tails the watchdog log and buckets each line by severity
(critical / warn / info);
* pulls `git -C <repo> log` against the instance's backup repo
for backup history;
* caches per-instance with a 30s TTL + in-flight coalescing,
same pattern as hermes-ops.
- `backend/src/modules/hermes-telemetry/routes.ts` — admin-only GET
`/api/hermes/telemetry/:instance` (the `instance` path param is
Zod-validated; the response is validated against
`HermesTelemetrySnapshotSchema` before send so a shape regression
surfaces here, not in the UI).
- `backend/src/modules/hermes-telemetry/hermes-telemetry.test.ts` —
6 unit tests: ENOENT-on-everything case validates against the
schema, JSON-parse path for sessions/cron/memory/skills, watchdog
log severity bucketing, backup-history `git log` parsing, cache
hit, per-instance cache isolation. Coverage: 95.17% lines on the
new repository module.
- `backend/vitest.config.ts` — telemetry repository added to the
coverage gate's `include` list (ratchet).
- `web/src/lib/api.ts` — typed surface for the new endpoint:
`HermesTelemetrySnapshot` + sub-types + `api.getHermesTelemetry`.
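The per-instance cache with a 30s TTL and in-flight coalescing follows a common pattern, sketched here under assumptions: `CoalescingTtlCache` and its injected clock are hypothetical names, and the real repository module may structure this differently.

```typescript
class CoalescingTtlCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();
  private inFlight = new Map<string, Promise<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Serve a fresh entry if present; otherwise join the probe already running for
  // this key, or start one. Failed probes are not cached.
  get(key: string, probe: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return Promise.resolve(hit.value);
    const pending = this.inFlight.get(key);
    if (pending) return pending;
    const started = probe()
      .then((value) => {
        this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
        return value;
      })
      .finally(() => {
        this.inFlight.delete(key);
      });
    this.inFlight.set(key, started);
    return started;
  }
}
```

Keying by instance id gives the per-instance isolation that the unit tests above check.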
What's NOT in this slice
- UI consumption. The Task Ledger / Agents / History panes still
render mock data; converting them is queued for the next slices.
This slice ships the contract + the backend so those slices can
build on a stable shape.
- Backward-compat replacement of `/api/hermes/ops` (which is
unauthenticated today). That comes with the Phase 7 auth pass.
Verified: backend typecheck ✅, 57/57 unit tests ✅, web typecheck ✅,
lint 0 errors, coverage gate ≥95% lines on every gated file.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Closes Phase 2. Every entity in `web/src/lib/hermes` now carries an
`instanceId: 'vijay' | 'bheem'` (with `'all'` allowed for cross-cutting
agents like Hermes Core / GitHub link), and a global instance switcher
above every Mission Control pane filters them.
Library changes (`web/src/lib/hermes.ts`):
- New `HermesInstanceId` / `HermesInstanceFilter` types + `HERMES_INSTANCES`
metadata array.
- `instanceId` added to `HermesProduct`, `HermesTask`, `HermesEvent`,
`HermesRun`, `HermesAgentStatus`. Seed data deterministically split
~50/50 across instances; agents tagged per-scope (Local VM runner →
bheem, CLI runner / Scheduler → vijay, Hermes Core / GitHub /
OpenClaw / deployment / notifications → all).
- `getHermesTasks({instance})`, `getHermesProducts(view, instance)`,
`getHermesAgents(instance)`, `getHermesHistory(instance)`,
`getHermesOverview(instance)` all accept the filter; helper
`instanceMatches(scope, filter)` keeps the semantics consistent
(always-match for `'all'` on either side).
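The filter semantics are small enough to show in full. `instanceMatches`, `HermesInstanceId`, and `HermesInstanceFilter` are named in this message; the `HermesInstanceScope` alias, the `SketchTask` type, and the `filterTasks` helper are illustrative stand-ins, not the library's actual definitions.

```typescript
type HermesInstanceId = "vijay" | "bheem";
type HermesInstanceFilter = HermesInstanceId | "all";
type HermesInstanceScope = HermesInstanceId | "all"; // hypothetical alias for entity tags

// "all" on either side always matches; otherwise the ids must be equal.
function instanceMatches(
  scope: HermesInstanceScope,
  filter: HermesInstanceFilter,
): boolean {
  return scope === "all" || filter === "all" || scope === filter;
}

interface SketchTask {
  id: string;
  instanceId: HermesInstanceScope;
}

// Example consumer: filter any instance-tagged list with the shared helper.
function filterTasks(
  tasks: readonly SketchTask[],
  filter: HermesInstanceFilter,
): SketchTask[] {
  return tasks.filter((task) => instanceMatches(task.instanceId, filter));
}
```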
UI changes:
- New `HermesInstanceProvider` (React context, localStorage-backed
under `hermes.instanceFilter.v1`, SSR-safe default to avoid
hydration mismatch) mounted in `app/hermes/layout.tsx`.
- New `HermesInstanceSwitcher` segmented control (radiogroup with
aria-checked) rendered in the layout header above every pane.
- New `HermesInstanceBadge` shown on task rows (Active Missions +
Task Ledger), product cards (overview minicards + portfolio
cards), and agent cards.
- `/hermes` overview gains a "Per-instance roll-up" section that
always shows Vijay vs Bheem side-by-side regardless of the active
filter — that's the always-cross-instance comparison view, while
the eight metric cards above it are filtered by the switcher.
Tests:
- 2 new unit tests in `lib/hermes.test.ts` (instance tagging on seed
data + filter semantics across tasks/products/agents/overview).
- 1 new E2E test asserting the switcher's radiogroup, default
selection, and persistence-friendly state change.
- All green: 13/13 web unit tests, 7/7 E2E.
`web/test-results/` and `web/playwright-report/` added to `.gitignore`
since they're regenerated per run.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Closes the Phase 5 P2 checkbox (second half — first half: pino logging
in 1e64d75). Phase 5 is now fully green.
Two changes:
1. `web/e2e/hermes.spec.ts` now intercepts `/api/hermes/ops` with a
fixture snapshot. The backend's hermes-ops endpoint shells out to
`systemctl` / `git` / `ps` / `du` on the live VM and is therefore
neither available nor deterministic in CI. Mocking it lets the
suite run against the web stack alone (no backend, no live VM).
Fixture shape mirrors the Zod schema in
`backend/src/modules/hermes-ops/types.ts`.
2. `.gitea/workflows/ci.yml` re-enables the previously-commented-out
E2E step. Adds a preceding `playwright install --with-deps
chromium` step so the runner pulls the browser fresh per run.
The web suite starts its own Next dev server via Playwright's
`webServer` config (`pnpm exec next dev -p 3200`), so we do NOT
start the backend in CI — every backend route used by the suite
is mocked via `page.route` (auth, csrf, services, deployments,
health/cache, seed, hermes-ops).
Verified locally: `pnpm exec playwright test` → 6 passed in 19.5s
(2 hermes specs + 4 dashboard/login specs across desktop + mobile).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Add a safe, reversible path to validate the fleet coordinator against the proven
single-host path BEFORE cutover, via three independently-toggleable flags:
  AQ_FLEET=0         pure offline (zero coordinator calls; offline path unchanged)
  AQ_FLEET_ROUTE=1   route_via_service: coordinator authoritative for claim (default = P2-S3)
  AQ_FLEET_ROUTE=0   local inbox authoritative (coordinator not used to source work)
  AQ_FLEET_SHADOW=1  dual-run (needs AQ_FLEET=1 + ROUTE=0): query coordinator in parallel,
                     record divergence, NEVER act on it
Precedence: SHADOW only when ROUTE=0; if ROUTE=1 + SHADOW=1, ROUTE wins (one-shot warning).
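The precedence rules above can be sketched as a small pure function. This is an illustration in TypeScript only; the real logic lives in shell (`lib/fleet-client.sh`), and the names `resolveFleetFlags` and `FleetMode` are hypothetical. It assumes AQ_FLEET_ROUTE defaults to 1 when unset, which is one reading of "(default = P2-S3)".

```typescript
// Hypothetical sketch of the flag model; not the shell implementation.
type FleetMode = "offline" | "route" | "local" | "shadow";

interface FleetFlags {
  mode: FleetMode;
  warning?: string; // set when SHADOW=1 is ignored because ROUTE=1 wins
}

type Env = Record<string, string | undefined>;

function resolveFleetFlags(env: Env): FleetFlags {
  // AQ_FLEET unset or 0: pure offline, the other flags are irrelevant.
  if (env.AQ_FLEET !== "1") return { mode: "offline" };

  // Assumption: ROUTE defaults to 1 (coordinator authoritative) when unset.
  const route = (env.AQ_FLEET_ROUTE ?? "1") === "1";
  const shadow = env.AQ_FLEET_SHADOW === "1";

  if (route) {
    // ROUTE wins over SHADOW; the caller is expected to warn once.
    return shadow
      ? { mode: "route", warning: "AQ_FLEET_SHADOW ignored: AQ_FLEET_ROUTE=1 takes precedence" }
      : { mode: "route" };
  }
  // ROUTE=0: local inbox is authoritative; shadow adds a read-only dual-run.
  return { mode: shadow ? "shadow" : "local" };
}
```

For example, `resolveFleetFlags({ AQ_FLEET: "1", AQ_FLEET_ROUTE: "0", AQ_FLEET_SHADOW: "1" })` resolves to shadow mode, while the same call with AQ_FLEET_ROUTE=1 resolves to route mode and carries a warning.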
lib/fleet-client.sh: fleet_route_enabled / fleet_shadow_enabled / fleet_flags_warn_once /
fleet_flags_state; fleet_shadow_claim (read-only — isolated `-shadow` factoryId +
dryRun, releases any real lease, never materializes), fleet_shadow_compare
(AGREE/DIVERGE/COORD_EMPTY/LOCAL_EMPTY → .state/fleet-shadow.log), fleet_shadow_report
(shadow:true, response never acted on), cmd_fleet_shadow_report (counts + agreement rate).
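A minimal sketch of the comparison and the report summary, in TypeScript for illustration (the real code is shell). `compareShadow` and `summarize` are hypothetical names. Two points are assumptions, not taken from the source: that "both sides empty" counts as AGREE, and that the agreement rate is AGREE divided by the total number of comparisons.

```typescript
// Hypothetical sketch of shadow divergence classification; not the shell code.
type ShadowOutcome = "AGREE" | "DIVERGE" | "COORD_EMPTY" | "LOCAL_EMPTY";

// localJob: job id chosen by the authoritative local inbox (null if none).
// coordJob: job id the coordinator would have handed out (null if none).
function compareShadow(localJob: string | null, coordJob: string | null): ShadowOutcome {
  if (localJob === null && coordJob === null) return "AGREE"; // assumption: both idle = agreement
  if (coordJob === null) return "COORD_EMPTY";
  if (localJob === null) return "LOCAL_EMPTY";
  return localJob === coordJob ? "AGREE" : "DIVERGE";
}

interface ShadowSummary {
  counts: Record<ShadowOutcome, number>;
  total: number;
  agreementRate: number; // assumption: AGREE / total, 0 when there are no samples
}

function summarize(outcomes: ShadowOutcome[]): ShadowSummary {
  const counts: Record<ShadowOutcome, number> = {
    AGREE: 0,
    DIVERGE: 0,
    COORD_EMPTY: 0,
    LOCAL_EMPTY: 0,
  };
  for (const o of outcomes) counts[o] += 1;
  const total = outcomes.length;
  return { counts, total, agreementRate: total === 0 ? 0 : counts.AGREE / total };
}
```

The result is only recorded; nothing in this sketch feeds back into job selection, matching the "never act on it" rule.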
agent-queue.sh: ROUTE-gate claim sourcing (claim only when route_via_service);
shadow hook after the local authoritative decision each iteration (best-effort,
error-swallowed — shadow can never fail a real job); `fleet-shadow-report` subcommand
+ help; resolved flags surfaced in `status`/`fleet-status`. tryClaim/fence/offline
paths unchanged.
Strictly side-effect-free on real job state: shadow never ships, quarantines, or
mutates real jobs. Offline path byte-for-byte unchanged when AQ_FLEET=0.
selftest.sh: +8 checks (shadow AGREE/DIVERGE/COORD_EMPTY, non-fatal 5xx, ROUTE
precedence, ROUTE=0 local-authoritative, fleet-shadow-report summary, shadow_report
unit). 60 prior checks unchanged → 68 total green. README + GIGAFACTORY_ROADMAP
document the flag model + cutover ladder.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
First half of Phase 5 P2 (the "structured backend logging" piece;
E2E-in-CI lands separately so the diff stays reviewable).
Adds `lib/logger.ts` exporting a singleton pino instance shared between
Fastify (via `loggerInstance`) and any non-request code path. One
configured logger across the backend means uniform formatting,
redaction, and log-level control:
- LOG_LEVEL env knob (default: info when NODE_ENV=production, debug
otherwise). Documented in `.env.example`.
- Built-in redaction for Authorization / Cookie headers and the
common secret-shaped field names (password, token, refreshToken,
accessToken, csrfToken, JWT_SECRET, CSRF_SECRET, ENCRYPTION_KEY,
COSMOS_KEY, AZURE_CLIENT_SECRET) so an accidental
`req.log.info(req.body)` or `logger.error({ err, config }, …)`
won't dump credentials. This is a backstop, not the primary
defense — call sites should still avoid logging raw config/req.
- JSON to stdout in every environment. Pipe through `pino-pretty`
locally if you want pretty output; we deliberately don't bundle
pino-pretty as a runtime dep.
- `childLogger(module)` helper tags log lines with their origin so
repositories/background workers don't have to repeat the module
name on every line.
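The level default and the redaction backstop can be illustrated without any dependency. This is a sketch of the idea only: the commit uses pino's built-in redaction, which is configured by paths, whereas the hypothetical `redact` helper below matches key names case-insensitively at any depth. `resolveLogLevel`, `redact`, and the `"[Redacted]"` placeholder are assumptions for illustration.

```typescript
// Dependency-free sketch; NOT how pino implements redaction.
type Env = Record<string, string | undefined>;

// An explicit LOG_LEVEL wins; otherwise info in production, debug elsewhere.
function resolveLogLevel(env: Env): string {
  if (env.LOG_LEVEL) return env.LOG_LEVEL;
  return env.NODE_ENV === "production" ? "info" : "debug";
}

// Secret-shaped names from the commit message, lower-cased for matching.
const SECRET_KEYS = new Set([
  "authorization", "cookie", "password", "token", "refreshtoken",
  "accesstoken", "csrftoken", "jwt_secret", "csrf_secret",
  "encryption_key", "cosmos_key", "azure_client_secret",
]);

// Returns a deep copy with secret-shaped fields replaced.
// Note: no cycle detection, so only use on plain JSON-like data.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SECRET_KEYS.has(key.toLowerCase()) ? "[Redacted]" : redact(inner);
    }
    return out;
  }
  return value;
}
```

As the commit notes, this kind of filter is a backstop; call sites should still avoid logging raw config or request bodies.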
Sweeps the runtime `console.error` sites that lose request context
(deployment orchestrator background fire-and-forget, system docker
stats/cleanup, backup CRUD, vm getAllContainers) onto the structured
logger. CLI-only modules (`scripts/run-migrations.ts`,
`migrations/index.ts`, `cosmos-init.ts` startup, `azure-keyvault.ts`,
`config.ts` env warnings, `lib/migrations.ts` no-op message) keep
`console.*` for now — they run before Fastify is up and are queued for
a separate cleanup pass.
Tests, typecheck, lint (0 errors), build green. Coverage gate still
passing (≥95% lines on every gated file).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Closes the final Phase 5 P1 checkbox and REVIEW_ACTIONS #6.
The backend container has root-equivalent host access via the docker
socket, host log mounts, and the VM scripts mount, but until now the
"who can do what to the host?" answer was scattered across compose
files and route handlers. This commit centralizes it.
DEPLOYMENT.md gains a "Privilege Surface" section that lists:
- every host mount + container path + mode + purpose
- every shell-outing route, the actual commands it runs, and the
auth gate on each
- what an admin token can do today (≈ host shell)
- five known sharp edges (un-allow-listed container names, unvalidated
projectPath, no per-route audit-log on shell-outs, container runs
as root, global rate-limit only)
- a P1 → P3 mitigation roadmap (allow-list wrapper around shell-outs,
projectPath validation, audit-logging shell-outs, drop root in
container, replace docker.sock with a verb-restricted proxy)
Concurrent code fix: `POST /code-quality/check` was reachable
**unauthenticated** despite shelling out to `npm run typecheck/lint/
build/test:run` in a caller-supplied `projectPath`. Added
`preHandler: requireAdmin` to bring it in line with every other
shell-outing route in the dashboard. Same commit because the
documentation table promises this gate exists.
REVIEW_ACTIONS #6 marked RESOLVED with the rationale; roadmap checkbox
ticked. Tests, typecheck, lint (0 errors), build, and coverage gate
(≥95% lines on every gated file) all stay green.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Closes the Phase 5 P1 doc-drift checkbox and REVIEW_ACTIONS #5.
The 3000-vs-3049 confusion came from prose claims in three docs that
each picked a different "right" answer. The truth is: the web container
listens on :3000; docker-compose maps `127.0.0.1:3049:3000`; production
is fronted by Traefik on `https://devops.bytelyst.com`. Encoding that
explicitly so future readers don't have to dig through compose files:
- DEPLOYMENT.md becomes canonical. Its content is now the (more
accurate) old DEPLOYMENT_GUIDE.md merged with a "Ports — quick
reference" table covering Local dev / Docker Compose / Production
Traefik, plus a Local-development section for `pnpm dev`.
- DEPLOYMENT_GUIDE.md → 5-line redirect stub pointing at
DEPLOYMENT.md (kept for `deploy.sh` and any external links).
- deploy.sh updated to point at DEPLOYMENT.md.
- README.md "Web port: 3000" line rewritten to spell out container
vs Compose-host vs dev-mode and link to the port table.
- ENDPOINTS.md gets a top-of-file note: every `localhost:3000` URL
in that file is the `pnpm dev` workflow; substitute `:3049` for
the Dockerized stack.
- REVIEW_ACTIONS.md #5 marked RESOLVED with the rationale.
No code, behavior, lint, or test changes.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Closes the long-standing SSE TODO. The previous attempt with
`fastify-sse-v2 ^4` was incompatible with Fastify 5 and was never wired
in; the README/DEPLOYMENT.md kept advertising "real-time log streaming"
that didn't exist. The web client never used EventSource;
`web/src/lib/api.ts` already polls `/deployments/:id/logs` via the normal
`apiRequest` helper.
Resolution: remove the claim rather than ship the feature.
- drop `fastify-sse-v2` dep from `backend/package.json` + lockfile
- delete the commented-out plugin import + register in `server.ts`,
replace with a NOTE explaining the JSON-polling decision and how
to add a stream later (`reply.raw`)
- remove the `TODO: Re-enable SSE` comment in `deployments/routes.ts`;
the endpoint already returns JSON, document that explicitly
- rewrite the README "Deployment Log Streaming" section as
"Deployment Logs" (JSON-polled, no SSE); fix the endpoint table
- flip the DEPLOYMENT.md bullet from "Real-time log streaming (SSE)"
to "Deployment log retrieval (JSON polling — no SSE)"
- mark REVIEW_ACTIONS #4 RESOLVED with the reasoning
- tick the roadmap checkbox
If a real-time stream is wanted later, ship it explicitly via
`reply.raw` and update README/DEPLOYMENT.md/the route comment in the
same change. Don't reintroduce a half-disabled plugin.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Closes the Phase 5 P1 testing checkbox. Adds 35 new unit tests across the
modules called out in the roadmap and wires a v8 coverage gate into CI.
Coverage of newly-tested files (lines / branches):
  lib/auth.ts                           94.4% / 100%
  lib/csrf.ts                           95.1% / 90%
  modules/health/repository.ts          100%  / 92%
  modules/deployments/orchestrator.ts   95.2% / 74%
  modules/services/repository.ts        100%  / 100%
  modules/hermes-ops/repository.ts      95.2% / 68%
Threshold (lines/funcs/stmts ≥85%, branches ≥65%) is scoped to those six
files via `coverage.include` so untested legacy modules (vm, system,
audit, route handlers) report but don't gate. Add files there as they
gain real tests — ratchet up, never relax.
Test approach mirrors the existing services/hermes-ops suites: hoisted
mocks for I/O (fetch, child_process, fs/promises, cosmos-init), real
JOSE-signed JWTs for the auth path, fake timers for cache TTL and CSRF
expiry assertions.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
- Short-TTL (30s) snapshot cache + in-flight coalescing so the panel poll and
concurrent refreshes don't fan out ~20 systemctl/git/ps/du subprocesses each
time; snapshot carries a `cached` flag and `getHermesOpsSnapshot({force})`.
- Distinguish "unit inactive" (down) from "probe couldn't run" (unknown): a new
exec() wrapper reports whether the command actually ran (ENOENT/timeout =
unknown) vs exited non-zero with output (e.g. systemctl is-active -> inactive).
Per-field ProbeStatus on gateway/dashboard/timer/repo; warnings differentiate
"is not active" from "status could not be determined".
- Robust Bheem/Uma checks: `runuser -u uma -- systemctl --user is-active/
is-enabled` with a ps / existsSync fallback so a failed probe degrades to the
legacy check instead of a false "down".
- Zod schema (HermesOpsSnapshotSchema) as the stable typed contract; the route
validates output before sending. New status fields are additive (active/
enabled/url/etc. preserved) so the existing web client is unaffected.
- Unit tests (mock execFile/fs): healthy snapshot, down vs unknown mapping,
runuser->ps fallback, unreadable repo, cache hit + force bypass, request
coalescing. Backend: 16 tests green.
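The short-TTL cache with in-flight coalescing from the first bullet can be sketched as follows. This is a generic illustration, not the code in the hermes-ops repository: `createSnapshotCache` and its `collect`/`now` parameters are hypothetical, and it assumes a forced refresh joins a collection that is already running instead of starting a second one.

```typescript
// Hypothetical sketch of TTL caching + request coalescing.
interface Snapshot<T> {
  data: T;
  cached: boolean; // true when served from the TTL cache
}

function createSnapshotCache<T>(
  collect: () => Promise<T>, // the expensive probe fan-out
  ttlMs: number,
  now: () => number = Date.now, // injectable clock for tests
) {
  let value: { data: T; at: number } | undefined;
  let inFlight: Promise<T> | undefined;

  return async function get(opts: { force?: boolean } = {}): Promise<Snapshot<T>> {
    // Fresh cached value and no forced refresh: skip the probes entirely.
    if (!opts.force && value !== undefined && now() - value.at < ttlMs) {
      return { data: value.data, cached: true };
    }
    // Coalesce: concurrent callers share one running collection.
    if (inFlight === undefined) {
      inFlight = collect()
        .then((data) => {
          value = { data, at: now() };
          return data;
        })
        .finally(() => {
          inFlight = undefined;
        });
    }
    return { data: await inFlight, cached: false };
  };
}
```

With `ttlMs` set to 30000, two overlapping calls trigger a single `collect()`, a later call inside the window returns `cached: true`, and `get({ force: true })` bypasses the cache.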
Roadmap: check off Phase 1 items and Phase 5 P0 in hermes_dashboard_v2_roadmap.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- ci.yml: actions/checkout into the runner workspace instead of cd-ing into a
hard-coded host path and `git reset --hard origin/main` on the live checkout;
install via `pnpm install:gitea` (self-contained, no sibling common-plat
checkout); E2E step left as a TODO pointer (ci-e2e-hardening, Phase 5 P2).
- Fix the same stale /opt/bytelyst/bytelyst-devops-tools path in deploy.sh,
scripts/deploy-hotcopy.sh, DEPLOYMENT.md, DEPLOYMENT_GUIDE.md.
- Replace the no-op `lint` echoes with real ESLint 9 flat configs (js +
typescript-eslint recommended) for backend and web; add a root `pnpm lint`.
- Fix the 10 errors lint surfaced, incl. require('os') in an ESM backend
(system/repository.ts -> import * as os), prefer-const x4, and a ternary
expression-statement in web vm/page.tsx.
Verified locally: secret-scan, lint (0 errors; correctly fails on bad code),
typecheck, unit tests (backend 9 / web 11), and build all green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sources lib/fleet-client.sh and adds a few fleet_enabled-gated hooks so the offline
git-queue path is byte-for-byte unchanged when AQ_FLEET is unset/0:
- cmd_run: register at loop start; per-iteration heartbeat (cadence) + lease renew
for in-flight fleet jobs + claim one coordinator job into inbox when capacity.
- meta: persist fleet_job_id + fleet_lease_epoch (from claim frontmatter).
- run_worker: report `building` (with WIP checkpoint) after WIP setup and `review`
before accepting the agent's output — a FENCED (stale-epoch/409) report self-aborts
and quarantines (never ships); 5xx/unreachable degrades (finish locally).
- _auto_echo: for fleet jobs route the outcome echo through the coordinator
(fleet_events) instead of the direct tracker echo; offline jobs unchanged.
- cmd_ship: fence-check before shipping a fleet job; release lease after.
- status: show factory id + per-job fleet=<id>@e<epoch>; insights lists fleet_* fields.
- dispatch + help: `fleet-status` command + a FLEET env section.
New sourced library implementing the factory side of the Phase-2 `fleet`
coordinator contract — curl-only + POSIX awk, reusing the Slice-4 HTTP/JSON
helper patterns, no new deps. Every function is a no-op unless AQ_FLEET=1.
- fleet_enabled / fleet_api (AQ_FLEET_API_CMD test seam) / _fleet_call
- fleet_detect_caps (reuses detect_capabilities) -> JSON caps array
- fleet_heartbeat (+ _maybe cadence): registration == first heartbeat
- fleet_claim: POST /fleet/claim, parse job id/bodyMd/leaseEpoch, materialize a
transient local .md (fleet-job-id + fleet-lease-epoch in frontmatter)
- fleet_report: PATCH fenced stage transition {stage, leaseEpoch, checkpoint?};
returns ok / FENCED(2, stale epoch -> self-abort) / degraded(1, unreachable)
- fleet_lease_renew / fleet_lease_release / fleet_renew_active (fenced)
- fleet_quarantine: park a reclaimed (fenced) job in failed/ for human triage
- cmd_fleet_status: register + print factory identity/caps
Report payloads carry only stage/epoch/checkpoint — never prompt/bodyMd/token.
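The three report outcomes and the payload restriction can be illustrated like this. It is a TypeScript sketch of behaviour that the commit implements in shell with curl; `classifyReport`, `nextAction`, and `buildReportPayload` are hypothetical names. Treating any non-2xx, non-409 status as degraded is an assumption; the source only names 409, 5xx, and unreachable.

```typescript
// Hypothetical sketch of the fenced-report contract; not the shell code.
type ReportResult =
  | { kind: "ok"; code: 0 }
  | { kind: "degraded"; code: 1 }
  | { kind: "fenced"; code: 2 };

// httpStatus is null when the coordinator could not be reached at all.
function classifyReport(httpStatus: number | null): ReportResult {
  if (httpStatus !== null && httpStatus >= 200 && httpStatus < 300) {
    return { kind: "ok", code: 0 };
  }
  if (httpStatus === 409) return { kind: "fenced", code: 2 }; // stale lease epoch
  return { kind: "degraded", code: 1 }; // 5xx / unreachable (other codes: assumption)
}

function nextAction(result: ReportResult): string {
  switch (result.kind) {
    case "ok":
      return "continue";
    case "fenced":
      return "self-abort and quarantine (never ship)";
    case "degraded":
      return "finish locally";
  }
}

// Only stage, epoch, and an optional checkpoint go over the wire:
// no prompt, bodyMd, or token fields.
function buildReportPayload(stage: string, leaseEpoch: number, checkpoint?: string): string {
  const payload: { stage: string; leaseEpoch: number; checkpoint?: string } = { stage, leaseEpoch };
  if (checkpoint !== undefined) payload.checkpoint = checkpoint;
  return JSON.stringify(payload);
}
```

Keeping the payload builder this narrow makes it hard to leak job content by accident: anything not listed in the type simply cannot be sent.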
vm-health-check.sh:
- check_gpu(): nvidia-smi probe; "CPU-only" OK on this VM (no GPU)
- check_image_freshness(): flag containers running images >30d old.
Skips third-party images (gitea, grafana, prom, mcr.microsoft, axllent,
caddy, traefik, valkey, cadvisor) — they have their own rebuild cadence.
Currently flags 19 stale product images (~60d old).
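The freshness rule can be sketched as a pure function. The actual check is a shell function in vm-health-check.sh; this TypeScript version is illustrative only, `isStaleImage` is a hypothetical name, and matching third-party images by substring is an assumption about how the skip list is applied.

```typescript
// Hypothetical sketch of the >30d image freshness rule.
const THIRD_PARTY_MARKERS = [
  "gitea", "grafana", "prom", "mcr.microsoft", "axllent",
  "caddy", "traefik", "valkey", "cadvisor",
];

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function isStaleImage(image: string, createdAt: Date, now: Date, maxAgeDays = 30): boolean {
  // Third-party images follow their own rebuild cadence and are never flagged.
  // Assumption: plain substring match, so a short marker like "prom" can
  // also match unrelated image names.
  if (THIRD_PARTY_MARKERS.some((marker) => image.includes(marker))) return false;
  const ageDays = (now.getTime() - createdAt.getTime()) / MS_PER_DAY;
  return ageDays > maxAgeDays;
}
```

A product image built 60 days ago is flagged, while a Grafana image of the same age is skipped.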
chaos-validation.sh:
- Monthly chaos test: kill PID 1 in chronomind-web, wait up to 35 min
for docker-health-watchdog to detect + restart. Telegram pass/fail.
- Refuses to run if target not healthy. systemd timer fires 1st of month
at 10:00 UTC (after 08:00 weekly digest).
vm-io-anomaly-check.sh:
- 6h avg sda write rate; transition alerts at WARN (1 GB/hr) /
CRIT (2.5 GB/hr). De-dupes via /var/log/vm-io-anomaly-state so the
alert fires once per transition, not every 6h. Current baseline:
~1.94 GB/hr (orphan-container state-file writes; see Phase 0.3).
- Reports recovery to OK when rate drops back.
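The transition-only alerting can be sketched as below. The real script is shell and persists the previous level in /var/log/vm-io-anomaly-state; this TypeScript sketch takes the previous level as an argument instead. `classifyRate` and `nextAlert` are hypothetical names, and treating the thresholds as inclusive is an assumption.

```typescript
// Hypothetical sketch of WARN/CRIT transition alerts with de-duplication.
type Level = "OK" | "WARN" | "CRIT";

const WARN_GB_PER_HR = 1;
const CRIT_GB_PER_HR = 2.5;

// Assumption: thresholds are inclusive (>=).
function classifyRate(gbPerHour: number): Level {
  if (gbPerHour >= CRIT_GB_PER_HR) return "CRIT";
  if (gbPerHour >= WARN_GB_PER_HR) return "WARN";
  return "OK";
}

interface AlertDecision {
  level: Level;          // new level to persist in the state file
  alert: string | null;  // message to send, or null when nothing changed
}

function nextAlert(previous: Level, gbPerHour: number): AlertDecision {
  const level = classifyRate(gbPerHour);
  // Same level as last run: stay silent so the alert fires once per transition.
  if (level === previous) return { level, alert: null };
  const rate = gbPerHour.toFixed(2);
  const alert =
    level === "OK"
      ? `recovered to OK (${rate} GB/hr)`
      : `${previous} -> ${level} (${rate} GB/hr)`;
  return { level, alert };
}
```

At the current baseline of about 1.94 GB/hr, the first run from an OK state alerts once at WARN and later runs at the same level stay quiet until the rate crosses a threshold again.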
vm/page.tsx: gpu + image_freshness added to CHECK_META so they render
with proper icon/label and slot into CHECK_ORDER.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>