Closes the final Phase 5 P1 checkbox and REVIEW_ACTIONS #6.
The backend container has root-equivalent host access via the docker
socket, host log mounts, and the VM scripts mount, but until now the
"who can do what to the host?" answer was scattered across compose
files and route handlers. This commit centralizes it.
DEPLOYMENT.md gains a "Privilege Surface" section that lists:
- every host mount + container path + mode + purpose
- every shell-outing route, the actual commands it runs, and the
auth gate on each
- what an admin token can do today (≈ host shell)
- five known sharp edges (un-allow-listed container names, unvalidated
projectPath, no per-route audit-log on shell-outs, container runs
as root, global rate-limit only)
- a P1 → P3 mitigation roadmap (allow-list wrapper around shell-outs,
projectPath validation, audit-logging shell-outs, drop root in
container, replace docker.sock with a verb-restricted proxy)
Concurrent code fix: `POST /code-quality/check` was reachable
**unauthenticated** despite shelling out to `npm run typecheck/lint/
build/test:run` in a caller-supplied `projectPath`. Added
`preHandler: requireAdmin` to bring it in line with every other
shell-outing route in the dashboard. Same commit because the
documentation table promises this gate exists.
REVIEW_ACTIONS #6 marked RESOLVED with the rationale; roadmap checkbox
ticked. Tests, typecheck, lint (0 errors), build, and coverage gate
(≥95% lines on every gated file) all stay green.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Closes the Phase 5 P1 doc-drift checkbox and REVIEW_ACTIONS #5.
The 3000-vs-3049 confusion came from prose claims in three docs that
each picked a different "right" answer. The truth is: the web container
listens on :3000; docker-compose maps `127.0.0.1:3049:3000`; production
is fronted by Traefik on `https://devops.bytelyst.com`. Encoding that
explicitly so future readers don't have to dig through compose files:
- DEPLOYMENT.md becomes canonical. Its content is now the (more
accurate) old DEPLOYMENT_GUIDE.md merged with a "Ports — quick
reference" table covering Local dev / Docker Compose / Production
Traefik, plus a Local-development section for `pnpm dev`.
- DEPLOYMENT_GUIDE.md → 5-line redirect stub pointing at
DEPLOYMENT.md (kept for `deploy.sh` and any external links).
- deploy.sh updated to point at DEPLOYMENT.md.
- README.md "Web port: 3000" line rewritten to spell out container
vs Compose-host vs dev-mode and link to the port table.
- ENDPOINTS.md gets a top-of-file note: every `localhost:3000` URL
in that file is the `pnpm dev` workflow; substitute `:3049` for
the Dockerized stack.
- REVIEW_ACTIONS.md #5 marked RESOLVED with the rationale.
No code, behavior, lint, or test changes.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Closes the long-standing SSE TODO. The previous attempt with
`fastify-sse-v2 ^4` was incompatible with Fastify 5 and was never wired
in; the README/DEPLOYMENT.md kept advertising "real-time log streaming"
that didn't exist. The web client never used EventSource — `web/src/
lib/api.ts` already polls `/deployments/:id/logs` via the normal
`apiRequest` helper.
Resolution: remove the claim, not ship the feature.
- drop `fastify-sse-v2` dep from `backend/package.json` + lockfile
- delete the commented-out plugin import + register in `server.ts`,
replace with a NOTE explaining the JSON-polling decision and how
to add a stream later (`reply.raw`)
- remove the `TODO: Re-enable SSE` comment in `deployments/routes.ts`;
the endpoint already returns JSON, document that explicitly
- rewrite the README "Deployment Log Streaming" section as
"Deployment Logs" (JSON-polled, no SSE); fix the endpoint table
- flip the DEPLOYMENT.md bullet from "Real-time log streaming (SSE)"
to "Deployment log retrieval (JSON polling — no SSE)"
- mark REVIEW_ACTIONS #4 RESOLVED with the reasoning
- tick the roadmap checkbox
If a real-time stream is wanted later, ship it explicitly via
`reply.raw` and update README/DEPLOYMENT.md/the route comment in the
same change. Don't reintroduce a half-disabled plugin.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Closes the Phase 5 P1 testing checkbox. Adds 35 new unit tests across the
modules called out in the roadmap and wires a v8 coverage gate into CI.
Coverage of newly-tested files (lines / branches):
lib/auth.ts 94.4% / 100%
lib/csrf.ts 95.1% / 90%
modules/health/repository.ts 100% / 92%
modules/deployments/orchestrator.ts 95.2% / 74%
modules/services/repository.ts 100% / 100%
modules/hermes-ops/repository.ts 95.2% / 68%
Threshold (lines/funcs/stmts ≥85%, branches ≥65%) is scoped to those six
files via `coverage.include` so untested legacy modules (vm, system,
audit, route handlers) report but don't gate. Add files there as they
gain real tests — ratchet up, never relax.
Test approach mirrors the existing services/hermes-ops suites: hoisted
mocks for I/O (fetch, child_process, fs/promises, cosmos-init), real
JOSE-signed JWTs for the auth path, fake timers for cache TTL and CSRF
expiry assertions.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
- Short-TTL (30s) snapshot cache + in-flight coalescing so the panel poll and
concurrent refreshes don't fan out ~20 systemctl/git/ps/du subprocesses each
time; snapshot carries a `cached` flag and `getHermesOpsSnapshot({force})`.
- Distinguish "unit inactive" (down) from "probe couldn't run" (unknown): a new
exec() wrapper reports whether the command actually ran (ENOENT/timeout =
unknown) vs exited non-zero with output (e.g. systemctl is-active -> inactive).
Per-field ProbeStatus on gateway/dashboard/timer/repo; warnings differentiate
"is not active" from "status could not be determined".
- Robust Bheem/Uma checks: `runuser -u uma -- systemctl --user is-active/
is-enabled` with a ps / existsSync fallback so a failed probe degrades to the
legacy check instead of a false "down".
- Zod schema (HermesOpsSnapshotSchema) as the stable typed contract; the route
validates output before sending. New status fields are additive (active/
enabled/url/etc. preserved) so the existing web client is unaffected.
- Unit tests (mock execFile/fs): healthy snapshot, down vs unknown mapping,
runuser->ps fallback, unreadable repo, cache hit + force bypass, request
coalescing. Backend: 16 tests green.
Roadmap: check off Phase 1 items and Phase 5 P0 in hermes_dashboard_v2_roadmap.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- ci.yml: actions/checkout into the runner workspace instead of cd-ing into a
hard-coded host path and `git reset --hard origin/main` on the live checkout;
install via `pnpm install:gitea` (self-contained, no sibling common-plat
checkout); E2E step left as a TODO pointer (ci-e2e-hardening, Phase 5 P2).
- Fix the same stale /opt/bytelyst/bytelyst-devops-tools path in deploy.sh,
scripts/deploy-hotcopy.sh, DEPLOYMENT.md, DEPLOYMENT_GUIDE.md.
- Replace the no-op `lint` echoes with real ESLint 9 flat configs (js +
typescript-eslint recommended) for backend and web; add a root `pnpm lint`.
- Fix the 10 errors lint surfaced, incl. require('os') in an ESM backend
(system/repository.ts -> import * as os), prefer-const x4, and a ternary
expression-statement in web vm/page.tsx.
Verified locally: secret-scan, lint (0 errors; correctly fails on bad code),
typecheck, unit tests (backend 9 / web 11), and build all green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sources lib/fleet-client.sh and adds a few fleet_enabled-gated hooks so the offline
git-queue path is byte-for-byte unchanged when AQ_FLEET is unset/0:
- cmd_run: register at loop start; per-iteration heartbeat (cadence) + lease renew
for in-flight fleet jobs + claim one coordinator job into inbox when capacity.
- meta: persist fleet_job_id + fleet_lease_epoch (from claim frontmatter).
- run_worker: report `building` (with WIP checkpoint) after WIP setup and `review`
before accepting the agent's output — a FENCED (stale-epoch/409) report self-aborts
and quarantines (never ships); 5xx/unreachable degrades (finish locally).
- _auto_echo: for fleet jobs route the outcome echo through the coordinator
(fleet_events) instead of the direct tracker echo; offline jobs unchanged.
- cmd_ship: fence-check before shipping a fleet job; release lease after.
- status: show factory id + per-job fleet=<id>@e<epoch>; insights lists fleet_* fields.
- dispatch + help: `fleet-status` command + a FLEET env section.
New sourced library implementing the factory side of the Phase-2 `fleet`
coordinator contract — curl-only + POSIX awk, reusing the Slice-4 HTTP/JSON
helper patterns, no new deps. Every function is a no-op unless AQ_FLEET=1.
- fleet_enabled / fleet_api (AQ_FLEET_API_CMD test seam) / _fleet_call
- fleet_detect_caps (reuses detect_capabilities) -> JSON caps array
- fleet_heartbeat (+ _maybe cadence): registration == first heartbeat
- fleet_claim: POST /fleet/claim, parse job id/bodyMd/leaseEpoch, materialize a
transient local .md (fleet-job-id + fleet-lease-epoch in frontmatter)
- fleet_report: PATCH fenced stage transition {stage, leaseEpoch, checkpoint?};
returns ok / FENCED(2, stale epoch -> self-abort) / degraded(1, unreachable)
- fleet_lease_renew / fleet_lease_release / fleet_renew_active (fenced)
- fleet_quarantine: park a reclaimed (fenced) job in failed/ for human triage
- cmd_fleet_status: register + print factory identity/caps
Report payloads carry only stage/epoch/checkpoint — never prompt/bodyMd/token.
vm-health-check.sh:
- check_gpu(): nvidia-smi probe; "CPU-only" OK on this VM (no GPU)
- check_image_freshness(): flag containers running images >30d old.
Skips third-party images (gitea, grafana, prom, mcr.microsoft, axllent,
caddy, traefik, valkey, cadvisor) — they have their own rebuild cadence.
Currently flags 19 stale product images (~60d old).
chaos-validation.sh:
- Monthly chaos test: kill PID 1 in chronomind-web, wait up to 35 min
for docker-health-watchdog to detect + restart. Telegram pass/fail.
- Refuses to run if target not healthy. systemd timer fires 1st of month
at 10:00 UTC (after 08:00 weekly digest).
vm-io-anomaly-check.sh:
- 6h avg sda write rate; transition alerts at WARN (1 GB/hr) /
CRIT (2.5 GB/hr). De-dupes via /var/log/vm-io-anomaly-state so the
alert fires once per transition, not every 6h. Current baseline:
~1.94 GB/hr (orphan-container state-file writes; see Phase 0.3).
- Reports recovery to OK when rate drops back.
vm/page.tsx: gpu + image_freshness added to CHECK_META so they render
with proper icon/label and slot into CHECK_ORDER.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- prometheus.ts: new Prometheus client with 7d/30d range queries for disk,
memory, swap, CPU steal, and disk I/O (GB/hr); getWeeklyDigestData()
aggregates all metrics for digest and API endpoint
- routes.ts: GET /api/vm/metrics/trend?metric=…&range=… and
GET /api/vm/weekly-digest endpoints
- api.ts: TrendPoint/TrendSeries types; getTrend() and getMemoryTrend()
added to vmApi
- vm/page.tsx: Sparkline (pure SVG polyline+fill), TrendCard with
latest/avg/peak and threshold colouring, TrendsPanel with lazy load
on first open; Promise.allSettled() isolation for all 5 data panels
- vm-weekly-digest.sh: weekly Telegram digest via docker exec into
devops-backend to reach Prometheus; emoji severity indicators; cron
summary from /var/log/vm-cleanup.log
- systemd timer: Mon 08:00 UTC, Persistent=true (fires on next boot
if missed); first trigger 2026-06-02
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements §10 single-host tracker integration, closing the last Phase-1 §14 item:
- tracker_api: one curl-only HTTP wrapper (base URL + bearer + productId header),
overridable via AQ_TRACKER_API_CMD so tests need no live service. Emits the
response body + a trailing HTTP-code line; _api_call splits into API_BODY/API_CODE.
- aq from-tracker <ITEM_ID>: GET the Item, map title/description -> job body,
labels (engine-class:/profile:/priority:/cap:) + Item priority -> frontmatter,
and stamp tracker-item + a stable idempotency-key tracker-<id>. Materializes a
.md into inbox/ via cmd_add; idempotent (Slice 1 dedupe) so a re-pull never dups.
JSON parsed with POSIX awk (no jq) — mac + linux safe.
- aq to-tracker <job>: one-way echo (child -> tracker, §24.5). PATCHes the Item
status (building/review/testing->in_progress, shipped->done, failures->wont_fix,
all overridable) and posts a metrics-only comment (result/attempts/duration/
tokens/cost/diff — NEVER prompt content or secrets). Idempotent via meta
tracker_echoed; an echo failure (e.g. HTTP 500) is logged and non-fatal — the
tracker is downstream, never authoritative for execution.
- Opt-in auto-echo (AQ_TRACKER_AUTO=1, default OFF): the worker echoes on each
transition (building via cmd_run, review/testing/failed via run_worker, shipped
via ship/promote); never blocks or fails a job.
- status + insights surface tracker-item and the last echoed status.
curl-only HTTP; no new runtime deps; conventional + backward-compatible.
Implements roadmap §6 (profiles) and §5 deps on the bash runner, backward-compatible
(jobs without profile/deps behave exactly as before).
Profiles (§6):
- profile_get / profile_persona / fm_eff helpers + PROFILES_DIR (AGENT_QUEUE_PROFILES
override). A job's `profile:` inherits verify (<- default-verify), capabilities,
engine-class, prefers-engine, allowed-scope, review-policy when the job omits them;
job fields always override (precedence job > profile > default). Resolution runs via
fm_eff inside the capability gate and resolve_engine, so inherited caps/engine-class
take effect before launch.
- persona injection: the profile's persona block is prepended to the stripped body
fed to the engine (job .md unchanged on disk; nothing secret logged).
- allowed-scope guardrail (WARN-ONLY): scope_check logs a non-blocking WARNING +
records scope_warning= for changed paths outside the globs; path_in_scope is a
pure, unit-testable matcher (`dir/**` = subtree).
deps / DAG, single host (§5):
- deps reference other jobs by idempotency-key. dep_satisfied: shipped/ (hard) or
shipped/+testing/ (deps-mode: soft). deps_unmet drives a block-with-reason skip in
inbox selection (never launched/failed); cmd_status surfaces "blocked (waiting on
<keys>)". deps_would_cycle rejects cyclic submits on `add`.
- _drain_pending: `--once` drains past dep-blocked jobs (idle can't satisfy them)
while still waiting on retry/recovery backoff timers.
Meta now records effective (inherited) capabilities/engine-class/prefers-engine/
review-policy/allowed-scope so `status` reflects resolved config.
Read-only from meta: tokens or cost + attempts + line deltas + duration; recognizes the new retries_exhausted result. agent-queue.sh stays the source of truth.
Implements the single-host bash equivalents of roadmap §25 (durability/crash
recovery) and §26 (execution insights), plus §11 retry/dead-letter stand-in.
Resilience (A1-A4):
- recover_orphans + `recover` command: building/ jobs with a dead worker (dead
pid, pidstart reuse-guard) are moved back to inbox/ with attempts incremented,
on `run` startup and each loop. Idempotent (folder location is the guard).
- WIP checkpointing: for a git cwd, _wip_start creates/checks out aq/wip/<job>
and _wip_checkpoint commits changes on every exit path via an EXIT/INT/TERM
trap; never commits to main/current branch; non-git cwd skipped. RESUME: a
relaunch whose aq/wip/<job> exists checks it out first (continue from
checkpoint). wip_base persisted in a write-once sidecar.
- retry policy (now functional): retry { max, backoff, on } requeues failures
whose class (timeout|verify_failed|crash) is in `on`, honoring backoff via
next_eligible (selection skips until eligible), up to max attempts; exhaustion
-> failed/ result=retries_exhausted with the WIP branch + full log preserved.
- state integrity: all meta writes stay append-only; attempts/next_eligible/wip_*
are re-derivable; recovery is crash-safe.
Insights (B1-B6):
- per-run metrics into meta: duration_s, exit, result, attempts, and (git cwd)
files_changed/lines_added/lines_deleted from numstat wip_base..HEAD.
- parse_usage(engine, log) adapter: generic AQ_USAGE line + Claude/Codex token
heuristics; Devin/Copilot TODO; usage_estimated flag; never fabricates numbers.
- status insights sub-line; new `insights [job]` command (per-job metrics or a
recent table + per-engine token/cost/success/duration rollup).
- privacy: only metrics are recorded, never prompt content or secrets.
Backward-compatible: legacy .md and non-git cwd behave exactly as before.
New §24 + feature-catalog row:
- two delegation modes: atomic (leaf bug/feature/task) vs composite (roadmap/epic)
- introduce job kind (leaf|composite); composite routes to a planner/orchestrator
that fans out child leaf jobs as a DAG across factories/agents/profiles
- parentId hierarchy + rollup semantics (status/budget/verify/phase-gates) +
idempotent re-run (skip shipped children)
- source-of-truth/sync discipline (one record referenced by many; one-way echo)
- HYBRID decision recorded: model kind/parentId/rollup in the fleet layer now,
keep shared tracker ITEM_TYPES unchanged (label kind:roadmap), promote to a
first-class epic type later via additive migration once proven
- phasing: leaf-only P1-P2; manual composite P3; auto-decomposition planner P3->P5