- prometheus.ts: new Prometheus client with 7d/30d range queries for disk,
memory, swap, CPU steal, and disk I/O (GB/hr); getWeeklyDigestData()
aggregates all metrics for digest and API endpoint
- routes.ts: GET /api/vm/metrics/trend?metric=…&range=… and
GET /api/vm/weekly-digest endpoints
- api.ts: TrendPoint/TrendSeries types; getTrend() and getMemoryTrend()
added to vmApi
- vm/page.tsx: Sparkline (pure SVG polyline+fill), TrendCard with
latest/avg/peak and threshold colouring, TrendsPanel with lazy load
on first open; Promise.allSettled() isolation for all 5 data panels
- vm-weekly-digest.sh: weekly Telegram digest via docker exec into
devops-backend to reach Prometheus; emoji severity indicators; cron
summary from /var/log/vm-cleanup.log
- systemd timer: Mon 08:00 UTC, Persistent=true (fires on next boot
if missed); first trigger 2026-06-02
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
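The latest/avg/peak summary and threshold colouring described for TrendCard can be sketched as follows. This is an illustrative Python sketch, not the code in vm/page.tsx; the function name, the warn/crit thresholds, and the colour names are assumptions.

```python
from statistics import mean

def summarize_trend(values, warn=70.0, crit=90.0):
    """Return latest/avg/peak for a series plus a colour for the latest value.

    The warn/crit thresholds and the colour names are hypothetical.
    """
    if not values:
        return None  # nothing to plot yet
    latest = values[-1]
    if latest >= crit:
        colour = "red"
    elif latest >= warn:
        colour = "amber"
    else:
        colour = "green"
    return {"latest": latest, "avg": mean(values), "peak": max(values), "colour": colour}
```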
Implements §10 single-host tracker integration, closing the last Phase-1 §14 item:
- tracker_api: one curl-only HTTP wrapper (base URL + bearer + productId header),
overridable via AQ_TRACKER_API_CMD so tests need no live service. Emits the
response body + a trailing HTTP-code line; _api_call splits into API_BODY/API_CODE.
- aq from-tracker <ITEM_ID>: GET the Item, map title/description -> job body,
labels (engine-class:/profile:/priority:/cap:) + Item priority -> frontmatter,
and stamp tracker-item + a stable idempotency-key tracker-<id>. Materializes a
.md into inbox/ via cmd_add; idempotent (Slice 1 dedupe) so a re-pull never dups.
JSON parsed with POSIX awk (no jq) — mac + linux safe.
- aq to-tracker <job>: one-way echo (child -> tracker, §24.5). PATCHes the Item
status (building/review/testing->in_progress, shipped->done, failures->wont_fix,
all overridable) and posts a metrics-only comment (result/attempts/duration/
tokens/cost/diff — NEVER prompt content or secrets). Idempotent via meta
tracker_echoed; an echo failure (e.g. HTTP 500) is logged and non-fatal — the
tracker is downstream, never authoritative for execution.
- Opt-in auto-echo (AQ_TRACKER_AUTO=1, default OFF): the worker echoes on each
transition (building via cmd_run, review/testing/failed via run_worker, shipped
via ship/promote); never blocks or fails a job.
- status + insights surface tracker-item and the last echoed status.
curl-only HTTP; no new runtime deps; conventional + backward-compatible.
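The body/HTTP-code split and the default status mapping described above could look roughly like this. It is a Python sketch of the logic only, not the bash implementation; the function names and the "failed" key standing in for "failures" are assumptions.

```python
# Default job-state -> tracker-status mapping as listed above (all overridable).
# Using "failed" as the key for failures is an assumption of this sketch.
DEFAULT_STATUS_MAP = {
    "building": "in_progress",
    "review": "in_progress",
    "testing": "in_progress",
    "shipped": "done",
    "failed": "wont_fix",
}

def split_api_response(raw):
    """Split 'response body + trailing HTTP-code line' into (body, code)."""
    body, _, code = raw.rstrip("\n").rpartition("\n")
    return body, int(code)

def tracker_status(job_state, overrides=None):
    """Map a job state to a tracker status; overrides win over defaults."""
    mapping = {**DEFAULT_STATUS_MAP, **(overrides or {})}
    return mapping.get(job_state)
```

Because the echo is non-fatal, a caller would log a non-2xx code from split_api_response and carry on rather than fail the job.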
Implements roadmap §6 (profiles) and §5 deps on the bash runner, backward-compatible
(jobs without profile/deps behave exactly as before).
Profiles (§6):
- profile_get / profile_persona / fm_eff helpers + PROFILES_DIR (AGENT_QUEUE_PROFILES
override). A job's `profile:` inherits verify (<- default-verify), capabilities,
engine-class, prefers-engine, allowed-scope, review-policy when the job omits them;
job fields always override (precedence job > profile > default). Resolution runs via
fm_eff inside the capability gate and resolve_engine, so inherited caps/engine-class
take effect before launch.
- persona injection: the profile's persona block is prepended to the stripped body
fed to the engine (job .md unchanged on disk; nothing secret logged).
- allowed-scope guardrail (WARN-ONLY): scope_check logs a non-blocking WARNING +
records scope_warning= for changed paths outside the globs; path_in_scope is a
pure, unit-testable matcher (`dir/**` = subtree).
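A pure matcher in the spirit of path_in_scope might look like the sketch below. It is written in Python for illustration rather than taken from the bash runner; handling patterns other than `dir/**` with fnmatch, and the scope_warnings helper, are assumptions.

```python
from fnmatch import fnmatchcase

def path_in_scope(path, globs):
    """True if path matches any allowed-scope glob.

    'dir/**' is treated as the whole subtree under dir/; other patterns
    fall back to fnmatch (an assumption; note its '*' also crosses '/').
    """
    for pattern in globs:
        if pattern.endswith("/**"):
            if path.startswith(pattern[:-2]):  # keep the trailing slash
                return True
        elif fnmatchcase(path, pattern):
            return True
    return False

def scope_warnings(changed_paths, globs):
    """Return out-of-scope paths; the caller only warns, never blocks."""
    return [p for p in changed_paths if not path_in_scope(p, globs)]
```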
deps / DAG, single host (§5):
- deps reference other jobs by idempotency-key. dep_satisfied: shipped/ (hard) or
shipped/+testing/ (deps-mode: soft). deps_unmet drives a block-with-reason skip in
inbox selection (never launched/failed); cmd_status surfaces "blocked (waiting on
<keys>)". deps_would_cycle rejects cyclic submits on `add`.
- _drain_pending: `--once` drains past dep-blocked jobs (idle can't satisfy them)
while still waiting on retry/recovery backoff timers.
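The dependency checks above can be sketched as follows. This is a Python illustration under stated assumptions, not the bash code: job states are passed in as a dict keyed by idempotency-key, and the dependency graph as a dict of key -> dep keys.

```python
def dep_satisfied(key, states, mode="hard"):
    """hard: dep must be in shipped/; soft: shipped/ or testing/ is enough."""
    ok = {"shipped"} if mode == "hard" else {"shipped", "testing"}
    return states.get(key) in ok

def deps_unmet(deps, states, mode="hard"):
    """Keys still blocking a job; an empty list means it may be selected."""
    return [k for k in deps if not dep_satisfied(k, states, mode)]

def deps_would_cycle(new_key, new_deps, graph):
    """True if adding new_key -> new_deps makes new_key reachable from itself.

    graph maps idempotency-key -> list of dep keys for already-queued jobs.
    """
    stack, seen = list(new_deps), set()
    while stack:
        key = stack.pop()
        if key == new_key:
            return True
        if key not in seen:
            seen.add(key)
            stack.extend(graph.get(key, []))
    return False
```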
Meta now records effective (inherited) capabilities/engine-class/prefers-engine/
review-policy/allowed-scope so `status` reflects resolved config.
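The job > profile > default precedence behind these effective values can be illustrated with a small sketch. It is Python, not the bash fm_eff helper; representing job, profile, and defaults as dicts is an assumption, and the example values in the usage are hypothetical.

```python
# Job key -> profile key it inherits from; only verify is renamed
# (verify <- default-verify), every other key maps to itself.
PROFILE_KEY = {"verify": "default-verify"}

def fm_eff(key, job, profile, defaults):
    """Effective value of a manifest key: job > profile > default."""
    if job.get(key) not in (None, ""):
        return job[key]
    inherited = profile.get(PROFILE_KEY.get(key, key))
    if inherited not in (None, ""):
        return inherited
    return defaults.get(key)
```

For example, a job that sets engine-class keeps its own value, while a job that omits verify picks up the profile's default-verify.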
Reads only from meta: tokens or cost, attempts, line deltas, and duration; recognizes
the new retries_exhausted result. agent-queue.sh stays the source of truth.
Implements the single-host bash equivalents of roadmap §25 (durability/crash
recovery) and §26 (execution insights), plus §11 retry/dead-letter stand-in.
Resilience (A1-A4):
- recover_orphans + `recover` command: building/ jobs with a dead worker (dead
pid, pidstart reuse-guard) are moved back to inbox/ with attempts incremented,
on `run` startup and each loop. Idempotent (folder location is the guard).
- WIP checkpointing: for a git cwd, _wip_start creates/checks out aq/wip/<job>
and _wip_checkpoint commits changes on every exit path via an EXIT/INT/TERM
trap; never commits to main/current branch; non-git cwd skipped. RESUME: a
relaunch whose aq/wip/<job> exists checks it out first (continue from
checkpoint). wip_base persisted in a write-once sidecar.
- retry policy (now functional): retry { max, backoff, on } requeues failures
whose class (timeout|verify_failed|crash) is in `on`, honoring backoff via
next_eligible (selection skips until eligible), up to max attempts; exhaustion
-> failed/ result=retries_exhausted with the WIP branch + full log preserved.
- state integrity: all meta writes stay append-only; attempts/next_eligible/wip_*
are re-derivable; recovery is crash-safe.
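The retry policy above reduces to a small decision function. The sketch below is Python rather than the runner's bash, and it assumes backoff is a fixed delay in seconds and that timestamps are plain epoch numbers; both are assumptions.

```python
def retry_decision(failure_class, attempts, policy, now):
    """Decide what happens to a failed run.

    policy = {"max": int, "backoff": seconds, "on": [classes]}; treating
    backoff as a fixed delay in seconds is an assumption.
    Returns ("requeue", next_eligible) or ("failed", result).
    """
    if failure_class not in policy.get("on", []):
        return ("failed", failure_class)  # class not retryable
    if attempts >= policy.get("max", 0):
        return ("failed", "retries_exhausted")
    return ("requeue", now + policy.get("backoff", 0))

def eligible(next_eligible, now):
    """Inbox selection skips a job until its backoff has elapsed."""
    return next_eligible is None or now >= next_eligible
```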
Insights (B1-B6):
- per-run metrics into meta: duration_s, exit, result, attempts, and (git cwd)
files_changed/lines_added/lines_deleted from numstat wip_base..HEAD.
- parse_usage(engine, log) adapter: generic AQ_USAGE line + Claude/Codex token
heuristics; Devin/Copilot TODO; usage_estimated flag; never fabricates numbers.
- status insights sub-line; new `insights [job]` command (per-job metrics or a
recent table + per-engine token/cost/success/duration rollup).
- privacy: only metrics are recorded, never prompt content or secrets.
Backward-compatible: legacy .md and non-git cwd behave exactly as before.
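Deriving files_changed/lines_added/lines_deleted from numstat output can be sketched like this. It is a Python illustration, not the runner's code; the function name is hypothetical, and the input is assumed to be the text of `git diff --numstat wip_base..HEAD`.

```python
def parse_numstat(text):
    """Sum `git diff --numstat` output into change metrics.

    Each line is '<added> TAB <deleted> TAB <path>'; binary files report '-'
    for both counts, so they count as a changed file with zero lines.
    """
    files = added = deleted = 0
    for line in text.splitlines():
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        files += 1
        if parts[0] != "-":
            added += int(parts[0])
        if parts[1] != "-":
            deleted += int(parts[1])
    return {"files_changed": files, "lines_added": added, "lines_deleted": deleted}
```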
New §24 + feature-catalog row:
- two delegation modes: atomic (leaf bug/feature/task) vs composite (roadmap/epic)
- introduce job kind (leaf|composite); composite routes to a planner/orchestrator
that fans out child leaf jobs as a DAG across factories/agents/profiles
- parentId hierarchy + rollup semantics (status/budget/verify/phase-gates) +
idempotent re-run (skip shipped children)
- source-of-truth/sync discipline (one record referenced by many; one-way echo)
- HYBRID decision recorded: model kind/parentId/rollup in the fleet layer now,
keep shared tracker ITEM_TYPES unchanged (label kind:roadmap), promote to a
first-class epic type later via additive migration once proven
- phasing: leaf-only P1-P2; manual composite P3; auto-decomposition planner P3->P5
Adds test cases (never weakening existing ones), each in its own temp AGENT_QUEUE_ROOT
using the no-op engine stub:
- backward-compat: legacy engine/cwd/yolo-only .md still lands in review/.
- priority: with --max 1, a critical job queued after a low job runs first
(order-recording stub).
- capability mismatch: has:definitely-not-installed -> failed/
result=capability_mismatch, asserting the agent was never launched.
- engine-class: agentic-coder + no engine, DEVIN_BIN stubbed -> review/.
- idempotency: same key+body twice -> 1 inbox file; same key+changed body in
inbox -> superseded; same key+different body after drain -> rejected.
Inbox counts use find (not a globbing ls) so set -e/pipefail tolerate an empty inbox.
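The three idempotency outcomes exercised above can be written as one decision function. This Python sketch is illustrative only; comparing bodies by SHA-256 and the `known` lookup structure are assumptions, not how the bash runner necessarily does it.

```python
import hashlib

def dedupe_on_add(key, body, known):
    """Decide what `add` does for an idempotency-key.

    known maps key -> (body_hash, location) for jobs already seen;
    comparing bodies by SHA-256 is an assumption of this sketch.
    Returns 'add', 'noop', 'supersede' or 'reject'.
    """
    digest = hashlib.sha256(body.encode()).hexdigest()
    if key not in known:
        return "add"
    prior_hash, location = known[key]
    if prior_hash == digest:
        return "noop"        # same key + same body
    if location == "inbox":
        return "supersede"   # prior not started yet, replace it
    return "reject"          # prior already left the inbox
```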
Implements Gigafactory Phase 1 - Slice 1 in the bash runner (backward-compatible;
a legacy engine/cwd/yolo-only .md behaves exactly as before):
- Parse all new §5 manifest keys via fm_get with safe defaults; record them in
<job>.meta and surface priority/profile/capabilities/tracker-item in `status`.
Only priority, capabilities, engine-class and idempotency-key are functional
this slice; the rest (profile, prefers, budget, deps, deps-mode, retry,
review-policy, artifacts, tracker-item) are stored but inert.
- priority ordering: inbox_sorted picks critical>high>medium>low, ties by oldest;
per-lock serialization preserved.
- capability grammar + match: detect_capabilities advertises os/engine/node/has
tokens; caps_match honors key, key:value, key<op>version and os:any. A job whose
declared capabilities the host cannot satisfy is moved to failed/ with
result=capability_mismatch and the agent is never launched.
- engine-class resolution: explicit engine wins; else engine-class picks the first
available engine honoring prefers-engine (agentic-coder->devin,claude,codex;
chat-coder->copilot). No available engine -> result=no_engine. Adds copilot to
the engine driver + COPILOT_BIN.
- idempotency-key dedupe on add: same key+body -> no-op; same key+different body
supersedes a prior job still in the inbox, otherwise it is rejected with a clear error.
No change to queue/ data or the run/ship lifecycle. macOS + Linux safe.
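The priority ordering described for inbox_sorted amounts to a two-part sort key. A minimal Python sketch, assuming each job is a (name, priority, queued_at) tuple and that an unknown priority is treated as medium (both assumptions):

```python
PRIORITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def inbox_sorted(jobs):
    """Order jobs critical>high>medium>low, oldest first within a priority.

    Each job is (name, priority, queued_at); falling back to 'medium' for an
    unknown priority is an assumption.
    """
    return sorted(jobs, key=lambda j: (PRIORITY_RANK.get(j[1], 2), j[2]))
```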
- docs/GIGAFACTORY_ROADMAP.md: distributed multi-machine fleet vision
(factory x tool x profile routing) as a checklist-driven, phased
implementation roadmap (Phase 0-5) with acceptance criteria, verify
gates, and a 100% Definition-of-Done rubric
- committed path: coordinator as a platform-service module + control
plane on tracker-web, reached via a thin tracker adapter first; bash
runner survives as the offline edge factory agent
- README: add vision/roadmap pointer
Turn dash into a menu-driven control panel (single mjs script):
- numbered, arrow/j-k/1-9 selectable JOBS list (review/testing/failed/inbox)
- single-key actions wired to agent-queue.sh (single source of truth):
p promote, s ship, x reject, u requeue (reject/requeue confirm y/n)
- enter/l opens a live log viewer; r starts a detached run loop, S stops it
- run-loop pid indicator, transient action flashes, ? help overlay
- non-TTY falls back to the read-only live view
- README: dash command + interactive key table
- requeue <job>: move a failed job back to inbox/ and drop stale meta/body so
it re-runs cleanly
- clean [--keep N]: archive finished jobs' logs+meta beyond the newest N
(default 50) into queue/.archive/<ts>/; running jobs + .md records untouched
- document both in usage + bytelyst-cli subcommand list
- .env via 'set -a; . ./.env; set +a' (handles quoted values/spaces safely)
- printf for the GITHUB_TOKEN message so the newline renders
- gh_get_all: paginate all pages (per_page=100) and verify HTTP 200 before jq;
rewire list-public/list-private/check-collaborators through it
- fix SC2199 whitelist membership (explicit loop, no substring false-matches)
- shell-ci: gate shellcheck on bytelyst-cli + run agent-queue self-test
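The pagination and whitelist fixes above follow two simple patterns, sketched here in Python rather than bash. The fetch_page callable is a hypothetical stand-in for the curl request, and stopping at the first short page is an assumption about how the last page is detected.

```python
def gh_get_all(fetch_page, per_page=100):
    """Collect every page of a list endpoint.

    fetch_page(page, per_page) -> (status_code, items) is a hypothetical
    callable standing in for the HTTP request. Stops at the first short page.
    """
    items, page = [], 1
    while True:
        status, batch = fetch_page(page, per_page)
        if status != 200:
            raise RuntimeError(f"HTTP {status} on page {page}")
        items.extend(batch)
        if len(batch) < per_page:
            return items
        page += 1

def in_whitelist(name, whitelist):
    """Exact membership, so 'repo' never matches 'my-repo-fork'."""
    return any(name == entry for entry in whitelist)
```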
Record pidstart (ps lstart) at launch and verify it in all liveness checks
(_meta_active, status, stop) via _pid_alive, so a recycled pid can never be
mistaken for our worker. Falls back to a plain liveness check when no start time is recorded.
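The reuse guard can be summarised as "alive and started when we said it did". The Python sketch below illustrates that check; lookup_start is a hypothetical callable standing in for reading the start time from ps, and is not part of the runner.

```python
def pid_alive(pid, recorded_start, lookup_start):
    """True only if pid is running AND is still the process we launched.

    lookup_start(pid) is a hypothetical callable returning the process start
    time (e.g. from `ps -o lstart=`), or None when the pid is not running.
    """
    current = lookup_start(pid)
    if current is None:
        return False   # no such process
    if not recorded_start:
        return True    # no start time recorded: plain liveness only
    return current == recorded_start
```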
cmd_run now checks daemon.pid liveness up front: if a run loop is alive it exits
with an error (protecting the single-launcher invariant locking depends on); a
stale daemon.pid (dead pid) is cleared and the run proceeds.
Replace live_workers with reservation-aware active_workers + shared _meta_active:
a job counts toward --max the moment its meta is written (before the worker is
backgrounded), so --max can never be exceeded. A <30s guard prevents a meta
orphaned mid-launch from pinning a slot. busy_keys now shares _meta_active.