Turn dash into a menu-driven control panel (single mjs script):
- numbered, arrow/j-k/1-9 selectable JOBS list (review/testing/failed/inbox)
- single-key actions wired to agent-queue.sh (single source of truth):
p promote, s ship, x reject, u requeue (reject/requeue confirm y/n)
- enter/l opens a live log viewer; r starts a detached run loop, S stops it
- run-loop pid indicator, transient action flashes, ? help overlay
- non-TTY falls back to the read-only live view
- README: dash command + interactive key table
- requeue <job>: move a failed job back to inbox/ and drop stale meta/body so
it re-runs cleanly
- clean [--keep N]: archive finished jobs' logs+meta beyond the newest N
(default 50) into queue/.archive/<ts>/; running jobs + .md records untouched
- document both in usage + bytelyst-cli subcommand list
- .env via 'set -a; . ./.env; set +a' (handles quoted values/spaces safely)
- printf for the GITHUB_TOKEN message so the newline renders
- gh_get_all: paginate all pages (per_page=100) and verify HTTP 200 before jq;
rewire list-public/list-private/check-collaborators through it
- fix SC2199 whitelist membership (explicit loop, no substring false-matches)
- shell-ci: gate shellcheck on bytelyst-cli + run agent-queue self-test
Record pidstart (ps lstart) at launch and verify it in all liveness checks
(_meta_active, status, stop) via _pid_alive, so a recycled pid can never be
mistaken for our worker. Falls back to plain liveness when no start time recorded.
cmd_run now checks daemon.pid liveness up front: if a run loop is alive it exits
with an error (protecting the single-launcher invariant locking depends on); a
stale daemon.pid (dead pid) is cleared and the run proceeds.
Replace live_workers with reservation-aware active_workers + shared _meta_active:
a job counts toward --max the moment its meta is written (before the worker is
backgrounded), so --max can never be exceeded. A <30s guard prevents a meta
orphaned mid-launch from pinning a slot. busy_keys now shares _meta_active.
Mark a running worker '⚠ stalled' when its log has not changed for more than
AGENT_QUEUE_STALL_MIN minutes (default 10), using log mtime as the freshness
signal. Implemented in both the bash status table and the Node dashboard.
Honor 'timeout: 45m' (90s|45m|2h|1d) by wrapping the agent in timeout/gtimeout
when available (hard process-tree kill), else a portable bash watchdog. On expiry
the job moves doing->failed with result=timeout and a TIMED OUT log line.
Serialize jobs by lock key (frontmatter 'lock:' override, default cwd) via the
single run-loop's pre-launch eligibility check; the oldest non-busy job is picked
regardless of --max. Adds a flock-based worker guard where flock exists (Linux);
macOS relies on the single-daemon model. Records lock= in job meta.
Lints agent-queue.sh + bytelyst-cli.sh (shellcheck --severity=error),
syntax-checks all scripts (bash -n) and the Node dashboard (node --check),
and runs a no-agent smoke test (init/add/drain -> failed/). Gitea runner
labels + node:20-bookworm container, path-filtered to the touched files.
- docker-lint CI propagated to all 9 remaining consumer repos
- all 10 remaining repos mirrored to Gitea; 9/9 docker-lint jobs green
- Gitea Actions runner hardened (capacity 1->2, env_file token) + documented
- repair corrupted §10 execution-log region from prior rebase
Phase 3.1 — VM Score Card (0–100):
- 6 weighted dimensions: steal time, RAM, disk, service health,
maintenance hygiene, LLM readiness (matching roadmap scoring)
- Color-coded gauge + per-dimension progress bars with detail text
- Computed from health + cron + unhealthy data; degrades gracefully
when any source is unavailable
Phase 1.3 — Unhealthy Container Detail Panel:
- Loads independently from GET /api/vm/containers/unhealthy
- Per-container: name, unhealthy since, restart count, last health logs
- Expandable row for health check output
- One-click restart with spinner, feedback toast, auto-refresh after 3s
Phase 1.1 — Cron Status Panel:
- Loads from GET /api/vm/cron-status
- Table: 4 managed jobs × schedule | last run | freed MB | status | next
- Collapsible run history (last 10) with step-by-step log expansion
Phase 3.4 — Ollama/LLM Panel:
- Loads from GET /api/vm/ollama/models
- Currently-loaded section with RAM pressure warning (<4 GB free)
- RAM bar visualisation showing model footprint
- Model list with size + last-used time
- One-click unload button
Other improvements:
- All data fetched in parallel (Promise.allSettled) — any panel failure
does not block the rest of the page
- Add steal, failed_units, cron_missing_paths to CHECK_META/CHECK_ORDER
- Refresh now updates all 5 data sources atomically
- web/package-lock.json regenerated (was stale, caused build failure)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- code-quality/repository.ts: fix tsErrorMatch[3] → [4] for type field (group 3 is column, 4 is error|warning)
- code-quality/repository.ts: fix ESLint regex to make rule brackets optional (not all formatters include them)
- code-quality/repository.ts: fix Vitest test count — parse 'Tests' line (individual tests) instead of 'Test Files' (file count); improve Jest regex to capture pass/fail independently
- env/repository.ts: replace raw process.env.ENCRYPTION_KEY with config.ENCRYPTION_KEY so the validated default flows through a single source of truth
- config.ts: add startup console.warn when CSRF_SECRET or ENCRYPTION_KEY are using insecure defaults
- deployments/orchestrator.ts: refactor runDeploymentScript to use try/catch/finally — deployment record is now always written in the finally block, preventing zombie 'running' states if updateDeployment itself throws
- auth.tsx: remove dead 'user &&' guard (user is always truthy after the !user check above); remove debug console.log calls, keep console.error
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>