bytelyst-devops-tools

Author	SHA1	Message	Date
saravanakumardb1	a849a30e11	feat(agent-queue): refuse a second run when a daemon is already active cmd_run now checks daemon.pid liveness up front: if a run loop is alive it exits with an error (protecting the single-launcher invariant locking depends on); a stale daemon.pid (dead pid) is cleared and the run proceeds.	2026-05-28 22:21:31 -07:00
saravanakumardb1	11935d0539	fix(agent-queue): reserve concurrency slot before backgrounding worker Replace live_workers with reservation-aware active_workers + shared _meta_active: a job counts toward --max the moment its meta is written (before the worker is backgrounded), so --max can never be exceeded. A <30s guard prevents a meta orphaned mid-launch from pinning a slot. busy_keys now shares _meta_active.	2026-05-28 22:17:36 -07:00
saravanakumardb1	79331d591f	feat(agent-queue): flag stalled workers in status + dash Mark a running worker '⚠ stalled' when its log has not changed for more than AGENT_QUEUE_STALL_MIN minutes (default 10), using log mtime as the freshness signal. Implemented in both the bash status table and the Node dashboard.	2026-05-28 22:15:26 -07:00
saravanakumardb1	3b71f0117a	feat(agent-queue): per-job timeout via frontmatter timeout: Honor 'timeout: 45m' (90s|45m|2h|1d) by wrapping the agent in timeout/gtimeout when available (hard process-tree kill), else a portable bash watchdog. On expiry the job moves doing->failed with result=timeout and a TIMED OUT log line.	2026-05-28 22:13:50 -07:00
saravanakumardb1	f14e6c2336	feat(agent-queue): per-cwd locking so two agents never share a repo Serialize jobs by lock key (frontmatter 'lock:' override, default cwd) via the single run-loop's pre-launch eligibility check; the oldest non-busy job is picked regardless of --max. Adds a flock-based worker guard where flock exists (Linux); macOS relies on the single-daemon model. Records lock= in job meta.	2026-05-28 22:10:30 -07:00
saravanakumardb1	9b49c28af5	chore(agent-queue): add self-test harness (shellcheck + no-op run cycle)	2026-05-28 22:07:15 -07:00
saravanakumardb1	0c21a6466a	feat(aliases): add aq/aqs/aqd agent-queue aliases; scope shell-ci shellcheck - aliases/_agent.alias: aq, aqs (status), aqd (dash) — path-relative to repo - register _agent.alias in _source_all.alias loader + document in README - shell-ci: gate shellcheck on agent-queue.sh; bytelyst-cli.sh shellcheck is non-gating (pre-existing legacy SC2199 in check_collaborators), bash -n still gates both	2026-05-28 21:52:36 -07:00
saravanakumardb1	9c16a631e2	ci(agent-queue): add Gitea shell-ci workflow (shellcheck + syntax + smoke) Lints agent-queue.sh + bytelyst-cli.sh (shellcheck --severity=error), syntax-checks all scripts (bash -n) and the Node dashboard (node --check), and runs a no-agent smoke test (init/add/drain -> failed/). Gitea runner labels + node:20-bookworm container, path-filtered to the touched files.	2026-05-28 21:43:22 -07:00
saravanakumardb1	169e944c3c	feat(agent-queue): Node live dashboard + bytelyst-cli integration - dashboard.mjs: zero-dep Node TUI (running workers w/ engine, elapsed, cwd, last log line + recent done/failed); 'dash' subcommand execs it - bytelyst-cli.sh: 'agent-queue' / 'aq' passthrough handled before the GITHUB_TOKEN + jq/curl gates; usage + interactive-menu entry - README: document dash + bytelyst-cli usage	2026-05-28 21:39:25 -07:00
saravanakumardb1	8f725f8587	docs(repo-map): register agent-queue tool directory	2026-05-28 21:35:59 -07:00
saravanakumardb1	179108504f	feat(agent-queue): folder-kanban runner for devin/claude/codex CLIs Add a zero-dependency, bash 3.2-compatible queue runner that executes prompt .md files through headless coding-agent CLIs in auto-approve mode, moving them inbox -> doing -> done/failed with per-job logs and live status. - pluggable engine drivers (devin --prompt-file, claude/codex via stdin) - per-task YAML frontmatter: engine, cwd, yolo - subcommands: init, add, run (--max N), status, watch, stop, logs - runtime queue/ state gitignored	2026-05-28 21:35:59 -07:00
saravanakumardb1	a049e9c602	docs(roadmap): record post-roadmap follow-ups complete (v15) - docker-lint CI propagated to all 9 remaining consumer repos - all 10 remaining repos mirrored to Gitea; 9/9 docker-lint jobs green - Gitea Actions runner hardened (capacity 1->2, env_file token) + documented - repair corrupted §10 execution-log region from prior rebase	2026-05-28 18:07:36 -07:00
Hermes VM	0e1905aa33	docs: document local LLM utility workflows	2026-05-28 00:21:06 +00:00
Hermes VM	44fd6a462a	fix: bind DevOps dashboard ports to loopback	2026-05-27 21:55:46 +00:00
Hermes VM	f936c2231c	docs: record product port hardening	2026-05-27 21:53:08 +00:00
Hermes VM	7047d625ef	feat(dashboard/vm): Phases 1.1, 1.3, 3.1, 3.4 — VM page panels Phase 3.1 — VM Score Card (0–100): - 6 weighted dimensions: steal time, RAM, disk, service health, maintenance hygiene, LLM readiness (matching roadmap scoring) - Color-coded gauge + per-dimension progress bars with detail text - Computed from health + cron + unhealthy data; degrades gracefully when any source is unavailable Phase 1.3 — Unhealthy Container Detail Panel: - Loads independently from GET /api/vm/containers/unhealthy - Per-container: name, unhealthy since, restart count, last health logs - Expandable row for health check output - One-click restart with spinner, feedback toast, auto-refresh after 3s Phase 1.1 — Cron Status Panel: - Loads from GET /api/vm/cron-status - Table: 4 managed jobs × schedule | last run | freed MB | status | next - Collapsible run history (last 10) with step-by-step log expansion Phase 3.4 — Ollama/LLM Panel: - Loads from GET /api/vm/ollama/models - Currently-loaded section with RAM pressure warning (<4 GB free) - RAM bar visualisation showing model footprint - Model list with size + last-used time - One-click unload button Other improvements: - All data fetched in parallel (Promise.allSettled) — any panel failure does not block the rest of the page - Add steal, failed_units, cron_missing_paths to CHECK_META/CHECK_ORDER - Refresh now updates all 5 data sources atomically - web/package-lock.json regenerated (was stale, caused build failure) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 21:49:23 +00:00
Hermes VM	b15c570587	docs: record common-platform port hardening	2026-05-27 21:32:31 +00:00
Hermes VM	d9618ba7b0	feat(vm): Phases 1.2, 1.4, 2.1 — steal time, swap pressure, health watchdog Phase 1.2 — CPU steal time metric in vm-health-check.sh: - Samples /proc/stat twice 1s apart for accurate current steal % - Thresholds: >5% WARN, >15% CRIT (currently 0.8% on this host) - Inserts before memory check so steal is visible alongside load Phase 1.4 — Swap pressure indicator: - Reads SwapCached from /proc/meminfo as secondary metric - Raises SWAP_USED_WARN_GB 1→1.5 to reduce noise (current usage 0.6G) - New WARN path: SwapCached > 200MB signals recent pressure even when current swap usage looks ok (catches post-spike state) Phase 2.1 — Docker health-check watchdog: - docker-health-watchdog.sh: checks unhealthy containers every 10 min, restarts only after 3 consecutive failing health checks (30min grace) - docker-health-watchdog.service + .timer: enabled, fires every 10 min - Sends Telegram notification on each auto-restart - Rollback: systemctl disable docker-health-watchdog.timer Phase 2.2 already complete: sync_hermes_persistent_backup.py handles diverge gracefully with rebase/reset-hard fallback; running successfully. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 21:31:09 +00:00
Hermes VM	d60c81ebda	docs: record internal port loopback hardening	2026-05-27 21:25:38 +00:00
Hermes VM	2fc23d6baa	feat(vm): fix devops-backend VM module — Phase 0.1 complete - Switch backend runner from node:20-alpine to node:20-slim so GNU df flags (--output=pcent/avail) work inside the container - Add volume mounts to docker-compose.yml: scripts (ro), VM logs (rw), docker.sock; set VM_SCRIPTS_PATH + VM_LOG_DIR env vars - Rebuild repository.ts: env-configurable paths, cron history parser, unhealthy-container inspector, Ollama model endpoints - Add routes: GET /api/vm/cron-status, unhealthy containers, Ollama models, container restart, model unload - vm-cleanup.sh: add step_cosmos_pglog, step_docker_aged_images; fix (( count++ )) → count=$(( count + 1 )) for set -e compatibility - Add docs/VM_OBSERVABILITY_ROADMAP.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 21:13:45 +00:00
Hermes VM	5a2d92f519	docs: record VM container health fix	2026-05-27 21:12:45 +00:00
Saravana Kumar	e2db92f3b1	Add Hermes snapshot diff view	2026-05-27 21:05:57 +00:00
Saravana Kumar	8f522e3505	Add Hermes dashboard improvement backlog	2026-05-27 21:02:23 +00:00
Hermes VM	9210a8890f	feat: detect stale VM automation	2026-05-27 21:00:43 +00:00
Hermes VM	3d5f369f3d	docs: record Gitea runner recovery	2026-05-27 20:58:16 +00:00
Hermes VM	1f2eea8268	docs: record VM backup and cron fixes	2026-05-27 20:56:11 +00:00
Saravana Kumar	90f6db2014	Complete Hermes ops dashboard and roadmap	2026-05-27 20:53:58 +00:00
Hermes VM	e3d1dddf51	docs: add VM exposure inventory	2026-05-27 20:51:27 +00:00
Saravana Kumar	98a7915a38	Reconcile Hermes roadmap and dashboard status	2026-05-27 20:46:16 +00:00
Saravana Kumar	ac79591903	Mark web search tooling complete	2026-05-27 20:46:16 +00:00
Hermes VM	313a775fa0	docs: strengthen VM security roadmap gates	2026-05-27 20:34:37 +00:00
Hermes VM	2c125adb05	docs: add VM security blind spots roadmap	2026-05-27 20:21:52 +00:00
Saravana Kumar	c89018ae47	Tighten Telegram fallback wording	2026-05-27 20:18:46 +00:00
Saravana Kumar	8145484136	Verify Telegram fallback platform context	2026-05-27 20:16:30 +00:00
Saravana Kumar	8da66497cc	Tighten Hermes local fallback chain	2026-05-27 19:58:09 +00:00
Saravana Kumar	3e26f0da31	Close Hermes browser and web backend items	2026-05-27 19:23:55 +00:00
root	d1f234fc01	Mark Firecrawl as locally configured	2026-05-27 18:57:50 +00:00
Hermes VM	70d96d7684	feat: add gitea backup timer assets	2026-05-27 18:53:20 +00:00
Hermes VM	147db72330	docs: add hostinger maintenance operations entry	2026-05-27 18:53:20 +00:00
Hermes VM	31b414d62b	fix: systematic bug fixes — code-quality parser, env key, config warnings, auth cleanup, deployment safety - code-quality/repository.ts: fix tsErrorMatch[3] → [4] for type field (group 3 is column, 4 is error|warning) - code-quality/repository.ts: fix ESLint regex to make rule brackets optional (not all formatters include them) - code-quality/repository.ts: fix Vitest test count — parse 'Tests' line (individual tests) instead of 'Test Files' (file count); improve Jest regex to capture pass/fail independently - env/repository.ts: replace raw process.env.ENCRYPTION_KEY with config.ENCRYPTION_KEY so the validated default flows through a single source of truth - config.ts: add startup console.warn when CSRF_SECRET or ENCRYPTION_KEY are using insecure defaults - deployments/orchestrator.ts: refactor runDeploymentScript to use try/catch/finally — deployment record is now always written in the finally block, preventing zombie 'running' states if updateDeployment itself throws - auth.tsx: remove dead 'user &&' guard (user is always truthy after the !user check above); remove debug console.log calls, keep console.error Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 18:53:20 +00:00
Hermes VM	cdc23696b2	fix: resolve all TypeScript errors — green tsc Primitives.tsx (TS2339): - asChild branch read children.props.className before the cast applied, making props typed as unknown. Extract typedChild first, then read props. hermes/page.tsx + agents/page.tsx + tasks/page.tsx + tasks/[id]/page.tsx (TS2322): - Badge.variant accepts 'neutral'|'success'|'warning'|'error'|'info' but callers were passing 'danger' (should be 'error') and 'default' (should be 'neutral'). MetricCard.tone is a separate type and is correct as-is. Changes: - statusTone map in hermes/page.tsx: 'danger' → 'error', 'default' → 'neutral' - getTaskTone fallback: 'default' → 'neutral'; explicit return type added - levelTone in tasks/[id]/page.tsx: 'danger' → 'error'; explicit return type added - Inline Badge variants: all remaining 'danger' → 'error' across 3 files Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 18:53:20 +00:00
Hermes VM	1099d518ef	improve: dashboard security, code quality, and UX fixes Security (backend): - env/routes: add requireAdmin to all 6 env endpoints — GET /env was fully open, exposing all secret values to unauthenticated requests - deployments/routes: add requireAdmin to all 4 GET endpoints (deployment history and logs were publicly readable) - health/routes: remove duplicate requireAdmin call from DELETE /health/cache handler body (was already enforced via preHandler) Frontend — auth/api: - system/page: replace raw fetch + localStorage token with apiRequest (mutations now go through CSRF flow) - vm/page: same — replace raw fetch with vmApi from api.ts - api.ts: add vmApi (getHealth, getCleanupLog, runCleanup) + shared VmHealthResult / VmCheck / VmCheckLevel types Shared utilities: - utils.ts: add formatBytes() and getStatusColor() shared helpers - system/page: remove duplicate formatBytes, import from utils - health/page: remove duplicate getStatusColor, import from utils - page.tsx (home): remove duplicate getStatusColor, import from utils UX improvements: - page.tsx: remove Seed Services button from normal header (debug tool) - page.tsx: deploy button now always enabled; shows inline warning banner when service is not 'up' instead of silently disabling the button - metrics: fix bar chart — bars now grow from bottom (flex-col-reverse), add empty state, fix date parsing timezone edge case - sidebar-nav: theme toggle now functional — persists to localStorage and toggles document.documentElement class 'dark' Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 18:53:20 +00:00
Hermes VM	d0b8ce2c74	feat: add VM Health page to devops dashboard Backend (Fastify): - New module: modules/vm/ (types, repository, routes) - GET /api/vm/health — runs vm-health-check.sh --json, returns structured result - GET /api/vm/cleanup-log — tails /var/log/vm-cleanup.log - POST /api/vm/cleanup — triggers vm-cleanup.sh (weekly / monthly / dry-run) - Registered vmRoutes in server.ts Frontend (Next.js): - New page: /vm — VM Health - Overall status banner (OK/WARN/CRIT) with issue summary - Per-check cards: disk, load, RAM, swap, crash loops, container health, build cache, docker images, journal, syslog — color-coded by level - Cleanup trigger buttons (dry-run, weekly, monthly) with output viewer - Collapsible cleanup log viewer (last 40 lines) - Auto-refresh every 60s - sidebar-nav.tsx: added 'VM Health' entry with Server icon Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 18:53:20 +00:00
Hermes VM	678430d77d	fix: cron health-check entry should call vm-health-check.sh --notify The 07:00 daily cron was incorrectly pointing to vm-cleanup.sh instead of vm-health-check.sh. Health check is read-only; cleanup is not. Also add --notify so Telegram alerts fire when WARNING/CRITICAL. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 18:53:20 +00:00
Hermes VM	0a2d303f93	add HostingerVM health-check and cleanup scripts - vm-health-check.sh: read-only checks for disk, load, RAM, swap, Docker containers (crash-loops + healthchecks), build cache, journal. Flags: --quiet, --json, --notify (Telegram). Exit 0/1/2 = OK/WARN/CRIT. - vm-cleanup.sh: safe periodic cleanup. Default (weekly): build cache, journal, apt, npm, .next/cache. --full (monthly): adds docker system prune, pnpm store, old logs, HOLD cleanup. --dry-run, --install-cron, --uninstall-cron. Logs to /var/log/vm-cleanup.log. Related: docs/hostinger-vm-maintenance.md, scripts/VMs/HostingerVM/CRON_SETUP.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 18:53:20 +00:00
root	4249b17afc	Document Firecrawl backend selection	2026-05-27 18:52:39 +00:00
root	08f32a79e8	Clarify remaining Hermes fallback verification	2026-05-27 18:46:32 +00:00
root	8fbb535d90	Add shared local Hermes fallback chain	2026-05-27 18:43:30 +00:00
root	9ee060e839	Harden Hermes operations dashboard status	2026-05-27 17:45:41 +00:00
root	0e6528b366	Add live Hermes operations dashboard	2026-05-27 13:04:36 +00:00

179 Commits