Implements Gigafactory Phase 1 - Slice 1 in the bash runner (backward-compatible;
a legacy engine/cwd/yolo-only .md behaves exactly as before):
- Parse all new §5 manifest keys via fm_get with safe defaults; record them in
<job>.meta and surface priority/profile/capabilities/tracker-item in `status`.
Only priority, capabilities, engine-class and idempotency-key are functional
this slice; the rest (profile, prefers, budget, deps, deps-mode, retry,
review-policy, artifacts, tracker-item) are stored but inert.
- priority ordering: inbox_sorted picks critical>high>medium>low, with ties
broken oldest-first; per-lock serialization preserved.
- capability grammar + match: detect_capabilities advertises os/engine/node/has
tokens; caps_match honors key, key:value, key<op>version and os:any. A job whose
declared capabilities the host cannot satisfy is moved to failed/ with
result=capability_mismatch and the agent is never launched.
- engine-class resolution: explicit engine wins; else engine-class picks the first
available engine honoring prefers-engine (agentic-coder->devin,claude,codex;
chat-coder->copilot). No available engine -> result=no_engine. Adds copilot to
the engine driver + COPILOT_BIN.
- idempotency-key dedupe on add: same key+body -> no-op; same key+different body
supersedes a prior job that is still in inbox/; otherwise the add is rejected
with a clear error.
No change to queue/ data or the run/ship lifecycle. macOS + Linux safe.
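
The capability grammar above could be matched roughly as follows. This is a
minimal sketch, not the runner's actual caps_match: the space-separated token
format (e.g. "node:20.11.0"), the helper ver_cmp, and the exact operator set
(>=, <=, >, <) are assumptions made for illustration.

```bash
# ver_cmp A B: prints lt, eq or gt for dotted numeric versions (hypothetical helper).
ver_cmp() {
  local IFS=. i
  local -a a=($1) b=($2)          # split "20.11.0" into fields on "."
  for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
    if ((10#${a[i]:-0} < 10#${b[i]:-0})); then echo lt; return; fi
    if ((10#${a[i]:-0} > 10#${b[i]:-0})); then echo gt; return; fi
  done
  echo eq
}

# caps_match HOST_TOKENS REQUIRED: succeeds when every required token is satisfied.
# Assumed host format: "os:linux engine:claude node:20.11.0 has:git".
caps_match() {
  local host=$1 req key op want have tok rel
  local re='^([a-z0-9_-]+)(>=|<=|>|<)([0-9.]+)$'
  for req in $2; do
    [ "$req" = "os:any" ] && continue          # os:any matches every host
    if [[ $req =~ $re ]]; then
      # key<op>version: compare against the host's advertised version.
      key=${BASH_REMATCH[1]} op=${BASH_REMATCH[2]} want=${BASH_REMATCH[3]}
      have=""
      for tok in $host; do
        [ "${tok%%:*}" = "$key" ] && have=${tok#*:}
      done
      [ -n "$have" ] || return 1
      rel=$(ver_cmp "$have" "$want")
      case "$op:$rel" in
        '>=:gt' | '>=:eq' | '>:gt' | '<=:lt' | '<=:eq' | '<:lt') ;;
        *) return 1 ;;
      esac
    else
      # key:value must appear verbatim; a bare key matches any key:... token.
      case " $host " in
        *" $req "* | *" $req:"*) ;;
        *) return 1 ;;
      esac
    fi
  done
  return 0
}
```

With a host advertising "os:linux engine:claude node:20.11.0 has:git", a job
declaring "os:any has:git node>=18" would match, while "node>=22" would not and
would end up in failed/ with result=capability_mismatch.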
- requeue <job>: move a failed job back to inbox/ and drop stale meta/body so
it re-runs cleanly
- clean [--keep N]: archive finished jobs' logs+meta beyond the newest N
(default 50) into queue/.archive/<ts>/; running jobs + .md records untouched
- document both in usage + bytelyst-cli subcommand list
Record pidstart (ps lstart) at launch and verify it in all liveness checks
(_meta_active, status, stop) via _pid_alive, so a recycled pid can never be
mistaken for our worker. Falls back to plain liveness when no start time recorded.
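
A minimal sketch of the start-time check described above. The function names
pid_start and pid_alive are illustrative stand-ins (the runner's own helper is
_pid_alive and may differ), and the recorded value is assumed to be the trimmed
output of `ps -o lstart=`.

```bash
# pid_start PID: prints the process start time as reported by ps, or nothing.
pid_start() {
  ps -o lstart= -p "$1" 2>/dev/null | sed 's/^ *//; s/ *$//'
}

# pid_alive PID [RECORDED_START]
# Succeeds only if PID is running and, when a start time was recorded,
# the current start time still equals it (so a recycled pid does not match).
pid_alive() {
  local pid=$1 recorded=${2:-}
  kill -0 "$pid" 2>/dev/null || return 1
  [ -z "$recorded" ] && return 0      # no start time recorded: plain liveness
  [ "$(pid_start "$pid")" = "$recorded" ]
}
```

At launch the worker's pid and `pid_start "$pid"` would be written to the job
meta together; later checks pass both values back to pid_alive.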
cmd_run now checks daemon.pid liveness up front: if a run loop is alive it exits
with an error (protecting the single-launcher invariant locking depends on); a
stale daemon.pid (dead pid) is cleared and the run proceeds.
Replace live_workers with reservation-aware active_workers + shared _meta_active:
a job counts toward --max the moment its meta is written (before the worker is
backgrounded), so --max can never be exceeded. A <30s guard prevents a meta
orphaned mid-launch from pinning a slot. busy_keys now shares _meta_active.
Mark a running worker '⚠ stalled' when its log has not changed for more than
AGENT_QUEUE_STALL_MIN minutes (default 10), using log mtime as the freshness
signal. Implemented in both the bash status table and the Node dashboard.
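
The stall check can be sketched with find's -mmin test, which is available in
both GNU and BSD find. This is an assumption about one possible approach, not
the runner's actual code; is_stalled is a hypothetical name.

```bash
# is_stalled LOG: succeeds when LOG exists and was last modified more than
# AGENT_QUEUE_STALL_MIN minutes ago (default 10).
is_stalled() {
  local log=$1 limit=${AGENT_QUEUE_STALL_MIN:-10}
  [ -f "$log" ] || return 1
  # find prints the path only when its mtime is older than the limit.
  [ -n "$(find "$log" -mmin +"$limit" 2>/dev/null)" ]
}
```

A status loop could then append '⚠ stalled' to a running job's row whenever
is_stalled succeeds for that job's log.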
Honor 'timeout: 45m' (90s|45m|2h|1d) by wrapping the agent in timeout/gtimeout
when available (hard process-tree kill), else a portable bash watchdog. On expiry
the job moves doing->failed with result=timeout and a TIMED OUT log line.
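
As an illustration of the duration grammar above, the following sketch converts
a timeout value to seconds and locates a timeout binary. The names to_seconds
and timeout_bin are hypothetical, and the portable watchdog fallback is
omitted here.

```bash
# to_seconds DURATION: converts 90s, 45m, 2h or 1d to seconds.
# Fails (non-zero, no output) on anything else, including a bare number.
to_seconds() {
  local d=$1 n unit
  n=${d%[smhd]}          # numeric part: strip one trailing unit letter
  unit=${d#"$n"}         # whatever was stripped
  case $n in '' | *[!0-9]*) return 1 ;; esac
  case $unit in
    s) echo "$((10#$n))" ;;
    m) echo "$((10#$n * 60))" ;;
    h) echo "$((10#$n * 3600))" ;;
    d) echo "$((10#$n * 86400))" ;;
    *) return 1 ;;
  esac
}

# timeout_bin: prints the path of timeout (or gtimeout), fails if neither exists.
timeout_bin() {
  command -v timeout 2>/dev/null || command -v gtimeout 2>/dev/null
}
```

When timeout_bin succeeds, the agent command could be launched as
`"$(timeout_bin)" "$(to_seconds 45m)" <agent command>`; GNU timeout exits with
status 124 on expiry, which a caller could map to result=timeout.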
Serialize jobs by lock key (frontmatter 'lock:' override, default cwd) via the
single run-loop's pre-launch eligibility check; the oldest non-busy job is picked
regardless of --max. Adds a flock-based worker guard where flock exists (Linux);
macOS relies on the single-daemon model. Records lock= in job meta.