docs(agent-queue): add Phase 1 Slice 3 prompt (resilience & insights, single host)
parent beb225162a
commit 1f18f5d7a3
168
agent-queue/docs/jobs/phase1-slice3.md
Normal file
@ -0,0 +1,168 @@
---
engine: devin
cwd: /Users/sd9235/code/mygh/learning_ai_devops_tools
yolo: true
lock: devops-tools
timeout: 4h
---

ROLE: Senior engineer. Implement Phase 1 — Slice 3: RESILIENCE & INSIGHTS (single host).
This is a LARGE, fully self-contained slice (git + log parsing only — NO network,
NO external service, NO credentials) so it runs end-to-end without blockers.

SOURCE OF TRUTH: agent-queue/docs/GIGAFACTORY_ROADMAP.md (read §11 lifecycle/retry,
§25 durability/crash-recovery, §26 execution insights, §17 observability, §14 Phase 1).
Implement the SINGLE-HOST bash equivalents of §25 and §26.

PREREQUISITE / BRANCHING:
- Builds on Slice 1 (PR #1, branch feat/gigafactory-p1-slice1).
- Base on `main` IF PR #1 (and PR #2 if present) are merged; otherwise branch off
feat/gigafactory-p1-slice1. Do NOT revert or duplicate earlier slice code.
- This slice is INDEPENDENT of Slice 2 (profiles/deps) — do not depend on it.
- New branch: feat/gigafactory-p1-slice3. Commit in logical steps, push, open a PR.
DO NOT merge (human gate).

STRICT SCOPE:
- Edit ONLY under agent-queue/ (agent-queue.sh, selftest.sh, dashboard.mjs, README.md,
docs/GIGAFACTORY_ROADMAP.md). No other repo.
- DO NOT modify/delete anything under agent-queue/queue/ (live jobs). DO NOT run
`agent-queue.sh run` against the real queue. selftest.sh uses its own temp
AGENT_QUEUE_ROOT and temp git repos only.
- bash, single host, macOS + Linux safe, zero new runtime deps.

==================================================================
A. CRASH RECOVERY & WORK PRESERVATION (single-host §25)
==================================================================
A1. ORPHAN RECOVERY: On `run` startup (and at the top of each run loop), detect
jobs stuck in building/ whose worker is no longer alive, i.e. the meta has a
`pid=` whose process is dead (a `pidstart` mismatch also counts as dead, which
guards against PID reuse), or the meta has no live pid at all. Such a job is an
ORPHAN from a previous crash/power-off.
Recover it deterministically (never lose or strand it):
- increment an `attempts=` counter in the meta,
- log a clear recovery line,
- move it back to inbox/ for re-selection (subject to retry policy A3),
- recovery MUST be idempotent (running it twice recovers once).
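
A minimal sketch of A1, under stated assumptions: the on-disk layout (`<job>.md` plus a sibling `<job>.meta` of append-only `key=value` lines), the helper names `meta_get`, `worker_alive` and `recover_orphans`, and the idea that `pidstart` stores the output of `ps -o lstart=` are all hypothetical and may not match the existing script. Idempotence here comes from the move itself: a second pass finds nothing left in building/.

```shell
# Sketch only. Assumed layout (not confirmed by the spec): a job is
# <root>/<state>/<job>.md with a sibling <job>.meta of append-only key=value lines.

# Print the last value recorded for a key (append-only meta: last write wins).
meta_get() {
  awk -F= -v k="$2" '$1 == k { v = substr($0, length(k) + 2) } END { print v }' "$1"
}

# Return 0 only if the worker recorded in the meta is still running.
worker_alive() {
  local meta=$1 pid want have
  pid=$(meta_get "$meta" pid)
  [ -n "$pid" ] || return 1                  # no pid recorded
  kill -0 "$pid" 2>/dev/null || return 1     # process is gone
  # Assumption: pidstart was saved from `ps -o lstart= -p <pid>` at launch.
  want=$(meta_get "$meta" pidstart)
  [ -n "$want" ] || return 0
  have=$(ps -o lstart= -p "$pid" 2>/dev/null | sed 's/^ *//; s/ *$//')
  [ "$want" = "$have" ]                      # mismatch means the PID was reused
}

# Move every orphan in building/ back to inbox/ and bump its attempts counter.
recover_orphans() {
  local root=$1 meta job n
  mkdir -p "$root/inbox"
  for meta in "$root"/building/*.meta; do
    [ -e "$meta" ] || continue               # glob matched nothing
    if worker_alive "$meta"; then continue; fi
    job=$(basename "$meta" .meta)
    n=$(meta_get "$meta" attempts)
    n=$(( ${n:-0} + 1 ))
    printf 'attempts=%s\n' "$n" >> "$meta"
    echo "recover: orphan $job requeued (attempts=$n)"
    mv "$meta" "$root/inbox/$job.meta"
    if [ -e "$root/building/$job.md" ]; then
      mv "$root/building/$job.md" "$root/inbox/$job.md"
    fi
  done
}
```

Calling `recover_orphans "$AGENT_QUEUE_ROOT"` twice in a row moves and counts each orphan once, because the second call no longer sees it under building/.
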

A2. WIP CHECKPOINTING (work preservation): when a job's `cwd` is inside a git repo,
the worker preserves partial work on a dedicated branch so a crash never loses it:
- at START: ensure/create branch `aq/wip/<job>` (from current HEAD), record
`wip_branch=` + `wip_base=` in meta. NEVER touch main/protected branches.
- on EVERY exit path (success, failure, timeout, signal/trap): commit any
changes in cwd to `aq/wip/<job>` with a message like
"aq wip: <job> (<stage/exit>)" and record `wip_commit=` in meta.
- use a trap so even SIGTERM/SIGINT/timeout still checkpoints.
- if cwd is NOT a git repo: skip cleanly (log "wip: cwd not a git repo").
RESUME: when an orphan/retry of a job whose `aq/wip/<job>` branch exists is
relaunched, check out / fast-forward that branch first so the agent continues
from the checkpoint instead of from zero. Document the resume behavior.
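
One way the A2 checkpoint and resume flow could look, as a sketch rather than the intended implementation. The function names `wip_start`, `wip_checkpoint` and `run_with_checkpoint` are hypothetical, meta bookkeeping (`wip_branch=`, `wip_base=`, `wip_commit=`) is left out for brevity, and the agent is run in the background so the shell can react to a signal while it waits.

```shell
# Sketch only: function names are hypothetical; branch name and log text follow the spec.

# Switch cwd's repo to aq/wip/<job>, creating it from HEAD on first use.
# Returns 1 (after logging) when cwd is not inside a git work tree.
wip_start() {
  local cwd=$1 branch="aq/wip/$2"
  if ! git -C "$cwd" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    echo "wip: cwd not a git repo"
    return 1
  fi
  if git -C "$cwd" show-ref --verify --quiet "refs/heads/$branch"; then
    git -C "$cwd" checkout --quiet "$branch"      # resume from the checkpoint
  else
    git -C "$cwd" checkout --quiet -b "$branch"   # first run: branch from HEAD
  fi
}

# Commit whatever changed in the work tree onto the current (wip) branch.
wip_checkpoint() {
  local cwd=$1 job=$2 reason=$3
  git -C "$cwd" add -A
  if git -C "$cwd" diff --cached --quiet; then return 0; fi   # nothing to save
  git -C "$cwd" commit --quiet -m "aq wip: $job ($reason)"
}

# Run a command for a job and checkpoint on every exit path, signals included.
run_with_checkpoint() {
  local cwd=$1 job=$2 rc=0 child=
  shift 2
  wip_start "$cwd" "$job" || { "$@"; return; }
  # On a signal: stop the agent, save its partial work, then exit.
  trap 'kill "$child" 2>/dev/null; wait "$child" 2>/dev/null; wip_checkpoint "$cwd" "$job" signal; exit 130' INT TERM
  "$@" &
  child=$!
  wait "$child" || rc=$?
  trap - INT TERM
  wip_checkpoint "$cwd" "$job" "exit $rc"
  return "$rc"
}
```

The `wait ... || rc=$?` form keeps a failing agent from aborting the function under `set -e`, so the checkpoint still runs. A real version would also need to restore any traps the caller had installed instead of clearing them.
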

A3. RETRY POLICY (make the reserved `retry` field FUNCTIONAL):
parse `retry: { max: N, backoff: 5m, on: [timeout, verify_failed, crash] }`.
On a failure whose class is in `on` (agent rc!=0 => crash/agent_error,
timeout => timeout, verify fail => verify_failed), requeue to inbox/ with the
backoff delay honored (record `next_eligible=` epoch; selection skips until
then) up to `max` attempts. On exhaustion → failed/ with
result=retries_exhausted (single-host stand-in for dead_letter), preserving the
wip branch + full diagnostics in the log. Default when `retry` absent = no
retry (current behavior).
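
The decision logic in A3 can be sketched as below. This assumes the frontmatter has already been parsed into plain values (`on` as a space-separated list), that `attempts` counts runs completed so far (so `max: 1` allows one requeue, matching the selftest case with attempts=2), and that the helper names `backoff_seconds`, `retry_decision` and `is_eligible` are hypothetical.

```shell
# Sketch only: helper names and argument order are assumptions.

# Convert a backoff such as 30s, 5m, 2h (or bare seconds) to seconds.
backoff_seconds() {
  local v=$1 n unit
  case $v in
    *[smh]) n=${v%?}; unit=${v#"$n"} ;;
    *)      n=$v; unit=s ;;
  esac
  case $n in ''|*[!0-9]*) return 1 ;; esac   # reject empty or non-numeric input
  case $unit in
    s) echo $(( 10#$n )) ;;
    m) echo $(( 10#$n * 60 )) ;;
    h) echo $(( 10#$n * 3600 )) ;;
  esac
}

# Print "retry <next_eligible epoch>", "retries_exhausted", or "fail".
# Args: failure class, attempts so far, retry.max, retry.backoff, retry.on, now (epoch).
retry_decision() {
  local class=$1 attempts=$2 max=$3 backoff=$4 on=$5 now=$6 delay
  case " $on " in
    *" $class "*) ;;                         # class is retryable
    *) echo "fail"; return 0 ;;              # not listed in `on`: no retry
  esac
  if [ "$attempts" -gt "$max" ]; then
    echo "retries_exhausted"
    return 0
  fi
  delay=$(backoff_seconds "$backoff") || delay=0
  echo "retry $(( now + delay ))"
}

# Selection skips a job until its next_eligible epoch has passed.
is_eligible() {
  [ "${1:-0}" -le "$2" ]
}
```

For example, `retry_decision timeout 1 2 5m "timeout crash" "$(date +%s)"` prints `retry` followed by an epoch five minutes ahead; `date +%s` is available in both BSD and GNU date.
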

A4. STATE INTEGRITY: keep all meta writes append-only (as today); never truncate a
live meta. Recovery/retry/backoff bookkeeping must be crash-safe (re-derivable
from meta + folder location).

==================================================================
B. EXECUTION INSIGHTS & TOKEN ACCOUNTING (single-host §26)
==================================================================
B1. PER-RUN METRICS: on completion, record into the job meta:
duration_s, exit, result, attempts, and repo deltas for the run —
files_changed, lines_added, lines_deleted (from `git -C <cwd> diff --numstat`
against wip_base, or against HEAD~ if applicable).
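
As an illustration of the B1 repo deltas, the following sketch sums `git diff --numstat` output (tab-separated added, deleted, path) into the three meta keys. The function name `numstat_totals` is hypothetical; the only git behavior relied on is that binary files report `-` for both counts.

```shell
# Sketch only: the function name is hypothetical.
# Sum `git diff --numstat` output read from stdin into key=value lines.
numstat_totals() {
  awk '
    { files++ }
    $1 != "-" { added += $1 }      # binary files show "-" for both counts
    $2 != "-" { deleted += $2 }
    END {
      printf "files_changed=%d\nlines_added=%d\nlines_deleted=%d\n", files, added, deleted
    }
  '
}
```

A possible call site is `git -C "$cwd" diff --numstat "$wip_base" "$wip_commit" | numstat_totals >> "$meta"`. Comparing two commits rather than the work tree matters because a plain `git diff` does not list untracked files.
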
B2. TOKEN/COST CAPTURE (best-effort, honest): add a single extensible adapter
`parse_usage <engine> <logfile>` that extracts, when present in the engine's
output: model, tokens_in, tokens_out, tokens_cached, cost_usd, turns,
tool_calls. Where the engine does not expose usage, omit the field or set an
`estimated=true` marker — DO NOT fabricate precise numbers. Centralize all
per-engine patterns in this one function (devin/claude/codex/copilot stubs;
real patterns where known, TODO-commented otherwise).
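
A possible shape for the B2 adapter, shown only as a sketch. The `AQ_USAGE key=value ...` log line and the `stub` engine are invented here for selftest purposes; no real engine is known to print that line, so the real engines are left as TODO branches that emit nothing instead of guessed numbers.

```shell
# Sketch only: "AQ_USAGE key=value ..." is a made-up stub format for tests.
# Prints known usage fields as key=value lines; prints nothing when usage is absent.
parse_usage() {
  local engine=$1 log=$2 line=""
  [ -r "$log" ] || return 0
  case $engine in
    stub)
      # `|| true` keeps a no-match grep from aborting under set -euo pipefail.
      line=$(grep '^AQ_USAGE ' "$log" | tail -n 1) || true
      ;;
    devin|claude|codex|copilot)
      line=""   # TODO: add the engine's real usage pattern once confirmed
      ;;
  esac
  [ -n "$line" ] || return 0     # usage not exposed: omit every field
  # Keep only the whitelisted keys so nothing else from the log reaches the meta.
  printf '%s\n' "$line" | tr ' ' '\n' |
    grep -E '^(model|tokens_in|tokens_out|tokens_cached|cost_usd|turns|tool_calls)=' || true
}
```

The whitelist also serves the privacy rule: any other `key=value` pair on the usage line is dropped before it can be written to meta.
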
B3. SURFACE in `status`: add an insights sub-line per finished/running job
(duration, attempts, tokens/cost if known, +/- lines).
B4. NEW COMMAND `aq insights [job]`:
- with a job id: print that job's full metrics.
- without: print a table of recent finished jobs + an AGGREGATE rollup by
engine (total tokens, total cost (mark if any estimated), job count,
success rate, avg duration).
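
The per-engine rollup in B4 could be computed roughly as follows. This is a sketch under assumptions: the caller has already flattened each finished job's meta into one row of `engine result duration_s tokens cost_usd estimated`, `-` stands for an unknown value, `success` is a placeholder for whatever result value the script uses for a passing job, and `insights_rollup` is a hypothetical name. A `~` before the cost marks a total that includes estimated figures.

```shell
# Sketch only: row format, the "success" value, and the function name are assumptions.
# Reads rows "engine result duration_s tokens cost_usd estimated" from stdin.
insights_rollup() {
  awk '
    {
      jobs[$1]++
      if ($2 == "success") ok[$1]++
      dur[$1] += $3
      if ($4 != "-") tokens[$1] += $4
      if ($5 != "-") cost[$1] += $5
      if ($6 == "true") est[$1] = "~"   # any estimated job taints the total
    }
    END {
      for (e in jobs)
        printf "%s jobs=%d success=%d%% avg_s=%d tokens=%d cost=%s%.2f\n",
          e, jobs[e], 100 * ok[e] / jobs[e], dur[e] / jobs[e], tokens[e], est[e], cost[e]
    }
  ' | sort
}
```

Percentages and average durations are truncated to whole numbers by `%d`; a real table would probably align columns as well.
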
B5. dashboard.mjs: surface a compact insights column/panel (tokens or cost +
attempts) for finished jobs. Keep it read-only from meta (agent-queue.sh
stays the single source of truth).
B6. PRIVACY: never write prompt content or secrets into meta/insights/logs beyond
what already exists.

==================================================================
TESTS (selftest.sh — tests are sacred; only ADD; use temp git repos + stubs)
==================================================================
- orphan recovery: craft a building/ job whose meta pid is a dead PID → a `run`
startup recovers it to inbox/ with attempts incremented; running recovery twice
recovers exactly once.
- wip checkpoint (git): job with a git-repo cwd that creates a file → after the
run, branch aq/wip/<job> exists and contains a commit with the change; main
branch untouched. Non-git cwd → skipped cleanly (no error).
- wip resume: a recovered job whose aq/wip/<job> has a prior commit → the relaunch
checks out that branch (assert HEAD is on aq/wip/<job> when the agent runs).
- retry policy: verify-fail job with retry.max=1 on=[verify_failed] → requeued once
(attempts=2) then → failed/ result=retries_exhausted; backoff next_eligible
respected (job not picked before its delay — use a tiny backoff like 1s).
- retry on crash: agent rc!=0 with on=[crash] retries; without `crash` in `on`,
it goes straight to failed/ (no retry).
- insights parse: feed a stub engine log containing a known usage line →
parse_usage extracts tokens/cost into meta; `aq insights <job>` prints them;
a no-usage log → fields omitted/estimated, no crash.
- insights aggregate: two finished jobs → `aq insights` prints a per-engine rollup
with correct totals + success rate.
- numstat deltas: a run that adds N lines → lines_added recorded.
- REGRESSION: all existing selftest cases (Slice 0 + Slice 1) still green.

==================================================================
DOCS
==================================================================
- README: new "Resilience" section (orphan recovery, WIP checkpoint/resume, retry)
and "Insights" section (metrics, `aq insights`, token caveat) + document the
`retry` frontmatter (now active) and the new result= values
(retries_exhausted). Update the manifest table: move `retry` from RESERVED to ACTIVE.
- docs/GIGAFACTORY_ROADMAP.md: tick the single-host items you fully completed in
§11 (retry/dead-letter stand-in), §25 (orphan/WIP/retry — note "single-host
subset"), §26 (capture/insights — single-host subset); bump §0 Phase 1 %.

==================================================================
CONSTRAINTS
==================================================================
- bash style consistent with the existing script; no new runtime deps; mac+linux
safe (no GNU-only flags without a fallback — note macOS has BSD date/stat);
no emojis in code; no leftover debug noise; conventional commits.
- Be careful with `set -euo pipefail` + traps so the WIP-on-exit checkpoint always
runs even on failure/timeout.

VERIFY GATE (must pass before finishing):
- bash agent-queue/selftest.sh → fully green (existing + all new cases).
- bash -n agent-queue/agent-queue.sh ; node --check agent-queue/dashboard.mjs.
- shellcheck --severity=error agent-queue/agent-queue.sh (if available) → clean.

FINAL OUTPUT — print the implementation report in EXACTLY this format:

## Implementation Report — Phase 1 Slice 3
### Branch & commits
- branch / based-on: <name> (based on main | feat/gigafactory-p1-slice1)
- commits: <sha> <message> (one per line)
- PR: <url or "opened, not merged">
### Files changed
- <path>: <one-line summary>
### What was implemented (A1-A4, B1-B6)
- <item>: <how, key functions added/changed>
### Tests added
- <test name>: <what it asserts> (plus selftest.sh PASS/FAIL summary)
### Verify gate results
- selftest.sh: <PASS/FAIL + counts>
- bash -n / node --check / shellcheck: <result>
### Deviations / assumptions
- <anything changed from spec and why; which engines have real token parsing vs TODO>
### Suggested next slice
- <what should come next (likely: tracker adapter aq from-tracker/to-tracker, P2)>