Observability + defense-in-depth for the recovery/GC machinery:

- The reaper now accumulates process-wide telemetry (getReaperStats): cumulative expired/stale reclaims and per-container GC deletions, plus startedAt/last-run timestamps. GET /fleet/metrics returns it under a `reaper` field so operators can see recovery activity (dead_letter counts/alerts were already added).
- Cosmos TTL backstop on fleet_leases (2 days): a held lease is renewed continuously so it never expires while active; only finished leases age out, matching the ~24h app GC. Purely defense-in-depth behind the reaper, which still OWNS recovery (requeue + epoch bump + checkpoint). TTL is deliberately NOT set on fleet_events (ids are `<jobId>:evt:<seq>` with seq=count, so partial TTL deletion could collide ids); events/runs/jobs are pruned by the cascade GC instead. Memory provider ignores defaultTtl, so tests/dev are unaffected.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
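The id-collision argument for leaving TTL off fleet_events can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `EventDoc` type, `nextEventId`, and `appendEvent` are hypothetical names, not taken from the repository, and seq is assumed to start at 0 and equal the number of events currently stored for the job.

```typescript
// Hypothetical model of the id scheme described in the commit message:
// id = <jobId>:evt:<seq>, where seq is the current event count for the job.
type EventDoc = { id: string; jobId: string; payload: string };

// Assumption: seq is derived by counting the job's stored events (0-based).
function nextEventId(jobId: string, store: EventDoc[]): string {
  const seq = store.filter((e) => e.jobId === jobId).length;
  return `${jobId}:evt:${seq}`;
}

// Append an event, refusing to overwrite an existing id.
function appendEvent(store: EventDoc[], jobId: string, payload: string): EventDoc {
  const id = nextEventId(jobId, store);
  if (store.some((e) => e.id === id)) {
    throw new Error(`id collision: ${id}`);
  }
  const doc: EventDoc = { id, jobId, payload };
  store.push(doc);
  return doc;
}

// Three events are written normally: job1:evt:0, job1:evt:1, job1:evt:2.
const store: EventDoc[] = [];
appendEvent(store, "job1", "a");
appendEvent(store, "job1", "b");
appendEvent(store, "job1", "c");

// Simulate a partial TTL deletion: only the oldest event has expired.
const afterTtl = store.filter((e) => e.id !== "job1:evt:0");

// The count is now 2, so the next id is job1:evt:2, which still exists.
const collidingId = nextEventId("job1", afterTtl);
const collides = afterTtl.some((e) => e.id === collidingId);
console.log(collidingId, collides);
```

In this model, deleting a whole job's events at once (as a cascade GC would) leaves no surviving ids to collide with, which is consistent with the choice described above.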
| Name |
|---|
| scripts |
| src |
| .gitignore |
| Dockerfile |
| package.json |
| POSTAL_SMTP_SETUP.md |
| tsconfig.json |
| vitest.config.ts |