learning_ai_common_plat/services
saravanakumardb1 5413c0e789 feat(fleet): reaper/GC telemetry in /fleet/metrics + lease TTL backstop
Observability + defense-in-depth for the recovery/GC machinery:

- The reaper now accumulates process-wide telemetry (getReaperStats): cumulative
  expired/stale reclaims and per-container GC deletions, plus startedAt/last-run
  timestamps. GET /fleet/metrics returns it under a `reaper` field so operators
  can see recovery activity (dead_letter counts/alerts were already added).
- Cosmos TTL backstop on fleet_leases (2 days): a held lease is renewed
  continuously so it never expires while active; only finished leases age out,
  matching the ~24h app GC. Purely defense-in-depth behind the reaper, which still
  OWNS recovery (requeue + epoch bump + checkpoint). TTL is deliberately NOT set on
  fleet_events: ids are <jobId>:evt:<seq> with seq = the current event count, so a
  partial TTL deletion could make a later event reuse an existing id.
  Events/runs/jobs are pruned by the cascade GC instead.
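
To illustrate why TTL is kept off fleet_events, here is a minimal sketch of the id scheme described above. It is an in-memory model, not the service's code: `FleetEvent`, `nextEventId`, and `appendEvent` are hypothetical names, and the only thing taken from the commit message is the `<jobId>:evt:<seq>` format with seq derived from the event count.

```typescript
// Hypothetical in-memory model of the event id scheme (not the real service code).
// Assumption from the commit message: id = <jobId>:evt:<seq>, seq = current event count.
type FleetEvent = { id: string; jobId: string };

function nextEventId(jobId: string, existing: FleetEvent[]): string {
  // seq is the number of events currently stored for this job
  const seq = existing.filter((e) => e.jobId === jobId).length;
  return `${jobId}:evt:${seq}`;
}

function appendEvent(jobId: string, store: FleetEvent[]): FleetEvent {
  const evt: FleetEvent = { id: nextEventId(jobId, store), jobId };
  store.push(evt);
  return evt;
}

const store: FleetEvent[] = [];
appendEvent("job-1", store); // job-1:evt:0
appendEvent("job-1", store); // job-1:evt:1
appendEvent("job-1", store); // job-1:evt:2

// Simulate a partial TTL deletion: only the oldest event has aged out.
const afterTtl = store.filter((e) => e.id !== "job-1:evt:0");

// Two events remain, so the next id is computed with seq = 2,
// which is already taken by a surviving event.
const collidingId = nextEventId("job-1", afterTtl);
const collides = afterTtl.some((e) => e.id === collidingId);
```

Deleting a job's events together, as the cascade GC is described as doing, avoids this because no later append can observe a partially shrunken count.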

The memory provider ignores defaultTtl, so tests/dev are unaffected.
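
The lease TTL backstop described above relies on Cosmos DB measuring TTL from an item's last write, so every renewal restarts the clock. The sketch below models only that arithmetic. `Lease`, `renew`, `isExpired`, and the hourly renewal interval are hypothetical, chosen for illustration; the 2-day TTL is the only value taken from the commit message.

```typescript
// Hypothetical model of a TTL measured from the last write (not the real service code).
type Lease = { id: string; lastWriteSec: number };

const HOUR_SEC = 60 * 60;
const LEASE_TTL_SEC = 2 * 24 * HOUR_SEC; // 2 days, as stated in the commit message

// A renewal is a write, so it moves the last-write timestamp forward.
function renew(lease: Lease, nowSec: number): Lease {
  return { ...lease, lastWriteSec: nowSec };
}

// The item is eligible for TTL deletion once the TTL has elapsed since the last write.
function isExpired(lease: Lease, nowSec: number): boolean {
  return nowSec - lease.lastWriteSec > LEASE_TTL_SEC;
}

let held: Lease = { id: "lease-held", lastWriteSec: 0 };
const finished: Lease = { id: "lease-finished", lastWriteSec: 0 };

// Renew the held lease every hour for three days (assumed interval).
let heldEverExpired = false;
for (let now = HOUR_SEC; now <= 3 * 24 * HOUR_SEC; now += HOUR_SEC) {
  if (isExpired(held, now)) heldEverExpired = true;
  held = renew(held, now);
}

// The finished lease is never written again, so it ages out after 2 days.
const finishedExpired = isExpired(finished, 3 * 24 * HOUR_SEC);
```

In this model the held lease never becomes eligible for deletion, while the finished one does, which is the behaviour the backstop depends on.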

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-06-01 12:18:22 -07:00
cowork-service chore(deps): bump @types/node 22 -> 25 (dev types) 2026-05-31 04:02:56 -07:00
extraction-service chore(deps): bump @types/node 22 -> 25 (dev types) 2026-05-31 04:02:56 -07:00
mcp-server chore(deps): bump @types/node 22 -> 25 (dev types) 2026-05-31 04:02:56 -07:00
monitoring chore(deps): bump @types/node 22 -> 25 (dev types) 2026-05-31 04:02:56 -07:00
platform-service feat(fleet): reaper/GC telemetry in /fleet/metrics + lease TTL backstop 2026-06-01 12:18:22 -07:00