learning_ai_common_plat

saravanakumardb1 68bfa3dbd8 feat(fleet): stale-factory lease reclaim + bounded GC sweep	2026-06-01 11:34:14 -07:00

Two recovery/cleanup gaps left the coordinator's containers growing without bound and jobs stuck longer than necessary:

- reclaimStaleFactoryLeases: a crashed/partitioned factory stops heartbeating ~90s before its 900s lease TTL expires; the reaper now reclaims held leases of stale (or vanished) holders within one stale window, via the same fence + checkpoint-preserving path as the expiry reaper (refactored into reclaimLeaseJob).
- sweepFleetGarbage: deletes ephemeral coordination state, on by default (finished expired/released leases past a 24h TTL; factory docs with no heartbeat for 7d, since a live host just re-registers). Terminal-job retention (jobs + their runs/events/artifacts+blobs) is OPT-IN only via FLEET_GC_RETENTION_DAYS (default 0 = never delete history). Every delete is best-effort so one failure can't stall the sweep.

Both are wired into the existing reaper loop: recovery scans run every 30s, the deletion sweep is throttled to hourly.

New repo helpers (listHeldLeases, listFinishedLeasesOlderThan, deleteLease, listAllFactories, deleteFactory, listTerminalJobsOlderThan, deleteRun, deleteEvent) back the new coordinator functions. Covered by cleanup.test.ts + expanded reaper.test.ts.

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
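
The selection and throttling logic described in the commit message could look roughly like the sketch below. It is not the code from platform-service: the `Lease` and `Factory` shapes, the helper names `findReclaimableLeases`, `shouldRunSweep`, and `retentionCutoffMs`, and the 90s stale window are all assumptions made for illustration. Only the hourly throttle and the "0 = never delete history" retention default come from the message itself.

```typescript
// Hypothetical document shapes; the real repo types are not shown in the commit.
interface Lease {
  jobId: string;
  holderId: string; // id of the factory that holds the lease
}

interface Factory {
  id: string;
  lastHeartbeatMs: number; // epoch milliseconds of the last heartbeat
}

// Assumed stale window; the commit message only mentions "~90s".
const STALE_WINDOW_MS = 90_000;
const HOUR_MS = 3_600_000;
const DAY_MS = 86_400_000;

// Returns held leases whose holder is stale (no recent heartbeat)
// or vanished (no factory doc at all).
function findReclaimableLeases(
  heldLeases: Lease[],
  factories: Factory[],
  nowMs: number,
  staleWindowMs: number = STALE_WINDOW_MS,
): Lease[] {
  const lastSeen = new Map<string, number>();
  for (const f of factories) {
    lastSeen.set(f.id, f.lastHeartbeatMs);
  }
  return heldLeases.filter((lease) => {
    const seen = lastSeen.get(lease.holderId);
    if (seen === undefined) return true; // holder vanished
    return nowMs - seen > staleWindowMs; // holder stale
  });
}

// Throttle: the deletion sweep runs at most once per interval,
// even though the surrounding reaper loop ticks far more often.
function shouldRunSweep(
  lastSweepMs: number | null,
  nowMs: number,
  intervalMs: number = HOUR_MS,
): boolean {
  return lastSweepMs === null || nowMs - lastSweepMs >= intervalMs;
}

// Opt-in retention: 0 (the default) disables terminal-job deletion,
// signalled here by returning null instead of a cutoff timestamp.
function retentionCutoffMs(retentionDays: number, nowMs: number): number | null {
  return retentionDays > 0 ? nowMs - retentionDays * DAY_MS : null;
}
```

In a loop that ticks every 30s, the caller would pass each lease returned by `findReclaimableLeases` to the shared reclaim path on every tick, and gate the deletion sweep behind `shouldRunSweep`. Returning `null` from `retentionCutoffMs` keeps the "never delete history" default explicit at the call site rather than encoding it as a very old timestamp.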
cowork-service	chore(deps): bump @types/node 22 -> 25 (dev types)	2026-05-31 04:02:56 -07:00
extraction-service	chore(deps): bump @types/node 22 -> 25 (dev types)	2026-05-31 04:02:56 -07:00
mcp-server	chore(deps): bump @types/node 22 -> 25 (dev types)	2026-05-31 04:02:56 -07:00
monitoring	chore(deps): bump @types/node 22 -> 25 (dev types)	2026-05-31 04:02:56 -07:00
platform-service	feat(fleet): stale-factory lease reclaim + bounded GC sweep	2026-06-01 11:34:14 -07:00