learning_ai_common_plat/services
saravanakumardb1 8f51570da7 feat(platform-service): fleet coordinator — claim/lease/fence/heartbeat/reaper (P2 foundation)
The concurrency core (§4/§7/§8/§18/§25):
- claimNextJob: selects by priority and age among queued, dep-satisfied jobs
  whose caps are a subset of the factory's; tryClaimJob then does a
  compare-and-swap on rev to flip the job to assigned and acquire the lease.
  Exactly one contender wins, so there is no double-assignment.
- leases + fencing: acquire/reclaim bumps leaseEpoch; patchJobFenced/renew/release
  reject any call whose leaseEpoch < job.leaseEpoch, so a zombie worker can't
  overwrite the current holder's state.
- heartbeat + isFactoryStale for factory liveness.
- reapExpiredLeases: returns expired-lease jobs to queued/blocked, bumps the epoch
  (fencing the dead holder), preserves the checkpoint pointer (resume), marks the
  lease expired; idempotent. Documents why Cosmos TTL cannot do this.
- submit: idempotent (dedup/supersede/409), with dependency cycle detection at
  submit time; deps gating (a dep counts as satisfied once shipped, or once
  testing when depsMode:soft).
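
The fencing and reaper bullets above can be illustrated with a minimal in-memory sketch. It is not the service's code: the LeasedJob shape, the function signatures, and the rule that a reaped job goes to blocked when its deps are unsatisfied are assumptions made for illustration. Only the names patchJobFenced, reapExpiredLeases, and leaseEpoch come from the commit message.

```typescript
// Hypothetical job shape; the real document schema is not shown in the commit.
type Status = "queued" | "blocked" | "assigned";

interface LeasedJob {
  id: string;
  status: Status;
  leaseEpoch: number;
  leaseExpiresAt: number | null; // epoch ms; null when no lease is held
  leaseState: "none" | "active" | "expired";
  checkpointRef?: string;
  depsSatisfied: boolean;
  progress?: string;
}

// Fenced write: a caller holding an older epoch than the job's is rejected.
function patchJobFenced(
  job: LeasedJob,
  callerEpoch: number,
  patch: Partial<Pick<LeasedJob, "progress" | "checkpointRef">>,
): boolean {
  if (callerEpoch < job.leaseEpoch) return false; // zombie holder, ignore
  Object.assign(job, patch);
  return true;
}

// Reaper: requeue jobs whose lease has expired and fence the dead holder.
// Returns how many jobs were reaped on this pass.
function reapExpiredLeases(jobs: LeasedJob[], now: number): number {
  let reaped = 0;
  for (const job of jobs) {
    const expired =
      job.status === "assigned" &&
      job.leaseExpiresAt !== null &&
      job.leaseExpiresAt <= now;
    if (!expired) continue;
    // Assumption: unsatisfied deps send the job to blocked, otherwise queued.
    job.status = job.depsSatisfied ? "queued" : "blocked";
    job.leaseEpoch += 1; // any write from the old holder now fails the fence
    job.leaseExpiresAt = null;
    job.leaseState = "expired";
    // checkpointRef is deliberately left untouched so the next holder can resume.
    reaped += 1;
  }
  return reaped;
}

const job: LeasedJob = {
  id: "job-7",
  status: "assigned",
  leaseEpoch: 3,
  leaseExpiresAt: 1000,
  leaseState: "active",
  checkpointRef: "ckpt-42",
  depsSatisfied: true,
};

const firstPass = reapExpiredLeases([job], 2000);
const secondPass = reapExpiredLeases([job], 2000); // nothing left to reap
const zombieWrite = patchJobFenced(job, 3, { progress: "stale" });
const freshWrite = patchJobFenced(job, 4, { progress: "resumed" });
```

In this sketch the second reaper pass is a no-op because the job is no longer assigned, which is one simple way to get the idempotence the commit describes.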

Tests drive the atomic-claim race, fencing, and reaper deterministically via the
rev CAS (no real threads).
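
As a rough illustration of how a claim race can be driven deterministically through a rev CAS, the sketch below has two contenders read the same snapshot and then attempt the claim one after the other. The InMemoryJobStore class, the Job shape, and the tryClaimJob signature are hypothetical stand-ins; the actual store and its API are not shown in the commit.

```typescript
// Hypothetical job shape and store, for illustration only.
interface Job {
  id: string;
  status: "queued" | "assigned";
  rev: number; // optimistic-concurrency revision
  leaseEpoch: number;
  leaseHolder?: string;
}

class InMemoryJobStore {
  private jobs = new Map<string, Job>();

  put(job: Job): void {
    this.jobs.set(job.id, { ...job });
  }

  // Returns a snapshot copy, as a database read would.
  read(id: string): Job {
    const job = this.jobs.get(id);
    if (!job) throw new Error(`unknown job ${id}`);
    return { ...job };
  }

  // Compare-and-swap: write only if the stored rev still equals expectedRev.
  replaceIfRev(next: Job, expectedRev: number): boolean {
    const current = this.jobs.get(next.id);
    if (!current || current.rev !== expectedRev) return false;
    this.jobs.set(next.id, { ...next, rev: expectedRev + 1 });
    return true;
  }
}

// Flip queued -> assigned and take the lease, conditional on the snapshot's rev.
function tryClaimJob(
  store: InMemoryJobStore,
  snapshot: Job,
  factoryId: string,
): boolean {
  if (snapshot.status !== "queued") return false;
  const claimed: Job = {
    ...snapshot,
    status: "assigned",
    leaseHolder: factoryId,
    leaseEpoch: snapshot.leaseEpoch + 1, // acquiring the lease bumps the epoch
  };
  return store.replaceIfRev(claimed, snapshot.rev);
}

// The "race": both contenders read before either writes, so both hold rev 0.
const store = new InMemoryJobStore();
store.put({ id: "job-1", status: "queued", rev: 0, leaseEpoch: 0 });

const seenByA = store.read("job-1");
const seenByB = store.read("job-1");

const aWon = tryClaimJob(store, seenByA, "factory-a");
const bWon = tryClaimJob(store, seenByB, "factory-b"); // stale rev, CAS fails

const finalJob = store.read("job-1");
```

Because the interleaving is fixed by the order of the calls, the outcome does not depend on thread scheduling, which matches the "no real threads" approach the commit mentions.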
2026-05-29 20:20:30 -07:00
cowork-service chore(cowork-service): type test doubles 2026-05-04 15:25:43 -07:00
extraction-service chore(extraction): document sidecar dev alerts 2026-05-04 16:42:25 -07:00
mcp-server test(mcp-server): cover chronomind tool proxies 2026-05-18 09:33:30 +00:00
monitoring feat(monitoring): add VM Overview Grafana dashboard 2026-05-29 21:26:35 +00:00
platform-service feat(platform-service): fleet coordinator — claim/lease/fence/heartbeat/reaper (P2 foundation) 2026-05-29 20:20:30 -07:00