# Ecosystem Agent Runtime Contract

> **Status:** Phase 5 baseline implemented
> **Owner:** `learning_ai_common_plat`
> **Reference inputs:** `claw-code-oss`, `claw-cowork`, `learning_ai_trails`, `learning_ai_flowmonk`, `learning_ai_jarvis_jr`
> **Purpose:** Standardize session state, task state, resume behavior, dispatch semantics, approvals, and audit hooks across agent-capable products.

---

## 1. Problem

The ecosystem already has multiple agent-runtime ideas:

- `claw-code` runtime sessions, todos, project memory, resume, MCP lifecycle
- `claw-cowork` task orchestration, dispatch, scheduling, approvals, audit logging
- FlowMonk planning/execution
- JarvisJr coaching/delegation concepts
- ActionTrail review and replay

Without a shared runtime contract:

- each repo reinvents session models
- handoff and resume become inconsistent
- audit/replay becomes lossy
- approvals cannot be shared cleanly

---

## 2. Goals

1. Define the canonical runtime state model.
2. Define session continuity and resume semantics.
3. Define dispatch and handoff metadata.
4. Define approval checkpoints and audit hooks.
5. Allow multiple implementations while preserving one contract.

## 3. Non-Goals

1. Forcing all agent products to use one codebase.
2. Standardizing UI/UX across all agent surfaces.
3. Replacing product-specific orchestration logic.

---

## 4. Core Entities

The shared runtime contract should define:

- `AgentSession`
- `AgentTask`
- `AgentTodo`
- `AgentRun`
- `AgentApprovalCheckpoint`
- `AgentDispatchRequest`
- `AgentHandoff`
- `AgentActionLog`

---

## 5. Minimum Session Shape

```ts
type AgentSession = {
  sessionId: string;
  productId: string;
  userId: string;
  status: 'active' | 'paused' | 'waiting-approval' | 'completed' | 'failed' | 'cancelled';
  startedAt: string;
  updatedAt: string;
  resumable: boolean;
  currentTaskId?: string | null;
  memoryRefs: string[];
  artifactRefs: string[];
  approvalRefs: string[];
  dispatchContext?: AgentDispatchContext | null;
};

type AgentTask = {
  taskId: string;
  sessionId: string;
  title: string;
  intent: string;
  status: 'queued' | 'running' | 'blocked' | 'completed' | 'failed' | 'cancelled';
  priority?: string;
  createdAt: string;
  updatedAt: string;
};

type AgentTodo = {
  todoId: string;
  sessionId: string;
  text: string;
  status: 'open' | 'in-progress' | 'done' | 'dropped';
  createdAt: string;
  updatedAt: string;
};
```
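
As a usage illustration, a minimal sketch of creating a session that satisfies the shape above. `createSession` and `MinimalAgentSession` are hypothetical names, `dispatchContext` is left out to keep the sketch self-contained, and ISO 8601 timestamps are an assumption because the contract only requires `string`.

```ts
// Trimmed copy of the contract type so this sketch stands alone.
type SessionStatus =
  | 'active'
  | 'paused'
  | 'waiting-approval'
  | 'completed'
  | 'failed'
  | 'cancelled';

type MinimalAgentSession = {
  sessionId: string;
  productId: string;
  userId: string;
  status: SessionStatus;
  startedAt: string;
  updatedAt: string;
  resumable: boolean;
  currentTaskId?: string | null;
  memoryRefs: string[];
  artifactRefs: string[];
  approvalRefs: string[];
};

// Hypothetical factory: returns a fresh session with empty reference lists.
// Assumption: timestamps are ISO 8601 strings.
function createSession(
  sessionId: string,
  productId: string,
  userId: string,
  now: Date = new Date(),
): MinimalAgentSession {
  const timestamp = now.toISOString();
  return {
    sessionId,
    productId,
    userId,
    status: 'active',
    startedAt: timestamp,
    updatedAt: timestamp,
    // Left false here; a product flips it once it can restore from persisted state.
    resumable: false,
    currentTaskId: null,
    memoryRefs: [],
    artifactRefs: [],
    approvalRefs: [],
  };
}
```

Starting with `resumable: false` is a conservative default chosen for this sketch, not something the contract mandates.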

---

## 6. Required Runtime Behaviors

Every compliant implementation should support:

1. session creation
2. resumable state checkpoints
3. todo/task updates during execution
4. approval checkpoints
5. action-log emission
6. artifact emission
7. dispatch metadata when execution originates elsewhere
8. replayability in ActionTrail
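
One way to make this list checkable is a capability declaration per product. The sketch below is illustrative only: the capability keys and `missingBehaviors` are hypothetical names, not part of the published contract.

```ts
// Hypothetical capability keys, one per required behavior in the list above.
const REQUIRED_BEHAVIORS = [
  'sessionCreation',
  'resumableCheckpoints',
  'todoTaskUpdates',
  'approvalCheckpoints',
  'actionLogEmission',
  'artifactEmission',
  'dispatchMetadata',
  'actionTrailReplay',
] as const;

type RuntimeBehavior = (typeof REQUIRED_BEHAVIORS)[number];

// A product declares which behaviors it supports; omitted keys count as unsupported.
type RuntimeCapabilities = Partial<Record<RuntimeBehavior, boolean>>;

// Returns the behaviors a product still has to implement to be compliant.
function missingBehaviors(caps: RuntimeCapabilities): RuntimeBehavior[] {
  return REQUIRED_BEHAVIORS.filter((behavior) => caps[behavior] !== true);
}
```

An empty result from `missingBehaviors` would mean the product claims all eight behaviors; it does not verify that the claims are true.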

---

## 7. Dispatch Model

The contract should support:

- browser-originated requests
- mobile-originated requests
- desktop-originated requests
- inter-product dispatch
- trusted desktop executor dispatch

Dispatch context shape:

```ts
type AgentDispatchContext = {
  originSurface: 'browser' | 'mobile' | 'desktop' | 'web' | 'product-api';
  originProductId: string;
  dispatchMode: 'interactive' | 'queued' | 'scheduled' | 'remote';
  initiatedAt: string;
};
```
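
Because dispatch payloads cross product boundaries, a receiver will usually want to validate them at runtime. The following is a minimal sketch of such a check, assuming `initiatedAt` is a date string that `Date.parse` accepts; `validateDispatchContext` is a hypothetical helper, not an existing API.

```ts
const ORIGIN_SURFACES = ['browser', 'mobile', 'desktop', 'web', 'product-api'] as const;
const DISPATCH_MODES = ['interactive', 'queued', 'scheduled', 'remote'] as const;

// Returns a list of problems; an empty list means the payload matches the shape.
function validateDispatchContext(value: unknown): string[] {
  if (typeof value !== 'object' || value === null) {
    return ['dispatch context must be an object'];
  }
  const { originSurface, originProductId, dispatchMode, initiatedAt } =
    value as Record<string, unknown>;
  const errors: string[] = [];

  if (!ORIGIN_SURFACES.some((surface) => surface === originSurface)) {
    errors.push('originSurface is not a known surface');
  }
  if (typeof originProductId !== 'string' || originProductId.length === 0) {
    errors.push('originProductId must be a non-empty string');
  }
  if (!DISPATCH_MODES.some((mode) => mode === dispatchMode)) {
    errors.push('dispatchMode is not a known mode');
  }
  // Assumption: timestamps are parseable date strings (for example ISO 8601).
  if (typeof initiatedAt !== 'string' || isNaN(Date.parse(initiatedAt))) {
    errors.push('initiatedAt must be a parseable timestamp string');
  }
  return errors;
}
```

Returning every problem at once, rather than throwing on the first, is a design choice that makes rejected dispatches easier to log.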

---

## 8. First Implementations

The first conforming runtime integrations should target:

1. `oss/learning_ai_claw-cowork`
2. `learning_ai_trails`
3. `learning_ai_flowmonk`
4. `learning_ai_jarvis_jr`

Later:

5. `learning_voice_ai_agent` transformation workflows
6. shared operator tools in `learning_ai_common_plat`

---

## 9. Key Open Decisions

1. How much of `claw-code` todo/session semantics should be adopted directly vs normalized?
2. Should scheduled runs create new sessions or new runs under one session?
3. What is the minimum checkpoint payload required for resume-anywhere?
4. Which runtime actions must always emit ActionTrail logs?
5. How should worktree-isolated code tasks be represented vs non-code tasks?

---

## 10. Lifecycle Boundaries

The runtime model uses these boundaries:

- `AgentSession`
  A durable container for related work over time. Sessions can outlive individual runs and can stay resumable even after one run finishes.
- `AgentRun`
  A concrete execution instance.
  A run is the thing that can be `queued`, `running`, `paused`, `waiting-approval`, `completed`, `failed`, or `cancelled`.
- `AgentTask`
  A user-meaningful unit of intent inside a session.
  Tasks should remain stable enough to describe the work, even when execution is retried or rescheduled.
- `AgentTodo`
  A smaller actionable checklist item.
  When a product has no separate checklist model yet, it may temporarily project todos from its native task backlog, but that mapping must be called out explicitly.

Interpretation rules:

1. `queued` means execution has not started yet.
2. `paused` means execution started and is intentionally halted or deferred.
3. `waiting-approval` means the run is blocked on human review.
4. A session may contain multiple runs over time.
5. A task may survive multiple runs if execution is retried, resumed, or rescheduled.
6. Todos should never imply a separate execution history unless the product actually tracks one internally.
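
The run states and interpretation rules above can be sketched as a transition table. The exact set of allowed transitions below is an assumption made for illustration (the contract only fixes the meaning of `queued`, `paused`, and `waiting-approval`), and `canTransition` is a hypothetical helper.

```ts
type RunStatus =
  | 'queued'
  | 'running'
  | 'paused'
  | 'waiting-approval'
  | 'completed'
  | 'failed'
  | 'cancelled';

// Assumed transition table. It encodes the rules above: a queued run has not
// started, so it cannot become paused or waiting-approval without running first.
const ALLOWED_RUN_TRANSITIONS: Record<RunStatus, RunStatus[]> = {
  queued: ['running', 'cancelled'],
  running: ['paused', 'waiting-approval', 'completed', 'failed', 'cancelled'],
  paused: ['running', 'cancelled'],
  'waiting-approval': ['running', 'failed', 'cancelled'],
  // Terminal states: a retry is modeled as a new run in the same session.
  completed: [],
  failed: [],
  cancelled: [],
};

function canTransition(from: RunStatus, to: RunStatus): boolean {
  return ALLOWED_RUN_TRANSITIONS[from].indexOf(to) !== -1;
}
```

Treating `completed`, `failed`, and `cancelled` as terminal fits the rule that a task may survive multiple runs: the retry gets a new run rather than reopening the old one.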

Current product mappings:

- Cowork
  - session: persisted/resumable workspace session
  - task: orchestrator task
  - run: orchestrator execution of that task
  - todo: interim task-backed projection until a first-class todo source exists
- FlowMonk
  - session: user planning workspace
  - task: planning backlog task
  - run: scheduled entry
  - approval: agent-suggested schedule entry awaiting or receiving confirmation
  - todo: task-backed projection until a distinct checklist primitive exists

---

## 11. Checkpoint And Resume Semantics

Every runtime implementation that claims `resumable: true` should be able to produce a stable
checkpoint envelope for the current session or run.

Minimum checkpoint shape:

```ts
type AgentCheckpoint = {
  checkpointId: string;
  sessionId: string;
  runId?: string | null;
  productId: string;
  userId: string;
  createdAt: string;
  statusAtCapture:
    | 'queued'
    | 'running'
    | 'paused'
    | 'waiting-approval'
    | 'completed'
    | 'failed'
    | 'cancelled';
  currentTaskId?: string | null;
  todoIds: string[];
  artifactRefs: string[];
  memoryRefs: string[];
  approvalRefs: string[];
  dispatchContext?: AgentDispatchContext | null;
  resumeToken?: string | null;
  stateSummary: {
    title: string;
    summary: string;
    lastActionAt?: string | null;
  };
};
```

Required semantics:

1. a checkpoint captures enough context to resume work without re-deriving user intent from scratch
2. a checkpoint may point at a `runId`, but it must always belong to one `sessionId`
3. `resumeToken` is product-defined, but it must be stable enough for the same product runtime to
   reopen the session safely
4. a resumed run should preserve the same `sessionId` and should create a new `runId` only if the
   product treats the resumed execution as a new execution instance
5. checkpoint creation should append to action/audit history instead of overwriting earlier state
6. a session must not be marked `resumable: true` unless the product can actually restore from the
   latest checkpoint or equivalent persisted state
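
The semantics above can be sketched as a small resume planner. This is an illustration under stated assumptions: `planResume`, `ResumePlan`, and the `newRunOnResume` and `nextRunId` options are hypothetical, and "latest" is decided by parsing `createdAt` as a date string.

```ts
type ResumableSession = { sessionId: string; resumable: boolean };

type CheckpointRef = {
  checkpointId: string;
  sessionId: string;
  runId?: string | null;
  createdAt: string;
};

type ResumePlan = {
  sessionId: string;
  runId: string;
  fromCheckpointId: string;
  createdNewRun: boolean;
};

function planResume(
  session: ResumableSession,
  checkpoints: CheckpointRef[],
  opts: { newRunOnResume: boolean; nextRunId: () => string },
): ResumePlan {
  // Rule 6: only sessions that can actually be restored may be resumed.
  if (!session.resumable) {
    throw new Error(`session ${session.sessionId} is not resumable`);
  }
  // Rule 2: a checkpoint always belongs to exactly one session.
  const own = checkpoints.filter((c) => c.sessionId === session.sessionId);
  if (own.length === 0) {
    throw new Error(`no checkpoint found for session ${session.sessionId}`);
  }
  const latest = own.reduce((a, b) =>
    Date.parse(b.createdAt) > Date.parse(a.createdAt) ? b : a,
  );
  // Rule 4: keep the sessionId; mint a new runId only if the product treats
  // the resumed execution as a new execution instance.
  const previousRunId = latest.runId ?? null;
  if (!opts.newRunOnResume && previousRunId !== null) {
    return {
      sessionId: session.sessionId,
      runId: previousRunId,
      fromCheckpointId: latest.checkpointId,
      createdNewRun: false,
    };
  }
  return {
    sessionId: session.sessionId,
    runId: opts.nextRunId(),
    fromCheckpointId: latest.checkpointId,
    createdNewRun: true,
  };
}
```

A product that resumes tasks as new runs would pass `newRunOnResume: true`; one that continues the same execution instance would pass `false`.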

Current product expectations:

- Cowork
  - the persisted workspace/session state is the effective checkpoint source
  - a resumed task may create a new run while preserving the same session
- FlowMonk
  - the schedule/planning workspace is the effective checkpoint source
  - scheduled entries can resume planning context even when no long-running executor is active

---

## 12. ActionTrail Replay Requirements

ActionTrail replay is not required to reproduce every UI pixel of a product runtime.
For the ecosystem roadmap, replay means reconstructing the execution narrative with enough fidelity
to answer:

1. what was requested
2. what ran
3. what approvals or pauses happened
4. what artifacts or memories were produced
5. why the final state was reached

Minimum replay evidence for a run:

1. stable identity
   - `sessionId`
   - `runId`
   - `productId`
   - `userId`
2. execution timing
   - `startedAt`
   - `completedAt` when available
   - checkpoint timestamps when resumable
3. causal chain
   - `correlationId`
   - `causationId`
   - `parentEventId`
   - canonical event IDs on audit/action records when available
4. control-flow state changes
   - queued
   - running
   - paused
   - waiting-approval
   - completed / failed / cancelled
5. human intervention evidence
   - approval checkpoints
   - approval decisions
   - actor identity for approvals or overrides
6. output evidence
   - `artifactRefs`
   - `memoryRefs`
   - relevant task and todo state at the end of the run

Required behaviors:

1. products must preserve runtime action logs or equivalent audit records long enough for replay and
   review
2. products may keep private implementation details, but they must emit enough canonical metadata to
   reconstruct the run narrative externally
3. replay consumers should trust canonical event IDs and action-log IDs over inferred timestamps
4. replay should tolerate partial fidelity
   - if UI frames or low-level desktop events are unavailable, ActionTrail should still be able to
     render a narrative replay from runtime actions, approvals, checkpoints, and artifacts
5. replay views must clearly distinguish between:
   - observed canonical events
   - inferred transitions derived from checkpoints or final state

Current product expectations:

- Cowork
  - Rust audit records plus canonical `event_id` values are the primary replay anchor
  - checkpoint/resume should explain why a run paused, resumed, or required approval
- FlowMonk
  - scheduled entries, confirmations, and projected action logs are the primary replay anchor
  - replay should make it obvious when an automation was queued versus actively running

---

## 13. First Conforming Implementation Plan

The first conforming implementations for Phase 5 should be treated as two tracks:

### Track A: Cowork conformance

Goal:

- present Cowork as the reference high-autonomy runtime producer

Required external contract surface:

- `AgentSession`
- `AgentTask`
- `AgentTodo`
- `AgentRun`
- `AgentApprovalCheckpoint`
- `AgentActionLog`
- dispatch validation
- checkpoint/resume narrative support

Current state:

- sessions, tasks, runs, approvals, actions, and dispatch validation are already exposed through
  `cowork-service`
- todos and checkpoint summaries now come from persisted Cowork checkpoint records
- replay anchors now use canonical Rust audit `event_id`
- checkpoint summaries now preserve artifact, memory, and approval refs when Cowork provides them
- Cowork session and task IPC projections now expose canonical runtime event IDs directly

Direct observations vs projections:

- direct observations from Rust / IPC:
  - task lifecycle values from orchestrator task state:
    - `pending`
    - `running`
    - `completed`
    - `failed`
    - `cancelled`
  - session-level `waitingApproval`
  - persisted checkpoint terminal flags:
    - `completed`
    - `cancelled`
    - checkpoint `error`
  - canonical audit `event_id`
  - checkpoint artifact, memory, and approval refs when present
- derived by `cowork-service` projection:
  - `AgentRun.status=queued` from Rust `pending`
  - `AgentTask.status=queued` from Rust `pending`
  - `AgentTodo.status=open|in-progress|done|dropped` from checkpoint outcome plus task state
  - `AgentCheckpoint.statusAtCapture=paused` when no stronger Rust signal exists
  - session `status=active` when `waitingApproval` is false
  - approval/action runtime objects from audit records rather than first-class IPC records
  - dispatch context normalization for shared runtime consumers

Interpretation rule:

- if a state comes directly from Rust task state, checkpoint flags, or session approval state, treat it
  as authoritative
- if a state is produced by mapping logic in `cowork-service`, treat it as a shared-contract view for
  downstream consumers, not as a replacement for native Cowork internals
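
To illustrate the direct-versus-derived distinction, the sketch below tags each projected value with its provenance. The `Projected` wrapper and both function names are hypothetical; only the `pending` to `queued` mapping and the `waitingApproval` rule come from the lists above.

```ts
// Task lifecycle values observed directly from the Rust orchestrator.
type RustTaskState = 'pending' | 'running' | 'completed' | 'failed' | 'cancelled';

type ProjectedRunStatus = 'queued' | 'running' | 'completed' | 'failed' | 'cancelled';

// Hypothetical wrapper that records whether a value was observed or mapped.
type Projected<T> = { value: T; provenance: 'direct' | 'derived' };

function projectRunStatus(rust: RustTaskState): Projected<ProjectedRunStatus> {
  if (rust === 'pending') {
    // `queued` does not exist natively; it is produced by mapping logic.
    return { value: 'queued', provenance: 'derived' };
  }
  // Every other value passes through unchanged and stays authoritative.
  return { value: rust, provenance: 'direct' };
}

function projectSessionStatus(
  waitingApproval: boolean,
): Projected<'active' | 'waiting-approval'> {
  return waitingApproval
    ? { value: 'waiting-approval', provenance: 'direct' }
    : { value: 'active', provenance: 'derived' };
}
```

Carrying provenance alongside the value is one possible way to let downstream consumers apply the interpretation rule without inspecting Cowork internals.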

Conformance bar:

- a reviewer can inspect one Cowork session and understand queued, running, paused, waiting-approval,
  resumed, and completed states without reading Rust internals

### Track B: FlowMonk conformance

Goal:

- present FlowMonk as the reference scheduled/queued runtime producer

Required external contract surface:

- `AgentSession`
- `AgentTask`
- `AgentTodo`
- `AgentRun`
- `AgentApprovalCheckpoint`
- `AgentActionLog`
- dispatch validation where scheduling hands off into downstream execution
- checkpoint/resume narrative for planning state

Current state:

- sessions, tasks, todos, runs, approvals, actions, and dispatch validation are exposed through the backend
- approvals, todos, and checkpoints are now persisted as native runtime records in FlowMonk
- run/session rollover rule: one planning workspace session exists per user, and each schedule entry becomes a new run inside that session
- runtime todos and approvals now have direct end-user PATCH surfaces that update both the runtime record and the source task or schedule entry
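
The run/session rollover rule above can be sketched as a projection from schedule entries to runs. The ID formats, the `ScheduleEntry` shape, and the initial `queued` status are assumptions for illustration, not FlowMonk's actual schema.

```ts
// Hypothetical shape of a FlowMonk schedule entry.
type ScheduleEntry = { entryId: string; userId: string; scheduledFor: string };

type PlannedRun = {
  runId: string;
  sessionId: string;
  status: 'queued';
  scheduledFor: string;
};

// One planning workspace session per user (assumed ID format).
function planningSessionId(userId: string): string {
  return `flowmonk-session:${userId}`;
}

// Each schedule entry becomes a new run inside the user's single session.
function projectScheduleRuns(entries: ScheduleEntry[]): PlannedRun[] {
  return entries.map((entry) => ({
    runId: `flowmonk-run:${entry.entryId}`,
    sessionId: planningSessionId(entry.userId),
    // Assumption: a scheduled entry that has not started yet is reported as queued.
    status: 'queued',
    scheduledFor: entry.scheduledFor,
  }));
}
```

Deriving `sessionId` from `userId` keeps the one-session-per-user rule true by construction, at the cost of ruling out multiple workspaces per user.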

Next conforming steps:

1. preserve richer artifact and memory refs in checkpoint summaries once downstream runtime producers are connected
2. decide whether FlowMonk should expose a richer UI for editing approvals and todos beyond the current backend surface

Conformance bar:

- a reviewer can inspect one FlowMonk planning session and understand which work was queued,
  confirmed, executed, deferred, or resumed without guessing from schedule records alone

Shared closeout requirement:

- both products should publish a short product-local adoption note that maps their native entities to
  the shared runtime contract and explicitly names any remaining derived projections

---

## 14. Acceptance Criteria

1. A dispatched Cowork task can be resumed after interruption without losing audit continuity.
2. A FlowMonk execution can emit task/todo state using the same contract.
3. ActionTrail can replay a run using the shared action-log structure.
4. Approval checkpoints can be handed off to Auth App without losing run context.
5. Product-specific runtimes can remain different internally while still producing the same contract externally.

---

## 15. Implementation Checklist

- [x] finalize entity list and minimum required fields
- [x] define run vs session vs task boundaries
- [x] define checkpoint/resume semantics
- [x] define dispatch payload contract
- [x] define action-log hook points
- [x] define ActionTrail replay requirements
- [x] define first conforming implementation plan for Cowork and FlowMonk

Commits:

- `eae3409` drafted the initial stub
- `3f2482b` added the baseline runtime schemas for dispatch, session, task, todo, run, approval, and action logs
- `97b731e` added the Cowork task-backed `AgentTodo` projection
- `faf93ec` added FlowMonk direct `AgentApprovalCheckpoint` and task-backed `AgentTodo` projections
- `ff8c5eb` promoted `queued` to a first-class `AgentRun` state