Ecosystem Agent Runtime Contract
Status: Phase 5 baseline implemented
Owner: learning_ai_common_plat
Reference inputs: claw-code-oss, claw-cowork, learning_ai_trails, learning_ai_flowmonk, learning_ai_jarvis_jr
Purpose: Standardize session state, task state, resume behavior, dispatch semantics, approvals, and audit hooks across agent-capable products.
1. Problem
The ecosystem already has multiple agent-runtime ideas:
- claw-code runtime sessions, todos, project memory, resume, MCP lifecycle
- claw-cowork task orchestration, dispatch, scheduling, approvals, audit logging
- FlowMonk planning/execution
- JarvisJr coaching/delegation concepts
- ActionTrail review and replay
Without a shared runtime contract:
- each repo reinvents session models
- handoff and resume become inconsistent
- audit/replay becomes lossy
- approvals cannot be shared cleanly
2. Goals
- Define the canonical runtime state model.
- Define session continuity and resume semantics.
- Define dispatch and handoff metadata.
- Define approval checkpoints and audit hooks.
- Allow multiple implementations while preserving one contract.
3. Non-Goals
- Forcing all agent products to use one codebase.
- Standardizing UI/UX across all agent surfaces.
- Replacing product-specific orchestration logic.
4. Core Entities
The shared runtime contract should define:
- AgentSession
- AgentTask
- AgentTodo
- AgentRun
- AgentApprovalCheckpoint
- AgentDispatchRequest
- AgentHandoff
- AgentActionLog
5. Minimum Session Shape
type AgentSession = {
  sessionId: string;
  productId: string;
  userId: string;
  status: 'active' | 'paused' | 'waiting-approval' | 'completed' | 'failed' | 'cancelled';
  startedAt: string;
  updatedAt: string;
  resumable: boolean;
  currentTaskId?: string | null;
  memoryRefs: string[];
  artifactRefs: string[];
  approvalRefs: string[];
  dispatchContext?: AgentDispatchContext | null;
};

type AgentTask = {
  taskId: string;
  sessionId: string;
  title: string;
  intent: string;
  status: 'queued' | 'running' | 'blocked' | 'completed' | 'failed' | 'cancelled';
  priority?: string;
  createdAt: string;
  updatedAt: string;
};

type AgentTodo = {
  todoId: string;
  sessionId: string;
  text: string;
  status: 'open' | 'in-progress' | 'done' | 'dropped';
  createdAt: string;
  updatedAt: string;
};
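A minimal sketch of how a producer might build a session and attach a task using the shapes above. The helper names createSession and addTask, the id values, and the ISO 8601 timestamp format are assumptions for illustration; the contract fixes only the field names. dispatchContext is left out to keep the sketch self-contained.

```typescript
type AgentSession = {
  sessionId: string;
  productId: string;
  userId: string;
  status: 'active' | 'paused' | 'waiting-approval' | 'completed' | 'failed' | 'cancelled';
  startedAt: string;
  updatedAt: string;
  resumable: boolean;
  currentTaskId?: string | null;
  memoryRefs: string[];
  artifactRefs: string[];
  approvalRefs: string[];
};

type AgentTask = {
  taskId: string;
  sessionId: string;
  title: string;
  intent: string;
  status: 'queued' | 'running' | 'blocked' | 'completed' | 'failed' | 'cancelled';
  priority?: string;
  createdAt: string;
  updatedAt: string;
};

// Hypothetical helper: builds a new active session with empty ref lists.
function createSession(sessionId: string, productId: string, userId: string, now: Date): AgentSession {
  const ts = now.toISOString(); // assumption: timestamps are ISO 8601 strings
  return {
    sessionId,
    productId,
    userId,
    status: 'active',
    startedAt: ts,
    updatedAt: ts,
    resumable: false, // stays false until a checkpoint can actually be restored
    currentTaskId: null,
    memoryRefs: [],
    artifactRefs: [],
    approvalRefs: [],
  };
}

// Hypothetical helper: queues a task and points a copy of the session at it.
function addTask(
  session: AgentSession,
  taskId: string,
  title: string,
  intent: string,
  now: Date,
): { session: AgentSession; task: AgentTask } {
  const ts = now.toISOString();
  const task: AgentTask = {
    taskId,
    sessionId: session.sessionId,
    title,
    intent,
    status: 'queued',
    createdAt: ts,
    updatedAt: ts,
  };
  return { session: { ...session, currentTaskId: taskId, updatedAt: ts }, task };
}
```

Returning a copy of the session rather than mutating it keeps earlier state available for audit, which fits the append-only expectation for checkpoints later in this document.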
6. Required Runtime Behaviors
Every compliant implementation should support:
- session creation
- resumable state checkpoints
- todo/task updates during execution
- approval checkpoints
- action-log emission
- artifact emission
- dispatch metadata when execution originates elsewhere
- replayability in ActionTrail
7. Dispatch Model
The contract should support:
- browser-originated requests
- mobile-originated requests
- desktop-originated requests
- inter-product dispatch
- trusted desktop executor dispatch
Example:
type AgentDispatchContext = {
  originSurface: 'browser' | 'mobile' | 'desktop' | 'web' | 'product-api';
  originProductId: string;
  dispatchMode: 'interactive' | 'queued' | 'scheduled' | 'remote';
  initiatedAt: string;
};
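Dispatch payloads cross product boundaries, so a receiver will likely want to check them at runtime. The sketch below shows one way to do that. The function names and the rule that initiatedAt must parse as a date are assumptions; the contract only says the field is a string.

```typescript
type AgentDispatchContext = {
  originSurface: 'browser' | 'mobile' | 'desktop' | 'web' | 'product-api';
  originProductId: string;
  dispatchMode: 'interactive' | 'queued' | 'scheduled' | 'remote';
  initiatedAt: string;
};

const ORIGIN_SURFACES: string[] = ['browser', 'mobile', 'desktop', 'web', 'product-api'];
const DISPATCH_MODES: string[] = ['interactive', 'queued', 'scheduled', 'remote'];

// Hypothetical validator: returns a list of problems, empty when the payload is valid.
function validateDispatchContext(value: unknown): string[] {
  if (typeof value !== 'object' || value === null) {
    return ['payload is not an object'];
  }
  const v = value as Partial<Record<keyof AgentDispatchContext, unknown>>;
  const problems: string[] = [];
  if (typeof v.originSurface !== 'string' || ORIGIN_SURFACES.indexOf(v.originSurface) === -1) {
    problems.push('originSurface is not a known surface');
  }
  if (typeof v.originProductId !== 'string' || v.originProductId.length === 0) {
    problems.push('originProductId is missing');
  }
  if (typeof v.dispatchMode !== 'string' || DISPATCH_MODES.indexOf(v.dispatchMode) === -1) {
    problems.push('dispatchMode is not a known mode');
  }
  // Assumption: initiatedAt should be a parseable timestamp.
  if (typeof v.initiatedAt !== 'string' || isNaN(Date.parse(v.initiatedAt))) {
    problems.push('initiatedAt is not a parseable timestamp');
  }
  return problems;
}

// Type guard built on the validator, for callers that only need a yes/no answer.
function isDispatchContext(value: unknown): value is AgentDispatchContext {
  return validateDispatchContext(value).length === 0;
}
```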
8. First Implementations
The first conforming runtime integrations should target:
- oss/learning_ai_claw-cowork
- learning_ai_trails
- learning_ai_flowmonk
- learning_ai_jarvis_jr
Later:
- learning_voice_ai_agent transformation workflows
- shared operator tools in learning_ai_common_plat
9. Key Open Decisions
- How much of claw-code todo/session semantics should be adopted directly vs normalized?
- Should scheduled runs create new sessions or new runs under one session?
- What is the minimum checkpoint payload required for resume-anywhere?
- Which runtime actions must always emit ActionTrail logs?
- How should worktree-isolated code tasks be represented vs non-code tasks?
10. Lifecycle Boundaries
The current runtime model now uses these boundaries:
- AgentSession: A durable container for related work over time. Sessions can outlive individual runs and can stay resumable even after one run finishes.
- AgentRun: A concrete execution instance. A run is the thing that can be queued, running, paused, waiting-approval, completed, failed, or cancelled.
- AgentTask: A user-meaningful unit of intent inside a session. Tasks should remain stable enough to describe the work, even when execution is retried or rescheduled.
- AgentTodo: A smaller actionable checklist item. When a product has no separate checklist model yet, it may temporarily project todos from its native task backlog, but that mapping must be called out explicitly.
Interpretation rules:
- queued means execution has not started yet.
- paused means execution started and is intentionally halted or deferred.
- waiting-approval means the run is blocked on human review.
- a session may contain multiple runs over time
- a task may survive multiple runs if execution is retried, resumed, or rescheduled
- todos should never imply a separate execution history unless the product truly tracks that internally
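The interpretation rules above imply some ordering between run states, for example that a run cannot be paused before it has started. A rough sketch of a transition guard follows. The transition table is an assumption, since the contract names the states but not the allowed edges, and it models a rejected approval as cancelled.

```typescript
type RunStatus =
  | 'queued'
  | 'running'
  | 'paused'
  | 'waiting-approval'
  | 'completed'
  | 'failed'
  | 'cancelled';

// Assumed transition table; products may allow a different set of edges.
const ALLOWED_TRANSITIONS: Record<RunStatus, RunStatus[]> = {
  // queued: execution has not started, so it cannot be paused yet.
  queued: ['running', 'cancelled'],
  running: ['paused', 'waiting-approval', 'completed', 'failed', 'cancelled'],
  // paused: execution started and was intentionally halted or deferred.
  paused: ['running', 'cancelled'],
  // waiting-approval: blocked on human review until approved or cancelled.
  'waiting-approval': ['running', 'cancelled'],
  completed: [],
  failed: [],
  cancelled: [],
};

// Returns true when the sketch's table allows moving from one status to another.
function canTransition(from: RunStatus, to: RunStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}
```

A retry after a terminal state would be a new run in the same session under this sketch, not a transition out of completed or failed.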
Current product mappings:
- Cowork
  - session: persisted/resumable workspace session
  - task: orchestrator task
  - run: orchestrator execution of that task
  - todo: interim task-backed projection until a first-class todo source exists
- FlowMonk
  - session: user planning workspace
  - task: planning backlog task
  - run: scheduled entry
  - approval: agent-suggested schedule entry awaiting or receiving confirmation
  - todo: task-backed projection until a distinct checklist primitive exists
11. Checkpoint And Resume Semantics
Every runtime implementation that claims resumable: true should be able to produce a stable
checkpoint envelope for the current session or run.
Minimum checkpoint shape:
type AgentCheckpoint = {
  checkpointId: string;
  sessionId: string;
  runId?: string | null;
  productId: string;
  userId: string;
  createdAt: string;
  statusAtCapture:
    | 'queued'
    | 'running'
    | 'paused'
    | 'waiting-approval'
    | 'completed'
    | 'failed'
    | 'cancelled';
  currentTaskId?: string | null;
  todoIds: string[];
  artifactRefs: string[];
  memoryRefs: string[];
  approvalRefs: string[];
  dispatchContext?: AgentDispatchContext | null;
  resumeToken?: string | null;
  stateSummary: {
    title: string;
    summary: string;
    lastActionAt?: string | null;
  };
};
Required semantics:
- a checkpoint captures enough context to resume work without re-deriving user intent from scratch
- a checkpoint may point at a runId, but it must always belong to one sessionId
- resumeToken is product-defined, but it must be stable enough for the same product runtime to reopen the session safely
- a resumed run should preserve the same sessionId and should create a new runId only if the product treats the resumed execution as a new execution instance
- checkpoint creation should append to action/audit history instead of overwriting earlier state
- a session must not be marked resumable: true unless the product can actually restore from the latest checkpoint or equivalent persisted state
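The semantics above can be sketched with an append-only checkpoint log and a resume helper. CheckpointLite is a trimmed, hypothetical stand-in for AgentCheckpoint. The class and function names are assumptions, and an in-memory array stands in for whatever persistence a product actually uses.

```typescript
type RunStatus =
  | 'queued' | 'running' | 'paused' | 'waiting-approval'
  | 'completed' | 'failed' | 'cancelled';

// Trimmed checkpoint: only the fields this sketch needs.
type CheckpointLite = {
  checkpointId: string;
  sessionId: string;
  runId: string | null;
  createdAt: string;
  statusAtCapture: RunStatus;
};

// Hypothetical append-only log: checkpoints are added, never overwritten.
class CheckpointLog {
  private readonly entries: CheckpointLite[] = [];

  append(checkpoint: CheckpointLite): void {
    this.entries.push(checkpoint);
  }

  // All checkpoints for one session, in the order they were appended.
  history(sessionId: string): CheckpointLite[] {
    return this.entries.filter((c) => c.sessionId === sessionId);
  }

  latest(sessionId: string): CheckpointLite | undefined {
    const all = this.history(sessionId);
    return all.length > 0 ? all[all.length - 1] : undefined;
  }
}

// A session may claim resumable: true only if something can be restored.
function canClaimResumable(log: CheckpointLog, sessionId: string): boolean {
  return log.latest(sessionId) !== undefined;
}

// Resume keeps the sessionId. A new runId is used only when the product
// treats the resumed execution as a new execution instance.
function resumeFrom(
  checkpoint: CheckpointLite,
  newRunId?: string,
): { sessionId: string; runId: string | null; resumedFromCheckpointId: string } {
  return {
    sessionId: checkpoint.sessionId,
    runId: newRunId !== undefined ? newRunId : checkpoint.runId,
    resumedFromCheckpointId: checkpoint.checkpointId,
  };
}
```

Carrying resumedFromCheckpointId on the result is one way to keep audit continuity across a resume. The field is illustrative and not part of the contract.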
Current product expectations:
- Cowork
  - the persisted workspace/session state is the effective checkpoint source
  - a resumed task may create a new run while preserving the same session
- FlowMonk
  - the schedule/planning workspace is the effective checkpoint source
  - scheduled entries can resume planning context even when no long-running executor is active
12. ActionTrail Replay Requirements
ActionTrail replay is not required to reproduce every UI pixel of a product runtime. For the ecosystem roadmap, replay means reconstructing the execution narrative with enough fidelity to answer:
- what was requested
- what ran
- what approvals or pauses happened
- what artifacts or memories were produced
- why the final state was reached
Minimum replay evidence for a run:
- stable identity
  - sessionId
  - runId
  - productId
  - userId
- execution timing
  - startedAt
  - completedAt when available
  - checkpoint timestamps when resumable
- causal chain
  - correlationId
  - causationId
  - parentEventId
  - canonical event IDs on audit/action records when available
- control-flow state changes
  - queued
  - running
  - paused
  - waiting-approval
  - completed / failed / cancelled
- human intervention evidence
  - approval checkpoints
  - approval decisions
  - actor identity for approvals or overrides
- output evidence
  - artifactRefs
  - memoryRefs
  - relevant task and todo state at the end of the run
Required behaviors:
- products must preserve runtime action logs or equivalent audit records long enough for replay and review
- products may keep private implementation details, but they must emit enough canonical metadata to reconstruct the run narrative externally
- replay consumers should trust canonical event IDs and action-log IDs over inferred timestamps
- replay should tolerate partial fidelity
  - if UI frames or low-level desktop events are unavailable, ActionTrail should still be able to render a narrative replay from runtime actions, approvals, checkpoints, and artifacts
- replay views must clearly distinguish between:
  - observed canonical events
  - inferred transitions derived from checkpoints or final state
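A rough sketch of how a replay consumer might apply these behaviors: observed events are ordered by following parentEventId links rather than timestamps, and inferred transitions are slotted in by time and labelled as inferred. The ReplayEntry shape, the buildNarrative name, the single linear causal chain, and UTC ISO 8601 timestamps of identical format are all assumptions. Events not reachable from the root are dropped in this sketch.

```typescript
type ReplayEntry = {
  eventId: string | null;       // canonical event ID; null for inferred transitions
  parentEventId: string | null; // causal parent; null for the first event
  at: string;                   // assumed ISO 8601 UTC timestamp
  status: string;
};

type NarrativeLine = { status: string; at: string; source: 'observed' | 'inferred' };

const ROOT = '(root)'; // hypothetical lookup key for events with no parent

function buildNarrative(entries: ReplayEntry[]): NarrativeLine[] {
  const observed = entries.filter((e) => e.eventId !== null);
  const inferred = entries
    .filter((e) => e.eventId === null)
    .sort((a, b) => (a.at < b.at ? -1 : a.at > b.at ? 1 : 0));

  // Index observed events by their parent so the chain can be walked.
  const childOf: { [parentId: string]: ReplayEntry | undefined } = {};
  for (const e of observed) {
    childOf[e.parentEventId === null ? ROOT : e.parentEventId] = e;
  }

  // Walk the causal chain; the length check guards against cycles.
  const ordered: ReplayEntry[] = [];
  let next = childOf[ROOT];
  while (next !== undefined && ordered.length < observed.length) {
    ordered.push(next);
    next = next.eventId === null ? undefined : childOf[next.eventId];
  }

  // Slot each inferred transition before the first observed event that is later than it.
  const lines: NarrativeLine[] = [];
  let i = 0;
  for (const e of ordered) {
    while (i < inferred.length && inferred[i].at < e.at) {
      lines.push({ status: inferred[i].status, at: inferred[i].at, source: 'inferred' });
      i++;
    }
    lines.push({ status: e.status, at: e.at, source: 'observed' });
  }
  while (i < inferred.length) {
    lines.push({ status: inferred[i].status, at: inferred[i].at, source: 'inferred' });
    i++;
  }
  return lines;
}
```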
Current product expectations:
- Cowork
  - Rust audit records plus canonical event_id values are the primary replay anchor
  - checkpoint/resume should explain why a run paused, resumed, or required approval
- FlowMonk
  - scheduled entries, confirmations, and projected action logs are the primary replay anchor
  - replay should make it obvious when an automation was queued versus actively running
13. First Conforming Implementation Plan
The first conforming implementations for Phase 5 should be treated as two tracks:
Track A: Cowork conformance
Goal:
- present Cowork as the reference high-autonomy runtime producer
Required external contract surface:
- AgentSession
- AgentTask
- AgentTodo
- AgentRun
- AgentApprovalCheckpoint
- AgentActionLog
- dispatch validation
- checkpoint/resume narrative support
Current state:
- sessions, tasks, runs, approvals, actions, and dispatch validation are already exposed through cowork-service
- todos and checkpoint summaries now come from persisted Cowork checkpoint records
- replay anchors now use canonical Rust audit event_id
- checkpoint summaries now preserve artifact, memory, and approval refs when Cowork provides them
- Cowork session and task IPC projections now expose canonical runtime event IDs directly
Direct observations vs projections:
- direct observations from Rust / IPC:
  - task lifecycle values from orchestrator task state:
    - pending
    - running
    - completed
    - failed
    - cancelled
  - session-level waitingApproval
  - persisted checkpoint terminal flags:
    - completed
    - cancelled
    - checkpoint error
  - canonical audit event_id
  - checkpoint artifact, memory, and approval refs when present
- derived by cowork-service projection:
  - AgentRun.status=queued from Rust pending
  - AgentTask.status=queued from Rust pending
  - AgentTodo.status=open|in-progress|done|dropped from checkpoint outcome plus task state
  - AgentCheckpoint.statusAtCapture=paused when no stronger Rust signal exists
  - session status=active when waitingApproval is false
  - approval/action runtime objects from audit records rather than first-class IPC records
  - dispatch context normalization for shared runtime consumers
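The direct-versus-derived split can be sketched as a projection that tags each value with its provenance. This is not the cowork-service implementation. The Projected wrapper, the function names, the flag precedence in projectCheckpointStatus, the mapping of a checkpoint error to failed, and the mapping of waitingApproval true to waiting-approval are all assumptions.

```typescript
type RustTaskState = 'pending' | 'running' | 'completed' | 'failed' | 'cancelled';
type AgentTaskStatus = 'queued' | 'running' | 'blocked' | 'completed' | 'failed' | 'cancelled';

// Hypothetical wrapper that records whether a value was observed or mapped.
type Projected<T> = { value: T; provenance: 'direct' | 'derived' };

// Rust pending is projected to the shared-contract value queued.
function projectTaskStatus(state: RustTaskState): Projected<AgentTaskStatus> {
  if (state === 'pending') {
    return { value: 'queued', provenance: 'derived' };
  }
  return { value: state, provenance: 'direct' };
}

// Session status is active when waitingApproval is false.
function projectSessionStatus(waitingApproval: boolean): Projected<'active' | 'waiting-approval'> {
  return waitingApproval
    ? { value: 'waiting-approval', provenance: 'direct' }
    : { value: 'active', provenance: 'derived' };
}

type CheckpointFlags = { completed: boolean; cancelled: boolean; error: string | null };

// statusAtCapture falls back to paused when no stronger signal exists.
function projectCheckpointStatus(
  flags: CheckpointFlags,
): Projected<'completed' | 'failed' | 'cancelled' | 'paused'> {
  if (flags.cancelled) return { value: 'cancelled', provenance: 'direct' };
  if (flags.error !== null) return { value: 'failed', provenance: 'direct' };
  if (flags.completed) return { value: 'completed', provenance: 'direct' };
  return { value: 'paused', provenance: 'derived' };
}
```

Keeping provenance next to the value would let a downstream consumer such as ActionTrail show derived states differently from authoritative ones.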
Interpretation rule:
- if a state comes directly from Rust task state, checkpoint flags, or session approval state, treat it as authoritative
- if a state is produced by mapping logic in cowork-service, treat it as a shared-contract view for downstream consumers, not as a replacement for native Cowork internals
Conformance bar:
- a reviewer can inspect one Cowork session and understand queued, running, paused, waiting-approval, resumed, and completed states without reading Rust internals
Track B: FlowMonk conformance
Goal:
- present FlowMonk as the reference scheduled/queued runtime producer
Required external contract surface:
- AgentSession
- AgentTask
- AgentTodo
- AgentRun
- AgentApprovalCheckpoint
- AgentActionLog
- dispatch validation where scheduling hands off into downstream execution
- checkpoint/resume narrative for planning state
Current state:
- sessions, tasks, todos, runs, approvals, actions, and dispatch validation are exposed through the backend
- approvals, todos, and checkpoints are now persisted as native runtime records in FlowMonk
- run/session rollover rule: one planning workspace session exists per user, and each schedule entry becomes a new run inside that session
- runtime todos and approvals now have direct end-user PATCH surfaces that update both the runtime record and the source task or schedule entry
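The run/session rollover rule above can be sketched as follows. This is not FlowMonk's backend code. The class name, the id formats, and the choice to start each run as queued are assumptions for illustration.

```typescript
type PlanningSession = { sessionId: string; userId: string };

type ScheduledRun = {
  runId: string;
  sessionId: string;
  scheduleEntryId: string;
  status: 'queued';
};

// Hypothetical in-memory model of the rollover rule.
class PlanningWorkspace {
  private readonly sessions: PlanningSession[] = [];
  private readonly runs: ScheduledRun[] = [];

  // Returns the user's single planning session, creating it on first use.
  sessionFor(userId: string): PlanningSession {
    for (const s of this.sessions) {
      if (s.userId === userId) return s;
    }
    const created: PlanningSession = { sessionId: `planning-${userId}`, userId };
    this.sessions.push(created);
    return created;
  }

  // Each schedule entry becomes a new run inside that session.
  addScheduleEntry(userId: string, scheduleEntryId: string): ScheduledRun {
    const session = this.sessionFor(userId);
    const run: ScheduledRun = {
      runId: `run-${scheduleEntryId}`,
      sessionId: session.sessionId,
      scheduleEntryId,
      status: 'queued', // assumption: scheduled entries start as queued runs
    };
    this.runs.push(run);
    return run;
  }

  runsIn(sessionId: string): ScheduledRun[] {
    return this.runs.filter((r) => r.sessionId === sessionId);
  }
}
```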
Next conforming steps:
- preserve richer artifact and memory refs in checkpoint summaries once downstream runtime producers are connected
- decide whether FlowMonk should expose a richer UI for editing approvals and todos beyond the current backend surface
Conformance bar:
- a reviewer can inspect one FlowMonk planning session and understand which work was queued, confirmed, executed, deferred, or resumed without guessing from schedule records alone
Shared closeout requirement:
- both products should publish a short product-local adoption note that maps their native entities to the shared runtime contract and explicitly names any remaining derived projections
14. Acceptance Criteria
- A dispatched Cowork task can be resumed after interruption without losing audit continuity.
- A FlowMonk execution can emit task/todo state using the same contract.
- ActionTrail can replay a run using the shared action-log structure.
- Approval checkpoints can be handed off to Auth App without losing run context.
- Product-specific runtimes can remain different internally while still producing the same contract externally.
15. Implementation Checklist
- finalize entity list and minimum required fields
- define run vs session vs task boundaries
- define checkpoint/resume semantics
- define dispatch payload contract
- define action-log hook points
- define ActionTrail replay requirements
- define first conforming implementation plan for Cowork and FlowMonk
Commits:
- eae3409 drafted the initial stub
- 3f2482b added the baseline runtime schemas for dispatch, session, task, todo, run, approval, and action logs
- 97b731e added the Cowork task-backed AgentTodo projection
- faf93ec added FlowMonk direct AgentApprovalCheckpoint and task-backed AgentTodo projections
- ff8c5eb promoted queued to a first-class AgentRun state