Saravana Achu Mac 1c92284314 docs(runtime): clarify Cowork direct and projected states

2026-04-04 14:59:15 -07:00


Ecosystem Agent Runtime Contract

Status: Phase 5 baseline implemented
Owner: learning_ai_common_plat
Reference inputs: claw-code-oss, claw-cowork, learning_ai_trails, learning_ai_flowmonk, learning_ai_jarvis_jr
Purpose: Standardize session state, task state, resume behavior, dispatch semantics, approvals, and audit hooks across agent-capable products.

1. Problem

The ecosystem already has multiple agent-runtime ideas:

claw-code runtime sessions, todos, project memory, resume, MCP lifecycle
claw-cowork task orchestration, dispatch, scheduling, approvals, audit logging
FlowMonk planning/execution
JarvisJr coaching/delegation concepts
ActionTrail review and replay

Without a shared runtime contract:

each repo reinvents session models
handoff and resume become inconsistent
audit/replay becomes lossy
approvals cannot be shared cleanly

2. Goals

Define the canonical runtime state model.
Define session continuity and resume semantics.
Define dispatch and handoff metadata.
Define approval checkpoints and audit hooks.
Allow multiple implementations while preserving one contract.

3. Non-Goals

Forcing all agent products to use one codebase.
Standardizing UI/UX across all agent surfaces.
Replacing product-specific orchestration logic.

4. Core Entities

The shared runtime contract should define:

AgentSession
AgentTask
AgentTodo
AgentRun
AgentApprovalCheckpoint
AgentDispatchRequest
AgentHandoff
AgentActionLog

5. Minimum Session Shape

type AgentSession = {
  sessionId: string;
  productId: string;
  userId: string;
  status: 'active' | 'paused' | 'waiting-approval' | 'completed' | 'failed' | 'cancelled';
  startedAt: string;
  updatedAt: string;
  resumable: boolean;
  currentTaskId?: string | null;
  memoryRefs: string[];
  artifactRefs: string[];
  approvalRefs: string[];
  dispatchContext?: AgentDispatchContext | null;
};

type AgentTask = {
  taskId: string;
  sessionId: string;
  title: string;
  intent: string;
  status: 'queued' | 'running' | 'blocked' | 'completed' | 'failed' | 'cancelled';
  priority?: string;
  createdAt: string;
  updatedAt: string;
};

type AgentTodo = {
  todoId: string;
  sessionId: string;
  text: string;
  status: 'open' | 'in-progress' | 'done' | 'dropped';
  createdAt: string;
  updatedAt: string;
};
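As an illustration of the session shape above, the following is a minimal sketch of how a producer might create a fresh session. The factory name createSession, the placeholder IDs, and the use of ISO 8601 timestamps are assumptions for this example; the contract itself only says the timestamp fields are strings.

```typescript
// Status union copied from the AgentSession shape above.
type AgentSessionStatus =
  | 'active'
  | 'paused'
  | 'waiting-approval'
  | 'completed'
  | 'failed'
  | 'cancelled';

// Same fields as the contract shape, minus dispatchContext to keep the sketch short.
type AgentSession = {
  sessionId: string;
  productId: string;
  userId: string;
  status: AgentSessionStatus;
  startedAt: string;
  updatedAt: string;
  resumable: boolean;
  currentTaskId?: string | null;
  memoryRefs: string[];
  artifactRefs: string[];
  approvalRefs: string[];
};

// Hypothetical factory, not part of the contract.
function createSession(
  sessionId: string,
  productId: string,
  userId: string,
  now: Date,
): AgentSession {
  const timestamp = now.toISOString(); // ISO 8601 is an assumption of this sketch
  return {
    sessionId,
    productId,
    userId,
    status: 'active',
    startedAt: timestamp,
    updatedAt: timestamp,
    // Left false until the product can actually restore from persisted state.
    resumable: false,
    currentTaskId: null,
    memoryRefs: [],
    artifactRefs: [],
    approvalRefs: [],
  };
}

// Placeholder identifiers, for illustration only.
const exampleSession = createSession(
  'sess-1',
  'example-product',
  'user-1',
  new Date('2026-04-04T00:00:00Z'),
);
```

Passing the clock in as a parameter keeps the factory deterministic, which makes it easier to test and to replay.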

6. Required Runtime Behaviors

Every compliant implementation should support:

session creation
resumable state checkpoints
todo/task updates during execution
approval checkpoints
action-log emission
artifact emission
dispatch metadata when execution originates elsewhere
replayability in ActionTrail

7. Dispatch Model

The contract should support:

browser-originated requests
mobile-originated requests
desktop-originated requests
inter-product dispatch
trusted desktop executor dispatch

Example:

type AgentDispatchContext = {
  originSurface: 'browser' | 'mobile' | 'desktop' | 'web' | 'product-api';
  originProductId: string;
  dispatchMode: 'interactive' | 'queued' | 'scheduled' | 'remote';
  initiatedAt: string;
};
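A consumer receiving dispatch metadata from another surface may want to validate it before trusting it. The sketch below shows one possible runtime guard for the shape above. The function name isAgentDispatchContext is hypothetical, and the requirements that originProductId be non-empty and that initiatedAt be parseable by Date.parse are assumptions, not rules stated by the contract.

```typescript
type AgentDispatchContext = {
  originSurface: 'browser' | 'mobile' | 'desktop' | 'web' | 'product-api';
  originProductId: string;
  dispatchMode: 'interactive' | 'queued' | 'scheduled' | 'remote';
  initiatedAt: string;
};

// Allowed literal values, mirrored from the type above.
const ORIGIN_SURFACES: readonly string[] = ['browser', 'mobile', 'desktop', 'web', 'product-api'];
const DISPATCH_MODES: readonly string[] = ['interactive', 'queued', 'scheduled', 'remote'];

// Hypothetical type guard: narrows an untrusted value to AgentDispatchContext.
function isAgentDispatchContext(value: unknown): value is AgentDispatchContext {
  if (typeof value !== 'object' || value === null) return false;
  const { originSurface, originProductId, dispatchMode, initiatedAt } =
    value as { [key: string]: unknown };
  return (
    typeof originSurface === 'string' &&
    ORIGIN_SURFACES.indexOf(originSurface) !== -1 &&
    typeof originProductId === 'string' &&
    originProductId.length > 0 && // assumption: empty product IDs are rejected
    typeof dispatchMode === 'string' &&
    DISPATCH_MODES.indexOf(dispatchMode) !== -1 &&
    typeof initiatedAt === 'string' &&
    !isNaN(Date.parse(initiatedAt)) // assumption: timestamps are parseable dates
  );
}

const incoming: unknown = {
  originSurface: 'mobile',
  originProductId: 'learning_ai_flowmonk',
  dispatchMode: 'queued',
  initiatedAt: '2026-04-04T14:59:15-07:00',
};

const dispatchAccepted = isAgentDispatchContext(incoming);
```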

8. First Implementations

The first conforming runtime integrations should target:

oss/learning_ai_claw-cowork
learning_ai_trails
learning_ai_flowmonk
learning_ai_jarvis_jr

Later:

learning_voice_ai_agent transformation workflows
shared operator tools in learning_ai_common_plat

9. Key Open Decisions

How much of claw-code todo/session semantics should be adopted directly vs normalized?
Should scheduled runs create new sessions or new runs under one session?
What is the minimum checkpoint payload required for resume-anywhere?
Which runtime actions must always emit ActionTrail logs?
How should worktree-isolated code tasks be represented vs non-code tasks?

10. Lifecycle Boundaries

The runtime model currently uses these boundaries:

AgentSession: A durable container for related work over time. Sessions can outlive individual runs and can stay resumable even after one run finishes.
AgentRun: A concrete execution instance. A run is the thing that can be queued, running, paused, waiting-approval, completed, failed, or cancelled.
AgentTask: A user-meaningful unit of intent inside a session. Tasks should remain stable enough to describe the work, even when execution is retried or rescheduled.
AgentTodo: A smaller actionable checklist item. When a product has no separate checklist model yet, it may temporarily project todos from its native task backlog, but that mapping must be called out explicitly.

Interpretation rules:

queued means execution has not started yet.
paused means execution started and is intentionally halted or deferred.
waiting-approval means the run is blocked on human review.
a session may contain multiple runs over time.
a task may survive multiple runs if execution is retried, resumed, or rescheduled.
todos should never imply a separate execution history unless the product truly tracks that internally.
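The interpretation rules for run states can be sketched as a small classifier. The RunPhase labels and the function name classifyRunStatus are hypothetical names introduced for this example; only the seven status values come from the contract. Grouping completed, failed, and cancelled as terminal is an assumption based on how the states are used elsewhere in this document.

```typescript
// The run states listed for AgentRun above.
type AgentRunStatus =
  | 'queued'
  | 'running'
  | 'paused'
  | 'waiting-approval'
  | 'completed'
  | 'failed'
  | 'cancelled';

// Hypothetical coarse phases used only by this sketch.
type RunPhase = 'not-started' | 'executing' | 'halted' | 'awaiting-human' | 'terminal';

function classifyRunStatus(status: AgentRunStatus): RunPhase {
  switch (status) {
    case 'queued':
      return 'not-started'; // execution has not started yet
    case 'running':
      return 'executing';
    case 'paused':
      return 'halted'; // execution started, then intentionally halted or deferred
    case 'waiting-approval':
      return 'awaiting-human'; // blocked on human review
    case 'completed':
    case 'failed':
    case 'cancelled':
      return 'terminal'; // assumption: no further transitions within this run
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a status is added.
      const unreachable: never = status;
      throw new Error('Unknown run status: ' + String(unreachable));
    }
  }
}

const queuedPhase = classifyRunStatus('queued');
const pausedPhase = classifyRunStatus('paused');
```

The never check means that adding a new status to the union forces every classifier to be updated before the code compiles, which helps keep separate implementations aligned on one contract.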

Current product mappings:

Cowork
- session: persisted/resumable workspace session
- task: orchestrator task
- run: orchestrator execution of that task
- todo: interim task-backed projection until a first-class todo source exists
FlowMonk
- session: user planning workspace
- task: planning backlog task
- run: scheduled entry
- approval: agent-suggested schedule entry awaiting or receiving confirmation
- todo: task-backed projection until a distinct checklist primitive exists

11. Checkpoint And Resume Semantics

Every runtime implementation that claims resumable: true should be able to produce a stable checkpoint envelope for the current session or run.

Minimum checkpoint shape:

type AgentCheckpoint = {
  checkpointId: string;
  sessionId: string;
  runId?: string | null;
  productId: string;
  userId: string;
  createdAt: string;
  statusAtCapture:
    | 'queued'
    | 'running'
    | 'paused'
    | 'waiting-approval'
    | 'completed'
    | 'failed'
    | 'cancelled';
  currentTaskId?: string | null;
  todoIds: string[];
  artifactRefs: string[];
  memoryRefs: string[];
  approvalRefs: string[];
  dispatchContext?: AgentDispatchContext | null;
  resumeToken?: string | null;
  stateSummary: {
    title: string;
    summary: string;
    lastActionAt?: string | null;
  };
};

Required semantics:

a checkpoint captures enough context to resume work without re-deriving user intent from scratch
a checkpoint may point at a runId, but it must always belong to one sessionId
resumeToken is product-defined, but it must be stable enough for the same product runtime to reopen the session safely
a resumed run should preserve the same sessionId and should create a new runId only if the product treats the resumed execution as a new execution instance
checkpoint creation should append to action/audit history instead of overwriting earlier state
a session must not be marked resumable: true unless the product can actually restore from the latest checkpoint or equivalent persisted state
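The resume rules above can be sketched as a planning function. This is a minimal illustration under stated assumptions: the names planResume, ResumePlan, newRunOnResume, and nextRunId are hypothetical, the checkpoint type is trimmed to the fields the sketch needs, and "latest" is decided by parsing createdAt as a date, which the contract does not mandate.

```typescript
// Subset of AgentCheckpoint used by this sketch.
type CheckpointRef = {
  checkpointId: string;
  sessionId: string;
  runId?: string | null;
  createdAt: string;
};

// Hypothetical result type describing how a resume would proceed.
type ResumePlan = {
  sessionId: string;
  runId: string;
  checkpointId: string;
  createdNewRun: boolean;
};

function planResume(
  sessionId: string,
  checkpoints: CheckpointRef[],
  options: { newRunOnResume: boolean; nextRunId: string },
): ResumePlan | null {
  // A checkpoint always belongs to exactly one session.
  const candidates = checkpoints.filter((c) => c.sessionId === sessionId);
  // Nothing to restore from, so the session should not claim resumable: true.
  if (candidates.length === 0) return null;

  // Assumption: createdAt is parseable, so the newest checkpoint can be selected.
  const latest = candidates.reduce((a, b) =>
    Date.parse(b.createdAt) > Date.parse(a.createdAt) ? b : a,
  );

  // Keep the same sessionId; mint a new runId only when the product treats
  // the resumed execution as a new instance or no run was recorded.
  const existingRunId = latest.runId ?? null;
  const runId =
    !options.newRunOnResume && existingRunId !== null ? existingRunId : options.nextRunId;

  return {
    sessionId: latest.sessionId,
    runId,
    checkpointId: latest.checkpointId,
    createdNewRun: runId !== existingRunId,
  };
}

const history: CheckpointRef[] = [
  { checkpointId: 'cp-1', sessionId: 's1', runId: 'r1', createdAt: '2026-04-01T10:00:00Z' },
  { checkpointId: 'cp-2', sessionId: 's1', runId: 'r1', createdAt: '2026-04-02T10:00:00Z' },
];

const sameRun = planResume('s1', history, { newRunOnResume: false, nextRunId: 'r2' });
const freshRun = planResume('s1', history, { newRunOnResume: true, nextRunId: 'r2' });
```

Here sameRun continues run r1 from cp-2, while freshRun keeps session s1 but starts run r2. The earlier checkpoint cp-1 stays in the history in both cases, consistent with the append-only rule.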

Current product expectations:

Cowork
- the persisted workspace/session state is the effective checkpoint source
- a resumed task may create a new run while preserving the same session
FlowMonk
- the schedule/planning workspace is the effective checkpoint source
- scheduled entries can resume planning context even when no long-running executor is active

12. ActionTrail Replay Requirements

ActionTrail replay is not required to reproduce every UI pixel of a product runtime. For the ecosystem roadmap, replay means reconstructing the execution narrative with enough fidelity to answer:

what was requested
what ran
what approvals or pauses happened
what artifacts or memories were produced
why the final state was reached

Minimum replay evidence for a run:

stable identity
- sessionId
- runId
- productId
- userId
execution timing
- startedAt
- completedAt when available
- checkpoint timestamps when resumable
causal chain
- correlationId
- causationId
- parentEventId
- canonical event IDs on audit/action records when available
control-flow state changes
- queued
- running
- paused
- waiting-approval
- completed / failed / cancelled
human intervention evidence
- approval checkpoints
- approval decisions
- actor identity for approvals or overrides
output evidence
- artifactRefs
- memoryRefs
- relevant task and todo state at the end of the run

Required behaviors:

products must preserve runtime action logs or equivalent audit records long enough for replay and review
products may keep private implementation details, but they must emit enough canonical metadata to reconstruct the run narrative externally
replay consumers should trust canonical event IDs and action-log IDs over inferred timestamps
replay should tolerate partial fidelity
- if UI frames or low-level desktop events are unavailable, ActionTrail should still be able to render a narrative replay from runtime actions, approvals, checkpoints, and artifacts
replay views must clearly distinguish between:
- observed canonical events
- inferred transitions derived from checkpoints or final state

Current product expectations:

Cowork
- Rust audit records plus canonical event_id values are the primary replay anchor
- checkpoint/resume should explain why a run paused, resumed, or required approval
FlowMonk
- scheduled entries, confirmations, and projected action logs are the primary replay anchor
- replay should make it obvious when an automation was queued versus actively running

13. First Conforming Implementation Plan

The first conforming implementations for Phase 5 should be treated as two tracks:

Track A: Cowork conformance

Goal:

present Cowork as the reference high-autonomy runtime producer

Required external contract surface:

AgentSession
AgentTask
AgentTodo
AgentRun
AgentApprovalCheckpoint
AgentActionLog
dispatch validation
checkpoint/resume narrative support

Current state:

sessions, tasks, runs, approvals, actions, and dispatch validation are already exposed through cowork-service
todos and checkpoint summaries now come from persisted Cowork checkpoint records
replay anchors now use canonical Rust audit event_id
checkpoint summaries now preserve artifact, memory, and approval refs when Cowork provides them
Cowork session and task IPC projections now expose canonical runtime event IDs directly

Direct observations vs projections:

direct observations from Rust / IPC:
- task lifecycle values from orchestrator task state:
  - pending
  - running
  - completed
  - failed
  - cancelled
- session-level waitingApproval
- persisted checkpoint terminal flags:
  - completed
  - cancelled
  - checkpoint error
- canonical audit event_id
- checkpoint artifact, memory, and approval refs when present
derived by cowork-service projection:
- AgentRun.status=queued from Rust pending
- AgentTask.status=queued from Rust pending
- AgentTodo.status=open|in-progress|done|dropped from checkpoint outcome plus task state
- AgentCheckpoint.statusAtCapture=paused when no stronger Rust signal exists
- session status=active when waitingApproval is false
- approval/action runtime objects from audit records rather than first-class IPC records
- dispatch context normalization for shared runtime consumers

Interpretation rule:

if a state comes directly from Rust task state, checkpoint flags, or session approval state, treat it as authoritative
if a state is produced by mapping logic in cowork-service, treat it as a shared-contract view for downstream consumers, not as a replacement for native Cowork internals
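One way to keep the direct-versus-projected distinction visible to downstream consumers is to carry a provenance tag alongside each mapped value. The sketch below is illustrative only and is not the cowork-service implementation: the Projected wrapper and the function names are hypothetical, and passing non-pending Rust task states through unchanged is an assumption (the list above only states the pending to queued mapping).

```typescript
// Task lifecycle values observed directly from the Rust orchestrator.
type RustTaskState = 'pending' | 'running' | 'completed' | 'failed' | 'cancelled';

// Shared-contract task status from the AgentTask shape.
type AgentTaskStatus = 'queued' | 'running' | 'blocked' | 'completed' | 'failed' | 'cancelled';

// Hypothetical wrapper recording whether a value was observed or derived.
type Projected<T> = { value: T; provenance: 'direct' | 'projected' };

function projectTaskStatus(state: RustTaskState): Projected<AgentTaskStatus> {
  if (state === 'pending') {
    // Derived by mapping logic: Rust has no 'queued' value.
    return { value: 'queued', provenance: 'projected' };
  }
  // Assumption: the remaining states share names with the contract and pass through.
  return { value: state, provenance: 'direct' };
}

function projectSessionStatus(
  waitingApproval: boolean,
): Projected<'active' | 'waiting-approval'> {
  return waitingApproval
    ? { value: 'waiting-approval', provenance: 'direct' } // session-level flag from Rust / IPC
    : { value: 'active', provenance: 'projected' }; // derived when the flag is false
}

const pendingTask = projectTaskStatus('pending');
const runningTask = projectTaskStatus('running');
const idleSession = projectSessionStatus(false);
```

A consumer can then treat entries tagged direct as authoritative and entries tagged projected as a shared-contract view, in line with the interpretation rule.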

Conformance bar:

a reviewer can inspect one Cowork session and understand queued, running, paused, waiting-approval, resumed, and completed states without reading Rust internals

Track B: FlowMonk conformance