Ecosystem Agent Runtime Contract

Status: Phase 5 baseline implemented
Owner: learning_ai_common_plat
Reference inputs: claw-code-oss, claw-cowork, learning_ai_trails, learning_ai_flowmonk, learning_ai_jarvis_jr
Purpose: Standardize session state, task state, resume behavior, dispatch semantics, approvals, and audit hooks across agent-capable products.


1. Problem

The ecosystem already has multiple agent-runtime ideas:

  • claw-code runtime sessions, todos, project memory, resume, MCP lifecycle
  • claw-cowork task orchestration, dispatch, scheduling, approvals, audit logging
  • FlowMonk planning/execution
  • JarvisJr coaching/delegation concepts
  • ActionTrail review and replay

Without a shared runtime contract:

  • each repo reinvents session models
  • handoff and resume become inconsistent
  • audit/replay becomes lossy
  • approvals cannot be shared cleanly

2. Goals

  1. Define the canonical runtime state model.
  2. Define session continuity and resume semantics.
  3. Define dispatch and handoff metadata.
  4. Define approval checkpoints and audit hooks.
  5. Allow multiple implementations while preserving one contract.

3. Non-Goals

  1. Forcing all agent products to use one codebase.
  2. Standardizing UI/UX across all agent surfaces.
  3. Replacing product-specific orchestration logic.

4. Core Entities

The shared runtime contract should define:

  • AgentSession
  • AgentTask
  • AgentTodo
  • AgentRun
  • AgentApprovalCheckpoint
  • AgentDispatchRequest
  • AgentHandoff
  • AgentActionLog

5. Minimum Session Shape

type AgentSession = {
  sessionId: string;
  productId: string;
  userId: string;
  status: 'active' | 'paused' | 'waiting-approval' | 'completed' | 'failed' | 'cancelled';
  startedAt: string;
  updatedAt: string;
  resumable: boolean;
  currentTaskId?: string | null;
  memoryRefs: string[];
  artifactRefs: string[];
  approvalRefs: string[];
  dispatchContext?: AgentDispatchContext | null;
};

type AgentTask = {
  taskId: string;
  sessionId: string;
  title: string;
  intent: string;
  status: 'queued' | 'running' | 'blocked' | 'completed' | 'failed' | 'cancelled';
  priority?: string;
  createdAt: string;
  updatedAt: string;
};

type AgentTodo = {
  todoId: string;
  sessionId: string;
  text: string;
  status: 'open' | 'in-progress' | 'done' | 'dropped';
  createdAt: string;
  updatedAt: string;
};
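
As a minimal sketch of how a product might construct a conforming session, the helpers below build an AgentSession and attach a task to it. The helper names (createAgentSession, attachTask), the ISO 8601 timestamp format, the 'cowork' product id, and the reduced SketchDispatchContext type are illustrative assumptions; the contract itself only fixes the field names and types shown above.

```typescript
// Stand-in for AgentDispatchContext (defined in section 7); kept opaque in this sketch.
type SketchDispatchContext = Record<string, unknown>;

// Copied from the minimum session shape above.
type AgentSession = {
  sessionId: string;
  productId: string;
  userId: string;
  status: 'active' | 'paused' | 'waiting-approval' | 'completed' | 'failed' | 'cancelled';
  startedAt: string;
  updatedAt: string;
  resumable: boolean;
  currentTaskId?: string | null;
  memoryRefs: string[];
  artifactRefs: string[];
  approvalRefs: string[];
  dispatchContext?: SketchDispatchContext | null;
};

// Hypothetical helper: builds a new active session with empty reference lists.
// ISO 8601 timestamps are an assumption; the contract only requires `string`.
function createAgentSession(
  sessionId: string,
  productId: string,
  userId: string,
  now: Date = new Date(),
): AgentSession {
  const timestamp = now.toISOString();
  return {
    sessionId,
    productId,
    userId,
    status: 'active',
    startedAt: timestamp,
    updatedAt: timestamp,
    resumable: false, // left false until the product can actually restore state
    currentTaskId: null,
    memoryRefs: [],
    artifactRefs: [],
    approvalRefs: [],
    dispatchContext: null,
  };
}

// Hypothetical helper: returns an updated copy instead of mutating the session.
function attachTask(session: AgentSession, taskId: string, now: Date = new Date()): AgentSession {
  return { ...session, currentTaskId: taskId, updatedAt: now.toISOString() };
}

const demoSession = attachTask(
  createAgentSession('sess-1', 'cowork', 'user-1', new Date('2025-01-01T00:00:00Z')),
  'task-1',
  new Date('2025-01-01T00:05:00Z'),
);
console.log(demoSession.status, demoSession.currentTaskId);
```

Returning a copy on each update keeps earlier snapshots intact, which fits the append-only audit expectations described later in this document.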

6. Required Runtime Behaviors

Every compliant implementation should support:

  1. session creation
  2. resumable state checkpoints
  3. todo/task updates during execution
  4. approval checkpoints
  5. action-log emission
  6. artifact emission
  7. dispatch metadata when execution originates elsewhere
  8. replayability in ActionTrail

7. Dispatch Model

The contract should support:

  • browser-originated requests
  • mobile-originated requests
  • desktop-originated requests
  • inter-product dispatch
  • trusted desktop executor dispatch

Example:

type AgentDispatchContext = {
  originSurface: 'browser' | 'mobile' | 'desktop' | 'web' | 'product-api';
  originProductId: string;
  dispatchMode: 'interactive' | 'queued' | 'scheduled' | 'remote';
  initiatedAt: string;
};
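
Because dispatch metadata can arrive from another product, a receiving runtime will probably want to validate it before trusting it. The guard below is one possible sketch; the function name isAgentDispatchContext, the requirement that initiatedAt be a parseable date string, and the 'flowmonk' product id are assumptions, not part of the contract.

```typescript
// Copied from the dispatch example above.
type AgentDispatchContext = {
  originSurface: 'browser' | 'mobile' | 'desktop' | 'web' | 'product-api';
  originProductId: string;
  dispatchMode: 'interactive' | 'queued' | 'scheduled' | 'remote';
  initiatedAt: string;
};

const ORIGIN_SURFACES: readonly string[] = ['browser', 'mobile', 'desktop', 'web', 'product-api'];
const DISPATCH_MODES: readonly string[] = ['interactive', 'queued', 'scheduled', 'remote'];

// Hypothetical runtime guard for payloads received from outside the process.
// Assumption: initiatedAt must be parseable by Date.parse (for example ISO 8601).
function isAgentDispatchContext(value: unknown): value is AgentDispatchContext {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as Record<string, unknown>;
  const surface = candidate['originSurface'];
  const productId = candidate['originProductId'];
  const mode = candidate['dispatchMode'];
  const initiatedAt = candidate['initiatedAt'];
  return (
    typeof surface === 'string' &&
    ORIGIN_SURFACES.indexOf(surface) !== -1 &&
    typeof productId === 'string' &&
    productId.length > 0 &&
    typeof mode === 'string' &&
    DISPATCH_MODES.indexOf(mode) !== -1 &&
    typeof initiatedAt === 'string' &&
    !Number.isNaN(Date.parse(initiatedAt))
  );
}

// Example: an inter-product dispatch queued by another product's API.
const interProductDispatch: unknown = {
  originSurface: 'product-api',
  originProductId: 'flowmonk',
  dispatchMode: 'queued',
  initiatedAt: '2025-01-01T09:00:00Z',
};

const unknownSurfaceDispatch: unknown = {
  originSurface: 'tablet', // not one of the contract's surfaces
  originProductId: 'flowmonk',
  dispatchMode: 'queued',
  initiatedAt: '2025-01-01T09:00:00Z',
};

console.log(isAgentDispatchContext(interProductDispatch));
console.log(isAgentDispatchContext(unknownSurfaceDispatch));
```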

8. First Implementations

The first conforming runtime integrations should target:

  1. oss/learning_ai_claw-cowork
  2. learning_ai_trails
  3. learning_ai_flowmonk
  4. learning_ai_jarvis_jr

Later:

  1. learning_voice_ai_agent transformation workflows
  2. shared operator tools in learning_ai_common_plat

9. Key Open Decisions

  1. How much of claw-code todo/session semantics should be adopted directly vs normalized?
  2. Should scheduled runs create new sessions or new runs under one session?
  3. What is the minimum checkpoint payload required for resume-anywhere?
  4. Which runtime actions must always emit ActionTrail logs?
  5. How should worktree-isolated code tasks be represented vs non-code tasks?

10. Lifecycle Boundaries

The runtime model uses these boundaries:

  • AgentSession: A durable container for related work over time. Sessions can outlive individual runs and can stay resumable even after one run finishes.
  • AgentRun: A concrete execution instance. A run is the entity whose status is queued, running, paused, waiting-approval, completed, failed, or cancelled.
  • AgentTask: A user-meaningful unit of intent inside a session. Tasks should remain stable enough to describe the work, even when execution is retried or rescheduled.
  • AgentTodo: A smaller actionable checklist item. When a product has no separate checklist model yet, it may temporarily project todos from its native task backlog, but that mapping must be called out explicitly.

Interpretation rules:

  1. queued means execution has not started yet.
  2. paused means execution started and is intentionally halted or deferred.
  3. waiting-approval means the run is blocked on human review.
  4. a session may contain multiple runs over time.
  5. a task may survive multiple runs if execution is retried, resumed, or rescheduled.
  6. todos should never imply a separate execution history unless the product truly tracks that internally.
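
The interpretation rules above imply some ordering between run states, for example that a queued run cannot be paused because pausing presupposes that execution has started. The contract does not define a transition table, so the one below is an assumption offered as a starting point; in particular, treating completed, failed, and cancelled as terminal (with retries creating a new run) is this sketch's reading of rule 5, not a stated requirement.

```typescript
// Run states as listed for AgentRun above.
type AgentRunStatus =
  | 'queued'
  | 'running'
  | 'paused'
  | 'waiting-approval'
  | 'completed'
  | 'failed'
  | 'cancelled';

// Assumed transition table; not defined by the contract.
const ASSUMED_RUN_TRANSITIONS: Record<AgentRunStatus, readonly AgentRunStatus[]> = {
  // Rule 1: queued means execution has not started, so it can only start or be cancelled.
  queued: ['running', 'cancelled'],
  running: ['paused', 'waiting-approval', 'completed', 'failed', 'cancelled'],
  // Rule 2: paused means execution started and was intentionally halted.
  paused: ['running', 'cancelled'],
  // Rule 3: waiting-approval is blocked on human review; approval resumes, rejection cancels.
  'waiting-approval': ['running', 'cancelled'],
  // Assumption: finished runs are terminal; a retry is modelled as a new run.
  completed: [],
  failed: [],
  cancelled: [],
};

// Hypothetical helper: true when the assumed table allows moving from `from` to `to`.
function canTransitionRun(from: AgentRunStatus, to: AgentRunStatus): boolean {
  return ASSUMED_RUN_TRANSITIONS[from].indexOf(to) !== -1;
}

// Hypothetical helper: true when no further transitions are allowed.
function isTerminalRunStatus(runStatus: AgentRunStatus): boolean {
  return ASSUMED_RUN_TRANSITIONS[runStatus].length === 0;
}

console.log(canTransitionRun('queued', 'running'));
console.log(canTransitionRun('queued', 'paused'));
console.log(isTerminalRunStatus('failed'));
```

A product that allows in-place retries of a failed run would need to relax the terminal-state assumption rather than adopt this table as written.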

Current product mappings:

  • Cowork
    • session: persisted/resumable workspace session
    • task: orchestrator task
    • run: orchestrator execution of that task
    • todo: interim task-backed projection until a first-class todo source exists
  • FlowMonk
    • session: user planning workspace
    • task: planning backlog task
    • run: scheduled entry
    • approval: agent-suggested schedule entry awaiting or receiving confirmation
    • todo: task-backed projection until a distinct checklist primitive exists
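
Both mappings above rely on a task-backed AgentTodo projection. A rough sketch of such a projection follows. The status mapping, the todo:<taskId> id scheme, and the extra projection field used to call out the mapping explicitly are all assumptions for illustration; the actual Cowork and FlowMonk projections may differ.

```typescript
type AgentTaskStatus = 'queued' | 'running' | 'blocked' | 'completed' | 'failed' | 'cancelled';
type AgentTodoStatus = 'open' | 'in-progress' | 'done' | 'dropped';

// Copied from the minimum shapes in section 5.
type AgentTask = {
  taskId: string;
  sessionId: string;
  title: string;
  intent: string;
  status: AgentTaskStatus;
  priority?: string;
  createdAt: string;
  updatedAt: string;
};

type AgentTodo = {
  todoId: string;
  sessionId: string;
  text: string;
  status: AgentTodoStatus;
  createdAt: string;
  updatedAt: string;
};

// Hypothetical extension: marks the todo as derived so consumers do not
// mistake it for a first-class checklist item.
type ProjectedAgentTodo = AgentTodo & {
  projection: { source: 'task-backed'; taskId: string };
};

// Assumed status mapping; the contract does not prescribe one.
const ASSUMED_TODO_STATUS: Record<AgentTaskStatus, AgentTodoStatus> = {
  queued: 'open',
  running: 'in-progress',
  blocked: 'in-progress', // assumption: blocked work has started but is stalled
  completed: 'done',
  failed: 'open', // assumption: failed work still needs doing
  cancelled: 'dropped',
};

// Hypothetical helper: projects one native task into one todo.
function projectTodoFromTask(task: AgentTask): ProjectedAgentTodo {
  return {
    todoId: `todo:${task.taskId}`, // hypothetical id scheme
    sessionId: task.sessionId,
    text: task.title,
    status: ASSUMED_TODO_STATUS[task.status],
    createdAt: task.createdAt,
    updatedAt: task.updatedAt,
    projection: { source: 'task-backed', taskId: task.taskId },
  };
}

const projectedTodo = projectTodoFromTask({
  taskId: 'task-42',
  sessionId: 'sess-1',
  title: 'Draft weekly plan',
  intent: 'plan',
  status: 'running',
  createdAt: '2025-01-01T00:00:00Z',
  updatedAt: '2025-01-01T00:10:00Z',
});
console.log(projectedTodo.todoId, projectedTodo.status);
```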

11. Checkpoint And Resume Semantics

Every runtime implementation that claims resumable: true should be able to produce a stable checkpoint envelope for the current session or run.

Minimum checkpoint shape:

type AgentCheckpoint = {
  checkpointId: string;
  sessionId: string;
  runId?: string | null;
  productId: string;
  userId: string;
  createdAt: string;
  statusAtCapture:
    | 'queued'
    | 'running'
    | 'paused'
    | 'waiting-approval'
    | 'completed'
    | 'failed'
    | 'cancelled';
  currentTaskId?: string | null;
  todoIds: string[];
  artifactRefs: string[];
  memoryRefs: string[];
  approvalRefs: string[];
  dispatchContext?: AgentDispatchContext | null;
  resumeToken?: string | null;
  stateSummary: {
    title: string;
    summary: string;
    lastActionAt?: string | null;
  };
};

Required semantics:

  1. a checkpoint captures enough context to resume work without re-deriving user intent from scratch
  2. a checkpoint may point at a runId, but it must always belong to one sessionId
  3. resumeToken is product-defined, but it must be stable enough for the same product runtime to reopen the session safely
  4. a resumed run should preserve the same sessionId and should create a new runId only if the product treats the resumed execution as a new execution instance
  5. checkpoint creation should append to action/audit history instead of overwriting earlier state
  6. a session must not be marked resumable: true unless the product can actually restore from the latest checkpoint or equivalent persisted state
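
The semantics above can be sketched as an append-only checkpoint log plus a resume planner. This is an illustration under stated assumptions: CheckpointSketch is a reduced subset of AgentCheckpoint, the helper names are hypothetical, createdAt is assumed to be a parseable date string, and the caller is assumed to supply a new run id only when the product treats the resumed execution as a new instance.

```typescript
// Reduced subset of AgentCheckpoint; the remaining fields are omitted for brevity.
type CheckpointSketch = {
  checkpointId: string;
  sessionId: string;
  runId?: string | null;
  createdAt: string;
  currentTaskId?: string | null;
  resumeToken?: string | null;
};

// Hypothetical result describing how execution should continue.
type ResumePlan = {
  sessionId: string;
  runId: string;
  currentTaskId: string | null;
  startsNewRun: boolean;
};

// Semantic 5: append to history instead of overwriting earlier state.
function appendCheckpoint(
  log: readonly CheckpointSketch[],
  next: CheckpointSketch,
): CheckpointSketch[] {
  return [...log, next];
}

// Returns the most recent checkpoint for one session, or null if none exists.
function latestCheckpoint(
  log: readonly CheckpointSketch[],
  sessionId: string,
): CheckpointSketch | null {
  let latest: CheckpointSketch | null = null;
  for (const entry of log) {
    if (entry.sessionId !== sessionId) continue;
    if (latest === null || Date.parse(entry.createdAt) >= Date.parse(latest.createdAt)) {
      latest = entry;
    }
  }
  return latest;
}

// Semantic 4: always keep the sessionId; use a new runId only when one is supplied.
function planResume(checkpoint: CheckpointSketch, newRunId?: string): ResumePlan {
  const currentTaskId = checkpoint.currentTaskId ?? null;
  if (newRunId !== undefined) {
    return { sessionId: checkpoint.sessionId, runId: newRunId, currentTaskId, startsNewRun: true };
  }
  const previousRunId = checkpoint.runId ?? null;
  if (previousRunId === null) {
    throw new Error(`checkpoint ${checkpoint.checkpointId} has no runId to continue`);
  }
  return { sessionId: checkpoint.sessionId, runId: previousRunId, currentTaskId, startsNewRun: false };
}

let checkpointLog: CheckpointSketch[] = [];
checkpointLog = appendCheckpoint(checkpointLog, {
  checkpointId: 'cp-1', sessionId: 'sess-1', runId: 'run-1', createdAt: '2025-01-01T00:00:00Z',
});
checkpointLog = appendCheckpoint(checkpointLog, {
  checkpointId: 'cp-2', sessionId: 'sess-1', runId: 'run-1',
  createdAt: '2025-01-01T00:30:00Z', currentTaskId: 'task-7',
});
checkpointLog = appendCheckpoint(checkpointLog, {
  checkpointId: 'cp-3', sessionId: 'sess-2', runId: null, createdAt: '2025-01-01T01:00:00Z',
});

const newestForSession = latestCheckpoint(checkpointLog, 'sess-1');
if (newestForSession !== null) {
  console.log(planResume(newestForSession));          // continues run-1
  console.log(planResume(newestForSession, 'run-2')); // same session, new run
}
```

Keeping the decision about a new run id with the caller mirrors the contract's stance that each product decides whether a resumed execution counts as a new execution instance.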

Current product expectations:

  • Cowork
    • the persisted workspace/session state is the effective checkpoint source
    • a resumed task may create a new run while preserving the same session
  • FlowMonk
    • the schedule/planning workspace is the effective checkpoint source
    • scheduled entries can resume planning context even when no long-running executor is active

12. Acceptance Criteria

  1. A dispatched Cowork task can be resumed after interruption without losing audit continuity.
  2. A FlowMonk execution can emit task/todo state using the same contract.
  3. ActionTrail can replay a run using the shared action-log structure.
  4. Approval checkpoints can be handed off to Auth App without losing run context.
  5. Product-specific runtimes can remain different internally while still producing the same contract externally.

13. Implementation Checklist

  • finalize entity list and minimum required fields
  • define run vs session vs task boundaries
  • define checkpoint/resume semantics
  • define dispatch payload contract
  • define action-log hook points
  • define ActionTrail replay requirements
  • define first conforming implementation plan for Cowork and FlowMonk

Commits:

  • eae3409 drafted the initial stub
  • 3f2482b added the baseline runtime schemas for dispatch, session, task, todo, run, approval, and action logs
  • 97b731e added the Cowork task-backed AgentTodo projection
  • faf93ec added FlowMonk direct AgentApprovalCheckpoint and task-backed AgentTodo projections
  • ff8c5eb promoted queued to a first-class AgentRun state