
Ecosystem Shared Artifact Schema

Status: Hardened baseline

Owner: learning_ai_common_plat

Purpose: Define the canonical cross-product artifact model so notes, transcripts, plans, memories, trails, routes, and agent outputs can interoperate cleanly.


1. Problem

Multiple repos produce durable user-value objects:

  • LysnrAI transcripts
  • NoteLett notes
  • MindLyst memory entries
  • FlowMonk plans
  • ChronoMind routines
  • ActionTrail audit records
  • PeakPulse sessions
  • Cowork generated reports and file outputs

Today these are product-local objects. The ecosystem needs one canonical artifact shape so:

  • artifacts can be linked, searched, and retrieved consistently
  • provenance can be preserved
  • agent outputs can become user-editable assets
  • one user action can create downstream value across products

2. Goals

  1. Define one envelope schema shared across all products.
  2. Preserve product-specific payloads inside a canonical wrapper.
  3. Support provenance, lineage, and visibility rules.
  4. Support human-created, machine-created, and mixed artifacts.
  5. Support cloud-first and local-first storage models.

3. Non-Goals

  1. Replacing product-local domain schemas entirely.
  2. Forcing one storage backend for all products.
  3. Removing product-specific UX or metadata.

4. Proposed Canonical Shape

Every artifact should include:

  • id
  • artifactType
  • productId
  • sourceSurface
  • title
  • summary
  • createdAt
  • updatedAt
  • createdBy
  • ownership
  • visibility
  • status
  • tags
  • links
  • provenance
  • payload

Example artifact types:

  • transcript
  • note
  • memory
  • plan
  • routine
  • habit-checkin
  • trail-report
  • route-session
  • agent-output
  • document
  • digest

5. Minimum Envelope Fields

```typescript
type ArtifactEnvelope = {
  id: string;
  artifactType: string;
  productId: string;
  sourceSurface: string;
  title: string | null;
  summary: string | null;
  createdAt: string;
  updatedAt: string;
  createdBy: {
    actorType: 'user' | 'agent' | 'system' | 'mixed';
    actorId: string | null;
  };
  ownership: {
    userId: string;
    orgId?: string | null;
  };
  visibility: {
    scope: 'private' | 'org' | 'shared' | 'local-only';
    allowedProducts?: string[];
  };
  status: string;
  tags: string[];
  links: ArtifactLink[];
  provenance: ArtifactProvenance;
  payload: Record<string, unknown>;
};

type ArtifactLink = {
  relation:
    | 'derived-from'
    | 'summarizes'
    | 'generated-task'
    | 'generated-routine'
    | 'generated-memory'
    | 'evidence-for'
    | 'review-of'
    | 'attached-to';
  targetArtifactId: string;
};

type ArtifactProvenance = {
  originProductId: string;
  originActionId?: string | null;
  sessionId?: string | null;
  runId?: string | null;
  approvalId?: string | null;
  lineage: Array<{
    stepType: string;
    productId: string;
    actorType: 'user' | 'agent' | 'system';
    timestamp: string;
  }>;
};
```
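As a concrete instance of these types, the sketch below wraps a LysnrAI transcript in the envelope while keeping its transcript-specific metadata in `payload`. It repeats the types above in condensed form so it compiles on its own. Every id, surface name, status value, and payload field is a hypothetical illustration value, and `envelopeTimeProblems` is an assumed sanity check on timestamp ordering, not a rule stated in this spec.

```typescript
// Condensed copy of the envelope types above, so this sketch is self-contained.
type LineageActor = 'user' | 'agent' | 'system';
type LinkRelation =
  | 'derived-from' | 'summarizes' | 'generated-task' | 'generated-routine'
  | 'generated-memory' | 'evidence-for' | 'review-of' | 'attached-to';
type ArtifactLink = { relation: LinkRelation; targetArtifactId: string };
type LineageStep = { stepType: string; productId: string; actorType: LineageActor; timestamp: string };
type ArtifactProvenance = {
  originProductId: string;
  originActionId?: string | null;
  sessionId?: string | null;
  runId?: string | null;
  approvalId?: string | null;
  lineage: LineageStep[];
};
type ArtifactEnvelope = {
  id: string;
  artifactType: string;
  productId: string;
  sourceSurface: string;
  title: string | null;
  summary: string | null;
  createdAt: string;
  updatedAt: string;
  createdBy: { actorType: LineageActor | 'mixed'; actorId: string | null };
  ownership: { userId: string; orgId?: string | null };
  visibility: { scope: 'private' | 'org' | 'shared' | 'local-only'; allowedProducts?: string[] };
  status: string;
  tags: string[];
  links: ArtifactLink[];
  provenance: ArtifactProvenance;
  payload: Record<string, unknown>;
};

// Hypothetical LysnrAI transcript. Transcript-specific metadata stays in payload,
// unchanged, next to a product-local schema version.
const transcript: ArtifactEnvelope = {
  id: 'art_transcript_001',
  artifactType: 'transcript',
  productId: 'lysnrai',
  sourceSurface: 'mobile-recorder',
  title: 'Weekly sync',
  summary: null,
  createdAt: '2024-05-01T09:00:00Z',
  updatedAt: '2024-05-01T09:31:00Z',
  createdBy: { actorType: 'mixed', actorId: 'user_42' },
  ownership: { userId: 'user_42' },
  visibility: { scope: 'private' },
  status: 'final',
  tags: ['meeting'],
  links: [],
  provenance: {
    originProductId: 'lysnrai',
    sessionId: 'sess_9',
    lineage: [
      { stepType: 'record', productId: 'lysnrai', actorType: 'user', timestamp: '2024-05-01T09:00:00Z' },
      { stepType: 'transcribe', productId: 'lysnrai', actorType: 'agent', timestamp: '2024-05-01T09:31:00Z' },
    ],
  },
  payload: { schemaVersion: 1, language: 'en', speakerCount: 3, durationSec: 1800 },
};

// Assumed sanity check (not part of the spec): timestamps parse, updatedAt is not
// before createdAt, and lineage steps appear in chronological order.
function envelopeTimeProblems(e: ArtifactEnvelope): string[] {
  const problems: string[] = [];
  const created = Date.parse(e.createdAt);
  const updated = Date.parse(e.updatedAt);
  if (isNaN(created) || isNaN(updated)) problems.push('unparseable createdAt/updatedAt');
  else if (updated < created) problems.push('updatedAt precedes createdAt');
  let previous = -Infinity;
  for (const step of e.provenance.lineage) {
    const t = Date.parse(step.timestamp);
    if (isNaN(t)) problems.push(`unparseable lineage timestamp for ${step.stepType}`);
    else if (t < previous) problems.push(`lineage step ${step.stepType} is out of order`);
    else previous = t;
  }
  return problems;
}
```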

6. First Adopters

Phase 1 adopters:

  1. learning_voice_ai_agent
  2. learning_ai_notes
  3. learning_multimodal_memory_agents
  4. learning_ai_flowmonk
  5. learning_ai_trails
  6. oss/learning_ai_claw-cowork

Phase 2 adopters:

  1. learning_ai_clock
  2. learning_ai_efforise

7. Storage And Lifecycle Rules

The canonical envelope does not force one backend, but it does require consistent lifecycle semantics.

7.1 Storage classes

Every artifact should be classified into one of these storage classes:

  • primary-record
    • the canonical user-facing artifact record
    • examples: note, plan, memory, routine
  • derived-record
    • generated from other artifacts or events but still retained as a first-class object
    • examples: trail-report, digest, agent-output
  • ephemeral-cache
    • regenerable helper artifact that may be dropped without lineage loss
    • examples: intermediate summaries, transient extraction output
  • binary-attachment
    • large file/blob stored through metadata indirection
    • examples: audio, screenshots, exported reports
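Because the storage class is not an envelope field, a product can keep the classification in a lookup beside the envelope. A minimal sketch, assuming a default table plus per-product overrides. The defaults mirror the examples above (and treat `transcript` as a primary record, per the LysnrAI guidance in section 8); `intermediate-summary` and `audio-file` are hypothetical type names.

```typescript
type StorageClass = 'primary-record' | 'derived-record' | 'ephemeral-cache' | 'binary-attachment';

// Defaults taken from the 7.1 examples; the last two keys are hypothetical names.
const DEFAULT_STORAGE_CLASS: Readonly<Record<string, StorageClass>> = {
  transcript: 'primary-record',
  note: 'primary-record',
  plan: 'primary-record',
  memory: 'primary-record',
  routine: 'primary-record',
  'trail-report': 'derived-record',
  digest: 'derived-record',
  'agent-output': 'derived-record',
  'intermediate-summary': 'ephemeral-cache',
  'audio-file': 'binary-attachment',
};

// Own-property lookup, so inherited keys such as 'toString' never match.
function ownValue(table: Readonly<Record<string, StorageClass>>, key: string): StorageClass | undefined {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

// Classification lives outside the envelope, so changing it never changes the
// envelope shape. Product overrides take precedence over the defaults.
function storageClassFor(
  artifactType: string,
  overrides: Readonly<Record<string, StorageClass>> = {},
): StorageClass {
  const cls = ownValue(overrides, artifactType) ?? ownValue(DEFAULT_STORAGE_CLASS, artifactType);
  if (cls === undefined) {
    // Fail loudly rather than guess: an unclassified type has no retention rule.
    throw new Error(`no storage class registered for artifactType "${artifactType}"`);
  }
  return cls;
}
```

MindLyst could, for example, pass `{ memory: 'derived-record' }` as an override while a memory is still an unaccepted proposal.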

7.2 Retention rules

  • primary-record
    • retain until user deletion, legal deletion requirement, or explicit archival policy
  • derived-record
    • retain while upstream artifacts still exist unless product policy intentionally purges it
  • ephemeral-cache
    • safe to expire automatically if provenance and regeneration inputs remain
  • binary-attachment
    • may use external blob storage, but the artifact envelope must retain the reference metadata
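These retention rules read naturally as a decision function. A sketch under stated assumptions: `RetentionContext` is a hypothetical set of facts a product would gather, a legal deletion requirement is assumed to override every class, and binary attachments are assumed to follow the record that owns them, since 7.2 only constrains where their bytes live.

```typescript
type StorageClass = 'primary-record' | 'derived-record' | 'ephemeral-cache' | 'binary-attachment';
type RetentionDecision = 'retain' | 'archive' | 'delete' | 'may-expire';

// Hypothetical inputs a product would collect before applying the 7.2 rules.
type RetentionContext = {
  userDeleted: boolean;
  legalDeletionRequired: boolean;
  archivalPolicyApplies: boolean;
  upstreamStillExists: boolean; // any artifact this one was derived from
  productPolicyPurges: boolean; // an explicit, intentional product purge
  regenerationInputsRemain: boolean; // provenance and inputs needed to rebuild it
};

// 'delete' here still leaves a tombstone, per the deletion rules in 7.5.
function retentionFor(cls: StorageClass, ctx: RetentionContext): RetentionDecision {
  // Assumption: a legal deletion requirement applies to every storage class.
  if (ctx.legalDeletionRequired) return 'delete';
  switch (cls) {
    case 'primary-record':
    case 'binary-attachment': // assumption: follows the owning record's retention
      if (ctx.userDeleted) return 'delete';
      return ctx.archivalPolicyApplies ? 'archive' : 'retain';
    case 'derived-record':
      if (ctx.productPolicyPurges) return 'delete';
      // Retention is only guaranteed while upstream artifacts still exist.
      return ctx.upstreamStillExists ? 'retain' : 'may-expire';
    case 'ephemeral-cache':
      // Expire only when no lineage would be lost.
      return ctx.regenerationInputsRemain ? 'may-expire' : 'retain';
    default: {
      const unreachable: never = cls;
      throw new Error(`unknown storage class ${String(unreachable)}`);
    }
  }
}
```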

7.3 Versioning rules

  • the envelope may evolve independently from product payload versions
  • payload should carry a product-local version when the artifact type is likely to evolve incompatibly
  • migrations must preserve:
    • id
    • artifactType
    • ownership
    • provenance
    • links
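These invariants can be checked mechanically after every migration. A minimal sketch that compares the five preserved fields structurally; `violatedInvariants` is a hypothetical helper, and the NoteLett payload migration (`body` renamed to `markdown`, with `schemaVersion` as the product-local version field) is an invented example.

```typescript
// Fields that 7.3 says every migration must carry over unchanged.
const PRESERVED_FIELDS = ['id', 'artifactType', 'ownership', 'provenance', 'links'] as const;
type PreservedField = (typeof PRESERVED_FIELDS)[number];
type EnvelopeLike = Record<string, unknown>;

// Structural equality that ignores object key order. A key set to undefined
// and a missing key are treated as different.
function deepEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) return false;
  if (Array.isArray(a) || Array.isArray(b)) {
    if (!Array.isArray(a) || !Array.isArray(b) || a.length !== b.length) return false;
    const right: unknown[] = b;
    return a.every((item: unknown, i: number) => deepEqual(item, right[i]));
  }
  const ao = a as Record<string, unknown>;
  const bo = b as Record<string, unknown>;
  const keys = Object.keys(ao);
  if (keys.length !== Object.keys(bo).length) return false;
  return keys.every((k) => Object.prototype.hasOwnProperty.call(bo, k) && deepEqual(ao[k], bo[k]));
}

// Lists every preserved field that a migration changed.
function violatedInvariants(before: EnvelopeLike, after: EnvelopeLike): PreservedField[] {
  return PRESERVED_FIELDS.filter((field) => !deepEqual(before[field], after[field]));
}

// Hypothetical payload migration: v1 stored `body`, v2 stores `markdown`.
// Only the payload changes; all envelope fields are copied through.
function migrateNotePayloadV1ToV2(env: EnvelopeLike): EnvelopeLike {
  const payload = env.payload as Record<string, unknown>;
  const { body, ...rest } = payload;
  return { ...env, payload: { ...rest, schemaVersion: 2, markdown: body } };
}
```

Running `violatedInvariants(before, migrate(before))` in a migration's test suite leaves payloads free to evolve while catching any edit to the preserved fields.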

7.4 Binary attachment rules

Binary artifacts should not inline the binary content in the canonical envelope. Instead:

  • store file metadata in payload or links
  • include:
    • content type
    • logical filename
    • byte size when known
    • blob/storage reference
    • integrity hash when feasible
  • preserve enough metadata for downstream access control and audit
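A minimal sketch of the metadata-indirection rule, assuming the field names below (the spec lists what to record, not what to call it). `containsInlineBinary` is a hypothetical guard a product could run before persisting an envelope; the integrity hash is taken as a precomputed string rather than computed here.

```typescript
// Hypothetical field names for the metadata listed in 7.4.
type BinaryAttachmentRef = {
  contentType: string; // MIME type, e.g. 'audio/webm'
  logicalFilename: string; // user-facing name, not a storage path
  byteSize?: number; // when known
  storageRef: string; // blob/storage reference, opaque to consumers
  integrityHash?: string; // e.g. a SHA-256 digest, when feasible
};

// True if any nested value is raw binary (an ArrayBuffer, typed array, or DataView).
// Base64 strings cannot be detected this way and need a product-level rule.
function containsInlineBinary(value: unknown): boolean {
  if (value instanceof ArrayBuffer || ArrayBuffer.isView(value)) return true;
  if (Array.isArray(value)) return value.some(containsInlineBinary);
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    return Object.keys(obj).some((k) => containsInlineBinary(obj[k]));
  }
  return false;
}

// Builds the attachment part of a payload, rejecting obviously bad metadata.
function attachmentPayload(ref: BinaryAttachmentRef): { attachment: BinaryAttachmentRef } {
  if (ref.contentType.indexOf('/') <= 0) throw new Error(`bad contentType: ${ref.contentType}`);
  if (ref.logicalFilename.trim() === '') throw new Error('logicalFilename is required');
  if (ref.storageRef.trim() === '') throw new Error('storageRef is required');
  if (
    ref.byteSize !== undefined &&
    (!isFinite(ref.byteSize) || ref.byteSize < 0 || Math.floor(ref.byteSize) !== ref.byteSize)
  ) {
    throw new Error(`bad byteSize: ${ref.byteSize}`);
  }
  return { attachment: { ...ref } };
}
```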

7.5 Deletion and archival rules

  • deleting an artifact should not silently delete downstream lineage evidence
  • tombstones or archived metadata should preserve:
    • id
    • artifactType
    • productId
    • ownership
    • minimal provenance
  • archival may hide payload details while keeping the lineage graph intact
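The two outcomes above can be sketched as projections of the envelope. The minimum tombstone shape is still an open decision (section 9), so the choice of `originProductId` plus `originActionId` as "minimal provenance", and the `state`/`at` fields, are assumptions. An archived record drops `title`, `summary`, and `payload` but keeps `links` and full provenance so the lineage graph stays intact.

```typescript
type Ownership = { userId: string; orgId?: string | null };
type Link = { relation: string; targetArtifactId: string };
type Provenance = {
  originProductId: string;
  originActionId?: string | null;
  lineage: Array<{ stepType: string; productId: string; actorType: string; timestamp: string }>;
};
// Only the envelope fields this sketch touches.
type EnvelopeSubset = {
  id: string;
  artifactType: string;
  productId: string;
  title: string | null;
  summary: string | null;
  ownership: Ownership;
  links: Link[];
  provenance: Provenance;
  payload: Record<string, unknown>;
};

// Deleted: keeps only the fields listed in 7.5, so links pointing at `id` still resolve.
type Tombstone = {
  state: 'deleted';
  at: string;
  id: string;
  artifactType: string;
  productId: string;
  ownership: Ownership;
  provenance: { originProductId: string; originActionId: string | null };
};

// Archived: hides payload details but keeps the lineage graph.
type ArchivedRecord = Omit<EnvelopeSubset, 'title' | 'summary' | 'payload'> & { state: 'archived'; at: string };

function toTombstone(env: EnvelopeSubset, at: string): Tombstone {
  return {
    state: 'deleted',
    at,
    id: env.id,
    artifactType: env.artifactType,
    productId: env.productId,
    ownership: { userId: env.ownership.userId, orgId: env.ownership.orgId ?? null },
    provenance: {
      originProductId: env.provenance.originProductId,
      originActionId: env.provenance.originActionId ?? null,
    },
  };
}

function toArchived(env: EnvelopeSubset, at: string): ArchivedRecord {
  return {
    state: 'archived',
    at,
    id: env.id,
    artifactType: env.artifactType,
    productId: env.productId,
    ownership: env.ownership,
    links: env.links,
    provenance: env.provenance,
  };
}
```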

8. Migration Strategy For First Adopters

Phase-based migration should be incremental, not all-or-nothing.

  1. wrap product-local objects in the canonical envelope without replacing local schemas
  2. emit canonical links and provenance first
  3. backfill stable artifact IDs only when needed for downstream consumers
  4. move binary/file references behind metadata indirection where needed
  5. only after that, tighten product-local persistence or indexing around the shared model
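Steps 1 and 2 can be sketched for a NoteLett note: the product-local record travels unchanged in `payload`, and a `derived-from` link plus a one-step lineage are emitted from the start. `LocalNote`, `wrapLocalNote`, the `sourceSurface` and `status` values, and the `notelett:` id prefix are hypothetical; the prefixed id is only a placeholder while the ID format remains an open decision.

```typescript
type LinkRelation =
  | 'derived-from' | 'summarizes' | 'generated-task' | 'generated-routine'
  | 'generated-memory' | 'evidence-for' | 'review-of' | 'attached-to';
type Envelope = {
  id: string;
  artifactType: string;
  productId: string;
  sourceSurface: string;
  title: string | null;
  summary: string | null;
  createdAt: string;
  updatedAt: string;
  createdBy: { actorType: 'user' | 'agent' | 'system' | 'mixed'; actorId: string | null };
  ownership: { userId: string; orgId?: string | null };
  visibility: { scope: 'private' | 'org' | 'shared' | 'local-only'; allowedProducts?: string[] };
  status: string;
  tags: string[];
  links: Array<{ relation: LinkRelation; targetArtifactId: string }>;
  provenance: {
    originProductId: string;
    lineage: Array<{ stepType: string; productId: string; actorType: 'user' | 'agent' | 'system'; timestamp: string }>;
  };
  payload: Record<string, unknown>;
};

// Hypothetical NoteLett-local record. Its schema is wrapped, not replaced.
type LocalNote = {
  noteId: string;
  ownerId: string;
  title: string;
  body: string;
  createdAt: string;
  updatedAt: string;
  sourceTranscriptArtifactId?: string; // set when the note was created from a transcript
};

function wrapLocalNote(note: LocalNote): Envelope {
  // Step 2: emit canonical links first, here back to the source transcript.
  const links: Envelope['links'] = note.sourceTranscriptArtifactId
    ? [{ relation: 'derived-from', targetArtifactId: note.sourceTranscriptArtifactId }]
    : [];
  return {
    id: `notelett:${note.noteId}`, // placeholder id scheme, see open decision 1
    artifactType: 'note',
    productId: 'notelett',
    sourceSurface: 'editor',
    title: note.title,
    summary: null,
    createdAt: note.createdAt,
    updatedAt: note.updatedAt,
    createdBy: { actorType: 'user', actorId: note.ownerId },
    ownership: { userId: note.ownerId },
    visibility: { scope: 'private' },
    status: 'active',
    tags: [],
    links,
    provenance: {
      originProductId: 'notelett',
      lineage: [{ stepType: 'create', productId: 'notelett', actorType: 'user', timestamp: note.createdAt }],
    },
    // Step 1: the local record is carried as-is inside the payload.
    payload: { localSchema: 'notelett.note', record: note },
  };
}
```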

First-adopter guidance:

  • LysnrAI
    • transcript artifact is the canonical record; audio remains a binary attachment reference
  • NoteLett
    • note remains the primary record; transcript and trail relationships are represented through links
  • MindLyst
    • memory proposal is a derived record until accepted into a stronger user-owned memory shape
  • FlowMonk, ChronoMind, EffoRise
    • plan, routine, and habit remain product-local domain records wrapped in canonical artifact envelopes
  • ActionTrail
    • trail-report remains a derived record with strong provenance back to Cowork audit actions
  • Cowork
    • a generated report or exported file should use the agent-output type, plus attachment metadata when it is file-backed
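The MindLyst guidance implies a transition from derived record to primary record. One way to model it, assuming acceptance is a user action that keeps the artifact id, flips a hypothetical `status` from `proposed` to `accepted`, and appends a lineage step. The status values, the `accept` step name, and classifying by status are all assumptions, not part of this spec.

```typescript
type StorageClass = 'primary-record' | 'derived-record';
type LineageStep = { stepType: string; productId: string; actorType: 'user' | 'agent' | 'system'; timestamp: string };
// Only the envelope fields this sketch touches, with hypothetical status values.
type MemoryArtifact = {
  id: string;
  artifactType: 'memory';
  status: 'proposed' | 'accepted';
  updatedAt: string;
  provenance: { originProductId: string; lineage: LineageStep[] };
  payload: Record<string, unknown>;
};

// Proposals stay derived records; only accepted memories are primary records.
function memoryStorageClass(m: MemoryArtifact): StorageClass {
  return m.status === 'accepted' ? 'primary-record' : 'derived-record';
}

// Returns a new object; the proposal passed in is not mutated.
function acceptMemoryProposal(m: MemoryArtifact, at: string): MemoryArtifact {
  if (m.status !== 'proposed') throw new Error(`memory ${m.id} is not a proposal`);
  return {
    ...m,
    status: 'accepted',
    updatedAt: at,
    provenance: {
      ...m.provenance,
      // Record the user's acceptance so the lineage shows who promoted it.
      lineage: [...m.provenance.lineage, { stepType: 'accept', productId: 'mindlyst', actorType: 'user', timestamp: at }],
    },
  };
}
```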

9. Key Open Decisions

  1. Should artifact IDs be globally unique UUIDs or product-prefixed IDs?
  2. Which fields belong in the envelope vs product payload?
  3. Should payload be versioned per artifact type?
  4. How should local-only artifacts from learning_ai_local_memory_gpt be represented without leaking sync assumptions?
  5. What is the minimum tombstone shape when a user deletes a previously linked artifact?

10. Acceptance Criteria

  1. A LysnrAI transcript can be wrapped as an artifact without losing transcript-specific metadata.
  2. A NoteLett note can link back to a transcript artifact.
  3. A MindLyst memory can record provenance to source transcript/note artifacts.
  4. A Cowork output can be stored as an agent-output artifact and later opened in NoteLett.
  5. ActionTrail can render provenance from the envelope alone.
  6. Products can classify artifacts by storage class without changing the envelope shape.
  7. Binary artifacts can be referenced without embedding raw file contents in the envelope.
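Criterion 5 can be checked with a renderer that reads nothing but the envelope. A minimal sketch that assumes a plain-text line format (ActionTrail's actual presentation is out of scope here); `EnvelopeView` and `renderProvenance` are hypothetical names for a read-only subset of the envelope and the renderer.

```typescript
type LineageStep = { stepType: string; productId: string; actorType: 'user' | 'agent' | 'system'; timestamp: string };
// Read-only subset of the envelope that the renderer needs.
type EnvelopeView = {
  id: string;
  artifactType: string;
  productId: string;
  links: Array<{ relation: string; targetArtifactId: string }>;
  provenance: {
    originProductId: string;
    originActionId?: string | null;
    sessionId?: string | null;
    runId?: string | null;
    approvalId?: string | null;
    lineage: LineageStep[];
  };
};

// Renders provenance using only envelope fields. Lineage is shown oldest-first
// on a sorted copy, so the input array is not mutated.
function renderProvenance(e: EnvelopeView): string[] {
  const p = e.provenance;
  const lines: string[] = [`${e.artifactType} ${e.id} (${e.productId})`, `origin: ${p.originProductId}`];
  const refs: Array<[string, string | null | undefined]> = [
    ['action', p.originActionId],
    ['session', p.sessionId],
    ['run', p.runId],
    ['approval', p.approvalId],
  ];
  for (const [label, value] of refs) {
    if (value) lines.push(`${label}: ${value}`); // skip absent optional ids
  }
  const steps = p.lineage.slice().sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
  steps.forEach((s, i) => lines.push(`${i + 1}. ${s.timestamp} ${s.productId}/${s.stepType} by ${s.actorType}`));
  for (const link of e.links) lines.push(`${link.relation} -> ${link.targetArtifactId}`);
  return lines;
}
```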

11. Implementation Checklist

  • finalize canonical envelope field names and required/optional splits
  • define first 8 artifact types in stable form
  • define link relation vocabulary
  • define provenance minimum payload
  • define storage/binary attachment rules
  • define migration strategy for first adopters
  • add examples for LysnrAI, NoteLett, and MindLyst
  • add contract-backed examples for FlowMonk, ChronoMind, and EffoRise
  • add examples for Cowork and ActionTrail

Current implementation location:

Commits:

  • eae3409 drafted the initial stub
  • 76f1b47 added Phase 1 artifact schemas and canonical fixtures
  • 78918fb added Phase 2 artifact schemas for plan, routine, and habit
  • e6b58b7 added Phase 3 trail-report schema coverage and examples for Cowork/ActionTrail
  • ab50831 tracked artifact lifecycle hardening as remaining work