# Extraction Service — Roadmap & Task Checklist

> **Service:** `@lysnrai/extraction-service` (port 4005)
> **Package:** `@bytelyst/extraction` (shared types + client)
> **Core dependency:** [google/langextract](https://github.com/google/langextract) (Python)
>
> **Companion docs:** [ECOSYSTEM_ARCHITECTURE.md](./ECOSYSTEM_ARCHITECTURE.md) · [ROADMAP.md](./ROADMAP.md)

---

## Overview

A shared extraction microservice that uses Google's LangExtract library to extract structured information from unstructured text. Both LysnrAI and MindLyst consume this service for their respective extraction needs.

**Architecture:** Fastify (routing, auth, validation, request tracing) + Python sidecar (LangExtract). The Fastify layer keeps the service consistent with the other four platform services (growth, billing, platform, tracker). The Python process handles the actual LLM-powered extraction.

```
┌──────────────────────────────────────────────────────────┐
│                   extraction-service                      │
│                      (port 4005)                          │
│                                                           │
│  ┌─────────────────────┐    ┌──────────────────────────┐ │
│  │   Fastify (TS)      │    │   Python Sidecar         │ │
│  │                     │    │                          │ │
│  │  - Auth middleware   │──►│  - LangExtract wrapper   │ │
│  │  - Zod validation   │◄──│  - Task registry         │ │
│  │  - x-request-id     │    │  - Model provider config │ │
│  │  - Rate limiting    │    │  - Result caching        │ │
│  │  - /health          │    │                          │ │
│  └─────────────────────┘    └──────────────────────────┘ │
└──────────────────────────────────────────────────────────┘
        ▲                              ▲
        │                              │
   REST API                     FastAPI (internal :4006)
   (external)                   or subprocess stdio
```

### Consumers

| Product                       | Use Case                                                                 | Entry Point                                        |
| ----------------------------- | ------------------------------------------------------------------------ | -------------------------------------------------- |
| **LysnrAI** — Desktop/Backend | Post-transcription extraction (action items, decisions, dates, people)   | `backend/src/clients/extraction_client.py`         |
| **LysnrAI** — Admin Dashboard | Transcript analytics, entity review                                      | `admin-dashboard-web/src/lib/extraction-client.ts` |
| **MindLyst** — KMP/Web        | Triage pipeline (brain routing, entity extraction, topic classification) | `mindlyst-native/web/src/pages/api/triage.ts`      |
| **MindLyst** — Web Dashboard  | Brain insight generation, reflection enrichment                          | Direct API calls via `@bytelyst/api-client`        |

---

## Phase 0 — Foundation & Scaffolding

> **Goal:** Set up the service skeleton, Python environment, and build pipeline.

### Service scaffold (Fastify)

- [x] **0.1** Create `services/extraction-service/` directory structure: [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
  ```
  services/extraction-service/
    src/
      lib/
        config.ts            # Zod config schema (PORT, HOST, CORS, PYTHON_SIDECAR_URL, etc.)
        errors.ts            # Re-export from @bytelyst/errors
        cosmos.ts            # Re-export from @bytelyst/cosmos (for task registry persistence)
        product-config.ts    # Re-export from @bytelyst/config
        python-bridge.ts     # HTTP client to Python sidecar
      modules/
        extract/
          types.ts           # Zod schemas: ExtractionTask, ExtractionExample, ExtractionResult
          routes.ts          # POST /api/extract, POST /api/extract/batch, GET /api/tasks
        tasks/
          types.ts           # Predefined task definitions (triage, transcript, etc.)
          repository.ts      # Cosmos CRUD for custom task definitions
          routes.ts          # CRUD endpoints for task management
      server.ts              # createServiceApp + route registration
    package.json
    tsconfig.json
    Dockerfile
  ```
- [x] **0.2** Create `package.json` (`@lysnrai/extraction-service`, port 4005) matching existing service conventions [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **0.3** Create `tsconfig.json` (self-contained, matching tracker-service pattern) [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **0.4** Create `src/lib/config.ts` with Zod schema (`PORT`, `HOST`, `NODE_ENV`, `CORS_ORIGIN`, `SERVICE_NAME`, `PYTHON_SIDECAR_URL`, `DEFAULT_MODEL_ID`, `COSMOS_*`, `JWT_SECRET`, `DEFAULT_PRODUCT_ID`) [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **0.5** Create `src/server.ts` using `createServiceApp()` + `startService()` from `@bytelyst/fastify-core` [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **0.6** Add `.env.example` with all required env vars [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **0.7** Verify: `pnpm build` passes for the new service [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)

### Python sidecar scaffold

- [x] **0.8** Create `services/extraction-service/python/` directory: [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
  ```
  python/
    src/
      __init__.py
      app.py                 # FastAPI app (internal, port 4006)
      extractor.py           # LangExtract wrapper
      task_registry.py       # Built-in task definitions
      models.py              # Pydantic models matching TS Zod schemas
    requirements.txt         # langextract, fastapi, uvicorn, pydantic
    Dockerfile               # Python 3.12 slim
  ```
- [x] **0.9** Create `python/requirements.txt`: [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
  ```
  langextract>=0.3.0
  fastapi>=0.115.0
  uvicorn>=0.34.0
  pydantic>=2.10.0
  pydantic-settings>=2.7.0
  structlog>=24.4.0
  ```
- [x] **0.10** Create `python/src/app.py` — FastAPI app with POST /extract, POST /extract/batch, GET /health [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **0.11** Create `python/src/extractor.py` — wrapper around `lx.extract()` with mock fallback [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [ ] **0.12** Verify: Python sidecar starts and `/health` returns OK (requires `pip install` — deferred to Phase 1)

### Package scaffold (`@bytelyst/extraction`)

- [x] **0.13** Create `packages/extraction/` directory: [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
  ```
  packages/extraction/
    src/
      index.ts               # Public API
      types.ts               # Shared TypeScript types
      client.ts              # createExtractionClient() factory
    package.json
    tsconfig.json
  ```
- [x] **0.14** Create `package.json` (`@bytelyst/extraction`) with `@bytelyst/api-client` as peer dep [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **0.15** Define TypeScript types (ExtractionTask, ExtractionExample, ExtractionEntity, ExtractRequest, ExtractResponse, BatchExtractRequest, BatchExtractResponse) [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **0.16** Create `createExtractionClient()` factory using `createApiClient()` pattern [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **0.17** Verify: `pnpm build` passes for the new package [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)

### Workspace wiring

- [x] **0.18** Verify that `extraction-service` and `extraction` are covered by the `services/*` and `packages/*` globs in `pnpm-workspace.yaml` [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **0.19** Run `pnpm install` from repo root — workspace resolution verified [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **0.20** Verify: `pnpm build` passes for both extraction-service and @bytelyst/extraction [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)

---

## Phase 1 — Core Extraction API

> **Goal:** Working extraction endpoint that accepts text + task definition and returns structured results via LangExtract.

### Python extractor implementation

- [x] **1.1** Implement `extractor.py` — LangExtract wrapper with mock fallback, configurable model_id, extraction_passes, max_workers, max_char_buffer [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **1.2** Model provider configuration — Gemini default via DEFAULT_MODEL_ID env var, model_id passthrough to lx.extract() [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **1.3** structlog logging in extractor.py and app.py (extraction_complete, extraction_failed, extract_request) [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **1.4** Request timeout in python-bridge.ts (DEFAULT_TIMEOUT_MS = 120s, configurable per-call) [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
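
The wrapper described in 1.1 and 1.2 can be pictured roughly as follows. This is a minimal sketch, not the contents of `extractor.py`: the `ExtractedEntity` dataclass, the `mock_extract` helper, and the `use_mock` flag are hypothetical names introduced for illustration. The `lx.extract()` keyword arguments are the ones LangExtract documents, and the defaults mirror the Env Vars Summary in this roadmap.

```python
import os
from dataclasses import dataclass, field
from typing import Any

try:
    import langextract as lx  # listed in python/requirements.txt
except ImportError:  # library missing: only the mock path is available
    lx = None


@dataclass
class ExtractedEntity:
    """Hypothetical result shape used only in this sketch."""
    extraction_class: str
    extraction_text: str
    attributes: dict = field(default_factory=dict)


def mock_extract(text: str) -> list:
    """Deterministic stand-in: every non-empty line becomes a 'mock' entity."""
    return [
        ExtractedEntity("mock", line.strip())
        for line in text.splitlines()
        if line.strip()
    ]


def extract(
    text: str,
    prompt: str,
    examples: list,
    *,
    model_id: Any = None,
    extraction_passes: int = 1,
    max_workers: int = 10,
    max_char_buffer: int = 2000,
    use_mock: bool = False,
) -> list:
    """Call LangExtract when it is installed, otherwise fall back to the mock."""
    if use_mock or lx is None:
        return mock_extract(text)
    result = lx.extract(
        text_or_documents=text,
        prompt_description=prompt,
        examples=examples,
        model_id=model_id or os.environ.get("DEFAULT_MODEL_ID", "gemini-2.5-flash"),
        extraction_passes=extraction_passes,
        max_workers=max_workers,
        max_char_buffer=max_char_buffer,
    )
    return [
        ExtractedEntity(e.extraction_class, e.extraction_text, dict(e.attributes or {}))
        for e in result.extractions
    ]
```

Keeping the mock path behind the same function signature means the Fastify bridge and the tests can exercise the request flow without a model API key.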

### Fastify routes

- [x] **1.5** Implement `src/modules/extract/types.ts` — ExtractRequestSchema, ExtractResponseSchema, BatchExtractRequestSchema (Zod) [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **1.6** Implement `src/modules/extract/routes.ts` — POST /extract, POST /extract/batch, GET /extract/models, GET /extract/sidecar-health [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **1.7** Implement `src/lib/python-bridge.ts` — sidecarExtract, sidecarExtractBatch, sidecarHealth, waitForSidecar with x-request-id forwarding [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **1.8** Rate limiting on extract routes (30 req/min per IP via @fastify/rate-limit) [`0a87d19`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/0a87d19)

### Tests

- [x] **1.9** Unit tests for Zod schemas — 13 extract tests + 8 task tests (21 total) [`0a87d19`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/0a87d19)
- [ ] **1.10** Integration tests for extract routes (mock Python sidecar responses) — deferred to Phase 3
- [ ] **1.11** Python unit tests for `extractor.py` — deferred (requires pip install in CI)
- [x] **1.12** Verify: `pnpm test` passes (21 tests) [`0a87d19`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/0a87d19)

---

## Phase 2 — Predefined Task Library

> **Goal:** Ship a curated set of extraction task definitions that LysnrAI and MindLyst can use out-of-the-box.

### Task definitions

- [x] **2.1** Define `transcript-extraction` task (6 classes, few-shot examples) [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **2.2** Define `triage` task (MindLyst) — 6 classes incl. brain_signal with brain/confidence attributes [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **2.3** Define `memory-insight` task (MindLyst) — 4 classes [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **2.4** Define `reflection-enrichment` task (MindLyst) — 4 classes [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **2.5** Define `bug-report-extraction` task (Tracker) — 5 classes [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)

### Task registry (Cosmos DB)

- [x] **2.6** Cosmos container `extraction_tasks` (partition `/productId`) — created on first access via repository [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **2.7** Implement `src/modules/tasks/repository.ts` — listTasks, getTask, createTask, updateTask, deleteTask, upsertTask [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **2.8** Implement `src/modules/tasks/routes.ts` — GET/POST/PUT/DELETE /tasks [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **2.9** Seed built-in tasks on startup via `seed.ts` (idempotent upsert, 5 tasks) [`6a49823`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/6a49823)
- [x] **2.10** `productId` on all task documents (DEFAULT_PRODUCT_ID from env) [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)

### Python task registry

- [x] **2.11** Implement `task_registry.py` — BUILTIN_TASKS with full definitions inline [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **2.12** Task definitions stored inline in `task_registry.py` (no separate JSON needed) [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [ ] **2.13** Task validation: verify examples follow LangExtract best practices — deferred to Phase 5
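
As an illustration of 2.11 and the deferred validation in 2.13, the sketch below shows one way an inline registry and an example checker could fit together. The dataclasses, the class names (`person`, `action_item`, `date`, `decision`), and the sample text are assumptions, and only four of the six `transcript-extraction` classes are shown. The check encodes the LangExtract guidance that each example's extraction text should be verbatim from the example and listed in order of appearance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    text: str
    extractions: tuple  # pairs of (extraction_class, extraction_text)


@dataclass(frozen=True)
class TaskDefinition:
    task_id: str
    prompt: str
    classes: tuple
    examples: tuple


# Illustrative subset; the real registry defines five tasks with more classes.
BUILTIN_TASKS = {
    "transcript-extraction": TaskDefinition(
        task_id="transcript-extraction",
        prompt="Extract people, action items, dates, and decisions.",
        classes=("person", "action_item", "date", "decision"),
        examples=(
            Example(
                text="Alice will send the budget by Friday. We decided to drop the beta.",
                extractions=(
                    ("person", "Alice"),
                    ("action_item", "send the budget"),
                    ("date", "Friday"),
                    ("decision", "drop the beta"),
                ),
            ),
        ),
    ),
}


def validate_task(task: TaskDefinition) -> list:
    """Return human-readable problems; an empty list means the task looks sane."""
    problems = []
    for i, example in enumerate(task.examples):
        cursor = 0  # start offset of the most recently matched span
        for cls, span in example.extractions:
            if cls not in task.classes:
                problems.append(f"example {i}: unknown class {cls!r}")
            pos = example.text.find(span, cursor)
            if pos != -1:
                cursor = pos
            elif span in example.text:
                problems.append(f"example {i}: {span!r} is out of order")
            else:
                problems.append(f"example {i}: {span!r} is not verbatim")
    return problems
```

Running `validate_task` over every entry in `BUILTIN_TASKS` at startup (or in a unit test) would catch paraphrased or reordered few-shot examples before they reach the model.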

### Tests

- [x] **2.14** Tests for task schemas (8 tests in types.test.ts) [`0a87d19`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/0a87d19)
- [x] **2.15** Tests for task seeding (7 tests in seed.test.ts) [`6a49823`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/6a49823)
- [x] **2.16** Verify: all 28 tests pass [`6a49823`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/6a49823)

---

## Phase 3 — Consumer Integration

> **Goal:** Wire LysnrAI and MindLyst to call the extraction service.

### `@bytelyst/extraction` package finalization

- [x] **3.1** `createExtractionClient()` with extract(), extractBatch(), listTasks(), getTask() [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **3.2** Export all types from `src/index.ts` [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
- [x] **3.3** `pnpm build` passes for `@bytelyst/extraction` [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)

### LysnrAI integration

- [x] **3.4** Add `@bytelyst/extraction` to `admin-dashboard-web/package.json` (via `file:` ref) [`944609a`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/944609a)
- [x] **3.5** Create `admin-dashboard-web/src/lib/extraction-client.ts` — extractText, extractTranscript, extractBatch, listTasks, getTask, getSidecarHealth [`944609a`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/944609a)
- [ ] **3.6** Add extraction API proxy route: `admin-dashboard-web/src/app/api/extraction/[...path]/route.ts` — deferred (client calls service directly for now)
- [ ] **3.7** Python extraction client in `backend/src/clients/extraction_client.py` — deferred to Phase 5
- [ ] **3.8** Post-transcription extraction endpoint `POST /api/transcripts/{id}/extract` — deferred to Phase 5
- [ ] **3.9** Extraction results UI in admin dashboard — deferred to Phase 5

### MindLyst integration

- [x] **3.10** MindLyst web extraction client (standalone, no @bytelyst deps needed) [`b545244`](https://github.com/saravanakumardb1/learning_multimodal_memory_agents/commit/b545244)
- [x] **3.11** Create `mindlyst-native/web/src/lib/extraction-client.ts` — triageExtract, memoryInsightExtract, reflectionExtract, isExtractionAvailable [`b545244`](https://github.com/saravanakumardb1/learning_multimodal_memory_agents/commit/b545244)
- [ ] **3.12** Create API route `src/pages/api/extract.ts` — deferred (client ready, route integration next)
- [ ] **3.13** Wire triage flow to use extraction results — deferred to Phase 5
- [ ] **3.14** Wire brain insights to `memory-insight` task — deferred to Phase 5
- [ ] **3.15** Wire reflections to `reflection-enrichment` task — deferred to Phase 5

### Tests

- [ ] **3.16** Integration tests for LysnrAI extraction — deferred to Phase 5
- [ ] **3.17** Integration tests for MindLyst triage-via-extraction — deferred to Phase 5
- [ ] **3.18** Verify `npx tsc --noEmit` across all dashboards — deferred to Phase 5

---

## Phase 4 — Docker & DevOps

> **Goal:** Containerize, add to docker-compose, update run scripts.

### Dockerfile

- [ ] **4.1** Create multi-stage `Dockerfile` for extraction-service — deferred (hybrid TS+Python needs two-container approach)
- [ ] **4.2** Create `supervisord.conf` — deferred (see 4.1)
- [ ] **4.3** Verify: `docker build` succeeds — deferred

### Docker Compose

- [x] **4.4** Add `extraction-service` to `docker-compose.yml` (port 4005, Traefik, Loki, healthcheck) [`bdd9bb1`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/bdd9bb1)
- [ ] **4.5** Add to LysnrAI `docker-compose.yml` — deferred

### Run scripts

- [ ] **4.6** Add extraction-service to `run-local-all-services.sh` — deferred
- [ ] **4.7** Add extraction-service to `.windsurf/workflows/start-all-services.md` — deferred
- [x] **4.8** Add `EXTRACTION_SERVICE_URL` to LysnrAI `.env.example` [`944609a`](https://github.com/saravanakumardb1/learning_voice_ai_agent/commit/944609a)
- [x] **4.9** Add extraction service env vars to common platform `.env.example` [`bdd9bb1`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/bdd9bb1)

### CI

- [ ] **4.10** Create `.github/workflows/ci-extraction-service.yml` — deferred
- [ ] **4.11** Verify: CI workflow passes — deferred

---

## Phase 5 — Production Hardening

> **Goal:** Caching, per-user quotas and cost controls, observability, and resilient error handling.

### Caching

- [ ] **5.1** Add result caching in Python sidecar:
  - Cache key: hash(task_id + input_text + model_id)
  - TTL: configurable (default 24h)
  - Storage: in-memory LRU (dev) or Redis (prod)
- [ ] **5.2** Add cache hit/miss headers to Fastify response (`X-Extraction-Cache: HIT/MISS`)
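
The cache described in 5.1 could look something like the sketch below for the in-memory (dev) case. It is an assumption-laden illustration: `cache_key` and `TTLCache` are hypothetical names, SHA-256 is one reasonable choice of hash, and the length prefix is added so that different splits of the same characters across fields cannot collide. A Redis-backed variant for production would replace the `OrderedDict`.

```python
import hashlib
import time
from collections import OrderedDict


def cache_key(task_id: str, input_text: str, model_id: str) -> str:
    """Hash task, model, and input; each part is length-prefixed to stay unambiguous."""
    digest = hashlib.sha256()
    for part in (task_id, model_id, input_text):
        data = part.encode("utf-8")
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


class TTLCache:
    """Small LRU cache whose entries also expire after ttl_seconds."""

    def __init__(self, max_entries=1024, ttl_seconds=86400, clock=time.monotonic):
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._data = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._data[key] = (self._clock() + self._ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # evict least recently used
```

A `get` that returns `None` corresponds to `X-Extraction-Cache: MISS` in 5.2, and any other value to `HIT`, provided cached results are never `None` themselves.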

### Cost controls

- [ ] **5.3** Add per-user daily extraction quota (configurable per plan tier):
  - Free: 10 extractions/day
  - Pro: 100 extractions/day
  - Enterprise: unlimited
- [ ] **5.4** Track usage in Cosmos `extraction_usage` container (partition: `/userId`)
- [ ] **5.5** Return `429 Too Many Requests` with quota info when exceeded
- [ ] **5.6** Add usage reporting endpoint: `GET /api/extract/usage` (admin)
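
The quota rules in 5.3 to 5.5 reduce to a small check-and-increment step. The sketch below shows that logic with an in-memory dictionary standing in for the Cosmos `extraction_usage` container; `DailyQuota`, `QuotaDecision`, and the plan keys are hypothetical names, and the logic is the same whichever layer ends up enforcing it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Daily limits per plan tier, taken from item 5.3; None means unlimited.
PLAN_LIMITS = {"free": 10, "pro": 100, "enterprise": None}


@dataclass
class QuotaDecision:
    allowed: bool
    status_code: int           # 200 when allowed, 429 when the quota is exhausted
    limit: Optional[int]
    used: int
    remaining: Optional[int]   # None for unlimited plans


class DailyQuota:
    """In-memory stand-in for a usage store partitioned by user."""

    def __init__(self):
        self._usage = {}  # (user_id, ISO day) -> extraction count

    def check_and_consume(self, user_id: str, plan: str, today: date) -> QuotaDecision:
        limit = PLAN_LIMITS[plan]
        key = (user_id, today.isoformat())
        used = self._usage.get(key, 0)
        if limit is not None and used >= limit:
            # Over quota: do not increment, report the numbers back to the caller.
            return QuotaDecision(False, 429, limit, used, 0)
        used += 1
        self._usage[key] = used
        remaining = None if limit is None else limit - used
        return QuotaDecision(True, 200, limit, used, remaining)
```

The `limit`, `used`, and `remaining` fields are the "quota info" a `429` response could carry. A real store would need an atomic increment so that concurrent requests cannot both slip under the limit.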

### Observability

- [ ] **5.7** Add structured logging for every extraction:
  - Request: task_id, input_length, model_id, user_id, product_id
  - Response: entity_count, duration_ms, token_count, cache_hit
- [ ] **5.8** Add Prometheus metrics (via `fastify-metrics`):
  - `extraction_requests_total` (labels: task_id, model_id, product_id, status)
  - `extraction_duration_seconds` (histogram)
  - `extraction_entities_extracted` (histogram)
  - `extraction_cache_hit_total`
- [ ] **5.9** Add Grafana dashboard for extraction service (in `services/monitoring/grafana/dashboards/`)

### Error handling

- [ ] **5.10** Map LangExtract errors to `@bytelyst/errors`:
  - Model timeout → `408 Request Timeout`
  - Rate limit (upstream LLM) → `429 Too Many Requests` with a `Retry-After` header
  - Invalid task definition → `400 Bad Request`
  - Model unavailable → `503 Service Unavailable`
- [ ] **5.11** Add circuit breaker for Python sidecar (fail fast if sidecar is down)
- [ ] **5.12** Add graceful degradation: return partial results if some chunks fail
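
Items 5.10 and 5.12 can be illustrated together: classify each failure into the status codes listed above, and keep whatever chunks succeeded. This sketch assumes the caller splits the input into chunks itself (LangExtract also chunks internally), and the exception classes chosen for each status are stand-ins; `UpstreamRateLimited`, `status_for`, and `extract_chunks` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


class UpstreamRateLimited(Exception):
    """Hypothetical error raised when the upstream LLM returns a rate-limit response."""


# Ordered mapping from exception type to the HTTP status proposed in 5.10.
STATUS_BY_ERROR = (
    (TimeoutError, 408),         # model timeout
    (UpstreamRateLimited, 429),  # upstream rate limit
    (ValueError, 400),           # stand-in for an invalid task definition
    (ConnectionError, 503),      # stand-in for model unavailable
)


def status_for(exc: BaseException) -> int:
    for exc_type, status in STATUS_BY_ERROR:
        if isinstance(exc, exc_type):
            return status
    return 500  # anything unclassified


def extract_chunks(chunks, extract_fn, max_workers=10):
    """Run extract_fn per chunk; return successes plus a record of each failure."""
    entities, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(extract_fn, chunk) for chunk in chunks]
        for index, future in enumerate(futures):  # iterate in input order
            try:
                entities.extend(future.result())
            except Exception as exc:
                failed.append(
                    {
                        "chunk_index": index,
                        "error": type(exc).__name__,
                        "status": status_for(exc),
                    }
                )
    return {"entities": entities, "failed_chunks": failed, "partial": bool(failed)}
```

Returning `failed_chunks` alongside the entities lets the Fastify layer decide whether a partial result is acceptable or whether the whole request should surface the most severe status.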

---

## Phase 6 — Advanced Features (Future)

> **Goal:** Power-user features, visualization, and batch processing.

### Visualization

- [ ] **6.1** Expose LangExtract's HTML visualization:
  - `GET /api/extract/:requestId/visualization` — returns interactive HTML
  - Embed in admin dashboard for extraction quality review
- [ ] **6.2** Store visualization artifacts in Azure Blob Storage (`extractions` container)

### Batch & async processing

- [ ] **6.3** Add async extraction endpoint:
  - `POST /api/extract/async` — returns job ID immediately
  - `GET /api/extract/jobs/:id` — poll for status + results
  - Webhook callback when complete
- [ ] **6.4** Add Vertex AI batch processing support (for high-volume MindLyst triage)
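
The submit-then-poll flow in 6.3 can be sketched with a thread pool and an in-memory job table. `JobStore` and its status strings are hypothetical, and a real deployment would likely need a durable queue so jobs survive restarts; the webhook callback is left out.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor, wait


class JobStore:
    """In-memory job table: submit returns an ID, status is polled later."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}  # job_id -> Future

    def submit(self, fn, *args) -> str:
        """Start fn(*args) in the background and return a job ID immediately."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(fn, *args)
        return job_id

    def status(self, job_id: str) -> dict:
        """Shape of a possible poll response for GET /api/extract/jobs/:id."""
        future = self._jobs.get(job_id)
        if future is None:
            return {"status": "not_found"}
        if not future.done():
            return {"status": "running"}
        error = future.exception()
        if error is not None:
            return {"status": "failed", "error": str(error)}
        return {"status": "completed", "result": future.result()}

    def wait(self, job_id: str, timeout=None) -> None:
        """Block until the job finishes or the timeout elapses (handy in tests)."""
        future = self._jobs.get(job_id)
        if future is not None:
            wait([future], timeout=timeout)
```

A handler for `POST /api/extract/async` would call `submit` and return the ID, while the polling endpoint would return the dictionary from `status` as-is.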

### Custom model support

- [ ] **6.5** Add Ollama provider for local/air-gapped deployments
- [ ] **6.6** Add model benchmarking endpoint: run same task across models, compare quality + cost

### Multi-language extraction

- [ ] **6.7** Test and validate extraction across languages (LangExtract supports multi-language via LLM)
- [ ] **6.8** Add language detection to extraction pipeline (auto-detect input language)

---

## Env Vars Summary

| Variable                 | Service            | Default                 | Description                         |
| ------------------------ | ------------------ | ----------------------- | ----------------------------------- |
| `PORT`                   | extraction-service | `4005`                  | Fastify listen port                 |
| `HOST`                   | extraction-service | `0.0.0.0`               | Fastify listen host                 |
| `CORS_ORIGIN`            | extraction-service | `*`                     | Allowed origins                     |
| `PYTHON_SIDECAR_URL`     | extraction-service | `http://localhost:4006` | Python sidecar URL                  |
| `DEFAULT_MODEL_ID`       | extraction-service | `gemini-2.5-flash`      | Default LLM model                   |
| `GEMINI_API_KEY`         | python sidecar     | —                       | Google Gemini API key               |
| `AZURE_OPENAI_API_KEY`   | python sidecar     | —                       | Azure OpenAI key (alternative)      |
| `AZURE_OPENAI_ENDPOINT`  | python sidecar     | —                       | Azure OpenAI endpoint (alternative) |
| `MAX_WORKERS`            | python sidecar     | `10`                    | Parallel extraction workers         |
| `MAX_CHAR_BUFFER`        | python sidecar     | `2000`                  | Chunk size for long docs            |
| `EXTRACTION_CACHE_TTL`   | python sidecar     | `86400`                 | Cache TTL in seconds                |
| `COSMOS_ENDPOINT`        | extraction-service | —                       | Azure Cosmos DB endpoint            |
| `COSMOS_KEY`             | extraction-service | —                       | Azure Cosmos DB key                 |
| `COSMOS_DATABASE`        | extraction-service | `lysnrai`               | Database name                       |
| `JWT_SECRET`             | extraction-service | —                       | JWT validation secret               |
| `EXTRACTION_SERVICE_URL` | consumers          | `http://localhost:4005` | Used by dashboards/backends         |
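
For the rows marked "python sidecar" above, the loading logic might resemble the following stdlib-only sketch. The sidecar's `requirements.txt` lists `pydantic-settings`, which would normally do this job; `SidecarSettings`, `load_settings`, and the positive-integer rule are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SidecarSettings:
    gemini_api_key: Optional[str]
    azure_openai_api_key: Optional[str]
    azure_openai_endpoint: Optional[str]
    max_workers: int
    max_char_buffer: int
    extraction_cache_ttl: int  # seconds


def _positive_int(env, name: str, default: int) -> int:
    """Read an integer variable, falling back to the documented default."""
    raw = env.get(name)
    if raw is None or raw == "":
        return default
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")
    return value


def load_settings(env=None) -> SidecarSettings:
    """Build settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return SidecarSettings(
        gemini_api_key=env.get("GEMINI_API_KEY"),
        azure_openai_api_key=env.get("AZURE_OPENAI_API_KEY"),
        azure_openai_endpoint=env.get("AZURE_OPENAI_ENDPOINT"),
        max_workers=_positive_int(env, "MAX_WORKERS", 10),
        max_char_buffer=_positive_int(env, "MAX_CHAR_BUFFER", 2000),
        extraction_cache_ttl=_positive_int(env, "EXTRACTION_CACHE_TTL", 86400),
    )
```

Failing at startup on a malformed value is usually easier to debug than discovering it on the first extraction request.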

---

## Port Allocation

| Service                                      | Port     |
| -------------------------------------------- | -------- |
| growth-service                               | 4001     |
| billing-service                              | 4002     |
| platform-service                             | 4003     |
| tracker-service                              | 4004     |
| **extraction-service**                       | **4005** |
| extraction-service python sidecar (internal) | 4006     |

---

## Dependency Graph

```
@bytelyst/extraction (package)
  └── @bytelyst/api-client (peer dep)

@lysnrai/extraction-service (service)
  ├── @bytelyst/fastify-core
  ├── @bytelyst/auth
  ├── @bytelyst/config
  ├── @bytelyst/cosmos
  ├── @bytelyst/errors
  ├── fastify, zod, jose (direct deps)
  └── python sidecar
      └── langextract, fastapi, uvicorn, structlog
```

---

## Estimated Effort

| Phase                          | Effort   | Dependencies |
| ------------------------------ | -------- | ------------ |
| Phase 0 — Foundation           | 2–3 days | None         |
| Phase 1 — Core API             | 2–3 days | Phase 0      |
| Phase 2 — Task Library         | 2 days   | Phase 1      |
| Phase 3 — Consumer Integration | 3–4 days | Phase 2      |
| Phase 4 — Docker & DevOps      | 1–2 days | Phase 1      |
| Phase 5 — Production Hardening | 2–3 days | Phase 3      |
| Phase 6 — Advanced (future)    | Ongoing  | Phase 5      |

**Total MVP (Phases 0–4): ~10–14 days**

---

## Rollback Strategy

- The extraction-service is **additive** — no existing code is modified until Phase 3
- Phase 3 consumer integration uses new endpoints/routes — existing triage/transcript flows remain untouched
- If extraction-service is down, consumers fall back to their existing behavior (MindLyst mock triage, LysnrAI raw transcripts)
- The `@bytelyst/extraction` package is optional — dashboards only import it for new extraction features
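
The fallback behavior in the third bullet can be sketched for the LysnrAI Python backend as follows. This is not the planned `extraction_client.py`: the request body fields (`taskId`, `text`), the return shape, and the injectable `post` parameter are assumptions, while the `/api/extract` path and the default URL come from this roadmap.

```python
import json
import urllib.request


def http_post_json(url: str, payload: dict, timeout: float) -> dict:
    """POST a JSON body with the standard library and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_or_fallback(
    transcript: str,
    *,
    base_url: str = "http://localhost:4005",
    task_id: str = "transcript-extraction",
    timeout: float = 5.0,
    post=http_post_json,  # injectable transport, mainly for tests
) -> dict:
    """Try the extraction service; on any transport error return the raw transcript."""
    try:
        body = post(f"{base_url}/api/extract", {"taskId": task_id, "text": transcript}, timeout)
        return {"source": "extraction-service", "result": body}
    except (OSError, ValueError) as exc:
        # OSError covers refused connections, timeouts, and urllib HTTP errors;
        # ValueError covers a reply that is not valid JSON.
        return {"source": "fallback", "transcript": transcript, "reason": type(exc).__name__}
```

Because the fallback path returns the untouched transcript, callers that predate the extraction service keep working when it is unavailable.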