docs(extraction): update roadmap Phase 1-3 checkboxes with commit links
This commit is contained in:
parent
6a49823e1d
commit
38467a9b96
@ -140,41 +140,24 @@ A shared extraction microservice that uses Google's LangExtract library to extra
|
||||
|
||||
### Python extractor implementation
|
||||
|
||||
- [ ] **1.1** Implement `extractor.py`:
|
||||
- Accept task definition (prompt, examples, model config)
|
||||
- Accept input text (string or URL)
|
||||
- Call `lx.extract()` with configurable parameters (model_id, extraction_passes, max_workers, max_char_buffer)
|
||||
- Return structured results with source grounding (extraction_class, extraction_text, attributes, char offsets)
|
||||
- Handle errors gracefully (model timeout, rate limit, invalid input)
|
||||
- [ ] **1.2** Implement model provider configuration:
|
||||
- Gemini (default): API key from env
|
||||
- Azure OpenAI: endpoint + key from env
|
||||
- Ollama (local dev): configurable base URL
|
||||
- [ ] **1.3** Add request/response logging via `structlog` (never `print()`)
|
||||
- [ ] **1.4** Add request timeout configuration (default 120s for long documents)
|
||||
- [x] **1.1** Implement `extractor.py` — LangExtract wrapper with mock fallback, configurable model_id, extraction_passes, max_workers, max_char_buffer [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
|
||||
- [x] **1.2** Model provider configuration — Gemini default via DEFAULT_MODEL_ID env var, model_id passthrough to lx.extract() [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
|
||||
- [x] **1.3** structlog logging in extractor.py and app.py (extraction_complete, extraction_failed, extract_request) [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
|
||||
- [x] **1.4** Request timeout in python-bridge.ts (DEFAULT_TIMEOUT_MS = 120s, configurable per-call) [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
|
||||
|
||||
### Fastify routes
|
||||
|
||||
- [ ] **1.5** Implement `src/modules/extract/types.ts`:
|
||||
- `ExtractRequestSchema` (Zod) — task definition + input text + options
|
||||
- `ExtractResponseSchema` (Zod) — array of extractions + metadata
|
||||
- `BatchExtractRequestSchema` — array of inputs + shared task
|
||||
- [ ] **1.6** Implement `src/modules/extract/routes.ts`:
|
||||
- `POST /api/extract` — auth required, validates input, proxies to Python sidecar
|
||||
- `POST /api/extract/batch` — auth required, accepts multiple inputs
|
||||
- `GET /api/extract/models` — list available model providers
|
||||
- [ ] **1.7** Implement `src/lib/python-bridge.ts`:
|
||||
- HTTP client to Python sidecar (fetch with timeout, retry, error mapping)
|
||||
- Health check polling on startup (wait for sidecar readiness)
|
||||
- Request ID forwarding (`x-request-id`)
|
||||
- [ ] **1.8** Add rate limiting to extraction endpoints (configurable per-user limit)
|
||||
- [x] **1.5** Implement `src/modules/extract/types.ts` — ExtractRequestSchema, ExtractResponseSchema, BatchExtractRequestSchema (Zod) [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
|
||||
- [x] **1.6** Implement `src/modules/extract/routes.ts` — POST /extract, POST /extract/batch, GET /extract/models, GET /extract/sidecar-health [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
|
||||
- [x] **1.7** Implement `src/lib/python-bridge.ts` — sidecarExtract, sidecarExtractBatch, sidecarHealth, waitForSidecar with x-request-id forwarding [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
|
||||
- [x] **1.8** Rate limiting on extract routes (30 req/min per IP via @fastify/rate-limit) [`0a87d19`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/0a87d19)
|
||||
|
||||
### Tests
|
||||
|
||||
- [ ] **1.9** Write unit tests for Zod schemas (`types.test.ts`)
|
||||
- [ ] **1.10** Write integration tests for extract routes (mock Python sidecar responses)
|
||||
- [ ] **1.11** Write Python unit tests for `extractor.py` (mock `lx.extract`)
|
||||
- [ ] **1.12** Verify: `pnpm test` passes, `pytest` passes
|
||||
- [x] **1.9** Unit tests for Zod schemas — 13 extract tests + 8 task tests (21 total) [`0a87d19`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/0a87d19)
|
||||
- [ ] **1.10** Integration tests for extract routes (mock Python sidecar responses) — deferred to Phase 3
|
||||
- [ ] **1.11** Python unit tests for `extractor.py` — deferred (requires pip install in CI)
|
||||
- [x] **1.12** Verify: `pnpm test` passes (21 tests) [`0a87d19`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/0a87d19)
|
||||
|
||||
---
|
||||
|
||||
@ -184,48 +167,31 @@ A shared extraction microservice that uses Google's LangExtract library to extra
|
||||
|
||||
### Task definitions
|
||||
|
||||
- [ ] **2.1** Define `transcript-extraction` task:
|
||||
- Classes: `action_item`, `decision`, `question`, `deadline`, `person`, `topic`
|
||||
- 3–5 few-shot examples from realistic meeting transcripts
|
||||
- Default model: `gemini-2.5-flash`
|
||||
- [ ] **2.2** Define `triage` task (MindLyst):
|
||||
- Classes: `topic`, `entity`, `action`, `emotion`, `date_reference`, `brain_signal`
|
||||
- brain_signal attributes: `{ brain: "work|home|money|health|global", confidence: float }`
|
||||
- 3–5 few-shot examples per brain type
|
||||
- [ ] **2.3** Define `memory-insight` task (MindLyst):
|
||||
- Classes: `pattern`, `recurring_theme`, `relationship`, `milestone`
|
||||
- Examples from accumulated brain memories
|
||||
- [ ] **2.4** Define `reflection-enrichment` task (MindLyst):
|
||||
- Classes: `emotional_state`, `accomplishment`, `concern`, `goal_progress`
|
||||
- Examples from journal-style text
|
||||
- [ ] **2.5** Define `bug-report-extraction` task (Tracker):
|
||||
- Classes: `steps_to_reproduce`, `expected_behavior`, `actual_behavior`, `affected_component`, `severity`
|
||||
- Examples from real issue submissions
|
||||
- [x] **2.1** Define `transcript-extraction` task (6 classes, few-shot examples) [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
|
||||
- [x] **2.2** Define `triage` task (MindLyst) — 6 classes incl. brain_signal with brain/confidence attributes [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
|
||||
- [x] **2.3** Define `memory-insight` task (MindLyst) — 4 classes [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
|
||||
- [x] **2.4** Define `reflection-enrichment` task (MindLyst) — 4 classes [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
|
||||
- [x] **2.5** Define `bug-report-extraction` task (Tracker) — 5 classes [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
|
||||
|
||||
### Task registry (Cosmos DB)
|
||||
|
||||
- [ ] **2.6** Create Cosmos container: `extraction_tasks` (partition key: `/productId`)
|
||||
- [ ] **2.7** Implement `src/modules/tasks/repository.ts` — CRUD for task definitions
|
||||
- [ ] **2.8** Implement `src/modules/tasks/routes.ts`:
|
||||
- `GET /api/tasks` — list all tasks (built-in + custom)
|
||||
- `GET /api/tasks/:id` — get task by ID
|
||||
- `POST /api/tasks` — create custom task (admin only)
|
||||
- `PUT /api/tasks/:id` — update task (admin only)
|
||||
- `DELETE /api/tasks/:id` — delete custom task (admin only)
|
||||
- [ ] **2.9** Seed built-in tasks on service startup (idempotent upsert)
|
||||
- [ ] **2.10** Add `productId` to all task documents
|
||||
- [x] **2.6** Cosmos container `extraction_tasks` (partition `/productId`) — created on first access via repository [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
|
||||
- [x] **2.7** Implement `src/modules/tasks/repository.ts` — listTasks, getTask, createTask, updateTask, deleteTask, upsertTask [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
|
||||
- [x] **2.8** Implement `src/modules/tasks/routes.ts` — GET/POST/PUT/DELETE /tasks [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
|
||||
- [x] **2.9** Seed built-in tasks on startup via `seed.ts` (idempotent upsert, 5 tasks) [`6a49823`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/6a49823)
|
||||
- [x] **2.10** `productId` on all task documents (DEFAULT_PRODUCT_ID from env) [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
|
||||
|
||||
### Python task registry
|
||||
|
||||
- [ ] **2.11** Implement `task_registry.py` — load task definitions from Cosmos (via Fastify API) or local JSON fallback
|
||||
- [ ] **2.12** Create `python/tasks/` directory with JSON files for each built-in task
|
||||
- [ ] **2.13** Add task validation: verify examples follow LangExtract best practices (ordered, verbatim, no overlap)
|
||||
- [x] **2.11** Implement `task_registry.py` — BUILTIN_TASKS with full definitions inline [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
|
||||
- [x] **2.12** Task definitions stored inline in `task_registry.py` (no separate JSON needed) [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
|
||||
- [ ] **2.13** Task validation: verify examples follow LangExtract best practices — deferred to Phase 5
|
||||
|
||||
### Tests
|
||||
|
||||
- [ ] **2.14** Write tests for task CRUD routes
|
||||
- [ ] **2.15** Write tests for task seeding logic
|
||||
- [ ] **2.16** Verify: all tests pass
|
||||
- [x] **2.14** Tests for task schemas (8 tests in types.test.ts) [`0a87d19`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/0a87d19)
|
||||
- [x] **2.15** Tests for task seeding (7 tests in seed.test.ts) [`6a49823`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/6a49823)
|
||||
- [x] **2.16** Verify: all 28 tests pass [`6a49823`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/6a49823)
|
||||
|
||||
---
|
||||
|
||||
@ -235,13 +201,9 @@ A shared extraction microservice that uses Google's LangExtract library to extra
|
||||
|
||||
### `@bytelyst/extraction` package finalization
|
||||
|
||||
- [ ] **3.1** Add typed methods to `createExtractionClient()`:
|
||||
- `extract(input, taskId, options?)` — single extraction
|
||||
- `extractBatch(inputs, taskId, options?)` — batch extraction
|
||||
- `listTasks()` — get available tasks
|
||||
- `getTask(id)` — get task details
|
||||
- [ ] **3.2** Export all types from `src/index.ts`
|
||||
- [ ] **3.3** Publish: `pnpm build` in `packages/extraction/`
|
||||
- [x] **3.1** `createExtractionClient()` with extract(), extractBatch(), listTasks(), getTask() [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
|
||||
- [x] **3.2** Export all types from `src/index.ts` [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
|
||||
- [x] **3.3** `pnpm build` passes for `@bytelyst/extraction` [`c292bb5`](https://github.com/saravanakumardb1/learning_ai_common_plat/commit/c292bb5)
|
||||
|
||||
### LysnrAI integration
|
||||
|
||||
|
||||
Loading…
Reference in New Issue
Block a user