learning_ai_common_plat/services/extraction-service
saravanakumardb1 031e910607 fix(extraction-service): review fixes — locale mapping, model passthrough, content-type validation
BUG 1: Azure locale derivation produced 'en-EN' (invalid) for 2-letter codes.
  → Added toAzureLocale() with 28-language mapping table (en→en-US, pt→pt-BR, etc.)
  → Exported for testing; unmapped languages fall back to code-CODE (e.g. xx→xx-XX).
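A minimal TypeScript sketch of the mapping described in BUG 1. Only the en→en-US and pt→pt-BR rows and the code-CODE fallback come from this commit message. The remaining table rows, the Map-based lookup, and the rule that any input already containing a hyphen passes through unchanged are assumptions for illustration, not the code in src.

```typescript
// Illustrative subset of the locale table. The commit describes 28 languages;
// only "en" and "pt" are confirmed by the commit message, the rest are assumed.
const AZURE_LOCALE_MAP = new Map<string, string>([
  ["en", "en-US"],
  ["pt", "pt-BR"],
  ["ja", "ja-JP"], // assumed row
  ["ko", "ko-KR"], // assumed row
  ["zh", "zh-CN"], // assumed row
]);

function toAzureLocale(language: string): string {
  const trimmed = language.trim();

  // Passthrough (assumed rule): a value that already looks like a full
  // locale, such as "en-GB", is returned unchanged.
  if (trimmed.includes("-")) {
    return trimmed;
  }

  // Case-insensitive lookup of the bare language code.
  const code = trimmed.toLowerCase();

  // Fallback for unmapped codes: code-CODE, e.g. "xx" -> "xx-XX".
  return AZURE_LOCALE_MAP.get(code) ?? `${code}-${code.toUpperCase()}`;
}
```

The table exists because the naive code-CODE derivation yields values such as en-EN, which is the bug this commit fixes; the fallback is kept only for languages where no explicit row is present.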

BUG 2: model field from request schema was silently dropped after provider refactor.
  → Added optional model field to TranscriptionInput interface.
  → OpenAI provider now uses input.model override (falls back to config.model).
  → Route passes model through to provider.transcribe().
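The override in BUG 2 could look roughly like the sketch below. The TranscriptionInput name, its optional model field, and the fallback to config.model come from this commit message; the other fields, the ProviderConfig shape, the resolveModel helper, and the use of `??` rather than `||` are hypothetical.

```typescript
// Only the optional `model` field is confirmed by the commit message.
interface TranscriptionInput {
  audio: Uint8Array; // hypothetical field
  language?: string; // hypothetical field
  model?: string;    // per-request override added by this commit
}

// Hypothetical provider configuration holding the default model.
interface ProviderConfig {
  model: string;
}

// Hypothetical helper: prefer the per-request model, otherwise use the
// configured default. With `??`, only undefined or null triggers the fallback.
function resolveModel(input: TranscriptionInput, config: ProviderConfig): string {
  return input.model ?? config.model;
}

// Example values only; the service's real defaults are not stated in the commit.
const config: ProviderConfig = { model: "whisper-1" };
const withOverride = resolveModel(
  { audio: new Uint8Array(), model: "gpt-4o-transcribe" },
  config,
);
const withDefault = resolveModel({ audio: new Uint8Array() }, config);
```

Here withOverride resolves to the request's model and withDefault resolves to the configured one, which is the behavior the provider refactor had dropped.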

GAP 4: SUPPORTED_AUDIO_TYPES was defined but never validated against.
  → Route now rejects unsupported content-types with a clear error message.
  → Allows application/octet-stream (Azure Blob SAS URLs often return this).
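A sketch of the check described in GAP 4, under stated assumptions. SUPPORTED_AUDIO_TYPES and the application/octet-stream allowance are named in this commit message; the set's contents, the checkAudioContentType helper, the stripping of header parameters, the rejection of a missing header, and the error wording are all hypothetical.

```typescript
// Assumed contents; the commit names the constant but does not list its values.
const SUPPORTED_AUDIO_TYPES: ReadonlySet<string> = new Set([
  "audio/wav",
  "audio/mpeg",
  "audio/mp4",
  "audio/webm",
  "audio/ogg",
  "audio/flac",
]);

// Allowed because Azure Blob SAS URLs often return this generic type.
const GENERIC_BINARY = "application/octet-stream";

type ContentTypeCheck =
  | { ok: true; mediaType: string }
  | { ok: false; error: string };

// Hypothetical helper: validates a Content-Type header value.
function checkAudioContentType(header: string | null | undefined): ContentTypeCheck {
  const raw = header ?? "";

  // Drop parameters such as "; samplerate=16000" and normalize case.
  const semicolon = raw.indexOf(";");
  const mediaType = (semicolon === -1 ? raw : raw.slice(0, semicolon))
    .trim()
    .toLowerCase();

  if (mediaType === GENERIC_BINARY || SUPPORTED_AUDIO_TYPES.has(mediaType)) {
    return { ok: true, mediaType };
  }

  const allowed = Array.from(SUPPORTED_AUDIO_TYPES).concat(GENERIC_BINARY).join(", ");
  return {
    ok: false,
    error: `Unsupported content-type "${mediaType || "(missing)"}". Expected one of: ${allowed}`,
  };
}
```

A route handler would call this before handing the audio to the provider and return the error string to the client when ok is false.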

GAP 5: Client JSDoc still said 'via OpenAI Whisper API' — now 'via configured STT provider'.

GAP 8: Azure WAV content-type hardcoded samplerate=16000 — now generic audio/wav.

Tests: 42 transcription tests (was 35), 178 total passing.
  → toAzureLocale: 4 tests (locale mapping, passthrough, fallback, case-insensitive)
  → setSTT: 1 test (singleton override)
  → model passthrough: 2 tests (mock ignores, input accepts)
2026-04-06 11:40:27 -07:00
evals docs(local-llm): update original setup doc to redirect to docs/ structure 2026-02-19 13:01:35 -08:00
python fix(devops): harden local shared-service docker bring-up 2026-03-22 12:34:38 -07:00
src fix(extraction-service): review fixes — locale mapping, model passthrough, content-type validation 2026-04-06 11:40:27 -07:00
.env.example feat(extraction): scaffold extraction-service + @bytelyst/extraction package 2026-02-14 13:31:40 -08:00
Dockerfile feat(extraction): back product rate limits with valkey 2026-03-31 08:08:53 +00:00
package.json feat(extraction): back product rate limits with valkey 2026-03-31 08:08:53 +00:00
supervisord.conf fix(extraction-service): run python sidecar on railway 2026-02-17 11:32:40 -08:00
tsconfig.json feat(extraction): scaffold extraction-service + @bytelyst/extraction package 2026-02-14 13:31:40 -08:00