# @bytelyst/llm-router

Pure-code LLM router for free-tier API providers. No LLM-in-the-loop: deterministic routing with automatic fallback, health tracking, and round-robin load distribution.

## Features

- **4 free providers** out of the box: Groq, OpenRouter, Together AI, Cerebras
- **Prompt classification**: regex-based detection of code/math/reasoning/creative prompts
- **Smart selection**: routes to the best model for each prompt category
- **Round-robin**: distributes load across providers to maximize free-tier usage
- **Auto-fallback**: retries on 429/5xx with next-best provider
- **Health tracking**: sliding-window stats (latency, error rate, rate-limit rate)
- **Telemetry hook**: log every routing decision for analysis
- **OpenAI-compatible**: same request/response format as OpenAI chat completions
- **Zero dependencies**: pure TypeScript, uses native `fetch`

## Quick Start

```bash
# Set at least one API key
export GROQ_API_KEY=gsk_...
export OPENROUTER_API_KEY=sk-or-...
export TOGETHER_API_KEY=...
export CEREBRAS_API_KEY=...
```

```typescript
import { LlmRouter } from '@bytelyst/llm-router';

const router = new LlmRouter();

// Automatic routing: the classifier picks the best provider+model
const result = await router.chat({
  messages: [{ role: 'user', content: 'Write a quicksort in TypeScript' }],
});

console.log(result.response.choices[0].message.content);
console.log(`Served by: ${result.provider}/${result.model} in ${result.totalLatencyMs}ms`);
```

## Explicit Provider Routing

```typescript
// Force a specific provider:model
const result = await router.chat({
  messages: [{ role: 'user', content: 'Hello' }],
  model: 'groq:llama-3.3-70b-versatile',
});
```

## Telemetry

```typescript
const router = new LlmRouter({
  onTelemetry: entry => {
    // entry: { event, provider, model, attempt, latencyMs, category, tokens?, error? }
    console.log(`[${entry.event}] ${entry.provider}/${entry.model} - ${entry.latencyMs}ms`);
  },
});
```

## Health Monitoring

```typescript
const snapshots = router.getHealth();
// Returns: HealthSnapshot[] with per-provider stats
// { provider, model, totalRequests, successes, rateLimits, errors, avgLatencyMs, p95LatencyMs, healthy }
```

## Configuration

```typescript
const router = new LlmRouter({
  // Override default providers
  providers: [...],

  // Health window (default: 60s)
  healthWindowMs: 120_000,

  // Error rate to mark unhealthy (default: 50%)
  errorThreshold: 0.4,

  // Rate-limit rate to mark unhealthy (default: 30%)
  rateLimitThreshold: 0.2,

  // Request timeout (default: 30s)
  timeoutMs: 15_000,

  // Max retry attempts (default: 3)
  maxRetries: 4,
});
```

## Provider Selection Logic

1. **Classify** prompt → code, math, reasoning, creative, or general
2. **Score** each available model based on category match, speed tier, context window, and model size
3. **Filter** unhealthy models (based on sliding-window error/rate-limit rates)
4. **Round-robin** across top-scoring providers to spread rate-limit load
5. **Fallback** on 429/5xx → exclude failed model, pick next best

## Default Provider Registry

| Provider       | Models                                   | Speed      | Strengths                |
| -------------- | ---------------------------------------- | ---------- | ------------------------ |
| **Groq**       | Llama 3.3 70B, Llama 3.1 8B, Gemma 2 9B  | ⚡ Fastest | General, reasoning, code |
| **OpenRouter** | DeepSeek R1, Llama 3.3 70B, Gemma 2 9B   | Medium     | Reasoning, code, math    |
| **Together**   | Llama 3.3 70B Turbo, DeepSeek R1 Distill | Medium     | General, reasoning, code |
| **Cerebras**   | Llama 3.3 70B                            | ⚡ Fastest | General, reasoning, code |

## Adding Custom Providers

Any OpenAI-compatible endpoint works:

```typescript
import { LlmRouter, DEFAULT_PROVIDERS } from '@bytelyst/llm-router';

const router = new LlmRouter({
  providers: [
    ...DEFAULT_PROVIDERS,
    {
      name: 'my-provider',
      baseUrl: 'https://my-api.example.com/v1',
      apiKeyEnv: 'MY_PROVIDER_KEY',
      rpmLimit: 60,
      tpmLimit: 100_000,
      models: [
        {
          id: 'my-model',
          label: 'My Model',
          contextWindow: 32_000,
          strengths: ['general', 'code'],
          speedTier: 2,
        },
      ],
    },
  ],
});
```
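
## Selection Logic Sketch

The five steps listed under Provider Selection Logic can be illustrated with a small standalone sketch. It is not the package's internal code. The `Candidate` type, the regex patterns, the scoring weights, the convention that `speedTier: 1` means fastest, and every model ID except the Groq one are assumptions made for illustration. Context window and model size are left out of the score to keep the example short.

```typescript
// Illustrative sketch only: these types and heuristics are assumptions,
// not the actual internals of @bytelyst/llm-router.
type Category = 'code' | 'math' | 'reasoning' | 'creative' | 'general';

interface Candidate {
  provider: string;
  model: string;
  strengths: Category[];
  speedTier: number; // assumed convention: 1 = fastest
  healthy: boolean; // in the real router this would come from health stats
}

// Step 1: classify the prompt with ordered regexes (patterns are made up).
const PATTERNS: Array<[Category, RegExp]> = [
  ['code', /\b(function|typescript|python|bug|compile|quicksort)\b/i],
  ['math', /\b(integral|derivative|equation|prove|solve)\b/i],
  ['reasoning', /\b(why|explain|compare|step by step)\b/i],
  ['creative', /\b(poem|story|lyrics|haiku)\b/i],
];

function classify(prompt: string): Category {
  for (const [category, pattern] of PATTERNS) {
    if (pattern.test(prompt)) return category;
  }
  return 'general';
}

// Step 2: score a candidate (hypothetical weights).
function score(c: Candidate, category: Category): number {
  const categoryMatch = c.strengths.includes(category) ? 10 : 0;
  const speedBonus = Math.max(0, 4 - c.speedTier);
  return categoryMatch + speedBonus;
}

// Steps 3-5: drop unhealthy or excluded models, then rotate among the
// top scorers. Passing a failed model in `excluded` models the fallback.
function createSelector(candidates: Candidate[]) {
  let cursor = 0; // round-robin position, private to this selector
  return (
    prompt: string,
    excluded: ReadonlySet<string> = new Set<string>(),
  ): Candidate | undefined => {
    const category = classify(prompt);
    const eligible = candidates.filter(
      c => c.healthy && !excluded.has(`${c.provider}:${c.model}`),
    );
    if (eligible.length === 0) return undefined;
    const best = Math.max(...eligible.map(c => score(c, category)));
    const top = eligible.filter(c => score(c, category) === best);
    return top[cursor++ % top.length];
  };
}

// Usage: model IDs other than the Groq one are placeholders.
const select = createSelector([
  { provider: 'groq', model: 'llama-3.3-70b-versatile', strengths: ['general', 'reasoning', 'code'], speedTier: 1, healthy: true },
  { provider: 'cerebras', model: 'placeholder-70b', strengths: ['general', 'reasoning', 'code'], speedTier: 1, healthy: true },
  { provider: 'openrouter', model: 'placeholder-r1', strengths: ['reasoning', 'code', 'math'], speedTier: 2, healthy: true },
]);

const prompt = 'Write a quicksort in TypeScript';
const first = select(prompt);
const second = select(prompt);
// Simulate a 429 from the first pick: exclude it and select again.
const retry = first
  ? select(prompt, new Set([`${first.provider}:${first.model}`]))
  : undefined;
console.log(first?.provider, second?.provider, retry?.provider);
// prints: groq cerebras cerebras
```

Keeping the round-robin cursor inside the closure returned by `createSelector` means each selector instance rotates independently, which also makes the behavior easy to test in isolation.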