# @bytelyst/llm-router

Pure-code LLM router for free-tier API providers. There is no LLM in the loop: routing is deterministic, with automatic fallback, health tracking, and round-robin load distribution.

## Features

- **4 free providers** out of the box: Groq, OpenRouter, Together AI, Cerebras
- **Prompt classification**: regex-based detection of code, math, reasoning, and creative prompts, with general as the remaining category
- **Smart selection**: routes each prompt to the highest-scoring model for its category
- **Round-robin**: distributes load across providers to maximize free-tier usage
- **Auto-fallback**: retries on 429/5xx with the next-best provider
- **Health tracking**: sliding-window stats (latency, error rate, rate-limit rate)
- **Telemetry hook**: log every routing decision for analysis
- **OpenAI-compatible**: same request/response format as OpenAI chat completions
- **Zero runtime dependencies**: pure TypeScript, uses native `fetch`

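To make the prompt classification feature concrete, here is a minimal sketch of an ordered regex classifier. The category names come from this README; `classifyPrompt`, `RULES`, and every pattern below are illustrative assumptions, not the package's actual rules.

```typescript
// Hypothetical sketch: the patterns are invented for illustration only.
type PromptCategory = 'code' | 'math' | 'reasoning' | 'creative' | 'general';

// Rules are checked in order; the first matching pattern wins.
const RULES: Array<[PromptCategory, RegExp]> = [
  ['code', /\b(function|typescript|javascript|python|compile|refactor|debug|quicksort)\b/i],
  ['math', /\b(solve|integral|derivative|equation|probability)\b|\d\s*[-+*\/^=]\s*\d/i],
  ['reasoning', /\b(why|explain|compare|step by step|prove)\b/i],
  ['creative', /\b(poem|story|lyrics|haiku|slogan)\b/i],
];

function classifyPrompt(prompt: string): PromptCategory {
  for (const [category, pattern] of RULES) {
    // No `g` flag is used, so `test` carries no state between calls.
    if (pattern.test(prompt)) return category;
  }
  // Nothing matched: fall back to the catch-all category.
  return 'general';
}
```

Because the rules are ordered, a prompt that matches several categories lands in the earliest one, which keeps the result deterministic.
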
## Quick Start

```bash
# Set at least one API key
export GROQ_API_KEY=gsk_...
export OPENROUTER_API_KEY=sk-or-...
export TOGETHER_API_KEY=...
export CEREBRAS_API_KEY=...
```

```typescript
import { LlmRouter } from '@bytelyst/llm-router';

const router = new LlmRouter();

// Automatic routing: the classifier picks the provider and model
const result = await router.chat({
  messages: [{ role: 'user', content: 'Write a quicksort in TypeScript' }],
});

console.log(result.response.choices[0].message.content);
console.log(`Served by: ${result.provider}/${result.model} in ${result.totalLatencyMs}ms`);
```

## Explicit Provider Routing

```typescript
// Force a specific provider:model
const result = await router.chat({
  messages: [{ role: 'user', content: 'Hello' }],
  model: 'groq:llama-3.3-70b-versatile',
});
```

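The `provider:model` string above can be split as in the sketch below. This README does not document how the router parses it, so `parseModelRef` and `ModelRef` are hypothetical names, and splitting on the first colon only is an assumption made so that model ids containing a colon survive intact.

```typescript
// Hypothetical helper, not part of the package's documented API.
interface ModelRef {
  provider: string;
  model: string;
}

function parseModelRef(value: string): ModelRef | null {
  // Split on the first colon only (assumption: model ids may contain colons).
  const idx = value.indexOf(':');
  // Reject strings with no colon, an empty provider, or an empty model.
  if (idx <= 0 || idx === value.length - 1) return null;
  return { provider: value.slice(0, idx), model: value.slice(idx + 1) };
}
```
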
## Telemetry

```typescript
const router = new LlmRouter({
  onTelemetry: entry => {
    // entry: { event, provider, model, attempt, latencyMs, category, tokens?, error? }
    console.log(`[${entry.event}] ${entry.provider}/${entry.model}: ${entry.latencyMs}ms`);
  },
});
```

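As one way to use the hook for analysis, the sketch below aggregates entries per provider. The field names follow the comment in the example above; their types, and the names `TelemetryEntry`, `ProviderSummary`, and `createTelemetrySummary`, are assumptions.

```typescript
// Assumed entry shape: field names come from the README comment, types are guesses.
interface TelemetryEntry {
  event: string;
  provider: string;
  model: string;
  attempt: number;
  latencyMs: number;
  category: string;
  tokens?: unknown;
  error?: unknown;
}

interface ProviderSummary {
  calls: number;
  errors: number;
  totalLatencyMs: number;
}

function createTelemetrySummary() {
  const byProvider = new Map<string, ProviderSummary>();
  return {
    // Accumulate one entry into the per-provider totals.
    record(entry: TelemetryEntry): void {
      const s = byProvider.get(entry.provider) ?? { calls: 0, errors: 0, totalLatencyMs: 0 };
      s.calls += 1;
      if (entry.error !== undefined) s.errors += 1;
      s.totalLatencyMs += entry.latencyMs;
      byProvider.set(entry.provider, s);
    },
    // Mean latency over all recorded entries for a provider (0 if none).
    averageLatencyMs(provider: string): number {
      const s = byProvider.get(provider);
      return s && s.calls > 0 ? s.totalLatencyMs / s.calls : 0;
    },
    // Copy of the totals, keyed by provider name.
    snapshot(): Record<string, ProviderSummary> {
      const out: Record<string, ProviderSummary> = {};
      byProvider.forEach((v, k) => {
        out[k] = { ...v };
      });
      return out;
    },
  };
}
```

If the assumed entry shape matches, the collector could be wired in with `new LlmRouter({ onTelemetry: summary.record })`, where `summary` is the object returned by `createTelemetrySummary()`.
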
## Health Monitoring

```typescript
const snapshots = router.getHealth();
// Returns: HealthSnapshot[] with per-provider stats
// { provider, model, totalRequests, successes, rateLimits, errors, avgLatencyMs, p95LatencyMs, healthy }
```

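The snapshot fields above could be derived from a sliding window roughly as sketched below. `SlidingWindowHealth` is a hypothetical class; its defaults copy the documented ones (60s window, 50% error rate, 30% rate-limit rate), but treating a rate equal to the threshold as unhealthy, the nearest-rank p95, and reporting an empty window as healthy are all assumptions.

```typescript
// Hypothetical sketch of a sliding-window tracker for a single provider/model pair.
type Outcome = 'success' | 'rate_limit' | 'error';

interface Sample {
  at: number; // timestamp in ms
  outcome: Outcome;
  latencyMs: number;
}

class SlidingWindowHealth {
  private samples: Sample[] = [];
  private readonly windowMs: number;
  private readonly errorThreshold: number;
  private readonly rateLimitThreshold: number;

  constructor(windowMs = 60_000, errorThreshold = 0.5, rateLimitThreshold = 0.3) {
    this.windowMs = windowMs;
    this.errorThreshold = errorThreshold;
    this.rateLimitThreshold = rateLimitThreshold;
  }

  record(outcome: Outcome, latencyMs: number, now: number = Date.now()): void {
    this.samples.push({ at: now, outcome, latencyMs });
  }

  snapshot(now: number = Date.now()) {
    // Drop samples that have aged out of the window.
    this.samples = this.samples.filter(s => now - s.at <= this.windowMs);
    const total = this.samples.length;
    const errors = this.samples.filter(s => s.outcome === 'error').length;
    const rateLimits = this.samples.filter(s => s.outcome === 'rate_limit').length;
    const latencies = this.samples.map(s => s.latencyMs).sort((a, b) => a - b);
    const avgLatencyMs = total > 0 ? latencies.reduce((a, b) => a + b, 0) / total : 0;
    // Nearest-rank 95th percentile over the sorted latencies.
    const p95LatencyMs = total > 0 ? latencies[Math.ceil(0.95 * total) - 1] : 0;
    // Assumption: an empty window counts as healthy; a rate at the threshold does not.
    const healthy =
      total === 0 ||
      (errors / total < this.errorThreshold && rateLimits / total < this.rateLimitThreshold);
    return {
      totalRequests: total,
      successes: total - errors - rateLimits,
      rateLimits,
      errors,
      avgLatencyMs,
      p95LatencyMs,
      healthy,
    };
  }
}
```

Passing `now` explicitly keeps the sketch testable without real clocks; a production tracker would more likely rely on `Date.now()` alone.
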
## Configuration

```typescript
const router = new LlmRouter({
  // Override default providers (placeholder; see Adding Custom Providers)
  providers: [/* ... */],
  // Health window (default: 60s)
  healthWindowMs: 120_000,
  // Error rate to mark unhealthy (default: 50%)
  errorThreshold: 0.4,
  // Rate-limit rate to mark unhealthy (default: 30%)
  rateLimitThreshold: 0.2,
  // Request timeout (default: 30s)
  timeoutMs: 15_000,
  // Max retry attempts (default: 3)
  maxRetries: 4,
});
```

## Provider Selection Logic

1. **Classify** prompt → code, math, reasoning, creative, or general
2. **Score** each available model based on category match, speed tier, context window, and model size
3. **Filter** unhealthy models (based on sliding-window error/rate-limit rates)
4. **Round-robin** across top-scoring providers to spread rate-limit load
5. **Fallback** on 429/5xx → exclude failed model, pick next best

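Steps 2 to 5 above can be sketched as follows. This is a simplified illustration under stated assumptions: `Candidate`, `score`, `Selector`, and `routeWithFallback` are hypothetical names, the scoring weights are invented, model size is left out of the score, and a lower `speedTier` is assumed to mean a faster model.

```typescript
// Hypothetical sketch of score -> filter -> round-robin -> fallback.
type Category = 'code' | 'math' | 'reasoning' | 'creative' | 'general';

interface Candidate {
  provider: string;
  model: string;
  strengths: Category[];
  speedTier: number; // assumption: lower is faster
  contextWindow: number;
  healthy: boolean;
}

// Invented weights: category match dominates, then speed, then context size.
function score(c: Candidate, category: Category): number {
  const categoryMatch = c.strengths.includes(category) ? 10 : 0;
  const speed = Math.max(0, 4 - c.speedTier);
  const context = c.contextWindow >= 32_000 ? 1 : 0;
  return categoryMatch + speed + context;
}

const keyOf = (c: Candidate): string => `${c.provider}:${c.model}`;

class Selector {
  private cursor = 0; // round-robin state kept on the instance

  pick(candidates: Candidate[], category: Category, excluded: Set<string> = new Set<string>()): Candidate | undefined {
    // Filter out unhealthy models and ones that already failed this request.
    const usable = candidates.filter(c => c.healthy && !excluded.has(keyOf(c)));
    if (usable.length === 0) return undefined;
    const best = Math.max(...usable.map(c => score(c, category)));
    const top = usable.filter(c => score(c, category) === best);
    // Rotate through the tied top scorers to spread load.
    const choice = top[this.cursor % top.length];
    this.cursor += 1;
    return choice;
  }
}

// `send` is a stand-in for the HTTP call; it resolves to a status code.
async function routeWithFallback(
  selector: Selector,
  candidates: Candidate[],
  category: Category,
  send: (c: Candidate) => Promise<number>,
  maxAttempts = 3,
): Promise<Candidate | undefined> {
  const excluded = new Set<string>();
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const candidate = selector.pick(candidates, category, excluded);
    if (!candidate) return undefined;
    const status = await send(candidate);
    // Only 429 and 5xx trigger a fallback; any other status ends the loop.
    if (status !== 429 && status < 500) return candidate;
    excluded.add(keyOf(candidate));
  }
  return undefined;
}
```

Keeping the cursor on the `Selector` instance rather than in a module-level variable means two routers in the same process do not interfere with each other's rotation.
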
## Default Provider Registry

| Provider       | Models                                   | Speed      | Strengths                |
| -------------- | ---------------------------------------- | ---------- | ------------------------ |
| **Groq**       | Llama 3.3 70B, Llama 3.1 8B, Gemma 2 9B  | ⚡ Fastest | General, reasoning, code |
| **OpenRouter** | DeepSeek R1, Llama 3.3 70B, Gemma 2 9B   | Medium     | Reasoning, code, math    |
| **Together**   | Llama 3.3 70B Turbo, DeepSeek R1 Distill | Medium     | General, reasoning, code |
| **Cerebras**   | Llama 3.3 70B                            | ⚡ Fastest | General, reasoning, code |

## Adding Custom Providers

Any OpenAI-compatible endpoint works:

```typescript
import { LlmRouter, DEFAULT_PROVIDERS } from '@bytelyst/llm-router';

const router = new LlmRouter({
  providers: [
    ...DEFAULT_PROVIDERS,
    {
      name: 'my-provider',
      baseUrl: 'https://my-api.example.com/v1',
      apiKeyEnv: 'MY_PROVIDER_KEY',
      rpmLimit: 60,
      tpmLimit: 100_000,
      models: [
        {
          id: 'my-model',
          label: 'My Model',
          contextWindow: 32_000,
          strengths: ['general', 'code'],
          speedTier: 2,
        },
      ],
    },
  ],
});
```