feat(llm-router): add @bytelyst/llm-router — pure-code LLM router for free-tier providers
- 4 providers: Groq, OpenRouter, Together AI, Cerebras - Regex-based prompt classifier (code/math/reasoning/creative/general) - Instance-level round-robin state (no shared module globals) - Sliding-window health tracker (latency, error rate, rate-limit rate) - Auto-fallback on 429/5xx with per-attempt latency tracking - Telemetry hook for all routing decisions (auto + explicit) - OpenRouter recommended headers (HTTP-Referer, X-Title) - 47 tests across 5 test files, zero runtime deps
This commit is contained in:
parent
ae13abfab2
commit
b1b3fe42df
134
packages/llm-router/README.md
Normal file
134
packages/llm-router/README.md
Normal file
@ -0,0 +1,134 @@
|
|||||||
|
# @bytelyst/llm-router
|
||||||
|
|
||||||
|
Pure-code LLM router for free-tier API providers. No LLM-in-the-loop — deterministic routing with automatic fallback, health tracking, and round-robin load distribution.
|
||||||
|
|
||||||
|
## Features
|
||||||
|
|
||||||
|
- **4 free providers** out of the box: Groq, OpenRouter, Together AI, Cerebras
|
||||||
|
- **Prompt classification** — regex-based detection of code/math/reasoning/creative prompts
|
||||||
|
- **Smart selection** — routes to the best model for each prompt category
|
||||||
|
- **Round-robin** — distributes load across providers to maximize free-tier usage
|
||||||
|
- **Auto-fallback** — retries on 429/5xx with next-best provider
|
||||||
|
- **Health tracking** — sliding-window stats (latency, error rate, rate-limit rate)
|
||||||
|
- **Telemetry hook** — log every routing decision for analysis
|
||||||
|
- **OpenAI-compatible** — same request/response format as OpenAI chat completions
|
||||||
|
- **Zero dependencies** — pure TypeScript, uses native `fetch`
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Set at least one API key
|
||||||
|
export GROQ_API_KEY=gsk_...
|
||||||
|
export OPENROUTER_API_KEY=sk-or-...
|
||||||
|
export TOGETHER_API_KEY=...
|
||||||
|
export CEREBRAS_API_KEY=...
|
||||||
|
```
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
import { LlmRouter } from '@bytelyst/llm-router';
|
||||||
|
|
||||||
|
const router = new LlmRouter();
|
||||||
|
|
||||||
|
// Automatic routing — classifier picks best provider+model
|
||||||
|
const result = await router.chat({
|
||||||
|
messages: [{ role: 'user', content: 'Write a quicksort in TypeScript' }],
|
||||||
|
});
|
||||||
|
|
||||||
|
console.log(result.response.choices[0].message.content);
|
||||||
|
console.log(`Served by: ${result.provider}/${result.model} in ${result.totalLatencyMs}ms`);
|
||||||
|
```
|
||||||
|
|
||||||
|
## Explicit Provider Routing
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
// Force a specific provider:model
|
||||||
|
const result = await router.chat({
|
||||||
|
messages: [{ role: 'user', content: 'Hello' }],
|
||||||
|
model: 'groq:llama-3.3-70b-versatile',
|
||||||
|
});
|
||||||
|
```
|
||||||
|
|
||||||
|
## Telemetry
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
const router = new LlmRouter({
|
||||||
|
onTelemetry: entry => {
|
||||||
|
// entry: { event, provider, model, attempt, latencyMs, category, tokens?, error? }
|
||||||
|
console.log(`[${entry.event}] ${entry.provider}/${entry.model} — ${entry.latencyMs}ms`);
|
||||||
|
},
|
||||||
|
});
|
||||||
|
```
|
||||||
|
|
||||||
|
## Health Monitoring
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
const snapshots = router.getHealth();
|
||||||
|
// Returns: HealthSnapshot[] with per-provider stats
|
||||||
|
// { provider, model, totalRequests, successes, rateLimits, errors, avgLatencyMs, p95LatencyMs, healthy }
|
||||||
|
```
|
||||||
|
|
||||||
|
## Configuration
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
const router = new LlmRouter({
|
||||||
|
// Override default providers
|
||||||
|
providers: [...],
|
||||||
|
// Health window (default: 60s)
|
||||||
|
healthWindowMs: 120_000,
|
||||||
|
// Error rate to mark unhealthy (default: 50%)
|
||||||
|
errorThreshold: 0.4,
|
||||||
|
// Rate-limit rate to mark unhealthy (default: 30%)
|
||||||
|
rateLimitThreshold: 0.2,
|
||||||
|
// Request timeout (default: 30s)
|
||||||
|
timeoutMs: 15_000,
|
||||||
|
// Max retry attempts (default: 3)
|
||||||
|
maxRetries: 4,
|
||||||
|
});
|
||||||
|
```
|
||||||
|
|
||||||
|
## Provider Selection Logic
|
||||||
|
|
||||||
|
1. **Classify** prompt → code, math, reasoning, creative, or general
|
||||||
|
2. **Score** each available model based on category match, speed tier, context window, and model size
|
||||||
|
3. **Filter** unhealthy models (based on sliding-window error/rate-limit rates)
|
||||||
|
4. **Round-robin** across top-scoring providers to spread rate-limit load
|
||||||
|
5. **Fallback** on 429/5xx → exclude failed model, pick next best
|
||||||
|
|
||||||
|
## Default Provider Registry
|
||||||
|
|
||||||
|
| Provider | Models | Speed | Strengths |
|
||||||
|
| -------------- | ---------------------------------------- | ---------- | ------------------------ |
|
||||||
|
| **Groq** | Llama 3.3 70B, Llama 3.1 8B, Gemma 2 9B | ⚡ Fastest | General, reasoning, code |
|
||||||
|
| **OpenRouter** | DeepSeek R1, Llama 3.3 70B, Gemma 2 9B | Medium | Reasoning, code, math |
|
||||||
|
| **Together** | Llama 3.3 70B Turbo, DeepSeek R1 Distill | Medium | General, reasoning, code |
|
||||||
|
| **Cerebras** | Llama 3.3 70B | ⚡ Fastest | General, reasoning, code |
|
||||||
|
|
||||||
|
## Adding Custom Providers
|
||||||
|
|
||||||
|
Any OpenAI-compatible endpoint works:
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
import { LlmRouter, DEFAULT_PROVIDERS } from '@bytelyst/llm-router';
|
||||||
|
|
||||||
|
const router = new LlmRouter({
|
||||||
|
providers: [
|
||||||
|
...DEFAULT_PROVIDERS,
|
||||||
|
{
|
||||||
|
name: 'my-provider',
|
||||||
|
baseUrl: 'https://my-api.example.com/v1',
|
||||||
|
apiKeyEnv: 'MY_PROVIDER_KEY',
|
||||||
|
rpmLimit: 60,
|
||||||
|
tpmLimit: 100_000,
|
||||||
|
models: [
|
||||||
|
{
|
||||||
|
id: 'my-model',
|
||||||
|
label: 'My Model',
|
||||||
|
contextWindow: 32_000,
|
||||||
|
strengths: ['general', 'code'],
|
||||||
|
speedTier: 2,
|
||||||
|
},
|
||||||
|
],
|
||||||
|
},
|
||||||
|
],
|
||||||
|
});
|
||||||
|
```
|
||||||
26
packages/llm-router/package.json
Normal file
26
packages/llm-router/package.json
Normal file
@ -0,0 +1,26 @@
|
|||||||
|
{
|
||||||
|
"name": "@bytelyst/llm-router",
|
||||||
|
"version": "0.1.0",
|
||||||
|
"description": "Pure-code LLM router for free-tier API providers with round-robin, fallback, and health tracking",
|
||||||
|
"type": "module",
|
||||||
|
"exports": {
|
||||||
|
".": {
|
||||||
|
"import": "./dist/index.js",
|
||||||
|
"types": "./dist/index.d.ts"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"main": "./dist/index.js",
|
||||||
|
"types": "./dist/index.d.ts",
|
||||||
|
"files": [
|
||||||
|
"dist"
|
||||||
|
],
|
||||||
|
"scripts": {
|
||||||
|
"build": "tsc",
|
||||||
|
"test": "vitest run",
|
||||||
|
"typecheck": "tsc --noEmit"
|
||||||
|
},
|
||||||
|
"devDependencies": {
|
||||||
|
"vitest": "^3.0.0",
|
||||||
|
"typescript": "^5.7.0"
|
||||||
|
}
|
||||||
|
}
|
||||||
73
packages/llm-router/src/__tests__/classifier.test.ts
Normal file
73
packages/llm-router/src/__tests__/classifier.test.ts
Normal file
@ -0,0 +1,73 @@
|
|||||||
|
import { describe, it, expect } from 'vitest';
|
||||||
|
import { classifyPrompt } from '../classifier.js';
|
||||||
|
|
||||||
|
describe('classifyPrompt', () => {
|
||||||
|
it('classifies code prompts', () => {
|
||||||
|
const result = classifyPrompt([
|
||||||
|
{ role: 'user', content: 'Write a typescript function to sort an array' },
|
||||||
|
]);
|
||||||
|
expect(result.category).toBe('code');
|
||||||
|
expect(result.estimatedTokens).toBeGreaterThan(0);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('classifies code with keywords like refactor and debug', () => {
|
||||||
|
const result = classifyPrompt([
|
||||||
|
{ role: 'user', content: 'Debug this error in my React component and refactor the handler' },
|
||||||
|
]);
|
||||||
|
expect(result.category).toBe('code');
|
||||||
|
});
|
||||||
|
|
||||||
|
it('classifies math prompts', () => {
|
||||||
|
const result = classifyPrompt([
|
||||||
|
{ role: 'user', content: 'Calculate the integral of x^2 from 0 to 5' },
|
||||||
|
]);
|
||||||
|
expect(result.category).toBe('math');
|
||||||
|
});
|
||||||
|
|
||||||
|
it('classifies reasoning prompts', () => {
|
||||||
|
const result = classifyPrompt([
|
||||||
|
{
|
||||||
|
role: 'user',
|
||||||
|
content:
|
||||||
|
'Explain step by step why this approach has trade-offs and analyze the implications',
|
||||||
|
},
|
||||||
|
]);
|
||||||
|
expect(result.category).toBe('reasoning');
|
||||||
|
});
|
||||||
|
|
||||||
|
it('classifies creative prompts', () => {
|
||||||
|
const result = classifyPrompt([
|
||||||
|
{ role: 'user', content: 'Write a short story about a robot who learns to paint' },
|
||||||
|
]);
|
||||||
|
expect(result.category).toBe('creative');
|
||||||
|
});
|
||||||
|
|
||||||
|
it('defaults to general for ambiguous prompts', () => {
|
||||||
|
const result = classifyPrompt([{ role: 'user', content: 'Hello, how are you?' }]);
|
||||||
|
expect(result.category).toBe('general');
|
||||||
|
});
|
||||||
|
|
||||||
|
it('estimates tokens roughly correctly', () => {
|
||||||
|
const text = 'a'.repeat(400); // ~100 tokens
|
||||||
|
const result = classifyPrompt([{ role: 'user', content: text }]);
|
||||||
|
expect(result.estimatedTokens).toBe(100);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('handles multi-message conversations', () => {
|
||||||
|
const result = classifyPrompt([
|
||||||
|
{ role: 'system', content: 'You are a coding assistant' },
|
||||||
|
{ role: 'user', content: 'Fix the bug in my python function' },
|
||||||
|
]);
|
||||||
|
expect(result.category).toBe('code');
|
||||||
|
});
|
||||||
|
|
||||||
|
it('detects code blocks in backticks', () => {
|
||||||
|
const result = classifyPrompt([
|
||||||
|
{
|
||||||
|
role: 'user',
|
||||||
|
content: 'What is wrong with this?\n```\nconst x = 1;\nconsole.log(x);\n```',
|
||||||
|
},
|
||||||
|
]);
|
||||||
|
expect(result.category).toBe('code');
|
||||||
|
});
|
||||||
|
});
|
||||||
121
packages/llm-router/src/__tests__/health.test.ts
Normal file
121
packages/llm-router/src/__tests__/health.test.ts
Normal file
@ -0,0 +1,121 @@
|
|||||||
|
import { describe, it, expect, beforeEach } from 'vitest';
|
||||||
|
import { HealthTracker } from '../health.js';
|
||||||
|
|
||||||
|
describe('HealthTracker', () => {
|
||||||
|
let tracker: HealthTracker;
|
||||||
|
|
||||||
|
beforeEach(() => {
|
||||||
|
tracker = new HealthTracker({ windowMs: 10_000, errorThreshold: 0.5, rateLimitThreshold: 0.3 });
|
||||||
|
});
|
||||||
|
|
||||||
|
it('reports healthy with no data', () => {
|
||||||
|
expect(tracker.isHealthy('groq', 'llama-3.3-70b')).toBe(true);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('reports healthy with all successes', () => {
|
||||||
|
for (let i = 0; i < 5; i++) {
|
||||||
|
tracker.record('groq', 'llama-3.3-70b', {
|
||||||
|
timestamp: Date.now(),
|
||||||
|
latencyMs: 200,
|
||||||
|
status: 'success',
|
||||||
|
});
|
||||||
|
}
|
||||||
|
expect(tracker.isHealthy('groq', 'llama-3.3-70b')).toBe(true);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('marks unhealthy when error rate exceeds threshold', () => {
|
||||||
|
for (let i = 0; i < 5; i++) {
|
||||||
|
tracker.record('groq', 'llama-3.3-70b', {
|
||||||
|
timestamp: Date.now(),
|
||||||
|
latencyMs: 200,
|
||||||
|
status: 'error',
|
||||||
|
});
|
||||||
|
}
|
||||||
|
expect(tracker.isHealthy('groq', 'llama-3.3-70b')).toBe(false);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('marks unhealthy when rate-limit rate exceeds threshold', () => {
|
||||||
|
// 2 successes + 3 rate limits = 60% rate limit rate > 30% threshold
|
||||||
|
for (let i = 0; i < 2; i++) {
|
||||||
|
tracker.record('openrouter', 'model-a', {
|
||||||
|
timestamp: Date.now(),
|
||||||
|
latencyMs: 100,
|
||||||
|
status: 'success',
|
||||||
|
});
|
||||||
|
}
|
||||||
|
for (let i = 0; i < 3; i++) {
|
||||||
|
tracker.record('openrouter', 'model-a', {
|
||||||
|
timestamp: Date.now(),
|
||||||
|
latencyMs: 50,
|
||||||
|
status: 'rate_limit',
|
||||||
|
});
|
||||||
|
}
|
||||||
|
expect(tracker.isHealthy('openrouter', 'model-a')).toBe(false);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('assumes healthy with fewer than 3 records', () => {
|
||||||
|
tracker.record('groq', 'llama-3.3-70b', {
|
||||||
|
timestamp: Date.now(),
|
||||||
|
latencyMs: 200,
|
||||||
|
status: 'error',
|
||||||
|
});
|
||||||
|
tracker.record('groq', 'llama-3.3-70b', {
|
||||||
|
timestamp: Date.now(),
|
||||||
|
latencyMs: 200,
|
||||||
|
status: 'error',
|
||||||
|
});
|
||||||
|
// Only 2 records — not enough data, should still be healthy
|
||||||
|
expect(tracker.isHealthy('groq', 'llama-3.3-70b')).toBe(true);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('computes avg and p95 latency', () => {
|
||||||
|
const latencies = [100, 200, 300, 400, 500];
|
||||||
|
for (const latencyMs of latencies) {
|
||||||
|
tracker.record('groq', 'model-a', {
|
||||||
|
timestamp: Date.now(),
|
||||||
|
latencyMs,
|
||||||
|
status: 'success',
|
||||||
|
});
|
||||||
|
}
|
||||||
|
const snap = tracker.snapshot('groq', 'model-a');
|
||||||
|
expect(snap.avgLatencyMs).toBe(300);
|
||||||
|
expect(snap.p95LatencyMs).toBe(500);
|
||||||
|
expect(snap.successes).toBe(5);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('tracks different providers independently', () => {
|
||||||
|
tracker.record('groq', 'model-a', {
|
||||||
|
timestamp: Date.now(),
|
||||||
|
latencyMs: 100,
|
||||||
|
status: 'success',
|
||||||
|
});
|
||||||
|
tracker.record('openrouter', 'model-b', {
|
||||||
|
timestamp: Date.now(),
|
||||||
|
latencyMs: 500,
|
||||||
|
status: 'error',
|
||||||
|
});
|
||||||
|
|
||||||
|
const snapA = tracker.snapshot('groq', 'model-a');
|
||||||
|
const snapB = tracker.snapshot('openrouter', 'model-b');
|
||||||
|
expect(snapA.successes).toBe(1);
|
||||||
|
expect(snapB.errors).toBe(1);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('returns all snapshots', () => {
|
||||||
|
tracker.record('groq', 'model-a', { timestamp: Date.now(), latencyMs: 100, status: 'success' });
|
||||||
|
tracker.record('together', 'model-b', {
|
||||||
|
timestamp: Date.now(),
|
||||||
|
latencyMs: 200,
|
||||||
|
status: 'success',
|
||||||
|
});
|
||||||
|
|
||||||
|
const all = tracker.allSnapshots();
|
||||||
|
expect(all).toHaveLength(2);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('resets all data', () => {
|
||||||
|
tracker.record('groq', 'model-a', { timestamp: Date.now(), latencyMs: 100, status: 'success' });
|
||||||
|
tracker.reset();
|
||||||
|
expect(tracker.allSnapshots()).toHaveLength(0);
|
||||||
|
});
|
||||||
|
});
|
||||||
83
packages/llm-router/src/__tests__/registry.test.ts
Normal file
83
packages/llm-router/src/__tests__/registry.test.ts
Normal file
@ -0,0 +1,83 @@
|
|||||||
|
import { describe, it, expect, beforeEach, afterEach } from 'vitest';
|
||||||
|
import { getAvailableProviders, DEFAULT_PROVIDERS } from '../registry.js';
|
||||||
|
import type { ProviderConfig } from '../types.js';
|
||||||
|
|
||||||
|
describe('getAvailableProviders', () => {
|
||||||
|
const saved: Record<string, string | undefined> = {};
|
||||||
|
|
||||||
|
beforeEach(() => {
|
||||||
|
// Save and clear all default provider env vars
|
||||||
|
for (const p of DEFAULT_PROVIDERS) {
|
||||||
|
saved[p.apiKeyEnv] = process.env[p.apiKeyEnv];
|
||||||
|
delete process.env[p.apiKeyEnv];
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
afterEach(() => {
|
||||||
|
// Restore original env
|
||||||
|
for (const [key, val] of Object.entries(saved)) {
|
||||||
|
if (val === undefined) {
|
||||||
|
delete process.env[key];
|
||||||
|
} else {
|
||||||
|
process.env[key] = val;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
it('returns empty array when no API keys are set', () => {
|
||||||
|
expect(getAvailableProviders()).toEqual([]);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('returns only providers with API keys set', () => {
|
||||||
|
process.env.GROQ_API_KEY = 'gsk_test';
|
||||||
|
const result = getAvailableProviders();
|
||||||
|
expect(result).toHaveLength(1);
|
||||||
|
expect(result[0]!.name).toBe('groq');
|
||||||
|
});
|
||||||
|
|
||||||
|
it('returns multiple providers when multiple keys are set', () => {
|
||||||
|
process.env.GROQ_API_KEY = 'gsk_test';
|
||||||
|
process.env.CEREBRAS_API_KEY = 'csk_test';
|
||||||
|
const result = getAvailableProviders();
|
||||||
|
expect(result).toHaveLength(2);
|
||||||
|
const names = result.map(p => p.name);
|
||||||
|
expect(names).toContain('groq');
|
||||||
|
expect(names).toContain('cerebras');
|
||||||
|
});
|
||||||
|
|
||||||
|
it('excludes providers with empty string API key', () => {
|
||||||
|
process.env.GROQ_API_KEY = '';
|
||||||
|
expect(getAvailableProviders()).toEqual([]);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('works with custom provider list', () => {
|
||||||
|
const custom: ProviderConfig[] = [
|
||||||
|
{
|
||||||
|
name: 'custom',
|
||||||
|
baseUrl: 'https://example.com/v1',
|
||||||
|
apiKeyEnv: 'CUSTOM_TEST_KEY',
|
||||||
|
rpmLimit: 10,
|
||||||
|
tpmLimit: 0,
|
||||||
|
models: [],
|
||||||
|
},
|
||||||
|
];
|
||||||
|
expect(getAvailableProviders(custom)).toEqual([]);
|
||||||
|
|
||||||
|
process.env.CUSTOM_TEST_KEY = 'test';
|
||||||
|
expect(getAvailableProviders(custom)).toHaveLength(1);
|
||||||
|
delete process.env.CUSTOM_TEST_KEY;
|
||||||
|
});
|
||||||
|
|
||||||
|
it('DEFAULT_PROVIDERS includes all 4 providers', () => {
|
||||||
|
expect(DEFAULT_PROVIDERS).toHaveLength(4);
|
||||||
|
const names = DEFAULT_PROVIDERS.map(p => p.name);
|
||||||
|
expect(names).toEqual(['groq', 'openrouter', 'together', 'cerebras']);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('OpenRouter provider has recommended extra headers', () => {
|
||||||
|
const openrouter = DEFAULT_PROVIDERS.find(p => p.name === 'openrouter');
|
||||||
|
expect(openrouter?.extraHeaders).toBeDefined();
|
||||||
|
expect(openrouter?.extraHeaders?.['HTTP-Referer']).toBeDefined();
|
||||||
|
expect(openrouter?.extraHeaders?.['X-Title']).toBeDefined();
|
||||||
|
});
|
||||||
|
});
|
||||||
290
packages/llm-router/src/__tests__/router.test.ts
Normal file
290
packages/llm-router/src/__tests__/router.test.ts
Normal file
@ -0,0 +1,290 @@
|
|||||||
|
import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';
|
||||||
|
import { LlmRouter } from '../router.js';
|
||||||
|
import type { ProviderConfig, ChatCompletionResponse } from '../types.js';
|
||||||
|
import * as client from '../client.js';
|
||||||
|
|
||||||
|
// Mock the HTTP client
|
||||||
|
vi.mock('../client.js', () => ({
|
||||||
|
sendChatCompletion: vi.fn(),
|
||||||
|
}));
|
||||||
|
|
||||||
|
const MOCK_RESPONSE: ChatCompletionResponse = {
|
||||||
|
id: 'chatcmpl-test',
|
||||||
|
object: 'chat.completion',
|
||||||
|
created: Date.now(),
|
||||||
|
model: 'test-model',
|
||||||
|
choices: [
|
||||||
|
{
|
||||||
|
index: 0,
|
||||||
|
message: { role: 'assistant', content: 'Hello!' },
|
||||||
|
finish_reason: 'stop',
|
||||||
|
},
|
||||||
|
],
|
||||||
|
usage: { prompt_tokens: 10, completion_tokens: 5, total_tokens: 15 },
|
||||||
|
};
|
||||||
|
|
||||||
|
const TEST_PROVIDERS: ProviderConfig[] = [
|
||||||
|
{
|
||||||
|
name: 'test-fast',
|
||||||
|
baseUrl: 'https://fast.test/v1',
|
||||||
|
apiKeyEnv: 'TEST_FAST_KEY',
|
||||||
|
rpmLimit: 30,
|
||||||
|
tpmLimit: 10_000,
|
||||||
|
models: [
|
||||||
|
{
|
||||||
|
id: 'fast-model',
|
||||||
|
label: 'Fast',
|
||||||
|
contextWindow: 8_192,
|
||||||
|
strengths: ['general'],
|
||||||
|
speedTier: 1,
|
||||||
|
},
|
||||||
|
],
|
||||||
|
},
|
||||||
|
{
|
||||||
|
name: 'test-quality',
|
||||||
|
baseUrl: 'https://quality.test/v1',
|
||||||
|
apiKeyEnv: 'TEST_QUALITY_KEY',
|
||||||
|
rpmLimit: 20,
|
||||||
|
tpmLimit: 0,
|
||||||
|
models: [
|
||||||
|
{
|
||||||
|
id: 'quality-model',
|
||||||
|
label: 'Quality',
|
||||||
|
contextWindow: 128_000,
|
||||||
|
strengths: ['code', 'reasoning'],
|
||||||
|
speedTier: 2,
|
||||||
|
},
|
||||||
|
],
|
||||||
|
},
|
||||||
|
];
|
||||||
|
|
||||||
|
describe('LlmRouter', () => {
|
||||||
|
beforeEach(() => {
|
||||||
|
vi.resetAllMocks();
|
||||||
|
// Set fake API keys
|
||||||
|
process.env.TEST_FAST_KEY = 'test-key-fast';
|
||||||
|
process.env.TEST_QUALITY_KEY = 'test-key-quality';
|
||||||
|
});
|
||||||
|
|
||||||
|
afterEach(() => {
|
||||||
|
delete process.env.TEST_FAST_KEY;
|
||||||
|
delete process.env.TEST_QUALITY_KEY;
|
||||||
|
});
|
||||||
|
|
||||||
|
it('throws if no providers have API keys', () => {
|
||||||
|
delete process.env.TEST_FAST_KEY;
|
||||||
|
delete process.env.TEST_QUALITY_KEY;
|
||||||
|
expect(() => new LlmRouter({ providers: TEST_PROVIDERS })).toThrow('No providers available');
|
||||||
|
});
|
||||||
|
|
||||||
|
it('routes a simple prompt to a provider', async () => {
|
||||||
|
vi.mocked(client.sendChatCompletion).mockResolvedValueOnce({
|
||||||
|
response: MOCK_RESPONSE,
|
||||||
|
latencyMs: 150,
|
||||||
|
status: 200,
|
||||||
|
});
|
||||||
|
|
||||||
|
const router = new LlmRouter({ providers: TEST_PROVIDERS });
|
||||||
|
const result = await router.chat({
|
||||||
|
messages: [{ role: 'user', content: 'Hello' }],
|
||||||
|
});
|
||||||
|
|
||||||
|
expect(result.response.choices[0]!.message.content).toBe('Hello!');
|
||||||
|
expect(result.attempts).toBe(1);
|
||||||
|
expect(result.provider).toBeDefined();
|
||||||
|
expect(result.model).toBeDefined();
|
||||||
|
});
|
||||||
|
|
||||||
|
it('retries on 429 with fallback provider', async () => {
|
||||||
|
// First call: rate limited
|
||||||
|
vi.mocked(client.sendChatCompletion).mockResolvedValueOnce({
|
||||||
|
response: null as unknown as ChatCompletionResponse,
|
||||||
|
latencyMs: 50,
|
||||||
|
status: 429,
|
||||||
|
});
|
||||||
|
// Second call: success
|
||||||
|
vi.mocked(client.sendChatCompletion).mockResolvedValueOnce({
|
||||||
|
response: MOCK_RESPONSE,
|
||||||
|
latencyMs: 200,
|
||||||
|
status: 200,
|
||||||
|
});
|
||||||
|
|
||||||
|
const router = new LlmRouter({ providers: TEST_PROVIDERS });
|
||||||
|
const result = await router.chat({
|
||||||
|
messages: [{ role: 'user', content: 'Hello' }],
|
||||||
|
});
|
||||||
|
|
||||||
|
expect(result.attempts).toBe(2);
|
||||||
|
expect(result.response.choices[0]!.message.content).toBe('Hello!');
|
||||||
|
});
|
||||||
|
|
||||||
|
it('retries on error with fallback provider', async () => {
|
||||||
|
// First call: error
|
||||||
|
vi.mocked(client.sendChatCompletion).mockRejectedValueOnce(new Error('Network error'));
|
||||||
|
// Second call: success
|
||||||
|
vi.mocked(client.sendChatCompletion).mockResolvedValueOnce({
|
||||||
|
response: MOCK_RESPONSE,
|
||||||
|
latencyMs: 200,
|
||||||
|
status: 200,
|
||||||
|
});
|
||||||
|
|
||||||
|
const router = new LlmRouter({ providers: TEST_PROVIDERS });
|
||||||
|
const result = await router.chat({
|
||||||
|
messages: [{ role: 'user', content: 'Hello' }],
|
||||||
|
});
|
||||||
|
|
||||||
|
expect(result.attempts).toBe(2);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('throws after exhausting all retries', async () => {
|
||||||
|
vi.mocked(client.sendChatCompletion).mockRejectedValue(new Error('All down'));
|
||||||
|
|
||||||
|
const router = new LlmRouter({ providers: TEST_PROVIDERS, maxRetries: 2 });
|
||||||
|
await expect(router.chat({ messages: [{ role: 'user', content: 'Hello' }] })).rejects.toThrow(
|
||||||
|
'All providers exhausted'
|
||||||
|
);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('routes code prompts to code-capable models', async () => {
|
||||||
|
vi.mocked(client.sendChatCompletion).mockResolvedValueOnce({
|
||||||
|
response: MOCK_RESPONSE,
|
||||||
|
latencyMs: 200,
|
||||||
|
status: 200,
|
||||||
|
});
|
||||||
|
|
||||||
|
const router = new LlmRouter({ providers: TEST_PROVIDERS });
|
||||||
|
await router.chat({
|
||||||
|
messages: [{ role: 'user', content: 'Write a typescript function to sort an array' }],
|
||||||
|
});
|
||||||
|
|
||||||
|
// Should have been called with quality-model (has 'code' strength)
|
||||||
|
const callArgs = vi.mocked(client.sendChatCompletion).mock.calls[0]!;
|
||||||
|
expect(callArgs[1]).toBe('quality-model');
|
||||||
|
});
|
||||||
|
|
||||||
|
it('fires telemetry callback on success', async () => {
|
||||||
|
vi.mocked(client.sendChatCompletion).mockResolvedValueOnce({
|
||||||
|
response: MOCK_RESPONSE,
|
||||||
|
latencyMs: 150,
|
||||||
|
status: 200,
|
||||||
|
});
|
||||||
|
|
||||||
|
const telemetry = vi.fn();
|
||||||
|
const router = new LlmRouter({ providers: TEST_PROVIDERS, onTelemetry: telemetry });
|
||||||
|
await router.chat({ messages: [{ role: 'user', content: 'Hello' }] });
|
||||||
|
|
||||||
|
expect(telemetry).toHaveBeenCalledWith(
|
||||||
|
expect.objectContaining({ event: 'success', attempt: 1 })
|
||||||
|
);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('fires telemetry callback on rate limit', async () => {
|
||||||
|
vi.mocked(client.sendChatCompletion).mockResolvedValueOnce({
|
||||||
|
response: null as unknown as ChatCompletionResponse,
|
||||||
|
latencyMs: 50,
|
||||||
|
status: 429,
|
||||||
|
});
|
||||||
|
vi.mocked(client.sendChatCompletion).mockResolvedValueOnce({
|
||||||
|
response: MOCK_RESPONSE,
|
||||||
|
latencyMs: 200,
|
||||||
|
status: 200,
|
||||||
|
});
|
||||||
|
|
||||||
|
const telemetry = vi.fn();
|
||||||
|
const router = new LlmRouter({ providers: TEST_PROVIDERS, onTelemetry: telemetry });
|
||||||
|
await router.chat({ messages: [{ role: 'user', content: 'Hello' }] });
|
||||||
|
|
||||||
|
expect(telemetry).toHaveBeenCalledWith(expect.objectContaining({ event: 'rate_limit' }));
|
||||||
|
});
|
||||||
|
|
||||||
|
it('handles explicit provider:model routing', async () => {
|
||||||
|
vi.mocked(client.sendChatCompletion).mockResolvedValueOnce({
|
||||||
|
response: MOCK_RESPONSE,
|
||||||
|
latencyMs: 100,
|
||||||
|
status: 200,
|
||||||
|
});
|
||||||
|
|
||||||
|
const router = new LlmRouter({ providers: TEST_PROVIDERS });
|
||||||
|
const result = await router.chat({
|
||||||
|
messages: [{ role: 'user', content: 'Hello' }],
|
||||||
|
model: 'test-fast:fast-model',
|
||||||
|
});
|
||||||
|
|
||||||
|
expect(result.provider).toBe('test-fast');
|
||||||
|
expect(result.model).toBe('fast-model');
|
||||||
|
});
|
||||||
|
|
||||||
|
it('throws for unknown explicit provider', async () => {
|
||||||
|
const router = new LlmRouter({ providers: TEST_PROVIDERS });
|
||||||
|
await expect(
|
||||||
|
router.chat({ messages: [{ role: 'user', content: 'Hello' }], model: 'unknown:model' })
|
||||||
|
).rejects.toThrow('Provider "unknown" not found');
|
||||||
|
});
|
||||||
|
|
||||||
|
it('returns health snapshots', async () => {
|
||||||
|
vi.mocked(client.sendChatCompletion).mockResolvedValueOnce({
|
||||||
|
response: MOCK_RESPONSE,
|
||||||
|
latencyMs: 150,
|
||||||
|
status: 200,
|
||||||
|
});
|
||||||
|
|
||||||
|
const router = new LlmRouter({ providers: TEST_PROVIDERS });
|
||||||
|
await router.chat({ messages: [{ role: 'user', content: 'Hello' }] });
|
||||||
|
|
||||||
|
const health = router.getHealth();
|
||||||
|
expect(health.length).toBeGreaterThan(0);
|
||||||
|
expect(health[0]!.successes).toBe(1);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('lists available providers', () => {
|
||||||
|
const router = new LlmRouter({ providers: TEST_PROVIDERS });
|
||||||
|
expect(router.getProviders()).toEqual(['test-fast', 'test-quality']);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('fires telemetry for explicit model routing', async () => {
|
||||||
|
vi.mocked(client.sendChatCompletion).mockResolvedValueOnce({
|
||||||
|
response: MOCK_RESPONSE,
|
||||||
|
latencyMs: 100,
|
||||||
|
status: 200,
|
||||||
|
});
|
||||||
|
|
||||||
|
const telemetry = vi.fn();
|
||||||
|
const router = new LlmRouter({ providers: TEST_PROVIDERS, onTelemetry: telemetry });
|
||||||
|
await router.chat({
|
||||||
|
messages: [{ role: 'user', content: 'Hello' }],
|
||||||
|
model: 'test-fast:fast-model',
|
||||||
|
});
|
||||||
|
|
||||||
|
expect(telemetry).toHaveBeenCalledWith(
|
||||||
|
expect.objectContaining({
|
||||||
|
event: 'success',
|
||||||
|
provider: 'test-fast',
|
||||||
|
model: 'fast-model',
|
||||||
|
category: 'explicit',
|
||||||
|
})
|
||||||
|
);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('records health on explicit model 429', async () => {
|
||||||
|
vi.mocked(client.sendChatCompletion).mockResolvedValueOnce({
|
||||||
|
response: null as unknown as ChatCompletionResponse,
|
||||||
|
latencyMs: 50,
|
||||||
|
status: 429,
|
||||||
|
});
|
||||||
|
|
||||||
|
const telemetry = vi.fn();
|
||||||
|
const router = new LlmRouter({ providers: TEST_PROVIDERS, onTelemetry: telemetry });
|
||||||
|
await expect(
|
||||||
|
router.chat({ messages: [{ role: 'user', content: 'Hello' }], model: 'test-fast:fast-model' })
|
||||||
|
).rejects.toThrow('Rate limited');
|
||||||
|
|
||||||
|
expect(telemetry).toHaveBeenCalledWith(
|
||||||
|
expect.objectContaining({ event: 'rate_limit', provider: 'test-fast' })
|
||||||
|
);
|
||||||
|
|
||||||
|
// Health should have recorded the rate limit
|
||||||
|
const health = router.getHealth();
|
||||||
|
expect(health).toHaveLength(1);
|
||||||
|
expect(health[0]!.rateLimits).toBe(1);
|
||||||
|
});
|
||||||
|
});
|
||||||
138
packages/llm-router/src/__tests__/selector.test.ts
Normal file
138
packages/llm-router/src/__tests__/selector.test.ts
Normal file
@ -0,0 +1,138 @@
|
|||||||
|
import { describe, it, expect, beforeEach } from 'vitest';
|
||||||
|
import {
|
||||||
|
selectCandidates,
|
||||||
|
pickNext,
|
||||||
|
excludeCandidate,
|
||||||
|
createRoundRobinState,
|
||||||
|
} from '../selector.js';
|
||||||
|
import { HealthTracker } from '../health.js';
|
||||||
|
import type { ProviderConfig } from '../types.js';
|
||||||
|
|
||||||
|
const MOCK_PROVIDERS: ProviderConfig[] = [
|
||||||
|
{
|
||||||
|
name: 'fast-provider',
|
||||||
|
baseUrl: 'https://fast.example.com/v1',
|
||||||
|
apiKeyEnv: 'FAST_KEY',
|
||||||
|
rpmLimit: 30,
|
||||||
|
tpmLimit: 10_000,
|
||||||
|
models: [
|
||||||
|
{
|
||||||
|
id: 'small-8b',
|
||||||
|
label: 'Small 8B',
|
||||||
|
contextWindow: 8_192,
|
||||||
|
strengths: ['general'],
|
||||||
|
speedTier: 1,
|
||||||
|
},
|
||||||
|
{
|
||||||
|
id: 'large-70b',
|
||||||
|
label: 'Large 70B',
|
||||||
|
contextWindow: 128_000,
|
||||||
|
strengths: ['code', 'reasoning'],
|
||||||
|
speedTier: 1,
|
||||||
|
},
|
||||||
|
],
|
||||||
|
},
|
||||||
|
{
|
||||||
|
name: 'quality-provider',
|
||||||
|
baseUrl: 'https://quality.example.com/v1',
|
||||||
|
apiKeyEnv: 'QUALITY_KEY',
|
||||||
|
rpmLimit: 20,
|
||||||
|
tpmLimit: 0,
|
||||||
|
models: [
|
||||||
|
{
|
||||||
|
id: 'deepseek-r1',
|
||||||
|
label: 'DeepSeek R1',
|
||||||
|
contextWindow: 64_000,
|
||||||
|
strengths: ['reasoning', 'code', 'math'],
|
||||||
|
speedTier: 3,
|
||||||
|
},
|
||||||
|
],
|
||||||
|
},
|
||||||
|
];
|
||||||
|
|
||||||
|
describe('selectCandidates', () => {
|
||||||
|
let health: HealthTracker;
|
||||||
|
|
||||||
|
beforeEach(() => {
|
||||||
|
health = new HealthTracker();
|
||||||
|
});
|
||||||
|
|
||||||
|
it('returns candidates sorted by score for code', () => {
|
||||||
|
const candidates = selectCandidates(MOCK_PROVIDERS, 'code', health);
|
||||||
|
expect(candidates.length).toBeGreaterThan(0);
|
||||||
|
// large-70b and deepseek-r1 should score high for code
|
||||||
|
const names = candidates.map(c => c.model.id);
|
||||||
|
expect(names[0]).toBe('large-70b'); // speed 1 + code strength + 70b bonus
|
||||||
|
});
|
||||||
|
|
||||||
|
it('returns candidates sorted by score for general', () => {
|
||||||
|
const candidates = selectCandidates(MOCK_PROVIDERS, 'general', health);
|
||||||
|
// small-8b has 'general' strength + speed tier 1
|
||||||
|
expect(candidates[0]!.model.id).toBe('small-8b');
|
||||||
|
});
|
||||||
|
|
||||||
|
it('filters out unhealthy providers', () => {
|
||||||
|
// Make fast-provider/large-70b unhealthy
|
||||||
|
for (let i = 0; i < 5; i++) {
|
||||||
|
health.record('fast-provider', 'large-70b', {
|
||||||
|
timestamp: Date.now(),
|
||||||
|
latencyMs: 100,
|
||||||
|
status: 'error',
|
||||||
|
});
|
||||||
|
}
|
||||||
|
const candidates = selectCandidates(MOCK_PROVIDERS, 'code', health);
|
||||||
|
const ids = candidates.map(c => `${c.provider.name}::${c.model.id}`);
|
||||||
|
expect(ids).not.toContain('fast-provider::large-70b');
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
describe('pickNext', () => {
|
||||||
|
it('returns null for empty candidates', () => {
|
||||||
|
const state = createRoundRobinState();
|
||||||
|
expect(pickNext([], state)).toBeNull();
|
||||||
|
});
|
||||||
|
|
||||||
|
it('returns the only candidate when there is one', () => {
|
||||||
|
const state = createRoundRobinState();
|
||||||
|
const candidate = { provider: MOCK_PROVIDERS[0]!, model: MOCK_PROVIDERS[0]!.models[0]! };
|
||||||
|
expect(pickNext([candidate], state)).toBe(candidate);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('round-robins across providers', () => {
|
||||||
|
const state = createRoundRobinState();
|
||||||
|
const candidates = [
|
||||||
|
{ provider: MOCK_PROVIDERS[0]!, model: MOCK_PROVIDERS[0]!.models[0]! },
|
||||||
|
{ provider: MOCK_PROVIDERS[1]!, model: MOCK_PROVIDERS[1]!.models[0]! },
|
||||||
|
];
|
||||||
|
|
||||||
|
const first = pickNext(candidates, state);
|
||||||
|
const second = pickNext(candidates, state);
|
||||||
|
expect(first!.provider.name).not.toBe(second!.provider.name);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('uses independent state per instance', () => {
|
||||||
|
const stateA = createRoundRobinState();
|
||||||
|
const stateB = createRoundRobinState();
|
||||||
|
const candidates = [
|
||||||
|
{ provider: MOCK_PROVIDERS[0]!, model: MOCK_PROVIDERS[0]!.models[0]! },
|
||||||
|
{ provider: MOCK_PROVIDERS[1]!, model: MOCK_PROVIDERS[1]!.models[0]! },
|
||||||
|
];
|
||||||
|
|
||||||
|
const fromA = pickNext(candidates, stateA);
|
||||||
|
const fromB = pickNext(candidates, stateB);
|
||||||
|
// Both start at same position since states are independent
|
||||||
|
expect(fromA!.provider.name).toBe(fromB!.provider.name);
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
describe('excludeCandidate', () => {
|
||||||
|
it('removes the specified candidate', () => {
|
||||||
|
const candidates = [
|
||||||
|
{ provider: MOCK_PROVIDERS[0]!, model: MOCK_PROVIDERS[0]!.models[0]! },
|
||||||
|
{ provider: MOCK_PROVIDERS[1]!, model: MOCK_PROVIDERS[1]!.models[0]! },
|
||||||
|
];
|
||||||
|
const remaining = excludeCandidate(candidates, 'fast-provider', 'small-8b');
|
||||||
|
expect(remaining).toHaveLength(1);
|
||||||
|
expect(remaining[0]!.provider.name).toBe('quality-provider');
|
||||||
|
});
|
||||||
|
});
|
||||||
85
packages/llm-router/src/classifier.ts
Normal file
85
packages/llm-router/src/classifier.ts
Normal file
@ -0,0 +1,85 @@
|
|||||||
|
import type { ClassificationResult, PromptCategory } from './types.js';
|
||||||
|
|
||||||
|
// ── Keyword patterns for classification ────────────────────────
|
||||||
|
|
||||||
|
const CODE_PATTERNS = [
|
||||||
|
/\b(function|const |let |var |class |import |export |return |async |await )\b/,
|
||||||
|
/\b(def |print\(|if __name__|lambda )\b/,
|
||||||
|
/[{}();]=>/,
|
||||||
|
/```[\s\S]*```/,
|
||||||
|
/\b(typescript|javascript|python|rust|golang|java|kotlin|swift|sql|html|css|react|node)\b/i,
|
||||||
|
/\b(debug|refactor|compile|build|deploy|lint|test|api|endpoint|route|middleware)\b/i,
|
||||||
|
/\b(fix|bug|error|exception|stack trace|undefined|null|NaN)\b/i,
|
||||||
|
];
|
||||||
|
|
||||||
|
const MATH_PATTERNS = [
|
||||||
|
/\b(calculate|compute|solve|equation|formula|integral|derivative|matrix)\b/i,
|
||||||
|
/\b(probability|statistics|regression|correlation|variance|median|mean)\b/i,
|
||||||
|
/\b(algebra|geometry|calculus|theorem|proof|hypothesis)\b/i,
|
||||||
|
/[+\-*/^=]{2,}/,
|
||||||
|
/\d+\s*[+\-*/^]\s*\d+/,
|
||||||
|
];
|
||||||
|
|
||||||
|
const REASONING_PATTERNS = [
|
||||||
|
/\b(explain|analyze|compare|evaluate|reason|logic|argument|conclusion)\b/i,
|
||||||
|
/\b(why|how does|what if|pros and cons|trade-?offs|implications)\b/i,
|
||||||
|
/\b(step[- ]by[- ]step|chain of thought|think through|break down)\b/i,
|
||||||
|
/\b(strategy|approach|methodology|framework|architecture|design)\b/i,
|
||||||
|
];
|
||||||
|
|
||||||
|
const CREATIVE_PATTERNS = [
|
||||||
|
/\b(write|compose|draft|create|generate|story|poem|essay|blog|article)\b/i,
|
||||||
|
/\b(creative|imaginative|brainstorm|ideas|fiction|narrative|dialogue)\b/i,
|
||||||
|
/\b(rewrite|rephrase|summarize|translate|tone|style|voice)\b/i,
|
||||||
|
];
|
||||||
|
|
||||||
|
// ── Token estimation ───────────────────────────────────────────
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Rough token estimate: ~4 chars per token for English text.
|
||||||
|
* Good enough for routing decisions.
|
||||||
|
*/
|
||||||
|
function estimateTokens(text: string): number {
|
||||||
|
return Math.ceil(text.length / 4);
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Classifier ─────────────────────────────────────────────────
|
||||||
|
|
||||||
|
function countMatches(text: string, patterns: RegExp[]): number {
|
||||||
|
let count = 0;
|
||||||
|
for (const pattern of patterns) {
|
||||||
|
if (pattern.test(text)) count++;
|
||||||
|
}
|
||||||
|
return count;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Classify a prompt into a category based on keyword matching.
|
||||||
|
* No LLM needed — pure regex heuristics.
|
||||||
|
*/
|
||||||
|
export function classifyPrompt(
|
||||||
|
messages: { role: string; content: string }[]
|
||||||
|
): ClassificationResult {
|
||||||
|
const fullText = messages.map(m => m.content).join('\n');
|
||||||
|
const estimatedTokens = estimateTokens(fullText);
|
||||||
|
|
||||||
|
const scores: Record<PromptCategory, number> = {
|
||||||
|
code: countMatches(fullText, CODE_PATTERNS),
|
||||||
|
math: countMatches(fullText, MATH_PATTERNS),
|
||||||
|
reasoning: countMatches(fullText, REASONING_PATTERNS),
|
||||||
|
creative: countMatches(fullText, CREATIVE_PATTERNS),
|
||||||
|
general: 1, // baseline
|
||||||
|
};
|
||||||
|
|
||||||
|
// Pick highest scoring category
|
||||||
|
let best: PromptCategory = 'general';
|
||||||
|
let bestScore = 0;
|
||||||
|
for (const [cat, score] of Object.entries(scores) as [PromptCategory, number][]) {
|
||||||
|
if (score > bestScore) {
|
||||||
|
bestScore = score;
|
||||||
|
best = cat;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return { category: best, estimatedTokens };
|
||||||
|
}
|
||||||
66
packages/llm-router/src/client.ts
Normal file
66
packages/llm-router/src/client.ts
Normal file
@ -0,0 +1,66 @@
|
|||||||
|
import type { ChatCompletionRequest, ChatCompletionResponse, ProviderConfig } from './types.js';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Send an OpenAI-compatible chat completion request to a provider.
|
||||||
|
* Returns the parsed response or throws on HTTP/network errors.
|
||||||
|
*/
|
||||||
|
export async function sendChatCompletion(
|
||||||
|
provider: ProviderConfig,
|
||||||
|
modelId: string,
|
||||||
|
request: ChatCompletionRequest,
|
||||||
|
timeoutMs: number = 30_000
|
||||||
|
): Promise<{ response: ChatCompletionResponse; latencyMs: number; status: number }> {
|
||||||
|
const apiKey = process.env[provider.apiKeyEnv];
|
||||||
|
if (!apiKey) {
|
||||||
|
throw new Error(`Missing API key: env var ${provider.apiKeyEnv} is not set`);
|
||||||
|
}
|
||||||
|
|
||||||
|
const url = `${provider.baseUrl}/chat/completions`;
|
||||||
|
const headers: Record<string, string> = {
|
||||||
|
'Content-Type': 'application/json',
|
||||||
|
Authorization: `Bearer ${apiKey}`,
|
||||||
|
...provider.extraHeaders,
|
||||||
|
};
|
||||||
|
|
||||||
|
const body = JSON.stringify({
|
||||||
|
model: modelId,
|
||||||
|
messages: request.messages,
|
||||||
|
...(request.temperature !== undefined && { temperature: request.temperature }),
|
||||||
|
...(request.max_tokens !== undefined && { max_tokens: request.max_tokens }),
|
||||||
|
...(request.top_p !== undefined && { top_p: request.top_p }),
|
||||||
|
stream: false,
|
||||||
|
});
|
||||||
|
|
||||||
|
const controller = new AbortController();
|
||||||
|
const timer = setTimeout(() => controller.abort(), timeoutMs);
|
||||||
|
const start = Date.now();
|
||||||
|
|
||||||
|
try {
|
||||||
|
const res = await fetch(url, {
|
||||||
|
method: 'POST',
|
||||||
|
headers,
|
||||||
|
body,
|
||||||
|
signal: controller.signal,
|
||||||
|
});
|
||||||
|
|
||||||
|
const latencyMs = Date.now() - start;
|
||||||
|
|
||||||
|
if (res.status === 429) {
|
||||||
|
return {
|
||||||
|
response: null as unknown as ChatCompletionResponse,
|
||||||
|
latencyMs,
|
||||||
|
status: 429,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!res.ok) {
|
||||||
|
const text = await res.text().catch(() => '');
|
||||||
|
throw new Error(`${provider.name} returned ${res.status}: ${text.slice(0, 200)}`);
|
||||||
|
}
|
||||||
|
|
||||||
|
const data = (await res.json()) as ChatCompletionResponse;
|
||||||
|
return { response: data, latencyMs, status: res.status };
|
||||||
|
} finally {
|
||||||
|
clearTimeout(timer);
|
||||||
|
}
|
||||||
|
}
|
||||||
103
packages/llm-router/src/health.ts
Normal file
103
packages/llm-router/src/health.ts
Normal file
@ -0,0 +1,103 @@
|
|||||||
|
import type { HealthSnapshot, RequestRecord } from './types.js';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Sliding-window health tracker for provider+model pairs.
|
||||||
|
* Tracks latency, error rates, and rate-limit hits.
|
||||||
|
*/
|
||||||
|
export class HealthTracker {
|
||||||
|
private records = new Map<string, RequestRecord[]>();
|
||||||
|
private readonly windowMs: number;
|
||||||
|
private readonly errorThreshold: number;
|
||||||
|
private readonly rateLimitThreshold: number;
|
||||||
|
|
||||||
|
constructor(opts?: { windowMs?: number; errorThreshold?: number; rateLimitThreshold?: number }) {
|
||||||
|
this.windowMs = opts?.windowMs ?? 60_000;
|
||||||
|
this.errorThreshold = opts?.errorThreshold ?? 0.5;
|
||||||
|
this.rateLimitThreshold = opts?.rateLimitThreshold ?? 0.3;
|
||||||
|
}
|
||||||
|
|
||||||
|
private key(provider: string, model: string): string {
|
||||||
|
return `${provider}::${model}`;
|
||||||
|
}
|
||||||
|
|
||||||
|
private prune(records: RequestRecord[]): RequestRecord[] {
|
||||||
|
const cutoff = Date.now() - this.windowMs;
|
||||||
|
return records.filter(r => r.timestamp >= cutoff);
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Record a completed request (success, rate_limit, or error). */
|
||||||
|
record(provider: string, model: string, entry: RequestRecord): void {
|
||||||
|
const k = this.key(provider, model);
|
||||||
|
const existing = this.records.get(k) ?? [];
|
||||||
|
existing.push(entry);
|
||||||
|
this.records.set(k, this.prune(existing));
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Get health snapshot for a provider+model pair. */
|
||||||
|
snapshot(provider: string, model: string): HealthSnapshot {
|
||||||
|
const k = this.key(provider, model);
|
||||||
|
const raw = this.records.get(k) ?? [];
|
||||||
|
const records = this.prune(raw);
|
||||||
|
this.records.set(k, records);
|
||||||
|
|
||||||
|
const total = records.length;
|
||||||
|
const successes = records.filter(r => r.status === 'success').length;
|
||||||
|
const rateLimits = records.filter(r => r.status === 'rate_limit').length;
|
||||||
|
const errors = records.filter(r => r.status === 'error').length;
|
||||||
|
|
||||||
|
const successLatencies = records
|
||||||
|
.filter(r => r.status === 'success')
|
||||||
|
.map(r => r.latencyMs)
|
||||||
|
.sort((a, b) => a - b);
|
||||||
|
|
||||||
|
const avgLatencyMs =
|
||||||
|
successLatencies.length > 0
|
||||||
|
? successLatencies.reduce((a, b) => a + b, 0) / successLatencies.length
|
||||||
|
: 0;
|
||||||
|
|
||||||
|
const p95LatencyMs =
|
||||||
|
successLatencies.length > 0
|
||||||
|
? (successLatencies[Math.floor(successLatencies.length * 0.95)] ??
|
||||||
|
successLatencies[successLatencies.length - 1]!)
|
||||||
|
: 0;
|
||||||
|
|
||||||
|
// Healthy = not too many errors or rate limits
|
||||||
|
const errorRate = total > 0 ? errors / total : 0;
|
||||||
|
const rateLimitRate = total > 0 ? rateLimits / total : 0;
|
||||||
|
const healthy =
|
||||||
|
total < 3 || // not enough data → assume healthy
|
||||||
|
(errorRate < this.errorThreshold && rateLimitRate < this.rateLimitThreshold);
|
||||||
|
|
||||||
|
return {
|
||||||
|
provider,
|
||||||
|
model,
|
||||||
|
totalRequests: total,
|
||||||
|
successes,
|
||||||
|
rateLimits,
|
||||||
|
errors,
|
||||||
|
avgLatencyMs: Math.round(avgLatencyMs),
|
||||||
|
p95LatencyMs: Math.round(p95LatencyMs),
|
||||||
|
healthy,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Check if a specific provider+model is currently healthy. */
|
||||||
|
isHealthy(provider: string, model: string): boolean {
|
||||||
|
return this.snapshot(provider, model).healthy;
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Get all tracked snapshots. */
|
||||||
|
allSnapshots(): HealthSnapshot[] {
|
||||||
|
const snapshots: HealthSnapshot[] = [];
|
||||||
|
for (const k of this.records.keys()) {
|
||||||
|
const [provider, model] = k.split('::') as [string, string];
|
||||||
|
snapshots.push(this.snapshot(provider, model));
|
||||||
|
}
|
||||||
|
return snapshots;
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Clear all tracking data. */
|
||||||
|
reset(): void {
|
||||||
|
this.records.clear();
|
||||||
|
}
|
||||||
|
}
|
||||||
25
packages/llm-router/src/index.ts
Normal file
25
packages/llm-router/src/index.ts
Normal file
@ -0,0 +1,25 @@
|
|||||||
|
export { LlmRouter } from './router.js';
|
||||||
|
export type { TelemetryEntry } from './router.js';
|
||||||
|
|
||||||
|
export { DEFAULT_PROVIDERS, getAvailableProviders } from './registry.js';
|
||||||
|
export { classifyPrompt } from './classifier.js';
|
||||||
|
export { HealthTracker } from './health.js';
|
||||||
|
export { selectCandidates, pickNext, excludeCandidate, createRoundRobinState } from './selector.js';
|
||||||
|
export type { SelectionCandidate } from './selector.js';
|
||||||
|
export { sendChatCompletion } from './client.js';
|
||||||
|
|
||||||
|
export type {
|
||||||
|
ModelConfig,
|
||||||
|
ProviderConfig,
|
||||||
|
PromptCategory,
|
||||||
|
ClassificationResult,
|
||||||
|
HealthSnapshot,
|
||||||
|
RequestRecord,
|
||||||
|
RouterConfig,
|
||||||
|
ChatMessage,
|
||||||
|
ChatCompletionRequest,
|
||||||
|
ChatCompletionChoice,
|
||||||
|
ChatCompletionUsage,
|
||||||
|
ChatCompletionResponse,
|
||||||
|
RouteResult,
|
||||||
|
} from './types.js';
|
||||||
134
packages/llm-router/src/registry.ts
Normal file
134
packages/llm-router/src/registry.ts
Normal file
@ -0,0 +1,134 @@
|
|||||||
|
import type { ProviderConfig } from './types.js';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Default free-tier provider configurations.
|
||||||
|
* All use OpenAI-compatible /v1/chat/completions endpoints.
|
||||||
|
*/
|
||||||
|
export const DEFAULT_PROVIDERS: ProviderConfig[] = [
|
||||||
|
// ── Groq ─────────────────────────────────────────────────────
|
||||||
|
// Free tier: 30 RPM, 14.4K TPM (large), 30K TPM (small)
|
||||||
|
{
|
||||||
|
name: 'groq',
|
||||||
|
baseUrl: 'https://api.groq.com/openai/v1',
|
||||||
|
apiKeyEnv: 'GROQ_API_KEY',
|
||||||
|
rpmLimit: 30,
|
||||||
|
tpmLimit: 14_400,
|
||||||
|
models: [
|
||||||
|
{
|
||||||
|
id: 'llama-3.3-70b-versatile',
|
||||||
|
label: 'Llama 3.3 70B',
|
||||||
|
contextWindow: 128_000,
|
||||||
|
strengths: ['general', 'reasoning', 'code'],
|
||||||
|
speedTier: 1,
|
||||||
|
},
|
||||||
|
{
|
||||||
|
id: 'llama-3.1-8b-instant',
|
||||||
|
label: 'Llama 3.1 8B Instant',
|
||||||
|
contextWindow: 128_000,
|
||||||
|
strengths: ['general'],
|
||||||
|
speedTier: 1,
|
||||||
|
},
|
||||||
|
{
|
||||||
|
id: 'gemma2-9b-it',
|
||||||
|
label: 'Gemma 2 9B',
|
||||||
|
contextWindow: 8_192,
|
||||||
|
strengths: ['general', 'creative'],
|
||||||
|
speedTier: 1,
|
||||||
|
},
|
||||||
|
],
|
||||||
|
},
|
||||||
|
|
||||||
|
// ── OpenRouter ───────────────────────────────────────────────
|
||||||
|
// Free models available (rate-limited per model)
|
||||||
|
{
|
||||||
|
name: 'openrouter',
|
||||||
|
baseUrl: 'https://openrouter.ai/api/v1',
|
||||||
|
apiKeyEnv: 'OPENROUTER_API_KEY',
|
||||||
|
extraHeaders: {
|
||||||
|
'HTTP-Referer': 'https://bytelyst.com',
|
||||||
|
'X-Title': 'ByteLyst LLM Router',
|
||||||
|
},
|
||||||
|
rpmLimit: 20,
|
||||||
|
tpmLimit: 0,
|
||||||
|
models: [
|
||||||
|
{
|
||||||
|
id: 'deepseek/deepseek-r1:free',
|
||||||
|
label: 'DeepSeek R1 (Free)',
|
||||||
|
contextWindow: 64_000,
|
||||||
|
strengths: ['reasoning', 'code', 'math'],
|
||||||
|
speedTier: 3,
|
||||||
|
},
|
||||||
|
{
|
||||||
|
id: 'meta-llama/llama-3.3-70b-instruct:free',
|
||||||
|
label: 'Llama 3.3 70B (Free)',
|
||||||
|
contextWindow: 128_000,
|
||||||
|
strengths: ['general', 'reasoning', 'code'],
|
||||||
|
speedTier: 2,
|
||||||
|
},
|
||||||
|
{
|
||||||
|
id: 'google/gemma-2-9b-it:free',
|
||||||
|
label: 'Gemma 2 9B (Free)',
|
||||||
|
contextWindow: 8_192,
|
||||||
|
strengths: ['general', 'creative'],
|
||||||
|
speedTier: 2,
|
||||||
|
},
|
||||||
|
],
|
||||||
|
},
|
||||||
|
|
||||||
|
// ── Together AI ──────────────────────────────────────────────
|
||||||
|
// Free tier: limited RPM, several open models
|
||||||
|
{
|
||||||
|
name: 'together',
|
||||||
|
baseUrl: 'https://api.together.xyz/v1',
|
||||||
|
apiKeyEnv: 'TOGETHER_API_KEY',
|
||||||
|
rpmLimit: 20,
|
||||||
|
tpmLimit: 0,
|
||||||
|
models: [
|
||||||
|
{
|
||||||
|
id: 'meta-llama/Llama-3.3-70B-Instruct-Turbo',
|
||||||
|
label: 'Llama 3.3 70B Turbo',
|
||||||
|
contextWindow: 128_000,
|
||||||
|
strengths: ['general', 'reasoning', 'code'],
|
||||||
|
speedTier: 2,
|
||||||
|
},
|
||||||
|
{
|
||||||
|
id: 'deepseek-ai/DeepSeek-R1-Distill-Llama-70B',
|
||||||
|
label: 'DeepSeek R1 Distill 70B',
|
||||||
|
contextWindow: 128_000,
|
||||||
|
strengths: ['reasoning', 'math', 'code'],
|
||||||
|
speedTier: 2,
|
||||||
|
},
|
||||||
|
],
|
||||||
|
},
|
||||||
|
|
||||||
|
// ── Cerebras ─────────────────────────────────────────────────
|
||||||
|
// Free inference tier — extremely fast
|
||||||
|
{
|
||||||
|
name: 'cerebras',
|
||||||
|
baseUrl: 'https://api.cerebras.ai/v1',
|
||||||
|
apiKeyEnv: 'CEREBRAS_API_KEY',
|
||||||
|
rpmLimit: 30,
|
||||||
|
tpmLimit: 60_000,
|
||||||
|
models: [
|
||||||
|
{
|
||||||
|
id: 'llama-3.3-70b',
|
||||||
|
label: 'Llama 3.3 70B (Cerebras)',
|
||||||
|
contextWindow: 128_000,
|
||||||
|
strengths: ['general', 'reasoning', 'code'],
|
||||||
|
speedTier: 1,
|
||||||
|
},
|
||||||
|
],
|
||||||
|
},
|
||||||
|
];
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Filter providers to only those with API keys present in env.
|
||||||
|
*/
|
||||||
|
export function getAvailableProviders(
|
||||||
|
providers: ProviderConfig[] = DEFAULT_PROVIDERS
|
||||||
|
): ProviderConfig[] {
|
||||||
|
return providers.filter(p => {
|
||||||
|
const key = process.env[p.apiKeyEnv];
|
||||||
|
return key !== undefined && key !== '';
|
||||||
|
});
|
||||||
|
}
|
||||||
285
packages/llm-router/src/router.ts
Normal file
285
packages/llm-router/src/router.ts
Normal file
@ -0,0 +1,285 @@
|
|||||||
|
import type {
|
||||||
|
ChatCompletionRequest,
|
||||||
|
RouterConfig,
|
||||||
|
ProviderConfig,
|
||||||
|
RouteResult,
|
||||||
|
HealthSnapshot,
|
||||||
|
} from './types.js';
|
||||||
|
import { DEFAULT_PROVIDERS, getAvailableProviders } from './registry.js';
|
||||||
|
import { classifyPrompt } from './classifier.js';
|
||||||
|
import { HealthTracker } from './health.js';
|
||||||
|
import { selectCandidates, pickNext, excludeCandidate, createRoundRobinState } from './selector.js';
|
||||||
|
import { sendChatCompletion } from './client.js';
|
||||||
|
|
||||||
|
export class LlmRouter {
|
||||||
|
private readonly providers: ProviderConfig[];
|
||||||
|
private readonly health: HealthTracker;
|
||||||
|
private readonly timeoutMs: number;
|
||||||
|
private readonly maxRetries: number;
|
||||||
|
private readonly log: (entry: TelemetryEntry) => void;
|
||||||
|
private readonly roundRobinState: Map<string, number>;
|
||||||
|
|
||||||
|
constructor(config?: RouterConfig & { onTelemetry?: (entry: TelemetryEntry) => void }) {
|
||||||
|
const allProviders = config?.providers ?? DEFAULT_PROVIDERS;
|
||||||
|
this.providers = getAvailableProviders(allProviders);
|
||||||
|
|
||||||
|
if (this.providers.length === 0) {
|
||||||
|
throw new Error(
|
||||||
|
'No providers available. Set at least one API key env var: ' +
|
||||||
|
allProviders.map(p => p.apiKeyEnv).join(', ')
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
this.health = new HealthTracker({
|
||||||
|
windowMs: config?.healthWindowMs,
|
||||||
|
errorThreshold: config?.errorThreshold,
|
||||||
|
rateLimitThreshold: config?.rateLimitThreshold,
|
||||||
|
});
|
||||||
|
|
||||||
|
this.timeoutMs = config?.timeoutMs ?? 30_000;
|
||||||
|
this.maxRetries = config?.maxRetries ?? 3;
|
||||||
|
this.log = config?.onTelemetry ?? (() => {});
|
||||||
|
this.roundRobinState = createRoundRobinState();
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Route a chat completion request to the best available provider.
|
||||||
|
* Automatically retries on 429/5xx with fallback to other providers.
|
||||||
|
*/
|
||||||
|
async chat(request: ChatCompletionRequest): Promise<RouteResult> {
|
||||||
|
const startTime = Date.now();
|
||||||
|
|
||||||
|
// If user specified a specific provider:model or provider/model, try that first
|
||||||
|
if (request.model && (request.model.includes(':') || request.model.includes('/'))) {
|
||||||
|
return this.chatWithExplicitModel(request, startTime);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Classify the prompt
|
||||||
|
const classification = classifyPrompt(request.messages);
|
||||||
|
|
||||||
|
// Get ranked candidates
|
||||||
|
let candidates = selectCandidates(this.providers, classification.category, this.health);
|
||||||
|
|
||||||
|
if (candidates.length === 0) {
|
||||||
|
throw new Error('No healthy providers available for routing');
|
||||||
|
}
|
||||||
|
|
||||||
|
let lastError: Error | null = null;
|
||||||
|
|
||||||
|
for (let attempt = 1; attempt <= this.maxRetries; attempt++) {
|
||||||
|
const pick = pickNext(candidates, this.roundRobinState);
|
||||||
|
if (!pick) break;
|
||||||
|
|
||||||
|
const { provider, model } = pick;
|
||||||
|
const attemptStart = Date.now();
|
||||||
|
|
||||||
|
try {
|
||||||
|
const result = await sendChatCompletion(provider, model.id, request, this.timeoutMs);
|
||||||
|
|
||||||
|
if (result.status === 429) {
|
||||||
|
// Rate limited — record and try next provider
|
||||||
|
this.health.record(provider.name, model.id, {
|
||||||
|
timestamp: Date.now(),
|
||||||
|
latencyMs: result.latencyMs,
|
||||||
|
status: 'rate_limit',
|
||||||
|
});
|
||||||
|
|
||||||
|
this.log({
|
||||||
|
event: 'rate_limit',
|
||||||
|
provider: provider.name,
|
||||||
|
model: model.id,
|
||||||
|
attempt,
|
||||||
|
latencyMs: result.latencyMs,
|
||||||
|
category: classification.category,
|
||||||
|
});
|
||||||
|
|
||||||
|
candidates = excludeCandidate(candidates, provider.name, model.id);
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Success
|
||||||
|
this.health.record(provider.name, model.id, {
|
||||||
|
timestamp: Date.now(),
|
||||||
|
latencyMs: result.latencyMs,
|
||||||
|
status: 'success',
|
||||||
|
});
|
||||||
|
|
||||||
|
this.log({
|
||||||
|
event: 'success',
|
||||||
|
provider: provider.name,
|
||||||
|
model: model.id,
|
||||||
|
attempt,
|
||||||
|
latencyMs: result.latencyMs,
|
||||||
|
category: classification.category,
|
||||||
|
tokens: result.response.usage?.total_tokens,
|
||||||
|
});
|
||||||
|
|
||||||
|
return {
|
||||||
|
response: result.response,
|
||||||
|
provider: provider.name,
|
||||||
|
model: model.id,
|
||||||
|
totalLatencyMs: Date.now() - startTime,
|
||||||
|
attempts: attempt,
|
||||||
|
};
|
||||||
|
} catch (err) {
|
||||||
|
lastError = err instanceof Error ? err : new Error(String(err));
|
||||||
|
const attemptLatency = Date.now() - attemptStart;
|
||||||
|
|
||||||
|
this.health.record(provider.name, model.id, {
|
||||||
|
timestamp: Date.now(),
|
||||||
|
latencyMs: attemptLatency,
|
||||||
|
status: 'error',
|
||||||
|
});
|
||||||
|
|
||||||
|
this.log({
|
||||||
|
event: 'error',
|
||||||
|
provider: provider.name,
|
||||||
|
model: model.id,
|
||||||
|
attempt,
|
||||||
|
latencyMs: attemptLatency,
|
||||||
|
category: classification.category,
|
||||||
|
error: lastError.message,
|
||||||
|
});
|
||||||
|
|
||||||
|
candidates = excludeCandidate(candidates, provider.name, model.id);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
throw new Error(
|
||||||
|
`All providers exhausted after ${this.maxRetries} attempts. Last error: ${lastError?.message ?? 'unknown'}`
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Handle explicit provider:model routing (bypass classifier).
|
||||||
|
*/
|
||||||
|
private async chatWithExplicitModel(
|
||||||
|
request: ChatCompletionRequest,
|
||||||
|
startTime: number
|
||||||
|
): Promise<RouteResult> {
|
||||||
|
// Support both "provider:model" and "provider/model" separators
|
||||||
|
// Use first colon or first slash (whichever comes first) as separator
|
||||||
|
const raw = request.model!;
|
||||||
|
const colonIdx = raw.indexOf(':');
|
||||||
|
const slashIdx = raw.indexOf('/');
|
||||||
|
let sepIdx: number;
|
||||||
|
if (colonIdx === -1 && slashIdx === -1) {
|
||||||
|
sepIdx = -1;
|
||||||
|
} else if (colonIdx === -1) {
|
||||||
|
sepIdx = slashIdx;
|
||||||
|
} else if (slashIdx === -1) {
|
||||||
|
sepIdx = colonIdx;
|
||||||
|
} else {
|
||||||
|
sepIdx = Math.min(colonIdx, slashIdx);
|
||||||
|
}
|
||||||
|
|
||||||
|
const providerName = sepIdx === -1 ? raw : raw.slice(0, sepIdx);
|
||||||
|
const modelId = sepIdx === -1 ? '' : raw.slice(sepIdx + 1);
|
||||||
|
|
||||||
|
const provider = this.providers.find(p => p.name === providerName);
|
||||||
|
if (!provider) {
|
||||||
|
throw new Error(
|
||||||
|
`Provider "${providerName}" not found. Available: ${this.providers.map(p => p.name).join(', ')}`
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
try {
|
||||||
|
const result = await sendChatCompletion(provider, modelId, request, this.timeoutMs);
|
||||||
|
|
||||||
|
if (result.status === 429) {
|
||||||
|
this.health.record(provider.name, modelId, {
|
||||||
|
timestamp: Date.now(),
|
||||||
|
latencyMs: result.latencyMs,
|
||||||
|
status: 'rate_limit',
|
||||||
|
});
|
||||||
|
|
||||||
|
this.log({
|
||||||
|
event: 'rate_limit',
|
||||||
|
provider: provider.name,
|
||||||
|
model: modelId,
|
||||||
|
attempt: 1,
|
||||||
|
latencyMs: result.latencyMs,
|
||||||
|
category: 'explicit',
|
||||||
|
});
|
||||||
|
|
||||||
|
throw new Error(`Rate limited by ${providerName} for model ${modelId}`);
|
||||||
|
}
|
||||||
|
|
||||||
|
this.health.record(provider.name, modelId, {
|
||||||
|
timestamp: Date.now(),
|
||||||
|
latencyMs: result.latencyMs,
|
||||||
|
status: 'success',
|
||||||
|
});
|
||||||
|
|
||||||
|
this.log({
|
||||||
|
event: 'success',
|
||||||
|
provider: provider.name,
|
||||||
|
model: modelId,
|
||||||
|
attempt: 1,
|
||||||
|
latencyMs: result.latencyMs,
|
||||||
|
category: 'explicit',
|
||||||
|
tokens: result.response.usage?.total_tokens,
|
||||||
|
});
|
||||||
|
|
||||||
|
return {
|
||||||
|
response: result.response,
|
||||||
|
provider: provider.name,
|
||||||
|
model: modelId,
|
||||||
|
totalLatencyMs: Date.now() - startTime,
|
||||||
|
attempts: 1,
|
||||||
|
};
|
||||||
|
} catch (err) {
|
||||||
|
// Re-throw rate-limit errors (already logged above)
|
||||||
|
if (err instanceof Error && err.message.startsWith('Rate limited by')) {
|
||||||
|
throw err;
|
||||||
|
}
|
||||||
|
|
||||||
|
const latency = Date.now() - startTime;
|
||||||
|
this.health.record(provider.name, modelId, {
|
||||||
|
timestamp: Date.now(),
|
||||||
|
latencyMs: latency,
|
||||||
|
status: 'error',
|
||||||
|
});
|
||||||
|
|
||||||
|
this.log({
|
||||||
|
event: 'error',
|
||||||
|
provider: provider.name,
|
||||||
|
model: modelId,
|
||||||
|
attempt: 1,
|
||||||
|
latencyMs: latency,
|
||||||
|
category: 'explicit',
|
||||||
|
error: err instanceof Error ? err.message : String(err),
|
||||||
|
});
|
||||||
|
|
||||||
|
throw err;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Get health snapshots for all tracked provider+model pairs. */
|
||||||
|
getHealth(): HealthSnapshot[] {
|
||||||
|
return this.health.allSnapshots();
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Get list of available (configured) providers. */
|
||||||
|
getProviders(): string[] {
|
||||||
|
return this.providers.map(p => p.name);
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Reset health tracking data. */
|
||||||
|
resetHealth(): void {
|
||||||
|
this.health.reset();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Telemetry types ────────────────────────────────────────────
|
||||||
|
|
||||||
|
export interface TelemetryEntry {
|
||||||
|
event: 'success' | 'rate_limit' | 'error';
|
||||||
|
provider: string;
|
||||||
|
model: string;
|
||||||
|
attempt: number;
|
||||||
|
latencyMs: number;
|
||||||
|
category: string;
|
||||||
|
tokens?: number;
|
||||||
|
error?: string;
|
||||||
|
}
|
||||||
101
packages/llm-router/src/selector.ts
Normal file
101
packages/llm-router/src/selector.ts
Normal file
@ -0,0 +1,101 @@
|
|||||||
|
import type { ModelConfig, PromptCategory, ProviderConfig } from './types.js';
|
||||||
|
import type { HealthTracker } from './health.js';
|
||||||
|
|
||||||
|
export interface SelectionCandidate {
|
||||||
|
provider: ProviderConfig;
|
||||||
|
model: ModelConfig;
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Create a fresh round-robin state map (one per router instance). */
|
||||||
|
export function createRoundRobinState(): Map<string, number> {
|
||||||
|
return new Map<string, number>();
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Score a model for a given prompt category.
|
||||||
|
* Higher = better fit.
|
||||||
|
*/
|
||||||
|
function scoreModel(model: ModelConfig, category: PromptCategory): number {
|
||||||
|
let score = 0;
|
||||||
|
|
||||||
|
// Direct strength match is the strongest signal
|
||||||
|
if (model.strengths.includes(category)) {
|
||||||
|
score += 10;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Speed bonus (lower tier = faster = better for simple tasks)
|
||||||
|
score += (4 - model.speedTier) * 2;
|
||||||
|
|
||||||
|
// Context window bonus for reasoning/creative (often longer)
|
||||||
|
if ((category === 'reasoning' || category === 'creative') && model.contextWindow >= 64_000) {
|
||||||
|
score += 3;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Prefer larger models for code/math/reasoning
|
||||||
|
if (['code', 'math', 'reasoning'].includes(category)) {
|
||||||
|
if (model.id.includes('70b') || model.id.includes('70B')) score += 5;
|
||||||
|
if (model.id.includes('r1') || model.id.includes('R1')) score += 4;
|
||||||
|
}
|
||||||
|
|
||||||
|
return score;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Select the best provider+model candidates for a prompt category.
|
||||||
|
* Returns candidates sorted by score (best first), filtered by health.
|
||||||
|
*/
|
||||||
|
export function selectCandidates(
|
||||||
|
providers: ProviderConfig[],
|
||||||
|
category: PromptCategory,
|
||||||
|
health: HealthTracker
|
||||||
|
): SelectionCandidate[] {
|
||||||
|
const candidates: (SelectionCandidate & { score: number })[] = [];
|
||||||
|
|
||||||
|
for (const provider of providers) {
|
||||||
|
for (const model of provider.models) {
|
||||||
|
if (!health.isHealthy(provider.name, model.id)) continue;
|
||||||
|
|
||||||
|
const score = scoreModel(model, category);
|
||||||
|
candidates.push({ provider, model, score });
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Sort by score descending
|
||||||
|
candidates.sort((a, b) => b.score - a.score);
|
||||||
|
|
||||||
|
return candidates;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Pick the next candidate using round-robin within the top tier.
|
||||||
|
* Groups candidates by provider, rotates between them to spread rate-limit load.
|
||||||
|
*/
|
||||||
|
export function pickNext(
|
||||||
|
candidates: SelectionCandidate[],
|
||||||
|
state: Map<string, number>
|
||||||
|
): SelectionCandidate | null {
|
||||||
|
if (candidates.length === 0) return null;
|
||||||
|
if (candidates.length === 1) return candidates[0]!;
|
||||||
|
|
||||||
|
// Group by provider name for round-robin
|
||||||
|
const providerNames = [...new Set(candidates.map(c => c.provider.name))];
|
||||||
|
const key = providerNames.join(',');
|
||||||
|
|
||||||
|
const idx = state.get(key) ?? 0;
|
||||||
|
const targetProvider = providerNames[idx % providerNames.length]!;
|
||||||
|
state.set(key, idx + 1);
|
||||||
|
|
||||||
|
// Pick the best model from the selected provider
|
||||||
|
return candidates.find(c => c.provider.name === targetProvider) ?? candidates[0]!;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Remove a candidate from the list (after failure) and return remaining.
|
||||||
|
*/
|
||||||
|
export function excludeCandidate(
|
||||||
|
candidates: SelectionCandidate[],
|
||||||
|
provider: string,
|
||||||
|
model: string
|
||||||
|
): SelectionCandidate[] {
|
||||||
|
return candidates.filter(c => !(c.provider.name === provider && c.model.id === model));
|
||||||
|
}
|
||||||
136
packages/llm-router/src/types.ts
Normal file
136
packages/llm-router/src/types.ts
Normal file
@ -0,0 +1,136 @@
|
|||||||
|
// ── Provider & Model Types ─────────────────────────────────────
|
||||||
|
|
||||||
|
export interface ModelConfig {
|
||||||
|
/** Model identifier as the provider expects it */
|
||||||
|
id: string;
|
||||||
|
/** Human-readable label */
|
||||||
|
label: string;
|
||||||
|
/** Max context window tokens */
|
||||||
|
contextWindow: number;
|
||||||
|
/** What this model is good at */
|
||||||
|
strengths: PromptCategory[];
|
||||||
|
/** Relative speed tier: 1 = fastest, 3 = slowest */
|
||||||
|
speedTier: 1 | 2 | 3;
|
||||||
|
}
|
||||||
|
|
||||||
|
export interface ProviderConfig {
|
||||||
|
/** Unique provider name */
|
||||||
|
name: string;
|
||||||
|
/** OpenAI-compatible base URL (e.g. https://api.groq.com/openai/v1) */
|
||||||
|
baseUrl: string;
|
||||||
|
/** Environment variable name that holds the API key */
|
||||||
|
apiKeyEnv: string;
|
||||||
|
/** Available models on this provider */
|
||||||
|
models: ModelConfig[];
|
||||||
|
/** Extra headers to send with every request */
|
||||||
|
extraHeaders?: Record<string, string>;
|
||||||
|
/** Free-tier rate limit: requests per minute (0 = unknown) */
|
||||||
|
rpmLimit: number;
|
||||||
|
/** Free-tier rate limit: tokens per minute (0 = unknown) */
|
||||||
|
tpmLimit: number;
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Prompt Classification ──────────────────────────────────────
|
||||||
|
|
||||||
|
export type PromptCategory = 'code' | 'math' | 'reasoning' | 'creative' | 'general';
|
||||||
|
|
||||||
|
export interface ClassificationResult {
|
||||||
|
category: PromptCategory;
|
||||||
|
estimatedTokens: number;
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Health Tracking ────────────────────────────────────────────
|
||||||
|
|
||||||
|
export interface HealthSnapshot {
|
||||||
|
provider: string;
|
||||||
|
model: string;
|
||||||
|
/** Total requests in the window */
|
||||||
|
totalRequests: number;
|
||||||
|
/** Successful requests */
|
||||||
|
successes: number;
|
||||||
|
/** 429 rate-limit hits */
|
||||||
|
rateLimits: number;
|
||||||
|
/** 5xx / network errors */
|
||||||
|
errors: number;
|
||||||
|
/** Average latency in ms (successes only) */
|
||||||
|
avgLatencyMs: number;
|
||||||
|
/** p95 latency in ms */
|
||||||
|
p95LatencyMs: number;
|
||||||
|
/** Whether this provider is currently considered healthy */
|
||||||
|
healthy: boolean;
|
||||||
|
}
|
||||||
|
|
||||||
|
export interface RequestRecord {
|
||||||
|
timestamp: number;
|
||||||
|
latencyMs: number;
|
||||||
|
status: 'success' | 'rate_limit' | 'error';
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Router Config ──────────────────────────────────────────────
|
||||||
|
|
||||||
|
export interface RouterConfig {
|
||||||
|
/** Provider configurations (use DEFAULT_PROVIDERS if omitted) */
|
||||||
|
providers?: ProviderConfig[];
|
||||||
|
/** Health window in ms (default: 60_000 = 1 minute) */
|
||||||
|
healthWindowMs?: number;
|
||||||
|
/** Error rate threshold to mark unhealthy (default: 0.5 = 50%) */
|
||||||
|
errorThreshold?: number;
|
||||||
|
/** Rate-limit rate threshold to mark unhealthy (default: 0.3 = 30%) */
|
||||||
|
rateLimitThreshold?: number;
|
||||||
|
/** Request timeout in ms (default: 30_000) */
|
||||||
|
timeoutMs?: number;
|
||||||
|
/** Max retry attempts across providers (default: 3) */
|
||||||
|
maxRetries?: number;
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── OpenAI-Compatible Request/Response ─────────────────────────
|
||||||
|
|
||||||
|
export interface ChatMessage {
|
||||||
|
role: 'system' | 'user' | 'assistant';
|
||||||
|
content: string;
|
||||||
|
}
|
||||||
|
|
||||||
|
export interface ChatCompletionRequest {
|
||||||
|
messages: ChatMessage[];
|
||||||
|
/** Optional: force a specific model (provider:model format or just model id) */
|
||||||
|
model?: string;
|
||||||
|
temperature?: number;
|
||||||
|
max_tokens?: number;
|
||||||
|
top_p?: number;
|
||||||
|
stream?: boolean;
|
||||||
|
}
|
||||||
|
|
||||||
|
export interface ChatCompletionChoice {
|
||||||
|
index: number;
|
||||||
|
message: ChatMessage;
|
||||||
|
finish_reason: string | null;
|
||||||
|
}
|
||||||
|
|
||||||
|
export interface ChatCompletionUsage {
|
||||||
|
prompt_tokens: number;
|
||||||
|
completion_tokens: number;
|
||||||
|
total_tokens: number;
|
||||||
|
}
|
||||||
|
|
||||||
|
export interface ChatCompletionResponse {
|
||||||
|
id: string;
|
||||||
|
object: 'chat.completion';
|
||||||
|
created: number;
|
||||||
|
model: string;
|
||||||
|
choices: ChatCompletionChoice[];
|
||||||
|
usage?: ChatCompletionUsage;
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Router Result (wraps response + metadata) ──────────────────
|
||||||
|
|
||||||
|
export interface RouteResult {
|
||||||
|
response: ChatCompletionResponse;
|
||||||
|
/** Which provider served this request */
|
||||||
|
provider: string;
|
||||||
|
/** Which model was used */
|
||||||
|
model: string;
|
||||||
|
/** Total latency in ms (including retries) */
|
||||||
|
totalLatencyMs: number;
|
||||||
|
/** How many attempts were made */
|
||||||
|
attempts: number;
|
||||||
|
}
|
||||||
9
packages/llm-router/tsconfig.json
Normal file
9
packages/llm-router/tsconfig.json
Normal file
@ -0,0 +1,9 @@
|
|||||||
|
{
|
||||||
|
"extends": "../../tsconfig.base.json",
|
||||||
|
"compilerOptions": {
|
||||||
|
"outDir": "dist",
|
||||||
|
"rootDir": "src"
|
||||||
|
},
|
||||||
|
"include": ["src/**/*.ts"],
|
||||||
|
"exclude": ["src/**/*.test.ts"]
|
||||||
|
}
|
||||||
8
packages/llm-router/vitest.config.ts
Normal file
8
packages/llm-router/vitest.config.ts
Normal file
@ -0,0 +1,8 @@
|
|||||||
|
import { defineConfig } from 'vitest/config';
|
||||||
|
|
||||||
|
export default defineConfig({
|
||||||
|
test: {
|
||||||
|
globals: true,
|
||||||
|
passWithNoTests: true,
|
||||||
|
},
|
||||||
|
});
|
||||||
Loading…
Reference in New Issue
Block a user