learning_ai_common_plat/packages/llm-router
saravanakumardb1 dd90f709e1 fix(gitea): set ROOT_URL=host.docker.internal, NO_PROXY for host (F17)
Resolves F17 in docker-build-optimization-roadmap.

Root cause:
  Gitea's app.ini ROOT_URL was http://localhost:3300/. Gitea bakes
  ROOT_URL into the dist.tarball field of every published package's
  metadata. Inside a Docker container, 'localhost' is the container
  itself, not the host, so any 'pnpm install' that needed to fetch
  a tarball would fail with ECONNREFUSED, even though the registry metadata
  itself was reachable via host.docker.internal.

Server-side fix (not in git, requires manual replication on each dev
machine; documented in roadmap §3 A-pre-6):
  - Edit /opt/homebrew/var/gitea/custom/conf/app.ini:
    ROOT_URL = http://host.docker.internal:3300/
  - brew services restart gitea
  - sudo sh -c 'echo "127.0.0.1 host.docker.internal" >> /etc/hosts'

Repo-side fix (this commit):
  - switch-network.sh: add host.docker.internal to NO_PROXY +
    NPM_CONFIG_NOPROXY when NETWORK=corp. Required so host-side curl/
    pnpm/npm bypass the corporate proxy (cso.proxy.att.com) when
    resolving host.docker.internal. Without this, host installs fail
    with the corp proxy's 'Unknown Host' 504 page.

Republished all 64 @bytelyst/* packages so tarball URLs reflect the
new ROOT_URL:
  - .publish-manifest.json: 64 entries with new content hashes
  - packages/*/package.json: 64 patch-version bumps
    (auto-bumped by publish-outdated-packages.sh because previous
    versions already existed in registry)

Verification:
  curl http://localhost:3300/.../@bytelyst%2Ferrors | jq .dist.tarball
  → http://host.docker.internal:3300/.../errors-0.1.11.tgz  (was localhost:3300)
  workspace:* refs across all 64 packages: 0

Unblocks: A0-V on every pilot. Verified PASSING on learning_ai_clock:
  backend cold build: 59.2 s
  web cold build:     3:13 (193 s)
  Both via Gitea registry, no docker-prep.sh tarballs needed.
2026-05-27 01:51:43 -07:00
src ci: update CI/CD configuration 2026-04-12 23:51:10 -07:00
package.json fix(gitea): set ROOT_URL=host.docker.internal, NO_PROXY for host (F17) 2026-05-27 01:51:43 -07:00
README.md feat(llm-router): add @bytelyst/llm-router — pure-code LLM router for free-tier providers 2026-03-12 13:45:49 -07:00
tsconfig.json feat(llm-router): add @bytelyst/llm-router — pure-code LLM router for free-tier providers 2026-03-12 13:45:49 -07:00
vitest.config.ts fix(ci): add --pool forks to all vitest test scripts to fix kill EPERM on Node v25 2026-03-27 23:23:38 -07:00

@bytelyst/llm-router

Pure-code LLM router for free-tier API providers. No LLM-in-the-loop — deterministic routing with automatic fallback, health tracking, and round-robin load distribution.

Features

  • 4 free providers out of the box: Groq, OpenRouter, Together AI, Cerebras
  • Prompt classification — regex-based detection of code/math/reasoning/creative prompts
  • Smart selection — routes to the best model for each prompt category
  • Round-robin — distributes load across providers to maximize free-tier usage
  • Auto-fallback — retries on 429/5xx with next-best provider
  • Health tracking — sliding-window stats (latency, error rate, rate-limit rate)
  • Telemetry hook — log every routing decision for analysis
  • OpenAI-compatible — same request/response format as OpenAI chat completions
  • Zero dependencies — pure TypeScript, uses native fetch
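
As a rough illustration of the regex-based prompt classification listed above, the sketch below maps a prompt to one of the five categories this README names. The patterns, their order, and the `classifyPrompt` name are assumptions made for the example; the package's actual rules may differ.

```typescript
// Categories named in this README.
type PromptCategory = 'code' | 'math' | 'reasoning' | 'creative' | 'general';

// Hypothetical patterns, checked in order; the first match wins.
const CATEGORY_PATTERNS: Array<[PromptCategory, RegExp]> = [
  ['code', /\b(function|class|typescript|javascript|python|quicksort|compile|refactor|bug)\b/i],
  ['math', /\b(solve|integral|derivative|equation|probability|prove)\b/i],
  ['reasoning', /\b(why|explain|compare|step by step|trade-?offs?)\b/i],
  ['creative', /\b(poem|story|song|lyrics|slogan)\b/i],
];

// Returns the first matching category, or 'general' when nothing matches.
function classifyPrompt(prompt: string): PromptCategory {
  for (const [category, pattern] of CATEGORY_PATTERNS) {
    if (pattern.test(prompt)) return category;
  }
  return 'general';
}
```

Because the patterns are plain regular expressions without the global flag, the function is stateless and deterministic: the same prompt always yields the same category.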

Quick Start

# Set at least one API key
export GROQ_API_KEY=gsk_...
export OPENROUTER_API_KEY=sk-or-...
export TOGETHER_API_KEY=...
export CEREBRAS_API_KEY=...

import { LlmRouter } from '@bytelyst/llm-router';

const router = new LlmRouter();

// Automatic routing — classifier picks best provider+model
const result = await router.chat({
  messages: [{ role: 'user', content: 'Write a quicksort in TypeScript' }],
});

console.log(result.response.choices[0].message.content);
console.log(`Served by: ${result.provider}/${result.model} in ${result.totalLatencyMs}ms`);

Explicit Provider Routing

// Force a specific provider:model
const result = await router.chat({
  messages: [{ role: 'user', content: 'Hello' }],
  model: 'groq:llama-3.3-70b-versatile',
});
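
The `provider:model` string above has to be split into its two parts somewhere. A minimal sketch of one way to do that follows; `parseModelRef` and `ModelRef` are hypothetical names, and splitting on the first colon only is an assumption made so that model IDs which themselves contain a colon stay intact.

```typescript
interface ModelRef {
  provider: string;
  model: string;
}

// Splits "provider:model" at the first colon.
// Returns null when either side of the colon is missing.
function parseModelRef(ref: string): ModelRef | null {
  const idx = ref.indexOf(':');
  if (idx <= 0 || idx === ref.length - 1) return null;
  return { provider: ref.slice(0, idx), model: ref.slice(idx + 1) };
}
```

A `null` result could be treated as "no explicit provider requested", which would leave the choice to automatic routing.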

Telemetry

const router = new LlmRouter({
  onTelemetry: entry => {
    // entry: { event, provider, model, attempt, latencyMs, category, tokens?, error? }
    console.log(`[${entry.event}] ${entry.provider}/${entry.model} ${entry.latencyMs}ms`);
  },
});
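
Beyond logging, the hook can feed a small in-memory aggregate. The sketch below is one possible collector: the field names come from the comment in the snippet above, but the field types and the `createTelemetryCollector` helper are assumptions.

```typescript
// Field names follow the telemetry comment above; the types are assumed.
interface TelemetryEntry {
  event: string;
  provider: string;
  model: string;
  attempt: number;
  latencyMs: number;
  category: string;
  tokens?: number;
  error?: string;
}

interface ProviderTotals {
  calls: number;
  errors: number;
  totalLatencyMs: number;
}

// Hypothetical helper: accumulates per-provider totals from telemetry entries.
function createTelemetryCollector() {
  const totals = new Map<string, ProviderTotals>();
  return {
    // Intended to be passed as the `onTelemetry` callback.
    record(entry: TelemetryEntry): void {
      const t = totals.get(entry.provider) ?? { calls: 0, errors: 0, totalLatencyMs: 0 };
      t.calls += 1;
      if (entry.error !== undefined) t.errors += 1;
      t.totalLatencyMs += entry.latencyMs;
      totals.set(entry.provider, t);
    },
    // Mean latency over all recorded entries, or undefined if none were seen.
    averageLatencyMs(provider: string): number | undefined {
      const t = totals.get(provider);
      return t ? t.totalLatencyMs / t.calls : undefined;
    },
    errorCount(provider: string): number {
      return totals.get(provider)?.errors ?? 0;
    },
  };
}
```

Since `record` does not rely on `this`, it can be handed over directly, for example as `onTelemetry: collector.record`.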

Health Monitoring

const snapshots = router.getHealth();
// Returns: HealthSnapshot[] with per-provider stats
// { provider, model, totalRequests, successes, rateLimits, errors, avgLatencyMs, p95LatencyMs, healthy }
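
One way to put these snapshots to use is a short report of whichever models are currently marked unhealthy. In the sketch below the `HealthSnapshot` interface mirrors the field names listed above with assumed types, and `describeUnhealthy` is a hypothetical helper, not part of the package.

```typescript
// Mirrors the fields listed above; the types are assumed.
interface HealthSnapshot {
  provider: string;
  model: string;
  totalRequests: number;
  successes: number;
  rateLimits: number;
  errors: number;
  avgLatencyMs: number;
  p95LatencyMs: number;
  healthy: boolean;
}

// Produces one human-readable line per unhealthy provider/model pair.
function describeUnhealthy(snapshots: HealthSnapshot[]): string[] {
  return snapshots
    .filter(s => !s.healthy)
    .map(s => {
      // Guard against division by zero for models with no traffic yet.
      const errorRate = s.totalRequests === 0 ? 0 : s.errors / s.totalRequests;
      return `${s.provider}/${s.model}: ${(errorRate * 100).toFixed(0)}% errors, p95 ${s.p95LatencyMs}ms`;
    });
}
```

It would typically be called as `describeUnhealthy(router.getHealth())`.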

Configuration

const router = new LlmRouter({
  // Override default providers
  providers: [...],
  // Health window (default: 60s)
  healthWindowMs: 120_000,
  // Error rate to mark unhealthy (default: 50%)
  errorThreshold: 0.4,
  // Rate-limit rate to mark unhealthy (default: 30%)
  rateLimitThreshold: 0.2,
  // Request timeout (default: 30s)
  timeoutMs: 15_000,
  // Max retry attempts (default: 3)
  maxRetries: 4,
});
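
To make the interplay of `healthWindowMs`, `errorThreshold`, and `rateLimitThreshold` concrete, here is a minimal sliding-window sketch. The `SlidingWindowHealth` class is hypothetical, and two of its behaviours are assumptions: a model counts as unhealthy once a rate reaches its threshold, and an empty window counts as healthy.

```typescript
type Outcome = 'success' | 'error' | 'rate_limit';

interface Sample {
  at: number; // timestamp in milliseconds
  outcome: Outcome;
}

// Hypothetical tracker; defaults match the ones documented above.
class SlidingWindowHealth {
  private samples: Sample[] = [];
  private readonly windowMs: number;
  private readonly errorThreshold: number;
  private readonly rateLimitThreshold: number;

  constructor(windowMs = 60_000, errorThreshold = 0.5, rateLimitThreshold = 0.3) {
    this.windowMs = windowMs;
    this.errorThreshold = errorThreshold;
    this.rateLimitThreshold = rateLimitThreshold;
  }

  record(outcome: Outcome, now: number = Date.now()): void {
    this.samples.push({ at: now, outcome });
  }

  isHealthy(now: number = Date.now()): boolean {
    // Discard samples that have fallen out of the window.
    this.samples = this.samples.filter(s => now - s.at <= this.windowMs);
    const total = this.samples.length;
    if (total === 0) return true; // assumption: no recent data means healthy
    const errors = this.samples.filter(s => s.outcome === 'error').length;
    const limited = this.samples.filter(s => s.outcome === 'rate_limit').length;
    return errors / total < this.errorThreshold && limited / total < this.rateLimitThreshold;
  }
}
```

Lowering the thresholds, as the configuration above does, makes a model drop out of rotation sooner, while a longer window makes the rates less sensitive to a short burst of failures.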

Provider Selection Logic

  1. Classify prompt → code, math, reasoning, creative, or general
  2. Score each available model based on category match, speed tier, context window, and model size
  3. Filter unhealthy models (based on sliding-window error/rate-limit rates)
  4. Round-robin across top-scoring providers to spread rate-limit load
  5. Fallback on 429/5xx → exclude failed model, pick next best
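
Steps 2 to 5 can be sketched roughly as follows. The scoring weights, the `Candidate` shape, the module-level round-robin cursor, and the assumption that a lower `speedTier` means faster are all illustrative; the real router also weighs context window and model size, which this sketch leaves out.

```typescript
interface Candidate {
  provider: string;
  model: string;
  strengths: string[];
  speedTier: number; // assumed: lower is faster
  healthy: boolean;
}

// Hypothetical scoring: a category match dominates, faster tiers break ties.
function score(c: Candidate, category: string): number {
  return (c.strengths.includes(category) ? 10 : 0) - c.speedTier;
}

let cursor = 0; // round-robin position shared across calls

// Picks the next candidate, skipping unhealthy models and any keys in `excluded`.
function pickCandidate(
  candidates: Candidate[],
  category: string,
  excluded: Set<string> = new Set(),
): Candidate | undefined {
  const usable = candidates.filter(
    c => c.healthy && !excluded.has(`${c.provider}:${c.model}`),
  );
  if (usable.length === 0) return undefined;
  const best = Math.max(...usable.map(c => score(c, category)));
  const top = usable.filter(c => score(c, category) === best);
  const choice = top[cursor % top.length]; // rotate among equally scored models
  cursor += 1;
  return choice;
}
```

For fallback, a caller would add the failed `provider:model` key to `excluded` after a 429 or 5xx response and call `pickCandidate` again until it returns `undefined` or the retry budget runs out.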

Default Provider Registry

| Provider   | Models                                     | Speed   | Strengths                |
|------------|--------------------------------------------|---------|--------------------------|
| Groq       | Llama 3.3 70B, Llama 3.1 8B, Gemma 2 9B    | Fastest | General, reasoning, code |
| OpenRouter | DeepSeek R1, Llama 3.3 70B, Gemma 2 9B     | Medium  | Reasoning, code, math    |
| Together   | Llama 3.3 70B Turbo, DeepSeek R1 Distill   | Medium  | General, reasoning, code |
| Cerebras   | Llama 3.3 70B                              | Fastest | General, reasoning, code |

Adding Custom Providers

Any OpenAI-compatible endpoint works:

import { LlmRouter, DEFAULT_PROVIDERS } from '@bytelyst/llm-router';

const router = new LlmRouter({
  providers: [
    ...DEFAULT_PROVIDERS,
    {
      name: 'my-provider',
      baseUrl: 'https://my-api.example.com/v1',
      apiKeyEnv: 'MY_PROVIDER_KEY',
      rpmLimit: 60,
      tpmLimit: 100_000,
      models: [
        {
          id: 'my-model',
          label: 'My Model',
          contextWindow: 32_000,
          strengths: ['general', 'code'],
          speedTier: 2,
        },
      ],
    },
  ],
});
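
Each provider entry names its key through `apiKeyEnv`, so it can be useful to check which providers are actually usable before constructing the router. The sketch below assumes only the `name` and `apiKeyEnv` fields shown above; `providersWithKeys` is a hypothetical helper, and whether the router performs a similar check internally is not stated in this README.

```typescript
// Only the two fields this check needs; real provider objects carry more.
interface ProviderLike {
  name: string;
  apiKeyEnv: string;
}

// Keeps the providers whose API key variable is set to a non-empty value.
function providersWithKeys<T extends ProviderLike>(
  providers: T[],
  env: Record<string, string | undefined>,
): T[] {
  return providers.filter(p => (env[p.apiKeyEnv] ?? '').trim() !== '');
}
```

In Node.js the second argument would normally be `process.env`, and an empty result means no API key has been set for any listed provider.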