Commit Graph

2 Commits

Author SHA1 Message Date
saravanakumardb1
cfc1194079 docs(local-llms): add latency/cost comparison and deepseek-r1 transform pattern to evals doc
- Add Latency & Cost Comparison table: llama3.1:8b (~1m27s), qwen2.5-coder:32b
  (~5-8m est.), deepseek-r1:32b (~5-8m est.) vs gemini-2.5-flash (~15-25s, $0.003)
  and gpt-4o (~20-40s, $0.05-0.15) — all measured at 19 cases, concurrency=4
- Fix assertion pattern docs: single expressions required, not const/return blocks
- Add deepseek-r1 <think> strip transform pattern for promptfoo provider config
- Expand recommended models table with Disk, Reasoning, Pass Rate, and Notes columns
2026-02-19 16:05:52 -08:00
saravanakumardb1
80f794dee7 docs(local-llm): add Ollama setup, extraction evals, and env vars reference
- docs/02-ollama-setup-and-models.md: installation, server config, memory management,
  idle timeout, manual load/unload, OpenAI-compatible API, native API reference,
  performance tuning flags (flash attention, KV cache)
- docs/06-extraction-service-evals.md: promptfoo eval suite against Ollama, 19 cases
  across 5 tasks, assertion patterns for JSON string output, Python sidecar config
- docs/09-environment-variables.md: comprehensive var reference for Ollama server,
  evals, Python sidecar, dashboard, whisper CLI flags, proxy/network settings
2026-02-19 13:01:05 -08:00