saravanakumardb1
|
cfc1194079
|
docs(local-llms): add latency/cost comparison and deepseek-r1 transform pattern to evals doc
- Add Latency & Cost Comparison table: llama3.1:8b (~1m27s), qwen2.5-coder:32b
(~5-8m est.), deepseek-r1:32b (~5-8m est.) vs gemini-2.5-flash (~15-25s, $0.003)
and gpt-4o (~20-40s, $0.05-0.15) — all measured at 19 cases, concurrency=4
- Fix assertion pattern docs: single expressions required, not const/return blocks
- Add deepseek-r1 <think> strip transform pattern for promptfoo provider config
- Expand recommended models table with Disk, Reasoning, Pass Rate, and Notes columns
|
2026-02-19 16:05:52 -08:00 |
|
saravanakumardb1
|
80f794dee7
|
docs(local-llm): add Ollama setup, extraction evals, and env vars reference
- docs/02-ollama-setup-and-models.md: installation, server config, memory management,
idle timeout, manual load/unload, OpenAI-compatible API, native API reference,
performance tuning flags (flash attention, KV cache)
- docs/06-extraction-service-evals.md: promptfoo eval suite against Ollama, 19 cases
across 5 tasks, assertion patterns for JSON string output, Python sidecar config
- docs/09-environment-variables.md: comprehensive var reference for Ollama server,
evals, Python sidecar, dashboard, whisper CLI flags, proxy/network settings
|
2026-02-19 13:01:05 -08:00 |
|