learning_ai_common_plat/__LOCAL_LLMs/docs/09-environment-variables.md
saravanakumardb1 80f794dee7 docs(local-llm): add Ollama setup, extraction evals, and env vars reference
- docs/02-ollama-setup-and-models.md: installation, server config, memory management,
  idle timeout, manual load/unload, OpenAI-compatible API, native API reference,
  performance tuning flags (flash attention, KV cache)
- docs/06-extraction-service-evals.md: promptfoo eval suite against Ollama, 19 cases
  across 5 tasks, assertion patterns for JSON string output, Python sidecar config
- docs/09-environment-variables.md: comprehensive var reference for Ollama server,
  evals, Python sidecar, dashboard, whisper CLI flags, proxy/network settings
2026-02-19 13:01:05 -08:00


09 — Environment Variables Reference

Configuration reference for the Ollama server, the extraction-service evals, the Python sidecar, the Mission Control dashboard, whisper.cpp, and the corporate proxy.


Ollama Server

| Variable | Default | Purpose |
|---|---|---|
| `OLLAMA_HOST` | `http://127.0.0.1:11434` | Server bind address |
| `OLLAMA_MODELS` | `~/.ollama/models` | Model storage path |
| `OLLAMA_KEEP_ALIVE` | `5m` | How long a model stays loaded after its last request |
| `OLLAMA_FLASH_ATTENTION` | `false` | Enable flash attention (faster, less RAM) |
| `OLLAMA_KV_CACHE_TYPE` | `f16` | KV cache quantization (`q8_0` = smaller RAM footprint); only takes effect with flash attention enabled |
| `OLLAMA_NUM_PARALLEL` | `1` | Number of concurrent requests |
| `OLLAMA_MAX_LOADED_MODELS` | `1` | Max models loaded in RAM simultaneously |
| `OLLAMA_GPU_OVERHEAD` | (auto) | Reserved GPU memory (bytes) |
| `OLLAMA_ORIGINS` | (local origins only) | Additional allowed CORS origins (`*` allows any origin) |
| `OLLAMA_DEBUG` | `false` | Enable debug logging |
| `HTTP_PROXY` | (system) | HTTP proxy (model pulls go over HTTPS, so this rarely matters for downloads) |
| `HTTPS_PROXY` | (system) | HTTPS proxy for model downloads |
| `NO_PROXY` | (none) | Hosts to bypass proxy |

Performance Tuning Combo

```shell
OLLAMA_FLASH_ATTENTION=1 \
OLLAMA_KV_CACHE_TYPE=q8_0 \
OLLAMA_NUM_PARALLEL=2 \
OLLAMA_KEEP_ALIVE=10m \
ollama serve
```
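Ollama's FAQ describes KV cache quantization as something that applies when flash attention is enabled, which is why the combo sets both variables. The following is a minimal preflight sketch, not part of Ollama: the function name `check_kv_cache_config` is hypothetical, and the accepted cache types (`f16`, `q8_0`, `q4_0`) and the unquantized `f16` default are assumptions that may differ between Ollama versions.

```shell
# Hypothetical preflight helper; not shipped with Ollama.
# Returns 0 when the KV cache setting can take effect,
# 1 when a quantized cache is requested without flash attention,
# 2 when the cache type is not one this sketch knows about.
check_kv_cache_config() {
  kv_type="${OLLAMA_KV_CACHE_TYPE:-f16}"   # assumed unquantized default
  flash="${OLLAMA_FLASH_ATTENTION:-0}"

  case "$kv_type" in
    f16)
      return 0 ;;
    q8_0|q4_0)
      case "$flash" in
        1|true)
          return 0 ;;
        *)
          echo "OLLAMA_KV_CACHE_TYPE=$kv_type needs OLLAMA_FLASH_ATTENTION=1" >&2
          return 1 ;;
      esac ;;
    *)
      echo "unrecognized OLLAMA_KV_CACHE_TYPE: $kv_type" >&2
      return 2 ;;
  esac
}
```

With the variables exported, `check_kv_cache_config && ollama serve` would refuse to start the server when the two settings disagree.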

Extraction Service Evals (promptfoo)

| Variable | Default | Purpose |
|---|---|---|
| `OLLAMA_MODEL` | `llama3.1:8b` | Model used by `pnpm eval:ollama` |
| `OLLAMA_BASE_URL` | `http://localhost:11434/v1` | OpenAI-compat endpoint for promptfoo |
| `EXTRACTION_EVAL_TOKEN` | (none) | Auth token for extraction-service evals |

Usage

```shell
# Run evals with a different model
OLLAMA_MODEL=qwen2.5:7b pnpm eval:ollama

# Compare Gemini vs Ollama
EXTRACTION_EVAL_TOKEN=your-token pnpm eval:compare
```
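An eval run against a model that has not been pulled will fail at the first request, so it can help to check first. A minimal sketch, assuming the usual `ollama list` layout (a header row, then one model per line with the tag in the first column); the helper name `model_is_pulled` is hypothetical.

```shell
# Hypothetical helper: reads `ollama list` output on stdin and succeeds
# only if the first column of some non-header row equals the given tag.
model_is_pulled() {
  awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit found ? 0 : 1 }'
}

# Example (commented out so that sourcing this file has no side effects):
#   model="${OLLAMA_MODEL:-llama3.1:8b}"
#   ollama list | model_is_pulled "$model" || ollama pull "$model"
#   OLLAMA_MODEL="$model" pnpm eval:ollama
```

The comparison is an exact match on the full tag, so `llama3.1` does not match `llama3.1:8b`.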

Python Sidecar (LangExtract)

| Variable | Default | Purpose |
|---|---|---|
| `LANGEXTRACT_PROVIDER` | `gemini` | Switch to `openai_compat` for Ollama |
| `LANGEXTRACT_BASE_URL` | (Gemini) | Set to `http://localhost:11434/v1` for Ollama |
| `LANGEXTRACT_API_KEY` | (Gemini key) | Set to `ollama` for local |
| `LANGEXTRACT_MODEL` | (Gemini model) | Set to `llama3.1:8b` or preferred model |

Switch to Ollama

```shell
export LANGEXTRACT_PROVIDER=openai_compat
export LANGEXTRACT_BASE_URL=http://localhost:11434/v1
export LANGEXTRACT_API_KEY=ollama
export LANGEXTRACT_MODEL=llama3.1:8b
```
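Because the other three variables default to Gemini values, switching only `LANGEXTRACT_PROVIDER` would presumably leave the sidecar pointing at the wrong endpoint. The check below is a sketch under that assumption (the sidecar's actual validation is not documented here), and `check_langextract_env` is a hypothetical name.

```shell
# Hypothetical check: assumes the sidecar needs all three overrides once
# LANGEXTRACT_PROVIDER is openai_compat. Reports the unset names on stderr.
check_langextract_env() {
  # Nothing to verify while the default Gemini provider is in use.
  [ "${LANGEXTRACT_PROVIDER:-gemini}" = "openai_compat" ] || return 0

  missing=""
  for name in LANGEXTRACT_BASE_URL LANGEXTRACT_API_KEY LANGEXTRACT_MODEL; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done

  if [ -n "$missing" ]; then
    echo "unset for openai_compat:$missing" >&2
    return 1
  fi
}
```

Calling `check_langextract_env` before starting the sidecar catches a half-applied switch early.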

Mission Control Dashboard

| Variable | Default | Purpose |
|---|---|---|
| `OLLAMA_URL` | `http://localhost:11434` | Ollama server URL (used by API routes) |
| `PORT` | `3100` | Dashboard dev server port |

Start with Custom Ollama URL

```shell
OLLAMA_URL=http://192.168.1.100:11434 npm run dev -- -p 3100
```
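When the dashboard points at a remote Ollama host, a quick reachability check saves a round of debugging in the browser. This sketch uses Ollama's native `/api/tags` endpoint (list local models); the helper name `ollama_reachable` and the 3-second timeout are assumptions chosen for illustration.

```shell
# Hypothetical helper: succeeds if an Ollama server answers at the given URL.
# Falls back to OLLAMA_URL, then to the dashboard default from the table.
ollama_reachable() {
  url="${1:-${OLLAMA_URL:-http://localhost:11434}}"
  # ${url%/} drops one trailing slash so the joined path has no "//".
  curl --fail --silent --show-error --max-time 3 "${url%/}/api/tags" >/dev/null
}
```

For example, `ollama_reachable http://192.168.1.100:11434 && npm run dev -- -p 3100` only starts the dev server when the remote host responds. Note that the remote server must listen on a non-loopback address (via `OLLAMA_HOST`) for this to succeed.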

Whisper.cpp

Whisper.cpp uses CLI flags rather than environment variables:

| Flag | Purpose | Example |
|---|---|---|
| `--model` | Path to GGML model file | `--model ~/whisper-models/ggml-large-v3-turbo.bin` |
| `--language` | Input language | `--language en` |
| `--file` | Audio file path | `--file recording.wav` |
| `--output-json` | Output in JSON format | `--output-json` |
| `--output-srt` | Output as SRT subtitles | `--output-srt` |
| `--output-vtt` | Output as VTT subtitles | `--output-vtt` |
| `--translate` | Translate to English | `--translate` |
| `--threads` | Number of CPU threads | `--threads 8` |
| `--processors` | Number of processors | `--processors 1` |
| `--print-colors` | Colorize output by confidence | `--print-colors` |
| `--no-timestamps` | Omit timestamps | `--no-timestamps` |
| `--port` | Server port (`whisper-server`) | `--port 8080` |
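The flags combine as in the sketch below, which first converts the input to 16 kHz mono 16-bit PCM WAV (the format the whisper.cpp README asks for) and then transcribes it. The wrapper name `transcribe_to_json` and the `.16k.wav` suffix are hypothetical; the binaries and default model path are the ones listed in this document.

```shell
# Hypothetical wrapper around ffmpeg and whisper-cli.
# $1: input audio file, $2: optional path to a GGML model file.
transcribe_to_json() {
  input="${1:-}"
  model="${2:-$HOME/whisper-models/ggml-large-v3-turbo.bin}"

  if [ ! -f "$input" ]; then
    echo "no such audio file: $input" >&2
    return 1
  fi

  # Convert to 16 kHz mono 16-bit PCM WAV next to the original file.
  wav="$input.16k.wav"
  ffmpeg -y -loglevel error -i "$input" -ar 16000 -ac 1 -c:a pcm_s16le "$wav" || return 1

  # Transcribe English audio and write JSON output.
  whisper-cli --model "$model" --language en --file "$wav" --output-json
}
```

A call such as `transcribe_to_json recording.mp3` leaves the converted WAV in place, so delete it afterwards if disk space matters.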

Proxy / Network (Corporate)

| Variable | Value on This Machine | Purpose |
|---|---|---|
| `HTTP_PROXY` | `http://cso.proxy.att.com:8080/` | Corporate HTTP proxy |
| `HTTPS_PROXY` | `http://cso.proxy.att.com:8080/` | Corporate HTTPS proxy |
| `NODE_TLS_REJECT_UNAUTHORIZED` | `0` | Disables TLS certificate verification in Node.js to get past Forcepoint SSL interception; insecure, so keep it to local development |
| `NO_PROXY` | (not set by default) | Add `ollama.com,registry.ollama.ai` if pulls fail |
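Editing `NO_PROXY` by hand tends to produce duplicates or clobber existing entries. A small sketch of an append-if-missing helper follows; the name `add_no_proxy` is hypothetical, and adding `localhost` and `127.0.0.1` is a suggestion (so that local calls to Ollama skip the corporate proxy) rather than something this machine's setup is known to require.

```shell
# Hypothetical helper: appends each host to NO_PROXY unless already listed.
add_no_proxy() {
  for host in "$@"; do
    # Wrap the list in commas so only whole entries match.
    case ",${NO_PROXY:-}," in
      *",$host,"*) ;;                                   # already present
      *) NO_PROXY="${NO_PROXY:+$NO_PROXY,}$host" ;;     # append with comma
    esac
  done
  export NO_PROXY
}
```

For instance, `add_no_proxy ollama.com registry.ollama.ai localhost 127.0.0.1` extends the current value in place. Some tools read the lowercase `no_proxy` instead, so exporting both may be necessary.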

All Paths

| Path | Content |
|---|---|
| `~/.ollama/models/` | Downloaded Ollama models |
| `~/whisper-models/` | Whisper GGML model files |
| `/opt/homebrew/bin/ollama` | Ollama binary |
| `/opt/homebrew/bin/whisper-cli` | Whisper CLI binary |
| `/opt/homebrew/bin/ffmpeg` | FFmpeg binary |
| `__LOCAL_LLMs/dashboard/` | Mission Control Next.js app |
| `__LOCAL_LLMs/docs/` | This documentation |
| `services/extraction-service/evals/` | Promptfoo eval configs |
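On a fresh machine it can be useful to confirm that these locations exist before debugging anything else. The sketch below is one way to do that; `report_missing_paths` is a hypothetical name, and the repo-relative paths assume the command is run from the repository root.

```shell
# Hypothetical helper: prints every argument that does not exist on disk
# and returns non-zero if at least one is missing.
report_missing_paths() {
  rc=0
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      printf 'missing: %s\n' "$p"
      rc=1
    fi
  done
  return "$rc"
}

# Example (commented out; run from the repository root):
#   report_missing_paths "$HOME/.ollama/models" "$HOME/whisper-models" \
#     /opt/homebrew/bin/ollama /opt/homebrew/bin/whisper-cli \
#     /opt/homebrew/bin/ffmpeg __LOCAL_LLMs/dashboard __LOCAL_LLMs/docs \
#     services/extraction-service/evals
```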