docs(local-llm): add Ollama setup, extraction evals, and env vars reference
- docs/02-ollama-setup-and-models.md: installation, server config, memory management, idle timeout, manual load/unload, OpenAI-compatible API, native API reference, performance tuning flags (flash attention, KV cache)
- docs/06-extraction-service-evals.md: promptfoo eval suite against Ollama, 19 cases across 5 tasks, assertion patterns for JSON string output, Python sidecar config
- docs/09-environment-variables.md: comprehensive var reference for Ollama server, evals, Python sidecar, dashboard, whisper CLI flags, proxy/network settings
parent 464ffb92ec
commit 80f794dee7
230
__LOCAL_LLMs/docs/02-ollama-setup-and-models.md
Normal file
@@ -0,0 +1,230 @@

# 02 — Ollama Setup & Models

> Installation, server configuration, model management, and memory behavior.

---

## Installation

```bash
brew install ollama
```

- **Version installed:** 0.16.2
- **Binary:** `/opt/homebrew/opt/ollama/bin/ollama`
- **Models stored at:** `~/.ollama/models/`
- **Config:** No config file — uses environment variables

---

## Starting the Server

```bash
# Option A: foreground (dev, see logs)
ollama serve

# Option B: background service (auto-start at login)
brew services start ollama

# Check if running
curl http://localhost:11434/api/tags
```

**Server listens on:** `http://127.0.0.1:11434`

### Corporate Proxy Note

Ollama reads `HTTP_PROXY` / `HTTPS_PROXY` from the environment of the server process. On this machine, the AT&T Forcepoint proxy (`http://cso.proxy.att.com:8080/`) is picked up automatically, and model downloads go through it. The download is performed by `ollama serve`, not by the `ollama pull` client, so proxy overrides must be set where the server starts. If a pull fails, restart the server with the registry hosts excluded from the proxy, then retry `ollama pull <model>`:

```bash
# Set NO_PROXY on the server process, which performs the download
NO_PROXY="ollama.com,registry.ollama.ai" ollama serve
```

---

## Models Currently Installed

Verified 2026-02-19:

| Model               | Size   | Pull Command                    | Use Case                                      |
| ------------------- | ------ | ------------------------------- | --------------------------------------------- |
| `qwen2.5-coder:32b` | 19 GB  | `ollama pull qwen2.5-coder:32b` | Best coding model — Swift, TypeScript, Python |
| `llama3.1:8b`       | 4.9 GB | `ollama pull llama3.1:8b`       | Default for evals, fast inference             |

### Useful Commands

```bash
# List all downloaded models (disk)
ollama list

# Show what's currently loaded in RAM
ollama ps

# Pull a new model (downloads to ~/.ollama/models/)
ollama pull <model>

# Run interactively
ollama run <model>

# Run with a one-shot prompt
ollama run qwen2.5-coder:32b "Write a Swift function for audio conversion"

# Remove a model from disk
ollama rm <model>

# Show model details (size, parameters, template)
ollama show <model>
```

---

## Memory Management

How many models Ollama keeps in RAM at once is governed by `OLLAMA_MAX_LOADED_MODELS`. The notes below assume a limit of **one model at a time**, which matters on a 48 GB machine; set `OLLAMA_MAX_LOADED_MODELS=1` explicitly if `ollama ps` shows more than one model resident.

### Key Behaviors

1. **Models are stored on disk:** you can download as many as disk allows
2. **Only the active model loads into RAM:** with a limit of one, the previous model is evicted when switching
3. **Idle timeout:** Models auto-unload after **5 minutes** of inactivity (configurable)
4. **Manual unload:** Send a request with `keep_alive: "0"` to unload immediately

### Controlling Idle Timeout

```bash
# Default: 5 minutes
ollama serve

# Unload immediately after each request (saves RAM)
OLLAMA_KEEP_ALIVE=0 ollama serve

# Keep loaded for 30 minutes
OLLAMA_KEEP_ALIVE=30m ollama serve

# Keep loaded forever (until manual unload or server restart)
OLLAMA_KEEP_ALIVE=-1 ollama serve
```

### Manual Load/Unload

```bash
# Load a model into RAM (empty prompt trick)
curl http://localhost:11434/api/generate -d '{"model": "qwen2.5-coder:32b", "prompt": "", "keep_alive": "10m"}'

# Unload a model from RAM immediately
curl http://localhost:11434/api/generate -d '{"model": "qwen2.5-coder:32b", "prompt": "", "keep_alive": "0"}'
```

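The same load and unload calls can be scripted. The following is a minimal Python sketch, assuming the server is reachable at `http://localhost:11434`; the helper names `keep_alive_payload` and `set_model_residency` are hypothetical, and only the `/api/generate` fields shown in the curl commands above are used.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default local server address


def keep_alive_payload(model: str, keep_alive: str) -> bytes:
    """Build the JSON body used above: an empty prompt plus a keep_alive value."""
    body = {"model": model, "prompt": "", "keep_alive": keep_alive}
    return json.dumps(body).encode("utf-8")


def set_model_residency(model: str, keep_alive: str, base_url: str = OLLAMA_URL) -> int:
    """POST to /api/generate; keep_alive="0" unloads, a duration such as "10m" loads."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=keep_alive_payload(model, keep_alive),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        resp.read()  # drain the body so the request completes
        return resp.status
```

For example, `set_model_residency("qwen2.5-coder:32b", "10m")` would load the model and `set_model_residency("qwen2.5-coder:32b", "0")` would unload it; `ollama ps` can confirm the result.
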
### How Many Models Can You Have Downloaded?

As many as your disk allows; only loaded models consume RAM. Approximate disk usage as the library grows:

| Count           | Approx Disk |
| --------------- | ----------- |
| 2 (current)     | ~24 GB      |
| 5 (moderate)    | ~55 GB      |
| 10 (full stack) | ~115 GB     |

---

## OpenAI-Compatible API

Ollama exposes an OpenAI-compatible API that existing OpenAI clients can use as a drop-in endpoint:

```
Base URL: http://localhost:11434/v1
API Key:  ollama  (any non-empty string)
```

### Example: curl

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "Return JSON: {\"hello\": \"world\"}"}],
    "response_format": {"type": "json_object"}
  }'
```

### Example: Node.js (OpenAI SDK)

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama',
});

const res = await client.chat.completions.create({
  model: 'llama3.1:8b',
  messages: [{ role: 'user', content: 'Extract action items from: ...' }],
  response_format: { type: 'json_object' },
});
```

### Example: Python

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Extract action items from: ..."}],
    response_format={"type": "json_object"},
)
```

---

## Native Ollama API

Beyond the OpenAI-compatible endpoint, Ollama has its own API:

| Endpoint          | Method | Purpose                                                            |
| ----------------- | ------ | ------------------------------------------------------------------ |
| `/api/tags`       | GET    | List all downloaded models                                         |
| `/api/ps`         | GET    | List models currently loaded in RAM                                |
| `/api/generate`   | POST   | Generate text (single-turn)                                        |
| `/api/chat`       | POST   | Chat completion (multi-turn)                                       |
| `/api/pull`       | POST   | Download a model                                                   |
| `/api/delete`     | DELETE | Remove a model from disk                                           |
| `/api/show`       | POST   | Show model metadata                                                |
| `/api/embeddings` | POST   | Generate embeddings (legacy; newer releases also expose `/api/embed`) |

Full docs: https://github.com/ollama/ollama/blob/main/docs/api.md

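As an illustration of the two listing endpoints, here is a small Python sketch that fetches `/api/tags` or `/api/ps` and summarizes model sizes. The helper names are hypothetical, and the response shape (`{"models": [{"name": ..., "size": ...}]}` with `size` in bytes) is an assumption to check against the API docs linked above.

```python
import json
import urllib.request
from typing import Any, Dict


def fetch_json(path: str, base_url: str = "http://localhost:11434") -> Dict[str, Any]:
    """GET a native endpoint such as /api/tags or /api/ps and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}{path}", timeout=10) as resp:
        return json.load(resp)


def model_sizes_gb(payload: Dict[str, Any]) -> Dict[str, float]:
    """Map model name -> size in GB (assumed shape: {"models": [{"name", "size"}]})."""
    return {
        m["name"]: round(m["size"] / 1e9, 1)
        for m in payload.get("models", [])
    }
```

With the server running, `model_sizes_gb(fetch_json("/api/ps"))` would list what is resident in RAM, and the same call with `/api/tags` would list what is on disk.
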
---

## Performance on M4 Pro 48 GB

- **MLX warning:** `MLX dynamic library not available` — **harmless**, falls back to Metal/CPU automatically
- **Metal backend:** Fully utilized on Apple Silicon — near-GPU speeds via unified memory
- **Inference speed estimates:**
  - 7B models: ~40-60 tok/s
  - 32B models: ~15-25 tok/s
  - 70B (Q4): ~5-10 tok/s
- **RAM usage (model loaded):**
  - 7B: ~5-6 GB
  - 32B: ~20-22 GB
  - 70B (Q4): ~40-42 GB

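The RAM estimates above allow a quick fit check before loading models. This is only a sketch: the per-class figures are the upper bounds from the list, and the 6 GB reserve for macOS and other apps is a hypothetical value, not something measured in this document.

```python
from typing import List

# Loaded-model RAM estimates copied from the list above (upper bounds, in GB).
RAM_ESTIMATE_GB = {"7B": 6, "32B": 22, "70B-Q4": 42}

TOTAL_RAM_GB = 48       # the M4 Pro described in this document
SYSTEM_RESERVE_GB = 6   # assumption: headroom for the OS and other apps


def fits_together(sizes: List[str]) -> bool:
    """Return True if the listed model classes fit in RAM at the same time."""
    needed = sum(RAM_ESTIMATE_GB[s] for s in sizes)
    return needed <= TOTAL_RAM_GB - SYSTEM_RESERVE_GB
```

Calling `fits_together(["32B", "7B"])` checks whether both installed model classes could stay resident together under these assumptions.
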
### Performance Tuning

```bash
# Enable flash attention (faster, less RAM)
OLLAMA_FLASH_ATTENTION=1 ollama serve

# KV cache quantization (smaller RAM footprint; only takes effect with flash attention on)
OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE=q8_0 ollama serve

# Allow concurrent requests (default: 1)
OLLAMA_NUM_PARALLEL=2 ollama serve
```

113
__LOCAL_LLMs/docs/06-extraction-service-evals.md
Normal file
@@ -0,0 +1,113 @@

# 06 — Extraction Service Evals

> Running the promptfoo eval suite against Ollama for offline, zero-cost model evaluation.

---

## Overview

The extraction-service has a full promptfoo eval suite that can run against local Ollama models instead of (or alongside) cloud Gemini. This enables:

- **Zero-cost iteration** on extraction prompts
- **Side-by-side comparison** of local vs cloud model quality
- **Offline development** when cloud APIs are unavailable

---

## Files

| File                                                      | Purpose                                            |
| --------------------------------------------------------- | -------------------------------------------------- |
| `services/extraction-service/evals/promptfoo.yaml`        | Gemini evals (via extraction-service HTTP API)     |
| `services/extraction-service/evals/promptfoo.ollama.yaml` | Same 19 cases, hits Ollama directly                |
| `services/extraction-service/evals/compare-evals.sh`      | Side-by-side Gemini vs Ollama pass-rate comparison |
| `services/extraction-service/evals/fixtures/golden.json`  | Machine-readable golden fixtures                   |
| `services/extraction-service/evals/README.md`             | Full usage docs                                    |

---

## Running Evals

```bash
cd services/extraction-service

# Ollama only (no extraction-service needed)
pnpm eval:ollama

# Different model
OLLAMA_MODEL=qwen2.5:7b pnpm eval:ollama

# Compare Gemini vs Ollama (needs extraction-service running + EXTRACTION_EVAL_TOKEN)
pnpm eval:compare
```

### Prerequisites

- Ollama must be running (`ollama serve`)
- A model must be available (`ollama pull llama3.1:8b`)
- For comparison: extraction-service must be running with `EXTRACTION_EVAL_TOKEN` set

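The first two prerequisites can be checked before starting a run. Below is a hedged Python sketch: `eval_model`, `model_is_pulled`, and `preflight` are hypothetical helpers that are not part of the eval suite, and the `/api/tags` response shape (`{"models": [{"name": ...}]}`) is an assumption.

```python
import json
import os
import urllib.request
from typing import Any, Dict

DEFAULT_MODEL = "llama3.1:8b"  # default eval model named in this document


def eval_model() -> str:
    """Model the eval run will use: OLLAMA_MODEL if set, else the default."""
    return os.environ.get("OLLAMA_MODEL", DEFAULT_MODEL)


def model_is_pulled(tags_payload: Dict[str, Any], model: str) -> bool:
    """Exact-match a tag such as 'llama3.1:8b' against a decoded /api/tags response.

    Untagged names are stored with an explicit tag (for example ':latest'),
    so pass the full name as shown by `ollama list`.
    """
    names = {m.get("name") for m in tags_payload.get("models", [])}
    return model in names


def preflight(base_url: str = "http://localhost:11434") -> bool:
    """True if the server answers and the eval model is present on disk."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            payload = json.load(resp)
    except OSError:  # connection refused, timeout, DNS failure
        return False
    return model_is_pulled(payload, eval_model())
```
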
---

## Eval Coverage

| Task                    | Cases | Key Assertions                                                  |
| ----------------------- | ----- | --------------------------------------------------------------- |
| `transcript-extraction` | 4     | action_item, deadline, person, decision, question               |
| `triage`                | 5     | brain_signal routing (health/work/money), emotion valence       |
| `memory-insight`        | 4     | pattern frequency, relationship, milestone, recurring_theme     |
| `reflection-enrichment` | 4     | emotional_state valence, accomplishment, concern, goal_progress |
| `bug-report-extraction` | 2     | all 5 fields, severity level attribute                          |

**Total: 19 cases, 50+ assertions**

---

## Important: Assertion Pattern

Ollama returns a raw JSON **string** — assertions must parse it inline:

```yaml
# ✅ Correct: parse the string first (written as a single expression, so no explicit return is needed)
- type: javascript
  value: "JSON.parse(output).extractions.map(e => e.extraction_class).includes('action')"

# ❌ Wrong: output is a string, not an object
- type: javascript
  value: output.classes.includes('action')
```

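The same parse-then-inspect idea, written as a plain Python function for illustration. This is a sketch rather than part of the eval suite: `has_extraction_class` is a hypothetical helper, and the `extractions[].extraction_class` shape is taken from the assertion above.

```python
import json


def has_extraction_class(output: str, wanted: str) -> bool:
    """Parse the raw JSON string first, then look for an extraction_class value."""
    try:
        result = json.loads(output)
    except json.JSONDecodeError:
        return False  # not valid JSON, so the check fails instead of raising
    if not isinstance(result, dict):
        return False
    extractions = result.get("extractions") or []
    classes = [e.get("extraction_class") for e in extractions if isinstance(e, dict)]
    return wanted in classes
```

Treating unparseable output as a failed check, rather than an exception, keeps one malformed reply from aborting the rest of a run.
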
---

## Pointing the Python Sidecar at Ollama

The extraction-service Python sidecar (LangExtract) uses Gemini by default. To switch to Ollama for local dev:

```bash
export LANGEXTRACT_PROVIDER=openai_compat
export LANGEXTRACT_BASE_URL=http://localhost:11434/v1
export LANGEXTRACT_API_KEY=ollama
export LANGEXTRACT_MODEL=llama3.1:8b
```

> Check `services/extraction-service/python/` for exact env var names — the sidecar config may use different keys depending on LangExtract version.

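To make the switch easier to verify, the variables can be read into one small config object. This is a hypothetical sketch, not the sidecar's actual loader: the variable names are the ones listed above (which, as noted, may differ by LangExtract version), and `SidecarConfig`, `load_sidecar_config`, and `targets_ollama` are illustrative names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SidecarConfig:
    provider: str
    base_url: Optional[str]
    api_key: Optional[str]
    model: Optional[str]


def load_sidecar_config(env: Optional[Mapping[str, str]] = None) -> SidecarConfig:
    """Read the LANGEXTRACT_* variables; the provider falls back to 'gemini'."""
    env = os.environ if env is None else env
    return SidecarConfig(
        provider=env.get("LANGEXTRACT_PROVIDER", "gemini"),
        base_url=env.get("LANGEXTRACT_BASE_URL"),
        api_key=env.get("LANGEXTRACT_API_KEY"),
        model=env.get("LANGEXTRACT_MODEL"),
    )


def targets_ollama(cfg: SidecarConfig) -> bool:
    """Heuristic: OpenAI-compatible provider pointed at Ollama's default port."""
    return (
        cfg.provider == "openai_compat"
        and cfg.base_url is not None
        and ":11434" in cfg.base_url
    )
```
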
---

## Cost Comparison

| Provider                            | Cost per full run | Notes                              |
| ----------------------------------- | ----------------- | ---------------------------------- |
| **Gemini** (via extraction-service) | ~$0.003–0.005     | gemini-2.5-flash                   |
| **Ollama** (local)                  | $0.00             | Fully offline after model download |

---

## Recommended Models for Evals

| Model               | JSON Quality | Speed    | Notes                           |
| ------------------- | ------------ | -------- | ------------------------------- |
| `llama3.1:8b`       | Good         | Fast     | Default, reliable JSON output   |
| `qwen2.5:7b`        | Excellent    | Fast     | Best JSON structure compliance  |
| `qwen2.5-coder:32b` | Excellent    | Moderate | Best quality, slower            |
| `phi4`              | Good         | Fast     | Good reasoning for triage tasks |

135
__LOCAL_LLMs/docs/09-environment-variables.md
Normal file
@@ -0,0 +1,135 @@

# 09 — Environment Variables Reference

> All configuration variables for Ollama, Whisper, dashboard, and evals.

---

## Ollama Server

| Variable                   | Default                                  | Purpose                                                                       |
| -------------------------- | ---------------------------------------- | ----------------------------------------------------------------------------- |
| `OLLAMA_HOST`              | `http://127.0.0.1:11434`                 | Server bind address                                                           |
| `OLLAMA_MODELS`            | `~/.ollama/models`                       | Model storage path                                                            |
| `OLLAMA_KEEP_ALIVE`        | `5m`                                     | How long to keep model loaded after last request                              |
| `OLLAMA_FLASH_ATTENTION`   | `false`                                  | Enable flash attention (faster, less RAM)                                     |
| `OLLAMA_KV_CACHE_TYPE`     | `f16`                                    | KV cache quantization (`q8_0` = smaller RAM footprint; needs flash attention) |
| `OLLAMA_NUM_PARALLEL`      | `1`                                      | Number of concurrent requests                                                 |
| `OLLAMA_MAX_LOADED_MODELS` | `3` per GPU (per the Ollama FAQ)         | Max models loaded in RAM simultaneously; set to `1` to force one at a time    |
| `OLLAMA_GPU_OVERHEAD`      | `0`                                      | Reserved GPU memory (bytes)                                                   |
| `OLLAMA_ORIGINS`           | local origins (`127.0.0.1`, `0.0.0.0`)   | Additional allowed CORS origins                                               |
| `OLLAMA_DEBUG`             | `false`                                  | Enable debug logging                                                          |
| `HTTP_PROXY`               | _(system)_                               | HTTP proxy for model downloads                                                |
| `HTTPS_PROXY`              | _(system)_                               | HTTPS proxy for model downloads                                               |
| `NO_PROXY`                 | _(none)_                                 | Hosts to bypass proxy                                                         |

### Performance Tuning Combo

```bash
OLLAMA_FLASH_ATTENTION=1 \
OLLAMA_KV_CACHE_TYPE=q8_0 \
OLLAMA_NUM_PARALLEL=2 \
OLLAMA_KEEP_ALIVE=10m \
ollama serve
```

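When the server is launched from a script rather than a shell, the same combination can be passed as a process environment. A minimal Python sketch, assuming `ollama` is on `PATH`; `TUNING`, `tuned_env`, and `start_server` are hypothetical names, and the values simply mirror the combo above.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional

# Values mirror the tuning combo above.
TUNING = {
    "OLLAMA_FLASH_ATTENTION": "1",
    "OLLAMA_KV_CACHE_TYPE": "q8_0",
    "OLLAMA_NUM_PARALLEL": "2",
    "OLLAMA_KEEP_ALIVE": "10m",
}


def tuned_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the base environment and overlay the tuning variables."""
    env = dict(os.environ if base is None else base)
    env.update(TUNING)
    return env


def start_server() -> subprocess.Popen:
    """Launch `ollama serve` with the tuned environment (assumes ollama is on PATH)."""
    return subprocess.Popen(["ollama", "serve"], env=tuned_env())
```

Copying the parent environment first keeps `PATH` and any proxy variables intact, so only the four tuning values change.
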
---

## Extraction Service Evals (promptfoo)

| Variable                | Default                     | Purpose                                 |
| ----------------------- | --------------------------- | --------------------------------------- |
| `OLLAMA_MODEL`          | `llama3.1:8b`               | Model used by `pnpm eval:ollama`        |
| `OLLAMA_BASE_URL`       | `http://localhost:11434/v1` | OpenAI-compat endpoint for promptfoo    |
| `EXTRACTION_EVAL_TOKEN` | _(none)_                    | Auth token for extraction-service evals |

### Usage

```bash
# Run evals with a different model
OLLAMA_MODEL=qwen2.5:7b pnpm eval:ollama

# Compare Gemini vs Ollama
EXTRACTION_EVAL_TOKEN=your-token pnpm eval:compare
```

---

## Python Sidecar (LangExtract)

| Variable               | Default          | Purpose                                       |
| ---------------------- | ---------------- | --------------------------------------------- |
| `LANGEXTRACT_PROVIDER` | `gemini`         | Switch to `openai_compat` for Ollama          |
| `LANGEXTRACT_BASE_URL` | _(Gemini)_       | Set to `http://localhost:11434/v1` for Ollama |
| `LANGEXTRACT_API_KEY`  | _(Gemini key)_   | Set to `ollama` for local                     |
| `LANGEXTRACT_MODEL`    | _(Gemini model)_ | Set to `llama3.1:8b` or preferred model       |

### Switch to Ollama

```bash
export LANGEXTRACT_PROVIDER=openai_compat
export LANGEXTRACT_BASE_URL=http://localhost:11434/v1
export LANGEXTRACT_API_KEY=ollama
export LANGEXTRACT_MODEL=llama3.1:8b
```

---

## Mission Control Dashboard

| Variable     | Default                  | Purpose                                |
| ------------ | ------------------------ | -------------------------------------- |
| `OLLAMA_URL` | `http://localhost:11434` | Ollama server URL (used by API routes) |
| `PORT`       | `3100`                   | Dashboard dev server port              |

### Start with Custom Ollama URL

```bash
OLLAMA_URL=http://192.168.1.100:11434 npm run dev -- -p 3100
```

---

## Whisper.cpp

Whisper.cpp uses CLI flags rather than environment variables:

| Flag              | Purpose                       | Example                                            |
| ----------------- | ----------------------------- | -------------------------------------------------- |
| `--model`         | Path to GGML model file       | `--model ~/whisper-models/ggml-large-v3-turbo.bin` |
| `--language`      | Input language                | `--language en`                                    |
| `--file`          | Audio file path               | `--file recording.wav`                             |
| `--output-json`   | Output in JSON format         | `--output-json`                                    |
| `--output-srt`    | Output as SRT subtitles       | `--output-srt`                                     |
| `--output-vtt`    | Output as VTT subtitles       | `--output-vtt`                                     |
| `--translate`     | Translate to English          | `--translate`                                      |
| `--threads`       | Number of CPU threads         | `--threads 8`                                      |
| `--processors`    | Number of processors          | `--processors 1`                                   |
| `--print-colors`  | Colorize output by confidence | `--print-colors`                                   |
| `--no-timestamps` | Omit timestamps               | `--no-timestamps`                                  |
| `--port`          | Server port (whisper-server)  | `--port 8080`                                      |

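Because configuration is flag-based, a small helper can assemble the command line from the flags in the table. The sketch below is hypothetical (`build_whisper_cmd` is not part of this repository), assumes the `whisper-cli` binary is on `PATH`, and uses only flags listed above.

```python
import os
from typing import List


def build_whisper_cmd(
    model: str,
    audio_file: str,
    language: str = "en",
    threads: int = 8,
    output_json: bool = True,
    translate: bool = False,
) -> List[str]:
    """Assemble a whisper-cli argument list from the flags in the table above."""
    cmd = [
        "whisper-cli",
        "--model", os.path.expanduser(model),  # expand ~ since no shell is involved
        "--language", language,
        "--file", audio_file,
        "--threads", str(threads),
    ]
    if output_json:
        cmd.append("--output-json")
    if translate:
        cmd.append("--translate")
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)`; passing a list instead of a single string avoids shell quoting issues with file names.
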
---

## Proxy / Network (Corporate)

| Variable                       | Value on This Machine            | Purpose                                                                                         |
| ------------------------------ | -------------------------------- | ----------------------------------------------------------------------------------------------- |
| `HTTP_PROXY`                   | `http://cso.proxy.att.com:8080/` | Corporate HTTP proxy                                                                            |
| `HTTPS_PROXY`                  | `http://cso.proxy.att.com:8080/` | Corporate HTTPS proxy                                                                           |
| `NODE_TLS_REJECT_UNAUTHORIZED` | `0`                              | Disables TLS certificate verification in Node.js to get past Forcepoint SSL interception (insecure; local dev only) |
| `NO_PROXY`                     | _(not set by default)_           | Add `ollama.com,registry.ollama.ai` if pulls fail                                               |

---

## All Paths

| Path                                 | Content                     |
| ------------------------------------ | --------------------------- |
| `~/.ollama/models/`                  | Downloaded Ollama models    |
| `~/whisper-models/`                  | Whisper GGML model files    |
| `/opt/homebrew/bin/ollama`           | Ollama binary               |
| `/opt/homebrew/bin/whisper-cli`      | Whisper CLI binary          |
| `/opt/homebrew/bin/ffmpeg`           | FFmpeg binary               |
| `__LOCAL_LLMs/dashboard/`            | Mission Control Next.js app |
| `__LOCAL_LLMs/docs/`                 | This documentation          |
| `services/extraction-service/evals/` | Promptfoo eval configs      |