docs(local-llms): add comprehensive model comparison table and deepseek-r1:32b details

- Add Comprehensive Model Comparison Table: 11 models (local + cloud) with Disk, Params, Quant, RAM, Tok/s, JSON quality, Reasoning, Code, Instruction Following, Context window, <think> flag, and install status columns
- Add Gap Analysis table: llama3.1:8b (~55%), qwen2.5-coder:32b (~85%), deepseek-r1:32b (~75-80%) vs llama3.3:70b across 5 capability dimensions
- Update Tier 4 Reasoning table: add Parameters, Quant columns; add <think> warning note with link to eval doc transform pattern
- Update By Use Case table: add brain signal routing row, update extraction evals fallback to qwen2.5-coder:32b
parent cfc1194079
commit 5deb5efdcf
@@ -31,10 +31,12 @@
 
 ## Tier 4 — Reasoning & Deep Thinking
 
-| Model | Size | RAM Used | Pull Command | Notes |
-| --------------------- | ----- | -------- | ----------------------------- | ------------------------------------------------ |
-| **`deepseek-r1:32b`** | 20 GB | ~22 GB | `ollama pull deepseek-r1:32b` | Chain-of-thought reasoning, closest to Kimi k1.5 |
-| `deepseek-r1:7b` | 5 GB | ~6 GB | `ollama pull deepseek-r1:7b` | Lightweight reasoning |
+| Model | Size | Parameters | Quant | RAM Used | Pull Command | Notes |
+| --------------------- | ----- | ---------- | ------ | -------- | ----------------------------- | --------------------------------------------------------------------------------------------------------------------------------- |
+| **`deepseek-r1:32b`** | 20 GB | 32B | Q4_K_M | ~22 GB | `ollama pull deepseek-r1:32b` | Chain-of-thought reasoning — emits `<think>` blocks before JSON output; ~75–80% of llama3.3:70b reasoning quality at half the RAM |
+| `deepseek-r1:7b` | 5 GB | 7B | Q4_K_M | ~6 GB | `ollama pull deepseek-r1:7b` | Lightweight reasoning, good for quick triage |
+
+> **⚠️ JSON output note:** DeepSeek R1 models emit `<think>...</think>` reasoning traces before the JSON response. Strip these before `JSON.parse()` — see [06-extraction-service-evals.md](06-extraction-service-evals.md) for the transform pattern.
 
 ## Tier 5 — Vision (Multimodal)
 
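The JSON output note in the hunk above says to strip `<think>...</think>` traces before calling `JSON.parse()`. The following is a minimal sketch of one way to do that, not the transform from `06-extraction-service-evals.md`, which this page does not show. The helper names `stripThinkBlocks` and `parseModelJson` and the sample response are hypothetical, and the sketch assumes every `<think>` tag is properly closed.

```typescript
// Hypothetical helpers: illustrative only, not the eval doc's actual transform.

// Remove every <think>...</think> span, including newlines inside it.
// Assumes the tags are balanced; an unclosed <think> is left untouched.
function stripThinkBlocks(raw: string): string {
  return raw.replace(/<think>[\s\S]*?<\/think>/g, "").trim();
}

// Strip reasoning traces, then parse what remains as JSON.
// JSON.parse still throws if the remainder is not valid JSON.
function parseModelJson(raw: string): unknown {
  return JSON.parse(stripThinkBlocks(raw));
}

// Made-up response in the shape the note describes: trace first, JSON after.
const sample =
  '<think>\nThe user wants a label.\n</think>\n{"label": "triage", "score": 0.9}';

console.log(parseModelJson(sample));
```

One limitation of a plain regex strip: a JSON string value that itself contains a literal `<think>...</think>` pair would also be removed, so a stricter variant might strip only a leading trace.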
@@ -78,17 +80,62 @@ Only one loads into RAM at a time. You can have all 10 on disk simultaneously.
 
 ## By Use Case (Quick Reference)
 
-| Use Case | Best Model | Fallback |
-| -------------------------- | ------------------- | ----------------------- |
-| **TypeScript/ESM coding** | `qwen2.5-coder:32b` | `qwen2.5-coder:7b` |
-| **Python coding** | `qwen2.5-coder:32b` | `deepseek-coder-v2:16b` |
-| **Swift/iOS coding** | `qwen2.5-coder:32b` | `codestral:22b` |
-| **Extraction evals** | `llama3.1:8b` | `qwen2.5:7b` |
-| **JSON structured output** | `qwen2.5:7b` | `qwen2.5-coder:7b` |
-| **Complex reasoning** | `deepseek-r1:32b` | `phi4:14b` |
-| **Image understanding** | `llava:34b` | `qwen2.5vl:7b` |
-| **Embeddings** | `nomic-embed-text` | `mxbai-embed-large` |
-| **Fast iteration** | `qwen2.5-coder:7b` | `llama3.1:8b` |
+| Use Case | Best Model | Fallback |
+| ------------------------------ | ------------------- | ----------------------- |
+| **TypeScript/ESM coding** | `qwen2.5-coder:32b` | `qwen2.5-coder:7b` |
+| **Python coding** | `qwen2.5-coder:32b` | `deepseek-coder-v2:16b` |
+| **Swift/iOS coding** | `qwen2.5-coder:32b` | `codestral:22b` |
+| **Extraction evals** | `llama3.1:8b` | `qwen2.5-coder:32b` |
+| **JSON structured output** | `qwen2.5-coder:32b` | `qwen2.5:7b` |
+| **Complex reasoning / triage** | `deepseek-r1:32b` | `phi4:14b` |
+| **Brain signal routing** | `deepseek-r1:32b` | `qwen2.5-coder:32b` |
+| **Image understanding** | `llava:34b` | `qwen2.5vl:7b` |
+| **Embeddings** | `nomic-embed-text` | `mxbai-embed-large` |
+| **Fast iteration / dev evals** | `llama3.1:8b` | `qwen2.5-coder:7b` |
+
+---
+
+## Comprehensive Model Comparison Table
+
+All models discussed — detailed capability reference for M4 Pro 48 GB:
+
+| Model | Disk | Params | Quant | RAM | Tok/s | JSON | Reasoning | Code | Instruction Following | Context | `<think>` | Status on this machine |
+| ----------------------- | ------- | ------ | ------ | ------ | --------- | -------------- | ---------- | ---------- | --------------------- | ------- | --------- | -------------------------- |
+| `llama3.1:8b` | 4.9 GB | 8B | Q4_K_M | ~6 GB | 40–60 | ✅ Good | ⭐⭐ | ⭐⭐ | ⭐⭐⭐ | 128k | ❌ | ✅ Installed |
+| `qwen2.5-coder:32b` | 18.5 GB | 32.8B | Q4_K_M | ~22 GB | 15–25 | ✅ Excellent | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 128k | ❌ | ✅ Installed |
+| `deepseek-r1:32b` | 20 GB | 32B | Q4_K_M | ~22 GB | 12–20 | ⚠️ Needs strip | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 128k | ✅ Yes | 🔲 Not installed |
+| `llama3.3:70b` (Q4) | 40 GB | 70B | Q4_K_M | ~42 GB | 5–10 | ✅ Excellent | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 128k | ❌ | ⚠️ Tight (6GB left for OS) |
+| `qwen2.5:7b` | 5 GB | 7B | Q4_K_M | ~6 GB | 40–60 | ✅ Excellent | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | 128k | ❌ | 🔲 Not installed |
+| `deepseek-r1:7b` | 5 GB | 7B | Q4_K_M | ~6 GB | 35–50 | ⚠️ Needs strip | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | 128k | ✅ Yes | 🔲 Not installed |
+| `phi4:14b` | 9 GB | 14B | Q4_K_M | ~11 GB | 25–35 | ✅ Good | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | 16k | ❌ | 🔲 Not installed |
+| `deepseek-coder-v2:16b` | 10 GB | 16B | Q4_K_M | ~12 GB | 25–35 | ✅ Good | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 128k | ❌ | 🔲 Not installed |
+| `codestral:22b` | 13 GB | 22B | Q4_K_M | ~15 GB | 20–30 | ✅ Good | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 32k | ❌ | 🔲 Not installed |
+| `gemini-2.5-flash` | — | — | Cloud | — | ~1s/req | ✅ Excellent | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 1M | ❌ | ☁️ Cloud ($0.003/run) |
+| `gpt-4o` | — | — | Cloud | — | ~1–2s/req | ✅ Excellent | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 128k | ❌ | ☁️ Cloud ($0.05–0.15/run) |
+
+### Column Key
+
+| Column | Meaning |
+| ------------------------- | -------------------------------------------------------------------- |
+| **Tok/s** | Tokens per second on M4 Pro 48 GB (Metal backend) |
+| **JSON** | Reliability of structured JSON output compliance |
+| **Reasoning** | Multi-step / chain-of-thought quality (⭐ = weak, ⭐⭐⭐⭐⭐ = best) |
+| **Code** | Code generation quality across TS/Python/Swift |
+| **Instruction Following** | Adherence to output format constraints |
+| **`<think>`** | Emits reasoning traces before output (needs stripping for JSON) |
+
+### Gap Analysis vs llama3.3:70b (cloud-quality ceiling locally)
+
+| Gap | `llama3.1:8b` | `qwen2.5-coder:32b` | `deepseek-r1:32b` |
+| ---------------------- | :-----------: | :-----------------: | :---------------: |
+| Multi-step reasoning | ~40% | ~65% | ~80% |
+| Strict JSON compliance | ~75% | ~95% | ~70%\* |
+| Brain signal routing | ~60% | ~80% | ~90% |
+| Code generation | ~55% | ~95% | ~80% |
+| Instruction following | ~70% | ~90% | ~85% |
+| **Overall vs 70B** | **~55%** | **~85%** | **~75–80%** |
+
+\*With `<think>` strip transform applied
 
 ---
 
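The By Use Case table in the diff pairs each task with a best model and a fallback, and the comparison table records which models are installed. A small lookup that prefers the best model and falls back when it is missing could be sketched as below. The use-case keys, the `ROUTES` map, and `pickModel` are hypothetical names chosen for illustration; only the model tags and the installed set come from the tables.

```typescript
// Hypothetical routing sketch: names and key slugs are assumptions,
// model tags are copied from the new side of the "By Use Case" table.
type Route = { best: string; fallback: string };

const ROUTES: Record<string, Route> = {
  "extraction-evals": { best: "llama3.1:8b", fallback: "qwen2.5-coder:32b" },
  "json-structured-output": { best: "qwen2.5-coder:32b", fallback: "qwen2.5:7b" },
  "complex-reasoning": { best: "deepseek-r1:32b", fallback: "phi4:14b" },
  "brain-signal-routing": { best: "deepseek-r1:32b", fallback: "qwen2.5-coder:32b" },
};

// Models marked "Installed" in the comparison table.
const INSTALLED = new Set<string>(["llama3.1:8b", "qwen2.5-coder:32b"]);

// Return the best model if installed, else the fallback if installed,
// else undefined (unknown use case or neither model available).
function pickModel(useCase: string, installed: Set<string>): string | undefined {
  const route = ROUTES[useCase];
  if (!route) return undefined;
  if (installed.has(route.best)) return route.best;
  if (installed.has(route.fallback)) return route.fallback;
  return undefined;
}

console.log(pickModel("brain-signal-routing", INSTALLED)); // qwen2.5-coder:32b
console.log(pickModel("complex-reasoning", INSTALLED)); // undefined
```

With the install status shown in the table, brain signal routing resolves to the fallback because `deepseek-r1:32b` is not yet pulled, and complex reasoning resolves to nothing until either `deepseek-r1:32b` or `phi4:14b` is installed.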