# 07 — Model Recommendations > Tiered model guide by use case, size, and quality for Apple M4 Pro with 48 GB unified memory. --- ## Tier 1 — Best Overall Coding Models | Model | Size | RAM Used | Pull Command | Notes | | ----------------------- | ----- | -------- | ------------------------------- | -------------------------------------------------- | | **`qwen2.5-coder:32b`** | 19 GB | ~22 GB | `ollama pull qwen2.5-coder:32b` | **Top pick** — rivals GPT-4o on code, 128k context | | `deepseek-coder-v2:16b` | 10 GB | ~12 GB | `ollama pull deepseek-coder-v2` | Best open-source coding model at 16B | | `codestral:22b` | 13 GB | ~15 GB | `ollama pull codestral` | Mistral's coding model, very fast completions | ## Tier 2 — Fast & Capable (Speed/Quality Balance) | Model | Size | RAM Used | Pull Command | Notes | | ---------------------- | ---- | -------- | --------------------------------- | --------------------------------------------- | | **`qwen2.5-coder:7b`** | 5 GB | ~6 GB | `ollama pull qwen2.5-coder:7b` | Fast, surprisingly good for TS/Python/Swift | | `deepseek-coder:6.7b` | 4 GB | ~5 GB | `ollama pull deepseek-coder:6.7b` | Lightweight, solid everyday coding | | `codegemma:7b` | 5 GB | ~6 GB | `ollama pull codegemma:7b` | Google's model, decent but outclassed by Qwen | ## Tier 3 — General Purpose (Also Good at Code) | Model | Size | RAM Used | Pull Command | Notes | | ------------------- | ------ | -------- | -------------------------- | ----------------------------------- | | `llama3.1:70b` (Q4) | 40 GB | ~42 GB | `ollama pull llama3.1:70b` | Best general model — tight on 48 GB | | `llama3.1:8b` | 4.9 GB | ~6 GB | `ollama pull llama3.1:8b` | Fast, good for evals | | `mistral-nemo:12b` | 7 GB | ~9 GB | `ollama pull mistral-nemo` | Fast reasoning | | `phi4:14b` | 9 GB | ~11 GB | `ollama pull phi4` | Strong reasoning, fits comfortably | ## Tier 4 — Reasoning & Deep Thinking | Model | Size | Parameters | Quant | RAM Used | Pull Command | Notes | | --------------------- | ----- | ---------- | ------ | -------- | ----------------------------- | --------------------------------------------------------------------------------------------------------------------------------- | | **`deepseek-r1:32b`** | 20 GB | 32B | Q4_K_M | ~22 GB | `ollama pull deepseek-r1:32b` | Chain-of-thought reasoning — emits `` blocks before JSON output; ~75–80% of llama3.3:70b reasoning quality at half the RAM | | `deepseek-r1:7b` | 5 GB | 7B | Q4_K_M | ~6 GB | `ollama pull deepseek-r1:7b` | Lightweight reasoning, good for quick triage | > **⚠️ JSON output note:** DeepSeek R1 models emit `...` reasoning traces before the JSON response. Strip these before `JSON.parse()` — see [06-extraction-service-evals.md](06-extraction-service-evals.md) for the transform pattern. ## Tier 5 — Vision (Multimodal) | Model | Size | RAM Used | Pull Command | Notes | | -------------- | ----- | -------- | -------------------------- | ------------------------ | | `llava:34b` | 22 GB | ~22 GB | `ollama pull llava:34b` | Image understanding, OCR | | `qwen2.5vl:7b` | 6 GB | ~6 GB | `ollama pull qwen2.5vl:7b` | Qwen vision, fast | | `minicpm-v:8b` | 6 GB | ~6 GB | `ollama pull minicpm-v:8b` | Strong OCR | | `moondream2` | 2 GB | ~2 GB | `ollama pull moondream2` | Tiny, basic vision | ## Tier 6 — Embeddings | Model | Size | RAM Used | Pull Command | Notes | | ------------------- | ------ | -------- | ------------------------------- | ------------------------- | | `nomic-embed-text` | 0.3 GB | ~0.5 GB | `ollama pull nomic-embed-text` | Good for semantic search | | `mxbai-embed-large` | 0.7 GB | ~1 GB | `ollama pull mxbai-embed-large` | Higher quality embeddings | --- ## Recommended 10-Model Stack for M4 Pro 48 GB For maximum coverage across all use cases: | # | Model | Disk | Use Case | | --- | ----------------------- | ----------- | ---------------------------------------- | | 1 | `qwen2.5-coder:32b` | 19 GB | **Primary** — coding (TS, Python, Swift) | | 2 | `qwen2.5-coder:7b` | 5 GB | Fast coding completions | | 3 | `deepseek-coder-v2:16b` | 10 GB | Alternative coding model | | 4 | `llama3.1:8b` | 4.9 GB | Eval default, general tasks | | 5 | `deepseek-r1:32b` | 20 GB | Deep reasoning, complex triage | | 6 | `codestral:22b` | 13 GB | Fast code completions (Mistral) | | 7 | `phi4:14b` | 9 GB | Reasoning, structured output | | 8 | `llava:34b` | 22 GB | Vision / image understanding | | 9 | `mistral-nemo:12b` | 7 GB | Fast general purpose | | 10 | `nomic-embed-text` | 0.3 GB | Embeddings / semantic search | | | **Total** | **~115 GB** | | Only one loads into RAM at a time. You can have all 10 on disk simultaneously. --- ## By Use Case (Quick Reference) | Use Case | Best Model | Fallback | | ------------------------------ | ------------------- | ----------------------- | | **TypeScript/ESM coding** | `qwen2.5-coder:32b` | `qwen2.5-coder:7b` | | **Python coding** | `qwen2.5-coder:32b` | `deepseek-coder-v2:16b` | | **Swift/iOS coding** | `qwen2.5-coder:32b` | `codestral:22b` | | **Extraction evals** | `llama3.1:8b` | `qwen2.5-coder:32b` | | **JSON structured output** | `qwen2.5-coder:32b` | `qwen2.5:7b` | | **Complex reasoning / triage** | `deepseek-r1:32b` | `phi4:14b` | | **Brain signal routing** | `deepseek-r1:32b` | `qwen2.5-coder:32b` | | **Image understanding** | `llava:34b` | `qwen2.5vl:7b` | | **Embeddings** | `nomic-embed-text` | `mxbai-embed-large` | | **Fast iteration / dev evals** | `llama3.1:8b` | `qwen2.5-coder:7b` | --- ## Comprehensive Model Comparison Table All models discussed — detailed capability reference for M4 Pro 48 GB: | Model | Disk | Params | Quant | RAM | Tok/s | JSON | Reasoning | Code | Instruction Following | Context | `` | Status on this machine | | ----------------------- | ------- | ------ | ------ | ------ | --------- | -------------- | ---------- | ---------- | --------------------- | ------- | --------- | -------------------------- | | `llama3.1:8b` | 4.9 GB | 8B | Q4_K_M | ~6 GB | 40–60 | ✅ Good | ⭐⭐ | ⭐⭐ | ⭐⭐⭐ | 128k | ❌ | ✅ Installed | | `qwen2.5-coder:32b` | 18.5 GB | 32.8B | Q4_K_M | ~22 GB | 15–25 | ✅ Excellent | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 128k | ❌ | ✅ Installed | | `deepseek-r1:32b` | 20 GB | 32B | Q4_K_M | ~22 GB | 12–20 | ⚠️ Needs strip | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 128k | ✅ Yes | 🔲 Not installed | | `llama3.3:70b` (Q4) | 40 GB | 70B | Q4_K_M | ~42 GB | 5–10 | ✅ Excellent | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 128k | ❌ | ⚠️ Tight (6GB left for OS) | | `qwen2.5:7b` | 5 GB | 7B | Q4_K_M | ~6 GB | 40–60 | ✅ Excellent | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | 128k | ❌ | 🔲 Not installed | | `deepseek-r1:7b` | 5 GB | 7B | Q4_K_M | ~6 GB | 35–50 | ⚠️ Needs strip | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | 128k | ✅ Yes | 🔲 Not installed | | `phi4:14b` | 9 GB | 14B | Q4_K_M | ~11 GB | 25–35 | ✅ Good | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | 16k | ❌ | 🔲 Not installed | | `deepseek-coder-v2:16b` | 10 GB | 16B | Q4_K_M | ~12 GB | 25–35 | ✅ Good | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 128k | ❌ | 🔲 Not installed | | `codestral:22b` | 13 GB | 22B | Q4_K_M | ~15 GB | 20–30 | ✅ Good | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 32k | ❌ | 🔲 Not installed | | `gemini-2.5-flash` | — | — | Cloud | — | ~1s/req | ✅ Excellent | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 1M | ❌ | ☁️ Cloud ($0.003/run) | | `gpt-4o` | — | — | Cloud | — | ~1–2s/req | ✅ Excellent | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 128k | ❌ | ☁️ Cloud ($0.05–0.15/run) | ### Column Key | Column | Meaning | | ------------------------- | -------------------------------------------------------------------- | | **Tok/s** | Tokens per second on M4 Pro 48 GB (Metal backend) | | **JSON** | Reliability of structured JSON output compliance | | **Reasoning** | Multi-step / chain-of-thought quality (⭐ = weak, ⭐⭐⭐⭐⭐ = best) | | **Code** | Code generation quality across TS/Python/Swift | | **Instruction Following** | Adherence to output format constraints | | **``** | Emits reasoning traces before output (needs stripping for JSON) | ### Gap Analysis vs llama3.3:70b (cloud-quality ceiling locally) | Gap | `llama3.1:8b` | `qwen2.5-coder:32b` | `deepseek-r1:32b` | | ---------------------- | :-----------: | :-----------------: | :---------------: | | Multi-step reasoning | ~40% | ~65% | ~80% | | Strict JSON compliance | ~75% | ~95% | ~70%\* | | Brain signal routing | ~60% | ~80% | ~90% | | Code generation | ~55% | ~95% | ~80% | | Instruction following | ~70% | ~90% | ~85% | | **Overall vs 70B** | **~55%** | **~85%** | **~75–80%** | \*With `` strip transform applied --- ## Hardware Guide (General) For reference if running on different hardware: | RAM | Max Model Size | Recommendation | | ------ | -------------- | ------------------------------------- | | 8 GB | 7B | `qwen2.5-coder:7b` | | 16 GB | 13-16B | `deepseek-coder-v2:16b` | | 24 GB | 32B | `qwen2.5-coder:32b` | | 32 GB | 32B + headroom | `qwen2.5-coder:32b` (comfortable) | | 48 GB | 70B (Q4) | `llama3.1:70b` or any 32B comfortably | | 64 GB+ | 70B (Q8) | Full precision 70B models |