# 07 — Model Recommendations

> Tiered model guide by use case, size, and quality for Apple M4 Pro with 48 GB unified memory.

---

## Tier 1 — Best Overall Coding Models

| Model | Size | RAM Used | Pull Command | Notes |
| ----------------------- | ----- | -------- | ------------------------------- | -------------------------------------------------- |
| **`qwen2.5-coder:32b`** | 19 GB | ~22 GB | `ollama pull qwen2.5-coder:32b` | **Top pick** — rivals GPT-4o on code, 128k context |
| `deepseek-coder-v2:16b` | 10 GB | ~12 GB | `ollama pull deepseek-coder-v2` | Best open-source coding model at 16B |
| `codestral:22b` | 13 GB | ~15 GB | `ollama pull codestral` | Mistral's coding model, very fast completions |

## Tier 2 — Fast & Capable (Speed/Quality Balance)

| Model | Size | RAM Used | Pull Command | Notes |
| ---------------------- | ---- | -------- | --------------------------------- | --------------------------------------------- |
| **`qwen2.5-coder:7b`** | 5 GB | ~6 GB | `ollama pull qwen2.5-coder:7b` | Fast, surprisingly good for TS/Python/Swift |
| `deepseek-coder:6.7b` | 4 GB | ~5 GB | `ollama pull deepseek-coder:6.7b` | Lightweight, solid everyday coding |
| `codegemma:7b` | 5 GB | ~6 GB | `ollama pull codegemma:7b` | Google's model, decent but outclassed by Qwen |

## Tier 3 — General Purpose (Also Good at Code)

| Model | Size | RAM Used | Pull Command | Notes |
| ------------------- | ------ | -------- | -------------------------- | ----------------------------------- |
| `llama3.1:70b` (Q4) | 40 GB | ~42 GB | `ollama pull llama3.1:70b` | Best general model — tight on 48 GB |
| `llama3.1:8b` | 4.9 GB | ~6 GB | `ollama pull llama3.1:8b` | Fast, good for evals |
| `mistral-nemo:12b` | 7 GB | ~9 GB | `ollama pull mistral-nemo` | Fast reasoning |
| `phi4:14b` | 9 GB | ~11 GB | `ollama pull phi4` | Strong reasoning, fits comfortably |

## Tier 4 — Reasoning & Deep Thinking

| Model | Size | RAM Used | Pull Command | Notes |
| --------------------- | ----- | -------- | ----------------------------- | ------------------------------------------------ |
| **`deepseek-r1:32b`** | 20 GB | ~22 GB | `ollama pull deepseek-r1:32b` | Chain-of-thought reasoning, closest to Kimi k1.5 |
| `deepseek-r1:7b` | 5 GB | ~6 GB | `ollama pull deepseek-r1:7b` | Lightweight reasoning |

## Tier 5 — Vision (Multimodal)

| Model | Size | RAM Used | Pull Command | Notes |
| -------------- | ----- | -------- | -------------------------- | ------------------------ |
| `llava:34b` | 22 GB | ~22 GB | `ollama pull llava:34b` | Image understanding, OCR |
| `qwen2.5vl:7b` | 6 GB | ~6 GB | `ollama pull qwen2.5vl:7b` | Qwen vision, fast |
| `minicpm-v:8b` | 6 GB | ~6 GB | `ollama pull minicpm-v:8b` | Strong OCR |
| `moondream` | 2 GB | ~2 GB | `ollama pull moondream` | Tiny (moondream2), basic vision |

## Tier 6 — Embeddings

| Model | Size | RAM Used | Pull Command | Notes |
| ------------------- | ------ | -------- | ------------------------------- | ------------------------- |
| `nomic-embed-text` | 0.3 GB | ~0.5 GB | `ollama pull nomic-embed-text` | Good for semantic search |
| `mxbai-embed-large` | 0.7 GB | ~1 GB | `ollama pull mxbai-embed-large` | Higher quality embeddings |

---

## Recommended 10-Model Stack for M4 Pro 48 GB

For maximum coverage across all use cases:

| # | Model | Disk | Use Case |
| --- | ----------------------- | ----------- | ---------------------------------------- |
| 1 | `qwen2.5-coder:32b` | 19 GB | **Primary** — coding (TS, Python, Swift) |
| 2 | `qwen2.5-coder:7b` | 5 GB | Fast coding completions |
| 3 | `deepseek-coder-v2:16b` | 10 GB | Alternative coding model |
| 4 | `llama3.1:8b` | 4.9 GB | Eval default, general tasks |
| 5 | `deepseek-r1:32b` | 20 GB | Deep reasoning, complex triage |
| 6 | `codestral:22b` | 13 GB | Fast code completions (Mistral) |
| 7 | `phi4:14b` | 9 GB | Reasoning, structured output |
| 8 | `llava:34b` | 22 GB | Vision / image understanding |
| 9 | `mistral-nemo:12b` | 7 GB | Fast general purpose |
| 10 | `nomic-embed-text` | 0.3 GB | Embeddings / semantic search |
| | **Total** | **~110 GB** | |

All 10 can sit on disk at once. Ollama loads a model into memory only when it is used and unloads it after it sits idle (5 minutes by default), so on 48 GB plan on one large (32B-class) model being resident at a time.

---

## By Use Case (Quick Reference)

| Use Case | Best Model | Fallback |
| -------------------------- | ------------------- | ----------------------- |
| **TypeScript/ESM coding** | `qwen2.5-coder:32b` | `qwen2.5-coder:7b` |
| **Python coding** | `qwen2.5-coder:32b` | `deepseek-coder-v2:16b` |
| **Swift/iOS coding** | `qwen2.5-coder:32b` | `codestral:22b` |
| **Extraction evals** | `llama3.1:8b` | `qwen2.5:7b` |
| **JSON structured output** | `qwen2.5:7b` | `qwen2.5-coder:7b` |
| **Complex reasoning** | `deepseek-r1:32b` | `phi4:14b` |
| **Image understanding** | `llava:34b` | `qwen2.5vl:7b` |
| **Embeddings** | `nomic-embed-text` | `mxbai-embed-large` |
| **Fast iteration** | `qwen2.5-coder:7b` | `llama3.1:8b` |

---

## Hardware Guide (General)

For reference if running on different hardware:

| RAM | Max Model Size | Recommendation |
| ------ | -------------- | ------------------------------------- |
| 8 GB | 7B | `qwen2.5-coder:7b` |
| 16 GB | 13-16B | `deepseek-coder-v2:16b` |
| 24 GB | 32B | `qwen2.5-coder:32b` |
| 32 GB | 32B + headroom | `qwen2.5-coder:32b` (comfortable) |
| 48 GB | 70B (Q4) | `llama3.1:70b` or any 32B comfortably |
| 64 GB+ | 70B (Q4/Q5) | 70B models with headroom; a Q8 70B (~75 GB of weights) needs more than 64 GB |
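The stack total and the "fits on 48 GB" judgments in these tables come down to simple arithmetic. Below is a minimal Python sketch of that math, not an Ollama tool. It assumes weights dominate memory use (it ignores the KV cache, which grows with context length) and uses approximate bits-per-weight figures (about 4.85 for Q4_K_M, 8.5 for Q8_0). The helper names and the 8 GB default headroom for macOS and other apps are hypothetical.

```python
"""Back-of-the-envelope memory math for the model tables (a sketch, not an Ollama tool)."""

# Disk sizes in GB, copied from the "Recommended 10-Model Stack" table.
STACK_DISK_GB: dict[str, float] = {
    "qwen2.5-coder:32b": 19,
    "qwen2.5-coder:7b": 5,
    "deepseek-coder-v2:16b": 10,
    "llama3.1:8b": 4.9,
    "deepseek-r1:32b": 20,
    "codestral:22b": 13,
    "phi4:14b": 9,
    "llava:34b": 22,
    "mistral-nemo:12b": 7,
    "nomic-embed-text": 0.3,
}

# Approximate bits per weight for common GGUF quantizations (assumed values).
BITS_PER_WEIGHT: dict[str, float] = {"Q4_K_M": 4.85, "Q8_0": 8.5, "F16": 16.0}


def stack_total_gb(sizes: dict[str, float]) -> float:
    """Total disk footprint if every model in the stack is pulled."""
    return round(sum(sizes.values()), 1)


def estimate_weights_gb(params_billion: float, quant: str) -> float:
    """Weights only: parameters x bits per weight / 8 bits per byte.

    Ignores the KV cache and runtime overhead, so real RAM use is higher.
    """
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def fits(ram_used_gb: float, total_ram_gb: float, reserved_gb: float = 8.0) -> bool:
    """True if a model's RAM use fits after reserving headroom for the OS and apps.

    reserved_gb is a hypothetical default; tune it for your own workload.
    """
    return ram_used_gb <= total_ram_gb - reserved_gb


if __name__ == "__main__":
    print(f"Stack on disk: {stack_total_gb(STACK_DISK_GB)} GB")
    for quant in ("Q4_K_M", "Q8_0", "F16"):
        print(f"70B @ {quant}: ~{estimate_weights_gb(70, quant):.0f} GB of weights")
    # ~42 GB is the table's RAM estimate for llama3.1:70b (Q4).
    print("70B Q4 on 48 GB, 8 GB reserved:", fits(42, 48))
    print("70B Q4 on 48 GB, 6 GB reserved:", fits(42, 48, reserved_gb=6))
    print("32B coder on 48 GB:", fits(22, 48))
```

The two 70B checks are the point: whether a ~42 GB model "fits" on 48 GB flips depending on how much headroom you keep, which is what "tight" means in the Tier 3 table. The sizes as listed sum to 110.2 GB on disk.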