# 07 — Model Recommendations

> Tiered model guide by use case, size, and quality for Apple M4 Pro with 48 GB unified memory.

---

## Tier 1 — Best Overall Coding Models

| Model                   | Size  | RAM Used | Pull Command                    | Notes                                              |
| ----------------------- | ----- | -------- | ------------------------------- | -------------------------------------------------- |
| **`qwen2.5-coder:32b`** | 19 GB | ~22 GB   | `ollama pull qwen2.5-coder:32b` | **Top pick** — rivals GPT-4o on code, 128k context |
| `deepseek-coder-v2:16b` | 10 GB | ~12 GB   | `ollama pull deepseek-coder-v2` | Best open-source coding model at 16B               |
| `codestral:22b`         | 13 GB | ~15 GB   | `ollama pull codestral`         | Mistral's coding model, very fast completions      |

## Tier 2 — Fast & Capable (Speed/Quality Balance)

| Model                  | Size | RAM Used | Pull Command                      | Notes                                         |
| ---------------------- | ---- | -------- | --------------------------------- | --------------------------------------------- |
| **`qwen2.5-coder:7b`** | 5 GB | ~6 GB    | `ollama pull qwen2.5-coder:7b`    | Fast, surprisingly good for TS/Python/Swift   |
| `deepseek-coder:6.7b`  | 4 GB | ~5 GB    | `ollama pull deepseek-coder:6.7b` | Lightweight, solid everyday coding            |
| `codegemma:7b`         | 5 GB | ~6 GB    | `ollama pull codegemma:7b`        | Google's model, decent but outclassed by Qwen |

## Tier 3 — General Purpose (Also Good at Code)

| Model               | Size   | RAM Used | Pull Command               | Notes                               |
| ------------------- | ------ | -------- | -------------------------- | ----------------------------------- |
| `llama3.1:70b` (Q4) | 40 GB  | ~42 GB   | `ollama pull llama3.1:70b` | Best general model — tight on 48 GB |
| `llama3.1:8b`       | 4.9 GB | ~6 GB    | `ollama pull llama3.1:8b`  | Fast, good for evals                |
| `mistral-nemo:12b`  | 7 GB   | ~9 GB    | `ollama pull mistral-nemo` | Fast reasoning                      |
| `phi4:14b`          | 9 GB   | ~11 GB   | `ollama pull phi4`         | Strong reasoning, fits comfortably  |

## Tier 4 — Reasoning & Deep Thinking

| Model                 | Size  | RAM Used | Pull Command                  | Notes                                            |
| --------------------- | ----- | -------- | ----------------------------- | ------------------------------------------------ |
| **`deepseek-r1:32b`** | 20 GB | ~22 GB   | `ollama pull deepseek-r1:32b` | Chain-of-thought reasoning, closest to Kimi k1.5 |
| `deepseek-r1:7b`      | 5 GB  | ~6 GB    | `ollama pull deepseek-r1:7b`  | Lightweight reasoning                            |

## Tier 5 — Vision (Multimodal)

| Model          | Size  | RAM Used | Pull Command               | Notes                    |
| -------------- | ----- | -------- | -------------------------- | ------------------------ |
| `llava:34b`    | 22 GB | ~22 GB   | `ollama pull llava:34b`    | Image understanding, OCR |
| `qwen2.5vl:7b` | 6 GB  | ~6 GB    | `ollama pull qwen2.5vl:7b` | Qwen vision, fast        |
| `minicpm-v:8b` | 6 GB  | ~6 GB    | `ollama pull minicpm-v:8b` | Strong OCR               |
| `moondream`    | 2 GB  | ~2 GB    | `ollama pull moondream`    | Tiny, basic vision       |

## Tier 6 — Embeddings

| Model               | Size   | RAM Used | Pull Command                    | Notes                     |
| ------------------- | ------ | -------- | ------------------------------- | ------------------------- |
| `nomic-embed-text`  | 0.3 GB | ~0.5 GB  | `ollama pull nomic-embed-text`  | Good for semantic search  |
| `mxbai-embed-large` | 0.7 GB | ~1 GB    | `ollama pull mxbai-embed-large` | Higher quality embeddings |

---

## Recommended 10-Model Stack for M4 Pro 48 GB

For maximum coverage across all use cases:

| #   | Model                   | Disk        | Use Case                                 |
| --- | ----------------------- | ----------- | ---------------------------------------- |
| 1   | `qwen2.5-coder:32b`     | 19 GB       | **Primary** — coding (TS, Python, Swift) |
| 2   | `qwen2.5-coder:7b`      | 5 GB        | Fast coding completions                  |
| 3   | `deepseek-coder-v2:16b` | 10 GB       | Alternative coding model                 |
| 4   | `llama3.1:8b`           | 4.9 GB      | Eval default, general tasks              |
| 5   | `deepseek-r1:32b`       | 20 GB       | Deep reasoning, complex triage           |
| 6   | `codestral:22b`         | 13 GB       | Fast code completions (Mistral)          |
| 7   | `phi4:14b`              | 9 GB        | Reasoning, structured output             |
| 8   | `llava:34b`             | 22 GB       | Vision / image understanding             |
| 9   | `mistral-nemo:12b`      | 7 GB        | Fast general purpose                     |
| 10  | `nomic-embed-text`      | 0.3 GB      | Embeddings / semantic search             |
|     | **Total**               | **~110 GB** |                                          |

All 10 fit on disk simultaneously. A model occupies RAM only while it is loaded, so on 48 GB plan for one of the larger models in memory at a time.

---

## By Use Case (Quick Reference)

| Use Case                   | Best Model          | Fallback                |
| -------------------------- | ------------------- | ----------------------- |
| **TypeScript/ESM coding**  | `qwen2.5-coder:32b` | `qwen2.5-coder:7b`      |
| **Python coding**          | `qwen2.5-coder:32b` | `deepseek-coder-v2:16b` |
| **Swift/iOS coding**       | `qwen2.5-coder:32b` | `codestral:22b`         |
| **Extraction evals**       | `llama3.1:8b`       | `qwen2.5:7b`            |
| **JSON structured output** | `qwen2.5:7b`        | `qwen2.5-coder:7b`      |
| **Complex reasoning**      | `deepseek-r1:32b`   | `phi4:14b`              |
| **Image understanding**    | `llava:34b`         | `qwen2.5vl:7b`          |
| **Embeddings**             | `nomic-embed-text`  | `mxbai-embed-large`     |
| **Fast iteration**         | `qwen2.5-coder:7b`  | `llama3.1:8b`           |

---

## Hardware Guide (General)

For reference if running on different hardware:

| RAM    | Max Model Size | Recommendation                        |
| ------ | -------------- | ------------------------------------- |
| 8 GB   | 7B             | `qwen2.5-coder:7b`                    |
| 16 GB  | 13-16B         | `deepseek-coder-v2:16b`               |
| 24 GB  | 32B            | `qwen2.5-coder:32b` (tight)           |
| 32 GB  | 32B + headroom | `qwen2.5-coder:32b` (comfortable)     |
| 48 GB  | 70B (Q4)       | `llama3.1:70b` or any 32B comfortably |
| 64 GB+ | 70B (Q4-Q5)    | 70B with headroom; Q8 needs ~75 GB    |