saravanakumardb1 3561deee52 docs(local-llm): add multimodal stack, model recommendations, and troubleshooting
- docs/04-multimodal-local-stack.md: vision models (llava, qwen2.5vl, moondream2),
  audio pipeline architecture, video understanding status, Kimi alternatives,
  complete local AI stack diagram
- docs/07-model-recommendations.md: 6-tier model guide (coding, fast, general,
  reasoning, vision, embeddings), recommended 10-model stack for M4 Pro 48GB,
  use-case quick reference, hardware scaling guide
- docs/08-troubleshooting.md: corporate Forcepoint proxy workarounds, MLX warning,
  JSON parse errors, slow inference, whisper-cli vs whisper-cpp naming, audio
  format conversion, proxy-corrupted downloads detection
2026-02-19 13:01:22 -08:00


07 — Model Recommendations

Tiered model guide by use case, size, and quality for Apple M4 Pro with 48 GB unified memory.


Tier 1 — Best Overall Coding Models

| Model | Size (disk) | RAM Used | Pull Command | Notes |
|---|---|---|---|---|
| qwen2.5-coder:32b | 19 GB | ~22 GB | `ollama pull qwen2.5-coder:32b` | Top pick; rivals GPT-4o on code, 128k context |
| deepseek-coder-v2:16b | 10 GB | ~12 GB | `ollama pull deepseek-coder-v2` | Best open-source coding model at 16B |
| codestral:22b | 13 GB | ~15 GB | `ollama pull codestral` | Mistral's coding model, very fast completions |
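The sketch below sends a coding prompt to one of these models through Ollama's local REST API (`/api/generate`). It assumes the server is running on its default address, `http://localhost:11434`. The helper names `build_generate_request` and `generate` are hypothetical. The `num_ctx` option matters for the 128k-context claim, because Ollama uses a much smaller context window unless you raise it. The value 32768 is an arbitrary example, and larger values cost more RAM for the KV cache.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str, num_ctx: int = 32768) -> dict:
    """Build a non-streaming /api/generate payload (hypothetical helper)."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # one JSON object instead of a token stream
        "options": {"num_ctx": num_ctx},  # context window; bigger = more KV-cache RAM
    }

def generate(payload: dict, url: str = OLLAMA_URL) -> str:
    """POST the payload to Ollama and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

payload = build_generate_request(
    "qwen2.5-coder:32b",
    "Write a TypeScript function that debounces another function.",
)
# generate(payload) needs a running Ollama server, so it is not called here.
```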

Tier 2 — Fast & Capable (Speed/Quality Balance)

| Model | Size (disk) | RAM Used | Pull Command | Notes |
|---|---|---|---|---|
| qwen2.5-coder:7b | 5 GB | ~6 GB | `ollama pull qwen2.5-coder:7b` | Fast, surprisingly good for TS/Python/Swift |
| deepseek-coder:6.7b | 4 GB | ~5 GB | `ollama pull deepseek-coder:6.7b` | Lightweight, solid everyday coding |
| codegemma:7b | 5 GB | ~6 GB | `ollama pull codegemma:7b` | Google's model, decent but outclassed by Qwen |
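Small coding models are often used for editor-style completions in fill-in-the-middle mode. The sketch below assumes Ollama's `/api/generate` `suffix` field and a model whose Ollama template supports it, as qwen2.5-coder's does. `build_fim_request` is a hypothetical helper, and only the payload is built here.

```python
def build_fim_request(model: str, before_cursor: str, after_cursor: str) -> dict:
    """Build a fill-in-the-middle payload (hypothetical helper): the model
    generates the code that belongs between `prompt` and `suffix`."""
    return {
        "model": model,
        "prompt": before_cursor,   # text before the cursor
        "suffix": after_cursor,    # text after the cursor
        "stream": False,
        # Assumption: temperature 0 for repeatable completions.
        "options": {"temperature": 0},
    }

payload = build_fim_request(
    "qwen2.5-coder:7b",
    "def fib(n: int) -> int:\n    ",
    "\n\nprint(fib(10))\n",
)
```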

Tier 3 — General Purpose (Also Good at Code)

| Model | Size (disk) | RAM Used | Pull Command | Notes |
|---|---|---|---|---|
| llama3.1:70b (Q4) | 40 GB | ~42 GB | `ollama pull llama3.1:70b` | Best general model; tight on 48 GB |
| llama3.1:8b | 4.9 GB | ~6 GB | `ollama pull llama3.1:8b` | Fast, good for evals |
| mistral-nemo:12b | 7 GB | ~9 GB | `ollama pull mistral-nemo` | Fast reasoning |
| phi4:14b | 9 GB | ~11 GB | `ollama pull phi4` | Strong reasoning, fits comfortably |
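A back-of-the-envelope check shows why 70B is "tight": quantized weights take roughly parameters × bits-per-weight ÷ 8 bytes. The sketch assumes about 4.85 bits per weight for Q4_K_M, a common default quantization for Ollama tags. This is an approximation, and real files vary. It also assumes about 70.6B parameters for Llama 3.1 70B. `weights_gb` and `headroom_gb` are hypothetical helpers. Whatever memory remains must cover the KV cache, macOS, and any open apps.

```python
# Assumption: ~4.85 bits per weight approximates Q4_K_M; actual file
# sizes vary by architecture and by which tensors stay at higher precision.
Q4_K_M_BITS = 4.85

def weights_gb(params_billion: float, bits_per_weight: float = Q4_K_M_BITS) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8

def headroom_gb(total_ram_gb: float, params_billion: float) -> float:
    """Memory left for the KV cache, macOS, and other apps once the weights load."""
    return total_ram_gb - weights_gb(params_billion)

# Approximate parameter counts: Llama 3.1 70B ~70.6B, Llama 3.1 8B ~8.0B.
for name, params in [("llama3.1:70b", 70.6), ("llama3.1:8b", 8.0)]:
    print(f"{name}: ~{weights_gb(params):.1f} GB weights, "
          f"~{headroom_gb(48, params):.1f} GB left on a 48 GB machine")
```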

Tier 4 — Reasoning & Deep Thinking

| Model | Size (disk) | RAM Used | Pull Command | Notes |
|---|---|---|---|---|
| deepseek-r1:32b | 20 GB | ~22 GB | `ollama pull deepseek-r1:32b` | Chain-of-thought reasoning, closest to Kimi k1.5 |
| deepseek-r1:7b | 5 GB | ~6 GB | `ollama pull deepseek-r1:7b` | Lightweight reasoning |
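deepseek-r1 typically emits its chain of thought inside `<think>...</think>` tags before the final answer. When you only want the answer, for example in an eval, a small post-processing step can separate the two. In the sketch below, `split_reasoning` is a hypothetical helper. It assumes the raw completion text contains at most one think block.

```python
import re

# deepseek-r1 style output: reasoning inside <think>...</think>, then the answer.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a raw completion (hypothetical helper)."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()  # no think block: the whole text is the answer
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer

raw = "<think>\nThe user wants 2 + 2. That is 4.\n</think>\n\nThe answer is 4."
reasoning, answer = split_reasoning(raw)
```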

Tier 5 — Vision (Multimodal)

| Model | Size (disk) | RAM Used | Pull Command | Notes |
|---|---|---|---|---|
| llava:34b | 22 GB | ~22 GB | `ollama pull llava:34b` | Image understanding, OCR |
| qwen2.5vl:7b | 6 GB | ~6 GB | `ollama pull qwen2.5vl:7b` | Qwen vision, fast |
| minicpm-v:8b | 6 GB | ~6 GB | `ollama pull minicpm-v:8b` | Strong OCR |
| moondream2 | 2 GB | ~2 GB | `ollama pull moondream` | Tiny, basic vision (Ollama tag is `moondream`) |
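Vision models take images through the same API. The sketch assumes Ollama's `/api/generate` accepts an `images` list of base64-encoded strings. `build_vision_request` is a hypothetical helper, and the placeholder bytes stand in for a real image file.

```python
import base64

def build_vision_request(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build an /api/generate payload with one image attached (hypothetical helper).
    Images travel as base64-encoded strings inside the JSON body."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"model": model, "prompt": prompt, "images": [encoded], "stream": False}

# Placeholder bytes; in practice e.g. open("receipt.png", "rb").read()
fake_png = b"\x89PNG\r\n\x1a\n"
payload = build_vision_request("qwen2.5vl:7b", "Transcribe all text in this image.", fake_png)
```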

Tier 6 — Embeddings

| Model | Size (disk) | RAM Used | Pull Command | Notes |
|---|---|---|---|---|
| nomic-embed-text | 0.3 GB | ~0.5 GB | `ollama pull nomic-embed-text` | Good for semantic search |
| mxbai-embed-large | 0.7 GB | ~1 GB | `ollama pull mxbai-embed-large` | Higher quality embeddings |
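Semantic search with these models comes down to two steps: embed the query and the documents, then rank the documents by cosine similarity. The sketch assumes Ollama's `/api/embed` endpoint, which takes `model` and `input` in the request and returns `embeddings` in the response. `embed`, `cosine`, and `rank` are hypothetical helpers. Toy 3-dimensional vectors stand in for real embeddings, so the ranking logic runs without a server.

```python
import json
import math
import urllib.request

def embed(texts: list[str], model: str = "nomic-embed-text",
          url: str = "http://localhost:11434/api/embed") -> list[list[float]]:
    """Fetch embeddings from a local Ollama server (not called below)."""
    req = urllib.request.Request(
        url,
        data=json.dumps({"model": model, "input": texts}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embeddings"]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def rank(query_vec: list[float], doc_vecs: list[list[float]]) -> list[int]:
    """Return document indices ordered from most to least similar."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# Toy vectors; nomic-embed-text actually returns 768-dimensional embeddings.
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
order = rank([1.0, 0.1, 0.0], docs)
```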

Recommended 10-Model Stack (M4 Pro 48 GB)

For maximum coverage across all use cases:

| # | Model | Disk | Use Case |
|---|---|---|---|
| 1 | qwen2.5-coder:32b | 19 GB | Primary: coding (TS, Python, Swift) |
| 2 | qwen2.5-coder:7b | 5 GB | Fast coding completions |
| 3 | deepseek-coder-v2:16b | 10 GB | Alternative coding model |
| 4 | llama3.1:8b | 4.9 GB | Eval default, general tasks |
| 5 | deepseek-r1:32b | 20 GB | Deep reasoning, complex triage |
| 6 | codestral:22b | 13 GB | Fast code completions (Mistral) |
| 7 | phi4:14b | 9 GB | Reasoning, structured output |
| 8 | llava:34b | 22 GB | Vision / image understanding |
| 9 | mistral-nemo:12b | 7 GB | Fast general purpose |
| 10 | nomic-embed-text | 0.3 GB | Embeddings / semantic search |
| | Total | ~110 GB | |

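All 10 can sit on disk at the same time, but a model loads into RAM only when you use it. Ollama unloads an idle model after a keep-alive period, which is 5 minutes by default. Ollama can also keep several models loaded at once if memory allows, so budget RAM for whichever models you run side by side.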
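The disk total can be re-derived by summing the stack table's Disk column. The sketch below copies the sizes from the table, so it is only as accurate as the table itself.

```python
# Disk sizes in GB, copied from the recommended-stack table above.
STACK_DISK_GB = {
    "qwen2.5-coder:32b": 19,
    "qwen2.5-coder:7b": 5,
    "deepseek-coder-v2:16b": 10,
    "llama3.1:8b": 4.9,
    "deepseek-r1:32b": 20,
    "codestral:22b": 13,
    "phi4:14b": 9,
    "llava:34b": 22,
    "mistral-nemo:12b": 7,
    "nomic-embed-text": 0.3,
}

total_gb = sum(STACK_DISK_GB.values())
largest = max(STACK_DISK_GB, key=STACK_DISK_GB.get)  # biggest single download
print(f"{len(STACK_DISK_GB)} models, ~{total_gb:.1f} GB on disk; largest: {largest}")
```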


By Use Case (Quick Reference)

| Use Case | Best Model | Fallback |
|---|---|---|
| TypeScript/ESM coding | qwen2.5-coder:32b | qwen2.5-coder:7b |
| Python coding | qwen2.5-coder:32b | deepseek-coder-v2:16b |
| Swift/iOS coding | qwen2.5-coder:32b | codestral:22b |
| Extraction evals | llama3.1:8b | qwen2.5:7b |
| JSON structured output | qwen2.5:7b | qwen2.5-coder:7b |
| Complex reasoning | deepseek-r1:32b | phi4:14b |
| Image understanding | llava:34b | qwen2.5vl:7b |
| Embeddings | nomic-embed-text | mxbai-embed-large |
| Fast iteration | qwen2.5-coder:7b | llama3.1:8b |
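The quick-reference table can also be encoded directly, for example in a script that picks a model based on what is already pulled. The use-case keys and `pick_model` are hypothetical, and `installed` stands in for the model names that `ollama list` prints.

```python
from typing import Optional

# (best, fallback) pairs copied from the quick-reference table above.
USE_CASES = {
    "typescript": ("qwen2.5-coder:32b", "qwen2.5-coder:7b"),
    "python": ("qwen2.5-coder:32b", "deepseek-coder-v2:16b"),
    "swift": ("qwen2.5-coder:32b", "codestral:22b"),
    "extraction_evals": ("llama3.1:8b", "qwen2.5:7b"),
    "json": ("qwen2.5:7b", "qwen2.5-coder:7b"),
    "reasoning": ("deepseek-r1:32b", "phi4:14b"),
    "vision": ("llava:34b", "qwen2.5vl:7b"),
    "embeddings": ("nomic-embed-text", "mxbai-embed-large"),
    "fast_iteration": ("qwen2.5-coder:7b", "llama3.1:8b"),
}

def pick_model(use_case: str, installed: set[str]) -> Optional[str]:
    """Return the best installed model for a use case, else its fallback, else None."""
    best, fallback = USE_CASES[use_case]
    for model in (best, fallback):
        if model in installed:
            return model
    return None

choice = pick_model("python", {"qwen2.5-coder:7b", "deepseek-coder-v2:16b"})
```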

Hardware Guide (General)

For reference, if running on different hardware:

| RAM | Max Model Size | Recommendation |
|---|---|---|
| 8 GB | 7B | qwen2.5-coder:7b |
| 16 GB | 13-16B | deepseek-coder-v2:16b |
| 24 GB | 32B (tight) | qwen2.5-coder:32b |
| 32 GB | 32B + headroom | qwen2.5-coder:32b (comfortable) |
| 48 GB | 70B (Q4) | llama3.1:70b or any 32B comfortably |
| 64 GB+ | 70B (Q4) + headroom | llama3.1:70b comfortably; a 70B at Q8 (8-bit, not full precision) is ~75 GB and needs 96 GB+ |
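The hardware guide can also be expressed as a fit check against the tier tables' RAM Used column. In this sketch, `largest_fit` is a hypothetical helper and the RAM figures are the tables' estimates. The default 2 GB reserve for macOS and other apps is an assumption, roughly what the guide's rows imply. A more realistic reserve pushes recommendations down a size, as the last call shows.

```python
from typing import Optional

# Approximate "RAM Used" figures (GB) from the tier tables above.
RAM_USED_GB = {
    "qwen2.5-coder:7b": 6,
    "deepseek-coder-v2:16b": 12,
    "qwen2.5-coder:32b": 22,
    "llama3.1:70b": 42,
}

def largest_fit(total_ram_gb: float, reserve_gb: float = 2) -> Optional[str]:
    """Return the largest listed model whose RAM estimate fits after the reserve."""
    budget = total_ram_gb - reserve_gb
    fitting = [m for m, gb in RAM_USED_GB.items() if gb <= budget]
    return max(fitting, key=RAM_USED_GB.get) if fitting else None

for ram in (8, 16, 24, 32, 48, 64):
    print(ram, "GB ->", largest_fit(ram))
# Assumption: reserving 8 GB for macOS and apps drops a 48 GB machine to a 32B model.
print("48 GB, 8 GB reserve ->", largest_fit(48, reserve_gb=8))
```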