Local LLM Stack — Documentation Index
Complete guide for the local AI inference stack on the ByteLyst development machine.
Hardware: Apple M4 Pro · 48 GB LPDDR5X · macOS Tahoe
Last updated: 2026-02-21
Quick Start
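# 1. Start Ollama (runs in the foreground; keep this terminal open)
ollama serve # or, as a background service: brew services start ollama
# 2. Load a model (from a second terminal if `ollama serve` is in the foreground)
ollama run qwen2.5-coder:32b # best coding model for this hardware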
# 3. Launch Mission Control dashboard
cd __LOCAL_LLMs/dashboard && npm run dev
# Open http://localhost:3000
# 4. (Optional) Set up TTS
cd __LOCAL_LLMs && bash setup-tts.sh
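The steps above assume the server is already accepting requests by the time step 2 runs. A minimal sketch of a readiness check, assuming Ollama listens on its default address `http://localhost:11434`; `OLLAMA_URL` and `wait_for_ollama` are hypothetical names introduced here and are not part of this stack:

```shell
# Assumption: Ollama's default address. OLLAMA_URL is a hypothetical
# variable for this sketch, not one that Ollama itself reads.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# wait_for_ollama [retries] [delay_seconds]
# Polls GET /api/version until it answers or the retries run out.
# Returns 0 when the server responds, 1 otherwise.
wait_for_ollama() {
  retries="${1:-10}"
  delay="${2:-1}"
  attempt=0
  while [ "$attempt" -lt "$retries" ]; do
    if curl -fsS --max-time 2 "$OLLAMA_URL/api/version" >/dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1
}
```

One possible use is `wait_for_ollama 15 && ollama run qwen2.5-coder:32b`, which loads the model only once the server has answered.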
Documentation
| # | Document | Description |
|---|---|---|
| 01 | Hardware & Prerequisites | Machine specs, installed toolchain, disk/RAM budget |
| 02 | Ollama Setup & Models | Installation, server config, model management, memory behavior |
| 03 | Whisper.cpp Setup | Speech-to-text: installation, models, CLI usage, real-time streaming |
| 04 | Multimodal Local Stack | Vision models, audio pipeline, video understanding status |
| 05 | Mission Control Dashboard | Next.js dashboard: architecture, API routes, features, running |
| 06 | Extraction Service Evals | promptfoo eval suite, Ollama vs Gemini comparison, Python sidecar |
| 07 | Model Recommendations | Tiered model guide by use case, size, and quality for M4 Pro 48GB |
| 08 | Troubleshooting & Corporate Proxy | Common issues, Forcepoint proxy workarounds, MLX warnings |
| 09 | Environment Variables | All config vars for Ollama, Whisper, dashboard, evals |
| 10 | Text-to-Speech | Orpheus TTS via Ollama, Qwen3-TTS 0.6B, setup, corporate proxy |
Directory Structure
__LOCAL_LLMs/
├── README.md ← top-level README (content moved from LOCAL_LLMs_setup_mac_m4_48gb.md)
├── docs/
│ ├── README.md ← this index
│ ├── 01-hardware-and-prerequisites.md
│ ├── 02-ollama-setup-and-models.md
│ ├── 03-whisper-cpp-setup.md
│ ├── 04-multimodal-local-stack.md
│ ├── 05-mission-control-dashboard.md
│ ├── 06-extraction-service-evals.md
│ ├── 07-model-recommendations.md
│ ├── 08-troubleshooting.md
│ ├── 09-environment-variables.md
│ └── 10-text-to-speech.md
├── dashboard/ ← Next.js Mission Control app (port 3000)
│ ├── src/app/(mission-control)/ ← Mission Control page + memory drilldown
│ ├── src/app/api/ollama/route.ts ← Ollama API proxy (list, load, unload, generate)
│ ├── src/app/api/whisper/route.ts ← Whisper binary/model discovery
│ ├── src/app/api/system/route.ts ← System info (chip, RAM, disk, brew)
│ └── src/app/api/system/memory/route.ts ← Memory drilldown (vm_stat + top processes)
├── setup-tts.sh ← One-shot TTS setup for fresh laptop
├── download-tts-models.sh ← Download model weights (uses hf-mirror.com)
├── test_orpheus_tts.py ← Orpheus TTS test (Ollama + SNAC decoder)
├── test_qwen_tts.py ← Qwen3-TTS 0.6B test (direct Python, MPS/CPU)
├── .venv-qwen-tts/ ← Python 3.12 venv for TTS (gitignored)
├── models/ ← Downloaded TTS model weights (gitignored)
└── LOCAL_LLMs_setup_mac_m4_48gb.md ← original doc (preserved, see docs/ for latest)
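The `api/ollama` route is described above as a proxy for list, load, unload, and generate. The dashboard's own request shape is not shown in this index, so the following is only a sketch of the upstream Ollama REST calls such a proxy would plausibly wrap. It assumes the default address `http://localhost:11434`; the function names and `OLLAMA_URL` are hypothetical:

```shell
# Assumption: Ollama's default address; OLLAMA_URL is a hypothetical name.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# JSON request bodies. Values are inserted verbatim, so this sketch assumes
# model names and prompts contain no quotes, backslashes, or newlines.
load_payload()     { printf '{"model":"%s"}' "$1"; }
unload_payload()   { printf '{"model":"%s","keep_alive":0}' "$1"; }
generate_payload() { printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"; }

# list: models available locally
list_models()  { curl -fsS "$OLLAMA_URL/api/tags"; }
# load: a generate request without a prompt loads the model into memory
load_model()   { curl -fsS "$OLLAMA_URL/api/generate" -d "$(load_payload "$1")"; }
# unload: keep_alive 0 asks Ollama to release the model immediately
unload_model() { curl -fsS "$OLLAMA_URL/api/generate" -d "$(unload_payload "$1")"; }
# generate: single non-streaming completion
generate()     { curl -fsS "$OLLAMA_URL/api/generate" -d "$(generate_payload "$1" "$2")"; }
```

For example, `unload_model deepseek-r1:32b` followed by `load_model qwen2.5-coder:32b` would swap which 32B model is resident, which matters on a machine where both cannot comfortably stay loaded at once.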
Current Installation Status (2026-02-21)
| Component | Version | Status | Disk Usage |
|---|---|---|---|
| Ollama | 0.16.2 | ✅ Installed via brew | — |
| qwen2.5-coder:32b | — | ✅ Downloaded | 19 GB |
| qwen2.5-coder:7b | — | ✅ Downloaded | 4.7 GB |
| deepseek-r1:32b | — | ✅ Downloaded | 19 GB |
| llama3.1:8b | — | ✅ Downloaded | 4.9 GB |
| sematre/orpheus:en (TTS) | — | ✅ Downloaded via Ollama | 4 GB |
| whisper-cpp | 1.8.3 | ✅ Installed via brew | 9.6 MB |
| whisper model (ggml-large-v3-turbo) | — | ✅ Downloaded via hf-mirror.com | 1.5 GB |
| ffmpeg | 8.0.1 | ✅ Installed via brew | 53.3 MB |
| Python 3.12 (TTS venv) | 3.12.12 | ✅ Installed via brew + venv created | ~2 GB |
| SNAC decoder (TTS) | — | ✅ Downloaded via hf-mirror.com | 76 MB |
| Qwen3-TTS 0.6B | — | ✅ Downloaded via hf-mirror.com | 1.7 GB |
| Mission Control Dashboard | Next.js 16 | ✅ Built, runs on :3000 (memory drilldown) | — |
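As a quick worked total of the Disk Usage column, the sketch below adds up the seven model-weight entries listed in GB. The MB-sized items and the approximate venv size are left out; `sum_gb` is a hypothetical helper name:

```shell
# Sizes in GB, copied from the Disk Usage column (model weights only):
# qwen2.5-coder:32b, qwen2.5-coder:7b, deepseek-r1:32b, llama3.1:8b,
# sematre/orpheus:en, ggml-large-v3-turbo, Qwen3-TTS 0.6B
model_sizes_gb="19 4.7 19 4.9 4 1.5 1.7"

# Sum whitespace-separated numbers and print the total with one decimal.
sum_gb() {
  printf '%s\n' "$1" | awk '{ for (i = 1; i <= NF; i++) total += $i } END { printf "%.1f\n", total }'
}

sum_gb "$model_sizes_gb"   # prints 54.8
```

That puts the downloaded model weights at roughly 55 GB on disk, before the Python venv and the smaller tools are counted.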
Related Resources