# Local LLM Stack — Documentation Index

> Complete guide for the local AI inference stack on the ByteLyst development machine.
> Hardware: **Apple M4 Pro · 48 GB LPDDR5 · macOS Tahoe**
> Last updated: 2026-02-21

---

## Quick Start

```bash
# 1. Start Ollama
ollama serve                   # or: brew services start ollama

# 2. Load a model
ollama run qwen2.5-coder:32b   # best coding model for this hardware

# 3. Launch Mission Control dashboard
cd __LOCAL_LLMs/dashboard && npm run dev
# Open http://localhost:3000

# 4. (Optional) Set up TTS
cd __LOCAL_LLMs && bash setup-tts.sh
```

---

## Documentation

| #   | Document                                                     | Description                                                          |
| --- | ------------------------------------------------------------ | -------------------------------------------------------------------- |
| 01  | [Hardware & Prerequisites](01-hardware-and-prerequisites.md) | Machine specs, installed toolchain, disk/RAM budget                  |
| 02  | [Ollama Setup & Models](02-ollama-setup-and-models.md)       | Installation, server config, model management, memory behavior       |
| 03  | [Whisper.cpp Setup](03-whisper-cpp-setup.md)                 | Speech-to-text: installation, models, CLI usage, real-time streaming |
| 04  | [Multimodal Local Stack](04-multimodal-local-stack.md)       | Vision models, audio pipeline, video understanding status            |
| 05  | [Mission Control Dashboard](05-mission-control-dashboard.md) | Next.js dashboard: architecture, API routes, features, running       |
| 06  | [Extraction Service Evals](06-extraction-service-evals.md)   | promptfoo eval suite, Ollama vs Gemini comparison, Python sidecar    |
| 07  | [Model Recommendations](07-model-recommendations.md)         | Tiered model guide by use case, size, and quality for M4 Pro 48GB    |
| 08  | [Troubleshooting & Corporate Proxy](08-troubleshooting.md)   | Common issues, Forcepoint proxy workarounds, MLX warnings            |
| 09  | [Environment Variables](09-environment-variables.md)         | All config vars for Ollama, Whisper, dashboard, evals                |
| 10  | [Text-to-Speech](10-text-to-speech.md)                       | Orpheus TTS via Ollama, Qwen3-TTS 0.6B, setup, corporate proxy       |

---

## Directory Structure

```
__LOCAL_LLMs/
├── README.md                              ← you are here (moved from LOCAL_LLMs_setup_mac_m4_48gb.md)
├── docs/
│   ├── README.md                          ← this index
│   ├── 01-hardware-and-prerequisites.md
│   ├── 02-ollama-setup-and-models.md
│   ├── 03-whisper-cpp-setup.md
│   ├── 04-multimodal-local-stack.md
│   ├── 05-mission-control-dashboard.md
│   ├── 06-extraction-service-evals.md
│   ├── 07-model-recommendations.md
│   ├── 08-troubleshooting.md
│   ├── 09-environment-variables.md
│   └── 10-text-to-speech.md
├── dashboard/                             ← Next.js Mission Control app (port 3000)
│   ├── src/app/(mission-control)/         ← Mission Control page + memory drilldown
│   ├── src/app/api/ollama/route.ts        ← Ollama API proxy (list, load, unload, generate)
│   ├── src/app/api/whisper/route.ts       ← Whisper binary/model discovery
│   ├── src/app/api/system/route.ts        ← System info (chip, RAM, disk, brew)
│   └── src/app/api/system/memory/route.ts ← Memory drilldown (vm_stat + top processes)
├── setup-tts.sh                           ← One-shot TTS setup for fresh laptop
├── download-tts-models.sh                 ← Download model weights (uses hf-mirror.com)
├── test_orpheus_tts.py                    ← Orpheus TTS test (Ollama + SNAC decoder)
├── test_qwen_tts.py                       ← Qwen3-TTS 0.6B test (direct Python, MPS/CPU)
├── .venv-qwen-tts/                        ← Python 3.12 venv for TTS (gitignored)
├── models/                                ← Downloaded TTS model weights (gitignored)
└── LOCAL_LLMs_setup_mac_m4_48gb.md        ← original doc (preserved, see docs/ for latest)
```

---

## Current Installation Status (2026-02-21)

| Component                           | Version    | Status                                     | Disk Usage |
| ----------------------------------- | ---------- | ------------------------------------------ | ---------- |
| Ollama                              | 0.16.2     | ✅ Installed via brew                      | —          |
| qwen2.5-coder:32b                   | —          | ✅ Downloaded                              | 19 GB      |
| qwen2.5-coder:7b                    | —          | ✅ Downloaded                              | 4.7 GB     |
| deepseek-r1:32b                     | —          | ✅ Downloaded                              | 19 GB      |
| llama3.1:8b                         | —          | ✅ Downloaded                              | 4.9 GB     |
| sematre/orpheus:en (TTS)            | —          | ✅ Downloaded via Ollama                   | 4 GB       |
| whisper-cpp                         | 1.8.3      | ✅ Installed via brew                      | 9.6 MB     |
| whisper model (ggml-large-v3-turbo) | —          | ✅ Downloaded via hf-mirror.com            | 1.5 GB     |
| ffmpeg                              | 8.0.1      | ✅ Installed via brew                      | 53.3 MB    |
| Python 3.12 (TTS venv)              | 3.12.12    | ✅ Installed via brew + venv created       | ~2 GB      |
| SNAC decoder (TTS)                  | —          | ✅ Downloaded via hf-mirror.com            | 76 MB      |
| Qwen3-TTS 0.6B                      | —          | ✅ Downloaded via hf-mirror.com            | 1.7 GB     |
| Mission Control Dashboard           | Next.js 16 | ✅ Built, runs on :3000 (memory drilldown) | —          |

---

## Related Resources

- **Extraction service evals:** `services/extraction-service/evals/`
- **Ollama REST API docs:** https://github.com/ollama/ollama/blob/main/docs/api.md
- **Whisper.cpp:** https://github.com/ggerganov/whisper.cpp
- **Hugging Face models:** https://huggingface.co/ggerganov/whisper.cpp/tree/main
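
The Ollama REST API linked above can also confirm that the models in the status table are actually present on a machine. The following is a minimal sketch, not part of the repository: it assumes Ollama is listening on its default address (`http://localhost:11434`) and that `GET /api/tags` returns a JSON object whose model entries carry a `"name"` field. The `OLLAMA_URL` variable and the function names `model_names`, `has_model`, and `list_installed` are hypothetical helpers introduced here for illustration.

```bash
#!/usr/bin/env bash
# Sketch: check which Ollama models are installed via the REST API.
# Assumption: Ollama listens on its default address, http://localhost:11434.
# OLLAMA_URL is a hypothetical override used only by this sketch.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Read the JSON body of GET /api/tags on stdin and print one model name per line.
# Uses grep/sed instead of jq so it runs on a stock macOS install.
model_names() {
  grep -oE '"name"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed -E 's/^"name"[[:space:]]*:[[:space:]]*"([^"]*)"$/\1/'
}

# Exit 0 if the exact model name given as $1 appears in the JSON on stdin.
has_model() {
  model_names | grep -qxF -- "$1"
}

# Query the running server and list the installed model names.
list_installed() {
  curl -fsS "$OLLAMA_URL/api/tags" | model_names
}
```

With the server running, `list_installed` prints the installed model names, and `curl -fsS "$OLLAMA_URL/api/tags" | has_model qwen2.5-coder:32b && echo ready` reports whether a specific model has been pulled. Keeping the parsing separate from the `curl` call means the name extraction can be exercised on saved JSON without a live server.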