Local LLM Stack — Documentation Index
Complete guide for the local AI inference stack on the ByteLyst development machine.

Hardware: Apple M4 Pro · 48 GB LPDDR5 · macOS Tahoe

Last updated: 2026-02-21
Quick Start
# 1. Start Ollama
ollama serve # or: brew services start ollama
# 2. Load a model
ollama run qwen2.5-coder:32b # best coding model for this hardware
# 3. Launch Mission Control dashboard
cd __LOCAL_LLMs/dashboard && npm run dev
# Open http://localhost:3000
# 4. (Optional) Set up TTS
cd __LOCAL_LLMs && bash setup-tts.sh
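
Before launching the dashboard, it can help to confirm that the Ollama server from step 1 is actually answering. The sketch below assumes Ollama's default listen address (`127.0.0.1:11434`) and its `/api/tags` endpoint, which lists locally available models. The helper names `ollama_base_url` and `ollama_is_up` are hypothetical and not part of this repo, and the handling of `OLLAMA_HOST` is a simplification (it expects either `host:port` or a full URL).

```shell
# Hypothetical helpers (not part of this repo) for checking the Ollama server.

# Print the base URL of the Ollama server.
# Assumption: OLLAMA_HOST, if set, is either "host:port" or a full http(s) URL.
ollama_base_url() {
  host="${OLLAMA_HOST:-127.0.0.1:11434}"
  case "$host" in
    http://*|https://*) printf '%s\n' "$host" ;;
    *)                  printf 'http://%s\n' "$host" ;;
  esac
}

# Return 0 if the server answers on /api/tags within 2 seconds,
# non-zero otherwise (server down, wrong port, or curl missing).
ollama_is_up() {
  curl --silent --fail --max-time 2 "$(ollama_base_url)/api/tags" >/dev/null
}
```

After sourcing these functions, `ollama_is_up && echo "Ollama is running" || echo "Ollama is not reachable; try: ollama serve"` gives a quick yes/no answer before moving on to step 3.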
Documentation
| # | Document | Description |
|---|---|---|
| 01 | Hardware & Prerequisites | Machine specs, installed toolchain, disk/RAM budget |
| 02 | Ollama Setup & Models | Installation, server config, model management, memory behavior |
| 03 | Whisper.cpp Setup | Speech-to-text: installation, models, CLI usage, real-time streaming |
| 04 | Multimodal Local Stack | Vision models, audio pipeline, video understanding status |
| 05 | Mission Control Dashboard | Next.js dashboard: architecture, API routes, features, running |
| 06 | Extraction Service Evals | promptfoo eval suite, Ollama vs Gemini comparison, Python sidecar |
| 07 | Model Recommendations | Tiered model guide by use case, size, and quality for M4 Pro 48 GB |
| 08 | Troubleshooting & Corporate Proxy | Common issues, Forcepoint proxy workarounds, MLX warnings |
| 09 | Environment Variables | All config vars for Ollama, Whisper, dashboard, evals |
| 10 | Text-to-Speech | Orpheus TTS via Ollama, Qwen3-TTS 0.6B, setup, corporate proxy |
Directory Structure
__LOCAL_LLMs/
├── README.md ← you are here (moved from LOCAL_LLMs_setup_mac_m4_48gb.md)
├── docs/
│ ├── README.md ← this index
│ ├── 01-hardware-and-prerequisites.md
│ ├── 02-ollama-setup-and-models.md
│ ├── 03-whisper-cpp-setup.md
│ ├── 04-multimodal-local-stack.md
│ ├── 05-mission-control-dashboard.md
│ ├── 06-extraction-service-evals.md
│ ├── 07-model-recommendations.md
│ ├── 08-troubleshooting.md
│ ├── 09-environment-variables.md
│ └── 10-text-to-speech.md
├── dashboard/ ← Next.js Mission Control app (port 3000)
│ ├── src/app/(mission-control)/ ← Mission Control page + memory drilldown
│ ├── src/app/api/ollama/route.ts ← Ollama API proxy (list, load, unload, generate)
│ ├── src/app/api/whisper/route.ts ← Whisper binary/model discovery
│ ├── src/app/api/system/route.ts ← System info (chip, RAM, disk, brew)
│ └── src/app/api/system/memory/route.ts ← Memory drilldown (vm_stat + top processes)
├── setup-tts.sh ← One-shot TTS setup for fresh laptop
├── download-tts-models.sh ← Download model weights (uses hf-mirror.com)
├── test_orpheus_tts.py ← Orpheus TTS test (Ollama + SNAC decoder)
├── test_qwen_tts.py ← Qwen3-TTS 0.6B test (direct Python, MPS/CPU)
├── .venv-qwen-tts/ ← Python 3.12 venv for TTS (gitignored)
├── models/ ← Downloaded TTS model weights (gitignored)
└── LOCAL_LLMs_setup_mac_m4_48gb.md ← original doc (preserved, see docs/ for latest)
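
The `route.ts` files under `dashboard/src/app/api/` follow the Next.js App Router convention, where the directory path determines the URL. As a rough sketch of that mapping, the hypothetical helper `route_url` below turns a file path from the tree into the URL path it would serve. It assumes handlers are named `route.ts`, pages are named `page.tsx` (the tree does not show the page file name), and parenthesized segments such as `(mission-control)` are route groups that do not appear in the URL. `probe_routes` is likewise hypothetical: it assumes the dashboard is running on port 3000 and that each route answers a plain GET, which the tree does not confirm.

```shell
# Map a Next.js App Router file path to the URL path it serves.
# Assumptions: paths start with src/app, handlers are named route.ts,
# pages are named page.tsx, and "(group)" segments are route groups.
route_url() {
  printf '%s\n' "$1" | sed -E \
    -e 's#^src/app##' \
    -e 's#/(route\.ts|page\.tsx)$##' \
    -e 's#/\([^/)]*\)##g' \
    -e 's#^$#/#'
}

# Print the HTTP status code of each API route on a running dashboard.
# Assumption: http://localhost:3000 is up; "000" means no connection,
# and a 405 would only mean the route does not handle GET.
probe_routes() {
  for f in src/app/api/ollama/route.ts src/app/api/whisper/route.ts \
           src/app/api/system/route.ts src/app/api/system/memory/route.ts; do
    url="http://localhost:3000$(route_url "$f")"
    code=$(curl --silent --output /dev/null --max-time 5 \
                --write-out '%{http_code}' "$url")
    printf '%s %s\n' "$code" "$url"
  done
}
```

For example, `route_url src/app/api/system/memory/route.ts` prints `/api/system/memory`, and running `probe_routes` while `npm run dev` is active lists one status line per API route.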
Current Installation Status (2026-02-21)
| Component | Version | Status | Disk Usage |
|---|---|---|---|
| Ollama | 0.16.2 | ✅ Installed via brew | — |
| qwen2.5-coder:32b | — | ✅ Downloaded | 19 GB |
| qwen2.5-coder:7b | — | ✅ Downloaded | 4.7 GB |
| deepseek-r1:32b | — | ✅ Downloaded | 19 GB |
| llama3.1:8b | — | ✅ Downloaded | 4.9 GB |
| sematre/orpheus:en (TTS) | — | ✅ Downloaded via Ollama | 4 GB |
| whisper-cpp | 1.8.3 | ✅ Installed via brew | 9.6 MB |
| whisper model (ggml-large-v3-turbo) | — | ✅ Downloaded via hf-mirror.com | 1.5 GB |
| ffmpeg | 8.0.1 | ✅ Installed via brew | 53.3 MB |
| Python 3.12 (TTS venv) | 3.12.12 | ✅ Installed via brew + venv created | ~2 GB |
| SNAC decoder (TTS) | — | ✅ Downloaded via hf-mirror.com | 76 MB |
| Qwen3-TTS 0.6B | — | ✅ Downloaded via hf-mirror.com | 1.7 GB |
| Mission Control Dashboard | Next.js 16 | ✅ Built, runs on :3000 (memory drilldown) | — |
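
To get a single disk-budget figure from the table above, the per-component sizes can be summed. The sketch below copies the "Disk Usage" column into a here-document and totals it with `awk`. It assumes 1 GB = 1000 MB, treats "~2 GB" as exactly 2 GB, and skips rows with no listed size; the function name `total_disk_gb` is hypothetical.

```shell
# Sum the "Disk Usage" column of the installation status table, in GB.
# Assumptions: 1 GB = 1000 MB, "~2 GB" counts as 2 GB, rows without a
# listed size (Ollama itself, the dashboard) are left out.
total_disk_gb() {
  awk '
    $2 == "GB" { total += $1 }          # sizes already in GB
    $2 == "MB" { total += $1 / 1000 }   # convert MB to GB
    END        { printf "%.1f\n", total }
  ' <<'EOF'
19   GB  qwen2.5-coder:32b
4.7  GB  qwen2.5-coder:7b
19   GB  deepseek-r1:32b
4.9  GB  llama3.1:8b
4    GB  sematre/orpheus:en
9.6  MB  whisper-cpp
1.5  GB  ggml-large-v3-turbo
53.3 MB  ffmpeg
2    GB  Python 3.12 TTS venv
76   MB  SNAC decoder
1.7  GB  Qwen3-TTS 0.6B
EOF
}
```

With the figures listed in the table, `total_disk_gb` prints `56.9`, so the full stack occupies roughly 57 GB on disk.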
Related Resources
- Extraction service evals: services/extraction-service/evals/
- Ollama REST API docs: https://github.com/ollama/ollama/blob/main/docs/api.md
- Whisper.cpp: https://github.com/ggerganov/whisper.cpp
- Hugging Face models: https://huggingface.co/ggerganov/whisper.cpp/tree/main