Local LLM Stack — Documentation Index
Complete guide for the local AI inference stack on the ByteLyst development machine.
Hardware: Apple M4 Pro · 48 GB LPDDR5X · macOS Tahoe
Last updated: 2026-02-19
Quick Start
# 1. Start Ollama
ollama serve # or: brew services start ollama
# 2. Load a model
ollama run qwen2.5-coder:32b # best coding model for this hardware
# 3. Launch Mission Control dashboard
cd __LOCAL_LLMs/dashboard && npm run dev -- -p 3100
# Open http://localhost:3100
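To confirm that the steps above took effect, a quick reachability check along these lines may help. This is a minimal sketch, assuming Ollama listens on its default address `http://localhost:11434` (not stated in this index) and the dashboard on port 3100; `check_url` is a hypothetical helper name, not part of Ollama or the dashboard.

```shell
#!/usr/bin/env bash
# check_url is a hypothetical helper for illustration only.
# Prints "up" if the URL answers successfully within 2 seconds, else "down".
check_url() {
  if curl --silent --fail --max-time 2 --output /dev/null "$1"; then
    echo "up"
  else
    echo "down"
  fi
}

# Assumption: Ollama is on its default port 11434; /api/tags lists local models.
echo "Ollama:    $(check_url http://localhost:11434/api/tags)"
# The dashboard port comes from the Quick Start command (-p 3100).
echo "Dashboard: $(check_url http://localhost:3100)"
```

If Ollama reports "down", `ollama serve` is probably not running yet; if only the dashboard is down, check the `npm run dev` output.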
Documentation
| # | Document | Description |
|---|----------|-------------|
| 01 | Hardware & Prerequisites | Machine specs, installed toolchain, disk/RAM budget |
| 02 | Ollama Setup & Models | Installation, server config, model management, memory behavior |
| 03 | Whisper.cpp Setup | Speech-to-text: installation, models, CLI usage, real-time streaming |
| 04 | Multimodal Local Stack | Vision models, audio pipeline, video understanding status |
| 05 | Mission Control Dashboard | Next.js dashboard: architecture, API routes, features, running |
| 06 | Extraction Service Evals | promptfoo eval suite, Ollama vs Gemini comparison, Python sidecar |
| 07 | Model Recommendations | Tiered model guide by use case, size, and quality for M4 Pro 48 GB |
| 08 | Troubleshooting & Corporate Proxy | Common issues, Forcepoint proxy workarounds, MLX warnings |
| 09 | Environment Variables | All config vars for Ollama, Whisper, dashboard, evals |
Directory Structure
__LOCAL_LLMs/
├── README.md ← you are here (moved from LOCAL_LLMs_setup_mac_m4_48gb.md)
├── docs/
│ ├── README.md ← this index
│ ├── 01-hardware-and-prerequisites.md
│ ├── 02-ollama-setup-and-models.md
│ ├── 03-whisper-cpp-setup.md
│ ├── 04-multimodal-local-stack.md
│ ├── 05-mission-control-dashboard.md
│ ├── 06-extraction-service-evals.md
│ ├── 07-model-recommendations.md
│ ├── 08-troubleshooting.md
│ └── 09-environment-variables.md
├── dashboard/ ← Next.js Mission Control app (port 3100)
│ ├── src/app/page.tsx ← main dashboard UI
│ ├── src/app/api/ollama/route.ts ← Ollama API proxy (list, load, unload, generate)
│ ├── src/app/api/whisper/route.ts ← Whisper binary/model discovery
│ └── src/app/api/system/route.ts ← System info (chip, RAM, disk, brew)
└── LOCAL_LLMs_setup_mac_m4_48gb.md ← original doc (preserved, see docs/ for latest)
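The four operations listed for `src/app/api/ollama/route.ts` (list, load, unload, generate) correspond roughly to the Ollama HTTP calls sketched below. This shows only the upstream requests such a proxy might forward, not the route's actual code. `OLLAMA_URL` and the helper function names are hypothetical, and the default port 11434 is an assumption about this machine's configuration.

```shell
#!/usr/bin/env bash
# Hypothetical variable; assumes Ollama's default address.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Hypothetical helpers that only build JSON request bodies.
# Note: they do not escape quotes, so keep model names and prompts simple.
load_body()     { printf '{"model":"%s"}' "$1"; }                 # no prompt: load into memory
unload_body()   { printf '{"model":"%s","keep_alive":0}' "$1"; }  # keep_alive 0: unload now
generate_body() { printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"; }

# POST a JSON body to Ollama's /api/generate endpoint.
post_generate() {
  curl --silent "$OLLAMA_URL/api/generate" \
    -H 'Content-Type: application/json' -d "$1"
}

# Usage (needs a running server, so left commented out):
# curl --silent "$OLLAMA_URL/api/tags"                          # list
# post_generate "$(load_body qwen2.5-coder:32b)"                # load
# post_generate "$(generate_body llama3.1:8b 'Say hello')"      # generate
# post_generate "$(unload_body qwen2.5-coder:32b)"              # unload
```

Load, unload, and generate share one endpoint and differ only in the request body, which is why the body builders are kept separate from the single `post_generate` call.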
Current Installation Status (2026-02-19)
| Component | Version | Status | Disk Usage |
|-----------|---------|--------|------------|
| Ollama | 0.16.2 | ✅ Installed via brew | — |
| qwen2.5-coder:32b | — | ✅ Downloaded | 19 GB |
| llama3.1:8b | — | ✅ Downloaded | 4.9 GB |
| whisper-cpp | 1.8.3 | ✅ Installed via brew | 9.6 MB |
| whisper model (ggml-large-v3-turbo) | — | ❌ Blocked by corporate proxy | — |
| ffmpeg | 8.0.1 | ✅ Installed via brew | 53.3 MB |
| Mission Control Dashboard | Next.js 16 | ✅ Built, runs on :3100 | — |
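When this table needs refreshing, something like the following could regenerate the version and size figures. It is a sketch, assuming the three tools were installed under the Homebrew formula names shown in the table; `sum_gb` is a hypothetical helper that adds up sizes given in GB.

```shell
#!/usr/bin/env bash
# Print installed versions for the brew-managed components, if brew is available.
if command -v brew >/dev/null; then
  brew list --versions ollama whisper-cpp ffmpeg
fi

# Print downloaded models and their sizes, if the Ollama CLI is available.
if command -v ollama >/dev/null; then
  ollama list || echo "ollama server not reachable"
fi

# sum_gb (hypothetical helper): reads one size in GB per line, prints the total.
sum_gb() {
  awk '{ total += $1 } END { printf "%.1f\n", total }'
}

# Total disk used by the two downloaded models in the table above.
printf '19\n4.9\n' | sum_gb   # prints 23.9
```

The two downloaded models together take about 23.9 GB of disk, which is the figure to weigh against the disk budget described in document 01.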
Related Resources