# Local LLM Stack — Documentation Index

> Complete guide for the local AI inference stack on the ByteLyst development machine.
> Hardware: **Apple M4 Pro · 48 GB LPDDR5X · macOS Tahoe**
> Last updated: 2026-02-21

---

## Quick Start

```bash
# Run each step in its own terminal, from the repository root.

# 1. Start Ollama (runs in the foreground)
ollama serve                    # or: brew services start ollama (background service)

# 2. Load a model (opens an interactive prompt)
ollama run qwen2.5-coder:32b   # recommended coding model for this hardware (19 GB)

# 3. Launch Mission Control dashboard
cd __LOCAL_LLMs/dashboard && npm run dev
# Open http://localhost:3000

# 4. (Optional) Set up TTS
cd __LOCAL_LLMs && bash setup-tts.sh
```
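
Before launching the dashboard, it can help to confirm that the Ollama server is answering. The following is a minimal sketch, assuming Ollama listens on its default address `http://localhost:11434`; `ollama_up` is a hypothetical helper name and not part of this stack.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if an Ollama server answers at the given base URL.
# Assumes the default Ollama address unless another one is passed as $1.
ollama_up() {
  local base_url="${1:-http://localhost:11434}"
  # GET /api/tags lists locally available models.
  # -f makes curl fail on HTTP errors; --max-time avoids hanging
  # when nothing is listening on the port.
  curl -sf --max-time 2 "${base_url}/api/tags" >/dev/null 2>&1
}

if ollama_up; then
  echo "Ollama is reachable"
else
  echo "Ollama is not reachable; run 'ollama serve' first"
fi
```

If `OLLAMA_HOST` has been changed from its default, pass the matching URL as the first argument.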

---

## Documentation

| #   | Document                                                     | Description                                                          |
| --- | ------------------------------------------------------------ | -------------------------------------------------------------------- |
| 01  | [Hardware & Prerequisites](01-hardware-and-prerequisites.md) | Machine specs, installed toolchain, disk/RAM budget                  |
| 02  | [Ollama Setup & Models](02-ollama-setup-and-models.md)       | Installation, server config, model management, memory behavior       |
| 03  | [Whisper.cpp Setup](03-whisper-cpp-setup.md)                 | Speech-to-text: installation, models, CLI usage, real-time streaming |
| 04  | [Multimodal Local Stack](04-multimodal-local-stack.md)       | Vision models, audio pipeline, video understanding status            |
| 05  | [Mission Control Dashboard](05-mission-control-dashboard.md) | Next.js dashboard: architecture, API routes, features, running       |
| 06  | [Extraction Service Evals](06-extraction-service-evals.md)   | promptfoo eval suite, Ollama vs Gemini comparison, Python sidecar    |
| 07  | [Model Recommendations](07-model-recommendations.md)         | Tiered model guide by use case, size, and quality for M4 Pro 48 GB   |
| 08  | [Troubleshooting & Corporate Proxy](08-troubleshooting.md)   | Common issues, Forcepoint proxy workarounds, MLX warnings            |
| 09  | [Environment Variables](09-environment-variables.md)         | All config vars for Ollama, Whisper, dashboard, evals                |
| 10  | [Text-to-Speech](10-text-to-speech.md)                       | Orpheus TTS via Ollama, Qwen3-TTS 0.6B, setup, corporate proxy       |

---

## Directory Structure

```
__LOCAL_LLMs/
├── README.md                        ← top-level README (moved from LOCAL_LLMs_setup_mac_m4_48gb.md)
├── docs/
│   ├── README.md                    ← this index
│   ├── 01-hardware-and-prerequisites.md
│   ├── 02-ollama-setup-and-models.md
│   ├── 03-whisper-cpp-setup.md
│   ├── 04-multimodal-local-stack.md
│   ├── 05-mission-control-dashboard.md
│   ├── 06-extraction-service-evals.md
│   ├── 07-model-recommendations.md
│   ├── 08-troubleshooting.md
│   ├── 09-environment-variables.md
│   └── 10-text-to-speech.md
├── dashboard/                       ← Next.js Mission Control app (port 3000)
│   ├── src/app/(mission-control)/   ← Mission Control page + memory drilldown
│   ├── src/app/api/ollama/route.ts  ← Ollama API proxy (list, load, unload, generate)
│   ├── src/app/api/whisper/route.ts ← Whisper binary/model discovery
│   ├── src/app/api/system/route.ts  ← System info (chip, RAM, disk, brew)
│   └── src/app/api/system/memory/route.ts ← Memory drilldown (vm_stat + top processes)
├── setup-tts.sh                     ← One-shot TTS setup for a fresh laptop
├── download-tts-models.sh           ← Download model weights (uses hf-mirror.com)
├── test_orpheus_tts.py              ← Orpheus TTS test (Ollama + SNAC decoder)
├── test_qwen_tts.py                 ← Qwen3-TTS 0.6B test (direct Python, MPS/CPU)
├── .venv-qwen-tts/                  ← Python 3.12 venv for TTS (gitignored)
├── models/                          ← Downloaded TTS model weights (gitignored)
└── LOCAL_LLMs_setup_mac_m4_48gb.md  ← original doc (preserved, see docs/ for latest)
```
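
The memory drilldown route is described above as being built on `vm_stat`. As a rough illustration of the kind of parsing involved (not the dashboard's actual code), the sketch below converts the "Pages free" count into MiB. `vm_free_mib` is a hypothetical helper, and the sample input is made up to show the format. Free pages alone understate the memory macOS can reclaim, so treat the result as a lower bound.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `vm_stat` output on stdin and prints free memory in MiB.
vm_free_mib() {
  awk '
    # Header line looks like: "... (page size of 16384 bytes)"
    /page size of/ { for (i = 1; i <= NF; i++) if ($i == "of") page = $(i + 1) }
    # Counter line looks like: "Pages free:        64000."
    /^Pages free:/ { gsub(/\./, "", $3); free = $3 }
    END { printf "%d\n", free * page / 1048576 }
  '
}

# Illustrative sample only; on macOS run: vm_stat | vm_free_mib
sample='Mach Virtual Memory Statistics: (page size of 16384 bytes)
Pages free:                               64000.'
printf '%s\n' "$sample" | vm_free_mib   # prints 1000
```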

---

## Current Installation Status (2026-02-21)

| Component                           | Version    | Status                                     | Disk Usage |
| ----------------------------------- | ---------- | ------------------------------------------ | ---------- |
| Ollama                              | 0.16.2     | ✅ Installed via brew                      | —          |
| qwen2.5-coder:32b                   | —          | ✅ Downloaded                              | 19 GB      |
| qwen2.5-coder:7b                    | —          | ✅ Downloaded                              | 4.7 GB     |
| deepseek-r1:32b                     | —          | ✅ Downloaded                              | 19 GB      |
| llama3.1:8b                         | —          | ✅ Downloaded                              | 4.9 GB     |
| sematre/orpheus:en (TTS)            | —          | ✅ Downloaded via Ollama                   | 4 GB       |
| whisper-cpp                         | 1.8.3      | ✅ Installed via brew                      | 9.6 MB     |
| whisper model (ggml-large-v3-turbo) | —          | ✅ Downloaded via hf-mirror.com            | 1.5 GB     |
| ffmpeg                              | 8.0.1      | ✅ Installed via brew                      | 53.3 MB    |
| Python 3.12 (TTS venv)              | 3.12.12    | ✅ Installed via brew + venv created       | ~2 GB      |
| SNAC decoder (TTS)                  | —          | ✅ Downloaded via hf-mirror.com            | 76 MB      |
| Qwen3-TTS 0.6B                      | —          | ✅ Downloaded via hf-mirror.com            | 1.7 GB     |
| Mission Control Dashboard           | Next.js 16 | ✅ Built, runs on :3000 (memory drilldown) | —          |
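
The disk figures in the table can be totaled to check them against the disk budget. The following is a minimal sketch; `sum_gb` is a hypothetical helper, and the numbers are copied from the table, with 76 MB written as 0.076 GB. The Python venv (~2 GB), whisper-cpp, and ffmpeg are left out.

```bash
#!/usr/bin/env bash
# Hypothetical helper: adds up sizes given in GB and prints the total to one decimal place.
sum_gb() {
  printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.1f\n", total }'
}

# Ollama-managed models: qwen2.5-coder:32b, qwen2.5-coder:7b,
# deepseek-r1:32b, llama3.1:8b, sematre/orpheus:en
sum_gb 19 4.7 19 4.9 4                  # prints 51.6

# Same list plus the Whisper model, Qwen3-TTS 0.6B, and the SNAC decoder
sum_gb 19 4.7 19 4.9 4 1.5 1.7 0.076    # prints 54.9
```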

---

## Related Resources

- **Extraction service evals:** `services/extraction-service/evals/`
- **Ollama REST API docs:** https://github.com/ollama/ollama/blob/main/docs/api.md
- **Whisper.cpp:** https://github.com/ggerganov/whisper.cpp
- **Whisper ggml models (Hugging Face):** https://huggingface.co/ggerganov/whisper.cpp/tree/main