# 03 — Whisper.cpp Setup

> Local speech-to-text: installation, GGML models, CLI usage, real-time streaming, and ffmpeg.

---

## Installation

```bash
brew install whisper-cpp
```

- **Version installed:** 1.8.3
- **Dependency installed:** sdl2 2.32.10 (audio I/O)
- **Binary location:** `/opt/homebrew/bin/whisper-*`

### Installed Binaries

| Binary                        | Purpose                                          |
| ----------------------------- | ------------------------------------------------ |
| `whisper-cli`                 | **Main CLI** — transcribe audio files            |
| `whisper-server`              | HTTP server — POST audio, get JSON transcription |
| `whisper-stream`              | **Real-time** microphone streaming transcription |
| `whisper-talk-llama`          | Voice → Whisper → LLaMA → TTS pipeline           |
| `whisper-bench`               | Benchmark a model on your hardware               |
| `whisper-command`             | Voice command detection                          |
| `whisper-lsp`                 | Language Server Protocol integration             |
| `whisper-quantize`            | Quantize models to smaller formats               |
| `whisper-vad-speech-segments` | Voice Activity Detection — split audio by speech |

> **Note:** The binary is `whisper-cli`, NOT `whisper-cpp`. The brew formula name differs from the binary name.

---

## GGML Model Files

Whisper.cpp requires separate GGML model files, which `brew install` does not include.

### Model Size Guide

| Model              | File                          | Disk Size  | Speed     | Accuracy                       |
| ------------------ | ----------------------------- | ---------- | --------- | ------------------------------ |
| Tiny (English)     | `ggml-tiny.en.bin`            | 75 MB      | Blazing   | Basic                          |
| Base (English)     | `ggml-base.en.bin`            | 142 MB     | Very fast | Good                           |
| Medium (English)   | `ggml-medium.en.bin`          | 1.5 GB     | Fast      | Great                          |
| Large v3           | `ggml-large-v3.bin`           | 3.1 GB     | Moderate  | Best                           |
| **Large v3 Turbo** | **`ggml-large-v3-turbo.bin`** | **1.6 GB** | **Fast**  | **Near-best (pruned large-v3)** |

**Recommended for M4 Pro:** `ggml-large-v3-turbo`. It is a fine-tuned, pruned version of large-v3, so it gives close to large-v3 accuracy at about half the size, and it is Metal-accelerated.

### Download Models

Models are stored in `~/whisper-models/`.

```bash
mkdir -p ~/whisper-models

# Recommended: Large v3 Turbo (~1.6 GB)
# -f makes curl fail on an HTTP error instead of saving an error page as the .bin
curl -fL -o ~/whisper-models/ggml-large-v3-turbo.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin

# Alternative mirror (path not confirmed; check https://ggml.ggerganov.com/ for the current layout)
curl -fL -o ~/whisper-models/ggml-large-v3-turbo.bin \
  https://ggml.ggerganov.com/whisper/ggml-large-v3-turbo.bin
```

**Download sources:**

- https://huggingface.co/ggerganov/whisper.cpp/tree/main
- https://ggml.ggerganov.com/

### Current Status (2026-02-19)

❌ **Model download blocked by corporate proxy** (Forcepoint CertChecker intercepts Hugging Face HTTPS). The model must be downloaded from a personal/home network. See [08-troubleshooting.md](08-troubleshooting.md).

---

## Audio Format Requirements

Whisper.cpp expects **WAV** input: 16-bit PCM, ideally 16 kHz mono (the sample rate Whisper models operate on).

### ffmpeg Installation

```bash
brew install ffmpeg
```

Version installed: 8.0.1 (with dav1d, lame, libvpx, opus, svt-av1, x264, x265)

### Converting Audio Files

```bash
# m4a → wav (16 kHz mono, 16-bit PCM, optimal for Whisper)
ffmpeg -i input.m4a -ar 16000 -ac 1 -c:a pcm_s16le output.wav

# mp3 → wav
ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav

# Any format → wav
ffmpeg -i input.ogg -ar 16000 -ac 1 -c:a pcm_s16le output.wav
```

### Tested Conversion (2026-02-19)

```bash
ffmpeg -i '/Users/sd9235/Downloads/New Recording.m4a' \
  -ar 16000 -ac 1 \
  '/Users/sd9235/Downloads/recording.wav'
# Result: 181 KB, 5.80 seconds, 16kHz mono
```

---

## Usage

### File Transcription

```bash
whisper-cli \
  --model ~/whisper-models/ggml-large-v3-turbo.bin \
  --language en \
  --file /path/to/audio.wav
```

### Real-Time Microphone Streaming

```bash
whisper-stream \
  --model ~/whisper-models/ggml-large-v3-turbo.bin \
  --language en
```

This is particularly relevant for **LysnrAI**: it provides real-time microphone transcription locally, so dev/testing does not need the Azure Speech SDK.

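### Convert and Transcribe in One Step

The conversion and file-transcription commands above can be chained into a single helper. The following is a minimal sketch, not part of whisper.cpp: the function names `wav_name_for` and `transcribe_any` and the variables `WHISPER_MODEL`, `WHISPER_CLI`, and `FFMPEG` are hypothetical, chosen for illustration. It assumes `ffmpeg` and `whisper-cli` are on `PATH` and that the model lives in `~/whisper-models/`.

```bash
#!/bin/sh
# Hypothetical helper (illustrative names, not part of whisper.cpp).
# Assumes ffmpeg and whisper-cli are on PATH; override via FFMPEG / WHISPER_CLI.
WHISPER_MODEL="${WHISPER_MODEL:-$HOME/whisper-models/ggml-large-v3-turbo.bin}"

# Derive the converted file name: strip the last extension, append .16k.wav
# so that an input that is already a .wav is never overwritten.
wav_name_for() {
  printf '%s.16k.wav\n' "${1%.*}"
}

# Convert the input to 16 kHz mono 16-bit PCM WAV, then transcribe it.
# The body runs in a subshell so its variables do not leak into the caller.
transcribe_any() (
  input="$1"
  wav="$(wav_name_for "$input")"
  # -y overwrites a stale conversion; -loglevel error keeps ffmpeg quiet
  "${FFMPEG:-ffmpeg}" -y -loglevel error -i "$input" \
    -ar 16000 -ac 1 -c:a pcm_s16le "$wav" || exit 1
  "${WHISPER_CLI:-whisper-cli}" --model "$WHISPER_MODEL" --language en --file "$wav"
)
```

After sourcing the file, `transcribe_any ~/Downloads/memo.m4a` would write `~/Downloads/memo.16k.wav` and print the transcript. The intermediate WAV is kept on purpose so the same file can be reused with `whisper-server` or `whisper-bench`.
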
### HTTP Server Mode

```bash
# Start server on port 8080 (runs in the foreground)
whisper-server \
  --model ~/whisper-models/ggml-large-v3-turbo.bin \
  --port 8080

# In a second terminal: POST audio to get transcription
curl -X POST http://localhost:8080/inference \
  -F "file=@audio.wav" \
  -F "response_format=json"
```

### Voice Activity Detection

```bash
# To verify: this tool may expect a Silero VAD model passed via --vad-model
# rather than a Whisper model; confirm with `whisper-vad-speech-segments --help`.
whisper-vad-speech-segments \
  --model ~/whisper-models/ggml-large-v3-turbo.bin \
  --file recording.wav
```

### Benchmarking

```bash
whisper-bench --model ~/whisper-models/ggml-large-v3-turbo.bin
```

---

## Integration with LysnrAI

The local Whisper stack can serve as an **offline fallback** or **dev replacement** for Azure Speech SDK:

| Component          | Azure (production) | Whisper.cpp (local dev)                                   |
| ------------------ | ------------------ | --------------------------------------------------------- |
| Real-time STT      | Azure Speech SDK   | `whisper-stream`                                          |
| File transcription | Azure Batch        | `whisper-cli`                                             |
| HTTP API           | Azure REST API     | `whisper-server`                                          |
| Cost               | Pay-per-use        | $0.00 (local)                                             |
| Latency            | ~200ms (network)   | ~50ms (local Metal)                                       |
| Languages          | 100+               | ~100 (multilingual models; `.en` models are English-only) |

### Potential Integration Points

1. **Desktop app** (`src/audio/azure_stt.py`): Add local Whisper backend option
2. **iOS keyboard** (`LysnrKeyboard/`): Use on-device Whisper for offline dictation
3. **Extraction service evals**: Transcribe test audio fixtures locally
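
For the third integration point, transcribing eval fixtures could look roughly like the sketch below. The function name `transcribe_fixtures`, the `WHISPER_MODEL` and `WHISPER_CLI` variables, and the fixture directory are hypothetical. The `--no-timestamps`, `--output-txt`, and `--output-file` flags are the ones `whisper-cli --help` lists in recent releases; they are worth confirming against the installed 1.8.3 build.

```bash
#!/bin/sh
# Hypothetical batch helper for eval fixtures (illustrative, not part of whisper.cpp).
# For every <name>.wav in the given directory, asks whisper-cli to write <name>.txt
# next to it. Assumes the WAV files are already 16 kHz mono.
transcribe_fixtures() (
  dir="$1"
  model="${WHISPER_MODEL:-$HOME/whisper-models/ggml-large-v3-turbo.bin}"
  for wav in "$dir"/*.wav; do
    [ -e "$wav" ] || continue   # the glob matched nothing
    # --output-file takes the path without extension; --output-txt adds .txt
    "${WHISPER_CLI:-whisper-cli}" --model "$model" --language en \
      --no-timestamps --output-txt --output-file "${wav%.wav}" \
      --file "$wav" || exit 1
  done
)
```

Calling `transcribe_fixtures tests/fixtures/audio` (a placeholder path) stops at the first file that fails, so a broken fixture surfaces immediately instead of producing a partial set of transcripts.
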