saravanakumardb1 464ffb92ec docs(local-llm): add docs index, hardware specs, and whisper-cpp setup

- docs/README.md: documentation index with quick start, file structure, status table
- docs/01-hardware-and-prerequisites.md: M4 Pro 48GB specs, toolchain inventory,
  disk budget, network environment (Forcepoint proxy details)
- docs/03-whisper-cpp-setup.md: whisper-cpp installation, GGML model guide,
  ffmpeg audio conversion, CLI usage, real-time streaming, LysnrAI integration

2026-02-19 13:00:48 -08:00


03 — Whisper.cpp Setup

Local speech-to-text: installation, GGML models, CLI usage, real-time streaming, and ffmpeg.

Installation

brew install whisper-cpp

Version installed: 1.8.3
Dependency installed: sdl2 2.32.10 (audio I/O)
Binary location: /opt/homebrew/bin/whisper-*

Installed Binaries

Binary	Purpose
`whisper-cli`	Main CLI — transcribe audio files
`whisper-server`	HTTP server — POST audio, get JSON transcription
`whisper-stream`	Real-time microphone streaming transcription
`whisper-talk-llama`	Voice → Whisper → LLaMA → TTS pipeline
`whisper-bench`	Benchmark a model on your hardware
`whisper-command`	Voice command detection
`whisper-lsp`	Language Server Protocol integration
`whisper-quantize`	Quantize models to smaller formats
`whisper-vad-speech-segments`	Voice Activity Detection — split audio by speech

Note: The binary is whisper-cli, NOT whisper-cpp. The brew formula name differs from the binary name.
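To confirm which of the binaries listed above are actually on the PATH, a small check along these lines can help. This is a minimal sketch; `missing_tools` is a hypothetical helper name, not something shipped with whisper.cpp.

```shell
# Print each named command that cannot be found on PATH.
# missing_tools is a hypothetical helper, not part of whisper.cpp.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example: report any of the main binaries that are not installed.
missing_tools whisper-cli whisper-server whisper-stream whisper-bench
```

No output means every listed binary was found.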

GGML Model Files

Whisper.cpp requires separate GGML model files (not included with brew install).

Model Size Guide

Model	File	Disk Size	Speed	Accuracy
Tiny (English)	`ggml-tiny.en.bin`	75 MB	Blazing	Basic
Base (English)	`ggml-base.en.bin`	142 MB	Very fast	Good
Medium (English)	`ggml-medium.en.bin`	1.5 GB	Fast	Great
Large v3	`ggml-large-v3.bin`	3.1 GB	Moderate	Best
Large v3 Turbo	`ggml-large-v3-turbo.bin`	1.6 GB	Fast	Near-best (pruned large-v3)

Recommended for M4 Pro: ggml-large-v3-turbo. It offers accuracy close to large-v3 at about half the disk size, and runs Metal-accelerated.

Download Models

Models are stored in ~/whisper-models/.

mkdir -p ~/whisper-models

# Recommended: Large v3 Turbo (~1.6 GB)
curl -L -o ~/whisper-models/ggml-large-v3-turbo.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin

# Alternative mirror
curl -L -o ~/whisper-models/ggml-large-v3-turbo.bin \
  https://ggml.ggerganov.com/whisper/ggml-large-v3-turbo.bin

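Download sources: the Hugging Face repository (huggingface.co/ggerganov/whisper.cpp) and the ggml.ggerganov.com mirror, as used in the commands above.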

Current Status (2026-02-19)

❌ Model download blocked by corporate proxy (Forcepoint CertChecker intercepts Hugging Face HTTPS). Download from personal/home network required. See 08-troubleshooting.md.
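When a proxy intercepts the request, curl can still exit successfully and save an HTML block page under the model's file name. The sketch below checks for that case. `looks_like_html` is a hypothetical helper name, and the check is a heuristic that only inspects the first 512 bytes.

```shell
# Return 0 if the file starts with what looks like an HTML page
# (for example a proxy block page saved in place of the model).
# looks_like_html is a hypothetical helper name.
looks_like_html() {
  head -c 512 "$1" | LC_ALL=C grep -qiE '<!doctype html|<html'
}

model="$HOME/whisper-models/ggml-large-v3-turbo.bin"
if [ -f "$model" ] && looks_like_html "$model"; then
  echo "Download looks like an HTML page, not a model: $model" >&2
fi
```

Adding `--fail` to the curl commands makes curl exit non-zero on HTTP error statuses instead of saving the error body, which catches some of these cases but not a block page served with a 200 status.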

Audio Format Requirements

Whisper.cpp expects 16 kHz audio. The most reliable input is a 16-bit PCM WAV file at 16 kHz mono; other formats such as m4a should be converted with ffmpeg first.

ffmpeg Installation

brew install ffmpeg

Version installed: 8.0.1 (with dav1d, lame, libvpx, opus, svt-av1, x264, x265)

Converting Audio Files

# m4a → wav (16kHz mono, optimal for Whisper)
ffmpeg -i input.m4a -ar 16000 -ac 1 output.wav

# mp3 → wav
ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav

# Any format → wav
ffmpeg -i input.ogg -ar 16000 -ac 1 output.wav
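For more than a handful of files, the same ffmpeg flags can be wrapped in a loop. This is a sketch only; `wav_name` and `convert_dir` are hypothetical helper names, and the output is written next to each input.

```shell
# Derive the .wav output name for an input file (hypothetical helper).
wav_name() {
  printf '%s.wav\n' "${1%.*}"
}

# Convert every file with the given extension in a directory to 16 kHz mono WAV.
# convert_dir is a hypothetical helper; the -ar/-ac flags are the ones used above.
# -n refuses to overwrite existing outputs, -nostdin keeps ffmpeg from reading the terminal.
convert_dir() {
  dir="$1"
  ext="$2"
  for src in "$dir"/*."$ext"; do
    [ -e "$src" ] || continue   # glob matched nothing
    ffmpeg -n -nostdin -i "$src" -ar 16000 -ac 1 "$(wav_name "$src")"
  done
}

# Usage: convert_dir ~/Downloads m4a
```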

Tested Conversion (2026-02-19)

ffmpeg -i '/Users/sd9235/Downloads/New Recording.m4a' \
       -ar 16000 -ac 1 \
       '/Users/sd9235/Downloads/recording.wav'
# Result: 181 KB, 5.80 seconds, 16kHz mono
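The reported size can be sanity-checked from the format itself: uncompressed PCM takes seconds × sample rate × channels × bits / 8 bytes. A small sketch of that arithmetic follows; `wav_payload_bytes` is a hypothetical helper name.

```shell
# Expected PCM payload size in bytes: seconds * rate * channels * bits / 8.
# wav_payload_bytes is a hypothetical helper; awk handles the fractional duration.
wav_payload_bytes() {
  awk -v d="$1" -v r="$2" -v c="$3" -v b="$4" \
    'BEGIN { printf "%.0f\n", d * r * c * b / 8 }'
}

wav_payload_bytes 5.80 16000 1 16   # prints 185600
```

185600 bytes is about 181 KiB, which matches the result above once a small WAV header is added.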

Usage

File Transcription

whisper-cli \
  --model ~/whisper-models/ggml-large-v3-turbo.bin \
  --language en \
  --file /path/to/audio.wav
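Conversion and transcription can be chained in one step. The wrapper below is a sketch under a few assumptions: `transcribe` and `needs_conversion` are hypothetical names, the model path is the one used in this document, and the transcript is written next to the input via whisper-cli's `--output-txt` and `--output-file` options.

```shell
MODEL="${MODEL:-$HOME/whisper-models/ggml-large-v3-turbo.bin}"

# Return 0 when the file does not already have a .wav extension.
needs_conversion() {
  case "$1" in
    *.wav|*.WAV) return 1 ;;
    *) return 0 ;;
  esac
}

# Convert if needed, then transcribe and write <name>.txt next to the input.
# transcribe is a hypothetical wrapper, not a whisper.cpp command.
transcribe() {
  input="$1"
  base="${input%.*}"
  wav="$input"
  if needs_conversion "$input"; then
    wav="$base.16k.wav"   # derived file, safe to overwrite with -y
    ffmpeg -y -nostdin -i "$input" -ar 16000 -ac 1 "$wav" || return 1
  fi
  whisper-cli --model "$MODEL" --language en --file "$wav" \
    --output-txt --output-file "$base"
}

# Usage: transcribe ~/Downloads/recording.m4a
```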

Real-Time Microphone Streaming

whisper-stream \
  --model ~/whisper-models/ggml-large-v3-turbo.bin \
  --language en

This is particularly relevant for LysnrAI: it provides real-time microphone transcription locally, so dev and testing do not need the Azure Speech SDK.

HTTP Server Mode

# Start server on port 8080
whisper-server \
  --model ~/whisper-models/ggml-large-v3-turbo.bin \
  --port 8080

# POST audio to get transcription
curl -X POST http://localhost:8080/inference \
  -F "file=@audio.wav" \
  -F "response_format=json"
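For scripted use, the request above can be wrapped in a small function. This is a sketch; `inference_url` and `post_audio` are hypothetical helper names, and the host and port defaults mirror the example above.

```shell
# Build the inference URL for a whisper-server instance (hypothetical helper).
inference_url() {
  printf 'http://%s:%s/inference\n' "${1:-localhost}" "${2:-8080}"
}

# POST an audio file and print the raw JSON response.
# --fail makes curl exit non-zero on HTTP errors instead of printing the error body.
post_audio() {
  curl --silent --show-error --fail -X POST "$(inference_url "$2" "$3")" \
    -F "file=@$1" \
    -F "response_format=json"
}

# Usage: post_audio audio.wav                  # localhost:8080
#        post_audio audio.wav 127.0.0.1 9000
```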

Voice Activity Detection

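# Requires a Silero VAD model in GGML format, not a Whisper model.
# The path below is an assumed location; the VAD model is a separate download.
whisper-vad-speech-segments \
  --vad-model ~/whisper-models/ggml-silero-v5.1.2.bin \
  --file recording.wav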

Benchmarking

whisper-bench --model ~/whisper-models/ggml-large-v3-turbo.bin
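To compare several models on the same machine, the benchmark can be run once per model file. This is a sketch; `list_models` and `bench_all` are hypothetical helper names, and it assumes the directory holds only Whisper models named `ggml-*.bin`.

```shell
# List GGML model files in a directory, one per line (hypothetical helper).
list_models() {
  for f in "$1"/ggml-*.bin; do
    [ -e "$f" ] && printf '%s\n' "$f"
  done
  return 0
}

# Run whisper-bench once per model found (hypothetical helper).
bench_all() {
  list_models "${1:-$HOME/whisper-models}" | while IFS= read -r model; do
    echo "== $model"
    whisper-bench --model "$model"
  done
}

# Usage: bench_all ~/whisper-models
```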

Integration with LysnrAI

The local Whisper stack can serve as an offline fallback or dev replacement for Azure Speech SDK:

Component	Azure (production)	Whisper.cpp (local dev)
Real-time STT	Azure Speech SDK	`whisper-stream`
File transcription	Azure Batch	`whisper-cli`
HTTP API	Azure REST API	`whisper-server`
Cost	Pay-per-use	$0.00 (local)
Latency	~200ms (network)	~50ms (local Metal)
Languages	100+	100+ (same Whisper model)

Potential Integration Points

Desktop app (src/audio/azure_stt.py): Add local Whisper backend option
iOS keyboard (LysnrKeyboard/): Use on-device Whisper for offline dictation
Extraction service evals: Transcribe test audio fixtures locally
