saravanakumardb1 464ffb92ec docs(local-llm): add docs index, hardware specs, and whisper-cpp setup

- docs/README.md: documentation index with quick start, file structure, status table
- docs/01-hardware-and-prerequisites.md: M4 Pro 48GB specs, toolchain inventory,
  disk budget, network environment (Forcepoint proxy details)
- docs/03-whisper-cpp-setup.md: whisper-cpp installation, GGML model guide,
  ffmpeg audio conversion, CLI usage, real-time streaming, LysnrAI integration

2026-02-19 13:00:48 -08:00


01 — Hardware & Prerequisites

Machine specs, installed toolchain, and resource budgets for local LLM inference.

Hardware Specs

Component	Value
Model	MacBook Pro (Mac16,7)
Model Number	Z1FU0002HLL/A
Chip	Apple M4 Pro
CPU Cores	14 (10 Performance + 4 Efficiency)
GPU	Apple Silicon integrated (Metal backend)
Neural Engine	16-core
Memory	48 GB LPDDR5 (unified, shared CPU/GPU)
Memory Manufacturer	Micron
OS	macOS Tahoe (arm64)
Serial	KX6VMGJWM6
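The memory and core counts in the table can be re-checked from a script. The following is a minimal sketch, not the method used to produce the table: it assumes the `sysctl` keys `hw.memsize`, `hw.perflevel0.physicalcpu`, and `hw.perflevel1.physicalcpu` are available (they are expected on Apple Silicon macOS, but treat that as an assumption), and the helper names are hypothetical.

```python
import subprocess

# Assumed sysctl keys for Apple Silicon macOS; not taken from the original document.
KEYS = ["hw.memsize", "hw.perflevel0.physicalcpu", "hw.perflevel1.physicalcpu"]


def parse_sysctl(text):
    """Parse 'key: value' lines as printed by `sysctl key1 key2 ...`."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            values[key.strip()] = value.strip()
    return values


def summarize(values):
    """Convert raw sysctl strings into the figures listed in the specs table."""
    return {
        "memory_gib": int(values["hw.memsize"]) / 1024**3,
        "performance_cores": int(values["hw.perflevel0.physicalcpu"]),
        "efficiency_cores": int(values["hw.perflevel1.physicalcpu"]),
    }


def read_specs():
    """Run sysctl (macOS only) and return the summarized specs."""
    out = subprocess.run(["sysctl", *KEYS], capture_output=True, text=True, check=True)
    return summarize(parse_sysctl(out.stdout))
```

On the machine itself, `read_specs()` should report values matching the Memory and CPU Cores rows; the parsing is kept separate so it can be exercised on any platform with captured output.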

Why This Hardware Matters for LLMs

Apple Silicon's unified memory architecture means the GPU and CPU share the same 48 GB pool. This is ideal for LLM inference because:

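- No PCIe bottleneck copying weights between CPU RAM and VRAM
- Models up to ~45 GB can, in principle, run entirely "on GPU" via Metal (macOS caps GPU-wired memory below the full 48 GB by default, so the largest models may need that limit raised)
- Ollama uses llama.cpp under the hood, which has excellent Metal backend support
- The Neural Engine is not used by llama.cpp or Ollama, which run on the GPU through Metal; it helps only Core ML workloads, such as whisper.cpp's optional Core ML encoder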

What You Can Run

RAM Budget	Model Size	Examples
5-8 GB	7-8B models	qwen2.5-coder:7b, llama3.1:8b, deepseek-coder:6.7b
10-14 GB	14-22B models	deepseek-coder-v2:16b, codestral:22b, phi4:14b
20-24 GB	32B models	qwen2.5-coder:32b, deepseek-r1:32b
40-45 GB	70B models (Q4)	llama3.1:70b — tight, leaves little headroom

Rule of thumb: Keep at least 6-8 GB free for macOS + dev tools (Xcode, VS Code, Docker, etc.).
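The RAM budgets above follow from simple arithmetic: parameter count times bits per weight, plus some working memory. The sketch below illustrates that estimate under stated assumptions: roughly 4.8 effective bits per weight for a 4-bit quantization and a flat 1.5 GB allowance for KV cache and buffers. Both figures are illustrative guesses, not measurements from this machine, and real usage grows with context length.

```python
def estimate_model_gb(params_billion, bits_per_weight=4.8, overhead_gb=1.5):
    """Rough resident size: quantized weights plus a flat allowance for cache and buffers."""
    # 1e9 parameters * bits / 8 bits-per-byte gives decimal GB directly.
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


def fits(params_billion, total_gb=48, reserve_gb=8, **kwargs):
    """True if the estimate still leaves `reserve_gb` free for macOS and dev tools."""
    return estimate_model_gb(params_billion, **kwargs) <= total_gb - reserve_gb


for size in (7, 16, 32, 70):
    print(f"{size}B: ~{estimate_model_gb(size):.1f} GB, fits with 8 GB reserve: {fits(size)}")
```

With these assumptions a 70B model lands above the 40 GB left after an 8 GB reserve, which matches the "tight" note in the table.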

Installed Toolchain

Verified on 2026-02-19.

Brew Packages

Package	Version	Purpose
`ollama`	0.16.2	LLM inference server (llama.cpp + Metal)
`whisper-cpp`	1.8.3	Local speech-to-text (Whisper GGML)
`ffmpeg`	8.0.1	Audio/video format conversion
`sdl2`	2.32.10	Audio I/O library (whisper-cpp dependency)

Key Binaries

/opt/homebrew/bin/ollama
/opt/homebrew/bin/whisper-cli
/opt/homebrew/bin/whisper-server
/opt/homebrew/bin/whisper-stream
/opt/homebrew/bin/whisper-talk-llama
/opt/homebrew/bin/whisper-bench
/opt/homebrew/bin/whisper-command
/opt/homebrew/bin/whisper-lsp
/opt/homebrew/bin/whisper-quantize
/opt/homebrew/bin/whisper-vad-speech-segments
/opt/homebrew/bin/ffmpeg
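A quick way to confirm that these binaries are still present after a Brew upgrade is to check each path. This is a minimal sketch; the function name is hypothetical, and only a subset of the binaries listed above is checked to keep it short.

```python
import os
import shutil

# Subset of the binaries listed above.
BINARIES = ["ollama", "whisper-cli", "whisper-server", "whisper-stream", "ffmpeg"]


def check_binaries(names, prefix="/opt/homebrew/bin"):
    """Map each name to its resolved path, or None when it cannot be found."""
    found = {}
    for name in names:
        candidate = os.path.join(prefix, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            found[name] = candidate
        else:
            # Fall back to a PATH lookup in case Homebrew lives elsewhere.
            found[name] = shutil.which(name)
    return found


for name, path in check_binaries(BINARIES).items():
    print(f"{name:16} {path or 'MISSING'}")
```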

Storage Locations

Path	Content
`~/.ollama/models/`	Downloaded Ollama models (~24 GB currently)
`~/whisper-models/`	Whisper GGML model files (empty — proxy blocked download)
`/opt/homebrew/Cellar/`	Brew package binaries
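The sizes quoted for these locations drift as models are pulled or removed. A minimal sketch for re-measuring them is shown here; the helper name is hypothetical, and it reports decimal GB, so results may differ slightly from what Finder or `du` display.

```python
from pathlib import Path


def dir_size_gb(path):
    """Sum regular-file sizes under `path` in decimal GB; 0.0 if the directory is missing."""
    root = Path(path).expanduser()
    if not root.is_dir():
        return 0.0
    total = sum(
        f.stat().st_size
        for f in root.rglob("*")
        if f.is_file() and not f.is_symlink()  # skip symlinks to avoid double counting
    )
    return total / 1e9


for location in ("~/.ollama/models", "~/whisper-models"):
    print(f"{location}: {dir_size_gb(location):.2f} GB")
```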

Network Environment

This machine is on a corporate network with a Forcepoint proxy:

Proxy: http://cso.proxy.att.com:8080/
SSL Inspection: Forcepoint CertChecker intercepts HTTPS connections
Impact:
- Ollama model pulls work (Ollama handles proxy natively)
- Hugging Face downloads FAIL (curl, Python requests, huggingface_hub all blocked)
- Brew installs work (brew handles proxy)

Workaround: Download Hugging Face models (e.g., Whisper GGML files) from a personal/home network. See 08-troubleshooting.md.
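To see which hosts are reachable through the proxy before starting a large download, a small probe can help. The sketch below is a diagnostic under assumptions, not a fix: it routes requests through the proxy URL given above using Python's standard `urllib`, and the function names are hypothetical. If SSL inspection is the cause of a failure, it would typically show up as a certificate error in the returned detail.

```python
import urllib.error
import urllib.request

PROXY = "http://cso.proxy.att.com:8080/"  # proxy URL from the section above


def make_opener(proxy=PROXY):
    """Build a urllib opener that sends HTTP and HTTPS traffic through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


def probe(url, opener, timeout=10):
    """Return (ok, detail): the HTTP status on success, or the error text on failure."""
    try:
        with opener.open(url, timeout=timeout) as response:
            return True, str(response.status)
    except urllib.error.HTTPError as exc:
        # The server or the proxy answered, but with an error status.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # DNS, TLS, or connection-level failure.
        return False, str(exc)
```

A typical use would be calling `probe("https://huggingface.co/", make_opener())` and comparing the result with a host that is known to work; the target URLs are illustrative.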

Disk Space Budget

Approximate allocation for local AI tooling:

Component	Disk Usage
Ollama models (2 installed)	~24 GB
Whisper models (planned)	~1.6 GB
Brew packages (ollama, whisper-cpp, ffmpeg)	~70 MB
Dashboard app (node_modules)	~300 MB
Total	~26 GB

With 10 Ollama models (see 07-model-recommendations.md), expect ~115 GB total disk usage for models.
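The budget arithmetic can be turned into a quick headroom check before pulling more models. This is a minimal sketch using the figures from the table above; the 20 GB safety margin and the function names are assumptions for illustration.

```python
import shutil

# Figures from the disk budget table, in GB.
CURRENT_GB = {"ollama_models": 24, "whisper_models": 1.6, "brew_packages": 0.07, "dashboard": 0.3}
PLANNED_MODELS_GB = 115  # projected total for 10 Ollama models


def additional_needed_gb(planned_gb=PLANNED_MODELS_GB, installed_gb=CURRENT_GB["ollama_models"]):
    """Extra disk still to be consumed when growing from installed to planned models."""
    return planned_gb - installed_gb


def has_room(path="/", margin_gb=20):
    """True if free space at `path` covers the extra models plus a safety margin."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= additional_needed_gb() + margin_gb


print(f"Current total: ~{sum(CURRENT_GB.values()):.0f} GB")
print(f"Additional for 10 models: ~{additional_needed_gb()} GB")
```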
