01 — Hardware & Prerequisites
Machine specs, installed toolchain, and resource budgets for local LLM inference.
Hardware Specs
| Component | Value |
|---|---|
| Model | MacBook Pro (Mac16,7) |
| Model Number | Z1FU0002HLL/A |
| Chip | Apple M4 Pro |
| CPU Cores | 14 (10 Performance + 4 Efficiency) |
| GPU | Apple Silicon integrated (Metal backend) |
| Neural Engine | 16-core |
| Memory | 48 GB LPDDR5 (unified, shared CPU/GPU) |
| Memory Manufacturer | Micron |
| OS | macOS Tahoe (arm64) |
| Serial | KX6VMGJWM6 |
Why This Hardware Matters for LLMs
Apple Silicon's unified memory architecture means the GPU and CPU share the same 48 GB pool. This is ideal for LLM inference because:
- No PCIe bottleneck copying weights between CPU RAM and VRAM
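- Models up to ~45 GB can, in principle, run entirely "on GPU" via Metal, although macOS by default caps how much unified memory the GPU may use, so the practical ceiling can be lower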
- Ollama uses `llama.cpp` under the hood, which has excellent Metal backend support
- The M4 Pro Neural Engine can accelerate certain Core ML workloads, but `llama.cpp` inference itself runs on the GPU via Metal
What You Can Run
| RAM Budget | Model Size | Examples |
|---|---|---|
| 5-8 GB | 7-8B models | qwen2.5-coder:7b, llama3.1:8b, deepseek-coder:6.7b |
| 10-14 GB | 14-22B models | deepseek-coder-v2:16b, codestral:22b, phi4:14b |
| 20-24 GB | 32B models | qwen2.5-coder:32b, deepseek-r1:32b |
| 40-45 GB | 70B models (Q4) | llama3.1:70b — tight, leaves little headroom |
Rule of thumb: Keep at least 6-8 GB free for macOS + dev tools (Xcode, VS Code, Docker, etc.).
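The table and rule of thumb above can be approximated with simple arithmetic. The sketch below is a rough estimate, not how Ollama sizes models. It assumes about 4.5 bits per weight (in the range of common 4-bit quantizations) and a flat 2 GB overhead for KV cache and runtime buffers. Both figures and the function names are illustrative assumptions.

```python
# Rough sizing sketch. The bits-per-weight and overhead values are
# assumptions chosen for illustration, not measured figures.

TOTAL_RAM_GB = 48.0   # unified memory on this machine
RESERVED_GB = 8.0     # upper end of the 6-8 GB rule of thumb


def estimate_model_gb(params_billion: float,
                      bits_per_weight: float = 4.5,
                      overhead_gb: float = 2.0) -> float:
    """Approximate resident size of a quantized model in GB."""
    # 1e9 parameters * (bits / 8) bytes per parameter = GB of weights
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


def fits(params_billion: float, **kwargs) -> bool:
    """True if the estimate leaves RESERVED_GB free for macOS and dev tools."""
    return estimate_model_gb(params_billion, **kwargs) <= TOTAL_RAM_GB - RESERVED_GB


if __name__ == "__main__":
    for size in (7, 16, 32, 70):
        print(f"{size}B: ~{estimate_model_gb(size):.1f} GB, fits={fits(size)}")
```

Under these assumptions a 70B model fails the headroom check while a 32B model passes, which matches the "tight" note in the table. Longer context windows increase KV cache size, so real usage can exceed the estimate.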
Installed Toolchain
Verified on 2026-02-19.
Brew Packages
| Package | Version | Purpose |
|---|---|---|
| `ollama` | 0.16.2 | LLM inference server (llama.cpp + Metal) |
| `whisper-cpp` | 1.8.3 | Local speech-to-text (Whisper GGML) |
| `ffmpeg` | 8.0.1 | Audio/video format conversion |
| `sdl2` | 2.32.10 | Audio I/O library (whisper-cpp dependency) |
Key Binaries
/opt/homebrew/bin/ollama
/opt/homebrew/bin/whisper-cli
/opt/homebrew/bin/whisper-server
/opt/homebrew/bin/whisper-stream
/opt/homebrew/bin/whisper-talk-llama
/opt/homebrew/bin/whisper-bench
/opt/homebrew/bin/whisper-command
/opt/homebrew/bin/whisper-lsp
/opt/homebrew/bin/whisper-quantize
/opt/homebrew/bin/whisper-vad-speech-segments
/opt/homebrew/bin/ffmpeg
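A quick way to confirm that the binaries listed above resolve on the current `PATH` is a small check with the standard library. This is a minimal sketch. The `EXPECTED_BINARIES` list is a subset picked for illustration and the helper name is hypothetical.

```python
import shutil

# Subset of the binaries listed above; extend as needed.
EXPECTED_BINARIES = [
    "ollama",
    "whisper-cli",
    "whisper-server",
    "whisper-stream",
    "ffmpeg",
]


def find_missing(names: list[str]) -> list[str]:
    """Return the names that cannot be resolved on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing(EXPECTED_BINARIES)
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All expected binaries found.")
```

If a binary is reported missing, check that `/opt/homebrew/bin` is on `PATH` before reinstalling anything.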
Storage Locations
| Path | Content |
|---|---|
| `~/.ollama/models/` | Downloaded Ollama models (~24 GB currently) |
| `~/whisper-models/` | Whisper GGML model files (empty; proxy blocked download) |
| `/opt/homebrew/Cellar/` | Brew package binaries |
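To re-measure the sizes quoted in the table, a directory walk is enough. The sketch below uses decimal gigabytes (10^9 bytes) and skips symlinks to avoid double counting. The helper name is hypothetical, and the paths are the ones from the table.

```python
from pathlib import Path


def dir_size_gb(path: str) -> float:
    """Total size of regular files under path in GB (10**9 bytes).

    Returns 0.0 if the directory does not exist.
    """
    root = Path(path).expanduser()
    if not root.is_dir():
        return 0.0
    total_bytes = sum(
        f.stat().st_size
        for f in root.rglob("*")
        if f.is_file() and not f.is_symlink()
    )
    return total_bytes / 1e9


if __name__ == "__main__":
    for location in ("~/.ollama/models/", "~/whisper-models/"):
        print(f"{location}: {dir_size_gb(location):.2f} GB")
```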
Network Environment
This machine is on a corporate network with a Forcepoint proxy:
- Proxy: `http://cso.proxy.att.com:8080/`
- SSL Inspection: Forcepoint CertChecker intercepts HTTPS connections
- Impact:
  - Ollama model pulls work (Ollama handles the proxy natively)
  - Hugging Face downloads fail (curl, Python requests, and huggingface_hub are all blocked)
  - Brew installs work (brew handles the proxy)
Workaround: Download Hugging Face models (e.g., Whisper GGML files) from a personal/home network. See 08-troubleshooting.md.
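Before attempting a large download, it can help to probe whether a host is reachable through the proxy at all. The following is a minimal sketch using only the standard library. The probe URLs and helper names are illustrative assumptions, and a failed probe only shows that this particular request was refused. It does not diagnose why.

```python
import urllib.error
import urllib.request

PROXY_URL = "http://cso.proxy.att.com:8080/"  # proxy address from the list above


def proxy_map(proxy_url: str) -> dict[str, str]:
    """Route both HTTP and HTTPS traffic through the same proxy."""
    return {"http": proxy_url, "https": proxy_url}


def probe(url: str, proxy_url: str = PROXY_URL,
          timeout: float = 10.0) -> tuple[bool, str]:
    """Try to open url through the proxy; return (ok, short description)."""
    handler = urllib.request.ProxyHandler(proxy_map(proxy_url))
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server or proxy answered, but with an error status.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Covers TLS verification failures, timeouts, and refused connections.
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    # Illustrative targets; substitute the hosts you actually need.
    for target in ("https://huggingface.co", "https://ollama.com"):
        ok, detail = probe(target)
        print(f"{target}: {'reachable' if ok else 'blocked'} ({detail})")
```

A certificate verification error in the output would be consistent with SSL inspection, but confirming that requires checking the presented certificate chain.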
Disk Space Budget
Approximate allocation for local AI tooling:
| Component | Disk Usage |
|---|---|
| Ollama models (2 installed) | ~24 GB |
| Whisper models (planned) | ~1.6 GB |
| Brew packages (ollama, whisper-cpp, ffmpeg) | ~70 MB |
| Dashboard app (node_modules) | ~300 MB |
| Total | ~26 GB |
With 10 Ollama models (see 07-model-recommendations.md), expect ~115 GB total disk usage for models.
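The budget above can be kept in a small script and compared against free disk space. This is a sketch under stated assumptions: the figures are copied from the table, while the 20 GB safety margin and the function names are hypothetical choices.

```python
import shutil

# Figures from the disk budget table, in GB.
BUDGET_GB = {
    "Ollama models (2 installed)": 24.0,
    "Whisper models (planned)": 1.6,
    "Brew packages": 0.07,
    "Dashboard app (node_modules)": 0.3,
}
PLANNED_MODELS_GB = 115.0  # projected model footprint with 10 Ollama models


def total_gb(budget: dict[str, float]) -> float:
    """Sum of all budget entries in GB."""
    return sum(budget.values())


def has_room(required_gb: float, path: str = "/",
             margin_gb: float = 20.0) -> bool:
    """True if the volume holding path can absorb required_gb plus a margin."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb - required_gb >= margin_gb


if __name__ == "__main__":
    print(f"Current budget: ~{total_gb(BUDGET_GB):.0f} GB")
    # Additional space needed to grow from the installed models to the planned set.
    growth_gb = PLANNED_MODELS_GB - BUDGET_GB["Ollama models (2 installed)"]
    print(f"Growth to planned model set: ~{growth_gb:.0f} GB, "
          f"room available: {has_room(growth_gb)}")
```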