saravanakumardb1 464ffb92ec docs(local-llm): add docs index, hardware specs, and whisper-cpp setup
- docs/README.md: documentation index with quick start, file structure, status table
- docs/01-hardware-and-prerequisites.md: M4 Pro 48GB specs, toolchain inventory,
  disk budget, network environment (Forcepoint proxy details)
- docs/03-whisper-cpp-setup.md: whisper-cpp installation, GGML model guide,
  ffmpeg audio conversion, CLI usage, real-time streaming, LysnrAI integration
2026-02-19 13:00:48 -08:00


01 — Hardware & Prerequisites

Machine specs, installed toolchain, and resource budgets for local LLM inference.


Hardware Specs

| Component | Value |
|---|---|
| Model | MacBook Pro (Mac16,7) |
| Model Number | Z1FU0002HLL/A |
| Chip | Apple M4 Pro |
| CPU Cores | 14 (10 Performance + 4 Efficiency) |
| GPU | Apple Silicon integrated (Metal backend) |
| Neural Engine | 16-core |
| Memory | 48 GB LPDDR5 (unified, shared CPU/GPU) |
| Memory Manufacturer | Micron |
| OS | macOS Tahoe (arm64) |
| Serial | KX6VMGJWM6 |

Why This Hardware Matters for LLMs

Apple Silicon's unified memory architecture means the GPU and CPU share the same 48 GB pool. This is ideal for LLM inference because:

  1. No PCIe bottleneck copying weights between CPU RAM and VRAM
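  2. Models that fit within the GPU's share of memory run entirely "on GPU" via Metal. By default, macOS caps that share at roughly 75% of RAM on machines with more than 36 GB, which is about 36 GB here. The cap can be raised with `sudo sysctl iogpu.wired_limit_mb=<MB>`; the setting resets on reboot.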
  3. Ollama uses llama.cpp under the hood, which has excellent Metal backend support
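  4. Ollama and llama.cpp do not use the 16-core Neural Engine, because they run on the GPU through Metal. The Neural Engine is reachable only through Core ML, for example whisper.cpp's optional Core ML encoder build.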
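The sketch below checks what the default GPU share works out to on a given Mac. It relies on a commonly reported heuristic from llama.cpp community discussions, not an Apple-documented API: Metal may wire about two-thirds of RAM on machines with 36 GB or less, and about three-quarters above that. `read_total_memory_bytes` and `default_gpu_budget_gb` are hypothetical helpers. Only `sysctl -n hw.memsize` is a real macOS command.

```python
import subprocess

GIB = 1024 ** 3  # bytes per GiB; a "48 GB" Mac reports 48 GiB via hw.memsize


def read_total_memory_bytes() -> int:
    """Return physical RAM in bytes using macOS's `sysctl -n hw.memsize`."""
    result = subprocess.run(
        ["sysctl", "-n", "hw.memsize"], capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())


def default_gpu_budget_gb(total_gb: float) -> float:
    """Estimate the default Metal working-set limit in GB.

    Assumption (community heuristic, not an Apple API): about 2/3 of RAM on
    machines with <= 36 GB, about 3/4 on machines with more.
    """
    fraction = 2 / 3 if total_gb <= 36 else 3 / 4
    return total_gb * fraction


if __name__ == "__main__":
    total_gb = read_total_memory_bytes() / GIB
    budget = default_gpu_budget_gb(total_gb)
    print(f"RAM: {total_gb:.0f} GB, estimated default GPU budget: ~{budget:.0f} GB")
```

On this 48 GB machine the estimate is about 36 GB. By that estimate, 40-45 GB models run fully on the GPU only if the limit is raised.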

What You Can Run

| RAM Budget | Model Size | Examples |
|---|---|---|
| 5-8 GB | 7-8B models | qwen2.5-coder:7b, llama3.1:8b, deepseek-coder:6.7b |
| 10-14 GB | 14-22B models | deepseek-coder-v2:16b, codestral:22b, phi4:14b |
| 20-24 GB | 32B models | qwen2.5-coder:32b, deepseek-r1:32b |
| 40-45 GB | 70B models (Q4) | llama3.1:70b (tight; leaves little headroom) |

Rule of thumb: Keep at least 6-8 GB free for macOS + dev tools (Xcode, VS Code, Docker, etc.).
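The budgets above follow from a back-of-the-envelope calculation. Quantized weights take roughly parameters × bits-per-weight / 8 bytes, plus some KV cache and runtime overhead, and the rule of thumb reserves 6-8 GB for macOS. This minimal sketch makes two illustrative assumptions: about 4.5 bits per weight for Q4-class quantization, and a flat 1.5 GB overhead. `estimate_model_gb` and `fits` are hypothetical helpers, not Ollama APIs.

```python
def estimate_model_gb(params_billion: float, bits_per_weight: float = 4.5,
                      overhead_gb: float = 1.5) -> float:
    """Rough memory footprint in GB (1 GB = 1e9 bytes) of a quantized model.

    weights = params * bits / 8; overhead_gb is an assumed allowance for the
    KV cache and runtime buffers at modest context lengths.
    """
    weights_gb = params_billion * bits_per_weight / 8  # billions of bytes == GB
    return weights_gb + overhead_gb


def fits(params_billion: float, total_ram_gb: float = 48.0,
         reserve_gb: float = 8.0) -> bool:
    """True if the estimate fits after reserving reserve_gb for macOS and dev tools."""
    return estimate_model_gb(params_billion) <= total_ram_gb - reserve_gb


if __name__ == "__main__":
    for size in (7, 14, 22, 32, 70):
        print(f"{size:>3}B: ~{estimate_model_gb(size):.1f} GB, fits: {fits(size)}")
```

With an 8 GB reserve, the 70B estimate (about 40.9 GB) does not fit. With the 6 GB end of the rule of thumb it just fits, which matches the "tight" note in the table.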


Installed Toolchain

Verified on 2026-02-19.

Brew Packages

| Package | Version | Purpose |
|---|---|---|
| ollama | 0.16.2 | LLM inference server (llama.cpp + Metal) |
| whisper-cpp | 1.8.3 | Local speech-to-text (Whisper GGML) |
| ffmpeg | 8.0.1 | Audio/video format conversion |
| sdl2 | 2.32.10 | Audio I/O library (whisper-cpp dependency) |
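The versions above can be re-verified later with a small script. This sketch assumes Homebrew's `brew list --versions <formula...>` output format, which prints one "name version [version ...]" line per installed formula. `parse_brew_versions` and `compare` are illustrative helpers.

```python
import subprocess

# Versions recorded in the Brew Packages table (verified 2026-02-19).
EXPECTED = {"ollama": "0.16.2", "whisper-cpp": "1.8.3", "ffmpeg": "8.0.1", "sdl2": "2.32.10"}


def parse_brew_versions(output: str) -> dict[str, list[str]]:
    """Parse `brew list --versions` output, one "name v1 [v2 ...]" entry per line."""
    installed: dict[str, list[str]] = {}
    for line in output.splitlines():
        parts = line.split()
        if parts:
            installed[parts[0]] = parts[1:]
    return installed


def compare(expected: dict[str, str], installed: dict[str, list[str]]) -> dict[str, str]:
    """Label each expected package as "ok", "missing", or "differs (...)"."""
    report: dict[str, str] = {}
    for name, want in expected.items():
        have = installed.get(name, [])
        # Homebrew appends "_N" to versions of revised formulae (e.g. "8.0.1_1").
        base_versions = [v.split("_")[0] for v in have]
        if not have:
            report[name] = "missing"
        elif want in base_versions:
            report[name] = "ok"
        else:
            report[name] = f"differs (installed {', '.join(have)})"
    return report


if __name__ == "__main__":
    # check=False: brew may exit non-zero when a listed formula is not installed.
    result = subprocess.run(
        ["brew", "list", "--versions", *EXPECTED], capture_output=True, text=True, check=False
    )
    for name, status in compare(EXPECTED, parse_brew_versions(result.stdout)).items():
        print(f"{name:12} {status}")
```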

Key Binaries

/opt/homebrew/bin/ollama
/opt/homebrew/bin/whisper-cli
/opt/homebrew/bin/whisper-server
/opt/homebrew/bin/whisper-stream
/opt/homebrew/bin/whisper-talk-llama
/opt/homebrew/bin/whisper-bench
/opt/homebrew/bin/whisper-command
/opt/homebrew/bin/whisper-lsp
/opt/homebrew/bin/whisper-quantize
/opt/homebrew/bin/whisper-vad-speech-segments
/opt/homebrew/bin/ffmpeg
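This sketch confirms that each binary above resolves on `PATH`, using the standard library's `shutil.which`. The `locate` helper is illustrative. On Apple Silicon the resolved paths should fall under `/opt/homebrew/bin`.

```python
import shutil
import sys
from typing import Dict, List, Optional

# Binaries listed above.
BINARIES = [
    "ollama", "whisper-cli", "whisper-server", "whisper-stream", "whisper-talk-llama",
    "whisper-bench", "whisper-command", "whisper-lsp", "whisper-quantize",
    "whisper-vad-speech-segments", "ffmpeg",
]


def locate(names: List[str], path: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Resolve each name on PATH (or the given search path); None if not found."""
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    found = locate(BINARIES)
    for name, location in found.items():
        print(f"{name:28} {location or 'MISSING'}")
    # Non-zero exit status if any binary is missing, so the script can gate setup steps.
    sys.exit(1 if None in found.values() else 0)
```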

Storage Locations

| Path | Content |
|---|---|
| ~/.ollama/models/ | Downloaded Ollama models (~24 GB currently) |
| ~/whisper-models/ | Whisper GGML model files (empty; the proxy blocked the download) |
| /opt/homebrew/Cellar/ | Brew package binaries |
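This sketch re-measures how much each storage location holds, as a rough equivalent of `du -sh`. It sums apparent file sizes, which can differ slightly from allocated disk blocks. `dir_size_bytes` is an illustrative helper. A missing directory, such as `~/whisper-models/` before any download, reports 0.

```python
import os
from pathlib import Path

# Storage locations from the table above.
LOCATIONS = [
    Path.home() / ".ollama" / "models",
    Path.home() / "whisper-models",
    Path("/opt/homebrew/Cellar"),
]


def dir_size_bytes(root: Path) -> int:
    """Sum apparent sizes of regular files under root (symlinks skipped); 0 if absent."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):  # yields nothing if root is missing
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


if __name__ == "__main__":
    for location in LOCATIONS:
        print(f"{str(location):45} {dir_size_bytes(location) / 1e9:7.2f} GB")
```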

Network Environment

This machine is on a corporate network with a Forcepoint proxy:

  • Proxy: http://cso.proxy.att.com:8080/
  • SSL Inspection: Forcepoint CertChecker intercepts HTTPS connections
  • Impact:
    • Ollama model pulls work (Ollama handles proxy natively)
    • Hugging Face downloads FAIL (curl, Python requests, huggingface_hub all blocked)
    • Brew installs work (brew handles proxy)

Workaround: Download Hugging Face models (e.g., Whisper GGML files) from a personal/home network. See 08-troubleshooting.md.


Disk Space Budget

Approximate allocation for local AI tooling:

| Component | Disk Usage |
|---|---|
| Ollama models (2 installed) | ~24 GB |
| Whisper models (planned) | ~1.6 GB |
| Brew packages (ollama, whisper-cpp, ffmpeg) | ~70 MB |
| Dashboard app (node_modules) | ~300 MB |
| Total | ~26 GB |

With 10 Ollama models (see 07-model-recommendations.md), expect ~115 GB total disk usage for models.
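A few lines of arithmetic check the table total and show what the ~115 GB model estimate implies for the whole budget. This sketch only re-adds the table's own figures. `total_gb` and `projected_total_gb` are illustrative helpers, and the projection assumes the non-Ollama items stay unchanged.

```python
# Figures from the Disk Space Budget table, in decimal GB (70 MB = 0.07 GB).
OLLAMA_KEY = "Ollama models (2 installed)"
BUDGET_GB = {
    OLLAMA_KEY: 24.0,
    "Whisper models (planned)": 1.6,
    "Brew packages (ollama, whisper-cpp, ffmpeg)": 0.07,
    "Dashboard app (node_modules)": 0.3,
}


def total_gb(budget: dict[str, float]) -> float:
    """Sum all line items."""
    return sum(budget.values())


def projected_total_gb(budget: dict[str, float], ollama_gb: float) -> float:
    """Total after replacing the current Ollama line with a new model estimate."""
    return total_gb(budget) - budget[OLLAMA_KEY] + ollama_gb


if __name__ == "__main__":
    print(f"Current total: ~{total_gb(BUDGET_GB):.0f} GB")  # ~26 GB
    print(f"With ~115 GB of models: ~{projected_total_gb(BUDGET_GB, 115.0):.0f} GB")  # ~117 GB
    print(f"Implied average per model: ~{115.0 / 10:.1f} GB")  # ~11.5 GB
```

With ten models the projection comes to roughly 117 GB overall, and the models average about 11.5 GB each.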