# 01 - Hardware & Prerequisites

> Machine specs, installed toolchain, and resource budgets for local LLM inference.

---

## Hardware Specs

| Component               | Value                                    |
| ----------------------- | ---------------------------------------- |
| **Model**               | MacBook Pro (Mac16,7)                    |
| **Model Number**        | Z1FU0002HLL/A                            |
| **Chip**                | Apple M4 Pro                             |
| **CPU Cores**           | 14 (10 Performance + 4 Efficiency)       |
| **GPU**                 | Apple Silicon integrated (Metal backend) |
| **Neural Engine**       | 16-core                                  |
| **Memory**              | 48 GB LPDDR5 (unified, shared CPU/GPU)   |
| **Memory Manufacturer** | Micron                                   |
| **OS**                  | macOS Tahoe (arm64)                      |
| **Serial**              | KX6VMGJWM6                               |

### Why This Hardware Matters for LLMs

Apple Silicon's **unified memory architecture** means the GPU and CPU share the same 48 GB pool. This is ideal for LLM inference because:

1. There is no PCIe bottleneck: weights are never copied between CPU RAM and separate VRAM.
2. Models up to ~45 GB can run entirely "on GPU" via Metal.
3. Ollama uses `llama.cpp` under the hood, which has excellent Metal backend support.

**Note:** Ollama and `llama.cpp` run inference on the GPU (via Metal) and the CPU. They do not use the 16-core Neural Engine, so it does not speed up the models listed here.

### What You Can Run

| RAM Budget | Model Size      | Examples                                           |
| ---------- | --------------- | -------------------------------------------------- |
| 5-8 GB     | 7-8B models     | qwen2.5-coder:7b, llama3.1:8b, deepseek-coder:6.7b |
| 10-14 GB   | 14-22B models   | deepseek-coder-v2:16b, codestral:22b, phi4:14b     |
| 20-24 GB   | 32B models      | qwen2.5-coder:32b, deepseek-r1:32b                 |
| 40-45 GB   | 70B models (Q4) | llama3.1:70b (tight; leaves little headroom)       |

**Rule of thumb:** Keep at least 6-8 GB free for macOS + dev tools (Xcode, VS Code, Docker, etc.).

---

## Installed Toolchain

Verified on 2026-02-19.
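
To repeat this verification at a later date, one option is to check that the expected executables still resolve on `PATH`. The following is a minimal sketch, not part of the original setup: the binary names are a subset of this guide's Key Binaries list, and `find_missing` is a hypothetical helper name.

```python
import shutil

# Executable names taken from this guide's "Key Binaries" list (subset).
EXPECTED_BINARIES = ["ollama", "whisper-cli", "whisper-server", "ffmpeg"]


def find_missing(names, which=shutil.which):
    """Return the names that `which` cannot resolve on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = find_missing(EXPECTED_BINARIES)
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All expected binaries found.")
```

The `which` parameter exists only so the lookup can be swapped out in tests. This check confirms presence, not versions; version numbers would still need to be compared against the table by hand.
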
### Brew Packages

| Package       | Version | Purpose                                    |
| ------------- | ------- | ------------------------------------------ |
| `ollama`      | 0.16.2  | LLM inference server (llama.cpp + Metal)   |
| `whisper-cpp` | 1.8.3   | Local speech-to-text (Whisper GGML)        |
| `ffmpeg`      | 8.0.1   | Audio/video format conversion              |
| `sdl2`        | 2.32.10 | Audio I/O library (whisper-cpp dependency) |

### Key Binaries

```
/opt/homebrew/bin/ollama
/opt/homebrew/bin/whisper-cli
/opt/homebrew/bin/whisper-server
/opt/homebrew/bin/whisper-stream
/opt/homebrew/bin/whisper-talk-llama
/opt/homebrew/bin/whisper-bench
/opt/homebrew/bin/whisper-command
/opt/homebrew/bin/whisper-lsp
/opt/homebrew/bin/whisper-quantize
/opt/homebrew/bin/whisper-vad-speech-segments
/opt/homebrew/bin/ffmpeg
```

### Storage Locations

| Path                    | Content                                                      |
| ----------------------- | ------------------------------------------------------------ |
| `~/.ollama/models/`     | Downloaded Ollama models (~24 GB currently)                  |
| `~/whisper-models/`     | Whisper GGML model files (empty; the proxy blocked download) |
| `/opt/homebrew/Cellar/` | Brew package binaries                                        |

---

## Network Environment

This machine is on a **corporate network** with a Forcepoint proxy:

- **Proxy:** `http://cso.proxy.att.com:8080/`
- **SSL Inspection:** Forcepoint CertChecker intercepts HTTPS connections
- **Impact:**
  - Ollama model pulls work (Ollama handles the proxy natively)
  - Hugging Face downloads FAIL (curl, Python requests, and huggingface_hub are all blocked)
  - Brew installs work (brew handles the proxy)

**Workaround:** Download Hugging Face models (e.g., Whisper GGML files) from a personal/home network. See [08-troubleshooting.md](08-troubleshooting.md).

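
When a download fails behind this proxy, it helps to know whether the cause is TLS inspection or something else. The sketch below probes a URL through the proxy using only the standard library and labels the failure. It is an illustration under assumptions: the function names `classify_error` and `probe` are hypothetical, and whether Forcepoint failures on this network actually surface as certificate-verification errors has not been confirmed here.

```python
import ssl
import urllib.error
import urllib.request

# Proxy URL as documented in this section.
PROXY_URL = "http://cso.proxy.att.com:8080/"


def classify_error(exc):
    """Map an exception raised while opening a URL to a short label."""
    # URLError wraps the underlying cause in .reason; other errors have none.
    reason = getattr(exc, "reason", exc)
    if isinstance(reason, ssl.SSLCertVerificationError):
        return "tls-inspection"  # assumption: re-signed cert is not trusted
    if isinstance(exc, urllib.error.HTTPError):
        return f"http-{exc.code}"
    return "network-error"


def probe(url, proxy=PROXY_URL, timeout=10):
    """Try to open `url` through `proxy` and return a diagnosis label."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return f"ok-{resp.status}"
    except OSError as exc:  # URLError, HTTPError, and SSLError are OSErrors
        return classify_error(exc)


if __name__ == "__main__":
    print(probe("https://huggingface.co/"))
```

A `tls-inspection` result would suggest the interception certificate is not in Python's trust store, while an `http-…` result would point to the proxy rejecting the request outright. Either way, the home-network workaround above remains the documented fix.
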

---

## Disk Space Budget

Approximate allocation for local AI tooling:

| Component                                   | Disk Usage |
| ------------------------------------------- | ---------- |
| Ollama models (2 installed)                 | ~24 GB     |
| Whisper models (planned)                    | ~1.6 GB    |
| Brew packages (ollama, whisper-cpp, ffmpeg) | ~70 MB     |
| Dashboard app (node_modules)                | ~300 MB    |
| **Total**                                   | **~26 GB** |

With 10 Ollama models (see [07-model-recommendations.md](07-model-recommendations.md)), expect **~115 GB** total disk usage for models.
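
The total in the table can be reproduced by summing its rows (24 + 1.6 + 0.07 + 0.3 = 25.97 GB, which rounds to ~26 GB). The sketch below does that and also measures a directory on disk for comparison. The helper names are hypothetical, and the measurement uses decimal gigabytes and apparent file sizes, so it may differ slightly from what `du` or Finder report.

```python
import os

# Figures copied from the table above, in GB (70 MB = 0.07 GB, 300 MB = 0.3 GB).
BUDGET_GB = {
    "Ollama models (2 installed)": 24.0,
    "Whisper models (planned)": 1.6,
    "Brew packages": 0.07,
    "Dashboard app (node_modules)": 0.3,
}


def total_budget_gb(budget):
    """Sum the budgeted sizes in GB."""
    return sum(budget.values())


def dir_size_gb(path):
    """Sum file sizes under `path` in decimal GB, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(os.path.expanduser(path)):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total / 1e9


if __name__ == "__main__":
    print(f"Budgeted: ~{total_budget_gb(BUDGET_GB):.0f} GB")
    print(f"~/.ollama/models on disk: {dir_size_gb('~/.ollama/models'):.1f} GB")
```

Running this after each model pull gives a quick check on how close actual usage is to the budget.
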