# 01 — Hardware & Prerequisites

> Machine specs, installed toolchain, and resource budgets for local LLM inference.

---

## Hardware Specs

| Component               | Value                                    |
| ----------------------- | ---------------------------------------- |
| **Model**               | MacBook Pro (Mac16,7)                    |
| **Model Number**        | Z1FU0002HLL/A                            |
| **Chip**                | Apple M4 Pro                             |
| **CPU Cores**           | 14 (10 Performance + 4 Efficiency)       |
| **GPU**                 | Apple Silicon integrated (Metal backend) |
| **Neural Engine**       | 16-core                                  |
| **Memory**              | 48 GB LPDDR5 (unified, shared CPU/GPU)   |
| **Memory Manufacturer** | Micron                                   |
| **OS**                  | macOS Tahoe (arm64)                      |
| **Serial**              | KX6VMGJWM6                               |
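
The values in this table can be read back from the machine rather than copied by hand. The following is a minimal sketch, assuming macOS on Apple Silicon, where `sysctl` is expected to expose `hw.memsize`, `machdep.cpu.brand_string`, and the `hw.perflevel0` / `hw.perflevel1` core counts. The helper names (`read_sysctl`, `bytes_to_gib`, `collect_specs`) are illustrative and not part of any tool in this guide.

```python
import subprocess

# sysctl keys assumed to exist on Apple Silicon macOS; on other platforms
# the lookups simply come back as None.
SYSCTL_KEYS = {
    "chip": "machdep.cpu.brand_string",
    "memory_bytes": "hw.memsize",
    "performance_cores": "hw.perflevel0.physicalcpu",
    "efficiency_cores": "hw.perflevel1.physicalcpu",
}


def read_sysctl(key):
    """Return the value of one sysctl key as a string, or None if unavailable."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", key], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def bytes_to_gib(raw):
    """Convert a byte count (int or numeric string) to GiB."""
    return int(raw) / 1024**3


def collect_specs():
    """Read every key in SYSCTL_KEYS and return a name -> value dict."""
    return {name: read_sysctl(key) for name, key in SYSCTL_KEYS.items()}


for name, value in collect_specs().items():
    if name == "memory_bytes" and value and value.isdigit():
        print(f"memory: {bytes_to_gib(value):.0f} GiB")
    else:
        print(f"{name}: {value}")
```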

### Why This Hardware Matters for LLMs

Apple Silicon's **unified memory architecture** means the GPU and CPU share the same 48 GB pool. This is ideal for LLM inference because:

1. No PCIe bottleneck copying weights between CPU RAM and VRAM
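2. Large models can run entirely "on GPU" via Metal. By default macOS lets the GPU wire only part of unified memory (commonly reported as roughly 75% on machines of this size), so models in the ~40-45 GB range may need that limit raised before they fit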
3. Ollama uses `llama.cpp` under the hood, which has excellent Metal backend support
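4. The 16-core Neural Engine is not used by Ollama or `llama.cpp`, which run on the GPU through Metal; it only matters for tools that target Core ML (whisper.cpp can optionally do so for its encoder)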

### What You Can Run

| RAM Budget | Model Size      | Examples                                           |
| ---------- | --------------- | -------------------------------------------------- |
| 5-8 GB     | 7-8B models     | qwen2.5-coder:7b, llama3.1:8b, deepseek-coder:6.7b |
| 10-14 GB   | 14-22B models   | deepseek-coder-v2:16b, codestral:22b, phi4:14b     |
| 20-24 GB   | 32B models      | qwen2.5-coder:32b, deepseek-r1:32b                 |
| 40-45 GB   | 70B models (Q4) | llama3.1:70b — tight, leaves little headroom       |

**Rule of thumb:** Keep at least 6-8 GB free for macOS + dev tools (Xcode, VS Code, Docker, etc.).
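
The RAM budgets in the table can be approximated with simple arithmetic. This is a rough sketch, not how Ollama sizes models: it assumes about 4.5 bits per weight for a 4-bit quantization and a flat 1.5 GB allowance for KV cache and runtime buffers. Both numbers, and the function names, are assumptions chosen for illustration.

```python
def estimate_model_ram_gb(params_billions, bits_per_weight=4.5, overhead_gb=1.5):
    """Estimate resident size: quantized weights plus a flat overhead allowance."""
    # 1e9 parameters * bits per weight / 8 bits per byte = size in GB
    weights_gb = params_billions * bits_per_weight / 8
    return weights_gb + overhead_gb


def fits_in_budget(params_billions, total_ram_gb=48.0, reserve_gb=8.0,
                   bits_per_weight=4.5, overhead_gb=1.5):
    """True if the estimate fits after reserving RAM for macOS and dev tools."""
    needed = estimate_model_ram_gb(params_billions, bits_per_weight, overhead_gb)
    return needed <= total_ram_gb - reserve_gb


for size in (7, 14, 32, 70):
    estimate = estimate_model_ram_gb(size)
    print(f"{size}B: ~{estimate:.1f} GB, fits with 8 GB reserved: {fits_in_budget(size)}")
```

Under these assumptions a 70B model comes to about 40.9 GB, which exceeds the 40 GB left after an 8 GB reserve and only fits when the reserve drops to 6 GB. That matches the "tight" note in the table.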

---

## Installed Toolchain

Verified on 2026-02-19.

### Brew Packages

| Package       | Version | Purpose                                    |
| ------------- | ------- | ------------------------------------------ |
| `ollama`      | 0.16.2  | LLM inference server (llama.cpp + Metal)   |
| `whisper-cpp` | 1.8.3   | Local speech-to-text (Whisper GGML)        |
| `ffmpeg`      | 8.0.1   | Audio/video format conversion              |
| `sdl2`        | 2.32.10 | Audio I/O library (whisper-cpp dependency) |

### Key Binaries

```
/opt/homebrew/bin/ollama
/opt/homebrew/bin/whisper-cli
/opt/homebrew/bin/whisper-server
/opt/homebrew/bin/whisper-stream
/opt/homebrew/bin/whisper-talk-llama
/opt/homebrew/bin/whisper-bench
/opt/homebrew/bin/whisper-command
/opt/homebrew/bin/whisper-lsp
/opt/homebrew/bin/whisper-quantize
/opt/homebrew/bin/whisper-vad-speech-segments
/opt/homebrew/bin/ffmpeg
```
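
To re-verify that these binaries are still on the search path after an upgrade, something like the following can be used. It is a small sketch built on the standard library's `shutil.which`; the list covers only a subset of the binaries above, and the helper names are illustrative.

```python
import shutil

# A subset of the binaries listed above; extend as needed.
EXPECTED_BINARIES = ["ollama", "whisper-cli", "whisper-server", "ffmpeg"]


def find_binaries(names, path=None):
    """Map each command name to its resolved path, or None if not found."""
    return {name: shutil.which(name, path=path) for name in names}


def missing_binaries(names, path=None):
    """Return the names that could not be resolved on the search path."""
    return [name for name, found in find_binaries(names, path).items() if found is None]


for name, location in find_binaries(EXPECTED_BINARIES).items():
    print(f"{name:16} {location or 'NOT FOUND'}")
```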

### Storage Locations

| Path                    | Content                                                   |
| ----------------------- | --------------------------------------------------------- |
| `~/.ollama/models/`     | Downloaded Ollama models (~24 GB currently)               |
| `~/whisper-models/`     | Whisper GGML model files (empty — proxy blocked download) |
| `/opt/homebrew/Cellar/` | Brew package binaries                                     |

---

## Network Environment

This machine is on a **corporate network** with a Forcepoint proxy:

- **Proxy:** `http://cso.proxy.att.com:8080/`
- **SSL Inspection:** Forcepoint CertChecker intercepts HTTPS connections
- **Impact:**
  - Ollama model pulls work (Ollama handles the proxy natively)
  - Hugging Face downloads fail (`curl`, Python `requests`, and `huggingface_hub` are all blocked)
  - Brew installs work (Homebrew handles the proxy)

**Workaround:** Download Hugging Face models (e.g., Whisper GGML files) from a personal/home network. See [08-troubleshooting.md](08-troubleshooting.md).
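
For scripts that launch other tools from this machine, the proxy can be passed explicitly through the conventional `HTTP_PROXY` / `HTTPS_PROXY` environment variables. The sketch below assumes the child process honors those variables, which depends on the tool. It only routes traffic through the proxy and makes no attempt to work around the Hugging Face block. The helper names are illustrative.

```python
import os

# Proxy URL taken from this document; adjust for other networks.
PROXY_URL = "http://cso.proxy.att.com:8080/"
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy")


def proxy_env(proxy_url=PROXY_URL, base=None):
    """Return a copy of the environment with the proxy variables set."""
    env = dict(os.environ if base is None else base)
    for var in PROXY_VARS:
        env[var] = proxy_url
    return env


def describe_proxy(env=None):
    """Report which proxy variables are set in the given environment."""
    env = os.environ if env is None else env
    return {var: env.get(var) for var in PROXY_VARS}


for var, value in describe_proxy().items():
    print(f"{var}: {value or '(unset)'}")
```

The returned dict can be handed to `subprocess.run(..., env=proxy_env())` without modifying the parent process environment.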

---

## Disk Space Budget

Approximate allocation for local AI tooling:

| Component                                   | Disk Usage |
| ------------------------------------------- | ---------- |
| Ollama models (2 installed)                 | ~24 GB     |
| Whisper models (planned)                    | ~1.6 GB    |
| Brew packages (ollama, whisper-cpp, ffmpeg) | ~70 MB     |
| Dashboard app (node_modules)                | ~300 MB    |
| **Total**                                   | **~26 GB** |

With 10 Ollama models (see [07-model-recommendations.md](07-model-recommendations.md)), expect **~115 GB** total disk usage for models.
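
The totals above are plain sums, and the largest entry can be checked against what is actually on disk. The sketch below copies the figures from the table and adds a hypothetical `dir_size_gb` helper that sums apparent file sizes, which can differ slightly from what `du` reports.

```python
from pathlib import Path

# Figures copied from the table above, in GB.
BUDGET_GB = {
    "Ollama models (2 installed)": 24.0,
    "Whisper models (planned)": 1.6,
    "Brew packages": 0.07,
    "Dashboard app (node_modules)": 0.3,
}


def total_gb(budget):
    """Sum all budget entries."""
    return sum(budget.values())


def dir_size_gb(path):
    """Sum file sizes under path in decimal GB; 0.0 if the directory is missing."""
    root = Path(path).expanduser()
    if not root.is_dir():
        return 0.0
    size = sum(
        p.stat().st_size
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()
    )
    return size / 1e9


print(f"budgeted total: {total_gb(BUDGET_GB):.2f} GB")
print(f"~/.ollama/models on disk: {dir_size_gb('~/.ollama/models'):.1f} GB")
```

The ~115 GB projection for 10 models works out to an average of about 11.5 GB per model.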