diff --git a/__LOCAL_LLMs/windows_specific/mac-vs-windows-comparison.md b/__LOCAL_LLMs/windows_specific/mac-vs-windows-comparison.md new file mode 100644 index 00000000..fd63110c --- /dev/null +++ b/__LOCAL_LLMs/windows_specific/mac-vs-windows-comparison.md @@ -0,0 +1,374 @@ +# Mac vs Windows — Side-by-Side Machine Comparison + +> **Purpose:** Detailed capability comparison of both development machines for local AI/ML workloads, software development, and daily operations. +> **Date:** February 2026 + +--- + +## Hardware Specifications + +``` +┌─────────────────────────┬──────────────────────────────┬──────────────────────────────────┐ +│ │ MacBook Pro 16" │ Razer Blade 18 │ +│ │ (Current daily driver) │ (New Windows machine) │ +├─────────────────────────┼──────────────────────────────┼──────────────────────────────────┤ +│ CPU │ Apple M4 Pro │ Intel Core Ultra 9 275HX │ +│ CPU Cores │ 14 (10P + 4E) │ 24 (8P + 16E) │ +│ CPU Threads │ 14 │ 24 │ +│ CPU TDP │ ~40W (efficiency-first) │ ~55W base, ~157W turbo │ +│ │ │ │ +│ GPU │ Apple M4 Pro (integrated) │ NVIDIA RTX 5090 (discrete) │ +│ GPU Cores │ 20 Metal cores │ ~18,000–20,000 CUDA cores │ +│ GPU Memory │ Shared (from 48 GB unified) │ 24 GB GDDR7 (dedicated) │ +│ GPU Compute API │ Metal / MPS │ CUDA 13.x / Tensor cores │ +│ AI Accelerator │ 16-core Neural Engine │ NVIDIA Tensor cores (5th/6th gen) │ +│ │ │ + Intel NPU │ +│ │ │ │ +│ RAM │ 48 GB unified (LPDDR5X) │ 64 GB DDR5-5600 │ +│ RAM Bandwidth │ ~273 GB/s │ ~90–120 GB/s │ +│ RAM Architecture │ Unified (CPU+GPU shared) │ Separate (CPU RAM + GPU VRAM) │ +│ │ │ │ +│ Storage │ 1 TB NVMe │ 4 TB NVMe (2×2 TB) │ +│ Storage Speed │ ~7,400 MB/s read │ ~7,000–14,000 MB/s read │ +│ │ │ │ +│ Display │ 16" Liquid Retina XDR │ 18" Dual-Mode │ +│ │ 3456×2234, 120Hz ProMotion │ UHD+ 240Hz / FHD+ 440Hz │ +│ │ │ │ +│ OS │ macOS Sequoia │ Windows 11 Home + WSL2 │ +│ Weight │ ~2.1 kg │ ~3.1 kg │ +│ Price │ ~$3,500 │ $5,200 │ +└─────────────────────────┴──────────────────────────────┴──────────────────────────────────┘ +``` + +--- + +## AI / ML Model Capability + +### Models Currently Installed (Both Machines) + +``` +┌──────────────────────────┬─────────┬──────────────────────────┬──────────────────────────┐ +│ Model │ Size │ Mac (M4 Pro 48 GB) │ Razer (RTX 5090 24 GB) │ +├──────────────────────────┼─────────┼──────────────────────────┼──────────────────────────┤ +│ llama3.1:8b │ 4.9 GB │ ✅ Full GPU (MPS) │ ✅ Full GPU (CUDA) │ +│ qwen2.5-coder:7b │ 4.7 GB │ ✅ Full GPU (MPS) │ ✅ Full GPU (CUDA) │ +│ sematre/orpheus:en │ 4.0 GB │ ✅ Full GPU (MPS) │ ✅ Full GPU (CUDA) │ +│ qwen2.5-coder:32b │ 19 GB │ ✅ Full GPU (MPS) │ ✅ Full GPU (CUDA) │ +│ deepseek-r1:32b │ 19 GB │ ✅ Full GPU (MPS) │ ✅ Full GPU (CUDA) │ +├──────────────────────────┼─────────┼──────────────────────────┼──────────────────────────┤ +│ Total VRAM needed │ ~52 GB │ Shared from 48 GB pool │ 24 GB dedicated VRAM │ +│ (loaded one at a time) │ │ (any single model fits) │ (any single model fits) │ +└──────────────────────────┴─────────┴──────────────────────────┴──────────────────────────┘ +``` + +### Performance Estimates (Inference Speed) + +``` +┌─────────────────────────────┬─────────────────────────┬─────────────────────────┬────────┐ +│ Workload │ Mac M4 Pro │ Razer RTX 5090 │ Winner │ +├─────────────────────────────┼─────────────────────────┼─────────────────────────┼────────┤ +│ qwen2.5-coder:32b (coding) │ ~15–25 tok/s │ ~40–60 tok/s │ 🟦 Win │ +│ qwen2.5-coder:7b (fast) │ ~40–60 tok/s │ ~80–120 tok/s │ 🟦 Win │ +│ deepseek-r1:32b (reasoning) │ ~15–25 tok/s │ ~40–60 tok/s │ 🟦 Win │ +│ llama3.1:8b (general) │ ~50–70 tok/s │ ~100–150 tok/s │ 🟦 Win │ +│ Orpheus TTS (voice) │ ~realtime (MPS) │ ~2–3× realtime (CUDA) │ 🟦 Win │ +│ Qwen3-TTS (voice) │ ~realtime (MPS) │ ~2–4× realtime (CUDA) │ 🟦 Win │ +│ Whisper large-v3-turbo │ ~2–4× realtime (Metal) │ ~8–15× realtime (CUDA) │ 🟦 Win │ +│ Stable Diffusion XL │ ~30s/image (MPS) │ ~5–8s/image (CUDA) │ 🟦 Win │ +├─────────────────────────────┼─────────────────────────┼─────────────────────────┼────────┤ +│ SUMMARY │ Capable, good speed │ 2–4× faster on all │ 🟦 Win │ +│ │ │ GPU-bound workloads │ │ +└─────────────────────────────┴─────────────────────────┴─────────────────────────┴────────┘ +``` + +### What Larger Models Can Each Run? + +``` +┌────────────────────────────┬────────────────────────────┬────────────────────────────┐ +│ Model │ Mac M4 Pro (48 GB unified) │ Razer (24 GB VRAM + 64 RAM)│ +├────────────────────────────┼────────────────────────────┼────────────────────────────┤ +│ 7–8B models (Q4) │ ✅ Fast, fully in GPU │ ✅ Very fast, fully in GPU │ +│ 13B models (Q4) │ ✅ Fast, fully in GPU │ ✅ Very fast, fully in GPU │ +│ 32B models (Q4) │ ✅ Fits, good speed │ ✅ Fits in VRAM, very fast │ +│ 70B models (Q4, ~40 GB) │ ⚠️ Fits tight, slow │ ⚠️ Partial GPU + RAM offload│ +│ │ (~5–10 tok/s) │ (~10–20 tok/s) │ +│ 70B models (Q8, ~70 GB) │ ❌ Won't fit │ ⚠️ Mostly RAM, slow │ +│ 2× models simultaneously │ ⚠️ Only small models │ ⚠️ Only if total <24 GB │ +│ 3+ models simultaneously │ ❌ Not practical │ ❌ Not practical │ +└────────────────────────────┴────────────────────────────┴────────────────────────────┘ +``` + +**Key insight:** The Mac's unified memory architecture lets it load a 40 GB model (like 70B Q4) into GPU memory because CPU and GPU share the same 48 GB pool. The Razer has a hard 24 GB VRAM ceiling — anything beyond that spills to CPU RAM over PCIe, which is much slower. For models ≤32B, the Razer wins on raw speed. For 70B models, the Mac may actually provide a smoother (though slower) experience because it avoids the CPU↔GPU data transfer penalty. + +--- + +## Software Development Capabilities + +``` +┌──────────────────────────────┬──────────────────────┬──────────────────────┬──────────┐ +│ Capability │ Mac │ Windows + WSL2 │ Notes │ +├──────────────────────────────┼──────────────────────┼──────────────────────┼──────────┤ +│ Node.js / TypeScript │ ✅ Native │ ✅ WSL2 │ Same │ +│ Python 3.12 │ ✅ Native │ ✅ WSL2 │ Same │ +│ Fastify / Next.js dev │ ✅ Native │ ✅ WSL2 │ Same │ +│ Docker / Compose │ ✅ Docker Desktop │ ✅ Docker Desktop WSL2│ Same │ +│ Git / GitHub │ ✅ Native │ ✅ WSL2 │ Same │ +│ pnpm workspace builds │ ✅ Native │ ✅ WSL2 │ Same │ +│ Xcode / iOS builds │ ✅ Native │ ❌ Not available │ Mac only │ +│ SwiftUI development │ ✅ Native │ ❌ Not available │ Mac only │ +│ Android Studio / builds │ ✅ Native │ ✅ Native or WSL2 │ Same │ +│ VS Code / Windsurf / Cursor │ ✅ Native │ ✅ Native + WSL2 ext │ Same │ +│ CUDA development │ ❌ No NVIDIA GPU │ ✅ Native CUDA 13.x │ Win only │ +│ TensorRT optimization │ ❌ Not available │ ✅ RTX 5090 │ Win only │ +│ Unreal Engine 5 │ ⚠️ Limited (Metal) │ ✅ Full (DX12, CUDA) │ Win best │ +│ Blender GPU rendering │ ⚠️ Metal (slower) │ ✅ CUDA (much faster) │ Win best │ +│ Shell scripting (bash) │ ✅ Native │ ✅ WSL2 │ Same │ +│ PowerShell │ ⚠️ pwsh available │ ✅ Native │ Win best │ +│ k6 load testing │ ✅ Native │ ✅ WSL2 │ Same │ +│ pytest / vitest │ ✅ Native │ ✅ WSL2 │ Same │ +└──────────────────────────────┴──────────────────────┴──────────────────────┴──────────┘ +``` + +--- + +## Local LLM Stack Operations + +``` +┌──────────────────────────────┬───────────────────────┬───────────────────────┬──────────┐ +│ Operation │ Mac │ Windows + WSL2 │ Winner │ +├──────────────────────────────┼───────────────────────┼───────────────────────┼──────────┤ +│ Ollama model serving │ ✅ Native (Metal) │ ✅ Native Win (CUDA) │ 🟦 Win │ +│ Mission Control dashboard │ ✅ bash start-dash... │ ✅ bash start-dash... │ Tie │ +│ setup-tts.sh │ ✅ MPS PyTorch │ ✅ CUDA PyTorch │ 🟦 Win │ +│ Orpheus TTS generation │ ✅ ~realtime │ ✅ ~2–3× realtime │ 🟦 Win │ +│ Qwen3-TTS generation │ ✅ ~realtime │ ✅ ~2–4× realtime │ 🟦 Win │ +│ Whisper transcription │ ✅ Homebrew binary │ ✅ CUDA build │ 🟦 Win │ +│ Multi-model hot-swap │ ✅ Ollama auto-loads │ ✅ Ollama auto-loads │ Tie │ +│ Batch audio transcription │ ⚠️ Decent (~3× RT) │ ✅ Fast (~10× RT) │ 🟦 Win │ +│ Fine-tuning (LoRA) │ ⚠️ Small models only │ ✅ 24 GB VRAM │ 🟦 Win │ +│ Running while on battery │ ✅ 12–16 hr battery │ ⚠️ ~1–2 hr (180W GPU) │ 🍎 Mac │ +│ Silent operation │ ✅ Fanless at idle │ ⚠️ Fans audible │ 🍎 Mac │ +└──────────────────────────────┴───────────────────────┴───────────────────────┴──────────┘ +``` + +--- + +## Memory Architecture Deep Dive + +This is the single most important architectural difference between the two machines. + +``` +┌─────────────────────────────────────────────────────────────────────────────────────────┐ +│ │ +│ Mac M4 Pro — Unified Memory Architecture │ +│ ┌─────────────────────────────────────────────────────┐ │ +│ │ 48 GB Unified LPDDR5X │ │ +│ │ ┌─────────────────────────────┐ │ │ +│ │ │ Shared by CPU + GPU + NPU │ │ │ +│ │ │ Bandwidth: ~273 GB/s │ │ │ +│ │ │ No data copying needed │ │ │ +│ │ └─────────────────────────────┘ │ │ +│ │ │ │ +│ │ ✅ A 40 GB model sits in memory once │ │ +│ │ ✅ GPU reads it directly — no transfer overhead │ │ +│ │ ⚠️ But GPU compute is slower than CUDA │ │ +│ └─────────────────────────────────────────────────────┘ │ +│ │ +│ Razer RTX 5090 — Discrete GPU Architecture │ +│ ┌──────────────────────┐ PCIe 5.0 ┌──────────────────────┐ │ +│ │ 64 GB DDR5 (CPU) │◄═══(~64 GB/s)════►│ 24 GB GDDR7 (GPU) │ │ +│ │ Bandwidth: ~90 GB/s │ │ Bandwidth: ~1 TB/s │ │ +│ └──────────────────────┘ └──────────────────────┘ │ +│ │ +│ ✅ GPU VRAM bandwidth is ~4× higher than Mac unified memory │ +│ ✅ Models ≤24 GB run MUCH faster (fully in VRAM) │ +│ ⚠️ Models >24 GB split across VRAM + RAM (PCIe bottleneck) │ +│ ⚠️ CPU RAM bandwidth is ~3× lower than Mac unified │ +│ │ +└─────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +### Practical Impact on Model Loading + +``` +┌─────────────────────┬──────────────────────────┬──────────────────────────────────────┐ +│ Model Size │ Mac M4 Pro │ Razer RTX 5090 │ +├─────────────────────┼──────────────────────────┼──────────────────────────────────────┤ +│ ≤8 GB (7B Q4) │ Good — all in unified │ Excellent — all in VRAM, ~4× faster │ +│ 8–19 GB (32B Q4) │ Good — all in unified │ Excellent — all in VRAM, ~2–3× faster│ +│ 19–24 GB │ Good — all in unified │ Good — fits in VRAM, faster │ +│ 24–48 GB (70B Q4) │ OK — all in unified, │ Mixed — split GPU/CPU, PCIe penalty │ +│ │ but slow compute │ Faster compute, slower memory access │ +│ >48 GB (70B Q8) │ Won't fit │ CPU RAM only — very slow │ +└─────────────────────┴──────────────────────────┴──────────────────────────────────────┘ +``` + +--- + +## Unique Strengths of Each Machine + +### 🍎 Mac M4 Pro — Best For + +``` +┌─────────────────────────────────────────────────────────────────────────────────────────┐ +│ │ +│ 1. iOS / macOS Development │ +│ Xcode, SwiftUI, TestFlight builds — ONLY possible on Mac │ +│ This is your LysnrAI + MindLyst iOS build machine │ +│ │ +│ 2. Large Model Loading (40–48 GB) │ +│ 70B Q4 models fit entirely in unified memory │ +│ No GPU↔CPU transfer penalty (slower compute, but smoother) │ +│ │ +│ 3. Mobile / Travel / Battery │ +│ 12–16 hours battery, 2.1 kg, silent at idle │ +│ Can run Ollama + dashboard on battery for hours │ +│ │ +│ 4. Daily Driver Development │ +│ Windsurf, VS Code, 3 dashboard dev servers, Docker — all comfortable │ +│ 48 GB RAM handles heavy multitasking │ +│ │ +│ 5. Ecosystem Integration │ +│ AirDrop, Handoff, Universal Clipboard, iCloud │ +│ Seamless with iPhone for testing LysnrAI mobile │ +│ │ +└─────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +### 🟦 Razer RTX 5090 — Best For + +``` +┌─────────────────────────────────────────────────────────────────────────────────────────┐ +│ │ +│ 1. Raw GPU Inference Speed │ +│ 2–4× faster on all models ≤32B (19 GB fits entirely in 24 GB VRAM) │ +│ Tensor cores + CUDA = dramatically faster TTS, Whisper, coding models │ +│ │ +│ 2. Whisper Batch Transcription │ +│ 8–15× realtime vs 2–4× on Mac — critical for bulk audio processing │ +│ Hours of audio transcribed in minutes │ +│ │ +│ 3. TTS Generation at Scale │ +│ Qwen3-TTS at 2–4× realtime means pre-generating audio libraries │ +│ Orpheus at 2–3× realtime for real-time voice applications │ +│ │ +│ 4. Fine-Tuning / Training │ +│ 24 GB VRAM enables LoRA fine-tuning of 7B–13B models │ +│ Not practical on Mac MPS (too slow, limited memory bandwidth for training) │ +│ │ +│ 5. CUDA / TensorRT / ML Research │ +│ Full NVIDIA toolchain: CUDA, cuDNN, TensorRT, Triton │ +│ Most ML papers and frameworks are CUDA-first │ +│ │ +│ 6. Stable Diffusion / Image Generation │ +│ 5–8s per image (SDXL) vs 30s on Mac │ +│ ComfyUI, Automatic1111 — fully CUDA-optimized │ +│ │ +│ 7. Multi-GPU Workloads (Future) │ +│ eGPU or desktop migration path with same CUDA codebase │ +│ │ +└─────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Recommended Workload Distribution + +``` +┌─────────────────────────────────────┬──────────────┬───────────────────────────────────┐ +│ Workload │ Best Machine │ Reason │ +├─────────────────────────────────────┼──────────────┼───────────────────────────────────┤ +│ iOS development (LysnrAI, MindLyst) │ 🍎 Mac │ Xcode is Mac-only │ +│ Daily coding + Windsurf/Cursor │ 🍎 Mac │ Battery, portability, comfort │ +│ Dashboard dev (Next.js + Fastify) │ Either │ Identical experience │ +│ Ollama coding assistant (32B) │ 🟦 Razer │ 2–3× faster responses │ +│ Batch Whisper transcription │ 🟦 Razer │ 4–5× faster throughput │ +│ TTS audio generation │ 🟦 Razer │ 2–4× faster generation │ +│ LoRA fine-tuning │ 🟦 Razer │ CUDA required, 24 GB VRAM │ +│ Stable Diffusion / ComfyUI │ 🟦 Razer │ 5× faster image generation │ +│ 70B model experimentation │ 🍎 Mac │ Unified memory avoids GPU spill │ +│ Docker stack (all services) │ Either │ Both have enough RAM │ +│ On-the-go / travel / meetings │ 🍎 Mac │ Battery life, weight, silence │ +│ Desk/home heavy compute sessions │ 🟦 Razer │ Plugged in, max performance │ +│ Android builds │ Either │ Both support Gradle + SDK │ +│ Gaming (during breaks 😄) │ 🟦 Razer │ RTX 5090, 18" 240Hz, DLSS 4 │ +└─────────────────────────────────────┼──────────────┼───────────────────────────────────┤ +│ │ │ │ +│ IDEAL SETUP │ Both │ Mac = daily driver + iOS builds │ +│ │ │ Razer = GPU compute + heavy AI │ +└─────────────────────────────────────┴──────────────┴───────────────────────────────────┘ +``` + +--- + +## VRAM Budget Comparison + +### Mac: 48 GB Unified (Shared) + +``` +┌──────────────────────────────────────────────────────────────────┐ +│ 48 GB Unified Memory │ +│ ████████████████████████████████████████████████░░░░░░░░░░░░░░░ │ +│ ├── macOS + apps: ~8–12 GB │ +│ ├── Ollama model: ~19 GB (32B) or ~5 GB (7B) │ +│ ├── Dashboard + Node: ~1 GB │ +│ ├── Python TTS venv: ~2 GB (when active) │ +│ └── Free headroom: ~14–18 GB │ +│ │ +│ Can load: 1× 32B model + dashboard + TTS comfortably │ +│ Can load: 1× 70B Q4 (~40 GB) — tight, no room for much else │ +└──────────────────────────────────────────────────────────────────┘ +``` + +### Razer: 24 GB VRAM + 64 GB RAM + +``` +┌──────────────────────────────────────────────────────────────────┐ +│ 24 GB VRAM (GPU — fast) │ +│ ████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │ +│ ├── Ollama model: ~19 GB (32B) or ~5 GB (7B) │ +│ ├── CUDA overhead: ~1 GB │ +│ └── Free VRAM: ~4 GB │ +│ │ +│ 64 GB RAM (CPU — for overflow) │ +│ ████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │ +│ ├── Windows + WSL2: ~6–10 GB │ +│ ├── Dashboard + Node: ~1 GB │ +│ ├── Python TTS venv: ~2 GB (when active) │ +│ └── Free RAM: ~50 GB (overflow for large models) │ +│ │ +│ Can load: 1× 32B model fully in VRAM — blazing fast │ +│ Can load: 1× 70B Q4 — 24 GB in VRAM + 16 GB in RAM (slower) │ +│ Can load: 2× 7B models in VRAM simultaneously │ +└──────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Summary Scorecard + +``` +┌──────────────────────────────┬──────────┬──────────┐ +│ Category │ Mac │ Razer │ +├──────────────────────────────┼──────────┼──────────┤ +│ GPU inference speed (≤32B) │ ★★★☆☆ │ ★★★★★ │ +│ GPU inference speed (70B) │ ★★★☆☆ │ ★★★☆☆ │ +│ Large model capacity │ ★★★★☆ │ ★★★☆☆ │ +│ TTS performance │ ★★★☆☆ │ ★★★★★ │ +│ Whisper performance │ ★★★☆☆ │ ★★★★★ │ +│ Fine-tuning capability │ ★☆☆☆☆ │ ★★★★☆ │ +│ iOS development │ ★★★★★ │ ☆☆☆☆☆ │ +│ General dev experience │ ★★★★★ │ ★★★★☆ │ +│ Portability / battery │ ★★★★★ │ ★★☆☆☆ │ +│ Raw compute power │ ★★★☆☆ │ ★★★★★ │ +│ Storage capacity │ ★★☆☆☆ │ ★★★★★ │ +│ Display quality │ ★★★★★ │ ★★★★★ │ +│ Silence / thermals │ ★★★★★ │ ★★☆☆☆ │ +│ Price / value │ ★★★★☆ │ ★★★☆☆ │ +├──────────────────────────────┼──────────┼──────────┤ +│ OVERALL │ ★★★★☆ │ ★★★★☆ │ +│ │ (daily) │ (power) │ +└──────────────────────────────┴──────────┴──────────┘ +``` + +**Bottom line:** These machines are complementary, not competing. The Mac is the better daily driver and the only option for iOS builds. The Razer is the raw GPU compute powerhouse for AI workloads. Together, they cover every use case in your stack.