# Windows Setup Guide — Local LLM Stack on Razer Blade 18 > **Hardware:** Razer Blade 18 · Intel Core Ultra 9 275HX · RTX 5090 24 GB GDDR7 · 64 GB DDR5 · 4 TB NVMe > **OS:** Windows 11 Home + WSL2 (Ubuntu) > **Goal:** Mirror the macOS `__LOCAL_LLMs` stack — Ollama, Whisper, TTS (Orpheus + Qwen3), Mission Control dashboard > **See also:** [razer-blade-18-spec.md](razer-blade-18-spec.md) for full hardware specs --- ## Automated Setup (Recommended) **Two scripts, zero IDE required.** See [README.md](README.md) for the quick start, or run directly: ```powershell # Step 1 — PowerShell (Admin) on Windows Set-ExecutionPolicy -Scope Process Bypass .\setup-windows.ps1 # Reboot if WSL2 was just installed ``` ```bash # Step 2 — Ubuntu (WSL2) terminal curl -fsSL https://raw.githubusercontent.com/saravanakumardb1/learning_ai_common_plat/main/__LOCAL_LLMs/windows_specific/setup-wsl.sh | bash ``` The rest of this guide covers each step in detail for reference and troubleshooting. --- ## Architecture: Windows-Native + WSL2 ``` ┌────────────────────────────────────────────────────────┐ │ Windows 11 │ │ ├── NVIDIA drivers + CUDA (native) │ │ ├── Ollama (native Windows service, port 11434) │ │ └── Browser → http://localhost:3000 │ │ │ │ ┌──────────────────────────────────────────────────┐ │ │ │ WSL2 (Ubuntu 24.04) │ │ │ │ ├── Node.js, Python 3.12, ffmpeg, git │ │ │ │ ├── __LOCAL_LLMs/ (cloned here) │ │ │ │ │ ├── dashboard/ → npm run dev (port 3000) │ │ │ │ │ ├── setup-tts.sh (works as-is) │ │ │ │ │ ├── start-dashboard.sh (works as-is) │ │ │ │ │ └── models/ (SNAC, Qwen3-TTS) │ │ │ │ ├── whisper-cpp (CUDA build) │ │ │ │ └── .venv-qwen-tts/ (PyTorch CUDA) │ │ │ └──────────────────────────────────────────────────┘ │ └────────────────────────────────────────────────────────┘ ``` **Why WSL2?** All existing bash scripts, Python venvs, and Node.js tooling work identically to macOS — zero porting. The dashboard API routes auto-detect macOS vs Linux at runtime via `process.platform`. --- ## Phase 1: Windows-Native Setup ### 1. NVIDIA Drivers ```powershell # Install latest NVIDIA Game Ready or Studio drivers # Download from: https://www.nvidia.com/Download/index.aspx # Verify nvidia-smi # Should show: RTX 5090, 24 GB VRAM, CUDA 13.x+ ``` ### 2. Ollama (Windows-Native) Ollama runs natively on Windows and is accessible from WSL2 at `localhost:11434`. ```powershell winget install --id Ollama.Ollama # Verify ollama --version ``` ### 3. Pull Models (from Windows or WSL2) ```bash ollama pull qwen2.5-coder:32b # 19 GB — primary coding model ollama pull qwen2.5-coder:7b # 4.7 GB — fast coding ollama pull deepseek-r1:32b # 19 GB — chain-of-thought ollama pull llama3.1:8b # 4.9 GB — fast general tasks ollama pull sematre/orpheus:en # 4 GB — text-to-speech (8 voices) ollama list # verify all 5 models ``` ### 4. Install WSL2 ```powershell # From PowerShell (Admin) wsl --install -d Ubuntu-24.04 # Reboot if prompted, then set up username/password ``` --- ## Phase 2: WSL2 Setup ### 1. Install Dependencies ```bash # Update sudo apt update && sudo apt upgrade -y # Node.js 20 LTS curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash - sudo apt install -y nodejs # Python 3.12 sudo apt install -y python3.12 python3.12-venv python3-pip # Build tools + ffmpeg sudo apt install -y ffmpeg git curl build-essential cmake # Verify node --version # 20.x+ python3.12 --version nvidia-smi # should show RTX 5090 (GPU passthrough from Windows) ``` > **Important:** Do NOT install NVIDIA drivers inside WSL2. The Windows-side driver handles GPU passthrough automatically. ### 2. Clone Repo ```bash mkdir -p ~/code/mygh && cd ~/code/mygh git clone https://github.com/saravanakumardb1/learning_ai_common_plat.git cd learning_ai_common_plat/__LOCAL_LLMs ``` > **Performance note:** Always clone inside WSL2 filesystem (`~/code/...`), NOT in `/mnt/c/` — the Windows filesystem bridge is very slow for `node_modules`. ### 3. Whisper.cpp (CUDA build) ```bash cd ~ git clone https://github.com/ggerganov/whisper.cpp.git cd whisper.cpp cmake -B build -DGGML_CUDA=ON cmake --build build --config Release -j$(nproc) sudo cp build/bin/whisper-cli /usr/local/bin/ # Download model (1.5 GB) mkdir -p ~/whisper-models curl -L -o ~/whisper-models/ggml-large-v3-turbo.bin \ "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin" # Verify whisper-cli --version ``` > **No corporate proxy on this machine** — download directly from `huggingface.co`. ### 4. TTS Setup (One-Shot) ```bash cd ~/code/mygh/learning_ai_common_plat/__LOCAL_LLMs # Works exactly like macOS — downloads SNAC, Qwen3-TTS, creates venv bash setup-tts.sh ``` The script detects macOS vs Linux and installs the correct PyTorch variant (MPS on macOS, CUDA on Linux). On a personal machine, override the default HuggingFace mirror: `HF_MIRROR=https://huggingface.co bash setup-tts.sh` ### 5. Start Dashboard ```bash bash start-dashboard.sh # Open http://localhost:3000 in Windows browser ``` WSL2 automatically forwards ports — the dashboard is accessible from Windows at `localhost:3000`. --- ## Key Differences: macOS vs WSL2 | Area | macOS (any Mac) | WSL2 (any Linux) | | ---------------------- | --------------------------- | -------------------------------------- | | **GPU** | Apple Silicon (MPS) | NVIDIA (CUDA) | | **Ollama** | macOS native (Metal) | Windows native, accessed via localhost | | **PyTorch device** | `mps` | `cuda` | | **Whisper install** | `brew install whisper-cpp` | Build from source with CUDA | | **Package manager** | Homebrew | apt | | **Shell scripts** | Work as-is | Work as-is | | **Python venv path** | `bin/python` | `bin/python` (same) | | **Dashboard** | Identical | Identical | | **Ollama models path** | `~/.ollama/models/` | Windows `%USERPROFILE%\.ollama\` | | **Model download** | `hf-mirror.com` (corporate) | `huggingface.co` (direct) | ### Performance Expectations | Workload | macOS M4 Pro 48 GB | Razer RTX 5090 24 GB | | --------------------------- | -------------------- | ------------------------------- | | qwen2.5-coder:32b inference | ~15–25 tok/s | ~40–60 tok/s | | Whisper large-v3-turbo | ~2–4x realtime | ~8–15x realtime | | Orpheus TTS | ~realtime | ~2–3x realtime | | Qwen3-TTS | ~realtime (MPS) | ~2–4x realtime (CUDA) | | 70B quantized models | Fits in 48 GB (slow) | Partially offloads to 64 GB RAM | ### VRAM Budget (RTX 5090 — 24 GB) | Model | VRAM Usage | Fits in GPU? | | ------------------ | ---------- | ------------ | | llama3.1:8b | ~5 GB | ✅ Fully | | qwen2.5-coder:7b | ~5 GB | ✅ Fully | | sematre/orpheus:en | ~4 GB | ✅ Fully | | qwen2.5-coder:32b | ~19 GB | ✅ Fully | | deepseek-r1:32b | ~19 GB | ✅ Fully | --- ## Quick Reference — Full Setup Checklist ### Windows Side ``` [ ] Install NVIDIA drivers (Game Ready or Studio) [ ] Install Ollama (winget install Ollama.Ollama) [ ] Pull all 5 models [ ] Install WSL2 (wsl --install -d Ubuntu-24.04) ``` ### WSL2 Side ``` [ ] Install Node.js 20+, Python 3.12, ffmpeg, git, cmake [ ] Verify nvidia-smi shows RTX 5090 [ ] Clone repo into ~/code/mygh/ [ ] Build whisper-cpp with CUDA [ ] Download Whisper model to ~/whisper-models/ [ ] Run: bash setup-tts.sh [ ] Run: bash start-dashboard.sh [ ] Verify: http://localhost:3000 shows all green ``` --- ## Troubleshooting ### Ollama not accessible from WSL2 ```bash curl http://localhost:11434/api/tags # If fails, check Windows firewall or try: curl http://$(hostname).local:11434/api/tags ``` ### CUDA not visible in WSL2 ```bash nvidia-smi # If "command not found": # 1. Update Windows NVIDIA drivers to latest # 2. Run: wsl --update # 3. Do NOT install nvidia-driver-* inside WSL2 ``` ### Slow filesystem performance ```bash # Clone repos inside WSL2 filesystem: ~/code/... # NOT in /mnt/c/ (Windows→WSL bridge is ~10x slower for node_modules) ```