# Windows Setup Guide — Local LLM Stack on Razer Blade 18

> **Hardware:** Razer Blade 18 · Intel Core Ultra 9 275HX · RTX 5090 24 GB GDDR7 · 64 GB DDR5 · 4 TB NVMe
> **OS:** Windows 11 Home + WSL2 (Ubuntu)
> **Goal:** Mirror the macOS `__LOCAL_LLMs` stack — Ollama, Whisper, TTS (Orpheus + Qwen3), Mission Control dashboard
> **See also:** [razer-blade-18-spec.md](razer-blade-18-spec.md) for full hardware specs

---

## Automated Setup (Recommended)

**Two scripts, zero IDE required.** See [README.md](README.md) for the quick start, or run directly:

```powershell
# Step 1 — PowerShell (Admin) on Windows
Set-ExecutionPolicy -Scope Process Bypass
.\setup-windows.ps1
# Reboot if WSL2 was just installed
```

```bash
# Step 2 — Ubuntu (WSL2) terminal
curl -fsSL https://raw.githubusercontent.com/saravanakumardb1/learning_ai_common_plat/main/__LOCAL_LLMs/windows_specific/setup-wsl.sh | bash
```

The rest of this guide covers each step in detail for reference and troubleshooting.

---

## Architecture: Windows-Native + WSL2

```
┌────────────────────────────────────────────────────────┐
│  Windows 11                                            │
│  ├── NVIDIA drivers + CUDA (native)                    │
│  ├── Ollama (native Windows service, port 11434)       │
│  └── Browser → http://localhost:3000                   │
│                                                        │
│  ┌──────────────────────────────────────────────────┐  │
│  │  WSL2 (Ubuntu 24.04)                             │  │
│  │  ├── Node.js, Python 3.12, ffmpeg, git           │  │
│  │  ├── __LOCAL_LLMs/ (cloned here)                 │  │
│  │  │   ├── dashboard/ → npm run dev (port 3000)    │  │
│  │  │   ├── setup-tts.sh    (works as-is)           │  │
│  │  │   ├── start-dashboard.sh (works as-is)        │  │
│  │  │   └── models/ (SNAC, Qwen3-TTS)              │  │
│  │  ├── whisper-cpp (CUDA build)                    │  │
│  │  └── .venv-qwen-tts/ (PyTorch CUDA)             │  │
│  └──────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────┘
```

**Why WSL2?** All existing bash scripts, Python venvs, and Node.js tooling work identically to macOS — zero porting. The dashboard API routes auto-detect macOS vs Linux at runtime via `process.platform`.

---

## Phase 1: Windows-Native Setup

### 1. NVIDIA Drivers

```powershell
# Install latest NVIDIA Game Ready or Studio drivers
# Download from: https://www.nvidia.com/Download/index.aspx

# Verify
nvidia-smi
# Should list the RTX 5090 with ~24 GB of memory and "CUDA Version: 12.8" or newer
```

### 2. Ollama (Windows-Native)

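Ollama runs natively on Windows and listens on port 11434. From WSL2 it is reachable at `localhost:11434` when WSL uses mirrored networking (`networkingMode=mirrored` in `%USERPROFILE%\.wslconfig`). Under the default NAT mode, `localhost` inside WSL2 refers to the Linux VM, so use the Windows host address instead and set `OLLAMA_HOST=0.0.0.0` on Windows so Ollama accepts non-loopback connections (see Troubleshooting).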

```powershell
winget install --id Ollama.Ollama

# Verify
ollama --version
```

### 3. Pull Models (from Windows or WSL2)

```bash
ollama pull qwen2.5-coder:32b     # 19 GB — primary coding model
ollama pull qwen2.5-coder:7b      # 4.7 GB — fast coding
ollama pull deepseek-r1:32b       # 19 GB — chain-of-thought
ollama pull llama3.1:8b            # 4.9 GB — fast general tasks
ollama pull sematre/orpheus:en    # 4 GB — text-to-speech (8 voices)

ollama list    # verify all 5 models
```
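
To check the pull step without reading the list by eye, the comparison can be scripted. The following is a minimal sketch, assuming `ollama list` prints a header row followed by one model per line with the name in the first column; the helper name `missing_models` is hypothetical and not part of the repo.

```bash
# Models this guide pulls (copied from the commands above).
REQUIRED_MODELS="qwen2.5-coder:32b qwen2.5-coder:7b deepseek-r1:32b llama3.1:8b sematre/orpheus:en"

# Reads `ollama list` output on stdin and prints every required model
# that is absent, one per line. Assumes line 1 is a header and the model
# name is the first whitespace-separated column.
missing_models() {
  installed=$(awk 'NR > 1 { print $1 }')
  for model in $REQUIRED_MODELS; do
    printf '%s\n' "$installed" | grep -Fxq "$model" || printf '%s\n' "$model"
  done
}

# Usage (needs a running Ollama):
#   ollama list | missing_models
```

Empty output means all five models are present; any printed name still needs an `ollama pull`.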

### 4. Install WSL2

```powershell
# From PowerShell (Admin)
wsl --install -d Ubuntu-24.04
# Reboot if prompted, then set up username/password
```

---

## Phase 2: WSL2 Setup

### 1. Install Dependencies

```bash
# Update
sudo apt update && sudo apt upgrade -y

# Node.js 20 LTS
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs

# Python 3.12
sudo apt install -y python3.12 python3.12-venv python3-pip

# Build tools + ffmpeg
sudo apt install -y ffmpeg git curl build-essential cmake

# Verify
node --version        # 20.x+
python3.12 --version
nvidia-smi            # should show RTX 5090 (GPU passthrough from Windows)
```

> **Important:** Do NOT install NVIDIA drivers inside WSL2. The Windows-side driver handles GPU passthrough automatically.
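
The `20.x+` requirement in the verify step can be checked mechanically. This is a small sketch, assuming `node --version` prints a string of the form `v20.11.1`; the function name `node_major_at_least` is made up for illustration.

```bash
# Succeeds when a `node --version` string (e.g. "v20.11.1") has a major
# version of at least $2. Returns 2 when the string cannot be parsed.
node_major_at_least() {
  major=${1#v}        # strip the leading "v"
  major=${major%%.*}  # keep everything before the first dot
  case $major in
    ''|*[!0-9]*) return 2 ;;  # empty or non-numeric: unknown version
  esac
  [ "$major" -ge "$2" ]
}

# Usage:
#   node_major_at_least "$(node --version)" 20 || echo "Node.js is older than 20"
```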

### 2. Clone Repo

```bash
mkdir -p ~/code/mygh && cd ~/code/mygh
git clone https://github.com/saravanakumardb1/learning_ai_common_plat.git
cd learning_ai_common_plat/__LOCAL_LLMs
```

> **Performance note:** Always clone inside the WSL2 filesystem (`~/code/...`), NOT under `/mnt/c/`. File access across the Windows filesystem bridge is much slower, and the many small files in `node_modules` make it worse.
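
A guard like the one below can catch a checkout that landed on a Windows drive. It is a sketch that assumes the default WSL automount root (`/mnt/<drive-letter>`); a custom root configured in `/etc/wsl.conf` would need a different pattern. The name `is_windows_mount` is hypothetical.

```bash
# Succeeds when the given path is on a Windows drive mounted into WSL2.
# Matches /mnt/c, /mnt/d/..., but not other mounts such as /mnt/wsl.
is_windows_mount() {
  case $1 in
    /mnt/[a-z]|/mnt/[a-z]/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   is_windows_mount "$(pwd -P)" && echo "Move this checkout under ~/ first"
```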

### 3. Whisper.cpp (CUDA build)

```bash
cd ~
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
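# Needs the CUDA Toolkit (nvcc, 12.8+ for the RTX 5090) inside WSL2: toolkit only, never the driver
cmake -B build -DGGML_CUDA=ON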
cmake --build build --config Release -j$(nproc)
sudo cp build/bin/whisper-cli /usr/local/bin/

# Download model (1.5 GB)
mkdir -p ~/whisper-models
curl -L -o ~/whisper-models/ggml-large-v3-turbo.bin \
  "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin"

# Verify
whisper-cli --help    # prints usage if the binary is on PATH
```

> **No corporate proxy on this machine** — download directly from `huggingface.co`.
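
Once the build and model download succeed, a transcription run might look like the sketch below. It assumes the model path used above and converts the input to 16 kHz mono 16-bit WAV first, which is the format whisper.cpp works with most reliably. The `transcribe` helper and the `DRY_RUN` switch are illustrative additions, not part of the repo.

```bash
# Model path from the download step; override via the environment if needed.
WHISPER_MODEL="${WHISPER_MODEL:-$HOME/whisper-models/ggml-large-v3-turbo.bin}"

# Converts an audio file to 16 kHz mono 16-bit WAV, then transcribes it.
# With DRY_RUN=1 the two commands are printed instead of executed.
transcribe() {
  input=$1
  wav="${input%.*}.16k.wav"   # e.g. meeting.m4a -> meeting.16k.wav
  run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run="echo"
  fi
  $run ffmpeg -y -i "$input" -ar 16000 -ac 1 -c:a pcm_s16le "$wav" &&
    $run whisper-cli -m "$WHISPER_MODEL" -f "$wav"
}

# Usage:
#   transcribe meeting.m4a
#   DRY_RUN=1 transcribe meeting.m4a   # show the commands only
```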

### 4. TTS Setup (One-Shot)

```bash
cd ~/code/mygh/learning_ai_common_plat/__LOCAL_LLMs

# Works exactly like macOS — downloads SNAC, Qwen3-TTS, creates venv
bash setup-tts.sh
```

The script detects macOS vs Linux and installs the matching PyTorch variant (MPS on macOS, CUDA on Linux). The default Hugging Face mirror is meant for the corporate network; on a personal machine, override it: `HF_MIRROR=https://huggingface.co bash setup-tts.sh`
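
The platform switch described here can be pictured roughly as follows. This is not the code in `setup-tts.sh`; it is a sketch that assumes detection is based on `uname`, and both function names are hypothetical. The WSL2 check relies on Microsoft's kernel naming (for example `5.15.167.4-microsoft-standard-WSL2`).

```bash
# Maps a kernel name (as printed by `uname -s`) to the PyTorch device
# string this guide lists for it. Anything else falls back to "cpu".
torch_device_for() {
  case $1 in
    Darwin) echo mps ;;
    Linux)  echo cuda ;;
    *)      echo cpu ;;
  esac
}

# Succeeds when a kernel release string (`uname -r`) looks like WSL2.
is_wsl2() {
  case $1 in
    *WSL2*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   echo "PyTorch device: $(torch_device_for "$(uname -s)")"
#   is_wsl2 "$(uname -r)" && echo "Running under WSL2"
```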

### 5. Start Dashboard

```bash
bash start-dashboard.sh
# Open http://localhost:3000 in Windows browser
```

WSL2 forwards Linux-side listening ports to Windows by default, so the dashboard is reachable from a Windows browser at `localhost:3000`.

---

## Key Differences: macOS vs WSL2

| Area                   | macOS (any Mac)             | WSL2 (any Linux)                       |
| ---------------------- | --------------------------- | -------------------------------------- |
| **GPU**                | Apple Silicon (MPS)         | NVIDIA (CUDA)                          |
| **Ollama**             | macOS native (Metal)        | Windows native, accessed via localhost |
| **PyTorch device**     | `mps`                       | `cuda`                                 |
| **Whisper install**    | `brew install whisper-cpp`  | Build from source with CUDA            |
| **Package manager**    | Homebrew                    | apt                                    |
| **Shell scripts**      | Work as-is                  | Work as-is                             |
| **Python venv path**   | `bin/python`                | `bin/python` (same)                    |
| **Dashboard**          | Identical                   | Identical                              |
| **Ollama models path** | `~/.ollama/models/`         | Windows `%USERPROFILE%\.ollama\models` |
| **Model download**     | `hf-mirror.com` (corporate) | `huggingface.co` (direct)              |

### Performance Expectations

| Workload                    | macOS M4 Pro 48 GB   | Razer RTX 5090 24 GB            |
| --------------------------- | -------------------- | ------------------------------- |
| qwen2.5-coder:32b inference | ~15–25 tok/s         | ~40–60 tok/s                    |
| Whisper large-v3-turbo      | ~2–4x realtime       | ~8–15x realtime                 |
| Orpheus TTS                 | ~realtime            | ~2–3x realtime                  |
| Qwen3-TTS                   | ~realtime (MPS)      | ~2–4x realtime (CUDA)           |
| 70B quantized models        | Fits in 48 GB (slow) | Partially offloads to 64 GB RAM |

### VRAM Budget (RTX 5090 — 24 GB)

| Model              | VRAM Usage | Fits in GPU? |
| ------------------ | ---------- | ------------ |
| llama3.1:8b        | ~5 GB      | ✅ Fully     |
| qwen2.5-coder:7b   | ~5 GB      | ✅ Fully     |
| sematre/orpheus:en | ~4 GB      | ✅ Fully     |
| qwen2.5-coder:32b  | ~19 GB     | ✅ Fully     |
| deepseek-r1:32b    | ~19 GB     | ✅ Fully     |
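
Each model fits on its own, but Ollama can keep several loaded at once, so it helps to add up the combinations. The sketch below sums the table's approximate figures against the 24 GB budget. It ignores context (KV cache) and runtime overhead, so real headroom is smaller than the arithmetic suggests. The helper name `fits_in_vram` is hypothetical.

```bash
# Succeeds when the listed model sizes sum to no more than the budget.
# $1 = budget in whole GB, remaining arguments = per-model sizes in GB.
fits_in_vram() {
  budget=$1
  shift
  total=0
  for size in "$@"; do
    total=$((total + size))
  done
  echo "total=${total}GB budget=${budget}GB"
  [ "$total" -le "$budget" ]
}

# Usage, with sizes taken from the table above:
#   fits_in_vram 24 19 4     # qwen2.5-coder:32b + orpheus: 23 GB, fits on paper
#   fits_in_vram 24 19 19    # both 32B models: 38 GB, does not fit
```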

---

## Quick Reference — Full Setup Checklist

### Windows Side

```
[ ] Install NVIDIA drivers (Game Ready or Studio)
[ ] Install Ollama (winget install Ollama.Ollama)
[ ] Pull all 5 models
[ ] Install WSL2 (wsl --install -d Ubuntu-24.04)
```

### WSL2 Side

```
[ ] Install Node.js 20+, Python 3.12, ffmpeg, git, cmake
[ ] Verify nvidia-smi shows RTX 5090
[ ] Clone repo into ~/code/mygh/
[ ] Build whisper-cpp with CUDA
[ ] Download Whisper model to ~/whisper-models/
[ ] Run: bash setup-tts.sh
[ ] Run: bash start-dashboard.sh
[ ] Verify: http://localhost:3000 shows all green
```

---

## Troubleshooting

### Ollama not accessible from WSL2

```bash
curl http://localhost:11434/api/tags
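# If this fails, WSL2 is probably using NAT networking, where localhost is the Linux VM.
# Check the Windows firewall, make sure Ollama listens beyond loopback (OLLAMA_HOST=0.0.0.0), then try: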
curl http://$(hostname).local:11434/api/tags
```
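
The two probes above can be folded into one helper that reports the first address that answers. This is a sketch under stated assumptions: the candidate URLs are passed in by the caller, `curl` is the default probe, and the names `find_ollama_url` and `PROBE` are made up for illustration.

```bash
# Command used to test a URL; it must exit 0 when the URL answers.
PROBE="${PROBE:-curl -fsS -o /dev/null --max-time 2}"

# Prints the first base URL whose /api/tags endpoint answers, trying the
# arguments in order. Returns 1 when none of them respond.
find_ollama_url() {
  for base in "$@"; do
    if $PROBE "$base/api/tags" 2>/dev/null; then
      printf '%s\n' "$base"
      return 0
    fi
  done
  return 1
}

# Usage:
#   OLLAMA_URL=$(find_ollama_url "http://localhost:11434" "http://$(hostname).local:11434") \
#     || echo "Ollama is not reachable from WSL2"
```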

### CUDA not visible in WSL2

```bash
nvidia-smi
# If "command not found":
# 1. Update Windows NVIDIA drivers to latest
# 2. Run: wsl --update
# 3. Do NOT install nvidia-driver-* inside WSL2
```
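
When `nvidia-smi` is reported as missing, it is worth checking whether the binary exists but is simply not on `PATH`. WSL2 exposes the Windows driver's tools under `/usr/lib/wsl/lib`. The sketch below looks in both places; the function name and the `WSL_LIB_DIR` override are hypothetical.

```bash
# Directory where WSL2 mounts the Windows driver's user-space tools.
WSL_LIB_DIR="${WSL_LIB_DIR:-/usr/lib/wsl/lib}"

# Prints the path to nvidia-smi, checking PATH first and then WSL_LIB_DIR.
# Returns 1 when it is found in neither place.
find_nvidia_smi() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    command -v nvidia-smi
  elif [ -x "$WSL_LIB_DIR/nvidia-smi" ]; then
    echo "$WSL_LIB_DIR/nvidia-smi"
  else
    return 1
  fi
}

# Usage:
#   "$(find_nvidia_smi)" || echo "Update the Windows driver, then run: wsl --update"
```

If the binary is found only under `/usr/lib/wsl/lib`, the driver passthrough is working and the fix is a `PATH` entry rather than a driver update.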

### Slow filesystem performance

```bash
# Clone repos inside WSL2 filesystem: ~/code/...
# NOT in /mnt/c/ (Windows→WSL bridge is ~10x slower for node_modules)
```