learning_ai_common_plat/__LOCAL_LLMs/windows_specific/setup-guide.md
saravanakumardb1 efd45ad86f feat(local-llms): add one-click Windows setup scripts
- setup-windows.ps1: PowerShell script for Windows side
  - NVIDIA driver verification, Ollama install via winget
  - Pull all 5 models with skip-if-exists logic
  - WSL2 Ubuntu 24.04 install
- setup-wsl.sh: Bash script for WSL2 side
  - Idempotent apt deps (Node.js 20, Python 3.12, ffmpeg, cmake)
  - CUDA GPU passthrough verification
  - Repo clone + git pull, whisper.cpp CUDA build
  - Whisper model download, TTS setup, dashboard start
- README.md: 2-step quick start (no IDE required)
- setup-guide.md: add automated setup section at top
2026-02-21 16:28:02 -08:00


Windows Setup Guide — Local LLM Stack on Razer Blade 18

Hardware: Razer Blade 18 · Intel Core Ultra 9 275HX · RTX 5090 24 GB GDDR7 · 64 GB DDR5 · 4 TB NVMe
OS: Windows 11 Home + WSL2 (Ubuntu)
Goal: Mirror the macOS __LOCAL_LLMs stack: Ollama, Whisper, TTS (Orpheus + Qwen3), and the Mission Control dashboard
See also: razer-blade-18-spec.md for full hardware specs


Two scripts, zero IDE required. See README.md for the quick start, or run directly:

# Step 1 — PowerShell (Admin) on Windows
Set-ExecutionPolicy -Scope Process Bypass
.\setup-windows.ps1
# Reboot if WSL2 was just installed
# Step 2 — Ubuntu (WSL2) terminal
curl -fsSL https://raw.githubusercontent.com/saravanakumardb1/learning_ai_common_plat/main/__LOCAL_LLMs/windows_specific/setup-wsl.sh | bash

The rest of this guide covers each step in detail for reference and troubleshooting.


Architecture: Windows-Native + WSL2

┌────────────────────────────────────────────────────────┐
│  Windows 11                                            │
│  ├── NVIDIA drivers + CUDA (native)                    │
│  ├── Ollama (native Windows service, port 11434)       │
│  └── Browser → http://localhost:3000                   │
│                                                        │
│  ┌──────────────────────────────────────────────────┐  │
│  │  WSL2 (Ubuntu 24.04)                             │  │
│  │  ├── Node.js, Python 3.12, ffmpeg, git           │  │
│  │  ├── __LOCAL_LLMs/ (cloned here)                 │  │
│  │  │   ├── dashboard/ → npm run dev (port 3000)    │  │
│  │  │   ├── setup-tts.sh    (works as-is)           │  │
│  │  │   ├── start-dashboard.sh (works as-is)        │  │
│  │  │   └── models/ (SNAC, Qwen3-TTS)               │  │
│  │  ├── whisper-cpp (CUDA build)                    │  │
│  │  └── .venv-qwen-tts/ (PyTorch CUDA)              │  │
│  └──────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────┘

Why WSL2? All existing bash scripts, Python venvs, and Node.js tooling work identically to macOS — zero porting. The dashboard API routes auto-detect macOS vs Linux at runtime via process.platform.
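Both process.platform and uname -s report Linux under WSL2, exactly as they do on native Ubuntu. That is why the Linux code paths apply unchanged. When a script does need to know that it is inside WSL, for example to decide how to reach Windows-side services, a common check is the kernel version string.

The sketch below assumes that the WSL kernel's /proc/version mentions "microsoft". The helper name is_wsl is hypothetical and is not part of the repo's scripts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when running under WSL.
# Assumes the WSL kernel version string contains "microsoft"
# (e.g. "...microsoft-standard-WSL2..."). The file is a parameter so it can be tested.
is_wsl() {
  local version_file="${1:-/proc/version}"
  [[ -r "$version_file" ]] && grep -qi microsoft "$version_file"
}
```

Usage: if is_wsl; then echo "running inside WSL2"; fi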


Phase 1: Windows-Native Setup

1. NVIDIA Drivers

# Install latest NVIDIA Game Ready or Studio drivers
# Download from: https://www.nvidia.com/Download/index.aspx

# Verify
nvidia-smi
# Should show: RTX 5090, 24 GB VRAM, CUDA 13.x+

2. Ollama (Windows-Native)

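Ollama runs natively on Windows and listens on port 11434. From WSL2, localhost:11434 reaches it only when WSL uses mirrored networking (networkingMode=mirrored under [wsl2] in %USERPROFILE%\.wslconfig). With the default NAT networking, connect to the Windows host IP instead (see Troubleshooting).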

winget install --id Ollama.Ollama

# Verify
ollama --version

3. Pull Models (from Windows or WSL2)

ollama pull qwen2.5-coder:32b     # 19 GB — primary coding model
ollama pull qwen2.5-coder:7b      # 4.7 GB — fast coding
ollama pull deepseek-r1:32b       # 19 GB — chain-of-thought
ollama pull llama3.1:8b            # 4.9 GB — fast general tasks
ollama pull sematre/orpheus:en    # 4 GB — text-to-speech (8 voices)

ollama list    # verify all 5 models
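To make re-runs skip models that are already present, each pull can be guarded by a check against ollama list. The bash sketch below makes these assumptions:

- The first column of ollama list is the full NAME:TAG.
- When run from WSL2, the Windows CLI can be called through interop as ollama.exe.

model_present and pull_if_missing are hypothetical helper names, not functions from the repo's scripts.

```bash
#!/usr/bin/env bash
# Hypothetical skip-if-exists puller (a sketch, not the repo's script).
# From WSL2, assumed to work via interop with OLLAMA_BIN=ollama.exe.
OLLAMA_BIN="${OLLAMA_BIN:-ollama}"

# Succeeds if MODEL (NAME:TAG) appears in the first column of an `ollama list` listing.
# The header row is skipped; matching is exact, so "llama3.1" does not match "llama3.1:8b".
model_present() {
  local model="$1" listing="$2"
  awk -v m="$model" 'NR > 1 && $1 == m { found = 1 } END { exit !found }' <<< "$listing"
}

# Pull MODEL only if it is not already listed.
pull_if_missing() {
  local model="$1"
  if model_present "$model" "$("$OLLAMA_BIN" list)"; then
    echo "skip: $model already present"
  else
    "$OLLAMA_BIN" pull "$model"
  fi
}
```

Usage: for m in qwen2.5-coder:32b qwen2.5-coder:7b deepseek-r1:32b llama3.1:8b sematre/orpheus:en; do pull_if_missing "$m"; done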

4. Install WSL2

# From PowerShell (Admin)
wsl --install -d Ubuntu-24.04
# Reboot if prompted, then set up username/password

Phase 2: WSL2 Setup

1. Install Dependencies

# Update
sudo apt update && sudo apt upgrade -y

# Node.js 20 LTS
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs

# Python 3.12
sudo apt install -y python3.12 python3.12-venv python3-pip

# Build tools + ffmpeg
sudo apt install -y ffmpeg git curl build-essential cmake

# Verify
node --version        # 20.x+
python3.12 --version
nvidia-smi            # should show RTX 5090 (GPU passthrough from Windows)

Important: Do NOT install NVIDIA drivers inside WSL2. The Windows-side driver handles GPU passthrough automatically.
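The "# 20.x+" comment above can be turned into a check that a script can act on. The sketch below makes two assumptions:

- GNU sort -V is available, which is standard on Ubuntu.
- The first dotted number in a tool's version output is its version.

version_at_least is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the first dotted version number in TEXT is >= MIN.
# Relies on GNU coreutils `sort -V` for version-aware ordering.
version_at_least() {
  local text="$1" min="$2" have
  have=$(grep -oE '[0-9]+(\.[0-9]+)+' <<< "$text" | head -n1)
  [[ -n "$have" ]] || return 1
  # If MIN sorts first (or the two are equal), the installed version is new enough.
  [[ "$(printf '%s\n' "$min" "$have" | sort -V | head -n1)" == "$min" ]]
}
```

Usage: version_at_least "$(node --version)" 20 || echo "Node.js is older than 20". The same pattern applies to Python: version_at_least "$(python3.12 --version)" 3.12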

2. Clone Repo

mkdir -p ~/code/mygh && cd ~/code/mygh
git clone https://github.com/saravanakumardb1/learning_ai_common_plat.git
cd learning_ai_common_plat/__LOCAL_LLMs

Performance note: Always clone inside WSL2 filesystem (~/code/...), NOT in /mnt/c/ — the Windows filesystem bridge is very slow for node_modules.
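A script can enforce this rule before running npm install. The sketch below assumes the default WSL automount root /mnt, which is configurable in /etc/wsl.conf. Both helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical guard: succeeds if the path is on a Windows drive mounted by WSL
# (default automount root /mnt, e.g. /mnt/c/Users/...).
is_windows_mount() {
  local re='^/mnt/[A-Za-z](/|$)'
  [[ "$1" =~ $re ]]
}

# Fail (return 1) when the given path, or the current directory, is on the Windows bridge.
require_linux_fs() {
  local dir="${1:-$PWD}"
  if is_windows_mount "$dir"; then
    echo "error: $dir is on a Windows drive; clone under ~/code instead" >&2
    return 1
  fi
}
```

Usage: require_linux_fs || exit 1 at the top of a setup script keeps node_modules off /mnt/c.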

3. Whisper.cpp (CUDA build)

cd ~
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j$(nproc)
sudo cp build/bin/whisper-cli /usr/local/bin/

# Download model (1.5 GB)
mkdir -p ~/whisper-models
curl -L -o ~/whisper-models/ggml-large-v3-turbo.bin \
  "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin"

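# Verify: transcribe the sample clip shipped with whisper.cpp (the log should mention the CUDA device)
whisper-cli -m ~/whisper-models/ggml-large-v3-turbo.bin -f ~/whisper.cpp/samples/jfk.wav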

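There is no corporate proxy on this machine, so the model downloads directly from huggingface.co.

The -DGGML_CUDA=ON build also needs the CUDA Toolkit (nvcc) inside WSL2, separate from the Windows driver. If CMake reports that it cannot find CUDA, install NVIDIA's CUDA Toolkit package for WSL-Ubuntu, which ships the toolkit without a driver. Use version 12.8 or newer for RTX 50-series GPUs.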
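Re-running setup should not fetch the 1.5 GB model again. The sketch below is an idempotent download. It assumes that a non-empty file at the destination is a finished download; it does not verify a checksum. fetch_once is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: download URL to DEST unless DEST already exists and is non-empty.
# Writes to DEST.part first so an interrupted transfer never looks like a finished file.
fetch_once() {
  local url="$1" dest="$2"
  if [[ -s "$dest" ]]; then
    echo "skip: $dest already exists"
    return 0
  fi
  mkdir -p "$(dirname "$dest")"
  curl -fL --retry 3 -o "$dest.part" "$url" && mv "$dest.part" "$dest"
}
```

Usage: fetch_once "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin" ~/whisper-models/ggml-large-v3-turbo.bin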

4. TTS Setup (One-Shot)

cd ~/code/mygh/learning_ai_common_plat/__LOCAL_LLMs

# Works exactly like macOS — downloads SNAC, Qwen3-TTS, creates venv
bash setup-tts.sh

The script detects macOS vs Linux and installs the matching PyTorch build (MPS on macOS, CUDA on Linux). Its default HuggingFace mirror is the corporate one (hf-mirror.com). On this personal machine, override it to download directly: HF_MIRROR=https://huggingface.co bash setup-tts.sh
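The macOS/Linux branch described above can be pictured as a switch on uname -s. This is a hypothetical sketch of the idea, not the actual contents of setup-tts.sh. The cpu fallback is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical mapping from kernel name (uname -s) to a PyTorch device string.
# Not taken from setup-tts.sh; it only illustrates the macOS-vs-Linux branch.
torch_device_for() {
  case "$1" in
    Darwin) echo "mps" ;;   # Apple Silicon GPU via Metal Performance Shaders
    Linux)  echo "cuda" ;;  # WSL2 reports Linux; assumes the NVIDIA GPU is visible
    *)      echo "cpu" ;;   # assumed fallback for anything else
  esac
}
```

Usage: DEVICE=$(torch_device_for "$(uname -s)") yields cuda inside WSL2 and mps on an Apple Silicon Mac.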

5. Start Dashboard

bash start-dashboard.sh
# Open http://localhost:3000 in Windows browser

WSL2 automatically forwards ports — the dashboard is accessible from Windows at localhost:3000.
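The dev server may need a moment before port 3000 answers. The sketch below polls the dashboard before you open the browser. wait_for_url is a hypothetical helper, and the one-second interval is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll URL until it answers without an HTTP error, up to TRIES attempts.
wait_for_url() {
  local url="$1" tries="${2:-30}" i
  for ((i = 1; i <= tries; i++)); do
    if curl -fsS --max-time 2 -o /dev/null "$url"; then
      return 0
    fi
    sleep 1
  done
  echo "timed out waiting for $url" >&2
  return 1
}
```

Usage: wait_for_url http://localhost:3000 && echo "dashboard is up"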


Key Differences: macOS vs WSL2

| Area | macOS (any Mac) | WSL2 (any Linux) |
|---|---|---|
| GPU | Apple Silicon (MPS) | NVIDIA (CUDA) |
| Ollama | macOS native (Metal) | Windows native, accessed via localhost |
| PyTorch device | mps | cuda |
| Whisper install | brew install whisper-cpp | Build from source with CUDA |
| Package manager | Homebrew | apt |
| Shell scripts | Work as-is | Work as-is |
| Python venv path | bin/python | bin/python (same) |
| Dashboard | Identical | Identical |
| Ollama models path | `~/.ollama/models/` | Windows `%USERPROFILE%\.ollama\` |
| Model download | hf-mirror.com (corporate) | huggingface.co (direct) |

Performance Expectations

| Workload | macOS M4 Pro 48 GB | Razer RTX 5090 24 GB |
|---|---|---|
| qwen2.5-coder:32b inference | ~15–25 tok/s | ~40–60 tok/s |
| Whisper large-v3-turbo | ~2–4x realtime | ~8–15x realtime |
| Orpheus TTS | ~realtime | ~2–3x realtime |
| Qwen3-TTS | ~realtime (MPS) | ~2–4x realtime (CUDA) |
| 70B quantized models | Fits in 48 GB (slow) | Partially offloads to 64 GB RAM |

VRAM Budget (RTX 5090 — 24 GB)

| Model | VRAM usage | Fits in GPU? |
|---|---|---|
| llama3.1:8b | ~5 GB | Fully |
| qwen2.5-coder:7b | ~5 GB | Fully |
| sematre/orpheus:en | ~4 GB | Fully |
| qwen2.5-coder:32b | ~19 GB | Fully |
| deepseek-r1:32b | ~19 GB | Fully |
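Each model fits on its own, but the table also implies which models can be resident together. The sketch below does that back-of-the-envelope arithmetic with the approximate figures above. It ignores KV cache and CUDA context overhead, so a combination that sums exactly to the budget will likely not fit in practice. fits_in_vram is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical check: do the given per-model VRAM estimates (whole GB) fit in BUDGET_GB?
# Ignores KV cache and runtime overhead, so a "yes" near the limit is optimistic.
fits_in_vram() {
  local budget="$1" total=0 gb
  shift
  for gb in "$@"; do
    total=$((total + gb))
  done
  echo "total=${total} GB of ${budget} GB"
  (( total <= budget ))
}
```

Two example cases:

- fits_in_vram 24 19 5 (a 32B model plus llama3.1:8b) sums to exactly 24 GB, leaving no headroom.
- fits_in_vram 24 19 19 (both 32B models) needs 38 GB, so they cannot both be fully resident at once.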

Quick Reference — Full Setup Checklist

Windows Side

[ ] Install NVIDIA drivers (Game Ready or Studio)
[ ] Install Ollama (winget install Ollama.Ollama)
[ ] Pull all 5 models
[ ] Install WSL2 (wsl --install -d Ubuntu-24.04)

WSL2 Side

[ ] Install Node.js 20+, Python 3.12, ffmpeg, git, cmake
[ ] Verify nvidia-smi shows RTX 5090
[ ] Clone repo into ~/code/mygh/
[ ] Build whisper-cpp with CUDA
[ ] Download Whisper model to ~/whisper-models/
[ ] Run: bash setup-tts.sh
[ ] Run: bash start-dashboard.sh
[ ] Verify: http://localhost:3000 shows all green
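The WSL2-side tool checks in this list can be scripted. The sketch below uses the shell builtin command -v, with a tool list that mirrors the checklist. missing_cmds is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each named command that is not on PATH;
# succeed only if none are missing.
missing_cmds() {
  local cmd status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      status=1
    fi
  done
  return "$status"
}
```

Usage: missing_cmds node python3.12 ffmpeg git cmake nvidia-smi whisper-cli || echo "install the tools listed above"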

Troubleshooting

Ollama not accessible from WSL2

curl http://localhost:11434/api/tags
# If it fails, check the Windows firewall, or reach Windows by host name instead:
curl http://$(hostname).local:11434/api/tags
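Under WSL2's default NAT networking, localhost inside Linux is the WSL VM itself. A service on Windows is reached through the host's IP, which is the VM's default gateway. Ollama must also listen on more than 127.0.0.1 for this to work, for example with OLLAMA_HOST=0.0.0.0 set on the Windows side. The sketch below extracts the gateway from ip route output. default_gateway is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `ip route` output on stdin and print the default gateway,
# which under WSL2 NAT networking is the Windows host.
default_gateway() {
  awk '$1 == "default" && $2 == "via" { print $3; exit }'
}
```

Usage: WIN_HOST=$(ip route show default | default_gateway); curl "http://$WIN_HOST:11434/api/tags"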

CUDA not visible in WSL2

nvidia-smi
# If "command not found":
# 1. Update Windows NVIDIA drivers to latest
# 2. Run: wsl --update
# 3. Do NOT install nvidia-driver-* inside WSL2
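Once nvidia-smi runs, its query mode gives machine-readable output that a script can check. The sketch below makes these assumptions:

- The input is the CSV from nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits, i.e. the GPU name, then total memory in MiB.
- The 20000 MiB threshold in the usage line is arbitrary.

gpu_meets is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read nvidia-smi CSV lines ("<name>, <MiB>") on stdin and succeed
# if any GPU's name contains PATTERN and its total memory is at least MIN_MIB.
gpu_meets() {
  local pattern="$1" min_mib="$2" name mib
  while IFS=',' read -r name mib; do
    mib="${mib//[[:space:]]/}"          # strip spaces (and any stray CR) around the number
    [[ "$mib" =~ ^[0-9]+$ ]] || continue
    if [[ "$name" == *"$pattern"* ]] && (( mib >= min_mib )); then
      return 0
    fi
  done
  return 1
}
```

Usage: nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits | gpu_meets "RTX 5090" 20000 && echo "GPU OK"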

Slow filesystem performance

# Clone repos inside WSL2 filesystem: ~/code/...
# NOT in /mnt/c/ (Windows→WSL bridge is ~10x slower for node_modules)