# Voicebox: Local Voice Cloning Studio

> **Repo:** [github.com/jamiepine/voicebox](https://github.com/jamiepine/voicebox) · **Version:** 0.1.12
> **Stack:** Tauri (Rust) + FastAPI (Python) + Qwen3-TTS + MLX (Apple Silicon) / PyTorch (CUDA)
> **Local clone:** `__LOCAL_LLMs/APPS/Voice/voicebox/` (gitignored)

---

## What Is Voicebox?

Voicebox is an open-source, local-first voice cloning studio powered by **Qwen3-TTS**. It provides a DAW-like interface for professional voice synthesis, a local alternative to cloud services like ElevenLabs.

```
┌──────────────────────────────────────────────────────────────────────┐
│                        Voicebox Architecture                         │
│                                                                      │
│  ┌───────────────────────┐           ┌──────────────────────────┐    │
│  │ Web UI (Vite + React) │           │ Tauri Desktop App        │    │
│  │ http://localhost:5173 │    OR     │ (Rust + native window)   │    │
│  └──────────┬────────────┘           └──────────┬───────────────┘    │
│             │                                   │                    │
│             └──────────┐   ┌────────────────────┘                    │
│                        ▼   ▼                                         │
│             ┌─────────────────────────┐                              │
│             │ FastAPI Backend         │                              │
│             │ http://localhost:17493  │                              │
│             │                         │                              │
│             │ • Qwen3-TTS model       │                              │
│             │ • Voice profiles        │                              │
│             │ • Audio generation      │                              │
│             │ • Story editor          │                              │
│             │ • SQLite database       │                              │
│             │ • REST API              │                              │
│             └─────────────────────────┘                              │
│                          │                                           │
│                 ┌────────┴────────┐                                  │
│                 │ GPU Acceleration│                                  │
│                 │ MPS (Mac) or    │                                  │
│                 │ CUDA (Windows)  │                                  │
│                 └─────────────────┘                                  │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
```

### Key Features

| Feature               | Description                                                      |
| --------------------- | ---------------------------------------------------------------- |
| **Voice cloning**     | Record or upload a few seconds of audio → create a voice profile |
| **Text-to-speech**    | Type text, pick a voice, generate speech with Qwen3-TTS          |
| **Story editor**      | Multi-voice timeline for podcasts, narratives, audiobooks        |
| **Multi-track audio** | DAW-like editing with multiple voices/tracks                     |
| **REST API**          | Full API for integration (port 17493)                            |
| **100% local**        | No cloud, no data leaves your machine                            |
| **Cross-platform**    | macOS (MLX Metal), Windows/Linux (PyTorch CUDA)                  |

---

## Prerequisites

| Component  | Required                               | Check Command          |
| ---------- | -------------------------------------- | ---------------------- |
| **Python** | 3.12 or 3.13                           | `python3.12 --version` |
| **Bun**    | ≥ 1.0                                  | `bun --version`        |
| **Rust**   | Latest stable (for Tauri desktop only) | `rustc --version`      |
| **Git**    | Any                                    | `git --version`        |

### Platform-Specific

| Platform                  | GPU Backend  | Extra Requirements            |
| ------------------------- | ------------ | ----------------------------- |
| **macOS (Apple Silicon)** | MLX (Metal)  | Xcode Command Line Tools      |
| **macOS (Intel)**         | CPU only     | None                          |
| **Windows/WSL2**          | PyTorch CUDA | NVIDIA drivers + CUDA toolkit |
| **Linux**                 | PyTorch CUDA | NVIDIA drivers + CUDA toolkit |

---

## Installation (Step-by-Step)

### Step 1: Install Prerequisites

#### macOS

```bash
# Install Bun
brew install oven-sh/bun/bun

# Install Python 3.12 (if not present)
brew install python@3.12

# Rust (only needed for Tauri desktop app)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

#### Windows (WSL2)

```bash
# Install Bun
curl -fsSL https://bun.sh/install | bash
source ~/.bashrc

# Install Python 3.12
sudo apt install -y python3.12 python3.12-venv python3.12-dev

# Rust (only needed for Tauri desktop app)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

### Step 2: Clone the Repository

```bash
# Clone into the APPS/Voice directory
cd /path/to/__LOCAL_LLMs/APPS/Voice
git clone https://github.com/jamiepine/voicebox.git
cd voicebox
```

> **Current location on this Mac:** `/Users/sd9235/code/mygh/learning_ai_common_plat/__LOCAL_LLMs/APPS/Voice/voicebox/`

### Step 3: Install JavaScript Dependencies

```bash
# Root workspace dependencies
bun install

# Web frontend dependencies (separate)
cd web && bun install && cd ..
```

### Step 4: Install Python Dependencies

```bash
# Option A: Use the Makefile (recommended)
make setup-python

# Option B: Manual
python3.12 -m venv backend/venv
source backend/venv/bin/activate
pip install --upgrade pip
pip install -r backend/requirements.txt

# Apple Silicon only: MLX for native Metal acceleration
pip install -r backend/requirements-mlx.txt

# Install Qwen3-TTS
pip install git+https://github.com/QwenLM/Qwen3-TTS.git
```

### Step 5: Verify Installation

```bash
# Check Python venv
backend/venv/bin/python -c "import torch; print(f'PyTorch: {torch.__version__}')"
backend/venv/bin/python -c "import fastapi; print(f'FastAPI: {fastapi.__version__}')"

# Check GPU backend
# macOS:
backend/venv/bin/python -c "import torch; print(f'MPS available: {torch.backends.mps.is_available()}')"
# Windows/Linux:
backend/venv/bin/python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

# Check MLX (Apple Silicon only); the version string lives on mlx.core
backend/venv/bin/python -c "import mlx.core as mx; print(f'MLX: {mx.__version__}')"
```

---

## Running Voicebox

### Option A: Web Frontend + Backend (Recommended for Development)

Open **two terminals:**

**Terminal 1: Backend**

```bash
cd /path/to/voicebox
make dev-backend
# Or manually:
# backend/venv/bin/uvicorn backend.main:app --reload --port 17493
```

Expected output:

```
INFO:     Uvicorn running on http://0.0.0.0:17493 (Press CTRL+C to quit)
INFO:     Application startup complete.
INFO:     GPU available: MPS (Apple Silicon)
```

**Terminal 2: Web Frontend**

```bash
cd /path/to/voicebox/web
bun run dev
```

Expected output:

```
  VITE v5.4.21  ready in 2536 ms

  ➜  Local:   http://localhost:5173/
  ➜  Network: use --host to expose
```

**Open in browser:** [http://localhost:5173/](http://localhost:5173/)

### Option B: Tauri Desktop App

```bash
make dev
# This starts both backend + Tauri desktop window
```

> Requires the Rust toolchain to be installed.

### Option C: Backend Only (API Mode)

```bash
make dev-backend
# API docs at: http://localhost:17493/docs
```

---

## First Use

### 1. Download a Model

On first launch, Voicebox prompts you to download the Qwen3-TTS model (~1.7 GB).

If the automatic download fails (corporate proxy, etc.):

```bash
# Manual download via hf-mirror.com (bypasses Forcepoint proxy)
# Note: -k disables TLS certificate verification; drop it if the proxy's CA is trusted
mkdir -p models/Qwen3-TTS
cd models/Qwen3-TTS
curl -k -L -o config.json "https://hf-mirror.com/Qwen/Qwen3-TTS-0.6B/resolve/main/config.json"
curl -k -L -o model.safetensors "https://hf-mirror.com/Qwen/Qwen3-TTS-0.6B/resolve/main/model.safetensors"
curl -k -L -o tokenizer.json "https://hf-mirror.com/Qwen/Qwen3-TTS-0.6B/resolve/main/tokenizer.json"
cd ../..
```

### 2. Create a Voice Profile

1. Click **"Voice Profiles"** in the sidebar
2. Click **"New Profile"**
3. **Record** a few seconds of your voice, or **upload** an audio file (.wav, .mp3)
4. Give it a name → Save

### 3. Generate Speech

1. Click **"Generate"** in the sidebar
2. Type your text in the input box
3. Select a voice profile
4. Click **Generate**
5. Listen to the output, download as .wav

### 4. Story Editor

1. Click **"Stories"** in the sidebar
2. Create a new story
3. Add segments with different voices
4. Generate the full story as a single audio file
5. Export for podcasts, audiobooks, etc.
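
The UI steps above go through the backend's REST API. This guide does not list the individual routes, so the sketch below reads them from the OpenAPI schema rather than guessing. It assumes the schema is served at FastAPI's default `/openapi.json` path (the `/docs` page normally loads it from there); `list_routes` and `fetch_schema` are illustrative helper names, not part of Voicebox.

```python
import json
from urllib.request import urlopen

# Assumption: the backend runs on the port used in this guide and serves
# its schema at FastAPI's default /openapi.json path.
SCHEMA_URL = "http://localhost:17493/openapi.json"

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_routes(schema: dict) -> list:
    """Return (METHOD, path) pairs from an OpenAPI schema, sorted by path."""
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Path items can also hold non-method keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                routes.append((method.upper(), path))
    return sorted(routes, key=lambda route: (route[1], route[0]))


def fetch_schema(url: str = SCHEMA_URL) -> dict:
    """Download and parse the OpenAPI schema from a running backend."""
    with urlopen(url, timeout=5) as response:
        return json.load(response)


if __name__ == "__main__":
    for method, path in list_routes(fetch_schema()):
        print(f"{method:7} {path}")
```

Run it with the backend started (`make dev-backend`). Keeping the parsing in `list_routes` separate from the network call means it can also be pointed at a saved copy of the schema.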

---

## Ports & URLs

| Service          | URL                                                        | Purpose                      |
| ---------------- | ---------------------------------------------------------- | ---------------------------- |
| **Backend API**  | [http://localhost:17493](http://localhost:17493)           | FastAPI server               |
| **API Docs**     | [http://localhost:17493/docs](http://localhost:17493/docs) | Swagger/OpenAPI docs         |
| **Web Frontend** | [http://localhost:5173](http://localhost:5173)             | Vite dev server (web mode)   |
| **Tauri App**    | Native window                                              | Desktop app (if using Tauri) |

---

## Make Commands Reference

| Command             | Description                                       |
| ------------------- | ------------------------------------------------- |
| `make setup`        | Full setup (JS + Python + MLX if Apple Silicon)   |
| `make setup-js`     | Install JavaScript dependencies only              |
| `make setup-python` | Install Python dependencies + venv                |
| `make dev`          | Start backend + Tauri desktop app                 |
| `make dev-backend`  | Start FastAPI backend only (port 17493)           |
| `make dev-web`      | Start backend + web frontend                      |
| `make kill-dev`     | Kill all dev processes                            |
| `make build`        | Build server binary + Tauri app                   |
| `make build-web`    | Build web frontend to `web/dist/`                 |
| `make db-init`      | Initialize SQLite database                        |
| `make db-reset`     | Reset database (delete + reinitialize)            |
| `make generate-api` | Generate TypeScript API client from OpenAPI       |
| `make lint`         | Run Biome linter                                  |
| `make format`       | Format code with Biome                            |
| `make test`         | Run all tests                                     |
| `make clean`        | Clean build artifacts                             |
| `make clean-all`    | Nuclear clean (everything including node_modules) |

---

## Platform Performance

| Platform            | GPU Backend  | Speed (est.)                   | Model Load Time |
| ------------------- | ------------ | ------------------------------ | --------------- |
| **Mac M4 Pro 48GB** | MLX (Metal)  | Fast: real-time or faster      | ~5s             |
| **Mac M4 Pro 48GB** | PyTorch MPS  | Good: near real-time           | ~8s             |
| **RTX 5090 24GB**   | PyTorch CUDA | Fastest: well above real-time  | ~3s             |
| **RTX 3060 12GB**   | PyTorch CUDA | Good: real-time                | ~5s             |
| **CPU only (i7)**   | PyTorch CPU  | Slow: below real-time          | ~15s            |

---

## Troubleshooting

### Backend won't start

```bash
# Check Python version (needs 3.12 or 3.13)
backend/venv/bin/python --version

# Check if port is in use
lsof -i :17493

# Try starting manually with verbose output
backend/venv/bin/uvicorn backend.main:app --reload --port 17493 --log-level debug
```

### Frontend won't start (ERR_MODULE_NOT_FOUND)

```bash
# Web dependencies need to be installed separately
cd web && bun install && cd ..

# Then start
cd web && bun run dev
```

### Model download fails (corporate proxy)

```bash
# Use hf-mirror.com instead of huggingface.co
# See "First Use > Download a Model" section above
```

### MPS not available (macOS)

```bash
# Check PyTorch MPS support
backend/venv/bin/python -c "import torch; print(torch.backends.mps.is_available())"

# If False, you may need to update PyTorch
backend/venv/bin/pip install --upgrade torch
```

### CUDA not available (Windows/WSL2)

```bash
# Check CUDA
backend/venv/bin/python -c "import torch; print(torch.cuda.is_available())"

# If False, install CUDA PyTorch
backend/venv/bin/pip install torch --index-url https://download.pytorch.org/whl/cu121
```

### transformers version conflict

```
mlx-audio 0.3.1 requires transformers==5.0.0rc3, but you have transformers 4.57.3
```

This is a warning, not a blocking error. Everything still works. The mlx-audio package pins a pre-release version of transformers; the stable version is fine for Qwen3-TTS.
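
To see which versions the venv actually ended up with, the installed packages can be queried through the standard library. This is a minimal sketch: the package names are taken from the warning above, and `installed_version` is an illustrative helper, not part of Voicebox. Run it with `backend/venv/bin/python` so it inspects the right environment.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Package names as they appear in the pip warning.
    for name in ("transformers", "mlx-audio"):
        found = installed_version(name)
        print(f"{name}: {found if found else 'not installed'}")
```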

### Database issues

```bash
# Reset the database
make db-reset

# Or manually:
rm -f backend/data/voicebox.db
```

### Kill everything

```bash
make kill-dev

# Or manually:
pkill -f "uvicorn" || true
pkill -f "vite" || true
```

---

## File Structure

```
voicebox/
├── backend/                    # FastAPI Python backend
│   ├── main.py                 # App entry point
│   ├── requirements.txt        # Python deps
│   ├── requirements-mlx.txt    # Apple Silicon MLX deps
│   ├── venv/                   # Python virtual environment
│   └── data/voicebox.db        # SQLite database
├── web/                        # Vite + React web frontend
├── app/                        # Shared app components
├── tauri/                      # Tauri desktop app (Rust)
├── landing/                    # Landing page
├── models/                     # Downloaded TTS models
├── scripts/                    # Build/setup scripts
├── Makefile                    # All commands
└── package.json                # Bun workspace root
```

---

## Relevance to LysnrAI

Voicebox is a standalone tool and is not integrated into LysnrAI. However, it's useful for:

| Use Case                   | How                                                                  |
| -------------------------- | -------------------------------------------------------------------- |
| **Voice profile testing**  | Clone voices locally before using in LysnrAI TTS pipeline            |
| **Audio content creation** | Generate podcast/narration audio for LysnrAI content                 |
| **TTS experimentation**    | Test Qwen3-TTS model quality and performance locally                 |
| **API integration**        | Voicebox REST API (port 17493) could be called from LysnrAI services |

---

## Quick Start (TL;DR)

```bash
# Clone
cd __LOCAL_LLMs/APPS/Voice
git clone https://github.com/jamiepine/voicebox.git
cd voicebox

# Install
bun install && cd web && bun install && cd ..
make setup-python

# Run (two terminals)
make dev-backend          # Terminal 1: backend on :17493
cd web && bun run dev     # Terminal 2: frontend on :5173

# Open
open http://localhost:5173
```
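
After the quick start, a simple TCP check can confirm that both dev servers are listening. This is a sketch using only the standard library; the ports are the ones listed in this guide, and `port_open` is an illustrative helper name.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Ports taken from the "Ports & URLs" table in this guide.
    services = {"Backend API": 17493, "Web frontend": 5173}
    for name, port in services.items():
        # "localhost" lets create_connection try both IPv4 and IPv6 addresses.
        state = "up" if port_open("localhost", port) else "down"
        print(f"{name} (:{port}): {state}")
```

A successful connection only shows that something is listening on the port; it does not prove the process is Voicebox, so `lsof -i :17493` remains the better tool when a port conflict is suspected.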