Bug fixes:
- Fix Google Fonts build error (corporate proxy blocks fonts.gstatic.com)
by removing Geist font imports and switching to a system font stack
- Fix 7.6s latency on the system API by caching static info (chip, GPU,
brew) and adding timeouts to shell commands; it now responds in ~50ms
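
A minimal sketch of the caching idea behind the system API fix, not the
actual route code. The helper names (runWithTimeout, cached), the 2s timeout,
the 10 minute TTL, and the sysctl call are assumptions for illustration only.
The cache stores the in-flight promise, so concurrent requests share a single
shell call, and failed lookups are dropped so the next request retries.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Run a command and give up after timeoutMs, so a hung tool cannot stall the route.
function runWithTimeout(timeoutMs: number) {
  return async (cmd: string, args: string[]): Promise<string> => {
    const { stdout } = await execFileAsync(cmd, args, {
      encoding: "utf8",
      timeout: timeoutMs,
    });
    return String(stdout).trim();
  };
}

// Wrap a loader so its result is reused until ttlMs has elapsed.
// `now` is injectable so the expiry logic can be exercised without real time.
function cached<T>(
  load: () => Promise<T>,
  ttlMs: number,
  now: () => number = Date.now,
): () => Promise<T> {
  let entry: { promise: Promise<T>; expires: number } | undefined;
  return () => {
    if (entry && now() < entry.expires) return entry.promise;
    const promise = load();
    const fresh = { promise, expires: now() + ttlMs };
    entry = fresh;
    // Drop failed lookups so the next request retries instead of replaying the error.
    promise.catch(() => {
      if (entry === fresh) entry = undefined;
    });
    return promise;
  };
}

// Hypothetical usage: chip name rarely changes, so cache it for 10 minutes.
const run = runWithTimeout(2_000);
const getChipName = cached(
  () => run("sysctl", ["-n", "machdep.cpu.brand_string"]),
  10 * 60 * 1000,
);
```

Only values that cannot change between requests (chip, GPU, brew versions)
belong behind such a cache; RAM and disk usage would still be read live.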
New features:
- Streaming prompt responses via NDJSON proxy (/api/ollama/stream)
with typing cursor animation and auto-scroll
- Model pull UI: input field + button to download new models
- Model delete with two-step confirmation dialog
- VRAM usage and expiry time display for loaded models
- Toast notifications (success/error/info) with slide-in animation
- Copy response button in prompt modal
- Escape key closes modals, backdrop click dismisses
- Pull/delete/show actions added to Ollama API route
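
A rough sketch of how a client could consume the NDJSON stream, not the
dashboard's actual code. It assumes each line is a JSON object shaped like
Ollama's streaming generate output (a `response` text fragment plus a `done`
flag); the request body sent to /api/ollama/stream and the names
createNdjsonParser and streamPrompt are hypothetical.

```typescript
interface StreamChunk {
  response?: string;
  done?: boolean;
}

// Returns a function that accepts raw text and yields the complete JSON lines
// seen so far. A trailing partial line is held back until its newline arrives.
function createNdjsonParser(): (text: string) => StreamChunk[] {
  let buffer = "";
  return (text: string): StreamChunk[] => {
    buffer += text;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line) as StreamChunk);
  };
}

// Reads the streamed response and calls onToken for every text fragment.
async function streamPrompt(
  url: string,
  body: unknown, // hypothetical payload, e.g. { model, prompt }
  onToken: (token: string) => void,
): Promise<void> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok || !res.body) throw new Error(`Stream failed: HTTP ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  const parse = createNdjsonParser();
  const emit = (chunks: StreamChunk[]) => {
    for (const chunk of chunks) if (chunk.response) onToken(chunk.response);
  };

  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    emit(parse(decoder.decode(value, { stream: true })));
  }
  emit(parse("\n")); // flush a final line that arrived without a newline
}
```

Buffering matters because network chunks do not align with line boundaries;
parsing each chunk directly would fail whenever a JSON object is split in two.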
Next.js 16 dashboard for monitoring and managing the local LLM stack.
Runs on port 3100 with a dark theme using ByteLyst design tokens.
API routes:
- GET/POST /api/ollama — model list, running status, load/unload/generate
- GET /api/whisper — binary discovery, GGML model inventory
- GET /api/system — chip info, RAM/disk usage, brew package versions
Dashboard UI:
- Top stats row: Ollama status, model count, Whisper status, RAM usage
- Ollama models panel with load/unload actions, LOADED badge, details
- System panel with progress bars for RAM and disk
- Whisper.cpp panel with binary list and model inventory
- Brew packages panel with version tracking
- Basic prompt modal with Cmd+Enter shortcut
- Auto-refresh every 15 seconds
Also excludes __LOCAL_LLMs/ from the root ESLint config (the dashboard has
its own config and uses browser globals that are not available in a Node.js
context).
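
For illustration, the exclusion could look like the sketch below. It assumes
the root config uses ESLint's flat config format; the file name and the
surrounding entries are hypothetical and the real root config may differ.

```typescript
// eslint.config.ts (hypothetical root config)
const config = [
  // An object that contains only `ignores` acts as a global ignore in flat config.
  { ignores: ["__LOCAL_LLMs/**"] },
  // ...the rest of the root configuration would follow here.
];

export default config;
```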
Tech: Next.js 16, React 19, TailwindCSS v4, Lucide icons, TypeScript
- Add __LOCAL_LLMs/LOCAL_LLMs_setup_mac_m4_48gb.md: comprehensive reference
for running Ollama on the dev Mac. Covers installation (v0.16.2 via brew),
corp proxy handling (AT&T Forcepoint), OpenAI-compat API usage examples
(curl/Node/Python), extraction-service eval integration, Python sidecar
wiring, model recommendations by use case, troubleshooting, and an env var
reference
- Models documented: llama3.1:8b (4.9GB, default evals), qwen2.5-coder:32b
(19GB, code gen / Swift / TS)
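
A small sketch of the kind of OpenAI-compat call the reference describes, not
copied from it. It assumes Ollama is listening on its default address
(http://localhost:11434) and serving the OpenAI-compatible endpoint under /v1;
buildChatRequest and chat are hypothetical helper names.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Builds the URL and fetch options for a non-streaming chat completion.
function buildChatRequest(
  model: string,
  prompt: string,
  baseUrl = "http://localhost:11434/v1", // assumed default Ollama address
) {
  const messages: ChatMessage[] = [{ role: "user", content: prompt }];
  return {
    url: `${baseUrl}/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages, stream: false }),
    },
  };
}

// Sends the prompt and returns the first choice's text, or "" if there is none.
async function chat(model: string, prompt: string): Promise<string> {
  const { url, init } = buildChatRequest(model, prompt);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = (await res.json()) as {
    choices: { message: { content: string } }[];
  };
  return data.choices[0]?.message.content ?? "";
}
```

Calling chat("llama3.1:8b", "...") targets the default eval model listed
above; swapping in qwen2.5-coder:32b only changes the model argument.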