New features:
- F2: Model search/filter — search input above models list (shown when
4+ models installed). Filters by name, family, and quantization level.
Press / to focus the search input.
- F3: Prompt history — saves last 20 prompts to localStorage with model
name and timestamp. History dropdown in prompt modal with one-click
re-run. Toggle via clock icon in textarea.
- F9: Modelfile viewer — expanded model details now fetch and display
the Modelfile via the show action. Collapsible <details> element
with syntax-highlighted pre block.
- F11: Keyboard shortcuts panel — press ? to toggle. Shows all shortcuts:
? (help), R (refresh), / (search), Esc (close/cancel), Cmd+Enter (send).
Shortcuts only fire when not in an input field.
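A minimal sketch of the F3 history cap, assuming a newest-first array. The entry shape, the pushPrompt name, and the "promptHistory" storage key are hypothetical; the commit only states that the last 20 prompts are saved with model name and timestamp.

```typescript
// Hypothetical entry shape: the commit mentions prompt, model name, timestamp.
interface PromptHistoryEntry {
  prompt: string;
  model: string;
  timestamp: number; // milliseconds since epoch
}

const MAX_HISTORY = 20; // "last 20 prompts" per the commit message

// Pure helper: returns a new array with the newest entry first,
// dropping the oldest entries once the cap is exceeded.
function pushPrompt(
  history: PromptHistoryEntry[],
  entry: PromptHistoryEntry,
): PromptHistoryEntry[] {
  return [entry, ...history].slice(0, MAX_HISTORY);
}

// Persisting is left to the caller, for example in the browser:
// localStorage.setItem("promptHistory", JSON.stringify(next));
```

Keeping the helper pure leaves localStorage access at the call site, so the cap logic can be exercised without a browser.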
Component extraction (CQ1):
- lib/types.ts: All interfaces (OllamaData, SystemData, Toast, etc.)
- lib/format.ts: formatBytes, formatUptime utilities
- lib/ollama-config.ts: Shared OLLAMA_URL constant
- components/StatusDot.tsx: Status indicator component
- components/ProgressBar.tsx: Progress bar component
- page.tsx: Now imports from extracted modules, reduced from 1180 to
1077 lines (interfaces + utilities + sub-components removed)
Error boundary (CQ4):
- error.tsx: Next.js App Router error boundary with styled error UI,
stack trace preview, and 'Try again' button
Shared config (CQ3):
- All 3 Ollama API routes now import OLLAMA_URL from lib/ollama-config.ts
instead of duplicating the env var fallback
Security (S1):
- Add MODEL_NAME_RE regex validation on POST /api/ollama — rejects
invalid model names before passing to Ollama API
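A sketch of what the S1 check might look like. The pattern below is an assumption, not the regex from the commit: it accepts optional namespace segments and an optional :tag, which covers names such as llama3.1:8b. isValidModelName is a hypothetical helper name.

```typescript
// Assumed pattern (the real MODEL_NAME_RE is not shown in the commit):
// one or more "/"-separated segments, each starting with an alphanumeric
// character, followed by an optional ":tag".
const MODEL_NAME_RE =
  /^[a-zA-Z0-9][a-zA-Z0-9._-]*(\/[a-zA-Z0-9][a-zA-Z0-9._-]*)*(:[a-zA-Z0-9._-]+)?$/;

// Hypothetical helper: true only for strings that match the whole pattern.
function isValidModelName(name: unknown): name is string {
  return typeof name === "string" && MODEL_NAME_RE.test(name);
}
```

A route handler could return HTTP 400 when isValidModelName(body.model) is false, before any request reaches Ollama.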
Security (S2):
- Replace exec() with execFile() for brew package version check —
prevents shell injection if targets list ever becomes dynamic
New features:
- B2/F1: Streaming model pull with real-time progress bar. New
/api/ollama/pull/route.ts pipes NDJSON from Ollama stream:true.
UI shows status, completed/total bytes, and percentage during download.
- F6: Token/s metrics after prompt generation. Parses eval_count and
eval_duration from the final NDJSON chunk. Displays tok/s, total
tokens, and duration in the prompt modal footer.
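The F6 arithmetic can be sketched as follows. Ollama reports eval_duration in nanoseconds, so tok/s is eval_count divided by the duration in seconds. The FinalChunk and tokensPerSecond names are illustrative, not taken from the dashboard code.

```typescript
// Subset of the final NDJSON chunk used for metrics.
interface FinalChunk {
  eval_count: number; // tokens generated
  eval_duration: number; // generation time in nanoseconds
}

// Returns tokens per second, or null when the duration is missing or zero.
function tokensPerSecond(chunk: FinalChunk): number | null {
  if (!(chunk.eval_duration > 0)) return null;
  const seconds = chunk.eval_duration / 1e9;
  return chunk.eval_count / seconds;
}
```

For example, 100 tokens over 2,000,000,000 ns works out to 50 tok/s.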
Bug fixes:
- B7: Parse vm_stat page size from output instead of hardcoding 16384.
Reads 'page size of N bytes' from the first line for portability.
- B8: Whisper model discovery now scans multiple directories:
WHISPER_MODELS_DIR env var, ~/whisper-models,
/opt/homebrew/share/whisper-cpp/models/, ~/.cache/whisper/. Returns the
first directory that contains .bin files.
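A sketch of the B7 page-size parse, assuming vm_stat prints a header such as "Mach Virtual Memory Statistics: (page size of 16384 bytes)". The parsePageSize name and the fallback to the old hardcoded value are assumptions.

```typescript
// Reads "page size of N bytes" from the first line of vm_stat output.
// Falls back to the previously hardcoded value if the header is missing.
function parsePageSize(vmStatOutput: string, fallback = 16384): number {
  const firstLine = vmStatOutput.split("\n", 1)[0] ?? "";
  const match = /page size of (\d+) bytes/.exec(firstLine);
  return match ? Number(match[1]) : fallback;
}
```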
Bug fixes:
- B4: Escape key now respects streaming state — during active stream,
Escape aborts the generation instead of closing the modal
- B5: Auto-refresh (15s interval) pauses during streaming and pull
operations to prevent background churn and UI flicker
- B9: Add AbortController to streaming fetch — closing modal or pressing
Escape cancels the underlying HTTP request, saving CPU/bandwidth
- B1: Header subtitle now dynamically shows chip name and RAM from the
system API instead of hardcoded 'Apple M4 Pro · 48 GB'
- B11: Escape handler clears promptText and promptResponse on close
- B6: Toast IDs now combine Date.now() with a random suffix instead of
an incrementing ref (prevents ID collisions on HMR remount)
- B10: Brew panel distinguishes 'Loading...' (system=null) from
'No tracked packages found' (system loaded, empty array)
- B3: Remove dead non-streaming generate action from Ollama API route
- P4: Add 5-second AbortController timeout to all fetchOllama() calls
to prevent indefinite hangs when Ollama is unresponsive
Rewrite DASHBOARD_REVIEW.md with progress-tracking improvements:
- Add GitHub-style checkboxes to all 41 actionable items
- Add file inventory table with line counts and purposes
- Add commit log section for tracking implementation progress
- Add sprint tracker tables with effort estimates and commit columns
- New finding B11: prompt text not cleared on Escape close
- New finding CQ6: no Zod validation on API responses
- Consolidate priority matrix into sprint tables (less redundancy)
- Add deferred items section with dependency notes
- Improve item descriptions with more precise file:line references
- Add stack summary and total effort estimate (14–17 hrs)
Replace Node.js os.freemem() with vm_stat parsing for macOS. The old
approach reported ~47.7 GB / 48 GB 'used' because os.freemem() only
counts truly free pages, ignoring ~20 GB of inactive/reclaimable cache.
New memory breakdown:
- App Memory: active + wired + compressor (actual process usage)
- Cached: inactive + purgeable + speculative (reclaimable on demand)
- Available: free + cached (what apps can actually use)
- Pressure: normal/warning/critical based on app memory ratio
Dashboard UI updated to show app memory, cached (reclaimable) label,
and pressure-based color coding on progress bars.
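The breakdown above can be sketched as a pure function over vm_stat page counts. The field names, the summarizeMemory name, and the 0.75 / 0.90 pressure thresholds are assumptions; the commit does not state the actual cutoffs.

```typescript
// Page counts as parsed from vm_stat (field names are hypothetical).
interface VmPages {
  free: number;
  active: number;
  inactive: number;
  wired: number;
  compressor: number;
  purgeable: number;
  speculative: number;
}

type Pressure = "normal" | "warning" | "critical";

interface MemoryBreakdown {
  appBytes: number;
  cachedBytes: number;
  availableBytes: number;
  pressure: Pressure;
}

function summarizeMemory(
  pages: VmPages,
  pageSize: number,
  totalBytes: number,
): MemoryBreakdown {
  // App Memory: active + wired + compressor
  const appBytes = (pages.active + pages.wired + pages.compressor) * pageSize;
  // Cached: inactive + purgeable + speculative
  const cachedBytes =
    (pages.inactive + pages.purgeable + pages.speculative) * pageSize;
  // Available: free + cached
  const availableBytes = pages.free * pageSize + cachedBytes;
  // Pressure from the app-memory ratio; thresholds are assumed values.
  const ratio = appBytes / totalBytes;
  const pressure: Pressure =
    ratio >= 0.9 ? "critical" : ratio >= 0.75 ? "warning" : "normal";
  return { appBytes, cachedBytes, availableBytes, pressure };
}
```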
Bug fixes:
- Fix Google Fonts build error (corporate proxy blocks fonts.gstatic.com)
by removing Geist font imports and switching to system font stack
- Fix system API 7.6s latency by caching static info (chip, GPU, brew)
and adding timeouts to shell commands; now responds in ~50ms
New features:
- Streaming prompt responses via NDJSON proxy (/api/ollama/stream)
with typing cursor animation and auto-scroll
- Model pull UI: input field + button to download new models
- Model delete with two-step confirmation dialog
- VRAM usage and expiry time display for loaded models
- Toast notifications (success/error/info) with slide-in animation
- Copy response button in prompt modal
- Escape key closes modals, backdrop click dismisses
- Pull/delete/show actions added to Ollama API route
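One detail of consuming an NDJSON stream is that a network chunk can end in the middle of a line. A minimal sketch of the buffering, with a hypothetical splitNdjson helper (not the code in /api/ollama/stream):

```typescript
// Appends a decoded chunk to the pending buffer, parses every complete
// line as JSON, and returns the trailing partial line for the next call.
function splitNdjson(
  buffer: string,
  chunk: string,
): { objects: unknown[]; rest: string } {
  const lines = (buffer + chunk).split("\n");
  // The last element is either "" (chunk ended on a newline) or a partial line.
  const rest = lines.pop() ?? "";
  const objects: unknown[] = lines
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
  return { objects, rest };
}
```

The caller keeps rest between reads and passes it back as buffer, so a JSON object split across two chunks is parsed only once it is complete.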
Next.js 16 dashboard for monitoring and managing the local LLM stack.
Runs on port 3100 with dark theme using ByteLyst design tokens.
API routes:
- GET/POST /api/ollama — model list, running status, load/unload/generate
- GET /api/whisper — binary discovery, GGML model inventory
- GET /api/system — chip info, RAM/disk usage, brew package versions
Dashboard UI:
- Top stats row: Ollama status, model count, Whisper status, RAM usage
- Ollama models panel with load/unload actions, LOADED badge, details
- System panel with progress bars for RAM and disk
- Whisper.cpp panel with binary list and model inventory
- Brew packages panel with version tracking
- Basic prompt modal with Cmd+Enter shortcut
- Auto-refresh every 15 seconds
Also excludes __LOCAL_LLMs/ from root ESLint config (dashboard has its
own config and uses browser globals not available in Node.js context).
Tech: Next.js 16, React 19, TailwindCSS v4, Lucide icons, TypeScript
- Add __LOCAL_LLMs/LOCAL_LLMs_setup_mac_m4_48gb.md: comprehensive reference
for running Ollama on the dev Mac covering installation (v0.16.2 via brew),
corp proxy handling (AT&T Forcepoint), OpenAI-compat API usage examples
(curl/Node/Python), extraction-service eval integration, Python sidecar
wiring, model recommendations by use case, troubleshooting, and env var
reference
- Models documented: llama3.1:8b (4.9GB, default evals), qwen2.5-coder:32b
(19GB, code gen / Swift / TS)
- Add evals/promptfoo.ollama.yaml: same 19 cases hitting Ollama OpenAI-compat
API directly (no extraction-service needed); all assertions use inline
JSON.parse(output) to handle raw string response from Ollama
- Add evals/compare-evals.sh: runs Gemini + Ollama evals back-to-back and
prints side-by-side pass-rate comparison table
- Supports OLLAMA_MODEL env var (default: llama3.1:8b)
- Add OsVersionRange interface + Zod schema (platform, minVersion?, maxVersion?)
- Add regions[] and osVersions[] to FeatureFlagDoc, CreateFlagSchema, UpdateFlagSchema
- Add compareVersions() helper for dot-separated semver comparison
- Extend GET /flags/poll with ?region and ?osVersion query params
- Region targeting: flag only returned if client region is in flag's regions list
- OS version targeting: per-platform min/max version range filtering
- Add 10 new tests (schema validation, compareVersions edge cases)
- 634 tests passing, tsc clean
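A sketch of how compareVersions() and the range filter might behave. This is an illustration under assumptions, not the committed implementation: missing segments count as 0, pre-release suffixes are ignored, range bounds are inclusive, and inVersionRange is a hypothetical name.

```typescript
// Compares dot-separated numeric versions. Returns -1, 0, or 1.
// Missing segments are treated as 0, so "1.0" equals "1.0.0".
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map((s) => Number.parseInt(s, 10) || 0);
  const pb = b.split(".").map((s) => Number.parseInt(s, 10) || 0);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const x = pa[i] ?? 0;
    const y = pb[i] ?? 0;
    if (x !== y) return x < y ? -1 : 1;
  }
  return 0;
}

// Fields follow the commit: platform, minVersion?, maxVersion?
interface OsVersionRange {
  platform: string;
  minVersion?: string;
  maxVersion?: string;
}

// Assumes inclusive bounds; an absent bound leaves that side open.
function inVersionRange(range: OsVersionRange, version: string): boolean {
  if (range.minVersion && compareVersions(version, range.minVersion) < 0) {
    return false;
  }
  if (range.maxVersion && compareVersions(version, range.maxVersion) > 0) {
    return false;
  }
  return true;
}
```

Comparing segment by segment as numbers is what makes 17.10 sort after 17.2, which a plain string comparison would get wrong.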