saravanakumardb1 519f348583 docs(local-llm): add Next Wave — 15 model intelligence improvements (N1–N15)

Section 8 of DASHBOARD_REVIEW.md: pre-load RAM estimates, will-it-fit
indicator, RAM budget bar, context window, architecture/vision/think
badges, sort, tok/s history, countdown, session stats, delete confirm,
co-load suggestions. Organized in 4 tiers with sprint plan.

2026-02-19 22:32:29 -08:00


Mission Control Dashboard — Bug & Improvement Review

Systematic code review of `__LOCAL_LLMs/dashboard/` (6 source files, 1,395 lines). Last updated: Feb 19, 2026.

File Inventory

| File | Lines | Purpose |
|------|------:|---------|
| `src/app/page.tsx` | 1,079 | Main dashboard UI (single component) |
| `src/app/globals.css` | 91 | Design tokens, animations, base styles |
| `src/app/layout.tsx` | 20 | Root layout (metadata, dark mode) |
| `src/app/api/ollama/route.ts` | 117 | Ollama REST proxy (list, load, unload, pull, delete, show, generate) |
| `src/app/api/ollama/stream/route.ts` | 38 | Ollama streaming generate proxy (NDJSON) |
| `src/app/api/whisper/route.ts` | 66 | Whisper binary + GGML model discovery |
| `src/app/api/system/route.ts` | 162 | System info (chip, memory via vm_stat, disk, brew) |

Stack: Next.js 16, React 19, TailwindCSS v4, Lucide icons, TypeScript

1. Bugs

B1. Hardcoded machine specs in header (`page.tsx:317`). The subtitle reads `Apple M4 Pro · 48 GB · {system?.platform}`. It should use `system?.chip` and `formatBytes(system?.memory.total)` so it works on any machine.
B2. Pull model blocks the UI with no progress feedback (`api/ollama/route.ts:84-92`). `handlePull` calls Ollama with `stream: false`, so large models (20+ GB) can block for 30+ minutes and the Next.js API route will likely time out. It must use `stream: true` and pipe progress events to the client (combined with F1).
B3. Dead code: non-streaming generate action (`api/ollama/route.ts:69-82`). The `action === 'generate'` handler is unused because the UI only calls `/api/ollama/stream`. Remove it, or keep it as a fallback with a comment.
B4. Escape key closes the modal during active streaming (`page.tsx:188-197`). The global `keydown` handler calls `setPromptModel(null)` unconditionally, whereas the backdrop click correctly checks `!promptLoading`. Escape should also respect `promptLoading` so an in-flight response is not discarded.
B5. Auto-refresh (15 s) fires during streaming/pull (`page.tsx:182-185`). `setInterval(fetchAll, 15000)` runs unconditionally, which causes background churn and potential UI flicker while a response is streaming. It should pause while `promptLoading` or `pullLoading` is true.
B6. Toast ID collision on HMR remount (`page.tsx:156-159`). `toastId.current` resets to 0 when the component remounts during development. Use `Date.now()` or `crypto.randomUUID()` for robust uniqueness.
B7. `vm_stat` page size hardcoded (`api/system/route.ts:103`). The value 16384 is hardcoded; for portability it should be parsed from `vm_stat`'s first line, `(page size of NNNNN bytes)`.
B8. Whisper models directory not configurable (`api/whisper/route.ts:24`). It is hardcoded to `~/whisper-models`. It should scan several known paths (`/opt/homebrew/share/whisper-cpp/models/`, `~/whisper-models`, `~/.cache/whisper/`) or accept a `WHISPER_MODELS_DIR` environment variable.
B9. No `AbortController` for the streaming fetch (`page.tsx:250-289`). Closing the prompt modal doesn't cancel the underlying fetch, so the `reader.read()` loop keeps running in the background, wasting CPU and bandwidth until the model finishes generating.
B10. Brew panel shows "Loading..." when the array is empty (`page.tsx:936-940`). When `system.brewPackages` is `[]` (none of the tracked packages installed), it displays "Loading..." instead of "No packages found". The UI needs to distinguish "still fetching" from "fetched but empty".
B11. Prompt text not cleared on close without send (`page.tsx:951-957`). The backdrop click clears `promptText`, but the Escape handler (B4 fix) should clear it too; otherwise stale text persists when the modal is reopened.
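
B7's fix can be sketched as a small parser. This assumes the macOS `vm_stat` output format, whose header is quoted in B7; `parsePageSize`, `parsePageCounts`, and the fallback constant are illustrative names, not taken from `api/system/route.ts`.

```typescript
// Sketch for B7, assuming macOS vm_stat output whose first line looks like
// "Mach Virtual Memory Statistics: (page size of 16384 bytes)".
const DEFAULT_PAGE_SIZE = 16384; // hypothetical fallback if the header can't be parsed

function parsePageSize(vmStatOutput: string): number {
  const match = vmStatOutput.match(/page size of (\d+) bytes/);
  return match ? Number(match[1]) : DEFAULT_PAGE_SIZE;
}

// Parses lines like `Pages free:   12345.` or `"Translation faults":   42.`
// into a map of counters (values are in pages, not bytes).
function parsePageCounts(vmStatOutput: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of vmStatOutput.split("\n")) {
    const m = line.match(/^"?([^:"]+)"?:\s+(\d+)\.?\s*$/);
    if (m) counts[m[1].trim()] = Number(m[2]);
  }
  return counts;
}
```

Multiplying a counter by the parsed page size (for example `counts["Pages free"] * parsePageSize(out)`) yields bytes, so the memory math stays correct on machines whose page size is not 16 KiB.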

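B2 and B9 both come down to reading Ollama's NDJSON stream correctly. A chunk from `reader.read()` can end in the middle of a JSON line, so parsing each chunk on its own is fragile. The sketch below is a minimal line buffer, assuming the bytes have already been decoded to text (e.g. with `TextDecoder`); `NdjsonBuffer` is a hypothetical name, not part of the dashboard.

```typescript
// Hypothetical helper: accumulates decoded text chunks and returns only complete
// NDJSON lines, keeping any partial tail until the next chunk arrives.
class NdjsonBuffer {
  private pending = "";

  // Append a chunk and parse every line it completes.
  push(chunk: string): unknown[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    this.pending = lines.pop() ?? ""; // last piece is incomplete (or empty)
    return lines.filter((l) => l.trim() !== "").map((l) => JSON.parse(l));
  }

  // Call once the stream ends, for a final line without a trailing newline.
  flush(): unknown[] {
    const rest = this.pending.trim();
    this.pending = "";
    return rest ? [JSON.parse(rest)] : [];
  }
}
```

For B9, the same fetch would also receive `signal: controller.signal` from an `AbortController`; closing the modal then calls `controller.abort()`, and the pending `reader.read()` rejects with an `AbortError` instead of running until generation finishes.
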
2. Code Quality

CQ1. Monolithic 1,079-line single component (`page.tsx`). All interfaces, utilities, sub-components, and 900+ lines of JSX live in one file. Extract to:
- `components/`: StatusDot, ProgressBar, ToastContainer, PromptModal, OllamaModelsPanel, SystemPanel, WhisperPanel, BrewPanel
- `lib/types.ts`: interfaces (OllamaModel, SystemData, etc.)
- `lib/format.ts`: formatBytes, formatUptime
- `lib/hooks.ts`: useAutoRefresh, useToasts, useOllamaActions
CQ2. Pervasive inline styles instead of CSS/Tailwind classes (`page.tsx`, 100+ occurrences). Every `style={{ color: 'var(--text-tertiary)' }}` should be a utility class, either through a custom Tailwind theme mapping or through CSS utility classes in `globals.css` (e.g., `.text-muted`).
CQ3. `OLLAMA_URL` duplicated (`api/ollama/route.ts:3` and `api/ollama/stream/route.ts:3`). The same `process.env.OLLAMA_URL || 'http://localhost:11434'` appears in two files. Extract it to `lib/ollama-config.ts`.
CQ4. No React error boundary (`page.tsx`). An unexpected API response shape crashes the entire dashboard. Add an `error.tsx` (the Next.js App Router convention) for graceful recovery.
CQ5. No loading skeleton / shimmer UI. The initial load shows "..." placeholders; skeleton cards would be more polished.
CQ6. No TypeScript strict null checks in API responses. The API route handlers catch errors but return loosely typed JSON. Add Zod validation on the Ollama/system responses to prevent runtime surprises.
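
For CQ1, the extracted `lib/format.ts` could look roughly like this. It is a hedged sketch: the existing bodies of `formatBytes` and `formatUptime` are not shown in this review, so the binary units (1 KB = 1024 bytes), the `"..."` placeholder for missing values, and the uptime format are assumptions.

```typescript
// Sketch of lib/format.ts. Assumes binary units (1 KB = 1024 bytes).
const UNITS = ["B", "KB", "MB", "GB", "TB"];

// Accepts undefined so callers can pass e.g. system?.memory.total directly (see B1).
function formatBytes(bytes: number | undefined, decimals = 1): string {
  if (bytes === undefined || !Number.isFinite(bytes) || bytes < 0) return "...";
  let value = bytes;
  let unit = 0;
  while (value >= 1024 && unit < UNITS.length - 1) {
    value /= 1024;
    unit++;
  }
  // Whole bytes need no decimals; larger units get `decimals` places.
  return unit === 0 ? `${value} B` : `${value.toFixed(decimals)} ${UNITS[unit]}`;
}

// Formats an uptime in seconds as "Xd Yh", "Xh Ym", or "Xm".
function formatUptime(seconds: number): string {
  const s = Math.max(0, Math.floor(seconds));
  const days = Math.floor(s / 86400);
  const hours = Math.floor((s % 86400) / 3600);
  const minutes = Math.floor((s % 3600) / 60);
  if (days > 0) return `${days}d ${hours}h`;
  if (hours > 0) return `${hours}h ${minutes}m`;
  return `${minutes}m`;
}
```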

3. Features

F1. Streaming pull with progress bar (fixes B2). Use Ollama's `stream: true` for `/api/pull`. Create `/api/ollama/pull/route.ts` to pipe the NDJSON progress events; the UI shows a progress bar with completed/total bytes, speed, and ETA.
F2. Model search/filter. A search input above the models list filters by name, family, and quantization. Useful when 10+ models are installed.
F3. Prompt history (localStorage). Store the last 20 prompts with model name and timestamp; a dropdown in the prompt modal re-runs previous prompts.
F4. Chat mode (multi-turn conversation). Use Ollama's `/api/chat` instead of `/api/generate`, with a chat-bubble layout, message history, and a system prompt input field.
F5. Model comparison (side-by-side). Send the same prompt to two models simultaneously and display the responses side by side with a latency/quality comparison.
F6. Tokens/s metrics after generation. Parse `eval_count` and `eval_duration` from the final NDJSON chunk and display tokens/second, total tokens, and latency in the response footer.
F7. System resource sparklines (time series). Keep a ring buffer of memory/CPU snapshots (localStorage) and render mini sparkline charts in the System panel to spot trends over time.
F8. Ollama server logs viewer. Read `~/.ollama/logs/` and display it in a collapsible terminal-style panel with level filtering and auto-scroll.
F9. Modelfile / template viewer. The `show` action already fetches the Modelfile, template, and system prompt; display them in a collapsible code block in the expanded model details.
F10. Dark/light theme toggle. Add `:root.light` CSS variable overrides and a theme toggle persisted in localStorage. The current architecture supports this natively.
F11. Keyboard shortcuts panel (`?` key). Show all shortcuts in a modal: ⌘+Enter (send), Esc (close), R (refresh), / (search models), ? (help).
F12. Whisper transcription test. Upload or record a short audio clip, transcribe it locally via `whisper-cli`, and display the result with latency. This tests the full local STT pipeline.
F13. Responsive mobile layout. Better breakpoints for the 4-column stats row and 3-column main grid, plus a collapsible sidebar on mobile.
F14. Model tags/labels (localStorage). User-defined tags (coding, fast, vision) shown as colored badges and persisted in localStorage.
F15. Extraction service integration panel. Show the extraction-service (port 4005) health status and run test extractions against loaded Ollama models, bridging the dashboard to the LysnrAI pipeline.
F16. Auto-load preferred model. Mark a model as "auto-load" (stored in localStorage); when Ollama is online but no models are loaded, load the preferred model automatically.
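
F6's arithmetic is short enough to show in full. Ollama's final `done: true` chunk reports `eval_count` (generated tokens) and `eval_duration` in nanoseconds, so tokens/s is `eval_count / eval_duration × 10⁹`. In this sketch only the `tokensPerSec` field name comes from this review (N11); the other `StreamMetrics` fields and `computeMetrics` are hypothetical.

```typescript
// Subset of the final (done: true) chunk of Ollama's /api/generate stream.
// Ollama reports all durations in nanoseconds.
interface FinalChunk {
  done: boolean;
  eval_count?: number;
  eval_duration?: number;
  total_duration?: number;
}

interface StreamMetrics {
  tokensPerSec: number;
  totalTokens: number;
  latencyMs: number; // end-to-end time from total_duration, else eval_duration
}

function computeMetrics(chunk: FinalChunk): StreamMetrics | null {
  if (!chunk.done || !chunk.eval_count || !chunk.eval_duration) return null;
  return {
    tokensPerSec: chunk.eval_count / (chunk.eval_duration / 1e9),
    totalTokens: chunk.eval_count,
    latencyMs: (chunk.total_duration ?? chunk.eval_duration) / 1e6,
  };
}
```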

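F1's speed and ETA can be derived from consecutive progress events. Ollama's streaming `/api/pull` emits per-layer lines carrying `status`, `digest`, `total`, and `completed`, plus status-only lines such as `pulling manifest` and `success`. `PullProgressTracker` is a hypothetical name, and the speed is instantaneous (between two samples of the same layer).

```typescript
// Progress line from Ollama's streaming /api/pull; byte counts appear only on layer lines.
interface PullEvent {
  status: string;
  digest?: string;
  total?: number;
  completed?: number;
}

interface PullProgress {
  percent: number;
  bytesPerSec: number;
  etaSec: number | null; // null until two samples for the same layer exist
}

// Hypothetical tracker: remembers the previous sample to derive speed and ETA.
class PullProgressTracker {
  private last: { digest: string; completed: number; atMs: number } | null = null;

  update(ev: PullEvent, nowMs: number): PullProgress | null {
    if (!ev.digest || !ev.total) return null; // "pulling manifest", "success", etc.
    const completed = ev.completed ?? 0;
    let bytesPerSec = 0;
    if (this.last && this.last.digest === ev.digest && nowMs > this.last.atMs) {
      bytesPerSec = ((completed - this.last.completed) * 1000) / (nowMs - this.last.atMs);
    }
    this.last = { digest: ev.digest, completed, atMs: nowMs };
    return {
      percent: (completed * 100) / ev.total,
      bytesPerSec,
      etaSec: bytesPerSec > 0 ? (ev.total - completed) / bytesPerSec : null,
    };
  }
}
```

A moving average over the last few samples would give a steadier ETA; the instantaneous version keeps the sketch short.
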
4. Performance & Reliability

P1. No request deduplication on Refresh (`page.tsx:164-176`). Rapid clicks on Refresh fire duplicate `fetchAll()` calls. Add a `fetchingRef` guard or disable the button during the fetch (this is partially done for `actionLoading`, but not for `fetchAll`).
P2. Static cache never expires (`api/system/route.ts:81-90`). `staticCache` (chip, GPU, brew) lives forever in the server process, so brew package upgrades are never reflected. Add a 5-minute TTL.
P3. `du -sk ~/.ollama/models` on every refresh (`api/system/route.ts:41`). This traverses the entire models directory every 15 seconds. Cache the result with a 60-second TTL.
P4. No fetch timeout on Ollama calls (`api/ollama/route.ts:5-12`). `fetchOllama` has no `AbortSignal` or timeout, so if Ollama hangs, the dashboard hangs. Add a 5-second timeout.
P5. `system_profiler` slow on first load (`api/system/route.ts:52-53`). It takes ~2–3 seconds; the result is cached after the first call, but the first dashboard load still waits. Consider an eager background fetch on server start, or return a placeholder.
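
P2 and P3 share one pattern: a time-to-live cache around an expensive producer (`system_profiler`, `brew list`, `du -sk`). A minimal sketch, assuming the producers are already wrapped in promises; `cachedWithTtl` is a hypothetical helper, and sharing the in-flight promise also gives server-side deduplication in the spirit of P1.

```typescript
// Hypothetical helper: memoizes an async producer for `ttlMs` and shares one
// in-flight promise, so overlapping refreshes don't spawn duplicate work.
function cachedWithTtl<T>(
  produce: () => Promise<T>,
  ttlMs: number,
  now: () => number = Date.now, // injectable clock, mainly for tests
): () => Promise<T> {
  let cached: { value: T; expiresAt: number } | null = null;
  let inflight: Promise<T> | null = null;

  return () => {
    if (cached && now() < cached.expiresAt) return Promise.resolve(cached.value);
    if (!inflight) {
      inflight = produce().then(
        (value) => {
          cached = { value, expiresAt: now() + ttlMs };
          inflight = null;
          return value;
        },
        (err) => {
          inflight = null; // don't cache failures; the next call retries
          throw err;
        },
      );
    }
    return inflight;
  };
}
```

Wrapping the `du` call as `cachedWithTtl(runDu, 60000)` would give P3's 60-second TTL, and a 300000 ms TTL around the static probes would give P2's 5 minutes (`runDu` is a placeholder).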

5. Security & Hardening

S1. No input validation on model names (`api/ollama/route.ts:50-51`). `model` from the request body is passed directly to Ollama. Add regex validation: `^[a-zA-Z0-9._:/-]{1,256}$`.
S2. Shell command interpolation pattern (`api/system/route.ts:67`). ``execAsync(`brew list --versions ${pkg}`)`` is safe today (hardcoded targets) but fragile. Use `execFile('brew', ['list', '--versions', pkg])` for safety.
S3. No CORS or auth (acceptable for local-only use, documented). Any local process can call the API routes. This is fine for a dev tool; document the assumption.
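
S1's regex can be wrapped in a type guard so a route rejects bad input before calling Ollama. The pattern is the one proposed in S1; `isValidModelName` is a hypothetical name, and the commented handler lines assume `NextResponse` from `next/server`.

```typescript
// Allow-list from S1: letters, digits, and . _ : / - only, 1 to 256 characters.
const MODEL_NAME_RE = /^[a-zA-Z0-9._:/-]{1,256}$/;

// Type guard for untrusted request bodies.
function isValidModelName(name: unknown): name is string {
  return typeof name === "string" && MODEL_NAME_RE.test(name);
}

// In a route handler (sketch):
//   const { model } = await req.json();
//   if (!isValidModelName(model)) {
//     return NextResponse.json({ error: "invalid model name" }, { status: 400 });
//   }
```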

6. Implementation Tracker

Sprint 1 — Critical Bug Fixes (est. 1–2 hrs)

| # | ID | Task | Effort | Commit |
|---|----|------|--------|--------|
| 1 | - [x] B4 | Guard Escape key during streaming | 5 min | `2da67c2` |
| 2 | - [x] B5 | Pause auto-refresh during prompt/pull | 10 min | `2da67c2` |
| 3 | - [x] B9 | Add AbortController to streaming fetch | 15 min | `2da67c2` |
| 4 | - [x] B1 | Dynamic chip/RAM in header | 5 min | `2da67c2` |
| 5 | - [x] B11 | Clear prompt text on Escape close | 5 min | `2da67c2` |
| 6 | - [x] P4 | Add timeout to Ollama fetch calls | 10 min | `2da67c2` |
| 7 | - [x] B3 | Remove dead generate action (or document) | 5 min | `2da67c2` |
| 8 | - [x] B6 | Use Date.now() for toast IDs | 2 min | `2da67c2` |
| 9 | - [x] B10 | Fix brew "Loading..." vs "empty" state | 5 min | `2da67c2` |

Sprint 2 — Pull Progress + Metrics (est. 2–3 hrs)

| # | ID | Task | Effort | Commit |
|---|----|------|--------|--------|
| 10 | - [x] B2+F1 | Streaming pull with progress bar | 60 min | `2d9475b` |
| 11 | - [x] F6 | Display tokens/s after generation | 30 min | `2d9475b` |
| 12 | - [x] B7 | Parse vm_stat page size dynamically | 10 min | `2d9475b` |
| 13 | - [x] B8 | Multi-path whisper model discovery | 15 min | `2d9475b` |

Sprint 3 — Component Refactor (est. 2–3 hrs)

| # | ID | Task | Effort | Commit |
|---|----|------|--------|--------|
| 14 | - [x] CQ1 | Extract components into separate files | 90 min | `75a3cd0` |
| 15 | - [x] CQ4 | Add error.tsx Error Boundary | 15 min | `75a3cd0` |
| 16 | - [x] CQ3 | Shared ollama-config.ts | 10 min | `75a3cd0` |
| 17 | - [x] CQ2 | Consolidate inline styles → CSS classes | 45 min | `ed93a6f` |
| 18 | - [x] S1 | Add model name input validation | 10 min | `75a3cd0` |
| 19 | - [x] S2 | Replace exec → execFile for brew | 10 min | `75a3cd0` |

Sprint 4 — UX Enhancements (est. 3–4 hrs)

| # | ID | Task | Effort | Commit |
|---|----|------|--------|--------|
| 20 | - [x] F3 | Prompt history (localStorage) | 45 min | `9c2f5f3` |
| 21 | - [x] F9 | Modelfile viewer in expanded details | 30 min | `9c2f5f3` |
| 22 | - [x] F4 | Chat mode (multi-turn via /api/chat) | 90 min | `ed93a6f` |
| 23 | - [x] F2 | Model search/filter | 30 min | `9c2f5f3` |
| 24 | - [x] F11 | Keyboard shortcuts panel | 20 min | `9c2f5f3` |

Sprint 5 — Integration & Polish (est. 2–3 hrs)

| # | ID | Task | Effort | Commit |
|---|----|------|--------|--------|
| 25 | - [x] F15 | Extraction service panel | 60 min | `8bdd5ee` |
| 26 | - [x] F12 | Whisper transcription test | 45 min | `8bdd5ee` |
| 27 | - [x] F7 | System resource sparklines | 45 min | `8bdd5ee` |
| 28 | - [x] CQ5 | Loading skeleton UI | 20 min | `8bdd5ee` |
| 29 | - [x] P1-P3 | Request dedup + cache TTLs | 30 min | `b1fda3a` |
| 30 | - [x] F16 | Auto-load preferred model | 20 min | `ed93a6f` |

Deferred (nice-to-have)

| ID | Task | Notes |
|----|------|-------|
| - [x] F5 | Model comparison (side-by-side) | `8bdd5ee` |
| - [x] F10 | Dark/light theme toggle | `ed93a6f` |
| - [x] F13 | Responsive mobile layout | `8bdd5ee` |
| - [x] F14 | Model tags/labels | `ed93a6f` |
| - [x] CQ6 | Zod validation on API responses | `ed93a6f` |
| - [x] F8 | Ollama server logs viewer | `8bdd5ee` |
| - [x] S3 | CORS / auth (documented) | `8bdd5ee` |

7. Commit Log

Commits will be added here as work progresses.

| # | Date | Commit | Sprint | Items Completed |
|---|------|--------|--------|-----------------|
| 1 | Feb 19 | `2da67c2` | Sprint 1 | B1, B3, B4, B5, B6, B9, B10, B11, P4 |
| 2 | Feb 19 | `2d9475b` | Sprint 2 | B2, B7, B8, F1, F6 |
| 3 | Feb 19 | `75a3cd0` | Sprint 3 | CQ1, CQ3, CQ4, S1, S2 |
| 4 | Feb 19 | `9c2f5f3` | Sprint 4 | F2, F3, F9, F11 |
| 5 | Feb 19 | `b1fda3a` | Sprint 5 | P1, P2, P3 |
| 6 | Feb 19 | `ed93a6f` | Sprint 6 | CQ2, CQ6, P5, F4, F10, F14, F16 |
| 7 | Feb 19 | `8bdd5ee` | Sprint 7 | F5, F7, F8, F12, F13, F15, CQ5, S3 |

41 items total: 11 bugs, 6 code quality, 16 features, 5 performance, 3 security. All 41 items were completed across 7 sprints (9 code commits + doc updates), for an actual total effort of ~8 hours.

8. Next Wave — Model Intelligence & Pre-Load Metrics

Proposed improvements focused on helping users make informed decisions before loading a model.

Tier A — Pre-Load Decision Metrics (est. 45 min)

| ID | Feature | Description |
|----|---------|-------------|
| N1 | Estimated RAM per model | Approximate RAM from disk size (for Q4_K_M, RAM ≈ 1.2 × disk size). Show it on every model card (e.g., `~22 GB RAM`), not just on running models. |
| N2 | "Will it fit?" indicator | Compare the estimated RAM with `system.memory.free + cached`. Color-code: 🟢 Fits, 🟡 Tight (80–100%), 🔴 Won't fit. Show it on the Load button or as a badge. |
| N3 | Aggregate loaded model RAM | Sum the VRAM of all running models and display it at the top of the models panel: "3 models loaded · 28.5 GB VRAM". |
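
N1 and N2 reduce to a multiplier and two thresholds. The sketch applies N1's 1.2× Q4_K_M rule of thumb to every model (other quantizations, and long contexts that enlarge the KV cache, will differ) and treats anything below 80% of `free + cached` as a fit; the function names are hypothetical.

```typescript
// Rough rule of thumb from N1 (Q4_K_M): RAM ≈ 1.2 × disk size. Not a measured constant.
const RAM_MULTIPLIER = 1.2;

type Fit = "fits" | "tight" | "wont-fit";

function estimateRamBytes(diskBytes: number): number {
  return diskBytes * RAM_MULTIPLIER;
}

// N2: compare the estimate with free + cached memory.
// Below 80% fits, 80–100% is tight, above 100% won't fit.
function willItFit(diskBytes: number, freeBytes: number, cachedBytes: number): Fit {
  const available = freeBytes + cachedBytes;
  if (available <= 0) return "wont-fit";
  const ratio = estimateRamBytes(diskBytes) / available;
  if (ratio < 0.8) return "fits";
  if (ratio <= 1) return "tight";
  return "wont-fit";
}
```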

Tier B — Rich Model Metadata (est. 60 min)

| ID | Feature | Description |
|----|---------|-------------|
| N4 | RAM budget bar | Horizontal stacked bar: `[OS+Apps \| Model A (loaded) \| Model B (loaded) \| Free]`, giving an instant visual of memory headroom. |
| N5 | Context window size | Fetch `context_length` from Ollama `/api/show` → `model_info` and display it on the card (e.g., `128k ctx`). Critical for knowing the maximum prompt length. |
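
N4's bar needs each segment's share of total RAM. A sketch, assuming per-model bytes come from the `size_vram` field that Ollama's `/api/ps` reports for running models, and that memory neither free nor attributed to a model counts as OS+Apps. The running `modelBytes` sum is also N3's aggregate. Names are hypothetical.

```typescript
interface LoadedModel {
  name: string;
  sizeVram: number; // bytes, assumed to map from /api/ps `size_vram`
}

interface Segment {
  label: string;
  bytes: number;
  percent: number; // share of total RAM, for the segment's width
}

function ramBudgetSegments(totalBytes: number, freeBytes: number, loaded: LoadedModel[]): Segment[] {
  const modelBytes = loaded.reduce((sum, m) => sum + m.sizeVram, 0); // N3's aggregate
  // Whatever is neither free nor attributed to a model is shown as OS + other apps.
  const osBytes = Math.max(0, totalBytes - freeBytes - modelBytes);
  const raw = [
    { label: "OS+Apps", bytes: osBytes },
    ...loaded.map((m) => ({ label: m.name, bytes: m.sizeVram })),
    { label: "Free", bytes: freeBytes },
  ];
  return raw.map((s) => ({ ...s, percent: totalBytes > 0 ? (s.bytes * 100) / totalBytes : 0 }));
}
```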

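For N5, the context length sits under an architecture-prefixed key in `model_info` (for example `llama.context_length` next to `general.architecture: "llama"`). The sketch assumes that layout and falls back to any `*.context_length` key; `contextLength` and `formatContext` are hypothetical helpers.

```typescript
// Reads the context window from an /api/show `model_info` map.
function contextLength(modelInfo: Record<string, unknown>): number | null {
  const arch = modelInfo["general.architecture"];
  if (typeof arch === "string") {
    const v = modelInfo[`${arch}.context_length`];
    if (typeof v === "number") return v;
  }
  // Fallback: any key ending in ".context_length".
  for (const key of Object.keys(modelInfo)) {
    const v = modelInfo[key];
    if (key.endsWith(".context_length") && typeof v === "number") return v;
  }
  return null;
}

// 131072 -> "128k ctx"; values under 1024 are shown as-is.
function formatContext(tokens: number): string {
  return tokens >= 1024 ? `${Math.round(tokens / 1024)}k ctx` : `${tokens} ctx`;
}
```
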
Tier C — Model Intelligence Badges (est. 45 min)

| ID | Feature | Description |
|----|---------|-------------|
| N6 | `<think>` warning badge | If the model belongs to the DeepSeek R1 family, show a ⚠️ badge: "Emits `<think>` traces; strip before JSON.parse". Prevents silent JSON failures. |
| N7 | Vision model indicator | If the model is multimodal (llava, qwen2.5vl), show a 👁 badge. These models need image input, so text-only prompts are suboptimal. |
| N8 | Architecture badge | Show the model architecture (llama, qwen2, phi3, deepseek2) as a subtle pill on the card; it is currently buried in the expanded details. |
| N9 | Sort/order models | Dropdown to sort by name, size, parameters, running status, or last modified. The list currently uses Ollama's default order. |
| N10 | Ollama version display | Call `/api/version` and show the result in the Ollama status card. Useful for debugging model compatibility. |
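
N9's dropdown maps naturally to a comparator factory. `ModelRow` is a hypothetical flattened view of `/api/tags` entries (`name`, `size`, `modified_at`, `details.parameter_size`) joined with running status from `/api/ps`; parsing strings like `"8.0B"` is an assumption about the `parameter_size` format.

```typescript
type SortKey = "name" | "size" | "parameters" | "running" | "modified";

interface ModelRow {
  name: string;
  size: number; // bytes on disk
  modifiedAt: string; // ISO timestamp
  parameterSize?: string; // e.g. "8.0B" or "270M"
  running: boolean;
}

// Parses "8.0B" / "270M" style strings into a parameter count; unknown formats return -1.
function parseParams(s?: string): number {
  const m = s?.match(/^([\d.]+)\s*([KMBT])$/i);
  if (!m) return -1;
  const scale: Record<string, number> = { K: 1e3, M: 1e6, B: 1e9, T: 1e12 };
  return parseFloat(m[1]) * scale[m[2].toUpperCase()];
}

function compareModels(key: SortKey): (a: ModelRow, b: ModelRow) => number {
  switch (key) {
    case "name":
      return (a, b) => a.name.localeCompare(b.name);
    case "size":
      return (a, b) => b.size - a.size; // largest first
    case "parameters":
      return (a, b) => parseParams(b.parameterSize) - parseParams(a.parameterSize); // unknown last
    case "running":
      return (a, b) => Number(b.running) - Number(a.running) || a.name.localeCompare(b.name);
    case "modified":
      return (a, b) => Date.parse(b.modifiedAt) - Date.parse(a.modifiedAt); // newest first
  }
}
```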

Tier D — Runtime Metrics & UX (est. 30 min)

| ID | Feature | Description |
|----|---------|-------------|
| N11 | Last known tok/s per model | Persist `StreamMetrics.tokensPerSec` in localStorage keyed by model and show it on the card (e.g., `~45 tok/s`), so speeds can be compared without re-benchmarking. |
| N12 | Auto-unload countdown | Replace the static `Expires: 3:45 PM` with a live countdown such as `Unloads in 4m 32s`, which is more actionable. |
| N13 | Session stats per model | Track prompts sent and tokens generated per model during the session; show them in the expanded details. |
| N14 | Delete confirmation + reclaim | Show "Delete qwen2.5-coder:32b? Reclaim 18.5 GB disk." before deleting. There is currently no confirmation. |
| N15 | Simultaneous load suggestions | Based on available RAM, suggest which models can be co-loaded, e.g., "Can co-load llama3.1:8b + qwen2.5-coder:32b (28 GB, 20 GB free)". |
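
N12's countdown only needs the `expires_at` timestamp that `/api/ps` returns for each running model. A sketch with an injectable clock; in the UI it would be re-evaluated every second. `unloadCountdown` and its fallback strings are hypothetical.

```typescript
// Turns an ISO `expires_at` timestamp into "Unloads in 4m 32s" (or "2h 4m" past an hour).
function unloadCountdown(expiresAt: string, nowMs: number = Date.now()): string {
  const remainingMs = Date.parse(expiresAt) - nowMs;
  if (Number.isNaN(remainingMs)) return "Unload time unknown";
  if (remainingMs <= 0) return "Unloading now";
  const totalSec = Math.floor(remainingMs / 1000);
  const h = Math.floor(totalSec / 3600);
  const m = Math.floor((totalSec % 3600) / 60);
  const s = totalSec % 60;
  return h > 0 ? `Unloads in ${h}h ${m}m` : `Unloads in ${m}m ${s}s`;
}
```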

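N15 can start as a brute-force pair search over the N1 estimates. The sketch assumes the caller passes the budget it considers usable (for instance total RAM minus the OS+Apps share) and only suggests pairs; larger combinations would need a subset search. All names are hypothetical.

```typescript
interface Candidate {
  name: string;
  estRamBytes: number; // e.g. from the N1 estimate
}

interface Pairing {
  models: [string, string];
  combinedBytes: number;
  leftoverBytes: number; // budget remaining after loading both
}

// Enumerates every pair (O(n²), fine for a few dozen models) and keeps those that
// fit in `budgetBytes`, largest combined footprint first.
function coLoadPairs(candidates: Candidate[], budgetBytes: number): Pairing[] {
  const pairs: Pairing[] = [];
  for (let i = 0; i < candidates.length; i++) {
    for (let j = i + 1; j < candidates.length; j++) {
      const combined = candidates[i].estRamBytes + candidates[j].estRamBytes;
      if (combined <= budgetBytes) {
        pairs.push({
          models: [candidates[i].name, candidates[j].name],
          combinedBytes: combined,
          leftoverBytes: budgetBytes - combined,
        });
      }
    }
  }
  return pairs.sort((a, b) => b.combinedBytes - a.combinedBytes);
}
```

Rendering each result's `models`, `combinedBytes`, and `leftoverBytes` produces a message in the shape N15 proposes ("Can co-load A + B (28 GB, 20 GB free)").
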
Implementation Plan

| Sprint | Items | Focus | Effort |
|--------|-------|-------|--------|
| 8 | N1, N2, N3 | Pre-load RAM estimates | ~45 min |
| 9 | N4, N5 | RAM bar + context window | ~60 min |
| 10 | N6, N7, N8, N9, N10 | Badges + sort + version | ~45 min |
| 11 | N11, N12, N13, N14, N15 | Runtime metrics + UX | ~30 min |
