docs(local-llm): add Next Wave — 15 model intelligence improvements (N1–N15)

Section 8 of DASHBOARD_REVIEW.md: pre-load RAM estimates, will-it-fit indicator, RAM budget bar, context window, architecture/vision/think badges, sort, tok/s history, countdown, session stats, delete confirm, co-load suggestions. Organized in 4 tiers with sprint plan.
2026-02-19 22:32:29 -08:00 · 2026-02-19 22:32:29 -08:00 · 519f348583
commit 519f348583
parent 4090c8aa13
1 changed file with 50 additions and 0 deletions
--- a/__LOCAL_LLMs/docs/DASHBOARD_REVIEW.md
+++ b/__LOCAL_LLMs/docs/DASHBOARD_REVIEW.md
@@ -258,3 +258,53 @@ _Commits will be added here as work progresses._
 > **39 items total:** 11 bugs, 6 code quality, 16 features, 5 performance, 3 security
 > **All 39 items completed** across 7 sprints (9 code commits + doc updates)
 > **Actual total effort:** ~8 hours across 7 sprints
+
+---
+
+## 8. Next Wave — Model Intelligence & Pre-Load Metrics
+
+> Proposed improvements focused on helping users make informed decisions **before** loading a model.
+
+### Tier A — Pre-Load Decision Metrics _(est. 45 min)_
+
+| ID  | Feature                        | Description                                                                                                                                     |
+| --- | ------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------- |
+| N1  | **Estimated RAM per model**    | Approximate from disk size: Q4_K_M ≈ 1.2×disk in RAM. Show on every model card (e.g., `~22 GB RAM`), not just running models.                   |
+| N2  | **"Will it fit?" indicator**   | Compare estimated RAM vs `system.memory.free + cached`. Color-code: 🟢 Fits, 🟡 Tight (80–100%), 🔴 Won't fit. Show on Load button or as badge. |
+| N3  | **Aggregate loaded model RAM** | Sum VRAM of all running models. Display at top of models panel: "3 models loaded · 28.5 GB VRAM".                                               |
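Review note on N1 and N2: the arithmetic behind the estimate and the color code can be sketched as below. This is only an illustration under stated assumptions: the 1.2× multiplier is the Q4_K_M heuristic from the N1 row, "Tight" is read as 80–100% of `free + cached`, and every name here (`estimateRamBytes`, `fitStatus`, `formatRam`) is hypothetical rather than taken from the dashboard code.

```typescript
// Hypothetical helpers for N1/N2; none of these names exist in the dashboard yet.
type FitStatus = "fits" | "tight" | "wont-fit";

const GB = 1024 ** 3;

// N1 heuristic from the table: Q4_K_M needs roughly 1.2x its disk size in RAM.
// Other quantizations may need a different ratio (assumption: one ratio for all).
const RAM_PER_DISK_RATIO = 1.2;

function estimateRamBytes(diskBytes: number): number {
  return diskBytes * RAM_PER_DISK_RATIO;
}

// N2: compare the estimate against free + cached memory.
// Assumption: "Tight" means the model would use 80-100% of what is available.
function fitStatus(
  estimatedRamBytes: number,
  freeBytes: number,
  cachedBytes: number,
): FitStatus {
  const available = freeBytes + cachedBytes;
  if (available <= 0) return "wont-fit";
  const usage = estimatedRamBytes / available;
  if (usage > 1) return "wont-fit";
  if (usage >= 0.8) return "tight";
  return "fits";
}

// Card label in the style of the N1 example (`~22 GB RAM`).
function formatRam(bytes: number): string {
  return `~${Math.round(bytes / GB)} GB RAM`;
}
```

With the 18.5 GB disk size quoted in N14, `formatRam(estimateRamBytes(18.5 * GB))` yields the `~22 GB RAM` label used as the N1 example.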
+
+### Tier B — Rich Model Metadata _(est. 60 min)_
+
+| ID  | Feature                 | Description                                                                                                                                |
+| --- | ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
+| N4  | **RAM budget bar**      | Horizontal stacked bar: `[OS+Apps \| Model A (loaded) \| Model B (loaded) \| Free]`. Instant visual of memory headroom.                   |
+| N5  | **Context window size** | Fetch `context_length` from Ollama `/api/show` → `model_info`. Display on card (e.g., `128k ctx`). Critical for knowing max prompt length. |
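Review note on N5: a minimal sketch of pulling the context length out of a `model_info` object and formatting it for the card. It assumes `model_info` is a flat key/value map in which the value sits under an architecture-prefixed key such as `llama.context_length`, with the architecture named in `general.architecture`. That shape should be checked against the installed Ollama version. The function names are hypothetical.

```typescript
// Assumed shape: flat map of metadata keys to values, as returned under `model_info`.
type ModelInfo = Record<string, unknown>;

// Look for "<arch>.context_length" first, then fall back to any key with that suffix.
function contextLength(modelInfo: ModelInfo): number | undefined {
  const arch = modelInfo["general.architecture"];
  const candidates: string[] =
    typeof arch === "string" ? [`${arch}.context_length`] : [];
  for (const key of Object.keys(modelInfo)) {
    if (key.endsWith(".context_length")) candidates.push(key);
  }
  for (const key of candidates) {
    const value = modelInfo[key];
    if (typeof value === "number" && Number.isFinite(value) && value > 0) {
      return value;
    }
  }
  return undefined; // field missing or malformed: the card should omit the badge
}

// Card label in the style of the N5 example (`128k ctx`), using 1k = 1024 tokens.
function formatContext(tokens: number): string {
  return tokens >= 1024 ? `${Math.round(tokens / 1024)}k ctx` : `${tokens} ctx`;
}
```

Returning `undefined` rather than a default keeps the card from showing a guessed limit when the metadata is absent.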
+
+### Tier C — Model Intelligence Badges _(est. 45 min)_
+
+| ID  | Feature                     | Description                                                                                                                       |
+| --- | --------------------------- | --------------------------------------------------------------------------------------------------------------------------------- |
+| N6  | **`<think>` warning badge** | If model is DeepSeek R1 family, show ⚠️ badge: "Emits `<think>` traces — strip before JSON.parse". Prevents silent JSON failures. |
+| N7  | **Vision model indicator**  | If model is multimodal (llava, qwen2.5vl), show 👁 badge. These need image input — text-only prompts are suboptimal.              |
+| N8  | **Architecture badge**      | Show model arch (llama, qwen2, phi3, deepseek2) as subtle pill on the card. Currently buried in expanded details.                 |
+| N9  | **Sort/order models**       | Dropdown to sort by: name, size, parameters, running status, last modified. Currently uses Ollama's default order.                |
+| N10 | **Ollama version display**  | Call `/api/version`. Show in Ollama status card. Useful for debugging model compatibility.                                        |
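Review note on N6: the failure the badge warns about, and the stripping step it recommends, might look like the sketch below. Detecting the R1 family by matching `deepseek-r1` in the model name is an assumption made for illustration, as are the function names. The sketch handles only closed `<think>…</think>` blocks, so a trace cut off mid-stream would still break parsing.

```typescript
// Assumption: R1-family models can be recognized from the model name alone.
function emitsThinkTraces(modelName: string): boolean {
  return /deepseek-r1/i.test(modelName);
}

// Remove every closed <think>...</think> block, including multi-line ones.
function stripThinkTraces(raw: string): string {
  return raw.replace(/<think>[\s\S]*?<\/think>/g, "").trim();
}

// Parse model output as JSON after removing reasoning traces.
// Throws SyntaxError (from JSON.parse) if the remainder is not valid JSON.
function parseModelJson(raw: string): unknown {
  return JSON.parse(stripThinkTraces(raw));
}
```

Calling `JSON.parse` directly on output that begins with `<think>` throws a `SyntaxError`, which is the failure N6 is meant to surface before it happens.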
+
+### Tier D — Runtime Metrics & UX _(est. 30 min)_
+
+| ID  | Feature                           | Description                                                                                                                                    |
+| --- | --------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
+| N11 | **Last known tok/s per model**    | Persist `StreamMetrics.tokensPerSec` in localStorage keyed by model. Show on card (e.g., `~45 tok/s`). Compare speeds without re-benchmarking. |
+| N12 | **Auto-unload countdown**         | Replace static `Expires: 3:45 PM` with live countdown: `Unloads in 4m 32s`. More actionable.                                                   |
+| N13 | **Session stats per model**       | Track prompts sent + tokens generated per model in session. Show in expanded details.                                                          |
+| N14 | **Delete confirmation + reclaim** | Show "Delete qwen2.5-coder:32b? Reclaim 18.5 GB disk." before deleting. Currently no confirmation.                                             |
+| N15 | **Simultaneous load suggestions** | Based on available RAM, suggest which models can be co-loaded. E.g., "Can co-load llama3.1:8b + qwen2.5-coder:32b (28 GB, 20 GB free)".        |
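Review note on N12: a sketch of turning an expiry timestamp into the live label from the table. It assumes the dashboard already holds the expiry as a `Date` (for example, parsed from the value currently rendered as `Expires: 3:45 PM`). The function name and the wording for the hour and zero cases are hypothetical.

```typescript
// Format the time left until auto-unload, in the style of `Unloads in 4m 32s`.
// `now` is injectable so the function stays pure and easy to test.
function formatUnloadCountdown(expiresAt: Date, now: Date = new Date()): string {
  const remainingSec = Math.max(
    0,
    Math.floor((expiresAt.getTime() - now.getTime()) / 1000),
  );
  if (remainingSec === 0) return "Unloading now";

  const hours = Math.floor(remainingSec / 3600);
  const minutes = Math.floor((remainingSec % 3600) / 60);
  const seconds = remainingSec % 60;

  if (hours > 0) return `Unloads in ${hours}h ${minutes}m`;
  if (minutes > 0) return `Unloads in ${minutes}m ${seconds}s`;
  return `Unloads in ${seconds}s`;
}
```

A UI would typically call this once per second from a timer and clear that timer when the card unmounts or the countdown reaches zero.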
+
+### Implementation Plan
+
+| Sprint | Items                   | Focus                    | Effort  |
+| ------ | ----------------------- | ------------------------ | ------- |
+| 8      | N1, N2, N3              | Pre-load RAM estimates   | ~45 min |
+| 9      | N4, N5                  | RAM bar + context window | ~60 min |
+| 10     | N6, N7, N8, N9, N10     | Badges + sort + version  | ~45 min |
+| 11     | N11, N12, N13, N14, N15 | Runtime metrics + UX     | ~30 min |