docs(local-llm): add Next Wave — 15 model intelligence improvements (N1–N15)

Section 8 of DASHBOARD_REVIEW.md: pre-load RAM estimates, will-it-fit
indicator, RAM budget bar, context window, architecture/vision/think
badges, sort, tok/s history, countdown, session stats, delete confirm,
co-load suggestions. Organized in 4 tiers with sprint plan.
saravanakumardb1 2026-02-19 22:32:29 -08:00
parent 4090c8aa13
commit 519f348583


@@ -258,3 +258,53 @@ _Commits will be added here as work progresses._
> **39 items total:** 11 bugs, 6 code quality, 16 features, 5 performance, 3 security
> **All 39 items completed** across 7 sprints (9 code commits + doc updates)
> **Actual total effort:** ~8 hours across 7 sprints
---
## 8. Next Wave — Model Intelligence & Pre-Load Metrics
> Proposed improvements focused on helping users make informed decisions **before** loading a model.
### Tier A — Pre-Load Decision Metrics _(est. 45 min)_
| ID | Feature | Description |
| --- | ------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| N1 | **Estimated RAM per model** | Approximate from disk size: Q4_K_M ≈ 1.2×disk in RAM. Show on every model card (e.g., `~22 GB RAM`), not just running models. |
| N2 | **"Will it fit?" indicator** | Compare estimated RAM vs `system.memory.free + cached`. Color-code: 🟢 Fits, 🟡 Tight (80–100%), 🔴 Won't fit. Show on Load button or as badge. |
| N3 | **Aggregate loaded model RAM** | Sum VRAM of all running models. Display at top of models panel: "3 models loaded · 28.5 GB VRAM". |
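
A minimal sketch of how N1 and N2 could be computed, assuming the 1.2× multiplier from the table, decimal gigabytes (1 GB = 10^9 bytes), and that "Tight (80–100%)" means the estimate uses 80–100% of available memory. The names `estimateRamBytes`, `fitStatus`, and `formatRam` are hypothetical, not existing dashboard functions.

```typescript
// Assumed multiplier from N1: a Q4_K_M model needs roughly 1.2x its disk size in RAM.
const RAM_FACTOR_Q4_K_M = 1.2;

type FitStatus = "fits" | "tight" | "wont-fit";

/** Estimate resident RAM (bytes) from the on-disk model size (bytes). */
function estimateRamBytes(diskBytes: number, factor: number = RAM_FACTOR_Q4_K_M): number {
  return diskBytes * factor;
}

/**
 * Classify per N2 (assumed thresholds): below 80% of available memory fits,
 * 80-100% is tight, above 100% will not fit.
 */
function fitStatus(estimatedBytes: number, availableBytes: number): FitStatus {
  if (availableBytes <= 0) return "wont-fit";
  const ratio = estimatedBytes / availableBytes;
  if (ratio > 1) return "wont-fit";
  if (ratio >= 0.8) return "tight";
  return "fits";
}

/** Card label, e.g. "~22 GB RAM" (decimal GB is an assumption). */
function formatRam(bytes: number): string {
  return `~${(bytes / 1e9).toFixed(0)} GB RAM`;
}
```

Here `availableBytes` would be `system.memory.free + cached`, so an 18.5 GB model on disk is labeled `~22 GB RAM` and rated against whatever is currently free.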
### Tier B — Rich Model Metadata _(est. 60 min)_
| ID | Feature | Description |
| --- | ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ | ---------------- | ---------------- | ------------------------------------------ |
| N4 | **RAM budget bar** | Horizontal stacked bar: `[OS+Apps | Model A (loaded) | Model B (loaded) | Free]`. Instant visual of memory headroom. |
| N5 | **Context window size** | Fetch `context_length` from the `model_info` object returned by Ollama's `/api/show`. Display on card (e.g., `128k ctx`). Critical for knowing max prompt length. |
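
For N5, one possible way to pull the context length out of a `/api/show` response and format it for the card. This sketch assumes `model_info` is a flat object in which the value sits under an architecture-prefixed key such as `llama.context_length`; the helper names `contextLength` and `formatContext` are hypothetical.

```typescript
// Assumed shape: flat key/value map, e.g. { "llama.context_length": 131072 }.
type ModelInfo = { [key: string]: unknown };

/** Return the first numeric "<arch>.context_length" value, or null if absent. */
function contextLength(modelInfo: ModelInfo): number | null {
  for (const key of Object.keys(modelInfo)) {
    const value = modelInfo[key];
    if (/\.context_length$/.test(key) && typeof value === "number") {
      return value;
    }
  }
  return null;
}

/** Format a token count for the card, e.g. 131072 -> "128k ctx". */
function formatContext(tokens: number): string {
  return tokens >= 1024 ? `${Math.round(tokens / 1024)}k ctx` : `${tokens} ctx`;
}
```

Matching on the key suffix rather than a fixed key avoids hard-coding one architecture name per model family.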
### Tier C — Model Intelligence Badges _(est. 45 min)_
| ID | Feature | Description |
| --- | --------------------------- | --------------------------------------------------------------------------------------------------------------------------------- |
| N6 | **`<think>` warning badge** | If model is DeepSeek R1 family, show ⚠️ badge: "Emits `<think>` traces — strip before JSON.parse". Prevents silent JSON failures. |
| N7 | **Vision model indicator** | If model is multimodal (llava, qwen2.5vl), show 👁 badge. These need image input — text-only prompts are suboptimal. |
| N8 | **Architecture badge** | Show model arch (llama, qwen2, phi3, deepseek2) as subtle pill on the card. Currently buried in expanded details. |
| N9 | **Sort/order models** | Dropdown to sort by: name, size, parameters, running status, last modified. Currently uses Ollama's default order. |
| N10 | **Ollama version display** | Call `/api/version`. Show in Ollama status card. Useful for debugging model compatibility. |
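
The failure that N6 warns about can be avoided by removing `<think>` blocks before parsing. A minimal sketch, assuming the traces arrive as well-formed `<think>...</think>` pairs in the response text; `stripThinkTraces` and `parseModelJson` are hypothetical helper names.

```typescript
/** Remove every <think>...</think> block (non-greedy, spans newlines). */
function stripThinkTraces(raw: string): string {
  return raw.replace(/<think>[\s\S]*?<\/think>/g, "").trim();
}

/** Parse model output as JSON after stripping reasoning traces. */
function parseModelJson(raw: string): unknown {
  return JSON.parse(stripThinkTraces(raw));
}
```

An unclosed `<think>` tag (for example, from a truncated stream) is not handled here and would still make `JSON.parse` throw, so callers may want to wrap the call in `try`/`catch`.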
### Tier D — Runtime Metrics & UX _(est. 30 min)_
| ID | Feature | Description |
| --- | --------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| N11 | **Last known tok/s per model** | Persist `StreamMetrics.tokensPerSec` in localStorage keyed by model. Show on card (e.g., `~45 tok/s`). Compare speeds without re-benchmarking. |
| N12 | **Auto-unload countdown** | Replace static `Expires: 3:45 PM` with live countdown: `Unloads in 4m 32s`. More actionable. |
| N13 | **Session stats per model** | Track prompts sent + tokens generated per model in session. Show in expanded details. |
| N14 | **Delete confirmation + reclaim** | Show "Delete qwen2.5-coder:32b? Reclaim 18.5 GB disk." before deleting. Currently no confirmation. |
| N15 | **Simultaneous load suggestions** | Based on available RAM, suggest which models can be co-loaded. E.g., "Can co-load llama3.1:8b + qwen2.5-coder:32b (28 GB, 20 GB free)". |
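
The live countdown in N12 reduces to formatting the time remaining until the model's expiry. A sketch under the assumption that the expiry is available as a millisecond timestamp (for instance, parsed from the value that currently feeds `Expires: 3:45 PM`); `formatUnloadCountdown` is a hypothetical name.

```typescript
/** Format time until unload, e.g. "Unloads in 4m 32s". */
function formatUnloadCountdown(expiresAtMs: number, nowMs: number): string {
  // Clamp at zero so a stale timestamp never yields a negative countdown.
  const remainingSec = Math.max(0, Math.floor((expiresAtMs - nowMs) / 1000));
  if (remainingSec === 0) return "Unloading now";
  const minutes = Math.floor(remainingSec / 60);
  const seconds = remainingSec % 60;
  return minutes > 0 ? `Unloads in ${minutes}m ${seconds}s` : `Unloads in ${seconds}s`;
}
```

Passing `nowMs` explicitly keeps the function pure; the UI could call it once per second with `Date.now()` from a timer.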
### Implementation Plan
| Sprint | Items | Focus | Effort |
| ------ | ----------------------- | ------------------------ | ------- |
| 8 | N1, N2, N3 | Pre-load RAM estimates | ~45 min |
| 9 | N4, N5 | RAM bar + context window | ~60 min |
| 10 | N6, N7, N8, N9, N10 | Badges + sort + version | ~45 min |
| 11 | N11, N12, N13, N14, N15 | Runtime metrics + UX | ~30 min |