From 519f3485839ee2a189e4dc887b0845a85ae717c0 Mon Sep 17 00:00:00 2001
From: saravanakumardb1
Date: Thu, 19 Feb 2026 22:32:29 -0800
Subject: [PATCH] =?UTF-8?q?docs(local-llm):=20add=20Next=20Wave=20?=
 =?UTF-8?q?=E2=80=94=2015=20model=20intelligence=20improvements=20(N1?=
 =?UTF-8?q?=E2=80=93N15)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Section 8 of DASHBOARD_REVIEW.md: pre-load RAM estimates, will-it-fit
indicator, RAM budget bar, context window, architecture/vision/think
badges, sort, tok/s history, countdown, session stats, delete confirm,
co-load suggestions. Organized in 4 tiers with sprint plan.
---
 __LOCAL_LLMs/docs/DASHBOARD_REVIEW.md | 50 +++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/__LOCAL_LLMs/docs/DASHBOARD_REVIEW.md b/__LOCAL_LLMs/docs/DASHBOARD_REVIEW.md
index aa627b03..6948dae1 100644
--- a/__LOCAL_LLMs/docs/DASHBOARD_REVIEW.md
+++ b/__LOCAL_LLMs/docs/DASHBOARD_REVIEW.md
@@ -258,3 +258,53 @@ _Commits will be added here as work progresses._
 > **39 items total:** 11 bugs, 6 code quality, 16 features, 5 performance, 3 security
 > **All 39 items completed** across 7 sprints (9 code commits + doc updates)
 > **Actual total effort:** ~8 hours across 7 sprints
+
+---
+
+## 8. Next Wave — Model Intelligence & Pre-Load Metrics
+
+> Proposed improvements focused on helping users make informed decisions **before** loading a model.
+
+### Tier A — Pre-Load Decision Metrics _(est. 45 min)_
+
+| ID | Feature | Description |
+| --- | --- | --- |
+| N1 | **Estimated RAM per model** | Approximate from disk size: Q4_K_M ≈ 1.2×disk in RAM. Show on every model card (e.g., `~22 GB RAM`), not just running models. |
+| N2 | **"Will it fit?" indicator** | Compare estimated RAM vs `system.memory.free + cached`. Color-code: 🟢 Fits, 🟡 Tight (80–100%), 🔴 Won't fit. Show on Load button or as badge. |
+| N3 | **Aggregate loaded model RAM** | Sum VRAM of all running models. Display at top of models panel: "3 models loaded · 28.5 GB VRAM". |
+
+### Tier B — Rich Model Metadata _(est. 60 min)_
+
+| ID | Feature | Description |
+| --- | --- | --- |
+| N4 | **RAM budget bar** | Horizontal stacked bar: `[OS+Apps \| Model A (loaded) \| Model B (loaded) \| Free]`. Instant visual of memory headroom. |
+| N5 | **Context window size** | Fetch `context_length` from Ollama `/api/show` → `model_info`. Display on card (e.g., `128k ctx`). Critical for knowing max prompt length. |
+
+### Tier C — Model Intelligence Badges _(est. 45 min)_
+
+| ID | Feature | Description |
+| --- | --- | --- |
+| N6 | **`<think>` warning badge** | If model is DeepSeek R1 family, show ⚠️ badge: "Emits `<think>` traces — strip before JSON.parse". Prevents silent JSON failures. |
+| N7 | **Vision model indicator** | If model is multimodal (llava, qwen2.5vl), show 👁 badge. These need image input — text-only prompts are suboptimal. |
+| N8 | **Architecture badge** | Show model arch (llama, qwen2, phi3, deepseek2) as subtle pill on the card. Currently buried in expanded details. |
+| N9 | **Sort/order models** | Dropdown to sort by: name, size, parameters, running status, last modified. Currently uses Ollama's default order. |
+| N10 | **Ollama version display** | Call `/api/version`. Show in Ollama status card. Useful for debugging model compatibility. |
+
+### Tier D — Runtime Metrics & UX _(est. 30 min)_
+
+| ID | Feature | Description |
+| --- | --- | --- |
+| N11 | **Last known tok/s per model** | Persist `StreamMetrics.tokensPerSec` in localStorage keyed by model. Show on card (e.g., `~45 tok/s`). Compare speeds without re-benchmarking. |
+| N12 | **Auto-unload countdown** | Replace static `Expires: 3:45 PM` with live countdown: `Unloads in 4m 32s`. More actionable. |
+| N13 | **Session stats per model** | Track prompts sent + tokens generated per model in session. Show in expanded details. |
+| N14 | **Delete confirmation + reclaim** | Show "Delete qwen2.5-coder:32b? Reclaim 18.5 GB disk." before deleting. Currently no confirmation. |
+| N15 | **Simultaneous load suggestions** | Based on available RAM, suggest which models can be co-loaded. E.g., "Can co-load llama3.1:8b + qwen2.5-coder:32b (28 GB, 20 GB free)". |
+
+### Implementation Plan
+
+| Sprint | Items | Focus | Effort |
+| --- | --- | --- | --- |
+| 8 | N1, N2, N3 | Pre-load RAM estimates | ~45 min |
+| 9 | N4, N5 | RAM bar + context window | ~60 min |
+| 10 | N6, N7, N8, N9, N10 | Badges + sort + version | ~45 min |
+| 11 | N11, N12, N13, N14, N15 | Runtime metrics + UX | ~30 min |
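
Review note on N1 and N2: the estimate and the fit check reduce to a multiplication and a ratio, so a small sketch may help pin down the thresholds before Sprint 8. Everything below is hypothetical and not taken from the dashboard code: the names `estimateRamBytes`, `classifyFit`, `formatRamLabel`, and `FitStatus` are invented for illustration, GB is assumed to mean 10^9 bytes, and the 1.2 factor is applied to every model even though N1 only states it for Q4_K_M. It also assumes "Tight (80–100%)" means the estimate uses 80 to 100 percent of free plus cached memory.

```typescript
// Hypothetical sketch of N1 (RAM estimate) and N2 ("Will it fit?").
// None of these names exist in the dashboard; they are illustrative only.

type FitStatus = "fits" | "tight" | "wont-fit";

// N1 states the 1.2x rule of thumb for Q4_K_M; other quantizations may need another factor.
const Q4_K_M_RAM_FACTOR = 1.2;

// Estimate resident RAM from the model's size on disk.
function estimateRamBytes(diskBytes: number, factor: number = Q4_K_M_RAM_FACTOR): number {
  return diskBytes * factor;
}

// Compare the estimate against free + cached memory (the quantity N2 names).
function classifyFit(estimatedBytes: number, freeBytes: number, cachedBytes: number): FitStatus {
  const available = freeBytes + cachedBytes;
  if (available <= 0) return "wont-fit";
  const usage = estimatedBytes / available;
  if (usage > 1) return "wont-fit"; // estimate exceeds available memory
  if (usage >= 0.8) return "tight"; // assumed reading of "Tight (80–100%)"
  return "fits";
}

// Card label in the style of the N1 example; assumes 1 GB = 1e9 bytes.
function formatRamLabel(bytes: number): string {
  return `~${Math.round(bytes / 1e9)} GB RAM`;
}
```

Under these assumptions, the 18.5 GB model from the N14 example would be labelled `~22 GB RAM`, which matches the example label in N1. Whether cached memory should count in full toward the available figure is a design choice the proposal leaves open.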