diff --git a/__LOCAL_LLMs/dashboard/docs/DASHBOARD_ROADMAP.md b/__LOCAL_LLMs/dashboard/docs/DASHBOARD_ROADMAP.md
index 69981a28..6955d6db 100644
--- a/__LOCAL_LLMs/dashboard/docs/DASHBOARD_ROADMAP.md
+++ b/__LOCAL_LLMs/dashboard/docs/DASHBOARD_ROADMAP.md
@@ -15,30 +15,76 @@ Transform the dashboard from a model management tool into a **model intelligence

 ---

+## Bugs Found During Review
+
+Issues discovered by cross-referencing the roadmap against the actual codebase (`page.tsx` ~1,885 lines):
+
+- [ ] **BN1. Compare buttons show unloaded models** — `page.tsx:1809`
+  `ollama.models.filter(m => m.name !== promptModel)` shows ALL installed models for comparison, but unloaded models can't generate responses. Should filter to `ollama.running` only, or show a "load first" indicator.
+
+- [ ] **BN2. No AbortController on compare stream** — `page.tsx:236-274`
+  `handleCompare` fetches `/api/ollama/stream` but doesn't use an abort controller. Closing the prompt modal during comparison doesn't cancel the stream — it continues in the background wasting resources.
+
+- [ ] **BN3. Chat messages lost on modal close** — `page.tsx:1524-1530`
+  Closing the prompt modal clears `promptResponse` and `promptText` but does not persist `chatMessages`. Re-opening the modal starts a fresh conversation. Multi-turn history is discarded.
+
+- [ ] **BN4. Logs panel has no refresh** — `page.tsx:1476-1517`
+  The Ollama logs panel fetches once on open (`fetchLogs` on toggle). There's no refresh button — the only way to see new logs is to close and re-open the panel.
+
+- [ ] **BN5. Delete confirmation doesn't show reclaim size** — `page.tsx:1121-1153`
+  Delete confirmation exists (two-step flow via `deleteConfirm` state) but only shows "Delete this model?" without the disk reclaim amount. N14 in the roadmap was marked as new, but the dialog already exists — it just needs `formatBytes(model.size)` added.
+
+---
+
 ## Phase 1 — Pre-Load Intelligence _(Sprint 8)_

 **Goal:** Give users the information they need **before** clicking "Load" on a model.

-**Estimated effort:** ~45 minutes
+**Estimated effort:** ~60 minutes

-| # | ID | Task | Status | Priority | Notes |
-| --- | --- | -------------------------- | ------ | -------- | ----------------------------------------------------------- |
-| 1 | N1 | Estimated RAM per model | [ ] | High | Q4_K_M ≈ 1.2× disk size. Show `~22 GB RAM` on every card. |
-| 2 | N2 | "Will it fit?" indicator | [ ] | High | 🟢 Fits / 🟡 Tight / 🔴 Won't fit based on free+cached RAM. |
-| 3 | N3 | Aggregate loaded model RAM | [ ] | High | Sum VRAM at top of panel: "2 loaded · 28.5 GB VRAM". |
+| # | ID | Task | Status | Priority | Notes |
+| --- | --- | ---------------------------------- | ------ | -------- | ------------------------------------------------------------ |
+| 1 | N1 | Estimated RAM per model | - [ ] | High | Show `~22 GB RAM` on every model card, not just running ones |
+| 2 | N2 | "Will it fit?" indicator | - [ ] | High | 🟢 Fits / 🟡 Tight / 🔴 Won't fit, on Load button |
+| 3 | N3 | Aggregate loaded model RAM | - [ ] | High | Sum VRAM at top of panel: "2 loaded · 28.5 GB VRAM" |
+| 4 | BN1 | Fix compare to show loaded only | - [ ] | High | Filter compare buttons to `ollama.running` models |
+| 5 | BN2 | Add AbortController to compare | - [ ] | High | Cancel compare stream on modal close |
+| 6 | BN5 | Show reclaim size in delete dialog | - [ ] | Medium | Add `formatBytes(model.size)` to existing confirmation |

 **Implementation details:**

-- **N1:** Add `estimateRam(diskSize: number)` to `lib/format.ts`. Returns `diskSize * 1.2`. Display below existing size/params/quant line on each model card.
-- **N2:** Compare `estimateRam(model.size)` against `system.memory.free + system.memory.cached`. Pass `system` into the model list rendering. Add colored dot or badge next to Load button.
-- **N3:** Compute `ollama.running.reduce((sum, r) => sum + r.size_vram, 0)` and display in the models panel header next to "X active".
+- **N1:** Add `estimateRam(diskSize: number, quant?: string)` to `lib/format.ts`. Use quantization-aware multipliers:
+  - `Q4_K_M` / `Q4_K_S` / `Q4_0`: 1.2×
+  - `Q5_K_M` / `Q5_K_S`: 1.25×
+  - `Q8_0`: 1.1×
+  - `F16` / `F32`: 1.05×
+  - Default (unknown): 1.2×
+  Display below existing size/params/quant line on each model card.
+  **Note:** Apple Silicon uses unified memory — GPU and CPU share the same pool. Add a tooltip explaining this.
+
+- **N2:** Compare `estimateRam(model.size, model.details?.quantization_level)` against `system.memory.free + system.memory.cached`. Note: cached memory is reclaimable by the OS but not guaranteed available. Use conservative estimate: `free + (cached * 0.5)` as effective available.
+  - 🟢 Green: estimated < 70% of available → "Fits comfortably"
+  - 🟡 Yellow: estimated is 70–100% of available → "Tight — may swap"
+  - 🔴 Red: estimated > available → "Won't fit — will swap heavily"
+  Add as colored dot next to Load button with tooltip.
+
+- **N3:** Compute `ollama.running.reduce((sum, r) => sum + r.size_vram, 0)` and display in the models panel header: "X active · Y GB VRAM".
+
+- **BN1:** Change `ollama.models.filter(m => m.name !== promptModel)` to `ollama.running.filter(r => r.name !== promptModel)` for the compare buttons. If no other models are loaded, show "Load another model to compare".
+
+- **BN2:** Create a `compareAbortRef` similar to `abortRef`. Call `compareAbortRef.current?.abort()` when modal closes.
+
+- **BN5:** Change "Delete this model?" to `"Delete ${model.name}? Reclaim ${formatBytes(model.size)}"`.
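
The N1 and N2 bullets above could be sketched roughly as follows. The multipliers and the 70% / 100% thresholds are taken from the roadmap text; the names `RAM_MULTIPLIERS`, `effectiveAvailable`, `fitLevel`, and `FitLevel` are hypothetical and are not known to exist in `lib/format.ts`.

```typescript
// Quantization-aware multipliers from the N1 bullet (hypothetical table name).
const RAM_MULTIPLIERS = new Map<string, number>([
  ["Q4_K_M", 1.2],
  ["Q4_K_S", 1.2],
  ["Q4_0", 1.2],
  ["Q5_K_M", 1.25],
  ["Q5_K_S", 1.25],
  ["Q8_0", 1.1],
  ["F16", 1.05],
  ["F32", 1.05],
]);

// Used when the quantization level is missing or not in the table.
const DEFAULT_MULTIPLIER = 1.2;

// Estimate RAM from on-disk size, in the same unit as `diskSize`.
function estimateRam(diskSize: number, quant?: string): number {
  const known =
    quant === undefined ? undefined : RAM_MULTIPLIERS.get(quant.toUpperCase());
  return diskSize * (known === undefined ? DEFAULT_MULTIPLIER : known);
}

// Conservative availability from N2: count only half of the cached memory.
function effectiveAvailable(free: number, cached: number): number {
  return free + cached * 0.5;
}

type FitLevel = "fits" | "tight" | "wont-fit";

// Map an estimate onto the green / yellow / red indicator from N2.
function fitLevel(estimated: number, available: number): FitLevel {
  if (estimated > available) return "wont-fit"; // red
  if (estimated >= available * 0.7) return "tight"; // yellow, 70-100%
  return "fits"; // green, below 70%
}
```

Under these assumptions, an 18 GB `Q4_K_M` model is estimated at about 21.6 GB. With 16 GB free and 8 GB cached, the effective available memory is 20 GB, so the indicator would be red.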
 **Acceptance criteria:**

-- Every model card shows estimated RAM requirement
-- Load button has a color-coded fit indicator
-- Panel header shows total VRAM of loaded models
-- TypeScript compiles cleanly
+- [ ] Every model card shows estimated RAM requirement with quant-aware multiplier
+- [ ] Load button has a color-coded fit indicator with tooltip
+- [ ] Panel header shows total VRAM of loaded models
+- [ ] Compare buttons only show loaded models
+- [ ] Compare stream is abortable
+- [ ] Delete dialog shows disk reclaim amount
+- [ ] TypeScript compiles cleanly

 ---

@@ -46,23 +92,39 @@ Transform the dashboard from a model management tool into a **model intelligence

 **Goal:** Surface critical model metadata that's currently hidden behind the Ollama API.

-**Estimated effort:** ~60 minutes
+**Estimated effort:** ~75 minutes

-| # | ID | Task | Status | Priority | Notes |
-| --- | --- | ------------------- | ------ | -------- | ---------------------------------------------------------------------------- |
-| 4 | N4 | RAM budget bar | [ ] | Medium | Stacked horizontal bar: OS+Apps / Loaded models (by name) / Free. |
-| 5 | N5 | Context window size | [ ] | High | Fetch `context_length` from `/api/show` model_info. Show `128k ctx` on card. |
+| # | ID | Task | Status | Priority | Notes |
+| --- | --- | ------------------------- | ------ | -------- | -------------------------------------------------- |
+| 7 | N4 | RAM budget bar | - [ ] | Medium | Stacked bar: OS+Apps / Models (by name) / Free |
+| 8 | N5 | Context window size | - [ ] | High | Fetch `context_length` from `/api/show` model_info |
+| 9 | BN3 | Persist chat messages | - [ ] | Medium | Save to localStorage, restore on modal re-open |
+| 10 | BN4 | Logs panel refresh button | - [ ] | Low | Add refresh icon next to "Show/Hide Ollama Logs" |

 **Implementation details:**

-- **N4:** New `RamBudgetBar` component. Inputs: `totalRam`, `appMemory`, `runningModels[]` (each with name + size_vram), `freeRam`. Renders as a CSS flex bar with labeled segments. Place above the models list.
-- **N5:** Extend the `OllamaModel` interface to include optional `context_length?: number`. On expand (or eagerly for all models), call `/api/ollama` POST with `action: 'show'` and extract `model_info.*.context_length`. Cache in component state. Show as badge on card.
+- **N4:** New `RamBudgetBar` component in `components/`. Inputs: `totalRam`, `appMemory`, `runningModels[]` (each with name + size_vram), `freeRam`. Renders as a CSS flex bar with labeled segments using inline widths.
+  - Segment colors: OS/Apps = `--text-tertiary`, each model = unique from palette, Free = `--surface-muted`
+  - Label each model segment with name + size (if wide enough, else tooltip)
+  - Place above models list in the left column
+  - **Note:** On Apple Silicon, unified memory means all models compete with OS + apps. The bar should show this clearly.
+
+- **N5:** Extend the `show` action handler in `/api/ollama/route.ts` — it already returns the full response. The client needs to extract `model_info.*.context_length` from the response. Currently `modelfileData[model.name]` only stores the modelfile text. Add a separate `modelMetadata` state to cache the full show response. Display context window as badge on card: `128k ctx`, `32k ctx`, `4k ctx`.
+  - Fetch on expand (lazy) to avoid N+1 calls on page load
+  - Cache in `modelMetadata: Record`
+
+- **BN3:** Save `chatMessages` to `localStorage` keyed by `llm-chat-${modelName}`. Restore on prompt modal open if same model. Add a "Clear chat" button next to the chat mode toggle. Cap stored messages at 50.
+
+- **BN4:** Add a `RefreshCw` icon button next to the "Show/Hide Ollama Logs" toggle. Clicking calls `fetchLogs()`.
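
For N5, the extraction of `model_info.*.context_length` and the `128k ctx` badge could look something like the sketch below. It assumes the show response carries a `model_info` object whose context-length key is prefixed by the architecture name (for example `llama.context_length`); `ShowResponse`, `extractContextLength`, and `formatContextBadge` are hypothetical names.

```typescript
// Only the part of the show response used here (assumed shape).
interface ShowResponse {
  model_info?: Record<string, unknown>;
}

// Find the first `<arch>.context_length` entry that holds a number.
function extractContextLength(show: ShowResponse): number | undefined {
  const info = show.model_info;
  if (info === undefined) return undefined;
  for (const key of Object.keys(info)) {
    const value = info[key];
    if (/\.context_length$/.test(key) && typeof value === "number") {
      return value;
    }
  }
  return undefined;
}

// Format a token count as a compact badge label, e.g. 131072 -> "128k ctx".
function formatContextBadge(contextLength: number): string {
  if (contextLength >= 1024) {
    return `${Math.round(contextLength / 1024)}k ctx`;
  }
  return `${contextLength} ctx`;
}
```

Returning `undefined` when no key matches lets the card omit the badge until the lazy fetch has completed.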
 **Acceptance criteria:**

-- RAM budget bar visually represents memory allocation
-- Each model shows context window length (once fetched)
-- Bar updates when models are loaded/unloaded
+- [ ] RAM budget bar renders with labeled model segments
+- [ ] Bar updates live when models are loaded/unloaded
+- [ ] Context window shown for expanded models (fetched lazily)
+- [ ] Chat history persists across modal close/re-open
+- [ ] Logs panel has a refresh button
+- [ ] TypeScript compiles cleanly

 ---

@@ -70,88 +132,215 @@ Transform the dashboard from a model management tool into a **model intelligence

 **Goal:** Auto-detect model capabilities and surface warnings so users don't hit surprises.

-**Estimated effort:** ~45 minutes
+**Estimated effort:** ~60 minutes

 | # | ID | Task | Status | Priority | Notes |
 | --- | --- | ----------------------- | ------ | -------- | -------------------------------------------------------------------- |
-| 6 | N6 | `<think>` warning badge | [ ] | High | DeepSeek R1 models emit reasoning traces — warn about JSON stripping |
-| 7 | N7 | Vision model indicator | [ ] | Medium | Multimodal models (llava, qwen2.5vl) need image input |
-| 8 | N8 | Architecture badge | [ ] | Low | Show model arch as pill on card (currently in expanded only) |
-| 9 | N9 | Sort/order models | [ ] | Medium | Dropdown: name, size, parameters, running, modified |
-| 10 | N10 | Ollama version display | [ ] | Low | Show Ollama server version in status card |
+| 11 | N6 | `<think>` warning badge | - [ ] | High | DeepSeek R1 models emit reasoning traces — warn about JSON stripping |
+| 12 | N7 | Vision model indicator | - [ ] | Medium | Multimodal models need image input |
+| 13 | N8 | Architecture badge | - [ ] | Low | Show model arch as pill on card (currently in expanded only) |
+| 14 | N9 | Sort/order models | - [ ] | Medium | Dropdown: name, size, parameters, running, modified |
+| 15 | N10 | Ollama version display | - [ ] | Low | Show Ollama server version in status card |
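
The N6 row warns that reasoning traces have to be stripped before JSON parsing. A minimal sketch of that step is shown here, assuming the model wraps its reasoning in a single leading `<think>...</think>` block; `splitReasoning` is a hypothetical helper, not something known to exist in the codebase.

```typescript
// Split a raw model response into the reasoning trace and the actual answer.
// Assumes at most one <think>...</think> block at the start of the response.
function splitReasoning(raw: string): { reasoning: string; answer: string } {
  const closeTag = "</think>";
  const end = raw.indexOf(closeTag);
  if (end === -1) {
    // No closing tag: treat the whole response as the answer.
    return { reasoning: "", answer: raw.trim() };
  }
  const reasoning = raw.slice(0, end).replace("<think>", "").trim();
  const answer = raw.slice(end + closeTag.length).trim();
  return { reasoning, answer };
}
```

A caller expecting JSON would pass only `answer` to `JSON.parse`, while `reasoning` remains available for display.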
 **Implementation details:**

-- **N6:** Pattern match model name: `/deepseek-r1/i` or family containing `deepseek`. Show amber ⚠️ badge with tooltip: "Emits `<think>` traces before JSON output".
-- **N7:** Pattern match: `/llava|qwen.*vl|minicpm-v/i`. Show 👁 badge with tooltip: "Vision model — supports image input".
-- **N8:** Move `model.details.family` from expanded-only to always-visible as a subtle pill badge.
-- **N9:** Add `modelSort` state (`'name' | 'size' | 'params' | 'running' | 'modified'`). Sort the filtered model list before `.map()`. Add dropdown above the model list (next to search bar).
-- **N10:** New API call to Ollama `/api/version` in the GET handler. Return in OllamaData. Display in the Ollama stats card.
+- **N6:** Create `getModelBadges(name: string, family?: string)` in `lib/format.ts`. Pattern match for `<think>` emitters:
+  - `/deepseek-r1/i` — covers `deepseek-r1:7b`, `deepseek-r1:32b`, etc.
+  - `/deepseek-r1-distill/i` — distilled variants also emit `<think>`
+  Show amber ⚠️ badge: "Emits `<think>` traces — strip before JSON.parse".
+  In the prompt modal, if the active model is a `<think>` model, show a dismissable tip: "This model emits reasoning traces. Use the response after the closing `</think>` tag."
+
+- **N7:** Broader pattern match for vision/multimodal models:
+
+  ```
+  /llava|bakllava|moondream|qwen.*vl|minicpm-v|llama.*vision/i
+  ```
+
+  Show 👁 badge: "Vision model — supports image input".
+  **Future:** Add image upload to prompt modal when a vision model is selected (tracked as F24).
+
+- **N8:** Move `model.details.family` from expanded-only to the subtitle line (next to size/params/quant). Style as a subtle pill: `font-mono text-[10px] px-1.5 rounded bg-surface-muted`.
+
+- **N9:** Add `modelSort` state (`'name' | 'size' | 'params' | 'running' | 'modified'`). Apply sort to the filtered model list before `.map()`:
+  - `name`: alphabetical
+  - `size`: disk size descending
+  - `params`: parse `parameter_size` string ("32B" → 32) descending
+  - `running`: loaded models first, then alphabetical
+  - `modified`: most recently modified first
+  Add compact dropdown next to the search bar. Persist sort choice in localStorage (`llm-model-sort`).
+
+- **N10:** Call Ollama `/api/version` in the GET handler (alongside `/api/tags` and `/api/ps`). Add `version?: string` to `OllamaData` interface. Display in the Ollama stats card: "v0.16.2" next to the Online badge.

 **Acceptance criteria:**

-- DeepSeek R1 models show `<think>` warning badge
-- Vision models show eye indicator
-- Models can be sorted by any of the 5 criteria
-- Ollama version shown in status card
+- [ ] DeepSeek R1 (and distilled) models show `<think>` warning badge
+- [ ] Vision models show eye indicator
+- [ ] Family/architecture shown as pill on all model cards
+- [ ] Models sortable by 5 criteria, sort persisted in localStorage
+- [ ] Ollama version displayed in stats card
+- [ ] TypeScript compiles cleanly

 ---

-## Phase 4 — Runtime Metrics & Polish _(Sprint 11)_
+## Phase 4 — Runtime Metrics & UX Polish _(Sprint 11)_

 **Goal:** Improve the experience for users who are actively using models.

-**Estimated effort:** ~30 minutes
+**Estimated effort:** ~45 minutes

 | # | ID | Task | Status | Priority | Notes |
 | --- | --- | ----------------------------- | ------ | -------- | -------------------------------------------------------- |
-| 11 | N11 | Last known tok/s per model | [ ] | Medium | Persist after prompt, show on card |
-| 12 | N12 | Auto-unload countdown | [ ] | Medium | Live countdown instead of static expiry time |
-| 13 | N13 | Session stats per model | [ ] | Low | Prompts sent + tokens generated in this session |
-| 14 | N14 | Delete confirmation + reclaim | [ ] | High | "Delete X? Reclaim Y GB" dialog before deleting |
-| 15 | N15 | Simultaneous load suggestions | [ ] | Low | Suggest which models fit together based on available RAM |
+| 16 | N11 | Last known tok/s per model | - [ ] | Medium | Persist after prompt, show on card |
+| 17 | N12 | Auto-unload countdown | - [ ] | Medium | Live countdown instead of static expiry time |
+| 18 | N13 | Session stats per model | - [ ] | Low | Prompts sent + tokens generated in this session |
+| 19 | N14 | Simultaneous load suggestions | - [ ] | Low | Suggest which models fit together based on available RAM |

 **Implementation details:**

-- **N11:** After prompt/chat completes, save `{ model: string, tokPerSec: number, timestamp: number }` to localStorage key `modelBenchmarks`. Show on card as faded text: `~45 tok/s`.
-- **N12:** Replace `Expires: {time}` with a `useEffect` interval that computes `expires_at - Date.now()` and formats as `Xm Ys`. Update every second.
-- **N13:** Track in component state: `Map`. Increment on each prompt/chat completion. Display in expanded details.
-- **N14:** Add `confirmDelete` state. When delete is clicked, show inline confirmation with model name and `formatBytes(model.size)` reclaim amount. Second click executes.
-- **N15:** After computing estimated RAM for all unloaded models, filter those that fit in remaining free memory. Show as suggestions in the models panel footer.
+- **N11:** After prompt/chat completes with `streamMetrics`, save to localStorage key `llm-model-benchmarks`:
+
+  ```ts
+  Record;
+  ```
+
+  Show on card as faded text: `~45 tok/s` (most recent value). Use `llm-` prefix for consistency with other localStorage keys.
+  Restore on page load. Show "N/A" for models never benchmarked.
+
+- **N12:** Replace static `Expires: {time}` (line 1034-1038) with a live countdown. Add a `useEffect` with 1-second interval that computes `expires_at - Date.now()` and formats:
+  - `> 5m` → "Unloads in Xm"
+  - `1–5m` → "Unloads in Xm Ys" (yellow)
+  - `< 1m` → "Unloading soon" (red pulse)
+  Clear interval on unmount or when model is unloaded.
+
+- **N13:** Track in component state: `Map`. Increment on each prompt/chat completion. Display in expanded details: "Session: 5 prompts · 2,340 tokens". Resets on page refresh (intentional — session-scoped).
+
+- **N14 (was N15):** After computing estimated RAM (from N1) for all unloaded models, filter those that fit in remaining free memory (accounting for already-loaded models). Show as a suggestion strip below the models list: "Can also load: llama3.1:8b (~6 GB), qwen2.5:7b (~6 GB)".

 **Acceptance criteria:**

-- Previously benchmarked models show tok/s on card
-- Running models show live countdown to unload
-- Delete requires explicit confirmation showing disk reclaim
-- TypeScript compiles cleanly after all phases
+- [ ] Previously benchmarked models show tok/s on card
+- [ ] Running models show live countdown to auto-unload
+- [ ] Session stats shown in expanded model details
+- [ ] Suggestions shown for co-loadable models
+- [ ] TypeScript compiles cleanly

 ---

-## Phase 5 — Future Considerations _(Backlog)_
+## Phase 5 — Response Quality & Interaction _(Sprint 12)_

-Not planned for immediate implementation. Revisit after Phases 1–4.
+**Goal:** Improve the prompt/response experience with better rendering and model-specific interactions.

-| ID | Feature | Complexity | Notes |
-| --- | ------------------------------ | ---------- | ----------------------------------------------------------------- |
-| F17 | WebSocket real-time updates | High | Replace 15s polling with push-based updates from Ollama |
-| F18 | GPU/Metal utilization chart | Medium | macOS `powermetrics` or IOKit for GPU load percentage |
-| F19 | Model download queue | Medium | Queue multiple pulls, show progress for each |
-| F20 | Inference history log | Medium | Persist all prompts/responses to localStorage or file |
-| F21 | Custom Modelfile editor | Medium | Edit and push custom Modelfiles to Ollama |
-| F22 | Benchmark suite | High | Run standard prompts across all models, generate comparison table |
-| F23 | Component decomposition (CQ1b) | High | Break 1,885-line page.tsx into feature-based modules |
+**Estimated effort:** ~90 minutes
+
+| # | ID | Task | Status | Priority | Notes |
+| --- | --- | ------------------------------- | ------ | -------- | ----------------------------------------------------------- |
+| 20 | F24 | Image upload for vision models | - [ ] | High | Add image input when vision model is active |
+| 21 | F25 | Markdown rendering in responses | - [ ] | High | Render markdown (headers, lists, bold, code) instead of raw |
+| 22 | F26 | Code syntax highlighting | - [ ] | Medium | Highlight code blocks in responses with language detection |
+| 23 | F27 | `<think>` block auto-collapse | - [ ] | Medium | Detect and collapse reasoning traces, show "Show reasoning" |
+| 24 | F28 | Ollama model library link | - [ ] | Low | Link to ollama.com/library in the pull input placeholder |
+
+**Implementation details:**
+
+- **F24:** When a vision model is selected (detected by N7 badge logic), show an image upload area in the prompt modal. Convert to base64 and send in the Ollama API `images` field. Support drag-and-drop and file picker. Show thumbnail preview.
+
+- **F25:** Use `react-markdown` (lightweight) to render model responses. Install as dependency. Wrap response `<pre>` blocks with `<ReactMarkdown>`. Preserves streaming (markdown renders incrementally).
+
+- **F26:** Use `react-syntax-highlighter` with a dark theme (e.g., `oneDark`) for fenced code blocks inside markdown. Lazy-load to avoid bundle bloat.
+
+- **F27:** For `<think>` models (detected by N6 logic), parse the response to find `<think>...</think>` blocks. Wrap in a collapsible `<details>` element: `<summary>Show reasoning (X tokens)</summary>`. Display the actual answer below the collapsed reasoning.
+
+- **F28:** Change pull input placeholder from `"Pull a model... e.g. deepseek-r1:32b"` to include a link: add a small "Browse models" link below the input that opens `https://ollama.com/library` in a new tab.
+
+**Acceptance criteria:**
+
+- [ ] Vision models show image upload area in prompt modal
+- [ ] Responses render markdown (headers, lists, code blocks, bold/italic)
+- [ ] Code blocks have syntax highlighting with language labels
+- [ ] `<think>` traces are auto-collapsed with "Show reasoning" toggle
+- [ ] Model library link available near pull input
+
+---
+
+## Phase 6 — Data Persistence & Export _(Sprint 13)_
+
+**Goal:** Protect user data and enable portability.
+
+**Estimated effort:** ~45 minutes
+
+| # | ID | Task | Status | Priority | Notes |
+| --- | --- | ------------------------------ | ------ | -------- | --------------------------------------------------------- |
+| 25 | F29 | Export/import settings | - [ ] | Medium | Backup all localStorage data as JSON file |
+| 26 | F30 | Inference history log | - [ ] | Medium | Persist prompt/response pairs to localStorage with search |
+| 27 | F31 | Clear all data / factory reset | - [ ] | Low | Single button to clear all localStorage keys + state |
+
+**Implementation details:**
+
+- **F29:** Add a gear icon in the header that opens a settings popover. Include "Export settings" (downloads `llm-dashboard-settings.json` containing all `llm-*` localStorage keys) and "Import settings" (file upload, validates shape, merges). Keys to export: `llm-theme`, `llm-model-tags`, `llm-auto-load-model`, `llm-prompt-history`, `llm-model-benchmarks`, `llm-model-sort`, `llm-chat-*`.
+
+- **F30:** After each prompt/chat completion, save `{ model, prompt, response, metrics, timestamp }` to `llm-inference-log` (capped at 100 entries, FIFO). Add a small "History" panel accessible from the header with search and replay.
+
+- **F31:** Add "Reset dashboard" button in settings. Clears all `llm-*` localStorage keys. Confirm with "This will clear all tags, history, and preferences".
+
+**Acceptance criteria:**
+
+- [ ] Settings can be exported as JSON and re-imported
+- [ ] Inference history persists across page refreshes
+- [ ] Factory reset clears all dashboard state
+- [ ] TypeScript compiles cleanly
+
+---
+
+## Phase 7 — Future Considerations _(Backlog)_
+
+Not planned for immediate implementation. Revisit after Phases 1–6.
+
+| ID | Feature | Complexity | Notes |
+| --- | ------------------------------ | ---------- | -------------------------------------------------------------------------- |
+| F17 | WebSocket real-time updates | High | Replace 15s polling with push-based updates from Ollama |
+| F18 | GPU/Metal utilization chart | Medium | macOS `powermetrics` or IOKit for GPU load percentage |
+| F19 | Model download queue | Medium | Queue multiple pulls, show progress for each |
+| F21 | Custom Modelfile editor | Medium | Edit and push custom Modelfiles to Ollama |
+| F22 | Benchmark suite | High | Run standard prompts across all models, generate comparison table |
+| F23 | Component decomposition (CQ1b) | High | Break 1,885-line page.tsx into feature-based modules |
+| F32 | Accessibility / ARIA labels | Medium | Keyboard nav beyond shortcuts, screen reader support, focus management |
+| F33 | API latency overlay | Low | Show request duration for each panel refresh (network timing) |
+| F34 | Model disk cleanup wizard | Medium | Identify unused models (never loaded, old modified_at), suggest deletion |
+| F35 | Drag-and-drop model ordering | Low | Manual model order with drag handles, persist in localStorage |
+| F36 | Prompt templates library | Medium | Pre-built prompts for common tasks (code review, explain, translate, etc.) |
+| F37 | Response diff viewer | Medium | Side-by-side diff of two model responses highlighting differences |
+| F38 | Multi-model chat | High | Chat with multiple models simultaneously, responses interleaved |

 ---

 ## Summary

-| Phase | Sprint | Items | Focus | Effort | Depends On |
-| --------- | ------ | ------------ | ------------------------ | ---------- | ---------- |
-| 1 | 8 | N1, N2, N3 | Pre-load RAM estimates | ~45 min | — |
-| 2 | 9 | N4, N5 | RAM bar + context window | ~60 min | Phase 1 |
-| 3 | 10 | N6–N10 | Badges + sort + version | ~45 min | — |
-| 4 | 11 | N11–N15 | Runtime metrics + UX | ~30 min | — |
-| **Total** | | **15 items** | | **~3 hrs** | |
+| Phase | Sprint | Items | Focus | Effort | Depends On |
+| --------- | ------ | -------------------- | ------------------------------ | ---------- | ---------- |
+| 1 | 8 | N1–N3, BN1, BN2, BN5 | Pre-load intelligence + fixes | ~60 min | — |
+| 2 | 9 | N4–N5, BN3–BN4 | Rich metadata + persistence | ~75 min | Phase 1 |
+| 3 | 10 | N6–N10 | Intelligence badges + sort | ~60 min | — |
+| 4 | 11 | N11–N14 | Runtime metrics + UX | ~45 min | Phase 1 |
+| 5 | 12 | F24–F28 | Response quality + interaction | ~90 min | Phase 3 |
+| 6 | 13 | F29–F31 | Data persistence + export | ~45 min | — |
+| **Total** | | **27 items** | | **~6 hrs** | |

-Phases 1 and 3 can run in parallel. Phase 2 depends on Phase 1 (needs RAM estimates for the budget bar). Phase 4 is independent.
+**Dependency graph:**
+
+- Phase 1 → Phase 2 (N4 budget bar needs N1 RAM estimates)
+- Phase 1 → Phase 4 (N14 co-load suggestions need N1 RAM estimates)
+- Phase 3 → Phase 5 (F24 vision upload needs N7 detection, F27 `<think>` collapse needs N6)
+- Phases 1, 3, 6 can run in parallel
+
+**localStorage key registry** (standardized `llm-` prefix):
+
+| Key | Type | Phase | Purpose |
+| ---------------------- | ------ | ----- | ------------------------- |
+| `llm-theme` | string | Done | Dark/light theme |
+| `llm-model-tags` | JSON | Done | User tags per model |
+| `llm-auto-load-model` | string | Done | Preferred auto-load model |
+| `llm-prompt-history` | JSON | Done | Last 20 prompts |
+| `llm-model-benchmarks` | JSON | P4 | Tok/s per model |
+| `llm-model-sort` | string | P3 | Sort preference |
+| `llm-chat-{model}` | JSON | P2 | Chat messages per model |
+| `llm-inference-log` | JSON | P6 | Full inference history |
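
With every key under the `llm-` prefix, the F29 export and the F31 reset can both be driven by a prefix scan. The sketch below illustrates that idea under stated assumptions: `KeyValueStore` is a hypothetical subset of the Web Storage API (which `localStorage` should satisfy structurally), and `MemoryStore` is only a stand-in so the example can run outside a browser.

```typescript
// Subset of the Web Storage API used by the sketch (hypothetical interface).
interface KeyValueStore {
  readonly length: number;
  key(index: number): string | null;
  getItem(key: string): string | null;
  removeItem(key: string): void;
}

// Collect the names of all keys that start with the given prefix.
function prefixedKeys(store: KeyValueStore, prefix: string): string[] {
  const keys: string[] = [];
  for (let i = 0; i < store.length; i++) {
    const k = store.key(i);
    if (k !== null && k.indexOf(prefix) === 0) keys.push(k);
  }
  return keys;
}

// F29: snapshot every `llm-*` entry as a plain object ready for JSON.stringify.
function exportDashboardSettings(
  store: KeyValueStore,
  prefix: string = "llm-",
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const k of prefixedKeys(store, prefix)) {
    const value = store.getItem(k);
    if (value !== null) out[k] = value;
  }
  return out;
}

// F31: remove every `llm-*` entry and report how many keys were cleared.
// Keys are collected first so removal does not shift indices mid-scan.
function resetDashboard(store: KeyValueStore, prefix: string = "llm-"): number {
  const doomed = prefixedKeys(store, prefix);
  for (const k of doomed) store.removeItem(k);
  return doomed.length;
}

// In-memory stand-in for localStorage, for running the sketch outside a browser.
class MemoryStore implements KeyValueStore {
  private data: Record<string, string> = {};
  constructor(initial: Record<string, string>) {
    for (const k of Object.keys(initial)) this.data[k] = initial[k];
  }
  get length(): number {
    return Object.keys(this.data).length;
  }
  key(index: number): string | null {
    const k = Object.keys(this.data)[index];
    return k === undefined ? null : k;
  }
  getItem(key: string): string | null {
    const v = this.data[key];
    return v === undefined ? null : v;
  }
  removeItem(key: string): void {
    delete this.data[key];
  }
}
```

In the browser, `exportDashboardSettings(localStorage)` would yield the object to serialize into `llm-dashboard-settings.json`. Because the scan is prefix-based, per-model `llm-chat-{model}` keys are picked up without being enumerated.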