docs(local-llm): Rich Features PRD rev 2 — comprehensive review + expansion

Review findings addressed (20+ issues): Structure additions: - Target Users section with 5 personas (solo dev, tinkerer, privacy pro, writer, power user) - Non-Goals section (8 explicit out-of-scope items for v4) - Risks & Mitigations table (10 risks with impact/likelihood/mitigation) - New API Routes section (4 new routes with security notes) - Settings Expansion section (full tree: General, Router, Models, Input, Tasks, Data, About) - New Dependencies table (idb ~1KB, fuse.js ~6KB, cron-parser ~3KB) - Error Handling appendix (12 edge cases with expected behavior) Data model fixes: - Conversation/Message split into separate IndexedDB stores (scalability) - Message gets conversationId FK, promptTokens field, size/language on Attachment - Design decision note explaining why messages are stored separately Feature spec improvements: - 3.1 Conversations: context window management (token bar, auto-summarize at 80/95%) - 3.2 Quick Actions: expanded Cmd+K palette spec (5 result types, ranking) - 3.3 Agents: tools marked v4 vs v5, duplicate-from-builtin, unlink on delete - 3.4 Model Router: full resolveModel() with 4-level fallback chain + availability - 3.5 Multi-Modal: attachment size limits, Whisper error handling - 3.6 Response: hover-only action bars, rating aggregation per task type - 3.7 Cron: built-in templates table, runtime constraints, security (execFile) - 3.8 Orchestration: full data model, chain/race/vote UI specs, step limits - 3.9 Projects: system context detail, project stats, unlink behavior Acceptance criteria added to ALL 9 features (was missing on 5). Competitive analysis expanded with local competitors (Open WebUI, LM Studio, Jan.ai). Success metrics improved with measurement methodology and rationale. Open questions restructured as decision table with recommendations. IndexedDB schema with explicit indexes and compound keys. Migration strategy: 7-step v3→v4 with safety (no delete until confirmed). 681 lines → 1149 lines (+69% content)
2026-02-19 23:47:59 -08:00 · 2026-02-19 23:47:59 -08:00 · 7bd14054d4
commit 7bd14054d4
parent 1172dbb23e
1 changed files with 583 additions and 98 deletions
--- a/__LOCAL_LLMs/dashboard/docs/RICH_FEATURES_PRD.md
+++ b/__LOCAL_LLMs/dashboard/docs/RICH_FEATURES_PRD.md
@ -1,7 +1,7 @@
 # Rich LLM Interaction Features — Product Requirements Document

 > **Local LLM Workspace** — Evolving Mission Control from a model management dashboard into a full-featured local AI workspace.
-> Last updated: Feb 20, 2026
+> Last updated: Feb 20, 2026 (rev 2 — reviewed and expanded)
 >
 > See also: [DASHBOARD_PRD.md](DASHBOARD_PRD.md) · [DASHBOARD_ROADMAP.md](DASHBOARD_ROADMAP.md)

@ -26,6 +26,29 @@ Mission Control today is a **model management dashboard** — load, unload, prom
 | Text-only input              | Multi-modal: voice (Whisper), vision, files, URLs      |
 | Single model per chat        | Multi-model orchestration (chain, race, vote)          |

+### Target Users
+
+| Persona                            | Description                                                                       | Primary needs                                                                                  |
+| ---------------------------------- | --------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- |
+| **Solo developer**                 | Uses LLMs for code review, debugging, generation daily. Has 3–5 models installed. | Fast switching between coding/reasoning models, Quick Actions for code tasks, file attachments |
+| **AI tinkerer**                    | Experiments with new models weekly. Compares outputs, benchmarks quality.         | Model router, race/vote orchestration, rating system, benchmarks                               |
+| **Privacy-conscious professional** | Lawyer, doctor, or researcher. Needs LLMs but cannot send data to cloud.          | 100% local, no telemetry, conversation export, project isolation                               |
+| **Writer / content creator**       | Uses LLMs for drafts, rewrites, brainstorming. Long multi-turn sessions.          | Persistent conversations, Writing Editor agent, branch/fork for variants                       |
+| **Power user / automation**        | Wants LLMs integrated into daily workflows. Cron jobs, shell integration.         | Scheduled tasks, command input sources, file output                                            |
+
+### Non-Goals (v4 Scope)
+
+The following are explicitly **out of scope** for v4. They may be revisited in v5+.
+
+- **Cloud model integration** — No OpenAI/Anthropic/Google API support. This is a local-only tool.
+- **Multi-user / auth** — Single-user, localhost only. No login, no sharing, no collaboration.
+- **Sandboxed code execution** — Running model-generated code is high-risk. Deferred to v5 with proper sandboxing.
+- **RAG / vector database** — Persistent agent memory across conversations requires a vector store (ChromaDB, etc.). Deferred.
+- **Mobile app** — No iOS/Android companion. LAN-based remote access is a v5 consideration.
+- **Plugin/extension API** — No third-party integrations or user-written tools. Deferred.
+- **Fine-tuning UI** — No model training or LoRA management. Use Ollama CLI directly.
+- **Web search** — No SearXNG or web scraping integration. Deferred.
+
 ---

 ## 2. Information Architecture
@ -79,14 +102,17 @@ Conversations become the primary interaction unit — persistent, named, searcha
 #### Data Model

 ```typescript
+// Conversations and messages stored SEPARATELY in IndexedDB for query performance.
+// A conversation with 200 messages loads the header instantly; messages paginate.
+
 interface Conversation {
  id: string; // crypto.randomUUID()
-  title: string; // auto-generated, editable
-  model: string;
-  agentId?: string;
-  projectId?: string;
-  systemPrompt?: string;
-  messages: Message[];
+  title: string; // auto-generated from first message, user-editable
+  model: string; // primary model (last used)
+  agentId?: string; // if started from an Agent
+  projectId?: string; // folder grouping
+  systemPrompt?: string; // conversation-level override
+  messageCount: number; // denormalized for sidebar display
  createdAt: number;
  updatedAt: number;
  pinned: boolean;
@ -95,31 +121,47 @@ interface Conversation {
    totalTokens: number;
    totalPrompts: number;
    avgTokPerSec: number;
-    models: string[]; // all models used
+    models: string[]; // all models used in this conversation
  };
 }

+// Stored in separate IndexedDB object store, indexed by [conversationId, timestamp]
 interface Message {
  id: string;
+  conversationId: string; // FK to Conversation.id
  role: 'user' | 'assistant' | 'system';
  content: string;
-  model?: string;
+  model?: string; // which model generated this response
  timestamp: number;
  attachments?: Attachment[];
-  metrics?: { tokensPerSec: number; totalTokens: number; durationMs: number };
-  rating?: 'up' | 'down';
-  parentId?: string; // for branching
-  branchIndex?: number;
+  metrics?: {
+    tokensPerSec: number;
+    totalTokens: number;
+    promptTokens: number; // input tokens consumed
+    durationMs: number;
+  };
+  rating?: 'up' | 'down'; // user feedback
+  parentId?: string; // for branching (null = linear, set = branch root)
+  branchIndex?: number; // which variant (0, 1, 2...) when regenerated
 }

 interface Attachment {
  type: 'image' | 'file' | 'audio' | 'url';
  name: string;
-  data: string; // base64 or text content
+  data: string; // base64 for images, text content for files
  mimeType?: string;
+  size?: number; // bytes, for display
+  language?: string; // detected language for code files
 }
 ```

+> **Design decision:** Messages are stored in a separate IndexedDB object store (not inline in Conversation) so that:
+>
+> 1. Sidebar loads instantly (only conversation headers)
+> 2. Long conversations (200+ messages) paginate efficiently
+> 3. Full-text search can index messages independently
+> 4. Branch queries use the `[conversationId, parentId]` index
+
 #### Lifecycle

 1. **Create** — "New Conversation" button, Quick Action, or Agent launch
@ -140,9 +182,32 @@ interface Attachment {

 #### Storage

- **IndexedDB** via `idb` library (localStorage too small for full history)
+- **IndexedDB** via `idb` library (~1KB gzipped, thin wrapper over native API)
 - Migration from existing `llm-chat-*` and `llm-inference-log` on first v4 load
 - Export: `llm-conversations-backup-{date}.json`
+- **Size budget:** IndexedDB has no hard limit on most browsers, but we cap at 500 conversations (auto-archive oldest). Individual messages have no cap.
+
+#### Context Window Management
+
+Local models have limited context windows (typically 2K–128K tokens). The system must handle this gracefully:
+
+1. **Token counting:** Use a fast client-side tokenizer estimate (word count × 1.3 for English) to track approximate context usage per conversation.
+2. **Context bar:** Show a subtle progress bar in the conversation header: `4.2K / 32K tokens`.
+3. **Auto-truncation:** When context exceeds 80% of the model's window, show a warning. At 95%, automatically summarize older messages into a condensed system message (using the same model) and continue.
+4. **Manual truncation:** User can click "Compact history" to trigger summarization at any time.
+5. **Context-aware model switching:** If a conversation outgrows a small model's context, suggest switching to a model with a larger window.
+
+#### Acceptance Criteria
+
+- [ ] Conversations persist across page refreshes and browser restarts
+- [ ] Auto-title generates within 2s of first response using fast model
+- [ ] Sidebar shows conversations grouped by time period with correct counts
+- [ ] Full-text search returns results within 200ms for 500 conversations
+- [ ] Branch navigation shows `← 1/3 →` arrows on regenerated messages
+- [ ] Conversations exportable as Markdown with metadata header
+- [ ] Context usage bar visible in conversation header
+- [ ] Migration from v3 localStorage data completes without data loss
+- [ ] Archived conversations hidden from sidebar but searchable

 ---

@ -230,9 +295,28 @@ interface ModelDefaults {

 Auto-detection: scan installed models, match by name pattern. User overrides in Settings.

-#### Command Palette
+#### Command Palette (`Cmd+K`)

-`Cmd+K` opens a fuzzy-search palette across all Quick Actions, Agents, and recent conversations. Under 100ms response time.
+The command palette is the power-user entry point. It unifies access to everything:
+
+- **Quick Actions** — fuzzy search by name or category
+- **Agents** — start conversation with any agent
+- **Recent conversations** — resume by title search
+- **Models** — switch model mid-conversation
+- **System commands** — "Export", "Settings", "Mission Control"
+
+Opens in <100ms. Results ranked by: exact match > starts-with > fuzzy > recently used.
+
+#### Acceptance Criteria
+
+- [ ] 30+ built-in Quick Actions across 5 categories
+- [ ] `Cmd+K` command palette opens in <100ms with fuzzy search
+- [ ] Quick Action creates new conversation with pre-filled system prompt and template
+- [ ] User can create, edit, duplicate, and delete custom Quick Actions
+- [ ] Model hint resolves to actual installed model (fallback: first loaded > first installed)
+- [ ] Usage tracking surfaces top 5 "frequently used" actions in sidebar
+- [ ] Actions included in settings export/import (F29 compatibility)
+- [ ] Template placeholders `{...}` highlighted in the input for easy replacement

 ---

@ -262,12 +346,12 @@ interface Agent {
 }

 type AgentTool =
-  | 'file_read'
-  | 'vision'
-  | 'voice_input'
-  | 'web_search'
-  | 'code_execution'
-  | 'memory';
+  | 'file_read' // v4: read attached files and include in context
+  | 'vision' // v4: process images via vision-capable models
+  | 'voice_input' // v4: Whisper transcription input
+  | 'web_search' // v5 (deferred): local web search
+  | 'code_execution' // v5 (deferred): sandboxed code runner
+  | 'memory'; // v5 (deferred): persistent memory across conversations
 ```

 #### Built-in Agents (10)
@ -295,7 +379,18 @@ type AgentTool =

 #### Agent Editor

-Full-screen editor: icon, name, model, temperature, system prompt (large textarea), welcome message, example prompts (tag input), tool toggles, constraints list.
+Full-screen editor: icon, name, model, temperature, system prompt (large textarea), welcome message, example prompts (tag input), tool toggles (v4 tools only), constraints list.
+
+#### Acceptance Criteria
+
+- [ ] 10 built-in agents covering code, writing, reasoning, creative, security
+- [ ] Agent editor with all fields persisted to IndexedDB
+- [ ] Agent conversations show badge in header and welcome message
+- [ ] Example prompts rendered as clickable chips below input bar
+- [ ] Agents included in settings export/import
+- [ ] Agent conversation count tracked and displayed in agent picker
+- [ ] Custom agents can be duplicated from built-in agents as starting point
+- [ ] Deleting an agent does NOT delete its conversations (conversations become "unlinked")

 ---

@ -304,20 +399,98 @@ Full-screen editor: icon, name, model, temperature, system prompt (large textare
 Heuristic-based classifier picks the best model for the task — no LLM call, runs in <5ms.

 ````typescript
-function classifyTask(input: string): TaskType {
+type TaskType =
+  | 'code'
+  | 'code_review'
+  | 'debugging'
+  | 'reasoning'
+  | 'math'
+  | 'simple'
+  | 'translation'
+  | 'creative'
+  | 'general';
+
+function classifyTask(input: string, attachments?: Attachment[]): TaskType {
+  // Priority 1: Attachments override text classification
+  if (attachments?.some(a => a.type === 'image')) return 'code'; // vision model handles
+
  const lower = input.toLowerCase();
-  if (/```|function |class |const |import |def |=>/.test(input)) return 'code';
-  if (/review|refactor|debug|fix|bug/.test(lower)) return 'code_review';
-  if (/error:|traceback|exception|crash/.test(lower)) return 'debugging';
-  if (/think.*through|step.by.step|analyze|compare|why/.test(lower)) return 'reasoning';
-  if (/calculate|math|equation|proof/.test(lower)) return 'math';
-  if (input.length < 60) return 'simple';
-  if (/translate|translation/.test(lower)) return 'translation';
+
+  // Priority 2: Structural signals (code blocks, error traces)
+  if (/```|function |class |const |import |def |=>|\{\s*\n/.test(input)) return 'code';
+  if (/error:|traceback|exception|crashed?|ENOENT|SIGTERM/.test(lower)) return 'debugging';
+
+  // Priority 3: Intent signals
+  if (/review|refactor|optimize|lint/.test(lower)) return 'code_review';
+  if (/debug|fix.*bug|why.*(fail|break|wrong)/.test(lower)) return 'debugging';
+  if (/think.*through|step.by.step|analyze|compare|pros.*cons|trade.?off/.test(lower))
+    return 'reasoning';
+  if (/calculate|math|equation|proof|probability|statistic/.test(lower)) return 'math';
+  if (/translate|translation|in (spanish|french|german|japanese|chinese|korean)/.test(lower))
+    return 'translation';
+  if (/brainstorm|creative|story|poem|write.*about|imagine/.test(lower)) return 'creative';
+
+  // Priority 4: Length heuristic
+  if (input.length < 80) return 'simple';
+
  return 'general';
 }
+
+function resolveModel(
+  taskType: TaskType,
+  defaults: ModelDefaults,
+  loadedModels: string[],
+  installedModels: string[]
+): { model: string; reason: string; loaded: boolean } {
+  const preferred = defaults[taskTypeToHint(taskType)];
+
+  // Best case: preferred model is already loaded in RAM
+  if (loadedModels.includes(preferred)) {
+    return { model: preferred, reason: `${taskType} detected`, loaded: true };
+  }
+
+  // Fallback 1: any loaded model that matches the hint category
+  const loadedMatch = loadedModels.find(m => matchesHint(m, taskTypeToHint(taskType)));
+  if (loadedMatch) {
+    return {
+      model: loadedMatch,
+      reason: `${taskType} detected (using loaded model)`,
+      loaded: true,
+    };
+  }
+
+  // Fallback 2: preferred model exists but needs loading
+  if (installedModels.includes(preferred)) {
+    return { model: preferred, reason: `${taskType} detected (will load model)`, loaded: false };
+  }
+
+  // Fallback 3: first loaded model
+  if (loadedModels.length > 0) {
+    return { model: loadedModels[0], reason: 'Using currently loaded model', loaded: true };
+  }
+
+  // Fallback 4: first installed model
+  return { model: installedModels[0], reason: 'Using first available model', loaded: false };
+}
 ````

-**UI:** Model chip shows `🤖 Auto` → resolves on send → "Routed to deepseek-r1:32b (reasoning detected)". Click to override. Configurable defaults in Settings.
+**UI behavior:**
+
+- **Auto mode (default):** Model chip shows `🤖 Auto` → resolves on send → toast: "Routed to deepseek-r1:32b (reasoning detected)"
+- **Manual override:** Click model chip to pick a specific model (sticky for that conversation)
+- **Explanation tooltip:** Hover over routed model shows classification reason
+- **Load warning:** If the selected model isn't loaded, show "Model will be loaded (~8s)" before sending
+- **Settings:** Configure model defaults per task type in Settings → Model Router
+
+#### Acceptance Criteria
+
+- [ ] Heuristic classifier runs in <5ms (no LLM call, pure regex)
+- [ ] Auto mode shown as default for new conversations (not Agent conversations)
+- [ ] Routing decision explained on hover tooltip
+- [ ] User can override per-conversation; override persists
+- [ ] Model defaults configurable in Settings (5 categories: fast, coding, reasoning, chat, vision)
+- [ ] Auto-detection fills defaults on first run by scanning installed model names
+- [ ] Graceful fallback chain: preferred loaded → category loaded → preferred installed → any loaded → any installed

 ---

@ -340,7 +513,19 @@ Rich input bar replacing the single textarea:
 | **Paste error**                       | Auto-detect stack trace, suggest "Debug Error" |
 | **Drag-and-drop**                     | Drop zone overlay, adds as attachment          |

-**Voice:** Click 🎤 or `Cmd+Shift+V` → record → `/api/whisper/transcribe` → fills textarea.
+**Voice:** Click 🎤 or `Cmd+Shift+V` → record via Web Audio API (MediaRecorder) → POST to `/api/whisper/transcribe` → transcription fills textarea → user can edit before sending.
+
+#### Acceptance Criteria
+
+- [ ] File picker supports `.ts`, `.py`, `.go`, `.rs`, `.js`, `.json`, `.md`, `.txt`, `.csv`, `.yaml`
+- [ ] Code files show syntax-highlighted collapsible preview before sending
+- [ ] Images display as thumbnails with file size; auto-select vision model if not already selected
+- [ ] Audio files sent to Whisper; transcription fills input (requires Whisper.cpp installed)
+- [ ] Paste image from clipboard (Cmd+V) creates inline attachment
+- [ ] Paste code auto-detected and wrapped in fenced code block with language
+- [ ] Drag-and-drop shows visual overlay; supports multiple files
+- [ ] Attachment size limit: 10MB per file, configurable in Settings
+- [ ] Graceful error if Whisper not installed: "Whisper.cpp not found — install via brew"

 ---

@ -376,7 +561,23 @@ Regenerate keeps original (dimmed), streams new variant, shows `← 1 of 2 →`

 #### Rating → Model Quality Profile

-👍/👎 per response → aggregate per model → show in model cards: "87% positive on code tasks"
+👍/👎 per response → aggregate per model per task type → stored in IndexedDB `modelRatings` store.
+
+Display in Mission Control model cards: "87% positive on code tasks (23 ratings)".
+Surface in model router: prefer models with higher ratings for their category.
+
+#### Acceptance Criteria
+
+- [ ] Per-message action bar appears on hover (not always visible, to avoid clutter)
+- [ ] Copy button copies full message text (markdown source, not rendered HTML)
+- [ ] Per-code-block copy button copies only the code (not language label or fences)
+- [ ] Regenerate creates branch; original preserved with dimmed styling
+- [ ] Branch navigation `← 1 of N →` shown when message has multiple variants
+- [ ] "Try with other model" dropdown shows currently loaded models
+- [ ] Live token counter and tok/s visible during streaming
+- [ ] Stop button prominently visible during streaming (red, larger than current)
+- [ ] 👍/👎 persisted per message; aggregated view in Mission Control model cards
+- [ ] Rating data included in settings export

 ---

@ -408,23 +609,123 @@ interface ScheduledTask {
 }
 ```

-**Built-in templates:** Morning Brief, Git Diff Summary, Dependency Audit, README Updater, Daily Learning Prompt.
+#### Built-in Templates

-**Constraints:** Dashboard tab must be open (browser cron). Optional `launchd` background service for macOS. 60s timeout per task. Loads model if needed.
+| Template             | Schedule      | Input Source       | Output       | Use Case                                  |
+| -------------------- | ------------- | ------------------ | ------------ | ----------------------------------------- |
+| **Morning Brief**    | Weekdays 8 AM | Static prompt      | Notification | "What should a developer focus on today?" |
+| **Git Diff Summary** | Weekdays 5 PM | `git diff --stat`  | Conversation | Summarize today's code changes            |
+| **Dependency Audit** | Mondays 10 AM | `npm audit --json` | Conversation | Weekly security scan summary              |
+| **README Freshness** | 1st of month  | File: `README.md`  | File output  | Monthly README review                     |
+| **Daily Learning**   | Daily 12 PM   | Static prompt      | Notification | "Teach me one useful programming concept" |
+
+#### Runtime Constraints
+
+- **Browser-based:** Uses `setInterval` with cron matching. Dashboard tab **must be open** for tasks to fire.
+- **Background alternative (future):** Optional Node.js process managed via macOS `launchd` plist. Would allow tasks to run when dashboard is closed. Deferred to v4.1.
+- **Model loading:** Task checks if model is loaded. If not, sends load request, waits up to 30s, then runs. Unloads after if it wasn't loaded before.
+- **Timeout:** 60s max per task. Abort and log error if exceeded.
+- **Notification:** Uses browser `Notification` API (requests permission on first task creation).
+- **Security:** Shell command input source is opt-in per task. Only `stdout` is captured (not `stderr`). Commands run via `execFile` (no shell injection).
+
+#### Task Editor UI
+
+Modal editor with sections: name, schedule (preset picker + custom cron input with human-readable preview), model selector, input source radio (static / shell command / file path / clipboard), prompt textarea with `{input}` placeholder, output action dropdown, enable/disable toggle, "Run Now" test button.
+
+#### Acceptance Criteria
+
+- [ ] Cron expression validates and shows human-readable schedule ("Every weekday at 9:00 AM")
+- [ ] Tasks fire when dashboard tab is open and schedule matches (checked every 60s)
+- [ ] "Run Now" button for immediate testing with result preview
+- [ ] Run history shows last 10 executions with duration, token count, and success/error
+- [ ] Shell command input captures stdout safely via server-side API route
+- [ ] Browser notification on task completion (with permission request)
+- [ ] 5 built-in templates included as starting points
+- [ ] Tasks included in settings export/import
+- [ ] Disabled tasks don't fire but are preserved

 ---

 ### 3.8 Multi-Model Orchestration

-Three modes for using multiple models together:
+Three modes for combining multiple models on a single task.

-**Chain** — Sequential pipeline: model A reasons → model B codes from that reasoning.
+#### Chain Mode — Sequential Pipeline

-**Race** — Same prompt to N models simultaneously. Show all results with timing. User picks best.
+Output of model A becomes input to model B. Each step can have its own system prompt.

-**Vote** — 3+ models answer, then a synthesizer model produces consensus with disagreement notes.
+```
+User prompt
+  → Step 1: deepseek-r1:32b (system: "Analyze the problem and plan an approach")
+  → Step 2: qwen2.5-coder:32b (system: "Implement the plan from the previous step")
+  → Final response shown to user (with expandable intermediate steps)
+```

-Orchestrations can be saved as reusable Quick Actions.
+**Use cases:** Architecture decision → implementation, problem analysis → solution code, outline → full text.
+
+#### Race Mode — Parallel Competition
+
+Same prompt sent to N models simultaneously. All results displayed side-by-side with timing metrics.
+
+```
+User prompt
+  → llama3.1:8b    (3.2s, 42 tok/s)  ← fastest
+  → qwen2.5:7b     (4.1s, 38 tok/s)
+  → mistral:7b     (3.8s, 35 tok/s)
+User picks the best response (or takes all)
+```
+
+**Use cases:** Comparing model quality on a task, finding fastest model, choosing best explanation.
+
+#### Vote Mode — Consensus
+
+Same prompt to 3+ models. A designated synthesizer model reads all outputs and produces a consensus, noting disagreements.
+
+```
+User prompt → 3 models answer independently
+  → Synthesizer (reasoning model) reads all 3:
+     "All models agree on X. Models A and B suggest Y, but Model C
+      disagrees because Z. My recommendation: ..."
+```
+
+**Use cases:** Important decisions, fact verification, reducing hallucination risk.
+
+#### Data Model
+
+```typescript
+interface Orchestration {
+  id: string;
+  name: string; // "Reason then Code"
+  mode: 'chain' | 'race' | 'vote';
+  steps: OrchestrationStep[]; // chain: sequential; race/vote: parallel
+  synthesizer?: string; // vote mode: model that produces consensus
+  description?: string;
+}
+
+interface OrchestrationStep {
+  model: string; // or modelHint for flexibility
+  systemPrompt?: string;
+  transformInput?: string; // chain mode: template with {prev} placeholder
+}
+```
+
+Orchestrations are saved as a special type of Quick Action (category: `orchestration`) and appear in the command palette.
+
+#### UI
+
+- **Chain:** Show intermediate steps in collapsible blocks, final answer prominent
+- **Race:** Side-by-side cards with model name, timing, response. "Pick winner" button.
+- **Vote:** Individual responses in tabs, consensus answer at top with attribution
+
+#### Acceptance Criteria
+
+- [ ] Chain mode passes output of step N as `{prev}` placeholder to step N+1
+- [ ] Race mode streams all models in parallel with independent progress indicators
+- [ ] Vote mode synthesizer receives all responses in a structured prompt
+- [ ] All modes show timing metrics per model
+- [ ] Orchestrations saveable as reusable Quick Actions
+- [ ] At least 2 models must be installed for orchestration to be available
+- [ ] Chain limited to 5 steps max; Race/Vote limited to 5 models max

 ---

@ -443,11 +744,22 @@ interface Project {
 }
 ```

- **System context per project** injected into every conversation
- **Default model/agent** per project
- Drag conversations into projects
- `Cmd+P` project switcher
- Project-scoped search
+- **System context per project:** Injected as a system message into every conversation in the project. Example: "You are helping with a TypeScript/Fastify backend using Azure Cosmos DB and Zod validation."
+- **Default model/agent:** New conversations in the project inherit these defaults
+- **Drag to organize:** Drag conversations from sidebar into project folders
+- **`Cmd+P` project switcher:** Quick switch between projects, filters sidebar to show only that project's conversations
+- **Project-scoped search:** Search only within a project's conversations
+- **Project stats:** Show total conversations, total tokens, most-used model
+
+#### Acceptance Criteria
+
+- [ ] Create/edit/delete projects with name, icon, description, system context
+- [ ] Project-level model and agent defaults apply to new conversations
+- [ ] Drag-and-drop conversations into/between projects
+- [ ] `Cmd+P` opens project switcher with search
+- [ ] Project system context visible as a badge in conversation header
+- [ ] Removing a conversation from a project moves it back to "Uncategorized"
+- [ ] Projects included in settings export/import

 ---

@ -471,24 +783,117 @@ interface Project {

 ---

-## 5. Storage Architecture
+## 5. Settings Expansion
+
+```
+Settings
+├── General
+│   ├── Theme (dark / light / auto)
+│   ├── Sidebar default state (expanded / collapsed)
+│   ├── Default view on launch (last conversation / new conversation / mission control)
+│   ├── Auto-title conversations (on/off, which model to use)
+│   └── Conversation list density (compact / comfortable)
+│
+├── Model Router
+│   ├── Default model per task type (fast, coding, reasoning, chat, vision)
+│   ├── Auto-routing enabled (on / off, default: on)
+│   └── Show routing explanation (on / off)
+│
+├── Models
+│   ├── Model nicknames (display name overrides, e.g., "deepseek-r1:32b" → "DeepSeek R1")
+│   ├── Per-model default temperature
+│   └── Model notes (free-text per model, shown in expanded details)
+│
+├── Input
+│   ├── Voice input: Whisper model selection (if multiple installed)
+│   ├── Auto-detect paste type (on / off)
+│   └── Max attachment size (default: 10MB)
+│
+├── Scheduled Tasks
+│   ├── Default timeout per task (30s / 60s / 120s)
+│   └── Browser notification preferences
+│
+├── Data
+│   ├── Export all data (conversations + agents + actions + tasks + settings)
+│   ├── Import data (validates, merges, reports conflicts)
+│   ├── Clear all conversations (keep agents + actions)
+│   ├── Clear inference log
+│   └── Factory reset (clear everything, confirm dialog)
+│
+└── About
+    ├── Dashboard version
+    ├── Ollama version (from existing N10 feature)
+    ├── Storage usage (IndexedDB size + localStorage size)
+    └── Model disk usage (from existing system panel)
+```
+
+Settings persist to localStorage (same `llm-*` prefix convention from v3). Complex data (conversations, messages, agents) in IndexedDB.
+
+---
+
+## 6. New API Routes
+
+The existing API routes remain unchanged. The following new routes are needed for v4 features:
+
+| Method | Route                   | Purpose                                                       | Feature             |
+| ------ | ----------------------- | ------------------------------------------------------------- | ------------------- |
+| POST   | `/api/ollama/chat`      | Already exists                                                | 3.1 Conversations   |
+| POST   | `/api/ollama/stream`    | Already exists (updated for images in Phase 5)                | 3.5 Multi-Modal     |
+| POST   | `/api/ollama/title`     | Generate conversation title via fast model                    | 3.1 Auto-title      |
+| POST   | `/api/ollama/summarize` | Summarize older messages for context compaction               | 3.1 Context Mgmt    |
+| POST   | `/api/system/exec`      | Execute shell command (for scheduled tasks)                   | 3.7 Cron            |
+| GET    | `/api/system/file`      | Read file content by path (for scheduled tasks + attachments) | 3.7 Cron, 3.5 Input |
+
+**Security notes:**
+
+- `/api/system/exec` is the most sensitive route. Validate command against an allowlist of safe executables (`git`, `npm`, `brew`, `cat`, `ls`, `wc`, `du`, `df`). Block `rm`, `mv`, `chmod`, `sudo`, `curl`, `wget` by default. User can extend allowlist in Settings.
+- `/api/system/file` reads files for context. Limit to configurable base directories. No traversal above home directory.
+- Both routes are localhost-only (same as all existing routes).
+
+---
+
+## 7. Storage Architecture

 ### Migration: localStorage → IndexedDB

 ```
-IndexedDB (via idb):
-  conversations, messages, agents, quickActions, projects, scheduledTasks, taskRuns
+IndexedDB stores (via idb library, ~1KB gzipped):
+  conversations    ← pk: id, indexes: [updatedAt], [projectId], [archived]
+  messages         ← pk: id, indexes: [conversationId+timestamp], [conversationId+parentId]
+  agents           ← pk: id
+  quickActions     ← pk: id, indexes: [category], [usageCount]
+  projects         ← pk: id
+  scheduledTasks   ← pk: id
+  taskRuns         ← pk: auto, indexes: [taskId+timestamp]
+  modelRatings     ← pk: auto, indexes: [model+taskType]
+  orchestrations   ← pk: id

-localStorage (lightweight settings):
+localStorage (lightweight, synchronous access):
  llm-theme, llm-model-defaults, llm-model-sort, llm-auto-load-model,
-  llm-sidebar-state, llm-last-conversation, llm-model-benchmarks
+  llm-sidebar-state, llm-last-conversation, llm-model-benchmarks,
+  llm-settings-*, llm-command-allowlist
 ```

-Migration on first v4 load: existing `llm-inference-log` and `llm-chat-*` entries → IndexedDB conversations.
+### Migration Strategy (v3 → v4)
+
+1. On first v4 load, detect existing `llm-*` localStorage keys
+2. Convert `llm-inference-log` entries into Conversation + Message records in IndexedDB
+3. Convert `llm-chat-{model}` entries into Conversation + Message records
+4. Migrate `llm-prompt-history` into a "Prompt History" conversation
+5. Keep `llm-theme`, `llm-model-sort`, `llm-auto-load-model`, `llm-model-benchmarks` in localStorage
+6. Set `llm-migrated-v4: true` flag to prevent re-migration
+7. **Do NOT delete** old keys until user confirms migration was successful
+
+### Size Management
+
+- **Conversation cap:** 500 active conversations. Beyond that, oldest auto-archived.
+- **Message pagination:** Load 50 messages at a time when scrolling up.
+- **Attachment storage:** Images stored as base64 in message data. For conversations with many images, offer "Export & clean attachments" option.
+- **Periodic cleanup:** Prompt user quarterly to archive/delete old conversations if count > 300.

 ---

-## 6. Component Architecture (v4)
+## 8. Component Architecture (v4)

 ```
 src/app/
@ -524,7 +929,7 @@ src/app/

 ---

-## 7. Implementation Phases
+## 9. Implementation Phases

 | Phase     | Focus                      | Tasks                                                                           | Effort    | Depends On |
 | --------- | -------------------------- | ------------------------------------------------------------------------------- | --------- | ---------- |
@ -541,61 +946,121 @@ src/app/

 ---

-## 8. Competitive Differentiation
+## 10. Competitive Differentiation

-| Feature             | ChatGPT      | Claude       | **Local (Ours)**               |
-| ------------------- | ------------ | ------------ | ------------------------------ |
-| Privacy             | Cloud        | Cloud        | **100% local**                 |
-| Cost                | $20/mo       | $20/mo       | **Free**                       |
-| Rate limits         | Yes          | Yes          | **None**                       |
-| Model choice        | GPT-4/o/mini | Claude 3/3.5 | **Any Ollama model**           |
-| Custom GPTs         | Yes          | No           | **Custom Agents**              |
-| File upload         | Yes          | Yes          | **+ local file system**        |
-| Voice input         | Yes          | No           | **Whisper.cpp local**          |
-| Scheduled tasks     | No           | No           | **Cron automation**            |
-| Multi-model         | No           | No           | **Chain/Race/Vote**            |
-| Model router        | Hidden       | Hidden       | **Transparent + configurable** |
-| Offline             | No           | No           | **Yes**                        |
-| Hot-swap models     | No           | No           | **Instant model switching**    |
-| Hardware visibility | None         | None         | **Full system monitoring**     |
+### vs. Cloud (ChatGPT, Claude, Gemini)

-**Our unique advantages:**
+| Feature             | ChatGPT      | Claude       | Gemini   | **Local (Ours)**               |
+| ------------------- | ------------ | ------------ | -------- | ------------------------------ |
+| Privacy             | Cloud        | Cloud        | Cloud    | **100% local**                 |
+| Cost                | $20/mo       | $20/mo       | $20/mo   | **Free**                       |
+| Rate limits         | Yes          | Yes          | Yes      | **None**                       |
+| Model choice        | GPT-4/o/mini | Claude 3/3.5 | Gemini 2 | **Any Ollama model**           |
+| Custom GPTs         | Yes          | Projects     | Gems     | **Custom Agents**              |
+| File upload         | Yes          | Yes          | Yes      | **+ local file system**        |
+| Voice input         | Yes          | No           | Yes      | **Whisper.cpp local**          |
+| Scheduled tasks     | No           | No           | No       | **Cron automation**            |
+| Multi-model         | No           | No           | No       | **Chain/Race/Vote**            |
+| Model router        | Hidden       | Hidden       | Hidden   | **Transparent + configurable** |
+| Offline             | No           | No           | No       | **Yes**                        |
+| Hot-swap models     | No           | No           | No       | **Instant model switching**    |
+| Hardware visibility | None         | None         | None     | **Full system monitoring**     |

-1. **Transparency** — see exactly which model, how fast, how much RAM
-2. **Orchestration** — chain/race/vote across models (impossible with cloud)
-3. **Automation** — scheduled tasks with local file/command integration
-4. **Zero friction** — no login, no API keys, no rate limits, no costs
-5. **Model diversity** — 20+ models from different families, each with strengths
+### vs. Local Competitors (Open WebUI, LM Studio, Jan.ai)
+
+| Feature           | Open WebUI | LM Studio      | Jan.ai         | **Ours**                        |
+| ----------------- | ---------- | -------------- | -------------- | ------------------------------- |
+| Setup             | Docker     | Standalone app | Standalone app | **npm dev server (instant)**    |
+| Model management  | Good       | Excellent      | Good           | **+ RAM estimation, co-load**   |
+| Custom agents     | Basic      | No             | Assistants     | **Full agent editor + tools**   |
+| Quick actions     | No         | No             | No             | **30+ built-in templates**      |
+| Scheduled tasks   | No         | No             | No             | **Cron with shell/file input**  |
+| Multi-model orch. | No         | No             | No             | **Chain/Race/Vote**             |
+| System monitoring | Minimal    | Minimal        | No             | **Full (RAM, disk, GPU, brew)** |
+| Model router      | No         | No             | No             | **Heuristic auto-routing**      |
+| Command palette   | No         | No             | No             | **Cmd+K unified access**        |
+| Developer-focused | General    | General        | General        | **Code-first Quick Actions**    |
+| Extensibility     | Plugins    | No             | Extensions     | **v5 (deferred)**               |
+
+**Our unique position:** Developer-first local AI workspace with the deepest system integration (shell commands, file I/O, cron, hardware monitoring) and the richest multi-model orchestration. We trade breadth (no plugins, no cloud) for depth in the local developer workflow.
+
+**Key differentiators vs. all competitors:**
+
+1. **Orchestration** — chain/race/vote across models (no competitor offers this)
+2. **Automation** — scheduled tasks with local file/command integration
+3. **Transparency** — see exactly which model, how fast, how much RAM, will-it-fit
+4. **Zero friction** — no Docker, no Electron, no API keys, just `npm run dev`
+5. **Code-first** — 30 Quick Actions optimized for developer workflows

 ---

-## 9. Success Metrics
+## 11. Risks & Mitigations

-| Metric                | Target                              | Measurement                         |
-| --------------------- | ----------------------------------- | ----------------------------------- |
-| Conversation creation | >5/day                              | Count in IndexedDB                  |
-| Quick Action usage    | >50% of conversations start from QA | `usageCount` tracking               |
-| Agent adoption        | 3+ agents in regular use            | `conversationCount` per agent       |
-| Auto-routing accuracy | >80% user satisfaction              | Rating on auto-routed conversations |
-| Retention             | Session >10 min                     | Timestamp delta                     |
-| Task completion       | <3 interactions to useful answer    | Message count per conversation      |
+| #   | Risk                                                                          | Impact   | Likelihood | Mitigation                                                                  |
+| --- | ----------------------------------------------------------------------------- | -------- | ---------- | --------------------------------------------------------------------------- |
+| R1  | **IndexedDB data loss** — browser clear, profile reset, or corruption         | High     | Low        | Auto-export backup every 7 days (prompt user). Export includes all data.    |
+| R2  | **Context window overflow** — long conversations silently truncated by Ollama | High     | Medium     | Context bar (3.1), auto-summarization at 80%, model switch suggestion.      |
+| R3  | **Shell command injection** — scheduled task exec route exploited             | Critical | Low        | Allowlist of safe executables, `execFile` (not `exec`), localhost-only.     |
+| R4  | **Model not loaded for cron task** — task fails silently                      | Medium   | Medium     | Auto-load before task, 30s timeout for load, log error if fails.            |
+| R5  | **Bundle size bloat** — new deps (idb, cron parser) increase load time        | Low      | Low        | `idb` is ~1KB. Cron parser is ~2KB. No heavy new deps.                      |
+| R6  | **page.tsx complexity** — already 2,500 lines, v4 adds much more              | High     | High       | Phase A includes component decomposition into route groups. Non-negotiable. |
+| R7  | **Browser tab closed during cron** — scheduled tasks don't fire               | Medium   | High       | Clear UX: show "Dashboard must be open" warning. Offer launchd in v4.1.     |
+| R8  | **Stale auto-title** — fast model generates bad title                         | Low      | Medium     | Title is always editable. "Rename" action in sidebar context menu.          |
+| R9  | **Race condition in Race mode** — parallel streams conflict                   | Medium   | Medium     | Each stream writes to independent state. Merge only on completion.          |
+| R10 | **Privacy leak via scheduled command** — command output sent to model         | Medium   | Low        | Show command output to user before sending to model (opt-in auto-send).     |

 ---

-## 10. Open Questions
+## 12. Success Metrics

-1. **Background cron:** Should scheduled tasks require dashboard open, or should we ship a `launchd` daemon?
-2. **Code execution:** Sandboxed code runner is high-value but high-risk. Scope for v5?
-3. **Web search:** Local web search via SearXNG or similar? Or defer to cloud models?
-4. **Persistent memory:** Agent memory across conversations (RAG-style)? Needs vector DB.
-5. **Mobile companion:** Minimal mobile UI that sends to Mac over LAN?
-6. **Plugin system:** Let users write custom tools in JS? Extension API?
+All metrics are local-only (no telemetry). Measured via IndexedDB queries on the About page.
+
+| Metric                        | Target                               | How Measured                                      | Why It Matters                            |
+| ----------------------------- | ------------------------------------ | ------------------------------------------------- | ----------------------------------------- |
+| Conversations/week            | >15                                  | Count in IndexedDB by `createdAt`                 | Adoption: are people using conversations? |
+| Quick Action usage            | >40% of conversations start from QA  | `agentId` or first message matches template       | Quick Actions reduce friction             |
+| Agent adoption                | 3+ agents with >5 conversations each | `conversationCount` per agent                     | Agents provide lasting value              |
+| Auto-routing override rate    | <30%                                 | Manual model changes in auto-routed conversations | Router is accurate enough                 |
+| Avg messages per conversation | >4                                   | Message count / conversation count                | Multi-turn engagement                     |
+| Branch usage                  | >10% of conversations have branches  | Messages with `branchIndex > 0`                   | Branching is discoverable                 |
+| Scheduled task success rate   | >90%                                 | `success` in `runHistory`                         | Tasks are reliable                        |

 ---

-## Appendix A: Quick Action Templates (Full Detail)
+## 13. Open Questions

-### Code Review
+| #   | Question                                                                                   | Options                                                  | Recommendation                                                                                            |
+| --- | ------------------------------------------------------------------------------------------ | -------------------------------------------------------- | --------------------------------------------------------------------------------------------------------- |
+| Q1  | **Background cron:** Dashboard open vs. launchd daemon?                                    | (a) Browser-only for v4, (b) Ship launchd from day 1     | **(a)** Browser-only for v4. Add launchd in v4.1 after validating demand.                                 |
+| Q2  | **Code execution:** Sandboxed runner for model-generated code?                             | (a) Defer to v5, (b) Simple eval with timeout            | **(a)** Defer. Security implications are high.                                                            |
+| Q3  | **Web search:** Local search via SearXNG?                                                  | (a) Defer, (b) Optional SearXNG integration              | **(a)** Defer. Requires external service. Out of "zero-deps" philosophy.                                  |
+| Q4  | **Persistent memory:** Agent memory across conversations?                                  | (a) Defer (needs vector DB), (b) Simple key-value memory | **(b)** Consider simple key-value "agent notes" in v4 as lightweight alternative to RAG.                  |
+| Q5  | **Mobile companion:** LAN-based access from phone?                                         | (a) Defer, (b) Expose on LAN with `--hostname 0.0.0.0`   | **(a)** Defer. Security concerns with LAN exposure.                                                       |
+| Q6  | **Plugin system:** User-written tools?                                                     | (a) Defer, (b) Simple JS extension API                   | **(a)** Defer. Need to stabilize core features first.                                                     |
+| Q7  | **Conversation sharing:** Export as shareable HTML?                                        | (a) Markdown only, (b) Self-contained HTML               | **(b)** Nice-to-have. Self-contained HTML with embedded CSS = easy sharing. Consider for v4.1.            |
+| Q8  | **Model fine-tuning triggers:** If user rates many responses, offer to create a Modelfile? | (a) Defer, (b) Suggest Modelfile adjustments             | **(b)** Low-effort: if a model gets 10+ ratings, suggest temperature/system prompt tweaks as a new Agent. |
+
+---
+
+## 14. New Dependencies
+
+| Package                    | Size (gzip)  | Purpose                              | Phase |
+| -------------------------- | ------------ | ------------------------------------ | ----- |
+| `idb`                      | ~1KB         | Thin IndexedDB wrapper with Promises | A     |
+| `fuse.js`                  | ~6KB         | Fuzzy search for command palette     | B     |
+| `cron-parser`              | ~3KB         | Parse and validate cron expressions  | F     |
+| `react-markdown`           | ~12KB        | Already installed (Phase 5)          | —     |
+| `react-syntax-highlighter` | ~40KB (lazy) | Already installed (Phase 5)          | —     |
+
+**Total new bundle cost:** ~10KB gzipped. Minimal impact.
+
+---
+
+## Appendix A: Quick Action Templates (Selected Examples)
+
+Full templates for 4 representative actions. The remaining 26 follow the same YAML structure and are defined in `src/app/lib/quick-actions.ts`.
+
+### Code Review (Code category)

 ```yaml
 name: Code Review
@ -608,13 +1073,14 @@ systemPrompt: |
  readability, and maintainability.
  Always reference specific lines. Suggest concrete fixes, not just problems.
  Rate severity: critical, warning, or suggestion.
+  Format output as a numbered list of findings.
 userTemplate: |
  Review this code:

  {paste or attach file}
 ```

-### Debug Error
+### Debug Error (Code category)

 ```yaml
 name: Debug Error
@ -638,7 +1104,7 @@ userTemplate: |
  {paste relevant code}
 ```

-### Deep Think
+### Deep Think (Analysis category)

 ```yaml
 name: Deep Think
@ -656,7 +1122,7 @@ userTemplate: |
  {describe the problem or question}
 ```

-### Shell Command
+### Shell Command (DevOps category)

 ```yaml
 name: Shell Command
@ -677,4 +1143,23 @@ userTemplate: |

 ---

+## Appendix B: Error Handling & Edge Cases
+
+| Scenario                                 | Behavior                                                                                                              |
+| ---------------------------------------- | --------------------------------------------------------------------------------------------------------------------- |
+| No models installed                      | Show "Install a model" CTA linking to Ollama library. Disable all chat features.                                      |
+| Ollama not running                       | Show offline banner (existing). Conversations still browsable but sending disabled.                                   |
+| Model unloaded mid-conversation          | Detect 404 on stream, auto-reload model, retry once. Show toast: "Model reloaded".                                    |
+| IndexedDB full                           | Show warning toast with storage usage. Suggest archiving old conversations.                                           |
+| IndexedDB unavailable (private browsing) | Fall back to localStorage with 50-conversation cap. Show banner: "Limited storage mode".                              |
+| Voice recording fails                    | Show error: "Microphone access denied" or "Recording failed". Fall back to text.                                      |
+| Whisper not installed                    | Voice button disabled with tooltip: "Install whisper-cpp via Homebrew".                                               |
+| Cron task model not installed            | Task fails, logged to run history: "Model not found: xyz". Notification shown.                                        |
+| Chain orchestration step fails           | Abort chain, show partial results with error at failed step.                                                          |
+| Race mode — one model fails              | Show results from successful models. Failed model shows error card.                                                   |
+| Attachment too large                     | Reject with toast: "File exceeds 10MB limit". Don't send.                                                             |
+| Context window exceeded                  | Warning at 80%, auto-summarize at 95%, hard-fail message: "Context full — start new conversation or compact history". |
+
+---
+
 _This PRD is a living document. Update as features are implemented and user feedback arrives._