From 076449268bef9a50faa1952b12c53f333bb648a0 Mon Sep 17 00:00:00 2001
From: Hermes VM <root@srv1491630>
Date: Sun, 31 May 2026 10:48:52 +0000
Subject: [PATCH] docs(interview): add Senior Agentic RAG Architect prep kit

7-doc kit mapping the JD competency matrix to the ByteLyst ecosystem:
ecosystem-as-RAG-fabric architecture, competency deep-dives, STAR bank,
enhancement roadmap, banking blueprints, and a glossary quick-ref.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 docs/INTERVIEW/01-ecosystem-rag-fabric.md | 279 ++++++++++++++++++++++
 docs/INTERVIEW/02-competency-deepdives.md | 224 +++++++++++++++++
 docs/INTERVIEW/03-star-interview-bank.md  | 143 +++++++++++
 docs/INTERVIEW/04-enhancement-roadmap.md  | 191 +++++++++++++++
 docs/INTERVIEW/05-banking-blueprints.md   | 175 ++++++++++++++
 docs/INTERVIEW/06-glossary-quickref.md    | 112 +++++++++
 docs/INTERVIEW/README.md                  | 113 +++++++++
 7 files changed, 1237 insertions(+)
 create mode 100644 docs/INTERVIEW/01-ecosystem-rag-fabric.md
 create mode 100644 docs/INTERVIEW/02-competency-deepdives.md
 create mode 100644 docs/INTERVIEW/03-star-interview-bank.md
 create mode 100644 docs/INTERVIEW/04-enhancement-roadmap.md
 create mode 100644 docs/INTERVIEW/05-banking-blueprints.md
 create mode 100644 docs/INTERVIEW/06-glossary-quickref.md
 create mode 100644 docs/INTERVIEW/README.md
diff --git a/docs/INTERVIEW/01-ecosystem-rag-fabric.md b/docs/INTERVIEW/01-ecosystem-rag-fabric.md
new file mode 100644
index 0000000..e436fe4
--- /dev/null
+++ b/docs/INTERVIEW/01-ecosystem-rag-fabric.md
@@ -0,0 +1,279 @@
+# 01 · The ByteLyst Ecosystem as an Agentic RAG Fabric
+
+The trick in this interview is to stop treating ByteLyst as "a bunch of side projects"
+and start describing it as **one governed retrieval fabric with multiple agentic
+front-ends**. Every diagram below is something you can reproduce on a whiteboard.
+
+---
+
+## 1. System context — what we actually run
+
+```mermaid
+flowchart TB
+    subgraph Users["👤 Humans & Agents"]
+        U1[End users<br/>web / mobile]
+        U2[Coding agents<br/>claude · codex · devin]
+        U3[Operators<br/>Hermes Mission Control]
+    end
+
+    subgraph Fronts["Agentic Product Surfaces"]
+        P1["invt_trdg<br/>AI trading chat<br/>(tool-calling over markets)"]
+        P2["flowmonk<br/>planning + bounded AI layer"]
+        P3["notelett<br/>notes for humans + agents"]
+        P4["chronomind<br/>contextual time AI"]
+    end
+
+    subgraph Platform["common_plat — the shared fabric"]
+        PS["platform-service :4003<br/>auth · flags · telemetry · billing · blob"]
+        ES["extraction-service :4005<br/>URL / doc → retrievable units"]
+        MCP["mcp-server :4007<br/>tool / resource registration"]
+        LR["packages/llm-router<br/>provider abstraction"]
+    end
+
+    subgraph Data["Governed Data Sources"]
+        DB[("Cosmos DB<br/>docs + Gremlin graph")]
+        PG[("Postgres<br/>structured + pgvector*")]
+        EV[("event-store<br/>immutable audit")]
+        BLOB[("blob<br/>raw documents")]
+    end
+
+    subgraph Ops["Control Plane"]
+        HERMES["Hermes Mission Control<br/>(devops_tools/dashboard)"]
+        AQ["agent-queue<br/>multi-agent runner"]
+    end
+
+    U1 --> P1 & P2 & P3 & P4
+    U2 --> AQ
+    U3 --> HERMES
+    P1 & P2 & P3 & P4 --> PS
+    P1 & P2 & P3 & P4 --> MCP
+    MCP --> LR
+    ES --> BLOB
+    PS --> DB & PG & EV
+    MCP --> DB & PG
+    AQ --> MCP
+    HERMES --> PS
+    HERMES -.observes.-> ES & MCP & LR
+
+    classDef plan fill:#fef3c7,stroke:#d97706
+    class PG,LR plan
+```
+
+> `*` pgvector and the Gremlin graph are the planned hardening (see `04-enhancement-roadmap.md`).
+> Everything else is a real, deployed component of the ecosystem.
+
+**How to narrate it:** *"The platform-service is my policy/identity plane, the mcp-server
+is my tool-boundary plane, llm-router is my model plane, and the data sources are governed
+behind both. Any product surface is just a thin agentic UI over that fabric — which is
+exactly the shape of an enterprise agentic-RAG platform."*
+
+---
+
+## 2. The reference agentic-RAG container view
+
+This is the canonical picture the interviewer wants to see — drawn in *our* components.
+
+```mermaid
+flowchart LR
+    Q[User query] --> ROUTER
+
+    subgraph Orchestration["Agentic Orchestration (LangGraph-shaped)"]
+        ROUTER{{"Router / planner agent<br/>intent + complexity"}}
+        RETR["Retriever agent"]
+        GRADE{{"Relevance grader<br/>(CRAG gate)"}}
+        REWRITE["Query rewriter<br/>(HyDE)"]
+        GEN["Generator agent<br/>+ citation enforcer"]
+        CRITIC{{"Self-RAG critic<br/>groundedness check"}}
+    end
+
+    subgraph Retrieval["Hybrid Retrieval Fabric"]
+        VEC[("Vector<br/>pgvector / Azure AI Search")]
+        BM25[("Lexical<br/>BM25")]
+        GRAPH[("Graph traversal<br/>Cosmos Gremlin")]
+        SQL[("Structured<br/>schema-aware SQL tool")]
+        RERANK["Cross-encoder rerank<br/>+ context compression"]
+    end
+
+    subgraph Gov["Governance plane (every hop)"]
+        ACL["Access-controlled retrieval<br/>auth + row/col masking"]
+        AUDIT["event-store audit trail"]
+        KILL["kill-switch / flags"]
+    end
+
+    ROUTER --> RETR
+    RETR --> VEC & BM25 & GRAPH & SQL
+    VEC & BM25 & GRAPH & SQL --> RERANK
+    RERANK --> GRADE
+    GRADE -- "low relevance" --> REWRITE --> RETR
+    GRADE -- "ok" --> GEN
+    GEN --> CRITIC
+    CRITIC -- "ungrounded" --> REWRITE
+    CRITIC -- "grounded + cited" --> A[Answer + citations]
+
+    RETR -.enforced by.-> ACL
+    GEN -.logged to.-> AUDIT
+    ROUTER -.gated by.-> KILL
+```
+
+**Key talking points keyed to the JD:**
+- *Hybrid search (vector + BM25 + graph)* → the four parallel retrievers fan out and the reranker fans in.
+- *Reranking + context compression* → the `RERANK` node (a cross-encoder such as a bge-reranker, or a ColBERT-style late-interaction scorer).
+- *CRAG* → the `GRADE` gate that triggers corrective re-retrieval.
+- *HyDE* → the `REWRITE` node generating a hypothetical answer to embed.
+- *Self-RAG* → the `CRITIC` node reflecting on groundedness before release.
+- *Access-controlled retrieval / Zero Trust / audit* → the governance plane wraps **every** hop, not just the entrance.
+
+---
+
+## 3. Multi-agent orchestration topology (we run a real one)
+
+`agent-queue/` is a production folder-kanban that drives **three different agent engines**
+(`claude`, `codex`, `devin`) through an explicit state machine. That *is* multi-agent
+orchestration — and it's the strongest "I've shipped agents" story you have.
+
+```mermaid
+stateDiagram-v2
+    [*] --> inbox: drop prompt .md
+    inbox --> doing: runner claims (auto-approve)
+    doing --> done: success
+    doing --> failed: error / timeout
+    failed --> inbox: requeue (human-in-loop)
+    done --> [*]
+
+    note right of doing
+      Engine selected per task:
+      claude · codex · devin
+      = heterogeneous agent pool
+    end note
+```
+
+Map this to LangGraph vocabulary in the room:
+
+| agent-queue concept | LangGraph / agentic equivalent |
+|---|---|
+| `inbox/doing/done/failed` folders | graph **nodes** / state enum |
+| runner claiming + transitioning | **conditional edges** |
+| engine flag (claude/codex/devin) | **tool/agent binding** per node |
+| `failed → inbox` requeue | **cyclic edge** w/ human-in-the-loop checkpoint |
+| live `status`/`watch` | **state checkpointer** + observability |
+
+> Honest framing: *"I built this deliberately framework-light to stay bash-portable and
+> dependency-free. The state model maps directly onto LangGraph's; porting it onto
+> `StateGraph` mostly buys me typed state, built-in checkpointing, and a cleaner base for
+> an A2A handoff contract, which is exactly the enhancement I've scoped."*
+
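+If the interviewer asks for the transition rules explicitly, the diagram above can be written down as a small table. This is a minimal Python sketch for whiteboard use only: `State`, `TRANSITIONS`, and `move` are hypothetical names, and the real `agent-queue` runner is bash, not this code.
+
+```python
+from enum import Enum
+
+
+class State(Enum):
+    INBOX = "inbox"
+    DOING = "doing"
+    DONE = "done"
+    FAILED = "failed"
+
+
+# Allowed edges, copied from the state diagram (assumed, not read from agent-queue).
+TRANSITIONS = {
+    State.INBOX: {State.DOING},
+    State.DOING: {State.DONE, State.FAILED},
+    State.FAILED: {State.INBOX},  # human-in-the-loop requeue
+    State.DONE: set(),            # terminal
+}
+
+
+def move(current: State, target: State) -> State:
+    """Return the new state, or raise if the edge is not in the diagram."""
+    if target not in TRANSITIONS[current]:
+        raise ValueError(f"illegal transition {current.value} -> {target.value}")
+    return target
+```
+
+Keeping the edges in one table is the point to make in the room: the same table is what a conditional-edge function would consult after a port to a graph framework.
+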
+---
+
+## 4. MCP server — Zero-Trust tool boundary
+
+This is your strongest *governance* asset and a direct hit on a Preferred Qualification
+("MCP server architecture, tool/resource registration patterns, agentic security threat
+modeling"). We run `mcp-server` on :4007 with `packages/mcp-client`.
+
+```mermaid
+flowchart TB
+    subgraph Agent["Agent (untrusted by default)"]
+        A[LLM reasoning loop]
+    end
+
+    subgraph Boundary["mcp-server :4007 — policy enforcement point"]
+        REG["Tool / resource registry<br/>(declared, typed, versioned)"]
+        AUTHZ{"AuthZ check<br/>identity + scope + role"}
+        MASK["Row/column masking<br/>field-encrypt"]
+        RATE["Rate / cost limits + kill-switch"]
+        LOG["Audit emit → event-store"]
+    end
+
+    subgraph Resources["Governed resources"]
+        T1[Market data tool]
+        T2[Doc retrieval tool]
+        T3[Graph query tool]
+        T4[Text-to-SQL tool<br/>read-only views]
+    end
+
+    A -- "tool call (intent)" --> REG
+    REG --> AUTHZ
+    AUTHZ -- deny --> A
+    AUTHZ -- allow --> MASK
+    MASK --> RATE
+    RATE --> T1 & T2 & T3 & T4
+    T1 & T2 & T3 & T4 --> LOG
+    LOG --> A
+```
+
+**Threat-model talking points** (say these — they signal seniority):
+- **Confused-deputy:** the agent never holds raw credentials; the MCP server calls each tool under the *user's* scoped identity, so a tool can't be tricked into over-broad reads.
+- **Tool-poisoning / prompt injection via retrieved content:** retrieved text is treated as data, never as instructions; the generator is sandboxed from re-invoking tools without re-passing the AuthZ gate.
+- **Exfiltration:** column masking + egress logging means even a successful injection can't surface PII it wasn't entitled to.
+- **Blast radius:** `kill-switch-client` lets us disable a model or a single tool instantly without redeploying, which lines up with SR 11-7's expectation that a bank can limit or suspend a model's use when monitoring or validation surfaces a problem.
+
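+The enforcement order is easier to defend with a concrete shape in mind. The sketch below is a toy policy enforcement point in Python, not the real `mcp-server`: `Caller`, `Boundary`, and the tuple-based audit list are hypothetical, and it makes two assumptions beyond the diagram, namely that masking is applied to tool output and that denials are audited as well as allowed calls.
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class Caller:
+    user_id: str
+    scopes: set[str]
+
+
+@dataclass
+class Boundary:
+    tools: dict                    # tool name -> (required scope, handler); hypothetical
+    masked_fields: set[str]        # fields the agent must never see in clear
+    disabled: set[str] = field(default_factory=set)  # kill-switch state
+    audit: list = field(default_factory=list)        # stand-in for event-store
+
+    def call(self, caller: Caller, tool: str, **args) -> dict:
+        scope, handler = self.tools[tool]
+        # AuthZ and kill-switch are checked on every call, before the tool runs.
+        if tool in self.disabled or scope not in caller.scopes:
+            self.audit.append((caller.user_id, tool, "deny"))
+            raise PermissionError(f"{tool} denied for {caller.user_id}")
+        row = handler(**args)
+        # Mask on the way out so unentitled fields never reach the agent.
+        safe = {k: ("***" if k in self.masked_fields else v) for k, v in row.items()}
+        self.audit.append((caller.user_id, tool, "allow"))
+        return safe
+```
+
+The design choice worth naming: the agent only ever receives the return value of `call`, so masking and audit cannot be skipped by a cleverly worded prompt.
+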
+---
+
+## 5. Governance & grounding plane (the part that wins regulated deals)
+
+```mermaid
+flowchart LR
+    subgraph Ingest["Ingestion governance"]
+        CLASS["Data classification<br/>(public / internal / PII)"]
+        EMB["Embedding + metadata tags<br/>tenant · sensitivity · source"]
+    end
+    subgraph Query["Query-time governance"]
+        IDENT["Caller identity + role"]
+        FILTER["Namespace + ACL filter<br/>(pre-retrieval)"]
+        RETR2["Retrieve only entitled chunks"]
+    end
+    subgraph Answer["Answer governance"]
+        CITE["Mandatory citation<br/>(source attribution)"]
+        FAITH["Faithfulness score<br/>(RAGAS / LLM-as-judge)"]
+        CARD["Model card + decision log"]
+    end
+    CLASS --> EMB --> RETR2
+    IDENT --> FILTER --> RETR2 --> CITE --> FAITH --> CARD
+    FAITH -- "below SLA" --> ABSTAIN["Abstain / escalate to human"]
+```
+
+This single diagram covers four JD bullets at once: **access-controlled retrieval**,
+**citation/source attribution**, **faithfulness SLAs**, and **model cards / audit**.
+The `ABSTAIN` branch is the line that separates a demo from a regulated system — *"in
+banking, a confident wrong answer is a worse outcome than 'I don't know, here's a human.'"*
+
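+The `ABSTAIN` branch reduces to a very small gate. Here is a minimal Python sketch, assuming a faithfulness score in the range 0 to 1 has already been produced by whatever evaluator is in use; `Draft`, `release`, and the `0.9` default are illustrative placeholders, not a measured SLA.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class Draft:
+    text: str
+    citations: list[str]
+    faithfulness: float  # 0..1, supplied by an external evaluator (assumed)
+
+
+def release(draft: Draft, threshold: float = 0.9) -> dict:
+    """Release an answer only if it is cited and meets the faithfulness threshold."""
+    if not draft.citations or draft.faithfulness < threshold:
+        # Below SLA or uncited: abstain and hand off rather than guess.
+        return {"status": "abstain", "answer": None, "escalate_to": "human"}
+    return {"status": "answered", "answer": draft.text, "citations": draft.citations}
+```
+
+Note that a missing citation abstains even when the score is high, which matches the "mandatory citation" node sitting upstream of the score in the diagram.
+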
+---
+
+## 6. Multi-tenant / namespace isolation (already a real concern here)
+
+We *already* think in tenants: every product has a `productId`, and Hermes runs **two
+isolated instances (Vijay / Bheem)** with separate users, services, and backup repos. That
+is the same isolation discipline a vector DB needs.
+
+```mermaid
+flowchart TB
+    subgraph T_A["Tenant A (productId=invt_trdg)"]
+        NSA["Vector namespace A"]
+        GA["Graph partition A"]
+        SA["SQL schema A (RLS)"]
+    end
+    subgraph T_B["Tenant B (productId=notelett)"]
+        NSB["Vector namespace B"]
+        GB["Graph partition B"]
+        SB["SQL schema B (RLS)"]
+    end
+    POLICY["platform-service<br/>tenant resolver + auth"] --> NSA & NSB & GA & GB & SA & SB
+```
+
+> *"Namespace isolation isn't a vector-DB feature I'd discover late — it's how the whole
+> platform is partitioned. Pinecone namespaces / Azure AI Search index-per-tenant /
+> pgvector schema-per-tenant are just the storage expression of a `productId` model I
+> already run."*
+
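+One way to make "isolation is structural" concrete is to show that the tenant filter runs before ranking, never after. The Python sketch below is a toy in-memory version under stated assumptions: `Chunk`, `retrieve`, and the precomputed `score` are hypothetical, and a real store would push the `product_id` filter into the index query.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class Chunk:
+    product_id: str
+    text: str
+    score: float  # similarity score, assumed precomputed
+
+
+def retrieve(chunks: list[Chunk], product_id: str, k: int = 3) -> list[Chunk]:
+    """Filter to the caller's tenant first, then rank the entitled chunks."""
+    entitled = [c for c in chunks if c.product_id == product_id]
+    return sorted(entitled, key=lambda c: c.score, reverse=True)[:k]
+```
+
+Filtering first means a higher-scoring chunk from another tenant can never enter the candidate set, so there is nothing for a later step to leak.
+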
+---
+
+## Cheat-sheet: which diagram answers which question
+
+| If they ask… | Draw |
+|---|---|
+| "Walk me through your RAG architecture" | §2 container view |
+| "How do you orchestrate multiple agents?" | §3 state machine |
+| "How is this secure / Zero Trust?" | §4 MCP boundary |
+| "How do you prevent hallucination in production?" | §2 `CRITIC` loop + §5 governance plane (`ABSTAIN`) |
+| "How do you handle multi-tenancy at scale?" | §6 isolation |
+| "What does your whole platform look like?" | §1 context |
diff --git a/docs/INTERVIEW/02-competency-deepdives.md b/docs/INTERVIEW/02-competency-deepdives.md
new file mode 100644
index 0000000..3da773f
--- /dev/null
+++ b/docs/INTERVIEW/02-competency-deepdives.md
@@ -0,0 +1,224 @@
+# 02 · Competency Deep-Dives
+
+One section per competency-matrix row. Each gives you: **the concept** (so you sound
+fluent), **our anchor** (so it's credible), **say-this** talking points, and the **honest
+edge** (where to pivot to roadmap rather than overclaim).
+
+---
+
+## A. Agentic Frameworks — LangGraph / LangChain (must) · ADK / A2A / AutoGen (nice)
+
+**Concept.** LangGraph models an agent as a **state graph**: typed shared state, nodes
+(LLM calls / tools), and conditional + cyclic edges, with a checkpointer for durability
+and human-in-the-loop. LangChain supplies the primitives (tool binding, structured output,
+retrievers). ADK is Google's agent SDK; **A2A** is an open agent-to-agent interop protocol
+(agent cards, task lifecycle, message/artifact exchange). AutoGen favors conversational
+multi-agent loops.
+
+**Our anchor.** `agent-queue/` is a *real* multi-engine orchestrator (claude·codex·devin)
+with an explicit `inbox→doing→done/failed` state machine and requeue cycle — see
+`01-ecosystem-rag-fabric.md §3`. `packages/mcp-client` + `mcp-server` provide tool binding;
+`packages/llm-router` is the model-selection layer a node would call.
+
+**Say this.**
+- "LangGraph's value over a raw loop is **typed state + conditional/cyclic edges + checkpointing** — which is exactly the trio I'd want for a CRAG/Self-RAG loop that re-retrieves."
+- "I treat **routing** as a first-class node: a cheap classifier decides single-shot vs. multi-hop vs. tool-only, so I'm not paying for a full agent loop on a FAQ."
+- "For A2A I'd expose each of our product agents with an **agent card** (capabilities, auth, cost) so a supervisor agent can delegate — that's the natural evolution of agent-queue's engine flag."
+
+**Honest edge.** "My shipped orchestration is framework-light by choice (bash-portable).
+The LangGraph port is scoped (`04 §A`); conceptually it's a 1:1 mapping, not new ground."
+
+---
+
+## B. RAG Architecture — hybrid retrieval, reranking, HyDE, Self-RAG (must) · RAPTOR, multimodal (nice)
+
+**Concept.**
+- **Hybrid retrieval** = dense (vector) ⊕ sparse (BM25) ⊕ optionally graph, fused (RRF — reciprocal rank fusion) then **reranked** by a cross-encoder (scores the (query, passage) pair jointly; far more precise than bi-encoder similarity). **ColBERT** = late-interaction reranking (token-level MaxSim) — accurate and scalable.
+- **Context compression** = drop/condense retrieved spans to fit the budget and reduce distraction.
+- **HyDE** = embed a *hypothetical answer* the LLM drafts, not the raw question — closes the query/document vocabulary gap.
+- **CRAG** = grade retrieved docs; if weak, correct (web/alt-source or rewrite) before generating.
+- **Self-RAG** = the model emits reflection tokens deciding *whether to retrieve* and *whether its draft is supported*, looping if not.
+- **RAPTOR** = recursively cluster + summarize chunks into a tree; retrieve at the abstraction level the query needs (great for "summarize this 200-page filing").
+
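+Reciprocal rank fusion from the first bullet is short enough to write from memory. A minimal Python sketch, assuming each retriever returns a ranked list of document ids; the function name `rrf` is illustrative and `k=60` is the conventional smoothing constant, not a tuned value.
+
+```python
+from collections import defaultdict
+
+
+def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
+    """Fuse ranked id lists: score(d) = sum over lists of 1 / (k + rank of d)."""
+    scores: dict[str, float] = defaultdict(float)
+    for ranking in rankings:
+        for rank, doc_id in enumerate(ranking, start=1):
+            scores[doc_id] += 1.0 / (k + rank)
+    # Highest fused score first; ties broken by id so the output is deterministic.
+    return sorted(scores, key=lambda d: (-scores[d], d))
+```
+
+Because RRF uses only ranks, it needs no score normalization between the vector and BM25 lists, which is the usual reason to pick it as the fusion step ahead of the reranker.
+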
+**Our anchor.** `packages/extraction` + `extraction-service` (:4005) already turn URLs/docs
+into retrievable units. `invt_trdg`'s AI chat is a retrieve-then-reason loop over structured
+data today. The hybrid index + rerank + HyDE/CRAG/Self-RAG loop is the headline enhancement
+(`04 §B`).
+
+**Say this.**
+- "I default to **hybrid + rerank** because pure-vector misses exact-match terms (a regulatory clause number, a ticker, an account ID) that BM25 nails."
+- "HyDE and reranking attack **different** failures: HyDE fixes *recall* (the right passage never made the candidate set), reranking fixes *precision* (the right passage is buried under noise). I tune them independently against context-recall vs. context-precision."
+- "RAPTOR earns its cost on **long regulatory corpora** where the answer spans sections; for transactional Q&A it's overkill."
+
+**Honest edge.** Lead with the *reasoning about when to use each*, which is the architect
+signal; the implementations are well-trodden and scoped on our roadmap.
+
+---
+
+## C. Structured Retrieval — Text-to-SQL, schema-aware retrieval (must) · Snowflake Cortex, BigQuery ML (nice)
+
+**Concept.** Text-to-SQL turns NL into SQL against a known schema. The enterprise risks are
+**wrong joins, full-table scans, and data leakage**. Mitigations: restrict to **read-only
+semantic views**, inject **schema + few-shot exemplars** (schema-aware retrieval over table/
+column descriptions), validate/parse the SQL before execution, enforce **row-level security**,
+and cap cost. Warehouse-native options (Snowflake Cortex, BigQuery ML) push inference next to
+governed data.
+
+**Our anchor.** `invt_trdg`'s AI chat is **schema-aware tool-calling** — NL maps to typed
+operations (get quote, create trade plan, manage watchlist/alerts) over a known domain. That's
+the *safe* form of Text-to-SQL: the model picks a vetted, parameterized tool rather than
+emitting arbitrary SQL.
+
+**Say this.**
+- "I prefer **typed tool-calling over free Text-to-SQL** wherever the query space is bounded — it's auditable and injection-resistant. I reserve generative SQL for genuine ad-hoc analytics, behind read-only views with RLS."
+- "Schema-aware retrieval is itself a RAG problem: I embed table/column docs and retrieve the *relevant* schema slice into the prompt, so the model isn't drowning in a 400-table catalog."
+
+**Honest edge.** "I've prototyped free Text-to-SQL against a warehouse more than I've shipped it;
+in production I've leaned on the tool-calling pattern because it's safer with regulated data."
+
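+The contrast between typed tool-calling and free SQL can be sketched in a few lines. In the Python below, everything is hypothetical: `get_quote`, `add_to_watchlist`, the `TOOLS` registry, and the JSON call shape are illustrative stand-ins, not the actual `invt_trdg` operations.
+
+```python
+import json
+
+
+def get_quote(symbol: str) -> dict:
+    # Price lookup omitted; a real handler would query the market data source.
+    return {"symbol": symbol.upper(), "price": None}
+
+
+def add_to_watchlist(symbol: str) -> dict:
+    return {"added": symbol.upper()}
+
+
+# Registry of vetted operations: name -> (handler, expected argument types).
+TOOLS = {
+    "get_quote": (get_quote, {"symbol": str}),
+    "add_to_watchlist": (add_to_watchlist, {"symbol": str}),
+}
+
+
+def dispatch(model_output: str) -> dict:
+    """Run a model-proposed call only if it names a vetted tool with matching typed args."""
+    call = json.loads(model_output)
+    if call.get("tool") not in TOOLS:
+        raise ValueError(f"unknown tool: {call.get('tool')!r}")
+    func, schema = TOOLS[call["tool"]]
+    args = call.get("args", {})
+    if set(args) != set(schema) or not all(isinstance(args[n], t) for n, t in schema.items()):
+        raise ValueError("arguments do not match the tool schema")
+    return func(**args)
+```
+
+The model can only choose among registered names and fill typed slots, so there is no string of SQL for an injection to ride on.
+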
+---
+
+## D. Unstructured Retrieval — PDF parsing, layout-aware chunking (must) · multimodal (nice)
+
+**Concept.** Naïve "split every 1000 chars" chunking destroys meaning. **Layout-aware**
+parsing (PyMuPDF, Unstructured.io) preserves headings, tables, lists, reading order; **OCR**
+(Tesseract/Azure AI Document Intelligence) handles scans. **Semantic chunking** splits on topic
+boundaries, not byte counts. Tables and figures need special handling (serialize tables to
+markdown; caption images). Each chunk carries **provenance metadata** (doc id, page, section)
+— mandatory for citations.
+
+**Our anchor.** `packages/extraction` + `extraction-service` already perform URL/task/doc
+extraction; `notelett` models structured notes. Layout-aware PDF + OCR is the additive piece
+(`04 §B`).
+
+**Say this.**
+- "Chunking is where most RAG quality is won or lost. I chunk on **document structure first, size second**, and I always attach page/section provenance so the answer can cite a clause, not 'a document.'"
+- "For regulatory filings I keep **tables intact** and store a text serialization alongside — losing a covenant table to a mid-row split is a correctness bug, not a formatting one."
+
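+"Structure first, size second" can be shown with a toy chunker. This Python sketch assumes the parser has already produced Markdown-like text with `#` headings; `Chunk` and `chunk_by_structure` are hypothetical names, and page numbers are left out because this input format does not carry them.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class Chunk:
+    doc_id: str
+    section: str  # provenance: the heading this text sits under
+    text: str
+
+
+def chunk_by_structure(doc_id: str, text: str, max_chars: int = 500) -> list[Chunk]:
+    """Split on '#' headings first; cut a section by size only if it is too long."""
+    chunks: list[Chunk] = []
+    section = "preamble"
+    buf: list[str] = []
+
+    def flush() -> None:
+        body = "\n".join(buf).strip()
+        # Size is the second criterion: window an oversized section.
+        for start in range(0, len(body), max_chars):
+            chunks.append(Chunk(doc_id, section, body[start:start + max_chars]))
+        buf.clear()
+
+    for line in text.splitlines():
+        if line.startswith("#"):
+            flush()  # close the previous section under its own heading
+            section = line.lstrip("#").strip()
+        else:
+            buf.append(line)
+    flush()
+    return chunks
+```
+
+The fixed-width window is the weak spot of this sketch: a production version would cut on sentence or table-row boundaries so a table is never split mid-row.
+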
+---
+
+## E. Graph RAG — KG + vector hybrid (must) · SPARQL / ontology design (nice)
+
+**Concept.** Vector RAG finds *similar* text; it can't answer *"which entities are connected
+to X within 2 hops."* Graph RAG retrieves a **subgraph** (entities + relationships) and feeds
+it as structured context — ideal for "show the ownership chain," "trace this transaction
+counterparty network," KYC/AML link analysis. Patterns: extract entities/relations at ingest,
+build a knowledge graph, retrieve by **graph traversal (Gremlin/Cypher) + vector seed**, then
+generate over the fused context. SPARQL/ontologies add formal semantics (RDF, classes).
+
+**Our anchor.** We run **Azure Cosmos DB** (`packages/cosmos`), which offers a **Gremlin**
+graph API (selected per account), and the JD lists "Azure Cosmos Gremlin" explicitly. `event-store`/`events` already
+model entity relationships over time. Standing up a Gremlin KG + graph-augmented retrieval is
+`04 §D`.
+
+**Say this.**
+- "I use the graph for the **'connected-to' questions** vector search can't answer, and seed traversals from vector hits — vector finds the entry node, the graph supplies the neighborhood."
+- "In banking this is the **AML/KYC** sweet spot: link analysis across counterparties is inherently graph-shaped."
+
+**Honest edge.** "Cosmos DB is in our stack today and the Gremlin graph is still planned work;
+the KG is a clean build on infrastructure we already operate, not a new platform bet."
+
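+The "vector finds the entry node, the graph supplies the neighborhood" idea is a bounded breadth-first expansion. A minimal Python sketch, assuming an in-memory adjacency dict as a stand-in for a graph store; `k_hop_subgraph` and the entity names in any example are hypothetical, and a real build would issue a Gremlin traversal instead.
+
+```python
+from collections import deque
+
+
+def k_hop_subgraph(edges: dict[str, list[str]], seeds: list[str], hops: int = 2) -> set[str]:
+    """Breadth-first expansion from seed entities, up to `hops` edges away."""
+    seen = set(seeds)
+    frontier = deque((s, 0) for s in seeds)
+    while frontier:
+        node, depth = frontier.popleft()
+        if depth == hops:
+            continue  # do not expand past the hop budget
+        for neighbor in edges.get(node, []):
+            if neighbor not in seen:
+                seen.add(neighbor)
+                frontier.append((neighbor, depth + 1))
+    return seen
+```
+
+The `seeds` list is where the vector hits plug in; the hop budget is what keeps the retrieved context from growing to the whole graph.
+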
+---
+
+## F. Vector Databases — Pinecone / Weaviate / Azure AI Search (must) · Qdrant, pgvector, multi-tenancy (nice)
+
+**Concept.** Trade-offs: **Pinecone** (managed, serverless, fast to ship), **Weaviate/Qdrant**
+(open, hybrid + filtering, self-host control), **Azure AI Search** (managed hybrid: vector +
+BM25 + semantic rerank in one service — the natural Azure pick), **pgvector** (lives in your
+Postgres → transactional consistency, one backup story, lowest ops). **Multi-tenancy** =
+namespaces (Pinecone) / index-per-tenant (Azure) / collection or payload filter (Qdrant) /
+schema or row filter (pgvector). Metadata filtering is as important as the ANN index.
+
+**Our anchor.** Postgres is in the stack → **pgvector** is our lowest-friction path and gives
+transactional consistency with the source rows. Multi-tenancy is already first-class
+(`productId`, two-instance Hermes) — see `01 §6`.
+
+**Say this.**
+- "On Azure I'd default to **Azure AI Search** because it gives me hybrid + semantic reranking as a managed service — fewer moving parts in a regulated audit boundary."
+- "For tight transactional coupling (the vectors must stay consistent with the source-of-truth rows) I reach for **pgvector** — one database, one backup, one ACL model."
+- "I pick the vector DB **last**, after I know tenant count, filter cardinality, recall target, and the audit boundary — it's a consequence of requirements, not a religion."
+
+---
+
+## G. Grounding & Eval — RAGAS, TruLens, faithfulness SLAs (must) · LangSmith, LLM-as-judge (nice)
+
+**Concept.** The core metrics:
+- **Faithfulness / groundedness** — is every claim supported by retrieved context? (anti-hallucination)
+- **Answer relevancy** — does the answer address the question?
+- **Context precision** — are the *top* retrieved chunks the relevant ones? (reranker quality)
+- **Context recall** — did we retrieve *all* needed evidence? (retriever quality)
+- **Answer correctness** — vs. ground truth.
+
+**RAGAS** computes these (often LLM-as-judge). **TruLens** adds the "RAG triad" + feedback
+functions + tracing. **DeepEval** is pytest-style assertions for CI. **LangSmith** = tracing/
+eval ops. An **SLA** turns metrics into gates ("faithfulness ≥ 0.9 or abstain").
+
+**Our anchor.** `flowmonk` is a *grounding pattern made flesh*: the deterministic scheduler is
+the source of truth, and **the AI layer is constrained to explanation / safe recommendation** —
+it cannot invent a plan, only narrate one. That's hallucination mitigation by architecture.
+`diagnostics-client` / `telemetry-client` / `monitoring` + Hermes dashboards are the
+ready-made home for an eval harness + drift monitor (`04 §E`).
+
+**Say this.**
+- "I don't ship grounding as a vibe — I ship it as **gates**: a faithfulness threshold that triggers abstain-and-escalate, evaluated continuously, with regressions blocking deploy in CI (DeepEval)."
+- "flowmonk taught me the cheapest hallucination fix: **don't let the LLM be the source of truth.** Make a deterministic engine authoritative and scope the model to explanation. In banking that's the difference between a tool and a liability."
+- "Eval is a **two-loop** system: offline (golden set in CI) catches regressions pre-deploy; online (sampled production traces + LLM-judge) catches **factual drift** as corpora and models change."
+
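+For the offline loop, the two retrieval metrics can be approximated with plain set arithmetic over a golden set. The Python below is a simplified sketch, assuming labeled relevant chunk ids per question; it is not the RAGAS formula (which is rank-weighted and typically LLM-judged), and both function names are illustrative.
+
+```python
+def context_precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
+    """Share of the top-k retrieved chunks that are labeled relevant."""
+    top = retrieved[:k]
+    if not top:
+        return 0.0
+    return sum(1 for doc_id in top if doc_id in relevant) / len(top)
+
+
+def context_recall(retrieved: list[str], relevant: set[str]) -> float:
+    """Share of the labeled relevant chunks that were retrieved at all."""
+    if not relevant:
+        return 1.0  # nothing was needed, so nothing was missed
+    return len(relevant & set(retrieved)) / len(relevant)
+```
+
+Tracking the two numbers separately is what makes the reranker-vs-retriever diagnosis possible: low precision with high recall points at ranking, the reverse points at retrieval.
+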
+---
+
+## H. Cloud Platform — Azure (AI Foundry / OpenAI / Search) (must) · AWS Bedrock, GCP Vertex (nice)
+
+**Concept.** Azure AI Foundry = the build/eval/deploy hub; Azure OpenAI = governed model
+endpoints with content filtering and data-residency; Azure AI Search = managed hybrid index;
+Cosmos DB = docs + Gremlin graph. The architect skill is **provider-portability**: abstract the
+model behind a router so Azure OpenAI / Bedrock / Vertex are swappable, and keep data in the
+customer's tenancy.
+
+**Our anchor.** Azure Cosmos DB in production (`_AZURE/`, `packages/cosmos`).
+`packages/llm-router` is the provider-abstraction layer — Azure OpenAI today, Bedrock/Vertex
+swap-in, no app rewrite. `packages/ollama-client` covers on-prem/air-gapped inference.
+
+**Say this.**
+- "I keep a **router seam** so model choice is a config decision, not an architecture decision — and so a bank can pin everything to its own Azure tenant for residency."
+- "On-prem / air-gapped is real in banking; `ollama-client` in our stack means I've thought about the **no-egress** deployment, not just the SaaS path."
+
+---
+
+## I. AI Governance — access-controlled RAG, Zero Trust (must) · SR 11-7, EU AI Act (nice)
+
+**Concept.** Governance is **structural**, applied at every hop (see `01 §4–5`):
+access-controlled retrieval (you can only retrieve what your identity entitles you to),
+row/column masking, role-aware context injection, immutable audit, instant kill-switch,
+model cards, RACI. Zero Trust for agents = never trust the agent's request implicitly; verify
+identity + scope at the tool boundary every call.
+
+**Our anchor.** `packages/auth` + `fastify-auth` (identity/scope), `field-encrypt` /
+`client-encrypt` (column/field masking), `feature-flag-client` + `kill-switch-client` (instant
+constraint), `event-store` (immutable audit), MCP server (explicit tool boundary). This is the
+**deepest, most defensible** part of the story.
+
+**Say this.**
+- "I design retrieval so that **masking and audit are inescapable** — they live at the platform/MCP layer, so no product surface can route around them. That's what 'governance by architecture' means."
+- "A kill-switch that disables a model or a single tool **without a redeploy** maps directly to SR 11-7's model-use controls: when monitoring or validation flags a problem, you need to be able to limit or suspend the model in production immediately."
+
+(Full regulatory mapping in `05-banking-blueprints.md`.)
+
+---
+
+## J. Domain: Banking — support / compliance automation (must) · model risk, KYC/AML (nice)
+
+**Concept.** The flagship use cases: **customer-support automation** (grounded answers from
+policy/product docs with citations + escalation), **compliance-document retrieval** (find the
+controlling clause across filings/policies), **regulatory reporting**, **model risk management**
+(SR 11-7), **KYC/AML** (entity/network link analysis → graph RAG).
+
+**Our anchor.** `invt_trdg` is our **regulated-industry analog**: market data, trade plans,
+alerts, profiles — a domain where wrong answers have consequences and auditability matters. The
+patterns (typed tool-calling, audit, abstain-on-uncertainty) port directly to a bank.
+
+**Say this.**
+- "My trading product is the closest non-bank analog to a banking workload: it taught me to **default to abstain over guess** when money or compliance is on the line, and to make every answer traceable to a source."
+- "KYC/AML is where my graph-RAG and my governance stories converge — link analysis on a knowledge graph, behind access-controlled, audited retrieval."
+
+**Honest edge.** "I haven't shipped inside a chartered bank; my regulated-domain reps come from
+trading and from designing for SR 11-7 / EU AI Act controls — which I can walk through concretely."
diff --git a/docs/INTERVIEW/03-star-interview-bank.md b/docs/INTERVIEW/03-star-interview-bank.md
new file mode 100644
index 0000000..c74e625
--- /dev/null
+++ b/docs/INTERVIEW/03-star-interview-bank.md
@@ -0,0 +1,143 @@
+# 03 · STAR Interview Bank
+
+Twelve stories, each grounded in real ByteLyst work, in **Situation · Task · Action ·
+Result** form, tagged to the JD competency they prove. Keep delivery to ~90 seconds; the
+**bold** line is your headline if you only get 20 seconds.
+
+> Integrity note: these describe real systems in this ecosystem (agent-queue, mcp-server,
+> llm-router, invt_trdg AI chat, flowmonk, Hermes, extraction-service, two-instance
+> isolation). Where a story references planned work, it's labeled — present those as
+> *design decisions and roadmaps you own*, not as shipped-and-measured outcomes.
+
+---
+
+## 1. Multi-agent orchestration without a heavy framework
+**Proves:** Agentic frameworks · orchestration topology · state-machine design
+
+- **S** — We needed to run long-horizon coding tasks across three different agent engines (claude, codex, devin) unattended, but couldn't take a heavy runtime dependency on the operator VM.
+- **T** — Build a reliable multi-agent runner with explicit state, failure handling, and observability — portable down to bash 3.2.
+- **A** — Designed `agent-queue` as a **folder-kanban state machine**: `inbox→doing→done/failed` with a `failed→inbox` requeue for human-in-the-loop, an engine flag binding each task to an agent, and `status`/`watch`/`logs` for live observability. The state model maps 1:1 to LangGraph nodes/conditional edges.
+- **R** — Tasks run in auto-approve mode, survive failures via requeue, and the kanban gives at-a-glance state. **The lesson I carry: orchestration is a state-machine problem first and a framework choice second, which is exactly why porting it onto LangGraph is low-risk.**
+
+---
+
+## 2. A Zero-Trust tool boundary for agents (MCP)
+**Proves:** MCP architecture · Zero Trust · agentic threat modeling · access-controlled retrieval
+
+- **S** — Multiple product agents needed access to sensitive tools (market data, document retrieval) but I refused to hand agents raw credentials or unbounded data access.
+- **T** — Make the tool layer a **policy enforcement point**, not a passthrough.
+- **A** — Centralized tools behind `mcp-server` (:4007) with `mcp-client`: a typed/versioned tool registry, an authZ check on **every** call (identity + scope + role), column masking via `field-encrypt`, rate/cost caps with a `kill-switch`, and an audit emit to `event-store`. Threat-modeled confused-deputy, tool-poisoning via retrieved content, and exfiltration.
+- **R** — Agents hold no secrets; a successful prompt injection still can't exfiltrate unentitled fields, and any tool can be killed live without a redeploy. **Governance lives in the boundary, so no product surface can route around it.**
+
+---
+
+## 3. Grounding by architecture, not by prompt (flowmonk)
+**Proves:** Grounding · hallucination mitigation · faithfulness
+
+- **S** — Users wanted an AI planning assistant, but an LLM inventing a "plan" that violates constraints is worse than no assistant.
+- **T** — Deliver helpful AI without letting the model be the source of truth.
+- **A** — Made a **deterministic scheduler authoritative** and **constrained the AI layer to explanation, summarization, and safe recommendation only**. The model narrates and suggests; it can never author the canonical plan. Recommendations carry an audit trail.
+- **R** — The assistant is helpful *and* can't hallucinate an invalid plan into existence. **This is the cheapest, most reliable hallucination fix I know — and it's the pattern I'd bring to any regulated workflow: scope the model to where being wrong is recoverable.**
+
+---
+
+## 4. Schema-aware tool-calling instead of free Text-to-SQL
+**Proves:** Structured retrieval · Text-to-SQL judgment · safety
+
+- **S** — `invt_trdg` users wanted natural-language access to quotes, trade plans, watchlists, alerts, and goals.
+- **T** — Give NL access to structured data without the injection/runaway-query risk of free Text-to-SQL.
+- **A** — Built the AI chat as **typed, parameterized tool-calling** over a known domain: the model selects a vetted operation, not arbitrary SQL. Hybrid asset-class detection (crypto vs. equity) routes to the right tool.
+- **R** — Natural-language coverage of the whole product, fully auditable, with no arbitrary query surface. **I reserve generative SQL for genuine ad-hoc analytics behind read-only views with row-level security — bounded domains get tool-calling.**
+
+---
+
+## 5. Provider-portable model layer (llm-router)
+**Proves:** Cloud platform · Azure/Bedrock/Vertex portability · cost/latency routing
+
+- **S** — Hard-coding one model provider risked lock-in, blocked data-residency requirements, and made cost/latency tuning a code change.
+- **T** — Make model choice a config decision.
+- **A** — Introduced `packages/llm-router` as a provider-abstraction seam (Azure OpenAI primary; Bedrock/Vertex swap-in) with `ollama-client` for on-prem/air-gapped inference.
+- **R** — A new model or provider is a config change, not a rewrite, and a regulated customer can pin inference to their own tenant. **Portability is a governance feature, not just an engineering nicety — it's how you satisfy data-residency without re-architecting.**
+
+---
+
+## 6. Multi-tenant isolation as a platform default
+**Proves:** Vector DB multi-tenancy · namespace isolation · governance
+
+- **S** — Several products share one platform; a cross-tenant data leak would be catastrophic.
+- **T** — Make isolation structural, not per-feature discipline.
+- **A** — Every product carries a `productId`; Hermes runs **two fully isolated instances (Vijay/Bheem)** with separate users, services, and backup repos. The same model maps directly to vector namespaces / index-per-tenant / pgvector schema-per-tenant.
+- **R** — Isolation is the default the whole platform is partitioned by. **When I add a vector store, multi-tenancy isn't a migration — it's the storage expression of a tenant model I already enforce.**
+
+---
+
+## 7. Unstructured ingestion pipeline (extraction-service)
+**Proves:** Unstructured retrieval · ingestion · provenance
+
+- **S** — Agents needed to answer from external documents and URLs, not just structured data.
+- **T** — Turn messy unstructured sources into clean, retrievable, attributable units.
+- **A** — Built `extraction-service` (:4005) + `packages/extraction` to parse URLs/docs into retrievable units; `notelett` provides a structured-notes store for human+agent content.
+- **R** — A working ingestion path into the fabric. **The roadmap (layout-aware PDF chunking, OCR, table preservation, page-level provenance) is additive on this spine — and provenance is non-negotiable because every answer must cite a clause, not 'a document.'**
+
+---
+
+## 8. Operational observability for AI systems (Hermes)
+**Proves:** Eval-harness home · drift monitoring · production ops
+
+- **S** — Running agentic services in production with no single pane meant blind spots.
+- **T** — One control plane for the agentic fabric.
+- **A** — Built **Hermes Mission Control** (Next.js + Fastify) with `diagnostics-client`/`telemetry-client`/`monitoring`; the `hermes-ops` module already models both instances as the seed for real data.
+- **R** — A live ops console for the ecosystem. **It's the natural home for the eval harness: a faithfulness/relevancy/recall pane plus a factual-drift monitor turns it from infra-ops into AI-quality-ops — which is the v2 roadmap I own.**
+
+---
+
+## 9. Instant blast-radius control (kill-switch + flags)
+**Proves:** Governance · Zero Trust · SR 11-7 (limits on model use in production)
+
+- **S** — A misbehaving model or tool in production needs to be stoppable in seconds, not a deploy cycle.
+- **T** — Decouple "turn this off" from "ship a release."
+- **A** — Adopted `feature-flag-client` + `kill-switch-client` so any model or individual tool can be disabled live; combined with `event-store` audit so the action is logged.
+- **R** — Sub-minute containment without a redeploy. **This maps to SR 11-7's expectation of effective controls and limits on model use: I can restrict a model in production immediately, with an audit trail of who restricted it and when.**
+
+---
+
+## 10. Disaster recovery + parity discipline
+**Proves:** Production rigor · regulated-grade operations
+
+- **S** — Two Hermes instances existed, but only one had a tested backup/restore path; the second was an operational blind spot.
+- **T** — Drive both to parity with persistent backup, watchdog, and **tested** restore.
+- **A** — Documented the gap explicitly in the v2 roadmap (`hermes_dashboard_v2_roadmap.md`) and the DR doc, prioritizing the missing backup repo/watchdog/restore for the second instance.
+- **R** — A named, prioritized closure plan. **In regulated environments 'we have backups' is not a control until restore is *tested*; I treat untested DR as an open finding, not a checkbox.**
+
+---
+
+## 11. Bounded autonomy with human-in-the-loop
+**Proves:** Agentic safety · orchestration · abstain-and-escalate
+
+- **S** — Autonomous agents that never escalate will confidently do the wrong thing.
+- **T** — Build escalation into the topology.
+- **A** — In `agent-queue`, `failed` routes back to `inbox` for human triage rather than silently retrying forever; in the RAG design, a sub-SLA faithfulness score routes to **abstain/escalate** (see `01 §5`).
+- **R** — The system degrades to a human instead of degrading to a hallucination. **The escalation edge is the most important edge in the graph for a regulated deployment.**
+
+---
+
+## 12. Documentation & decision rigor as an architect
+**Proves:** ADRs · blueprints · roadmaps · mentoring / CoE contribution
+
+- **S** — A multi-product ecosystem with multiple agent engines drifts without written decisions.
+- **T** — Make architecture legible to engineers and execs.
+- **A** — Maintained an ADR directory, roadmaps (`hermes_*_roadmap.md`, `deployment-optimization-roadmap.md`), a repo map, and agent-facing `AGENTS.md`/`CLAUDE.md` so both humans and coding agents navigate consistently — and authored this very interview/architecture kit as a reusable accelerator.
+- **R** — New contributors (human or agent) onboard from canonical docs. **This is exactly the 'AI Center of Excellence / reusable accelerators' contribution the role asks for — I default to writing the pattern down so it scales past me.**
+
+---
+
+## Behavioral / leadership prompts — quick frames
+
+| Prompt | Lead with |
+|---|---|
+| "Tell me about a time you influenced without authority." | #12 docs/ADRs driving multi-agent consistency. |
+| "A production AI system gave a wrong answer. What did you do?" | #3 grounding-by-architecture + #11 abstain/escalate + #9 kill-switch. |
+| "How do you handle disagreement on architecture?" | ADR process — capture options, trade-offs, decision, and revisit date; disagree-and-commit in writing. |
+| "Describe mentoring junior engineers." | The `AGENTS.md`/repo-map pattern: I encode the 'how we work here' so it's teachable, then pair on the first real task. |
+| "Biggest technical mistake?" | Untested DR on the second Hermes instance (#10) — I'd treated 'backups exist' as 'DR works'; now I gate on a tested restore. |
+| "Why this role / why financial services?" | Trading product taught me to engineer for *consequences*; FS is where governance-by-architecture matters most and where my MCP/Zero-Trust depth pays off. |
diff --git a/docs/INTERVIEW/04-enhancement-roadmap.md b/docs/INTERVIEW/04-enhancement-roadmap.md
new file mode 100644
index 0000000..7755d19
--- /dev/null
+++ b/docs/INTERVIEW/04-enhancement-roadmap.md
@@ -0,0 +1,191 @@
+# 04 · Enhancement Roadmap — make every claim literally true
+
+This is the "what would you build here" answer, and it doubles as a real backlog. Each
+enhancement turns an *adjacent* capability into a *shipped* one on infrastructure we
+already run. They're ordered by dependency (§A and §B first, then §C/§D/§E on top of §B,
+then §F); the whole set is a credible "agentic-RAG fabric, hardened" program.
+
+> Mapping note: these slot into the existing repo conventions — new code under
+> `learning_ai_common_plat/packages` + a `services/rag-service`, eval harness surfaced in
+> `learning_ai_devops_tools/dashboard` (Hermes), and ADRs under
+> `learning_ai_devops_tools/docs/adr/`. Cut tracker items via `scripts/tracker-seed/`.
+
+```mermaid
+flowchart LR
+    A["§A LangGraph port<br/>+ A2A agent cards"] --> B["§B Hybrid retrieval<br/>pgvector+BM25+rerank+HyDE/CRAG/Self-RAG"]
+    B --> C["§C Guarded Text-to-SQL<br/>read-only views + RLS"]
+    B --> D["§D Cosmos Gremlin<br/>knowledge graph + Graph RAG"]
+    B --> E["§E RAGAS/DeepEval harness<br/>+ drift monitor in Hermes"]
+    C & D --> F["§F Model-card registry<br/>+ governance pack"]
+    E --> F
+    classDef p1 fill:#dcfce7,stroke:#16a34a
+    classDef p2 fill:#fef9c3,stroke:#ca8a04
+    classDef p3 fill:#fee2e2,stroke:#dc2626
+    class A,B p1
+    class C,D,E p2
+    class F p3
+```
+
+| Phase | Enhancements | Why now |
+|---|---|---|
+| **P1 (foundation)** | §A, §B | Orchestration + retrieval are the spine; everything else attaches to them. |
+| **P2 (sources + quality)** | §C, §D, §E | Add structured + graph sources and the eval loop that proves quality. |
+| **P3 (governance)** | §F | Wrap the now-real fabric in the regulated-grade governance story. |
+
+---
+
+## §A — Port `agent-queue` topology onto LangGraph + add A2A handoff
+
+**Goal:** make the "prod-grade LangGraph" claim literal while keeping the proven state model.
+
+- New `packages/agent-graph`: a typed `StateGraph` with nodes `route → retrieve → grade → (rewrite) → generate → critique`, conditional + cyclic edges, and a checkpointer backed by `event-store`.
+- Keep `agent-queue`'s engine-selection idea as **node-level model binding** through `llm-router`.
+- Expose each product agent with an **A2A agent card** (capabilities, auth scope, cost hints) so a supervisor agent can delegate; the card is served by `mcp-server`.
+
+```mermaid
+stateDiagram-v2
+    [*] --> route
+    route --> retrieve: needs evidence
+    route --> generate: parametric/FAQ
+    retrieve --> grade
+    grade --> rewrite: low relevance (CRAG)
+    rewrite --> retrieve
+    grade --> generate: ok
+    generate --> critique
+    critique --> rewrite: ungrounded (Self-RAG)
+    critique --> [*]: grounded + cited
+```
+
+**Acceptance:** a LangGraph run with a forced low-relevance retrieval demonstrably loops
+through `rewrite`; checkpoints land in `event-store`; one product reachable via A2A card.
+**Effort:** M. **Risk:** low (mapping is 1:1 with today's state machine).
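+
+The acceptance loop can be sketched without any framework. This is a plain-Python stand-in, not LangGraph code: `run_graph`, the callables passed in, the `0.5` threshold, and the `max_rewrites` bound with its `escalate` exit are assumptions for illustration (the diagram above has no explicit retry cap).
+
+```python
+from typing import Callable, List, Tuple
+
+
+def run_graph(
+    query: str,
+    retrieve: Callable[[str], List[str]],
+    grade: Callable[[str, List[str]], float],
+    rewrite: Callable[[str], str],
+    threshold: float = 0.5,
+    max_rewrites: int = 2,
+) -> Tuple[str, List[str]]:
+    """Walk retrieve -> grade -> (rewrite -> retrieve)* and return (outcome, visited nodes)."""
+    trace = ["route"]
+    for attempt in range(max_rewrites + 1):
+        docs = retrieve(query)
+        trace.append("retrieve")
+        score = grade(query, docs)
+        trace.append("grade")
+        if score >= threshold:
+            trace.append("generate")
+            return "answer", trace
+        if attempt < max_rewrites:
+            # Low relevance: rewrite the query and go around the cycle again.
+            query = rewrite(query)
+            trace.append("rewrite")
+    # Rewrite budget exhausted: hand off instead of generating from weak evidence.
+    trace.append("escalate")
+    return "escalate", trace
+
+
+# Forced low-relevance first pass: the first grade is 0.1, the second 0.9.
+forced_scores = iter([0.1, 0.9])
+outcome, trace = run_graph(
+    "kyc retention",
+    retrieve=lambda q: [q],
+    grade=lambda q, docs: next(forced_scores),
+    rewrite=lambda q: q + " policy",
+)
+```
+
+In this run the trace passes through `rewrite` exactly once before reaching `generate`, which is the behaviour the acceptance test is meant to demonstrate.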
+
+---
+
+## §B — Hybrid retrieval: pgvector + BM25 + rerank + HyDE / CRAG / Self-RAG
+
+**Goal:** turn "I understand hybrid RAG" into a running `services/rag-service`.
+
+- **pgvector** alongside the existing Postgres → one DB, one backup, transactional consistency with source rows; **schema-per-tenant** namespaces (mirrors `productId`).
+- **BM25** lexical via an OpenSearch sidecar or a Postgres BM25 extension (built-in Postgres FTS ranks with `ts_rank`, which is not BM25), fused with vector via **RRF**.
+- **Rerank** the fused candidates with a cross-encoder (e.g. bge-reranker) or a ColBERT-style late-interaction model; **context compression** to fit budget.
+- **HyDE** query rewriter node; **CRAG** relevance gate; **Self-RAG** groundedness critic (the §A nodes).
+- **Layout-aware ingestion** in `extraction-service`: PyMuPDF / Unstructured.io, OCR fallback, table preservation, **page/section provenance** on every chunk.
+
+```mermaid
+flowchart LR
+    Q --> HYDE[HyDE rewrite] --> EMB[embed]
+    EMB --> VEC[(pgvector ANN)]
+    Q --> BM[(BM25)]
+    VEC & BM --> RRF[RRF fuse] --> RR[cross-encoder rerank] --> CC[context compress] --> GEN
+```
+
+**Acceptance:** hybrid beats vector-only on a golden set (context-recall ↑, context-precision ↑);
+every chunk carries doc/page/section provenance; abstain fires when reranked top-score < τ.
+**Effort:** L. **Risk:** medium (reranker latency budget — mitigate with rerank-top-k only).
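+
+The RRF fuse step is small enough to sketch directly. The function name, the chunk ids, and `k = 60` (a commonly used default, not a value taken from this repo) are assumptions for illustration.
+
+```python
+from collections import defaultdict
+from typing import Dict, List
+
+
+def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
+    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank of d)."""
+    scores: Dict[str, float] = defaultdict(float)
+    for ranking in rankings:
+        for rank, doc_id in enumerate(ranking, start=1):
+            scores[doc_id] += 1.0 / (k + rank)
+    # Highest fused score first; break ties on id so the output is deterministic.
+    return sorted(scores, key=lambda d: (-scores[d], d))
+
+
+vector_hits = ["c3", "c1", "c7"]  # hypothetical pgvector ANN order
+bm25_hits = ["c1", "c9", "c3"]    # hypothetical lexical order
+fused = rrf_fuse([vector_hits, bm25_hits])
+```
+
+Only ranks are used, so the vector and lexical scores never need to be put on a common scale. Chunks that appear in both lists (`c1`, `c3`) rise above chunks that appear in one.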
+
+---
+
+## §C — Guarded Text-to-SQL tool
+
+**Goal:** add genuine generative SQL for ad-hoc analytics without the foot-guns.
+
+- Register a `sql-query` tool on `mcp-server` scoped to **read-only semantic views** (no base tables), with **row-level security** by tenant/role.
+- **Schema-aware retrieval:** embed table/column descriptions; retrieve only the relevant schema slice into the prompt (don't dump the catalog).
+- Parse + validate generated SQL (allow-list of statements, forbid cross-schema joins, enforce `LIMIT`); cost-cap and timeout.
+- Audit every generated query + row count to `event-store`.
+
+**Acceptance:** an attempt to read an unentitled column is blocked at the view/RLS layer
+and logged; a malformed/oversized query is rejected pre-execution.
+**Effort:** M. **Risk:** medium (this is the highest-leakage surface — keep it behind views).
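+
+A rough sketch of the pre-execution check, assuming a regex-based guard for brevity. The view names, `MAX_LIMIT`, and `guard_sql` are hypothetical; a real implementation would use a proper SQL parser (regexes miss comma joins and misread string literals), and the view/RLS layer in the database stays the actual control.
+
+```python
+import re
+
+ALLOWED_VIEWS = {"analytics.v_positions", "analytics.v_alerts"}  # hypothetical read-only views
+FORBIDDEN = re.compile(
+    r"\b(insert|update|delete|drop|alter|create|grant|truncate|copy)\b", re.IGNORECASE
+)
+MAX_LIMIT = 1000  # hypothetical row cap
+
+
+def guard_sql(sql: str) -> str:
+    """Reject anything that is not a single capped SELECT over allow-listed views."""
+    stmt = sql.strip().rstrip(";").strip()
+    if ";" in stmt:
+        raise ValueError("multiple statements are not allowed")
+    if not re.match(r"select\b", stmt, re.IGNORECASE):
+        raise ValueError("only SELECT is allowed")
+    if FORBIDDEN.search(stmt):
+        raise ValueError("forbidden keyword")
+    referenced = {
+        name.lower()
+        for name in re.findall(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", stmt, re.IGNORECASE)
+    }
+    if not referenced or not referenced <= ALLOWED_VIEWS:
+        raise ValueError("query must read only from allow-listed views")
+    limit = re.search(r"\blimit\s+(\d+)\s*$", stmt, re.IGNORECASE)
+    if limit is None:
+        # Enforce a cap when the model forgot one.
+        return f"{stmt} LIMIT {MAX_LIMIT}"
+    if int(limit.group(1)) > MAX_LIMIT:
+        raise ValueError("LIMIT exceeds cap")
+    return stmt
+
+
+safe = guard_sql("SELECT symbol, qty FROM analytics.v_positions;")
+```
+
+Rejections happen before the query reaches the database, which is the "rejected pre-execution" half of the acceptance test; the "blocked at the view/RLS layer" half still has to be proven against the database itself.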
+
+---
+
+## §D — Cosmos Gremlin knowledge graph + Graph RAG
+
+**Goal:** answer "connected-to" questions (KYC/AML-shaped) on infra we already run.
+
+- Use the existing **Azure Cosmos DB Gremlin** API. Entity/relation extraction at ingest (from `extraction-service` output + structured rows) builds the graph.
+- **Graph-augmented retrieval:** vector hit seeds an entry node → bounded Gremlin traversal returns the subgraph → fuse subgraph + text chunks into context.
+- Expose a `graph-query` tool on `mcp-server` (read-only, depth-bounded).
+
+```mermaid
+flowchart LR
+    Q --> V[(vector seed)] --> N[entry entity]
+    N --> G[(Gremlin traversal<br/>≤2 hops)]
+    G --> SUB[subgraph]
+    SUB --> FUSE[fuse w/ text chunks] --> GEN
+```
+
+**Acceptance:** a 2-hop relationship question that vector-only fails is answered correctly
+with the subgraph cited; traversal depth/time bounded.
+**Effort:** L. **Risk:** medium (graph modeling + traversal cost).
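+
+The depth bound is the part worth getting right, so here is a sketch of it over an in-memory adjacency map rather than a live Gremlin endpoint. `bounded_subgraph`, the entity ids, and the relation names are hypothetical.
+
+```python
+from collections import deque
+from typing import Dict, List, Set, Tuple
+
+Edge = Tuple[str, str, str]  # (source, relation, target)
+
+
+def bounded_subgraph(
+    adjacency: Dict[str, List[Tuple[str, str]]], seed: str, max_hops: int = 2
+) -> List[Edge]:
+    """Breadth-first expansion from the seed entity, never following edges past max_hops."""
+    seen: Set[str] = {seed}
+    edges: List[Edge] = []
+    frontier = deque([(seed, 0)])
+    while frontier:
+        node, depth = frontier.popleft()
+        if depth == max_hops:
+            continue  # at the hop limit: keep the node, do not expand it
+        for relation, target in adjacency.get(node, []):
+            edges.append((node, relation, target))
+            if target not in seen:
+                seen.add(target)
+                frontier.append((target, depth + 1))
+    return edges
+
+
+graph = {
+    "acct:001": [("owned_by", "person:ann")],
+    "person:ann": [("director_of", "org:acme")],
+    "org:acme": [("subsidiary_of", "org:holdco")],
+}
+subgraph = bounded_subgraph(graph, "acct:001", max_hops=2)
+```
+
+With `max_hops=2` the traversal returns the account-to-person and person-to-org edges and stops before `org:holdco`; those edges are what gets cited alongside the text chunks.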
+
+---
+
+## §E — Evaluation harness + factual-drift monitor in Hermes
+
+**Goal:** make "RAGAS / faithfulness SLAs / drift monitoring" real and visible.
+
+- **Offline (CI):** **DeepEval** pytest-style assertions on a golden set — faithfulness, answer-relevancy, context-precision, context-recall, answer-correctness. Regression below threshold **blocks deploy**.
+- **Online:** sample production traces, score with **RAGAS / LLM-as-judge**, emit metrics via `telemetry-client`.
+- **Hermes pane:** a "RAG Quality" panel (extends `hermes-ops`) trending the five metrics per tenant + a **drift alert** when faithfulness/recall degrade week-over-week.
+- Wire **abstain rate** and **escalation rate** as first-class SLAs.
+
+```mermaid
+flowchart TB
+    subgraph CI["Offline / CI (DeepEval)"]
+        G[golden set] --> SC1[score] --> GATE{≥ SLA?}
+        GATE -- no --> BLOCK[block deploy]
+        GATE -- yes --> SHIP[ship]
+    end
+    subgraph PROD["Online (RAGAS / judge)"]
+        TR[sampled traces] --> SC2[score] --> TEL[telemetry-client] --> HERMES[Hermes RAG-Quality pane]
+        HERMES --> DRIFT{drift?} -- yes --> ALERT[alert + open finding]
+    end
+```
+
+**Acceptance:** a deliberately-degraded retriever fails the CI gate; the Hermes pane shows
+the five metrics per tenant and fires a drift alert on a seeded regression.
+**Effort:** M. **Risk:** low-medium (judge cost — sample, don't score 100%).
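+
+Both decisions in the diagram reduce to small comparisons once scores exist. The sketch below assumes scores have already been computed by the eval tooling; `gate`, `drifted`, the metric keys, the `0.8` recall floor, and the `0.05` drop tolerance are hypothetical (only the faithfulness floor of 0.9 appears elsewhere in this kit).
+
+```python
+from statistics import mean
+from typing import Dict, List
+
+
+def gate(scores: Dict[str, float], slas: Dict[str, float]) -> List[str]:
+    """Return the metrics below their SLA floor; an empty list means the deploy may ship."""
+    return sorted(m for m, floor in slas.items() if scores.get(m, 0.0) < floor)
+
+
+def drifted(last_week: List[float], this_week: List[float], max_drop: float = 0.05) -> bool:
+    """Flag drift when the weekly mean falls by more than max_drop."""
+    return mean(last_week) - mean(this_week) > max_drop
+
+
+slas = {"faithfulness": 0.9, "context_recall": 0.8}
+failing = gate({"faithfulness": 0.93, "context_recall": 0.70}, slas)
+alert = drifted(last_week=[0.92, 0.94], this_week=[0.85, 0.86])
+```
+
+A missing metric counts as a failure here (`scores.get(m, 0.0)`), so a broken scorer blocks the deploy instead of silently passing it.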
+
+---
+
+## §F — Model-card registry + governance pack
+
+**Goal:** the regulated-grade documentation/audit layer (SR 11-7 / EU AI Act ready).
+
+- **Model-card registry** (a `governance` package + Hermes pane): per deployed model/agent — purpose, data sources, eval scores, known limits, owner, last-reviewed date, kill-switch link.
+- **Decision log:** write each generation's record (query, retrieved sources, model, faithfulness score, abstain/answer) to `event-store` → reproducible audit trail.
+- **RACI doc** template per engagement; **ADR** set under `docs/adr/` for each architectural choice.
+- Map controls to **SR 11-7** (model inventory, validation, monitoring, change control) and **EU AI Act** (risk classification, logging, human oversight, transparency) — see `05-banking-blueprints.md`.
+
+**Acceptance:** every production model has a card with current eval scores + owner; any
+answer can be reconstructed from the decision log; controls trace to named regulatory clauses.
+**Effort:** M. **Risk:** low (mostly assembly over existing `event-store`/flags/auth).
+
+---
+
+## Sequencing & "what I'd do in the first 90 days" (great closing answer)
+
+```mermaid
+gantt
+    title Agentic-RAG hardening — 90-day view
+    dateFormat X
+    axisFormat %s
+    section Foundation
+    §A LangGraph + A2A      :a, 0, 3
+    §B Hybrid retrieval     :b, 1, 5
+    section Sources & Quality
+    §C Guarded Text-to-SQL  :c, 5, 8
+    §D Graph RAG (Gremlin)  :d, 5, 9
+    §E Eval harness + drift :e, 4, 8
+    section Governance
+    §F Model cards + RACI   :f, 8, 11
+```
+
+> *"In 90 days I'd stand up the retrieval spine and the eval harness first — because you
+> can't tune what you can't measure — then layer structured + graph sources, and close with
+> the governance pack so the whole thing is audit-ready. Notice governance isn't last
+> because it's least important; it's last because by then it's mostly **assembling controls
+> the platform already enforces** (auth, masking, kill-switch, audit) into cards and RACI."*
diff --git a/docs/INTERVIEW/05-banking-blueprints.md b/docs/INTERVIEW/05-banking-blueprints.md
new file mode 100644
index 0000000..d6d7467
--- /dev/null
+++ b/docs/INTERVIEW/05-banking-blueprints.md
@@ -0,0 +1,175 @@
+# 05 · Banking Solution Blueprints (client-ready)
+
+Two end-to-end blueprints you can present to a financial-services client, in the JD's own
+deliverable formats: **solution architecture + ADRs + phased roadmap + regulatory mapping**.
+Both reuse the ByteLyst fabric patterns from `01-ecosystem-rag-fabric.md`.
+
+---
+
+# Blueprint 1 — Compliance Document Retrieval Assistant
+
+**Use case:** compliance analysts ask natural-language questions ("What is our retention
+obligation for KYC records under the latest policy?") and get a **grounded, cited** answer
+drawn from regulatory filings, internal policies, and procedure manuals — or an explicit
+*"insufficient evidence, escalate."*
+
+## Architecture
+
+```mermaid
+flowchart TB
+    AN[👤 Compliance analyst] --> APP[Assistant UI]
+    APP --> ORCH
+
+    subgraph ORCH["Agentic orchestration (LangGraph)"]
+        R{{route}} --> RET[retrieve] --> GR{{CRAG grade}}
+        GR -- weak --> RW[HyDE rewrite] --> RET
+        GR -- ok --> GEN[generate + cite] --> CR{{Self-RAG critic}}
+        CR -- ungrounded --> RW
+        CR -- grounded --> OUT[answer + clause citations]
+        CR -- no evidence --> ESC[escalate to human]
+    end
+
+    subgraph RETR["Hybrid retrieval"]
+        VEC[(Azure AI Search<br/>vector + BM25 + semantic rerank)]
+        KG[(Cosmos Gremlin<br/>policy ⇄ regulation graph)]
+    end
+    RET --> VEC & KG
+
+    subgraph GOV["Governance plane"]
+        ACL[role-aware ACL filter]
+        AUD[event-store audit]
+        CARD[model card + decision log]
+    end
+    RET -.-> ACL
+    GEN -.-> AUD
+    OUT -.-> CARD
+```
+
+**Why these choices (headline ADRs below):** Azure AI Search gives managed hybrid +
+semantic rerank inside one audit boundary; the Gremlin graph links *policies ↔ controlling
+regulations* so "what regulation drives this clause" is a traversal, not a guess; the critic and
+escalate edges send ungrounded drafts to a rewrite or a human instead of answering confidently wrong.
+
+## Ingestion (layout-aware, provenance-first)
+
+```mermaid
+flowchart LR
+    DOC[Filings · policies · procedures<br/>PDF/DOCX/scans] --> PARSE[PyMuPDF / Unstructured.io<br/>+ OCR fallback]
+    PARSE --> CHUNK[layout + semantic chunking<br/>tables preserved]
+    CHUNK --> META[attach provenance<br/>doc·page·section·effective-date·sensitivity]
+    META --> EMB[embed] --> IDX[(Azure AI Search index per tenant)]
+    META --> GRAPH[(extract policy↔reg edges → Gremlin)]
+```
+
+> **Effective-date metadata is a compliance requirement, not a nicety:** retrieval must be
+> able to answer "as of" a date and never cite a superseded policy as current.
+
+## Phased delivery
+
+| Phase | Scope | Exit criteria |
+|---|---|---|
+| **0 · Discovery (2–3 wks)** | Corpus inventory, sensitivity classification, golden-question set with SMEs, success SLAs | Signed-off SLA sheet (faithfulness ≥ 0.9, citation 100%, abstain instead of guess) |
+| **1 · PoC (4–6 wks)** | Hybrid retrieval over a bounded corpus, citations, abstain path | Beats keyword search on the golden set; every answer cited or escalated |
+| **2 · Hardening (6–8 wks)** | Graph links, role-aware ACL, RAGAS/DeepEval CI gate, drift monitor | SLAs met under eval harness; controls mapped to SR 11-7 |
+| **3 · Production (ongoing)** | Model cards, audit, human-in-loop ops, change control | Audit trail reproducible; quarterly model-card review live |
+
+---
+
+# Blueprint 2 — Customer-Support Automation (retail banking)
+
+**Use case:** a grounded support agent answers customer questions from product docs, fee
+schedules, and account-policy content — with **strict masking of customer PII**, citations,
+and instant handoff to a human for anything account-specific or low-confidence.
+
+## Architecture
+
+```mermaid
+flowchart TB
+    C[👤 Customer] --> CH[Support chat]
+    CH --> ORCH2
+
+    subgraph ORCH2["Orchestration"]
+        RT{{route:<br/>info vs. account-action}}
+        RT -- "general info" --> RAG[grounded RAG answer]
+        RT -- "account-specific" --> AUTHZ{step-up auth + entitlement}
+        AUTHZ -- ok --> TOOL[typed account tool via MCP<br/>masked fields]
+        AUTHZ -- fail / sensitive --> HUMAN[human handoff]
+        RAG --> CONF{confidence ≥ SLA?}
+        CONF -- no --> HUMAN
+        CONF -- yes --> ANS[answer + citation]
+    end
+
+    subgraph GOV2["Zero-Trust + governance"]
+        MASK[field-encrypt column masking]
+        KILL[kill-switch per tool/model]
+        LOG[event-store audit]
+    end
+    TOOL -.-> MASK
+    RT -.-> KILL
+    ANS -.-> LOG
+    TOOL -.-> LOG
+```
+
+**Key design stances:**
+- **Two lanes by intent.** General-info → RAG over public/internal docs. Account-specific →
+  typed MCP tool behind **step-up auth + entitlement check + field masking**. The model never
+  free-queries customer data.
+- **Confidence gate → human.** Below SLA, hand off. In banking support, escalation is a
+  feature, not a failure.
+- **PII never enters the prompt unmasked.** Masking is enforced at the MCP boundary
+  (`field-encrypt`), so no prompt-engineering mistake can leak it.
+
+## Phased delivery (condensed)
+
+1. **Discovery** — intent taxonomy, what's answerable-from-docs vs. needs-account-access, PII map, SLAs.
+2. **PoC** — info-lane RAG with citations + handoff; no account access yet.
+3. **Account lane** — MCP typed tools, step-up auth, masking, full audit.
+4. **Production** — eval harness, drift monitor, model cards, change control.
+
+---
+
+# Cross-cutting: Regulatory control mapping
+
+This table is gold in the room: it shows you map *architecture* to *named regulatory requirements*.
+
+| Requirement | Source | How the architecture satisfies it |
+|---|---|---|
+| Model inventory & ownership | **SR 11-7** | Model-card registry (`04 §F`): every model/agent has a card with owner + purpose. |
+| Independent validation | **SR 11-7 / OCC** | RAGAS/DeepEval harness (`04 §E`) produces repeatable eval evidence that a validation function independent of the build team can re-run and challenge. |
+| Ongoing monitoring | **SR 11-7** | Online RAGAS scoring + factual-drift alerts in Hermes. |
+| Limits on model use (ability to constrain a model in production) | **SR 11-7** | `kill-switch-client` disables a model/tool live, audited. |
+| Change control | **SR 11-7** | ADRs + CI eval gate; no deploy below faithfulness SLA. |
+| Risk classification of AI system | **EU AI Act** | Blueprint declares risk tier; high-risk paths get human oversight by design. |
+| Logging & traceability | **EU AI Act** | `event-store` decision log: query, sources, model, score, outcome — reproducible. |
+| Human oversight | **EU AI Act** | Confidence-gate → human handoff edge in both blueprints. |
+| Transparency to user | **EU AI Act** | Mandatory citations + "AI-assisted" disclosure + abstain language. |
+| Right to data protection / minimization | **GDPR / CCPA** | Field-level masking, role-aware retrieval, retrieve-only-entitled-chunks. |
+| Data subject access / deletion | **GDPR / CCPA** | Provenance metadata + tenant namespaces make targeted deletion + re-index feasible. |
+
+---
+
+# Sample ADRs (the format they want you to produce)
+
+### ADR-001 — Hybrid retrieval over pure-vector
+- **Status:** Accepted
+- **Context:** Compliance queries hinge on exact identifiers (clause numbers, reg citations) that dense retrieval often misses.
+- **Decision:** Vector ⊕ BM25 fused with RRF, then cross-encoder rerank.
+- **Consequences:** +latency from rerank (mitigate: rerank top-k only); large recall/precision gain on identifier-bearing queries.
+
+### ADR-002 — Typed MCP tool-calling over free Text-to-SQL for account data
+- **Status:** Accepted
+- **Context:** Account data is the highest-leakage surface; free SQL is hard to audit and hard to make injection-proof.
+- **Decision:** Account access only via typed, parameterized MCP tools behind auth + masking; generative SQL restricted to read-only analytics views with RLS.
+- **Consequences:** Slightly less flexible NL→data coverage; dramatically smaller attack surface and clean audit.
+
+### ADR-003 — Abstain-and-escalate as a first-class outcome
+- **Status:** Accepted
+- **Context:** In regulated support/compliance, a confident wrong answer is the worst outcome.
+- **Decision:** Faithfulness/confidence below SLA routes to human handoff; tracked as an SLA, not an error.
+- **Consequences:** Higher human-handoff rate early; measurable safety + trust; abstain-rate becomes a tuning signal.
+
+### ADR-004 — Provider-portable model layer (router seam)
+- **Status:** Accepted
+- **Context:** Data-residency + vendor-risk requirements vary per client.
+- **Decision:** All inference behind `llm-router`; default Azure OpenAI, swap-in Bedrock/Vertex, on-prem via Ollama.
+- **Consequences:** Small abstraction cost; residency + vendor-risk satisfied by config, not re-architecture.
diff --git a/docs/INTERVIEW/06-glossary-quickref.md b/docs/INTERVIEW/06-glossary-quickref.md
new file mode 100644
index 0000000..80176a3
--- /dev/null
+++ b/docs/INTERVIEW/06-glossary-quickref.md
@@ -0,0 +1,112 @@
+# 06 · Glossary & Rapid-Fire Quick-Reference
+
+The night-before doc. Crisp definitions + one-liners you can fire back. If you can say the
+**bold** line cleanly for each, you sound fluent.
+
+---
+
+## Advanced RAG techniques
+
+| Term | What it is | One-liner |
+|---|---|---|
+| **HyDE** | Hypothetical Document Embeddings — LLM drafts a hypothetical answer; you embed *that* and retrieve against it. | **"Fixes recall by closing the question↔document vocabulary gap."** |
+| **CRAG** | Corrective RAG — grade retrieved docs; if weak, correct (re-retrieve / alt source / rewrite) before generating. | **"A relevance gate that re-retrieves instead of generating from junk."** |
+| **Self-RAG** | Model emits reflection tokens deciding *whether to retrieve* and *whether its draft is supported*; loops if not. | **"The model critiques its own groundedness before answering."** |
+| **RAPTOR** | Recursively cluster + summarize chunks into a tree; retrieve at the abstraction level the query needs. | **"Multi-resolution retrieval for long corpora — summary nodes for broad Qs, leaves for specifics."** |
+| **Reranking** | Cross-encoder scores (query, passage) jointly after first-stage retrieval — far more precise than bi-encoder similarity. | **"Fixes precision; I rerank only the top-k to control latency."** |
+| **ColBERT** | Late-interaction reranking — token-level MaxSim. Accurate *and* scalable. | **"Token-level matching without re-encoding the whole pair per query."** |
+| **Context compression** | Drop/condense retrieved spans to fit budget and cut distraction. | **"Less, more-relevant context beats more context."** |
+| **Hybrid search** | Dense (vector) ⊕ sparse (BM25) ⊕ graph, fused via **RRF**. | **"Vector for meaning, BM25 for exact terms, graph for relationships."** |
+| **RRF** | Reciprocal Rank Fusion — combine rankings by `Σ 1/(k+rank)`; no score calibration needed. | **"Tuning-free way to fuse vector + lexical ranks."** |
+| **Semantic chunking** | Split on topic/structure boundaries, not byte counts. | **"Chunk on document structure first, size second."** |
+| **Agentic RAG** | An agent *decides* when/what/how to retrieve, can use multiple tools and loop. | **"RAG as a control flow, not a single hop."** |
+
+---
+
+## Evaluation metrics (RAGAS vocabulary)
+
+| Metric | Question it answers | What it isolates |
+|---|---|---|
+| **Faithfulness / groundedness** | Is every claim supported by retrieved context? | Hallucination |
+| **Answer relevancy** | Does the answer address the question? | Generation focus |
+| **Context precision** | Are the *top* retrieved chunks the relevant ones? | Reranker quality |
+| **Context recall** | Did we retrieve *all* needed evidence? | Retriever quality |
+| **Answer correctness** | Right vs. ground truth? | End-to-end |
+
+- **RAGAS** — library that computes these (often LLM-as-judge).
+- **TruLens** — RAG-triad + feedback functions + tracing.
+- **DeepEval** — pytest-style assertions → CI gates.
+- **LangSmith** — tracing + eval ops for LangChain/LangGraph.
+
+> **Diagnostic move to say out loud:** *"Low context-recall → fix the **retriever** (HyDE,
+> hybrid, chunking). High recall but low context-precision → fix the **reranker**. Good
+> context but low faithfulness → fix the **generator/prompt** or **abstain**."*
+
+---
+
+## Agentic frameworks
+
+| Term | Essence |
+|---|---|
+| **LangChain** | Primitives: tool binding, structured output, retrievers, chains. |
+| **LangGraph** | Agents as **state graphs**: typed state, nodes, conditional + cyclic edges, checkpointer, human-in-loop. |
+| **Google ADK** | Google's Agent Development Kit for building/deploying agents. |
+| **A2A** | Agent-to-Agent protocol: agent cards (capabilities/auth), task lifecycle, message/artifact exchange — agent interop. |
+| **AutoGen** | Conversational multi-agent loops (agents talk to each other). |
+| **MCP** | Model Context Protocol: standard for exposing **tools/resources** to models via a server, with typed registration. |
+
+---
+
+## Governance & regulatory
+
+| Term | What to say |
+|---|---|
+| **SR 11-7** | US Federal Reserve **model risk management** guidance (issued jointly with the OCC as Bulletin 2011-12): model inventory, independent validation, ongoing monitoring, change control, and controls/limits on model use. **"My eval harness + model cards + kill-switch + decision log map directly to its pillars."** |
+| **OCC model risk** | OCC Bulletin 2011-12, the OCC's counterpart to SR 11-7; emphasizes governance + effective challenge. |
+| **EU AI Act** | Risk-tiered AI regulation: classification, logging/traceability, human oversight, transparency for high-risk systems. **"High-risk paths get human-in-the-loop and full decision logging by design."** |
+| **GDPR / CCPA** | Data protection/minimization, subject access/deletion. **"Field masking + provenance + tenant namespaces make minimization and targeted deletion structural."** |
+| **Zero Trust (for agents)** | Never trust the agent's request implicitly; verify identity + scope at the tool boundary **every** call. **"The MCP server is my policy enforcement point."** |
+| **Access-controlled retrieval** | Retrieve only chunks the caller's identity/role entitles them to (pre-retrieval ACL filter). |
+| **Row/column masking** | Mask sensitive fields at the boundary regardless of query. |
+| **Model card** | Per-model doc: purpose, data, eval scores, limits, owner, review date, kill-switch. |
+| **RACI** | Responsible/Accountable/Consulted/Informed matrix per component — governance ownership made explicit. |
+
+---
+
+## ByteLyst anchor cheat-sheet (so you never blank on "where have you done this")
+
+| JD theme | Say this anchor |
+|---|---|
+| Multi-agent orchestration | **`agent-queue`** — claude/codex/devin, `inbox→doing→done/failed` state machine. |
+| MCP / Zero-Trust tool boundary | **`mcp-server` :4007 + `mcp-client`**, authZ per call, masking, kill-switch, audit. |
+| Provider-portable models | **`llm-router`** (Azure OpenAI / Bedrock / Vertex / Ollama). |
+| Grounding by architecture | **`flowmonk`** — deterministic engine authoritative, AI = explanation/safe-reco only. |
+| Schema-aware structured retrieval | **`invt_trdg` AI chat** — typed tool-calling over markets, not free SQL. |
+| Unstructured ingestion | **`extraction-service` :4005 + `packages/extraction`**; `notelett` store. |
+| Graph (Cosmos Gremlin) | **`packages/cosmos`** — Cosmos DB Gremlin API in prod. |
+| Vector / multi-tenancy | **pgvector path** + `productId` / two-instance Hermes isolation. |
+| Eval / ops console | **Hermes Mission Control** + `telemetry`/`diagnostics`/`monitoring`. |
+| Governance primitives | **`auth`/`fastify-auth`, `field-encrypt`, `feature-flag`/`kill-switch`, `event-store`**. |
+| Banking domain analog | **`invt_trdg`** — regulated-consequence domain; abstain-over-guess discipline. |
+
+---
+
+## Likely curveballs + crisp answers
+
+- **"How do you stop prompt injection from retrieved docs?"** → "Treat retrieved text as *data*, never instructions; the generator can't re-invoke tools without re-passing the MCP authZ gate; egress is masked + logged so even a successful injection can't exfiltrate unentitled fields."
+- **"Vector DB choice?"** → "Decided last, from requirements. Azure → Azure AI Search (managed hybrid+rerank, one audit boundary). Tight transactional coupling → pgvector. I don't pick a vector DB before I know tenant count, filter cardinality, recall target, and the audit boundary."
+- **"How do you measure if RAG is 'good enough' for prod?"** → "SLAs as gates: faithfulness ≥ 0.9, citation coverage 100%, abstain rather than guess; DeepEval blocks deploy below threshold; online RAGAS + drift alerts catch degradation."
+- **"Free Text-to-SQL: yes or no?"** → "Bounded domains → typed tool-calling (auditable, injection-resistant). Genuine ad-hoc analytics → generative SQL behind read-only views with RLS, validated + cost-capped. Never raw SQL on base tables."
+- **"Latency vs. accuracy trade-off?"** → "Route by query: FAQ/parametric queries skip retrieval; only complex queries pay for hybrid + rerank + critic loop. Rerank the top-k only. The critic loop is bounded to N iterations, then abstains."
+- **"How is this different from a chatbot demo?"** → "The boundaries: access-controlled retrieval, mandatory citation, abstain-and-escalate, masking, kill-switch, and a reproducible decision log. A demo answers; a regulated system can *prove* why it answered and refuse when it shouldn't."
+- **"What worries you most in agentic RAG for banking?"** → "Silent factual drift and over-broad tool scope. I counter drift with online eval + alerts, and scope with typed MCP tools + least-privilege entitlement at every call."
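+
+A minimal sketch of the "SLAs as gates" answer above, assuming the eval harness has already produced aggregate scores. The metric keys and the `gate` function are hypothetical and do not call the DeepEval or RAGAS APIs; only the thresholds (faithfulness ≥ 0.9, citation 100%) come from the answer itself.
+
+```python
+from typing import Dict, List, Tuple
+
+# Thresholds from the answer above; the metric key names are assumptions.
+THRESHOLDS: Dict[str, float] = {
+    "faithfulness": 0.9,
+    "citation_coverage": 1.0,
+}
+
+
+def gate(scores: Dict[str, float]) -> Tuple[bool, List[str]]:
+    """Return (passed, failures). A missing metric counts as 0.0 and fails."""
+    failures = []
+    for metric, threshold in THRESHOLDS.items():
+        value = scores.get(metric, 0.0)
+        if value < threshold:
+            failures.append(f"{metric}: {value:.2f} < {threshold:.2f}")
+    return (not failures, failures)
+```
+
+In CI, a failed gate would exit non-zero so the deploy step never runs. Treating a missing metric as a failure keeps a silently skipped eval from passing.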
+
+---
+
+## 30-second close
+
+> *"I build agentic systems where the hard engineering is in the boundaries — what a tool
+> can retrieve, how output is grounded and cited, when to abstain, and how every hop is
+> audited. I've been running an ecosystem (MCP servers, a multi-agent runner,
+> provider-routed LLMs, encrypted/flagged/audited data access) that's one deliberate roadmap away
+> from a textbook enterprise agentic-RAG fabric — and I've already written that roadmap."*
diff --git a/docs/INTERVIEW/README.md b/docs/INTERVIEW/README.md
new file mode 100644
index 0000000..f2ae20c
--- /dev/null
+++ b/docs/INTERVIEW/README.md
@@ -0,0 +1,113 @@
+# Senior Agentic RAG Architect — Interview Prep Kit
+
+> Target role: **Senior Agentic RAG Architect — TEKsystems Global Services, Product Engineering Group**
+> Candidate anchor: the **ByteLyst ecosystem** (this monorepo workspace).
+> Purpose: turn what we already run in production into a defensible, evidence-backed
+> narrative for every line of the job description — plus a concrete roadmap of
+> enhancements that make each claim *literally true* if we choose to build them.
+
+This kit is deliberately structured so you can walk into the interview and, for **any**
+competency on the matrix, point to (a) a real system we run, (b) an architecture diagram,
+(c) a STAR story, and (d) a credible "here's how I'd take it to enterprise scale" answer.
+
+---
+
+## How to use this kit
+
+| If you have… | Read |
+|---|---|
+| 60 minutes the night before | `06-glossary-quickref.md` then this README's matrix |
+| A full prep day | All docs in order 01 → 06 |
+| A whiteboard / panel round | `01-ecosystem-rag-fabric.md` + `05-banking-blueprints.md` |
+| A behavioral / leadership round | `03-star-interview-bank.md` |
+| A "what would you build here" round | `04-enhancement-roadmap.md` |
+
+### Documents
+
+1. **[01-ecosystem-rag-fabric.md](01-ecosystem-rag-fabric.md)** — The ByteLyst ecosystem re-drawn as an agentic RAG retrieval fabric. Context, container, retrieval-pipeline, multi-agent topology, MCP Zero-Trust, and governance diagrams.
+2. **[02-competency-deepdives.md](02-competency-deepdives.md)** — Every competency-matrix row: the concept, how it maps to our code, talking points, and honest gaps.
+3. **[03-star-interview-bank.md](03-star-interview-bank.md)** — 12 STAR stories grounded in real ecosystem work (Hermes, agent-queue, mcp-server, invt_trdg AI chat, flowmonk grounding, llm-router).
+4. **[04-enhancement-roadmap.md](04-enhancement-roadmap.md)** — Buildable enhancements that convert "I understand X" into "I shipped X here": pgvector hybrid retrieval, CRAG/Self-RAG loops, RAGAS eval harness, Cosmos Gremlin knowledge graph, model-card registry in Hermes.
+5. **[05-banking-blueprints.md](05-banking-blueprints.md)** — Two client-ready solution blueprints (compliance-document retrieval; customer-support automation) with ADRs, SR 11-7 / EU AI Act alignment, and phased delivery.
+6. **[06-glossary-quickref.md](06-glossary-quickref.md)** — Rapid-fire definitions and crisp answers: RAPTOR, HyDE, CRAG, Self-RAG, ColBERT, RAGAS metrics, SR 11-7, EU AI Act, Zero Trust for agents.
+
+---
+
+## The role in one paragraph
+
+Design, build, and tune **enterprise-grade RAG systems that power agentic applications**,
+fusing **structured (RDBMS / warehouse), unstructured (PDF / docs / email), and graph
+(knowledge-graph / ontology)** sources into one **governed** retrieval fabric. Be the
+technical authority across **financial-services** engagements; enforce **grounding,
+citation, hallucination mitigation**; own **evaluation harnesses (RAGAS / TruLens /
+DeepEval)**; embed **Zero Trust, access-controlled retrieval, SR 11-7 / EU AI Act**
+governance; and lead **ADRs, blueprints, roadmaps** for execs and engineers.
+
+---
+
+## Competency matrix → ByteLyst evidence
+
+The JD's matrix is reproduced verbatim in the left columns; the right column is **our
+real anchor** in this ecosystem (where it exists today) plus a pointer to the enhancement
+that hardens it.
+
+| Competency | Must-have | Nice-to-have | ByteLyst anchor (today → planned) |
+|---|---|---|---|
+| **Agentic Frameworks** | LangGraph, LangChain (prod-grade) | Google ADK, A2A, AutoGen | `agent-queue/` multi-engine runner (claude·codex·devin) is a real folder-kanban orchestration topology with state transitions (`inbox→doing→done/failed`) — a hand-rolled state machine analogous to LangGraph nodes/edges. `packages/mcp-client` + `mcp-server` (:4007) provide tool binding. → **04 §A** ports the topology onto LangGraph and adds an A2A handoff contract. |
+| **RAG Architecture** | Hybrid retrieval, reranking, HyDE, Self-RAG | RAPTOR, multimodal | `packages/extraction` + `extraction-service` (:4005) parse URLs/docs into retrievable units today; `invt_trdg` AI chat already does retrieve-then-reason over structured data. → **04 §B** adds vector+BM25 hybrid, cross-encoder rerank, HyDE & CRAG loops. |
+| **Structured Retrieval** | Text-to-SQL, schema-aware retrieval | Snowflake Cortex, BigQuery ML | `invt_trdg` AI chat assistant maps NL → trading actions/queries over a typed domain (quotes, plans, watchlists) — schema-aware tool-calling, the safe cousin of free Text-to-SQL. → **04 §C** adds a guarded Text-to-SQL tool with read-only views + row-level filters. |
+| **Unstructured Retrieval** | PDF parsing, layout-aware chunking | Multi-modal pipelines | `packages/extraction` + `extraction-service`; `notelett` ingests structured notes for humans+agents. → **04 §B** adds PyMuPDF/Unstructured.io layout-aware chunking + OCR fallback. |
+| **Graph RAG** | KG + vector hybrid | SPARQL, ontology design | We run **Azure Cosmos DB** (`packages/cosmos`); Cosmos exposes the **Gremlin** graph API. `event-store`/`events` already model entity relationships. → **04 §D** stands up a Cosmos Gremlin knowledge graph + graph-augmented retrieval. |
+| **Vector Databases** | Pinecone / Weaviate / Azure AI Search | Qdrant, pgvector, multi-tenancy | Postgres is in the stack; **pgvector** is the lowest-friction path. Multi-tenant namespace isolation is already a first-class concern (per-product `productId`, two-instance Hermes Vijay/Bheem). → **04 §B** adds pgvector with per-tenant namespaces. |
+| **Grounding & Eval** | RAGAS, TruLens, faithfulness SLAs | LangSmith, LLM-as-judge | `flowmonk` deliberately **bounds the AI layer to explanation/safe recommendation** over a deterministic engine — a production grounding pattern. `diagnostics-client`/`telemetry-client`/`monitoring` + Hermes dashboards are the eval-harness home. → **04 §E** wires a RAGAS/DeepEval harness + drift monitor pane in Hermes. |
+| **Cloud Platform** | Azure (AI Foundry, OpenAI, Search) | AWS Bedrock, GCP Vertex | Azure Cosmos DB in prod (`_AZURE/`, `packages/cosmos`); `packages/llm-router` abstracts providers so Azure OpenAI / Bedrock / Vertex are swap-in. → **02** discusses Azure AI Search as the managed hybrid index. |
+| **AI Governance** | Access-controlled RAG, Zero Trust | SR 11-7, EU AI Act | `packages/auth` + `fastify-auth`, `field-encrypt`/`client-encrypt` (column/field masking), `feature-flag-client` + `kill-switch-client` (instant model kill), `event-store` (immutable audit). MCP tool boundaries are explicit. → **05** maps all of this to SR 11-7 + EU AI Act. |
+| **Domain: Banking** | Support / compliance automation | Model risk mgmt, KYC/AML | `invt_trdg` is our regulated-industry analog (markets, trade plans, alerts, auditability). → **05** translates it into a bank customer-support + compliance-retrieval blueprint. |
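+
+A minimal sketch of the `inbox→doing→done/failed` lifecycle named in the Agentic Frameworks row, useful when explaining why it is analogous to graph nodes and edges. This is an in-memory illustration only: `Status`, `TRANSITIONS`, and `Task` are hypothetical names, and the real `agent-queue` (folder-based) implementation is not shown here.
+
+```python
+from enum import Enum
+from typing import Dict, List, Set
+
+
+class Status(Enum):
+    INBOX = "inbox"
+    DOING = "doing"
+    DONE = "done"
+    FAILED = "failed"
+
+
+# Allowed edges of the state machine; done and failed are terminal.
+TRANSITIONS: Dict[Status, Set[Status]] = {
+    Status.INBOX: {Status.DOING},
+    Status.DOING: {Status.DONE, Status.FAILED},
+    Status.DONE: set(),
+    Status.FAILED: set(),
+}
+
+
+class Task:
+    def __init__(self, task_id: str, engine: str) -> None:
+        self.task_id = task_id
+        self.engine = engine  # e.g. "claude", "codex", "devin"
+        self.status = Status.INBOX
+        self.history: List[Status] = [Status.INBOX]
+
+    def move(self, target: Status) -> None:
+        """Apply a transition, rejecting any edge not in TRANSITIONS."""
+        if target not in TRANSITIONS[self.status]:
+            raise ValueError(f"illegal transition {self.status.value} -> {target.value}")
+        self.status = target
+        self.history.append(target)
+```
+
+Keeping the legal edges in one table makes the topology auditable: the history list doubles as a per-task decision trail.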
+
+---
+
+## Honest gap analysis (say this out loud — it builds trust)
+
+Be candid in the interview. Frame it as *"here's what's production-real, here's what's
+adjacent, here's exactly how I'd close the gap."*
+
+```mermaid
+quadrantChart
+    title Evidence strength vs. JD centrality
+    x-axis "Adjacent / planned" --> "Production-real today"
+    y-axis "Nice-to-have" --> "Core to the role"
+    quadrant-1 "Lead with these"
+    quadrant-2 "Build before interview if possible"
+    quadrant-3 "Mention, don't dwell"
+    quadrant-4 "Frame as quick wins"
+    "MCP tool boundaries": [0.82, 0.78]
+    "Multi-agent orchestration (agent-queue)": [0.75, 0.7]
+    "Access-controlled / Zero-Trust retrieval": [0.8, 0.85]
+    "Bounded grounding (flowmonk)": [0.78, 0.9]
+    "Schema-aware tool-calling (invt_trdg)": [0.72, 0.72]
+    "LangGraph (prod-grade)": [0.3, 0.88]
+    "RAGAS / TruLens eval harness": [0.25, 0.86]
+    "pgvector hybrid retrieval": [0.35, 0.8]
+    "Cosmos Gremlin Graph RAG": [0.3, 0.6]
+    "Google ADK / A2A": [0.2, 0.4]
+    "RAPTOR / HyDE / CRAG / Self-RAG": [0.28, 0.65]
+    "SR 11-7 / EU AI Act docs": [0.45, 0.7]
+```
+
+**Three sentences to own the gap:**
+> "My production depth is in **agentic orchestration, MCP tool boundaries, and bounded
+> grounding** — the parts that decide whether an agentic system is *safe* in a regulated
+> setting. I've architected the classic LangChain/LangGraph and RAGAS surface area and can
+> stand it up fast; in fact, I've scoped exactly that as a roadmap on our own platform.
+> What I bring that's harder to hire is the **governance instinct** — designing retrieval
+> so that masking, kill-switches, and audit trails are structural, not bolted on."
+
+---
+
+## One-line elevator pitch for the role
+
+> *"I build agentic systems where the interesting engineering is in the **boundaries** —
+> what a tool is allowed to retrieve, how a model's output is grounded and cited, and how
+> every hop is audited — and I've been running a multi-product ecosystem (MCP servers,
+> a multi-agent runner, provider-routed LLMs, encrypted/flagged data access) that is one
+> deliberate roadmap away from being a textbook enterprise agentic-RAG fabric."*