# 06 · Glossary & Rapid-Fire Quick-Reference The night-before doc. Crisp definitions + one-liners you can fire back. If you can say the **bold** line cleanly for each, you sound fluent. --- ## Advanced RAG techniques | Term | What it is | One-liner | |---|---|---| | **HyDE** | Hypothetical Document Embeddings — LLM drafts a hypothetical answer; you embed *that* and retrieve against it. | **"Fixes recall by closing the question↔document vocabulary gap."** | | **CRAG** | Corrective RAG — grade retrieved docs; if weak, correct (re-retrieve / alt source / rewrite) before generating. | **"A relevance gate that re-retrieves instead of generating from junk."** | | **Self-RAG** | Model emits reflection tokens deciding *whether to retrieve* and *whether its draft is supported*; loops if not. | **"The model critiques its own groundedness before answering."** | | **RAPTOR** | Recursively cluster + summarize chunks into a tree; retrieve at the abstraction level the query needs. | **"Multi-resolution retrieval for long corpora — summary nodes for broad Qs, leaves for specifics."** | | **Reranking** | Cross-encoder scores (query, passage) jointly after first-stage retrieval — far more precise than bi-encoder similarity. | **"Fixes precision; I rerank only the top-k to control latency."** | | **ColBERT** | Late-interaction reranking — token-level MaxSim. Accurate *and* scalable. | **"Token-level matching without re-encoding the whole pair per query."** | | **Context compression** | Drop/condense retrieved spans to fit budget and cut distraction. | **"Less, more-relevant context beats more context."** | | **Hybrid search** | Dense (vector) ⊕ sparse (BM25) ⊕ graph, fused via **RRF**. | **"Vector for meaning, BM25 for exact terms, graph for relationships."** | | **RRF** | Reciprocal Rank Fusion — combine rankings by `Σ 1/(k+rank)`; no score calibration needed. | **"Tuning-free way to fuse vector + lexical ranks."** | | **Semantic chunking** | Split on topic/structure boundaries, not byte counts. | **"Chunk on document structure first, size second."** | | **Agentic RAG** | An agent *decides* when/what/how to retrieve, can use multiple tools and loop. | **"RAG as a control flow, not a single hop."** | --- ## Evaluation metrics (RAGAS vocabulary) | Metric | Question it answers | What it isolates | |---|---|---| | **Faithfulness / groundedness** | Is every claim supported by retrieved context? | Hallucination | | **Answer relevancy** | Does the answer address the question? | Generation focus | | **Context precision** | Are the *top* retrieved chunks the relevant ones? | Reranker quality | | **Context recall** | Did we retrieve *all* needed evidence? | Retriever quality | | **Answer correctness** | Right vs. ground truth? | End-to-end | - **RAGAS** — library that computes these (often LLM-as-judge). - **TruLens** — RAG-triad + feedback functions + tracing. - **DeepEval** — pytest-style assertions → CI gates. - **LangSmith** — tracing + eval ops for LangChain/LangGraph. > **Diagnostic move to say out loud:** *"Low context-recall → fix the **retriever** (HyDE, > hybrid, chunking). High recall but low context-precision → fix the **reranker**. Good > context but low faithfulness → fix the **generator/prompt** or **abstain**."* --- ## Agentic frameworks | Term | Essence | |---|---| | **LangChain** | Primitives: tool binding, structured output, retrievers, chains. | | **LangGraph** | Agents as **state graphs**: typed state, nodes, conditional + cyclic edges, checkpointer, human-in-loop. | | **Google ADK** | Google's Agent Development Kit for building/deploying agents. | | **A2A** | Agent-to-Agent protocol: agent cards (capabilities/auth), task lifecycle, message/artifact exchange — agent interop. | | **AutoGen** | Conversational multi-agent loops (agents talk to each other). | | **MCP** | Model Context Protocol: standard for exposing **tools/resources** to models via a server, with typed registration. | --- ## Governance & regulatory | Term | What to say | |---|---| | **SR 11-7** | US Fed/OCC **model risk management** guidance: model inventory, independent validation, ongoing monitoring, change control, ability to constrain a model. **"My eval harness + model cards + kill-switch + decision log map directly to its pillars."** | | **OCC model risk** | Aligned with SR 11-7; emphasizes governance + effective challenge. | | **EU AI Act** | Risk-tiered AI regulation: classification, logging/traceability, human oversight, transparency for high-risk systems. **"High-risk paths get human-in-the-loop and full decision logging by design."** | | **GDPR / CCPA** | Data protection/minimization, subject access/deletion. **"Field masking + provenance + tenant namespaces make minimization and targeted deletion structural."** | | **Zero Trust (for agents)** | Never trust the agent's request implicitly; verify identity + scope at the tool boundary **every** call. **"The MCP server is my policy enforcement point."** | | **Access-controlled retrieval** | Retrieve only chunks the caller's identity/role entitles them to (pre-retrieval ACL filter). | | **Row/column masking** | Mask sensitive fields at the boundary regardless of query. | | **Model card** | Per-model doc: purpose, data, eval scores, limits, owner, review date, kill-switch. | | **RACI** | Responsible/Accountable/Consulted/Informed matrix per component — governance ownership made explicit. | --- ## ByteLyst anchor cheat-sheet (so you never blank on "where have you done this") | JD theme | Say this anchor | |---|---| | Multi-agent orchestration | **`agent-queue`** — claude/codex/devin, `inbox→doing→done/failed` state machine. | | MCP / Zero-Trust tool boundary | **`mcp-server` :4007 + `mcp-client`**, authZ per call, masking, kill-switch, audit. | | Provider-portable models | **`llm-router`** (Azure OpenAI / Bedrock / Vertex / Ollama). | | Grounding by architecture | **`flowmonk`** — deterministic engine authoritative, AI = explanation/safe-reco only. | | Schema-aware structured retrieval | **`invt_trdg` AI chat** — typed tool-calling over markets, not free SQL. | | Unstructured ingestion | **`extraction-service` :4005 + `packages/extraction`**; `notelett` store. | | Graph (Cosmos Gremlin) | **`packages/cosmos`** — Cosmos DB Gremlin API in prod. | | Vector / multi-tenancy | **pgvector path** + `productId` / two-instance Hermes isolation. | | Eval / ops console | **Hermes Mission Control** + `telemetry`/`diagnostics`/`monitoring`. | | Governance primitives | **`auth`/`fastify-auth`, `field-encrypt`, `feature-flag`/`kill-switch`, `event-store`**. | | Banking domain analog | **`invt_trdg`** — regulated-consequence domain; abstain-over-guess discipline. | --- ## Likely curveballs + crisp answers - **"How do you stop prompt injection from retrieved docs?"** → "Treat retrieved text as *data*, never instructions; the generator can't re-invoke tools without re-passing the MCP authZ gate; egress is masked + logged so even a successful injection can't exfiltrate unentitled fields." - **"Vector DB choice?"** → "Decided last, from requirements. Azure → Azure AI Search (managed hybrid+rerank, one audit boundary). Tight transactional coupling → pgvector. I don't pick a vector DB before I know tenant count, filter cardinality, recall target, and the audit boundary." - **"How do you measure if RAG is 'good enough' for prod?"** → "SLAs as gates: faithfulness ≥ 0.9, citation 100%, abstain instead of guess; DeepEval blocks deploy below threshold; online RAGAS + drift alerts catch degradation." - **"Free Text-to-SQL — yes or no?"** → "Bounded domains → typed tool-calling (auditable, inject-resistant). Genuine ad-hoc analytics → generative SQL behind read-only views with RLS, validated + cost-capped. Never raw SQL on base tables." - **"Latency vs. accuracy trade-off?"** → "Route by query: FAQ/parametric skips retrieval; only complex queries pay for hybrid + rerank + critic loop. Rerank top-k only. The critic loop is bounded to N iterations then abstains." - **"How is this different from a chatbot demo?"** → "The boundaries: access-controlled retrieval, mandatory citation, abstain-and-escalate, masking, kill-switch, and a reproducible decision log. A demo answers; a regulated system can *prove* why it answered and refuse when it shouldn't." - **"What worries you most in agentic RAG for banking?"** → "Silent factual drift and over-broad tool scope. I counter drift with online eval + alerts, and scope with typed MCP tools + least-privilege entitlement at every call." --- ## 30-second close > *"I build agentic systems where the hard engineering is in the boundaries — what a tool > can retrieve, how output is grounded and cited, when to abstain, and how every hop is > audited. I've been running an ecosystem (MCP servers, a multi-agent runner, provider- > routed LLMs, encrypted/flagged/audited data access) that's one deliberate roadmap away > from a textbook enterprise agentic-RAG fabric — and I've already written that roadmap."*