bytelyst-devops-tools/docs/INTERVIEW/06-glossary-quickref.md
Hermes VM 076449268b docs(interview): add Senior Agentic RAG Architect prep kit
7-doc kit mapping the JD competency matrix to the ByteLyst ecosystem:
ecosystem-as-RAG-fabric architecture, competency deep-dives, STAR bank,
enhancement roadmap, banking blueprints, and a glossary quick-ref.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 10:48:52 +00:00

113 lines
8.9 KiB
Markdown

# 06 · Glossary & Rapid-Fire Quick-Reference
The night-before doc. Crisp definitions + one-liners you can fire back. If you can say the
**bold** line cleanly for each, you sound fluent.
---
## Advanced RAG techniques
| Term | What it is | One-liner |
|---|---|---|
| **HyDE** | Hypothetical Document Embeddings — LLM drafts a hypothetical answer; you embed *that* and retrieve against it. | **"Fixes recall by closing the question↔document vocabulary gap."** |
| **CRAG** | Corrective RAG — grade retrieved docs; if weak, correct (re-retrieve / alt source / rewrite) before generating. | **"A relevance gate that re-retrieves instead of generating from junk."** |
| **Self-RAG** | Model emits reflection tokens deciding *whether to retrieve* and *whether its draft is supported*; loops if not. | **"The model critiques its own groundedness before answering."** |
| **RAPTOR** | Recursively cluster + summarize chunks into a tree; retrieve at the abstraction level the query needs. | **"Multi-resolution retrieval for long corpora — summary nodes for broad Qs, leaves for specifics."** |
| **Reranking** | Cross-encoder scores (query, passage) jointly after first-stage retrieval — far more precise than bi-encoder similarity. | **"Fixes precision; I rerank only the top-k to control latency."** |
| **ColBERT** | Late-interaction reranking — token-level MaxSim. Accurate *and* scalable. | **"Token-level matching without re-encoding the whole pair per query."** |
| **Context compression** | Drop/condense retrieved spans to fit budget and cut distraction. | **"Less, more-relevant context beats more context."** |
| **Hybrid search** | Dense (vector) ⊕ sparse (BM25) ⊕ graph, fused via **RRF**. | **"Vector for meaning, BM25 for exact terms, graph for relationships."** |
| **RRF** | Reciprocal Rank Fusion — combine rankings by `Σ 1/(k+rank)`; no score calibration needed. | **"Tuning-free way to fuse vector + lexical ranks."** |
| **Semantic chunking** | Split on topic/structure boundaries, not byte counts. | **"Chunk on document structure first, size second."** |
| **Agentic RAG** | An agent *decides* when/what/how to retrieve, can use multiple tools and loop. | **"RAG as a control flow, not a single hop."** |
---
## Evaluation metrics (RAGAS vocabulary)
| Metric | Question it answers | What it isolates |
|---|---|---|
| **Faithfulness / groundedness** | Is every claim supported by retrieved context? | Hallucination |
| **Answer relevancy** | Does the answer address the question? | Generation focus |
| **Context precision** | Are the *top* retrieved chunks the relevant ones? | Reranker quality |
| **Context recall** | Did we retrieve *all* needed evidence? | Retriever quality |
| **Answer correctness** | Right vs. ground truth? | End-to-end |
- **RAGAS** — library that computes these (often LLM-as-judge).
- **TruLens** — RAG-triad + feedback functions + tracing.
- **DeepEval** — pytest-style assertions → CI gates.
- **LangSmith** — tracing + eval ops for LangChain/LangGraph.
> **Diagnostic move to say out loud:** *"Low context-recall → fix the **retriever** (HyDE,
> hybrid, chunking). High recall but low context-precision → fix the **reranker**. Good
> context but low faithfulness → fix the **generator/prompt** or **abstain**."*
---
## Agentic frameworks
| Term | Essence |
|---|---|
| **LangChain** | Primitives: tool binding, structured output, retrievers, chains. |
| **LangGraph** | Agents as **state graphs**: typed state, nodes, conditional + cyclic edges, checkpointer, human-in-loop. |
| **Google ADK** | Google's Agent Development Kit for building/deploying agents. |
| **A2A** | Agent-to-Agent protocol: agent cards (capabilities/auth), task lifecycle, message/artifact exchange — agent interop. |
| **AutoGen** | Conversational multi-agent loops (agents talk to each other). |
| **MCP** | Model Context Protocol: standard for exposing **tools/resources** to models via a server, with typed registration. |
---
## Governance & regulatory
| Term | What to say |
|---|---|
| **SR 11-7** | US Fed/OCC **model risk management** guidance: model inventory, independent validation, ongoing monitoring, change control, ability to constrain a model. **"My eval harness + model cards + kill-switch + decision log map directly to its pillars."** |
| **OCC model risk** | Aligned with SR 11-7; emphasizes governance + effective challenge. |
| **EU AI Act** | Risk-tiered AI regulation: classification, logging/traceability, human oversight, transparency for high-risk systems. **"High-risk paths get human-in-the-loop and full decision logging by design."** |
| **GDPR / CCPA** | Data protection/minimization, subject access/deletion. **"Field masking + provenance + tenant namespaces make minimization and targeted deletion structural."** |
| **Zero Trust (for agents)** | Never trust the agent's request implicitly; verify identity + scope at the tool boundary **every** call. **"The MCP server is my policy enforcement point."** |
| **Access-controlled retrieval** | Retrieve only chunks the caller's identity/role entitles them to (pre-retrieval ACL filter). |
| **Row/column masking** | Mask sensitive fields at the boundary regardless of query. |
| **Model card** | Per-model doc: purpose, data, eval scores, limits, owner, review date, kill-switch. |
| **RACI** | Responsible/Accountable/Consulted/Informed matrix per component — governance ownership made explicit. |
---
## ByteLyst anchor cheat-sheet (so you never blank on "where have you done this")
| JD theme | Say this anchor |
|---|---|
| Multi-agent orchestration | **`agent-queue`** — claude/codex/devin, `inbox→doing→done/failed` state machine. |
| MCP / Zero-Trust tool boundary | **`mcp-server` :4007 + `mcp-client`**, authZ per call, masking, kill-switch, audit. |
| Provider-portable models | **`llm-router`** (Azure OpenAI / Bedrock / Vertex / Ollama). |
| Grounding by architecture | **`flowmonk`** — deterministic engine authoritative, AI = explanation/safe-reco only. |
| Schema-aware structured retrieval | **`invt_trdg` AI chat** — typed tool-calling over markets, not free SQL. |
| Unstructured ingestion | **`extraction-service` :4005 + `packages/extraction`**; `notelett` store. |
| Graph (Cosmos Gremlin) | **`packages/cosmos`** — Cosmos DB Gremlin API in prod. |
| Vector / multi-tenancy | **pgvector path** + `productId` / two-instance Hermes isolation. |
| Eval / ops console | **Hermes Mission Control** + `telemetry`/`diagnostics`/`monitoring`. |
| Governance primitives | **`auth`/`fastify-auth`, `field-encrypt`, `feature-flag`/`kill-switch`, `event-store`**. |
| Banking domain analog | **`invt_trdg`** — regulated-consequence domain; abstain-over-guess discipline. |
---
## Likely curveballs + crisp answers
- **"How do you stop prompt injection from retrieved docs?"** → "Treat retrieved text as *data*, never instructions; the generator can't re-invoke tools without re-passing the MCP authZ gate; egress is masked + logged so even a successful injection can't exfiltrate unentitled fields."
- **"Vector DB choice?"** → "Decided last, from requirements. Azure → Azure AI Search (managed hybrid+rerank, one audit boundary). Tight transactional coupling → pgvector. I don't pick a vector DB before I know tenant count, filter cardinality, recall target, and the audit boundary."
- **"How do you measure if RAG is 'good enough' for prod?"** → "SLAs as gates: faithfulness ≥ 0.9, citation 100%, abstain instead of guess; DeepEval blocks deploy below threshold; online RAGAS + drift alerts catch degradation."
- **"Free Text-to-SQL — yes or no?"** → "Bounded domains → typed tool-calling (auditable, inject-resistant). Genuine ad-hoc analytics → generative SQL behind read-only views with RLS, validated + cost-capped. Never raw SQL on base tables."
- **"Latency vs. accuracy trade-off?"** → "Route by query: FAQ/parametric skips retrieval; only complex queries pay for hybrid + rerank + critic loop. Rerank top-k only. The critic loop is bounded to N iterations then abstains."
- **"How is this different from a chatbot demo?"** → "The boundaries: access-controlled retrieval, mandatory citation, abstain-and-escalate, masking, kill-switch, and a reproducible decision log. A demo answers; a regulated system can *prove* why it answered and refuse when it shouldn't."
- **"What worries you most in agentic RAG for banking?"** → "Silent factual drift and over-broad tool scope. I counter drift with online eval + alerts, and scope with typed MCP tools + least-privilege entitlement at every call."
---
## 30-second close
> *"I build agentic systems where the hard engineering is in the boundaries — what a tool
> can retrieve, how output is grounded and cited, when to abstain, and how every hop is
> audited. I've been running an ecosystem (MCP servers, a multi-agent runner, provider-
> routed LLMs, encrypted/flagged/audited data access) that's one deliberate roadmap away
> from a textbook enterprise agentic-RAG fabric — and I've already written that roadmap."*