bytelyst-devops-tools/docs/INTERVIEW/06-glossary-quickref.md
Hermes VM 076449268b docs(interview): add Senior Agentic RAG Architect prep kit
7-doc kit mapping the JD competency matrix to the ByteLyst ecosystem:
ecosystem-as-RAG-fabric architecture, competency deep-dives, STAR bank,
enhancement roadmap, banking blueprints, and a glossary quick-ref.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 10:48:52 +00:00


06 · Glossary & Rapid-Fire Quick-Reference

The night-before doc. Crisp definitions + one-liners you can fire back. If you can say the bold line cleanly for each, you sound fluent.


Advanced RAG techniques

| Term | What it is | One-liner |
| --- | --- | --- |
| HyDE | Hypothetical Document Embeddings: the LLM drafts a hypothetical answer; you embed that and retrieve against it. | "Fixes recall by closing the question↔document vocabulary gap." |
| CRAG | Corrective RAG: grade retrieved docs; if weak, correct (re-retrieve / alt source / rewrite) before generating. | "A relevance gate that re-retrieves instead of generating from junk." |
| Self-RAG | Model emits reflection tokens deciding whether to retrieve and whether its draft is supported; loops if not. | "The model critiques its own groundedness before answering." |
| RAPTOR | Recursively cluster + summarize chunks into a tree; retrieve at the abstraction level the query needs. | "Multi-resolution retrieval for long corpora: summary nodes for broad Qs, leaves for specifics." |
| Reranking | Cross-encoder scores (query, passage) jointly after first-stage retrieval; far more precise than bi-encoder similarity. | "Fixes precision; I rerank only the top-k to control latency." |
| ColBERT | Late-interaction model: query and passage tokens are encoded separately, then scored with token-level MaxSim. More precise than single-vector similarity, cheaper than a cross-encoder. | "Token-level matching without re-encoding the whole pair per query." |
| Context compression | Drop/condense retrieved spans to fit budget and cut distraction. | "Less, more-relevant context beats more context." |
| Hybrid search | Dense (vector) ⊕ sparse (BM25) ⊕ graph, fused via RRF. | "Vector for meaning, BM25 for exact terms, graph for relationships." |
| RRF | Reciprocal Rank Fusion: combine rankings by Σ 1/(k+rank); no score calibration needed. | "Tuning-free way to fuse vector + lexical ranks." |
| Semantic chunking | Split on topic/structure boundaries, not byte counts. | "Chunk on document structure first, size second." |
| Agentic RAG | An agent decides when/what/how to retrieve, can use multiple tools and loop. | "RAG as a control flow, not a single hop." |
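
The RRF entry above gives the fusion rule as Σ 1/(k+rank). A minimal Python sketch of that rule follows; the function name `rrf_fuse`, the sample document ids, and the default `k=60` (a commonly used constant, not something this doc specifies) are assumptions for illustration.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of doc ids with Reciprocal Rank Fusion.

    Each doc scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Only ranks are used, never raw scores.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; tie-break on doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical first-stage results from a dense and a sparse retriever.
dense = ["d3", "d1", "d2"]
sparse = ["d1", "d4", "d3"]
fused = rrf_fuse([dense, sparse])
print(fused)
```

Because only ranks enter the sum, the dense and sparse retrievers never need their scores put on a common scale, which is the "no score calibration" point in the table.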

Evaluation metrics (RAGAS vocabulary)

| Metric | Question it answers | What it isolates |
| --- | --- | --- |
| Faithfulness / groundedness | Is every claim supported by retrieved context? | Hallucination |
| Answer relevancy | Does the answer address the question? | Generation focus |
| Context precision | Are the top retrieved chunks the relevant ones? | Reranker quality |
| Context recall | Did we retrieve all needed evidence? | Retriever quality |
| Answer correctness | Right vs. ground truth? | End-to-end |

  • RAGAS — library that computes these (often LLM-as-judge).
  • TruLens — RAG-triad + feedback functions + tracing.
  • DeepEval — pytest-style assertions → CI gates.
  • LangSmith — tracing + eval ops for LangChain/LangGraph.

Diagnostic move to say out loud: "Low context-recall → fix the retriever (HyDE, hybrid, chunking). High recall but low context-precision → fix the reranker. Good context but low faithfulness → fix the generator/prompt or abstain."
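
The diagnostic move above is a short decision ladder, and it can be sketched as a function. This is an illustrative sketch only: the name `diagnose`, the returned labels, and the single `threshold=0.8` cut-off are assumptions, and real thresholds would be set per metric from your own eval data.

```python
def diagnose(context_recall, context_precision, faithfulness, threshold=0.8):
    """Map RAG eval scores (0..1) to the component to fix first.

    The order mirrors the spoken diagnostic: retriever, then reranker,
    then generator. The threshold is a placeholder, not a recommendation.
    """
    if context_recall < threshold:
        return "retriever"   # missing evidence: HyDE, hybrid search, chunking
    if context_precision < threshold:
        return "reranker"    # evidence retrieved but ranked poorly
    if faithfulness < threshold:
        return "generator"   # good context, unsupported claims: prompt or abstain
    return "ok"

print(diagnose(context_recall=0.55, context_precision=0.90, faithfulness=0.95))
print(diagnose(context_recall=0.92, context_precision=0.60, faithfulness=0.95))
print(diagnose(context_recall=0.92, context_precision=0.90, faithfulness=0.70))
```

Checking recall first matters: precision and faithfulness are hard to interpret when the needed evidence was never retrieved.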


Agentic frameworks

| Term | Essence |
| --- | --- |
| LangChain | Primitives: tool binding, structured output, retrievers, chains. |
| LangGraph | Agents as state graphs: typed state, nodes, conditional + cyclic edges, checkpointer, human-in-loop. |
| Google ADK | Google's Agent Development Kit for building/deploying agents. |
| A2A | Agent-to-Agent protocol for agent interop: agent cards (capabilities/auth), task lifecycle, message/artifact exchange. |
| AutoGen | Conversational multi-agent loops (agents talk to each other). |
| MCP | Model Context Protocol: standard for exposing tools/resources to models via a server, with typed registration. |

Governance & regulatory

| Term | What to say |
| --- | --- |
| SR 11-7 | Federal Reserve supervisory guidance on model risk management, issued jointly with the OCC (as OCC Bulletin 2011-12): model inventory, independent validation, ongoing monitoring, change control, ability to constrain a model. "My eval harness + model cards + kill-switch + decision log map directly to its pillars." |
| OCC model risk | OCC Bulletin 2011-12, the OCC's issuance of the same guidance as SR 11-7; emphasizes governance + effective challenge. |
| EU AI Act | Risk-tiered AI regulation: classification, logging/traceability, human oversight, transparency for high-risk systems. "High-risk paths get human-in-the-loop and full decision logging by design." |
| GDPR / CCPA | Data protection/minimization, subject access/deletion. "Field masking + provenance + tenant namespaces make minimization and targeted deletion structural." |
| Zero Trust (for agents) | Never trust the agent's request implicitly; verify identity + scope at the tool boundary every call. "The MCP server is my policy enforcement point." |
| Access-controlled retrieval | Retrieve only chunks the caller's identity/role entitles them to (pre-retrieval ACL filter). |
| Row/column masking | Mask sensitive fields at the boundary regardless of query. |
| Model card | Per-model doc: purpose, data, eval scores, limits, owner, review date, kill-switch. |
| RACI | Responsible/Accountable/Consulted/Informed matrix per component; governance ownership made explicit. |
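
Access-controlled retrieval and row/column masking from the table above can be sketched together. This is a toy illustration under stated assumptions: `Chunk`, `SENSITIVE_FIELDS`, `entitled`, `mask`, and `retrieve` are hypothetical names, the keyword match stands in for real vector or hybrid scoring, and none of it is taken from the ByteLyst codebase.

```python
from dataclasses import dataclass, field

# Hypothetical list of fields that must never leave the boundary unmasked.
SENSITIVE_FIELDS = {"account_number", "ssn"}

@dataclass
class Chunk:
    text: str
    tenant: str
    allowed_roles: frozenset
    fields: dict = field(default_factory=dict)

def entitled(chunk, caller_tenant, caller_roles):
    """True only if the caller shares the tenant and holds an allowed role."""
    return chunk.tenant == caller_tenant and bool(chunk.allowed_roles & caller_roles)

def mask(fields):
    """Mask sensitive fields regardless of what the query asked for."""
    return {k: ("***" if k in SENSITIVE_FIELDS else v) for k, v in fields.items()}

def retrieve(corpus, caller_tenant, caller_roles, query_terms):
    # ACL filter runs BEFORE matching, so unentitled chunks never enter ranking.
    candidates = [c for c in corpus if entitled(c, caller_tenant, caller_roles)]
    hits = [c for c in candidates if any(t in c.text.lower() for t in query_terms)]
    return [(c.text, mask(c.fields)) for c in hits]

corpus = [
    Chunk("Wire transfer limits for retail", "bank-a", frozenset({"teller"}),
          {"account_number": "123456", "branch": "north"}),
    Chunk("Wire transfer limits for private banking", "bank-a", frozenset({"advisor"})),
    Chunk("Wire transfer limits", "bank-b", frozenset({"teller"})),
]
results = retrieve(corpus, "bank-a", frozenset({"teller"}), ["wire"])
print(results)
```

Filtering before ranking, instead of after, is the design point: a post-filter can still leak through scores, counts, or reranker inputs.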

ByteLyst anchor cheat-sheet (so you never blank on "where have you done this")

| JD theme | Say this anchor |
| --- | --- |
| Multi-agent orchestration | agent-queue: claude/codex/devin, inbox→doing→done/failed state machine. |
| MCP / Zero-Trust tool boundary | mcp-server :4007 + mcp-client, authZ per call, masking, kill-switch, audit. |
| Provider-portable models | llm-router (Azure OpenAI / Bedrock / Vertex / Ollama). |
| Grounding by architecture | flowmonk: deterministic engine authoritative, AI = explanation/safe-reco only. |
| Schema-aware structured retrieval | invt_trdg AI chat: typed tool-calling over markets, not free SQL. |
| Unstructured ingestion | extraction-service :4005 + packages/extraction; notelett store. |
| Graph (Cosmos Gremlin) | packages/cosmos: Cosmos DB Gremlin API in prod. |
| Vector / multi-tenancy | pgvector path + productId / two-instance Hermes isolation. |
| Eval / ops console | Hermes Mission Control + telemetry/diagnostics/monitoring. |
| Governance primitives | auth/fastify-auth, field-encrypt, feature-flag/kill-switch, event-store. |
| Banking domain analog | invt_trdg: regulated-consequence domain; abstain-over-guess discipline. |
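
If asked to whiteboard the inbox→doing→done/failed state machine from the first row, a minimal sketch could look like the following. It is not the agent-queue implementation: `State`, `TRANSITIONS`, and `advance` are hypothetical names, and treating `done` and `failed` as terminal (no retry edge) is an assumption, since the table only names the states.

```python
from enum import Enum

class State(Enum):
    INBOX = "inbox"
    DOING = "doing"
    DONE = "done"
    FAILED = "failed"

# Allowed edges. Assumption: done and failed are terminal (no retry edge).
TRANSITIONS = {
    State.INBOX: {State.DOING},
    State.DOING: {State.DONE, State.FAILED},
    State.DONE: set(),
    State.FAILED: set(),
}

def advance(current, target):
    """Return the new state, or raise if the edge is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target

state = State.INBOX
state = advance(state, State.DOING)
state = advance(state, State.DONE)
print(state.value)
```

Keeping the edges in one table makes illegal moves (for example inbox straight to done) fail loudly, which is what makes the task history auditable.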

Likely curveballs + crisp answers

  • "How do you stop prompt injection from retrieved docs?" → "Treat retrieved text as data, never instructions; the generator can't re-invoke tools without re-passing the MCP authZ gate; egress is masked + logged so even a successful injection can't exfiltrate unentitled fields."
  • "Vector DB choice?" → "Decided last, from requirements. Azure → Azure AI Search (managed hybrid+rerank, one audit boundary). Tight transactional coupling → pgvector. I don't pick a vector DB before I know tenant count, filter cardinality, recall target, and the audit boundary."
  • "How do you measure if RAG is 'good enough' for prod?" → "SLAs as gates: faithfulness ≥ 0.9, citation 100%, abstain instead of guess; DeepEval blocks deploy below threshold; online RAGAS + drift alerts catch degradation."
  • "Free Text-to-SQL — yes or no?" → "Bounded domains → typed tool-calling (auditable, inject-resistant). Genuine ad-hoc analytics → generative SQL behind read-only views with RLS, validated + cost-capped. Never raw SQL on base tables."
  • "Latency vs. accuracy trade-off?" → "Route by query: FAQ/parametric skips retrieval; only complex queries pay for hybrid + rerank + critic loop. Rerank top-k only. The critic loop is bounded to N iterations then abstains."
  • "How is this different from a chatbot demo?" → "The boundaries: access-controlled retrieval, mandatory citation, abstain-and-escalate, masking, kill-switch, and a reproducible decision log. A demo answers; a regulated system can prove why it answered and refuse when it shouldn't."
  • "What worries you most in agentic RAG for banking?" → "Silent factual drift and over-broad tool scope. I counter drift with online eval + alerts, and scope with typed MCP tools + least-privilege entitlement at every call."
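
The latency answer above mentions a critic loop that is bounded to N iterations and then abstains. A minimal sketch of that control flow, assuming the caller supplies the `retrieve`, `generate`, and `is_grounded` callables (all hypothetical names, as are `answer_with_critic` and the default `max_iters=3`):

```python
def answer_with_critic(question, retrieve, generate, is_grounded, max_iters=3):
    """Retrieve, draft, critique; retry up to max_iters, then abstain.

    retrieve(query) -> context, generate(question, context) -> draft,
    is_grounded(draft, context) -> bool. All three are caller-supplied stubs.
    """
    query = question
    for attempt in range(1, max_iters + 1):
        context = retrieve(query)
        draft = generate(question, context)
        if is_grounded(draft, context):
            return {"status": "answered", "answer": draft, "attempts": attempt}
        # Placeholder for a real query rewrite step before the next attempt.
        query = f"{question} (retry {attempt})"
    # Budget exhausted: abstain instead of returning an ungrounded draft.
    return {"status": "abstained", "answer": None, "attempts": max_iters}

# Toy stubs: the first retrieval returns nothing, the retry finds evidence.
def toy_retrieve(query):
    return ["limit is 10k"] if "retry" in query else []

def toy_generate(question, context):
    return context[0] if context else "probably 5k"

def toy_is_grounded(draft, context):
    return draft in context

result = answer_with_critic("What is the wire limit?", toy_retrieve, toy_generate, toy_is_grounded)
print(result)
```

The hard cap is what keeps worst-case latency predictable: the loop can cost at most `max_iters` retrieve-and-generate rounds before it escalates.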

30-second close

"I build agentic systems where the hard engineering is in the boundaries: what a tool can retrieve, how output is grounded and cited, when to abstain, and how every hop is audited. I've been running an ecosystem (MCP servers, a multi-agent runner, provider-routed LLMs, encrypted/flagged/audited data access) that's one deliberate roadmap away from a textbook enterprise agentic-RAG fabric, and I've already written that roadmap."