Hermes VM 076449268b docs(interview): add Senior Agentic RAG Architect prep kit

7-doc kit mapping the JD competency matrix to the ByteLyst ecosystem:
ecosystem-as-RAG-fabric architecture, competency deep-dives, STAR bank,
enhancement roadmap, banking blueprints, and a glossary quick-ref.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-05-31 10:48:52 +00:00

8.9 KiB

Raw Blame History

06 · Glossary & Rapid-Fire Quick-Reference

The night-before doc. Crisp definitions + one-liners you can fire back. If you can say the bold line cleanly for each, you sound fluent.

Advanced RAG techniques

Term	What it is	One-liner
HyDE	Hypothetical Document Embeddings — LLM drafts a hypothetical answer; you embed that and retrieve against it.	"Fixes recall by closing the question↔document vocabulary gap."
CRAG	Corrective RAG — grade retrieved docs; if weak, correct (re-retrieve / alt source / rewrite) before generating.	"A relevance gate that re-retrieves instead of generating from junk."
Self-RAG	Model emits reflection tokens deciding whether to retrieve and whether its draft is supported; loops if not.	"The model critiques its own groundedness before answering."
RAPTOR	Recursively cluster + summarize chunks into a tree; retrieve at the abstraction level the query needs.	"Multi-resolution retrieval for long corpora — summary nodes for broad Qs, leaves for specifics."
Reranking	Cross-encoder scores (query, passage) jointly after first-stage retrieval — far more precise than bi-encoder similarity.	"Fixes precision; I rerank only the top-k to control latency."
ColBERT	Late-interaction reranking — token-level MaxSim. Accurate and scalable.	"Token-level matching without re-encoding the whole pair per query."
Context compression	Drop/condense retrieved spans to fit budget and cut distraction.	"Less, more-relevant context beats more context."
Hybrid search	Dense (vector) ⊕ sparse (BM25) ⊕ graph, fused via RRF.	"Vector for meaning, BM25 for exact terms, graph for relationships."
RRF	Reciprocal Rank Fusion — combine rankings by `Σ 1/(k+rank)`; no score calibration needed.	"Tuning-free way to fuse vector + lexical ranks."
Semantic chunking	Split on topic/structure boundaries, not byte counts.	"Chunk on document structure first, size second."
Agentic RAG	An agent decides when/what/how to retrieve, can use multiple tools and loop.	"RAG as a control flow, not a single hop."

Evaluation metrics (RAGAS vocabulary)

Metric	Question it answers	What it isolates
Faithfulness / groundedness	Is every claim supported by retrieved context?	Hallucination
Answer relevancy	Does the answer address the question?	Generation focus
Context precision	Are the top retrieved chunks the relevant ones?	Reranker quality
Context recall	Did we retrieve all needed evidence?	Retriever quality
Answer correctness	Right vs. ground truth?	End-to-end

RAGAS — library that computes these (often LLM-as-judge).
TruLens — RAG-triad + feedback functions + tracing.
DeepEval — pytest-style assertions → CI gates.
LangSmith — tracing + eval ops for LangChain/LangGraph.

Diagnostic move to say out loud: "Low context-recall → fix the retriever (HyDE, hybrid, chunking). High recall but low context-precision → fix the reranker. Good context but low faithfulness → fix the generator/prompt or abstain."

Agentic frameworks

Term	Essence
LangChain	Primitives: tool binding, structured output, retrievers, chains.
LangGraph	Agents as state graphs: typed state, nodes, conditional + cyclic edges, checkpointer, human-in-loop.
Google ADK	Google's Agent Development Kit for building/deploying agents.
A2A	Agent-to-Agent protocol: agent cards (capabilities/auth), task lifecycle, message/artifact exchange — agent interop.
AutoGen	Conversational multi-agent loops (agents talk to each other).
MCP	Model Context Protocol: standard for exposing tools/resources to models via a server, with typed registration.

Governance & regulatory

Term	What to say
SR 11-7	US Fed/OCC model risk management guidance: model inventory, independent validation, ongoing monitoring, change control, ability to constrain a model. "My eval harness + model cards + kill-switch + decision log map directly to its pillars."
OCC model risk	Aligned with SR 11-7; emphasizes governance + effective challenge.
EU AI Act	Risk-tiered AI regulation: classification, logging/traceability, human oversight, transparency for high-risk systems. "High-risk paths get human-in-the-loop and full decision logging by design."
GDPR / CCPA	Data protection/minimization, subject access/deletion. "Field masking + provenance + tenant namespaces make minimization and targeted deletion structural."
Zero Trust (for agents)	Never trust the agent's request implicitly; verify identity + scope at the tool boundary every call. "The MCP server is my policy enforcement point."
Access-controlled retrieval	Retrieve only chunks the caller's identity/role entitles them to (pre-retrieval ACL filter).
Row/column masking	Mask sensitive fields at the boundary regardless of query.
Model card	Per-model doc: purpose, data, eval scores, limits, owner, review date, kill-switch.
RACI	Responsible/Accountable/Consulted/Informed matrix per component — governance ownership made explicit.

ByteLyst anchor cheat-sheet (so you never blank on "where have you done this")

JD theme	Say this anchor
Multi-agent orchestration	`agent-queue` — claude/codex/devin, `inbox→doing→done/failed` state machine.
MCP / Zero-Trust tool boundary	`mcp-server` :4007 + `mcp-client`, authZ per call, masking, kill-switch, audit.
Provider-portable models	`llm-router` (Azure OpenAI / Bedrock / Vertex / Ollama).
Grounding by architecture	`flowmonk` — deterministic engine authoritative, AI = explanation/safe-reco only.
Schema-aware structured retrieval	`invt_trdg` AI chat — typed tool-calling over markets, not free SQL.
Unstructured ingestion	`extraction-service` :4005 + `packages/extraction`; `notelett` store.
Graph (Cosmos Gremlin)	`packages/cosmos` — Cosmos DB Gremlin API in prod.
Vector / multi-tenancy	pgvector path + `productId` / two-instance Hermes isolation.
Eval / ops console	Hermes Mission Control + `telemetry`/`diagnostics`/`monitoring`.
Governance primitives	`auth`/`fastify-auth`, `field-encrypt`, `feature-flag`/`kill-switch`, `event-store`.
Banking domain analog	`invt_trdg` — regulated-consequence domain; abstain-over-guess discipline.

Likely curveballs + crisp answers

"How do you stop prompt injection from retrieved docs?" → "Treat retrieved text as data, never instructions; the generator can't re-invoke tools without re-passing the MCP authZ gate; egress is masked + logged so even a successful injection can't exfiltrate unentitled fields."
"Vector DB choice?" → "Decided last, from requirements. Azure → Azure AI Search (managed hybrid+rerank, one audit boundary). Tight transactional coupling → pgvector. I don't pick a vector DB before I know tenant count, filter cardinality, recall target, and the audit boundary."
"How do you measure if RAG is 'good enough' for prod?" → "SLAs as gates: faithfulness ≥ 0.9, citation 100%, abstain instead of guess; DeepEval blocks deploy below threshold; online RAGAS + drift alerts catch degradation."
"Free Text-to-SQL — yes or no?" → "Bounded domains → typed tool-calling (auditable, inject-resistant). Genuine ad-hoc analytics → generative SQL behind read-only views with RLS, validated + cost-capped. Never raw SQL on base tables."
"Latency vs. accuracy trade-off?" → "Route by query: FAQ/parametric skips retrieval; only complex queries pay for hybrid + rerank + critic loop. Rerank top-k only. The critic loop is bounded to N iterations then abstains."
"How is this different from a chatbot demo?" → "The boundaries: access-controlled retrieval, mandatory citation, abstain-and-escalate, masking, kill-switch, and a reproducible decision log. A demo answers; a regulated system can prove why it answered and refuse when it shouldn't."
"What worries you most in agentic RAG for banking?" → "Silent factual drift and over-broad tool scope. I counter drift with online eval + alerts, and scope with typed MCP tools + least-privilege entitlement at every call."

30-second close

"I build agentic systems where the hard engineering is in the boundaries — what a tool can retrieve, how output is grounded and cited, when to abstain, and how every hop is audited. I've been running an ecosystem (MCP servers, a multi-agent runner, provider- routed LLMs, encrypted/flagged/audited data access) that's one deliberate roadmap away from a textbook enterprise agentic-RAG fabric — and I've already written that roadmap."

8.9 KiB Raw Blame History

06 · Glossary & Rapid-Fire Quick-Reference

Advanced RAG techniques

Evaluation metrics (RAGAS vocabulary)

Agentic frameworks

Governance & regulatory

ByteLyst anchor cheat-sheet (so you never blank on "where have you done this")

Likely curveballs + crisp answers

30-second close

8.9 KiB

Raw Blame History