7-doc kit mapping the JD competency matrix to the ByteLyst ecosystem: ecosystem-as-RAG-fabric architecture, competency deep-dives, STAR bank, enhancement roadmap, banking blueprints, and a glossary quick-ref. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
06 · Glossary & Rapid-Fire Quick-Reference
The night-before doc. Crisp definitions plus one-liners you can fire back. If you can deliver each one-liner cleanly, you sound fluent.
Advanced RAG techniques
| Term | What it is | One-liner |
|---|---|---|
| HyDE | Hypothetical Document Embeddings — LLM drafts a hypothetical answer; you embed that and retrieve against it. | "Fixes recall by closing the question↔document vocabulary gap." |
| CRAG | Corrective RAG — grade retrieved docs; if weak, correct (re-retrieve / alt source / rewrite) before generating. | "A relevance gate that re-retrieves instead of generating from junk." |
| Self-RAG | Model emits reflection tokens deciding whether to retrieve and whether its draft is supported; loops if not. | "The model critiques its own groundedness before answering." |
| RAPTOR | Recursively cluster + summarize chunks into a tree; retrieve at the abstraction level the query needs. | "Multi-resolution retrieval for long corpora — summary nodes for broad Qs, leaves for specifics." |
| Reranking | Cross-encoder scores (query, passage) jointly after first-stage retrieval — far more precise than bi-encoder similarity. | "Fixes precision; I rerank only the top-k to control latency." |
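| ColBERT | Late-interaction retrieval/reranking: query and passage are encoded separately into per-token embeddings, and relevance is the sum over query tokens of each token's best match among passage tokens (MaxSim). Passage embeddings can be precomputed and indexed. | "Token-level matching without re-encoding the whole pair per query." |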
| Context compression | Drop/condense retrieved spans to fit budget and cut distraction. | "Less, more-relevant context beats more context." |
| Hybrid search | Dense (vector) ⊕ sparse (BM25) ⊕ graph, fused via RRF. | "Vector for meaning, BM25 for exact terms, graph for relationships." |
| RRF | Reciprocal Rank Fusion: combine rankings by Σ 1/(k + rank), where k is a smoothing constant (commonly 60); no score calibration needed. | "Tuning-free way to fuse vector + lexical ranks." |
| Semantic chunking | Split on topic/structure boundaries, not byte counts. | "Chunk on document structure first, size second." |
| Agentic RAG | An agent decides when, what, and how to retrieve; it can call multiple tools and loop. | "RAG as a control flow, not a single hop." |
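The RRF formula is small enough to sketch directly. A minimal Python version, assuming each retriever returns a best-first list of document IDs; k = 60 is the commonly used constant, and the function and variable names are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse best-first lists of doc IDs: score(d) = sum over lists of 1 / (k + rank of d).

    Ranks are 1-based. A document absent from a list contributes nothing for that list,
    so raw retriever scores never need to be put on a common scale.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["d3", "d1", "d7"]  # dense retriever, best first
bm25_hits = ["d1", "d9", "d3"]    # lexical retriever, best first
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print([doc for doc, _ in fused])  # ['d1', 'd3', 'd9', 'd7']
```

d1 comes out on top even though it is only second in the vector list, because both retrievers rank it near the top: agreement across retrievers is exactly what RRF rewards.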
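ColBERT's MaxSim scoring can be illustrated with toy vectors. In this sketch the 2-D token embeddings are hand-written stand-ins for encoder output (real ColBERT embeddings come from a trained model and passage vectors are precomputed); `normalize` and `maxsim_score` are hypothetical helper names.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale to unit length so the dot product equals cosine similarity (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def maxsim_score(query_tokens: list[list[float]], doc_tokens: list[list[float]]) -> float:
    """Late interaction: for each query token, take its best similarity over all
    document tokens, then sum those maxima across query tokens."""
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    return sum(max(sum(a * b for a, b in zip(qt, dt)) for dt in d) for qt in q)


query = [[1.0, 0.0], [0.0, 1.0]]
print(maxsim_score(query, [[1.0, 0.0], [0.0, 1.0]]))  # 2.0: both query tokens find a match
print(maxsim_score(query, [[1.0, 0.0], [1.0, 0.0]]))  # 1.0: second query token has no match
```

Because the document side is only ever compared token by token, its embeddings can be computed once at indexing time; only the short query is encoded per request.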
Evaluation metrics (RAGAS vocabulary)
| Metric | Question it answers | What it isolates |
|---|---|---|
| Faithfulness / groundedness | Is every claim supported by retrieved context? | Hallucination |
| Answer relevancy | Does the answer address the question? | Generation focus |
| Context precision | Are the top retrieved chunks the relevant ones? | Reranker quality |
| Context recall | Did we retrieve all needed evidence? | Retriever quality |
| Answer correctness | Right vs. ground truth? | End-to-end |
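RAGAS usually computes these with an LLM judge, but when chunk-level relevance labels exist the two retrieval metrics reduce to plain arithmetic. A sketch under that assumption (labelled chunk IDs, hypothetical function names); the precision shown is the rank-weighted, average-precision form, which is close to but not guaranteed identical to any particular library's implementation.

```python
def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Rank-aware precision: average of precision@k taken at each rank k that holds a
    relevant chunk, so relevant chunks near the top score higher than the same chunks lower down."""
    hits = 0
    precisions = []
    for k, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of the needed evidence that appears anywhere in the retrieved set."""
    if not relevant:
        return 1.0
    return len(relevant.intersection(retrieved)) / len(relevant)


needed = {"c1", "c2", "c3"}
print(round(context_precision(["c1", "c4", "c2"], needed), 3))  # 0.833
print(round(context_precision(["c4", "c1", "c2"], needed), 3))  # 0.583: same chunks, worse order
print(round(context_recall(["c1", "c4", "c2"], needed), 3))     # 0.667: c3 never retrieved
```

Reordering the same chunks changes precision but not recall, which is why precision points at the reranker and recall at the retriever.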
- RAGAS — library that computes these (often LLM-as-judge).
- TruLens — RAG-triad + feedback functions + tracing.
- DeepEval — pytest-style assertions → CI gates.
- LangSmith — tracing + eval ops for LangChain/LangGraph.
Diagnostic move to say out loud: "Low context-recall → fix the retriever (HyDE, hybrid, chunking). High recall but low context-precision → fix the reranker. Good context but low faithfulness → fix the generator/prompt or abstain."
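The diagnostic move above is really a decision procedure, checked in pipeline order. A minimal sketch, assuming metric scores in 0..1; the `RagScores` type, `diagnose` function, and the 0.8 threshold are hypothetical choices, not values from any library.

```python
from dataclasses import dataclass


@dataclass
class RagScores:
    context_recall: float
    context_precision: float
    faithfulness: float


def diagnose(scores: RagScores, threshold: float = 0.8) -> str:
    """Name the component to fix, checking upstream first: retriever, then reranker, then generator."""
    if scores.context_recall < threshold:
        return "retriever: try HyDE, hybrid search, or better chunking"
    if scores.context_precision < threshold:
        return "reranker: relevant evidence is retrieved but ranked too low"
    if scores.faithfulness < threshold:
        return "generator: tighten the prompt or abstain"
    return "healthy"


print(diagnose(RagScores(context_recall=0.95, context_precision=0.55, faithfulness=0.9)))
```

The ordering matters: a generator cannot be faithful to evidence that was never retrieved, so fixing downstream components first would mask the real fault.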
Agentic frameworks
| Term | Essence |
|---|---|
| LangChain | Primitives: tool binding, structured output, retrievers, chains. |
| LangGraph | Agents as state graphs: typed state, nodes, conditional + cyclic edges, checkpointer, human-in-the-loop. |
| Google ADK | Google's Agent Development Kit for building/deploying agents. |
| A2A | Agent-to-Agent protocol: agent cards (capabilities/auth), task lifecycle, message/artifact exchange — agent interop. |
| AutoGen | Conversational multi-agent loops (agents talk to each other). |
| MCP | Model Context Protocol: open standard in which a server exposes tools, resources, and prompts to a model client; each tool is registered with a typed (JSON Schema) input definition. |
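The LangGraph idea (typed state, nodes returning partial updates, a conditional edge that can loop back) can be shown without the library. The sketch below is plain Python, not the LangGraph API; it wires a CRAG-style gate: retrieve, grade, rewrite and retry, then abstain after a bounded number of attempts. The corpus, keyword "retriever", rewrite rule, and non-empty "grader" are hypothetical stand-ins for real components.

```python
from typing import Callable, Optional, TypedDict


class RagState(TypedDict):
    question: str
    docs: list[str]
    attempts: int
    answer: str


# Hypothetical stand-ins for a real retriever, rewriter and generator.
CORPUS = {"rrf": "RRF fuses ranked lists by summing 1/(k + rank)."}


def retrieve(state: RagState) -> dict:
    words = state["question"].lower().split()
    docs = [text for key, text in CORPUS.items() if key in words]
    return {"docs": docs, "attempts": state["attempts"] + 1}


def rewrite(state: RagState) -> dict:
    # A real system would ask an LLM to rephrase; here we just expand one abbreviation.
    return {"question": state["question"].replace("reciprocal rank fusion", "rrf")}


def generate(state: RagState) -> dict:
    return {"answer": f"{state['docs'][0]} [cited]"}


def abstain(state: RagState) -> dict:
    return {"answer": "I can't answer that from the available sources."}


def route_after_retrieve(state: RagState, max_attempts: int = 2) -> str:
    """Conditional edge acting as the relevance gate (grader = 'any docs found')."""
    if state["docs"]:
        return "generate"
    return "rewrite" if state["attempts"] < max_attempts else "abstain"


NODES: dict[str, Callable[[RagState], dict]] = {
    "retrieve": retrieve, "rewrite": rewrite, "generate": generate, "abstain": abstain,
}
# Fixed edges; "retrieve" is routed by route_after_retrieve, terminal nodes map to None.
EDGES: dict[str, Optional[str]] = {"rewrite": "retrieve", "generate": None, "abstain": None}


def run(question: str) -> str:
    state: RagState = {"question": question, "docs": [], "attempts": 0, "answer": ""}
    node: Optional[str] = "retrieve"
    while node is not None:
        state.update(NODES[node](state))  # merge the node's partial update into the state
        node = route_after_retrieve(state) if node == "retrieve" else EDGES[node]
    return state["answer"]


print(run("explain reciprocal rank fusion"))  # retrieval misses, rewrite, retry, cited answer
print(run("what is colbert"))                 # two misses, then abstains instead of guessing
```

In LangGraph itself the same shape is expressed with `StateGraph`, `add_node`, and `add_conditional_edges`, and a checkpointer persists the state between steps so a human can inspect or resume a run.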
Governance & regulatory
| Term | What to say |
|---|---|
| SR 11-7 | US Fed/OCC model risk management guidance: model inventory, independent validation, ongoing monitoring, change control, ability to constrain a model. "My eval harness + model cards + kill-switch + decision log map directly to its pillars." |
| OCC model risk | OCC Bulletin 2011-12 adopts the same guidance as SR 11-7; emphasizes governance and effective challenge. |
| EU AI Act | Risk-tiered AI regulation: classification, logging/traceability, human oversight, transparency for high-risk systems. "High-risk paths get human-in-the-loop and full decision logging by design." |
| GDPR / CCPA | Data protection/minimization, subject access/deletion. "Field masking + provenance + tenant namespaces make minimization and targeted deletion structural." |
| Zero Trust (for agents) | Never trust the agent's request implicitly; verify identity + scope at the tool boundary every call. "The MCP server is my policy enforcement point." |
| Access-controlled retrieval | Retrieve only chunks the caller's identity/role entitles them to (pre-retrieval ACL filter). |
| Row/column masking | Mask sensitive fields at the boundary regardless of query. |
| Model card | Per-model doc: purpose, data, eval scores, limits, owner, review date, kill-switch. |
| RACI | Responsible/Accountable/Consulted/Informed matrix per component — governance ownership made explicit. |
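Zero Trust, access-controlled retrieval, and masking compose at a single tool boundary. A minimal sketch, assuming a role-tagged index and a scope string `docs:read`; `Caller`, `Chunk`, `search_tool`, the masking policy, and the in-memory audit log are hypothetical and not part of any MCP SDK.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Caller:
    user_id: str
    roles: frozenset[str]
    scopes: frozenset[str]


@dataclass
class Chunk:
    text: str
    allowed_roles: frozenset[str]  # roles entitled to see this chunk
    fields: dict[str, str] = field(default_factory=dict)


SENSITIVE_FIELDS = {"account_number", "ssn"}  # hypothetical masking policy
AUDIT_LOG: list[dict] = []                    # stand-in for an append-only decision log


def mask(fields: dict[str, str]) -> dict[str, str]:
    """Mask sensitive fields at the boundary, whatever the query asked for."""
    return {k: ("***" if k in SENSITIVE_FIELDS else v) for k, v in fields.items()}


def search_tool(caller: Caller, query: str, index: list[Chunk]) -> list[dict]:
    """Hypothetical tool handler acting as the policy enforcement point."""
    # 1. Zero Trust: check scope on every call; no approval is cached between calls.
    if "docs:read" not in caller.scopes:
        AUDIT_LOG.append({"user": caller.user_id, "query": query, "decision": "deny"})
        raise PermissionError(f"{caller.user_id} lacks scope docs:read")
    # 2. Access-controlled retrieval: filter by entitlement BEFORE matching/ranking,
    #    so unentitled chunks never reach the model's context.
    visible = [c for c in index if caller.roles & c.allowed_roles]
    # 3. Stand-in retrieval: naive substring match (a real system would run hybrid search here).
    hits = [c for c in visible if query.lower() in c.text.lower()]
    AUDIT_LOG.append({"user": caller.user_id, "query": query, "decision": "allow", "returned": len(hits)})
    # 4. Masking on the way out.
    return [{"text": c.text, "fields": mask(c.fields)} for c in hits]


index = [
    Chunk("Q3 wire limits policy", frozenset({"ops"}), {"owner": "treasury"}),
    Chunk("Customer wire dispute", frozenset({"support"}), {"account_number": "12345678"}),
]
agent = Caller("u1", roles=frozenset({"support"}), scopes=frozenset({"docs:read"}))
print(search_tool(agent, "wire", index))  # only the support-entitled chunk, account_number masked
```

Filtering before ranking, rather than after generation, is the design choice that matters: a post-hoc filter would still let unentitled text influence the answer.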
ByteLyst anchor cheat-sheet (so you never blank on "where have you done this")
| JD theme | Say this anchor |
|---|---|
| Multi-agent orchestration | agent-queue — claude/codex/devin, inbox→doing→done/failed state machine. |
| MCP / Zero-Trust tool boundary | mcp-server :4007 + mcp-client, authZ per call, masking, kill-switch, audit. |
| Provider-portable models | llm-router (Azure OpenAI / Bedrock / Vertex / Ollama). |
| Grounding by architecture | flowmonk — deterministic engine authoritative, AI = explanation/safe-reco only. |
| Schema-aware structured retrieval | invt_trdg AI chat — typed tool-calling over markets, not free SQL. |
| Unstructured ingestion | extraction-service :4005 + packages/extraction; notelett store. |
| Graph (Cosmos Gremlin) | packages/cosmos — Cosmos DB Gremlin API in prod. |
| Vector / multi-tenancy | pgvector path + productId / two-instance Hermes isolation. |
| Eval / ops console | Hermes Mission Control + telemetry/diagnostics/monitoring. |
| Governance primitives | auth/fastify-auth, field-encrypt, feature-flag/kill-switch, event-store. |
| Banking domain analog | invt_trdg — regulated-consequence domain; abstain-over-guess discipline. |
Likely curveballs + crisp answers
- "How do you stop prompt injection from retrieved docs?" → "Treat retrieved text as data, never instructions; the generator can't re-invoke tools without re-passing the MCP authZ gate; egress is masked + logged so even a successful injection can't exfiltrate unentitled fields."
- "Vector DB choice?" → "Decided last, from requirements. Azure → Azure AI Search (managed hybrid+rerank, one audit boundary). Tight transactional coupling → pgvector. I don't pick a vector DB before I know tenant count, filter cardinality, recall target, and the audit boundary."
- "How do you measure if RAG is 'good enough' for prod?" → "SLAs as gates: faithfulness ≥ 0.9, citation 100%, abstain instead of guess; DeepEval blocks deploy below threshold; online RAGAS + drift alerts catch degradation."
- "Free Text-to-SQL — yes or no?" → "Bounded domains → typed tool-calling (auditable, inject-resistant). Genuine ad-hoc analytics → generative SQL behind read-only views with RLS, validated + cost-capped. Never raw SQL on base tables."
- "Latency vs. accuracy trade-off?" → "Route by query: FAQ/parametric skips retrieval; only complex queries pay for hybrid + rerank + critic loop. Rerank top-k only. The critic loop is bounded to N iterations then abstains."
- "How is this different from a chatbot demo?" → "The boundaries: access-controlled retrieval, mandatory citation, abstain-and-escalate, masking, kill-switch, and a reproducible decision log. A demo answers; a regulated system can prove why it answered and refuse when it shouldn't."
- "What worries you most in agentic RAG for banking?" → "Silent factual drift and over-broad tool scope. I counter drift with online eval + alerts, and scope with typed MCP tools + least-privilege entitlement at every call."
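The "SLAs as gates" answer can be made concrete as a release check. A library-free sketch, assuming each eval case already carries a faithfulness score and citation/abstain flags; reading "faithfulness ≥ 0.9" as a mean over answered cases is an assumption, and `EvalCase` and `release_gate` are hypothetical names (DeepEval expresses the same idea as pytest-style assertions).

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    faithfulness: float  # 0..1, e.g. from an LLM-as-judge metric
    cited: bool          # did every claim carry a citation?
    abstained: bool      # did the system refuse instead of answering?
    answerable: bool     # ground truth: was the question answerable from the corpus?


def release_gate(cases: list[EvalCase], min_faithfulness: float = 0.9) -> tuple[bool, list[str]]:
    """Return (passed, reasons): mean faithfulness on answered cases >= threshold,
    100% citation on answered cases, and no answers to unanswerable questions."""
    reasons = []
    answered = [c for c in cases if not c.abstained]
    if answered:
        mean_faith = sum(c.faithfulness for c in answered) / len(answered)
        if mean_faith < min_faithfulness:
            reasons.append(f"faithfulness {mean_faith:.2f} < {min_faithfulness}")
        if not all(c.cited for c in answered):
            reasons.append("citation coverage below 100%")
    guesses = [c for c in cases if not c.answerable and not c.abstained]
    if guesses:
        reasons.append(f"{len(guesses)} unanswerable question(s) answered instead of abstaining")
    return (not reasons, reasons)


cases = [
    EvalCase(faithfulness=0.95, cited=True, abstained=False, answerable=True),
    EvalCase(faithfulness=0.92, cited=True, abstained=False, answerable=True),
    EvalCase(faithfulness=0.0, cited=False, abstained=True, answerable=False),  # correct refusal
]
passed, reasons = release_gate(cases)
print("PASS" if passed else "FAIL: " + "; ".join(reasons))
```

In CI, a failing gate would exit non-zero to block the deploy; the correct refusal in the third case counts in the system's favour rather than against it.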
30-second close
"I build agentic systems where the hard engineering is in the boundaries: what a tool can retrieve, how output is grounded and cited, when to abstain, and how every hop is audited. I've been running an ecosystem (MCP servers, a multi-agent runner, provider-routed LLMs, encrypted/flagged/audited data access) that's one deliberate roadmap away from a textbook enterprise agentic-RAG fabric, and I've already written that roadmap."