Hermes VM 076449268b docs(interview): add Senior Agentic RAG Architect prep kit

7-doc kit mapping the JD competency matrix to the ByteLyst ecosystem:
ecosystem-as-RAG-fabric architecture, competency deep-dives, STAR bank,
enhancement roadmap, banking blueprints, and a glossary quick-ref.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-05-31 10:48:52 +00:00

02 · Competency Deep-Dives

One section per competency-matrix row. Each gives you: the concept (so you sound fluent), our anchor (so it's credible), say-this talking points, and the honest edge (where to pivot to roadmap rather than overclaim).

A. Agentic Frameworks — LangGraph / LangChain (must) · ADK / A2A / AutoGen (nice)

Concept. LangGraph models an agent as a state graph: typed shared state, nodes (LLM calls / tools), and conditional + cyclic edges, with a checkpointer for durability and human-in-the-loop. LangChain supplies the primitives (tool binding, structured output, retrievers). ADK is Google's agent SDK; A2A is an open agent-to-agent interop protocol (agent cards, task lifecycle, message/artifact exchange). AutoGen favors conversational multi-agent loops.
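To make the state-graph idea concrete, here is a minimal sketch in plain Python. It deliberately does not use the LangGraph API; `State`, the node functions, `grade`, and `run_graph` are all hypothetical names, and the toy corpus stands in for a real retriever. It shows typed shared state, nodes, one conditional edge, and a cycle (rewrite, then retrieve again) bounded by a step cap.

```python
from typing import TypedDict


class State(TypedDict):
    question: str
    docs: list[str]
    attempts: int
    answer: str


# Toy corpus standing in for a real retriever (assumption for illustration).
CORPUS = {"wire fee": "Wire fee is 25 USD."}


def retrieve(state: State) -> State:
    hit = CORPUS.get(state["question"])
    return {**state, "docs": [hit] if hit else [], "attempts": state["attempts"] + 1}


def rewrite(state: State) -> State:
    # Stand-in for an LLM query rewrite: fall back to a known phrasing.
    return {**state, "question": "wire fee"}


def generate(state: State) -> State:
    answer = state["docs"][0] if state["docs"] else "I don't know."
    return {**state, "answer": answer}


def grade(state: State) -> str:
    """Conditional edge: choose the next node from the current state."""
    if state["docs"] or state["attempts"] >= 2:
        return "generate"
    return "rewrite"


NODES = {"retrieve": retrieve, "rewrite": rewrite, "generate": generate}
EDGES = {
    "retrieve": grade,                  # conditional edge
    "rewrite": lambda s: "retrieve",    # cyclic edge back to retrieval
    "generate": lambda s: "END",
}


def run_graph(question: str, max_steps: int = 10) -> State:
    state: State = {"question": question, "docs": [], "attempts": 0, "answer": ""}
    node = "retrieve"
    for _ in range(max_steps):          # step cap guards against endless cycles
        state = NODES[node](state)
        node = EDGES[node](state)
        if node == "END":
            break
    return state
```

A real framework adds what this sketch omits: persisted checkpoints between steps and a pause point for human approval.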

Our anchor. agent-queue/ is a real multi-engine orchestrator (claude·codex·devin) with an explicit inbox→doing→done/failed state machine and requeue cycle — see 01-ecosystem-rag-fabric.md §3. packages/mcp-client + mcp-server provide tool binding; packages/llm-router is the model-selection layer a node would call.

Say this.

"LangGraph's value over a raw loop is typed state + conditional/cyclic edges + checkpointing — which is exactly the trio I'd want for a CRAG/Self-RAG loop that re-retrieves."
"I treat routing as a first-class node: a cheap classifier decides single-shot vs. multi-hop vs. tool-only, so I'm not paying for a full agent loop on a FAQ."
"For A2A I'd expose each of our product agents with an agent card (capabilities, auth, cost) so a supervisor agent can delegate — that's the natural evolution of agent-queue's engine flag."

Honest edge. "My shipped orchestration is framework-light by choice (bash-portable). The LangGraph port is scoped (04 §A); conceptually it's a 1:1 mapping, not new ground."

B. RAG Architecture — hybrid retrieval, reranking, HyDE, Self-RAG (must) · RAPTOR, multimodal (nice)

Concept.

Hybrid retrieval = dense (vector) ⊕ sparse (BM25) ⊕ optionally graph, fused (RRF, reciprocal rank fusion) then reranked by a cross-encoder (scores the (query, passage) pair jointly; far more precise than bi-encoder similarity, but too slow to run over the whole corpus). ColBERT = late interaction (token-level MaxSim over precomputed document token embeddings): close to cross-encoder accuracy at a lower query-time cost.
Context compression = drop/condense retrieved spans to fit the budget and reduce distraction.
HyDE = embed a hypothetical answer the LLM drafts, not the raw question — closes the query/document vocabulary gap.
CRAG = grade retrieved docs; if weak, correct (web/alt-source or rewrite) before generating.
Self-RAG = the model emits reflection tokens deciding whether to retrieve and whether its draft is supported, looping if not.
RAPTOR = recursively cluster + summarize chunks into a tree; retrieve at the abstraction level the query needs (great for "summarize this 200-page filing").
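The fusion step in hybrid retrieval can be sketched in a few lines. This is an illustrative implementation of reciprocal rank fusion, assuming each retriever returns an ordered list of document IDs; the function name `rrf_fuse`, the sample IDs, and the constant `k = 60` (a commonly used default) are choices made for this example, not part of any specific library.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense = ["d3", "d1", "d2"]    # hypothetical vector-search ranking
sparse = ["d1", "d4", "d3"]   # hypothetical BM25 ranking
fused = rrf_fuse([dense, sparse])
print([doc_id for doc_id, _ in fused])
```

Because RRF uses only ranks, it needs no score normalization between the dense and sparse retrievers, which is why it is a convenient default before a cross-encoder rerank.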

Our anchor. packages/extraction + extraction-service (:4005) already turn URLs/docs into retrievable units. invt_trdg's AI chat is a retrieve-then-reason loop over structured data today. The hybrid index + rerank + HyDE/CRAG/Self-RAG loop is the headline enhancement (04 §B).

Say this.

"I default to hybrid + rerank because pure-vector misses exact-match terms (a regulatory clause number, a ticker, an account ID) that BM25 nails."
"HyDE and reranking attack different failures: HyDE fixes recall (the right passage never made the candidate set), reranking fixes precision (the right passage is in the set but buried under noise). I tune them independently against context-recall vs. context-precision."
"RAPTOR earns its cost on long regulatory corpora where the answer spans sections; for transactional Q&A it's overkill."

Honest edge. Lead with the reasoning about when to use each, which is the architect signal; the implementations are well-trodden and scoped on our roadmap.

C. Structured Retrieval — Text-to-SQL, schema-aware retrieval (must) · Snowflake Cortex, BigQuery ML (nice)

Concept. Text-to-SQL turns NL into SQL against a known schema. The enterprise risks are wrong joins, full-table scans, and data leakage. Mitigations: restrict to read-only semantic views, inject schema + few-shot exemplars (schema-aware retrieval over table/column descriptions), validate/parse the SQL before execution, enforce row-level security, and cap cost. Warehouse-native options (Snowflake Cortex, BigQuery ML) push inference next to governed data.
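One way to enforce "read-only, allow-listed, capped" on generated SQL is to let the database engine do the checking instead of string matching. The sketch below uses SQLite's authorizer hook from the Python standard library as a stand-in for warehouse-level controls; the table names, `MAX_ROWS`, `RejectedQuery`, and the helper functions are hypothetical, and a production warehouse would rely on its own grants and row-level security instead.

```python
import sqlite3

ALLOWED_TABLES = {"positions"}   # hypothetical governed table
MAX_ROWS = 100                   # hypothetical cost cap


class RejectedQuery(Exception):
    """Raised when generated SQL fails validation or authorization."""


def read_only_authorizer(action, arg1, arg2, db_name, source):
    # Allow SELECT statements and column reads on allow-listed tables only.
    # Everything else (writes, DDL, pragmas, function calls) is denied in
    # this strict sketch; extend the allow-list deliberately.
    if action == sqlite3.SQLITE_SELECT:
        return sqlite3.SQLITE_OK
    if action == sqlite3.SQLITE_READ and arg1 in ALLOWED_TABLES:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE positions (symbol TEXT, qty INTEGER)")
    conn.execute("CREATE TABLE accounts (owner TEXT, ssn TEXT)")
    conn.execute("INSERT INTO positions VALUES ('MSFT', 10)")
    conn.commit()
    # Install the guard only after setup, so setup itself is not blocked.
    conn.set_authorizer(read_only_authorizer)
    return conn


def run_generated_sql(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    try:
        cursor = conn.execute(sql)
    except sqlite3.DatabaseError as exc:
        raise RejectedQuery(f"rejected generated SQL: {exc}") from exc
    return cursor.fetchmany(MAX_ROWS)   # cap the rows handed back to the model
```

With this guard in place, `SELECT symbol, qty FROM positions` runs, while `DELETE FROM positions` and `SELECT ssn FROM accounts` are rejected when the statement is prepared, before any row is touched.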

Our anchor. invt_trdg's AI chat is schema-aware tool-calling — NL maps to typed operations (get quote, create trade plan, manage watchlist/alerts) over a known domain. That's the safe form of Text-to-SQL: the model picks a vetted, parameterized tool rather than emitting arbitrary SQL.

Say this.

"I prefer typed tool-calling over free Text-to-SQL wherever the query space is bounded — it's auditable and injection-resistant. I reserve generative SQL for genuine ad-hoc analytics, behind read-only views with RLS."
"Schema-aware retrieval is itself a RAG problem: I embed table/column docs and retrieve the relevant schema slice into the prompt, so the model isn't drowning in a 400-table catalog."

Honest edge. "Free Text-to-SQL against a warehouse I've prototyped more than shipped; in production I've leaned on the tool-calling pattern because it's safer in regulated data."

D. Unstructured Retrieval — PDF parsing, layout-aware chunking (must) · multimodal (nice)

Concept. Naïve "split every 1000 chars" chunking destroys meaning. Layout-aware parsing (PyMuPDF, Unstructured.io) preserves headings, tables, lists, reading order; OCR (Tesseract/Azure Doc Intelligence) handles scans. Semantic chunking splits on topic boundaries, not byte counts. Tables and figures need special handling (serialize tables to markdown; caption images). Each chunk carries provenance metadata (doc id, page, section) — mandatory for citations.
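A small sketch of "structure first, size second" chunking with provenance. It assumes a layout-aware parser has already produced ordered blocks tagged as heading or text; the `Block` and `Chunk` types, the `chunk_by_structure` function, and the 500-character budget are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    page: int
    kind: str   # "heading" or "text", as tagged by an upstream parser
    text: str


@dataclass
class Chunk:
    doc_id: str
    section: str
    pages: list[int]
    text: str


def chunk_by_structure(doc_id: str, blocks: list[Block], max_chars: int = 500) -> list[Chunk]:
    chunks: list[Chunk] = []
    section = "(front matter)"
    buf: list[str] = []
    pages: list[int] = []

    def flush() -> None:
        # Emit the buffered text as one chunk tagged with its section and pages.
        if buf:
            chunks.append(Chunk(doc_id, section, sorted(set(pages)), "\n".join(buf)))
            buf.clear()
            pages.clear()

    for block in blocks:
        if block.kind == "heading":
            flush()                      # never let a chunk span two sections
            section = block.text
            continue
        if buf and sum(len(t) for t in buf) + len(block.text) > max_chars:
            flush()                      # size limit applies only within a section
        buf.append(block.text)
        pages.append(block.page)
    flush()
    return chunks
```

Every chunk carries the document ID, section heading, and page list, so a generated answer can cite "section 1, pages 1 to 2" instead of the whole document.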

Our anchor. packages/extraction + extraction-service already perform URL/task/doc extraction; notelett models structured notes. Layout-aware PDF + OCR is the additive piece (04 §B).

Say this.

"Chunking is where most RAG quality is won or lost. I chunk on document structure first, size second, and I always attach page/section provenance so the answer can cite a clause, not 'a document.'"
"For regulatory filings I keep tables intact and store a text serialization alongside — losing a covenant table to a mid-row split is a correctness bug, not a formatting one."

E. Graph RAG — KG + vector hybrid (must) · SPARQL / ontology design (nice)

Concept. Vector RAG finds similar text; it can't answer "which entities are connected to X within 2 hops." Graph RAG retrieves a subgraph (entities + relationships) and feeds it as structured context — ideal for "show the ownership chain," "trace this transaction counterparty network," KYC/AML link analysis. Patterns: extract entities/relations at ingest, build a knowledge graph, retrieve by graph traversal (Gremlin/Cypher) + vector seed, then generate over the fused context. SPARQL/ontologies add formal semantics (RDF, classes).
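The "vector seed, then graph neighborhood" pattern can be illustrated with an in-memory adjacency list. This is a sketch only: the entities and relations are invented, the seed node is assumed to come from a vector hit, and a real deployment would run the traversal as a Gremlin or Cypher query against the graph store.

```python
from collections import deque

# Hypothetical ownership graph: node -> list of (relation, target).
GRAPH = {
    "AcmeHoldings": [("owns", "AcmeBank"), ("owns", "AcmeLeasing")],
    "AcmeBank": [("counterparty_of", "ShellCo")],
    "ShellCo": [("owned_by", "J. Doe")],
    "AcmeLeasing": [],
    "J. Doe": [],
}


def neighborhood(seed: str, max_hops: int = 2) -> list[tuple[str, str, str]]:
    """Breadth-first traversal returning (source, relation, target) triples
    for every edge that starts within max_hops - 1 hops of the seed."""
    triples: list[tuple[str, str, str]] = []
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue                    # do not expand beyond the hop limit
        for relation, target in GRAPH.get(node, []):
            triples.append((node, relation, target))
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return triples


def to_context(triples: list[tuple[str, str, str]]) -> str:
    """Serialize the subgraph as text lines for the generation prompt."""
    return "\n".join(f"{s} -[{r}]-> {t}" for s, r, t in triples)


print(to_context(neighborhood("AcmeHoldings", max_hops=2)))
```

The hop limit is the main cost and relevance control: two hops from `AcmeHoldings` reaches `ShellCo` but stops before its owner, which would need a third hop.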

Our anchor. We run Azure Cosmos DB (packages/cosmos), which offers a Gremlin graph API (selected per account at creation); the JD lists "Azure Cosmos Gremlin" explicitly. event-store/events already model entity relationships over time. Standing up a Gremlin KG + graph-augmented retrieval is 04 §D.

Say this.

"I use the graph for the 'connected-to' questions vector search can't answer, and seed traversals from vector hits — vector finds the entry node, the graph supplies the neighborhood."
"In banking this is the AML/KYC sweet spot: link analysis across counterparties is inherently graph-shaped."

Honest edge. "Cosmos Gremlin is in our stack; the KG is a clean build on infrastructure we already operate, not a new platform bet."

F. Vector Databases — Pinecone / Weaviate / Azure AI Search (must) · Qdrant, pgvector, multi-tenancy (nice)

Concept. Trade-offs: Pinecone (managed, serverless, fast to ship), Weaviate/Qdrant (open, hybrid + filtering, self-host control), Azure AI Search (managed hybrid: vector + BM25 + semantic rerank in one service — the natural Azure pick), pgvector (lives in your Postgres → transactional consistency, one backup story, lowest ops). Multi-tenancy = namespaces (Pinecone) / index-per-tenant (Azure) / collection or payload filter (Qdrant) / schema or row filter (pgvector). Metadata filtering is as important as the ANN index.
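To show why metadata filtering matters as much as the index, the sketch below applies a tenant filter before similarity ranking. It is a brute-force toy, not an ANN index and not any vendor's API; the `INDEX` rows, tenant names, and `search` function are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical index rows: every vector carries its tenant as metadata.
INDEX = [
    {"id": "a1", "tenant": "bank-a", "vec": [1.0, 0.0]},
    {"id": "a2", "tenant": "bank-a", "vec": [0.6, 0.8]},
    {"id": "b1", "tenant": "bank-b", "vec": [1.0, 0.0]},
]


def search(query_vec: list[float], tenant: str, top_k: int = 2) -> list[str]:
    # Filter first, rank second: another tenant's rows never become candidates.
    candidates = [row for row in INDEX if row["tenant"] == tenant]
    ranked = sorted(candidates, key=lambda row: cosine(query_vec, row["vec"]), reverse=True)
    return [row["id"] for row in ranked[:top_k]]


print(search([1.0, 0.1], tenant="bank-a"))
```

Filtering before ranking is the property to verify in any real vector store: a filter applied only after the top-k cut can both leak nothing and return nothing, because the tenant's own rows were crowded out of the candidate set.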

Our anchor. Postgres is in the stack → pgvector is our lowest-friction path and gives transactional consistency with the source rows. Multi-tenancy is already first-class (productId, two-instance Hermes) — see 01 §6.

Say this.

"On Azure I'd default to Azure AI Search because it gives me hybrid + semantic reranking as a managed service — fewer moving parts in a regulated audit boundary."
"For tight transactional coupling (the vectors must stay consistent with the source-of-truth rows) I reach for pgvector — one database, one backup, one ACL model."
"I pick the vector DB last, after I know tenant count, filter cardinality, recall target, and the audit boundary — it's a consequence of requirements, not a religion."

G. Grounding & Eval — RAGAS, TruLens, faithfulness SLAs (must) · LangSmith, LLM-as-judge (nice)

Concept. The core metrics:

Faithfulness / groundedness — is every claim supported by retrieved context? (anti-hallucination)
Answer relevancy — does the answer address the question?
Context precision — are the top retrieved chunks the relevant ones? (reranker quality)
Context recall — did we retrieve all needed evidence? (retriever quality)
Answer correctness — vs. ground truth.

RAGAS computes these (often LLM-as-judge). TruLens adds the "RAG triad" + feedback functions + tracing. DeepEval is pytest-style assertions for CI. LangSmith = tracing/eval ops. An SLA turns metrics into gates ("faithfulness ≥ 0.9 or abstain").
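The "faithfulness ≥ 0.9 or abstain" gate can be sketched without any eval library. The example assumes an upstream judge has already labeled each claim in the draft answer as supported or not; `Verdict`, `gate`, and `faithfulness_from_claims` are hypothetical names, and the supported-claims ratio is used here as a simple stand-in for a library-computed faithfulness score.

```python
from dataclasses import dataclass

FAITHFULNESS_MIN = 0.9  # threshold taken from the SLA example in the text

ABSTAIN_TEXT = "I can't answer that reliably from the available sources; escalating to a human."


@dataclass
class Verdict:
    action: str   # "answer" or "abstain"
    text: str


def faithfulness_from_claims(supported: list[bool]) -> float:
    """Fraction of claims judged supported by the retrieved context."""
    return sum(supported) / len(supported) if supported else 0.0


def gate(answer: str, faithfulness: float, threshold: float = FAITHFULNESS_MIN) -> Verdict:
    """Release the answer only when the score clears the threshold."""
    if faithfulness >= threshold:
        return Verdict("answer", answer)
    return Verdict("abstain", ABSTAIN_TEXT)


score = faithfulness_from_claims([True, True, False])
print(gate("The wire fee is 25 USD.", score))
```

The same `gate` function can run in both loops: in CI against a golden set (failing the build on regressions) and online against sampled traces (triggering abstain-and-escalate).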

Our anchor. flowmonk is a grounding pattern made flesh: the deterministic scheduler is the source of truth, and the AI layer is constrained to explanation / safe recommendation — it cannot invent a plan, only narrate one. That's hallucination mitigation by architecture. diagnostics-client / telemetry-client / monitoring + Hermes dashboards are the ready-made home for an eval harness + drift monitor (04 §E).

Say this.

"I don't ship grounding as a vibe — I ship it as gates: a faithfulness threshold that triggers abstain-and-escalate, evaluated continuously, with regressions blocking deploy in CI (DeepEval)."
"flowmonk taught me the cheapest hallucination fix: don't let the LLM be the source of truth. Make a deterministic engine authoritative and scope the model to explanation. In banking that's the difference between a tool and a liability."
"Eval is a two-loop system: offline (golden set in CI) catches regressions pre-deploy; online (sampled production traces + LLM-judge) catches factual drift as corpora and models change."

H. Cloud Platform — Azure (AI Foundry / OpenAI / Search) (must) · AWS Bedrock, GCP Vertex (nice)

Concept. Azure AI Foundry = the build/eval/deploy hub; Azure OpenAI = governed model endpoints with content filtering and data-residency; Azure AI Search = managed hybrid index; Cosmos DB = docs + Gremlin graph. The architect skill is provider-portability: abstract the model behind a router so Azure OpenAI / Bedrock / Vertex are swappable, and keep data in the customer's tenancy.
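A provider-portability seam can be as small as one interface plus a config lookup. The sketch below is illustrative only: it is not the API of packages/llm-router, and the fake provider classes stand in for real SDK clients so the example stays runnable offline. All class and function names are hypothetical.

```python
from typing import Protocol


class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...


class FakeAzureOpenAI:
    """Stand-in for a real Azure OpenAI client wrapper."""

    def complete(self, prompt: str) -> str:
        return f"[azure-openai] {prompt}"


class FakeBedrock:
    """Stand-in for a real AWS Bedrock client wrapper."""

    def complete(self, prompt: str) -> str:
        return f"[bedrock] {prompt}"


PROVIDERS: dict[str, type] = {
    "azure-openai": FakeAzureOpenAI,
    "bedrock": FakeBedrock,
}


def get_model(config: dict) -> ChatModel:
    """Resolve the provider from configuration, failing closed on unknown names."""
    name = config["provider"]
    if name not in PROVIDERS:
        raise ValueError(f"unknown provider: {name}")
    return PROVIDERS[name]()


def answer(config: dict, question: str) -> str:
    # Application code depends only on the ChatModel interface.
    return get_model(config).complete(question)


print(answer({"provider": "azure-openai"}, "What is the wire fee?"))
```

Swapping providers then means changing the `provider` value in configuration; the calling code in `answer` does not change.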

Our anchor. Azure Cosmos DB in production (_AZURE/, packages/cosmos). packages/llm-router is the provider-abstraction layer — Azure OpenAI today, Bedrock/Vertex swap-in, no app rewrite. packages/ollama-client covers on-prem/air-gapped inference.

Say this.

"I keep a router seam so model choice is a config decision, not an architecture decision — and so a bank can pin everything to its own Azure tenant for residency."
"On-prem / air-gapped is real in banking; ollama-client in our stack means I've thought about the no-egress deployment, not just the SaaS path."

I. AI Governance — access-controlled RAG, Zero Trust (must) · SR 11-7, EU AI Act (nice)

Concept. Governance is structural, applied at every hop (see 01 §4–5): access-controlled retrieval (you can only retrieve what your identity entitles you to), row/column masking, role-aware context injection, immutable audit, instant kill-switch, model cards, RACI. Zero Trust for agents = never trust the agent's request implicitly; verify identity + scope at the tool boundary every call.
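The "verify identity + scope at the tool boundary every call" rule, together with a kill-switch and an audit trail, can be sketched as a single wrapper. This is a minimal illustration under stated assumptions: `Identity`, `ToolBoundary`, the scope strings, and the in-memory audit list are hypothetical, and a real system would back them with an identity provider, a feature-flag service, and an immutable event store.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Identity:
    user: str
    scopes: frozenset


@dataclass
class ToolBoundary:
    required_scope: dict                          # tool name -> scope needed
    disabled: set = field(default_factory=set)    # kill-switched tools
    audit: list = field(default_factory=list)     # append-only in this sketch

    def call(self, identity: Identity, tool: str, fn, *args):
        # Decide on every call; nothing is trusted from a previous call.
        if tool in self.disabled:
            decision = "denied:kill-switch"
        elif self.required_scope.get(tool) not in identity.scopes:
            decision = "denied:scope"             # unknown tools are denied too
        else:
            decision = "allowed"
        # Record the decision before acting, so denials are audited as well.
        self.audit.append((identity.user, tool, decision))
        if decision != "allowed":
            raise PermissionError(decision)
        return fn(*args)


boundary = ToolBoundary(required_scope={"get_quote": "quotes:read"})
alice = Identity("alice", frozenset({"quotes:read"}))
print(boundary.call(alice, "get_quote", lambda symbol: 42.0, "MSFT"))
```

Adding a tool name to `disabled` blocks it on the next call without a redeploy, and the denial still lands in the audit trail.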

Our anchor. packages/auth + fastify-auth (identity/scope), field-encrypt / client-encrypt (column/field masking), feature-flag-client + kill-switch-client (instant constraint), event-store (immutable audit), MCP server (explicit tool boundary). This is the deepest, most defensible part of the story.

Say this.

"I design retrieval so that masking and audit are inescapable — they live at the platform/MCP layer, so no product surface can route around them. That's what 'governance by architecture' means."
"A kill-switch that disables a model or a single tool without a redeploy is how I'd operationalize SR 11-7's expectations around ongoing monitoring and limits on model use: the model owner and the risk function must be able to constrain a model in production immediately."

(Full regulatory mapping in 05-banking-blueprints.md.)

J. Domain: Banking — support / compliance automation (must) · model risk, KYC/AML (nice)

Concept. The flagship use cases: customer-support automation (grounded answers from policy/product docs with citations + escalation), compliance-document retrieval (find the controlling clause across filings/policies), regulatory reporting, model risk management (SR 11-7), KYC/AML (entity/network link analysis → graph RAG).

Our anchor. invt_trdg is our regulated-industry analog: market data, trade plans, alerts, profiles — a domain where wrong answers have consequences and auditability matters. The patterns (typed tool-calling, audit, abstain-on-uncertainty) port directly to a bank.

Say this.

"My trading product is the closest non-bank analog to a banking workload: it taught me to default to abstain over guess when money or compliance is on the line, and to make every answer traceable to a source."
"KYC/AML is where my graph-RAG and my governance stories converge — link analysis on a knowledge graph, behind access-controlled, audited retrieval."

Honest edge. "I haven't shipped inside a chartered bank; my regulated-domain reps come from trading and from designing for SR 11-7 / EU AI Act controls — which I can walk through concretely."
