bytelyst-devops-tools/docs/INTERVIEW/02-competency-deepdives.md
Hermes VM 076449268b docs(interview): add Senior Agentic RAG Architect prep kit
7-doc kit mapping the JD competency matrix to the ByteLyst ecosystem:
ecosystem-as-RAG-fabric architecture, competency deep-dives, STAR bank,
enhancement roadmap, banking blueprints, and a glossary quick-ref.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 10:48:52 +00:00

# 02 · Competency Deep-Dives
One section per competency-matrix row. Each gives you: **the concept** (so you sound
fluent), **our anchor** (so it's credible), **say-this** talking points, and the **honest
edge** (where to pivot to roadmap rather than overclaim).
---
## A. Agentic Frameworks — LangGraph / LangChain (must) · ADK / A2A / AutoGen (nice)
**Concept.** LangGraph models an agent as a **state graph**: typed shared state, nodes
(LLM calls / tools), and conditional + cyclic edges, with a checkpointer for durability
and human-in-the-loop. LangChain supplies the primitives (tool binding, structured output,
retrievers). ADK is Google's agent SDK; **A2A** is an open agent-to-agent interop protocol
(agent cards, task lifecycle, message/artifact exchange). AutoGen favors conversational
multi-agent loops.
**Our anchor.** `agent-queue/` is a *real* multi-engine orchestrator (claude·codex·devin)
with an explicit `inbox→doing→done/failed` state machine and requeue cycle — see
`01-ecosystem-rag-fabric.md §3`. `packages/mcp-client` + `mcp-server` provide tool binding;
`packages/llm-router` is the model-selection layer a node would call.
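
The `inbox→doing→done/failed` lifecycle with its requeue cycle can be sketched as a small transition table. This is an illustration only, not agent-queue's actual code: the `Task` class, the `MAX_ATTEMPTS` budget, and the `requeue` rule are all assumptions.

```python
from dataclasses import dataclass, field

# Allowed transitions for the lifecycle named above; "failed -> inbox" is the requeue edge.
TRANSITIONS = {
    "inbox": {"doing"},
    "doing": {"done", "failed"},
    "failed": {"inbox"},
    "done": set(),
}
MAX_ATTEMPTS = 3  # assumed retry budget, not taken from agent-queue


@dataclass
class Task:
    task_id: str
    engine: str  # e.g. "claude", "codex", "devin"
    state: str = "inbox"
    attempts: int = 0
    history: list = field(default_factory=list)

    def move(self, new_state: str) -> None:
        """Apply one transition, rejecting anything the table does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        if new_state == "doing":
            self.attempts += 1
        self.history.append((self.state, new_state))
        self.state = new_state

    def requeue(self) -> bool:
        """Return True if a failed task went back to inbox, False if the budget is spent."""
        if self.state != "failed" or self.attempts >= MAX_ATTEMPTS:
            return False
        self.move("inbox")
        return True
```

In a LangGraph port, each state would plausibly become a node and each entry in `TRANSITIONS` a conditional edge, with `history` playing the role of checkpointed state.
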
**Say this.**
- "LangGraph's value over a raw loop is **typed state + conditional/cyclic edges + checkpointing** — which is exactly the trio I'd want for a CRAG/Self-RAG loop that re-retrieves."
- "I treat **routing** as a first-class node: a cheap classifier decides single-shot vs. multi-hop vs. tool-only, so I'm not paying for a full agent loop on a FAQ."
- "For A2A I'd expose each of our product agents with an **agent card** (capabilities, auth, cost) so a supervisor agent can delegate — that's the natural evolution of agent-queue's engine flag."
**Honest edge.** "My shipped orchestration is framework-light by choice (bash-portable).
The LangGraph port is scoped (`04 §A`); conceptually it's a 1:1 mapping, not new ground."
---
## B. RAG Architecture — hybrid retrieval, reranking, HyDE, Self-RAG (must) · RAPTOR, multimodal (nice)
**Concept.**
- **Hybrid retrieval** = dense (vector) ⊕ sparse (BM25) ⊕ optionally graph, fused with RRF (reciprocal rank fusion), then **reranked** by a cross-encoder, which scores each (query, passage) pair jointly and is far more precise than bi-encoder similarity. **ColBERT** = late interaction: query and passage tokens are embedded separately and compared with token-level MaxSim, so passage embeddings can be precomputed while accuracy stays close to a cross-encoder.
- **Context compression** = drop/condense retrieved spans to fit the budget and reduce distraction.
- **HyDE** = embed a *hypothetical answer* the LLM drafts, not the raw question — closes the query/document vocabulary gap.
- **CRAG** = grade retrieved docs; if weak, correct (web/alt-source or rewrite) before generating.
- **Self-RAG** = the model emits reflection tokens deciding *whether to retrieve* and *whether its draft is supported*, looping if not.
- **RAPTOR** = recursively cluster + summarize chunks into a tree; retrieve at the abstraction level the query needs (great for "summarize this 200-page filing").
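
The fusion step in hybrid retrieval can be made concrete with a minimal reciprocal rank fusion sketch. The document ids and the two ranked lists are made up for illustration; `k=60` is the constant commonly used with RRF and should be treated as tunable.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["d3", "d1", "d7"]   # hypothetical vector-search ranking
sparse = ["d1", "d9", "d3"]  # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([dense, sparse])
# fused == ["d1", "d3", "d9", "d7"]: documents found by both retrievers rise to the top.
```

RRF uses only ranks, so it needs no score normalization between the dense and sparse retrievers. The fused list is what a cross-encoder would then rerank.
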
**Our anchor.** `packages/extraction` + `extraction-service` (:4005) already turn URLs/docs
into retrievable units. `invt_trdg`'s AI chat is a retrieve-then-reason loop over structured
data today. The hybrid index + rerank + HyDE/CRAG/Self-RAG loop is the headline enhancement
(`04 §B`).
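
A rough sketch of the CRAG-style control flow named above: grade what was retrieved, and rewrite the query if nothing clears the bar. The retriever, grader, and rewriter are passed in as callables and would be an index lookup and LLM calls in practice; `threshold` and `max_rounds` are assumed values.

```python
from typing import Callable, List, Tuple


def corrective_retrieve(
    query: str,
    retrieve: Callable[[str], List[str]],
    grade: Callable[[str, str], float],
    rewrite: Callable[[str], str],
    threshold: float = 0.5,
    max_rounds: int = 2,
) -> Tuple[List[str], str]:
    """Return (kept_passages, query_used); kept_passages is empty if every round fails."""
    current = query
    for round_no in range(max_rounds + 1):
        passages = retrieve(current)
        # Always grade against the user's original question, not the rewrite.
        kept = [p for p in passages if grade(query, p) >= threshold]
        if kept:
            return kept, current
        if round_no < max_rounds:
            current = rewrite(current)  # corrective step: reformulate and retry
    return [], current
```

An empty result is the signal to abstain or fall back to another source rather than generate from weak context.
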
**Say this.**
- "I default to **hybrid + rerank** because pure-vector misses exact-match terms (a regulatory clause number, a ticker, an account ID) that BM25 nails."
- "HyDE and reranking attack **different** failures: HyDE fixes *recall* (the right passage never made the candidate set), reranking fixes *precision* (the right passage is there but buried in noise). I tune them independently against context-recall vs. context-precision."
- "RAPTOR earns its cost on **long regulatory corpora** where the answer spans sections; for transactional Q&A it's overkill."
**Honest edge.** Lead with the *reasoning about when to use each*, which is the architect
signal; the implementations are well-trodden and scoped on our roadmap.
---
## C. Structured Retrieval — Text-to-SQL, schema-aware retrieval (must) · Snowflake Cortex, BigQuery ML (nice)
**Concept.** Text-to-SQL turns NL into SQL against a known schema. The enterprise risks are
**wrong joins, full-table scans, and data leakage**. Mitigations: restrict to **read-only
semantic views**, inject **schema + few-shot exemplars** (schema-aware retrieval over table/
column descriptions), validate/parse the SQL before execution, enforce **row-level security**,
and cap cost. Warehouse-native options (Snowflake Cortex, BigQuery ML) push inference next to
governed data.
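
The "validate before execution" mitigation can be sketched as a pre-flight guard. The view names and the row cap are hypothetical. The checks are regex-based, so they miss cases a real SQL parser would catch (comma joins, for example); database-level read-only grants and row-level security remain the actual control.

```python
import re

ALLOWED_VIEWS = {"v_accounts_summary", "v_trades_daily"}  # hypothetical read-only views
MAX_ROWS = 1000  # assumed cost cap
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|grant|truncate|merge)\b", re.I
)


def guard_sql(sql: str) -> str:
    """Accept a single SELECT over allow-listed views and enforce a row cap."""
    stmt = sql.strip().rstrip(";").strip()
    if ";" in stmt:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"(?is)^select\b", stmt):
        raise ValueError("only SELECT is allowed")
    if FORBIDDEN.search(stmt):
        raise ValueError("write/DDL keyword found")
    # Conservative: every relation after FROM/JOIN must be an allow-listed view.
    tables = re.findall(r"(?i)\b(?:from|join)\s+([A-Za-z_][\w.]*)", stmt)
    if not tables or any(t.lower() not in ALLOWED_VIEWS for t in tables):
        raise ValueError(f"query touches non-allow-listed relations: {tables}")
    limit = re.search(r"(?i)\blimit\s+(\d+)\s*$", stmt)
    if limit is None:
        stmt += f" LIMIT {MAX_ROWS}"
    elif int(limit.group(1)) > MAX_ROWS:
        stmt = stmt[: limit.start()] + f"LIMIT {MAX_ROWS}"
    return stmt
```

The guard fails closed: anything it does not recognize is rejected rather than passed through.
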
**Our anchor.** `invt_trdg`'s AI chat is **schema-aware tool-calling** — NL maps to typed
operations (get quote, create trade plan, manage watchlist/alerts) over a known domain. That's
the *safe* form of Text-to-SQL: the model picks a vetted, parameterized tool rather than
emitting arbitrary SQL.
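
A minimal sketch of the typed tool-calling pattern: the model proposes a tool name plus arguments, and a registry validates both before anything runs. The `Tool` and `dispatch` names and the stubbed `get_quote` handler are illustrative assumptions, not `invt_trdg`'s actual interfaces.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Tool:
    name: str
    params: Dict[str, type]  # parameter name -> required Python type
    handler: Callable[..., Any]


def get_quote(symbol: str) -> dict:
    # Stub data source; a real handler would run a vetted, parameterized query.
    return {"symbol": symbol.upper(), "price": None}


REGISTRY = {"get_quote": Tool("get_quote", {"symbol": str}, get_quote)}


def dispatch(call: dict) -> Any:
    """Validate a model-proposed call {'tool': ..., 'args': {...}} before running it."""
    tool = REGISTRY.get(call.get("tool"))
    if tool is None:
        raise ValueError("unknown tool")
    args = call.get("args", {})
    if set(args) != set(tool.params):
        raise ValueError("unexpected or missing arguments")
    for key, expected in tool.params.items():
        if not isinstance(args[key], expected):
            raise TypeError(f"{key} must be {expected.__name__}")
    return tool.handler(**args)
```

The model only chooses among registered operations, so the audit surface is the registry rather than an open-ended SQL grammar.
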
**Say this.**
- "I prefer **typed tool-calling over free Text-to-SQL** wherever the query space is bounded — it's auditable and injection-resistant. I reserve generative SQL for genuine ad-hoc analytics, behind read-only views with RLS."
- "Schema-aware retrieval is itself a RAG problem: I embed table/column docs and retrieve the *relevant* schema slice into the prompt, so the model isn't drowning in a 400-table catalog."
**Honest edge.** "Free Text-to-SQL against a warehouse I've prototyped more than shipped;
in production I've leaned on the tool-calling pattern because it's safer in regulated data."
---
## D. Unstructured Retrieval — PDF parsing, layout-aware chunking (must) · multimodal (nice)
**Concept.** Naïve "split every 1000 chars" chunking destroys meaning. **Layout-aware**
parsing (PyMuPDF, Unstructured.io) preserves headings, tables, lists, reading order; **OCR**
(Tesseract, Azure AI Document Intelligence) handles scans. **Semantic chunking** splits on topic
boundaries, not byte counts. Tables and figures need special handling (serialize tables to
markdown; caption images). Each chunk carries **provenance metadata** (doc id, page, section)
— mandatory for citations.
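
As a sketch of structure-first chunking with provenance, the function below splits on headings first and only then packs paragraphs up to a size cap. It assumes the parser has already produced markdown-like text with headings separated by blank lines; the `Chunk` fields and the `max_chars` default are illustrative.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    doc_id: str
    page: int
    section: str
    text: str


def chunk_page(doc_id: str, page: int, text: str, max_chars: int = 800) -> List[Chunk]:
    """Split on headings first, then pack paragraphs up to max_chars within a section."""
    chunks: List[Chunk] = []
    section = "(untitled)"
    buf: List[str] = []

    def flush() -> None:
        if buf:
            chunks.append(Chunk(doc_id, page, section, "\n\n".join(buf)))
            buf.clear()

    for block in re.split(r"\n\s*\n", text.strip()):
        block = block.strip()
        if not block:
            continue
        heading = re.match(r"^#{1,6}\s+(.*)$", block)
        if heading:
            flush()  # never let a chunk straddle a section boundary
            section = heading.group(1).strip()
            continue
        if buf and sum(len(b) for b in buf) + len(block) > max_chars:
            flush()
        buf.append(block)
    flush()
    return chunks
```

Every chunk carries `doc_id`, `page`, and `section`, so a generated answer can cite the exact location it drew from.
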
**Our anchor.** `packages/extraction` + `extraction-service` already perform URL/task/doc
extraction; `notelett` models structured notes. Layout-aware PDF + OCR is the additive piece
(`04 §B`).
**Say this.**
- "Chunking is where most RAG quality is won or lost. I chunk on **document structure first, size second**, and I always attach page/section provenance so the answer can cite a clause, not 'a document.'"
- "For regulatory filings I keep **tables intact** and store a text serialization alongside — losing a covenant table to a mid-row split is a correctness bug, not a formatting one."
---
## E. Graph RAG — KG + vector hybrid (must) · SPARQL / ontology design (nice)
**Concept.** Vector RAG finds *similar* text; it can't answer *"which entities are connected
to X within 2 hops."* Graph RAG retrieves a **subgraph** (entities + relationships) and feeds
it as structured context — ideal for "show the ownership chain," "trace this transaction
counterparty network," KYC/AML link analysis. Patterns: extract entities/relations at ingest,
build a knowledge graph, retrieve by **graph traversal (Gremlin/Cypher) + vector seed**, then
generate over the fused context. SPARQL/ontologies add formal semantics (RDF, classes).
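
The "vector seed + graph traversal" pattern can be sketched in plain Python before committing to Gremlin or Cypher. The edge list and entity names are toy data, and treating edges as undirected is an assumption made to keep the example short.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, relation, target)


def k_hop_subgraph(edges: List[Edge], seeds: Set[str], max_hops: int = 2) -> List[Edge]:
    """Breadth-first expansion from seed entities, treating edges as undirected."""
    adjacency: Dict[str, List[Edge]] = {}
    for edge in edges:
        adjacency.setdefault(edge[0], []).append(edge)
        adjacency.setdefault(edge[2], []).append(edge)
    seen_nodes = set(seeds)
    seen_edges: List[Edge] = []
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for edge in adjacency.get(node, []):
            if edge not in seen_edges:
                seen_edges.append(edge)
            neighbor = edge[2] if edge[0] == node else edge[0]
            if neighbor not in seen_nodes:
                seen_nodes.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen_edges
```

In use, `seeds` would come from the entities attached to the top vector hits. The returned triples are then serialized into the prompt as structured context.
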
**Our anchor.** We run **Azure Cosmos DB** (`packages/cosmos`), which exposes the **Gremlin**
graph API — the JD lists "Azure Cosmos Gremlin" explicitly. `event-store`/`events` already
model entity relationships over time. Standing up a Gremlin KG + graph-augmented retrieval is
`04 §D`.
**Say this.**
- "I use the graph for the **'connected-to' questions** vector search can't answer, and seed traversals from vector hits — vector finds the entry node, the graph supplies the neighborhood."
- "In banking this is the **AML/KYC** sweet spot: link analysis across counterparties is inherently graph-shaped."
**Honest edge.** "Cosmos Gremlin is in our stack; the KG is a clean build on infrastructure
we already operate, not a new platform bet."
---
## F. Vector Databases — Pinecone / Weaviate / Azure AI Search (must) · Qdrant, pgvector, multi-tenancy (nice)
**Concept.** Trade-offs: **Pinecone** (managed, serverless, fast to ship), **Weaviate/Qdrant**
(open, hybrid + filtering, self-host control), **Azure AI Search** (managed hybrid: vector +
BM25 + semantic rerank in one service — the natural Azure pick), **pgvector** (lives in your
Postgres → transactional consistency, one backup story, lowest ops). **Multi-tenancy** =
namespaces (Pinecone) / index-per-tenant (Azure) / collection or payload filter (Qdrant) /
schema or row filter (pgvector). Metadata filtering is as important as the ANN index.
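
To illustrate why metadata filtering matters as much as the ANN index, here is a brute-force sketch of tenant-filtered similarity search. The in-memory `index` rows and the `product_id` field name (mirroring `productId`) are assumptions; a real vector DB would push the filter into the index.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    index: List[Dict], query_vec: List[float], product_id: str, top_k: int = 3
) -> List[Tuple[str, float]]:
    """Filter by tenant before scoring, so other tenants' rows are never candidates."""
    candidates = [row for row in index if row["product_id"] == product_id]
    scored = [(row["id"], cosine(row["vec"], query_vec)) for row in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]
```

Filtering first, rather than trimming results afterwards, is what keeps a tenant boundary from depending on ranking luck.
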
**Our anchor.** Postgres is in the stack → **pgvector** is our lowest-friction path and gives
transactional consistency with the source rows. Multi-tenancy is already first-class
(`productId`, two-instance Hermes) — see `01 §6`.
**Say this.**
- "On Azure I'd default to **Azure AI Search** because it gives me hybrid + semantic reranking as a managed service — fewer moving parts in a regulated audit boundary."
- "For tight transactional coupling (the vectors must stay consistent with the source-of-truth rows) I reach for **pgvector** — one database, one backup, one ACL model."
- "I pick the vector DB **last**, after I know tenant count, filter cardinality, recall target, and the audit boundary — it's a consequence of requirements, not a religion."
---
## G. Grounding & Eval — RAGAS, TruLens, faithfulness SLAs (must) · LangSmith, LLM-as-judge (nice)
**Concept.** The core metrics:
- **Faithfulness / groundedness** — is every claim supported by retrieved context? (anti-hallucination)
- **Answer relevancy** — does the answer address the question?
- **Context precision** — are the *top* retrieved chunks the relevant ones? (reranker quality)
- **Context recall** — did we retrieve *all* needed evidence? (retriever quality)
- **Answer correctness** — vs. ground truth.
**RAGAS** computes these (often LLM-as-judge). **TruLens** adds the "RAG triad" + feedback
functions + tracing. **DeepEval** is pytest-style assertions for CI. **LangSmith** = tracing/
eval ops. An **SLA** turns metrics into gates ("faithfulness ≥ 0.9 or abstain").
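
The retrieval metrics and the SLA gate can be sketched with simplified set-based definitions. These are not RAGAS's exact formulas (RAGAS typically uses an LLM judge and rank-weighted scoring); the abstain message and the 0.9 default are taken as illustrative.

```python
from typing import List, Set


def context_precision_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k retrieved chunks that are actually relevant."""
    top = retrieved[:k]
    return sum(1 for d in top if d in relevant) / len(top) if top else 0.0


def context_recall(retrieved: List[str], relevant: Set[str]) -> float:
    """Share of the needed evidence that was retrieved at all."""
    return len(relevant & set(retrieved)) / len(relevant) if relevant else 1.0


def gate(answer: str, faithfulness: float, threshold: float = 0.9) -> str:
    """Return the answer only if it clears the faithfulness SLA; otherwise abstain."""
    if faithfulness >= threshold:
        return answer
    return "ABSTAIN: escalating to a human reviewer"
```

The same `gate` check can run in CI against a golden set and online against sampled production traces.
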
**Our anchor.** `flowmonk` is a *grounding pattern made flesh*: the deterministic scheduler is
the source of truth, and **the AI layer is constrained to explanation / safe recommendation**:
it cannot invent a plan, only narrate one. That's hallucination mitigation by architecture.
`diagnostics-client` / `telemetry-client` / `monitoring` + Hermes dashboards are the
ready-made home for an eval harness + drift monitor (`04 §E`).
**Say this.**
- "I don't ship grounding as a vibe — I ship it as **gates**: a faithfulness threshold that triggers abstain-and-escalate, evaluated continuously, with regressions blocking deploy in CI (DeepEval)."
- "flowmonk taught me the cheapest hallucination fix: **don't let the LLM be the source of truth.** Make a deterministic engine authoritative and scope the model to explanation. In banking that's the difference between a tool and a liability."
- "Eval is a **two-loop** system: offline (golden set in CI) catches regressions pre-deploy; online (sampled production traces + LLM-judge) catches **factual drift** as corpora and models change."
---
## H. Cloud Platform — Azure (AI Foundry / OpenAI / Search) (must) · AWS Bedrock, GCP Vertex (nice)
**Concept.** Azure AI Foundry = the build/eval/deploy hub; Azure OpenAI = governed model
endpoints with content filtering and data-residency; Azure AI Search = managed hybrid index;
Cosmos DB = docs + Gremlin graph. The architect skill is **provider-portability**: abstract the
model behind a router so Azure OpenAI / Bedrock / Vertex are swappable, and keep data in the
customer's tenancy.
**Our anchor.** Azure Cosmos DB in production (`_AZURE/`, `packages/cosmos`).
`packages/llm-router` is the provider-abstraction layer — Azure OpenAI today, Bedrock/Vertex
swap-in, no app rewrite. `packages/ollama-client` covers on-prem/air-gapped inference.
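
A sketch of the router seam, assuming each provider can be reduced to a `prompt -> completion` callable. The stubs stand in for real SDK clients, and the config shape is hypothetical, not `packages/llm-router`'s actual interface.

```python
from typing import Callable, Dict


# Each provider is a function prompt -> completion. These are stubs, not real SDK calls.
def azure_openai_stub(prompt: str) -> str:
    return f"[azure-openai] {prompt}"


def ollama_stub(prompt: str) -> str:
    return f"[ollama] {prompt}"


PROVIDERS: Dict[str, Callable[[str], str]] = {
    "azure-openai": azure_openai_stub,
    "ollama": ollama_stub,
}


def make_router(config: Dict[str, str]) -> Callable[[str, str], str]:
    """config maps a task name to a provider name; a 'default' entry is required."""
    if "default" not in config:
        raise ValueError("config needs a 'default' provider")
    unknown = set(config.values()) - set(PROVIDERS)
    if unknown:
        raise ValueError(f"unknown providers: {sorted(unknown)}")

    def route(task: str, prompt: str) -> str:
        return PROVIDERS[config.get(task, config["default"])](prompt)

    return route
```

Swapping a provider then means editing `config` or registering one more callable; application code keeps calling `route`.
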
**Say this.**
- "I keep a **router seam** so model choice is a config decision, not an architecture decision — and so a bank can pin everything to its own Azure tenant for residency."
- "On-prem / air-gapped is real in banking; `ollama-client` in our stack means I've thought about the **no-egress** deployment, not just the SaaS path."
---
## I. AI Governance — access-controlled RAG, Zero Trust (must) · SR 11-7, EU AI Act (nice)
**Concept.** Governance is **structural**, applied at every hop (see `01 §45`):
access-controlled retrieval (you can only retrieve what your identity entitles you to),
row/column masking, role-aware context injection, immutable audit, instant kill-switch,
model cards, RACI. Zero Trust for agents = never trust the agent's request implicitly; verify
identity + scope at the tool boundary every call.
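
A minimal sketch of verification at the tool boundary: every call checks the kill-switch and the caller's entitlements, and every outcome is logged. `Caller`, `AUDIT_LOG`, `KILLED_TOOLS`, and the `required_scope` field are hypothetical stand-ins for the auth, event-store, and kill-switch packages.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Caller:
    user_id: str
    scopes: FrozenSet[str]


AUDIT_LOG: List[Dict] = []  # stand-in for an append-only audit store
KILLED_TOOLS: set = set()   # stand-in for a kill-switch lookup


def retrieve_for(caller: Caller, tool: str, docs: List[Dict]) -> List[Dict]:
    """Check kill-switch and entitlements on every call, then log what was released."""
    if tool in KILLED_TOOLS:
        AUDIT_LOG.append({"user": caller.user_id, "tool": tool, "outcome": "killed"})
        raise PermissionError(f"{tool} is disabled")
    # Only documents whose required scope the caller holds can reach the prompt.
    allowed = [d for d in docs if d["required_scope"] in caller.scopes]
    AUDIT_LOG.append({
        "user": caller.user_id,
        "tool": tool,
        "outcome": "ok",
        "released": [d["id"] for d in allowed],
    })
    return allowed
```

Because the filter runs before context injection, the model never sees text the caller is not entitled to, whatever the prompt asks for.
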
**Our anchor.** `packages/auth` + `fastify-auth` (identity/scope), `field-encrypt` /
`client-encrypt` (column/field masking), `feature-flag-client` + `kill-switch-client` (instant
constraint), `event-store` (immutable audit), MCP server (explicit tool boundary). This is the
**deepest, most defensible** part of the story.
**Say this.**
- "I design retrieval so that **masking and audit are inescapable** — they live at the platform/MCP layer, so no product surface can route around them. That's what 'governance by architecture' means."
- "A kill-switch that disables a model or a single tool **without a redeploy** is how I operationalize SR 11-7's expectations for ongoing monitoring and limits on model use: the model owner can constrain a model in production immediately."
(Full regulatory mapping in `05-banking-blueprints.md`.)
---
## J. Domain: Banking — support / compliance automation (must) · model risk, KYC/AML (nice)
**Concept.** The flagship use cases: **customer-support automation** (grounded answers from
policy/product docs with citations + escalation), **compliance-document retrieval** (find the
controlling clause across filings/policies), **regulatory reporting**, **model risk management**
(SR 11-7), **KYC/AML** (entity/network link analysis → graph RAG).
**Our anchor.** `invt_trdg` is our **regulated-industry analog**: market data, trade plans,
alerts, profiles — a domain where wrong answers have consequences and auditability matters. The
patterns (typed tool-calling, audit, abstain-on-uncertainty) port directly to a bank.
**Say this.**
- "My trading product is the closest non-bank analog to a banking workload: it taught me to **default to abstain over guess** when money or compliance is on the line, and to make every answer traceable to a source."
- "KYC/AML is where my graph-RAG and my governance stories converge — link analysis on a knowledge graph, behind access-controlled, audited retrieval."
**Honest edge.** "I haven't shipped inside a chartered bank; my regulated-domain reps come from
trading and from designing for SR 11-7 / EU AI Act controls — which I can walk through concretely."