# 02 · Competency Deep-Dives One section per competency-matrix row. Each gives you: **the concept** (so you sound fluent), **our anchor** (so it's credible), **say-this** talking points, and the **honest edge** (where to pivot to roadmap rather than overclaim). --- ## A. Agentic Frameworks — LangGraph / LangChain (must) · ADK / A2A / AutoGen (nice) **Concept.** LangGraph models an agent as a **state graph**: typed shared state, nodes (LLM calls / tools), and conditional + cyclic edges, with a checkpointer for durability and human-in-the-loop. LangChain supplies the primitives (tool binding, structured output, retrievers). ADK is Google's agent SDK; **A2A** is an open agent-to-agent interop protocol (agent cards, task lifecycle, message/artifact exchange). AutoGen favors conversational multi-agent loops. **Our anchor.** `agent-queue/` is a *real* multi-engine orchestrator (claude·codex·devin) with an explicit `inbox→doing→done/failed` state machine and requeue cycle — see `01-ecosystem-rag-fabric.md §3`. `packages/mcp-client` + `mcp-server` provide tool binding; `packages/llm-router` is the model-selection layer a node would call. **Say this.** - "LangGraph's value over a raw loop is **typed state + conditional/cyclic edges + checkpointing** — which is exactly the trio I'd want for a CRAG/Self-RAG loop that re-retrieves." - "I treat **routing** as a first-class node: a cheap classifier decides single-shot vs. multi-hop vs. tool-only, so I'm not paying for a full agent loop on a FAQ." - "For A2A I'd expose each of our product agents with an **agent card** (capabilities, auth, cost) so a supervisor agent can delegate — that's the natural evolution of agent-queue's engine flag." **Honest edge.** "My shipped orchestration is framework-light by choice (bash-portable). The LangGraph port is scoped (`04 §A`); conceptually it's a 1:1 mapping, not new ground." --- ## B. RAG Architecture — hybrid retrieval, reranking, HyDE, Self-RAG (must) · RAPTOR, multimodal (nice) **Concept.** - **Hybrid retrieval** = dense (vector) ⊕ sparse (BM25) ⊕ optionally graph, fused (RRF — reciprocal rank fusion) then **reranked** by a cross-encoder (scores the (query, passage) pair jointly; far more precise than bi-encoder similarity). **ColBERT** = late-interaction reranking (token-level MaxSim) — accurate and scalable. - **Context compression** = drop/condense retrieved spans to fit the budget and reduce distraction. - **HyDE** = embed a *hypothetical answer* the LLM drafts, not the raw question — closes the query/document vocabulary gap. - **CRAG** = grade retrieved docs; if weak, correct (web/alt-source or rewrite) before generating. - **Self-RAG** = the model emits reflection tokens deciding *whether to retrieve* and *whether its draft is supported*, looping if not. - **RAPTOR** = recursively cluster + summarize chunks into a tree; retrieve at the abstraction level the query needs (great for "summarize this 200-page filing"). **Our anchor.** `packages/extraction` + `extraction-service` (:4005) already turn URLs/docs into retrievable units. `invt_trdg`'s AI chat is a retrieve-then-reason loop over structured data today. The hybrid index + rerank + HyDE/CRAG/Self-RAG loop is the headline enhancement (`04 §B`). **Say this.** - "I default to **hybrid + rerank** because pure-vector misses exact-match terms (a regulatory clause number, a ticker, an account ID) that BM25 nails." - "HyDE and reranking attack **different** failures — HyDE fixes *recall* (you retrieved the wrong thing), reranking fixes *precision* (you retrieved too much). I tune them independently against context-recall vs. context-precision." - "RAPTOR earns its cost on **long regulatory corpora** where the answer spans sections; for transactional Q&A it's overkill." **Honest edge.** Lead with the *reasoning about when to use each*, which is the architect signal; the implementations are well-trodden and scoped on our roadmap. --- ## C. Structured Retrieval — Text-to-SQL, schema-aware retrieval (must) · Snowflake Cortex, BigQuery ML (nice) **Concept.** Text-to-SQL turns NL into SQL against a known schema. The enterprise risks are **wrong joins, full-table scans, and data leakage**. Mitigations: restrict to **read-only semantic views**, inject **schema + few-shot exemplars** (schema-aware retrieval over table/ column descriptions), validate/parse the SQL before execution, enforce **row-level security**, and cap cost. Warehouse-native options (Snowflake Cortex, BigQuery ML) push inference next to governed data. **Our anchor.** `invt_trdg`'s AI chat is **schema-aware tool-calling** — NL maps to typed operations (get quote, create trade plan, manage watchlist/alerts) over a known domain. That's the *safe* form of Text-to-SQL: the model picks a vetted, parameterized tool rather than emitting arbitrary SQL. **Say this.** - "I prefer **typed tool-calling over free Text-to-SQL** wherever the query space is bounded — it's auditable and injection-resistant. I reserve generative SQL for genuine ad-hoc analytics, behind read-only views with RLS." - "Schema-aware retrieval is itself a RAG problem: I embed table/column docs and retrieve the *relevant* schema slice into the prompt, so the model isn't drowning in a 400-table catalog." **Honest edge.** "Free Text-to-SQL against a warehouse I've prototyped more than shipped; in production I've leaned on the tool-calling pattern because it's safer in regulated data." --- ## D. Unstructured Retrieval — PDF parsing, layout-aware chunking (must) · multimodal (nice) **Concept.** Naïve "split every 1000 chars" chunking destroys meaning. **Layout-aware** parsing (PyMuPDF, Unstructured.io) preserves headings, tables, lists, reading order; **OCR** (Tesseract/Azure Doc Intelligence) handles scans. **Semantic chunking** splits on topic boundaries, not byte counts. Tables and figures need special handling (serialize tables to markdown; caption images). Each chunk carries **provenance metadata** (doc id, page, section) — mandatory for citations. **Our anchor.** `packages/extraction` + `extraction-service` already perform URL/task/doc extraction; `notelett` models structured notes. Layout-aware PDF + OCR is the additive piece (`04 §B`). **Say this.** - "Chunking is where most RAG quality is won or lost. I chunk on **document structure first, size second**, and I always attach page/section provenance so the answer can cite a clause, not 'a document.'" - "For regulatory filings I keep **tables intact** and store a text serialization alongside — losing a covenant table to a mid-row split is a correctness bug, not a formatting one." --- ## E. Graph RAG — KG + vector hybrid (must) · SPARQL / ontology design (nice) **Concept.** Vector RAG finds *similar* text; it can't answer *"which entities are connected to X within 2 hops."* Graph RAG retrieves a **subgraph** (entities + relationships) and feeds it as structured context — ideal for "show the ownership chain," "trace this transaction counterparty network," KYC/AML link analysis. Patterns: extract entities/relations at ingest, build a knowledge graph, retrieve by **graph traversal (Gremlin/Cypher) + vector seed**, then generate over the fused context. SPARQL/ontologies add formal semantics (RDF, classes). **Our anchor.** We run **Azure Cosmos DB** (`packages/cosmos`), which exposes the **Gremlin** graph API — the JD lists "Azure Cosmos Gremlin" explicitly. `event-store`/`events` already model entity relationships over time. Standing up a Gremlin KG + graph-augmented retrieval is `04 §D`. **Say this.** - "I use the graph for the **'connected-to' questions** vector search can't answer, and seed traversals from vector hits — vector finds the entry node, the graph supplies the neighborhood." - "In banking this is the **AML/KYC** sweet spot: link analysis across counterparties is inherently graph-shaped." **Honest edge.** "Cosmos Gremlin is in our stack; the KG is a clean build on infrastructure we already operate, not a new platform bet." --- ## F. Vector Databases — Pinecone / Weaviate / Azure AI Search (must) · Qdrant, pgvector, multi-tenancy (nice) **Concept.** Trade-offs: **Pinecone** (managed, serverless, fast to ship), **Weaviate/Qdrant** (open, hybrid + filtering, self-host control), **Azure AI Search** (managed hybrid: vector + BM25 + semantic rerank in one service — the natural Azure pick), **pgvector** (lives in your Postgres → transactional consistency, one backup story, lowest ops). **Multi-tenancy** = namespaces (Pinecone) / index-per-tenant (Azure) / collection or payload filter (Qdrant) / schema or row filter (pgvector). Metadata filtering is as important as the ANN index. **Our anchor.** Postgres is in the stack → **pgvector** is our lowest-friction path and gives transactional consistency with the source rows. Multi-tenancy is already first-class (`productId`, two-instance Hermes) — see `01 §6`. **Say this.** - "On Azure I'd default to **Azure AI Search** because it gives me hybrid + semantic reranking as a managed service — fewer moving parts in a regulated audit boundary." - "For tight transactional coupling (the vectors must stay consistent with the source-of-truth rows) I reach for **pgvector** — one database, one backup, one ACL model." - "I pick the vector DB **last**, after I know tenant count, filter cardinality, recall target, and the audit boundary — it's a consequence of requirements, not a religion." --- ## G. Grounding & Eval — RAGAS, TruLens, faithfulness SLAs (must) · LangSmith, LLM-as-judge (nice) **Concept.** The core metrics: - **Faithfulness / groundedness** — is every claim supported by retrieved context? (anti-hallucination) - **Answer relevancy** — does the answer address the question? - **Context precision** — are the *top* retrieved chunks the relevant ones? (reranker quality) - **Context recall** — did we retrieve *all* needed evidence? (retriever quality) - **Answer correctness** — vs. ground truth. **RAGAS** computes these (often LLM-as-judge). **TruLens** adds the "RAG triad" + feedback functions + tracing. **DeepEval** is pytest-style assertions for CI. **LangSmith** = tracing/ eval ops. An **SLA** turns metrics into gates ("faithfulness ≥ 0.9 or abstain"). **Our anchor.** `flowmonk` is a *grounding pattern made flesh*: the deterministic scheduler is the source of truth, and **the AI layer is constrained to explanation / safe recommendation** — it cannot invent a plan, only narrate one. That's hallucination mitigation by architecture. `diagnostics-client` / `telemetry-client` / `monitoring` + Hermes dashboards are the ready-made home for an eval harness + drift monitor (`04 §E`). **Say this.** - "I don't ship grounding as a vibe — I ship it as **gates**: a faithfulness threshold that triggers abstain-and-escalate, evaluated continuously, with regressions blocking deploy in CI (DeepEval)." - "flowmonk taught me the cheapest hallucination fix: **don't let the LLM be the source of truth.** Make a deterministic engine authoritative and scope the model to explanation. In banking that's the difference between a tool and a liability." - "Eval is a **two-loop** system: offline (golden set in CI) catches regressions pre-deploy; online (sampled production traces + LLM-judge) catches **factual drift** as corpora and models change." --- ## H. Cloud Platform — Azure (AI Foundry / OpenAI / Search) (must) · AWS Bedrock, GCP Vertex (nice) **Concept.** Azure AI Foundry = the build/eval/deploy hub; Azure OpenAI = governed model endpoints with content filtering and data-residency; Azure AI Search = managed hybrid index; Cosmos DB = docs + Gremlin graph. The architect skill is **provider-portability**: abstract the model behind a router so Azure OpenAI / Bedrock / Vertex are swappable, and keep data in the customer's tenancy. **Our anchor.** Azure Cosmos DB in production (`_AZURE/`, `packages/cosmos`). `packages/llm-router` is the provider-abstraction layer — Azure OpenAI today, Bedrock/Vertex swap-in, no app rewrite. `packages/ollama-client` covers on-prem/air-gapped inference. **Say this.** - "I keep a **router seam** so model choice is a config decision, not an architecture decision — and so a bank can pin everything to its own Azure tenant for residency." - "On-prem / air-gapped is real in banking; `ollama-client` in our stack means I've thought about the **no-egress** deployment, not just the SaaS path." --- ## I. AI Governance — access-controlled RAG, Zero Trust (must) · SR 11-7, EU AI Act (nice) **Concept.** Governance is **structural**, applied at every hop (see `01 §4–5`): access-controlled retrieval (you can only retrieve what your identity entitles you to), row/column masking, role-aware context injection, immutable audit, instant kill-switch, model cards, RACI. Zero Trust for agents = never trust the agent's request implicitly; verify identity + scope at the tool boundary every call. **Our anchor.** `packages/auth` + `fastify-auth` (identity/scope), `field-encrypt` / `client-encrypt` (column/field masking), `feature-flag-client` + `kill-switch-client` (instant constraint), `event-store` (immutable audit), MCP server (explicit tool boundary). This is the **deepest, most defensible** part of the story. **Say this.** - "I design retrieval so that **masking and audit are inescapable** — they live at the platform/MCP layer, so no product surface can route around them. That's what 'governance by architecture' means." - "A kill-switch that disables a model or single tool **without a redeploy** is a hard SR 11-7 requirement: supervisors must be able to constrain a model in production immediately." (Full regulatory mapping in `05-banking-blueprints.md`.) --- ## J. Domain: Banking — support / compliance automation (must) · model risk, KYC/AML (nice) **Concept.** The flagship use cases: **customer-support automation** (grounded answers from policy/product docs with citations + escalation), **compliance-document retrieval** (find the controlling clause across filings/policies), **regulatory reporting**, **model risk management** (SR 11-7), **KYC/AML** (entity/network link analysis → graph RAG). **Our anchor.** `invt_trdg` is our **regulated-industry analog**: market data, trade plans, alerts, profiles — a domain where wrong answers have consequences and auditability matters. The patterns (typed tool-calling, audit, abstain-on-uncertainty) port directly to a bank. **Say this.** - "My trading product is the closest non-bank analog to a banking workload: it taught me to **default to abstain over guess** when money or compliance is on the line, and to make every answer traceable to a source." - "KYC/AML is where my graph-RAG and my governance stories converge — link analysis on a knowledge graph, behind access-controlled, audited retrieval." **Honest edge.** "I haven't shipped inside a chartered bank; my regulated-domain reps come from trading and from designing for SR 11-7 / EU AI Act controls — which I can walk through concretely."