7-doc kit mapping the JD competency matrix to the ByteLyst ecosystem: ecosystem-as-RAG-fabric architecture, competency deep-dives, STAR bank, enhancement roadmap, banking blueprints, and a glossary quick-ref. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
02 · Competency Deep-Dives
One section per competency-matrix row. Each gives you: the concept (so you sound fluent), our anchor (so it's credible), say-this talking points, and the honest edge (where to pivot to roadmap rather than overclaim).
A. Agentic Frameworks — LangGraph / LangChain (must) · ADK / A2A / AutoGen (nice)
Concept. LangGraph models an agent as a state graph: typed shared state, nodes (LLM calls / tools), and conditional + cyclic edges, with a checkpointer for durability and human-in-the-loop. LangChain supplies the primitives (tool binding, structured output, retrievers). ADK is Google's agent SDK; A2A is an open agent-to-agent interop protocol (agent cards, task lifecycle, message/artifact exchange). AutoGen favors conversational multi-agent loops.
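The state-graph idea can be sketched in plain Python without any framework. This is an illustrative toy, not LangGraph's API: the `RagState` fields, the `CORPUS` entry, and the node names are all hypothetical, and the `rewrite` node stands in for an LLM call. It shows typed shared state, a conditional edge, and a cycle that re-retrieves after a query rewrite.

```python
from typing import TypedDict


class RagState(TypedDict):
    question: str
    query: str
    docs: list
    attempts: int
    answer: str


# Hypothetical one-entry corpus keyed by a phrase the query must contain.
CORPUS = {"wire transfer limit": "Daily wire limit is 50,000 USD."}


def retrieve(state):
    docs = [text for key, text in CORPUS.items() if key in state["query"].lower()]
    return {**state, "docs": docs, "attempts": state["attempts"] + 1}


def rewrite(state):
    # Stand-in for an LLM query rewrite (assumption: a real node would call a model).
    return {**state, "query": "wire transfer limit"}


def generate(state):
    answer = state["docs"][0] if state["docs"] else "I don't know."
    return {**state, "answer": answer}


def route_after_retrieve(state):
    # Conditional edge: answer if we have evidence, otherwise rewrite once, then give up.
    if state["docs"]:
        return "generate"
    return "rewrite" if state["attempts"] < 2 else "generate"


NODES = {"retrieve": retrieve, "rewrite": rewrite, "generate": generate}
EDGES = {
    "retrieve": route_after_retrieve,
    "rewrite": lambda state: "retrieve",  # cyclic edge back to retrieval
    "generate": lambda state: "END",
}


def run(state, start="retrieve", max_steps=10):
    node, trace = start, []
    for _ in range(max_steps):  # hard step cap so a bad router cannot loop forever
        trace.append(node)
        state = NODES[node](state)
        node = EDGES[node](state)
        if node == "END":
            break
    return state, trace
```

A framework like LangGraph adds what this toy lacks: checkpointing between nodes and human-in-the-loop interrupts.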
Our anchor. agent-queue/ is a real multi-engine orchestrator (claude·codex·devin)
with an explicit inbox→doing→done/failed state machine and requeue cycle — see
01-ecosystem-rag-fabric.md §3. packages/mcp-client + mcp-server provide tool binding;
packages/llm-router is the model-selection layer a node would call.
Say this.
- "LangGraph's value over a raw loop is typed state + conditional/cyclic edges + checkpointing — which is exactly the trio I'd want for a CRAG/Self-RAG loop that re-retrieves."
- "I treat routing as a first-class node: a cheap classifier decides single-shot vs. multi-hop vs. tool-only, so I'm not paying for a full agent loop on a FAQ."
- "For A2A I'd expose each of our product agents with an agent card (capabilities, auth, cost) so a supervisor agent can delegate — that's the natural evolution of agent-queue's engine flag."
Honest edge. "My shipped orchestration is framework-light by choice (bash-portable).
The LangGraph port is scoped (04 §A); conceptually it's a 1:1 mapping, not new ground."
B. RAG Architecture — hybrid retrieval, reranking, HyDE, Self-RAG (must) · RAPTOR, multimodal (nice)
Concept.
- Hybrid retrieval = dense (vector) ⊕ sparse (BM25) ⊕ optionally graph, fused (RRF — reciprocal rank fusion) then reranked by a cross-encoder (scores the (query, passage) pair jointly; far more precise than bi-encoder similarity). ColBERT = late-interaction reranking (token-level MaxSim) — accurate and scalable.
- Context compression = drop/condense retrieved spans to fit the budget and reduce distraction.
- HyDE = embed a hypothetical answer the LLM drafts, not the raw question — closes the query/document vocabulary gap.
- CRAG = grade retrieved docs; if weak, correct (web/alt-source or rewrite) before generating.
- Self-RAG = the model emits reflection tokens deciding whether to retrieve and whether its draft is supported, looping if not.
- RAPTOR = recursively cluster + summarize chunks into a tree; retrieve at the abstraction level the query needs (great for "summarize this 200-page filing").
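The fusion step in hybrid retrieval is small enough to show directly. Below is a minimal sketch of reciprocal rank fusion, assuming each retriever returns an ordered list of document ids. The ids and the constant `k=60` (a commonly used default) are illustrative, and the cross-encoder rerank that would follow is not shown.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of doc ids with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); documents missing from a list add nothing.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["d3", "d1", "d2"]   # hypothetical vector-search order
sparse_hits = ["d1", "d4", "d3"]  # hypothetical BM25 order
fused = rrf_fuse([dense_hits, sparse_hits])
```

RRF only uses ranks, so it needs no score normalization between the dense and sparse retrievers, which is why it is a convenient default before reranking.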
Our anchor. packages/extraction + extraction-service (:4005) already turn URLs/docs
into retrievable units. invt_trdg's AI chat is a retrieve-then-reason loop over structured
data today. The hybrid index + rerank + HyDE/CRAG/Self-RAG loop is the headline enhancement
(04 §B).
Say this.
- "I default to hybrid + rerank because pure-vector misses exact-match terms (a regulatory clause number, a ticker, an account ID) that BM25 nails."
- "HyDE and reranking attack different failures: HyDE fixes recall (the right passage never made the candidate set), reranking fixes precision (the right passage is in the set but buried under noise). I tune them independently against context-recall vs. context-precision."
- "RAPTOR earns its cost on long regulatory corpora where the answer spans sections; for transactional Q&A it's overkill."
Honest edge. Lead with the reasoning about when to use each, which is the architect signal; the implementations are well-trodden and scoped on our roadmap.
C. Structured Retrieval — Text-to-SQL, schema-aware retrieval (must) · Snowflake Cortex, BigQuery ML (nice)
Concept. Text-to-SQL turns NL into SQL against a known schema. The enterprise risks are wrong joins, full-table scans, and data leakage. Mitigations: restrict to read-only semantic views, inject schema + few-shot exemplars (schema-aware retrieval over table/column descriptions), validate/parse the SQL before execution, enforce row-level security, and cap cost. Warehouse-native options (Snowflake Cortex, BigQuery ML) push inference next to governed data.
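Two of these mitigations, read-only access and an allowlist of governed objects, can be sketched with the standard-library `sqlite3` authorizer hook. SQLite here is only a stand-in for a warehouse, and the table names and the `max_rows` cap are hypothetical. Row-level security and real cost limits would come from the warehouse itself and are not shown.

```python
import sqlite3


def guard_read_only(conn, allowed_tables):
    """Deny everything except SELECT statements that read allowlisted tables."""
    def authorizer(action, arg1, arg2, db_name, source):
        if action == sqlite3.SQLITE_SELECT:
            return sqlite3.SQLITE_OK
        if action == sqlite3.SQLITE_READ and arg1 in allowed_tables:
            return sqlite3.SQLITE_OK  # arg1 is the table being read
        return sqlite3.SQLITE_DENY    # writes, DDL, other tables, functions, ...
    conn.set_authorizer(authorizer)


def run_generated_sql(conn, sql, max_rows=100):
    """Execute model-generated SQL; the authorizer rejects it when it is compiled."""
    try:
        return conn.execute(sql).fetchmany(max_rows)  # crude row cap
    except sqlite3.Error as exc:
        raise PermissionError(f"generated SQL rejected: {exc}") from exc


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE positions_ro (symbol TEXT, qty INTEGER)")
conn.execute("CREATE TABLE accounts_raw (owner TEXT, ssn TEXT)")
conn.execute("INSERT INTO positions_ro VALUES ('MSFT', 10)")
conn.commit()
guard_read_only(conn, {"positions_ro"})  # install the guard only after setup
```

The design point is that the check runs inside the database engine, not in a regex over the SQL string, so the guard sees what the statement would actually touch.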
Our anchor. invt_trdg's AI chat is schema-aware tool-calling — NL maps to typed
operations (get quote, create trade plan, manage watchlist/alerts) over a known domain. That's
the safe form of Text-to-SQL: the model picks a vetted, parameterized tool rather than
emitting arbitrary SQL.
Say this.
- "I prefer typed tool-calling over free Text-to-SQL wherever the query space is bounded — it's auditable and injection-resistant. I reserve generative SQL for genuine ad-hoc analytics, behind read-only views with RLS."
- "Schema-aware retrieval is itself a RAG problem: I embed table/column docs and retrieve the relevant schema slice into the prompt, so the model isn't drowning in a 400-table catalog."
Honest edge. "Free Text-to-SQL against a warehouse I've prototyped more than shipped; in production I've leaned on the tool-calling pattern because it's safer in regulated data."
D. Unstructured Retrieval — PDF parsing, layout-aware chunking (must) · multimodal (nice)
Concept. Naïve "split every 1000 chars" chunking destroys meaning. Layout-aware parsing (PyMuPDF, Unstructured.io) preserves headings, tables, lists, reading order; OCR (Tesseract/Azure Doc Intelligence) handles scans. Semantic chunking splits on topic boundaries, not byte counts. Tables and figures need special handling (serialize tables to markdown; caption images). Each chunk carries provenance metadata (doc id, page, section) — mandatory for citations.
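A minimal sketch of "structure first, size second" chunking with provenance follows. It assumes a parser has already produced per-page text in which headings are lines starting with `# `. That input convention, the `Chunk` fields, and the `max_chars` value are assumptions for illustration. Real layout-aware parsing, OCR, and table handling are out of scope here.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    page: int
    section: str
    text: str


def _emit(chunks, doc_id, page, section, buffer):
    """Turn buffered lines into one chunk carrying its provenance."""
    if buffer:
        chunks.append(Chunk(doc_id, page, section, " ".join(buffer)))
        buffer.clear()


def chunk_pages(doc_id, pages, max_chars=200):
    chunks, section = [], "(front matter)"
    for page_no, page_text in enumerate(pages, start=1):
        buffer = []
        for line in (raw.strip() for raw in page_text.splitlines()):
            if not line:
                continue
            if line.startswith("# "):
                # Structure first: a heading always closes the current chunk.
                _emit(chunks, doc_id, page_no, section, buffer)
                section = line[2:]
            else:
                # Size second: split inside a section only when the budget is exceeded.
                if buffer and sum(len(b) for b in buffer) + len(line) > max_chars:
                    _emit(chunks, doc_id, page_no, section, buffer)
                buffer.append(line)
        _emit(chunks, doc_id, page_no, section, buffer)  # never let a chunk span pages
    return chunks
```

Because every chunk keeps `doc_id`, `page`, and `section`, a generated answer can cite the exact location it drew from.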
Our anchor. packages/extraction + extraction-service already perform URL/task/doc
extraction; notelett models structured notes. Layout-aware PDF + OCR is the additive piece
(04 §B).
Say this.
- "Chunking is where most RAG quality is won or lost. I chunk on document structure first, size second, and I always attach page/section provenance so the answer can cite a clause, not 'a document.'"
- "For regulatory filings I keep tables intact and store a text serialization alongside — losing a covenant table to a mid-row split is a correctness bug, not a formatting one."
E. Graph RAG — KG + vector hybrid (must) · SPARQL / ontology design (nice)
Concept. Vector RAG finds similar text; it can't answer "which entities are connected to X within 2 hops." Graph RAG retrieves a subgraph (entities + relationships) and feeds it as structured context — ideal for "show the ownership chain," "trace this transaction counterparty network," KYC/AML link analysis. Patterns: extract entities/relations at ingest, build a knowledge graph, retrieve by graph traversal (Gremlin/Cypher) + vector seed, then generate over the fused context. SPARQL/ontologies add formal semantics (RDF, classes).
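The "within 2 hops" retrieval can be illustrated with a breadth-first traversal over an in-memory adjacency map. The entities and relations below are invented, and the seed node is assumed to come from a vector hit. In a real system the traversal would be a Gremlin or Cypher query against the graph store.

```python
from collections import deque

# Hypothetical ownership / counterparty graph: node -> [(relation, neighbor)].
GRAPH = {
    "AcmeHoldings": [("owns", "AcmeBank")],
    "AcmeBank": [("counterparty_of", "ShellCo1")],
    "ShellCo1": [("owned_by", "PersonX")],
    "PersonX": [],
}


def subgraph_within(seed, max_hops=2):
    """Collect (source, relation, target) triples reachable within max_hops of seed."""
    seen = {seed}
    triples = []
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop budget
        for relation, neighbor in GRAPH.get(node, []):
            triples.append((node, relation, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return triples


def as_context(triples):
    """Serialize triples into plain lines an LLM prompt can consume."""
    return "\n".join(f"{s} --{r}--> {t}" for s, r, t in triples)
```

The hop limit matters for cost as much as relevance: each extra hop can multiply the subgraph that has to fit into the prompt.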
Our anchor. We run Azure Cosmos DB (packages/cosmos), which exposes the Gremlin
graph API — the JD lists "Azure Cosmos Gremlin" explicitly. event-store/events already
model entity relationships over time. Standing up a Gremlin KG + graph-augmented retrieval is
04 §D.
Say this.
- "I use the graph for the 'connected-to' questions vector search can't answer, and seed traversals from vector hits — vector finds the entry node, the graph supplies the neighborhood."
- "In banking this is the AML/KYC sweet spot: link analysis across counterparties is inherently graph-shaped."
Honest edge. "Cosmos Gremlin is in our stack; the KG is a clean build on infrastructure we already operate, not a new platform bet."
F. Vector Databases — Pinecone / Weaviate / Azure AI Search (must) · Qdrant, pgvector, multi-tenancy (nice)
Concept. Trade-offs: Pinecone (managed, serverless, fast to ship), Weaviate/Qdrant (open, hybrid + filtering, self-host control), Azure AI Search (managed hybrid: vector + BM25 + semantic rerank in one service — the natural Azure pick), pgvector (lives in your Postgres → transactional consistency, one backup story, lowest ops). Multi-tenancy = namespaces (Pinecone) / index-per-tenant (Azure) / collection or payload filter (Qdrant) / schema or row filter (pgvector). Metadata filtering is as important as the ANN index.
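To make the point about metadata filtering concrete, here is a toy in-memory search that applies the tenant filter before ranking by cosine similarity. The rows, tenant names, and two-dimensional vectors are made up. A real vector database would do the same thing with an ANN index plus a namespace, payload filter, or `WHERE` clause.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical index rows; "tenant" is the metadata field used for isolation.
ROWS = [
    {"id": "a1", "tenant": "bank_a", "vec": [1.0, 0.0]},
    {"id": "a2", "tenant": "bank_a", "vec": [0.6, 0.8]},
    {"id": "b1", "tenant": "bank_b", "vec": [1.0, 0.0]},
]


def search(query_vec, tenant, top_k=2):
    # Filter first, so another tenant's rows can never enter the candidate set.
    candidates = [row for row in ROWS if row["tenant"] == tenant]
    ranked = sorted(candidates, key=lambda row: cosine(query_vec, row["vec"]), reverse=True)
    return [row["id"] for row in ranked[:top_k]]
```

Filtering before ranking is the safe order for tenancy; filtering after a top-k cut can both leak candidates into intermediate results and return fewer than `top_k` hits.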
Our anchor. Postgres is in the stack → pgvector is our lowest-friction path and gives
transactional consistency with the source rows. Multi-tenancy is already first-class
(productId, two-instance Hermes) — see 01 §6.
Say this.
- "On Azure I'd default to Azure AI Search because it gives me hybrid + semantic reranking as a managed service — fewer moving parts in a regulated audit boundary."
- "For tight transactional coupling (the vectors must stay consistent with the source-of-truth rows) I reach for pgvector — one database, one backup, one ACL model."
- "I pick the vector DB last, after I know tenant count, filter cardinality, recall target, and the audit boundary — it's a consequence of requirements, not a religion."
G. Grounding & Eval — RAGAS, TruLens, faithfulness SLAs (must) · LangSmith, LLM-as-judge (nice)
Concept. The core metrics:
- Faithfulness / groundedness — is every claim supported by retrieved context? (anti-hallucination)
- Answer relevancy — does the answer address the question?
- Context precision — are the top retrieved chunks the relevant ones? (reranker quality)
- Context recall — did we retrieve all needed evidence? (retriever quality)
- Answer correctness — vs. ground truth.
RAGAS computes these (often LLM-as-judge). TruLens adds the "RAG triad" + feedback functions + tracing. DeepEval is pytest-style assertions for CI. LangSmith = tracing/eval ops. An SLA turns metrics into gates ("faithfulness ≥ 0.9 or abstain").
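The "metrics into gates" idea reduces to two small checks, sketched below. The faithfulness scores are assumed to come from an evaluator such as RAGAS or an LLM judge, which is not shown. The function names, the abstain message, and the 0.9 threshold are illustrative.

```python
ABSTAIN_TEXT = "I can't answer that reliably; escalating to a human."


def gate_answer(answer, faithfulness, threshold=0.9):
    """Online gate: serve the answer only if its faithfulness score clears the SLA."""
    if faithfulness >= threshold:
        return {"action": "answer", "text": answer}
    return {"action": "abstain", "text": ABSTAIN_TEXT}


def ci_gate(golden_set_scores, threshold=0.9):
    """Offline gate: pass the build only if mean faithfulness on the golden set holds."""
    if not golden_set_scores:
        return False  # an empty eval run should never count as a pass
    return sum(golden_set_scores) / len(golden_set_scores) >= threshold
```

The online gate protects an individual answer, while the offline gate protects a release; both read the same metric but act at different points in the lifecycle.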
Our anchor. flowmonk is a grounding pattern made flesh: the deterministic scheduler is
the source of truth, and the AI layer is constrained to explanation / safe recommendation —
it cannot invent a plan, only narrate one. That's hallucination mitigation by architecture.
diagnostics-client / telemetry-client / monitoring + Hermes dashboards are the
ready-made home for an eval harness + drift monitor (04 §E).
Say this.
- "I don't ship grounding as a vibe — I ship it as gates: a faithfulness threshold that triggers abstain-and-escalate, evaluated continuously, with regressions blocking deploy in CI (DeepEval)."
- "flowmonk taught me the cheapest hallucination fix: don't let the LLM be the source of truth. Make a deterministic engine authoritative and scope the model to explanation. In banking that's the difference between a tool and a liability."
- "Eval is a two-loop system: offline (golden set in CI) catches regressions pre-deploy; online (sampled production traces + LLM-judge) catches factual drift as corpora and models change."
H. Cloud Platform — Azure (AI Foundry / OpenAI / Search) (must) · AWS Bedrock, GCP Vertex (nice)
Concept. Azure AI Foundry = the build/eval/deploy hub; Azure OpenAI = governed model endpoints with content filtering and data-residency; Azure AI Search = managed hybrid index; Cosmos DB = docs + Gremlin graph. The architect skill is provider-portability: abstract the model behind a router so Azure OpenAI / Bedrock / Vertex are swappable, and keep data in the customer's tenancy.
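A router seam of this kind can be sketched with a small protocol and a registry. This is a hypothetical shape, not the actual `packages/llm-router` interface: the `ChatProvider` protocol, the fake provider classes, and the provider keys are all invented so the example runs without network access.

```python
from typing import Dict, Optional, Protocol


class ChatProvider(Protocol):
    def complete(self, prompt: str) -> str: ...


class FakeAzureOpenAI:
    """Stand-in for a governed cloud endpoint (no real API call is made)."""
    def complete(self, prompt: str) -> str:
        return f"[azure-openai] {prompt}"


class FakeOnPrem:
    """Stand-in for an air-gapped local model."""
    def complete(self, prompt: str) -> str:
        return f"[on-prem] {prompt}"


class Router:
    def __init__(self, providers: Dict[str, ChatProvider], default: str):
        if default not in providers:
            raise ValueError(f"unknown default provider: {default}")
        self.providers = providers
        self.default = default

    def complete(self, prompt: str, provider: Optional[str] = None) -> str:
        # Application code depends only on this method, so swapping the
        # backing model is a configuration change, not a code change.
        return self.providers[provider or self.default].complete(prompt)


router = Router({"azure": FakeAzureOpenAI(), "onprem": FakeOnPrem()}, default="azure")
```

Pinning a deployment to one tenant or to on-prem inference then means changing the `default` key, with no call sites touched.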
Our anchor. Azure Cosmos DB in production (_AZURE/, packages/cosmos).
packages/llm-router is the provider-abstraction layer — Azure OpenAI today, Bedrock/Vertex
swap-in, no app rewrite. packages/ollama-client covers on-prem/air-gapped inference.
Say this.
- "I keep a router seam so model choice is a config decision, not an architecture decision — and so a bank can pin everything to its own Azure tenant for residency."
- "On-prem / air-gapped is real in banking; ollama-client in our stack means I've thought about the no-egress deployment, not just the SaaS path."
I. AI Governance — access-controlled RAG, Zero Trust (must) · SR 11-7, EU AI Act (nice)
Concept. Governance is structural, applied at every hop (see 01 §4–5):
access-controlled retrieval (you can only retrieve what your identity entitles you to),
row/column masking, role-aware context injection, immutable audit, instant kill-switch,
model cards, RACI. Zero Trust for agents = never trust the agent's request implicitly; verify
identity + scope at the tool boundary every call.
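Verifying identity and scope at the tool boundary on every call can be sketched as a single choke-point function. The scope strings, tool names, and in-memory audit list below are hypothetical. A production version would verify a signed token and write to an immutable store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    user_id: str
    scopes: frozenset


# Hypothetical mapping from tool name to the scope it requires.
TOOL_SCOPES = {"get_quote": "market:read", "create_trade_plan": "trades:write"}
AUDIT_LOG = []  # stand-in for an append-only audit store


def call_tool(caller, tool, handler, **kwargs):
    """Check scope and record an audit entry on every call, allowed or not."""
    required = TOOL_SCOPES.get(tool)
    # Unknown tools are denied by default rather than passed through.
    allowed = required is not None and required in caller.scopes
    AUDIT_LOG.append({"user": caller.user_id, "tool": tool, "allowed": allowed})
    if not allowed:
        raise PermissionError(f"{caller.user_id} may not call {tool}")
    return handler(**kwargs)


def get_quote(symbol):
    return {"symbol": symbol, "price": 100.0}  # dummy data for illustration
```

Because the agent never calls `get_quote` directly, there is no code path that skips the scope check or the audit record.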
Our anchor. packages/auth + fastify-auth (identity/scope), field-encrypt /
client-encrypt (column/field masking), feature-flag-client + kill-switch-client (instant
constraint), event-store (immutable audit), MCP server (explicit tool boundary). This is the
deepest, most defensible part of the story.
Say this.
- "I design retrieval so that masking and audit are inescapable — they live at the platform/MCP layer, so no product surface can route around them. That's what 'governance by architecture' means."
- "A kill-switch that disables a model or a single tool without a redeploy is how I'd meet SR 11-7's expectation of effective controls over model use: the bank must be able to constrain a model in production immediately."
(Full regulatory mapping in 05-banking-blueprints.md.)
J. Domain: Banking — support / compliance automation (must) · model risk, KYC/AML (nice)
Concept. The flagship use cases: customer-support automation (grounded answers from policy/product docs with citations + escalation), compliance-document retrieval (find the controlling clause across filings/policies), regulatory reporting, model risk management (SR 11-7), KYC/AML (entity/network link analysis → graph RAG).
Our anchor. invt_trdg is our regulated-industry analog: market data, trade plans,
alerts, profiles — a domain where wrong answers have consequences and auditability matters. The
patterns (typed tool-calling, audit, abstain-on-uncertainty) port directly to a bank.
Say this.
- "My trading product is the closest non-bank analog to a banking workload: it taught me to default to abstain over guess when money or compliance is on the line, and to make every answer traceable to a source."
- "KYC/AML is where my graph-RAG and my governance stories converge — link analysis on a knowledge graph, behind access-controlled, audited retrieval."
Honest edge. "I haven't shipped inside a chartered bank; my regulated-domain reps come from trading and from designing for SR 11-7 / EU AI Act controls — which I can walk through concretely."