7-doc kit mapping the JD competency matrix to the ByteLyst ecosystem: ecosystem-as-RAG-fabric architecture, competency deep-dives, STAR bank, enhancement roadmap, banking blueprints, and a glossary quick-ref. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
225 lines
15 KiB
Markdown
225 lines
15 KiB
Markdown
# 02 · Competency Deep-Dives
|
||
|
||
One section per competency-matrix row. Each gives you: **the concept** (so you sound
|
||
fluent), **our anchor** (so it's credible), **say-this** talking points, and the **honest
|
||
edge** (where to pivot to roadmap rather than overclaim).
|
||
|
||
---
|
||
|
||
## A. Agentic Frameworks — LangGraph / LangChain (must) · ADK / A2A / AutoGen (nice)
|
||
|
||
**Concept.** LangGraph models an agent as a **state graph**: typed shared state, nodes
|
||
(LLM calls / tools), and conditional + cyclic edges, with a checkpointer for durability
|
||
and human-in-the-loop. LangChain supplies the primitives (tool binding, structured output,
|
||
retrievers). ADK is Google's agent SDK; **A2A** is an open agent-to-agent interop protocol
|
||
(agent cards, task lifecycle, message/artifact exchange). AutoGen favors conversational
|
||
multi-agent loops.
|
||
|
||
**Our anchor.** `agent-queue/` is a *real* multi-engine orchestrator (claude·codex·devin)
|
||
with an explicit `inbox→doing→done/failed` state machine and requeue cycle — see
|
||
`01-ecosystem-rag-fabric.md §3`. `packages/mcp-client` + `mcp-server` provide tool binding;
|
||
`packages/llm-router` is the model-selection layer a node would call.
|
||
|
||
**Say this.**
|
||
- "LangGraph's value over a raw loop is **typed state + conditional/cyclic edges + checkpointing** — which is exactly the trio I'd want for a CRAG/Self-RAG loop that re-retrieves."
|
||
- "I treat **routing** as a first-class node: a cheap classifier decides single-shot vs. multi-hop vs. tool-only, so I'm not paying for a full agent loop on a FAQ."
|
||
- "For A2A I'd expose each of our product agents with an **agent card** (capabilities, auth, cost) so a supervisor agent can delegate — that's the natural evolution of agent-queue's engine flag."
|
||
|
||
**Honest edge.** "My shipped orchestration is framework-light by choice (bash-portable).
|
||
The LangGraph port is scoped (`04 §A`); conceptually it's a 1:1 mapping, not new ground."
|
||
|
||
---
|
||
|
||
## B. RAG Architecture — hybrid retrieval, reranking, HyDE, Self-RAG (must) · RAPTOR, multimodal (nice)
|
||
|
||
**Concept.**
|
||
- **Hybrid retrieval** = dense (vector) ⊕ sparse (BM25) ⊕ optionally graph, fused (RRF — reciprocal rank fusion) then **reranked** by a cross-encoder (scores the (query, passage) pair jointly; far more precise than bi-encoder similarity). **ColBERT** = late-interaction reranking (token-level MaxSim) — accurate and scalable.
|
||
- **Context compression** = drop/condense retrieved spans to fit the budget and reduce distraction.
|
||
- **HyDE** = embed a *hypothetical answer* the LLM drafts, not the raw question — closes the query/document vocabulary gap.
|
||
- **CRAG** = grade retrieved docs; if weak, correct (web/alt-source or rewrite) before generating.
|
||
- **Self-RAG** = the model emits reflection tokens deciding *whether to retrieve* and *whether its draft is supported*, looping if not.
|
||
- **RAPTOR** = recursively cluster + summarize chunks into a tree; retrieve at the abstraction level the query needs (great for "summarize this 200-page filing").
|
||
|
||
**Our anchor.** `packages/extraction` + `extraction-service` (:4005) already turn URLs/docs
|
||
into retrievable units. `invt_trdg`'s AI chat is a retrieve-then-reason loop over structured
|
||
data today. The hybrid index + rerank + HyDE/CRAG/Self-RAG loop is the headline enhancement
|
||
(`04 §B`).
|
||
|
||
**Say this.**
|
||
- "I default to **hybrid + rerank** because pure-vector misses exact-match terms (a regulatory clause number, a ticker, an account ID) that BM25 nails."
|
||
- "HyDE and reranking attack **different** failures — HyDE fixes *recall* (you retrieved the wrong thing), reranking fixes *precision* (you retrieved too much). I tune them independently against context-recall vs. context-precision."
|
||
- "RAPTOR earns its cost on **long regulatory corpora** where the answer spans sections; for transactional Q&A it's overkill."
|
||
|
||
**Honest edge.** Lead with the *reasoning about when to use each*, which is the architect
|
||
signal; the implementations are well-trodden and scoped on our roadmap.
|
||
|
||
---
|
||
|
||
## C. Structured Retrieval — Text-to-SQL, schema-aware retrieval (must) · Snowflake Cortex, BigQuery ML (nice)
|
||
|
||
**Concept.** Text-to-SQL turns NL into SQL against a known schema. The enterprise risks are
|
||
**wrong joins, full-table scans, and data leakage**. Mitigations: restrict to **read-only
|
||
semantic views**, inject **schema + few-shot exemplars** (schema-aware retrieval over table/
|
||
column descriptions), validate/parse the SQL before execution, enforce **row-level security**,
|
||
and cap cost. Warehouse-native options (Snowflake Cortex, BigQuery ML) push inference next to
|
||
governed data.
|
||
|
||
**Our anchor.** `invt_trdg`'s AI chat is **schema-aware tool-calling** — NL maps to typed
|
||
operations (get quote, create trade plan, manage watchlist/alerts) over a known domain. That's
|
||
the *safe* form of Text-to-SQL: the model picks a vetted, parameterized tool rather than
|
||
emitting arbitrary SQL.
|
||
|
||
**Say this.**
|
||
- "I prefer **typed tool-calling over free Text-to-SQL** wherever the query space is bounded — it's auditable and injection-resistant. I reserve generative SQL for genuine ad-hoc analytics, behind read-only views with RLS."
|
||
- "Schema-aware retrieval is itself a RAG problem: I embed table/column docs and retrieve the *relevant* schema slice into the prompt, so the model isn't drowning in a 400-table catalog."
|
||
|
||
**Honest edge.** "Free Text-to-SQL against a warehouse I've prototyped more than shipped;
|
||
in production I've leaned on the tool-calling pattern because it's safer in regulated data."
|
||
|
||
---
|
||
|
||
## D. Unstructured Retrieval — PDF parsing, layout-aware chunking (must) · multimodal (nice)
|
||
|
||
**Concept.** Naïve "split every 1000 chars" chunking destroys meaning. **Layout-aware**
|
||
parsing (PyMuPDF, Unstructured.io) preserves headings, tables, lists, reading order; **OCR**
|
||
(Tesseract/Azure Doc Intelligence) handles scans. **Semantic chunking** splits on topic
|
||
boundaries, not byte counts. Tables and figures need special handling (serialize tables to
|
||
markdown; caption images). Each chunk carries **provenance metadata** (doc id, page, section)
|
||
— mandatory for citations.
|
||
|
||
**Our anchor.** `packages/extraction` + `extraction-service` already perform URL/task/doc
|
||
extraction; `notelett` models structured notes. Layout-aware PDF + OCR is the additive piece
|
||
(`04 §B`).
|
||
|
||
**Say this.**
|
||
- "Chunking is where most RAG quality is won or lost. I chunk on **document structure first, size second**, and I always attach page/section provenance so the answer can cite a clause, not 'a document.'"
|
||
- "For regulatory filings I keep **tables intact** and store a text serialization alongside — losing a covenant table to a mid-row split is a correctness bug, not a formatting one."
|
||
|
||
---
|
||
|
||
## E. Graph RAG — KG + vector hybrid (must) · SPARQL / ontology design (nice)
|
||
|
||
**Concept.** Vector RAG finds *similar* text; it can't answer *"which entities are connected
|
||
to X within 2 hops."* Graph RAG retrieves a **subgraph** (entities + relationships) and feeds
|
||
it as structured context — ideal for "show the ownership chain," "trace this transaction
|
||
counterparty network," KYC/AML link analysis. Patterns: extract entities/relations at ingest,
|
||
build a knowledge graph, retrieve by **graph traversal (Gremlin/Cypher) + vector seed**, then
|
||
generate over the fused context. SPARQL/ontologies add formal semantics (RDF, classes).
|
||
|
||
**Our anchor.** We run **Azure Cosmos DB** (`packages/cosmos`), which exposes the **Gremlin**
|
||
graph API — the JD lists "Azure Cosmos Gremlin" explicitly. `event-store`/`events` already
|
||
model entity relationships over time. Standing up a Gremlin KG + graph-augmented retrieval is
|
||
`04 §D`.
|
||
|
||
**Say this.**
|
||
- "I use the graph for the **'connected-to' questions** vector search can't answer, and seed traversals from vector hits — vector finds the entry node, the graph supplies the neighborhood."
|
||
- "In banking this is the **AML/KYC** sweet spot: link analysis across counterparties is inherently graph-shaped."
|
||
|
||
**Honest edge.** "Cosmos Gremlin is in our stack; the KG is a clean build on infrastructure
|
||
we already operate, not a new platform bet."
|
||
|
||
---
|
||
|
||
## F. Vector Databases — Pinecone / Weaviate / Azure AI Search (must) · Qdrant, pgvector, multi-tenancy (nice)
|
||
|
||
**Concept.** Trade-offs: **Pinecone** (managed, serverless, fast to ship), **Weaviate/Qdrant**
|
||
(open, hybrid + filtering, self-host control), **Azure AI Search** (managed hybrid: vector +
|
||
BM25 + semantic rerank in one service — the natural Azure pick), **pgvector** (lives in your
|
||
Postgres → transactional consistency, one backup story, lowest ops). **Multi-tenancy** =
|
||
namespaces (Pinecone) / index-per-tenant (Azure) / collection or payload filter (Qdrant) /
|
||
schema or row filter (pgvector). Metadata filtering is as important as the ANN index.
|
||
|
||
**Our anchor.** Postgres is in the stack → **pgvector** is our lowest-friction path and gives
|
||
transactional consistency with the source rows. Multi-tenancy is already first-class
|
||
(`productId`, two-instance Hermes) — see `01 §6`.
|
||
|
||
**Say this.**
|
||
- "On Azure I'd default to **Azure AI Search** because it gives me hybrid + semantic reranking as a managed service — fewer moving parts in a regulated audit boundary."
|
||
- "For tight transactional coupling (the vectors must stay consistent with the source-of-truth rows) I reach for **pgvector** — one database, one backup, one ACL model."
|
||
- "I pick the vector DB **last**, after I know tenant count, filter cardinality, recall target, and the audit boundary — it's a consequence of requirements, not a religion."
|
||
|
||
---
|
||
|
||
## G. Grounding & Eval — RAGAS, TruLens, faithfulness SLAs (must) · LangSmith, LLM-as-judge (nice)
|
||
|
||
**Concept.** The core metrics:
|
||
- **Faithfulness / groundedness** — is every claim supported by retrieved context? (anti-hallucination)
|
||
- **Answer relevancy** — does the answer address the question?
|
||
- **Context precision** — are the *top* retrieved chunks the relevant ones? (reranker quality)
|
||
- **Context recall** — did we retrieve *all* needed evidence? (retriever quality)
|
||
- **Answer correctness** — vs. ground truth.
|
||
|
||
**RAGAS** computes these (often LLM-as-judge). **TruLens** adds the "RAG triad" + feedback
|
||
functions + tracing. **DeepEval** is pytest-style assertions for CI. **LangSmith** = tracing/
|
||
eval ops. An **SLA** turns metrics into gates ("faithfulness ≥ 0.9 or abstain").
|
||
|
||
**Our anchor.** `flowmonk` is a *grounding pattern made flesh*: the deterministic scheduler is
|
||
the source of truth, and **the AI layer is constrained to explanation / safe recommendation** —
|
||
it cannot invent a plan, only narrate one. That's hallucination mitigation by architecture.
|
||
`diagnostics-client` / `telemetry-client` / `monitoring` + Hermes dashboards are the
|
||
ready-made home for an eval harness + drift monitor (`04 §E`).
|
||
|
||
**Say this.**
|
||
- "I don't ship grounding as a vibe — I ship it as **gates**: a faithfulness threshold that triggers abstain-and-escalate, evaluated continuously, with regressions blocking deploy in CI (DeepEval)."
|
||
- "flowmonk taught me the cheapest hallucination fix: **don't let the LLM be the source of truth.** Make a deterministic engine authoritative and scope the model to explanation. In banking that's the difference between a tool and a liability."
|
||
- "Eval is a **two-loop** system: offline (golden set in CI) catches regressions pre-deploy; online (sampled production traces + LLM-judge) catches **factual drift** as corpora and models change."
|
||
|
||
---
|
||
|
||
## H. Cloud Platform — Azure (AI Foundry / OpenAI / Search) (must) · AWS Bedrock, GCP Vertex (nice)
|
||
|
||
**Concept.** Azure AI Foundry = the build/eval/deploy hub; Azure OpenAI = governed model
|
||
endpoints with content filtering and data-residency; Azure AI Search = managed hybrid index;
|
||
Cosmos DB = docs + Gremlin graph. The architect skill is **provider-portability**: abstract the
|
||
model behind a router so Azure OpenAI / Bedrock / Vertex are swappable, and keep data in the
|
||
customer's tenancy.
|
||
|
||
**Our anchor.** Azure Cosmos DB in production (`_AZURE/`, `packages/cosmos`).
|
||
`packages/llm-router` is the provider-abstraction layer — Azure OpenAI today, Bedrock/Vertex
|
||
swap-in, no app rewrite. `packages/ollama-client` covers on-prem/air-gapped inference.
|
||
|
||
**Say this.**
|
||
- "I keep a **router seam** so model choice is a config decision, not an architecture decision — and so a bank can pin everything to its own Azure tenant for residency."
|
||
- "On-prem / air-gapped is real in banking; `ollama-client` in our stack means I've thought about the **no-egress** deployment, not just the SaaS path."
|
||
|
||
---
|
||
|
||
## I. AI Governance — access-controlled RAG, Zero Trust (must) · SR 11-7, EU AI Act (nice)
|
||
|
||
**Concept.** Governance is **structural**, applied at every hop (see `01 §4–5`):
|
||
access-controlled retrieval (you can only retrieve what your identity entitles you to),
|
||
row/column masking, role-aware context injection, immutable audit, instant kill-switch,
|
||
model cards, RACI. Zero Trust for agents = never trust the agent's request implicitly; verify
|
||
identity + scope at the tool boundary every call.
|
||
|
||
**Our anchor.** `packages/auth` + `fastify-auth` (identity/scope), `field-encrypt` /
|
||
`client-encrypt` (column/field masking), `feature-flag-client` + `kill-switch-client` (instant
|
||
constraint), `event-store` (immutable audit), MCP server (explicit tool boundary). This is the
|
||
**deepest, most defensible** part of the story.
|
||
|
||
**Say this.**
|
||
- "I design retrieval so that **masking and audit are inescapable** — they live at the platform/MCP layer, so no product surface can route around them. That's what 'governance by architecture' means."
|
||
- "A kill-switch that disables a model or single tool **without a redeploy** is a hard SR 11-7 requirement: supervisors must be able to constrain a model in production immediately."
|
||
|
||
(Full regulatory mapping in `05-banking-blueprints.md`.)
|
||
|
||
---
|
||
|
||
## J. Domain: Banking — support / compliance automation (must) · model risk, KYC/AML (nice)
|
||
|
||
**Concept.** The flagship use cases: **customer-support automation** (grounded answers from
|
||
policy/product docs with citations + escalation), **compliance-document retrieval** (find the
|
||
controlling clause across filings/policies), **regulatory reporting**, **model risk management**
|
||
(SR 11-7), **KYC/AML** (entity/network link analysis → graph RAG).
|
||
|
||
**Our anchor.** `invt_trdg` is our **regulated-industry analog**: market data, trade plans,
|
||
alerts, profiles — a domain where wrong answers have consequences and auditability matters. The
|
||
patterns (typed tool-calling, audit, abstain-on-uncertainty) port directly to a bank.
|
||
|
||
**Say this.**
|
||
- "My trading product is the closest non-bank analog to a banking workload: it taught me to **default to abstain over guess** when money or compliance is on the line, and to make every answer traceable to a source."
|
||
- "KYC/AML is where my graph-RAG and my governance stories converge — link analysis on a knowledge graph, behind access-controlled, audited retrieval."
|
||
|
||
**Honest edge.** "I haven't shipped inside a chartered bank; my regulated-domain reps come from
|
||
trading and from designing for SR 11-7 / EU AI Act controls — which I can walk through concretely."
|