From 076449268bef9a50faa1952b12c53f333bb648a0 Mon Sep 17 00:00:00 2001 From: Hermes VM Date: Sun, 31 May 2026 10:48:52 +0000 Subject: [PATCH] docs(interview): add Senior Agentic RAG Architect prep kit 7-doc kit mapping the JD competency matrix to the ByteLyst ecosystem: ecosystem-as-RAG-fabric architecture, competency deep-dives, STAR bank, enhancement roadmap, banking blueprints, and a glossary quick-ref. Co-Authored-By: Claude Opus 4.8 --- docs/INTERVIEW/01-ecosystem-rag-fabric.md | 279 ++++++++++++++++++++++ docs/INTERVIEW/02-competency-deepdives.md | 224 +++++++++++++++++ docs/INTERVIEW/03-star-interview-bank.md | 143 +++++++++++ docs/INTERVIEW/04-enhancement-roadmap.md | 191 +++++++++++++++ docs/INTERVIEW/05-banking-blueprints.md | 175 ++++++++++++++ docs/INTERVIEW/06-glossary-quickref.md | 112 +++++++++ docs/INTERVIEW/README.md | 113 +++++++++ 7 files changed, 1237 insertions(+) create mode 100644 docs/INTERVIEW/01-ecosystem-rag-fabric.md create mode 100644 docs/INTERVIEW/02-competency-deepdives.md create mode 100644 docs/INTERVIEW/03-star-interview-bank.md create mode 100644 docs/INTERVIEW/04-enhancement-roadmap.md create mode 100644 docs/INTERVIEW/05-banking-blueprints.md create mode 100644 docs/INTERVIEW/06-glossary-quickref.md create mode 100644 docs/INTERVIEW/README.md diff --git a/docs/INTERVIEW/01-ecosystem-rag-fabric.md b/docs/INTERVIEW/01-ecosystem-rag-fabric.md new file mode 100644 index 0000000..e436fe4 --- /dev/null +++ b/docs/INTERVIEW/01-ecosystem-rag-fabric.md @@ -0,0 +1,279 @@ +# 01 · The ByteLyst Ecosystem as an Agentic RAG Fabric + +The trick in this interview is to stop treating ByteLyst as "a bunch of side projects" +and start describing it as **one governed retrieval fabric with multiple agentic +front-ends**. Every diagram below is something you can reproduce on a whiteboard. + +--- + +## 1. System context — what we actually run + +```mermaid +flowchart TB + subgraph Users["👤 Humans & Agents"] + U1[End users
web / mobile] + U2[Coding agents
claude · codex · devin] + U3[Operators
Hermes Mission Control] + end + + subgraph Fronts["Agentic Product Surfaces"] + P1["invt_trdg
AI trading chat
(tool-calling over markets)"] + P2["flowmonk
planning + bounded AI layer"] + P3["notelett
notes for humans + agents"] + P4["chronomind
contextual time AI"] + end + + subgraph Platform["common_plat — the shared fabric"] + PS["platform-service :4003
auth · flags · telemetry · billing · blob"] + ES["extraction-service :4005
URL / doc → retrievable units"] + MCP["mcp-server :4007
tool / resource registration"] + LR["packages/llm-router
provider abstraction"] + end + + subgraph Data["Governed Data Sources"] + DB[("Cosmos DB
docs + Gremlin graph")] + PG[("Postgres
structured + pgvector*")] + EV[("event-store
immutable audit")] + BLOB[("blob
raw documents")] + end + + subgraph Ops["Control Plane"] + HERMES["Hermes Mission Control
(devops_tools/dashboard)"] + AQ["agent-queue
multi-agent runner"] + end + + U1 --> P1 & P2 & P3 & P4 + U2 --> AQ + U3 --> HERMES + P1 & P2 & P3 & P4 --> PS + P1 & P2 & P3 & P4 --> MCP + MCP --> LR + ES --> BLOB + PS --> DB & PG & EV + MCP --> DB & PG + AQ --> MCP + HERMES --> PS + HERMES -.observes.-> ES & MCP & LR + + classDef plan fill:#fef3c7,stroke:#d97706 + class PG,LR plan +``` + +> `*` pgvector and the Gremlin graph are the planned hardening (see `04-enhancement-roadmap.md`). +> Everything else is a real, deployed component of the ecosystem. + +**How to narrate it:** *"The platform-service is my policy/identity plane, the mcp-server +is my tool-boundary plane, llm-router is my model plane, and the data sources are governed +behind both. Any product surface is just a thin agentic UI over that fabric — which is +exactly the shape of an enterprise agentic-RAG platform."* + +--- + +## 2. The reference agentic-RAG container view + +This is the canonical picture the interviewer wants to see — drawn in *our* components. + +```mermaid +flowchart LR + Q[User query] --> ROUTER + + subgraph Orchestration["Agentic Orchestration (LangGraph-shaped)"] + ROUTER{{"Router / planner agent
intent + complexity"}} + RETR["Retriever agent"] + GRADE{{"Relevance grader
(CRAG gate)"}} + REWRITE["Query rewriter
(HyDE)"] + GEN["Generator agent
+ citation enforcer"] + CRITIC{{"Self-RAG critic
groundedness check"}} + end + + subgraph Retrieval["Hybrid Retrieval Fabric"] + VEC[("Vector
pgvector / Azure AI Search")] + BM25[("Lexical
BM25")] + GRAPH[("Graph traversal
Cosmos Gremlin")] + SQL[("Structured
schema-aware SQL tool")] + RERANK["Cross-encoder rerank
+ context compression"] + end + + subgraph Gov["Governance plane (every hop)"] + ACL["Access-controlled retrieval
auth + row/col masking"] + AUDIT["event-store audit trail"] + KILL["kill-switch / flags"] + end + + ROUTER --> RETR + RETR --> VEC & BM25 & GRAPH & SQL + VEC & BM25 & GRAPH & SQL --> RERANK + RERANK --> GRADE + GRADE -- "low relevance" --> REWRITE --> RETR + GRADE -- "ok" --> GEN + GEN --> CRITIC + CRITIC -- "ungrounded" --> REWRITE + CRITIC -- "grounded + cited" --> A[Answer + citations] + + RETR -.enforced by.-> ACL + GEN -.logged to.-> AUDIT + ROUTER -.gated by.-> KILL +``` + +**Key talking points keyed to the JD:** +- *Hybrid search (vector + BM25 + graph)* → the four parallel retrievers fan out and the reranker fans in. +- *Reranking + context compression* → the `RERANK` node (a cross-encoder such as a bge-reranker, or a ColBERT-style late-interaction scorer). +- *CRAG* → the `GRADE` gate that triggers corrective re-retrieval. +- *HyDE* → the `REWRITE` node generating a hypothetical answer to embed. +- *Self-RAG* → the `CRITIC` node reflecting on groundedness before release. +- *Access-controlled retrieval / Zero Trust / audit* → the governance plane wraps **every** hop, not just the entrance. + +--- + +## 3. Multi-agent orchestration topology (we run a real one) + +`agent-queue/` is a production folder-kanban that drives **three different agent engines** +(`claude`, `codex`, `devin`) through an explicit state machine. That *is* multi-agent +orchestration — and it's the strongest "I've shipped agents" story you have. 
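If you want to rehearse the state machine in code rather than on a whiteboard, a minimal Python sketch could look like the following. It is an illustration only, not the real `agent-queue` code: the names `Task`, `advance`, `ALLOWED`, and `ENGINES` are hypothetical, and the transitions are taken from the inbox/doing/done/failed kanban this section describes.

```python
from dataclasses import dataclass, field

# Allowed transitions, following the inbox/doing/done/failed kanban of this section.
ALLOWED = {
    "inbox": {"doing"},
    "doing": {"done", "failed"},
    "failed": {"inbox"},  # requeue after human triage
    "done": set(),        # terminal state
}
ENGINES = {"claude", "codex", "devin"}


@dataclass
class Task:
    """Hypothetical in-memory stand-in for one prompt file in the queue."""
    name: str
    engine: str
    state: str = "inbox"
    history: list = field(default_factory=lambda: ["inbox"])

    def __post_init__(self):
        if self.engine not in ENGINES:
            raise ValueError(f"unknown engine: {self.engine}")


def advance(task: Task, new_state: str) -> Task:
    """Move a task to new_state, refusing any transition the kanban does not allow."""
    if new_state not in ALLOWED[task.state]:
        raise ValueError(f"illegal transition {task.state} -> {new_state}")
    task.state = new_state
    task.history.append(new_state)
    return task


if __name__ == "__main__":
    task = Task("fix-flaky-test.md", engine="codex")
    for step in ("doing", "failed", "inbox", "doing", "done"):
        advance(task, step)
    print(task.history)
```

The point to make in the room is that the transition table is the whole design: the engine is just a field on the task, and the `failed` to `inbox` entry is the human-in-the-loop cycle.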
+ +```mermaid +stateDiagram-v2 + [*] --> inbox: drop prompt .md + inbox --> doing: runner claims (auto-approve) + doing --> done: success + doing --> failed: error / timeout + failed --> inbox: requeue (human-in-loop) + done --> [*] + + note right of doing + Engine selected per task: + claude · codex · devin + = heterogeneous agent pool + end note +``` + +Map this to LangGraph vocabulary in the room: + +| agent-queue concept | LangGraph / agentic equivalent | +|---|---| +| `inbox/doing/done/failed` folders | graph **nodes** / state enum | +| runner claiming + transitioning | **conditional edges** | +| engine flag (claude/codex/devin) | **tool/agent binding** per node | +| `failed → inbox` requeue | **cyclic edge** w/ human-in-the-loop checkpoint | +| live `status`/`watch` | **state checkpointer** + observability | + +> Honest framing: *"I built this deliberately framework-light to stay bash-portable and +> dependency-free. The state model is identical to LangGraph; porting it onto LangGraph's +> `StateGraph` mostly buys me typed state, built-in checkpointing, and the A2A handoff +> contract — which is exactly the enhancement I've scoped."* + +--- + +## 4. MCP server — Zero-Trust tool boundary + +This is your strongest *governance* asset and a direct hit on a Preferred Qualification +("MCP server architecture, tool/resource registration patterns, agentic security threat +modeling"). We run `mcp-server` on :4007 with `packages/mcp-client`. + +```mermaid +flowchart TB + subgraph Agent["Agent (untrusted by default)"] + A[LLM reasoning loop] + end + + subgraph Boundary["mcp-server :4007 — policy enforcement point"] + REG["Tool / resource registry
(declared, typed, versioned)"] + AUTHZ{"AuthZ check
identity + scope + role"} + MASK["Row/column masking
field-encrypt"] + RATE["Rate / cost limits + kill-switch"] + LOG["Audit emit → event-store"] + end + + subgraph Resources["Governed resources"] + T1[Market data tool] + T2[Doc retrieval tool] + T3[Graph query tool] + T4[Text-to-SQL tool
read-only views] + end + + A -- "tool call (intent)" --> REG + REG --> AUTHZ + AUTHZ -- deny --> A + AUTHZ -- allow --> MASK + MASK --> RATE + RATE --> T1 & T2 & T3 & T4 + T1 & T2 & T3 & T4 --> LOG + LOG --> A +``` + +**Threat-model talking points** (say these — they signal seniority): +- **Confused-deputy:** the agent never holds raw credentials; the MCP server exchanges the *user's* scoped identity, so a tool can't be tricked into over-broad reads. +- **Tool-poisoning / prompt injection via retrieved content:** retrieved text is treated as data, never as instructions; the generator is sandboxed from re-invoking tools without re-passing the AuthZ gate. +- **Exfiltration:** column masking + egress logging mean even a successful injection can't surface PII the caller wasn't entitled to. +- **Blast radius:** `kill-switch-client` lets us disable a model or a single tool instantly without redeploying. That is how we would meet SR 11-7's expectation that a model's use can be restricted quickly once a problem is found; the guidance describes the outcome, and the kill-switch is our control for it. + +--- + +## 5. Governance & grounding plane (the part that wins regulated deals) + +```mermaid +flowchart LR + subgraph Ingest["Ingestion governance"] + CLASS["Data classification
(public / internal / PII)"] + EMB["Embedding + metadata tags
tenant · sensitivity · source"] + end + subgraph Query["Query-time governance"] + IDENT["Caller identity + role"] + FILTER["Namespace + ACL filter
(pre-retrieval)"] + RETR2["Retrieve only entitled chunks"] + end + subgraph Answer["Answer governance"] + CITE["Mandatory citation
(source attribution)"] + FAITH["Faithfulness score
(RAGAS / LLM-as-judge)"] + CARD["Model card + decision log"] + end + CLASS --> EMB --> RETR2 + IDENT --> FILTER --> RETR2 --> CITE --> FAITH --> CARD + FAITH -- "below SLA" --> ABSTAIN["Abstain / escalate to human"] +``` + +This single diagram covers four JD bullets at once: **access-controlled retrieval**, +**citation/source attribution**, **faithfulness SLAs**, and **model cards / audit**. +The `ABSTAIN` branch is the line that separates a demo from a regulated system — *"in +banking, a confident wrong answer is a worse outcome than 'I don't know, here's a human.'"* + +--- + +## 6. Multi-tenant / namespace isolation (real concern here already) + +We *already* think in tenants: every product has a `productId`, and Hermes runs **two +isolated instances (Vijay / Bheem)** with separate users, services, and backup repos. That +is the same isolation discipline a vector DB needs. + +```mermaid +flowchart TB + subgraph T_A["Tenant A (productId=invt_trdg)"] + NSA["Vector namespace A"] + GA["Graph partition A"] + SA["SQL schema A (RLS)"] + end + subgraph T_B["Tenant B (productId=notelett)"] + NSB["Vector namespace B"] + GB["Graph partition B"] + SB["SQL schema B (RLS)"] + end + POLICY["platform-service
tenant resolver + auth"] --> NSA & NSB & GA & GB & SA & SB +``` + +> *"Namespace isolation isn't a vector-DB feature I'd discover late — it's how the whole +> platform is partitioned. Pinecone namespaces / Azure AI Search index-per-tenant / +> pgvector schema-per-tenant are just the storage expression of a `productId` model I +> already run."* + +--- + +## Cheat-sheet: which diagram answers which question + +| If they ask… | Draw | +|---|---| +| "Walk me through your RAG architecture" | §2 container view | +| "How do you orchestrate multiple agents?" | §3 state machine | +| "How is this secure / Zero Trust?" | §4 MCP boundary | +| "How do you prevent hallucination in production?" | §5 governance plane (faithfulness score + ABSTAIN), plus the §2 CRITIC node | +| "How do you handle multi-tenancy at scale?" | §6 isolation | +| "What does your whole platform look like?" | §1 context | diff --git a/docs/INTERVIEW/02-competency-deepdives.md b/docs/INTERVIEW/02-competency-deepdives.md new file mode 100644 index 0000000..3da773f --- /dev/null +++ b/docs/INTERVIEW/02-competency-deepdives.md @@ -0,0 +1,224 @@ +# 02 · Competency Deep-Dives + +One section per competency-matrix row. Each gives you: **the concept** (so you sound +fluent), **our anchor** (so it's credible), **say-this** talking points, and the **honest +edge** (where to pivot to roadmap rather than overclaim). + +--- + +## A. Agentic Frameworks — LangGraph / LangChain (must) · ADK / A2A / AutoGen (nice) + +**Concept.** LangGraph models an agent as a **state graph**: typed shared state, nodes +(LLM calls / tools), and conditional + cyclic edges, with a checkpointer for durability +and human-in-the-loop. LangChain supplies the primitives (tool binding, structured output, +retrievers). ADK is Google's agent SDK; **A2A** is an open agent-to-agent interop protocol +(agent cards, task lifecycle, message/artifact exchange). AutoGen favors conversational +multi-agent loops. 
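To make "typed state, nodes, conditional and cyclic edges, checkpointer" concrete, here is a hand-rolled Python sketch of the same shape. It deliberately does not use the LangGraph library or imitate its API; every name (`RagState`, `NODES`, `EDGES`, `run`) is hypothetical, the retriever is a stub that only succeeds after one rewrite, and the checkpointer is just an in-memory list.

```python
from typing import Callable, Dict, List, Optional, Tuple, TypedDict


class RagState(TypedDict):
    """Typed shared state passed from node to node."""
    question: str
    docs: List[str]
    attempts: int
    answer: str


def retrieve(state: RagState) -> RagState:
    # Stub retriever: empty on the first attempt, one document after a rewrite.
    docs = ["policy clause 4.2"] if state["attempts"] >= 1 else []
    return {**state, "docs": docs, "attempts": state["attempts"] + 1}


def rewrite(state: RagState) -> RagState:
    return {**state, "question": state["question"] + " (rewritten)"}


def generate(state: RagState) -> RagState:
    return {**state, "answer": f"Grounded in: {state['docs'][0]}"}


def abstain(state: RagState) -> RagState:
    return {**state, "answer": "I don't know; escalating to a human."}


def after_retrieve(state: RagState) -> Optional[str]:
    """Conditional edge: generate when there is evidence, else retry or give up."""
    if state["docs"]:
        return "generate"
    return "rewrite" if state["attempts"] < 3 else "abstain"


NODES: Dict[str, Callable[[RagState], RagState]] = {
    "retrieve": retrieve, "rewrite": rewrite, "generate": generate, "abstain": abstain,
}
# Each edge picks the next node from the current state; None ends the run.
EDGES: Dict[str, Callable[[RagState], Optional[str]]] = {
    "retrieve": after_retrieve,
    "rewrite": lambda s: "retrieve",  # cyclic edge back to the retriever
    "generate": lambda s: None,
    "abstain": lambda s: None,
}


def run(state: RagState, start: str = "retrieve") -> Tuple[RagState, list]:
    checkpoints = []  # in-memory stand-in for a durable checkpointer
    node: Optional[str] = start
    while node is not None:
        state = NODES[node](state)
        checkpoints.append((node, dict(state)))
        node = EDGES[node](state)
    return state, checkpoints


if __name__ == "__main__":
    final, trail = run({"question": "What is the fee cap?", "docs": [], "attempts": 0, "answer": ""})
    print([name for name, _ in trail], final["answer"])
```

The attempt counter in `after_retrieve` is what keeps the cycle bounded; in a real graph the same role is played by a retry limit held in state.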
+ +**Our anchor.** `agent-queue/` is a *real* multi-engine orchestrator (claude·codex·devin) +with an explicit `inbox→doing→done/failed` state machine and requeue cycle — see +`01-ecosystem-rag-fabric.md §3`. `packages/mcp-client` + `mcp-server` provide tool binding; +`packages/llm-router` is the model-selection layer a node would call. + +**Say this.** +- "LangGraph's value over a raw loop is **typed state + conditional/cyclic edges + checkpointing** — which is exactly the trio I'd want for a CRAG/Self-RAG loop that re-retrieves." +- "I treat **routing** as a first-class node: a cheap classifier decides single-shot vs. multi-hop vs. tool-only, so I'm not paying for a full agent loop on a FAQ." +- "For A2A I'd expose each of our product agents with an **agent card** (capabilities, auth, cost) so a supervisor agent can delegate — that's the natural evolution of agent-queue's engine flag." + +**Honest edge.** "My shipped orchestration is framework-light by choice (bash-portable). +The LangGraph port is scoped (`04 §A`); conceptually it's a 1:1 mapping, not new ground." + +--- + +## B. RAG Architecture — hybrid retrieval, reranking, HyDE, Self-RAG (must) · RAPTOR, multimodal (nice) + +**Concept.** +- **Hybrid retrieval** = dense (vector) ⊕ sparse (BM25) ⊕ optionally graph, fused (RRF — reciprocal rank fusion) then **reranked** by a cross-encoder (scores the (query, passage) pair jointly; far more precise than bi-encoder similarity). **ColBERT** = late-interaction reranking (token-level MaxSim) — accurate and scalable. +- **Context compression** = drop/condense retrieved spans to fit the budget and reduce distraction. +- **HyDE** = embed a *hypothetical answer* the LLM drafts, not the raw question — closes the query/document vocabulary gap. +- **CRAG** = grade retrieved docs; if weak, correct (web/alt-source or rewrite) before generating. 
+- **Self-RAG** = the model emits reflection tokens deciding *whether to retrieve* and *whether its draft is supported*, looping if not. +- **RAPTOR** = recursively cluster + summarize chunks into a tree; retrieve at the abstraction level the query needs (great for "summarize this 200-page filing"). + +**Our anchor.** `packages/extraction` + `extraction-service` (:4005) already turn URLs/docs +into retrievable units. `invt_trdg`'s AI chat is a retrieve-then-reason loop over structured +data today. The hybrid index + rerank + HyDE/CRAG/Self-RAG loop is the headline enhancement +(`04 §B`). + +**Say this.** +- "I default to **hybrid + rerank** because pure-vector misses exact-match terms (a regulatory clause number, a ticker, an account ID) that BM25 nails." +- "HyDE and reranking attack **different** failures: HyDE fixes *recall* (the right passage never made the candidate set), reranking fixes *precision* (the right passage is buried under weaker hits). I tune them independently against context-recall vs. context-precision." +- "RAPTOR earns its cost on **long regulatory corpora** where the answer spans sections; for transactional Q&A it's overkill." + +**Honest edge.** Lead with the *reasoning about when to use each*, which is the architect +signal; the implementations are well-trodden and scoped on our roadmap. + +--- + +## C. Structured Retrieval — Text-to-SQL, schema-aware retrieval (must) · Snowflake Cortex, BigQuery ML (nice) + +**Concept.** Text-to-SQL turns NL into SQL against a known schema. The enterprise risks are +**wrong joins, full-table scans, and data leakage**. Mitigations: restrict to **read-only +semantic views**, inject **schema + few-shot exemplars** (schema-aware retrieval over table/ +column descriptions), validate/parse the SQL before execution, enforce **row-level security**, +and cap cost. Warehouse-native options (Snowflake Cortex, BigQuery ML) push inference next to +governed data. 
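The "validate before execution, read-only views, cap cost" mitigations can be sketched as a pre-execution guard. The Python below is a deliberately naive, regex-based illustration and an assumption throughout: `guard_sql`, the view name `v_positions`, and the 1000-row cap are all hypothetical. A production guard would use a real SQL parser and rely on database-level grants and row-level security rather than string checks.

```python
import re

# Naive keyword and clause patterns; a real guard would parse the SQL properly.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|attach|pragma)\b", re.I)
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.I)
FROM_CLAUSE = re.compile(
    r"\bfrom\b(.*?)(?=\b(?:where|group|order|having|limit|join|on)\b|$)", re.I | re.S)
TRAILING_LIMIT = re.compile(r"\blimit\s+(\d+)\s*$", re.I)


def guard_sql(sql: str, allowed_views: set, max_rows: int = 1000) -> str:
    """Return a row-capped SELECT, or raise ValueError if any check fails."""
    stmt = sql.strip().rstrip(";").strip()
    if ";" in stmt or "--" in stmt or "/*" in stmt:
        raise ValueError("multiple statements and comments are not allowed")
    if not re.match(r"select\b", stmt, re.I):
        raise ValueError("only SELECT statements are allowed")
    if FORBIDDEN.search(stmt):
        raise ValueError("write or DDL keyword found")
    # Conservative: reject comma joins and derived tables instead of parsing them.
    if any("," in clause or "(" in clause for clause in FROM_CLAUSE.findall(stmt)):
        raise ValueError("comma joins and derived tables are not allowed")
    tables = {name.lower() for name in TABLE_REF.findall(stmt)}
    unknown = tables - {view.lower() for view in allowed_views}
    if not tables or unknown:
        raise ValueError(f"query must read only approved views, got {sorted(tables)}")
    match = TRAILING_LIMIT.search(stmt)
    if match:
        # Clamp an existing LIMIT down to the cap.
        capped = min(int(match.group(1)), max_rows)
        return f"{stmt[:match.start()]}LIMIT {capped}"
    if re.search(r"\blimit\b", stmt, re.I):
        raise ValueError("unsupported LIMIT form")
    return f"{stmt} LIMIT {max_rows}"


if __name__ == "__main__":
    print(guard_sql("SELECT symbol, qty FROM v_positions WHERE qty > 0;", {"v_positions"}))
```

The guard errs on the side of rejecting valid queries (for example, any subquery in `FROM`), which is the right bias for model-generated SQL: a refused query is recoverable, a leaked table is not.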
+ +**Our anchor.** `invt_trdg`'s AI chat is **schema-aware tool-calling** — NL maps to typed +operations (get quote, create trade plan, manage watchlist/alerts) over a known domain. That's +the *safe* form of Text-to-SQL: the model picks a vetted, parameterized tool rather than +emitting arbitrary SQL. + +**Say this.** +- "I prefer **typed tool-calling over free Text-to-SQL** wherever the query space is bounded — it's auditable and injection-resistant. I reserve generative SQL for genuine ad-hoc analytics, behind read-only views with RLS." +- "Schema-aware retrieval is itself a RAG problem: I embed table/column docs and retrieve the *relevant* schema slice into the prompt, so the model isn't drowning in a 400-table catalog." + +**Honest edge.** "Free Text-to-SQL against a warehouse I've prototyped more than shipped; +in production I've leaned on the tool-calling pattern because it's safer in regulated data." + +--- + +## D. Unstructured Retrieval — PDF parsing, layout-aware chunking (must) · multimodal (nice) + +**Concept.** Naïve "split every 1000 chars" chunking destroys meaning. **Layout-aware** +parsing (PyMuPDF, Unstructured.io) preserves headings, tables, lists, reading order; **OCR** +(Tesseract/Azure Doc Intelligence) handles scans. **Semantic chunking** splits on topic +boundaries, not byte counts. Tables and figures need special handling (serialize tables to +markdown; caption images). Each chunk carries **provenance metadata** (doc id, page, section) +— mandatory for citations. + +**Our anchor.** `packages/extraction` + `extraction-service` already perform URL/task/doc +extraction; `notelett` models structured notes. Layout-aware PDF + OCR is the additive piece +(`04 §B`). + +**Say this.** +- "Chunking is where most RAG quality is won or lost. 
I chunk on **document structure first, size second**, and I always attach page/section provenance so the answer can cite a clause, not 'a document.'" +- "For regulatory filings I keep **tables intact** and store a text serialization alongside — losing a covenant table to a mid-row split is a correctness bug, not a formatting one." + +--- + +## E. Graph RAG — KG + vector hybrid (must) · SPARQL / ontology design (nice) + +**Concept.** Vector RAG finds *similar* text; it can't answer *"which entities are connected +to X within 2 hops."* Graph RAG retrieves a **subgraph** (entities + relationships) and feeds +it as structured context — ideal for "show the ownership chain," "trace this transaction +counterparty network," KYC/AML link analysis. Patterns: extract entities/relations at ingest, +build a knowledge graph, retrieve by **graph traversal (Gremlin/Cypher) + vector seed**, then +generate over the fused context. SPARQL/ontologies add formal semantics (RDF, classes). + +**Our anchor.** We run **Azure Cosmos DB** (`packages/cosmos`), which exposes the **Gremlin** +graph API — the JD lists "Azure Cosmos Gremlin" explicitly. `event-store`/`events` already +model entity relationships over time. Standing up a Gremlin KG + graph-augmented retrieval is +`04 §D`. + +**Say this.** +- "I use the graph for the **'connected-to' questions** vector search can't answer, and seed traversals from vector hits — vector finds the entry node, the graph supplies the neighborhood." +- "In banking this is the **AML/KYC** sweet spot: link analysis across counterparties is inherently graph-shaped." + +**Honest edge.** "Cosmos Gremlin is in our stack; the KG is a clean build on infrastructure +we already operate, not a new platform bet." + +--- + +## F. 
Vector Databases — Pinecone / Weaviate / Azure AI Search (must) · Qdrant, pgvector, multi-tenancy (nice) + +**Concept.** Trade-offs: **Pinecone** (managed, serverless, fast to ship), **Weaviate/Qdrant** +(open, hybrid + filtering, self-host control), **Azure AI Search** (managed hybrid: vector + +BM25 + semantic rerank in one service — the natural Azure pick), **pgvector** (lives in your +Postgres → transactional consistency, one backup story, lowest ops). **Multi-tenancy** = +namespaces (Pinecone) / index-per-tenant (Azure) / collection or payload filter (Qdrant) / +schema or row filter (pgvector). Metadata filtering is as important as the ANN index. + +**Our anchor.** Postgres is in the stack → **pgvector** is our lowest-friction path and gives +transactional consistency with the source rows. Multi-tenancy is already first-class +(`productId`, two-instance Hermes) — see `01 §6`. + +**Say this.** +- "On Azure I'd default to **Azure AI Search** because it gives me hybrid + semantic reranking as a managed service — fewer moving parts in a regulated audit boundary." +- "For tight transactional coupling (the vectors must stay consistent with the source-of-truth rows) I reach for **pgvector** — one database, one backup, one ACL model." +- "I pick the vector DB **last**, after I know tenant count, filter cardinality, recall target, and the audit boundary — it's a consequence of requirements, not a religion." + +--- + +## G. Grounding & Eval — RAGAS, TruLens, faithfulness SLAs (must) · LangSmith, LLM-as-judge (nice) + +**Concept.** The core metrics: +- **Faithfulness / groundedness** — is every claim supported by retrieved context? (anti-hallucination) +- **Answer relevancy** — does the answer address the question? +- **Context precision** — are the *top* retrieved chunks the relevant ones? (reranker quality) +- **Context recall** — did we retrieve *all* needed evidence? (retriever quality) +- **Answer correctness** — vs. ground truth. 
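The metrics in the list above reduce to simple ratios once the labels exist. The Python sketch below uses simplified set-based definitions for illustration; these are not the exact formulas any eval library implements (RAGAS, for instance, weights context precision by rank). The function names are hypothetical, and the per-claim support verdicts are assumed to come from a human or LLM judge upstream.

```python
from typing import List, Set


def context_precision(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k retrieved chunk ids that are relevant (reranker quality)."""
    top = retrieved[:k]
    if not top:
        return 0.0
    return sum(1 for chunk_id in top if chunk_id in relevant) / len(top)


def context_recall(retrieved: List[str], relevant: Set[str]) -> float:
    """Share of the needed evidence that was retrieved at all (retriever quality)."""
    if not relevant:
        return 1.0  # nothing was needed, so nothing was missed
    return len(set(retrieved) & relevant) / len(relevant)


def faithfulness(claim_supported: List[bool]) -> float:
    """Share of answer claims judged supported by the retrieved context."""
    if not claim_supported:
        return 0.0  # an answer with no checkable claims should not pass a gate
    return sum(claim_supported) / len(claim_supported)


if __name__ == "__main__":
    retrieved = ["c1", "c7", "c3", "c9"]
    relevant = {"c1", "c3", "c4"}
    print(context_precision(retrieved, relevant, k=2))
    print(context_recall(retrieved, relevant))
    print(faithfulness([True, True, False, True]))
```

Keeping the three numbers separate matters when you debug: low recall points at the retriever, low precision at the reranker, and low faithfulness at the generator.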
+ +**RAGAS** computes these (often LLM-as-judge). **TruLens** adds the "RAG triad" + feedback +functions + tracing. **DeepEval** is pytest-style assertions for CI. **LangSmith** = tracing/ +eval ops. An **SLA** turns metrics into gates ("faithfulness ≥ 0.9 or abstain"). + +**Our anchor.** `flowmonk` is a *grounding pattern made flesh*: the deterministic scheduler is +the source of truth, and **the AI layer is constrained to explanation / safe recommendation** — +it cannot invent a plan, only narrate one. That's hallucination mitigation by architecture. +`diagnostics-client` / `telemetry-client` / `monitoring` + Hermes dashboards are the +ready-made home for an eval harness + drift monitor (`04 §E`). + +**Say this.** +- "I don't ship grounding as a vibe — I ship it as **gates**: a faithfulness threshold that triggers abstain-and-escalate, evaluated continuously, with regressions blocking deploy in CI (DeepEval)." +- "flowmonk taught me the cheapest hallucination fix: **don't let the LLM be the source of truth.** Make a deterministic engine authoritative and scope the model to explanation. In banking that's the difference between a tool and a liability." +- "Eval is a **two-loop** system: offline (golden set in CI) catches regressions pre-deploy; online (sampled production traces + LLM-judge) catches **factual drift** as corpora and models change." + +--- + +## H. Cloud Platform — Azure (AI Foundry / OpenAI / Search) (must) · AWS Bedrock, GCP Vertex (nice) + +**Concept.** Azure AI Foundry = the build/eval/deploy hub; Azure OpenAI = governed model +endpoints with content filtering and data-residency; Azure AI Search = managed hybrid index; +Cosmos DB = docs + Gremlin graph. The architect skill is **provider-portability**: abstract the +model behind a router so Azure OpenAI / Bedrock / Vertex are swappable, and keep data in the +customer's tenancy. + +**Our anchor.** Azure Cosmos DB in production (`_AZURE/`, `packages/cosmos`). 
+`packages/llm-router` is the provider-abstraction layer — Azure OpenAI today, Bedrock/Vertex +swap-in, no app rewrite. `packages/ollama-client` covers on-prem/air-gapped inference. + +**Say this.** +- "I keep a **router seam** so model choice is a config decision, not an architecture decision — and so a bank can pin everything to its own Azure tenant for residency." +- "On-prem / air-gapped is real in banking; `ollama-client` in our stack means I've thought about the **no-egress** deployment, not just the SaaS path." + +--- + +## I. AI Governance — access-controlled RAG, Zero Trust (must) · SR 11-7, EU AI Act (nice) + +**Concept.** Governance is **structural**, applied at every hop (see `01 §4–5`): +access-controlled retrieval (you can only retrieve what your identity entitles you to), +row/column masking, role-aware context injection, immutable audit, instant kill-switch, +model cards, RACI. Zero Trust for agents = never trust the agent's request implicitly; verify +identity + scope at the tool boundary every call. + +**Our anchor.** `packages/auth` + `fastify-auth` (identity/scope), `field-encrypt` / +`client-encrypt` (column/field masking), `feature-flag-client` + `kill-switch-client` (instant +constraint), `event-store` (immutable audit), MCP server (explicit tool boundary). This is the +**deepest, most defensible** part of the story. + +**Say this.** +- "I design retrieval so that **masking and audit are inescapable** — they live at the platform/MCP layer, so no product surface can route around them. That's what 'governance by architecture' means." +- "A kill-switch that disables a model or single tool **without a redeploy** is how I'd evidence SR 11-7's expectation that a model's use can be restricted quickly once a problem is found. The guidance describes the outcome; the kill-switch is my control for it." + +(Full regulatory mapping in `05-banking-blueprints.md`.) + +--- + +## J. 
Domain: Banking — support / compliance automation (must) · model risk, KYC/AML (nice) + +**Concept.** The flagship use cases: **customer-support automation** (grounded answers from +policy/product docs with citations + escalation), **compliance-document retrieval** (find the +controlling clause across filings/policies), **regulatory reporting**, **model risk management** +(SR 11-7), **KYC/AML** (entity/network link analysis → graph RAG). + +**Our anchor.** `invt_trdg` is our **regulated-industry analog**: market data, trade plans, +alerts, profiles — a domain where wrong answers have consequences and auditability matters. The +patterns (typed tool-calling, audit, abstain-on-uncertainty) port directly to a bank. + +**Say this.** +- "My trading product is the closest non-bank analog to a banking workload: it taught me to **default to abstain over guess** when money or compliance is on the line, and to make every answer traceable to a source." +- "KYC/AML is where my graph-RAG and my governance stories converge — link analysis on a knowledge graph, behind access-controlled, audited retrieval." + +**Honest edge.** "I haven't shipped inside a chartered bank; my regulated-domain reps come from +trading and from designing for SR 11-7 / EU AI Act controls — which I can walk through concretely." diff --git a/docs/INTERVIEW/03-star-interview-bank.md b/docs/INTERVIEW/03-star-interview-bank.md new file mode 100644 index 0000000..c74e625 --- /dev/null +++ b/docs/INTERVIEW/03-star-interview-bank.md @@ -0,0 +1,143 @@ +# 03 · STAR Interview Bank + +Twelve stories, each grounded in real ByteLyst work, in **Situation · Task · Action · +Result** form, tagged to the JD competency they prove. Keep delivery to ~90 seconds; the +**bold** line is your headline if you only get 20 seconds. + +> Integrity note: these describe real systems in this ecosystem (agent-queue, mcp-server, +> llm-router, invt_trdg AI chat, flowmonk, Hermes, extraction-service, two-instance +> isolation). 
Where a story references planned work, it's labeled — present those as +> *design decisions and roadmaps you own*, not as shipped-and-measured outcomes. + +--- + +## 1. Multi-agent orchestration without a heavy framework +**Proves:** Agentic frameworks · orchestration topology · state-machine design + +- **S** — We needed to run long-horizon coding tasks across three different agent engines (claude, codex, devin) unattended, but couldn't take a heavy runtime dependency on the operator VM. +- **T** — Build a reliable multi-agent runner with explicit state, failure handling, and observability — portable down to bash 3.2. +- **A** — Designed `agent-queue` as a **folder-kanban state machine**: `inbox→doing→done/failed` with a `failed→inbox` requeue for human-in-the-loop, an engine flag binding each task to an agent, and `status`/`watch`/`logs` for live observability. The state model maps 1:1 to LangGraph nodes/conditional edges. +- **R** — Tasks run auto-approve, survive failures via requeue, and the kanban gives at-a-glance state. **The lesson I carry: orchestration is a state-machine problem first and a framework choice second — which is exactly why porting it onto LangGraph is low-risk.** + +--- + +## 2. A Zero-Trust tool boundary for agents (MCP) +**Proves:** MCP architecture · Zero Trust · agentic threat modeling · access-controlled retrieval + +- **S** — Multiple product agents needed access to sensitive tools (market data, document retrieval) but I refused to hand agents raw credentials or unbounded data access. +- **T** — Make the tool layer a **policy enforcement point**, not a passthrough. +- **A** — Centralized tools behind `mcp-server` (:4007) with `mcp-client`: a typed/versioned tool registry, an authZ check on **every** call (identity + scope + role), column masking via `field-encrypt`, rate/cost caps with a `kill-switch`, and an audit emit to `event-store`. Threat-modeled confused-deputy, tool-poisoning via retrieved content, and exfiltration. 
+- **R** — Agents hold no secrets; a successful prompt injection still can't exfiltrate unentitled fields, and any tool can be killed live without a redeploy. **Governance lives in the boundary, so no product surface can route around it.** + +--- + +## 3. Grounding by architecture, not by prompt (flowmonk) +**Proves:** Grounding · hallucination mitigation · faithfulness + +- **S** — Users wanted an AI planning assistant, but an LLM inventing a "plan" that violates constraints is worse than no assistant. +- **T** — Deliver helpful AI without letting the model be the source of truth. +- **A** — Made a **deterministic scheduler authoritative** and **constrained the AI layer to explanation, summarization, and safe recommendation only**. The model narrates and suggests; it can never author the canonical plan. Recommendations carry an audit trail. +- **R** — The assistant is helpful *and* can't hallucinate an invalid plan into existence. **This is the cheapest, most reliable hallucination fix I know — and it's the pattern I'd bring to any regulated workflow: scope the model to where being wrong is recoverable.** + +--- + +## 4. Schema-aware tool-calling instead of free Text-to-SQL +**Proves:** Structured retrieval · Text-to-SQL judgment · safety + +- **S** — `invt_trdg` users wanted natural-language access to quotes, trade plans, watchlists, alerts, and goals. +- **T** — Give NL access to structured data without the injection/runaway-query risk of free Text-to-SQL. +- **A** — Built the AI chat as **typed, parameterized tool-calling** over a known domain: the model selects a vetted operation, not arbitrary SQL. Hybrid asset-class detection (crypto vs. equity) routes to the right tool. +- **R** — Natural-language coverage of the whole product, fully auditable, with no arbitrary query surface. **I reserve generative SQL for genuine ad-hoc analytics behind read-only views with row-level security — bounded domains get tool-calling.** + +--- + +## 5. 
Provider-portable model layer (llm-router) +**Proves:** Cloud platform · Azure/Bedrock/Vertex portability · cost/latency routing + +- **S** — Hard-coding one model provider risked lock-in, blocked data-residency requirements, and made cost/latency tuning a code change. +- **T** — Make model choice a config decision. +- **A** — Introduced `packages/llm-router` as a provider-abstraction seam (Azure OpenAI primary; Bedrock/Vertex swap-in) with `ollama-client` for on-prem/air-gapped inference. +- **R** — A new model or provider is a config change, not a rewrite, and a regulated customer can pin inference to their own tenant. **Portability is a governance feature, not just an engineering nicety — it's how you satisfy data-residency without re-architecting.** + +--- + +## 6. Multi-tenant isolation as a platform default +**Proves:** Vector DB multi-tenancy · namespace isolation · governance + +- **S** — Several products share one platform; a cross-tenant data leak would be catastrophic. +- **T** — Make isolation structural, not per-feature discipline. +- **A** — Every product carries a `productId`; Hermes runs **two fully isolated instances (Vijay/Bheem)** with separate users, services, and backup repos. The same model maps directly to vector namespaces / index-per-tenant / pgvector schema-per-tenant. +- **R** — Isolation is the default the whole platform is partitioned by. **When I add a vector store, multi-tenancy isn't a migration — it's the storage expression of a tenant model I already enforce.** + +--- + +## 7. Unstructured ingestion pipeline (extraction-service) +**Proves:** Unstructured retrieval · ingestion · provenance + +- **S** — Agents needed to answer from external documents and URLs, not just structured data. +- **T** — Turn messy unstructured sources into clean, retrievable, attributable units. 
+- **A** — Built `extraction-service` (:4005) + `packages/extraction` to parse URLs/docs into retrievable units; `notelett` provides a structured-notes store for human+agent content.
+- **R** — A working ingestion path into the fabric. **The roadmap (layout-aware PDF chunking, OCR, table preservation, page-level provenance) is additive on this spine — and provenance is non-negotiable because every answer must cite a clause, not 'a document.'**
+
+---
+
+## 8. Operational observability for AI systems (Hermes)
+**Proves:** Eval-harness home · drift monitoring · production ops
+
+- **S** — Running agentic services in production with no single pane meant blind spots.
+- **T** — One control plane for the agentic fabric.
+- **A** — Built **Hermes Mission Control** (Next.js + Fastify) with `diagnostics-client`/`telemetry-client`/`monitoring`; the `hermes-ops` module already models both instances as the seed for real data.
+- **R** — A live ops console for the ecosystem. **It's the natural home for the eval harness: a faithfulness/relevancy/recall pane plus a factual-drift monitor turns it from infra-ops into AI-quality-ops — which is the v2 roadmap I own.**
+
+---
+
+## 9. Instant blast-radius control (kill-switch + flags)
+**Proves:** Governance · Zero Trust · SR 11-7 ("constrain a model in production")
+
+- **S** — A misbehaving model or tool in production needs to be stoppable in seconds, not a deploy cycle.
+- **T** — Decouple "turn this off" from "ship a release."
+- **A** — Adopted `feature-flag-client` + `kill-switch-client` so any model or individual tool can be disabled live; combined with `event-store` audit so the action is logged.
+- **R** — Sub-minute containment without a redeploy. **This is a literal SR 11-7 control: model risk management requires the ability to immediately constrain a model in production, with an audit trail of who constrained it and when.**
+
+---
+
+## 10. Disaster recovery + parity discipline
+**Proves:** Production rigor · regulated-grade operations
+
+- **S** — Two Hermes instances existed, but only one had a tested backup/restore path; the second was an operational blind spot.
+- **T** — Drive both to parity with persistent backup, watchdog, and **tested** restore.
+- **A** — Documented the gap explicitly in the v2 roadmap (`hermes_dashboard_v2_roadmap.md`) and the DR doc, prioritizing the missing backup repo/watchdog/restore for the second instance.
+- **R** — A named, prioritized closure plan. **In regulated environments 'we have backups' is not a control until restore is *tested*; I treat untested DR as an open finding, not a checkbox.**
+
+---
+
+## 11. Bounded autonomy with human-in-the-loop
+**Proves:** Agentic safety · orchestration · abstain-and-escalate
+
+- **S** — Autonomous agents that never escalate will confidently do the wrong thing.
+- **T** — Build escalation into the topology.
+- **A** — In `agent-queue`, `failed` routes back to `inbox` for human triage rather than silently retrying forever; in the RAG design, a sub-SLA faithfulness score routes to **abstain/escalate** (see `01 §5`).
+- **R** — The system degrades to a human instead of degrading to a hallucination. **The escalation edge is the most important edge in the graph for a regulated deployment.**
+
+---
+
+## 12. Documentation & decision rigor as an architect
+**Proves:** ADRs · blueprints · roadmaps · mentoring / CoE contribution
+
+- **S** — A multi-product ecosystem with multiple agent engines drifts without written decisions.
+- **T** — Make architecture legible to engineers and execs.
+- **A** — Maintained an ADR directory, roadmaps (`hermes_*_roadmap.md`, `deployment-optimization-roadmap.md`), a repo map, and agent-facing `AGENTS.md`/`CLAUDE.md` so both humans and coding agents navigate consistently — and authored this very interview/architecture kit as a reusable accelerator.
+- **R** — New contributors (human or agent) onboard from canonical docs. **This is exactly the 'AI Center of Excellence / reusable accelerators' contribution the role asks for — I default to writing the pattern down so it scales past me.**
+
+---
+
+## Behavioral / leadership prompts — quick frames
+
+| Prompt | Lead with |
+|---|---|
+| "Tell me about a time you influenced without authority." | #12 docs/ADRs driving multi-agent consistency. |
+| "A production AI system gave a wrong answer. What did you do?" | #3 grounding-by-architecture + #11 abstain/escalate + #9 kill-switch. |
+| "How do you handle disagreement on architecture?" | ADR process — capture options, trade-offs, decision, and revisit date; disagree-and-commit in writing. |
+| "Describe mentoring junior engineers." | The `AGENTS.md`/repo-map pattern: I encode the 'how we work here' so it's teachable, then pair on the first real task. |
+| "Biggest technical mistake?" | Untested DR on the second Hermes instance (#10) — I'd treated 'backups exist' as 'DR works'; now I gate on a tested restore. |
+| "Why this role / why financial services?" | Trading product taught me to engineer for *consequences*; FS is where governance-by-architecture matters most and where my MCP/Zero-Trust depth pays off. |
diff --git a/docs/INTERVIEW/04-enhancement-roadmap.md b/docs/INTERVIEW/04-enhancement-roadmap.md
new file mode 100644
index 0000000..7755d19
--- /dev/null
+++ b/docs/INTERVIEW/04-enhancement-roadmap.md
@@ -0,0 +1,191 @@
+# 04 · Enhancement Roadmap — make every claim literally true
+
+This is the "what would you build here" answer, and it doubles as a real backlog. Each
+enhancement turns an *adjacent* capability into a *shipped* one on infrastructure we
+already run. They're ordered so each builds on the last; the whole set is a credible
+"agentic-RAG fabric, hardened" program.
+
+> Mapping note: these slot into the existing repo conventions — new code under
+> `learning_ai_common_plat/packages` + a `services/rag-service`, eval harness surfaced in
+> `learning_ai_devops_tools/dashboard` (Hermes), and ADRs under
+> `learning_ai_devops_tools/docs/adr/`. Cut tracker items via `scripts/tracker-seed/`.
+
+```mermaid
+flowchart LR
+    A["§A LangGraph port<br/>+ A2A agent cards"] --> B["§B Hybrid retrieval<br/>pgvector+BM25+rerank+HyDE/CRAG/Self-RAG"]
+    B --> C["§C Guarded Text-to-SQL<br/>read-only views + RLS"]
+    B --> D["§D Cosmos Gremlin<br/>knowledge graph + Graph RAG"]
+    B --> E["§E RAGAS/DeepEval harness<br/>+ drift monitor in Hermes"]
+    C & D --> F["§F Model-card registry<br/>+ governance pack"]
+    E --> F
+    classDef p1 fill:#dcfce7,stroke:#16a34a
+    classDef p2 fill:#fef9c3,stroke:#ca8a04
+    classDef p3 fill:#fee2e2,stroke:#dc2626
+    class A,B p1
+    class C,D,E p2
+    class F p3
+```
+
+| Phase | Enhancements | Why now |
+|---|---|---|
+| **P1 (foundation)** | §A, §B | Orchestration + retrieval are the spine; everything else attaches to them. |
+| **P2 (sources + quality)** | §C, §D, §E | Add structured + graph sources and the eval loop that proves quality. |
+| **P3 (governance)** | §F | Wrap the now-real fabric in the regulated-grade governance story. |
+
+---
+
+## §A — Port `agent-queue` topology onto LangGraph + add A2A handoff
+
+**Goal:** make the "prod-grade LangGraph" claim literal while keeping the proven state model.
+
+- New `packages/agent-graph`: a typed `StateGraph` with nodes `route → retrieve → grade → (rewrite) → generate → critique`, conditional + cyclic edges, and a checkpointer backed by `event-store`.
+- Keep `agent-queue`'s engine-selection idea as **node-level model binding** through `llm-router`.
+- Expose each product agent with an **A2A agent card** (capabilities, auth scope, cost hints) so a supervisor agent can delegate; the card is served by `mcp-server`.
+
+```mermaid
+stateDiagram-v2
+    [*] --> route
+    route --> retrieve: needs evidence
+    route --> generate: parametric/FAQ
+    retrieve --> grade
+    grade --> rewrite: low relevance (CRAG)
+    rewrite --> retrieve
+    grade --> generate: ok
+    generate --> critique
+    critique --> rewrite: ungrounded (Self-RAG)
+    critique --> [*]: grounded + cited
+```
+
+**Acceptance:** a LangGraph run with a forced low-relevance retrieval demonstrably loops
+through `rewrite`; checkpoints land in `event-store`; one product reachable via A2A card.
+**Effort:** M. **Risk:** low (mapping is 1:1 with today's state machine).
+
+---
+
+## §B — Hybrid retrieval: pgvector + BM25 + rerank + HyDE / CRAG / Self-RAG
+
+**Goal:** turn "I understand hybrid RAG" into a running `services/rag-service`.
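
One piece of this service that is easy to pin down is the rank-fusion step. The sketch below is a minimal illustration of reciprocal rank fusion using the `Σ 1/(k+rank)` form given in `06-glossary-quickref.md`. The function name `rrf_fuse`, the chunk ids, and the constant `k = 60` (a commonly used default) are assumptions made for illustration, not code from `services/rag-service`.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of ids with reciprocal rank fusion.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Only positions are used, so the raw
    vector and BM25 scores never need to be calibrated against each other.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical chunk ids: one ranking from the vector index, one from BM25.
vector_hits = ["c7", "c2", "c9"]
bm25_hits = ["c2", "c4", "c7"]
fused = rrf_fuse([vector_hits, bm25_hits])
```

Chunks that both retrievers return (`c2`, `c7`) rise above chunks found by only one of them, which is the property that makes the fused list a reasonable input to a reranker.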
+
+- **pgvector** alongside the existing Postgres → one DB, one backup, transactional consistency with source rows; **schema-per-tenant** namespaces (mirrors `productId`).
+- **BM25** lexical (Postgres FTS or an OpenSearch sidecar) fused with vector via **RRF**.
+- **Cross-encoder rerank** (bge-reranker or ColBERT late-interaction) on the fused candidates; **context compression** to fit budget.
+- **HyDE** query rewriter node; **CRAG** relevance gate; **Self-RAG** groundedness critic (the §A nodes).
+- **Layout-aware ingestion** in `extraction-service`: PyMuPDF / Unstructured.io, OCR fallback, table preservation, **page/section provenance** on every chunk.
+
+```mermaid
+flowchart LR
+    Q --> HYDE[HyDE rewrite] --> EMB[embed]
+    EMB --> VEC[(pgvector ANN)]
+    Q --> BM[(BM25)]
+    VEC & BM --> RRF[RRF fuse] --> RR[cross-encoder rerank] --> CC[context compress] --> GEN
+```
+
+**Acceptance:** hybrid beats vector-only on a golden set (context-recall ↑, context-precision ↑);
+every chunk carries doc/page/section provenance; abstain fires when reranked top-score < τ.
+**Effort:** L. **Risk:** medium (reranker latency budget — mitigate with rerank-top-k only).
+
+---
+
+## §C — Guarded Text-to-SQL tool
+
+**Goal:** add genuine generative SQL for ad-hoc analytics without the foot-guns.
+
+- Register a `sql-query` tool on `mcp-server` scoped to **read-only semantic views** (no base tables), with **row-level security** by tenant/role.
+- **Schema-aware retrieval:** embed table/column descriptions; retrieve only the relevant schema slice into the prompt (don't dump the catalog).
+- Parse + validate generated SQL (allow-list of statements, forbid cross-schema joins, enforce `LIMIT`); cost-cap and timeout.
+- Audit every generated query + row count to `event-store`.
+
+**Acceptance:** an attempt to read an unentitled column is blocked at the view/RLS layer
+and logged; a malformed/oversized query is rejected pre-execution.
+**Effort:** M. **Risk:** medium (this is the highest-leakage surface — keep it behind views).
+
+---
+
+## §D — Cosmos Gremlin knowledge graph + Graph RAG
+
+**Goal:** answer "connected-to" questions (KYC/AML-shaped) on infra we already run.
+
+- Use the existing **Azure Cosmos DB Gremlin** API. Entity/relation extraction at ingest (from `extraction-service` output + structured rows) builds the graph.
+- **Graph-augmented retrieval:** vector hit seeds an entry node → bounded Gremlin traversal returns the subgraph → fuse subgraph + text chunks into context.
+- Expose a `graph-query` tool on `mcp-server` (read-only, depth-bounded).
+
+```mermaid
+flowchart LR
+    Q --> V[(vector seed)] --> N[entry entity]
+    N --> G[(Gremlin traversal<br/>≤2 hops)]
+    G --> SUB[subgraph]
+    SUB --> FUSE[fuse w/ text chunks] --> GEN
+```
+
+**Acceptance:** a 2-hop relationship question that vector-only fails is answered correctly
+with the subgraph cited; traversal depth/time bounded.
+**Effort:** L. **Risk:** medium (graph modeling + traversal cost).
+
+---
+
+## §E — Evaluation harness + factual-drift monitor in Hermes
+
+**Goal:** make "RAGAS / faithfulness SLAs / drift monitoring" real and visible.
+
+- **Offline (CI):** **DeepEval** pytest-style assertions on a golden set — faithfulness, answer-relevancy, context-precision, context-recall, answer-correctness. Regression below threshold **blocks deploy**.
+- **Online:** sample production traces, score with **RAGAS / LLM-as-judge**, emit metrics via `telemetry-client`.
+- **Hermes pane:** a "RAG Quality" panel (extends `hermes-ops`) trending the five metrics per tenant + a **drift alert** when faithfulness/recall degrade week-over-week.
+- Wire **abstain rate** and **escalation rate** as first-class SLAs.
+
+```mermaid
+flowchart TB
+    subgraph CI["Offline / CI (DeepEval)"]
+        G[golden set] --> SC1[score] --> GATE{≥ SLA?}
+        GATE -- no --> BLOCK[block deploy]
+        GATE -- yes --> SHIP[ship]
+    end
+    subgraph PROD["Online (RAGAS / judge)"]
+        TR[sampled traces] --> SC2[score] --> TEL[telemetry-client] --> HERMES[Hermes RAG-Quality pane]
+        HERMES --> DRIFT{drift?} -- yes --> ALERT[alert + open finding]
+    end
+```
+
+**Acceptance:** a deliberately-degraded retriever fails the CI gate; the Hermes pane shows
+the five metrics per tenant and fires a drift alert on a seeded regression.
+**Effort:** M. **Risk:** low-medium (judge cost — sample, don't score 100%).
+
+---
+
+## §F — Model-card registry + governance pack
+
+**Goal:** the regulated-grade documentation/audit layer (SR 11-7 / EU AI Act ready).
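
To make the audit layer concrete, a decision-log entry could be modelled roughly as follows, using the fields this roadmap names for the decision log (query, retrieved sources, model, faithfulness score, outcome). This is a sketch under assumptions: `DecisionRecord`, its field names, and the `fingerprint` helper are hypothetical, and the real `event-store` schema is not shown in this document.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class DecisionRecord:
    """One generation, captured so the answer can be reconstructed later."""

    query: str
    source_ids: List[str]  # provenance ids of the retrieved chunks
    model: str             # model or deployment name used for this answer
    faithfulness: float    # score from the critic / eval step
    outcome: str           # "answer" or "abstain"

    def to_json(self) -> str:
        # sort_keys keeps the serialization stable across runs.
        return json.dumps(asdict(self), sort_keys=True)

    def fingerprint(self) -> str:
        # A content hash lets an auditor detect after-the-fact edits.
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()


# Hypothetical values, for illustration only.
record = DecisionRecord(
    query="KYC record retention period?",
    source_ids=["policy-12#p4"],
    model="example-model",
    faithfulness=0.93,
    outcome="answer",
)
```

Storing the fingerprint next to the serialized record is one simple way to support the "any answer can be reconstructed" acceptance criterion; a production design would also need to decide where prompts and chunk text themselves are retained.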
+
+- **Model-card registry** (a `governance` package + Hermes pane): per deployed model/agent — purpose, data sources, eval scores, known limits, owner, last-reviewed date, kill-switch link.
+- **Decision log:** every generation's (query, retrieved sources, model, faithfulness score, abstain/answer) to `event-store` → reproducible audit trail.
+- **RACI doc** template per engagement; **ADR** set under `docs/adr/` for each architectural choice.
+- Map controls to **SR 11-7** (model inventory, validation, monitoring, change control) and **EU AI Act** (risk classification, logging, human oversight, transparency) — see `05-banking-blueprints.md`.
+
+**Acceptance:** every production model has a card with current eval scores + owner; any
+answer can be reconstructed from the decision log; controls trace to named regulatory clauses.
+**Effort:** M. **Risk:** low (mostly assembly over existing `event-store`/flags/auth).
+
+---
+
+## Sequencing & "what I'd do in the first 90 days" (great closing answer)
+
+```mermaid
+gantt
+    title Agentic-RAG hardening — 90-day view
+    dateFormat X
+    axisFormat %s
+    section Foundation
+    §A LangGraph + A2A :a, 0, 3
+    §B Hybrid retrieval :b, 1, 5
+    section Sources & Quality
+    §C Guarded Text-to-SQL :c, 5, 3
+    §D Graph RAG (Gremlin) :d, 5, 4
+    §E Eval harness + drift :e, 4, 4
+    section Governance
+    §F Model cards + RACI :f, 8, 3
+```
+
+> *"In 90 days I'd stand up the retrieval spine and the eval harness first — because you
+> can't tune what you can't measure — then layer structured + graph sources, and close with
+> the governance pack so the whole thing is audit-ready. Notice governance isn't last
+> because it's least important; it's last because by then it's mostly **assembling controls
+> the platform already enforces** (auth, masking, kill-switch, audit) into cards and RACI."*
diff --git a/docs/INTERVIEW/05-banking-blueprints.md b/docs/INTERVIEW/05-banking-blueprints.md
new file mode 100644
index 0000000..d6d7467
--- /dev/null
+++ b/docs/INTERVIEW/05-banking-blueprints.md
@@ -0,0 +1,175 @@
+# 05 · Banking Solution Blueprints (client-ready)
+
+Two end-to-end blueprints you can present to a financial-services client, in the JD's own
+deliverable formats: **solution architecture + ADRs + phased roadmap + regulatory mapping**.
+Both reuse the ByteLyst fabric patterns from `01-ecosystem-rag-fabric.md`.
+
+---
+
+# Blueprint 1 — Compliance Document Retrieval Assistant
+
+**Use case:** compliance analysts ask natural-language questions ("What is our retention
+obligation for KYC records under the latest policy?") and get a **grounded, cited** answer
+drawn from regulatory filings, internal policies, and procedure manuals — or an explicit
+*"insufficient evidence, escalate."*
+
+## Architecture
+
+```mermaid
+flowchart TB
+    AN[👤 Compliance analyst] --> APP[Assistant UI]
+    APP --> ORCH
+
+    subgraph ORCH["Agentic orchestration (LangGraph)"]
+        R{{route}} --> RET[retrieve] --> GR{{CRAG grade}}
+        GR -- weak --> RW[HyDE rewrite] --> RET
+        GR -- ok --> GEN[generate + cite] --> CR{{Self-RAG critic}}
+        CR -- ungrounded --> RW
+        CR -- grounded --> OUT[answer + clause citations]
+        CR -- no evidence --> ESC[escalate to human]
+    end
+
+    subgraph RETR["Hybrid retrieval"]
+        VEC[(Azure AI Search<br/>vector + BM25 + semantic rerank)]
+        KG[(Cosmos Gremlin<br/>policy ⇄ regulation graph)]
+    end
+    RET --> VEC & KG
+
+    subgraph GOV["Governance plane"]
+        ACL[role-aware ACL filter]
+        AUD[event-store audit]
+        CARD[model card + decision log]
+    end
+    RET -.-> ACL
+    GEN -.-> AUD
+    OUT -.-> CARD
+```
+
+**Why these choices (headline ADRs below):** Azure AI Search gives managed hybrid +
+semantic rerank inside one audit boundary; the Gremlin graph links *policies ↔ controlling
+regulations* so "what regulation drives this clause" is a traversal, not a guess; the critic
++ escalate edge guarantees no confident-wrong answers on compliance questions.
+
+## Ingestion (layout-aware, provenance-first)
+
+```mermaid
+flowchart LR
+    DOC[Filings · policies · procedures<br/>PDF/DOCX/scans] --> PARSE[PyMuPDF / Unstructured.io<br/>+ OCR fallback]
+    PARSE --> CHUNK[layout + semantic chunking<br/>tables preserved]
+    CHUNK --> META[attach provenance<br/>doc·page·section·effective-date·sensitivity]
+    META --> EMB[embed] --> IDX[(Azure AI Search index per tenant)]
+    META --> GRAPH[(extract policy↔reg edges → Gremlin)]
+```
+
+> **Effective-date metadata is a compliance requirement, not a nicety:** retrieval must be
+> able to answer "as of" a date and never cite a superseded policy as current.
+
+## Phased delivery
+
+| Phase | Scope | Exit criteria |
+|---|---|---|
+| **0 · Discovery (2–3 wks)** | Corpus inventory, sensitivity classification, golden-question set with SMEs, success SLAs | Signed-off SLA sheet (faithfulness ≥ 0.9, citation 100%, abstain instead of guess) |
+| **1 · PoC (4–6 wks)** | Hybrid retrieval over a bounded corpus, citations, abstain path | Beats keyword search on the golden set; every answer cited or escalated |
+| **2 · Hardening (6–8 wks)** | Graph links, role-aware ACL, RAGAS/DeepEval CI gate, drift monitor | SLAs met under eval harness; controls mapped to SR 11-7 |
+| **3 · Production (ongoing)** | Model cards, audit, human-in-loop ops, change control | Audit trail reproducible; quarterly model-card review live |
+
+---
+
+# Blueprint 2 — Customer-Support Automation (retail banking)
+
+**Use case:** a grounded support agent answers customer questions from product docs, fee
+schedules, and account-policy content — with **strict masking of customer PII**, citations,
+and instant handoff to a human for anything account-specific or low-confidence.
+
+## Architecture
+
+```mermaid
+flowchart TB
+    C[👤 Customer] --> CH[Support chat]
+    CH --> ORCH2
+
+    subgraph ORCH2["Orchestration"]
+        RT{{route:<br/>info vs. account-action}}
+        RT -- "general info" --> RAG[grounded RAG answer]
+        RT -- "account-specific" --> AUTHZ{step-up auth + entitlement}
+        AUTHZ -- ok --> TOOL[typed account tool via MCP<br/>masked fields]
+        AUTHZ -- fail / sensitive --> HUMAN[human handoff]
+        RAG --> CONF{confidence ≥ SLA?}
+        CONF -- no --> HUMAN
+        CONF -- yes --> ANS[answer + citation]
+    end
+
+    subgraph GOV2["Zero-Trust + governance"]
+        MASK[field-encrypt column masking]
+        KILL[kill-switch per tool/model]
+        LOG[event-store audit]
+    end
+    TOOL -.-> MASK
+    RT -.-> KILL
+    ANS -.-> LOG
+    TOOL -.-> LOG
+```
+
+**Key design stances:**
+- **Two lanes by intent.** General-info → RAG over public/internal docs. Account-specific →
+  typed MCP tool behind **step-up auth + entitlement check + field masking**. The model never
+  free-queries customer data.
+- **Confidence gate → human.** Below SLA, hand off. In banking support, escalation is a
+  feature, not a failure.
+- **PII never enters the prompt unmasked.** Masking is enforced at the MCP boundary
+  (`field-encrypt`), so no prompt-engineering mistake can leak it.
+
+## Phased delivery (condensed)
+
+1. **Discovery** — intent taxonomy, what's answerable-from-docs vs. needs-account-access, PII map, SLAs.
+2. **PoC** — info-lane RAG with citations + handoff; no account access yet.
+3. **Account lane** — MCP typed tools, step-up auth, masking, full audit.
+4. **Production** — eval harness, drift monitor, model cards, change control.
+
+---
+
+# Cross-cutting: Regulatory control mapping
+
+This table is gold in the room — it shows you map *architecture* to *named clauses*.
+
+| Requirement | Source | How the architecture satisfies it |
+|---|---|---|
+| Model inventory & ownership | **SR 11-7** | Model-card registry (`04 §F`): every model/agent has a card with owner + purpose. |
+| Independent validation | **SR 11-7 / OCC** | RAGAS/DeepEval harness (`04 §E`) provides repeatable, independent eval evidence. |
+| Ongoing monitoring | **SR 11-7** | Online RAGAS scoring + factual-drift alerts in Hermes. |
+| Ability to constrain a model in production | **SR 11-7** | `kill-switch-client` disables a model/tool live, audited. |
+| Change control | **SR 11-7** | ADRs + CI eval gate; no deploy below faithfulness SLA. |
+| Risk classification of AI system | **EU AI Act** | Blueprint declares risk tier; high-risk paths get human oversight by design. |
+| Logging & traceability | **EU AI Act** | `event-store` decision log: query, sources, model, score, outcome — reproducible. |
+| Human oversight | **EU AI Act** | Confidence-gate → human handoff edge in both blueprints. |
+| Transparency to user | **EU AI Act** | Mandatory citations + "AI-assisted" disclosure + abstain language. |
+| Right to data protection / minimization | **GDPR / CCPA** | Field-level masking, role-aware retrieval, retrieve-only-entitled-chunks. |
+| Data subject access / deletion | **GDPR / CCPA** | Provenance metadata + tenant namespaces make targeted deletion + re-index feasible. |
+
+---
+
+# Sample ADRs (the format they want you to produce)
+
+### ADR-001 — Hybrid retrieval over pure-vector
+- **Status:** Accepted
+- **Context:** Compliance queries hinge on exact identifiers (clause numbers, reg citations) that dense retrieval misses.
+- **Decision:** Vector ⊕ BM25 fused with RRF, then cross-encoder rerank.
+- **Consequences:** +latency from rerank (mitigate: rerank top-k only); large recall/precision gain on identifier-bearing queries.
+
+### ADR-002 — Typed MCP tool-calling over free Text-to-SQL for account data
+- **Status:** Accepted
+- **Context:** Account data is the highest-leakage surface; free SQL is hard to audit and inject-proof.
+- **Decision:** Account access only via typed, parameterized MCP tools behind auth + masking; generative SQL restricted to read-only analytics views with RLS.
+- **Consequences:** Slightly less flexible NL→data coverage; dramatically smaller attack surface and clean audit.
+
+### ADR-003 — Abstain-and-escalate as a first-class outcome
+- **Status:** Accepted
+- **Context:** In regulated support/compliance, a confident wrong answer is the worst outcome.
+- **Decision:** Faithfulness/confidence below SLA routes to human handoff; tracked as an SLA, not an error.
+- **Consequences:** Higher human-handoff rate early; measurable safety + trust; abstain-rate becomes a tuning signal.
+
+### ADR-004 — Provider-portable model layer (router seam)
+- **Status:** Accepted
+- **Context:** Data-residency + vendor-risk requirements vary per client.
+- **Decision:** All inference behind `llm-router`; default Azure OpenAI, swap-in Bedrock/Vertex, on-prem via Ollama.
+- **Consequences:** Small abstraction cost; residency + vendor-risk satisfied by config, not re-architecture.
diff --git a/docs/INTERVIEW/06-glossary-quickref.md b/docs/INTERVIEW/06-glossary-quickref.md
new file mode 100644
index 0000000..80176a3
--- /dev/null
+++ b/docs/INTERVIEW/06-glossary-quickref.md
@@ -0,0 +1,112 @@
+# 06 · Glossary & Rapid-Fire Quick-Reference
+
+The night-before doc. Crisp definitions + one-liners you can fire back. If you can say the
+**bold** line cleanly for each, you sound fluent.
+
+---
+
+## Advanced RAG techniques
+
+| Term | What it is | One-liner |
+|---|---|---|
+| **HyDE** | Hypothetical Document Embeddings — LLM drafts a hypothetical answer; you embed *that* and retrieve against it. | **"Fixes recall by closing the question↔document vocabulary gap."** |
+| **CRAG** | Corrective RAG — grade retrieved docs; if weak, correct (re-retrieve / alt source / rewrite) before generating. | **"A relevance gate that re-retrieves instead of generating from junk."** |
+| **Self-RAG** | Model emits reflection tokens deciding *whether to retrieve* and *whether its draft is supported*; loops if not. | **"The model critiques its own groundedness before answering."** |
+| **RAPTOR** | Recursively cluster + summarize chunks into a tree; retrieve at the abstraction level the query needs. | **"Multi-resolution retrieval for long corpora — summary nodes for broad Qs, leaves for specifics."** |
+| **Reranking** | Cross-encoder scores (query, passage) jointly after first-stage retrieval — far more precise than bi-encoder similarity. | **"Fixes precision; I rerank only the top-k to control latency."** |
+| **ColBERT** | Late-interaction reranking — token-level MaxSim. Accurate *and* scalable. | **"Token-level matching without re-encoding the whole pair per query."** |
+| **Context compression** | Drop/condense retrieved spans to fit budget and cut distraction. | **"Less, more-relevant context beats more context."** |
+| **Hybrid search** | Dense (vector) ⊕ sparse (BM25) ⊕ graph, fused via **RRF**. | **"Vector for meaning, BM25 for exact terms, graph for relationships."** |
+| **RRF** | Reciprocal Rank Fusion — combine rankings by `Σ 1/(k+rank)`; no score calibration needed. | **"Tuning-free way to fuse vector + lexical ranks."** |
+| **Semantic chunking** | Split on topic/structure boundaries, not byte counts. | **"Chunk on document structure first, size second."** |
+| **Agentic RAG** | An agent *decides* when/what/how to retrieve, can use multiple tools and loop. | **"RAG as a control flow, not a single hop."** |
+
+---
+
+## Evaluation metrics (RAGAS vocabulary)
+
+| Metric | Question it answers | What it isolates |
+|---|---|---|
+| **Faithfulness / groundedness** | Is every claim supported by retrieved context? | Hallucination |
+| **Answer relevancy** | Does the answer address the question? | Generation focus |
+| **Context precision** | Are the *top* retrieved chunks the relevant ones? | Reranker quality |
+| **Context recall** | Did we retrieve *all* needed evidence? | Retriever quality |
+| **Answer correctness** | Right vs. ground truth? | End-to-end |
+
+- **RAGAS** — library that computes these (often LLM-as-judge).
+- **TruLens** — RAG-triad + feedback functions + tracing.
+- **DeepEval** — pytest-style assertions → CI gates.
+- **LangSmith** — tracing + eval ops for LangChain/LangGraph.
+
+> **Diagnostic move to say out loud:** *"Low context-recall → fix the **retriever** (HyDE,
+> hybrid, chunking). High recall but low context-precision → fix the **reranker**. Good
+> context but low faithfulness → fix the **generator/prompt** or **abstain**."*
+
+---
+
+## Agentic frameworks
+
+| Term | Essence |
+|---|---|
+| **LangChain** | Primitives: tool binding, structured output, retrievers, chains. |
+| **LangGraph** | Agents as **state graphs**: typed state, nodes, conditional + cyclic edges, checkpointer, human-in-loop. |
+| **Google ADK** | Google's Agent Development Kit for building/deploying agents. |
+| **A2A** | Agent-to-Agent protocol: agent cards (capabilities/auth), task lifecycle, message/artifact exchange — agent interop. |
+| **AutoGen** | Conversational multi-agent loops (agents talk to each other). |
+| **MCP** | Model Context Protocol: standard for exposing **tools/resources** to models via a server, with typed registration. |
+
+---
+
+## Governance & regulatory
+
+| Term | What to say |
+|---|---|
+| **SR 11-7** | US Fed/OCC **model risk management** guidance: model inventory, independent validation, ongoing monitoring, change control, ability to constrain a model. **"My eval harness + model cards + kill-switch + decision log map directly to its pillars."** |
+| **OCC model risk** | Aligned with SR 11-7; emphasizes governance + effective challenge. |
+| **EU AI Act** | Risk-tiered AI regulation: classification, logging/traceability, human oversight, transparency for high-risk systems. **"High-risk paths get human-in-the-loop and full decision logging by design."** |
+| **GDPR / CCPA** | Data protection/minimization, subject access/deletion. **"Field masking + provenance + tenant namespaces make minimization and targeted deletion structural."** |
+| **Zero Trust (for agents)** | Never trust the agent's request implicitly; verify identity + scope at the tool boundary **every** call. **"The MCP server is my policy enforcement point."** |
+| **Access-controlled retrieval** | Retrieve only chunks the caller's identity/role entitles them to (pre-retrieval ACL filter). |
+| **Row/column masking** | Mask sensitive fields at the boundary regardless of query. |
+| **Model card** | Per-model doc: purpose, data, eval scores, limits, owner, review date, kill-switch. |
+| **RACI** | Responsible/Accountable/Consulted/Informed matrix per component — governance ownership made explicit. |
+
+---
+
+## ByteLyst anchor cheat-sheet (so you never blank on "where have you done this")
+
+| JD theme | Say this anchor |
+|---|---|
+| Multi-agent orchestration | **`agent-queue`** — claude/codex/devin, `inbox→doing→done/failed` state machine. |
+| MCP / Zero-Trust tool boundary | **`mcp-server` :4007 + `mcp-client`**, authZ per call, masking, kill-switch, audit. |
+| Provider-portable models | **`llm-router`** (Azure OpenAI / Bedrock / Vertex / Ollama). |
+| Grounding by architecture | **`flowmonk`** — deterministic engine authoritative, AI = explanation/safe-reco only. |
+| Schema-aware structured retrieval | **`invt_trdg` AI chat** — typed tool-calling over markets, not free SQL. |
+| Unstructured ingestion | **`extraction-service` :4005 + `packages/extraction`**; `notelett` store. |
+| Graph (Cosmos Gremlin) | **`packages/cosmos`** — Cosmos DB Gremlin API in prod. |
+| Vector / multi-tenancy | **pgvector path** + `productId` / two-instance Hermes isolation. |
+| Eval / ops console | **Hermes Mission Control** + `telemetry`/`diagnostics`/`monitoring`. |
+| Governance primitives | **`auth`/`fastify-auth`, `field-encrypt`, `feature-flag`/`kill-switch`, `event-store`**. |
+| Banking domain analog | **`invt_trdg`** — regulated-consequence domain; abstain-over-guess discipline. |
+
+---
+
+## Likely curveballs + crisp answers
+
+- **"How do you stop prompt injection from retrieved docs?"** → "Treat retrieved text as *data*, never instructions; the generator can't re-invoke tools without re-passing the MCP authZ gate; egress is masked + logged so even a successful injection can't exfiltrate unentitled fields."
+- **"Vector DB choice?"** → "Decided last, from requirements. Azure → Azure AI Search (managed hybrid+rerank, one audit boundary). Tight transactional coupling → pgvector. I don't pick a vector DB before I know tenant count, filter cardinality, recall target, and the audit boundary."
+- **"How do you measure if RAG is 'good enough' for prod?"** → "SLAs as gates: faithfulness ≥ 0.9, citation 100%, abstain instead of guess; DeepEval blocks deploy below threshold; online RAGAS + drift alerts catch degradation."
+- **"Free Text-to-SQL — yes or no?"** → "Bounded domains → typed tool-calling (auditable, inject-resistant). Genuine ad-hoc analytics → generative SQL behind read-only views with RLS, validated + cost-capped. Never raw SQL on base tables."
+- **"Latency vs. accuracy trade-off?"** → "Route by query: FAQ/parametric skips retrieval; only complex queries pay for hybrid + rerank + critic loop. Rerank top-k only. The critic loop is bounded to N iterations then abstains."
+- **"How is this different from a chatbot demo?"** → "The boundaries: access-controlled retrieval, mandatory citation, abstain-and-escalate, masking, kill-switch, and a reproducible decision log. A demo answers; a regulated system can *prove* why it answered and refuse when it shouldn't."
+- **"What worries you most in agentic RAG for banking?"** → "Silent factual drift and over-broad tool scope. I counter drift with online eval + alerts, and scope with typed MCP tools + least-privilege entitlement at every call."
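
The "bounded critic loop, then abstain" answer and the faithfulness gate from the curveballs above can be sketched together. This is a minimal illustration, not the ByteLyst implementation: the callables `retrieve`, `generate`, `score`, and `rewrite` are hypothetical stand-ins for real components, the 0.9 threshold is the SLA quoted above, and `max_iters=3` is an arbitrary stand-in for N.

```python
from typing import Callable, List, Tuple


def answer_or_abstain(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    score: Callable[[str, List[str]], float],
    rewrite: Callable[[str], str],
    threshold: float = 0.9,
    max_iters: int = 3,
) -> Tuple[str, str]:
    """Return ("answer", text) if a draft clears the gate, else ("abstain", reason)."""
    current = query
    for _ in range(max_iters):
        context = retrieve(current)
        if context:
            draft = generate(current, context)
            if score(draft, context) >= threshold:
                return "answer", draft
        # Weak or missing evidence: rewrite the query and try again.
        current = rewrite(current)
    # The loop is bounded, so the fallback is a human, not another guess.
    return "abstain", "insufficient evidence, escalate"


# Stub components so the sketch runs on its own (all hypothetical).
def demo_retrieve(q: str) -> List[str]:
    return ["fee-schedule#p3"] if "fee" in q else []


def demo_generate(q: str, ctx: List[str]) -> str:
    return "See " + ctx[0]


def demo_score(draft: str, ctx: List[str]) -> float:
    return 1.0 if ctx[0] in draft else 0.0


def demo_rewrite(q: str) -> str:
    return q + " fee"


grounded = answer_or_abstain(
    "overdraft charge?", demo_retrieve, demo_generate, demo_score, demo_rewrite
)
ungrounded = answer_or_abstain(
    "weather?", lambda q: [], demo_generate, demo_score, demo_rewrite
)
```

In the first call the original query retrieves nothing, one rewrite finds evidence, and the draft passes the gate; in the second, no iteration ever finds evidence, so the function abstains after three attempts instead of answering.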
+
+---
+
+## 30-second close
+
+> *"I build agentic systems where the hard engineering is in the boundaries — what a tool
+> can retrieve, how output is grounded and cited, when to abstain, and how every hop is
+> audited. I've been running an ecosystem (MCP servers, a multi-agent runner,
+> provider-routed LLMs, encrypted/flagged/audited data access) that's one deliberate roadmap away
+> from a textbook enterprise agentic-RAG fabric — and I've already written that roadmap."*
diff --git a/docs/INTERVIEW/README.md b/docs/INTERVIEW/README.md
new file mode 100644
index 0000000..f2ae20c
--- /dev/null
+++ b/docs/INTERVIEW/README.md
@@ -0,0 +1,113 @@
+# Senior Agentic RAG Architect — Interview Prep Kit
+
+> Target role: **Senior Agentic RAG Architect — TEKsystems Global Services, Product Engineering Group**
+> Candidate anchor: the **ByteLyst ecosystem** (this monorepo workspace).
+> Purpose: turn what we already run in production into a defensible, evidence-backed
+> narrative for every line of the job description — plus a concrete roadmap of
+> enhancements that make each claim *literally true* if we choose to build them.
+
+This kit is deliberately structured so you can walk into the interview and, for **any**
+competency on the matrix, point to (a) a real system we run, (b) an architecture diagram,
+(c) a STAR story, and (d) a credible "here's how I'd take it to enterprise scale" answer.
+
+---
+
+## How to use this kit
+
+| If you have… | Read |
+|---|---|
+| 60 minutes the night before | `06-glossary-quickref.md` then this README's matrix |
+| A full prep day | All docs in order 01 → 06 |
+| A whiteboard / panel round | `01-ecosystem-rag-fabric.md` + `05-banking-blueprints.md` |
+| A behavioral / leadership round | `03-star-interview-bank.md` |
+| A "what would you build here" round | `04-enhancement-roadmap.md` |
+
+### Documents
+
+1. **[01-ecosystem-rag-fabric.md](01-ecosystem-rag-fabric.md)** — The ByteLyst ecosystem re-drawn as an agentic RAG fabric. Context, container, retrieval-pipeline, multi-agent topology, MCP Zero-Trust, and governance diagrams.
+2. **[02-competency-deepdives.md](02-competency-deepdives.md)** — Every competency-matrix row: the concept, how it maps to our code, talking points, and honest gaps.
+3. **[03-star-interview-bank.md](03-star-interview-bank.md)** — 12 STAR stories grounded in real ecosystem work (Hermes, agent-queue, mcp-server, invt_trdg AI chat, flowmonk grounding, llm-router).
+4. **[04-enhancement-roadmap.md](04-enhancement-roadmap.md)** — Buildable enhancements that convert "I understand X" into "I shipped X here": pgvector hybrid retrieval, CRAG/Self-RAG loops, RAGAS eval harness, Cosmos Gremlin knowledge graph, model-card registry in Hermes.
+5. **[05-banking-blueprints.md](05-banking-blueprints.md)** — Two client-ready solution blueprints (compliance-document retrieval; customer-support automation) with ADRs, SR 11-7 / EU AI Act alignment, and phased delivery.
+6. **[06-glossary-quickref.md](06-glossary-quickref.md)** — Rapid-fire definitions and crisp answers: RAPTOR, HyDE, CRAG, Self-RAG, ColBERT, RAGAS metrics, SR 11-7, EU AI Act, Zero Trust for agents.
+
+---
+
+## The role in one paragraph
+
+Design, build, and tune **enterprise-grade RAG systems that power agentic applications**,
+fusing **structured (RDBMS / warehouse), unstructured (PDF / docs / email), and graph
+(knowledge-graph / ontology)** sources into one **governed** retrieval fabric. Be the
+technical authority across **financial-services** engagements; enforce **grounding,
+citation, hallucination mitigation**; own **evaluation harnesses (RAGAS / TruLens /
+DeepEval)**; embed **Zero Trust, access-controlled retrieval, SR 11-7 / EU AI Act**
+governance; and lead **ADRs, blueprints, roadmaps** for execs and engineers.
+
+---
+
+## Competency matrix → ByteLyst evidence
+
+The JD's matrix is reproduced verbatim in the left columns; the right column is **our
+real anchor** in this ecosystem (where it exists today) plus a pointer to the enhancement
+that hardens it.
+
+| Competency | Must-have | Nice-to-have | ByteLyst anchor (today → planned) |
+|---|---|---|---|
+| **Agentic Frameworks** | LangGraph, LangChain (prod-grade) | Google ADK, A2A, AutoGen | `agent-queue/` multi-engine runner (claude·codex·devin) is a real folder-kanban orchestration topology with state transitions (`inbox→doing→done/failed`) — a hand-rolled state machine analogous to LangGraph nodes/edges. `packages/mcp-client` + `mcp-server` (:4007) provide tool binding. → **04 §A** ports the topology onto LangGraph and adds an A2A handoff contract. |
+| **RAG Architecture** | Hybrid retrieval, reranking, HyDE, Self-RAG | RAPTOR, multimodal | `packages/extraction` + `extraction-service` (:4005) parse URLs/docs into retrievable units today; `invt_trdg` AI chat already does retrieve-then-reason over structured data. → **04 §B** adds vector+BM25 hybrid, cross-encoder rerank, HyDE & CRAG loops. |
+| **Structured Retrieval** | Text-to-SQL, schema-aware retrieval | Snowflake Cortex, BigQuery ML | `invt_trdg` AI chat assistant maps NL → trading actions/queries over a typed domain (quotes, plans, watchlists) — schema-aware tool-calling, the safe cousin of free Text-to-SQL. → **04 §C** adds a guarded Text-to-SQL tool with read-only views + row-level filters. |
+| **Unstructured Retrieval** | PDF parsing, layout-aware chunking | Multi-modal pipelines | `packages/extraction` + `extraction-service`; `notelett` ingests structured notes for humans+agents. → **04 §B** adds PyMuPDF/Unstructured.io layout-aware chunking + OCR fallback. |
+| **Graph RAG** | KG + vector hybrid | SPARQL, ontology design | We run **Azure Cosmos DB** (`packages/cosmos`); Cosmos exposes the **Gremlin** graph API. `event-store`/`events` already model entity relationships. → **04 §D** stands up a Cosmos Gremlin knowledge graph + graph-augmented retrieval. |
+| **Vector Databases** | Pinecone / Weaviate / Azure AI Search | Qdrant, pgvector, multi-tenancy | Postgres is in the stack; **pgvector** is the lowest-friction path. Multi-tenant namespace isolation is already a first-class concern (per-product `productId`, two-instance Hermes Vijay/Bheem). → **04 §B** adds pgvector with per-tenant namespaces. |
+| **Grounding & Eval** | RAGAS, TruLens, faithfulness SLAs | LangSmith, LLM-as-judge | `flowmonk` deliberately **bounds the AI layer to explanation/safe recommendation** over a deterministic engine — a production grounding pattern. `diagnostics-client`/`telemetry-client`/`monitoring` + Hermes dashboards are the eval-harness home. → **04 §E** wires a RAGAS/DeepEval harness + drift monitor pane in Hermes. |
+| **Cloud Platform** | Azure (AI Foundry, OpenAI, Search) | AWS Bedrock, GCP Vertex | Azure Cosmos DB in prod (`_AZURE/`, `packages/cosmos`); `packages/llm-router` abstracts providers so Azure OpenAI / Bedrock / Vertex are swap-in. → **02** discusses Azure AI Search as the managed hybrid index. |
+| **AI Governance** | Access-controlled RAG, Zero Trust | SR 11-7, EU AI Act | `packages/auth` + `fastify-auth`, `field-encrypt`/`client-encrypt` (column/field masking), `feature-flag-client` + `kill-switch-client` (instant model kill), `event-store` (immutable audit). MCP tool boundaries are explicit. → **05** maps all of this to SR 11-7 + EU AI Act. |
+| **Domain: Banking** | Support / compliance automation | Model risk mgmt, KYC/AML | `invt_trdg` is our regulated-industry analog (markets, trade plans, alerts, auditability). → **05** translates it into a bank customer-support + compliance-retrieval blueprint. |
+
+---
+
+## Honest gap analysis (say this out loud — it builds trust)
+
+Be candid in the interview. Frame it as *"here's what's production-real, here's what's
+adjacent, here's exactly how I'd close the gap."*
+
+```mermaid
+quadrantChart
+    title Evidence strength vs. JD centrality
+    x-axis "Adjacent / planned" --> "Production-real today"
+    y-axis "Nice-to-have" --> "Core to the role"
+    quadrant-1 "Lead with these"
+    quadrant-2 "Build before interview if possible"
+    quadrant-3 "Mention, don't dwell"
+    quadrant-4 "Frame as quick wins"
+    "MCP tool boundaries": [0.82, 0.78]
+    "Multi-agent orchestration (agent-queue)": [0.75, 0.7]
+    "Access-controlled / Zero-Trust retrieval": [0.8, 0.85]
+    "Bounded grounding (flowmonk)": [0.78, 0.9]
+    "Schema-aware tool-calling (invt_trdg)": [0.72, 0.72]
+    "LangGraph (prod-grade)": [0.3, 0.88]
+    "RAGAS / TruLens eval harness": [0.25, 0.86]
+    "pgvector hybrid retrieval": [0.35, 0.8]
+    "Cosmos Gremlin Graph RAG": [0.3, 0.6]
+    "Google ADK / A2A": [0.2, 0.4]
+    "RAPTOR / HyDE / CRAG / Self-RAG": [0.28, 0.65]
+    "SR 11-7 / EU AI Act docs": [0.45, 0.7]
+```
+
+**Three sentences to own the gap:**
+> "My production depth is in **agentic orchestration, MCP tool boundaries, and bounded
+> grounding** — the parts that decide whether an agentic system is *safe* in a regulated
+> setting. The classic LangChain/LangGraph and RAGAS surface area I've architected and
+> can stand up fast; in fact I've scoped exactly that as a roadmap on our own platform.
+> What I bring that's harder to hire is the **governance instinct** — designing retrieval
+> so that masking, kill-switches, and audit trails are structural, not bolted on."
+
+---
+
+## One-line elevator pitch for the role
+
+> *"I build agentic systems where the interesting engineering is in the **boundaries** —
+> what a tool is allowed to retrieve, how a model's output is grounded and cited, and how
+> every hop is audited — and I've been running a multi-product ecosystem (MCP servers,
+> a multi-agent runner, provider-routed LLMs, encrypted/flagged data access) that is one
+> deliberate roadmap away from being a textbook enterprise agentic-RAG fabric."*