docs(interview): add Senior Agentic RAG Architect prep kit

7-doc kit mapping the JD competency matrix to the ByteLyst ecosystem:
ecosystem-as-RAG-fabric architecture, competency deep-dives, STAR bank,
enhancement roadmap, banking blueprints, and a glossary quick-ref.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Hermes VM 2026-05-31 10:48:52 +00:00
parent 1b957cf6d9
commit 076449268b
7 changed files with 1237 additions and 0 deletions

# 01 · The ByteLyst Ecosystem as an Agentic RAG Fabric
The trick in this interview is to stop treating ByteLyst as "a bunch of side projects"
and start describing it as **one governed retrieval fabric with multiple agentic
front-ends**. Every diagram below is something you can reproduce on a whiteboard.

---
## 1. System context — what we actually run
```mermaid
flowchart TB
subgraph Users["👤 Humans & Agents"]
U1[End users<br/>web / mobile]
U2[Coding agents<br/>claude · codex · devin]
U3[Operators<br/>Hermes Mission Control]
end
subgraph Fronts["Agentic Product Surfaces"]
P1["invt_trdg<br/>AI trading chat<br/>(tool-calling over markets)"]
P2["flowmonk<br/>planning + bounded AI layer"]
P3["notelett<br/>notes for humans + agents"]
P4["chronomind<br/>contextual time AI"]
end
subgraph Platform["common_plat — the shared fabric"]
PS["platform-service :4003<br/>auth · flags · telemetry · billing · blob"]
ES["extraction-service :4005<br/>URL / doc → retrievable units"]
MCP["mcp-server :4007<br/>tool / resource registration"]
LR["packages/llm-router<br/>provider abstraction"]
end
subgraph Data["Governed Data Sources"]
DB[("Cosmos DB<br/>docs + Gremlin graph*")]
PG[("Postgres<br/>structured + pgvector*")]
EV[("event-store<br/>immutable audit")]
BLOB[("blob<br/>raw documents")]
end
subgraph Ops["Control Plane"]
HERMES["Hermes Mission Control<br/>(devops_tools/dashboard)"]
AQ["agent-queue<br/>multi-agent runner"]
end
U1 --> P1 & P2 & P3 & P4
U2 --> AQ
U3 --> HERMES
P1 & P2 & P3 & P4 --> PS
P1 & P2 & P3 & P4 --> MCP
MCP --> LR
ES --> BLOB
PS --> DB & PG & EV
MCP --> DB & PG
AQ --> MCP
HERMES --> PS
HERMES -.observes.-> ES & MCP & LR
classDef plan fill:#fef3c7,stroke:#d97706
class PG,DB plan
```
> `*` pgvector and the Gremlin graph are the planned hardening (see `04-enhancement-roadmap.md`).
> Everything else is a real, deployed component of the ecosystem.

**How to narrate it:** *"The platform-service is my policy/identity plane, the mcp-server
is my tool-boundary plane, llm-router is my model plane, and the data sources are governed
behind both. Any product surface is just a thin agentic UI over that fabric — which is
exactly the shape of an enterprise agentic-RAG platform."*

---
## 2. The reference agentic-RAG container view
This is the canonical picture the interviewer wants to see — drawn in *our* components.
```mermaid
flowchart LR
Q[User query] --> ROUTER
subgraph Orchestration["Agentic Orchestration (LangGraph-shaped)"]
ROUTER{{"Router / planner agent<br/>intent + complexity"}}
RETR["Retriever agent"]
GRADE{{"Relevance grader<br/>(CRAG gate)"}}
REWRITE["Query rewriter<br/>(HyDE)"]
GEN["Generator agent<br/>+ citation enforcer"]
CRITIC{{"Self-RAG critic<br/>groundedness check"}}
end
subgraph Retrieval["Hybrid Retrieval Fabric"]
VEC[("Vector<br/>pgvector / Azure AI Search")]
BM25[("Lexical<br/>BM25")]
GRAPH[("Graph traversal<br/>Cosmos Gremlin")]
SQL[("Structured<br/>schema-aware SQL tool")]
RERANK["Cross-encoder rerank<br/>+ context compression"]
end
subgraph Gov["Governance plane (every hop)"]
ACL["Access-controlled retrieval<br/>auth + row/col masking"]
AUDIT["event-store audit trail"]
KILL["kill-switch / flags"]
end
ROUTER --> RETR
RETR --> VEC & BM25 & GRAPH & SQL
VEC & BM25 & GRAPH & SQL --> RERANK
RERANK --> GRADE
GRADE -- "low relevance" --> REWRITE --> RETR
GRADE -- "ok" --> GEN
GEN --> CRITIC
CRITIC -- "ungrounded" --> REWRITE
CRITIC -- "grounded + cited" --> A[Answer + citations]
RETR -.enforced by.-> ACL
GEN -.logged to.-> AUDIT
ROUTER -.gated by.-> KILL
```
**Key talking points keyed to the JD:**
- *Hybrid search (vector + BM25 + graph)* → the four retrievers fan out in parallel, and the reranker fans their results back in.
- *Reranking + context compression* → the `RERANK` node (a cross-encoder such as bge-reranker, or ColBERT-style late interaction as a cheaper-at-query-time alternative).
- *CRAG* → the `GRADE` gate that triggers corrective re-retrieval.
- *HyDE* → the `REWRITE` node generating a hypothetical answer to embed.
- *Self-RAG* → the `CRITIC` node reflecting on groundedness before release.
- *Access-controlled retrieval / Zero Trust / audit* → the governance plane wraps **every** hop, not just the entrance.
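The container view's control flow fits in a short bounded loop. This is a minimal sketch, not LangGraph code and not our implementation. `agentic_rag`, `Answer`, and the five callables it takes are hypothetical stand-ins for the nodes in the diagram, and the default budget of three attempts is an arbitrary assumption. It shows the two properties worth saying out loud:

- Both the CRAG gate and the Self-RAG critic route back through the rewriter.
- An exhausted budget ends in abstention rather than an unchecked answer.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Answer:
    text: Optional[str]  # None means the pipeline abstained
    citations: list[str] = field(default_factory=list)
    attempts: int = 0


def agentic_rag(
    query: str,
    retrieve: Callable[[str], list[str]],        # hybrid retrieval + rerank (stub)
    grade: Callable[[str, list[str]], bool],     # CRAG relevance gate
    rewrite: Callable[[str], str],               # e.g. a HyDE-style rewrite
    generate: Callable[[str, list[str]], str],   # generator that cites its sources
    critique: Callable[[str, list[str]], bool],  # Self-RAG-style groundedness check
    max_attempts: int = 3,
) -> Answer:
    current = query
    for attempt in range(1, max_attempts + 1):
        docs = retrieve(current)
        if not grade(current, docs):
            # CRAG gate: evidence is weak, so rewrite and re-retrieve before generating
            current = rewrite(current)
            continue
        draft = generate(query, docs)
        if critique(draft, docs):
            # Grounded and cited: release the answer with its sources
            return Answer(draft, citations=docs, attempts=attempt)
        # Ungrounded draft: treat it like weak retrieval and go around again
        current = rewrite(current)
    # Retry budget exhausted: abstain rather than release an unchecked answer
    return Answer(None, attempts=max_attempts)
```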
---
## 3. Multi-agent orchestration topology (we run a real one)
`agent-queue/` is a production folder-kanban that drives **three different agent engines**
(`claude`, `codex`, `devin`) through an explicit state machine. That *is* multi-agent
orchestration — and it's the strongest "I've shipped agents" story you have.
```mermaid
stateDiagram-v2
[*] --> inbox: drop prompt .md
inbox --> doing: runner claims (auto-approve)
doing --> done: success
doing --> failed: error / timeout
failed --> inbox: requeue (human-in-loop)
done --> [*]
note right of doing
Engine selected per task:
claude · codex · devin
= heterogeneous agent pool
end note
```
Map this to LangGraph vocabulary in the room:
| agent-queue concept | LangGraph / agentic equivalent |
|---|---|
| `inbox/doing/done/failed` folders | graph **nodes** / state enum |
| runner claiming + transitioning | **conditional edges** |
| engine flag (claude/codex/devin) | **tool/agent binding** per node |
| `failed → inbox` requeue | **cyclic edge** w/ human-in-the-loop checkpoint |
| live `status`/`watch` | **state checkpointer** + observability |
> Honest framing: *"I built this deliberately framework-light to stay bash-portable and
> dependency-free. The state model is identical to LangGraph; porting it onto LangGraph's
> `StateGraph` mostly buys me typed state, built-in checkpointing, and the A2A handoff
> contract — which is exactly the enhancement I've scoped."*
---
## 4. MCP server — Zero-Trust tool boundary
This is your strongest *governance* asset and a direct hit on a Preferred Qualification
("MCP server architecture, tool/resource registration patterns, agentic security threat
modeling"). We run `mcp-server` on :4007 with `packages/mcp-client`.
```mermaid
flowchart TB
subgraph Agent["Agent (untrusted by default)"]
A[LLM reasoning loop]
end
subgraph Boundary["mcp-server :4007 — policy enforcement point"]
REG["Tool / resource registry<br/>(declared, typed, versioned)"]
AUTHZ{"AuthZ check<br/>identity + scope + role"}
MASK["Row/column masking<br/>field-encrypt"]
RATE["Rate / cost limits + kill-switch"]
LOG["Audit emit → event-store"]
end
subgraph Resources["Governed resources"]
T1[Market data tool]
T2[Doc retrieval tool]
T3[Graph query tool]
T4[Text-to-SQL tool<br/>read-only views]
end
A -- "tool call (intent)" --> REG
REG --> AUTHZ
AUTHZ -- deny --> A
AUTHZ -- allow --> MASK
MASK --> RATE
RATE --> T1 & T2 & T3 & T4
T1 & T2 & T3 & T4 --> LOG
LOG --> A
```
**Threat-model talking points** (say these — they signal seniority):
- **Confused-deputy:** the agent never holds raw credentials; the MCP server calls each tool with the *user's* scoped identity rather than a broad service credential, so a tool can't be tricked into over-broad reads.
- **Tool-poisoning / prompt injection via retrieved content:** retrieved text is treated as data, never as instructions; the generator is sandboxed from re-invoking tools without re-passing the AuthZ gate.
- **Exfiltration:** column masking + egress logging means even a successful injection can't surface PII it wasn't entitled to.
- **Blast radius:** `kill-switch-client` lets us disable a model or a single tool instantly without redeploying, which supports the SR 11-7 expectation that a bank can restrict a model's use in production when it misbehaves.
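The ordering inside the boundary is worth being able to write on a whiteboard: registry, then AuthZ, then kill-switch, then audit. This is a minimal sketch under stated assumptions. `ToolSpec`, `Caller`, `PolicyEnforcementPoint`, and the scope strings are hypothetical; this is not the mcp-server's API or any MCP SDK's API. Masking and rate limits are left out for brevity. In the diagram, denials return straight to the agent; this sketch also audits them, since repeated probing is exactly what an audit trail should surface.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToolSpec:
    name: str
    version: str
    required_scope: str
    handler: Callable[..., Any]  # receives the caller's identity first


@dataclass(frozen=True)
class Caller:
    user_id: str
    scopes: frozenset[str]


class PolicyEnforcementPoint:
    """Every call goes registry -> AuthZ -> kill-switch -> handler, and is audited."""

    def __init__(self) -> None:
        self.registry: dict[str, ToolSpec] = {}
        self.killed: set[str] = set()          # tools disabled at runtime, no redeploy
        self.audit: list[dict[str, str]] = []  # stand-in for an append-only event store

    def register(self, spec: ToolSpec) -> None:
        self.registry[spec.name] = spec

    def call(self, caller: Caller, tool: str, **args: Any) -> Any:
        spec = self.registry.get(tool)
        if spec is None:
            decision = "deny:unregistered"
        elif spec.required_scope not in caller.scopes:
            decision = "deny:scope"
        elif tool in self.killed:
            decision = "deny:killed"
        else:
            decision = "allow"
        # Denials are audited too, so probing attempts show up in the trail
        self.audit.append({"user": caller.user_id, "tool": tool, "decision": decision})
        if decision != "allow":
            raise PermissionError(decision)
        # The tool acts with the caller's scoped identity, never an agent-held secret
        return spec.handler(caller, **args)
```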
---
## 5. Governance & grounding plane (the part that wins regulated deals)
```mermaid
flowchart LR
subgraph Ingest["Ingestion governance"]
CLASS["Data classification<br/>(public / internal / PII)"]
EMB["Embedding + metadata tags<br/>tenant · sensitivity · source"]
end
subgraph Query["Query-time governance"]
IDENT["Caller identity + role"]
FILTER["Namespace + ACL filter<br/>(pre-retrieval)"]
RETR2["Retrieve only entitled chunks"]
end
subgraph Answer["Answer governance"]
CITE["Mandatory citation<br/>(source attribution)"]
FAITH["Faithfulness score<br/>(RAGAS / LLM-as-judge)"]
CARD["Model card + decision log"]
end
CLASS --> EMB --> RETR2
IDENT --> FILTER --> RETR2 --> CITE --> FAITH --> CARD
FAITH -- "below SLA" --> ABSTAIN["Abstain / escalate to human"]
```
This single diagram covers four JD bullets at once: **access-controlled retrieval**,
**citation/source attribution**, **faithfulness SLAs**, and **model cards / audit**.
The `ABSTAIN` branch is the line that separates a demo from a regulated system — *"in
banking, a confident wrong answer is a worse outcome than 'I don't know, here's a human.'"*
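The query-time and answer-time gates are small in code. The subtle part is ordering: the entitlement filter must run before ranking, so unentitled chunks never reach the scorer or the prompt. This is a minimal sketch, assuming each chunk was tagged at ingest with a tenant and a sensitivity level. `TaggedChunk`, `Identity`, the three-level ordering, and the 0.9 SLA are illustrative. The faithfulness score is passed in because computing it is the evaluator's job. In a real index the filter would be pushed into the search query itself rather than applied to a Python list.

```python
from __future__ import annotations

from dataclasses import dataclass

# Sensitivity ordering is an illustrative assumption
LEVELS = {"public": 0, "internal": 1, "pii": 2}


@dataclass(frozen=True)
class TaggedChunk:
    doc_id: str
    tenant: str
    sensitivity: str  # one of LEVELS, assigned by ingest-time classification
    text: str


@dataclass(frozen=True)
class Identity:
    tenant: str
    clearance: str  # highest sensitivity level this caller may read


def entitled(chunks: list[TaggedChunk], who: Identity) -> list[TaggedChunk]:
    """Pre-retrieval filter: unentitled chunks never reach the ranker or the prompt."""
    return [
        c for c in chunks
        if c.tenant == who.tenant and LEVELS[c.sensitivity] <= LEVELS[who.clearance]
    ]


def release(answer: str, cited: list[str], faithfulness: float, sla: float = 0.9) -> str:
    """Answer governance: no citations, or a score below the SLA, means abstain."""
    if not cited or faithfulness < sla:
        return "ABSTAIN: escalated to a human reviewer"
    return f"{answer} [sources: {', '.join(cited)}]"
```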
---
## 6. Multi-tenant / namespace isolation (already a real concern for us)
We *already* think in tenants: every product has a `productId`, and Hermes runs **two
isolated instances (Vijay / Bheem)** with separate users, services, and backup repos. That
is the same isolation discipline a vector DB needs.
```mermaid
flowchart TB
subgraph T_A["Tenant A (productId=invt_trdg)"]
NSA["Vector namespace A"]
GA["Graph partition A"]
SA["SQL schema A (RLS)"]
end
subgraph T_B["Tenant B (productId=notelett)"]
NSB["Vector namespace B"]
GB["Graph partition B"]
SB["SQL schema B (RLS)"]
end
POLICY["platform-service<br/>tenant resolver + auth"] --> NSA & NSB & GA & GB & SA & SB
```
> *"Namespace isolation isn't a vector-DB feature I'd discover late — it's how the whole
> platform is partitioned. Pinecone namespaces / Azure AI Search index-per-tenant /
> pgvector schema-per-tenant are just the storage expression of a `productId` model I
> already run."*
---
## Cheat-sheet: which diagram answers which question
| If they ask… | Draw |
|---|---|
| "Walk me through your RAG architecture" | §2 container view |
| "How do you orchestrate multiple agents?" | §3 state machine |
| "How is this secure / Zero Trust?" | §4 MCP boundary |
| "How do you prevent hallucination in production?" | §2 CRITIC gate + §5 ABSTAIN branch |
| "How do you handle multi-tenancy at scale?" | §6 isolation |
| "What does your whole platform look like?" | §1 context |

# 02 · Competency Deep-Dives
One section per competency-matrix row. Each gives you: **the concept** (so you sound
fluent), **our anchor** (so it's credible), **say-this** talking points, and the **honest
edge** (where to pivot to roadmap rather than overclaim).

---
## A. Agentic Frameworks — LangGraph / LangChain (must) · ADK / A2A / AutoGen (nice)
**Concept.** LangGraph models an agent as a **state graph**: typed shared state, nodes
(LLM calls / tools), and conditional + cyclic edges, with a checkpointer for durability
and human-in-the-loop. LangChain supplies the primitives (tool binding, structured output,
retrievers). ADK is Google's agent SDK; **A2A** is an open agent-to-agent interop protocol
(agent cards, task lifecycle, message/artifact exchange). AutoGen favors conversational
multi-agent loops.
**Our anchor.** `agent-queue/` is a *real* multi-engine orchestrator (claude·codex·devin)
with an explicit `inbox→doing→done/failed` state machine and requeue cycle — see
`01-ecosystem-rag-fabric.md §3`. `packages/mcp-client` + `mcp-server` provide tool binding;
`packages/llm-router` is the model-selection layer a node would call.
**Say this.**
- "LangGraph's value over a raw loop is **typed state + conditional/cyclic edges + checkpointing** — which is exactly the trio I'd want for a CRAG/Self-RAG loop that re-retrieves."
- "I treat **routing** as a first-class node: a cheap classifier decides single-shot vs. multi-hop vs. tool-only, so I'm not paying for a full agent loop on a FAQ."
- "For A2A I'd expose each of our product agents with an **agent card** (capabilities, auth, cost) so a supervisor agent can delegate — that's the natural evolution of agent-queue's engine flag."
**Honest edge.** "My shipped orchestration is framework-light by choice (bash-portable).
The LangGraph port is scoped (`04 §A`); conceptually it's a 1:1 mapping, not new ground."

---
## B. RAG Architecture — hybrid retrieval, reranking, HyDE, Self-RAG (must) · RAPTOR, multimodal (nice)
**Concept.**
- **Hybrid retrieval** = dense (vector) ⊕ sparse (BM25) ⊕ optionally graph, fused (RRF — reciprocal rank fusion) then **reranked** by a cross-encoder (scores the (query, passage) pair jointly; far more precise than bi-encoder similarity). **ColBERT** = late-interaction reranking (token-level MaxSim) — accurate and scalable.
- **Context compression** = drop/condense retrieved spans to fit the budget and reduce distraction.
- **HyDE** = embed a *hypothetical answer* the LLM drafts, not the raw question — closes the query/document vocabulary gap.
- **CRAG** = grade retrieved docs; if weak, correct (web/alt-source or rewrite) before generating.
- **Self-RAG** = the model emits reflection tokens deciding *whether to retrieve* and *whether its draft is supported*, looping if not.
- **RAPTOR** = recursively cluster + summarize chunks into a tree; retrieve at the abstraction level the query needs (great for "summarize this 200-page filing").
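Reciprocal rank fusion is simple enough to write from memory, which is part of its appeal. It fuses ranked lists by position alone, so BM25 scores and cosine similarities never have to be put on a common scale. This is a minimal sketch. `k = 60` is the constant commonly used since the original RRF paper, but treat it as a tunable assumption. The fused list is what would then go to the cross-encoder.

```python
from __future__ import annotations

from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank_d), ranks from 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so output is deterministic
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```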
**Our anchor.** `packages/extraction` + `extraction-service` (:4005) already turn URLs/docs
into retrievable units. `invt_trdg`'s AI chat is a retrieve-then-reason loop over structured
data today. The hybrid index + rerank + HyDE/CRAG/Self-RAG loop is the headline enhancement
(`04 §B`).
**Say this.**
- "I default to **hybrid + rerank** because pure-vector misses exact-match terms (a regulatory clause number, a ticker, an account ID) that BM25 nails."
- "HyDE and reranking attack **different** failures. HyDE fixes *recall* (the relevant passage never came back), reranking fixes *precision* (it came back but is buried under noise). I tune them independently, against context recall and context precision respectively."
- "RAPTOR earns its cost on **long regulatory corpora** where the answer spans sections; for transactional Q&A it's overkill."
**Honest edge.** Lead with the *reasoning about when to use each*, which is the architect
signal; the implementations are well-trodden and scoped on our roadmap.

---
## C. Structured Retrieval — Text-to-SQL, schema-aware retrieval (must) · Snowflake Cortex, BigQuery ML (nice)
**Concept.** Text-to-SQL turns NL into SQL against a known schema. The enterprise risks are
**wrong joins, full-table scans, and data leakage**. Mitigations: restrict to **read-only
semantic views**, inject **schema + few-shot exemplars** (schema-aware retrieval over table/
column descriptions), validate/parse the SQL before execution, enforce **row-level security**,
and cap cost. Warehouse-native options (Snowflake Cortex, BigQuery ML) push inference next to
governed data.
**Our anchor.** `invt_trdg`'s AI chat is **schema-aware tool-calling** — NL maps to typed
operations (get quote, create trade plan, manage watchlist/alerts) over a known domain. That's
the *safe* form of Text-to-SQL: the model picks a vetted, parameterized tool rather than
emitting arbitrary SQL.
**Say this.**
- "I prefer **typed tool-calling over free Text-to-SQL** wherever the query space is bounded — it's auditable and injection-resistant. I reserve generative SQL for genuine ad-hoc analytics, behind read-only views with RLS."
- "Schema-aware retrieval is itself a RAG problem: I embed table/column docs and retrieve the *relevant* schema slice into the prompt, so the model isn't drowning in a 400-table catalog."
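For the generative-SQL path, one defensive layer can be enforced by the database engine instead of by parsing the model's output. This is a minimal sketch that uses SQLite's authorizer callback (`Connection.set_authorizer` in Python's standard `sqlite3` module) as a stand-in for warehouse-side controls. Only `SELECT` statements that read allow-listed relations compile, and results are capped. The table names and row cap are hypothetical. In Postgres or a warehouse, the same intent is usually expressed with a read-only role, grants on semantic views, and row-level security policies.

```python
from __future__ import annotations

import sqlite3

# Hypothetical allow-list: the only relations generated SQL may read
ALLOWED_RELATIONS = {"positions"}
MAX_ROWS = 100


def _authorizer(action, arg1, arg2, db_name, source):
    """SQLite calls this while compiling each statement; returning DENY aborts it."""
    if action == sqlite3.SQLITE_SELECT:
        return sqlite3.SQLITE_OK
    if action == sqlite3.SQLITE_READ and (arg1 in ALLOWED_RELATIONS or source in ALLOWED_RELATIONS):
        # arg1 is the table being read; source is the view reading it on our behalf, if any
        return sqlite3.SQLITE_OK
    # Writes, DDL, PRAGMA, ATTACH, transactions, and reads of other tables are denied
    return sqlite3.SQLITE_DENY


def install_sql_guard(conn: sqlite3.Connection) -> None:
    """Install on a connection dedicated to model-generated queries."""
    conn.set_authorizer(_authorizer)


def run_generated_sql(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    cur = conn.execute(sql)         # a denied statement raises sqlite3.DatabaseError
    return cur.fetchmany(MAX_ROWS)  # cap the result size
```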
**Honest edge.** "Free Text-to-SQL against a warehouse I've prototyped more than shipped;
in production I've leaned on the tool-calling pattern because it's safer with regulated data."

---
## D. Unstructured Retrieval — PDF parsing, layout-aware chunking (must) · multimodal (nice)
**Concept.** Naïve "split every 1000 chars" chunking destroys meaning. **Layout-aware**
parsing (PyMuPDF, Unstructured.io) preserves headings, tables, lists, reading order; **OCR**
(Tesseract/Azure Doc Intelligence) handles scans. **Semantic chunking** splits on topic
boundaries, not byte counts. Tables and figures need special handling (serialize tables to
markdown; caption images). Each chunk carries **provenance metadata** (doc id, page, section)
— mandatory for citations.
**Our anchor.** `packages/extraction` + `extraction-service` already perform URL/task/doc
extraction; `notelett` models structured notes. Layout-aware PDF + OCR is the additive piece
(`04 §B`).
**Say this.**
- "Chunking is where most RAG quality is won or lost. I chunk on **document structure first, size second**, and I always attach page/section provenance so the answer can cite a clause, not 'a document.'"
- "For regulatory filings I keep **tables intact** and store a text serialization alongside — losing a covenant table to a mid-row split is a correctness bug, not a formatting one."
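Structure-first chunking with provenance can be sketched over markdown-like parser output. The sketch assumes:

- headings start with `#`;
- table rows start with `|`;
- the page number is passed in rather than detected;
- `max_chars` is an arbitrary budget.

A real layout-aware parser would supply page boundaries and table regions itself. The two rules the sketch demonstrates come from the bullets above: split on structure before size, and never split a table.

```python
from __future__ import annotations

import textwrap
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    page: int
    section: str  # nearest heading: provenance for the citation
    text: str


def _blocks(text: str):
    """Yield (section, block, is_table). Headings set the section, consecutive
    table rows form one block, and blank lines end prose blocks."""
    section, buf, buf_is_table = "", [], False
    for line in text.splitlines() + [""]:  # trailing "" flushes the last block
        is_heading = line.startswith("#")
        is_table = line.startswith("|")
        if buf and (is_heading or not line.strip() or is_table != buf_is_table):
            yield section, "\n".join(buf), buf_is_table
            buf = []
        if is_heading:
            section = line.lstrip("#").strip()
        elif line.strip():
            buf.append(line)
            buf_is_table = is_table


def chunk_document(doc_id: str, page: int, text: str, max_chars: int = 800) -> list[Chunk]:
    chunks = []
    for section, block, is_table in _blocks(text):
        if is_table or len(block) <= max_chars:
            # Structure first: a table stays atomic even when it exceeds the budget
            pieces = [block]
        else:
            # Size second: only oversized prose is split, on word boundaries
            pieces = textwrap.wrap(block, width=max_chars)
        chunks.extend(Chunk(doc_id, page, section, piece) for piece in pieces)
    return chunks
```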
---
## E. Graph RAG — KG + vector hybrid (must) · SPARQL / ontology design (nice)
**Concept.** Vector RAG finds *similar* text; it can't answer *"which entities are connected
to X within 2 hops."* Graph RAG retrieves a **subgraph** (entities + relationships) and feeds
it as structured context — ideal for "show the ownership chain," "trace this transaction
counterparty network," KYC/AML link analysis. Patterns: extract entities/relations at ingest,
build a knowledge graph, retrieve by **graph traversal (Gremlin/Cypher) + vector seed**, then
generate over the fused context. SPARQL/ontologies add formal semantics (RDF, classes).
**Our anchor.** We run **Azure Cosmos DB** (`packages/cosmos`), which exposes the **Gremlin**
graph API — the JD lists "Azure Cosmos Gremlin" explicitly. `event-store`/`events` already
model entity relationships over time. Standing up a Gremlin KG + graph-augmented retrieval is
`04 §D`.
**Say this.**
- "I use the graph for the **'connected-to' questions** vector search can't answer, and seed traversals from vector hits — vector finds the entry node, the graph supplies the neighborhood."
- "In banking this is the **AML/KYC** sweet spot: link analysis across counterparties is inherently graph-shaped."
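"Vector finds the entry node, the graph supplies the neighborhood" reduces to a bounded breadth-first expansion from the seed entities. This is a minimal in-memory sketch. Against Cosmos DB the expansion would be a Gremlin traversal instead, and the entity names, edge labels, and two-hop default here are illustrative. It returns edges rather than bare nodes because, for ownership chains and counterparty networks, the relationships are the evidence that goes into the prompt.

```python
from __future__ import annotations

from collections import deque


def neighborhood(
    graph: dict[str, list[tuple[str, str]]],  # entity -> [(relation, neighbor)]
    seeds: list[str],                          # entry nodes found by vector search
    max_hops: int = 2,
) -> list[tuple[str, str, str]]:
    """Breadth-first expansion: (source, relation, target) edges within max_hops of any seed."""
    seen = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    edges: list[tuple[str, str, str]] = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop budget
        for relation, neighbor in graph.get(node, []):
            edges.append((node, relation, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return edges
```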
**Honest edge.** "Cosmos Gremlin is in our stack; the KG is a clean build on infrastructure
we already operate, not a new platform bet."

---
## F. Vector Databases — Pinecone / Weaviate / Azure AI Search (must) · Qdrant, pgvector, multi-tenancy (nice)
**Concept.** Trade-offs: **Pinecone** (managed, serverless, fast to ship), **Weaviate/Qdrant**
(open, hybrid + filtering, self-host control), **Azure AI Search** (managed hybrid: vector +
BM25 + semantic rerank in one service — the natural Azure pick), **pgvector** (lives in your
Postgres → transactional consistency, one backup story, lowest ops). **Multi-tenancy** =
namespaces (Pinecone) / index-per-tenant (Azure) / collection or payload filter (Qdrant) /
schema or row filter (pgvector). Metadata filtering is as important as the ANN index.
**Our anchor.** Postgres is in the stack → **pgvector** is our lowest-friction path and gives
transactional consistency with the source rows. Multi-tenancy is already first-class
(`productId`, two-instance Hermes) — see `01 §6`.
**Say this.**
- "On Azure I'd default to **Azure AI Search** because it gives me hybrid + semantic reranking as a managed service — fewer moving parts in a regulated audit boundary."
- "For tight transactional coupling (the vectors must stay consistent with the source-of-truth rows) I reach for **pgvector** — one database, one backup, one ACL model."
- "I pick the vector DB **last**, after I know tenant count, filter cardinality, recall target, and the audit boundary — it's a consequence of requirements, not a religion."
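A tiny example shows why metadata filtering matters as much as the ANN index. Filtering after a top-k search can starve a tenant of results; filtering before (or inside) the search cannot. This is a minimal sketch with brute-force cosine similarity standing in for an ANN index. The vectors and tenant names are made up, and real engines differ in how they implement filtered search.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def search(docs: list[tuple[str, str, list[float]]], query: list[float], tenant: str,
           k: int, prefilter: bool = True) -> list[str]:
    """Brute-force stand-in for an ANN index; docs are (doc_id, tenant, vector)."""
    def by_similarity(d: tuple[str, str, list[float]]) -> float:
        return cosine(d[2], query)

    if prefilter:
        # Filter, then rank: up to k results the tenant is entitled to
        pool = [d for d in docs if d[1] == tenant]
        return [d[0] for d in sorted(pool, key=by_similarity, reverse=True)[:k]]
    # Rank, then filter: the global top-k may contain few or no entitled docs
    top_k = sorted(docs, key=by_similarity, reverse=True)[:k]
    return [d[0] for d in top_k if d[1] == tenant]
```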
---
## G. Grounding & Eval — RAGAS, TruLens, faithfulness SLAs (must) · LangSmith, LLM-as-judge (nice)
**Concept.** The core metrics:
- **Faithfulness / groundedness** — is every claim supported by retrieved context? (anti-hallucination)
- **Answer relevancy** — does the answer address the question?
- **Context precision** — are the *top* retrieved chunks the relevant ones? (reranker quality)
- **Context recall** — did we retrieve *all* needed evidence? (retriever quality)
- **Answer correctness** — vs. ground truth.
**RAGAS** computes these (often LLM-as-judge). **TruLens** adds the "RAG triad" + feedback
functions + tracing. **DeepEval** is pytest-style assertions for CI. **LangSmith** = tracing/
eval ops. An **SLA** turns metrics into gates ("faithfulness ≥ 0.9 or abstain").
**Our anchor.** `flowmonk` is a *grounding pattern made flesh*: the deterministic scheduler is
the source of truth, and **the AI layer is constrained to explanation / safe recommendation**:
it cannot invent a plan, only narrate one. That's hallucination mitigation by architecture.
`diagnostics-client` / `telemetry-client` / `monitoring` + Hermes dashboards are the
ready-made home for an eval harness + drift monitor (`04 §E`).
**Say this.**
- "I don't ship grounding as a vibe — I ship it as **gates**: a faithfulness threshold that triggers abstain-and-escalate, evaluated continuously, with regressions blocking deploy in CI (DeepEval)."
- "flowmonk taught me the cheapest hallucination fix: **don't let the LLM be the source of truth.** Make a deterministic engine authoritative and scope the model to explanation. In banking that's the difference between a tool and a liability."
- "Eval is a **two-loop** system: offline (golden set in CI) catches regressions pre-deploy; online (sampled production traces + LLM-judge) catches **factual drift** as corpora and models change."
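The retrieval and grounding metrics are easier to defend once you have computed them by hand. This is a minimal sketch that uses human-labelled relevance and claim support instead of an LLM judge:

- **Context precision** follows the rank-weighted form RAGAS documents: the mean of precision@k over the ranks that hold a relevant chunk.
- **Context recall** is the share of required evidence that was retrieved.
- **Faithfulness** is the share of answer claims marked as supported.

RAGAS derives these labels with an LLM, so its scores will not match this sketch exactly.

```python
from __future__ import annotations


def context_precision(relevant: list[bool]) -> float:
    """Rank-weighted precision: mean of precision@k over ranks k holding a relevant chunk."""
    hits, total = 0, 0.0
    for k, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            total += hits / k  # precision@k at this relevant position
    return total / hits if hits else 0.0


def context_recall(retrieved_ids: list[str], required_ids: set[str]) -> float:
    """Share of the evidence the answer needs that actually came back."""
    if not required_ids:
        return 1.0  # nothing was required, so nothing was missed
    return len(required_ids & set(retrieved_ids)) / len(required_ids)


def faithfulness(claims_supported: list[bool]) -> float:
    """Share of the answer's claims that the retrieved context supports."""
    if not claims_supported:
        return 0.0  # conservative: an answer with no checkable claims earns no credit
    return sum(claims_supported) / len(claims_supported)
```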
---
## H. Cloud Platform — Azure (AI Foundry / OpenAI / Search) (must) · AWS Bedrock, GCP Vertex (nice)
**Concept.** Azure AI Foundry = the build/eval/deploy hub; Azure OpenAI = governed model
endpoints with content filtering and data-residency; Azure AI Search = managed hybrid index;
Cosmos DB = docs + Gremlin graph. The architect skill is **provider-portability**: abstract the
model behind a router so Azure OpenAI / Bedrock / Vertex are swappable, and keep data in the
customer's tenancy.
**Our anchor.** Azure Cosmos DB in production (`_AZURE/`, `packages/cosmos`).
`packages/llm-router` is the provider-abstraction layer — Azure OpenAI today, Bedrock/Vertex
swap-in, no app rewrite. `packages/ollama-client` covers on-prem/air-gapped inference.
**Say this.**
- "I keep a **router seam** so model choice is a config decision, not an architecture decision — and so a bank can pin everything to its own Azure tenant for residency."
- "On-prem / air-gapped is real in banking; `ollama-client` in our stack means I've thought about the **no-egress** deployment, not just the SaaS path."
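The router seam is a small interface plus a config lookup. This is a minimal sketch under stated assumptions. `ChatModel`, `EchoModel`, `LLMRouter`, and the route names are hypothetical and are not the `packages/llm-router` API. The fake providers stand in for Azure OpenAI, Bedrock, Vertex, or a local Ollama endpoint. Callers depend only on the protocol and a route name, so pinning a tenant to an on-prem model is a configuration change.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoModel:
    """Fake provider for the sketch; a real adapter would wrap a vendor SDK client."""
    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class LLMRouter:
    def __init__(self, providers: dict[str, ChatModel], routes: dict[str, str]) -> None:
        self.providers = providers  # provider id -> client adapter
        self.routes = routes        # route or tenant -> provider id; must include "default"

    def complete(self, route: str, prompt: str) -> str:
        provider_id = self.routes.get(route, self.routes["default"])
        return self.providers[provider_id].complete(prompt)
```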
---
## I. AI Governance — access-controlled RAG, Zero Trust (must) · SR 11-7, EU AI Act (nice)
**Concept.** Governance is **structural**, applied at every hop (see `01 §4` and `01 §5`):
access-controlled retrieval (you can only retrieve what your identity entitles you to),
row/column masking, role-aware context injection, immutable audit, instant kill-switch,
model cards, RACI. Zero Trust for agents = never trust the agent's request implicitly; verify
identity + scope at the tool boundary every call.
**Our anchor.** `packages/auth` + `fastify-auth` (identity/scope), `field-encrypt` /
`client-encrypt` (column/field masking), `feature-flag-client` + `kill-switch-client` (instant
constraint), `event-store` (immutable audit), MCP server (explicit tool boundary). This is the
**deepest, most defensible** part of the story.
**Say this.**
- "I design retrieval so that **masking and audit are inescapable** — they live at the platform/MCP layer, so no product surface can route around them. That's what 'governance by architecture' means."
- "A kill-switch that disables a model or a single tool **without a redeploy** is how I meet SR 11-7's expectation that a bank can restrict a model's use promptly when monitoring shows a problem."
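Role-aware masking belongs in the retrieval layer, so it applies before anything reaches the prompt or a product surface. This is a minimal sketch, assuming a per-role allow-list of fields visible in clear. The roles, field names, and keep-last-four rule are illustrative, and this is not the `field-encrypt` API. Unknown roles see nothing in clear, which keeps the policy deny-by-default.

```python
from __future__ import annotations

# Hypothetical policy: role -> fields visible in clear. Anything else is masked.
POLICY: dict[str, set[str]] = {
    "support_agent": {"name", "product"},
    "compliance_officer": {"name", "product", "account_no", "tax_id"},
}


def mask_value(value: str) -> str:
    """Keep the last four characters so a human can still tell records apart."""
    if len(value) <= 4:
        return "****"
    return "*" * (len(value) - 4) + value[-4:]


def mask_record(record: dict[str, str], role: str) -> dict[str, str]:
    visible = POLICY.get(role, set())  # unknown roles see nothing in clear (deny by default)
    return {name: value if name in visible else mask_value(value) for name, value in record.items()}
```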
(Full regulatory mapping in `05-banking-blueprints.md`.)

---
## J. Domain: Banking — support / compliance automation (must) · model risk, KYC/AML (nice)
**Concept.** The flagship use cases: **customer-support automation** (grounded answers from
policy/product docs with citations + escalation), **compliance-document retrieval** (find the
controlling clause across filings/policies), **regulatory reporting**, **model risk management**
(SR 11-7), **KYC/AML** (entity/network link analysis → graph RAG).
**Our anchor.** `invt_trdg` is our **regulated-industry analog**: market data, trade plans,
alerts, profiles — a domain where wrong answers have consequences and auditability matters. The
patterns (typed tool-calling, audit, abstain-on-uncertainty) port directly to a bank.
**Say this.**
- "My trading product is the closest non-bank analog to a banking workload: it taught me to **default to abstain over guess** when money or compliance is on the line, and to make every answer traceable to a source."
- "KYC/AML is where my graph-RAG and my governance stories converge — link analysis on a knowledge graph, behind access-controlled, audited retrieval."
**Honest edge.** "I haven't shipped inside a chartered bank; my regulated-domain reps come from
trading and from designing for SR 11-7 / EU AI Act controls — which I can walk through concretely."

# 03 · STAR Interview Bank
Twelve stories, each grounded in real ByteLyst work, in **Situation · Task · Action ·
Result** form, tagged to the JD competency they prove. Keep delivery to ~90 seconds; the
**bold** line is your headline if you only get 20 seconds.
> Integrity note: these describe real systems in this ecosystem (agent-queue, mcp-server,
> llm-router, invt_trdg AI chat, flowmonk, Hermes, extraction-service, two-instance
> isolation). Where a story references planned work, it's labeled — present those as
> *design decisions and roadmaps you own*, not as shipped-and-measured outcomes.
---
## 1. Multi-agent orchestration without a heavy framework
**Proves:** Agentic frameworks · orchestration topology · state-machine design
- **S** — We needed to run long-horizon coding tasks across three different agent engines (claude, codex, devin) unattended, but couldn't take a heavy runtime dependency on the operator VM.
- **T** — Build a reliable multi-agent runner with explicit state, failure handling, and observability — portable down to bash 3.2.
- **A** — Designed `agent-queue` as a **folder-kanban state machine**: `inbox→doing→done/failed` with a `failed→inbox` requeue for human-in-the-loop, an engine flag binding each task to an agent, and `status`/`watch`/`logs` for live observability. The state model maps 1:1 to LangGraph nodes/conditional edges.
- **R** — Tasks run auto-approve, survive failures via requeue, and the kanban gives at-a-glance state. **The lesson I carry: orchestration is a state-machine problem first and a framework choice second — which is exactly why porting it onto LangGraph is low-risk.**
---
## 2. A Zero-Trust tool boundary for agents (MCP)
**Proves:** MCP architecture · Zero Trust · agentic threat modeling · access-controlled retrieval
- **S** — Multiple product agents needed access to sensitive tools (market data, document retrieval) but I refused to hand agents raw credentials or unbounded data access.
- **T** — Make the tool layer a **policy enforcement point**, not a passthrough.
- **A** — Centralized tools behind `mcp-server` (:4007) with `mcp-client`: a typed/versioned tool registry, an authZ check on **every** call (identity + scope + role), column masking via `field-encrypt`, rate/cost caps with a `kill-switch`, and an audit emit to `event-store`. Threat-modeled confused-deputy, tool-poisoning via retrieved content, and exfiltration.
- **R** — Agents hold no secrets; a successful prompt injection still can't exfiltrate unentitled fields, and any tool can be killed live without a redeploy. **Governance lives in the boundary, so no product surface can route around it.**
---
## 3. Grounding by architecture, not by prompt (flowmonk)
**Proves:** Grounding · hallucination mitigation · faithfulness
- **S** — Users wanted an AI planning assistant, but an LLM inventing a "plan" that violates constraints is worse than no assistant.
- **T** — Deliver helpful AI without letting the model be the source of truth.
- **A** — Made a **deterministic scheduler authoritative** and **constrained the AI layer to explanation, summarization, and safe recommendation only**. The model narrates and suggests; it can never author the canonical plan. Recommendations carry an audit trail.
- **R** — The assistant is helpful *and* can't hallucinate an invalid plan into existence. **This is the cheapest, most reliable hallucination fix I know — and it's the pattern I'd bring to any regulated workflow: scope the model to where being wrong is recoverable.**
---
## 4. Schema-aware tool-calling instead of free Text-to-SQL
**Proves:** Structured retrieval · Text-to-SQL judgment · safety
- **S** — `invt_trdg` users wanted natural-language access to quotes, trade plans, watchlists, alerts, and goals.
- **T** — Give NL access to structured data without the injection/runaway-query risk of free Text-to-SQL.
- **A** — Built the AI chat as **typed, parameterized tool-calling** over a known domain: the model selects a vetted operation, not arbitrary SQL. Hybrid asset-class detection (crypto vs. equity) routes to the right tool.
- **R** — Natural-language coverage of the whole product, fully auditable, with no arbitrary query surface. **I reserve generative SQL for genuine ad-hoc analytics behind read-only views with row-level security — bounded domains get tool-calling.**
---
## 5. Provider-portable model layer (llm-router)
**Proves:** Cloud platform · Azure/Bedrock/Vertex portability · cost/latency routing
- **S** — Hard-coding one model provider risked lock-in, blocked data-residency requirements, and made cost/latency tuning a code change.
- **T** — Make model choice a config decision.
- **A** — Introduced `packages/llm-router` as a provider-abstraction seam (Azure OpenAI primary; Bedrock/Vertex swap-in) with `ollama-client` for on-prem/air-gapped inference.
- **R** — A new model or provider is a config change, not a rewrite, and a regulated customer can pin inference to their own tenant. **Portability is a governance feature, not just an engineering nicety — it's how you satisfy data-residency without re-architecting.**
---
## 6. Multi-tenant isolation as a platform default
**Proves:** Vector DB multi-tenancy · namespace isolation · governance
- **S** — Several products share one platform; a cross-tenant data leak would be catastrophic.
- **T** — Make isolation structural, not per-feature discipline.
- **A** — Every product carries a `productId`; Hermes runs **two fully isolated instances (Vijay/Bheem)** with separate users, services, and backup repos. The same model maps directly to vector namespaces / index-per-tenant / pgvector schema-per-tenant.
- **R** — Isolation is the default the whole platform is partitioned by. **When I add a vector store, multi-tenancy isn't a migration — it's the storage expression of a tenant model I already enforce.**
---
## 7. Unstructured ingestion pipeline (extraction-service)
**Proves:** Unstructured retrieval · ingestion · provenance
- **S** — Agents needed to answer from external documents and URLs, not just structured data.
- **T** — Turn messy unstructured sources into clean, retrievable, attributable units.
- **A** — Built `extraction-service` (:4005) + `packages/extraction` to parse URLs/docs into retrievable units; `notelett` provides a structured-notes store for human+agent content.
- **R** — A working ingestion path into the fabric. **The roadmap (layout-aware PDF chunking, OCR, table preservation, page-level provenance) is additive on this spine — and provenance is non-negotiable because every answer must cite a clause, not 'a document.'**
---
## 8. Operational observability for AI systems (Hermes)
**Proves:** Eval-harness home · drift monitoring · production ops
- **S** — Running agentic services in production with no single pane meant blind spots.
- **T** — One control plane for the agentic fabric.
- **A** — Built **Hermes Mission Control** (Next.js + Fastify) with `diagnostics-client`/`telemetry-client`/`monitoring`; the `hermes-ops` module already models both instances as the seed for real data.
- **R** — A live ops console for the ecosystem. **It's the natural home for the eval harness: a faithfulness/relevancy/recall pane plus a factual-drift monitor turns it from infra-ops into AI-quality-ops — which is the v2 roadmap I own.**
---
## 9. Instant blast-radius control (kill-switch + flags)
**Proves:** Governance · Zero Trust · SR 11-7 ("constrain a model in production")
- **S** — A misbehaving model or tool in production needs to be stoppable in seconds, not a deploy cycle.
- **T** — Decouple "turn this off" from "ship a release."
- **A** — Adopted `feature-flag-client` + `kill-switch-client` so any model or individual tool can be disabled live; combined with `event-store` audit so the action is logged.
- **R** — Sub-minute containment without a redeploy. **This is a literal SR 11-7 control: model risk management requires the ability to immediately constrain a model in production, with an audit trail of who constrained it and when.**
---
## 10. Disaster recovery + parity discipline
**Proves:** Production rigor · regulated-grade operations
- **S** — Two Hermes instances existed, but only one had a tested backup/restore path; the second was an operational blind spot.
- **T** — Drive both to parity with persistent backup, watchdog, and **tested** restore.
- **A** — Documented the gap explicitly in the v2 roadmap (`hermes_dashboard_v2_roadmap.md`) and the DR doc, prioritizing the missing backup repo/watchdog/restore for the second instance.
- **R** — A named, prioritized closure plan. **In regulated environments 'we have backups' is not a control until restore is *tested*; I treat untested DR as an open finding, not a checkbox.**
---
## 11. Bounded autonomy with human-in-the-loop
**Proves:** Agentic safety · orchestration · abstain-and-escalate
- **S** — Autonomous agents that never escalate will confidently do the wrong thing.
- **T** — Build escalation into the topology.
- **A** — In `agent-queue`, `failed` routes back to `inbox` for human triage rather than silently retrying forever; in the RAG design, a sub-SLA faithfulness score routes to **abstain/escalate** (see `01 §5`).
- **R** — The system degrades to a human instead of degrading to a hallucination. **The escalation edge is the most important edge in the graph for a regulated deployment.**
---
## 12. Documentation & decision rigor as an architect
**Proves:** ADRs · blueprints · roadmaps · mentoring / CoE contribution
- **S** — A multi-product ecosystem with multiple agent engines drifts without written decisions.
- **T** — Make architecture legible to engineers and execs.
- **A** — Maintained an ADR directory, roadmaps (`hermes_*_roadmap.md`, `deployment-optimization-roadmap.md`), a repo map, and agent-facing `AGENTS.md`/`CLAUDE.md` so both humans and coding agents navigate consistently — and authored this very interview/architecture kit as a reusable accelerator.
- **R** — New contributors (human or agent) onboard from canonical docs. **This is exactly the 'AI Center of Excellence / reusable accelerators' contribution the role asks for — I default to writing the pattern down so it scales past me.**
---
## Behavioral / leadership prompts — quick frames
| Prompt | Lead with |
|---|---|
| "Tell me about a time you influenced without authority." | #12 docs/ADRs driving multi-agent consistency. |
| "A production AI system gave a wrong answer. What did you do?" | #3 grounding-by-architecture + #11 abstain/escalate + #9 kill-switch. |
| "How do you handle disagreement on architecture?" | ADR process — capture options, trade-offs, decision, and revisit date; disagree-and-commit in writing. |
| "Describe mentoring junior engineers." | The `AGENTS.md`/repo-map pattern: I encode the 'how we work here' so it's teachable, then pair on the first real task. |
| "Biggest technical mistake?" | Untested DR on the second Hermes instance (#10) — I'd treated 'backups exist' as 'DR works'; now I gate on a tested restore. |
| "Why this role / why financial services?" | Trading product taught me to engineer for *consequences*; FS is where governance-by-architecture matters most and where my MCP/Zero-Trust depth pays off. |

# 04 · Enhancement Roadmap — make every claim literally true
This is the "what would you build here" answer, and it doubles as a real backlog. Each
enhancement turns an *adjacent* capability into a *shipped* one on infrastructure we
already run. They're ordered by dependency, so each phase builds on the one before; the whole
set is a credible "agentic-RAG fabric, hardened" program.
> Mapping note: these slot into the existing repo conventions — new code under
> `learning_ai_common_plat/packages` + a `services/rag-service`, eval harness surfaced in
> `learning_ai_devops_tools/dashboard` (Hermes), and ADRs under
> `learning_ai_devops_tools/docs/adr/`. Cut tracker items via `scripts/tracker-seed/`.
```mermaid
flowchart LR
A["§A LangGraph port<br/>+ A2A agent cards"] --> B["§B Hybrid retrieval<br/>pgvector+BM25+rerank+HyDE/CRAG/Self-RAG"]
B --> C["§C Guarded Text-to-SQL<br/>read-only views + RLS"]
B --> D["§D Cosmos Gremlin<br/>knowledge graph + Graph RAG"]
B --> E["§E RAGAS/DeepEval harness<br/>+ drift monitor in Hermes"]
C & D --> F["§F Model-card registry<br/>+ governance pack"]
E --> F
classDef p1 fill:#dcfce7,stroke:#16a34a
classDef p2 fill:#fef9c3,stroke:#ca8a04
classDef p3 fill:#fee2e2,stroke:#dc2626
class A,B p1
class C,D,E p2
class F p3
```
| Phase | Enhancements | Why now |
|---|---|---|
| **P1 (foundation)** | §A, §B | Orchestration + retrieval are the spine; everything else attaches to them. |
| **P2 (sources + quality)** | §C, §D, §E | Add structured + graph sources and the eval loop that proves quality. |
| **P3 (governance)** | §F | Wrap the now-real fabric in the regulated-grade governance story. |
---
## §A — Port `agent-queue` topology onto LangGraph + add A2A handoff
**Goal:** make the "prod-grade LangGraph" claim literal while keeping the proven state model.
- New `packages/agent-graph`: a typed `StateGraph` with nodes `route → retrieve → grade → (rewrite) → generate → critique`, conditional + cyclic edges, and a checkpointer backed by `event-store`.
- Keep `agent-queue`'s engine-selection idea as **node-level model binding** through `llm-router`.
- Expose each product agent with an **A2A agent card** (capabilities, auth scope, cost hints) so a supervisor agent can delegate; the card is served by `mcp-server`.
```mermaid
stateDiagram-v2
[*] --> route
route --> retrieve: needs evidence
route --> generate: parametric/FAQ
retrieve --> grade
grade --> rewrite: low relevance (CRAG)
rewrite --> retrieve
grade --> generate: ok
generate --> critique
critique --> rewrite: ungrounded (Self-RAG)
critique --> [*]: grounded + cited
```
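
To sanity-check the control flow before porting it, the loop in the diagram can be sketched framework-free in Python. This is not LangGraph or the proposed `packages/agent-graph`: each node is a plain callable passed in (all function names here are hypothetical), the parametric/FAQ lane returns right after `generate` instead of passing through `critique`, and a bounded-rewrite-then-abstain exit is added so the cycle always terminates.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RagState:
    query: str
    docs: List[str] = field(default_factory=list)
    answer: Optional[str] = None  # stays None when the run abstains
    rewrites: int = 0
    trace: List[str] = field(default_factory=list)  # node visit order (what a checkpointer would persist)


def run_graph(
    state: RagState,
    needs_evidence: Callable[[str], bool],
    retrieve: Callable[[str], List[str]],
    is_relevant: Callable[[str, List[str]], bool],
    rewrite: Callable[[str], str],
    generate: Callable[[str, List[str]], str],
    is_grounded: Callable[[str, List[str]], bool],
    max_rewrites: int = 2,
) -> RagState:
    state.trace.append("route")
    if not needs_evidence(state.query):
        # Parametric/FAQ lane: answer without retrieval (critique skipped in this sketch).
        state.trace.append("generate")
        state.answer = generate(state.query, [])
        return state
    while True:
        state.trace.append("retrieve")
        state.docs = retrieve(state.query)
        state.trace.append("grade")
        if is_relevant(state.query, state.docs):  # CRAG-style relevance gate
            state.trace.append("generate")
            draft = generate(state.query, state.docs)
            state.trace.append("critique")
            if is_grounded(draft, state.docs):  # Self-RAG-style groundedness check
                state.answer = draft
                return state
        # Weak retrieval or ungrounded draft: rewrite and retry, a bounded number of times.
        if state.rewrites >= max_rewrites:
            state.trace.append("abstain")
            return state
        state.rewrites += 1
        state.trace.append("rewrite")
        state.query = rewrite(state.query)


# Usage: force a low-relevance first retrieval and check that the run loops through "rewrite".
_calls = {"n": 0}


def fake_retrieve(query: str) -> List[str]:
    _calls["n"] += 1
    return ["noise"] if _calls["n"] == 1 else ["policy clause 4.2"]


final = run_graph(
    RagState("KYC retention period?"),
    needs_evidence=lambda q: True,
    retrieve=fake_retrieve,
    is_relevant=lambda q, docs: docs != ["noise"],
    rewrite=lambda q: q + " (rewritten)",
    generate=lambda q, docs: f"Answer citing {docs[0]}",
    is_grounded=lambda answer, docs: docs[0] in answer,
)
print(final.trace)
# ['route', 'retrieve', 'grade', 'rewrite', 'retrieve', 'grade', 'generate', 'critique']
```

The usage at the bottom mirrors the acceptance criterion: a forced low-relevance first retrieval visibly routes through `rewrite`. In the real graph, `trace` is roughly what the `event-store`-backed checkpointer would persist per step.
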
**Acceptance:** a LangGraph run with a forced low-relevance retrieval demonstrably loops
through `rewrite`; checkpoints land in `event-store`; one product reachable via A2A card.
**Effort:** M. **Risk:** low (mapping is 1:1 with today's state machine).
---
## §B — Hybrid retrieval: pgvector + BM25 + rerank + HyDE / CRAG / Self-RAG
**Goal:** turn "I understand hybrid RAG" into a running `services/rag-service`.
- **pgvector** alongside the existing Postgres → one DB, one backup, transactional consistency with source rows; **schema-per-tenant** namespaces (mirrors `productId`).
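- **BM25** lexical retrieval (an OpenSearch sidecar, whose default similarity is BM25, or Postgres FTS as the lighter option, noting that its `ts_rank` is not true BM25) fused with vector via **RRF**.
- **Rerank** the fused candidates with a cross-encoder (bge-reranker) or ColBERT late interaction; **context compression** to fit budget.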
- **HyDE** query rewriter node; **CRAG** relevance gate; **Self-RAG** groundedness critic (the §A nodes).
- **Layout-aware ingestion** in `extraction-service`: PyMuPDF / Unstructured.io, OCR fallback, table preservation, **page/section provenance** on every chunk.
```mermaid
flowchart LR
Q --> HYDE[HyDE rewrite] --> EMB[embed]
EMB --> VEC[(pgvector ANN)]
Q --> BM[(BM25)]
VEC & BM --> RRF[RRF fuse] --> RR[cross-encoder rerank] --> CC[context compress] --> GEN
```
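
The RRF fuse step is small enough to show in full. A minimal Python sketch, assuming each retriever returns chunk ids best-first (the ids below are made up); `k = 60` is the constant used in the original RRF paper and a common default, not a value tuned for this corpus.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d)), ranks 1-based."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["c7", "c2", "c9"]  # pgvector ANN order
bm25_hits = ["c2", "c4", "c7"]    # lexical order, e.g. an exact clause-number match first
print(rrf_fuse([vector_hits, bm25_hits]))
# ['c2', 'c7', 'c4', 'c9']
```

Because RRF uses ranks only, pgvector distances and lexical scores never need to be put on a common scale.
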
**Acceptance:** hybrid beats vector-only on a golden set (context-recall ↑, context-precision ↑);
every chunk carries doc/page/section provenance; abstain fires when reranked top-score < τ.
**Effort:** L. **Risk:** medium (reranker latency budget — mitigate with rerank-top-k only).
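
The latency mitigation and the abstain criterion can live in one small function: rerank only the first `top_k` fused candidates, and abstain when the best reranked score is below τ. In this sketch `score_pair` stands in for a cross-encoder, `toy_score` is a word-overlap placeholder rather than a relevance model, and `top_k = 20` and `tau = 0.5` are assumed values to be calibrated on the golden set.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def rerank_or_abstain(
    query: str,
    fused: Sequence[str],
    score_pair: Callable[[str, str], float],
    top_k: int = 20,
    tau: float = 0.5,
) -> Optional[List[Tuple[str, float]]]:
    """Rerank only the first top_k fused candidates; return None (abstain) if the best score < tau."""
    candidates = list(fused[:top_k])  # bound reranker cost: never score the whole candidate pool
    scored = sorted(((c, score_pair(query, c)) for c in candidates), key=lambda x: x[1], reverse=True)
    if not scored or scored[0][1] < tau:
        return None  # insufficient evidence: the caller routes to abstain / escalate
    return scored


def toy_score(query: str, passage: str) -> float:
    # Placeholder scorer (share of query words present in the passage), NOT a cross-encoder.
    q_words, p_words = set(query.lower().split()), set(passage.lower().split())
    return len(q_words & p_words) / len(q_words)


passages = ["kyc records retained five years", "office holiday schedule"]
print(rerank_or_abstain("kyc records retention", passages, toy_score, top_k=2))
print(rerank_or_abstain("wire transfer limits", passages, toy_score))  # None: abstain
```
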
---
## §C — Guarded Text-to-SQL tool
**Goal:** add genuine generative SQL for ad-hoc analytics without the foot-guns.
- Register a `sql-query` tool on `mcp-server` scoped to **read-only semantic views** (no base tables), with **row-level security** by tenant/role.
- **Schema-aware retrieval:** embed table/column descriptions; retrieve only the relevant schema slice into the prompt (don't dump the catalog).
- Parse + validate generated SQL (allow-list of statements, forbid cross-schema joins, enforce `LIMIT`); cost-cap and timeout.
- Audit every generated query + row count to `event-store`.
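
As defense in depth in front of the views and RLS (which remain the real control), a pre-execution check might look like the Python sketch below. It is deliberately naive: regex heuristics instead of a real SQL parser, so string literals, comma joins, `LIMIT ... OFFSET` and functions such as `EXTRACT(... FROM ...)` can cause false rejections or misses. `ALLOWED_SCHEMA`, `MAX_ROWS`, `RejectedQuery` and `guard_sql` are hypothetical names.

```python
import re

ALLOWED_SCHEMA = "analytics_views"  # hypothetical schema holding only read-only semantic views
MAX_ROWS = 1000                     # hypothetical row cap

WRITE_OR_DDL = re.compile(
    r"\b(insert|update|delete|merge|drop|alter|create|grant|revoke|truncate|copy|call|execute)\b",
    re.IGNORECASE,
)
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)
TRAILING_LIMIT = re.compile(r"\blimit\s+(\d+)\s*$", re.IGNORECASE)


class RejectedQuery(ValueError):
    """Raised before execution; the caller logs the rejection to the audit trail."""


def guard_sql(sql: str) -> str:
    stmt = sql.strip().rstrip(";").strip()
    if ";" in stmt:
        raise RejectedQuery("multiple statements")
    if not re.match(r"select\b", stmt, re.IGNORECASE):
        raise RejectedQuery("only single SELECT statements are allowed")
    if WRITE_OR_DDL.search(stmt):
        raise RejectedQuery("write or DDL keyword present")
    # Every FROM/JOIN target must be qualified with the allowed schema (no cross-schema joins).
    for ref in TABLE_REF.findall(stmt):
        schema, _, _table = ref.rpartition(".")
        if schema != ALLOWED_SCHEMA:
            raise RejectedQuery(f"table outside {ALLOWED_SCHEMA}: {ref}")
    # Enforce a LIMIT: append one if missing, clamp one that is too large.
    m = TRAILING_LIMIT.search(stmt)
    if m is None:
        return f"{stmt} LIMIT {MAX_ROWS}"
    if int(m.group(1)) > MAX_ROWS:
        return f"{stmt[:m.start()]}LIMIT {MAX_ROWS}"
    return stmt
```

A production version would parse the statement into an AST and walk its table references instead of pattern-matching text.
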
**Acceptance:** an attempt to read an unentitled column is blocked at the view/RLS layer
and logged; a malformed/oversized query is rejected pre-execution.
**Effort:** M. **Risk:** medium (this is the highest-leakage surface — keep it behind views).
---
## §D — Cosmos Gremlin knowledge graph + Graph RAG
**Goal:** answer "connected-to" questions (KYC/AML-shaped) on infra we already run.
- Use the existing **Azure Cosmos DB Gremlin** API. Entity/relation extraction at ingest (from `extraction-service` output + structured rows) builds the graph.
- **Graph-augmented retrieval:** vector hit seeds an entry node → bounded Gremlin traversal returns the subgraph → fuse subgraph + text chunks into context.
- Expose a `graph-query` tool on `mcp-server` (read-only, depth-bounded).
```mermaid
flowchart LR
Q --> V[(vector seed)] --> N[entry entity]
N --> G[(Gremlin traversal<br/>≤2 hops)]
G --> SUB[subgraph]
SUB --> FUSE[fuse w/ text chunks] --> GEN
```
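
The depth-bounded traversal can be prototyped in memory before writing Gremlin. This Python sketch is a stand-in, not Cosmos DB client code: edges are `(source, relation, target)` tuples, relations are walked in both directions, and `max_nodes` is an assumed size budget alongside the 2-hop limit. The entities are invented.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source entity, relation, target entity)


def bounded_subgraph(edges: List[Edge], seed: str, max_hops: int = 2, max_nodes: int = 200) -> List[Edge]:
    """Collect edges reachable within max_hops of the seed entity, in either direction."""
    adjacency: Dict[str, List[Edge]] = {}
    for edge in edges:
        src, _rel, dst = edge
        adjacency.setdefault(src, []).append(edge)
        adjacency.setdefault(dst, []).append(edge)  # walk relations both ways

    seen_nodes: Set[str] = {seed}
    seen_edges: Set[Edge] = set()
    subgraph: List[Edge] = []
    frontier = deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # depth bound: do not expand past max_hops
        for edge in adjacency.get(node, []):
            if edge not in seen_edges:
                seen_edges.add(edge)
                subgraph.append(edge)
            neighbor = edge[2] if edge[0] == node else edge[0]
            if neighbor not in seen_nodes and len(seen_nodes) < max_nodes:  # size bound
                seen_nodes.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return subgraph


edges = [
    ("Acme Ltd", "owned_by", "J. Doe"),
    ("J. Doe", "director_of", "Shell Co"),
    ("Shell Co", "registered_in", "Jurisdiction X"),
]
# Two hops reach Acme -> J. Doe -> Shell Co; the third edge is out of range.
print(bounded_subgraph(edges, "Acme Ltd"))
```
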
**Acceptance:** a 2-hop relationship question that vector-only fails is answered correctly
with the subgraph cited; traversal depth/time bounded.
**Effort:** L. **Risk:** medium (graph modeling + traversal cost).
---
## §E — Evaluation harness + factual-drift monitor in Hermes
**Goal:** make "RAGAS / faithfulness SLAs / drift monitoring" real and visible.
- **Offline (CI):** **DeepEval** pytest-style assertions on a golden set — faithfulness, answer-relevancy, context-precision, context-recall, answer-correctness. Regression below threshold **blocks deploy**.
- **Online:** sample production traces, score with **RAGAS / LLM-as-judge**, emit metrics via `telemetry-client`.
- **Hermes pane:** a "RAG Quality" panel (extends `hermes-ops`) trending the five metrics per tenant + a **drift alert** when faithfulness/recall degrade week-over-week.
- Wire **abstain rate** and **escalation rate** as first-class SLAs.
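
In the CI step, the "≥ SLA?" decision can be a plain threshold check over aggregate golden-set scores. How the scores are produced (for example DeepEval metrics inside pytest) is out of scope; this Python sketch assumes they arrive as a dict from metric name to mean score. Only faithfulness ≥ 0.9 comes from this kit's blueprints; every other threshold, and the names `SLAS`, `sla_violations` and `gate`, are placeholders.

```python
from typing import Dict, List, Tuple

# Placeholder SLA sheet: metric -> (threshold, "min" | "max").
SLAS: Dict[str, Tuple[float, str]] = {
    "faithfulness": (0.90, "min"),
    "answer_relevancy": (0.85, "min"),
    "context_precision": (0.80, "min"),
    "context_recall": (0.80, "min"),
    "answer_correctness": (0.75, "min"),
    "abstain_rate": (0.15, "max"),     # abstaining too often is also a regression
    "escalation_rate": (0.10, "max"),
}


def sla_violations(scores: Dict[str, float], slas: Dict[str, Tuple[float, str]] = SLAS) -> List[str]:
    """Compare golden-set aggregate scores to the SLA sheet; a missing metric fails closed."""
    problems = []
    for metric, (threshold, direction) in slas.items():
        if metric not in scores:
            problems.append(f"{metric}: missing")
            continue
        value = scores[metric]
        ok = value >= threshold if direction == "min" else value <= threshold
        if not ok:
            problems.append(f"{metric}: {value:.3f} breaches {direction} {threshold:.2f}")
    return problems


def gate(scores: Dict[str, float]) -> int:
    """Exit code for the CI step: 0 ships, 1 blocks the deploy."""
    problems = sla_violations(scores)
    for p in problems:
        print("SLA FAIL", p)
    return 1 if problems else 0
```
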
```mermaid
flowchart TB
subgraph CI["Offline / CI (DeepEval)"]
G[golden set] --> SC1[score] --> GATE{≥ SLA?}
GATE -- no --> BLOCK[block deploy]
GATE -- yes --> SHIP[ship]
end
subgraph PROD["Online (RAGAS / judge)"]
TR[sampled traces] --> SC2[score] --> TEL[telemetry-client] --> HERMES[Hermes RAG-Quality pane]
HERMES --> DRIFT{drift?} -- yes --> ALERT[alert + open finding]
end
```
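
On the online side, the drift check can start as a week-over-week comparison of per-tenant means. Assumptions in this Python sketch: scores arrive as sampled per-trace values grouped by tenant and metric, only faithfulness and context recall are watched (as in the pane description), and an absolute drop of more than `max_drop = 0.05` raises an alert. A production monitor would likely add a minimum sample size or a significance test.

```python
from statistics import mean
from typing import Dict, List, Sequence

Scores = Dict[str, Dict[str, Sequence[float]]]  # tenant -> metric -> sampled trace scores


def drift_alerts(
    last_week: Scores,
    this_week: Scores,
    watched: Sequence[str] = ("faithfulness", "context_recall"),
    max_drop: float = 0.05,
) -> List[str]:
    """Alert when a watched metric's weekly mean falls by more than max_drop for any tenant."""
    alerts = []
    for tenant, metrics in this_week.items():
        baseline = last_week.get(tenant)
        if baseline is None:
            continue  # new tenant: no baseline to drift from yet
        for metric in watched:
            before, after = baseline.get(metric), metrics.get(metric)
            if not before or not after:
                continue  # nothing sampled for this metric in one of the weeks
            if mean(before) - mean(after) > max_drop:
                alerts.append(f"{tenant}/{metric}: {mean(before):.2f} -> {mean(after):.2f}")
    return alerts
```
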
**Acceptance:** a deliberately-degraded retriever fails the CI gate; the Hermes pane shows
the five metrics per tenant and fires a drift alert on a seeded regression.
**Effort:** M. **Risk:** low-medium (judge cost — sample, don't score 100%).
---
## §F — Model-card registry + governance pack
**Goal:** the regulated-grade documentation/audit layer (SR 11-7 / EU AI Act ready).
- **Model-card registry** (a `governance` package + Hermes pane): per deployed model/agent — purpose, data sources, eval scores, known limits, owner, last-reviewed date, kill-switch link.
- **Decision log:** every generation's (query, retrieved sources, model, faithfulness score, abstain/answer) to `event-store` → reproducible audit trail.
- **RACI doc** template per engagement; **ADR** set under `docs/adr/` for each architectural choice.
- Map controls to **SR 11-7** (model inventory, validation, monitoring, change control) and **EU AI Act** (risk classification, logging, human oversight, transparency) — see `05-banking-blueprints.md`.
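
For the decision log, the key requirement is that one logged line is enough to reconstruct the decision. A minimal Python sketch using the fields listed above; `request_id`, the `doc#page#section` source format and the JSON-lines encoding are assumptions, and the actual append to `event-store` is left out.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class DecisionRecord:
    request_id: str                # hypothetical correlation id
    query: str
    sources: List[str]             # provenance ids, e.g. "doc#page#section" (format is an assumption)
    model: str                     # model/provider identifier as resolved by the router
    faithfulness: Optional[float]  # None if the run never reached scoring
    outcome: str                   # "answer" or "abstain"

    def __post_init__(self) -> None:
        if self.outcome not in ("answer", "abstain"):
            raise ValueError(f"unexpected outcome {self.outcome!r}")


def to_event(record: DecisionRecord) -> str:
    """Serialize with sorted keys so the same decision always yields the same line."""
    return json.dumps(asdict(record), sort_keys=True)


def replay(events: Iterable[str], request_id: str) -> Optional[DecisionRecord]:
    """Reconstruct one decision from the append-only log, or None if it was never logged."""
    for line in events:
        data = json.loads(line)
        if data["request_id"] == request_id:
            return DecisionRecord(**data)
    return None
```
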
**Acceptance:** every production model has a card with current eval scores + owner; any
answer can be reconstructed from the decision log; controls trace to named regulatory clauses.
**Effort:** M. **Risk:** low (mostly assembly over existing `event-store`/flags/auth).
---
## Sequencing & "what I'd do in the first 90 days" (great closing answer)
```mermaid
gantt
title Agentic-RAG hardening — 90-day view
dateFormat X
axisFormat %s
section Foundation
§A LangGraph + A2A :a, 0, 3
§B Hybrid retrieval :b, 1, 5
section Sources & Quality
§C Guarded Text-to-SQL :c, 5, 3
§D Graph RAG (Gremlin) :d, 5, 4
§E Eval harness + drift :e, 4, 4
section Governance
§F Model cards + RACI :f, 8, 3
```
> *"In 90 days I'd stand up the retrieval spine and the eval harness first — because you
> can't tune what you can't measure — then layer structured + graph sources, and close with
> the governance pack so the whole thing is audit-ready. Notice governance isn't last
> because it's least important; it's last because by then it's mostly **assembling controls
> the platform already enforces** (auth, masking, kill-switch, audit) into cards and RACI."*

# 05 · Banking Solution Blueprints (client-ready)
Two end-to-end blueprints you can present to a financial-services client, in the JD's own
deliverable formats: **solution architecture + ADRs + phased roadmap + regulatory mapping**.
Both reuse the ByteLyst fabric patterns from `01-ecosystem-rag-fabric.md`.
---
# Blueprint 1 — Compliance Document Retrieval Assistant
**Use case:** compliance analysts ask natural-language questions ("What is our retention
obligation for KYC records under the latest policy?") and get a **grounded, cited** answer
drawn from regulatory filings, internal policies, and procedure manuals — or an explicit
*"insufficient evidence, escalate."*
## Architecture
```mermaid
flowchart TB
AN[👤 Compliance analyst] --> APP[Assistant UI]
APP --> ORCH
subgraph ORCH["Agentic orchestration (LangGraph)"]
R{{route}} --> RET[retrieve] --> GR{{CRAG grade}}
GR -- weak --> RW[HyDE rewrite] --> RET
GR -- ok --> GEN[generate + cite] --> CR{{Self-RAG critic}}
CR -- ungrounded --> RW
CR -- grounded --> OUT[answer + clause citations]
CR -- no evidence --> ESC[escalate to human]
end
subgraph RETR["Hybrid retrieval"]
VEC[(Azure AI Search<br/>vector + BM25 + semantic rerank)]
KG[(Cosmos Gremlin<br/>policy ⇄ regulation graph)]
end
RET --> VEC & KG
subgraph GOV["Governance plane"]
ACL[role-aware ACL filter]
AUD[event-store audit]
CARD[model card + decision log]
end
RET -.-> ACL
GEN -.-> AUD
OUT -.-> CARD
```
**Why these choices (headline ADRs below):** Azure AI Search gives managed hybrid +
semantic rerank inside one audit boundary; the Gremlin graph links *policies ↔ controlling
regulations* so "what regulation drives this clause" is a traversal, not a guess; and the
critic-plus-escalate edge means every compliance question ends in a cited answer or an
escalation, never an uncited guess.
## Ingestion (layout-aware, provenance-first)
```mermaid
flowchart LR
DOC[Filings · policies · procedures<br/>PDF/DOCX/scans] --> PARSE[PyMuPDF / Unstructured.io<br/>+ OCR fallback]
PARSE --> CHUNK[layout + semantic chunking<br/>tables preserved]
CHUNK --> META[attach provenance<br/>doc·page·section·effective-date·sensitivity]
META --> EMB[embed] --> IDX[(Azure AI Search index per tenant)]
META --> GRAPH[(extract policy↔reg edges → Gremlin)]
```
> **Effective-date metadata is a compliance requirement, not a nicety:** retrieval must be
> able to answer "as of" a date and never cite a superseded policy as current.
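
Both requirements (role-aware ACL and "as of" answers) can be enforced as a metadata filter applied before ranking. A minimal Python sketch, assuming each chunk carries an `effective_from` / `effective_to` window (with `effective_to` set when a newer version supersedes it) and a sensitivity label; the field names and label values are hypothetical. In Azure AI Search the same predicate would typically be pushed down as a query filter rather than run in application code.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Chunk:
    doc: str
    page: int
    section: str
    effective_from: date
    effective_to: Optional[date]  # None = still in force
    sensitivity: str              # e.g. "internal", "restricted" (hypothetical labels)
    text: str


def eligible_chunks(chunks: Iterable[Chunk], as_of: date, entitled: FrozenSet[str]) -> List[Chunk]:
    """Keep only chunks the caller may see AND that were in force on as_of."""
    return [
        c for c in chunks
        if c.sensitivity in entitled
        and c.effective_from <= as_of
        and (c.effective_to is None or as_of < c.effective_to)
    ]


v1 = Chunk("KYC-policy", 4, "4.2", date(2021, 1, 1), date(2024, 7, 1), "internal", "v1 text")
v2 = Chunk("KYC-policy", 4, "4.2", date(2024, 7, 1), None, "internal", "v2 text")
memo = Chunk("legal-memo", 1, "1", date(2024, 1, 1), None, "restricted", "memo text")
# An analyst entitled to "internal" asking as of 2025 sees only the current policy version.
print(eligible_chunks([v1, v2, memo], date(2025, 1, 1), frozenset({"internal"})))
```
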
## Phased delivery
| Phase | Scope | Exit criteria |
|---|---|---|
| **0 · Discovery (2-3 wks)** | Corpus inventory, sensitivity classification, golden-question set with SMEs, success SLAs | Signed-off SLA sheet (faithfulness ≥ 0.9, citation 100%, abstain instead of guess) |
| **1 · PoC (4-6 wks)** | Hybrid retrieval over a bounded corpus, citations, abstain path | Beats keyword search on the golden set; every answer cited or escalated |
| **2 · Hardening (6-8 wks)** | Graph links, role-aware ACL, RAGAS/DeepEval CI gate, drift monitor | SLAs met under eval harness; controls mapped to SR 11-7 |
| **3 · Production (ongoing)** | Model cards, audit, human-in-loop ops, change control | Audit trail reproducible; quarterly model-card review live |
---
# Blueprint 2 — Customer-Support Automation (retail banking)
**Use case:** a grounded support agent answers customer questions from product docs, fee
schedules, and account-policy content — with **strict masking of customer PII**, citations,
and instant handoff to a human for anything account-specific or low-confidence.
## Architecture
```mermaid
flowchart TB
C[👤 Customer] --> CH[Support chat]
CH --> ORCH2
subgraph ORCH2["Orchestration"]
RT{{route:<br/>info vs. account-action}}
RT -- "general info" --> RAG[grounded RAG answer]
RT -- "account-specific" --> AUTHZ{step-up auth + entitlement}
AUTHZ -- ok --> TOOL[typed account tool via MCP<br/>masked fields]
AUTHZ -- fail / sensitive --> HUMAN[human handoff]
RAG --> CONF{confidence ≥ SLA?}
CONF -- no --> HUMAN
CONF -- yes --> ANS[answer + citation]
end
subgraph GOV2["Zero-Trust + governance"]
MASK[field-encrypt column masking]
KILL[kill-switch per tool/model]
LOG[event-store audit]
end
TOOL -.-> MASK
RT -.-> KILL
ANS -.-> LOG
TOOL -.-> LOG
```
**Key design stances:**
- **Two lanes by intent.** General-info → RAG over public/internal docs. Account-specific →
typed MCP tool behind **step-up auth + entitlement check + field masking**. The model never
free-queries customer data.
- **Confidence gate → human.** Below SLA, hand off. In banking support, escalation is a
feature, not a failure.
- **PII never enters the prompt unmasked.** Masking is enforced at the MCP boundary
(`field-encrypt`), so no prompt-engineering mistake can leak it.
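
The "masking at the boundary" stance can be sketched as a wrapper around each typed tool. This is a generic Python illustration, not the `field-encrypt` package API; the field names, the policy sets and `get_account_summary` are hypothetical. Dropping unclassified fields (an allow-list) rather than masking a deny-list is a deliberate choice: a new column added upstream stays invisible to the model until someone classifies it.

```python
from functools import wraps
from typing import Any, Callable, Dict

# Hypothetical field policy for one account tool. Anything not listed is dropped (fail closed).
PASS_THROUGH = {"product", "balance_band", "status"}
LAST_FOUR = {"account_number", "card_number"}


def mask_field(name: str, value: Any) -> Any:
    if name in PASS_THROUGH:
        return value
    if name in LAST_FOUR:
        s = str(value)
        return "*" * len(s) if len(s) <= 4 else "*" * (len(s) - 4) + s[-4:]
    return None  # unclassified or sensitive: never forwarded to the model


def masked_tool(tool: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
    """Apply the field policy at the tool boundary, whatever the prompt asked for."""
    @wraps(tool)
    def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        raw = tool(*args, **kwargs)
        masked = {k: mask_field(k, v) for k, v in raw.items()}
        return {k: v for k, v in masked.items() if v is not None}
    return wrapper


@masked_tool
def get_account_summary(customer_id: str) -> Dict[str, Any]:
    # Stand-in for a parameterized lookup that runs only after step-up auth + entitlement checks.
    return {"account_number": "12345678", "balance_band": "1k-5k",
            "email": "jane@example.com", "ssn": "000-00-0000"}


print(get_account_summary("c-123"))
# {'account_number': '****5678', 'balance_band': '1k-5k'}
```
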
## Phased delivery (condensed)
1. **Discovery** — intent taxonomy, what's answerable-from-docs vs. needs-account-access, PII map, SLAs.
2. **PoC** — info-lane RAG with citations + handoff; no account access yet.
3. **Account lane** — MCP typed tools, step-up auth, masking, full audit.
4. **Production** — eval harness, drift monitor, model cards, change control.
---
# Cross-cutting: Regulatory control mapping
This table is gold in the room — it shows you map *architecture* to *named clauses*.
| Requirement | Source | How the architecture satisfies it |
|---|---|---|
| Model inventory & ownership | **SR 11-7** | Model-card registry (`04 §F`): every model/agent has a card with owner + purpose. |
| Independent validation | **SR 11-7 / OCC** | RAGAS/DeepEval harness (`04 §E`) produces repeatable eval evidence that an independent validation team can rerun and challenge. |
| Ongoing monitoring | **SR 11-7** | Online RAGAS scoring + factual-drift alerts in Hermes. |
| Ability to constrain a model in production | **SR 11-7** | `kill-switch-client` disables a model/tool live, audited. |
| Change control | **SR 11-7** | ADRs + CI eval gate; no deploy below faithfulness SLA. |
| Risk classification of AI system | **EU AI Act** | Blueprint declares risk tier; high-risk paths get human oversight by design. |
| Logging & traceability | **EU AI Act** | `event-store` decision log: query, sources, model, score, outcome — reproducible. |
| Human oversight | **EU AI Act** | Confidence-gate → human handoff edge in both blueprints. |
| Transparency to user | **EU AI Act** | Mandatory citations + "AI-assisted" disclosure + abstain language. |
| Right to data protection / minimization | **GDPR / CCPA** | Field-level masking, role-aware retrieval, retrieve-only-entitled-chunks. |
| Data subject access / deletion | **GDPR / CCPA** | Provenance metadata + tenant namespaces make targeted deletion + re-index feasible. |
---
# Sample ADRs (the format they want you to produce)
### ADR-001 — Hybrid retrieval over pure-vector
- **Status:** Accepted
- **Context:** Compliance queries hinge on exact identifiers (clause numbers, reg citations) that dense retrieval misses.
- **Decision:** Vector ⊕ BM25 fused with RRF, then cross-encoder rerank.
- **Consequences:** +latency from rerank (mitigate: rerank top-k only); expected recall/precision gain on identifier-bearing queries, to be confirmed on the golden set.
### ADR-002 — Typed MCP tool-calling over free Text-to-SQL for account data
- **Status:** Accepted
- **Context:** Account data is the highest-leakage surface; free SQL is hard to audit and hard to make injection-proof.
- **Decision:** Account access only via typed, parameterized MCP tools behind auth + masking; generative SQL restricted to read-only analytics views with RLS.
- **Consequences:** Slightly less flexible NL→data coverage; dramatically smaller attack surface and clean audit.
### ADR-003 — Abstain-and-escalate as a first-class outcome
- **Status:** Accepted
- **Context:** In regulated support/compliance, a confident wrong answer is the worst outcome.
- **Decision:** Faithfulness/confidence below SLA routes to human handoff; tracked as an SLA, not an error.
- **Consequences:** Higher human-handoff rate early; measurable safety + trust; abstain-rate becomes a tuning signal.
### ADR-004 — Provider-portable model layer (router seam)
- **Status:** Accepted
- **Context:** Data-residency + vendor-risk requirements vary per client.
- **Decision:** All inference behind `llm-router`; default Azure OpenAI, swap-in Bedrock/Vertex, on-prem via Ollama.
- **Consequences:** Small abstraction cost; residency + vendor-risk satisfied by config, not re-architecture.
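
The router seam in ADR-004 amounts to "callers ask for a model; configuration decides which provider answers." A minimal Python sketch, not the `llm-router` API: `ChatModel`, `EchoModel`, `PROVIDERS` and `model_for` are hypothetical, and the echo model stands in for real vendor SDK adapters so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Protocol


class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoModel:
    """Offline stand-in; a real adapter would wrap one vendor SDK behind the same method."""
    provider: str

    def complete(self, prompt: str) -> str:
        return f"[{self.provider}] {prompt}"


# Hypothetical registry: keys mirror ADR-004's options, factories are placeholders.
PROVIDERS: Dict[str, Callable[[], ChatModel]] = {
    "azure-openai": lambda: EchoModel("azure-openai"),
    "bedrock": lambda: EchoModel("bedrock"),
    "vertex": lambda: EchoModel("vertex"),
    "ollama": lambda: EchoModel("ollama"),
}


def model_for(provider: Optional[str] = None, allowed: Optional[List[str]] = None) -> ChatModel:
    """Resolve a model from tenant config; residency rules are enforced here, not in callers."""
    chosen = provider or "azure-openai"  # ADR-004 default
    if allowed is not None and chosen not in allowed:
        raise ValueError(f"provider {chosen!r} is not permitted for this tenant")
    if chosen not in PROVIDERS:
        raise ValueError(f"unknown provider {chosen!r}")
    return PROVIDERS[chosen]()


# An air-gapped tenant pins inference to on-prem by configuration alone.
print(model_for("ollama", allowed=["ollama"]).complete("summarize policy 4.2"))
# [ollama] summarize policy 4.2
```

Keeping the residency check inside `model_for` means a misconfigured caller fails loudly instead of silently sending data to a non-permitted provider.
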

# 06 · Glossary & Rapid-Fire Quick-Reference
The night-before doc. Crisp definitions + one-liners you can fire back. If you can say the
**bold** line cleanly for each, you sound fluent.
---
## Advanced RAG techniques
| Term | What it is | One-liner |
|---|---|---|
| **HyDE** | Hypothetical Document Embeddings — LLM drafts a hypothetical answer; you embed *that* and retrieve against it. | **"Fixes recall by closing the question↔document vocabulary gap."** |
| **CRAG** | Corrective RAG — grade retrieved docs; if weak, correct (re-retrieve / alt source / rewrite) before generating. | **"A relevance gate that re-retrieves instead of generating from junk."** |
| **Self-RAG** | Model emits reflection tokens deciding *whether to retrieve* and *whether its draft is supported*; loops if not. | **"The model critiques its own groundedness before answering."** |
| **RAPTOR** | Recursively cluster + summarize chunks into a tree; retrieve at the abstraction level the query needs. | **"Multi-resolution retrieval for long corpora — summary nodes for broad Qs, leaves for specifics."** |
| **Reranking** | Cross-encoder scores (query, passage) jointly after first-stage retrieval — far more precise than bi-encoder similarity. | **"Fixes precision; I rerank only the top-k to control latency."** |
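| **ColBERT** | Late-interaction retrieval/reranking model: query and document are encoded separately into per-token embeddings, and relevance is the sum over query tokens of each token's max similarity (MaxSim) to any document token. Accurate *and* scalable. | **"Token-level matching without re-encoding the whole pair per query."** |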
| **Context compression** | Drop/condense retrieved spans to fit budget and cut distraction. | **"Less, more-relevant context beats more context."** |
| **Hybrid search** | Dense (vector) ⊕ sparse (BM25) ⊕ graph, fused via **RRF**. | **"Vector for meaning, BM25 for exact terms, graph for relationships."** |
| **RRF** | Reciprocal Rank Fusion — combine rankings by `Σ 1/(k+rank)`; no score calibration needed. | **"Tuning-free way to fuse vector + lexical ranks."** |
| **Semantic chunking** | Split on topic/structure boundaries, not byte counts. | **"Chunk on document structure first, size second."** |
| **Agentic RAG** | An agent *decides* when/what/how to retrieve, can use multiple tools and loop. | **"RAG as a control flow, not a single hop."** |
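
ColBERT's MaxSim scoring fits in a few lines of Python: each query token takes its best-matching document token, and the per-token maxima are summed. Because document token vectors are encoded independently of the query, they can be precomputed and indexed, which is what the one-liner means by not re-encoding the pair. Plain lists stand in for the normalized embedding tensors a real ColBERT model produces; the vectors below are toy values.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """For each query token take its best-matching document token, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.5, 0.5]]  # precomputable offline, independent of the query
doc_b = [[0.6, 0.8]]
print(maxsim(query, doc_a), maxsim(query, doc_b))  # doc_a scores higher (1.5 vs about 1.4)
```
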
---
## Evaluation metrics (RAGAS vocabulary)
| Metric | Question it answers | What it isolates |
|---|---|---|
| **Faithfulness / groundedness** | Is every claim supported by retrieved context? | Hallucination |
| **Answer relevancy** | Does the answer address the question? | Generation focus |
| **Context precision** | Are the *top* retrieved chunks the relevant ones? | Reranker quality |
| **Context recall** | Did we retrieve *all* needed evidence? | Retriever quality |
| **Answer correctness** | Right vs. ground truth? | End-to-end |
- **RAGAS** — library that computes these (often LLM-as-judge).
- **TruLens** — RAG-triad + feedback functions + tracing.
- **DeepEval** — pytest-style assertions → CI gates.
- **LangSmith** — tracing + eval ops for LangChain/LangGraph.
> **Diagnostic move to say out loud:** *"Low context-recall → fix the **retriever** (HyDE,
> hybrid, chunking). High recall but low context-precision → fix the **reranker**. Good
> context but low faithfulness → fix the **generator/prompt** or **abstain**."*
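
The diagnostic move can be made mechanical. This Python sketch uses relevance labels instead of an LLM judge: `context_precision` is the mean of precision@k over the ranks that hold a relevant chunk (in the spirit of RAGAS's definition), `context_recall` is the share of required evidence ids that were retrieved, and the `bar = 0.8` cut-off in `diagnose` is an assumed value.

```python
from typing import Sequence, Set


def context_precision(relevance: Sequence[bool]) -> float:
    """Mean of precision@k over ranks k holding a relevant chunk; rewards relevant chunks ranked early."""
    hits, total = 0, 0.0
    for k, relevant in enumerate(relevance, start=1):
        if relevant:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


def context_recall(retrieved: Set[str], needed: Set[str]) -> float:
    """Share of the required evidence that was retrieved at all."""
    return len(retrieved & needed) / len(needed) if needed else 1.0


def diagnose(recall: float, precision: float, faithfulness: float, bar: float = 0.8) -> str:
    if recall < bar:
        return "retriever"  # HyDE, hybrid, chunking
    if precision < bar:
        return "reranker"
    if faithfulness < bar:
        return "generator/prompt or abstain"
    return "ok"


print(context_precision([True, False, True]))  # (1/1 + 2/3) / 2
print(diagnose(context_recall({"a", "b"}, {"a", "c"}), 0.9, 0.9))  # recall 0.5 -> "retriever"
```
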
---
## Agentic frameworks
| Term | Essence |
|---|---|
| **LangChain** | Primitives: tool binding, structured output, retrievers, chains. |
| **LangGraph** | Agents as **state graphs**: typed state, nodes, conditional + cyclic edges, checkpointer, human-in-loop. |
| **Google ADK** | Google's Agent Development Kit for building/deploying agents. |
| **A2A** | Agent-to-Agent protocol: agent cards (capabilities/auth), task lifecycle, message/artifact exchange — agent interop. |
| **AutoGen** | Conversational multi-agent loops (agents talk to each other). |
| **MCP** | Model Context Protocol: standard for exposing **tools/resources** to models via a server, with typed registration. |
---
## Governance & regulatory
| Term | What to say |
|---|---|
| **SR 11-7** | US Fed/OCC **model risk management** guidance: model inventory, independent validation, ongoing monitoring, change control, ability to constrain a model. **"My eval harness + model cards + kill-switch + decision log map directly to its pillars."** |
| **OCC model risk** | Aligned with SR 11-7; emphasizes governance + effective challenge. |
| **EU AI Act** | Risk-tiered AI regulation: classification, logging/traceability, human oversight, transparency for high-risk systems. **"High-risk paths get human-in-the-loop and full decision logging by design."** |
| **GDPR / CCPA** | Data protection/minimization, subject access/deletion. **"Field masking + provenance + tenant namespaces make minimization and targeted deletion structural."** |
| **Zero Trust (for agents)** | Never trust the agent's request implicitly; verify identity + scope at the tool boundary **every** call. **"The MCP server is my policy enforcement point."** |
| **Access-controlled retrieval** | Retrieve only chunks the caller's identity/role entitles them to (pre-retrieval ACL filter). |
| **Row/column masking** | Mask sensitive fields at the boundary regardless of query. |
| **Model card** | Per-model doc: purpose, data, eval scores, limits, owner, review date, kill-switch. |
| **RACI** | Responsible/Accountable/Consulted/Informed matrix per component — governance ownership made explicit. |
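
The Zero-Trust rule (verify scope on every call, at the tool boundary) can be sketched as a Python decorator that the tool server applies to each tool. Hypothetical throughout: `Caller`, the `fees:read` scope, `lookup_fee`, and the in-memory `AUDIT` list standing in for `event-store`. Both allowed and denied calls are audited, and nothing from a previous successful call is cached.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Any, Callable, Dict, FrozenSet, List

AUDIT: List[Dict[str, Any]] = []  # stand-in for an append-only audit sink


@dataclass(frozen=True)
class Caller:
    agent_id: str
    scopes: FrozenSet[str]


class Forbidden(PermissionError):
    pass


def requires(scope: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Check the caller's scope on EVERY invocation and audit the outcome either way."""
    def decorate(tool: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(tool)
        def guarded(caller: Caller, *args: Any, **kwargs: Any) -> Any:
            allowed = scope in caller.scopes
            AUDIT.append({"agent": caller.agent_id, "tool": tool.__name__,
                          "scope": scope, "allowed": allowed})
            if not allowed:
                raise Forbidden(f"{caller.agent_id} lacks {scope!r}")
            return tool(*args, **kwargs)
        return guarded
    return decorate


@requires("fees:read")
def lookup_fee(product: str) -> str:
    return f"fee schedule for {product}"
```
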
---
## ByteLyst anchor cheat-sheet (so you never blank on "where have you done this")
| JD theme | Say this anchor |
|---|---|
| Multi-agent orchestration | **`agent-queue`** — claude/codex/devin, `inbox→doing→done/failed` state machine. |
| MCP / Zero-Trust tool boundary | **`mcp-server` :4007 + `mcp-client`**, authZ per call, masking, kill-switch, audit. |
| Provider-portable models | **`llm-router`** (Azure OpenAI / Bedrock / Vertex / Ollama). |
| Grounding by architecture | **`flowmonk`** — deterministic engine authoritative, AI = explanation/safe-reco only. |
| Schema-aware structured retrieval | **`invt_trdg` AI chat** — typed tool-calling over markets, not free SQL. |
| Unstructured ingestion | **`extraction-service` :4005 + `packages/extraction`**; `notelett` store. |
| Graph (Cosmos Gremlin) | **`packages/cosmos`** — Cosmos DB Gremlin API in prod. |
| Vector / multi-tenancy | **pgvector path** + `productId` / two-instance Hermes isolation. |
| Eval / ops console | **Hermes Mission Control** + `telemetry`/`diagnostics`/`monitoring`. |
| Governance primitives | **`auth`/`fastify-auth`, `field-encrypt`, `feature-flag`/`kill-switch`, `event-store`**. |
| Banking domain analog | **`invt_trdg`** — regulated-consequence domain; abstain-over-guess discipline. |
---
## Likely curveballs + crisp answers
- **"How do you stop prompt injection from retrieved docs?"** → "Treat retrieved text as *data*, never instructions; the generator can't re-invoke tools without re-passing the MCP authZ gate; egress is masked + logged so even a successful injection can't exfiltrate unentitled fields."
- **"Vector DB choice?"** → "Decided last, from requirements. Azure → Azure AI Search (managed hybrid+rerank, one audit boundary). Tight transactional coupling → pgvector. I don't pick a vector DB before I know tenant count, filter cardinality, recall target, and the audit boundary."
- **"How do you measure if RAG is 'good enough' for prod?"** → "SLAs as gates: faithfulness ≥ 0.9, citation 100%, abstain instead of guess; DeepEval blocks deploy below threshold; online RAGAS + drift alerts catch degradation."
- **"Free Text-to-SQL — yes or no?"** → "Bounded domains → typed tool-calling (auditable, inject-resistant). Genuine ad-hoc analytics → generative SQL behind read-only views with RLS, validated + cost-capped. Never raw SQL on base tables."
- **"Latency vs. accuracy trade-off?"** → "Route by query: FAQ/parametric skips retrieval; only complex queries pay for hybrid + rerank + critic loop. Rerank top-k only. The critic loop is bounded to N iterations then abstains."
- **"How is this different from a chatbot demo?"** → "The boundaries: access-controlled retrieval, mandatory citation, abstain-and-escalate, masking, kill-switch, and a reproducible decision log. A demo answers; a regulated system can *prove* why it answered and refuse when it shouldn't."
- **"What worries you most in agentic RAG for banking?"** → "Silent factual drift and over-broad tool scope. I counter drift with online eval + alerts, and scope with typed MCP tools + least-privilege entitlement at every call."
---
## 30-second close
> *"I build agentic systems where the hard engineering is in the boundaries — what a tool
> can retrieve, how output is grounded and cited, when to abstain, and how every hop is
> audited. I've been running an ecosystem (MCP servers, a multi-agent runner, provider-routed
> LLMs, encrypted/flagged/audited data access) that's one deliberate roadmap away
> from a textbook enterprise agentic-RAG fabric — and I've already written that roadmap."*

# Senior Agentic RAG Architect — Interview Prep Kit
> Target role: **Senior Agentic RAG Architect — TEKsystems Global Services, Product Engineering Group**
> Candidate anchor: the **ByteLyst ecosystem** (this monorepo workspace).
> Purpose: turn what we already run in production into a defensible, evidence-backed
> narrative for every line of the job description — plus a concrete roadmap of
> enhancements that make each claim *literally true* if we choose to build them.
This kit is deliberately structured so you can walk into the interview and, for **any**
competency on the matrix, point to (a) a real system we run, (b) an architecture diagram,
(c) a STAR story, and (d) a credible "here's how I'd take it to enterprise scale" answer.
---
## How to use this kit
| If you have… | Read |
|---|---|
| 60 minutes the night before | `06-glossary-quickref.md` then this README's matrix |
| A full prep day | All docs in order 01 → 06 |
| A whiteboard / panel round | `01-ecosystem-rag-fabric.md` + `05-banking-blueprints.md` |
| A behavioral / leadership round | `03-star-interview-bank.md` |
| A "what would you build here" round | `04-enhancement-roadmap.md` |
### Documents
1. **[01-ecosystem-rag-fabric.md](01-ecosystem-rag-fabric.md)** — The ByteLyst ecosystem re-drawn as an agentic RAG retrieval fabric. Context, container, retrieval-pipeline, multi-agent topology, MCP Zero-Trust, and governance diagrams.
2. **[02-competency-deepdives.md](02-competency-deepdives.md)** — Every competency-matrix row: the concept, how it maps to our code, talking points, and honest gaps.
3. **[03-star-interview-bank.md](03-star-interview-bank.md)** — 12 STAR stories grounded in real ecosystem work (Hermes, agent-queue, mcp-server, invt_trdg AI chat, flowmonk grounding, llm-router).
4. **[04-enhancement-roadmap.md](04-enhancement-roadmap.md)** — Buildable enhancements that convert "I understand X" into "I shipped X here": pgvector hybrid retrieval, CRAG/Self-RAG loops, RAGAS eval harness, Cosmos Gremlin knowledge graph, model-card registry in Hermes.
5. **[05-banking-blueprints.md](05-banking-blueprints.md)** — Two client-ready solution blueprints (compliance-document retrieval; customer-support automation) with ADRs, SR 11-7 / EU AI Act alignment, and phased delivery.
6. **[06-glossary-quickref.md](06-glossary-quickref.md)** — Rapid-fire definitions and crisp answers: RAPTOR, HyDE, CRAG, Self-RAG, ColBERT, RAGAS metrics, SR 11-7, EU AI Act, Zero Trust for agents.
---
## The role in one paragraph
Design, build, and tune **enterprise-grade RAG systems that power agentic applications**,
fusing **structured (RDBMS / warehouse), unstructured (PDF / docs / email), and graph
(knowledge-graph / ontology)** sources into one **governed** retrieval fabric. Be the
technical authority across **financial-services** engagements; enforce **grounding,
citation, hallucination mitigation**; own **evaluation harnesses (RAGAS / TruLens /
DeepEval)**; embed **Zero Trust, access-controlled retrieval, SR 11-7 / EU AI Act**
governance; and lead **ADRs, blueprints, roadmaps** for execs and engineers.
---
## Competency matrix → ByteLyst evidence
The JD's matrix is reproduced verbatim in the left columns; the right column is **our
real anchor** in this ecosystem (where it exists today) plus a pointer to the enhancement
that hardens it.
| Competency | Must-have | Nice-to-have | ByteLyst anchor (today → planned) |
|---|---|---|---|
| **Agentic Frameworks** | LangGraph, LangChain (prod-grade) | Google ADK, A2A, AutoGen | `agent-queue/` multi-engine runner (claude·codex·devin) is a real folder-kanban orchestration topology with state transitions (`inbox→doing→done/failed`) — a hand-rolled state machine analogous to LangGraph nodes/edges. `packages/mcp-client` + `mcp-server` (:4007) provide tool binding. → **04 §A** ports the topology onto LangGraph and adds an A2A handoff contract. |
| **RAG Architecture** | Hybrid retrieval, reranking, HyDE, Self-RAG | RAPTOR, multimodal | `packages/extraction` + `extraction-service` (:4005) parse URLs/docs into retrievable units today; `invt_trdg` AI chat already does retrieve-then-reason over structured data. → **04 §B** adds vector+BM25 hybrid, cross-encoder rerank, HyDE & CRAG loops. |
| **Structured Retrieval** | Text-to-SQL, schema-aware retrieval | Snowflake Cortex, BigQuery ML | `invt_trdg` AI chat assistant maps NL → trading actions/queries over a typed domain (quotes, plans, watchlists) — schema-aware tool-calling, the safe cousin of free Text-to-SQL. → **04 §C** adds a guarded Text-to-SQL tool with read-only views + row-level filters. |
| **Unstructured Retrieval** | PDF parsing, layout-aware chunking | Multi-modal pipelines | `packages/extraction` + `extraction-service`; `notelett` ingests structured notes for humans+agents. → **04 §B** adds PyMuPDF/Unstructured.io layout-aware chunking + OCR fallback. |
| **Graph RAG** | KG + vector hybrid | SPARQL, ontology design | We run **Azure Cosmos DB** (`packages/cosmos`); Cosmos exposes the **Gremlin** graph API. `event-store`/`events` already model entity relationships. → **04 §D** stands up a Cosmos Gremlin knowledge graph + graph-augmented retrieval. |
| **Vector Databases** | Pinecone / Weaviate / Azure AI Search | Qdrant, pgvector, multi-tenancy | Postgres is in the stack; **pgvector** is the lowest-friction path. Multi-tenant namespace isolation is already a first-class concern (per-product `productId`, two-instance Hermes Vijay/Bheem). → **04 §B** adds pgvector with per-tenant namespaces. |
| **Grounding & Eval** | RAGAS, TruLens, faithfulness SLAs | LangSmith, LLM-as-judge | `flowmonk` deliberately **bounds the AI layer to explanation/safe recommendation** over a deterministic engine — a production grounding pattern. `diagnostics-client`/`telemetry-client`/`monitoring` + Hermes dashboards are the eval-harness home. → **04 §E** wires a RAGAS/DeepEval harness + drift monitor pane in Hermes. |
| **Cloud Platform** | Azure (AI Foundry, OpenAI, Search) | AWS Bedrock, GCP Vertex | Azure Cosmos DB in prod (`_AZURE/`, `packages/cosmos`); `packages/llm-router` abstracts providers so Azure OpenAI / Bedrock / Vertex can be swapped in. → **02** covers Azure AI Search as the managed hybrid index. |
| **AI Governance** | Access-controlled RAG, Zero Trust | SR 11-7, EU AI Act | `packages/auth` + `fastify-auth`, `field-encrypt`/`client-encrypt` (column/field masking), `feature-flag-client` + `kill-switch-client` (instant model kill), `event-store` (immutable audit). MCP tool boundaries are explicit. → **05** maps all of this to SR 11-7 + EU AI Act. |
| **Domain: Banking** | Support / compliance automation | Model risk mgmt, KYC/AML | `invt_trdg` is our regulated-industry analog (markets, trade plans, alerts, auditability). → **05** translates it into a bank customer-support + compliance-retrieval blueprint. |
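The **Agentic Frameworks** row describes `agent-queue` as a hand-rolled state machine over `inbox→doing→done/failed`. The sketch below is a hypothetical reconstruction of that idea, not agent-queue's actual code: it assumes one folder per state and one file per task, and `init_board`, `move_task` and the transition table are illustrative names. It shows why the mapping to LangGraph is natural: each state is a node and each allowed move is an edge.

```python
from pathlib import Path

# Assumed layout: one sub-folder per state, one file per task (folder-kanban).
STATES = ("inbox", "doing", "done", "failed")

# Legal edges of the state machine; any other move is rejected.
TRANSITIONS = {
    "inbox": {"doing"},
    "doing": {"done", "failed"},
    "done": set(),
    "failed": set(),
}


def init_board(root: Path) -> None:
    """Create the state folders if they do not exist yet."""
    for state in STATES:
        (root / state).mkdir(parents=True, exist_ok=True)


def move_task(root: Path, task: str, src: str, dst: str) -> Path:
    """Move a task file along one edge, refusing illegal transitions."""
    if dst not in TRANSITIONS.get(src, set()):
        raise ValueError(f"illegal transition {src} -> {dst}")
    target = root / dst / task
    # On POSIX, a rename within one filesystem is atomic, so a task is
    # never observed in two states at once.
    (root / src / task).rename(target)
    return target
```

Porting this to LangGraph would mostly mean turning `TRANSITIONS` into graph edges and the worker that picks up `doing/` tasks into a node.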
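The **RAG Architecture** and **Vector Databases** rows point to vector + BM25 hybrid retrieval with per-tenant namespaces. One common way to merge the two ranked lists is Reciprocal Rank Fusion (RRF); the sketch below uses it with the conventional constant `k = 60`. It assumes each retriever has already returned doc ids in rank order, and `rrf_fuse` and the `tenant_of` mapping are hypothetical. In a real pgvector setup the tenant filter would sit in the SQL `WHERE` clause; here it is applied before ranking to mimic that.

```python
from collections import defaultdict


def rrf_fuse(
    ranked_lists: list[list[str]],
    tenant_of: dict[str, str],
    tenant: str,
    k: int = 60,
    top_n: int = 5,
) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        # Tenant isolation first: drop other namespaces, then rank what remains,
        # as if the filter had been applied inside the retriever's query.
        visible = [doc for doc in ranking if tenant_of.get(doc) == tenant]
        for rank, doc in enumerate(visible, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]
```

RRF only uses ranks, so it sidesteps the problem that cosine similarities and BM25 scores live on incomparable scales; a cross-encoder reranker would then reorder only the fused top-n.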
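The **Structured Retrieval** row promises a guarded Text-to-SQL tool. A minimal guard can be sketched with SQLite standing in for the real warehouse: `PRAGMA query_only` blocks writes, and the `sqlite3` authorizer hook rejects any column not on an allow-list. This swaps the row's read-only views for a column allow-list and omits row-level filters; `ALLOWED_COLUMNS`, `make_guarded` and `run_model_sql` are hypothetical names. On Postgres the equivalent would be a read-only role with `SELECT` on specific views plus row-level security policies.

```python
import sqlite3

# Hypothetical allow-list of (table, column) pairs the Text-to-SQL tool may read.
# Sensitive columns (e.g. an SSN) are simply absent from it.
ALLOWED_COLUMNS = {
    ("accounts", "id"),
    ("accounts", "owner"),
    ("accounts", "balance"),
}


def make_guarded(conn: sqlite3.Connection) -> sqlite3.Connection:
    """Lock a connection down for model-generated SQL: no writes, allow-listed reads only."""
    # Defense in depth: SQLite itself refuses any statement that would change data.
    conn.execute("PRAGMA query_only = ON")

    def authorizer(action, arg1, arg2, db_name, source):
        if action == sqlite3.SQLITE_SELECT:
            return sqlite3.SQLITE_OK
        if action == sqlite3.SQLITE_READ:
            # For reads, arg1 is the table and arg2 the column being accessed.
            ok = (arg1, arg2) in ALLOWED_COLUMNS
            return sqlite3.SQLITE_OK if ok else sqlite3.SQLITE_DENY
        if action == sqlite3.SQLITE_FUNCTION:
            return sqlite3.SQLITE_OK  # aggregates such as count() and sum()
        # Everything else (INSERT, DELETE, PRAGMA, ATTACH, ...) is refused.
        return sqlite3.SQLITE_DENY

    conn.set_authorizer(authorizer)
    return conn


def run_model_sql(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    # Denied statements fail when SQLite prepares them, raising sqlite3.DatabaseError.
    return conn.execute(sql).fetchall()
```

Because the check runs when SQLite compiles the statement, `SELECT *` and a `WHERE ssn = ...` probe are rejected just like a direct read of the sensitive column.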
---
## Honest gap analysis (say this out loud — it builds trust)
Be candid in the interview. Frame it as *"here's what's production-real, here's what's
adjacent, here's exactly how I'd close the gap."*
```mermaid
quadrantChart
title Evidence strength vs. JD centrality
x-axis "Adjacent / planned" --> "Production-real today"
y-axis "Nice-to-have" --> "Core to the role"
quadrant-1 "Lead with these"
quadrant-2 "Build before interview if possible"
quadrant-3 "Mention, don't dwell"
quadrant-4 "Frame as quick wins"
"MCP tool boundaries": [0.82, 0.78]
"Multi-agent orchestration (agent-queue)": [0.75, 0.7]
"Access-controlled / Zero-Trust retrieval": [0.8, 0.85]
"Bounded grounding (flowmonk)": [0.78, 0.9]
"Schema-aware tool-calling (invt_trdg)": [0.72, 0.72]
"LangGraph (prod-grade)": [0.3, 0.88]
"RAGAS / TruLens eval harness": [0.25, 0.86]
"pgvector hybrid retrieval": [0.35, 0.8]
"Cosmos Gremlin Graph RAG": [0.3, 0.6]
"Google ADK / A2A": [0.2, 0.4]
"RAPTOR / HyDE / CRAG / Self-RAG": [0.28, 0.65]
"SR 11-7 / EU AI Act docs": [0.45, 0.7]
```
**Three sentences to own the gap:**
> "My production depth is in **agentic orchestration, MCP tool boundaries, and bounded
> grounding** — the parts that decide whether an agentic system is *safe* in a regulated
> setting. The classic LangChain/LangGraph and RAGAS surface area I've architected and
> can stand up fast; in fact I've scoped exactly that as a roadmap on our own platform.
> What I bring that's harder to hire for is the **governance instinct**: designing retrieval
> so that masking, kill-switches, and audit trails are structural, not bolted on."
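"Structural, not bolted on" can be shown concretely as a single retrieval entry point where the kill-switch check, per-chunk entitlement filter, field masking and audit append cannot be skipped. Everything below is a hypothetical sketch: `Principal`, `Chunk`, the `KILL_SWITCH` dict, the in-memory `AUDIT_LOG` and the nine-digit masking regex stand in for `packages/auth`, `field-encrypt`, `kill-switch-client` and `event-store`, whose real APIs are not shown here. In production the entitlement filter would also be pushed into the index query rather than applied to pre-fetched candidates.

```python
import hashlib
import re
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    user_id: str
    entitlements: frozenset[str]  # e.g. {"kb:retail", "kb:compliance"}


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    acl: str  # entitlement required to read this chunk
    text: str


# Stand-ins for kill-switch-client and event-store (hypothetical).
KILL_SWITCH = {"rag.retrieval": False}
AUDIT_LOG: list[dict] = []

# Assumed masking rule: hide anything that looks like a 9-digit account number.
ACCOUNT_RE = re.compile(r"\b\d{9}\b")


def governed_retrieve(principal: Principal, query: str, candidates: list[Chunk]) -> list[Chunk]:
    if KILL_SWITCH["rag.retrieval"]:
        raise RuntimeError("retrieval disabled by kill switch")
    # Entitlement is checked per chunk, at call time, before anything reaches the model.
    allowed = [c for c in candidates if c.acl in principal.entitlements]
    masked = [Chunk(c.doc_id, c.acl, ACCOUNT_RE.sub("[MASKED]", c.text)) for c in allowed]
    # Append-only audit record: who asked what (hashed) and exactly which docs were returned.
    AUDIT_LOG.append({
        "ts": time.time(),
        "user": principal.user_id,
        "query_sha256": hashlib.sha256(query.encode()).hexdigest(),
        "returned": [c.doc_id for c in masked],
        "denied": len(candidates) - len(allowed),
    })
    return masked
```

Since the model-facing code can only obtain context through `governed_retrieve`, forgetting to mask or audit is not an available mistake; that is the design point the quote is making.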
---
## One-line elevator pitch for the role
> *"I build agentic systems where the interesting engineering is in the **boundaries** —
> what a tool is allowed to retrieve, how a model's output is grounded and cited, and how
> every hop is audited — and I've been running a multi-product ecosystem (MCP servers,
> a multi-agent runner, provider-routed LLMs, encrypted/flagged data access) that is one
> deliberate roadmap away from being a textbook enterprise agentic-RAG fabric."*