bytelyst-devops-tools/docs/INTERVIEW/05-banking-blueprints.md
Hermes VM 076449268b docs(interview): add Senior Agentic RAG Architect prep kit
7-doc kit mapping the JD competency matrix to the ByteLyst ecosystem:
ecosystem-as-RAG-fabric architecture, competency deep-dives, STAR bank,
enhancement roadmap, banking blueprints, and a glossary quick-ref.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 10:48:52 +00:00

# 05 · Banking Solution Blueprints (client-ready)
Two end-to-end blueprints you can present to a financial-services client, in the JD's own
deliverable formats: **solution architecture + ADRs + phased roadmap + regulatory mapping**.
Both reuse the ByteLyst fabric patterns from `01-ecosystem-rag-fabric.md`.
---
# Blueprint 1 — Compliance Document Retrieval Assistant
**Use case:** compliance analysts ask natural-language questions ("What is our retention
obligation for KYC records under the latest policy?") and get a **grounded, cited** answer
drawn from regulatory filings, internal policies, and procedure manuals — or an explicit
*"insufficient evidence, escalate."*
## Architecture
```mermaid
flowchart TB
AN[👤 Compliance analyst] --> APP[Assistant UI]
APP --> ORCH
subgraph ORCH["Agentic orchestration (LangGraph)"]
R{{route}} --> RET[retrieve] --> GR{{CRAG grade}}
GR -- weak --> RW[HyDE rewrite] --> RET
GR -- ok --> GEN[generate + cite] --> CR{{Self-RAG critic}}
CR -- ungrounded --> RW
CR -- grounded --> OUT[answer + clause citations]
CR -- no evidence --> ESC[escalate to human]
end
subgraph RETR["Hybrid retrieval"]
VEC[(Azure AI Search<br/>vector + BM25 + semantic rerank)]
KG[(Cosmos Gremlin<br/>policy ⇄ regulation graph)]
end
RET --> VEC & KG
subgraph GOV["Governance plane"]
ACL[role-aware ACL filter]
AUD[event-store audit]
CARD[model card + decision log]
end
RET -.-> ACL
GEN -.-> AUD
OUT -.-> CARD
```
**Why these choices (headline ADRs below):** Azure AI Search gives managed hybrid retrieval plus
semantic rerank inside one audit boundary; the Gremlin graph links *policies ↔ controlling
regulations* so "what regulation drives this clause" is a traversal, not a guess; and the critic
and escalate edges route ungrounded or unsupported drafts to a rewrite or a human, so the
assistant abstains rather than returning a confident-wrong answer on compliance questions.
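
The retrieve → grade → rewrite → critic → escalate loop in the diagram can be sketched in plain Python, without committing to any LangGraph API. This is a minimal sketch under stated assumptions: `answer_or_escalate`, `Outcome`, and the five callables are hypothetical names, and the diagram's separate "ungrounded" and "no evidence" edges are collapsed into one rule (escalate once the rewrite budget is spent).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Outcome:
    status: str                # "answered" or "escalated"
    answer: Optional[str]
    citations: List[str]


def answer_or_escalate(
    question: str,
    retrieve: Callable[[str], List[dict]],          # query -> chunks, each with an "id"
    grade: Callable[[str, List[dict]], bool],       # CRAG-style relevance check
    rewrite: Callable[[str], str],                  # e.g. a HyDE-style query rewrite
    generate: Callable[[str, List[dict]], str],     # draft an answer from the chunks
    is_grounded: Callable[[str, List[dict]], bool], # Self-RAG-style critic
    max_rewrites: int = 2,                          # assumed budget, tune per client
) -> Outcome:
    query = question
    for attempt in range(max_rewrites + 1):
        chunks = retrieve(query)
        if chunks and grade(question, chunks):
            draft = generate(question, chunks)
            if is_grounded(draft, chunks):
                return Outcome("answered", draft, [c["id"] for c in chunks])
        # Weak retrieval or ungrounded draft: rewrite and retry while budget remains.
        if attempt < max_rewrites:
            query = rewrite(query)
    # Budget exhausted without a grounded answer: abstain and hand to a human.
    return Outcome("escalated", None, [])
```

The bounded loop is the point of the sketch: every path ends in either a cited answer or an explicit escalation, never an uncited draft.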
## Ingestion (layout-aware, provenance-first)
```mermaid
flowchart LR
DOC[Filings · policies · procedures<br/>PDF/DOCX/scans] --> PARSE[PyMuPDF / Unstructured.io<br/>+ OCR fallback]
PARSE --> CHUNK[layout + semantic chunking<br/>tables preserved]
CHUNK --> META[attach provenance<br/>doc·page·section·effective-date·sensitivity]
META --> EMB[embed] --> IDX[(Azure AI Search index per tenant)]
META --> GRAPH[(extract policy↔reg edges → Gremlin)]
```
> **Effective-date metadata is a compliance requirement, not a nicety:** retrieval must be
> able to answer "as of" a date and never cite a superseded policy as current.
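
One way to make the "as of" requirement concrete is to filter chunks on their validity window before ranking. The sketch below is illustrative only: the `Chunk` fields mirror the provenance list in the ingestion diagram, while `superseded_on` is an assumed extra field (the source names only an effective date) that records when a newer version replaced the chunk.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    page: int
    section: str
    effective_date: date
    superseded_on: Optional[date]  # assumption: None means still in force
    sensitivity: str
    text: str


def in_force(chunk: Chunk, as_of: date) -> bool:
    """True if the chunk's policy version was current on the as_of date."""
    started = chunk.effective_date <= as_of
    not_ended = chunk.superseded_on is None or as_of < chunk.superseded_on
    return started and not_ended


def filter_as_of(chunks: List[Chunk], as_of: date) -> List[Chunk]:
    """Drop superseded and not-yet-effective versions before ranking."""
    return [c for c in chunks if in_force(c, as_of)]
```

In a managed index the same predicate would more likely be pushed down as a metadata filter on the query; the in-memory version only shows the rule.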
## Phased delivery
| Phase | Scope | Exit criteria |
|---|---|---|
| **0 · Discovery (2–3 wks)** | Corpus inventory, sensitivity classification, golden-question set with SMEs, success SLAs | Signed-off SLA sheet (faithfulness ≥ 0.9, citation 100%, abstain instead of guess) |
| **1 · PoC (4–6 wks)** | Hybrid retrieval over a bounded corpus, citations, abstain path | Beats keyword search on the golden set; every answer cited or escalated |
| **2 · Hardening (6–8 wks)** | Graph links, role-aware ACL, RAGAS/DeepEval CI gate, drift monitor | SLAs met under eval harness; controls mapped to SR 11-7 |
| **3 · Production (ongoing)** | Model cards, audit, human-in-loop ops, change control | Audit trail reproducible; quarterly model-card review live |
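
The exit criteria above (faithfulness ≥ 0.9, every answer cited or escalated) can be turned into a CI gate over golden-set results. This is a rough sketch, not the RAGAS or DeepEval API: `EvalCase` and `release_gate` are hypothetical names, and the scores are assumed to come from whichever harness the project uses.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class EvalCase:
    faithfulness: float  # 0..1, as scored by the eval harness
    cited: bool          # answer carries at least one citation
    abstained: bool      # system escalated instead of answering


def release_gate(cases: List[EvalCase], min_faithfulness: float = 0.9) -> Tuple[bool, List[str]]:
    """Return (passed, failure reasons) for one golden-set run."""
    failures: List[str] = []
    if not cases:
        failures.append("empty golden set")
    # Abstentions are a valid outcome, so only answered cases are scored.
    answered = [c for c in cases if not c.abstained]
    uncited = sum(1 for c in answered if not c.cited)
    if uncited:
        failures.append(f"{uncited} answered case(s) without citation")
    if answered:
        mean = sum(c.faithfulness for c in answered) / len(answered)
        if mean < min_faithfulness:
            failures.append(f"mean faithfulness {mean:.2f} < {min_faithfulness:.2f}")
    return (not failures, failures)


if __name__ == "__main__":
    run = [EvalCase(0.95, True, False), EvalCase(0.0, False, True)]
    passed, reasons = release_gate(run)
    print("deploy" if passed else f"block: {reasons}")
```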
---
# Blueprint 2 — Customer-Support Automation (retail banking)
**Use case:** a grounded support agent answers customer questions from product docs, fee
schedules, and account-policy content — with **strict masking of customer PII**, citations,
and instant handoff to a human for anything account-specific or low-confidence.
## Architecture
```mermaid
flowchart TB
C[👤 Customer] --> CH[Support chat]
CH --> ORCH2
subgraph ORCH2["Orchestration"]
RT{{route:<br/>info vs. account-action}}
RT -- "general info" --> RAG[grounded RAG answer]
RT -- "account-specific" --> AUTHZ{step-up auth + entitlement}
AUTHZ -- ok --> TOOL[typed account tool via MCP<br/>masked fields]
AUTHZ -- fail / sensitive --> HUMAN[human handoff]
RAG --> CONF{confidence ≥ SLA?}
CONF -- no --> HUMAN
CONF -- yes --> ANS[answer + citation]
end
subgraph GOV2["Zero-Trust + governance"]
MASK[field-encrypt column masking]
KILL[kill-switch per tool/model]
LOG[event-store audit]
end
TOOL -.-> MASK
RT -.-> KILL
ANS -.-> LOG
TOOL -.-> LOG
```
**Key design stances:**
- **Two lanes by intent.** General-info → RAG over public/internal docs. Account-specific →
typed MCP tool behind **step-up auth + entitlement check + field masking**. The model never
free-queries customer data.
- **Confidence gate → human.** Below SLA, hand off. In banking support, escalation is a
feature, not a failure.
- **PII never enters the prompt unmasked.** Masking is enforced at the MCP boundary
(`field-encrypt`), so no prompt-engineering mistake can leak it.
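
The account lane described above can be sketched as a typed tool that refuses to run without auth and masks sensitive fields before anything reaches the model. Everything here is hypothetical: `MASKED_FIELDS`, `mask_value`, `get_account_summary`, and the in-memory `store` are illustration only and are not the `field-encrypt` or MCP APIs.

```python
from typing import Any, Dict

# Assumption: which fields count as PII comes from the Discovery-phase PII map.
MASKED_FIELDS = {"account_number", "national_id"}


def mask_value(value: str, keep_last: int = 4) -> str:
    """Replace all but the last keep_last characters with '*'."""
    if len(value) <= keep_last:
        return "*" * len(value)
    return "*" * (len(value) - keep_last) + value[-keep_last:]


def get_account_summary(
    customer_id: str,
    *,
    authenticated: bool,   # result of step-up auth
    entitled: bool,        # result of the entitlement check
    store: Dict[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Typed tool: the model supplies only customer_id, never a free query."""
    if not (authenticated and entitled):
        raise PermissionError("step-up auth or entitlement check failed")
    record = store[customer_id]
    # Masking happens here, at the tool boundary, before the prompt is built.
    return {
        key: mask_value(str(val)) if key in MASKED_FIELDS else val
        for key, val in record.items()
    }
```

Because the mask is applied in the tool's return path, the prompt builder only ever sees masked values, which is the property the third stance relies on.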
## Phased delivery (condensed)
1. **Discovery** — intent taxonomy, what's answerable-from-docs vs. needs-account-access, PII map, SLAs.
2. **PoC** — info-lane RAG with citations + handoff; no account access yet.
3. **Account lane** — MCP typed tools, step-up auth, masking, full audit.
4. **Production** — eval harness, drift monitor, model cards, change control.
---
# Cross-cutting: Regulatory control mapping
This table is gold in the room: it shows you can map *architecture* to *named regulatory requirements*.
| Requirement | Source | How the architecture satisfies it |
|---|---|---|
| Model inventory & ownership | **SR 11-7** | Model-card registry (`04 §F`): every model/agent has a card with owner + purpose. |
| Independent validation | **SR 11-7 / OCC 2011-12** | RAGAS/DeepEval harness (`04 §E`) produces repeatable eval evidence that a validation team independent of the builders can re-run. |
| Ongoing monitoring | **SR 11-7** | Online RAGAS scoring + factual-drift alerts in Hermes. |
| Ability to constrain a model in production | **SR 11-7** | `kill-switch-client` disables a model/tool live, audited. |
| Change control | **SR 11-7** | ADRs + CI eval gate; no deploy below faithfulness SLA. |
| Risk classification of AI system | **EU AI Act** | Blueprint declares risk tier; high-risk paths get human oversight by design. |
| Logging & traceability | **EU AI Act** | `event-store` decision log: query, sources, model, score, outcome — reproducible. |
| Human oversight | **EU AI Act** | Confidence-gate → human handoff edge in both blueprints. |
| Transparency to user | **EU AI Act** | Mandatory citations + "AI-assisted" disclosure + abstain language. |
| Right to data protection / minimization | **GDPR / CCPA** | Field-level masking, role-aware retrieval, retrieve-only-entitled-chunks. |
| Data subject access / deletion | **GDPR / CCPA** | Provenance metadata + tenant namespaces make targeted deletion + re-index feasible. |
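
The "Logging & traceability" row lists the decision-log fields (query, sources, model, score, outcome). A minimal sketch of such a record follows; the hash chaining is an added assumption to illustrate tamper-evidence, and `append_decision` and `verify_log` are hypothetical names, not the `event-store` API. A real log would also carry timestamps and tenant identifiers.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes the same.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_decision(log: List[dict], *, query: str, sources: List[str],
                    model: str, score: float, outcome: str) -> dict:
    """Append one decision, chained to the previous entry's hash."""
    body = {
        "query": query, "sources": sources, "model": model,
        "score": score, "outcome": outcome,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify_log(log: List[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```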
---
# Sample ADRs (the format they want you to produce)
### ADR-001 — Hybrid retrieval over pure-vector
- **Status:** Accepted
- **Context:** Compliance queries hinge on exact identifiers (clause numbers, reg citations) that dense retrieval misses.
- **Decision:** Vector ⊕ BM25 fused with RRF, then cross-encoder rerank.
- **Consequences:** +latency from rerank (mitigate: rerank top-k only); large recall/precision gain on identifier-bearing queries.
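
Reciprocal Rank Fusion, the fusion step named in the decision, scores each document as the sum of `1 / (k + rank)` over the rankings it appears in. A minimal sketch, assuming each retriever returns an ordered list of document ids; `rrf_fuse` is a hypothetical helper, and `k = 60` is the commonly used constant rather than a value from the source.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60,
             top_n: Optional[int] = None) -> List[str]:
    """Fuse several ranked id lists with Reciprocal Rank Fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused if top_n is None else fused[:top_n]


if __name__ == "__main__":
    vector_hits = ["a", "b", "c"]
    bm25_hits = ["c", "a", "d"]
    # Only the fused top-k would then go to the cross-encoder reranker.
    print(rrf_fuse([vector_hits, bm25_hits], top_n=3))  # ['a', 'c', 'b']
```

Passing only `top_n` fused results to the reranker is the latency mitigation the consequences line refers to.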
### ADR-002 — Typed MCP tool-calling over free Text-to-SQL for account data
- **Status:** Accepted
- **Context:** Account data is the highest-leakage surface; free SQL is hard to audit and hard to harden against injection.
- **Decision:** Account access only via typed, parameterized MCP tools behind auth + masking; generative SQL restricted to read-only analytics views with RLS.
- **Consequences:** Slightly less flexible NL→data coverage; dramatically smaller attack surface and clean audit.
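
To illustrate what "typed, parameterized" means here: the tool validates its one input and binds it as a SQL parameter, so model output never becomes SQL text. This is a sketch using the standard-library `sqlite3` module as a stand-in datastore; `BalanceRequest`, `get_balance`, and the `accounts` table are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BalanceRequest:
    customer_id: str

    def __post_init__(self) -> None:
        # Typed input: reject anything that is not a short alphanumeric id.
        if not (self.customer_id.isalnum() and len(self.customer_id) <= 32):
            raise ValueError("customer_id must be alphanumeric, max 32 chars")


def get_balance(conn: sqlite3.Connection, req: BalanceRequest) -> Optional[int]:
    """Fixed query with a bound parameter; the SQL text never changes."""
    row = conn.execute(
        "SELECT balance_cents FROM accounts WHERE customer_id = ?",
        (req.customer_id,),
    ).fetchone()
    return None if row is None else row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (customer_id TEXT PRIMARY KEY, balance_cents INTEGER)")
    conn.execute("INSERT INTO accounts VALUES ('c1', 12345)")
    print(get_balance(conn, BalanceRequest("c1")))
```

An injection-style input such as `c1' OR '1'='1` fails validation before any query runs, and even a value that passed validation would only ever be bound as data.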
### ADR-003 — Abstain-and-escalate as a first-class outcome
- **Status:** Accepted
- **Context:** In regulated support/compliance, a confident wrong answer is the worst outcome.
- **Decision:** Faithfulness/confidence below SLA routes to human handoff; tracked as an SLA, not an error.
- **Consequences:** Higher human-handoff rate early; measurable safety + trust; abstain-rate becomes a tuning signal.
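
Treating abstain-rate as a tuning signal can be as simple as sweeping candidate thresholds over logged confidence scores. A small sketch with hypothetical helper names (`route`, `abstain_rate`, `sweep`); the confidence values are assumed to come from the production decision log.

```python
from typing import Dict, Iterable, List


def route(confidence: float, sla: float) -> str:
    """Below the SLA threshold the request goes to a human."""
    return "answer" if confidence >= sla else "handoff"


def abstain_rate(confidences: Iterable[float], sla: float) -> float:
    """Fraction of requests that would be handed off at this threshold."""
    outcomes: List[str] = [route(c, sla) for c in confidences]
    return outcomes.count("handoff") / len(outcomes) if outcomes else 0.0


def sweep(confidences: List[float], thresholds: Iterable[float]) -> Dict[float, float]:
    """Abstain rate per candidate threshold, for choosing the SLA."""
    return {t: abstain_rate(confidences, t) for t in thresholds}
```

Plotting the sweep against measured faithfulness at each threshold gives the trade-off curve the handoff SLA is negotiated on.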
### ADR-004 — Provider-portable model layer (router seam)
- **Status:** Accepted
- **Context:** Data-residency + vendor-risk requirements vary per client.
- **Decision:** All inference behind `llm-router`; default Azure OpenAI, swap-in Bedrock/Vertex, on-prem via Ollama.
- **Consequences:** Small abstraction cost; residency + vendor-risk satisfied by config, not re-architecture.
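
The router seam can be pictured as a registry of interchangeable provider callables selected by configuration. This is a hedged sketch, not the actual `llm-router` interface: `LLMRouter`, `Provider`, and the stub providers are hypothetical, and a real provider would wrap a vendor SDK call.

```python
from typing import Callable, Dict, Optional

# Assumption: a provider is anything that maps a prompt to a completion string.
Provider = Callable[[str], str]


class LLMRouter:
    def __init__(self, providers: Dict[str, Provider], default: str) -> None:
        if default not in providers:
            raise ValueError(f"unknown default provider: {default}")
        self._providers = dict(providers)
        self._default = default

    def complete(self, prompt: str, provider: Optional[str] = None) -> str:
        """Send the prompt to the named provider, or to the configured default."""
        name = provider or self._default
        if name not in self._providers:
            raise ValueError(f"unknown provider: {name}")
        return self._providers[name](prompt)


if __name__ == "__main__":
    # Stub providers stand in for Azure OpenAI and an on-prem model.
    router = LLMRouter(
        {"azure": lambda p: f"[azure] {p}", "onprem": lambda p: f"[onprem] {p}"},
        default="azure",
    )
    print(router.complete("hello"))
    print(router.complete("hello", provider="onprem"))
```

Callers depend only on `complete`, so a residency-driven switch is a change to the `default` value in configuration.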