7-doc kit mapping the JD competency matrix to the ByteLyst ecosystem: ecosystem-as-RAG-fabric architecture, competency deep-dives, STAR bank, enhancement roadmap, banking blueprints, and a glossary quick-ref. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
# 05 · Banking Solution Blueprints (client-ready)

Two end-to-end blueprints you can present to a financial-services client, in the JD's own
deliverable formats: **solution architecture + ADRs + phased roadmap + regulatory mapping**.
Both reuse the ByteLyst fabric patterns from `01-ecosystem-rag-fabric.md`.

---

# Blueprint 1 — Compliance Document Retrieval Assistant

**Use case:** compliance analysts ask natural-language questions ("What is our retention
obligation for KYC records under the latest policy?") and get a **grounded, cited** answer
drawn from regulatory filings, internal policies, and procedure manuals — or an explicit
*"insufficient evidence, escalate."*

## Architecture

```mermaid
flowchart TB
    AN[👤 Compliance analyst] --> APP[Assistant UI]
    APP --> ORCH

    subgraph ORCH["Agentic orchestration (LangGraph)"]
        R{{route}} --> RET[retrieve] --> GR{{CRAG grade}}
        GR -- weak --> RW[HyDE rewrite] --> RET
        GR -- ok --> GEN[generate + cite] --> CR{{Self-RAG critic}}
        CR -- ungrounded --> RW
        CR -- grounded --> OUT[answer + clause citations]
        CR -- no evidence --> ESC[escalate to human]
    end

    subgraph RETR["Hybrid retrieval"]
        VEC[(Azure AI Search<br/>vector + BM25 + semantic rerank)]
        KG[(Cosmos Gremlin<br/>policy ⇄ regulation graph)]
    end
    RET --> VEC & KG

    subgraph GOV["Governance plane"]
        ACL[role-aware ACL filter]
        AUD[event-store audit]
        CARD[model card + decision log]
    end
    RET -.-> ACL
    GEN -.-> AUD
    OUT -.-> CARD
```

**Why these choices (headline ADRs below):** Azure AI Search gives managed hybrid retrieval +
semantic rerank inside one audit boundary; the Gremlin graph links *policies ↔ controlling
regulations* so "what regulation drives this clause" is a traversal, not a guess; the critic
+ escalate edge is there so that an ungrounded draft is rewritten or escalated rather than
shown to the analyst as a confident-wrong answer on a compliance question.

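The control flow in the orchestration subgraph can be sketched in plain Python. This is a minimal illustration, not the LangGraph implementation: the five callables (`retrieve`, `grade`, `rewrite`, `generate`, `critic`) are hypothetical stand-ins for the diagram's nodes, the verdict strings are invented for the sketch, and the `max_rewrites` bound is an assumption (the diagram does not say how many rewrite loops are allowed).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Outcome:
    status: str                          # "answered" or "escalated"
    answer: Optional[str] = None
    citations: Optional[List[str]] = None


def run_query(
    question: str,
    retrieve: Callable[[str], List[dict]],        # hypothetical: hybrid retrieval node
    grade: Callable[[str, List[dict]], bool],     # hypothetical: CRAG grade, True = ok
    rewrite: Callable[[str], str],                # hypothetical: HyDE rewrite node
    generate: Callable[[str, List[dict]], str],   # hypothetical: generate + cite node
    critic: Callable[[str, List[dict]], str],     # hypothetical: Self-RAG critic node
    max_rewrites: int = 2,                        # assumption: loop bound not in the diagram
) -> Outcome:
    query = question
    for _ in range(max_rewrites + 1):
        chunks = retrieve(query)
        if not grade(question, chunks):
            query = rewrite(question)             # weak retrieval: rewrite and retry
            continue
        draft = generate(question, chunks)
        verdict = critic(draft, chunks)
        if verdict == "grounded":
            return Outcome("answered", draft, [c["id"] for c in chunks])
        if verdict == "no_evidence":
            return Outcome("escalated")           # explicit abstain path
        query = rewrite(question)                 # "ungrounded": rewrite and retry
    return Outcome("escalated")                   # retries exhausted: never guess
```

Every exit from the loop is either a cited answer or an escalation, which is the property the phased-delivery exit criteria check for.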
## Ingestion (layout-aware, provenance-first)

```mermaid
flowchart LR
    DOC[Filings · policies · procedures<br/>PDF/DOCX/scans] --> PARSE[PyMuPDF / Unstructured.io<br/>+ OCR fallback]
    PARSE --> CHUNK[layout + semantic chunking<br/>tables preserved]
    CHUNK --> META[attach provenance<br/>doc·page·section·effective-date·sensitivity]
    META --> EMB[embed] --> IDX[(Azure AI Search index per tenant)]
    META --> GRAPH[(extract policy↔reg edges → Gremlin)]
```

> **Effective-date metadata is a compliance requirement, not a nicety:** retrieval must be
> able to answer "as of" a date and never cite a superseded policy as current.

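A minimal sketch of the "as of" rule, assuming each chunk carries an `effective_from` date and an optional `superseded_on` date. Both field names are hypothetical; the source only says that effective-date metadata is attached. In a real deployment this would more likely be a filter expression on the search query than a post-filter in Python.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    section: str
    effective_from: date            # hypothetical field name
    superseded_on: Optional[date]   # hypothetical; None while this version is current
    text: str


def in_force(chunk: Chunk, as_of: date) -> bool:
    """True if the chunk's policy version was in force on `as_of`."""
    if chunk.effective_from > as_of:
        return False
    return chunk.superseded_on is None or as_of < chunk.superseded_on


def filter_as_of(chunks: List[Chunk], as_of: date) -> List[Chunk]:
    """Drop chunks from versions that were not in force on `as_of`."""
    return [c for c in chunks if in_force(c, as_of)]
```

Usage: `filter_as_of(candidates, date.today())` keeps only current versions, while passing a historical date answers "what did the policy say then?" without mixing in later revisions.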
## Phased delivery

| Phase | Scope | Exit criteria |
|---|---|---|
| **0 · Discovery (2–3 wks)** | Corpus inventory, sensitivity classification, golden-question set with SMEs, success SLAs | Signed-off SLA sheet (faithfulness ≥ 0.9, citation 100%, abstain instead of guess) |
| **1 · PoC (4–6 wks)** | Hybrid retrieval over a bounded corpus, citations, abstain path | Beats keyword search on the golden set; every answer cited or escalated |
| **2 · Hardening (6–8 wks)** | Graph links, role-aware ACL, RAGAS/DeepEval CI gate, drift monitor | SLAs met under eval harness; controls mapped to SR 11-7 |
| **3 · Production (ongoing)** | Model cards, audit, human-in-loop ops, change control | Audit trail reproducible; quarterly model-card review live |

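The Phase 2 CI gate can be pictured as a small check over golden-set results. The sketch below is an assumption-heavy illustration: `EvalRow` and `gate` are hypothetical names, the scores are taken as already computed by whatever harness is in use (no RAGAS or DeepEval API is called), and reading "faithfulness ≥ 0.9" as a mean over answered questions is one possible interpretation of the SLA sheet. A CI job would fail the build when the returned list is non-empty.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EvalRow:
    faithfulness: float   # score in [0, 1], produced upstream by the eval harness
    cited: bool           # did the answer carry at least one citation?
    escalated: bool       # did the system abstain and hand off?


def gate(rows: List[EvalRow], min_faithfulness: float = 0.9) -> List[str]:
    """Return a list of SLA violations; an empty list means the gate passes."""
    failures: List[str] = []
    answered = [r for r in rows if not r.escalated]   # escalations are valid outcomes
    if answered:
        mean_f = sum(r.faithfulness for r in answered) / len(answered)
        if mean_f < min_faithfulness:
            failures.append(f"mean faithfulness {mean_f:.2f} < {min_faithfulness}")
    uncited = sum(1 for r in answered if not r.cited)
    if uncited:
        failures.append(f"{uncited} answered question(s) without a citation")
    return failures
```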
---

# Blueprint 2 — Customer-Support Automation (retail banking)

**Use case:** a grounded support agent answers customer questions from product docs, fee
schedules, and account-policy content — with **strict masking of customer PII**, citations,
and instant handoff to a human for anything account-specific or low-confidence.

## Architecture

```mermaid
flowchart TB
    C[👤 Customer] --> CH[Support chat]
    CH --> ORCH2

    subgraph ORCH2["Orchestration"]
        RT{{route:<br/>info vs. account-action}}
        RT -- "general info" --> RAG[grounded RAG answer]
        RT -- "account-specific" --> AUTHZ{step-up auth + entitlement}
        AUTHZ -- ok --> TOOL[typed account tool via MCP<br/>masked fields]
        AUTHZ -- fail / sensitive --> HUMAN[human handoff]
        RAG --> CONF{confidence ≥ SLA?}
        CONF -- no --> HUMAN
        CONF -- yes --> ANS[answer + citation]
    end

    subgraph GOV2["Zero-Trust + governance"]
        MASK[field-encrypt column masking]
        KILL[kill-switch per tool/model]
        LOG[event-store audit]
    end
    TOOL -.-> MASK
    RT -.-> KILL
    ANS -.-> LOG
    TOOL -.-> LOG
```

**Key design stances:**
- **Two lanes by intent.** General-info → RAG over public/internal docs. Account-specific →
  typed MCP tool behind **step-up auth + entitlement check + field masking**. The model never
  free-queries customer data.
- **Confidence gate → human.** Below SLA, hand off. In banking support, escalation is a
  feature, not a failure.
- **PII never enters the prompt unmasked.** Masking is enforced at the MCP boundary
  (`field-encrypt`), so no prompt-engineering mistake can leak it.

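The stances above can be sketched as a two-lane handler where the account lane is a single typed tool that checks auth, reads only the session's own customer record, and masks before returning. Everything named here is hypothetical (`Session`, `get_balance_tool`, the `"balance:read"` entitlement, the in-memory `_ACCOUNTS` store); the actual MCP tool definitions and the `field-encrypt` masking API are not shown in the source, and intent classification is assumed to happen upstream.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class Session:
    customer_id: str
    stepped_up: bool               # True once step-up auth has succeeded
    entitlements: FrozenSet[str]


def mask_account_number(value: str) -> str:
    """Keep the last four characters; mask short values entirely."""
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]


# Fake in-memory data standing in for the account system of record.
_ACCOUNTS = {"c-1": {"account_number": "1234567890", "balance": "250.00"}}


def get_balance_tool(session: Session) -> Dict[str, str]:
    """Typed tool: no free-form query; the customer id comes from the session only."""
    if not session.stepped_up or "balance:read" not in session.entitlements:
        raise PermissionError("step-up auth or entitlement missing")
    record = _ACCOUNTS[session.customer_id]
    return {
        "account_number": mask_account_number(record["account_number"]),
        "balance": record["balance"],
    }


def handle(intent: str, session: Session) -> Tuple[str, Dict[str, str]]:
    """Route by intent; any auth failure on the account lane goes to a human."""
    if intent != "account_specific":
        return ("rag", {})
    try:
        return ("tool", get_balance_tool(session))
    except PermissionError:
        return ("human_handoff", {})
```

Because masking happens inside the tool, the unmasked account number never reaches the code that builds the prompt, whatever the prompt template looks like.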
## Phased delivery (condensed)

1. **Discovery** — intent taxonomy, what's answerable-from-docs vs. needs-account-access, PII map, SLAs.
2. **PoC** — info-lane RAG with citations + handoff; no account access yet.
3. **Account lane** — MCP typed tools, step-up auth, masking, full audit.
4. **Production** — eval harness, drift monitor, model cards, change control.

---

# Cross-cutting: Regulatory control mapping

This table is gold in the room: it shows you map *architecture* to *named regulations and requirements*.

| Requirement | Source | How the architecture satisfies it |
|---|---|---|
| Model inventory & ownership | **SR 11-7** | Model-card registry (`04 §F`): every model/agent has a card with owner + purpose. |
| Independent validation | **SR 11-7 / OCC 2011-12** | RAGAS/DeepEval harness (`04 §E`) provides repeatable, independent eval evidence. |
| Ongoing monitoring | **SR 11-7** | Online RAGAS scoring + factual-drift alerts in Hermes. |
| Ability to constrain a model in production | **SR 11-7** | `kill-switch-client` disables a model/tool live, audited. |
| Change control | **SR 11-7** | ADRs + CI eval gate; no deploy below faithfulness SLA. |
| Risk classification of AI system | **EU AI Act** | Blueprint declares risk tier; high-risk paths get human oversight by design. |
| Logging & traceability | **EU AI Act** | `event-store` decision log: query, sources, model, score, outcome — reproducible. |
| Human oversight | **EU AI Act** | Confidence-gate → human handoff edge in both blueprints. |
| Transparency to user | **EU AI Act** | Mandatory citations + "AI-assisted" disclosure + abstain language. |
| Right to data protection / minimization | **GDPR / CCPA** | Field-level masking, role-aware retrieval, retrieve-only-entitled-chunks. |
| Data subject access / deletion | **GDPR / CCPA** | Provenance metadata + tenant namespaces make targeted deletion + re-index feasible. |

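As an illustration of the "Logging & traceability" row, a decision-log entry with the five listed fields (query, sources, model, score, outcome) could look like the sketch below. The `DecisionRecord` class and both helpers are hypothetical; the real `event-store` schema is not shown in this document, and the content digest is an added assumption meant to make replays comparable.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class DecisionRecord:
    query: str
    sources: List[str]   # ids of the chunks the answer was grounded on
    model: str
    score: float         # faithfulness/confidence score at decision time
    outcome: str         # e.g. "answered" or "escalated"


def serialize(record: DecisionRecord) -> str:
    """Stable JSON (sorted keys) so identical decisions serialize identically."""
    return json.dumps(asdict(record), sort_keys=True, ensure_ascii=False)


def digest(record: DecisionRecord) -> str:
    """SHA-256 of the stable JSON form; an assumption, not part of the source."""
    return hashlib.sha256(serialize(record).encode("utf-8")).hexdigest()
```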
---

# Sample ADRs (the format they want you to produce)

### ADR-001 — Hybrid retrieval over pure-vector
- **Status:** Accepted
- **Context:** Compliance queries hinge on exact identifiers (clause numbers, reg citations) that dense retrieval alone tends to miss.
- **Decision:** Vector ⊕ BM25 fused with RRF, then cross-encoder rerank.
- **Consequences:** +latency from rerank (mitigate: rerank top-k only); large recall/precision gain on identifier-bearing queries.

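For reference, Reciprocal Rank Fusion scores each document as the sum of `1 / (k + rank)` over the rankings it appears in. The sketch below is a generic illustration of that formula, not the managed fusion inside Azure AI Search; `rrf_fuse` is a hypothetical helper, and `k = 60` is the commonly used default rather than a value taken from this document.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):   # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["a", "b", "c"]   # toy dense-retrieval ranking
bm25_hits = ["c", "a", "d"]     # toy keyword ranking
fused = rrf_fuse([vector_hits, bm25_hits])   # -> ["a", "c", "b", "d"]
```

Documents that appear in both lists ("a" and "c") rise above those found by only one retriever, which is why an exact clause number matched by BM25 can survive even when the dense ranking places it low. The fused list would then be truncated to the top-k before the cross-encoder rerank.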
### ADR-002 — Typed MCP tool-calling over free Text-to-SQL for account data
- **Status:** Accepted
- **Context:** Account data is the highest-leakage surface; free-form SQL is hard to audit and hard to make injection-proof.
- **Decision:** Account access only via typed, parameterized MCP tools behind auth + masking; generative SQL restricted to read-only analytics views with RLS.
- **Consequences:** Slightly less flexible NL→data coverage; dramatically smaller attack surface and clean audit.

### ADR-003 — Abstain-and-escalate as a first-class outcome
- **Status:** Accepted
- **Context:** In regulated support/compliance, a confident wrong answer is the worst outcome.
- **Decision:** Faithfulness/confidence below SLA routes to human handoff; tracked as an SLA, not an error.
- **Consequences:** Higher human-handoff rate early; measurable safety + trust; abstain-rate becomes a tuning signal.

### ADR-004 — Provider-portable model layer (router seam)
- **Status:** Accepted
- **Context:** Data-residency + vendor-risk requirements vary per client.
- **Decision:** All inference behind `llm-router`; default Azure OpenAI, swap-in Bedrock/Vertex, on-prem via Ollama.
- **Consequences:** Small abstraction cost; residency + vendor-risk satisfied by config, not re-architecture.
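
The "by config, not re-architecture" consequence can be sketched as a registry keyed by provider name. This is not the `llm-router` API, which this document does not show: `RouterConfig`, `PROVIDERS`, and `get_completion` are hypothetical, and the providers are stubs that echo the prompt instead of calling any real SDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Completion = Callable[[str], str]   # prompt in, completion text out


@dataclass(frozen=True)
class RouterConfig:
    provider: str = "azure-openai"   # default per the ADR; overridden per client


def _stub(name: str) -> Completion:
    """Build a placeholder backend; a real one would wrap the provider's SDK."""
    def complete(prompt: str) -> str:
        return f"[{name}] {prompt}"
    return complete


# Hypothetical registry; the keys mirror the providers named in the ADR.
PROVIDERS: Dict[str, Completion] = {
    name: _stub(name) for name in ("azure-openai", "bedrock", "vertex", "ollama")
}


def get_completion(config: RouterConfig) -> Completion:
    """Resolve the configured provider, failing loudly on unknown names."""
    try:
        return PROVIDERS[config.provider]
    except KeyError:
        raise ValueError(f"unknown provider: {config.provider!r}") from None
```

Callers depend only on the `Completion` signature, so moving a client to on-prem inference is a one-line change such as `RouterConfig(provider="ollama")`.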