Hermes VM 076449268b docs(interview): add Senior Agentic RAG Architect prep kit

7-doc kit mapping the JD competency matrix to the ByteLyst ecosystem:
ecosystem-as-RAG-fabric architecture, competency deep-dives, STAR bank,
enhancement roadmap, banking blueprints, and a glossary quick-ref.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-05-31 10:48:52 +00:00

05 · Banking Solution Blueprints (client-ready)

Two end-to-end blueprints you can present to a financial-services client, in the JD's own deliverable formats: solution architecture + ADRs + phased roadmap + regulatory mapping. Both reuse the ByteLyst fabric patterns from 01-ecosystem-rag-fabric.md.

Blueprint 1 — Compliance Document Retrieval Assistant

Use case: compliance analysts ask natural-language questions ("What is our retention obligation for KYC records under the latest policy?") and get a grounded, cited answer drawn from regulatory filings, internal policies, and procedure manuals — or an explicit "insufficient evidence, escalate."

Architecture

flowchart TB
    AN[👤 Compliance analyst] --> APP[Assistant UI]
    APP --> ORCH

    subgraph ORCH["Agentic orchestration (LangGraph)"]
        R{{route}} --> RET[retrieve] --> GR{{CRAG grade}}
        GR -- weak --> RW[HyDE rewrite] --> RET
        GR -- ok --> GEN[generate + cite] --> CR{{Self-RAG critic}}
        CR -- ungrounded --> RW
        CR -- grounded --> OUT[answer + clause citations]
        CR -- no evidence --> ESC[escalate to human]
    end

    subgraph RETR["Hybrid retrieval"]
        VEC[(Azure AI Search<br/>vector + BM25 + semantic rerank)]
        KG[(Cosmos Gremlin<br/>policy ⇄ regulation graph)]
    end
    RET --> VEC & KG

    subgraph GOV["Governance plane"]
        ACL[role-aware ACL filter]
        AUD[event-store audit]
        CARD[model card + decision log]
    end
    RET -.-> ACL
    GEN -.-> AUD
    OUT -.-> CARD

Why these choices (headline ADRs below): Azure AI Search gives managed hybrid + semantic rerank inside one audit boundary; the Gremlin graph links policies ↔ controlling regulations so "what regulation drives this clause" is a traversal, not a guess; the critic → escalate edge sends any answer the critic cannot ground to a human, so a confident-wrong answer does not reach the analyst on a compliance question.
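
The control flow in the diagram can be sketched in plain Python to make its three exits explicit. This is a minimal sketch, not the LangGraph implementation: `retrieve`, `grade`, `rewrite`, `generate`, and `critic` are hypothetical caller-supplied functions, and `max_rewrites` is an assumed bound that the diagram does not specify.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    kind: str                 # "answer" or "escalate"
    text: str = ""
    citations: tuple = ()


def run_loop(query, retrieve, grade, rewrite, generate, critic, max_rewrites=2):
    """Retrieve, grade, generate, critique; escalate when evidence never holds up."""
    q = query
    for _ in range(max_rewrites + 1):
        chunks = retrieve(q)
        if grade(q, chunks) == "weak":
            q = rewrite(q)            # CRAG grade: weak -> rewrite and retry
            continue
        text, citations = generate(query, chunks)
        verdict = critic(text, chunks)
        if verdict == "grounded":
            return Outcome("answer", text, tuple(citations))
        if verdict == "no_evidence":
            return Outcome("escalate")
        q = rewrite(q)                # ungrounded -> rewrite and retry
    # Assumed policy: running out of rewrites is treated as "no evidence".
    return Outcome("escalate")
```

Bounding the rewrite loop matters because both the grade and the critic can send the flow back to retrieval; without a cap, a question with no supporting evidence would cycle instead of escalating.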

Ingestion (layout-aware, provenance-first)

flowchart LR
    DOC[Filings · policies · procedures<br/>PDF/DOCX/scans] --> PARSE[PyMuPDF / Unstructured.io<br/>+ OCR fallback]
    PARSE --> CHUNK[layout + semantic chunking<br/>tables preserved]
    CHUNK --> META[attach provenance<br/>doc·page·section·effective-date·sensitivity]
    META --> EMB[embed] --> IDX[(Azure AI Search index per tenant)]
    META --> GRAPH[(extract policy↔reg edges → Gremlin)]

Effective-date metadata is a compliance requirement, not a nicety: retrieval must be able to answer "as of" a date and never cite a superseded policy as current.
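
An "as of" filter over retrieved chunks might look like the following. It is a sketch under stated assumptions: the `Chunk` shape and the `superseded_on` field are hypothetical (the ingestion diagram only names an effective date), and in practice the same predicate would more likely be pushed down into the index query as a filter.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    effective_from: date
    superseded_on: Optional[date] = None   # None means still current (assumed field)


def in_force(chunk: Chunk, as_of: date) -> bool:
    """True if the chunk's policy version was in force on `as_of`."""
    if chunk.effective_from > as_of:
        return False
    return chunk.superseded_on is None or as_of < chunk.superseded_on


def filter_as_of(chunks: List[Chunk], as_of: date) -> List[Chunk]:
    """Drop versions that were not yet effective or already superseded."""
    return [c for c in chunks if in_force(c, as_of)]
```

Treating the supersession date as exclusive means that on the changeover day only the new version is citable, which avoids two "current" answers for the same clause.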

Phased delivery

| Phase | Scope | Exit criteria |
| --- | --- | --- |
| 0 · Discovery (2–3 wks) | Corpus inventory, sensitivity classification, golden-question set with SMEs, success SLAs | Signed-off SLA sheet (faithfulness ≥ 0.9, citation 100%, abstain instead of guess) |
| 1 · PoC (4–6 wks) | Hybrid retrieval over a bounded corpus, citations, abstain path | Beats keyword search on the golden set; every answer cited or escalated |
| 2 · Hardening (6–8 wks) | Graph links, role-aware ACL, RAGAS/DeepEval CI gate, drift monitor | SLAs met under eval harness; controls mapped to SR 11-7 |
| 3 · Production (ongoing) | Model cards, audit, human-in-loop ops, change control | Audit trail reproducible; quarterly model-card review live |
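
The Phase 2 CI gate could be as small as a function that turns per-question eval results into a list of failures. This sketch assumes the faithfulness scores were already computed upstream (for example by a RAGAS or DeepEval run); the record shape is hypothetical, and averaging faithfulness over answered questions is an assumption, since the SLA sheet does not say whether the 0.9 threshold is a mean or a per-answer floor.

```python
def sla_gate(results, min_faithfulness=0.9):
    """Return a list of SLA violations; an empty list means the gate passes.

    Each result is a dict with keys:
      'faithfulness' (float, or None when escalated),
      'citations' (list of source references),
      'escalated' (bool).
    """
    failures = []
    answered = [r for r in results if not r["escalated"]]

    # Citation coverage: every non-escalated answer must cite something.
    for i, r in enumerate(results):
        if not r["escalated"] and not r["citations"]:
            failures.append(f"item {i}: answered without citation")

    # Faithfulness: mean over answered questions (assumed aggregation).
    if answered:
        mean_f = sum(r["faithfulness"] for r in answered) / len(answered)
        if mean_f < min_faithfulness:
            failures.append(f"mean faithfulness {mean_f:.2f} < {min_faithfulness}")
    return failures
```

A CI job would fail the build whenever the returned list is non-empty, which is one way to enforce "no deploy below faithfulness SLA".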

Blueprint 2 — Customer-Support Automation (retail banking)

Use case: a grounded support agent answers customer questions from product docs, fee schedules, and account-policy content — with strict masking of customer PII, citations, and instant handoff to a human for anything account-specific or low-confidence.

Architecture

flowchart TB
    C[👤 Customer] --> CH[Support chat]
    CH --> ORCH2

    subgraph ORCH2["Orchestration"]
        RT{{route:<br/>info vs. account-action}}
        RT -- "general info" --> RAG[grounded RAG answer]
        RT -- "account-specific" --> AUTHZ{step-up auth + entitlement}
        AUTHZ -- ok --> TOOL[typed account tool via MCP<br/>masked fields]
        AUTHZ -- fail / sensitive --> HUMAN[human handoff]
        RAG --> CONF{confidence ≥ SLA?}
        CONF -- no --> HUMAN
        CONF -- yes --> ANS[answer + citation]
    end

    subgraph GOV2["Zero-Trust + governance"]
        MASK[field-encrypt column masking]
        KILL[kill-switch per tool/model]
        LOG[event-store audit]
    end
    TOOL -.-> MASK
    RT -.-> KILL
    ANS -.-> LOG
    TOOL -.-> LOG

Key design stances:

- Two lanes by intent. General-info → RAG over public/internal docs. Account-specific → typed MCP tool behind step-up auth + entitlement check + field masking. The model never free-queries customer data.
- Confidence gate → human. Below SLA, hand off. In banking support, escalation is a feature, not a failure.
- PII never enters the prompt unmasked. Masking is enforced at the MCP boundary (field-encrypt), so no prompt-engineering mistake can leak it.
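
The two-lane routing and the confidence gate can be sketched as a single dispatch function. All names here are hypothetical: `classify`, `rag_answer`, and `account_tool` are caller-supplied stand-ins, `authorized` collapses step-up auth and the entitlement check into one flag, and the `0.8` threshold is a placeholder for whatever the SLA sets.

```python
from enum import Enum


class Lane(Enum):
    INFO = "info"
    ACCOUNT = "account"


def handle(message, classify, rag_answer, account_tool, authorized, min_confidence=0.8):
    """Route one customer message to an answer, a typed tool call, or a human."""
    lane = classify(message)

    if lane is Lane.ACCOUNT:
        # Account lane: never reaches the tool without auth + entitlement.
        if not authorized:
            return {"route": "human", "reason": "auth"}
        return {"route": "tool", "result": account_tool(message)}

    # Info lane: grounded answer, gated on confidence.
    answer, citation, confidence = rag_answer(message)
    if confidence < min_confidence:
        return {"route": "human", "reason": "low_confidence"}
    return {"route": "answer", "text": answer, "citation": citation}
```

Returning the handoff reason alongside the route keeps escalations countable by cause, which fits treating escalation as a tracked outcome.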

Phased delivery (condensed)

Discovery — intent taxonomy, what's answerable-from-docs vs. needs-account-access, PII map, SLAs.
PoC — info-lane RAG with citations + handoff; no account access yet.
Account lane — MCP typed tools, step-up auth, masking, full audit.
Production — eval harness, drift monitor, model cards, change control.

Cross-cutting: Regulatory control mapping

This table is gold in the room: it shows you can map architecture decisions to named regulatory requirements.

| Requirement | Source | How the architecture satisfies it |
| --- | --- | --- |
| Model inventory & ownership | SR 11-7 | Model-card registry (`04 §F`): every model/agent has a card with owner + purpose. |
| Independent validation | SR 11-7 / OCC | RAGAS/DeepEval harness (`04 §E`) produces repeatable eval evidence that a validation team independent of development can re-run and challenge. |
| Ongoing monitoring | SR 11-7 | Online RAGAS scoring + factual-drift alerts in Hermes. |
| Ability to constrain a model in production | SR 11-7 | `kill-switch-client` disables a model/tool live, audited. |
| Change control | SR 11-7 | ADRs + CI eval gate; no deploy below faithfulness SLA. |
| Risk classification of AI system | EU AI Act | Blueprint declares risk tier; high-risk paths get human oversight by design. |
| Logging & traceability | EU AI Act | `event-store` decision log: query, sources, model, score, outcome; reproducible. |
| Human oversight | EU AI Act | Confidence-gate → human handoff edge in both blueprints. |
| Transparency to user | EU AI Act | Mandatory citations + "AI-assisted" disclosure + abstain language. |
| Right to data protection / minimization | GDPR / CCPA | Field-level masking, role-aware retrieval, retrieve-only-entitled-chunks. |
| Data subject access / deletion | GDPR / CCPA | Provenance metadata + tenant namespaces make targeted deletion + re-index feasible. |
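
The "Logging & traceability" row names five fields for the decision log. One possible record shape is sketched below; it is not the `event-store` schema, which this document does not show, and the SHA-256 digest is an added assumption to illustrate how identical decisions could be checked for reproducibility.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionRecord:
    query: str
    sources: tuple      # source references, e.g. "doc#page" strings
    model: str
    score: float        # faithfulness/confidence score at decision time
    outcome: str        # e.g. "answered" or "escalated"


def serialize(rec: DecisionRecord) -> str:
    """Canonical JSON: sorted keys and fixed separators give stable bytes."""
    return json.dumps(asdict(rec), sort_keys=True, separators=(",", ":"))


def digest(rec: DecisionRecord) -> str:
    """Content hash of the canonical form (assumed tamper-evidence aid)."""
    return hashlib.sha256(serialize(rec).encode("utf-8")).hexdigest()
```

Two records with the same five fields hash identically, so an auditor replaying a query can compare digests instead of diffing payloads.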

Sample ADRs (the format they want you to produce)

ADR-001 — Hybrid retrieval over pure-vector

Status: Accepted
Context: Compliance queries hinge on exact identifiers (clause numbers, reg citations) that dense retrieval misses.
Decision: Vector ⊕ BM25 fused with RRF, then cross-encoder rerank.
Consequences: +latency from rerank (mitigate: rerank top-k only); large recall/precision gain on identifier-bearing queries.
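
For reference, Reciprocal Rank Fusion itself is only a few lines. The sketch below fuses any number of ranked ID lists; `k=60` is the constant commonly used for RRF, the tie-break on document ID is an added assumption to keep output deterministic, and the cross-encoder rerank step is out of scope here.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["a", "b", "c"]   # dense retrieval order
bm25_hits = ["c", "a", "d"]     # keyword retrieval order
fused = rrf_fuse([vector_hits, bm25_hits])   # ['a', 'c', 'b', 'd']
```

Documents that appear in both lists ("a" and "c") rise above those found by only one retriever, which is the behavior the ADR relies on for identifier-bearing queries.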

ADR-002 — Typed MCP tool-calling over free Text-to-SQL for account data

Status: Accepted
Context: Account data is the highest-leakage surface; free-form generated SQL is hard to audit and hard to make injection-proof.
Decision: Account access only via typed, parameterized MCP tools behind auth + masking; generative SQL restricted to read-only analytics views with RLS.
Consequences: Slightly less flexible NL→data coverage; dramatically smaller attack surface and clean audit.
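
A typed tool in the spirit of this ADR validates its single parameter and masks identifying fields before anything is returned to the model. This is a plain-Python sketch rather than MCP SDK code: `BalanceRequest`, `get_balance`, the 8 to 12 digit account format, and the `fetch_row` callback (assumed to run a parameterized query) are all hypothetical.

```python
import re
from dataclasses import dataclass

ACCOUNT_ID = re.compile(r"[0-9]{8,12}")   # hypothetical account-number format


@dataclass(frozen=True)
class BalanceRequest:
    account_id: str

    def __post_init__(self):
        # Reject anything that is not a well-formed ID before it goes further.
        if not ACCOUNT_ID.fullmatch(self.account_id):
            raise ValueError("account_id must be 8-12 digits")


def mask(value: str, visible: int = 4) -> str:
    """Keep the last `visible` characters; fully mask short values."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def get_balance(req: BalanceRequest, fetch_row) -> dict:
    """Return only the fields the model may see, with the ID masked."""
    row = fetch_row(req.account_id)
    return {"account": mask(row["account_id"]), "balance": row["balance"]}
```

Because the model can only fill `account_id`, and that value must match a strict pattern, an injected string such as `1 OR 1=1` fails validation instead of reaching the data layer.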

ADR-003 — Abstain-and-escalate as a first-class outcome

Status: Accepted
Context: In regulated support/compliance, a confident wrong answer is the worst outcome.
Decision: Faithfulness/confidence below SLA routes to human handoff; tracked as an SLA, not an error.
Consequences: Higher human-handoff rate early; measurable safety + trust; abstain-rate becomes a tuning signal.

ADR-004 — Provider-portable model layer (router seam)

Status: Accepted
Context: Data-residency + vendor-risk requirements vary per client.
Decision: All inference behind llm-router; default Azure OpenAI, swap-in Bedrock/Vertex, on-prem via Ollama.
Consequences: Small abstraction cost; residency + vendor-risk satisfied by config, not re-architecture.
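
The router seam amounts to a lookup from configuration to a provider callable. The class below is an illustrative sketch, not the `llm-router` API: `LLMRouter`, `tenant_overrides`, and the provider names are hypothetical, and each provider is reduced to a function from prompt to completion.

```python
from typing import Callable, Dict, Optional

Provider = Callable[[str], str]   # prompt in, completion out


class LLMRouter:
    def __init__(self, providers: Dict[str, Provider], default: str,
                 tenant_overrides: Optional[Dict[str, str]] = None):
        overrides = dict(tenant_overrides or {})
        # Fail at startup, not at request time, on a misconfigured name.
        for name in [default, *overrides.values()]:
            if name not in providers:
                raise KeyError(f"unknown provider: {name}")
        self._providers = providers
        self._default = default
        self._overrides = overrides

    def complete(self, prompt: str, tenant: Optional[str] = None) -> str:
        """Send the prompt to the tenant's provider, else the default."""
        name = self._overrides.get(tenant, self._default)
        return self._providers[name](prompt)
```

With this shape, moving a client with strict residency requirements to an on-prem model is a one-line change in `tenant_overrides` rather than a code change in any caller.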
