bytelyst-devops-tools/docs/INTERVIEW/04-enhancement-roadmap.md
Hermes VM 076449268b docs(interview): add Senior Agentic RAG Architect prep kit
7-doc kit mapping the JD competency matrix to the ByteLyst ecosystem:
ecosystem-as-RAG-fabric architecture, competency deep-dives, STAR bank,
enhancement roadmap, banking blueprints, and a glossary quick-ref.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 10:48:52 +00:00

# 04 · Enhancement Roadmap — make every claim literally true
This is the "what would you build here" answer, and it doubles as a real backlog. Each
enhancement turns an *adjacent* capability into a *shipped* one on infrastructure we
already run. They're ordered so each builds on the last; the whole set is a credible
"agentic-RAG fabric, hardened" program.
> Mapping note: these slot into the existing repo conventions — new code under
> `learning_ai_common_plat/packages` + a `services/rag-service`, eval harness surfaced in
> `learning_ai_devops_tools/dashboard` (Hermes), and ADRs under
> `learning_ai_devops_tools/docs/adr/`. Cut tracker items via `scripts/tracker-seed/`.
```mermaid
flowchart LR
A["§A LangGraph port<br/>+ A2A agent cards"] --> B["§B Hybrid retrieval<br/>pgvector+BM25+rerank+HyDE/CRAG/Self-RAG"]
B --> C["§C Guarded Text-to-SQL<br/>read-only views + RLS"]
B --> D["§D Cosmos Gremlin<br/>knowledge graph + Graph RAG"]
B --> E["§E RAGAS/DeepEval harness<br/>+ drift monitor in Hermes"]
C & D --> F["§F Model-card registry<br/>+ governance pack"]
E --> F
classDef p1 fill:#dcfce7,stroke:#16a34a
classDef p2 fill:#fef9c3,stroke:#ca8a04
classDef p3 fill:#fee2e2,stroke:#dc2626
class A,B p1
class C,D,E p2
class F p3
```
| Phase | Enhancements | Why now |
|---|---|---|
| **P1 (foundation)** | §A, §B | Orchestration + retrieval are the spine; everything else attaches to them. |
| **P2 (sources + quality)** | §C, §D, §E | Add structured + graph sources and the eval loop that proves quality. |
| **P3 (governance)** | §F | Wrap the now-real fabric in the regulated-grade governance story. |
---
## §A — Port `agent-queue` topology onto LangGraph + add A2A handoff
**Goal:** make the "prod-grade LangGraph" claim literal while keeping the proven state model.
- New `packages/agent-graph`: a typed `StateGraph` with nodes `route → retrieve → grade → (rewrite) → generate → critique`, conditional + cyclic edges, and a checkpointer backed by `event-store`.
- Keep `agent-queue`'s engine-selection idea as **node-level model binding** through `llm-router`.
- Expose each product agent with an **A2A agent card** (capabilities, auth scope, cost hints) so a supervisor agent can delegate; the card is served by `mcp-server`.
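The agent-card bullet can be sketched as a small record plus a supervisor lookup. This is a minimal illustration only: the field names (`capabilities`, `auth_scope`, `cost_hint_usd_per_call`), the two example agents, and the `pick_agent` helper are hypothetical and are not taken from the A2A card schema or from `mcp-server`. Choosing the cheapest eligible agent is one possible delegation policy, assumed here for the sake of the example.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class AgentCard:
    """Hypothetical agent card; fields mirror the bullet above, not the A2A spec."""
    name: str
    capabilities: tuple[str, ...]
    auth_scope: str
    cost_hint_usd_per_call: float


def pick_agent(cards, capability, granted_scopes):
    """Return the cheapest card offering `capability` within the caller's scopes."""
    eligible = [
        c for c in cards
        if capability in c.capabilities and c.auth_scope in granted_scopes
    ]
    # `default=None` means "no agent can take this task" instead of raising.
    return min(eligible, key=lambda c: c.cost_hint_usd_per_call, default=None)


# Illustrative cards only; names, scopes, and costs are made up.
CARDS = [
    AgentCard("notes-agent", ("summarise", "search"), "notes:read", 0.002),
    AgentCard("billing-agent", ("search",), "billing:read", 0.001),
]

chosen = pick_agent(CARDS, "search", granted_scopes={"notes:read"})
print(json.dumps(asdict(chosen)))
```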
```mermaid
stateDiagram-v2
[*] --> route
route --> retrieve: needs evidence
route --> generate: parametric/FAQ
retrieve --> grade
grade --> rewrite: low relevance (CRAG)
rewrite --> retrieve
grade --> generate: ok
generate --> critique
critique --> rewrite: ungrounded (Self-RAG)
critique --> [*]: grounded + cited
```
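The state diagram above can be exercised without any framework. The sketch below is a plain-Python stand-in for the topology, not the LangGraph API: `RagState`, `run_graph`, the stub callables, and the `MAX_REWRITES` loop bound are all assumptions added for illustration. It shows the acceptance scenario, where a forced low-relevance first retrieval sends the run through `rewrite` before it reaches `generate`.

```python
from dataclasses import dataclass, field

MAX_REWRITES = 2  # assumed loop bound so the cycle always terminates


@dataclass
class RagState:
    query: str
    docs: list = field(default_factory=list)
    answer: str = ""
    rewrites: int = 0
    trace: list = field(default_factory=list)  # visited nodes, in order


def run_graph(state, needs_evidence, retrieve, grade, rewrite, generate, critique):
    """Walk route -> retrieve -> grade -> (rewrite) -> generate -> critique."""
    node = "route"
    while node != "end":
        state.trace.append(node)
        if node == "route":
            node = "retrieve" if needs_evidence(state.query) else "generate"
        elif node == "retrieve":
            state.docs = retrieve(state.query)
            node = "grade"
        elif node == "grade":
            # CRAG-style gate: low relevance triggers a rewrite while budget remains.
            low = not grade(state.query, state.docs)
            node = "rewrite" if low and state.rewrites < MAX_REWRITES else "generate"
        elif node == "rewrite":
            state.query = rewrite(state.query)
            state.rewrites += 1
            node = "retrieve"
        elif node == "generate":
            state.answer = generate(state.query, state.docs)
            node = "critique"
        elif node == "critique":
            # Self-RAG-style check: an ungrounded answer loops back via rewrite.
            grounded = critique(state.answer, state.docs)
            node = "end" if grounded or state.rewrites >= MAX_REWRITES else "rewrite"
    return state


def stub_retrieve(query):
    """Stub retriever: only a query mentioning 'refund' finds the right doc."""
    return ["refund policy v2"] if "refund" in query else ["unrelated doc"]


final = run_graph(
    RagState(query="money back?"),
    needs_evidence=lambda q: True,
    retrieve=stub_retrieve,
    grade=lambda q, docs: "unrelated doc" not in docs,
    rewrite=lambda q: q + " refund",
    generate=lambda q, docs: f"Answer citing {docs[0]}",
    critique=lambda ans, docs: any(d in ans for d in docs),
)
print(final.trace)
```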
**Acceptance:** a LangGraph run with a forced low-relevance retrieval demonstrably loops
through `rewrite`; checkpoints land in `event-store`; one product reachable via A2A card.

**Effort:** M. **Risk:** low (mapping is 1:1 with today's state machine).

---
## §B — Hybrid retrieval: pgvector + BM25 + rerank + HyDE / CRAG / Self-RAG
**Goal:** turn "I understand hybrid RAG" into a running `services/rag-service`.
- **pgvector** alongside the existing Postgres → one DB, one backup, transactional consistency with source rows; **schema-per-tenant** namespaces (mirrors `productId`).
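- **BM25** lexical retrieval (an OpenSearch sidecar, or Postgres FTS as an approximation, since its built-in `ts_rank` scoring is not BM25) fused with vector results via **RRF** (reciprocal rank fusion).
- **Rerank** the fused candidates with a cross-encoder (bge-reranker) or a late-interaction model (ColBERT); apply **context compression** to fit the context budget.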
- **HyDE** query rewriter node; **CRAG** relevance gate; **Self-RAG** groundedness critic (the §A nodes).
- **Layout-aware ingestion** in `extraction-service`: PyMuPDF / Unstructured.io, OCR fallback, table preservation, **page/section provenance** on every chunk.
```mermaid
flowchart LR
Q --> HYDE[HyDE rewrite] --> EMB[embed]
EMB --> VEC[(pgvector ANN)]
Q --> BM[(BM25)]
VEC & BM --> RRF[RRF fuse] --> RR[cross-encoder rerank] --> CC[context compress] --> GEN
```
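The `RRF fuse` step in the diagram is small enough to show in full. Reciprocal rank fusion scores each document as the sum of `1 / (k + rank)` over every ranked list it appears in, so it needs only ranks, never comparable raw scores. The sketch below assumes the commonly used constant `k = 60`; the function name `rrf_fuse` and the document ids are illustrative.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=None):
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by doc id for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_n] if top_n else fused


vector_hits = ["d3", "d1", "d7"]  # hypothetical ANN results, best first
bm25_hits = ["d1", "d9", "d3"]    # hypothetical lexical results, best first
fused = rrf_fuse([vector_hits, bm25_hits], top_n=3)
print([doc_id for doc_id, _ in fused])
```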
**Acceptance:** hybrid beats vector-only on a golden set (context-recall ↑, context-precision ↑);
every chunk carries doc/page/section provenance; abstain fires when reranked top-score < τ.

**Effort:** L. **Risk:** medium (reranker latency budget; mitigate by reranking only the top-k fused candidates).

---
## §C — Guarded Text-to-SQL tool
**Goal:** add genuine generative SQL for ad-hoc analytics without the foot-guns.
- Register a `sql-query` tool on `mcp-server` scoped to **read-only semantic views** (no base tables), with **row-level security** by tenant/role.
- **Schema-aware retrieval:** embed table/column descriptions; retrieve only the relevant schema slice into the prompt (don't dump the catalog).
- Parse + validate generated SQL (allow-list of statements, forbid cross-schema joins, enforce `LIMIT`); cost-cap and timeout.
- Audit every generated query + row count to `event-store`.
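The parse-and-validate bullet can be sketched as a pre-execution guard. This is a deliberately conservative, regex-based illustration under stated assumptions: the schema name `analytics_views`, the `MAX_LIMIT` cap, and the `validate_sql` / `SqlRejected` names are hypothetical, and a production guard would more plausibly use a real SQL parser. The views and row-level security remain the actual enforcement layer; this check only rejects obviously bad queries early.

```python
import re

ALLOWED_SCHEMA = "analytics_views"  # hypothetical schema holding the read-only views
MAX_LIMIT = 1000                    # hypothetical row cap

FORBIDDEN = re.compile(
    r"(?i)\b(insert|update|delete|drop|alter|create|grant|truncate|copy)\b"
)


class SqlRejected(ValueError):
    """Raised when generated SQL fails a pre-execution check."""


def validate_sql(sql):
    """Return a safe-to-run statement or raise SqlRejected."""
    stmt = sql.strip().rstrip(";").strip()
    if ";" in stmt:
        raise SqlRejected("multiple statements")
    if not re.match(r"(?is)^select\b", stmt):
        raise SqlRejected("only SELECT is allowed")
    if FORBIDDEN.search(stmt):
        # Conservative: also rejects these words inside string literals.
        raise SqlRejected("forbidden keyword")
    # Every relation after FROM/JOIN must be qualified with the allowed schema.
    for rel in re.findall(r"(?i)\b(?:from|join)\s+([\w.]+)", stmt):
        schema, _, _table = rel.rpartition(".")
        if schema != ALLOWED_SCHEMA:
            raise SqlRejected(f"relation {rel!r} is outside {ALLOWED_SCHEMA}")
    limit = re.search(r"(?i)\blimit\s+(\d+)\s*$", stmt)
    if limit is None:
        return f"{stmt} LIMIT {MAX_LIMIT}"  # enforce a LIMIT when none is given
    if int(limit.group(1)) > MAX_LIMIT:
        raise SqlRejected("LIMIT exceeds cap")
    return stmt


print(validate_sql("SELECT id FROM analytics_views.orders"))
```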

**Acceptance:** an attempt to read an unentitled column is blocked at the view/RLS layer
and logged; a malformed/oversized query is rejected pre-execution.

**Effort:** M. **Risk:** medium (this is the highest-leakage surface; keep it behind views).

---
## §D — Cosmos Gremlin knowledge graph + Graph RAG
**Goal:** answer "connected-to" questions (KYC/AML-shaped) on infra we already run.
- Use the existing **Azure Cosmos DB Gremlin** API. Entity/relation extraction at ingest (from `extraction-service` output + structured rows) builds the graph.
- **Graph-augmented retrieval:** vector hit seeds an entry node → bounded Gremlin traversal returns the subgraph → fuse subgraph + text chunks into context.
- Expose a `graph-query` tool on `mcp-server` (read-only, depth-bounded).
```mermaid
flowchart LR
Q --> V[(vector seed)] --> N[entry entity]
N --> G[(Gremlin traversal<br/>≤2 hops)]
G --> SUB[subgraph]
SUB --> FUSE[fuse w/ text chunks] --> GEN
```
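The depth bound in the diagram is the part most worth pinning down. The sketch below uses an in-memory adjacency map as a stand-in for the Cosmos DB graph, so it shows the bounded expansion and the fuse step rather than any Gremlin syntax. The entity ids, edge labels, text chunk, and the `bounded_subgraph` / `fuse_context` helpers are all hypothetical.

```python
from collections import deque

# Hypothetical entity graph; in the design above this data would live in
# Cosmos DB and be reached through a depth-bounded Gremlin traversal.
EDGES = {
    "acct:123": [("owned_by", "person:ana")],
    "person:ana": [("director_of", "org:shellco")],
    "org:shellco": [("pays", "acct:999")],
    "acct:999": [],
}


def bounded_subgraph(edges, entry, max_hops=2):
    """Breadth-first expansion from `entry`, never following more than `max_hops` edges."""
    seen = {entry}
    triples = []
    queue = deque([(entry, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # depth bound reached: do not expand this node
        for label, neighbour in edges.get(node, []):
            triples.append((node, label, neighbour))
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return triples


def fuse_context(triples, chunks):
    """Render subgraph triples and text chunks into one prompt context string."""
    facts = [f"{s} -[{p}]-> {o}" for s, p, o in triples]
    return "\n".join(["Graph facts:", *facts, "Text evidence:", *chunks])


subgraph = bounded_subgraph(EDGES, "acct:123", max_hops=2)
print(fuse_context(subgraph, ["(illustrative text chunk about acct:123)"]))
```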
**Acceptance:** a 2-hop relationship question that vector-only fails is answered correctly
with the subgraph cited; traversal depth/time bounded.

**Effort:** L. **Risk:** medium (graph modeling + traversal cost).

---
## §E — Evaluation harness + factual-drift monitor in Hermes
**Goal:** make "RAGAS / faithfulness SLAs / drift monitoring" real and visible.
- **Offline (CI):** **DeepEval** pytest-style assertions on a golden set: faithfulness, answer-relevancy, context-precision, context-recall, answer-correctness. Regression below threshold **blocks deploy**.
- **Online:** sample production traces, score with **RAGAS / LLM-as-judge**, emit metrics via `telemetry-client`.
- **Hermes pane:** a "RAG Quality" panel (extends `hermes-ops`) trending the five metrics per tenant + a **drift alert** when faithfulness/recall degrade week-over-week.
- Wire **abstain rate** and **escalation rate** as first-class SLAs.
```mermaid
flowchart TB
subgraph CI["Offline / CI (DeepEval)"]
G[golden set] --> SC1[score] --> GATE{≥ SLA?}
GATE -- no --> BLOCK[block deploy]
GATE -- yes --> SHIP[ship]
end
subgraph PROD["Online (RAGAS / judge)"]
TR[sampled traces] --> SC2[score] --> TEL[telemetry-client] --> HERMES[Hermes RAG-Quality pane]
HERMES --> DRIFT{drift?} -- yes --> ALERT[alert + open finding]
end
```
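Both decision diamonds in the diagram reduce to simple comparisons once the scores exist. The sketch below assumes the five metrics arrive as plain dictionaries; it does not call DeepEval or RAGAS, and the threshold of 0.80, the 0.05 week-over-week tolerance, and the sample scores are placeholder values, not measured results.

```python
METRICS = (
    "faithfulness",
    "answer_relevancy",
    "context_precision",
    "context_recall",
    "answer_correctness",
)


def gate_failures(scores, thresholds):
    """Return the metrics scoring below their SLA threshold (empty list = ship)."""
    return [m for m in METRICS if scores[m] < thresholds[m]]


def drifted(last_week, this_week, watched=("faithfulness", "context_recall"), max_drop=0.05):
    """Flag watched metrics that fell by more than `max_drop` week-over-week."""
    return [m for m in watched if last_week[m] - this_week[m] > max_drop]


# Placeholder numbers purely for illustration.
SLA = dict.fromkeys(METRICS, 0.80)
candidate = {**dict.fromkeys(METRICS, 0.90), "context_recall": 0.70}

failures = gate_failures(candidate, SLA)
if failures:
    print("block deploy:", failures)

last_week = dict.fromkeys(METRICS, 0.90)
this_week = {**last_week, "faithfulness": 0.82}
print("drift alert:", drifted(last_week, this_week))
```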
**Acceptance:** a deliberately-degraded retriever fails the CI gate; the Hermes pane shows
the five metrics per tenant and fires a drift alert on a seeded regression.

**Effort:** M. **Risk:** low-medium (judge cost; sample traces rather than scoring 100%).

---
## §F — Model-card registry + governance pack
**Goal:** the regulated-grade documentation/audit layer (SR 11-7 / EU AI Act ready).
- **Model-card registry** (a `governance` package + Hermes pane): per deployed model/agent, record purpose, data sources, eval scores, known limits, owner, last-reviewed date, kill-switch link.
- **Decision log:** write every generation's (query, retrieved sources, model, faithfulness score, abstain/answer) to `event-store` for a reproducible audit trail.
- **RACI doc** template per engagement; **ADR** set under `docs/adr/` for each architectural choice.
- Map controls to **SR 11-7** (model inventory, validation, monitoring, change control) and **EU AI Act** (risk classification, logging, human oversight, transparency); see `05-banking-blueprints.md`.
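The decision-log bullet can be made concrete with one record type. The sketch below carries exactly the five fields listed above; the class name `DecisionRecord`, the `rag.decision` event type, and the sample values are hypothetical, and the SHA-256 content hash is an added assumption (one way to make later tampering detectable), not something `event-store` is known to require.

```python
from dataclasses import dataclass, asdict
import hashlib
import json


@dataclass(frozen=True)
class DecisionRecord:
    """One generation's audit entry; field names are illustrative."""
    query: str
    source_ids: tuple
    model: str
    faithfulness: float
    outcome: str  # "answer" or "abstain"

    def to_event(self):
        """Serialise deterministically and attach a content hash."""
        payload = json.dumps(asdict(self), sort_keys=True)
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        return {"type": "rag.decision", "payload": payload, "sha256": digest}


def verify(event):
    """Recompute the hash to confirm the stored payload was not altered."""
    actual = hashlib.sha256(event["payload"].encode("utf-8")).hexdigest()
    return actual == event["sha256"]


# Sample values are placeholders, not real traffic.
record = DecisionRecord(
    query="What is the refund window?",
    source_ids=("doc-17#p3",),
    model="example-model",
    faithfulness=0.93,
    outcome="answer",
)
event = record.to_event()
print(verify(event), json.loads(event["payload"])["source_ids"])
```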

**Acceptance:** every production model has a card with current eval scores + owner; any
answer can be reconstructed from the decision log; controls trace to named regulatory clauses.

**Effort:** M. **Risk:** low (mostly assembly over existing `event-store`/flags/auth).

---
## Sequencing & "what I'd do in the first 90 days" (great closing answer)
```mermaid
gantt
title Agentic-RAG hardening — 90-day view
dateFormat X
axisFormat %s
section Foundation
§A LangGraph + A2A :a, 0, 3s
§B Hybrid retrieval :b, 1, 5s
section Sources & Quality
§C Guarded Text-to-SQL :c, 5, 3s
§D Graph RAG (Gremlin) :d, 5, 4s
§E Eval harness + drift :e, 4, 4s
section Governance
§F Model cards + RACI :f, 8, 3s
```
> *"In 90 days I'd stand up the retrieval spine and the eval harness first — because you
> can't tune what you can't measure — then layer structured + graph sources, and close with
> the governance pack so the whole thing is audit-ready. Notice governance isn't last
> because it's least important; it's last because by then it's mostly **assembling controls
> the platform already enforces** (auth, masking, kill-switch, audit) into cards and RACI."*