# 04 · Enhancement Roadmap — make every claim literally true

This is the "what would you build here" answer, and it doubles as a real backlog. Each
enhancement turns an *adjacent* capability into a *shipped* one on infrastructure we
already run. They're ordered so each builds on the last; the whole set is a credible
"agentic-RAG fabric, hardened" program.

> Mapping note: these slot into the existing repo conventions — new code under
> `learning_ai_common_plat/packages` + a `services/rag-service`, eval harness surfaced in
> `learning_ai_devops_tools/dashboard` (Hermes), and ADRs under
> `learning_ai_devops_tools/docs/adr/`. Cut tracker items via `scripts/tracker-seed/`.

```mermaid
flowchart LR
  A["§A LangGraph port<br/>+ A2A agent cards"] --> B["§B Hybrid retrieval<br/>pgvector+BM25+rerank+HyDE/CRAG/Self-RAG"]
  B --> C["§C Guarded Text-to-SQL<br/>read-only views + RLS"]
  B --> D["§D Cosmos Gremlin<br/>knowledge graph + Graph RAG"]
  B --> E["§E RAGAS/DeepEval harness<br/>+ drift monitor in Hermes"]
  C & D --> F["§F Model-card registry<br/>+ governance pack"]
  E --> F
  classDef p1 fill:#dcfce7,stroke:#16a34a
  classDef p2 fill:#fef9c3,stroke:#ca8a04
  classDef p3 fill:#fee2e2,stroke:#dc2626
  class A,B p1
  class C,D,E p2
  class F p3
```

| Phase | Enhancements | Why now |
|---|---|---|
| **P1 (foundation)** | §A, §B | Orchestration + retrieval are the spine; everything else attaches to them. |
| **P2 (sources + quality)** | §C, §D, §E | Add structured + graph sources and the eval loop that proves quality. |
| **P3 (governance)** | §F | Wrap the now-real fabric in the regulated-grade governance story. |

---

## §A — Port `agent-queue` topology onto LangGraph + add A2A handoff

**Goal:** make the "prod-grade LangGraph" claim literal while keeping the proven state model.

- New `packages/agent-graph`: a typed `StateGraph` with nodes `route → retrieve → grade → (rewrite) → generate → critique`, conditional + cyclic edges, and a checkpointer backed by `event-store`.
- Keep `agent-queue`'s engine-selection idea as **node-level model binding** through `llm-router`.
- Expose each product agent with an **A2A agent card** (capabilities, auth scope, cost hints) so a supervisor agent can delegate; the card is served by `mcp-server`.

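To make the delegation bullet concrete, here is a minimal sketch of a supervisor choosing an agent from its cards. Everything in it is hypothetical: the field names (`capabilities`, `auth_scopes`, `cost_per_call_usd`), the two example agents, and `pick_agent` are illustrative only. They are not the A2A specification's schema, and nothing here reflects what `mcp-server` serves today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentCard:
    """Hypothetical card shape: capabilities, auth scope, cost hints."""
    name: str
    capabilities: frozenset
    auth_scopes: frozenset       # scopes a caller must hold to invoke the agent
    cost_per_call_usd: float     # a hint for the supervisor, not a billing figure


def pick_agent(cards: list, capability: str, caller_scopes: set) -> Optional[AgentCard]:
    """Return the cheapest card that offers `capability` and that the caller may invoke."""
    eligible = [
        card for card in cards
        if capability in card.capabilities and card.auth_scopes <= caller_scopes
    ]
    return min(eligible, key=lambda card: card.cost_per_call_usd, default=None)


# Made-up agents for illustration only.
CARDS = [
    AgentCard("notes-agent", frozenset({"summarize", "search"}),
              frozenset({"notes:read"}), 0.002),
    AgentCard("finance-agent", frozenset({"summarize", "sql"}),
              frozenset({"finance:read"}), 0.010),
]
```

A supervisor holding only `notes:read` gets `None` back for a `sql` request, which is the point of carrying the auth scope on the card: delegation fails closed before any agent is called.
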
```mermaid
stateDiagram-v2
  [*] --> route
  route --> retrieve: needs evidence
  route --> generate: parametric/FAQ
  retrieve --> grade
  grade --> rewrite: low relevance (CRAG)
  rewrite --> retrieve
  grade --> generate: ok
  generate --> critique
  critique --> rewrite: ungrounded (Self-RAG)
  critique --> [*]: grounded + cited
```

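The control flow in the diagram can be sketched without any framework. The version below is plain Python, not LangGraph: each node is a function over a shared state dict, `next_node` plays the role of the conditional edges, and `MAX_REWRITES` is an assumed budget (the diagram does not specify one) that keeps both cycles bounded. All node bodies are stubs with made-up scores.

```python
END = "__end__"
MAX_REWRITES = 2  # assumed budget so the rewrite cycles always terminate


def route(s):
    # Stub router: queries tagged "faq:" are answered without retrieval.
    s["needs_evidence"] = not s["query"].startswith("faq:")
    return s


def retrieve(s):
    # Stub retriever: relevance improves after each rewrite.
    s["relevance"] = 0.2 + 0.5 * s["rewrites"]
    return s


def grade(s):
    s["relevant"] = s["relevance"] >= 0.5  # CRAG-style relevance gate
    return s


def rewrite(s):
    s["rewrites"] += 1
    return s


def generate(s):
    s["answer"] = f"answer to {s['query']!r}"
    return s


def critique(s):
    # Stub critic: grounded when the evidence was relevant, or none was needed.
    s["grounded"] = s.get("relevant", True)
    return s


NODES = {"route": route, "retrieve": retrieve, "grade": grade,
         "rewrite": rewrite, "generate": generate, "critique": critique}


def next_node(current, s):
    """Conditional edges, mirroring the state diagram."""
    budget_left = s["rewrites"] < MAX_REWRITES
    if current == "route":
        return "retrieve" if s["needs_evidence"] else "generate"
    if current == "retrieve":
        return "grade"
    if current == "grade":
        return "rewrite" if (not s["relevant"] and budget_left) else "generate"
    if current == "rewrite":
        return "retrieve"
    if current == "generate":
        return "critique"
    # critique: loop on ungrounded output while budget remains, else finish
    return "rewrite" if (not s["grounded"] and budget_left) else END


def run(query):
    s = {"query": query, "rewrites": 0, "trace": []}
    node = "route"
    while node != END:
        s["trace"].append(node)
        s = NODES[node](s)
        node = next_node(node, s)
    return s
```

Calling `run("who owns account 42?")` takes the low-relevance path once, so its trace passes through `rewrite` before reaching `generate`, which is the behaviour the acceptance criterion asks a real LangGraph run to demonstrate.
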
**Acceptance:** a LangGraph run with a forced low-relevance retrieval demonstrably loops
through `rewrite`; checkpoints land in `event-store`; one product reachable via A2A card.
**Effort:** M. **Risk:** low (mapping is 1:1 with today's state machine).

---

## §B — Hybrid retrieval: pgvector + BM25 + rerank + HyDE / CRAG / Self-RAG

**Goal:** turn "I understand hybrid RAG" into a running `services/rag-service`.

- **pgvector** alongside the existing Postgres → one DB, one backup, transactional consistency with source rows; **schema-per-tenant** namespaces (mirrors `productId`).
- **Lexical** retrieval (BM25 via an OpenSearch sidecar, or Postgres FTS as a lighter stand-in; its built-in `ts_rank` is not BM25) fused with the vector results via **RRF**.
- **Rerank** the fused candidates with a cross-encoder (bge-reranker) or a late-interaction model (ColBERT); then **context compression** to fit the token budget.
- **HyDE** query rewriter node; **CRAG** relevance gate; **Self-RAG** groundedness critic (the §A nodes).
- **Layout-aware ingestion** in `extraction-service`: PyMuPDF / Unstructured.io, OCR fallback, table preservation, **page/section provenance** on every chunk.

```mermaid
flowchart LR
  Q --> HYDE[HyDE rewrite] --> EMB[embed]
  EMB --> VEC[(pgvector ANN)]
  Q --> BM[(BM25)]
  VEC & BM --> RRF[RRF fuse] --> RR[cross-encoder rerank] --> CC[context compress] --> GEN
```

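The `RRF fuse` step is small enough to show in full. The sketch below implements reciprocal rank fusion over two ranked lists of chunk ids: each list contributes `1 / (k + rank)` per document, with ranks starting at 1. The chunk ids are made up, and `k = 60` is the commonly used default rather than a tuned value.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=10):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


vector_hits = ["c3", "c1", "c7"]  # hypothetical pgvector ANN order
bm25_hits = ["c1", "c9", "c3"]    # hypothetical lexical order
fused = rrf_fuse([vector_hits, bm25_hits])
```

Because RRF uses only ranks, it needs no score normalisation between the vector and lexical retrievers, which is why it is a convenient first fusion method. Here `c1` wins because it ranks near the top of both lists.
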
**Acceptance:** hybrid beats vector-only on a golden set (context-recall ↑, context-precision ↑);
every chunk carries doc/page/section provenance; abstain fires when the top reranked score < τ.
**Effort:** L. **Risk:** medium (reranker latency budget; mitigate by reranking only the top-k fused candidates).

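The provenance and abstain criteria can be pinned down in a few lines. This is a sketch under assumptions: the `Chunk` fields and `select_context` are hypothetical names, and the threshold `tau` stands in for a τ that would be calibrated on the golden set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    text: str
    doc_id: str          # provenance: source document
    page: int            # provenance: page within the document
    section: str         # provenance: section heading
    rerank_score: float  # score assigned by the reranker


def select_context(chunks: list, tau: float, top_k: int = 5) -> Optional[list]:
    """Return the top-k reranked chunks, or None to signal that the agent should abstain."""
    ranked = sorted(chunks, key=lambda c: c.rerank_score, reverse=True)
    if not ranked or ranked[0].rerank_score < tau:
        return None  # best available evidence is below the threshold
    return ranked[:top_k]


weak = [Chunk("fees table", "tariff.pdf", 3, "2.1 Fees", 0.21)]
strong = weak + [Chunk("overdraft terms", "terms.pdf", 12, "7 Overdrafts", 0.88)]
```
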
---

## §C — Guarded Text-to-SQL tool

**Goal:** add genuine generative SQL for ad-hoc analytics without the foot-guns.

- Register a `sql-query` tool on `mcp-server` scoped to **read-only semantic views** (no base tables), with **row-level security** by tenant/role.
- **Schema-aware retrieval:** embed table/column descriptions; retrieve only the relevant schema slice into the prompt (don't dump the catalog).
- Parse + validate generated SQL (allow-list of statements, forbid cross-schema joins, enforce `LIMIT`); cost-cap and timeout.
- Audit every generated query + row count to `event-store`.

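The validation bullet can be sketched as a pre-execution guard. The version below is deliberately crude: it uses regular expressions where a production guard would walk a real SQL parser's syntax tree, and it is only a second line of defence behind the views and RLS. The schema name `analytics_views`, the row cap, and `guard_sql` are assumptions made for the example.

```python
import re

ALLOWED_SCHEMA = "analytics_views"  # hypothetical schema holding the read-only views
MAX_LIMIT = 1000                    # assumed row cap

_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|merge|drop|alter|create|grant|revoke|truncate|copy)\b",
    re.IGNORECASE,
)
_SCHEMA_REF = re.compile(r"\b(?:from|join)\s+([a-z_][a-z0-9_]*)\s*\.", re.IGNORECASE)
_TRAILING_LIMIT = re.compile(r"\blimit\s+(\d+)\s*$", re.IGNORECASE)


def guard_sql(sql: str):
    """Return (True, safe_sql) or (False, reason). Nothing is executed here."""
    stmt = sql.strip().rstrip(";").strip()
    if ";" in stmt or "--" in stmt or "/*" in stmt:
        return False, "multiple statements or comments"
    if not re.match(r"(select|with)\b", stmt, re.IGNORECASE):
        return False, "only SELECT statements are allowed"
    if _FORBIDDEN.search(stmt):
        # Conservative: also rejects harmless string literals such as 'delete'.
        return False, "write or DDL keyword present"
    # Schema-qualified names must point at the allowed schema. Unqualified
    # names pass, on the assumption that search_path is pinned to the views.
    for schema in _SCHEMA_REF.findall(stmt):
        if schema.lower() != ALLOWED_SCHEMA:
            return False, f"schema {schema!r} is not allowed"
    match = _TRAILING_LIMIT.search(stmt)
    if match is None:
        return True, f"{stmt} LIMIT {MAX_LIMIT}"
    if int(match.group(1)) > MAX_LIMIT:
        return True, f"{stmt[:match.start()]}LIMIT {MAX_LIMIT}"
    return True, stmt
```

Returning the rejection reason rather than raising makes it easy to write both outcomes to the audit trail: the accepted statement with its row count, or the rejected text with the reason.
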
**Acceptance:** an attempt to read an unentitled column is blocked at the view/RLS layer
and logged; a malformed/oversized query is rejected pre-execution.
**Effort:** M. **Risk:** medium (this is the highest-leakage surface — keep it behind views).

---

## §D — Cosmos Gremlin knowledge graph + Graph RAG

**Goal:** answer "connected-to" questions (KYC/AML-shaped) on infra we already run.

- Use the existing **Azure Cosmos DB Gremlin** API. Entity/relation extraction at ingest (from `extraction-service` output + structured rows) builds the graph.
- **Graph-augmented retrieval:** vector hit seeds an entry node → bounded Gremlin traversal returns the subgraph → fuse subgraph + text chunks into context.
- Expose a `graph-query` tool on `mcp-server` (read-only, depth-bounded).

```mermaid
flowchart LR
  Q --> V[(vector seed)] --> N[entry entity]
  N --> G[(Gremlin traversal<br/>≤2 hops)]
  G --> SUB[subgraph]
  SUB --> FUSE[fuse w/ text chunks] --> GEN
```

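The "≤2 hops" bound is the part worth pinning down. The sketch below runs the same bounded expansion over an in-memory adjacency map rather than Cosmos DB; the real traversal would be written in Gremlin. The example graph, its edge labels, and `bounded_subgraph` are invented for illustration.

```python
from collections import deque

# Hypothetical stand-in for the graph: node -> [(edge_label, neighbour)]
GRAPH = {
    "acct:42": [("owned_by", "person:ana")],
    "person:ana": [("director_of", "org:acme")],
    "org:acme": [("subsidiary_of", "org:holdco")],
    "org:holdco": [],
}


def bounded_subgraph(graph, seed, max_hops=2, max_edges=50):
    """Breadth-first expansion from `seed`, capped by hop depth and by edge count."""
    edges = []
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # never expand past the hop limit
        for label, neighbour in graph.get(node, []):
            if len(edges) >= max_edges:
                return edges  # hard cap on traversal cost
            edges.append((node, label, neighbour))
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return edges


subgraph = bounded_subgraph(GRAPH, "acct:42")
```

With the default two hops, the result links the account to its owner and the owner to `org:acme`, but stops before `org:holdco`. Those returned edges are what would be cited alongside the text chunks.
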
**Acceptance:** a 2-hop relationship question that vector-only fails is answered correctly
with the subgraph cited; traversal depth/time bounded.
**Effort:** L. **Risk:** medium (graph modeling + traversal cost).

---

## §E — Evaluation harness + factual-drift monitor in Hermes

**Goal:** make "RAGAS / faithfulness SLAs / drift monitoring" real and visible.

- **Offline (CI):** **DeepEval** pytest-style assertions on a golden set — faithfulness, answer-relevancy, context-precision, context-recall, answer-correctness. Regression below threshold **blocks deploy**.
- **Online:** sample production traces, score with **RAGAS / LLM-as-judge**, emit metrics via `telemetry-client`.
- **Hermes pane:** a "RAG Quality" panel (extends `hermes-ops`) trending the five metrics per tenant + a **drift alert** when faithfulness/recall degrade week-over-week.
- Wire **abstain rate** and **escalation rate** as first-class SLAs.

```mermaid
flowchart TB
  subgraph CI["Offline / CI (DeepEval)"]
    G[golden set] --> SC1[score] --> GATE{≥ SLA?}
    GATE -- no --> BLOCK[block deploy]
    GATE -- yes --> SHIP[ship]
  end
  subgraph PROD["Online (RAGAS / judge)"]
    TR[sampled traces] --> SC2[score] --> TEL[telemetry-client] --> HERMES[Hermes RAG-Quality pane]
    HERMES --> DRIFT{drift?} -- yes --> ALERT[alert + open finding]
  end
```

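The two decision diamonds in the diagram reduce to simple comparisons once the scores exist. The sketch below assumes the five metrics have already been computed (by DeepEval offline or RAGAS online) and shows only the gating logic. The threshold values, the 0.05 drift tolerance, and the function names are placeholders, not agreed SLAs.

```python
SLA = {  # assumed floors; real values would be set per tenant
    "faithfulness": 0.90,
    "answer_relevancy": 0.85,
    "context_precision": 0.80,
    "context_recall": 0.80,
    "answer_correctness": 0.80,
}


def ci_gate(scores, sla=SLA):
    """Return the metrics below their floor; an empty list means the build may ship."""
    # A metric missing from `scores` counts as 0.0, so it fails the gate.
    return sorted(m for m, floor in sla.items() if scores.get(m, 0.0) < floor)


def drift_alert(this_week, last_week,
                watched=("faithfulness", "context_recall"), max_drop=0.05):
    """Flag watched metrics whose week-over-week drop exceeds `max_drop`."""
    return [m for m in watched if last_week[m] - this_week[m] > max_drop]


healthy = {"faithfulness": 0.93, "answer_relevancy": 0.90, "context_precision": 0.85,
           "context_recall": 0.82, "answer_correctness": 0.84}
degraded = {**healthy, "faithfulness": 0.85, "context_recall": 0.81}
```

In CI, a non-empty list from `ci_gate` would fail the job; online, a non-empty list from `drift_alert` would raise the alert and open a finding.
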
**Acceptance:** a deliberately-degraded retriever fails the CI gate; the Hermes pane shows
the five metrics per tenant and fires a drift alert on a seeded regression.
**Effort:** M. **Risk:** low-medium (judge cost — sample, don't score 100%).

---

## §F — Model-card registry + governance pack

**Goal:** the regulated-grade documentation/audit layer (SR 11-7 / EU AI Act ready).

- **Model-card registry** (a `governance` package + Hermes pane): per deployed model/agent — purpose, data sources, eval scores, known limits, owner, last-reviewed date, kill-switch link.
- **Decision log:** write each generation's record (query, retrieved sources, model, faithfulness score, abstain/answer) to `event-store` → reproducible audit trail.
- **RACI doc** template per engagement; **ADR** set under `docs/adr/` for each architectural choice.
- Map controls to **SR 11-7** (model inventory, validation, monitoring, change control) and **EU AI Act** (risk classification, logging, human oversight, transparency) — see `05-banking-blueprints.md`.

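A minimal sketch of the two records named above, assuming nothing about the real `governance` package: a model card with a staleness check, and a decision-log entry. The field names, the 90-day review window, and the SHA-256 content hash (added here as a cheap tamper-evidence measure) are all assumptions.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ModelCard:
    model_id: str
    purpose: str
    owner: str
    last_reviewed: date
    eval_scores: dict
    kill_switch_flag: str  # name of the feature flag that disables the model


def stale_cards(cards, today, max_age_days=90):
    """Ids of cards whose last review is older than the assumed review window."""
    cutoff = today - timedelta(days=max_age_days)
    return [card.model_id for card in cards if card.last_reviewed < cutoff]


def decision_record(query, source_ids, model_id, faithfulness, answered):
    """Build one audit entry; the hash covers every field of the entry."""
    body = {
        "query": query,
        "sources": sorted(source_ids),
        "model": model_id,
        "faithfulness": faithfulness,
        "outcome": "answer" if answered else "abstain",
    }
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "sha256": digest}


cards = [
    ModelCard("rag-answerer", "policy Q&A", "platform-team", date(2025, 1, 10),
              {"faithfulness": 0.93}, "kill.rag-answerer"),
    ModelCard("sql-agent", "ad-hoc analytics", "data-team", date(2025, 5, 2),
              {"faithfulness": 0.91}, "kill.sql-agent"),
]
```

Sorting the source ids before hashing means the same evidence set always yields the same digest, regardless of retrieval order.
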
**Acceptance:** every production model has a card with current eval scores + owner; any
answer can be reconstructed from the decision log; controls trace to named regulatory clauses.
**Effort:** M. **Risk:** low (mostly assembly over existing `event-store`/flags/auth).

---

## Sequencing & "what I'd do in the first 90 days" (great closing answer)

```mermaid
gantt
  title Agentic-RAG hardening — 90-day view
  dateFormat X
  axisFormat %s
  section Foundation
  §A LangGraph + A2A      :a, 0, 3
  §B Hybrid retrieval     :b, 1, 6
  section Sources & Quality
  §C Guarded Text-to-SQL  :c, 5, 8
  §D Graph RAG (Gremlin)  :d, 5, 9
  §E Eval harness + drift :e, 4, 8
  section Governance
  §F Model cards + RACI   :f, 8, 11
```

> *"In 90 days I'd stand up the retrieval spine and the eval harness first — because you
> can't tune what you can't measure — then layer structured + graph sources, and close with
> the governance pack so the whole thing is audit-ready. Notice governance isn't last
> because it's least important; it's last because by then it's mostly **assembling controls
> the platform already enforces** (auth, masking, kill-switch, audit) into cards and RACI."*