# 01 · The ByteLyst Ecosystem as an Agentic RAG Fabric

The trick in this interview is to stop treating ByteLyst as "a bunch of side projects"
and start describing it as **one governed retrieval fabric with multiple agentic
front-ends**. Every diagram below is something you can reproduce on a whiteboard.

---

## 1. System context — what we actually run

```mermaid
flowchart TB
    subgraph Users["👤 Humans & Agents"]
        U1[End users<br/>web / mobile]
        U2[Coding agents<br/>claude · codex · devin]
        U3[Operators<br/>Hermes Mission Control]
    end

    subgraph Fronts["Agentic Product Surfaces"]
        P1["invt_trdg<br/>AI trading chat<br/>(tool-calling over markets)"]
        P2["flowmonk<br/>planning + bounded AI layer"]
        P3["notelett<br/>notes for humans + agents"]
        P4["chronomind<br/>contextual time AI"]
    end

    subgraph Platform["common_plat — the shared fabric"]
        PS["platform-service :4003<br/>auth · flags · telemetry · billing · blob"]
        ES["extraction-service :4005<br/>URL / doc → retrievable units"]
        MCP["mcp-server :4007<br/>tool / resource registration"]
        LR["packages/llm-router<br/>provider abstraction"]
    end

    subgraph Data["Governed Data Sources"]
        DB[("Cosmos DB<br/>docs + Gremlin graph")]
        PG[("Postgres<br/>structured + pgvector*")]
        EV[("event-store<br/>immutable audit")]
        BLOB[("blob<br/>raw documents")]
    end

    subgraph Ops["Control Plane"]
        HERMES["Hermes Mission Control<br/>(devops_tools/dashboard)"]
        AQ["agent-queue<br/>multi-agent runner"]
    end

    U1 --> P1 & P2 & P3 & P4
    U2 --> AQ
    U3 --> HERMES
    P1 & P2 & P3 & P4 --> PS
    P1 & P2 & P3 & P4 --> MCP
    MCP --> LR
    ES --> BLOB
    PS --> DB & PG & EV
    MCP --> DB & PG
    AQ --> MCP
    HERMES --> PS
    HERMES -.observes.-> ES & MCP & LR

    classDef plan fill:#fef3c7,stroke:#d97706
    class PG,LR plan
```

> `*` pgvector and the Gremlin graph are planned hardening work (see `04-enhancement-roadmap.md`).
> Everything else is a real, deployed component of the ecosystem.

**How to narrate it:** *"The platform-service is my policy/identity plane, the mcp-server
is my tool-boundary plane, llm-router is my model plane, and the data sources are governed
behind both. Any product surface is just a thin agentic UI over that fabric — which is
exactly the shape of an enterprise agentic-RAG platform."*

---

## 2. The reference agentic-RAG container view

This is the canonical picture the interviewer wants to see — drawn in *our* components.

```mermaid
flowchart LR
    Q[User query] --> ROUTER

    subgraph Orchestration["Agentic Orchestration (LangGraph-shaped)"]
        ROUTER{{"Router / planner agent<br/>intent + complexity"}}
        RETR["Retriever agent"]
        GRADE{{"Relevance grader<br/>(CRAG gate)"}}
        REWRITE["Query rewriter<br/>(HyDE)"]
        GEN["Generator agent<br/>+ citation enforcer"]
        CRITIC{{"Self-RAG critic<br/>groundedness check"}}
    end

    subgraph Retrieval["Hybrid Retrieval Fabric"]
        VEC[("Vector<br/>pgvector / Azure AI Search")]
        BM25[("Lexical<br/>BM25")]
        GRAPH[("Graph traversal<br/>Cosmos Gremlin")]
        SQL[("Structured<br/>schema-aware SQL tool")]
        RERANK["Cross-encoder rerank<br/>+ context compression"]
    end

    subgraph Gov["Governance plane (every hop)"]
        ACL["Access-controlled retrieval<br/>auth + row/col masking"]
        AUDIT["event-store audit trail"]
        KILL["kill-switch / flags"]
    end

    ROUTER --> RETR
    RETR --> VEC & BM25 & GRAPH & SQL
    VEC & BM25 & GRAPH & SQL --> RERANK
    RERANK --> GRADE
    GRADE -- "low relevance" --> REWRITE --> RETR
    GRADE -- "ok" --> GEN
    GEN --> CRITIC
    CRITIC -- "ungrounded" --> REWRITE
    CRITIC -- "grounded + cited" --> A[Answer + citations]

    RETR -.enforced by.-> ACL
    GEN -.logged to.-> AUDIT
    ROUTER -.gated by.-> KILL
```
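The two feedback edges in the diagram (`GRADE` to `REWRITE`, `CRITIC` to `REWRITE`) can be sketched as a bounded loop in plain Python. Everything here is a hypothetical stand-in: `retrieve`, `grade`, `rewrite`, `generate`, and `is_grounded` are injected callables rather than ByteLyst or LangGraph APIs, and `max_rounds` is an assumed safety bound that the diagram does not show.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RagResult:
    answer: Optional[str]   # None means the loop gave up without an answer
    rounds: int             # number of retrieval passes used


def answer_with_correction(
    query: str,
    retrieve: Callable[[str], List[str]],
    grade: Callable[[str, List[str]], bool],        # CRAG-style relevance gate
    rewrite: Callable[[str], str],                  # query rewriter
    generate: Callable[[str, List[str]], str],
    is_grounded: Callable[[str, List[str]], bool],  # Self-RAG-style critic
    max_rounds: int = 3,                            # assumed bound, not in the diagram
) -> RagResult:
    current = query
    for round_no in range(1, max_rounds + 1):
        chunks = retrieve(current)
        if not grade(current, chunks):
            current = rewrite(current)   # GRADE -- "low relevance" --> REWRITE
            continue
        draft = generate(query, chunks)  # always answer the original question
        if is_grounded(draft, chunks):
            return RagResult(draft, round_no)
        current = rewrite(current)       # CRITIC -- "ungrounded" --> REWRITE
    return RagResult(None, max_rounds)
```

Bounding the loop is a design choice worth saying out loud: without it, a query with no good evidence would cycle through rewrite and retrieve indefinitely.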

**Key talking points keyed to the JD:**
- *Hybrid search (vector + BM25 + graph)* → the four parallel retrievers fan out and the reranker fans in.
- *Reranking + context compression* → the `RERANK` node (a cross-encoder such as a bge-reranker, or a late-interaction model such as ColBERT).
- *CRAG* → the `GRADE` gate that triggers corrective re-retrieval.
- *HyDE* → the `REWRITE` node generating a hypothetical answer to embed.
- *Self-RAG* → the `CRITIC` node reflecting on groundedness before release.
- *Access-controlled retrieval / Zero Trust / audit* → the governance plane wraps **every** hop, not just the entrance.
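
The fan-out / fan-in in the first talking point can be sketched as follows. This is a minimal illustration under stated assumptions: the retrievers are plain callables, `score` stands in for whatever reranker model is used, and truncating to `top_k` is only the crudest form of context compression. None of these names come from the ByteLyst codebase.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Retriever = Callable[[str], List[str]]


def hybrid_retrieve(
    query: str,
    retrievers: Dict[str, Retriever],
    score: Callable[[str, str], float],   # stand-in for a reranker model
    top_k: int = 3,
) -> List[str]:
    # Fan-out: run every retriever concurrently on the same query.
    with ThreadPoolExecutor(max_workers=max(1, len(retrievers))) as pool:
        futures = [pool.submit(fn, query) for fn in retrievers.values()]
        batches = [f.result() for f in futures]
    # Fan-in: merge and de-duplicate while keeping first-seen order.
    merged = list(dict.fromkeys(doc for batch in batches for doc in batch))
    # Rerank the merged pool and keep only the best top_k candidates.
    return sorted(merged, key=lambda doc: score(query, doc), reverse=True)[:top_k]
```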

---

## 3. Multi-agent orchestration topology (we run a real one)

`agent-queue/` is a production folder-kanban that drives **three different agent engines**
(`claude`, `codex`, `devin`) through an explicit state machine. That *is* multi-agent
orchestration — and it's the strongest "I've shipped agents" story you have.

```mermaid
stateDiagram-v2
    [*] --> inbox: drop prompt .md
    inbox --> doing: runner claims (auto-approve)
    doing --> done: success
    doing --> failed: error / timeout
    failed --> inbox: requeue (human-in-loop)
    done --> [*]

    note right of doing
      Engine selected per task:
      claude · codex · devin
      = heterogeneous agent pool
    end note
```
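The same state machine can be written as a transition table, which is a convenient way to show the interviewer that the legal edges are explicit. This is an illustrative Python sketch only: the folder names and engine names come from the diagram, while the `Task` class, the `transition` helper, and the choice to raise `ValueError` on illegal moves are assumptions rather than the actual `agent-queue` runner.

```python
from dataclasses import dataclass, field
from typing import List

# Legal moves, copied from the state diagram.
TRANSITIONS = {
    "inbox": {"doing"},
    "doing": {"done", "failed"},
    "failed": {"inbox"},   # human-in-the-loop requeue
    "done": set(),         # terminal state
}
ENGINES = {"claude", "codex", "devin"}


@dataclass
class Task:
    name: str
    engine: str
    state: str = "inbox"
    history: List[str] = field(default_factory=lambda: ["inbox"])

    def __post_init__(self) -> None:
        if self.engine not in ENGINES:
            raise ValueError(f"unknown engine: {self.engine}")


def transition(task: Task, target: str) -> Task:
    """Move a task to `target`, rejecting edges the diagram does not have."""
    if target not in TRANSITIONS[task.state]:
        raise ValueError(f"illegal move {task.state} -> {target}")
    task.state = target
    task.history.append(target)
    return task
```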

Map this to LangGraph vocabulary in the room:

| agent-queue concept | LangGraph / agentic equivalent |
|---|---|
| `inbox/doing/done/failed` folders | graph **nodes** / state enum |
| runner claiming + transitioning | **conditional edges** |
| engine flag (claude/codex/devin) | **tool/agent binding** per node |
| `failed → inbox` requeue | **cyclic edge** w/ human-in-the-loop checkpoint |
| live `status`/`watch` | **state checkpointer** + observability |

> Honest framing: *"I built this deliberately framework-light to stay bash-portable and
> dependency-free. The state model maps directly onto LangGraph's; porting it onto
> `StateGraph` mostly buys me typed state, built-in checkpointing, and the A2A handoff
> contract, which is exactly the enhancement I've scoped."*

---

## 4. MCP server — Zero-Trust tool boundary

This is your strongest *governance* asset and a direct hit on a Preferred Qualification
("MCP server architecture, tool/resource registration patterns, agentic security threat
modeling"). We run `mcp-server` on :4007 with `packages/mcp-client`.

```mermaid
flowchart TB
    subgraph Agent["Agent (untrusted by default)"]
        A[LLM reasoning loop]
    end

    subgraph Boundary["mcp-server :4007 — policy enforcement point"]
        REG["Tool / resource registry<br/>(declared, typed, versioned)"]
        AUTHZ{"AuthZ check<br/>identity + scope + role"}
        MASK["Row/column masking<br/>field-encrypt"]
        RATE["Rate / cost limits + kill-switch"]
        LOG["Audit emit → event-store"]
    end

    subgraph Resources["Governed resources"]
        T1[Market data tool]
        T2[Doc retrieval tool]
        T3[Graph query tool]
        T4[Text-to-SQL tool<br/>read-only views]
    end

    A -- "tool call (intent)" --> REG
    REG --> AUTHZ
    AUTHZ -- deny --> A
    AUTHZ -- allow --> MASK
    MASK --> RATE
    RATE --> T1 & T2 & T3 & T4
    T1 & T2 & T3 & T4 --> LOG
    LOG --> A
```
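The order of checks in the diagram can be sketched as a single `call` method. This is a hedged illustration, not the real `mcp-server`: the `ToolBoundary`, `Tool`, and `Caller` types, the `pii:read` scope, and the per-user call counter are all hypothetical. Two simplifications are assumed: masking is modeled as a set of hidden fields chosen before the tool runs (matching the diagram order), and denials are also written to the audit log, which the diagram does not show.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Caller:
    user_id: str
    scopes: FrozenSet[str]


@dataclass(frozen=True)
class Tool:
    name: str
    required_scope: str
    sensitive_fields: FrozenSet[str]   # hidden unless caller holds "pii:read"
    run: Callable[[Dict[str, Any]], Dict[str, Any]]


class ToolBoundary:
    def __init__(self, max_calls_per_user: int = 5) -> None:
        self.registry: Dict[str, Tool] = {}
        self.audit_log: List[Dict[str, str]] = []   # stand-in for event-store
        self.calls: Dict[str, int] = {}
        self.max_calls = max_calls_per_user
        self.disabled: Set[str] = set()             # per-tool kill-switch

    def register(self, tool: Tool) -> None:
        self.registry[tool.name] = tool

    def call(self, caller: Caller, name: str, args: Dict[str, Any]) -> Dict[str, Any]:
        tool = self.registry.get(name)                                  # REG
        if tool is None or tool.required_scope not in caller.scopes:    # AUTHZ
            return self._emit(caller, name, "deny", {})
        hidden = set() if "pii:read" in caller.scopes else set(tool.sensitive_fields)  # MASK
        used = self.calls.get(caller.user_id, 0)
        if name in self.disabled or used >= self.max_calls:             # RATE / kill-switch
            return self._emit(caller, name, "throttled", {})
        self.calls[caller.user_id] = used + 1
        raw = tool.run(args)
        visible = {k: v for k, v in raw.items() if k not in hidden}
        return self._emit(caller, name, "allow", visible)               # LOG

    def _emit(self, caller: Caller, name: str, decision: str,
              data: Dict[str, Any]) -> Dict[str, Any]:
        self.audit_log.append({"user": caller.user_id, "tool": name, "decision": decision})
        return {"decision": decision, "data": data}
```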

**Threat-model talking points** (say these — they signal seniority):
- **Confused-deputy:** the agent never holds raw credentials; the MCP server calls each tool under the *user's* scoped identity, so a tool can't be tricked into over-broad reads.
- **Tool-poisoning / prompt injection via retrieved content:** retrieved text is treated as data, never as instructions; the generator is sandboxed from re-invoking tools without re-passing the AuthZ gate.
- **Exfiltration:** column masking + egress logging means even a successful injection can't surface PII it wasn't entitled to.
- **Blast radius:** `kill-switch-client` lets us disable a model or a single tool instantly without redeploying, which speaks directly to SR 11-7-style model risk management: being able to constrain a model in production.

---

## 5. Governance & grounding plane (the part that wins regulated deals)

```mermaid
flowchart LR
    subgraph Ingest["Ingestion governance"]
        CLASS["Data classification<br/>(public / internal / PII)"]
        EMB["Embedding + metadata tags<br/>tenant · sensitivity · source"]
    end
    subgraph Query["Query-time governance"]
        IDENT["Caller identity + role"]
        FILTER["Namespace + ACL filter<br/>(pre-retrieval)"]
        RETR2["Retrieve only entitled chunks"]
    end
    subgraph Answer["Answer governance"]
        CITE["Mandatory citation<br/>(source attribution)"]
        FAITH["Faithfulness score<br/>(RAGAS / LLM-as-judge)"]
        CARD["Model card + decision log"]
    end
    CLASS --> EMB --> RETR2
    IDENT --> FILTER --> RETR2 --> CITE --> FAITH --> CARD
    FAITH -- "below SLA" --> ABSTAIN["Abstain / escalate to human"]
```
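Two of the boxes above, the pre-retrieval ACL filter and the abstain branch, are easy to show in code. The sketch below is illustrative only: the `Chunk` and `Caller` types, the ordering of the three sensitivity classes, and the `0.8` faithfulness threshold are assumptions, and the faithfulness score is taken as an input rather than computed.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed ordering of the three classes named in the diagram.
SENSITIVITY_RANK = {"public": 0, "internal": 1, "pii": 2}


@dataclass(frozen=True)
class Chunk:
    text: str
    source: str
    tenant: str
    sensitivity: str


@dataclass(frozen=True)
class Caller:
    tenant: str
    clearance: str


def entitled_chunks(caller: Caller, corpus: List[Chunk]) -> List[Chunk]:
    """Pre-retrieval filter: drop chunks before any ranking sees them."""
    limit = SENSITIVITY_RANK[caller.clearance]
    return [
        c for c in corpus
        if c.tenant == caller.tenant and SENSITIVITY_RANK[c.sensitivity] <= limit
    ]


def release(answer: str, cited: List[Chunk], faithfulness: float,
            threshold: float = 0.8) -> Optional[dict]:
    """Return the answer with its citations, or None to abstain and escalate."""
    if not cited or faithfulness < threshold:
        return None
    return {"answer": answer, "citations": sorted({c.source for c in cited})}
```

Filtering before retrieval, rather than after generation, means an unentitled chunk never reaches the prompt in the first place.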

This single diagram covers four JD bullets at once: **access-controlled retrieval**,
**citation/source attribution**, **faithfulness SLAs**, and **model cards / audit**.
The `ABSTAIN` branch is the line that separates a demo from a regulated system — *"in
banking, a confident wrong answer is a worse outcome than 'I don't know, here's a human.'"*

---

## 6. Multi-tenant / namespace isolation (real concern here already)

We *already* think in tenants: every product has a `productId`, and Hermes runs **two
isolated instances (Vijay / Bheem)** with separate users, services, and backup repos. That
is the same isolation discipline a vector DB needs.

```mermaid
flowchart TB
    subgraph T_A["Tenant A (productId=invt_trdg)"]
        NSA["Vector namespace A"]
        GA["Graph partition A"]
        SA["SQL schema A (RLS)"]
    end
    subgraph T_B["Tenant B (productId=notelett)"]
        NSB["Vector namespace B"]
        GB["Graph partition B"]
        SB["SQL schema B (RLS)"]
    end
    POLICY["platform-service<br/>tenant resolver + auth"] --> NSA & NSB & GA & GB & SA & SB
```
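A tenant resolver of this shape can be sketched in a few lines. This is a hypothetical illustration: the naming scheme (`vec_…`, `tenant_…`), the `TenantScope` type, and the hard-coded product list are assumptions; only the `productId` values themselves come from this document.

```python
import re
from dataclasses import dataclass

# Product ids that appear in this document; a real resolver would load them
# from platform-service rather than hard-code them.
KNOWN_PRODUCTS = {"invt_trdg", "flowmonk", "notelett", "chronomind"}
_SAFE_ID = re.compile(r"^[a-z][a-z0-9_]*$")


@dataclass(frozen=True)
class TenantScope:
    vector_namespace: str
    graph_partition: str
    sql_schema: str


def resolve_scope(product_id: str) -> TenantScope:
    """Map a productId to its storage scopes, refusing anything unrecognized."""
    if not _SAFE_ID.match(product_id) or product_id not in KNOWN_PRODUCTS:
        raise PermissionError(f"unknown tenant: {product_id!r}")
    return TenantScope(
        vector_namespace=f"vec_{product_id}",
        graph_partition=product_id,
        sql_schema=f"tenant_{product_id}",
    )
```

Resolving all three scopes from one validated id keeps a caller from mixing, say, tenant A's vector namespace with tenant B's SQL schema.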

> *"Namespace isolation isn't a vector-DB feature I'd discover late — it's how the whole
> platform is partitioned. Pinecone namespaces / Azure AI Search index-per-tenant /
> pgvector schema-per-tenant are just the storage expression of a `productId` model I
> already run."*

---

## Cheat-sheet: which diagram answers which question

| If they ask… | Draw |
|---|---|
| "Walk me through your RAG architecture" | §2 container view |
| "How do you orchestrate multiple agents?" | §3 state machine |
| "How is this secure / Zero Trust?" | §4 MCP boundary |
| "How do you prevent hallucination in production?" | §5 governance plane (`ABSTAIN`), plus the §2 `CRITIC` node |
| "How do you handle multi-tenancy at scale?" | §6 isolation |
| "What does your whole platform look like?" | §1 context |