bytelyst-devops-tools/docs/INTERVIEW/01-ecosystem-rag-fabric.md
Hermes VM 076449268b docs(interview): add Senior Agentic RAG Architect prep kit
7-doc kit mapping the JD competency matrix to the ByteLyst ecosystem:
ecosystem-as-RAG-fabric architecture, competency deep-dives, STAR bank,
enhancement roadmap, banking blueprints, and a glossary quick-ref.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 10:48:52 +00:00

01 · The ByteLyst Ecosystem as an Agentic RAG Fabric

The trick in this interview is to stop treating ByteLyst as "a bunch of side projects" and start describing it as one governed retrieval fabric with multiple agentic front-ends. Every diagram below is something you can reproduce on a whiteboard.


1. System context — what we actually run

flowchart TB
    subgraph Users["👤 Humans & Agents"]
        U1[End users<br/>web / mobile]
        U2[Coding agents<br/>claude · codex · devin]
        U3[Operators<br/>Hermes Mission Control]
    end

    subgraph Fronts["Agentic Product Surfaces"]
        P1["invt_trdg<br/>AI trading chat<br/>(tool-calling over markets)"]
        P2["flowmonk<br/>planning + bounded AI layer"]
        P3["notelett<br/>notes for humans + agents"]
        P4["chronomind<br/>contextual time AI"]
    end

    subgraph Platform["common_plat — the shared fabric"]
        PS["platform-service :4003<br/>auth · flags · telemetry · billing · blob"]
        ES["extraction-service :4005<br/>URL / doc → retrievable units"]
        MCP["mcp-server :4007<br/>tool / resource registration"]
        LR["packages/llm-router<br/>provider abstraction"]
    end

    subgraph Data["Governed Data Sources"]
        DB[("Cosmos DB<br/>docs + Gremlin graph")]
        PG[("Postgres<br/>structured + pgvector*")]
        EV[("event-store<br/>immutable audit")]
        BLOB[("blob<br/>raw documents")]
    end

    subgraph Ops["Control Plane"]
        HERMES["Hermes Mission Control<br/>(devops_tools/dashboard)"]
        AQ["agent-queue<br/>multi-agent runner"]
    end

    U1 --> P1 & P2 & P3 & P4
    U2 --> AQ
    U3 --> HERMES
    P1 & P2 & P3 & P4 --> PS
    P1 & P2 & P3 & P4 --> MCP
    MCP --> LR
    ES --> BLOB
    PS --> DB & PG & EV
    MCP --> DB & PG
    AQ --> MCP
    HERMES --> PS
    HERMES -.observes.-> ES & MCP & LR

    classDef plan fill:#fef3c7,stroke:#d97706
    class PG,LR plan

* pgvector and the Gremlin graph are the planned hardening (see 04-enhancement-roadmap.md). Everything else is a real, deployed component of the ecosystem.

How to narrate it: "The platform-service is my policy/identity plane, the mcp-server is my tool-boundary plane, llm-router is my model plane, and the data sources are governed behind both. Any product surface is just a thin agentic UI over that fabric — which is exactly the shape of an enterprise agentic-RAG platform."


2. The reference agentic-RAG container view

This is the canonical picture the interviewer wants to see — drawn in our components.

flowchart LR
    Q[User query] --> ROUTER

    subgraph Orchestration["Agentic Orchestration (LangGraph-shaped)"]
        ROUTER{{"Router / planner agent<br/>intent + complexity"}}
        RETR["Retriever agent"]
        GRADE{{"Relevance grader<br/>(CRAG gate)"}}
        REWRITE["Query rewriter<br/>(HyDE)"]
        GEN["Generator agent<br/>+ citation enforcer"]
        CRITIC{{"Self-RAG critic<br/>groundedness check"}}
    end

    subgraph Retrieval["Hybrid Retrieval Fabric"]
        VEC[("Vector<br/>pgvector / Azure AI Search")]
        BM25[("Lexical<br/>BM25")]
        GRAPH[("Graph traversal<br/>Cosmos Gremlin")]
        SQL[("Structured<br/>schema-aware SQL tool")]
        RERANK["Cross-encoder rerank<br/>+ context compression"]
    end

    subgraph Gov["Governance plane (every hop)"]
        ACL["Access-controlled retrieval<br/>auth + row/col masking"]
        AUDIT["event-store audit trail"]
        KILL["kill-switch / flags"]
    end

    ROUTER --> RETR
    RETR --> VEC & BM25 & GRAPH & SQL
    VEC & BM25 & GRAPH & SQL --> RERANK
    RERANK --> GRADE
    GRADE -- "low relevance" --> REWRITE --> RETR
    GRADE -- "ok" --> GEN
    GEN --> CRITIC
    CRITIC -- "ungrounded" --> REWRITE
    CRITIC -- "grounded + cited" --> A[Answer + citations]

    RETR -.enforced by.-> ACL
    GEN -.logged to.-> AUDIT
    ROUTER -.gated by.-> KILL

Key talking points keyed to the JD:

  • Hybrid search (vector + BM25 + graph) → the four parallel retrievers (those three plus the SQL tool) fan out; the reranker fans in.
  • Reranking + context compression → the RERANK node (a cross-encoder such as a bge-reranker, or a late-interaction model such as ColBERT).
  • CRAG → the GRADE gate that triggers corrective re-retrieval.
  • HyDE → the REWRITE node generating a hypothetical answer to embed.
  • Self-RAG → the CRITIC node reflecting on groundedness before release.
  • Access-controlled retrieval / Zero Trust / audit → the governance plane wraps every hop, not just the entrance.
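The GRADE and CRITIC loops in the diagram can be sketched in plain Python. This is a minimal illustration, not the ByteLyst implementation and not LangGraph code: `answer`, `Chunk`, the callable parameters, `min_relevance`, and `max_rounds` are all hypothetical names chosen for the example. It shows the one property worth saying out loud in the room, which is that the corrective loop is bounded and returns no answer rather than guessing.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Chunk:
    source: str   # citation target, e.g. a document id (hypothetical field)
    text: str
    score: float  # relevance score set by the grader, 0..1


def answer(
    query: str,
    retrieve: Callable[[str], List[Chunk]],
    grade: Callable[[str, List[Chunk]], List[Chunk]],
    rewrite: Callable[[str], str],
    generate: Callable[[str, List[Chunk]], str],
    is_grounded: Callable[[str, List[Chunk]], bool],
    min_relevance: float = 0.5,  # assumed threshold, tune per corpus
    max_rounds: int = 3,         # bound on corrective re-retrieval
) -> dict:
    """Run retrieve -> grade -> (rewrite) -> generate -> critic, at most max_rounds times."""
    current = query
    for _ in range(max_rounds):
        chunks = grade(current, retrieve(current))
        kept = [c for c in chunks if c.score >= min_relevance]
        if not kept:
            # GRADE says "low relevance": rewrite the query and retrieve again.
            current = rewrite(current)
            continue
        draft = generate(query, kept)
        if is_grounded(draft, kept):
            # CRITIC says "grounded + cited": release with citations.
            return {"answer": draft, "citations": [c.source for c in kept]}
        # CRITIC says "ungrounded": rewrite and try another round.
        current = rewrite(current)
    # Out of rounds: return no answer instead of an ungrounded one.
    return {"answer": None, "citations": []}


if __name__ == "__main__":
    corpus = {"kyc policy": Chunk("doc-7", "KYC refresh is annual.", 0.9)}
    result = answer(
        "kyc?",
        retrieve=lambda q: [corpus[q]] if q in corpus else [],
        grade=lambda q, cs: cs,
        rewrite=lambda q: "kyc policy",
        generate=lambda q, cs: cs[0].text,
        is_grounded=lambda draft, cs: any(draft in c.text for c in cs),
    )
    print(result)  # {'answer': 'KYC refresh is annual.', 'citations': ['doc-7']}
```

The stubbed run takes two rounds: the first retrieval returns nothing, the rewrite produces a query that hits, and the critic passes the draft because it appears verbatim in the cited chunk.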

3. Multi-agent orchestration topology (we run a real one)

agent-queue/ is a production folder-kanban that drives three different agent engines (claude, codex, devin) through an explicit state machine. That is multi-agent orchestration — and it's the strongest "I've shipped agents" story you have.

stateDiagram-v2
    [*] --> inbox: drop prompt .md
    inbox --> doing: runner claims (auto-approve)
    doing --> done: success
    doing --> failed: error / timeout
    failed --> inbox: requeue (human-in-loop)
    done --> [*]

    note right of doing
      Engine selected per task:
      claude · codex · devin
      = heterogeneous agent pool
    end note

Map this to LangGraph vocabulary in the room:

| agent-queue concept | LangGraph / agentic equivalent |
| --- | --- |
| inbox/doing/done/failed folders | graph nodes / state enum |
| runner claiming + transitioning | conditional edges |
| engine flag (claude/codex/devin) | tool/agent binding per node |
| failed → inbox requeue | cyclic edge w/ human-in-the-loop checkpoint |
| live status/watch | state checkpointer + observability |

Honest framing: "I built this deliberately framework-light to stay bash-portable and dependency-free. The state model maps directly onto LangGraph's; porting it onto LangGraph's StateGraph mostly buys me typed state, built-in checkpointing, and the A2A handoff contract, which is exactly the enhancement I've scoped."
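To make the "explicit state machine" claim concrete, the folder states can be written as a transition table. This is a plain-Python sketch under stated assumptions: the real runner is described as bash, the event names (`claim`, `success`, `error`, `timeout`, `requeue`) are hypothetical labels taken from the diagram's edge text, and nothing here uses the LangGraph API.

```python
from enum import Enum


class State(str, Enum):
    INBOX = "inbox"
    DOING = "doing"
    DONE = "done"
    FAILED = "failed"


# Allowed transitions, copied from the state diagram; event names are assumptions.
TRANSITIONS = {
    (State.INBOX, "claim"): State.DOING,      # runner claims the prompt file
    (State.DOING, "success"): State.DONE,
    (State.DOING, "error"): State.FAILED,
    (State.DOING, "timeout"): State.FAILED,
    (State.FAILED, "requeue"): State.INBOX,   # human-in-the-loop requeue
}


def step(state: State, event: str) -> State:
    """Return the next state, or raise if the diagram has no such edge."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state.value} on {event!r}") from None


state = State.INBOX
for event in ["claim", "error", "requeue", "claim", "success"]:
    state = step(state, event)
print(state.value)  # done
```

Anything not in the table is rejected, so a task can never jump from inbox straight to done. That is the same guarantee a typed graph with conditional edges would give.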


4. MCP server — Zero-Trust tool boundary

This is your strongest governance asset and a direct hit on a Preferred Qualification ("MCP server architecture, tool/resource registration patterns, agentic security threat modeling"). We run mcp-server on :4007 with packages/mcp-client.

flowchart TB
    subgraph Agent["Agent (untrusted by default)"]
        A[LLM reasoning loop]
    end

    subgraph Boundary["mcp-server :4007 — policy enforcement point"]
        REG["Tool / resource registry<br/>(declared, typed, versioned)"]
        AUTHZ{"AuthZ check<br/>identity + scope + role"}
        MASK["Row/column masking<br/>field-encrypt"]
        RATE["Rate / cost limits + kill-switch"]
        LOG["Audit emit → event-store"]
    end

    subgraph Resources["Governed resources"]
        T1[Market data tool]
        T2[Doc retrieval tool]
        T3[Graph query tool]
        T4[Text-to-SQL tool<br/>read-only views]
    end

    A -- "tool call (intent)" --> REG
    REG --> AUTHZ
    AUTHZ -- deny --> A
    AUTHZ -- allow --> MASK
    MASK --> RATE
    RATE --> T1 & T2 & T3 & T4
    T1 & T2 & T3 & T4 --> LOG
    LOG --> A

Threat-model talking points (say these — they signal seniority):

  • Confused deputy: the agent never holds raw credentials; the MCP server makes each call under the user's scoped identity, so a tool can't be tricked into over-broad reads.
  • Tool poisoning / prompt injection via retrieved content: retrieved text is treated as data, never as instructions; the generator cannot invoke another tool without passing the AuthZ gate again.
  • Exfiltration: column masking means even a successful injection can't surface PII the caller wasn't entitled to, and egress logging leaves a record of the attempt.
  • Blast radius: kill-switch-client lets us disable a model or a single tool instantly without redeploying, which is critical for the SR 11-7 expectation that you can constrain a model in production.
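The enforcement order in the boundary diagram can be shown as one small function. Treat this as a sketch, not the mcp-server code: `Tool`, `Boundary`, the scope strings, and the decision labels are hypothetical. It simplifies the diagram in two ways: masking is applied to the tool's response fields, and rate/cost limiting is left out. Every decision, including a denial, is written to the audit list.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set


@dataclass
class Tool:
    fn: Callable[[Dict[str, Any]], Dict[str, Any]]
    required_scope: str                       # scope the caller must hold
    masked_fields: Set[str] = field(default_factory=set)


@dataclass
class Boundary:
    registry: Dict[str, Tool]                 # only declared tools are callable
    disabled: Set[str] = field(default_factory=set)   # kill-switch state
    audit: List[Dict[str, Any]] = field(default_factory=list)

    def call(self, caller_scopes: Set[str], tool_name: str, args: Dict[str, Any]) -> Dict[str, Any]:
        tool = self.registry.get(tool_name)
        if tool is None:
            decision = "deny:unregistered"
        elif tool_name in self.disabled:
            decision = "deny:killed"
        elif tool.required_scope not in caller_scopes:
            decision = "deny:scope"
        else:
            decision = "allow"
        # Audit every decision, not only the successful calls.
        self.audit.append({"tool": tool_name, "decision": decision})
        if decision != "allow":
            return {"error": decision}
        result = tool.fn(args)
        # Mask sensitive fields before anything reaches the agent.
        return {k: ("***" if k in tool.masked_fields else v) for k, v in result.items()}


# Made-up tool and data, for illustration only.
boundary = Boundary({
    "doc_retrieval": Tool(
        fn=lambda args: {"title": "Q3 memo", "account_no": "0001"},
        required_scope="docs:read",
        masked_fields={"account_no"},
    )
})
print(boundary.call({"docs:read"}, "doc_retrieval", {}))     # {'title': 'Q3 memo', 'account_no': '***'}
print(boundary.call({"markets:read"}, "doc_retrieval", {}))  # {'error': 'deny:scope'}
boundary.disabled.add("doc_retrieval")
print(boundary.call({"docs:read"}, "doc_retrieval", {}))     # {'error': 'deny:killed'}
```

The last call is the blast-radius point in miniature: flipping one entry in `disabled` turns the tool off for every caller without touching the tool or the agent.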

5. Governance & grounding plane (the part that wins regulated deals)

flowchart LR
    subgraph Ingest["Ingestion governance"]
        CLASS["Data classification<br/>(public / internal / PII)"]
        EMB["Embedding + metadata tags<br/>tenant · sensitivity · source"]
    end
    subgraph Query["Query-time governance"]
        IDENT["Caller identity + role"]
        FILTER["Namespace + ACL filter<br/>(pre-retrieval)"]
        RETR2["Retrieve only entitled chunks"]
    end
    subgraph Answer["Answer governance"]
        CITE["Mandatory citation<br/>(source attribution)"]
        FAITH["Faithfulness score<br/>(RAGAS / LLM-as-judge)"]
        CARD["Model card + decision log"]
    end
    CLASS --> EMB --> RETR2
    IDENT --> FILTER --> RETR2 --> CITE --> FAITH --> CARD
    FAITH -- "below SLA" --> ABSTAIN["Abstain / escalate to human"]

This single diagram covers four JD bullets at once: access-controlled retrieval, citation/source attribution, faithfulness SLAs, and model cards / audit. The ABSTAIN branch is the line that separates a demo from a regulated system — "in banking, a confident wrong answer is a worse outcome than 'I don't know, here's a human.'"
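Two pieces of that diagram are easy to sketch: the pre-retrieval entitlement filter and the ABSTAIN branch. The code below is an illustration under assumptions: `TaggedChunk`, `entitled`, `release`, and the 0.8 threshold are hypothetical, and the faithfulness score is taken as an input because how it is computed (RAGAS, LLM-as-judge) is outside this sketch.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class TaggedChunk:
    source: str       # used for citation
    text: str
    tenant: str       # metadata tag written at ingestion
    sensitivity: str  # "public" | "internal" | "pii"


def entitled(chunks: List[TaggedChunk], tenant: str, clearances: Set[str]) -> List[TaggedChunk]:
    """Keep only chunks the caller may see. Run this before ranking or generation."""
    return [c for c in chunks if c.tenant == tenant and c.sensitivity in clearances]


def release(draft: str, cited: List[TaggedChunk], faithfulness: float, sla: float = 0.8) -> dict:
    """Release an answer only if it is cited and meets the faithfulness SLA."""
    if not cited or faithfulness < sla:
        # Abstain and escalate instead of returning a confident wrong answer.
        return {"status": "abstain", "answer": None, "citations": []}
    return {"status": "answered", "answer": draft, "citations": [c.source for c in cited]}


chunks = [
    TaggedChunk("doc-1", "Fee schedule v3", "invt_trdg", "public"),
    TaggedChunk("doc-2", "Customer ledger", "invt_trdg", "pii"),
    TaggedChunk("doc-3", "Meeting notes", "notelett", "public"),
]
visible = entitled(chunks, tenant="invt_trdg", clearances={"public", "internal"})
print([c.source for c in visible])                                  # ['doc-1']
print(release("Fees follow schedule v3.", visible, 0.92)["status"])  # answered
print(release("Fees follow schedule v3.", visible, 0.55)["status"])  # abstain
```

Filtering before retrieval results reach the generator matters more than the threshold value: a chunk the model never saw cannot leak, whatever the prompt says.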


6. Multi-tenant / namespace isolation (real concern here already)

We already think in tenants: every product has a productId, and Hermes runs two isolated instances (Vijay / Bheem) with separate users, services, and backup repos. That is the same isolation discipline a vector DB needs.

flowchart TB
    subgraph T_A["Tenant A (productId=invt_trdg)"]
        NSA["Vector namespace A"]
        GA["Graph partition A"]
        SA["SQL schema A (RLS)"]
    end
    subgraph T_B["Tenant B (productId=notelett)"]
        NSB["Vector namespace B"]
        GB["Graph partition B"]
        SB["SQL schema B (RLS)"]
    end
    POLICY["platform-service<br/>tenant resolver + auth"] --> NSA & NSB & GA & GB & SA & SB

"Namespace isolation isn't a vector-DB feature I'd discover late — it's how the whole platform is partitioned. Pinecone namespaces / Azure AI Search index-per-tenant / pgvector schema-per-tenant are just the storage expression of a productId model I already run."

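A sketch of what "the storage expression of a productId model" could look like in code. Everything named here is an assumption for illustration: `TenantScope`, `resolve_scope`, the `KNOWN_PRODUCTS` set, and the `vec_` / `graph_` / `sql_` naming scheme are hypothetical, not the platform-service API.

```python
from dataclasses import dataclass

# Hypothetical tenant registry; the two ids come from the diagram above.
KNOWN_PRODUCTS = {"invt_trdg", "notelett"}


@dataclass(frozen=True)
class TenantScope:
    product_id: str

    @property
    def vector_namespace(self) -> str:
        return f"vec_{self.product_id}"

    @property
    def graph_partition(self) -> str:
        return f"graph_{self.product_id}"

    @property
    def sql_schema(self) -> str:
        return f"sql_{self.product_id}"


def resolve_scope(authenticated_product_id: str, requested_product_id: str) -> TenantScope:
    """Derive every storage handle from the authenticated tenant, never from the request."""
    if authenticated_product_id not in KNOWN_PRODUCTS:
        raise PermissionError("unknown tenant")
    if requested_product_id != authenticated_product_id:
        raise PermissionError("cross-tenant access denied")
    return TenantScope(authenticated_product_id)


scope = resolve_scope("notelett", "notelett")
print(scope.vector_namespace, scope.graph_partition, scope.sql_schema)
# vec_notelett graph_notelett sql_notelett
```

The design choice to highlight is that the namespace, partition, and schema are all derived from one resolved identity, so there is no code path where a caller supplies a namespace string directly.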

Cheat-sheet: which diagram answers which question

| If they ask… | Draw |
| --- | --- |
| "Walk me through your RAG architecture" | §2 container view |
| "How do you orchestrate multiple agents?" | §3 state machine |
| "How is this secure / Zero Trust?" | §4 MCP boundary |
| "How do you prevent hallucination in production?" | §5 governance plane (faithfulness score + ABSTAIN), plus the §2 CRITIC node |
| "How do you handle multi-tenancy at scale?" | §6 isolation |
| "What does your whole platform look like?" | §1 context |