Hermes VM 076449268b docs(interview): add Senior Agentic RAG Architect prep kit

7-doc kit mapping the JD competency matrix to the ByteLyst ecosystem:
ecosystem-as-RAG-fabric architecture, competency deep-dives, STAR bank,
enhancement roadmap, banking blueprints, and a glossary quick-ref.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-05-31 10:48:52 +00:00

11 KiB

Raw Blame History

03 · STAR Interview Bank

Twelve stories, each grounded in real ByteLyst work, in Situation · Task · Action · Result form, tagged to the JD competency they prove. Keep delivery to ~90 seconds; the bold line is your headline if you only get 20 seconds.

Integrity note: these describe real systems in this ecosystem (agent-queue, mcp-server, llm-router, invt_trdg AI chat, flowmonk, Hermes, extraction-service, two-instance isolation). Where a story references planned work, it's labeled — present those as design decisions and roadmaps you own, not as shipped-and-measured outcomes.

1. Multi-agent orchestration without a heavy framework

Proves: Agentic frameworks · orchestration topology · state-machine design

S — We needed to run long-horizon coding tasks across three different agent engines (claude, codex, devin) unattended, but couldn't take a heavy runtime dependency on the operator VM.
T — Build a reliable multi-agent runner with explicit state, failure handling, and observability — portable down to bash 3.2.
A — Designed agent-queue as a folder-kanban state machine: inbox→doing→done/failed with a failed→inbox requeue for human-in-the-loop, an engine flag binding each task to an agent, and status/watch/logs for live observability. The state model maps 1:1 to LangGraph nodes/conditional edges.
R — Tasks run auto-approve, survive failures via requeue, and the kanban gives at-a-glance state. The lesson I carry: orchestration is a state-machine problem first and a framework choice second — which is exactly why porting it onto LangGraph is low-risk.

2. A Zero-Trust tool boundary for agents (MCP)

Proves: MCP architecture · Zero Trust · agentic threat modeling · access-controlled retrieval

S — Multiple product agents needed access to sensitive tools (market data, document retrieval) but I refused to hand agents raw credentials or unbounded data access.
T — Make the tool layer a policy enforcement point, not a passthrough.
A — Centralized tools behind mcp-server (:4007) with mcp-client: a typed/versioned tool registry, an authZ check on every call (identity + scope + role), column masking via field-encrypt, rate/cost caps with a kill-switch, and an audit emit to event-store. Threat-modeled confused-deputy, tool-poisoning via retrieved content, and exfiltration.
R — Agents hold no secrets; a successful prompt injection still can't exfiltrate unentitled fields, and any tool can be killed live without a redeploy. Governance lives in the boundary, so no product surface can route around it.

3. Grounding by architecture, not by prompt (flowmonk)

Proves: Grounding · hallucination mitigation · faithfulness

S — Users wanted an AI planning assistant, but an LLM inventing a "plan" that violates constraints is worse than no assistant.
T — Deliver helpful AI without letting the model be the source of truth.
A — Made a deterministic scheduler authoritative and constrained the AI layer to explanation, summarization, and safe recommendation only. The model narrates and suggests; it can never author the canonical plan. Recommendations carry an audit trail.
R — The assistant is helpful and can't hallucinate an invalid plan into existence. This is the cheapest, most reliable hallucination fix I know — and it's the pattern I'd bring to any regulated workflow: scope the model to where being wrong is recoverable.

4. Schema-aware tool-calling instead of free Text-to-SQL

Proves: Structured retrieval · Text-to-SQL judgment · safety

S — invt_trdg users wanted natural-language access to quotes, trade plans, watchlists, alerts, and goals.
T — Give NL access to structured data without the injection/runaway-query risk of free Text-to-SQL.
A — Built the AI chat as typed, parameterized tool-calling over a known domain: the model selects a vetted operation, not arbitrary SQL. Hybrid asset-class detection (crypto vs. equity) routes to the right tool.
R — Natural-language coverage of the whole product, fully auditable, with no arbitrary query surface. I reserve generative SQL for genuine ad-hoc analytics behind read-only views with row-level security — bounded domains get tool-calling.

5. Provider-portable model layer (llm-router)

Proves: Cloud platform · Azure/Bedrock/Vertex portability · cost/latency routing

S — Hard-coding one model provider risked lock-in, blocked data-residency requirements, and made cost/latency tuning a code change.
T — Make model choice a config decision.
A — Introduced packages/llm-router as a provider-abstraction seam (Azure OpenAI primary; Bedrock/Vertex swap-in) with ollama-client for on-prem/air-gapped inference.
R — A new model or provider is a config change, not a rewrite, and a regulated customer can pin inference to their own tenant. Portability is a governance feature, not just an engineering nicety — it's how you satisfy data-residency without re-architecting.

6. Multi-tenant isolation as a platform default

Proves: Vector DB multi-tenancy · namespace isolation · governance

S — Several products share one platform; a cross-tenant data leak would be catastrophic.
T — Make isolation structural, not per-feature discipline.
A — Every product carries a productId; Hermes runs two fully isolated instances (Vijay/Bheem) with separate users, services, and backup repos. The same model maps directly to vector namespaces / index-per-tenant / pgvector schema-per-tenant.
R — Isolation is the default the whole platform is partitioned by. When I add a vector store, multi-tenancy isn't a migration — it's the storage expression of a tenant model I already enforce.

7. Unstructured ingestion pipeline (extraction-service)

Proves: Unstructured retrieval · ingestion · provenance

S — Agents needed to answer from external documents and URLs, not just structured data.
T — Turn messy unstructured sources into clean, retrievable, attributable units.
A — Built extraction-service (:4005) + packages/extraction to parse URLs/docs into retrievable units; notelett provides a structured-notes store for human+agent content.
R — A working ingestion path into the fabric. The roadmap (layout-aware PDF chunking, OCR, table preservation, page-level provenance) is additive on this spine — and provenance is non-negotiable because every answer must cite a clause, not 'a document.'

8. Operational observability for AI systems (Hermes)

Proves: Eval-harness home · drift monitoring · production ops

S — Running agentic services in production with no single pane meant blind spots.
T — One control plane for the agentic fabric.
A — Built Hermes Mission Control (Next.js + Fastify) with diagnostics-client/telemetry-client/monitoring; the hermes-ops module already models both instances as the seed for real data.
R — A live ops console for the ecosystem. It's the natural home for the eval harness: a faithfulness/relevancy/recall pane plus a factual-drift monitor turns it from infra-ops into AI-quality-ops — which is the v2 roadmap I own.

9. Instant blast-radius control (kill-switch + flags)

Proves: Governance · Zero Trust · SR 11-7 ("constrain a model in production")

S — A misbehaving model or tool in production needs to be stoppable in seconds, not a deploy cycle.
T — Decouple "turn this off" from "ship a release."
A — Adopted feature-flag-client + kill-switch-client so any model or individual tool can be disabled live; combined with event-store audit so the action is logged.
R — Sub-minute containment without a redeploy. This is a literal SR 11-7 control: model risk management requires the ability to immediately constrain a model in production, with an audit trail of who constrained it and when.

10. Disaster recovery + parity discipline

Proves: Production rigor · regulated-grade operations

S — Two Hermes instances existed, but only one had a tested backup/restore path; the second was an operational blind spot.
T — Drive both to parity with persistent backup, watchdog, and tested restore.
A — Documented the gap explicitly in the v2 roadmap (hermes_dashboard_v2_roadmap.md) and the DR doc, prioritizing the missing backup repo/watchdog/restore for the second instance.
R — A named, prioritized closure plan. In regulated environments 'we have backups' is not a control until restore is tested; I treat untested DR as an open finding, not a checkbox.

11. Bounded autonomy with human-in-the-loop

Proves: Agentic safety · orchestration · abstain-and-escalate

S — Autonomous agents that never escalate will confidently do the wrong thing.
T — Build escalation into the topology.
A — In agent-queue, failed routes back to inbox for human triage rather than silently retrying forever; in the RAG design, a sub-SLA faithfulness score routes to abstain/escalate (see 01 §5).
R — The system degrades to a human instead of degrading to a hallucination. The escalation edge is the most important edge in the graph for a regulated deployment.

12. Documentation & decision rigor as an architect

Proves: ADRs · blueprints · roadmaps · mentoring / CoE contribution

S — A multi-product ecosystem with multiple agent engines drifts without written decisions.
T — Make architecture legible to engineers and execs.
A — Maintained an ADR directory, roadmaps (hermes_*_roadmap.md, deployment-optimization-roadmap.md), a repo map, and agent-facing AGENTS.md/CLAUDE.md so both humans and coding agents navigate consistently — and authored this very interview/architecture kit as a reusable accelerator.
R — New contributors (human or agent) onboard from canonical docs. This is exactly the 'AI Center of Excellence / reusable accelerators' contribution the role asks for — I default to writing the pattern down so it scales past me.

Behavioral / leadership prompts — quick frames

Prompt	Lead with
"Tell me about a time you influenced without authority."	#12 docs/ADRs driving multi-agent consistency.
"A production AI system gave a wrong answer. What did you do?"	#3 grounding-by-architecture + #11 abstain/escalate + #9 kill-switch.
"How do you handle disagreement on architecture?"	ADR process — capture options, trade-offs, decision, and revisit date; disagree-and-commit in writing.
"Describe mentoring junior engineers."	The `AGENTS.md`/repo-map pattern: I encode the 'how we work here' so it's teachable, then pair on the first real task.
"Biggest technical mistake?"	Untested DR on the second Hermes instance (#10) — I'd treated 'backups exist' as 'DR works'; now I gate on a tested restore.
"Why this role / why financial services?"	Trading product taught me to engineer for consequences; FS is where governance-by-architecture matters most and where my MCP/Zero-Trust depth pays off.

11 KiB Raw Blame History