bytelyst-devops-tools/docs/INTERVIEW/03-star-interview-bank.md
Hermes VM 076449268b docs(interview): add Senior Agentic RAG Architect prep kit
7-doc kit mapping the JD competency matrix to the ByteLyst ecosystem:
ecosystem-as-RAG-fabric architecture, competency deep-dives, STAR bank,
enhancement roadmap, banking blueprints, and a glossary quick-ref.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 10:48:52 +00:00

# 03 · STAR Interview Bank
Twelve stories, each grounded in real ByteLyst work, in **Situation · Task · Action ·
Result** form, tagged to the JD competency they prove. Keep delivery to ~90 seconds; the
**bold** line is your headline if you only get 20 seconds.
> Integrity note: these describe real systems in this ecosystem (agent-queue, mcp-server,
> llm-router, invt_trdg AI chat, flowmonk, Hermes, extraction-service, two-instance
> isolation). Where a story references planned work, it's labeled — present those as
> *design decisions and roadmaps you own*, not as shipped-and-measured outcomes.
---
## 1. Multi-agent orchestration without a heavy framework
**Proves:** Agentic frameworks · orchestration topology · state-machine design
- **S** — We needed to run long-horizon coding tasks across three different agent engines (claude, codex, devin) unattended, but couldn't take a heavy runtime dependency on the operator VM.
- **T** — Build a reliable multi-agent runner with explicit state, failure handling, and observability — portable down to bash 3.2.
- **A** — Designed `agent-queue` as a **folder-kanban state machine**: `inbox→doing→done/failed` with a `failed→inbox` requeue for human-in-the-loop, an engine flag binding each task to an agent, and `status`/`watch`/`logs` for live observability. The state model maps 1:1 to LangGraph nodes/conditional edges.
- **R** — Tasks run unattended in auto-approve mode and survive failures via requeue, and the kanban folders show state at a glance. **The lesson I carry: orchestration is a state-machine problem first and a framework choice second, which is exactly why porting it onto LangGraph is low-risk.**
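
If asked to whiteboard the folder-kanban idea, a minimal sketch could look like the following. It is not the `agent-queue` implementation (described above as portable to bash 3.2); the function names, the task file name, and the Python rendering are assumptions for illustration. Only the folder names and the `failed→inbox` requeue come from the story.

```python
from pathlib import Path
import tempfile

# Allowed moves between kanban folders, taken from the story above.
TRANSITIONS = {
    "inbox": {"doing"},
    "doing": {"done", "failed"},
    "failed": {"inbox"},  # human-in-the-loop requeue
    "done": set(),        # terminal state
}

def move_task(root: Path, name: str, src: str, dst: str) -> Path:
    """Move a task file between state folders, rejecting illegal transitions."""
    if dst not in TRANSITIONS.get(src, set()):
        raise ValueError(f"illegal transition {src} -> {dst}")
    target_dir = root / dst
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    # On POSIX, rename within one filesystem is atomic, so a task file
    # is never visible in two state folders at once.
    (root / src / name).rename(target)
    return target

def demo() -> list:
    """Walk one hypothetical task through fail, requeue, and success."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "inbox").mkdir()
        (root / "inbox" / "task-001.md").write_text("engine: codex\n")
        visited = ["inbox"]
        steps = [("inbox", "doing"), ("doing", "failed"), ("failed", "inbox"),
                 ("inbox", "doing"), ("doing", "done")]
        for src, dst in steps:
            move_task(root, "task-001.md", src, dst)
            visited.append(dst)
        return visited
```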
---
## 2. A Zero-Trust tool boundary for agents (MCP)
**Proves:** MCP architecture · Zero Trust · agentic threat modeling · access-controlled retrieval
- **S** — Multiple product agents needed access to sensitive tools (market data, document retrieval) but I refused to hand agents raw credentials or unbounded data access.
- **T** — Make the tool layer a **policy enforcement point**, not a passthrough.
- **A** — Centralized tools behind `mcp-server` (:4007) with `mcp-client`: a typed/versioned tool registry, an authZ check on **every** call (identity + scope + role), column masking via `field-encrypt`, rate/cost caps with a `kill-switch`, and an audit emit to `event-store`. Threat-modeled confused-deputy, tool-poisoning via retrieved content, and exfiltration.
- **R** — Agents hold no secrets; a successful prompt injection still can't exfiltrate unentitled fields, and any tool can be killed live without a redeploy. **Governance lives in the boundary, so no product surface can route around it.**
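
The enforcement order described above (kill-switch, then scope, then role, then masking, with an audit record either way) can be sketched as below. This is an assumption-heavy illustration, not the `mcp-server` code: `Caller`, `ToolPolicy`, and `Gateway` are hypothetical names, masking is simplified to redaction rather than `field-encrypt`, and rate/cost caps are omitted.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Caller:
    identity: str
    role: str
    scopes: frozenset

@dataclass
class ToolPolicy:
    required_scope: str
    allowed_roles: frozenset
    masked_fields: frozenset = frozenset()

@dataclass
class Gateway:
    policies: dict                      # tool name -> ToolPolicy
    tools: dict                         # tool name -> callable returning a dict
    killed: set = field(default_factory=set)
    audit_log: list = field(default_factory=list)  # stand-in for an event store

    def call(self, caller: Caller, tool: str, args: dict) -> dict:
        policy = self.policies.get(tool)
        if policy is None or tool in self.killed:
            decision = "deny:unavailable"
        elif policy.required_scope not in caller.scopes:
            decision = "deny:scope"
        elif caller.role not in policy.allowed_roles:
            decision = "deny:role"
        else:
            decision = "allow"
        # Every call is audited, whether it is allowed or denied.
        self.audit_log.append({"who": caller.identity, "tool": tool, "decision": decision})
        if decision != "allow":
            raise PermissionError(decision)
        row = self.tools[tool](**args)
        # Mask unentitled fields before anything reaches the agent.
        return {k: ("***" if k in policy.masked_fields else v) for k, v in row.items()}
```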
---
## 3. Grounding by architecture, not by prompt (flowmonk)
**Proves:** Grounding · hallucination mitigation · faithfulness
- **S** — Users wanted an AI planning assistant, but an LLM inventing a "plan" that violates constraints is worse than no assistant.
- **T** — Deliver helpful AI without letting the model be the source of truth.
- **A** — Made a **deterministic scheduler authoritative** and **constrained the AI layer to explanation, summarization, and safe recommendation only**. The model narrates and suggests; it can never author the canonical plan. Recommendations carry an audit trail.
- **R** — The assistant is helpful *and* can't hallucinate an invalid plan into existence. **This is the cheapest, most reliable hallucination fix I know — and it's the pattern I'd bring to any regulated workflow: scope the model to where being wrong is recoverable.**
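
One way to illustrate "the scheduler is authoritative, the model only advises" is the sketch below. The earliest-deadline-first rule, the task fields, and both function names are assumptions made for the example; the source does not say how flowmonk schedules. The point is structural: a model suggestion is validated and recorded, but the canonical plan always comes from the deterministic function.

```python
def schedule(tasks, capacity_minutes):
    """Deterministic scheduler: earliest deadline first, ties broken by name."""
    plan, used = [], 0
    for task in sorted(tasks, key=lambda t: (t["deadline"], t["name"])):
        if used + task["minutes"] <= capacity_minutes:
            plan.append(task["name"])
            used += task["minutes"]
    return plan

def review_suggestion(tasks, capacity_minutes, suggested):
    """Check a model-proposed task list against the constraints; never adopt it as the plan."""
    known = {t["name"]: t["minutes"] for t in tasks}
    unknown = [name for name in suggested if name not in known]
    total = sum(known[name] for name in suggested if name in known)
    has_duplicates = len(set(suggested)) != len(suggested)
    valid = not unknown and not has_duplicates and total <= capacity_minutes
    return {
        "canonical_plan": schedule(tasks, capacity_minutes),  # source of truth
        "suggestion": list(suggested),                        # kept for the audit trail
        "suggestion_valid": valid,
    }
```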
---
## 4. Schema-aware tool-calling instead of free Text-to-SQL
**Proves:** Structured retrieval · Text-to-SQL judgment · safety
- **S** — `invt_trdg` users wanted natural-language access to quotes, trade plans, watchlists, alerts, and goals.
- **T** — Give NL access to structured data without the injection/runaway-query risk of free Text-to-SQL.
- **A** — Built the AI chat as **typed, parameterized tool-calling** over a known domain: the model selects a vetted operation, not arbitrary SQL. Hybrid asset-class detection (crypto vs. equity) routes to the right tool.
- **R** — Natural-language coverage of the whole product, fully auditable, with no arbitrary query surface. **I reserve generative SQL for genuine ad-hoc analytics behind read-only views with row-level security — bounded domains get tool-calling.**
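
A compact way to show the difference from free Text-to-SQL is a dispatcher that accepts only a known operation with validated arguments. Everything named here is hypothetical (`get_quote`, the two stub tools, the suffix-based asset-class heuristic, the symbol pattern); the actual `invt_trdg` detection logic is not described in this document.

```python
import re

SYMBOL_RE = re.compile(r"[A-Z0-9.\-]{1,15}")
CRYPTO_SUFFIXES = ("-USD", "-USDT")  # assumed heuristic, for illustration only

def detect_asset_class(symbol: str) -> str:
    return "crypto" if symbol.endswith(CRYPTO_SUFFIXES) else "equity"

def get_crypto_quote(symbol: str) -> dict:
    return {"tool": "get_crypto_quote", "symbol": symbol}  # stub backend

def get_equity_quote(symbol: str) -> dict:
    return {"tool": "get_equity_quote", "symbol": symbol}  # stub backend

TOOLS = {"crypto": get_crypto_quote, "equity": get_equity_quote}

def dispatch_quote(model_call: dict) -> dict:
    """Run a model-selected operation only if it matches the vetted shape."""
    if model_call.get("name") != "get_quote":
        raise ValueError("unknown operation")
    args = model_call.get("arguments", {})
    if set(args) != {"symbol"}:
        raise ValueError("unexpected arguments")
    symbol = str(args["symbol"]).upper()
    # The model's text never reaches a query string; it must fit a strict pattern.
    if not SYMBOL_RE.fullmatch(symbol):
        raise ValueError("invalid symbol")
    return TOOLS[detect_asset_class(symbol)](symbol)
```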
---
## 5. Provider-portable model layer (llm-router)
**Proves:** Cloud platform · Azure/Bedrock/Vertex portability · cost/latency routing
- **S** — Hard-coding one model provider risked lock-in, blocked data-residency requirements, and made cost/latency tuning a code change.
- **T** — Make model choice a config decision.
- **A** — Introduced `packages/llm-router` as a provider-abstraction seam (Azure OpenAI primary; Bedrock/Vertex swap-in) with `ollama-client` for on-prem/air-gapped inference.
- **R** — A new model or provider is a config change, not a rewrite, and a regulated customer can pin inference to their own tenant. **Portability is a governance feature, not just an engineering nicety — it's how you satisfy data-residency without re-architecting.**
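
"Model choice as config" can be sketched as a lookup from configuration to a provider callable. This is not the `llm-router` API: the config keys (`default_provider`, `tenant_overrides`), the tenant name, and the stub providers are assumptions, and real providers would wrap the respective SDKs.

```python
from typing import Callable, Dict

Provider = Callable[[str], str]

def make_stub(name: str) -> Provider:
    """Stand-in for a real provider client; echoes which provider handled the prompt."""
    def complete(prompt: str) -> str:
        return f"[{name}] {prompt}"
    return complete

PROVIDERS: Dict[str, Provider] = {
    name: make_stub(name)
    for name in ("azure-openai", "bedrock", "vertex", "ollama")
}

def route(config: dict, tenant: str) -> Provider:
    """Pick a provider from config; a tenant override wins over the default."""
    name = config.get("tenant_overrides", {}).get(tenant, config["default_provider"])
    if name not in PROVIDERS:
        raise ValueError(f"unknown provider: {name}")
    return PROVIDERS[name]

# Hypothetical config: a regulated tenant is pinned to on-prem inference.
EXAMPLE_CONFIG = {
    "default_provider": "azure-openai",
    "tenant_overrides": {"bank-a": "ollama"},
}
```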
---
## 6. Multi-tenant isolation as a platform default
**Proves:** Vector DB multi-tenancy · namespace isolation · governance
- **S** — Several products share one platform; a cross-tenant data leak would be catastrophic.
- **T** — Make isolation structural, not per-feature discipline.
- **A** — Every product carries a `productId`; Hermes runs **two fully isolated instances (Vijay/Bheem)** with separate users, services, and backup repos. The same model maps directly to vector namespaces / index-per-tenant / pgvector schema-per-tenant.
- **R** — Isolation is the default the whole platform is partitioned by. **When I add a vector store, multi-tenancy isn't a migration — it's the storage expression of a tenant model I already enforce.**
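
To make "isolation is structural" concrete, the sketch below forces every read and write through a `product_id` namespace, so a cross-tenant query cannot be expressed. The class and its substring search are stand-ins chosen for the example; a real store would use vector namespaces or a schema per tenant, as the story notes.

```python
class NamespacedStore:
    """In-memory stand-in for a vector store partitioned by product_id."""

    def __init__(self):
        self._data = {}  # product_id -> {doc_id: text}

    def upsert(self, product_id: str, doc_id: str, text: str) -> None:
        if not product_id:
            raise ValueError("product_id is required")
        self._data.setdefault(product_id, {})[doc_id] = text

    def search(self, product_id: str, term: str) -> list:
        # There is no code path that searches across namespaces.
        if not product_id:
            raise ValueError("product_id is required")
        docs = self._data.get(product_id, {})
        return sorted(doc_id for doc_id, text in docs.items()
                      if term.lower() in text.lower())
```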
---
## 7. Unstructured ingestion pipeline (extraction-service)
**Proves:** Unstructured retrieval · ingestion · provenance
- **S** — Agents needed to answer from external documents and URLs, not just structured data.
- **T** — Turn messy unstructured sources into clean, retrievable, attributable units.
- **A** — Built `extraction-service` (:4005) + `packages/extraction` to parse URLs/docs into retrievable units; `notelett` provides a structured-notes store for human+agent content.
- **R** — A working ingestion path into the fabric. **The roadmap (layout-aware PDF chunking, OCR, table preservation, page-level provenance) is additive on this spine — and provenance is non-negotiable because every answer must cite a clause, not 'a document.'**
---
## 8. Operational observability for AI systems (Hermes)
**Proves:** Eval-harness home · drift monitoring · production ops
- **S** — Running agentic services in production with no single pane meant blind spots.
- **T** — One control plane for the agentic fabric.
- **A** — Built **Hermes Mission Control** (Next.js + Fastify) with `diagnostics-client`/`telemetry-client`/`monitoring`; the `hermes-ops` module already models both instances as the seed for real data.
- **R** — A live ops console for the ecosystem. **It's the natural home for the eval harness: a faithfulness/relevancy/recall pane plus a factual-drift monitor turns it from infra-ops into AI-quality-ops — which is the v2 roadmap I own.**
---
## 9. Instant blast-radius control (kill-switch + flags)
**Proves:** Governance · Zero Trust · SR 11-7 ("constrain a model in production")
- **S** — A misbehaving model or tool in production needs to be stoppable in seconds, not a deploy cycle.
- **T** — Decouple "turn this off" from "ship a release."
- **A** — Adopted `feature-flag-client` + `kill-switch-client` so any model or individual tool can be disabled live; combined with `event-store` audit so the action is logged.
- **R** — Sub-minute containment without a redeploy. **This maps directly to SR 11-7 model risk management: the guidance expects enforceable limits on model use, and a live switch with an audit trail of who constrained the model and when is how I make that limit real in production.**
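
A minimal sketch of the control, assuming an in-memory switch: the guard is consulted on every call, and each disable or enable is recorded with actor, reason, and timestamp. `KillSwitch` and its methods are hypothetical names, not the `kill-switch-client` or `event-store` interfaces.

```python
from datetime import datetime, timezone

class KillSwitch:
    """In-memory stand-in for a live kill switch with an audit trail."""

    def __init__(self):
        self._disabled = set()
        self.audit = []  # stand-in for an append-only event store

    def _record(self, action, target, actor, reason):
        self.audit.append({
            "action": action, "target": target, "actor": actor, "reason": reason,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def disable(self, target, actor, reason):
        self._disabled.add(target)
        self._record("disable", target, actor, reason)

    def enable(self, target, actor, reason):
        self._disabled.discard(target)
        self._record("enable", target, actor, reason)

    def guard(self, target, fn):
        """Wrap fn so the switch is checked at call time, not at deploy time."""
        def wrapper(*args, **kwargs):
            if target in self._disabled:
                raise RuntimeError(f"{target} is disabled by kill switch")
            return fn(*args, **kwargs)
        return wrapper
```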
---
## 10. Disaster recovery + parity discipline
**Proves:** Production rigor · regulated-grade operations
- **S** — Two Hermes instances existed, but only one had a tested backup/restore path; the second was an operational blind spot.
- **T** — Drive both to parity with persistent backup, watchdog, and **tested** restore.
- **A** — Documented the gap explicitly in the v2 roadmap (`hermes_dashboard_v2_roadmap.md`) and the DR doc, prioritizing the missing backup repo/watchdog/restore for the second instance.
- **R** — A named, prioritized closure plan. **In regulated environments 'we have backups' is not a control until restore is *tested*; I treat untested DR as an open finding, not a checkbox.**
---
## 11. Bounded autonomy with human-in-the-loop
**Proves:** Agentic safety · orchestration · abstain-and-escalate
- **S** — Autonomous agents that never escalate will confidently do the wrong thing.
- **T** — Build escalation into the topology.
- **A** — In `agent-queue`, `failed` routes back to `inbox` for human triage rather than silently retrying forever; in the RAG design, a sub-SLA faithfulness score routes to **abstain/escalate** (see `01 §5`).
- **R** — The system degrades to a human instead of degrading to a hallucination. **The escalation edge is the most important edge in the graph for a regulated deployment.**
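
The abstain/escalate edge reduces to a small routing function. The sketch below assumes a faithfulness score in [0, 1] produced by some upstream evaluator and a hypothetical threshold of 0.8; neither the scale nor the value comes from this document.

```python
def route_answer(answer: str, faithfulness: float, threshold: float = 0.8) -> dict:
    """Return the answer only when it clears the threshold; otherwise escalate to a human."""
    if not 0.0 <= faithfulness <= 1.0:
        raise ValueError("faithfulness must be in [0, 1]")
    if faithfulness < threshold:
        # Abstain: the draft answer is withheld rather than shown with a caveat.
        return {
            "action": "escalate",
            "answer": None,
            "reason": f"faithfulness {faithfulness:.2f} < {threshold:.2f}",
        }
    return {"action": "answer", "answer": answer, "reason": None}
```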
---
## 12. Documentation & decision rigor as an architect
**Proves:** ADRs · blueprints · roadmaps · mentoring / CoE contribution
- **S** — A multi-product ecosystem with multiple agent engines drifts without written decisions.
- **T** — Make architecture legible to engineers and execs.
- **A** — Maintained an ADR directory, roadmaps (`hermes_*_roadmap.md`, `deployment-optimization-roadmap.md`), a repo map, and agent-facing `AGENTS.md`/`CLAUDE.md` so both humans and coding agents navigate consistently — and authored this very interview/architecture kit as a reusable accelerator.
- **R** — New contributors (human or agent) onboard from canonical docs. **This is exactly the 'AI Center of Excellence / reusable accelerators' contribution the role asks for — I default to writing the pattern down so it scales past me.**
---
## Behavioral / leadership prompts — quick frames
| Prompt | Lead with |
|---|---|
| "Tell me about a time you influenced without authority." | #12 docs/ADRs driving multi-agent consistency. |
| "A production AI system gave a wrong answer. What did you do?" | #3 grounding-by-architecture + #11 abstain/escalate + #9 kill-switch. |
| "How do you handle disagreement on architecture?" | ADR process — capture options, trade-offs, decision, and revisit date; disagree-and-commit in writing. |
| "Describe mentoring junior engineers." | The `AGENTS.md`/repo-map pattern: I encode the 'how we work here' so it's teachable, then pair on the first real task. |
| "Biggest technical mistake?" | Untested DR on the second Hermes instance (#10) — I'd treated 'backups exist' as 'DR works'; now I gate on a tested restore. |
| "Why this role / why financial services?" | Trading product taught me to engineer for *consequences*; FS is where governance-by-architecture matters most and where my MCP/Zero-Trust depth pays off. |