# 03 · STAR Interview Bank Twelve stories, each grounded in real ByteLyst work, in **Situation · Task · Action · Result** form, tagged to the JD competency they prove. Keep delivery to ~90 seconds; the **bold** line is your headline if you only get 20 seconds. > Integrity note: these describe real systems in this ecosystem (agent-queue, mcp-server, > llm-router, invt_trdg AI chat, flowmonk, Hermes, extraction-service, two-instance > isolation). Where a story references planned work, it's labeled — present those as > *design decisions and roadmaps you own*, not as shipped-and-measured outcomes. --- ## 1. Multi-agent orchestration without a heavy framework **Proves:** Agentic frameworks · orchestration topology · state-machine design - **S** — We needed to run long-horizon coding tasks across three different agent engines (claude, codex, devin) unattended, but couldn't take a heavy runtime dependency on the operator VM. - **T** — Build a reliable multi-agent runner with explicit state, failure handling, and observability — portable down to bash 3.2. - **A** — Designed `agent-queue` as a **folder-kanban state machine**: `inbox→doing→done/failed` with a `failed→inbox` requeue for human-in-the-loop, an engine flag binding each task to an agent, and `status`/`watch`/`logs` for live observability. The state model maps 1:1 to LangGraph nodes/conditional edges. - **R** — Tasks run auto-approve, survive failures via requeue, and the kanban gives at-a-glance state. **The lesson I carry: orchestration is a state-machine problem first and a framework choice second — which is exactly why porting it onto LangGraph is low-risk.** --- ## 2. A Zero-Trust tool boundary for agents (MCP) **Proves:** MCP architecture · Zero Trust · agentic threat modeling · access-controlled retrieval - **S** — Multiple product agents needed access to sensitive tools (market data, document retrieval) but I refused to hand agents raw credentials or unbounded data access. - **T** — Make the tool layer a **policy enforcement point**, not a passthrough. - **A** — Centralized tools behind `mcp-server` (:4007) with `mcp-client`: a typed/versioned tool registry, an authZ check on **every** call (identity + scope + role), column masking via `field-encrypt`, rate/cost caps with a `kill-switch`, and an audit emit to `event-store`. Threat-modeled confused-deputy, tool-poisoning via retrieved content, and exfiltration. - **R** — Agents hold no secrets; a successful prompt injection still can't exfiltrate unentitled fields, and any tool can be killed live without a redeploy. **Governance lives in the boundary, so no product surface can route around it.** --- ## 3. Grounding by architecture, not by prompt (flowmonk) **Proves:** Grounding · hallucination mitigation · faithfulness - **S** — Users wanted an AI planning assistant, but an LLM inventing a "plan" that violates constraints is worse than no assistant. - **T** — Deliver helpful AI without letting the model be the source of truth. - **A** — Made a **deterministic scheduler authoritative** and **constrained the AI layer to explanation, summarization, and safe recommendation only**. The model narrates and suggests; it can never author the canonical plan. Recommendations carry an audit trail. - **R** — The assistant is helpful *and* can't hallucinate an invalid plan into existence. **This is the cheapest, most reliable hallucination fix I know — and it's the pattern I'd bring to any regulated workflow: scope the model to where being wrong is recoverable.** --- ## 4. Schema-aware tool-calling instead of free Text-to-SQL **Proves:** Structured retrieval · Text-to-SQL judgment · safety - **S** — `invt_trdg` users wanted natural-language access to quotes, trade plans, watchlists, alerts, and goals. - **T** — Give NL access to structured data without the injection/runaway-query risk of free Text-to-SQL. - **A** — Built the AI chat as **typed, parameterized tool-calling** over a known domain: the model selects a vetted operation, not arbitrary SQL. Hybrid asset-class detection (crypto vs. equity) routes to the right tool. - **R** — Natural-language coverage of the whole product, fully auditable, with no arbitrary query surface. **I reserve generative SQL for genuine ad-hoc analytics behind read-only views with row-level security — bounded domains get tool-calling.** --- ## 5. Provider-portable model layer (llm-router) **Proves:** Cloud platform · Azure/Bedrock/Vertex portability · cost/latency routing - **S** — Hard-coding one model provider risked lock-in, blocked data-residency requirements, and made cost/latency tuning a code change. - **T** — Make model choice a config decision. - **A** — Introduced `packages/llm-router` as a provider-abstraction seam (Azure OpenAI primary; Bedrock/Vertex swap-in) with `ollama-client` for on-prem/air-gapped inference. - **R** — A new model or provider is a config change, not a rewrite, and a regulated customer can pin inference to their own tenant. **Portability is a governance feature, not just an engineering nicety — it's how you satisfy data-residency without re-architecting.** --- ## 6. Multi-tenant isolation as a platform default **Proves:** Vector DB multi-tenancy · namespace isolation · governance - **S** — Several products share one platform; a cross-tenant data leak would be catastrophic. - **T** — Make isolation structural, not per-feature discipline. - **A** — Every product carries a `productId`; Hermes runs **two fully isolated instances (Vijay/Bheem)** with separate users, services, and backup repos. The same model maps directly to vector namespaces / index-per-tenant / pgvector schema-per-tenant. - **R** — Isolation is the default the whole platform is partitioned by. **When I add a vector store, multi-tenancy isn't a migration — it's the storage expression of a tenant model I already enforce.** --- ## 7. Unstructured ingestion pipeline (extraction-service) **Proves:** Unstructured retrieval · ingestion · provenance - **S** — Agents needed to answer from external documents and URLs, not just structured data. - **T** — Turn messy unstructured sources into clean, retrievable, attributable units. - **A** — Built `extraction-service` (:4005) + `packages/extraction` to parse URLs/docs into retrievable units; `notelett` provides a structured-notes store for human+agent content. - **R** — A working ingestion path into the fabric. **The roadmap (layout-aware PDF chunking, OCR, table preservation, page-level provenance) is additive on this spine — and provenance is non-negotiable because every answer must cite a clause, not 'a document.'** --- ## 8. Operational observability for AI systems (Hermes) **Proves:** Eval-harness home · drift monitoring · production ops - **S** — Running agentic services in production with no single pane meant blind spots. - **T** — One control plane for the agentic fabric. - **A** — Built **Hermes Mission Control** (Next.js + Fastify) with `diagnostics-client`/`telemetry-client`/`monitoring`; the `hermes-ops` module already models both instances as the seed for real data. - **R** — A live ops console for the ecosystem. **It's the natural home for the eval harness: a faithfulness/relevancy/recall pane plus a factual-drift monitor turns it from infra-ops into AI-quality-ops — which is the v2 roadmap I own.** --- ## 9. Instant blast-radius control (kill-switch + flags) **Proves:** Governance · Zero Trust · SR 11-7 ("constrain a model in production") - **S** — A misbehaving model or tool in production needs to be stoppable in seconds, not a deploy cycle. - **T** — Decouple "turn this off" from "ship a release." - **A** — Adopted `feature-flag-client` + `kill-switch-client` so any model or individual tool can be disabled live; combined with `event-store` audit so the action is logged. - **R** — Sub-minute containment without a redeploy. **This is a literal SR 11-7 control: model risk management requires the ability to immediately constrain a model in production, with an audit trail of who constrained it and when.** --- ## 10. Disaster recovery + parity discipline **Proves:** Production rigor · regulated-grade operations - **S** — Two Hermes instances existed, but only one had a tested backup/restore path; the second was an operational blind spot. - **T** — Drive both to parity with persistent backup, watchdog, and **tested** restore. - **A** — Documented the gap explicitly in the v2 roadmap (`hermes_dashboard_v2_roadmap.md`) and the DR doc, prioritizing the missing backup repo/watchdog/restore for the second instance. - **R** — A named, prioritized closure plan. **In regulated environments 'we have backups' is not a control until restore is *tested*; I treat untested DR as an open finding, not a checkbox.** --- ## 11. Bounded autonomy with human-in-the-loop **Proves:** Agentic safety · orchestration · abstain-and-escalate - **S** — Autonomous agents that never escalate will confidently do the wrong thing. - **T** — Build escalation into the topology. - **A** — In `agent-queue`, `failed` routes back to `inbox` for human triage rather than silently retrying forever; in the RAG design, a sub-SLA faithfulness score routes to **abstain/escalate** (see `01 §5`). - **R** — The system degrades to a human instead of degrading to a hallucination. **The escalation edge is the most important edge in the graph for a regulated deployment.** --- ## 12. Documentation & decision rigor as an architect **Proves:** ADRs · blueprints · roadmaps · mentoring / CoE contribution - **S** — A multi-product ecosystem with multiple agent engines drifts without written decisions. - **T** — Make architecture legible to engineers and execs. - **A** — Maintained an ADR directory, roadmaps (`hermes_*_roadmap.md`, `deployment-optimization-roadmap.md`), a repo map, and agent-facing `AGENTS.md`/`CLAUDE.md` so both humans and coding agents navigate consistently — and authored this very interview/architecture kit as a reusable accelerator. - **R** — New contributors (human or agent) onboard from canonical docs. **This is exactly the 'AI Center of Excellence / reusable accelerators' contribution the role asks for — I default to writing the pattern down so it scales past me.** --- ## Behavioral / leadership prompts — quick frames | Prompt | Lead with | |---|---| | "Tell me about a time you influenced without authority." | #12 docs/ADRs driving multi-agent consistency. | | "A production AI system gave a wrong answer. What did you do?" | #3 grounding-by-architecture + #11 abstain/escalate + #9 kill-switch. | | "How do you handle disagreement on architecture?" | ADR process — capture options, trade-offs, decision, and revisit date; disagree-and-commit in writing. | | "Describe mentoring junior engineers." | The `AGENTS.md`/repo-map pattern: I encode the 'how we work here' so it's teachable, then pair on the first real task. | | "Biggest technical mistake?" | Untested DR on the second Hermes instance (#10) — I'd treated 'backups exist' as 'DR works'; now I gate on a tested restore. | | "Why this role / why financial services?" | Trading product taught me to engineer for *consequences*; FS is where governance-by-architecture matters most and where my MCP/Zero-Trust depth pays off. |