diff --git a/docs/roadmaps/not-started/platform_AGENT_PLATFORM_GAP_ROADMAP_INDEX.md b/docs/roadmaps/not-started/platform_AGENT_PLATFORM_GAP_ROADMAP_INDEX.md new file mode 100644 index 00000000..4b050245 --- /dev/null +++ b/docs/roadmaps/not-started/platform_AGENT_PLATFORM_GAP_ROADMAP_INDEX.md @@ -0,0 +1,109 @@ +# Agent Platform Gaps - Roadmap Index + +> **Purpose:** Turn the current agent-company platform gaps into an actionable roadmap set. +> +> **Scope:** `learning_ai_common_plat` +> +> **Date:** 2026-03-14 +> +> **Status:** Planned + +--- + +## Executive Summary + +The shared platform already covers a large amount of generic SaaS infrastructure: +auth, telemetry, diagnostics, flags, delivery, jobs, marketplace, billing-related +modules, extraction, MCP tooling, and durable queue primitives. + +What is still missing is the **agent control plane**: + +1. durable agent run orchestration +2. org/workspace/team/RBAC +3. agent registry and prompt versioning +4. reusable knowledge/RAG +5. human review and approval queue +6. support case management +7. durable cross-service eventing and worker runtime +8. centralized AI governance and evals +9. AI budget and cost governance +10. enterprise provisioning and SCIM + +This roadmap set breaks those gaps into separate implementation documents so they +can be sequenced without mixing concerns. + +--- + +## Roadmap Set + +1. [Agent Runtime & Orchestration Roadmap](./platform_AGENT_RUNTIME_ORCHESTRATION_ROADMAP.md) +2. [Org, Workspace & RBAC Roadmap](./platform_ORG_WORKSPACE_RBAC_ROADMAP.md) +3. [Agent Registry & Prompt Versioning Roadmap](./platform_AGENT_REGISTRY_PROMPT_VERSIONING_ROADMAP.md) +4. [Knowledge & RAG Service Roadmap](./platform_KNOWLEDGE_RAG_SERVICE_ROADMAP.md) +5. [Human Review & Approval Queue Roadmap](./platform_HUMAN_REVIEW_APPROVAL_QUEUE_ROADMAP.md) +6. [Support Case Management Roadmap](./platform_SUPPORT_CASE_MANAGEMENT_ROADMAP.md) +7. 
[Durable Event Bus & Worker Runtime Roadmap](./platform_DURABLE_EVENT_BUS_AND_WORKER_RUNTIME_ROADMAP.md) +8. [AI Governance & Evaluation Roadmap](./platform_AI_GOVERNANCE_EVALS_ROADMAP.md) +9. [AI Budget & Cost Governance Roadmap](./platform_AI_BUDGET_COST_GOVERNANCE_ROADMAP.md) +10. [Enterprise Provisioning & SCIM Roadmap](./platform_ENTERPRISE_PROVISIONING_SCIM_ROADMAP.md) + +--- + +## Existing Repo Signals + +These gaps are not theoretical. The current codebase already shows the partial +foundations and the missing layers: + +- Durable queue primitives now exist in `packages/queue/`, but agent orchestration in + `services/mcp-server/src/modules/a2a/runner.ts` is still primarily log-driven. +- `platform-service` has broad product infrastructure, but there is no first-class + org/workspace/team module under `services/platform-service/src/modules/`. +- Enterprise IdP support exists in `services/platform-service/src/modules/auth/enterprise/`, + but enterprise provisioning does not. +- The event bus in `packages/events/src/memory.ts` is in-process only. +- `ai-diagnostics` already uses embeddings and vector similarity, but there is no reusable + knowledge service for general agent retrieval. +- MFA push approvals exist, but there is no general review queue for agent actions. + +--- + +## Recommended Build Order + +### P0 + +1. Agent Runtime & Orchestration +2. Durable Event Bus & Worker Runtime +3. Org, Workspace & RBAC +4. Human Review & Approval Queue + +### P1 + +5. Agent Registry & Prompt Versioning +6. Knowledge & RAG Service +7. AI Budget & Cost Governance +8. AI Governance & Evaluation + +### P2 + +9. Enterprise Provisioning & SCIM +10. 
Support Case Management + +--- + +## Architectural Guidance + +These docs assume the current repo direction remains: + +- TypeScript + Fastify services +- shared `@bytelyst/*` packages +- `platform-service` as control plane +- `mcp-server` as operator and A2A interface + +However, some missing capabilities are more naturally relational or workflow-heavy +than the current Cosmos-first platform modules. Each roadmap therefore includes: + +- a **recommended stack** for long-term quality +- a **repo-fit alternative** that stays closer to current conventions + +That is intentional. The best industry-standard choice is not always the same as +the least disruptive repo-local choice. diff --git a/docs/roadmaps/not-started/platform_AGENT_REGISTRY_PROMPT_VERSIONING_ROADMAP.md b/docs/roadmaps/not-started/platform_AGENT_REGISTRY_PROMPT_VERSIONING_ROADMAP.md new file mode 100644 index 00000000..3dbdd92c --- /dev/null +++ b/docs/roadmaps/not-started/platform_AGENT_REGISTRY_PROMPT_VERSIONING_ROADMAP.md @@ -0,0 +1,94 @@ +# Agent Registry & Prompt Versioning Roadmap + +> **Purpose:** Create a system of record for agents, prompts, tools, versions, +> rollout states, and release governance. +> +> **Primary Surfaces:** `services/platform-service/`, `services/mcp-server/` +> +> **Status:** Planned +> +> **Estimated Effort:** 2-3 weeks + +--- + +## Why This Is Missing + +The repo has MCP tools and A2A pipelines, but it does not have a persistent registry +for the definitions that power them. Without that, agent behavior is embedded in code +and docs rather than treated as versioned platform data. 
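As a rough illustration of what treating agent behavior as versioned platform data could look like, the sketch below models version records and resolves the one currently in production. Only the status values come from this roadmap (the Phase 1 list below); the `AgentVersion` field names, the numeric version ordering, and the `resolveActiveVersion` helper are assumptions made for illustration, not an existing `platform-service` API.

```typescript
// Hypothetical shapes; only the status values are taken from this roadmap.
type VersionStatus = "draft" | "staged" | "active" | "deprecated" | "archived";

interface AgentVersion {
  agentId: string;
  version: number; // assumed: increases monotonically per agent
  status: VersionStatus;
  promptVersion: string;
  changelog: string;
}

// Returns the highest-numbered active version for an agent, or undefined
// when the agent has no active version at all.
function resolveActiveVersion(
  versions: AgentVersion[],
  agentId: string,
): AgentVersion | undefined {
  return versions
    .filter((v) => v.agentId === agentId && v.status === "active")
    .sort((a, b) => b.version - a.version)[0];
}

// Illustrative data only.
const registry: AgentVersion[] = [
  { agentId: "triage", version: 1, status: "deprecated", promptVersion: "p1", changelog: "initial" },
  { agentId: "triage", version: 2, status: "active", promptVersion: "p2", changelog: "tighter tone" },
  { agentId: "triage", version: 3, status: "staged", promptVersion: "p3", changelog: "new tools" },
];
```

With records like these, a run could pin the resolved `version` and `promptVersion` at start time, which is what would make later replay and audit possible.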
+ +--- + +## Recommended Stack + +- **PostgreSQL** for metadata and version history +- **Blob storage or Git-backed artifacts** for prompt files and larger assets +- **OpenTelemetry** for linking versions to production runs + +### Repo-Fit Alternative + +- Cosmos-backed registry module in `platform-service` +- Prompt artifacts stored in blob storage +- MCP server resolves active versions from `platform-service` + +--- + +## Phase 1 - Core Registry + +- [ ] Create modules: + - [ ] `agent-registry` + - [ ] `prompt-registry` + - [ ] `tool-bundles` +- [ ] Add entities: + - [ ] `AgentDefinition` + - [ ] `AgentVersion` + - [ ] `PromptTemplate` + - [ ] `PromptVersion` + - [ ] `ToolBundle` + - [ ] `ReleaseChannel` +- [ ] Track: + - [ ] owner + - [ ] changelog + - [ ] status: `draft`, `staged`, `active`, `deprecated`, `archived` + - [ ] compatibility constraints + +**Acceptance Criteria** + +- Every production agent has a durable version record +- Prompt changes are diffable and auditable + +--- + +## Phase 2 - Rollouts & Safety + +- [ ] Add staged rollouts by product, org, workspace, or cohort +- [ ] Add allowlists and freeze controls +- [ ] Add prompt approval requirement for sensitive agents +- [ ] Add rollback support +- [ ] Link agent versions to eval results and incidents + +--- + +## Phase 3 - Runtime Integration + +- [ ] `mcp-server` loads active definitions from registry rather than code-only defaults +- [ ] `agent-runs` store `agentVersion` and `promptVersion` +- [ ] support replay against older versions for regression analysis + +--- + +## Tech Stack Options + +| Option | Pros | Cons | Fit | +| -------------------------------- | -------------------------- | ------------------------------ | --------------------------- | +| PostgreSQL + blob storage | Strong relational history | New datastore | Best long-term | +| Git as source of truth + sync DB | Great developer ergonomics | Dual-source complexity | Good for prompt-heavy teams | +| Cosmos + blob storage | 
Consistent with repo | Version queries less ergonomic | Good short-term | + +--- + +## Risks + +- Code-only prompt management creates invisible production drift +- Without version pinning, incident replay and audit are weak +- Registry without rollout controls is just a metadata catalog diff --git a/docs/roadmaps/not-started/platform_AGENT_RUNTIME_ORCHESTRATION_ROADMAP.md b/docs/roadmaps/not-started/platform_AGENT_RUNTIME_ORCHESTRATION_ROADMAP.md new file mode 100644 index 00000000..a5e9a1db --- /dev/null +++ b/docs/roadmaps/not-started/platform_AGENT_RUNTIME_ORCHESTRATION_ROADMAP.md @@ -0,0 +1,146 @@ +# Agent Runtime & Orchestration Roadmap + +> **Purpose:** Build a durable execution layer for agent runs, step transitions, +> cancellations, retries, resumability, and operator-visible history. +> +> **Primary Surfaces:** `services/platform-service/`, `services/mcp-server/`, +> `packages/queue/` +> +> **Status:** Planned +> +> **Estimated Effort:** 3-5 weeks + +--- + +## Why This Is Missing + +The repo now has durable queue primitives, but agent execution is still not a +first-class platform service. A2A pipelines in `services/mcp-server/src/modules/a2a/` +are composed code paths rather than durable runs with persistent step state. + +That is enough for prototypes. 
It is not enough for: + +- long-running multi-step agents +- retries after process restarts +- human escalation in the middle of a run +- cancellation and pause/resume +- auditable run history + +--- + +## Recommended Stack + +### Best Long-Term Industry Standard + +- **Temporal** for workflow orchestration +- **PostgreSQL** for run metadata and operator queries +- **Redis** for short-lived coordination and cache + +### Best Repo-Fit Option + +- `@bytelyst/queue` for durable job dispatch +- `platform-service` run records in Cosmos or datastore abstraction +- `mcp-server` as orchestration client and tool executor + +### Recommendation + +Start with the repo-fit option to get durable runs quickly, but design the run model +so a later move to Temporal is possible without rewriting every agent contract. + +--- + +## Phase 1 - Canonical Run Model + +- [ ] Create `services/platform-service/src/modules/agent-runs/` +- [ ] Define `AgentRunDoc`, `AgentRunStepDoc`, `AgentRunEventDoc` +- [ ] Support states: `queued`, `running`, `waiting_for_input`, `paused`, `succeeded`, `failed`, `cancelled` +- [ ] Add `parentRunId`, `workflowId`, `agentId`, `agentVersion`, `triggerSource` +- [ ] Persist step inputs, outputs, timings, error summaries, and correlation IDs +- [ ] Add APIs: + - [ ] `POST /agent-runs` + - [ ] `GET /agent-runs/:id` + - [ ] `GET /agent-runs/:id/events` + - [ ] `POST /agent-runs/:id/cancel` + - [ ] `POST /agent-runs/:id/pause` + - [ ] `POST /agent-runs/:id/resume` + +**Acceptance Criteria** + +- Every agent run has durable metadata and step history +- A run can be fetched after service restart +- Cancellation and pause are explicit states, not implicit errors + +--- + +## Phase 2 - Queue-Backed Execution + +- [ ] Add `agent.run.execute` queue type on top of `@bytelyst/queue` +- [ ] Add per-step retries with backoff +- [ ] Add lease heartbeat for long-running steps +- [ ] Add idempotency keys for replays +- [ ] Add dead-letter handling and operator inspection +- [ ] 
Record structured run events for step started/completed/failed/retried + +**Acceptance Criteria** + +- In-flight runs survive worker restart +- Retried steps do not duplicate side effects when idempotency is configured +- Dead-lettered runs are queryable and replayable + +--- + +## Phase 3 - A2A Integration + +- [ ] Replace direct A2A pipeline progression in `mcp-server` with run orchestration APIs +- [ ] Make every pipeline step emit durable run events +- [ ] Support handoff to human review queue +- [ ] Support child runs for delegated agent tasks +- [ ] Add run-level audit links to diagnostics, telemetry, and support systems + +**Acceptance Criteria** + +- `mcp-server` no longer owns the durable run state itself +- A2A pipelines are observable step-by-step +- Human review can pause and later resume a run + +--- + +## Phase 4 - Operator Experience + +- [ ] Admin UI for runs, filters, and replay +- [ ] Timeline view per run +- [ ] Step diff view for prompt/tool transitions +- [ ] Cancel/retry/replay controls +- [ ] SLOs: success rate, mean run duration, retry rate, dead-letter count + +--- + +## Tech Stack Options + +| Option | Pros | Cons | Fit | +| ---------------------------- | ------------------------------------------------- | --------------------------------- | ----------------------- | +| Temporal | Best workflow semantics, retries, signals, timers | New infra, steeper learning curve | Best long-term | +| BullMQ + Redis + run DB | Simple, common in Node | Workflow semantics are custom | Strong practical option | +| `@bytelyst/queue` + run docs | Lowest disruption to repo | More framework logic to build | Best immediate path | + +--- + +## Risks + +- Custom orchestration can become a weak in-house Temporal clone if not scoped tightly +- If step contracts are not versioned, replay becomes unsafe +- If all state remains in logs, operator tooling will never be reliable + +--- + +## Recommendation + +Implement the v1 run system inside `platform-service` using 
`@bytelyst/queue`, but +borrow Temporal-style concepts from day one: + +- workflow ID +- run ID +- signals +- child runs +- durable timers +- explicit waiting states diff --git a/docs/roadmaps/not-started/platform_AI_BUDGET_COST_GOVERNANCE_ROADMAP.md b/docs/roadmaps/not-started/platform_AI_BUDGET_COST_GOVERNANCE_ROADMAP.md new file mode 100644 index 00000000..58aa8412 --- /dev/null +++ b/docs/roadmaps/not-started/platform_AI_BUDGET_COST_GOVERNANCE_ROADMAP.md @@ -0,0 +1,94 @@ +# AI Budget & Cost Governance Roadmap + +> **Purpose:** Add per-tenant and per-agent controls for model spend, quotas, +> budgets, alerts, and invoiceable AI usage. +> +> **Primary Surface:** `services/platform-service/` +> +> **Status:** Planned +> +> **Estimated Effort:** 2-3 weeks + +--- + +## Why This Is Missing + +Model usage is often more volatile than standard API usage. Agent companies need +controls for: + +- daily and monthly spend caps +- per-workspace or per-agent budgets +- model allowlists and deny rules +- burst protection +- usage attribution for billing + +The repo has usage and billing modules, but not a dedicated AI cost governance layer. + +--- + +## Recommended Stack + +- `platform-service` cost governance module +- usage ledger in Cosmos or PostgreSQL +- provider-specific pricing tables +- alerting through Slack, Telegram, and email + +### Recommendation + +This fits naturally in `platform-service`. The key is not the datastore; it is the +quality of attribution and enforcement. 
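To make the enforcement point concrete, here is a minimal sketch of a preflight budget check run before an expensive model call. The soft-cap and hard-cap distinction follows the Phase 2 list below; the `Budget` shape, the `checkBudget` function, and the USD amounts are hypothetical and stand in for whatever the real module defines.

```typescript
// Hypothetical types; amounts are in USD and all names are illustrative.
interface Budget {
  softCapUsd: number; // crossing this should alert but still allow the call
  hardCapUsd: number; // crossing this should block the call
}

type BudgetDecision = "allow" | "allow_with_alert" | "deny";

// Decides whether a call with an estimated cost fits within the budget,
// given what has already been spent in the current period.
function checkBudget(
  budget: Budget,
  spentUsd: number,
  estimatedCostUsd: number,
): BudgetDecision {
  const projected = spentUsd + estimatedCostUsd;
  if (projected > budget.hardCapUsd) return "deny";
  if (projected > budget.softCapUsd) return "allow_with_alert";
  return "allow";
}

// Illustrative per-workspace budget.
const workspaceBudget: Budget = { softCapUsd: 80, hardCapUsd: 100 };
```

The check is only as good as the attribution behind `spentUsd`: if usage is not recorded per tenant, workspace, and agent, the decision cannot be scoped correctly.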
+ +--- + +## Phase 1 - Usage Ledger + +- [ ] Create modules: + - [ ] `ai-usage` + - [ ] `ai-budgets` + - [ ] `ai-pricing` +- [ ] Store: + - [ ] tenant + - [ ] workspace + - [ ] agent + - [ ] provider + - [ ] model + - [ ] tokens or units + - [ ] cost estimate + - [ ] request correlation ID + +--- + +## Phase 2 - Enforcement + +- [ ] Preflight budget checks before expensive calls +- [ ] rate and spend throttles +- [ ] model allowlists by tenant +- [ ] degrade-to-cheaper-model policy +- [ ] hard cap vs soft cap behavior + +--- + +## Phase 3 - Visibility + +- [ ] Admin reports by tenant, agent, provider, and model +- [ ] budget burn-down alerts +- [ ] anomaly detection for spend spikes +- [ ] export for finance and customer invoicing + +--- + +## Tech Stack Options + +| Option | Pros | Cons | Fit | +| --------------------------------------- | --------------------------------- | ----------------------------- | ------------------------- | +| Platform-native ledger + pricing tables | Full control and tenant awareness | Requires pricing upkeep | Best fit | +| External spend tool only | Fast bootstrap | Weak product attribution | Limited | +| Billing-module extension only | Less module sprawl | AI-specific logic gets buried | Acceptable but less clear | + +--- + +## Risks + +- Without spend controls, one bad prompt or loop can create material cost +- Without tenant attribution, enterprise billing becomes unreliable +- Without enforcement, dashboards become retrospective only diff --git a/docs/roadmaps/not-started/platform_AI_GOVERNANCE_EVALS_ROADMAP.md b/docs/roadmaps/not-started/platform_AI_GOVERNANCE_EVALS_ROADMAP.md new file mode 100644 index 00000000..56d789de --- /dev/null +++ b/docs/roadmaps/not-started/platform_AI_GOVERNANCE_EVALS_ROADMAP.md @@ -0,0 +1,92 @@ +# AI Governance & Evaluation Roadmap + +> **Purpose:** Centralize evals, policy enforcement, safety review, release gates, +> and regression tracking for prompts, agents, and model behavior. 
+> +> **Primary Surfaces:** `services/platform-service/`, `services/extraction-service/`, +> `services/mcp-server/` +> +> **Status:** Planned +> +> **Estimated Effort:** 3-5 weeks + +--- + +## Why This Is Missing + +The repo has useful pieces: + +- extraction evals +- telemetry and diagnostics +- flags and experiments + +What it does not have is a central AI governance surface that answers: + +- which prompts are approved +- which eval suite a release passed +- what policies apply to a class of agents +- what changed after a model or prompt update + +--- + +## Recommended Stack + +- `platform-service` governance modules +- OpenTelemetry for trace-linked evidence +- Promptfoo or a similar eval harness for offline regression +- policy layer using code-first rules first, with optional Cedar or OPA later + +--- + +## Phase 1 - Eval Registry + +- [ ] Create modules: + - [ ] `ai-evals` + - [ ] `ai-policies` + - [ ] `ai-releases` +- [ ] Add entities: + - [ ] benchmark set + - [ ] eval run + - [ ] eval result + - [ ] policy decision + - [ ] release gate + +--- + +## Phase 2 - Policy Engine + +- [ ] Add policy checks for: + - [ ] allowed models + - [ ] max temperature + - [ ] blocked tools + - [ ] required human review + - [ ] tenant-specific restrictions +- [ ] Add release gates based on eval thresholds +- [ ] Add regression detection on prompt or model changes + +--- + +## Phase 3 - Operational Governance + +- [ ] Link agent and prompt versions to eval runs +- [ ] Add incident-driven rollback recommendations +- [ ] Add policy override audit logs +- [ ] Add dashboards for pass rate, drift, and blocked releases + +--- + +## Tech Stack Options + +| Option | Pros | Cons | Fit | +| ----------------------------- | -------------------------- | ------------------------ | ------------------- | +| Promptfoo + platform registry | Good current ecosystem fit | Need custom service glue | Best near-term | +| Custom eval runner only | Full control | Reinvents too much | Weak starting 
point | +| OPA/Cedar-backed governance | Strong policy model | More complexity | Good phase 2+ | + +--- + +## Risks + +- Shipping prompts without eval gating causes avoidable regressions +- Governance only in docs will drift from runtime +- No policy audit trail creates enterprise trust problems diff --git a/docs/roadmaps/not-started/platform_DURABLE_EVENT_BUS_AND_WORKER_RUNTIME_ROADMAP.md b/docs/roadmaps/not-started/platform_DURABLE_EVENT_BUS_AND_WORKER_RUNTIME_ROADMAP.md new file mode 100644 index 00000000..de27a2f8 --- /dev/null +++ b/docs/roadmaps/not-started/platform_DURABLE_EVENT_BUS_AND_WORKER_RUNTIME_ROADMAP.md @@ -0,0 +1,94 @@ +# Durable Event Bus & Worker Runtime Roadmap + +> **Purpose:** Replace in-process eventing and scattered background execution +> with a durable cross-service event and worker backbone. +> +> **Primary Surfaces:** `packages/events/`, `packages/queue/`, `services/platform-service/` +> +> **Status:** Planned +> +> **Estimated Effort:** 3-4 weeks + +--- + +## Why This Is Missing + +`packages/events/src/memory.ts` is an in-process event bus. That is useful for local +dispatch inside one process, but it is not enough for: + +- cross-service subscriptions +- replay +- dead-letter handling +- durable delivery +- delayed fan-out + +The new `@bytelyst/queue` package improves durable background work, but the eventing +layer is still incomplete. + +--- + +## Recommended Stack + +### Best Long-Term Industry Standard + +- **Redis Streams** or **NATS JetStream** for durable event delivery +- `@bytelyst/queue` or BullMQ for work execution +- OpenTelemetry for trace correlation + +### Repo-Fit Option + +- Add a durable adapter to `@bytelyst/events` +- Use Redis-backed delivery first +- Keep current memory bus as test/dev adapter + +### Recommendation + +Use `@bytelyst/events` as the interface, but add a durable Redis or NATS adapter. +Do not let direct in-memory emitters remain the production default for critical flows. 
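The recommendation above can be sketched as one interface with interchangeable backends. This is a minimal sketch under stated assumptions: the `EventBus` interface, `PlatformEvent` shape, and `createEventBus` factory are hypothetical and do not describe the current `@bytelyst/events` API, and the durable adapter is deliberately left unimplemented.

```typescript
// Hypothetical interface; the real `@bytelyst/events` API may differ.
interface PlatformEvent {
  type: string;
  payload: unknown;
  correlationId: string;
}

type Handler = (event: PlatformEvent) => Promise<void>;

interface EventBus {
  publish(event: PlatformEvent): Promise<void>;
  subscribe(type: string, handler: Handler): void;
}

// In-process adapter, suitable for tests and local development only.
class MemoryEventBus implements EventBus {
  private handlers = new Map<string, Handler[]>();

  subscribe(type: string, handler: Handler): void {
    const existing = this.handlers.get(type) ?? [];
    this.handlers.set(type, [...existing, handler]);
  }

  // Invokes the handlers for the event type one at a time, in subscription order.
  async publish(event: PlatformEvent): Promise<void> {
    for (const handler of this.handlers.get(event.type) ?? []) {
      await handler(event);
    }
  }
}

// Chooses a backend by name. A durable adapter would implement the same
// interface; it is intentionally not part of this sketch.
function createEventBus(backend: "memory" | "redis-streams"): EventBus {
  if (backend === "memory") return new MemoryEventBus();
  throw new Error(`backend "${backend}" is not implemented in this sketch`);
}
```

Callers would depend only on `EventBus`, so production configuration could select a durable backend while tests keep using `createEventBus("memory")`.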
+ +--- + +## Phase 1 - Event Abstraction + +- [ ] Extend `@bytelyst/events` to support pluggable backends +- [ ] Keep `memory` for tests +- [ ] Add `redis-streams` or `jetstream` adapter +- [ ] Add consumer groups, ack, retry, and dead-letter support +- [ ] Add correlation and causation IDs + +--- + +## Phase 2 - Worker Runtime + +- [ ] Standardize worker bootstrap pattern +- [ ] Add handler registration, concurrency controls, leases, and health endpoints +- [ ] Add poison-message and dead-letter inspection +- [ ] Add scheduling and delayed dispatch + +--- + +## Phase 3 - Service Migration + +- [ ] Move delivery subscribers onto durable events +- [ ] Move auth side effects off fire-and-forget local emitters +- [ ] Move MCP/A2A transitions onto durable events where appropriate +- [ ] Add observability for event lag and failure rate + +--- + +## Tech Stack Options + +| Option | Pros | Cons | Fit | +| --------------- | ----------------------------------------------- | ------------------------------ | ------------------- | +| NATS JetStream | Strong event semantics, lightweight | New infra and integration work | Excellent long-term | +| Redis Streams | Familiar, easy to adopt with BullMQ-style stack | Less specialized than NATS | Best pragmatic path | +| Kafka | Powerful at scale | Heavy operational footprint | Overkill now | +| Memory bus only | Simple | Not durable | Dev/test only | + +--- + +## Risks + +- In-process events hide failures and block cross-service reliability +- Durable queues without durable events still leave side effects fragile +- Multiple custom worker patterns will drift without a standard runtime diff --git a/docs/roadmaps/not-started/platform_ENTERPRISE_PROVISIONING_SCIM_ROADMAP.md b/docs/roadmaps/not-started/platform_ENTERPRISE_PROVISIONING_SCIM_ROADMAP.md new file mode 100644 index 00000000..0b7df308 --- /dev/null +++ b/docs/roadmaps/not-started/platform_ENTERPRISE_PROVISIONING_SCIM_ROADMAP.md @@ -0,0 +1,81 @@ +# Enterprise Provisioning & SCIM 
Roadmap + +> **Purpose:** Extend enterprise identity from federation-only to full lifecycle +> provisioning, deprovisioning, group sync, and seat governance. +> +> **Primary Surface:** `services/platform-service/src/modules/auth/enterprise/` +> +> **Status:** Planned +> +> **Estimated Effort:** 2-3 weeks + +--- + +## Why This Is Missing + +The platform already has enterprise SAML and OIDC federation. That solves login. +It does not solve enterprise lifecycle management: + +- just-in-time user provisioning policies +- SCIM user sync +- group sync +- deprovisioning +- seat and entitlement mapping + +--- + +## Recommended Stack + +- Extend `platform-service` enterprise auth +- SCIM 2.0 endpoints in Fastify +- org/workspace mapping from the tenant model +- optional background sync jobs using `@bytelyst/queue` + +--- + +## Phase 1 - SCIM Foundations + +- [ ] Add SCIM service provider config endpoint +- [ ] Add SCIM resource schemas +- [ ] Add endpoints for: + - [ ] `/scim/v2/Users` + - [ ] `/scim/v2/Groups` + - [ ] PATCH + - [ ] deactivate +- [ ] Add enterprise API tokens and audit logs + +--- + +## Phase 2 - Provisioning Rules + +- [ ] Map SCIM users to org/workspace memberships +- [ ] Map groups to roles or teams +- [ ] Support seat assignment and revocation +- [ ] Add deprovision grace policy + +--- + +## Phase 3 - Admin Controls + +- [ ] Admin UI for provisioning state and sync errors +- [ ] reconciliation jobs +- [ ] audit exports +- [ ] break-glass override flows + +--- + +## Tech Stack Options + +| Option | Pros | Cons | Fit | +| ------------------------------- | ----------------------------------- | ------------------------------------ | --------------------------------- | +| Native SCIM in platform-service | Full control, strong enterprise fit | Must implement spec carefully | Best long-term | +| IdP proxy product | Faster setup | External dependency and less control | Acceptable only if needed quickly | +| JIT only | Minimal effort | Not enough for enterprise 
IT | Inadequate | + +--- + +## Risks + +- Enterprise login without enterprise provisioning still creates admin pain +- Group mapping drift leads to incorrect access +- Deprovision lag is a real security risk diff --git a/docs/roadmaps/not-started/platform_HUMAN_REVIEW_APPROVAL_QUEUE_ROADMAP.md b/docs/roadmaps/not-started/platform_HUMAN_REVIEW_APPROVAL_QUEUE_ROADMAP.md new file mode 100644 index 00000000..ca81cbef --- /dev/null +++ b/docs/roadmaps/not-started/platform_HUMAN_REVIEW_APPROVAL_QUEUE_ROADMAP.md @@ -0,0 +1,93 @@ +# Human Review & Approval Queue Roadmap + +> **Purpose:** Add a generic human-in-the-loop system for agent actions, +> escalations, approvals, and quality review. +> +> **Primary Surface:** `services/platform-service/` +> +> **Status:** Planned +> +> **Estimated Effort:** 2-3 weeks + +--- + +## Why This Is Missing + +The platform has MFA push approvals, but that is a narrow auth flow. An agent company +also needs a generic review queue for cases like: + +- send this message +- execute this external action +- publish this recommendation +- approve this prompt change +- inspect low-confidence output + +--- + +## Recommended Stack + +- `platform-service` review module +- `@bytelyst/queue` for routing and escalation timers +- Slack and Telegram delivery adapters for reviewer notifications +- Optional policy engine later with OpenFGA or Cedar + +--- + +## Phase 1 - Review Objects + +- [ ] Create modules: + - [ ] `reviews` + - [ ] `approvals` + - [ ] `escalations` +- [ ] Define review object fields: + - [ ] subject type + - [ ] subject ID + - [ ] review reason + - [ ] risk level + - [ ] required decision type + - [ ] assigned reviewer(s) + - [ ] SLA and due time + - [ ] supporting evidence +- [ ] Add states: + - [ ] pending + - [ ] claimed + - [ ] approved + - [ ] rejected + - [ ] expired + - [ ] superseded + +--- + +## Phase 2 - Workflow Integration + +- [ ] Allow agent runs to emit `waiting_for_review` +- [ ] Add review decision callbacks to resume 
or cancel runs +- [ ] Add escalation timers and reassignment +- [ ] Add reviewer comments and audit trail + +--- + +## Phase 3 - Reviewer Experience + +- [ ] API and admin UI queue +- [ ] bulk claim and assignment +- [ ] notification fan-out via Slack/Telegram/email +- [ ] filters by risk, workspace, agent, age, reviewer + +--- + +## Tech Stack Options + +| Option | Pros | Cons | Fit | +| --------------------------------------- | -------------------------- | ---------------------------------------------- | ------------------- | +| Platform module + queue + notifications | Simple and aligned to repo | More UI to build | Best immediate path | +| Commercial ticketing/workflow tool | Fast start | External dependency and poor control-plane fit | Poor long-term | +| Dedicated BPM engine | Powerful | Too heavy for initial need | Overkill initially | + +--- + +## Risks + +- If approvals are only implemented ad hoc per module, policy becomes inconsistent +- If decisions are not audit logged, enterprise trust will be weak +- Review queues without SLA and ownership become dead letter inboxes diff --git a/docs/roadmaps/not-started/platform_KNOWLEDGE_RAG_SERVICE_ROADMAP.md b/docs/roadmaps/not-started/platform_KNOWLEDGE_RAG_SERVICE_ROADMAP.md new file mode 100644 index 00000000..88810cb0 --- /dev/null +++ b/docs/roadmaps/not-started/platform_KNOWLEDGE_RAG_SERVICE_ROADMAP.md @@ -0,0 +1,117 @@ +# Knowledge & RAG Service Roadmap + +> **Purpose:** Build a shared knowledge platform for ingestion, chunking, +> embeddings, retrieval, citations, and access-controlled context assembly. +> +> **Primary Surfaces:** `services/platform-service/`, `services/extraction-service/` +> +> **Status:** Planned +> +> **Estimated Effort:** 4-6 weeks + +--- + +## Why This Is Missing + +The repo already has extraction and some vector-based diagnostics work, but there is +no reusable platform service for general retrieval-augmented generation across +products and agents. 
+ +Every serious agent company eventually needs: + +- managed document ingestion +- chunking and metadata +- embeddings +- retrieval APIs +- citations and provenance +- workspace-aware access control + +--- + +## Recommended Stack + +### Best Long-Term Industry Standard + +- **PostgreSQL + pgvector** for integrated metadata + vector search +- **Qdrant** if vector-first performance becomes dominant +- **Blob storage** for source documents + +### Cloud-Native Alternative + +- **Azure AI Search** for retrieval +- Cosmos or Postgres for metadata + +### Recommendation + +Use PostgreSQL + pgvector if you want the strongest balance of flexibility, +ownership, and industry-standard retrieval patterns. Azure AI Search is a valid +alternative if deep Azure integration matters more than datastore simplicity. + +--- + +## Phase 1 - Knowledge Objects + +- [ ] Create modules: + - [ ] `knowledge-sources` + - [ ] `knowledge-documents` + - [ ] `knowledge-chunks` + - [ ] `knowledge-indexes` +- [ ] Add ingestion states: + - [ ] uploaded + - [ ] parsed + - [ ] chunked + - [ ] embedded + - [ ] indexed + - [ ] failed +- [ ] Add source provenance: + - [ ] filename + - [ ] URL + - [ ] connector type + - [ ] page or section references + +--- + +## Phase 2 - Retrieval Pipeline + +- [ ] Add chunking service with configurable strategies +- [ ] Add embedding generation pipeline +- [ ] Add hybrid search: + - [ ] lexical + - [ ] vector + - [ ] metadata filters +- [ ] Add citation builder and quote bounds +- [ ] Add workspace and org scoping + +**Acceptance Criteria** + +- Retrieval returns chunks with citations and permission-safe metadata +- Different products can share the same retrieval API + +--- + +## Phase 3 - Connectors + +- [ ] File upload +- [ ] Web page ingestion +- [ ] Notes/workspace connector +- [ ] Blob-backed ingestion +- [ ] Optional Slack/Confluence/Google Drive connectors + +--- + +## Tech Stack Options + +| Option | Pros | Cons | Fit | +| ------------------------ | 
------------------------------------------ | ---------------------------- | -------------------------------- | +| Postgres + pgvector | Strong standard, unified metadata + vector | Requires new datastore | Best overall | +| Qdrant + metadata DB | Great vector performance | Two systems to operate | Good at scale | +| Azure AI Search | Strong managed search | Tighter vendor coupling | Best Azure-managed option | +| Cosmos vector workaround | Least disruption | Not ideal as main RAG engine | Avoid as primary long-term stack | + +--- + +## Risks + +- Retrieval without access control causes data leakage between tenants +- Retrieval without citations causes trust and compliance issues +- Embeddings without source lifecycle management become stale quickly diff --git a/docs/roadmaps/not-started/platform_ORG_WORKSPACE_RBAC_ROADMAP.md b/docs/roadmaps/not-started/platform_ORG_WORKSPACE_RBAC_ROADMAP.md new file mode 100644 index 00000000..4b464650 --- /dev/null +++ b/docs/roadmaps/not-started/platform_ORG_WORKSPACE_RBAC_ROADMAP.md @@ -0,0 +1,122 @@ +# Org, Workspace & RBAC Roadmap + +> **Purpose:** Add a first-class tenant model for organizations, workspaces, +> teams, memberships, scoped roles, and admin governance. +> +> **Primary Surface:** `services/platform-service/` +> +> **Status:** Planned +> +> **Estimated Effort:** 3-4 weeks + +--- + +## Why This Is Missing + +The platform has users and per-product memberships, but no canonical model for: + +- organizations +- workspaces +- teams +- workspace-scoped roles +- resource ownership and sharing + +Enterprise IdP support exists, but it does not replace a real tenant model. 
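For a sense of what workspace-scoped roles imply in service code, the sketch below evaluates a permission from a membership list. The role and action names follow the Phase 1 and Phase 2 lists below; the `Membership` shape, the `can` helper, and in particular the role-to-action mapping are assumptions for illustration rather than a proposed policy.

```typescript
// Role and action names follow the lists later in this roadmap; the
// role-to-action mapping below is an illustrative assumption.
type Role = "workspace_admin" | "workspace_editor" | "workspace_viewer";
type Action = "read" | "write" | "execute" | "approve" | "administer";

interface Membership {
  userId: string;
  workspaceId: string;
  role: Role;
}

const ROLE_ACTIONS: Record<Role, Action[]> = {
  workspace_admin: ["read", "write", "execute", "approve", "administer"],
  workspace_editor: ["read", "write", "execute"],
  workspace_viewer: ["read"],
};

// True when the user holds a role in that workspace that grants the action.
function can(
  memberships: Membership[],
  userId: string,
  workspaceId: string,
  action: Action,
): boolean {
  return memberships.some(
    (m) =>
      m.userId === userId &&
      m.workspaceId === workspaceId &&
      ROLE_ACTIONS[m.role].includes(action),
  );
}

// The same user holds different roles in different workspaces.
const memberships: Membership[] = [
  { userId: "u1", workspaceId: "ws-a", role: "workspace_admin" },
  { userId: "u1", workspaceId: "ws-b", role: "workspace_viewer" },
];
```

Keeping the check keyed on workspace rather than on a flat `admin` flag is what lets one user be an administrator in one workspace and a viewer in another.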
+
+---
+
+## Recommended Stack
+
+### Best Long-Term Industry Standard
+
+- **PostgreSQL**
+- **Drizzle ORM** or **Prisma**
+- **OpenFGA** or Zanzibar-style authorization model for fine-grained access
+
+### Best Repo-Fit Option
+
+- `platform-service` module set backed by Cosmos
+- Role and membership evaluation in service code
+- Optional policy layer later using OpenFGA
+
+### Recommendation
+
+If tenanting will be central to the business, PostgreSQL is the better long-term
+fit because org/workspace membership is relational by nature. If short-term
+consistency with the existing Cosmos-backed `platform-service` modules matters
+more, start in Cosmos but keep the permission model portable.
+
+---
+
+## Phase 1 - Data Model
+
+- [ ] Create modules:
+  - [ ] `orgs`
+  - [ ] `workspaces`
+  - [ ] `teams`
+  - [ ] `memberships`
+  - [ ] `roles`
+- [ ] Define resources:
+  - [ ] organization
+  - [ ] workspace
+  - [ ] team
+  - [ ] service account
+  - [ ] API key
+- [ ] Define roles:
+  - [ ] `org_owner`
+  - [ ] `org_admin`
+  - [ ] `workspace_admin`
+  - [ ] `workspace_editor`
+  - [ ] `workspace_viewer`
+  - [ ] `support_operator`
+- [ ] Add invitation and deprovision flows
+
+**Acceptance Criteria**
+
+- Every protected resource can be tied to org/workspace ownership
+- Users can belong to multiple workspaces with different roles
+
+---
+
+## Phase 2 - Authorization
+
+- [ ] Add authorization helpers to `@bytelyst/auth` or a new `@bytelyst/authorization`
+- [ ] Evaluate permissions by resource and action
+- [ ] Add policy checks for:
+  - [ ] read
+  - [ ] write
+  - [ ] execute
+  - [ ] approve
+  - [ ] administer
+- [ ] Add service account and API key scopes
+
+**Acceptance Criteria**
+
+- Endpoints no longer rely only on flat `admin` vs `user` roles
+- Policies are testable and reusable across modules
+
+---
+
+## Phase 3 - Product Integration
+
+- [ ] Migrate existing modules that should be workspace-scoped
+- [ ] Add workspace headers or explicit route scoping
+- [ ] Connect enterprise IdP claims to org/workspace resolution
+- [ ] Add audit entries for membership and role changes
+
+---
+
+## Tech Stack Options
+
+| Option                      | Pros                                | Cons                                   | Fit                    |
+| --------------------------- | ----------------------------------- | -------------------------------------- | ---------------------- |
+| PostgreSQL + OpenFGA        | Best long-term for RBAC and sharing | New datastore + auth layer             | Best industry-standard |
+| PostgreSQL only             | Simpler than OpenFGA, still strong  | Fine-grained auth must be custom-built | Good medium path       |
+| Cosmos + service-level RBAC | Lowest disruption                   | Harder joins, less policy richness     | Good short-term        |
+
+---
+
+## Risks
+
+- Flat roles will become a blocker for enterprise and multi-agent collaboration
+- Delaying workspace boundaries forces data migrations later
+- Fine-grained sharing is hard to retrofit once data models hardcode user ownership
diff --git a/docs/roadmaps/not-started/platform_SUPPORT_CASE_MANAGEMENT_ROADMAP.md b/docs/roadmaps/not-started/platform_SUPPORT_CASE_MANAGEMENT_ROADMAP.md
new file mode 100644
index 00000000..7d1bfaa1
--- /dev/null
+++ b/docs/roadmaps/not-started/platform_SUPPORT_CASE_MANAGEMENT_ROADMAP.md
@@ -0,0 +1,87 @@
+# Support Case Management Roadmap
+
+> **Purpose:** Build a platform-native case system for customer issues, agent
+> escalations, internal triage, and resolution tracking.
+>
+> **Primary Surfaces:** `services/platform-service/`, `services/mcp-server/`
+>
+> **Status:** Planned
+>
+> **Estimated Effort:** 3-4 weeks
+
+---
+
+## Why This Is Missing
+
+The repo has diagnostics, telemetry, debug tooling, and support-oriented MCP helpers.
+What it lacks is a canonical case record that ties them together.
+
+Without a case system, support work becomes fragmented across logs, chat, and ad hoc notes.
+
+---
+
+## Recommended Stack
+
+- `platform-service` case module
+- Cosmos or PostgreSQL for case records
+- Blob storage for attachments and debug packs
+- Notification hooks to Slack/Telegram/email
+
+### Recommendation
+
+This can live comfortably in `platform-service`. If the case domain becomes highly
+relational, PostgreSQL is better. Otherwise a Cosmos-backed module is acceptable.
+
+---
+
+## Phase 1 - Core Case Model
+
+- [ ] Create modules:
+  - [ ] `cases`
+  - [ ] `case-comments`
+  - [ ] `case-attachments`
+  - [ ] `case-links`
+- [ ] Track:
+  - [ ] customer or workspace
+  - [ ] severity
+  - [ ] status
+  - [ ] assignee
+  - [ ] linked run IDs
+  - [ ] linked diagnostics sessions
+  - [ ] linked incidents and releases
+
+---
+
+## Phase 2 - Operational Workflow
+
+- [ ] Add triage statuses and SLA timers
+- [ ] Add handoff between support, engineering, and operations
+- [ ] Add debug-pack ingestion
+- [ ] Add incident and case cross-links
+- [ ] Add case templates for common issue categories
+
+---
+
+## Phase 3 - Agent Integration
+
+- [ ] Let agents create draft cases from failed or escalated runs
+- [ ] Let support operators ask MCP tools for case-linked diagnostics
+- [ ] Add case summarization and next-step suggestions
+
+---
+
+## Tech Stack Options
+
+| Option                      | Pros                          | Cons                            | Fit                   |
+| --------------------------- | ----------------------------- | ------------------------------- | --------------------- |
+| Platform-native case module | Full control, integrates well | More work up front              | Best long-term        |
+| External helpdesk sync      | Faster bootstrap              | Split system of record          | Good only if required |
+| Ticket tool only            | Lowest build effort           | Weak agent-platform integration | Poor strategic fit    |
+
+---
+
+## Risks
+
+- No unified case object means poor support analytics and weak escalations
+- External-only support systems hide key agent and diagnostics context
+- If cases cannot link to runs and review queues, operators lose causal context
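
The Phase 1 case fields and the Phase 3 "draft case from a failed run" step above could be sketched roughly as follows. This is a minimal illustration, not the planned implementation: every name here (`SupportCase`, `createDraftCaseFromRun`, `transitionCase`, the status and severity values, and the transition table) is a hypothetical assumption, since the roadmap does not define a schema or a status workflow.

```typescript
// Hypothetical sketch only: none of these names or values exist in the repo.
type CaseSeverity = "low" | "medium" | "high" | "critical";
type CaseStatus = "draft" | "open" | "triaged" | "in_progress" | "resolved" | "closed";

// Mirrors the "Track" list from Phase 1.
interface SupportCase {
  id: string;
  workspaceId: string; // customer or workspace
  title: string;
  severity: CaseSeverity;
  status: CaseStatus;
  assigneeId?: string;
  linkedRunIds: string[];
  linkedDiagnosticsSessionIds: string[];
  linkedIncidentIds: string[];
  linkedReleaseIds: string[];
  createdAt: string; // ISO 8601 timestamp
}

// Assumed triage workflow; the roadmap only says "triage statuses".
const ALLOWED_TRANSITIONS: Record<CaseStatus, readonly CaseStatus[]> = {
  draft: ["open", "closed"],
  open: ["triaged", "closed"],
  triaged: ["in_progress", "closed"],
  in_progress: ["resolved"],
  resolved: ["closed", "open"],
  closed: [],
};

// Phase 3: build a draft case that links back to the failed or escalated run.
function createDraftCaseFromRun(input: {
  id: string;
  workspaceId: string;
  runId: string;
  title: string;
  severity?: CaseSeverity;
  now?: Date;
}): SupportCase {
  return {
    id: input.id,
    workspaceId: input.workspaceId,
    title: input.title,
    severity: input.severity ?? "medium",
    status: "draft",
    linkedRunIds: [input.runId],
    linkedDiagnosticsSessionIds: [],
    linkedIncidentIds: [],
    linkedReleaseIds: [],
    createdAt: (input.now ?? new Date()).toISOString(),
  };
}

// Returns a copy with the new status, or throws if the move is not allowed.
function transitionCase(current: SupportCase, next: CaseStatus): SupportCase {
  if (ALLOWED_TRANSITIONS[current.status].indexOf(next) === -1) {
    throw new Error(`Cannot move case ${current.id} from ${current.status} to ${next}`);
  }
  return { ...current, status: next };
}

// Usage: an agent files a draft, then a support operator opens it.
const draftCase = createDraftCaseFromRun({
  id: "case-001",
  workspaceId: "ws-demo",
  runId: "run-123",
  title: "Agent run failed during extraction",
});
const openedCase = transitionCase(draftCase, "open");
console.log(openedCase.status, openedCase.linkedRunIds);
```

Keeping the links as plain ID arrays on the case record fits either a Cosmos document or a PostgreSQL row with join tables, so the sketch does not commit to one of the storage options listed above.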