docs(codex): platform+product refactor roadmap (2026-02-14)

2026-02-14 12:09:14 -08:00 · 468a6c6072
commit 468a6c6072
parent d97c4ce4f0
1 changed files with 286 additions and 0 deletions
--- a/docs/codex/platform_and_products_refactor_2026-02-14.md
+++ b/docs/codex/platform_and_products_refactor_2026-02-14.md
@@ -0,0 +1,286 @@
 # Platform + Product Repo Refactor (ByteLyst) — 2026-02-14
 ## Why This Document Exists
 You currently have 3 repos checked out side by side:
 - `learning_ai_common_plat`: shared TypeScript packages (`@bytelyst/*`) + shared Fastify microservices + monitoring stack
 - `learning_voice_ai_agent` (LysnrAI): Python desktop app + FastAPI backend + multiple Next.js dashboards, currently consuming platform code via sibling-repo references
 - `learning_multimodal_memory_agents` (MindLyst): KMP (Android/iOS) + Next.js web app with many API routes (some directly using Cosmos DB)
 The goal is to **systematically refactor into a “platform repo”** (common libraries + common services) while keeping **product-specific code in product repos**, with a workflow modeled on how high-performing AI companies build: small PRs, strong automation, stable internal interfaces, and “golden paths” for shipping.
 Important constraint: we cannot know exactly how OpenAI/Anthropic run their internal engineering, but we *can* adopt the common patterns used by top-tier product+platform orgs: platform teams, strong CI gates, typed service contracts, SDK generation, trunk-based integration, feature flags, and opinionated templates.
 ---
 ## Current State (What You Already Have)
 ### Common repo (`learning_ai_common_plat`)
 - Shared libraries already exist as a pnpm workspace:
  - `packages/*`: `api-client`, `auth`, `config`, `cosmos`, `errors`, `fastify-core`, `logger`, `react-auth`, `design-tokens`, `testing`
 - Shared services already exist:
  - `services/*`: `platform-service` (auth/audit/flags/notifications/blob), `billing-service`, `growth-service`, `tracker-service`, `monitoring` (Loki/Grafana)
 - This is already the “right direction”: platform primitives are centralized.
 ### Voice repo (`learning_voice_ai_agent`)
 - A product repo that currently contains:
  - Product-specific clients: desktop app (Python), mobile app, dashboards (Next.js)
  - A **product backend** (FastAPI) that currently overlaps conceptually with platform-service/billing-service responsibilities (auth, notifications, usage, licenses, etc.)
 - The dashboards reference local platform packages (`file:../../learning_ai_common_plat/...`), which is convenient for hacking but breaks in CI/CD and for any teammate who does not have the same sibling checkout.
 ### MindLyst repo (`learning_multimodal_memory_agents`)
 - Product-specific KMP architecture is clean (KMP shared module owns business logic).
 - The Next.js web app contains many API routes and includes a Cosmos helper (`src/lib/cosmos.ts`) that directly uses Cosmos keys.
 ---
 ## Target End State
 ### 1) A True “Platform Repo” (common libs + common services)
 `learning_ai_common_plat` becomes the only place you implement:
 - Cross-cutting libraries:
  - Auth primitives (JWT + middleware), error taxonomy, config schemas, structured logging, telemetry, Cosmos adapters, API client utilities, design tokens
 - Cross-product services (deployed once, used by all products):
  - Identity/auth, feature flags, audit log, notification delivery, billing/entitlements, growth/referrals, tracker/feedback
 - Service contracts + SDKs:
  - OpenAPI specs (source of truth)
  - Generated TypeScript + Python SDK clients (and optionally Kotlin)
 ### 2) Product repos contain only product logic + UI + thin adapters
 `learning_voice_ai_agent` and `learning_multimodal_memory_agents` keep:
 - UI/clients and product UX
 - Product-specific domain logic (e.g., dictation UX; MindLyst brains/triage UX)
 - Minimal backend surface area:
  - Either none (UI talks to platform services)
  - Or a product backend that only owns what is truly product-specific, and delegates everything else to platform services
 ### 3) Consumption is “boring”
 - Products do **not** depend on sibling repo paths.
 - Products depend on **versioned** platform packages and/or **stable HTTP APIs**.
 - Local development supports “edit platform + test in product” with one command.
 ---
 ## The Big Refactor: Move From “Shared Code via Sibling Paths” to “Platform Interfaces”
 There are 2 kinds of reuse; treating them differently is the key.
 ### A) Library reuse (same language)
 Use shared packages for TypeScript consumers:
 - Next.js apps
 - Fastify services
 This is what `@bytelyst/*` already does well.
 ### B) Cross-language reuse (Python/Kotlin/TS)
 Do **not** try to share source code.
 Instead, share:
 - Service APIs (OpenAPI)
 - Generated SDK clients (TS/Python; optionally Kotlin)
 - Shared operational standards (auth tokens, error format, correlation IDs)
 This is the scalable move if you want a platform that works across products and languages.
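As one illustration of a shared operational standard, here is a minimal sketch of correlation-ID propagation. The header name `x-correlation-id` and the function name are assumptions made for the example; the roadmap does not fix either.

```typescript
// Hypothetical header name; the roadmap only says "correlation IDs".
const CORRELATION_HEADER = "x-correlation-id";

type HeaderMap = Record<string, string>;

/**
 * Returns outbound headers that carry the caller's correlation ID,
 * minting a new one only when the inbound request had none.
 */
function propagateCorrelationId(inbound: HeaderMap, newId: () => string): HeaderMap {
  // HTTP header names are case-insensitive, so normalize before the lookup.
  const normalized: HeaderMap = {};
  for (const key of Object.keys(inbound)) {
    const value = inbound[key];
    if (value !== undefined) normalized[key.toLowerCase()] = value;
  }
  const id = normalized[CORRELATION_HEADER] || newId();
  return { [CORRELATION_HEADER]: id };
}

// An ID set by the first caller survives the hop; a request without one gets a fresh ID.
const fromDashboard = propagateCorrelationId({ "X-Correlation-Id": "req-123" }, () => "generated");
const fromCron = propagateCorrelationId({}, () => "generated");
```

The same rule can be restated in a few lines of Python or Kotlin, which is the point: the standard is shared, not the source code.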
 ---
 ## Concrete Recommendations (Best Refactoring Moves)
 ### Recommendation 1: Formalize “Platform APIs” as the primary integration point
 Make the platform services the source of truth for:
 - Identity/auth (tokens, sessions, roles)
 - Billing + entitlements + licenses
 - Audit log
 - Feature flags / kill switches
 - Notifications
 Then:
 - `learning_voice_ai_agent/backend` becomes either:
  - A thin product service (transcripts, dictation-specific workflows), or
  - Removed entirely if its endpoints are generic and can be served by platform services.
 This reduces duplicated auth/license logic across products and removes “N ways to do identity”.
 ### Recommendation 2: Stop using `file:` dependencies across repos (except for a temporary dev mode)
 Replace cross-repo `file:../../learning_ai_common_plat/...` with:
 - Private package publishing (GitHub Packages or npm private registry), or
 - A single monorepo (if you ever decide separation is hurting velocity)
 In a multi-repo setup, publishing is usually the right move.
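To keep `file:` dependencies from creeping back in, a CI guard could scan each product manifest for local-path specifiers. This is a sketch under assumptions: the `Manifest` shape covers only the two fields inspected, `link:` is included because pnpm and Yarn also support it, and the function name and the `^0.1.0` version are illustrative.

```typescript
// Minimal shape of a package.json manifest; only the fields inspected here.
interface Manifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Specifier prefixes that resolve to a local path instead of a registry.
const LOCAL_PREFIXES = ["file:", "link:"];

/** Lists dependencies whose version specifier points at a local path. */
function findLocalPathDeps(manifest: Manifest): string[] {
  const all = { ...manifest.dependencies, ...manifest.devDependencies };
  return Object.keys(all)
    .filter((dep) => LOCAL_PREFIXES.some((prefix) => all[dep].startsWith(prefix)))
    .sort();
}

// Illustrative manifest; the path is copied as elided in this document.
const dashboardManifest: Manifest = {
  dependencies: {
    "@bytelyst/auth": "file:../../learning_ai_common_plat/...",
    "@bytelyst/logger": "^0.1.0",
  },
};
const offenders = findLocalPathDeps(dashboardManifest);
```

In CI, the job would fail whenever `offenders` is non-empty, with an explicit opt-out only for the temporary dev mode.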
 ### Recommendation 3: Add OpenAPI + client generation to the platform repo
 Put OpenAPI specs under something like:
 ```
 learning_ai_common_plat/contracts/
  platform-service.openapi.yaml
  billing-service.openapi.yaml
  growth-service.openapi.yaml
  tracker-service.openapi.yaml
 ```
 Then generate:
 - `packages/sdk-ts/*` (or turn `packages/api-client` into a generated package)
 - `python/bytelyst_sdk/*` (published as an internal Python package)
 Products consume SDKs, not raw HTTP calls.
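A rough picture of what “consume SDKs, not raw HTTP calls” means at a call site. The code below is a hand-written stand-in for generated output: the `/flags/{key}` path, `FlagDto`, `createPlatformClient`, and the base URL are all hypothetical, and a real client would be generated from `platform-service.openapi.yaml`.

```typescript
// Hypothetical response shape; a generator would derive it from the OpenAPI spec.
interface FlagDto {
  key: string;
  enabled: boolean;
}

// Transport-agnostic request description, so the sketch needs no HTTP library.
interface RequestSpec {
  method: "GET";
  url: string;
  headers: Record<string, string>;
}

type Transport = (req: RequestSpec) => Promise<unknown>;

/** Builds a client that owns URL and header assembly for every call. */
function createPlatformClient(baseUrl: string, token: string, send: Transport) {
  const flagRequest = (key: string): RequestSpec => ({
    method: "GET",
    url: `${baseUrl}/flags/${encodeURIComponent(key)}`, // assumed example path
    headers: { authorization: `Bearer ${token}` },
  });
  return {
    flagRequest,
    // Typed call site: products never build URLs or headers themselves.
    getFlag: async (key: string): Promise<FlagDto> =>
      (await send(flagRequest(key))) as FlagDto,
  };
}

// A stub transport stands in for the network in this sketch.
const sdkClient = createPlatformClient(
  "https://platform.example.internal",
  "test-token",
  async () => ({ key: "new-onboarding", enabled: true }),
);
const flagSpec = sdkClient.flagRequest("new onboarding");
```

Because the transport is injected, the same client shape works in a Next.js dashboard, a Fastify service, or a test.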
 ### Recommendation 4: Standardize “golden path” service scaffolding
 Add a generator/template for a new service that includes:
 - `@bytelyst/config` env schema pattern
 - `@bytelyst/errors` error mapping + RFC 7807-style response format (Problem Details for HTTP APIs; RFC 7807 has since been superseded by RFC 9457)
 - structured logging (`@bytelyst/logger`)
 - `/health`, `/metrics`, OpenAPI, request IDs, CORS defaults
 - docker + compose wiring
 This is one of the highest-leverage “how top orgs build” moves: it makes the right thing the easy thing.
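To make the error-mapping item concrete, here is a minimal sketch of a Problem Details mapper that a service template might ship. The member names (`type`, `title`, `status`, `detail`, `instance`) come from RFC 7807; the error codes, the `urn:bytelyst:error:*` identifiers, and the use of the request ID as `instance` are assumptions, not the actual `@bytelyst/errors` API.

```typescript
// Problem Details members as defined by RFC 7807.
interface ProblemDetails {
  type: string;
  title: string;
  status: number;
  detail?: string;
  instance?: string;
}

// Hypothetical error taxonomy; the real one would live in @bytelyst/errors.
type ErrorCode = "not_found" | "unauthorized" | "validation_failed";

interface AppError {
  code: ErrorCode;
  message: string;
}

const STATUS_BY_CODE: Record<ErrorCode, number> = {
  not_found: 404,
  unauthorized: 401,
  validation_failed: 422,
};

/** Type guard: true only for objects carrying a known code and a message. */
function isAppError(err: unknown): err is AppError {
  if (typeof err !== "object" || err === null) return false;
  const candidate = err as { code?: unknown; message?: unknown };
  return (
    typeof candidate.code === "string" &&
    Object.prototype.hasOwnProperty.call(STATUS_BY_CODE, candidate.code) &&
    typeof candidate.message === "string"
  );
}

/** Maps any thrown value to a response body, hiding internals of unknown errors. */
function toProblemDetails(err: unknown, requestId: string): ProblemDetails {
  if (isAppError(err)) {
    return {
      type: `urn:bytelyst:error:${err.code}`, // assumed identifier scheme
      title: err.code.replace(/_/g, " "),
      status: STATUS_BY_CODE[err.code],
      detail: err.message,
      instance: requestId,
    };
  }
  // Unknown failures get a generic 500 with no detail, so nothing internal leaks.
  return { type: "about:blank", title: "Internal Server Error", status: 500, instance: requestId };
}

const known = toProblemDetails({ code: "not_found", message: "No such license" }, "req-1");
const unknownFailure = toProblemDetails(new Error("connection string rejected"), "req-2");
```

Every service built from the template would then return the same error shape, which is what lets generated SDKs handle failures uniformly.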
 ### Recommendation 5: MindLyst: migrate away from direct Cosmos usage in Next API routes (optional but recommended)
 Two options (pick one):
 1. Keep Next API routes, but use `@bytelyst/cosmos` + platform-style config/errors/logging to reduce duplication.
 2. Prefer a MindLyst product service (or platform “memory service”) so Cosmos keys live only in backend services, not in the Next runtime.
 If MindLyst is going mobile-first with on-device SQLDelight, this can be phased; don’t block product progress on it.
 ---
 ## “How Do I Use This Effectively?” (Developer Workflow)
 ### Workflow goal
 - Most product work should not require editing platform code.
 - When it does, the feedback loop should be < 10 minutes.
 ### Recommended workflow
 1. Platform code changes happen in `learning_ai_common_plat`.
 2. Platform releases are cut frequently (small increments).
 3. Product repos update platform versions as part of regular maintenance (automated, Dependabot-style bump PRs).
 ### For local dev (editing platform + product together)
 Pick one of these patterns:
 1. **Preferred (multi-repo): publish + install**
   - fast and consistent; requires registry + auth token setup
 2. **Temporary dev mode: local tarballs**
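   - `pnpm -C learning_ai_common_plat/packages/<name> pack` (run per package, since packing the workspace root would not produce the individual `@bytelyst/*` tarballs), then `npm i <tarball>` in the product repo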
 3. **If you must: `file:` deps**
   - keep, but treat as dev-only and document it as such (breaks easily; forces build ordering)
 ---
 ## Roadmap With Task List (Implementation Steps)
 ### Phase 0 (Decisions + Guardrails) — 1-2 days
 - [ ] Decide the integration contract per capability:
  - [ ] Cross-language: **API + SDK** (recommended)
  - [ ] TS-only: **shared package**
 - [ ] Decide the package distribution method:
  - [ ] GitHub Packages (private npm scope) or private npm registry
  - [ ] Python package distribution (private index or GitHub release artifacts)
 - [ ] Define “platform vs product” ownership boundaries:
  - [ ] What must be in platform (identity/billing/flags/notifications)
  - [ ] What remains product-only (dictation UX; MindLyst brains/triage UX)
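The first decision above reduces to one rule, sketched here so it can be applied per capability. The type and function names are hypothetical, and the two example capabilities are illustrative rather than an audit of the repos.

```typescript
type Language = "typescript" | "python" | "kotlin";
type Integration = "shared-package" | "api+sdk";

/** A shared package only works when every consumer is TypeScript. */
function integrationFor(consumers: Language[]): Integration {
  return consumers.every((lang) => lang === "typescript") ? "shared-package" : "api+sdk";
}

// Illustrative capabilities and consumer sets.
const designTokens = integrationFor(["typescript"]);
const licensing = integrationFor(["typescript", "python"]);
```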
 ### Phase 1 (Make Platform Libraries Release-Ready) — 2-5 days
 - [ ] Add versioning/release tooling to `learning_ai_common_plat`:
  - [ ] Add Changesets (or equivalent) for `@bytelyst/*`
  - [ ] Enforce semver and changelogs
 - [ ] Ensure every package has:
  - [ ] stable exports (`exports` field), `README.md`, minimal examples
  - [ ] tests for critical primitives (auth/config/errors/cosmos)
 - [ ] Add CI workflows that run on every PR:
  - [ ] `pnpm -w lint`, `pnpm -w typecheck`, `pnpm -w test`, `pnpm -w build`
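For intuition about what release tooling enforces: each change declares a bump type, and a release applies the highest one pending. The sketch below is a simplified model of that rule, not Changesets' implementation; it handles plain `x.y.z` versions only.

```typescript
type Bump = "patch" | "minor" | "major";

const BUMP_RANK: Record<Bump, number> = { patch: 0, minor: 1, major: 2 };

/** Applies the highest pending bump to an "x.y.z" version string. */
function nextVersion(current: string, pending: Bump[]): string {
  if (pending.length === 0) return current; // nothing to release
  const [major, minor, patch] = current.split(".").map(Number);
  const highest = pending.reduce((a, b) => (BUMP_RANK[b] > BUMP_RANK[a] ? b : a));
  if (highest === "major") return `${major + 1}.0.0`;
  if (highest === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}

// Two fixes and one new feature since the last release yield a minor bump.
const released = nextVersion("0.1.3", ["patch", "minor", "patch"]);
```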
 ### Phase 2 (Publish Platform Packages) — 1-3 days
 - [ ] Configure publishing for `@bytelyst/*`:
  - [ ] set up registry auth (auth token written to `.npmrc` in CI only; never commit it)
  - [ ] publish on merge to `main` with tags
 - [ ] Publish initial versions (`0.1.x`) and document upgrade policy.
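One detail worth writing into the upgrade policy: under npm's semver rules, a caret range on a `0.x` version admits only patch-level updates, so `^0.1.2` never picks up `0.2.0`. The sketch below reimplements that rule for plain `x.y.z` versions to show the behavior. It ignores prerelease tags, and the helper names are made up for the example.

```typescript
type Triple = [number, number, number];

function parseTriple(version: string): Triple {
  const [a, b, c] = version.split(".").map(Number);
  return [a, b, c];
}

/** Negative, zero, or positive when x is lower than, equal to, or higher than y. */
function compareTriple(x: Triple, y: Triple): number {
  for (let i = 0; i < 3; i++) {
    if (x[i] !== y[i]) return x[i] - y[i];
  }
  return 0;
}

/** Exclusive upper bound of a caret range: bump the left-most non-zero component. */
function caretUpperBound([major, minor, patch]: Triple): Triple {
  if (major > 0) return [major + 1, 0, 0];
  if (minor > 0) return [0, minor + 1, 0];
  return [0, 0, patch + 1];
}

/** True when `version` falls inside the range `^base`. */
function satisfiesCaret(version: string, base: string): boolean {
  const v = parseTriple(version);
  const lower = parseTriple(base);
  return compareTriple(v, lower) >= 0 && compareTriple(v, caretUpperBound(lower)) < 0;
}

const patchAccepted = satisfiesCaret("0.1.9", "0.1.2");
const minorRejected = satisfiesCaret("0.2.0", "0.1.2");
```

In practice this means every `0.x` minor release of a platform package needs an explicit bump in each product repo, which fits the frequent, small update cadence described earlier.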
 ### Phase 3 (Decouple Product Repos From Sibling Paths) — 2-7 days
 - [ ] In `learning_voice_ai_agent/*-dashboard-web`:
  - [ ] Replace `file:` dependencies with versioned `@bytelyst/*`
  - [ ] Remove any build hacks that assume `../../learning_ai_common_plat/packages/*/dist`
  - [ ] Ensure `npm ci && npm run build` works with no sibling checkout
 - [ ] In MindLyst web:
  - [ ] (Optional) Replace local `src/lib/cosmos.ts` with `@bytelyst/cosmos` usage
 ### Phase 4 (Platform Contracts + SDK Generation) — 3-10 days
 - [ ] Add OpenAPI specs for each platform service.
 - [ ] Add generators:
  - [ ] TypeScript SDK (consumed by dashboards + future clients)
  - [ ] Python SDK (consumed by `learning_voice_ai_agent` desktop/backend)
 - [ ] Update products to use generated clients (reduces drift and hand-written HTTP code).
 ### Phase 5 (Unify Identity/Billing Across Products) — 1-3 weeks (incremental)
 - [ ] Define the canonical identity model (users/orgs/roles/entitlements).
 - [ ] Move generic endpoints out of product backends into platform services:
  - [ ] Auth/token issuance and verification
  - [ ] License issuance/validation
  - [ ] Usage aggregation + limits
  - [ ] Notifications + audit
 - [ ] In product repos:
  - [ ] Replace direct Cosmos access where it duplicates platform (when feasible)
  - [ ] Keep product-only endpoints in a thin product service
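As a starting point for the canonical identity model, here is a deliberately small sketch of users, orgs, roles, and entitlements with one access check. Every field and feature name is an assumption for illustration; the real model is a Phase 5 decision.

```typescript
// Hypothetical canonical model; names are illustrative only.
type Role = "owner" | "admin" | "member";

interface Org {
  id: string;
  entitlements: string[]; // features granted by the org's plan
}

interface Membership {
  userId: string;
  orgId: string;
  role: Role;
}

/** A user may use a feature only through an org they belong to that is entitled to it. */
function canUse(userId: string, feature: string, orgs: Org[], memberships: Membership[]): boolean {
  return memberships
    .filter((m) => m.userId === userId)
    .some((m) => orgs.some((o) => o.id === m.orgId && o.entitlements.includes(feature)));
}

const orgs: Org[] = [
  { id: "org-a", entitlements: ["dictation-pro"] },
  { id: "org-b", entitlements: [] },
];
const memberships: Membership[] = [
  { userId: "u1", orgId: "org-a", role: "member" },
  { userId: "u2", orgId: "org-b", role: "owner" },
];
const u1Allowed = canUse("u1", "dictation-pro", orgs, memberships);
const u2Allowed = canUse("u2", "dictation-pro", orgs, memberships);
```

Keeping entitlements on the org rather than the user means a license check is the same lookup for every product.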
 ### Phase 6 (Developer Experience “Golden Paths”) — ongoing
 - [ ] Add templates:
  - [ ] `create-bytelyst-service` scaffold (Fastify + packages)
  - [ ] `create-bytelyst-dashboard` scaffold (Next + auth + client)
 - [ ] Standardize toolchain pinning across repos:
  - [ ] Node (>=20.9), pnpm version, Python (>=3.12 for LysnrAI)
  - [ ] one command per repo: `./quick-check.sh` or `make check`
 - [ ] Add CODEOWNERS + review rules for platform code.
 - [ ] Add feature flagging discipline (flags live in platform; products consume).
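On the consuming side, flag discipline mostly comes down to always reading flags with an explicit fallback. A minimal sketch, assuming the product receives a plain key-to-boolean snapshot from the platform; the snapshot shape and flag names are hypothetical.

```typescript
// Snapshot of flags as a product might receive it from the platform; shape is assumed.
type FlagSnapshot = Record<string, boolean>;

/**
 * Reads a flag with an explicit fallback, so a missing flag or an
 * unreachable platform never changes behavior by accident.
 */
function isEnabled(snapshot: FlagSnapshot | undefined, key: string, fallback: boolean): boolean {
  if (snapshot === undefined) return fallback; // platform unreachable or not yet loaded
  const value = snapshot[key];
  return typeof value === "boolean" ? value : fallback;
}

const snapshot: FlagSnapshot = { "new-onboarding": true, "voice-dictation": false };
const showOnboarding = isEnabled(snapshot, "new-onboarding", false);
const dictationOn = isEnabled(snapshot, "voice-dictation", true); // kill switch flipped off
const offlineDefault = isEnabled(undefined, "new-onboarding", false);
```

Choosing the fallback per call site is the design decision: new features default to off, while kill switches default to the existing behavior.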
 ---
 ## Key Risks / Tradeoffs
 - **Multi-repo + shared libs without publishing** will keep causing fragile builds.
 - **Sharing code across languages** does not scale; share APIs/SDKs instead.
 - **Over-centralizing too early** can slow product iteration; migrate by “thin vertical slices”.
 ---
 ## What I Would Do First (If You Want Maximum ROI)
 1. Publish `@bytelyst/*` packages and remove `file:` cross-repo deps for dashboards.
 2. Make platform-service the canonical auth/flags/audit/notifications boundary and generate SDKs.
 3. Decide whether LysnrAI FastAPI backend is:
   - product-only (dictation/transcripts), or
   - a transitional legacy backend to be decomposed into platform services.