From 468a6c60721e848a2618818cdc9217465b881cd3 Mon Sep 17 00:00:00 2001
From: Saravana Achu Mac
Date: Sat, 14 Feb 2026 12:09:14 -0800
Subject: [PATCH] docs(codex): platform+product refactor roadmap (2026-02-14)

---
 ...atform_and_products_refactor_2026-02-14.md | 286 ++++++++++++++++++
 1 file changed, 286 insertions(+)
 create mode 100644 docs/codex/platform_and_products_refactor_2026-02-14.md

diff --git a/docs/codex/platform_and_products_refactor_2026-02-14.md b/docs/codex/platform_and_products_refactor_2026-02-14.md
new file mode 100644
index 00000000..eee9a6c5
--- /dev/null
+++ b/docs/codex/platform_and_products_refactor_2026-02-14.md
@@ -0,0 +1,286 @@
+# Platform + Product Repo Refactor (ByteLyst) — 2026-02-14
+
+## Why This Document Exists
+
+You currently have 3 repos checked out side-by-side:
+
+- `learning_ai_common_plat`: shared TypeScript packages (`@bytelyst/*`) + shared Fastify microservices + monitoring stack
+- `learning_voice_ai_agent` (LysnrAI): Python desktop app + FastAPI backend + multiple Next.js dashboards, currently consuming platform code via sibling-repo references
+- `learning_multimodal_memory_agents` (MindLyst): KMP (Android/iOS) + Next.js web app with many API routes (some directly using Cosmos DB)
+
+The goal is to **systematically refactor into a “platform repo”** (common libraries + common services) while keeping **product-specific code in product repos**, with a workflow that feels like how high-performing AI companies build: small PRs, strong automation, stable internal interfaces, and “golden paths” for shipping.
+
+Important constraint: we cannot know exactly how OpenAI/Anthropic run their internal engineering, but we *can* adopt the common patterns used by top-tier product+platform orgs: platform teams, strong CI gates, typed service contracts, SDK generation, trunk-based integration, feature flags, and opinionated templates.
+
+---
+
+## Current State (What You Already Have)
+
+### Common repo (`learning_ai_common_plat`)
+
+- Shared libraries already exist as a pnpm workspace:
+  - `packages/*`: `api-client`, `auth`, `config`, `cosmos`, `errors`, `fastify-core`, `logger`, `react-auth`, `design-tokens`, `testing`
+- Shared services already exist:
+  - `services/*`: `platform-service` (auth/audit/flags/notifications/blob), `billing-service`, `growth-service`, `tracker-service`, `monitoring` (Loki/Grafana)
+- This is already the “right direction”: platform primitives are centralized.
+
+### Voice repo (`learning_voice_ai_agent`)
+
+- A product repo that currently contains:
+  - Product-specific clients: desktop app (Python), mobile app, dashboards (Next.js)
+  - A **product backend** (FastAPI) that currently overlaps conceptually with platform-service/billing-service responsibilities (auth, notifications, usage, licenses, etc.)
+- The dashboards reference local platform packages (`file:../../learning_ai_common_plat/...`), which is convenient for hacking but fragile in CI/CD and for teammates.
+
+### MindLyst repo (`learning_multimodal_memory_agents`)
+
+- Product-specific KMP architecture is clean (KMP shared module owns business logic).
+- The Next.js web app contains many API routes and includes a Cosmos helper (`src/lib/cosmos.ts`) that directly uses Cosmos keys.
+
+---
+
+## Target End State
+
+### 1) A True “Platform Repo” (common libs + common services)
+
+`learning_ai_common_plat` becomes the only place you implement:
+
+- Cross-cutting libraries:
+  - Auth primitives (JWT + middleware), error taxonomy, config schemas, structured logging, telemetry, Cosmos adapters, API client utilities, design tokens
+- Cross-product services (deployed once, used by all products):
+  - Identity/auth, feature flags, audit log, notification delivery, billing/entitlements, growth/referrals, tracker/feedback
+- Service contracts + SDKs:
+  - OpenAPI specs (source of truth)
+  - Generated TypeScript + Python SDK clients (and optionally Kotlin)
+
+### 2) Product repos contain only product logic + UI + thin adapters
+
+`learning_voice_ai_agent` and `learning_multimodal_memory_agents` keep:
+
+- UI/clients and product UX
+- Product-specific domain logic (e.g., dictation UX; MindLyst brains/triage UX)
+- Minimal backend surface area:
+  - Either none (UI talks to platform services)
+  - Or a product backend that only owns what is truly product-specific, and delegates everything else to platform services
+
+### 3) Consumption is “boring”
+
+- Products do **not** depend on sibling repo paths.
+- Products depend on **versioned** platform packages and/or **stable HTTP APIs**.
+- Local development supports “edit platform + test in product” with one command.
+
+---
+
+## The Big Refactor: Move From “Shared Code via Sibling Paths” to “Platform Interfaces”
+
+There are 2 kinds of reuse; treating them differently is the key.
+
+### A) Library reuse (same language)
+
+Use shared packages for TypeScript consumers:
+
+- Next.js apps
+- Fastify services
+
+This is what `@bytelyst/*` already does well.
+
+### B) Cross-language reuse (Python/Kotlin/TS)
+
+Do **not** try to share source code.
+
+Instead, share:
+
+- Service APIs (OpenAPI)
+- Generated SDK clients (TS/Python; optionally Kotlin)
+- Shared operational standards (auth tokens, error format, correlation IDs)
+
+This is the scalable move if you want a platform that works across products and languages.
+
+---
+
+## Concrete Recommendations (Best Refactoring Moves)
+
+### Recommendation 1: Formalize “Platform APIs” as the primary integration point
+
+Make the platform services the source of truth for:
+
+- Identity/auth (tokens, sessions, roles)
+- Billing + entitlements + licenses
+- Audit log
+- Feature flags / kill switches
+- Notifications
+
+Then:
+
+- `learning_voice_ai_agent/backend` becomes either:
+  - A thin product service (transcripts, dictation-specific workflows), or
+  - Removed entirely if its endpoints are generic and can be served by platform services.
+
+This reduces duplicated auth/license logic across products and removes “N ways to do identity”.
+
+### Recommendation 2: Stop using `file:` dependencies across repos (except for a temporary dev mode)
+
+Replace cross-repo `file:../../learning_ai_common_plat/...` with:
+
+- Private package publishing (GitHub Packages or npm private registry), or
+- A single monorepo (if you ever decide separation is hurting velocity)
+
+In a multi-repo setup, publishing is usually the right move.
+
+### Recommendation 3: Add OpenAPI + client generation to the platform repo
+
+Put OpenAPI specs under something like:
+
+```
+learning_ai_common_plat/contracts/
+  platform-service.openapi.yaml
+  billing-service.openapi.yaml
+  growth-service.openapi.yaml
+  tracker-service.openapi.yaml
+```
+
+Then generate:
+
+- `packages/sdk-ts/*` (or `packages/api-client` becomes generated)
+- `python/bytelyst_sdk/*` (published as an internal Python package)
+
+Products consume SDKs, not raw HTTP calls.
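
As a rough illustration of what "SDKs, not raw HTTP calls" could mean for a Python consumer such as the LysnrAI backend, the sketch below shows the kind of thin wrapper a generated client might expose. It attaches the bearer token and a correlation ID in one place and turns an RFC 7807-style error body into a typed exception. Everything named here is an assumption made for illustration: `PlatformClient`, `PlatformError`, the `X-Correlation-Id` header name, the `/flags/...` path, and the base URL are hypothetical and not taken from the actual platform services or from any generator's output.

```python
import json
import uuid
from typing import Dict, Optional


class PlatformError(Exception):
    """Error built from an RFC 7807-style problem document (assumed format)."""

    def __init__(self, status: int, title: str, detail: str = "") -> None:
        super().__init__(f"{status} {title}: {detail}")
        self.status = status
        self.title = title
        self.detail = detail


class PlatformClient:
    """Hypothetical stand-in for a generated SDK client; not the real API."""

    def __init__(self, base_url: str, token: str) -> None:
        self.base_url = base_url.rstrip("/")
        self.token = token

    def url_for(self, path: str) -> str:
        # Normalize slashes so callers can pass "/flags/x" or "flags/x".
        return f"{self.base_url}/{path.lstrip('/')}"

    def headers(self, correlation_id: Optional[str] = None) -> Dict[str, str]:
        # Auth and tracing headers are attached in one place, so product
        # code never assembles them by hand.
        return {
            "Authorization": f"Bearer {self.token}",
            "Accept": "application/json",
            # Header name is an assumption; reuse the caller's ID if given.
            "X-Correlation-Id": correlation_id or str(uuid.uuid4()),
        }

    @staticmethod
    def parse_error(status: int, body: bytes) -> PlatformError:
        # Tolerate non-JSON bodies (for example, a proxy's HTML error page).
        try:
            problem = json.loads(body)
        except ValueError:
            problem = None
        if not isinstance(problem, dict):
            problem = {}
        return PlatformError(
            status=status,
            title=str(problem.get("title", "Unknown error")),
            detail=str(problem.get("detail", "")),
        )


client = PlatformClient("https://platform.example.internal/", token="dev-token")
print(client.url_for("/flags/new-onboarding"))
print(client.headers(correlation_id="req-123")["X-Correlation-Id"])
error = client.parse_error(403, b'{"title": "Forbidden", "detail": "missing entitlement"}')
print(error.status, error.title)
```

The sketch deliberately performs no network I/O; a real generated client would add the transport and typed response models on top of the same header and error conventions.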
+
+### Recommendation 4: Standardize “golden path” service scaffolding
+
+Add a generator/template for a new service that includes:
+
+- `@bytelyst/config` env schema pattern
+- `@bytelyst/errors` error mapping + RFC7807-ish response format
+- structured logging (`@bytelyst/logger`)
+- `/health`, `/metrics`, OpenAPI, request IDs, CORS defaults
+- docker + compose wiring
+
+This is one of the highest-leverage “how top orgs build” moves: it makes the right thing the easy thing.
+
+### Recommendation 5: MindLyst: migrate away from direct Cosmos usage in Next API routes (optional but recommended)
+
+Two options (pick one):
+
+1. Keep Next API routes, but use `@bytelyst/cosmos` + platform-style config/errors/logging to reduce duplication.
+2. Prefer a MindLyst product service (or platform “memory service”) so Cosmos keys live only in backend services, not in the Next runtime.
+
+If MindLyst is going mobile-first with on-device SQLDelight, this can be phased; don’t block product progress on it.
+
+---
+
+## “How Do I Use This Effectively?” (Developer Workflow)
+
+### Workflow goal
+
+- Most product work should not require editing platform code.
+- When it does, the feedback loop should be < 10 minutes.
+
+### Recommended workflow
+
+1. Platform code changes happen in `learning_ai_common_plat`.
+2. Platform releases are cut frequently (small increments).
+3. Product repos update platform versions as part of regular maintenance (dependabot-like).
+
+### For local dev (editing platform + product together)
+
+Pick one of these patterns:
+
+1. **Preferred (multi-repo): publish + install**
+   - fast and consistent; requires registry + auth token setup
+2. **Temporary dev mode: local tarballs**
+   - `pnpm -C learning_ai_common_plat pack` then `npm i ` in product
+3. **If you must: `file:` deps**
+   - keep, but treat as dev-only and document it as such (breaks easily; forces build ordering)
+
+---
+
+## Roadmap With Task List (Implementation Steps)
+
+### Phase 0 (Decisions + Guardrails) — 1-2 days
+
+- [ ] Decide the integration contract per capability:
+  - [ ] Cross-language: **API + SDK** (recommended)
+  - [ ] TS-only: **shared package**
+- [ ] Decide the package distribution method:
+  - [ ] GitHub Packages (private npm scope) or private npm registry
+  - [ ] Python package distribution (private index or GitHub release artifacts)
+- [ ] Define “platform vs product” ownership boundaries:
+  - [ ] What must be in platform (identity/billing/flags/notifications)
+  - [ ] What remains product-only (dictation UX; MindLyst brains/triage UX)
+
+### Phase 1 (Make Platform Libraries Release-Ready) — 2-5 days
+
+- [ ] Add versioning/release tooling to `learning_ai_common_plat`:
+  - [ ] Add Changesets (or equivalent) for `@bytelyst/*`
+  - [ ] Enforce semver and changelogs
+- [ ] Ensure every package has:
+  - [ ] stable exports (`exports` field), `README.md`, minimal examples
+  - [ ] tests for critical primitives (auth/config/errors/cosmos)
+- [ ] Add CI workflows that run on every PR:
+  - [ ] `pnpm -w lint`, `pnpm -w typecheck`, `pnpm -w test`, `pnpm -w build`
+
+### Phase 2 (Publish Platform Packages) — 1-3 days
+
+- [ ] Configure publishing for `@bytelyst/*`:
+  - [ ] set up registry auth (`.npmrc` in CI only)
+  - [ ] publish on merge to `main` with tags
+- [ ] Publish initial versions (`0.1.x`) and document upgrade policy.
+
+### Phase 3 (Decouple Product Repos From Sibling Paths) — 2-7 days
+
+- [ ] In `learning_voice_ai_agent/*-dashboard-web`:
+  - [ ] Replace `file:` dependencies with versioned `@bytelyst/*`
+  - [ ] Remove any build hacks that assume `../../learning_ai_common_plat/packages/*/dist`
+  - [ ] Ensure `npm ci && npm run build` works with no sibling checkout
+- [ ] In MindLyst web:
+  - [ ] (Optional) Replace local `src/lib/cosmos.ts` with `@bytelyst/cosmos` usage
+
+### Phase 4 (Platform Contracts + SDK Generation) — 3-10 days
+
+- [ ] Add OpenAPI specs for each platform service.
+- [ ] Add generators:
+  - [ ] TypeScript SDK (consumed by dashboards + future clients)
+  - [ ] Python SDK (consumed by `learning_voice_ai_agent` desktop/backend)
+- [ ] Update products to use generated clients (reduces drift and hand-written HTTP code).
+
+### Phase 5 (Unify Identity/Billing Across Products) — 1-3 weeks (incremental)
+
+- [ ] Define the canonical identity model (users/orgs/roles/entitlements).
+- [ ] Move generic endpoints out of product backends into platform services:
+  - [ ] Auth/token issuance and verification
+  - [ ] License issuance/validation
+  - [ ] Usage aggregation + limits
+  - [ ] Notifications + audit
+- [ ] In product repos:
+  - [ ] Replace direct Cosmos access where it duplicates platform (when feasible)
+  - [ ] Keep product-only endpoints in a thin product service
+
+### Phase 6 (Developer Experience “Golden Paths”) — ongoing
+
+- [ ] Add templates:
+  - [ ] `create-bytelyst-service` scaffold (Fastify + packages)
+  - [ ] `create-bytelyst-dashboard` scaffold (Next + auth + client)
+- [ ] Standardize toolchain pinning across repos:
+  - [ ] Node (>=20.9), pnpm version, Python (>=3.12 for LysnrAI)
+  - [ ] one command per repo: `./quick-check.sh` or `make check`
+- [ ] Add CODEOWNERS + review rules for platform code.
+- [ ] Add feature flagging discipline (flags live in platform; products consume).
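
The toolchain pinning item in Phase 6 could be enforced by the "one command per repo" check with a small script. The following is a minimal sketch, assuming the minimums listed in that phase (Node >= 20.9, Python >= 3.12). The function names and the idea of feeding in the output of a command like `node --version` are illustrative choices, not something the repos are known to contain, and the pnpm pin is omitted because the roadmap does not state a version.

```python
import re
import sys
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first dotted numeric version, e.g. 'v20.11.1' -> (20, 11, 1)."""
    match = re.search(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found: str, minimum: str) -> bool:
    """Compare numerically, so 20.10 is newer than 20.9 and 20.9 equals 20.9.0."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad the shorter version with zeros before comparing component-wise.
    a_padded = a + (0,) * (width - len(a))
    b_padded = b + (0,) * (width - len(b))
    return a_padded >= b_padded


# Minimums taken from the roadmap text; the dictionary itself is hypothetical.
MINIMUMS = {"node": "20.9", "python": "3.12"}

# Check the interpreter running this script against the Python minimum.
python_found = ".".join(str(n) for n in sys.version_info[:3])
status = "ok" if meets_minimum(python_found, MINIMUMS["python"]) else "too old"
print(f"python {python_found}: {status}")
```

Comparing integer tuples rather than strings matters here: a plain string comparison would rank `20.10` below `20.9`.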
+
+---
+
+## Key Risks / Tradeoffs
+
+- **Multi-repo + shared libs without publishing** will keep causing fragile builds.
+- **Sharing code across languages** does not scale; share APIs/SDKs instead.
+- **Over-centralizing too early** can slow product iteration; migrate by “thin vertical slices”.
+
+---
+
+## What I Would Do First (If You Want Maximum ROI)
+
+1. Publish `@bytelyst/*` packages and remove `file:` cross-repo deps for dashboards.
+2. Make platform-service the canonical auth/flags/audit/notifications boundary and generate SDKs.
+3. Decide whether LysnrAI FastAPI backend is:
+   - product-only (dictation/transcripts), or
+   - a transitional legacy backend to be decomposed into platform services.
+