docs(codex): platform+product refactor roadmap (2026-02-14)

Saravana Achu Mac 2026-02-14 12:09:14 -08:00
parent d97c4ce4f0
commit 468a6c6072

@ -0,0 +1,286 @@
# Platform + Product Repo Refactor (ByteLyst) — 2026-02-14
## Why This Document Exists
You currently have 3 repos checked out side-by-side:
- `learning_ai_common_plat`: shared TypeScript packages (`@bytelyst/*`) + shared Fastify microservices + monitoring stack
- `learning_voice_ai_agent` (LysnrAI): Python desktop app + FastAPI backend + multiple Next.js dashboards, currently consuming platform code via sibling-repo references
- `learning_multimodal_memory_agents` (MindLyst): KMP (Android/iOS) + Next.js web app with many API routes (some directly using Cosmos DB)
The goal is to **systematically refactor into a “platform repo”** (common libraries + common services) while keeping **product-specific code in product repos**, with a workflow that feels like how high-performing AI companies build: small PRs, strong automation, stable internal interfaces, and “golden paths” for shipping.
Important constraint: we cannot know exactly how OpenAI/Anthropic run their internal engineering, but we *can* adopt the common patterns used by top-tier product+platform orgs: platform teams, strong CI gates, typed service contracts, SDK generation, trunk-based integration, feature flags, and opinionated templates.
---
## Current State (What You Already Have)
### Common repo (`learning_ai_common_plat`)
- Shared libraries already exist as a pnpm workspace:
  - `packages/*`: `api-client`, `auth`, `config`, `cosmos`, `errors`, `fastify-core`, `logger`, `react-auth`, `design-tokens`, `testing`
- Shared services already exist:
  - `services/*`: `platform-service` (auth/audit/flags/notifications/blob), `billing-service`, `growth-service`, `tracker-service`, `monitoring` (Loki/Grafana)
- This is already the “right direction”: platform primitives are centralized.
### Voice repo (`learning_voice_ai_agent`)
- A product repo that currently contains:
  - Product-specific clients: desktop app (Python), mobile app, dashboards (Next.js)
  - A **product backend** (FastAPI) that currently overlaps conceptually with platform-service/billing-service responsibilities (auth, notifications, usage, licenses, etc.)
- The dashboards reference local platform packages (`file:../../learning_ai_common_plat/...`), which is convenient for hacking but fragile in CI/CD and for teammates.
### MindLyst repo (`learning_multimodal_memory_agents`)
- Product-specific KMP architecture is clean (KMP shared module owns business logic).
- The Next.js web app contains many API routes and includes a Cosmos helper (`src/lib/cosmos.ts`) that directly uses Cosmos keys.
---
## Target End State
### 1) A True “Platform Repo” (common libs + common services)
`learning_ai_common_plat` becomes the only place you implement:
- Cross-cutting libraries:
  - Auth primitives (JWT + middleware), error taxonomy, config schemas, structured logging, telemetry, Cosmos adapters, API client utilities, design tokens
- Cross-product services (deployed once, used by all products):
  - Identity/auth, feature flags, audit log, notification delivery, billing/entitlements, growth/referrals, tracker/feedback
- Service contracts + SDKs:
  - OpenAPI specs (source of truth)
  - Generated TypeScript + Python SDK clients (and optionally Kotlin)
### 2) Product repos contain only product logic + UI + thin adapters
`learning_voice_ai_agent` and `learning_multimodal_memory_agents` keep:
- UI/clients and product UX
- Product-specific domain logic (e.g., dictation UX; MindLyst brains/triage UX)
- Minimal backend surface area:
  - Either none (UI talks to platform services)
  - Or a product backend that only owns what is truly product-specific, and delegates everything else to platform services
### 3) Consumption is “boring”
- Products do **not** depend on sibling repo paths.
- Products depend on **versioned** platform packages and/or **stable HTTP APIs**.
- Local development supports “edit platform + test in product” with one command.
---
## The Big Refactor: Move From “Shared Code via Sibling Paths” to “Platform Interfaces”
There are 2 kinds of reuse; treating them differently is the key.
### A) Library reuse (same language)
Use shared packages for TypeScript consumers:
- Next.js apps
- Fastify services
This is what `@bytelyst/*` already does well.
### B) Cross-language reuse (Python/Kotlin/TS)
Do **not** try to share source code.
Instead, share:
- Service APIs (OpenAPI)
- Generated SDK clients (TS/Python; optionally Kotlin)
- Shared operational standards (auth tokens, error format, correlation IDs)
This is the scalable move if you want a platform that works across products and languages.
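One way to make the shared operational standards concrete is a small header convention that every client applies the same way, whatever its language. The following is a minimal sketch, not the platform's actual convention: the `x-correlation-id` header name and the `buildPlatformHeaders` helper are assumptions introduced for illustration.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical header name; the platform's real convention may differ.
const CORRELATION_HEADER = "x-correlation-id";

/**
 * Build the headers a client would attach to every platform call.
 * The same two rules can be implemented identically in Python or Kotlin.
 */
export function buildPlatformHeaders(
  token: string,
  incomingCorrelationId?: string,
): Record<string, string> {
  return {
    authorization: `Bearer ${token}`,
    // Reuse the caller's ID when present so one request can be traced across services.
    [CORRELATION_HEADER]: incomingCorrelationId ?? randomUUID(),
  };
}
```

Because the rule is only "forward the ID if you received one, otherwise mint one", it is cheap to reimplement per language, which is the point of sharing standards rather than source code.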
---
## Concrete Recommendations (Best Refactoring Moves)
### Recommendation 1: Formalize “Platform APIs” as the primary integration point
Make the platform services the source of truth for:
- Identity/auth (tokens, sessions, roles)
- Billing + entitlements + licenses
- Audit log
- Feature flags / kill switches
- Notifications
Then:
- `learning_voice_ai_agent/backend` becomes either:
  - A thin product service (transcripts, dictation-specific workflows), or
  - Removed entirely if its endpoints are generic and can be served by platform services.
This reduces duplicated auth/license logic across products and removes “N ways to do identity”.
### Recommendation 2: Stop using `file:` dependencies across repos (except for a temporary dev mode)
Replace cross-repo `file:../../learning_ai_common_plat/...` with:
- Private package publishing (GitHub Packages or npm private registry), or
- A single monorepo (if you ever decide separation is hurting velocity)
In a multi-repo setup, publishing is usually the right move.
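To keep `file:` references from creeping back in after the switch, a product repo could run a small check in CI. This is a sketch under assumptions: the `findLocalPathDeps` helper is hypothetical, the example manifest is illustrative, and treating `link:` the same as `file:` is a choice made here, not something the repos currently do.

```typescript
// The package.json fields this check reads.
interface PackageManifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

/**
 * Return the names of dependencies that point at a local path.
 * `file:` and `link:` specifiers only resolve when a sibling checkout exists.
 */
export function findLocalPathDeps(manifest: PackageManifest): string[] {
  const all = { ...manifest.dependencies, ...manifest.devDependencies };
  return Object.entries(all)
    .filter(([, spec]) => spec.startsWith("file:") || spec.startsWith("link:"))
    .map(([name]) => name)
    .sort();
}

// Illustrative manifest: one sibling-path dependency, one versioned dependency.
const offenders = findLocalPathDeps({
  dependencies: {
    "@bytelyst/auth": "file:../../learning_ai_common_plat/packages/auth",
    "@bytelyst/logger": "^0.1.0",
  },
});
console.log(offenders); // only "@bytelyst/auth" is reported
```

A CI step would read the real `package.json`, call the function, and fail the build when the returned list is non-empty.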
### Recommendation 3: Add OpenAPI + client generation to the platform repo
Put OpenAPI specs under something like:
```
learning_ai_common_plat/contracts/
  platform-service.openapi.yaml
  billing-service.openapi.yaml
  growth-service.openapi.yaml
  tracker-service.openapi.yaml
```
Then generate:
- `packages/sdk-ts/*` (or `packages/api-client` becomes generated)
- `python/bytelyst_sdk/*` (published as an internal Python package)
Products consume SDKs, not raw HTTP calls.
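From the product's side, a generated SDK typically reduces to typed functions over a transport. The sketch below is hand-written to show that shape only; the `createBillingClient` factory, the `Entitlement` fields, and the `/v1/entitlements/{userId}` path are all hypothetical and would in practice come from the OpenAPI spec.

```typescript
// Hypothetical response shape; a real one would be generated from the spec.
interface Entitlement {
  userId: string;
  plan: string;
  active: boolean;
}

// Minimal transport type so the client can be exercised without a network.
type Transport = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

export function createBillingClient(baseUrl: string, token: string, transport: Transport) {
  return {
    /** Hypothetical endpoint: GET /v1/entitlements/{userId}. */
    async getEntitlement(userId: string): Promise<Entitlement> {
      const url = `${baseUrl}/v1/entitlements/${encodeURIComponent(userId)}`;
      const res = await transport(url, {
        headers: { authorization: `Bearer ${token}` },
      });
      if (!res.ok) {
        throw new Error(`billing-service responded with ${res.status}`);
      }
      // A generated client would validate or narrow this instead of casting.
      return (await res.json()) as Entitlement;
    },
  };
}
```

Product code then calls `client.getEntitlement(userId)` and never builds URLs or headers itself, which is what keeps hand-written HTTP code from drifting away from the contract. Injecting the transport also lets tests pass a fake instead of hitting a live service.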
### Recommendation 4: Standardize “golden path” service scaffolding
Add a generator/template for a new service that includes:
- `@bytelyst/config` env schema pattern
- `@bytelyst/errors` error mapping + RFC7807-ish response format
- structured logging (`@bytelyst/logger`)
- `/health`, `/metrics`, OpenAPI, request IDs, CORS defaults
- docker + compose wiring
This is one of the highest-leverage “how top orgs build” moves: it makes the right thing the easy thing.
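As an illustration of the error-mapping piece of such a template, the sketch below maps errors onto an RFC 7807 style body. The `AppError` class, the `urn:bytelyst:error:` prefix, and the `correlationId` extension field are assumptions made for the example; `@bytelyst/errors` may model these differently.

```typescript
// Core fields from RFC 7807 (problem details), plus one extension member.
interface ProblemDetails {
  type: string;
  title: string;
  status: number;
  detail?: string;
  correlationId?: string; // extension member; the name is an assumption
}

// Hypothetical application error carrying a stable code and an HTTP status.
class AppError extends Error {
  readonly code: string;
  readonly status: number;
  constructor(code: string, status: number, message: string) {
    super(message);
    this.code = code;
    this.status = status;
  }
}

export function toProblemDetails(err: unknown, correlationId?: string): ProblemDetails {
  if (err instanceof AppError) {
    return {
      type: `urn:bytelyst:error:${err.code}`, // illustrative URN scheme
      title: err.code,
      status: err.status,
      detail: err.message,
      correlationId,
    };
  }
  // Unknown errors get a generic body so internals never reach the caller.
  return { type: "about:blank", title: "Internal Server Error", status: 500, correlationId };
}
```

If every scaffolded service routes its error handler through one function like this, the TypeScript and Python SDKs can parse failures the same way regardless of which service produced them.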
### Recommendation 5: MindLyst: migrate away from direct Cosmos usage in Next API routes (optional but recommended)
Two options (pick one):
1. Keep Next API routes, but use `@bytelyst/cosmos` + platform-style config/errors/logging to reduce duplication.
2. Prefer a MindLyst product service (or platform “memory service”) so Cosmos keys live only in backend services, not in the Next runtime.
If MindLyst is going mobile-first with on-device SQLDelight, this can be phased; don't block product progress on it.
---
## “How Do I Use This Effectively?” (Developer Workflow)
### Workflow goal
- Most product work should not require editing platform code.
- When it does, the feedback loop should be < 10 minutes.
### Recommended workflow
1. Platform code changes happen in `learning_ai_common_plat`.
2. Platform releases are cut frequently (small increments).
3. Product repos update platform versions as part of regular maintenance (dependabot-like).
### For local dev (editing platform + product together)
Pick one of these patterns:
1. **Preferred (multi-repo): publish + install**
   - fast and consistent; requires registry + auth token setup
2. **Temporary dev mode: local tarballs**
   - `pnpm -C learning_ai_common_plat pack` then `npm i <tarball>` in product
3. **If you must: `file:` deps**
   - keep, but treat as dev-only and document it as such (breaks easily; forces build ordering)
---
## Roadmap With Task List (Implementation Steps)
### Phase 0 (Decisions + Guardrails) — 1-2 days
- [ ] Decide the integration contract per capability:
  - [ ] Cross-language: **API + SDK** (recommended)
  - [ ] TS-only: **shared package**
- [ ] Decide the package distribution method:
  - [ ] GitHub Packages (private npm scope) or private npm registry
  - [ ] Python package distribution (private index or GitHub release artifacts)
- [ ] Define “platform vs product” ownership boundaries:
  - [ ] What must be in platform (identity/billing/flags/notifications)
  - [ ] What remains product-only (dictation UX; MindLyst brains/triage UX)
### Phase 1 (Make Platform Libraries Release-Ready) — 2-5 days
- [ ] Add versioning/release tooling to `learning_ai_common_plat`:
  - [ ] Add Changesets (or equivalent) for `@bytelyst/*`
  - [ ] Enforce semver and changelogs
- [ ] Ensure every package has:
  - [ ] stable exports (`exports` field), `README.md`, minimal examples
  - [ ] tests for critical primitives (auth/config/errors/cosmos)
- [ ] Add CI workflows that run on every PR:
  - [ ] `pnpm -w lint`, `pnpm -w typecheck`, `pnpm -w test`, `pnpm -w build`
### Phase 2 (Publish Platform Packages) — 1-3 days
- [ ] Configure publishing for `@bytelyst/*`:
  - [ ] set up registry auth (`.npmrc` in CI only)
  - [ ] publish on merge to `main` with tags
- [ ] Publish initial versions (`0.1.x`) and document upgrade policy.
### Phase 3 (Decouple Product Repos From Sibling Paths) — 2-7 days
- [ ] In `learning_voice_ai_agent/*-dashboard-web`:
  - [ ] Replace `file:` dependencies with versioned `@bytelyst/*`
  - [ ] Remove any build hacks that assume `../../learning_ai_common_plat/packages/*/dist`
  - [ ] Ensure `npm ci && npm run build` works with no sibling checkout
- [ ] In MindLyst web:
  - [ ] (Optional) Replace local `src/lib/cosmos.ts` with `@bytelyst/cosmos` usage
### Phase 4 (Platform Contracts + SDK Generation) — 3-10 days
- [ ] Add OpenAPI specs for each platform service.
- [ ] Add generators:
  - [ ] TypeScript SDK (consumed by dashboards + future clients)
  - [ ] Python SDK (consumed by `learning_voice_ai_agent` desktop/backend)
- [ ] Update products to use generated clients (reduces drift and hand-written HTTP code).
### Phase 5 (Unify Identity/Billing Across Products) — 1-3 weeks (incremental)
- [ ] Define the canonical identity model (users/orgs/roles/entitlements).
- [ ] Move generic endpoints out of product backends into platform services:
  - [ ] Auth/token issuance and verification
  - [ ] License issuance/validation
  - [ ] Usage aggregation + limits
  - [ ] Notifications + audit
- [ ] In product repos:
  - [ ] Replace direct Cosmos access where it duplicates platform (when feasible)
  - [ ] Keep product-only endpoints in a thin product service
### Phase 6 (Developer Experience “Golden Paths”) — ongoing
- [ ] Add templates:
  - [ ] `create-bytelyst-service` scaffold (Fastify + packages)
  - [ ] `create-bytelyst-dashboard` scaffold (Next + auth + client)
- [ ] Standardize toolchain pinning across repos:
  - [ ] Node (>=20.9), pnpm version, Python (>=3.12 for LysnrAI)
  - [ ] one command per repo: `./quick-check.sh` or `make check`
- [ ] Add CODEOWNERS + review rules for platform code.
- [ ] Add feature flagging discipline (flags live in platform; products consume).
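
The flag discipline in the last item could look like this from a product's side: the platform owns the values, and the product only reads them with an explicit fallback. This is a minimal sketch; the `createFlagReader` helper, the snapshot shape, and the flag key are hypothetical, and how the snapshot is fetched from `platform-service` is left out.

```typescript
// Hypothetical snapshot of flag values as a product might receive it from the platform.
type FlagSnapshot = Readonly<Record<string, boolean>>;

export function createFlagReader(snapshot: FlagSnapshot | undefined) {
  return {
    /**
     * Return the platform's value for a flag. Falls back to the caller's
     * default when the flag is unknown or the snapshot failed to load.
     */
    isEnabled(key: string, defaultValue: boolean): boolean {
      const value = snapshot?.[key];
      return typeof value === "boolean" ? value : defaultValue;
    },
  };
}

// Illustrative usage; the flag key is made up for the example.
const flags = createFlagReader({ "voice.new-dictation-ui": true });
const showNewUi = flags.isEnabled("voice.new-dictation-ui", false);
const showBeta = flags.isEnabled("voice.unknown-flag", false);
```

Requiring a default at every call site means a platform outage degrades to known behavior instead of an exception, which matters most for kill switches.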
---
## Key Risks / Tradeoffs
- **Multi-repo + shared libs without publishing** will keep causing fragile builds.
- **Sharing code across languages** does not scale; share APIs/SDKs instead.
- **Over-centralizing too early** can slow product iteration; migrate by “thin vertical slices”.
---
## What I Would Do First (If You Want Maximum ROI)
1. Publish `@bytelyst/*` packages and remove `file:` cross-repo deps for dashboards.
2. Make platform-service the canonical auth/flags/audit/notifications boundary and generate SDKs.
3. Decide whether LysnrAI FastAPI backend is:
   - product-only (dictation/transcripts), or
   - a transitional legacy backend to be decomposed into platform services.