298 lines
16 KiB
Markdown
298 lines
16 KiB
Markdown
# Workspace Anti-Pattern Audit
|
||
|
||
> **Date:** 2026-02-28
|
||
> **Scope:** All 5 workspace repos (`learning_ai_common_plat`, `learning_voice_ai_agent`, `learning_multimodal_memory_agents`, `learning_ai_clock`, `learning_ai_fastgap`)
|
||
> **Method:** Automated grep/scan across all repos + manual review
|
||
|
||
---
|
||
|
||
## Summary
|
||
|
||
| Severity | Count | Category |
|
||
| ------------------------------- | ------ | -------------------------------------------------------- |
|
||
| **P0 — Security / Data Loss** | 4 | Auth gaps, secrets exposure, CORS wildcard |
|
||
| **P1 — Reliability / Crashes** | 6 | Missing error handling, no retries, no graceful shutdown |
|
||
| **P2 — Maintainability / Debt** | 8 | Code duplication, version mismatches, package divergence |
|
||
| **P3 — Operational / DX** | 6 | CI gaps, env sprawl, missing observability |
|
||
| **Total** | **24** | |
|
||
|
||
---
|
||
|
||
## P0 — Security / Data Loss
|
||
|
||
### 1. Admin API routes missing auth guards — CRITICAL
|
||
|
||
**28 of 53** admin-web API routes have **no auth check** at all. This includes sensitive endpoints:
|
||
|
||
| Route | Risk |
|
||
| -------------------------------------- | -------------------------------------------- |
|
||
| `/api/ops/secrets` (GET/POST) | **Lists and writes Azure Key Vault secrets** |
|
||
| `/api/ops/secrets/[name]` (GET/DELETE) | **Reads and deletes individual secrets** |
|
||
| `/api/telemetry/*` (7 routes) | Queries/mutates telemetry data |
|
||
| `/api/themes/*` (4 routes) | Modifies platform themes |
|
||
| `/api/tokens/*` (2 routes) | API token management |
|
||
| `/api/stripe/config` | Stripe configuration |
|
||
|
||
The secrets routes are the most critical — they interact directly with Azure Key Vault with **zero authentication**. Anyone who can reach the admin dashboard can read/write/delete all production secrets.
|
||
|
||
**Fix:** Add Next.js edge middleware (`middleware.ts`) that validates JWT on all `/api/*` routes except `/api/auth/login` and `/api/auth/forgot-password`. This is a single file, ~30 lines, that protects all routes uniformly.
|
||
|
||
### 2. User-dashboard API routes missing auth — HIGH
|
||
|
||
**31 of 36** user-dashboard API routes lack explicit auth checks. Includes:
|
||
|
||
- `/api/payments`, `/api/subscription` — billing operations
|
||
- `/api/sessions/*` — user session data
|
||
- `/api/stripe/portal`, `/api/stripe/config`
|
||
- `/api/transcripts` — user transcript data
|
||
- `/api/dashboard` — dashboard aggregations
|
||
|
||
**Fix:** Same middleware.ts pattern. Protect all `/api/*` except `/api/auth/*`.
|
||
|
||
### 3. CORS defaults to wildcard (`origin: true`) — MEDIUM
|
||
|
||
In `@bytelyst/fastify-core`, when `CORS_ORIGIN` env var is not set, CORS defaults to `origin: true` (allow all origins). In production, if someone forgets to set this variable, any website can make authenticated requests to platform-service.
|
||
|
||
```typescript
|
||
// packages/fastify-core/src/create-app.ts:34
|
||
const origin = corsOrigin ? corsOrigin.split(',').map(o => o.trim()) : true;
|
||
```
|
||
|
||
**Fix:** Default to `false` (deny all) when `CORS_ORIGIN` is unset. Require explicit opt-in.
|
||
|
||
### 4. No CSP / security headers on MindLyst-web and ChronoMind-web — MEDIUM
|
||
|
||
Admin-web, tracker-web, and user-dashboard all have security headers in `next.config.ts`. MindLyst-web and ChronoMind-web have **zero** security headers configured — no CSP, no X-Frame-Options, no HSTS.
|
||
|
||
**Fix:** Copy the security headers block from admin-web's `next.config.ts` to both apps.
|
||
|
||
---
|
||
|
||
## P1 — Reliability / Crashes
|
||
|
||
### 5. 21 API routes with no try/catch — crash on any DB/network error — HIGH
|
||
|
||
| Dashboard | Total Routes | Without try/catch | % Unprotected |
|
||
| -------------- | ------------ | ----------------- | ------------- |
|
||
| user-dashboard | 36 | 12 | 33% |
|
||
| mindlyst-web | 33 | 6 | 18% |
|
||
| admin-web | 53 | 3 | 6% |
|
||
|
||
Unprotected routes include payments, subscriptions, sessions, transcripts, SSO callbacks. Any Cosmos timeout or network blip returns an unhandled 500 with a stack trace (information leak + poor UX).
|
||
|
||
**Fix:** Wrap each handler body in try/catch, or create a shared `withErrorHandler()` HOF that all API routes use.
|
||
|
||
### 6. No retry / timeout / circuit breaker in `@bytelyst/api-client` — HIGH
|
||
|
||
The shared `createApiClient()` has **no timeout**, **no retry logic**, and **no circuit breaker**. Every consumer (6 dashboards + mobile apps) inherits this:
|
||
|
||
- A single Cosmos slowdown cascades to all dashboards
|
||
- Network blips cause immediate failures with no recovery
|
||
- No AbortController timeout — requests can hang indefinitely
|
||
|
||
**Fix:** Add `timeout` option (default 10s via AbortController), `retries` option (default 2 for GET, 0 for mutations), and exponential backoff.
|
||
|
||
### 7. No graceful shutdown in Fastify services — MEDIUM
|
||
|
||
`startService()` in `@bytelyst/fastify-core` calls `process.exit(1)` on startup failure but has **no SIGTERM/SIGINT handler**. In Docker/K8s, this means:
|
||
|
||
- In-flight requests are dropped on deploy
|
||
- Database connections not cleaned up
|
||
- Potential data corruption on writes
|
||
|
||
**Fix:** Add to `startService()`:
|
||
|
||
```typescript
|
||
for (const signal of ['SIGTERM', 'SIGINT']) {
|
||
process.on(signal, async () => {
|
||
app.log.info(`Received ${signal}, shutting down gracefully`);
|
||
await app.close();
|
||
process.exit(0);
|
||
});
|
||
}
|
||
```
|
||
|
||
### 8. No `error.tsx` in any Next.js app — MEDIUM
|
||
|
||
**Zero** of the 5 Next.js apps have an `error.tsx` file. When a React component throws during render, users see a blank white page (or the browser's default error). This is the #1 source of "the app is broken" reports.
|
||
|
||
**Fix:** Add `error.tsx` to each app's root `app/` directory — ~20 lines showing a "Something went wrong" UI with a retry button.
|
||
|
||
### 9. No `not-found.tsx` in 4 of 5 Next.js apps — LOW
|
||
|
||
Only MindLyst has a custom 404. The other 4 apps show Next.js's default 404 page.
|
||
|
||
**Fix:** Add `not-found.tsx` to each app.
|
||
|
||
### 10. Missing `loading.tsx` in 4 of 5 dashboards — LOW
|
||
|
||
Only admin-web has a `loading.tsx`. Other dashboards show no loading indicator during route transitions.
|
||
|
||
**Fix:** Add a skeleton loader `loading.tsx` to each app's layout group.
|
||
|
||
---
|
||
|
||
## P2 — Maintainability / Code Duplication
|
||
|
||
### 11. MindLyst-web uses raw `@azure/cosmos` v3 instead of `@bytelyst/cosmos` — HIGH
|
||
|
||
MindLyst-web has its own 86-line `cosmos.ts` with a hand-rolled Cosmos client using **v3.17.3** of the SDK. Every other dashboard uses `@bytelyst/cosmos` (which uses v4.x). This means:
|
||
|
||
- **Different API surface** (v3 vs v4 have breaking changes)
|
||
- **No container registry** (MindLyst manages containers ad-hoc)
|
||
- **Hardcoded `PRODUCT_ID = "mindlyst"`** instead of using `@bytelyst/config`
|
||
- Bug fixes to the shared package don't reach MindLyst
|
||
|
||
**Fix:** Migrate MindLyst-web to `@bytelyst/cosmos` + `@bytelyst/config`. Replace the 86-line file with ~40 lines matching user-dashboard pattern.
|
||
|
||
### 12. MindLyst billing-client uses raw fetch instead of `@bytelyst/api-client` — MEDIUM
|
||
|
||
MindLyst's `billing-client.ts` has its own `billingFetch()` wrapper with hardcoded headers, token management, and error handling. User-dashboard's `billing-client.ts` correctly uses `createApiClient()`.
|
||
|
||
**Fix:** Rewrite MindLyst's billing-client to use `@bytelyst/api-client` like every other dashboard.
|
||
|
||
### 13. Duplicate `feature-flags.ts` across repos — MEDIUM
|
||
|
||
`feature-flags.ts` is nearly identical in user-dashboard and MindLyst-web (only differs by `PRODUCT_ID` fallback). Both have their own raw `fetch()` calls.
|
||
|
||
**Fix:** Either add a `createFeatureFlagClient()` to `@bytelyst/api-client` or create a thin `@bytelyst/feature-flags` package.
|
||
|
||
### 14. 5 copies of `product-config.ts` with identical boilerplate — LOW
|
||
|
||
Every service and dashboard has its own `product-config.ts` that wraps `@bytelyst/config`. The files are 5-10 lines of identical code.
|
||
|
||
**Fix:** Consider making `@bytelyst/config` export a ready-to-use `PRODUCT_ID` constant (lazy-loaded) to eliminate the wrapper files.
|
||
|
||
### 15. 4 copies of `docker-prep.sh` across repos — LOW
|
||
|
||
Each consumer repo has its own `docker-prep.sh` script (22-45 lines each) for packing `@bytelyst/*` tarballs. They diverge in package lists and paths.
|
||
|
||
**Fix:** Move the canonical script to `learning_ai_common_plat/scripts/docker-prep.sh` and have consumer repos call it, or use a shared Makefile target.
|
||
|
||
### 16. Duplicate `error-boundary.tsx` in admin + user dashboards — LOW
|
||
|
||
Nearly identical class component (differs by 3 whitespace lines). Should be in a shared UI package.
|
||
|
||
**Fix:** Move to `@bytelyst/react-auth` (or create `@bytelyst/react-ui`) and re-export.
|
||
|
||
### 17. Zod v4 vs v3 conflict — ChronoMind uses Zod 4 — MEDIUM
|
||
|
||
ChronoMind-web uses `zod: "^4.3.6"` while **every other package and service** uses Zod 3.x. The `@bytelyst/*` packages (config, events) all depend on Zod 3. This means:
|
||
|
||
- ChronoMind cannot use `@bytelyst/config` or any Zod-dependent shared package without runtime conflicts
|
||
- Schema types are incompatible between Zod 3 and Zod 4
|
||
|
||
**Fix:** Either downgrade ChronoMind to Zod 3 to match ecosystem, or upgrade the entire ecosystem to Zod 4 (breaking change for all services).
|
||
|
||
### 18. TypeScript version skew — MINOR
|
||
|
||
| Range | Repos |
|
||
| --------------------------- | ------------------------------------------------------ |
|
||
| `^5` (loose) | admin-web, tracker-web, user-dashboard, chronomind-web |
|
||
| `^5.7.0` – `^5.7.3` | common-plat root, services |
|
||
| `~5.9.2` – `5.9.3` (pinned) | NomGap, MindLyst-web |
|
||
|
||
MindLyst pins `5.9.3` (exact) while NomGap uses `~5.9.2`. The common-plat root uses `^5.7.0`. This can cause type-checking discrepancies.
|
||
|
||
**Fix:** Standardize all repos to `^5.9.0` in a coordinated PR.
|
||
|
||
---
|
||
|
||
## P3 — Operational / DX
|
||
|
||
### 19. 10 disabled CI workflows — no automated quality gate — HIGH
|
||
|
||
| Repo | Disabled Workflows |
|
||
| --------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
|
||
| learning_voice_ai_agent | 7 (ci.yml, ci-python-backend, ci-admin-dashboard, ci-user-dashboard, ci-tracker-dashboard, churn-alert, release) |
|
||
| learning_ai_common_plat | 2 (ci.yml, trigger-consumers) |
|
||
| learning_multimodal_memory_agents | 1 (ci.yml) |
|
||
|
||
Only ChronoMind and NomGap have active CI. The **three largest repos** have zero automated CI on push/PR. Regressions go undetected until manual testing.
|
||
|
||
**Fix:** Re-enable CI workflows. Even a minimal `typecheck + test` workflow on PR catches most regressions.
|
||
|
||
### 20. Zero `x-request-id` propagation in dashboard API routes — MEDIUM
|
||
|
||
**All 122 dashboard API routes** (53 admin + 36 user + 33 mindlyst) lack `x-request-id` propagation. When a dashboard API route calls platform-service, there's no way to correlate the request across services in logs.
|
||
|
||
**Fix:** Add a shared middleware or utility that auto-generates and forwards `x-request-id` from incoming request to all outgoing `fetch()` / `createApiClient()` calls.
|
||
|
||
### 21. 80+ unique env vars with no central registry — MEDIUM
|
||
|
||
Across all services and dashboards, there are **80+ unique `process.env.*` references**. There's no single document listing which vars each app needs, their valid values, and which are required vs optional.
|
||
|
||
**Fix:** Create an `ENV_REGISTRY.md` in common-plat docs, auto-generated by scanning all repos. Each entry: var name, required/optional, which apps use it, description.
|
||
|
||
### 22. No `middleware.ts` in any Next.js app — MEDIUM
|
||
|
||
None of the 5 Next.js apps have a `middleware.ts` file. This means:
|
||
|
||
- No edge-level auth protection (each API route must check auth individually — and most don't)
|
||
- No redirect logic for unauthenticated users
|
||
- No request logging at the edge
|
||
|
||
**Fix:** Add `middleware.ts` to admin-web, user-dashboard, and mindlyst-web. Tracker-web may not need it if it's mostly public.
|
||
|
||
### 23. No `instrumentation.ts` in ChronoMind-web — LOW
|
||
|
||
All other Next.js apps have `instrumentation.ts` for AKV secret resolution at startup. ChronoMind-web is missing it — secrets won't resolve from Key Vault.
|
||
|
||
**Fix:** Add `instrumentation.ts` following the pattern from user-dashboard-web.
|
||
|
||
### 24. Package manager split: pnpm (common-plat) vs npm (all consumers) — INFO
|
||
|
||
Common-plat uses pnpm workspace. All 4 consumer repos use npm with `package-lock.json`. This isn't a bug but creates friction:
|
||
|
||
- Contributors must know which tool to use where
|
||
- `file:` refs from npm to pnpm workspace packages require `pnpm build` first
|
||
- Lock file formats differ
|
||
|
||
**Recommendation:** Document this clearly. Long-term, consider migrating consumers to pnpm or publishing `@bytelyst/*` to a private registry.
|
||
|
||
---
|
||
|
||
## Priority Action Plan
|
||
|
||
### Sprint 1 — Security (1-2 days)
|
||
|
||
1. Add `middleware.ts` to admin-web (blocks unauthenticated access to secrets, telemetry, themes, tokens)
|
||
2. Add `middleware.ts` to user-dashboard (blocks unauthenticated access to payments, sessions, transcripts)
|
||
3. Fix CORS default to deny-all when `CORS_ORIGIN` is unset
|
||
4. Add security headers to MindLyst-web and ChronoMind-web `next.config.ts`
|
||
|
||
### Sprint 2 — Reliability (2-3 days)
|
||
|
||
5. Add `error.tsx` + `not-found.tsx` to all 5 Next.js apps
|
||
6. Add try/catch to all 21 unprotected API routes (or create shared error handler HOF)
|
||
7. Add timeout + retry to `@bytelyst/api-client`
|
||
8. Add graceful shutdown to `@bytelyst/fastify-core`
|
||
|
||
### Sprint 3 — Deduplication (2-3 days)
|
||
|
||
9. Migrate MindLyst-web from raw `@azure/cosmos` v3 → `@bytelyst/cosmos` v4
|
||
10. Migrate MindLyst billing-client to `@bytelyst/api-client`
|
||
11. Consolidate `feature-flags.ts` into shared package
|
||
12. Resolve Zod v3/v4 conflict (ChronoMind)
|
||
|
||
### Sprint 4 — Ops & DX (1-2 days)
|
||
|
||
13. Re-enable CI on the 3 largest repos (even minimal typecheck + test)
|
||
14. Add `x-request-id` propagation to dashboard API layer
|
||
15. Standardize TypeScript version across all repos
|
||
16. Create `ENV_REGISTRY.md` with all env vars documented
|
||
|
||
---
|
||
|
||
## Items Confirmed Correct (Not Anti-Patterns)
|
||
|
||
| # | Item | Status |
|
||
| --- | ---------------------------------------------------------------------------------------- | ----------------------------------------------- |
|
||
| A | Tracker-web has no direct Cosmos access — uses `@bytelyst/api-client` → platform-service | Correct by design |
|
||
| B | MindLyst native (KMP) has no Azure wiring — all Azure goes through web API routes | Correct by design |
|
||
| C | ChronoMind/NomGap have no direct Azure SDK usage — REST API only | Correct by design |
|
||
| D | `console.log` in `@bytelyst/logger` | Intentional (it IS the logger) |
|
||
| E | `console.log` in `design-tokens/generate.ts` | Build script, not production code |
|
||
| F | `print()` in Python `cli_output.py` | Intentional CLI output (has `noqa` comment) |
|
||
| G | `as any` in `api-client/client.ts:44` | Single occurrence, casting Headers — acceptable |
|