NoteLett Production Readiness Handoff Roadmap

Date: May 5, 2026
Repo: learning_ai_notes
Common platform repo: ../learning_ai/learning_ai_common_plat
Status: Ready for incremental implementation

Purpose

Use this document as the source of truth for finishing NoteLett end to end and making it production ready. It reconciles the existing product roadmaps, current code, and the reusable capabilities in learning_ai_common_plat.

The implementation rule is simple: complete one checklist item or one small cluster at a time, run the stated verification, commit with the repo convention, push, and record the commit hash in this document before moving on.

Current Baseline

This baseline is from repo inspection on May 5, 2026.

| Area | Current state |
| --- | --- |
| Backend | Fastify 5 product backend with notes, workspaces, relationships, tasks, artifacts, agent actions, saved views, prompts, intake, collaborators, shares, versions, Palace integration, MCP tools, Cosmos registration |
| Web | Next.js 16 App Router with auth routes, dashboard, workspaces, search, reviews, note detail, prompts/intake/settings, shared clients, E2E specs |
| Mobile | Expo app with auth, tabs, note/capture/intake/prompt flows, stores, shared platform clients, MMKV persistence |
| Tests present | 76 local test/spec files: 42 backend, 24 web (including 9 E2E specs), 10 mobile |
| DevOps present | backend/Dockerfile, web/Dockerfile, docker-compose.yml, GitHub Actions CI, docker prep script |
| Common platform packages present | backend-config, backend-flags, backend-telemetry, fastify-auth, field-encrypt, palace, blob-client, extraction, dashboard-components, react-native-platform-sdk, sync, and other @bytelyst/* packages |

Critical Observations

  • Several older docs are stale. For example, docs/AGENT_TASK_ROADMAP.md still says common-platform backend package sources are missing, but those package directories now exist in learning_ai_common_plat/packages/.
  • docs/ROADMAP.md, AGENTS.md, and older gap docs under docs/ disagree on implemented status, test counts, endpoints, and platform integration maturity.
  • Web has many hardcoded hex/RGBA colors and fallback values in app/component styles. This conflicts with the project rule that colors come from --nl-* tokens and @bytelyst/design-tokens.
  • Mobile still has hardcoded colors in a few screens, especially provider/intake labels and button text. These should move to NoteLettTheme or shared token-derived mappings.
  • Backend startup still writes Cosmos init messages through process.stdout / process.stderr; production paths should use app.log or shared logger patterns.
  • Backend config has development-friendly defaults (DB_PROVIDER=memory, default JWT secret, telemetry/flags off). Production must fail closed unless explicitly configured.
  • Web default MCP URL uses http://localhost:4050/mcp, while common platform documents mcp-server on port 4007. Align URL defaults and examples.
  • CI currently covers backend and web lint/typecheck/test/build, and mobile typecheck. It should also run mobile tests, web E2E where practical, Docker builds, and shared platform smoke checks.
  • The repo uses many common platform packages already, but production readiness should verify runtime behavior against platform-service, extraction-service, blob, telemetry, diagnostics, flags, kill switch, and MCP rather than only checking dependencies.

Non-Negotiable Implementation Rules

  • Do not edit .npmrc directly. Sync from ../learning_ai_common_plat/scripts/npmrc.template if needed.
  • Do not move NoteLett domain logic into common platform unless another product has a concrete reuse need.
  • Do not add repo-local substitutes for platform concerns already covered by @bytelyst/* packages or platform-service.
  • Do not hardcode colors in web or mobile product code. Use --nl-* CSS variables, @bytelyst/design-tokens, or NoteLettTheme.
  • Do not use console.log or raw stdout/stderr in production backend code. Use req.log, app.log, or shared logger wrappers.
  • Do not change tests just to pass. Fix the implementation.
  • Every production Cosmos document must include productId: PRODUCT_ID.
  • Keep every implementation commit small enough to review independently.
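
The productId rule above is easiest to enforce at a single write boundary. The following is a minimal sketch of that idea, not the repo's actual helper: the `withProductId` name, the `CosmosDocument` type, and the `"notelett"` value for `PRODUCT_ID` are all assumptions made for illustration.

```typescript
// Hypothetical value: the real PRODUCT_ID constant lives in the NoteLett codebase.
const PRODUCT_ID = "notelett";

// Any document shape that is about to be written to Cosmos (assumed shape).
type CosmosDocument = { id: string } & Record<string, unknown>;

// Returns a copy of the document that always carries the product scope.
// A caller-supplied productId that disagrees with PRODUCT_ID is rejected
// rather than silently overwritten, so cross-product writes surface early.
function withProductId<T extends CosmosDocument>(doc: T): T & { productId: string } {
  const existing = doc.productId;
  if (existing !== undefined && existing !== PRODUCT_ID) {
    throw new Error(`Refusing to write document ${doc.id} with productId ${String(existing)}`);
  }
  return { ...doc, productId: PRODUCT_ID };
}
```

Calling `withProductId({ id: "n1", title: "Hello" })` returns the same fields plus `productId`. Routing repository writes through one function like this would also give repository tests a single place to assert scope isolation.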

Handoff Prompt For Codex Desktop

Use this one-liner to start implementation:

In /Users/saravana/BytelystAI/learning_ai_notes, implement docs/PRODUCTION_READINESS_HANDOFF_ROADMAP.md one checklist item at a time. For each item: inspect current code, make the minimal production-quality change, run the listed verification, commit with type(scope): description, push, then update the roadmap row with the commit hash and any verification notes before moving to the next item. Fully leverage /Users/saravana/BytelystAI/learning_ai/learning_ai_common_plat for platform concerns.

Use this shorter resume prompt after interruptions:

Resume docs/PRODUCTION_READINESS_HANDOFF_ROADMAP.md from the first unchecked item, preserve prior work, commit/push each completed item, and record the commit hash plus verification notes in the roadmap.

Commit Tracking Format

After each completed item, update that checklist row:

- [x] **P1.1** Task title — Commit: `abc1234`; Verified: `pnpm --filter ...`

If an item is intentionally deferred:

- [ ] **P1.1** Task title — Deferred: reason, owner, date
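
Because both row formats are regular, the resume point ("first unchecked item") can be found mechanically. This is a hedged sketch under the assumption that rows follow the two formats shown above exactly; `parseChecklistRow` and `firstOpenItem` are hypothetical helper names, not existing repo scripts.

```typescript
type ChecklistRow = {
  id: string;            // e.g. "P1.1"
  done: boolean;         // true when the box is [x]
  deferred: boolean;     // true when the row carries a "Deferred:" note
  commit: string | null; // short or full hash recorded after "Commit:"
};

// Matches "- [x] **P1.1** ..." and "- [ ] **P1.1** ...".
const ROW_PATTERN = /^- \[( |x)\] \*\*(P\d+\.\d+)\*\* (.*)$/;

function parseChecklistRow(line: string): ChecklistRow | null {
  const match = ROW_PATTERN.exec(line.trim());
  if (!match) return null;
  const [, box, id, rest] = match;
  const commitMatch = /Commit: `([0-9a-f]{7,40})`/.exec(rest);
  return {
    id,
    done: box === "x",
    // \u2014 is the dash separator used in the tracking format.
    deferred: /\u2014 Deferred: \S/.test(rest),
    commit: commitMatch ? commitMatch[1] : null,
  };
}

// First row that is neither checked nor explicitly deferred: the resume point.
function firstOpenItem(markdown: string): string | null {
  for (const line of markdown.split("\n")) {
    const row = parseChecklistRow(line);
    if (row && !row.done && !row.deferred) return row.id;
  }
  return null;
}
```

A check like this could also back the Definition Of Done: a completed row without a commit hash is a row where `done` is true and `commit` is null.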

Phase P0 — Verify Baseline Before Changes

Goal: establish a trustworthy starting point and avoid implementing against stale assumptions.

  • P0.1 Run git status --short and confirm the worktree is clean or identify user-owned changes. Commit: fa30191; Verified: git status --short --branch returned ## main...origin/main with no modified or untracked files before production-readiness edits began.
  • P0.2 Run pnpm install --frozen-lockfile if dependencies are not installed. Commit: pending; Verified: repaired stale common-platform workspace path from ../learning_ai_common_plat/packages/* to ../learning_ai/learning_ai_common_plat/packages/*, refreshed pnpm-lock.yaml against the local common-platform checkout, then GITEA_NPM_TOKEN=dummy pnpm install --frozen-lockfile passed. Notes: pnpm reported existing peer/build-script warnings for common-platform Azure Cosmos/Expo packages; these are baseline observations for later hardening.
  • P0.3 Run baseline checks: pnpm run typecheck, pnpm run test, and pnpm run build. Commit:
  • P0.4 Run targeted lint checks: pnpm --filter @notelett/backend run lint, pnpm --filter @notelett/web run lint, and pnpm --filter @notelett/mobile run lint. Commit:
  • P0.5 Record any baseline failures in this document before fixing them. Commit:

Acceptance criteria:

  • The next implementer knows exactly what passed before production-readiness edits began.
  • Any baseline failure has a named owner task in a later phase.

Phase P1 — Documentation Truth Alignment

Goal: remove roadmap drift so agents do not chase already completed or obsolete work.

  • P1.1 Reconcile docs/ROADMAP.md with actual implemented status for product identity, ports, CRUD, tests, Docker, CI, MCP, Smart Actions, intake, Palace, and platform integrations. Commit:
  • P1.2 Update AGENTS.md with current module list, endpoint list, test counts, route modules, containers, shared packages, and verification commands. Commit:
  • P1.3 Update README.md quick start with required common-platform services, auth expectations, Docker path, local memory mode, and production env requirements. Commit:
  • P1.4 Mark stale docs/GAP_ANALYSIS.md, docs/AGENT_TASK_ROADMAP.md, and older reuse roadmaps as historical or reconcile their open items with this roadmap. Commit:
  • P1.5 Add a concise architecture boundary section: product-local NoteLett logic versus common platform responsibilities. Commit:

Acceptance criteria:

  • No active roadmap says common-platform packages are missing when they exist.
  • AGENTS.md, README.md, and docs/ROADMAP.md agree on current state.
  • This document remains the active production-readiness checklist.

Phase P2 — Common Platform Runtime Alignment

Goal: prove that NoteLett is using common platform services and packages at runtime, not just declaring dependencies.

  • P2.1 Align MCP URL defaults and examples across web/src/lib/product-config.ts, web/.env.example, settings copy, Docker env, and docs. Use common platform mcp-server port 4007 unless the shared server exposes a different production route. Commit:
  • P2.2 Add or update platform smoke documentation for platform-service, extraction-service, mcp-server, telemetry, diagnostics, flags, kill switch, blob, and NoteLett backend health. Commit:
  • P2.3 Add a local smoke script or extend existing scripts to check GET /health, GET /api/bootstrap, platform dependencies, and one authenticated product-backend flow in memory mode. Prefer reusing common platform smoke/self-test conventions. Commit:
  • P2.4 Verify web and mobile shared clients propagate product identity, auth token, and request IDs where supported by common platform clients. Add tests where behavior is local. Commit:
  • P2.5 Decide whether mobile should adopt @bytelyst/react-native-platform-sdk now or explicitly defer it. If adopted, replace redundant local composition; if deferred, document why direct clients are still preferred. Commit:
  • P2.6 Decide whether @bytelyst/sync should back mobile offline/sync behavior now or be deferred. Record the decision and update implementation accordingly. Commit:
  • P2.7 Evaluate backend LLM usage against @bytelyst/llm-router. Either adopt the router for provider/model/fallback governance or document why @bytelyst/llm remains sufficient for NoteLett release 1. Commit:

Acceptance criteria:

  • Platform-service, extraction-service, blob, telemetry, diagnostics, flags, kill switch, and MCP are documented with real smoke steps.
  • Shared package usage has no unnecessary product-local duplicate for platform concerns.
  • Deferrals are explicit, not implied by absence.
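
For P2.1, one way to keep the MCP default from drifting is to resolve it in a single function that every surface imports. The sketch below assumes an environment variable named `NEXT_PUBLIC_MCP_URL` (hypothetical) and reuses the `/mcp` path from the current web default; only port 4007 comes from this roadmap.

```typescript
// Port 4007 is the common platform mcp-server port named in this roadmap;
// the "/mcp" path mirrors the existing web default.
const DEFAULT_MCP_URL = "http://localhost:4007/mcp";

// Hypothetical env var name, used only for this sketch.
const MCP_URL_ENV = "NEXT_PUBLIC_MCP_URL";

type Env = Record<string, string | undefined>;

// Returns the explicit override when present and well formed, else the default.
function resolveMcpUrl(env: Env): string {
  const raw = (env[MCP_URL_ENV] ?? "").trim();
  if (raw === "") return DEFAULT_MCP_URL;
  // Fail loudly on a malformed override instead of falling back silently.
  if (!/^https?:\/\/\S+$/.test(raw)) {
    throw new Error(`Invalid MCP URL override: ${raw}`);
  }
  return raw;
}
```

Passing the environment as an argument keeps the function testable without touching real process state.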

Phase P3 — Security And Production Config Hardening

Goal: production startup fails closed and never silently runs with development-only security settings.

  • P3.1 Change backend config validation so production cannot use default JWT_SECRET, memory DB, disabled encryption by accident, or missing Cosmos credentials. Keep test/dev ergonomics intact. Commit:
  • P3.2 Add tests for production config validation: missing secret, default secret, memory DB in production, missing Cosmos env, encryption provider requirements. Commit:
  • P3.3 Review all unauthenticated backend endpoints. Keep /health, /api/bootstrap, and public share reads intentional; protect diagnostics or make them explicitly dev/admin-gated. Commit:
  • P3.4 Add or verify rate limiting and abuse controls for public share, auth-facing, prompt, intake, and LLM-backed endpoints using shared platform patterns where available. Commit:
  • P3.5 Verify agent write paths enforce role, product scope, workspace membership, idempotency, dry-run, and audit trail behavior. Add missing tests. Commit:
  • P3.6 Verify field encryption coverage for note body, sensitive artifact metadata, prompt content if needed, and agent action details. Add migration notes if fields are newly encrypted. Commit:

Acceptance criteria:

  • A production container cannot start with unsafe defaults.
  • Public and diagnostic surfaces are intentionally scoped.
  • Agent and AI write paths are auditable and permissioned.
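
The fail-closed behavior in P3.1 and P3.2 can be sketched as a pure validation function that collects every problem before startup aborts. `JWT_SECRET` and `DB_PROVIDER=memory` come from this roadmap; the `cosmos` provider value, the `COSMOS_ENDPOINT` and `COSMOS_KEY` names, and the default secret string are assumptions for illustration only.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical placeholder: the real development default lives in backend config.
const DEFAULT_JWT_SECRET = "dev-secret-change-me";

// Returns a list of problems; an empty list means the config may start.
function validateProductionConfig(env: Env): string[] {
  // Dev and test keep their ergonomic defaults.
  if (env["NODE_ENV"] !== "production") return [];
  const problems: string[] = [];

  const secret = env["JWT_SECRET"];
  if (!secret) {
    problems.push("JWT_SECRET is required in production");
  } else if (secret === DEFAULT_JWT_SECRET) {
    problems.push("JWT_SECRET must not be the development default");
  }

  const provider = env["DB_PROVIDER"] ?? "memory";
  if (provider === "memory") {
    problems.push("DB_PROVIDER=memory is not allowed in production");
  }
  if (provider === "cosmos") {
    // Assumed variable names; substitute the repo's real Cosmos settings.
    for (const key of ["COSMOS_ENDPOINT", "COSMOS_KEY"]) {
      if (!env[key]) problems.push(`${key} is required when DB_PROVIDER=cosmos`);
    }
  }
  return problems;
}

// Throws once with every problem listed, so operators fix them in one pass.
function assertProductionConfig(env: Env): void {
  const problems = validateProductionConfig(env);
  if (problems.length > 0) {
    throw new Error(`Unsafe production config:\n- ${problems.join("\n- ")}`);
  }
}
```

Returning a list rather than throwing on the first failure maps directly onto the P3.2 test cases: each unsafe default becomes one assertion on the returned array. Encryption provider checks would follow the same pattern.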

Phase P4 — Design System And Accessibility Compliance

Goal: remove local visual drift and meet interaction accessibility expectations.

  • P4.1 Replace web hardcoded hex/RGBA colors and token fallbacks in web/src/app and web/src/components with --nl-* tokens or shared component primitives. Commit:
  • P4.2 Replace mobile hardcoded colors in screens and label mappings with NoteLettTheme, token-derived values, or named semantic mappings. Commit:
  • P4.3 Audit web interactive elements for visible text or aria-label, including icon buttons, modals, editor controls, review actions, prompt actions, and settings. Commit:
  • P4.4 Audit mobile pressables/buttons for accessibility labels, roles, disabled states, and Dynamic Type tolerance. Commit:
  • P4.5 Replace custom web buttons/cards/badges with @bytelyst/ui or @bytelyst/dashboard-components where the shared component is appropriate. Keep notes editor-specific UI local. Commit:
  • P4.6 Run visual checks for desktop and mobile web widths. Fix overlapping text, unstable toolbar dimensions, nested cards, or clipped controls. Commit:

Acceptance criteria:

  • rg "#[0-9a-fA-F]{3,8}|rgba?\\(" web/src mobile/src --glob '!**/*.test.*' has only documented exceptions.
  • Interactive controls are accessible.
  • Shared UI packages are used where sensible without making NoteLett-specific editor UX generic.
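
The same pattern used in the rg audit can run inside a unit test or CI script. This is a minimal sketch that scans a string of source text; the `findHardcodedColors` name and the `ColorFinding` shape are hypothetical, and file discovery and the documented-exceptions list are deliberately left out.

```typescript
type ColorFinding = { line: number; match: string };

// Same pattern as the rg audit: hex literals or rgb()/rgba() calls.
const HARDCODED_COLOR = /#[0-9a-fA-F]{3,8}|rgba?\(/g;

// Returns one finding per match, with 1-based line numbers.
function findHardcodedColors(source: string): ColorFinding[] {
  const findings: ColorFinding[] = [];
  source.split("\n").forEach((text, index) => {
    // With the g flag, String.match returns every match on the line (or null).
    const matches = text.match(HARDCODED_COLOR) ?? [];
    for (const match of matches) {
      findings.push({ line: index + 1, match });
    }
  });
  return findings;
}
```

Like the rg command, this pattern can flag non-color strings that happen to look like hex (for example an anchor such as `#add`), which is why the acceptance criterion allows documented exceptions.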

Phase P5 — Backend Reliability And Observability

Goal: backend behavior is production observable, testable, and consistent with common platform service conventions.

  • P5.1 Replace raw process.stdout.write / process.stderr.write in backend startup paths with app logger or shared logger-compatible handling. Commit:
  • P5.2 Ensure request IDs are propagated to extraction-service, platform-service, blob, LLM, webhook, and MCP calls where supported. Add tests for local propagation helpers. Commit:
  • P5.3 Add health/dependency readiness coverage for datastore, encryption, platform-service, extraction-service, and MCP. Prefer @bytelyst/monitoring patterns if suitable. Commit:
  • P5.4 Add structured error mapping tests for validation, auth, forbidden, not found, conflict, extraction failure, LLM timeout, and blob failure paths. Commit:
  • P5.5 Review Cosmos partition keys and query patterns for current containers; document any cross-partition or count-heavy operations and add repository tests for scope isolation. Commit:
  • P5.6 Verify webhook/scheduler loops shut down cleanly on Fastify close and do not start unexpectedly in tests. Commit:

Acceptance criteria:

  • Backend production logs are structured.
  • Readiness tells operators whether dependencies are healthy.
  • Failure modes are covered by tests and return stable errors.
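
For P5.4, "stable errors" can be read as a fixed table from failure kind to response shape. The sketch below is one possible table, not the platform's actual contract: the status codes, the `code` strings, and the `retryable` flag are assumptions, and the shared error package may already define equivalents that should be preferred.

```typescript
// Failure kinds taken from the P5.4 list.
type FailureKind =
  | "validation"
  | "auth"
  | "forbidden"
  | "not_found"
  | "conflict"
  | "extraction_failure"
  | "llm_timeout"
  | "blob_failure";

type ErrorShape = { statusCode: number; code: string; retryable: boolean };

// Assumed mapping; Record<FailureKind, ...> makes a missing kind a compile error.
const ERROR_MAP: Record<FailureKind, ErrorShape> = {
  validation: { statusCode: 400, code: "VALIDATION_ERROR", retryable: false },
  auth: { statusCode: 401, code: "UNAUTHENTICATED", retryable: false },
  forbidden: { statusCode: 403, code: "FORBIDDEN", retryable: false },
  not_found: { statusCode: 404, code: "NOT_FOUND", retryable: false },
  conflict: { statusCode: 409, code: "CONFLICT", retryable: false },
  extraction_failure: { statusCode: 502, code: "EXTRACTION_FAILED", retryable: true },
  llm_timeout: { statusCode: 504, code: "LLM_TIMEOUT", retryable: true },
  blob_failure: { statusCode: 502, code: "BLOB_FAILED", retryable: true },
};

// Attaches the request ID so operators can correlate the response with logs.
function toErrorResponse(kind: FailureKind, requestId: string): ErrorShape & { requestId: string } {
  return { ...ERROR_MAP[kind], requestId };
}
```

A table-driven mapping keeps the tests short: one loop over the kinds can assert that every failure path returns its documented status and code.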

Phase P6 — Web Product Completeness And Robustness

Goal: the web app is the production-grade primary NoteLett surface.

  • P6.1 Verify auth middleware and client auth state against real platform-service flows, including expired token, refresh failure, logout, and unauthenticated redirects. Commit:
  • P6.2 Verify all mutation flows use offline queue or clear retry UX where appropriate: note create/update/archive/restore, workspace CRUD, link note, task/artifact creation, prompts, intake, reviews. Commit:
  • P6.3 Add user-facing error states and empty states for backend down, platform down, extraction down, blob upload failure, and feature disabled. Commit:
  • P6.4 Complete import/export readiness: JSON and Markdown export, deterministic metadata, auth checks, and web download UX. If import is deferred, document scope and acceptance criteria. Commit:
  • P6.5 Verify note sharing/collaboration flows: public share safety, workspace collaborator access, revocation, and UI copy. Commit:
  • P6.6 Add Playwright coverage for create note, edit note, archive/restore, link note, review approve/reject, prompt run, intake URL, settings smoke, and public share. Commit:
  • P6.7 Add web build analysis for bundle regressions and remove unused route/runtime code. Commit:

Acceptance criteria:

  • Main web workflows work against real backend clients.
  • Broken dependencies produce useful UI states.
  • Playwright covers the release-critical journeys.
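
"Deterministic metadata" in P6.4 implies that exporting the same note twice yields byte-identical output. A minimal sketch of one way to get there for JSON export is to serialize with sorted keys; `stableStringify` is a hypothetical helper and the note fields in the usage line are illustrative, not the real NoteLett schema.

```typescript
// Serializes a JSON-compatible value with object keys in sorted order,
// so the output does not depend on property insertion order.
function stableStringify(value: unknown): string {
  if (value === undefined) return "null"; // mirrors JSON.stringify inside arrays
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const entries = Object.keys(record)
      .filter((key) => record[key] !== undefined) // JSON drops undefined properties
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(record[key])}`);
    return `{${entries.join(",")}}`;
  }
  // Strings, numbers, booleans, and null.
  return JSON.stringify(value);
}
```

For example, `stableStringify({ title: "A", id: "n1" })` and `stableStringify({ id: "n1", title: "A" })` produce the same string, which makes export snapshots and checksums comparable across runs.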

Phase P7 — Mobile Product Completeness And Robustness

Goal: mobile is a reliable companion app, not a scaffolded surface.

  • P7.1 Verify mobile auth against platform-service: login, register, token persistence, refresh/logout, auth-gated store hydration, and offline startup. Commit:
  • P7.2 Add or strengthen mobile tests for note list/detail/edit, capture, workspace selection, inbox approvals, intake, prompt result, settings, feedback, broadcast/survey, and offline queue. Commit:
  • P7.3 Verify mobile blob upload, share intent intake, and prompt/intake flows use shared clients and handle offline/failure states. Commit:
  • P7.4 Verify mobile feature flags, kill switch, diagnostics, telemetry, and app metadata are initialized and flushed at lifecycle boundaries. Commit:
  • P7.5 Add Expo production build notes and smoke checklist for iOS/Android simulators or devices. Commit:

Acceptance criteria:

  • Mobile has meaningful tests for release-critical flows.
  • Shared platform features are initialized intentionally.
  • Production build and smoke steps are documented.

Phase P8 — CI, Docker, And Release Automation

Goal: the production-readiness checks run in automation, not only locally.

  • P8.1 Extend CI mobile job to run mobile tests and lint, not only typecheck. Commit:
  • P8.2 Add CI web E2E job or documented gated workflow with Playwright browser install/cache. Commit:
  • P8.3 Add CI Docker build job for backend and web images. Commit:
  • P8.4 Add or document compose smoke: build, start backend/web, hit /health, /api/bootstrap, and a web smoke endpoint. Commit:
  • P8.5 Add secret scan and hardcoded token/color checks, reusing common platform scripts where possible. Commit:
  • P8.6 Add dependency health workflow or scheduled check for @bytelyst/*, Next, React, Expo, Fastify, Vitest, and Playwright compatibility. Commit:
  • P8.7 Add release notes template and production deploy checklist with environment variables, rollback, migrations, smoke tests, and monitoring links. Commit:

Acceptance criteria:

  • CI blocks obvious type, test, lint, E2E, Docker, secret, and design-token regressions.
  • Release can be executed from documented steps.

Phase P9 — Data, Migration, And Operational Readiness

Goal: production data can be created, migrated, backed up, and debugged safely.

  • P9.1 Document Cosmos containers, partition keys, indexes, retention expectations, and backup/restore approach. Commit:
  • P9.2 Add seed/bootstrap strategy for built-in prompt templates, intake rules, default workspace, and feature flags. Commit:
  • P9.3 Add migration plan for encrypted fields, schema changes, and backfills; reuse common platform migration scripts where practical. Commit:
  • P9.4 Define telemetry event taxonomy and diagnostic breadcrumbs for notes, prompts, intake, reviews, mobile capture, and MCP actions. Commit:
  • P9.5 Add operator runbook: incident triage, dependency outage behavior, stuck scheduler/webhook recovery, failed blob upload recovery, failed LLM/extraction recovery. Commit:

Acceptance criteria:

  • Operators know what data exists, how it is protected, and how to recover.
  • Built-ins and migrations are deterministic.

Phase P10 — Final Production Gate

Goal: prove the product is ready to release.

  • P10.1 Run full local verification: pnpm run verify. Commit:
  • P10.2 Run lint for backend, web, and mobile. Commit:
  • P10.3 Run web Playwright E2E. Commit:
  • P10.4 Run Docker compose build and smoke. Commit:
  • P10.5 Run common-platform dependency smoke with platform-service, extraction-service, and mcp-server available. Commit:
  • P10.6 Run hardcoded color/token audit, secret scan, and API URL/product ID drift audit. Commit:
  • P10.7 Update docs/ROADMAP.md, AGENTS.md, README.md, and this document with final status, commit hashes, known deferrals, and release notes. Commit:
  • P10.8 Push all commits and confirm remote branch status. Commit:

Acceptance criteria:

  • All checks pass or have explicit signed-off deferrals.
  • The final roadmap includes commit hashes for every completed item.
  • The remote branch contains the completed production-readiness work.

Suggested Commit Sequence

Use this sequence unless implementation findings require reordering:

  1. docs(roadmap): add production readiness handoff
  2. docs(roadmap): reconcile current implementation status
  3. fix(config): harden production backend config
  4. fix(platform): align shared service urls and smoke checks
  5. refactor(ui): replace hardcoded web colors with tokens
  6. refactor(mobile): replace hardcoded native colors with theme tokens
  7. fix(backend): tighten diagnostics and startup logging
  8. test(backend): cover production security and failure paths
  9. test(web): expand release-critical e2e coverage
  10. test(mobile): cover companion app release flows
  11. ci: expand production readiness checks
  12. docs(release): add runbook and final gate checklist
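
The `type(scope): description` convention can be checked before a commit is pushed. This is a hedged sketch: the scope is treated as optional because the sequence above includes `ci: expand production readiness checks`, and the `feat` and `chore` types are assumptions added beyond the types that appear in the list.

```typescript
// type(scope): description, with the scope optional.
// docs, fix, refactor, test, and ci appear in the suggested sequence;
// feat and chore are assumed additions.
const COMMIT_PATTERN = /^(docs|fix|refactor|test|ci|feat|chore)(\([a-z0-9-]+\))?: \S.*$/;

// Checks only the subject line of a commit message.
function isConventionalSubject(subject: string): boolean {
  return COMMIT_PATTERN.test(subject);
}
```

`isConventionalSubject("fix(config): harden production backend config")` returns true, while a subject such as `"Fixed stuff"` returns false.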

Definition Of Done

NoteLett is production ready when:

  • All P0-P10 tasks are checked or explicitly deferred with reason and owner.
  • Each completed task has a commit hash recorded here.
  • pnpm run verify, lint, E2E, Docker smoke, platform smoke, secret scan, and design-token checks pass.
  • Production config fails closed on unsafe defaults.
  • Web and mobile use shared platform packages for platform concerns and shared tokens for visual styling.
  • Backend preserves product-local NoteLett logic while using common platform packages/services for auth, datastore, errors, logging, telemetry, diagnostics, flags, blob, extraction, encryption, LLM governance, MCP, and monitoring where appropriate.
  • docs/ROADMAP.md, AGENTS.md, and README.md agree with implementation reality.