# Trading Monorepo Operations

## Purpose

This document is the operator and engineer runbook for `learning_ai_invt_trdg`.

It covers:

- local development setup
- verification and CI expectations
- staged rollout of the new monorepo deployment
- rollback rules
- release go/no-go checks
- post-cutover monitoring

## Local Development

### Prerequisites

- Node.js `>=20`
- `pnpm` `>=10`
- local checkout of:
  - `learning_ai_invt_trdg`
  - `learning_ai_common_plat`
- access to:
  - platform-service
  - Azure Cosmos DB

### Workspace bootstrap

```bash
pnpm install
cp .env.example .env
pnpm verify
```

### Core commands

```bash
pnpm verify
pnpm lint
pnpm typecheck
pnpm test
pnpm build
pnpm smoke:release
```

### Surface-specific commands

```bash
pnpm --filter @bytelyst/trading-backend dev
pnpm --filter @bytelyst/trading-web dev
pnpm --filter @bytelyst/trading-mobile dev
```

## Environment Model

### Platform-service

- `PLATFORM_API_URL`
- `PLATFORM_AUTH_ENABLED`
- `PLATFORM_JWT_ISSUER`
- `PLATFORM_JWT_PUBLIC_KEY` or `PLATFORM_JWT_JWKS_URL`
- `JWT_SECRET` only for HS256 compatibility environments

### Cosmos

- `COSMOS_ENDPOINT`
- `COSMOS_KEY`
- `COSMOS_DATABASE`

Rule:

- platform-service and Cosmos are the only supported production systems for this repo
- legacy repos may still be consulted as code references, but they are not runtime dependencies
- trading user profiles, dynamic config, trading controls, snapshots, capital ledgers, and strategy presets already use Cosmos-backed authority paths

## Verification Standard

Before merge or release, all of the following must pass from repo root:

```bash
pnpm verify
pnpm lint
```

`pnpm verify` currently gates:

- backend, web, and mobile typecheck
- backend and web test suites
- backend and web build plus mobile typecheck

`pnpm lint` currently gates:

- backend contract and safety verification scripts
- web lint
- mobile lint

## Request Tracing

- the main web and mobile API paths now attach `x-request-id`
- backend HTTP responses echo `x-request-id` so browser/app logs can be correlated with backend logs
- during incident review, treat `x-request-id` as the primary request correlation key across client and backend traces

## Feature Flag Ownership

- backend `GET /api/feature-flags` is the authoritative runtime contract for user-facing feature access
- web feature gates must read explicit feature-flag contracts instead of scraping generic config payloads
- dynamic config may still store the underlying values, but the product surfaces should consume the typed feature-flag API

## Staged Cutover

### Order

1. Backend internal validation
2. Web internal adoption
3. Mobile internal beta
4. Controlled operator rollout
5. Broader production cutover

### Backend cutover

- deploy backend with platform JWT support and Cosmos-backed control-plane and execution persistence enabled
- confirm runtime control reads/writes work through backend APIs
- confirm `dynamic_config`, trading-control, order, trade-history, and manual-entry containers are readable and writable
- confirm unauthorized requests are rejected and tenant-scoped reads are enforced

### Web cutover

- move operators to the monorepo web dashboard
- validate sign-in, session restore, kill-switch handling, and admin controls
- validate dynamic config writes through backend APIs
- keep legacy direct-table workflows disabled where backend API replacements exist

### Mobile cutover

- release to internal beta first
- validate sign-in, session restore, live state, degraded-state handling, and safe interventions
- do not enable broader rollout until backend/web contracts stay stable through at least one backend deploy cycle

## Rollback Rules

### Hard rollback triggers

- auth/session failures prevent sign-in or session refresh
- incorrect tenant scoping leaks another user's profile, orders, alerts, or history
- global trade halt or scoped disable controls do not apply correctly
- dynamic config writes fail or partially apply without clear operator visibility
- mobile/web clients cannot recover from degraded platform-service or backend states

### Rollback actions

1. stop rollout to additional users immediately
2. revert the most recent monorepo deployment
3. restore traffic to the previous stable web/backend/mobile release
4. keep backend trade-halt authority available during rollback
5. preserve audit logs and operational events for incident review

### Data rollback rule

- do not rewrite or delete Cosmos control-plane state as part of first-response rollback
- prefer application rollback first, then explicit state repair if needed

## Release Go/No-Go

Release is `go` only if all of the following are true:

- `pnpm verify` passes
- `pnpm lint` passes
- `pnpm smoke:release` passes
- platform-service auth is reachable from web and mobile
- Cosmos control-plane reads and writes succeed
- Cosmos execution-data reads and writes succeed
- kill-switch and maintenance behavior are validated on web and mobile
- backend tenant isolation checks are green
- operator-safe mobile interventions are limited to approved actions only
- no legacy runtime data dependency remains in critical public flows

Release is `no-go` if any of the following are true:

- auth source of truth is ambiguous in production
- admin/runtime-control actions are not fully audited
- rollback owner or rollback commands are unclear

## Release Smoke Checklist

`pnpm smoke:release` currently validates:

- web sign-in flow behavior
- web password reset flow behavior
- web authenticated session bootstrap behavior
- web websocket auth token gating
- web product kill-switch accessibility gating
- mobile auth and product-availability surfaces still compile against the shared platform contracts

Manual mobile release smoke is still required before broad rollout:

1. Sign in on a fresh install.
2. Confirm session restore after app restart.
3. Confirm product-disabled state blocks the app shell.
4. Confirm maintenance/availability messaging is visible.
5. Confirm the app recovers after re-enabling the product.

## Post-Cutover Monitoring

### Watch immediately after rollout

- platform auth failures
- token refresh failures
- backend `401` and `403` spikes
- websocket connection failure rate
- dynamic config update failures
- trading-control update failures
- mobile degraded/offline state frequency
- unexpected operator intervention failures

### Watch for the first 24 hours

- tenant isolation anomalies
- runtime control drift between backend memory and Cosmos control state
- kill-switch misfires
- stale session behavior across web and mobile
- build or chunk-size regressions affecting web load

## Known Remaining Gaps

- Cosmos-only execution persistence is now in place for the main backend runtime paths, but dormant legacy code and one-off reference scripts still need cleanup
- web now uses platform-session handling end to end; the remaining auth cleanup is removing dormant compatibility stubs and aligning profile bootstrap contracts fully with backend-owned product APIs
- root `pnpm verify` is green again after aligning the web Vitest harness with platform-session storage and current API contracts
- mobile does not yet include push notification infrastructure
- feature-flag ownership and correlation-ID propagation are not fully standardized yet

These are follow-up items, not hidden defects. They should remain tracked in `docs/ROADMAP.md`.
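
As an illustration of the Request Tracing behavior described earlier, the attach-and-echo pattern for `x-request-id` might look roughly like the sketch below. The helper names (`attachRequestId`, `echoRequestId`), the `HeaderMap` type, and the injected ID generator are all hypothetical and are not taken from the repo; the sketch also assumes header names are already lowercased.

```typescript
// Hypothetical helpers for illustration only; not the repo's actual code.
// Assumes header names have already been normalized to lowercase.
type HeaderMap = Record<string, string>;

const REQUEST_ID_HEADER = "x-request-id";

// Client side: keep an existing request ID, otherwise attach a new one.
function attachRequestId(headers: HeaderMap, generateId: () => string): HeaderMap {
  const existing = headers[REQUEST_ID_HEADER];
  return existing
    ? { ...headers }
    : { ...headers, [REQUEST_ID_HEADER]: generateId() };
}

// Backend side: copy the incoming request ID onto the response headers
// so client logs and backend logs share one correlation key.
function echoRequestId(requestHeaders: HeaderMap, responseHeaders: HeaderMap): HeaderMap {
  const requestId = requestHeaders[REQUEST_ID_HEADER];
  return requestId
    ? { ...responseHeaders, [REQUEST_ID_HEADER]: requestId }
    : { ...responseHeaders };
}
```

Passing the generator in as a parameter keeps the sketch independent of any particular UUID source; a real client would likely supply its platform's own ID generator.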
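
The Release Go/No-Go rule (every required check passes and no blocking condition is present) can also be expressed as a small decision function. This is a minimal sketch under stated assumptions: the `ReleaseCheck` and `ReleaseDecision` types and the `decideRelease` function are hypothetical names introduced here, and the check labels are simply copied from the lists in that section.

```typescript
// Hypothetical model of the go/no-go rule; not part of the repo.
interface ReleaseCheck {
  name: string;
  passed: boolean;
}

interface ReleaseDecision {
  decision: "go" | "no-go";
  reasons: string[];
}

// "go" only when every required check passed and no blocker is active.
function decideRelease(required: ReleaseCheck[], activeBlockers: string[]): ReleaseDecision {
  const failed = required
    .filter((check) => !check.passed)
    .map((check) => check.name);
  const reasons = [...failed, ...activeBlockers];
  return { decision: reasons.length === 0 ? "go" : "no-go", reasons };
}

// Example: one failed gate plus one active blocker.
const exampleDecision = decideRelease(
  [
    { name: "pnpm verify", passed: true },
    { name: "pnpm lint", passed: true },
    { name: "pnpm smoke:release", passed: false },
  ],
  ["rollback owner or rollback commands are unclear"],
);
// exampleDecision.decision is "no-go", and exampleDecision.reasons lists both items.
```

Returning the reasons alongside the decision keeps the outcome auditable, which fits the runbook's requirement that a rollback owner and commands be clear before release.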