Trading Monorepo Operations
Purpose
This document is the operator and engineer runbook for learning_ai_invt_trdg.
It covers:
- local development setup
- verification and CI expectations
- staged rollout of the new monorepo deployment
- rollback rules
- release go/no-go checks
- post-cutover monitoring
Local Development
Prerequisites
- Node.js >=20
- pnpm >=10
- local checkout of:
  - learning_ai_invt_trdg
  - learning_ai_common_plat
- access to:
  - platform-service
  - Azure Cosmos DB
Workspace bootstrap
pnpm install
cp .env.example .env
pnpm verify
Core commands
pnpm verify
pnpm lint
pnpm typecheck
pnpm test
pnpm build
pnpm smoke:release
Surface-specific commands
pnpm --filter @bytelyst/trading-backend dev
pnpm --filter @bytelyst/trading-web dev
pnpm --filter @bytelyst/trading-mobile dev
Environment Model
Platform-service
- PLATFORM_API_URL
- PLATFORM_AUTH_ENABLED
- PLATFORM_JWT_ISSUER
- PLATFORM_JWT_PUBLIC_KEY or PLATFORM_JWT_JWKS_URL
- JWT_SECRET only for HS256 compatibility environments
Cosmos
- COSMOS_ENDPOINT
- COSMOS_KEY
- COSMOS_DATABASE
Rules:
- platform-service and Cosmos are the only supported production systems for this repo
- legacy repos may still be consulted as code references, but they are not runtime dependencies
- trading user profiles, dynamic config, trading controls, snapshots, capital ledgers, and strategy presets already use Cosmos-backed authority paths
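A minimal sketch of a startup check for the variables listed in this section. The helper names (`findEnvProblems`, `isEnvVarSet`) are hypothetical and not part of the repo, and treating every listed variable as mandatory is an assumption made for illustration.

```typescript
// Hypothetical helper: variable names come from this runbook, but the
// functions below are illustrative only and do not exist in the repo.
type EnvMap = Record<string, string | undefined>;

// Assumption: all of these must be set in every environment.
const REQUIRED_ENV_VARS: string[] = [
  "PLATFORM_API_URL",
  "PLATFORM_AUTH_ENABLED",
  "PLATFORM_JWT_ISSUER",
  "COSMOS_ENDPOINT",
  "COSMOS_KEY",
  "COSMOS_DATABASE",
];

function isEnvVarSet(env: EnvMap, key: string): boolean {
  const value = env[key];
  return value !== undefined && value.trim() !== "";
}

// Returns human-readable problems; an empty list means the environment
// satisfies the checks sketched here.
function findEnvProblems(env: EnvMap): string[] {
  const problems: string[] = [];
  for (const key of REQUIRED_ENV_VARS) {
    if (!isEnvVarSet(env, key)) {
      problems.push(`missing ${key}`);
    }
  }
  // Token verification needs a public key or a JWKS URL. JWT_SECRET is
  // accepted here only as the HS256 compatibility fallback.
  const hasVerificationKey =
    isEnvVarSet(env, "PLATFORM_JWT_PUBLIC_KEY") ||
    isEnvVarSet(env, "PLATFORM_JWT_JWKS_URL") ||
    isEnvVarSet(env, "JWT_SECRET");
  if (!hasVerificationKey) {
    problems.push(
      "missing PLATFORM_JWT_PUBLIC_KEY, PLATFORM_JWT_JWKS_URL, or JWT_SECRET"
    );
  }
  return problems;
}
```

Passing the environment in as a plain object keeps the check testable without touching the real process environment.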
Verification Standard
Before merge or release, all of the following must pass from repo root:
pnpm verify
pnpm lint
pnpm verify currently gates:
- backend, web, and mobile typecheck
- backend and web test suites
- backend and web build plus mobile typecheck
pnpm lint currently gates:
- backend contract and safety verification scripts
- web lint
- mobile lint
Request Tracing
- the main web and mobile API paths now attach x-request-id
- backend HTTP responses echo x-request-id so browser/app logs can be correlated with backend logs
- during incident review, treat x-request-id as the primary request correlation key across client and backend traces
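The attach-and-echo behavior can be sketched as two pure functions over header maps. This is an illustration under assumptions, not the repo's client or middleware code: `withRequestId`, `echoRequestId`, and `newRequestId` are hypothetical names, and header keys are assumed to be lower-cased already.

```typescript
// Hypothetical sketch of x-request-id propagation; none of these helpers
// are taken from the repo.
type HeaderMap = Record<string, string>;

const REQUEST_ID_HEADER = "x-request-id";

// Illustrative ID generator. A real client would more likely use a UUID.
function newRequestId(): string {
  const timePart = Date.now().toString(36);
  const randomPart = Math.random().toString(36).slice(2, 10);
  return `${timePart}-${randomPart}`;
}

// Client side: keep an ID that is already present, otherwise attach a new one.
function withRequestId(
  headers: HeaderMap,
  generate: () => string = newRequestId
): HeaderMap {
  const existing = headers[REQUEST_ID_HEADER];
  const id = existing && existing.length > 0 ? existing : generate();
  return { ...headers, [REQUEST_ID_HEADER]: id };
}

// Backend side: echo the inbound ID on the response so client and backend
// log lines share one correlation key.
function echoRequestId(
  requestHeaders: HeaderMap,
  responseHeaders: HeaderMap
): HeaderMap {
  const id = requestHeaders[REQUEST_ID_HEADER];
  return id ? { ...responseHeaders, [REQUEST_ID_HEADER]: id } : responseHeaders;
}
```

Injecting the generator makes the client half deterministic in tests.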
Feature Flag Ownership
- backend GET /api/feature-flags is the authoritative runtime contract for user-facing feature access
- web feature gates must read explicit feature-flag contracts instead of scraping generic config payloads
- dynamic config may still store the underlying values, but the product surfaces should consume the typed feature-flag API
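A minimal sketch of what consuming a typed feature-flag contract could look like on a product surface. The response shape (`{ flags: Record<string, boolean> }`) and the flag names in the usage note are assumptions for illustration. The real contract is whatever the backend defines for GET /api/feature-flags.

```typescript
// Hypothetical response shape for GET /api/feature-flags. The real
// contract is owned by the backend and may differ.
interface FeatureFlagsResponse {
  flags: Record<string, boolean>;
}

// Narrow an untyped JSON payload to the assumed contract and reject
// anything that is not an explicit boolean flag.
function parseFeatureFlags(payload: unknown): FeatureFlagsResponse {
  if (typeof payload !== "object" || payload === null) {
    throw new Error("feature-flag payload must be an object");
  }
  const rawFlags = (payload as { flags?: unknown }).flags;
  if (typeof rawFlags !== "object" || rawFlags === null || Array.isArray(rawFlags)) {
    throw new Error("feature-flag payload is missing a 'flags' object");
  }
  const source = rawFlags as Record<string, unknown>;
  const flags: Record<string, boolean> = {};
  for (const key of Object.keys(source)) {
    const value = source[key];
    if (typeof value !== "boolean") {
      throw new Error(`feature flag '${key}' is not a boolean`);
    }
    flags[key] = value;
  }
  return { flags };
}

// Unknown flags resolve to disabled, so a missing entry never opens a feature.
function isFeatureEnabled(response: FeatureFlagsResponse, flag: string): boolean {
  return response.flags[flag] === true;
}
```

For example, `isFeatureEnabled(parseFeatureFlags({ flags: { paperTrading: true } }), "paperTrading")` evaluates to `true`, while any flag absent from the payload evaluates to `false`.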
Staged Cutover
Order
- Backend internal validation
- Web internal adoption
- Mobile internal beta
- Controlled operator rollout
- Broader production cutover
Backend cutover
- deploy backend with platform JWT support and Cosmos-backed control-plane and execution persistence enabled
- confirm runtime control reads/writes work through backend APIs
- confirm dynamic_config, trading-control, order, trade-history, and manual-entry containers are readable and writable
- confirm unauthorized requests are rejected and tenant-scoped reads are enforced
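The last check above can be exercised against a small model of tenant-scoped reads. This is a sketch under assumptions: the `tenantId` field and the `readForTenant` helper are hypothetical and do not describe the backend's actual data model or API.

```typescript
// Hypothetical record shape: the tenantId field name is an assumption.
interface TenantScopedRecord {
  tenantId: string;
  id: string;
}

// Reject callers without an authenticated tenant, and never return rows
// that belong to a different tenant.
function readForTenant<T extends TenantScopedRecord>(
  records: T[],
  callerTenantId: string | undefined
): T[] {
  if (callerTenantId === undefined || callerTenantId === "") {
    throw new Error("unauthorized: no tenant on request");
  }
  return records.filter((record) => record.tenantId === callerTenantId);
}
```

A validation pass would assert both halves: a read without a tenant throws, and a read with one returns only that tenant's rows.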
Web cutover
- move operators to the monorepo web dashboard
- validate sign-in, session restore, kill-switch handling, and admin controls
- validate dynamic config writes through backend APIs
- keep legacy direct-table workflows disabled where backend API replacements exist
Mobile cutover
- release to internal beta first
- validate sign-in, session restore, live state, degraded-state handling, and safe interventions
- do not enable broader rollout until backend/web contracts stay stable through at least one backend deploy cycle
Rollback Rules
Hard rollback triggers
- auth/session failures prevent sign-in or session refresh
- incorrect tenant scoping leaks another user's profile, orders, alerts, or history
- global trade halt or scoped disable controls do not apply correctly
- dynamic config writes fail or partially apply without clear operator visibility
- mobile/web clients cannot recover from degraded platform-service or backend states
Rollback actions
- stop rollout to additional users immediately
- revert the most recent monorepo deployment
- restore traffic to the previous stable web/backend/mobile release
- keep backend trade-halt authority available during rollback
- preserve audit logs and operational events for incident review
Data rollback rule
- do not rewrite or delete Cosmos control-plane state as part of first-response rollback
- prefer application rollback first, then explicit state repair if needed
Release Go/No-Go
Release is go only if all of the following are true:
- pnpm verify passes
- pnpm lint passes
- pnpm smoke:release passes
- platform-service auth is reachable from web and mobile
- Cosmos control-plane reads and writes succeed
- Cosmos execution-data reads and writes succeed
- kill-switch and maintenance behavior are validated on web and mobile
- backend tenant isolation checks are green
- operator-safe mobile interventions are limited to approved actions only
- no legacy runtime data dependency remains in critical public flows
Release is no-go if any of the following are true:
- auth source of truth is ambiguous in production
- admin/runtime-control actions are not fully audited
- rollback owner or rollback commands are unclear
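The go/no-go rule can be expressed as one decision function: every go check must hold and no no-go condition may be present. This is a sketch for recording the decision. `decideRelease` and its input shape are hypothetical, and the check labels are free-form strings supplied by the release owner.

```typescript
// Hypothetical decision helper, not part of the repo.
interface ReleaseInputs {
  // true means the go check is satisfied
  goChecks: Record<string, boolean>;
  // true means the no-go condition is present
  noGoConditions: Record<string, boolean>;
}

interface ReleaseDecision {
  go: boolean;
  blockers: string[];
}

function decideRelease(inputs: ReleaseInputs): ReleaseDecision {
  const blockers: string[] = [];
  for (const label of Object.keys(inputs.goChecks)) {
    if (!inputs.goChecks[label]) {
      blockers.push(`go check failed: ${label}`);
    }
  }
  for (const label of Object.keys(inputs.noGoConditions)) {
    if (inputs.noGoConditions[label]) {
      blockers.push(`no-go condition present: ${label}`);
    }
  }
  // A single blocker of either kind is enough to stop the release.
  return { go: blockers.length === 0, blockers };
}
```

Returning the blocker list alongside the boolean gives the release record a concrete reason whenever the answer is no-go.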
Release Smoke Checklist
pnpm smoke:release currently validates:
- web sign-in flow behavior
- web password reset flow behavior
- web authenticated session bootstrap behavior
- web websocket auth token gating
- web product kill-switch accessibility gating
- mobile auth and product-availability surfaces still compile against the shared platform contracts
Manual mobile release smoke is still required before broad rollout:
- Sign in on a fresh install.
- Confirm session restore after app restart.
- Confirm product-disabled state blocks the app shell.
- Confirm maintenance/availability messaging is visible.
- Confirm the app recovers after re-enabling the product.
Post-Cutover Monitoring
Watch immediately after rollout
- platform auth failures
- token refresh failures
- backend 401 and 403 spikes
- websocket connection failure rate
- dynamic config update failures
- trading-control update failures
- mobile degraded/offline state frequency
- unexpected operator intervention failures
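One way to make "401 and 403 spikes" concrete is to compare the auth-error rate in the current window against a pre-rollout baseline. The sketch below is illustrative only: the helper names are hypothetical, and the default multiplier (3x) and minimum rate (1%) are placeholder assumptions, not thresholds defined by this runbook.

```typescript
// Hypothetical spike check; thresholds are placeholders, not policy.
interface StatusWindow {
  total: number;
  unauthorized: number; // HTTP 401 responses
  forbidden: number; // HTTP 403 responses
}

function authErrorRate(sample: StatusWindow): number {
  if (sample.total === 0) {
    return 0;
  }
  return (sample.unauthorized + sample.forbidden) / sample.total;
}

// Flags a spike when the current rate is above a floor and exceeds the
// baseline rate by the given multiplier.
function isAuthErrorSpike(
  current: StatusWindow,
  baseline: StatusWindow,
  multiplier: number = 3,
  minRate: number = 0.01
): boolean {
  const currentRate = authErrorRate(current);
  const baselineRate = authErrorRate(baseline);
  return currentRate >= minRate && currentRate > baselineRate * multiplier;
}
```

The floor keeps a quiet window from tripping the check when the baseline is zero and a single stray 401 appears.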
Watch for the first 24 hours
- tenant isolation anomalies
- runtime control drift between backend memory and Cosmos control state
- kill-switch misfires
- stale session behavior across web and mobile
- build or chunk-size regressions affecting web load
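Runtime control drift can be checked by diffing the backend's in-memory control values against the values persisted in Cosmos. The sketch below assumes both sides can be flattened to simple key/value maps. The `findControlDrift` helper and the control names in the usage note are hypothetical.

```typescript
// Hypothetical drift check; assumes control state flattens to scalars.
type ControlValue = string | number | boolean;
type ControlState = Record<string, ControlValue>;

interface DriftEntry {
  key: string;
  inMemory: ControlValue | undefined;
  persisted: ControlValue | undefined;
}

// Lists every key whose value differs between the two views, including
// keys that exist on only one side. Results are sorted by key.
function findControlDrift(
  inMemory: ControlState,
  persisted: ControlState
): DriftEntry[] {
  const keys = Object.keys(inMemory);
  for (const key of Object.keys(persisted)) {
    if (!(key in inMemory)) {
      keys.push(key);
    }
  }
  keys.sort();
  const drift: DriftEntry[] = [];
  for (const key of keys) {
    const memoryValue = key in inMemory ? inMemory[key] : undefined;
    const persistedValue = key in persisted ? persisted[key] : undefined;
    if (memoryValue !== persistedValue) {
      drift.push({ key, inMemory: memoryValue, persisted: persistedValue });
    }
  }
  return drift;
}
```

For instance, comparing `{ globalHalt: false, maxOrderSize: 100 }` in memory against `{ globalHalt: true, maxOrderSize: 100, maintenance: false }` in Cosmos reports two entries, `globalHalt` and `maintenance`. An empty result means the two views agree.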
Known Remaining Gaps
- Cosmos-only execution persistence is now in place for the main backend runtime paths, but dormant legacy code and one-off reference scripts still need cleanup
- web now uses platform-session handling end to end; the remaining auth cleanup is removing dormant compatibility stubs and aligning profile bootstrap contracts fully with backend-owned product APIs
- root pnpm verify is green again after aligning the web Vitest harness with platform-session storage and current API contracts
- mobile does not yet include push notification infrastructure
- feature-flag ownership and correlation-ID propagation are not fully standardized yet
These are follow-up items, not hidden defects. They should remain tracked in docs/ROADMAP.md.