learning_ai_invt_trdg/docs/CUTOVER_WEB.md
Saravana Achu Mac d955d00c00 feat(contracts): shared socket namespaces, tab flags, cutover docs, README
2026-04-29 19:35:53 -04:00


Web Internal Adoption Checklist (Stage 2 Cutover)

Purpose

This document is the step-by-step runbook for switching internal operators from the legacy bytelyst-trading-dashboard-web to the new monorepo web dashboard (web/).

It covers the pre-flight gate, deployment, validation, rollback triggers, and post-adoption monitoring. Complete every step in order.


Pre-Flight Gate

Do not begin cutover until all of the following are true.

Go / No-Go Checks

Run from the monorepo root:

pnpm verify          # typecheck + test + build — must be green
pnpm lint            # backend contract + security guards + web/mobile lint — must be green
pnpm smoke:release   # auth + kill-switch smoke tests — must pass

Backend-specific:

cd backend
npm run check:api-contract          # feature-flag shapes, audit events, namespace constants
npm run check:websocket-contract    # BotState lifecycle consistency
npm run check:security-guards       # tenant isolation — must be green
npm run check:tenant-isolation      # row-level access — must be green
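
The gate is all-or-nothing: a single red check blocks cutover. A minimal sketch of that rule follows. The command strings are the ones listed above; `GateResult`, `evaluateGate`, and the injected `runCheck` callback are hypothetical names. In a real script `runCheck` would wrap a child process and report whether the command exited cleanly; it is injected here so the decision logic can be exercised without running anything.

```typescript
// Sketch only: GateResult, evaluateGate, and runCheck are hypothetical names,
// not part of the monorepo.
type GateResult = { go: boolean; failed: string[] };

const ROOT_CHECKS: string[] = ["pnpm verify", "pnpm lint", "pnpm smoke:release"];
const BACKEND_CHECKS: string[] = [
  "npm run check:api-contract",
  "npm run check:websocket-contract",
  "npm run check:security-guards",
  "npm run check:tenant-isolation",
];

// Runs every check instead of stopping at the first failure, so the operator
// sees the full list of red checks in one pass.
function evaluateGate(
  checks: string[],
  runCheck: (command: string) => boolean,
): GateResult {
  const failed = checks.filter((command) => !runCheck(command));
  return { go: failed.length === 0, failed };
}
```

Calling `evaluateGate(ROOT_CHECKS.concat(BACKEND_CHECKS), runCheck)` yields `go: true` only when every command passes; otherwise `failed` names the checks to fix before retrying.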

Environment Checks

  • Backend is deployed and reachable (GET /health/live returns 200)
  • Cosmos DB containers readable and writable (dynamic_config, trading-profiles, trading-control, snapshots, capital-ledger)
  • Platform-service is reachable from the deployment environment
  • PLATFORM_AUTH_ENABLED=true is set on the backend deployment
  • VITE_TRADING_API_URL points to the deployed backend (not localhost)
  • VITE_PLATFORM_URL points to the live platform-service
  • CORS_ALLOWED_ORIGINS on the backend includes the new web dashboard origin
  • Feature flags set correctly for the rollout population:
    • TAB_MARKETPLACE_ENABLED — set per rollout plan
    • TAB_MEMBERSHIP_ENABLED — set per rollout plan
    • ENABLE_BACKTEST — set per rollout plan
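
Several of the environment checks above can be caught mechanically before anyone opens a browser. The sketch below assumes the variables are available as a plain key/value map and that `CORS_ALLOWED_ORIGINS` is a comma-separated list; both are assumptions, as is the helper name `findEnvProblems`. It does not probe the backend, Cosmos DB, or platform-service.

```typescript
// Sketch only: findEnvProblems is a hypothetical helper. The variable names
// come from the checklist; the exact rules are assumptions.
type Env = Record<string, string | undefined>;

function findEnvProblems(env: Env, dashboardOrigin: string): string[] {
  const problems: string[] = [];

  if (env["PLATFORM_AUTH_ENABLED"] !== "true") {
    problems.push("PLATFORM_AUTH_ENABLED must be 'true'");
  }

  // Both URLs must be set and must not point at a local machine.
  for (const name of ["VITE_TRADING_API_URL", "VITE_PLATFORM_URL"]) {
    const value = env[name];
    if (!value) {
      problems.push(`${name} is not set`);
    } else if (/localhost|127\.0\.0\.1/.test(value)) {
      problems.push(`${name} points to localhost`);
    }
  }

  // Assumes a comma-separated list of exact origins.
  const origins = (env["CORS_ALLOWED_ORIGINS"] || "")
    .split(",")
    .map((origin) => origin.trim());
  if (origins.indexOf(dashboardOrigin) === -1) {
    problems.push(`CORS_ALLOWED_ORIGINS does not include ${dashboardOrigin}`);
  }

  return problems;
}
```

An empty result means these particular checks pass; the reachability items in the list still need to be confirmed by hand.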

Rollback Readiness

  • The legacy web dashboard URL is still live and working
  • You know who owns the rollback decision and how to reach them
  • Backend trade-halt control is reachable (POST /internal/trading/pause)

Step 1 — Deploy the Web Dashboard

# From monorepo root — production build
pnpm build

# Or using Docker
pnpm docker:up

Verify the deployment:

  • Web dashboard loads at the new URL without a blank screen
  • Browser console shows no errors on load
  • Network tab shows no 4xx/5xx on initial API calls

Step 2 — Internal Operator Sign-In

Have each internal operator complete the sign-in sequence:

  • Navigate to the new web dashboard URL
  • Sign in using platform credentials (same as the legacy dashboard)
  • Session restores correctly after browser refresh (no re-login required)
  • Auth token is a platform JWT (check via browser devtools: Authorization: Bearer ... on API calls)
  • GET /api/me/profile returns the correct user profile and role
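
When checking the `Authorization` header in devtools, a quick shape test helps distinguish a JWT from an opaque session token. The sketch below is a hypothetical helper (`bearerTokenLooksLikeJwt`) that only checks for three dot-separated base64url segments; it does not verify the signature, issuer, or expiry, so it cannot confirm that the token was issued by platform-service.

```typescript
// Sketch only: bearerTokenLooksLikeJwt is a hypothetical helper. It checks
// shape, not signature, issuer, or expiry.
function bearerTokenLooksLikeJwt(authorizationHeader: string): boolean {
  const match = /^Bearer\s+(\S+)$/.exec(authorizationHeader.trim());
  if (!match) return false;

  // A compact JWT is three dot-separated base64url segments.
  const segments = match[1].split(".");
  return (
    segments.length === 3 &&
    segments.every((segment) => /^[A-Za-z0-9_-]+$/.test(segment))
  );
}
```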

Step 3 — Core Feature Validation

Each operator validates their own user scope:

Trading State

  • Overview tab loads with live bot state (not stale/empty)
  • WebSocket connection shows "Connected" in the header
  • Socket connects to /trading namespace (check backend logs: [API][/trading] Client connected)
  • Positions tab shows current open positions
  • Trade History tab shows closed trade history
  • My Strategies tab lists the operator's trading profiles

Real-Time Updates

  • Leave the dashboard open for 60 seconds; confirm symbol prices update live
  • Trigger a manual order or profile toggle; confirm the state updates without refresh

Admin Operators (role = admin only)

  • Signals tab is visible and loads correctly
  • Entries tab is visible and loads correctly
  • Admin Panel tab is visible
  • Strategy Clusters tab is visible
  • Admin Panel → Trading Control: pause and resume work correctly
  • Backend logs show [AUDIT] entries for pause/resume actions
  • "Preview as Customer" toggle hides admin-only tabs correctly
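
The expected tab sets for this step can be written down as a small function, which makes it easier to see what "Preview as Customer" should hide. This is a sketch of the expectation only, not the logic in the web app: `visibleTabs` is a hypothetical name, the tab labels are taken from the lists above, and the assumption is that any role other than `admin` sees only the common tabs.

```typescript
// Sketch only: visibleTabs is a hypothetical helper describing the expected
// outcome of the checks above, not the dashboard's actual implementation.
const COMMON_TABS: string[] = ["Overview", "Positions", "Trade History", "My Strategies"];
const ADMIN_TABS: string[] = ["Signals", "Entries", "Admin Panel", "Strategy Clusters"];

function visibleTabs(role: string, previewAsCustomer: boolean): string[] {
  // Admin-only tabs appear only for admins who are not previewing as a customer.
  const showAdmin = role === "admin" && !previewAsCustomer;
  return showAdmin ? COMMON_TABS.concat(ADMIN_TABS) : COMMON_TABS.slice();
}
```

Feature-flagged tabs (Backtesting, Marketplace, Membership) are left out here because their visibility depends on configuration rather than role.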

Kill-Switch Behaviour

  • If platform-service maintenance mode is toggled on, the web dashboard blocks access and shows the expected maintenance UI
  • After maintenance mode is lifted, the web dashboard recovers without a page reload

Step 4 — Config and Feature Flag Validation

  • GET /api/feature-flags returns the correct backtest, tabs.marketplace, and tabs.membership values
  • Backtesting tab visibility matches ENABLE_BACKTEST and BACKTEST_CUSTOMER_ENABLED config
  • Marketplace tab visibility matches TAB_MARKETPLACE_ENABLED config
  • Membership tab visibility matches TAB_MEMBERSHIP_ENABLED config
  • Dynamic config changes via Admin Panel → Config are persisted to Cosmos and visible after a page refresh
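
Comparing the `GET /api/feature-flags` response against the rollout plan is easy to get wrong by eye. The sketch below assumes the response can be reduced to a boolean `backtest` plus boolean `tabs.marketplace` and `tabs.membership`; the real response type in the repo may carry more fields or a different shape for `backtest`. `FlagSnapshot` and `flagMismatches` are hypothetical names.

```typescript
// Sketch only: FlagSnapshot is an assumed, simplified view of the response.
interface FlagSnapshot {
  backtest: boolean;
  tabs: { marketplace: boolean; membership: boolean };
}

// Returns the dotted names of every flag that differs from the rollout plan.
function flagMismatches(expected: FlagSnapshot, actual: FlagSnapshot): string[] {
  const mismatches: string[] = [];
  if (expected.backtest !== actual.backtest) mismatches.push("backtest");
  if (expected.tabs.marketplace !== actual.tabs.marketplace) {
    mismatches.push("tabs.marketplace");
  }
  if (expected.tabs.membership !== actual.tabs.membership) {
    mismatches.push("tabs.membership");
  }
  return mismatches;
}
```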

Step 5 — Request Tracing Spot Check

Pick any operator action (e.g., load trade history):

  • Browser devtools shows x-request-id header on the request
  • Backend response echoes the same x-request-id
  • Search backend logs for that x-request-id — the full request trace appears
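
The echo check in this step amounts to comparing one header on the request with the same header on the response. A minimal sketch, assuming the headers have been copied out of devtools into plain objects; `HeaderMap`, `getHeader`, and `requestIdEchoed` are hypothetical names.

```typescript
// Sketch only: these helpers are hypothetical and work on plain objects.
type HeaderMap = Record<string, string>;

// HTTP header names are case-insensitive, so compare lowercased names.
function getHeader(headers: HeaderMap, name: string): string | undefined {
  const wanted = name.toLowerCase();
  const keys = Object.keys(headers);
  for (let i = 0; i < keys.length; i++) {
    if (keys[i].toLowerCase() === wanted) return headers[keys[i]];
  }
  return undefined;
}

// True only when the request carried an id and the response returned the same one.
function requestIdEchoed(requestHeaders: HeaderMap, responseHeaders: HeaderMap): boolean {
  const sent = getHeader(requestHeaders, "x-request-id");
  const echoed = getHeader(responseHeaders, "x-request-id");
  return sent !== undefined && sent === echoed;
}
```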

Step 6 — Parallel Run

Run the new and legacy dashboards in parallel before switching traffic fully:

  • Operators use the new dashboard as primary
  • Legacy dashboard remains accessible as a fallback
  • No trading state mutations go through the legacy dashboard during this period
  • Monitor for discrepancies between what new and legacy dashboards show
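
Discrepancy monitoring during the parallel run can be as simple as diffing the positions each dashboard reports. The sketch below assumes positions can be exported as one row per symbol with a quantity; the `PositionRow` shape and both function names are hypothetical and should be adapted to whatever the dashboards actually expose.

```typescript
// Sketch only: PositionRow is an assumed shape with one row per symbol.
interface PositionRow {
  symbol: string;
  quantity: number;
}

function indexBySymbol(rows: PositionRow[]): Record<string, number> {
  const index: Record<string, number> = {};
  for (const row of rows) index[row.symbol] = row.quantity;
  return index;
}

// Returns the symbols whose quantity differs, or that appear on only one side.
function findDiscrepancies(newRows: PositionRow[], legacyRows: PositionRow[]): string[] {
  const fresh = indexBySymbol(newRows);
  const legacy = indexBySymbol(legacyRows);
  const hasOwn = Object.prototype.hasOwnProperty;
  const symbols = Object.keys(fresh).concat(
    Object.keys(legacy).filter((symbol) => !hasOwn.call(fresh, symbol)),
  );
  return symbols.filter((symbol) => fresh[symbol] !== legacy[symbol]).sort();
}
```

Any non-empty result is a reason to investigate before moving on to traffic cutover.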

Step 7 — Traffic Cutover

Once parallel run is complete with no issues:

  • Update any bookmarks, internal links, or runbooks to point to the new URL
  • Communicate to all internal users that the new dashboard is now primary
  • Disable or redirect the legacy dashboard URL (do not delete it yet)

Rollback Triggers

Stop cutover and revert to the legacy dashboard immediately if any of the following occur:

| Condition | Action |
|---|---|
| Sign-in or session restore fails for any operator | Rollback |
| Tenant data leak: operator sees another user's positions or history | Rollback immediately + page oncall |
| Trading control (pause/resume) does not apply correctly | Rollback |
| Dynamic config writes fail silently | Rollback |
| WebSocket disconnects repeatedly with no recovery | Rollback |
| Missing data in positions or trade history vs. legacy dashboard | Investigate before proceeding |
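
The trigger list can also be kept as data, so that whoever owns the rollback decision gets a single answer when several conditions are observed at once. In the sketch below the condition keys, the `Action` type, and `decideAction` are hypothetical; only the mapping from condition to action follows the list above.

```typescript
// Sketch only: the keys are invented labels for the conditions listed above.
type Action = "investigate" | "rollback" | "rollback-and-page-oncall";

const ROLLBACK_TRIGGERS: Record<string, Action> = {
  "sign-in-or-session-failure": "rollback",
  "tenant-data-leak": "rollback-and-page-oncall",
  "trading-control-failure": "rollback",
  "silent-config-write-failure": "rollback",
  "websocket-no-recovery": "rollback",
  "missing-data-vs-legacy": "investigate",
};

// Returns the most severe action among the observed conditions, or null when
// none of them is a known trigger.
function decideAction(observed: string[]): Action | null {
  const severity: Action[] = ["investigate", "rollback", "rollback-and-page-oncall"];
  let worst = -1;
  for (const condition of observed) {
    if (Object.prototype.hasOwnProperty.call(ROLLBACK_TRIGGERS, condition)) {
      worst = Math.max(worst, severity.indexOf(ROLLBACK_TRIGGERS[condition]));
    }
  }
  return worst === -1 ? null : severity[worst];
}
```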

Rollback Steps

  1. Restore the legacy dashboard URL as primary (flip DNS or update internal links)
  2. Notify all operators to switch back immediately
  3. Do not rewrite or delete Cosmos state during first-response rollback
  4. File an incident report referencing the x-request-id values from affected requests
  5. Resolve the root cause before re-attempting cutover

Post-Adoption Monitoring (First 24 Hours)

Watch the following immediately after cutting over:

Immediate (first 30 minutes)

  • Platform auth failure rate is zero
  • Token refresh failures are zero
  • Backend 401 / 403 error rate is baseline (no spike)
  • WebSocket connection error rate is baseline
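
"Baseline" is easier to judge with a number attached. The sketch below treats the two failure counts as must-be-zero and flags the two rates when they exceed the pre-cutover baseline by a tolerance factor. The metric names, the per-minute units, and the default tolerance of 1.5 are all assumptions for illustration; use whatever your monitoring actually reports.

```typescript
// Sketch only: metric names, units, and the 1.5x tolerance are assumptions.
interface WindowMetrics {
  authFailures: number;
  tokenRefreshFailures: number;
  http401403PerMinute: number;
  wsErrorsPerMinute: number;
}

function firstWindowAlerts(
  current: WindowMetrics,
  baseline: WindowMetrics,
  tolerance: number = 1.5,
): string[] {
  const alerts: string[] = [];
  // These two are expected to be exactly zero.
  if (current.authFailures > 0) alerts.push("platform auth failures");
  if (current.tokenRefreshFailures > 0) alerts.push("token refresh failures");
  // These two are compared against the pre-cutover baseline.
  if (current.http401403PerMinute > baseline.http401403PerMinute * tolerance) {
    alerts.push("401/403 spike");
  }
  if (current.wsErrorsPerMinute > baseline.wsErrorsPerMinute * tolerance) {
    alerts.push("WebSocket error spike");
  }
  return alerts;
}
```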

First Hour

  • Cosmos reads and writes are completing successfully (check backend logs for Cosmos errors)
  • Dynamic config refresh cycle completes without error (every DYNAMIC_CONFIG_REFRESH_MS)
  • No tenant isolation anomalies in security guard logs

First 24 Hours

  • Runtime control drift: Cosmos control-plane state matches in-memory trading control mode
  • Kill-switch state matches platform-service state
  • No stale session events (operators are not re-prompted to log in unexpectedly)
  • No build or chunk-size regressions affecting web load time (check browser waterfall)

Post-Cutover Sign-Off

Complete after the first 24 hours with no rollback triggers:

  • All operators confirm the new dashboard is working correctly
  • Monitoring checks above are all green
  • Incident response runbooks updated to reference the new dashboard URL
  • Legacy web repo marked as archived (not deleted — kept as reference)
  • ROADMAP.md: mark "Web internal adoption" as [x] Done
  • OPERATIONS.md: update Staged Cutover section to reflect Stage 2 complete

Next Stage

After web internal adoption is confirmed:

Stage 3 — Mobile Internal Beta (see planned docs/CUTOVER_MOBILE.md)

  • Release mobile app to internal testers
  • Validate sign-in, session restore, live state, degraded-state handling
  • Gate: backend/web contracts must be stable through at least one full backend deploy cycle