learning_ai_invt_trdg/docs/CUTOVER_WEB.md
Saravana Achu Mac d955d00c00 feat(contracts): shared socket namespaces, tab flags, cutover docs, README
- shared/realtime.ts: add SOCKET_NAMESPACES constants (/trading, /admin, root)
- shared/feature-flags.ts: add tabs.marketplace and tabs.membership to
  TradingFeatureFlagsResponse; add FEATURE_FLAG_KEYS constants
- .env.example: remove /api suffix from VITE/NEXT_PUBLIC trading URL vars
  (web appends /api itself); add tab visibility flag vars with comments
- web: add useTabFeatureFlags hook + DOM test; wire tab visibility into App.tsx
- web/vite.config.ts: finalize build config
- mobile/providers/TradingDataProvider.tsx: deriveSocketParams for proxy-safe
  socket origin/path resolution (already landed upstream, conflict resolved)
- docs: add CUTOVER_WEB.md, CUTOVER_MOBILE.md checklists; update OPERATIONS.md
  with Docker commands and resolved gap log; update ROADMAP.md to Done;
  add BACKEND_AUDIT_SCHEMA.md, BACKEND_API_DEPRECATION.md, CONVENTIONS.md;
  add audit-events container entry to AZURE_INFRASTRUCTURE.md
- README.md: full rewrite with workspace table, arch summary, env var reference

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 19:35:53 -04:00

228 lines
8.4 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Web Internal Adoption Checklist (Stage 2 Cutover)
## Purpose
This document is the step-by-step runbook for switching internal operators from the
legacy `bytelyst-trading-dashboard-web` to the new monorepo web dashboard (`web/`).
It covers the pre-flight gate, deployment, validation, rollback triggers, and
post-adoption monitoring. Complete every step in order.
---
## Pre-Flight Gate
Do not begin cutover until all of the following are true.
### Go / No-Go Checks
Run from the monorepo root:
```bash
pnpm verify # typecheck + test + build — must be green
pnpm lint # backend contract + security guards + web/mobile lint — must be green
pnpm smoke:release # auth + kill-switch smoke tests — must pass
```
Backend-specific:
```bash
cd backend
npm run check:api-contract # feature-flag shapes, audit events, namespace constants
npm run check:websocket-contract # BotState lifecycle consistency
npm run check:security-guards # tenant isolation — must be green
npm run check:tenant-isolation # row-level access — must be green
```
### Environment Checks
- [ ] Backend is deployed and reachable (`GET /health/live` returns 200)
- [ ] Cosmos DB containers readable and writable (`dynamic_config`, `trading-profiles`, `trading-control`, `snapshots`, `capital-ledger`)
- [ ] Platform-service is reachable from the deployment environment
- [ ] `PLATFORM_AUTH_ENABLED=true` is set on the backend deployment
- [ ] `VITE_TRADING_API_URL` points to the deployed backend (not localhost)
- [ ] `VITE_PLATFORM_URL` points to the live platform-service
- [ ] `CORS_ALLOWED_ORIGINS` on the backend includes the new web dashboard origin
- [ ] Feature flags set correctly for the rollout population:
- `TAB_MARKETPLACE_ENABLED` — set per rollout plan
- `TAB_MEMBERSHIP_ENABLED` — set per rollout plan
- `ENABLE_BACKTEST` — set per rollout plan
### Rollback Readiness
- [ ] The legacy web dashboard URL is still live and working
- [ ] You know who owns the rollback decision and how to reach them
- [ ] Backend trade-halt control is reachable (`POST /internal/trading/pause`)
---
## Step 1 — Deploy the Web Dashboard
```bash
# From monorepo root — production build
pnpm build
# Or using Docker
pnpm docker:up
```
Verify the deployment:
- [ ] Web dashboard loads at the new URL without a blank screen
- [ ] Browser console shows no errors on load
- [ ] Network tab shows no 4xx/5xx on initial API calls
---
## Step 2 — Internal Operator Sign-In
Have each internal operator complete the sign-in sequence:
- [ ] Navigate to the new web dashboard URL
- [ ] Sign in using platform credentials (same as the legacy dashboard)
- [ ] Session restores correctly after browser refresh (no re-login required)
- [ ] Auth token is a platform JWT (check via browser devtools: `Authorization: Bearer ...` on API calls)
- [ ] `GET /api/me/profile` returns the correct user profile and role
---
## Step 3 — Core Feature Validation
Each operator validates their own user scope:
### Trading State
- [ ] Overview tab loads with live bot state (not stale/empty)
- [ ] WebSocket connection shows "Connected" in the header
- [ ] Socket connects to `/trading` namespace (check backend logs: `[API][/trading] Client connected`)
- [ ] Positions tab shows current open positions
- [ ] Trade History tab shows closed trade history
- [ ] My Strategies tab lists the operator's trading profiles
### Real-Time Updates
- [ ] Leave the dashboard open for 60 seconds; confirm symbol prices update live
- [ ] Trigger a manual order or profile toggle; confirm the state updates without refresh
### Admin Operators (role = admin only)
- [ ] Signals tab is visible and loads correctly
- [ ] Entries tab is visible and loads correctly
- [ ] Admin Panel tab is visible
- [ ] Strategy Clusters tab is visible
- [ ] Admin Panel → Trading Control: pause and resume work correctly
- [ ] Backend logs show `[AUDIT]` entries for pause/resume actions
- [ ] "Preview as Customer" toggle hides admin-only tabs correctly
### Kill-Switch Behaviour
- [ ] If platform-service maintenance mode is toggled on, web blocks access with correct UI
- [ ] After maintenance mode is lifted, web recovers without a page reload
---
## Step 4 — Config and Feature Flag Validation
- [ ] `GET /api/feature-flags` returns the correct `backtest`, `tabs.marketplace`, and `tabs.membership` values
- [ ] Backtesting tab visibility matches `ENABLE_BACKTEST` and `BACKTEST_CUSTOMER_ENABLED` config
- [ ] Marketplace tab visibility matches `TAB_MARKETPLACE_ENABLED` config
- [ ] Membership tab visibility matches `TAB_MEMBERSHIP_ENABLED` config
- [ ] Dynamic config changes via Admin Panel → Config are persisted to Cosmos and visible after a page refresh
---
## Step 5 — Request Tracing Spot Check
Pick any operator action (e.g., load trade history):
- [ ] Browser devtools shows `x-request-id` header on the request
- [ ] Backend response echoes the same `x-request-id`
- [ ] Search backend logs for that `x-request-id` — the full request trace appears
---
## Step 6 — Parallel Run Period (Recommended: 13 days)
Run the new and legacy dashboards in parallel before switching traffic fully:
- [ ] Operators use the new dashboard as primary
- [ ] Legacy dashboard remains accessible as a fallback
- [ ] No trading state mutations go through the legacy dashboard during this period
- [ ] Monitor for discrepancies between what new and legacy dashboards show
---
## Step 7 — Traffic Cutover
Once parallel run is complete with no issues:
- [ ] Update any bookmarks, internal links, or runbooks to point to the new URL
- [ ] Communicate to all internal users that the new dashboard is now primary
- [ ] Disable or redirect the legacy dashboard URL (do not delete it yet)
---
## Rollback Triggers
Stop cutover and revert to the legacy dashboard immediately if any of the following occur:
| Condition | Action |
|---|---|
| Sign-in or session restore fails for any operator | Rollback |
| Tenant data leak — operator sees another user's positions or history | Rollback immediately + page oncall |
| Trading control (pause/resume) does not apply correctly | Rollback |
| Dynamic config writes fail silently | Rollback |
| WebSocket disconnects repeatedly with no recovery | Rollback |
| Missing data in positions or trade history vs. legacy dashboard | Investigate before proceeding |
### Rollback Steps
1. Restore the legacy dashboard URL as primary (flip DNS or update internal links)
2. Notify all operators to switch back immediately
3. Do **not** rewrite or delete Cosmos state during first-response rollback
4. File an incident report referencing the `x-request-id` values from affected requests
5. Resolve the root cause before re-attempting cutover
---
## Post-Adoption Monitoring (First 24 Hours)
Watch the following immediately after cutting over:
### Immediate (first 30 minutes)
- [ ] Platform auth failure rate is zero
- [ ] Token refresh failures are zero
- [ ] Backend `401` / `403` error rate is baseline (no spike)
- [ ] WebSocket connection error rate is baseline
### First Hour
- [ ] Cosmos reads and writes are completing successfully (check backend logs for Cosmos errors)
- [ ] Dynamic config refresh cycle completes without error (every `DYNAMIC_CONFIG_REFRESH_MS`)
- [ ] No tenant isolation anomalies in security guard logs
### First 24 Hours
- [ ] Runtime control drift: Cosmos control-plane state matches in-memory trading control mode
- [ ] Kill-switch state matches platform-service state
- [ ] No stale session events (operators are not re-prompted to log in unexpectedly)
- [ ] No build or chunk-size regressions affecting web load time (check browser waterfall)
---
## Post-Cutover Sign-Off
Complete after the first 24 hours with no rollback triggers:
- [ ] All operators confirm the new dashboard is working correctly
- [ ] Monitoring checks above are all green
- [ ] Incident response runbooks updated to reference the new dashboard URL
- [ ] Legacy web repo marked as archived (not deleted — kept as reference)
- [ ] ROADMAP.md: mark "Web internal adoption" as `[x]` Done
- [ ] OPERATIONS.md: update Staged Cutover section to reflect Stage 2 complete
---
## Next Stage
After web internal adoption is confirmed:
**Stage 3 — Mobile Internal Beta** (see planned `docs/CUTOVER_MOBILE.md`)
- Release mobile app to internal testers
- Validate sign-in, session restore, live state, degraded-state handling
- Gate: backend/web contracts must be stable through at least one full backend deploy cycle