docs: add operations runbook and refresh roadmap status
parent d01ed51bff
commit 5685cb3449
@@ -27,6 +27,12 @@ This repo consumes local ByteLyst common-platform packages from:
 ```bash
 pnpm install
 pnpm verify
+pnpm lint
 pnpm build
 ```

+## Operations
+
+- product and scope: `docs/PRD.md`
+- execution tracker: `docs/ROADMAP.md`
+- local dev, cutover, rollback, and release checks: `docs/OPERATIONS.md`
docs/OPERATIONS.md (normal file, 207 lines added)
@@ -0,0 +1,207 @@
# Trading Monorepo Operations

## Purpose

This document is the operator and engineer runbook for `learning_ai_invt_trdg`.

It covers:

- local development setup
- verification and CI expectations
- staged cutover from legacy repos
- rollback rules
- release go/no-go checks
- post-cutover monitoring

## Local Development

### Prerequisites

- Node.js `>=20`
- `pnpm` `>=10`
- local checkout of:
  - `learning_ai_invt_trdg`
  - `learning_ai_common_plat`
- access to:
  - platform-service
  - Azure Cosmos DB
  - optional legacy Supabase project during migration

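
The version floors above can be checked before bootstrapping. The following is a minimal sketch, not part of the repo: `major_at_least` is a hypothetical helper that compares only the major component of a version string.

```bash
# Hypothetical helper (not part of this repo): succeed when the major
# component of a version string is at least the required major version.
major_at_least() {
  local version major
  version="${1#v}"          # strip the leading "v" printed by `node --version`
  major="${version%%.*}"    # keep everything before the first dot
  case "$major" in
    ''|*[!0-9]*) return 1 ;;  # reject empty or non-numeric input
  esac
  [ "$major" -ge "$2" ]
}

# Example wiring, assuming node and pnpm are on PATH:
#   major_at_least "$(node --version)" 20 || echo "Node.js >=20 required" >&2
#   major_at_least "$(pnpm --version)" 10 || echo "pnpm >=10 required" >&2
```

The helper deliberately ignores minor and patch components, because the prerequisites only state major-version floors.
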
### Workspace bootstrap

```bash
pnpm install
cp .env.example .env
pnpm verify
```

### Core commands

```bash
pnpm verify
pnpm lint
pnpm typecheck
pnpm test
pnpm build
```

### Surface-specific commands

```bash
pnpm --filter @bytelyst/trading-backend dev
pnpm --filter @bytelyst/trading-web dev
pnpm --filter @bytelyst/trading-mobile dev
```

## Environment Model

### Platform-service

- `PLATFORM_API_URL`
- `PLATFORM_AUTH_ENABLED`
- `PLATFORM_JWT_ISSUER`
- `PLATFORM_JWT_PUBLIC_KEY` or `PLATFORM_JWT_JWKS_URL`
- `JWT_SECRET` only for HS256 compatibility environments

### Cosmos

- `COSMOS_ENDPOINT`
- `COSMOS_KEY`
- `COSMOS_DATABASE`

### Transitional legacy migration support

- `SUPABASE_URL`
- `SUPABASE_KEY`
- `SUPABASE_JWT_ISSUER`
- `SUPABASE_JWT_AUDIENCE`

Rule:

- platform-service and Cosmos are the target system
- Supabase remains transitional only where trading persistence has not yet been migrated

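
One way to catch an incomplete `.env` early is a small preflight check over the target-system variables. This is a sketch under assumptions: `missing_target_env` is a hypothetical function, and treating every listed platform-service and Cosmos variable as required is an assumption rather than something the runbook states. The transitional Supabase variables, `JWT_SECRET`, and the `PLATFORM_AUTH_ENABLED` toggle are left out on purpose.

```bash
# Hypothetical check (not part of this repo): print the name of every
# target-system variable that is unset or empty.
# Assumption: all of the variables named here are required.
missing_target_env() {
  local name value
  for name in PLATFORM_API_URL PLATFORM_JWT_ISSUER \
              COSMOS_ENDPOINT COSMOS_KEY COSMOS_DATABASE; do
    eval "value=\${$name:-}"      # indirect lookup of the variable called $name
    [ -n "$value" ] || echo "$name"
  done
  # Either key source satisfies the "or" in the platform-service list.
  if [ -z "${PLATFORM_JWT_PUBLIC_KEY:-}" ] && [ -z "${PLATFORM_JWT_JWKS_URL:-}" ]; then
    echo "PLATFORM_JWT_PUBLIC_KEY or PLATFORM_JWT_JWKS_URL"
  fi
}
```

An empty result means nothing is missing; any output can be shown to the operator as-is.
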
## Verification Standard

Before merge or release, all of the following must pass from repo root:

```bash
pnpm verify
pnpm lint
```

`pnpm verify` currently gates:

- backend, web, and mobile typecheck
- backend and web test suites
- backend and web build plus mobile typecheck

`pnpm lint` currently gates:

- backend contract and safety verification scripts
- web lint
- mobile lint

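
For local use, the two gates can be wrapped so that the first failure stops the run and names the gate that failed. This is only a sketch: `run_gates` and the `GATE_RUNNER` override are hypothetical names, and the wrapper assumes `verify` and `lint` are root scripts runnable as `pnpm verify` and `pnpm lint`, as shown above.

```bash
# Hypothetical wrapper (not part of this repo): run both release gates in
# order and stop at the first failure.
# GATE_RUNNER is an assumed override that defaults to pnpm.
run_gates() {
  local runner script
  runner="${GATE_RUNNER:-pnpm}"
  for script in verify lint; do
    "$runner" "$script" || { echo "gate failed: $script" >&2; return 1; }
  done
  echo "all gates passed"
}
```

Calling `run_gates` from the repo root runs `pnpm verify` followed by `pnpm lint`; the override exists only so the wrapper can be exercised without a real workspace.
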
## Staged Cutover

### Order

1. Backend internal validation
2. Web internal adoption
3. Mobile internal beta
4. Controlled operator rollout
5. Broader production cutover

### Backend cutover

- deploy backend with platform JWT support and Cosmos-backed trading controls enabled
- keep legacy Supabase fallback enabled during first production bake
- confirm runtime control reads/writes work through backend APIs
- confirm `dynamic_config` and trading-control containers are readable and writable
- confirm unauthorized requests are rejected and tenant-scoped reads are enforced

### Web cutover

- move operators to the monorepo web dashboard
- validate sign-in, session restore, kill-switch handling, and admin controls
- validate dynamic config writes through backend APIs
- keep legacy direct-table workflows disabled where backend API replacements exist

### Mobile cutover

- release to internal beta first
- validate sign-in, session restore, live state, degraded-state handling, and safe interventions
- do not enable broader rollout until backend/web contracts stay stable through at least one backend deploy cycle

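
The cutover order can be encoded so that tooling never skips a stage. The sketch below is illustrative only: `next_cutover_stage` and the stage slugs are hypothetical identifiers derived from the five stages listed under Order.

```bash
# Hypothetical helper (not part of this repo): given the current cutover
# stage, print the stage that follows it. The slugs are assumed names.
next_cutover_stage() {
  case "$1" in
    backend-internal-validation) echo "web-internal-adoption" ;;
    web-internal-adoption)       echo "mobile-internal-beta" ;;
    mobile-internal-beta)        echo "controlled-operator-rollout" ;;
    controlled-operator-rollout) echo "broader-production-cutover" ;;
    broader-production-cutover)  echo "done" ;;
    *) echo "unknown stage: $1" >&2; return 1 ;;
  esac
}
```

Rejecting unknown stage names instead of guessing keeps a typo from silently advancing the rollout.
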
## Rollback Rules

### Hard rollback triggers

- auth/session failures prevent sign-in or session refresh
- incorrect tenant scoping leaks another user's profile, orders, alerts, or history
- global trade halt or scoped disable controls do not apply correctly
- dynamic config writes fail or partially apply without clear operator visibility
- mobile/web clients cannot recover from degraded platform-service or backend states

### Rollback actions

1. stop rollout to additional users immediately
2. revert the most recent monorepo deployment
3. restore traffic to the previous stable web/backend/mobile release
4. keep backend trade-halt authority available during rollback
5. preserve audit logs and operational events for incident review

### Data rollback rule

- do not rewrite or delete Cosmos control-plane state as part of first-response rollback
- prefer application rollback first, then explicit state repair if needed

## Release Go/No-Go

Release is `go` only if all of the following are true:

- `pnpm verify` passes
- `pnpm lint` passes
- platform-service auth is reachable from web and mobile
- Cosmos control-plane reads and writes succeed
- kill-switch and maintenance behavior are validated on web and mobile
- backend tenant isolation checks are green
- operator-safe mobile interventions are limited to approved actions only
- known migration-only legacy dependencies are documented

Release is `no-go` if any of the following are true:

- Supabase fallback is still required for a critical public flow that has no monitored contingency
- auth source of truth is ambiguous in production
- admin/runtime-control actions are not fully audited
- rollback owner or rollback commands are unclear

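
The go/no-go rule is an all-or-nothing conjunction, which can be sketched as a small evaluator. This is an illustration under assumptions: `release_decision` and the `name=pass` argument format are hypothetical, and how each check result is produced is outside the scope of the sketch.

```bash
# Hypothetical evaluator (not part of this repo): each argument is
# "check_name=pass" or "check_name=<anything else>". Prints "go" only when
# at least one check was supplied and every check passed.
release_decision() {
  local check failed
  failed=""
  if [ "$#" -eq 0 ]; then
    echo "no-go: no checks supplied"
    return 1
  fi
  for check in "$@"; do
    case "$check" in
      *=pass) ;;                              # passed, nothing to record
      *) failed="$failed ${check%%=*}" ;;     # record the failing check name
    esac
  done
  if [ -z "$failed" ]; then
    echo "go"
  else
    echo "no-go:$failed"
    return 1
  fi
}
```

Anything other than an explicit `pass` counts as a failure, so an unknown or missing result cannot produce a `go`.
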
## Post-Cutover Monitoring

### Watch immediately after rollout

- platform auth failures
- token refresh failures
- backend `401` and `403` spikes
- websocket connection failure rate
- dynamic config update failures
- trading-control update failures
- mobile degraded/offline state frequency
- unexpected operator intervention failures

### Watch for the first 24 hours

- tenant isolation anomalies
- runtime control drift between backend memory and Cosmos control state
- kill-switch misfires
- stale session behavior across web and mobile
- build or chunk-size regressions affecting web load

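
As a rough aid for spotting `401` and `403` spikes, the share of denied responses can be computed from a stream of status codes. This is a sketch with assumptions: `auth_error_rate` is a hypothetical helper, the input is assumed to be one HTTP status code per line (extracting that from real logs depends on the log format), and the alert threshold is left to the operator.

```bash
# Hypothetical helper (not part of this repo): read one HTTP status code per
# line on stdin and print the percentage that are 401 or 403.
auth_error_rate() {
  awk '
    { total++ }
    $1 == 401 || $1 == 403 { denied++ }
    END {
      if (total == 0) print "0.0"
      else printf "%.1f\n", 100 * denied / total
    }
  '
}
```

For example, the four codes `200`, `401`, `403`, `200` yield `50.0`. Comparing the value against a pre-rollout baseline is more useful than any fixed number.
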
## Known Remaining Gaps

- full trading data-plane migration away from legacy Supabase is not complete
- web still carries some legacy compatibility layers around auth/profile bootstrap
- mobile does not yet include push notification infrastructure
- feature-flag ownership and correlation-ID propagation are not fully standardized yet

These are follow-up items, not hidden defects. They should remain tracked in `docs/ROADMAP.md`.
@@ -14,7 +14,7 @@ It assumes:

 ### Overall status

-- Current phase: `Phase 5`
+- Current phase: `Phase 6`
 - Overall state: `In Progress`

 ### Legend
@@ -31,6 +31,9 @@ It assumes:
 - [x] Web migrated into `web/` with shared runtime, shared kill-switch gate, shared telemetry bootstrap, normalized backend URL resolution, and platform-service-backed public auth via a compatibility shim
 - [x] Mobile migrated into `mobile/` with product identity, shared runtime bootstrap, launch-time kill-switch gate, platform-service auth, live backend polling plus websocket-backed updates, startup/error telemetry capture, secure session storage with invalidation handling, and explicit degraded/offline status surfacing
 - [x] Backend now accepts common-platform JWTs with legacy Supabase fallback and persists global trading-control state through Cosmos-backed control storage
+- [x] Dynamic config now flows through backend control-plane APIs with Cosmos-first storage and legacy Supabase fallback
+- [x] Mobile platform auth requests now use the common React Native platform SDK
+- [x] Root verification and lint flows now run successfully without sandbox-hostile script harness behavior
 - [-] DRY cleanup completed for runtime/config/bootstrap concerns, shared websocket auth helpers, and platform-session handling, but not yet for all data-plane persistence flows
 - [!] Full common-platform data-plane replacement remains a follow-up; backend and web still retain legacy Supabase data access for trading records and configuration tables

@@ -161,7 +164,7 @@ Ensure all surfaces adopt one consistent platform model for auth, kill switch, t

 ### Checklist

-- [-] Define web auth pattern using `@bytelyst/react-auth`
+- [x] Define web auth pattern using `@bytelyst/react-auth`
 - [x] Define mobile auth pattern using shared platform-service session contracts
 - [x] Define backend auth boundary and middleware strategy
 - [x] Define kill-switch semantics across web, mobile, and backend
@@ -306,7 +309,7 @@ Build mobile as a real ecosystem surface, not a mock UI shell.

 - [x] Create Expo app structure following FastGap-style monorepo conventions
 - [x] Add product config bootstrap
-- [-] Integrate `@bytelyst/react-native-platform-sdk`
+- [x] Integrate `@bytelyst/react-native-platform-sdk`
 - [x] Implement auth flow and session restore
 - [x] Define secure storage and session invalidation behavior
 - [x] Implement launch-time kill-switch and maintenance handling
@@ -389,13 +392,13 @@ Validate that the new monorepo is safer and more coherent than the legacy setup

 ### Checklist

-- [ ] Add root verify scripts
+- [x] Add root verify scripts
 - [ ] Add backend contract tests
 - [ ] Add web auth and kill-switch smoke tests
 - [ ] Add mobile launch/auth/kill-switch smoke coverage
-- [ ] Add docs for local dev, CI, Docker, and fallback behaviors
-- [ ] Define cutover sequencing from legacy repos
-- [ ] Define rollback paths
+- [x] Add docs for local dev, CI, Docker, and fallback behaviors
+- [x] Define cutover sequencing from legacy repos
+- [x] Define rollback paths
 - [ ] Define release go/no-go checklist
 - [ ] Define post-cutover monitoring checks

@@ -409,12 +412,12 @@ Validate that the new monorepo is safer and more coherent than the legacy setup

 ## 9.1 Root / Repository Tasks

-- [ ] Create root `package.json`
-- [ ] Create `pnpm-workspace.yaml`
-- [ ] Create `.env.example`
-- [ ] Create `shared/product.json`
-- [ ] Create `scripts/verify.sh` or equivalent
-- [ ] Create root README
+- [x] Create root `package.json`
+- [x] Create `pnpm-workspace.yaml`
+- [x] Create `.env.example`
+- [x] Create `shared/product.json`
+- [x] Create `scripts/verify.sh` or equivalent
+- [x] Create root README
 - [ ] Create docker/dev orchestration model
 - [ ] Define naming conventions and import boundaries

@@ -454,29 +457,29 @@ Validate that the new monorepo is safer and more coherent than the legacy setup

 ### Recommended Commit Order

-- [ ] Repo scaffold and product identity
-- [ ] Backend skeleton and config/auth contracts
-- [ ] Backend runtime control and health contracts
-- [ ] Web shell and auth migration
-- [ ] Web dashboard migration by tab priority
-- [ ] Mobile bootstrap and auth
+- [x] Repo scaffold and product identity
+- [x] Backend skeleton and config/auth contracts
+- [x] Backend runtime control and health contracts
+- [x] Web shell and auth migration
+- [x] Web dashboard migration by tab priority
+- [x] Mobile bootstrap and auth
 - [ ] Mobile overview/alerts/positions/history
 - [ ] DRY cleanup
-- [ ] Verification and cutover docs
+- [x] Verification and cutover docs

 ### Recommended Rollout Order

-- [ ] Backend internal validation
+- [x] Backend internal validation
 - [ ] Web internal adoption
 - [ ] Mobile internal beta
 - [ ] External / staged rollout

 ## 11. Definition of Done by Phase

-- [ ] Docs are updated
-- [ ] Contracts are explicit
-- [ ] Verification has been run or the remaining gap is documented
-- [ ] Ownership boundaries are clearer than before the phase started
+- [x] Docs are updated
+- [x] Contracts are explicit
+- [x] Verification has been run or the remaining gap is documented
+- [x] Ownership boundaries are clearer than before the phase started

 ## 12. Risks and Mitigations
