123 lines
9.0 KiB
Markdown
123 lines
9.0 KiB
Markdown
# Architecture Risk Analysis
|
|
|
|
Date: 2026-02-16
|
|
Scope: `bytelyst-trading-bot-service` + `bytelyst-trading-dashboard-web`
|
|
Mode: Risk analysis only (no redesign, no implementation proposal)
|
|
|
|
## Phase Roadmap Tracking
|
|
|
|
### Phase 1: Multi-Tenant Isolation Hardening
|
|
|
|
- [x] Partition runtime REST state by authenticated `user_id` for:
|
|
- `/api/status`
|
|
- `/api/alerts`
|
|
- `/api/symbol/:symbol`
|
|
- Commit: https://github.com/saravanakumardb/bytelyst-trading-bot-service/commit/a91e253
|
|
- [x] Remove global runtime websocket broadcast surfaces and emit tenant-scoped updates only.
|
|
- Commit: https://github.com/saravanakumardb/bytelyst-trading-bot-service/commit/a91e253
|
|
- [x] Prevent stale profile cache leakage on profile removal (`unregisterManualTrader` cleanup path).
|
|
- Commit: https://github.com/saravanakumardb/bytelyst-trading-bot-service/commit/a91e253
|
|
- [x] Enforce RLS coverage for `orders` and `trade_history` in schema + CI policy verification.
|
|
- Commits:
|
|
- https://github.com/saravanakumardb/bytelyst-trading-bot-service/commit/d9395b4
|
|
- [x] Add validation proving tenant isolation (`user A` cannot access `user B` profile/trade runtime data).
|
|
- Commit: https://github.com/saravanakumardb/bytelyst-trading-bot-service/commit/089af51
|
|
- [ ] Run `schema/009_tenant_rls_orders_trade_history.sql` on each target environment (dev/staging/prod) and verify policy existence in Supabase.
|
|
|
|
### Phase 2: Restart Durability & Snapshot
|
|
|
|
- [x] Create durable `bot_state_snapshots` table with `auth.users` FK and RLS guard, then verify presence via policy gate.
|
|
- Commit: https://github.com/saravanakumardb/bytelyst-trading-bot-service/commit/ae339ba
|
|
- [x] Replace file-only state restore/persistence with `SupabaseService.saveBotStateSnapshot`/`loadLatestBotStateSnapshot` backed by the new table and asynchronous `ApiServer` snapshot flow.
|
|
- Commit: https://github.com/saravanakumardb/bytelyst-trading-bot-service/commit/ae339ba
|
|
- [x] Rehydrate `TradeExecutor` on startup by pulling open orders from Supabase/exchange, rebuilding lifecycle tracking, and validating duplicates via DB queries instead of in-memory maps.
|
|
- Commit: https://github.com/saravanakumardb/bytelyst-trading-bot-service/commit/ae339ba
|
|
- [ ] Apply the new snapshot migration and ensure each environment records a valid `snapshot_user_id` (or derived owner) before relying on DB restore.
|
|
|
|
## Architecture Risk Analysis
|
|
|
|
- Mitigated in code (pending environment rollout): Multi-tenant state exposure risk.
|
|
- Runtime REST responses and websocket emissions are now tenant-scoped by authenticated user.
|
|
- Evidence (fix commits): `a91e253`, `089af51`.
|
|
- Critical: Snapshot architecture inconsistency for restart durability.
|
|
- Runtime writes snapshots with `user_id='system_backup'` while schema requires UUID FK.
|
|
- Startup flow does not actually restore from DB snapshot path.
|
|
- Evidence: `bytelyst-trading-bot-service/src/services/apiServer.ts:375`, `bytelyst-trading-bot-service/schema/008_schema_gap_backfill.sql:205`, `bytelyst-trading-bot-service/src/services/apiServer.ts:637`, `bytelyst-trading-bot-service/src/services/apiServer.ts:642`.
|
|
- High: Non-transactional lifecycle persistence across orders/history/runtime maps can drift under partial failures.
|
|
- Evidence: `bytelyst-trading-bot-service/src/services/TradeExecutor.ts:597`, `bytelyst-trading-bot-service/src/services/TradeExecutor.ts:642`, `bytelyst-trading-bot-service/src/services/TradeExecutor.ts:801`, `bytelyst-trading-bot-service/src/services/TradeExecutor.ts:920`.
|
|
- High: Single-process authority assumption.
|
|
- In-memory maps are primary state authorities, with no distributed coordination boundary.
|
|
- Evidence: `bytelyst-trading-bot-service/src/services/TradeExecutor.ts:68`, `bytelyst-trading-bot-service/src/services/apiServer.ts:191`.
|
|
|
|
## Missing Enterprise Components
|
|
|
|
- Tenant-scoped state/event partitioning for WebSocket/REST runtime payloads.
|
|
- RLS validation coverage for `orders` and `trade_history` is now present in migration + policy checks; environment rollout remains.
|
|
- Evidence (fix commit): `d9395b4`.
|
|
- Durable structured audit sink for control-plane actions (currently log-line only).
|
|
- Evidence: `bytelyst-trading-bot-service/src/services/apiServer.ts:399`.
|
|
- Metrics export wiring is not active in API/runtime integration.
|
|
- Evidence: `bytelyst-trading-bot-service/src/services/MetricsService.ts:4`.
|
|
|
|
## Structural Integrity Risks
|
|
|
|
- API profile state caches are write/update only, no explicit delete path in merge maps.
|
|
- Evidence: `bytelyst-trading-bot-service/src/services/apiServer.ts:1269`, `bytelyst-trading-bot-service/src/services/apiServer.ts:1285`.
|
|
- Lifecycle UI can become window-biased due to hard query limits, causing false orphan/mismatch interpretations.
|
|
- Evidence: `bytelyst-trading-dashboard-web/src/tabs/PositionsTab.tsx:346`, `bytelyst-trading-dashboard-web/src/tabs/PositionsTab.tsx:397`, `bytelyst-trading-dashboard-web/src/tabs/PositionsTab.tsx:664`.
|
|
- Client-side synthetic lifecycle IDs can diverge from backend trace truth.
|
|
- Evidence: `bytelyst-trading-dashboard-web/src/tabs/PositionsTab.tsx:293`, `bytelyst-trading-dashboard-web/src/tabs/PositionsTab.tsx:297`.
|
|
|
|
## Lifecycle Integrity Risks
|
|
|
|
- Finalization suppression when entry chain is missing protects integrity but can produce realized PnL visibility gaps.
|
|
- Evidence: `bytelyst-trading-bot-service/src/services/TradeExecutor.ts:843`.
|
|
- Mixed-side virtual position reconstruction collapses to dominant side, masking opposite residual slices.
|
|
- Evidence: `bytelyst-trading-bot-service/src/services/SupabaseService.ts:1254`.
|
|
- Exit lifecycle control is symbol-scoped, not trade-scoped, for in-flight transition locking.
|
|
- Evidence: `bytelyst-trading-bot-service/src/services/TradeExecutor.ts:76`, `bytelyst-trading-bot-service/src/services/TradeExecutor.ts:548`.
|
|
|
|
## Capital Isolation Risks
|
|
|
|
- Capital checks rely on local in-memory open positions and exclude pending intent reservation.
|
|
- Evidence: `bytelyst-trading-bot-service/src/index.ts:131`, `bytelyst-trading-bot-service/src/services/AutoTrader.ts:167`, `bytelyst-trading-bot-service/src/services/ManualTrader.ts:26`.
|
|
- Concurrent profile evaluations per symbol can pass guard checks before committed state convergence.
|
|
- Evidence: `bytelyst-trading-bot-service/src/index.ts:523`, `bytelyst-trading-bot-service/src/index.ts:530`.
|
|
|
|
## Exchange Reconciliation Risks
|
|
|
|
- Reconciliation parity checks only recent subset (`limit=25` per profile), allowing long-tail drift persistence.
|
|
- Evidence: `bytelyst-trading-bot-service/src/index.ts:845`.
|
|
- Pending-order recovery scope (`pending_new`) is narrower than stale-order sync scope (`pending_new|pending|accepted|new`).
|
|
- Evidence: `bytelyst-trading-bot-service/src/services/SupabaseService.ts:655`, `bytelyst-trading-bot-service/src/services/SupabaseService.ts:597`.
|
|
- Runtime pending order broadcast normalizes to `pending_new`, which can transiently differ from DB/exchange truth.
|
|
- Evidence: `bytelyst-trading-bot-service/src/services/TradeExecutor.ts:957`.
|
|
|
|
## Restart Recovery Risks
|
|
|
|
- Critical idempotency and lifecycle coordination maps are process-local and reset on restart.
|
|
- Evidence: `bytelyst-trading-bot-service/src/services/TradeExecutor.ts:73`, `bytelyst-trading-bot-service/src/services/TradeExecutor.ts:74`, `bytelyst-trading-bot-service/src/services/TradeExecutor.ts:75`, `bytelyst-trading-bot-service/src/services/TradeExecutor.ts:76`.
|
|
- Startup restore sequence prefers local file; DB snapshot restore path is intentionally not used.
|
|
- Evidence: `bytelyst-trading-bot-service/src/services/apiServer.ts:621`, `bytelyst-trading-bot-service/src/services/apiServer.ts:642`.
|
|
|
|
## Concurrency and Locking Risks
|
|
|
|
- Loop guards are local booleans only; no shared lock domain across services/instances.
|
|
- Evidence: `bytelyst-trading-bot-service/src/index.ts:462`, `bytelyst-trading-bot-service/src/index.ts:463`, `bytelyst-trading-bot-service/src/services/tradeMonitor.ts:10`, `bytelyst-trading-bot-service/src/services/OrderStatusSyncService.ts:47`.
|
|
- Profile-scale fan-out creates one monitor and one order-sync worker per profile plus orphan sync workers, expanding race and rate-pressure surfaces.
|
|
- Evidence: `bytelyst-trading-bot-service/src/index.ts:196`, `bytelyst-trading-bot-service/src/index.ts:198`, `bytelyst-trading-bot-service/src/index.ts:207`.
|
|
|
|
## Observability Gaps
|
|
|
|
- Readiness signal only tracks loop recency and exchange connectivity flag; lacks explicit monitor/order-sync/profile-sync liveness SLO dimensions.
|
|
- Evidence: `bytelyst-trading-bot-service/src/services/apiServer.ts:352`.
|
|
- No persistent operational audit table for lifecycle control actions.
|
|
- Evidence: `bytelyst-trading-bot-service/src/services/apiServer.ts:399`.
|
|
- Metrics service exists but is not exposed as runtime telemetry endpoint.
|
|
- Evidence: `bytelyst-trading-bot-service/src/services/MetricsService.ts:114`.
|
|
|
|
## Production Readiness Score
|
|
|
|
- Score: **58 / 100**
|
|
- Rationale: Core lifecycle execution logic is materially hardened, but enterprise-grade readiness is constrained by tenant data exposure, restart durability gaps, local-process locking assumptions, and incomplete production observability integration.
|